Pearson’s product-moment correlation coefficient, also known as Pearson’s r, describes the strength and direction of the linear relationship between two quantitative variables.
These are the assumptions your data must meet if you want to use Pearson’s r:
- Both variables are on an interval or ratio level of measurement
- Data from both variables follow normal distributions
- Your data have no outliers
- Your data come from a random or representative sample
- You expect a linear relationship between the two variables
Pearson’s r is a parametric measure, so tests based on it have high power when these assumptions hold. But it’s not a good measure of correlation if your variables have a nonlinear relationship, if your data have outliers or skewed distributions, or if your variables are categorical. If any of these assumptions are violated, you should consider a rank correlation measure.
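To make the rank-based alternative more concrete, here is a minimal sketch of one such measure, Spearman's rank correlation, computed as Pearson's r applied to the ranks of the data. The function names and the data are hypothetical and chosen only for illustration; tied values are given the average of their ranks.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from sums of deviations (the n - 1 divisors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: a monotonic but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print("Pearson r:   ", round(pearson_r(x, y), 3))
print("Spearman rho:", round(spearman_rho(x, y), 3))
```

On data like these, the rank correlation reaches 1 because the relationship is perfectly monotonic, while Pearson's r stays below 1 because the relationship is not a straight line.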
The formula for Pearson’s r looks complicated, but most statistical software can compute the correlation coefficient directly from your data. In its simplest form, the formula divides the covariance between the two variables by the product of their standard deviations.
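Formula | Explanation |
---|---|
$r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$ | $x_i$, $y_i$ = paired observations; $\bar{x}$, $\bar{y}$ = their means; each sum runs over all $n$ pairs |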
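The covariance-over-standard-deviations form can be sketched in a few lines of Python. This is an illustration only: the function names and the small data set (`hours`, `scores`) are made up, and the sample (n - 1) divisor is assumed throughout.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Average product of paired deviations, using the n - 1 divisor."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def sample_std(values):
    """Sample standard deviation, using the n - 1 divisor."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, scores), 3))
```

In practice you would normally rely on a tested library routine rather than hand-rolled functions; the point here is only to show which quantities go into the ratio.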
Pearson sample vs population correlation coefficient formula
When using the Pearson correlation coefficient formula, you’ll need to consider whether you’re dealing with data from a sample or the whole population.
The sample and population formulas differ in their symbols and inputs. A sample correlation coefficient is called r, while a population correlation coefficient is called rho, the Greek letter ρ.
The sample correlation coefficient uses the sample covariance between variables and their sample standard deviations.
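Sample correlation coefficient formula | Explanation |
---|---|
$r_{xy} = \dfrac{\operatorname{cov}(x, y)}{s_x \, s_y}$ | $\operatorname{cov}(x, y)$ = sample covariance of x and y; $s_x$, $s_y$ = sample standard deviations of x and y |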
The population correlation coefficient uses the population covariance between variables and their population standard deviations.
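Population correlation coefficient formula | Explanation |
---|---|
$\rho_{XY} = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$ | $\operatorname{cov}(X, Y)$ = population covariance of X and Y; $\sigma_X$, $\sigma_Y$ = population standard deviations of X and Y |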
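One detail worth seeing in code: as long as the same divisor (n - 1 for a sample, n for a population) is used for both the covariance and the standard deviations, it cancels out of the ratio. Applied to the same numbers, the two formulas therefore give the same value; what differs is whether that value is an estimate (r) or the parameter itself (ρ). The sketch below uses a hypothetical `ddof` argument and made-up data to illustrate this.

```python
from math import sqrt


def correlation(x, y, ddof):
    """Correlation with divisor n - ddof.

    ddof=1 gives the sample formulas (n - 1); ddof=0 gives the
    population formulas (n). The argument name is an assumption
    made for this sketch.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - ddof)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - ddof))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - ddof))
    return cov / (std_x * std_y)


# Hypothetical data, for illustration only.
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]

r_sample = correlation(x, y, ddof=1)      # sample covariance and SDs
rho_formula = correlation(x, y, ddof=0)   # population covariance and SDs
print(round(r_sample, 3), round(rho_formula, 3))
```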
1. Pearson Correlation Coefficient (r):
This is the most common type, used to measure the linear relationship between two continuous variables. We saw it in the previous examples.
Example: Rainfall (X) and River Flow Rate (Y)
We'll calculate the Pearson Correlation Coefficient (r) to see the linear relationship between rainfall (X) and river flow rate (Y) using the provided data:
Day | Rainfall X (mm) | Flow Rate Y (m³/s) |
---|---|---|
1 | 5 | 10 |
2 | 10 | 15 |
3 | 15 | 22 |
4 | 20 | 30 |
5 | 25 | 38 |
1. Calculate the means for X (rainfall) and Y (flow rate):
- Mean of X = (5 + 10 + 15 + 20 + 25) / 5 = 15 mm
- Mean of Y = (10 + 15 + 22 + 30 + 38) / 5 = 23 m³/s
2. Find the deviations from the mean for each data point:
Day | Rainfall (X) | Flow Rate (Y) | X - X̅ | Y - Y̅ | (X - X̅)² | (Y - Y̅)² |
---|---|---|---|---|---|---|
1 | 5 | 10 | -10 | -13 | 100 | 169 |
2 | 10 | 15 | -5 | -8 | 25 | 64 |
3 | 15 | 22 | 0 | -1 | 0 | 1 |
4 | 20 | 30 | 5 | 7 | 25 | 49 |
5 | 25 | 38 | 10 | 15 | 100 | 225 |
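The deviation table can be reproduced with a short Python sketch, which also totals the columns needed in the next steps. The variable names are chosen for illustration; the extra last column holds the products of the paired deviations.

```python
rainfall = [5, 10, 15, 20, 25]     # X, in mm
flow_rate = [10, 15, 22, 30, 38]   # Y, in m³/s

mean_x = sum(rainfall) / len(rainfall)
mean_y = sum(flow_rate) / len(flow_rate)

# One row per day: X, Y, X - mean, Y - mean, squared deviations, product.
rows = []
for x, y in zip(rainfall, flow_rate):
    dx, dy = x - mean_x, y - mean_y
    rows.append((x, y, dx, dy, dx ** 2, dy ** 2, dx * dy))

sum_dx2 = sum(row[4] for row in rows)    # sum of (X - mean)²
sum_dy2 = sum(row[5] for row in rows)    # sum of (Y - mean)²
sum_dxdy = sum(row[6] for row in rows)   # sum of (X - mean)(Y - mean)

for row in rows:
    print(row)
print(sum_dx2, sum_dy2, sum_dxdy)  # 250.0 508.0 355.0
```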
3. Calculate the covariance:
- Covariance = Σ [(X - X̅) * (Y - Y̅)] / (n - 1)
- Covariance = [(-10 * -13) + (-5 * -8) + (0 * -1) + (5 * 7) + (10 * 15)] / (5 - 1)
- Covariance = (130 + 40 + 0 + 35 + 150) / 4 = 355 / 4 = 88.75 mm * m³/s
4. Calculate the standard deviation of X (Sx) and standard deviation of Y (Sy):
- Use the sums of squared deviations from the table in step 2, again dividing by n - 1:
- Sx = √(250 / 4) = √62.5 ≈ 7.91 mm
- Sy = √(508 / 4) = √127 ≈ 11.27 m³/s
5. Calculate the correlation coefficient (r):
- r = Covariance / (Sx * Sy)
- r = 88.75 mm * m³/s / (7.91 mm * 11.27 m³/s)
- r ≈ 0.996 (The units cancel, and the value falls between -1 and 1, as every correlation coefficient must.)
Interpretation:
The calculated value (r ≈ 0.996) indicates a very strong positive linear correlation between rainfall (X) and river flow rate (Y). This aligns with our expectation that higher rainfall tends to lead to increased river flow. However, with such a small sample (n = 5), the estimate is imprecise. It's recommended to collect more data for a more reliable analysis.
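Recomputing the example in code is a quick sanity check, since any valid r must fall between -1 and 1. The sketch below uses Python's standard-library `statistics` module (whose `stdev` uses the n - 1 divisor) on the five data points from the table; the variable names are chosen for illustration.

```python
from statistics import mean, stdev

rainfall = [5, 10, 15, 20, 25]     # X, in mm
flow_rate = [10, 15, 22, 30, 38]   # Y, in m³/s

n = len(rainfall)
mean_x, mean_y = mean(rainfall), mean(flow_rate)

# Sample covariance: sum of paired deviation products over n - 1.
covariance = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(rainfall, flow_rate)
) / (n - 1)

# Sample standard deviations (n - 1 divisor).
s_x, s_y = stdev(rainfall), stdev(flow_rate)

r = covariance / (s_x * s_y)
assert -1 <= r <= 1, "a correlation coefficient outside [-1, 1] signals an arithmetic error"

print(f"covariance = {covariance:.2f}")      # covariance = 88.75
print(f"Sx = {s_x:.2f}, Sy = {s_y:.2f}")     # Sx = 7.91, Sy = 11.27
print(f"r = {r:.3f}")                        # r = 0.996
```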