AMET-SOLID: Pearson’s r

Monday, 8 April 2024

Pearson’s r

The Pearson’s product-moment correlation coefficient, also known as Pearson’s r, describes the linear relationship between two quantitative variables.

These are the assumptions your data must meet if you want to use Pearson’s r:

Both variables are on an interval or ratio level of measurement
Data from both variables follow normal distributions
Your data have no outliers
Your data are from a random or representative sample
You expect a linear relationship between the two variables

Pearson’s r is a parametric measure, so tests based on it have high statistical power when the assumptions above hold. But it’s not a good measure of correlation if your variables have a nonlinear relationship, or if your data have outliers, have skewed distributions, or come from categorical variables. If any of these assumptions are violated, you should consider a rank correlation measure, such as Spearman’s rho or Kendall’s tau.
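To illustrate the difference between Pearson’s r and a rank correlation measure, here is a minimal sketch in plain Python. It assumes Spearman’s rho as the rank measure (computed as Pearson’s r on the ranks, with tied values given their average rank). The function names and the data are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from sums of deviation products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y = x**3 is perfectly monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print("Pearson r:   ", pearson_r(x, y))
print("Spearman rho:", spearman_rho(x, y))
```

For data like these, the rank measure reports a perfect monotonic relationship, while Pearson’s r falls below 1 because the relationship is not a straight line.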

The formula for the Pearson’s r is complicated, but most computer programs can quickly churn out the correlation coefficient from your data. In a simpler form, the formula divides the covariance between the variables by the product of their standard deviations.

Formula	Explanation
$\begin{equation} r_{xy} = \frac{n\sum{xy}-(\sum{x})(\sum{y})}{\sqrt{[n\sum{x^2}-(\sum{x})^2][n\sum{y^2}-(\sum{y})^2]}} \end{equation}$	$r_{xy}$ = strength of the correlation between variables x and y; $n$ = sample size; $\sum$ = sum of what follows; $x$ = every x-variable value; $y$ = every y-variable value; $xy$ = the product of each x-variable score and the corresponding y-variable score
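The sums in this formula translate almost directly into code. The following is a minimal sketch in plain Python; the function name `pearson_r_sums` and the sample data are hypothetical and used only for illustration.

```python
from math import sqrt


def pearson_r_sums(x, y):
    """Pearson's r via the computational (raw sums) formula."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, not taken from the article.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
print(pearson_r_sums(x, y))
```

Note that the function divides by zero if either variable is constant, because a constant variable has no variance and r is undefined in that case.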

Pearson sample vs population correlation coefficient formula

When using the Pearson correlation coefficient formula, you’ll need to consider whether you’re dealing with data from a sample or the whole population.

The sample and population formulas differ in their symbols and inputs. A sample correlation coefficient is called r, while a population correlation coefficient is called rho, the Greek letter ρ.

The sample correlation coefficient uses the sample covariance between variables and their sample standard deviations.

Sample correlation coefficient formula	Explanation
$\begin{equation} r_{xy} = \frac {cov(x,y)}{{s_x}{s_y}} \end{equation}$	$r_{xy}$ = strength of the correlation between variables x and y; $cov(x,y)$ = covariance of x and y; $s_x$ = sample standard deviation of x; $s_y$ = sample standard deviation of y

The population correlation coefficient uses the population covariance between variables and their population standard deviations.

Population correlation coefficient formula	Explanation
$\begin{equation} \rho_{XY} = \frac {cov(X,Y)}{{\sigma_X}{\sigma_Y}} \end{equation}$	$\rho_{XY}$ = strength of the correlation between variables X and Y; $cov(X,Y)$ = covariance of X and Y; $\sigma_X$ = population standard deviation of X; $\sigma_Y$ = population standard deviation of Y
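One practical consequence is worth seeing in code: when both formulas are applied to the same set of numbers, the n - 1 (sample) and n (population) denominators cancel, so the resulting coefficient is identical. The distinction lies in what the data represent, not in the arithmetic. Below is a minimal sketch using Python's standard `statistics` module; the helper `covariance` with its `ddof` parameter and the data are hypothetical, written only for this illustration.

```python
from statistics import mean, pstdev, stdev


def covariance(x, y, ddof):
    """Covariance with denominator n - ddof (ddof=1: sample, ddof=0: population)."""
    mean_x, mean_y = mean(x), mean(y)
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (len(x) - ddof)


# Hypothetical data, not taken from the article.
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 8, 9]

# Sample formula: sample covariance over sample standard deviations.
r_sample = covariance(x, y, ddof=1) / (stdev(x) * stdev(y))

# Population formula applied to the same numbers.
rho_style = covariance(x, y, ddof=0) / (pstdev(x) * pstdev(y))

print(r_sample, rho_style)
```

Mixing the two conventions (for example, a sample covariance with population standard deviations) does not cancel cleanly and can produce values outside the range -1 to 1.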

1. Pearson Correlation Coefficient (r):

This is the most common type of correlation coefficient, used to measure the linear relationship between two continuous variables. Its formulas are given in the sections above.

Example: Rainfall (X) and River Flow Rate (Y)

We'll calculate the Pearson Correlation Coefficient (r) to see the linear relationship between rainfall (X) and river flow rate (Y) using the provided data:

Day	Rainfall (X, mm)	Flow Rate (Y, m³/s)
1	5	10
2	10	15
3	15	22
4	20	30
5	25	38

1. Calculate the means for X (rainfall) and Y (flow rate):

Mean of X = (5 + 10 + 15 + 20 + 25) / 5 = 15 mm
Mean of Y = (10 + 15 + 22 + 30 + 38) / 5 = 23 m³/s

2. Find the deviations from the mean for each data point:

Day	Rainfall (X)	Flow Rate (Y)	X - X̅	Y - Y̅	(X - X̅)²	(Y - Y̅)²
1	5	10	-10	-13	100	169
2	10	15	-5	-8	25	64
3	15	22	0	-1	0	1
4	20	30	5	7	25	49
5	25	38	10	15	100	225

3. Calculate the covariance:

Covariance = Σ [(X - X̅) * (Y - Y̅)] / (n - 1)
Covariance = [(-10 * -13) + (-5 * -8) + (0 * -1) + (5 * 7) + (10 * 15)] / (5 - 1)
Covariance = (130 + 40 + 0 + 35 + 150) / 4 = 355 / 4 = 88.75 mm·m³/s

4. Calculate the standard deviation of X (Sx) and standard deviation of Y (Sy):

Using the sums of squared deviations from the table in step 2 (250 for X and 508 for Y) and the same n - 1 denominator:
- Sx = √(250 / 4) = √62.5 ≈ 7.91 mm
- Sy = √(508 / 4) = √127 ≈ 11.27 m³/s

5. Calculate the correlation coefficient (r):

r = Covariance / (Sx * Sy)
r = 88.75 mm·m³/s / (7.91 mm * 11.27 m³/s)
r ≈ 0.996 (The units cancel, so r is dimensionless. A correctly computed correlation coefficient always lies between -1 and 1.)

Interpretation:

The calculated value (r ≈ 0.996) indicates a very strong positive linear correlation between rainfall (X) and river flow rate (Y) in these five observations. This aligns with our expectation that higher rainfall tends to lead to increased river flow. However, with such a small sample (n = 5) the estimate is imprecise. It's recommended to collect more data for a more reliable analysis.
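The five steps of this example can also be run mechanically. The following is a minimal sketch in plain Python that uses only the rainfall and flow rate values from the data table; the variable names are chosen for this illustration.

```python
from math import sqrt

rainfall = [5, 10, 15, 20, 25]     # X, in mm
flow_rate = [10, 15, 22, 30, 38]   # Y, in m³/s
n = len(rainfall)

# Step 1: means
mean_x = sum(rainfall) / n
mean_y = sum(flow_rate) / n

# Step 2: deviations from the mean
dev_x = [v - mean_x for v in rainfall]
dev_y = [v - mean_y for v in flow_rate]

# Step 3: sample covariance (n - 1 denominator)
covariance = sum(a * b for a, b in zip(dev_x, dev_y)) / (n - 1)

# Step 4: sample standard deviations (n - 1 denominator)
s_x = sqrt(sum(a * a for a in dev_x) / (n - 1))
s_y = sqrt(sum(b * b for b in dev_y) / (n - 1))

# Step 5: correlation coefficient
r = covariance / (s_x * s_y)

print(f"mean X = {mean_x}, mean Y = {mean_y}")
print(f"covariance = {covariance}")
print(f"Sx = {s_x:.2f}, Sy = {s_y:.2f}")
print(f"r = {r:.3f}")
```

Because the covariance and both standard deviations use the same n - 1 denominator, the result is guaranteed to fall between -1 and 1. A value outside that range signals an arithmetic error rather than a property of the data.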
