Monday 8 April 2024

Pearson’s r

Pearson’s product-moment correlation coefficient, also known as Pearson’s r, describes the strength and direction of the linear relationship between two quantitative variables.

These are the assumptions your data must meet if you want to use Pearson’s r:

  • Both variables are on an interval or ratio level of measurement
  • Data from both variables follow normal distributions
  • Your data have no outliers
  • Your data come from a random or representative sample
  • You expect a linear relationship between the two variables

Pearson’s r is a parametric measure, so when its assumptions are met it has high statistical power. But it’s not a good measure of correlation if your variables have a nonlinear relationship, if your data have outliers or skewed distributions, or if your variables are categorical. If any of these assumptions are violated, you should consider a rank correlation measure instead.
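To illustrate the difference, here is a minimal sketch of one common rank measure, Spearman’s rank correlation (r_s), which ranks each variable and then applies Pearson’s formula to the ranks. The helper names and the toy data (y = x³, a curved but monotonic relationship) are hypothetical and chosen only for illustration; tied values receive the average of the ranks they span.

```python
import math


def pearson(x, y):
    """Pearson's r: sum of cross-products of deviations over the root of the sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but clearly nonlinear

print(round(pearson(x, y), 3))   # 0.943: below 1 because the points do not lie on a line
print(round(spearman(x, y), 3))  # 1.0: the ranks agree perfectly
```

Because Spearman’s measure only looks at the order of the values, it captures any consistently increasing or decreasing relationship, while Pearson’s r rewards only straight-line patterns.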

The formula for Pearson’s r looks complicated, but most statistical software can quickly calculate the correlation coefficient from your data. In simpler terms, the formula divides the covariance between the variables by the product of their standard deviations.

Formula and explanation

  \begin{equation*} r_{xy} = \frac{n\sum{xy} - (\sum{x})(\sum{y})}{\sqrt{[n\sum{x^2} - (\sum{x})^2][n\sum{y^2} - (\sum{y})^2]}} \end{equation*}

  • r_xy = strength and direction of the linear correlation between variables x and y
  • n = sample size
  • Σ = sum of what follows
  • x = every x-variable value
  • y = every y-variable value
  • xy = the product of each x-variable score and the corresponding y-variable score
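The raw-sums formula translates almost term by term into code. This is a minimal Python sketch, assuming two equal-length lists of numbers; the function name `pearson_r_raw_sums` is hypothetical, and the data are the rainfall and flow-rate values from the worked example later in this post.

```python
import math


def pearson_r_raw_sums(x, y):
    """Pearson's r from raw sums, following the computational formula term by term."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy
    sum_x2 = sum(a * a for a in x)              # Σx²
    sum_y2 = sum(b * b for b in y)              # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


rainfall = [5, 10, 15, 20, 25]   # mm
flow = [10, 15, 22, 30, 38]      # m³/s
print(round(pearson_r_raw_sums(rainfall, flow), 3))  # 0.996
```

The raw-sums form is convenient by hand, but with large values the subtractions in the numerator and denominator can lose floating-point precision; computing deviations from the mean first, as the worked example does, is numerically safer.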

Pearson sample vs population correlation coefficient formula

When using the Pearson correlation coefficient formula, you’ll need to consider whether you’re dealing with data from a sample or the whole population.

The sample and population formulas differ in their symbols and inputs. A sample correlation coefficient is called r, while a population correlation coefficient is called rho, the Greek letter ρ.

The sample correlation coefficient uses the sample covariance between variables and their sample standard deviations.

Sample correlation coefficient formula and explanation

  \begin{equation*} r_{xy} = \frac {cov(x,y)}{{s_x}{s_y}} \end{equation*}

  • r_xy = strength and direction of the linear correlation between variables x and y
  • cov(x,y) = sample covariance of x and y
  • s_x = sample standard deviation of x
  • s_y = sample standard deviation of y

The population correlation coefficient uses the population covariance between variables and their population standard deviations.

Population correlation coefficient formula and explanation

  \begin{equation*} \rho_{XY} = \frac {cov(X,Y)}{{\sigma_X}{\sigma_Y}} \end{equation*}

  • ρ_XY = strength and direction of the linear correlation between variables X and Y
  • cov(X,Y) = population covariance of X and Y
  • σ_X = population standard deviation of X
  • σ_Y = population standard deviation of Y

Pearson Correlation Coefficient (r)

This is the most common type of correlation coefficient, used to measure the linear relationship between two continuous variables.

Example: Rainfall (X) and River Flow Rate (Y)

We'll calculate the Pearson Correlation Coefficient (r) to measure the linear relationship between rainfall (X, in mm) and river flow rate (Y, in m³/s) using the data below:

Day | Rainfall X (mm) | Flow Rate Y (m³/s)
1 | 5 | 10
2 | 10 | 15
3 | 15 | 22
4 | 20 | 30
5 | 25 | 38

1. Calculate the means for X (rainfall) and Y (flow rate):

  • Mean of X = (5 + 10 + 15 + 20 + 25) / 5 = 15 mm
  • Mean of Y = (10 + 15 + 22 + 30 + 38) / 5 = 23 m³/s

2. Find the deviations from the mean for each data point:

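Day | Rainfall (X) | Flow Rate (Y) | X - X̅ | Y - Y̅ | (X - X̅)² | (Y - Y̅)²
1 | 5 | 10 | -10 | -13 | 100 | 169
2 | 10 | 15 | -5 | -8 | 25 | 64
3 | 15 | 22 | 0 | -1 | 0 | 1
4 | 20 | 30 | 5 | 7 | 25 | 49
5 | 25 | 38 | 10 | 15 | 100 | 225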
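Steps 1 and 2 can be reproduced with a short Python sketch that computes the means, the deviations, and the squared deviations, then prints one row per day in the same layout as the table. The variable names are illustrative.

```python
rainfall = [5, 10, 15, 20, 25]   # X, mm
flow = [10, 15, 22, 30, 38]      # Y, m³/s

n = len(rainfall)
mean_x = sum(rainfall) / n   # step 1: 75 / 5 = 15.0
mean_y = sum(flow) / n       # step 1: 115 / 5 = 23.0

# Step 2: deviations from the mean and their squares
dev_x = [x - mean_x for x in rainfall]
dev_y = [y - mean_y for y in flow]
sq_x = [d * d for d in dev_x]
sq_y = [d * d for d in dev_y]

print("Day   X   Y  X-mean  Y-mean  (X-mean)^2  (Y-mean)^2")
for day in range(n):
    print(f"{day + 1:>3} {rainfall[day]:>3} {flow[day]:>3} "
          f"{dev_x[day]:>7.0f} {dev_y[day]:>7.0f} "
          f"{sq_x[day]:>11.0f} {sq_y[day]:>11.0f}")

# Column totals, needed later for the standard deviations
print("Sums of squared deviations:", sum(sq_x), sum(sq_y))  # ... 250.0 508.0
```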

3. Calculate the covariance:

  • Covariance = Σ[(X - X̅) * (Y - Y̅)] / (n - 1)
  • Covariance = [(-10 * -13) + (-5 * -8) + (0 * -1) + (5 * 7) + (10 * 15)] / (5 - 1)
  • Covariance = (130 + 40 + 0 + 35 + 150) / 4 = 355 / 4 = 88.75 mm·m³/s
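A minimal Python sketch of this step computes each day’s cross-product of deviations and divides their sum by n - 1. The variable names are illustrative; the data are the rainfall and flow values from the table above.

```python
rainfall = [5, 10, 15, 20, 25]   # X, mm
flow = [10, 15, 22, 30, 38]      # Y, m³/s

n = len(rainfall)
mean_x = sum(rainfall) / n   # 15.0
mean_y = sum(flow) / n       # 23.0

# (x - mean_x) * (y - mean_y) for each day: 130, 40, 0, 35, 150
cross_products = [(x - mean_x) * (y - mean_y) for x, y in zip(rainfall, flow)]

sum_cross = sum(cross_products)    # 355.0
sample_cov = sum_cross / (n - 1)   # sample covariance uses n - 1
print(sum_cross, sample_cov)       # 355.0 88.75
```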

4. Calculate the standard deviation of X (Sx) and standard deviation of Y (Sy):

  • From the table, Σ(X - X̅)² = 100 + 25 + 0 + 25 + 100 = 250 and Σ(Y - Y̅)² = 169 + 64 + 1 + 49 + 225 = 508
  • Sx = √[Σ(X - X̅)² / (n - 1)] = √(250 / 4) = √62.5 ≈ 7.91 mm
  • Sy = √[Σ(Y - Y̅)² / (n - 1)] = √(508 / 4) = √127 ≈ 11.27 m³/s
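The standard deviations can also be computed directly from the data. In this sketch, `sample_sd` is a hypothetical helper that applies the n - 1 formula; Python’s `statistics.stdev` gives the same result.

```python
import math
import statistics

rainfall = [5, 10, 15, 20, 25]   # X, mm
flow = [10, 15, 22, 30, 38]      # Y, m³/s


def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


s_x = sample_sd(rainfall)  # sqrt(250 / 4) = sqrt(62.5)
s_y = sample_sd(flow)      # sqrt(508 / 4) = sqrt(127)

print(round(s_x, 2), round(s_y, 2))  # 7.91 11.27
# Cross-check against the standard library
print(math.isclose(s_x, statistics.stdev(rainfall)), math.isclose(s_y, statistics.stdev(flow)))  # True True
```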

5. Calculate the correlation coefficient (r):

  • r = Covariance / (Sx * Sy)
  • r = 88.75 mm·m³/s / (7.91 mm * 11.27 m³/s)
  • r ≈ 0.996 (the units cancel, so r is dimensionless; like every correlation coefficient, it must lie between -1 and 1)

Interpretation:

The calculated value (r ≈ 0.996) is very close to +1, indicating a very strong positive linear correlation between rainfall (X) and river flow rate (Y). This aligns with our expectation that higher rainfall tends to lead to increased river flow. However, with only five observations (n = 5), the estimate is imprecise, so it's recommended to collect more data for a more reliable analysis.
