Monday 8 April 2024

Spearman’s rank correlation and other coefficient formulas

The symbols for Spearman’s rho are ρ for the population coefficient and r_s for the sample coefficient. The formula calculates Pearson’s r correlation coefficient between the rankings of the variable data.

To use this formula, you’ll first rank the data from each variable separately from low to high: every data point gets a rank (first, second, third, and so on).

Then, you’ll find the difference (d_i) between the two ranks in each data pair and use these differences as the main input for the formula.

Spearman’s rank correlation coefficient formula

  \begin{equation*} r_{s} = 1 - \frac {6\sum{d^2_i}}{(n^3-n)} \end{equation*}

Explanation

  • r_s = strength of the rank correlation between variables
  • d_i = the difference between the x-variable rank and the y-variable rank for each pair of data
  • Σd_i² = sum of the squared differences between x- and y-variable ranks
  • n = sample size

If you have a correlation coefficient of 1, all of the rankings for each variable match up for every data pair. If you have a correlation coefficient of -1, the rankings for one variable are the exact opposite of the rankings of the other variable. A correlation coefficient near zero means that there’s little or no monotonic relationship between the variable rankings.
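The ranking steps and the formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `ranks` and `spearman_rs` and the sample lists are made up for this post, and because the rank-difference formula is only exact when no values are tied, the sketch rejects ties instead of averaging ranks.

```python
def ranks(values):
    """Rank values from 1 (lowest) to n (highest). Assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: the rank-difference formula is not exact with ties")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's r_s via 1 - 6 * sum(d_i^2) / (n^3 - n)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two data pairs")
    rank_x, rank_y = ranks(x), ranks(y)
    # d_i is the difference between the two ranks in each data pair
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Hypothetical data, for illustration only
same_order = spearman_rs([10, 20, 30, 40], [1, 2, 3, 4])     # rankings match
reverse_order = spearman_rs([10, 20, 30, 40], [4, 3, 2, 1])  # rankings are opposite
print(same_order, reverse_order)
```

The two calls correspond to the extreme cases described above: identical rankings and exactly opposite rankings.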

Other coefficients

The correlation coefficient is related to two other coefficients, and these give you more information about the relationship between variables.

Coefficient of determination

When you square the correlation coefficient, you end up with the coefficient of determination (r²). This is the proportion of common variance between the variables. The coefficient of determination is always between 0 and 1, and it’s often expressed as a percentage.

| Coefficient of determination | Explanation |
|------------------------------|-------------|
| r² | The correlation coefficient multiplied by itself |

The coefficient of determination is used in regression models to measure how much of the variance of one variable is explained by the variance of the other variable.

A regression analysis helps you find the equation for the line of best fit, and you can use it to predict the value of one variable given the value for the other variable.

A high r² means that a large amount of variability in one variable is determined by its relationship to the other variable. A low r² means that only a small portion of the variability of one variable is explained by its relationship to the other variable; relationships with other variables are more likely to account for the variance in the variable.

The correlation coefficient can often overestimate the relationship between variables, especially in small samples, so the coefficient of determination is often a better indicator of the relationship.
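To make the "variance explained" reading concrete, the sketch below computes Pearson's r for a small made-up data set, squares it, and compares the result with 1 - SS_res/SS_tot from a least-squares line fitted to the same data. For a simple linear regression with an intercept, the two numbers agree. The data and the helper names (`pearson_r`, `explained_variance`) are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def explained_variance(x, y):
    """1 - SS_res / SS_tot for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"1 - SS_res/SS_tot = {explained_variance(x, y):.3f}")
```

For this data both routes come out at about 0.6, so roughly 60% of the variance in y is accounted for by the fitted line.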

Coefficient of alienation

When you subtract the coefficient of determination from unity (one), you get the coefficient of alienation. This is the proportion of variance that is not shared between the variables, that is, the unexplained variance.

| Coefficient of alienation | Explanation |
|---------------------------|-------------|
| 1 - r² | One minus the coefficient of determination |

A high coefficient of alienation indicates that the two variables share very little variance in common. A low coefficient of alienation means that a large amount of variance is accounted for by the relationship between the variables.
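As a quick worked example, take an assumed correlation coefficient of r = 0.8 (a made-up value, not taken from any data in this post). The coefficient of determination and the coefficient of alienation then follow directly:

```latex
\begin{equation*} r^{2} = 0.8^{2} = 0.64 \end{equation*}
\begin{equation*} 1 - r^{2} = 1 - 0.64 = 0.36 \end{equation*}
```

In this hypothetical case, 64% of the variance is shared between the variables and 36% is left unexplained.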

There are several types of correlation coefficients, each suited to analyze different relationships between variables. Here is a numeric example for Spearman’s rank correlation:

1. Example: Construction Site Noise Levels (X) and Worker Productivity (Y)

Let's imagine we're monitoring a construction site and want to see if there's a relationship between noise level (X) on a scale of 1 (quiet) to 5 (very loud) and worker productivity (Y) measured as the number of tasks completed per hour. Here's some data:

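| Day | Noise Level (X) | Productivity (Y) |
|-----|-----------------|------------------|
| 1 | 3 | 10 |
| 2 | 5 | 7 |
| 3 | 2 | 12 |
| 4 | 4 | 8 |
| 5 | 1 | 15 |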

Solution for Spearman's Rank Correlation Coefficient (ρ):

  1. Assign Ranks to Each Variable:
  • Rank the noise levels (X) from 1 (lowest) to 5 (highest).
  • Rank the productivity levels (Y) from 1 (lowest) to 5 (highest).
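
| Day | Noise Level (X) | Rank (X) | Productivity (Y) | Rank (Y) |
|-----|-----------------|----------|------------------|----------|
| 1 | 3 | 3 | 10 | 3 |
| 2 | 5 | 5 | 7 | 1 |
| 3 | 2 | 2 | 12 | 4 |
| 4 | 4 | 4 | 8 | 2 |
| 5 | 1 | 1 | 15 | 5 |

  2. Calculate the Difference in Ranks (D):
  • Subtract the corresponding ranks for each data point (Rank(X) - Rank(Y)).

| Day | Noise Level (X) | Rank (X) | Productivity (Y) | Rank (Y) | D (Rank(X) - Rank(Y)) |
|-----|-----------------|----------|------------------|----------|-----------------------|
| 1 | 3 | 3 | 10 | 3 | 0 |
| 2 | 5 | 5 | 7 | 1 | 4 |
| 3 | 2 | 2 | 12 | 4 | -2 |
| 4 | 4 | 4 | 8 | 2 | 2 |
| 5 | 1 | 1 | 15 | 5 | -4 |

  3. Differences Squared (D²):
  • Square the difference in ranks (D) for each data point.

| Day | Noise Level (X) | Rank (X) | Productivity (Y) | Rank (Y) | D (Rank(X) - Rank(Y)) | D² |
|-----|-----------------|----------|------------------|----------|-----------------------|----|
| 1 | 3 | 3 | 10 | 3 | 0 | 0 |
| 2 | 5 | 5 | 7 | 1 | 4 | 16 |
| 3 | 2 | 2 | 12 | 4 | -2 | 4 |
| 4 | 4 | 4 | 8 | 2 | 2 | 4 |
| 5 | 1 | 1 | 15 | 5 | -4 | 16 |

  4. Calculate Spearman's Rank Correlation Coefficient (ρ):

  • Formula: ρ = 1 - [(6ΣD²) / (n*(n² - 1))]

    • Σ (sigma) means "sum of" for all data points (n = 5 in this case).
  • ρ = 1 - [(6 * (0 + 16 + 4 + 4 + 16)) / (5 * (5² - 1))]

  • ρ = 1 - [(6 * 40) / (5 * 24)]

  • ρ = 1 - (240 / 120)

  • ρ = -1

Interpretation:

The Spearman's Rank Correlation Coefficient (ρ = -1) indicates a perfect negative monotonic relationship between noise level (X) and worker productivity (Y) in this sample: the quietest day had the highest productivity, the loudest day had the lowest, and every day in between follows the same order. With only five observations, this does not show that noise alone drives productivity; other factors, such as worker experience or task complexity, also influence it.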
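
As a cross-check, the coefficient for the construction-site data can be recomputed in two ways: with the rank-difference formula, and as Pearson's r calculated on the ranks. The sketch below is illustrative only; the variable and function names are assumptions made for this post, and it relies on the data containing no tied values.

```python
from math import sqrt

noise = [3, 5, 2, 4, 1]            # X: noise level on days 1-5
productivity = [10, 7, 12, 8, 15]  # Y: tasks completed per hour on days 1-5


def rank_low_to_high(values):
    """Rank 1 = lowest value, n = highest. Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rank_x = rank_low_to_high(noise)
rank_y = rank_low_to_high(productivity)
d = [a - b for a, b in zip(rank_x, rank_y)]  # D = Rank(X) - Rank(Y)
n = len(d)

rho_formula = 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))
rho_pearson = pearson_r(rank_x, rank_y)

print("Rank (X):", rank_x)
print("Rank (Y):", rank_y)
print("D:", d)
print("rho (rank-difference formula):", rho_formula)
print("rho (Pearson's r on ranks):", rho_pearson)
```

When there are no ties, both routes should print the same value, which gives a simple way to catch ranking or arithmetic slips in a hand calculation.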
