Monday, 8 April 2024

Correlation Coefficients - Crash Course

Correlation Coefficient for Civil Engineers: A Crash Course

Hey there, civil engineers! Today we'll dive into the world of correlation coefficients, a fancy way to measure how much two things change together.

What is a correlation analysis?

Correlation analysis is a statistical technique that gives you information about the relationship between variables.

You run a correlation analysis to investigate how variables are related. The strength of the relationship is measured by the correlation coefficient, which ranges from -1 to +1, so a correlation analysis tells you both the strength and the direction of the relationship.

Correlation Coefficient

Correlation coefficients summarize data and help you compare results between studies.

A correlation coefficient is a descriptive statistic: it summarizes your sample data, but on its own it does not let you draw conclusions about the wider population. It is a bivariate statistic when it summarizes the relationship between two variables, and a multivariate statistic when more than two variables are involved.

If your correlation coefficient is based on sample data, you'll need an inferential statistic to generalize your results to the population. You can use an F test or a t test to calculate a test statistic that tells you whether your finding is statistically significant.
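For Pearson's r, the usual significance test uses the statistic t = r * sqrt(n - 2) / sqrt(1 - r²), compared against a t distribution with n - 2 degrees of freedom. Here is a minimal Python sketch of just that calculation; the helper name `correlation_t_statistic` and the values r = 0.8, n = 5 are illustrative assumptions, and the p-value itself would still come from a t table or a statistics package.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing 'no correlation in the population' (Pearson's r).

    Uses t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom.
    Hypothetical helper name, for illustration only.
    """
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Illustrative values (not measured data): r = 0.8 from n = 5 pairs
t = correlation_t_statistic(0.8, 5)
print(f"t = {t:.3f} with {5 - 2} degrees of freedom")
```

With only five pairs, even r = 0.8 gives t ≈ 2.31, below the two-sided 5% critical value of about 3.18 for 3 degrees of freedom, so small samples need very strong correlations before you can generalize.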


Imagine this: You're testing a new concrete mix. You want to see if there's a relationship between the amount of water you add (variable X) and the concrete's compressive strength (variable Y). The correlation coefficient helps you quantify that!

The Formula (don't worry, it's not scary):

We won't go too deep into the math, but the correlation coefficient, usually written as "r", is always a number between -1 and +1.

  • Positive r (between 0 and 1): X and Y increase or decrease together. More water (X) might lead to higher strength (Y).
  • Negative r (between -1 and 0): X and Y move in opposite directions. More water (X) might lead to lower strength (Y).
  • r close to 0: There's no clear linear connection between X and Y. The amount of water added doesn't seem to affect the strength in a consistent, straight-line way.
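For reference, the formula behind r (Pearson's product-moment correlation coefficient, which is what the worked examples below calculate) is:

```latex
r = \frac{S_{xy}}{S_x S_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

Here S_xy is the sample covariance and S_x, S_y are the sample standard deviations (S_y is defined like S_x); the 1/(n - 1) factors cancel, so both forms give the same r.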

Key Points to Remember:

  • Correlation doesn't equal causation! Just because two things change together doesn't mean one causes the other. Maybe something else entirely affects both the water content and the strength.
  • The correlation coefficient only measures linear (straight-line) relationships. If the data follows a curve, r can be misleading and may even be close to 0 when X and Y are strongly related.

Let's see some examples!

Example 1: Beam Strength and Thickness

We measure the thickness (X) of steel beams and their bending strength (Y). We calculate a correlation coefficient of r = 0.8. This indicates a strong positive correlation. Thicker beams (X) tend to have higher bending strength (Y), which makes sense!

Example 2: Traffic Volume and Road Quality

We monitor traffic volume (X) on a road and its quality rating (Y). We get a correlation coefficient of r = -0.5. Here, there's a moderate negative correlation. As traffic volume (X) increases, the road quality rating (Y) decreases, which is something you'd expect with more wear and tear.
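The labels "strong", "moderate" and "weak" are rules of thumb rather than fixed definitions. The sketch below assumes cut-offs of 0.3 and 0.7 for |r|, one common convention that reproduces the labels used in the two examples above; `describe_correlation` is a hypothetical helper, and the thresholds should be adjusted to whatever your field or project uses.

```python
def describe_correlation(r: float) -> str:
    """Rough verbal label for a correlation coefficient.

    ASSUMPTION: the cut-offs 0.3 and 0.7 are one common rule of thumb;
    other textbooks and fields use different thresholds.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0.0:
        return "no linear correlation"
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(0.8))   # beam thickness vs. bending strength
print(describe_correlation(-0.5))  # traffic volume vs. road quality
```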

Remember: Correlation coefficient is a powerful tool to see if two things are related, but it doesn't tell the whole story. Use it alongside your engineering knowledge to make informed decisions!

Worked Example 1: Soil Compaction and Moisture Content

Let's say we're civil engineers testing the compaction of soil samples. We measure the moisture content (X) as a percentage and the resulting dry density (Y) of the compacted soil in grams per cubic centimeter (g/cm³). Here's some data:

Sample | Moisture Content, X (%) | Dry Density, Y (g/cm³)
-------|-------------------------|-----------------------
1      | 10                      | 1.65
2      | 15                      | 1.60
3      | 20                      | 1.55
4      | 25                      | 1.50
5      | 30                      | 1.40

We want to find the correlation coefficient (r) to see if there's a relationship between moisture content and dry density. While there are calculators and spreadsheet functions for this, let's break it down step-by-step:

  1. Calculate the mean (average) for both X and Y:
    • Mean of X (moisture content) = (10 + 15 + 20 + 25 + 30) / 5 = 20%
    • Mean of Y (dry density) = (1.65 + 1.60 + 1.55 + 1.50 + 1.40) / 5 = 7.70 / 5 = 1.54 g/cm³
  2. Find the deviations from the mean for each data point (X - X̅ and Y - Y̅):
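Sample | Moisture Content, X (%) | Dry Density, Y (g/cm³) | X - X̅ (%) | Y - Y̅ (g/cm³) | (X - X̅)² | (Y - Y̅)²
-------|-------------------------|------------------------|-----------|---------------|-----------|---------
1      | 10                      | 1.65                   | -10       | 0.11          | 100       | 0.0121
2      | 15                      | 1.60                   | -5        | 0.06          | 25        | 0.0036
3      | 20                      | 1.55                   | 0         | 0.01          | 0         | 0.0001
4      | 25                      | 1.50                   | 5         | -0.04         | 25        | 0.0016
5      | 30                      | 1.40                   | 10        | -0.14         | 100       | 0.0196
Sum    |                         |                        | 0         | 0.00          | 250       | 0.0370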
  3. Calculate the covariance (a measure of how much X and Y vary together):
    • Covariance = Σ [(X - X̅) * (Y - Y̅)] / (n - 1)
    • Σ (sigma) means "sum of" over all data points (n = 5 in this case).
    • Covariance = [(-10 * 0.11) + (-5 * 0.06) + (0 * 0.01) + (5 * -0.04) + (10 * -0.14)] / (5 - 1)
    • Covariance = (-1.10 - 0.30 + 0 - 0.20 - 1.40) / 4 = -3.00 / 4 = -0.75 (% × g/cm³)
  4. Calculate the standard deviation of X (Sx) and of Y (Sy). Calculators and spreadsheets can do this, but the squared deviations from the table above are all you need: Sx = √[Σ(X - X̅)² / (n - 1)] and Sy = √[Σ(Y - Y̅)² / (n - 1)].
    • Sx = √(250 / 4) ≈ 7.91%
    • Sy = √(0.0370 / 4) ≈ 0.0962 g/cm³
  5. Finally, calculate the correlation coefficient (r):

    • r = Covariance / (Sx * Sy)
    • r = -0.75 / (7.91 * 0.0962) (the units cancel, so r has no units)
    • r ≈ -0.99
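The five steps above can be reproduced in a few lines of Python, which is a handy way to check hand calculations. This is a minimal sketch using only the standard library; it follows the same n - 1 (sample) convention for both the covariance and the standard deviations, so the n - 1 factors cancel in r.

```python
import math

# Soil compaction data from the table above
moisture = [10, 15, 20, 25, 30]               # X, moisture content (%)
dry_density = [1.65, 1.60, 1.55, 1.50, 1.40]  # Y, dry density (g/cm³)
n = len(moisture)

# Step 1: means
x_bar = sum(moisture) / n
y_bar = sum(dry_density) / n

# Step 2: deviations from the mean
dx = [x - x_bar for x in moisture]
dy = [y - y_bar for y in dry_density]

# Step 3: sample covariance (divide by n - 1)
cov_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

# Step 4: sample standard deviations (also divide by n - 1)
s_x = math.sqrt(sum(a * a for a in dx) / (n - 1))
s_y = math.sqrt(sum(b * b for b in dy) / (n - 1))

# Step 5: correlation coefficient
r = cov_xy / (s_x * s_y)

print(f"mean X = {x_bar:.2f} %, mean Y = {y_bar:.3f} g/cm³")
print(f"covariance = {cov_xy:.4f}, Sx = {s_x:.3f} %, Sy = {s_y:.4f} g/cm³")
print(f"r = {r:.3f}")
```

Running it gives a mean dry density of 1.540 g/cm³, a covariance of -0.75, Sx ≈ 7.906 %, Sy ≈ 0.0962 g/cm³ and r ≈ -0.986.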

Interpretation:

The correlation coefficient (r ≈ -0.99) indicates a very strong negative linear relationship between moisture content (X) and dry density (Y). As the moisture content increases, the dry density decreases almost in a straight line, which is what you would expect when soil is compacted wetter than its optimum moisture content.

Worked Example 2: Steel Beam Weight and Length

Let's imagine we're analyzing steel beams and want to see if there's a correlation between their weight (X) in kilograms (kg) and their length (Y) in meters (m). Here's some data:

Beam | Weight, X (kg) | Length, Y (m)
-----|----------------|--------------
1    | 100            | 2.0
2    | 150            | 3.0
3    | 220            | 4.5
4    | 280            | 5.6
5    | 350            | 7.0

Following the same steps as the previous example:

  1. Calculate the mean for weight (X) and length (Y):
    • Mean of X (weight) = (100 + 150 + 220 + 280 + 350) / 5 = 220 kg
    • Mean of Y (length) = (2.0 + 3.0 + 4.5 + 5.6 + 7.0) / 5 = 4.42 m
  2. Find the deviations from the mean for each data point:
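Beam | Weight, X (kg) | Length, Y (m) | X - X̅ (kg) | Y - Y̅ (m) | (X - X̅)² | (Y - Y̅)²
-----|----------------|---------------|------------|-----------|-----------|---------
1    | 100            | 2.0           | -120       | -2.42     | 14400     | 5.8564
2    | 150            | 3.0           | -70        | -1.42     | 4900      | 2.0164
3    | 220            | 4.5           | 0          | 0.08      | 0         | 0.0064
4    | 280            | 5.6           | 60         | 1.18      | 3600      | 1.3924
5    | 350            | 7.0           | 130        | 2.58      | 16900     | 6.6564
Sum  |                |               | 0          | 0.00      | 39800     | 15.9280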

  3. Calculate the covariance:
    • Covariance = Σ [(X - X̅) * (Y - Y̅)] / (n - 1)
    • Covariance = [(-120 * -2.42) + (-70 * -1.42) + (0 * 0.08) + (60 * 1.18) + (130 * 2.58)] / (5 - 1)
    • Covariance = (290.4 + 99.4 + 0 + 70.8 + 335.4) / 4 = 796.0 / 4 = 199.0 kg·m
  4. Calculate the standard deviation of weight (Sx) and of length (Sy), using the sums of the squared deviations from the table:
    • Sx = √(39800 / 4) ≈ 99.75 kg
    • Sy = √(15.928 / 4) ≈ 1.995 m
  5. Calculate the correlation coefficient (r):

    • r = Covariance / (Sx * Sy)
    • r = 199.0 kg·m / (99.75 kg * 1.995 m)
    • r ≈ 1.00 (about 0.9997 before rounding)

Interpretation:

The correlation coefficient (r ≈ 1.00) indicates an almost perfect positive linear relationship between the weight (X) and length (Y) of the steel beams. As the beams get longer, their weight increases in near-direct proportion (roughly 50 kg per meter in this data set), which is what you would expect if the beams share the same cross-section.
