The Correlation Coefficient
Date: 09/11/98 at 13:56:58
From: Steven S. Krompf
Subject: Correlation Coefficient
Dear Dr. Mitteldorf,
I am sending this reply on behalf of our patron, who wrote the
following:
The books on statistics we have consulted (four of them) discuss
linear regression and correlation analysis, but none of them explains
how one calculates them.
We are trying to establish whether there is a correlation between
4 groups (1-4), and the values obtained for each. There appears to be
a negative correlation - but is it statistically significant?
We'd appreciate your showing us how to calculate the correlation
coefficient and any additional particulars.
Thank you.
Here are the sample data:
Age   Value
 1     104
 2      88
 3      69
 4      55
Date: 09/11/98 at 19:52:32
From: Doctor Mitteldorf
Subject: Re: Correlation Coefficient
Dear Steven,
Okay, here goes. The formula is:
r = (<xy> - <x><y>) / (std(x) * std(y))
where r is the correlation coefficient.
In your case, x is the age numbers and y is the "values."
<xy> means the average value of x*y taken together. In other words,
multiply 1*104, 2*88, 3*69, 4*55, add them up, and divide by 4.
I get 176.75.
<x> is the simple average of the x's, which is 2.5.
<y> is the average of the y's, which is 79.
So we have for the numerator 176.75 - 2.5*79 = -20.75.
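For anyone who would rather check the arithmetic by machine, here is a minimal Python sketch of the numerator. The variable names and the small mean() helper are illustrative choices, not part of the formula itself.

```python
# Sample data from the question: x = group (age) number, y = measured value.
x = [1, 2, 3, 4]
y = [104, 88, 69, 55]

def mean(values):
    """Simple average of a list of numbers."""
    return sum(values) / len(values)

mean_xy = mean([a * b for a, b in zip(x, y)])  # <xy>: average of the products
mean_x = mean(x)                               # <x>
mean_y = mean(y)                               # <y>
numerator = mean_xy - mean_x * mean_y          # <xy> - <x><y>

print(mean_xy, mean_x, mean_y, numerator)  # 176.75 2.5 79.0 -20.75
```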
In the denominator, std means standard deviation. The standard
deviation is the square root of the variance, and the variance is
computed very much like the numerator, but for x or y with itself,
rather than the product.
So var(x) is by definition <x^2>-<x>^2. In other words, first square
the x's and average them, which gives 7.5. Then average the x's and
square the result, which gives 6.25. The difference, 1.25, is the
variance; take its square root to get the standard deviation.
I get 1.118 for this.
Do the same thing for the standard deviation of the y's. The variance
is 345.5, and I get 18.59 for its square root.
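The standard deviation recipe can be sketched the same way in Python. The function name population_std is a made-up label; note that this version divides by n, as the averages above do, not by n - 1 as some textbooks and calculators do.

```python
import math

x = [1, 2, 3, 4]
y = [104, 88, 69, 55]

def population_std(values):
    """Square root of <v^2> - <v>^2 (averages divide by n, not n - 1)."""
    n = len(values)
    mean_of_squares = sum(v * v for v in values) / n   # <v^2>
    square_of_mean = (sum(values) / n) ** 2            # <v>^2
    return math.sqrt(mean_of_squares - square_of_mean)

std_x = population_std(x)
std_y = population_std(y)
print(round(std_x, 3), round(std_y, 3))
```

Run on the sample data, this should give about 1.118 for the x's and about 18.588 for the y's.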
Now all that's left is to divide the numerator by the product of the
two standard deviations. This gives r, which comes out to -0.998.
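Putting the pieces together, the whole recipe fits in one short Python function. This is only a sketch of the formula given above; the names correlation and mean are illustrative.

```python
import math

def mean(values):
    """Simple average of a list of numbers."""
    return sum(values) / len(values)

def correlation(x, y):
    """r computed as (<xy> - <x><y>) / (std(x) * std(y))."""
    numerator = mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)
    std_x = math.sqrt(mean([a * a for a in x]) - mean(x) ** 2)
    std_y = math.sqrt(mean([b * b for b in y]) - mean(y) ** 2)
    return numerator / (std_x * std_y)

r = correlation([1, 2, 3, 4], [104, 88, 69, 55])
print(round(r, 3))  # -0.998
```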
The correlation coefficient always comes out between -1 and +1, though
this may not be at all obvious from the recipe that I gave you. If the
numbers had been 1,2,3,4 and 2,4,6,8 or 1,2,3,4 and 4,7,10,13, then the
correlation would have come out to 1.0 exactly, and if they had been
1,2,3,4 and 4,3,2,1, then the correlation would have come out -1.0
exactly.
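What these three examples have in common is that the second list is an exact straight-line function of the first (y = 2x, y = 3x + 1, and y = 5 - x). A short Python check, using an illustrative correlation() helper built from the recipe above, bears this out up to floating-point rounding:

```python
import math

def correlation(x, y):
    """r via averages: (<xy> - <x><y>) / (std(x) * std(y))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    std_x = math.sqrt(sum(a * a for a in x) / n - mean_x ** 2)
    std_y = math.sqrt(sum(b * b for b in y) / n - mean_y ** 2)
    return numerator / (std_x * std_y)

x = [1, 2, 3, 4]
examples = {
    "y = 2x": [2, 4, 6, 8],
    "y = 3x + 1": [4, 7, 10, 13],
    "y = 5 - x": [4, 3, 2, 1],
}
results = {label: correlation(x, y) for label, y in examples.items()}
for label, r in results.items():
    # Round away tiny floating-point error before printing.
    print(label, round(r, 6))
```

The first two lines rise with x and give +1; the third falls with x and gives -1.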
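For unrelated random numbers, the correlation tends to come out near
zero, and the more data points there are, the closer to zero it tends
to be. With only four points it can stray a good deal further, but a
value as extreme as -0.998 is still rare.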
So it's fairly safe to say for most situations that -0.998 is a highly
significant correlation, though where the numbers came from and the
hypothesis you're trying to prove must be taken into account in a
complicated way to attach any quantitative meaning to this statement.
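One rough way to get a feel for how unusual -0.998 is: generate many sets of four unrelated random pairs and count how often the correlation comes out at least that extreme. The Python sketch below assumes the unrelated numbers are normally distributed, which is an assumption made here for illustration and not something the data above establish; the function names, trial count, and seed are likewise arbitrary choices.

```python
import math
import random

def correlation(x, y):
    """r via averages: (<xy> - <x><y>) / (std(x) * std(y))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    std_x = math.sqrt(sum(a * a for a in x) / n - mean_x ** 2)
    std_y = math.sqrt(sum(b * b for b in y) / n - mean_y ** 2)
    return numerator / (std_x * std_y)

def fraction_as_extreme(observed_r, n_points=4, trials=50_000, seed=0):
    """Share of unrelated random data sets with |r| >= |observed_r|."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(trials):
        # Assumption: unrelated values drawn from a standard normal.
        x = [rng.gauss(0, 1) for _ in range(n_points)]
        y = [rng.gauss(0, 1) for _ in range(n_points)]
        if abs(correlation(x, y)) >= abs(observed_r):
            hits += 1
    return hits / trials

fraction = fraction_as_extreme(-0.998)
print(fraction)
```

Under these assumptions the fraction should come out small, on the order of 0.002. A different assumption about where the numbers come from would give a different figure, which is exactly the caveat above.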
- Doctor Mitteldorf, The Math Forum
Check out our web site! http://mathforum.org/dr.math/
Ask Dr. Math (TM)
© 1994-2013 The Math Forum