The Correlation Coefficient
Date: 09/11/98 at 13:56:58
From: Steven S. Krompf
Subject: Correlation Coefficient

Dear Dr. Mitteldorf,

I am sending this reply on behalf of our patron, who wrote the following:

The books on statistics we have consulted (four of them) discuss linear regression and correlation analysis, but none of them explains how one calculates them. We are trying to establish whether there is a correlation between 4 groups (1-4), and the values obtained for each. There appears to be a negative correlation - but is it statistically significant? We'd appreciate your showing us how to calculate the correlation coefficient and any additional particulars.

Thank you.

Here are the sample data:

   Age   Value
    1     104
    2      88
    3      69
    4      55

Date: 09/11/98 at 19:52:32
From: Doctor Mitteldorf
Subject: Re: Correlation Coefficient

Dear Steven,

Okay, here goes. The formula is:

     <xy> - <x><y>
    --------------- = r = correlation coefficient
     std(x)*std(y)

In your case, x is the age numbers and y is the "values."

<xy> means the average value of x*y taken together. In other words, multiply 1*104, 2*88, 3*69, 4*55, add them up, and divide by 4. I get 176.75.

<x> is the simple average of the x's, which is 2.5. <y> is the average of the y's, which is 79. So we have for the numerator 176.75 - 2.5*79 = -20.75.

In the denominator, std means standard deviation. The standard deviation is the square root of the variance, and the variance is computed very much like the numerator, but for x or y with itself, rather than the product of the two. So var(x) is by definition <x^2> - <x>^2. In other words, first square the x's and average them (7.5). Then average the x's and square the result (6.25). The difference, 1.25, is the variance; take its square root to get the standard deviation. I get 1.118 for this. Do the same thing for the y's: the variance is 6586.5 - 6241 = 345.5, so the standard deviation is 18.59.

Now all that's left is to divide the numerator by the product of the two standard deviations. This gives r = -20.75/(1.118*18.59), which comes out to -.998.

The correlation coefficient always comes out between -1 and +1, though this may not be at all obvious from the recipe that I gave you. If the numbers had been 1,2,3,4 and 2,4,6,8 or 1,2,3,4 and 4,7,10,13, then the correlation would have come out to 1.0 exactly, and if they had been 1,2,3,4 and 4,3,2,1, then the correlation would have come out -1.0 exactly.

For two long lists of unrelated random numbers, the correlation would come out close to zero (with only a handful of points, chance alone can push it further from zero). Even so, a value as extreme as -.998 is rare by chance, so it's fairly safe to say for most situations that -.998 is a highly significant correlation, though the place from which the numbers derived and the hypothesis you're trying to prove must be taken into account in a complicated way to attach any quantitative meaning to this statement.

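If you would rather let a computer do the arithmetic, the recipe above can be sketched in a few lines of Python. Treat this as an illustration only, not the one right way to do it: the function names mean, std, and correlation are made up for this example, and std here is the divide-by-n standard deviation, sqrt(<x^2> - <x>^2), to match the recipe.

```python
from math import sqrt


def mean(values):
    """Simple average of a list, written <x> in the recipe."""
    return sum(values) / len(values)


def std(values):
    """Divide-by-n standard deviation: sqrt(<x^2> - <x>^2)."""
    mean_of_squares = mean([v * v for v in values])
    return sqrt(mean_of_squares - mean(values) ** 2)


def correlation(xs, ys):
    """r = (<xy> - <x><y>) / (std(x) * std(y))."""
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    numerator = mean_xy - mean(xs) * mean(ys)
    return numerator / (std(xs) * std(ys))


# The sample data from the question.
ages = [1, 2, 3, 4]
values = [104, 88, 69, 55]

print(round(correlation(ages, values), 3))  # prints -0.998
```

Calling correlation([1, 2, 3, 4], [2, 4, 6, 8]) gives 1.0 and correlation([1, 2, 3, 4], [4, 3, 2, 1]) gives -1.0, up to floating-point rounding. If you swap in a library routine for the standard deviation, check whether it divides by n or by n-1; mixing the two conventions between numerator and denominator would change the answer.
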
- Doctor Mitteldorf, The Math Forum
  Check out our web site! http://mathforum.org/dr.math/
© 1994- The Math Forum at NCTM. All rights reserved.