The Math Forum

Ask Dr. Math - Questions and Answers from our Archives

The Correlation Coefficient


Date: 09/11/98 at 13:56:58
From: Steven S. Krompf
Subject: Correlation Coefficient

Dear Dr. Mitteldorf,

I am sending this question on behalf of our patron, who wrote the 
following:

The books on statistics we have consulted (four of them) discuss
linear regression and correlation analysis, but none of them explains 
how one calculates them.

We are trying to establish whether there is a correlation between 
4 groups (1-4) and the values obtained for each. There appears to be 
a negative correlation, but is it statistically significant?

We'd appreciate your showing us how to calculate the correlation 
coefficient and any additional particulars.

Thank you.

Here are the sample data:

   Age         Value

    1           104
    2            88 
    3            69 
    4            55


Date: 09/11/98 at 19:52:32
From: Doctor Mitteldorf
Subject: Re: Correlation Coefficient

Dear Steven,

Okay, here goes.  The formula is: 

    <xy>-<x><y>
   -------------- = r = correlation coefficient
    std(x)*std(y)

In your case, x is the age numbers and y is the "values."
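For reference, here is the same recipe written with explicit sums over 
the n data pairs (n = 4 here). The bars denote the averages written <x> 
and <y> in the formula, and sigma stands for std. The second line, 
which subtracts the averages before multiplying, is an algebraically 
equivalent form:

```latex
r = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\sigma_x\,\sigma_y},
\qquad
\sigma_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2},
\qquad
\sigma_y = \sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2}

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```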

<xy> means the average of the products x*y, one product for each data 
pair. In other words, multiply 1*104, 2*88, 3*69, 4*55, add them up, 
and divide by 4. I get 176.75.

<x> is the simple average of the x's, which is 2.5.
<y> is the average of the y's, which is 79.

So we have for the numerator 176.75 - 2.5*79 = -20.75.
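The arithmetic so far can be checked with a few lines of Python. This is 
a minimal sketch that simply repeats the hand calculation on the 
question's data; the variable names are illustrative choices.

```python
# Sample data from the question: x = age group, y = measured value
xs = [1, 2, 3, 4]
ys = [104, 88, 69, 55]
n = len(xs)

mean_xy = sum(x * y for x, y in zip(xs, ys)) / n  # <xy> = 707 / 4
mean_x = sum(xs) / n                               # <x>
mean_y = sum(ys) / n                               # <y>

# Numerator of r: <xy> - <x><y>
numerator = mean_xy - mean_x * mean_y

print(mean_xy, mean_x, mean_y, numerator)  # 176.75 2.5 79.0 -20.75
```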

In the denominator, std means standard deviation. The standard 
deviation is the square root of the variance, and the variance is 
computed very much like the numerator, but using x (or y) with itself 
rather than the product of x and y.

So var(x) is by definition <x^2>-<x>^2. In other words, first square 
the x's and average them; then average the x's and square the result. 
The difference, here 7.5 - 6.25 = 1.25, is the variance, and its 
square root is the standard deviation. I get 1.118 for this.

Do the same thing for the standard deviation of the y's. I get 18.59.
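Here is a small sketch of the same "average of the squares minus the 
square of the average" recipe. The helper name std is a hypothetical 
choice for illustration; like the text, it divides by n.

```python
import math

def std(values):
    """Standard deviation as in the text: sqrt(<v^2> - <v>^2), averages taken over n."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return math.sqrt(mean_of_squares - mean ** 2)

std_x = std([1, 2, 3, 4])        # sqrt(7.5 - 6.25) = sqrt(1.25)
std_y = std([104, 88, 69, 55])   # sqrt(6586.5 - 6241) = sqrt(345.5)
print(round(std_x, 3), round(std_y, 3))  # 1.118 18.588
```

This shortcut is fine for small, hand-sized numbers, but when the values 
are large compared with their spread the subtraction can lose precision 
in floating point; subtracting the mean from each value first is the 
more robust way to compute it.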

Now all that's left is to divide the numerator by the product of the 
two standard deviations. This gives r, which comes out to -.998.
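Putting the pieces together, a sketch of the whole recipe using 
fmean and pstdev from Python's standard statistics module. pstdev is 
the "population" standard deviation, which divides by n just as the 
text does; the function name correlation is illustrative.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """r = (<xy> - <x><y>) / (std(x) * std(y)); pstdev divides by n, matching the recipe."""
    mean_xy = fmean(x * y for x, y in zip(xs, ys))
    return (mean_xy - fmean(xs) * fmean(ys)) / (pstdev(xs) * pstdev(ys))

r = correlation([1, 2, 3, 4], [104, 88, 69, 55])
print(round(r, 3))  # -0.998
```

Dividing by n everywhere is a convention, not a requirement. If you use 
the n - 1 versions instead (statistics.stdev and, in Python 3.10+, 
statistics.covariance), r comes out the same as long as you use them in 
both the numerator and the denominator, because the factors cancel.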

The correlation coefficient always comes out between -1 and +1, though 
this may not be at all obvious from the recipe that I gave you. If the  
numbers had been 1,2,3,4 and 2,4,6,8 or 1,2,3,4 and 4,7,10,13, then the 
correlation would have come out to 1.0 exactly, and if they had been 
1,2,3,4 and 4,3,2,1, then the correlation would have come out -1.0 
exactly. 
 
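For unrelated random numbers, the correlation tends to scatter around 
zero, more tightly the more data points there are. With only four 
points, though, it can land far from zero purely by chance, so a small 
sample needs a much larger correlation before it means anything.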

So it's fairly safe to say for most situations that -.998 is a highly 
significant correlation. To attach any precise quantitative meaning to 
that statement, though, you have to take into account where the 
numbers came from and what hypothesis you're trying to test, and that 
can get complicated.

- Doctor Mitteldorf, The Math Forum
  Check out our web site! http://mathforum.org/dr.math/   
    
Ask Dr. Math™
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/