### The Correlation Coefficient

```
Date: 09/11/98 at 13:56:58
From: Steven S. Krompf
Subject: Correlation Coefficient

Dear Dr. Mitteldorf,

I am sending this reply on behalf of our patron, who wrote the
following:

The books on statistics we have consulted (four of them) discuss
linear regression and correlation analysis, but none of them explains
how one calculates them.

We are trying to establish whether there is a correlation between
4 groups (1-4), and the values obtained for each. There appears to be
a negative correlation - but is it statistically significant?

We'd appreciate your showing us how to calculate the correlation.

Thank you.

Here are the sample data:

Age         Value

1           104
2            88
3            69
4            55
```

```
Date: 09/11/98 at 19:52:32
From: Doctor Mitteldorf
Subject: Re: Correlation Coefficient

Dear Steven,

Okay, here goes.  The formula is:

<xy>-<x><y>
-------------- = r = correlation coefficient
std(x)*std(y)

In your case, x is the age numbers and y is the "values."

<xy> means the average value of x*y taken together. In other words,
multiply 1*104, 2*88, 3*69, 4*55, add them up, and divide by 4.
I get 176.75.

<x> is the simple average of the x's, which is 2.5.
<y> is the average of the y's, which is 79.

So we have for the numerator 176.75 - 2.5*79 = -20.75.

In the denominator, std means standard deviation. The standard
deviation is the square root of the variance, and the variance is
computed very much like the numerator, but for x or y with itself,
rather than the product.

So var(x) is by definition <x^2>-<x>^2. In other words, first square
the x's and average them (7.5). Then average the x's and square the
result (6.25). The difference, 1.25, is the variance; take its square
root to get the standard deviation. I get 1.118 for this.

Do the same thing for the standard deviation of the y's. I get 18.59.

Now all that's left is to divide the numerator by the product of the
two standard deviations. This gives r, which comes out to -0.998.

The correlation coefficient always comes out between -1 and +1, though
this may not be at all obvious from the recipe that I gave you. If the
numbers had been 1,2,3,4 and 2,4,6,8 or 1,2,3,4 and 4,7,10,13, then the
correlation would have come out to 1.0 exactly, and if they had been
1,2,3,4 and 4,3,2,1, then the correlation would have come out -1.0
exactly.

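For two long lists of unrelated random numbers, the correlation would
typically come out close to zero. With only four pairs, chance alone
can push it much further from zero, though a value as close to -1 as
yours would still be unusual.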

So it's fairly safe to say for most situations that -0.998 is a highly
significant correlation, though where the numbers came from and the
hypothesis you're trying to test must be taken into account in a
complicated way to attach any quantitative meaning to this statement.

- Doctor Mitteldorf, The Math Forum
Check out our web site! http://mathforum.org/dr.math/
```
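The recipe in the reply above can be sketched in a few lines of Python. This is only an illustration, not part of the original exchange: the function names `mean` and `correlation` are made up here, and every average divides by n, as the reply does.

```python
from math import sqrt

def mean(values):
    """Plain average: the <...> brackets in the formula."""
    return sum(values) / len(values)

def correlation(xs, ys):
    """r = (<xy> - <x><y>) / (std(x) * std(y)), dividing by n throughout."""
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    numerator = mean_xy - mean(xs) * mean(ys)
    # var = <x^2> - <x>^2; the standard deviation is its square root
    std_x = sqrt(mean([x * x for x in xs]) - mean(xs) ** 2)
    std_y = sqrt(mean([y * y for y in ys]) - mean(ys) ** 2)
    return numerator / (std_x * std_y)

ages = [1, 2, 3, 4]
values = [104, 88, 69, 55]
r = correlation(ages, values)
print(round(r, 3))  # -0.998

# The exact cases mentioned in the reply (rounded to hide float noise)
print(round(correlation([1, 2, 3, 4], [2, 4, 6, 8]), 6))    # 1.0
print(round(correlation([1, 2, 3, 4], [4, 7, 10, 13]), 6))  # 1.0
print(round(correlation([1, 2, 3, 4], [4, 3, 2, 1]), 6))    # -1.0
```

The rounding in the last three lines is there because floating-point square roots can leave the result a hair away from exactly 1 or -1.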