Topic: General correlation coefficient.
Replies: 2   Last Post: Jul 10, 1996 10:46 PM

 Steve Cable Posts: 2 Registered: 12/12/04
General correlation coefficient.
Posted: Jul 9, 1996 12:11 AM


Can anyone advise me of a general purpose algorithm for calculating a correlation
coefficient?

For linear equations of the form y=ax+b, it's well documented that you can
calculate a correlation coefficient r = Sxy / sqrt (Sxx * Syy). I often see this
labeled as Pearson's r.

Somewhere I came across a more general expression of r that worked for the non-linear
equations I was working with by comparing y_hat to y_obs, the estimated and actual
values of y. I coded this in C as shown in the attachment, and it always matched SAS,
so I was happy till today:

Today I fitted the equation y=d+((a-d)/(1+(x/c)^b))
to the data
x y
0.0117 0.160503
0.181 0.176018
0.438 0.203961
1.46 0.331981
2.88 0.529194
4.77 0.80795
9.89 1.600164
20 3.178673

After fitting the curve,
a=0.159774
b=1.133458
c=135.193465
d=29.513744 (if your mileage varies, it should be close anyway)

and I get a correlation coefficient of 1.0024, which is obviously bogus.

I tried to find a rounding error, or lack of precision somewhere, but the explained
(regression) sum of squares, ssr in the attachment, really is calculated as larger
than the total sum of squares using that algorithm.

So I'm left with the conclusion that this is not the most general way to calculate a
correlation coefficient for a generalized non-linear equation, not to mention inaccurate
under at least some circumstances.

What do you think would be better?
Thank you for considering this puzzle.
Steve


// corrcoef.c
// Purpose:
// Calculates the correlation coefficient, r. R is derived from the
// square root of the coefficient of determination, R^2. The coefficient
// of determination is calculated as the ratio of explained variation
// divided by the total variation. The explained variation is estimated
// from the least squares fit of the equation to the data.
//***************************************************************************/
#include <math.h>   /* for sqrt */
#include "common.h" /* for the function prototype */
/****************************************************************************/
double correlation_coefficient (
    double y_obs[],  // the observed data
    double y_hat[],  // the data predicted by the relationship
    double w[],      // the weighting factors
    short num_obs)   // the number of observations
{
    double y_hat_ave = 0.0; // average of the estimated y values
    double y_obs_ave = 0.0; // average of the observed y values
    short i;                // loop index
    double r;               // the correlation coefficient
    double w_sum = 0.0;     // the total weight
    double sst = 0.0;       // total variation in the data
    double ssr = 0.0;       // total variation explained by the
                            // relationship between x and y
    double temp1;           // temporary value used for efficiency
    double temp2;           // temporary value used for efficiency

    // First get the average
    for (i = 0; i < num_obs; i++)
    {
        y_obs_ave += (w[i] * y_obs[i]);
        y_hat_ave += (w[i] * y_hat[i]);
        w_sum += w[i];
    }
    y_obs_ave /= w_sum;
    y_hat_ave /= w_sum;

    // Then get the sums of the squares
    for (i = 0; i < num_obs; i++)
    {
        temp1 = y_hat[i] - y_hat_ave;
        ssr += (w[i] * temp1 * temp1);
        temp2 = y_obs[i] - y_obs_ave;
        sst += (w[i] * temp2 * temp2);
    }

    // Finally, take the ratio
    r = sqrt (ssr / sst);

    return (r);
}
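
For comparison, here is a sketch of one commonly used alternative, not necessarily what SAS or any other package reports. The identity SST = SSR + SSE is guaranteed for a linear least-squares fit with an intercept, but not for a non-linear fit, so the ratio ssr / sst above can exceed 1. Computing R^2 as 1 - SSE / SST, with SSE taken from the residuals y_obs - y_hat, can never exceed 1 (it can go negative for a fit worse than the mean). The function name r_squared_from_residuals is hypothetical; the weights are handled the same way as in the routine above.

```c
#include <assert.h>
#include <stddef.h> /* for size_t */

/* Weighted R^2 computed as 1 - SSE/SST (hypothetical helper).
   SSE is the weighted sum of squared residuals; SST is the weighted
   sum of squared deviations of y_obs from its weighted mean. */
double r_squared_from_residuals(const double y_obs[], const double y_hat[],
                                const double w[], size_t num_obs)
{
    double y_obs_ave = 0.0; /* weighted average of the observed y values */
    double w_sum = 0.0;     /* the total weight */
    double sse = 0.0;       /* residual (unexplained) variation */
    double sst = 0.0;       /* total variation in the data */
    size_t i;

    for (i = 0; i < num_obs; i++) {
        y_obs_ave += w[i] * y_obs[i];
        w_sum += w[i];
    }
    assert(w_sum > 0.0);
    y_obs_ave /= w_sum;

    for (i = 0; i < num_obs; i++) {
        double resid = y_obs[i] - y_hat[i];
        double dev = y_obs[i] - y_obs_ave;
        sse += w[i] * resid * resid;
        sst += w[i] * dev * dev;
    }
    assert(sst > 0.0); /* undefined if all y_obs are equal */

    return 1.0 - sse / sst;
}
```

Taking the square root of this value gives an r-like number only when the result is non-negative, so a caller would need to check the sign before calling sqrt.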


Date Subject Author
7/9/96 Steve Cable
7/10/96 Bob Wheeler
7/10/96 Jeff Brush