General correlation coefficient.
Posted:
Jul 9, 1996 12:11 AM
Can anyone advise me of a general purpose algorithm for calculating a correlation coefficient?
For linear equations of the form y=ax+b, it's pretty well documented that you can calculate a correlation coefficient r = Sxy / sqrt (Sxx * Syy). I see this often labeled as Pearson's r.
Somewhere I came across a more general expression of r that worked for the non-linear equations I was working with by comparing y_hat to y_obs, the estimated and actual values of y. I coded this in C as shown in the attachment, and it always matched SAS, so I was happy till today:
Today, while fitting the equation y=d+((a-d)/(1+(x/c)^b)) to the data

x        y
0.0117   0.160503
0.181    0.176018
0.438    0.203961
1.46     0.331981
2.88     0.529194
4.77     0.80795
9.89     1.600164
20       3.178673
After fitting the curve, a=0.159774 b=1.133458 c=135.193465 d=29.513744 (if your mileage varies, it should be close anyway)
and I get a correlation coefficient of 1.0024, which is obviously bogus.
I tried to find a rounding error, or lack of precision somewhere, but the explained (regression) sum of squares, ssr in the code, really is calculated as larger than the total sum of squares, sst, using that algorithm.
So I'm left with the conclusion that this is not the most general way to calculate a correlation coefficient for a generalized non-linear equation, not to mention inaccurate under at least some circumstances.
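One way to see how this can happen: the identity SST = SSR + SSE is guaranteed for a linear least-squares fit with an intercept, but not for an arbitrary set of predictions, so SSR / SST is not bounded by 1 in general. The unweighted sketch below computes all three sums so the two candidate ratios can be compared. The names `sums_of_squares`, `decompose`, `r2_explained` and `r2_residual` are hypothetical, and 1 - SSE / SST is offered only as one commonly used alternative, not as what SAS necessarily reports.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double sst;   /* total variation of y_obs about its mean   */
    double ssr;   /* variation of y_hat about its own mean      */
    double sse;   /* residual variation, sum of (y_obs - y_hat)^2 */
} sums_of_squares;

/* Computes the three sums of squares (unweighted sketch). */
sums_of_squares decompose(const double y_obs[], const double y_hat[], size_t n)
{
    sums_of_squares s = {0.0, 0.0, 0.0};
    double y_obs_ave = 0.0, y_hat_ave = 0.0;
    size_t i;

    assert(n >= 2);

    for (i = 0; i < n; i++) {
        y_obs_ave += y_obs[i];
        y_hat_ave += y_hat[i];
    }
    y_obs_ave /= (double)n;
    y_hat_ave /= (double)n;

    for (i = 0; i < n; i++) {
        double d_obs = y_obs[i] - y_obs_ave;
        double d_hat = y_hat[i] - y_hat_ave;
        double resid = y_obs[i] - y_hat[i];
        s.sst += d_obs * d_obs;
        s.ssr += d_hat * d_hat;
        s.sse += resid * resid;
    }
    return s;
}

/* Ratio used in the attachment: can exceed 1 when SST != SSR + SSE. */
double r2_explained(const sums_of_squares *s)
{
    return s->ssr / s->sst;
}

/* Residual-based ratio: never exceeds 1, but can be negative for a poor fit. */
double r2_residual(const sums_of_squares *s)
{
    return 1.0 - s->sse / s->sst;
}
```

For example, with observations {1, 2, 3} and predictions {0, 2, 4}, SST = 2, SSR = 8 and SSE = 2, so the first ratio is 4 while the second is 0. Because the second form depends only on the residuals, a value above 1 is impossible whenever SST is positive.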
What do you think would be better? Thank you for considering this puzzle.

Steve
Attachment (filename="Junk.rot"):
// corrcoef.c
// Purpose:
//   Calculates the correlation coefficient, r. R is derived from the
//   square root of the coefficient of determination, R^2. The coefficient
//   of determination is calculated as the ratio of explained variation
//   divided by the total variation. The explained variation is estimated
//   from the least squares fit of the equation to the data.
//***************************************************************************/

#include <math.h>    /* for sqrt */
#include "common.h"  /* for the function prototype */

/****************************************************************************/
double correlation_coefficient (
    double y_obs[],   // the observed data
    double y_hat[],   // the data predicted by the relationship
    double w[],       // the weighting factors
    short num_obs)    // the number of observations
{
    double y_hat_ave = 0.0;  // average of the estimated y values
    double y_obs_ave = 0.0;  // average of the observed y values
    short i;                 // loop index
    double r;                // the correlation coefficient
    double w_sum = 0.0;      // the total weight
    double sst = 0.0;        // total variation in the data
    double ssr = 0.0;        // total variation explained by the
                             // relationship between x and y
    double temp1;            // temporary value used for efficiency
    double temp2;            // temporary value used for efficiency

    // First get the average
    for (i = 0; i < num_obs; i++)
    {
        y_obs_ave += (w[i] * y_obs[i]);
        y_hat_ave += (w[i] * y_hat[i]);
        w_sum += w[i];
    }
    y_obs_ave /= w_sum;
    y_hat_ave /= w_sum;

    // Then get the sums of the squares
    for (i = 0; i < num_obs; i++)
    {
        temp1 = y_hat[i] - y_hat_ave;
        ssr += (w[i] * temp1 * temp1);
        temp2 = y_obs[i] - y_obs_ave;
        sst += (w[i] * temp2 * temp2);
    }

    // Finally, take the ratio
    r = sqrt (ssr / sst);

    return (r);
}