Topic: Explanation for why linear regression is a poor fit
Replies: 8   Last Post: Feb 15, 2013 10:36 AM

 Lurker Posts: 44 Registered: 12/18/04
Re: Explanation for why linear regression is a poor fit
Posted: Feb 15, 2013 10:36 AM

I take this as asking why statements like "p = 0.025, meaning it is
significant at 97.5% confidence" are not true, and how you can
demonstrate this.
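
As background for how an r2 of 0.01 can sit next to p = 0.025: in a simple linear regression the slope's t statistic is tied to r2 by t^2 = (n - 2) * r2 / (1 - r2), so with a large N even a tiny r2 gives a sizeable t. A quick arithmetic sketch with the N = 498 and r2 = 0.01 quoted in the question (using the standard normal in place of the t distribution is my simplification, reasonable only because N is large):

```python
import math
from statistics import NormalDist

n = 498      # sample size quoted in the original question
r2 = 0.01    # r-squared quoted in the original question

# Simple linear regression identity: t^2 = (n - 2) * r2 / (1 - r2)
t = math.sqrt((n - 2) * r2 / (1 - r2))

# Two-sided p-value; the standard normal stands in for the t
# distribution with n - 2 degrees of freedom (an approximation).
p = 2 * (1 - NormalDist().cdf(t))

print(f"|t| = {t:.2f}, approximate p = {p:.3f}")
```

So the small p-value is mostly a consequence of the sample size, and says nothing about how well the line fits. It is also only meaningful if the assumptions behind the test hold.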

The p-value is only valid if the residuals are (at least approximately)
Normally distributed. Running the regression in FC Counts Chart, I get:
-------------
The regression equation is
MPN = 10177 - 0.240 Date

Predictor      Coef  SE Coef      T      P
Constant      10177     4324   2.35  0.019
Date        -0.2396   0.1065  -2.25  0.025

S = 1220.74   R-Sq = 1.0%   R-Sq(adj) = 0.8%
-----------
This agrees with Excel, but the residuals are very far from Normally
distributed, so you can't trust the P values.
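
If you want to demonstrate the residual problem yourself, one rough check is the skewness of the residuals, which should be near 0 if they are Normal. The sketch below is not run on the real spreadsheet: x and y are SYNTHETIC stand-ins that I made up (498 equally spaced "dates", right-skewed counts with no trend), and ols_residuals and skewness are just illustrative helper names.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# SYNTHETIC stand-in for the real data: 498 equally spaced "dates" and
# right-skewed counts with no true trend (log10 of the count is Normal).
x = [float(i) for i in range(498)]
y = [10 ** rng.gauss(2.0, 0.66) for _ in x]

def ols_residuals(x, y):
    """Fit y = a + b*x by least squares and return the residuals."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def skewness(v):
    """Sample skewness: about 0 for symmetric (e.g. Normal) data."""
    m = statistics.fmean(v)
    s = statistics.pstdev(v)
    return sum(((vi - m) / s) ** 3 for vi in v) / len(v)

resid = ols_residuals(x, y)
resid_skew = skewness(resid)
print(f"residual skewness on the raw scale: {resid_skew:.2f}")
```

To try it on the actual data, replace x and y with the Date and MPN columns. A strongly positive skewness means a few very large counts dominate the fit, which is the situation where the printed P values are unreliable. A Normal probability plot of the residuals shows the same thing graphically.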

I'm guessing that this is microbiology data (faecal coliforms in water
determined by MPN techniques?). In that domain it's common to take logs,
usually base 10. Doing the regression on log(MPN), I get:
------------------
log(MPN) = - 0.73 + 0.000070 Date

Predictor        Coef     SE Coef      T      P
Constant       -0.726       2.355  -0.31  0.758
Date       0.00007001  0.00005801   1.21  0.228

S = 0.664892   R-Sq = 0.3%   R-Sq(adj) = 0.1%
----------------------
The residuals are now very nicely behaved, so you can trust the P values,
which show no evidence of a date dependence.
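
A minimal sketch of the same comparison, again on SYNTHETIC data that I made up rather than the real spreadsheet (498 equally spaced "dates", counts whose log10 is Normal, no true trend). The helper names fit and skewness are illustrative, and the p-value uses the standard normal in place of the t distribution, which is only an approximation for large N.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# SYNTHETIC stand-in data, not the real MPN measurements.
x = [float(i) for i in range(498)]
y = [10 ** rng.gauss(2.0, 0.66) for _ in x]

def fit(x, y):
    """Least-squares fit of y = a + b*x; returns (b, SE of b, residuals)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))  # residual SD
    return b, s / math.sqrt(sxx), resid

def skewness(v):
    """Sample skewness: about 0 for symmetric (e.g. Normal) data."""
    m, s = fmean(v), pstdev(v)
    return sum(((vi - m) / s) ** 3 for vi in v) / len(v)

# Regress log10(count) on date instead of the raw count.
log_y = [math.log10(v) for v in y]
slope, se_slope, log_resid = fit(x, log_y)
t = slope / se_slope
p = 2 * (1 - NormalDist().cdf(abs(t)))  # normal approximation to t

raw_skew = skewness(fit(x, y)[2])
log_skew = skewness(log_resid)
print(f"skewness of residuals: raw {raw_skew:.2f}, log10 {log_skew:.2f}")
print(f"log-scale slope t = {t:.2f}, approximate p = {p:.3f}")
```

With the real Date and MPN columns substituted for x and y, the log-scale slope, standard error and T should line up with the output above, and the residual skewness should drop close to 0.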

HTH

KJ
On 04/02/2013 20:55, em.derenne@gmail.com wrote:
> Hi-
> I haven't taken stats in a few years, and recently a lot of them have been thrown around my workplace, including the attached graph (and raw data). I realize that a low R2 means that the linear regression is not a good fit, but it produces a p-value of 0.025. I can't formulate a solid argument because I don't understand the material well enough. Am I incorrect in saying this is a poor fit? Even visually it looks like a poor fit to me. Additionally, he says things like: "FC Count at Samish River/Thomas Road: N = 498, r2 = 0.01, p = 0.025, meaning it is significant at 97.5% confidence". I know you can't use p-values to describe stats like this. I need help explaining why this data isn't showing a significant declining trend with a linear regression (unless, of course, I am incorrect).
>
> Thanks for clarification and help.
>
> Data and Graph: http://dl.dropbox.com/u/18470470/Copy%20of%20Regression%20Correlation%20info.xlsx
>
>
