Topic:
Explanation for why linear regression is a poor fit
Replies:
8
Last Post:
Feb 15, 2013 10:36 AM





Re: Explanation for why linear regression is a poor fit
Posted:
Feb 15, 2013 10:36 AM


I take this as asking why statements like "p = 0.025, meaning it is significant at 97.5% confidence" are not true, and how you can demonstrate this.
The p-value depends on an (at least approximately) Normal distribution of the residuals. Looking at the regression in the FC Counts Chart I get:

The regression equation is
MPN = 10177 - 0.240 Date

Predictor      Coef  SE Coef      T      P
Constant      10177     4324   2.35  0.019
Date        -0.2396   0.1065  -2.25  0.025

S = 1220.74   R-Sq = 1.0%   R-Sq(adj) = 0.8%

This agrees with Excel, but the residuals are very far from Normally distributed, so you can't trust the P values.
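Separately from the Normality problem, a tiny R-Sq and a small P value do not contradict each other. For a simple linear regression, |T| for the slope equals sqrt(r2 * (N - 2) / (1 - r2)), so with N = 498 even r2 = 0.01 gives |T| a little above 2. A minimal Python sketch of that arithmetic follows. The function names are my own, and the normal approximation stands in for the exact t distribution, which I am assuming is close enough with 496 degrees of freedom.

```python
import math
from statistics import NormalDist


def slope_t_from_r2(r2: float, n: int) -> float:
    """|t| of the slope in a simple linear regression, from R^2 and sample size."""
    return math.sqrt(r2 * (n - 2) / (1.0 - r2))


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value via the normal approximation to the t distribution.

    Assumption: n - 2 is large (here 496), so the t and normal
    distributions are nearly identical.
    """
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


n = 498          # sample size quoted in the original question
r2 = 0.01        # rounded R^2 quoted in the original question
t = slope_t_from_r2(r2, n)
p = two_sided_p_normal(t)
print(f"|t| = {t:.2f}, approximate two-sided p = {p:.3f}")
```

The P value only addresses whether the slope can be distinguished from zero, and only if the residual assumptions hold. It says nothing about how much of the scatter the line explains; that is what R-Sq reports.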
I'm guessing that this is microbiology data (faecal coliforms in water determined by MPN techniques?). In that domain it's common to take logs, usually base 10. Doing the regression on log(MPN) I get:

log(MPN) = -0.73 + 0.000070 Date

Predictor         Coef     SE Coef      T      P
Constant        -0.726       2.355  -0.31  0.758
Date        0.00007001  0.00005801   1.21  0.228

S = 0.664892   R-Sq = 0.3%   R-Sq(adj) = 0.1%

The residuals are very nicely behaved, so you can trust the P values, which show no evidence for a date dependence.
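To illustrate why the log scale helps, here is a rough Python sketch on synthetic data, NOT the Samish River measurements. It assumes counts that are lognormal with no trend at all. The date values, the seed and the helper names are hypothetical, and the log10 spread of 0.66 is only chosen to resemble S in the output above. It fits a line on both scales and compares the skewness of the residuals, which should be near zero if they are roughly Normal.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residuals(x, y):
    """Residuals of y about its least-squares line in x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def skewness(v):
    """Sample skewness (third standardised moment); about 0 for Normal data."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((vi - m) ** 2 for vi in v) / n
    m3 = sum((vi - m) ** 3 for vi in v) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)                        # fixed seed, hypothetical
dates = [38000 + 5 * i for i in range(498)]   # made-up day numbers
# Synthetic counts: log10(MPN) ~ Normal(2.0, 0.66), no date dependence.
mpn = [10 ** rng.gauss(2.0, 0.66) for _ in dates]
log_mpn = [math.log10(v) for v in mpn]

skew_raw = skewness(residuals(dates, mpn))
skew_log = skewness(residuals(dates, log_mpn))
print(f"residual skewness, raw scale: {skew_raw:.2f}")
print(f"residual skewness, log scale: {skew_log:.2f}")
```

On data like this the raw-scale residuals are strongly right-skewed, while the log-scale residuals are close to symmetric. A Normal probability plot of the real residuals would be the more usual check.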
HTH

KJ

On 04/02/2013 20:55, em.derenne@gmail.com wrote:
> Hi
> I haven't taken stats in a few years and recently there have been a lot thrown around my work place, including the attached graph (and raw data). I realize that a low R2 means that the linear regression is not a good fit, but it produces a p-value of 0.025. I can't formulate a solid argument because I don't understand the material well enough. Am I incorrect in saying this is a poor fit? Even visually to me it looks like a poor fit. Additionally, he says things like: "FC Count at Samish River/Thomas Road: N = 498, r2 = 0.01, p = 0.025, meaning it is significant at 97.5% confidence". I know you can't use p-values to describe stats like this. I need help explaining why this data isn't showing a significant declining trend with a linear regression (unless of course I am incorrect).
>
> Thanks for clarification and help.
>
> Data and Graph: http://dl.dropbox.com/u/18470470/Copy%20of%20Regression%20Correlation%20info.xlsx



