The Math Forum

Math Forum » Discussions » sci.math.* » sci.stat.math

Topic: Explanation for why linear regression is a poor fit
Replies: 8   Last Post: Feb 15, 2013 10:36 AM

Re: Explanation for why linear regression is a poor fit
Posted: Feb 15, 2013 10:36 AM

I take this as asking why statements like "p = 0.025, meaning it is
significant at 97.5% confidence" are not true, and how you can
demonstrate this.

The p-value is only valid if the residuals are (at least approximately)
Normally distributed. Running the regression on the data in FC Counts
Chart, I get:
The regression equation is
MPN = 10177 - 0.240 Date

Predictor       Coef   SE Coef       T       P
Constant       10177      4324    2.35   0.019
Date         -0.2396    0.1065   -2.25   0.025

S = 1220.74   R-Sq = 1.0%   R-Sq(adj) = 0.8%
This agrees with Excel, but the residuals are very far from Normally
distributed, so you can't trust the P values.
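
As a side note on how an R-Sq of 1.0% and a P of 0.025 can appear together: in a regression with one predictor, R-Sq is tied to the slope's T value by R-Sq = T^2 / (T^2 + N - 2), so with a large N even a very weak relationship produces a small P. The sketch below checks this against the output above. It assumes all N = 498 points from the original question were used in the fit, and it uses a Normal approximation to the t distribution for the p-value, so the last digit can differ slightly from Minitab's. The function names are illustrative only.

```python
import math

def r_squared_from_t(t: float, n: int) -> float:
    """R^2 of a one-predictor regression, recovered from the slope's t statistic."""
    df = n - 2  # residual degrees of freedom (assumes all n points were fitted)
    return t * t / (t * t + df)

def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value from a Normal approximation (reasonable when df is large)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

n = 498  # N quoted in the original question; assumed to be the fitted sample size
for label, t in [("MPN", -2.25), ("log(MPN)", 1.21)]:
    r2 = 100 * r_squared_from_t(t, n)
    p = two_sided_p_normal(t)
    print(f"{label}: R-Sq = {r2:.1f}%, approximate p = {p:.3f}")
```

The R-Sq values come out as 1.0% and 0.3%, matching the two regression outputs in this post. The P value only says the slope is unlikely to be exactly zero if the model assumptions hold; it says nothing about how much of the scatter the line explains.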

I'm guessing that this is microbiology data (faecal coliforms in water
determined by MPN techniques?). In that domain it's common to take logs,
usually base 10. Doing the regression on log(MPN) I get
log(MPN) = -0.73 + 0.000070 Date

Predictor         Coef     SE Coef       T       P
Constant        -0.726       2.355   -0.31   0.758
Date        0.00007001  0.00005801    1.21   0.228

S = 0.664892   R-Sq = 0.3%   R-Sq(adj) = 0.1%
The residuals are now very nicely behaved, so you can trust the P values,
which show no evidence for a date dependence.
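
To see why taking logs matters, here is a minimal sketch on synthetic data (not the Samish River measurements, which are not reproduced here). It assumes counts that are lognormal with no date trend, with a spread on the log10 scale loosely based on the S value above, and compares the skewness of the residuals before and after the log transform. The date range, mean, seed and helper names are all made up for illustration.

```python
import math
import random

def ols_residuals(x, y):
    """Fit y = a + b*x by ordinary least squares and return the residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def skewness(values):
    """Sample skewness: close to 0 for roughly Normal data."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)
# Hypothetical day numbers and synthetic log10 counts with NO date trend.
dates = [rng.uniform(39000, 41300) for _ in range(498)]
log_counts = [rng.gauss(2.1, 0.66) for _ in dates]
counts = [10 ** v for v in log_counts]

skew_raw = skewness(ols_residuals(dates, counts))
skew_log = skewness(ols_residuals(dates, log_counts))
print(f"skewness of residuals, raw counts:   {skew_raw:.2f}")
print(f"skewness of residuals, log10 counts: {skew_log:.2f}")
```

On the raw scale a handful of very large counts dominate the fit and the residuals are strongly right-skewed; on the log scale they are roughly symmetric, which is the situation in which the usual P values can be taken at face value.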


On 04/02/2013 20:55, wrote:
> Hi-
> I haven't taken stats in a few years and recently there have been a lot of statistics thrown around my workplace, including the attached graph (and raw data). I realize that a low R2 means that the linear regression is not a good fit, but it produces a p-value of 0.025. I can't formulate a solid argument because I don't understand the material well enough. Am I incorrect in saying this is a poor fit? Even visually it looks like a poor fit to me. Additionally, he says things like: "FC Count at Samish River/Thomas Road: N = 498, r2 = 0.01, p = 0.025, meaning it is significant at 97.5% confidence". I know you can't use p-values to describe stats like this. I need help explaining why this data isn't showing a significant declining trend with a linear regression (unless, of course, I am incorrect).
> Thanks for clarification and help.
> Data and Graph:

© The Math Forum at NCTM 1994-2018. All Rights Reserved.