Topic: Explanation for why linear regression is a poor fit
Replies: 8   Last Post: Feb 15, 2013 10:36 AM

 David Jones Posts: 77 Registered: 2/9/12
Re: Explanation for why linear regression is a poor fit
Posted: Feb 4, 2013 7:25 PM

"David Jones" wrote in message news:kepism$als$1@speranza.aioe.org...

"Paul" wrote in message

On Monday, February 4, 2013 3:55:36 PM UTC-5, em.de...@gmail.com wrote:

> I haven't taken stats in a few years and recently there have been a lot
> thrown around my work place, including the attached graph (and raw data).
> I realize that a low R2 means that the linear regression is not a good fit,

First, as Dave notes, you have time series data here. Moreover, the spacing
of the dates is irregular. If the regression is col/day v. date, I hope
whoever ran the regression used the actual dates and not index (1, 2, 3,
...) for the predictor variable. (Also, as Dave mentions, there may be
better tools than simple regression given that it's a time series.)
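
To illustrate the point about irregular spacing, here is a minimal sketch in Python. The dates and values are made up for illustration (they are not the Samish River data), and `ols_slope` is just a hypothetical helper. The values are constructed to fall by exactly 0.1 per day, so regressing on elapsed days recovers that rate, while regressing on the index 1, 2, 3, ... gives a slope "per observation" that has no fixed meaning in time.

```python
from datetime import date

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical, irregularly spaced sampling dates (not the real data).
dates = [date(2012, 1, 3), date(2012, 1, 10), date(2012, 3, 20),
         date(2012, 9, 4), date(2012, 9, 11)]

days = [(d - dates[0]).days for d in dates]   # actual elapsed time in days
index = list(range(1, len(dates) + 1))        # 1, 2, 3, ... (ignores spacing)

# Made-up response that declines by exactly 0.1 units per day.
values = [100.0 - 0.1 * d for d in days]

slope_per_day = ols_slope(days, values)    # recovers the true rate, -0.1 per day
slope_per_obs = ols_slope(index, values)   # a different number with different units

print("slope using elapsed days:", slope_per_day)
print("slope using index:       ", slope_per_obs)
```

With the index as predictor the fit is also no longer exactly linear, because the long gaps between some samples are squeezed into unit steps.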

Second, your data has high variance. A low R^2 does not necessarily signal a
poor fit (in the sense of incorrect model), although it may signal that the
regression model does not have enough predictive power to do you much good.
When the data is quite noisy, sometimes a low R^2 is the best you can do
(and sometimes the model actually has some value).
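
A small simulation may make this concrete. The sketch below generates data from a model that really is linear, with noise that is large relative to the trend, and then computes R^2 for the least-squares line. All numbers (sample size, slope, noise level, seed) are assumptions chosen only for illustration.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: the straight-line model is exactly correct here,
# but the noise standard deviation is large compared with the trend.
n = 500
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
y = [10.0 - 0.5 * xi + rng.gauss(0.0, 15.0) for xi in x]

mx = sum(x) / n
my = sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)   # sample correlation
r2 = r ** 2                      # R^2 of the simple linear regression

print("fitted slope:", sxy / sxx)
print("R^2:", r2)
```

R^2 comes out small even though the linear form is the right one, so a low R^2 by itself says the data are noisy, not that the model is wrong.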

> but it produces a p-value of 0.025.

If this is the p-value of the usual F-test, all it says is that your trend
model fits better than assuming a constant mean. It does not say the trend
model is correct (or that a better model cannot be found).

> I can't formulate a solid argument because I don't understand the
> material well enough. Am I incorrect in saying this is a poor fit? Even
> visually to me it looks like a poor fit. Additionally, he says things
> like: "FC Count at Samish River/Thomas Road: N = 498, r2 = 0.01, p =
> 0.025, meaning it is significant at 97.5% confidence" I know you can't
> use P-values to describe stats like this.

Mixing "significant at" and "confidence" is IMHO sloppy use of terminology,
but the underlying intent is not necessarily wrong.
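
As a rough check that the quoted figures hang together: in a simple linear regression the usual F statistic can be recovered from r^2 and N alone, F = r^2 / (1 - r^2) * (N - 2), with 1 and N - 2 degrees of freedom. The sketch below plugs in the quoted N = 498 and r2 = 0.01. Two assumptions to flag: r2 is a rounded value, and the p-value uses a normal approximation to the t distribution (reasonable with 496 degrees of freedom, but still an approximation).

```python
import math

n = 498     # sample size quoted in the thread
r2 = 0.01   # quoted r^2 (rounded, so the results below are approximate)

# F statistic for the slope in a simple linear regression.
f_stat = r2 / (1.0 - r2) * (n - 2)

# With one numerator degree of freedom, F is the square of the t statistic.
t_stat = math.sqrt(f_stat)

# Two-sided p-value, using the normal approximation to t with n - 2 df.
p_approx = math.erfc(t_stat / math.sqrt(2.0))

print(f"F(1, {n - 2}) = {f_stat:.2f}, approximate p = {p_approx:.3f}")
```

The approximate p-value lands close to the reported 0.025, so a "significant" test and a fit that explains only about 1% of the variance are not in conflict when N is this large.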

> I need help explaining why this data isn't showing a significant
> declining trend with a linear regression (unless of course I am
> incorrect.)

It looks like a declining trend to me. Whether the rate is _practically_
significant is an open question. I would not be surprised if it proved to be
statistically significant even with a more careful analysis.

=================================================

If something like an F-test is being used, the plots show that the
assumptions necessary for the validity of this test clearly do not hold.
However, something like a permutation test might be applicable and it looks
likely to find a significantly negative slope for the regression line.
However there appear to be other important changes in pattern going on, not
well described by a linear trend in location.
The plot suggests a change in behaviour at the lower end of the
distribution. I see that previous reports from this project have used
log-transformed data, and this would be easy to try and might provide a
better view for the lower end of the observation scale. Similarly, past
work has at least looked at differences across seasons and it would be worth
extending this to look at the possibility of different trends in different
parts of the year. (Found previous work at
https://fortress.wa.gov/ecy/publications/publications/0803029.pdf page 41 on
(Nov 2008): this previous report also mixes significance and confidence
badly, as noted by Paul above.)
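
For anyone wanting to try the permutation test mentioned above, here is a minimal sketch of a one-sided test for a negative slope. The function names and the demo data are hypothetical, not taken from the report. The test assumes the observations are exchangeable under the null hypothesis of no trend, which serial correlation or seasonality in a time series could undermine. To follow the log-transform suggestion, one could pass log-transformed counts as `y` (handling any zero counts first).

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def permutation_pvalue_negative_slope(x, y, n_perm=999, seed=0):
    """One-sided permutation p-value for the alternative 'slope < 0'.

    Shuffles y against x to build the null distribution of the slope and
    counts how often a shuffled slope is at least as negative as the
    observed one. The +1 terms count the observed arrangement itself.
    """
    rng = random.Random(seed)
    observed = slope(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if slope(x, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical demo data with a clear downward trend plus noise.
demo_rng = random.Random(1)
x_demo = list(range(30))
y_demo = [50.0 - xi + demo_rng.gauss(0.0, 5.0) for xi in x_demo]

obs_slope, p_value = permutation_pvalue_negative_slope(x_demo, y_demo)
print("observed slope:", obs_slope)
print("one-sided permutation p-value:", p_value)
```

Unlike the F-test, this does not rely on normally distributed, constant-variance errors, which is why it may suit data like these better.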

David Jones

http://www.skagitcounty.net/PublicWorksCleanWater/Documents/wqreport2010/2010%20Annual%20Report%20Final.pdf
, which has plots that show seasonal behaviour for various quantities
including FC, possibly indicating a step-change in behaviour for one site on
Thomas Creek, data to 2009)

David Jones
