
Re: Explanation for why linear regression is a poor fit
Posted: Feb 4, 2013 7:47 PM


On Mon, 04 Feb 2013 18:13:22 -0500, Rich Ulrich <rich.ulrich@comcast.net> wrote:
>On Mon, 4 Feb 2013 12:55:36 -0800 (PST), em.derenne@gmail.com wrote:
>
>>Hi
>>I haven't taken stats in a few years and recently there have been a lot thrown around my work place, including the attached graph (and raw data). I realize that low R2 mean that the linear regression is not a good fit, but
[snip, rest of post; start of my previous reply]
>The highest 5 outcome scores are all in the first half of the graph,
>and the very highest one is near the beginning. Does that
>seem important? That's most of the effect.
>
I discovered that I can reformat the downloaded chart, even though it is read-only, in order to show a logarithmic spacing for the Y axis. On that scale the data are pretty well distributed, from a minimum of 2 to a maximum of 17000.
That gives charts that look a lot like charts in the report that David Jones gives a link to.... so, taking the log of the measures does give a *statistical* model that has errors that are much better behaved, and one that someone else seems to be using (presuming these are the same data).
>On the other hand, very few people would say that any
>time series is properly tested by a simple linear regression
>when there are autocorrelation effects... which there almost
>always are.
>
>As to the size of the effect, and how few cases it depends
>on -- I'm "pretty sure" that the trend becomes n.s. if you
>remove the top 5 points; "probably" for the top 3, and
>"maybe" for removing the top one alone.
I can imagine that it is a useful statement, to be able to say that certain REALLY high levels are no longer being reached. But it needs, I think, a regression on the log-transformed values in order to make a proper statement on the (lack of) trend in that pollution control.
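
A minimal sketch of that comparison, in Python: fit a straight-line trend to the raw values, to the log10 values, and to the raw values again after dropping the top 1, 3 and 5 points. The data here are synthetic (skewed, with no built-in trend) and stand in for the posted measurements, which are not reproduced; the helper names `ols_trend` and `drop_top` are made up for this illustration. The t statistic is the ordinary least-squares one, so it ignores autocorrelation and is only a rough guide (|t| below about 2 is roughly "n.s." at the 5% level).

```python
import math
import random

def ols_trend(x, y):
    """Least-squares slope of y on x, plus the slope's t statistic."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se

def drop_top(x, y, k):
    """Remove the k largest y values; keep the rest in time order."""
    by_size = sorted(range(len(y)), key=lambda i: y[i])
    keep = sorted(by_size[:len(y) - k])
    return [x[i] for i in keep], [y[i] for i in keep]

# Synthetic stand-in data (an assumption, not the original measurements):
# log-normal values with no underlying trend.
random.seed(0)
t_idx = list(range(60))
raw = [math.exp(random.gauss(5.0, 2.0)) for _ in t_idx]
logged = [math.log10(v) for v in raw]

print("raw   slope, t:", ols_trend(t_idx, raw))
print("log10 slope, t:", ols_trend(t_idx, logged))
for k in (1, 3, 5):
    xk, yk = drop_top(t_idx, raw, k)
    print("raw, top", k, "removed:", ols_trend(xk, yk))
```

If the raw-scale slope changes a lot as the top few points are dropped while the log-scale slope stays put, the apparent trend is resting on those few extreme values.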
--
Rich Ulrich

