On Mon, 04 Feb 2013 18:13:22 -0500, Rich Ulrich <firstname.lastname@example.org> wrote:
>On Mon, 4 Feb 2013 12:55:36 -0800 (PST), email@example.com wrote:
>
>>Hi-
>>I haven't taken stats in a few years, and recently a lot of them have
>>been thrown around my workplace, including the attached graph (and raw
>>data). I realize that a low R2 means that the linear regression is not
>>a good fit, but
[snip, rest of post; start of my previous reply]
>The highest 5 outcome scores are all in the first half of the graph,
>and the very highest one is near the beginning. Does that
>seem important? That's most of the effect.
>
I discovered that I can reformat the downloaded chart, even though it is read-only, to use a logarithmic scale for the Y axis. On that scale the data are pretty well distributed, ranging from a minimum of 2 to a maximum of 17000.
That gives charts that look a lot like the charts in the report that David Jones links to. So taking the log of the measures does give a *statistical* model whose errors are much better behaved, and one that someone else seems to be using (presuming these are the same data).
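For anyone who wants to try the log-scale fit on the raw data, here is a minimal sketch in plain Python (standard library only). It regresses log10 of each measurement on its position in the series (0, 1, 2, ...), assuming the values are in time order and strictly positive. The helper names `ols` and `log_trend` are hypothetical, and the example series below is generated for illustration; it is not the data from this thread. It reports a t statistic for the slope rather than a p-value, and that t statistic assumes independent errors.

```python
import math


def ols(x, y):
    """Ordinary least squares fit of y = a + b*x.

    Returns (intercept, slope, slope_se, t_stat, residuals).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    s2 = sum(r * r for r in resid) / (n - 2)
    slope_se = math.sqrt(s2 / sxx)
    t_stat = slope / slope_se if slope_se > 0 else float("nan")
    return intercept, slope, slope_se, t_stat, resid


def log_trend(values):
    """Regress log10(value) on the time index 0..n-1 (values must be > 0)."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    t = list(range(len(values)))
    logs = [math.log10(v) for v in values]
    return ols(t, logs)


if __name__ == "__main__":
    # Illustrative series only: a steady 10% decline per step.
    series = [1000 * 0.9 ** k for k in range(20)]
    _, b, _, _, _ = log_trend(series)
    # A slope b on the log10 scale means each step multiplies the
    # typical level by 10**b (here about 0.9).
    print(b, 10 ** b)
```

On the log scale, a handful of very large values (like the 17000) no longer dominate the sum of squares, which is why the residuals tend to be better behaved than in the raw-scale fit.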
>On the other hand, very few people would say that any
>time series is properly tested by a simple linear regression
>when there are autocorrelation effects... which there almost
>always are.
>
>As to the size of the effect, and how few cases it depends
>on -- I'm "pretty sure" that the trend becomes n.s. if you
>remove the top 5 points; "probably" for the top 3, and
>"maybe" for removing the top one alone.
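Both quoted points can be checked directly on the raw data. The sketch below (plain Python, standard library only) does two things: it computes the Durbin-Watson statistic on the regression residuals as a rough check for lag-1 autocorrelation, and it refits the trend after dropping the k largest values so you can see how the slope's t statistic changes. The helper names are hypothetical. Judging "n.s." by |t| below roughly 2 is only an approximation for moderate n, and when the residuals are autocorrelated the ordinary t statistic overstates the evidence for a trend anyway.

```python
import math


def ols_slope_t(x, y):
    """Slope, its t statistic, and residuals for an OLS fit of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    t = slope / se if se > 0 else float("nan")
    return slope, t, resid


def durbin_watson(resid):
    """Durbin-Watson statistic on residuals in time order.

    Values near 2 suggest little lag-1 autocorrelation; values well
    below 2 suggest positive autocorrelation.
    """
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(r * r for r in resid)
    return num / den


def drop_top_k(x, y, k):
    """Refit after removing the k largest y values.

    Ties keep their original order, so earlier points are dropped first.
    """
    keep = sorted(range(len(y)), key=lambda i: y[i], reverse=True)[k:]
    keep.sort()  # restore time order so the residuals stay in sequence
    return ols_slope_t([x[i] for i in keep], [y[i] for i in keep])


def sensitivity(x, y, ks=(0, 1, 3, 5)):
    """Map each k to the slope t statistic after dropping the top k points."""
    return {k: drop_top_k(x, y, k)[1] for k in ks}
```

A typical use is `slope, t, resid = ols_slope_t(x, y)`, then `durbin_watson(resid)`, then `sensitivity(x, y)` to compare the t statistics for k = 0, 1, 3, 5.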
I can imagine that it is useful to be able to say that certain REALLY high levels are no longer being reached. But I think it takes a regression on the log-transformed values to make a proper statement about the (lack of) trend in that pollution control.