The Math Forum




Topic: Explanation for why linear regression is a poor fit
Replies: 8   Last Post: Feb 15, 2013 10:36 AM

Richard Ulrich

Re: Explanation for why linear regression is a poor fit
Posted: Feb 4, 2013 7:47 PM

On Mon, 04 Feb 2013 18:13:22 -0500, Rich Ulrich wrote:

>On Mon, 4 Feb 2013 12:55:36 -0800 (PST), wrote:

>>I haven't taken stats in a few years, and recently a lot of statistics have been thrown around my workplace, including the attached graph (and raw data). I realize that a low R^2 means that the linear regression is not a good fit, but

[snip, rest of post; start of my previous reply]

>The highest 5 outcome scores are all in the first half of the graph,
>and the very highest one is near the beginning. Does that
>seem important? That's most of the effect.

I discovered that I can re-format the downloaded chart, even
though it is read-only, in order to show a logarithmic spacing
for the Y axis. The data are pretty well distributed, from a
minimum of 2 to a maximum of 17000.

That gives charts that look a lot like the charts in the report that
David Jones links to. So taking the log of the measures
does give a *statistical* model whose errors are much
better behaved, and one that someone else seems to be using
(presuming these are the same data).
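
To illustrate what "better behaved" errors can mean here, the following is a minimal sketch on synthetic data, not the poster's data. The series is an assumption: values with no true trend, drawn so that they span several orders of magnitude, roughly like a range of 2 to 17000. The helper names ols and skewness are made up for this example. It fits a straight line to the raw values and to their base-10 logs, then compares the skewness of the two sets of residuals.

```python
import math
import random


def ols(x, y):
    """Return (intercept, slope, residuals) for a least-squares line."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, resid


def skewness(values):
    """Moment-based sample skewness; near 0 for symmetric residuals."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic, hypothetical series: log-normal noise, no true trend.
random.seed(0)
t = list(range(100))
y = [math.exp(random.gauss(5.0, 2.0)) for _ in t]

_, slope_raw, resid_raw = ols(t, y)
_, slope_log, resid_log = ols(t, [math.log10(v) for v in y])

skew_raw = skewness(resid_raw)
skew_log = skewness(resid_log)
print("residual skewness, raw scale:", round(skew_raw, 2))
print("residual skewness, log scale:", round(skew_log, 2))
```

On data like this, the raw-scale residuals are dominated by a handful of very large values, while the log-scale residuals are roughly symmetric. That is the sense in which the log model is easier to defend.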

>On the other hand, very few people would say that any
>time series is properly tested by a simple linear regression
>when there are autocorrelation effects... which there almost
>always are.
>As to the size of the effect, and how few cases it depends
>on -- I'm "pretty sure" that the trend becomes n.s. if you
>remove the top 5 points; "probably" for the top 3, and
>"maybe" for removing the top one alone.
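
The "remove the top points" check can be sketched as follows. This is a made-up example, not the original data: a flat baseline alternating between 1 and 2, with three large values placed early in the series. The helper names slope_t and drop_top are hypothetical, and |t| > 2 is used only as a rough stand-in for a 5% significance test (the exact cutoff depends on the degrees of freedom).

```python
import math


def slope_t(x, y):
    """t-statistic (slope / standard error) for a least-squares line."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    sse = sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope / se


def drop_top(x, y, k):
    """Return copies of x and y with the k largest y values removed."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    keep = sorted(order[: len(y) - k])
    return [x[i] for i in keep], [y[i] for i in keep]


# Hypothetical series: flat baseline plus three early spikes.
x = list(range(20))
y = [1 + (i % 2) for i in x]
y[0], y[2], y[4] = 60, 45, 30

t_full = slope_t(x, y)
x3, y3 = drop_top(x, y, 3)
t_drop3 = slope_t(x3, y3)

for k in (0, 1, 3):
    xk, yk = drop_top(x, y, k)
    print("top", k, "removed: t =", round(slope_t(xk, yk), 2))
```

In this toy series the apparent downward trend rests entirely on the three early spikes. Once they are removed, the slope is no longer distinguishable from zero by the rough |t| > 2 rule.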

I can imagine that it is a useful statement, to be able to say
that certain REALLY high levels are no longer being reached.
But it needs, I think, a regression on the log-transformed
values in order to make a proper statement on the (lack of)
trend in that pollution control.
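
A regression on the log-transformed values is still subject to the autocorrelation caveat quoted above, so one rough way to look at both is sketched below. The data are synthetic and hypothetical, and the helper names fit_line, durbin_watson and factor_per_step are made up for this example. The Durbin-Watson statistic is one common check for lag-1 autocorrelation in residuals: values near 2 suggest little autocorrelation, and values well below 2 suggest positive autocorrelation. A slope b on the log10 scale corresponds to multiplying the measure by 10^b per time step.

```python
import math
import random


def fit_line(x, y):
    """Return (intercept, slope, residuals) for a least-squares line."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, resid


def durbin_watson(resid):
    """Sum of squared successive differences over sum of squares."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def factor_per_step(slope_log10):
    """Multiplicative change per time step implied by a log10 slope."""
    return 10.0 ** slope_log10


# Synthetic, hypothetical series: independent log-normal values, no trend.
random.seed(1)
t = list(range(60))
y = [math.exp(random.gauss(5.0, 2.0)) for _ in t]

_, slope, resid = fit_line(t, [math.log10(v) for v in y])
dw = durbin_watson(resid)
print("log10 slope per step:", round(slope, 4))
print("implied factor per step:", round(factor_per_step(slope), 4))
print("Durbin-Watson:", round(dw, 2))
```

If the Durbin-Watson value for real data came out far from 2, the usual standard error of the slope would not be trustworthy, and a time-series model would be the safer basis for any statement about trend.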

Rich Ulrich

