Topic: Test constantness of normality of residuals from linear regression
Replies: 9   Last Post: Jan 12, 2013 7:01 PM

 Paul Posts: 513 Registered: 2/23/10
Re: Test constantness of normality of residuals from linear regression
Posted: Jan 10, 2013 9:59 AM

On Jan 10, 12:48 am, Ray Koopman <koop...@sfu.ca> wrote:
> On Jan 9, 7:35 pm, Paul <paul.domas...@gmail.com> wrote:
>
> > After much browsing of Wikipedia and the web, I used both normal
> > probability plot and Anderson-Darling to test the normality of
> > residuals from a simple linear regression (SLR) of 6 data points.
> > Results were very good.  However, SLR doesn't just assume that the
> > residuals are normal.  It assumes that the standard deviation of the
> > PDF that gives rise to the residuals is constant along the horizontal
> > axis.  Is there a way to test for this if none of the data points have
> > the same value for the independent variable?  I want to be able to
> > show that there are no gross curves or spreading/focusing of the
> > scatter.

>
> > In electrical engineering signal theory, the horizontal axis is time.
> > Using Fourier Transform (FT), time-frequency domains can show trends.
> > Intuitively, I would set up the data as a scatter graph of residuals
> > plotted against the independent variable (which would be treated as
> > time).  Gross curves show up as low-frequency content.  There should
> > be none if residuals are truly iid.  The spectrum should look like
> > white noise.  The usual way to get the power spectrum is the FT of the
> > autocorrelation function, which itself should resemble an impulse at
> > zero.  This just shows independence of samples, not a constant normal
> > distribution along the horizontal axis.
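
A minimal sketch of the autocorrelation idea described above, assuming the residuals have already been sorted by the independent variable and that their order index can stand in for "time" (an assumption, since the x values need not be evenly spaced). The function name `autocorrelation` is hypothetical. With only 6 points the estimates are very noisy, so this is illustrative rather than a usable test.

```python
def autocorrelation(residuals, max_lag):
    """Sample autocorrelation of residuals (ordered by x) for lags 0..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)  # lag-0 sum of squares
    acf = []
    for k in range(max_lag + 1):
        num = sum(centered[t] * centered[t + k] for t in range(n - k))
        acf.append(num / denom)
    return acf


# Made-up residuals that alternate in sign: strongly negative lag-1 correlation.
example = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
acf = autocorrelation(example, max_lag=2)
```

For residuals that behave like white noise, the values at lags 1 and above should be close to zero, which corresponds to the "impulse at zero" mentioned in the post.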

>
> > As for spreading or narrowing of the scatter, I guess that can be
> > modelled in time as a multiplication of a truly random signal by a
> > linear (or exponential) attenuation function.  The latter acts like a
> > modulation envelope.  Their power spectra will then convolve in some
> > weird way.  I'm not sure if this is a fruitful direction for
> > identifying trends in the residuals.  It starts to get convoluted
> > pretty quickly.
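
One simpler way to look for the "envelope" described above, without going through spectra, is to check whether the size of the residuals trends with x. The sketch below is not a method proposed in the thread: it just computes the Pearson correlation between x and the absolute residuals, and the function name `abs_residual_trend` is hypothetical.

```python
import math


def abs_residual_trend(x, residuals):
    """Pearson correlation between x and |residual|.

    A value near +1 suggests the scatter spreads out as x grows,
    a value near -1 suggests it narrows. Undefined (division by zero)
    if all |residuals| are equal or all x are equal.
    """
    a = [abs(r) for r in residuals]
    n = len(x)
    mx = sum(x) / n
    ma = sum(a) / n
    sxy = sum((xi - mx) * (ai - ma) for xi, ai in zip(x, a))
    sxx = sum((xi - mx) ** 2 for xi in x)
    saa = sum((ai - ma) ** 2 for ai in a)
    return sxy / math.sqrt(sxx * saa)


# Made-up residuals whose magnitude grows linearly with x.
x = [1, 2, 3, 4, 5, 6]
spreading = [1, -2, 3, -4, 5, -6]
trend = abs_residual_trend(x, spreading)
```

With very few points this correlation is only a rough indicator, and it will not detect an envelope that widens and then narrows symmetrically.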

>
> > Surely there must be a less kludgy way from the world of statistics?  I
> > realize that my sample size will probably be too small for many
> > conceptual approaches.  For example, if I had a wealth of data points,
> > I could segment the horizontal axis, then do a normality test on each
> > segment.  This would generate mu's and sigma's as well, which could
> > then be compared across segments.  So for the sake of conceptual
> > gratification, I'm hoping for a more elegant test for the ideal case
> > of many data points.  If there is also a test for small sample sizes,
> > so much the better (though I don't hold my breath).
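
The segmenting idea for the many-points case might look like the sketch below. It assumes equal-count segments after sorting by x and reports only the mean and sample standard deviation per segment; the per-segment normality test is left out. The name `segment_stats` and the data are made up for illustration.

```python
import statistics


def segment_stats(x, residuals, n_segments):
    """Split residuals into equal-count segments along x.

    Returns a list of (mean, sample standard deviation) per segment.
    Each segment needs at least 2 points for the standard deviation.
    """
    pairs = sorted(zip(x, residuals))  # order residuals by x
    n = len(pairs)
    stats = []
    for s in range(n_segments):
        lo = s * n // n_segments
        hi = (s + 1) * n // n_segments
        chunk = [r for _, r in pairs[lo:hi]]
        stats.append((statistics.mean(chunk), statistics.stdev(chunk)))
    return stats


# Made-up residuals whose spread triples in the upper half of the x range.
x = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [1, -1, 1, -1, 3, -3, 3, -3]
stats = segment_stats(x, resid, n_segments=2)
```

Comparing the standard deviations across segments gives a first impression of whether the spread is constant; how large a difference is meaningful depends on the number of points per segment.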

>
> If y|x = a + b*x + e, where the errors are iid random variables with
> zero mean and variance s^2, and you do an ordinary least squares fit
> of that model to (x1,y1), ..., (xn,yn), then the theoretical variance
> of the residual for xi is
> s^2 * {1 - 1/n - [(xi-m)^2 / sum{(xj-m)^2}]}, where m is the mean
> of x1, ..., xn. In words, residuals whose x is far from the mean tend
> to be smaller than those whose x is near the mean. (This is known as
> "leverage": points far from the mean have more "leverage" on the
> regression line, pulling it closer to them.) Note that normality is
> not required
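
To make the quoted formula concrete, the sketch below evaluates the expression 1 - 1/n - (xi-m)^2 / sum{(xj-m)^2} for each x. The x values are invented (six equally spaced points, matching only the sample size mentioned in the thread), and `residual_variance_factors` is a hypothetical name.

```python
def residual_variance_factors(x):
    """Return 1 - 1/n - (xi - m)^2 / sum((xj - m)^2) for each xi.

    Multiplying each factor by the error variance gives the theoretical
    variance of the OLS residual at that x in simple linear regression.
    """
    n = len(x)
    m = sum(x) / n
    sxx = sum((xj - m) ** 2 for xj in x)
    return [1 - 1 / n - (xi - m) ** 2 / sxx for xi in x]


# Invented design: six equally spaced x values.
x = [1, 2, 3, 4, 5, 6]
factors = residual_variance_factors(x)
for xi, f in zip(x, factors):
    print(f"x = {xi}: factor = {f:.4f}")
```

The factors are smallest at the two ends of the x range and largest near the mean, and they sum to n - 2, the residual degrees of freedom. So even with perfectly constant error variance, the raw residuals are not expected to have constant spread along the horizontal axis.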

Ray,

Thanks for the background. One of the 4 explicit assumptions of
regression is that the PDF of the random errors is normal, according
to Introductory Statistics by Prem S. Mann (3rd edition). Is this not
correct? This is the reason I am learning about normality tests, and
especially about the constantness of the PDF along the horizontal axis.
