Paul
Re: Test constantness of normality of residuals from linear regression
Posted:
Jan 10, 2013 9:59 AM
On Jan 10, 12:48 am, Ray Koopman <koop...@sfu.ca> wrote:
> On Jan 9, 7:35 pm, Paul <paul.domas...@gmail.com> wrote:
> >
> > After much browsing of Wikipedia and the web, I used both normal
> > probability plot and Anderson-Darling to test the normality of
> > residuals from a simple linear regression (SLR) of 6 data points.
> > Results were very good. However, SLR doesn't just assume that the
> > residuals are normal. It assumes that the standard deviation of the
> > PDF that gives rise to the residuals is constant along the horizontal
> > axis. Is there a way to test for this if none of the data points have
> > the same value for the independent variable? I want to be able to
> > show that there are no gross curves or spreading/focusing of the
> > scatter.
> >
> > In electrical engineering signal theory, the horizontal axis is time.
> > Using Fourier Transform (FT), time-frequency domains can show trends.
> > Intuitively, I would set up the data as a scatter graph of residuals
> > plotted against the independent variable (which would be treated as
> > time). Gross curves show up as low-frequency content. There should
> > be none if residuals are truly iid. The spectrum should look like
> > white noise. The usual way to get the power spectrum is the FT of the
> > autocorrelation function, which itself should resemble an impulse at
> > zero. This just shows independence of samples, not constant iid normal
> > along the horizontal axis.
> >
> > As for spreading or narrowing of the scatter, I guess that can be
> > modelled in time as a multiplication of a truly random signal by a
> > linear (or exponential) attenuation function. The latter acts like a
> > modulation envelope. Their power spectrums will then convolve in some
> > weird way. I'm not sure if this is a fruitful direction for
> > identifying trends in the residuals. It starts to get convoluted
> > pretty quickly.
> >
> > Surely there must be a less kludgy way from the world of statistics? I
> > realize that my sample size will probably be too small for many
> > conceptual approaches. For example, if I had a wealth of data points,
> > I could segment the horizontal axis, then do a normality test on each
> > segment. This would generate mu's and sigma's as well, which could
> > then be compared across segments. So for the sake of conceptual
> > gratification, I'm hoping for a more elegant test for the ideal case
> > of many data points. If there is also a test for small sample sizes,
> > so much the better (though I don't hold my breath).
>
> If y|x = a + b*x + e, where the errors are iid random variables with
> zero means, and you do an ordinary least squares fit of that model to
> (x1,y1), ..., (xn,yn), then the theoretical variance of the residual
> for xi is 1 - 1/n - [(xi-m)^2 / sum{(xj-m)^2}], where m is the mean
> of x1, ..., xn. In words, residuals whose x is far from the mean tend
> to be smaller than those whose x is near the mean. (This is known as
> "leverage": points far from the mean have more "leverage" on the
> regression line, pulling it closer to them.) Note that normality is
> not required

Ray,
Thanks for the background. One of the 4 explicit assumptions of regression is that the PDF for the random errors is normal, according to Introductory Statistics by Prem S. Mann (3rd edition). Is this not correct? This is the reason I am learning about normality tests, and especially about the constantness of the PDF along the horizontal axis.
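
To make sure I am reading the variance expression in your reply correctly, here is a small Python sketch of it. I am assuming the expression is meant as a multiple of the error variance sigma^2, i.e. Var(residual_i) = sigma^2 * (1 - 1/n - (xi-m)^2 / sum{(xj-m)^2}). The function names and the six (x, y) values are made up for illustration; they are not my actual data.

```python
import math


def residual_variance_factors(x):
    """Return 1 - 1/n - (xi - m)^2 / sum_j (xj - m)^2 for each xi.

    Assumption: multiplying a factor by the error variance sigma^2
    gives the theoretical variance of the OLS residual at that xi.
    """
    n = len(x)
    m = sum(x) / n
    sxx = sum((xj - m) ** 2 for xj in x)
    return [1.0 - 1.0 / n - (xi - m) ** 2 / sxx for xi in x]


def ols_residuals(x, y):
    """Fit y = a + b*x by ordinary least squares and return the residuals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Illustrative data only (hypothetical values, equally spaced x).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

factors = residual_variance_factors(x)
residuals = ols_residuals(x, y)

# Dividing each residual by sqrt(factor) puts the residuals on a common
# scale, so that any remaining change in spread along x is not just the
# leverage effect.
scaled = [r / math.sqrt(f) for r, f in zip(residuals, factors)]

for xi, f, r, s in zip(x, factors, residuals, scaled):
    print(f"x={xi:4.1f}  factor={f:.4f}  residual={r:+.4f}  scaled={s:+.4f}")
```

The factors are smallest at the two ends of the x range and add up to n - 2, which matches the usual degrees of freedom for a straight-line fit. So if I plot residuals against x to look for spreading, it seems I should plot the scaled ones rather than the raw ones.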