Re: Test constantness of normality of residuals from linear regression
Posted: Jan 10, 2013 12:48 AM
On Jan 9, 7:35 pm, Paul <paul.domas...@gmail.com> wrote:
> After much browsing of Wikipedia and the web, I used both normal
> probability plot and Anderson-Darling to test the normality of
> residuals from a simple linear regression (SLR) of 6 data points.
> Results were very good. However, SLR doesn't just assume that the
> residuals are normal. It assumes that the standard deviation of the
> PDF that gives rise to the residuals is constant along the horizontal
> axis. Is there a way to test for this if none of the data points have
> the same value for the independent variable? I want to be able to
> show that there is no gross curves or spreading/focusing of the
> scatter.
>
> In electrical engineering signal theory, the horizontal axis is time.
> Using Fourier Transform (FT), time-frequency domains can show trends.
> Intuitively, I would set up the data as a scatter graph of residuals
> plotted against the independent variable (which would be treated as
> time). Gross curves show up as low-frequency content. There should
> be none if residuals are truly iid. The spectrum should look like
> white noise. The usual way to get the power spectrum is the FT of the
> autocorrelation function, which itself should resemble an impulse at
> zero. This just shows indepedence of samples, not constant iid normal
> along the horizontal axis.
>
> As for spreading or narrowing of the scatter, I guess that can be
> modelled in time as a multiplication of a truly random signal by a
> linear (or exponential) attenuation function. The latter acts like a
> modulation envelope. Their power spectrums will then convolve in some
> weird way. I'm not sure if this is a fruitful direction for
> identifying trends in the residuals. It starts to get convoluted
> pretty quickly.
>
> Surely there must be a less klugy way from the world of statistics? I
> realize that my sample size will probably be too small for many
> conceptual approaches. For example, if I had a wealth of data points,
> I could segment the horizontal axis, then do a normality test on each
> segment. This would generate mu's and sigma's as well, which could
> then be compared across segments. So for the sake of conceptual
> gratification, I'm hoping for a more elegant test for the ideal case
> of many data points. If there is also a test for small sample sizes,
> so much the better (though I don't hold my breath).

If y|x = a + b*x + e, where the errors e are iid random variables with zero mean and common variance s^2, and you do an ordinary least squares fit of that model to (x1,y1), ..., (xn,yn), then the theoretical variance of the residual at xi is

  s^2 * { 1 - 1/n - (xi-m)^2 / sum_j (xj-m)^2 },

where m is the mean of x1, ..., xn. (With s^2 = 1 this is just the factor in braces.) In words, residuals whose x is far from the mean tend to be smaller than those whose x is near the mean, even when the error variance really is constant. (This is the effect of "leverage": points far from the mean have more leverage on the regression line, pulling it closer to them.) Note that normality is not required for this result.
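
To make the leverage effect concrete, here is a minimal sketch in plain Python. It computes the factor 1 - 1/n - (xi-m)^2 / sum_j (xj-m)^2 for each point and compares it with residual variances from repeated simulated fits. Everything specific in it is an assumption for illustration only: the six x values, the true line a=1, b=2, the unit error variance, and the function names. The simulated errors are deliberately uniform rather than normal, since the result does not depend on normality.

```python
import math
import random


def residual_variance_factors(x):
    """Factor 1 - 1/n - (xi-m)^2 / sum_j (xj-m)^2 for each xi."""
    n = len(x)
    m = sum(x) / n
    sxx = sum((xj - m) ** 2 for xj in x)
    return [1 - 1 / n - (xi - m) ** 2 / sxx for xi in x]


def ols_residuals(x, y):
    """Residuals from an ordinary least squares fit of y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulated_residual_variances(x, a=1.0, b=2.0, sigma=1.0,
                                 trials=20000, seed=0):
    """Average squared residual at each xi over many simulated data sets.

    Errors are uniform on [-sigma*sqrt(3), sigma*sqrt(3)], which has mean 0
    and variance sigma^2 (an illustrative, non-normal choice).
    """
    rng = random.Random(seed)
    half_width = sigma * math.sqrt(3.0)
    sums = [0.0] * len(x)
    for _ in range(trials):
        y = [a + b * xi + rng.uniform(-half_width, half_width) for xi in x]
        for i, r in enumerate(ols_residuals(x, y)):
            sums[i] += r * r  # residuals have mean zero, so E[r^2] is the variance
    return [s / trials for s in sums]


# Hypothetical design: six equally spaced x values, none repeated.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
factors = residual_variance_factors(x)
simulated = simulated_residual_variances(x)

for xi, f, s in zip(x, factors, simulated):
    print(f"x={xi:.0f}  theory={f:.4f}  simulated={s:.4f}")
```

The factors sum to n - 2, and the end points get the smallest values. So with only 6 points, some apparent narrowing of the residual scatter toward the ends of the x range is expected even when the error variance is constant. Under the same assumptions, dividing each residual by the square root of its factor puts the residuals on a common scale before looking for spreading.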