Date: Jan 9, 2013 10:35 PM
Subject: Test constantness of normality of residuals from linear regression

After much browsing of Wikipedia and the web, I used both a normal
probability plot and the Anderson-Darling test to check the normality
of the residuals from a simple linear regression (SLR) of 6 data points.
The results were very good. However, SLR doesn't just assume that the
residuals are normal. It also assumes that the standard deviation of
the PDF that gives rise to the residuals is constant along the
horizontal axis. Is there a way to test for this if none of the data
points share the same value of the independent variable? I want to be
able to show that there are no gross curves or spreading/focusing of
the residuals.
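
For concreteness, here is a minimal pure-Python sketch of the normality check described above. It fits the SLR y = a + b*x by ordinary least squares, takes the residuals, and computes the Anderson-Darling A^2 statistic for normality, with the mean and standard deviation estimated from the residuals. The function names `fit_slr` and `anderson_darling_normal` and the data values are hypothetical and chosen only for illustration. The adjusted statistic uses the small-sample factor (1 + 0.75/n + 2.25/n^2) that is normally paired with critical-value tables for the estimated-parameter case.

```python
import math


def fit_slr(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, residuals)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def anderson_darling_normal(r):
    """Anderson-Darling A^2 for normality, mean and sd estimated from r.

    Returns (A^2, adjusted A*^2), where A*^2 = A^2 * (1 + 0.75/n + 2.25/n^2).
    """
    n = len(r)
    m = sum(r) / n
    s = math.sqrt(sum((ri - m) ** 2 for ri in r) / (n - 1))
    z = sorted((ri - m) / s for ri in r)
    # Standard normal CDF and survival function, computed via erfc.
    cdf = [0.5 * math.erfc(-zi / math.sqrt(2.0)) for zi in z]
    sf = [0.5 * math.erfc(zi / math.sqrt(2.0)) for zi in z]
    total = 0.0
    for i in range(1, n + 1):
        # Order statistic z_(i) is z[i-1]; z_(n+1-i) is z[n-i].
        total += (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(sf[n - i]))
    a2 = -n - total / n
    return a2, a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


if __name__ == "__main__":
    # Hypothetical data: 6 points, no repeated x values.
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
    a, b, resid = fit_slr(x, y)
    a2, a2_adj = anderson_darling_normal(resid)
    print(f"intercept={a:.3f} slope={b:.3f}")
    print(f"A^2={a2:.3f} adjusted A*^2={a2_adj:.3f}")
```

With only 6 residuals, any normality test has very little power. A "very good" result is therefore weak evidence of normality rather than a confirmation.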

In electrical engineering signal theory, the horizontal axis is usually
time, and the Fourier transform (FT) lets trends in the time domain show
up in the frequency domain. Intuitively, I would set up the data as a
scatter graph of residuals plotted against the independent variable
(which would be treated as time). Gross curves show up as low-frequency
content; there should be none if the residuals are truly iid, so the
spectrum should look like white noise. The usual way to get the power
spectrum is to take the FT of the autocorrelation function, which
should itself resemble an impulse at zero lag. This only shows that the
samples are uncorrelated (strictly weaker than independence), not that
they are iid normal with the same spread everywhere along the
horizontal axis.
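
The spectral check above can be sketched as follows. This assumes the residuals are sorted by the independent variable and treated as equally spaced samples in "time". The FT idea needs even spacing, and any unequal gaps in x are simply ignored here. The code computes the biased sample autocovariance and the periodogram, and it evaluates the FT of the autocovariance, which equals the periodogram exactly (the finite-sample form of the Wiener-Khinchin relation). The function names and the residual values are hypothetical.

```python
import cmath
import math


def sample_autocovariance(r):
    """Biased sample autocovariance gamma(h), h = 0..n-1, after removing the mean."""
    n = len(r)
    m = sum(r) / n
    d = [ri - m for ri in r]
    return [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(n)]


def periodogram(r, f):
    """|DFT|^2 / n of the mean-removed sequence at frequency f (cycles per sample)."""
    n = len(r)
    m = sum(r) / n
    s = sum((r[t] - m) * cmath.exp(-2j * math.pi * f * t) for t in range(n))
    return abs(s) ** 2 / n


def spectrum_from_autocov(gamma, f):
    """FT of the autocovariance over lags -(n-1)..(n-1), using gamma(-h) = gamma(h)."""
    total = gamma[0]
    for h in range(1, len(gamma)):
        total += 2.0 * gamma[h] * math.cos(2.0 * math.pi * f * h)
    return total


if __name__ == "__main__":
    # Hypothetical residuals and x values; order the residuals by x ("time").
    x = [0.4, 1.1, 2.0, 2.7, 3.5, 4.6]
    resid = [0.30, -0.05, -0.35, -0.30, 0.05, 0.35]
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [resid[i] for i in order]
    n = len(r)
    gamma = sample_autocovariance(r)
    print("lag-1 autocorrelation:", gamma[1] / gamma[0])
    for k in range(1, n // 2 + 1):
        f = k / n
        print(k, round(periodogram(r, f), 4), round(spectrum_from_autocov(gamma, f), 4))
```

A leftover curve in the residuals (like the U shape in the hypothetical values) concentrates power at k = 1. Note that with only 6 residuals there are just three nonzero Fourier frequencies (k = 1, 2, 3), so this check is very coarse at that sample size.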

As for spreading or narrowing of the scatter, I guess that can be
modelled in time as a truly random signal multiplied by a linear (or
exponential) attenuation function. The latter acts like a modulation
envelope, so their Fourier transforms will then convolve in some weird
way. I'm not sure whether this is a fruitful direction for identifying
trends in the residuals. It starts to get convoluted.
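
The envelope model can be checked numerically. This sketch multiplies white Gaussian noise by a linear envelope and verifies the DFT product rule DFT(e*w)[k] = (1/N) * sum_j E[j] * W[(k - j) mod N]. In other words, the spectrum of the product is the circular convolution of the two spectra. The seed, length, and envelope slope are arbitrary choices made for illustration.

```python
import cmath
import math
import random


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


random.seed(0)
n = 32
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
# Linear envelope: the scatter widens from 1x to 3x across the axis.
envelope = [1.0 + 2.0 * t / (n - 1) for t in range(n)]
modulated = [e * w for e, w in zip(envelope, noise)]

lhs = dft(modulated)
rhs = [c / n for c in circular_convolve(dft(envelope), dft(noise))]
max_err = max(abs(p - q) for p, q in zip(lhs, rhs))
print("max |DFT(e*w) - (1/N) DFT(e) conv DFT(w)| =", max_err)
```

This also suggests why the direction may not pay off. If the noise samples are independent with common variance sigma^2, the cross terms average out, and the expected periodogram of the product is sigma^2 * (1/N) * sum_t e_t^2 at every frequency. A purely multiplicative envelope therefore leaves the expected spectrum flat: residuals that are independent but spreading still look like white noise.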

Surely there must be a less klugy way from the world of statistics? I
realize that my sample size will probably be too small for many of the
conceptually straightforward approaches. For example, if I had a wealth
of data points, I could segment the horizontal axis and then do a
normality test on each segment. This would also give estimates of mu
and sigma for each segment, which could then be compared across
segments. So for the sake of conceptual gratification, I'm hoping for a
more elegant test for the ideal case of many data points. If there is
also a test for small sample sizes, so much the better (though I'm not
holding my breath).
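
The segmenting idea for the many-points case might look like this. The code sorts the residuals by x, cuts them into contiguous segments of nearly equal size, and reports the mean and sample standard deviation of each segment. The ratio of the largest to the smallest segment variance is the statistic of Hartley's F-max test. That test assumes normal data and equal segment sizes and needs a table for its critical values, so here the ratio serves only as a descriptive indicator of spreading/focusing. The function names and the segment count are hypothetical.

```python
import math


def segment_stats(x, resid, n_segments):
    """Sort residuals by x, cut into contiguous segments of near-equal size,
    and return a list of (mean, sample sd, count) per segment."""
    if len(x) < 2 * n_segments:
        raise ValueError("need at least 2 points per segment")
    pairs = sorted(zip(x, resid))
    n = len(pairs)
    stats = []
    for s in range(n_segments):
        lo = s * n // n_segments
        hi = (s + 1) * n // n_segments
        seg = [r for _, r in pairs[lo:hi]]
        m = sum(seg) / len(seg)
        sd = math.sqrt(sum((r - m) ** 2 for r in seg) / (len(seg) - 1))
        stats.append((m, sd, len(seg)))
    return stats


def variance_ratio(stats):
    """Largest / smallest segment variance (Hartley's F-max statistic)."""
    variances = [sd ** 2 for _, sd, _ in stats]
    return max(variances) / min(variances)


if __name__ == "__main__":
    # Hypothetical residuals whose scatter grows with x.
    x = [float(i) for i in range(12)]
    resid = [0.1, -0.2, 0.15, -0.1, 0.3, -0.4, 0.5, -0.6, 0.8, -0.7, 1.0, -0.9]
    stats = segment_stats(x, resid, 3)
    for mean, sd, count in stats:
        print(f"n={count} mean={mean:.3f} sd={sd:.3f}")
    print("max/min variance ratio:", round(variance_ratio(stats), 2))
```

A ratio near 1 is consistent with constant spread, and a large ratio points to spreading or focusing. Each segment could also be passed to a normality test, as the paragraph above suggests.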