On 2013-01-10, Michael Press <email@example.com> wrote:
> In article <firstname.lastname@example.org>,
> Ray Koopman <email@example.com> wrote:
>> It all depends on what you want. Look up the Gauss-Markov theorem.
>> To justify the usual OLS estimates of the regression coefficients,
>> the errors need only to be unbiased, uncorrelated, and homoscedastic,
>> but to justify all the usual p-values and confidence regions, the
>> errors must be iid normal.
>> However, that's considering only the theoretical justification.
>> In practice, what matters is not whether the assumptions are right
>> or wrong, but how wrong they are -- they're never exactly right.
>> Normality is probably the least important assumption. The most
>> important things to worry about are the general form of the model
>> and whether it includes all the relevant predictor variables. Then
>> you ask how correlated and/or heteroscedastic the errors might be.
>> Finally, you might wonder about shapes of the error distributions.
>> Minor departures from normality are inconsequential. Nothing in the
>> real world is exactly normal, and any test of normality will reject
>> if the sample size is big enough.
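The last point above is easy to check by simulation. Here is a rough sketch in plain Python (the contaminated-normal mixture and the simple kurtosis-based z statistic are my own illustrative choices, standing in for a formal normality test such as D'Agostino's): a distribution that is only mildly non-normal still gets rejected decisively once the sample is large.

```python
import random, math

random.seed(1)
n = 20000
# Contaminated normal: mostly N(0,1), occasionally N(0,3).
# Visually close to normal, but with excess kurtosis.
xs = [random.gauss(0, 3 if random.random() < 0.05 else 1)
      for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
# Sample excess kurtosis; zero under exact normality.
g2 = sum((x - mean) ** 4 for x in xs) / n / var ** 2 - 3
# Under normality g2 is roughly N(0, 24/n), so this z should be
# about N(0,1); here it comes out enormous, i.e. a clear rejection.
z = g2 / math.sqrt(24 / n)
print(round(z, 1))
```

At this sample size the z statistic lands far beyond any conventional cutoff, even though the mixture would pass casual inspection of a histogram.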
> Assuming that the errors are normally distributed is
> equivalent to assuming that the errors have mean zero
> and fixed variance (using the new word I heard today:
> homoscedastic) in that those assumptions least affect
> how close our analysis gets to discerning the
> parameters of interest.
This is totally wrong. Mean zero is important if a constant term is being estimated; in general, what is important is homoscedasticity and lack of covariance between the variables upon which regression is being done and the "errors".
This can happen quite easily without normality. As has been posted, the estimates behave just about as well without normality as with, but the traditional tests, with nothing to support them, may well come out differently.
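A small simulation makes the first half of that concrete. This is only a sketch (the uniform regressor, the shifted-exponential error distribution, and the hand-rolled `fit_ols` helper are all my own choices for illustration): OLS recovers essentially the same coefficients whether the errors are normal or strongly skewed, so long as they are mean-zero and homoscedastic.

```python
import random

random.seed(0)
n = 5000
beta0, beta1 = 2.0, 0.5

def fit_ols(xs, ys):
    """Ordinary least squares for a single regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

xs = [random.uniform(0, 10) for _ in range(n)]
# Normal errors vs. skewed errors (shifted exponential, mean zero,
# same variance): both satisfy the Gauss-Markov conditions.
e_norm = [random.gauss(0, 1) for _ in range(n)]
e_skew = [random.expovariate(1.0) - 1.0 for _ in range(n)]

results = []
for errs in (e_norm, e_skew):
    ys = [beta0 + beta1 * x + e for x, e in zip(xs, errs)]
    b0, b1 = fit_ols(xs, ys)
    results.append((b0, b1))
    print(round(b0, 2), round(b1, 2))
```

Both runs land very close to the true (2.0, 0.5); what the skewed errors would distort is not the point estimates but the small-sample p-values and confidence intervals built on the normality assumption.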
> Normality is a bad assumption
> only if we are suppressing some knowledge of how the
> errors are distributed beyond the initial assumptions.
> If it somehow turns out that a different set of
> assumptions about the errors is better, for some value
> of better, then that is called scientific discovery,
> not bad assumptions. We should get to the point where
> we cannot wring any more meaning out of the data and
> are left with errors normally distributed around zero.
This is nonsense. If the observations are "good", meaning the errors in the estimates will be small, the error distribution will be of little consequence. It is only in poorly fitting models that the distribution may be of consequence.
> It is not that I said anything more than you about the
> mathematics and statistics---only voiced my perspective
> on the process. If you see that I am in error, normal
> for me, I welcome hearing about it.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
firstname.lastname@example.org  Phone: (765)494-6054  FAX: (765)494-0558