On Tue, 23 Apr 2013 12:15:20 -0700 (PDT), Paul <firstname.lastname@example.org> wrote:
>I'm perusing
>http://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test#Test_for_normality
>for a statistical test of normality for my residuals. H0 is normality
>of the residuals.
>
>For typical hypothesis testing, we want small significance, which
>means a small rejection region. Thus, any value of the statistic that
>falls in the rejection region is less likely due to chance (in
>combination with the truth of H0). In testing a drug for a medical
>effect, that makes sense because we often want to demonstrate an
>effect, and H0 is typically the absence of an effect. For values of
>the statistic that fall in the small rejection region, we can say that
>if H0 is true, it is highly unlikely for us to get this value for the
>statistic. The smaller the significance, the smaller the rejection
>region, and the less we are able to attribute to chance any values in
>the rejection region.
>
>For normality, we often want the opposite. We want H0, which is
>normality of the residuals. We cannot accept H0 to any degree of
>confidence using this setup of hypothesis testing, but at least we can
>make it very easy to reject H0, so any non-rejection of H0 is seen to
>be well founded. This implies a large rejection region and high
>significance. In fact, we might want a 95% rejection region, the
>counterpart of wanting a 5% rejection region when the intent is to
>demonstrate that rejection of H0 is not due to chance.
>
>Is this reasonable? I ask because the table in the above link shows
>significance values 1%, 2.5%, 5%, 10%, and 15%. These small values
>seem more like the values that one might be interested in when wanting
>to demonstrate valid rejection of H0.
No, it is not reasonable in the context of testing.
"Nothing is really normal." Any real data that you collect (or any residuals that you want to look at) are going to look non-normal -- at any p-value that you want to consider -- if you can take a sample N that is large enough.
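You can see this for yourself with SciPy's Anderson-Darling implementation. This is just a sketch; the seed, the choice of a t-distribution with 10 df as "mildly non-normal" data, and the sample sizes are my own arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Mildly heavy-tailed data: Student's t with 10 df is "nearly normal".
small = rng.standard_t(df=10, size=50)
large = rng.standard_t(df=10, size=100_000)

res_small = stats.anderson(small, dist='norm')
res_large = stats.anderson(large, dist='norm')

# significance_level is [15, 10, 5, 2.5, 1]; index 2 is the 5% level.
crit_5pct_small = res_small.critical_values[2]
crit_5pct_large = res_large.critical_values[2]

# With n = 50 the test may well fail to notice anything...
print(res_small.statistic, crit_5pct_small)
# ...but with n = 100,000 the same mild non-normality is flagged.
print(res_large.statistic, crit_5pct_large)
```

The data never change character; only the sample size does, and that alone is enough to flip the test from "looks normal" to "reject normality."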
That's part of why "tests of normality" are not used much by practicing statisticians. There are usually more obvious clues (source of the data; eyeballing the numbers) that a test is going to be affected.
- The F-test is pretty darn robust, once you have a large enough sample. And if the sample is small enough that normality really matters, it is often too small for the test of normality to have enough power to detect the non-normality.
- Furthermore, the nature of the non-normality matters. Heterogeneity across the range of prediction warns about the validity of the whole model, and can be more important than outliers; but one or two very *extreme* outliers can sabotage the error term, rendering testing useless.
Independent t-test. S1 = (1,2,3,4,5); S2 = (6,7,8,9,10). The t-test is probably significant. Now replace the highest value, 10, in the higher group with 1000. The t-test is no longer significant, and no longer even suggests a difference, even though the gap between the group means has grown.
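The example above is easy to reproduce with `scipy.stats.ttest_ind` (a sketch; SciPy's default here is the ordinary pooled, equal-variance t-test):

```python
from scipy import stats

s1 = [1, 2, 3, 4, 5]
s2 = [6, 7, 8, 9, 10]

# Clearly separated groups: highly significant.
t1, p1 = stats.ttest_ind(s1, s2)

# Replace the top value with an extreme outlier. The group means are now
# *further* apart, but the outlier inflates the pooled error term so much
# that the difference is swamped.
s2_outlier = [6, 7, 8, 9, 1000]
t2, p2 = stats.ttest_ind(s1, s2_outlier)

print(p1)  # well below 0.05
print(p2)  # well above 0.05
```

One wild value in ten observations is all it takes to destroy the test, which is the kind of damage a formal normality test on a sample this small has essentially no power to warn you about.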