"Paul" wrote in message news:firstname.lastname@example.org...
On Apr 23, 3:51 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> On Tue, 23 Apr 2013 12:15:20 -0700 (PDT), Paul
> <paul.domas...@gmail.com> wrote:
>
> >I'm perusing
> >http://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test#Test_for_n...
> >for a statistical test of normality for my residuals. H0 is normality
> >of the residuals.
> >
> >For typical hypothesis testing, we want small significance, which
> >means a small rejection region. Thus, any value of the statistic that
> >falls in the rejection region is less likely due to chance (in
> >combination with the truth of H0). In testing a drug for a medical
> >effect, that makes sense because we often want to demonstrate an
> >effect, and H0 is typically the absence of an effect. For values of
> >the statistic that fall in the small rejection region, we can say that
> >if H0 is true, it is highly unlikely for us to get this value for the
> >statistic. The smaller the significance, the smaller the rejection
> >region, and the less we are able to attribute to chance any values in
> >the rejection region.
> >
> >For normality, we often want the opposite. We want H0, which is
> >normality of the residuals. We can not accept H0 to any degree of
> >confidence using this setup of hypothesis testing, but at least we can
> >make it very easy to reject H0 so any non-rejection of H0 is seen to
> >be well founded. This implies a large rejection region and high
> >significance. In fact, we might want a 95% rejection region, the
> >counterpart of wanting a 5% rejection region when the intent is to
> >demonstrate that rejection of H0 is not due to chance.
> >
> >Is this reasonable? I ask because the table in the above link shows
> >significance values 1%, 2.5%, 5%, 10%, and 15%. These small values
> >seem more like the values that one might be interested in when wanting
> >to demonstrate valid rejection of H0.
>
> No, it is not reasonable in the context of testing.
>
> "Nothing is really normal." Any real data that you collect (or
> any residuals that you want to look at) are going to look
> non-normal -- at any p-value that you want to consider -- if
> you can take a sample N that is large enough.
>
> That's part of why "tests of normality" are not used much by
> practicing statisticians. There are usually more obvious clues
> (source of the data; eyeballing the numbers) that a test is
> going to be affected.
>
> - The F-test is pretty darn robust, once you have a large
> enough sample. If the sample is small enough that normality
> really matters, it is too often too small for the test of normality
> to have enough power to detect the non-normality.
>
> - Furthermore, the nature of the non-normality will matter.
> Heterogeneity across the range of prediction warns about the
> validity of the whole model, and can be more important than outliers;
> but one or two very *extreme* outliers can sabotage the error
> term, rendering testing useless.
>
> Independent t-test.
> S1 = (1,2,3,4,5); S2 = (6,7,8,9,10).
> - t-test, probably significant.
> Replace the highest value, 10, in the higher group with
> 1000. t-test is no longer significant or even suggests a
> difference.
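The t-test example quoted above can be checked by hand. The following is only a minimal sketch, assuming the ordinary pooled-variance (Student) two-sample t-test; the helper name `pooled_t` and the constant `T_CRIT` are introduced here for illustration and are not from the thread. The value 2.306 is the standard two-sided 5% critical value for Student's t with 8 degrees of freedom.

```python
import math
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance); hypothetical helper."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / std_err


# Two-sided 5% critical value of Student's t with 5 + 5 - 2 = 8 degrees of freedom.
T_CRIT = 2.306

s1 = [1, 2, 3, 4, 5]
s2 = [6, 7, 8, 9, 10]
s2_outlier = [6, 7, 8, 9, 1000]  # highest value replaced by an extreme outlier

t_clean = pooled_t(s1, s2)
t_outlier = pooled_t(s1, s2_outlier)

print(f"clean data:   t = {t_clean:.3f}, significant = {abs(t_clean) > T_CRIT}")
print(f"with outlier: t = {t_outlier:.3f}, significant = {abs(t_outlier) > T_CRIT}")
```

The outlier moves the group means much further apart, but it inflates the pooled variance even more, so the t statistic shrinks instead of growing.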
OK, I get it. There are practical considerations at play. However, from a conceptual standpoint, say H0 was some generic hypothesis rather than residual normality. My question about whether the significance should be small or large, depending on whether I'm interested in resoundingly showing the reasonableness of rejecting or of not rejecting H0, still seems to hold. Would you (or anyone else) be able to chime in under this modified scenario?
You can't impose a false symmetry between the null and alternative hypotheses in anything like this way. For ALL reasonable tests, for fixed sample size, lowering the probability of rejecting the null hypothesis when it is true (changing the significance level) also reduces the probability of accepting a fixed alternative hypothesis when that alternative is true, i.e. it reduces the power.
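To put numbers on that trade-off, here is a minimal sketch, assuming a one-sided z-test of H0: mean = 0 against a fixed alternative with known unit variance. The function name `power`, the effect size 0.5 and the sample size 20 are illustrative assumptions, not anything taken from the thread.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power(alpha, effect, n):
    """Power of a one-sided z-test (sigma = 1 assumed) at significance level alpha."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # rejection threshold for the z statistic
    return 1 - STD_NORMAL.cdf(z_crit - effect * sqrt(n))


# The significance levels tabulated for the Anderson-Darling test, largest first.
levels = [0.15, 0.10, 0.05, 0.025, 0.01]
powers = [power(a, effect=0.5, n=20) for a in levels]

for a, p in zip(levels, powers):
    print(f"alpha = {a:<5}  power = {p:.3f}")
```

With the sample size held fixed, each step down in the significance level also lowers the power against the fixed alternative.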
If you were dead keen on using "normal distribution" as the alternative hypothesis, so that there is only a small chance of reaching that conclusion if a certain different null hypothesis is true, then you could set up such a test, but this would depend heavily on the null hypothesis distribution you choose. If you wanted to follow this up you could look up "tests of separate families of hypotheses" to see one possible approach. The basic sources for these illustrate that testing H0=HA against H1=HB is not the opposite of testing H0=HB against H1=HA, and you can get results where both are accepted, both rejected, or only one accepted.
But there would still be radically different results from testing, for example, H0=Cauchy vs H1=Normal, compared with H0=Laplace (double exponential) vs H1=Normal.
None of these tests would be at all like an Anderson-Darling test. One basic point here is that while the null hypothesis distribution can be (and often is) a special case of the alternative hypothesis, the alternative hypothesis can't be a special case of the null hypothesis (as otherwise the null hypothesis would always be true whenever the alternative hypothesis is true). So while the Anderson-Darling test can test H0:Normal against H1:any possible distribution, it can't be used (and there is no test that can be used) to test H0=any possible distribution against H1=Normal.
In the above, H0 means the specifically designated null hypothesis for a given test.
For the general version of your question, significance tests are not framed in terms of "showing the reasonableness of rejecting or not rejecting H0" (whatever that might mean). They are based on controlling the probability of rejecting the null hypothesis when it is true ... usually with a fixed significance level, but it is possible to compromise by accepting a higher false-rejection probability in order to get higher power for some alternative. But you would have to have a good reason for doing this and would still need to limit the significance level to a reasonable value, otherwise you might just as well always accept the alternative hypothesis and not bother with testing.
For example, if you were doing an initial screening of an extremely large number of different treatments for some medical condition, in a situation where it is unlikely that a randomly chosen treatment would be effective, you might want to reduce the number for further testing to 10% and so choose in the initial stage to test for effectiveness (against H0:noneffective) at the 10% level. There would be little point in testing at a 90% significance level, as that wouldn't reduce the work-load at the second stage much. An alternative here would be to specify the required power for a given size of effect (probability of detecting an effect if the effect is larger than a given value) and work back to a significance level that achieves this. This might lead to a large proportion of treatments being taken forward for further testing. In either approach, one could consider improving the initial testing stage (for example by increasing the sample size) in order to achieve good power with a reasonably low rate of treatments being passed to the next stage (low significance level).
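The "work back from a required power to a significance level" step can be sketched as follows. This is only an illustration, again assuming a one-sided z-test with known unit variance; the function name `alpha_for_power`, the 90% power target, the effect size 0.5 and the two sample sizes are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def alpha_for_power(target_power, effect, n):
    """Significance level at which a one-sided z-test (sigma = 1 assumed)
    reaches target_power against the given effect size."""
    z_power = STD_NORMAL.inv_cdf(target_power)
    # Solve  target_power = 1 - Phi(z_crit - effect * sqrt(n))  for z_crit,
    # then convert the threshold back into a false-rejection probability.
    z_crit = effect * sqrt(n) - z_power
    return 1 - STD_NORMAL.cdf(z_crit)


small = alpha_for_power(0.90, effect=0.5, n=10)
large = alpha_for_power(0.90, effect=0.5, n=40)

print(f"n = 10: significance level needed = {small:.3f}")
print(f"n = 40: significance level needed = {large:.3f}")
```

Under these assumptions the small screening sample needs a significance level far above the usual 5% to reach the target power, so a large share of ineffective treatments would be passed on, while the larger sample reaches the same power at a conventional level.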
So ... high significance levels might be reasonable in some cases but you would need to have a good reason to use one ... at least they are not theoretically impossible, but you would be close to needing to answer "why do the test at all, instead of always rejecting the null hypothesis?". You need to remember the essentially non-symmetric nature of the null and alternative hypotheses, which is embodied in "accept the null hypothesis unless there is sufficient evidence to reject it". You are free to switch the null and alternative around if that suits your line of thinking in a given instance, leading to "accept the NEW null hypothesis unless there is sufficient evidence to reject it", but such a test would not necessarily be logically valid and, even if valid, would not be the logical opposite of the original test.