
Views expressed in these public forums are not endorsed by NCTM or The Math Forum.


Topic: small confidence limit to demonstrate inability to reject H0?
Replies: 4   Last Post: Oct 13, 2013 4:09 PM

 Richard Ulrich Posts: 2,961 Registered: 12/13/04
Re: small confidence limit to demonstrate inability to reject H0?
Posted: Oct 10, 2013 7:08 PM

On Thu, 10 Oct 2013 13:38:33 -0700 (PDT), paul.domaskis@gmail.com
wrote:

>I've had my head into intro stats and simple hypothesis testing for about a year. It all seems to make sense, especially if you want to demonstrate an effect for a drug and H0 corresponds to no effect. For example, if you set a 95% confidence limit, then the rejection region under H0's PDF is 5%. If your sample statistic lies outside of the confidence limits, then under the scenario that H0 is true, there is only a 5% chance that you could have gotten a result that extreme by chance. The closer you set your confidence limit to 100%, the less likely that a result outside of that limit is due to chance, under the H0 scenario. So you're more justified in rejecting H0.
>
>What I don't get is the common practice of choosing high confidence limits when you're interested in showing that rejection of H0 is *not* justified. [break]
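The "5%" is a long-run rate: the fraction of samples in which a true H0 gets rejected, not the probability that one particular result arose by chance. A minimal Python sketch of that idea, assuming a two-sided one-sample z-test on normal data with known standard deviation 1 (a simplification chosen for illustration; the function name is hypothetical):

```python
import random
from statistics import NormalDist


def rejection_rate_under_h0(confidence=0.95, n=25, reps=20_000, seed=1):
    """Simulate a two-sided one-sample z-test when H0 (mean 0, sd 1) is true.

    Returns the fraction of simulated samples whose mean lands outside the
    confidence limits, i.e. how often a true H0 is rejected.
    """
    rng = random.Random(seed)
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2)  # about 1.96 for 95%
    se = 1.0 / n ** 0.5                              # standard error with known sd = 1
    rejections = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        if abs(sample_mean / se) > z_crit:
            rejections += 1
    return rejections / reps


for conf in (0.50, 0.95, 0.9999):
    rate = rejection_rate_under_h0(conf)
    print(f"{conf:.2%} limits: true H0 rejected in {rate:.4f} of samples")
```

The rates should come out near 0.50, 0.05 and 0.0001: raising the confidence level shrinks the rejection region, which is exactly why the limits widen.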

I don't know that this is a "common practice" in good literature. But
I do agree that this is an area where non-statisticians are much
more likely to apply their rules-of-thumb in ways that turn out to
be backwards ("small p-levels are always better"). And the real
statisticians have to notice that they need to be careful of where

The situation seems to be better these days in "bioequivalence
studies", where the purpose is showing that meds are similar.
Nowadays, they insist on some efficacy in the study, and put a CI
on similarity.

===start of Wikip section on bioequivalence (Oct 10, 2013)
Regulatory definition
Australia

In Australia, the Therapeutic Goods Administration (TGA) considers
preparations to be bioequivalent if the 90% confidence intervals (90%
CI) of the rate ratios, between the two preparations, of Cmax and AUC
lie in the range 0.80-1.25. Tmax should also be similar between the
products.[1]
====end of quote; entries fairly similar for other countries.
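A minimal sketch of the quoted 90% CI rule, in Python with SciPy. It assumes a simple paired comparison (each subject measured on both preparations) and ignores the period and sequence effects a real crossover analysis would model; the function names and all data values are hypothetical. The ratio is handled on the log scale, where it becomes a difference of means:

```python
import math
from statistics import mean, stdev

from scipy import stats


def ratio_ci_90(test, reference):
    """90% CI for the geometric mean ratio test/reference from paired data."""
    log_ratios = [math.log(t) - math.log(r) for t, r in zip(test, reference)]
    n = len(log_ratios)
    m = mean(log_ratios)
    se = stdev(log_ratios) / math.sqrt(n)
    t_crit = stats.t.ppf(0.95, df=n - 1)  # 0.95 quantile -> two-sided 90% interval
    return math.exp(m - t_crit * se), math.exp(m + t_crit * se)


def bioequivalent(test, reference, lower=0.80, upper=1.25):
    """True only if the whole 90% CI lies inside [lower, upper]."""
    lo, hi = ratio_ci_90(test, reference)
    return lower <= lo and hi <= upper, (lo, hi)


# Hypothetical AUC values for 12 subjects with test/reference ratios near 1
reference = [100, 120, 95, 110, 130, 105, 98, 115, 125, 102, 108, 118]
ratios = [0.97, 1.02, 0.99, 1.05, 0.96, 1.01, 1.03, 0.98, 1.00, 0.95, 1.04, 0.99]
test = [r * q for r, q in zip(reference, ratios)]
print(bioequivalent(test, reference))

# A tiny, noisy study: the mean ratio is near 1 ("no significant difference"),
# yet the CI is far too wide to claim similarity.
print(bioequivalent([70, 130, 90, 120], [100, 100, 100, 100]))
```

The first call passes with a CI of roughly 0.98 to 1.02; the second fails with a CI of roughly 0.71 to 1.39, even though its point estimate is almost exactly 1.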

I can mention that the FDA papers that come up on Google are
difficult to read. When I looked into this 10 years ago, it seemed
to me that the formal standards used by the FDA in the US in the 1980s
were almost as silly as what you describe. I don't know whether
they were applied so loosely, but it *looked* like someone could have
declared bioequivalence by collecting data where the sample was
too small (or the study too sloppy) to show an effect in either group.
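The loophole can be illustrated with a small simulation. This Python/NumPy/SciPy sketch assumes two normal groups with sd 1, and an equivalence margin of plus or minus 0.2 sd; these numbers and the function name are hypothetical. It compares a "no significant difference" rule with a rule that requires the 90% CI of the difference to sit inside the margin:

```python
import numpy as np
from scipy import stats


def similarity_pass_rates(n, true_diff, margin=0.2, reps=2000, seed=0):
    """Fraction of simulated studies (n per group) that 'declare similarity'.

    rule_a: declare similarity whenever a t-test fails to reject H0 at 5%.
    rule_b: declare similarity only if the 90% CI for the difference lies
            entirely inside (-margin, +margin).
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, size=(reps, n))
    b = rng.normal(true_diff, 1.0, size=(reps, n))

    p_values = stats.ttest_ind(a, b, axis=1).pvalue
    rule_a = float(np.mean(p_values > 0.05))

    diff = b.mean(axis=1) - a.mean(axis=1)
    pooled_var = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2  # equal n
    se = np.sqrt(pooled_var * 2 / n)
    t_crit = stats.t.ppf(0.95, df=2 * n - 2)  # two-sided 90% interval
    inside = (diff - t_crit * se > -margin) & (diff + t_crit * se < margin)
    rule_b = float(np.mean(inside))
    return rule_a, rule_b


for n, true_diff in [(5, 0.5), (200, 0.5), (1000, 0.0)]:
    rule_a, rule_b = similarity_pass_rates(n, true_diff)
    print(f"n={n:4d}, true diff={true_diff}: rule A {rule_a:.3f}, rule B {rule_b:.3f}")
```

With 5 per group and a real 0.5 sd difference, rule A declares similarity in most studies while rule B essentially never does; rule B passes only when the samples are large and the groups genuinely close.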

>... For example, at http://www.r-bloggers.com/the-many-uses-of-q-q-plots, 95% confidence bands are chosen for a Q-Q plot, with H0 being that the data is normally distributed. To highlight why I'm confused by this, let's consider the extreme case of 99.99% confidence limits. That means the rejection region is only 0.01%, and the bands widen out. Almost any scattering of data will fit within the confidence limits. So if all my data points are within the confidence limits, it says very little. In contrast, if I chose 50% confidence limits, having all my data points within those limits is a harder test to meet. I know that hypothesis testing does not allow one to determine whether H0 is true, but having all data points within 50% confidence limits allows one to definitively rule out any evidence against H0, which is a lot more info than
>what can be gleaned from 99.99% confidence limits. From browsing the web, weak evidence starts to build up if p-values drop as low as 0.10.
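The trade-off in the question can be put in numbers for the scalar case. This sketch assumes a two-sided one-sample z-test with known sd, a true standardized effect of 0.3 and n = 25 (hypothetical values; the function name is also hypothetical). It computes the probability that the result falls outside the limits; with effect 0 the same formula gives the price of narrow limits, since a true H0 is then rejected with probability 1 minus the confidence level:

```python
from statistics import NormalDist

Z = NormalDist()


def prob_outside_limits(confidence, effect, n):
    """P(z-statistic falls outside the two-sided limits) for a one-sample z-test.

    effect: true mean shift in sd units (0 means H0 is true); n: sample size.
    """
    z_crit = Z.inv_cdf(1 - (1 - confidence) / 2)
    shift = effect * n ** 0.5  # mean of the z-statistic under the true effect
    return Z.cdf(-z_crit + shift) + Z.cdf(-z_crit - shift)


for conf in (0.50, 0.95, 0.9999):
    real = prob_outside_limits(conf, effect=0.3, n=25)
    null = prob_outside_limits(conf, effect=0.0, n=25)
    print(f"{conf:.2%}: catches real effect {real:.3f}, rejects true H0 {null:.4f}")
```

The real effect is caught roughly 81%, 32% and 0.8% of the time: 99.99% limits almost never flag it, while 50% limits flag it often but also reject a true H0 half the time.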

As a very general rule, p-levels are not good indicators of
effect size, or, conversely, of lack of effect. N is the
huge confounder; beyond that, it depends on how sensitive your
test is at finding an "effect."
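A small illustration of N as the confounder, assuming a two-sided one-sample z-test with known sd (the helper name is hypothetical). The p-value depends on effect size times the square root of N, so very different effects can give identical p-values:

```python
from statistics import NormalDist


def z_test_p(effect, n):
    """Two-sided p-value for a one-sample z-test; effect is the observed mean in sd units."""
    z = abs(effect) * n ** 0.5
    return 2 * NormalDist().cdf(-z)


cases = [(0.1, 25), (0.1, 2500), (1.0, 4), (0.04, 2500)]
for effect, n in cases:
    print(f"effect {effect:>4} sd, N = {n:>4}: p = {z_test_p(effect, n):.2g}")
```

The same 0.1 sd effect gives p of about 0.62 with N = 25 and about 6e-7 with N = 2500, while a 1 sd effect with N = 4 and a 0.04 sd effect with N = 2500 share p of about 0.046.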

In repeated-measures testing, stats programs routinely report
Mauchly's test of sphericity, which bears especially on the
confidence that one should place in subcontrasts. Naively, one
would think that "5%" is the level that warns of hazard. However,
for most data with reasonable power for the main hypothesis,
"50%" is the cutoff that gives the confidence that you want.
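For readers who want to see what the sphericity test computes, here is a sketch of Mauchly's W with its usual chi-square approximation, in Python with NumPy and SciPy. It assumes complete data in an (n subjects x k conditions) array with k of at least 3; the function name, the simulated data, and the use of a 0.50 cutoff (taken from the rule of thumb above, not from any standard) are illustrative:

```python
import numpy as np
from scipy import stats


def mauchly_test(data):
    """Mauchly's test of sphericity for an (n_subjects, k_conditions) array.

    Returns (W, chi2, df, p). W = 1 when the orthonormal contrasts have equal
    variances and zero covariances; smaller W means a larger departure.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    if k < 3:
        raise ValueError("sphericity needs at least 3 repeated conditions")
    m = k - 1
    # Orthonormal contrasts: QR of [1, e_1, ..., e_{k-1}]; columns 1..k-1 of Q
    # are orthonormal and orthogonal to the constant vector.
    q, _ = np.linalg.qr(np.column_stack([np.ones(k), np.eye(k)[:, :m]]))
    contrasts = q[:, 1:]
    s = contrasts.T @ np.cov(data, rowvar=False) @ contrasts
    w = np.linalg.det(s) / (np.trace(s) / m) ** m
    # Correction factor from the usual chi-square approximation to -ln(W)
    chi2 = -((n - 1) - (2 * m**2 + m + 2) / (6 * m)) * np.log(w)
    df = m * (m + 1) // 2 - 1
    return float(w), float(chi2), df, float(stats.chi2.sf(chi2, df))


rng = np.random.default_rng(42)
spherical = rng.normal(size=(30, 4))                         # equal variances, independent
lopsided = rng.normal(size=(30, 4)) * [1.0, 1.0, 1.0, 4.0]  # one much noisier condition
for name, d in [("spherical", spherical), ("lopsided", lopsided)]:
    w, chi2, df, p = mauchly_test(d)
    verdict = "no warning at 0.50" if p > 0.50 else "treat sphericity with caution"
    print(f"{name}: W = {w:.3f}, chi2({df}) = {chi2:.2f}, p = {p:.3g} ({verdict})")
```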

Or, going the opposite direction, a two-sample t-test with equal Ns
is so robust that it hardly matters what a test of equal variances
tells you. Further, it is always a *bad* idea to use that test to
decide whether you use the "test that assumes equal variances."
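A quick simulation can check the equal-N robustness claim. This sketch (Python with NumPy/SciPy; the function name and settings are hypothetical) estimates the Type I error rate of the pooled-variance t-test when both groups have mean 0 but their standard deviations differ by a factor of 3. The unequal-N case is included only for contrast:

```python
import numpy as np
from scipy import stats


def pooled_t_type1(n1, n2, sd1, sd2, reps=4000, alpha=0.05, seed=0):
    """Type I error rate of the pooled-variance (equal_var=True) t-test
    when both groups have mean 0 but possibly different sds."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sd1, size=(reps, n1))
    b = rng.normal(0.0, sd2, size=(reps, n2))
    p_values = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
    return float(np.mean(p_values < alpha))


print("equal N   (20 vs 20, sd 1 vs 3):", pooled_t_type1(20, 20, 1.0, 3.0))
print("unequal N (10 vs 40, sd 3 vs 1):", pooled_t_type1(10, 40, 3.0, 1.0))
```

With equal Ns the rate stays close to the nominal 0.05; with the larger variance in the smaller group it climbs well above it.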

>
>So if one wants to demonstrate a lack of evidence against H0, then why wouldn't one choose to use as low a confidence limit as possible? I described my confusion in the context of Q-Q plots, but the same conceptual question dogs me for simple scalar hypothesis testing.

When you want to show similarity, you want to show a narrow
CI on some relevant difference, for some meaningful measure...
not merely an absence of "demonstrated" difference. Crappy measures
can give you apparent similarities.

--
Rich Ulrich

Date Subject Author
10/10/13 Paul
10/10/13 Richard Ulrich
10/11/13 Paul
10/11/13 Richard Ulrich
10/13/13 Luis A. Afonso