Bruce Weaver wrote:
> Reef Fish wrote:
> > Richard Ulrich wrote:
> >> On 30 Oct 2006 06:50:28 -0800, "Reef Fish"
> >
> > RU:
> >> I would prefer to say, "the method *may* be all wrong," and I think
> >> that RF expresses that more relaxed idea in his closing comments,
> >> where some violations are more serious than others ....
> >> [snip, some detail]
> >
> > RF:
> > BUt those are TWO DIFFERENT sets of statements.
> >
> > In the above, it means If the ASSUMPTION(s) are NOT valid, then
> > the statistical results based on the method WILL be all wrong.
> > There is no "may be" about it. If you have two binary variables
> > X and Y and you test its correlation with the test statistic T for
> > the Pearson correlation coefficient (which would be phi for the
> > two binary variables), the result WILL be wrong because the
> > assumption is violated 100%, without question.
> >
> > In the situation below, it's about the VALIDATION of the assumption.
> > If Normality is required of a variable, and it is not known 100% to be
> > nonnormal, then there is leeway in deciding what is a serious
> > violation and what is not, because in that case (unlike the case it
> > does not require any thinking to know that the (0,1) variable is
> > NOT normal) the DATA can never prove with 100% certainty
> > whether it came from a Normal population or not.
> >
> > There is a BIG difference in the above two situations.
>
> But as Box put it, "in nature there never was a normal distribution".
> So if we're talking about real data, we *know* with 100% certainty that
> it's not normal, and the real question is whether it is *useful* to
> assume that it is.
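For anyone who wants to check the parenthetical in the quoted text, that the Pearson correlation computed on two (0,1) variables is the phi coefficient, here is a minimal Python sketch. The data and the function names are made up for illustration and are not from the thread.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def phi_coefficient(x, y):
    """Phi computed from the 2x2 table of counts of two (0,1) variables."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den

# Hypothetical (0,1) data, chosen only so both variables take both values.
x = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

r = pearson_r(x, y)
phi = phi_coefficient(x, y)
print(r, phi)  # the two values agree up to floating-point rounding
```

The sketch only shows that the two coefficients coincide. It says nothing about the sampling distribution of the usual test statistic, which is the point at issue in the quoted text.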
This discussion is USEFUL, VERY useful.
I mentioned the (0,1) variable as clearly non-Normal in a sense which is STILL different from the sense in which you quoted Box (1976). I had already proven that theorem in 1970 in my unpublished book on Data Analysis which had been continuously revised since I retired in 1999. :-)
So, in THAT sense, I said it LONG before Box published it in 1976. But I am also sure Box had said it many times to his students. What I said, though, was in my textbook of 1970. :-)
It was the ONLY Theorem in that book.
It was stated in Section 1 (the first section), titled "NOTHING IN THE REAL WORLD IS NORMALLY DISTRIBUTED", in Chapter 1, titled "Analysis of A Single Variable. Normality: Theory and Practice".
It was the same textbook used in Harvard Stat 220r (the most advanced and highest-numbered seminar course at Harvard) in 1990. :-)
That Section was followed by Section 2, titled: NORMALITY SHOULD BE USED.
Section 3. HOW DO YOU DISTINGUISH A NORMAL FROM A NONNORMAL DISTRIBUTION.
Section 4. ILLUSTRATION OF NORMALITY IN DISGUISE.
> Here's the Box quote in some context.
>
> "In applying mathematics to subjects such as physics or statistics we
> make tentative assumptions about the real world which we know are false
> but which we believe may be useful nonetheless. The physicist knows
> that particles have mass and yet certain results, approximating what
> really happens, may be derived from the assumption that they do not.
> Equally, the statistician knows, for example, that in nature there never
> was a normal distribution, there never was a straight line, yet with
> normal and linear assumptions, known to be false, he can often derive
> results which match, to a useful approximation, those found in the real
> world."
>
> Box GEP. Science and Statistics. JASA, Vol. 71, No. 356 (Dec., 1976),
> 791-799.
That's in relation to Box's discussion of the ITERATIVE process of the analyst first being a "sponsor" of his tentative model, then a "critic" of the same model, etc.
I have quoted many passages from that same paper, in sci.stat.math. Google found 16 threads, on searching for "George Box Data Analysis, JASA". All of those threads were directly related to what he had to say about Data Analysis. I haven't done a hard count, but I think I have recommended Box's 1976 paper in sci.stat.math more often than I recommended ANY book or paper.
I have included Box's quotes in all editions of my textbook since 1976. So your quoted passage, though different from most of the passages I quoted from that same article, is in the same SPIRIT of Data Analysis.
Actually the part that is even more relevant to the present issue is his paragraphs on "Selective Worrying", on mice and tigers and mathematistry.
That's what Data Analysis is all about.
It's NOT about taking any statement TOO seriously (such as Normality). It's NOT about nitpicking on the RESIDUALS being always non-independent and yet they are used to validate the assumption of independence of the errors. It's NOT about anything being 100% true or 100% false.
It's about using the brain cells the good lord gave us to THINK with.
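To make the residuals remark above concrete, here is the standard derivation for ordinary least squares. The notation is an assumption of this sketch and not from the thread: y is n x 1, X is n x p with full column rank, and H is the hat matrix.

```latex
% Assumed notation (not from the thread): y is n x 1, X is n x p, full column rank.
\begin{align*}
y &= X\beta + \varepsilon, \qquad \operatorname{Var}(\varepsilon) = \sigma^2 I, \\
H &= X (X^\top X)^{-1} X^\top, \\
e &= y - X\hat\beta = (I - H)\,y = (I - H)\,\varepsilon
     \quad \text{since } (I - H)X = 0, \\
\operatorname{Cov}(e) &= (I - H)\,\sigma^2 I\,(I - H)^\top = \sigma^2 (I - H), \\
\operatorname{Cov}(e_i, e_j) &= -\sigma^2 h_{ij} \quad (i \neq j).
\end{align*}
```

So the residuals are correlated whenever h_ij is nonzero, even if the errors are perfectly independent. Because H is symmetric and idempotent, the sum over j of h_ij^2 equals h_ii, so when the leverages h_ii are small the off-diagonal entries are small too, which is why residuals remain a useful (not exact) check on the errors.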
> --
> Bruce Weaver
> firstname.lastname@example.org
> www.angelfire.com/wv/bwhomedir
The topic of Validation of assumptions is already explicitly mentioned under my first two topics in "Reef Fish Statistics for Dummies".
I had hinted that my NEXT topic in the "Reef Fish Statistics for Dummies" series would be the Validation of Assumptions in MULTIPLE regression. That has not appeared yet. Perhaps when the NOISE dies down, I'll write it.
Meanwhile, I am seriously considering PUBLISHING my Data Analysis Notes (which I called my "textbook" above).
Perhaps I should title it "Data Analysis for Advanced Dummies"?
The First Chapter (and all its sections) will likely be almost exactly the way it appeared throughout the years. Those 4 Sections will definitely remain unchanged, because in one single chapter, I want my students and readers to know immediately that Data Analysis is an INTERPLAY of Theory and Practice: to discard the useless theory (such as a literal Normal distribution in DATA, because none ever existed) but to USE Normal theory (as a very useful approximation of what is REAL).