The Math Forum


Topic: Need help understanding Homogeneity of Variance please
Replies: 1   Last Post: Oct 31, 2006 3:14 PM

Reef Fish

Posts: 1,021
Registered: 8/4/06
Re: Need help understanding Homogeneity of Variance please
Posted: Oct 31, 2006 3:14 PM

Bruce Weaver wrote:
> Reef Fish wrote:
> > Richard Ulrich wrote:
> >> On 30 Oct 2006 06:50:28 -0800, "Reef Fish"
> RU:

> >> I would prefer to say, "the method *may* be all wrong," and I think
> >> that RF expresses that more relaxed idea in his closing comments,
> >> where some violations are more serious than others ....
> >> [snip, some detail]

> >
> RF:
> > But those are TWO DIFFERENT sets of statements.
> >
> > In the above, it means If the ASSUMPTION(s) are NOT valid, then
> > the statistical results based on the method WILL be all wrong.
> > There is no "may be" about it. If you have two binary variables
> > X and Y and you test their correlation with the test statistic T for
> > the Pearson correlation coefficient (which would be phi for the
> > two binary variables), the result WILL be wrong because the
> > assumption is violated 100%, without question.
> >
> > In the situation below, it's about the VALIDATION of the assumption.
> > If Normality is required of a variable, and it is not known 100% to be
> > nonnormal, then there is leeway in deciding what is a serious
> > violation and what is not, because in that case (unlike the case where
> > it requires no thinking to know that the (0,1) variable is
> > NOT normal) the DATA can never prove with 100% certainty
> > whether it came from a Normal population or not.
> >
> > There is a BIG difference in the above two situations.
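
A small illustration of the phi remark in the quoted passage, offered as a sketch rather than anything from the thread: for two (0,1) variables the Pearson coefficient is numerically the phi coefficient of the 2x2 table. The data below are made-up toy values and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def phi_coefficient(x, y):
    """Phi coefficient computed from the 2x2 table of two (0,1) variables."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den

# Toy data (an assumption for illustration only, not from the thread).
x = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

r = pearson_r(x, y)
phi = phi_coefficient(x, y)
print(r, phi)
```

The two numbers agree, which is the sense in which the coefficient "would be phi"; neither variable can be Normal, so any test that relies on Normality of X and Y has its assumption violated by construction.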

> But as Box put it, "in nature there never was a normal distribution".
> So if we're talking about real data, we *know* with 100% certainty that
> it's not normal, and the real question is whether it is *useful* to
> assume that it is.

This discussion is USEFUL, VERY useful.

I mentioned the (0,1) variable as clearly non-Normal in a sense
which is STILL different from the sense in which you quoted Box (1976). I had
already proven that theorem in 1970 in my unpublished book on
Data Analysis which had been continuously revised since I
retired in 1999. :-)

So, in THAT sense, I said it LONG before Box published it in 1976.
But I am also sure Box had said it many times to his students. But
what I said was in my textbook of 1970, :-)

It was the ONLY Theorem in that book.

It was stated in Section 1 (the first section), titled:


in Chapter 1, titled "Analysis of A Single Variable. Normality:
Theory and Practice".

It was the same textbook used in Harvard Stat 220r (the most
advanced and highest numbered seminar course at Harvard)
in 1990. :-)

That Section was followed by Section 2, titled:



> Here's the Box quote in some context.
> "In applying mathematics to subjects such as physics or statistics we
> make tentative assumptions about the real world which we know are false
> but which we believe may be useful nonetheless. The physicist knows
> that particles have mass and yet certain results, approximating what
> really happens, may be derived from the assumption that they do not.
> Equally, the statistician knows, for example, that in nature there never
> was a normal distribution, there never was a straight line, yet with
> normal and linear assumptions, known to be false, he can often derive
> results which match, to a useful approximation, those found in the real
> world."
> Box GEP. Science and Statistics. JASA, Vol. 71, No. 356 (Dec., 1976),
> 791-799.

That's in relation to Box's discussion of the ITERATIVE process of
the analyst first being a "sponsor" of his tentative model, then a
"critic" of the same model, etc.

I have quoted many passages from that same paper, in sci.stat.math.
Google found 16 threads, on searching for "George Box Data Analysis,
JASA". All of those threads were directly related to what he had to
say about Data Analysis. I haven't done a hard count, but I think I
have recommended Box's 1976 paper in sci.stat.math more often
than I recommended ANY book or paper.

I have included Box's quotes in all editions of my textbook since 1976.
So your quoted passage, though different from most of the
passages I quoted from that same article, is in the same SPIRIT
of Data Analysis.

Actually the part that is even more relevant to the present issue is
his paragraphs on "Selective Worrying", on mice and tiger and

That's what Data Analysis is all about.

It's NOT about taking any statement TOO seriously (such as Normality).
It's NOT about nitpicking that the RESIDUALS are always non-independent
and yet are used to validate the assumption of
independence of the errors. It's NOT about anything being 100%
true or 100% false.

It's about using the brain cells the good lord gave us to THINK with.
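
For the point about residuals above, a minimal sketch, assuming simple least-squares regression with independent, equal-variance errors (the data and function names are hypothetical): the residuals satisfy two exact linear constraints, and their covariance is sigma^2 (I - H), whose off-diagonal entries are not zero.

```python
def simple_ols_residuals(x, y):
    """Residuals from least-squares regression of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def identity_minus_hat(x):
    """I - H for simple regression; Cov(residuals) = sigma^2 * (I - H)."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return [
        [(1.0 if i == j else 0.0) - (1.0 / n + (x[i] - mx) * (x[j] - mx) / sxx)
         for j in range(n)]
        for i in range(n)
    ]

# Toy data (an assumption for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

e = simple_ols_residuals(x, y)
m = identity_minus_hat(x)

print(sum(e))                             # zero up to rounding
print(sum(a * b for a, b in zip(x, e)))   # zero up to rounding
print(m[0][1])                            # nonzero: residuals 0 and 1 are correlated
```

The dependence is real but shrinks as n grows relative to the number of fitted parameters, which is why residual checks remain useful in practice.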

> --
> Bruce Weaver

The topic of Validation of assumptions is already explicitly mentioned
under my first two topics in "Reef Fish Statistics for Dummies".

I had hinted that my NEXT topic in the "Reef Fish Statistics for Dummies"
series would be the Validation of Assumptions in MULTIPLE regression.
That has not appeared yet. Perhaps when the NOISE dies down,
I'll write it.

Meanwhile, I am seriously considering PUBLISHING my Data Analysis
Notes (which I had called my "textbook"), as in the above.

Perhaps I should title it "Data Analysis for Advanced Dummies?"

The First Chapter (and all its sections) will likely be almost exactly
the way it appeared throughout the years. Those 4 Sections will
definitely remain unchanged, because in one single chapter, I want
my students and readers to know immediately that Data Analysis
is an INTERPLAY of Theory and Practice -- to discard the useless
theory (such as a literal Normal distribution in DATA because none
ever existed) but to USE Normal theory (as a very useful
approximation of what is REAL).

-- Reef Fish Bob.
