The Math Forum

Topic: Boxplots
Rex Boggs

Posted: Jun 12, 1996 5:25 AM

Paul sent this to me, rather than the list. I am forwarding it at his request.



Rex asks about Boxplots:
>But I've noticed that computer packages are more sophisticated -
>they draw the whiskers out to say 1.5 SDs away from the mean, and then
>use symbols to represent outliers (eg dots for 'mild' outliers and
>squares for 'severe' outliers).
>I think this is great for a computer to do, but what about for kids?
>Isn't it sufficient to just determine the 5-number summary, and plot
>those values? Some information is lost, but much time is saved.
>What are the 'industry standards' in this area? Are there any?

As the author of a book on the subject, of the original EDA code in
Minitab, and of the Data Desk statistics package, I guess I should reply to
this one.
The standard definition for a boxplot (as made by John Tukey, the original
definer) is that the whiskers extend to 1.5 fourth-spreads*
(Inter-Quartile-Ranges, but see note below) beyond each quartile*. This is
a calculation easily done by hand. It has been justified as a good
rule-of-thumb outlier detector by simulations published by Hoaglin and
Iglewicz. One should *not* use standard deviations. Nor should one (as is
done by some packages) extend the whiskers to the 90th (or some other
arbitrary) percentile.
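
As a rough illustration of the rule above, here is a minimal Python sketch.
The function name, its arguments, and the sample data are made up for this
example. It also assumes the common convention that each whisker is drawn
to the most extreme data value that still lies within 1.5 IQRs of its
quartile, and it takes the two quartiles as inputs so that the choice of
quartile definition stays separate.

```python
def boxplot_whiskers(values, q1, q3, k=1.5):
    """Return the fences, whisker ends, and outliers for one boxplot.

    q1 and q3 are the lower and upper quartiles, computed however you
    define them. k is the multiple of the IQR (1.5 in the standard rule).
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    # Assumed convention: whiskers stop at the most extreme data values
    # that are still inside the fences.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in values if x < lower_fence or x > upper_fence)

    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical data with quartiles 4.5 and 7.5, so the IQR is 3.
data = [2, 4, 5, 6, 7, 8, 30]
result = boxplot_whiskers(data, q1=4.5, q3=7.5)
print(result)
```

With these numbers the fences fall at 0 and 12, so the whiskers run from 2
to 8 and the value 30 is plotted on its own as an outlier.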

*Note: Statisticians haven't agreed on the definition of the quartile. My
favorite -- and the one that agrees with Tukey's original definition (and
with Moore in both of his books) -- is that the lower (upper) quartile is
the median of the values below (above) the median. If n is odd, the median
is itself a data value and is included in both halves; if n is even, each
half is simply the lower or upper half of the data. This means that
students need only divide by 2 rather than interpolate. It also happens to
have other nice properties, which don't matter to the intro course. With
quartiles defined in this way, the IQR is a fine ruler for outliers.
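
The quartile definition in the note can be sketched in a few lines of
Python. The function name is made up for this example; the only library
call is statistics.median from the standard library.

```python
from statistics import median


def tukey_quartiles(values):
    """Quartiles as medians of the lower and upper halves of the data.

    When n is odd the overall median is a data value and is included in
    both halves; when n is even the data split cleanly in two.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one data value")

    half = (n + 1) // 2      # size of each half; shares the median if n is odd
    lower_half = xs[:half]
    upper_half = xs[n - half:]
    return median(lower_half), median(upper_half)


print(tukey_quartiles([1, 2, 3, 4, 5]))      # halves are [1, 2, 3] and [3, 4, 5]
print(tukey_quartiles([1, 2, 3, 4, 5, 6]))   # halves are [1, 2, 3] and [4, 5, 6]
```

Each quartile is either a data value or the average of two adjacent data
values, which is why no interpolation is needed.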

I am heartily in favor of teaching boxplots this way because it raises the
issue of outliers early in the course, so that we can continue to ask of
each method we see, "what would be the effect of an outlier on this?" That
exercise both reminds students of the importance of being alert for
outliers and is a good way to think through each method in turn.

In my naive youth, I once asked Tukey "why 1.5?" His answer was (and I
assume, still is) "1 is too small and 2 is too large." -- and the
simulations bear him out.
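
One rough way to get a feel for that answer (this is not the published
simulation study, only a back-of-the-envelope check that assumes an exactly
normal population rather than finite samples) is to compute what fraction
of a normal population falls outside the fences for each multiplier. The
function name below is made up for this example.

```python
from statistics import NormalDist


def normal_fraction_flagged(k):
    """Fraction of a normal population lying beyond Q1 - k*IQR or Q3 + k*IQR."""
    z = NormalDist()              # standard normal: mean 0, sd 1
    q3 = z.inv_cdf(0.75)          # upper quartile; lower quartile is -q3
    iqr = 2 * q3
    upper_fence = q3 + k * iqr
    # By symmetry the two tails are equal.
    return 2 * (1 - z.cdf(upper_fence))


for k in (1.0, 1.5, 2.0):
    print(f"k = {k}: {normal_fraction_flagged(k):.2%} outside the fences")
```

Under that assumption a multiplier of 1 flags roughly 4% of perfectly
ordinary values, 1.5 flags about 0.7%, and 2 flags well under 0.1%.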

-- Paul Velleman


© The Math Forum at NCTM 1994-2018. All Rights Reserved.