


Boxplots
Posted: Jun 12, 1996 5:25 AM


Paul sent this to me, rather than the list. I am forwarding it at his request.
Cheers
Rex
Rex asks about Boxplots:
>But I've noticed that computer packages are more sophisticated:
>they draw the whiskers out to, say, 1.5 SDs away from the mean, and then
>use symbols to represent outliers (e.g. dots for 'mild' outliers and
>squares for 'severe' outliers).
>
>I think this is great for a computer to do, but what about for kids?
>Isn't it sufficient to just determine the 5-number summary, and plot
>those values? Some information is lost, but much time is saved.
>
>What are the 'industry standards' in this area? Are there any?
As the author of a book on the subject, of the original EDA code in Minitab, and of the Data Desk statistics package, I guess I should reply to this one. The standard definition for a boxplot (as made by John Tukey, the original definer) is that the whiskers extend to 1.5 fourthspreads* (InterQuartileRanges, but see note below) beyond each quartile*. This is a calculation easily done by hand. It has been justified as a good rump outlier detector by simulations published by Hoaglin and Iglewicz. One should *not* use standard deviations. Nor should one (as is done by some packages) extend the whiskers to the 90% (or other arbitrary percentile) point.
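For anyone who wants to check the hand calculation, here is a minimal sketch of the 1.5-fourthspread rule in Python. The function name whisker_summary and the sample data are made up for illustration, and the quartiles are passed in rather than computed, since their definition varies. It also assumes one common drawing convention that is not spelled out above: each whisker stops at the most extreme data value that still lies inside the fence.

```python
def whisker_summary(values, q1, q3, k=1.5):
    """Fences, whisker ends, and flagged points for a boxplot.

    q1 and q3 are the lower and upper quartiles, supplied by the caller.
    k is the multiplier on the fourthspread (IQR); 1.5 is the standard.
    """
    spread = q3 - q1                      # fourthspread / IQR
    lower_fence = q1 - k * spread
    upper_fence = q3 + k * spread

    # Assumption: whiskers end at the most extreme data values
    # that fall inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)

    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical data; 4.5 and 8.5 are the medians of the lower and upper halves.
data = [2, 4, 5, 6, 7, 8, 9, 30]
summary = whisker_summary(data, q1=4.5, q3=8.5)
print(summary)
```

With these numbers the fourthspread is 4, so the fences sit at -1.5 and 14.5, and the value 30 is the only point flagged.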
*Note: Statisticians haven't agreed on the definition of the quartile. My favorite, and the one that agrees with Tukey's original definition (and with Moore in both of his books), is that each quartile is the median of the values below (for the lower quartile) or above (for the upper quartile) the median. If n is odd, so that the median is itself a data value, it is included in both halves; if n is even, only the lower half or the upper half of the data is used. This means that students need only divide by 2 rather than interpolate. It also happens to have other nice properties, which don't matter to the intro course. With quartiles defined in this way, the IQR is a fine ruler for outliers.
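The quartile rule in the note can be sketched in a few lines of Python. The function name tukey_quartiles is made up for illustration, and this is only one reading of the rule as described here; statistics.median is the standard-library median, which averages the two middle values when the count is even.

```python
from statistics import median


def tukey_quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Each quartile is the median of one half of the sorted data.
    When n is odd the overall median belongs to both halves;
    when n is even the data split cleanly into two halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2          # size of each half; includes the median if n is odd
    lower = data[:half]
    upper = data[n - half:]
    return median(lower), median(data), median(upper)


odd_case = tukey_quartiles([1, 2, 3, 4, 5, 6, 7])      # halves: 1..4 and 4..7
even_case = tukey_quartiles([1, 2, 3, 4, 5, 6, 7, 8])  # halves: 1..4 and 5..8
print(odd_case, even_case)
```

In both cases the only arithmetic is averaging two neighboring values, which is the "divide by 2" step mentioned above.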
I am heartily in favor of teaching boxplots this way because it raises the issue of outliers early in the course, so that we can continue to ask of each method we see, "what would be the effect of an outlier on this?" That exercise both reminds students of the importance of being alert for outliers and gives them a good way to think through each method in turn.
In my naive youth, I once asked Tukey "why 1.5?" His answer was (and, I assume, still is) "1 is too small and 2 is too large." The simulations bear him out.
Paul Velleman



