Outliers in a Box-And-Whisker Plot
Date: 12/07/2000 at 07:21:19
From: Bob Tucek
Subject: Outliers in a box-and-whisker plot

I am teaching box-and-whisker plots to my seventh-grade students. We have calculated 1.5 times the IQR and added it to the upper quartile and subtracted it from the lower quartile. If any data point is beyond those values, it is an outlier.

The question is, "Would it be an outlier if the point were exactly 1.5*IQR away from one of the quartiles?" My instincts tell me that it would not be; that the point would need to be farther away than that. This appears to be confirmed by the TI-83 calculator, since it does not graph the point as an outlier.

However, I used a worksheet for an assignment that had only one point that was exactly 1.5*IQR away, and none farther away. The first question on the page asks them to identify the outlier. This implies that it would be an outlier.

I have looked in every book I have, searched your archives and the archives of other math sites online, and can't find a clarification anywhere. They all explain how to find outliers, but aren't specific enough to answer my question.

Thank you,
Bob Tucek
Date: 12/07/2000 at 11:58:54
From: Doctor TWE
Subject: Re: Outliers in a box-and-whisker plot

Hi Bob - thanks for writing to Dr. Math.

Outliers are data points that are outside the range of the data values that we want to describe. Outliers can be due to an error in measurement of the value, a value from a different population, or simply a rare chance event. In any case, where we draw the line for "outside" is somewhat arbitrary. (Why 1.5*IQR? Why not 2*IQR? Or sqrt(10)*IQR? Or (pi/2)*IQR?)

When taking statistical measurements, there is a "gray area," and methods for finding outliers are simply guidelines to help us find errant points - they're not intended to be absolute. The farther away from the mean a data point is, the more suspect it is. I would describe a data point that is exactly 1.5*IQR away from the quartiles as a "borderline outlier." The idea is to recognize that these data points are more likely to be "tainted."

The method you describe is, in fact, only one way of finding outliers. Another method is to define an outlier as any data point where the absolute value of the z-score is greater than 3 (i.e. it lies more than 3 standard deviations away from the mean). This definition would create a different set of boundaries for outliers.

Incidentally, my college statistics textbook (_Statistics for Engineering and the Sciences_, 4th edition; W. Mendenhall and T. Sincich; Prentice-Hall; 1995) describes suspect outliers as "observations that fall between the inner fences and the outer fences," where inner fences are defined at 1.5*IQR and outer fences are defined at 3*IQR. To me, this implies that a data point exactly on the inner fence would not be considered a suspect outlier (since it is not "between" the fences). But then it proceeds to describe highly suspect outliers as "observations that fall outside the outer fences." How then are we to interpret a data point that falls exactly on the outer fence? It is, strictly speaking, neither "between the fences" nor "beyond the outer fence." [Perhaps we can call it a "somewhat highly suspect outlier."]

An important thing to note is that in the end-of-chapter summary, the textbook describes both of these as "rules of thumb for detecting outliers."

The bottom line: The reliability of data points in our data set is not an "all-or-nothing" situation, but rather colored in shades of gray. Where we choose to "draw the line" is somewhat arbitrary and can be determined using different methods. So pick a method, and just be consistent.

I hope this helps. If you have any more questions, write back.

- Doctor TWE, The Math Forum
  http://mathforum.org/dr.math/
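
To make the boundary cases in the answer above concrete, here is a minimal Python sketch of the fence rule. It is only an illustration under stated assumptions, not the textbook's or the TI-83's procedure: the helper names `quartiles` and `classify` are hypothetical, the quartiles are taken as the medians of the lower and upper halves of the sorted data (one of several conventions in use), the sample data are made up, and the "borderline" labels are this sketch's way of reporting a point that lands exactly on a fence.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when the number of points is odd, the middle value is
    left out of both halves. Other quartile conventions exist.
    """
    s = sorted(data)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])


def classify(x, q1, q3):
    """Label x by how far it lies beyond the nearer quartile."""
    iqr = q3 - q1
    # Distance beyond the box; negative when x is between Q1 and Q3.
    dist = max(q1 - x, x - q3)
    inner, outer = 1.5 * iqr, 3 * iqr
    if dist < inner:
        return "not an outlier"
    if dist == inner:
        return "borderline (on the inner fence)"
    if dist < outer:
        return "suspect outlier"
    if dist == outer:
        return "borderline (on the outer fence)"
    return "highly suspect outlier"


# Made-up data chosen so that 25 sits exactly on the upper inner fence.
data = [2, 4, 6, 8, 10, 12, 14, 25]
q1, q3 = quartiles(data)       # Q1 = 5.0, Q3 = 13.0, so IQR = 8.0
print(classify(25, q1, q3))    # borderline (on the inner fence)
print(classify(26, q1, q3))    # suspect outlier
print(classify(37, q1, q3))    # borderline (on the outer fence)
```

Whether a "borderline" point is then counted as an outlier is exactly the choice discussed above: treating it as "not an outlier" matches the strict reading that the TI-83 appears to use, while treating it as an outlier matches the worksheet. Either is defensible as long as it is applied consistently.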
For more on the meanings of "quartile" and mathematicians' disagreements about them, see:

  Defining Quartiles
  http://mathforum.org/library/drmath/view/60969.html

- Doctor Melissa, The Math Forum
  http://mathforum.org/dr.math/
© 1994- The Math Forum at NCTM. All rights reserved.