Box and Whisker Plots

Date: 02/13/2000 at 21:39:15
From: Ramiro Martinez
Subject: Box and Whisker Plots

I don't understand box and whisker plots. All I know is that a box and
whisker plot is used to display data. I can't find information on this
anywhere else.

Sincerely,
Ramiro Martinez


Date: 02/14/2000 at 12:03:38
From: Doctor TWE
Subject: Re: Box and Whisker Plots

Hi Ramiro - thanks for writing to Dr. Math.

A box-and-whisker plot (often simply called a box plot) is a graphical
way of showing data. It is useful for quickly finding outliers - data
points out of line with the rest of the data set.

Suppose we want to construct a box plot of the following test scores:

   50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

If they're not already in numerical order, arrange them in ascending
order first.

First, we need to construct the "box." To do so, we must find the
upper and lower quartiles and the median. The median is the number in
the middle of our set (when arranged in numerical order). The upper
and lower quartiles are the values 1/4 of the way from the top or
bottom of our set. With 15 scores, the median is the 8th value (83),
the lower quartile is the 4th value (77), and the upper quartile is
the 12th value (85):

   50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100
               ^               ^               ^
               LQ            Median            UQ

To draw the box, we'll put a scale on the x-axis and draw a box from
the lower quartile to the upper quartile. We'll add a vertical line to
mark the median, like so:

                             LQ     M UQ
                              +-------+
                              |     | |
                              +-------+
   ^.........^.........^.........^.........^.........^.........^
   50        60        70        80        90        100       110

where LQ = Lower Quartile, M = Median, UQ = Upper Quartile.

Now we add "fences." First, we compute the interquartile range (IQR).
The IQR = UQ - LQ. So in our example IQR = 85 - 77 = 8. The inner
fences are 1.5*IQR below the LQ and 1.5*IQR above the UQ. For our
example, the inner fences are at:

   77 - 1.5*8 = 77 - 12 = 65
and at
   85 + 1.5*8 = 85 + 12 = 97

We'll mark these with a dotted line (I'll use colons ":").
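
If you'd like to check the arithmetic so far on a computer, here is a
minimal Python sketch of the quartile, IQR, and inner-fence steps. It
assumes one common quartile convention (the median of each half, with
the overall median left out of both halves when the count is odd),
which happens to give the 77 and 85 used above; other conventions can
give slightly different quartiles. The function name quartiles is my
own, not part of any library.

```python
from statistics import median

# Test scores from the worked example.
scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]


def quartiles(values):
    """Return (LQ, median, UQ) using the median-of-halves convention.

    Assumption: when the count is odd, the median itself is left out
    of both halves. Other quartile conventions can differ.
    """
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]        # values before the middle position
    upper_half = data[(n + 1) // 2:]   # values after the middle position
    return median(lower_half), median(data), median(upper_half)


lq, med, uq = quartiles(scores)
iqr = uq - lq                          # interquartile range
lower_inner_fence = lq - 1.5 * iqr
upper_inner_fence = uq + 1.5 * iqr

print("LQ =", lq, " median =", med, " UQ =", uq)
print("IQR =", iqr)
print("inner fences:", lower_inner_fence, upper_inner_fence)
```

For these scores the results should agree with the hand computation:
quartiles of 77 and 85, a median of 83, an IQR of 8, and inner fences
at 65 and 97.
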
Sometimes the fences are not drawn on the box plot, but we'll put them
in so we can see where they are:

                 LIF         LQ     M UQ         UIF
                  :           +-------+           :
                  :           |     | |           :
                  :           +-------+           :
   ^.........^.........^.........^.........^.........^.........^
   50        60        70        80        90        100       110

where LIF = Lower Inner Fence, UIF = Upper Inner Fence.

There is also a set of outer fences. These are 3*IQR below the LQ and
3*IQR above the UQ. For our example, the outer fences are at:

   77 - 3*8 = 77 - 24 = 53
and at
   85 + 3*8 = 85 + 24 = 109

We'll mark these with another dotted line. They are always twice as
far from the quartiles as the inner fences. Here's what we have so
far:

     LOF         LIF         LQ     M UQ         UIF         UOF
      :           :           +-------+           :           :
      :           :           |     | |           :           :
      :           :           +-------+           :           :
   ^.........^.........^.........^.........^.........^.........^
   50        60        70        80        90        100       110

where LOF = Lower Outer Fence, UOF = Upper Outer Fence.

Now we add the "whiskers." Find the smallest data value above (to the
right of) the Lower Inner Fence. Mark it with an X and draw a line
connecting it to the box. Similarly, find the largest data value below
(to the left of) the Upper Inner Fence. Mark it with an X and draw a
line connecting it to the box as well. In our example, the end values
for our whiskers are at 73 (the first value above 65) and 95 (the
first value below 97). Our plot now looks like this:

     LOF         LIF         LQ     M UQ         UIF         UOF
      :           :           +-------+           :           :
      :           :       X---|     | |---------X :           :
      :           :           +-------+           :           :
   ^.........^.........^.........^.........^.........^.........^
   50        60        70        80        90        100       110

Finally, we have to mark the outliers. Values between the inner and
outer fences are called "suspect outliers." We mark them with an
asterisk "*". Values outside the outer fences are called "highly
suspect outliers." We mark them with an "o". In our example, we have
two suspect outliers: the 60 and the 100. We also have one highly
suspect outlier: the 50.
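
The fence, whisker, and outlier rules above can also be checked with a
short Python sketch. It takes the quartiles 77 and 85 straight from
the worked example rather than recomputing them. One thing the rules
don't settle is what to do with a value that lands exactly on a fence;
the handling in this sketch is my own assumption, and it doesn't
matter for this data set because no score sits on a fence.

```python
# Test scores and quartiles taken from the worked example.
scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
lq, uq = 77, 85
iqr = uq - lq

# Inner fences are 1.5*IQR from the quartiles, outer fences are 3*IQR.
lower_inner, upper_inner = lq - 1.5 * iqr, uq + 1.5 * iqr
lower_outer, upper_outer = lq - 3 * iqr, uq + 3 * iqr

# Whisker ends: the extreme data values strictly inside the inner fences.
inside = [x for x in scores if lower_inner < x < upper_inner]
whisker_low, whisker_high = min(inside), max(inside)

# Assumption (not stated in the rules): a value exactly on an inner
# fence is treated as a suspect outlier, and a value exactly on an
# outer fence is still only "suspect", not "highly suspect".
highly_suspect = [x for x in scores if x < lower_outer or x > upper_outer]
suspect = [x for x in scores
           if x not in inside and x not in highly_suspect]

print("outer fences:", lower_outer, upper_outer)
print("whiskers end at:", whisker_low, whisker_high)
print("suspect outliers (*):", suspect)
print("highly suspect outliers (o):", highly_suspect)
```

For these scores it should report outer fences at 53 and 109, whiskers
ending at 73 and 95, suspect outliers 60 and 100, and one highly
suspect outlier, 50.
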
Once we mark these on our plot, we're finished:

     LOF         LIF         LQ     M UQ         UIF         UOF
      :           :           +-------+           :           :
   o  :      *    :       X---|     | |---------X :  *        :
      :           :           +-------+           :           :
   ^.........^.........^.........^.........^.........^.........^
   50        60        70        80        90        100       110

We could "erase" the fences and labels, but I'd probably leave them in
so that the person looking at the graph can see where they are. If we
erase them, we'll have:

                              +-------+
   o         *            X---|     | |---------X    *
                              +-------+
   ^.........^.........^.........^.........^.........^.........^
   50        60        70        80        90        100       110

As you can see, this plot quickly gives an idea of what our data look
like. About half the numbers are between 77 and 85, the middle of the
data set is at 83, the "reasonable" range of the data goes from 73 to
95, and we have three suspect data values at 50, 60, and 100.

A nice feature of this kind of plot is that all the computations are
relatively simple. We never had to do anything more than add,
subtract, and multiply by 1.5 and 3.

I hope this helps! If you have any more questions, write back.

- Doctor TWE, The Math Forum
  http://mathforum.org/dr.math/


For more on the meanings of "quartile" and mathematicians'
disagreements about them, see

  Defining Quartiles
  http://mathforum.org/library/drmath/view/60969.html

- Doctor Melissa, The Math Forum
  http://mathforum.org/dr.math/
Ask Dr. Math™
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/