Ask Dr. Math - Questions and Answers from our Archives

Box and Whisker Plots


Date: 02/13/2000 at 21:39:15
From: Ramiro Martinez
Subject: Box and Whisker Plots

I don't understand box and whisker plots. All I know is that a box and 
whisker plot is used to display data. I can't find information on this 
anywhere else.

Sincerely, 
Ramiro Martinez


Date: 02/14/2000 at 12:03:38
From: Doctor TWE
Subject: Re: Box and Whisker Plots

Hi Ramiro - thanks for writing to Dr. Math.

A box-and-whisker plot (often simply called a box plot) is a graphical 
way of showing data. It is useful for quickly finding outliers - data 
points out of line with the rest of the data set. 

Suppose we want to construct a box plot of the following test scores:

     50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

If they're not already in numerical order, it's best to arrange them 
in ascending order.

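First, we need to construct the "box." To do so, we must find the 
upper and lower quartiles and the median. The median is the number in 
the middle of our set (when arranged in numerical order). The upper 
and lower quartiles are the values 1/4 of the way from the top or 
bottom of our set. Our example has 15 scores, so the median is the 
8th score, and the quartiles are the 4th and 12th scores (the middle 
of the seven scores on either side of the median):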

     50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100
                  ^               ^               ^
                 L.Q.           Median           U.Q.
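
The positions marked above can be checked with a short script. This is 
only a sketch in Python, and it assumes the "median of each half" 
convention, where the overall median is left out of both halves when 
the count is odd. Other quartile conventions can give slightly 
different values. The function name median_and_quartiles is just an 
illustrative choice.

```python
from statistics import median

def median_and_quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Assumption: each quartile is the median of one half of the sorted
    data, leaving out the overall median when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the median
    upper = data[len(data) - half:]   # values above the median
    return median(lower), median(data), median(upper)

scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
lq, med, uq = median_and_quartiles(scores)
print(lq, med, uq)  # 77 83 85
```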

To draw the box, we'll put a scale on the x-axis and draw a box from 
the lower quartile to the upper quartile. We'll add a vertical line to 
mark the median, like so:

                             LQ     M UQ
                              +-------+
                              |     | |
                              +-------+
   ^.........^.........^.........^.........^.........^.........^
   50       60        70        80        90        100       110

     where LQ = Lower Quartile, M = Median, UQ = Upper Quartile.


Now we add "fences." First, we compute the interquartile range (IQR), 
which is the distance between the quartiles: IQR = UQ - LQ. So in our 
example IQR = 85 - 77 = 8. The inner fences are 1.5*IQR below the L.Q. 
and 1.5*IQR above the U.Q. For our example, the inner fences are at:

              77 - 1.5*8 = 77 - 12 = 65
     and at   85 + 1.5*8 = 85 + 12 = 97
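
The same arithmetic, written as a small Python sketch. The variable 
names are illustrative, and the quartiles 77 and 85 are the ones read 
off the example data.

```python
lq, uq = 77, 85   # quartiles of the example test scores

iqr = uq - lq                         # 85 - 77 = 8
lower_inner_fence = lq - 1.5 * iqr    # 77 - 12 = 65.0
upper_inner_fence = uq + 1.5 * iqr    # 85 + 12 = 97.0

print(iqr, lower_inner_fence, upper_inner_fence)  # 8 65.0 97.0
```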

We'll mark these with a dotted line (I'll use colons ":"). Sometimes 
the fences are not drawn on the box plot, but we'll put them in so we 
can see where they are:

                 LIF         LQ     M UQ         UIF
                  :           +-------+           :
                  :           |     | |           :
                  :           +-------+           :
   ^.........^.........^.........^.........^.........^.........^
   50       60        70        80        90        100       110

     where LIF = Lower Inner Fence, UIF = Upper Inner Fence.


There is also a set of outer fences. These are 3*IQR below the L.Q. 
and 3*IQR above the U.Q. For our example, the outer fences are at:

              77 - 3*8 = 77 - 24 = 53
     and at   85 + 3*8 = 85 + 24 = 109

We'll mark these with another dotted line. The outer fences are always 
twice as far from the quartiles as the inner fences. Here's what we 
have so far:

     LOF         LIF         LQ     M UQ         UIF         UOF
      :           :           +-------+           :           :
      :           :           |     | |           :           :
      :           :           +-------+           :           :
   ^.........^.........^.........^.........^.........^.........^
   50       60        70        80        90        100       110

     where LOF = Lower Outer Fence, UOF = Upper Outer Fence.
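
Since both sets of fences follow the same rule with a different 
multiplier, one helper can produce either pair. This is a sketch only; 
fence_pair is a made-up name, not something from a library.

```python
def fence_pair(lq, uq, multiplier):
    """Return (lower, upper) fences, multiplier * IQR beyond the quartiles."""
    iqr = uq - lq
    return lq - multiplier * iqr, uq + multiplier * iqr

lq, uq = 77, 85
inner = fence_pair(lq, uq, 1.5)
outer = fence_pair(lq, uq, 3)
print(inner)  # (65.0, 97.0)
print(outer)  # (53, 109)

# The outer fences sit twice as far from the quartiles as the inner ones.
print(lq - outer[0] == 2 * (lq - inner[0]))  # True
```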


Now we add the "whiskers." Find the smallest data value above (to the 
right of) the Lower Inner Fence. Mark it with an X and draw a line 
connecting it to the box. Similarly, find the largest data value below 
(to the left of) the Upper Inner Fence. Mark it with an X and draw a 
line connecting it to the box as well. In our example, the end values 
for our whiskers are at 73 (the first value above 65) and 95 (the 
first value below 97). Our plot now looks like this:

     LOF         LIF         LQ     M UQ         UIF         UOF
      :           :           +-------+           :           :
      :           :       X---|     | |---------X :           :
      :           :           +-------+           :           :
   ^.........^.........^.........^.........^.........^.........^
   50       60        70        80        90        100       110
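
Finding the whisker ends amounts to taking the smallest and largest 
scores that lie between the inner fences. A minimal Python sketch 
follows. It assumes that a score landing exactly on a fence counts as 
inside it, a case the example data never hits.

```python
scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
lower_inner_fence, upper_inner_fence = 65, 97

# Assumption: a value exactly on a fence is treated as inside it.
inside = [s for s in scores if lower_inner_fence <= s <= upper_inner_fence]

whisker_low, whisker_high = min(inside), max(inside)
print(whisker_low, whisker_high)  # 73 95
```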


Finally, we have to mark the outliers. Values between the inner and 
outer fences are called "suspect outliers." We mark them with an 
asterisk "*".

Values outside the outer fences are called "highly suspect outliers." 
We mark them with an "o". In our example, we have two suspect 
outliers: the 60 and the 100. We also have one highly suspect outlier: 
the 50. Once we mark these on our plot, we're finished:

     LOF         LIF         LQ     M UQ         UIF         UOF
      :           :           +-------+           :           :
   o  :      *    :       X---|     | |---------X :  *        :
      :           :           +-------+           :           :
   ^.........^.........^.........^.........^.........^.........^
   50       60        70        80        90        100       110
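
The marking rule can be sketched as a small Python function. The name 
outlier_mark is illustrative, and treating a value exactly on a fence 
as not being an outlier is an assumption; the example data has no such 
value.

```python
def outlier_mark(value, lq, uq):
    """Return "o", "*", or "" for a single data value.

    "o" = highly suspect outlier (beyond an outer fence)
    "*" = suspect outlier (between an inner and an outer fence)
    ""  = not an outlier (assumed to include values exactly on a fence)
    """
    iqr = uq - lq
    if value < lq - 3 * iqr or value > uq + 3 * iqr:
        return "o"
    if value < lq - 1.5 * iqr or value > uq + 1.5 * iqr:
        return "*"
    return ""

scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
marks = {}
for s in scores:
    mark = outlier_mark(s, lq=77, uq=85)
    if mark:
        marks[s] = mark
print(marks)  # {50: 'o', 60: '*', 100: '*'}
```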


We could "erase" the fences and labels, but I'd probably leave them in 
so that the person looking at the graph can see where they are. If we 
erase them, we'll have:

                              +-------+
   o         *            X---|     | |---------X    *
                              +-------+
   ^.........^.........^.........^.........^.........^.........^
   50       60        70        80        90        100       110

As you can see, this plot quickly gives an idea of what our data look 
like. About half the numbers are between 77 and 85, the middle of the 
data set is at 83, the "reasonable" range of the data goes from 73 to 
95, and we have three suspect data values at 50, 60, and 100.
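
For anyone who wants to draw the finished picture by machine, here is 
a rough Python sketch that builds the box and whisker rows as text. It 
is a toy under several assumptions: all values are whole numbers, each 
character column stands for one unit, column 0 stands for the value 
47 (so that 50 lands in the fourth column, as in the drawings above), 
and the quartiles and median are passed in rather than computed. The 
name render_rows is made up for this example.

```python
def render_rows(scores, lq, med, uq, left=47):
    """Return the (top, middle, bottom) text rows of a box plot.

    Assumptions: whole-number data, one column per unit, and column 0
    stands for the value `left`, which must not exceed the smallest score.
    """
    iqr = uq - lq
    inner_lo, inner_hi = lq - 1.5 * iqr, uq + 1.5 * iqr
    outer_lo, outer_hi = lq - 3 * iqr, uq + 3 * iqr
    inside = [s for s in scores if inner_lo <= s <= inner_hi]
    w_lo, w_hi = min(inside), max(inside)

    width = max(scores) - left + 1
    edge = [" "] * width
    mid = [" "] * width

    # Top and bottom edges of the box.
    for v in range(lq, uq + 1):
        edge[v - left] = "-"
    edge[lq - left] = edge[uq - left] = "+"

    # Whiskers first, then the box walls and median line over them.
    for v in range(w_lo, w_hi + 1):
        mid[v - left] = " " if lq < v < uq else "-"
    mid[w_lo - left] = mid[w_hi - left] = "X"
    for v in (lq, med, uq):
        mid[v - left] = "|"

    # Outliers: "*" between the fences, "o" beyond the outer fences.
    for s in scores:
        if s < outer_lo or s > outer_hi:
            mid[s - left] = "o"
        elif s < inner_lo or s > inner_hi:
            mid[s - left] = "*"

    top = "".join(edge).rstrip()
    return top, "".join(mid).rstrip(), top

scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
for row in render_rows(scores, lq=77, med=83, uq=85):
    print(row)
```

This prints only the three rows of the box, whiskers, and outlier 
marks; the scale underneath would still need to be added by hand.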

A nice feature of this kind of plot is that all the computations are 
relatively simple. We never had to do anything more than add, 
subtract, and multiply by 1.5 and 3.

I hope this helps! If you have any more questions, write back.

- Doctor TWE, The Math Forum
  http://mathforum.org/dr.math/   
    


For more on the meanings of "quartile" and mathematicians' 
disagreements about them, see

  Defining Quartiles
  http://mathforum.org/library/drmath/view/60969.html

- Doctor Melissa, The Math Forum
  http://mathforum.org/dr.math/   
    
Ask Dr. Math (TM)
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/