The Math Forum

Ask Dr. Math - Questions and Answers from our Archives

Using the Range to Find Outliers

Date: 09/11/2001 at 07:46:20
From: Stu
Subject: Spotting outliers

Can you help me? How can you use the range of a set of numbers to 
determine whether any of the observations were outliers?

Date: 09/11/2001 at 11:36:58
From: Doctor Jubal
Subject: Re: Spotting outliers

Hi Stu,

Thanks for writing to Dr. Math.

There are a number of fairly simple ways to test for outliers. I'll
describe three, and maybe one of them is the one you were thinking of.

One method is to find the upper and lower quartile values.  The upper
quartile value (UQ) is the value that 75% of the data set is equal to 
or less than. The lower quartile value (LQ) is the value that 25% of 
the data set is equal to or less than. Then define the interquartile 
range (IQR) as the difference between the upper and lower quartiles.

  IQR = UQ - LQ

Many statistics books define suspect outliers as those that are at 
least 1.5*IQR greater than the upper quartile or 1.5*IQR less than the 
lower quartile.
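
As a rough illustration of the quartile rule, here is a minimal Python 
sketch. The function name iqr_outliers and the sample data are made up 
for this example. Books disagree on exactly how to compute quartiles, 
so the "inclusive" method used here is just one assumed convention. 
The sketch also uses a strict comparison (more than 1.5*IQR away) so 
that a data set with IQR = 0 does not flag every point.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return the values lying more than k*IQR outside the quartiles."""
    # quantiles(n=4) returns the three cut points [LQ, median, UQ].
    # method="inclusive" is an assumed quartile convention; others exist.
    lq, _, uq = statistics.quantiles(data, n=4, method="inclusive")
    iqr = uq - lq
    low = lq - k * iqr    # lower fence
    high = uq + k * iqr   # upper fence
    return [x for x in data if x < low or x > high]

sample = [10, 12, 11, 13, 12, 11, 40]   # hypothetical data
print(iqr_outliers(sample))             # prints [40]
```

For this sample, LQ = 11 and UQ = 12.5, so IQR = 1.5 and the fences 
sit at 8.75 and 14.75. Only 40 falls outside them.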

Another common method is to find the mean and the standard deviation 
of the data set (search the Dr. Math archives or write back if you 
don't understand these terms), and then call anything that falls more 
than three standard deviations away from the mean an outlier.  That 
is, x is an outlier if

   abs(x - mean)
  --------------- > 3
      std dev

Statistics books usually call the ratio abs(x-mean)/(std dev) a 
z-score and abbreviate it z.
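
The three-standard-deviation rule might be sketched in Python like 
this. The function name z_outliers and the data are hypothetical, and 
the sketch assumes the sample standard deviation (dividing by n - 1); 
some books use the population version instead.

```python
import statistics

def z_outliers(data, threshold=3.0):
    """Return the values whose abs(x - mean) / std dev exceeds threshold."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample std dev (an assumption)
    if sd == 0:
        return []                 # all values identical, nothing to flag
    return [x for x in data if abs(x - mean) / sd > threshold]

sample = [10] * 20 + [100]        # hypothetical data: 20 tens and one 100
print(z_outliers(sample))         # prints [100]
```

One thing to keep in mind: with the sample standard deviation, the 
largest possible ratio in a set of n points is (n - 1)/sqrt(n), so 
this rule cannot flag anything in a data set of 10 or fewer points.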

Another common outlier test is the Q-test, but be careful with this 
one: it should never be used to reject more than one point. It 
compares how far out the suspected outlier is to the total range of 
the data. Since the range gets smaller as you start rejecting points, 
applying the Q-test several times in succession can reject almost the 
entire data set, so never do it more than once. To do a Q-test, find 
the ratio

       abs(x_a - x_b)
  Q = ----------------
             R

Here x_a is the possible outlier, x_b is the data point closest to it, and 
R is the total range of the data set. If Q is greater than a certain 
critical value (Qcrit depends on the number of data points and how 
sure you want to be that it's okay to reject x_a as an outlier), then 
call x_a an outlier. At the 90% confidence level (that is, you want to 
be 90% sure that x_a doesn't really belong in the data set before you 
reject it), I've tabulated a few values for Qcrit below.  You can find 
more in any statistics book.

  number of data points     Qcrit
           3                 0.94
           4                 0.76
           5                 0.64
           6                 0.56
           7                 0.51
           8                 0.47
           9                 0.44
          10                 0.41
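
A minimal Python sketch of the Q-test, using the 90% critical values 
from the table above, could look like the following. The names q_test 
and QCRIT_90 are invented for this example, and the choice to test 
whichever end point has the larger gap to its neighbor is an 
assumption about how to pick the suspect. The function is meant to be 
applied once, not repeatedly.

```python
# 90% confidence critical values, copied from the table above.
QCRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56,
            7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(data):
    """Test the single most extreme point.

    Returns (suspect, Q, is_outlier).
    """
    n = len(data)
    if n not in QCRIT_90:
        raise ValueError("this table covers 3 to 10 data points only")
    s = sorted(data)
    r = s[-1] - s[0]                 # R, the total range
    if r == 0:
        return s[0], 0.0, False      # all values equal, no outlier
    gap_low = s[1] - s[0]            # smallest value vs. its neighbor
    gap_high = s[-1] - s[-2]         # largest value vs. its neighbor
    if gap_high >= gap_low:
        suspect, gap = s[-1], gap_high
    else:
        suspect, gap = s[0], gap_low
    q = gap / r                      # Q = abs(x_a - x_b) / R
    return suspect, q, q > QCRIT_90[n]

print(q_test([1, 2, 3, 4, 10]))
```

For the hypothetical data 1, 2, 3, 4, 10, the suspect is 10, its 
nearest neighbor is 4, and R = 9, so Q = 6/9, or about 0.67. That 
exceeds the Qcrit of 0.64 for five points, so 10 would be rejected at 
the 90% level.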

If you want more information on these and possibly other 
outlier-finding techniques, you can search the Dr. Math archives, 
and I can guarantee you'll get more than a few hits. If you don't 
think any of these was the method you were thinking of, feel free to 
write back.

- Doctor Jubal, The Math Forum   

For more on the meanings of "quartile" and mathematicians' 
disagreements about them, see

  Defining Quartiles

- Doctor Melissa, The Math Forum   
Associated Topics:
High School Statistics

Ask Dr. MathTM
© 1994- The Math Forum at NCTM. All rights reserved.