Using the Range to Find Outliers

Date: 09/11/2001 at 07:46:20
From: Stu
Subject: Spotting outliers

Can you help me? How can you use the range of a set of numbers to
determine whether any of the observations were outliers?

Date: 09/11/2001 at 11:36:58
From: Doctor Jubal
Subject: Re: Spotting outliers
Hi Stu,
Thanks for writing to Dr. Math.
There are a number of fairly simple ways to test for outliers. I'll
describe three, and maybe one of them is the one you were thinking of.
One method is to find the upper and lower quartile values. The upper
quartile value (UQ) is the value that 75% of the data set is equal to
or less than. The lower quartile value (LQ) is the value that 25% of
the data set is equal to or less than. Then define the interquartile
range (IQR) as the difference between the upper and lower quartiles.
IQR = UQ - LQ
Many statistics books define suspect outliers as those that are at
least 1.5*IQR greater than the upper quartile or 1.5*IQR less than the
lower quartile.
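
As a rough illustration, the quartile rule above might be sketched in Python like this. The function name iqr_outliers and the sample data are made up for the example, and the "inclusive" quartile method is only one of several definitions in use, so other software may give slightly different quartiles for the same data.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return the values at least k*IQR above UQ or below LQ.

    Assumption: quartiles come from statistics.quantiles with the
    "inclusive" method; other quartile definitions can differ slightly.
    """
    lq, _median, uq = statistics.quantiles(data, n=4, method="inclusive")
    iqr = uq - lq                      # IQR = UQ - LQ
    low_fence = lq - k * iqr
    high_fence = uq + k * iqr
    # "At least 1.5*IQR" beyond a quartile, so the comparisons are inclusive.
    return [x for x in data if x <= low_fence or x >= high_fence]


sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(iqr_outliers(sample))            # [30]
```

Here LQ = 4 and UQ = 8, so IQR = 4 and anything at or above 14 (or at or below -2) is flagged as a suspect outlier.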
Another common method is to find the mean and the standard deviation
of the data set (search the Dr. Math archives or write back if you
don't understand these terms), and then call anything that falls more
than three standard deviations away from the mean an outlier. That
is, x is an outlier if
    abs(x - mean) / (std dev) > 3
This ratio, abs(x - mean)/(std dev), is usually called the z-score in
statistics books and is often abbreviated z.
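
A minimal Python sketch of the three-standard-deviation rule follows. The name z_score_outliers and the sample data are hypothetical, and using the sample standard deviation (dividing by n - 1) is an assumption, since the rule above does not say which standard deviation to use.

```python
import statistics


def z_score_outliers(data, threshold=3.0):
    """Return the values whose abs(x - mean)/(std dev) exceeds threshold."""
    mean = statistics.mean(data)
    std_dev = statistics.stdev(data)   # assumption: sample std dev (n - 1)
    if std_dev == 0:
        return []                      # all values equal, nothing stands out
    return [x for x in data if abs(x - mean) / std_dev > threshold]


# Hypothetical data: nineteen readings near 10 and one reading of 50.
sample = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
          11, 9, 10, 12, 10, 9, 11, 10, 10, 50]
print(z_score_outliers(sample))        # [50]
```

One thing to keep in mind: with the sample standard deviation, no point in a set of n values can have z larger than (n - 1)/sqrt(n), so this rule cannot flag anything in a data set of ten or fewer points.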
Another common outlier test is the Q-test, but be careful with this
one: it should never be used to reject more than one point. It
compares how far out the outlier is to the total range of the data.
Since the range gets smaller each time you reject a point, applying
the Q-test several times in succession can end up rejecting almost
the entire data set, so never do it more than once. To do a Q-test,
find the ratio
    Q = abs(x_a - x_b) / R
x_a is the possible outlier, x_b is the data point closest to it, and
R is the total range of the data set. If Q is greater than a certain
critical value (Qcrit depends on the number of data points and how
sure you want to be that it's okay to reject x_a as an outlier), then
call x_a an outlier. At the 90% confidence level (that is, you want to
be 90% sure that x_a doesn't really belong in the data set before you
reject it), I've tabulated a few values for Qcrit below. You can find
more in any statistics book.
    number of data points    Qcrit
             3               0.94
             4               0.76
             5               0.64
             6               0.56
             7               0.51
             8               0.47
             9               0.44
            10               0.41
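
Putting the Q-test and the table above together, a single application of the test could be sketched in Python as follows. The names Q_CRIT_90 and q_test are made up for this example, and treating the end point with the larger gap to its neighbor as the suspect x_a is an assumption; in practice you would usually choose the suspect point yourself.

```python
# 90% confidence critical values, copied from the table above.
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56,
             7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test(data):
    """Apply the Q-test once; return (suspect, Q, is_outlier)."""
    n = len(data)
    if n not in Q_CRIT_90:
        raise ValueError("Qcrit is only tabulated here for 3 to 10 points")
    ordered = sorted(data)
    data_range = ordered[-1] - ordered[0]          # R
    if data_range == 0:
        return ordered[0], 0.0, False              # all values identical
    gap_low = ordered[1] - ordered[0]
    gap_high = ordered[-1] - ordered[-2]
    # Assumption: the suspect x_a is the end point farther from its neighbor.
    if gap_high >= gap_low:
        suspect, gap = ordered[-1], gap_high
    else:
        suspect, gap = ordered[0], gap_low
    q = gap / data_range                           # Q = abs(x_a - x_b) / R
    return suspect, q, q > Q_CRIT_90[n]


print(q_test([10, 11, 12, 13, 25]))                # (25, 0.8, True)
```

For these five made-up points, Q = 12/15 = 0.8, which is greater than the Qcrit of 0.64 for five data points, so 25 would be rejected. The function is meant to be called once per data set, not repeatedly on what remains.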
If you want more information on these and possibly other outlier-
finding techniques, you can search the Dr. Math archives using the
keyword
outlier
and I can guarantee you'll get more than a few hits. If you don't
think any of these were the method you were thinking of, feel free to
write back.
- Doctor Jubal, The Math Forum
http://mathforum.org/dr.math/
For more on the meanings of "quartile" and mathematicians'
disagreements about them, see
Defining Quartiles
http://mathforum.org/library/drmath/view/60969.html
- Doctor Melissa, The Math Forum
http://mathforum.org/dr.math/
Ask Dr. MathTM
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/