
### Using the Range to Find Outliers

```
Date: 09/11/2001 at 07:46:20
From: Stu
Subject: Spotting outliers

Can you help me? How can you use the range of a set of numbers to
determine whether any of the observations were outliers?
```

```
Date: 09/11/2001 at 11:36:58
From: Doctor Jubal
Subject: Re: Spotting outliers

Hi Stu,

Thanks for writing Dr. Math.

There are a number of fairly simple ways to test for outliers. I'll
describe three, and maybe one of them is the one you were thinking of.

One method is to find the upper and lower quartile values. The upper
quartile value (UQ) is the value at or below which 75% of the data set
falls. The lower quartile value (LQ) is the value at or below which 25%
of the data set falls. Then define the interquartile range (IQR) as the
difference between the upper and lower quartiles.

IQR = UQ - LQ

Many statistics books define suspect outliers as those that are at
least 1.5*IQR greater than the upper quartile or 1.5*IQR less than the
lower quartile.

Another common method is to find the mean and the standard deviation
of the data set (search the Dr. Math archives or write back if you
don't understand these terms), and then call anything that falls more
than three standard deviations away from the mean an outlier.  That
is, x is an outlier if

  abs(x - mean)
  -------------  > 3
     std dev

This is usually called a z-test in statistics books, and the ratio
abs(x-mean)/(std dev) is often abbreviated z.

Another common outlier test is the Q-test, but be careful with this
one: it should never be used to reject more than one point. It
compares the gap between the suspect point and its nearest neighbor
with the total range of the data. Because the range shrinks each time
you reject a point, applying the Q-test several times in succession
can end up rejecting almost the entire data set, so never apply it
more than once. To do a Q-test, find the ratio

       abs(x_a - x_b)
  Q = ----------------
             R

Here x_a is the possible outlier, x_b is the data point closest to it,
and R is the total range of the data set. If Q is greater than a
certain critical value Qcrit, call x_a an outlier. Qcrit depends on the
number of data points and on how sure you want to be that it's okay to
reject x_a. At the 90% confidence level (that is, you want to be 90%
sure that x_a doesn't really belong in the data set before you reject
it), I've tabulated a few values for Qcrit below. You can find more in
any statistics book.

number of data points    Qcrit (90%)
          3                 0.94
          4                 0.76
          5                 0.64
          6                 0.56
          7                 0.51
          8                 0.47
          9                 0.44
         10                 0.41

For more on outlier-finding techniques, you can search the Dr. Math
archives using the
keyword

outlier

and I can guarantee you'll get more than a few hits. If none of these
is the method you were thinking of, feel free to write back.

- Doctor Jubal, The Math Forum
http://mathforum.org/dr.math/
```
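As a rough illustration of Doctor Jubal's first method (the 1.5*IQR rule), here is a minimal Python sketch. It assumes Python 3.8 or later and computes the quartiles with the standard library's `statistics.quantiles(..., method="inclusive")`. That is only one of several quartile conventions, so other conventions can give slightly different fences. The helper name `iqr_outliers` and the sample data are hypothetical, chosen only for illustration.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Flag values at least k*IQR beyond the quartiles (hypothetical helper).

    Returns (lower_fence, upper_fence, outliers).
    """
    # quantiles(n=4) returns the three cut points: LQ, median, UQ.
    # method="inclusive" is an assumption; other quartile definitions exist.
    lq, _median, uq = statistics.quantiles(data, n=4, method="inclusive")
    iqr = uq - lq                 # IQR = UQ - LQ
    lower_fence = lq - k * iqr    # k*IQR below the lower quartile
    upper_fence = uq + k * iqr    # k*IQR above the upper quartile
    # "At least" k*IQR beyond a quartile, so values on a fence also count.
    outliers = [x for x in data if x <= lower_fence or x >= upper_fence]
    return lower_fence, upper_fence, outliers


if __name__ == "__main__":
    sample = [2, 3, 3, 4, 5, 5, 6, 7, 30]   # made-up data
    print(iqr_outliers(sample))             # (-1.5, 10.5, [30])
```

Here LQ = 3, UQ = 6 and IQR = 3, so anything at or beyond 3 - 4.5 = -1.5 or 6 + 4.5 = 10.5 is flagged, which catches only the 30.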
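Doctor Jubal's second method, the z-test, can be sketched the same way. The answer doesn't say whether to use the sample or the population standard deviation; this sketch assumes the sample version, `statistics.stdev`, and `statistics.pstdev` could be swapped in instead. The helper name `z_outliers` and the data are hypothetical.

```python
import statistics


def z_outliers(data, threshold=3.0):
    """Return the values whose z = abs(x - mean) / (std dev) exceeds threshold."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation (assumption)
    if sd == 0:
        return []                 # every value equals the mean
    return [x for x in data if abs(x - mean) / sd > threshold]


if __name__ == "__main__":
    # Eleven values near 10 plus one suspect value (made-up data).
    sample = [10, 11, 9, 10, 10, 11, 9, 10, 10, 11, 9, 50]
    print(z_outliers(sample))     # [50]
```

Two things to keep in mind: the suspect value is included in the mean and standard deviation it is measured against, and with the sample standard deviation no z can exceed (n - 1)/sqrt(n). With 10 or fewer points, nothing can ever pass the three-standard-deviation cutoff, which is why the example uses twelve values.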
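Finally, here is a sketch of the Q-test using the 90% critical values from Doctor Jubal's table. The answer doesn't say how to pick the suspect point x_a; this sketch assumes you test whichever end of the sorted data (smallest or largest value) has the larger gap to its neighbor, and it runs the test only once, in line with the warning above. The names `q_test` and `Q_CRIT_90` are hypothetical.

```python
# Qcrit at 90% confidence, keyed by number of data points (from the table above).
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56,
             7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test(data):
    """Apply the Q-test once; return (x_a, Q, is_outlier)."""
    xs = sorted(data)
    n = len(xs)
    if n not in Q_CRIT_90:
        raise ValueError("this table only covers 3 to 10 data points")
    r = xs[-1] - xs[0]              # R, the total range
    if r == 0:
        return None, 0.0, False     # all values equal: nothing to reject
    q_low = (xs[1] - xs[0]) / r     # smallest value vs. its nearest neighbor
    q_high = (xs[-1] - xs[-2]) / r  # largest value vs. its nearest neighbor
    # Assumption: the suspect x_a is whichever end has the larger gap.
    if q_high >= q_low:
        x_a, q = xs[-1], q_high
    else:
        x_a, q = xs[0], q_low
    return x_a, q, q > Q_CRIT_90[n]


if __name__ == "__main__":
    # x_a = 30, Q = 23/28 (about 0.82) > 0.44, so 30 is flagged.
    print(q_test([2, 3, 3, 4, 5, 5, 6, 7, 30]))
```

Raising an error outside 3 to 10 points is just a choice for this sketch; covering larger data sets would need more Qcrit values from a statistics book.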

```
For more on the meanings of "quartile" and mathematicians' differing
definitions of it, see

Defining Quartiles
http://mathforum.org/library/drmath/view/60969.html

- Doctor Melissa, The Math Forum
http://mathforum.org/dr.math/
```
