Using the Range to Find Outliers
Date: 09/11/2001 at 07:46:20
From: Stu
Subject: Spotting outliers

Can you help me? How can you use the range of a set of numbers to determine whether any of the observations were outliers?
Date: 09/11/2001 at 11:36:58
From: Doctor Jubal
Subject: Re: Spotting outliers

Hi Stu,

Thanks for writing Dr. Math.

There are a number of fairly simple ways to test for outliers. I'll describe three, and maybe one of them is the one you were thinking of.

One method is to find the upper and lower quartile values. The upper quartile value (UQ) is the value that 75% of the data set is equal to or less than. The lower quartile value (LQ) is the value that 25% of the data set is equal to or less than. Then define the interquartile range (IQR) as the difference between the upper and lower quartiles.

   IQR = UQ - LQ

Many statistics books define suspect outliers as those that are at least 1.5*IQR greater than the upper quartile or 1.5*IQR less than the lower quartile.

Another common method is to find the mean and the standard deviation of the data set (search the Dr. Math archives or write back if you don't understand these terms), and then call anything that falls more than three standard deviations away from the mean an outlier. That is, x is an outlier if

   abs(x - mean)
   ------------- > 3
      std dev

This is usually called a z-test in statistics books, and the ratio abs(x - mean)/(std dev) is often abbreviated z.

Another common outlier test is the Q-test, but be careful with this one since it should never be used to reject more than one point. It compares how far out the outlier is to the total range of the data, and since the range of the data gets smaller as you start rejecting points, it is possible to reject almost the entire data set if you apply the Q-test several times in succession, so never do it more than once.

To do a Q-test, find the ratio

       abs(x_a - x_b)
   Q = --------------
             R

x_a is the possible outlier, x_b is the data point closest to it, and R is the total range of the data set.
If Q is greater than a certain critical value (Qcrit depends on the number of data points and how sure you want to be that it's okay to reject x_a as an outlier), then call x_a an outlier.

At the 90% confidence level (that is, you want to be 90% sure that x_a doesn't really belong in the data set before you reject it), I've tabulated a few values for Qcrit below. You can find more in any statistics book.

   Number of data points    Qcrit
             3              0.94
             4              0.76
             5              0.64
             6              0.56
             7              0.51
             8              0.47
             9              0.44
            10              0.41

If you want more information on these and possibly other outlier-finding techniques, you can search the Dr. Math archives using the keyword outlier and I can guarantee you'll get more than a few hits.

If you don't think any of these were the method you were thinking of, feel free to write back.

- Doctor Jubal, The Math Forum
  http://mathforum.org/dr.math/
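As a rough illustration of the z-test and the Q-test described above, here is a minimal Python sketch. The function names (z_score_outliers, q_test) and the dictionary Q_CRIT_90 are made up for this example and are not part of the original answer. The sketch assumes the sample standard deviation (n - 1 denominator), and it only knows the 90% Qcrit values from the table above, so it refuses data sets with fewer than 3 or more than 10 points.

```python
from statistics import mean, stdev

# Qcrit values at the 90% confidence level, copied from the table above.
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56,
             7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def z_score_outliers(data, threshold=3.0):
    """Return the values more than `threshold` standard deviations from the mean."""
    m = mean(data)
    s = stdev(data)  # assumption: sample standard deviation (n - 1 denominator)
    if s == 0:
        return []  # all values identical, so nothing can be an outlier
    return [x for x in data if abs(x - m) / s > threshold]


def q_test(data):
    """Apply the Q-test ONCE to the most isolated extreme value.

    Returns a tuple (suspect, Q, is_outlier).
    """
    n = len(data)
    if n not in Q_CRIT_90:
        raise ValueError("Qcrit is only tabulated here for 3 to 10 data points")
    ordered = sorted(data)
    data_range = ordered[-1] - ordered[0]  # R, the total range
    if data_range == 0:
        return ordered[0], 0.0, False
    gap_low = ordered[1] - ordered[0]      # distance from minimum to its neighbor
    gap_high = ordered[-1] - ordered[-2]   # distance from maximum to its neighbor
    # The suspect x_a is whichever extreme sits farther from its nearest neighbor x_b.
    if gap_high >= gap_low:
        suspect, gap = ordered[-1], gap_high
    else:
        suspect, gap = ordered[0], gap_low
    q = gap / data_range
    return suspect, q, q > Q_CRIT_90[n]
```

For example, q_test([10.1, 10.2, 10.3, 10.2, 10.4, 12.5]) picks 12.5 as the suspect point, computes Q of about 0.875, and compares it with the Qcrit of 0.56 for six data points. Note that q_test deliberately examines only one point, in keeping with the warning never to apply the test more than once.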
For more on the meanings of "quartile" and mathematicians' disagreements about them, see

   Defining Quartiles
   http://mathforum.org/library/drmath/view/60969.html

- Doctor Melissa, The Math Forum
  http://mathforum.org/dr.math/
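Because quartile definitions differ, the 1.5*IQR rule can give slightly different fences depending on the convention chosen. The following Python sketch shows one way to apply the rule. The function name iqr_outliers is made up for this example, and the quartiles come from Python's statistics.quantiles, whose "exclusive" and "inclusive" methods are just two of the several conventions in use, not necessarily the one any given statistics book adopts.

```python
from statistics import quantiles


def iqr_outliers(data, method="exclusive"):
    """Return values at least 1.5*IQR beyond the upper or lower quartile.

    `method` is passed to statistics.quantiles ("exclusive" or "inclusive");
    which quartile convention to use is an assumption, not a settled rule.
    """
    lq, _median, uq = quantiles(data, n=4, method=method)
    iqr = uq - lq                  # IQR = UQ - LQ
    low_fence = lq - 1.5 * iqr
    high_fence = uq + 1.5 * iqr
    return [x for x in data if x <= low_fence or x >= high_fence]
```

Calling iqr_outliers(data) and iqr_outliers(data, method="inclusive") on the same data set is a quick way to see whether the choice of quartile definition changes which points get flagged.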
© 1994-2013 The Math Forum