Outliers

Date: 03/14/2001 at 09:32:11
From: Mrs. Ben-Ami
Subject: Probability

What is the definition of outlier?

Date: 03/14/2001 at 10:10:39
From: Doctor Mitteldorf
Subject: Re: Probability

Dear Mrs. Ben-Ami,

For a variety of definitions of "outlier," you can use a search engine like Google to look for the words "definition outlier." You'll find definitions like these:

1) Outlier - a data point that is an "unusual" observation and likely should be discarded. Note: The median is less affected by outliers than is the mean.

2) A number that is far apart from the rest of the data; an extreme value either much lower or much higher than the rest of the values in the data set. Outliers are known to skew means or averages.

But I'm afraid you've unearthed an embarrassing secret of the statistical trade: An outlier is a point which your data set is better off without. If you can prove your point better by ignoring some small portion of your data, why not ignore it? It's probably just a blunder on the part of the person collecting data, or some special, irrelevant circumstance that we needn't investigate in detail. There is no rigorous definition of an "outlier," and generations of statisticians have made their employers' data look better than they really are by selectively eliminating inconvenient data points from analysis.

Having said all that - there is some justification for the concept. Usually, there are many small sources of difference that together cause data to be scattered in a recognizable pattern, and from analyzing that pattern, you can conclude a great deal both about the difference and about the average properties of the data. And it's often true that, in a large data set, something odd happens to a few of the measurements that doesn't happen to the rest. It can be as simple as reading the meter wrong, or that some process was inadvertently left incomplete at a few of the sites.
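
To make the misread-meter case concrete, here is a minimal Python sketch. The readings are invented purely for illustration and do not come from any real data set. It shows how a single blunder can pull the mean a long way while hardly moving the median, as the first definition above notes.

```python
from statistics import mean, median

# Hypothetical meter readings; the values are invented for illustration.
readings = [10.1, 9.8, 10.3, 10.0, 9.9]
# The same readings with one value misread (10.0 recorded as 100.0).
with_blunder = [10.1, 9.8, 10.3, 100.0, 9.9]

def shift(stat, clean, dirty):
    """Return how far a summary statistic moves when the blunder is present."""
    return abs(stat(dirty) - stat(clean))

mean_shift = shift(mean, readings, with_blunder)
median_shift = shift(median, readings, with_blunder)

print(f"mean moves by   {mean_shift:.2f}")
print(f"median moves by {median_shift:.2f}")
```

In this made-up sample the mean shifts by about 18 units and the median by about 0.1. Note that the sketch only shows the effect of a point already known to be a blunder; it does not decide which points deserve that label.
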
You look at the data and they fall into a smooth and regular pattern except for a few points that stick out and make you wonder what happened.

So the concept of an "outlier" and the reason for eliminating them from a data set before analysis are both legitimate; it's just that the process of recognizing outliers lies outside of any objective, mathematical process, and is thus subject to easy abuse.

Statistical analysis is sometimes done today by pure scientists whose only motive is to seek truth, but more often it is done on contract to organizations that have much at stake in the outcome. There is pressure to make the analysis come out in one direction, and the selective elimination of "outliers" is a favorite tool for justifying the distortion of science by political ideology or economic interest or even a theoretical bias of the scientist himself.

- Doctor Mitteldorf, The Math Forum
  http://mathforum.org/dr.math/
© 1994- The Math Forum at NCTM. All rights reserved.