The Math Forum

Ask Dr. Math - Questions and Answers from our Archives


Date: 03/14/2001 at 09:32:11
From: Mrs. Ben-Ami
Subject: Probability

What is the definition of outlier?

Date: 03/14/2001 at 10:10:39
From: Doctor Mitteldorf
Subject: Re: Probability

Dear Mrs. Ben-Ami,

For a variety of definitions of "outlier," you can use a search engine 
such as Google to look for the words "definition outlier." You'll find 
definitions like these:

1) Outlier - a data point that is an "unusual" observation and likely 
should be discarded. Note: The median is less affected by outliers 
than is the mean.

2) A number that is far apart from the rest of the data; an extreme
value either much lower or much higher than the rest of the values
in the data set. Outliers are known to skew means or averages.
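To see the note in definition 1 at work, here is a minimal sketch in 
Python. The numbers are invented purely for illustration; they do not 
come from any real data set. Adding one extreme value pulls the mean a 
long way but barely moves the median.

```python
from statistics import mean, median

# Hypothetical measurements, invented for illustration only.
readings = [10, 11, 12, 13, 14]

# The same measurements plus one extreme value.
with_outlier = readings + [100]

# How far each summary statistic moves when the extreme value is added.
mean_shift = mean(with_outlier) - mean(readings)
median_shift = median(with_outlier) - median(readings)

print("mean:  ", mean(readings), "->", mean(with_outlier))
print("median:", median(readings), "->", median(with_outlier))
```

The single extreme value moves the mean by more than 14 units, while 
the median moves by only half a unit.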

But I'm afraid you've unearthed an embarrassing secret of the 
statistical trade: An outlier is a point which your data set is better 
off without. If you can prove your point better by ignoring some small 
portion of your data, why not ignore it?  It's probably just a blunder 
on the part of the person collecting data, or some special, irrelevant 
circumstance that we needn't investigate in detail.

There is no rigorous definition of an "outlier," and generations of 
statisticians have made their employers' data look better than they 
really are by selectively eliminating inconvenient data from analysis.

Having said all that, there is some justification for the concept.  
Usually, there are many small sources of difference that together 
cause data to be scattered in a recognizable pattern, and from 
analyzing that pattern, you can conclude a great deal both about the 
difference and about the average properties of the data. And it's 
often true that, in a large data set, something odd happens to a few 
of the measurements that doesn't happen to the rest. It can be as 
simple as someone misreading a meter, or some process being 
inadvertently left incomplete at a few of the sites. You look 
at the data and they fall into a smooth and regular pattern except for 
a few points that stick out and make you wonder what happened.
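One common rule of thumb for flagging such points (a convention, not a 
rigorous definition, and not something the definitions quoted above 
prescribe) marks any value lying more than 1.5 times the interquartile 
range beyond the quartiles. The sketch below assumes that rule; the 
function name flag_outliers, the multiplier k, and the data are all 
hypothetical choices made for illustration.

```python
from statistics import quantiles

def flag_outliers(data, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    The multiplier k is a convention chosen by the analyst, not a
    mathematical fact; 1.5 is merely a popular default.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical measurements, invented for illustration only.
readings = [10, 11, 12, 13, 14, 15, 16, 22]

flagged_default = flag_outliers(readings)       # k = 1.5
flagged_loose = flag_outliers(readings, k=3.0)  # a more lenient choice

print("k = 1.5:", flagged_default)
print("k = 3.0:", flagged_loose)
```

With k = 1.5 the value 22 is flagged; with k = 3.0 nothing is. The 
same point is or is not an "outlier" depending on a threshold that the 
analyst picks.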

So the concept of an "outlier" and the reason for eliminating outliers 
from a data set before analysis are both legitimate; it's just that 
the process of recognizing outliers lies outside of any objective, 
mathematical process, and is thus subject to easy abuse. Statistical 
analysis is sometimes done today by pure scientists whose only motive 
is to seek truth, but more often it is done on contract to 
organizations that have much at stake in the outcome. There is 
pressure to make the analysis come out in one direction, and the 
selective elimination of "outliers" is a favorite tool for justifying 
the distortion of science by political ideology or economic interest 
or even a theoretical bias of the scientist himself.

- Doctor Mitteldorf, The Math Forum   
Associated Topics:
High School Statistics


Ask Dr. Math™
© 1994- The Math Forum at NCTM. All rights reserved.