### Outliers

```
Date: 03/14/2001 at 09:32:11
From: Mrs. Ben-Ami
Subject: Probability

What is the definition of outlier?
```

```
Date: 03/14/2001 at 10:10:39
From: Doctor Mitteldorf
Subject: Re: Probability

Dear Mrs. Ben-Ami,

For a variety of definitions of "outlier," you can use a search engine
like Google to look for the words "definition outlier." You'll find
definitions like these:

1) Outlier - a data point that is an "unusual" observation and likely
should be discarded. Note: The median is less affected by outliers
than is the mean.

2) A number that is far apart from the rest of the data; an extreme
value either much lower or much higher than the rest of the values
in the data set. Outliers are known to skew means or averages.

But I'm afraid you've unearthed an embarrassing secret of the
statistical trade: An outlier is a point which your data set is better
off without. If you can prove your point better by ignoring some small
portion of your data, why not ignore it?  It's probably just a blunder
on the part of the person collecting data, or some special, irrelevant
circumstance that we needn't investigate in detail.

There is no rigorous definition of an "outlier," and generations of
statisticians have made their employers' data look better than they
really are by selectively eliminating from analysis inconvenient data
points.

Having said all that - there is some justification for the concept.
Usually, there are many small sources of difference that together
cause data to be scattered in a recognizable pattern, and from
analyzing that pattern, you can conclude a great deal both about the
difference and about the average properties of the data. And it's
often true that, in a large data set, something odd happens to a few
of the measurements that doesn't happen to the rest. It can be as
simple as reading the meter wrong, or that some process was
inadvertently left incomplete at a few of the sites. You look
at the data and they fall into a smooth and regular pattern except for
a few points that stick out and make you wonder what happened.

So the concept of an "outlier" and the reason for eliminating outliers
from a data set before analysis are both legitimate; it's just that
the task of recognizing outliers lies outside of any objective,
mathematical procedure, and is thus subject to easy abuse. Statistical
analysis is sometimes done today by pure scientists whose only motive
is to seek truth, but more often it is done on contract to
organizations that have much at stake in the outcome. There is
pressure to make the analysis come out in one direction, and the
selective elimination of "outliers" is a favorite tool for justifying
the distortion of science by political ideology or economic interest
or even a theoretical bias of the scientist himself.

- Doctor Mitteldorf, The Math Forum
http://mathforum.org/dr.math/
```
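
The first definition quoted above notes that the median is less affected by outliers than the mean. A minimal Python sketch of that remark follows. The readings are hypothetical, and the last value is assumed to be a recording blunder such as a misread meter; nothing here is a rule for deciding which points are outliers.

```python
from statistics import mean, median

# Hypothetical meter readings (made up for illustration).
# The last value is assumed to be a blunder, e.g. a misread meter.
readings = [98, 101, 100, 99, 102, 1000]
clean = readings[:-1]  # the same data without the suspect point

# How far does each summary move when the suspect point is included?
mean_shift = mean(readings) - mean(clean)
median_shift = median(readings) - median(clean)

print(f"mean:   {mean(clean):.2f} -> {mean(readings):.2f}")
print(f"median: {median(clean):.2f} -> {median(readings):.2f}")
```

With these numbers, the single extreme reading moves the mean from 100 to 250, while the median moves only from 100 to 100.5. The code shows how much one point can change a result; whether that point deserves to be dropped is still a judgment call.
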
Associated Topics:
High School Statistics
