Ask Dr. Math - Questions and Answers from our Archives

Mean or Median?

Date: 11/02/2003 at 16:53:27
From: Jill
Subject: Why do we use mean instead of median

In working on finding textbook readability for a math project, we have 
to find the mean number of words per sentence.

One question asks: Why does the formula use the mean number of words 
per sentence instead of the median number of words per sentence?

When I found both the mean and the median, they had the same value, 
13.  They are both measuring the center, so what's the difference?  I
think it's supposed to have something to do with outliers. 


Date: 11/02/2003 at 17:14:32
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill, 

Probably the easiest way to see the difference is to consider some 
data with an outlier.  For example, suppose there are 7 people who
graduate from some university with degrees in communications. They all
get jobs, and their salaries are 

      $27,000
      $29,000
      $33,000
      $34,000
      $35,000
      $39,000
   $5,000,000

The last guy got a job playing basketball in the NBA!  Now, the median
is the middle value, or $34,000.  But the mean is about $742,000.  

The $5 million salary is what we call an "outlier".  :^D
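
If you'd like to check those two numbers yourself, here is a small
Python sketch.  It assumes the standard library's statistics module;
the variable names are just ones picked for this illustration.

```python
from statistics import mean, median

# Salaries from the example above, in dollars.
salaries = [27_000, 29_000, 33_000, 34_000, 35_000, 39_000, 5_000_000]

salary_median = median(salaries)  # middle value of the sorted list
salary_mean = mean(salaries)      # total divided by the number of values

print("median:", salary_median)
print("mean:  ", round(salary_mean))
```

Six of the seven graduates earn far less than the mean, which is the
hint that the mean is being pulled by the one large value.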

So, if you were trying to tell prospective communications majors what
they could expect to earn after graduation, which number would give
them a more accurate picture--the median, or the mean? 

On the other hand, if you were just trying to get people to come to 
your university, which number would attract more students?  

This is why it's important to understand that when someone reports an
"average" value, he might be talking about the mean, the median, or
the mode (another "middle" value), depending on what kind of
impression he wants to make.  It's up to _you_ to ask, in each case,
_which_ average he's talking about.  

Does this help? 

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/ 


Date: 11/02/2003 at 17:56:40
From: Jill
Subject: why do we use mean instead of median

So the reason we use the mean is because the mean is resistant to
outliers?  And it includes the exact data??


Date: 11/02/2003 at 20:48:44
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill,

Actually, the mean is affected by outliers.  To see why, go back and
look at our salary data again.  Now suppose we change the $5 million
salary to $40,000.  The median doesn't change at all, but the mean
changes by quite a bit!  

That is, the median is only affected by what's going on near the
middle of the data, while the mean is affected by _all_ the data--even 
if some of it is outrageous.  
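
A quick way to see this is to compute both numbers for the two versions
of the salary list.  This is only a sketch using Python's statistics
module, and the list names are made up for the illustration.

```python
from statistics import mean, median

with_outlier = [27_000, 29_000, 33_000, 34_000, 35_000, 39_000, 5_000_000]
# Same data, but the $5 million salary is replaced by $40,000.
without_outlier = [27_000, 29_000, 33_000, 34_000, 35_000, 39_000, 40_000]

for label, data in [("with $5,000,000", with_outlier),
                    ("with $40,000", without_outlier)]:
    print(label, "-> median:", median(data), " mean:", round(mean(data)))
```

The median is the fourth value in both lists, so it stays put, while
the mean depends on the total and moves a long way.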

However, there is a nice feature of the mean, which makes it easy to
compute, and that is that we can just keep a running total, and a
count of the values we've seen so far, and we can get the mean.  If we
later want to include more values, we can just throw them into the pot. 

But if we want to find the median, we have to keep track of _all_ our
values, even if there are millions or billions of them--because 
there's no other way to find the one that's in the middle. 
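
Here is one way that bookkeeping might look in Python.  The class name
RunningMean and the sample values are hypothetical, chosen only to
illustrate the idea: the mean needs two numbers, while this simple way
of finding the median needs the whole list.

```python
from statistics import median


class RunningMean:
    """Keeps only a running total and a count, not the values themselves."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def mean(self):
        return self.total / self.count


running = RunningMean()
seen = []  # the median needs every value kept around

for value in [5, 6, 7, 2, 8]:  # values arriving one at a time
    running.add(value)
    seen.append(value)

current_mean = running.mean()
current_median = median(seen)
print("mean:", current_mean, " median:", current_median)
```

After each new value arrives, running.mean() is available right away,
no matter how many values came before.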

Does this make sense? 

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/ 


Date: 11/03/2003 at 15:53:25
From: Jill
Subject: Thank you (why do we use mean instead of median)

If the mean and median are different, why is it that sometimes they
have the same value? 


Date: 11/03/2003 at 19:41:59
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill, 

Sometimes that's just the way it works out.  For example, suppose I
collect some data, and all my values are the same:

  3, 3, 3, ..., 3, 3, 3

If I add these up and divide by the number of 3's, I'll get 3 as my
mean, right?  And the middle value is going to be a 3.  So the mean 
and median will be the same. 

In fact, if I make symmetric changes to this data set, I can still
arrange to get the same mean and median.  For example, suppose I
subtract 1 at one end, and add 1 at the other, to get

  2, 3, 3, ..., 3, 3, 4

The median won't change.  Do you see why the mean won't change, 
either? 

Suppose I have this data set:

  1 2 3 4 5 6 7 8 9 10 11

Paired from the ends, each pair adds up to 12:

  1 2 3 4 5 6 7 8 9 10 11
  | | | | |___| | | |  | 
  | | | |_______| | |  |
  | | |___________| |  |
  | |_______________|  |
  |____________________|

So the median will be 6, and so will the mean. 
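
To spell out the arithmetic: five pairs of 12 plus the 6 in the middle
make 66, and 66 divided by 11 values is 6.  The same pairing can be
sketched in Python (the variable names are only for illustration):

```python
values = list(range(1, 12))  # 1, 2, ..., 11

# Pair the first value with the last, the second with the
# second-to-last, and so on, leaving the middle value unpaired.
half = len(values) // 2
pair_sums = [values[i] + values[-1 - i] for i in range(half)]
middle = values[half]

total = sum(pair_sums) + middle
mean = total / len(values)

print("pair sums:", pair_sums)
print("middle value (median):", middle)
print("mean:", mean)
```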

Ultimately, if your mean and median are the same, what it tells you is
that the amounts above and below the median balance out, which is what
happens when your data are arranged symmetrically around the median.  For
every bit over the mean on one side, there's a corresponding bit under
the mean on the other side, balancing it out.  When they're different,
that tells you that this balance has been broken.

Does this make sense? 

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/ 


Date: 11/04/2003 at 18:04:19
From: Jill
Subject: why do we use mean instead of median

So do we use the mean because it's higher than the median?


Date: 11/04/2003 at 19:58:08
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill,

Either one could be higher than the other.  Here's an example where
the median is higher than the mean:

   0, 0, 1, 1, 1          Mean = 3/5,  Median = 1

Here's an example where the mean is higher than the median:

   0, 0, 0, 1, 1          Mean = 2/5,  Median = 0

Usually, if the mean and median are close, it means that the data are
roughly balanced around the mean, so the other advantages of the mean
(e.g., that it's easier to update when more data comes along) make it
the better choice.  

In a case like the college example, the mean is clearly skewed by an
outlier, so the median is a better choice. 

The important thing to remember is that there _is_ a choice, and when
the mean and median are substantially different, the one you choose
will depend on what point you want to make. 

For example, suppose you find that the median income of people in
Manhattan is $27,000 per year, while the mean is $78,000 (because
there are a lot of _really_ rich people there).  

If you're arguing for a tax increase, you'd want to say that the
"average" income is the mean, since it looks more like people can
afford it. 

But if you're arguing for a tax decrease, you'd want to say that the
"average" income is the median, since it looks like people can't
afford it.  

Neither one is inherently better than the other.  And that's why you
have to be careful whenever you hear someone talking about "average"
values.  

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/ 


Date: 11/04/2003 at 21:19:21
From: Jill
Subject: why do we use mean instead of median

These were my data:

 3,7,7,7,8,9,10,10,10,11,11,13,13,13,14,15,15,15,17,17,19,19,25

For the mean, I get 12.5.  For the median, I get 13.  Why is the
median higher than the mean?


Date: 11/04/2003 at 22:57:50
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill, 

Let's look at a very simple data set, with just one value:

   6

The mean is 6, and so is the median.  Now let's add a couple more values:

  5, 6, 7

Note that I can write it this way:

  (6-1), 6, (6+1)

Now, the median is unchanged.  It's still 6.  What happens when I
compute the mean?

  (6-1) + 6 + (6+1)   6 + 6 + 6 + -1 + 1
  ----------------- = ------------------
          3                   3

                      6 + 6 + 6 + 0
                    = -------------
                            3

                    = 6

What if I add some more values:

  2, 5, 6, 7, 8

Again, I can write everything in terms of the median plus or minus
something:

  (6-4), (6-1), 6, (6+1), (6+2)

The median is still 6.  When I compute the mean, I get

    6 + 6 + 6 + 6 + 6 + -4 + -1 + 1 + 2
    -----------------------------------
                  5

    6 + 6 + 6 + 6 + 6 + -2
  = ----------------------
              5

    6 + 6 + 6 + 6 + 6   -2
  = ----------------- + --
            5            5

  = 6 - 2/5

In other words, the values on either side of the median no longer
balance each other out.  The difference between 6 and 2 is greater 
than the difference between 6 and 8; so this pulls the mean to the 
left (i.e., makes it smaller than the median). 

Does that make sense?  Suppose I used these values instead:

  4, 5, 6, 7, 10

i.e., 

  (6-2), (6-1), 6, (6+1), (6+4)

Now the situation is reversed, and my mean will be 

  6 + 2/5

which is larger than the median. 

In fact, I can just forget about the median for a moment, and add up
the differences directly:

  Values                Differences

  5, 6, 7          ->   -1, 0, 1           These add up to zero

  2, 5, 6, 7, 8    ->   -4, -1, 0, 1, 2    These add up to -2  

  4, 5, 6, 7, 10   ->   -2, -1, 0, 1, 4    These add up to 2

So we can compute the difference between the median and the mean by
adding up all the _differences_ between the median and the values, and
dividing by the number of values.  
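
As a rough sketch, that rule could be written as a small Python
function.  The name mean_via_median is made up for this illustration,
and Fraction is used only so the answers come out as exact fractions
like 6 - 2/5.

```python
from fractions import Fraction
from statistics import median


def mean_via_median(values):
    """Median plus (sum of the differences from the median) / count."""
    m = median(values)
    differences = [v - m for v in values]
    return m + Fraction(sum(differences)) / len(values)


for data in ([5, 6, 7], [2, 5, 6, 7, 8], [4, 5, 6, 7, 10]):
    print(data, "->", mean_via_median(data))
```

For the three small data sets above this gives 6, 28/5 (that is,
6 - 2/5), and 32/5 (that is, 6 + 2/5).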

Let's try that with your data:

 -10, -6, -6, -6, -5, -4, -3, -3, -3, -2, -2, 
  
  0, 

  0, 0, 1, 2, 2, 2, 4, 4, 6, 6, 12

The negative differences add up to -50.  The positive differences add
up to 39.  The total of the differences is -11.  There are 23
differences, so the mean will be exactly 11/23 less than the median, i.e., 

  mean = median - 11/23

       = 13 - 11/23

       = 12.52 (approximately)
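
If you want to double-check the totals, here is a short Python sketch
that does the same bookkeeping on your 23 values and compares the
result with the mean computed the usual way.  The variable names are
assumptions made for the illustration.

```python
from statistics import mean, median

jill_data = [3, 7, 7, 7, 8, 9, 10, 10, 10, 11, 11, 13,
             13, 13, 14, 15, 15, 15, 17, 17, 19, 19, 25]

m = median(jill_data)
differences = [v - m for v in jill_data]

negative_total = sum(d for d in differences if d < 0)
positive_total = sum(d for d in differences if d > 0)

# Median plus the total of the differences divided by the count.
shortcut_mean = m + (negative_total + positive_total) / len(jill_data)
direct_mean = mean(jill_data)  # sum of the values divided by the count

print("negative total:", negative_total)
print("positive total:", positive_total)
print("shortcut mean: ", round(shortcut_mean, 2))
print("direct mean:   ", round(direct_mean, 2))
```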

Note that if we add 1 to each of the values on the right side of the
median, we get

 3,7,7,7,8,9,10,10,10,11,11,13,14,14,15,16,16,16,18,18,20,20,26
                               \______________________________/
                                 These have all increased by 1

which has differences of 

 -10, -6, -6, -6, -5, -4, -3, -3, -3, -2, -2, 
  
  0, 

  1, 1, 2, 3, 3, 3, 5, 5, 7, 7, 13
  \______________________________/
    These have all increased by 1

Now the negative differences (-50) and the positive differences (50)
exactly cancel out; and the mean is 

  mean = median + 0/23

       = 13 + 0/23

       = 13
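
The shifted list can also be built and checked in a few lines of
Python.  This is just a sketch: it assumes the list is already sorted,
and the names are made up for the illustration.

```python
from statistics import mean, median

jill_data = [3, 7, 7, 7, 8, 9, 10, 10, 10, 11, 11, 13,
             13, 13, 14, 15, 15, 15, 17, 17, 19, 19, 25]

middle = len(jill_data) // 2  # position of the median in the sorted list

# Leave the median and everything to its left alone; add 1 to each
# value to its right.
shifted = jill_data[:middle + 1] + [v + 1 for v in jill_data[middle + 1:]]

print("median:", median(shifted))
print("mean:  ", mean(shifted))
```

Notice that the shifted values still aren't mirror images of each
other around 13.  The mean and median agree because the totals on the
two sides balance, -50 against 50.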

If we added one more to the values on the right side--or added one to
each of the values on the left side--we'd end up with a mean that is
_larger_ than the median, instead of smaller. 

The main thing to realize is that the mean and median are telling you
different things about the data, so there's no reason to expect them
to be the same ahead of time, unless you know that the values are
supposed to be symmetrically distributed around the median. 

Does this make sense? 

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/ 
Associated Topics:
High School Statistics
Middle School Statistics

Ask Dr. MathTM
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/