Mean or Median?
Date: 11/02/2003 at 16:53:27
From: Jill
Subject: Why do we use mean instead of median

In working on finding textbook readability for a math project, we have to find the mean number of words per sentence. One question asks: Why does the formula use the mean number of words per sentence instead of the median number of words per sentence?

When I found both the mean and the median, they had the same value, 13. They are both measuring the center, so what's the difference? I think it's supposed to have something to do with outliers.
Date: 11/02/2003 at 17:14:32
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill,

Probably the easiest way to see the difference is to consider some data with an outlier. For example, suppose there are 7 people who graduate from some university with degrees in communications. They all get jobs, and their salaries are

  $27,000
  $29,000
  $33,000
  $34,000
  $35,000
  $39,000
  $5,000,000

The last guy got a job playing basketball in the NBA!

Now, the median is the middle value, or $34,000. But the mean is about $742,000. The $5 million salary is what we call an "outlier". :^D

So, if you were trying to tell prospective communications majors what they could expect to earn after graduation, which number would give them a more accurate picture: the median, or the mean? On the other hand, if you were just trying to get people to come to your university, which number would attract more students?

This is why it's important to understand that when someone reports an "average" value, he might be talking about the mean, the median, or the mode (the most common value, another kind of "average"), depending on what kind of impression he wants to make. It's up to _you_ to ask, in each case, _which_ average he's talking about.

Does this help?

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/
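A quick check of the salary numbers, sketched in Python with the standard library's statistics module (the variable name is just for illustration):

```python
from statistics import mean, median

# Starting salaries from the example, in dollars; the last one is the NBA outlier.
salaries = [27_000, 29_000, 33_000, 34_000, 35_000, 39_000, 5_000_000]

print(median(salaries))        # middle value of the sorted list: 34000
print(round(mean(salaries)))   # pulled far upward by one value: 742429
```

One extreme value drags the mean to more than twenty times the median, while the median still describes what a typical graduate earned.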
Date: 11/02/2003 at 17:56:40
From: Jill
Subject: why do we use mean instead of median

So the reason we use the mean is because the mean is resistant to outliers? And it includes the exact data??
Date: 11/02/2003 at 20:48:44
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill,

Actually, it's the other way around: the median is resistant to outliers, while the mean is affected by them. To see why, go back and look at our salary data again. Now suppose we change the $5 million salary to $40,000. The median doesn't change at all, but the mean changes by quite a bit (from about $742,000 to about $33,900)!

That is, the median is only affected by what's going on near the middle of the data, while the mean is affected by _all_ the data, even if some of it is outrageous.

However, the mean does have one nice feature: it's easy to compute. We can just keep a running total and a count of the values we've seen so far, and from those we can get the mean. If we later want to include more values, we can just throw them into the pot. But if we want to find the median, we have to keep track of _all_ our values, even if there are millions or billions of them, because there's no other way to find the one that's in the middle.

Does this make sense?

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/
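The "running total and a count" idea can be sketched as a tiny accumulator. `RunningMean` is a hypothetical name made up for this illustration, not a library type; the medians are computed from the full lists, since the median needs every value.

```python
from statistics import median

class RunningMean:
    """Hypothetical accumulator: remembers only a running total and a count."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, value):
        # Throw the new value into the pot; the value itself is not stored.
        self.total += value
        self.count += 1

    def mean(self):
        return self.total / self.count


salaries = [27_000, 29_000, 33_000, 34_000, 35_000, 39_000, 5_000_000]
modified = salaries[:-1] + [40_000]   # replace the outlier with $40,000

rm, rm2 = RunningMean(), RunningMean()
for s in salaries:
    rm.add(s)
for s in modified:
    rm2.add(s)

print(median(salaries), median(modified))   # 34000 34000
print(round(rm.mean()), round(rm2.mean()))  # 742429 33857
```

Adding more data later is just more calls to `add`; nothing already seen has to be revisited.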
Date: 11/03/2003 at 15:53:25
From: Jill
Subject: Thank you (why do we use mean instead of median)

If the mean and median are different, why is it that sometimes they have the same value?
Date: 11/03/2003 at 19:41:59
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill,

Sometimes that's just the way it works out. For example, suppose I collect some data, and all my values are the same:

  3, 3, 3, ..., 3, 3, 3

If I add these up and divide by the number of 3's, I'll get 3 as my mean, right? And the middle value is going to be a 3. So the mean and median will be the same.

In fact, if I make symmetric changes to this data set, I can still arrange to get the same mean and median. For example, suppose I subtract 1 at one end, and add 1 at the other, to get

  2, 3, 3, ..., 3, 3, 4

The median won't change. Do you see why the mean won't change, either?

Suppose I have this data set:

  1 2 3 4 5 6 7 8 9 10 11

Paired from the ends, each pair adds up to 12:

  1 + 11 = 12
  2 + 10 = 12
  3 +  9 = 12
  4 +  8 = 12
  5 +  7 = 12

with 6 left alone in the middle. So the median will be 6, and so will the mean.

Ultimately, if your mean and median are the same, what it tells you is that your data are balanced around the median: for every bit over the median on one side, there's a corresponding bit under the median on the other side, balancing it out. Symmetric data always balance this way (although balanced data don't have to be perfectly symmetric). When the mean and median are different, that tells you that this balance has been broken.

Does this make sense?

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/
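The examples above can be checked with Python's statistics module; the dictionary labels are just illustrative names, and the lists of 3's are given a fixed length of 9 as an assumption.

```python
from statistics import mean, median

examples = {
    "all threes": [3] * 9,
    "nudged ends": [2] + [3] * 7 + [4],   # subtract 1 at one end, add 1 at the other
    "one to eleven": list(range(1, 12)),
}

for name, data in examples.items():
    # Each data set is balanced around its middle, so mean and median agree.
    print(name, mean(data), median(data))

# Pair 1..11 from the ends: every pair sums to 12, and 6 sits alone in the middle.
nums = examples["one to eleven"]
pairs = [(nums[i], nums[-1 - i]) for i in range(len(nums) // 2)]
print(pairs)                       # [(1, 11), (2, 10), (3, 9), (4, 8), (5, 7)]
print({a + b for a, b in pairs})   # {12}
```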
Date: 11/04/2003 at 18:04:19
From: Jill
Subject: why do we use mean instead of median

So do we use the mean because it's higher than the median?
Date: 11/04/2003 at 19:58:08
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill,

Either one could be higher than the other. Here's an example where the median is higher than the mean:

  0, 0, 1, 1, 1     Mean = 3/5, Median = 1

Here's an example where the mean is higher than the median:

  0, 0, 0, 1, 1     Mean = 2/5, Median = 0

Usually, if the mean and median are close, it means that the data are roughly balanced around the middle, so the other advantages of the mean (e.g., that it's easier to update when more data comes along) make it the better choice. In a case like the salary example, the mean is clearly skewed by an outlier, so the median is a better choice.

The important thing to remember is that there _is_ a choice, and when the mean and median are substantially different, the one you choose will depend on what point you want to make.

For example, suppose you find that the median income of people in Manhattan is $27,000 per year, while the mean is $78,000 (because there are a lot of _really_ rich people there). If you're arguing for a tax increase, you'd want to say that the "average" income is the mean, since that makes it look as if people can afford it. But if you're arguing for a tax decrease, you'd want to say that the "average" income is the median, since that makes it look as if people can't afford it.

Neither one is inherently better than the other. And that's why you have to be careful whenever you hear someone talking about "average" values.

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/
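The two small examples can be verified with exact fractions, so the means print as 3/5 and 2/5 rather than decimals. `exact_mean` is a hypothetical helper written for this sketch.

```python
from fractions import Fraction
from statistics import median

def exact_mean(data):
    # Sum divided by count, kept as an exact fraction (3/5 rather than 0.6).
    return Fraction(sum(data), len(data))

for data in ([0, 0, 1, 1, 1], [0, 0, 0, 1, 1]):
    m, med = exact_mean(data), median(data)
    print(data, m, med, "median higher" if med > m else "mean higher")
# [0, 0, 1, 1, 1] 3/5 1 median higher
# [0, 0, 0, 1, 1] 2/5 0 mean higher
```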
Date: 11/04/2003 at 21:19:21
From: Jill
Subject: why do we use mean instead of median

These were my data:

  3,7,7,7,8,9,10,10,10,11,11,13,13,13,14,15,15,15,17,17,19,19,25

For the mean, I get 12.5. For the median, I get 13. Why is the median higher than the mean?
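Here is a small Python check of Jill's arithmetic, using exact fractions so that no rounding hides anything. It only confirms the numbers; it does not by itself explain why the median comes out higher.

```python
from fractions import Fraction
from statistics import median

# Jill's words-per-sentence counts, already sorted.
data = [3, 7, 7, 7, 8, 9, 10, 10, 10, 11, 11, 13, 13, 13,
        14, 15, 15, 15, 17, 17, 19, 19, 25]

exact = Fraction(sum(data), len(data))
print(len(data), sum(data))   # 23 288
print(exact, float(exact))    # 288/23, about 12.52
print(median(data))           # 13 (the 12th of 23 sorted values)
```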
Date: 11/04/2003 at 22:57:50
From: Doctor Ian
Subject: Re: why do we use mean instead of median

Hi Jill,

Let's look at a very simple data set, with just one value:

  6

The mean is 6, and so is the median. Now let's add a couple more values:

  5, 6, 7

Note that I can write it this way:

  (6-1), 6, (6+1)

Now, the median is unchanged. It's still 6. What happens when I compute the mean?

$$\frac{(6-1) + 6 + (6+1)}{3} = \frac{6 + 6 + 6 + (-1) + 1}{3} = \frac{6 + 6 + 6 + 0}{3} = 6$$

What if I add some more values:

  2, 5, 6, 7, 8

Again, I can write everything in terms of the median plus or minus something:

  (6-4), (6-1), 6, (6+1), (6+2)

The median is still 6. When I compute the mean, I get

$$\frac{6 + 6 + 6 + 6 + 6 + (-4) + (-1) + 1 + 2}{5} = \frac{6 + 6 + 6 + 6 + 6 + (-2)}{5} = \frac{6 + 6 + 6 + 6 + 6}{5} + \frac{-2}{5} = 6 - \frac{2}{5}$$

In other words, the values on either side of the median no longer balance each other out. The difference between 6 and 2 is greater than the difference between 6 and 8, so this pulls the mean to the left (i.e., makes it smaller than the median). Does that make sense?

Suppose I used these values instead:

  4, 5, 6, 7, 10

i.e.,

  (6-2), (6-1), 6, (6+1), (6+4)

Now the situation is reversed, and my mean will be

$$6 + \frac{2}{5}$$

which is larger than the median.

In fact, I can just forget about the median for a moment, and add up the differences directly:

  Values            Differences from 6     Sum of differences
  5, 6, 7           -1, 0, 1                0
  2, 5, 6, 7, 8     -4, -1, 0, 1, 2        -2
  4, 5, 6, 7, 10    -2, -1, 0, 1, 4         2

So we can compute the difference between the median and the mean by adding up all the _differences_ between the median and the values, and dividing by the number of values.

Let's try that with your data. Subtracting the median, 13, from each value gives

  -10, -6, -6, -6, -5, -4, -3, -3, -3, -2, -2, 0, 0, 0, 1, 2, 2, 2, 4, 4, 6, 6, 12

The negative differences add up to -50. The positive differences add up to 39. The total of the differences is -11.
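The shortcut rests on a one-line identity. In notation introduced here (not used in the original thread), write the values as $x_1, \dots, x_n$, the median as $m$, and the mean as $\bar{x}$:

$$\bar{x} - m = \frac{1}{n}\sum_{i=1}^{n} x_i - \frac{1}{n}\cdot n\,m = \frac{1}{n}\sum_{i=1}^{n} (x_i - m)$$

The same algebra works for any reference number $m$; the median is simply a convenient point to measure the differences from.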
There are 23 differences, so the mean will be 11/23 less than the median, i.e.,

$$\text{mean} = \text{median} - \frac{11}{23} = 13 - \frac{11}{23} \approx 12.52$$

Note that if we add 1 to each of the values on the right side of the median, we get

  3,7,7,7,8,9,10,10,10,11,11,13,14,14,15,16,16,16,18,18,20,20,26

where the last 11 values (everything after the median) have each increased by 1. The differences are now

  -10, -6, -6, -6, -5, -4, -3, -3, -3, -2, -2, 0, 1, 1, 2, 3, 3, 3, 5, 5, 7, 7, 13

again with the last 11 each increased by 1. Now the negative differences (-50) and the positive differences (50) exactly cancel out, and the mean is

$$\text{mean} = \text{median} + \frac{0}{23} = 13 + \frac{0}{23} = 13$$

If we added one more to each of the values on the right side, or added one to each of the values on the left side (moving them closer to the median), we'd end up with a mean that is _larger_ than the median, instead of smaller.

The main thing to realize is that the mean and median are telling you different things about the data, so there's no reason to expect them to be the same ahead of time, unless you know that the values are supposed to be symmetrically distributed around the median.

Does this make sense?

- Doctor Ian, The Math Forum
  http://mathforum.org/dr.math/
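The "differences from the median" method, and the effect of shifting one side of Jill's data, can be sketched in Python. `mean_via_deviations` is a hypothetical helper written for this illustration; exact `Fraction` arithmetic keeps results like 11/23 from being rounded.

```python
from fractions import Fraction
from statistics import median

def mean_via_deviations(data):
    """Hypothetical helper: mean = median + (sum of differences from the median) / n."""
    m = Fraction(median(data))
    diffs = [Fraction(x) - m for x in data]
    return m + sum(diffs) / len(data)

jill = [3, 7, 7, 7, 8, 9, 10, 10, 10, 11, 11, 13, 13, 13,
        14, 15, 15, 15, 17, 17, 19, 19, 25]
mid = len(jill) // 2   # index 11 holds the median, 13

# Add 1 to each of the 11 values after the median: the differences now cancel.
shifted = jill[:mid + 1] + [x + 1 for x in jill[mid + 1:]]
# Then add 1 to each of the 11 values before the median: the mean overtakes the median.
bumped = [x + 1 for x in shifted[:mid]] + shifted[mid:]

for data in (jill, shifted, bumped):
    print(median(data), mean_via_deviations(data))
# 13 288/23
# 13 13
# 13 310/23
```

The last line shows that raising the values below the median (toward it) pushes the mean above the median, 11/23 higher in this case.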
© 1994- The Math Forum at NCTM. All rights reserved.