Ask Dr. Math - Questions and Answers from our Archives

Derivation of Linear Interpolation Median Formula

Date: 09/29/2007 at 05:16:31
From: Daya
Subject: Median,  m  =  L + [ (N/2 - F) / f ]C

Median,  m  =  L + [ (N/2 - F) / f ]C.

Where does this median formula come from?  My teacher did not show or
prove how it is derived, so I just substitute the values and use the
formula blindly.  Can you help me?

This formula is used to find the median of grouped data with class
intervals.  The median is the value in the middle position of the data
set when the data are arranged in numerical order.  The class in which
the middle position is located is called the median class, and the
median lies in this class.

Median,  m  =  L + [ (N/2 - F) / f ]C

  L means lower boundary of the median class

  N means sum of frequencies

  F means cumulative frequency before the median class, that is, the
    total frequency of all the classes below the median class

  f means frequency of the median class
 
  C means the size of the median class

I have tried to use an ogive graph to understand, but I still do not
see where this formula comes from.



Date: 09/30/2007 at 23:19:53
From: Doctor Peterson
Subject: Re: Median,  m  =  L + [ (N/2 - F) / f ]C

Hi, Daya.

This is a linear interpolation (on the ogive graph, as you suggested),
which finds where the actual median WOULD be if you assume that the
data are uniformly distributed within the median class.

One way to derive the formula is just to note that N/2 is the number
of data values BELOW the median, so N/2 - F is the number of data
values in the median class that are below the median.  Therefore,
(N/2 - F)/f is the fraction of values in the median class that are
below the median.  Multiplying this fraction by C gives the
corresponding fraction of the class width; adding L gives the value at
that position in the class.
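
The reasoning above can be sketched in Python.  This is only an
illustration: the function name grouped_median and the sample numbers
are hypothetical and are not part of the original question.

```python
def grouped_median(L, N, F, f, C):
    """Estimate the median of grouped data by linear interpolation.

    L: lower boundary of the median class
    N: sum of frequencies
    F: cumulative frequency before the median class
    f: frequency of the median class
    C: size (width) of the median class
    """
    # Fraction of the median class that lies below the median.
    fraction = (N / 2 - F) / f
    # Move that fraction of the class width up from the lower boundary.
    return L + fraction * C


# Hypothetical data: class 20-30 holds 10 of 40 values, with 15 values below it.
print(grouped_median(L=20, N=40, F=15, f=10, C=10))
```

With these made-up numbers, half of the median class lies below the
median, so the estimate lands at the middle of the class.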

In terms of the ogive (cumulative distribution), let's first just plot
the actual cumulative frequency before each class, something like

  N+                             *
   |                        *
   |                   *
   |
   |              * ---
   + . . . . . .     ^
   |                 |f
   |                 v
  F|         *      ---
   |    *
   *----+----+----+----+----+----+
             L
             |<-->|
                C

We don't know where the actual data points are, but if they are
uniformly distributed within each class, we could connect the points
above with straight lines.  Your formula gives the x coordinate
corresponding to y=N/2.  See if you can derive it this way.
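
For readers who want to check their work, here is one way the
derivation could go, written in LaTeX.  It assumes the straight line
runs through the points (L, F) and (L + C, F + f), the cumulative
frequencies at the two boundaries of the median class; x and y are
just names for the coordinates on the graph.

```latex
% Line through (L, F) and (L + C, F + f):
y = F + \frac{f}{C}\,(x - L)

% Set y = N/2, call the corresponding x-coordinate m, and solve for m:
\frac{N}{2} = F + \frac{f}{C}\,(m - L)
\quad\Longrightarrow\quad
m = L + \left[\frac{N/2 - F}{f}\right] C
```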

If you have any further questions, feel free to write back.


- Doctor Peterson, The Math Forum
  http://mathforum.org/dr.math/ 

Date: 09/01/2016 at 08:11:52
From: Pramod
Subject: Rationale for finding median from a frequency distribution

Given this frequency distribution table:

   60-70          4
   70-80          5
   80-90          6
   90-100         7
  ------------------
             n = 22 

I used the following rationale to calculate the median.

Median data entry = (22 + 1)/2 
                  = 11.5th entry from first
                  = 11.5 - 9
                  = 2.5th of 6 entries through 80-90

Now, since I don't know the 6 data entries of the median class, I assumed
that they were evenly spaced from 80 to 90 (a class width of 10):

   81.667, 83.333, 85, 86.667, 88.333, 90

I used these in the formula

   Median = L + {((n + 1)/2) - c.f.} * (h/f)

Here,

        L = lower limit of median class
        h = class width
     c.f. = cumulative frequency up to the preceding class
        f = frequency of median class
        n = total data entries/summation of frequencies

I got

   Median = 2.5th data entry 
          = (83.333 + 85)/2
          = 84.1667

But in almost every statistics book I have ever studied, the formula for
calculating median from a continuous frequency distribution table is 
given as

   Median = L + {(n/2) - c.f.} * (h/f)

I know very well that the median calculated from such data is not exact,
since we know only the range of the data entries -- not the actual data
entries themselves. But still, doesn't it make more sense to use my
formula? Doesn't it give a more precise approximation? If you agree, why
is the latter formula used in almost every textbook?



Date: 09/01/2016 at 10:17:24
From: Doctor Peterson
Subject: Re: Rationale for finding median from a frequency distribution

Hi, Pramod.

I discussed this formula for Daya, above, but I didn't go into the details
of the derivation to confirm that that formula could not be improved upon.

I have a small problem with your example: you didn't clearly state how to
interpret your classes.

Let's take a closer look at your data.

   class        freq
   -----        ----
   60-70          4
   70-80          5
   80-90          6
   90-100         7
                ----
             n = 22 

Which class is 70 in? I will assume that 80-90 means 80 <= x < 90, as is
commonly done for continuous data; if the values are integers, then the
class could also be described as 80-89 (inclusive), but then our estimate
would have to be rounded to an integer, so we would not get a similar
formula.

If the 6 values in the class 80 <= x < 90 are evenly spaced across these
10 units, then they are spaced 10/6 = 1 2/3 units apart. 

I would center them like this:

     5/6  __5/3__   __5/3__   __5/3__   __5/3__   __5/3__   5/6
    /  \ /       \ /       \ /       \ /       \ /       \ /  \
        *         *    |    *         *         *         *
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
   80    81    82    83    84    85    86    87    88    89    90

Therefore, the 2.5th value is 83 1/3 -- that is, 80 + 2*5/3, not 
80 + 2.5*5/3.
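
As a rough check of this centering, the six positions can be listed
with exact fractions in Python.  The variable names are illustrative,
and taking the "2.5th" value as midway between the 2nd and 3rd values
is an assumption that follows the question's own convention.

```python
from fractions import Fraction

lower, width, f = 80, 10, 6        # the class 80 <= x < 90 holding 6 values
step = Fraction(width, f)          # spacing of 5/3 between neighbouring values

# Centre each value in its own sub-interval of width 5/3,
# which leaves a gap of 5/6 at each end of the class.
values = [lower + step * (k - Fraction(1, 2)) for k in range(1, f + 1)]

# Take the "2.5th" value as midway between the 2nd and 3rd values.
value_2_5 = (values[1] + values[2]) / 2

print([str(v) for v in values])
print(value_2_5)
```

Here the 2nd and 3rd values come out as 82 1/2 and 84 1/6, and their
midpoint is 250/3, that is, 83 1/3.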

The standard formula gives

   Median = 80 + [(22/2) - 9] * (10/6) 
          = 80 + 2*5/3 
          = 83 1/3

This agrees with my answer.

In the page above, the implication is that we would use the continuous CDF
(for your example) like this:

   n=22|                   *
       +                  /
       |                 /
       |               /
       |              * ---
       |             /   ^
     11| . . . . . .     |f=6
       +          /      v
    F=9|         *      ---
       |       /
       |      /
       |    *
       |  /
      0*----+----+----+----+
       60   70   80   90  100
                 |<-->|
                  C=10

Linear interpolation puts the median 2/6 of the way from 80 to 90, giving
83 1/3 again.
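
The same interpolation can be sketched in Python for a whole table.
This is a minimal illustration under the assumption that the ogive is
piecewise linear between class boundaries; the function name
ogive_median is hypothetical.

```python
def ogive_median(boundaries, freqs):
    """Median by linear interpolation on a piecewise-linear ogive.

    boundaries: class boundaries, e.g. [60, 70, 80, 90, 100]
    freqs:      frequency of each class, e.g. [4, 5, 6, 7]
    """
    half = sum(freqs) / 2
    cumulative = 0  # cumulative frequency at the lower boundary of the class
    for lo, hi, f in zip(boundaries, boundaries[1:], freqs):
        if f > 0 and cumulative + f >= half:
            # The ogive crosses the height N/2 inside this class.
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("the table contains no data")


print(ogive_median([60, 70, 80, 90, 100], [4, 5, 6, 7]))
```

For the table in this example, the crossing falls in the class from 80
to 90 and the result is approximately 83.33.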

The difference between my first approach and yours is that I was a little
more careful to distribute the values uniformly within the entire
interval; whereas your last value is right at the end of the interval
(and, I think, really in the next interval!). The fact that this results
in the same answer obtained for a piecewise-linear CDF is encouraging.

- Doctor Peterson, The Math Forum at NCTM
  



Date: 09/01/2016 at 11:03:57
From: Pramod
Subject: Thank you (Rationale for finding median from a frequency distribution)

Thank you for the well-explained answer. 

It looks like my approach went wrong in deciding which class boundary to
include.

© 1994-2015 The Math Forum
http://mathforum.org/dr.math/