Ask Dr. Math - Questions and Answers from our Archives

Origin of Origin's Outputs

Date: 09/17/2012 at 05:12:53
From: Ji
Subject: box-plot

I have read this question and answer:

  http://mathforum.org/library/drmath/view/60969.html 

But another question has arisen. When I drew this box plot using Origin
(version 8.5 of the software), the results were

   LQ = 3 and UQ = 7 for data set B = {1, 2, 3, 4, 5, 6, 7, 8, 9}
   LQ = 3 and UQ = 9 for data set D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

Apparently, the Origin program included the median when calculating the
lower and upper quartiles for data set B, but excluded the median when
calculating those values for D.

So Origin seems to follow neither Tukey's method nor any of the others.
But what method could it have used for these calculations?



Date: 09/17/2012 at 19:23:23
From: Doctor Peterson
Subject: Re: box-plot

Hi, Ji.

Your quartile data appears to agree with the chart on that page for sets B
and D using the "M&S" method. Here's the relevant section:

   Mendenhall and Sincich, in their text _Statistics for Engineering and
   the Sciences_, define a different method of finding quartile values. To
   apply their method on a data set with n elements, first calculate

      L = (1/4)(n + 1)

   and round to the nearest integer. If L falls halfway between two
   integers, round up. The Lth element is the lower quartile value.

   Next, calculate

      U = (3/4)(n + 1)

   and round to the nearest integer. If U falls halfway between two
   integers, round down. The Uth element is the upper quartile value. 

For set B = {1, 2, 3, 4, 5, 6, 7, 8, 9}, n = 9, so 

   L = (1/4)(9 + 1) = 2.5, rounded up to 3
   U = (3/4)(9 + 1) = 7.5, rounded down to 7

Thus, LQ and UQ of B are 3 and 7, respectively.

For set D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, n = 11, so

   L = (1/4)(11 + 1) = 3
   U = (3/4)(11 + 1) = 9

Thus, LQ and UQ of D are 3 and 9, respectively.
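
The M&S rule quoted above can be sketched in Python as follows. This is
only an illustration of the rounding rule as described on that page, not
Origin's actual code; the function name ms_quartiles is made up for this
example.

```python
import math

def ms_quartiles(data):
    """Return (LQ, UQ) using the M&S positional rule quoted above.

    Hypothetical helper for illustration; positions are 1-based,
    as in the text.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one data point")
    # L = (n + 1)/4, rounded to the nearest integer; an exact half rounds up.
    lower_pos = math.floor((n + 1) / 4 + 0.5)
    # U = 3(n + 1)/4, rounded to the nearest integer; an exact half rounds down.
    upper_pos = math.ceil(3 * (n + 1) / 4 - 0.5)
    # Convert the 1-based positions to Python's 0-based indexing.
    return xs[lower_pos - 1], xs[upper_pos - 1]

print(ms_quartiles(range(1, 10)))   # set B
print(ms_quartiles(range(1, 12)))   # set D
```

For sets B and D this gives (3, 7) and (3, 9), matching the hand
calculations above.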

There may well be other methods they could use to obtain these values;
does M&S produce the values you observe for other cases?

- Doctor Peterson, The Math Forum
  http://mathforum.org/dr.math/ 
  


Date: 09/17/2012 at 20:49:41
From: Ji
Subject: box-plot

Thanks for your help! 

When the number of data points is odd, it seems that the Origin software
applies the "M&S" method.

When the number of data is even, for example, {1, 2, 3, 4, 5, 6, 7, 8},
n = 8, 

   L = (1/4)(8 + 1) = 2.25, round down to 2
   U = (3/4)(8 + 1) = 6.75, round up to 7

Thus, the LQ and UQ are 2 and 7, respectively. Are they right? 

But Origin gives output of LQ and UQ as 2.5 and 6.5, respectively, with 
median = 4.5.



Date: 09/17/2012 at 23:03:00
From: Doctor Peterson
Subject: Re: box-plot

Hi, Ji.

You're right, it appears they do not consistently use the M&S method; for
even numbers, they might use Tukey or M&M. But since those have a very
different style from M&S, I can't imagine they would mix the two. More
likely, they follow some other method entirely.

For a larger study of different methods of calculation of quartiles
(actually percentiles, the quartiles being the 25th and 75th percentiles),
see

    http://www.amstat.org/publications/jse/v14n3/langford.html 

Table 2 is similar to Dr. TWE's table, but with slightly smaller sample
data sets. Perhaps if you try those, you will recognize one that agrees
with your software. Let me know what values you get for the quartiles.

I suspect they may be using the "CDF method" (4):

   METHOD 4 ("CDF"): The Pth percentile value is found as follows.
   Calculate np. If np is an integer, then the Pth percentile value is the
   average of #(np) and #(np + 1). If np is not an integer, the Pth
   percentile value is #ceil(np); that is, we round up. Alternatively, one
   can look at #(np + 0.5) and round off unless it is half an odd integer,
   in which case it is left unrounded.

   As an example, if S5 = {1, 2, 3, 4, 5} and p = 0.25, then #(np) = 1.25,
   which is not an integer. So we take the next largest integer, and
   hence, Q1 = 2. Using the alternative calculation, we would look at 
   #(np + 0.5) = #(1.75), which would again round off to 2. Note that this
   method can be considered as "Method 10 with rounding."

Translated into Dr. TWE's terms,

   L = n/4
       if it is not an integer, round up and use Lth data point;
       if it is an integer, average the Lth and (L + 1)th data points

   U = 3n/4
       if it is not an integer, round up and use Uth data point;
       if it is an integer, average the Uth and (U + 1)th data points

For {1, 2, 3, 4, 5, 6, 7, 8}, this gives

   LQ: L = 8/4 = 2; average 2nd and 3rd data points, giving 2.5
   UQ: U = 3*8/4 = 6; average 6th and 7th data points, giving 6.5

For {1, 2, 3, 4, 5, 6, 7, 8, 9}, we get

   LQ: L = 9/4 = 2.25; round up and use 3rd data point, 3
   UQ: U = 3*9/4 = 6.75; round up and use 7th data point, 7

These cases agree with what you've told me.
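
Here is a minimal Python sketch of the CDF rule as translated above, in
case you want to test more data sets. It is my own illustration, not
anything taken from Origin or from Langford's article; the names
cdf_percentile and cdf_quartiles are hypothetical. The percentile is
passed as a fraction (numerator/denominator) so that the "is np an
integer?" test uses exact integer arithmetic.

```python
def cdf_percentile(data, numerator, denominator):
    """Percentile p = numerator/denominator by the CDF rule (Method 4).

    Hypothetical helper for illustration; assumes 0 < p < 1.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one data point")
    if not 0 < numerator < denominator:
        raise ValueError("need 0 < numerator/denominator < 1")
    # whole is floor(np); rem == 0 exactly when np is an integer.
    whole, rem = divmod(n * numerator, denominator)
    if rem == 0:
        # np is an integer: average data points #(np) and #(np + 1).
        return (xs[whole - 1] + xs[whole]) / 2
    # np is not an integer: round up and use data point #ceil(np),
    # which sits at 0-based index floor(np).
    return xs[whole]

def cdf_quartiles(data):
    """Return (LQ, UQ) using the CDF rule."""
    return cdf_percentile(data, 1, 4), cdf_percentile(data, 3, 4)

print(cdf_quartiles(range(1, 9)))    # {1, ..., 8}
print(cdf_quartiles(range(1, 10)))   # {1, ..., 9}
```

The two calls give (2.5, 6.5) and (3, 7), the same values as the hand
calculations above.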

- Doctor Peterson, The Math Forum
  http://mathforum.org/dr.math/ 



Date: 09/18/2012 at 03:59:54
From: Ji
Subject: Thank you (box-plot)

Dear Dr. Peterson,

I'm very grateful to you for all your help!

The software which I am using, OriginPro 8.5 (OriginLab), produced the
following values for the quartiles:

   SET                      (Q1, Q3)     MEDIAN   n/4    3n/4   2n/4
   ------------------------------------------------------------------
   {1, 2, 3, 4}             (1.5, 3.5)   2.5      1      3      2
   {1, 2, 3, 4, 5}          (2, 4)       3        1.25   3.75   2.5
   {1, 2, 3, 4, 5, 6}       (2, 5)       3.5      1.5    4.5    3
   {1, 2, 3, 4, 5, 6, 7}    (2, 6)       4        1.75   5.25   3.5

These results indicate that the software uses Method 4 (CDF), in
accordance with your deduction.

Thank you very much!

Best regards!

Allen Gaohua Ji,
Shanghai Ocean University



Date: 09/18/2012 at 16:03:19
From: Doctor Peterson
Subject: Re: Thank you (box-plot)

Hi, Ji.

Yes, it does look very likely that they are using this method, or at least
something equivalent.

Note that at the end of his article, Langford says that this is the best
method, and offers an equivalent method that is easy to teach:

   Thus, the following method is equivalent to the CDF Method 4, yet has
   the flavor of the Inclusive and Exclusive Methods 1 and 2, and thus
   should be more accessible to students.

   SUGGESTED METHOD: Divide the data set into two halves, a bottom half
   and a top half. If n is odd, include or exclude the median in the
   halves so that each half has an odd number of elements. The lower and
   upper quartiles are then the medians of the bottom and top halves,
   respectively.

So for these four examples, the work looks like this:

   --+--            <-- bottom half
   1 | 2 | 3 | 4
     |   | --+--    <-- top half
    1.5 2.5 3.5     <-- Q1, Q2, Q3



   ----+----
   1  (2) (3) (4)  5
       |   ----+----
       2   3   4



   ----+----
   1  (2)  3 | 4  (5)  6
       |     | ----+----
       2    3.5    5



   ----+----
   1  (2)  3  (4)  5  (6)  7
       |       |   ----+----
       2       4       6

So it looks like your software has made a good choice.
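
If you would like to experiment with Langford's suggested method, here is
a rough Python sketch of it. The function name suggested_quartiles is my
own, and the code is just one way to carry out the "halves" description
quoted above, using the median function from Python's standard statistics
module.

```python
from statistics import median

def suggested_quartiles(data):
    """Return (Q1, Q3) as the medians of the bottom and top halves.

    Hypothetical helper for illustration. For odd n, the overall median
    is included in both halves or excluded from both, whichever makes
    each half contain an odd number of elements.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one data point")
    half = n // 2  # size of each half when the median is excluded
    if n % 2 == 1 and half % 2 == 0:
        # Excluding the median would leave even-sized halves,
        # so include it in both halves instead.
        half += 1
    bottom = xs[:half]
    top = xs[n - half:]
    return median(bottom), median(top)

for n in (4, 5, 6, 7):
    print(n, suggested_quartiles(range(1, n + 1)))
```

For n = 4, 5, 6, and 7 this gives (1.5, 3.5), (2, 4), (2, 5), and (2, 6),
the same quartiles that your software reported.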

- Doctor Peterson, The Math Forum
  http://mathforum.org/dr.math/ 



Date: 09/18/2012 at 21:06:25
From: Ji
Subject: Thank you (box-plot)

Dear Dr. Peterson,

Thank you again for your detailed instruction! Now I'm clear about how my
software works.

Best regards!

Allen Gaohua Ji

Associated Topics: High School Statistics

Ask Dr. Math (TM)
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/