
### Origin of Origin's Outputs

```
Date: 09/17/2012 at 05:12:53
From: Ji
Subject: box-plot

http://mathforum.org/library/drmath/view/60969.html

But another question has arisen. When I draw this box-plot using Origin
(version 8.5 of the software), it resulted in

LQ = 3 and UQ = 7 for data set B = {1, 2, 3, 4, 5, 6, 7, 8, 9}
LQ = 3 and UQ = 9 for data set D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

Apparently, the Origin program included the median when calculating the
lower and upper quartiles for data set B, but excluded the median when
calculating those values for D.

So Origin seems to follow neither Tukey's method nor others. But what
method could it have used for these calculations?

```
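Ji's observation can be checked with a short sketch. The function name `half_medians` and its `include_median` flag are illustrative only and do not come from Origin or the referenced page. The sketch splits an odd-length data set into halves either with or without the median and takes the median of each half:

```python
from statistics import median

def half_medians(data, include_median):
    """Return (LQ, UQ) as the medians of the bottom and top halves.

    For odd n, include_median decides whether the overall median is
    counted in both halves (True) or in neither (False).
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = xs[:half + 1], xs[half:]
    elif n % 2 == 1:
        lower, upper = xs[:half], xs[half + 1:]
    else:
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(upper)

B = list(range(1, 10))   # {1, ..., 9}
D = list(range(1, 12))   # {1, ..., 11}

for name, data in (("B", B), ("D", D)):
    print(name, "inclusive:", half_medians(data, True),
          "exclusive:", half_medians(data, False))
```

Under these assumptions, Origin's (3, 7) for B matches only the inclusive split and its (3, 9) for D matches only the exclusive split, which is the inconsistency Ji describes.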

```
Date: 09/17/2012 at 19:23:23
From: Doctor Peterson
Subject: Re: box-plot

Hi, Ji.

Your quartile data appears to agree with the chart on that page for sets B
and D using the "M&S" method. Here's the relevant section:

Mendenhall and Sincich, in their text _Statistics for Engineering and
the Sciences_, define a different method of finding quartile values. To
apply their method on a data set with n elements, first calculate

L = (1/4)(n + 1)

and round to the nearest integer. If L falls halfway between two
integers, round up. The Lth element is the lower quartile value.

Next, calculate

U = (3/4)(n + 1)

and round to the nearest integer. If U falls halfway between two
integers, round down. The Uth element is the upper quartile value.

For set B = {1, 2, 3, 4, 5, 6, 7, 8, 9}, n = 9, so

L = (1/4)(9 + 1) = 2.5, rounded up to 3
U = (3/4)(9 + 1) = 7.5, rounded down to 7

Thus, LQ and UQ of B are 3 and 7, respectively.

For set D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, n = 11, so

L = (1/4)(11 + 1) = 3
U = (3/4)(11 + 1) = 9

Thus, LQ and UQ of D are 3 and 9, respectively.

There may well be other methods they could use to obtain these values;
does M&S produce the values you observe for other cases?

- Doctor Peterson, The Math Forum
http://mathforum.org/dr.math/

```
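The M&S rule quoted above can be sketched in a few lines. This is a minimal illustration, not code from Mendenhall and Sincich or from Origin; the name `ms_quartiles` is made up here. Positions are 1-based as in the text, so they are shifted by one when indexing the Python list.

```python
import math

def ms_quartiles(data):
    """Return (LQ, UQ) using the M&S positions (n + 1)/4 and 3(n + 1)/4."""
    xs = sorted(data)
    n = len(xs)
    # Round to the nearest integer; an exact half rounds UP for the lower quartile.
    lower_pos = math.floor((n + 1) / 4 + 0.5)
    # Round to the nearest integer; an exact half rounds DOWN for the upper quartile.
    upper_pos = math.ceil(3 * (n + 1) / 4 - 0.5)
    return xs[lower_pos - 1], xs[upper_pos - 1]

print(ms_quartiles(range(1, 10)))   # set B, n = 9
print(ms_quartiles(range(1, 12)))   # set D, n = 11
```

Both positions are multiples of 0.25, so the floating-point arithmetic here is exact for any realistic n.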

```
Date: 09/17/2012 at 20:49:41
From: Ji
Subject: box-plot

When the number of data points is odd, it seems that the Origin software
applies the "M&S" method.

But consider an even number of data points, for example,
{1, 2, 3, 4, 5, 6, 7, 8}, where n = 8:

L = (1/4)(8 + 1) = 2.25, round down to 2
U = (3/4)(8 + 1) = 6.75, round up to 7

Thus, the LQ and UQ are 2 and 7, respectively. Are they right?

But Origin outputs LQ = 2.5 and UQ = 6.5, with median = 4.5.

```

```
Date: 09/17/2012 at 23:03:00
From: Doctor Peterson
Subject: Re: box-plot

Hi, Ji.

You're right, it appears they do not consistently use the M&S method; for
even numbers, they might use Tukey or M&M. But since those have a very
different style from M&S, I can't imagine they would mix the two. More
likely, they follow some other method entirely.

For a larger study of different methods of calculation of quartiles
(actually percentiles, the quartiles being the 25th and 75th percentiles),
see

http://www.amstat.org/publications/jse/v14n3/langford.html

Table 2 is similar to Dr. TWE's table, but with slightly smaller sample
data sets. Perhaps if you try those, you will recognize one that agrees
with your software. Let me know what values you get for the quartiles.

I suspect they may be using the "CDF method" (4):

METHOD 4 ("CDF"): The Pth percentile value is found as follows.
Calculate np. If np is an integer, then the Pth percentile value is the
average of #(np) and #(np + 1). If np is not an integer, the Pth
percentile value is #ceil(np); that is, we round up. Alternatively, one
can look at #(np + 0.5) and round off unless it is half an odd integer,
in which case it is left unrounded.

As an example, if S5 = {1, 2, 3, 4, 5} and p = 0.25, then np = 1.25,
which is not an integer. So we take the next largest integer, and
hence, Q1 = 2. Using the alternative calculation, we would look at
#(np + 0.5) = #(1.75), which would again round off to 2. Note that this
method can be considered as "Method 10 with rounding."

Translated into Dr. TWE's terms,

L = n/4
if it is not an integer, round up and use Lth data point;
if it is an integer, average the Lth and (L + 1)th data points

U = 3n/4
if it is not an integer, round up and use Uth data point;
if it is an integer, average the Uth and (U + 1)th data points

For {1, 2, 3, 4, 5, 6, 7, 8}, this gives

LQ: L = 8/4 = 2; average 2nd and 3rd data points, giving 2.5
UQ: U = 3*8/4 = 6; average 6th and 7th data points, giving 6.5

For {1, 2, 3, 4, 5, 6, 7, 8, 9}, we get

LQ: L = 9/4 = 2.25; round up and use 3rd data point, 3
UQ: U = 3*9/4 = 6.75; round up and use 7th data point, 7

These cases agree with what you've told me.

- Doctor Peterson, The Math Forum
http://mathforum.org/dr.math/

```
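The CDF rule as translated above (average two neighbors when n/4 or 3n/4 is an integer, otherwise round up) can be sketched as follows. The function name `cdf_quartiles` is hypothetical, and nothing here is taken from Origin's actual implementation; it only mirrors the rule as stated in the message. Integer division is used so that the "is it an integer?" test involves no floating-point comparison.

```python
def cdf_quartiles(data):
    """Return (Q1, median, Q3) using the positions n/4, 2n/4, 3n/4."""
    xs = sorted(data)
    n = len(xs)

    def value_at(k):
        # Position k*n/4, split into whole part q and remainder r.
        q, r = divmod(k * n, 4)
        if r == 0:
            # Integer position: average the q-th and (q + 1)-th points (1-based).
            return (xs[q - 1] + xs[q]) / 2
        # Non-integer position: round up to the (q + 1)-th point (1-based).
        return xs[q]

    return value_at(1), value_at(2), value_at(3)

for n in (8, 9, 11):
    print(n, cdf_quartiles(range(1, n + 1)))
```

For n = 8 this gives quartiles 2.5 and 6.5 with median 4.5, for n = 9 it gives 3 and 7, and for n = 11 it gives 3 and 9, matching the Origin outputs reported in this thread.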

```
Date: 09/18/2012 at 03:59:54
From: Ji
Subject: Thank you (box-plot)

Dear Dr. Peterson,

I'm very grateful to you for all your help!

The software which I am using, OriginPro 8.5 (OriginLab), produced the
following values for the quartiles:

SET                      (Q1, Q3)     MEDIAN   n/4    3n/4   2n/4
-----------------------------------------------------------------
{1, 2, 3, 4}             (1.5, 3.5)   2.5      1      3      2
{1, 2, 3, 4, 5}          (2, 4)       3        1.25   3.75   2.5
{1, 2, 3, 4, 5, 6}       (2, 5)       3.5      1.5    4.5    3
{1, 2, 3, 4, 5, 6, 7}    (2, 6)       4        1.75   5.25   3.5

These results indicate that the software uses Method 4 (CDF).

Thank you very much!

Best regards!

Allen Gaohua Ji,
Shanghai Ocean University

```

```
Date: 09/18/2012 at 16:03:19
From: Doctor Peterson
Subject: Re: Thank you (box-plot)

Hi, Ji.

Yes, it does look very likely that they are using this method, or at least
something equivalent.

Note that at the end of his article, Langford says that this is the best
method, and offers an equivalent method that is easy to teach:

Thus, the following method is equivalent to the CDF Method 4, yet has
the flavor of the Inclusive and Exclusive Methods 1 and 2, and thus
should be more accessible to students.

SUGGESTED METHOD: Divide the data set into two halves, a bottom half
and a top half. If n is odd, include or exclude the median in the
halves so that each half has an odd number of elements. The lower and
upper quartiles are then the medians of the bottom and top halves,
respectively.

So for these four examples, the work looks like this:

--+--            <-- bottom half
1 | 2 | 3 | 4
  |   | --+--    <-- top half
 1.5 2.5 3.5     <-- Q1, Q2, Q3

----+----
1  (2) (3) (4)  5
    |   ----+----
    2   3   4

----+----
1  (2)  3 | 4  (5)  6
    |     | ----+----
    2    3.5    5

----+----
1  (2)  3  (4)  5  (6)  7
    |       |   ----+----
    2       4       6

- Doctor Peterson, The Math Forum
http://mathforum.org/dr.math/

```
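Langford's suggested method can be sketched as below. The name `suggested_quartiles` is illustrative, and the include/exclude test is one reading of "so that each half has an odd number of elements": for odd n, excluding the median leaves (n - 1)/2 elements per half, so the median is excluded when that count is odd and included when it is even.

```python
from statistics import median

def suggested_quartiles(data):
    """Return (Q1, median, Q3) as medians of the bottom and top halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 0:
        # Even n: the halves are simply the bottom and top n/2 elements.
        lower, upper = xs[:half], xs[half:]
    elif half % 2 == 0:
        # Odd n, even half size: include the median to make each half odd.
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # Odd n, odd half size: exclude the median.
        lower, upper = xs[:half], xs[half + 1:]
    return median(lower), median(xs), median(upper)

for n in (4, 5, 6, 7):
    print(n, suggested_quartiles(range(1, n + 1)))
```

For the four example sets this reproduces the values in the diagrams above: (1.5, 2.5, 3.5), (2, 3, 4), (2, 3.5, 5), and (2, 4, 6).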

```
Date: 09/18/2012 at 21:06:25
From: Ji
Subject: Thank you (box-plot)

Dear Dr. Peterson,

Thank you again for your detailed instruction! Now I'm clear about how my
software works.

Best regards!

Allen Gaohua Ji
```