Origin of Origin's Outputs

Date: 09/17/2012 at 05:12:53
From: Ji
Subject: box-plot

I have read this question and answer:

  http://mathforum.org/library/drmath/view/60969.html

But another question has arisen. When I drew these box plots using Origin (version 8.5 of the software), it gave

  LQ = 3 and UQ = 7 for data set B = {1, 2, 3, 4, 5, 6, 7, 8, 9}
  LQ = 3 and UQ = 9 for data set D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

Apparently, Origin included the median when calculating the lower and upper quartiles for data set B, but excluded the median when calculating those values for D. So Origin seems to follow neither Tukey's method nor any of the others. What method could it have used for these calculations?


Date: 09/17/2012 at 19:23:23
From: Doctor Peterson
Subject: Re: box-plot

Hi, Ji.

Your quartile values appear to agree with the chart on that page for sets B and D using the "M&S" method. Here's the relevant section:

  Mendenhall and Sincich, in their text _Statistics for Engineering and the Sciences_, define a different method of finding quartile values. To apply their method on a data set with n elements, first calculate

    L = (1/4)(n + 1)

  and round to the nearest integer. If L falls halfway between two integers, round up. The Lth element is the lower quartile value. Next, calculate

    U = (3/4)(n + 1)

  and round to the nearest integer. If U falls halfway between two integers, round down. The Uth element is the upper quartile value.

  For set B = {1, 2, 3, 4, 5, 6, 7, 8, 9}, n = 9, so

    L = (1/4)(9 + 1) = 2.5, rounded up to 3
    U = (3/4)(9 + 1) = 7.5, rounded down to 7

  Thus, LQ and UQ of B are 3 and 7, respectively.

  For set D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, n = 11, so

    L = (1/4)(11 + 1) = 3
    U = (3/4)(11 + 1) = 9

  Thus, LQ and UQ of D are 3 and 9, respectively.

There may well be other methods that would also give these values; does M&S produce the values you observe for other cases?
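The M&S rule as quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread, assuming 1-based positions in the sorted data; the helper names `round_half` and `ms_quartiles` are hypothetical and are not taken from Mendenhall and Sincich or from Origin.

```python
import math


def round_half(x, up):
    """Round x to the nearest integer; an exact half goes up if `up`, else down."""
    lo = math.floor(x)
    frac = x - lo
    if frac > 0.5:
        return lo + 1
    if frac < 0.5:
        return lo
    return lo + 1 if up else lo


def ms_quartiles(data):
    """Return (LQ, UQ) using the M&S rule quoted above (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    # (n + 1)/4 and 3(n + 1)/4 are multiples of 0.25, which floats represent
    # exactly, so the "exactly halfway" test in round_half is reliable here.
    L = round_half((n + 1) / 4, up=True)       # halfway: round up
    U = round_half(3 * (n + 1) / 4, up=False)  # halfway: round down
    return xs[L - 1], xs[U - 1]                # 1-based positions -> list indices


B = [1, 2, 3, 4, 5, 6, 7, 8, 9]
D = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(ms_quartiles(B))  # (3, 7)
print(ms_quartiles(D))  # (3, 9)
```

Applied to {1, 2, 3, 4, 5, 6, 7, 8}, the same sketch returns (2, 7), the M&S values Ji works out in the next message; Origin reports 2.5 and 6.5 for that set instead.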
- Doctor Peterson, The Math Forum
  http://mathforum.org/dr.math/


Date: 09/17/2012 at 20:49:41
From: Ji
Subject: box-plot

Thanks for your help!

When the number of data values is odd, it seems that Origin applies the "M&S" method. When the number is even, for example {1, 2, 3, 4, 5, 6, 7, 8} with n = 8,

  L = (1/4)(8 + 1) = 2.25, rounded down to 2
  U = (3/4)(8 + 1) = 6.75, rounded up to 7

Thus, the LQ and UQ would be 2 and 7, respectively. Is that right? But Origin gives LQ = 2.5 and UQ = 6.5, with median = 4.5.


Date: 09/17/2012 at 23:03:00
From: Doctor Peterson
Subject: Re: box-plot

Hi, Ji.

You're right; it appears they do not consistently use the M&S method. For an even number of values, they might be using Tukey's method or M&M. But since those work very differently from M&S, I can't imagine they would mix the two. More likely, they follow some other method entirely.

For a larger study of different ways to calculate quartiles (actually percentiles, the quartiles being the 25th and 75th percentiles), see

  http://www.amstat.org/publications/jse/v14n3/langford.html

Table 2 there is similar to Dr. TWE's table, but with slightly smaller sample data sets. Perhaps if you try those, you will recognize one that agrees with your software. Let me know what values you get for the quartiles.

I suspect they may be using the "CDF method" (4):

  METHOD 4 ("CDF"): The Pth percentile value is found as follows. Calculate np. If np is an integer, then the Pth percentile value is the average of #(np) and #(np + 1). If np is not an integer, the Pth percentile value is #ceil(np); that is, we round up. Alternatively, one can look at #(np + 0.5) and round off unless it is half an odd integer, in which case it is left unrounded.

  As an example, if S5 = {1, 2, 3, 4, 5} and p = 0.25, then #(np) = 1.25, which is not an integer. So we take the next largest integer, and hence, Q1 = 2. Using the alternative calculation, we would look at #(np + 0.5) = #(1.75), which would again round off to 2.
  Note that this method can be considered as "Method 10 with rounding."

Translated into Dr. TWE's terms,

  L = n/4; if it is not an integer, round up and use the Lth data point;
           if it is an integer, average the Lth and (L + 1)th data points

  U = 3n/4; if it is not an integer, round up and use the Uth data point;
            if it is an integer, average the Uth and (U + 1)th data points

For {1, 2, 3, 4, 5, 6, 7, 8}, this gives

  LQ: L = 8/4 = 2; average the 2nd and 3rd data points, giving 2.5
  UQ: U = 3*8/4 = 6; average the 6th and 7th data points, giving 6.5

For {1, 2, 3, 4, 5, 6, 7, 8, 9}, we get

  LQ: L = 9/4 = 2.25; round up and use the 3rd data point, 3
  UQ: U = 3*9/4 = 6.75; round up and use the 7th data point, 7

These cases agree with what you've told me.

- Doctor Peterson, The Math Forum
  http://mathforum.org/dr.math/


Date: 09/18/2012 at 03:59:54
From: Ji
Subject: Thank you (box-plot)

Dear Dr. Peterson,

I'm very grateful to you for all your help! The software I am using, OriginPro 8.5 (OriginLab), produced the following values for the quartiles:

  SET                    (Q1, Q3)     MEDIAN   n/4    3n/4   2n/4
  ---------------------------------------------------------------
  {1, 2, 3, 4}           (1.5, 3.5)   2.5      1      3      2
  {1, 2, 3, 4, 5}        (2, 4)       3        1.25   3.75   2.5
  {1, 2, 3, 4, 5, 6}     (2, 5)       3.5      1.5    4.5    3
  {1, 2, 3, 4, 5, 6, 7}  (2, 6)       4        1.75   5.25   3.5

These results indicate that the software uses Method 4 (CDF), in accordance with your deduction. Thank you very much!

Best regards!
Allen Gaohua Ji, Shanghai Ocean University


Date: 09/18/2012 at 16:03:19
From: Doctor Peterson
Subject: Re: Thank you (box-plot)

Hi, Ji.

Yes, it does look very likely that they are using this method, or at least something equivalent. Note that at the end of his article, Langford says that this is the best method, and offers an equivalent method that is easy to teach:

  Thus, the following method is equivalent to the CDF Method 4, yet has the flavor of the Inclusive and Exclusive Methods 1 and 2, and thus should be more accessible to students.
  SUGGESTED METHOD: Divide the data set into two halves, a bottom half and a top half. If n is odd, include or exclude the median in the halves so that each half has an odd number of elements. The lower and upper quartiles are then the medians of the bottom and top halves, respectively.

So for these four examples, the work looks like this (parentheses mark medians that are data values, a bar marks a median that falls between two values, and each --+-- bracket spans one half):

  {1, 2, 3, 4}:

    --+--                <-- bottom half
    1 | 2 | 3 | 4
            --+--        <-- top half
     1.5 2.5 3.5         <-- Q1, Q2, Q3

  {1, 2, 3, 4, 5}:

    ----+----
    1  (2) (3) (4)  5
            ----+----
        2   3   4

  {1, 2, 3, 4, 5, 6}:

    ----+----
    1  (2)  3 | 4  (5)  6
                ----+----
        2    3.5    5

  {1, 2, 3, 4, 5, 6, 7}:

    ----+----
    1  (2)  3  (4)  5  (6)  7
                    ----+----
        2       4       6

So it looks like your software has made a good choice.

- Doctor Peterson, The Math Forum
  http://mathforum.org/dr.math/


Date: 09/18/2012 at 21:06:25
From: Ji
Subject: Thank you (box-plot)

Dear Dr. Peterson,

Thank you again for your detailed instruction! Now I'm clear about how my software works.

Best regards!
Allen Gaohua Ji
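To check the claimed equivalence, here is a small Python sketch of both the CDF method (Method 4 as quoted in this thread) and Langford's suggested halves method, assuming 1-based positions in the sorted data. The function names are hypothetical; this is neither Origin's code nor Langford's, just the two rules as stated above. Exact `Fraction` arithmetic is used so that the test "is np an integer?" is not disturbed by floating-point rounding.

```python
import math
from fractions import Fraction
from statistics import median


def cdf_percentile(data, p):
    """Method 4 ("CDF") as quoted in this thread; p is a Fraction such as 1/4."""
    xs = sorted(data)
    k = len(xs) * Fraction(p)           # np, computed exactly
    if k.denominator == 1:              # np is an integer:
        k = int(k)                      # average the np-th and (np + 1)-th values
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(k) - 1]         # otherwise round np up


def cdf_quartiles(data):
    """(Q1, median, Q3) by the CDF method (hypothetical helper)."""
    return tuple(cdf_percentile(data, Fraction(q, 4)) for q in (1, 2, 3))


def halves_quartiles(data):
    """(Q1, median, Q3) by the suggested halves method (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    h = n // 2
    if n % 2 == 0:
        lower, upper = xs[:h], xs[h:]          # even n: split down the middle
    elif h % 2 == 1:
        lower, upper = xs[:h], xs[h + 1:]      # exclude the median: halves of odd size h
    else:
        lower, upper = xs[:h + 1], xs[h:]      # include the median: halves of odd size h + 1
    return median(lower), median(xs), median(upper)


# Data sets whose Origin output is reported in this thread.
for n in (4, 5, 6, 7, 8, 9, 11):
    data = list(range(1, n + 1))
    print(n, cdf_quartiles(data), halves_quartiles(data))
```

Each printed row should show the same (Q1, median, Q3) triple from both functions; for example, n = 8 gives (2.5, 4.5, 6.5), the values Origin reported in Ji's second message.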
Ask Dr. Math™
© 1994- The Math Forum at NCTM. All rights reserved.
http://mathforum.org/dr.math/