As we progress through M&M's first chapter we are noticing a difference in terminology between Minitab and the text.
Not that it makes much difference, but if Q1 doesn't fall on a data point, Minitab interpolates between the two values it lies between. This isn't much different from the way Moore defines Q1.
To make things more confusing, the boxplots ARE the same as M&M defines them; however, Minitab calls those values "upper and lower hinges". HL and HU, as they call them, seem to be calculated by the process Moore uses to find Q1 and Q3.
----- End of forwarded message from John Burnette -----
That's only the tip of the iceberg! I have seen more than a dozen different ways of defining these points which give slightly different results. In some cases, two methods give the same results for some values of n and different results for other values of n. What Minitab does (or did in the older versions I'm most familiar with) for Q1 and Q3 is part of a general approach to quantiles that is older than boxplots. It can involve some messy interpolation in the general case, so when Tukey invented the boxplot (remember, a lot of his inventions were meant to be done while flying on a plane in the days before laptops) he used approximate quartiles which he called "hinges". Elementary textbooks tend to blur this distinction and give one definition and call the result a "quartile". For example, Siegel uses Tukey's hinges and calls them "quartiles". Later a different approximate quartile was adopted for the QLP materials. It was subsequently adopted in the texts by David Moore and by the TI-82 and TI-83, and is pretty much standard in K-12 as a result. These materials also blur the distinction Minitab is (correctly) making.
Those are my general comments, and this sentence is a note to move on if you do not want more detail on:
1. the problems of using non-standard terminology
2. what the actual different definitions are
I think the QLP materials are wonderful -- better than 99% of the stuff being used to teach statistics in the colleges. But I do wish they were a little less "creative" in their terminology. Hinges and quartiles were already well established before QLP came along, and I think that alone was a good reason to go with one or the other. I'm not sure why they went with a third alternative (and called it a quartile). I also note that they called the things that most people (and Minitab) call "dotplots" "lineplots". Whatever the reasons, nonstandard terminology exacts a price in confusion sooner or later. I've even been criticized or "corrected" by high school teachers because I used standard terminology for things rather than the nonstandard terminology they were accustomed to. (Was it McCauley who said, "Beware the man who's read but one book"?)
TI did their homework and asked around to find out which of the many definitions of quartiles/hinges they should implement on their calculators. However, they were more interested in usage in high school classrooms than in the statistics profession.
Speaking of TI and standard terminology, the TI-82 implemented what I would call "quick" boxplots. These are boxplots without any flagging of outliers. Since these omit one of the two main reasons for drawing boxplots in the first place, I was disappointed. I was happier when I saw that the TI-83 does plain old real boxplots as well. I was not so happy when I saw that the manual called the quick boxplots "regular boxplots" and the real boxplots "modified boxplots", as if the limitations of the 82 were the standard of regularity and a real boxplot was some kind of aberration.
Here are three flavors of quartiles. Everybody agrees you need to sort the data first. I'll just talk about the first quartile. To get the third, sort your data in the wrong direction and then follow the steps below.
Minitab: The first quartile has rank (n+1)/4. Note that everyone agrees that the median has rank (n+1)/2, and Minitab is just extending the pattern. It does the same sort of thing to get deciles or percentiles. If the rank is not a whole number, Minitab uses linear interpolation between two adjacent data values. Note that in the case of quartiles this may put you one-fourth or three-fourths of the way between two data values.
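Here is a minimal Python sketch of the (n+1)/4 rule, generalized to any proportion p since Minitab uses the same pattern for deciles and percentiles. The name `minitab_quantile` is my own, and clamping ranks that fall below 1 or above n to the smallest or largest value is an assumption about the edge cases, not something taken from Minitab's documentation.

```python
def minitab_quantile(data, p):
    """Quantile at rank p*(n+1), interpolating linearly between adjacent sorted values.

    p = 0.25 gives the first quartile, p = 0.5 the median, p = 0.75 the third quartile.
    """
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one value")
    rank = p * (n + 1)        # 1-based rank; may be fractional
    # Assumption: ranks outside 1..n are clamped to the extreme values.
    if rank <= 1:
        return x[0]
    if rank >= n:
        return x[-1]
    lo = int(rank)            # whole part of the rank (rank > 1, so this is the floor)
    frac = rank - lo          # fractional part, e.g. 0.25 or 0.75 for quartiles
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])


# Eight values: rank (8+1)/4 = 2.25, a quarter of the way from the 2nd to the 3rd value.
print(minitab_quantile([3, 7, 8, 5, 12, 14, 21, 13], 0.25))  # 5.5
```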
Tukey's hinges: Find the median. Then find the median of the data values whose ranks are LESS THAN OR EQUAL TO the rank of the median. This will be a data value or it will be halfway between two data values.
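A minimal Python sketch of this rule follows. Because the lower half includes the median itself whenever n is odd, it should reproduce Tukey's hinges; the function names are illustrative choices of mine.

```python
def median_of_sorted(values):
    """Median of an already-sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def lower_hinge(data):
    """Median of the values whose ranks are <= the rank of the median, (n+1)/2."""
    x = sorted(data)
    median_rank = (len(x) + 1) / 2
    # Ranks are 1-based; when n is odd the median itself lands in the lower half.
    lower = [v for rank, v in enumerate(x, start=1) if rank <= median_rank]
    return median_of_sorted(lower)


print(lower_hinge([1, 2, 3, 4, 5]))     # lower half is 1, 2, 3 -> 2
print(lower_hinge([1, 2, 3, 4, 5, 6]))  # lower half is 1, 2, 3 -> 2
```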
QLP, Moore, and the TI-82/83: Find the median. Then find the median of the data values whose ranks are STRICTLY LESS THAN the rank of the median. This will be a data value or it will be halfway between two data values.
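This rule differs from the previous one only in using "strictly less than". A minimal Python sketch follows, which also shows the "wrong direction" trick for getting the third quartile. The name `first_quartile_exclusive` and its `descending` flag are my own illustrative choices.

```python
def median_of_sorted(values):
    """Median of a non-empty list sorted in either direction."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def first_quartile_exclusive(data, descending=False):
    """Median of the values whose ranks are strictly below the rank of the median.

    With descending=True the data are sorted "in the wrong direction",
    so the same rule returns the third quartile instead of the first.
    """
    x = sorted(data, reverse=descending)
    if len(x) < 2:
        raise ValueError("need at least two values")
    median_rank = (len(x) + 1) / 2
    # When n is odd the median itself is left out of the half.
    half = [v for rank, v in enumerate(x, start=1) if rank < median_rank]
    return median_of_sorted(half)


data = [1, 2, 3, 4, 5, 6, 7]
q1 = first_quartile_exclusive(data)                   # half is 1, 2, 3 -> 2
q3 = first_quartile_exclusive(data, descending=True)  # half is 7, 6, 5 -> 6
print(q1, q3)
```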
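Note that for SOME values of n, SOME of these methods give the same results: taking the methods in the order above, the second and third agree whenever n is even, and the first and third agree whenever n is odd. No two of them give the same results for all n.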
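To check which methods coincide, the following Python sketch runs all three first-quartile rules on the data 1, 2, ..., n, so each result can be read directly as a rank. The function names are illustrative, and clamping the (n+1)/4 rank to the smallest value when it falls below 1 is an assumption.

```python
def median_of_sorted(values):
    """Median of a non-empty sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 == 1 else (values[mid - 1] + values[mid]) / 2


def q1_rank_rule(x):
    """Rank (n+1)/4 with linear interpolation (assumed clamped at the low end)."""
    rank = (len(x) + 1) / 4
    if rank <= 1:
        return x[0]
    lo = int(rank)
    return x[lo - 1] + (rank - lo) * (x[lo] - x[lo - 1])


def q1_keep_median(x):
    """Median of the values with ranks <= the median's rank."""
    median_rank = (len(x) + 1) / 2
    return median_of_sorted([v for r, v in enumerate(x, 1) if r <= median_rank])


def q1_drop_median(x):
    """Median of the values with ranks < the median's rank (needs n >= 2)."""
    median_rank = (len(x) + 1) / 2
    return median_of_sorted([v for r, v in enumerate(x, 1) if r < median_rank])


def compare(ns):
    """Return (n, rank rule, <= rule, < rule) for the data 1..n."""
    rows = []
    for n in ns:
        x = list(range(1, n + 1))  # values equal their ranks
        rows.append((n, q1_rank_rule(x), q1_keep_median(x), q1_drop_median(x)))
    return rows


print(f"{'n':>3} {'(n+1)/4':>8} {'<= median':>10} {'< median':>9}")
for n, a, b, c in compare(range(3, 13)):
    print(f"{n:>3} {a:>8} {b:>10} {c:>9}")
```

For n from 3 to 12, the last two columns match for even n, the first and last match for odd n, and the first two never match.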
Robert W. Hayden
Department of Mathematics
Plymouth State College
Plymouth, New Hampshire 03264 USA
Rural Route 1, Box 10
Ashland, NH 03217-9702
(603) 968-9914 (home)
fax (603) 535-2943 (work)
email@example.com