Class Intervals in Statistics
Date: 02/23/2009 at 15:48:23
From: Oliver
Subject: Class intervals in Statistics

I'm not comfortable with having a negative boundary when the data is made up of purely positive numbers. The best way to explain is with an example:

The number of breakdowns in a machine, with the data grouped as 0-4, 5-9, 10-14, 15-19, etc.

The midpoint of each interval is taken halfway between its lower and upper boundaries. Normally there is no problem: the midpoint of the 5-9 interval is the midpoint of 4.5 and 9.5, i.e. 7.

But what about 0-4? Surely the lower boundary must be zero, giving a midpoint of 2.25? However, textbooks tend to say it should be -0.5, giving a midpoint of 2.

I believe that if the data is essentially positive, the boundaries can't go below zero. Trivial as it may seem, I hate ambiguity.
Date: 02/23/2009 at 16:36:39
From: Doctor Peterson
Subject: Re: Class intervals in Statistics

Hi, Oliver.

What's happening here is sort of a pretend "boundary" being used to convert a discrete variable (the number of breakdowns, which must be a whole number) into a continuous variable (location on the x-axis of the histogram).

You want columns on a histogram whose MIDPOINTS represent the actual values. If you didn't have classes, there would be columns for 0, 1, 2, 3, 4, and so on; if the midpoint of a column is at 0, and the width is 1, then it must extend from -0.5 to +0.5:

[Figure: histogram with one bar of width 1 centered on each whole number; x-axis marked 0, 1, 2, 3, 4, 5]

With classes, you will have one bar representing the entire class:

  +---------+
  |         |
  |         |
  |         |
  |         |
  |         |
===+=+=+=+=+=+=...
   0 1 2 3 4 5

It should still cover the same interval on the axis, so it goes from -0.5 to 4.5; its midpoint is (-0.5 + 4.5)/2 = 2. That's just a formality, and allows us to pretend that any value, not just whole numbers, is allowed.

Note that the midpoint is the same as what it is if you ignore all this and just take the actual discrete values:

  (0+1+2+3+4)/5 = 2;

or if you just treat 0 and 4 as the endpoints (leaving a gap of 1 between bars, which is a no-no):

  (0+4)/2 = 2.

So you'll never really get a count of -0.5, any more than you'll get a 4.5; these boundaries are equally fictitious! And if you prefer, you never really have to mention -0.5 in your calculations. But it allows us to have a histogram like

  +---------+
  |         |
  |         +---------+
  |         |         +--
  |         |         |
  |         |         |
===+=+=+=+=+=+=+=+=+=+=+=...
   0 1 2 3 4 5 6 7 8 9 10

that uniformly covers the axis, rather than

   +-------+
   |       |
   |       | +-------+
   |       | |       | +-
   |       | |       | |
   |       | |       | |
===+=+=+=+=+=+=+=+=+=+=+=...
   0 1 2 3 4 5 6 7 8 9 10

where there are gaps, and the endpoints teeter on the edge of their bars.

If you have any further questions, feel free to write back.

- Doctor Peterson, The Math Forum
  http://mathforum.org/dr.math/
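The arithmetic in the reply above can be checked with a short Python sketch. The function names here (class_boundaries, midpoint_from_boundaries, and so on) are made up for illustration and are not part of the original exchange; the sketch assumes whole-number data, so each boundary sits half a unit outside the class limits.

```python
from fractions import Fraction


def class_boundaries(low, high, step=1):
    """Boundaries of a class of discrete values low..high (hypothetical helper).

    Assumes the data moves in steps of `step`, so each boundary lies
    half a step outside the stated class limits.
    """
    half = Fraction(step, 2)
    return low - half, high + half


def midpoint_from_boundaries(low, high, step=1):
    """Midpoint taken halfway between the two class boundaries."""
    lower, upper = class_boundaries(low, high, step)
    return (lower + upper) / 2


def midpoint_from_values(low, high, step=1):
    """Mean of the actual discrete values in the class."""
    values = range(low, high + 1, step)
    return Fraction(sum(values), len(values))


def midpoint_from_limits(low, high):
    """Midpoint taken halfway between the stated class limits."""
    return Fraction(low + high, 2)


if __name__ == "__main__":
    # The classes mentioned in the question.
    for low, high in [(0, 4), (5, 9), (10, 14), (15, 19)]:
        lower, upper = class_boundaries(low, high)
        mid = midpoint_from_boundaries(low, high)
        print(f"{low}-{high}: boundaries {float(lower)} to {float(upper)}, "
              f"midpoint {float(mid)}")
```

For the 0-4 class, all three midpoint functions agree on 2, while putting the lower boundary at 0 instead of -0.5 gives (0 + 4.5)/2 = 2.25, which matches none of them.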
Date: 02/23/2009 at 17:59:11
From: Oliver
Subject: Thank you (Class intervals in Statistics)

Thank you for your reply. I'm happier with this now that you've pointed out it's just a formality to help us create histograms, and that the midpoint of the actual discrete values is the same. I had mistakenly thought that including the -0.5 gave a negative bias, but now it's clearer.

Much appreciated!
© 1994-2013 The Math Forum