
### Class Intervals in Statistics

```
Date: 02/23/2009 at 15:48:23
From: Oliver
Subject: Class intervals in Statistics

I can't feel comfortable with the issue of having a negative boundary
when we have data which is made up of purely positive numbers.  The
best way to explain would be with an example:

Take the number of breakdowns in a machine, with the data grouped
into 0-4, 5-9, 10-14, 15-19, etc.

The midpoint of each interval would be taken as the midpoint of its
lower and upper boundaries.  No problem normally: the midpoint of the
5-9 interval is the midpoint of 4.5 and 9.5, i.e. 7.

But what about 0-4?  Surely the lower boundary must be zero, giving a
midpoint of (0 + 4.5)/2 = 2.25?  However, textbooks tend to say it
should be -0.5, giving a midpoint of 2.

I believe that if the data is essentially positive, the boundaries
can't go below zero.  Trivial it may seem, but I hate ambiguity.

```

```
Date: 02/23/2009 at 16:36:39
From: Doctor Peterson
Subject: Re: Class intervals in Statistics

Hi, Oliver.

What's happening here is sort of a pretend "boundary" being used to
convert a discrete variable (the number of breakdowns, which must be a
whole number) into a continuous variable (location on the x-axis of
the histogram).

You want columns on a histogram whose MIDPOINTS represent the actual
values. If you didn't have classes, there would be columns for 0, 1,
2, 3, 4, and so on; if the midpoint of a column is at 0, and the width
is 1, then it must extend from -0.5 to +0.5:

  +-+
  | |
  | +-+
  | | | +-+
  | | +-+ |
  | | | | |
  | | | | +-+
  | | | | | |
===+=+=+=+=+=+=...
   0 1 2 3 4 5

With classes, you will have one bar representing the entire class:

  +---------+
  |         |
  |         |
  |         |
  |         |
  |         |
===+=+=+=+=+=+=...
   0 1 2 3 4 5

It should still cover the same interval on the axis, so it goes from
-0.5 to 4.5; its midpoint is (-0.5 + 4.5)/2 = 2.

That's just a formality, and allows us to pretend that any value, not
just whole numbers, is allowed.  Note that the midpoint is the same as
what it is if you ignore all this and just take the actual discrete
values: (0+1+2+3+4)/5 = 2; or if you just treat 0 and 4 as the
endpoints (leaving a gap of 1 between bars, which is a no-no): (0+4)/2
= 2.

So you'll never really get a count of -0.5, any more than you'll get a
4.5; these boundaries are equally fictitious!  And if you prefer, you
never really have to mention -0.5 in your calculations.  But it allows
us to have a histogram like

  +---------+
  |         |
  |         +---------+
  |         |         +--
  |         |         |
  |         |         |
===+=+=+=+=+=+=+=+=+=+=+=...
   0 1 2 3 4 5 6 7 8 9 10

that uniformly covers the axis, rather than

   +-------+
   |       |
   |       | +-------+
   |       | |       | +-
   |       | |       | |
   |       | |       | |
===+=+=+=+=+=+=+=+=+=+=+=...
   0 1 2 3 4 5 6 7 8 9 10

where there are gaps, and the endpoints teeter on the edge of their bars.

If you have any further questions, feel free to write back.

- Doctor Peterson, The Math Forum
http://mathforum.org/dr.math/

```
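
The arithmetic in Doctor Peterson's reply can be checked with a short script. This is only a minimal sketch, not part of the original exchange: the function names `class_boundaries` and `class_midpoint` and the `unit` parameter are hypothetical, chosen for illustration, and the code assumes whole-number data spaced one unit apart.

```python
def class_boundaries(low, high, unit=1):
    """Boundaries of a class of discrete values low..high.

    Each boundary sits half a unit outside the extreme values, so
    adjacent classes such as 0-4 and 5-9 meet at 4.5 with no gap.
    """
    half = unit / 2
    return low - half, high + half


def class_midpoint(low, high, unit=1):
    """Midpoint of the class, taken between its two boundaries."""
    lower, upper = class_boundaries(low, high, unit)
    return (lower + upper) / 2


# Classes from Oliver's example (number of breakdowns).
classes = [(0, 4), (5, 9), (10, 14), (15, 19)]

for low, high in classes:
    lower, upper = class_boundaries(low, high)
    values = range(low, high + 1)  # the actual whole-number values
    mean_of_values = sum(values) / len(values)
    midpoint = class_midpoint(low, high)
    # The boundary-based midpoint agrees with the mean of the discrete
    # values and with the simple average of the class limits.
    assert midpoint == mean_of_values == (low + high) / 2
    print(f"{low}-{high}: boundaries {lower} to {upper}, midpoint {midpoint}")

# Clamping the lower boundary of 0-4 at zero, as Oliver first proposed,
# moves the midpoint away from the mean of the values 0, 1, 2, 3, 4.
clamped_midpoint = (0 + 4.5) / 2
```

For the 0-4 class the boundaries come out as -0.5 and 4.5 and the midpoint as 2, while the clamped version gives 2.25, which no longer matches the mean of the values in the class.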

```
Date: 02/23/2009 at 17:59:11
From: Oliver
Subject: Thank you (Class intervals in Statistics)

Thank you for your reply.  I'm happier with this now, as you mention
that it's just a formality to help us create histograms, and that the
midpoint of the actual discrete values is the same.  I mistakenly
thought before that including the -0.5 gave a negative bias, but now
it's clearer.  Much appreciated!
```
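
The point about bias can also be stated in general. As a sketch, with the symbols $a$ and $b$ introduced here (they do not appear in the thread) for the smallest and largest whole-number values in a class, the two half-unit extensions cancel, and the result equals the mean of the values in the class:

```latex
\[
\frac{\left(a - \tfrac{1}{2}\right) + \left(b + \tfrac{1}{2}\right)}{2}
  = \frac{a + b}{2}
  = \frac{1}{b - a + 1} \sum_{k=a}^{b} k .
\]
```

With $a = 0$ and $b = 4$ this gives $(-0.5 + 4.5)/2 = 2 = (0+1+2+3+4)/5$, matching the figures in the reply.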
Associated Topics:
College Statistics
High School Statistics
