The Math Forum

Ask Dr. Math - Questions and Answers from our Archives

Difference of Two Proportions in Statistics

Date: 08/15/2005 at 10:02:38
From: Roberta
Subject: Difference of two proportions

In stats class we are studying the binomial distribution and we 
tackled the difference of two proportions.  Why is it that we assume
np > 5 and n(1-p) > 5?  Where does the 5 come from?

Date: 08/17/2005 at 13:07:29
From: Doctor George
Subject: Re: Difference of two proportions

Hi Roberta,

I think I understand your question.  When np > 5 and n(1-p) > 5 it is
common to use the normal distribution as an approximation to the 
binomial distribution.  Doing this allows us to use the z-chart to 
find probabilities, and that is a convenient thing to do.
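As a rough illustration of that approximation, here is a small Python sketch that compares an exact binomial probability with the value a z-chart would give.  The function names and the example numbers (n = 50, p = 0.3, k = 12) are hypothetical, and the half-unit continuity correction is an assumption added here, not something stated above.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(z):
    """Standard normal P(Z <= z); plays the role of the z-chart."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k).

    Uses mean np and standard deviation sqrt(np(1-p)).  The +0.5 is a
    continuity correction (an assumption of this sketch).
    """
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf((k + 0.5 - mean) / sd)

# Example values chosen so that np = 15 and n(1-p) = 35, both above 5.
n, p, k = 50, 0.3, 12
exact = binom_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

With np and n(1-p) both comfortably above 5, the two printed values should agree closely.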

So the question is why np > 5 and n(1-p) > 5 make the
approximation reasonable.  I also have never seen the reason in 
writing, but after thinking about it for a while I can make some sense
out of it.

Remember that the binomial distribution converges to the normal 
distribution for large n.  This is what makes the issue in your 
question even worth considering.

The normal distribution has infinite tails, which the binomial
distribution does not.  In order for the approximation to make sense
the binomial distribution needs to have fairly long tails.  In other
words, the distance from the mean to each endpoint must be a
sufficiently large number of standard deviations.

I should mention that I use q = 1-p with the binomial distribution.  
Your textbook may not be using q in this way.  Let's call L the
distance from the mean to the left endpoint in standard deviations.  Then:

  L = np / sqrt(npq)

  np = L^2 * q

if np > 5 then

  L > sqrt(5/q)

Since q cannot be greater than 1, sqrt(5/q) is at least sqrt(5), so L
is always greater than 2.23.  On a z-chart you can see that less than
1.3% of the area is to the left of -2.23, so the left tail of the
normal distribution has very little area beyond the endpoint of the
binomial distribution.  For smaller values of q, L is larger and the
area to the left of -L is even less.  So with np > 5 the left tail of
the binomial distribution starts becoming long enough to look similar
to the left tail of the normal distribution.
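The left-tail calculation can be checked numerically with a short Python sketch.  The function names are hypothetical, and normal_cdf simply stands in for the z-chart lookup.

```python
import math

def normal_cdf(z):
    """Standard normal P(Z <= z); plays the role of the z-chart."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def left_distance(n, p):
    """L = np / sqrt(npq): standard deviations from the mean down to 0."""
    q = 1 - p
    return n * p / math.sqrt(n * p * q)

def area_beyond_left_endpoint(n, p):
    """Normal-curve area lying to the left of the binomial endpoint 0."""
    return normal_cdf(-left_distance(n, p))

# Boundary case np = 5 for a few values of p (and hence of q = 1-p).
for n, p in [(10, 0.5), (50, 0.1), (500, 0.01)]:
    L = left_distance(n, p)
    area = area_beyond_left_endpoint(n, p)
    print(f"n={n}, p={p}: L = {L:.3f}, area left of -L = {area:.5f}")
```

Each case sits right at the boundary np = 5, so L equals sqrt(5/q) there, and the area to the left of -L stays under the 1.3% figure.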

Now let's call R the distance from the mean to the right endpoint in
standard deviations.  Then

  R = (n-np) / sqrt(npq)

  R = n(1-p) / sqrt(np(1-p))

  n(1-p) = R^2 * p

if n(1-p) > 5 then

  R > sqrt(5/p)

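Now apply the same logic to the right tail that we did to the left
tail.  Since p cannot be greater than 1, R is also always greater than
2.23, so the right tail of the normal distribution has very little
area beyond the right endpoint of the binomial distribution.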
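Both tails can be looked at together in a short Python sketch.  The function names and the sample (n, p) pairs are hypothetical, chosen only to show one case where the rule holds and two where it fails.

```python
import math

def tail_distances(n, p):
    """Return (L, R): standard deviations from the mean to 0 and to n."""
    q = 1 - p
    sd = math.sqrt(n * p * q)
    return n * p / sd, n * q / sd

def rule_of_thumb_ok(n, p):
    """True when np > 5 and n(1-p) > 5."""
    return n * p > 5 and n * (1 - p) > 5

for n, p in [(20, 0.5), (100, 0.03), (100, 0.97)]:
    L, R = tail_distances(n, p)
    print(f"n={n}, p={p}: L = {L:.2f}, R = {R:.2f}, "
          f"rule satisfied: {rule_of_thumb_ok(n, p)}")
```

When the rule fails on one side, the matching distance (L for small p, R for large p) drops below sqrt(5), which is where the normal tail starts to spill noticeably past the binomial endpoint.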

Rules of thumb go wrong now and then, but we can see that this is a
pretty good one.

Does that make sense?  Write again if you need more help.

- Doctor George, The Math Forum 

Date: 08/19/2005 at 06:09:21
From: Roberta
Subject: Thank you (Difference of two proportions)

Thank you soooooo much!!!!  You've been a great help!  This is the
first time I actually "asked Dr. Math" and I didn't think I would get
a reply but I got one--wow!  It's really cool!  Thank you once again
and hope to get some help in the future! :)
Associated Topics:
College Statistics

© 1994- The Math Forum at NCTM. All rights reserved.