Topic: Geometric distribution?
Replies: 2   Last Post: Sep 10, 2012 12:14 PM

 qguy Posts: 3 Registered: 9/9/12
Geometric distribution?
Posted: Sep 9, 2012 12:45 AM

I have a number of questions about the geometric distribution.

(I use Excel here to clarify my methods and questions. But these are not
Excel questions.)

I refer to an example in one of my statistics texts because presumably their

Suppose we are testing components that have a probability of 0.02 (1/50) of
being defective. The probability that the n-th component is defective (and
none earlier) is p*q^(n-1), where p=0.02 and q=1-p = 0.98. Also, I believe
the cumulative probability is 1-q^n = 1-(1-p)^n = Sigma(p*q^(k-1), k=1,...,n),
which represents the probability of finding the first defective component
before the (n+1)-st test. Conversely, I believe q^n = (1-p)^n represents the
probability of not finding a defective component before the (n+1)-st test. [1]
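
For readers who prefer code to spreadsheet formulas, here is a minimal Python sketch of the two quantities above. The function names pmf and cdf are illustrative choices, not from the text; the check at the end compares the closed form 1-q^n against the partial sum of p*q^(k-1).

```python
# Geometric distribution on n = 1, 2, 3, ... (test number of the first defect).
p = 0.02      # probability that a single component is defective
q = 1 - p     # probability that a single component is good


def pmf(n):
    """P(first defect is found exactly on test n) = p * q^(n-1)."""
    return p * q ** (n - 1)


def cdf(n):
    """P(first defect is found on or before test n) = 1 - q^n."""
    return 1 - q ** n


# The closed form should agree with the partial sum of the pmf.
n = 50
partial_sum = sum(pmf(k) for k in range(1, n + 1))
print(partial_sum, cdf(n))
```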

In Excel, I build a table where A1:A1000 has the values 1 to 1000, B1:B1000
has the formula =$P$1*(1-$P$1)^(A1-1), and C1:C1000 has the formula
=1-(1-$P$1)^A1. $P$1 has the value 0.02. Of course, I could have more than
1000 since the geometric distribution is infinite. But 1000 seems to be
sufficient to get reasonable numerical results for the following
calculations.

1. The text says that the mean is 1/p = 50. That represents the number of
components we should expect to have to test before finding a defect.

I confirm that with the following weighted average formula:
=SUMPRODUCT(A1:A1000,B1:B1000)/SUM(B1:B1000) [1]. Normally, I would not
divide by SUM(B1:B1000) because the sum of the probabilities should be 1.
But dividing by the sum compensates for the fact that I do not have an
infinite distribution.
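
The weighted-average calculation can be sketched in Python as follows. This assumes the same truncation at 1000 rows; the variable names ks, probs, total and mean are illustrative stand-ins for columns A and B and the two sums.

```python
p = 0.02
ks = range(1, 1001)                              # column A: 1..1000
probs = [p * (1 - p) ** (k - 1) for k in ks]     # column B: p*q^(k-1)

# The sum of the truncated probabilities is slightly less than 1,
# so dividing by it renormalizes the weighted average.
total = sum(probs)
mean = sum(k * b for k, b in zip(ks, probs)) / total
print(total, mean, 1 / p)
```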

I thought that means the cumulative probability of the 50-th test should be

On the other hand, cumulative P(k=35) is closest to and not less than 50%,
specifically 50.69%. So if I didn't "know better" based on the definition
of the mean (1/p), I would have thought that we should expect to have to
test 35 components before finding a defect.

I think I am misunderstanding something. But what?

My guess: this is the difference between the mean and median applied to the
geometric distribution. Right?

In other words, k=35 represents the median, which I know is not always the
same as the mean.
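
A short Python sketch of that comparison, assuming the median is taken as the smallest k whose cumulative probability is at least 50%. The closed form ceil(ln(0.5)/ln(q)) follows from solving 1-q^k >= 0.5 for k; the variable names are illustrative.

```python
import math

p = 0.02
q = 1 - p


def cdf(n):
    """P(first defect is found on or before test n) = 1 - q^n."""
    return 1 - q ** n


# Smallest k with cumulative probability >= 50% (search over 1..1000).
median = next(k for k in range(1, 1001) if cdf(k) >= 0.5)

# Same value from solving 1 - q^k >= 0.5 for k.
closed_form = math.ceil(math.log(0.5) / math.log(q))

mean = 1 / p
print(median, closed_form, mean)
```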

But in that case, what meaning do I give to the median?

Again, I thought it meant we should not "expect" (50% of the time) to find
the first defective component before the 35th test. But that seems to be at
odds with the mean, which is the "expected value" E(X).

Obviously, my thoughts are in an "infinite loop". Can someone get me out of
it?

2. And if that's the explanation (median v. mean), what interpretation
should (can) I give to the middle two quartiles (middle 50%); that is,
between cumulative P(k=15) = 26.14% and cumulative P(k=68) = 74.69%?

I want to say that they mean that we should "expect" (50% of the time) to
find the first defective component between the 15th and 68th test inclusive.

Is that a correct statement? If not, how do we find the "expected" middle
50%? And is that a useful statistic?
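
As a check on the "between the 15th and 68th test inclusive" reading, here is a hedged Python sketch. It assumes the probability of an inclusive range a..b is cdf(b) - cdf(a-1), so the lower end uses the cumulative value at 14 rather than at 15. The helper name prob_between is hypothetical.

```python
p = 0.02
q = 1 - p


def cdf(n):
    """P(first defect is found on or before test n) = 1 - q^n; cdf(0) is 0."""
    return 1 - q ** n


def prob_between(a, b):
    """P(a <= X <= b) for integer test numbers a <= b, both inclusive."""
    return cdf(b) - cdf(a - 1)


print(cdf(15), cdf(68))        # the two cumulative values quoted above
print(prob_between(15, 68))    # probability of tests 15..68 inclusive
```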

(It is certainly useful to find the middle 68% of a normal distribution,
i.e. +/- 1 sd of the mean, for example.)

3. Finally, the text says that the standard deviation is SQRT(q)/p, which is
about 49.50. More precisely, the variance is (1/p)*(1/p - 1) = 50*49 =
2450.

I confirm that with the following formula (M1 is the mean):
=SQRT(SUMPRODUCT(B1:B1000,(A1:A1000-M1)^2)/SUM(B1:B1000)).
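
The same confirmation can be sketched in Python, again assuming a table truncated at 1000 rows and renormalized by the sum of the probabilities. The variable names are illustrative; the last line prints SQRT(q)/p for comparison.

```python
import math

p = 0.02
ks = range(1, 1001)                              # column A: 1..1000
probs = [p * (1 - p) ** (k - 1) for k in ks]     # column B: p*q^(k-1)
total = sum(probs)

# Weighted mean (the role played by M1 in the spreadsheet).
mean = sum(k * b for k, b in zip(ks, probs)) / total

# Weighted variance about the mean, then the standard deviation.
variance = sum(b * (k - mean) ** 2 for k, b in zip(ks, probs)) / total
sd = math.sqrt(variance)

print(variance, sd, math.sqrt(1 - p) / p)
```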

But what meaning do we give to the std dev of the geometric distribution?
How do we make use of it?

For a normal distribution, we can say that about 68% of the data should be
between +/-1 sd of the mean. But I don't think that applies (or should
apply) to the geometric distribution. Right?

In fact, based on my cumulative probabilities in C1:C1000, about 86.47%
of the outcomes lie between +/-1 sd of the mean of this geometric
distribution.
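
That figure can be reproduced with a short Python sketch. It assumes "within +/-1 sd of the mean" means the integer test numbers from ceil(mean - sd) through floor(mean + sd), clipped below at 1; the names lo, hi and within are illustrative.

```python
import math

p = 0.02
q = 1 - p
mean = 1 / p                 # 50
sd = math.sqrt(q) / p        # about 49.50

# Integer test numbers that fall inside [mean - sd, mean + sd].
lo = max(1, math.ceil(mean - sd))
hi = math.floor(mean + sd)

# P(lo <= X <= hi) = cdf(hi) - cdf(lo - 1), with cdf(n) = 1 - q^n.
within = (1 - q ** hi) - (1 - q ** (lo - 1))
print(lo, hi, within)
```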

-----
[1] For non-Excel users:

The operator "^" is "power of". For example, 5^3 is 5*5*5.

The expression SUMPRODUCT(A1:A1000,B1:B1000) is equivalent to
Sigma(A[i]*B[i],i=1,...,1000).
