

qguy
Posts: 3
Registered: 9/9/12


Geometric distribution?
Posted: Sep 9, 2012 12:45 AM


I have a number of questions about the geometric distribution.
(I use Excel here to clarify my methods and questions. But these are not Excel questions.)
I refer to an example in one of my statistics texts because presumably their answers are correct.
Suppose we are testing components that have a probability of 0.02 (1/50) of being defective. The probability that the nth component is defective (and none earlier) is p*q^(n-1), where p = 0.02 and q = 1-p = 0.98. Also, I believe the cumulative probability is 1-(1-p)^n = 1-q^n = Sigma(p*q^(k-1), k=1,...,n), which represents the probability of finding the first defective component before the (n+1)st test. Conversely, I believe (1-p)^n = q^n represents the probability of not finding a defective component before the (n+1)st test. [1]
In Excel, I build a table where A1:A1000 has the values 1 to 1000, B1:B1000 has the formula =$P$1*(1-$P$1)^(A1-1), and C1:C1000 has the formula =1-(1-$P$1)^A1. $P$1 has the value 0.02. Of course, I could have more than 1000 rows since the geometric distribution is infinite. But 1000 seems to be sufficient to get reasonable numerical results for the following calculations.
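For readers without Excel, the same table can be sketched in Python. This is only an illustration of the setup described above; the names ks, pmf and cdf are mine and stand for columns A, B and C.

```python
# Sketch of the Excel table: k in column A, P(X = k) in column B,
# P(X <= k) in column C. Variable names are illustrative only.
p = 0.02          # $P$1: probability that a component is defective
q = 1 - p
N = 1000          # number of rows, as in A1:A1000

ks = list(range(1, N + 1))                # column A
pmf = [p * q ** (k - 1) for k in ks]      # column B: =$P$1*(1-$P$1)^(A1-1)
cdf = [1 - q ** k for k in ks]            # column C: =1-(1-$P$1)^A1

# The running sum of column B should agree with column C in every row.
running = 0.0
max_gap = 0.0
for b, c in zip(pmf, cdf):
    running += b
    max_gap = max(max_gap, abs(running - c))

print(f"P(X = 1)   = {pmf[0]:.4f}")
print(f"P(X <= 50) = {cdf[49]:.4f}")
print(f"largest gap between running sum of B and C = {max_gap:.2e}")
```

The last line is a consistency check that the closed form 1-q^n really is the sum of the individual probabilities.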
1. The text says that the mean is 1/p = 50. That represents the number of components we should expect to have to test before finding a defect.
I confirm that with the following weighted average formula: =SUMPRODUCT(A1:A1000,B1:B1000)/SUM(B1:B1000) [1]. Normally, I would not divide by SUM(B1:B1000) because the sum of the probabilities should be 1. But dividing by the sum compensates for the fact that I do not have an infinite distribution.
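The weighted-average check can be sketched in Python as well, assuming the same 1000-row truncation. The variable names are mine; the comments map them back to the Excel formula.

```python
# Sketch of =SUMPRODUCT(A1:A1000,B1:B1000)/SUM(B1:B1000) for p = 0.02.
p = 0.02
q = 1 - p
N = 1000

ks = range(1, N + 1)
pmf = [p * q ** (k - 1) for k in ks]

total = sum(pmf)                                  # SUM(B1:B1000), slightly below 1
weighted = sum(k * b for k, b in zip(ks, pmf))    # SUMPRODUCT(A1:A1000,B1:B1000)
mean_truncated = weighted / total                 # mean of the truncated table

print(f"sum of probabilities = {total:.10f}")
print(f"truncated mean       = {mean_truncated:.6f}")
print(f"1/p                  = {1 / p:.6f}")
```

With 1000 rows the missing tail is tiny, so dividing by the sum changes the result only far past the decimal point.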
I thought that means the cumulative probability of the 50th test should be about 50% (with adjustments for quantization error). But in fact, cumulative P(k=50) is about 63.58%.
On the other hand, cumulative P(k=35) is closest to and not less than 50%, specifically 50.69%. So if I didn't "know better" based on the definition of the mean (1/p), I would have thought that we should expect to have to test 35 components before finding a defect.
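The two figures above (63.58% at k=50, and k=35 as the first point at or above 50%) can be reproduced with a short Python sketch. The closed-form line assumes only that the cumulative probability is 1-q^k, so the smallest k with 1-q^k >= 0.5 is the ceiling of ln(0.5)/ln(q).

```python
import math

p = 0.02
q = 1 - p


def cdf(k):
    """Cumulative probability P(X <= k) = 1 - q^k."""
    return 1 - q ** k


# Smallest k whose cumulative probability reaches 50%, found by scanning.
k_half = next(k for k in range(1, 1001) if cdf(k) >= 0.5)

# The same k from solving 1 - q^k >= 0.5 for k.
k_half_closed = math.ceil(math.log(0.5) / math.log(q))

print(f"cumulative P(k=50) = {cdf(50):.4f}")
print(f"first k with cumulative P >= 0.5: {k_half} (closed form: {k_half_closed})")
print(f"cumulative P(k={k_half}) = {cdf(k_half):.4f}")
```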
I think I am misunderstanding something. But what?
My guess: this is the difference between the mean and median applied to the geometric distribution. Right?
In other words, k=35 represents the median, which I know is not always the same as the mean.
But in that case, what meaning do I give to the median?
Again, I thought it meant we should not "expect" (50% of the time) to find the first defective component before the 35th test. But that seems to be at odds with the mean, which is the "expected value" E(X).
Obviously, my thoughts are in an "infinite loop". Can someone get me out of it?
2. And if that's the explanation (median v. mean), what interpretation should (can) I give to the middle two quartiles (middle 50%); that is, between cumulative P(k=15) = 26.14% and cumulative P(k=68) = 74.69%?
I want to say that they mean that we should "expect" (50% of the time) to find the first defective component between the 15th and 68th test inclusive.
Is that a correct statement? If not, how do we find the "expected" middle 50%? And is that a useful statistic?
(It is certainly useful to find the middle 68% of a normal distribution, i.e. +/- 1 sd of the mean, for example.)
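A small Python sketch of the "15th to 68th test inclusive" calculation, under the same assumption that the cumulative probability is 1-q^k. One detail worth making explicit: an inclusive range from 15 to 68 is cumulative P(k=68) minus cumulative P(k=14), not minus cumulative P(k=15).

```python
p = 0.02
q = 1 - p


def cdf(k):
    """Cumulative probability P(X <= k) = 1 - q^k; cdf(0) is 0."""
    return 1 - q ** k


lo, hi = 15, 68

# P(lo <= X <= hi): subtract everything strictly below lo.
middle = cdf(hi) - cdf(lo - 1)

print(f"cumulative P(k={lo}) = {cdf(lo):.4f}")
print(f"cumulative P(k={hi}) = {cdf(hi):.4f}")
print(f"P({lo} <= X <= {hi}) = {middle:.4f}")
```

Because the distribution is discrete, no pair of integers gives exactly 25% and 75%, so the middle band can only be approximately 50%.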
3. Finally, the text says that the standard deviation is SQRT(q)/p, which is about 49.50. More precisely, the variance is (1/p)*(1/p - 1) = 50*49 = 2450.
I confirm that with the following formula (M1 is the mean): =SQRT(SUMPRODUCT(B1:B1000,(A1:A1000-M1)^2)/SUM(B1:B1000)).
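The standard deviation check can be sketched the same way in Python. As before, the 1000-row cutoff and the variable names are assumptions of this sketch, and m1 plays the role of cell M1.

```python
import math

p = 0.02
q = 1 - p
N = 1000

ks = range(1, N + 1)
pmf = [p * q ** (k - 1) for k in ks]
total = sum(pmf)                                        # SUM(B1:B1000)

m1 = sum(k * b for k, b in zip(ks, pmf)) / total        # the mean, cell M1

# =SQRT(SUMPRODUCT(B1:B1000,(A1:A1000-M1)^2)/SUM(B1:B1000))
var_table = sum(b * (k - m1) ** 2 for k, b in zip(ks, pmf)) / total
sd_table = math.sqrt(var_table)

sd_formula = math.sqrt(q) / p                           # SQRT(q)/p from the text

print(f"variance from table = {var_table:.3f}")
print(f"sd from table       = {sd_table:.4f}")
print(f"SQRT(q)/p           = {sd_formula:.4f}")
```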
But what meaning do we give to the std dev of the geometric distribution? How do we make use of it?
For a normal distribution, we can say that about 68% of the data should be between +/-1 sd of the mean. But I don't think that applies (or should apply) to the geometric distribution. Right?
In fact, based on my cumulative probabilities in C1:C1000, about 86.47% of the outcomes lie between +/-1 sd of the mean of this geometric distribution.
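The 86.47% figure can be reproduced with the following Python sketch. It assumes "within 1 sd of the mean" means the whole test counts k between mean - sd (about 0.5) and mean + sd (about 99.5), which works out to k = 1 through 99.

```python
import math

p = 0.02
q = 1 - p

mean = 1 / p                 # 50
sd = math.sqrt(q) / p        # about 49.50


def cdf(k):
    """Cumulative probability P(X <= k) = 1 - q^k; cdf(0) is 0."""
    return 1 - q ** k


# Whole-number outcomes inside [mean - sd, mean + sd]; outcomes start at 1.
lo = max(1, math.ceil(mean - sd))
hi = math.floor(mean + sd)

within = cdf(hi) - cdf(lo - 1)

print(f"interval: k = {lo} to {hi}")
print(f"share of outcomes within 1 sd of the mean = {within:.4f}")
```

Since the lower end of the interval sits below the smallest possible outcome, everything that falls outside the band is in the upper tail, unlike the symmetric normal case.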
[1] For non-Excel users:
The operator "^" is "power of". For example, 5^3 is 5*5*5.
The expression SUMPRODUCT(A1:A1000,B1:B1000) is equivalent to Sigma(A[i]*B[i],i=1,...,1000).



