Summing Activity Leads to a Mean of e
Date: 04/01/2005 at 11:08:24 From: Steve Subject: I keep getting e as the result of a class activity, why? I asked my students to keep adding random integers from 1 to 100 until the sum exceeded 100. We then found the average number of terms added. The answer seems to be e. Why? The more we do it, the closer we get. I can't find a way to derive this result, although it seems plausible. I tried using the expansion for e as a sum of reciprocals of factorials. I guess this is related to the probability, but I'm not quite sure.
Date: 04/04/2005 at 12:27:51 From: Doctor Rick Subject: Re: I keep getting e as the result of a class activity, why? Hi, Steve. This is an interesting question! My working hypothesis is that the theoretical mean may be (1 + 1/N)^N, or close to this, for the generalized experiment in which you add random numbers from 1 through N until the sum is greater than N. In the limit as N increases infinitely, this value equals e. For N = 100, it is 2.7048. As a start to understanding this problem, I considered a much simpler case. I let N = 4, so that we are choosing numbers from 1 to 4 and stopping when the sum exceeds 4. There are few enough possibilities that I can enumerate them. There is no way that one number can exceed a sum of 4. There are 10 ways that the first two numbers chosen can have a sum greater than 4: 14, 23, 24, 32, 33, 34, 41, 42, 43, 44 Each has a probability of 1/4^2 = 1/16, since there are 4^2 ways to choose two numbers and each has equal probability. Thus P(2), the probability that after two numbers the total exceeds 4, is 10/16. There are 20 ways to choose three numbers such that the first two sum to 4 or less and the third takes the sum above 4: 113, 114, 122, 123, 124, 131, 132, 133, 134, 212, 213, 214, 221, 222, 223, 224, 311, 312, 313, 314 The probability of each is (1/4)^3 = 1/64, thus P(3) = 20/64. There are 15 ways to choose four numbers such that the first three sum to 4 or less and the fourth takes the sum above 4: 1112, 1113, 1114, 1121, 1122, 1123, 1124, 1211, 1212, 1213, 1214, 2111, 2112, 2113, 2114 The probability of each is (1/4)^4 = 1/256, so P(4) = 15/256. The only possibility left is 1111, in which case any choice for the fifth number will put the sum over 4. The probability of getting 1111 is 1/256, so that is P(5), the probability that it will take 5 numbers to exceed a total of 4. The mean number of selections required to exceed a total of 4 is 2*P(2) + 3*P(3) + 4*P(4) + 5*P(5) = 2*10/16 + 3*20/64 + 4*15/256 + 5*1/256 = (2*10*4^2 + 3*20*4 + 4*15 + 5*1)/256 = 625/256 = 2.44140625 That's exactly 5^4/4^4 = (1 + 1/4)^4, which is what I was hoping I'd find. Perhaps I'm on the right track. However, another Math Doctor, Dr. George, has let me know that he did an experiment (like yours, I assume) where N = 10, and he did not get a mean of (1 + 1/10)^10. Dr. George had some other very helpful information, though. It is a known result that in the limit as N increases infinitely, the mean number of terms needed to exceed N is e. You can read about it here: MathWorld: Uniform Sum Distribution http://mathworld.wolfram.com/UniformSumDistribution.html I need to study that further myself. - Doctor Rick, The Math Forum http://mathforum.org/dr.math/
Date: 04/07/2005 at 09:17:14 From: Doctor George Subject: Re: I keep getting e as the result of a class activity, why? Hi Steve, You can see from Dr. Rick's work that doing an analytical experiment for N=10 would be a lot of work, so what I did was a computer simulation instead. I noticed from Dr. Rick's work that he set up his discrete experiment a little differently than mine. I then changed my assumptions to match his and repeated my simulation. For N=10 I now get a mean of (1 + 1/10)^10 as Dr. Rick's intuition expected. - Doctor George, The Math Forum http://mathforum.org/dr.math/
Date: 04/11/2005 at 21:41:18 From: Doctor Rick Subject: Re: I keep getting e as the result of a class activity, why? Hi, Steve, I'm back! I haven't had much time to give to this problem, but I think now I can convince myself (at least) that it's true: If we select random digits from 1 through N until the sum exceeds N, the mean number of selections is (1+1/N)^N. In case you're still interested, here's the proof. Let's go back to the case of N=4 and construct a chart. The number in the n-th column of the m-th row is the probability that after m selections, the sum is n. I will only show the columns 1 through N. 1 2 3 4 ------------------------ 1 1/4 1/4 1/4 1/4 2 0 1/16 2/16 3/16 3 0 0 1/64 3/64 4 0 0 0 1/256 How did I generate this table? On the first selection, there is an equal probabilty of selecting any of the numbers 1 through 4. On the second selection, the sum cannot be 1, because the lowest possible sum is 1+1=2. The probability that the sum will be 2 is the probability that the sum after 1 selection is less than 2, times 1/4, the probability that the second selection will be the number that makes the total 2. The same follows for each succeeding entry in the table. In general, the probability P(m,n) that the total after m selections is n, is P(m,n) = 0 , n < m (1/N)Sum (k=1 to n-1) P(k,n-1) , n >= m Now notice the difference between P(m,n-1) and P(m,n): if n >= m, then N*P(m,n-1) = Sum(k=1 to n-2) P(k,n-1) so that N*P(m,n) = N*P(m,n) + P(m-1,n-1) and N^m P(m,n) = N^m P(m,n) + N^(m-1) P(m-1,n-1) The denominator of each P(m,n) is N^m. The numerator of P(m,n) is found by adding the numerators of the cell to its left, P(m,n-1) and the cell above that, P(m-1,n-1). This is the rule for Pascal's triangle, and indeed you can see that the columns in my table are the rows of Pascal's triangle: 1; 1,1; 1,2,1; 1,3,3,1. Now the numbers in Pascal's triangle are combinations; we thus get the formula (writing C(a,b) for the combinations of "a" things taken "b" at a time): P(m,n) = C(n-1,m-1)/N^m , m <= n ; 0, m > n (n-1)! = ----------------- , m <= n ; 0, m > n (m-1)! (n-m)! N^m How do we use these probabilities to compute the mean number of selections to make the sum exceed N? First, the sum of each row of the table (up through column N) is the probability that the sum has not yet exceeded N. The probability that the sum has exceeded N is 1 minus this sum. The probability that the m-th selection is the first to put the sum over N, is P(m) = P(m-th sum exceeds N) - P((m-1)-th sum exceeds N) = (1 - Sum(n=1 to N)P(m,n)) - (1 - Sum(k=1 to N)P(m-1,n)) = Sum(n=1 to N)P(m-1,n) - Sum(n=1 to N)P(m,n) = Sum(n=1 to N)(P(m-1,n) - P(m,n)) The probability that the first selection (m=1) puts the sum over N is zero. Applying the formula to the table above for N=4, we get m P(sum<N) P(m selections needed to exceed N) ------------------------------------------------------- 1 4/4 0 2 6/16 4/4 - 6/16 = 10/16 3 4/64 6/16 - 4/64 = 20/64 4 1/256 4/64 - 1/256 = 15/256 5 0 1/256 - 0 = 1/256 This agrees with my previous work based on enumerating the possibilities. As I did there, I calculate the mean number of terms needed to exceed N by multiplying each probability in the last column by m and summing: 1*P(1) + 2*P(2) + 3*P(3) + 4*P(4) + 5*P(5) = 20/16 + 60/64 + 20/256 + 5/256 = 625/256 In general, the formula is <m> = Sum(m=2 to N+1) m*Sum(n=1 to N)(P(m-1,n) - P(m,n)) The sums can be interchanged, so that instead of summing across the rows of the table, then add the row sums, we sum first down the columns, then add the column sums. We have <m> = Sum(n=1 to N) Sum(m=2 to N+1) m*(P(m-1,n) - P(m,n)) Examine the inner sum: in the case N = 4, it is 2(P(1,n)-P(2,n)) + 3(P(2,n)-P(3,n)) + 4(P(3,n)-P(4,n)) + 5(P(4,n) - P(5,n)) = 2P(1,n) - 2P(2,n) + 3P(2,n) - 3P(3,n) + 4P(3,n) - 4P(4,n) + 5P(4,n) - 5P(5,n) ----------------------------------------------- 2P(1,n) + P(2,n) + P(3,n) + P(4,n) - 5P(5,n) In general, with P(1,n) = 1/N and P(N+1,n) = 0, this is 1/N + Sum(m=1 to N)P(m,n) The full sum is <m> = Sum(n=1 to N)(1/N + Sum(m=1 to N)P(m,n)) = 1 + Sum(n=1 to N) Sum(m=1 to N) P(m,n) = 1 + Sum(n=1 to N) Sum(m=1 to n) C(n-1,m-1)/N^m I changed the summation limit in the last step because P(m,n) = 0 for m > n. Recall that the expansion of (1 + 1/N)^n is (1 + a)^n = C(n,0) + C(n,1)a + C(n,2)a^2 + ... + C(n,n)a^n Comparing the following expansion with the sum over m, we see they are equal: (1 + 1/N)^(n-1)/N = (C(n-1,0) + C(n-1,1)/N + ... + C(n-1,n-1)/N^(n- 1))/N = C(n-1,0)/N + C(n-1,1)/N^2 + ... + C(n-1,n-1)/N^n Using this result, our formula for the mean is now <m> = 1 + (1/N)Sum(n=1 to N)(1 + 1/N)^(n-1) = 1 + (1/N)Sum(n=0 to N-1)(1 + 1/N)^n The remaining sum is a geometric series, and we can apply the formula Sum(k=0 to n) a^k = (a^(n+1)-1)/(a-1) to get <m> = 1 + (1/N)((1 + 1/N)^N - 1)/((1 + 1/N) - 1) = 1 + (1/N)((1 + 1/N)^N - 1)/(1/N) = 1 + (1 + 1/N)^N - 1 = (1 + 1/N)^N That's what I've been trying to show! It's not the tightest proof in the world, but I can rest content. Thanks for bringing up an interesting problem. - Doctor Rick, The Math Forum http://mathforum.org/dr.math/
Date: 04/12/2005 at 07:59:06 From: Steve Subject: Thank you (I keep getting e as the result of a class activity, why?) Wow. Thanks for all your effort. I'll sift through your proof this weekend. I'm glad to see that this actually works. Now I have to try to explain it to my students. Thanks again for all your work. Steve
Search the Dr. Math Library:
Ask Dr. MathTM
© 1994- The Math Forum at NCTM. All rights reserved.