
Re: problem on recordbreaking values in probability
Posted:
Feb 27, 2013 4:05 PM


On Wed, 27 Feb 2013 07:10:08 0500, David Bernier wrote: > On 02/27/2013 05:49 AM, David Bernier wrote: >> On 02/27/2013 05:31 AM, David Bernier wrote: >>> I used Marsaglia's 64bit SUPER KISS pseudorandom number generator >>> to simulate uniform r.v.s on [0, 1] that are independent, as >>> X_1, X_2, X_3, ad infinitum >>> >>> For each go, (or sequence) I define its 1st recordbreaking value >>> as R(1) as X_1, its 2nd recordbreaking value R(2) as the >>> value taken by X_n for the smallest n with X_n > X_1, and in general [ R(k+1) is the value taken by X_n for the smallest n with X_n > R(k)] ... >>> In my first simulation I get: R(20) = 0.999999999945556 >>> or about 5.4E11 less than 1 , a one in 18 billion event. ... >>> In fact, R(20) is about 1  (0.307)^20 ... > > I finally got another 20th recordbreaking value, on my > second sequence, but it took much longer. The values > 1  R(20), one per sequence, are what I call "pvalues" > > from the simulations so far, a lot of variance in orders > of magnitude: > > the corresponding pvalue is 0.000000000054 // seq. 1 > the corresponding pvalue is 0.000000000001 // seq. 2 > the corresponding pvalue is 0.000000002463 > the corresponding pvalue is 0.000000000782 > the corresponding pvalue is 0.000000106993 > the corresponding pvalue is 0.000000342142 > the corresponding pvalue is 0.000000001978 [etc]
It would be useful to report the number of trials each simulation took to find its 20th RBV. If a simulation takes m trials, the variance of the value X_m is approximately 1/(m^2), where X_m is the mth smallest (ie the largest) number among m trials. (See <http://en.wikipedia.org/wiki/Uniform_distribution_%28continuous%29#Order_statistics>) Your simulation data implies there is a wide variance among the values of m required to find a 20th RBV.
In following, let L(n) = Pr(n'th item of n is lowest). (Distribution of the lowest item should be similar to distribution of 1(highest item).) I suppose that L(n) = 1/n and that the expected value of the number of recordlowvalues (RLV's) in m trials is sum{i=1 to m}(1/i), or about H_m, the m'th harmonic number, which can be approximated by log(m) + gamma, with gamma = EulerMascheroni constant, about 0.5772. When H_m ~ 20, log(m) ~ 19.42 and m is about 272 million. That is, simulations to find the 20th RLV or RBV will take about 272 million steps, on average. (I don't know which kind of average applies, or what the variance might be.)
 jiw

