
Re: problem on recordbreaking values in probability
Posted:
Feb 27, 2013 10:24 PM


On 02/27/2013 04:05 PM, James Waldby wrote: > On Wed, 27 Feb 2013 07:10:08 0500, David Bernier wrote: >> On 02/27/2013 05:49 AM, David Bernier wrote: >>> On 02/27/2013 05:31 AM, David Bernier wrote: >>>> I used Marsaglia's 64bit SUPER KISS pseudorandom number generator >>>> to simulate uniform r.v.s on [0, 1] that are independent, as >>>> X_1, X_2, X_3, ad infinitum >>>> >>>> For each go, (or sequence) I define its 1st recordbreaking value >>>> as R(1) as X_1, its 2nd recordbreaking value R(2) as the >>>> value taken by X_n for the smallest n with X_n > X_1, and in general > [ R(k+1) is the value taken by X_n for the smallest n with X_n > R(k)] > ... >>>> In my first simulation I get: R(20) = 0.999999999945556 >>>> or about 5.4E11 less than 1 , a one in 18 billion event. > ... >>>> In fact, R(20) is about 1  (0.307)^20 ... >> >> I finally got another 20th recordbreaking value, on my >> second sequence, but it took much longer. The values >> 1  R(20), one per sequence, are what I call "pvalues" >> >> from the simulations so far, a lot of variance in orders >> of magnitude: >> >> the corresponding pvalue is 0.000000000054 // seq. 1 >> the corresponding pvalue is 0.000000000001 // seq. 2 >> the corresponding pvalue is 0.000000002463 >> the corresponding pvalue is 0.000000000782 >> the corresponding pvalue is 0.000000106993 >> the corresponding pvalue is 0.000000342142 >> the corresponding pvalue is 0.000000001978 > [etc] > > It would be useful to report the number of trials each simulation > took to find its 20th RBV. If a simulation takes m trials, the > variance of the value X_m is approximately 1/(m^2), where X_m is > the mth smallest (ie the largest) number among m trials. (See > <http://en.wikipedia.org/wiki/Uniform_distribution_%28continuous%29#Order_statistics>) > Your simulation data implies there is a wide variance among the > values of m required to find a 20th RBV. > > In following, let L(n) = Pr(n'th item of n is lowest). (Distribution > of the lowest item should be similar to distribution of 1(highest > item).) I suppose that L(n) = 1/n and that the expected value of the > number of recordlowvalues (RLV's) in m trials is sum{i=1 to m}(1/i), > or about H_m, the m'th harmonic number, which can be approximated by > log(m) + gamma, with gamma = EulerMascheroni constant, about 0.5772. > When H_m ~ 20, log(m) ~ 19.42 and m is about 272 million. That is, > simulations to find the 20th RLV or RBV will take about 272 million > steps, on average. (I don't know which kind of average applies, or > what the variance might be.) >
Yes, I hadn't thought of using expectations and expected values, leading to partial sums of the harmonic series.
You wrote: "simulations to find the 20th RLV or RBV will take about 272 million steps".
Right, that looks like an interesting way of looking at record values.
Since it's like a stopping time, the number of the simulation, I suppose it could be denoted S({X_i}_{i=1, ... oo}, 20) or S(X_{BOLD_overligned}, 20).
Electricity should go off tomorrow morning for repair.
I think your estimate E(log(S_20)) ~= 20  gamma should be very good.
The higher moments (and the variance) of log(S_20) are still something of a mystery to me.
I propose to change 20 to 15 or 16 because it takes a long time with 20th RBV simulations.
David Bernier memo $ pwd /home/david/photon/feb26a/GEOMARSAG
$ ls l david 9203 Feb 27 04:48 a.out david 1866 Feb 27 04:48 test17a.c
gcc std=c89 test17a.c ./a.out  dracut:/# lvm vgcfgrestore File descriptor 9 (/.console_lock) leaked on lvm invocation. Parent PID 993: sh Please specify a *single* volume group to restore.

