
Re: problem on recordbreaking values in probability
Posted:
Mar 1, 2013 2:10 AM


On 02/27/2013 10:24 PM, David Bernier wrote: > On 02/27/2013 04:05 PM, James Waldby wrote: >> On Wed, 27 Feb 2013 07:10:08 0500, David Bernier wrote: >>> On 02/27/2013 05:49 AM, David Bernier wrote: >>>> On 02/27/2013 05:31 AM, David Bernier wrote: >>>>> I used Marsaglia's 64bit SUPER KISS pseudorandom number generator >>>>> to simulate uniform r.v.s on [0, 1] that are independent, as >>>>> X_1, X_2, X_3, ad infinitum >>>>> >>>>> For each go, (or sequence) I define its 1st recordbreaking value >>>>> as R(1) as X_1, its 2nd recordbreaking value R(2) as the >>>>> value taken by X_n for the smallest n with X_n > X_1, and in general >> [ R(k+1) is the value taken by X_n for the smallest n with X_n > R(k)] >> ... >>>>> In my first simulation I get: R(20) = 0.999999999945556 >>>>> or about 5.4E11 less than 1 , a one in 18 billion event. >> ... >>>>> In fact, R(20) is about 1  (0.307)^20 ... >>> >>> I finally got another 20th recordbreaking value, on my >>> second sequence, but it took much longer. The values >>> 1  R(20), one per sequence, are what I call "pvalues" >>> >>> from the simulations so far, a lot of variance in orders >>> of magnitude: >>> >>> the corresponding pvalue is 0.000000000054 // seq. 1 >>> the corresponding pvalue is 0.000000000001 // seq. 2 >>> the corresponding pvalue is 0.000000002463 >>> the corresponding pvalue is 0.000000000782 >>> the corresponding pvalue is 0.000000106993 >>> the corresponding pvalue is 0.000000342142 >>> the corresponding pvalue is 0.000000001978 >> [etc] >> >> It would be useful to report the number of trials each simulation >> took to find its 20th RBV. If a simulation takes m trials, the >> variance of the value X_m is approximately 1/(m^2), where X_m is >> the mth smallest (ie the largest) number among m trials. (See >> <http://en.wikipedia.org/wiki/Uniform_distribution_%28continuous%29#Order_statistics>) >> >> Your simulation data implies there is a wide variance among the >> values of m required to find a 20th RBV. >> >> In following, let L(n) = Pr(n'th item of n is lowest). (Distribution >> of the lowest item should be similar to distribution of 1(highest >> item).) I suppose that L(n) = 1/n and that the expected value of the >> number of recordlowvalues (RLV's) in m trials is sum{i=1 to m}(1/i), >> or about H_m, the m'th harmonic number, which can be approximated by >> log(m) + gamma, with gamma = EulerMascheroni constant, about 0.5772. >> When H_m ~ 20, log(m) ~ 19.42 and m is about 272 million. That is, >> simulations to find the 20th RLV or RBV will take about 272 million >> steps, on average. (I don't know which kind of average applies, or >> what the variance might be.) >> > > Yes, I hadn't thought of using expectations and expected values, > leading to partial sums of the harmonic series. > > You wrote: > "simulations to find the 20th RLV or RBV will take about 272 million > steps". > > Right, that looks like an interesting way of looking at > record values. > > Since it's like a stopping time, the number of the simulation, > I suppose it could be denoted S({X_i}_{i=1, ... oo}, 20) > or S(X_{BOLD_overligned}, 20). > > Electricity should go off tomorrow morning for repair. > > I think your estimate > E(log(S_20)) ~= 20  gamma > should be very good. > > The higher moments (and the variance) of > log(S_20) are still something of a mystery to me. > > I propose to change 20 to 15 or 16 because it takes > a long time with 20th RBV simulations. [...]
I did the "number of steps" simulations for the 16th RBVs (recordbreakingvalues) where at step zero (or initially) the record value is 0.0 .
I made a histogram of the values of ln(S_16) for 162 simulated sequences.
The url for the image is: < http://imageshack.us/a/img812/6161/histo16a.jpg >. The 16  EulerGamma or about 15.4 for the mean (?) of S_16 is consistent with the histogram, or vice versa.
Out of the 162 samples, the two largest 16th "stopping times" were: 180,684,317,342 and 310,870,009,300.
The natural logarithms of those are about 25.920 and 26.463, respectively.
My histogram of ln(S16) for the 162 samples has eight rectangular boxes.
The rightmost rectangular box with ~= two samples near the mark 25 on the horizontal axis should correspond to the 25.920 and 26.463 extreme largest log(S16) values.
The smallest of the samples for S16 was 523 , with a natural logarithm of about 6.26 .
I got 175 CPU minutes of data, or about 1 sample per minute.
So, going as far as the 16th RBV is probably a bit too far to get to the "fine structure" of the probability density functions of the log(S_k).
David Bernier  dracut:/# lvm vgcfgrestore File descriptor 9 (/.console_lock) leaked on lvm invocation. Parent PID 993: sh Please specify a *single* volume group to restore.

