Search All of the Math Forum:

Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Notice: We are no longer accepting new posts, but the forums will continue to be readable.

Topic: problem on record-breaking values in probability
Replies: 14   Last Post: Apr 14, 2013 11:36 PM

 Messages: [ Previous | Next ]
 David Bernier Posts: 3,892 Registered: 12/13/04
Re: problem on record-breaking values in probability
Posted: Mar 10, 2013 9:47 PM

On 03/10/2013 08:52 PM, David Bernier wrote:
> On 03/01/2013 08:41 AM, David Bernier wrote:
>> On 02/27/2013 10:24 PM, David Bernier wrote:
>>> On 02/27/2013 04:05 PM, James Waldby wrote:
>>>> On Wed, 27 Feb 2013 07:10:08 -0500, David Bernier wrote:
>>>>> On 02/27/2013 05:49 AM, David Bernier wrote:
>>>>>> On 02/27/2013 05:31 AM, David Bernier wrote:
>>>>>>> I used Marsaglia's 64-bit SUPER KISS pseudo-random number generator
>>>>>>> to simulate uniform r.v.s on [0, 1] that are independent, as
>>>>>>> X_1, X_2, X_3, ad infinitum
>>>>>>>
>>>>>>> For each go, (or sequence) I define its 1st record-breaking value
>>>>>>> as R(1) as X_1, its 2nd record-breaking value R(2) as the
>>>>>>> value taken by X_n for the smallest n with X_n > X_1, and in general

>>>> [ R(k+1) is the value taken by X_n for the smallest n with X_n > R(k)]
>>>> ...

>>>>>>> In my first simulation I get: R(20) = 0.999999999945556
>>>>>>> or about 5.4E-11 less than 1 , a one in 18 billion event.

>>>> ...
>>>>>>> In fact, R(20) is about 1 - (0.307)^20 ...
>>>>>
>>>>> I finally got another 20th record-breaking value, on my
>>>>> second sequence, but it took much longer. The values
>>>>> 1 - R(20), one per sequence, are what I call "p-values"
>>>>>
>>>>> from the simulations so far, a lot of variance in orders
>>>>> of magnitude:
>>>>>
>>>>> the corresponding p-value is 0.000000000054 // seq. 1
>>>>> the corresponding p-value is 0.000000000001 // seq. 2
>>>>> the corresponding p-value is 0.000000002463
>>>>> the corresponding p-value is 0.000000000782
>>>>> the corresponding p-value is 0.000000106993
>>>>> the corresponding p-value is 0.000000342142
>>>>> the corresponding p-value is 0.000000001978

>>>> [etc]
>>>>
>>>> It would be useful to report the number of trials each simulation
>>>> took to find its 20th RBV. If a simulation takes m trials, the
>>>> variance of the value X_m is approximately 1/(m^2), where X_m is
>>>> the mth smallest (ie the largest) number among m trials. (See
>>>> <http://en.wikipedia.org/wiki/Uniform_distribution_%28continuous%29#Order_statistics>)
>>>>
>>>>
>>>>
>>>> Your simulation data implies there is a wide variance among the
>>>> values of m required to find a 20th RBV.
>>>>
>>>> In following, let L(n) = Pr(n'th item of n is lowest). (Distribution
>>>> of the lowest item should be similar to distribution of 1-(highest
>>>> item).) I suppose that L(n) = 1/n and that the expected value of the
>>>> number of record-low-values (RLV's) in m trials is sum{i=1 to m}(1/i),
>>>> or about H_m, the m'th harmonic number, which can be approximated by
>>>> log(m) + gamma, with gamma = Euler-Mascheroni constant, about 0.5772.
>>>> When H_m ~ 20, log(m) ~ 19.42 and m is about 272 million. That is,
>>>> simulations to find the 20th RLV or RBV will take about 272 million
>>>> steps, on average. (I don't know which kind of average applies, or
>>>> what the variance might be.)
>>>>

>>>
>>> Yes, I hadn't thought of using expectations and expected values,
>>> leading to partial sums of the harmonic series.
>>>
>>> You wrote:
>>> "simulations to find the 20th RLV or RBV will take about 272 million
>>> steps".
>>>
>>> Right, that looks like an interesting way of looking at
>>> record values.
>>>
>>> Since it's like a stopping time, the number of the simulation,
>>> I suppose it could be denoted S({X_i}_{i=1, ... oo}, 20)
>>> or S(X_{BOLD_overligned}, 20).
>>>
>>> Electricity should go off tomorrow morning for repair.
>>>
>>> E(log(S_20)) ~= 20 - gamma
>>> should be very good.

>>
>> In the literature, a remarkable article, which may have
>> appeared in the Am. Math. Monthly, can be found by
>> searching for:
>> Breaking Records and Breaking Boards. Ned Glick
>>
>> The author is Ned Glick, at some time at UBC in Canada.
>>
>> David Bernier

>
> I did long simulations for 12th Record-Breaking Values.
>
> With MatLab, I constructed a histogram of the natural
> logarithms of the 76,000 values:
>
> < http://img521.imageshack.us/img521/7702/records12log.jpg > .
>
> The mean should be close to 12 - EulerGamma:
> http://en.wikipedia.org/wiki/Euler%E2%80%93Mascheroni_constant
>
> David Bernier

S_12 is number of trials (steps) taken to find the 12th
Record-Breaking Value. On Average, log(S_12) is close
to 12 - gamma (gamma is the Euler-Mascheroni constant).

A number of 76,000 sequences were generated, each being
continued until the 12th Record-Breaking Value for
that sequence was found. There is such variance from
one sample S_12 to another that I prefer the
quantities log(S_12) , for the histograms.

Occasionally, an unusually high record is attained
in the 1st, 2nd, ... or 11th Record-Breaking Value.
That makes breaking the record all the more difficult.
In the simulations, the computer would pass (say) three
hours or more on the same sequence, with no new output
to the file for three or more hours.

David Bernier
--
\$apr1\$LJgyupye\$GZQc9jyvrdP50vW77sYvz1

Date Subject Author
2/27/13 David Bernier
2/27/13 David Bernier
2/27/13 David Bernier
2/27/13 James Waldby
2/27/13 David Bernier
3/1/13 David Bernier
3/1/13 David Bernier
3/10/13 David Bernier
3/10/13 David Bernier
3/11/13 James Waldby
3/11/13 David Bernier
4/13/13 David Bernier
4/14/13 David Bernier
4/14/13 David Petry