Date: Apr 14, 2013 12:17 AM Author: David Bernier Subject: Re: problem on record-breaking values in probability On 04/13/2013 11:51 PM, David Bernier wrote:

> On 03/11/2013 12:32 PM, David Bernier wrote:

>> On 03/11/2013 02:07 AM, James Waldby wrote:

>>> On Sun, 10 Mar 2013 21:47:27 -0400, David Bernier wrote:

>>>> On 03/10/2013 08:52 PM, David Bernier wrote:

>>>>> On 03/01/2013 08:41 AM, David Bernier wrote:

>>>>>> On 02/27/2013 10:24 PM, David Bernier wrote:

>>>>>>> On 02/27/2013 04:05 PM, James Waldby wrote:

>>>>>>>> On Wed, 27 Feb 2013 07:10:08 -0500, David Bernier wrote:

>>>>>>>>> On 02/27/2013 05:49 AM, David Bernier wrote:

>>>>>>>>>> On 02/27/2013 05:31 AM, David Bernier wrote:

>>>>>>>>>>> I used Marsaglia's 64-bit SUPER KISS pseudo-random number

>>>>>>>>>>> generator

>>>>>>>>>>> to simulate uniform r.v.s on [0, 1] that are independent, as

>>>>>>>>>>> X_1, X_2, X_3, ad infinitum

>>>>>>>>>>>

>>>>>>>>>>> For each go, (or sequence) I define its 1st record-breaking

>>>>>>>>>>> value

>>>>>>>>>>> as R(1) as X_1, its 2nd record-breaking value R(2) as the

>>>>>>>>>>> value taken by X_n for the smallest n with X_n > X_1, and in

>>>>>>>>>>> general

>>>>>>>> [ R(k+1) is the value taken by X_n for the smallest n with X_n >

>>>>>>>> R(k)]

>>>>>>>> ...

>>> [snip]

>>>>>>>> [etc]

>>>>>>>>

>>>>>>>> It would be useful to report the number of trials each simulation

>>>>>>>> took to find its 20th RBV. If a simulation takes m trials, the

>>> [snip]

>>>>>>>> In following, let L(n) = Pr(n'th item of n is lowest).

>>>>>>>> (Distribution

>>>>>>>> of the lowest item should be similar to distribution of 1-(highest

>>>>>>>> item).) I suppose that L(n) = 1/n and that the expected value of

>>>>>>>> the

>>>>>>>> number of record-low-values (RLV's) in m trials is sum{i=1 to

>>>>>>>> m}(1/i),

>>>>>>>> or about H_m, the m'th harmonic number, which can be

>>>>>>>> approximated by

>>>>>>>> log(m) + gamma, with gamma = Euler-Mascheroni constant, about

>>>>>>>> 0.5772.

>>> [snip]

>>>>>> In the literature, a remarkable article, which may have

>>>>>> appeared in the Am. Math. Monthly, can be found by

>>>>>> searching for:

>>>>>> Breaking Records and Breaking Boards. Ned Glick

>>> ...

>>>>> I did long simulations for 12th Record-Breaking Values.

>>>>>

>>>>> With MatLab, I constructed a histogram of the natural

>>>>> logarithms of the 76,000 values:

>>>>>

>>>>> < http://img521.imageshack.us/img521/7702/records12log.jpg > .

>>> ...

>>>> S_12 is number of trials (steps) taken to find the 12th

>>>> Record-Breaking Value. On Average, log(S_12) is close

>>>> to 12 - gamma (gamma is the Euler-Mascheroni constant).

>>>>

>>>> A number of 76,000 sequences were generated, each being

>>>> continued until the 12th Record-Breaking Value for

>>>> that sequence was found. There is such variance from

>>>> one sample S_12 to another that I prefer the

>>>> quantities log(S_12) , for the histograms.

>>>>

>>>> Occasionally, an unusually high record is attained

>>>> in the 1st, 2nd, ... or 11th Record-Breaking Value.

>>>> That makes breaking the record all the more difficult.

>>>> In the simulations, the computer would pass (say) three

>>>> hours or more on the same sequence, with no new output

>>>> to the file for three or more hours.

>>>

>>> You may already have done so, but if not and if you are going to

>>> run more simulations, consider (a) profiling the code, and

>>> (b) trying different compilation options. (a) allows you to find

>>> out which lines of code use most of the hours of CPU time, so you

>>> can try alternate ways of coding them. Under (b), starting with

>>> the same random seed in each case, try optimization options -O1,

>>> -O2, -O3, -Ofast, timing the execution and also verifying the

>>> same results. From my interpretation of URL below, those 4

>>> optimization options are all you need to try.

>>> <http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html>

>>>

>>

>> Maybe I've spent enough computer time on this distribution

>> problem. I'm grateful for your posts in reply.

>>

>> I think I found something known about the limiting distribution

>> of the k'th record-breaking trial number from the article of

>> Ned Glick:

>>

>> From Ned Glick's (UBC) 1978 article:

>>

>> N_r is the r'th record-breaking time (serial number

>> of the trial). N_1 = 1 always.

>>

>> "Also Renyi [30] stated a "central limit theorem" for the random

>> variables N_r: as r --> oo, the

>> distribution of (ln(N_r) - r)/sqrt(r) is

>> asymptotically normal with mean = 0 and variance= 1."

>> Ned Glick, "Breaking Records and Breaking Boards", 1978

>>

>>

>>

>> In Renyi's article,

>>

>> (log(nu_k) - k)/sqrt(k) ~ N(0,1) as k -> oo

>>

>> "Theorie des elements saillants d'une suite d'observations",

>> 1962.

> [...]

>

> "Advances in Combinatorial Methods and

> Applications to Probability and Statistics"

> edited by N. Balakrishnan,

> Google Books,

>

>

> http://books.google.ca/books/about/Advances_in_Combinatorial_Methods_and_Ap.html?id=WdJSxINF7VIC

>

>

> Part II, Applications to Probability Problems,

> Chapter 13, "Stirling Numbers and Records",

> pp. 189-201 approximately.

>

> In the "classical scheme" of records,

> alpha_1 = alpha_2 = ... alpha_n .

>

> Then Prob[xi_n = 1] = 1/n.

>

> xi_n is the indicator function for the n'th r.v.

> (n = 1, 2, 3, ... ).

> xi_n = 1 if X_n is a record, 0 otherwise.

>

> X_1, X_2, ... are i.i.d. uniform [0, 1] r.v.s (say).

>

> Formulas (13.19) and (13.19) give formulas

> for probability distributions related to

> record values.

>

> I think using Generalized Stirling numbers

> makes it harder to understand.

>

> But "Stirling Numbers and Records" is memorable.

>

> David Bernier

>

There is also the book "Records:mathematical theory"

by Valery B. Nevzorov.

Probably th Nevzorov who wrote a chapter in

"Advances in Combinatorial Methods and

Applications to Probability and Statistics"

"Records:mathematical theory":

http://books.google.ca/books/about/Records.html?id=B7ZpJKouceoC&redir_esc=y

dave

--

Jesus is an Anarchist. -- J.R.