|


Probability and Random NumbersDate: 02/28/2004 at 10:23:49 From: GVK Subject: Probability and Random Numbers Let me start with an example first. If I have a six-faced die, I can be sure that rolling it will produce 6 events of equal probability (1/6 each). Suppose I blacken one of the faces and roll the die. I will ignore the result and roll the die again if the blackened face shows up. Can I be sure that the other five will show up with a probability of 1/5 each? The original problem asks if I can use a coin (which has 2 events of equal probability) to devise three events with 1/3rd probability each? I find devising 4 events easy; toss the coin twice and interpret the results 00, 01, 10, 11 (ie HH, HT, TH, TT) as the 4 events. But, is it okay to disregard any one result (say 00), similar to blackening the die above, and claim that the other three events are of equal probability? I find it hard to believe that a coin can be used to have three events of equal probability. I tried devising various versions of the problem, like using the function rand5() to generate a function rand7(), and the blackened six-faced die, etc.
Date: 02/29/2004 at 09:15:20
From: Doctor Jacques
Subject: Re: Probability and Random Numbers
Hi GVK,
The method does work. However, there is a catch; the number of coin
tosses is theoretically unbounded. Although it is quite unlikely, you
can have a very long run of "00" results. This means that you cannot
guarantee that you will generate the next number in a predetermined
amount of time. There is no method that would allow you to generate
those events with a finite number of bits, since 3 never divides any
power of 2 (another way to see this is to notice that the binary
expansion of 1/3 is a repeating infinite string of bits).
As the probability of failure is 1/4, the expected number of coin
tosses is:
2*(1 + 1/4 + (1/4)^2 ...) = 8/3 = 2.66...
This is rather wasteful, as the theoretical number of bits required
to generate an event with probability 1/3 is:
log(3) / log(2) = 1.585
If you want to generate a single event, you cannot do better. If you
want to produce a large number of such events, with an exact
probability of 1/3, and without wasting too many bits, you can still
do better.
The first question would be "why bother?". There are several reasons
for doing this, beyond the fact that this is fun!
In some applications, such as those related to high-security
cryptography, it is really important to have exact probabilities and
truly random numbers. The numbers returned by the random() routines
found in programming languages are not random at all; they are
produced by a deterministic algorithm. They are called random because
their distributions are statistically similar to true random numbers.
When really random numbers are required, we use external devices
based, for example, on radioactive decay or thermal noise--these
processes cannot be reproduced. If you are interested in this, you
could read:
http://www.fourmilab.ch/hotbits/
However, those devices usually have a very low rate; if you want to
produce large amounts of numbers in a short time, you may want to use
as few random bits as possible.
Let us look at a method for generating random integers between 0 and
(n - 1), with an exact uniform distribution. This is similar to the
random(n) function. However, that function, as it is implemented, is
not exact; it generates integers in a large range (for example,
between 0 and 65535), and returns the integer part of (n*x)/65536. If
n is not a power of 2, some numbers will have a slightly larger
probability than others, although the difference will be small if n
is small. In this case, we will generate numbers between 0 and
(n - 1) with an exact probability of 1/n.
We assume that we have available a function NextBit(), that returns a
single random bit (and we assume that calls to that function are
expensive). We assume also that we have to generate many numbers, so
that only the average cost matters.
A first idea would be to collect many random bits, and to generate
random numbers in batches. For example, for a probability of 1/3, we
could collect 8 bits at a time to produce a number between 0 and 255.
As 3^5 = 243 < 256, we could discard the result if it is greater than
242; otherwise, we will produce five numbers between 0 and 2. The
probability of failure is 13/256, so the average number of
experiments is:
1 + 13/256 + (13/256)^2 ... = 256/243 = 1.053...
In each experiment, we consume 8 bits and produce 5 numbers, so the
average number of bits per number is:
1.053 * 8/5 = 1.686
which is already much closer to the theoretical figure of 1.585.
We can still do better. The idea is that, in case of failure, the
difference (x - 243) is a random number between 0 and 12, which is
equivalent to log(13)/log(2) = 3.7 bits, and we could try to take
advantage of it.
The following algorithm uses a similar (but slightly different) idea.
It uses two internal integer variables, m and r, which are not reset
at the beginning of the algorithm (in C, you would declare them
as "static"). Initially, m = 1 and r = 0. We also have a parameter
N, which is a large integer such that 2N can still be represented
exactly in the computer. As said before, n is the modulus of the
numbers you want to produce (they will be between 0 and (n - 1)), we
must have n < N, and we have a function NextBit() that returns a
truly random bit.
1. WHILE m < N DO r : = 2*r + NextBit(); m = 2*m; (r is a random
variable of modulus m)
2. Divide m by n : m = n*q + b
3. IF r >= n*q, let m : = m - n*q, r : = r - n*q (r is still a random
variable of modulus m), and go to step 1.
4. Otherwise, let x : = r mod n, r : = [r/n], and m : = q, and return
x.
In step 4, as r is between 0 and (n*q - 1), we can consider r as a
random variable of modulus n*q. As this is divisble by n, (r mod n)
will be uniformly distributed, and the quotient [r/n] will be
uniformly distributed between 0 and n - 1.
Note that the running time is still unbounded, since you can return
to step 1 arbitrarily often, but, in practice, this will give you an
exact distribution while consuming a number of bits quite close to
the theoretical minimum.
For example, assume that we want to generate decimal digits (n = 10)
and we let N = 256. A typical run would be as follows:
* We start with m = 1, r = 0, and get 8 bits. Assume they are
1, 1, 1, 1, 1, 1, 0, 1.
At the end of step 1, we have m = 256, r = 253.
* We divide 256 by 10, giving q = 25, n*q = 250.
* As 253 >= 250, we are out of luck. The excess, r - n*q = 253 - 250
= 3, is a random variable of modulus 256 - 250 = 6. We let
therefore m = 6 and r = 3 (note that we have saved more than two
bits, instead of throwing them away).
* We start again at step 1. This time, we need to generate 6 bits, in
order to have m >= 256. Assume the process runs as follows
(starting with m = 6, r = 3:
NextBit() | m | r
-----------+-----+----
0 | 12 | 6
1 | 24 | 13
1 | 48 | 27
0 | 96 | 54
0 | 192 | 108
1 | 384 | 217
* Step 2 gives q = 38, n*q = 380
* As 217 < 380, we will return (217 mod 10) = 7. As we were in the
21st decade out of 38, we let r = 21, m = 38 to be used for the
next call. This means that we already have more than 5 bits in
store.
We have used 14 bits to generate the first number, but this is merely
the "startup cost". As m = 38, the next call will need only 3 new
bits in step 1. In the long run, the average number of bits per
number will be close to the theoretical value log(10)/log(2) = 3.32.
The probability of failure at step 3 is at most 9/m <= 9/256.
A nice feature of this algorithm is that n is used only inside the
routine. You can use a different n for each call, without re-
initializing the variables m and r.
Does this help? Write back if you'd like to talk about this some
more, or if you have any other questions.
- Doctor Jacques, The Math Forum
http://mathforum.org/dr.math/
Date: 03/10/2004 at 10:43:15 From: GVK Subject: Probability and Random Numbers Hello Dr. Jacques, Thanks a lot for your reply. I have just one question. In your reply, you wrote: since 3 never divides any power of 2 (another way to see this is to notice that the binary expansion of 1/3 is a repeating infinite string of bits). Are the two equivalent? 5 never divides any power of 2 either, however 1/5 is not an infinite sequence of bits. Is it possible to get 5 events of 1/5 probability using a coin? Date: 03/12/2004 at 02:04:45 From: Doctor Jacques Subject: Re: Probability and Random Numbers Hi again GVK, The two are equivalent. In base N, the N-ary expansion of 1/k is infinite (and repeating) unless all prime factors of k are also prime factors of N, which amounts to saying that k divides some power of N. If N = 2, the only allowed prime factor is 2, so any fraction whose denominator (in lowest terms) is not a power of 2 will have an infinite binary expansion. For your specific example, the binary expansion of 1/5 is: 1/5 = 0. 0011 0011.... = (3/16) + (3/16)^2 + ... and this is a repeating binary number. It is indeed quite possible to generate events with probability 1/5 by tossing coins--you can use the algorithm I gave you with n = 5 and N >= 8). By the way, note that that algorithm also works in the opposite direction; you can also use it to convert "base n" events to "base m" events for any different m and n. For example, you can use it to generate random binary strings using six-faced dice; you must essentially modify step 1 and the choice of N. - Doctor Jacques, The Math Forum http://mathforum.org/dr.math/ |
Search the Dr. Math Library: |
[Privacy Policy] [Terms of Use]


Ask Dr. MathTM
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/