Ask Dr. Math - Questions and Answers from our Archives

Probability and Random Numbers

Date: 02/28/2004 at 10:23:49
From: GVK
Subject: Probability and Random Numbers

Let me start with an example.  If I have a six-faced die, I can 
be sure that rolling it will produce 6 events of equal probability 
(1/6 each).  Suppose I blacken one of the faces and roll the die.  I 
will ignore the result and roll the die again if the blackened face 
shows up.  Can I be sure that the other five will show up with a 
probability of 1/5 each? 

The original problem asks if I can use a coin (which has 2 events of 
equal probability) to devise three events with probability 1/3 each.  
I find devising 4 events easy: toss the coin twice and interpret the 
results 00, 01, 10, 11 (i.e., HH, HT, TH, TT) as the 4 events.  But 
is it okay to disregard any one result (say 00), similar to 
blackening the die above, and claim that the other three events are 
of equal probability?  I find it hard to believe that a coin can be 
used to produce three events of equal probability.

I have tried devising various versions of the problem, like using a 
function rand5() to generate a function rand7(), using the blackened 
six-faced die above, and so on.



Date: 02/29/2004 at 09:15:20
From: Doctor Jacques
Subject: Re: Probability and Random Numbers

Hi GVK,

The method does work.  However, there is a catch: the number of coin 
tosses is theoretically unbounded.  Although it is quite unlikely, you 
can have a very long run of "00" results.  This means that you cannot 
guarantee that you will generate the next number in a predetermined 
amount of time.  In fact, no method can generate those events with a 
bounded number of coin tosses, since 3 never divides any power of 2 
(another way to see this is to notice that the binary expansion of 
1/3 is an infinite repeating string of bits).

As the probability of failure is 1/4, the expected number of coin 
tosses is:

  2*(1 + 1/4 + (1/4)^2 + ...) = 8/3 = 2.66...

This is rather wasteful, as the theoretical number of bits required 
to generate an event with probability 1/3 is:

  log(3) / log(2) = 1.585

If you only want to generate a single event, you cannot do better.  
However, if you want to produce a large number of such events, each 
with an exact probability of 1/3, you can come much closer to the 
theoretical figure without wasting so many bits.
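
Here is a minimal sketch of the simple method in C, assuming a helper
coin_flip() that stands in for a fair random bit source (rand() is
only a placeholder here and is not suitable where true randomness
matters):

  #include <stdlib.h>

  /* Stand-in for a fair coin toss; replace with a truly random
     bit source for real applications. */
  static int coin_flip(void)
  {
      return rand() & 1;
  }

  /* Returns 0, 1, or 2, each with exact probability 1/3: toss the
     coin twice and retry on the outcome 00 (i.e., HH).  The number
     of tosses is unbounded, but the expected number is 8/3. */
  int rand3(void)
  {
      for (;;) {
          int x = 2 * coin_flip() + coin_flip();
          if (x != 0)
              return x - 1;    /* 01, 10, 11 map to 0, 1, 2 */
      }
  }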

The first question would be "why bother?".  There are several reasons
for doing this, beyond the fact that this is fun!

In some applications, such as those related to high-security
cryptography, it is really important to have exact probabilities and
truly random numbers.  The numbers returned by the random() routines
found in programming languages are not random at all; they are
produced by a deterministic algorithm.  They are called pseudo-random
because their distributions are statistically similar to those of
true random numbers.

When really random numbers are required, we use external devices
based, for example, on radioactive decay or thermal noise--these 
processes cannot be reproduced.  If you are interested in this, you
could read:

     http://www.fourmilab.ch/hotbits/ 

However, those devices usually have a very low bit rate; if you want 
to produce large amounts of numbers in a short time, you may want to 
use as few truly random bits as possible.

Let us look at a method for generating random integers between 0 and 
(n - 1), with an exact uniform distribution.  This is similar to the 
random(n) function.  However, that function, as it is usually 
implemented, is not exact; it generates integers in a large range 
(for example, between 0 and 65535), and returns the integer part of 
(n*x)/65536.  If n is not a power of 2, some numbers will have a 
slightly larger probability than others, although the difference 
will be small if n is small.  In contrast, the method below 
generates numbers between 0 and (n - 1) with an exact probability 
of 1/n each.
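
To see the bias concretely (a worked check of my own, not part of the
original answer): with n = 3, the values x = 0..21845 yield 0, the
values x = 21846..43690 yield 1, and the values x = 43691..65535
yield 2, so the outcome 0 occurs for 21846 of the 65536 equally
likely values of x, while 1 and 2 each occur for only 21845.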

We assume that we have available a function NextBit() that returns a 
single truly random bit (and we assume that calls to that function 
are expensive).  We assume also that we have to generate many 
numbers, so that only the average cost matters.

A first idea would be to collect many random bits, and to generate 
random numbers in batches.  For example, for a probability of 1/3, we 
could collect 8 bits at a time to produce a number between 0 and 255. 
As 3^5 = 243 < 256, we could discard the result if it is greater than 
242; otherwise, we will produce five numbers between 0 and 2. The 
probability of failure is 13/256, so the average number of 
experiments is:

  1 + 13/256 + (13/256)^2 + ... = 256/243 = 1.053...

In each experiment, we consume 8 bits and produce 5 numbers, so the 
average number of bits per number is:

  1.053 * 8/5 = 1.686

which is already much closer to the theoretical figure of 1.585.
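
In C, the batch idea might look like this (my sketch, reusing the
coin_flip() stand-in from the earlier example):

  /* Produces five digits in {0, 1, 2} from eight coin tosses,
     retrying whenever the 8-bit value exceeds 242 (= 3^5 - 1).
     The digits are written into out[0..4]. */
  void rand3_batch(int out[5])
  {
      for (;;) {
          int x = 0;
          for (int i = 0; i < 8; i++)
              x = 2 * x + coin_flip();
          if (x < 243) {
              for (int i = 0; i < 5; i++) {
                  out[i] = x % 3;    /* successive base-3 digits */
                  x /= 3;
              }
              return;
          }
      }
  }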

We can still do better.  The idea is that, in case of failure, the 
difference (x - 243) is a random number between 0 and 12, which is 
equivalent to log(13)/log(2) = 3.7 bits, and we could try to take 
advantage of it.

The following algorithm uses a similar (but slightly different) idea. 
It uses two internal integer variables, m and r, which are not reset 
at the beginning of the algorithm (in C, you would declare them 
as "static").  Initially, m = 1 and r = 0.  We also have a parameter 
N, a large integer chosen so that 2*N can still be represented 
exactly in the computer.  As said before, n is the modulus of the 
numbers you want to produce (they will be between 0 and (n - 1)); we 
must have n < N, and we have the function NextBit() that returns a 
truly random bit.

1. WHILE m < N DO r := 2*r + NextBit(); m := 2*m (r is a random
   variable of modulus m).

2. Divide m by n: m = n*q + b.

3. IF r >= n*q, let m := m - n*q, r := r - n*q (r is still a random
   variable of modulus m), and go to step 1.

4. Otherwise, let x := r mod n, r := [r/n], and m := q, and return 
   x.

In step 4, as r is between 0 and (n*q - 1), we can consider r as a 
random variable of modulus n*q.  As this is divisible by n, (r mod n) 
will be uniformly distributed between 0 and (n - 1), and the quotient 
[r/n] will be uniformly distributed between 0 and (q - 1).

Note that the running time is still unbounded, since you can return 
to step 1 arbitrarily often, but, in practice, this will give you an 
exact distribution while consuming a number of bits quite close to 
the theoretical minimum.
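
For reference, here is a direct transcription of steps 1-4 into C (a
sketch of my own; only the steps above are from the original answer).
It reuses the coin_flip() stand-in from the first sketch in place of
NextBit(), and fixes N = 256 for concreteness:

  #define N 256   /* must have n < N, and 2*N representable */

  /* Returns a random integer in [0, n), exactly uniform.  The
     state (m, r) persists between calls so that leftover
     randomness is reused; r is always uniform on [0, m). */
  int rand_mod(int n)
  {
      static long m = 1, r = 0;

      for (;;) {
          /* Step 1: absorb bits until the modulus m reaches N. */
          while (m < N) {
              r = 2 * r + coin_flip();
              m = 2 * m;
          }
          /* Step 2: divide m by n. */
          long q = m / n;
          if (r >= n * q) {
              /* Step 3: failure; keep the excess as a random
                 variable of modulus m - n*q and start over. */
              r -= n * q;
              m -= n * q;
          } else {
              /* Step 4: success; return r mod n and keep the
                 quotient as randomness for the next call. */
              int x = (int)(r % n);
              r /= n;
              m = q;
              return x;
          }
      }
  }

A call such as rand_mod(10) then produces uniform decimal digits; as
noted further down, the same state can even serve calls with
different values of n.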

For example, assume that we want to generate decimal digits (n = 10) 
and we let N = 256.  A typical run would be as follows:

* We start with m = 1, r = 0, and get 8 bits. Assume they are
  1, 1, 1, 1, 1, 1, 0, 1.
  At the end of step 1, we have m = 256, r = 253.

* We divide 256 by 10, giving q = 25, n*q = 250.

* As 253 >= 250, we are out of luck.  The excess, r - n*q = 253 - 250
  = 3, is a random variable of modulus 256 - 250 = 6. We let
  therefore m = 6 and r = 3 (note that we have saved more than two
  bits, instead of throwing them away).

* We start again at step 1.  This time, we need to generate 6 bits 
  in order to have m >= 256.  Assume the process runs as follows 
  (starting with m = 6, r = 3):

   NextBit() |  m  |  r
  -----------+-----+----
      0      |  12 |   6
      1      |  24 |  13
      1      |  48 |  27
      0      |  96 |  54
      0      | 192 | 108
      1      | 384 | 217

* Step 2 gives q = 38, n*q = 380

* As 217 < 380, we will return (217 mod 10) = 7.  As 217 lies in
  decade number 21 (counting from 0) of the 38 decades, we let
  r = 21 and m = 38 to be used for the next call.  This means that
  we already have more than 5 bits in store.

We have used 14 bits to generate the first number, but this is merely 
the "startup cost".  As m = 38, the next call will need only 3 new 
bits in step 1.  In the long run, the average number of bits per 
number will be close to the theoretical value log(10)/log(2) = 3.32. 
The probability of failure at step 3 is at most 9/m <= 9/256.

A nice feature of this algorithm is that n is used only inside the 
routine.  You can use a different n for each call, without re-
initializing the variables m and r.

Does this help?  Write back if you'd like to talk about this some 
more, or if you have any other questions.

- Doctor Jacques, The Math Forum
  http://mathforum.org/dr.math/ 


Date: 03/10/2004 at 10:43:15
From: GVK
Subject: Probability and Random Numbers

Hello Dr. Jacques,

Thanks a lot for your reply.  I have just one question.  In your 
reply, you wrote:

   since 3 never divides any power of 2 (another way to see this is to
   notice that the binary expansion of 1/3 is a repeating infinite
   string of bits).

Are the two equivalent?  5 never divides any power of 2 either; 
however, 1/5 is not an infinite sequence of bits.  Is it possible to 
get 5 events of 1/5 probability using a coin?


Date: 03/12/2004 at 02:04:45
From: Doctor Jacques
Subject: Re: Probability and Random Numbers

Hi again GVK,

The two are equivalent.  In base N, the N-ary expansion of 1/k is 
infinite (and repeating) unless all prime factors of k are also prime 
factors of N, which amounts to saying that k divides some power of N. 
If N = 2, the only allowed prime factor is 2, so any fraction whose 
denominator (in lowest terms) is not a power of 2 will have an 
infinite binary expansion.

For your specific example, the binary expansion of 1/5 is:

  1/5 = 0.00110011... = 3/16 + 3/16^2 + 3/16^3 + ...
      = (3/16)/(1 - 1/16) = 3/15

and this is indeed an infinite repeating binary expansion.

It is indeed quite possible to generate events with probability 1/5 
by tossing coins--you can use the algorithm I gave you with n = 5 
and N >= 8.

By the way, note that the algorithm also works in the opposite 
direction; you can use it to convert "base n" events to "base m" 
events for any different m and n.  For example, you can use it to 
generate random binary strings using six-faced dice; you essentially 
only have to modify step 1 and the choice of N.
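
Concretely (my sketch of that modification, not spelled out above),
with a fair six-faced die as the source, step 1 would become

  1'. WHILE m < N DO r := 6*r + (DieRoll() - 1); m := 6*m

where DieRoll() is a hypothetical function returning a uniform value 
from 1 to 6; running the algorithm with n = 2 then turns die rolls 
into unbiased random bits.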

- Doctor Jacques, The Math Forum
  http://mathforum.org/dr.math/ 