

Topic: Laplace's rule of succession
Replies: 1   Last Post: Feb 4, 2007 2:50 PM

 Michael Olea Posts: 57 Registered: 12/13/04
Re: Laplace's rule of succession
Posted: Feb 4, 2007 2:50 PM

Scott wrote:

> In Bayesian statistics, Laplace's rule of succession attempts to solve
> the problem of how we can predict that the sun will rise tomorrow,
> given its past frequency of rising.
>
> Definitions:
>
> 1. Let p be the long-run frequency, as observed.
> 2. Let n be the total number of trials.
> 3. Let s be the number of *successes* among these trials, so that n -
> s is the number of failures.
>
> The rule of succession states that the probability of the next success
> is given by the *expected value of a normalized likelihood function*.
> The likelihood function is
>
> p^s * (1 - p)^(n - s).
>
> Normalized with the integral S_{0 to 1}(p^s * (1 - p)^(n - s)) dp, one
> obtains as the expected value
>
> (s + 1)/(n + 2)
>
> for the probability of the next success. Thus, if all we know is that
> the sun has risen 2000 times, the probability of its rising again is
> 2001/2002.
>
> Now, I have a question. What's so special about this likelihood
> function? It seems to be formulated completely ad hoc.

It follows from the assumption that sunrise events are independent and
identically distributed. Given that assumption, there is some probability p that
the sun will rise, and the question becomes one of estimating p from the
data: s successes in n trials. There are two more assumptions: 1) the prior
probability over p is the uniform distribution over [0, 1], 2) the loss
function (or cost of errors) is quadratic.

In other words, we are trying to estimate the parameter p. We start out with
the assumption, prior to collecting any data, that all values of p in [0,
1] are equally likely.

The second issue, once you have computed the posterior distribution over p,
is how to pick a "best" estimate for p. What is "best" will depend on a
loss function. One choice leads to the MAP (maximum a posteriori
probability) estimate, which is a value of p at which the pdf has a maximum. Under
the given assumptions above the pdf is unimodal, and achieves a maximum for
p = s/n:

solve for p the expression d/dp [p^s * (1-p)^(n-s)] = 0
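
For completeness, a sketch of that maximization. It takes logs first (which
does not move the location of the maximum) and assumes 0 < s < n, so that the
maximum lies in the interior of [0, 1]:

```latex
\begin{aligned}
\frac{d}{dp}\left[ s \ln p + (n-s)\ln(1-p) \right]
  &= \frac{s}{p} - \frac{n-s}{1-p} = 0 \\
s\,(1-p) &= (n-s)\,p \\
p &= \frac{s}{n}
\end{aligned}
```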

Another choice of loss function, the quadratic loss assumed in the "law of
succession", leads to the mean of the posterior pdf.
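
The two estimates can be compared with a minimal Python sketch. The function
names here are made up for illustration, and the closed form used for the
integral of p^a * (1-p)^b over [0, 1], namely a! b! / (a + b + 1)!, is the
standard Beta integral for non-negative integers a and b.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of p^a * (1-p)^b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def posterior_mean(s: int, n: int) -> Fraction:
    """Mean of the posterior over p under a uniform prior (quadratic loss)."""
    # E[p | data] = int p * p^s (1-p)^(n-s) dp / int p^s (1-p)^(n-s) dp
    return beta_integral(s + 1, n - s) / beta_integral(s, n - s)


def posterior_mode(s: int, n: int) -> Fraction:
    """MAP estimate under a uniform prior: the maximizer of p^s (1-p)^(n-s)."""
    return Fraction(s, n)


if __name__ == "__main__":
    print(posterior_mean(2000, 2000))  # 2001/2002
    print(posterior_mode(2000, 2000))  # 1
```

With s = n = 2000 the MAP estimate is p = 1, which rules out the sun ever
failing to rise, while the posterior mean (s + 1)/(n + 2) leaves a small
probability for it.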

> ... If the sample
> space were all possible successions, the probability of the next
> success would simply be 1/2. ...

Only if all sequences are equally likely. It is possible for the sample
space to include all possible successions, but assign them different a
priori probabilities. If all sequences were a priori equally likely (or in
other words there are no constraints over the hypothesis space) then
generalization, or induction, would be impossible. There would be no mutual
information between future and past, and no basis for predicting the future
of a process from its past.
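
A small Python sketch of the contrast, with hypothetical function names. Both
priors are defined over the same sample space of all 0/1 sequences; only the
probabilities assigned to the sequences differ. The second prior is the one
implied by i.i.d. trials with p uniform on [0, 1].

```python
from fractions import Fraction
from itertools import product
from math import factorial


def uniform_over_sequences(seq) -> Fraction:
    """Every sequence of length n gets probability 2^-n."""
    return Fraction(1, 2 ** len(seq))


def uniform_over_p(seq) -> Fraction:
    """Probability of a sequence when p is uniform on [0, 1] and trials are
    i.i.d.: the integral of p^s (1-p)^(n-s) dp = s! (n-s)! / (n+1)!."""
    n, s = len(seq), sum(seq)
    return Fraction(factorial(s) * factorial(n - s), factorial(n + 1))


def predict_success(prior, past) -> Fraction:
    """P(next = 1 | past) = P(past followed by 1) / P(past)."""
    past = tuple(past)
    return prior(past + (1,)) / prior(past)


if __name__ == "__main__":
    # Sanity check: each prior sums to 1 over all sequences of length 4.
    for prior in (uniform_over_sequences, uniform_over_p):
        assert sum(prior(seq) for seq in product((0, 1), repeat=4)) == 1

    ten_sunrises = (1,) * 10
    print(predict_success(uniform_over_sequences, ten_sunrises))  # 1/2
    print(predict_success(uniform_over_p, ten_sunrises))          # 11/12
```

Under the first prior the prediction stays at 1/2 no matter what has been
observed; under the second, the past carries information about the future.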

> ... So what gives?

Would you take this bet: every day the sun does not rise I will give you
$1000 (just assume there is some way to collect it), every day it does rise
you give me $1000?

If not, you are probably making the working assumption that there is mutual
information between future and past.

> The figure p^s * (1 - p)^(n - s) is the probability that there will be
> s successes, with *fixed probability p* for each success, a
> probability independent of the trial number. ...

Not quite. The probability is, in this model, fixed, and independent of the
trial number, but under those conditions the probability of s successes in
n trials is n!/[s!(n-s)!] * p^s * (1 - p)^(n - s). For given values of n
and s then n!/[s!(n-s)!] is a constant, and it cancels out in the
normalization above.
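
Written out, with q as a dummy integration variable (my notation, not
Scott's), the cancellation looks like this:

```latex
\frac{\binom{n}{s}\, p^{s}(1-p)^{n-s}}
     {\int_0^1 \binom{n}{s}\, q^{s}(1-q)^{n-s}\, dq}
= \frac{p^{s}(1-p)^{n-s}}
       {\int_0^1 q^{s}(1-q)^{n-s}\, dq}
= \frac{(n+1)!}{s!\,(n-s)!}\; p^{s}(1-p)^{n-s}
```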

> ... But how can we impose
> this property on a sequence? How do we know that there are fixed
> probabilities of success and failure on each trial?

It's an assumption - an idealized model that is justified on a case by case
basis. With both the sun and the earth there are irreversible evolutionary
processes at work, so no stationary model can be "correct". But the
evolutionary processes are slow enough compared to human time scales that
stationarity is a reasonable assumption. More interesting is to model
processes like the probability of solar flares greater than some magnitude
occurring in some time window.

> Is Laplace's rule even accepted nowadays?

This amounts to asking if there are processes usefully modeled as independent,
identically distributed, stationary stochastic processes with uniform
priors over the parameter p and quadratic loss functions.

> I would like to understand more of the philosophical theory behind the
> choice and justification of the likelihood function. Thank you for

Should you believe that this coin is fair? W Bialek, q-bio.NC/0508044.
http://www.princeton.edu/~wbialek/our_papers/bialek_05.pdf

Predictability, complexity and learning. W Bialek, I Nemenman & N Tishby,
Neural Comp 13, 2409-2463 (2001).

http://www.cs.ubc.ca/~murphyk/Bayes/bayes.html
http://bayes.cs.ucla.edu/BOOK-2K/index.html
http://bayes.wustl.edu/etj/prob/book.pdf

-- Michael
