Date: Feb 4, 2007 2:50 PM
Author: Michael Olea
Subject: Re: Laplace's rule of succession

Scott wrote:

> In Bayesian statistics, Laplace's rule of succession attempts to solve

> the problem of how we can predict that the sun will rise tomorrow,

> given its past frequency of rising.

>

> Definitions:

>

> 1. Let p be the long-run frequency, as observed.

> 2. Let n be the total number of trials.

> 3. Let s be the number of *successes* among these trials, so that n -

> s is the number of failures.

>

> The rule of succession states that the probability of the next success

> is given by the *expected value of a normalized likelihood function*.

> The likelihood function is

>

> p^s * (1 - p)^(n - s).

>

> Normalized with the integral S_{0 to 1}(p^s * (1 - p)^(n - s)) dp, one

> obtains as the expected value

>

> (s + 1)/(n + 2)

>

> for the probability of the next success. Thus, if all we know is that

> the sun has risen 2000 times, the probability of its rising again is

> 2001/2002.

>

> Now, I have a question. What's so special about this likelihood

> function? It seems to be formulated completely ad hoc.

It follows from the assumption that sunrise events are independently and

identically distributed. Given that assumption, there is some probability p that

the sun will rise, and the question becomes one of estimating p from the

data: s successes in n trials. There are two more assumptions: 1) the prior

probability over p is the uniform distribution over [0, 1], 2) the loss

function (or cost of errors) is quadratic.

In other words, we are trying to estimate the parameter p. We start out with

the assumption, prior to collecting any data, that all values of p in [0,

1] are equally likely.

The second issue, once you have computed the posterior distribution over p,

is how to pick a "best" estimate for p. What is "best" will depend on a

loss function. One choice leads to the MAP (maximum a posteriori

probability), which is a value of p at which the pdf has a maximum. Under

the given assumptions above the pdf is unimodal, and achieves a maximum for

p = s/n:

solve for p the expression d/dp [p^s * (1 - p)^(n - s)] = 0

Another choice of loss function, the one in the "law of succession", leads

to the mean of the posterior pdf.
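
To make the two estimates concrete, here is a minimal Python sketch, assuming the uniform prior described above. The function names (beta_integral, posterior_mean, posterior_mode) are made up for this illustration. It uses the standard closed form for the integral of p^a * (1 - p)^b over [0, 1] with non-negative integers a and b, namely a! b! / (a + b + 1)!, so the arithmetic is exact.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of p^a * (1 - p)^b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def posterior_mean(s: int, n: int) -> Fraction:
    """Mean of the posterior over p under a uniform prior (quadratic loss)."""
    # E[p | data] = integral of p * p^s (1-p)^(n-s) dp / integral of p^s (1-p)^(n-s) dp
    return beta_integral(s + 1, n - s) / beta_integral(s, n - s)


def posterior_mode(s: int, n: int) -> Fraction:
    """MAP estimate under a uniform prior; requires n > 0."""
    return Fraction(s, n)


print(posterior_mean(2000, 2000))  # 2001/2002
print(posterior_mode(2000, 2000))  # 1
```

With 2000 sunrises in 2000 trials the MAP estimate is exactly 1, while the posterior mean stays slightly below 1, which is the (s + 1)/(n + 2) figure quoted above.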

> ... If the sample

> space were all possible successions, the probability of the next

> success would simply be 1/2. ...

Only if all sequences are equally likely. It is possible for the sample

space to include all possible successions, but assign them different a

priori probabilities. If all sequences were a priori equally likely (or in

other words there are no constraints over the hypothesis space) then

generalization, or induction, would be impossible. There would be no mutual

information between future and past, and no basis for predicting the future

of a process from its past.
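
A small Python sketch of this point, under stated assumptions: the two prior functions below (uniform_over_sequences and laplace_mixture) are names invented for this example, and sequences are tuples of 0s and 1s. Both priors give every sequence nonzero probability, but only the second lets the past say anything about the future.

```python
from fractions import Fraction
from math import factorial


def uniform_over_sequences(seq):
    """Hypothetical prior: every 0/1 sequence of a given length is equally likely."""
    return Fraction(1, 2 ** len(seq))


def laplace_mixture(seq):
    """Prior implied by i.i.d. trials with p uniform on [0, 1].

    P(seq) = integral of p^s (1-p)^(n-s) dp = s! (n-s)! / (n+1)!
    """
    n, s = len(seq), sum(seq)
    return Fraction(factorial(s) * factorial(n - s), factorial(n + 1))


def prob_next_success(past, prior):
    """P(next = 1 | past), computed from the prior over sequences one step longer."""
    joint_success = prior(past + (1,))
    joint_failure = prior(past + (0,))
    return joint_success / (joint_success + joint_failure)


past = (1,) * 10  # ten successes in a row
print(prob_next_success(past, uniform_over_sequences))  # 1/2
print(prob_next_success(past, laplace_mixture))         # 11/12
```

Under the equally-likely prior the answer is 1/2 no matter what past is supplied; under the mixture prior it is (s + 1)/(n + 2).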

> ... So what gives?

Would you take this bet: every day the sun does not rise I will give you

$1000 (just assume there is some way to collect it), every day it does rise

you give me $1000?

If not, you are probably making the working assumption that there is mutual

information between future and past.

> The figure p^s * (1 - p)^(n - s) is the probability that there will be

> s successes, with *fixed probability p* for each success, a

> probability independent of the trial number. ...

Not quite. The probability is, in this model, fixed, and independent of the

trial number, but under those conditions the probability of s successes in

n trials is n!/[s!(n-s)!] * p^s * (1 - p)^(n - s). For given values of n

and s, n!/[s!(n-s)!] is a constant, and it cancels out in the

normalization above.
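
The cancellation can be checked numerically. The sketch below is illustrative only: the helper posterior_mean and the choice n = 10, s = 7 are assumptions made for the example, and the integrals are approximated with a simple midpoint rule. Here p^s * (1 - p)^(n - s) is the probability of one particular ordering of s successes and n - s failures, and multiplying by n!/[s!(n-s)!] counts all such orderings.

```python
from math import comb


def posterior_mean(likelihood, steps=10_000):
    """Midpoint-rule estimate of (integral of p*L(p) dp) / (integral of L(p) dp) on [0, 1]."""
    h = 1.0 / steps
    grid = [(i + 0.5) * h for i in range(steps)]
    numerator = sum(p * likelihood(p) for p in grid)
    denominator = sum(likelihood(p) for p in grid)
    return numerator / denominator  # the step width h cancels in the ratio


n, s = 10, 7  # example values, chosen arbitrarily


def sequence_likelihood(p):
    """Probability of one particular sequence with s successes in n trials."""
    return p**s * (1 - p) ** (n - s)


def binomial_likelihood(p):
    """Probability of s successes in n trials, in any order."""
    return comb(n, s) * sequence_likelihood(p)


print(posterior_mean(sequence_likelihood))
print(posterior_mean(binomial_likelihood))
print((s + 1) / (n + 2))
```

All three printed values should agree to within numerical error, since the constant factor appears in both the numerator and the denominator.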

> ... But how can we impose

> this property on a sequence? How do we know that there are fixed

> probabilities of success and failure on each trial?

It's an assumption - an idealized model that is justified on a case by case

basis. With both the sun and the earth there are irreversible evolutionary

processes at work, so no stationary model can be "correct". But the

evolutionary processes are slow enough compared to human time scales that

stationarity is a reasonable assumption. More interesting is to model

processes like the probability of solar flares greater than some magnitude

occurring in some time window.

> Is Laplace's rule even accepted nowadays?

This amounts to asking if there are processes usefully modeled as independent,

identically distributed, stationary stochastic processes with uniform

priors over the parameter p and quadratic loss functions.

> I would like to understand more of the philosophical theory behind the

> choice and justification of the likelihood function. Thank you for

> your help.

Should you believe that this coin is fair? W Bialek, q-bio.NC/0508044.

http://www.princeton.edu/~wbialek/our_papers/bialek_05.pdf

Predictability, complexity and learning. W Bialek, I Nemenman & N Tishby,

Neural Comp 13, 2409-2463 (2001).

http://www.princeton.edu/~wbialek/learning_links.html

http://www.cs.ubc.ca/~murphyk/Bayes/bayes.html

http://bayes.cs.ucla.edu/BOOK-2K/index.html

http://bayes.wustl.edu/etj/prob/book.pdf

-- Michael