Views expressed in these public forums are not endorsed by
Drexel University or The Math Forum.



Re: Laplace's rule of succession
Posted: Feb 4, 2007 2:50 PM


Scott wrote:
> In Bayesian statistics, Laplace's rule of succession attempts to solve
> the problem of how we can predict that the sun will rise tomorrow,
> given its past frequency of rising.
>
> Definitions:
>
> 1. Let p be the long-run frequency, as observed.
> 2. Let n be the total number of trials.
> 3. Let s be the number of *successes* among these trials, so that
> n - s is the number of failures.
>
> The rule of succession states that the probability of the next success
> is given by the *expected value of a normalized likelihood function*.
> The likelihood function is
>
> p^s * (1 - p)^(n - s).
>
> Normalized with the integral S_{0 to 1}(p^s * (1 - p)^(n - s)) dp, one
> obtains as the expected value
>
> (s + 1)/(n + 2)
>
> for the probability of the next success. Thus, if all we know is that
> the sun has risen 2000 times, the probability of its rising again is
> 2001/2002.
>
> Now, I have a question. What's so special about this likelihood
> function? It seems to be formulated completely ad hoc.
It is the assumption that sunrise events are independent and identically distributed. Given that assumption, there is some fixed probability p that the sun will rise on any given day, and the question becomes one of estimating p from the data: s successes in n trials. Two more assumptions are needed: 1) the prior distribution over p is uniform on [0, 1], and 2) the loss function (the cost of errors) is quadratic.
In other words, we are trying to estimate the parameter p. We start out with the assumption, prior to collecting any data, that all values of p in [0, 1] are equally likely.
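A minimal numerical sketch of that estimate, assuming the i.i.d. model and the uniform prior just described: the posterior is then the likelihood p^s * (1 - p)^(n - s) normalized over [0, 1], and its mean should reproduce Scott's (s + 1)/(n + 2). The function name `posterior_mean` and the grid size are illustrative choices, and the integrals are done with a plain midpoint rule rather than the exact Beta-function result.

```python
import math


def posterior_mean(s, n, grid=100_000):
    """Mean of the normalized likelihood p^s * (1 - p)^(n - s) over p in [0, 1].

    Assumes a uniform prior, so the posterior is proportional to the likelihood.
    Both integrals use the midpoint rule; the common step size cancels in the
    ratio, so plain sums suffice.
    """
    ps = [(i + 0.5) / grid for i in range(grid)]
    # Work in log space and subtract the maximum: p**s underflows for large s.
    logs = [s * math.log(p) + (n - s) * math.log1p(-p) for p in ps]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    return sum(p * w for p, w in zip(ps, weights)) / sum(weights)


for s, n in [(2000, 2000), (7, 10), (0, 5)]:
    # numerical posterior mean next to the closed form (s + 1)/(n + 2)
    print(s, n, posterior_mean(s, n), (s + 1) / (n + 2))
```

For s = n = 2000 the numerical mean should agree with 2001/2002 to many decimal places.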
The second issue, once you have computed the posterior distribution over p, is how to pick a "best" estimate of p. What is "best" depends on a loss function. One choice leads to the MAP (maximum a posteriori) estimate, the value of p at which the posterior pdf has its maximum. Under the assumptions above the pdf is unimodal and achieves its maximum at p = s/n:
solve for p the expression d/dp [p^s * (1 - p)^(n - s)] = 0
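Carrying out that differentiation explicitly, for 0 < s < n:

```latex
\begin{aligned}
\frac{d}{dp}\bigl[p^{s}(1-p)^{n-s}\bigr]
  &= s\,p^{s-1}(1-p)^{n-s} - (n-s)\,p^{s}(1-p)^{n-s-1} \\
  &= p^{s-1}(1-p)^{n-s-1}\bigl[s(1-p) - (n-s)\,p\bigr] \\
  &= p^{s-1}(1-p)^{n-s-1}\,(s - np) = 0
  \quad\Longrightarrow\quad p = \frac{s}{n}.
\end{aligned}
```

The prefactor is positive on (0, 1), so the derivative changes sign only at p = s/n, going from positive to negative, which makes it the maximum. For s = 0 or s = n the likelihood is monotone on [0, 1] and the maximum sits at the endpoint p = 0 or p = 1, which is again s/n.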
Another choice of loss function, the quadratic loss assumed in the "rule of succession", leads to the mean of the posterior pdf, which is (s + 1)/(n + 2).
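To see the two loss functions pull apart, here is a rough sketch, again assuming a uniform prior, that discretizes the posterior, picks the candidate estimate with the smallest posterior expected squared error, and also reads off the posterior mode. The helper names, grid size and candidate step are hypothetical illustrations. With s = 7 and n = 10 the quadratic-loss estimate should land next to the posterior mean 8/12 (about 0.667), while the MAP estimate should sit near s/n = 0.7.

```python
import math


def posterior_grid(s, n, grid=10_000):
    """Posterior over p for a uniform prior, discretized at grid midpoints.

    The unnormalized posterior is the likelihood p^s * (1 - p)^(n - s);
    the returned weights are normalized to sum to 1.
    """
    ps = [(i + 0.5) / grid for i in range(grid)]
    logs = [s * math.log(p) + (n - s) * math.log1p(-p) for p in ps]
    top = max(logs)  # subtract the maximum before exp() to avoid underflow
    w = [math.exp(v - top) for v in logs]
    z = sum(w)
    return ps, [x / z for x in w]


def compare_estimates(s, n, step=0.001):
    """Return (quadratic-loss minimizer, posterior mean, MAP) computed on a grid."""
    ps, w = posterior_grid(s, n)
    m1 = sum(p * wi for p, wi in zip(ps, w))      # E[p]
    m2 = sum(p * p * wi for p, wi in zip(ps, w))  # E[p^2]

    def quad_loss(a):
        # Posterior expected squared error: E[(p - a)^2] = E[p^2] - 2a E[p] + a^2
        return m2 - 2 * a * m1 + a * a

    candidates = [k * step for k in range(round(1 / step) + 1)]
    best_quadratic = min(candidates, key=quad_loss)
    map_estimate = max(zip(w, ps))[1]  # grid point with the largest posterior weight
    return best_quadratic, m1, map_estimate


best, mean, mode = compare_estimates(s=7, n=10)
print(f"quadratic loss: {best:.3f}  posterior mean: {mean:.4f}  MAP: {mode:.4f}")
```

The expanded form of the loss shows why the first two numbers agree: E[(p - a)^2] is a parabola in a with its minimum at a = E[p], the posterior mean.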
> ... If the sample
> space were all possible successions, the probability of the next
> success would simply be 1/2. ...
Only if all sequences are equally likely. The sample space can include all possible successions while assigning them different a priori probabilities. If all sequences were a priori equally likely (in other words, if there were no constraints on the hypothesis space), then generalization, or induction, would be impossible: there would be no mutual information between future and past, and no basis for predicting the future of a process from its past.
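A small exact check of this point, under two hypothetical priors over binary sequences (the function names are illustrative): one that makes every sequence of a given length equally likely, and the i.i.d. model with a uniform prior on p, under which a particular sequence with k successes in m trials has probability k!(m - k)!/(m + 1)!. After ten sunrises in a row the first still predicts 1/2, while the second predicts (s + 1)/(n + 2) = 11/12.

```python
from fractions import Fraction
from math import factorial


def uniform_over_sequences(seq):
    """Every binary sequence of a given length is equally likely a priori."""
    return Fraction(1, 2 ** len(seq))


def iid_with_uniform_p(seq):
    """i.i.d. trials with unknown p and a uniform prior on p.

    P(seq) = integral_0^1 p^k (1 - p)^(m - k) dp = k! (m - k)! / (m + 1)!,
    where m = len(seq) and k = number of successes.
    """
    m, k = len(seq), sum(seq)
    return Fraction(factorial(k) * factorial(m - k), factorial(m + 1))


def prob_next_success(observed, prior):
    """P(next trial succeeds | observed) = P(observed + [1]) / P(observed).

    Both priors are consistent across lengths, so P(observed) equals
    P(observed + [0]) + P(observed + [1]).
    """
    one = prior(list(observed) + [1])
    zero = prior(list(observed) + [0])
    return one / (one + zero)


sunrises = [1] * 10  # ten successes, no failures
print(prob_next_success(sunrises, uniform_over_sequences))  # 1/2
print(prob_next_success(sunrises, iid_with_uniform_p))      # 11/12
```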
> ... So what gives?
Would you take this bet: every day the sun does not rise I will give you $1000 (just assume there is some way to collect it), and every day it does rise you give me $1000?
If not, you are probably making the working assumption that there is mutual information between future and past.

> The figure p^s * (1 - p)^(n - s) is the probability that there will be
> s successes, with *fixed probability p* for each success, a
> probability independent of the trial number. ...
Not quite. In this model the probability is fixed and independent of the trial number, but p^s * (1 - p)^(n - s) is the probability of one particular sequence containing s successes. The probability of s successes in n trials, in any order, is n!/[s!(n - s)!] * p^s * (1 - p)^(n - s). For given values of n and s, n!/[s!(n - s)!] is a constant, so it cancels out in the normalization above.
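A quick numerical check of the cancellation, under the same uniform-prior assumption (the function name and the values s = 14, n = 20 are arbitrary illustrations): the posterior mean comes out the same whether or not the likelihood is multiplied by the binomial coefficient, and both match (s + 1)/(n + 2) = 15/22.

```python
import math


def mean_with_optional_binomial(s, n, include_binomial, grid=20_000):
    """Posterior mean of p under a uniform prior, with or without the n-choose-s factor.

    The binomial coefficient does not depend on p, so it cancels in the ratio.
    """
    c = math.comb(n, s) if include_binomial else 1  # math.comb needs Python 3.8+
    ps = [(i + 0.5) / grid for i in range(grid)]    # midpoint-rule grid on [0, 1]
    lik = [c * p**s * (1 - p) ** (n - s) for p in ps]
    return sum(p * l for p, l in zip(ps, lik)) / sum(lik)


with_c = mean_with_optional_binomial(14, 20, include_binomial=True)
without_c = mean_with_optional_binomial(14, 20, include_binomial=False)
print(with_c, without_c, 15 / 22)
```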
> ... But how can we impose
> this property on a sequence? How do we know that there are fixed
> probabilities of success and failure on each trial?
It's an assumption: an idealized model whose use has to be justified case by case. With both the sun and the earth there are irreversible evolutionary processes at work, so no stationary model can be strictly "correct". But those processes are slow enough compared to human time scales that stationarity is a reasonable assumption. More interesting is modeling something like the probability of a solar flare greater than some magnitude occurring in a given time window.

> Is Laplace's rule even accepted nowadays?
This amounts to asking whether there are processes usefully modeled as independent, identically distributed, stationary stochastic processes with a uniform prior over the parameter p and a quadratic loss function.
> I would like to understand more of the philosophical theory behind the
> choice and justification of the likelihood function. Thank you for
> your help.
Should you believe that this coin is fair? W. Bialek, q-bio.NC/0508044. http://www.princeton.edu/~wbialek/our_papers/bialek_05.pdf
Predictability, complexity and learning. W. Bialek, I. Nemenman & N. Tishby, Neural Comp. 13, 2409-2463 (2001). http://www.princeton.edu/~wbialek/learning_links.html
http://www.cs.ubc.ca/~murphyk/Bayes/bayes.html
http://bayes.cs.ucla.edu/BOOK2K/index.html
http://bayes.wustl.edu/etj/prob/book.pdf
Michael



