```
Date: Feb 4, 2007 2:50 PM
Author: Michael Olea
Subject: Re: Laplace's rule of succession

Scott wrote:
> In Bayesian statistics, Laplace's rule of succession attempts to solve
> the problem of how we can predict that the sun will rise tomorrow,
> given its past frequency of rising.
>
> Definitions:
>
> 1. Let p be the long-run frequency, as observed.
> 2. Let n be the total number of trials.
> 3. Let s be the number of *successes* among these trials, so that n - s
> is the number of failures.
>
> The rule of succession states that the probability of the next success
> is given by the *expected value of a normalized likelihood function*.
> The likelihood function is
>
> p^s * (1 - p)^(n - s).
>
> Normalized with the integral S_{0 to 1}(p^s * (1 - p)^(n - s)) dp, one
> obtains as the expected value
>
> (s + 1)/(n + 2)
>
> for the probability of the next success. Thus, if all we know is that
> the sun has risen 2000 times, the probability of its rising again is
> 2001/2002.
>
> Now, I have a question. What's so special about this likelihood
> function? It seems to be formulated completely ad hoc.

It encodes the assumption that sunrise events are independent and
identically distributed. Given that assumption, there is some fixed
probability p that the sun will rise, and the question becomes one of
estimating p from the data: s successes in n trials. There are two more
assumptions: 1) the prior probability over p is the uniform distribution
over [0, 1], and 2) the loss function (or cost of errors) is quadratic.

In other words, we are trying to estimate the parameter p. We start out
with the assumption, prior to collecting any data, that all values of p
in [0, 1] are equally likely.

The second issue, once you have computed the posterior distribution over
p, is how to pick a "best" estimate for p. What is "best" depends on a
loss function. One choice leads to the MAP (maximum a posteriori
probability) estimate, which is a value of p at which the posterior pdf
has a maximum.
Under the assumptions given above, the posterior pdf is unimodal and
achieves its maximum at p = s/n. To see this, solve for p in

    d/dp [p^s * (1-p)^(n-s)] = 0.

Another choice of loss function, the quadratic one behind the "rule of
succession", leads to the mean of the posterior pdf, (s + 1)/(n + 2).

> ... If the sample
> space were all possible successions, the probability of the next
> success would simply be 1/2. ...

Only if all sequences are equally likely. It is possible for the sample
space to include all possible successions but assign them different a
priori probabilities. If all sequences were a priori equally likely (in
other words, if there were no constraints over the hypothesis space),
then generalization, or induction, would be impossible. There would be no
mutual information between future and past, and no basis for predicting
the future of a process from its past.

> ... So what gives?

Would you take this bet: every day the sun does not rise I will give you
$1000 (just assume there is some way to collect it), and every day it
does rise you give me $1000?

If not, you are probably making the working assumption that there is
mutual information between future and past.

> The figure p^s * (1 - p)^(n - s) is the probability that there will be
> s successes, with *fixed probability p* for each success, a
> probability independent of the trial number. ...

Not quite. p^s * (1 - p)^(n - s) is the probability of one particular
sequence containing s successes. The probability p is, in this model,
fixed and independent of the trial number, but under those conditions the
probability of s successes in n trials (in any order) is
n!/[s!(n-s)!] * p^s * (1 - p)^(n - s). For given values of n and s,
n!/[s!(n-s)!] is a constant, and it cancels out in the normalization
above.

> ... But how can we impose
> this property on a sequence? How do we know that there are fixed
> probabilities of success and failure on each trial?

It's an assumption: an idealized model that is justified on a
case-by-case basis. With both the sun and the earth there are
irreversible evolutionary processes at work, so no stationary model can
be "correct".
But the evolutionary processes are slow enough compared to human time
scales that stationarity is a reasonable assumption. More interesting is
to model processes like the probability of solar flares greater than
some magnitude occurring in some time window.

> Is Laplace's rule even accepted nowadays?

This amounts to asking whether there are processes usefully modeled as
independent, identically distributed, stationary stochastic processes
with uniform priors over the parameter p and quadratic loss functions.

> I would like to understand more of the philosophical theory behind the
> choice and justification of the likelihood function. Thank you for
> your help.

Should you believe that this coin is fair? W Bialek, q-bio.NC/0508044.
http://www.princeton.edu/~wbialek/our_papers/bialek_05.pdf

Predictability, complexity and learning. W Bialek, I Nemenman & N Tishby,
Neural Comp 13, 2409-2463 (2001).
http://www.princeton.edu/~wbialek/learning_links.html

http://www.cs.ubc.ca/~murphyk/Bayes/bayes.html
http://bayes.cs.ucla.edu/BOOK-2K/index.html
http://bayes.wustl.edu/etj/prob/book.pdf

-- 
Michael
```
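The two estimates in the thread, the MAP value s/n and the rule-of-succession value (s + 1)/(n + 2), can both be worked out directly. Each follows from the likelihood p^s (1 - p)^(n - s) combined with a uniform prior on p. This is a worked sketch under two conditions:

- The derivative step assumes 0 < s < n. For s = 0 or s = n the maximum sits at the boundary, p = 0 or p = 1, which still equals s/n.
- The mean uses the standard Beta-integral identity: the integral of p^a (1 - p)^b over [0, 1] equals a! b! / (a + b + 1)! for nonnegative integers a, b.

```latex
\begin{aligned}
\frac{d}{dp}\Bigl[p^{s}(1-p)^{n-s}\Bigr]
  &= p^{s-1}(1-p)^{n-s-1}\bigl[s(1-p) - (n-s)p\bigr]
   = p^{s-1}(1-p)^{n-s-1}\,(s - np) = 0
  \;\Longrightarrow\; p_{\mathrm{MAP}} = \frac{s}{n}, \\[6pt]
\mathbb{E}[p \mid s, n]
  &= \frac{\int_0^1 p^{s+1}(1-p)^{n-s}\,dp}{\int_0^1 p^{s}(1-p)^{n-s}\,dp}
   = \frac{(s+1)!\,(n-s)!\,/\,(n+2)!}{s!\,(n-s)!\,/\,(n+1)!}
   = \frac{s+1}{n+2}.
\end{aligned}
```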
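Several claims in the reply can be checked with exact rational arithmetic:

- the posterior mean equals (s + 1)/(n + 2);
- the MAP estimate equals s/n;
- the binomial coefficient n!/[s!(n-s)!] cancels in the normalization;
- quadratic loss favors the posterior mean over the MAP value.

This is a minimal Python sketch, assuming integer counts 0 <= s <= n and the uniform prior over p described above. The function names (`beta_integral`, `posterior_mean`, `map_estimate`, `expected_squared_loss`) are hypothetical and do not come from any library.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of p^a * (1 - p)^b over [0, 1] for integers a, b >= 0.

    Uses the identity: integral = a! * b! / (a + b + 1)!.
    """
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def posterior_mean(s: int, n: int, with_binomial: bool = False) -> Fraction:
    """Posterior mean of p after s successes in n trials, uniform prior on p.

    With with_binomial=True the likelihood carries the factor C(n, s); it
    multiplies numerator and normalizer alike, so the result is unchanged.
    """
    c = comb(n, s) if with_binomial else 1
    numerator = c * beta_integral(s + 1, n - s)  # integral of p * likelihood
    normalizer = c * beta_integral(s, n - s)     # integral of likelihood
    return numerator / normalizer


def map_estimate(s: int, n: int) -> Fraction:
    """Maximizer of p^s * (1 - p)^(n - s) on [0, 1]; assumes n >= 1."""
    return Fraction(s, n)


def expected_squared_loss(a: Fraction, s: int, n: int) -> Fraction:
    """Posterior expected loss E[(p - a)^2] under the uniform prior."""
    z = beta_integral(s, n - s)
    m1 = beta_integral(s + 1, n - s) / z  # E[p]
    m2 = beta_integral(s + 2, n - s) / z  # E[p^2]
    return m2 - 2 * a * m1 + a * a


if __name__ == "__main__":
    s = n = 2000  # the sun observed to rise on all 2000 days
    mean, mode = posterior_mean(s, n), map_estimate(s, n)
    print(mean)  # 2001/2002
    print(mode)  # 1
    print(expected_squared_loss(mean, s, n) < expected_squared_loss(mode, s, n))  # True
```

With s = n = 2000 the sketch reproduces the 2001/2002 from the quoted post, while the MAP estimate is exactly 1. Of the two, the posterior mean has the smaller expected squared error. That is consistent with the mean being the minimizer under quadratic loss, since E[(p - a)^2] = Var(p) + (E[p] - a)^2.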
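The exchange about a sample space of "all possible successions" can be made concrete by brute-force enumeration. The sketch compares two priors over binary sequences:

- every sequence of a given length is equally likely;
- trials are i.i.d. with an unknown p that has a uniform prior. Under this prior a particular sequence with k successes out of m trials has probability k! (m - k)! / (m + 1)!.

The helper names (`uniform_over_sequences`, `iid_with_uniform_prior`, `predict_next`) are hypothetical. Because the sketch walks all 2^(n+1) sequences, it assumes a small n.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def uniform_over_sequences(seq: tuple) -> Fraction:
    """Prior in which all 2^m binary sequences of length m are equally likely."""
    return Fraction(1, 2 ** len(seq))


def iid_with_uniform_prior(seq: tuple) -> Fraction:
    """Prior from i.i.d. trials with unknown p, p uniform on [0, 1]:
    P(seq) = integral of p^k (1 - p)^(m - k) dp = k! (m - k)! / (m + 1)!."""
    m, k = len(seq), sum(seq)
    return Fraction(factorial(k) * factorial(m - k), factorial(m + 1))


def predict_next(history: tuple, seq_prob) -> Fraction:
    """P(next trial is a success | history), found by enumerating every
    sequence of length n + 1 and keeping those that start with history."""
    n = len(history)
    matching = [seq for seq in product((0, 1), repeat=n + 1) if seq[:n] == history]
    total = sum(seq_prob(seq) for seq in matching)
    successes = sum(seq_prob(seq) for seq in matching if seq[-1] == 1)
    return successes / total


if __name__ == "__main__":
    sunrises = (1,) * 10  # ten observed sunrises, no failures
    print(predict_next(sunrises, uniform_over_sequences))  # 1/2
    print(predict_next(sunrises, iid_with_uniform_prior))  # 11/12
```

After ten sunrises in a row, the "all sequences equally likely" prior still predicts 1/2, exactly as the quoted objection says, so the past carries no information about the future. The i.i.d. model with a uniform prior reproduces the rule of succession: (10 + 1)/(10 + 2) = 11/12.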