On Monday, February 18, 2013 3:16:59 PM UTC+1, David Jones wrote:
> "Cagdas Ozgenc" wrote in message
> news:firstname.lastname@example.org...
>
> Hello,
>
> I am confused with the usage of Bayes with model selection.
>
> I frequently see the following notation:
>
> P(H | D) = P(D | H)*P(H) / P(D) where H is hypothesis and D is data.
>
> It's Bayes rule. What I don't understand is the following. If in reality D ~ N(m,v) and my hypothesis is that D ~ N(m',v) where m is different from m' and if all hypotheses are equally likely
>
> P(D) = sum P(D|H)*P(H)dH is not equal to true P(D), or is it?
>
> =======================================================================
>
> The standard notation is sloppy notation. If you use "K" to represent what is known before observing data "D", then
>
> P(H | D,K) = P(D | H,K)*P(H|K) / P(D|K)
>
> and then go on as you were, you get
>
> P(D |K) = sum P(D|H,K)*P(H|K) dH
>
> ... which at least illustrates your concern.
>
> "True P(D)" can be thought of as P(D | infinite pre-knowledge), while Bayes' Rule requires P(D |K)=P(D |actual pre-knowledge).
>
> David Jones
"Cagdas Ozgenc" wrote in message news:email@example.com...
I realized the sloppiness as well. Nevertheless, philosophically I don't understand what "actual pre-knowledge" and "infinite pre-knowledge" mean. Could you elaborate on that? Is there a difference if my hypotheses are coming from a constrained set or from a set of all computable distributions?
I was rather thinking in the context of a sequence of experiments, where the models used for the Bayesian analysis encompass the correct/true model. A sequence of experiments would then yield P(D |K1), P(D |K1,K2), P(D |K1,K2,K3, ...) as the standardising factors, all evaluated using the model distributions. If the prior/posterior distributions converge so as to identify correctly the parameters of the distribution of the observations, and if the family of model distributions for the next observation includes the true distribution for that observation, then after many experiments the distribution derived for the scaling factor will be essentially identical to the true distribution of the observation.
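To make the convergence concrete, here is a minimal sketch of one such sequence of experiments. It assumes the simplest well-specified case, which is my choice for illustration and not something fixed by the discussion above: observations are N(m, v) with v known, and the prior on m is normal. The function names, the prior N(0, 100) and the true values m = 3, v = 4 are all hypothetical.

```python
import random

def update_normal_mean(prior_mean, prior_var, x, noise_var):
    """One conjugate update of a N(prior_mean, prior_var) belief about the mean,
    given a single observation x ~ N(mean, noise_var) with noise_var known."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + x / noise_var)
    return post_mean, post_var

def predictive(belief_mean, belief_var, noise_var):
    """Mean and variance of P(next D | K1, ..., Kn), i.e. the scaling factor
    in Bayes' rule viewed as a distribution over the next observation."""
    return belief_mean, belief_var + noise_var

def run_sequence(true_mean=3.0, noise_var=4.0, n=5000, seed=0):
    rng = random.Random(seed)
    mean, var = 0.0, 100.0  # vague prior: an arbitrary illustrative choice
    for _ in range(n):
        x = rng.gauss(true_mean, noise_var ** 0.5)
        mean, var = update_normal_mean(mean, var, x, noise_var)
    return predictive(mean, var, noise_var)

pred_mean, pred_var = run_sequence()
print(pred_mean, pred_var)
```

With the model family containing the true distribution, the predictive mean and variance should end up close to the true m and v, which is the sense in which P(D | K1, ..., Kn) approaches the "true P(D)".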
To see what happens if the distributions used to model in the analysis are incorrect, consider what happens if the Bayesian analysis steps are done on the basis of the well-known theory for a normal distribution with unknown mean and variance. No matter what the actual distribution of the observations is, the (incorrect) analysis will produce posterior distributions that depend only on the sample mean and variance of the observations. Under some assumptions, the sample mean and variance converge to the true mean and variance of the observations, and so the mean and variance of the distribution of observations are identified. But the model would then produce a normal distribution for the scaling factor in the Bayes' Rule step, while the true distribution would not be normal. Of course, under other assumptions the sample mean and variance do not converge, for example if the observations actually came from a Cauchy distribution. But the fact that the modelled posteriors are known in terms of the sample mean and variance makes it relatively easy to see how the (incorrect) posterior distributions will behave.
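A rough numerical sketch of both cases follows. It is a simplification of my own: the full normal-model posterior is replaced by its large-sample limit, a normal with the sample mean and sample variance plugged in. The exponential example, the sample size and the name `plug_in_normal` are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance

def plug_in_normal(data):
    """Large-sample stand-in for the normal-model predictive:
    N(sample mean, sample variance)."""
    return NormalDist(fmean(data), math.sqrt(pvariance(data)))

rng = random.Random(1)
n = 50_000

# Case 1: exponential(1) data. True mean 1, true variance 1, never negative.
expo = [rng.expovariate(1.0) for _ in range(n)]
model_expo = plug_in_normal(expo)
prob_negative = model_expo.cdf(0.0)  # mass the normal model puts below zero

# Case 2: standard Cauchy data via the inverse-CDF transform.
# No mean or variance exists, so the sample moments do not settle down.
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
cauchy_var = pvariance(cauchy)

print(model_expo.mean, model_expo.variance, prob_negative)
print(fmean(cauchy), cauchy_var)
```

In the exponential case the fitted mean and variance should land near the true values, yet the normal model still assigns a sizeable probability to negative observations that can never occur. In the Cauchy case the sample variance is typically dominated by a handful of extreme draws and changes drastically with the sample size or seed.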