Date: Feb 19, 2013 7:36 AM
Author: David Jones
Subject: Re: Trying to understand Bayes and Hypothesis

On Monday, February 18, 2013 3:16:59 PM UTC+1, David Jones wrote:

> "Cagdas Ozgenc" wrote in message
> news:6369cf9a-b2d7-4105-9c19-7196df399299@googlegroups.com...
>
> Hello,
>
> I am confused about the use of Bayes' rule in model selection.
>
> I frequently see the following notation:
>
> P(H | D) = P(D | H)*P(H) / P(D), where H is the hypothesis and D is the data.
>
> This is Bayes' rule. What I don't understand is the following. Suppose that
> in reality D ~ N(m,v), that my hypothesis is D ~ N(m',v) where m' differs
> from m, and that all hypotheses are equally likely. Then
>
> P(D) = sum P(D|H)*P(H) dH
>
> is not equal to the true P(D), or is it?
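A minimal numerical sketch of this concern. It assumes, purely for illustration, that the data really follow N(0, 1). It also assumes the hypotheses are N(m', 1) with m' in {1, 2, 3}, each with prior 1/3. The helper names `normal_pdf`, `model_marginal` and `true_density` are hypothetical. The mixture built from the hypotheses is a proper density, but it is not the true one:

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative assumptions: the data really come from N(0, 1), but every
# hypothesis H says D ~ N(m', 1) with m' != 0.
TRUE_MEAN, VAR = 0.0, 1.0
HYPOTHESES = [1.0, 2.0, 3.0]        # candidate values of m'
PRIOR = 1.0 / len(HYPOTHESES)       # "all hypotheses equally likely"

def model_marginal(d):
    """P(D = d) = sum over H of P(d | H) * P(H), built from the hypotheses only."""
    return sum(normal_pdf(d, m, VAR) * PRIOR for m in HYPOTHESES)

def true_density(d):
    """Density that the data actually follow."""
    return normal_pdf(d, TRUE_MEAN, VAR)

for d in (-1.0, 0.0, 2.0):
    print(f"d={d:+.1f}  model P(D)={model_marginal(d):.4f}  true P(D)={true_density(d):.4f}")
```

The two columns disagree, yet `model_marginal` still integrates to 1. It is a valid distribution for D, just the one implied by the hypotheses rather than by reality.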

> =======================================================================
>
> The standard notation is sloppy. If you use "K" to represent what is known
> before observing the data "D", then
>
> P(H | D,K) = P(D | H,K)*P(H|K) / P(D|K)
>
> and, carrying on as you did, you get
>
> P(D|K) = sum P(D|H,K)*P(H|K) dH
>
> ... which at least illustrates your concern.
>
> "True P(D)" can be thought of as P(D | infinite pre-knowledge), while
> Bayes' Rule requires P(D|K) = P(D | actual pre-knowledge).
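A small sketch of why the denominator has to be P(D|K). It assumes, for illustration only, that K is "D ~ N(m', 1) with m' in {1, 2, 3}, equally likely" and that the data actually come from N(0, 1). The names `posterior`, `p_d_given_k` and `p_d_true` are hypothetical. Dividing by the model's own marginal makes the posterior over H sum to 1. Dividing by the true density does not:

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative pre-knowledge K: D ~ N(m', 1) with m' in {1, 2, 3}, equally likely.
# The data actually come from N(0, 1), which K does not contain.
HYPOTHESES = [1.0, 2.0, 3.0]
PRIOR_GIVEN_K = {m: 1.0 / len(HYPOTHESES) for m in HYPOTHESES}

def posterior(d, normaliser):
    """Bayes' Rule: P(H | D, K) = P(D | H, K) * P(H | K) / normaliser."""
    return {m: normal_pdf(d, m, 1.0) * PRIOR_GIVEN_K[m] / normaliser for m in HYPOTHESES}

d = 0.5  # one observed value
p_d_given_k = sum(normal_pdf(d, m, 1.0) * p for m, p in PRIOR_GIVEN_K.items())
p_d_true = normal_pdf(d, 0.0, 1.0)

with_model = posterior(d, p_d_given_k)   # normalised with P(D | K)
with_truth = posterior(d, p_d_true)      # normalised with the "true P(D)"
print(sum(with_model.values()))  # 1.0 up to rounding
print(sum(with_truth.values()))  # equals p_d_given_k / p_d_true, not 1
```

Both versions rank the hypotheses identically. Only the model-based P(D|K) turns the products into a probability distribution over H.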

> David Jones

"Cagdas Ozgenc" wrote in message
news:38d6bbba-bba8-480c-ba70-f542c0587a2f@googlegroups.com...

Hello David,

I noticed the sloppiness as well. Philosophically, though, I don't
understand what "actual pre-knowledge" and "infinite pre-knowledge" are. Could
you elaborate on that? Does it make a difference whether my hypotheses come
from a constrained set or from the set of all computable distributions?

Thanks

=========================================

I was thinking rather of a sequence of experiments, in which the models used
for the Bayesian analysis encompass the correct/true model. A sequence of
experiments would then yield P(D |K1), P(D |K1,K2), P(D |K1,K2,K3, ...) as
the standardising factors, all evaluated using the model distributions. If
the prior/posterior distributions converge so as to identify correctly the
parameters of the distribution of the observations for an experiment, then at
that stage (provided the family of model distributions for the next
observation includes the true distribution of that observation) the
distribution derived for the scaling factor (after many experiments) will be
essentially identical to the true distribution of the observation.
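A minimal sketch of this convergence. It rests on simplifying assumptions that are mine, not part of the argument above:

- each experiment is a single observation from N(mean, sigma2) with sigma2 known;
- the prior on the mean is a vague normal;
- the true mean (3.0 here) lies inside the model family.

With the standard conjugate normal update, the scaling-factor (predictive) distribution for the next observation is N(mu_n, sigma2 + tau2_n). As observations accumulate, it approaches the true N(3, 4). The function names are hypothetical:

```python
import random

def update(mu, tau2, x, sigma2):
    """Conjugate update of a N(mu, tau2) prior on the mean after observing x ~ N(mean, sigma2)."""
    precision = 1.0 / tau2 + 1.0 / sigma2
    new_tau2 = 1.0 / precision
    new_mu = new_tau2 * (mu / tau2 + x / sigma2)
    return new_mu, new_tau2

def predictive(mu, tau2, sigma2):
    """Mean and variance of the predictive distribution P(D_next | K1, ..., Kn)."""
    return mu, sigma2 + tau2

TRUE_MEAN, SIGMA2 = 3.0, 4.0   # assumed truth; the model family contains it
rng = random.Random(0)         # fixed seed so the run is repeatable
mu, tau2 = 0.0, 100.0          # vague prior on the mean (an assumption)
for n in range(1, 10_001):
    x = rng.gauss(TRUE_MEAN, SIGMA2 ** 0.5)
    mu, tau2 = update(mu, tau2, x, SIGMA2)
    if n in (1, 10, 100, 1_000, 10_000):
        m, v = predictive(mu, tau2, SIGMA2)
        print(f"n={n:5d}  predictive N({m:.3f}, {v:.4f})  vs true N({TRUE_MEAN}, {SIGMA2})")
```

The predictive variance shrinks towards sigma2 because tau2_n, the remaining uncertainty about the mean, goes to zero.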

To see what happens when the distributions used in the analysis are incorrect,
suppose the Bayesian analysis steps are carried out using the well-known theory
for a normal distribution with unknown mean and variance. Whatever the actual
distribution of the observations, this (incorrect) analysis produces posterior
distributions that depend on the data only through the sample mean and
variance (and the number of observations). Under some assumptions, the sample
mean and variance converge to the true mean and variance of the observations,
so the mean and variance of the distribution of the observations are
identified. But the model would then produce a normal distribution for the
scaling factor in the Bayes' Rule step, while the true distribution would not
be normal. Of course, under other assumptions the sample mean and variance do
not converge, for example if the observations actually came from a Cauchy
distribution. Either way, the fact that the modelled posteriors are known
functions of the sample mean and variance makes it relatively easy to see how
the (incorrect) posterior distributions will behave.
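A rough illustration using Python's `random` and `statistics` modules. It relies on a standard result: with the usual noninformative prior p(mean, variance) ∝ 1/variance, the normal analysis gives a Student-t predictive. That predictive has location equal to the sample mean and scale close to the sample standard deviation, so for large n it is essentially N(sample mean, sample variance).

- Case 1 uses exponential data (mean 1, variance 1, never negative). The mean and variance are identified, but the normal predictive still puts noticeable mass below zero.
- Case 2 uses standard Cauchy data, whose running sample mean keeps jumping instead of settling.

The sample sizes and seed are arbitrary choices:

```python
import math
import random
import statistics

def normal_cdf(x, mean, var):
    """P(X <= x) for X ~ N(mean, var)."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

rng = random.Random(1)  # fixed seed so the run is repeatable

# Case 1: the data really come from Exponential(1): mean 1, variance 1, never negative.
expo = [rng.expovariate(1.0) for _ in range(50_000)]
xbar = statistics.fmean(expo)        # the normal analysis only uses these
s2 = statistics.variance(expo)       # two summaries (plus n)
# For large n the normal model's predictive is essentially N(xbar, s2),
# so it assigns noticeable probability to negative observations.
model_p_negative = normal_cdf(0.0, xbar, s2)
print(f"mean={xbar:.3f}  var={s2:.3f}  model P(D<0)={model_p_negative:.3f}  true P(D<0)=0")

# Case 2: the data really come from a standard Cauchy distribution
# (inverse-CDF sampling); its running sample mean does not settle down.
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(100_000)]
for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:6d}  sample mean={statistics.fmean(cauchy[:n]):+.3f}")
```

The particular Cauchy sample means printed depend on the seed. The point is only that they do not converge.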

David Jones