Topic: Trying to understand Bayes and Hypothesis
Replies: 11   Last Post: Feb 22, 2013 3:09 AM

 David Jones Posts: 80 Registered: 2/9/12
Re: Trying to understand Bayes and Hypothesis
Posted: Feb 19, 2013 7:36 AM

On Monday, February 18, 2013 3:16:59 PM UTC+1, David Jones wrote:
> "Cagdas Ozgenc" wrote in message
>
> Hello,
>
> I am confused about the use of Bayes' rule in model selection.
>
> I frequently see the following notation:
>
> P(H | D) = P(D | H)*P(H) / P(D), where H is the hypothesis and D is the data.
>
> It's Bayes' rule. What I don't understand is the following. If in reality
> D ~ N(m,v) and my hypothesis is that D ~ N(m',v), where m is different from
> m', and if all hypotheses are equally likely, then
>
> P(D) = sum P(D|H)*P(H) dH is not equal to the true P(D), or is it?
>
> =======================================================================
>
> The standard notation is sloppy. If you use "K" to represent what is known
> before observing the data "D", then
>
> P(H | D,K) = P(D | H,K)*P(H|K) / P(D|K)
>
> and if you then go on as you were, you get
>
> P(D|K) = sum P(D|H,K)*P(H|K) dH
>
> ... which at least illustrates your concern.
>
> "True P(D)" can be thought of as P(D | infinite pre-knowledge), while Bayes'
> Rule requires P(D|K) = P(D | actual pre-knowledge).
>
> David Jones
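
A small numerical sketch may make the distinction between P(D|K) and the "true P(D)" concrete. It assumes a finite set of candidate means (so the "sum ... dH" is an ordinary sum), a known variance, and a uniform prior; the specific numbers and the function names are illustrative assumptions, not part of the thread.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def marginal_density(x, means, prior, var):
    """P(D|K) = sum over H of P(D|H,K) * P(H|K), with H indexing candidate means."""
    return sum(p * normal_pdf(x, m, var) for m, p in zip(means, prior))

# Assumed setup: the data really come from N(0, 1), but none of the
# equally likely candidate means equals the true mean.
true_mean, var = 0.0, 1.0
candidate_means = [-2.0, -1.0, 1.0, 2.0]
prior = [0.25, 0.25, 0.25, 0.25]

x = 0.0
p_model = marginal_density(x, candidate_means, prior, var)  # P(D | actual pre-knowledge)
p_true = normal_pdf(x, true_mean, var)                      # "true P(D)"

# A prior concentrated on the true mean ("infinite pre-knowledge")
# reproduces the true density exactly.
p_informed = marginal_density(x, [true_mean], [1.0], var)

print(p_model, p_true, p_informed)
```

With these assumed values the model-based P(D|K) at x = 0 is noticeably smaller than the true density, while the fully informed prior recovers it.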

"Cagdas Ozgenc" wrote in message

Hello David,

I realized the sloppiness as well. Nevertheless, philosophically I don't
understand what "actual pre-knowledge" and "infinite pre-knowledge" are. Could
you elaborate on that? Is there a difference if my hypotheses come from a
constrained set or from the set of all computable distributions?

Thanks

=========================================

I was thinking of a sequence of experiments in which the family of models
used for the Bayesian analysis includes the correct/true model. Such a
sequence would yield P(D |K1), P(D |K1,K2), P(D |K1,K2,K3, ... ) as the
standardising factors, all evaluated using the model distributions. Suppose
the prior/posterior distributions converge so as to identify correctly the
parameters of the distribution of the observations. At that stage, and
provided the family of model distributions for the next observation includes
the true distribution for that observation, the distribution derived for the
scaling factor (after many experiments) will be essentially identical to the
true distribution of the observation.
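
The convergence described above can be sketched numerically under some simplifying assumptions that are not in the thread: observations from N(3, 4) with the variance treated as known, a conjugate normal prior on the mean, and one observation per "experiment". The predictive distribution for the next observation, which plays the role of the scaling factor, is then N(posterior mean, posterior variance + observation variance).

```python
import math
import random

def update(post_mean, post_var, x, obs_var):
    """One conjugate update of a N(post_mean, post_var) belief about the mean,
    after observing x drawn from N(mean, obs_var) with obs_var known."""
    precision = 1.0 / post_var + 1.0 / obs_var
    new_var = 1.0 / precision
    new_mean = new_var * (post_mean / post_var + x / obs_var)
    return new_mean, new_var

random.seed(0)
true_mean, obs_var = 3.0, 4.0   # assumed true distribution N(3, 4)
mean, var = 0.0, 100.0          # assumed vague prior K1 on the mean

for _ in range(5000):
    x = random.gauss(true_mean, math.sqrt(obs_var))
    mean, var = update(mean, var, x, obs_var)

# Predictive distribution for the next observation under the model.
pred_mean, pred_var = mean, var + obs_var
print(pred_mean, pred_var)
```

Because the model family contains the true distribution here, the predictive mean and variance end up close to 3 and 4 respectively.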

To see what happens if the distributions used in the analysis are incorrect,
suppose the Bayesian analysis is done using the well-known theory for a
normal distribution with unknown mean and variance. No matter what the actual
distribution of the observations is, the (incorrect) analysis will produce
posterior distributions that depend on the data only through the sample mean
and variance. Under some assumptions, the sample mean and variance converge
to the true mean and variance of the observations, and so the mean and
variance of the distribution of the observations are identified. But the
model would then produce a normal distribution for the scaling factor in the
Bayes' Rule step, while the true distribution would not be normal. Of course,
under other assumptions the sample mean and variance do not converge ... for
example if the observations actually came from a Cauchy distribution ... but
the fact that the modelled posteriors are known in terms of the sample mean
and variance makes it relatively easy to see how the (incorrect) posterior
distributions will behave.
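
As a rough illustration of the misspecified case, the sketch below assumes the observations actually come from an exponential distribution with rate 1 (my choice of example, not from the thread) and approximates the normal model's predictive distribution by N(sample mean, sample variance), which is what the exact Student-t predictive approaches for large samples.

```python
import random
from statistics import NormalDist, fmean, variance

random.seed(1)

# Assumed true distribution: exponential with rate 1 (mean 1, variance 1).
data = [random.expovariate(1.0) for _ in range(20000)]

# The normal-model analysis uses the data only through these two statistics.
xbar = fmean(data)
s2 = variance(data)

# Large-sample approximation to the model's predictive distribution.
predictive = NormalDist(mu=xbar, sigma=s2 ** 0.5)

# The model assigns substantial probability to negative observations,
# although the true distribution assigns none.
model_prob_negative = predictive.cdf(0.0)
true_prob_negative = 0.0
print(xbar, s2, model_prob_negative, true_prob_negative)
```

The mean and variance are recovered well, yet the normal predictive still puts a sizeable share of its probability below zero, where an exponential observation can never fall.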

David Jones