Topic: Trying to understand Bayes and Hypothesis
Replies: 11   Last Post: Feb 22, 2013 3:09 AM

David Jones

Re: Trying to understand Bayes and Hypothesis
Posted: Feb 19, 2013 7:36 AM

On Monday, February 18, 2013 3:16:59 PM UTC+1, David Jones wrote:
> "Cagdas Ozgenc" wrote in message
> news:6369cf9a-b2d7-4105-9c19-7196df399299@googlegroups.com...
>
> Hello,
>
> I am confused with the usage of Bayes with model selection.
>
> I frequently see the following notation:
>
> P(H | D) = P(D | H)*P(H) / P(D), where H is the hypothesis and D is the data.
>
> It's Bayes' rule. What I don't understand is the following. If in reality
> D ~ N(m,v) and my hypothesis is that D ~ N(m',v), where m is different
> from m', and if all hypotheses are equally likely, then
>
> P(D) = ∫ P(D|H)*P(H) dH is not equal to the true P(D), or is it?
>
> =======================================================================
>
> The standard notation is sloppy notation. If you use "K" to represent
> what is known before observing the data "D", then
>
> P(H | D,K) = P(D | H,K)*P(H|K) / P(D|K)
>
> and then go on as you were, you get
>
> P(D|K) = ∫ P(D|H,K)*P(H|K) dH
>
> ... which at least illustrates your concern.
>
> The "true P(D)" can be thought of as P(D | infinite pre-knowledge), while
> Bayes' rule requires P(D|K) = P(D | actual pre-knowledge).
>
> David Jones
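
As a concrete sketch of the concern in the quoted exchange (my own
construction, not from the original messages; the observed point d and the
grid of alternative means are arbitrary choices), one can compute the
model-based normalising constant P(D) = ∫ P(D|H)*P(H) dH over a hypothesis
family that excludes the true mean and compare it with the true density:

    # Hypothetical example: truth is D ~ N(m, v), but the hypothesis family
    # only contains means m' on a grid that misses m; all hypotheses are
    # taken as equally likely a priori.
    import numpy as np
    from scipy.stats import norm

    m, v = 0.0, 1.0     # true mean and variance
    d = 0.5             # a single observed value of D
    true_p = norm.pdf(d, loc=m, scale=np.sqrt(v))

    grid = np.array([1.0, 2.0, 3.0, 4.0])        # candidate means m', none equal to m
    prior = np.full(grid.size, 1.0 / grid.size)  # "all hypotheses equally likely"
    model_p = np.sum(norm.pdf(d, loc=grid, scale=np.sqrt(v)) * prior)

    print(f"true  P(D) = {true_p:.4f}")   # about 0.352
    print(f"model P(D) = {model_p:.4f}")  # about 0.125, clearly different

The two numbers differ, which is exactly the mismatch the question points
at: the model-based P(D) is only "true" relative to the hypothesis family
and prior actually used.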




"Cagdas Ozgenc" wrote in message
news:38d6bbba-bba8-480c-ba70-f542c0587a2f@googlegroups.com...

Hello David,

I realized the sloppiness as well. Nevertheless, philosophically I don't
understand what "actual pre-knowledge" and "infinite pre-knowledge" are.
Could you elaborate on that? Is there a difference if my hypotheses come
from a constrained set rather than from the set of all computable
distributions?

Thanks

=========================================

I was rather thinking in the context of a sequence of experiments in which
the models used for the Bayesian analysis encompass the correct/true model.
A sequence of experiments then yields P(D|K1), P(D|K1,K2), P(D|K1,K2,K3), ...
as the standardising factors, all evaluated using the model distributions.
If the prior/posterior distributions converge so as to identify correctly
the parameters of the distribution of the observations, then at that stage
(and provided the family of model distributions for the next observation
includes the true distribution for that observation) the distribution
derived for the scaling factor, after many experiments, will be essentially
identical to the true distribution of the observation.
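
A minimal numerical sketch of that convergence, under the simplifying
assumption of a conjugate normal model with known variance (the prior and
the sample sizes here are my own choices, not anything from the thread):

    # Truth: observations ~ N(2, 1); the model family (normal, unknown mean,
    # known variance) contains this truth, so the predictive distribution
    # used as the scaling factor converges to the true distribution.
    import numpy as np

    rng = np.random.default_rng(0)
    true_mu, sigma2 = 2.0, 1.0      # true mean and (known) variance
    mu0, tau2_0 = 0.0, 10.0         # prior on the unknown mean: N(mu0, tau2_0)

    for n in [1, 10, 100, 1000]:
        x = rng.normal(true_mu, np.sqrt(sigma2), size=n)
        # standard conjugate update for a normal mean with known variance
        tau2_n = 1.0 / (1.0 / tau2_0 + n / sigma2)
        mu_n = tau2_n * (mu0 / tau2_0 + x.sum() / sigma2)
        # predictive for the next observation: N(mu_n, sigma2 + tau2_n)
        print(f"n={n:4d}: predictive N({mu_n:.3f}, {sigma2 + tau2_n:.3f}) "
              f"vs true N({true_mu:.1f}, {sigma2:.1f})")

As n grows, the predictive settles on something essentially identical to
the true N(2, 1), which is the behaviour described above.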

To see what happens if the distributions used in the analysis are
incorrect, consider a Bayesian analysis done on the basis of the well-known
theory for a normal distribution with unknown mean and variance. No matter
what the actual distribution of the observations is, this (incorrect)
analysis will produce posterior distributions that depend only on the
sample mean and variance of the observations. Under some assumptions the
sample mean and variance converge to the true mean and variance of the
observations, and so the mean and variance of the distribution of the
observations are identified. But the model would then produce a normal
distribution for the scaling factor in the Bayes' rule step, while the true
distribution would not be normal. Of course, under other assumptions the
sample mean and variance do not converge ... for example, if the
observations actually came from a Cauchy distribution ... but the fact that
the modelled posteriors are known in terms of the sample mean and variance
makes it relatively easy to see how the (incorrect) posterior distributions
will behave.
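
A rough illustration of the Cauchy caveat (again my own sketch; the sample
sizes are arbitrary): the sample mean and variance of Cauchy data never
settle down, so a normal-model analysis, whose posteriors depend only on
those two statistics, never stabilises either, even while its predictive
distribution remains stubbornly normal.

    # The sample mean and variance of Cauchy observations do not converge
    # as n grows, unlike for distributions with finite moments.
    import numpy as np

    rng = np.random.default_rng(1)
    for n in [100, 10_000, 1_000_000]:
        x = rng.standard_cauchy(size=n)
        print(f"n={n:8d}: sample mean = {x.mean():10.3f}, "
              f"sample variance = {x.var():14.1f}")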

David Jones




