Topic: Probabilities always >= 0 and <= 1?

 Jussi Piitulainen
Re: Probabilities always >= 0 and <= 1?
Posted: May 9, 2012 12:14 AM

FFMG writes:

> On Tuesday, 8 May 2012 11:35:35 UTC+2, FFMG wrote:
> > Hi,
> >
> > I was looking at a site,
> > (http://bionicspirit.com/blog/2012/02/09/howto-build-naive-bayes-classifier.html),
> > basically talking about a Naive Bayes Classifier.
> >
> > But in some cases the formula gives me probabilities greater than 1.
> > How is that possible?
> >
> > // Total of 18 documents.
> > // * 9 documents out of a total of 18 are spam messages
> > // * 3 documents out of those 18 contain the word "naughty"
> > // * 3 documents containing the word "naughty" have been marked as spam
> > // * 3 documents out of the total contain the word "money"
> > // * 3 emails out of those have been marked as spam
> >
> > P(spam|naughty,money) = P(naughty|spam) * P(money|spam) * P(spam)
> >                         -----------------------------------------
> >                                   P(naughty) * P(money)
> >
> > P(spam|naughty,money) = 3/9 * 3/9 * 9/18
> >                         ---------------- = 2
> >                           3/18 * 3/18
> >
> > But how can a probability be outside of 0 and 1? Must I always
> > force the numbers to be between 0 and 1 and accept that in some
> > cases they will fall outside the range?
> >
> > Many thanks for suggestions as to where I might have gone wrong.
> >
> > Regards,
> >
> > FFMG

>
> Thanks for all the replies. I guess I will force the document
> classification to lie between 0 and 1, because in my case I will
> have hundreds of thousands of documents (we have over 200000
> currently), and hopefully it will not take more than 5000 'training'
> documents to get some meaningful data classification.
>
> I just thought that even with my 18 documents I should still get a
> probability between 0 and 1.
>
> My main task was to write unit tests, and if the correct result in
> my test with 18 documents is a probability of '2' then I guess the
> calculations are valid.
>
> Thanks for all inputs and suggestions.

Look again at Ray Vickson's post. I think he hit the nail on the head.

He used the law of total probability to expand the denominator as

P("naughty", "money")
= P("naughty", "money" | "spam" or "not spam")
= P("naughty", "money" | "spam") * P("spam")
+ P("naughty", "money" | "not spam") * P("not spam")

after which you can use the _same_ independence assumption in both the
numerator and the denominator. That shouldn't lead to such a blatant
impossibility as a probability greater than 1.
Perhaps this is how Naive Bayes is always done. I haven't checked.
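
To make the difference concrete, here is a minimal Python sketch using
the 18-document counts from the question. It assumes the denominator is
expanded by the law of total probability, with each class term weighted
by its prior, P("spam") or P("not spam"). The function name and its
arguments are my own illustration, not taken from the linked article.
The "not spam" likelihoods of 0/9 follow from the counts given: all 3
"naughty" documents and all 3 "money" documents are among the 9 spam
documents.

```python
from fractions import Fraction


def naive_bayes_posterior(prior_spam, likelihoods_spam, likelihoods_ham):
    """Return P(spam | words) under the naive independence assumption.

    Hypothetical helper: the denominator is the sum of the spam and
    not-spam joint terms, so the result cannot exceed 1.
    """
    joint_spam = prior_spam
    for p in likelihoods_spam:
        joint_spam *= p

    joint_ham = 1 - prior_spam
    for p in likelihoods_ham:
        joint_ham *= p

    evidence = joint_spam + joint_ham
    if evidence == 0:
        raise ValueError("the words have zero probability under both classes")
    return joint_spam / evidence


# Counts from the question: 9 of 18 documents are spam; "naughty" and
# "money" each occur in 3 documents, all of them spam.
prior_spam = Fraction(9, 18)
likelihoods_spam = [Fraction(3, 9), Fraction(3, 9)]  # P(word | spam)
likelihoods_ham = [Fraction(0, 9), Fraction(0, 9)]   # P(word | not spam)

posterior = naive_bayes_posterior(prior_spam, likelihoods_spam, likelihoods_ham)

# The calculation from the question, which divides by
# P(naughty) * P(money) instead of the expanded denominator.
unnormalised = (prior_spam * Fraction(3, 9) * Fraction(3, 9)) / (
    Fraction(3, 18) * Fraction(3, 18)
)

print(posterior)     # 1
print(unnormalised)  # 2
```

With these counts the normalised value is exactly 1, since neither word
ever appears in a non-spam document, while the formula from the question
gives 2.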

