The Math Forum


Discussion group: sci.stat.math


Topic: Is a most-likely probability 'better' depending on the size of the

Replies: 4   Last Post: Feb 3, 2010 12:28 PM

Posted: Feb 2, 2010 7:17 AM

I'm working on an algorithm to guess the correct English word within
text in which some words have become illegible.
It boils down to creating a list of candidate words, along with their
probabilities, and choosing the most likely.
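The selection step can be sketched as taking the highest-probability entry from the candidate list. This is a minimal sketch; the dictionary representation and the name `best_candidate` are hypothetical, not part of the original algorithm.

```python
def best_candidate(candidates):
    """Return the (word, probability) pair with the highest probability.

    `candidates` is a hypothetical mapping from each candidate word to its
    estimated probability of being the illegible word.
    """
    if not candidates:
        raise ValueError("candidate list is empty")
    word = max(candidates, key=candidates.get)
    return word, candidates[word]


# Example with made-up candidates for one illegible word.
print(best_candidate({"house": 0.6, "horse": 0.3, "hose": 0.1}))
```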
By alternating between training data and new test data, I can establish
that the probability estimates are fairly accurate. (To be useful,
though, the algorithm needs to produce a shorter candidate list in the
first place!)
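One common way to check that probability estimates are "fairly accurate" on held-out data (a sketch, not necessarily the check used here) is a reliability table: group predictions by their stated probability and compare each group's average prediction with how often the chosen word was actually right. The function name `calibration_table`, the `(p, correct)` input format and the default of 10 bins are all assumptions.

```python
from collections import defaultdict


def calibration_table(predictions, n_bins=10):
    """Compare mean predicted probability with observed hit rate per bin.

    `predictions` is a hypothetical iterable of (p, correct) pairs from test
    data: p is the probability given to the chosen word, and correct is True
    if that word turned out to be right.
    Returns a list of (mean_p, hit_rate, count) tuples, one per non-empty bin.
    """
    bins = defaultdict(list)
    for p, correct in predictions:
        # Put p == 1.0 into the top bin rather than an extra one.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, correct))
    table = []
    for index in sorted(bins):
        rows = bins[index]
        mean_p = sum(p for p, _ in rows) / len(rows)
        hit_rate = sum(1 for _, correct in rows if correct) / len(rows)
        table.append((mean_p, hit_rate, len(rows)))
    return table


sample = [(0.9, True), (0.9, True), (0.9, False), (0.5, True), (0.5, False)]
for mean_p, hit_rate, count in calibration_table(sample):
    print(f"predicted {mean_p:.2f}  observed {hit_rate:.2f}  (n={count})")
```

If the estimates are well calibrated, the predicted and observed columns should roughly agree in every bin once there is enough test data.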

Suppose I have two cases:
A) There are 2 candidate words, with P = 0.51 and P = 0.49.
B) There are 101 candidate words: one with P = 0.51 and a hundred
others, each with P = 0.0049.
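The arithmetic behind the two cases can be checked directly: both distributions sum to 1, and in both the top choice has the same chance of being wrong, while the ratio of the top probability to the runner-up differs by about two orders of magnitude. The helper name `summarize` is hypothetical.

```python
def summarize(probs):
    """Return total probability mass, the chance that the top choice is
    wrong, and the ratio of the top probability to the runner-up."""
    ranked = sorted(probs, reverse=True)
    total = sum(ranked)
    p_wrong = 1.0 - ranked[0]
    ratio = ranked[0] / ranked[1]
    return total, p_wrong, ratio


case_a = [0.51, 0.49]
case_b = [0.51] + [0.0049] * 100

for name, probs in (("A", case_a), ("B", case_b)):
    total, p_wrong, ratio = summarize(probs)
    print(f"{name}: total={total:.4f}  P(top is wrong)={p_wrong:.2f}  "
          f"top/runner-up={ratio:.2f}")
```

Both cases give P(top is wrong) = 0.49; only the ratio differs (about 1.04 for A versus about 104 for B).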

One of the approaches the algorithm takes is based on the N most recent
known words preceding the unknown word (its N-gram), so there are
inevitably situations in which the N-gram contains words that were
themselves corrected in an earlier step. When that happens, I need to
know how much I can rely on the earlier result.
Is there any basis for believing that the result in case B) is more
trustworthy? After all, there the choice with P = 0.51 is more than 100
times as likely as the next-best word, whereas in case A) there's
virtually nothing to choose between them.
Rightly or wrongly, that's how I intuitively feel about the choices,
but then I remember... both 'best choices' will be wrong 49% of the
time, so it doesn't make any difference!
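The "wrong 49% of the time" observation can be checked with a quick simulation, assuming the probabilities are calibrated, i.e. the true word really is drawn according to them. The function name, trial count and seed are arbitrary choices for illustration.

```python
import random


def top_choice_error_rate(probs, trials=100_000, seed=0):
    """Estimate how often the highest-probability candidate is wrong when
    the true word is drawn according to `probs`."""
    rng = random.Random(seed)
    indices = range(len(probs))
    top = max(indices, key=probs.__getitem__)
    wrong = 0
    for _ in range(trials):
        # Draw the "true" word index using the candidate probabilities as weights.
        truth = rng.choices(indices, weights=probs)[0]
        if truth != top:
            wrong += 1
    return wrong / trials


case_a = [0.51, 0.49]
case_b = [0.51] + [0.0049] * 100
print(top_choice_error_rate(case_a), top_choice_error_rate(case_b))
```

Both estimates come out close to 0.49, because the chance of the top choice being wrong depends only on its own probability, not on how the remainder is split.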

Is there a measure for this, or is it totally irrelevant?
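If a single number is wanted that tells A and B apart, two standard summaries of a distribution are the margin between the top two probabilities and the Shannon entropy. Neither changes the 0.49 chance that the top choice is wrong; they only describe how the remaining probability is spread. The helper names are hypothetical.

```python
import math


def margin(probs):
    """Gap between the two largest probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


case_a = [0.51, 0.49]
case_b = [0.51] + [0.0049] * 100

for name, probs in (("A", case_a), ("B", case_b)):
    print(f"{name}: margin={margin(probs):.4f}  "
          f"entropy={entropy_bits(probs):.3f} bits")
```

The two summaries pull in opposite directions here: B has the much larger margin (about 0.51 versus 0.02) but also the higher entropy (roughly 4.3 bits versus roughly 1 bit), because its 49% of doubt is spread over a hundred alternatives instead of being concentrated on one.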

Eventually the goal is to have a much higher confidence than 0.51 in a
single choice, but there will occasionally be situations with these
borderline results. In these cases I'll offer the user a drop-down
replacement list with all the choices and their probabilities, for
them to pick from.
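The accept-or-offer rule can be sketched as a simple threshold on the top probability. The 0.9 threshold, the cap of 10 options and the name `choose_or_offer` are assumptions for illustration, not values from the original.

```python
def choose_or_offer(candidates, threshold=0.9, max_options=10):
    """Auto-accept the best candidate if its probability reaches `threshold`;
    otherwise return a ranked list for a drop-down.

    `candidates` is a non-empty mapping from word to probability.
    Returns (accepted_word, []) or (None, ranked_options).
    """
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    best_word, best_p = ranked[0]
    if best_p >= threshold:
        return best_word, []
    # Borderline case: let the user pick from the most likely options.
    return None, ranked[:max_options]


print(choose_or_offer({"house": 0.95, "horse": 0.05}))
print(choose_or_offer({"house": 0.51, "horse": 0.49}))
```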
When I talk this over with non-maths friends, most of them share the
feeling that they would be more confident making a choice in case B)
than in case A).

Any thoughts?... Is this a bit of a Monty Hall problem?

© The Math Forum at NCTM 1994-2018. All Rights Reserved.