Topic: Please critique my scheme for re-weighting source data
 JohnF
Re: Please critique my scheme for re-weighting source data
Posted: Feb 24, 2012 5:35 AM

Jennifer Murphy <JenMurphy@jm.invalid> wrote:
> Rich Ulrich <rich.ulrich@comcast.net> wrote:
>
>> What are you trying to do?
>
> I am trying to calculate for each word the relative likeliness that it
> would be encountered by an average well-educated person in their daily
> activities: reading the paper, listening to the news, attending classes,
> talking to other people, reading books, etc.
>
> The raw scores that I have already do that, but I question the
> weighting. I do not think that the average person encounters the types of
> words typically found in academic journals at the same frequency as they
> would those found in newspapers or magazines. Therefore, I want to
> re-weight the five sources to reflect a more average experience.

Don't weight the sources, weight the people.
That is, define a person by a "state vector"
p = <w_A,w_B,...,w_E>
representing that person's inclination (weight) to read each
kind of source. Right now you're effectively using p=<.2,.2,.2,.2,.2>.
Is that really "average"? Or maybe you can't define
a single average person at all. College-educated readers will
probably have a different vector than high-school dropouts.
So you ultimately have a five-dimensional (that is,
#sources-dimensional) people space, with each point in that
space having its own "likelihood distribution" for coming
across your words. ... Or something like that. The basic
point, again, is to weight the people.
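
To make that concrete, here is a minimal sketch in Python of what
"weight the people" could look like for a single word. It is only an
illustration: the labels A..E just follow the w_A..w_E notation above,
the per-source rates and the second reader's vector are made-up numbers,
and the names word_rate, normalize and encounter_rate are hypothetical
helpers, not anything from your data.

```python
# Sketch only: source labels A..E follow the w_A..w_E notation above,
# and every number below is made up for illustration.
SOURCES = ("A", "B", "C", "D", "E")

# Hypothetical occurrences per million words of one word in each source.
word_rate = {"A": 40.0, "B": 10.0, "C": 5.0, "D": 5.0, "E": 0.0}


def normalize(person):
    """Scale a person's source weights so that they sum to 1."""
    total = sum(person[s] for s in SOURCES)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {s: person[s] / total for s in SOURCES}


def encounter_rate(rate_by_source, person):
    """Expected rate for one person: sum over sources of w_s * rate_s."""
    p = normalize(person)
    return sum(p[s] * rate_by_source[s] for s in SOURCES)


# The equal-weight vector <.2,.2,.2,.2,.2>.
uniform = dict.fromkeys(SOURCES, 0.2)

# A hypothetical reader who rarely reads source A.
light_on_a = {"A": 0.05, "B": 0.30, "C": 0.30, "D": 0.20, "E": 0.15}

for name, person in (("uniform", uniform), ("light_on_a", light_on_a)):
    print(name, round(encounter_rate(word_rate, person), 3))
```

The same word gets a different score at each point in the people
space. If you can't settle on one "average" vector, you could compute
the score for several plausible vectors and see how much the word
rankings actually move.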

John Forkosh ( mailto: j@f.com where j=john and f=forkosh )

