The Math Forum




Views expressed in these public forums are not endorsed by Drexel University or The Math Forum.


Forum: sci.math.independent

Topic: Please critique my scheme for re-weighting source data
Replies: 8   Last Post: May 27, 2012 11:57 AM


Peter Webb

Posts: 122
Registered: 11/21/11
Re: Please critique my scheme for re-weighting source data
Posted: Feb 24, 2012 6:43 AM

As the others have said, you would need to weight your data according to the
relative number of words the "average well educated person" hears or reads
from each source. You have no practical way of determining that, because as
far as I know there has been no study of how "average well educated people"
spend their time absorbing information.

You could guess, but the answers you eventually get will be meaningless,
because they will reflect your weightings just as much as the data, and you
have no principled way of choosing the weights. Or rather, each possible
weighting just reflects your arbitrary assumption of what an "average
educated person" sees or hears.
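To make the point concrete, here is a minimal Python sketch, assuming the scheme amounts to a weighted average of each word's relative frequency in each source. The source names, word counts and both sets of weights are invented for illustration only; nothing here comes from real data. Two guesses that look equally plausible give different rankings of the same words.

```python
# Hypothetical per-source word counts (invented numbers, for illustration only).
COUNTS = {
    "newspapers": {"budget": 120, "hypothesis": 5, "episode": 30},
    "journals":   {"budget": 10, "hypothesis": 200, "episode": 1},
    "tv":         {"budget": 20, "hypothesis": 2, "episode": 150},
}

def relative_freqs(counts):
    """Convert raw counts in one source to relative frequencies (summing to 1)."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def weighted_score(counts_by_source, weights):
    """Weighted average of per-source relative frequencies.

    `weights` maps source name -> non-negative weight. The weights are
    normalised here, so only their ratios matter.
    """
    total_w = sum(weights.values())
    scores = {}
    for source, counts in counts_by_source.items():
        w = weights.get(source, 0.0) / total_w
        for word, f in relative_freqs(counts).items():
            scores[word] = scores.get(word, 0.0) + w * f
    return scores

def ranking(scores):
    """Words sorted from most to least likely under the given scores."""
    return sorted(scores, key=scores.get, reverse=True)

# Two arbitrary guesses at how an "average educated person" splits their time.
guess_a = {"newspapers": 0.6, "journals": 0.3, "tv": 0.1}
guess_b = {"newspapers": 0.2, "journals": 0.0, "tv": 0.8}

print(ranking(weighted_score(COUNTS, guess_a)))  # ['budget', 'hypothesis', 'episode']
print(ranking(weighted_score(COUNTS, guess_b)))  # ['episode', 'budget', 'hypothesis']
```

Nothing in the counts tells you which guess is right; the final ranking is decided by the weights you chose, not by the data.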

For example (somebody else's example), the average educated person spends
zero time reading academic journals, so that weight is zero. The amount of
time spent on the internet, reading novels, watching TV, etc. varies
enormously from one "average educated person" to another. How do you weight
words that appear in incidental dialogue on some reality TV show against words
read on a web page? These are completely different media.

You have (I believe) zero chance of producing numbers that actually mean
something by weighting the data and adding it together. You are weighting
and adding apples and oranges.

The interesting thing is how the vocabularies vary between media; add the
sources together and you throw away the most interesting data.
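A sketch of keeping the sources apart instead, again assuming you have word counts per source (the counts below are made up). It scores each word by how much its relative frequency differs between sources, using log2 of the max/min ratio with add-one smoothing. That particular measure of spread is just one possible choice, picked for illustration; it was not proposed in this thread.

```python
import math

# Hypothetical per-source word counts (invented numbers, for illustration only).
COUNTS = {
    "newspapers": {"budget": 120, "hypothesis": 5, "said": 300},
    "journals":   {"budget": 10, "hypothesis": 200, "said": 40},
    "tv":         {"budget": 20, "hypothesis": 2, "said": 250},
}

def per_source_freqs(counts_by_source, smoothing=1.0):
    """Relative frequency of every word in every source.

    Adds `smoothing` to each count so a word missing from one source does not
    get a zero frequency (which would make the log-ratio below infinite).
    """
    vocab = {w for counts in counts_by_source.values() for w in counts}
    freqs = {}
    for source, counts in counts_by_source.items():
        total = sum(counts.values()) + smoothing * len(vocab)
        freqs[source] = {w: (counts.get(w, 0) + smoothing) / total for w in vocab}
    return freqs

def spread(counts_by_source):
    """log2(max / min) of each word's relative frequency across sources.

    0 means the word is equally common in every source; each +1 doubles the
    gap between the source that uses it most and the one that uses it least.
    """
    freqs = per_source_freqs(counts_by_source)
    vocab = next(iter(freqs.values())).keys()
    return {
        w: math.log2(max(f[w] for f in freqs.values()) / min(f[w] for f in freqs.values()))
        for w in vocab
    }

# Print words from most to least source-dependent.
for word, s in sorted(spread(COUNTS).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:12s} {s:5.2f}")
```

Here "hypothesis" comes out as by far the most source-dependent word, which is exactly the kind of information a single blended score hides.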

If you are doing this because somebody asked you to, you need to either get
a lot more information from them (about why you are doing it) or tell them
it's impossible.



"Jennifer Murphy" <JenMurphy@jm.invalid> wrote in message
news:so3dk713f73qgao2ba5ek4f8vnus6h63qg@4ax.com...
> On Thu, 23 Feb 2012 13:56:58 -0500, Rich Ulrich
> <rich.ulrich@comcast.net> wrote:
>

>>You give no hint, that I notice, of what it is that you
>>are trying to accomplish.
>>
>>For most purposes of inference that come to my mind,
>>the extreme cases -- the ones that you seem to propose
>>to drop -- are the most informative and most interesting.
>>So I conclude that your interests are probably the opposite
>>(in some fashion) from what my naive interests would be.
>>
>>I repeat-- What are you trying to do?

>
> I am trying to calculate for each word the relative likeliness that it
> would be encountered by an average well-educated person in their daily
> activities: reading the paper, listening to the news, attending classes,
> talking to other people, reading books, etc.
>
> The raw scores that I have already do that, but I question the
> weighting. I do not think that the average person encounters the types of
> words typically found in academic journals at the same frequency as they
> would those found in newspapers or magazines. Therefore, I want to
> re-weight the five sources to reflect a more average experience.







© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.