


Probability distribution fitting to idf data
Posted: Aug 10, 2006 2:51 PM


Hi all,

I am facing a statistical problem that involves fitting an appropriate probability distribution to a data set. I have a set of inverse document frequency (IDF) values (see [1]) for all the words in a document, and I need to truncate this array of values at some cutoff so that I keep only the most useful values and discard the irrelevant ones. I am looking for tools, algorithms, or other techniques that can be used to solve this problem. Please contact me at pandey.gaurav@gmail.com to discuss any ideas. I can also send a sample set of these values to anyone interested.
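To make the question concrete, here is a minimal sketch of one possible approach, not a settled method: fit a distribution to the IDF values and cut at an upper quantile of the fit. The normal distribution, the `keep_fraction` parameter, the function names, and the sample document frequencies are all assumptions made for illustration. Real IDF values may be skewed enough that a different family fits better.

```python
import math
from statistics import NormalDist


def idf(num_docs, doc_freq):
    """Classic IDF, log(N / n_i), for a term found in doc_freq of num_docs documents."""
    return math.log(num_docs / doc_freq)


def cutoff_by_fitted_normal(idf_values, keep_fraction=0.25):
    """Fit a normal distribution to the IDF values and return the threshold
    above which about keep_fraction of the fitted probability mass lies.

    Assumption: a normal fit is adequate. Needs at least two values that
    are not all identical, otherwise the fit has zero spread.
    """
    fitted = NormalDist.from_samples(idf_values)
    return fitted.inv_cdf(1.0 - keep_fraction)


def keep_top_terms(term_idf, keep_fraction=0.25):
    """Return only the terms whose IDF is at or above the fitted cutoff."""
    threshold = cutoff_by_fitted_normal(list(term_idf.values()), keep_fraction)
    return {term: value for term, value in term_idf.items() if value >= threshold}


# Hypothetical document frequencies in a made-up collection of 1000 documents.
NUM_DOCS = 1000
doc_freqs = {"the": 1000, "of": 950, "and": 900, "data": 300,
             "fitting": 120, "idf": 15, "robertson": 3}

term_idf = {term: idf(NUM_DOCS, df) for term, df in doc_freqs.items()}
kept = keep_top_terms(term_idf, keep_fraction=0.25)
print(sorted(kept))
```

The open part of the problem is exactly the two choices hard-coded here: which distribution family to fit, and which quantile to use as the cutoff.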
Thanks!
Gaurav Pandey
References:
[1] S. Robertson, "Understanding Inverse Document Frequency: On theoretical arguments for IDF", available at http://www.soi.city.ac.uk/~ser/idfpapers/Robertson_idf_JDoc.pdf



