The Math Forum






Topic: Is there a way to calculate an average ranking from uneven lists?
Replies: 15   Last Post: Oct 30, 2013 12:18 PM

Graham Cooper

Posts: 4,495
Registered: 5/20/10
Re: Is there a way to calculate an average ranking from uneven lists?
Posted: Oct 28, 2013 2:56 AM

On Sunday, October 27, 2013 11:42:34 PM UTC-7, graham...@gmail.com wrote:
> On Sunday, October 27, 2013 11:27:03 PM UTC-7, Jennifer Murphy wrote:
> > On Mon, 28 Oct 2013 00:23:32 +0000 (UTC), James Waldby
> > <not@valid.invalid> wrote:
> > >On Sun, 27 Oct 2013 14:06:56 -0700, Jennifer Murphy wrote:
> > >> On Sun, 27 Oct 2013 13:36:29 -0600, Virgil wrote:
> > >>> Jennifer Murphy <JenMurphy@jm.invalid> wrote:
> > >>>
> > >>>> There are many lists containing rankings of great books. Some are
> > >>>> limited to a particular genre (historical novels, biographies, science
> > >>>> fiction). Others are more general. Some are fairly short (50-100 books).
> > >>>> Others are much longer (1,001 books).
> > >>>>
> > >>>> Is there a way to "average" the data from as many of these lists as
> > >>>> possible to get some sort of composite ranking of all of the books that
> > >>>> appear in any of the lists?
> > >[snip]
> > >>>> I ran into problems when the lists are of different lengths and contain
> > >>>> different books. I could not think of a way to calculate a composite
> > >>>> ranking (or rating) when the lists do not all contain the same books.
> > >>>>
> > >>>> Another complication is that at least one of the lists is unranked (The
> > >>>> Time 100). Is there any way to make use of that list?
> > >>>>
> > >>>> I created a PDF document with some tables illustrating what I have
> > >>>> tried. Here's the link to the DropBox folder:
> > >>>> https://www.dropbox.com/sh/yrckul6tsrbp23p/zNHXxSdeOH
> > >>>
> > >>>One way to compare rankings when there are different numbers of objects
> > >>>ranked in different rankings is to scale them all over the same range,
> > >>>such as from 0% to 100%.
> > >>>
> > >>>Thus in all rankings a lowest rank would rank 0% and the highest 100%,
> > >>>and the middle one, if there were one, would rank 50%.
> > >>>Four items with no ties would rank 0%, 33 1/3%, 66 2/3% and 100%,
> > >>>and so on.
> > >>>
> > >>>For something of rank r out of n ranks use (r-1)/(n-1) times 100%.
> > >>
> > >> In the lists I have, the highest ranking entity is R=1, the lowest is
> > >> R=N. For that, I think the formula is (N-R)/(N-1). No?
> > >>
> > >> Two questions:
> > >>
> > >> 1. Do I then just average the ranks across the lists?
> > >>
> > >> 2. What scaled rank do I use for a book that is not ranked in a list?
> > >
> > >For the given problem, averages of ranks probably aren't a statistically
> > >sound approach. For example, see the "Qualitative description" section
> > >of article <http://en.wikipedia.org/wiki/Rating_scale>, which says:
> > >"User ratings are at best ordinal categorizations. While it is not
> > >uncommon to calculate averages or means for such data, doing so
> > >cannot be justified because in calculating averages, equal intervals
> > >are required to represent the same difference between levels of perceived
> > >quality. The key issues with aggregate data based on the kinds of rating
> > >scales commonly used online are as follow: Averages should not be
> > >calculated for data of the kind collected." (etc.)
> >
> > Yes, I did feel a little uneasy about averaging numbers that are not
> > really numerical in the usual sense.
> >
> > >Also see <http://en.wikipedia.org/wiki/Polytomous_Rasch_model> which in
> > >its "The model" section has some statistical analysis that might (or might
> > >not) apply. Also see <http://en.wikipedia.org/wiki/Likert_scale> and
> > >some pages listed at <http://en.wikipedia.org/wiki/Category:Psychometrics>.
> > >
> > >Here's an approach to consider: Set up some criteria for giving points
> > >to various books, and give each book a total score based on the number of
> > >criteria it meets when all the lists are considered. For each list, each
> > >book gets 1 point for each criterion that it meets. Sort the resulting
> > >scores from large to small.
> > >
> > >Here's an example of a possible set of criteria: { in first place; in top 2;
> > >in top 5; in top 10; in top 20; in top 40; in top 80; on list}.
> > >
> > >For example, if list 1 is { #1 Emma; #2 Mrs. Dalloway; #3 Anna Karenina;
> > >#4 Lolita; #5 Salome; #6 Vera} and list 2 is { #1 Emma; #2 Persuasion;
> > >#3 Northanger Abbey}, then Emma scores 16;
> >
> > Do you give a book a score for being in the top 80 even if the list only
> > has 50 or 10 entries?
> >
> > >Mrs. Dalloway and Persuasion
> > >score 7; Anna Karenina, Northanger Abbey, Lolita, and Salome score 6;
> > >Vera scores 5. Perhaps it would work better with more and larger lists.
> >
> > This is a very creative solution. I like that it is additive. This
> > completely eliminates the problem of what to do with books that are
> > missing from the list.
> >
> > What would you say to combining your idea with Ben's? Give each #1 book
> > a score of "1". Give each lower-ranked book on each list a discounted
> > score (geometrically or arithmetically). Then just add them up?
> >
> > >Anyhow, make up a set of criteria, run all your lists against it, and
> > >if the results aren't right, change the criteria until they are.
> >
> > I think I'll do just that. :-)
>
> There is a method to rank ALL the books; I used it to rank horses and pick trifectas!
>
> All you need is to calculate a SCALAR MULTIPLE for each ranking service.
>
> Each book is assigned a starting value of 1.
> Each ranking site (horse race) is assigned a starting value of 1.
>
> Select a random book.
> Calculate its % rank in that list.
> Increase or decrease the book's value PARTIALLY to decrease the error.
>
> Select a random ranking site.
> Multiply all the books' ranks.
> Calculate the WEIGHT of the ranking site and adjust that PARTIALLY to decrease the error.
>
> This will jiggle all the books' scores and all the sites' weights
> until the errors reduce to a minimum.
>
> It takes about 1/2 hour to settle with 10,000 books (horses),
> and you gradually decrease the amount you change (the PARTIAL CHANGE bit)
> so it settles on the optimum solution without jiggling too much towards the end.
>
> See SIMULATED ANNEALING.
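For what it's worth, the scaling formula and the criterion-point scoring quoted above can be sketched in a few lines of Python. The function names are my own, and how to treat the "top 80" criterion on a short list is one possible answer to the question raised in the thread, noted in a comment:

```python
# A sketch of two quoted ideas, not anyone's definitive method.

def scaled_rank(r, n):
    """Scale rank r (1 = best) of n items onto 0..100, per (N-R)/(N-1)."""
    return 100.0 if n == 1 else 100.0 * (n - r) / (n - 1)

# The example criteria: {first place; top 2; top 5; top 10; top 20;
# top 40; top 80; on list}.
CUTOFFS = [1, 2, 5, 10, 20, 40, 80]

def criteria_score(rank):
    # One point per cutoff met, plus one for being on the list at all.
    # Note: a #1 book on a 10-entry list still collects the "top 80"
    # point here -- one possible answer to the short-list question.
    return sum(1 for c in CUTOFFS if rank <= c) + 1

def total_scores(ranked_lists):
    """Sum criterion points for each book over all the lists."""
    totals = {}
    for ranking in ranked_lists:
        for rank, book in enumerate(ranking, start=1):
            totals[book] = totals.get(book, 0) + criteria_score(rank)
    return totals

list1 = ["Emma", "Mrs. Dalloway", "Anna Karenina", "Lolita", "Salome", "Vera"]
list2 = ["Emma", "Persuasion", "Northanger Abbey"]
scores = total_scores([list1, list2])
# Reproduces the quoted scores: Emma 16; Mrs. Dalloway and Persuasion 7;
# Anna Karenina, Northanger Abbey, Lolita, Salome 6; Vera 5.
```

Books missing from a list simply earn nothing from it, which is the additive property praised in the thread.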





I think with book rating sites a Spread Factor could be calculated too.


So the book's total rank would be:

BOOK-RANK = LISTED-RANK * SITE-WEIGHT-TOP / SITE-WEIGHT-BOTTOM


although I would get that working last; get


BOOK-RANK = LISTED-RANK * SITE-WEIGHT


working first, as calculating the spread for each site would be tricky,
but it should reduce the error of fitting the data significantly.
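A minimal sketch of the "jiggle it until it settles" fitting, using the simpler BOOK-RANK = LISTED-RANK * SITE-WEIGHT model with no spread factor. The site names and scores below are made up for illustration, and the damped alternating updates with a shrinking step stand in for a full simulated-annealing schedule:

```python
import random

random.seed(0)

# Made-up per-site scaled scores, 1.0 = best; sites cover different books.
site_scores = {
    "siteA": {"Emma": 1.0, "Mrs. Dalloway": 0.8, "Anna Karenina": 0.6, "Vera": 0.1},
    "siteB": {"Emma": 1.0, "Anna Karenina": 0.7, "Persuasion": 0.5},
}

books = sorted({b for sc in site_scores.values() for b in sc})
v = {b: 1.0 for b in books}        # every book starts at value 1
w = {s: 1.0 for s in site_scores}  # every site starts at weight 1

step = 0.1
for _ in range(20000):
    s = random.choice(sorted(site_scores))   # pick a random site...
    b = random.choice(sorted(site_scores[s]))  # ...and a random book on it
    err = v[b] * w[s] - site_scores[s][b]    # current fitting error
    v[b] -= step * err * w[s]   # PARTIAL change toward less error
    w[s] -= step * err * v[b]   # ditto for the site's weight
    step *= 0.9999              # shrink the jiggle so it settles

# Composite ranking: books ordered by fitted value, best first.
ranking = sorted(books, key=lambda b: v[b], reverse=True)
```

Books the sites disagree on (Anna Karenina here) end up with a compromise value, and a site whose scores run systematically high or low gets that absorbed into its weight.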


Herc




© The Math Forum at NCTM 1994-2017. All Rights Reserved.