Virgil
Posts: 9,012
Registered: 1/6/11


Re: Is there a way to calculate an average ranking from uneven lists?
Posted: Oct 28, 2013 3:14 AM


In article <j0sr695ffuelprtlh8akljk2118t3buhpl@4ax.com>, Jennifer Murphy <JenMurphy@jm.invalid> wrote:
> On Sun, 27 Oct 2013 17:01:48 -0600, Virgil <virgil@ligriv.com> wrote:
>
> >In article <2ivq699o8a81ppiu5qognbecbgm9et2sov@4ax.com>,
> > Jennifer Murphy <JenMurphy@jm.invalid> wrote:
> >
> >> On Sun, 27 Oct 2013 13:36:29 -0600, Virgil <virgil@ligriv.com> wrote:
> >>
> >> >In article <chpq69prq63kh364qqmphkmqedhgm5ti6h@4ax.com>,
> >> > Jennifer Murphy <JenMurphy@jm.invalid> wrote:
> >> >
> >> >> There are many lists containing rankings of great books. Some are
> >> >> limited to a particular genre (historical novels, biographies, science
> >> >> fiction). Others are more general. Some are fairly short (50-100 books).
> >> >> Others are much longer (1,001 books).
> >> >>
> >> >> Is there a way to "average" the data from as many of these lists as
> >> >> possible to get some sort of composite ranking of all of the books that
> >> >> appear in any of the lists?
> >>
> >> >One way to compare rankings when there are different numbers of objects
> >> >ranked in different rankings is to scale them all over the same range,
> >> >such as from 0% to 100%.
> >> >
> >> >Thus in all rankings a lowest rank would rank 0% and the highest 100%,
> >> >and the middle one, if there were one, would rank 50%.
> >> >Four items with no ties would rank 0%, 33 1/3%, 66 2/3% and 100%,
> >> >and so on.
> >> >
> >> >For something of rank r out of n ranks use (r-1)/(n-1) times 100%.
> >>
> >> In the lists I have, the highest ranking entity is R=1, the lowest is
> >> R=N. For that, I think the formula is (N-R)/(N-1). No?
> >
> >Works for me!
> >>
> >> Two questions:
> >>
> >> 1. Do I then just average the ranks across the lists?
> >
> >That ought to work, but the effect of your averaging will be to compress
> >the pattern of rankings towards 0.5 with fewer near either 1 or 0.
>
> Yes, but isn't this what we want? Are you suggesting that this is a
> problem?

No, but it is an effect you should be prepared for.

>
> If a book is ranked high and low on different lists, then the "average"
> rank would be more in the middle. If a book is close to the top in most
> lists, then the average ranking would be closer to the top.
>
> The term "regression to the mean" comes to mind...
>
> >> 2. What scaled rank do I use for a book that is not ranked in a list?
> >
> >If no preferences are evident, I would either leave it out entirely
>
> Are you suggesting that the composite list only include books that are
> on ALL lists? That would have the effect of making the final list
> smaller and smaller as the number of lists increases. This is the
> opposite effect that I want to achieve.

What I meant was to give any book not mentioned in a particular list the lowest possible score for that list, or even to invent a "lower than mentioned" score for each list.

>
> >or give each book mentioned the same score of 0.5 (or 50%).
>
> Do you mean that we add all of the books that are on any list to all of the
> lists and assign any that do not have a ranking the 0.5 value? On a list
> of 1,000 books, this would have the effect of giving a book that did not
> even make the list a ranking higher than half of the books that did.

No, what I mean is something like giving each book a plus score for each mention but no score at all for a non-mention.

>
> Let's consider some actual data. Here are 3 sample lists each containing
> 5 books, but not the same 5 books:
>
> Rank  List 1  List 2  List 3
>   1     A       B       F
>   2     B       A       H
>   3     C       E       C
>   4     D       G       D
>   5     E       D       A
>
> When listed by book, the data looks like this:
>
>         List 1  List 2  List 3
> Books    Rank    Rank    Rank
> Book A     1       2       5
> Book B     2       1
> Book C     3               3
> Book D     4       5       4
> Book E     5       3
> Book F                     1
> Book G             4
> Book H                     2
>
> How would you calculate average rankings?

Any way you try will have several drawbacks. Clearly books D and E were reviewed fairly well and book G was not, but the reason why a particular reviewer did or did not review a particular book may well have nothing at all to do with the book's quality. It may instead be related to its subject matter, its publisher, or sheer happenstance.
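
To make the options discussed in this thread concrete, here is a minimal Python sketch run on the three sample lists. It is only an illustration under stated assumptions, not a recommended method. `scaled_score` is the (N-R)/(N-1) formula agreed on above. `mention_score` is a hypothetical "plus score" of (N-R+1)/N, chosen only so that a last-place mention still beats a non-mention. The `missing` parameter is likewise an assumed name for whatever score an unlisted book receives.

```python
from collections import defaultdict

# Sample lists from the thread, each ordered from rank 1 (best) downward.
LISTS = {
    "List 1": ["A", "B", "C", "D", "E"],
    "List 2": ["B", "A", "E", "G", "D"],
    "List 3": ["F", "H", "C", "D", "A"],
}


def scaled_score(rank, n):
    """(N-R)/(N-1) from the thread: 1.0 for the top book, 0.0 for the last."""
    if n < 2:
        raise ValueError("need at least two ranked items")
    return (n - rank) / (n - 1)


def mention_score(rank, n):
    """Hypothetical 'plus score' (N-R+1)/N: positive even for last place."""
    return (n - rank + 1) / n


def composite(lists, score, missing=None):
    """Average score per book across the lists.

    missing=None  -> average only over the lists that mention the book
    missing=value -> use that value for every list that omits the book
    """
    all_books = {book for books in lists.values() for book in books}
    per_book = defaultdict(list)
    for books in lists.values():
        n = len(books)
        ranks = {book: i + 1 for i, book in enumerate(books)}
        for book in all_books:
            if book in ranks:
                per_book[book].append(score(ranks[book], n))
            elif missing is not None:
                per_book[book].append(missing)
    return {book: sum(vals) / len(vals) for book, vals in per_book.items()}


def show(title, result):
    """Print books from highest to lowest composite score."""
    print(title)
    for book, s in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"  {book}: {s:.3f}")


show("Mean over mentions only", composite(LISTS, scaled_score))
show("Unlisted books score 0", composite(LISTS, scaled_score, missing=0.0))
show("Plus score per mention", composite(LISTS, mention_score, missing=0.0))
```

The choice for unlisted books matters a great deal. Book B averages 0.875 over the two lists that mention it, but drops to about 0.583 once its absence from List 3 counts as 0. Book F, ranked first on the only list that mentions it, scores a perfect 1.0 under the first option and about 0.333 under the second.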

