
Re: Is there a way to calculate an average ranking from uneven lists?
Posted: Oct 30, 2013 12:06 PM


On Mon, 28 Oct 2013 00:23:32 +0000 (UTC), James Waldby <not@valid.invalid> wrote:
>On Sun, 27 Oct 2013 14:06:56 0700, Jennifer Murphy wrote: >> On Sun, 27 Oct 2013 13:36:29 0600, Virgil wrote:> >>> Jennifer Murphy <JenMurphy@jm.invalid> wrote: >>> >>>> There are many lists containing rankings of great books. Some are >>>> limited to a particular genre (historical novels, biographies, science >>>> fiction). Others are more general. Some are fairly short (50100 books). >>>> Others are much longer (1,001 books). >>>> >>>> Is there a way to "average" the data from as many of these lists as >>>> possible to get some sort of composite ranking of all of the books that >>>> appear in any of the lists? >[snip] >>>> I ran into problems when the lists are of different lengths and contain >>>> different books. I could not think of a way to calculate a composite >>>> ranking (or rating) when the lists do not all contain the same books. >>>> >>>> Another complication is that at least one of the lists is unranked (The >>>> Time 100). Is there any way to make use of that list? >>>> >>>> I created a PDF document with some tables illustrating what I have >>>> tried. Here's the link to the DropBox folder: >>>> https://www.dropbox.com/sh/yrckul6tsrbp23p/zNHXxSdeOH >>> >>>One way to compare rankings when there are different numbers of objects >>>ranked in different rankings is to scale them all over the same range, >>>such as from 0% to 100%. >>> >>>Thus in all rankings a lowest rank would rank 0% and the highest 100%, >>>and the middle one, if there were one, would rank 50%. >>>Four items with no ties would rank 0%, 33 1/3%, 66 2/3% and 100%, >>>and so on. >>> >>>For something of rank r out of n ranks use (r1)/(n1) times 100%. >> >> In the lists I have, the highest ranking entity is R=1, the lowest is >> R=N. For that, I think the formula is (NR)/(N1). No? >> >> Two questions: >> >> 1. Do I then just average the ranks across the lists? >> >> 2. What scaled rank do I use for a book that is not ranked in a list? 
>
>For the given problem, averages of ranks probably aren't a statistically
>sound approach. For example, see the "Qualitative description" section
>of article <http://en.wikipedia.org/wiki/Rating_scale>, which says:
>"User ratings are at best ordinal categorizations. While it is not
>uncommon to calculate averages or means for such data, doing so
>cannot be justified because in calculating averages, equal intervals
>are required to represent the same difference between levels of perceived
>quality. The key issues with aggregate data based on the kinds of rating
>scales commonly used online are as follow: Averages should not be
>calculated for data of the kind collected." (etc.)
>
>Also see <http://en.wikipedia.org/wiki/Polytomous_Rasch_model> which in
>its "The model" section has some statistical analysis that might (or might
>not) apply. Also see <http://en.wikipedia.org/wiki/Likert_scale> and
>some pages listed at <http://en.wikipedia.org/wiki/Category:Psychometrics>.
>
>Here's an approach to consider: Set up some criteria for giving points
>to various books, and give each book a total score based on the number of
>criteria it meets when all the lists are considered. For each list, each
>book gets 1 point for each criterion that it meets. Sort the resulting
>scores from large to small.
>
>Here's an example of a possible set of criteria: { in first place; in top 2;
>in top 5; in top 10; in top 20; in top 40; in top 80; on list}.
>
>For example, if list 1 is { #1 Emma; #2 Mrs. Dalloway; #3 Anna Karenina;
>#4 Lolita; #5 Salome; #6 Vera} and list 2 is { #1 Emma; #2 Persuasion;
>#3 Northanger Abbey}, then Emma scores 16; Mrs. Dalloway and Persuasion
>score 7; Anna Karenina, Northanger Abbey, Lolita, and Salome score 6;
>Vera scores 5. Perhaps it would work better with more and larger lists.
>Anyhow, make up a set of criteria, run all your lists against it, and
>if the results aren't right, change the criteria until they are.
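For anyone following along, the criteria-based scheme quoted above can be sketched in a few lines of Python (a minimal sketch; the threshold set and the two example lists are the ones from the quoted post, and it reproduces its worked scores, e.g. Emma = 16):

```python
# Criteria-based composite scoring: per list, a book gets 1 point for each
# criterion it meets ("in first place; in top 2; ... in top 80"), plus 1
# point simply for being on the list. Scores are summed across all lists.
THRESHOLDS = [1, 2, 5, 10, 20, 40, 80]

def score_book(rank, thresholds=THRESHOLDS):
    """Points earned on a single list for a book at the given 1-based rank."""
    return sum(1 for t in thresholds if rank <= t) + 1  # +1 for "on list"

def composite_scores(lists):
    """Total each book's per-list points across all lists, largest first."""
    totals = {}
    for ranking in lists:
        for rank, book in enumerate(ranking, start=1):
            totals[book] = totals.get(book, 0) + score_book(rank)
    return sorted(totals.items(), key=lambda kv: -kv[1])

list1 = ["Emma", "Mrs. Dalloway", "Anna Karenina", "Lolita", "Salome", "Vera"]
list2 = ["Emma", "Persuasion", "Northanger Abbey"]
print(composite_scores([list1, list2]))
# Emma scores 16; Mrs. Dalloway and Persuasion 7; Anna Karenina,
# Northanger Abbey, Lolita, and Salome 6; Vera 5 -- as in the quoted post.
```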
After considering all of the suggestions and observations and running some simulations on some sample data, it looks like this is the breakthrough I needed.
Rather than defining specific, discrete criteria, I combined your suggestion with Ben's suggestion of calculating a rank score that starts at 1 for the #1 book and discounts lower-ranked books from there. Ben's continuous scoring criterion is smoother and easier to implement.
Your suggestion of adding up the scores completely eliminates the averaging problem, where a book could get a high composite ranking because of a high ranking on just one list while being missing entirely from the others.
Thanks for the help. If I get time, I'll post some sample data.
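In the meantime, here's a rough sketch of the continuous version I've been simulating. Ben's actual discount function isn't reproduced in this thread, so the geometric 0.95-per-rank factor below is just a placeholder; the point is that per-list scores are summed, not averaged, so a book absent from a list simply contributes 0 for that list:

```python
# Continuous rank scoring: the #1 book scores 1.0 and lower ranks are
# discounted from there. NOTE: the geometric discount is a placeholder
# assumption, not Ben's actual formula.
DISCOUNT = 0.95

def rank_score(rank, discount=DISCOUNT):
    """Score on one list: 1.0 at rank 1, discounted geometrically below."""
    return discount ** (rank - 1)

def composite(lists):
    """Sum (not average) each book's per-list scores across all lists.
    A book missing from a list contributes 0 for that list."""
    totals = {}
    for ranking in lists:
        for rank, book in enumerate(ranking, start=1):
            totals[book] = totals.get(book, 0.0) + rank_score(rank)
    return sorted(totals.items(), key=lambda kv: -kv[1])

list1 = ["Emma", "Mrs. Dalloway", "Anna Karenina", "Lolita", "Salome", "Vera"]
list2 = ["Emma", "Persuasion", "Northanger Abbey"]
ranked = composite([list1, list2])
# Emma, first on both lists, tops the composite with 1.0 + 1.0 = 2.0.
```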

