On Sun, 27 Oct 2013 12:20:03 -0700, Jennifer Murphy <JenMurphy@jm.invalid> wrote:
>There are many lists containing rankings of great books. Some are
>limited to a particular genre (historical novels, biographies, science
>fiction). Others are more general. Some are fairly short (50-100 books).
>Others are much longer (1,001 books).
>
>Is there a way to "average" the data from as many of these lists as
>possible to get some sort of composite ranking of all of the books that
>appear in any of the lists?
>
>I took a crack at it with a spreadsheet, but ran into problems. I will
>explain it briefly here.
>
>If the lists are all the same length and include exactly the same
>books, the solution is relatively simple (I think). I can just average
>the ranks. I can even add a weighting factor to each list to adjust the
>influence on the composite ranking up or down.
>
>I ran into problems when the lists are of different lengths and contain
>different books. I could not think of a way to calculate a composite
>ranking (or rating) when the lists do not all contain the same books.
>
>Another complication is that at least one of the lists is unranked (The
>Time 100). Is there any way to make use of that list?
>
>I created a PDF document with some tables illustrating what I have
>tried. Here's the link to the DropBox folder:
>
>https://www.dropbox.com/sh/yrckul6tsrbp23p/zNHXxSdeOH
I want to thank everyone who contributed to this thread. I now have a method that seems to work very well. The #1 ranked book in each list is assigned a score of 1. A discount factor, F, is then used to assign progressively smaller scores to each lower-ranked book. A weighting factor is applied to each list, and a book's final score is the sum of its weighted, discounted scores across all the lists it appears in.
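For anyone curious, the method can be sketched in a few lines. This is not the VBA UDF from the PDF, just a Python sketch; the lists, weights, and discount factor below are made-up illustration values.

```python
# Composite ranking sketch: rank 1 in each list scores 1, rank r scores
# F**(r-1), each list's scores are multiplied by that list's weight, and
# a book's final score is the sum across all lists it appears in.
from collections import defaultdict

def composite_scores(ranked_lists, weights, discount):
    """ranked_lists: dict of list name -> titles in rank order.
    weights: dict of list name -> weighting factor for that list.
    discount: the factor F, between 0 and 1."""
    totals = defaultdict(float)
    for name, titles in ranked_lists.items():
        w = weights[name]
        for rank, title in enumerate(titles, start=1):
            totals[title] += w * discount ** (rank - 1)
    return dict(totals)

# Illustration only: hypothetical lists, weights, and F.
lists = {
    "List A": ["Book X", "Book Y", "Book Z"],
    "List B": ["Book Y", "Book X"],
}
weights = {"List A": 1.0, "List B": 0.5}
scores = composite_scores(lists, weights, discount=0.8)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Note that books missing from a list simply contribute nothing from that list, which is what makes lists of different lengths and contents comparable.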
I wrote a little summary of the method for future reference. I have put a new PDF document of that summary in the DropBox folder. (See link above.) It contains sample data and the code for the VBA macro (UDF) that does the calculations.
The final solution was a combination of suggestions from several people. A special thanks to them.
Virgil suggested scaling the ranks. This got me thinking about the whole concept of normalizing the disparate scales.
Ben Bacarisse pointed out that high rankings are likely to be more significant than lower ones and suggested a negative exponential (geometric) scaling formula.
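Concretely, with a hypothetical discount factor of F = 0.9, the geometric scheme scores rank r as F**(r-1), so rank 1 scores 1 and each lower rank scores 0.9 times the rank above it:

```python
# Geometric (negative-exponential) discounting: differences between the
# top ranks count for more than differences far down the list.
F = 0.9  # illustrative choice; the actual F is a tuning parameter
scores = [F ** (rank - 1) for rank in range(1, 6)]
# rank 1 scores 1.0, rank 2 scores 0.9, rank 3 about 0.81, and so on
```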
James Waldby provided several sources to back up his assertion that it is not statistically sound to average rankings. He then suggested, as an alternative method, converting the rankings to scores based on some discrete criteria and adding those up. This was probably the most significant breakthrough concept (for me). I merely replaced his discrete criteria with Ben's continuous geometric discounting.
The resulting method seems to me to work quite well.
Several other people suggested methods that I didn't fully understand and/or didn't have time to explore.
Thank you all very much. This has been fun.
I would be interested in any comments on this solution.