On 10/27/2013 03:20 PM, Jennifer Murphy wrote:
> There are many lists containing rankings of great books. Some are
> limited to a particular genre (historical novels, biographies, science
> fiction). Others are more general. Some are fairly short (50-100 books).
> Others are much longer (1,001 books).
>
> Is there a way to "average" the data from as many of these lists as
> possible to get some sort of composite ranking of all of the books that
> appear in any of the lists?
>
> I took a crack at it with a spreadsheet, but ran into problems. I will
> explain it briefly here.
>
> If the lists are all the same length and include exactly the same
> books, the solution is relatively simple (I think). I can just average
> the ranks. I can even add a weighting factor to each list to adjust the
> influence on the composite ranking up or down.
>
> I ran into problems when the lists are of different lengths and contain
> different books. I could not think of a way to calculate a composite
> ranking (or rating) when the lists do not all contain the same books.
>
> Another complication is that at least one of the lists is unranked (The
> Time 100). Is there any way to make use of that list?
>
> I created a PDF document with some tables illustrating what I have
> tried. Here's the link to the DropBox folder:
>
> https://www.dropbox.com/sh/yrckul6tsrbp23p/zNHXxSdeOH
>
I have a couple of ideas...
(1) The different lists have different criteria for inclusion or exclusion. They may not be explicit, but let's assume they are made explicit. An exclusion criterion "not poetry" can in principle be turned into a combination of "ors" and "inclusion factors", as
"not poetry" = "is novel" or "is non-fiction" or "is historical novel".
These selectors matter because Tolstoy's "War and Peace" would not appear in a list of "English literature" works: it is Russian literature, even though it has been translated into English and has received wide acclaim.
The idea would be to find all lists which, according to their explicit selection criteria, could include, say, "War and Peace" if every book in that category were ranked. Note that different lists which could include "War and Peace" will often have different criteria from one another.
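As a rough sketch of idea (1), assuming the inclusion criteria can be written down as sets of tags: the list names, tags, and the helper `eligible_lists` below are all invented for illustration, not taken from any real list.

```python
# Hypothetical inclusion criteria: a list admits a book if the book
# carries at least one of the tags the list accepts (the "ors" above).
LIST_CRITERIA = {
    "english_literature": {"english-language"},
    "world_novels": {"novel"},
    "non_fiction_100": {"non-fiction"},
}

# Hypothetical tags for one book.
BOOK_TAGS = {
    "War and Peace": {"novel", "historical novel", "russian-language"},
}


def eligible_lists(book, book_tags=BOOK_TAGS, list_criteria=LIST_CRITERIA):
    """Return the names of the lists whose criteria admit the book."""
    tags = book_tags[book]
    return sorted(
        name for name, admitted in list_criteria.items() if tags & admitted
    )


print(eligible_lists("War and Peace"))  # ['world_novels']
```

A book's absence would then only count against it on the lists where it was eligible in the first place; on the others it is simply "not applicable".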
(2) Consider calibrating between lists: say, if 10 out of 20 lists all included the novel "Moby Dick", then "Moby Dick" could be used as a sort of benchmark.
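One possible reading of idea (2), as a sketch only: measure each book by how many places it sits above or below the benchmark, and average that over the lists that contain the benchmark. The function name and the three toy rankings below are made up for illustration, and using a plain rank difference is my assumption, not an established method.

```python
from statistics import mean


def rank_relative_to_benchmark(lists, benchmark):
    """Average each title's rank offset from the benchmark title.

    Negative means "usually ranked ahead of the benchmark".
    Lists that do not contain the benchmark are skipped.
    """
    offsets = {}
    for ranking in lists:
        if benchmark not in ranking:
            continue  # cannot calibrate this list
        base = ranking.index(benchmark)
        for pos, title in enumerate(ranking):
            offsets.setdefault(title, []).append(pos - base)
    return {title: mean(values) for title, values in offsets.items()}


# Invented toy rankings, best book first.
lists = [
    ["Ulysses", "Moby Dick", "Middlemarch"],
    ["Moby Dick", "Middlemarch", "Ulysses", "Emma"],
    ["Emma", "Ulysses"],  # no benchmark, so it is ignored
]
scores = rank_relative_to_benchmark(lists, "Moby Dick")
for title in sorted(scores, key=scores.get):
    print(title, scores[title])
```

This handles lists of different lengths, but it throws away any list that lacks the benchmark, so in practice one would probably want several benchmark books rather than one.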
(3) My own observation with movies and books is that some of them seem designed to maximize sales, or to "target" a specific segment of readers and tastes, e.g. the Harlequin series, which, while "good reading for entertainment", can be read more easily than "Remembrance of Things Past", a multi-volume novel by French author Marcel Proust, <http://en.wikipedia.org/wiki/In_Search_of_Lost_Time>.