> On Sun, 27 Oct 2013 23:46:59 +0000, Ben Bacarisse <firstname.lastname@example.org>
> wrote:
>>>>In article <email@example.com>,
>>>> Jennifer Murphy <JenMurphy@jm.invalid> wrote:
>>>>
>>>>> There are many lists containing rankings of great books. Some are
>>>>> limited to a particular genre (historical novels, biographies, science
>>>>> fiction). Others are more general. Some are fairly short (50-100 books).
>>>>> Others are much longer (1,001 books).
>>>>>
>>>>> Is there a way to "average" the data from as many of these lists as
>>>>> possible to get some sort of composite ranking of all of the books that
>>>>> appear in any of the lists?
<snip>
>>Here's another idea to add to the mix. Some rankings (and I think this
>>is one) have the property that the top is more significant than the bottom.
>>Anyone who picks a book to be no. 1 should have carefully weighed it up
>>against no. 2 and no. 3. But what about no. 1001? How likely is it
>>that some tiny alteration in the assessment might make it no. 995 or
>>998? And this effect is related to the absolute length of the list, not
>>just the relative position within it.
>
> That's a great point. I guess it would depend on how the rankings were
> compiled. Suppose 100 prominent "authorities" on literature were asked
> to provide their top 10 or top 25 or even top 50 great books. Then a
> master list of all books on any list was sent to the same people and
> they were asked to rank them.
>
>>Put it another way, outstanding things stand out. Once you are into the
>>more run of the mill, the distinctions become less significant.
>>
>>As a result, you might consider a negative exponential weighting -- a
>>ranking of R is given a value of w^(R-1) with 0 < w < 1. Thus all first
>>positions are "worth" 1, all second positions are worth w, and all third
>>positions w^2 and so on.
>
> I think this is a geometric discounting process. Each successive rank is
> discounted by a factor of "w" (0 < w < 1):
Ah, is that a standard name, or just a description?
> R1 = 1
> R2 = R1 * w
> R3 = R2 * w
> R4 = R3 * w
>
> This has the attractive property that every Rank K book on any list will
> get the same score, regardless of the length of the list.
>
> Now, do I average these scores or just add them up? If I average them,
> then I have to figure out what to do with the books that are not on a
> list. I could give them the score equal to the length of the list + 1.
> For a short list, that could be way too high.
>
> If I add them up, missing books get a zero score, which seems
> appropriate. For a short list, that could be too low.
I'd average each book's scores over the number of lists it appears in. That seems fair: if my latest book is really the best thing written this year, it will appear as no. 1 in all the critics' lists, but it's no better than the no. 1 from ten years ago, when only two lists were published. Adding the scores up would reward it simply for there being more lists.
If you want to get a bit more data from the number of lists involved, score each book as a pair: (average score, number of lists). You can then use the second value to break a tie.
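To make that concrete, here is a rough Python sketch of the scheme: each rank R scores w^(R-1), a book's scores are averaged over the lists it appears in, and the number of lists is kept as the second element of the pair to break ties. The function name, the choice of w = 0.9 and the sample lists are all made up for illustration; none of it comes from real data.

```python
from collections import defaultdict

def composite_ranking(lists, w=0.9):
    """Rank books by the pair (average geometric score, number of lists).

    lists: iterable of ranked lists, best book first.
    w:     discount factor, 0 < w < 1 (0.9 is an arbitrary assumption).
    """
    if not 0 < w < 1:
        raise ValueError("w must be strictly between 0 and 1")
    scores = defaultdict(list)
    for ranked in lists:
        for rank, book in enumerate(ranked, start=1):
            scores[book].append(w ** (rank - 1))
    # Average only over the lists a book appears in; keep the count too.
    pairs = {book: (sum(s) / len(s), len(s)) for book, s in scores.items()}
    # Tuples compare element by element, so the list count breaks ties.
    return sorted(pairs.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample lists, best book first.
sample_lists = [
    ["A", "B"],
    ["A", "C"],
    ["D", "B"],
]

ranking = composite_ranking(sample_lists)
for book, (average, count) in ranking:
    print(f"{book}: average {average:.2f} over {count} list(s)")
```

In this toy data A and D both average 1.0, and A comes first only because it appears on two lists rather than one. It also shows the obvious weakness: a book that is no. 1 on a single list beats one that is no. 2 on many, so you may want to look at the counts as well as the order.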
> Do you prefer this to an arithmetic discounting?
I don't know. I'd try both and see if I get a feel for what looks right with real data.
> Suppose I subtract 1/N
> from each rank, where N = the length of the longest list. If my longest
> list has 5 entries, the ranks would be:
>
>    1   1 - 0/5 = 1 - 0.0 = 1.0
>    2   1 - 1/5 = 1 - 0.2 = 0.8
>    3   1 - 2/5 = 1 - 0.4 = 0.6
>    4   1 - 3/5 = 1 - 0.6 = 0.4
>    5   1 - 4/5 = 1 - 0.8 = 0.2
>    6   1 - 5/5 = 1 - 1.0 = 0.0
Certainly worth trying, but it weights all the "gaps" as being the same, which my gut feeling tells me is not what people do with lists (this is my point about stand-out things standing out, and mediocre things all being much of a muchness).
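A small Python sketch of the difference in the gaps, in case it helps. It uses N = 5 as in your table; w = 0.8 is my own arbitrary choice, picked only so that the first gap is the same size under both schemes, and the function names are illustrative.

```python
def geometric_score(rank, w=0.8):
    """Geometric discounting: rank R scores w^(R-1)."""
    return w ** (rank - 1)

def arithmetic_score(rank, n=5):
    """Arithmetic discounting: rank R scores 1 - (R-1)/N."""
    return 1 - (rank - 1) / n

n = 5
# Difference in score between each rank and the one below it.
geometric_gaps = [geometric_score(r) - geometric_score(r + 1)
                  for r in range(1, n + 1)]
arithmetic_gaps = [arithmetic_score(r, n) - arithmetic_score(r + 1, n)
                   for r in range(1, n + 1)]

for r, (g, a) in enumerate(zip(geometric_gaps, arithmetic_gaps), start=1):
    print(f"rank {r} -> {r + 1}: geometric gap {g:.3f}, arithmetic gap {a:.3f}")
```

The arithmetic gaps stay at 1/N all the way down, while the geometric ones shrink by a factor of w at each step, so the difference between no. 1 and no. 2 counts for more than the difference between no. 5 and no. 6.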