On Sun, 27 Oct 2013 23:46:59 +0000, Ben Bacarisse <email@example.com> wrote:
>Jennifer Murphy <JenMurphy@jm.invalid> writes:
>
>> On Sun, 27 Oct 2013 13:36:29 -0600, Virgil <firstname.lastname@example.org> wrote:
>>
>>>In article <email@example.com>,
>>> Jennifer Murphy <JenMurphy@jm.invalid> wrote:
>>>
>>>> There are many lists containing rankings of great books. Some are
>>>> limited to a particular genre (historical novels, biographies, science
>>>> fiction). Others are more general. Some are fairly short (50-100 books).
>>>> Others are much longer (1,001 books).
>>>>
>>>> Is there a way to "average" the data from as many of these lists as
>>>> possible to get some sort of composite ranking of all of the books that
>>>> appear in any of the lists?
><snip>
>>>One way to compare rankings when there are different numbers of objects
>>>ranked in different rankings is to scale them all over the same range,
>>>such as from 0% to 100%.
>>>
>>>Thus in all rankings a lowest rank would rank 0% and the highest 100%,
>>>and the middle one, if there were one, would rank 50%.
>>>Four items with no ties would rank 0%, 33 1/3%, 66 2/3% and 100%,
>>>and so on.
>>>
>>>For something of rank r out of n ranks use (r-1)/(n-1) times 100%.
>>
>> In the lists I have, the highest ranking entity is R=1, the lowest is
>> R=N. For that, I think the formula is (N-R)/(N-1). No?
>
>Here's another idea to add to the mix. Some rankings (and I think this
>is one) have the property that the top is more significant than the
>bottom. Anyone who picks a book to be no. 1 should have carefully
>weighed it up against no. 2 and no. 3. But what about no. 1001? How
>likely is it that some tiny alteration in the assessment might make it
>no. 995 or 998? And this effect is related to the absolute length of
>the list, not just the relative position within it.
>
>Put it another way, outstanding things stand out. Once you are into
>the more run of the mill, the distinctions become less significant.
>
>As a result, you might consider a negative exponential weighting -- a
>ranking of R is given a value of w^(R-1) with 0 < w < 1. Thus all
>first positions are "worth" 1, all second positions are worth w, all
>third positions w^2, and so on.
This suggestion, in combination with James Waldby's suggestion to add up the scores rather than average them, looks like a very good solution. I've run some preliminary simulations and the results look great.
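In case it helps anyone following along, here's roughly what my simulation does (a minimal Python sketch; the book titles and w = 0.96 are placeholders, not my real data):

# Composite score: sum each book's geometrically discounted
# rank score across every list it appears on.
from collections import defaultdict

w = 0.96  # discount factor, 0 < w < 1

lists = [
    ["Ulysses", "Lolita", "The Great Gatsby"],     # one ranked list, best first
    ["The Great Gatsby", "Ulysses", "Moby-Dick"],  # another list
]

scores = defaultdict(float)
for ranking in lists:
    for rank, book in enumerate(ranking, start=1):
        scores[book] += w ** (rank - 1)  # Ben's w^(R-1) weighting

# Sum rather than average, per James Waldby's suggestion, so a book
# gets credit for every list it appears on.
for book, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print("%6.3f  %s" % (score, book))

Books that appear on more lists climb the composite simply by accumulating more terms, which is exactly the behavior I wanted.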
The only question I have is whether to use a geometric or an arithmetic progression for discounting lower-ranked books. I think I agree with you that the geometric (or exponential) progression does a better job of emphasizing the greater significance of the higher rankings.
Here's some sample data for ranks 1-25. The first column is the raw rank. The next three columns show geometric discounting with F=0.96. The last three show arithmetic discounting, with the scores going to zero at N+1. I set the geometric discounting factor to 0.96 so that Book #2 would get the same score under both progressions.
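For anyone who wants to reproduce the columns, here's roughly how I generated them (a Python sketch; the exact column layout is mine, the formulas are as described above):

# For ranks 1..N: score, absolute discount from the previous rank,
# and relative discount, for both progressions.
N = 25
F = 0.96  # equals (N-1)/N, so rank 2 matches the arithmetic score

print("rank    geo   d_abs  d_rel     ari   d_abs  d_rel")
g_prev = a_prev = 1.0
for r in range(1, N + 1):
    g = F ** (r - 1)            # geometric score
    a = (N + 1 - r) / float(N)  # arithmetic score, zero at rank N+1
    print("%4d  %6.4f %6.4f %5.1f%%  %6.4f %6.4f %5.1f%%" % (
        r, g, g_prev - g, 100 * (g_prev - g) / g_prev,
        a, a_prev - a, 100 * (a_prev - a) / a_prev))
    g_prev, a_prev = g, a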
The geometric progression takes its greatest absolute discount at the top, but a constant relative discount. The arithmetic progression takes a constant absolute discount, which results in an increasing relative discount as the base shrinks: 0.04 is 4% of the top score of 1.00, but 12.5% of the 0.32 at rank 18.