
Re: Is there a way to calculate an average ranking from uneven lists?
Posted: Oct 30, 2013 12:18 PM


On Sun, 27 Oct 2013 23:46:59 +0000, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>Jennifer Murphy <JenMurphy@jm.invalid> writes:
>
>> On Sun, 27 Oct 2013 13:36:29 -0600, Virgil <virgil@ligriv.com> wrote:
>>
>>>In article <chpq69prq63kh364qqmphkmqedhgm5ti6h@4ax.com>,
>>> Jennifer Murphy <JenMurphy@jm.invalid> wrote:
>>>
>>>> There are many lists containing rankings of great books. Some are
>>>> limited to a particular genre (historical novels, biographies, science
>>>> fiction). Others are more general. Some are fairly short (50-100 books).
>>>> Others are much longer (1,001 books).
>>>>
>>>> Is there a way to "average" the data from as many of these lists as
>>>> possible to get some sort of composite ranking of all of the books that
>>>> appear in any of the lists?
><snip>
>>>One way to compare rankings when there are different numbers of objects
>>>ranked in different rankings is to scale them all over the same range,
>>>such as from 0% to 100%.
>>>
>>>Thus in all rankings a lowest rank would rank 0% and the highest 100%,
>>>and the middle one, if there were one, would rank 50%.
>>>Four items with no ties would rank 0%, 33 1/3%, 66 2/3% and 100%,
>>>and so on.
>>>
>>>For something of rank r out of n ranks use (r-1)/(n-1) times 100%.
>>
>> In the lists I have, the highest ranking entity is R=1, the lowest is
>> R=N. For that, I think the formula is (N-R)/(N-1). No?
>
>Here's another idea to add to the mix. Some rankings (and I think this
>is one) have the property that the top is more significant than the bottom.
>Anyone who picks a book to be no. 1 should have carefully weighed it up
>against no. 2 and no. 3. But what about no. 1001? How likely is it
>that some tiny alteration in the assessment might make it no. 995 or
>998? And this effect is related to the absolute length of the list, not
>just the relative position within it.
>
>Put it another way, outstanding things stand out. Once you are into the
>more run-of-the-mill, the distinctions become less significant.
>
>As a result, you might consider a negative exponential weighting - a
>ranking of R is given a value of w^(R-1) with 0 < w < 1. Thus all first
>positions are "worth" 1, all second positions are worth w, and all third
>positions w^2 and so on.
This suggestion, in combination with James Waldby's suggestion to add up the scores rather than average them, looks to be a very good solution. I've run some preliminary simulations that look great.
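To make the combined idea concrete, here's a minimal sketch of the sum-of-discounted-scores approach: each list contributes w^(R-1) for the book it ranks at position R, and the totals are sorted. The book titles and the helper name `composite_scores` are just illustrative placeholders, not anything from the actual data.

```python
from collections import defaultdict

def composite_scores(ranked_lists, w=0.96):
    """Sum geometrically discounted scores across several ranked lists.

    A book ranked R in any list contributes w**(R-1), so every first
    place is worth 1, every second place w, every third w**2, and so
    on. Books appearing in more lists accumulate larger totals, which
    is the "add up, don't average" part of the scheme.
    """
    totals = defaultdict(float)
    for ranking in ranked_lists:            # each ranking: titles in rank order
        for r, book in enumerate(ranking, start=1):
            totals[book] += w ** (r - 1)
    # Highest composite score first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical example lists of different lengths
lists = [
    ["Ulysses", "Gatsby", "Lolita"],
    ["Gatsby", "Ulysses", "Moby-Dick", "Lolita"],
]
for book, score in composite_scores(lists):
    print(f"{score:.3f}  {book}")
```

Note that a book missing from a list simply contributes nothing for that list, so short genre lists and long general lists can be mixed without any per-list normalization.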
The only question I have is whether to use a geometric or an arithmetic progression for discounting lower-ranking books. I think I agree with you that the geometric (or exponential) progression does a better job of emphasizing the greater significance of the higher rankings.
Here's some sample data on ranks 1-25. The first column is the raw rank. The next three columns show geometric discounting with F = 0.96. The last three show arithmetic discounting with the scores reaching zero at rank N+1. I set the geometric discounting factor to 0.96 so that the book ranked #2 gets the same score under both progressions.
The geometric progression takes its largest absolute discounts at the top of the list, but the relative discount is constant (96% per step). The arithmetic progression takes a constant absolute discount, which, applied to a smaller and smaller base, amounts to an ever-increasing relative discount.
        Geometric Discount          Arithmetic Discount
Rank   F = 0.96   Diff      %      N = 25    Diff      %
  1     1.000       .        .      1.000      .        .
  2     0.960     0.0400   96.00%   0.960    0.0400   96.00%
  3     0.922     0.0384   96.00%   0.920    0.0400   95.83%
  4     0.885     0.0369   96.00%   0.880    0.0400   95.65%
  5     0.849     0.0354   96.00%   0.840    0.0400   95.45%
  6     0.815     0.0340   96.00%   0.800    0.0400   95.24%
  7     0.783     0.0326   96.00%   0.760    0.0400   95.00%
  8     0.751     0.0313   96.00%   0.720    0.0400   94.74%
  9     0.721     0.0301   96.00%   0.680    0.0400   94.44%
 10     0.693     0.0289   96.00%   0.640    0.0400   94.12%
 11     0.665     0.0277   96.00%   0.600    0.0400   93.75%
 12     0.638     0.0266   96.00%   0.560    0.0400   93.33%
 13     0.613     0.0255   96.00%   0.520    0.0400   92.86%
 14     0.588     0.0245   96.00%   0.480    0.0400   92.31%
 15     0.565     0.0235   96.00%   0.440    0.0400   91.67%
 16     0.542     0.0226   96.00%   0.400    0.0400   90.91%
 17     0.520     0.0217   96.00%   0.360    0.0400   90.00%
 18     0.500     0.0208   96.00%   0.320    0.0400   88.89%
 19     0.480     0.0200   96.00%   0.280    0.0400   87.50%
 20     0.460     0.0192   96.00%   0.240    0.0400   85.71%
 21     0.442     0.0184   96.00%   0.200    0.0400   83.33%
 22     0.424     0.0177   96.00%   0.160    0.0400   80.00%
 23     0.407     0.0170   96.00%   0.120    0.0400   75.00%
 24     0.391     0.0163   96.00%   0.080    0.0400   66.67%
 25     0.375     0.0156   96.00%   0.040    0.0400   50.00%
 26     0.360     0.0150   96.00%   0.000    0.0400    0.00%
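For what it's worth, the figures in the table can be regenerated with a short script (a sketch using the same F and N as above; the two formulas are F^(r-1) for the geometric column and 1 - (r-1)/N for the arithmetic one):

```python
F, N = 0.96, 25   # geometric factor; list length for the arithmetic scheme

geo_prev = ari_prev = None
print(f"{'Rank':>4} {'Geo':>7} {'Diff':>7} {'%':>8} {'Ari':>7} {'Diff':>7} {'%':>8}")
for r in range(1, N + 2):
    geo = F ** (r - 1)        # geometric: constant 96% ratio step to step
    ari = 1 - (r - 1) / N     # arithmetic: constant 0.04 step, zero at rank N+1
    if geo_prev is None:
        print(f"{r:>4} {geo:7.3f} {'.':>7} {'.':>8} {ari:7.3f} {'.':>7} {'.':>8}")
    else:
        print(f"{r:>4} {geo:7.3f} {geo_prev - geo:7.4f} {geo / geo_prev:8.2%} "
              f"{ari:7.3f} {ari_prev - ari:7.4f} {ari / ari_prev:8.2%}")
    geo_prev, ari_prev = geo, ari
```

The "Diff" columns are the absolute drop from the previous rank and the "%" columns the ratio to the previous rank, which is what makes the constant-ratio vs. constant-step contrast visible.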
I think the geometric progression is probably better.
Thanks for the great suggestion.

