On Sun, 27 Oct 2013 17:01:48 -0600, Virgil <email@example.com> wrote:
>In article <firstname.lastname@example.org>,
> Jennifer Murphy <JenMurphy@jm.invalid> wrote:
>
>> On Sun, 27 Oct 2013 13:36:29 -0600, Virgil <email@example.com> wrote:
>>
>> >In article <firstname.lastname@example.org>,
>> > Jennifer Murphy <JenMurphy@jm.invalid> wrote:
>> >
>> >> There are many lists containing rankings of great books. Some are
>> >> limited to a particular genre (historical novels, biographies, science
>> >> fiction). Others are more general. Some are fairly short (50-100 books).
>> >> Others are much longer (1,001 books).
>> >>
>> >> Is there a way to "average" the data from as many of these lists as
>> >> possible to get some sort of composite ranking of all of the books that
>> >> appear in any of the lists?
>>
>> >One way to compare rankings when there are different numbers of objects
>> >ranked in different rankings is to scale them all over the same range,
>> >such as from 0% to 100%.
>> >
>> >Thus in all rankings a lowest rank would rank 0% and the highest 100%,
>> >and the middle one, if there were one, would rank 50%.
>> >Four items with no ties would rank 0%, 33 1/3%, 66 2/3% and 100%,
>> >and so on.
>> >
>> >For something of rank r out of n ranks use (r-1)/(n-1) times 100%.
>>
>> In the lists I have, the highest ranking entity is R=1, the lowest is
>> R=N. For that, I think the formula is (N-R)/(N-1). No?
>
>Works for me!
>>
>> Two questions:
>>
>> 1. Do I then just average the ranks across the lists?
>
>That ought to work, but the effect of your averaging will be to compress
>the pattern of rankings towards 0.5 with fewer near either 1 or 0.
Yes, but isn't this what we want? Are you suggesting that this is a problem?
If a book is ranked high and low on different lists, then the "average" rank would be more in the middle. If a book is close to the top in most lists, then the average ranking would be closer to the top.
The term "regression to the mean" comes to mind...
>> 2. What scaled rank do I use for a book that is not ranked in a list?
>
>If no preferences are evident, I would either leave it out entirely
Are you suggesting that the composite list only include books that are on ALL lists? That would have the effect of making the final list smaller and smaller as the number of lists increases. This is the opposite of the effect that I want to achieve.
>or give each book mentioned the same score of 0.5 ( or 50%).
Do you mean that we add all of the books that are on any list to all of the lists and assign the 0.5 value to any that do not have a ranking? On a list of 1,000 books, this would give a book that did not even make the list a ranking higher than half of the books that did.
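
To put a number on that, here is a quick Python check. It is only a sketch: it assumes the (N-R)/(N-1) scaling agreed on above, and the function name `scaled` is just something made up for illustration.

```python
def scaled(r, n):
    """Scaled score for rank r (1 = best) on a list of n books."""
    return (n - r) / (n - 1)

n = 1000
# Ranks whose scaled score falls below the 0.5 given to an unranked book.
below_half = [r for r in range(1, n + 1) if scaled(r, n) < 0.5]
print(len(below_half), below_half[0])  # prints: 500 501
```

So under this scaling, the books ranked 501 through 1,000 would all score lower than a book that never made the list.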
Let's consider some actual data. Here are 3 sample lists each containing 5 books, but not the same 5 books:
Rank  List 1  List 2  List 3
  1     A       B       F
  2     B       A       H
  3     C       E       C
  4     D       G       D
  5     E       D       A
When listed by book, the data looks like this:
         List 1  List 2  List 3
Books     Rank    Rank    Rank
Book A      1       2       5
Book B      2       1
Book C      3               3
Book D      4       5       4
Book E      5       3
Book F                      1
Book G              4
Book H                      2
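
Both options for unranked books can be tried on this sample data with a short Python sketch. This is an illustration under my own assumptions, not a settled method: it uses the (N-R)/(N-1) scaling from earlier in the thread, and the names `scaled_scores`, `composite` and the `missing` parameter are hypothetical.

```python
from statistics import mean

# The three sample lists, best book first (rank 1 = first entry).
lists = [
    ["A", "B", "C", "D", "E"],  # List 1
    ["B", "A", "E", "G", "D"],  # List 2
    ["F", "H", "C", "D", "A"],  # List 3
]

def scaled_scores(ranking):
    """Map rank R of N to (N - R) / (N - 1): top -> 1.0, bottom -> 0.0."""
    n = len(ranking)
    return {book: (n - r) / (n - 1) for r, book in enumerate(ranking, start=1)}

def composite(lists, missing=None):
    """Average each book's scaled score across the lists.

    missing=None: average only over the lists that rank the book.
    missing=0.5 (or any number): use that score wherever the book is unranked.
    """
    per_list = [scaled_scores(ranking) for ranking in lists]
    books = sorted(set().union(*lists))
    result = {}
    for book in books:
        if missing is None:
            scores = [s[book] for s in per_list if book in s]
        else:
            scores = [s.get(book, missing) for s in per_list]
        result[book] = mean(scores)
    return result

skip_missing = composite(lists)               # ignore lists that omit a book
half_missing = composite(lists, missing=0.5)  # unranked counts as 0.5

# Print the books from highest to lowest composite score under each rule.
for name, scores in [("skip", skip_missing), ("0.5", half_missing)]:
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    print(name, [(book, round(score, 3)) for book, score in ordered])
```

With the skip rule, Book F comes out on top with 1.0 even though it appears on only one list. With the 0.5 rule, Book B leads at 0.75 and Book F drops to about 0.667, so the choice of rule clearly matters even on data this small.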