On Mon, 28 Oct 2013 10:26:56 +0000 (UTC), JohnF <email@example.com> wrote:
>Jennifer Murphy <JenMurphy@jm.invalid> wrote:
>> James Waldby <firstname.lastname@example.org> wrote:
>>>
>>>For the given problem, averages of ranks probably aren't a statistically
>>>sound approach. For example, see the "Qualitative description" section
>>>of article <http://en.wikipedia.org/wiki/Rating_scale>, which says:
>>>"User ratings are at best ordinal categorizations. While it is not
>>>uncommon to calculate averages or means for such data, doing so
>>>cannot be justified because in calculating averages, equal intervals
>>>are required to represent the same difference between levels of perceived
>>>quality. The key issues with aggregate data based on the kinds of rating
>>>scales commonly used online are as follow: Averages should not be
>>>calculated for data of the kind collected." (etc.)
>>
>> Yes, I did feel a little uneasy about averaging numbers that are not
>> really numerical in the usual sense.
>
>Yes, I think that approach would be wrong, based on the following
>extreme case counterexample:
> Suppose you have 100 different lists, 99 of which identically contain
>the same two books, and only those two books, in the same ranking,
> Lists 1-99 contain: Book#1=The Bernie Madoff Story, #2=The Ken Lay Story
>Finally, the 100th list contains, say 100 different books, including
>our above two losers, but ranked
> List 100 contains: Book#1=The complete works of Shakespeare,
> #2=..., #3=..., ..., and finally,
> #99=The Bernie Madoff Story, #100=The Ken Lay Story
>Clearly, Bernie and Ken suck, but when they're the only two
>books on a list, then they have to rank #1 and #2.
>So you need a methodology that avoids a combined ranking giving
> Bernie: 99 #1-scores and one #99-score, and giving
> Ken: 99 #2-scores and one #100-score.
>That would significantly overestimate them.
This example is badly skewed. All of the lists I will use will be from authoritative sources, each will have a lot more than 2 books, and no two lists will be even close to identical.

I also have to quibble with your point about the Bernie and Ken books being that terrible. If these 100 lists are from reputable sources (as opposed to, say, from some crackpot on Usenet ;-)), then the Bernie and Ken books do not "suck". If the last list is the 100 greatest books of all time, then even making it to #99 and #100 would be well above the "suck" level. ;-) But I get your point (I think).
I might include a list containing only science fiction books. This would exclude 95% of all books, but, as you say, some book on that list will have to be #1. The #1 book on the sci-fi list would probably also be on at least some of the other lists, so its ranking there would be factored in. And I can apply a weighting factor to each list so that the most substantive lists have a greater influence on the results.
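
To make the weighting idea concrete, here is a rough sketch run against the extreme case quoted above. It assumes the combined score is a weighted mean of each book's rank over the lists that include it. That is only one possibility, since nothing in this thread fixes a formula, and the caveat about averaging ordinal ranks still applies. The function name weighted_mean_rank, the shortened titles, the "Book N" placeholders, and the 0.01 weight are all made up for illustration.

```python
from collections import defaultdict

def weighted_mean_rank(lists, weights):
    """Weighted mean of each book's rank over the lists that include it.

    lists   -- sequence of rankings, each an ordered list of titles (best first)
    weights -- one non-negative weight per list (hypothetical "substance" factor)
    """
    totals = defaultdict(float)       # sum of weight * rank, per book
    weight_sums = defaultdict(float)  # sum of weights, per book
    for ranking, w in zip(lists, weights):
        for rank, book in enumerate(ranking, start=1):
            totals[book] += w * rank
            weight_sums[book] += w
    return {book: totals[book] / weight_sums[book] for book in totals}

# The extreme case: 99 identical two-book lists plus one 100-book list.
short_list = ["Madoff", "Lay"]
long_list = (["Shakespeare"]
             + [f"Book {i}" for i in range(2, 99)]  # placeholders for #2..#98
             + ["Madoff", "Lay"])                   # #99 and #100
lists = [short_list] * 99 + [long_list]

# Equal weights: the two-book lists dominate.
equal = weighted_mean_rank(lists, [1.0] * 100)
print(round(equal["Madoff"], 2), round(equal["Lay"], 2))  # 1.98 2.98

# Assumed weights: each two-book list counts 0.01, the long list counts 1.0.
weighted = weighted_mean_rank(lists, [0.01] * 99 + [1.0])
print(round(weighted["Madoff"], 2), round(weighted["Lay"], 2))
```

With equal weights the Madoff book averages 1.98, which puts it ahead of whatever sits at #2 on the long list. Down-weighting the thin lists moves both books back toward the middle of the pack. How far they move depends entirely on the weights chosen, so this shows the mechanism and is not a recommendation for particular values.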