Calculating Percentile Rank
Date: 04/30/2009 at 20:18:30
From: Susan
Subject: Percentile rank

Percentile rank means the percentage of scores that fall "at or below" a certain number. If more than one data value matches the number, why do we only count half of the matching data values when calculating the percentile rank?

For example, take the data 10, 11, 12, 12, 12, 12, 15, 18, 19, 20. Why is the percentile rank of 12 calculated as 4/10 instead of 6/10, since there are 6 data values that fall "at or below" 12?
Date: 05/01/2009 at 13:03:44
From: Doctor Peterson
Subject: Re: Percentile rank

Hi, Susan.

Percentile is not always defined exactly the same way; there are some tricky details, especially when you want to apply the concept to a small "toy" data set like this one. In real life, you would apply it to, say, 30,000 scores on a standardized test, and this sort of problem goes away.

I'm not familiar with the specific rule you are using, but I did find it online. There are actually two different concepts to think about. First, consider the following article:

  Wikipedia: Percentile
  http://en.wikipedia.org/wiki/Percentile

That discusses percentile in the sense of "what value is at the nth percentile (where n is a whole number)?" This gives 99 points that divide a large data set into 100 equal parts, so that any value between the p/100th and the (p+1)/100th is considered to be "in" the pth percentile. The adjustments in the definitions are needed to deal with cases where N is not a multiple of 100, so that the calculations do not point to individual values.

What you are asking about is percentile rank, which is somewhat different from that; it asks "at what percentile (again, a whole number) is this value?" Here the problem with a small data set (or a large set with few possible values) is that the same value may appear in more than one "percentile" in the above sense. We have to decide which one we should use--the first? the last? the middle?

The following article gives your definition in symbolic form without further explanation, and contrary to its earlier definition in words:

  Wikipedia: Percentile Rank
  http://en.wikipedia.org/wiki/Percentile_rank

    cf_l + 0.5 f_i
    -------------- * 100%
          N

There, cf_l is the number of scores lower than the score of interest, f_i is the number of scores equal to the score of interest, and N is the total number of scores. So you are counting all scores below, and half the scores at, the given value in finding the percentage.
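To make the formula concrete, here is a minimal Python sketch applied to the data set from the question. The function name percentile_rank and its arguments are illustrative choices, not taken from the article; the variables cf_l, f_i, and n mirror the symbols in the formula.

```python
def percentile_rank(scores, x):
    """Percentile rank of x: all scores below x, plus half the scores equal to x.

    Hypothetical helper written for this example; it follows the
    (cf_l + 0.5 * f_i) / N * 100% formula quoted above.
    """
    n = len(scores)                           # N: total number of scores
    cf_l = sum(1 for s in scores if s < x)    # scores strictly lower than x
    f_i = sum(1 for s in scores if s == x)    # scores equal to x
    return 100 * (cf_l + 0.5 * f_i) / n


data = [10, 11, 12, 12, 12, 12, 15, 18, 19, 20]
print(percentile_rank(data, 12))  # 40.0, i.e. the 4/10 from the question
```

For 12 there are 2 scores below and 4 scores equal, so the result is (2 + 0.5 * 4)/10 = 4/10, matching the rule Susan describes.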
This definition makes good sense to me. Basically, they don't want to be biased toward either the first data point with the given value (the number of values BELOW 12, namely 2/10 = 20%) or the last (the number of values AT OR BELOW 12, namely 6/10 = 60%; this can also be taken as 100% minus the percentage of values ABOVE 12, which gives 100% - 40% = 60%). So they essentially take the average of the two: (20% + 60%)/2 = 40%, which is the 4/10 you were given. They are splitting the difference between the two possible definitions.

In other words, the MIDDLE of the 12's best represents where the 12's as a group are "at", better than either the first or the last of them.

If you have any further questions, feel free to write back.

- Doctor Peterson, The Math Forum
  http://mathforum.org/dr.math/
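The "split the difference" reading can be checked numerically. The following Python sketch uses two hypothetical helpers, pct_below and pct_at_or_below, for the two candidate definitions, and averages them for the data set in the question.

```python
def pct_below(scores, x):
    """Percentage of scores strictly below x (the 'first 12' view)."""
    return 100 * sum(1 for s in scores if s < x) / len(scores)


def pct_at_or_below(scores, x):
    """Percentage of scores at or below x (the 'last 12' view)."""
    return 100 * sum(1 for s in scores if s <= x) / len(scores)


data = [10, 11, 12, 12, 12, 12, 15, 18, 19, 20]
low = pct_below(data, 12)          # 20.0
high = pct_at_or_below(data, 12)   # 60.0
midpoint = (low + high) / 2        # 40.0
print(low, high, midpoint)
```

The midpoint equals the result of counting all scores below 12 plus half of the scores equal to 12, which is why the two descriptions of the rule agree.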
Date: 05/01/2009 at 16:05:15
From: Susan
Subject: Thank you (Percentile rank)

Dear Dr. Peterson,

Thank you for your very detailed answer to my question regarding percentile rank. I have referenced many textbooks regarding percentile rank, but none of them have explained "why" half of the repeating values are counted; they simply tell you to only count half of them. I am a 9th grade algebra teacher and I like to tell my students the "why" behind formulas, definitions, etc. because I think they are more apt to remember if they understand the "why."

I whole-heartedly appreciate the time and effort you put into responding to my question (a question that has taunted me and my colleagues for a long time).

Thank you,
Susan
© 1994-2013 The Math Forum