

Topic: two-sample nonparametric test on quantiles
Replies: 2   Last Post: Jan 28, 2013 12:53 PM

 Richard Ulrich Posts: 2,940 Registered: 12/13/04
Re: two-sample nonparametric test on quantiles
Posted: Jan 28, 2013 2:09 AM

On Sun, 27 Jan 2013 17:08:06 -0800 (PST), Ray Koopman <koopman@sfu.ca>
wrote:

>On Jan 27, 2:19 am, Anonymous wrote:
>> Hello,
>> I have two random samples (each of them i.i.d. with continuous
>> distribution) and I need to test whether they come from
>> distributions which have the same 100p% quantile (for p=5%).
>> What I need is some generalisation of the two-sample Mann-Whitney
>> test for equality of medians.

First, the Mann-Whitney rank-sum test is not a test of equality
of medians. I can construct samples for which the MW test rejects in
one direction while the exact medians differ strongly in the other
direction.

You can generalize the simplest median test
http://en.wikipedia.org/wiki/Median_test
by splitting the pooled data at the percentile in question
instead of at the median.

Ray's warning still holds: every expected cell count should be
at least 1, and preferably at least 5.
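A minimal Python sketch of that generalized median test, under stated assumptions: the pooled-sample q'th quantile is taken as the order statistic at rank ceil(q*N) (one convention among several), Pearson's chi-square is used without a continuity correction, and the names `pooled_quantile` and `percentile_median_test` are hypothetical. The smallest expected cell count is returned so the 1-and-preferably-5 rule can be checked.

```python
import math

def pooled_quantile(values, q):
    """Hypothetical helper: pooled-sample q'th quantile, taken as the order
    statistic at rank ceil(q * N) (an assumed convention; others exist)."""
    s = sorted(values)
    k = max(1, math.ceil(q * len(s)))
    return s[k - 1]

def percentile_median_test(x, y, q):
    """Median test generalized to the q'th quantile (hypothetical name).

    Splits the pooled data at its q'th quantile, builds the 2x2 table
    (sample) x (at or below cut, above cut), and returns Pearson's
    chi-square (1 df, no continuity correction), its approximate p-value,
    the table, and the smallest expected cell count.
    """
    cut = pooled_quantile(list(x) + list(y), q)
    table = [
        [sum(v <= cut for v in x), sum(v > cut for v in x)],
        [sum(v <= cut for v in y), sum(v > cut for v in y)],
    ]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    if min(cols) == 0:
        raise ValueError("one column of the 2x2 table is empty; choose another q")
    total = sum(rows)
    chi2, expected = 0.0, []
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / total
            expected.append(e)
            chi2 += (table[i][j] - e) ** 2 / e
    # Upper tail of chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, table, min(expected)
```

With x = 0..99 and y = 50..149 and q = 0.05, the pooled cut is 9, the table is [[10, 90], [0, 100]], the smallest expected count is exactly 5, and the p-value falls well below 0.01.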

>>
>> I would also need to have non-parametric confidence intervals
>> for empirical quantiles of some sort.
>>
>> I intuitively understand that I would need quite large
>> samples for p close to zero to reject the null (q1=q2) hypothesis.
>>
>> Any reference to literature and/or software implementation that
>> would solve these problems would be appreciated.

>
>Let x_1,...,x_m and y_1,...,y_n be the two sets of observations.
>Unless you make some assumptions about the forms of their true
>distributions, you have no basis for distinguishing among values
>that lie between successive order statistics of the pooled data.
>Let Z refer to the set of midpoints of the intervals between
>successive order statistics of the pooled data.
>
>Now suppose you want to test the hypothesis that some particular value
>z is the q'th quantile of both the X and Y parent distributions.
>(Note that you must specify both z and q.) Compute
>
>t[z,q] = (n*(#{x < z} - m*q)^2 + m*(#{y < z} - n*q)^2)/(m*n*q*(1-q)).
>
>If the hypothesis is true and if min{m,n}*min{q,1-q} is "sufficiently
>large" -- say >= 5, certainly >= 1 -- then t should be distributed
>approximately as chi-square with 2 df.
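A sketch of the construction as quoted: Z as the midpoints between successive pooled order statistics, t[z,q] exactly as written, and its p-value from chi-square with 2 df, whose upper tail has the closed form exp(-t/2). The function names are hypothetical, and collapsing tied values before taking midpoints is an added assumption (ties should not occur with continuous data).

```python
import math

def midpoints(x, y):
    """Z: midpoints between successive order statistics of the pooled data.
    Tied values are collapsed first (an added assumption; with continuous
    data ties should not occur)."""
    s = sorted(set(x) | set(y))
    return [(a + b) / 2 for a, b in zip(s, s[1:])]

def t_stat(x, y, z, q):
    """Ray Koopman's t[z,q] for H0: z is the q'th quantile of both parents."""
    m, n = len(x), len(y)
    bx = sum(v < z for v in x)  # #{x < z}
    by = sum(v < z for v in y)  # #{y < z}
    return (n * (bx - m * q) ** 2 + m * (by - n * q) ** 2) / (m * n * q * (1 - q))

def t_p_value(t):
    """Upper tail of chi-square with 2 df, which is exactly exp(-t / 2)."""
    return math.exp(-t / 2)
```

For x = [1, 2, 3, 4], y = [5, 6, 7, 8], z = 4.5 and q = 0.5, t is 8.0 and the p-value is exp(-4), about 0.018. Here min{m,n}*min{q,1-q} = 2, below the suggested 5, so this toy case only illustrates the arithmetic.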

Ray,
That computation looks a lot like two of the cells of the 2x2 table,
and it also looks like the sum of two squared normals, which might have
2 d.f. instead of 1. But you have also introduced the notion that
z is the common value of the percentile. Does that account for
the 2 d.f.? So far, I don't see why that doesn't just lessen the
power.

>
>To get a 100p% CI for the q'th quantile, find the subset of Z for
>which t[z,q] < the p'th quantile of the chi-square(2) distribution.
>(The subset may be empty if the sample x and y distributions are
>very different from one another.)
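The confidence set can be sketched the same way. The chi-square(2) quantile needs no special library because its CDF is 1 - exp(-c/2), so the level-p quantile is -2*ln(1 - p). Note that p here is the confidence level (e.g. 0.95), not the poster's p = 5%. `quantile_ci` is a hypothetical name, and it returns the retained z values rather than assuming they form one contiguous interval.

```python
import math

def chi2_2df_quantile(level):
    """Quantile of chi-square with 2 df: solves 1 - exp(-c / 2) = level."""
    return -2.0 * math.log(1.0 - level)

def quantile_ci(x, y, q, level=0.95):
    """Confidence set for a common q'th quantile (hypothetical name):
    the midpoints z in Z with t[z,q] below the chi-square(2) quantile.
    May be empty if the two samples disagree strongly."""
    s = sorted(set(x) | set(y))
    z_grid = [(a + b) / 2 for a, b in zip(s, s[1:])]
    m, n = len(x), len(y)
    crit = chi2_2df_quantile(level)
    kept = []
    for z in z_grid:
        bx = sum(v < z for v in x)  # #{x < z}
        by = sum(v < z for v in y)  # #{y < z}
        t = (n * (bx - m * q) ** 2 + m * (by - n * q) ** 2) / (m * n * q * (1 - q))
        if t < crit:
            kept.append(z)
    return kept
```

For two identical samples 1, 2, ..., 100 with q = 0.5 and level 0.95, the retained midpoints run from 42.5 to 58.5 (17 values).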

One well-known non-parametric method for placing a CI on
a percentile makes use of the ranks. For small p, the count of
observations below the percentile is approximately Poisson. So, for
the 5th percentile in a sample of 500, the expected count is 25 and
the estimate is the observation at rank 25, while the observations at
ranks 16 and 36 give approximate 2-SD limits. (Since the SD of sqrt(X),
where X is Poisson, is about 0.5, you can take sqrt(25) = 5, add and
subtract 1 to get the 2-SD range 4 to 6, and square back to get
ranks 16 and 36.)
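The rank arithmetic above can be sketched as follows, assuming the Poisson approximation to the count below the percentile and an SD of about 0.5 for its square root. The function names are hypothetical, and rounding the limits to the nearest integer rank is an added convention.

```python
import math

def poisson_rank_limits(n, p, n_sd=2.0):
    """Approximate ranks for the 100p'th percentile of a sample of size n.

    Assumes the count below the percentile is roughly Poisson(n * p) and that
    sqrt of a Poisson count has SD of about 0.5, so n_sd SDs on the sqrt scale
    is n_sd / 2; the limits are then squared back to the count (rank) scale.
    """
    center = n * p
    root = math.sqrt(center)
    half_width = 0.5 * n_sd
    lo = max(0.0, root - half_width) ** 2
    hi = (root + half_width) ** 2
    return center, lo, hi

def percentile_ci_from_sample(sample, p, n_sd=2.0):
    """Return (lower, estimate, upper) order statistics, rounding each rank
    to the nearest integer and clamping it to 1..n (an added convention)."""
    s = sorted(sample)
    n = len(s)
    center, lo, hi = [min(n, max(1, round(r))) for r in poisson_rank_limits(n, p, n_sd)]
    return s[lo - 1], s[center - 1], s[hi - 1]
```

For n = 500 and p = 0.05 this reproduces ranks 16, 25 and 36; with the sorted sample 1, 2, ..., 500 those ranks pick out the values 16, 25 and 36.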

--
Rich Ulrich
