Math Forum » Discussions » sci.math.* » sci.stat.math.independent

Topic: two-sample nonparametric test on quantiles
Replies: 2   Last Post: Jan 28, 2013 12:53 PM

Richard Ulrich

Re: two-sample nonparametric test on quantiles
Posted: Jan 28, 2013 2:09 AM

On Sun, 27 Jan 2013 17:08:06 -0800 (PST), Ray Koopman <koopman@sfu.ca>
wrote:

>On Jan 27, 2:19 am, Anonymous wrote:
>> Hello,
>> I have two random samples (each of them i.i.d. with continuous
>> distribution) and I need to test, whether they come from
>> distributions which have the same 100p% quantile (for p=5%).
>> What I need is some generalisation of two-sample Mann-Whitney
>> test on equality of medians.

First, the Mann-Whitney rank-sum test is not a test on equality
of medians. I can give you an MW that rejects in one direction
while the exact medians differ strongly in the other direction.

You can generalize the simplest median test by making your split
at the pooled-sample score for the percentile in question, instead
of at the pooled median.

Ray's warning still holds: every expected cell count should be at
least 1, and preferably 5.
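
As a rough illustration of that split, here is a minimal Python sketch. The function name, the choice of the ceil(p*N)-th pooled order statistic as the cut, and the plain (uncorrected) Pearson chi-square on the resulting 2x2 table are all assumptions made for the example; nothing in this thread fixes those details.

```python
import math

def pooled_quantile_split_test(x, y, p):
    """Median-test analogue: split both samples at the pooled p-th quantile.

    Returns (chi-square statistic, approximate p-value, smallest expected count).
    """
    x, y = list(x), list(y)
    m, n = len(x), len(y)
    pooled = sorted(x + y)
    total = m + n
    # Assumed convention: the cut is the ceil(p * N)-th pooled order statistic.
    k = max(1, math.ceil(p * total))
    cut = pooled[k - 1]

    below_x = sum(v <= cut for v in x)
    below_y = sum(v <= cut for v in y)
    below = below_x + below_y
    above = total - below
    if below == 0 or above == 0:
        raise ValueError("degenerate split: one row of the 2x2 table is empty")

    # Cells: (x below, x above, y below, y above).
    observed = [below_x, m - below_x, below_y, n - below_y]
    expected = [m * below / total, m * above / total,
                n * below / total, n * above / total]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df, in closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, min(expected)
```

The third return value is there so the expected-count warning can be checked directly: with p = 0.05 and two samples of 100, the smallest expected cell is only 5.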

>> I would also need to have non-parametric confidence intervals
>> for empirical quantiles of some sort.
>> I intuitively understand, that I would need to have quite large
>> samples for p close to zero to reject the null (q1=q2) hypothesis.
>> Any reference to literature and/or software implementation that
>> would solve these problems would be appreciated.

>Let x_1,...,x_m and y_1,...,y_n be the two sets of observations.
>Unless you make some assumptions about the forms of their true
>distributions, you have no basis for distinguishing among values
>that lie between successive order statistics of the pooled data.
>Let Z refer to the set of midpoints of the intervals between
>successive order statistics of the pooled data.
>Now suppose you want to test the hypothesis that some particular value
>z is the q'th quantile of both the X and Y parent distributions.
>(Note that you must specify both z and q.) Compute
>t[z,q] = (n*(#{x < z} - m*q)^2 + m*(#{y < z} - n*q)^2)/(m*n*q*(1-q)).
>If the hypothesis is true and if min{m,n}*min{q,1-q} is "sufficiently
>large" -- say >= 5, certainly >= 1 -- then t should be distributed
>approximately as chi-square with 2 df.
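
A small Python sketch of the statistic as written above may help in checking it by hand. The function names are made up for the example, and the p-value uses the fact that the upper tail of a chi-square with 2 df is exp(-t/2), so only the standard library is needed.

```python
import math

def t_stat(x, y, z, q):
    """t[z,q] from the formula above, for a hypothesized common q-quantile z."""
    x, y = list(x), list(y)
    m, n = len(x), len(y)
    count_x = sum(v < z for v in x)   # #{x < z}
    count_y = sum(v < z for v in y)   # #{y < z}
    numerator = n * (count_x - m * q) ** 2 + m * (count_y - n * q) ** 2
    return numerator / (m * n * q * (1.0 - q))

def chi2_2df_upper_tail(t):
    """P(chi-square with 2 df > t), which has the closed form exp(-t/2)."""
    return math.exp(-t / 2.0)

def adequacy(m, n, q):
    """min{m,n} * min{q,1-q}; the text suggests >= 5, certainly >= 1."""
    return min(m, n) * min(q, 1.0 - q)
```

For example, with m = n = 100 and q = 0.05 the adequacy value is 5, right at the suggested threshold.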

That computation looks a lot like two of the cells of the 2x2 table,
and it also looks like the sum of two squared normals, which might
have 2 d.f. instead of 1. But you have also introduced the notion that
z is the common value of the percentile. Does that account for
the d.f.? - So far, I don't see why that doesn't just lessen the

>To get a 100p% CI for the q'th quantile, find the subset of Z for
>which t[z,q] < the p'th quantile of the chi-square(2) distribution.
>(The subset may be empty if the sample x and y distributions are
>very different from one another.)
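
The interval construction Ray describes can be sketched in Python as follows. The function names are hypothetical, tied pooled values are skipped on the assumption that the data are continuous, and the critical value uses the closed form -2*ln(1 - level) for the chi-square(2) quantile.

```python
import math

def t_stat(x, y, z, q):
    """t[z,q] as defined in the quoted text."""
    m, n = len(x), len(y)
    count_x = sum(v < z for v in x)
    count_y = sum(v < z for v in y)
    numerator = n * (count_x - m * q) ** 2 + m * (count_y - n * q) ** 2
    return numerator / (m * n * q * (1.0 - q))

def quantile_ci_candidates(x, y, q, level):
    """Midpoints z of the pooled order statistics with t[z,q] below the cutoff.

    `level` is the confidence level as a fraction (0.95 for a 95% CI).
    The returned list may be empty if the two samples differ strongly.
    """
    x, y = list(x), list(y)
    pooled = sorted(x + y)
    # Z: midpoints between successive distinct pooled order statistics.
    midpoints = [(lo + hi) / 2.0 for lo, hi in zip(pooled, pooled[1:]) if hi > lo]
    # Quantile of chi-square with 2 df: solve 1 - exp(-c/2) = level for c.
    critical = -2.0 * math.log(1.0 - level)
    return [z for z in midpoints if t_stat(x, y, z, q) < critical]
```

Taking the smallest and largest retained midpoints as the interval endpoints is a further assumption; the text only defines the retained subset.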

One well-known non-parametric method for placing the CI on
a rank-score makes use of the ranks. For small p, the observed
counts are approximately Poisson distributed. So, for 5% out of 500,
the count of 25 is the observed "rank 25", and the values observed at
ranks 16 and 36 are the 2 s.d. limits for the 5 percent rank. (... since
the SD of sqrt(X), where X is Poisson, is about 0.5, you can take sqrt(25)
to get 5, and then add and subtract 1 in order to see the 2 SD
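
The arithmetic in that example can be written out as a short Python sketch. The function name and the `n_sd` parameter are invented for the illustration, and the 0.5 is the usual approximation to the SD of the square root of a Poisson count, not an exact value.

```python
import math

def poisson_rank_limits(n, p, n_sd=2.0):
    """Approximate rank limits for the p-th quantile in a sample of size n.

    Works on the square-root scale, where a Poisson count has SD of about 0.5,
    then squares back to the rank scale.
    """
    expected_rank = n * p                 # e.g. 500 * 0.05 = 25
    root = math.sqrt(expected_rank)       # e.g. 5
    half_width = n_sd * 0.5               # 2 SD on the sqrt scale = 1
    lower = max(root - half_width, 0.0) ** 2
    upper = (root + half_width) ** 2
    return lower, upper

lower, upper = poisson_rank_limits(500, 0.05)
print(round(lower), round(upper))
```

For n = 500 and p = 0.05 this gives ranks 16 and 36, matching the figures above; the sample values at those ranks then serve as the interval for the quantile.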

Rich Ulrich
