

Math Forum: sci.stat.math
Topic: two-sample nonparametric test on quantiles
Replies: 2
Last Post: Jan 28, 2013 12:53 PM




Re: two-sample nonparametric test on quantiles
Posted: Jan 28, 2013 2:09 AM


On Sun, 27 Jan 2013 17:08:06 -0800 (PST), Ray Koopman <koopman@sfu.ca> wrote:
>On Jan 27, 2:19 am, Anonymous wrote:
>> Hello,
>> I have two random samples (each of them i.i.d. with continuous
>> distribution) and I need to test whether they come from
>> distributions which have the same 100p% quantile (for p=5%).
>> What I need is some generalisation of the two-sample Mann-Whitney
>> test on equality of medians.
First, the Mann-Whitney rank-sum test is not a test of equality of medians. I can give you an example where the MW test rejects in one direction while the exact medians differ strongly in the other direction.
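Here is a constructed example of that situation. The data are hypothetical and not from the thread. The rank-sum statistic says X tends to exceed Y, yet the median of X lies far below the median of Y. The p-value is only a sketch: it uses the usual large-sample normal approximation with no tie or continuity correction.

```python
import math
from statistics import median


def mann_whitney_u(x, y):
    """U = #{(x_i, y_j): x_i > y_j} + 0.5 * #{ties}; U above m*n/2 means X tends to exceed Y."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def mw_two_sided_p(x, y):
    """Two-sided p-value from the normal approximation (no tie or continuity correction)."""
    m, n = len(x), len(y)
    u = mann_whitney_u(x, y)
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    z = (u - m * n / 2) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 51% of X sits near 1 and 49% near 100;
# 49% of Y sits near 0 and 51% near 50.
x = [1 + i / 1000 for i in range(51)] + [100 + i / 1000 for i in range(49)]
y = [i / 1000 for i in range(49)] + [50 + i / 1000 for i in range(51)]

print("U =", mann_whitney_u(x, y), "of m*n =", len(x) * len(y))  # U is well above m*n/2
print("p =", mw_two_sided_p(x, y))                                # very small p-value
print("medians:", median(x), median(y))                           # median(X) far below median(Y)
```

The rank-sum test responds to P(X > Y), not to the medians. Most of that probability here comes from X's upper cluster, which has no effect on X's median.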
You can generalize the simple median test (http://en.wikipedia.org/wiki/Median_test) by making the split at the pooled-sample score for whatever percentile is in question, instead of at the pooled median.
Ray's warning about sample size still holds: you want an expected count of at least 1, and preferably at least 5, in every cell.
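A minimal sketch of that generalized median test follows. It assumes a 2x2 table that classifies each group as "at or below" versus "above" the pooled sample q'th quantile, and it applies Pearson's chi-square with no continuity correction. The function name and the split-rank convention ceil(N*q) are illustrative choices, not taken from the thread. The function also returns the smallest expected cell count, so you can check the 1-and-preferably-5 rule before trusting the chi-square approximation.

```python
import math


def quantile_split_test(x, y, q):
    """Generalized median test (sketch): split the pooled data at its sample q'th
    quantile, cross-classify each group as at-or-below vs. above the split, and
    apply Pearson's chi-square (1 df) to the resulting 2x2 table."""
    pooled = sorted(list(x) + list(y))
    N = len(pooled)
    k = max(1, math.ceil(N * q))        # rank of the split point (one convention of several)
    split = pooled[k - 1]
    # Rows: groups x and y; columns: at-or-below split, above split.
    table = [[sum(v <= split for v in g), sum(v > split for v in g)] for g in (x, y)]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    expected = [[rows[i] * cols[j] / N for j in range(2)] for i in range(2)]
    min_expected = min(min(r) for r in expected)
    if min_expected == 0:
        raise ValueError("degenerate table: an entire row or column is empty")
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value, min_expected


chi2, p, min_e = quantile_split_test(list(range(100)), list(range(1000, 1100)), 0.05)
print(chi2, p, min_e)
```

With q = 0.05 the low-side expected counts are small, so `min_e` is the number to check first.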
>>
>> I would also need to have nonparametric confidence intervals
>> for empirical quantiles of some sort.
>>
>> I intuitively understand that I would need to have quite large
>> samples for p close to zero to reject the null (q1=q2) hypothesis.
>>
>> Any reference to literature and/or software implementation that
>> would solve these problems would be appreciated.
>
>Let x_1,...,x_m and y_1,...,y_n be the two sets of observations.
>Unless you make some assumptions about the forms of their true
>distributions, you have no basis for distinguishing among values
>that lie between successive order statistics of the pooled data.
>Let Z refer to the set of midpoints of the intervals between
>successive order statistics of the pooled data.
>
>Now suppose you want to test the hypothesis that some particular value
>z is the q'th quantile of both the X and Y parent distributions.
>(Note that you must specify both z and q.) Compute
>
>t[z,q] = (n*(#{x < z} - m*q)^2 + m*(#{y < z} - n*q)^2)/(m*n*q*(1-q)).
>
>If the hypothesis is true and if min{m,n}*min{q,1-q} is "sufficiently
>large", say >= 5, certainly >= 1, then t should be distributed
>approximately as chi-square with 2 df.
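Ray's statistic is easy to compute directly. The Python sketch below uses hypothetical function names and data. The closed-form tail exp(-t/2) holds only because the reference distribution is chi-square with 2 df.

```python
import math


def ray_t(x, y, z, q):
    """t[z,q] from the post: z and q must both be specified in advance."""
    m, n = len(x), len(y)
    cx = sum(v < z for v in x)  # #{x < z}
    cy = sum(v < z for v in y)  # #{y < z}
    return (n * (cx - m * q) ** 2 + m * (cy - n * q) ** 2) / (m * n * q * (1 - q))


def chi2_2df_pvalue(t):
    """Upper-tail probability of chi-square with 2 df, which is exactly exp(-t/2)."""
    return math.exp(-t / 2)


# Hypothetical use: is z = 9.5 the 5% point of both parent distributions?
x, y = list(range(100)), list(range(1000, 1100))
t = ray_t(x, y, 9.5, 0.05)
print(t, chi2_2df_pvalue(t))
```

Here m = n = 100 and q = 0.05, so min{m,n}*min{q,1-q} = 5, right at the suggested threshold.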
Ray, that computation looks a lot like two of the cells of the 2x2 table, and it also looks like the sum of two squared normals, which would have 2 d.f. instead of 1. But you have also introduced the notion that z is the common value of the percentile. Does that account for the extra d.f.? So far, I don't see why that doesn't just lessen the power.
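A short algebraic check bears on that question. It is a sketch worked out from Ray's formula under the same normal approximation, not something stated in the thread. Write a = #{x < z}, b = #{y < z} and N = m + n, standardize each count, then apply an orthogonal rotation:

```latex
\begin{aligned}
Z_x &= \frac{a - mq}{\sqrt{mq(1-q)}}, \qquad
Z_y = \frac{b - nq}{\sqrt{nq(1-q)}}, \qquad
t[z,q] = Z_x^2 + Z_y^2 = U^2 + V^2, \\
U &= \frac{\sqrt{m}\,Z_x + \sqrt{n}\,Z_y}{\sqrt{N}}
   = \frac{(a + b) - Nq}{\sqrt{Nq(1-q)}}, \\
V &= \frac{\sqrt{n}\,Z_x - \sqrt{m}\,Z_y}{\sqrt{N}}
   = \frac{a/m - b/n}{\sqrt{q(1-q)\,(1/m + 1/n)}}.
\end{aligned}
```

Under the null, U and V are approximately independent standard normals. U^2 asks whether z is the q'th quantile of the pooled data (1 d.f.). V^2 compares the proportions below z in the two samples (1 d.f.), which is the 2x2 comparison. Suppose z is not specified in advance but is taken as the pooled sample q'th quantile, so that a + b = Nq. Then U = 0 and t reduces to the Pearson chi-square of the 2x2 table, and referring it to 2 d.f. instead of 1 would only lose power.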
>
>To get a 100p% CI for the q'th quantile, find the subset of Z for
>which t[z,q] < the p'th quantile of the chi-square(2) distribution.
>(The subset may be empty if the sample x and y distributions are
>very different from one another.)
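Ray's interval can be sketched in Python as follows; the function name is hypothetical. It scans the midpoints Z between successive distinct pooled values and keeps those where t[z,q] is below the chi-square(2) critical value. That critical value has the closed form -2*ln(1-p), because the chi-square(2) CDF is 1 - exp(-c/2).

```python
import math


def ray_quantile_ci(x, y, q, p=0.95):
    """Return the midpoints z in Z with t[z,q] below the p'th quantile of
    chi-square(2). An empty list means no common q'th quantile is plausible."""
    m, n = len(x), len(y)
    pooled = sorted(set(x) | set(y))                        # distinct pooled values
    Z = [(a + b) / 2 for a, b in zip(pooled, pooled[1:])]   # midpoints between successive values
    crit = -2 * math.log(1 - p)                             # chi-square(2) p'th quantile
    accepted = []
    for z in Z:
        cx = sum(v < z for v in x)  # #{x < z}
        cy = sum(v < z for v in y)  # #{y < z}
        t = (n * (cx - m * q) ** 2 + m * (cy - n * q) ** 2) / (m * n * q * (1 - q))
        if t < crit:
            accepted.append(z)
    return accepted


ci = ray_quantile_ci(list(range(100)), list(range(100)), 0.05)
print(ci[0], ci[-1] if ci else None)
```

The sketch returns the whole accepted set rather than just two endpoints, because nothing in the construction guarantees the set is contiguous. When it is contiguous, its first and last elements are the interval limits.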
One well-known nonparametric method for placing a CI on a percentile makes use of the ranks. For small p, the count of observations below the percentile is approximately Poisson. So, for the 5% point out of 500 cases, the expected count is 25, making the observation at rank 25 the estimate, and the observations at ranks 16 and 36 mark its 2-SD range. (Since the SD of sqrt(X), where X is Poisson, is approximately 0.5, take sqrt(25) = 5, add and subtract 1 (that is, 2 SDs) to get 4 and 6, and square those to get ranks 16 and 36.)
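The rank arithmetic above can be sketched in Python. The function names and the `n_sd` parameter are hypothetical. The code follows the Poisson, square-root-scale reasoning exactly and then reads the interval endpoints off the sorted sample.

```python
import math


def poisson_rank_interval(n, q, n_sd=2.0):
    """Approximate ranks bracketing the q'th quantile in a sample of size n.
    Treats the count below the quantile as Poisson(n*q), for small q, and uses
    SD(sqrt(X)) ~ 0.5 for Poisson X."""
    root = math.sqrt(n * q)          # e.g. sqrt(500 * 0.05) = sqrt(25) = 5
    half_width = n_sd * 0.5          # 2 SDs on the square-root scale = 1
    lo = max(1, round(max(0.0, root - half_width) ** 2))
    hi = min(n, round((root + half_width) ** 2))
    return lo, hi


def quantile_ci(sample, q, n_sd=2.0):
    """Values at those (1-based) ranks of the sorted sample."""
    s = sorted(sample)
    lo, hi = poisson_rank_interval(len(s), q, n_sd)
    return s[lo - 1], s[hi - 1]


print(poisson_rank_interval(500, 0.05))  # ranks for the 5% point of 500 cases
```

The Poisson variance nq overstates the binomial variance nq(1-q) by a factor of 1/(1-q). That is why the approximation is meant for small q.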
Rich Ulrich



