Date: Jan 28, 2013 2:28 PM
Author: Richard Ulrich
Subject: Re: two-sample nonparametric test on quantiles

On Mon, 28 Jan 2013 10:21:21 -0800 (PST), "Mickey M." wrote:

>I think I have to clarify my question:
>Suppose we have X1, ..., Xn i.i.d from the distribution F and
>Y1, ..., Ym i.i.d from the distribution G.
>[and also X a random variable from F and Y ~ G]
>The standard Mann-Whitney test tests the null hypothesis
>H0: F=G
>H1: F different from G
>or against the one-sided hypothesis
>H1': F stochastically greater than G (i.e. P(X>Y) > 0.5)
>I would like to know, whether it is possible to prove that Mann-Whitney test
>can be used in fact to test:
>H0: P(X>Y) = 0.5
>H1: P(X>Y) \neq 0.5
>or H1': P(X>Y) > 0.5
>(i.e. whether it is possible to relax the H0 hypothesis, F and G not necessarily the same under H0)
>When I looked at the proof of the M-W test in my textbook, it seemed to me
>that the assumption F=G (H0) is essential for deriving the distributional
>properties of the Mann-Whitney U statistic. But I have seen the second
>variant of the test somewhere on the internet (without proof).

I think this article by Morten W. Fagerland has an answer to your
question. Bruce Weaver posted this reference not long ago
on the SPSS list.

The article shows that, for large samples, the MW test is highly
sensitive to differences in shape or variance, not only to the
desired difference in location.

t-tests, non-parametric tests, and large
studies —a paradox of statistical practice?

[from the discussion]
"Furthermore, if the results from the WMW test are interpreted
strictly according to the test’s null hypothesis, Prob(X<Y)=0.5, the
WMW test is an efficient and useful test. For large studies,
however, where the purpose is to compare the means of continuous
variables, the choice of test is easy: the t-test is robust even to
severely skewed data and should be used almost exclusively."
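
To make the variance sensitivity concrete, here is a minimal
simulation sketch in Python (standard library only). It is an
illustration, not something taken from Fagerland's article: the
sample sizes (40 and 10), the standard deviations (1 and 5), the
1.96 cutoff, and the helper names mann_whitney_z and rejection_rate
are all assumptions chosen for this example. Both groups are normal
with mean 0, so P(X>Y) = 0.5 holds in both scenarios; only in the
second one does F differ from G.

```python
import math
import random


def mann_whitney_z(x, y):
    """z statistic for the Mann-Whitney U, normal approximation, no ties assumed."""
    n, m = len(x), len(y)
    # Pool and sort; group label 0 marks x, 1 marks y.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(rank for rank, (_, grp) in enumerate(pooled, start=1)
                     if grp == 0)
    u = rank_sum_x - n * (n + 1) / 2             # number of pairs with X > Y
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)   # variance derived under F = G
    return (u - mean_u) / sd_u


def rejection_rate(n, m, sd_x, sd_y, reps=2000, seed=1):
    """Share of simulated data sets in which |z| exceeds 1.96 (hypothetical helper)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd_x) for _ in range(n)]
        y = [rng.gauss(0.0, sd_y) for _ in range(m)]
        if abs(mann_whitney_z(x, y)) > 1.96:
            rejections += 1
    return rejections / reps


# Scenario 1: F = G. Scenario 2: same center, but the small group is far more variable.
rate_same = rejection_rate(40, 10, 1.0, 1.0)
rate_diff = rejection_rate(40, 10, 1.0, 5.0)
print("F = G:                 ", rate_same)
print("P(X>Y) = 0.5, F != G:  ", rate_diff)
```

With these settings the first rate should land near the nominal 0.05
and the second should come out well above it, because the variance
of U used in the z statistic is the one derived under F = G.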

I will point out that Conover showed that the MW is asymptotically
equivalent to an ANOVA test on the rank-transformed data.
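
As a small sketch of that equivalence (my own toy example, not
Conover's), the Python below computes the pooled-variance t statistic
on the ranks of two made-up samples and compares it with the MW z
statistic. The data values and the helper names ranks_no_ties and
pooled_t are assumptions for illustration, and ties are assumed
absent. For two groups the ANOVA F is just t squared, and with
N = n + m the two statistics are tied together by
t = z * sqrt((N - 2) / (N - 1 - z^2)), so t approaches z as N grows.

```python
import math


def ranks_no_ties(values):
    """Ranks 1..N of the values, assuming no ties (hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def pooled_t(a, b):
    """Ordinary two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    pooled_var = ss / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))


# Made-up data, chosen only so that there are no ties.
x = [1.2, 3.4, 2.2, 5.9, 4.1]
y = [0.7, 2.9, 1.1, 0.3, 3.0, 1.9]
n, m = len(x), len(y)
N = n + m

r = ranks_no_ties(x + y)
rx, ry = r[:n], r[n:]

u = sum(rx) - n * (n + 1) / 2                        # Mann-Whitney U for x
z = (u - n * m / 2) / math.sqrt(n * m * (N + 1) / 12)

t_on_ranks = pooled_t(rx, ry)                        # t-test on the ranks
t_from_z = z * math.sqrt((N - 2) / (N - 1 - z * z))  # same value, obtained from z

print(u, z, t_on_ranks, t_from_z)
```

The point of the last two numbers agreeing is that the t (or F) on
ranks is a monotone function of U, so the two tests order the
possible samples in the same way.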

That ANOVA equivalence implies that deviations are measured as
squared distances rather than interchange distances. This is the
essential difference between the Kendall rank-order correlation and
the Spearman rank-order correlation: the Spearman is an ANOVA-type
statistic, since it can be computed as a Pearson r on the
rank-transformed variables. I have posted before about this
distinction between the Spearman and the Kendall, and why they
have different rejection regions.
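
A toy sketch of that distinction, in Python: the two rank orderings
below are made up for illustration (as are the function names
pearson_r and kendall_tau_a), and they contain no ties. Both
orderings have the same sum of squared rank differences, so the
Spearman cannot tell them apart, but they need different numbers of
pairwise interchanges to be put in order, so the Kendall can.

```python
import math


def pearson_r(a, b):
    """Plain Pearson correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def kendall_tau_a(a, b):
    """(concordant - discordant) / (number of pairs); no ties assumed."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)


base = [1, 2, 3, 4, 5, 6]
perm_a = [2, 1, 4, 3, 6, 5]   # three adjacent swaps
perm_b = [3, 1, 2, 4, 5, 6]   # one item moved two places

# The inputs are already ranks, so Pearson r on them is the Spearman rho.
rho_a, rho_b = pearson_r(base, perm_a), pearson_r(base, perm_b)
tau_a, tau_b = kendall_tau_a(base, perm_a), kendall_tau_a(base, perm_b)

print("Spearman:", rho_a, rho_b)   # equal: both have sum of d^2 = 6
print("Kendall: ", tau_a, tau_b)   # differ: 3 versus 2 discordant pairs
```

So a critical value on the Spearman treats these two orderings
alike, while a critical value on the Kendall can separate them.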

Thus, while Fagerland says that the WMW is "efficient and useful"
for the hypothesis you state, it is not the "least" parametric test;
that would possibly be Kendall's tau-c, which corrects for ties (using
group membership as a 0/1 variable).

Rich Ulrich