On Mon, 28 Jan 2013 10:21:21 -0800 (PST), "Mickey M." <firstname.lastname@example.org> wrote:
>I think I have to precise my question:
>
>Suppose we have X1, ..., Xn i.i.d from the distribution F and
>Y1, ..., Ym i.i.d from the distribution G.
>
>[and also X a random variable from F and Y ~ G]
>
>The standard Mann-Whitney test tests the null hypothesis
>H0: F=G
>against
>H1: F different from G
>or against the one-sided hypothesis
>H1': F stochastically greater than G (i.e. P(X>Y) > 0.5)
>
>I would like to know, whether it is possible to prove that Mann-Whitney test
>can be used in fact to test:
>
>H0: P(X>Y) = 0.5
>against
>H1: P(X>Y) \neq 0.5
>or H1': P(X>Y) > 0.5
>
>(i.e. whether it is possible to relax the H0 hypothesis, F and G not necessarily the same under H0)
>
>When I looked on the proof of M-W in my textbook, it seems to me that the assumption F=G (H0) is essential to prove the distributional properties of the Mann-Whitney U statistics. But I have seen the second variant of the test somewhere on the internet (without proof)....
>
I think this article by Morten W. Fagerland has an answer to your question. Bruce Weaver posted this reference not long ago on the SPSS list.
The article shows that, for large samples, the MW test is highly sensitive to differences in shape or variance, not only to the difference in location that one usually wants to detect.
"t-tests, non-parametric tests, and large studies - a paradox of statistical practice?"
[from the discussion] "Furthermore, if the results from the WMW test are interpreted strictly according to the test's null hypothesis, Prob(X<Y)=0.5, the WMW test is an efficient and useful test. For large studies, however, where the purpose is to compare the means of continuous variables, the choice of test is easy: the t-test is robust even to severely skewed data and should be used almost exclusively."
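For anyone who wants to see that sensitivity for themselves, here is a rough simulation sketch. The setup is my own assumption, not taken from the article: both groups are normal with mean 0, so P(X>Y) = 0.5 holds exactly, but the smaller group has three times the SD. The test uses the usual large-sample normal approximation, whose variance is derived under F=G.

```python
import random
from math import sqrt

def wmw_z(x, y):
    """WMW z statistic via rank sums (normal approximation, no tie or
    continuity correction; fine here because the data are continuous)."""
    n, m = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, g) in enumerate(pooled, start=1) if g == 0)
    mean_w = n * (n + m + 1) / 2
    var_w = n * m * (n + m + 1) / 12   # variance derived under F = G
    return (w - mean_w) / sqrt(var_w)

def rejection_rate(n, m, sd_x, sd_y, reps=2000, seed=1):
    """Share of two-sided rejections at the nominal 5% level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, sd_x) for _ in range(n)]
        y = [rng.gauss(0, sd_y) for _ in range(m)]
        hits += abs(wmw_z(x, y)) > 1.96
    return hits / reps

# Illustrative settings only (hypothetical sample sizes and SDs).
equal = rejection_rate(20, 80, 1.0, 1.0)     # F = G
unequal = rejection_rate(20, 80, 3.0, 1.0)   # P(X>Y) = 0.5 but F != G
print("F = G:          ", equal)
print("unequal spread: ", unequal)
```

With these settings the second rate should come out well above the nominal 5%, even though P(X>Y) = 0.5 is true. That is the reason the relaxed H0 cannot simply reuse the variance that the textbook proof derives under F=G.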
I will point out that Conover showed that the MW is asymptotically equivalent to an ANOVA test on the rank-transformed data.
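A small sketch of what that equivalence looks like in the two-group case, using made-up data. The helper names are mine. For two groups the ANOVA F is just the square of the pooled-variance t, and the t computed on the ranks turns out to be a monotone function of the tie-corrected WMW z, namely t = z * sqrt((N-2)/(N-1-z^2)).

```python
from math import sqrt

def midranks(values):
    """Ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_t_and_wmw_z(x, y):
    n, m = len(x), len(y)
    N = n + m
    r = midranks(list(x) + list(y))
    rx, ry = r[:n], r[n:]
    mean_x, mean_y = sum(rx) / n, sum(ry) / m
    # Pooled-variance t statistic computed on the ranks.
    ss_within = (sum((v - mean_x) ** 2 for v in rx)
                 + sum((v - mean_y) ** 2 for v in ry))
    t = (mean_x - mean_y) / sqrt(ss_within / (N - 2) * (1 / n + 1 / m))
    # WMW z from the rank sum of x, with the tie-corrected variance.
    grand = (N + 1) / 2
    ss_total = sum((v - grand) ** 2 for v in r)
    z = (sum(rx) - n * grand) / sqrt(n * m * ss_total / (N * (N - 1)))
    return t, z

# Made-up data, including a tie across groups.
x = [1.1, 2.3, 2.3, 4.0, 5.6, 7.2]
y = [0.4, 1.9, 2.3, 3.1, 3.3]
t, z = rank_t_and_wmw_z(x, y)
N = len(x) + len(y)
t_from_z = z * sqrt((N - 2) / (N - 1 - z * z))
print(t, z, t_from_z)
```

Since one statistic is a monotone function of the other, the two tests order samples the same way and differ only in the reference distribution used for the p-value.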
That has the implication that deviations are measured in squared distances, rather than interchange distances. This is the essential difference between the Kendall rank-order correlation and the Spearman rank-order correlation, where the Spearman is an ANOVA-type statistic, since it can be computed as a Pearson r on the rank-transformed variables. I have posted before about this distinction between the Spearman and the Kendall, and why they do have different rejection regions.
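To make the squared-distance versus interchange-distance point concrete, here is a toy sketch with made-up orderings and no ties. Both orderings below are two adjacent swaps away from perfect agreement, so Kendall's tau treats them alike, while Spearman (Pearson r on the ranks) penalizes the one item that moved two places more heavily.

```python
from itertools import combinations
from math import sqrt

def ranks(v):
    """Ranks 1..n; this toy version assumes there are no ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)

def spearman(x, y):
    # Pearson r on the rank-transformed variables.
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    # (concordant - discordant) pairs over the total number of pairs.
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s / (len(x) * (len(x) - 1) / 2)

x = [1, 2, 3, 4, 5]
y_a = [2, 1, 4, 3, 5]   # two separate adjacent swaps
y_b = [1, 2, 5, 3, 4]   # one item moved two places (also two swaps)

for y in (y_a, y_b):
    print(kendall_tau_a(x, y), spearman(x, y))
```

Both orderings have two discordant pairs out of ten, so tau is 0.6 for each, but Spearman's rho is 0.8 for the first and 0.7 for the second.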
Thus, while Fagerland says that the WMW is "efficient and useful" for the hypothesis you state, it is not the "least" parametric test; that would possibly be Kendall's tau-c, which corrects for ties (using group membership as a 0/1 variable).
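As a rough illustration of the 0/1 coding, with made-up data and helper names of my own: the Kendall numerator S (concordant minus discordant pairs) between group membership and the measurement counts the same pairwise comparisons as the Mann-Whitney U, with S = 2U - nm, so S/(nm) estimates P(X>Y) - P(X<Y). The sketch stops at S and does not apply the tau-c scaling.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U for x: number of (x, y) pairs with x > y, ties counted as 1/2."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)

def kendall_s(group, value):
    """Concordant minus discordant pairs; tied pairs contribute 0."""
    s = 0
    for i, j in combinations(range(len(group)), 2):
        prod = (group[i] - group[j]) * (value[i] - value[j])
        s += (prod > 0) - (prod < 0)
    return s

# Made-up data, including one tie across groups.
x = [1.1, 2.3, 2.3, 4.0, 5.6, 7.2]
y = [0.4, 1.9, 2.3, 3.1, 3.3]
group = [1] * len(x) + [0] * len(y)   # group membership as a 0/1 variable
value = x + y

u = mann_whitney_u(x, y)
s = kendall_s(group, value)
nm = len(x) * len(y)
print(u, s, s / nm)   # s / nm estimates P(X>Y) - P(X<Y)
```

The two approaches therefore share the same point estimate; where they differ is in how the variance of that estimate is obtained, and so in the rejection region.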