On Jan 31, 2:30 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
>
> By the way -- for years I have been somewhat disparaging
> towards rank tests. Now it turns out that I underestimated
> how bad they could be.
I, too, am much less sanguine about the U-test than I used to be. Here are some notes I made to myself a few years ago that explain one of the reasons for my pessimism.
____________________________________________________________________
Regardless of how we estimate its standard deviation, there are potential problems for any test that takes U to be approximately normal. Suppose we have a continuous dependent variable in two populations, say A and B, and that the distribution in each population is symmetric with the same mean, so that P(A > B) = 1/2. If population A is even moderately more variable than population B, and if nA is small and nB is not, then the distribution of U can look extremely non-normal, especially in the tails. For instance, consider two logistic populations with zero means and with sA ≈ 4sB, where s is the usual logistic scale parameter. If nA = 3 and nB = 30 then the distribution of U will have spikes at 0, 30, 60, 90 whose frequencies are approximately proportional to 1, 3, 3, 1, with the 87 other values of U between 0 and 90 having smaller frequencies. (Think of a suspension bridge with towers whose heights are binomially distributed, with each tower connected to the next by very saggy too-long cables, but with no cables connecting the end towers to the shores.)
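The spikes are easy to see in a small simulation. The following is only a rough sketch, not part of the original notes: it takes sA = 4*sB exactly (an assumption made for the sketch), counts U as the number of (A, B) pairs with A > B, and uses made-up helper names (logistic_sample, u_statistic, simulate_u). It uses only the Python standard library.

```python
import math
import random
from collections import Counter

def logistic_sample(scale, rng):
    """Draw one zero-mean logistic variate by inverting the CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return scale * math.log(u / (1.0 - u))

def u_statistic(a, b):
    """U = number of (a_i, b_j) pairs with a_i > b_j."""
    return sum(1 for x in a for y in b if x > y)

def simulate_u(n_a=3, n_b=30, s_a=4.0, s_b=1.0, reps=20000, seed=1):
    """Tally simulated values of U; s_a = 4 * s_b is an assumed ratio."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        a = [logistic_sample(s_a, rng) for _ in range(n_a)]
        b = [logistic_sample(s_b, rng) for _ in range(n_b)]
        counts[u_statistic(a, b)] += 1
    return counts

counts = simulate_u()
reps = sum(counts.values())

# Relative frequency at each "tower" and at its immediate neighbours.
for k in (0, 30, 60, 90):
    neighbours = [j for j in (k - 1, k + 1) if 0 <= j <= 90]
    shown = ", ".join(f"U={j}: {counts[j] / reps:.4f}" for j in neighbours)
    print(f"U={k}: {counts[k] / reps:.4f}   (neighbours: {shown})")
```

With these settings each of 0, 30, 60, 90 should stand well above the values right next to it, and the two middle towers should be clearly taller than the two end ones. A normal approximation to U has no way to reproduce that shape.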
The reason this happens can be seen by transforming the dependent variable so that the transformed B-distribution (the one with the larger sample size and smaller variance) is Uniform(0,1). Then the transformed A-distribution will generally look something like a Beta variable with alpha = beta < 1. (If the original distributions are logistic then the transformed pdfA will be r*(x*(1-x))^(r-1) / (x^r + (1-x)^r)^2, where r = sB/sA.) As sB/sA gets smaller, the transformed A-distribution becomes more like a simple Bernoulli variable that takes on only the values 0 and 1, each with probability 1/2. If nB is not small then the sample distribution of the transformed B-variable will be approximately uniform, and U will approximate nB times a Binomial(nA,1/2) variable.
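The transformed density can be checked numerically. In the logistic case the transformed A-variable has CDF x^r / (x^r + (1-x)^r), and differentiating that gives the pdf quoted above. The sketch below is illustrative only: the function names (cdf_a, pdf_a, tail_mass) and the 0.05 cutoff are my own choices. It confirms the derivative, shows the mass moving toward 0 and 1 as r shrinks, and lists the limiting nB * Binomial(nA, 1/2) weights.

```python
import math

def cdf_a(x, r):
    """CDF of the transformed A-variable on (0, 1), with r = sB/sA."""
    return x**r / (x**r + (1.0 - x)**r)

def pdf_a(x, r):
    """Density from the text: r*(x*(1-x))^(r-1) / (x^r + (1-x)^r)^2."""
    return r * (x * (1.0 - x))**(r - 1.0) / (x**r + (1.0 - x)**r)**2

def tail_mass(r, eps=0.05):
    """Probability of falling within eps of 0 or 1 (eps is an arbitrary cutoff)."""
    return cdf_a(eps, r) + (1.0 - cdf_a(1.0 - eps, r))

# Central-difference check that pdf_a is the derivative of cdf_a.
h = 1e-6
max_gap = max(
    abs((cdf_a(x + h, 0.25) - cdf_a(x - h, 0.25)) / (2 * h) - pdf_a(x, 0.25))
    for x in (0.1, 0.3, 0.5, 0.7, 0.9)
)
print(f"largest |numerical slope - pdf_a| at r = 0.25: {max_gap:.2e}")

# As r = sB/sA shrinks, the mass piles up near the endpoints 0 and 1.
for r in (1.0, 0.5, 0.25, 0.1, 0.01):
    print(f"r = {r:<5} mass within 0.05 of an endpoint: {tail_mass(r):.3f}")

# Limiting distribution of U: nB times a Binomial(nA, 1/2) variable.
n_a, n_b = 3, 30
limit = {n_b * k: math.comb(n_a, k) / 2**n_a for k in range(n_a + 1)}
print(limit)
```

At r = 1 the two populations have the same scale and the transformed A-variable is exactly uniform, so tail_mass returns 2 * eps. The smaller r gets, the closer it comes to the two-point Bernoulli picture.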