Re: Kolmogorov-Smirnov, Wilcoxon and Kruskal tests
Posted: Dec 31, 2012 6:38 PM


On Mon, 31 Dec 2012 18:38:28 +0000 (UTC), Herman Rubin <hrubin@skew.stat.purdue.edu> wrote:
>On 2012-12-30, Rich Ulrich <rich.ulrich@comcast.net> wrote:
[snip a bunch]
>
>> For some other tests:
>
>> It is common to see the t-test presented with tests for pooled vs
>> separate variances. Can you tell by looking at SDs and Ns which
>> test will be "more powerful" for given comparison? [assumption]
HR>
>The more degrees of freedom, the more power. However, pooled
>variances make the assumption that the variances are equal, which
>can be a very strong assumption.
>
>There are ways of getting a randomized t-test for the case of
>different populations which is valid no matter what the several
>variances happen to be; however, it only has the smallest number
>of degrees of freedom.
The point that I was thinking of here is a particular one. For equal Ns in two groups, there is hardly any difference in t-tests or p-levels between pooling and not. For unequal Ns, the difference can be large. If the tiny N has the small SD, you are apt to see a difference because both groups have relatively small Standard Errors. If the tiny N has the large SD, the SE of that group remains large... so the computed SE for the difference remains large, and the test will have little power.
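To put numbers on the pooled-versus-separate comparison, here is a minimal sketch that computes the standard error of the mean difference both ways. The group sizes and SDs are hypothetical values chosen only for illustration, and the function names are mine; the formulas are the usual pooled-variance and separate-variance (Welch-style) ones.

```python
import math

def pooled_se(n1, s1, n2, s2):
    """SE of the mean difference using a single pooled variance."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

def separate_se(n1, s1, n2, s2):
    """SE of the mean difference using separate variances (Welch-style)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical (n1, sd1, n2, sd2) settings, for illustration only.
cases = {
    "equal Ns": (50, 1.0, 50, 10.0),
    "tiny N, small SD": (5, 1.0, 100, 10.0),
    "tiny N, large SD": (5, 10.0, 100, 1.0),
}
for label, args in cases.items():
    print(f"{label:18s} pooled SE = {pooled_se(*args):.3f}   "
          f"separate SE = {separate_se(*args):.3f}")
```

With equal Ns the two standard errors coincide, so the t statistics match and only the degrees of freedom differ. With unequal Ns they diverge: when the tiny group has the large SD, the separate-variance SE stays large while the pooled SE is dominated by the big group.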
>
>> It is common (SPSS, say) to see a contingency table with both the
>> Pearson chi-squared test and the Likelihood test. Do you know which
>> test is more sensitive to which kind of difference? [criterion]
HR>
>The two tests are asymptotically the same. With reasonably
>large samples, there should be little difference.
Well, that is true for 2x2 tables. For larger tables: Not So. The rejection regions can differ, regardless of N.
Pearson's contingency test gives a greater weight to a *large* deviation in a single cell - think of the squared values of (O-E) in a computation formula - where the likelihood test does not square that difference, and effectively gives relatively greater weights to multiple cells with moderate deviations. (See the appendix in Agresti, for 'power family'.)
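A small numerical sketch of that difference in weighting. To keep it short, this uses a one-way table with fixed expected counts rather than a full contingency table (my simplification), and the counts are made up for illustration. The per-cell terms are the standard ones: (O-E)^2/E for Pearson and 2*O*ln(O/E) for the likelihood-ratio statistic.

```python
import math

def pearson_x2(observed, expected):
    """Pearson chi-squared: sum of (O - E)^2 / E over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def likelihood_g2(observed, expected):
    """Likelihood-ratio statistic: 2 * sum of O * ln(O / E) over cells."""
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

# Hypothetical counts (N = 100, five cells, equal expected counts).
expected = [20, 20, 20, 20, 20]
one_big = [38, 16, 16, 15, 15]           # one large deviation
several_moderate = [30, 30, 10, 10, 20]  # four moderate deviations

for name, obs in [("one big", one_big), ("several moderate", several_moderate)]:
    print(f"{name:17s} X2 = {pearson_x2(obs, expected):.2f}   "
          f"G2 = {likelihood_g2(obs, expected):.2f}")
```

For these made-up counts, Pearson ranks the single-large-deviation table as the more extreme of the two, while the likelihood statistic ranks the several-moderate-deviations table as more extreme, so the two tests order the same pair of outcomes differently.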
>
>> Can you construct a set of paired data for which the paired
>> t-test is less powerful than the separate-groups t-test?
>> [assumption]
>> (Is it ever fair to ignore the knowledge that these data are
>> correlated?)
>
>If the correlation is negative, the paired will be less powerful.
Right. That becomes clearer when you contrast the formulas for the variance of the difference: Var(A) + Var(B) for separate groups, and Var(A) + Var(B) - 2*Cov(A,B) for paired data.
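A quick check of those two variance formulas on a tiny, made-up paired data set with a perfectly negative correlation. The data and variable names are hypothetical; this is only a sketch of the arithmetic, not a power calculation.

```python
import math
from statistics import mean, variance

# Hypothetical paired scores with a negative correlation.
a = [1, 2, 3, 4, 5]
b = [6, 5, 4, 3, 2]
n = len(a)

var_a, var_b = variance(a), variance(b)
# Sample covariance, computed by hand to avoid depending on newer library calls.
cov_ab = sum((x - mean(a)) * (y - mean(b)) for x, y in zip(a, b)) / (n - 1)

# Variance of the paired differences equals Var(A) + Var(B) - 2*Cov(A,B).
diffs = [x - y for x, y in zip(a, b)]
var_d = variance(diffs)

mean_diff = mean(a) - mean(b)
t_paired = mean_diff / math.sqrt(var_d / n)                # df = n - 1
t_separate = mean_diff / math.sqrt((var_a + var_b) / n)    # df = 2n - 2

print(f"Var(A)+Var(B) = {var_a + var_b}, Var(A-B) = {var_d}, Cov = {cov_ab}")
print(f"paired t = {t_paired:.3f}, separate-groups t = {t_separate:.3f}")
```

Because the covariance is negative, subtracting 2*Cov(A,B) makes the paired variance larger, so the paired t is smaller in magnitude and is also referred to fewer degrees of freedom.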
>
>> The Spearman and Kendall coefficients for rank-correlation
>> do not have the same rejection area. Do you have a reason
>> for selecting one over the other? [criterion]
>
>The Kendall coefficient is VERY close to normal for reasonable
>sample sizes. The two coefficients are highly correlated; I
>seem to recall that the correlation approaches one as the
>sample size increases.
No, not so. Again, there are different rejection regions.
The Spearman - as is often mentioned - is exactly computable as a Pearson, product-moment correlation, using the Ranks as scores. Thus, deviations are measured as summed squares, analogous to the contingency chi-squared example.
Kendall, on the other hand, counts - as deviations from perfect correlation - the number of single "swaps" or interchanges that are necessary to make the scores agree. So this is a linear metric of difference, rather than squared.
Sample A (1, ..., 101)
Sample B (2, ..., 101, 1)
Sample C (4, ..., 52, 1,2,3, 51, ..., 101)

For Spearman: ranks 1 vs 101 give about 100^2 for deviation in B
For Spearman: sum of mid-deviations is about 3 * (50^2) in C (less)

For Kendall: 100 interchanges to correct B
For Kendall: 3*50 interchanges to correct C (more)

For Spearman, correlation (AB) is less than (AC)
For Kendall, correlation (AB) is greater than (AC)
... Therefore, rejection areas are different.
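The reversal can be checked directly. This is a minimal sketch in plain Python: the ellipsis endpoints given for Sample C overlap, so I am assuming C = (4, ..., 53, 1, 2, 3, 54, ..., 101), which keeps the intended "three values displaced by about 50 places". The helper names are mine, and the shortcut formulas assume untied ranks 1..n.

```python
from itertools import combinations

def sum_sq_rank_diff(x, y):
    """Sum of squared rank differences (what Spearman penalizes)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def discordant_pairs(x, y):
    """Count of discordant pairs = adjacent swaps needed (what Kendall penalizes)."""
    return sum(1 for i, j in combinations(range(len(x)), 2)
               if (x[i] - x[j]) * (y[i] - y[j]) < 0)

def spearman_rho(x, y):
    n = len(x)
    return 1 - 6 * sum_sq_rank_diff(x, y) / (n * (n * n - 1))

def kendall_tau(x, y):
    n = len(x)
    return 1 - 2 * discordant_pairs(x, y) / (n * (n - 1) / 2)

A = list(range(1, 102))
B = list(range(2, 102)) + [1]
# Assumed reading of Sample C (the endpoints in the post overlap).
C = list(range(4, 54)) + [1, 2, 3] + list(range(54, 102))

for name, s in [("B", B), ("C", C)]:
    print(f"A vs {name}: sum d^2 = {sum_sq_rank_diff(A, s)}, "
          f"swaps = {discordant_pairs(A, s)}, "
          f"rho = {spearman_rho(A, s):.4f}, tau = {kendall_tau(A, s):.4f}")
```

Under that assumption, B has the larger sum of squared rank differences (10100 vs 7950) but needs fewer interchanges (100 vs 150), so Spearman ranks A-B below A-C while Kendall ranks A-B above A-C.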
Larger samples will give you more precise estimates of each coefficient, but there is no reason for them to converge in order.
--
Rich Ulrich



