The Math Forum


Topic: kolmogorov-smirnov, wilcoxon and kruskal tests
Replies: 14   Last Post: Dec 31, 2012 6:38 PM

Richard Ulrich

Re: kolmogorov-smirnov, wilcoxon and kruskal tests
Posted: Dec 31, 2012 6:38 PM

On Mon, 31 Dec 2012 18:38:28 +0000 (UTC), Herman Rubin wrote:

>On 2012-12-30, Rich Ulrich wrote:
[snip a bunch]
>> For some other tests:
>> It is common to see t-test presented with tests for pooled vs
>> separate variances. Can you tell by looking at SDs and Ns which
>> test will be "more powerful" for given comparison? [assumption]

>The more degrees of freedom, the more power. However, pooled
>variances make the assumption that the variances are equal, which
>can be a very strong assumption.
>There are ways of getting a randomized t-test for the case of
>different populations which is valid no matter what the several
>variances happen to be; however, it only has the smallest number
>of degrees of freedom.

The point I had in mind here is a specific one.
- For equal Ns in two groups, there is hardly any difference
in t-tests or p-levels between pooling and not.
- For unequal Ns, the difference can be large. If the tiny N
has the small SD, you are apt to see a difference because both
groups have relatively small Standard Errors. If the tiny N
has the large SD, the SE of that group remains large... so
the computed SE for the difference remains large, and the test
will have little power.
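
A minimal sketch of the comparison above, working from summary
statistics only. The function name and the SDs and Ns are hypothetical,
chosen just to show the three situations (equal Ns, tiny N with the
small SD, tiny N with the large SD); the "separate" SE is the usual
unpooled (Welch) standard error.

```python
import math

def se_of_difference(s1, n1, s2, n2):
    """Return (pooled SE, separate-variance SE, Welch df) for a
    difference between two group means, given SDs and Ns."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared SE of each mean
    se_separate = math.sqrt(v1 + v2)
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Welch-Satterthwaite approximation to the degrees of freedom
    welch_df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se_pooled, se_separate, welch_df

# Hypothetical (SD, N) pairs for illustration only.
cases = {
    "equal Ns":               (3.0, 20, 7.0, 20),
    "tiny N has the small SD": (1.0, 5, 10.0, 100),
    "tiny N has the large SD": (10.0, 5, 1.0, 100),
}
for label, args in cases.items():
    pooled, separate, df = se_of_difference(*args)
    print(f"{label}: pooled SE={pooled:.3f}, "
          f"separate SE={separate:.3f}, Welch df={df:.1f}")
```

With equal Ns the two standard errors coincide (only the degrees of
freedom differ); with unequal Ns they can diverge sharply in either
direction.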

>> It is common (SPSS, say) to see a contingency table with both the
>> Pearson chisquared test and the Likelihood test. Do you know which
>> test is more sensitive to which kind of difference? [criterion]

>The two tests are asymptotically the same. With reasonably
>large samples, there should be little difference.

Well, that is true for 2x2 tables. For larger tables: Not So.
The rejection regions can differ, regardless of N.

Pearson's contingency test gives a greater weight to a
*large* deviation in a single cell -- think of the squared
values of (O-E) in a computational formula -- where the likelihood
test does not square that difference, and effectively
gives relatively greater weights to multiple cells with
moderate deviations.
(See the appendix in Agresti, for 'power family'.)
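
For anyone who wants to try this on their own tables, here is a rough
sketch that computes both statistics for a two-way table, using the
standard formulas X^2 = sum (O-E)^2/E and G^2 = 2 * sum O*ln(O/E) with
E taken from the margins. The function name and the counts are
hypothetical; it assumes no row or column total is zero.

```python
import math

def pearson_and_likelihood(table):
    """Return (Pearson X^2, likelihood-ratio G^2) for a two-way
    table of observed counts, testing independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:                 # 0 * ln(0) is taken as 0
                g2 += 2.0 * observed * math.log(observed / expected)
    return x2, g2

# Hypothetical counts: one table with a single outlying cell,
# one with several moderate departures.
one_big_cell = [[30, 10, 10, 10], [10, 10, 10, 10]]
many_moderate = [[14, 6, 14, 6], [6, 14, 6, 14]]
for name, tab in [("one big cell", one_big_cell),
                  ("many moderate", many_moderate)]:
    x2, g2 = pearson_and_likelihood(tab)
    print(f"{name}: X^2={x2:.3f}, G^2={g2:.3f}")
```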

>> Can you construct a set of paired data for which the paired
>> t-test is less powerful than the separate-groups t-test?
>> [assumption]
>> (Is it ever fair to ignore the knowledge that these data are
>> correlated?)

>If the correlation is negative, the paired will be less powerful.

Right. That becomes clearer when you contrast the formulas
for the variance of the difference,
Var(A) + Var(B)                 (separate groups) and
Var(A) + Var(B) - 2*Cov(A,B)    (paired)
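
A small numeric check of the two variance formulas, as a sketch only.
The data are made up and perfectly negatively correlated, which is the
extreme case; the covariance helper is written out by hand so that
nothing beyond the standard library is assumed.

```python
from statistics import variance

def covariance(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

# Made-up paired scores with a negative correlation.
a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]

separate_var = variance(a) + variance(b)            # ignores the pairing
paired_var = variance([x - y for x, y in zip(a, b)])  # variance of differences

print("Var(A) + Var(B)              =", separate_var)
print("Var(A) + Var(B) - 2*Cov(A,B) =", separate_var - 2 * covariance(a, b))
print("Var(A - B) computed directly =", paired_var)
```

With a negative covariance the subtraction adds to the variance, so the
paired difference is the noisier one.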

>> The Spearman and Kendall coefficients for rank-correlation
>> do not have the same rejection area. Do you have a reason
>> for selecting one over the other? [criterion]

>The Kendall coefficient is VERY close to normal for reasonable
>sample sizes. The two coefficients are highly correlated; I
>seem to recall that the correlation approaches one as the
>sample size increases.

No, not so. Again, there are different rejection regions.

The Spearman - as is often mentioned - is exactly computable
as a Pearson, product-moment correlation, using the Ranks
as scores. Thus, deviations are measured as summed
squares, analogous to the contingency chi-squared example.

Kendall, on the other hand, counts -- as deviations from
perfect correlation -- the number of single "swaps" or
interchanges that are necessary to make the scores agree.
So this is a linear metric of difference, rather than squared.

Sample A (1, ..., 101)
Sample B (2, ..., 101, 1)
Sample C (4, ..., 53, 1,2,3, 54, ..., 101)

For Spearman: ranks 1 vs 101 give about 100^2 for deviation in B
For Spearman: sum of mid-deviations is about 3* (50^2) in C (less)

For Kendall: 100 interchanges to correct B
For Kendall: 3*50 interchanges to correct C (more)

For Spearman, correlation (AB) is less than (AC)
For Kendall, correlation (AB) is greater than (AC)

... Therefore, rejection areas are different.
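
The example can be checked directly. This is a sketch under one
assumption: Sample C is read as the ranks 1, 2, 3 each moved 50 places
down, i.e. (4, ..., 53, 1, 2, 3, 54, ..., 101). The two helper
functions are hypothetical names and use the no-ties formulas
rho = 1 - 6*sum(d^2)/(n(n^2-1)) and tau = 1 - 4*D/(n(n-1)), where D is
the number of discordant pairs.

```python
def spearman_rho(x, y):
    """Spearman rank correlation for two rankings without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    """Kendall tau for two rankings without ties, by counting
    discordant pairs (equal to the adjacent swaps needed)."""
    n = len(x)
    discordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )
    return 1 - 4 * discordant / (n * (n - 1))

A = list(range(1, 102))                                   # 1, ..., 101
B = list(range(2, 102)) + [1]                             # 1 moved to the end
C = list(range(4, 54)) + [1, 2, 3] + list(range(54, 102)) # assumed reading

print("Spearman AB:", round(spearman_rho(A, B), 3),
      " AC:", round(spearman_rho(A, C), 3))
print("Kendall  AB:", round(kendall_tau(A, B), 3),
      " AC:", round(kendall_tau(A, C), 3))
```

Under that reading the two coefficients rank B and C in opposite order
relative to A.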

Larger samples will give you more precise estimates of each
coefficient, but there is no reason for the two to converge
on the same ordering.

Rich Ulrich
