On Mon, 31 Dec 2012 18:38:28 +0000 (UTC), Herman Rubin <email@example.com> wrote:
>On 2012-12-30, Rich Ulrich <firstname.lastname@example.org> wrote:
[snip a bunch]
>
>> For some other tests:
>
>> It is common to see t-test presented with tests for pooled vs
>> separate variances. Can you tell by looking at SDs and Ns which
>> test will be "more powerful" for given comparison? [assumption]
HR>
>The more degrees of freedom, the more power. However, pooled
>variances make the assumption that the variances are equal, which
>can be a very strong assumption.
>
>There are ways of getting a randomized t-test for the case of
>different populations which is valid no matter what the several
>variances happen to be; however, it only has the smallest number
>of degrees of freedom.
The point that I was thinking of here is particular.
 - For equal Ns in two groups, there is hardly any difference
   in t-tests or p-levels between pooling and not.
 - For unequal Ns, the difference can be large. If the tiny N
   has the small SD, you are apt to see a difference because both
   groups have relatively small Standard Errors. If the tiny N has
   the large SD, the SE of that group remains large... so the
   computed SE for the difference remains large, and the test will
   have little power.
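To put rough numbers on that, here is a minimal sketch that computes the standard error of the mean difference both ways from Ns and SDs alone. The function names and the three (N, SD) cases are made up for illustration; they are not from any real data set.

```python
import math

def se_pooled(n1, sd1, n2, sd2):
    """SE of the mean difference using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

def se_separate(n1, sd1, n2, sd2):
    """SE of the mean difference using separate variance estimates."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Hypothetical (n1, sd1, n2, sd2) cases, chosen only to show the pattern.
cases = {
    "equal Ns": (50, 1.0, 50, 10.0),
    "tiny N, small SD": (5, 1.0, 100, 10.0),
    "tiny N, large SD": (5, 10.0, 100, 1.0),
}

for label, args in cases.items():
    print(f"{label:18s} pooled SE = {se_pooled(*args):6.3f}   "
          f"separate SE = {se_separate(*args):6.3f}")
```

With equal Ns the two SEs coincide, so only the degrees of freedom differ. With unequal Ns, the separate-variances SE is the smaller one when the tiny group has the small SD, and the larger one when the tiny group has the large SD.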
>
>> It is common (SPSS, say) to see a contingency table with both the
>> Pearson chisquared test and the Likelihood test. Do you know which
>> test is more sensitive to which kind of difference? [criterion]
HR>
>The two tests are asymptotically the same. With reasonably
>large samples, there should be little difference.
Well, that is true for 2x2 tables. For larger tables: Not So. The rejection regions can differ, regardless of N.
Pearson's contingency test gives a greater weight to a *large* deviation in a single cell -- think of the squared values of (O-E) in a computation formula -- where the likelihood test does not square that difference, and effectively gives relatively greater weights to multiple cells with moderate deviations. (See the appendix in Agresti, for 'power family'.)
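A small sketch of that weighting, under a simplifying assumption: it uses a one-way table with fixed expected counts instead of a full two-way contingency table, so that the cell contributions are easy to see. The counts and the names `one_big` and `many_moderate` are invented for illustration.

```python
import math

def pearson_x2(observed, expected):
    """Pearson statistic: sum of (O-E)^2 / E over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def likelihood_g2(observed, expected):
    """Likelihood-ratio statistic: 2 * sum of O * ln(O/E) over cells."""
    # A cell with O == 0 contributes 0 (the limit of O*ln(O/E)).
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

expected = [10.0] * 10                # same total (100) in both tables
one_big = [28] + [8] * 9              # one cell far above expectation
many_moderate = [15] * 5 + [5] * 5    # every cell moderately off

for label, obs in (("one big", one_big), ("many moderate", many_moderate)):
    print(f"{label:14s} X2 = {pearson_x2(obs, expected):6.2f}   "
          f"G2 = {likelihood_g2(obs, expected):6.2f}")
```

In this made-up pair, Pearson ranks the single-large-deviation table as the more extreme one, while the likelihood statistic ranks the many-moderate-deviations table as more extreme. That is the sense in which the rejection regions differ.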
>
>> Can you construct a set of paired data for which the paired
>> t-test is less powerful than the separate-groups t-test?
>> [assumption]
>> (Is it ever fair to ignore the knowledge that these data are
>> correlated?)
>
>If the correlation is negative, the paired will be less powerful.
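Right. That becomes clearer when you contrast the formulas for the variance of the difference, separate groups versus paired:
    Var(A) + Var(B)
and
    Var(A) + Var(B) - 2*Cov(A,B)
When the covariance is negative, the paired version is the larger of the two.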
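Here is a minimal sketch of such a data set, with made-up scores in which B runs exactly opposite to A. It computes both variance formulas and the two t statistics; the degrees of freedom (n-1 for paired, 2n-2 for separate groups) are not computed, but they also favor the separate-groups test.

```python
import math
from statistics import mean, variance

# Invented paired scores; b decreases as a increases (correlation -1).
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [7.0, 6.0, 5.0, 4.0, 3.0]
n = len(a)

diffs = [y - x for x, y in zip(a, b)]
cov_ab = sum((x - mean(a)) * (y - mean(b)) for x, y in zip(a, b)) / (n - 1)

var_separate = variance(a) + variance(b)               # ignores the pairing
var_paired = variance(a) + variance(b) - 2 * cov_ab    # equals variance(diffs)

# Same mean difference in both; only the standard error changes.
t_separate = mean(diffs) / math.sqrt(var_separate / n)
t_paired = mean(diffs) / math.sqrt(var_paired / n)

print(f"Cov(A,B) = {cov_ab:.2f}")
print(f"separate: variance = {var_separate:.2f}, t = {t_separate:.3f}")
print(f"paired:   variance = {var_paired:.2f}, t = {t_paired:.3f}")
```

With equal group sizes the pooled and separate-variance standard errors agree, so `t_separate` stands for either unpaired test here.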
>
>> The Spearman and Kendall coefficients for rank-correlation
>> do not have the same rejection area. Do you have a reason
>> for selecting one over the other? [criterion]
>
>The Kendall coefficient is VERY close to normal for reasonable
>sample sizes. The two coefficients are highly correlated; I
>seem to recall that the correlation approaches one as the
>sample size increases.
No, not so. Again, there are different rejection regions.
The Spearman - as is often mentioned - is exactly computable as a Pearson, product-moment correlation, using the Ranks as scores. Thus, deviations are measured as summed squares, analogous to the contingency chi-squared example.
Kendall, on the other hand, counts -- as deviations from perfect correlation -- the number of single "swaps" or interchanges that are necessary to make the scores agree. So this is a linear metric of difference, rather than squared.
Sample A (1, ..., 101)
Sample B (2, ..., 101, 1)
Sample C (4, ..., 52, 1,2,3, 51, ..., 101)

For Spearman: ranks 1 vs 101 give about 100^2 for deviation in B
For Spearman: sum of mid-deviations is about 3* (50^2) in C (less)

For Kendall: 100 interchanges to correct B
For Kendall: 3*50 interchanges to correct C (more)

For Spearman, correlation (AB) is less than (AC)
For Kendall, correlation (AB) is greater than (AC)
... Therefore, rejection areas are different.
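The comparison can be checked directly with a short sketch. Two assumptions are mine: Sample C is read as 4..52, then 1,2,3, then 53..101 (one reading of the listing above, chosen so that C is a permutation of 1..101), and both coefficients use the simple no-ties formulas. The function names are illustrative only.

```python
def spearman_rho(x, y):
    """Spearman rho for two untied rankings, via the sum of squared rank differences."""
    n = len(x)
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    """Kendall tau for untied data, by counting discordant pairs."""
    n = len(x)
    discordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )
    return 1.0 - 2.0 * discordant / (n * (n - 1) // 2)

a = list(range(1, 102))                                         # 1, ..., 101
b = list(range(2, 102)) + [1]                                   # 1 moved to the end
c = list(range(4, 53)) + [1, 2, 3] + list(range(53, 102))       # assumed reading of C

print(f"Spearman: AB = {spearman_rho(a, b):.4f}   AC = {spearman_rho(a, c):.4f}")
print(f"Kendall:  AB = {kendall_tau(a, b):.4f}   AC = {kendall_tau(a, c):.4f}")
```

The number of discordant pairs is the same as the number of adjacent interchanges needed to restore the order, so `kendall_tau` is counting the "swaps" described above.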
Larger samples will give you more precise estimates of each coefficient, but there is no reason for them to converge in order.