Douglas 73
Posts:
39
From:
Pacific northwest, USA
Registered:
1/15/11


Upside down Beginner Statistics Priority
Posted:
Jul 11, 2012 9:41 AM


In the beginning (of Statistics 101) there was a cursory explanation of coin-flipping outcomes and the Chi-square table. Then it was quickly on to two-dice outcomes, and one was to notice the triangular peak as those outcomes accumulated. By the end of the second week (as I recall) we were introduced to the wonders of the normal curve and the central limit theorem as the acme of statistical sophistication. Beyond that we were presented with normal curve method variations such as Student's t.
Since those days (1958), everything I have read indicates that one seeks, as first priority, to measure the truly important issues with normal curve methods, or with as normal a curve as is possible in the circumstances. Failing that, one looks to refined nonparametric methods like the KS test. If none of those methods suffice for whatever reason, you are left with the lowly Chi-square method varieties.
For instance, in Pearson's goodness-of-fit test (which appears to be disappearing from the newer beginner texts), one degree of freedom (1 d.f., the minimum provided for) is considered to provide the crudest p-value approximation for any randomly variable frequency data. More degrees of freedom are said to provide an increasingly refined p-value approximation. And there is usually advice given that if one needs to use more than 30 d.f., one should look to normal curve methods. Thus, the statistical methods decision circle is complete. (There does not seem to be provision for the possibility of an infinite decision loop.)
It is my experience that virtually all published statistically based studies have one or more probability values which usually represent the primary evidence supporting whatever conclusion(s) each study makes. That would seem to make the accuracy of these probability values of prime importance.
Imagine my surprise on discovering that the lowly 1 d.f. Chi-square approximation provides exactly the same p-value (and Z value) as the standard normal curve approximation at every binomial point.
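For anyone who wants to check the 1 d.f. agreement numerically, here is a minimal sketch using only the Python standard library. It rests on assumptions that are mine and not stated above: a two-category (binomial) test with n = 20 trials and hypothesized p = 0.5, a two-sided normal p-value, and no continuity correction. The function names are made up for illustration.

```python
import math

def pearson_chi_square_p(k, n, p=0.5):
    """Pearson goodness-of-fit statistic and p-value (1 d.f.) for k successes in n trials."""
    observed = (k, n - k)
    expected = (n * p, n * (1 - p))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a Chi-square variable with 1 d.f.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def normal_two_sided_p(k, n, p=0.5):
    """Z value and two-sided p-value from the normal approximation to the binomial."""
    z = (k - n * p) / math.sqrt(n * p * (1 - p))
    # Two-sided tail area of the standard normal curve
    return z, math.erfc(abs(z) / math.sqrt(2))

# Assumed example: 20 coin flips, fair-coin hypothesis
n = 20
for k in range(n + 1):
    chi2, p_chi = pearson_chi_square_p(k, n)
    z, p_norm = normal_two_sided_p(k, n)
    print(k, round(chi2, 4), round(z * z, 4), p_chi, p_norm)
```

In each row the Chi-square statistic equals Z squared and the two p-values coincide up to floating-point rounding. The sketch only covers the two-category case; it says nothing about 2 d.f. and beyond.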
From 2 d.f. onward, there is no contest. The normal curve estimates immediately become variably disassociated from their underlying probabilities. Since the Chi-square distribution is the underlying probabilities themselves, appropriate Chi-square evaluation forever remains connected to the underlying probabilities. It is the most direct interface to underlying probabilities currently in existence.
I suggest the traditional evaluation decision loop is fundamentally sounder with the priorities reversed. Only if Chi-square evaluation is not appropriate to your study needs should you consider other evaluation methods, because no matter how stylish or sophisticated the others might seem, they will in all cases have probability values more remote from provable underlying probability values than an appropriate Chi-square evaluation. And, I suspect, in many past cases the study probabilities reported are in fact disassociated from the actual underlying probabilities. I attribute this mainly to inadequate instruction in the seemingly simple basics, and to subsequent incomplete data evaluation practice. The latter is a subject in its own right for another day.
Douglas

