Views expressed in these public forums are not endorsed by
Drexel University or The Math Forum.
Re: Student-t distribution - profound or biased?
Posted: Jul 2, 2006 2:33 PM
Thanks for your post. You are correct in suggesting that the accuracy gets worse as we move to the 2nd, 3rd, etc. moments. Now let's consider the average and the t-ratio.
Gosset was a remarkable person who swam against the tide. I've always said that he reasoned "like an engineer". (I'm a chemical engineer and also a statistician.) Prior to his work it was deemed that in order to use the "Z-ratio" (the analog of the t-ratio, but with the true standard deviation sigma "known") a person would have to either get a lot of data to obtain an accurate estimate of sigma or use a theoretical "known" value. It seems to me that Gosset reasoned as follows: "The form of the Z-ratio is known from first principles and it is correct. The only problem is that we never actually know sigma. So let's calculate an estimate of sigma from a modest amount of data and move forward... using the Z-ratio... just as if everything is OK." In doing this he recognized that the usual probability values associated with the Z-ratio (such as 1.96 for the 0.05 level of significance, etc.) would no longer be correct. He probably reasoned that, since sigma is not known, the distribution of t would be broader than that of Z and hence the probability values had to be properly adjusted. As I like to put it, reasoning like an engineer, he foresaw that the form of the t-ratio was correct (again, an analog of the Z-ratio) and realized that he could "move the problem to a place where he could fix it", that is, by adjusting the probability values in the tails. But how to adjust them?
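For reference, the two ratios in their one-sample form can be written as follows. The symbols (\bar{x} for the sample average, \mu_0 for the hypothesized true average, n for the sample size, s for the sample standard deviation) are the usual textbook notation and are an assumption here, since the discussion itself stays in words.

```latex
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```

The only change from Z to t is that s stands in for sigma. Because s itself varies from sample to sample, t (with n - 1 degrees of freedom) has heavier tails than Z.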
Before I go further, let me add that there are alternative versions of this story. One version says that Ronald Fisher conceived the form of the t-ratio and that Gosset learned it from him. Whatever...
OK, so by merely substituting the t-ratio for the Z-ratio and "pretending" that an estimate of sigma is as good as the true value, the tail probabilities will (generally) "be wrong". Gosset did substantial work with random numbers... a Monte Carlo analysis... to learn how to adjust the tail probabilities. In his time that was an incredible undertaking. The common t-tables are set up for fixed (specified) values of the tail probabilities. In other words, the probs are the arguments in those tables.
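The kind of random-number experiment described here can be sketched in a few lines of Python. This is only an illustration under my own assumptions (one-sample t-ratio, normal data, sample size 5, a fixed seed); the function names are hypothetical and it is not a reconstruction of Gosset's actual procedure.

```python
import math
import random


def t_ratio(sample, mu):
    """One-sample t-ratio: (average - mu) / (s / sqrt(n)), with s using n - 1."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu) / (s / math.sqrt(n))


def simulate_t_ratios(n, trials, mu=0.0, sigma=1.0, seed=12345):
    """Draw `trials` normal samples of size n and return the t-ratio of each."""
    rng = random.Random(seed)
    return [t_ratio([rng.gauss(mu, sigma) for _ in range(n)], mu)
            for _ in range(trials)]


def tail_fraction(ratios, cutoff):
    """Fraction of ratios whose absolute value exceeds the cutoff."""
    return sum(abs(t) > cutoff for t in ratios) / len(ratios)


def empirical_cutoff(ratios, alpha=0.05):
    """Two-sided cutoff: the |t| value exceeded in roughly alpha of the trials."""
    ordered = sorted(abs(t) for t in ratios)
    index = min(len(ordered) - 1, math.ceil((1.0 - alpha) * len(ordered)) - 1)
    return ordered[index]


if __name__ == "__main__":
    ratios = simulate_t_ratios(n=5, trials=20000)
    # With sigma estimated from only 5 points, 1.96 is exceeded far more
    # often than 5% of the time.
    print("fraction with |t| > 1.96:", tail_fraction(ratios, 1.96))
    # The simulated 5% cutoff should land near the tabled t for 4 degrees
    # of freedom (about 2.78).
    print("empirical 5% cutoff:", empirical_cutoff(ratios))
```

Running it with a small n shows both halves of the argument: the Z cutoff of 1.96 is "wrong" for the t-ratio, and the simulation itself tells you what the corrected cutoff should be.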
Once we see how Monte Carlo works, the door is then opened to a whole new world of thinking about "statistics", probability, and many related subjects. This puts us in a world in which we can teach ourselves and others some common and even rather exotic methods in statistics and more. As you may know, Monte Carlo analysis was used in early nuclear weapons design work at Los Alamos... simulating neutrons banging into the nuclei of uranium atoms. That involved extensive work with one of the early digital computers.
Now go one level deeper into your concerns. The Z-ratio (which implies sigma is known... one way to think about this is that Z has infinite degrees of freedom) is really just a special case of the t-ratio. In a common t-table the values of t are larger than the corresponding values of Z, but not as an arbitrary adjustment. Those "adjustments" were derived both from Monte Carlo analyses and from mathematics.
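To see Z emerge as the limiting case, one can compute two-sided 5% cutoffs of t for growing degrees of freedom and compare them with the Z value. The sketch below uses only the Python standard library; the numerical approach (Simpson's rule on the t density plus bisection) and the function names are my own choices for illustration, not how printed tables were produced.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_central_prob(x, df, steps=1000):
    """P(-x < T < x): Simpson's rule on [0, x], doubled by symmetry."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2.0 * total * h / 3.0


def t_two_sided_cutoff(alpha, df):
    """Value c with P(|T| > c) = alpha, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1.0 - t_central_prob(mid, df) > alpha:
            lo = mid  # tails still too heavy: cutoff lies further out
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for df in (4, 10, 30, 1000):
        print(df, round(t_two_sided_cutoff(0.05, df), 3))
    print("Z", round(NormalDist().inv_cdf(0.975), 3))
```

The cutoffs shrink steadily toward the Z value as the degrees of freedom increase, which is the sense in which Z is t with "infinite" degrees of freedom.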
If you want to delve further into this, I'd be happy to send you some Monte Carlo software that I use when teaching the t-ratio. It's written in the Basic language (Basic because Basic is very easy to "see", if someone wants to understand the code.) It's designed to ask whether there is a difference between two underlying averages. You specify those "true" averages (they can be the same) and the underlying "true" sigmas. At execution time it generates samples of data from each source (normal distributions)... compares them... calculates and reports t, etc. a large number of times (you specify how many times)... and reports among other things the number of times your empirical values of t exceed the tabled values (also called the reference values of t). This can be sent as an executable and/or as the raw code so you can see how it works.
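For readers who want to try the idea right away, here is a rough Python sketch of an experiment of the kind just described. It is not the Basic program mentioned above, which I have not reproduced; the pooled-variance form of the two-sample t, equal sample sizes, the function names, and the seed are all assumptions made for illustration. The tabled value of t has to be supplied by the user.

```python
import math
import random


def mean_and_variance(data):
    """Sample average and sample variance (n - 1 in the denominator)."""
    n = len(data)
    mean = sum(data) / n
    return mean, sum((x - mean) ** 2 for x in data) / (n - 1)


def two_sample_t(a, b):
    """Pooled-variance t-ratio for the difference between two averages."""
    na, nb = len(a), len(b)
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))


def count_exceedances(mean_a, mean_b, sigma_a, sigma_b, n, trials, t_table,
                      seed=2006):
    """Count the trials in which |t| exceeds the tabled (reference) value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(mean_a, sigma_a) for _ in range(n)]
        b = [rng.gauss(mean_b, sigma_b) for _ in range(n)]
        if abs(two_sample_t(a, b)) > t_table:
            hits += 1
    return hits


if __name__ == "__main__":
    trials = 20000
    # 2.306 is the tabled two-sided 5% value of t for 8 degrees of freedom
    # (two samples of 5 each).
    hits = count_exceedances(10.0, 10.0, 1.0, 1.0, n=5, trials=trials,
                             t_table=2.306)
    print("same true averages:", hits, "of", trials)
    hits = count_exceedances(10.0, 12.0, 1.0, 1.0, n=5, trials=trials,
                             t_table=2.306)
    print("different true averages:", hits, "of", trials)
```

When the two true averages are equal, the count should come out near 5% of the trials; when they differ, it shows how often the test would detect the difference.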
I have Monte Carlo demos for a lot of other "statistics", including some that are not commonly used... but which IMHO should be used.
If you'd like to get that, send an e-mail to me at hedging77 followed by the usual symbol for "at" yahoo.com. Mention this note. I check that address about once a day. I'll then give you the address I use "all day long" every day.
Be of good cheer... OMU
Gosset discovered the form of the t distribution by a combination of mathematical and empirical work with random numbers, an early application of the Monte-Carlo method.
WGSGNUAYHTTE@spammotel.com wrote:
> Dear Probability Experts,
>
> I am a retired Civil Engineer and teacher who never took a course in
> statistics/probability, but the concepts of variability have changed
> the way I think about almost everything. I would very much appreciate
> help in understanding one specific aspect of the "Student's t"
> distribution. The following represents my understanding - it is not
> presented as fact or argument.
>
> First, I believe that the accuracy with which a sample parameter
> approximates the same parameter in the underlying population decreases
> as the moment increases. For a given sample size, the dispersion is
> less reliable than the mean, the skewness is less reliable than the
> dispersion, and the kurtosis is less reliable than the skewness.
>
> It appears that Gosset recognized the POSSIBILITY that the sample
> variance might UNDERESTIMATE the population variance. He derived the
> "t" distribution in which "t" is always larger than the sample standard
> deviation "s" (especially for sample sizes less than about 30). It
> seems, therefore, that he created an estimate of the population
> dispersion that was broader than actually indicated by the sample. By
> assuming a more disperse population, it would be far less likely that
> the brewing process would appear out of control. He would, therefore,
> avoid shutting the process down unnecessarily. I am sure that the
> Guinness management liked that.
>
> I don't understand why it is not EQUALLY POSSIBLE for the variance of a
> small sample to OVERESTIMATE the dispersion of the underlying
> population. This would result in allowing the process to continue when
> it is actually out of control. In other words, why is there not an
> "op" statistic (other possibility) which is the reciprocal of "t?" How
> would one know whether to use "t" or "op" for a certain small sample?
>
> I know I am missing something, but my question seems important with the
> recent emphasis on product quality. Any help will be appreciated, and
> responses in common parlance will be most useful.
>
> Troubled