
Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Notice: We are no longer accepting new posts, but the forums will continue to be readable.

Topic: Student-t distribution - profound or biased?
Replies: 9   Last Post: Jul 4, 2006 5:01 PM

Old Mac User (Posts: 104, Registered: 3/12/06)
Re: Student-t distribution - profound or biased?
Posted: Jul 2, 2006 2:33 PM

Thanks for your post. You are correct in suggesting that the accuracy
gets worse as we move to the 2nd, 3rd, etc. moments. Now let's
consider the average and the t-ratio.

Gosset was a remarkable person who swam against the tide. I've always
said that he reasoned "like an engineer" (I'm a chemical engineer and
also a statistician.) Prior to his work it was deemed that in order to
use the "Z-ratio" (analog to the t-ratio, but with the true standard
deviation sigma "known") a person would have to either get a lot of
data to obtain an accurate estimate of "sigma" or use a theoretical
"known" value. It seems to me that Gosset reasoned as follows. "The
form of the Z-ratio is known from first principles and it is correct.
The only problem is that we never actually know sigma. So let's
calculate an estimate of sigma (the sample standard deviation s) from a modest amount of data
and move forward... using the Z-ratio... just as if everything is OK."
In doing this he recognized that the usual probability values
associated with the Z-ratio (such as 1.96 for the 0.05 level of
significance, etc.) would no longer be correct. He probably reasoned
that, since sigma is not known, the distribution of t would be broader
than for Z and hence the probability values had to be properly
adjusted. As I like to put it, reasoning like an engineer, he foresaw
that the form of the t-ratio was correct (again, an analog of the
Z-ratio) and realized that he could "move the problem to a place where
he could fix it". That is, by adjusting the probability values in the
tails. But how to adjust them?
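For readers who want the two ratios side by side, here is a small Python sketch (not from the original post). It assumes one sample of made-up measurements, a hypothesized mean mu, and, for the Z-ratio only, a "known" sigma; the names z_ratio and t_ratio are hypothetical. The only difference between the two is that t_ratio replaces sigma with the sample standard deviation s.

```python
import math
import statistics

def z_ratio(sample, mu, sigma):
    """Z = (xbar - mu) / (sigma / sqrt(n)), using the true ("known") sigma."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))

def t_ratio(sample, mu):
    """t = (xbar - mu) / (s / sqrt(n)): same form, sigma replaced by the estimate s."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 in the denominator
    return (statistics.fmean(sample) - mu) / (s / math.sqrt(n))

# Hypothetical data: five measurements, hypothesized mean 10, "known" sigma 1.
data = [10.8, 11.5, 9.9, 11.2, 10.6]
print(round(z_ratio(data, mu=10.0, sigma=1.0), 3))
print(round(t_ratio(data, mu=10.0), 3))
```

With these made-up numbers Z is about 1.79 and t about 2.92, because this sample's s (about 0.61) happens to be smaller than sigma = 1. A different sample could give an s larger than sigma, and then t would come out smaller than Z.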

Before I go further, let me add that there are alternative versions of
this story. One version says that Ronald Fisher conceived the form of
the t-ratio and that Gosset learned it from him. Whatever...

OK, so by merely substituting the t-ratio for the Z-ratio and
"pretending" that an estimate of sigma is as good as the true value,
the tail probabilities will (generally) "be wrong".
Gosset did substantial work with random numbers... a Monte Carlo
analysis... to learn how to adjust the tail probabilities. In his time
that was an incredible undertaking. The common t-tables are set up for
fixed (specified) values of the tail probabilities. In other words, the
probabilities are the arguments in those tables.
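The same kind of experiment is easy to run today. This is a minimal Python sketch, assuming normally distributed data, samples of size 5 (4 degrees of freedom), and a true mean of 0 with sigma 1; these settings and the name simulate are made up for illustration. It counts how often |t| exceeds the Z value 1.96, finds the value that |t| exceeds only 5% of the time, and also records how often s lands below sigma.

```python
import math
import random
import statistics

def simulate(n, trials, mu=0.0, sigma=1.0, seed=1):
    """Draw `trials` normal samples of size n; return their t-ratios and s values."""
    rng = random.Random(seed)
    ts, ss = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with n - 1 in the denominator.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        ts.append((xbar - mu) / (s / math.sqrt(n)))
        ss.append(s)
    return ts, ss

n, trials = 5, 100_000
ts, ss = simulate(n, trials)

# Share of |t| beyond the Z-table value 1.96 (it would be about 0.05 if s were sigma).
tail = sum(abs(t) > 1.96 for t in ts) / trials

# Empirical two-sided 5% critical value: the 95th percentile of |t|.
crit = statistics.quantiles([abs(t) for t in ts], n=20)[-1]

# How often s falls below the true sigma (1.0 here); the rest of the time it is above.
under = sum(s < 1.0 for s in ss) / trials

print(f"P(|t| > 1.96) is about {tail:.3f}")
print(f"empirical 5% critical value is about {crit:.3f}")
print(f"s < sigma in {under:.1%} of samples, s > sigma in {1 - under:.1%}")
```

With 4 degrees of freedom the tail share should come out near 0.12 rather than 0.05, and the empirical critical value should land near the tabled 2.776. The last line bears on the original question: s underestimates sigma in some samples (about 59% of them for n = 5) and overestimates it in the others, and the distribution of t already includes both cases.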

Once we see how Monte Carlo works, the door is then opened to a whole
new world of thinking about "statistics", probability, and many related
subjects. This puts us in a world in which we can teach ourselves and
others some common and even rather exotic methods in statistics and
more. As you may know, Monte Carlo analysis grew out of the nuclear-weapons
work at Los Alamos... simulating neutrons banging into the nuclei of
uranium atoms. That involved extensive work with one of the early digital
computers.

Now go one level deeper into your concerns. The Z-ratio implies that
sigma is known (having infinite degrees of freedom to calculate it is one
way to put it). In a common t-table the values of t are larger than the
corresponding Z values, and those t values have been derived both by
Monte Carlo analyses and also from mathematics.
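The tabled t values can also be obtained from the mathematics, without random numbers. This Python sketch assumes the standard Student-t density and uses Simpson's rule plus bisection to find two-sided 5% critical values; the numerical method is my own choice for illustration (a statistics library would normally do this), and the helper names t_pdf, t_cdf and t_crit are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 below zero plus Simpson's rule on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_crit(df, p=0.975):
    """Solve t_cdf(x, df) = p by bisection (p = 0.975 gives the two-sided 5% value)."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (4, 10, 30, 120):
    print(df, round(t_crit(df), 3))
print("Z", round(NormalDist().inv_cdf(0.975), 3))
```

It should reproduce the familiar table entries (about 2.776 for 4 degrees of freedom, 2.228 for 10, 2.042 for 30) and creep down toward the Z value of 1.96 as the degrees of freedom grow, which is the "infinite degrees of freedom" case.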

If you want to delve further into this, I'd be happy to send you some
Monte Carlo software that I use when teaching the t-ratio. It's written
in the Basic language (Basic because Basic is very easy to "see", if
someone wants to understand the code.) It's designed to ask whether
there is a difference between two underlying averages. You specify
those "true" averages (they can be the same) and the underlying "true"
sigmas. At execution time it generates samples of data from each source
(normal distributions)... compares them... calculates and reports t,
etc. a large number of times (you specify how many times)... and
reports among other things the number of times your empirical values of
t exceed the tabled values (also called the reference values of t).
This can be sent as an executable and/or as the raw code so you can see
how it works.
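The Basic program itself isn't reproduced here. As a rough stand-in, this is a hypothetical Python sketch of the same experiment, assuming a pooled-variance two-sample t-ratio, 5 observations per group, and the usual two-sided 0.05 table value of 2.306 for 8 degrees of freedom; pooled_t and exceedance_rate are made-up names, not the author's code.

```python
import math
import random

def pooled_t(a, b):
    """Two-sample t-ratio using the pooled (equal-variance) standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp = math.sqrt((ssa + ssb) / (na + nb - 2))
    return (ma - mb) / (sp * math.sqrt(1 / na + 1 / nb))

def exceedance_rate(mu_a, mu_b, sigma_a, sigma_b, n, trials, t_table, seed=7):
    """Fraction of simulated experiments whose |t| exceeds the tabled (reference) value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(mu_a, sigma_a) for _ in range(n)]
        b = [rng.gauss(mu_b, sigma_b) for _ in range(n)]
        if abs(pooled_t(a, b)) > t_table:
            hits += 1
    return hits / trials

# Same true averages and sigmas, n = 5 per group, so 8 degrees of freedom;
# 2.306 is the usual two-sided 0.05 table value for 8 df.
rate = exceedance_rate(10.0, 10.0, 1.0, 1.0, n=5, trials=20_000, t_table=2.306)
print(f"|t| > 2.306 in {rate:.2%} of trials (nominal 5%)")
```

With equal true averages and sigmas, |t| should exceed 2.306 in roughly 5% of trials, which is what the table promises. Setting mu_b to, say, 11.5 makes the exceedances much more frequent, because now there really is a difference to detect.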

I have Monte Carlo demos for a lot of other "statistics", including
some that are not commonly used... but which IMHO should be used.

If you'd like to get that, send an e-mail to me at hedging77 followed
by the usual symbol for "at" yahoo.com. Mention this note. I check
"all day long" every day.

Be of good cheer... OMU

Gosset discovered the form of the t distribution by a combination of
mathematical and empirical work with random numbers, an early
application of the Monte-Carlo method.

WGSGNUAYHTTE@spammotel.com wrote:
> Dear Probability Experts,
>
> I am a retired Civil Engineer and teacher who never took a course in
> statistics/probability, but the concepts of variability have changed
> the way I think about almost everything. I would very much appreciate
> help in understanding one specific aspect of the "Student's t"
> distribution. The following represents my understanding - it is not
> presented as fact or argument.
>
> First, I believe that the accuracy with which a sample parameter
> approximates the same parameter in the underlying population decreases
> as the moment increases. For a given sample size, the dispersion is
> less reliable than the mean, the skewness is less reliable than the
> dispersion, and the kurtosis is less reliable than the skewness.
>
> It appears that Gosset recognized the POSSIBILITY that the sample
> variance might UNDERESTIMATE the population variance. He derived the
> "t" distribution in which "t" is always larger than the sample standard
> deviation "s" (especially for sample sizes less than about 30). It
> seems, therefore, that he created an estimate of the population
> dispersion that was broader than actually indicated by the sample. By
> assuming a more disperse population, it would be far less likely that
> the brewing process would appear out of control. He would, therefore,
> avoid shutting the process down unnecessarily. I am sure that the
> Guinness management liked that.
>
> I don't understand why it is not EQUALLY POSSIBLE for the variance of a
> small sample to OVERESTIMATE the dispersion of the underlying
> population. This would result in allowing the process to continue when
> it is actually out of control. In other words, why is there not an
> "op" statistic (other possibility) which is the reciprocal of "t"? How
> would one know whether to use "t" or "op" for a certain small sample?
>
> I know I am missing something, but my question seems important with the
> recent emphasis on product quality. Any help will be appreciated, and
> responses in common parlance will be most useful.
>
> Troubled