Topic: Student-t distribution - profound or biased?
Replies: 3   Last Post: Jul 3, 2006 3:35 PM

 Stephen Montgomery-Smith Posts: 2,351 Registered: 12/6/04
Re: Student-t distribution - profound or biased?
Posted: Jul 3, 2006 12:53 PM

I've been a bad boy here. You were completely correct to put me right.
I was vague enough that you couldn't actually say I was wrong, but if
you had read my mind you would have known that I was. What I was trying
to say is that the "systematic underestimate" property is why we divide
by n-1 instead of n. That is wrong, because the n-1 is already in the
definition of both the sample variance and the sample standard
deviation.

But by accident it looks like I was right, because in a way you are
saying that just because something is an unbiased estimator doesn't make
it necessarily a good estimator, because if you take its square root it
is no longer unbiased.
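
For normally distributed data the size of that square-root bias is known
in closed form: E[s] = c4(n) * sigma, where
c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2). A minimal sketch of
the arithmetic, assuming normal data (the name c4 is borrowed from
quality-control tables and is not something used in this thread):

```python
import math

def c4(n):
    """Return E[s] / sigma for a sample of size n from a normal distribution.

    s is the usual sample standard deviation (n-1 divisor).
    """
    if n < 2:
        raise ValueError("need n >= 2")
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)

# The variance s^2 is unbiased, but its square root s falls short of sigma
# on average; 1 / c4(n) is the factor that removes the shortfall.
for n in (2, 5, 15, 50):
    print(f"n = {n:2d}: E[s]/sigma = {c4(n):.4f}, correction factor = {1.0 / c4(n):.4f}")
```

For n = 2 this reduces to c4 = sqrt(2/pi), about 0.798, so the
correction factor is sqrt(pi/2), about 1.253.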

Old Mac User wrote:
> Under virtually all circumstances, the sample variance is an unbiased
> estimate of the population variance. (Extreme exception: a distribution
> that has infinite variance... strictly a theoretical sort of thing not
> found in practice.)

...and indeed this is a nice example, because if the distribution does
have infinite variance, then the sample variance is still an unbiased
estimator (i.e. its expected value is infinity), but it clearly
systematically underestimates.
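
One way to see this is to draw from a distribution whose variance is
infinite and look at the sample variance. The sketch below assumes a
Pareto distribution with shape 1.5 (finite mean, infinite variance),
which is an illustrative choice and not anything from the thread. Every
sample variance it prints is a finite number, so each one necessarily
falls below the true value.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_variance_pareto(n, shape=1.5):
    """Sample variance (n-1 divisor) of n Pareto draws.

    For shape <= 2 the population variance is infinite.
    """
    data = [random.paretovariate(shape) for _ in range(n)]
    return statistics.variance(data)

results = {n: sample_variance_pareto(n) for n in (10, 100, 10000)}
for n, v in results.items():
    print(f"n = {n:5d}: sample variance = {v:.2f} (population variance is infinite)")
```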

> However, the sample standard deviation is a biased estimate of the
> population standard deviation. The magnitude of this bias can be
> rather large when the sample is based on a small number of degrees of
> freedom. For instance, if the sample size is n = 2 (1 df) and the data
> are normal, then the sample standard deviation should be multiplied by
> the factor 1.25 (more precisely sqrt(pi/2), about 1.2533) to get an
> unbiased estimate of the population standard deviation.
> If the sample size is large (say, n = 15) then the bias is still
> present but it is trivial.
>
> Now what does this mean? Among other things it means that if we have
> many duplicate samples (n = 2 in each instance)... and if we calculate
> the standard deviation for each of those... and if we (unwisely)
> average those standard deviations together (after all, averaging is
> always a good thing... right?) then that average will still be a biased
> estimate of the population standard deviation, sigma. If we go this
> route then we still need to multiply that averaged standard deviation
> by the factor 1.25 (approx.) to get an unbiased estimate of the
> population standard deviation.
>
> The other route to getting an estimate of the population standard
> deviation sigma is to calculate the variance of each sample
> (variance = std dev-squared) and pool those together to get the
> pooled variance. Take the sq root of the pooled variance and we'll have
> a nearly unbiased estimate of sigma (the remaining bias shrinks as the
> pooled degrees of freedom grow).
>
> These matters are seldom mentioned in formal courses in statistics.
> Rather, the instructor says "the standard deviation calculated from
> data is an estimate of the population standard deviation". That's
> true. But if the sample size n is small then it's a biased estimate.
> You may say "so what". Well, I've been down this road as a consultant,
> and in more than one instance it was a big deal. There's nothing quite
> like laying this information on a pompous "expert" (who didn't know it)
> in a legal matter where the magnitude of the standard deviation was a
> critical issue.
>
> Be careful with this stuff. What I'm seeing on these boards is
> well-intentioned people who are attempting to use upscale (even exotic)
> software and embedded methods they do not understand. Things like "I'm
> trying to use the (fill in the blank) package to (fill in the blank).
> Can you help me understand what all those numbers it gave me?" Or...
> "Which buttons should I push to make it do (fill in the blank)?"
>
>
>
> Stephen Montgomery-Smith wrote:
>

>>Old Mac User wrote:
>>

>>>You wrote...
>>>
>>>"The sample average SYSTEMATICALLY underestimates the population
>>>average
>>>in that it does this more times than not."
>>>
>>>Can you elaborate on that?
>>>
>>>OMU

>>
>>I meant "The sample standard variance SYSTEMATICALLY underestimates the
>>population variance."
>>
>>Sorry.
>>
>>Stephen

>
>
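
The two routes described in the quoted message (averaging the standard
deviations versus pooling the variances) can be compared with a small
simulation. This is only a sketch under stated assumptions: normal data,
sigma = 1, and 20000 duplicate samples of size n = 2. The variable names
are illustrative.

```python
import math
import random
import statistics

random.seed(1)     # fixed seed so the run is repeatable
SIGMA = 1.0        # true population standard deviation (assumed for the demo)
PAIRS = 20000      # number of duplicate samples, each of size n = 2

sds = []
for _ in range(PAIRS):
    pair = [random.gauss(0.0, SIGMA) for _ in range(2)]
    sds.append(statistics.stdev(pair))   # sample SD with the n-1 divisor

# Route 1: average the standard deviations (biased low for small n).
avg_sd = statistics.fmean(sds)
# Route 1, repaired: multiply by sqrt(pi/2), the n = 2 factor for normal data.
corrected = avg_sd * math.sqrt(math.pi / 2.0)
# Route 2: pool the variances, then take the square root. With equal
# sample sizes the pooled variance is the mean of the sample variances.
pooled_sd = math.sqrt(statistics.fmean([s * s for s in sds]))

print(f"average of SDs:          {avg_sd:.3f}")
print(f"average of SDs x factor: {corrected:.3f}")
print(f"sqrt of pooled variance: {pooled_sd:.3f}")
```

For normal data the square root of a pooled variance is not exactly
unbiased either, but with this many pooled degrees of freedom the
remaining bias is negligible.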
