I've been a bad boy here. You were completely correct to put me right. I was vague enough that you couldn't actually say I was wrong, but if you had read my mind you would have known that I was: what I was trying to say is that the "systematic underestimate" property is why we divide by n-1 instead of n. That is wrong, because the n-1 is already in the definition of both the sample variance and the sample standard deviation.
But by accident it looks like I was right, because in a way you are saying that being an unbiased estimator does not necessarily make something a good estimator: take its square root and it is no longer unbiased.
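To make the distinction concrete, here is a minimal simulation sketch. It is my own illustration, not something from the earlier posts: the function name `simulate` and its parameters are made up, and it assumes normally distributed data. The idea is that the n-1 sample variance averages out to the true variance, yet it falls below the true variance more often than not, and its square root averages out to less than sigma.

```python
import random
import statistics


def simulate(n=2, trials=20000, sigma=1.0, seed=0):
    """Draw `trials` normal samples of size n; summarise s^2 and s.

    Hypothetical helper for illustration only. Assumes N(0, sigma^2) data.
    """
    rng = random.Random(seed)
    variances = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        # statistics.variance divides by n - 1 (the usual sample variance)
        variances.append(statistics.variance(sample))

    mean_var = statistics.fmean(variances)                    # average of s^2
    mean_sd = statistics.fmean([v ** 0.5 for v in variances])  # average of s
    # Share of samples whose s^2 lands below the true variance
    frac_below = sum(v < sigma ** 2 for v in variances) / trials
    return mean_var, mean_sd, frac_below


if __name__ == "__main__":
    mean_var, mean_sd, frac_below = simulate()
    print("average sample variance:", round(mean_var, 3))
    print("average sample std dev: ", round(mean_sd, 3))
    print("fraction of s^2 below sigma^2:", round(frac_below, 3))
```

With n = 2 and sigma = 1, theory says the average variance should be close to 1, the average standard deviation close to sqrt(2/pi), about 0.80, and roughly 68% of the sample variances should fall below 1. So "unbiased" and "underestimates more times than not" can both be true of the same estimator.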
Old Mac User wrote:
> Under virtually all circumstances, the sample variance is an unbiased
> estimate of the population variance. (Extreme exception: a distribution
> that has infinite variance... strictly a theoretical sort of thing not
> found in practice.)
...and indeed this is a nice example, because if the distribution does have infinite variance, then the sample variance is still an unbiased estimator (i.e. its expected value is infinity), but it clearly systematically underestimates, since every sample variance you actually compute is finite.
> However, the sample standard deviation is a biased estimate of the
> population standard deviation. The magnitude of this bias can be
> rather large when the sample is based on a small number of degrees of
> freedom. For instance if the sample size is n = 2 (1 df) then the
> sample standard deviation should be multiplied by the factor 1.23 (I
> think that's correct... it's close) to get an unbiased estimate of the
> population standard deviation.
> If the sample size is large (say, n = 15) then the bias is still
> present but it is trivial.
>
> Now what does this mean? Among other things it means that if we have
> many duplicate samples (n = 2 in each instance)... and if we calculate
> the standard deviation for each of those... and if we (unwisely)
> average those standard deviations together (after all, averaging is
> always a good thing... right?) then that average will still be a biased
> estimate of the population standard deviation, sigma. If we go this
> route then we still need to multiply that averaged standard deviation
> by the factor 1.23 (approx.) to get an unbiased estimate of the
> population standard deviation.
>
> The other route to getting an unbiased estimate of the population
> standard deviation sigma would be to calculate the variance of each
> sample (variance = std dev-squared) and pool those together to get the
> pooled variance. Take the sq root of the pooled variance and we'll have
> an unbiased estimate of sigma.
>
> These matters are seldom mentioned in formal courses in statistics.
> Rather, the instructor says "the standard deviation calculated from
> data is an estimate of the population standard deviation". That's
> true. But if the sample size n is small then it's a biased estimate.
> You may say "so what". Well, I've been down this road as a consultant,
> and in more than one instance it was a big deal. There's nothing quite
> like laying this information on a pompous "expert" (who didn't know it)
> in a legal matter where the magnitude of the standard deviation was a
> critical issue.
>
> Be careful with this stuff. What I'm seeing on these boards is
> well-intentioned people who are attempting to use upscale (even exotic)
> software and embedded methods they do not understand. Things like "I'm
> trying to use the (fill in the blank) package to (fill in the blank)
> can you help me understand what all those numbers it gave me?" Or...
> "which buttons should I push to make it do (fill in the blank)."
>
> Stephen Montgomery-Smith wrote:
>
>>Old Mac User wrote:
>>
>>>You wrote...
>>>
>>>"The sample average SYSTEMATICALLY underestimates the population
>>>average in that it does this more times than not."
>>>
>>>Can you elaborate on that?
>>>
>>>OMU
>>
>>I meant "The sample standard variance SYSTEMATICALLY underestimates the
>>population variance."
>>
>>Sorry.
>>
>>Stephen
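For anyone who wants to check the correction factor and the two routes described in the quoted post, here is a rough sketch. It is my own illustration under the assumption of normally distributed data; the names `c4` and `compare` are hypothetical (c4 is the usual quality-control name for the ratio E[s]/sigma). For n = 2 the exact multiplier 1/c4 works out to sqrt(pi/2), about 1.25, so the "1.23" quoted above is close but slightly low.

```python
import math
import random
import statistics


def c4(n):
    """Ratio E[s] / sigma for a normal sample of size n (s uses n - 1).

    c4(n) = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2)
    """
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def compare(groups=5000, n=2, sigma=1.0, seed=1):
    """Compare averaging std devs with pooling variances (equal group sizes).

    Hypothetical helper for illustration only. Assumes N(0, sigma^2) data.
    """
    rng = random.Random(seed)
    sds = []
    for _ in range(groups):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        sds.append(statistics.stdev(sample))

    avg_sd = statistics.fmean(sds)      # route 1: biased low
    corrected = avg_sd / c4(n)          # route 1 with the correction factor
    # Route 2: with equal n, the pooled variance is the plain mean of the
    # group variances; take its square root at the end.
    pooled_sd = math.sqrt(statistics.fmean([s * s for s in sds]))
    return avg_sd, corrected, pooled_sd


if __name__ == "__main__":
    print("multiplier for n = 2: ", round(1 / c4(2), 4))
    print("multiplier for n = 15:", round(1 / c4(15), 4))
    print("avg sd, corrected avg sd, pooled sd:", compare())
```

One caveat on the pooled route: strictly speaking, the square root of a pooled variance is still a slightly biased estimate of sigma, but with many pooled degrees of freedom the bias is negligible, which is why it behaves so much better than averaging the individual standard deviations.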