The Math Forum

Newsgroup: sci.stat.math

Topic: Which sample variance should I choose?
Replies: 7   Last Post: Sep 5, 2011 3:50 AM

Steven D'Aprano

Re: Which sample variance should I choose?
Posted: Sep 1, 2011 4:30 AM

On Thu, 1 Sep 2011 04:34 am Paul wrote:

> If you know the population mean, then
> s2_1 = Σ(x - µ)^2 / n
> is unbiased.
> If you don't know the population mean, then
> s2_2 = Σ(x - m)^2 / (n-1)
> is unbiased, while
> s2_3 = Σ(x - m)^2 / n
> is biased but nevertheless more accurate than s2_2.
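To make Paul's three definitions concrete, here is a minimal Python sketch. The data set and the "known" population mean of 4.5 are made-up illustrative values, not anything from the thread, and the function names simply reuse Paul's labels s2_1, s2_2 and s2_3.

```python
import statistics

def sum_sq_dev(xs, centre):
    """Sum of squared deviations of xs about `centre`."""
    return sum((x - centre) ** 2 for x in xs)

def s2_1(xs, mu):
    """Population mean mu known: divide by n (unbiased)."""
    return sum_sq_dev(xs, mu) / len(xs)

def s2_2(xs):
    """Mean unknown: deviations about the sample mean m, divide by n-1 (unbiased)."""
    m = statistics.fmean(xs)
    return sum_sq_dev(xs, m) / (len(xs) - 1)

def s2_3(xs):
    """Mean unknown: deviations about the sample mean m, divide by n (biased low)."""
    m = statistics.fmean(xs)
    return sum_sq_dev(xs, m) / len(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data, sample mean m = 5
print(s2_1(data, mu=4.5))  # 34 / 8 = 4.25 (hypothetical known mean 4.5)
print(s2_2(data))          # 32 / 7, about 4.571
print(s2_3(data))          # 32 / 8 = 4.0
```

Python's standard library already provides the last two: `statistics.variance` divides by n-1 (s2_2) and `statistics.pvariance` divides by n (s2_3). Note that Σ(x - µ)^2 = 34 is larger than Σ(x - m)^2 = 32, because the sample mean is the point that minimises the sum of squared deviations.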

Thanks, Paul; that's exactly the sort of thing I'm looking for.

Not that I don't believe you :), but if you also have a reference (especially
an online one), that would be really helpful.

Thanks to everyone who answered.
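Paul's claims follow from a short, standard calculation, sketched here in LaTeX. It assumes the x_i are independent draws with mean µ and variance σ², writes m for the sample mean, and (for the last two lines only) additionally assumes normal data, in which case "more accurate" can be read as "smaller mean squared error".

```latex
\begin{align*}
\mathbb{E}\Big[\sum_{i=1}^{n}(x_i-\mu)^2\Big] &= n\sigma^2 \\
\sum_{i=1}^{n}(x_i-\mu)^2 &= \sum_{i=1}^{n}(x_i-m)^2 + n(m-\mu)^2 \\
\mathbb{E}\big[n(m-\mu)^2\big] &= n\cdot\frac{\sigma^2}{n} = \sigma^2 \\
\Rightarrow\quad \mathbb{E}\Big[\sum_{i=1}^{n}(x_i-m)^2\Big] &= (n-1)\sigma^2 \\
\text{(normal data)}\quad \mathrm{MSE}\Big[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-m)^2\Big] &= \frac{2\sigma^4}{n-1} \\
\text{(normal data)}\quad \mathrm{MSE}\Big[\frac{1}{n}\sum_{i=1}^{n}(x_i-m)^2\Big] &= \frac{(2n-1)\sigma^4}{n^2} < \frac{2\sigma^4}{n-1}
\end{align*}
```

Dividing the first line by n shows that Eq. 3 (population mean, divide by n) is unbiased, while dividing by n-1 gives an expectation of nσ²/(n-1), so Eq. 4 overestimates. The final inequality holds because (2n-1)(n-1) = 2n² - 3n + 1 < 2n².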


> None of these distinctions matters if n is reasonably large.
> On Aug 31, 1:02 pm, Steven D'Aprano <steve
>> wrote:

>> Paige Miller wrote:
>> > On Aug 31, 1:06 am, Steven D'Aprano <steve
>> >> wrote:

>> [...]
>> >> Under what circumstances should I prefer each of these four estimators
>> >> of σ^2 and what are the pros and cons of each?

>> > There is no answer until you tell us what you are planning to use the
>> > variance for.

>> That's exactly what I'm trying to find out. Under which circumstances
>> should I prefer one method over the others?
>> I don't actually have a *specific* usage in mind, other than answering
>> the question "what's the sample variance of this data?" But I would like
>> to understand why somebody might choose one version or another.
>> E.g. the unbiased sample variance (divide by n-1) has the advantage that, on
>> average, it will equal the population variance (provided certain
>> assumptions hold, such as sampling with replacement);
>> but the unbiased sample variance also has a larger spread, so although it
>> is the most accurate on average, there's a chance that it will be much
>> further off. The biased sample variance (divide by n) is less accurate
>> but more precise (the results are clustered more closely together, so the
>> chances of getting a result that is *way* off are much reduced);
>> etc. Or at least, this is what I *think* is the case.
>> I'm not even sure that it is mathematically valid to substitute µ into
>> the sample variance formulae instead of the sample mean. I can't see why
>> it wouldn't be, but I'm not sure.
>> For reference, here's the suggested sample variance formulae again:
>> s^2 = Σ(x - m)^2 / n      (Eq. 1) Biased, using sample mean
>> s^2 = Σ(x - m)^2 / (n-1)  (Eq. 2) Unbiased, using sample mean
>> s^2 = Σ(x - µ)^2 / n      (Eq. 3) Unbiased, using population mean
>> s^2 = Σ(x - µ)^2 / (n-1)  (Eq. 4) Biased, using population mean
>> where the sums are over each x in the sample.
>> --
>> Steven
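Steven's intuition about accuracy versus spread, and Paul's remark about large n, can be checked by simulation. The following is a rough Monte Carlo sketch in Python, assuming independent normal draws with mean 0 and variance 1. The helpers `four_estimators` and `simulate` are hypothetical names introduced here, and the normality assumption matters: for other distributions the numbers differ.

```python
import random

def four_estimators(xs, mu):
    """Eq. 1-4 from the thread, for one sample xs and known population mean mu."""
    n = len(xs)
    m = sum(xs) / n                          # sample mean
    ss_m = sum((x - m) ** 2 for x in xs)     # squared deviations about m
    ss_mu = sum((x - mu) ** 2 for x in xs)   # squared deviations about mu
    return {
        "Eq.1 (m, /n)": ss_m / n,
        "Eq.2 (m, /(n-1))": ss_m / (n - 1),
        "Eq.3 (mu, /n)": ss_mu / n,
        "Eq.4 (mu, /(n-1))": ss_mu / (n - 1),
    }

def simulate(n, mu=0.0, sigma=1.0, trials=10_000, seed=1):
    """Monte Carlo (bias, MSE) for each estimator, ASSUMING i.i.d. normal data."""
    rng = random.Random(seed)
    true_var = sigma ** 2
    sums = {}  # name -> [sum of errors, sum of squared errors]
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        for name, est in four_estimators(xs, mu).items():
            err = est - true_var
            acc = sums.setdefault(name, [0.0, 0.0])
            acc[0] += err
            acc[1] += err * err
    return {name: (s / trials, sq / trials) for name, (s, sq) in sums.items()}

for n in (5, 100):
    print(f"n = {n}")
    for name, (bias, mse) in simulate(n).items():
        print(f"  {name:18s} bias {bias:+.4f}  MSE {mse:.4f}")
```

For n = 5 and σ² = 1, theory predicts a bias of -0.2 for Eq. 1, +0.25 for Eq. 4 and none for Eqs. 2 and 3, with mean squared errors of about 0.36, 0.5, 0.4 and 0.69 respectively, so the biased Eq. 1 has the smallest MSE. At n = 100 all four MSEs are close to 0.02, which is Paul's point about large samples.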


© The Math Forum at NCTM 1994-2018. All Rights Reserved.