Topic: Which sample variance should I choose?
Replies: 7   Last Post: Sep 5, 2011 3:50 AM

 paulvonhippel at yahoo Posts: 72 Registered: 7/13/05
Re: Which sample variance should I choose?
Posted: Aug 31, 2011 2:34 PM

If you know the population mean, then
s2_1 = Σ(x - µ)^2 / n
is unbiased.

If you don't know the population mean, then
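s2_2 = Σ(x - m)^2 / (n-1)
is unbiased, while
s2_3 = Σ(x - m)^2 / n
is biased (too small on average by a factor of (n-1)/n) but nevertheless has a smaller mean squared error than s2_2, at least for normally distributed data.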
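
The reasoning behind these divisors can be sketched in a few lines. This is a standard derivation, not part of the original post, and it assumes the x_i are independent draws from a population with mean µ and finite variance σ^2; the last line additionally assumes the population is normal.

```latex
\begin{align*}
E\Big[\sum_{i=1}^{n}(x_i-\mu)^2\Big] &= n\sigma^2
  &&\Rightarrow\; E[s^2_1] = \sigma^2 \\
\sum_{i=1}^{n}(x_i-m)^2 &= \sum_{i=1}^{n}(x_i-\mu)^2 - n(m-\mu)^2 \\
E\Big[\sum_{i=1}^{n}(x_i-m)^2\Big] &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2
  &&\Rightarrow\; E[s^2_2] = \sigma^2,\quad E[s^2_3] = \frac{n-1}{n}\,\sigma^2 \\
\text{(normal data)}\quad \mathrm{MSE}(s^2_2) &= \frac{2\sigma^4}{n-1}
  \;>\; \mathrm{MSE}(s^2_3) = \frac{(2n-1)\,\sigma^4}{n^2}
\end{align*}
```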

None of these distinctions matters much if n is reasonably large, since n and n-1 are then nearly equal and m is close to µ.
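
A small simulation is one way to check these claims numerically. The sketch below is not from the original post: the function name `simulate`, the sample size, the number of trials, the seed, and the choice of a normal population are all illustrative assumptions.

```python
import random


def simulate(n=5, trials=20000, mu=0.0, sigma=1.0, seed=12345):
    """Estimate the mean and mean squared error (MSE) of three variance
    estimators by repeated sampling from a normal population.

    All defaults are illustrative assumptions, not values from the thread.
    Returns a dict mapping estimator name -> (average estimate, MSE).
    """
    rng = random.Random(seed)
    true_var = sigma ** 2
    names = ("s2_1", "s2_2", "s2_3")
    totals = {name: 0.0 for name in names}
    sq_errors = {name: 0.0 for name in names}

    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n                           # sample mean
        ss_mu = sum((x - mu) ** 2 for x in xs)    # squares about population mean
        ss_m = sum((x - m) ** 2 for x in xs)      # squares about sample mean

        estimates = {
            "s2_1": ss_mu / n,        # population mean known, divide by n
            "s2_2": ss_m / (n - 1),   # sample mean, divide by n-1
            "s2_3": ss_m / n,         # sample mean, divide by n
        }
        for name, value in estimates.items():
            totals[name] += value
            sq_errors[name] += (value - true_var) ** 2

    return {name: (totals[name] / trials, sq_errors[name] / trials)
            for name in names}


if __name__ == "__main__":
    for name, (avg, mse) in simulate().items():
        print(f"{name}: average = {avg:.3f}, MSE = {mse:.3f}")
```

With a small n, the averages for s2_1 and s2_2 should land near the true variance, the average for s2_3 should land near (n-1)/n times it, and the MSE for s2_3 should come out below that for s2_2. Raising n shrinks all of these differences.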

On Aug 31, 1:02 pm, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> Paige Miller wrote:
> > On Aug 31, 1:06 am, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:

> [...]
> >> Under what circumstances should I prefer each of these four estimators of
> >> σ^2 and what are the pros and cons of each?

>
> > There is no answer until you tell us what you are planning to use the
> > variance for.

>
> That's exactly what I'm trying to find out. Under which circumstances should
> I prefer one method over the others?
>
> I don't actually have a *specific* usage in mind, other than answering the
> question "what's the sample variance of this data?" But I would like to
> understand why somebody might choose one version or another.
>
> E.g.
>
> the unbiased sample variance (divide by n-1) has the advantage that, on
> average, it will equal the population variance (provided certain
> assumptions hold, such as sampling with replacement);
>
> but the unbiased sample variance also has a larger spread, so although it is
> the most accurate on average, there's a chance that it will be much further
> off. The biased sample variance (divide by n) is less accurate but more
> precise (the results are clustered more closely together, so the chances of
> getting a result that is *way* off is much reduced);
>
> etc. Or at least, this is what I *think* is the case.
>
> I'm not even sure that it is mathematically valid to substitute µ into the
> sample variance formulae instead of the sample mean. I can't see why it
> wouldn't be, but I'm not sure.
>
> For reference, here's the suggested sample variance formulae again:
>
> s^2 = Σ(x - m)^2 / n      (Eq. 1) Biased, using sample mean
> s^2 = Σ(x - m)^2 / (n-1)  (Eq. 2) Unbiased, using sample mean
> s^2 = Σ(x - µ)^2 / n      (Eq. 3) Biased, using population mean
> s^2 = Σ(x - µ)^2 / (n-1)  (Eq. 4) Unbiased, using population mean
>
> where the sums are over each x in the sample.
>
> --
> Steven

Date Subject Author
8/31/11 Steven D'Aprano
8/31/11 leoldv
8/31/11 Richard Ulrich
8/31/11 Paige Miller
8/31/11 Steven D'Aprano
8/31/11 paulvonhippel at yahoo
9/1/11 Steven D'Aprano
9/5/11 Steven D'Aprano