The Math Forum






Math Forum discussions: sci.stat.math.independent

Topic: Which sample variance should I choose?
Replies: 7   Last Post: Sep 5, 2011 3:50 AM


Steven D'Aprano

Re: Which sample variance should I choose?
Posted: Aug 31, 2011 2:02 PM

Paige Miller wrote:

> On Aug 31, 1:06 am, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:

[...]
>> Under what circumstances should I prefer each of these four estimators of
>> σ^2 and what are the pros and cons of each?

>
> There is no answer until you tell us what you are planning to use the
> variance for.


That's exactly what I'm trying to find out. Under which circumstances should
I prefer one method over the others?

I don't actually have a *specific* usage in mind, other than answering the
question "what's the sample variance of this data?" But I would like to
understand why somebody might choose one version or another.

E.g.

the unbiased sample variance (divide by n-1) has the advantage that, on
average, it will equal the population variance (provided certain
assumptions hold, such as sampling with replacement);

but the unbiased sample variance also has a larger spread, so although it is
the most accurate on average, there's a chance that any single estimate will
be further off. The biased sample variance (divide by n) is less accurate but
more precise (the results are clustered more closely together, so the chances
of getting a result that is *way* off are lower);

etc. Or at least, this is what I *think* is the case.
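A rough Monte Carlo check of the two claims above. It is only a sketch: it assumes independent draws from a normal population with mean 0 and known variance σ^2 = 4, and the sample size, number of trials and seed are arbitrary choices. The helper name `simulate` is hypothetical.

```python
import random
import statistics


def simulate(n=5, sigma=2.0, trials=20000, seed=1):
    """Compare the divide-by-n and divide-by-(n-1) estimators on repeated samples.

    Assumes i.i.d. normal draws with mean 0 and standard deviation sigma.
    Returns (mean, spread, mean squared error) for each estimator.
    """
    rng = random.Random(seed)
    true_var = sigma ** 2
    biased, unbiased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        # Sum of squared deviations from the sample mean.
        ss = sum((x - m) ** 2 for x in sample)
        biased.append(ss / n)          # Eq. 1
        unbiased.append(ss / (n - 1))  # Eq. 2

    def summary(estimates):
        mse = sum((e - true_var) ** 2 for e in estimates) / len(estimates)
        return statistics.fmean(estimates), statistics.pstdev(estimates), mse

    return summary(biased), summary(unbiased)


(b_mean, b_sd, b_mse), (u_mean, u_sd, u_mse) = simulate()
print(f"divide by n:   mean={b_mean:.3f}  sd={b_sd:.3f}  mse={b_mse:.3f}")
print(f"divide by n-1: mean={u_mean:.3f}  sd={u_sd:.3f}  mse={u_mse:.3f}")
```

With n = 5 the divide-by-n average should land near (n-1)/n * σ^2 = 3.2 rather than 4, while its spread is exactly (n-1)/n times that of the divide-by-(n-1) version, because each biased estimate is just the unbiased one rescaled. For normal data the biased version also tends to have the smaller mean squared error, but that ranking depends on the shape of the population, so it is not a general rule.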

I'm not even sure that it is mathematically valid to substitute µ into the
sample variance formulae instead of the sample mean. I can't see why it
wouldn't be, but I'm not sure.
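A short expectation calculation bears on this. It assumes the x_i are independent with common mean µ and variance σ^2, and writes m for the sample mean:

```latex
\begin{aligned}
E\Big[\sum_{i=1}^{n}(x_i-\mu)^2\Big] &= \sum_{i=1}^{n}\operatorname{Var}(x_i) = n\sigma^2,\\
\sum_{i=1}^{n}(x_i-m)^2 &= \sum_{i=1}^{n}(x_i-\mu)^2 - n(m-\mu)^2,\\
E\Big[\sum_{i=1}^{n}(x_i-m)^2\Big] &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2.
\end{aligned}
```

So, provided µ really is known rather than estimated from the same data, substituting it is legitimate, and dividing Σ(x - µ)^2 by n already gives an unbiased estimate; dividing by n - 1 would overestimate σ^2 by a factor of n/(n - 1). The middle line shows where the n - 1 comes from: deviations about m are systematically smaller than deviations about µ.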

For reference, here are the suggested sample variance formulae again:

s^2 = Σ(x - m)^2 / n       (Eq. 1) Biased, using sample mean
s^2 = Σ(x - m)^2 / (n-1)   (Eq. 2) Unbiased, using sample mean
s^2 = Σ(x - µ)^2 / n       (Eq. 3) Unbiased, using population mean
s^2 = Σ(x - µ)^2 / (n-1)   (Eq. 4) Biased, using population mean

where the sums are over each x in the sample, m is the sample mean, µ is
the population mean and n is the sample size.
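For concreteness, here are the four formulae written out in Python. The function name `four_variances`, the data values and the "known" population mean µ = 4.0 are hypothetical, chosen only for illustration. Eqs. 1 and 2 are cross-checked against the standard library's `statistics.pvariance` (divide by n) and `statistics.variance` (divide by n - 1).

```python
import statistics


def four_variances(sample, mu):
    """Return the four candidate estimates (Eqs. 1-4) for one sample.

    `mu` is the population mean, assumed known for Eqs. 3 and 4.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two data points")
    m = sum(sample) / n                         # sample mean
    ss_m = sum((x - m) ** 2 for x in sample)    # squared deviations about m
    ss_mu = sum((x - mu) ** 2 for x in sample)  # squared deviations about mu
    return {
        "eq1": ss_m / n,
        "eq2": ss_m / (n - 1),
        "eq3": ss_mu / n,
        "eq4": ss_mu / (n - 1),
    }


# Hypothetical data; mu = 4.0 is assumed known purely for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
est = four_variances(data, mu=4.0)
print(est)

# Eqs. 1 and 2 agree with the standard library's population and sample variance.
assert abs(est["eq1"] - statistics.pvariance(data)) < 1e-12
assert abs(est["eq2"] - statistics.variance(data)) < 1e-12
```

Eqs. 3 and 4 only make sense when µ comes from somewhere other than the data itself; once the mean has to be estimated from the sample, you are back to Eqs. 1 and 2.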


--
Steven






© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.