```Date: Aug 31, 2011 2:02 PM
Author: Steven D'Aprano
Subject: Re: Which sample variance should I choose?

Paige Miller wrote:
> On Aug 31, 1:06 am, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
[...]
>> Under what circumstances should I prefer each of these four estimators of
>> σ^2 and what are the pros and cons of each?
>
> There is no answer until you tell us what you are planning to use the
> variance for.

That's exactly what I'm trying to find out. Under which circumstances should
I prefer one method over the others?

I don't actually have a *specific* usage in mind, other than answering the
question "what's the sample variance of this data?" But I would like to
understand why somebody might choose one version or another.

E.g. the unbiased sample variance (divide by n-1) has the advantage that, on
average, it will equal the population variance (provided certain
assumptions hold, such as sampling with replacement);

but the unbiased sample variance also has a larger spread, so although it is
the most accurate on average, there's a chance that it will be much further
off. The biased sample variance (divide by n) is less accurate but more
precise (the results are clustered more closely together, so the chance of
getting a result that is *way* off is much reduced);

etc. Or at least, this is what I *think* is the case.

I'm not even sure that it is mathematically valid to substitute µ into the
sample variance formulae instead of the sample mean. I can't see why it
wouldn't be, but I'm not sure.

For reference, here are the suggested sample variance formulae again:

s^2 = Σ(x - m)^2 / n      (Eq. 1) Biased, using sample mean
s^2 = Σ(x - m)^2 / (n-1)  (Eq. 2) Unbiased, using sample mean
s^2 = Σ(x - µ)^2 / n      (Eq. 3) Biased, using population mean
s^2 = Σ(x - µ)^2 / (n-1)  (Eq. 4) Unbiased, using population mean

where the sums are over each x in the sample.

-- 
Steven
```
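
One way to get a feel for the bias question is to simulate it. The following is only a rough sketch, not anything from the thread: the function names `four_variances` and `average_estimates` are made up for illustration, and the normal population (µ = 10, σ = 2, so σ^2 = 4) with samples of size 5 is an arbitrary assumption. It computes Eq. 1 to Eq. 4 on many random samples and averages each estimator.

```python
import random
import statistics


def four_variances(sample, mu):
    """Return (Eq. 1, Eq. 2, Eq. 3, Eq. 4) for one sample.

    `mu` is the population mean, assumed known for Eq. 3 and Eq. 4.
    """
    n = len(sample)
    m = statistics.fmean(sample)                     # sample mean
    ss_m = sum((x - m) ** 2 for x in sample)         # squares about m
    ss_mu = sum((x - mu) ** 2 for x in sample)       # squares about mu
    return (ss_m / n, ss_m / (n - 1), ss_mu / n, ss_mu / (n - 1))


def average_estimates(n=5, mu=10.0, sigma=2.0, trials=20000, seed=0):
    """Average each estimator over many samples from an assumed
    normal population with mean `mu` and standard deviation `sigma`."""
    rng = random.Random(seed)
    totals = [0.0] * 4
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        for i, value in enumerate(four_variances(sample, mu)):
            totals[i] += value
    return [t / trials for t in totals]


if __name__ == "__main__":
    labels = ("Eq. 1", "Eq. 2", "Eq. 3", "Eq. 4")
    for label, avg in zip(labels, average_estimates()):
        print(f"{label}: average estimate = {avg:.3f} (true variance 4.0)")
```

Under these assumptions the averages for Eq. 2 and Eq. 3 should land close to the true variance, Eq. 1 should come out low by a factor of about (n-1)/n, and Eq. 4 high by about n/(n-1). That matches the theory: when µ is known, each (x - µ)^2 has expectation σ^2, so dividing the sum by n is already unbiased and dividing by n-1 overshoots.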