
Re: Which sample variance should I choose?
Posted: Aug 31, 2011 2:34 PM


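If you know the population mean µ, then s2_1 = Σ(x - µ)^2 / n is unbiased.
If you don't know the population mean and use the sample mean m instead, then s2_2 = Σ(x - m)^2 / (n-1) is unbiased, while s2_3 = Σ(x - m)^2 / n is biased but nevertheless more accurate than s2_2, in the sense of having a smaller mean squared error (at least for normally distributed data).
None of these distinctions matters much if n is reasonably large, since the divisors n and n-1 then differ only slightly.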
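
If you want to check these claims numerically, here is a rough simulation sketch. It is only an illustration and it assumes normally distributed data; the function names, the sample size n = 5, the number of trials and the seed are all arbitrary choices of mine. It uses the s2_1, s2_2, s2_3 naming from above.

```python
import random


def variance_estimators(sample, mu):
    """Return (s2_1, s2_2, s2_3) for one sample.

    mu is the known population mean (used only by s2_1);
    s2_2 and s2_3 use the sample mean m.
    """
    n = len(sample)
    m = sum(sample) / n
    ss_mu = sum((x - mu) ** 2 for x in sample)  # squares about population mean
    ss_m = sum((x - m) ** 2 for x in sample)    # squares about sample mean
    return ss_mu / n, ss_m / (n - 1), ss_m / n


def simulate(n=5, trials=20000, mu=0.0, sigma=1.0, seed=12345):
    """Estimate the mean and mean squared error of each estimator.

    Assumption: samples are drawn from a normal distribution.
    """
    rng = random.Random(seed)
    true_var = sigma ** 2
    totals = [0.0, 0.0, 0.0]
    sq_errs = [0.0, 0.0, 0.0]
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        for i, est in enumerate(variance_estimators(sample, mu)):
            totals[i] += est
            sq_errs[i] += (est - true_var) ** 2
    means = [t / trials for t in totals]
    mses = [s / trials for s in sq_errs]
    return means, mses


if __name__ == "__main__":
    means, mses = simulate()
    for name, mean, mse in zip(("s2_1", "s2_2", "s2_3"), means, mses):
        print(f"{name}: mean = {mean:.3f}, MSE = {mse:.3f}")
```

For normal data with n = 5 and a true variance of 1, theory gives means of 1, 1 and 0.8 and mean squared errors of 0.4, 0.5 and 0.36 respectively, so the simulated values should come out close to those. Raising n shows the three estimators converging.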
On Aug 31, 1:02 pm, Steven D'Aprano <steve +comp.lang.pyt...@pearwood.info> wrote:
> Paige Miller wrote:
> > On Aug 31, 1:06 am, Steven D'Aprano <steve
> > +comp.lang.pyt...@pearwood.info> wrote:
> [...]
> >> Under what circumstances should I prefer each of these four estimators of
> >> σ^2 and what are the pros and cons of each?
>
> > There is no answer until you tell us what you are planning to use the
> > variance for.
>
> That's exactly what I'm trying to find out. Under which circumstances should
> I prefer one method over the others?
>
> I don't actually have a *specific* usage in mind, other than answering the
> question "what's the sample variance of this data?" But I would like to
> understand why somebody might choose one version or another.
>
> E.g.
>
> the unbiased sample variance (divide by n-1) has the advantage that, on
> average, it will equal the population variance (provided certain
> assumptions hold, such as sampling with replacement);
>
> but the unbiased sample variance also has a larger spread, so although it is
> the most accurate on average, there's a chance that it will be much further
> off. The biased sample variance (divide by n) is less accurate but more
> precise (the results are clustered more closely together, so the chances of
> getting a result that is *way* off is much reduced);
>
> etc. Or at least, this is what I *think* is the case.
>
> I'm not even sure that it is mathematically valid to substitute µ into the
> sample variance formulae instead of the sample mean. I can't see why it
> wouldn't be, but I'm not sure.
>
> For reference, here's the suggested sample variance formulae again:
>
> s^2 = Σ(x - m)^2 / n      (Eq. 1) Biased, using sample mean
> s^2 = Σ(x - m)^2 / (n-1)  (Eq. 2) Unbiased, using sample mean
> s^2 = Σ(x - µ)^2 / n      (Eq. 3) Biased, using population mean
> s^2 = Σ(x - µ)^2 / (n-1)  (Eq. 4) Unbiased, using population mean
>
> where the sums are over each x in the sample.
>
> Steven

