> If you know the population mean, then
>     s2_1 = Σ(x - µ)^2 / n
> is unbiased.
>
> If you don't know the population mean, then
>     s2_2 = Σ(x - m)^2 / (n-1)
> is unbiased, while
>     s2_3 = Σ(x - m)^2 / n
> is biased but nevertheless more accurate than s2_2.
Thanks Paul, that's exactly the sort of thing I'm looking for.
Not that I don't believe you :) but if you also have a reference (especially one that's online) that would be really helpful.
Thanks to everyone who answered.
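In the meantime, here is a rough Monte Carlo sketch I could use to check Paul's three claims numerically. It is only an illustration under my own assumptions: the data are drawn from a normal distribution, "more accurate" is read as "lower mean squared error", and the function name `simulate` and its parameters are made up for this example.

```python
import random
import statistics


def simulate(n=5, trials=20000, mu=0.0, sigma=1.0, seed=12345):
    """Estimate the bias and mean squared error of three variance estimators.

    Assumption: samples are i.i.d. normal with known mean mu and
    standard deviation sigma, so the true variance is sigma**2.
    """
    rng = random.Random(seed)
    true_var = sigma ** 2
    names = ("s2_1", "s2_2", "s2_3")
    estimates = {name: [] for name in names}

    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n                          # sample mean
        ss_mu = sum((x - mu) ** 2 for x in xs)   # deviations from population mean
        ss_m = sum((x - m) ** 2 for x in xs)     # deviations from sample mean
        estimates["s2_1"].append(ss_mu / n)
        estimates["s2_2"].append(ss_m / (n - 1))
        estimates["s2_3"].append(ss_m / n)

    results = {}
    for name in names:
        vals = estimates[name]
        bias = statistics.fmean(vals) - true_var
        mse = statistics.fmean([(v - true_var) ** 2 for v in vals])
        results[name] = (bias, mse)
    return results


if __name__ == "__main__":
    for name, (bias, mse) in simulate().items():
        print(f"{name}: bias = {bias:+.4f}, MSE = {mse:.4f}")
```

If Paul is right, s2_1 and s2_2 should show a bias near zero, while s2_3 should show a negative bias but a smaller mean squared error than s2_2. The gap should shrink as n grows, and the ordering of the errors may differ for non-normal data.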
> None of these distinctions matters if n is reasonably large
>
> On Aug 31, 1:02 pm, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
>> Paige Miller wrote:
>> > On Aug 31, 1:06 am, Steven D'Aprano <steve
>> > +comp.lang.pyt...@pearwood.info> wrote:
>> [...]
>> >> Under what circumstances should I prefer each of these four estimators
>> >> of σ^2 and what are the pros and cons of each?
>>
>> > There is no answer until you tell us what you are planning to use the
>> > variance for.
>>
>> That's exactly what I'm trying to find out. Under which circumstances
>> should I prefer one method over the others?
>>
>> I don't actually have a *specific* usage in mind, other than answering
>> the question "what's the sample variance of this data?" But I would like
>> to understand why somebody might choose one version or another.
>>
>> E.g.
>>
>> the unbiased sample variance (divide by n-1) has the advantage that, on
>> average, it will equal the population variance (provided certain
>> assumptions hold, such as sampling with replacement);
>>
>> but the unbiased sample variance also has a larger spread, so although it
>> is the most accurate on average, there's a chance that it will be much
>> further off. The biased sample variance (divide by n) is less accurate
>> but more precise (the results are clustered more closely together, so the
>> chances of getting a result that is *way* off is much reduced);
>>
>> etc. Or at least, this is what I *think* is the case.
>>
>> I'm not even sure that it is mathematically valid to substitute µ into
>> the sample variance formulae instead of the sample mean. I can't see why
>> it wouldn't be, but I'm not sure.
>>
>> For reference, here's the suggested sample variance formulae again:
>>
>> s^2 = Σ(x - m)^2 / n      (Eq. 1) Biased, using sample mean
>> s^2 = Σ(x - m)^2 / (n-1)  (Eq. 2) Unbiased, using sample mean
>> s^2 = Σ(x - µ)^2 / n      (Eq. 3) Biased, using population mean
>> s^2 = Σ(x - µ)^2 / (n-1)  (Eq. 4) Unbiased, using population mean
>>
>> where the sums are over each x in the sample.
>>
>> --
>> Steven
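
As a check on the labels attached to the four formulae quoted above, their expected values can be computed exactly for a tiny population by enumerating every possible sample drawn with replacement. This is only a sketch: the population values and the function name `expected_values` are hypothetical, chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population; any small list of numbers would do.
population = [1, 2, 4, 9]
N = len(population)
mu = Fraction(sum(population), N)                      # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N    # population variance


def expected_values(n):
    """Exact expectations of Eq. 1 to Eq. 4 over all N**n equally likely
    samples of size n drawn with replacement."""
    totals = [Fraction(0)] * 4
    for sample in product(population, repeat=n):
        m = Fraction(sum(sample), n)                   # sample mean
        ss_m = sum((x - m) ** 2 for x in sample)
        ss_mu = sum((x - mu) ** 2 for x in sample)
        values = (ss_m / n, ss_m / (n - 1), ss_mu / n, ss_mu / (n - 1))
        totals = [t + v for t, v in zip(totals, values)]
    count = N ** n
    return [t / count for t in totals]


if __name__ == "__main__":
    print("population variance:", sigma2)
    for label, value in zip(("Eq. 1", "Eq. 2", "Eq. 3", "Eq. 4"),
                            expected_values(3)):
        print(label, "expected value:", value)
```

For this toy population with n = 3, the population variance is 19/2 and the four expectations come out as 19/3, 19/2, 19/2 and 57/4. In other words, Eq. 2 and Eq. 3 are the unbiased ones here, while Eq. 4 overshoots, which agrees with Paul's reply rather than with the labels in the quoted list.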