On Mar 23, 2:58 pm, bert <bert.hutchi...@btinternet.com> wrote:
> On Mar 23, 9:16 pm, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
> > I'm trying to demonstrate numerically (rather than algebraically) that
> > the expectation of the sample variance is the population variance, but
> > it's not working for me.
> >
> > Some quick(?) background... please correct me if I'm wrong about anything.
> >
> > The variance of a population is:
> >
> > σ^2 = 1/n * Σ(x-μ)^2 over all x in the population
> >
> > where ^2 means superscript 2 (i.e. squared). In case you can't read the
> > symbols, here it is again in ASCII-only text:
> >
> > theta^2 = 1/n * SUM( (x-mu)^2 )
> >
> > If you don't have the entire population as your data, you can estimate
> > the population variance by calculating a sample variance:
> >
> > s'^2 = 1/n * Σ(x-μ)^2 over all x in the sample
> >
> > where s' is being used instead of s subscript n.
> >
> > This is unbiased, provided you know the population mean mu μ. Normally
> > you don't though, and you're reduced to estimating it from your sample:
> >
> > s'^2 = 1/n * Σ(x-m)^2
> >
> > where m is being used as the symbol for sample mean x bar = Σx/n
> >
> > Unfortunately this sample variance is biased, so the "unbiased sample
> > variance" is used instead:
> >
> > s^2 = 1/(n-1) * Σ(x-m)^2
> >
> > What makes this unbiased is that the expected value of the sample
> > variances equals the true population variance. E.g. see
> >
> > http://en.wikipedia.org/wiki/Bessel's_correction
> >
> > The algebra convinces me -- I'm sure it's correct. But I'd like an easy
> > example I can show people, but it's not working for me!
> >
> > Let's start with a population of: [1, 2, 3, 4]. The true mean is 2.5 and
> > the true (population) variance is 1.25.
> >
> > All possible samples for each sample size > 1, and their exact sample
> > variances, are:
> >
> > n = 2
> > 1,2 : 1/2
> > 1,3 : 2
> > 1,4 : 9/2
> > 2,3 : 1/2
> > 2,4 : 2
> > 3,4 : 1/2
> > Expectation for n=2: 5/3
> >
> > n=3
> > 1,2,3 : 1
> > 1,3,4 : 7/3
> > 2,3,4 : 1
> > Expectation for n=3: 13/9
> >
> > n=4
> > 1,2,3,4 : 5/3
> > Expectation for n=4: 5/3
> >
> > As you can see, none of the expectations for a particular sample size are
> > equal to the population variance. If I instead add up all ten possible
> > sample variances, and divide by ten, I get 1.6 which is still not equal
> > to 1.25.
> >
> > What am I misunderstanding?
>
> The formulae are correct only for a population with
> a Gaussian distribution.

No. The formulas hold for any iid sample from any distribution with finite variance. If the population is not Gaussian, we may not be able to compute the _distribution_ of the sample variance, but we can still use the given formulas to find its expectation.
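
As a numerical check of that claim, here is a minimal sketch that draws from the OP's (clearly non-Gaussian) population [1, 2, 3, 4] *with replacement*, so the draws are iid, and averages s^2 over every equally likely ordered sample. The function names are my own, chosen for illustration, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction
from itertools import product


def sample_variance(xs):
    """Unbiased sample variance: divide by n - 1, using the sample mean."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def expected_s2_with_replacement(population, n):
    """Average s^2 over all len(population)**n ordered samples of size n.

    Every ordered sample is equally likely when drawing with replacement,
    so this plain average is the exact expectation of s^2.
    """
    samples = list(product(population, repeat=n))
    return sum(sample_variance(s) for s in samples) / len(samples)


population = [Fraction(x) for x in (1, 2, 3, 4)]

for n in (2, 3, 4):
    print(n, expected_s2_with_replacement(population, n))
# 2 5/4
# 3 5/4
# 4 5/4
```

Every sample size gives 5/4 = 1.25, the population variance, even though the population is nowhere near normal.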
I think the OP's problem stems from using the formulas where they do not apply. He does not have an iid sample: the only combinations he considers are those with no repetitions, that is, he is sampling *without replacement* (he allows samples like (1,2,3) but not ones like (1,1,1)). That makes his successive sample values dependent rather than independent, so the formulas he cites (which assume _independence_) do not apply.
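
To see the effect of the dependence, here is a minimal sketch that repeats the OP's experiment: it enumerates every subset of [1, 2, 3, 4] of a given size (sampling without replacement) and averages s^2 over them. Again, the function names are hypothetical and only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def sample_variance(xs):
    """Unbiased sample variance: divide by n - 1, using the sample mean."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def expected_s2_without_replacement(population, n):
    """Average s^2 over all subsets of size n (no repeated elements).

    Each subset is equally likely under simple random sampling without
    replacement, so this average is the exact expectation of s^2.
    """
    samples = list(combinations(population, n))
    return sum(sample_variance(s) for s in samples) / len(samples)


population = [Fraction(x) for x in (1, 2, 3, 4)]

for n in (2, 3, 4):
    print(n, expected_s2_without_replacement(population, n))
# 2 5/3
# 3 5/3
# 4 5/3
```

Each sample size gives 5/3 rather than 1.25. That matches the standard finite-population result for sampling without replacement, E[s^2] = N/(N-1) * sigma^2, which here is 4/3 * 5/4 = 5/3. Counting all four size-3 subsets, including (1, 2, 4), gives 5/3 for n = 3 as well.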
> The distribution of your
> test population [1, 2, 3, 4] is not Gaussian, and
> its difference from normality is enough to give
> those differences in the sample variances.