Date: Mar 23, 2011 5:16 PM
Author: Steven D'Aprano
Subject: Expectation of the variance
I'm trying to demonstrate numerically (rather than algebraically) that
the expectation of the sample variance is the population variance, but
it's not working for me.

Some quick(?) background... please correct me if I'm wrong about anything.

The variance of a population is:

σ^2 = 1/n * Σ(x-μ)^2 over all x in the population

where ^2 means superscript 2 (i.e. squared). In case you can't read the
symbols, here it is again in ASCII-only text:

sigma^2 = 1/n * SUM( (x-mu)^2 )

If you don't have the entire population as your data, you can estimate
the population variance by calculating a sample variance:

s'^2 = 1/n * Σ(x-μ)^2 over all x in the sample

where s' is being used instead of s subscript n.

This is unbiased, provided you know the population mean μ (mu). Normally
you don't though, and you're reduced to estimating it from your sample:

s'^2 = 1/n * Σ(x-m)^2

where m is being used as the symbol for the sample mean, x bar = Σx/n

Unfortunately this sample variance is biased, so the "unbiased sample
variance" is used instead:

s^2 = 1/(n-1) * Σ(x-m)^2
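For anyone who wants to check the two estimators by machine, here is a
minimal sketch in Python. The function names biased_variance and
unbiased_variance are my own labels for s'^2 and s^2, and the code
assumes integer data so that exact fractions can be used instead of
floats.

```python
from fractions import Fraction


def biased_variance(sample):
    """s'^2 = 1/n * SUM((x - m)^2), with m the sample mean."""
    n = len(sample)
    m = Fraction(sum(sample), n)  # exact sample mean (integer data assumed)
    return sum((x - m) ** 2 for x in sample) / n


def unbiased_variance(sample):
    """s^2 = 1/(n-1) * SUM((x - m)^2); needs at least two values."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two data points")
    m = Fraction(sum(sample), n)
    return sum((x - m) ** 2 for x in sample) / (n - 1)


print(biased_variance([1, 3]))    # 1
print(unbiased_variance([1, 3]))  # 2
```

The only difference between the two is the divisor: n for s'^2, n-1 for
s^2.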

What makes this unbiased is that the expected value of the sample
variances equals the true population variance. E.g. see

http://en.wikipedia.org/wiki/Bessel's_correction

The algebra convinces me -- I'm sure it's correct. I'd like an easy
example I can show people, though, and it's not working for me!

Let's start with a population of: [1, 2, 3, 4]. The true mean is 2.5 and
the true (population) variance is 1.25.
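Those two figures can be checked with a few lines of Python. This is
only a sketch: the variable names mu and sigma_squared are mine, and
exact fractions are used so the results print as 5/2 and 5/4 rather
than as floats.

```python
from fractions import Fraction

population = [1, 2, 3, 4]
n = len(population)

# Population mean: mu = SUM(x) / n
mu = Fraction(sum(population), n)

# Population variance: sigma^2 = 1/n * SUM((x - mu)^2)
sigma_squared = sum((x - mu) ** 2 for x in population) / n

print(mu, sigma_squared)  # 5/2 5/4
```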

All possible samples for each sample size > 1, and their exact sample
variances, are:

n = 2
1,2 : 1/2
1,3 : 2
1,4 : 9/2
2,3 : 1/2
2,4 : 2
3,4 : 1/2
Expectation for n=2: 5/3

n = 3
1,2,3 : 1
1,2,4 : 7/3
1,3,4 : 7/3
2,3,4 : 1
Expectation for n=3: 5/3

n = 4
1,2,3,4 : 5/3
Expectation for n=4: 5/3

As you can see, none of the expectations for a particular sample size are
equal to the population variance. If I instead add up all eleven possible
sample variances, and divide by eleven, I get 5/3 (about 1.67), which is
still not equal to 1.25.
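As a cross-check on the hand arithmetic, here is a minimal Python sketch
that enumerates every sample with itertools.combinations (that is,
distinct subsets drawn without replacement, each treated as equally
likely) and averages the unbiased sample variances for each size. The
helper names unbiased_variance and expected_variance are my own, not
anything standard.

```python
from fractions import Fraction
from itertools import combinations


def unbiased_variance(sample):
    """s^2 = 1/(n-1) * SUM((x - m)^2), with m the sample mean."""
    n = len(sample)
    m = Fraction(sum(sample), n)
    return sum((x - m) ** 2 for x in sample) / (n - 1)


def expected_variance(population, size):
    """Average s^2 over all subsets of the given size (no replacement)."""
    variances = [unbiased_variance(s) for s in combinations(population, size)]
    return sum(variances) / len(variances)


population = [1, 2, 3, 4]
for size in range(2, len(population) + 1):
    count = len(list(combinations(population, size)))
    # Prints: sample size, number of samples, expectation of s^2
    print(size, count, expected_variance(population, size))
```

Run as written, this reports 6, 4 and 1 samples for sizes 2, 3 and 4
(the size-3 samples include 1,2,4), with an expectation of 5/3 in each
case.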

What am I misunderstanding?

Thanks in advance,

--
Steven