


Expectation of the variance
Posted: Mar 23, 2011 5:16 PM


I'm trying to demonstrate numerically (rather than algebraically) that the expectation of the sample variance is the population variance, but it's not working for me.
Some quick(?) background... please correct me if I'm wrong about anything.
The variance of a population is:
σ^2 = 1/n * Σ(x - μ)^2 over all x in the population
where ^2 means superscript 2 (i.e. squared). In case you can't read the symbols, here it is again in ASCII-only text:
sigma^2 = 1/n * SUM( (x - mu)^2 )
If you don't have the entire population as your data, you can estimate the population variance by calculating a sample variance:
s'^2 = 1/n * Σ(x - μ)^2 over all x in the sample
where s' is being used instead of s subscript n.
This is unbiased, provided you know the population mean μ (mu). Normally you don't though, and you're reduced to estimating it from your sample:
s'^2 = 1/n * Σ(x - m)^2
where m is being used as the symbol for the sample mean, x-bar = Σx/n
Unfortunately this sample variance is biased, so the "unbiased sample variance" is used instead:
s^2 = 1/(n - 1) * Σ(x - m)^2
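To make the three formulas above concrete, here is a minimal sketch in Python. The function names are my own, and the two-element sample and the known mean of 5/2 are just illustrative values, not anything special. Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def variance_known_mean(sample, mu):
    """1/n * SUM((x - mu)^2), using a known population mean mu."""
    n = len(sample)
    return sum((x - mu) ** 2 for x in sample) / n

def variance_biased(sample):
    """1/n * SUM((x - m)^2), where m is the sample mean."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / n

def variance_unbiased(sample):
    """1/(n - 1) * SUM((x - m)^2), i.e. with Bessel's correction."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / (n - 1)

# Illustrative values only (assumed for this sketch).
sample = [Fraction(1), Fraction(3)]
mu = Fraction(5, 2)

print(variance_known_mean(sample, mu))  # 5/4
print(variance_biased(sample))          # 1
print(variance_unbiased(sample))        # 2
```

The only difference between the last two functions is the divisor, n versus n - 1.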
What makes this unbiased is that the expected value of the sample variances equals the true population variance. E.g. see
http://en.wikipedia.org/wiki/Bessel's_correction
The algebra convinces me; I'm sure it's correct. I'd like an easy example I can show people, but it's not working for me!
Let's start with a population of: [1, 2, 3, 4]. The true mean is 2.5 and the true (population) variance is 1.25.
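As a quick sanity check of those two numbers, here is a small sketch using Python's standard statistics module (pvariance divides by n, which is the population formula). The variable names are my own.

```python
from fractions import Fraction
from statistics import mean, pvariance

# The population from the example, as exact fractions.
population = [Fraction(x) for x in (1, 2, 3, 4)]

pop_mean = mean(population)      # true mean
pop_var = pvariance(population)  # population variance (divides by n)

print(pop_mean, float(pop_mean))  # 5/2 2.5
print(pop_var, float(pop_var))    # 5/4 1.25
```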
All possible samples for each sample size > 1, and their exact sample variances, are:
n = 2
  1,2 : 1/2
  1,3 : 2
  1,4 : 9/2
  2,3 : 1/2
  2,4 : 2
  3,4 : 1/2
  Expectation for n = 2: 5/3

n = 3
  1,2,3 : 1
  1,2,4 : 7/3
  1,3,4 : 7/3
  2,3,4 : 1
  Expectation for n = 3: 5/3

n = 4
  1,2,3,4 : 5/3
  Expectation for n = 4: 5/3

As you can see, none of the expectations for a particular sample size equals the population variance. If I instead add up all eleven possible sample variances and divide by eleven, I get 5/3 (about 1.67), which is still not equal to 1.25.
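In case anyone wants to check the arithmetic, here is a minimal sketch in Python that redoes the enumeration. It assumes, as I did by hand, that a "sample" is a subset of distinct population members (drawn without replacement), and it uses statistics.variance, which divides by n - 1. The helper name expected_sample_variance is my own. Because itertools.combinations generates every subset of a given size, any sample left out of a hand-made table will show up in the output.

```python
from fractions import Fraction
from itertools import combinations
from statistics import variance

population = [Fraction(x) for x in (1, 2, 3, 4)]

def expected_sample_variance(pop, n):
    """Average the (n - 1)-divisor sample variance over every
    size-n subset of pop (sampling without replacement)."""
    samples = list(combinations(pop, n))
    return sum(variance(s) for s in samples) / len(samples)

for n in (2, 3, 4):
    print("n =", n)
    for s in combinations(population, n):
        # Show each sample and its exact sample variance.
        print(" ", ",".join(str(x) for x in s), ":", variance(s))
    print("  Expectation:", expected_sample_variance(population, n))
```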
What am I misunderstanding?
Thanks in advance,
 Steven



