The Math Forum

Ask Dr. Math - Questions and Answers from our Archives

Estimators of Statistics Variables

Date: 11/17/1999 at 14:14:59
From: Eugenio Rapella
Subject: Statistics: properties of estimators

I'm interested in the properties of the estimators for the most common 
values in statistics and I have some questions:

1. Let s^2 = sum((Xi-M)^2)/n and sigma^2 = sum((Xi-M)^2)/(n-1). Is 
   there an elementary proof that s^2 of a sample is not an unbiased 
   estimator for s^2 of the population, while sigma^2 of a sample is an 
   unbiased estimator for sigma^2 of the population?

2. Is there an elementary proof that the median of a sample is an 
   unbiased estimator for the arithmetic mean of the population?

3. Are there elementary proofs whether the estimators of these indices 
   do or don't have other properties (the Italian words are 
   "efficienza" and "consistenza")?

Thanks for your kind attention,
Eugenio Rapella

Date: 11/20/1999 at 06:36:48
From: Doctor Mitteldorf
Subject: Re: Statistics: properties of estimators

Dear Eugenio,

I think I understand what you are trying to do, but I don't believe it 
is possible.  

The idea of an unbiased estimator depends on what kind of distribution 
you expect in the sample. There are too many kinds of distributions 
for you to "average" over them, or to construct a "distribution of 
distributions."

I think the ideas about "unbiased" really are softer than this, and 
must be taken not as mathematical truths but as statements about the 
kinds of distributions we encounter in real-life problems.

On the other hand, it's certainly true that s^2 depends on n, and in 
ANY distribution it's likely to rise with increasing n. Suppose we 
limit ourselves to discrete distributions containing all the numbers 
1 ... k. In other words, our distribution is known to be either

     1         or
     1 2       or
     1 2 3     or
     1 2 3 4   etc.

Suppose we sample with replacement. Then we can actually calculate 
whether sigma^2 is an unbiased estimator. If you have time and 
motivation to investigate this, I'd be interested to hear what you 
find.
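This check can in fact be done exactly by enumeration. Here is a sketch in Python (my own illustration, not part of the original exchange), assuming k = 4 and samples of size 3 drawn with replacement: it averages the (n-1)-divisor variance over all k^n equally likely samples and compares it to the population variance.

```python
# Exact check: sample with replacement from {1, ..., k}, enumerate all
# k^n equally likely samples, and average the (n-1)-divisor variance.
# k = 4 and n = 3 are arbitrary choices for the illustration.
from fractions import Fraction
from itertools import product

def pop_variance(values):
    """Population variance: sum((x - mean)^2) / N."""
    values = list(values)
    mu = Fraction(sum(values), len(values))
    return sum((Fraction(x) - mu) ** 2 for x in values) / len(values)

def sample_variance(sample):
    """Sample variance with the n-1 divisor (undefined for n = 1)."""
    n = len(sample)
    m = Fraction(sum(sample), n)
    return sum((Fraction(x) - m) ** 2 for x in sample) / (n - 1)

k, n = 4, 3
population = range(1, k + 1)
samples = list(product(population, repeat=n))        # all k^n samples
expected = sum(sample_variance(s) for s in samples) / len(samples)

print(expected, pop_variance(population))   # both are 5/4
```

The two values agree exactly: with replacement, the (n-1)-divisor variance is an unbiased estimator of the population variance.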

For question (2), I think you could imagine doing this for the 
distribution of distributions I proposed above, and that it would turn 
out to be true that the median is an unbiased estimator. But suppose 
your distribution were the squares of whole numbers:

     1          or
     1 4        or
     1 4 9      or
     1 4 9 16   etc.

Then the median wouldn't be an unbiased estimator of the mean at all: 
the median would generally come out lower.
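This claim can be tested exactly by enumeration. A Python sketch (my own illustration, assuming samples of 3 drawn with replacement from {1, 4, 9, 16}):

```python
# Exact expected sample median over all 4^3 equally likely samples of
# size 3 drawn with replacement from the squares {1, 4, 9, 16}.
from fractions import Fraction
from itertools import product

population = [1, 4, 9, 16]
samples = list(product(population, repeat=3))

pop_mean = Fraction(sum(population), len(population))             # 15/2
exp_median = Fraction(sum(sorted(s)[1] for s in samples), len(samples))

print(exp_median, pop_mean)   # 57/8 vs 15/2: the median comes out lower
```

So for this right-skewed distribution the expected sample median (57/8 = 7.125) is indeed below the population mean (15/2 = 7.5).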

For (3), I'm not familiar with the Italian words "efficienza" or 
"consistenza," or with statistical meanings for their English cognates 
"efficiency" and "consistency."

- Doctor Mitteldorf, The Math Forum   

Date: 11/20/1999 at 11:12:53
From: Doctor Mitteldorf
Subject: Re: Statistics: properties of estimators

Dear Eugenio,

I thought some more about your challenge, and decided the best way to 
reduce it to a tractable problem is to limit ourselves to uniform, 
continuous distributions. In other words, when we talk about "general 
distributions" the space is so large we get lost, but we can make a 
start with uniform, continuous distributions. 

I have calculated the average variance sigma^2 for samples of size 
n = 2, and found that it does come out the same as the variance of the 
entire population. I can prove this. I've also experimented 
numerically with n = 3, 4, 5, 6 and larger; it seems to be true in my 
numerical experiments that for samples of size n, sigma^2 is the same 
as the variance of the entire population, but I can't yet prove this. 
Perhaps you'd like to try: I suggest you use induction on n. You'll 
have to construct the distribution for samples of size n based on the 
distribution for samples of size n-1.

Here's my proof that sigma^2 = population variance for n = 2:

The distribution can be (a < x < b), but without loss of generality, 
we can use the space (-1 < x < 1). Suppose x is uniformly distributed 
between -1 and 1. The variance of this distribution is 1/3:
      Integral[-1,1] (x^2 dx)
     ------------------------- = 1/3
      Integral[-1,1] (1 dx)

Suppose we sample 2 numbers from this distribution. Let y = x1-x2 be 
the difference between these two numbers. Then the variance of our 
two-number sample is (y^2)/4 using (n = 2), or (y^2)/2 using 
(n-1 = 1).

The distribution of y is a triangular wedge, rising linearly from -2 
to 0, falling again from 0 to +2:

              /  \
     --------/    \--------

We may write this distribution as prob(y) = (2-|y|)/4, (-2 < y < 2).

Then the mean value of the quantity (y^2)/2 is

     Integral prob(y) (y^2)/2 dy

This integral does indeed come out to 1/3, the same as the variance of 
the population as a whole.
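The same result can be spot-checked numerically. This Monte Carlo sketch (an illustration, not part of the proof) averages (x1-x2)^2/2, the n-1 sample variance of each pair, over many pairs drawn from Uniform(-1, 1):

```python
# Monte Carlo check: the mean of (x1 - x2)^2 / 2 over pairs drawn from
# Uniform(-1, 1) should approach 1/3, the population variance.
import random

random.seed(1)                    # fixed seed for reproducibility
trials = 200_000
total = 0.0
for _ in range(trials):
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    total += (x1 - x2) ** 2 / 2

print(total / trials)             # close to 1/3
```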

Can you extend this calculation to triples of uniformly-distributed 
numbers?

- Doctor Mitteldorf, The Math Forum   

Date: 12/06/1999 at 12:39:35
From: Eugenio Rapella
Subject: Re: Statistics: properties of estimators

Dear colleague,

I thank you very much for your kind reply to my request; what you 
suggest is really interesting, but I'm trying something different. 
I've only basic knowledge in statistics, I teach math in an Italian 
high school (to 18-year-old students) and I just wanted to show them 
why in scientific calculators there are two different keys for the 
standard deviation: "sx" and "sigmax."

Here is what I have in mind:

Suppose we have a population of n people and each person is associated 
with a particular value Xi where i = 1, 2, ..., n (e.g. his weight).

Let MP = sum(Xi)/n be the arithmetic mean of the population, and let 
SIGMA^2P = sum((Xi-MP)^2)/n and S^2P = sum((Xi-MP)^2)/(n-1) be two 
possible definitions for the population variance.

Let's take k elements (k < n) to form a sample and let's define MS, 
SIGMA^2S, and S^2S to be the analogous values calculated on the chosen 
sample.
If we imagine calculating these values for all the possible C(n,k) 
(combinations of n objects taken k at a time) samples, these values 
become random variables, each one with its particular mean value; if X 
is a random variable, let's call E(X) (E = Expectation) its mean 
value.

We say that MS is an "unbiased" estimator of MP because E(MS) = MP 
(and this is easy to prove). One could expect SIGMA^2S to be an 
unbiased estimator for SIGMA^2P, but this is not true: in order to 
have unbiased estimators you have to substitute SIGMA^2P with S^2P and 
SIGMA^2S with S^2S.

Here is a numeric example: let's take a population of n = 4 elements 
and the values 1, 2, 4, 13(*). We have MP = 5, SIGMA^2P = 90/4 = 45/2, 
S^2P = 90/3 = 30 related to the whole population.

Suppose we take samples of k = 3 elements. If we sample without 
replacement, we have C(4,3) = 4 possible samples; for each sample 
let's calculate MS, SIGMA^2S and S^2S:

     SAMPLE      MS    SIGMA^2S    S^2S
     ------      --    --------    ----
     1, 2, 4     7/3      14/9      7/3
     1, 2, 13   16/3     266/9    133/3
     1, 4, 13    6        26       39 
     2, 4, 13   19/3     206/9    103/3

Each sample has the same probability 1/4. Calculating the mean value 
of the three random variables, we have E(MS) = 5, E(SIGMA^2S) = 20, 
and E(S^2S) = 30.
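These expectations can be reproduced exactly in a short Python sketch (my own illustration of the enumeration above, using exact rational arithmetic):

```python
# Reproduce the numeric example: population {1, 2, 4, 13}, all C(4,3)
# samples of size 3 taken without replacement, each with probability 1/4.
from fractions import Fraction
from itertools import combinations

population = [1, 2, 4, 13]

def mean(xs):
    return Fraction(sum(xs), len(xs))

def var_n(xs):                    # SIGMA^2: divide by n
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / Fraction(len(xs))

def var_n1(xs):                   # S^2: divide by n - 1
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / Fraction(len(xs) - 1)

samples = list(combinations(population, 3))
E = lambda f: sum(f(s) for s in samples) / Fraction(len(samples))

print(E(mean), E(var_n), E(var_n1))   # 5, 20, 30
```

The output matches the table: E(MS) = 5 = MP, E(S^2S) = 30 = S^2P, but E(SIGMA^2S) = 20 differs from SIGMA^2P = 45/2.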

So E(MS) = MP and E(S^2S) = S^2P, but E(SIGMA^2S) <> SIGMA^2P: the 
sample variance SIGMA^2S is not an unbiased estimator for SIGMA^2P. 
Note that, for any population and any n, if we take k = 1, then 
SIGMA^2S of any sample is 0 (while S^2S is not defined: 0/0), so

     E(SIGMA^2S) = 0 <> SIGMA^2P 

and that is enough to prove that SIGMA^2S cannot be an unbiased 
estimator for the variance of the entire population. As you see, these 
statements are very general and not connected to the population 
distribution (they're always true).

In the Italian translation of "Introductory Statistics" by T. H. 
Wonnacott and R. J. Wonnacott, I read "it is possible to demonstrate 
that S^2S is an unbiased estimator for S^2P, the demonstration can be 
found in more complete books." I just wonder how difficult this 
demonstration could be. (Can you tell me something about it?)

There's another thing that puzzles me: in the same book I find that 
"the Arithmetic Mean and the Median are BOTH unbiased estimators of 
the Arithmetic Mean of the population." Maybe this refers to samples 
"with replacement" because in the example (*) the Medians of the four 
samples are 2, 2, 4, 4 and E(MEDIANS) = 12/4 = 3 <> MP = 5.

In (*), the Median of the entire population is 3, but in general, the 
Median of the sample cannot be an unbiased estimator of the median of 
the population. With an even number of elements, the Median is defined 
to be the arithmetic mean of the two center values, so, if we consider 
samples of two elements (k = 2) the Median of each sample is again the 
arithmetic mean of the sample and E(MEDIANS) = E(MS) <> "Median of the 
population." [In (*), for k = 2, we have E(MS) = E(MEDIANS) = 5 and 
"Median of the population" = 3.]
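The median calculations above can be verified the same way (a sketch of my own, enumerating the C(4,k) samples without replacement):

```python
# Expected sample median for the population {1, 2, 4, 13}, over all
# samples of size k taken without replacement.
from fractions import Fraction
from itertools import combinations

population = [1, 2, 4, 13]

def median(xs):
    s = sorted(xs)
    n, mid = len(xs), len(xs) // 2
    # For an even count, the median is the mean of the two center values.
    return Fraction(s[mid - 1] + s[mid], 2) if n % 2 == 0 else Fraction(s[mid])

def expected_median(k):
    samples = list(combinations(population, k))
    return sum(median(s) for s in samples) / Fraction(len(samples))

print(expected_median(3))   # 3, not the population mean 5
print(expected_median(2))   # 5: for k = 2 the median equals the sample mean
```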

I think I can solve some of these problems, but right now I have no 
more time to dedicate to these questions, so, if you have any comment 
or suggestion, it's really welcome.

Anyway, just for reading this long message you deserve a big thank 
you, so here it is: THANK YOU!

Have a merry Xmas and a fantastic 2000,
Eugenio Rapella

Date: 12/08/1999 at 15:32:39
From: Doctor Mitteldorf
Subject: Re: Statistics: properties of estimators

Dear Eugenio,

I consulted with a colleague here at the Math Forum, Doctor Anthony, 
who has helped me see the proof of your statement that the variance of 
the sample multiplied by n/(n-1) is an unbiased estimator for the 
variance of the set from which the sample is drawn. Here's the proof, 
based on his explanation to me.

First, some notation:

"Global average" means average over the entire, large set from which 
the samples are drawn. I'll write global averages as {F}.

"Sample average" is the average of one sample of n drawn from the 
large set with replacement. I'll write sample average as <F>.

"Ensemble average" is the average over a large number of samples of 
size n, and I'll write the ensemble average as [F]. The ensemble is 
assumed to be large enough that [<x>] = {x}, i.e. the mean of a large 
number of sample means is the same as the global mean.

Next, a lemma: It's a familiar result that the means of samples of 
size n have a variance (1/n) times the global variance. In my 
notation:

     [<x>^2] - [<x>]^2 = (1/n)({x^2} - {x}^2)

The meaning is: take a random sample of size n (with replacement). 
Calculate the sample mean. Repeat this again and again, until you have 
a large number of such mean values. The variance of this ensemble is 
(1/n) times the variance of the global set from which the samples are 
drawn.
Proof of the lemma: We may assume without loss of generality that the 
distribution is centered about zero, i.e. {x} = 0. Then it will also 
be true that [<x>] = 0, as we noted above. So we are left with 

     [<x>^2] = (1/n){x^2}

On the left, we have the average of a large number of terms, each of 
which is

     <x>^2 = ((1/n) SUM(xi))^2

This can be written as 

     (1/n^2) [SUM(xi^2) + SUM(xi*xj)]

The second sum is over cross-terms of the form xi*xj, with i <> j. 
When we take the ensemble average of these terms, they go to zero 
because, by assumption,

     SUM(xi) = 0

(Note that the sample is with replacement, so it is not necessarily 
true that xi <> xj. The sum over a large number of pairs xi*xj will be 
zero because it is a multiple of SUM(xi), which by assumption is 
zero.)

So we are left with the first term. Each term is the sum of squares of 
n randomly-selected elements, so the ensemble average will just be n 
times the global mean square:

     (1/n^2)*[SUM(xi^2)] = (1/n^2) n{x^2} = (1/n){x^2}

which is our lemma.
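The lemma can be checked exactly on a small global set. This Python sketch (my own illustration) enumerates all samples of size n = 2 drawn with replacement from {1, 2, 4, 13} and compares the variance of the sample means with (1/n) times the global variance:

```python
# Exact check of the lemma: the ensemble variance of the sample means
# equals (1/n) times the global variance, sampling with replacement.
from fractions import Fraction
from itertools import product

xs = [1, 2, 4, 13]
n = 2
g_mean = Fraction(sum(xs), len(xs))
g_var = sum((x - g_mean) ** 2 for x in xs) / Fraction(len(xs))   # 45/2

samples = list(product(xs, repeat=n))      # all len(xs)^n equally likely samples
means = [Fraction(sum(s), n) for s in samples]
e_mean = sum(means) / Fraction(len(means))
e_var = sum((m - e_mean) ** 2 for m in means) / Fraction(len(means))

print(e_var, g_var / n)   # both 45/4
```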


Moving on to the theorem, now: in my notation, the theorem is:

     [<x^2>-<x>^2] = ((n-1)/n)({x^2} - {x}^2)

(Visually, this looks so much like the lemma, it is worth taking a few 
minutes to be sure you understand the orders of the averaging and the 
squaring for each term. Except for the factor in front, the right-hand 
sides are the same in the theorem and the lemma. Both are the global 
variance. On the left-hand side, the lemma has [<x>^2] where the 
theorem has [<x^2>].)

First, notice that the [] can be applied separately to the two terms 
on the left. As before, we'll specialize to the case where the global 
mean is zero, so we are left with:

     [<x^2>]-[<x>^2] = ((n-1)/n){x^2}

Note that we can't set [<x>^2] equal to zero, because the sample means 
are not zero, even though the global mean is zero. In fact, in our 
lemma, we just proved that:

     [<x>^2] = (1/n) {x^2}

The term on the left, however, is the ensemble average of the sample 
averages of x^2. This average doesn't depend on the fact that the x's 
are grouped into samples, and it is just the same as {x^2}.

So we have 

     [<x^2>]-[<x>^2] = {x^2} - (1/n){x^2}

     [<x^2>]-[<x>^2] = ((n-1)/n) {x^2}

just what we wanted to prove.
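The theorem, too, can be checked exactly by enumeration on a small set (a sketch of my own, assuming n = 3 and Eugenio's four-element population, now sampled with replacement):

```python
# Exact check of the theorem: the ensemble average of the divisor-n
# sample variance equals ((n-1)/n) times the global variance, with
# replacement.
from fractions import Fraction
from itertools import product

xs = [1, 2, 4, 13]
n = 3
g_mean = Fraction(sum(xs), len(xs))
g_var = sum((x - g_mean) ** 2 for x in xs) / Fraction(len(xs))   # 45/2

def sample_var_n(s):                       # <x^2> - <x>^2, divisor n
    m = Fraction(sum(s), len(s))
    return sum((x - m) ** 2 for x in s) / Fraction(len(s))

samples = list(product(xs, repeat=n))      # all len(xs)^n samples
e_var = sum(sample_var_n(s) for s in samples) / Fraction(len(samples))

print(e_var, Fraction(n - 1, n) * g_var)   # both equal 15
```

Multiplying the divisor-n sample variance by n/(n-1) therefore recovers the global variance exactly, as the theorem asserts.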

I'll leave you with two thoughts. First, it is not quite trivial to 
extend either proof to the case where the mean is not zero. I'll leave 
you to do this as an exercise. Second, the whole thing depends on 
sampling with replacement. After all, if your global set had 4 members 
and you sampled it 4 times without replacement, then the sample 
variance would be exactly the global variance every time. Where does 
the stipulation "with replacement" come into our proof?

Thanks for bringing this question up, and I hope to continue a 
dialogue on the subject.

- Doctor Mitteldorf, The Math Forum   
Associated Topics:
College Statistics

© 1994- The Math Forum at NCTM. All rights reserved.