
Estimators of Statistics Variables

```
Date: 11/17/1999 at 14:14:59
From: Eugenio Rapella
Subject: Statistics: properties of estimators

I'm interested in the properties of the estimators for the most common
values in statistics and I have some questions:

1. Let s^2 = sum((Xi-M)^2)/n and sigma^2 = sum((Xi-M)^2)/(n-1). Is
there an elementary proof that s^2 of a sample is not an unbiased
estimator for s^2 of the population while sigma^2 of a sample is an
unbiased estimator for sigma^2 of the population?

2. Is there an elementary proof that the median of a sample is an
unbiased estimator for the arithmetic mean of the population?

3. Are there elementary proofs whether the estimators of these indices
do or don't have other properties (the Italian words are
"efficienza" and "consistenza")?

Thanks for your kind attention,
Eugenio Rapella
```

```
Date: 11/20/1999 at 06:36:48
From: Doctor Mitteldorf
Subject: Re: Statistics: properties of estimators

Dear Eugenio,

I think I understand what you are trying to do, but I don't believe it
is possible.

The idea of an unbiased estimator depends on what kind of distribution
you expect in the sample. There are too many kinds of distributions
for you to "average" over them, or to construct a "distribution of
distributions."

I think the ideas about "unbiased" really are softer than this, and
must be taken not as mathematical truths but as statements about the
kinds of distributions we encounter in real-life problems.

On the other hand, it's certainly true that s^2 depends on n, and in
ANY distribution it's likely to rise with increasing n. Suppose we
limit ourselves to discrete distributions containing all the numbers
1 ... k. In other words, our distribution is known to be either

1         or
1 2       or
1 2 3     or
1 2 3 4   etc.

Suppose we sample with replacement. Then we can actually calculate
whether sigma^2 is an unbiased estimator. If you have time and
motivation to investigate this, I'd be interested to hear what you
find.

For question (2), I think you could imagine doing this for the
distribution of distributions I proposed above, and that it would turn
out to be true that the median is an unbiased estimator. But suppose
your distribution were the squares of whole numbers:

1          or
1 4        or
1 4 9      or
1 4 9 16   etc.

Then the median wouldn't be an unbiased estimator of the mean at all:
the median would generally come out lower.

For (3), I'm not familiar with the Italian words "efficienza" or
"consistenza," or with statistical meanings for their English cognates
"efficiency" and "consistency."

- Doctor Mitteldorf, The Math Forum
http://mathforum.org/dr.math/
```
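
Doctor Mitteldorf's proposal above can be carried out mechanically. The following Python sketch (an editorial addition, not part of the original exchange; `check` and `variances` are illustrative helper names) enumerates every with-replacement sample of size n from {1, ..., k} with exact rational arithmetic, then compares the average of each variance estimator with the true population variance:

```python
from itertools import product
from fractions import Fraction

def variances(sample):
    # Return both estimators: divide the sum of squared deviations
    # by n (biased) and by n-1 (candidate unbiased).
    n = len(sample)
    m = Fraction(sum(sample), n)
    ss = sum((x - m) ** 2 for x in sample)
    return ss / n, ss / (n - 1)

def check(k, n):
    """Enumerate all k**n equally likely with-replacement samples of
    size n from {1, ..., k} and average each variance estimator."""
    pop = range(1, k + 1)
    mu = Fraction(sum(pop), k)
    pop_var = sum((x - mu) ** 2 for x in pop) / k
    samples = list(product(pop, repeat=n))
    e_biased = sum(variances(s)[0] for s in samples) / len(samples)
    e_unbiased = sum(variances(s)[1] for s in samples) / len(samples)
    return pop_var, e_biased, e_unbiased

pop_var, e_biased, e_unbiased = check(k=4, n=3)
print(pop_var, e_biased, e_unbiased)
```

For k = 4 and n = 3 the population variance is 5/4; the (n-1)-denominator estimator averages exactly 5/4, while the n-denominator estimator averages only 5/6 = (2/3)(5/4), previewing the (n-1)/n factor proved later in this exchange.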

```
Date: 11/20/1999 at 11:12:53
From: Doctor Mitteldorf
Subject: Re: Statistics: properties of estimators

Dear Eugenio,

I thought some more about your challenge, and decided the best way to
reduce it to a tractable problem is to limit ourselves to uniform,
continuous distributions. In other words, when we talk about "general
distributions" the space is so large we get lost, but we can make a

I have calculated the average variance sigma^2 for samples of size
n = 2, and found that it does come out the same as the variance of the
entire population. I can prove this. I've also experimented
numerically with n = 3, 4, 5, 6 and larger; it seems to be true in my
numerical experiments that for samples of size n, sigma^2 is the same
as the variance of the entire population, but I can't yet prove this.
Perhaps you'd like to try: I suggest you use induction on n. You'll
have to construct the distribution for samples of size n based on the
distribution for samples of size n-1.

Here's my proof that sigma^2 = population variance for n = 2:

The distribution can be (a < x < b), but without loss of generality,
we can use the space (-1 < x < 1). Suppose x is uniformly distributed
between -1 and 1. The variance of this distribution s^2 is 1/3:

    +1
    Integral (x^2 dx)
    -1
    ----------------- = 1/3
    +1
    Integral (1 dx)
    -1

Suppose we sample 2 numbers from this distribution. Let y = x1-x2 be
the difference between these two numbers. Then the variance of our
two-number sample is (y^2)/4 using (n = 2), or (y^2)/2 using
(n-1 = 1).

The distribution of y is a triangular wedge, rising linearly from -2
to 0, falling again from 0 to +2:

              /\
             /  \
    --------/    \--------

We may write this distribution as prob(y) = (2-|y|)/4, (-2 < y < 2).

Then the mean value of the quantity (y^2)/2 is

    +2
    Integral (prob(y) (y^2)/2 dy)
    -2

This integral does indeed come out to 1/3, the same as the variance of
the population as a whole.

Can you extend this calculation to triples of uniformly-distributed
numbers?

Thanks.

- Doctor Mitteldorf, The Math Forum
http://mathforum.org/dr.math/
```

```
Date: 12/06/1999 at 12:39:35
From: Eugenio Rapella
Subject: Re: Statistics: properties of estimators

Dear colleague,

I thank you very much for your kind reply to my request; what you
suggest is really interesting, but I'm trying something different.
I've only basic knowledge in statistics, I teach math in an Italian
high school (to 18-year-old students) and I just wanted to show them
why in scientific calculators there are two different keys for the
standard deviation: "sx" and "sigmax."

Here is what I have in mind:

Suppose we have a population of n people and each person is associated
with a particular value Xi where i = 1, 2, ..., n (e.g. his weight).

Let MP = sum(Xi)/n be the arithmetic mean of the population,
SIGMA^2P = sum((Xi-MP)^2)/n and S^2P = sum((Xi-MP)^2)/(n-1) be two
possible definitions for the population variance.

Let's take k elements (k < n) to form a sample and let's define MS,
SIGMA^2S, and S^2S to be the analogue values calculated on the chosen
sample.

If we imagine calculating these values for all the possible C(n,k)
(combinations of n objects taken k at a time) samples, these values
become random variables each one with its particular mean value; if X
is a random variable, let's call E(X) (E = Expectation) its mean
value.

We say that MS is an "unbiased" estimator of MP because E(MS) = MP
(and this is easy to prove). One could expect SIGMA^2S to be an
unbiased estimator for SIGMA^2P, but this is not true: in order to
have unbiased estimators you have to substitute SIGMA^2P with S^2P and
SIGMA^2S with S^2S.

Here is a numeric example: let's take a population of n = 4 elements
and the values 1, 2, 4, 13(*). We have MP = 5, SIGMA^2P = 90/4 = 45/2,
S^2P = 90/3 = 30 related to the whole population.

Suppose we take samples of k = 3 elements. If we sample without
replacement, we have C(4,3) = 4 possible samples; for each sample
let's calculate MS, SIGMA^2S and S^2S:

SAMPLE      MS    SIGMA^2S    S^2S
------      --    --------    ----
1, 2, 4     7/3      14/9      7/3
1, 2, 13   16/3     266/9    133/3
1, 4, 13    6        26       39
2, 4, 13   19/3     206/9    103/3

Each sample has the same probability 1/4. Calculating the mean value
of the three random variables, we have: E(MS) = 5, E(SIGMA^2S) = 20,
and E(S^2S) = 30.

While E(MS) = MP and E(S^2S) = S^2P, E(SIGMA^2S) <> SIGMA^2P and the
variance SIGMA^2S is not an unbiased estimator for SIGMA^2P. Note
that, for any population and for any "n", if we take k = 1, SIGMA^2S
of any sample is 0 (while S^2S is not defined: 0/0) so,

E(SIGMA^2S) = 0 <> SIGMA^2P

and that is enough to prove that SIGMA^2S cannot be an unbiased
estimator for the variance of the entire population. As you see, these
statements are very general and not connected to the population
distribution (they're always true).

In the Italian translation of "Introductory Statistics" by T. H.
Wonnacott and R. J. Wonnacott, I read "it is possible to demonstrate
that S^2S is an unbiased estimator for S^2P, the demonstration can be
found in more complete books." I just wonder how difficult this
demonstration could be. (Can you tell me something about it?)

There's another thing that puzzles me: in the same book I find that
"the Arithmetic Mean and the Median are BOTH unbiased estimators of
the Arithmetic Mean of the population." Maybe this refers to samples
"with replacement" because in the example (*) the Medians of the four
samples are 2, 2, 4, 4 and E(MEDIANS) = 12/4 = 3 <> MP = 5.

In (*), the Median of the entire population is 3, but in general, the
Median of the sample cannot be an unbiased estimator of the median of
the population. With an even number of elements, the Median is defined
to be the arithmetic mean of the two center values, so, if we consider
samples of two elements (k = 2) the Median of each sample is again the
arithmetic mean of the sample and E(MEDIANS) = E(MS) <> "Median of the
population." [In (*), for k = 2, we have E(MS) = E(MEDIANS) = 5 and
"Median of the population" = 3.)

I think I can solve some of these problems, but right now I have no
more time to dedicate to these questions, so, if you have any comment
or suggestion, it's really welcome.

Anyway, just for reading this long message you deserve a big thank
you, so here it is: THANK YOU!

Have a merry Xmas and a fantastic 2000,
Eugenio Rapella
```
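
Eugenio's worked example is small enough to verify mechanically. This Python sketch (an editorial addition; exact arithmetic via `fractions.Fraction`, with variable names mirroring his notation) enumerates the C(4,3) without-replacement samples of the population {1, 2, 4, 13}:

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 4, 13]
n = len(population)
MP = Fraction(sum(population), n)
dev = sum((x - MP) ** 2 for x in population)
SIGMA2P, S2P = dev / n, dev / (n - 1)  # 45/2 and 30

k = 3
samples = list(combinations(population, k))  # the C(4,3) = 4 samples

def sample_stats(s):
    # mean, n-denominator variance, (n-1)-denominator variance
    m = Fraction(sum(s), k)
    ss = sum((x - m) ** 2 for x in s)
    return m, ss / k, ss / (k - 1)

E_MS = sum(sample_stats(s)[0] for s in samples) / len(samples)
E_SIGMA2S = sum(sample_stats(s)[1] for s in samples) / len(samples)
E_S2S = sum(sample_stats(s)[2] for s in samples) / len(samples)
E_MEDIANS = Fraction(sum(sorted(s)[k // 2] for s in samples), len(samples))

print(E_MS, E_SIGMA2S, E_S2S, E_MEDIANS)
```

It reproduces the table: E(MS) = 5 = MP, E(SIGMA^2S) = 20 <> 45/2 = SIGMA^2P, E(S^2S) = 30 = S^2P, and E(MEDIANS) = 3 <> 5 = MP, confirming both the unbiasedness claim for S^2S and the bias of the sample median here.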

```
Date: 12/08/1999 at 15:32:39
From: Doctor Mitteldorf
Subject: Re: Statistics: properties of estimators

Dear Eugenio,

I consulted with a colleague here at the Math Forum, Doctor Anthony,
who has helped me see the proof of your statement that the variance of
the sample multiplied by n/(n-1) is an unbiased estimator for the
variance of the set from which the sample is drawn. Here's the proof,
based on his explanation to me.

First, some notation:

"Global average" means average over the entire, large set from which
the samples are drawn. I'll write global averages as {F}.

"Sample average" is the average of one sample of n drawn from the
large set with replacement. I'll write sample average as <F>.

"Ensemble average" is the average over a large number of samples of
size n, and I'll write the ensemble average as [F]. The ensemble is
assumed to be large enough that [<x>] = {x}, i.e. the mean of a large
number of sample means is the same as the global mean.

Next, a lemma: It's a familiar result that the means of samples of
size n have a variance (1/n) times the global variance. In my
notation:

[<x>^2] - [<x>]^2 = (1/n)({x^2} - {x}^2)

The meaning is: take a random sample of size n (with replacement).
Calculate the sample mean. Repeat this again and again, until you have
a large number of such mean values. The variance of this ensemble is
(1/n) times the variance of the global set from which the samples are
drawn.

Proof of the lemma: We may assume without loss of generality that the
distribution is centered about zero, i.e. {x} = 0. Then it will also
be true that [<x>] = 0, as we noted above. So we are left with

[<x>^2] = (1/n){x^2}

On the left, we have the average of a large number of terms, each of
which is

((1/n)*SUM(xi))^2

This can be written as

(1/n^2)*(SUM(xi^2)+SUM(2xi*xj))

The second sum is over cross-terms of the form xi*xj, with i <> j.
When we take the ensemble average of these terms, they go to zero
because

SUM(xi) = 0

(Note that the sample is with replacement, so xi and xj are
independent draws. The ensemble average of each cross-term xi*xj
therefore factors into {x}*{x}, which by assumption is zero.)

So we are left with the first term. Each term is the sum of squares of
n randomly-selected elements, so the ensemble average will just be n
times the global mean square:

(1/n^2)*[SUM(xi^2)] = (1/n^2) n{x^2} = (1/n){x^2}

which is our lemma.

------------------------------------------------------------------

Moving on to the theorem, now: in my notation, the theorem is:

[<x^2>-<x>^2] = ((n-1)/n)({x^2} - {x}^2)

(Visually, this looks so much like the lemma, it is worth taking a few
minutes to be sure you understand the orders of the averaging and the
squaring for each term. Except for the factor in front, the right-hand
sides are the same in the theorem and the lemma. Both are the global
variance. On the left-hand side, the lemma has [<x>^2] where the
theorem has [<x^2>].)

First, notice that the [] can be applied separately to the two terms
on the left. As before, we'll specialize to the case where the global
mean is zero, so we are left with:

[<x^2>]-[<x>^2] = ((n-1)/n){x^2}

Note that we can't set [<x>^2] equal to zero, because the sample means
are not zero, even though the global mean is zero. In fact, in our
lemma, we just proved that:

[<x>^2] = (1/n) {x^2}

The term on the left, however, is the ensemble average of the sample
averages of x^2. This average doesn't depend on the fact that the x's
are grouped into samples, and it is just the same as {x^2}.

So we have

[<x^2>]-[<x>^2] = {x^2} - (1/n){x^2}

[<x^2>]-[<x>^2] = ((n-1)/n) {x^2}

just what we wanted to prove.

I'll leave you with two thoughts. First, it is not quite trivial to
extend either proof to the case where the mean is not zero. I'll leave
you to do this as an exercise. Second, the whole thing depends on
sampling with replacement. After all, if your global set had 4 members
and you sampled it 4 times without replacement, then the sample
variance would be exactly the global variance every time. Where does
the stipulation "with replacement" come into our proof?

Thanks for bringing this question up, and I hope to continue a
dialogue on the subject.

- Doctor Mitteldorf, The Math Forum
http://mathforum.org/dr.math/
```
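
The theorem itself — with-replacement sampling makes the n-denominator sample variance average to ((n-1)/n) times the global variance — can be confirmed by exhaustive enumeration on a small set. This Python sketch (an editorial addition, reusing Eugenio's population {1, 2, 4, 13} purely for illustration) computes the ensemble average over all with-replacement samples exactly:

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 4, 13]
mu = Fraction(sum(population), len(population))
global_var = sum((x - mu) ** 2 for x in population) / len(population)  # {x^2} - {x}^2 = 45/2

n = 3  # sample size
samples = list(product(population, repeat=n))  # all 4^3 with-replacement samples

total = Fraction(0)
for s in samples:
    m = Fraction(sum(s), n)
    total += sum((x - m) ** 2 for x in s) / n  # <x^2> - <x>^2 for this sample
ensemble_avg = total / len(samples)

print(ensemble_avg, Fraction(n - 1, n) * global_var)  # the two agree exactly
```

Note the contrast with Eugenio's without-replacement example: there the (n-1)-denominator statistic matched S^2P, while here, with replacement, the n-denominator statistic picks up exactly the (n-1)/n factor the proof predicts.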
Associated Topics:
College Statistics
