Estimators of Statistics Variables

Date: 11/17/1999 at 14:14:59
From: Eugenio Rapella
Subject: Statistics: properties of estimators

I'm interested in the properties of the estimators for the most common values in statistics, and I have some questions:

1. Let s^2 = sum((Xi-M)^2)/n and sigma^2 = sum((Xi-M)^2)/(n-1). Is there an elementary proof that s^2 of a sample is not an unbiased estimator for s^2 of the population, while sigma^2 of a sample is an unbiased estimator for sigma^2 of the population?

2. Is there an elementary proof that the median of a sample is an unbiased estimator for the arithmetic mean of the population?

3. Are there elementary proofs of whether the estimators of these indices do or don't have other properties (the Italian words are "efficienza" and "consistenza")?

Thanks for your kind attention,
Eugenio Rapella


Date: 11/20/1999 at 06:36:48
From: Doctor Mitteldorf
Subject: Re: Statistics: properties of estimators

Dear Eugenio,

I think I understand what you are trying to do, but I don't believe it is possible. The idea of an unbiased estimator depends on what kind of distribution you expect in the sample. There are too many kinds of distributions for you to "average" over them, or to construct a "distribution of distributions." I think the ideas about "unbiased" really are softer than this, and must be taken not as mathematical truths but as statements about the kinds of distributions we encounter in real-life problems. On the other hand, it's certainly true that s^2 depends on n, and in ANY distribution it's likely to rise with increasing n.

Suppose we limit ourselves to discrete distributions containing all the numbers 1 ... k. In other words, our distribution is known to be one of

   1
   1 2
   1 2 3
   1 2 3 4
   ...

Suppose we sample with replacement. Then we can actually calculate whether sigma^2 is an unbiased estimator. If you have time and motivation to investigate this, I'd be interested to hear what you find.
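The calculation Dr. Mitteldorf proposes can be carried out exactly by brute force: for the population {1, ..., k}, enumerate all k^n equally likely samples drawn with replacement and average each candidate variance formula. This is only a sketch (the function names are our own), using exact fractions so there is no rounding to argue about:

```python
from fractions import Fraction
from itertools import product

def population_variance(values):
    # Variance of the whole population, with divisor len(values).
    m = Fraction(sum(values), len(values))
    return sum((Fraction(x) - m) ** 2 for x in values) / len(values)

def mean_sample_variance(values, n, divisor):
    # Average of sum((xi - sample mean)^2) / divisor over all
    # len(values)^n equally likely samples drawn with replacement.
    total = Fraction(0)
    for sample in product(values, repeat=n):
        m = Fraction(sum(sample), n)
        total += sum((Fraction(x) - m) ** 2 for x in sample) / divisor
    return total / len(values) ** n

k, n = 4, 3
pop = list(range(1, k + 1))
sigma2 = population_variance(pop)                     # 5/4 for k = 4
print(mean_sample_variance(pop, n, n - 1) == sigma2)  # divisor n-1: True
print(mean_sample_variance(pop, n, n) == sigma2)      # divisor n:   False
```

Running this for several values of k and n suggests that, under sampling with replacement, only the formula with divisor n-1 averages out to the population variance; the proof later in this exchange shows why.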
For question (2), I think you could imagine doing this for the distribution of distributions I proposed above, and it would turn out to be true that the median is an unbiased estimator. But suppose your distribution were the squares of whole numbers:

   1
   1 4
   1 4 9
   1 4 9 16
   ...

Then the median wouldn't be an unbiased estimator of the mean at all: the median would generally come out lower.

For (3), I'm not familiar with the Italian words "efficienza" or "consistenza," or with statistical meanings for their English cognates "efficiency" and "consistency."

- Doctor Mitteldorf, The Math Forum
  http://mathforum.org/dr.math/


Date: 11/20/1999 at 11:12:53
From: Doctor Mitteldorf
Subject: Re: Statistics: properties of estimators

Dear Eugenio,

I thought some more about your challenge, and decided the best way to reduce it to a tractable problem is to limit ourselves to uniform, continuous distributions. In other words, when we talk about "general distributions" the space is so large we get lost, but we can make a start with uniform, continuous distributions.

I have calculated the average variance sigma^2 for samples of size n = 2, and found that it does come out the same as the variance of the entire population. I can prove this. I've also experimented numerically with n = 3, 4, 5, 6 and larger; it seems to be true in my numerical experiments that for samples of size n, sigma^2 is the same as the variance of the entire population, but I can't yet prove this. Perhaps you'd like to try: I suggest you use induction on n. You'll have to construct the distribution for samples of size n based on the distribution for samples of size n-1.

Here's my proof that sigma^2 = population variance for n = 2. The distribution can be (a < x < b), but without loss of generality, we can use the space (-1 < x < 1). Suppose x is uniformly distributed between -1 and 1.
The variance of this distribution s^2 is 1/3:

      +1
   Integral (x^2 dx)
      -1
   -----------------  =  1/3
      +1
   Integral (1 dx)
      -1

Suppose we sample 2 numbers from this distribution. Let y = x1-x2 be the difference between these two numbers. Then the variance of our two-number sample is (y^2)/4 using (n = 2), or (y^2)/2 using (n-1 = 1). The distribution of y is a triangular wedge, rising linearly from -2 to 0, falling again from 0 to +2:

             /\
            /  \
   --------/    \--------

We may write this distribution as

   prob(y) = (2-|y|)/4,   (-2 < y < 2)

Then the mean value of the quantity (y^2)/2 is

    2
   Integral prob(y) (y^2)/2 dy
   -2

This integral does indeed come out to 1/3, the same as the variance of the population as a whole. Can you extend this calculation to triples of uniformly-distributed numbers? Thanks.

- Doctor Mitteldorf, The Math Forum
  http://mathforum.org/dr.math/


Date: 12/06/1999 at 12:39:35
From: Eugenio Rapella
Subject: Re: Statistics: properties of estimators

Dear colleague,

Thank you very much for your kind reply to my request; what you suggest is really interesting, but I'm trying something different. I have only a basic knowledge of statistics; I teach math in an Italian high school (to 18-year-old students), and I just wanted to show my students why scientific calculators have two different keys for the standard deviation: "sx" and "sigmax." Here is what I have in mind.

Suppose we have a population of n people, and each person is associated with a particular value Xi, where i = 1, 2, ..., n (e.g. his weight). Let

   MP = sum(Xi)/n

be the arithmetic mean of the population, and let

   SIGMA^2P = sum((Xi-MP)^2)/n   and   S^2P = sum((Xi-MP)^2)/(n-1)

be two possible definitions for the population variance. Let's take k elements (k < n) to form a sample, and let's define MS, SIGMA^2S, and S^2S to be the analogous values calculated on the chosen sample.
If we imagine calculating these values for all the possible C(n,k) samples (combinations of n objects taken k at a time), these values become random variables, each with its particular mean value; if X is a random variable, let's call E(X) its mean value (E = Expectation). We say that MS is an "unbiased" estimator of MP because E(MS) = MP (and this is easy to prove).

One could expect SIGMA^2S to be an unbiased estimator for SIGMA^2P, but this is not true: in order to have unbiased estimators, you have to substitute SIGMA^2P with S^2P and SIGMA^2S with S^2S. Here is a numeric example: let's take a population of n = 4 elements and the values 1, 2, 4, 13 (*). We have

   MP = 5,   SIGMA^2P = 90/4 = 45/2,   S^2P = 90/3 = 30

for the whole population. Suppose we take samples of k = 3 elements. If we sample without replacement, we have C(4,3) = 4 possible samples; for each sample let's calculate MS, SIGMA^2S, and S^2S:

   SAMPLE      MS     SIGMA^2S    S^2S
   --------   -----   --------   ------
   1, 2,  4    7/3      14/9       7/3
   1, 2, 13   16/3     266/9     133/3
   1, 4, 13     6        26        39
   2, 4, 13   19/3     206/9     103/3

Each sample has the same probability, 1/4. Calculating the mean values of the three random variables, we have E(MS) = 5, E(SIGMA^2S) = 20, and E(S^2S) = 30. While E(MS) = MP and E(S^2S) = S^2P, we have E(SIGMA^2S) <> SIGMA^2P, so the variance SIGMA^2S is not an unbiased estimator for SIGMA^2P.

Note that, for any population and for any n, if we take k = 1, then SIGMA^2S of any sample is 0 (while S^2S is not defined: 0/0), so E(SIGMA^2S) = 0 <> SIGMA^2P, and that is enough to prove that SIGMA^2S cannot be an unbiased estimator for the variance of the entire population. As you see, these statements are very general and not connected to the population distribution (they're always true).

In the Italian translation of "Introductory Statistics" by T. H. Wonnacott and R. J. Wonnacott, I read that "it is possible to demonstrate that S^2S is an unbiased estimator for S^2P; the demonstration can be found in more complete books."
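Eugenio's table can be replayed mechanically: enumerate every C(4,3) sample drawn without replacement from {1, 2, 4, 13} and average MS, SIGMA^2S (divisor k), and S^2S (divisor k-1). A short sketch with exact fractions (the helper names are our own):

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 4, 13]
k = 3
samples = list(combinations(population, k))   # the C(4,3) = 4 samples

def sample_stats(sample):
    # Returns (MS, SIGMA^2S, S^2S) for one sample.
    m = Fraction(sum(sample), len(sample))
    ss = sum((Fraction(x) - m) ** 2 for x in sample)
    return m, ss / len(sample), ss / (len(sample) - 1)

E_MS = sum(sample_stats(s)[0] for s in samples) / len(samples)
E_SIGMA2S = sum(sample_stats(s)[1] for s in samples) / len(samples)
E_S2S = sum(sample_stats(s)[2] for s in samples) / len(samples)
print(E_MS, E_SIGMA2S, E_S2S)   # 5 20 30, matching the text
```

Note that E(SIGMA^2S) = 20 differs from both SIGMA^2P = 45/2 and S^2P = 30, which is exactly the bias Eugenio describes.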
I just wonder how difficult this demonstration could be. (Can you tell me something about it?)

There's another thing that puzzles me: in the same book I find that "the Arithmetic Mean and the Median are BOTH unbiased estimators of the Arithmetic Mean of the population." Maybe this refers to samples "with replacement," because in the example (*) the Medians of the four samples are 2, 2, 4, 4, and E(MEDIANS) = 12/4 = 3 <> MP = 5. In (*), the Median of the entire population is 3, but in general the Median of the sample cannot be an unbiased estimator of the Median of the population. With an even number of elements, the Median is defined to be the arithmetic mean of the two center values, so if we consider samples of two elements (k = 2), the Median of each sample is again the arithmetic mean of the sample, and E(MEDIANS) = E(MS) <> "Median of the population." [In (*), for k = 2, we have E(MS) = E(MEDIANS) = 5 and "Median of the population" = 3.]

I think I can solve some of these problems, but right now I have no more time to dedicate to these questions, so if you have any comments or suggestions, they are really welcome. Anyway, just for reading this long message you deserve a big thank you, so here it is: THANK YOU!

Have a merry Xmas and a fantastic 2000,
Eugenio Rapella


Date: 12/08/1999 at 15:32:39
From: Doctor Mitteldorf
Subject: Re: Statistics: properties of estimators

Dear Eugenio,

I consulted with a colleague here at the Math Forum, Doctor Anthony, who has helped me see the proof of your statement that the variance of the sample multiplied by n/(n-1) is an unbiased estimator for the variance of the set from which the sample is drawn. Here's the proof, based on his explanation to me.

First, some notation:

"Global average" means the average over the entire, large set from which the samples are drawn. I'll write global averages as {F}.

"Sample average" is the average of one sample of n drawn from the large set with replacement. I'll write the sample average as <F>.
"Ensemble average" is the average over a large number of samples of size n; I'll write the ensemble average as [F]. The ensemble is assumed to be large enough that [<x>] = {x}, i.e. the mean of a large number of sample means is the same as the global mean.

Next, a lemma. It's a familiar result that the means of samples of size n have a variance (1/n) times the global variance. In my notation:

   [<x>^2] - [<x>]^2 = (1/n)({x^2} - {x}^2)

The meaning is: take a random sample of size n (with replacement). Calculate the sample mean. Repeat this again and again, until you have a large number of such mean values. The variance of this ensemble is (1/n) times the variance of the global set from which the samples are drawn.

Proof of the lemma: We may assume without loss of generality that the distribution is centered about zero, i.e. {x} = 0. Then it will also be true that [<x>] = 0, as we noted above. So we are left to show

   [<x>^2] = (1/n){x^2}

On the left, we have the average of a large number of terms, each of which is

   ((1/n)*SUM(xi))^2

This can be written as

   (1/n^2)*(SUM(xi^2) + SUM(2*xi*xj))

The second sum is over cross-terms of the form xi*xj, with i <> j. When we take the ensemble average of these terms, they go to zero: because the sample is drawn with replacement, xi and xj are independent draws, so the average of xi*xj over a large number of pairs is {x}^2, which by assumption is zero. So we are left with the first term. Each term is the sum of squares of n randomly-selected elements, so the ensemble average will just be n times the global mean square:

   (1/n^2)*[SUM(xi^2)] = (1/n^2)*n*{x^2} = (1/n){x^2}

which is our lemma.
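The lemma can be confirmed exactly for a small discrete population. Instead of a large random ensemble, enumerate every with-replacement sample, since they are all equally likely; the "ensemble variance" of the sample means then comes out to exactly (1/n) times the global variance. A sketch (helper names are our own; any finite population works):

```python
from fractions import Fraction
from itertools import product

def variance(values):
    # Variance with divisor len(values), about the mean of `values`.
    m = Fraction(sum(values), len(values))
    return sum((Fraction(x) - m) ** 2 for x in values) / len(values)

population = [1, 2, 4, 13]   # reusing the population from Eugenio's example
n = 3

# All n-tuples drawn with replacement are equally likely, so the exact
# "ensemble" is simply the full list of their sample means.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
print(variance(means) == variance(population) / n)   # True
```

The same check passes for other populations and sample sizes, which is what makes the "familiar result" safe to lean on in the proof.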
------------------------------------------------------------------

Moving on to the theorem now. In my notation, the theorem is:

   [<x^2> - <x>^2] = ((n-1)/n)({x^2} - {x}^2)

(Visually, this looks so much like the lemma that it is worth taking a few minutes to be sure you understand the order of the averaging and the squaring in each term. Except for the factor in front, the right-hand sides are the same in the theorem and the lemma; both are the global variance. On the left-hand side, the lemma has [<x>^2] where the theorem has [<x^2>].)

First, notice that the [] can be applied separately to the two terms on the left. As before, we'll specialize to the case where the global mean is zero, so we want to show:

   [<x^2>] - [<x>^2] = ((n-1)/n){x^2}

Note that we can't set [<x>^2] equal to zero, because the sample means are not zero, even though the global mean is zero. In fact, in our lemma, we just proved that

   [<x>^2] = (1/n){x^2}

The term on the left, however, is the ensemble average of the sample averages of x^2. This average doesn't depend on the fact that the x's are grouped into samples, and it is just the same as {x^2}. So we have

   [<x^2>] - [<x>^2] = {x^2} - (1/n){x^2}
   [<x^2>] - [<x>^2] = ((n-1)/n){x^2}

which is just what we wanted to prove.

I'll leave you with two thoughts. First, it is not quite trivial to extend either proof to the case where the mean is not zero; I'll leave you to do this as an exercise. Second, the whole thing depends on sampling with replacement. After all, if your global set had 4 members and you sampled it 4 times without replacement, then the sample variance would be exactly the global variance every time. Where does the stipulation "with replacement" come into our proof?

Thanks for bringing this question up, and I hope to continue a dialogue on the subject.

- Doctor Mitteldorf, The Math Forum
  http://mathforum.org/dr.math/
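The theorem itself can be checked the same way as the lemma: average the quantity <x^2> - <x>^2 over every with-replacement sample of size n, and compare with ((n-1)/n) times the global variance. Again a sketch in exact fractions, with our own helper names:

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 4, 13]
n = 3

def global_avg(f, values):
    # {F}: average of f(x) over the whole population.
    return sum(Fraction(f(x)) for x in values) / len(values)

total = Fraction(0)
for sample in product(population, repeat=n):          # the exact ensemble
    mean_sq = Fraction(sum(x * x for x in sample), n)     # <x^2>
    mean = Fraction(sum(sample), n)                       # <x>
    total += mean_sq - mean ** 2                          # <x^2> - <x>^2
ensemble = total / len(population) ** n                   # [<x^2> - <x>^2]

global_var = global_avg(lambda x: x * x, population) - \
             global_avg(lambda x: x, population) ** 2     # {x^2} - {x}^2
print(ensemble == Fraction(n - 1, n) * global_var)        # True
```

For this population {x^2} - {x}^2 = 45/2, so the ensemble average of the divisor-n sample variance is (2/3)(45/2) = 15; multiplying by n/(n-1) recovers the global variance, which is the unbiasedness result the proof establishes.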