Sample and Population Standard Deviation
Date: 03/03/2001 at 11:18:08
From: Lara Brook
Subject: Standard deviation

What letters represent theoretical and 'real' standard deviation, mean, and variance?
Date: 03/05/2001 at 10:22:11
From: Doctor Jordi
Subject: Re: Standard deviation

Hello, Lara - thanks for writing to Dr. Math.

"Theoretical" and "real" standard deviation? I am not sure what you mean, but I am guessing that you are talking about *sample* standard deviation and *population* standard deviation.

The traditional symbol for the sample standard deviation is S (lowercase or uppercase; there is a slight difference between the two), and the equivalent Greek letter sigma (which looks like an o with a little tail sticking out from the top) is commonly used to denote the population standard deviation. Also recall that S^2 is called the sample variance and sigma^2 is called the population variance; these two are probably the ones you will work with the most.

Formally speaking, their mathematical definitions are as follows.

           Sum(i=1 to n)(X_i - Xbar)^2
   S^2 =  -----------------------------
                      n - 1

where X_i denotes the ith value in our sample (the ith realization of the random variable X), n is the number of values in the sample, and Xbar, written as an X with a bar above it, denotes the sample mean (again, not to be confused with the population mean).

   sigma^2 = E((X - mu)^2)

where E denotes the expected value function, which you may or may not have encountered already. Roughly, the expected value function tells you the value that the expression given to it as input takes "on average." For example, the expected value of the random variable X, E(X), is the value we expect this variable to take on average, which is nothing more than the population mean. In fact, mu is defined to be E(X).

Be careful to distinguish between the population and sample variances (or standard deviations), as they have different definitions. You have to realize the difference between a sample and the population it was drawn from. For example, say you are using a thermometer to measure the freezing point of water.
Say that this thermometer can measure very small changes in temperature, but that it does not always measure the same temperature in the same way; it can be a little off to one side or to the other. Say you have taken five measurements of the freezing point of water with this thermometer, which were the following (in Fahrenheit):

   32.1   32.3   32.0   31.8   31.8

In this setup, the sample is our five numbers above, and the population is the abstract infinity of all possible values our thermometer can display for the freezing point of water. You can take the average of these five values in our sample, which we will now call the sample mean instead of average, and you will find that it is Xbar = 32.0 (just add all values, which gives 160.0, and divide by 5).

Now, we know that the freezing point of water should be 32 degrees Fahrenheit; in fact, our thermometer was probably calibrated to read 32 degrees for freezing water. Here the sample mean happens to agree with that value, but it need not: a different set of five readings could just as easily average a little above or below 32. If it did, could we conclude from our experiment that the freezing point of water is not 32 degrees? No, because our sample mean need not be equal to the population mean. The population mean is mu = 32 degrees.

In fact, if we were to take many, many more readings (say, 1000), would you expect our sample mean (the average of the readings) to get closer to or farther away from 32, the population mean? The assertion that we expect the sample mean to get closer and closer to the population mean as the sample size gets larger and larger is called the Law of Large Numbers. It is a very intuitively pleasing statement, and it can be proven using a few assumptions from probability theory, but I will not go into that right now.

Let's go back to our sample of thermometer readings. What is the sample variance, S^2? Just by using the definition of S^2, we find that

           (32.1 - 32.0)^2 + (32.3 - 32.0)^2 + (32.0 - 32.0)^2
              + (31.8 - 32.0)^2 + (31.8 - 32.0)^2
   S^2 =  -----------------------------------------------------
                                5 - 1

       =  0.18 / 4

so

   S^2 = 0.045                      <--- sample variance
   S   = 0.212132 (approximately)   <--- sample standard deviation

This tells us something about the precision of our thermometer. The sample standard deviation roughly says that a typical reading of our thermometer is about 0.21 degrees away from the sample mean. That could be a large or small standard deviation, depending on what we want to use this thermometer for.

However, the sample standard deviation that we have calculated here is subject to change. If we repeat this experiment and take five more values, we are likely to get a different variance. If we take five thousand readings, we are again likely to get a slightly different variance, but one close to a certain value. If we were to take five million readings, our sample variance would get closer still to that value. In short, the more readings we take, the closer our sample variance (or sample standard deviation) should be to the population variance (or population standard deviation). That is, we can use S^2 to estimate sigma^2, and the estimate should improve as we increase the sample size. We say that S^2 is a consistent estimator of sigma^2.

Sometimes it is possible to know the population variance in advance, if we know the population mean and the distribution of the random variable in question (the random variable in our thermometer example is the reading of the thermometer). Most often, in real life, we know neither of these two, so we use the sample mean and sample variance to make estimates about them. In fact, that's what statistics is all about: making inferences about unknown populations using data collected from samples. Always keep that in mind as you pursue your studies in statistics.
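If you have a computer handy, here is a short Python sketch you can use to check the thermometer calculation yourself. The first half recomputes Xbar, S^2, and S for the five readings directly from the definition of S^2. The second half is only an illustration of consistency, and it rests on an assumption of mine that is not part of the example: it pretends the readings are normally distributed with mu = 32 and sigma = 0.25. The names sample_variance, simulated_sample_variance, MU, and SIGMA are made up for this sketch.

```python
import random
import statistics

# The five thermometer readings from the example (degrees Fahrenheit).
readings = [32.1, 32.3, 32.0, 31.8, 31.8]


def sample_variance(values):
    """S^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


xbar = sum(readings) / len(readings)  # sample mean
s2 = sample_variance(readings)        # sample variance
s = s2 ** 0.5                         # sample standard deviation
print(f"Xbar = {xbar:.4f}, S^2 = {s2:.4f}, S = {s:.4f}")

# Cross-check: the standard library's variance() also divides by n - 1.
assert abs(s2 - statistics.variance(readings)) < 1e-9

# ASSUMPTION (not from the example): model the thermometer as producing
# normally distributed readings with mu = 32 and sigma = 0.25.
MU, SIGMA = 32.0, 0.25
rng = random.Random(0)  # fixed seed so repeated runs give the same numbers


def simulated_sample_variance(n):
    """Draw n simulated readings and return their sample variance."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    return sample_variance(sample)


# Larger samples should give S^2 closer to sigma^2.
for n in (5, 5_000, 500_000):
    s2_n = simulated_sample_variance(n)
    print(f"n = {n:>7}: S^2 = {s2_n:.5f}   (sigma^2 = {SIGMA ** 2})")
```

With only five simulated readings, S^2 can land quite far from sigma^2, while the larger samples should settle near it. Under the assumed model, that is the behavior the word "consistent" describes.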
I hope you found this explanation interesting. If you have any more doubts, would like to talk about this more, or if you have further questions, please write back.

- Doctor Jordi, The Math Forum
  http://mathforum.org/dr.math/
© 1994- The Math Forum at NCTM. All rights reserved.