
Ask Dr. Math - Questions and Answers from our Archives

Sample and Population Standard Deviation


Date: 03/03/2001 at 11:18:08
From: Lara Brook
Subject: Standard deviation

What letters represent theoretical and 'real' standard deviation, 
mean, and variance?


Date: 03/05/2001 at 10:22:11
From: Doctor Jordi
Subject: Re: Standard deviation

Hello, Lara - thanks for writing to Dr. Math.

"Theoretical" and "real" standard deviation?  I am not very sure what 
you mean, but I am guessing that you are talking about *sample* 
standard deviation and *population* standard deviation. 

The traditional symbol for the sample standard deviation is S. Many 
texts write uppercase S for the statistic viewed as a random quantity 
(its value changes from sample to sample) and lowercase s for the 
particular number computed from one set of data. The corresponding 
Greek letter sigma (which looks like an o with a little tail sticking 
out from the top) is commonly used to denote the population standard 
deviation. Likewise, S^2 is called the sample variance and sigma^2 is 
called the population variance; these two are probably the ones you 
will work with the most. For the mean, the sample mean is written Xbar 
and the population mean is written mu (the Greek letter m). Formally 
speaking, the two variances are defined as follows. 

     S^2 = Sum(i=1 to n)(X_i - Xbar)^2
           ---------------------------
                     n - 1

Here n is the number of values in the sample, X_i denotes the ith 
value in our sample (the ith realization of the random variable X), 
and Xbar, written as an X with a bar above it, denotes the sample mean 
(again, not to be confused with the population mean).
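The definition of S^2 translates almost word for word into code. This is a minimal sketch, not a library routine: the function names `sample_mean` and `sample_variance` and the small data set are made up for illustration. Python's standard `statistics.variance` also divides by n - 1, so it serves as a cross-check.

```python
import statistics

def sample_mean(xs):
    """Xbar: add up the values and divide by how many there are (n)."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """S^2: sum of squared deviations from Xbar, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# A made-up sample, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_mean(data))          # 5.0
print(sample_variance(data))      # 32/7, about 4.5714
print(statistics.variance(data))  # standard library, same n - 1 divisor
```

The division by n - 1 rather than n is exactly what distinguishes this from a plain average of the squared deviations.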

     sigma^2 = E((X - mu)^2)

Here E denotes the expected value, which you may or may not have 
encountered already. Roughly, E(...) tells you the value that the 
expression inside the parentheses takes "on average" over the whole 
population. For example, the expected value of the random variable X, 
E(X), is the value we expect this variable to take on average, which 
is nothing more than the population mean. In fact, mu is defined to be 
E(X).
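To make sigma^2 = E((X - mu)^2) concrete, here is a minimal sketch for a random variable that takes only finitely many values, where the expected value is simply a probability-weighted sum. The fair six-sided die is a hypothetical example chosen for illustration, and the helper names are made up; exact fractions avoid any rounding.

```python
from fractions import Fraction

def expected_value(dist, g=lambda x: x):
    """E(g(X)) for a discrete distribution given as {value: probability}."""
    return sum(p * g(x) for x, p in dist.items())

def population_variance(dist):
    """sigma^2 = E((X - mu)^2), where mu = E(X)."""
    mu = expected_value(dist)
    return expected_value(dist, lambda x: (x - mu) ** 2)

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))       # mu = 7/2
print(population_variance(die))  # sigma^2 = 35/12
```

Here no sample is involved at all: because the whole distribution is known, mu and sigma^2 come out exactly.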

Be careful to distinguish between the population and sample 
variances (or standard deviations), as they have different 
definitions. You have to understand the difference between a sample 
and the population it was drawn from. For example, say you are using a
thermometer to measure the freezing point of water. Say that this 
thermometer can measure very small changes in temperature, but that it 
does not always measure the same temperature in the same way; it can 
be a little off to one side or to the other. Say you have taken five 
measurements of the freezing point of water with this thermometer, 
which were the following (in Fahrenheit):

     32.1 32.3 32.0 31.8 31.8

In this setup, the sample is our five numbers above, and the 
population is the abstract infinity of all possible values our 
thermometer can display for the freezing point of water. You can take 
the average of these five values in our sample, which we will now call 
the sample mean instead of the average, and you will find that it is 
Xbar = 32.0 (just add all values, which gives 160.0, and divide by 5). 

Now, we know that the freezing point of water should be 32 degrees 
Fahrenheit; in fact, our thermometer was probably calibrated to read 
32 degrees for freezing water. Our sample mean happens to land exactly 
on 32, but that is partly luck: the individual readings range from 
31.8 to 32.3, and another five readings would most likely give a mean 
slightly above or below 32. If that happened, could we conclude from 
our experiment that the freezing point of water is not 32 degrees? No, 
because the sample mean need not be equal to the population mean. The 
population mean is mu = 32 degrees. In fact, if we were to take many, 
many more readings (say, 1000), would you expect our sample mean (the 
average of the readings) to get closer to or farther away from 32, the 
population mean?

The assertion that we expect the sample mean to get closer and closer 
to the population mean as the sample size gets larger and larger is 
called the Law of Large Numbers. It is a very intuitively pleasing 
statement, and it can be proven using a few assumptions from 
probability theory, but I will not go into that right now.

Let's go back to our sample of thermometer readings. What is the 
sample variance, S^2? Just by using the definition of S^2, we find 
that

     S^2 = (32.1 - 32.0)^2 + (32.3 - 32.0)^2 + (32.0 - 32.0)^2 + 
           (31.8 - 32.0)^2 + (31.8 - 32.0)^2
           --------------------------------------------------------
                     5 - 1 

         = (0.01 + 0.09 + 0.00 + 0.04 + 0.04) / 4  =  0.18 / 4

  so S^2 = 0.045        <------------------ sample variance

     S   = 0.212132 (approximately) <------ sample standard deviation

This tells us something about the accuracy of our thermometer. The 
sample standard deviation roughly says that a typical reading from our 
thermometer will be about 0.21 degrees off from the 'true' value. That could
be a large or small standard deviation, depending on what we want to 
use this thermometer for. However, the sample standard deviation that 
we have calculated here is subject to change. If we repeat this 
experiment and take five more readings, we are likely to get a 
different variance. If we take five thousand readings, we are again 
likely to get a slightly different variance, but one close to a 
certain value. If we were to take five million readings, our sample 
variance would get closer still to that value. In short, the more 
readings we take, the closer our sample variance (or sample standard 
deviation) should be to the population variance (or population 
standard deviation). That is, we can use S^2 to estimate sigma^2, and 
the estimate should get better as we increase the sample size. We say 
that S^2 is a consistent estimator of sigma^2.
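The hand calculation for the five thermometer readings can be checked exactly in a few lines. This sketch uses Python's `decimal` module so that readings like 32.1 are stored exactly, as on paper, and then cross-checks with the standard library's `statistics.mean`, `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor.

```python
import statistics
from decimal import Decimal

# The five thermometer readings from the example (degrees Fahrenheit),
# stored as exact decimals so the arithmetic matches a hand calculation.
readings = [Decimal(s) for s in ["32.1", "32.3", "32.0", "31.8", "31.8"]]
n = len(readings)

xbar = sum(readings) / n                            # sample mean Xbar
squared_devs = [(x - xbar) ** 2 for x in readings]  # (X_i - Xbar)^2, term by term
s2 = sum(squared_devs) / (n - 1)                    # sample variance S^2
s = s2.sqrt()                                       # sample standard deviation S

print("Xbar =", xbar)
print("S^2  =", s2)
print("S    =", round(s, 6))

# Cross-check with the standard library (same n - 1 divisor).
print(statistics.mean(readings), statistics.variance(readings), statistics.stdev(readings))
```

Rerunning it with five different readings would give a different S^2, which is exactly the point made above: S^2 is itself a quantity that varies from sample to sample.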

Sometimes it is possible to know the population variance in advance, 
if we know the population mean and the distribution of the random 
variable in question (the random variable in our thermometer example 
is the reading of the thermometer). Most often, in real life, we know 
neither of these two, so we use the sample mean and sample variance to 
make estimates about them.
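In a simulation we get to choose the population ourselves, so mu and sigma^2 are known in advance and we can watch the sample estimates settle toward them as the sample size grows (the Law of Large Numbers for Xbar, consistency for S^2). The sketch below assumes, purely for illustration, that each reading is 32 plus independent normally distributed error with standard deviation 0.2 (so sigma^2 = 0.04); this error model and the helper name `simulate` are hypothetical, not taken from the thermometer example.

```python
import random
import statistics

MU, SIGMA = 32.0, 0.2  # hypothetical population mean and standard deviation

def simulate(n, seed=0):
    """Return (Xbar, S^2) for n simulated readings under the assumed normal model."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    readings = [rng.gauss(MU, SIGMA) for _ in range(n)]
    return statistics.fmean(readings), statistics.variance(readings)

print(f"population: mu = {MU}, sigma^2 = {SIGMA ** 2:.4f}")
for n in (5, 100, 10_000, 100_000):
    xbar, s2 = simulate(n)
    print(f"n = {n:>7,}: Xbar = {xbar:.4f}, S^2 = {s2:.4f}")
```

The estimates need not improve at every single step, since each row is a different random sample, but the spread around mu and sigma^2 tends to shrink as n grows.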

In fact, that's what statistics is all about: making inferences about 
unknown populations using data collected from samples. Always keep 
that in mind as you pursue your studies in statistics.

I hope you found this explanation interesting. If you have any more 
doubts, would like to talk about this more, or if you have further 
questions, please write back.

- Doctor Jordi, The Math Forum
  http://mathforum.org/dr.math/   
    


Ask Dr. Math™
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/