Sample and Population Standard Deviation

Date: 03/03/2001 at 11:18:08
From: Lara Brook
Subject: Standard deviation

What letters represent theoretical and 'real' standard deviation, mean, and variance?
Date: 03/05/2001 at 10:22:11
From: Doctor Jordi
Subject: Re: Standard deviation
Hello, Lara - thanks for writing to Dr. Math.
"Theoretical" and "real" standard deviation? I am not very sure what
you mean, but I am guessing that you are talking about *sample*
standard deviation and *population* standard deviation.
The traditional symbol for the sample standard deviation is S
(lowercase or uppercase; there is a slight difference between the two)
and the equivalent Greek letter sigma (which looks like an o with a
little tail sticking out from the top) is commonly used to denote the
population standard deviation. Also recall that S^2 is called the
sample variance and sigma^2 is called the population variance; these
two are probably the ones you will work with the most. Formally
speaking, their mathematical definitions are as follows.
         Sum(i=1 to n) (X_i - Xbar)^2
   S^2 = ----------------------------
                    n - 1
Where X_i denotes the ith value in our sample, the ith realization of
the random variable X, and Xbar, written as an X with a bar above it,
denotes the sample mean (again, not to be confused with the population
mean).
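As a rough sketch of the S^2 definition above, here is how it might be computed in Python. The function name sample_variance and the data list are my own illustrative choices, not part of the original question; the standard library's statistics.variance uses the same n - 1 divisor and is printed alongside for comparison.

```python
import statistics


def sample_variance(xs):
    """Sample variance S^2: squared deviations from Xbar, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values to compute S^2")
    xbar = sum(xs) / n  # the sample mean, Xbar
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Illustrative data only (an assumption, not taken from the question).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s2 = sample_variance(data)
print(s2, statistics.variance(data))  # the two values should agree
```

Dividing by n - 1 rather than n is what separates this from the "average squared deviation" you might first think of.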
sigma^2 = E((X - mu)^2)
Where E denotes the expected value function, which you may or may not
have encountered already. Roughly, the expected value function tells
you the value "on average" that we would expect the expression sent as
input to the function to take. For example, the expected value of the
random variable X, E(X), would be the value we expect this variable to
take on average, which is nothing more than the population mean. In
fact, mu is defined to be E(X).
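To make the expected value idea concrete, here is a small sketch for a random variable whose distribution we know completely. The fair six-sided die is my own assumed example (it is not in the question), and expected_value is a hypothetical helper name. Exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)


def expected_value(g):
    """E(g(X)) for the die: sum of g(x) times P(X = x) over all faces."""
    return sum(g(x) * prob for x in faces)


mu = expected_value(lambda x: x)                   # population mean, E(X)
sigma2 = expected_value(lambda x: (x - mu) ** 2)   # population variance, E((X - mu)^2)

print(mu, sigma2)  # prints 7/2 35/12
```

Notice that no sample is involved here: mu and sigma^2 come straight from the distribution.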
Be careful to distinguish the population variance from the sample
variance (and likewise the standard deviations), as they have different
definitions. You have to realize the difference between a sample and
the population it was drawn from. For example, say you are using a
thermometer to measure the freezing point of water. Say that this
thermometer can measure very small changes in temperature, but that it
does not always measure the same temperature in the same way; it can
be a little off to one side or to the other. Say you have taken five
measurements of the freezing point of water with this thermometer,
which were the following (in Fahrenheit):
32.1 32.3 32.0 31.8 31.8
In this setup, the sample is our five numbers above, and the
population is the abstract infinity of all possible values our
thermometer can display for the freezing point of water. You can take
the average of these five values in our sample, which we will now call
the sample mean instead of average, and you will find that it is Xbar
= 32.0 (just add all values, which gives 160.0, and divide by 5).
Now, we know that the freezing point of water should be 32 degrees
Fahrenheit; in fact, our thermometer was probably calibrated to read
32 degrees for freezing water. Here the sample mean happens to agree
with that value, but that is luck: if the last reading had been 31.9
instead of 31.8, the sample mean would have been 32.02. Could we then
conclude that the freezing point of water is not 32 degrees? No,
because a sample mean need not be equal to the population mean. The
population mean is mu = 32 degrees. If we were to take many, many more
readings (say, 1000), would you expect our sample mean (the average of
the readings) to get closer to or farther away from 32, the population
mean?
The assertion that we expect the sample mean to get closer and closer
to the population mean as the sample size gets larger and larger is
called the Law of Large Numbers. It is a very intuitively pleasing
statement, and it can be proven using a few assumptions from
probability theory, but I will not go into that right now.
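If you would like to see the Law of Large Numbers at work, a small simulation can help. This is only a sketch under assumptions of my own: I pretend the thermometer's readings are normally distributed around mu = 32 with a standard deviation of 0.2, neither of which comes from the question. The name sample_mean is likewise just illustrative.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
MU, SIGMA = 32.0, 0.2    # assumed population mean and standard deviation


def sample_mean(n):
    """Average of n simulated thermometer readings."""
    return sum(rng.gauss(MU, SIGMA) for _ in range(n)) / n


means = {n: sample_mean(n) for n in (5, 1000, 100_000)}
for n, m in means.items():
    print(f"n = {n:>7}: sample mean = {m:.4f}, distance from mu = {abs(m - MU):.4f}")
```

In any single run the distance need not shrink at every step, but for large n it should be very small.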
Let's go back to our sample of thermometer readings. What is the
sample variance, S^2? Using the sample mean Xbar = 32.0 (the five
readings add up to 160.0) and the definition of S^2, we find that
         (32.1 - 32.0)^2 + (32.3 - 32.0)^2 + (32.0 - 32.0)^2 +
         (31.8 - 32.0)^2 + (31.8 - 32.0)^2
   S^2 = --------------------------------------------------------
                              5 - 1
         0.01 + 0.09 + 0 + 0.04 + 0.04
       = -----------------------------
                       4
so S^2 = 0.045 <------------------ sample variance
S = 0.212132 (approximately) <------ sample standard deviation
This tells us something about the accuracy of our thermometer. The
sample standard deviation roughly says that on average, our
thermometer will be about 0.21 off from the 'true' value. That could
be a large or small standard deviation, depending on what we want the
uses of this thermometer to be. However, the sample standard deviation
that we have calculated here is subject to change. If we repeat this
experiment and take five more values, we are likely to get a different
variance. If we take five thousand readings, we are again likely to
get a slightly different variance, but one close to a certain value. If
we were to take five million readings, our sample variance would get
closer still to that value. In short, the more readings we take, the
closer our sample variance (or sample standard deviation) should be to
the population variance (or population standard deviation). That is,
we can use S^2 to estimate sigma^2, and the estimate should get better
as we increase the sample size.
We say that S^2 is a consistent estimator of sigma^2.
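Both the hand computation and the consistency claim can be checked with a short script. The first part uses the five readings from the example: they add up to 160.0, so the sample mean is 32.0, the sample variance is about 0.045, and the sample standard deviation is about 0.212. The second part is a sketch under my own assumption that readings are normally distributed with sigma = 0.2, so sigma^2 = 0.04; I stop at 200,000 simulated readings rather than five million just to keep it quick.

```python
import math
import random


def sample_variance(xs):
    """Sample variance S^2 with the n - 1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Part 1: the five thermometer readings from the example.
readings = [32.1, 32.3, 32.0, 31.8, 31.8]
xbar = sum(readings) / len(readings)
s2 = sample_variance(readings)
s = math.sqrt(s2)
print(f"Xbar = {xbar:.2f}, S^2 = {s2:.3f}, S = {s:.6f}")

# Part 2: simulated readings (assumed normal, sigma = 0.2, so sigma^2 = 0.04).
rng = random.Random(1)   # fixed seed so the run is repeatable
SIGMA = 0.2
estimates = {}
for n in (5, 5_000, 200_000):
    draws = [rng.gauss(32.0, SIGMA) for _ in range(n)]
    estimates[n] = sample_variance(draws)
    print(f"n = {n:>7}: S^2 = {estimates[n]:.5f} (sigma^2 = {SIGMA ** 2:.5f})")
```

With only five readings S^2 can land quite far from sigma^2, while the largest sample should sit very close to it.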
Sometimes it is possible in advance to know the population variance if
we know the population mean and the distribution of the random
variable in question (our random variable in our thermometer example
was the reading of the thermometer). Most often, in real life, we know
neither of these two, so we use the sample mean and sample variance to
estimate the population mean and variance.
In fact, that's what statistics is all about: making inferences about
unknown populations using data collected from samples. Always keep
that in mind as you pursue your studies in statistics.
I hope you found this explanation interesting. If you have any more
doubts, would like to talk about this more, or if you have further
questions, please write back.
- Doctor Jordi, The Math Forum
http://mathforum.org/dr.math/
© 1994-2013 The Math Forum