Topic: Sampling From Finite Population with Replacement
Replies: 28   Last Post: Sep 30, 2010 6:30 AM

 cagdas.ozgenc@gmail.com Posts: 58 Registered: 3/29/06
Re: Sampling From Finite Population with Replacement
Posted: Sep 28, 2010 2:48 PM

> > Here is what I am trying to say. Take any statistics book and you
> > will find a statement that starts something like the following:

>
> > "You have a population of size N with elements normally distributed
> > with Mu and Sigma. If we sample from this population with
> > replacement..." Then they continue calculating population mean and
> > variance, and then claim that Expected Value of Sample mean is equal
> > to Mu.

>
> > The moment I read that, it gives me the creeps. First of all, the
> > normal distribution is a model. Yes, the sample mean will give an
> > unbiased estimate of the population mean (which is a population
> > parameter, not a model parameter). But on average it will not be Mu.
> > Sampling with replacement from a finite population will not give an
> > unbiased estimate of the model parameters. Either I am not reading my
> > books carefully or this issue is somehow swept under the rug.

>
> > In the infinite population case my understanding is that the
> > population parameter and the model parameter will converge. But when
> > we talk about inference do we ever care about the model parameter?

>
> I don't understand your objection.  Have you ever tried it with a
> population small enough so that you can enumerate all possible samples
> of a given size?  E.g., try the following:
>
> 1. Let the population consist of 5 scores:  2, 3, 4, 5, 6
>
> 2. Compute the population mean and SD (with N, not n-1 in the
> denominator).
>
> 3. Draw all possible samples of n=2 (with replacement) from the
> population--there are 25 of them.  For each one, compute the sample
> mean.
>
> 4. Compute the mean and SD of the 25 sample means.  For the SD, use
> N=25 in the denominator, because you have the entire population of
> sample means.
>
> Notice that the mean of the sample means = the population mean; and
> the SD of the sample means = the population SD over the square root
> of the sample size.
>
> --
> Bruce Weaver
> "When all else fails, RTFM."
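
The enumeration suggested in the quoted steps can be sketched in a few lines of Python. This is only an illustration of that exercise: the population (2, 3, 4, 5, 6) and n=2 come from the quoted post, while the helper names `mean` and `pop_sd` are made up here.

```python
from itertools import product
from math import sqrt

population = [2, 3, 4, 5, 6]   # the 5 scores from the quoted example
n = 2                          # sample size from the quoted example

def mean(xs):
    """Arithmetic mean of a sequence."""
    return sum(xs) / len(xs)

def pop_sd(xs):
    """Standard deviation with N (not N-1) in the denominator."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# All ordered samples of size n drawn with replacement: 5**2 = 25 of them.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

pop_mean = mean(population)
pop_sigma = pop_sd(population)
mean_of_means = mean(sample_means)
sd_of_means = pop_sd(sample_means)   # N=25 in the denominator

print("number of samples:     ", len(samples))
print("population mean:       ", pop_mean)
print("mean of sample means:  ", mean_of_means)
print("SD of sample means:    ", sd_of_means)
print("population SD / sqrt(n):", pop_sigma / sqrt(n))
```

Note that everything here is about the population mean and SD of the five listed scores. No model parameter appears anywhere in the calculation.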

Let me try to explain one more time. My questions are usually too deep
down there for me to explain properly.

First of all, I am talking about 3 different things here: model
parameters, population parameters, and sample statistics.

I have no objection to the fact that the population mean can be
estimated by the sample mean in an unbiased way. However, you will find
commonly in textbooks and real-life research that what one is trying to
infer is not the population mean but the model mean (the mean of the
generating process).

For example, take a look at the lecture notes of a stats class at
UC Davis that I just found on the internet (page 5):

http://www.stat.ucdavis.edu/~jie/stat13.winter2010/lec20.pdf

It is trying to show the difference between sampling with replacement
vs sampling without replacement. But that's not the issue here.
There's something else wrong about it.

It starts with the following:

"Suppose the heights of female students entering UC Davis in year 2005
follows a normal distribution, with mean mu and standard deviation
sigma"

First of all, the number of female students entering UC Davis in year
2005 is finite. If this is really our population then there is no way
it can be normally distributed. The normal distribution is a model for
infinite populations. This means that we are actually not looking at a
population but at a sample from a normal distribution.

Now the rest of the problem:

"A random sample of 100 students are taken" from the above so-called
population.

Now we are looking at a sample of a sample. This means that no matter
what you do, you will never find the mean of the normal distribution
(Mu) by repeated sampling. It doesn't matter whether you do it with
replacement or without replacement.

You will end up calculating the mean of the population, which will be
slightly or significantly different from Mu depending on how many
students entered UC Davis in year 2005. This means that the mean of our
sample of 100 students will be an unbiased estimate of the population
mean but a biased estimate of Mu.
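
A rough sketch of this point in Python, for one fixed realized population. All the numbers are hypothetical (MU, SIGMA, the population size N=50, the sample size n=2 and the seed are made up for illustration, not taken from the lecture notes). The population is generated once from a normal model, and then the expected sample mean under sampling with replacement is computed exactly by enumerating every possible sample.

```python
import random
from itertools import product

MU, SIGMA = 165.0, 7.0   # hypothetical model parameters (assumption)
N = 50                   # hypothetical finite population size (assumption)
n = 2                    # small sample size so all samples can be enumerated

# Stage 1: the finite "population" is itself a draw from the normal model.
rng = random.Random(2005)            # fixed seed, chosen arbitrarily
population = [rng.gauss(MU, SIGMA) for _ in range(N)]
pop_mean = sum(population) / N

# Stage 2: sampling with replacement from that fixed population.
# Every ordered sample of size n is equally likely, so averaging the
# sample means over all N**n samples gives the exact expected value.
total = sum(sum(s) / n for s in product(population, repeat=n))
expected_sample_mean = total / N ** n

print("model mean Mu:            ", MU)
print("population mean:          ", pop_mean)
print("expected value of x-bar:  ", expected_sample_mean)
print("x-bar expectation - pop mean:", expected_sample_mean - pop_mean)
print("x-bar expectation - Mu:      ", expected_sample_mean - MU)
```

With the population held fixed, the expected sample mean matches the population mean up to rounding, while its distance from Mu is whatever the first stage happened to produce.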

This is the difference between sampling with replacement from a finite
population and sampling from an infinite population. It seems to me
that this is a chronic problem in stats texts.

I understand that when you have 1000 elements in your population the
difference in the result will be minuscule. Or some could say that it
is just a model and a model is not a 100% reflection of real life
(that's why it is called a model). However, members of a finite
population can well be generated by a normal random variable; there is
nothing wrong with that. The problem arises when we start calling this
a population and start calculating its mean, variance, and confidence
intervals. Here we were trying to capture the essence of the
generating stochastic process (mu, sigma), but we actually ended up
with something else.

To put the final nail in the coffin, let's look at the last sentence
on that page of the lecture notes:

"Then X1,...,X100 are i.i.d. N(mu, sigma) random variables". This is
just NOT TRUE!