
Re: Sampling From Finite Population with Replacement
Posted: Sep 28, 2010 2:48 PM


> > Here is what I am trying to say. Take any statistics book and you
> > will find a statement that starts something like the following:
> >
> > "You have a population of size N with elements normally distributed
> > with Mu and Sigma. If we sample from this population with
> > replacement..." Then they continue calculating population mean and
> > variance, and then claim that Expected Value of Sample mean is equal
> > to Mu.
> >
> > The moment I read that, it gives me the creeps. First of all, the
> > normal distribution is a model. Yes, the sample mean will give an
> > unbiased estimate of the population mean (which is a population
> > parameter, not a model parameter). But on average it will not be Mu.
> > Sampling with replacement from a finite population will not give an
> > unbiased estimate of the model parameters. Either I am not reading
> > my books carefully or this issue is somehow swept under the rug.
> >
> > In the infinite population case my understanding is that the
> > population parameter and the model parameter will converge. But when
> > we talk about inference do we ever care about the model parameter?
>
> I don't understand your objection. Have you ever tried it with a
> population small enough so that you can enumerate all possible samples
> of a given size? E.g., try the following:
>
> 1. Let the population consist of 5 scores: 2, 3, 4, 5, 6
>
> 2. Compute the population mean and SD (with N, not n-1, in the
> denominator).
>
> 3. Draw all possible samples of n=2 (with replacement) from the
> population; there are 25 of them. For each one, compute the sample
> mean.
>
> 4. Compute the mean and SD of the 25 sample means. For the SD, use
> N=25 in the denominator, because you have the entire population of
> sample means.
>
> Notice that the mean of the sample means = the population mean; and
> the SD of the sample means = the population SD over the square root
> of the sample size.
>
> Bruce Weaver
> bwea...@lakeheadu.ca
> http://sites.google.com/a/lakeheadu.ca/bweaver/Home
> "When all else fails, RTFM."

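For reference, the enumeration in the quoted steps can be reproduced with a short script. This is only a sketch in Python (my choice of language; the helper names `mean` and `pop_sd` are made up for illustration, they are not from the quoted post):

```python
from itertools import product
from math import sqrt

population = [2, 3, 4, 5, 6]   # step 1: the five scores
n = 2                          # sample size used in step 3


def mean(xs):
    """Arithmetic mean of a sequence."""
    return sum(xs) / len(xs)


def pop_sd(xs):
    """Standard deviation with len(xs) in the denominator (not len(xs) - 1)."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


# Step 2: population mean and SD.
pop_mean = mean(population)
pop_sigma = pop_sd(population)

# Step 3: all ordered samples of size n drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Step 4: mean and SD of the sample means, treating them as a full population.
mean_of_means = mean(sample_means)
sd_of_means = pop_sd(sample_means)

print("number of samples:", len(samples))
print("population mean:", pop_mean, " mean of sample means:", mean_of_means)
print("population SD / sqrt(n):", pop_sigma / sqrt(n), " SD of sample means:", sd_of_means)
```

This confirms the two equalities stated in the quote, but note that both of them are about the population of five scores, not about any model that might have generated those scores.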
Let me try to explain one more time. My questions are usually too deep down there for me to explain properly.
First of all, I am talking about 3 different things here: model parameters, population parameters, and sample statistics.
I have no objection to the fact that the sample mean is an unbiased estimate of the population mean. However, in textbooks and real-life research you will commonly find that what is being inferred is not the population mean but the model mean (that is, a parameter of the generating process).
For example, take a look at the lecture notes of a stats class at UC Davis that I just found on the internet (page 5):
http://www.stat.ucdavis.edu/~jie/stat13.winter2010/lec20.pdf
It is trying to show the difference between sampling with replacement vs sampling without replacement. But that's not the issue here. There's something else wrong about it.
It starts with the following:
"Suppose the heights of female students entering UC Davis in year 2005 follows a normal distribution, with mean mu and standard deviation sigma"
First of all, the number of female students entering UC Davis in 2005 is finite. If this is really our population, then it cannot be exactly normally distributed: the normal distribution is continuous and unbounded, a model for an infinite population. This means that we are not actually looking at a population; we are looking at a sample drawn from a normal distribution.
Now the rest of the problem:
"A random sample of 100 students are taken" from the above so called population.
Now we are looking at a sample of a sample. This means that no matter what you do, you will never recover the mean of the normal distribution (Mu) by repeated sampling from those students. It doesn't matter whether you do it with replacement or without replacement.
You will end up calculating the mean of the population, which will be slightly or significantly different from Mu depending on how many students entered UC Davis in 2005. This means that the mean of our sample of 100 students will be an unbiased estimate of the population mean but, for that fixed set of students, a biased estimate of Mu.
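Here is a small sketch of what I mean, in Python. Everything in it is an assumption made for illustration: the values of `MU` and `SIGMA`, the population size `N`, and the seed are all invented, and I use samples of size 2 only so that every possible sample can be enumerated exactly instead of simulated.

```python
import random
from itertools import product

MU, SIGMA = 165.0, 7.0   # hypothetical model parameters (not from the lecture notes)
N = 30                   # hypothetical size of the finite population
random.seed(0)           # arbitrary seed, only to make the run repeatable

# The "population": N values generated once by the normal model, then held fixed.
population = [random.gauss(MU, SIGMA) for _ in range(N)]
pop_mean = sum(population) / N

# Exact expected value of the sample mean for samples of size n drawn with
# replacement from this fixed population: average over all N**n equally
# likely ordered samples.
n = 2
all_means = [sum(s) / n for s in product(population, repeat=n)]
expected_sample_mean = sum(all_means) / len(all_means)

print("model mean Mu:           ", MU)
print("population mean:         ", pop_mean)
print("E[sample mean | this pop]:", expected_sample_mean)
print("gap between E[sample mean] and Mu:", expected_sample_mean - MU)
```

The expected sample mean matches the population mean up to rounding, and the gap to Mu is whatever gap the generated population happens to have. The gap exists once the population is held fixed; averaged over repeated generations of the population it would average out to zero, which I suspect is what the textbooks implicitly rely on.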
This is the difference between sampling with replacement from a finite population and sampling from an infinite population. It seems to me that this is a chronic problem in stats texts.
I understand that when you have 1000 elements in your population the difference in the result will be minuscule. Or some could say that it is just a model, and a model is never a 100% reflection of real life (that's why it is called a model). Moreover, the members of a finite population may well be generated by a normal random variable; there is nothing wrong with that. The problem arises when we start calling this a population and start calculating its mean, variance, and confidence intervals. We were trying to capture the essence of the generating stochastic process (mu, sigma), but we actually ended up with something else.
To put the final nail in the coffin, let's look at the last sentence on that page of the lecture notes:
"Then X1,...,X100 are i.i.d. N(mu, sigma) random variables". This is just NOT TRUE!

