
Re: Sampling From Finite Population with Replacement
Posted: Sep 28, 2010 6:38 PM


On Tue, 28 Sep 2010 11:48:47 -0700 (PDT), Cagdas Ozgenc <cagdas.ozgenc@gmail.com> wrote:
My understanding of your problem differs from what Bruce is arguing.
[snip, previous posts]
>
>Let me try to explain one more time. My questions are usually too deep
>down there for me to explain properly.
>
>First of all I am talking about 3 diferent things here: model
>parameters, population parameters, sample statistics
>
>I have no objection to the fact that population mean can be calculated
>by sample mean in an unbiased way. However you will find commonly in
>text books and real life research that what's trying to be inferred is
>not the population mean but the model mean (or generating process).
>
>For example take a look at the lecture notes of a stats class in
>UCDavis that I just found on the internet (page 5):
>
>http://www.stat.ucdavis.edu/~jie/stat13.winter2010/lec20.pdf
>
>It is trying to show the difference between sampling with replacement
>vs sampling without replacement. But that's not the issue here.
>There's something else wrong about it.
>
>Starts with the following:
>
>"Suppose the heights of female students entering UC Davis in year 2005
>follows a normal distribution, with mean mu and standard deviation
>sigma"
I agree that the above is ambiguous if you really want to press the point. It uses mu and sigma, which describe populations. It does not state whether the population is the class of 2005, or something wider that would be more useful for generalization.
>
>First of all number of female students entering UC Davis in year 2005
>is finite. If this is really our population then there is no way it
First, that is not necessarily the population. The class of 2005 is not described here, necessarily, as other than a sample from an infinite population with those (assumed) parameters.
On the other hand, I see no need at all that a sample drawn with a particular distribution needs to be infinite. A single observation can be "drawn from a distribution."
And if the class of 2005 is the "population", then "normal" is a description which is assumed, or presumed to be close enough for the purposes of some problem.
>can be normally distributed. Normal distribution is a model for
>infinite populations. This means that actually we are not looking at a
>population but we are looking at a sampling from normal distribution.
>
>Now the rest of the problem:
>
>"A random sample of 100 students are taken" from the above so called
>population.
>
>Now we are looking at a sample of a sample.
The terminology of "population" versus "sample" is not particularly illuminating unless one is going to talk about the finite sampling correction, or otherwise make use of the limitation of the total N. This could be a sample from a sample, or a sample from a population.
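To make the finite sampling correction concrete, here is a small sketch (not taken from the lecture notes). It enumerates every equally likely sample of size n from a tiny population and compares the exact variance of the sample mean with and without replacement. The five heights and the helper name var_of_sample_mean are made up for illustration.

```python
from itertools import combinations, product
from statistics import mean, pvariance


def var_of_sample_mean(population, n, replace):
    """Exact variance of the sample mean over all equally likely samples."""
    if replace:
        # Ordered draws with replacement: N**n equally likely tuples.
        draws = product(population, repeat=n)
    else:
        # Simple random sampling without replacement: every subset of
        # size n is equally likely.
        draws = combinations(population, n)
    means = [mean(s) for s in draws]
    return pvariance(means)


# Hypothetical finite population of heights in cm (assumption).
population = [150.0, 160.0, 165.0, 170.0, 180.0]
N, n = len(population), 2

sigma2 = pvariance(population)   # population variance, divisor N
fpc = (N - n) / (N - 1)          # finite population correction

with_repl = var_of_sample_mean(population, n, replace=True)
without_repl = var_of_sample_mean(population, n, replace=False)

print("with replacement:   ", with_repl, "vs sigma^2/n =", sigma2 / n)
print("without replacement:", without_repl, "vs sigma^2/n * fpc =", sigma2 / n * fpc)
```

With replacement the variance matches sigma^2/n; without replacement it is smaller by the factor (N - n)/(N - 1). That factor is the only place where the finiteness of N enters.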
> This means that no matter
>what you do, you will never find the mean of the normal distribution
>(Mu) by repeated sampling. It doesn't matter whether you do it with
>replacement or without replacement.
So the mean is wrong. It does not pretend to be an estimate with zero error, only with zero bias when the whole procedure is applied many times. As I tried to emphasize in my previous reply, "zero bias" does not say that the estimate has zero variance. For the normal, it does not even have the least possible variance.
>
>You will end up calculating the mean of the population, which will be
>slightly or significantly different from Mu depending on how many
>students entered UC Davis in year 2005.
Huh? If 2005 *is* the "population", how could "the mean of the population" be "significantly different from Mu"?
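The two readings of "population" can be put side by side in a short simulation. This is only a sketch under stated assumptions: MU, SIGMA, the class size of 50, the sample size of 10, and the helper names are all hypothetical, and the class is assumed to be generated from a normal model. Reading 1 holds one generated class fixed and resamples from it with replacement. Reading 2 regenerates the class from the model before every sample.

```python
import random
from statistics import fmean

MU, SIGMA = 165.0, 7.0       # hypothetical model parameters (cm)
N_POP, N_SAMPLE = 50, 10     # hypothetical class size and sample size
rng = random.Random(2010)    # fixed seed so the run is repeatable


def make_population():
    """Generate one finite class from the assumed normal model."""
    return [rng.gauss(MU, SIGMA) for _ in range(N_POP)]


def sample_mean(population):
    """Mean of a sample drawn with replacement from the given class."""
    return fmean(rng.choices(population, k=N_SAMPLE))


# Reading 1: the class is fixed; only the sampling is repeated.
fixed_pop = make_population()
fixed_pop_mean = fmean(fixed_pop)
avg_fixed = fmean([sample_mean(fixed_pop) for _ in range(20000)])

# Reading 2: the class is itself a fresh draw from the model each time.
avg_model = fmean([sample_mean(make_population()) for _ in range(5000)])

print("mean of the fixed class:          ", fixed_pop_mean)
print("average sample mean, fixed class: ", avg_fixed)
print("average sample mean, class redrawn:", avg_model, "(MU =", MU, ")")
```

Under Reading 1 the sample means average out to the mean of that particular class, whatever it happens to be. Under Reading 2 they average out to MU. Which of the two counts as "the population mean" depends on which population the problem statement has in mind.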
> This means that our samples of
>100 students will be an unbiased estimate of the population mean but a
>biased estimate of Mu.
"Population mean" is Mu, by conventional definition. So, I can't see how you can accept that the samples of 100 are unbiased in estimating one but not the other.
>
>This is the difference between sampling with replacement from a finite
>population and sampling from an infinite population. It seems to me
>that this is a chronical problem in stat texts.
You've lost me for good.
>
>I understand that when you have 1000 elements in your population the
>difference in the result will be miniscule. Or some could say that it
>is just a model and a model is not 100% reflection of real life
By "model", do you refer to the "sample"?
>(that's why it is called a model). However, members of a finite
>population can well be generated by a normal random variable, there is
>nothing wrong with that. The problem arises when we start calling this
>a population and start calculating mean, variance, and confidence
>intervals. Here we were trying to capture the essence of the
>generating stochastic process (mu, sigma), but we actually ended up
>with something else.
>
>The put the final nail in the coffin, let's look at the last sentence
>on that page of lecture notes:
>
>"Then X1,...,X100 are i.i.d. N(mu, sigma) random variables". This is
>just NOT TRUE!
If X1, ..., X100 are individuals sampled, I thought that was the starting point. I don't see it as a conclusion.
 Rich Ulrich

