Topic: Sampling From Finite Population with Replacement
Replies: 28   Last Post: Sep 30, 2010 6:30 AM

 Bruce Weaver Posts: 753 Registered: 12/18/04
Re: Sampling From Finite Population with Replacement
Posted: Sep 28, 2010 5:49 PM

On Sep 28, 2:48 pm, Cagdas Ozgenc <cagdas.ozg...@gmail.com> wrote:

>
> Let me try to explain one more time. My questions are usually too deep
> down there for me to explain properly.
>
> First of all, I am talking about 3 different things here: model
> parameters, population parameters, and sample statistics.
>
> I have no objection to the fact that the population mean can be
> estimated by the sample mean in an unbiased way. However, you will find
> commonly in textbooks and real-life research that what is being inferred
> is not the population mean but the model mean (or generating process).
>
> For example take a look at the lecture notes of a stats class in
> UCDavis that I just found on the internet (page 5):
>
> http://www.stat.ucdavis.edu/~jie/stat13.winter2010/lec20.pdf
>
> It is trying to show the difference between sampling with replacement
> vs sampling without replacement. But that's not the issue here.
> There's something else wrong about it.
>
> Starts with the following:
>
> "Suppose the heights of female students entering UC Davis in year 2005
> follows a normal distribution, with mean mu and standard deviation
> sigma"
>
> First of all, the number of female students entering UC Davis in year
> 2005 is finite. If this is really our population then there is no way
> it can be normally distributed.

I agree. But nothing else is truly normal either, at least if you're
working with real (rather than simulated) data. George Box summed it
up pretty nicely as follows:

"...the statistician knows...that in nature there never was a normal
distribution, there never was a straight line, yet with normal and
linear assumptions, known to be false, he can often derive results
which match, to a useful approximation, those found in the real
world." (JASA, 1976, Vol. 71, 791-799)

> Normal distribution is a model for
> infinite populations.

This strikes me as too restrictive. I think the normal distribution
can also serve as a fairly decent model for finite populations,
provided they are large enough. According to the website given below,
UC Davis had a little over 5,000 freshman students in 2005. If even
half were female, the normal distribution described might not be too
bad as a *model* for the height distribution of incoming female
students.
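
To put a rough number on "not too bad", here is a back-of-the-envelope
sketch. It assumes (my assumption, not something from the lecture notes)
that the N population heights are themselves i.i.d. draws from
N(mu, sigma), and it takes N = 2500 from the "half of 5,000" guess above.
The symbols \bar{x}_N and s_N are just my labels for the finite
population's mean and standard deviation.

```latex
% Finite population x_1, ..., x_N generated i.i.d. from N(\mu, \sigma^2):
\bar{x}_N = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\bar{x}_N - \mu \sim N\!\left(0, \frac{\sigma^2}{N}\right).

% Sample of size n drawn with replacement from that fixed population:
E\!\left[\bar{X} \mid x_1,\dots,x_N\right] = \bar{x}_N ,
\qquad
\operatorname{SD}\!\left[\bar{X} \mid x_1,\dots,x_N\right]
  = \frac{s_N}{\sqrt{n}} \approx \frac{\sigma}{\sqrt{n}} .

% Typical gap between population mean and \mu, relative to the standard error:
\frac{\sigma/\sqrt{N}}{\sigma/\sqrt{n}} = \sqrt{\frac{n}{N}}
  = \sqrt{\frac{100}{2500}} = 0.2 .
```

So under these assumptions the typical discrepancy between the population
mean and mu is about one fifth of the standard error of a sample mean
based on 100 students.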

> This means that actually we are not looking at a
> population but we are looking at a sampling from normal distribution.
>
> Now the rest of the problem:
>
> "A random sample of 100 students are taken" from the above so called
> population.
>
> Now we are looking at a sample of a sample. This means that no matter
> what you do, you will never find the mean of the normal distribution
> (Mu) by repeated sampling. It doesn't matter whether you do it with
> replacement or without replacement.
>
> You will end up calculating the mean of the population, which will be
> slightly or significantly different from Mu depending on how many
> students entered UC Davis in year 2005. This means that our samples of
> 100 students will be an unbiased estimate of the population mean but a
> biased estimate of Mu.
>
> This is the difference between sampling with replacement from a finite
> population and sampling from an infinite population. It seems to me
> that this is a chronic problem in stat texts.
>
> I understand that when you have 1000 elements in your population the
> difference in the result will be minuscule. Or some could say that it
> is just a model and a model is not a 100% reflection of real life
> (that's why it is called a model).

Or as Box said,

"All models are wrong. Some are useful."

> However, members of a finite
> population can well be generated by a normal random variable, there is
> nothing wrong with that. The problem arises when we start calling this
> a population and start calculating mean, variance, and confidence
> intervals. Here we were trying to capture the essence of the
> generating stochastic process (mu, sigma), but we actually ended up
> with something else.
>
> To put the final nail in the coffin, let's look at the last sentence
> on that page of lecture notes:
>
> "Then X1,...,X100 are i.i.d. N(mu, sigma) random variables". This is
> just NOT TRUE!

No, strictly speaking it's not. But I think the question is whether the
approximation is close enough to be useful under the circumstances.
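
One way to get a feel for how close is a small simulation. The sketch
below is only illustrative: the values mu = 165, sigma = 7 (cm) and a
population of 2500 are made-up numbers of mine, and the function name
simulate is hypothetical. It generates one finite population from
N(mu, sigma), then repeatedly draws samples of 100 with replacement from
that fixed population.

```python
import random
import statistics


def simulate(mu=165.0, sigma=7.0, pop_size=2500, n=100, reps=2000, seed=0):
    """Return (population mean, average of sample means, SD of sample means).

    mu, sigma and pop_size are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    # The finite population: generated once from N(mu, sigma), then held fixed.
    population = [rng.gauss(mu, sigma) for _ in range(pop_size)]
    pop_mean = statistics.fmean(population)
    # Repeated samples of size n, drawn WITH replacement from that population.
    sample_means = [
        statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)
    ]
    return pop_mean, statistics.fmean(sample_means), statistics.stdev(sample_means)


pop_mean, avg_sample_mean, se = simulate()
print(f"model mean mu           : {165.0:.3f}")
print(f"finite population mean  : {pop_mean:.3f}")
print(f"average of sample means : {avg_sample_mean:.3f}")
print(f"SD of sample means      : {se:.3f}")
```

The average of the sample means should settle on the finite population
mean rather than on mu, which is the original poster's point. The gap
between the population mean and mu should nevertheless be small next to
the sample-to-sample spread, which is mine.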

HTH.
--
Bruce Weaver