Re: Sampling From Finite Population with Replacement
Posted: Sep 28, 2010 11:49 PM
On 29 Sep, 01:49, Bruce Weaver <bwea...@lakeheadu.ca> wrote:
> On Sep 28, 2:48 pm, Cagdas Ozgenc <cagdas.ozg...@gmail.com> wrote:
>
> > Let me try to explain one more time. My questions are usually too deep down there for me to explain properly.
> >
> > First of all, I am talking about 3 different things here: model parameters, population parameters, and sample statistics.
> >
> > I have no objection to the fact that the population mean can be estimated by the sample mean in an unbiased way. However, you will commonly find in textbooks and real-life research that what is being inferred is not the population mean but the model mean (or the mean of the generating process).
> >
> > For example, take a look at the lecture notes of a stats class at UC Davis that I just found on the internet (page 5):
> >
> > http://www.stat.ucdavis.edu/~jie/stat13.winter2010/lec20.pdf
> >
> > It is trying to show the difference between sampling with replacement and sampling without replacement. But that's not the issue here. There's something else wrong with it.
> >
> > It starts with the following:
> >
> > "Suppose the heights of female students entering UC Davis in year 2005 follows a normal distribution, with mean mu and standard deviation sigma"
> >
> > First of all, the number of female students entering UC Davis in year 2005 is finite. If this is really our population, then there is no way it can be normally distributed.
>
> I agree. But nothing else is truly normal either, at least if you're working with real (rather than simulated) data. George Box summed it up pretty nicely as follows:
>
> "...the statistician knows...that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world." (JASA, 1976, Vol. 71, 791-799)
>
> > The normal distribution is a model for infinite populations.
>
> This strikes me as too restrictive. I think the normal distribution can also serve as a fairly decent model for finite populations, provided they are large enough. According to the website given below, UC Davis had a little over 5,000 freshman students in 2005. If even half were female, the normal distribution described might not be too bad as a *model* for the height distribution of incoming female students.
>
> http://facts.ucdavis.edu/student_population_headcount_fall.lasso
>
> > This means that we are actually not looking at a population; we are looking at a sample from a normal distribution.
> >
> > Now the rest of the problem:
> >
> > "A random sample of 100 students are taken" from the above so-called population.
> >
> > Now we are looking at a sample of a sample. This means that no matter what you do, you will never find the mean of the normal distribution (Mu) by repeated sampling. It doesn't matter whether you do it with replacement or without replacement.
> >
> > You will end up calculating the mean of the population, which will be slightly or significantly different from Mu depending on how many students entered UC Davis in year 2005. This means that the means of our samples of 100 students will be unbiased estimates of the population mean but biased estimates of Mu.
> >
> > This is the difference between sampling with replacement from a finite population and sampling from an infinite population. It seems to me that this is a chronic problem in stats texts.
> >
> > I understand that when you have 1000 elements in your population, the difference in the result will be minuscule. Or some could say that it is just a model, and a model is not a 100% reflection of real life (that's why it is called a model).
>
> Or as Box said,
>
> "All models are wrong. Some are useful."
>
> > However, members of a finite population can well be generated by a normal random variable; there is nothing wrong with that. The problem arises when we start calling this a population and start calculating the mean, variance, and confidence intervals. Here we were trying to capture the essence of the generating stochastic process (mu, sigma), but we actually ended up with something else.
> >
> > To put the final nail in the coffin, let's look at the last sentence on that page of the lecture notes:
> >
> > "Then X1,...,X100 are i.i.d. N(mu, sigma) random variables". This is just NOT TRUE!
>
> No, it's not. But I think the question is whether the approximation is close enough to be useful under the circumstances.
>
> HTH.
> --
> Bruce Weaver
> bwea...@lakeheadu.ca
> http://sites.google.com/a/lakeheadu.ca/bweaver/Home
> "When all else fails, RTFM."
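The quoted argument is that repeated samples of 100 students home in on the mean of the finite population rather than on Mu. A small simulation can illustrate this. It is a minimal sketch that uses only the Python standard library. The values MU = 162 cm and SIGMA = 7 cm are hypothetical assumptions, not figures from the lecture notes. The population size N_POP = 2500 is an assumption chosen to match roughly half of the ~5,000 freshmen Bruce mentions.

```python
import random
import statistics

# Hypothetical model parameters (assumptions, not taken from the lecture notes):
# heights in cm for the generating process N(MU, SIGMA^2).
MU, SIGMA = 162.0, 7.0
N_POP = 2500      # assumed finite-population size (about half of ~5,000 freshmen)
N_SAMPLE = 100    # sample size, as in the lecture-note example
REPS = 10_000     # number of repeated samples drawn from the same population


def simulate(seed=2010):
    """Return (finite-population mean, mean of REPS with-replacement sample means)."""
    rng = random.Random(seed)
    # Step 1: the generating process produces ONE finite population, then it is held fixed.
    population = [rng.gauss(MU, SIGMA) for _ in range(N_POP)]
    pop_mean = statistics.fmean(population)
    # Step 2: draw many samples of size N_SAMPLE with replacement from that population.
    sample_means = [
        statistics.fmean(rng.choices(population, k=N_SAMPLE))
        for _ in range(REPS)
    ]
    return pop_mean, statistics.fmean(sample_means)


pop_mean, grand_mean = simulate()
print(f"model mean MU          : {MU:.3f}")
print(f"finite-population mean : {pop_mean:.3f}")
print(f"mean of sample means   : {grand_mean:.3f}")
```

The population is generated once and then held fixed, so the mean of the sample means settles on `pop_mean`. How far `pop_mean` itself lands from `MU` changes from one generated population to the next. Its standard deviation is SIGMA / sqrt(N_POP), about 0.14 cm with these assumed values.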
Of course the normal distribution is a good model for finite populations as well. I agree with you on that.
The point I was trying to make is this. Once you start using a model, there is indeed a difference between sampling from an infinite population and sampling with replacement from a finite population. The model might be a probability distribution with infinitely many possible values (normal, uniform, or even a discrete distribution with infinitely many distinct values), or it might be, for example, a density estimate.
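The size of that difference can be estimated with a short derivation that is not from the thread. It assumes that the finite population $Y_1,\dots,Y_N$ is itself an i.i.d. draw from $N(\mu,\sigma^2)$, with $\sigma^2$ the variance, and that the sample $X_1,\dots,X_n$ is drawn from it with replacement. Each draw is still marginally $N(\mu,\sigma^2)$. However, any two draws come from the same finite population, so they are slightly correlated, and the sample mean is a little noisier than the i.i.d. formula $\sigma^2/n$ suggests.

```latex
% Setup (assumed): Y_1,...,Y_N iid N(mu, sigma^2); X_j = Y_{K_j},
% with K_1,...,K_n iid uniform on {1,...,N}, independent of the Y_i.
\begin{align*}
X_j &\sim N(\mu, \sigma^2) \quad \text{(marginally, for each } j\text{)} \\
\operatorname{Cov}(X_j, X_k)
  &= \mathbb{E}\!\left[\operatorname{Cov}(X_j, X_k \mid Y)\right]
   + \operatorname{Cov}\!\left(\mathbb{E}[X_j \mid Y],\, \mathbb{E}[X_k \mid Y]\right) \\
  &= 0 + \operatorname{Var}(\bar{Y}) = \frac{\sigma^2}{N}, \qquad j \neq k \\
\operatorname{Var}(\bar{X})
  &= \frac{1}{n^2}\left[ n\sigma^2 + n(n-1)\frac{\sigma^2}{N} \right]
   = \frac{\sigma^2}{n}\left(1 + \frac{n-1}{N}\right)
\end{align*}
```

Take hypothetical values N = 2500 (about half of the ~5,000 freshmen mentioned above) and n = 100. The correlation between two draws is then 1/2500 = 0.0004, and the variance inflation factor is 1 + 99/2500 = 1.0396, about 4%. So the difference is real, and it vanishes as N grows.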