On Sep 28, 2:48 pm, Cagdas Ozgenc <cagdas.ozg...@gmail.com> wrote:
> Let me try to explain one more time. My questions are usually too deep down there for me to explain properly.
>
> First of all I am talking about 3 different things here: model parameters, population parameters, sample statistics
>
> I have no objection to the fact that population mean can be calculated by sample mean in an unbiased way. However you will find commonly in text books and real life research that what's trying to be inferred is not the population mean but the model mean (or generating process).
>
> For example take a look at the lecture notes of a stats class in UCDavis that I just found on the internet (page 5):
>
> http://www.stat.ucdavis.edu/~jie/stat13.winter2010/lec20.pdf
>
> It is trying to show the difference between sampling with replacement vs sampling without replacement. But that's not the issue here. There's something else wrong about it.
>
> Starts with the following:
>
> "Suppose the heights of female students entering UC Davis in year 2005 follows a normal distribution, with mean mu and standard deviation sigma"
>
> First of all number of female students entering UC Davis in year 2005 is finite. If this is really our population then there is no way it can be normally distributed.
I agree. But nothing else is truly normal either, at least if you're working with real (rather than simulated) data. George Box summed it up pretty nicely as follows:
"...the statistician knows...that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world." (JASA, 1976, Vol. 71, 791-799)
> Normal distribution is a model for infinite populations.
This strikes me as too restrictive. I think the normal distribution can also serve as a fairly decent model for finite populations, provided they are large enough. According to the website given below, UC Davis had a little over 5,000 freshman students in 2005. If even half were female, the normal distribution described might not be too bad as a *model* for the height distribution of incoming female students.
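To put a rough number on "large enough", suppose (purely for illustration) that the roughly 2,500 heights are themselves independent draws from N(mu, sigma^2). The population's own mean then misses mu only by a random amount whose standard deviation shrinks like 1/sqrt(N):

```latex
\bar{X}_N = \frac{1}{N}\sum_{i=1}^{N} X_i \sim \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{N}\right),
\qquad
\operatorname{sd}\!\left(\bar{X}_N - \mu\right) = \frac{\sigma}{\sqrt{N}} \approx \frac{\sigma}{\sqrt{2500}} = \frac{\sigma}{50}.
```

With a hypothetical sigma of 7 cm (not a figure from the post or the lecture notes), that is about 0.14 cm.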
> This means that actually we are not looking at a population but we are looking at a sampling from normal distribution.
>
> Now the rest of the problem:
>
> "A random sample of 100 students are taken" from the above so called population.
>
> Now we are looking at a sample of a sample. This means that no matter what you do, you will never find the mean of the normal distribution (Mu) by repeated sampling. It doesn't matter whether you do it with replacement or without replacement.
>
> You will end up calculating the mean of the population, which will be slightly or significantly different from Mu depending on how many students entered UC Davis in year 2005. This means that our samples of 100 students will be an unbiased estimate of the population mean but a biased estimate of Mu.
>
> This is the difference between sampling with replacement from a finite population and sampling from an infinite population. It seems to me that this is a chronical problem in stat texts.
>
> I understand that when you have 1000 elements in your population the difference in the result will be miniscule. Or some could say that it is just a model and a model is not 100% reflection of real life (that's why it is called a model).
Or as Box said,
"All models are wrong. Some are useful."
> However, members of a finite population can well be generated by a normal random variable, there is nothing wrong with that. The problem arises when we start calling this a population and start calculating mean, variance, and confidence intervals. Here we were trying to capture the essence of the generating stochastic process (mu, sigma), but we actually ended up with something else.
>
> To put the final nail in the coffin, let's look at the last sentence on that page of lecture notes:
>
> "Then X1,...,X100 are i.i.d. N(mu, sigma) random variables". This is just NOT TRUE!
No, strictly speaking it isn't. But I think the question is whether the approximation is close enough to be useful under the circumstances.
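A small simulation sketch of that trade-off, assuming hypothetical values mu = 163 cm and sigma = 7 cm (not taken from the post or the lecture notes), a population of N = 2,500 and samples of n = 100 drawn without replacement. It first generates one finite population from the normal model, then repeatedly samples from that fixed population and compares the sample means with both the population mean and mu.

```python
import numpy as np

# Hypothetical parameters (assumptions for illustration only): heights in cm.
MU, SIGMA = 163.0, 7.0
N_POP, N_SAMPLE, N_REPS = 2500, 100, 10_000

rng = np.random.default_rng(2005)

# Step 1: the generating process (the model) produces ONE finite population.
population = rng.normal(MU, SIGMA, size=N_POP)
pop_mean = population.mean()

# Step 2: repeatedly draw simple random samples WITHOUT replacement
# from that fixed population and record each sample mean.
sample_means = np.array([
    rng.choice(population, size=N_SAMPLE, replace=False).mean()
    for _ in range(N_REPS)
])

# Standard error of the sample mean under sampling without replacement:
# population SD with divisor N (ddof=0) times the finite population
# correction sqrt((N - n) / (N - 1)).
fpc = np.sqrt((N_POP - N_SAMPLE) / (N_POP - 1))
se_theory = population.std() / np.sqrt(N_SAMPLE) * fpc

print(f"population mean - mu       : {pop_mean - MU:+.3f}")
print(f"avg sample mean - pop mean : {sample_means.mean() - pop_mean:+.3f}")
print(f"empirical SE of sample mean: {sample_means.std(ddof=1):.3f}")
print(f"FPC-based SE (theory)      : {se_theory:.3f}")
print(f"typical |pop mean - mu|    : {SIGMA / np.sqrt(N_POP):.3f}")
```

Conditional on the single population drawn in step 1, the sample means center on the population mean rather than on mu, which is the original poster's point. Under these assumptions, though, the gap between the population mean and mu is one fixed offset of typical size sigma/sqrt(N), roughly a fifth of the sampling standard error of about 0.69 cm, which is the sense in which "i.i.d. N(mu, sigma)" can still be a useful approximation.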