Date: Jan 9, 2013 10:05 PM
Subject: Re: Question concerning MLE for model parameter estimation
I am going to try to answer at least part of your question. The confusion, I think, is in understanding what the likelihood function is and what purpose it serves.
To understand the likelihood function, you need to understand its origins, which are Bayesian rather than Fisherian. Once you see how it is used in Bayesian statistics, you can see how Fisher adapted it for use in frequency-based statistics.
In Bayesian theory, the likelihood function is the portion of the inference formula which "weighs" the data. Indeed, it is sometimes called the "evidence factor."
Bayesian estimation works in a way quite opposite to how Frequentist statistics works. In Bayesian statistics the goal is inductive reasoning, that is, case-based reasoning. Treating each data point as an individual case, information about the hidden quantities (the parameters) is gradually gleaned. Unlike deductive reasoning, however, inductive reasoning is incomplete and subjective. I give an example below.
Let m denote a parameter and x the data. The goal is to infer m from x. However, you can observe x but not m, so you are trying to learn about the process that generated x, whose form depends on the unknown m.
The formula is that p(m|x) is proportional to p(x|m)p(m). Here p(x|m), viewed as a function of m with the data x held fixed, is the likelihood function. p(m) is called the prior and represents knowledge about m before the data are seen, and p(m|x) is called the posterior and represents beliefs about m after the data have been seen.
Remember, the goal of Bayesian statistics is to infer the parameter. The parameter in this example is which of the candidate distributions is the true one.
Let's say we know there are three possible discrete distributions and we have three data points. The three models are mutually exclusive and exhaustive.
The models are:
- m1: P(x=5) = 1/4, P(x=6) = 1/2, P(x=7) = 1/4
- m2: P(x=6) = P(x=7) = P(x=8) = 1/3
- m3: P(x=7) = 4/9, P(x=8) = 4/9, P(x=9) = 1/9
You have observed two 7's and an 8.
The likelihood of the data under m1 is (1/4)(1/4)(0) = 0.
Under m2 it is (1/3)^3 = 1/27 ≈ 0.037.
Under m3 it is (4/9)^3 = 64/729 ≈ 0.088.
m3 is the value of the parameter that maximizes the likelihood.
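The toy computation above can be written out in a few lines. This is just an illustration of the example (the model names and data are the ones from this email, nothing here comes from any library):

```python
# Likelihoods of the three discrete models from the example,
# given the observed data: two 7's and an 8.
models = {
    "m1": {5: 1/4, 6: 1/2, 7: 1/4},   # P(x=8) = 0 under m1
    "m2": {6: 1/3, 7: 1/3, 8: 1/3},
    "m3": {7: 4/9, 8: 4/9, 9: 1/9},
}
data = [7, 7, 8]

def likelihood(model, data):
    """Product of P(x_i | model) over all observations."""
    result = 1.0
    for x in data:
        result *= model.get(x, 0.0)  # unlisted outcomes have probability 0
    return result

likelihoods = {name: likelihood(m, data) for name, m in models.items()}
best = max(likelihoods, key=likelihoods.get)  # the maximum-likelihood model

# With a uniform prior over the three models, the Bayesian posterior
# is just the likelihood renormalized to sum to one:
total = sum(likelihoods.values())
posterior = {name: l / total for name, l in likelihoods.items()}
```

Note that the maximum-likelihood answer keeps only `best`, while the Bayesian posterior retains a probability for every model, which is exactly the "lost information" I mention below.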
Now let's imagine that what we are trying to maximize is the probability of correctly selecting the mean and variance of a group of observations. Since you know the data are Gaussian, you take the product of the probabilities of the individual observations over the set of all possible values of mu and sigma. Of course, if all you are interested in is the single point of maximum likelihood for a differentiable pdf, then you can write down the density, take logs, differentiate with respect to the parameters of interest, and solve for the maximum. In doing this you lose the rest of the information about the other possible values of the parameters, but this is where Fisher gets sneaky.
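As a sketch of that closed-form route: setting the derivatives of the Gaussian log-likelihood to zero gives the sample mean for mu and the 1/n mean squared deviation for sigma^2. The function names and data below are my own made-up illustration:

```python
import math

def gaussian_mle(xs):
    """Closed-form MLE for a Gaussian sample: sample mean and
    the biased (1/n, not 1/(n-1)) variance estimate."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    return mu, sigma2

def gaussian_log_likelihood(xs, mu, sigma2):
    """Log of the product of Gaussian densities over the sample."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

data = [4.9, 5.1, 5.0, 5.3, 4.7]
mu_hat, s2_hat = gaussian_mle(data)

# Any other parameter value gives a lower log-likelihood than the MLE:
ll_best = gaussian_log_likelihood(data, mu_hat, s2_hat)
ll_other = gaussian_log_likelihood(data, mu_hat + 0.5, s2_hat)
```

The point of the comparison at the end is that the MLE is the single point where the likelihood surface peaks; everything about the shape of the surface away from that peak is discarded.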
Fisher tries to use deductive reasoning instead. He reasons by modus tollens: treat some hypothesis as true, then show that a consequence of it fails a test, and you have shown the hypothesis to be false. Deductive reasoning, unlike induction, can be complete.
So what Fisher is doing is using the MLE only as a summary measure from which to form a simple statistical test.
If you add in a Weibull distribution, it is no different from the Gaussian case. You still have to estimate the parameters you do not know. It does not matter whether the parameter is a mean, a median, a mode, or anything else you can come up with, like the numbered models above. What you are doing is modeling the parameters jointly: assuming independence, you are building a multidimensional joint distribution over every relevant parameter.
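As a sketch of the Weibull case: unlike the Gaussian, there is no closed form for the shape parameter, but the same maximize-the-likelihood recipe goes through numerically. One standard route is to solve the profile score equation for the shape k by bisection, after which the scale follows in closed form. Everything below (the function name, the bracketing bounds) is my own illustration, not from a library:

```python
import math

def weibull_mle(xs, lo=0.01, hi=50.0, tol=1e-10):
    """Numerical MLE for a two-parameter Weibull sample (all xs > 0).

    Given shape k, the scale MLE is (mean of x_i^k)^(1/k); k itself
    solves the profile score equation g(k) = 0, which is monotone
    increasing in k, so bisection finds the unique root.
    """
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n

    def g(k):
        # Profile score for the shape parameter k.
        xk = [x ** k for x in xs]
        return (sum(xi * li for xi, li in zip(xk, logs)) / sum(xk)
                - 1 / k - mean_log)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = (lo + hi) / 2
    scale = (sum(x ** k for x in xs) / n) ** (1 / k)
    return k, scale
```

The structure is the same as the Gaussian case: write down the joint density of the sample, treat it as a function of the unknown parameters, and find where it peaks.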
As to the residuals, you will need to ask someone experienced with this. My gut says you could not assume them to be Gaussian, based on what you have described, but that really requires someone who has already worked it out, and my guess is that it is in the literature already.