I did a simple linear regression (SLR) on 2 equal-length vectors of data, then subjected the residuals to a Normal Probability Plot (NPP, http://en.wikipedia.org/wiki/Normal_probability_plot). The fit was good, and there was no gross concavity, convexity, or "S" shape to indicate skew or excess kurtosis.
From web browsing, I found that the mean and the standard deviation of the normal distribution being tested for can be estimated by the y-intercept and the slope of the NPP, respectively. In other words, a second SLR is performed on the scatter plot that makes up the residuals' NPP.
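To make that procedure concrete, here is a minimal Python sketch of what I understand it to be. It is not my actual Excel workbook: the data is synthetic, the function names `fit_line` and `npp_mean_sd` are made up for illustration, and the plotting-position constant `a = 0.5` is an assumption (other values are in common use).

```python
import random
from statistics import NormalDist, mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def npp_mean_sd(values, a=0.5):
    """Estimate (mean, sd) from the intercept and slope of a normal probability plot.

    The constant `a` in the plotting position (i - a) / (n + 1 - 2a) is an
    assumption; it only approximates the normal order statistic means.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)]
    # Regress the ordered data (vertical axis) on the normal scores (horizontal axis).
    return fit_line(z, ordered)

random.seed(0)
sample = [random.gauss(0.0, 2.0) for _ in range(16)]  # synthetic data, N = 16
mu_hat, sigma_hat = npp_mean_sd(sample)
print(mu_hat, sigma_hat)
```

Here `mu_hat` plays the role of the NPP intercept and `sigma_hat` the role of the NPP slope.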
I am at a loss as to how to resolve a discrepancy. The estimate of the *standard deviation* of the residuals comes from the slope of the NPP. Shouldn't this correspond to the standard error of the estimate from the SLR on the 2 vectors of data? It doesn't. The two differ by about 5% (N=16 data points; yes, I know it's a small sample), and I don't know which is the correct result. I am using Excel's linear regression function LINEST, but as will become clear, that's not especially relevant to the problem, since I can read its documentation to confirm that it conforms to textbook theory.
Part of this discrepancy can be explained by the fact that the formula for the normal order statistic means is approximate (see the NPP Wikipedia page), but I suspect that it's not the main culprit because the estimated *mean* of the residuals was highly accurate (on the order of 1e-17, ideally zero).
I decided to manually calculate the estimate of the standard deviation of the residuals. This is simply the sum of squares (SS) of the residuals (SSres), divided by the degrees of freedom (DF), then square-rooted. According to SLR theory, the DF should be N-2, because one degree is lost in getting the mean of the independent variable and another is lost in getting the mean of the dependent variable. I manually verified that this is in fact what Excel's SLR does.
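For reference, the manual calculation amounts to something like the following Python sketch. Again, this is an illustration rather than my spreadsheet: the data is synthetic and the helper names `slr_residuals` and `standard_error_of_estimate` are hypothetical.

```python
import math
import random
from statistics import mean

def slr_residuals(x, y):
    """Fit y = intercept + slope * x by least squares and return the residuals."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def standard_error_of_estimate(residuals, df_lost=2):
    """sqrt(SSres / (N - df_lost)); df_lost=2 gives the usual SLR value."""
    ss_res = sum(r * r for r in residuals)
    return math.sqrt(ss_res / (len(residuals) - df_lost))

random.seed(1)
x = [float(i) for i in range(1, 17)]                        # synthetic, N = 16
y = [3.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]   # synthetic response
res = slr_residuals(x, y)
see = standard_error_of_estimate(res)  # divides SSres by N-2 = 14
print(see)
```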
The alternative is to look at the NPP problem completely separately from the SLR problem. This is simply estimating a population standard deviation from a sample. The sample consists of the residuals from the SLR problem, but this fact is not used. The DF in such a process is N-1.
For the estimation of standard deviation for the residuals (not just for the sample, but for the whole hypothetical population), which DF is theoretically correct, N-1 or N-2? As a disclaimer, I should say that using N-1 gives a greater discrepancy from the SLR than even the NPP yields. So it doesn't really help dispel the discrepancy. Be that as it may, however, I'm still interested in what is the theoretically correct choice for DF.
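To show how the two DF choices relate, here is a small Python sketch. For the same SSres, the two estimates differ by a fixed factor that depends only on N. The value of `ss_res` below is hypothetical and `sd_from_ss` is a made-up helper name.

```python
import math

def sd_from_ss(ss_res, n, df_lost):
    """Standard deviation estimate sqrt(SSres / (N - df_lost))."""
    return math.sqrt(ss_res / (n - df_lost))

n = 16
ss_res = 14.0  # hypothetical sum of squared residuals, for illustration only

sd_n2 = sd_from_ss(ss_res, n, 2)  # SLR convention, DF = N-2
sd_n1 = sd_from_ss(ss_res, n, 1)  # plain sample convention, DF = N-1

# The ratio is sqrt((N-2)/(N-1)) whatever the value of ss_res.
ratio = sd_n1 / sd_n2
print(sd_n2, sd_n1, ratio)
```

Since the N-1 divisor is larger, that estimate always comes out smaller than the N-2 one.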
P.S. I'm not interested in the Maximum Likelihood approach for the time being. Better to get a good understanding of the why's for one approach before broaching another approach.