```
Date: Jan 9, 2013 2:56 AM
Author: Paul
Subject: Degrees of freedom (DF) in normality test

I did a simple linear regression (SLR) on 2 equal-length vectors of data, then subjected the residuals to a Normal Probability Plot (NPP, http://en.wikipedia.org/wiki/Normal_probability_plot).  The fit was good, and there was no gross concavity, convexity, or "S" shape to indicate skew or excess kurtosis.

From web browsing, I found that the mean and the standard deviation of the normal distribution that is being tested for can be estimated by the y-intercept and the slope of the NPP.  In other words, a 2nd SLR is performed on the scatter graph of the residuals' NPP.

I am at a loss as to how to resolve a discrepancy.  The estimate of the *standard deviation* of the residuals comes from the slope of the NPP.  This should correspond to the standard error of the estimate from the SLR on the 2 vectors of data.  Shouldn't it??  It doesn't.  There is a notable error of about 5% (N=16 data points, yes I know it's a small sample).  I don't know which is the correct result.  I am using Excel's linear regression LINEST, but as will be clear, that's not all that relevant to the problem since I can read their documentation to ensure that they conform with textbook theory.

Part of this discrepancy can be explained by the fact that the formula for the normal order statistic means is approximate (see the NPP wikipedia page), but I suspect that it's not the main culprit because the estimated *mean* of the residuals was highly accurate (on the order of 1e-17, ideally zero).

I decided to manually calculate the estimate of the standard deviation for the residuals.  This is simply the sum of the squares (SS) of the residuals (SSres), normalized by the DF, then square-rooted.  According to the SLR theory, the DF should be N-2 because one degree is lost in getting the mean of the independent variable, and another is lost in getting the mean of the dependent variable.  I manually verified that this is in fact what is done by Excel's SLR.

The alternative is to look at the NPP problem completely separately from the SLR problem.  This is simply estimating a population standard deviation from a sample.  The sample consists of the residuals from the SLR problem, but this fact is not used.  The DF in such a process is N-1.

For the estimation of the standard deviation for the residuals (not just for the sample, but for the whole hypothetical population), which DF is theoretically correct, N-1 or N-2?  As a disclaimer, I should say that using N-1 gives a greater discrepancy from the SLR than even the NPP yields.  So it doesn't really help dispel the discrepancy.  Be that as it may, however, I'm still interested in what is the theoretically correct choice for DF.

P.S.  I'm not interested in the Maximum Likelihood approach for the time being.  Better to get a good understanding of the why's for one approach before broaching another approach.
```
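
The three estimates compared in the post can be reproduced side by side with a short script. This is only a minimal sketch, not the poster's Excel workflow: the data are synthetic (N = 16, made up for illustration), `fit_line` is a hypothetical helper standing in for LINEST, and the plotting positions follow the approximate order-statistic formula on the NPP Wikipedia page, (i - a)/(N + 1 - 2a) with a = 0.5 for N > 10.

```python
import math
import random
from statistics import NormalDist


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data: an assumption for illustration, not the data from the post.
rng = random.Random(0)
N = 16
x = [float(i) for i in range(N)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

# First SLR: y on x, then the residuals and their sum of squares (SSres).
slope, intercept = fit_line(x, y)
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ss_res = sum(r * r for r in resid)

# Residuals sum to zero (up to rounding), so SSres needs no mean correction.
s_n2 = math.sqrt(ss_res / (N - 2))  # standard error of the estimate (DF = N-2)
s_n1 = math.sqrt(ss_res / (N - 1))  # plain sample SD of the residuals (DF = N-1)

# Second SLR: sorted residuals on approximate normal order statistic means.
a = 0.375 if N <= 10 else 0.5
z = [NormalDist().inv_cdf((i - a) / (N + 1 - 2 * a)) for i in range(1, N + 1)]
npp_slope, npp_intercept = fit_line(z, sorted(resid))

print("SD with DF = N-2:", s_n2)
print("SD with DF = N-1:", s_n1)
print("NPP slope       :", npp_slope)
print("NPP intercept   :", npp_intercept)
```

Whatever the data, the N-1 and N-2 figures differ by the fixed factor sqrt((N-1)/(N-2)), which for N = 16 is sqrt(15/14), about 1.035. The NPP intercept reproduces the near-zero mean of the residuals because the plotting positions are symmetric about zero, so it says little about how accurate the slope is.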