Math Forum » Discussions » sci.math.* » sci.stat.math.independent

Topic: Degrees of freedom (DF) in normality test
Replies: 4   Last Post: Jan 10, 2013 6:45 AM

Ray Koopman

Posts: 3,383
Registered: 12/7/04
Re: Degrees of freedom (DF) in normality test
Posted: Jan 9, 2013 3:31 AM

On Jan 8, 11:56 pm, Paul <paul.domas...@gmail.com> wrote:
> I did a simple linear regression (SLR) on 2 equal-length vectors of
> data, then subjected the residuals to a Normal Probability Plot (NPP,
> http://en.wikipedia.org/wiki/Normal_probability_plot). The fit was
> good, and there was no gross concavity, convexity, or "S" shape to
> indicate skew or excess kurtosis.
>
> From web browsing, I found that the mean and the standard deviation of
> the normal distribution that is being tested for can be estimated by
> the y-intercept and the slope of the NPP. In other words, a 2nd SLR
> is performed on the residuals NPP scatter graph.
>
> I am at a loss as to how to resolve a discrepancy. The estimate of
> *standard deviation* of the residuals comes from slope of the NPP.
> This should correspond to the standard error of the estimate from the
> SLR on the 2 vectors of data. Shouldn't it?? It doesn't. There is a
> notable error of about 5% (N=16 data points, yes I know it's a small
> sample). I don't know which is the correct result. I am using
> Excel's linear regression LINEST, but as will be clear, that's not all
> that relevant to the problem since I can read their documentation to
> ensure that they conform with textbook theory.
>
> Part of this discrepancy can be explained by the fact that the formula
> for the normal order statistic means is approximate (see the NPP
> wikipedia page), but I suspect that it's not the main culprit because
> the estimated *mean* of the residuals was highly accurate (on the
> order of 1e-17, ideally zero).
>
> I decided to manually calculate the estimate of the standard deviation
> for residuals.  This is simply the sum of squares (SS) of the
> residuals (SSres), normalized by the DF, then square-rooted.
> According to the SLR theory, the DF should be N-2 because one degree
> is in getting the mean of the independent variable, and another is
> lost in getting the mean of the dependent variable. I manually
> verified that this is in fact what is done by Excel's SLR.
>
> The alternative is to look at the NPP problem completely separately
> from SLR problem. This is simply estimating a population standard
> deviation from a sample.  The sample consists of the residuals from
> the SLR problem, but this fact is not used.  The DF in such a process is
> N-1.
>
> For the estimation of standard deviation for the residuals (not just
> for the sample, but for the whole hypothetical population), which DF
> is theoretically correct, N-1 or N-2? As a disclaimer, I should say
> that using N-1 gives a greater discrepancy from the SLR than even the
> NPP yields. So it doesn't really help dispel the discrepancy. Be
> that as it may, however, I'm still interested in what is the
> theoretically correct choice for DF.
>
> P.S. I'm not interested in the Maximum Likelihood approach for the
> time being. Better to get a good understanding of the why's for one
> approach before broaching another approach.


Even if the true regression is linear and the error random
variables are iid normal, the sample residuals are not iid.
Their joint distribution is n-variate normal with zero means
and covariance matrix = (I - H)*sigma^2, where sigma^2 is the
variance of the error distribution and H is the "hat matrix"
associated with the predictors. The expected order statistics
of the residuals are not a simple linear function of the
expected order statistics of n iid normals.
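
The covariance structure described above can be checked numerically. What follows is only a minimal sketch using NumPy, not anything taken from the original posts: n = 16 matches the sample size mentioned in the question, but the predictor values x and the helper name hat_matrix are made up for illustration. It builds H for a simple linear regression, forms I - H, and shows that its trace is n - 2 (so that E[SSres] = (n - 2)*sigma^2, which is why SSres/(n - 2) is the unbiased variance estimate), that the residual variances are unequal, and that the residuals are correlated.

```python
import numpy as np

def hat_matrix(x):
    """Hat matrix H = X (X'X)^{-1} X' for a regression with intercept and slope."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # design matrix: [1, x]
    return X @ np.linalg.solve(X.T @ X, X.T)

n = 16                           # sample size quoted in the question
x = np.arange(1.0, n + 1)        # hypothetical predictor values (assumption)

H = hat_matrix(x)
M = np.eye(n) - H                # Cov(residuals) = M * sigma^2

resid_df = np.trace(M)           # trace(I - H) = n - 2 for this model
var_ratio = np.diag(M)           # Var(e_i) / sigma^2 for each residual
max_offdiag = np.abs(M - np.diag(var_ratio)).max()   # nonzero => correlated

print(f"trace(I - H) = {resid_df:.6f}   (n - 2 = {n - 2})")
print(f"Var(e_i)/sigma^2: min {var_ratio.min():.4f}, max {var_ratio.max():.4f}")
print(f"largest |Cov(e_i, e_j)|/sigma^2, i != j: {max_offdiag:.4f}")
```

With any other choice of x the individual variances and covariances change, but the trace stays at n - 2 as long as the model has an intercept and one slope and the x values are not all equal.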


