Date: Dec 7, 2012 6:27 AM
Author: Ray Koopman
Subject: Re: Response to your last of 12/7 at 12:17am

On Dec 7, 12:18 am, djh <halitsk...@att.net> wrote:
> II.
>
> You wrote:
>
> "2. Why do you present results from the regressions of c on
> both (u,e,u*e) and (u,e,u*e,u^2) for the same data? It is unusual to
> consider the results of both analyses, except for the purpose of
> deciding which model to use. Have you checked the significance of the
> quadratic term? Does its inclusion reduce the Standard Error of
> Prediction (SEP) substantially? (Those are two conceptually separate
> issues.)"
>
> I presented results from both out of ignorance, i.e. not knowing
> whether the similarity of "het", "L-H", and "H-L" results for both
> regressions would be relevant to your consideration of the entire
> matter.
>
> But regarding actual comparison of the two regressions, consider for
> the sake of discussion just the average slopes Aubu and Aubqu for c
> on (u,e,u*e) and c on (u,e,u*e,u^2), respectively.
>
> For these two average slopes the "2-way" data for Subset x MoSS |
> (Fold x Set x Length) = (a1,1,24) are:
>
>                      Aubu            Aubqu
>
> t               0.297413484     0.274819583
> df            133.708200500   128.3098338
> 2-tailed p      0.766614886     0.783897935
> N for 1S       47              47
> N for 1C       36              36
> N for R1S      41              41
> N for R1C      34              34
> 1S Coeff       -1.952726236    -4.174717322
> 1C Coeff       -4.009830911    -4.280992887
> R1S Coeff       0.060946658    -0.901924172
> R1C Coeff      -0.84027985     -2.098708096
> Var for 1S      3.881459118     4.222990696
> Var for 1C      2.589180286     2.578813072
> Var for R1S     4.019024218     3.952029957
> Var for R1C     4.61471327      4.991890592
>
> based on the underlying data:
>
> Method         N       N       R       R
> Subset         S       C       S       C
>
> obsN          47      36      41      34
> Mean u         0.615   0.501   0.540   0.558
> Mean u^2       0.394   0.293   0.315   0.335
>
> Aubu          -1.953  -4.010   0.061  -0.840
> Aubqu         -4.175  -4.281  -0.902  -2.099
>
> AubuSE         1.970   1.609   2.005   2.148
> AubquSE        2.055   1.606   1.988   2.234
>
> And speaking from my usual complete ignorance, I would say that the
> similarity of t, df, p for Aubu and Aubqu at the "2-way" level is
> sufficient to:
>
> a) rule out any need for further analysis at the "underlying" level;
>
> b) motivate the choice of c on (u,e,u*e) over c on (u,e,u*e,u^2),
> because complicating the regression by addition of u^2 doesn't seem
> to make any appreciable difference, at least not at the "2-way"
> level.
>
> But if these conclusions are incorrect, I await your guidance and
> instructions.


Here's what's usually done to compare two predictor sets such as
(u,e,u*e) and (u,e,u*e,u^2), where the only difference between the
two sets is that the second set has one extra predictor that the
first set lacks. Let

estimated coefficient of the extra predictor in the second model
t = ----------------------------------------------------------------.
estimated standard error of that estimated coefficient

Refer t to the t-distribution with df = n-k, where n = # of
observations, and k = # of coefficients in the second model.
(If the model has an intercept then k = # of predictors + 1.)
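
In case the mechanics are unclear, here is a minimal sketch of that
t-test in Python (numpy/scipy). The variable names (u, e, c) follow
the thread, but the data are synthetic, invented purely for
illustration:

# t-test for the extra predictor (u^2) in the larger model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 47                                   # e.g., the 1S subset size
u = rng.uniform(0.0, 1.0, n)
e = rng.uniform(0.0, 1.0, n)
c = 1.0 - 2.0*u + 0.5*e + 0.3*u*e + rng.normal(0.0, 1.0, n)

# Second (larger) model: intercept, u, e, u*e, u^2, so k = 5.
X2 = np.column_stack([np.ones(n), u, e, u*e, u**2])
k = X2.shape[1]
b2, *_ = np.linalg.lstsq(X2, c, rcond=None)
rss2 = np.sum((c - X2 @ b2)**2)

# Standard error of the extra (u^2) coefficient, the last column.
cov = (rss2 / (n - k)) * np.linalg.inv(X2.T @ X2)
t = b2[-1] / np.sqrt(cov[-1, -1])
p = 2.0 * stats.t.sf(abs(t), df=n - k)   # two-tailed
print(f"t = {t:.3f}, df = {n - k}, two-tailed p = {p:.3f}")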

A totally equivalent way is to get F = (RSS1/RSS2 - 1)*(n-k), where
RSS1 and RSS2 are the Residual Sums of Squares from fitting the two
models. Refer F to the F-distribution with df = (1,n-k). F = t^2.
p(F) uses only the upper tail of the F-distribution, whereas
p(t) uses both tails of the t-distribution.
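
Continuing the same sketch, the F form of the test, with RSS1 from
the model that omits u^2; the printed p should match the t-test's:

# First (smaller) model: intercept, u, e, u*e.
X1 = np.column_stack([np.ones(n), u, e, u*e])
b1, *_ = np.linalg.lstsq(X1, c, rcond=None)
rss1 = np.sum((c - X1 @ b1)**2)

F = (rss1/rss2 - 1.0) * (n - k)          # df = (1, n-k)
pF = stats.f.sf(F, 1, n - k)             # upper tail only
print(f"F = {F:.3f}  (t^2 = {t**2:.3f}),  p = {pF:.3f}")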

The point is that asking if the coefficient is significantly
different from 0 is EXACTLY the same as asking if the ratio of the
Standard Errors of Prediction is significantly different from 1.
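
To make that equivalence concrete, continuing the sketch, and taking
each model's SEP to be its residual standard error
sqrt(RSS/(n - #coefficients)) (my assumption about the intended
definition):

# Point estimates of the two SEPs and their ratio.
sep1 = np.sqrt(rss1 / (n - X1.shape[1]))
sep2 = np.sqrt(rss2 / (n - k))
print(f"SEP1 = {sep1:.3f}, SEP2 = {sep2:.3f}, "
      f"SEP1/SEP2 = {sep1/sep2:.3f}")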

If the test is significant, it means only that the true coefficient
is not exactly 0, i.e., that the true ratio of SEPs is not exactly 1.
No matter how small the p is, it doesn't mean SEP1/SEP2 is big enough
to get excited about, i.e., that using the first model instead of the
second would increase the SEP by enough to worry about. That's a
subjective judgment call that depends on the situation. In a perfect
world, standard regression software would give a CI for SEP1/SEP2.
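
Lacking that, one rough stand-in is a case-resampling bootstrap of
SEP1/SEP2. This continues the synthetic-data sketch above and is only
an illustration, not a feature of any standard package:

def sep_ratio(idx):
    # Refit both models on a resampled set of cases; return SEP1/SEP2.
    Xa, Xb, y = X1[idx], X2[idx], c[idx]
    ra = y - Xa @ np.linalg.lstsq(Xa, y, rcond=None)[0]
    rb = y - Xb @ np.linalg.lstsq(Xb, y, rcond=None)[0]
    s1 = np.sqrt(ra @ ra / (len(y) - Xa.shape[1]))
    s2 = np.sqrt(rb @ rb / (len(y) - Xb.shape[1]))
    return s1 / s2

# 2000 bootstrap resamples of the n cases, with replacement.
ratios = [sep_ratio(rng.integers(0, n, n)) for _ in range(2000)]
lo, hi = np.percentile(ratios, [2.5, 97.5])
print(f"approx. 95% bootstrap CI for SEP1/SEP2: ({lo:.3f}, {hi:.3f})")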