1. What does "(Fold,Set,Len) = (all,all,all)" mean?
Just my stupid way of saying that the reported Subset x MoSS values were obtained by averaging across all six folds, all six sets, and all lengths meeting the "four-way content" criterion (data for S,N, data for C,N, data for S,R, and data for C,R). I think you said that anything NOT mentioned is assumed to be averaged over, so perhaps I should have left out the "|" (at) clause entirely.
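The averaging convention described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (subset, method, fold, set, len, value) and the numbers are all made up, and "method" here stands in for the N/R factor.

```python
from collections import defaultdict
from statistics import mean

# The four Subset x method cells a length must have data for.
REQUIRED_CELLS = {("S", "N"), ("C", "N"), ("S", "R"), ("C", "R")}

def lengths_with_four_way_content(records):
    """Return the lengths that have at least one record in all four cells."""
    cells_by_len = defaultdict(set)
    for rec in records:
        cells_by_len[rec["len"]].add((rec["subset"], rec["method"]))
    return {length for length, cells in cells_by_len.items()
            if cells >= REQUIRED_CELLS}

def average_over_unmentioned(records, keep):
    """Average 'value' per combination of the factors in `keep`;
    every factor not named in `keep` is averaged over."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in keep)].append(rec["value"])
    return {key: mean(vals) for key, vals in groups.items()}

def rec(subset, method, fold, set_, length, value):
    return {"subset": subset, "method": method, "fold": fold,
            "set": set_, "len": length, "value": value}

# Made-up records, not real results.
records = [
    rec("S", "N", "a1", 1, 10, 1.0),
    rec("S", "N", "a2", 1, 10, 3.0),
    rec("C", "N", "a1", 1, 10, 2.0),
    rec("S", "R", "a1", 1, 10, 4.0),
    rec("C", "R", "a1", 1, 10, 5.0),
    rec("S", "N", "a1", 1, 12, 100.0),  # length 12 lacks three of the cells
]

ok_lengths = lengths_with_four_way_content(records)
kept = [r for r in records if r["len"] in ok_lengths]
cell_means = average_over_unmentioned(kept, keep=("subset", "method"))
```

Here only length 10 passes the criterion, so the length-12 record is dropped before the Subset x method means are taken over folds, sets, and lengths.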
II. You wrote:
2. Something's wrong somewhere. Those p's are too similar to one another, and are too large to be consistent with the other results you've been reporting.
No, it's just that the "good" and "great" p's for u^2 are very length-specific, as is shown by the following table for u^2 in regression c on (e,u,u*e,u^2) for Len | subset = S, method = N, fold = a1, set = 1. (Note that this table is sorted by increasing p.)
So the question posed by the following table is the same basic question I actually asked several posts ago, namely: for (S, N, a1, 1), do we have ENOUGH "good" and "great" p's to claim that the model c on (e,u,u*e,u^2) "works" in a sufficient number of cases to "keep" it, at least for the factor combination (S, N, a1, 1)?
Also, please note that similar tables exist for all of the factor combinations equivalent to (S, N, a1, 1), so is it possible we should actually be comparing the distributions of p for u^2 from all these different factor combinations ... to see which distributions of p are "left" of others and "right" of others in the horizontal sense (i.e. with p as the x-axis)?
u^2 (t, df, p) Table: t, df, and p values for u^2 in regression c on (e,u,u*e,u^2) for Len | subset=S, method=N, fold=a1, set=1
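One possible way to make the "left of / right of" comparison concrete is sketched below in Python. It is an assumption about how such a comparison might be done, not something we have agreed on: it reports the share of p's at or below a cutoff, and the chance that a p drawn from one factor combination is smaller than a p drawn from another (ties counted as half). The p values and function names are invented for illustration.

```python
def fraction_at_or_below(p_values, threshold):
    """Share of p values that are <= threshold."""
    return sum(p <= threshold for p in p_values) / len(p_values)

def prob_left_of(p_a, p_b):
    """Estimate P(p from A < p from B), counting ties as one half,
    by comparing every p in A with every p in B."""
    score = 0.0
    for a in p_a:
        for b in p_b:
            if a < b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(p_a) * len(p_b))

# Made-up p values for two hypothetical factor combinations.
p_combo_a = [0.001, 0.01, 0.04, 0.30]
p_combo_b = [0.02, 0.20, 0.50, 0.80]

share_a = fraction_at_or_below(p_combo_a, 0.05)   # 3 of 4 values
share_b = fraction_at_or_below(p_combo_b, 0.05)   # 1 of 4 values
left_score = prob_left_of(p_combo_a, p_combo_b)   # 13 of 16 pairs
```

A left_score near 0.5 would mean neither distribution sits to the left of the other; values near 1 mean combination A's p's are mostly smaller.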
III. You wrote:
"In particular, you should not be considering any results from regressing c on (u,u^2) if e matters."
I'm sorry to plead ignorance but nothing you've ever posted before has prepared me to understand you here at all. What I mean by this is the following.
From the beginning we have been using a regression involving e, a regression involving u, and a regression involving (e,u) IN CONCERT, NOT as mutually exclusive alternatives.
First we had:
1a) ln(c/L) on ln(c/e)
1b) ln(c/L) on ln(c/u)
1c) ln(c/L) on (ln(c/e), ln(c/u))
Then, because of your reservations about these regressions, we simplified to
2a) c on e
2b) c on u
2c) c on (e,u)
and that actually improved matters.
And then finally, because of your very remarkable intuition that the "L/H" dichotomization of u should be replaced by adding u-related factors to the regressions themselves, we have arrived at
3a) c on (e,u,u*e), by addition of a u-factor to c on e
3b) c on (u,u^2), by addition of a u-factor to c on u
3c) c on (e,u,u*e,u^2), by addition of two u-factors to c on (e,u)
So ... if we never intended 1a-1c as mutually exclusive alternatives, nor 2a-2c as mutually exclusive alternatives, why all of a sudden do we have to treat 3a-3c as mutually exclusive alternatives? Please recall here that the ultimate goal was always to develop predictors for logistic regressions, and back when we were doing logistic regressions, you said it's best to throw everything into the soup that one can think of ... that's why we had logistic regression predictors based on MORE THAN ONE linear regression.
Also, why is it NOT statistically legitimate to postulate that there are BOTH:
a) a relationship between c and u that, as you suspected, is best expressed by c on (u,u^2) because the relationship changes with increasing u, and
b) a relationship between c and e that, again as you expected, is best expressed by c on (e,u,u*e) because, again, the relationship changes with increasing u?
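To keep the three regressions 3a-3c straight, here is a minimal Python sketch of how their predictor columns differ and how each could be fitted by ordinary least squares. It is an illustration only: the data are synthetic and noise-free, the function names are made up, and the small normal-equations solver stands in for whatever regression program is actually used.

```python
def design_row(e, u, model):
    """Predictor row (leading 1.0 is the intercept) for one observation."""
    if model == "3a":            # c on (e, u, u*e)
        return [1.0, e, u, u * e]
    if model == "3b":            # c on (u, u^2)
        return [1.0, u, u * u]
    if model == "3c":            # c on (e, u, u*e, u^2)
        return [1.0, e, u, u * e, u * u]
    raise ValueError(f"unknown model: {model}")

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_ols(rows, c):
    """Least-squares coefficients from the normal equations (X'X) b = X'c."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xtc = [sum(r[i] * ci for r, ci in zip(rows, c)) for i in range(p)]
    return solve_linear_system(xtx, xtc)

# Synthetic data on a balanced 4 x 4 grid of (e, u); coefficients are invented.
points = [(float(e), float(u)) for e in range(4) for u in range(4)]
true_b = [1.0, 2.0, 3.0, 0.5, -0.25]   # intercept, e, u, u*e, u^2
c = [sum(b * x for b, x in zip(true_b, design_row(e, u, "3c")))
     for e, u in points]

coef_3c = fit_ols([design_row(e, u, "3c") for e, u in points], c)
coef_3b = fit_ols([design_row(e, u, "3b") for e, u in points], c)
```

On this toy grid, 3c recovers the invented coefficients, while 3b returns an intercept and u slope that also carry the average contribution of e (about 4, 3.75 and -0.25), because e is left out of that regression. That says nothing either way about the real data; it only shows what each regression estimates.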
IV. You wrote:
"4. SEP = sqrt[Residual Sum of Squares / df] = sqrt[Residual Mean Square]. Are you sure Ivo's program doesn't give that as optional output?"
and I have a feeling that rsq is what you're looking for. But if you can't say for sure, then what I'll do is a complete rerun that generates all of them and then compare each to the Excel Standard Error of Prediction for the entire regression (not for any particular coefficient).
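For the comparison, the quoted formula can be checked by hand on a toy regression. The Python sketch below assumes one predictor plus an intercept (so df = n - 2); the data and function names are invented and have nothing to do with Ivo's program or its output options.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def standard_error_of_prediction(observed, fitted, n_params):
    """SEP = sqrt(residual sum of squares / df), with df = n - n_params."""
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    df = len(observed) - n_params
    return math.sqrt(rss / df)

# Made-up data: c regressed on e.
e = [0.0, 1.0, 2.0, 3.0]
c = [1.0, 3.0, 2.0, 4.0]

intercept, slope = fit_line(e, c)
fitted = [intercept + slope * ei for ei in e]
sep = standard_error_of_prediction(c, fitted, n_params=2)  # sqrt(1.8 / 2)
```

For a regression with more predictors, n_params would be the number of fitted coefficients including the intercept, e.g. 5 for c on (e,u,u*e,u^2).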
V. You wrote:
"5. Let the constant in the input to Ivo's program default to 1."
VI. You wrote:
"6. I have a hunch that Het may be related to a Subset x Fold interaction, where the d.v. is the average slope."
That would be wonderful, but wouldn't we need 20-odd folds to show it?
VII. You wrote:
"7. Your forthcoming explanation "using reasoning based on the behavior of the average slope Auq of c on (u,u^2) and the covariance AubC of e and u in c on (u,e,u*e)" will probably go right over my head, because I have only a hunch about what the average slope Auq of c on (u,u^2) might mean, and not a clue about what the covariance AubC of e and u in c on (u,e,u*e) might mean."
I will hold off on this until I have read your response to my remarks in (III) above. If the rules of the game have changed so that we can't use c on (u,u^2) IN CONCERT with c on (e,u,u*e), then I obviously can't base any argument on the behavior of the average slope of the former when it's regressed against available lengths and the behavior of the covariance of e and u in the latter when it's regressed against available lengths. I have to be able to use both behaviors to make the argument.
VIII. Thanks as always for your continued patience, tolerance, and willingness to consider these matters.