Response to your last
Posted: Dec 7, 2012 9:07 PM
I. You wrote:
1. What does "(Fold,Set,Len) = (all,all,all)" mean?
Just my stupid way of saying that the reported Subset x MoSS values were obtained by averaging across all six folds, all six sets, and all lengths meeting the "four-way content" criterion (data for S,N, data for C,N, data for S,R, and data for C,R). I think you said that anything NOT mentioned is assumed to be averaged over, so perhaps I should have left out the "|" (at) clause entirely.
II. You wrote:
2. Something's wrong somewhere. Those p's are too similar to one another, and are too large to be consistent with the other results you've been reporting.
No, it's just that the "good" and "great" p's for u^2 are very length-specific, as is shown by the following table for u^2 in regression c on (e,u,u*e,u^2) for Len | subset = S, method = N, fold = a1, set = 1. (Note that this table is sorted by increasing p.)
So the question posed by the following table is the same basic question I actually asked several posts ago, namely: for (S, N, a1, 1), do we have ENOUGH "good" and "great" p's to claim that the model c on (e,u,u*e,u^2) "works" in a sufficient number of cases to "keep" it, at least for the factor combination (S, N, a1, 1)?
Also, please note that similar tables exist for all of the factor combinations equivalent to (S, N, a1, 1), so is it possible we should actually be comparing the distributions of p for u^2 from all these different factor combinations ... to see which distributions of p are "left" of others and "right" of others in the horizontal sense (i.e. with p as the x-axis)?
u^2 (t, df, p) Table: t, df, and p values for u^2 in regression c on (e,u,u*e,u^2) for Len | subset = S, method = N, fold = a1, set = 1
Len      t  df        p
 71  3.930  24  0.00063
 26  3.434  44  0.00131
122  3.565  16  0.00258
 24  3.162  47  0.00274
 27  3.101  58  0.00297
110  3.396  16  0.00369
101  3.179  19  0.00494
 35  2.870  59  0.00569
 84  2.460  27  0.02058
109  2.462  25  0.02108
 25  2.343  66  0.02216
 73  2.185  31  0.03654
 69  1.989  34  0.05474
 62  1.988  24  0.05828
 49  1.922  39  0.06193
 55  1.929  31  0.06294
 44  1.733  35  0.09186
 37  1.667  68  0.10004
 28  1.635  64  0.10697
 54  1.639  33  0.11063
 94  1.638  22  0.11567
 41  1.616  32  0.11598
 29  1.564  74  0.12219
 30  1.546  64  0.12705
 60  1.533  34  0.13462
 75  1.510  20  0.14672
 33  1.464  54  0.14893
 66  1.451  35  0.15580
 52  1.404  38  0.16830
 74  1.394  25  0.17562
 50  1.240  40  0.22236
 32  1.216  47  0.22989
 67  1.186  40  0.24280
 63  1.147  28  0.26105
 38  1.084  38  0.28513
 53  1.065  33  0.29463
 40  1.053  46  0.29789
 68  1.064  19  0.30072
 77  0.998  28  0.32687
 58  0.989  32  0.32996
 76  0.950  22  0.35222
 48  0.873  38  0.38816
 43  0.860  33  0.39616
 80  0.807  31  0.42564
 46  0.766  30  0.44947
 87  0.717  17  0.48337
 56  0.679  31  0.50249
 45  0.677  29  0.50349
 83  0.659  19  0.51765
 96  0.644  23  0.52619
 59  0.537  24  0.59645
 61  0.490  39  0.62669
 36  0.454  57  0.65159
 39  0.443  30  0.66063
 65  0.424  21  0.67621
120  0.390  16  0.70203
 95  0.325  12  0.75075
 51  0.288  45  0.77443
108  0.270  14  0.79079
 31  0.234  65  0.81572
 90  0.169  14  0.86841
111  0.124  18  0.90264
 34  0.078  73  0.93820
 47  0.065  45  0.94811
 89  0.061  11  0.95249
 42  0.002  31  0.99881
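One rough way to put a number on the "do we have ENOUGH" question is to count how many of the p's in the table fall below a cutoff and ask how surprising that count would be if u^2 had no effect at any length. The sketch below does that with a binomial tail probability. It rests on assumptions that are mine and not established anywhere in this thread: that the per-length regressions can be treated as independent tests, and that 0.05 and 0.01 are reasonable cutoffs for "good" and "great". The function names are hypothetical.

```python
from math import comb

# p values for u^2 copied from the table above (S, N, a1, 1), one per length.
P_VALUES = [
    0.00063, 0.00131, 0.00258, 0.00274, 0.00297, 0.00369, 0.00494, 0.00569,
    0.02058, 0.02108, 0.02216, 0.03654, 0.05474, 0.05828, 0.06193, 0.06294,
    0.09186, 0.10004, 0.10697, 0.11063, 0.11567, 0.11598, 0.12219, 0.12705,
    0.13462, 0.14672, 0.14893, 0.15580, 0.16830, 0.17562, 0.22236, 0.22989,
    0.24280, 0.26105, 0.28513, 0.29463, 0.29789, 0.30072, 0.32687, 0.32996,
    0.35222, 0.38816, 0.39616, 0.42564, 0.44947, 0.48337, 0.50249, 0.50349,
    0.51765, 0.52619, 0.59645, 0.62669, 0.65159, 0.66063, 0.67621, 0.70203,
    0.75075, 0.77443, 0.79079, 0.81572, 0.86841, 0.90264, 0.93820, 0.94811,
    0.95249, 0.99881,
]


def count_below(p_values, alpha):
    """Number of p values strictly below the cutoff alpha."""
    return sum(1 for p in p_values if p < alpha)


def binomial_upper_tail(n, k, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha).

    Under the (assumed) null that u^2 never matters and the tests are
    independent, each p falls below alpha with probability alpha.
    """
    return sum(
        comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k, n + 1)
    )


if __name__ == "__main__":
    n = len(P_VALUES)
    for alpha in (0.05, 0.01):  # assumed cutoffs for "good" and "great"
        k = count_below(P_VALUES, alpha)
        tail = binomial_upper_tail(n, k, alpha)
        print(f"alpha={alpha}: {k} of {n} below cutoff "
              f"(expected {n * alpha:.1f}), tail probability {tail:.3g}")
```

The same two functions could be run on the table for any other factor combination, which would give one crude way of comparing how far "left" each distribution of p sits.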
III. You wrote:
"In particular, you should not be considering any results from regressing c on (u,u^2) if e matters".
I'm sorry to plead ignorance but nothing you've ever posted before has prepared me to understand you here at all. What I mean by this is the following.
From the beginning we have been using a regression involving e, a regression involving u, and a regression involving (e,u) IN CONCERT, NOT as mutually exclusive alternatives.
First we had:
1a) ln(c/L) on ln(c/e)
1b) ln(c/L) on ln(c/u)
1c) ln(c/L) on (ln(c/e), ln(c/u))
Then, because of your reservations about these regressions, we simplified to
2a) c on e
2b) c on u
2c) c on (e,u)
and that actually improved matters.
And then finally, because of your very remarkable intuition that the "L/H" dichotomization of u should be replaced by adding u-related factors to the regressions themselves, we have arrived at
3a) c on (e,u,u*e), by addition of a u-factor to c on e
3b) c on (u,u^2), by addition of a u-factor to c on u
3c) c on (e,u,u*e,u^2), by addition of two u-factors to c on (e,u)
So ... if we never intended (1a-1c) as mutually exclusive alternatives, nor (2a-2c) as mutually exclusive alternatives, why all of a sudden do we have to treat (3a-3c) as mutually exclusive alternatives? Please recall here that the ultimate goal was always to develop predictors for logistic regressions, and back when we were doing logistic regressions, you said it's best to throw everything into the soup that one can think of ... that's why we had logistic regression predictors based on MORE THAN ONE linear regression.
Also, why is it NOT statistically legitimate to postulate that there are BOTH:
a) a relationship between c and u that, as you suspected, is best expressed by c on (u,u^2) because the relationship changes with increasing u, and
b) a relationship between c and e that, again as you expected, is best expressed by c on (e,u,u*e) because again, the relationship changes with increasing u?
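To make the three models (3a-3c) concrete, here is a minimal sketch of the predictor row each one would feed into a regression for a single observation. The function name and the model labels as strings are hypothetical, and the leading 1.0 is the constant term, on the assumption that the regression program expects the constant to be supplied as the first predictor.

```python
def design_row(model, e, u):
    """Return the predictor values for one observation under model 3a, 3b, or 3c.

    The leading 1.0 is the constant (intercept) column. The labels follow the
    numbering used in this post; the function itself is only an illustration.
    """
    if model == "3a":  # c on (e, u, u*e)
        return [1.0, e, u, u * e]
    if model == "3b":  # c on (u, u^2)
        return [1.0, u, u * u]
    if model == "3c":  # c on (e, u, u*e, u^2)
        return [1.0, e, u, u * e, u * u]
    raise ValueError(f"unknown model label: {model!r}")


if __name__ == "__main__":
    # Hypothetical values of e and u for a single observation.
    for label in ("3a", "3b", "3c"):
        print(label, design_row(label, e=2.0, u=3.0))
```

Written this way, the predictors of 3a and of 3b are each a subset of the predictors of 3c, so the u^2 column is the only thing 3c adds to 3a, and the e and u*e columns are what 3c adds to 3b.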
IV. You wrote:
"4. SEP = sqrt[Residual Sum of Squares / df] = sqrt[Residual Mean Square] Are you sure Ivo's program doesn't give that as optional output?"
He provides:
rsq() (where rsq = "sse" / "sst"), adjrsq(), sigmasq(), ybar(), sst(), k(), n()
and I have a feeling that rsq is what you're looking for. But if you can't say for sure, then what I'll do is a complete rerun that generates all of them and then compare each to the Excel Standard Error of Prediction for the entire regression (not for any particular coefficient).
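For the comparison against Excel, the quantity in the quoted formula can also be computed directly from the residuals, which gives an independent check on whichever output turns out to match. The sketch below fits c on e by ordinary least squares on made-up numbers and reports the residual mean square, its square root (SEP), and R^2. The data and function name are hypothetical. It is only my guess, not something I have confirmed from the program's documentation, that sigmasq() corresponds to the residual mean square, in which case SEP would be sqrt(sigmasq) and not rsq.

```python
from math import sqrt


def simple_regression_summary(x, y):
    """Least-squares fit of y on (1, x), returning a dict of summary values.

    SEP follows the formula quoted above: sqrt(residual sum of squares / df),
    with df = n - k and k = 2 coefficients (constant and slope).
    """
    n = len(x)
    k = 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    residual_mean_square = sse / (n - k)
    return {
        "slope": slope,
        "intercept": intercept,
        "sse": sse,
        "sst": sst,
        "residual_mean_square": residual_mean_square,
        "sep": sqrt(residual_mean_square),
        "r_squared": 1.0 - sse / sst,
    }


if __name__ == "__main__":
    # Hypothetical values standing in for e (predictor) and c (response).
    e = [1.0, 2.0, 3.0, 4.0]
    c = [2.0, 4.0, 5.0, 8.0]
    for name, value in simple_regression_summary(e, c).items():
        print(f"{name}: {value:.4f}")
```

Note that SEP is in the units of c, while R^2 is a unitless ratio, so only one of them can plausibly line up with Excel's standard error for the whole regression.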
V. You wrote:
"5. Let the constant in the input to Ivo's program default to 1."
OK.
VI. You wrote:
"6. I have a hunch that Het may be related to a Subset x Fold interaction, where the d.v. is the average slope."
That would be wonderful, but wouldn't we need 20-odd folds to show it?
VII. You wrote:
"7. Your forthcoming explanation "using reasoning based on the behavior of the average slope Auq of c on (u,u^2) and the covariance AubC of e and u in c on (u,e,u*e)" will probably go right over my head, because I have only a hunch about what the average slope Auq of c on (u,u^2) might mean, and not a clue about what the covariance AubC of e and u in c on (u,e,u*e) might mean."
I will hold off on this until I have read your response to my remarks in (III) above. If the rules of the game have changed so that we can't use c on (u,u^2) IN CONCERT with c on (e,u,u*e), then I obviously can't base any argument on the behavior of the average slope of the former when it's regressed against available lengths and the behavior of the covariance of the latter (covariance of e and u) when it's regressed against available lengths. I have to be able to use both behaviors to make the argument.
VIII. Thanks as always for your continued patience, tolerance, and willingness to consider these matters.