It's a funny kind of Catch-22, in the following sense. We CAN use all three of 3a-c IN CONCERT to generate predictors for our logistic regressions, so long as we don't inquire as to whether 3a-c are themselves individually justifiable statistically. Once we ask that question of each of 3a-c, we can no longer use them in concert to generate predictors. (Kind of "Heisenbergian", in the sense that the very act of observing constrains what we can accurately observe.)
Well, now that I understand, I think we're OK, for the following reason.
Suppose I generate the same ordered p table as I just presented for all of the slope coefficients of 3c (u, e, u*e, as well as u^2), and of course at all values of subset x method x fold x set. Then one of the following outcomes can be expected:
i) we will not only find statistical goodness for all four slope coefficients individually, but we will find it ONLY where we want it ... say, for the sake of discussion, just in N1S and maybe N2S but not N3S, and not in any cell involving subset=C and/or method=R;
ii) we will find statistical goodness, but not where we expect it ... say, for the sake of discussion, that we find it in no R cells, but we do find it in some C cells as well as some S cells. (In this case, we would have to step back scientifically and think about what might be going on ... maybe mutation plays a deforming role in some cases more than others.)
iii) we don't find statistical goodness anywhere, or if we do, we find it equally in corresponding R and N cells, which would mean that we might have discovered something about the random distribution of certain bioinformatic variables, but not much else.
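For concreteness, here is a rough sketch of the per-cell p-table generation I have in mind, in Python. Everything here is illustrative: the cell labels, the variable names (u, e, c), the sample size, and the planted signal are all stand-ins, since the real data layout would of course cover every subset x method x fold x set combination.

```python
import numpy as np
from scipy import stats

def slope_pvalues(u, e, c):
    """OLS fit of c on (u, e, u*e, u^2); return two-sided p-values
    for the four slope coefficients (intercept dropped)."""
    X = np.column_stack([np.ones_like(u), u, e, u * e, u ** 2])
    beta = np.linalg.lstsq(X, c, rcond=None)[0]
    resid = c - X @ beta
    dof = len(c) - X.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    pvals = 2 * stats.t.sf(np.abs(beta / se), dof)
    return dict(zip(["u", "e", "u*e", "u^2"], pvals[1:]))

# Stand-in cells and synthetic data; the real run would loop over
# every subset x method x fold x set cell.
rng = np.random.default_rng(0)
rows = []
for cell in ["N1/S", "N1/C", "N3/S"]:
    u, e = rng.normal(size=200), rng.normal(size=200)
    c = 2.0 * u + 0.5 * u * e + rng.normal(size=200)  # planted signal
    for coef, p in slope_pvalues(u, e, c).items():
        rows.append((p, cell, coef))
rows.sort()                                           # the ordered p table
for p, cell, coef in rows:
    print(f"{p:.3g}\t{cell}\t{coef}")
```

The ordering step at the end is what turns the per-cell fits into the kind of ordered p table described above.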
If (i) is the outcome, we will have achieved our goal of statistically justifying the single regression of c on (u, e, u*e, u^2), which we can then use to generate predictors for our logistic regressions.
Furthermore, if each of the four coefficients is statistically "good" in the sense of the last paragraph of our last post, then it "costs" us nothing to drop regressions 3a and 3b, because neither of those regressions contains a variable that isn't already in 3c.
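In case it helps make the predictor step concrete, here is a hedged sketch of outcome (i)'s endgame: fit 3c by OLS, take its fitted value as a single predictor, and feed that into a logistic regression. The binary outcome y, the coefficients, and the sample size are all invented for illustration; I'm not presuming this is exactly how the downstream logistic step is set up.

```python
import numpy as np
from scipy.optimize import minimize

def fit_logistic(x, y):
    """Maximum-likelihood logistic fit of binary y on a scalar predictor x;
    returns (intercept, slope)."""
    def nll(b):
        z = b[0] + b[1] * x
        # negative log-likelihood, written stably via log(1 + e^z)
        return np.sum(np.logaddexp(0.0, z)) - np.sum(y * z)
    return minimize(nll, np.zeros(2), method="BFGS").x

rng = np.random.default_rng(1)
u, e = rng.normal(size=500), rng.normal(size=500)
X = np.column_stack([np.ones(500), u, e, u * e, u ** 2])
c = X @ np.array([0.0, 2.0, -1.0, 0.5, 0.3]) + rng.normal(size=500)
beta = np.linalg.lstsq(X, c, rcond=None)[0]        # fit regression 3c
c_hat = X @ beta                                   # the 3c-derived predictor
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-c_hat))).astype(float)
b0, b1 = fit_logistic(c_hat, y)
print(f"intercept={b0:.2f}, slope={b1:.2f}")       # slope should sit near 1
```

The point is just that once 3c is individually justified, its fitted value is the only ingredient the logistic step needs.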
And finally, we can continue to investigate the following "good" het results, since all of them involve average slopes or covar arising from 3c:
              a1    a3    b1    b47   c1    c2
              C  S  C  S  C  S  C  S  C  S  C  S  "Het"
N1  Aubqe     H  H  L  L  H  H  H  H  L  L  L  L    0
N3  Aubqe     L  L  H  H  L  L  H  H  H  H  L  L    0
N3  Aubqu     L  H  H  H  H  H  L  H  L  H  L  H    4
N1  AubqC     H  L  H  L  H  L  H  L  H  L  H  L    6
N2  AubqC     H  L  H  L  H  L  H  L  H  L  H  L    6
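As a sanity check on my reading of the table, the "Het" column appears to count the coefficients whose C and S calls disagree in a given row; a minimal sketch:

```python
def het(cs_pairs):
    """Count the (C, S) call pairs that disagree."""
    return sum(1 for c_call, s_call in cs_pairs if c_call != s_call)

# Row "N3 Aubqu" from the table: (C, S) calls for a1, a3, b1, b47, c1, c2.
n3_aubqu = [("L", "H"), ("H", "H"), ("H", "H"),
            ("L", "H"), ("L", "H"), ("L", "H")]
print(het(n3_aubqu))  # 4, matching the table's Het value
```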
So, please let me know if generating the ordered p tables for all values of subset x method x fold x set and all slope coefficients of 3c is what I should do next (and perhaps also for the covar of 3c).
If so, please let me know:
iv) whether it would help you for me to first do a full rerun to capture SEPs via Ivo's sigmasq's, in case you need them as adjunct information for evaluation.
v) whether I can programmatically generate the comparison of the plots, or if it's actually worthwhile to visually compare every case.
If not, please let me know what you think I should do next.
Also, with respect to point (v), once I have the data generated, perhaps you can send me two sample plots offline so I can see what the IOTT effect actually looks like (so I "know what it is when I see it" ... heh heh heh).
Thanks as always for your continued consideration of these matters.