Date: Dec 8, 2012 7:44 PM
Author: Halitsky
Subject: I think I understand; if so, then here’s what I expect you’ll agree I should do next

Thanks very much for taking the time to explain.

It's a funny kind of Catch-22, in the following sense. We CAN use all
three of 3a-c IN CONCERT to generate predictors for our logistic
regressions, so long as we don't inquire as to whether 3a-c are
themselves individually justifiable statistically. Once we ask that
question of each of 3a-c, we can no longer use them in concert to
generate predictors. (Kind of "Heisenbergian", in the sense that the
very act of observing constrains what we can accurately observe.)

Well, now that I understand, I think we're OK for the following reason.

Suppose I generate the same ordered p table as I just presented for
all of the slope coefficients of 3c (u, e, u*e, and u^2), and
of course at all values of subset x method x fold x set. Then one of
the following outcomes can be expected:

i) we will not only find statistical goodness for all of the four
slope coefficients individually, but we will find statistical goodness
ONLY where we want it ... say for the sake of discussion just in N1S
and maybe N2S but not N3S, and not in any cell involving subset=C and/
or method=R;

ii) we will find statistical goodness, but not where we expect it ...
say for the sake of discussion that we find it in no R cells, but we
do find it in some C cells as well as some S cells. (In this case, we
would have to step back scientifically and think about what might be
going on ... maybe mutation plays a deforming role in some cases more
than others.)

iii) we don't find statistical goodness anywhere, or if we do, we find
it equally in corresponding R and N cells, which would mean that we
might have discovered something about the random distribution of
certain bioinformatic variables, but not much else.
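
As a rough sketch of the bookkeeping behind such a table: the snippet below collects one p-value per cell (subset x method x fold x set) and slope coefficient, then sorts ascending. It assumes "ordered" simply means sorted by p. The factor levels are read off this post (C/S, N/R, the six fold labels, sets 1-3), and the function name ordered_p_table and the input layout are hypothetical, not anything from the actual runs.

```python
from itertools import product

# Assumed factor levels, read off this post; the real runs may differ.
SUBSETS = ["S", "C"]
METHODS = ["N", "R"]
FOLDS = ["a1", "a3", "b1", "b47", "c1", "c2"]
SETS = [1, 2, 3]
COEFFS = ["u", "e", "u*e", "u^2"]  # slope coefficients of regression 3c


def ordered_p_table(p_values):
    """Return (p, cell, coefficient) rows sorted by ascending p.

    p_values maps (subset, method, fold, set, coefficient) -> p-value.
    Cells missing from the mapping are skipped rather than guessed.
    """
    rows = []
    for subset, method, fold, set_, coeff in product(
        SUBSETS, METHODS, FOLDS, SETS, COEFFS
    ):
        key = (subset, method, fold, set_, coeff)
        if key in p_values:
            rows.append((p_values[key], key[:4], coeff))
    return sorted(rows, key=lambda row: row[0])


if __name__ == "__main__":
    # Made-up p-values, purely to show the call shape.
    demo = {
        ("S", "N", "a1", 1, "u"): 0.03,
        ("C", "R", "b1", 2, "u^2"): 0.50,
        ("S", "N", "c2", 3, "e"): 0.001,
    }
    for p, cell, coeff in ordered_p_table(demo):
        print(p, cell, coeff)
```

With the rows in this form, outcomes (i)-(iii) come down to checking which cells sit at the top of the sorted list.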

In the case that (i) is the outcome, we will have achieved our goal of
statistically justifying the single regression c on (u,e,u*e,u^2),
which we can then use to generate predictors for our logistic
regressions.
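
For concreteness, here is a minimal sketch of the model form c on (u, e, u*e, u^2), fitted by ordinary least squares through the normal equations. It assumes an intercept is included, which this post does not state, and the names design_row, solve and fit_3c are hypothetical. It returns point estimates only; the p-values for the ordered tables would still come from the usual regression output.

```python
def design_row(u, e):
    """One row of the design matrix: intercept (assumed), u, e, u*e, u^2."""
    return [1.0, u, e, u * e, u * u]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_3c(us, es, cs):
    """Least-squares coefficients for c ~ 1 + u + e + u*e + u^2."""
    rows = [design_row(u, e) for u, e in zip(us, es)]
    p = 5
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * c for r, c in zip(rows, cs)) for i in range(p)]
    names = ["intercept", "u", "e", "u*e", "u^2"]
    return dict(zip(names, solve(xtx, xty)))
```

Since 3c already carries u, e, u*e and u^2, a single fit of this shape supplies all four slopes at once.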

Furthermore, if each of the four coefficients is statistically "good"
in the sense of the last paragraph of our last post, then it "costs"
us nothing to drop regressions 3a and 3b because there are no new
variables in either of these regressions that aren't in 3c.

And finally, we can continue to investigate the following "good" het
results, since all of them involve average slopes or covar arising
from 3c:

            a1    a3    b1    b47   c1    c2
            C S   C S   C S   C S   C S   C S   "Het"

N1 Aubqe    H H   L L   H H   H H   L L   L L     0
N3 Aubqe    L L   H H   L L   H H   H H   L L     0

N3 Aubqu    L H   H H   H H   L H   L H   L H     4

N1 AubqC    H L   H L   H L   H L   H L   H L     6

N2 AubqC    H L   H L   H L   H L   H L   H L     6
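
The "Het" column is not defined in this post. One reading that reproduces all five values above is a count of the folds in which the C and S calls disagree. The sketch below checks that reading against the table; the function name het_count is hypothetical.

```python
def het_count(calls):
    """Count folds whose C and S calls differ.

    calls is a whitespace-separated string of H/L calls in the table's
    column order: C S for a1, then a3, b1, b47, c1, c2.
    """
    tokens = calls.split()
    pairs = zip(tokens[0::2], tokens[1::2])  # (C, S) per fold
    return sum(1 for c, s in pairs if c != s)


# Rows copied from the table above.
TABLE = {
    "N1 Aubqe": "H H L L H H H H L L L L",
    "N3 Aubqe": "L L H H L L H H H H L L",
    "N3 Aubqu": "L H H H H H L H L H L H",
    "N1 AubqC": "H L H L H L H L H L H L",
    "N2 AubqC": "H L H L H L H L H L H L",
}

if __name__ == "__main__":
    for label, calls in TABLE.items():
        print(label, het_count(calls))
```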

So, please let me know if generating the ordered p tables for all
values of subset x method x fold x set and all slope coefficients of
3c is what I should do next (and also perhaps for the covar of 3c.)

If so, please let me know:

iv) whether it would help you for me to first do a full rerun to
capture SEPs via Ivo's sigmasq's, in case you need them as adjunct
information for evaluation.

v) if I can programmatically generate the comparison of the plots, or
if it's actually worthwhile to visually compare every case.

If not, please let me know what you think I should do next.

Also, with respect to point (v), once I have the data generated,
perhaps you can send me two sample plots offline so I can see what the
IOTT effect actually looks like (so I "know what it is when I see
it ... heh heh heh.")

Thanks as always for your continued consideration of these matters.