Date: Dec 8, 2012 7:44 PM
Author: Halitsky
Subject: I think I understand; if so, then here's what I expect you'll agree I should do next

Thanks very much for taking the time to explain.

It's a funny kind of Catch-22, in the following sense: we CAN use all three of 3a-c IN CONCERT to generate predictors for our logistic regressions, so long as we don't ask whether 3a-c are themselves individually justifiable statistically. Once we ask that question of each of 3a-c, we can no longer use them in concert to generate predictors. (Kind of "Heisenbergian", in the sense that the very act of observing constrains what we can accurately observe.)

Well, now that I understand, I think we're OK, for the following reason. Suppose I generate the same ordered p table as I just presented for all of the slope coefficients of 3c (u, e, u*e, as well as u^2), and of course at all values of subset x method x fold x set. Then one of the following outcomes can be expected:

i) we will not only find statistical goodness for all four slope coefficients individually, but we will find statistical goodness ONLY where we want it ... say, for the sake of discussion, just in N1S and maybe N2S, but not N3S, and not in any cell involving subset=C and/or method=R;

ii) we will find statistical goodness, but not where we expect it ... say, for the sake of discussion, that we find it in no R cells, but we do find it in some C cells as well as some S cells. (In this case, we would have to step back scientifically and think about what might be going on ... maybe mutation plays a deforming role in some cases more than others.)

iii) we don't find statistical goodness anywhere, or, if we do, we find it equally in corresponding R and N cells, which would mean that we might have discovered something about the random distribution of certain bioinformatic variables, but not much else.
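For concreteness, here is a minimal sketch of what an ordered p table over the subset x method x fold x set grid might look like. The cell labels follow the post's notation, but the fold labels are hypothetical and the p-values are random stand-ins for the slope-coefficient p's that real fits of 3c would produce:

```python
# Hypothetical sketch of the "ordered p table" over the
# subset x method x fold x set grid. Labels follow the post's notation
# where possible; fold labels and all p-values are invented stand-ins.
from itertools import product
import random

random.seed(1)
cells = list(product(["C", "S"],            # subset
                     ["R", "N"],            # method
                     [1, 2, 3],             # fold (hypothetical labels)
                     ["N1", "N2", "N3"]))   # set
# one p-value per slope coefficient of 3c, per cell
p = {cell: {coef: random.random() for coef in ["u", "e", "u*e", "u^2"]}
     for cell in cells}
# the ordered p table for, say, u^2: all cells sorted by that coefficient's p
ordered_u2 = sorted(cells, key=lambda cell: p[cell]["u^2"])
```

Outcomes (i)-(iii) then amount to asking which cells sit at the small-p end of each such ordering.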

If (i) is the outcome, we will have achieved our goal of statistically justifying the single regression of c on (u, e, u*e, u^2), which we can then use to generate predictors for our logistic regressions.
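A minimal sketch of the per-cell step, assuming ordinary least squares: fit c on (u, e, u*e, u^2) and extract a two-sided p-value for each slope coefficient. The variable names follow the post; the data below are synthetic and purely illustrative:

```python
# Sketch: fit regression 3c, c ~ u + e + u*e + u^2, by OLS and return
# two-sided t-test p-values for the four slope coefficients.
# Data below are synthetic; only the variable names come from the post.
import numpy as np
from scipy import stats

def slope_pvalues(u, e, c):
    X = np.column_stack([np.ones_like(u), u, e, u * e, u ** 2])
    beta, *_ = np.linalg.lstsq(X, c, rcond=None)
    resid = c - X @ beta
    dof = len(c) - X.shape[1]
    sigma2 = resid @ resid / dof
    # standard errors from the diagonal of sigma^2 * (X'X)^-1
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), dof)
    return dict(zip(["const", "u", "e", "u*e", "u^2"], p))

rng = np.random.default_rng(0)
u = rng.normal(size=200)
e = rng.normal(size=200)
c = (1.0 + 2.0 * u - 1.5 * e + 0.5 * u * e + 0.8 * u ** 2
     + rng.normal(scale=0.5, size=200))
pvals = slope_pvalues(u, e, c)
```

"Statistical goodness" in a cell would then correspond to all four slope p's clearing whatever threshold is agreed on.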

Furthermore, if each of the four coefficients is statistically "good" in the sense of the last paragraph of our last post, then it "costs" us nothing to drop regressions 3a and 3b, because neither of those regressions contains any variable that is not already in 3c.

And finally, we can continue to investigate the following "good" het results, since all of them involve average slopes or covar arising from 3c:

             a1      a3      b1      b47     c1      c2
             C   S   C   S   C   S   C   S   C   S   C   S   "Het"
  N1 Aubqe   H   H   L   L   H   H   H   H   L   L   L   L     0
  N3 Aubqe   L   L   H   H   L   L   H   H   H   H   L   L     0
  N3 Aubqu   L   H   H   H   H   H   L   H   L   H   L   H     4
  N1 AubqC   H   L   H   L   H   L   H   L   H   L   H   L     6
  N2 AubqC   H   L   H   L   H   L   H   L   H   L   H   L     6
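If I'm reading the table above correctly, the "Het" column counts, for each row, how many of the six (C, S) column pairs disagree (H vs. L). A minimal check of that reading, using a hypothetical helper `het` and three of the rows as quoted:

```python
# Assumption: "Het" = number of the six (C, S) pairs in a row that differ.
# Rows below are copied from the table; the helper name is made up.
def het(pairs):
    """pairs: list of (C, S) letters for a1, a3, b1, b47, c1, c2."""
    return sum(1 for c_val, s_val in pairs if c_val != s_val)

rows = {
    "N1 Aubqe": [("H","H"), ("L","L"), ("H","H"), ("H","H"), ("L","L"), ("L","L")],
    "N3 Aubqu": [("L","H"), ("H","H"), ("H","H"), ("L","H"), ("L","H"), ("L","H")],
    "N1 AubqC": [("H","L")] * 6,
}
# het(rows["N1 Aubqe"]) -> 0
# het(rows["N3 Aubqu"]) -> 4
# het(rows["N1 AubqC"]) -> 6
```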

So, please let me know if generating the ordered p tables for all values of subset x method x fold x set and all slope coefficients of 3c is what I should do next (and also perhaps for the covar of 3c).

If so, please let me know:

iv) whether it would help you for me to first do a full rerun to capture SEPs via Ivo's sigmasq's, in case you need them as adjunct information for evaluation;

v) whether I can programmatically generate the comparison of the plots, or whether it's actually worthwhile to visually compare every case.

If not, please let me know what you think I should do next.

Also, with respect to point (v), once I have the data generated, perhaps you can send me two sample plots offline so I can see what the IOTT effect actually looks like (so I "know what it is when I see it" ... heh heh heh).

Thanks as always for your continued consideration of these matters.