First, thanks VERY much for the explicit algorithm. I can code that and will do so. You can expect results for all six folds in a few days (I'm coding some nasty stuff for the Navy right now, which is requiring me to contribute some "pro bono" debug time after-hours).
Second, the multiplicity factor is already 72, and was 72 in the files I sent in my off-line email of 9/15, referencing my s.s.m post of 9/15 at 1:54.
Third, you wrote:
"The plot makes it obvious that the only possible split for these intervals is the first vs the rest."
If this is the case, then must it not also be the case that it is meaningful to contrast:
the "first" (lowest) CI, i.e. the interval for 2:R2 C uH;
the "last" (highest) CI, i.e. the interval for 2:R2 S uH?
If so, then this is a critical result which will make both Jacques and Arthur quite happy, for the following reasons.
The difference between the non-random cell "2 C uH" and the non-random cell "2 S uH" is the difference between:
a) the "core" set S of 119 dicodons containing the original 63 dicodons PLUS the energetic equivalents (reverse complements) of these 63, without duplicates or dicodons containing "stop" codons (that's why there are 119, not 126).
b) the "complement" set C containing the remaining 1058 dicodons which encode the same dipeptides as the "core" 119 (note that the term "complement" here is used in a generic sense which is different from the technical sense of "complement" in the term "reverse complement").
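For anyone who wants to reproduce the construction of S and C, here is a minimal sketch of how I read the two definitions above. It is not the code actually used to build the 119 and 1058. The function names are made up for illustration, and it assumes the standard genetic code, that a dicodon is read in-frame as two codons, and that "containing stop" means either codon translates to a stop.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering (assumption: the
# standard table is the relevant one for these dicodons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def dipeptide(dicodon):
    """Translate a 6-base dicodon in-frame; '*' marks a stop."""
    return CODON_TABLE[dicodon[:3]] + CODON_TABLE[dicodon[3:]]


def core_set(original):
    """Hypothetical helper: originals plus their reverse complements,
    de-duplicated (via set union), with stop-containing dicodons dropped."""
    candidates = set(original) | {reverse_complement(d) for d in original}
    return {d for d in candidates if "*" not in dipeptide(d)}


def complement_set(core):
    """Hypothetical helper: every dicodon outside the core that encodes
    a dipeptide also encoded by some dicodon in the core."""
    peptides = {dipeptide(d) for d in core}
    all_dicodons = ("".join(p) for p in product(BASES, repeat=6))
    return {d for d in all_dicodons
            if d not in core and dipeptide(d) in peptides}


# Toy input only, not the real 63 dicodons.
toy_core = core_set({"ATGGCC", "TTATTT"})
toy_complement = complement_set(toy_core)
```

In the toy input, the reverse complement of TTATTT is AAATAA, which contains the stop codon TAA and is therefore dropped. That is the same kind of loss that takes the real set from 126 down to 119.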
So, from Jacques' "energetic" perspective, the plot tells us that we do NOT get meaningful results if we split the set "2 S" (containing 119 dicodons) into
the original 63 dicodons in set "1 S"
the 60 energetic equivalents of these dicodons in set "3 S"
(Why does the plot tell us this? Because we do not get a "CI contrast" between (1:R1 S uH) and (1:R1 C uH), nor between (3:R3 S uH) and (3:R3 C uH).)
And therefore, the plot tells Jacques that the system appears to be taking advantage of an overall "superset" of dicodons with certain energetic properties, regardless of whether these dicodons come from the original over-represented 63 or the 60 non-over-represented energetic equivalents of these 63. (For reasons I won't go into here, it is not necessary for the 60 to be over-represented in order for their energetic properties to be used advantageously.)
Turning now to an evaluation of the plot from Arthur's perspective, the plot tells us that for the purpose of developing predictors for investigation of structural alignability via logistic regressions:
we CAN'T use results obtained from cell "3 S uH"
we CAN'T use results obtained from cell "1 S uH"
we CAN use results obtained from cell "2 S uH".
And inasmuch as the predictive performance of our logistic regression on the a1 fold was obtained using results from cell "1 S uH", we SHOULD be able to improve this performance by re-predicting using results from cell "2 S uH".