Here are the results of executing the (Y,Z,Y-Z) protocol on the eS (slope) coefficient of our regression Re, i.e. ln(c/L) regressed on ln(c/e):
Y     .239        2865   (at uL)
Z     .605       14342   (at uH)
Y-Z   3.8E-10     5749   (across uL,uH)

                   Y        Z
sum(1S-1C)       -0.552   -0.243
sum(R1S-R1C)      0.350    0.138
diff             +0.902   +0.381
-------------------------------
Y     .809        7724   (at uL)
Z     .308       27594   (at uH)
Y-Z   5.5E-23    18276   (across uL,uH)

                   Y        Z
sum(2S-2C)       -0.079   -0.455
sum(R2S-R2C)      0.098    0.273
diff             +0.177   +0.728
-------------------------------
Y     .282         936   (at uL)
Z     .074       16978   (at uH)
Y-Z   9.0E-08     1830   (across uL,uH)

                   Y        Z
sum(3S-3C)       -0.444   -0.973
sum(R3S-R3C)      0.390    0.350
diff             +0.834   +1.323
-------------------------------
If you look at the three lines labelled "diff", you'll see immediately why we got weaker and more inconsistent results than desired when we attempted to predict structural alignability using a predictor derived from the eS coefficient of Re for the dicodon subset 1S. (If you go back in the s.s.m "archive" here, you'll see that with that 1S predictor and u-level = uA (all), we really only got believable results for the one fold a1.)
In particular, the three "diff" lines together imply that we will get our biggest bang for the buck if we use predictors derived from the eS coefficients from runs on the dicodon subset 2S or 3S with u-level set to uH, NOT the dicodon subset 1S.
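For reference, each "diff" entry is the randomized sum minus the non-randomized sum in the same column. A short Python sketch reproduces the three "diff" lines; the dictionary layout and the names are illustrative only, and the numbers are copied from the tables above:

```python
# (non-randomized sum, randomized sum) for each dicodon subset and column,
# copied from the tables above. Column Y is uL, column Z is uH.
sums = {
    ("1S", "Y"): (-0.552, 0.350),
    ("1S", "Z"): (-0.243, 0.138),
    ("2S", "Y"): (-0.079, 0.098),
    ("2S", "Z"): (-0.455, 0.273),
    ("3S", "Y"): (-0.444, 0.390),
    ("3S", "Z"): (-0.973, 0.350),
}

def diff(nonrandom_sum, random_sum):
    """Randomized sum minus non-randomized sum, rounded to the table's precision."""
    return round(random_sum - nonrandom_sum, 3)

diffs = {key: diff(*pair) for key, pair in sums.items()}

for (subset, column), value in diffs.items():
    print(f"{subset} {column}: diff = {value:+.3f}")
```

Within the Z (uH) column, the diffs for 2S and 3S are both larger than the diff for 1S.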
Also, the sum and diff lines for Z for 2:R2 and 3:R3 accord precisely with the fundamental energetic hypothesis we've been pursuing. This is because these lines indicate that:
a) for non-randomly constructed uH (i.e. Z) data, the eS coefficient of Re is essentially negative
b) for randomly constructed uH (i.e. Z) data, the eS coefficient of Re is essentially positive.
And this is exactly as expected: the representation level (ln(c/L)) of our dipeptides of interest correlates NEGATIVELY with the energetic level (ln(c/e)) of the dicodons underlying these dipeptides.
Why? Because the underlying O-F index was constructed so that energetic favorability INCREASES as the index value DECREASES.
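To make the sign convention concrete, here is a minimal sketch of how a slope coefficient like eS comes out negative when ln(c/L) falls as ln(c/e) rises. The four data points are made up purely for illustration and are NOT taken from our runs; `ols_slope` is a hypothetical helper, not the code used to produce the tables:

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Made-up illustrative values (assumption): representation drops as the
# energetic index rises.
ln_c_over_e = [0.1, 0.5, 0.9, 1.3]   # predictor: energetic level
ln_c_over_L = [0.8, 0.5, 0.3, -0.1]  # response: representation level

eS = ols_slope(ln_c_over_e, ln_c_over_L)
print("slope is negative:", eS < 0)
```

A negative slope here means that higher representation goes with lower index values, which under the O-F construction are the energetically more favorable ones.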
ASSUMING that the above analysis is defensible, it bodes well for improving our structural alignability predictions, for two main reasons:
a) we can now use two sets of predictors, derived BOTH from the eS coefficients of Re for the (2S,uH) data AND from the eS coefficients of Re for the (3S,uH) data, rather than just a predictor derived from one set of data;
b) we can actually look at the underlying sequences and find those that belong to the INTERSECTION of the (2S,uH) data and the (3S,uH) data, and then compare alignability results on these most "highly-valued" sequences vs other "less-valued" sequences.
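The intersection step in (b) could be sketched as follows. The sequence identifiers are hypothetical placeholders; the real sets would come from the sequences underlying the (2S,uH) and (3S,uH) runs:

```python
# Hypothetical sequence identifiers (assumption, for illustration only).
seqs_2S_uH = {"seq_a", "seq_b", "seq_c", "seq_d"}
seqs_3S_uH = {"seq_b", "seq_d", "seq_e"}

# "Highly-valued": sequences present in BOTH data sets.
highly_valued = seqs_2S_uH & seqs_3S_uH

# "Less-valued": sequences present in only one of the two data sets.
less_valued = (seqs_2S_uH | seqs_3S_uH) - highly_valued

print("highly-valued:", sorted(highly_valued))
print("less-valued:  ", sorted(less_valued))
```

Alignability results would then be compared between the two groups.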
But note the "ASSUMING" in the above paragraph - I realize that you may well conclude that:
c) the tables for eS exhibited above do not support ANY meaningful conclusions about the role of energetics in dipeptide over-representation;
d) or that they support entirely different conclusions from the ones I've drawn above.
Thanks as always for the time you'll spend considering the above. In the meantime, I'll be doing the (Y,Z,Y-Z) analysis on the remaining six coefficients.