Date: Dec 14, 2012 12:45 AM
Author: Halitsky
Subject: Re your questions about the plots sent off-line (and the underlying data posted here 12/13 at 10:33am)

You wrote:

"What are the 1..36? All the other values are monotone increasing.
Did they come that way, or did you sort them?

The best way to see the difference between the plots is to take cols
2 & 3 as x & y coordinates, then plot the points along with a line
from (0,0) to (1,1). The S-plot is mostly below the line. The C-plot
is mostly above. I'm not as struck by that difference as you seem to
be. Where did the numbers come from?"

Answers

1. The 1...36 are irrelevant if the data are plotted the way you
suggest; they were just a way of giving Excel an x-axis to plot
against. And thanks very much for the suggestion as to how to plot in
cases like this; of course it never would have occurred to me to do
it that way, and I was delighted to see that Excel lets you do it
pretty easily (for a Microsoft-owned product, that is.)

2. Yes, columns 2 and 3 were sorted.

3. Here's where the numbers came from.

Recall that:

a) the fold x subset "het" data which I presented for Aubqe on L at
MoSS N, set 1:

Slopes of Regressions of Aubqe on Length (L)
for each Fold x Subset | Set 1, Method N

Fold x Subset |    # of    Slope of
Set 1, Meth N      L's     Aubqe on L
a3_S_1_N            70     -0.000188
c1_C_1_N           101     -0.000026
a3_C_1_N            48      0.000052
c1_S_1_N           101      0.000266
c2_S_1_N            96      0.000421
c2_C_1_N            95      0.000550
b47_C_1_N           99      0.000618
a1_S_1_N           101      0.001069
b47_S_1_N           99      0.001079
b1_S_1_N            31      0.001119
b1_C_1_N            28      0.002015
a1_C_1_N           101      0.002210

were selected (because of their low associated "het" p) from the fold
x subset data for the regression Aubqe on L computed for ALL six
combinations of Set x MoSS.

b) to get all the fold x subset Aubqe on L data for all combinations
of Set x MoSS, we obviously had to first regress c on (e,u,u*e,u^2) at
each Len x Set x MoSS x Fold x Subset.

Call this entire set of underlying data for c on (e,u,u*e,u^2) the
"Rubq-base", and instead of computing the regression Aubqe on L
over the entire Rubq-base, compute the regression ueSlope on (ubar,
ebar) over the entire Rubq-base, where:

i) ueSlope is the slope of the u*e term in c on (e,u,u*e,u^2);

ii) ubar is the mean of "u" (= u/(1+u)) at each L, and ebar is the
mean of "e" at each L.
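
For anyone who wants to see what "the slope of the u*e term" means mechanically, here is a rough sketch in plain Python. It is only an illustration under stated assumptions: the data are synthetic (not taken from the Rubq-base), the function name fit_c_on_e_u is hypothetical, and it returns coefficients only, with no p-values. It fits c = b0 + b1*e + b2*u + b3*u*e + b4*u^2 by ordinary least squares, and b3 is the quantity called ueSlope above.

```python
def fit_c_on_e_u(e, u, c):
    """Least-squares fit of c = b0 + b1*e + b2*u + b3*u*e + b4*u^2.

    Returns [b0, b1, b2, b3, b4]; b3 plays the role of ueSlope.
    """
    # One design-matrix row per observation: intercept, e, u, u*e, u^2.
    rows = [[1.0, ei, ui, ui * ei, ui * ui] for ei, ui in zip(e, u)]
    k = 5
    # Normal equations (X^T X) b = X^T c, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * ci for r, ci in zip(rows, c))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic, noise-free data on a 4 x 4 grid (an assumption for illustration).
e_vals = [float(i % 4) for i in range(16)]
u_vals = [float(i // 4) for i in range(16)]
c_vals = [1.0 + 2.0 * ei + 3.0 * ui + 0.5 * ui * ei - 1.0 * ui * ui
          for ei, ui in zip(e_vals, u_vals)]

coef = fit_c_on_e_u(e_vals, u_vals, c_vals)
ue_slope = coef[3]  # coefficient on the u*e term
```

In the scheme described above, one such fit would be run at each Len x Set x MoSS x Fold x Subset, and the resulting ueSlope values would then be regressed on (ubar, ebar).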

From each computation of ueSlope on (ubar, ebar) we have a pair of
slopes with a pair of associated probabilities, and therefore across
all combinations of Set x MoSS x Fold x Subset, we have 72 such pairs
of probabilities, or 144 probabilities in all.

DISREGARDING Fold and Set, divide these 144 probabilities into four
groups:

36 at subset S, Method N
36 at subset C, Method N
36 at subset S, Method R
36 at subset C, Method R
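
As a quick check on the bookkeeping, the counts 72, 144 and 36 can be reproduced by enumerating the cells. This is just a sketch; it assumes that the same six fold labels shown in the Set 1 table (a1, a3, b1, b47, c1, c2) occur in every Set x MoSS combination.

```python
from collections import Counter
from itertools import product

sets = (1, 2, 3)
methods = ("N", "R")                            # MoSS
folds = ("a1", "a3", "b1", "b47", "c1", "c2")   # assumed identical in every set
subsets = ("S", "C")

# One regression of ueSlope on (ubar, ebar) per cell.
cells = list(product(sets, methods, folds, subsets))

# Each regression yields two probabilities (one for ubar, one for ebar).
n_probs = 2 * len(cells)

# Group the probabilities by (subset, method), disregarding Fold and Set.
group_sizes = Counter()
for _set, method, _fold, subset in cells:
    group_sizes[(subset, method)] += 2
```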

Sort each of these groups independently (lowest to highest p), and
then pair off elements of these four groups as follows:

pair off the 36 from S,N with the 36 from C,N by corresponding rank
(from the sort of each group)

pair off the 36 from S,R with the 36 from C,R by corresponding rank
(from the sort of each group)

(Note (!!!!) that these pairings are DIFFERENT (!!!) from the
pairings of (S,N) with (S,R) and (C,N) with (C,R) which I presented in
my post of 12/13@12:33.)
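
The sort-then-pair step can be sketched as follows. The function name pair_by_rank and the three-element example lists are hypothetical and are not the actual p-values; the point is only that each group is sorted on its own and that the k-th smallest of one group is matched with the k-th smallest of the other.

```python
def pair_by_rank(group_a, group_b):
    """Sort each group independently (lowest to highest) and pair by rank."""
    if len(group_a) != len(group_b):
        raise ValueError("groups must be the same size to pair by rank")
    return list(zip(sorted(group_a), sorted(group_b)))


# Hypothetical toy values standing in for an (S,N) group and a (C,N) group.
toy_sn = [0.30, 0.10, 0.20]
toy_cn = [0.05, 0.50, 0.25]
toy_pairs = pair_by_rank(toy_sn, toy_cn)
```

Note that pairing this way discards any link between a given S p-value and the C p-value from the same fold and set; only the rank within each group matters.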

You will then have these two tables of paired p's (and the associated
plot "done your way", which I've sent offline):

SN,CN

0.004293565,0.000147868
0.009398,0.000235407
0.019790086,0.002576217
0.021645402,0.020854486
0.041148681,0.023919
0.056848093,0.041120964
0.169920851,0.042472596
0.236373,0.059794
0.248019846,0.079939524
0.277783068,0.087268176
0.281488299,0.13125994
0.287886,0.17489924
0.299769,0.180724763
0.299875026,0.185042614
0.360314613,0.207785097
0.370746358,0.21197145
0.406029587,0.228176227
0.43289,0.252242125
0.465398176,0.275296878
0.482382234,0.305134999
0.530897822,0.309388442
0.559333624,0.332112292
0.626424347,0.361024514
0.702399,0.41780334
0.741387901,0.423432022
0.768317356,0.476818276
0.820922877,0.542145
0.831159936,0.559098289
0.832584062,0.581960315
0.88900441,0.619627105
0.893789589,0.646265173
0.894253162,0.74717756
0.935126553,0.757530416
0.977748076,0.884119
0.980182674,0.900867429
0.984220184,0.938430375

SR,CR
0.000503944,0.00011982
0.00118415,0.012214573
0.041027523,0.029133944
0.052112332,0.048936138
0.054021335,0.05764761
0.057693811,0.05865896
0.068659527,0.064182305
0.083710757,0.088376406
0.094021303,0.107473805
0.130456898,0.147682873
0.21540961,0.162392478
0.236780945,0.181759433
0.236936513,0.201847347
0.269875322,0.210439736
0.294476424,0.226305355
0.315561395,0.227038784
0.319462902,0.255699197
0.327971706,0.288864935
0.463861812,0.302035139
0.479255866,0.312164668
0.564392402,0.388447922
0.577382726,0.397416524
0.579430243,0.434182601
0.588970805,0.438280224
0.61542756,0.516128733
0.629984706,0.614130775
0.698570658,0.675962212
0.719544247,0.689950901
0.732798731,0.735779895
0.813873971,0.778392333
0.883957837,0.800207872
0.888276157,0.870729822
0.888377668,0.911149831
0.917545651,0.93512393
0.977990461,0.941162349
0.980356048,0.986071449

So, depending on one's "IOT reaction" to the plot I've sent offline
for the two tables above, one might be willing to say that in general,
CN p's plot significantly lower than CR p's for equivalent SN's and
SR's.
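
For readers without the offline plot, one crude numerical stand-in is to count how many paired points fall below the line from (0,0) to (1,1), taking the S p-value as x and the C p-value as y. The sketch below uses only the first five rows of each table above to keep it short, and the function name count_below_diagonal is hypothetical; it is not a significance test, just a tally.

```python
def count_below_diagonal(pairs):
    """Count points (x, y) lying strictly below the line y = x."""
    return sum(1 for x, y in pairs if y < x)


# First five rows of the SN,CN table (x = SN p, y = CN p).
sn_cn = [
    (0.004293565, 0.000147868),
    (0.009398, 0.000235407),
    (0.019790086, 0.002576217),
    (0.021645402, 0.020854486),
    (0.041148681, 0.023919),
]

# First five rows of the SR,CR table (x = SR p, y = CR p).
sr_cr = [
    (0.000503944, 0.00011982),
    (0.00118415, 0.012214573),
    (0.041027523, 0.029133944),
    (0.052112332, 0.048936138),
    (0.054021335, 0.05764761),
]

below_n = count_below_diagonal(sn_cn)
below_r = count_below_diagonal(sr_cr)
```

Pasting in all 36 rows of each table gives the same tally for the full data.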

And this result, assuming you're willing to accept it, is extremely
important for the following reason.

It says that regardless of dicodon set 1,2,3, the (S,N) subsets
"evolved/were designed" (depending on your point of view, heh heh
heh) so that mutation away from these sets to (C,N) sets does NOT
change the predictive capacities of ubar and ebar in ueSlope on (ubar,
ebar) as much as the predictive capacities of ubar and ebar in ueSlope
on (ubar, ebar) are changed by the mutation of (S,R) sets to (C,R)
sets.

Or, to boil that statement down even further, the result says that we
have found a (relative) INVARIANT UNDER MUTATION for (S,N) sets that
does NOT exist for (S,R) sets. And the existence of this invariant
strongly suggests that the (S,N) subsets of dicodon sets 1,2,3 all
evolved to keep certain thermodynamic properties of protein messages
relatively constant despite the mutation which these messages must
perforce undergo over time.

Finally, apart from this empirical interpretation of the plot I've
sent off line, I have a "feeling" that the facts above regarding
ueSlope on (ubar,ebar) must be related somehow to the facts we've been
discussing regarding Aubqe on L. But if you agree, then the ball is
now in your court for the obvious reason that I have neither the
knowledge nor experience nor statistical brain-power to determine if
ueSlope on (ubar,ebar) and Aubqe on L are related, and if so how ...