Date: Dec 14, 2012 12:45 AM
Author: Halitsky
Subject: Re your questions about the plots sent off-line (and the underlying data posted here 12/13 at 10:33am)

You wrote:

"What are the 1..36? All the other values are monotone increasing.
Did they come that way, or did you sort them?
The best way to see the difference between the plots is to take cols
2 & 3 as x & y coordinates, then plot the points along with a line
from (0,0) to (1,1). The S-plot is mostly below the line. The C-plot
is mostly above. I'm not as struck by that difference as you seem to
be. Where did the numbers come from?"

Answers

1. The 1...36 are irrelevant if the data are plotted the way you
suggest; they were just a way of giving Excel an x-axis to plot
against. And thanks very much for the suggestion as to how to plot in
cases like this. Of course it never would have occurred to me to do
it that way, and I was delighted to see that Excel lets you do it
pretty easily (for a Microsoft-owned product, that is).

2. Yes, columns 2 and 3 were sorted.

3. Here's where the numbers came from.

Recall that:

a) the fold x subset "het" data which I presented for Aubqe on L at
MoSS N, set 1:

Slopes of Regressions of Aubqe on Length (L)
for each Fold x Subset | Set 1, Method N

Fold x Subset |    # of    Slope of
Set 1, Meth N      L's     Aubqe on L

a3_S_1_N            70     -0.000188
c1_C_1_N           101     -0.000026
a3_C_1_N            48      0.000052
c1_S_1_N           101      0.000266
c2_S_1_N            96      0.000421
c2_C_1_N            95      0.000550
b47_C_1_N           99      0.000618
a1_S_1_N           101      0.001069
b47_S_1_N           99      0.001079
b1_S_1_N            31      0.001119
b1_C_1_N            28      0.002015
a1_C_1_N           101      0.002210

were selected (because of their low associated "het" p) from the fold
x subset data for the regression Aubqe on L computed for ALL six
combinations of Set x MoSS.

b) to get all the fold x subset Aubqe on L data for all combinations
of Set x MoSS, we obviously had to first regress c on (e,u,u*e,u^2) at
each Len x Set x MoSS x Fold x Subset.

Call this entire set of underlying data for c on (e,u,u*e,u^2) the
"Rubq-base", and instead of computing the regression Aubqe on L
over the entire Rubq-base, compute the regression ueSlope on (ubar,
ebar) over the entire Rubq-base, where:

i) ueSlope is the slope of the u*e term in c on (e,u,u*e,u^2);

ii) ubar is the mean of "u" (= u/(1+u)) at each L, and ebar is the
mean of "e" at each L.
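For anyone who wants to see the first-stage fit concretely, here is a minimal sketch of one regression of c on (e,u,u*e,u^2) and of where ueSlope comes from. It is only an illustration under stated assumptions: the function name `fit_c_on_e_u`, the inclusion of an intercept, and the use of plain least squares are my choices for the sketch, not a description of the software actually used to build the Rubq-base.

```python
def fit_c_on_e_u(e, u, c):
    """Least-squares fit of c on (e, u, u*e, u^2) plus an intercept.

    Returns the coefficients in the order
    (intercept, e, u, u*e, u^2). The intercept is an assumption of
    this sketch; the post does not say whether one was used.
    """
    # One design row per observation: [1, e, u, u*e, u^2].
    rows = [[1.0, ei, ui, ui * ei, ui * ui] for ei, ui in zip(e, u)]
    k = 5
    # Normal equations (X^T X) beta = X^T c as an augmented k x (k+1) matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * ci for r, ci in zip(rows, c))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly so")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Made-up data (NOT from the Rubq-base), just to exercise the function.
e_vals = [0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0]
u_vals = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
c_vals = [1 + 2 * ei - 3 * ui + 0.5 * ui * ei + 0.25 * ui * ui
          for ei, ui in zip(e_vals, u_vals)]
ue_slope = fit_c_on_e_u(e_vals, u_vals, c_vals)[3]  # coefficient of u*e
```

The coefficient at index 3 is what the post calls ueSlope; one such value per Len x Set x MoSS x Fold x Subset would then feed the second regression, ueSlope on (ubar, ebar).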

From each computation of ueSlope on (ubar, ebar) we have a pair of
slopes with a pair of associated probabilities, and therefore across
all combinations of Set x MoSS x Fold x Subset, we have 72 such pairs
of probabilities, or 144 probabilities in all.

DISREGARDING Fold and Set, divide these 144 probabilities into four
groups:

36 at subset S, Method N
36 at subset C, Method N
36 at subset S, Method R
36 at subset C, Method R
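The counts above can be checked by simple enumeration. The sketch below assumes that the same six folds seen in the Set 1, Method N table (a1, a3, b1, b47, c1, c2) occur in every Set x MoSS combination, and that each regression contributes exactly two p's (one for ubar, one for ebar); the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

SETS = (1, 2, 3)
METHODS = ("N", "R")
FOLDS = ("a1", "a3", "b1", "b47", "c1", "c2")  # assumed identical in every set
SUBSETS = ("S", "C")
PREDICTORS = ("ubar", "ebar")

# One ueSlope-on-(ubar, ebar) regression per Set x MoSS x Fold x Subset.
cells = list(product(SETS, METHODS, FOLDS, SUBSETS))

# Each regression yields one p per predictor.
p_labels = [(s, m, f, sub, pred)
            for (s, m, f, sub) in cells
            for pred in PREDICTORS]

# Disregard Fold and Set: group only by (Subset, Method).
group_sizes = Counter((sub, m) for (_s, m, _f, sub, _pred) in p_labels)

print(len(cells), len(p_labels))   # 72 144
print(sorted(group_sizes.items()))  # four groups of 36
```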

Sort each of these groups independently (lowest to highest p), and
then pair off elements of these four groups as follows:

pair off the 36 from S,N with the 36 from C,N by corresponding rank
(from the sort of each group)

pair off the 36 from S,R with the 36 from C,R by corresponding rank
(from the sort of each group)

(Note (!!!!) that these pairings are DIFFERENT (!!!) from the
pairings of (S,N) with (S,R) and (C,N) with (C,R) which I presented in
my post of 12/13@12:33.)
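The sort-then-pair-by-rank step can be sketched in a few lines. The function name `pair_by_rank` and the toy numbers are hypothetical; the real inputs would be the 36 p's of each group.

```python
def pair_by_rank(group_s, group_c):
    """Sort each group independently (lowest to highest p) and pair
    the i-th smallest of one with the i-th smallest of the other."""
    if len(group_s) != len(group_c):
        raise ValueError("groups must have the same number of p's")
    return list(zip(sorted(group_s), sorted(group_c)))


# Toy values only (NOT taken from the tables).
toy_s = [0.30, 0.10, 0.20]
toy_c = [0.05, 0.25, 0.15]
print(pair_by_rank(toy_s, toy_c))
# [(0.1, 0.05), (0.2, 0.15), (0.3, 0.25)]
```

Note that pairing by rank discards any link between a given S p and the C p from the same Fold and Set: the two members of a pair share only their position in the sorted lists.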

You will then have these two tables of paired p's (and the associated
plot "done your way", which I've sent offline):

SN,CN
0.004293565,0.000147868
0.009398,0.000235407
0.019790086,0.002576217
0.021645402,0.020854486
0.041148681,0.023919
0.056848093,0.041120964
0.169920851,0.042472596
0.236373,0.059794
0.248019846,0.079939524
0.277783068,0.087268176
0.281488299,0.13125994
0.287886,0.17489924
0.299769,0.180724763
0.299875026,0.185042614
0.360314613,0.207785097
0.370746358,0.21197145
0.406029587,0.228176227
0.43289,0.252242125
0.465398176,0.275296878
0.482382234,0.305134999
0.530897822,0.309388442
0.559333624,0.332112292
0.626424347,0.361024514
0.702399,0.41780334
0.741387901,0.423432022
0.768317356,0.476818276
0.820922877,0.542145
0.831159936,0.559098289
0.832584062,0.581960315
0.88900441,0.619627105
0.893789589,0.646265173
0.894253162,0.74717756
0.935126553,0.757530416
0.977748076,0.884119
0.980182674,0.900867429
0.984220184,0.938430375

SR,CR
0.000503944,0.00011982
0.00118415,0.012214573
0.041027523,0.029133944
0.052112332,0.048936138
0.054021335,0.05764761
0.057693811,0.05865896
0.068659527,0.064182305
0.083710757,0.088376406
0.094021303,0.107473805
0.130456898,0.147682873
0.21540961,0.162392478
0.236780945,0.181759433
0.236936513,0.201847347
0.269875322,0.210439736
0.294476424,0.226305355
0.315561395,0.227038784
0.319462902,0.255699197
0.327971706,0.288864935
0.463861812,0.302035139
0.479255866,0.312164668
0.564392402,0.388447922
0.577382726,0.397416524
0.579430243,0.434182601
0.588970805,0.438280224
0.61542756,0.516128733
0.629984706,0.614130775
0.698570658,0.675962212
0.719544247,0.689950901
0.732798731,0.735779895
0.813873971,0.778392333
0.883957837,0.800207872
0.888276157,0.870729822
0.888377668,0.911149831
0.917545651,0.93512393
0.977990461,0.941162349
0.980356048,0.986071449

So, depending on one's "IOT reaction" to the plot I've sent offline
for the two tables above, one might be willing to say that in general,
CN p's plot significantly lower than CR p's for equivalent SN's and
SR's.
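One rough way to put a number on that visual impression is to count how many (x, y) points fall below the line from (0,0) to (1,1), taking the S p as x and the C p as y. The sketch below is only an illustration: `fraction_below_diagonal` is a hypothetical helper, the samples are just the first five rows of each table above, and a raw fraction like this is not a significance test.

```python
def fraction_below_diagonal(pairs):
    """Fraction of (x, y) points with y < x, i.e. strictly below
    the line from (0,0) to (1,1)."""
    if not pairs:
        raise ValueError("need at least one pair")
    below = sum(1 for x, y in pairs if y < x)
    return below / len(pairs)


# First five rows of the SN,CN table (x = SN p, y = CN p).
sn_cn_sample = [
    (0.004293565, 0.000147868),
    (0.009398, 0.000235407),
    (0.019790086, 0.002576217),
    (0.021645402, 0.020854486),
    (0.041148681, 0.023919),
]

# First five rows of the SR,CR table (x = SR p, y = CR p).
sr_cr_sample = [
    (0.000503944, 0.00011982),
    (0.00118415, 0.012214573),
    (0.041027523, 0.029133944),
    (0.052112332, 0.048936138),
    (0.054021335, 0.05764761),
]

print(fraction_below_diagonal(sn_cn_sample))  # 1.0
print(fraction_below_diagonal(sr_cr_sample))  # 0.6
```

Running the same function over all 36 rows of each table would give the corresponding fractions for the full plots.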

And this result, assuming you're willing to accept it, is extremely
important for the following reason.

It says that regardless of dicodon set 1,2,3, the (S,N) subsets
"evolved/were designed" (depending on your point of view, heh heh
heh) so that mutation away from these sets to (C,N) sets does NOT
change the predictive capacities of ubar and ebar in ueSlope on (ubar,
ebar) as much as the predictive capacities of ubar and ebar in ueSlope
on (ubar, ebar) are changed by the mutation of (S,R) sets to (C,R)
sets.

Or, to boil that statement down even further, the result says that we
have found a (relative) INVARIANT UNDER MUTATION for (S,N) sets that
does NOT exist for (S,R) sets. And the existence of this invariant
strongly suggests that the (S,N) subsets of dicodon sets 1,2,3 all
evolved to keep certain thermodynamic properties of protein messages
relatively constant despite the mutation which these messages must
perforce undergo over time.

Finally, apart from this empirical interpretation of the plot I've
sent off line, I have a "feeling" that the facts above regarding
ueSlope on (ubar,ebar) must be related somehow to the facts we've been
discussing regarding Aubqe on L. But if you agree, then the ball is
now in your court for the obvious reason that I have neither the
knowledge nor experience nor statistical brain-power to determine if
ueSlope on (ubar,ebar) and Aubqe on L are related, and if so how ...