
Re your questions about the plots sent offline (and the underlying data posted here 12/13 at 10:33am)
Posted:
Dec 14, 2012 12:45 AM


You wrote:
"What are the 1..36? All the other values are monotone increasing. Did they come that way, or did you sort them?
The best way to see the difference between the plots is to take cols 2 & 3 as x & y coordinates, then plot the points along with a line from (0,0) to (1,1). The Splot is mostly below the line. The Cplot is mostly above. I'm not as struck by that difference as you seem to be. Where did the numbers come from?"
Answers
1. The 1...36 are irrelevant if the data are plotted the way you suggest; they were just a way of giving Excel an x-axis to plot against. And thanks very much for the suggestion on how to plot in cases like this. Of course it never would have occurred to me to do it that way, and I was delighted to see that Excel lets you do it pretty easily (for a Microsoft-owned product, that is).
2. Yes, columns 2 and 3 were sorted.
3. Here's where the numbers came from.
Recall that:
a) the fold x subset "het" data which I presented for Aubqe on L at MoSS N, set 1:

Slopes of Regressions of Aubqe on Length (L) for each Fold x Subset, Set 1, Method N

Fold x Subset      # of L's    Slope of Aubqe on L
(Set 1, Meth N)
a3_S_1_N              70       0.000188
c1_C_1_N             101       0.000026
a3_C_1_N              48       0.000052
c1_S_1_N             101       0.000266
c2_S_1_N              96       0.000421
c2_C_1_N              95       0.000550
b47_C_1_N             99       0.000618
a1_S_1_N             101       0.001069
b47_S_1_N             99       0.001079
b1_S_1_N              31       0.001119
b1_C_1_N              28       0.002015
a1_C_1_N             101       0.002210

were selected (because of their low associated "het" p) from the fold x subset data for the regression Aubque on L computed for ALL six combinations of Set x MoSS.
b) to get all the fold x subset Aubque on L data for all combinations of Set x MoSS, we obviously had to first regress c on (e,u,u*e,u^2) at each Len x Set x MoSS x Fold x Subset.
Call this entire set of underlying data for c on (e,u,u*e,u^2) the "Rubqbase", and instead of computing the regression Aubque on L over the entire Rubqbase, compute the regression ueSlope on (ubar, ebar) over the entire Rubqbase, where:
i) ueSlope is the slope of the u*e term in c on (e,u,u*e,u^2);
ii) ubar is the mean of "u" (= u/(1+u)) at each L, and ebar is the mean of "e" at each L.
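For anyone who wants to reproduce the first stage, here is a minimal sketch of how ueSlope could be pulled out of one regression of c on (e,u,u*e,u^2). It is not the code used for the numbers in this post: the function names (solve, ols, ue_slope) are made up for illustration, an intercept term is assumed, and the fit is ordinary least squares via the normal equations in plain Python.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients of y on the predictor rows, intercept first."""
    X = [[1.0] + list(r) for r in rows]  # assumed: intercept included
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def ue_slope(c, e, u):
    """Coefficient on the u*e term in c ~ 1 + e + u + u*e + u^2."""
    rows = [(ei, ui, ui * ei, ui * ui) for ei, ui in zip(e, u)]
    return ols(rows, c)[3]  # index 0 is the intercept, 3 is u*e
```

The same ols helper could be reused for the second stage, e.g. ols(list(zip(ubar, ebar)), slopes), but it returns only the coefficients. The associated probabilities discussed below would need a statistics package.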
From each computation of ueSlope on (ubar, ebar) we have a pair of slopes with a pair of associated probabilities, and therefore across all combinations of Set x MoSS x Fold x Subset, we have 72 such pairs of probabilities, or 144 probabilities in all.
DISREGARDING Fold and Set, divide these 144 probabilities into four groups:
36 at subset S, Method N
36 at subset C, Method N
36 at subset S, Method R
36 at subset C, Method R
Sort each of these groups independently (lowest to highest p), and then pair off elements of these four groups as follows:
pair off the 36 from S,N with the 36 from C,N by corresponding rank (from the sort of each group)
pair off the 36 from S,R with the 36 from C,R by corresponding rank (from the sort of each group)
(Note (!!!!) that these pairings are DIFFERENT (!!!) from the pairings of (S,N) with (S,R) and (C,N) with (C,R) which I presented in my post of 12/13@12:33.)
You will then have these two tables of paired p's (and the associated plot "done your way", which I've sent offline):
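In case the sort-then-pair step is unclear, here is a minimal sketch of it in Python. The function name pair_by_rank is made up for illustration, and the three-element lists in the example are dummy values, not data from this post.

```python
def pair_by_rank(group_a, group_b):
    """Sort each group independently (lowest to highest) and pair by rank."""
    if len(group_a) != len(group_b):
        raise ValueError("groups must be the same size to pair by rank")
    return list(zip(sorted(group_a), sorted(group_b)))


# Dummy example: the smallest of one group is paired with the smallest of the other.
pairs = pair_by_rank([0.3, 0.1, 0.2], [0.9, 0.5, 0.7])
print(pairs)  # [(0.1, 0.5), (0.2, 0.7), (0.3, 0.9)]
```

Note that pairing this way throws away any link between the two p's in a pair other than their rank within their own group, which is why Fold and Set are disregarded here.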
SN,CN
0.004293565,0.000147868
0.009398,0.000235407
0.019790086,0.002576217
0.021645402,0.020854486
0.041148681,0.023919
0.056848093,0.041120964
0.169920851,0.042472596
0.236373,0.059794
0.248019846,0.079939524
0.277783068,0.087268176
0.281488299,0.13125994
0.287886,0.17489924
0.299769,0.180724763
0.299875026,0.185042614
0.360314613,0.207785097
0.370746358,0.21197145
0.406029587,0.228176227
0.43289,0.252242125
0.465398176,0.275296878
0.482382234,0.305134999
0.530897822,0.309388442
0.559333624,0.332112292
0.626424347,0.361024514
0.702399,0.41780334
0.741387901,0.423432022
0.768317356,0.476818276
0.820922877,0.542145
0.831159936,0.559098289
0.832584062,0.581960315
0.88900441,0.619627105
0.893789589,0.646265173
0.894253162,0.74717756
0.935126553,0.757530416
0.977748076,0.884119
0.980182674,0.900867429
0.984220184,0.938430375

SR,CR
0.000503944,0.00011982
0.00118415,0.012214573
0.041027523,0.029133944
0.052112332,0.048936138
0.054021335,0.05764761
0.057693811,0.05865896
0.068659527,0.064182305
0.083710757,0.088376406
0.094021303,0.107473805
0.130456898,0.147682873
0.21540961,0.162392478
0.236780945,0.181759433
0.236936513,0.201847347
0.269875322,0.210439736
0.294476424,0.226305355
0.315561395,0.227038784
0.319462902,0.255699197
0.327971706,0.288864935
0.463861812,0.302035139
0.479255866,0.312164668
0.564392402,0.388447922
0.577382726,0.397416524
0.579430243,0.434182601
0.588970805,0.438280224
0.61542756,0.516128733
0.629984706,0.614130775
0.698570658,0.675962212
0.719544247,0.689950901
0.732798731,0.735779895
0.813873971,0.778392333
0.883957837,0.800207872
0.888276157,0.870729822
0.888377668,0.911149831
0.917545651,0.93512393
0.977990461,0.941162349
0.980356048,0.986071449
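One rough way to put a number on the "below the line from (0,0) to (1,1)" impression, without relying on the plot, is to count how many pairs fall below, above, or on the diagonal. The sketch below assumes the first value of each pair is x and the second is y. The function name diagonal_counts is made up for illustration, and only the first three rows of each table are used, to keep it short.

```python
def diagonal_counts(pairs):
    """Count points (x, y) that fall below, above, or on the line y = x."""
    below = sum(1 for x, y in pairs if y < x)
    above = sum(1 for x, y in pairs if y > x)
    on = len(pairs) - below - above
    return below, above, on


# First three rows of each table, copied verbatim (x = S column, y = C column).
sn_cn = [(0.004293565, 0.000147868), (0.009398, 0.000235407), (0.019790086, 0.002576217)]
sr_cr = [(0.000503944, 0.00011982), (0.00118415, 0.012214573), (0.041027523, 0.029133944)]

print(diagonal_counts(sn_cn))  # (3, 0, 0)
print(diagonal_counts(sr_cr))  # (2, 1, 0)
```

Running it on all 36 rows of each table gives a simple count to set beside the visual impression. It is only a tally, not a significance test.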
So, depending on one's "IOT reaction" to the plot I've sent offline for the two tables above, one might be willing to say that in general, CN p's plot significantly lower than CR p's for equivalent SN's and SR's.
And this result, assuming you're willing to accept it, is extremely important for the following reason.
It says that regardless of dicodon set (1, 2, or 3), the (S,N) subsets "evolved/were designed" (depending on your point of view, heh heh heh) so that mutation away from these sets to (C,N) sets does NOT change the predictive capacities of ubar and ebar in ueSlope on (ubar, ebar) as much as those same predictive capacities are changed by the mutation of (S,R) sets to (C,R) sets.
Or, to boil that statement down even further, the result says that we have found a (relative) INVARIANT UNDER MUTATION for (S,N) sets that does NOT exist for (S,R) sets. And the existence of this invariant strongly suggests that the (S,N) subsets of dicodon sets 1,2,3 all evolved to keep certain thermodynamic properties of protein messages relatively constant despite the mutation which these messages must perforce undergo over time.
Finally, apart from this empirical interpretation of the plot I've sent offline, I have a "feeling" that the facts above regarding ueSlope on (ubar,ebar) must be related somehow to the facts we've been discussing regarding Aubqe on L. But if you agree, then the ball is now in your court, for the obvious reason that I have neither the knowledge nor the experience nor the statistical brainpower to determine whether ueSlope on (ubar,ebar) and Aubqe on L are related, and if so, how ...

