```Date: Dec 14, 2012 3:48 AM
Author: Ray Koopman
Subject: Re: Re your questions about the plots sent off-line (and the underlying data posted here 12/13 at 10:33am)

On Dec 13, 9:45 pm, djh <halitsk...@att.net> wrote:
> You wrote:
>
> "What are the 1..36? All the other values are monotone increasing.
> Did they come that way, or did you sort them?
>
> The best way to see the difference between the plots is to take cols
> 2 & 3 as x & y coordinates, then plot the points along with a line
> from (0,0) to (1,1). The S-plot is mostly below the line. The C-plot
> is mostly above. I'm not as struck by that difference as you seem to
> be. Where did the numbers come from?"
>
> Answers
>
> 1.  The 1...36 are irrelevant if the data are plotted the way you
> suggest -- they were just a way of giving Excel an x-axis to plot
> against.  And thanks very much for the suggestion as to how to plot in
> cases like this -- of course it never would have occurred to me to do
> it that way, and I was delighted to see that Excel lets you do it
> pretty easily (for a Microsoft-owned product, that is.)

You should seriously consider a real plotting program such as
http://www.gnuplot.info/

> 2.  Yes -- columns 2 and 3 were sorted.
>
> 3.  Here's where the numbers came from.
>
> Recall that:
>
> a) the fold x subset "het" data which I presented for Aubuqe on L at
> MoSS N, set 1:
>
>    Slopes of Regressions of
> Aubqe on Length (L) for each
>       Fold x Subset |
>       Set 1, Method N
> Fold x             Slope
> Subset |     #      of
> Set 1       of     Aubqe
> Meth N     L's      on L
> a3_S_1_N     70  -0.000188
> c1_C_1_N    101  -0.000026
> a3_C_1_N     48   0.000052
> c1_S_1_N    101   0.000266
> c2_S_1_N     96   0.000421
> c2_C_1_N     95   0.000550
> b47_C_1_N    99   0.000618
> a1_S_1_N    101   0.001069
> b47_S_1_N    99   0.001079
> b1_S_1_N     31   0.001119
> b1_C_1_N     28   0.002015
> a1_C_1_N    101   0.002210
>
> were selected (because of their low associated "het"
> p) from the fold
> x subset data for the regression Aubque on L computed for ALL six
> combinations of Set x MoSS.
>
> b) to get all the fold x subset Aubque on L data for all combinations
> of Set x MoSS, we obviously had to first regress c on (e,u,u*e,u^2) at
> each Len x Set x MoSS x Fold x Subset.

You seem to switch willy-nilly between Aubuqe, Aubqe, and Aubque.
How do they differ?

> Call this entire set of underlying data for c on (e,u,u*e,u^2) the
> "Rubq-base", and instead of computing the regression Aubque on L
> over the entire Rubq-base, compute the regression ueSlope on (ubar,
> ebar) over the entire Rubq-base, where:
>
> i) ueSlope is the slope of the u*e term in c on (e,u,u*e,u^2);

Do you mean the coefficient of u*e?

> ii) ubar is the mean of "u" (=u/(1+u)) at each L and ebar is the mean
> of "e" at each L.
>
> From each computation of ueSlope on (ubar, ebar) we have a pair of
> slopes with a pair of associated probabilities, and therefore across
> all combinations of Set x MoSS x Fold x Subset, we have 72 such pairs
> of probabilities, or 144 probabilities in all.

What is the "computation of ueSlope on (ubar, ebar)"?
How do you get a pair of p's from it?

> DISREGARDING Fold and Set, divide these 144 probabilities into four
> groups:
>
> 36 at subset S, Method N
> 36 at subset C, Method N
> 36 at subset S, Method R
> 36 at subset C, Method R
>
> Sort each of these groups independently (lowest to highest p), and
> then pair off elements of these four groups as follows:
>
> pair off the 36 from S,N with the 36 from C,N by corresponding rank
> (from the sort of each group)
>
> pair off the 36 from S,R with the 36 from C,R by corresponding rank
> (from the sort of each group)

Regardless of the answers to my previous questions, you can't split
naturally paired p's, sort them, re-pair them, and then compare the
re-paired p's -- which you shouldn't compare in the first place,
even without the shuffling, because p-values are NOT effect sizes.

> (Note (!!!!)
> that these pairings are DIFFERENT (!!!) from the
> pairings of (S,N) with (S,R) and (C,N) with (C,R) which I presented in
> my post of 12/13@12:33.)
>
> You will then have these two tables of paired p's (and the associated
> plot "done your way", which I've sent offline):
>
> SN,CN
>
> 0.004293565,0.000147868
> 0.009398,0.000235407
> 0.019790086,0.002576217
> 0.021645402,0.020854486
> 0.041148681,0.023919
> 0.056848093,0.041120964
> 0.169920851,0.042472596
> 0.236373,0.059794
> 0.248019846,0.079939524
> 0.277783068,0.087268176
> 0.281488299,0.13125994
> 0.287886,0.17489924
> 0.299769,0.180724763
> 0.299875026,0.185042614
> 0.360314613,0.207785097
> 0.370746358,0.21197145
> 0.406029587,0.228176227
> 0.43289,0.252242125
> 0.465398176,0.275296878
> 0.482382234,0.305134999
> 0.530897822,0.309388442
> 0.559333624,0.332112292
> 0.626424347,0.361024514
> 0.702399,0.41780334
> 0.741387901,0.423432022
> 0.768317356,0.476818276
> 0.820922877,0.542145
> 0.831159936,0.559098289
> 0.832584062,0.581960315
> 0.88900441,0.619627105
> 0.893789589,0.646265173
> 0.894253162,0.74717756
> 0.935126553,0.757530416
> 0.977748076,0.884119
> 0.980182674,0.900867429
> 0.984220184,0.938430375
>
> SR,CR
> 0.000503944,0.00011982
> 0.00118415,0.012214573
> 0.041027523,0.029133944
> 0.052112332,0.048936138
> 0.054021335,0.05764761
> 0.057693811,0.05865896
> 0.068659527,0.064182305
> 0.083710757,0.088376406
> 0.094021303,0.107473805
> 0.130456898,0.147682873
> 0.21540961,0.162392478
> 0.236780945,0.181759433
> 0.236936513,0.201847347
> 0.269875322,0.210439736
> 0.294476424,0.226305355
> 0.315561395,0.227038784
> 0.319462902,0.255699197
> 0.327971706,0.288864935
> 0.463861812,0.302035139
> 0.479255866,0.312164668
> 0.564392402,0.388447922
> 0.577382726,0.397416524
> 0.579430243,0.434182601
> 0.588970805,0.438280224
> 0.61542756,0.516128733
> 0.629984706,0.614130775
> 0.698570658,0.675962212
> 0.719544247,0.689950901
> 0.732798731,0.735779895
> 0.813873971,0.778392333
> 0.883957837,0.800207872
> 0.888276157,0.870729822
> 0.888377668,0.911149831
> 0.917545651,0.93512393
> 0.977990461,0.941162349
> 0.980356048,0.986071449
>
> So, depending on one's "IOT reaction" to the plot I've sent offline
> for the two tables above, one might be willing to say that in general,
> CN p's plot significantly lower than CR p's for equivalent SN's and
> SR's.
>
> And this result, assuming you're willing to accept it, is extremely
> important for the following reason.
>
> It says that regardless of dicodon set 1,2,3, the (S,N) subsets
> "evolved/were designed" (depending on your point of view -- heh heh
> heh) so that mutation away from these sets to (C,N) sets does NOT
> change the predictive capacities of ubar and ebar in ueSlope on (ubar,
> ebar) as much as the predictive capacities of ubar and ebar in ueSlope
> on (ubar, ebar) are changed by the mutation of (S,R) sets to (C,R)
> sets.
>
> Or, to boil that statement down even further, the result says that we
> have found a (relative) INVARIANT UNDER MUTATION for (S,N) sets that
> does NOT exist for (S,R) sets.  And the existence of this invariant
> strongly suggests that the (S,N) subsets of dicodon sets 1,2,3 all
> evolved to keep certain thermodynamic properties of protein messages
> relatively constant despite the mutation which these messages must
> perforce undergo over time.
>
> Finally, apart from this empirical interpretation of the plot I've
> sent off line, I have a "feeling" that the facts above regarding
> ueSlope on (ubar,ebar) must be related somehow to the facts we've been
> discussing regarding Aubqe on L.  But if you agree, then the ball is
> now in your court for the obvious reason that I have neither the
> knowledge nor experience nor statistical brain-power to determine if
> ueSlope on (ubar,ebar) and Aubqe on L are related, and if so how ...
```
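
The two procedures discussed in the post (comparing points against the line from (0,0) to (1,1), and sorting each group independently before pairing by rank) can be sketched in Python. This is only an illustration, not anything either poster wrote: the function names are made up, the `natural` values are hypothetical, and the `sn_cn` / `sr_cr` lists are just the first six rows of the two quoted tables.

```python
# Illustrative sketch; function names and the `natural` data are hypothetical.

def fraction_below_diagonal(pairs):
    """Share of (x, y) points lying strictly below the line from (0,0) to (1,1)."""
    below = sum(1 for x, y in pairs if y < x)
    return below / len(pairs)


def repair_by_rank(pairs):
    """Sort each column independently, then pair by rank.

    This is the re-pairing step the reply objects to: the original
    (natural) pairing between x and y is discarded.
    """
    xs = sorted(x for x, _ in pairs)
    ys = sorted(y for _, y in pairs)
    return list(zip(xs, ys))


# First six rows of the SN,CN and SR,CR tables quoted in the post.
sn_cn = [
    (0.004293565, 0.000147868), (0.009398, 0.000235407),
    (0.019790086, 0.002576217), (0.021645402, 0.020854486),
    (0.041148681, 0.023919), (0.056848093, 0.041120964),
]
sr_cr = [
    (0.000503944, 0.00011982), (0.00118415, 0.012214573),
    (0.041027523, 0.029133944), (0.052112332, 0.048936138),
    (0.054021335, 0.05764761), (0.057693811, 0.05865896),
]

# Hypothetical naturally paired p-values that mostly disagree with each other.
natural = [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5)]
repaired = repair_by_rank(natural)

if __name__ == "__main__":
    print("SN,CN below diagonal:", fraction_below_diagonal(sn_cn))
    print("SR,CR below diagonal:", fraction_below_diagonal(sr_cr))
    print("natural pairs: ", natural)
    print("re-paired pairs:", repaired)
```

In the toy `natural` data the pairs disagree, yet after independent sorting every re-paired point sits exactly on the diagonal, which shows how the re-pairing can manufacture apparent agreement that the natural pairs do not have.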