Date: Dec 14, 2012 3:48 AM
Author: Ray Koopman
Subject: Re: Re your questions about the plots sent off-line (and the underlying data posted here 12/13 at 10:33am)

On Dec 13, 9:45 pm, djh <halitsk...@att.net> wrote:

> You wrote:

>

> "What are the 1..36? All the other values are monotone increasing.

> Did they come that way, or did you sort them?

>

> The best way to see the difference between the plots is to take cols

> 2 & 3 as x & y coordinates, then plot the points along with a line

> from (0,0) to (1,1). The S-plot is mostly below the line. The C-plot

> is mostly above. I'm not as struck by that difference as you seem to

> be. Where did the numbers come from?"

>

> Answers

>

> 1. The 1...36 are irrelevant if the data are plotted the way you

> suggest -- they were just a way of giving Excel an x-axis to plot

> against. And thanks very much for the suggestion as to how to plot in

> cases like this -- of course it never would have occurred to me to do

> it that way, and I was delighted to see that Excel lets you do it

> pretty easily (for a Microsoft-owned product, that is).

You should seriously consider a real plotting program such as

http://www.gnuplot.info/

>

> 2. Yes -- columns 2 and 3 were sorted.

>

> 3. Here's where the numbers came from.

>

> Recall that:

>

> a) the fold x subset "het" data which I presented for Aubuqe on L at

> MoSS N, set 1:

>

> Slopes of Regressions of Aubqe on Length (L)
> for each Fold x Subset | Set 1, Method N
>
> Fold x Subset |
> Set 1, Meth N  # of L's   Slope of Aubqe on L
> a3_S_1_N             70   -0.000188
> c1_C_1_N            101   -0.000026
> a3_C_1_N             48    0.000052
> c1_S_1_N            101    0.000266
> c2_S_1_N             96    0.000421
> c2_C_1_N             95    0.000550
> b47_C_1_N            99    0.000618
> a1_S_1_N            101    0.001069
> b47_S_1_N            99    0.001079
> b1_S_1_N             31    0.001119
> b1_C_1_N             28    0.002015
> a1_C_1_N            101    0.002210

>

> were selected (because of their low associated "het" p) from the fold

> x subset data for the regression Aubque on L computed for ALL six

> combinations of Set x MoSS.

>

> b) to get all the fold x subset Aubque on L data for all combinations

> of Set x MoSS, we obviously had to first regress c on (e,u,u*e,u^2) at

> each Len x Set x MoSS x Fold x Subset.

You seem to switch willy-nilly between Aubuqe, Aubqe, and Aubque.

How do they differ?

>

> Call this entire set of underlying data for c on (e,u,u*e,u^2) the

> "Rubq-base", and instead of computing the regression Aubque on L

> over the entire Rubq-base, compute the regression ueSlope on (ubar,

> ebar) over the entire Rubq-base, where:

>

> i) ueSlope is the slope of the u*e term in c on (e,u,u*e,u^2);

Do you mean the coefficient of u*e?
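
For reference, one way to write out the regression being discussed, assuming an ordinary linear model with an intercept (the intercept, the error term, and the coefficient symbols are assumptions for illustration; the post itself names only the predictors e, u, u*e, u^2):

```latex
c = \beta_0 + \beta_e\, e + \beta_u\, u + \beta_{ue}\,(u \cdot e) + \beta_{uu}\, u^2 + \varepsilon
```

Under that reading, "ueSlope" would be the fitted coefficient $\beta_{ue}$ of the product term, which is what the question above is asking about.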

>

> ii) ubar is the mean of "u" (=u/(1+u)) at each L and ebar is the mean

> of "e" at each L.

>

> From each computation of ueSlope on (ubar, ebar) we have a pair of

> slopes with a pair of associated probabilities, and therefore across

> all combinations of Set x MoSS x Fold x Subset, we have 72 such pairs

> of probabilities, or 144 probabilities in all.

What is the "computation of ueSlope on (ubar, ebar)"?

How do you get a pair of p's from it?

>

> DISREGARDING Fold and Set, divide these 144 probabilities into four

> groups:

>

> 36 at subset S, Method N

> 36 at subset C, Method N

> 36 at subset S, Method R

> 36 at subset C, Method R

>

> Sort each of these groups independently (lowest to highest p), and

> then pair off elements of these four groups as follows:

>

> pair off the 36 from S,N with the 36 from C,N by corresponding rank

> (from the sort of each group)

>

> pair off the 36 from S,R with the 36 from C,R by corresponding rank

> (from the sort of each group)

Regardless of the answers to my previous questions, you can't split

naturally paired p's, sort them, re-pair them, and then compare the

re-paired p's -- which you shouldn't compare in the first place,

even without the shuffling, because p-values are NOT effect sizes.
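
A minimal sketch of why the sort-and-re-pair step is a problem, assuming (purely for illustration, not as a claim about the actual data) that the two groups of p's are independent uniform draws. The function names `pearson` and `repairing_demo` are hypothetical. Sorting each group separately and pairing by rank forces both columns to be increasing, so the re-paired values line up closely no matter how unrelated the original pairs were.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def repairing_demo(n=36, seed=0):
    """Return (r_original, r_repaired) for n pairs of independent uniform p's."""
    rng = random.Random(seed)
    p_a = [rng.random() for _ in range(n)]  # simulated p's for one group
    p_b = [rng.random() for _ in range(n)]  # simulated p's, independent of p_a
    r_original = pearson(p_a, p_b)
    # Sort each group on its own, then pair by rank, as in the quoted procedure.
    r_repaired = pearson(sorted(p_a), sorted(p_b))
    return r_original, r_repaired


if __name__ == "__main__":
    r_orig, r_rep = repairing_demo()
    print(f"original pairing:             r = {r_orig:+.3f}")
    print(f"sorted and re-paired by rank: r = {r_rep:+.3f}")
```

The re-paired correlation comes out close to 1 even though the simulated groups are unrelated, so a tight-looking plot of rank-paired p's says little about the original pairs.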

>

> (Note (!!!!) that these pairings are DIFFERENT (!!!) from the

> pairings of (S,N) with (S,R) and (C,N) with (C,R) which I presented in

> my post of 12/13@12:33.)

>

> You will then have these two tables of paired p's (and the associated

> plot "done your way", which I've sent offline):

>

> SN,CN

>

> 0.004293565,0.000147868

> 0.009398,0.000235407

> 0.019790086,0.002576217

> 0.021645402,0.020854486

> 0.041148681,0.023919

> 0.056848093,0.041120964

> 0.169920851,0.042472596

> 0.236373,0.059794

> 0.248019846,0.079939524

> 0.277783068,0.087268176

> 0.281488299,0.13125994

> 0.287886,0.17489924

> 0.299769,0.180724763

> 0.299875026,0.185042614

> 0.360314613,0.207785097

> 0.370746358,0.21197145

> 0.406029587,0.228176227

> 0.43289,0.252242125

> 0.465398176,0.275296878

> 0.482382234,0.305134999

> 0.530897822,0.309388442

> 0.559333624,0.332112292

> 0.626424347,0.361024514

> 0.702399,0.41780334

> 0.741387901,0.423432022

> 0.768317356,0.476818276

> 0.820922877,0.542145

> 0.831159936,0.559098289

> 0.832584062,0.581960315

> 0.88900441,0.619627105

> 0.893789589,0.646265173

> 0.894253162,0.74717756

> 0.935126553,0.757530416

> 0.977748076,0.884119

> 0.980182674,0.900867429

> 0.984220184,0.938430375

>

> SR,CR

> 0.000503944,0.00011982

> 0.00118415,0.012214573

> 0.041027523,0.029133944

> 0.052112332,0.048936138

> 0.054021335,0.05764761

> 0.057693811,0.05865896

> 0.068659527,0.064182305

> 0.083710757,0.088376406

> 0.094021303,0.107473805

> 0.130456898,0.147682873

> 0.21540961,0.162392478

> 0.236780945,0.181759433

> 0.236936513,0.201847347

> 0.269875322,0.210439736

> 0.294476424,0.226305355

> 0.315561395,0.227038784

> 0.319462902,0.255699197

> 0.327971706,0.288864935

> 0.463861812,0.302035139

> 0.479255866,0.312164668

> 0.564392402,0.388447922

> 0.577382726,0.397416524

> 0.579430243,0.434182601

> 0.588970805,0.438280224

> 0.61542756,0.516128733

> 0.629984706,0.614130775

> 0.698570658,0.675962212

> 0.719544247,0.689950901

> 0.732798731,0.735779895

> 0.813873971,0.778392333

> 0.883957837,0.800207872

> 0.888276157,0.870729822

> 0.888377668,0.911149831

> 0.917545651,0.93512393

> 0.977990461,0.941162349

> 0.980356048,0.986071449
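
The comparison against the line from (0,0) to (1,1) can also be tallied numerically rather than judged by eye. The sketch below is an illustration only: it uses just the first six rows of each table above, and the helper name `tally_against_diagonal` is hypothetical. It counts points, it does not test anything, and it does not address the objection to re-pairing raised earlier in this message.

```python
def tally_against_diagonal(pairs):
    """Count points below, on, and above the line y = x."""
    below = sum(1 for x, y in pairs if y < x)
    above = sum(1 for x, y in pairs if y > x)
    on = len(pairs) - below - above
    return below, on, above


# First six rows of the SN,CN table (x = SN, y = CN).
sn_cn_head = [
    (0.004293565, 0.000147868),
    (0.009398, 0.000235407),
    (0.019790086, 0.002576217),
    (0.021645402, 0.020854486),
    (0.041148681, 0.023919),
    (0.056848093, 0.041120964),
]

# First six rows of the SR,CR table (x = SR, y = CR).
sr_cr_head = [
    (0.000503944, 0.00011982),
    (0.00118415, 0.012214573),
    (0.041027523, 0.029133944),
    (0.052112332, 0.048936138),
    (0.054021335, 0.05764761),
    (0.057693811, 0.05865896),
]

if __name__ == "__main__":
    for label, pairs in (("SN,CN", sn_cn_head), ("SR,CR", sr_cr_head)):
        below, on, above = tally_against_diagonal(pairs)
        print(f"{label}: {below} below, {on} on, {above} above the diagonal")
```

Extending the lists to all 36 rows of each table gives the full tally.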

>

> So, depending on one's "IOT reaction" to the plot I've sent offline

> for the two tables above, one might be willing to say that in general,

> CN p's plot significantly lower than CR p's for equivalent SN's and

> SR's.

>

> And this result, assuming you're willing to accept it, is extremely

> important for the following reason.

>

> It says that regardless of dicodon set 1,2,3, the (S,N) subsets

> "evolved/were designed" (depending on your point of view -- heh heh

> heh) so that mutation away from these sets to (C,N) sets does NOT

> change the predictive capacities of ubar and ebar in ueSlope on (ubar,

> ebar) as much as the predictive capacities of ubar and ebar in ueSlope

> on (ubar, ebar) are changed by the mutation of (S,R) sets to (C,R)

> sets.

>

> Or, to boil that statement down even further, the result says that we

> have found a (relative) INVARIANT UNDER MUTATION for (S,N) sets that

> does NOT exist for (S,R) sets. And the existence of this invariant

> strongly suggests that the (S,N) subsets of dicodon sets 1,2,3 all

> evolved to keep certain thermodynamic properties of protein messages

> relatively constant despite the mutation which these messages must

> perforce undergo over time.

>

> Finally, apart from this empirical interpretation of the plot I've

> sent off line, I have a "feeling" that the facts above regarding

> ueSlope on (ubar,ebar) must be related somehow to the facts we've been

> discussing regarding Aubqe on L. But if you agree, then the ball is

> now in your court for the obvious reason that I have neither the

> knowledge nor experience nor statistical brain-power to determine if

> ueSlope on (ubar,ebar) and Aubqe on L are related, and if so how ...