Date: Dec 14, 2012 3:48 AM
Author: Ray Koopman
Subject: Re: Re your questions about the plots sent off-line (and the underlying data posted here 12/13 at 10:33am)

On Dec 13, 9:45 pm, djh <halitsk...@att.net> wrote:
> You wrote:
>
> "What are the 1..36? All the other values are monotone increasing.
> Did they come that way, or did you sort them?
>
> The best way to see the difference between the plots is to take cols
> 2 & 3 as x & y coordinates, then plot the points along with a line
> from (0,0) to (1,1). The S-plot is mostly below the line; the C-plot
> is mostly above. I'm not as struck by that difference as you seem to
> be. Where did the numbers come from?"
>
> Answers
>
> 1. The 1...36 are irrelevant if the data are plotted the way you
> suggest -- they were just a way of giving Excel an x-axis to plot
> against. And thanks very much for the suggestion as to how to plot in
> cases like this -- of course it never would have occurred to me to do
> it that way, and I was delighted to see that Excel lets you do it
> pretty easily (for a Microsoft-owned product, that is).


You should seriously consider a real plotting program such as
http://www.gnuplot.info/
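
A minimal sketch of the diagonal comparison described above, for readers who want the counts before drawing anything. It is an illustration in Python, not code from this thread: the function name `split_by_diagonal` is hypothetical, and the four sample points are the first four (SN, CN) rows from the tables quoted further down in this post.

```python
def split_by_diagonal(points):
    """Count points below, on, and above the line from (0,0) to (1,1)."""
    below = sum(1 for x, y in points if y < x)
    above = sum(1 for x, y in points if y > x)
    on = len(points) - below - above
    return below, on, above


# First four (SN, CN) rows quoted later in this post; x = SN, y = CN.
sn_cn_head = [
    (0.004293565, 0.000147868),
    (0.009398, 0.000235407),
    (0.019790086, 0.002576217),
    (0.021645402, 0.020854486),
]

print(split_by_diagonal(sn_cn_head))  # (4, 0, 0): all four lie below the line
```

Passing all 36 pairs of a table to the same function gives the full count; the picture itself is still best drawn with a plotting program.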

>
> 2. Yes -- columns 2 and 3 were sorted.
>
> 3. Here's where the numbers came from.
>
> Recall that:
>
> a) the fold x subset "het" data which I presented for Aubuqe on L at
> MoSS N, set 1:
>
> Slopes of Regressions of Aubqe on Length (L)
> for each Fold x Subset | Set 1, Method N
>
> Fold x Subset |  # of   Slope of
> Set 1, Meth N    L's   Aubqe on L
> a3_S_1_N          70   -0.000188
> c1_C_1_N         101   -0.000026
> a3_C_1_N          48    0.000052
> c1_S_1_N         101    0.000266
> c2_S_1_N          96    0.000421
> c2_C_1_N          95    0.000550
> b47_C_1_N         99    0.000618
> a1_S_1_N         101    0.001069
> b47_S_1_N         99    0.001079
> b1_S_1_N          31    0.001119
> b1_C_1_N          28    0.002015
> a1_C_1_N         101    0.002210
>
> were selected (because of their low associated "het" p) from the fold
> x subset data for the regression Aubque on L computed for ALL six
> combinations of Set x MoSS.
>
> b) to get all the fold x subset Aubque on L data for all combinations
> of Set x MoSS, we obviously had to first regress c on (e,u,u*e,u^2) at
> each Len x Set x MoSS x Fold x Subset.


You seem to switch willy-nilly between Aubuqe, Aubqe, and Aubque.
How do they differ?

>
> Call this entire set of underlying data for c on (e,u,u*e,u^2) the
> "Rubq-base", and instead of computing the regression Aubque on L
> over the entire Rubq-base, compute the regression ueSlope on (ubar,
> ebar) over the entire Rubq-base, where:
>
> i) ueSlope is the slope of the u*e term in c on (e,u,u*e,u^2);


Do you mean the coefficient of u*e?

>
> ii) ubar is the mean of "u" (=u/(1+u)) at each L and ebar is the mean
> of "e" at each L.
>
> From each computation of ueSlope on (ubar, ebar) we have a pair of
> slopes with a pair of associated probabilities, and therefore across
> all combinations of Set x MoSS x Fold x Subset, we have 72 such pairs
> of probabilities, or 144 probabilities in all.


What is the "computation of ueSlope on (ubar, ebar)"?
How do you get a pair of p's from it?

>
> DISREGARDING Fold and Set, divide these 144 probabilities into four
> groups:
>
> 36 at subset S, Method N
> 36 at subset C, Method N
> 36 at subset S, Method R
> 36 at subset C, Method R
>
> Sort each of these groups independently (lowest to highest p), and
> then pair off elements of these four groups as follows:
>
> pair off the 36 from S,N with the 36 from C,N by corresponding rank
> (from the sort of each group)
>
> pair off the 36 from S,R with the 36 from C,R by corresponding rank
> (from the sort of each group)


Regardless of the answers to my previous questions, you can't split
naturally paired p's, sort them, re-pair them, and then compare the
re-paired p's -- which you shouldn't compare in the first place,
even without the shuffling, because p-values are NOT effect sizes.
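
A minimal sketch of why the sort-and-re-pair step manufactures a pattern. It assumes two independent sets of 36 uniform random numbers as stand-ins for the p-values; these are simulated, not the data from this thread, and the helper name `pearson` is hypothetical. By construction the two sets are unrelated, yet pairing them by rank after sorting forces the points onto a monotone curve.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 36                  # same group size as in the quoted tables
a = [rng.random() for _ in range(n)]  # simulated stand-in, group 1
b = [rng.random() for _ in range(n)]  # simulated stand-in, group 2

r_natural = pearson(a, b)                  # pairing as generated
r_ranked = pearson(sorted(a), sorted(b))   # pairing by rank after sorting

print(f"natural pairing: r = {r_natural:.3f}")
print(f"rank re-pairing: r = {r_ranked:.3f}")
```

The rank re-pairing gives a correlation close to 1 no matter how the two groups are actually related, so a tidy curve in such a plot says nothing about the original pairs.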

>
> (Note (!!!!) that these pairings are DIFFERENT (!!!) from the
> pairings of (S,N) with (S,R) and (C,N) with (C,R) which I presented in
> my post of 12/13@12:33.)
>
> You will then have these two tables of paired p's (and the associated
> plot "done your way", which I've sent offline):
>
> SN,CN
>
> 0.004293565,0.000147868
> 0.009398,0.000235407
> 0.019790086,0.002576217
> 0.021645402,0.020854486
> 0.041148681,0.023919
> 0.056848093,0.041120964
> 0.169920851,0.042472596
> 0.236373,0.059794
> 0.248019846,0.079939524
> 0.277783068,0.087268176
> 0.281488299,0.13125994
> 0.287886,0.17489924
> 0.299769,0.180724763
> 0.299875026,0.185042614
> 0.360314613,0.207785097
> 0.370746358,0.21197145
> 0.406029587,0.228176227
> 0.43289,0.252242125
> 0.465398176,0.275296878
> 0.482382234,0.305134999
> 0.530897822,0.309388442
> 0.559333624,0.332112292
> 0.626424347,0.361024514
> 0.702399,0.41780334
> 0.741387901,0.423432022
> 0.768317356,0.476818276
> 0.820922877,0.542145
> 0.831159936,0.559098289
> 0.832584062,0.581960315
> 0.88900441,0.619627105
> 0.893789589,0.646265173
> 0.894253162,0.74717756
> 0.935126553,0.757530416
> 0.977748076,0.884119
> 0.980182674,0.900867429
> 0.984220184,0.938430375
>
> SR,CR
> 0.000503944,0.00011982
> 0.00118415,0.012214573
> 0.041027523,0.029133944
> 0.052112332,0.048936138
> 0.054021335,0.05764761
> 0.057693811,0.05865896
> 0.068659527,0.064182305
> 0.083710757,0.088376406
> 0.094021303,0.107473805
> 0.130456898,0.147682873
> 0.21540961,0.162392478
> 0.236780945,0.181759433
> 0.236936513,0.201847347
> 0.269875322,0.210439736
> 0.294476424,0.226305355
> 0.315561395,0.227038784
> 0.319462902,0.255699197
> 0.327971706,0.288864935
> 0.463861812,0.302035139
> 0.479255866,0.312164668
> 0.564392402,0.388447922
> 0.577382726,0.397416524
> 0.579430243,0.434182601
> 0.588970805,0.438280224
> 0.61542756,0.516128733
> 0.629984706,0.614130775
> 0.698570658,0.675962212
> 0.719544247,0.689950901
> 0.732798731,0.735779895
> 0.813873971,0.778392333
> 0.883957837,0.800207872
> 0.888276157,0.870729822
> 0.888377668,0.911149831
> 0.917545651,0.93512393
> 0.977990461,0.941162349
> 0.980356048,0.986071449
>
> So, depending on one's "IOT reaction" to the plot I've sent offline
> for the two tables above, one might be willing to say that in general,
> CN p's plot significantly lower than CR p's for equivalent SN's and
> SR's.
>
> And this result, assuming you're willing to accept it, is extremely
> important for the following reason.
>
> It says that regardless of dicodon set 1,2,3, the (S,N) subsets
> "evolved/were designed" (depending on your point of view -- heh heh
> heh) so that mutation away from these sets to (C,N) sets does NOT
> change the predictive capacities of ubar and ebar in ueSlope on (ubar,
> ebar) as much as the predictive capacities of ubar and ebar in ueSlope
> on (ubar, ebar) are changed by the mutation of (S,R) sets to (C,R)
> sets.
>
> Or, to boil that statement down even further, the result says that we
> have found a (relative) INVARIANT UNDER MUTATION for (S,N) sets that
> does NOT exist for (S,R) sets. And the existence of this invariant
> strongly suggests that the (S,N) subsets of dicodon sets 1,2,3 all
> evolved to keep certain thermodynamic properties of protein messages
> relatively constant despite the mutation which these messages must
> perforce undergo over time.
>
> Finally, apart from this empirical interpretation of the plot I've
> sent off line, I have a "feeling" that the facts above regarding
> ueSlope on (ubar,ebar) must be related somehow to the facts we've been
> discussing regarding Aubqe on L. But if you agree, then the ball is
> now in your court for the obvious reason that I have neither the
> knowledge nor experience nor statistical brain-power to determine if
> ueSlope on (ubar,ebar) and Aubqe on L are related, and if so how ...