Date: Dec 14, 2012 3:48 AM
Author: Ray Koopman
Subject: Re: Re your questions about the plots sent off-line (and the underlying data posted here 12/13 at 10:33am)
On Dec 13, 9:45 pm, djh <halitsk...@att.net> wrote:
> You wrote:
> "What are the 1..36? All the other values are monotone increasing.
> Did they come that way, or did you sort them?
> The best way to see the difference between the plots is to take cols
> 2 & 3 as x & y coordinates, then plot the points along with a line
> from (0,0) to (1,1). The S-plot is mostly below the line. The C-plot
> is mostly above. I'm not as struck by that difference as you seem to
> be. Where did the numbers come from?"
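
For readers without the plot, the comparison against the (0,0)-(1,1) line can be reduced to a count of points on each side of it. This is only a minimal sketch in Python: the function name is hypothetical and the numbers are made up for illustration, not taken from the tables discussed in this thread.

```python
# Made-up sorted p-values for illustration only; NOT the data from this thread.
x_vals = [0.02, 0.10, 0.25, 0.40, 0.70]   # plays the role of col 2
y_vals = [0.05, 0.08, 0.20, 0.30, 0.65]   # plays the role of col 3


def side_of_diagonal(xs, ys):
    """Count points (x, y) lying below, on, and above the line y = x."""
    below = sum(1 for x, y in zip(xs, ys) if y < x)
    on = sum(1 for x, y in zip(xs, ys) if y == x)
    above = sum(1 for x, y in zip(xs, ys) if y > x)
    return below, on, above


print(side_of_diagonal(x_vals, y_vals))   # (below, on, above)
```

A plot "mostly below the line" shows up here as a large first count relative to the third.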
> 1. The 1...36 are irrelevant if the data are plotted the way you
> suggest -- they were just a way of giving Excel an x-axis to plot
> against. And thanks very much for the suggestion as to how to plot in
> cases like this -- of course it never would have occurred to me to do
> it that way, and I was delighted to see that Excel lets you do it
> pretty easily (for a Microsoft-owned product, that is).
You should seriously consider a real plotting program such as
> 2. Yes -- columns 2 and 3 were sorted.
> 3. Here's where the numbers came from.
> Recall that:
> a) the fold x subset "het" data which I presented for Aubuqe on L at
> MoSS N, set 1:
> Slopes of Regressions of Aubqe on Length (L)
> for each Fold x Subset | Set 1, Method N
>
> Fold x Subset |   # of     Slope of
> Set 1, Meth N     L's      Aubqe on L
> a3_S_1_N           70      -0.000188
> c1_C_1_N          101      -0.000026
> a3_C_1_N           48       0.000052
> c1_S_1_N          101       0.000266
> c2_S_1_N           96       0.000421
> c2_C_1_N           95       0.000550
> b47_C_1_N          99       0.000618
> a1_S_1_N          101       0.001069
> b47_S_1_N          99       0.001079
> b1_S_1_N           31       0.001119
> b1_C_1_N           28       0.002015
> a1_C_1_N          101       0.002210
> were selected (because of their low associated "het" p) from the fold
> x subset data for the regression Aubque on L computed for ALL six
> combinations of Set x MoSS.
> b) to get all the fold x subset Aubque on L data for all combinations
> of Set x MoSS, we obviously had to first regress c on (e,u,u*e,u^2) at
> each Len x Set x MoSS x Fold x Subset.
You seem to switch willy-nilly between Aubuqe, Aubqe, and Aubque.
How do they differ?
> Call this entire set of underlying data for c on (e,u,u*e,u^2) the
> "Rubq-base", and instead of computing the regression Aubque on L
> over the entire Rubq-base, compute the regression ueSlope on (ubar,
> ebar) over the entire Rubq-base, where:
> i) ueSlope is the slope of the u*e term in c on (e,u,u*e,u^2);
Do you mean the coefficient of u*e?
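
To make the question concrete: in an ordinary least-squares fit of c on (e, u, u*e, u^2), the quantity attached to the u*e term is one of five fitted coefficients. The sketch below is only an illustration under assumptions: it assumes the model includes an intercept, the data and "true" coefficients are made up, and the helper names are hypothetical. It uses plain Python and the normal equations rather than any particular statistics package.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_c_on_e_u(e, u, c):
    """Least-squares fit of c = b0 + b1*e + b2*u + b3*u*e + b4*u^2 (intercept assumed)."""
    rows = [[1.0, ei, ui, ui * ei, ui * ui] for ei, ui in zip(e, u)]
    p = 5
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * ci for r, ci in zip(rows, c)) for i in range(p)]
    return solve(xtx, xty)


# Made-up, noise-free data so the fit should recover the made-up coefficients.
rng = random.Random(0)
e = [rng.uniform(0, 1) for _ in range(50)]
u = [rng.uniform(0, 1) for _ in range(50)]
true_b = [0.5, 1.0, -2.0, 3.0, 0.25]
c = [true_b[0] + true_b[1] * ei + true_b[2] * ui + true_b[3] * ui * ei
     + true_b[4] * ui * ui for ei, ui in zip(e, u)]

b = fit_c_on_e_u(e, u, c)
ue_coef = b[3]          # the coefficient of the u*e term
print(ue_coef)
```

Here `b[3]` is what I would call the coefficient of u*e; whether that is what "ueSlope" means is exactly the question.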
> ii) ubar is the mean of "u" (=u/(1+u)) at each L and ebar is the mean
> of "e" at each L.
> From each computation of ueSlope on (ubar, ebar) we have a pair of
> slopes with a pair of associated probabilities, and therefore across
> all combinations of Set x MoSS x Fold x Subset, we have 72 such pairs
> of probabilities, or 144 probabilities in all.
What is the "computation of ueSlope on (ubar, ebar)"?
How do you get a pair of p's from it?
> DISREGARDING Fold and Set, divide these 144 probabilities into four
> groups:
> 36 at subset S, Method N
> 36 at subset C, Method N
> 36 at subset S, Method R
> 36 at subset C, Method R
> Sort each of these groups independently (lowest to highest p), and
> then pair off elements of these four groups as follows:
> pair off the 36 from S,N with the 36 from C,N by corresponding rank
> (from the sort of each group)
> pair off the 36 from S,R with the 36 from C,R by corresponding rank
> (from the sort of each group)
Regardless of the answers to my previous questions, you can't split
naturally paired p's, sort them, re-pair them, and then compare the
re-paired p's -- which you shouldn't compare in the first place,
even without the shuffling, because p-values are NOT effect sizes.
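
The shuffling problem is easy to demonstrate. The sketch below is an illustration with simulated numbers, not the data from this thread: it draws two completely unrelated sets of 36 uniform "p-values", then compares the correlation under the original pairing with the correlation after each set is sorted separately and re-paired by rank. The helper name is hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
# Two independent sets of simulated p-values: by construction there is
# NO relationship between the groups.
group_a = [rng.random() for _ in range(36)]
group_b = [rng.random() for _ in range(36)]

r_natural = pearson(group_a, group_b)                    # original pairing
r_repaired = pearson(sorted(group_a), sorted(group_b))   # sorted, re-paired by rank

print(f"original pairing: r = {r_natural:.3f}")
print(f"after sorting and re-pairing: r = {r_repaired:.3f}")
```

Sorting both groups forces the re-paired points into a monotone sequence, so they will track one another closely whether or not the underlying quantities are related. Any pattern seen in such a plot reflects only the two marginal distributions, not the pairs.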
> (Note (!!!!) that these pairings are DIFFERENT (!!!) from the
> pairings of (S,N) with (S,R) and (C,N) with (C,R) which I presented in
> my post of 12/13@12:33.)
> You will then have these two tables of paired p's (and the associated
> plot "done your way", which I've sent offline):
> So, depending on one's "IOT reaction" to the plot I've sent offline
> for the two tables above, one might be willing to say that in general,
> CN p's plot significantly lower than CR p's for equivalent SN's and
> And this result, assuming you're willing to accept it, is extremely
> important for the following reason.
> It says that regardless of dicodon set 1,2,3, the (S,N) subsets
> "evolved/were designed" (depending on your point of view -- heh heh
> heh) so that mutation away from these sets to (C,N) sets does NOT
> change the predictive capacities of ubar and ebar in ueSlope on (ubar,
> ebar) as much as the predictive capacities of ubar and ebar in ueSlope
> on (ubar, ebar) are changed by the mutation of (S,R) sets to (C,R)
> Or, to boil that statement down even further, the result says that we
> have found a (relative) INVARIANT UNDER MUTATION for (S,N) sets that
> does NOT exist for (S,R) sets. And the existence of this invariant
> strongly suggests that the (S,N) subsets of dicodon sets 1,2,3 all
> evolved to keep certain thermodynamic properties of protein messages
> relatively constant despite the mutation which these messages must
> perforce undergo over time.
> Finally, apart from this empirical interpretation of the plot I've
> sent off line, I have a "feeling" that the facts above regarding
> ueSlope on (ubar,ebar) must be related somehow to the facts we've been
> discussing regarding Aubqe on L. But if you agree, then the ball is
> now in your court for the obvious reason that I have neither the
> knowledge nor experience nor statistical brain-power to determine if
> ueSlope on (ubar,ebar) and Aubqe on L are related, and if so how ...