Date: Dec 14, 2012 12:08 AM
Author: Ray Koopman
Subject: Re: Re your question about "linearity of SE’s in length"

On Dec 13, 7:28 pm, djh <halitsk...@att.net> wrote:

> I.
>
> You asked:
>
> "But here the SEs of Aubque are roughly linear in Length, even tho the
> regressions are for individual lengths, not length intervals. Does
> that hold for the other Axxxx's?"
>
> I will check the other Axxxx's and also the other folds at N, since I
> only gave you b1,C and b1,S at N.
>
> To check linearity, I assume I plot (length,SE)'s as (x,y)'s, and see
> if the plot looks "straight"? (If not, please let me know the check
> you want me to do; otherwise, no need to respond to this question.)

Just eyeball the (L,SE) plots.
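
As a numeric complement to eyeballing, one could fit a straight line to each series of (L, SE) points and look at R^2 and at the pattern of residuals. The sketch below is a minimal illustration under that assumption, in plain Python; the function names `linear_fit` and `residuals` are hypothetical and not part of any analysis described in this thread.

```python
# Hypothetical helpers (not from the thread): a numeric complement to
# eyeballing (L, SE) plots for one regression series.
from typing import List, Sequence, Tuple


def linear_fit(lengths: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fit SE = a + b*L by ordinary least squares; return (a, b, r_squared)."""
    n = len(lengths)
    if n != len(ses) or n < 3:
        raise ValueError("need at least three paired (L, SE) points")
    mx = sum(lengths) / n
    my = sum(ses) / n
    sxx = sum((x - mx) ** 2 for x in lengths)
    syy = sum((y - my) ** 2 for y in ses)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, ses))
    if sxx == 0:
        raise ValueError("all lengths are equal; slope is undefined")
    b = sxy / sxx
    a = my - b * mx
    # For a straight-line fit, R^2 = sxy^2 / (sxx * syy); a flat SE series
    # (syy == 0) is fitted exactly, so it is reported as 1.0.
    r2 = 1.0 if syy == 0 else sxy * sxy / (sxx * syy)
    return a, b, r2


def residuals(lengths: Sequence[float], ses: Sequence[float]) -> List[float]:
    """Residuals SE - (a + b*L), in the order given.

    Sorted by L, a run of positive, then negative, then positive residuals
    (or the reverse) suggests curvature rather than a straight line.
    """
    a, b, _ = linear_fit(lengths, ses)
    return [y - (a + b * x) for x, y in zip(lengths, ses)]
```

A high R^2 alone can hide systematic curvature, which is why the residual pattern is worth a glance alongside it.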

>
> II. You wrote:
>
> This needs to be understood. Is there something intrinsically
> different going on at the longer lengths, or is it just length per se
> (as in more opportunities)?
>
> The answer to this question would first take a great great great deal
> of what might be called "distributional analysis" of the frequency of
> "dipeptides of interest" and their "dicodons of interest" relative to
> length. (Recall here that for method N and set 1, for example, there
> are 63 "dicodons of interest" encoding 49 "dipeptides of interest".)
>
> And then, once we knew the frequency of "dipeptides of interest" and
> "dicodons of interest" relative to length, we would have to compute
> the possibilities of u-variation and e-variation within each n-tuple
> of dicodons of interest that encodes each dipeptide of interest. Only
> then would we be able to compute an answer to the question of whether
> it's as simple a matter as "possibilities increase with length".
>
> But I would much prefer not to "go there" now for two reasons:
>
> a) the requisite "distributional analysis" and subsequent
> u/e-variation analysis could easily take months if not years;
>
> b) I'd like to see first if linearity of SE with length is essentially
> constant across method x set x subset x fold, or whether the 72
> (MoSS,Set,Subset,Fold) combinations exhibit different degrees of
> linearity of SE with length in some systematic way(s).
>
> For example, it would be a highly desirable outcome (though of course
> "too good to be true") if MoSS = R combinations (and/or Subset = C
> combinations, or (R,C) combinations) exhibited MORE constancy of SE
> with length than MoSS = N combinations (and/or Subset = S, or (N,S)
> combinations). This is because such an outcome would suggest that the
> system might over-represent certain dicodons because
> over-representation of certain dicodons keeps the "enthalpic profile"
> of messages invariant with length.
>
> So, I hope you'll permit me to first do the linearity checks for all
> of the special average slopes (and the special covar) across the 36
> (Set, Subset, Fold) combinations at N, before going to "distributional
> analysis".
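
The comparison proposed in (b) could be sketched like this, assuming the (L, SE) pairs for each (MoSS, Set, Subset, Fold) combination are already in hand. The key layout, the level labels such as "N", "1", "C", "b1", and the names `slope_and_r2` and `summarize_by` are hypothetical; averaging slope and R^2 per factor level is just one simple way to summarize "degree of linearity", not the method used in the thread.

```python
# Hypothetical sketch: compare how linear SE is in L across
# (MoSS, Set, Subset, Fold) combinations, grouped by one factor.
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

FACTORS = ("MoSS", "Set", "Subset", "Fold")
Combo = Tuple[str, str, str, str]      # e.g. ("N", "1", "C", "b1")
Points = Sequence[Tuple[float, float]]  # (L, SE) pairs for one combination


def slope_and_r2(points: Points) -> Tuple[float, float]:
    """OLS slope of SE on L and the R^2 of that straight-line fit."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0:
        raise ValueError("all lengths are equal; slope is undefined")
    # A perfectly flat SE series is fitted exactly, so R^2 is reported as 1.0.
    r2 = 1.0 if syy == 0 else sxy * sxy / (sxx * syy)
    return sxy / sxx, r2


def summarize_by(data: Dict[Combo, Points], factor: str) -> Dict[str, Tuple[float, float]]:
    """Mean slope and mean R^2 over all combinations sharing a level of `factor`."""
    idx = FACTORS.index(factor)
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for combo, points in data.items():
        groups[combo[idx]].append(slope_and_r2(points))
    return {
        level: (mean(s for s, _ in fits), mean(r for _, r in fits))
        for level, fits in groups.items()
    }
```

Calling `summarize_by(data, "MoSS")` and `summarize_by(data, "Subset")` would give the R-vs-N and C-vs-S contrasts mentioned above; a flatter mean slope would correspond to "more constancy of SE with length".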

When I look at the two (L,Aubqe) plots I have, the first thing I see
is an empty vertical gap about 10 units wide whose right edge is near
the L-mean. The distribution on the left of the gap looks linear, but
the distribution on the right looks more like a quadratic tipped on
its side, as in L = a + b*(Aubqe - c)^2. Does anything like that show
up anywhere else?
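
One way to put numbers on the pattern described above is to locate the widest empty gap in L, split the points there, and fit the right-hand group with L = a + b*(Aubqe - c)^2 by least squares. Because the parabola is tipped on its side, L is regressed on Aubqe rather than the other way round. This is a minimal sketch under those assumptions: all names are hypothetical, `y` stands for the plotted Aubqe value, and the left-hand group could be checked with an ordinary straight-line fit.

```python
# Hypothetical sketch: find the widest empty gap in L, then fit the
# right-hand cloud with a sideways parabola L = a + b*(y - c)^2, where
# y is the plotted Aubqe value. This is ordinary least squares for L as
# a quadratic in y, with y centred for numerical stability.
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (L, y)


def largest_gap(ls: Sequence[float]) -> Tuple[float, float]:
    """Return (left_edge, right_edge) of the widest gap between sorted L values."""
    s = sorted(ls)
    if len(s) < 2:
        raise ValueError("need at least two L values")
    i = max(range(len(s) - 1), key=lambda k: s[k + 1] - s[k])
    return s[i], s[i + 1]


def split_at_gap(points: Sequence[Point]) -> Tuple[List[Point], List[Point]]:
    """Split (L, y) points into those left and right of the widest L gap."""
    lo, hi = largest_gap([L for L, _ in points])
    left = [p for p in points if p[0] <= lo]
    right = [p for p in points if p[0] >= hi]
    return left, right


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system m @ p = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    p = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        p[r] = (a[r][3] - sum(a[r][k] * p[k] for k in range(r + 1, 3))) / a[r][r]
    return p


def fit_sideways_quadratic(points: Sequence[Point]) -> Tuple[float, float, float]:
    """Least-squares (a, b, c) for L = a + b*(y - c)^2."""
    ys = [y for _, y in points]
    my = sum(ys) / len(ys)
    us = [y - my for y in ys]            # centred y
    ls = [L for L, _ in points]
    s = [sum(u ** k for u in us) for k in range(5)]  # power sums of u
    t = [sum(u ** k * L for u, L in zip(us, ls)) for k in range(3)]
    # Normal equations for L = q0 + q1*u + q2*u^2.
    q0, q1, q2 = _solve3([[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], t)
    if q2 == 0:
        raise ValueError("no curvature: the fit is a straight line")
    # Expand b*(u - cu)^2 = b*u^2 - 2*b*cu*u + b*cu^2 and match coefficients.
    cu = -q1 / (2 * q2)
    return q0 - q2 * cu ** 2, q2, my + cu
```

Comparing the residual spread of this fit with that of a straight line on the same right-hand points would indicate whether the sideways quadratic is actually the better description.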