Date: Dec 14, 2012 12:08 AM
Author: Ray Koopman
Subject: Re: Re your question about "linearity of SE’s in length"

On Dec 13, 7:28 pm, djh <halitsk...@att.net> wrote:
> I.
>
> You asked:
>
> "But here the SEs of Aubque are roughly linear in Length, even tho the
> regressions are for individual lengths, not length intervals. Does
> that hold for the other Axxxx's?"
>
> I will check the other Axxxx's and also the other folds at N, since I
> only gave you b1,C and b1,S at N.
>
> To check linearity, I assume I plot (length,SE)'s as (x,y)'s, and see
> if the plot looks "straight"? (If not, please let me know the check
> you want me to do; otherwise, no need to respond to this question.)


Just eyeball the (L,SE) plots.
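
If a number is wanted to go alongside the eyeball check, one option is to fit a straight line to the (L,SE) pairs and look at R^2 and the residuals. The following is only a minimal sketch: the function names are made up for illustration, and the (length, SE) values are placeholders, not values from this thread.

```python
# Rough numeric companion to eyeballing an (L, SE) plot.
# ASSUMPTION: the data below are made-up placeholders, not real results.

def linear_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    # For a simple linear fit, R^2 is the squared correlation of x and y.
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return intercept, slope, r2

def residuals(xs, ys, intercept, slope):
    """Observed minus fitted values; a curved pattern here suggests nonlinearity."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

lengths = [20, 30, 40, 50, 60, 70, 80]              # hypothetical lengths
ses = [0.11, 0.16, 0.20, 0.26, 0.30, 0.36, 0.40]    # hypothetical SEs

a, b, r2 = linear_fit(lengths, ses)
res = residuals(lengths, ses, a, b)
print(f"intercept={a:.4f} slope={b:.5f} R^2={r2:.4f}")
print("residuals:", [round(r, 4) for r in res])
```

A high R^2 alone does not establish linearity; residuals that run positive, then negative, then positive (or the reverse) along L would point to curvature even when R^2 is large.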

>
> II. You wrote:
>
> This needs to be understood. Is there something intrinsically
> different going on at the longer lengths, or is it just length per se
> (as in more opportunities)?
>
> The answer to this question would first take a great great great deal
> of what might be called "distributional analysis" of the frequency of
> "dipeptides of interest" and their "dicodons of interest" relative to
> length. (Recall here that for method N and set 1, for example, there
> are 63 "dicodons of interest" encoding 49 "dipeptides of interest".)
>
> And then, once we knew the frequency of "dipeptides of interest" and
> "dicodons of interest" relative to length, we would have to compute
> the possibilities of u-variation and e-variation within each n-tuple
> of dicodons of interest that encodes each dipeptide of interest. Only
> then would we be able to compute an answer to the question of whether
> it's as simple a matter as "possibilities increase with length".
>
> But I would much prefer not to "go there" now for two reasons:
>
> a) the requisite "distributional analysis" and subsequent
> u/e-variation analysis could easily take months if not years;
>
> b) I'd like to see first if linearity of SE with length is essentially
> constant across method x set x subset x fold, or whether the 72
> (MoSS,Set,Subset,Fold) combinations exhibit different degrees of
> linearity of SE with length in some systematic way(s).
>
> For example, it would be a highly desirable outcome (though of course
> "too good to be true") if MoSS = R combinations (and/or Subset = C
> combinations, or (R,C) combinations) exhibited MORE constancy of SE
> with length than MoSS = N combinations (and/or Subset = S, or (N,S)
> combinations). This is because such an outcome would suggest that the
> system might over-represent certain dicodons because
> over-representation of certain dicodons keeps the "enthalpic profile" of
> messages invariant with length.
>
> So, I hope you'll permit me to first do the linearity checks for all
> of the special average slopes (and the special covar) across the 36
> (Set, Subset, Fold) combinations at N, before going to "distributional
> analysis".


When I look at the two (L,Aubqe) plots I have, the first thing I see
is an empty vertical gap about 10 units wide whose right edge is near
the L-mean. The distribution on the left of the gap looks linear, but
the distribution on the right looks more like a quadratic tipped on
its side, as in L = a + b*(Aubqe - c)^2. Does anything like that show
up anywhere else?
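
One way to put numbers on the sideways-quadratic impression is to regress L on Aubqe and Aubqe^2 for the points to the right of the gap. Expanding L = a + b*(A - c)^2 gives L = p0 + p1*A + p2*A^2 with p2 = b, p1 = -2*b*c, and p0 = a + b*c^2, so a, b, c can be recovered from an ordinary quadratic fit. The sketch below assumes that approach; the function names are hypothetical, and the data are synthetic, generated from known a, b, c purely to show that the recovery works.

```python
# Sketch: fit L = a + b*(A - c)^2 by least squares on the basis 1, A, A^2.
# ASSUMPTION: A stands in for the Aubqe values right of the gap; the data
# below are synthetic placeholders, not values from this thread.

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        s = sum(aug[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = (aug[r][3] - s) / aug[r][r]
    return x

def fit_sideways_quadratic(A, L):
    """Least-squares fit of L = a + b*(A - c)^2; returns (a, b, c)."""
    # Normal equations for L = p0 + p1*A + p2*A^2.
    s = [sum(x ** k for x in A) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(A, L)) for k in range(3)]
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    p0, p1, p2 = solve3(m, t)
    b = p2
    c = -p1 / (2 * b)      # vertex location on the A axis
    a = p0 - b * c * c     # value of L at the vertex
    return a, b, c

A_vals = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
L_vals = [60 + 2.5 * (x - 1.0) ** 2 for x in A_vals]   # true a=60, b=2.5, c=1

a, b, c = fit_sideways_quadratic(A_vals, L_vals)
print(f"a={a:.3f} b={b:.3f} c={c:.3f}")
```

Note that this minimizes errors in L, treating Aubqe as the predictor. If Aubqe is the noisier of the two variables, the estimates may be biased, and a b close to zero makes c unstable, so the fit is best read alongside the plot rather than in place of it.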