```
Date: Dec 14, 2012 12:08 AM
Author: Ray Koopman
Subject: Re: Re your question about "linearity of SE’s in length"

On Dec 13, 7:28 pm, djh <halitsk...@att.net> wrote:
> I.
>
> You asked:
>
> "But here the SEs of Aubque are roughly linear in Length, even tho the
> regressions are for individual lengths, not length intervals. Does
> that hold for the other Axxxx's?"
>
> I will check the other Axxxx's and also the other folds at N, since I
> only gave you b1,C and b1,S at N.
>
> To check linearity, I assume I plot (length,SE)'s as (x,y)'s, and see
> if the plot looks "straight"?  (If not, please let me know the check
> you want me to do; otherwise, no need to respond to this question.)

Just eyeball the (L,SE) plots.

> II.  You wrote:
>
> This needs to be understood. Is there something intrinsically
> different going on at the longer lengths, or is it just length per se
> (as in more opportunities)?
>
> The answer to this question would first take a great great great deal
> of what might be called "distributional analysis" of the frequency of
> "dipeptides of interest" and their "dicodons of interest" relative to
> length. (Recall here that for method N and set 1, for example, there
> are 63 "dicodons of interest" encoding 49 "dipeptides of interest".)
>
> And then, once we knew the frequency of "dipeptides of interest" and
> "dicodons of interest" relative to length, we would have to compute
> the possibilities of u-variation and e-variation within each n-tuple
> of dicodons of interest that encodes each dipeptide of interest.  Only
> then would we be able to compute an answer to the question of whether
> it's as simple a matter as "possibilities increase with length".
>
> But I would much prefer not to "go there" now for two reasons:
>
> a) the requisite "distributional analysis" and subsequent
> u/e-variation analysis could easily take months if not years;
>
> b) I'd like to see first if linearity of SE with length is essentially
> constant across method x set x subset x fold, or whether the 72
> (MoSS,Set,Subset,Fold) combinations exhibit different degrees of
> linearity of SE with length in some systematic way(s).
>
> For example, it would be a highly desirable outcome (though of course
> "too good to be true") if MoSS = R combinations (and/or Subset = C
> combinations, or (R,C) combinations) exhibited MORE constancy of SE
> with length than MoSS = N combinations (and/or Subset = S, or (N,S)
> combinations).  This is because such an outcome would suggest that the
> system might over-represent certain dicodons because
> over-representation of certain dicodons keeps the "enthalpic profile"
> of messages invariant with length.
>
> So, I hope you'll permit me to first do the linearity checks for all
> of the special average slopes (and the special covar) across the 36
> (Set, Subset, Fold) combinations at N, before going to "distributional
> analysis".

When I look at the two (L,Aubqe) plots I have, the first thing I see
is an empty vertical gap about 10 units wide whose right edge is near
the L-mean. The distribution on the left of the gap looks linear, but
the distribution on the right looks more like a quadratic tipped on
its side, as in L = a + b*(Aubqe - c)^2. Does anything like that show
up anywhere else?
```
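For anyone who wants a number to go with the eyeball check, here is a minimal Python sketch of the two shapes mentioned in the post: a straight-line fit of SE on L (reported as R^2), and a fit of the sideways quadratic L = a + b*(Aubqe - c)^2. The function names (`r_squared`, `fit_sideways_parabola`, `solve3`) are hypothetical and not part of the thread. The sketch assumes ordinary least squares is acceptable for both fits and that the inputs are plain lists of numbers. The quadratic is fitted as an ordinary polynomial in the centered Aubqe values and then rewritten in the (a, b, c) form, so it needs a nonzero curvature term.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def fit_sideways_parabola(aubqe, length):
    """Least-squares fit of L = a + b*(Aubqe - c)^2; returns (a, b, c)."""
    ma = mean(aubqe)
    z = [v - ma for v in aubqe]  # center to keep the normal equations well scaled
    s = [sum(zi ** k for zi in z) for k in range(5)]
    t = [sum((zi ** k) * li for zi, li in zip(z, length)) for k in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    q0, q1, q2 = solve3(m, t)  # L = q0 + q1*z + q2*z^2
    b = q2
    c = ma - q1 / (2 * q2)     # vertex location on the Aubqe axis
    a = q0 - q1 ** 2 / (4 * q2)
    return a, b, c
```

One way to use it: call `r_squared(L, SE)` once per (Set, Subset, Fold) combination and compare the values, and apply `fit_sideways_parabola` only to the points to the right of the gap. An R^2 near 1 only says that a line fits well, so it supplements the plots and does not replace them.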