Date: Dec 13, 2012 10:28 PM
Author: Halitsky
Subject: Re your question about "linearity of SE's in length"

I. You asked:

"But here the SEs of Aubque are roughly linear in Length, even tho the regressions are for individual lengths, not length intervals. Does that hold for the other Axxxx's?"

I will check the other Axxxx's and also the other folds at N, since I only gave you b1,C and b1,S at N.

To check linearity, I assume I plot the (length,SE)'s as (x,y)'s and see if the plot looks "straight"? (If not, please let me know the check you want me to do; otherwise, no need to respond to this question.)
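One way to make the "looks straight" check less dependent on eyeballing might be to fit a least-squares line to the (length,SE) pairs and look at R^2 alongside the plot. The sketch below is only an illustration under that assumption: the function name `linearity_check` and the sample numbers are made up and are not values from the regressions.

```python
import numpy as np

def linearity_check(lengths, ses):
    """Fit SE = intercept + slope*length by least squares.

    Returns (slope, intercept, r_squared). r_squared is NaN when all
    SEs are identical, since the total sum of squares is then zero.
    """
    x = np.asarray(lengths, dtype=float)
    y = np.asarray(ses, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit: [slope, intercept]
    residuals = y - (intercept + slope * x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared

# Made-up (length, SE) pairs, for illustration only.
lengths = [20, 30, 40, 50, 60]
ses = [0.11, 0.16, 0.20, 0.26, 0.30]
slope, intercept, r2 = linearity_check(lengths, ses)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r2:.3f}")
```

A high R^2 alone can hide curvature, so it may also be worth checking whether the residuals change sign systematically with length.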

II. You wrote:

This needs to be understood. Is there something intrinsically different going on at the longer lengths, or is it just length per se (as in more opportunities)?

Answering this question would first require a very great deal of what might be called "distributional analysis" of the frequency of "dipeptides of interest" and their "dicodons of interest" relative to length. (Recall here that for method N and set 1, for example, there are 63 "dicodons of interest" encoding 49 "dipeptides of interest".)

And then, once we knew the frequency of "dipeptides of interest" and "dicodons of interest" relative to length, we would have to compute the possibilities of u-variation and e-variation within each n-tuple of dicodons of interest that encodes each dipeptide of interest. Only then would we be able to compute an answer to the question of whether it's as simple a matter as "possibilities increase with length".

But I would much prefer not to "go there" now for two reasons:

a) the requisite "distributional analysis" and subsequent u/e-variation analysis could easily take months if not years;

b) I'd like to see first if linearity of SE with length is essentially constant across method x set x subset x fold, or whether the 72 (MoSS,Set,Subset,Fold) combinations exhibit different degrees of linearity of SE with length in some systematic way(s).

For example, it would be a highly desirable outcome (though of course "too good to be true") if MoSS = R combinations (and/or Subset = C combinations, or (R,C) combinations) exhibited MORE constancy of SE with length than MoSS = N combinations (and/or Subset = S combinations, or (N,S) combinations). This is because such an outcome would suggest that the system might over-represent certain dicodons because over-representation of certain dicodons keeps the "enthalpic profile" of messages invariant with length.

So, I hope you'll permit me to first do the linearity checks for all of the special average slopes (and the special covar) across the 36 (Set, Subset, Fold) combinations at N, before going to "distributional analysis".
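As a rough sketch of how the per-combination checks might be organized: group the (length,SE) pairs by their (MoSS,Set,Subset,Fold) key, then compute a slope and R^2 for each group so the combinations can be compared side by side. The row layout, the function name `summarize_by_combination`, the fold label "f1", and the toy numbers are all assumptions for illustration; none of them are results.

```python
from collections import defaultdict
import numpy as np

def summarize_by_combination(rows):
    """rows: iterable of (moss, set_id, subset, fold, length, se).

    Returns {(moss, set_id, subset, fold): (slope, r_squared)}, where
    slope comes from a least-squares line of SE on length and
    r_squared is the squared Pearson correlation of length and SE.
    """
    groups = defaultdict(list)
    for moss, set_id, subset, fold, length, se in rows:
        groups[(moss, set_id, subset, fold)].append((length, se))

    summary = {}
    for key, pairs in groups.items():
        x, y = np.array(pairs, dtype=float).T  # split columns into x and y
        slope = np.polyfit(x, y, 1)[0]
        r = np.corrcoef(x, y)[0, 1]
        summary[key] = (slope, r ** 2)
    return summary

# Toy rows with hypothetical labels and values, for illustration only.
rows = [
    ("N", 1, "S", "f1", 20, 0.10),
    ("N", 1, "S", "f1", 40, 0.20),
    ("N", 1, "S", "f1", 60, 0.31),
    ("R", 1, "C", "f1", 20, 0.15),
    ("R", 1, "C", "f1", 40, 0.16),
    ("R", 1, "C", "f1", 60, 0.15),
]
summary = summarize_by_combination(rows)
for key in sorted(summary):
    slope, r2 = summary[key]
    print(key, f"slope={slope:.5f}", f"R^2={r2:.3f}")
```

In this layout a slope near zero would indicate SE roughly constant with length, while R^2 indicates how well a straight line describes whatever trend there is; which of the two is the better yardstick for "constancy" is left open here.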