Date: Dec 13, 2012 10:28 PM
Author: Halitsky
Subject: Re your question about "linearity of SE's in length"


I. You asked:

"But here the SEs of Aubque are roughly linear in Length, even tho the
regressions are for individual lengths, not length intervals. Does
that hold for the other Axxxx's?"

I will check the other Axxxx's and also the other folds at N, since I
only gave you b1,C and b1,S at N.

To check linearity, I assume I plot the (length, SE)'s as (x, y)'s and
see whether the plot looks "straight". If not, please let me know which
check you want me to do; otherwise, no need to respond to this question.
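One way to make the "looks straight" check less subjective is to fit an ordinary least-squares line to the (length, SE) pairs and look at both R^2 and the pattern of residuals. The Python sketch below assumes the pairs are available as a plain list of tuples; the function name linearity_check and that input format are hypothetical, not part of any existing analysis code.

```python
def linearity_check(points):
    """Least-squares fit of SE = intercept + slope * length.

    points: list of (length, SE) pairs, one per individual length
    (hypothetical input format).
    Returns (intercept, slope, r_squared, residuals).
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 (length, SE) pairs")
    xs = [float(x) for x, _ in points]
    ys = [float(y) for _, y in points]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0:
        raise ValueError("all lengths are identical; slope is undefined")
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 near 1 means the points lie close to a straight line; if SE is
    # constant (syy == 0) a flat line fits exactly, so report 1.0.
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    # Residuals in length order: same-signed runs at the short or long end
    # suggest curvature even when R^2 is high.
    residuals = [y - (intercept + slope * x) for x, y in sorted(zip(xs, ys))]
    return intercept, slope, r_squared, residuals
```

For example, with the toy points [(1, 1), (2, 4), (3, 9), (4, 16)] the function gives R^2 of about 0.97 but residuals of +1, -1, -1, +1, the bowed pattern that signals curvature, so R^2 alone would overstate how "straight" the plot is.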

II. You wrote:

This needs to be understood. Is there something intrinsically
different going on at the longer lengths, or is it just length per se
(as in more opportunities)?

Answering this question would first require a great, great, great deal
of what might be called "distributional analysis" of the frequency of
"dipeptides of interest" and their "dicodons of interest" relative to
length. (Recall here that for method N and set 1, for example, there
are 63 "dicodons of interest" encoding 49 "dipeptides of interest".)
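As a rough illustration of the first step of such a distributional analysis, the sketch below counts, for each sequence length, how often any "dicodon of interest" occurs among the adjacent in-frame codon pairs, and divides by the number of such pairs (a sequence of n codons has n - 1 of them). If that per-position rate stays flat as length grows, raw counts rise only because there are more opportunities. The function name, the assumption that each sequence is an in-frame coding sequence starting at position 0, and the toy dicodon set used below are all hypothetical; the actual 63 dicodons for method N and set 1 are not reproduced here.

```python
from collections import defaultdict


def dicodon_rate_by_length(sequences, dicodons_of_interest):
    """Per-length rate of dicodons of interest per adjacent codon pair.

    sequences: iterable of DNA coding sequences, assumed in frame from
      position 0; length is measured in whole codons.
    dicodons_of_interest: set of 6-nucleotide strings (hypothetical).
    Returns {length_in_codons: hits / adjacent_codon_pairs_scanned}.
    """
    hits = defaultdict(int)   # length -> dicodon-of-interest occurrences
    slots = defaultdict(int)  # length -> adjacent codon pairs scanned
    for seq in sequences:
        seq = seq.upper()
        n_codons = len(seq) // 3
        if n_codons < 2:
            continue  # no adjacent codon pair, so no dicodon to count
        slots[n_codons] += n_codons - 1
        for i in range(n_codons - 1):
            # Codons i and i+1 together form the dicodon at this position.
            if seq[3 * i:3 * i + 6] in dicodons_of_interest:
                hits[n_codons] += 1
    return {length: hits[length] / slots[length] for length in sorted(slots)}
```

With the toy set {"GCTGCT"}, the sequences "ATGGCTGCT" and "ATGATGATG" (3 codons each) and "GCTGCTGCTGCT" (4 codons) give rates of 0.25 at length 3 and 1.0 at length 4.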

And then, once we knew the frequency of "dipeptides of interest" and
"dicodons of interest" relative to length, we would have to compute
the possibilities of u-variation and e-variation within each n-tuple
of dicodons of interest that encodes each dipeptide of interest. Only
then would we be able to compute an answer to the question of whether
it's as simple a matter as "possibilities increase with length".

But I would much prefer not to "go there" now, for two reasons:

a) the requisite "distributional analysis" and subsequent u/e-variation
analysis could easily take months if not years;

b) I'd like to see first whether linearity of SE with length is essentially
constant across method x set x subset x fold, or whether the 72
(MoSS,Set,Subset,Fold) combinations exhibit different degrees of
linearity of SE with length in some systematic way(s).

For example, it would be a highly desirable outcome (though of course
"too good to be true") if MoSS = R combinations (and/or Subset = C
combinations, or (R,C) combinations) exhibited MORE constancy of SE
with length than MoSS = N combinations (and/or Subset = S, or (N,S)
combinations). This is because such an outcome would suggest that the
system might over-represent certain dicodons because over-representation
of certain dicodons keeps the "enthalpic profile" of messages invariant
with length.
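To look for systematic differences of this kind across the 72 (MoSS, Set, Subset, Fold) combinations, one could fit a line per combination and then average the fit statistics within each level of a factor. The sketch below reports mean |slope| (smaller would mean SE is more nearly constant with length) and mean R^2 (closer to 1 would mean SE is more nearly linear in length). The dictionary layout keyed by (MoSS, Set, Subset, Fold) tuples and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

FACTORS = ("MoSS", "Set", "Subset", "Fold")


def fit_line(points):
    """Return (slope, r_squared) of a least-squares line through (length, SE)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    if sxx == 0:
        raise ValueError("need at least two distinct lengths")
    # Convention: a constant SE (syy == 0) is fit exactly by a flat line.
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return sxy / sxx, r2


def summarize_by_factor(results, factor):
    """Average |slope| and R^2 per level of one factor.

    results: {(moss, set_, subset, fold): [(length, SE), ...]}
    factor: one of FACTORS, e.g. "MoSS" or "Subset".
    Returns {level: (mean_abs_slope, mean_r_squared)}.
    """
    idx = FACTORS.index(factor)
    groups = defaultdict(list)
    for key, points in results.items():
        slope, r2 = fit_line(points)
        groups[key[idx]].append((abs(slope), r2))
    return {
        level: (mean(s for s, _ in fits), mean(r for _, r in fits))
        for level, fits in sorted(groups.items())
    }
```

Calling summarize_by_factor(results, "MoSS") would put the N and R averages side by side; the same call with "Subset" compares C against S. Averages can hide spread, so the per-combination fits are worth keeping alongside the summary.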

So, I hope you'll permit me to first do the linearity checks for all
of the special average slopes (and the special covar) across the 36
(Set, Subset, Fold) combinations at N, before going to "distributional