On Dec 13, 7:28 pm, djh <halitsk...@att.net> wrote:
> I.
>
> You asked:
>
> "But here the SEs of Aubque are roughly linear in Length, even tho the
> regressions are for individual lengths, not length intervals. Does
> that hold for the other Axxxx's?"
>
> I will check the other Axxxx's and also the other folds at N, since I
> only gave you b1,C and b1,S at N.
>
> To check linearity, I assume I plot (length,SE)'s as (x,y)'s, and see
> if the plot looks "straight"? (If not, please let me know the check
> you want me to do (otherwise, no need to respond to this question.)
Just eyeball the (L,SE) plots.
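
If a number to go with the eyeball check would help, here is a rough sketch, not a prescribed method. It fits an ordinary least-squares line to the (L,SE) pairs and returns the slope, intercept and R^2, plus the residuals from that line. The function names `line_fit` and `residuals` are made up for illustration. An R^2 near 1 with residuals that show no bend is what "straight" would look like; residuals that run positive, negative, positive (or the reverse) across L suggest curvature.

```python
def line_fit(lengths, ses):
    """Least-squares line SE = intercept + slope*L; returns (slope, intercept, r2)."""
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(ses) / n
    # Centered sums of squares and cross-products
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    syy = sum((y - mean_y) ** 2 for y in ses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, ses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


def residuals(lengths, ses):
    """Observed SE minus the fitted line, in the same order as the input."""
    slope, intercept, _ = line_fit(lengths, ses)
    return [y - (intercept + slope * x) for x, y in zip(lengths, ses)]
```

Calling `line_fit` once per (Set, Subset, Fold) combination would give a table of R^2 values to compare across combinations.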
>
> II. You wrote:
>
> This needs to be understood. Is there something intrinsically
> different going on at the longer lengths, or is it just length per se
> (as in more opportunities)?
>
> The answer to this question would first take a great great great deal
> of what might be called "distributional analysis" of the frequency of
> "dipeptides of interest" and their "dicodons of interest" relative to
> length. (Recall here that for method N and set 1, for example, there
> are 63 "dicodons of interest" encoding 49 "dipeptides of interest".)
>
> And then, once we knew the frequency of "dipeptides of interest" and
> "dicodons of interest" relative to length, we would have to compute
> the possibilities of u-variation and e-variation within each n-tuple
> of dicodons of interest that encodes each dipeptide of interest. Only
> then would we be able to compute an answer to the question of whether
> it's as simple a matter as "possibilities increase with length".
>
> But I would much prefer not to "go there" now for two reasons:
>
> a) the requisite "distributional analysis" and subsequent
> u/e-variation analysis could easily take months if not years;
>
> b) I'd like to see first if linearity of SE with length is essentially
> constant across method x set x subset x fold, or whether the 72
> (MoSS,Set,Subset,Fold) combinations exhibit different degrees of
> linearity of SE with length in some systematic way(s).
>
> For example, it would be a highly desirable outcome (though of course
> "too good to be true") if MoSS = R combinations (and/or Subset = C
> combinations, or (R,C) combinations) exhibited MORE constancy of SE
> with length than MoSS = N combinations (and/or Subset = S, or (N,S)
> combinations). This is because such an outcome would suggest that the
> system might over-represent certain dicodons because
> over-representation of certain dicodons keeps the "enthalpic profile"
> of messages invariant with length.
>
> So, I hope you'll permit me to first do the linearity checks for all
> of the special average slopes (and the special covar) across the 36
> (Set, Subset, Fold) combinations at N, before going to "distributional
> analysis".
When I look at the two (L,Aubqe) plots I have, the first thing I see is an empty vertical gap about 10 units wide whose right edge is near the L-mean. The distribution on the left of the gap looks linear, but the distribution on the right looks more like a quadratic tipped on its side, as in L = a + b*(Aubqe - c)^2. Does anything like that show up anywhere else?
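
For checking other plots for the same pattern, here is a sketch under stated assumptions, not the way I arrived at the description above (that was by eye). `widest_gap` finds the largest empty interval in the L values, and `fit_sideways_quadratic` does a least-squares fit of L = a + b*(Aubqe - c)^2 by regressing L on a quadratic in Aubqe and completing the square. Both function names are invented for illustration, and the fit assumes the points really do curve (b not zero).

```python
def widest_gap(lengths):
    """Return (left_edge, right_edge) of the widest empty interval in L."""
    xs = sorted(lengths)
    _, left, right = max((b - a, a, b) for a, b in zip(xs, xs[1:]))
    return left, right


def fit_sideways_quadratic(a_vals, lengths):
    """Least-squares fit of L = a + b*(A - c)**2; returns (a, b, c)."""
    n = len(a_vals)
    mean_a = sum(a_vals) / n
    u = [v - mean_a for v in a_vals]  # center A for numerical stability
    # Normal equations for L = p0 + p1*u + p2*u**2
    s = [sum(ui ** k for ui in u) for k in range(5)]
    t = [sum(li * ui ** k for ui, li in zip(u, lengths)) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    p0, p1, p2 = (m[i][3] / m[i][i] for i in range(3))
    # Complete the square: p0 + p1*u + p2*u^2 = a + b*(u + p1/(2*p2))^2
    b = p2
    c = mean_a - p1 / (2 * p2)
    a = p0 - p1 ** 2 / (4 * p2)
    return a, b, c
```

One way to use it: split the points at the right edge returned by `widest_gap`, fit a straight line to the left-hand group, and pass only the right-hand group to `fit_sideways_quadratic`.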