For the a1 fold, the three-column table at the end of this post gives:
1) LId (integer length interval id from 1 to 12 for our new constant-ratio intervals; this is the L that figures in our ratio c/L)
2) value of c (this is the c that figures in our ratio c/L)
3) n (number of a1 fold subsequences whose length is within the interval specified by LId and which contain c positions occupied by either a study dicodon or a control dicodon; c is neutral between the two because a position is counted in c if it is occupied by any one of the 82 dipeptides that are encoded EITHER by the 119 study group dicodons OR by the 1058 control dicodons)
The reason I'm providing this table is that I'm hoping you'll be able to analyze it to check whether there's any property of the c vs L distribution that would call into question the meaningfulness of our driver correlations lnc-lne on lnc-lnL and lnc-lnu on lnc-lnL. For example, if you look at the counts per c within any single LId, they look very "normal-ish". If this is in fact the case, I'm concerned that it may have an undesirable effect on the ratio c/L, i.e. an effect that would force any correlation involving the term c/L to be discarded.
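One way to make "normal-ish" concrete, offered only as a rough sketch: treat the table as rows of (LId, c, n) and compute, for each LId, the n-weighted mean, standard deviation, skewness and excess kurtosis of c. Skewness and excess kurtosis both near zero would be consistent with the normal-ish look; clearly nonzero values would argue against it. The function name `summarize_by_lid` and the example rows are made up for illustration and are not values from the a1 table.

```python
from collections import defaultdict
from math import sqrt


def summarize_by_lid(rows):
    """Summarize the distribution of c within each LId.

    rows: iterable of (lid, c, n) tuples, where n is the number of
    subsequences in length interval lid that have exactly c counted positions.
    Returns {lid: {"total_n", "mean", "sd", "skew", "excess_kurtosis"}}.
    """
    groups = defaultdict(list)
    for lid, c, n in rows:
        groups[lid].append((c, n))

    summary = {}
    for lid in sorted(groups):
        pairs = groups[lid]
        total = sum(n for _, n in pairs)
        mean = sum(c * n for c, n in pairs) / total
        # Central moments of c, weighted by the counts n.
        m2 = sum(n * (c - mean) ** 2 for c, n in pairs) / total
        m3 = sum(n * (c - mean) ** 3 for c, n in pairs) / total
        m4 = sum(n * (c - mean) ** 4 for c, n in pairs) / total
        summary[lid] = {
            "total_n": total,
            "mean": mean,
            "sd": sqrt(m2),
            # Both shape measures are 0 for an exactly normal distribution.
            "skew": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
            "excess_kurtosis": m4 / m2 ** 2 - 3 if m2 > 0 else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical rows for a single LId, NOT taken from the real table.
    example_rows = [(1, 2, 1), (1, 3, 4), (1, 4, 6), (1, 5, 4), (1, 6, 1)]
    for lid, stats in summarize_by_lid(example_rows).items():
        print(lid, stats)
```

Running this on the real three-column table (one tuple per table row) would give twelve summaries, one per LId, which could then be compared across intervals and, later, across folds.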
The table below looks very much like the tables we got five years ago on a very large and general sample, so I am assuming that the "normal-ish" look of the counts for c within the same L will also hold for the data from the other five folds. But if it's important, I'll provide the same table for each of the other five folds as I get to them during the two new passes I'm now doing.
Thanks for considering this matter, to the extent that you have time to do so.