You wrote: "You should probably check them against what you get"
I spot-checked against various raw detail files, and I didn't find any anomalies.
"It looks like B1 is going to give problems."
You can probably decide this question by looking at the table at the end of this post.
To understand this table, you need only realize that to do our subset/method 2-ways per set, per fold, per SINGLETON len interval, we need data for a given triple (set, fold, len) in all four categories determined by subset and method, namely NS, NC, RS, and RC. The table shows the counts for the only b1 triples (set, fold, len) for which all four of NS, NC, RS, and RC are non-empty.
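In case it helps to see the selection rule spelled out, here is a minimal Python sketch of it. It is not the code that produced the table: the record layout (one `(set, fold, len, category)` tuple per observation) and the function name `complete_triples` are assumptions made purely for illustration.

```python
from collections import defaultdict

# The four subset/method categories named above.
CATEGORIES = ("NS", "NC", "RS", "RC")


def complete_triples(records):
    """Return {(set, fold, len): {category: count}} for complete triples only.

    `records` is a hypothetical iterable of (set, fold, len, category)
    tuples. A triple is "complete" when every one of NS, NC, RS and RC
    has at least one record.
    """
    counts = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0))
    for set_id, fold, length, category in records:
        counts[(set_id, fold, length)][category] += 1
    # Keep only the triples in which no category is empty.
    return {
        triple: per_category
        for triple, per_category in counts.items()
        if all(per_category[name] > 0 for name in CATEGORIES)
    }
```

A triple that is missing even one of the four categories drops out entirely, which is why adding more underlying data can only add rows or raise counts.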
So, if you think the counts in the following table are high enough to allow us to operate on b1, then we're OK. But if you think they're too low, I can very likely increase the counts by getting more underlying b1 data. I distinctly recall being conservative when I picked the b1 proteins and associated messages by assuming that high amino acid identity meant high codon identity, and this is not necessarily the case, because many codon changes do not change the amino acid (e.g. changing att to atc still yields the amino acid I = ILE = Isoleucine).
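To make the amino acid identity vs. codon identity point concrete, here is a small Python sketch. The codon table is deliberately partial (just a few entries from the standard genetic code), and the two sequences in the usage note are made up for illustration; neither is taken from the b1 data.

```python
# A few entries from the standard genetic code; deliberately incomplete.
PARTIAL_CODON_TABLE = {
    "att": "I", "atc": "I", "ata": "I",              # isoleucine
    "atg": "M",                                      # methionine
    "gct": "A", "gcc": "A", "gca": "A", "gcg": "A",  # alanine
    "aaa": "K", "aag": "K",                          # lysine
}


def codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def identities(seq_a, seq_b):
    """Return (amino_acid_identity, codon_identity) for two aligned messages.

    Assumes both sequences are in frame, already aligned codon for codon,
    and use only codons present in PARTIAL_CODON_TABLE.
    """
    pairs = list(zip(codons(seq_a.lower()), codons(seq_b.lower())))
    codon_same = sum(a == b for a, b in pairs)
    aa_same = sum(
        PARTIAL_CODON_TABLE[a] == PARTIAL_CODON_TABLE[b] for a, b in pairs
    )
    return aa_same / len(pairs), codon_same / len(pairs)
```

For the made-up pair `identities("atgattgctaaa", "atgatcgccaag")`, every codon pair codes for the same amino acid (M, I, A, K) but only the first codon matches exactly, so the result is `(1.0, 0.25)`.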
Please let me know what you think; it won't be a huge detour to try to pick up more b1 data if you think the counts in the table below are too low to allow you to operate satisfactorily.
Finally, what's very interesting about b1 proteins is that they're immunoglobulins, i.e. proteins which play a role in immune systems by mutating to meet the challenge posed by new antigens (invaders). So given the fact that it's the job of b1 proteins to mutate, it's actually surprising we're getting any counts at all ... in effect, by their very nature, b1 proteins push the limits of the domain over which we can expect our hypotheses to hold ...