1. In answer to your question, there is no standard nomenclature, so any of these will do:
a) first, second b) left, right c) leading, final.
I probably would prefer leading, final myself.
2. Re your new method for dealing with the original 20K messages:
If we get to the point of actually producing Paper I, then I would like to include your alternative in the paper, so that:
a) referees are aware of your thoughts on the matter; b) others can try it to see how the results compare with our original method (with the "arbitrary" cutoffs).
But I don't want to stop forward motion right now if it's not an absolute "gotcha".
Also, regarding the "three-round" testing that we did (on b3b4's, then on b2...b5's centered on "survivor" b3b4's, then on b1...b6's centered on "survivor" b2b3b4b5's), I just want to make clear that this "three-round" testing protocol was our innovation (meaning JRF/AML/DJH), not the Vandy biostatistician's. He just suggested the overall t-test framework (i.e. the inner 100 randomizations of each of the real 200 messages in a real set, and the outer split of the 20000 real messages into 100 real sets of 200 real messages each).
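For anyone following along, here is a rough sketch of just the bookkeeping in that framework: the outer split into 100 real sets of 200, and the inner 100 randomizations per message. The function names are hypothetical, and I am assuming that a "randomization" means shuffling the symbols within a message and that messages are plain strings. The t-test step itself is left out, since its details are not spelled out here.

```python
import random

def split_into_sets(messages, set_size=200):
    """Outer split: consecutive real sets of `set_size` messages each."""
    if len(messages) % set_size != 0:
        raise ValueError("number of messages must be a multiple of set_size")
    return [messages[i:i + set_size] for i in range(0, len(messages), set_size)]

def randomized_copies(message, n_copies=100, rng=None):
    """Inner step: `n_copies` shuffled versions of one message.

    Assumption: a randomization is a within-message shuffle of its symbols.
    """
    rng = rng or random.Random()
    copies = []
    for _ in range(n_copies):
        symbols = list(message)
        rng.shuffle(symbols)
        copies.append("".join(symbols))
    return copies
```

With 20000 messages this gives 100 real sets of 200 messages, and each message gets 100 shuffled counterparts with the same composition as the original.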
I won't go into the reasons why the "three-round" protocol may be superior to a protocol in which we just look at b1b2b3b4b5b6 frequencies, but I do want to make clear that the results of using such a "three-round" protocol may differ greatly from those obtained by a protocol which just looks at b1b2b3b4b5b6 frequencies (as, for example, the protocol you suggest in your last post).
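To make the difference concrete, here is a minimal sketch of the control flow of a three-round protocol. It is not our actual code: `passes` is a hypothetical stand-in for whatever significance test is applied in each round, and `alphabet` is whatever symbol set the messages use.

```python
from itertools import product

def three_round_survivors(alphabet, passes):
    """Return the survivors of each round as three sets of strings.

    `passes(motif)` is a hypothetical predicate standing in for the
    per-round test; it is an assumption, not the real test.
    """
    # Round 1: test every b3b4.
    round1 = {"".join(p) for p in product(alphabet, repeat=2)}
    round1 = {m for m in round1 if passes(m)}

    # Round 2: test only b2...b5 centered on a survivor b3b4.
    round2 = {l + core + r for core in round1 for l in alphabet for r in alphabet}
    round2 = {m for m in round2 if passes(m)}

    # Round 3: test only b1...b6 centered on a survivor b2b3b4b5.
    round3 = {l + core + r for core in round2 for l in alphabet for r in alphabet}
    round3 = {m for m in round3 if passes(m)}
    return round1, round2, round3
```

The point of the sketch is that a b1...b6 whose central b3b4 fails round 1 is never tested at all, whereas a protocol that looks directly at b1b2b3b4b5b6 frequencies tests every one of them, so the two can end up with quite different survivor lists.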