"I've done some Monte Carlo runs calculating u, using both method 2 and method 3, for sets of random dicodons chosen equiprobably from the 774 in the union of S63 & C711. The distribution of u_2 looks better enough than the distribution of u_3 that I wouldn't bother to investigate method 3 any further."
Thanks for the heads-up. Unless you post otherwise, I'll stick with method 2 (the present method).
2. You wrote:
"Sum N_tot = what we've been calling 'c', right? In my Monte Carlo experimentation, I often got sum N_yes = 0 when c was small. If the values of c in the data you posted in 305 djh May 26, 8:41 am are typical, I'm surprised you've never had u = 0, which would have given you error messages for c/u and ln[u]."
Not sure what you mean by "Sum N_tot", but:
i) Yes, for one message segment, N_tot = c, since c is the number of "dipeptides of interest" in the message segment, i.e. dipeptides that can be encoded by "dicodons of interest" (i.e. dicodons in the reference set), and c = N_tot = N_yes + N_no.
ii) And yes, under the definition of u I've been using (method 2), u can be 0 because N_yes can be 0 in a message segment. So yes, both c/u and ln(u) can error out.
That's why I've always said that the analysis includes only data with u > 0: I simply drop the segments in which dicodons of interest are not represented at all.
I think this is legitimate because what we claim to be measuring is u for segments in which "dicodons of interest" are represented at some level, and that set excludes segments in which no "dicodons of interest" appear at all.
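To make the filtering concrete, here is a minimal sketch of what dropping the u = 0 segments before computing c/u and ln(u) could look like. It takes u as an already computed number per segment and does not reproduce the method 2 calculation itself. The function name, the (c, u) pair layout, and the sample numbers are all made up for illustration and are not from the actual data.

```python
import math

def summarize_segments(segments):
    """Keep only segments with u > 0 and compute c/u and ln(u) for them.

    `segments` is an iterable of (c, u) pairs, one per message segment.
    (Hypothetical layout: u is assumed to be precomputed elsewhere.)
    Returns the kept rows and the number of segments dropped.
    """
    kept = []
    dropped = 0
    for c, u in segments:
        if u <= 0:
            # c/u would divide by zero and ln(u) is undefined, so skip it.
            dropped += 1
            continue
        kept.append((c, u, c / u, math.log(u)))
    return kept, dropped

# Made-up numbers, for illustration only.
example = [(12, 0.25), (5, 0.0), (8, 0.5)]
kept, dropped = summarize_segments(example)
```

Returning the dropped count alongside the kept rows makes it easy to report how many segments were excluded, which bears on the question of whether the exclusion is legitimate.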
But if you think we should handle this matter differently, please let me know what you think we should do.