You're more than welcome, Ray. It's little enough compared to what you've conveyed to me.
To recap the above post (plus error corrections), what I wrote was the following:
****** Suppose your boss gives you an encoded announcement in which:
a) a 2-word sequence for AP appears just once and this 2-word sequence is att ccc (one of the above 63 2-word sequences)
b) a 2-word sequence for AL appears just once and this 2-word sequence is gcg ctg (one of the above 63 2-word sequences)
c) no other 2-word sequence from the above 63 occurs in the announcement (equivalently, you will typeset an announcement in which none of the above 49 2-letter sequences appears other than one AP and one AL.)
Then the degree "u" of average over-representation of the 63 2-word sequences in the announcement is computed by my PERL code as:
1+1 = 2 (1 of the 63 for AP and 1 of the 63 for AL)
1/12 + 1/8 = 5/24 (expected frequency for AL and expected frequency for AP)
2/(5/24) = 48/5
(48/5)/2 = 48/10 = 4.8 = u.
In this example, L is not given, i.e. I did not specify the length of the announcement in which AP and AL are to appear. I did this deliberately because it is irrelevant to the computation of "u", i.e. "u" would be 4.8 regardless of how long the announcement is (so long as AP and AL are the only 2-letter sequences in the announcement encoded by any of the 63 over-represented two-word sequences.)
On the other hand, "c" is implicitly given - it is 2, i.e. the number of positions in the announcement which COULD BE coded by one of the 63 over-represented two-word sequences.
That is, if in the above example, AL WAS encoded by one of the 63 two-word sequences but AP WASN'T, then c would still be 2. In this case, u would be:
1 (actual for AL) + 0 (actual for AP) = 1
1/12 (expected for AL) + 1/8 (expected for AP) = 5/24
1/(5/24) = 24/5
(24/5) / 2 = 24/10 = 2.4 = u.
And it is absolutely critical to note that in both of these examples, the last step of computing u involves dividing by c = 2 to get the AVERAGE degree of over-representation.
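To make the arithmetic in the two examples easy to check, here is a minimal Python sketch. It is not the PERL code mentioned above, only a restatement of the same steps under the assumption that u is (sum of actual counts) / (sum of expected frequencies), divided by c. The function name `average_over_representation` and its arguments are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def average_over_representation(observed, expected):
    """Return u = U / c for one announcement.

    observed: actual counts (1 or 0) for each position that COULD be
              coded by one of the over-represented two-word sequences.
    expected: expected frequency for each of those same positions.
    Both arguments are assumptions made for this sketch.
    """
    c = len(observed)                            # number of eligible positions
    U = Fraction(sum(observed)) / sum(expected)  # total over-representation
    return U / c                                 # average over-representation

# Expected frequencies from the examples: 1/12 for AL, 1/8 for AP.
expected = [Fraction(1, 12), Fraction(1, 8)]

u_both = average_over_representation([1, 1], expected)  # AL and AP both hit
u_one = average_over_representation([1, 0], expected)   # only AL hit

print(u_both, float(u_both))  # 24/5 4.8
print(u_one, float(u_one))    # 12/5 2.4
```

Exact fractions are used so that the intermediate values (5/24, 48/5, 24/5) match the hand computation without rounding.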
Therefore, if we consider our variable c/u, we see that:
c/u = c/(U/c) = c^2/U (where U is respectively 48/5 and 24/5 in the two examples above)
and therefore the regression of
c/u on c/L
is really the regression of
c^2/U on c/L
which means (I think) that there is an essentially quadratic term in the correlation.
And if I am correct here, then the effect of the U and L as denominators is to make a non-linear regression look linear.
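The identity c/u = c^2/U can be checked numerically against the two examples. The short Python sketch below does only that; it assumes u = U/c as defined above, and the helper name `c_over_u` is hypothetical. It says nothing about the regression itself, only that the dependent variable carries a c^2 factor.

```python
from fractions import Fraction

def c_over_u(c, U):
    """Compute c/u, where u = U/c is the average over-representation."""
    u = Fraction(U) / c
    return c / u

c = 2
results = {}
for U in (Fraction(48, 5), Fraction(24, 5)):  # the two examples above
    lhs = c_over_u(c, U)       # c/u computed directly
    rhs = Fraction(c**2) / U   # c^2/U
    results[U] = (lhs, rhs)
    print(U, lhs, rhs)
```

For both values of U the two columns agree, which is all the algebra c/(U/c) = c^2/U claims.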