On May 29, 6:50 am, djh <halitsk...@att.net> wrote:
> I need to draw your attention to a very important fact about the
> behavior of u and the x2 driver correlation ln(c/u) on ln(c/L).
> Please note that in the following exposition, the study and control
> dicodon groups are the original S63 and C711.
>
> In the original data that I used in the work we did to arrive at
> the 7-predictor logistic regression model, I was using study group
> data in which u > 1 and control group data in which u > 1. That
> is, I was insisting that study group dicodons be OVER-represented
> in segment data, and similarly for control group dicodons.
>
> But the problem with this approach was that it led to numerically
> uneven study and control group populations, since there are a lot
> of study group segments with u between 1.36 and 4.00, whereas there
> are virtually no control group segments with u between 1.36 and 4.00
> ... 1.36 is about where u tops-out for the control group segments.
> (This follows from the fact that S63 was originally selected because
> its dicodons are known to be significantly over-represented, based
> on the work done in 2005.)
>
> So, to equalize study and control group populations, I decided to
> take ALL segments so long as u > 0.
>
> But what I am seeing is that this broader choice is decreasing the
> strength of the correlation ln(c/u) on ln(c/L), and also presumably,
> the degree of difference in mean and variance of ln(c/u) between
> groups (although I haven't specifically checked this.) In this
> regard, note that with u > 0, the coefficient of ln(c/u) on ln(c/L)
> never drops below 0.25 to 0.40, i.e. it never reverses sign or
> anything like that. It's just that the coefficient is generally
> lower than when u > 1.
>
> I don't know if you can see a way to deal with this problem. If we
> go for a "more effective" u, then we have less comparable study and
> control group populations, whereas if we go for more comparable study
> and control group populations, then we have a "less effective" u.
>
> Finally, please note that I don't know for sure whether choosing
> u > 0 rather than u > 1 also affects "e" - I think it may, but
> certainly not as much as it does u.

I think this is more a substantive problem than a statistical one. If you include u < 1 then the control group is bigger, but the study and control u-distributions still don't overlap much. (But think of overlap on the log scale. Look at the distributions, not just summary statistics.)
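One rough way to put a number on "overlap on the log scale" is to bin ln(u) for both groups on a common grid and sum the smaller of the two bin proportions. The sketch below is only an illustration: the function name log_overlap, the choice of 20 bins, and the histogram-based overlap coefficient are my assumptions, not anything from your analysis, and it is no substitute for actually plotting the two distributions.

```python
import math

def log_overlap(u_study, u_control, bins=20):
    """Histogram overlap of two samples on the ln scale.

    Returns a value between 0 (no shared bins) and 1 (identical
    binned distributions). Assumes every u is > 0, so ln(u) exists.
    """
    ls = [math.log(u) for u in u_study]
    lc = [math.log(u) for u in u_control]

    # Common grid spanning both groups, so the bins are comparable.
    lo = min(ls + lc)
    hi = max(ls + lc)
    width = (hi - lo) / bins or 1.0  # guard against all values equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    ps = proportions(ls)
    pc = proportions(lc)

    # Overlap coefficient: shared probability mass, bin by bin.
    return sum(min(a, b) for a, b in zip(ps, pc))
```

You would call it once with the u > 1 segments and once with the u > 0 segments, e.g. log_overlap(study_u, control_u), where study_u and control_u are hypothetical lists of the per-segment u values. The result depends on the bin count, so treat it as a rough guide.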
Would keeping the restriction mean that the groups would overlap so little that people might question the relevance of comparing the two groups, because they are so obviously "non-comparable"? Would dropping the restriction change the question to one you're not interested in asking, even though you may get a statistically clearer answer than you would get with the restriction? I'm not the one to answer those questions.
But I do have a basic factual question: You said in an email (Mar 21, 8:19 AM) that the units of u are kcals/mols. That's representation level? 1 kcal/mol means neither over- nor under-representation?