Re: Comparative results for {lnL,w,lnLw,x1,x2} & {lnL,w,lnLw,x1,x2,lnLx1,lnLx2}
Posted: May 4, 2012 1:05 AM
This is a combined reply to two posts.
_______________________
190 djh May 3, 6:35 am
> You wrote:
>
>> "To fix that, divide each count by the sum of the four counts,
>> to get proportions."
>
> Yes I should have thought of this myself - thanks very much for
> correcting me here.
>
> You also wrote:
>
>> "However, all this seems too ad hoc, and I'm uncomfortable with it."
>
> In response, let me ask you to put aside for the moment your basic
> objection to the whole "residual categorization" approach and consider
> the "5-predictor" model whose results were given for study and control
> groups for all six folds in the PDF I sent you offline last night.
>
> If I replace the absolute population counts for categories 00,01,10,11
> in this 5-predictor model with the proportions you just suggested,
> then this 5-predictor model boils down to:
>
> {lnL,x1,x2,w,lnLw}, where w is your "proportional" weighting factor
> rather than my "absolute count" weighting factor.
>
> And I don't see where I would be wrong to say that the factors w and
> lnLw are actually LESS ad hoc than the factors mv=2x1+x2 and lnLmv
> which I used as a pure "guess" in the last iteration that wound up
> working on all six folds EXCEPT the c2 fold.
>
> Furthermore, the fact the factors w and lnLw will allow the
> 5-predictor model to work on the c2 fold AS WELL AS the other five folds
> indicates (to me at least) that IF the residual approach is adopted
> in the first place, then a weighting predictor must be added into any
> model to account for the fact that there are systematic proportional
> differences among the 00,01,10,11 population counts at each length
> interval.
You're saying that the probability of a match in a cell is related to the relative number of inputs in that cell compared to the three other cells. I have to ask "Why?" Does that make sense on its own, or is it a surrogate for something that hasn't yet been identified? The differences in the input counts are caused by correlation among the (absolute adjusted) residuals. Why are those residuals correlated? What controls that correlation? You should add the *cause* of the correlation, not the *result* of the correlation, to the model.
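For concreteness, the proportional factor w under discussion (each category count divided by the sum of the four counts) can be sketched as below. This is only an illustration: the function name and the example counts are hypothetical and are not taken from the actual data.

```python
def category_proportions(counts):
    """Convert the four category counts (00, 01, 10, 11) for one
    length interval into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations in this length interval")
    return {cat: n / total for cat, n in counts.items()}


# Hypothetical counts for a single length interval (illustrative only).
counts = {"00": 40, "01": 10, "10": 30, "11": 20}
w = category_proportions(counts)
```

Each observation would then carry the w of the cell it falls in. That makes w a function of the cell counts alone, which is why the question of what causes those counts to differ still matters.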
> So, assuming you're willing to let me proceed with the 5-predictor
> model {lnL,x1,x2,w,lnLw}, I have one further question.
>
> Before starting the Stage II "official tests" of this model, should I
> try the 7-predictor model {lnL,x1,x2,w,lnLx1,lnLx2,lnLw} and see how
> it compares to the 5-predictor model {lnL,x1,x2,w,lnLw}? I'm asking
> this because of the partial success we achieved a while back with the
> very simple 3-predictor model {lnL,x1,lnLx1} that you suggested ...
> i.e. my reasoning here is that if you felt it best to add in the
> factor lnLx1 to this model, then it seems to follow that lnLx1 and
> lnLx2 should be added to the five predictor model {lnL,x1,x2,w,lnLw}
> so as to produce the 7-predictor model
> {lnL,x1,x2,w,lnLx1,lnLx2,lnLw}. What do you think?
I think the product terms need to be clearly significant.
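One common way to check whether a product term is significant is a Wald z test on its coefficient. The sketch below assumes that test. The post does not say which test or threshold "clearly significant" refers to, and a likelihood-ratio test comparing the models with and without the product terms would be another reasonable choice. The function name is hypothetical.

```python
import math


def wald_z_test(coeff, stderr):
    """Return (z, two-sided p-value) for the hypothesis coeff == 0,
    using the normal approximation z = coeff / stderr."""
    z = coeff / stderr
    # Two-sided tail area of the standard normal: P(|Z| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

Applied to the fitted coefficient and standard error of lnLx1 or lnLx2, a small p-value in every fold would support keeping the term. A borderline value would not meet the "clearly" part of the criterion.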
> As always, thanks for your patience. Having "wallowed" in the data
> of the "category counts" for a few days now (to use the term that
> you used an email or two ago), I do believe there's good reason to
> think that the proportional weighting predictor w is NOT ad hoc,
> but reflects deep properties of the underlying system (systematic
> proportional differences between the 00,01,10,11 category counts)
> that can and should be tested on their own to get a better idea of
> what's going on.
________________________
191 djh May 3, 11:27 am
> I sent you offline a PDF with comparative results for
> {lnL,w,lnLw,x1,x2} & {lnL,w,lnLw,x1,x2,lnLx1,lnLx2}, where "w"
> is the proportional factor for the category 00, 01, 10, or 11.
>
> "Naked eye" evaluation of the confidence intervals seems to
> indicate that the best chance of success will come from applying
> your specified tests to the results of the 7-predictor model
> {lnL,w,lnLw,x1,x2,lnLx1,lnLx2} for the study and control groups.
>
> If you disagree, please let me know. In the meantime, I will
> start applying the tests to this model so that I can at least
> become fluent in the clerical "moves" required to execute them.
I've looked at the pdf, and I don't so much disagree as wonder what you're taking as evidence of success, or what you really mean by success.
A side comment: Odds ratios are meaningful only for dichotomous predictors. Yes, you can look to see whether the CI includes 1, but it would be easier (and would avoid what appear to be out-of-printable-range values) to have CIs for the coefficients themselves, and (perhaps more importantly, in your case) CIs for the difference between corresponding coefficients in the two groups. Also, note that the intercept is really just another coefficient and should be reported on too.
The 95% CI for a coefficient is coeff +/- 1.96*stderr. The 95% CI for the difference between two coefficients is (coeff1 - coeff2) +/- 1.96*sqrt[(stderr1)^2 + (stderr2)^2].
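A minimal sketch of those two formulas follows. The function names are hypothetical, and the inputs are assumed to be the coefficient estimates and standard errors reported by whatever logistic regression software is in use. The difference formula treats the two estimates as independent, which fits study and control groups that are fitted separately.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% interval


def coeff_ci(coeff, stderr, z=Z95):
    """95% CI for one coefficient: coeff +/- z * stderr."""
    half = z * stderr
    return coeff - half, coeff + half


def diff_ci(coeff1, stderr1, coeff2, stderr2, z=Z95):
    """95% CI for coeff1 - coeff2, assuming independent estimates."""
    diff = coeff1 - coeff2
    half = z * math.sqrt(stderr1 ** 2 + stderr2 ** 2)
    return diff - half, diff + half


def excludes_zero(ci):
    """True if the interval lies entirely on one side of zero."""
    lower, upper = ci
    return lower > 0 or upper < 0
```

A coefficient CI that excludes 0 corresponds to an odds-ratio CI that excludes 1, since the odds-ratio interval is just the exponential of the coefficient interval's endpoints. Working on the coefficient scale avoids the very large or very small printed values.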
> Thanks again, Ray