Re: Correct way to normalize an rmsd-based distance metric used in repeated trials of pairs
Posted: Apr 7, 2012 2:07 AM
On Apr 6, 8:09 pm, djh <halitsk...@att.net> wrote:
> Below are two sets of results from John's calculator.
>
> The first set you've seen before - they're the "length 20" results
> using the two predictors (v1+v)/2 and v1-v2.
>
> The second set of results is also based on a matrix for length 20
> using the same two predictors.
>
> The difference is that in this case, I used only data exhibiting a
> value of 3 for the new "C" variable which I mentioned in my last post,
> rather than allowing C to be any of the values 2 thru 6 which it
> actually assumes in the length 20 data. (In other words, I have not
> yet incorporated "C" into the model as a third predictor - right now I
> am merely using the original two predictors but on data in which C is
> constant - just to see what happens.)
>
> As you can see from the two sets of results:
>
> a) the chi-square drops from 483.0906 in the first set of results to
> 12.6410 in the second set, roughly a 40-fold decrease;
> b) the p-value associated with the chi-square goes from < 0.0001 to
> 0.0018.
>
> Does this drop in chi-square signify a step in the right direction, or
> is 12.6410 still a relatively "big" chi-square? We would of course
> like you to say that this drop in chi-square is a step in the right
> direction, because we can think of "C" as a scalar on our unquantized
> variable u that underlies v.
>
> Thanks for any time you can afford to spend considering this question,
> and again, please forgive my abysmal ignorance about these matters.

This is just a quick note to sort out the chi-squares. I'll have more to say later.
The chi-square I described in my previous post compares the logistic model to the saturated model, which fits the data perfectly. The bigger that chi-square (and the smaller its p), the more certain you are that the logistic model is worse than the saturated model.
The chi-square in John's output compares the logistic model to the "null" model, which gives all the cells the same probability. The bigger that chi-square (and the smaller its p), the more certain you are that the logistic model is better than the null model.
So you want my chi-square to be small and nonsignificant, and John's to be big and significant.
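To make the two comparisons concrete, here is a minimal Python sketch of both statistics computed as likelihood-ratio chi-squares on grouped data. It assumes each cell is an (n0, n1) pair of counts and that you already have the model's fitted probability for each cell; the function names and the numbers in the usage lines are made up for illustration and are not taken from John's calculator or from your data.

```python
import math

def binom_loglik(cells, probs):
    """Log-likelihood of grouped 0/1 data; cells are (n0, n1) count pairs."""
    total = 0.0
    for (n0, n1), p in zip(cells, probs):
        # Skip terms with a zero count so that p = 0 or p = 1 is harmless.
        if n1 > 0:
            total += n1 * math.log(p)
        if n0 > 0:
            total += n0 * math.log(1.0 - p)
    return total

def fit_chi_squares(cells, fitted):
    """Return (model vs. saturated, model vs. null) likelihood-ratio chi-squares."""
    # Saturated model: each cell gets its own observed proportion.
    saturated = [n1 / (n0 + n1) for n0, n1 in cells]
    # Null model: every cell gets the same overall proportion.
    overall = sum(n1 for _, n1 in cells) / sum(n0 + n1 for n0, n1 in cells)
    null = [overall] * len(cells)

    ll_sat = binom_loglik(cells, saturated)
    ll_model = binom_loglik(cells, fitted)
    ll_null = binom_loglik(cells, null)

    lack_of_fit = 2.0 * (ll_sat - ll_model)     # want small, nonsignificant
    model_vs_null = 2.0 * (ll_model - ll_null)  # want big, significant
    return lack_of_fit, model_vs_null

# Hypothetical counts and fitted probabilities, for illustration only.
cells = [(40, 10), (30, 20), (20, 30), (10, 40)]
fitted = [0.22, 0.39, 0.61, 0.78]
lack_of_fit, model_vs_null = fit_chi_squares(cells, fitted)
print(lack_of_fit, model_vs_null)
```

The first number is the kind of chi-square I described; the second is the kind reported in John's output. The degrees of freedom differ between the two, so each has to be referred to its own chi-square distribution.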
For another take on how well the model fits, plot the "Calc Prob" in John's output as a function of the sample proportion, n1/(n0+n1). Ideally, the two should be equal. How far is the plot from a 45-degree line?
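If you want a number to go with the picture, the following rough Python sketch pairs each cell's sample proportion with its "Calc Prob" and reports the largest vertical distance from the 45-degree line. The function names and the sample values are hypothetical; substitute the n0, n1 and Calc Prob columns from John's output.

```python
def calibration_points(cells, calc_prob):
    """Pair each cell's sample proportion n1/(n0+n1) with its Calc Prob."""
    return [(n1 / (n0 + n1), p) for (n0, n1), p in zip(cells, calc_prob)]

def max_gap_from_diagonal(points):
    """Largest |Calc Prob - sample proportion| over all cells."""
    return max(abs(p - prop) for prop, p in points)

# Hypothetical counts and Calc Prob values, for illustration only.
cells = [(40, 10), (30, 20), (20, 30), (10, 40)]
calc_prob = [0.22, 0.39, 0.61, 0.78]

points = calibration_points(cells, calc_prob)
for prop, p in points:
    print(f"sample proportion {prop:.3f}   Calc Prob {p:.3f}")
print("largest gap from the 45-degree line:", max_gap_from_diagonal(points))
```

The (sample proportion, Calc Prob) pairs are exactly the points to plot; cells with small n0+n1 will scatter more widely around the line just from sampling noise, so judge those more leniently.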