
Re: Correct way to normalize an rmsd-based distance metric used in repeated trials of pairs
Posted:
Apr 7, 2012 2:07 AM


On Apr 6, 8:09 pm, djh <halitsk...@att.net> wrote:
> Below are two sets of results from John's calculator.
>
> The first set you've seen before - they're the "length 20" results
> using the two predictors (v1+v)/2 and v1-v2.
>
> The second set of results is also based on a matrix for length 20
> using the same two predictors.
>
> The difference is that in this case, I used only data exhibiting a
> value of 3 for the new "C" variable which I mentioned in my last post,
> rather than allowing C to be any of the values 2 thru 6 which it
> actually assumes in the length 20 data. (In other words, I have not
> yet incorporated "C" into the model as a third predictor - right now I
> am merely using the original two predictors but on data in which C is
> constant - just to see what happens.)
>
> As you can see from the two sets of results:
>
> a) the chi-square drops from 483.0906 in the first set of results to
> 12.6410 in the second set, roughly a 40-fold decrease;
> b) the p-value associated with the chi-square goes from < 0.0001 to
> 0.0018.
>
> Does this drop in chi-square signify a step in the right direction, or
> is 12.6410 still a relatively "big" chi-square? We would of course
> like you to say that this drop in chi-square is a step in the right
> direction, because we can think of "C" as a scalar on our unquantized
> variable u that underlies v.
>
> Thanks for any time you can afford to spend considering this question,
> and again, please forgive my abysmal ignorance about these matters.

This is just a quick note to sort out the chi-squares. I'll have more to say later.
The chi-square I described in my previous post compares the logistic model to the saturated model, which fits the data perfectly. The bigger that chi-square (and the smaller its p), the more certain you are that the logistic model is worse than the saturated model.
The chi-square in John's output compares the logistic model to the "null" model, which gives all the cells the same probability. The bigger that chi-square (and the smaller its p), the more certain you are that the logistic model is better than the null model.
So you want my chi-square to be small and nonsignificant, and John's to be big and significant.
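To make the distinction concrete, here is a minimal Python sketch of the two comparisons. It assumes both chi-squares are of the likelihood-ratio kind, which may not be exactly what either calculation uses. The function names and the cell counts are hypothetical, made up for illustration, and are not taken from your data or from John's calculator.

```python
import math

def log_lik(n0, n1, probs):
    """Binomial log-likelihood of the cells (constant term dropped)."""
    total = 0.0
    for a, b, p in zip(n0, n1, probs):
        # Treat 0 * log(0) as 0 so all-0 or all-1 cells are handled.
        if b > 0:
            total += b * math.log(p)
        if a > 0:
            total += a * math.log(1.0 - p)
    return total

def two_chi_squares(n0, n1, fitted):
    """Return (chi-square vs. saturated model, chi-square vs. null model)."""
    n = [a + b for a, b in zip(n0, n1)]
    # Saturated model: each cell gets its own sample proportion.
    saturated = [b / t for b, t in zip(n1, n)]
    # Null model: every cell gets the same pooled proportion.
    pooled = sum(n1) / sum(n)
    null = [pooled] * len(n)
    ll_model = log_lik(n0, n1, fitted)
    vs_saturated = 2.0 * (log_lik(n0, n1, saturated) - ll_model)
    vs_null = 2.0 * (ll_model - log_lik(n0, n1, null))
    return vs_saturated, vs_null

# Hypothetical counts: n0 = failures, n1 = successes in each cell.
n0 = [40, 30, 20, 10]
n1 = [10, 20, 30, 40]
# Hypothetical fitted probabilities standing in for a logistic fit.
fitted = [0.22, 0.40, 0.60, 0.78]

vs_sat, vs_null = two_chi_squares(n0, n1, fitted)
print("vs. saturated (want small):", round(vs_sat, 4))
print("vs. null (want big):", round(vs_null, 4))
```

The two statistics add up to the chi-square comparing the null model directly to the saturated model, so for a given data set a good logistic fit shifts most of that total into the second number.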
For another take on how well the model fits, plot the "Calc Prob" in John's output against the sample proportion, n1/(n0+n1). Ideally, the two should be equal, so the points should fall on a 45-degree line. How far is the plot from that line?
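If it helps, the points for that plot can be tabulated with a few lines of Python. This is only a sketch: the function names are hypothetical, and the counts and "Calc Prob" values below are invented placeholders, not numbers from John's output.

```python
def calibration_points(n0, n1, calc_prob):
    """Pair each cell's sample proportion n1/(n0+n1) with its Calc Prob."""
    points = []
    for a, b, p in zip(n0, n1, calc_prob):
        sample_prop = b / (a + b)
        points.append((sample_prop, p))
    return points

def max_gap(points):
    """Largest vertical distance of any point from the 45-degree line y = x."""
    return max(abs(y - x) for x, y in points)

# Hypothetical cell counts and fitted probabilities, for illustration only.
n0 = [40, 30, 20, 10]
n1 = [10, 20, 30, 40]
calc_prob = [0.22, 0.40, 0.60, 0.78]

points = calibration_points(n0, n1, calc_prob)
for sample_prop, p in points:
    print(f"sample proportion {sample_prop:.3f}   Calc Prob {p:.3f}")
print("largest gap from the 45-degree line:", round(max_gap(points), 3))
```

The (sample proportion, Calc Prob) pairs can be fed to any plotting tool. Cells with small n0+n1 will scatter more around the line just from sampling noise, so judge those gaps more leniently.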

