Re: Correct way to normalize an rmsd-based distance metric used in repeated trials of pairs
Posted: Apr 8, 2012 4:12 AM
On Apr 7, 6:19 pm, djh <halitsk...@att.net> wrote:
> I think I see what has to be done mathematically, but don't know
> how to express it statistically. I think it boils down to a four
> predictor logistic model, but I'm not sure.
>
> Each input pair can not only be characterized by two values (vi,vj)
> of our current quantized variable v, but also by two values (ci,cj),
> where c is the new "count" variable I mentioned in my last couple of
> posts. And similarly for each output pair.
>
> So mathematically, our upper-triangular 10x10 matrix V=(vi,vj) is
> actually a matrix of matrices, i.e. each cell (vi,vj) of V is a
> matrix Cij = (cijp,cijq), where p and q are values of C.
For each pair of objects, we arbitrarily assign the 'i' and 'j' labels so that vi <= vj. Object i has C = ci and object j has C = cj, and you haven't said whether any special relation holds between those two C-values by virtue of the corresponding V-values being ordered. So it sounds like the C matrix in each cell of the upper-triangular V matrix is a full square, giving 55*n^2 cells in the design, where 'n' is the number of different values of C and 55 = 10*11/2 is the number of cells in the upper triangle of V, diagonal included.
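To make the cell count concrete, here is a minimal Python sketch. It assumes V has 10 quantized levels (as in the 10x10 matrix) and uses a hypothetical n = 3 levels of C purely for illustration; the function name design_cells is also made up for this example.

```python
from itertools import combinations_with_replacement, product

def design_cells(v_levels, c_levels):
    """Enumerate (vi, vj, cp, cq) cells with vi <= vj and cp, cq unrestricted."""
    # Upper triangle of V, diagonal included: all pairs with vi <= vj.
    v_pairs = combinations_with_replacement(v_levels, 2)
    # Full square of C: every (cp, cq) combination is allowed.
    c_pairs = list(product(c_levels, repeat=2))
    return [(vi, vj, cp, cq) for (vi, vj) in v_pairs for (cp, cq) in c_pairs]

v_levels = range(1, 11)  # 10 quantized levels of V
c_levels = range(1, 4)   # hypothetical: n = 3 levels of C
cells = design_cells(v_levels, c_levels)
print(len(cells))        # 55 * 3**2 = 495
```

If some relation between cp and cq did hold (for example cp <= cq), the C pairs would be filtered accordingly and the count would drop below 55*n^2.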
Note: We've been talking of matrices, but in principle both V (i.e., U) and C could be continuous, in which case the input would no longer be summary counts n0 and n1, but individual 0s and 1s.
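The two input formats carry the same information when the predictors are quantized. As a rough illustration, the sketch below expands per-cell summary counts (n0, n1) into individual 0/1 records. The counts and the function name expand_counts are hypothetical, not taken from the actual data.

```python
def expand_counts(grouped):
    """Turn per-cell summary counts (n0, n1) into individual 0/1 records."""
    records = []
    for (vi, vj, cp, cq), (n0, n1) in grouped.items():
        records.extend([(vi, vj, cp, cq, 0)] * n0)  # n0 outcomes equal to 0
        records.extend([(vi, vj, cp, cq, 1)] * n1)  # n1 outcomes equal to 1
    return records

# Hypothetical counts for two cells, keyed by (vi, vj, cp, cq).
grouped = {
    (1, 2, 1, 1): (3, 1),
    (2, 5, 1, 3): (0, 2),
}
records = expand_counts(grouped)
print(len(records))  # 6 individual observations
```

With continuous V and C there would typically be no repeated cells to tally, so the data would start out in the one-record-per-observation form.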
> I think this boils down to constructing a four-predictor model with
> predictors vi,vj,cp,cq, but again, am not sure.
Right, where cp is the value of C for the object whose V is vi, and cq is the value of C for the object whose V is vj.
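For reference, a four-predictor logistic model of this kind could be written out as in the sketch below. This assumes a main-effects-only model (no interaction terms, which the thread has not settled), and the coefficient values are hypothetical placeholders, not estimates from any data.

```python
import math

def predicted_probability(vi, vj, cp, cq, beta):
    """P(outcome = 1) under a main-effects logistic model in vi, vj, cp, cq."""
    b0, b_vi, b_vj, b_cp, b_cq = beta
    # Linear predictor (log-odds) from the intercept and four predictors.
    eta = b0 + b_vi * vi + b_vj * vj + b_cp * cp + b_cq * cq
    # Inverse logit maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, chosen only to show the mechanics.
beta = (-1.0, 0.2, -0.1, 0.3, 0.0)
p = predicted_probability(vi=3, vj=7, cp=1, cq=2, beta=beta)
print(round(p, 3))
```

In practice the coefficients would be estimated by maximum likelihood from the n0/n1 counts (or the individual 0/1 records) using a statistics package.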
> If/when you have a chance, please advise whether I'm looking at this
> correctly. If so, I'll have to write a little more PERL to generate
> the numbers, but I think it may be worth it. I'm in the process of
> hand-tallying a couple of Cij matrices to see what they look like,
> and they really seem to be quite orderly ...
>
> One other question - would it be preferable to run some two-predictor
> models using just cp,cq before trying to combine these predictors
> with the vi,vj predictors? Or do you always have to run all relevant
> variables "together"?
In general, you always need to put all the predictors in together.