Re: Correct way to normalize an rmsd-based distance metric used in repeated trials of pairs
Posted: Apr 14, 2012 12:12 PM
On Apr 13, 7:27 pm, djh <halitsk...@att.net> wrote:
> Thanks as always for taking the time to respond.
>
> I see what you mean by this:
>
> "If an effect is significant in one group and not significant in
> another, you can not conclude that the effect is different in the
> two groups. Such a conclusion requires testing and rejecting the
> hypothesis that the effect is the same in both groups."
>
> and I would ask you to elaborate the framework for "testing and
> rejecting the hypothesis that the effect is the same in both groups",
> but I think I'm making progress toward the regression model, and want
> to keep focussed on that.
>
> What I think may be possible is a three-predictor model, where two of
> the three predictors are binary and the third
>
> In particular, I have been building on the idea of "divergence from
> predicted value" that I asked you about in my last post, as follows.
>
> Suppose that in the sset of data for a given length interval L, we
> find two statistically significant correlations involving the same
> independent variable:
>
> y1 = f(x1)
> y2 = f(x1)
>
> where x1 is our quantized variable v, y1 is our energetic variable e
> (as in the two tables I sent you offline), and y2 is the new "count"
> variable "c" that I've mentioned in a few previous posts.
>
> Then we can split S into four sets "11, 01, 10, 00" as follows:
>
> 11: observed y1 falls within 1sd of predicted y1 and observed y2
> falls within 1sd of predicted y2
> 01: observed y1 falls outside 1sd of predicted y1 and observed y2
> falls within 1sd of predicted y2
> 10: observed y1 falls within 1sd of predicted y1 and observed y2
> falls outside 1sd of predicted y2
> 00: observed y1 falls outside 1sd of predicted y1 and observed y2
> falls outside 1sd of predicted y2
>
> And therefore, I can easily submit input pairs drawn from these four
> sets to Arthur's program and model the yields via John's calculator
> using
>
> L1,0,0,yield
> L1,0,1,yield
> L1,1,0 yield
> L1,1,1 yield
>
> Li,0,0,yield
> Li,0,1,yield
> Li,1,0 yield
> Li,1,1 yield
>
> Ln,0,0,yield
> Ln,0,1,yield
> Ln,1,0 yield
> Ln,1,1 yield
>
> where Li (1<=i<=n) are length interval values, e.g. 13-22,23-32, etc.
>
> But of course, this entire idea is predicated on your approval of the
> idea of splitting any given data set into four subsets according to
> degree of divergence from predicted values of y1 and y2.

Well, it's the sort of thing I've seen students try from time to time, and it never worked for them, but it's easy enough to do, so I suppose there's no harm in trying, as long as your time is free.
In general, when you get an idea for a novel analysis, it's a good idea to simulate it first: try it on artificial data that you have constructed, whose properties are known. Will it find what you want and ignore what you don't want? If not, then how could you interpret the results of analyzing real data as meaning what you want them to?
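
To make the simulation suggestion concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration, not your actual data or pipeline: the linear forms for y1 and y2, the noise levels, the way the "yield" is generated, and all the function names. The planted property is that the yield rises with the size of the y1 residual by a known amount (`effect`) and has nothing to do with y2. The four-way split uses your coding, with the first digit for y1 and the second for y2 (1 = within 1 sd of the fitted value, 0 = outside).

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def within_one_sd(x, y):
    """Flag each point 1 if its residual is within one residual sd, else 0."""
    slope, intercept = fit_line(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sd = statistics.pstdev(resid)
    return [1 if abs(r) <= sd else 0 for r in resid]


def simulate(n, effect, seed):
    """Artificial data with a known, planted property (all values assumed).

    y1 and y2 are linear in x1 plus independent Gaussian noise. The yield
    depends on the size of the y1 noise through `effect` and never on y2.
    """
    rng = random.Random(seed)
    x1 = [rng.uniform(0, 10) for _ in range(n)]
    e1 = [rng.gauss(0, 1) for _ in range(n)]
    e2 = [rng.gauss(0, 1) for _ in range(n)]
    y1 = [2.0 + 0.5 * x + e for x, e in zip(x1, e1)]
    y2 = [1.0 - 0.3 * x + e for x, e in zip(x1, e2)]
    yld = [10.0 + effect * abs(e) + rng.gauss(0, 1) for e in e1]
    return x1, y1, y2, yld


def group_means(n=4000, effect=2.0, seed=1):
    """Mean yield in each of the four subsets "00", "01", "10", "11"."""
    x1, y1, y2, yld = simulate(n, effect, seed)
    f1 = within_one_sd(x1, y1)  # first digit: y1 within (1) or outside (0)
    f2 = within_one_sd(x1, y2)  # second digit: same for y2
    groups = {}
    for a, b, v in zip(f1, f2, yld):
        groups.setdefault(f"{a}{b}", []).append(v)
    return {k: statistics.fmean(v) for k, v in sorted(groups.items())}


def y1_contrast(means):
    """Average yield of the y1-outside subsets minus the y1-within subsets."""
    return (means["00"] + means["01"]) / 2 - (means["10"] + means["11"]) / 2


if __name__ == "__main__":
    for effect in (2.0, 0.0):
        means = group_means(effect=effect)
        print(effect, means, round(y1_contrast(means), 3))
```

Run it once with an effect planted and once with `effect=0.0`. If the split is doing what you hope, the contrast should be clearly nonzero in the first case and near zero in the second. The same check can be repeated with whatever other structure you suspect is in the real data, to see whether the split picks it up or mistakes something else for it.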