Re: Correct way to normalize an rmsd-based distance metric used in repeated trials of pairs
Posted: Apr 3, 2012 6:04 AM
My cup runneth over, Ray, really. Again, your intellectual "caritas" is inspiring - I use the Latin term here to signify something perhaps even more than "generosity" and "kindness".
Your post above is really the one I should send framed copies of to Jacques and Arthur, inasmuch as it not only holds out the possibility that we MIGHT be able to make a reasonable statistical argument for our claims but also, once again, lays out exactly what our remaining plan of attack should be.
Accordingly, here is what I plan to do in response to what you wrote here:
"Reviewers for the (psychology) journals with which I am familiar would want some empirical assurance that the small but consistently significant coefficients were not due to predictors that are known or suspected to matter but were omitted, such as those mentioned in the two posts immediately following the one to which I am replying. "
1. First, I will definitely go back and do what I mentioned in my second "follow-up" post, namely rerun the analysis counting only output pairs whose average "spans" fall within 1 sd of the mean average "span" for the run (where a "run" is defined by a triple of values for (length, v1, v2)).
If this reanalysis yields the same or better logistic correlations than those you've seen already, then this result might well help with potential referees, because it would lessen the possible effect of "chance" involving low outliers and of "expected" results involving high outliers.
On the other hand, if this reanalysis kills the current logistic correlations outright, well, that's the way the cookie crumbles. At least we'll know what's what and that we did an honest day's work.
2. Assuming the reanalysis in (1) does not kill our current logistic correlations, I will run the weighted LINEAR correlations on logit[p] according to the instructions you've provided.
In this regard, I take your last post to mean that if we run weighted linear correlations on logit[p] rather than ordinary linear correlations, you would be able to make a more informed decision as to whether we could use the resulting R^2 values as a crude indicator of the extent to which an "evolutionary factor" might be influencing the behavior of Dr. Lesk's program. (But if I'm misinterpreting you here, please clarify.)
3. Regarding what you wrote here:
"The general rule is that the more radical the conclusion you are trying to draw, the more evidence you need to present that yours is the only reasonable explanation. "
I am pleased to be able to tell you that if the analysis is still viable after (1) and (2) are completed, then your point here should be at least partially addressed by our main control protocol (which I haven't yet discussed at all). Dr. Lesk is known throughout the community as being extraordinarily punctilious and severe regarding controls, but even he has agreed that we have a very elegant control whose results will be very difficult to disparage (whether they turn out good for us or bad for us).
4. Finally, please note that before reporting the results of the reanalysis in (1) here, I have to recalculate the current logistic correlations for lengths 20-80 as well as the new ones for lengths 90-110. This is because I forgot what you told me to do when p = 1 and 1-p = 0: I was deleting the cases where output = input, instead of adding the .5 and 1 in these cases (which is what you suggested in your post of 7:21pm Mar 27 as a lousy but workable solution for the "1" cases as well as the "0" cases). This shouldn't make a whole lot of difference, but I want to make sure the current data are in the best possible shape before proceeding with (1), particularly because it will take me only a few minutes to recalculate the data for lengths 20-110 and re-report the results here.
5. You can expect a note from Jacques toward the end of the week (he has been travelling and then catching up with the regular affairs of his lab).
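For anyone who wants to try the computations in items (1), (2) and (4), here is a minimal Python sketch using only the standard library. It is not the code actually used in this analysis, and several pieces are assumptions: the record layout (a hypothetical `run` key holding the (length, v1, v2) triple and a `span` value per output pair), the use of the sample standard deviation for the 1-sd window, and the reading of "adding the .5 and 1" as adding .5 to the count and 1 to the total. The weights shown, (x + .5)(n - x + .5)/(n + 1), are the inverse of the usual approximate variance of this adjusted logit, a common choice for weighted regression on logit[p]; Ray's actual instructions may specify something different.

```python
import math
import statistics
from collections import defaultdict


def filter_within_1sd(pairs):
    """Keep only output pairs whose average span lies within 1 sd of the
    mean average span of their run.

    Hypothetical layout (an assumption): each pair is a dict with a "run"
    key holding the (length, v1, v2) triple and a "span" key holding the
    pair's average span. Uses the sample standard deviation.
    """
    by_run = defaultdict(list)
    for pair in pairs:
        by_run[pair["run"]].append(pair)
    kept = []
    for run_pairs in by_run.values():
        spans = [pair["span"] for pair in run_pairs]
        mean = statistics.mean(spans)
        # stdev needs at least two values; a lone pair is trivially kept
        sd = statistics.stdev(spans) if len(spans) > 1 else 0.0
        kept.extend(pair for pair in run_pairs if abs(pair["span"] - mean) <= sd)
    return kept


def empirical_logit(successes, n):
    """logit[p] with .5 added to the count and 1 to the total, i.e.
    p = (x + .5) / (n + 1), so raw p = 0 and raw p = 1 give finite values."""
    return math.log((successes + 0.5) / (n - successes + 0.5))


def empirical_logit_weight(successes, n):
    """Assumed weight: inverse of the approximate variance
    1/(x + .5) + 1/(n - x + .5) of the adjusted logit."""
    return (successes + 0.5) * (n - successes + 0.5) / (n + 1)


def weighted_linear_fit(xs, ys, ws):
    """Weighted least-squares line y = a + b*x.

    Returns (slope b, intercept a, weighted R^2).
    Needs at least two distinct x values.
    """
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ss_res = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    ss_tot = sum(w * (y - ybar) ** 2 for w, y in zip(ws, ys))
    return slope, intercept, 1.0 - ss_res / ss_tot
```

In use, one would compute `ys = [empirical_logit(x, n) ...]` and `ws = [empirical_logit_weight(x, n) ...]` for each group and pass them to `weighted_linear_fit` together with whatever predictor is being correlated (left unspecified here). For example, `empirical_logit(10, 10)` gives log(21) rather than an infinite value, which is why the adjustment makes it unnecessary to delete the output = input cases.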