Topic: Correct way to normalize an rmsd-based distance metric used in
repeated trials of pairs

Replies: 148   Last Post: May 8, 2012 3:40 AM

 Halitsky Posts: 600 Registered: 2/3/09
Your questions go to the "moment of truth" that Jacques and Arthur
may shortly be facing

Posted: Apr 29, 2012 2:38 PM

You wrote:

"Part of the applied statistics lore is that such simple linear
combinations often -- some would say almost always -- work better
than they "ought" to, but that's after an intentional decision to
weight. The problem with accidental weightings that work is that
they can just as accidentally quit working.

I was hoping for an a priori theoretical justification. Forget the
logging, and ask yourself why approximating c/e well should be more
important than approximating c/u well, where both approximations are
of the form a*(c/L)^b, each approximation has its own (empirically
determined) a & b, and "importance" means utility for predicting the
probability of a match. Under what conditions should the relative
importance be approximately constant? Might it ever reverse? Etc."
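
As a concrete illustration of the form a*(c/L)^b mentioned in the quoted passage, here is a minimal sketch of how such an a and b could be determined empirically. It assumes ordinary least squares on the logged values, which is only one possible fitting method and not necessarily the one used in this thread. The function name fit_power_law and the data are hypothetical.

```python
import math

def fit_power_law(x, y):
    """Fit y ~ a * x**b by ordinary least squares on (ln x, ln y).

    Assumption: all x and y are strictly positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                  # slope in log-log space is the exponent
    a = math.exp(my - b * mx)      # intercept in log-log space is ln(a)
    return a, b

# Synthetic data, generated from a known a and b purely for illustration.
c_over_L = [0.1, 0.2, 0.4, 0.8]
c_over_e = [2.0 * r ** 0.5 for r in c_over_L]
a, b = fit_power_law(c_over_L, c_over_e)
```

Running the same function once with c/e and once with c/u as the target would give each approximation its own a and b, as the quoted passage describes.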

In response, let me say first that because I chose to answer your
question about "weighting" narrowly instead of broadly, I left you
with the impression that there is no a priori theoretical
justification for favoring "dicodon enthalpic level" (c/e) over
"dicodon representation level" (c/u).

In fact, the primary problem which may shortly confront Jacques and
Arthur is just the reverse: choosing among several different
hypotheses as to why "dicodon enthalpic level" should be more
important than "dicodon representation level". In this regard, I have
been championing the "evolutionary hypothesis" which I've mentioned
earlier: the importance of "dicodon enthalpic level" is a left-over
from constraints on the earliest systems capable of making proteins
from genes. On the other hand, our colleague Marvin Stodolsky has
been championing a "mechanistic hypothesis", i.e. a hypothesis which
asserts that "dicodon enthalpic level" is relevant to various
processes which can and do affect the manufacture of proteins from
genes even today. (Note that these two hypotheses are not mutually
exclusive: the question is which Jacques and Arthur should choose to
include or omit, and which to emphasize.)

Further, the secondary problem which may shortly confront Jacques and
Arthur is deciding how to explain why "dicodon representation
level" (c/u) should matter at all. (Recall that in your first
proposed model {lnL,x1,lnLx1}, which worked fairly well in certain
cases, we disregarded c/u entirely, since we disregarded x2 as a
predictor.)

Regarding this question, I have an idea which is amenable to empirical
investigation: it has to do with a "supply chain" constraint
involving the numbers of auxiliary molecules (tRNAs) of various
types that are available to an organism during the process by
which a protein is made from a gene. And if necessary, I am prepared
to analyze the relevant available databases to see how this idea does
or doesn't play out. And, of course, Jacques and Arthur and Marvin
may have their own ideas as to why "dicodon representation level"
should matter at all, as opposed to "dicodon enthalpic level".

Finally, let me explain why I have chosen the locution "Jacques and
Arthur MAY have to confront ..."

We have three folds left to go (b47, c1, and c2) to see if the new
{lnL,mv,lnLmv} model holds up with respect to the nice differentiation
which it makes between study group and control group data for each
fold.
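
For readers following along, the three predictors in {lnL,mv,lnLmv} can be sketched as below. The sketch assumes that lnLmv is the product of lnL and mv (an interaction term) and that the probability of a match comes from a logistic link; neither is spelled out in this post. The names design_row and match_probability, and all input and coefficient values, are hypothetical.

```python
import math

def design_row(L, mv):
    """Return [1, lnL, mv, lnL*mv] for one observation.

    Assumption: lnLmv denotes the product of lnL and mv.
    """
    lnL = math.log(L)
    return [1.0, lnL, mv, lnL * mv]

def match_probability(row, coefs):
    """Apply a logistic link (an assumption) to the linear predictor."""
    eta = sum(x * c for x, c in zip(row, coefs))
    return 1.0 / (1.0 + math.exp(-eta))

row = design_row(L=100.0, mv=0.3)                  # made-up inputs
p = match_probability(row, [0.0, 0.0, 0.0, 0.0])   # placeholder coefficients
```

With fitted coefficients in place of the placeholders, comparing p between study group and control group rows for a fold is one way to look at the differentiation described above.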

If the apparent success of this new model is an "accident" of the sort
you indicate it might be, then it will fail on one of the three
remaining folds and there will be no problem at all for Jacques and
Arthur to confront.

But if this new model should work on the remaining three folds to
differentiate study group from control group data, then there is
reason to expect that it will "always" work, inasmuch as our six folds
comprise two different examples of each of the three basic fold types:
helical (a1, a3), sheet (b1, b47), and helix/sheet (c1, c2). So, if the
new model fails to work on future data after succeeding for our
present six folds, then we will have reason to suspect a readily
explainable "exception to the rule."

In any event, everything depends right now on what happens with our
remaining three folds ... I will be posting the b47 fold results
sometime later tonight after I finish the control group runs.

Thanks so much again, Ray.
