Topic: Correct way to normalize an rmsd-based distance metric used in repeated trials of pairs

Replies: 148   Last Post: May 8, 2012 3:40 AM

 Ray Koopman Posts: 3,383 Registered: 12/7/04
Re: Correct way to normalize an rmsd-based distance metric used in repeated trials of pairs

Posted: Apr 24, 2012 12:24 AM

On Apr 20, 8:59 pm, gimpe...@hotmail.com wrote:
> Very interesting ... Arthur has suspected all along that submitting
> "short" subsequences to his alignment program would not be wise,
> because at least in the case of alpha-helical proteins, his program
> will always find something to align. And when I ran the length
> interval 23-32 for fold a1 (which includes helical proteins only)
> on the refined "00" data ... sure enough ... the yield ratio was 1,
> i.e. output = input. So there is no sense in running the remaining
> 01/10/11 data for length interval 33-42 through Arthur's program, nor
> for running any of the 00/01/10/11 a1 data for the last remaining
> length interval 13-22.
>
> Therefore, I have added length as a third predictor and done a summary
> run on 32 points (the 00/01/10/11 results for each of the eight length
> intervals from 33-42 thru 103-112). The results appear below, and it
> seems like the critical predictor (which is now predictor 2) holds up
> very well.
>
> But ... that is of course my naive and untutored judgement ... please
> take a look at your earliest convenience at the data below and tell me
> what you think. Should I move on to the next of the five folds using
> the same methodology? Or can you tell that this attempt using the
> "00/01/10/11" game plan has also failed ?

Let me call the predictors L, x1, x2 instead of first, second, third.

With all three predictors in the model, the weight on x2 is so small
and so far from significance that x2 should be dropped. As is usual
in such cases, dropping it changes things only negligibly.

Using log L instead of L improves the fit (increases the maximized
likelihood) and makes x2 even more nonsignificant.

Dropping x2 and using L, x1, and L*x1 gives almost as good a fit,
and changing L to log L improves the fit even more.
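
If you want to rerun these comparisons yourself, here is a rough sketch of
the grouped-data logistic fits I have in mind, written for Python/statsmodels
rather than for whatever program produced your output; the column names
(L, x1, x2, n0, n1) are just labels I am putting on the quoted counts.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# One row per covariate pattern: L, x1, x2, count of Y=0, count of Y=1
rows = [
 (40,0,0,1416,1485),(40,0,1,1919,1017),(40,1,0,1053,2575),(40,1,1,787,2072),
 (50,0,0,416,1068),(50,0,1,701,1040),(50,1,0,642,1250),(50,1,1,484,1231),
 (60,0,0,304,645),(60,0,1,366,714),(60,1,0,343,1087),(60,1,1,107,714),
 (70,0,0,252,534),(70,0,1,278,758),(70,1,0,182,765),(70,1,1,128,954),
 (80,0,0,160,430),(80,0,1,103,197),(80,1,0,171,310),(80,1,1,108,520),
 (90,0,0,27,24),(90,0,1,46,152),(90,1,0,30,304),(90,1,1,26,137),
 (100,0,0,67,98),(100,0,1,50,189),(100,1,0,93,196),(100,1,1,1,230),
 (110,0,0,32,39),(110,0,1,20,157),(110,1,0,81,238),(110,1,1,0,125)]
df32 = pd.DataFrame(rows, columns=["L","x1","x2","n0","n1"])

# The models without x2 have a different saturated model, so collapse the
# counts over x2 (16 covariate patterns instead of 32) before fitting those.
df16 = df32.groupby(["L","x1"], as_index=False)[["n0","n1"]].sum()

for d in (df32, df16):
    d["logL"] = np.log(d["L"])
    d["logL_x1"] = d["logL"] * d["x1"]

def fit(data, terms):
    # Grouped binomial logistic regression of (n1, n0) on the named columns
    y = data[["n1", "n0"]].to_numpy()
    X = sm.add_constant(data[terms].to_numpy())
    return sm.GLM(y, X, family=sm.families.Binomial()).fit()

m_full     = fit(df32, ["L", "x1", "x2"])          # all three predictors, raw L
m_drop_x2  = fit(df16, ["L", "x1"])                # x2 dropped
m_interact = fit(df16, ["logL", "x1", "logL_x1"])  # log L plus the product term

m_interact.params and m_interact.bse hold the fitted coefficients and their
asymptotic standard errors.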

Here is a table giving {chi-square, df} for various models against
the corresponding saturated model. (The saturated model with x2 is
not the same as the saturated model without x2.)

           L,x1,x2          L,x1            L,x1,L*x1
raw L   {1136.41, 28}   {533.755, 13}   {455.82, 12}
log L   {1048.03, 28}   {445.424, 13}   {348.74, 12}

Chi-square/df is a measure of the misfit of the model.
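
(Continuing the statsmodels sketch above: each fit's deviance is its
likelihood-ratio chi-square against its own saturated model, and df_resid is
the number of covariate patterns minus the number of fitted parameters, so
the misfit index is one line per model.)

for name, m in [("L,x1,x2", m_full),
                ("L,x1", m_drop_x2),
                ("logL,x1,logL*x1", m_interact)]:
    # chi-square vs. saturated model, its df, and the chi-square/df ratio
    print(f"{name:16s} chi2={m.deviance:8.2f} df={int(m.df_resid):2d} "
          f"ratio={m.deviance / m.df_resid:.2f}")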

Here are the results for the best-fitting model. (ASE is the standard
acronym for Asymptotic Standard Error. In this case it refers to the
estimated ASE of the estimated coefficient.)

Predictor Coefficient ASE
Log L 4.3080 0.3609
x1 -0.8996 0.0915
(Log L)*x1 -6.5615 0.2587
intercept -6.5615 0.2587

Including the product term effectively does two separate regressions
-- one for x1 = 1, one for x1 = 0. The two regression curves are
predicted to cross at about L = 120. Does that seem reasonable?
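
To spell out where that crossing comes from, using the predictor labels
above, write the fitted model as

\[
\operatorname{logit}(p) = \beta_0 + \beta_L \log L + \beta_1 x_1
                          + \beta_{L1} (\log L)\, x_1 .
\]

The x1 = 0 and x1 = 1 curves give the same predicted probability where the
x1 terms cancel, i.e.

\[
\beta_1 + \beta_{L1} \log L^{*} = 0
\quad\Longrightarrow\quad
L^{*} = \exp(-\beta_1 / \beta_{L1}) .
\]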

Conclusions:
1. Both L and x1 are necessary.
2. x2 is useless.
3. Log L is better than L.
4. The effect of x1 varies as a function of L.
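
(As a quick check on the "Calc Prob" column in your output below: it is just
the fitted logistic probability computed from the printed coefficients. For
the first row, with X1 = 40 and X2 = X3 = 0,

\[
\eta = -0.7802 + 0.0205(40) = 0.0398 ,
\qquad
p = \frac{1}{1 + e^{-\eta}} \approx 0.510 ,
\]

which matches the listed 0.5105 to within rounding of the printed
coefficients.)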

>
> 40,0,0,1416,1485
> 40,0,1,1919,1017
> 40,1,0,1053,2575
> 40,1,1,787,2072
> 50,0,0,416,1068
> 50,0,1,701,1040
> 50,1,0,642,1250
> 50,1,1,484,1231
> 60,0,0,304,645
> 60,0,1,366,714
> 60,1,0,343,1087
> 60,1,1,107,714
> 70,0,0,252,534
> 70,0,1,278,758
> 70,1,0,182,765
> 70,1,1,128,954
> 80,0,0,160,430
> 80,0,1,103,197
> 80,1,0,171,310
> 80,1,1,108,520
> 90,0,0,27,24
> 90,0,1,46,152
> 90,1,0,30,304
> 90,1,1,26,137
> 100,0,0,67,98
> 100,0,1,50,189
> 100,1,0,93,196
> 100,1,1,1,230
> 110,0,0,32,39
> 110,0,1,20,157
> 110,1,0,81,238
> 110,1,1,0,125
>
> Descriptives...
>
> 10393 cases have Y=0; 21255 cases have Y=1.
>
> Variable Avg SD
> 1 55.5015 17.8084
> 2 0.5354 0.4987
> 3 0.4844 0.4998
>
> Iteration History...
> -2 Log Likelihood = 40068.5930 (Null Model)
> -2 Log Likelihood = 38305.1477
> -2 Log Likelihood = 38278.7631
> -2 Log Likelihood = 38278.7296
> -2 Log Likelihood = 38278.7296 (Converged)
>
> Overall Model Fit...
> Chi Square= 1789.8634; df=3; p= 0.0000
>
> Coefficients and Standard Errors...
> Variable Coeff. StdErr p
> 1 0.0205 0.0008 0.0000
> 2 0.7643 0.0247 0.0000
> 3 -0.0074 0.0247 0.7649
> Intercept -0.7802
>
> Odds Ratios and 95% Confidence Intervals...
> Variable O.R. Low -- High
> 1 1.0208 1.0192 1.0223
> 2 2.1476 2.0460 2.2542
> 3 0.9926 0.9457 1.0419
>
> X1 X2 X3 n0 n1 Calc Prob
> 40.0000 0.0000 0.0000 1416 1485 0.5105
> 40.0000 0.0000 1.0000 1919 1017 0.5086
> 40.0000 1.0000 0.0000 1053 2575 0.6913
> 40.0000 1.0000 1.0000 787 2072 0.6897
> 50.0000 0.0000 0.0000 416 1068 0.5615
> 50.0000 0.0000 1.0000 701 1040 0.5597
> 50.0000 1.0000 0.0000 642 1250 0.7333
> 50.0000 1.0000 1.0000 484 1231 0.7319
> 60.0000 0.0000 0.0000 304 645 0.6113
> 60.0000 0.0000 1.0000 366 714 0.6096
> 60.0000 1.0000 0.0000 343 1087 0.7716
> 60.0000 1.0000 1.0000 107 714 0.7703
> 70.0000 0.0000 0.0000 252 534 0.6589
> 70.0000 0.0000 1.0000 278 758 0.6572
> 70.0000 1.0000 0.0000 182 765 0.8058
> 70.0000 1.0000 1.0000 128 954 0.8046
> 80.0000 0.0000 0.0000 160 430 0.7035
> 80.0000 0.0000 1.0000 103 197 0.7019
> 80.0000 1.0000 0.0000 171 310 0.8359
> 80.0000 1.0000 1.0000 108 520 0.8349
> 90.0000 0.0000 0.0000 27 24 0.7445
> 90.0000 0.0000 1.0000 46 152 0.7431
> 90.0000 1.0000 0.0000 30 304 0.8622
> 90.0000 1.0000 1.0000 26 137 0.8613
> 100.0000 0.0000 0.0000 67 98 0.7816
> 100.0000 0.0000 1.0000 50 189 0.7803
> 100.0000 1.0000 0.0000 93 196 0.8848
> 100.0000 1.0000 1.0000 1 230 0.8841
> 110.0000 0.0000 0.0000 32 39 0.8146
> 110.0000 0.0000 1.0000 20 157 0.8135
> 110.0000 1.0000 0.0000 81 238 0.9042
> 110.0000 1.0000 1.0000 0 125 0.9035

Date Subject Author
4/1/12 Halitsky
4/3/12 Ray Koopman
4/3/12 Halitsky
4/6/12 Ray Koopman
4/6/12 Halitsky
4/6/12 Halitsky
4/7/12 Ray Koopman
4/7/12 Halitsky
4/8/12 Ray Koopman
4/8/12 Halitsky
4/9/12 Halitsky
4/9/12 Halitsky
4/9/12 Ray Koopman
4/9/12 Halitsky
4/9/12 Halitsky
4/10/12 Ray Koopman
4/10/12 Halitsky
4/11/12 Halitsky
4/11/12 Ray Koopman
4/11/12 Halitsky
4/11/12 Halitsky
4/11/12 Halitsky
4/11/12 Art Kendall
4/11/12 Halitsky
4/13/12 Ray Koopman
4/13/12 Halitsky
4/13/12 Halitsky
4/14/12 Ray Koopman
4/14/12 Halitsky
4/14/12 Halitsky
4/14/12 Halitsky
4/14/12 Halitsky
4/14/12 Halitsky
4/15/12 Ray Koopman
4/15/12 Halitsky
4/15/12 Halitsky
4/15/12 Ray Koopman
4/16/12 Halitsky
4/16/12 Halitsky
4/16/12 Ray Koopman
4/16/12 Halitsky
4/16/12 Halitsky
4/16/12 Halitsky
4/16/12 Halitsky
4/17/12 Ray Koopman
4/17/12 Halitsky
4/17/12 Ray Koopman
4/18/12 Halitsky
4/19/12 Ray Koopman
4/19/12 Halitsky
4/20/12 Ray Koopman
4/20/12 Halitsky
4/20/12 Halitsky
4/20/12 Halitsky
4/20/12 Halitsky
4/20/12 Halitsky
4/20/12 Ray Koopman
4/20/12 Halitsky
4/20/12 Ray Koopman
4/20/12 Halitsky
4/20/12 Halitsky
4/20/12 gimpeltf@hotmail.com
4/20/12 gimpeltf@hotmail.com
4/20/12 gimpeltf@hotmail.com
4/21/12 gimpeltf@hotmail.com
4/21/12 Halitsky
4/24/12 Ray Koopman
4/22/12 Halitsky
4/23/12 Halitsky
4/24/12 Ray Koopman
4/24/12 Halitsky
4/24/12 Halitsky
4/24/12 Ray Koopman
4/26/12 Ray Koopman
4/26/12 Halitsky
4/26/12 Halitsky
4/27/12 Ray Koopman
4/27/12 Halitsky
4/27/12 Ray Koopman
4/28/12 Halitsky
4/28/12 Ray Koopman
4/28/12 Halitsky
4/28/12 Ray Koopman
4/28/12 gimpeltf@hotmail.com
4/28/12 Ray Koopman
4/28/12 gimpeltf@hotmail.com
4/28/12 Halitsky
4/29/12 Ray Koopman
4/29/12 Ray Koopman
4/29/12 Halitsky
4/29/12 Ray Koopman
4/29/12 Halitsky
4/29/12 Halitsky
4/29/12 Halitsky
4/30/12 Ray Koopman
4/30/12 Halitsky
4/30/12 Halitsky
4/30/12 Ray Koopman
4/30/12 Halitsky
4/30/12 Ray Koopman
4/30/12 Halitsky
5/1/12 Ray Koopman
5/1/12 Halitsky
5/1/12 Ray Koopman
5/1/12 Halitsky
5/1/12 Halitsky
5/2/12 Halitsky
5/2/12 Halitsky
5/2/12 Halitsky
5/3/12 Ray Koopman
5/3/12 Halitsky
5/3/12 Halitsky
5/4/12 Ray Koopman
5/4/12 Halitsky
5/4/12 Halitsky
5/4/12 Halitsky
5/4/12 Halitsky
5/4/12 Halitsky
5/5/12 Halitsky
5/5/12 Ray Koopman
5/5/12 Halitsky
5/7/12 Halitsky
5/7/12 Halitsky
5/8/12 Ray Koopman
5/8/12 Halitsky
5/6/12 Ray Koopman
5/3/12 Halitsky
5/3/12 Ray Koopman
5/3/12 Halitsky
5/3/12 Halitsky
5/3/12 Ray Koopman
5/3/12 Halitsky
5/1/12 Halitsky
4/27/12 Halitsky
4/28/12 Ray Koopman
4/28/12 Halitsky
4/24/12 Ray Koopman
4/19/12 Halitsky
4/20/12 Ray Koopman
4/16/12 Halitsky
4/16/12 Ray Koopman
4/16/12 Halitsky
4/14/12 Ray Koopman
4/13/12 Halitsky
4/8/12 Ray Koopman