Re: Correct way to normalize an rmsd-based distance metric used in repeated trials of pairs
Posted: Apr 16, 2012 2:14 PM
On Apr 16, 10:27 am, djh <halitsk...@att.net> wrote:
> I tried your suggestion for using the median to dichotomize the
> residuals (after adjusting the residuals according to your instructions
> two posts ago.) The result I got seems to provide additional evidence
> that both correlations:
>
> correlation Cel: (log c - log e) on (log c - log L)
> correlation Cul: (log c - log u) on (log c - log L)
>
> are very much homoscedastic. But I wanted to check this result with
> you, to make sure that I'm correct in saying that this result results
> from the homoscedasticity of the two correlations.
>
> So - here's what I did on a set of 612 subsequences:
>
> 1) ran the regressions Cel and Cul
> 2) adjusted the residuals for each according to your "adjustment"
> instruction (here I used the Excel "VAR" function to determine
> variance of x = (log c - log L) and I used the residuals reported out
> by Excel.)
> 3) found the medians for the two sets of adjusted residuals;
> 4) assigned the index 11, 01, 10, 00 to each of the 612 subsequences
> in the obvious way:
>
> a) 11 if the subsequence had a Cel residual < the Cel median and a
> Cul residual < the Cul median
> b) 10 if the subsequence had a Cel residual < the Cel median and a
> Cul residual >= the Cul median
> c) 01 if the subsequence had a Cel residual >= the Cel median and a
> Cul residual < the Cul median
> d) 00 if the subsequence had a Cel residual >= the Cel median and a
> Cul residual >= the Cul median.
'0' means 'a big number', '1' means 'a small number'. Obvious ;)
>
> And the result of this assignment was:
>
> 153 of the 612 subsequences marked 11
> 153 of the 612 subsequences marked 10
> 153 of the 612 subsequences marked 01
> 153 of the 612 subsequences marked 00
>
> I assume that this even 4-way split is the result of the residuals
> being highly homoscedastic in both correlations Cel and Cul, since if
> I'm understanding the basics correctly:
>
> i) high homoscedasticity implies very uncorrelated errors
> ii) very uncorrelated errors in both the Cul and Cel correlation would
> lead to the above even 4-way split.
>
> Is this interpretation correct, or am I entirely off-track here ?
>
> Thanks as always for any time you can afford to spend considering this
> question ...
The evenness of the four-way split is the result of two things: dichotomizing at the medians, and the two sets of residuals being uncorrelated with one another. Although certain patterns of heteroscedasticity could prevent the split from being even, homoscedasticity does not make it even. Also, remember that it's the absolute residuals we're talking about, not the signed residuals.
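To see the point concretely, here is a rough Python sketch (not the procedure used on the actual data). It draws two error series that both have constant variance, takes absolute values, and cross-tabulates them at their medians using the same 1 = below median, 0 = at or above coding. The function names `median_split_counts` and `simulate`, the `shared` mixing weight, and the normal errors are all assumptions made up for illustration.

```python
import random
from statistics import median


def median_split_counts(xs, ys):
    """Count pairs in each cell, coding 1 = below median, 0 = at or above."""
    mx, my = median(xs), median(ys)
    counts = {"11": 0, "10": 0, "01": 0, "00": 0}
    for x, y in zip(xs, ys):
        key = ("1" if x < mx else "0") + ("1" if y < my else "0")
        counts[key] += 1
    return counts


def simulate(n=612, shared=0.0, seed=0):
    """Median-split table for absolute residuals of two simulated error series.

    shared = 0 gives independent errors; shared > 0 mixes in a common
    component so the two series are correlated with one another.
    Both series have constant variance in either case (homoscedastic).
    The normal errors and the mixing scheme are illustrative assumptions.
    """
    rng = random.Random(seed)
    abs1, abs2 = [], []
    for _ in range(n):
        common = rng.gauss(0, 1)
        e1 = shared * common + rng.gauss(0, 1)
        e2 = shared * common + rng.gauss(0, 1)
        abs1.append(abs(e1))
        abs2.append(abs(e2))
    return median_split_counts(abs1, abs2)


if __name__ == "__main__":
    print("independent:", simulate(shared=0.0))
    print("correlated: ", simulate(shared=3.0))
```

In runs like this the independent case should land near an even split and the correlated case should pile up in the 11 and 00 cells, even though both are homoscedastic. Note also that with 612 cases and no ties at the medians, each margin is forced to be 306/306, so the whole table is fixed by one cell count; 153 in every cell is simply the zero-association case.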