Here is an example of an empirically worthwhile Bonferroni ranking obtained via the sampling protocol proposed in my previous post. So far as I can see, this ranking must be considered sample-size independent, but of course, the final judgment here must be yours.
Across ALL 12 length intervals for the a1 fold using dicodon set S63, there are a total of:
876 observations with u restricted to uL (call this set SuL)
2092 observations with u restricted to uH (call this set SuH).
So, we begin by randomly sampling 20 times from SuL, so as to create 20 sets of 100 observations each (call these SuL1 thru SuL20).
And similarly, we randomly sample 20 times from SuH, so as to create 20 sets of 100 observations each (call these SuH1 thru SuH20).
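The sampling step can be sketched roughly as follows. This is only a minimal illustration: the function name `draw_subsamples` and the seeds are hypothetical, and I am assuming that each set of 100 is drawn without replacement from the full pool, with the 20 sets drawn independently of one another (so different sets may share observations).

```python
import random

def draw_subsamples(n_obs, n_sets=20, set_size=100, seed=0):
    """Return n_sets lists, each holding set_size distinct row indices.

    Assumption: sampling is without replacement within a set, and the
    sets are drawn independently, so two sets may overlap.
    """
    rng = random.Random(seed)
    return [rng.sample(range(n_obs), set_size) for _ in range(n_sets)]

# Pool sizes quoted above: 876 observations in SuL, 2092 in SuH.
sul_sets = draw_subsamples(876, seed=1)   # index sets for SuL1 .. SuL20
suh_sets = draw_subsamples(2092, seed=2)  # index sets for SuH1 .. SuH20
```

Fixing the seed makes the 20 sets reproducible, which matters if the same protocol is rerun on other dicodon sets.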
Next, we run the three regressions
RLe: ln(c/L) on ln(c/e)
RLu: ln(c/L) on ln(c/u)
RLb: ln(c/L) on ln(c/u) and ln(c/e)
on each of the sets SuL1 thru SuL20 and each of the sets SuH1 thru SuH20.
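For concreteness, here is a minimal sketch of the three fits on a single 100-observation set, using ordinary least squares with an intercept. The helper names `ols` and `run_three` are hypothetical, and I am assuming c, L, e and u arrive as arrays of positive values so that the logs are defined. Each fit returns the coefficients together with their standard errors, since both are needed later for the t-tests.

```python
import numpy as np

def ols(y, *xs):
    """Least-squares fit of y on the given regressors plus an intercept.

    Returns (coef, se): coef[0] is the intercept, coef[1:] are the slopes
    in the order the regressors were passed; se holds the matching
    standard errors.
    """
    X = np.column_stack([np.ones_like(y), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]                 # residual degrees of freedom
    sigma2 = resid @ resid / dof              # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

def run_three(c, L, e, u):
    """Fit RLe, RLu and RLb on one subsample."""
    y = np.log(c / L)
    xe = np.log(c / e)
    xu = np.log(c / u)
    return {
        "RLe": ols(y, xe),       # slope is for ln(c/e)
        "RLu": ols(y, xu),       # slope is for ln(c/u)
        "RLb": ols(y, xu, xe),   # slopes are for ln(c/u), then ln(c/e)
    }
```

Calling `run_three` once per set gives 20 rows of slopes, intercepts and SEs for the uL sets and 20 rows for the uH sets.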
Next, we use your two-stage custom heteroscedastic t-test with 20 rows to compare:
1) the sets of slopes and slope SEs obtained from these runs:
Pes: p from t-test on slopes from RLe runs
Pei: p from t-test on intercepts from RLe runs
Pus: p from t-test on slopes from RLu runs
Pui: p from t-test on intercepts from RLu runs
Pbse: p from t-test on slopes for ln(c/e) from RLb runs
Pbsu: p from t-test on slopes for ln(c/u) from RLb runs
Pbi: p from t-test on intercepts from RLb runs
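I cannot reproduce your two-stage custom test here, so the sketch below substitutes the plain Welch unequal-variance t-test as a stand-in, just to show the shape of the comparison: 20 uL estimates against 20 uH estimates. It ignores the per-run SEs, which your test uses, so it is not the same procedure. The function name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: the two groups of estimates (for example, the 20 RLe slopes
    from the uL sets and the 20 RLe slopes from the uH sets).
    """
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A two-sided p-value then comes from the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and returns the p-value directly.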
and obtain this Bonferroni ranking (with desired p = 0.05):
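As a note on mechanics only: one common way to build such a ranking is the Holm step-down form of the Bonferroni correction, in which the p-values are sorted ascending and the one at rank k (out of m) is compared with alpha / (m - k + 1). Whether this matches the thresholds used for the ranking in this post is an assumption on my part, and the function name `holm_ranking` is hypothetical.

```python
def holm_ranking(pvals, alpha=0.05):
    """Rank named p-values ascending and attach Holm step-down thresholds.

    pvals: dict mapping a label (e.g. "Pes") to its p-value.
    Returns rows of (rank, label, p, threshold, significant). Once one
    p-value fails its threshold, all later ranks are marked not significant.
    """
    ordered = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rows = []
    still_rejecting = True
    for rank, (label, p) in enumerate(ordered, start=1):
        threshold = alpha / (m - rank + 1)
        still_rejecting = still_rejecting and p < threshold
        rows.append((rank, label, p, threshold, still_rejecting))
    return rows
```

With the seven p-values defined above (Pes through Pbi), m is 7, so the smallest p-value is tested against 0.05/7 and the largest against 0.05.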
I will not dwell on the scientific import of this ranking, except to say that:
1) inasmuch as the regressions for uH data are uniformly better than the regressions for uL data, this ranking indicates that we are statistically entitled to restrict u to uH when developing predictors for the logistic regressions that we'll use to predict structural alignability yields from AML's pinq program;
2) for Paper I, it may be important that Pbse is NOT significant.
But again, these conclusions are based on the assumptions that:
1) you will adjudge the above ranking to be sample-size independent;
2) we get no contra-indications from running the same protocol using the dicodon sets C711, S119, C1058, S60, C493, S63R, and C673R.
One final methodological note:
When we develop the predictors for our logistic regressions, the above results indicate that we can/should run our three linear regressions (two simple and one multiple) ACROSS all length intervals (not per length interval), and THEN split our observations into length intervals ONLY for the purposes of submitting subsequences to AML's pinq program for structural alignment. (I think you suggested something like this some time ago ...)