"I got the same p-values by doing two ordinary paired t's with 5 df, which is wrong. ALL tests must use the procedure from my Oct 2 post (although the df's may be so big that nothing substantial would change if you got p-values from the normal distribution instead of the t distribution)."
and I interpret this response as telling me to do the following.
(I) Go back to the beginning and, for each combination of the six folds and the three dicodon sets (n = 1, 2, 3), get the following means (M's) and variances (V's):
A) M(nS-nC) and V(nS-nC) per length interval for each uL
B) M(RnS-RnC) and V(RnS-RnC) per length interval for uL
C) M(nS-nC) and V(nS-nC) per length interval for uH
D) M(RnS-RnC) and V(RnS-RnC) per length interval for uH
(II) Then leverage-up these "per length interval" values the same way as I did before, in accordance with point #4 of your 10/2 post, to get the "leveraged-up" means and variances LM(nS-nC), LV(nS-nC), LM(RnS-RnC), LV(RnS-RnC) for uL and uH (again, for each combination of fold and dicodon set).
(III) Then plug these "leveraged-up" values into your 10/2 formulae for t and df, just as I did before, so that the numerator of t is:
[LM(nS-nC) - LM(RnS-RnC) for uL] - [LM(nS-nC) - LM(RnS-RnC) for uH]
and the four variances inside "sqrt[...]" in the denominator of t are:
LV(nS-nC) for uL, LV(RnS-RnC) for uL, LV(nS-nC) for uH, and LV(RnS-RnC) for uH;
(IV) The leveraged-up df's for the denominator of df are those for (nS-nC) and (RnS-RnC) for each of uL and uH.
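To make (III) and (IV) concrete, here is a minimal sketch of the computation as I understand it. It assumes the 10/2 formulae have the usual Welch-Satterthwaite form and that each LV is already the variance of the corresponding mean (a squared standard error); both are my assumptions, since the 10/2 formulae are not reproduced here. The function name and all the numbers are made up for illustration, with "N" standing for (nS-nC) and "R" for (RnS-RnC).

```python
import math

def welch_t_and_df(numerator, variances, dfs):
    """Welch-Satterthwaite style t and approximate df (assumed form).

    variances: variance of each mean (squared standard error), one per term
    dfs:       the df attached to each of those variances
    """
    total = sum(variances)
    t = numerator / math.sqrt(total)
    df = total ** 2 / sum(v ** 2 / d for v, d in zip(variances, dfs))
    return t, df

# Made-up leveraged-up values for ONE fold and ONE dicodon set.
lm = {("N", "uL"): 0.40, ("R", "uL"): 0.10, ("N", "uH"): 0.25, ("R", "uH"): 0.05}
lv = {("N", "uL"): 0.010, ("R", "uL"): 0.012, ("N", "uH"): 0.009, ("R", "uH"): 0.011}
ldf = {("N", "uL"): 120, ("R", "uL"): 115, ("N", "uH"): 98, ("R", "uH"): 102}

# Numerator from (III): [N - R for uL] - [N - R for uH]
numerator = (lm[("N", "uL")] - lm[("R", "uL")]) - (lm[("N", "uH")] - lm[("R", "uH")])

keys = [("N", "uL"), ("R", "uL"), ("N", "uH"), ("R", "uH")]
t, df = welch_t_and_df(numerator, [lv[k] for k in keys], [ldf[k] for k in keys])
print(f"t = {t:.3f}, approximate df = {df:.1f}")
```

With this form the approximate df always lies between the smallest of the four df's and their sum.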
If (I-IV) comprise a correct interpretation of your instructions, then you'll immediately see that a major question arises because in general:
(V) the N's for S and C are not the same for each pair of corresponding S,C cells of the design (e.g. S and C for fold a1, length interval 5, uL)
(VI) the N's for RnS and RnC are not the same for each pair of corresponding S,C cells of the design
In particular, the question is:
(VII) what N's should be used to compute the variances in I(A-D) and the df's in (IV)?
Regarding this question, please note that it goes away entirely if I simplify and sharpen the model as per my post of 10/8 @ 9:40 pm, simply because the introduction of C dicodon subsets of the same cardinality as S dicodon subsets allows me to:
A) compute u and e FOR EACH OBSERVATION relative to the S subset AND relative to the C subset
B) take the difference of the two u's and the difference of the two e's PER OBSERVATION.
with the consequent simplification of the design to
C) 6 folds x 12 length intervals x 2 (for N:R).
and the consequent ability to apply the instructions in your post of 10/2 without problem (both per length interval and across length intervals via "leveraging-up").
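A minimal sketch of why the question about N's disappears under (A)-(C): once every observation carries both an S-relative and a C-relative value, each cell has one list of differences and therefore a single N. The function name and the u values below are made up for illustration; only the per-observation differencing comes from the design described above.

```python
from statistics import mean, variance

def paired_summary(rel_s, rel_c):
    """Mean, sample variance and N of the per-observation differences S - C."""
    if len(rel_s) != len(rel_c):
        raise ValueError("each observation needs both an S and a C value")
    diffs = [s - c for s, c in zip(rel_s, rel_c)]
    return mean(diffs), variance(diffs), len(diffs)

# Hypothetical u values for one cell (one fold, one length interval, N or R).
u_rel_s = [0.52, 0.47, 0.61, 0.55, 0.49]
u_rel_c = [0.40, 0.45, 0.50, 0.48, 0.41]

m, v, n = paired_summary(u_rel_s, u_rel_c)
var_of_mean = v / n   # variance of the mean difference for this cell
df = n - 1            # df attached to that variance
print(m, v, n, var_of_mean, df)
```

The same function would be applied to the e values, so each cell yields one mean, one variance and one df.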
But even though I'm going to re-do the entire analysis (including the Re/Ru/Reu regressions) with this new design, I'm curious what the correct procedural answer is to question VII.
2. Regarding the role of folds in the design.
"In ANOVA terms, you have been treating Fold as a fixed effect: the particular folds in the study were not chosen randomly, and the conclusions apply to those folds and no others. Do you want to switch now and treat Fold as a random effect, so that you can generalize to folds not in the study? If so, then all the previous analyses must be done differently."
Yes, we have to be able to generalize to folds not in the study. Can you explain what has to change, both with respect to our current design and to the new design I proposed in my post of 10/8 @ 9:40 pm? In this regard, please note that we must still keep and evaluate results PER FOLD for the simple reason that structural alignability analysis via logistic regressions (our Stage II goal) must be done per fold.
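Purely to illustrate what the switch could mean, and not as the answer to the question above: one common simplification for a random Fold effect is to reduce each fold to a single effect estimate and test those estimates against zero, so that fold-to-fold variability becomes the error term. The sketch below assumes that approach; the function name and the six per-fold values are made up.

```python
import math
from statistics import mean, stdev

def across_fold_t(per_fold_effects):
    """One-sample t on per-fold effect estimates (folds as the random units)."""
    k = len(per_fold_effects)
    se = stdev(per_fold_effects) / math.sqrt(k)  # between-fold standard error
    return mean(per_fold_effects) / se, k - 1

# Hypothetical per-fold effects, e.g. the numerator from (III) for each fold.
effects = [0.12, 0.08, 0.15, 0.05, 0.11, 0.09]
t, df = across_fold_t(effects)
print(f"t = {t:.3f} on {df} df")
```

The per-fold estimates are still computed and kept, so results PER FOLD remain available; only the across-fold test changes, and with six folds it has 5 df.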