(The second column is the first less 0.25, where 0.25 results from the null assumption of a 1:1:1:1 proportion for w for any 00,01,10,11 4-tuple.)
The result according to John's calculator is EXACTLY the same.
I am sure you will readily understand why this is the case mathematically/statistically, but from a scientific point of view, I am very pleased to tell you that:
1) it makes the "w" factor completely understandable and justifiable; 2) it removes the seeming "dependency" of the results in one cell, like 00, on a proportion involving cells 00,01,10,11, which made you very uncomfortable.
The new measure w' = w - 0.25 is simply the departure of the cell from the norm in a particular pair of directions.
a) if w' is positive for 00, it means that more than the expected number of adjusted absolute residuals are above the median for both the x1 and the x2 driver linear regressions;
b) if w' is positive for 11, it means that more than the expected number of adjusted absolute residuals are below the median for both regressions;
c) if w' is positive for 01, it means that more than the expected number of adjusted absolute residuals are above the median for the x1 driver and below for the x2 driver;
d) if w' is positive for 10, it means that more than the expected number of adjusted absolute residuals are below the median for the x1 driver and above for the x2 driver.
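To make the arithmetic concrete, here is a minimal sketch of w' for the four cells. It assumes that w for a cell is simply that cell's share of the counts across its 00,01,10,11 4-tuple; the function name w_prime and the counts are hypothetical and are not taken from our data.

```python
# Sketch only: assumes w for a cell is that cell's share of the 4-tuple total.
CELLS = ("00", "01", "10", "11")
NULL_W = 0.25  # expected share per cell under the 1:1:1:1 null assumption


def w_prime(counts):
    """Return w' = w - 0.25 for each cell, given raw counts per cell."""
    total = sum(counts[cell] for cell in CELLS)
    if total == 0:
        raise ValueError("4-tuple has no observations")
    return {cell: counts[cell] / total - NULL_W for cell in CELLS}


# Hypothetical counts, for illustration only.
example = {"00": 40, "01": 20, "10": 15, "11": 25}
for cell, value in w_prime(example).items():
    direction = "above" if value > 0 else "at or below"
    print(f"cell {cell}: w' = {value:+.3f} ({direction} the null expectation)")
```

Because the four w values in a 4-tuple sum to 1, the four w' values always sum to 0, so a positive w' in one cell has to be balanced by negative values elsewhere in the same 4-tuple.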
Each of these situations can be interpreted scientifically as representing a pattern resulting from a particular kind of "mutational drift" away from an expected norm, and it is entirely reasonable to suppose that the degree of alignability between two subsequences would depend in part on whether they have instantiated the same type of pattern.
So, I don't think we have to look any longer for the "real" predictor underlying the "surrogate" predictor w. I am more than content, in fact, delighted, to interpret w as a surrogate for w' = w - 0.25.
I know you're dealing with a lot of other matters right now (e.g., the sampling strategy and the quasi-Nchoose2 form of the input values).
But I felt it important to communicate this result to you, because I think it's really important ...
Also, maybe you would have a moment to tell me whether it would be even better to get the mean w for each cell for each length across all six folds and subtract this mean, rather than the 0.25 resulting from the null assumption.
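In case it helps to see what I mean by the alternative, here is a rough sketch of subtracting the across-fold mean instead of 0.25. The layout of the input (a dictionary keyed by fold, length, and cell) and the name center_on_fold_mean are my own assumptions for illustration, and the numbers in the usage lines are made up.

```python
from collections import defaultdict
from statistics import mean


def center_on_fold_mean(w_values):
    """Subtract from each w the mean w of its (length, cell) group over all folds.

    w_values maps (fold, length, cell) -> w. This keying is hypothetical.
    """
    groups = defaultdict(list)
    for (fold, length, cell), w in w_values.items():
        groups[(length, cell)].append(w)
    group_means = {key: mean(values) for key, values in groups.items()}
    return {
        (fold, length, cell): w - group_means[(length, cell)]
        for (fold, length, cell), w in w_values.items()
    }


# Made-up values for two folds, one length, one cell.
toy = {(1, 30, "00"): 0.30, (2, 30, "00"): 0.20}
print(center_on_fold_mean(toy))
```

With this centering, the adjusted values for each cell and length average to zero across the folds, whereas w - 0.25 measures departure from the theoretical null.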