
Re: Some preliminary numbers and three preliminary simple questions ...
Posted: Aug 18, 2012 1:22 AM


On Aug 17, 6:07 am, djh <halitsky.d@att.net> wrote:
> Thanks as always for your willingness to continue this discussion.
>
> Your summary is remarkably accurate; the only clarification I would
> add is to modify this statement:
>
> "Then you create a 12 x 12 matrix, say M12, by comparing the
> hyperboloids from the column 1 regressions to those from the column 2
> regressions; and a similar matrix, say M13, by comparing the
> hyperboloids from the column 1 regressions to those from the column 3
> regressions."
>
> to this:
>
> "Then you create a 12 x 12 matrix, say M12, by comparing pairs of
> hyperboloids from the column 1 regressions to pairs from the column 2
> regressions; and a similar matrix, say M13, by comparing pairs of
> hyperboloids from the column 1 regressions to pairs from the column 3
> regressions."
>
> Turning now to the "data frames", I'm sorry I used this term unclearly
> (I thought I had used it before in our discussions.)
>
> My previous statement was in error. There are actually 240 nonrandom
> data frames (not 180), and 240 random data frames (not 180.)
>
> In particular, since we're working with 6 folds f1,...,f6, we have 15
> unique pairs of folds:
>
> f1f2
> f1f3
> f1f4
> f1f5
> f1f6
>
> f2f3
> f2f4
> f2f5
> f2f6
>
> f3f4
> f3f5
> f3f6
>
> f4f5
> f4f6
>
> f5f6
>
> and from these we can form the following 20 pairs of pairs of folds:
>
> f1f2 f1f3
> f1f2 f1f4
> f1f2 f1f5
> f1f2 f1f6
> f1f3 f1f4
> f1f3 f1f5
> f1f3 f1f6
> f1f4 f1f5
> f1f4 f1f6
> f1f5 f1f6
>
> f2f3 f2f4
> f2f3 f2f5
> f2f3 f2f6
> f2f4 f2f5
> f2f4 f2f6
> f2f5 f2f6
>
> f3f4 f3f5
> f3f4 f3f6
> f3f5 f3f6
>
> f4f5 f4f6
>
> So if we adopt your notion of a 12 x 3 table of regressions, each of
> the above 20 pairs of pairs will yield one of these tables, e.g.
> f1f2,f1f3 will yield T123, etc.
>
> But we can generate the 20 tables Tijk for data derived by choosing:
>
> 1) one of 6 nonrandom dicodon sets
> 2) restricting u to uL or uH
>
> and so we will have 20*6*2 = 240 tables Tijk computed from data
> obtained using nonrandom dicodon sets.
>
> And similarly, we can also generate the 20 tables Rijk for data
> derived by choosing:
>
> 3) one of 6 random dicodon sets
> 4) restricting u to uL or uH
>
> and so we will again have 20*6*2 = 240 tables Rijk computed from
> data obtained using random dicodon sets.
>
> Each of the 240 tables Tijk will yield a pair of matrices Mij,Mik for
> which the scalars Vij and Vik can be computed using your scalar
> function V.
>
> And each of the 240 tables Rijk will yield a pair of matrices rMij,
> rMik for which the scalars rVij and rVik can be computed using your
> scalar function V.

In the analysis of a triple {i,j,k}, j & k are interchangeable with one another but not with i, which plays a different role. Is there a reason that different folds get to be 'i' different numbers of times?
Instead of the 20 triples

{1,2,3},{1,2,4},{1,2,5},{1,2,6},{1,3,4},{1,3,5},{1,3,6},{1,4,5},{1,4,6},{1,5,6},
{2,3,4},{2,3,5},{2,3,6},{2,4,5},{2,4,6},{2,5,6},
{3,4,5},{3,4,6},{3,5,6},
{4,5,6}

do you really want the 60 triples

{1,2,3},{1,2,4},{1,2,5},{1,2,6},{1,3,4},{1,3,5},{1,3,6},{1,4,5},{1,4,6},{1,5,6},
{2,1,3},{2,1,4},{2,1,5},{2,1,6},{2,3,4},{2,3,5},{2,3,6},{2,4,5},{2,4,6},{2,5,6},
{3,1,2},{3,1,4},{3,1,5},{3,1,6},{3,2,4},{3,2,5},{3,2,6},{3,4,5},{3,4,6},{3,5,6},
{4,1,2},{4,1,3},{4,1,5},{4,1,6},{4,2,3},{4,2,5},{4,2,6},{4,3,5},{4,3,6},{4,5,6},
{5,1,2},{5,1,3},{5,1,4},{5,1,6},{5,2,3},{5,2,4},{5,2,6},{5,3,4},{5,3,6},{5,4,6},
{6,1,2},{6,1,3},{6,1,4},{6,1,5},{6,2,3},{6,2,4},{6,2,5},{6,3,4},{6,3,5},{6,4,5}

in which each fold gets to be 'i' the same number of times?
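
The imbalance can be checked by enumeration. The following is only a minimal Python sketch: the variable names are made up for illustration, and it assumes that in the 20-triple scheme the first (smallest) fold of each triple is the one playing the role of 'i'.

```python
from collections import Counter
from itertools import combinations

folds = range(1, 7)

# The 20 unordered triples; the first (smallest) fold is taken to be 'i'.
unordered = list(combinations(folds, 3))

# The 60 triples: every fold takes a turn as 'i', paired with each
# unordered pair {j, k} drawn from the other five folds.
role_based = [
    (i, j, k)
    for i in folds
    for j, k in combinations([f for f in folds if f != i], 2)
]

# How often each fold appears in the 'i' position under each scheme.
count_i_unordered = Counter(t[0] for t in unordered)
count_i_role = Counter(t[0] for t in role_based)

print(len(unordered), dict(count_i_unordered))
print(len(role_based), dict(count_i_role))
```

Under that assumption the 20-triple scheme uses fold 1 as 'i' ten times, fold 2 six times, fold 3 three times, fold 4 once, and folds 5 and 6 never, whereas the 60-triple scheme uses every fold as 'i' exactly ten times.
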

>
> So we will actually have:
>
> 5) a set S of 480 values of the function V derived from data obtained
> using nonrandom dicodon sets, and a distribution D of these values of
> V.
>
> 6) a set Sr of 480 values of the function V derived from data obtained
> using random dicodon sets, and a distribution Dr of these values of V.
>
> And therefore, my first question (Q1) is whether there is some
> legitimate way to determine whether D differs significantly from
> Dr.

No, at least not with your current resources. The problem is that the 480 (or whatever) V values share data, both within and between pairs. You can easily estimate the difference between the means of the V and Vr distributions, but the complex dependence structure of the data means that there is no easy estimate of the standard error of the difference.
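
To give a rough feel for how much sharing there is, here is a small Python sketch. It rests on an assumption that is mine, not something established above: that two tables built for the same dicodon set and the same u restriction share data whenever their fold triples have at least one fold in common. The names are illustrative only.

```python
from itertools import combinations

# The 20 fold triples, treated as sets of folds.
triples = [frozenset(t) for t in combinations(range(1, 7), 3)]

# Every pair of distinct triples.
pairs = list(combinations(triples, 2))

# Pairs of triples with at least one fold in common.
sharing = sum(1 for a, b in pairs if a & b)

print(len(pairs), sharing)
```

With 6 folds, two triples are disjoint only when one is the complement of the other, so 180 of the 190 pairs of triples overlap in at least one fold. That is before counting the dependence between the two V values computed from the same table.
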

>
> Furthermore, suppose for the sake of discussion that the answer to
> this question is "yes", and that in fact, there is a significant
> difference between D and Dr.
>
> Then my second question (Q2) would be whether this entitles us to
> treat the values of V in the set S as characterizing some "real"
> properties of the data associated with at least SOME nonrandom
> dicodon sets (namely our 6 nonrandom dicodon sets.)
>
> But of course, if your answer to Q1 is "no", then my second question
> would be whether the nature of the distributions D and Dr will tell
> you what you need to know in order to figure how to bootstrap in order
> to decide whether the distribution of values of V differs in data
> obtained using nonrandom dicodon sets versus data obtained using
> random dicodon sets.

