Re: Some preliminary numbers and three preliminary simple questions ...
Posted: Aug 18, 2012 1:22 AM
On Aug 17, 6:07 am, djh <halitsky.d@att.net> wrote:
> Thanks as always for your willingness to continue this discussion.
>
> Your summary is remarkably accurate - the only clarification I would
> add is to modify this statement:
>
> "Then you create a 12 x 12 matrix, say M12, by comparing the
> hyperboloids from the column 1 regressions to those from the column 2
> regressions; and a similar matrix, say M13, by comparing the
> hyperboloids from the column 1 regressions to those from the column 3
> regressions."
>
> to this:
>
> "Then you create a 12 x 12 matrix, say M12, by comparing pairs of
> hyperboloids from the column 1 regressions to pairs from the column 2
> regressions; and a similar matrix, say M13, by comparing pairs of
> hyperboloids from the column 1 regressions to pairs from the column 3
> regressions."
>
> Turning now to the "data frames", I'm sorry I used this term unclearly
> (I thought I had used it before in our discussions.)
>
> My previous statement was in error. There are actually 240 non-random
> data frames (not 180), and 240 random data frames (not 180.)
>
> In particular, since we're working with 6 folds f1,...,f6, we have 15
> unique pairs of folds:
>
> f1f2
> f1f3
> f1f4
> f1f5
> f1f6
>
> f2f3
> f2f4
> f2f5
> f2f6
>
> f3f4
> f3f5
> f3f6
>
> f4f5
> f4f6
>
> f5f6
>
> and from these we can form the following 20 pairs of pairs of folds:
>
> f1f2 f1f3
> f1f2 f1f4
> f1f2 f1f5
> f1f2 f1f6
> f1f3 f1f4
> f1f3 f1f5
> f1f3 f1f6
> f1f4 f1f5
> f1f4 f1f6
> f1f5 f1f6
>
> f2f3 f2f4
> f2f3 f2f5
> f2f3 f2f6
> f2f4 f2f5
> f2f4 f2f6
> f2f5 f2f6
>
> f3f4 f3f5
> f3f4 f3f6
> f3f5 f3f6
>
> f4f5 f4f6
>
> So if we adopt your notion of a 12 x 3 table of regressions, each of
> the above 20 pairs of pairs will yield one of these tables, e.g.
> f1f2,f1f3 will yield T123, etc.
>
> But we can generate the 20 tables Tijk for data derived by choosing:
>
> 1) one of 6 non-random dicodon sets
> 2) restricting u to uL or uH
>
> and so we will have 20*6*2 = 240 tables Tijk computed from data
> obtained using non-random dicodon sets.
>
> And similarly, we can also generate the 20 tables Rijk for data
> derived by choosing:
>
> 3) one of 6 random dicodon sets
> 4) restricting u to uL or uH
>
> and so we will again have 20*6*2 = 240 tables Rijk computed from
> data obtained using random dicodon sets.
>
> Each of the 240 tables Tijk will yield a pair of matrices Mij,Mik for
> which the scalars Vij and Vik can be computed using your scalar
> function V.
>
> And each of the 240 tables Rijk will yield a pair of matrices rMij,
> rMik for which the scalars rVij and rVik can be computed using your
> scalar function V.
In the analysis of a triple {i,j,k}, j & k are interchangeable with one another but not with i, which plays a different role. Is there a reason that different folds get to be 'i' different numbers of times?
Instead of the 20 triples
{1,2,3},{1,2,4},{1,2,5},{1,2,6},{1,3,4},{1,3,5},{1,3,6},{1,4,5},{1,4,6},{1,5,6},
{2,3,4},{2,3,5},{2,3,6},{2,4,5},{2,4,6},{2,5,6},
{3,4,5},{3,4,6},{3,5,6},
{4,5,6}
do you really want the 60 triples
{1,2,3},{1,2,4},{1,2,5},{1,2,6},{1,3,4},{1,3,5},{1,3,6},{1,4,5},{1,4,6},{1,5,6},
{2,1,3},{2,1,4},{2,1,5},{2,1,6},{2,3,4},{2,3,5},{2,3,6},{2,4,5},{2,4,6},{2,5,6},
{3,1,2},{3,1,4},{3,1,5},{3,1,6},{3,2,4},{3,2,5},{3,2,6},{3,4,5},{3,4,6},{3,5,6},
{4,1,2},{4,1,3},{4,1,5},{4,1,6},{4,2,3},{4,2,5},{4,2,6},{4,3,5},{4,3,6},{4,5,6},
{5,1,2},{5,1,3},{5,1,4},{5,1,6},{5,2,3},{5,2,4},{5,2,6},{5,3,4},{5,3,6},{5,4,6},
{6,1,2},{6,1,3},{6,1,4},{6,1,5},{6,2,3},{6,2,4},{6,2,5},{6,3,4},{6,3,5},{6,4,5}
in which each fold gets to be 'i' the same number of times?
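For what it's worth, here is a small sketch that enumerates both schemes and counts how often each fold plays the 'i' role. The variable and function names (`current`, `balanced`, `times_as_i`) are mine, chosen for illustration only, and I am assuming that in your 20 tables the lowest-numbered fold is always the 'i'.

```python
from itertools import combinations

folds = range(1, 7)  # the 6 folds f1,...,f6

# Scheme with 20 triples: each unordered triple appears once,
# and the smallest index is taken to be 'i' (an assumption).
current = [(i, j, k) for i, j, k in combinations(folds, 3)]

# Scheme with 60 triples: every fold is 'i' once for each
# unordered pair {j, k} drawn from the other five folds.
balanced = [
    (i, j, k)
    for i in folds
    for j, k in combinations([f for f in folds if f != i], 2)
]

def times_as_i(triples):
    """Count how many triples have each fold in the 'i' position."""
    return {f: sum(1 for t in triples if t[0] == f) for f in folds}

print(len(current), times_as_i(current))
# 20 {1: 10, 2: 6, 3: 3, 4: 1, 5: 0, 6: 0}
print(len(balanced), times_as_i(balanced))
# 60 {1: 10, 2: 10, 3: 10, 4: 10, 5: 10, 6: 10}
```

So under the 20-triple scheme folds 5 and 6 never get to be 'i', while fold 1 does so 10 times.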
>
> So we will actually have:
>
> 5) a set S of 480 values of the function V derived from data obtained
> using non-random dicodon sets, and a distribution D of these values of
> V.
>
> 6) a set Sr of 480 values of the function V derived from data obtained
> using random dicodon sets, and a distribution Dr of these values of V.
>
> And therefore, my first question (Q1) is whether there is some
> legitimate way to determine whether D differs significantly from
> Dr.
No, at least not with your current resources. The problem is that the 480 (or whatever) V-values share data, both within and between pairs. You can easily estimate the difference between the means of the V and Vr distributions, but the complex dependence structure of the data means that there is no easy estimate of the standard error of the difference.
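To give a rough feel for how much sharing there is, the sketch below counts how many pairs of your 20 fold triples have at least one fold in common. It looks only at fold overlap within a single dicodon set and a single u-range; treating that as the relevant notion of "shared data" is my assumption, and the sketch says nothing about the further dependence between Vij and Vik inside one table.

```python
from itertools import combinations

# The 20 unordered fold triples {i, j, k} drawn from 6 folds.
triples = list(combinations(range(1, 7), 3))

# Every pair of distinct triples, and those pairs sharing at least one fold.
pairs = list(combinations(triples, 2))
overlapping = [(a, b) for a, b in pairs if set(a) & set(b)]

print(len(pairs), len(overlapping))
# 190 180
```

Only the 10 complementary pairs (e.g. {1,2,3} and {4,5,6}) share no fold, so nearly every pair of tables is built partly from the same data.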
>
> Furthermore, suppose for the sake of discussion that the answer to
> this question is "yes", and that in fact, there is a significant
> difference between D and Dr.
>
> Then my second question (Q2) would be whether this entitles us to
> treat the values of V in the set S as characterizing some "real"
> properties of the data associated with at least SOME non-random
> dicodon sets (namely our 6 non-random dicodon sets.)
>
> But of course, if your answer to Q1 is "no", then my second question
> would be whether the nature of the distributions D and Dr will tell
> you what you need to know in order to figure how to bootstrap in order
> to decide whether the distribution of values of V differs in data
> obtained using non-random dicodon sets versus data obtained using
> random dicodon sets.