On Aug 7, 11:44 pm, Ray Koopman <koop...@sfu.ca> wrote:
> On Aug 3, 7:28 am, djh <firstname.lastname@example.org> wrote:
>
>> This question is about whether or not a certain design will yield
>> a certain result that can legitimately be said to be sample-size
>> independent.
>>
>> Necessary background for the question is provided in (I-II) below.
>>
>> I. Given the two simple linear regressions:
>>
>> ln(c/L) on Ln(c/e)
>> ln(c/L) on Ln(c/u)
>>
>> and the multiple linear regression
>>
>> ln(c/L) on (ln(c/e),ln(c/u)) (two independent variables)
>>
>> suppose first that we run all three regressions on all twelve
>> length intervals for the a1 fold using the dicodon set S63 with
>> u restricted to uL.
>>
>> Next, suppose we use your custom two-stage heteroscedastic test
>> using:
>>
>> the 12 slopes from the ln(c/L) on ln(c/e) runs
>> the 12 slopes for the ln(c/e) variable from the ln(c/L) on
>> (ln(c/e),ln(c/u)) runs
>>
>> where the "x" N's and "y" N's for this test are BOTH:
>>
>> 52, 72, 76, 67, 71, 67, 74, 103, 84, 72, 56, 82
>>
>> (Note that the N's are the same for BOTH the "x" and "y" sides of
>> the input to the test because BOTH the ln(c/L) on ln(c/e) runs AND
>> the ln(c/L) on (ln(c/e,ln(c/u)) runs were done for the same 12
>> length intervals for same fold (a1) using the same dicodon set
>> (S63) with the same restriction on u (u = uL).)
>>
>> II. Now suppose we again run the two same two simple linear
>> regressions:
>>
>> ln(c/L) on Ln(c/e)
>> ln(c/L) on Ln(c/u)
>>
>> and the same multiple linear regression
>>
>> ln(c/L) on (ln(c/e), ln(c/u)) (two independent variables)
>>
>> on all twelve length intervals for the a1 fold using the dicodon
>> set S63, BUT WITH u RESTRICTED to uH INSTEAD OF uL.
>>
>> Further, suppose we AGAIN run the SAME custom two-stage
>> heteroscedastic test using:
>>
>> the 12 slopes from the ln(c/L) on ln(c/e) runs
>> the 12 slopes for the ln(c/e) variable from the ln(c/L) on
>> (ln(c/e),ln(c/u))runs
>>
>> where the "x" N's and "y" N's in THIS case are
>>
>> 177, 243, 235, 156, 185, 179, 151, 175, 162, 139, 138, 152
>>
>> Question:
>>
>> Even though the N's in test II are much greater than the N's in
>> test I, isn't it nonetheless perfectly legitimate to compare
>> the the p's resulting from tests I and II because WITHIN EACH OF
>> THESE TWO TESTS, the 12 sample sizes were THE SAME for BOTH of the
>> regressions ln(c/L) on ln(c/e) and ln(c/L) on (ln(c/e,ln(c/u))?
>>
>> It seems to me that given the basic ideas informing the basic
>> notion of "sampling", the answer to this question has to be "yes".
>>
>> But again, I know that's just my naivete and ignorance talking,
>> and that's why I'm asking for a ruling from you here, plus of
>> course, an explanation for dummies of why the answer is "no"
>> (if the answer is "no".)
>
> The family is still here, so I don't have time the construct a nice
> answer, but you seem to be charging ahead, so here's a short answer.
> The heteroscedastic t I described requires the comparands to be
> independent, but the two coefficients you want to compare -- the
> slope in the regression of ln(c/L) on ln(c/e), and the slope with
> respect to ln(c/e) in the multiple regression of ln(c/L) on ln(c/e)
> and ln(c/u) -- are not independent.
>
> I also have other reservations about the comparison, but I'll save
> those for later.
The family has left, so I should have some time now.
The comparison you want to make can (probably) be done, but not with the two-stage heteroscedastic t-test that I described. A different procedure will be needed. Comparing one of the coefficients in a multiple regression to the corresponding simple regression coefficient calculated from the same data is not something that is usually done. There are technical problems. All the tests so far have used a "fixed regressor" model, in which the distribution of the predictors is taken to be given and arbitrary, as opposed to a random sample from some distribution, and all the "random error" -- the misfit to the model -- is assumed to be in the dependent variable. However, the relation you want to test is affected by the joint distribution of the predictors, so you will probably have to switch to a more realistic model, and this will complicate matters. Moreover, even if a proper test can be devised, it's still not obvious to me that the comparison makes sense. What do you think it will tell you?
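To make the dependence concrete, here is a minimal simulation sketch. It is not the procedure that would be needed for a proper test, and everything in it is an assumption made for illustration: the data are made up (x1 and x2 merely stand in for ln(c/e) and ln(c/u), y for ln(c/L)), and the function names, coefficients, and sample size are hypothetical. It shows two things: within any one sample the simple slope equals the multiple-regression slope plus a term that depends on how the two predictors covary, and across repeated samples the two slopes are strongly correlated rather than independent.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divisor n - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simple_slope(y, x):
    """Slope of the least-squares regression of y on x alone."""
    return cov(x, y) / cov(x, x)

def multiple_slopes(y, x1, x2):
    """Slopes for x1 and x2 in the least-squares regression of y on both."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def simulate(n, rng):
    """Made-up data: two correlated predictors and a noisy linear response."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [0.6 * a + rng.gauss(0, 1) for a in x1]
    y = [1.0 + 0.5 * a - 0.3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return y, x1, x2

rng = random.Random(0)

# One sample: the simple slope is the multiple slope b1 plus b2 times the
# slope of x2 on x1, so the gap is set by the predictors' joint distribution.
y, x1, x2 = simulate(72, rng)
b_simple = simple_slope(y, x1)
b1, b2 = multiple_slopes(y, x1, x2)
d = simple_slope(x2, x1)
print("simple slope:", b_simple)
print("b1 + b2 * d :", b1 + b2 * d)

# Many samples: correlation between the simple and the multiple slope.
simple_list, multiple_list = [], []
for _ in range(1000):
    y, x1, x2 = simulate(72, rng)
    simple_list.append(simple_slope(y, x1))
    multiple_list.append(multiple_slopes(y, x1, x2)[0])
slope_corr = cov(simple_list, multiple_list) / math.sqrt(
    cov(simple_list, simple_list) * cov(multiple_list, multiple_list))
print("correlation of the two slopes across samples:", slope_corr)
```

A test that treats the two slopes as independent ignores that correlation, which is why the heteroscedastic t does not apply here.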
A more fundamental problem relates to the length intervals. I assume that you are using intervals, instead of doing everything at each actual length, because the N at each length is too small to support analysis. However, that shouldn't stop you from asking what you would get if you had enough data to analyze each length separately. In the regressions mentioned above, L would become a constant; the intercept would be an obvious direct function of L, but the slope would involve L only indirectly, to the extent that the relations among c,e,u change as a function of L. I'm not sure how this should be approached, but I suspect that both the current and previous regressions are not quite right. My guess is that you would be better off to hit the Pause button on all the regression stuff, get some software that lets you do good 3D plotting, and look at the joint c,e,u distributions as a (time) function of L; i.e., a "3D movie" of sorts.
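The point about L becoming a constant can be checked numerically. The sketch below uses made-up values throughout (the length 120, the sample size, and the way c and e are generated are all assumptions, and `fit_line` is a hypothetical helper). At a single fixed length, ln(c/L) is just ln(c) minus the constant ln(L), so regressing it on ln(c/e) gives the same slope as regressing ln(c) itself, and the intercept is shifted by exactly ln(L).

```python
import math
import random

def fit_line(y, x):
    """Least-squares intercept and slope for the regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(1)
L = 120.0   # one fixed length (hypothetical value)
n = 80      # hypothetical number of cases at that length

# Made-up log-scale data; only the algebra matters, not these numbers.
ln_c = [rng.gauss(3.0, 0.4) for _ in range(n)]
ln_e = [0.7 * v + rng.gauss(0.0, 0.3) for v in ln_c]
ln_c_over_e = [a - b for a, b in zip(ln_c, ln_e)]
ln_c_over_L = [a - math.log(L) for a in ln_c]

int_raw, slope_raw = fit_line(ln_c, ln_c_over_e)
int_L, slope_L = fit_line(ln_c_over_L, ln_c_over_e)

print("slope with ln(c) as response   :", slope_raw)
print("slope with ln(c/L) as response :", slope_L)
print("intercept shift                :", int_raw - int_L)
print("ln(L)                          :", math.log(L))
```

Any dependence of the slope on L would therefore have to come from the c,e,u relations themselves changing with length, which is what the plots are meant to reveal.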