On Aug 22, 5:26 am, djh <firstname.lastname@example.org> wrote:
> Thanks for giving me (offline) your permission to reopen this thread
> with a new reply to your post of 8/13@4:42.
>
> 1. Possible value of "simple regression vs multiple regression
> comparison"
>
> When you responded (in your post of 8/13@4:42) to my proposal
> involving comparison of a simple and a multiple regression, you wrote:
>
> "The comparison you want to make can (probably) be done, but not with
> the two-stage heteroscedastic t-test that I described. A different
> procedure will be needed. [. . .] Moreover, even if a proper test can
> be devised, it's still not obvious to me that the comparison makes
> sense. What do you think it will tell you?"
>
> To answer your question about what we'll learn, I'm assuming that by
> comparing:
>
> ln(c/L) on ln(c/e) with ln(c/L) on (ln(c/e),ln(c/u))
> ln(c/L) on ln(c/u) with ln(c/L) on (ln(c/e),ln(c/u))
>
> we MAY learn something about when dipeptide concentration is:
>
> i) more a function of energy concentration
> ii) more a function of dicodon concentration
> iii) more a function of both energy concentration AND dicodon
> concentration.
Here is something I posted some time ago in another forum. I think most of it applies to your problem.
There are several properties of regression that are directly relevant to questions of the relative "importance" or "impact" of predictors but are widely misunderstood:
1. First and foremost, all the variables that truly matter must be present in the regression equation. If any important variables are omitted then the results can be misleading unless all the omitted variables are uncorrelated with all the included variables. There is no point in attempting to discover the relative importance of some predictors for which you have data unless you already know that these are the only predictors that matter.
2. R^2 can be partitioned into components representing the unique contribution of each predictor only when all the predictors are mutually uncorrelated. The problem is not partitioning R^2 -- that can always be done. The problem is that the results do not always represent the unique contribution of each predictor. However intuitively straightforward the notion of unique contributions may seem, there is no mathematical definition that is entirely satisfactory when the predictors are correlated.
3. Importance ratings obtained by comparing semipartial correlations or changes in R^2 (i.e., squared semipartials) depend on the joint distribution of the predictors. Contrary to what is often implicit in the importance question, the results are not inherent properties of the variables alone, but joint properties of the variables and the particular multivariate distribution they happen to have. This is especially important when the distribution of the predictors is an artifact -- either a direct artifact, because the investigator set the values of the predictors; or an indirect artifact, because the investigator selected cases or sampled nonrandomly. And even if the sample distribution is a valid estimate of some "natural" population distribution, if the population distribution changes then the true semipartials can also change, even though the mechanism relating the predictors to the outcome variable has not changed.
4. If the predictors are in the same units (possibly after data-independent unit-equating transformations, which excludes sample-specific standardization), then comparing the raw-score regression weights can lead to conclusions of relative importance that are inherent properties of the variables alone. However, the definition of importance that is implicit in comparisons of the regression weights is the expected change in the outcome variable for a unit change in the predictor in question, with all other predictors held constant. In some situations this definition may be appropriate; in others, not.
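A small simulation may make points 1, 3 and 4 concrete. The sketch below is an illustration only, not part of the analysis discussed in this thread: the predictors x1 and x2, the true weights of 0.5, the noise level, and the helper names simulate, ols and summarize are all assumptions chosen for the example. It keeps the mechanism relating the predictors to the outcome fixed and changes only the joint distribution of the predictors (their correlation rho and the standard deviation sd2 of x2).

```python
import numpy as np

TRUE_B1, TRUE_B2 = 0.5, 0.5  # assumed mechanism: y = 1 + 0.5*x1 + 0.5*x2 + noise


def simulate(rho, sd2, n=20000, seed=0):
    """Draw (x1, x2) with correlation rho, sd(x1) = 1, sd(x2) = sd2, then y."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho * sd2],
                    [rho * sd2, sd2 ** 2]])
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = 1.0 + TRUE_B1 * X[:, 0] + TRUE_B2 * X[:, 1] + rng.normal(0.0, 1.0, n)
    return X, y


def ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, 1.0 - resid.var() / y.var()


def summarize(rho, sd2):
    X, y = simulate(rho, sd2)
    coef_full, r2_full = ols(X, y)
    coef_x1, r2_x1 = ols(X[:, 0], y)   # simple regression: x2 omitted
    _, r2_x2 = ols(X[:, 1], y)         # simple regression: x1 omitted
    return {
        "b1": coef_full[1],            # raw-score weights, both predictors in
        "b2": coef_full[2],
        "b1_simple": coef_x1[1],       # slope of x1 when x2 is left out
        "r2_full": r2_full,
        "sr2_x1": r2_full - r2_x2,     # squared semipartial of x1
        "sr2_x2": r2_full - r2_x1,     # squared semipartial of x2
    }


if __name__ == "__main__":
    scenarios = {
        "uncorrelated, equal spread": (0.0, 1.0),
        "correlated, equal spread": (0.8, 1.0),
        "uncorrelated, x2 spread tripled": (0.0, 3.0),
    }
    for label, (rho, sd2) in scenarios.items():
        s = summarize(rho, sd2)
        print(label)
        for key, value in s.items():
            print(f"  {key:10s} {value: .3f}")
```

In all three scenarios the raw-score weights from the two-predictor regression should stay close to 0.5, because the mechanism has not changed. The squared semipartials, by contrast, shift with rho and sd2; when the predictors are correlated they no longer sum to R^2; and the simple-regression slope for x1 absorbs part of the omitted x2 effect. This is why a difference between a simple and a multiple regression can reflect the distribution of the predictors rather than anything about the mechanism.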
So where does this leave us? It may often mean that questions about importance cannot be answered -- at least not in the sense that they were asked. Unfortunately, regression seems to have been "sold" to many as a way to answer all such questions. It can't.
>
> So, would you have the time to devise the "different procedure" you
> mention?
>
> If so, I would like to perform it AS I collect the data for the
> "closeness analysis" we're discussing in the other thread. That way,
> I won't have to revisit the same file more than once.
>
> 2. Regressions when L held constant.
>
> In your post of 8/13@4:42, you also wrote:
>
> "A more fundamental problem relates to the length intervals. I assume
> that you are using intervals, instead of doing everything at each
> actual length, because the N at each length is too small to support
> analysis. However, that shouldn't stop you from asking what you would
> get if you had enough data to analyze each length separately. In the
> regressions mentioned above, L would become a constant; the intercept
> would be an obvious direct function of L, but the slope would involve
> L only indirectly, to the extent that the relations among c,e,u change
> as a function of L. I'm not sure how this should be approached, but I
> suspect that both the current and previous regressions are not quite
> right."
>
> I agree that when L is held constant, it may be more appropriate to
> investigate regressions like:
>
> a) ln(c) on ln(e)
> b) ln(c) on ln(u)
> c) ln(c) on (ln(e),ln(u))
>
> And if the data from:
>
> d) ln(c/L) on ln(c/e)
> e) ln(c/L) on ln(c/u)
> f) ln(c/L) on (ln(c/e),ln(c/u))
>
> are empirically "interesting", then I will investigate (a-c) as well
> as (d-f). But again, if comparing (d) with (f) and (e) with (f)
> suggest nothing of empirical interest, then there's no sense looking
> at (a-c).