On Nov 21, 8:13 pm, djh <halitsk...@att.net> wrote:
> In a different thread, Ray Koopman explained that if one
> suspects these regressions to be dependent on the IV "u":
>
> c on u
> c on e
> c on (e,u)
>
> then under the usual initial assumption that the dependence
> is linear, these three regressions should be modified to:
>
> c on (u, u^2) instead of c on u
> c on (e, u, u*e) instead of c on e
> c on (e, u, u*e, u^2) instead of c on (e,u)
In this post I want to talk about only your first case, where the d.v. is a quadratic function of the i.v.: y = a0 + a1*x + a2*x^2. (I use the usual generic variable names x and y so as not to get caught up in any peculiarities of your particular variables.)
First, be aware that this is called "linear regression" by some people, and "nonlinear regression" by others. Both are right: y is linear in the coefficients a0,a1,a2 but quadratic in x. In my experience, statisticians will usually say "linear" (presumably because estimating the coefficients is simple when y is linear in the coefficients but complicated otherwise), but users/consumers of the results will usually say "nonlinear" (presumably because they are looking at, or at least thinking of, a plot of the regression function).
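To make the "linear in the coefficients" point concrete, here is a minimal sketch in Python with NumPy. The data are made up and the function name fit_quadratic is just for illustration. The quadratic is fitted by ordinary least squares on the columns 1, x, x^2, exactly as a multiple linear regression with two predictors would be.

```python
import numpy as np

def fit_quadratic(x, y):
    """Ordinary least squares fit of y = a0 + a1*x + a2*x^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # The design matrix is linear in the unknowns a0, a1, a2,
    # even though its third column is x squared.
    X = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # array [a0, a1, a2]

# Made-up, noise-free data so the fit recovers the generating coefficients.
x = np.linspace(-2.0, 3.0, 21)
y = 1.0 + 0.5 * x - 0.25 * x**2
a0, a1, a2 = fit_quadratic(x, y)
print(a0, a1, a2)
```

No iterative search is needed: the same solver that handles a straight-line fit handles this one.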
I suggested the quadratic function as a replacement for the piecewise linear function:
    | b0 + b1*x,  x <= x'
y = |                      , for some pre-specified value x'.
    | c0 + c1*x,  x >  x'
Note that that function is not realistic, in the sense that it allows a discontinuity at x'. A more realistic function would be
    | d0 + d1*(x-x'),  x <= x'
y = |                           .
    | d0 + d2*(x-x'),  x >  x'
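The continuous two-piece line is also linear in its coefficients d0, d1, d2, so it too can be fitted by ordinary least squares. The following is only a sketch under assumptions: the name fit_hinge, the knot value, and the data are invented for illustration. It uses min(x-x', 0) and max(x-x', 0) as the two predictors.

```python
import numpy as np

def fit_hinge(x, y, x_knot):
    """Least squares fit of the continuous two-piece line
         y = d0 + d1*(x - x_knot)  for x <= x_knot
         y = d0 + d2*(x - x_knot)  for x >  x_knot
    where x_knot plays the role of the pre-specified x'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = x - x_knot
    # min(t, 0) is nonzero only left of the knot, max(t, 0) only right of it,
    # so the two pieces share the value d0 at the knot.
    X = np.column_stack([np.ones_like(t),
                         np.minimum(t, 0.0),
                         np.maximum(t, 0.0)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # array [d0, d1, d2]

# Made-up, noise-free data with knot at 4, d0 = 2, d1 = 1, d2 = -0.5.
x = np.linspace(0.0, 10.0, 41)
t = x - 4.0
y = 2.0 + 1.0 * np.minimum(t, 0.0) - 0.5 * np.maximum(t, 0.0)
d0, d1, d2 = fit_hinge(x, y, 4.0)
slope_change = d2 - d1  # change in slope across the knot
print(d0, d1, d2, slope_change)
```

The fit needs data on both sides of x'; otherwise one of the two slope columns is all zeros and that slope is not determined.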
But that function, too, is unrealistic because its slope is discontinuous at x' whenever d1 and d2 differ. The quadratic has neither of those problems but is usually criticized because its slope is not bounded, so choosing between the two reduces to deciding which is less unrealistic, with neither being completely satisfactory. (There are, of course, other possible functions, but they are all more complicated.)
The general question is whether the slope of the regression is different for low and high x-values, and (more importantly?) how that difference varies as a function of other factors. That is, you have many regressions to do, and the results must be compared to one another.
Your current analyses look at the slope differences, c1-b1. (The intercepts and their differences are generally meaningless when the slopes differ.)
a2 in the quadratic model carries the same general meaning as c1-b1. To see this, think of a linear function y = A0 + A1*x in which both the intercept (A0) and the slope (A1) are themselves linear functions of x: A0 = a00 + a01*x, A1 = a10 + a11*x. Clearly, a11 specifies how the slope A1 changes as a function of x. Substituting for A0 and A1 gives the quadratic model, with a2 = a11:
y = (     A0      ) + (     A1      )*x
  = (a00 + a01*x) + (a10 + a11*x)*x
  = a00 + (a01 + a10)*x + a11*x^2
  = a0  + (    a1   )*x + a2 *x^2.
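The substitution can be checked numerically. In this small Python sketch the four values a00, a01, a10, a11 are arbitrary numbers chosen for illustration, not estimates from any data.

```python
import numpy as np

# Arbitrary illustrative values.
a00, a01, a10, a11 = 1.5, -0.7, 0.4, 0.25

def varying_coef_form(x):
    """y = A0 + A1*x with intercept and slope both linear in x."""
    A0 = a00 + a01 * x
    A1 = a10 + a11 * x
    return A0 + A1 * x

# Coefficients of the equivalent quadratic, read off the expansion.
a0, a1, a2 = a00, a01 + a10, a11

def quadratic_form(x):
    """y = a0 + a1*x + a2*x^2."""
    return a0 + a1 * x + a2 * x**2

x = np.linspace(-5.0, 5.0, 11)
same = np.allclose(varying_coef_form(x), quadratic_form(x))
print(same)
```

Only the sum a01 + a10 enters the quadratic, so data on x and y alone cannot separate a01 from a10, whereas a11 is recovered directly as a2.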
As with the piecewise-linear model, the intercept (a0) in the quadratic model is generally meaningless.
Now for what to many people seems surprising and counterintuitive, even just plain wrong: in the quadratic model, a1 is generally meaningless. This is because a1 is only the slope of the function at x = 0, whereas many people think of a1 as some sort of conceptual, if not literal, "average" slope. If a single number is wanted that represents some sort of average slope then it can be computed, but first that "average" must be defined.
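A quick way to see that a1 is tied to the point x = 0 is to shift x and refit: a2 stays put but a1 changes. The Python sketch below uses made-up data, and the uniform-weight "average slope" in it is only one possible definition, assumed here for illustration.

```python
import numpy as np

def quad_coefs(x, y):
    """Least squares coefficients [a0, a1, a2] of y = a0 + a1*x + a2*x^2."""
    X = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def average_slope(a1, a2, lo, hi):
    """Mean of dy/dx = a1 + 2*a2*x over lo <= x <= hi, weighting all x
    in the interval equally (an assumed definition of "average")."""
    return a1 + a2 * (lo + hi)

# Made-up, noise-free data on an x-range that is far from zero.
x = np.linspace(10.0, 20.0, 51)
y = 3.0 + 2.0 * x - 0.1 * x**2

a0, a1, a2 = quad_coefs(x, y)             # a1 = slope at x = 0
b0, b1, b2 = quad_coefs(x - x.mean(), y)  # b1 = slope at the mean of x
avg = average_slope(a1, a2, x.min(), x.max())
print(a1, b1, avg)
```

With these data a1 is about 2 while b1 and avg are both about -1: the curve is falling over the whole observed range, even though its slope extrapolated back to x = 0 is positive. Under a different weighting of the x-values the average slope would in general come out differently.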