Date: Nov 26, 2012 3:20 AM
Author: Ray Koopman
Subject: Re: Interpretation of coefficients in multiple regressions which model linear dependence on an IV

On Nov 21, 8:13 pm, djh <halitsk...@att.net> wrote:
> In a different thread, Ray Koopman explained that if one
> suspects these regressions to be dependent on the IV 'u':
>
> c on u
> c on e
> c on (e,u)
>
> then under the usual initial assumption that the dependence
> is linear, these three regressions should be modified to:
>
> c on (u, u^2) instead of c on u
> c on (e, u, u*e) instead of c on e
> c on (e, u, u*e, u^2) instead of c on (e,u)


In this post I want to talk about only your first case, where the
d.v. is a quadratic function of the i.v.: y = a0 + a1*x + a2*x^2.
(I use the usual generic variable names x and y so as not to get
caught up in any peculiarities of your particular variables.)

First, be aware that this is called "linear regression" by some
people, and "nonlinear regression" by others. Both are right:
y is linear in the coefficients a0,a1,a2 but quadratic in x. In
my experience, statisticians will usually say "linear" (presumably
because estimating the coefficients is simple when y is linear in
the coefficients but complicated otherwise), but users/consumers
of the results will usually say "nonlinear" (presumably because they
are looking at, or at least thinking of, a plot of the regression
function).
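
Because y is linear in the coefficients, ordinary least squares
estimates a0, a1, a2 directly. Here is a minimal sketch in Python
(the simulated data and all names are illustrative, not from the
thread; numpy assumed):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 100)                           # hypothetical i.v.
y = 1.0 + 0.5*x + 0.3*x**2 + rng.normal(0, 0.2, 100)  # simulated d.v.

# Design matrix with columns 1, x, x^2: the model is linear in
# (a0, a1, a2), so ordinary least squares applies.
X = np.column_stack([np.ones_like(x), x, x**2])
a0, a1, a2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(a0, a1, a2)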

I suggested the quadratic function as a replacement for the
piecewise linear function:

    | b0 + b1*x, x <= x'
y = |                     , for some pre-specified value x'.
    | c0 + c1*x, x >  x'

Note that that function is not realistic, in the sense that it
allows a discontinuity at x'. A more realistic function would be

    | d0 + d1*(x-x'), x <= x'
y = |                          .
    | d0 + d2*(x-x'), x >  x'

But that function, too, is unrealistic because its gradient is
discontinuous at x' (unless d1 = d2). The quadratic has neither of
those problems but is
usually criticized because its slope is not bounded, so choosing
between the two reduces to deciding which is less unrealistic, with
neither being completely satisfactory. (There are, of course, other
possible functions, but they are all more complicated.)
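
Incidentally, the continuous piecewise function above is also linear
in its coefficients once x' is fixed, so it too can be fit by ordinary
least squares on two "hinge" basis columns. A sketch, assuming x' is
pre-specified (the function name and setup are mine, not from the
thread):

import numpy as np

def hinge_fit(x, y, xp):
    # Fits y = d0 + d1*(x-xp) for x <= xp,
    #      y = d0 + d2*(x-xp) for x >  xp, by OLS.
    left  = np.minimum(x - xp, 0)   # equals x-xp below the knot, else 0
    right = np.maximum(x - xp, 0)   # equals x-xp above the knot, else 0
    X = np.column_stack([np.ones_like(x), left, right])
    d0, d1, d2 = np.linalg.lstsq(X, y, rcond=None)[0]
    return d0, d1, d2

Here d2 - d1 is the change in slope at x', and the fit is continuous
there by construction (both pieces equal d0 at x = x').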

The general question is whether the slope of the regression is
different for low and high x-values, and (more importantly?) how
that difference varies as a function of other factors. That is,
you have many regressions to do, and the results must be compared
to one another.

Your current analyses look at the slope differences, c1-b1.
(The intercepts and their differences are generally meaningless
when the slopes differ.)
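
In code, that difference comes from two separate straight-line fits,
one on each side of x'. A sketch (x' assumed given; the names are
mine):

import numpy as np

def slope_diff(x, y, xp):
    # Fit y = b0 + b1*x for x <= x' and y = c0 + c1*x for x > x',
    # then return the slope difference c1 - b1.
    lo, hi = x <= xp, x > xp
    b1, b0 = np.polyfit(x[lo], y[lo], 1)   # highest power first
    c1, c0 = np.polyfit(x[hi], y[hi], 1)
    return c1 - b1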

a2 in the quadratic model carries the same general meaning as c1-b1.
To see this, think of a linear function y = A0 + A1*x in which both
the intercept (A0) and the slope (A1) are themselves linear functions
of x: A0 = a00 + a01*x, A1 = a10 + a11*x. Clearly, a11 specifies how
the slope A1 changes as a function of x. Substituting for A0 and A1
gives the quadratic model, with a2 = a11:

y = (     A0     ) + (     A1     )*x

  = (a00 + a01*x) + (a10 + a11*x)*x

  = a00 + (a01 + a10)*x + a11*x^2

  =  a0 + (   a1    )*x +  a2*x^2.
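
The substitution can be checked symbolically if you like; a quick
sketch (sympy assumed):

import sympy as sp

x, a00, a01, a10, a11 = sp.symbols('x a00 a01 a10 a11')
A0 = a00 + a01*x            # intercept, itself linear in x
A1 = a10 + a11*x            # slope, itself linear in x
y = sp.expand(A0 + A1*x)
print(sp.collect(y, x))     # -> a00 + x*(a01 + a10) + a11*x**2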

As with the piecewise-linear model, the intercept (a0) in the
quadratic model is generally meaningless.

Now for what to many people seems surprising and counterintuitive,
even just plain wrong: in the quadratic model, a1 is generally
meaningless. This is because a1 is only the slope of the function at
x = 0, whereas many people think of a1 as some sort of conceptual,
if not literal, "average" slope. If a single number is wanted that
represents some sort of average slope then it can be computed, but
first that "average" must be defined.
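
For instance, two natural definitions give different numbers, both
computable from the fitted coefficients. A sketch (these particular
definitions are my illustration, not a recommendation from the
thread):

import numpy as np

def average_slopes(a1, a2, x):
    # For y = a0 + a1*x + a2*x^2, the derivative is a1 + 2*a2*x,
    # so a1 alone is the slope only at x = 0.
    mean_deriv = a1 + 2*a2*np.mean(x)   # mean slope over observed x
    lo, hi = np.min(x), np.max(x)
    chord = a1 + a2*(lo + hi)           # (y(hi)-y(lo))/(hi-lo)
    return mean_deriv, chord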