Date: Nov 26, 2012 3:20 AM
Author: Ray Koopman
Subject: Re: Interpretation of coefficients in multiple regressions which model linear dependence on an IV

On Nov 21, 8:13 pm, djh <halitsk...@att.net> wrote:

> In a different thread, Ray Koopman explained that if one

> suspects these regressions to be dependent on the IV "u":

>

> c on u

> c on e

> c on (e,u)

>

> then under the usual initial assumption that the dependence

> is linear, these three regressions should be modified to:

>

> c on (u, u^2) instead of c on u

> c on (e, u, u*e) instead of c on e

> c on (e, u, u*e, u^2) instead of c on (e,u)

In this post I want to talk about only your first case, where the

d.v. is a quadratic function of the i.v.: y = a0 + a1*x + a2*x^2.

(I use the usual generic variable names x and y so as not to get

caught up in any peculiarities of your particular variables.)

First, be aware that this is called "linear regression" by some

people, and "nonlinear regression" by others. Both are right:

y is linear in the coefficients a0,a1,a2 but quadratic in x. In

my experience, statisticians will usually say "linear" (presumably

because estimating the coefficients is simple when y is linear in

the coefficients but complicated otherwise), but users/consumers

of the results will usually say "nonlinear" (presumably because they

are looking at, or at least thinking of, a plot of the regression

function).
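
To make "linear in the coefficients" concrete, here is a minimal sketch of fitting the quadratic by ordinary least squares. The data are made up and the helper names (solve_linear, fit_quadratic) are hypothetical; the point is only that, once x^2 is treated as a second predictor, the coefficients come from solving a set of linear equations.

```python
# Sketch: fit y = a0 + a1*x + a2*x^2 by ordinary least squares.
# Data and helper names are illustrative assumptions, not from the thread.

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_quadratic(xs, ys):
    """Return (a0, a1, a2) from the normal equations X'X a = X'y."""
    rows = [[1.0, x, x * x] for x in xs]  # predictor columns: 1, x, x^2
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return tuple(solve_linear(XtX, Xty))

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 - 1.0 * x + 0.5 * x * x for x in xs]  # exact quadratic, no noise
a0, a1, a2 = fit_quadratic(xs, ys)
print(round(a0, 6), round(a1, 6), round(a2, 6))
```

With real data one would normally use a statistics package rather than hand-rolled normal equations; the sketch only shows that no iterative (nonlinear) estimation is needed.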

I suggested the quadratic function as a replacement for the

piecewise linear function:

    | b0 + b1*x,  x <= x'
y = |                       for some pre-specified value x'.
    | c0 + c1*x,  x >  x'

Note that that function is not realistic, in the sense that it

allows a discontinuity at x'. A more realistic function would be

    | d0 + d1*(x-x'),  x <= x'
y = |
    | d0 + d2*(x-x'),  x >  x'

But that function, too, is unrealistic because its slope jumps at x'

whenever d1 differs from d2. The quadratic has neither of those problems but is

usually criticized because its slope is not bounded, so choosing

between the two reduces to deciding which is less unrealistic, with

neither being completely satisfactory. (There are, of course, other

possible functions, but they are all more complicated.)
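
The three candidate functions can be compared numerically. The following is only a sketch with made-up coefficients; x0 stands for the breakpoint x', and the function names (two_lines, broken_stick, quadratic, slope) are hypothetical. It checks the jump in value at x', the jump in slope at x', and the quadratic's slope far from the origin.

```python
# Sketch with made-up coefficients; x0 plays the role of x'.

def two_lines(x, x0, b0, b1, c0, c1):
    """Unconstrained piecewise-linear model: a separate line on each side."""
    return b0 + b1 * x if x <= x0 else c0 + c1 * x

def broken_stick(x, x0, d0, d1, d2):
    """Continuous version: both pieces pass through the point (x0, d0)."""
    return d0 + (d1 if x <= x0 else d2) * (x - x0)

def quadratic(x, a0, a1, a2):
    return a0 + a1 * x + a2 * x * x

def slope(f, x, h=1e-6):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0, eps = 2.0, 1e-9
f1 = lambda x: two_lines(x, x0, b0=1.0, b1=0.5, c0=0.0, c1=1.5)
f2 = lambda x: broken_stick(x, x0, d0=2.0, d1=0.5, d2=1.5)
f3 = lambda x: quadratic(x, a0=1.0, a1=0.5, a2=0.25)

jump1 = f1(x0 + eps) - f1(x0 - eps)  # value jumps at x' for two_lines
jump2 = f2(x0 + eps) - f2(x0 - eps)  # no jump in value for broken_stick
slope_left = slope(f2, x0 - 0.1)     # broken_stick slope below x' (d1)
slope_right = slope(f2, x0 + 0.1)    # broken_stick slope above x' (d2)
far_slope = slope(f3, 1000.0)        # quadratic slope keeps growing with x

print(jump1, jump2, slope_left, slope_right, far_slope)
```

With these particular numbers the first function jumps in value at x', the second is continuous but changes slope abruptly there, and the quadratic is smooth everywhere but has a slope that grows without limit as x increases.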

The general question is whether the slope of the regression is

different for low and high x-values, and (more importantly?) how

that difference varies as a function of other factors. That is,

you have many regressions to do, and the results must be compared

to one another.

Your current analyses look at the slope differences, c1-b1.

(The intercepts and their differences are generally meaningless

when the slopes differ.)

The coefficient a2 in the quadratic model carries the same general meaning as c1-b1.

To see this, think of a linear function y = A0 + A1*x in which both

the intercept (A0) and the slope (A1) are themselves linear functions

of x: A0 = a00 + a01*x, A1 = a10 + a11*x. Clearly, a11 specifies how

the slope A1 changes as a function of x. Substituting for A0 and A1

gives the quadratic model, with a2 = a11:

y = (    A0     ) + (    A1     )*x
  = (a00 + a01*x) + (a10 + a11*x)*x
  = a00 + (a01 + a10)*x + a11*x^2
  = a0  + (   a1    )*x + a2 *x^2.
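
The substitution can be checked numerically. This is just a sanity check with arbitrary illustrative values for a00, a01, a10, a11; the function names are hypothetical.

```python
# Arbitrary illustrative values; any numbers would do.
a00, a01, a10, a11 = 1.5, -0.7, 2.0, 0.3

def y_varying(x):
    """Linear function whose intercept and slope both drift with x."""
    A0 = a00 + a01 * x
    A1 = a10 + a11 * x
    return A0 + A1 * x

# Coefficients of the equivalent quadratic.
a0, a1, a2 = a00, a01 + a10, a11

def y_quadratic(x):
    return a0 + a1 * x + a2 * x * x

# Largest disagreement between the two forms over a few test points.
test_points = [-3.0, -1.0, 0.0, 0.5, 2.0, 10.0]
max_gap = max(abs(y_varying(x) - y_quadratic(x)) for x in test_points)
print(max_gap)
```

Only the sum a01 + a10 enters the quadratic, so a01 and a10 cannot be recovered separately from a fitted a1.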

As with the piecewise-linear model, the intercept (a0) in the

quadratic model is generally meaningless.

Now for what to many people seems surprising and counterintuitive,

even just plain wrong: in the quadratic model, a1 is generally

meaningless. This is because a1 is only the slope of the function at

x = 0, whereas many people think of a1 as some sort of conceptual,

if not literal, "average" slope. If a single number is wanted that

represents some sort of average slope then it can be computed, but

first that "average" must be defined.
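
As an illustration only, here are two possible definitions of an "average" slope. Both are assumptions made for this sketch, not definitions taken from the thread: the mean of the derivative a1 + 2*a2*x over an interval [lo, hi], and the mean of the derivative over the observed x-values. The coefficients and data are made up, and the function names are hypothetical.

```python
from statistics import mean

def slope_at(x, a1, a2):
    """Derivative of a0 + a1*x + a2*x^2 with respect to x."""
    return a1 + 2.0 * a2 * x

def avg_slope_interval(lo, hi, a1, a2):
    """Mean of the derivative over [lo, hi]; equals a1 + a2*(lo + hi)."""
    return a1 + a2 * (lo + hi)

def avg_slope_sample(xs, a1, a2):
    """Mean of the derivative over the observed x-values."""
    return a1 + 2.0 * a2 * mean(xs)

# Made-up coefficients and x-values, chosen so that x = 0 lies far
# outside the observed range.
a0, a1, a2 = 2.0, -1.0, 0.5
xs = [10.0, 11.0, 12.0, 14.0, 18.0]

slope_zero = slope_at(0.0, a1, a2)                  # this is just a1
by_interval = avg_slope_interval(min(xs), max(xs), a1, a2)
by_sample = avg_slope_sample(xs, a1, a2)

print(slope_zero, by_interval, by_sample)
```

In this example a1 is negative even though the slope is positive everywhere in the observed range, and the two "average" definitions give different answers, which is why the definition has to be chosen before a single number is reported.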