Topic: R^2 for linearized regression
Replies: 3   Last Post: Jan 31, 2013 5:18 PM

 Richard Ulrich Posts: 2,959 Registered: 12/13/04
Re: R^2 for linearized regression
Posted: Jan 31, 2013 1:08 PM

On Wed, 30 Jan 2013 23:27:30 -0800 (PST), Darek <darek12345@gmail.com>
wrote:

>Hi all!
>
>I would like to ask about R^2 in linearized regression, where the Y
>value is transformed, e.g.:
>http://en.wikipedia.org/wiki/Nonlinear_regression#Linearization
>If we fit a power function (Y=a*X^b) in Excel or SPSS, the R^2
>(sum of squares, etc.) is calculated using the linearized function,
>i.e.: ln(Y)=ln(a)+b*ln(X)

Yes, if you "linearize" an equation, you do get a different
sum of squares, etc. But SPSS offers nonlinear regression,
which does not require or use linearization (and I imagine
that various add-in packages for Excel provide it, in various
ways).
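A minimal pure-Python sketch of that difference, assuming the power model Y = a*X^b and strictly positive data: fit_power_linearized does ordinary least squares on ln(Y) against ln(X), while fit_power_nls minimizes the raw-unit sum of squares directly with a simple Gauss-Newton loop started from the linearized estimates. All function names and the step-halving safeguard are illustrative assumptions; this is not what SPSS or any Excel add-in runs internally.

```python
import math


def ols(us, vs):
    """Least-squares line v = c0 + c1*u; returns (c0, c1)."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    c1 = suv / suu
    return mv - c1 * mu, c1


def fit_power_linearized(xs, ys):
    """Fit Y = a*X**b via ln(Y) = ln(a) + b*ln(X); needs X > 0 and Y > 0."""
    ln_a, b = ols([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b


def raw_sse(xs, ys, a, b):
    """Residual sum of squares measured in the original (raw) units of Y."""
    return sum((y - a * x ** b) ** 2 for x, y in zip(xs, ys))


def fit_power_nls(xs, ys, iterations=50):
    """Minimize raw_sse directly: Gauss-Newton with step halving."""
    a, b = fit_power_linearized(xs, ys)  # linearized fit as starting values
    sse = raw_sse(xs, ys, a, b)
    for _ in range(iterations):
        # Jacobian of a*x**b with respect to a and to b
        ja = [x ** b for x in xs]
        jb = [a * x ** b * math.log(x) for x in xs]
        res = [y - a * x ** b for x, y in zip(xs, ys)]
        # Solve the 2x2 normal equations (J^T J) delta = J^T res
        s_aa = sum(u * u for u in ja)
        s_ab = sum(u * v for u, v in zip(ja, jb))
        s_bb = sum(v * v for v in jb)
        g_a = sum(u * e for u, e in zip(ja, res))
        g_b = sum(v * e for v, e in zip(jb, res))
        det = s_aa * s_bb - s_ab ** 2
        if det == 0:
            break
        da = (s_bb * g_a - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        step = 1.0
        while step > 1e-10:
            # Accept the step only if the raw-unit SSE does not increase
            new_a, new_b = a + step * da, b + step * db
            new_sse = raw_sse(xs, ys, new_a, new_b)
            if new_sse <= sse:
                a, b, sse = new_a, new_b, new_sse
                break
            step /= 2
        else:
            break  # no acceptable step: treat as converged
    return a, b
```

On noisy data the two fits usually land on different values of a and b, because each one minimizes a sum of squares in its own metric. By construction, fit_power_nls never finishes with a larger raw_sse than its linearized starting point.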

>
>I think that comparing R^2 values for the same dataset across various
>regression functions (e.g. between a linear and a power function),
>where Y is transformed, is not a proper method for selecting the best
>regression model.

Right.
What regression minimizes is the sum of squares of residuals, and
you cannot merely compare those sums when they are measured in
different metrics: raw versus log versus [whatever].

What metric is used for those residuals? What metric do you *want*
for them? The "best model" is the one that gives the "smallest
residuals" in whatever is the "most sensible" metric.
Differences measured in log units will not match differences
measured in raw units.
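To see how much the metric matters, the hypothetical helper compare_metrics below (a sketch, not any package's output) scores one linearized power fit, Y = a*X^b, twice: once against ln(Y), which is the R^2 a linearized fit reports, and once by back-transforming its predictions and scoring them against raw Y. It also fits a straight line to raw Y for comparison.

```python
import math


def ols(us, vs):
    """Least-squares line v = c0 + c1*u; returns (c0, c1)."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    c1 = suv / suu
    return mv - c1 * mu, c1


def r_squared(observed, predicted):
    """1 - SS_residual / SS_total, both measured in the metric of `observed`."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def compare_metrics(xs, ys):
    """R^2 of a line on raw Y, and of the linearized power fit in both metrics."""
    c0, c1 = ols(xs, ys)
    r2_line_raw = r_squared(ys, [c0 + c1 * x for x in xs])
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    ln_a, b = ols(log_x, log_y)
    log_pred = [ln_a + b * u for u in log_x]
    r2_power_log = r_squared(log_y, log_pred)  # what the linearized fit reports
    # Same fitted model, predictions back-transformed and scored in raw units
    r2_power_raw = r_squared(ys, [math.exp(p) for p in log_pred])
    return r2_line_raw, r2_power_log, r2_power_raw
```

For the toy data xs = [1, e, e^2] and ys = [1, e, e^3] (chosen only so the logs are round numbers), the power fit scores 27/28, about 0.964, in the log metric but about 0.952 against raw Y, while the straight line scores about 0.967 on raw Y. Only the last two numbers are measured in the same metric.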

- Now, Tukey describes using power transformations in a precise
form that incorporates constants (from the derivatives of the
transformations) so that the SS (sum of squares) remains
approximately constant. I think that was in his book on regression.
If I recall properly, the residual SS is better preserved than the
overall SS, so regressions in different metrics can be approximately
compared by the size of their residual SS, but not by their R^2.
I always considered that to be "of academic interest," because I was
happier looking at the scatterplots and judging which one has the
characteristic of "equal intervals" in the measurements, so that
errors across the range of the scale seem to have the same
"clinical" meaning. For my data, that was (almost) always the plot
in which the two variables were closer to Normal distributions.
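One standard way to build in those derivative constants is the Box-Cox power transformation rescaled by the geometric mean g of Y. Assuming this matches the form Tukey had in mind (not confirmed here), the residual SS of the rescaled variable stays in Y-like units, so it can be compared across powers. The function names below are hypothetical, and the choice of predictor is left to the caller (ln(X) for a power-type relation, for example).

```python
import math


def geometric_mean(ys):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(y) for y in ys) / len(ys))


def scaled_power(ys, lam):
    """Power transform divided by lam * g**(lam - 1).

    lam * g**(lam - 1) is the derivative of y**lam at y = g; dividing by it
    keeps the transformed values in Y-like units. lam = 0 is the limiting
    case g * ln(y).
    """
    g = geometric_mean(ys)
    if lam == 0:
        return [g * math.log(y) for y in ys]
    return [(y ** lam - 1) / (lam * g ** (lam - 1)) for y in ys]


def residual_ss(us, zs):
    """Residual sum of squares of the least-squares line of z on u."""
    n = len(us)
    mu, mz = sum(us) / n, sum(zs) / n
    slope = (sum((u - mu) * (z - mz) for u, z in zip(us, zs))
             / sum((u - mu) ** 2 for u in us))
    return sum((z - mz - slope * (u - mu)) ** 2 for u, z in zip(us, zs))


def compare_lambdas(us, ys, lams=(1, 0.5, 0, -0.5)):
    """Residual SS for each power; smaller means a straighter fit."""
    return {lam: residual_ss(us, scaled_power(ys, lam)) for lam in lams}
```

Under the usual normal-errors assumptions, picking the power with the smallest residual SS here is equivalent to maximizing the Box-Cox profile likelihood over the candidates tried. With lam = 1 the transform is just Y - 1, so its residual SS equals the raw-metric residual SS.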

>I think that in the case described above, if we want to compare
>various regression functions, R^2 should be calculated using the
>function Y=a*X^b, not the linearized function ln(Y)=ln(a)+b*ln(X).
>
>Could you give your opinion on this matter?

Which residuals do you like better? What is the natural metric
for the variable? If you naturally say, "This score is twice
the size of that one," then your language suggests that the
log metric is the natural one. That's true for a great many
biological variables, and for a lot of others.

My opinion is this: you can't take any scores out of context and
say that the raw values deserve to stay raw, or deserve to be
logged. It depends on which relation between the two variables
is expected to be linear, with homogeneous errors (equal
variance across the range).
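That criterion (a linear relation with equal error variance across the range) can be checked crudely by comparing residual spread in the lower and upper halves of X, in the spirit of the Goldfeld-Quandt test but without its F statistic. The sketch below is hypothetical: it takes the linearized power fit, Y = a*X^b, and reports the upper-to-lower ratio of root-mean-square residuals in log units and in raw units. A ratio near 1 suggests homogeneous errors in that metric.

```python
import math


def fit_line(us, vs):
    """Least-squares line v = c0 + c1*u; returns (c0, c1)."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    slope = (sum((u - mu) * (v - mv) for u, v in zip(us, vs))
             / sum((u - mu) ** 2 for u in us))
    return mv - slope * mu, slope


def rms(values):
    """Root-mean-square of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def split_half_ratio(xs, residuals):
    """RMS residual in the upper half of X divided by that in the lower half."""
    pairs = sorted(zip(xs, residuals))
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return rms(upper) / rms(lower)


def spread_by_metric(xs, ys):
    """Spread ratio of the linearized power fit's residuals, log vs raw units."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    c0, c1 = fit_line(log_x, log_y)
    log_pred = [c0 + c1 * u for u in log_x]
    log_res = [v - p for v, p in zip(log_y, log_pred)]
    raw_res = [y - math.exp(p) for y, p in zip(ys, log_pred)]
    return split_half_ratio(xs, log_res), split_half_ratio(xs, raw_res)
```

With multiplicative errors, e.g. ys = [2 * x**1.5 * f for x, f in zip(range(1, 9), [1.1, 0.9] * 4)], the log-unit ratio comes out close to 1 while the raw-unit ratio is above 2. For data like these, the log metric is the one with homogeneous errors.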

--
Rich Ulrich

Date Subject Author
1/31/13 David Jones
1/31/13 Richard Ulrich
1/31/13 Richard Ulrich