Math Forum » Discussions » sci.math.* » sci.stat.math.independent

Topic: R^2 for linearized regression
Replies: 3   Last Post: Jan 31, 2013 5:18 PM

Richard Ulrich

Re: R^2 for linearized regression
Posted: Jan 31, 2013 1:08 PM

On Wed, 30 Jan 2013 23:27:30 -0800 (PST), Darek <darek12345@gmail.com>
wrote:

>Hi all!
>
>I would like to ask about R^2 in linearized regression where Y value
>is transformed e.g.:
>http://en.wikipedia.org/wiki/Nonlinear_regression#Linearization
>If we apply power function (Y=a*b^X) for regression in Excel or SPSS
>the R^2 (sum of squares etc.) is calculated using linearized function
>i.e.: ln(Y)=a+ln(X)


I'm not sure about your starting point here.

Yes, if you "linearize" an equation, you do get a different
sum of squares, etc. But SPSS offers nonlinear regression,
which does not require or use linearization (and I imagine
that there are various packages for Excel, with various
provisions).
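
To make the difference concrete, here is a minimal Python sketch, not
anything SPSS or Excel actually runs. It takes the quoted model Y = a*b^X
at face value (an exponential curve, whose linearization is
ln(Y) = ln(a) + X*ln(b)) and fits it twice: once by regressing ln(Y) on X,
and once by minimizing the sum of squares in raw units. The data, the
helper names (ols, ss_raw, ss_log, best_a) and the crude grid search are
all assumptions made for illustration.

```python
import math

# Made-up illustrative data (an assumption, not from the thread):
# roughly y = 2 * 1.5**x with multiplicative noise.
x = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [1.10, 0.92, 1.05, 0.88, 1.15, 0.95, 1.08, 0.90]
y = [2.0 * 1.5 ** xi * ni for xi, ni in zip(x, noise)]

def ols(u, v):
    """Ordinary least squares of v on u; returns (intercept, slope)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxy = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    sxx = sum((ui - mu) ** 2 for ui in u)
    slope = sxy / sxx
    return mv - slope * mu, slope

def ss_raw(a, c):
    """Residual sum of squares in raw units for y = a * exp(c * x)."""
    return sum((yi - a * math.exp(c * xi)) ** 2 for xi, yi in zip(x, y))

def ss_log(a, c):
    """Residual sum of squares in log units for the same curve."""
    return sum((math.log(yi) - math.log(a) - c * xi) ** 2
               for xi, yi in zip(x, y))

# 1) Linearized fit: regress ln(y) on x, then back-transform the intercept.
intercept, c_lin = ols(x, [math.log(yi) for yi in y])
a_lin = math.exp(intercept)

# 2) Direct fit in raw units. For a fixed c the best a has a closed form,
#    so a crude grid search over c near the linearized estimate is enough here.
def best_a(c):
    e = [math.exp(c * xi) for xi in x]
    return sum(yi * ei for yi, ei in zip(y, e)) / sum(ei * ei for ei in e)

grid = [c_lin + k * 0.0005 for k in range(-1000, 1001)]
c_dir = min(grid, key=lambda c: ss_raw(best_a(c), c))
a_dir = best_a(c_dir)

print("linearized: a=%.4f b=%.4f" % (a_lin, math.exp(c_lin)))
print("direct:     a=%.4f b=%.4f" % (a_dir, math.exp(c_dir)))
print("raw SS: linearized %.3f, direct %.3f"
      % (ss_raw(a_lin, c_lin), ss_raw(a_dir, c_dir)))
print("log SS: linearized %.5f, direct %.5f"
      % (ss_log(a_lin, c_lin), ss_log(a_dir, c_dir)))
```

By construction, the direct fit has the smaller SS in raw units and the
linearized fit has the smaller SS in log units, so each fit "wins" in its
own metric.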


>
>I think that comparing R^2 for the same dataset across various
>regression functions (e.g. between linear and power function) where Y
>is transformed is not a proper method of selecting the best regression
>model.


Right.
What regression minimizes is the sum of squares of residuals, and
you cannot merely compare those sums when they are measured in
different metrics: raw versus log versus [whatever].

What metric is used for those residuals? What metric do you *want*
for those residuals? The "best model" is the one that provides
the "smallest residuals" in whatever is the "most sensible" metric.
Differences measured in log units will not match differences
measured in raw units.
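
A small Python sketch of that mismatch, under the assumption of made-up
data and hypothetical helper names (ols, r_squared): one fitted curve,
obtained by regressing ln(y) on x, is scored once with residuals in log
units and once with residuals in raw units, next to a straight line fitted
to raw y.

```python
import math

# Made-up illustrative data (an assumption, not from the thread).
x = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [1.10, 0.92, 1.05, 0.88, 1.15, 0.95, 1.08, 0.90]
y = [2.0 * 1.5 ** xi * ni for xi, ni in zip(x, noise)]
log_y = [math.log(yi) for yi in y]

def ols(u, v):
    """Ordinary least squares of v on u; returns (intercept, slope)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxy = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    sxx = sum((ui - mu) ** 2 for ui in u)
    slope = sxy / sxx
    return mv - slope * mu, slope

def r_squared(observed, predicted):
    """1 - SS_residual / SS_total, in whatever units `observed` is in."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# One fitted curve, from regressing ln(y) on x ...
b0, b1 = ols(x, log_y)
pred_log = [b0 + b1 * xi for xi in x]
pred_raw = [math.exp(p) for p in pred_log]

# ... scored in two different metrics.
r2_log = r_squared(log_y, pred_log)  # what the linearized fit reports
r2_raw = r_squared(y, pred_raw)      # same curve, residuals in raw units

# A straight line fitted to raw y, as the competing model in the question.
c0, c1 = ols(x, y)
r2_line = r_squared(y, [c0 + c1 * xi for xi in x])

print("curve, log metric: R^2 = %.4f" % r2_log)
print("curve, raw metric: R^2 = %.4f" % r2_raw)
print("line,  raw metric: R^2 = %.4f" % r2_line)
```

Only the last two numbers share a metric. Setting the first against the
third is the comparison the question objects to.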

- Now, Tukey describes using power transformations in a
precise form that incorporates constants (from the derivatives
of the transformations) so that the SS remains approximately
constant. I think that was in his book on regression.
If I recall properly, the residual SS is better preserved than
the overall SS, so that regressions in different metrics can be
approximately compared by the size of their residual SS, and
not by the R^2. I always considered that to be "of academic
interest" because I was happier looking at the scatterplots
and judging which one has the characteristic of "equal interval"
in the measurements, so that errors across the range of the
scale seem to have the same "clinical" meaning. For my data,
that was (almost) always the plot where the two variables
were closer to Normal distributions.
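
One well-known rescaling of this kind divides the power transformation by
a power of the geometric mean of Y, as in the normalized Box-Cox
transformation. Whether that is exactly the form in Tukey's book is an
assumption here, not something the post confirms. A minimal Python sketch,
with made-up data and hypothetical helper names (ols, residual_ss,
scaled_power):

```python
import math

# Made-up illustrative data (an assumption, not from the thread).
x = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [1.10, 0.92, 1.05, 0.88, 1.15, 0.95, 1.08, 0.90]
y = [2.0 * 1.5 ** xi * ni for xi, ni in zip(x, noise)]

def ols(u, v):
    """Ordinary least squares of v on u; returns (intercept, slope)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxy = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    sxx = sum((ui - mu) ** 2 for ui in u)
    slope = sxy / sxx
    return mv - slope * mu, slope

def residual_ss(u, v):
    """Residual sum of squares of the straight-line fit of v on u."""
    b0, b1 = ols(u, v)
    return sum((vi - b0 - b1 * ui) ** 2 for ui, vi in zip(u, v))

# Geometric mean of y: the constant that keeps the units comparable.
gm = math.exp(sum(math.log(yi) for yi in y) / len(y))

def scaled_power(yi, lam):
    """Power transform of yi, rescaled by gm (lam = 0 is the log case)."""
    if lam == 0:
        return gm * math.log(yi)
    return (yi ** lam - 1.0) / (lam * gm ** (lam - 1.0))

# Residual SS for raw (1.0), square-root (0.5) and log (0.0) metrics,
# now on a roughly common scale.
ss_by_lambda = {}
for lam in (1.0, 0.5, 0.0):
    z = [scaled_power(yi, lam) for yi in y]
    ss_by_lambda[lam] = residual_ss(x, z)
    print("lambda=%.1f  residual SS=%.3f" % (lam, ss_by_lambda[lam]))
```

With lambda = 1 the transformation is just Y - 1, so the residual SS is
that of the plain straight-line fit. The other rows can then be read
against it directly, which an unscaled log or square-root fit would not
allow.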


>I think that in the case described above if we would like to compare
>various functions of regression, R^2 should be calculated using
>function Y=a*b^X not function after linearization ln(Y)=a+ln(X).
>
>Could you give your opinion on this matter?


Which residuals do you like better? What is the natural metric
for the variable? If you naturally say, "This score is twice
the size of that one," then your language suggests that the
log metric is the natural one. That's true for a whole lot of
biological variables, and a lot of others.

My opinion is - You can't take any scores out of context and
say that the raw values deserve to be raw, or deserve to be
logged. It depends on which form of the relation between the
two variables is expected to be linear, with homogeneous
errors (equal variance across the range).
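
A rough numerical stand-in for eyeballing the scatterplot, sketched in
Python under stated assumptions (made-up data with multiplicative noise,
hypothetical helpers ols, rms and spread_ratio): compare the size of the
residuals in the upper half of the X range with the lower half, once in
log units and once in raw units. A ratio near 1 suggests roughly
homogeneous errors in that metric.

```python
import math

# Made-up illustrative data (an assumption, not from the thread),
# already sorted by x.
x = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [1.10, 0.92, 1.05, 0.88, 1.15, 0.95, 1.08, 0.90]
y = [2.0 * 1.5 ** xi * ni for xi, ni in zip(x, noise)]

def ols(u, v):
    """Ordinary least squares of v on u; returns (intercept, slope)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxy = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    sxx = sum((ui - mu) ** 2 for ui in u)
    slope = sxy / sxx
    return mv - slope * mu, slope

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def spread_ratio(residuals):
    """RMS residual in the upper half of the x range over the lower half."""
    half = len(residuals) // 2
    return rms(residuals[half:]) / rms(residuals[:half])

# Fit in the log metric, then express the same residuals both ways.
b0, b1 = ols(x, [math.log(yi) for yi in y])
res_log = [math.log(yi) - (b0 + b1 * xi) for xi, yi in zip(x, y)]
res_raw = [yi - math.exp(b0 + b1 * xi) for xi, yi in zip(x, y)]

print("spread ratio, log units: %.2f" % spread_ratio(res_log))
print("spread ratio, raw units: %.2f" % spread_ratio(res_raw))
```

For data like these, where the noise is multiplicative, the raw-unit
residuals fan out as X grows while the log-unit residuals stay about the
same size. This is a crude check with eight points, not a formal test of
equal variance.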


--
Rich Ulrich


