By the way, this post by David Jones is fine, and my post does not contradict anything in it. I offered a slightly different angle on the same advice.
In the final paragraph, where he says, "From a theoretical point of view," I don't have a better single word for "theoretical," but I would prefer some statement like, "From a theoretical point of view that focuses on the validity and robustness of the statistical tests ...". His next sentence fixes that tiny problem.
-- Rich Ulrich
On Thu, 31 Jan 2013 15:29:42 -0000, "David Jones" <email@example.com> wrote:
[snip, original post]
>======================================
>
>It is important to be clear about how the value of R^2 that you use is
>calculated when you use it. Just using values from individual fitting
>modules may well not be enough.
>
>See http://en.wikipedia.org/wiki/Coefficient_of_determination
>
>You should try calculating R^2 directly from the sets of observed and
>corresponding predicted values, where
>(i) "observed" is the original observations and "predicted" is either the
>predictions from linear regression or the exponential of the predictions
>from the regression model for the log-ed data (it is also possible to
>include a "bias adjusted" version of the latter)
>and
>(ii) "observed" is the log-ed original observations and "predicted" is
>either the predictions from linear regression on the log-ed data or the
>logarithm of the predictions from the regression model for the original
>data.
>
>This gives at least 4 values to compare. You can also try introducing an
>additional linear regression step, for example where in (i) you could fit a
>linear model for the observed data based on the exponentiated predictions
>from the linear model for the log-ed observations.
>
>If you have time you could construct a pair of scatter plots of observed
>versus predicted values in both original and transformed spaces.
>
>But there is no definite generally applicable answer to your question,
>except that you should definitely have a comparison of R^2 values calculated
>for the same transformation of the observed data. From a theoretical point
>of view, if the usual model-checks for regression models suggest that the
>transformed model is better, then you should be using the R^2 calculated for
>the log-ed data. But, if practical/real-world considerations suggest that
>the "importance" of errors of prediction is equal on the non-transformed
>scale, then R^2 calculated for the untransformed observations may be more
>closely aligned to what you are trying to use the predictions for.
>
>David Jones
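
For anyone who wants to try the four-way comparison from the quoted post, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions: the function names r_squared and compare_r_squared are made up for this example, a straight-line fit via np.polyfit stands in for whatever regression you actually ran, the data are synthetic, and the "bias adjusted" back-transform is left out.

```python
import numpy as np

def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot, computed directly from observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_r_squared(x, y):
    """Return the four R^2 values: two models, each scored on two scales.

    Assumes every y is strictly positive so that log(y) is defined.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    log_y = np.log(y)

    # Model A: straight line fitted to the original observations.
    pred_lin = np.polyval(np.polyfit(x, y, 1), x)
    # Model B: straight line fitted to the log-ed observations.
    pred_log = np.polyval(np.polyfit(x, log_y, 1), x)

    # Model A can predict non-positive values, whose logarithm is undefined;
    # in that case the log-scale R^2 for model A is reported as NaN.
    if np.all(pred_lin > 0):
        r2_log_scale_lin = r_squared(log_y, np.log(pred_lin))
    else:
        r2_log_scale_lin = float("nan")

    return {
        "original scale, linear model": r_squared(y, pred_lin),
        "original scale, exp(log model)": r_squared(y, np.exp(pred_log)),
        "log scale, log model": r_squared(log_y, pred_log),
        "log scale, log(linear model)": r2_log_scale_lin,
    }

if __name__ == "__main__":
    # Synthetic data for illustration only: exponential growth with noise.
    rng = np.random.default_rng(0)
    x = np.arange(1, 21, dtype=float)
    y = np.exp(0.2 * x + rng.normal(scale=0.1, size=x.size))
    for label, value in compare_r_squared(x, y).items():
        print(f"{label}: {value:.4f}")
```

Only values in the same row of the comparison (same transformation of the observed data) should be compared with each other, which is the point made in the final paragraph of the quoted post.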