Richard Ulrich wrote:
>
> On 17 Apr 2005 20:28:12 -0700, "Reef Fish"
> <Large_Nassau_Grouper@Yahoo.com> wrote:
> [snip, much...]
> >
> > The "expected sign" of a (multiple) regression coefficient is the one
> > single ERROR most often committed by social scientists and economists in
> > their interpretation of regression coefficients.
>
> I seem to differ greatly from you on the nature of this error....
> >
> > Over the years, I have not found a SINGLE CASE in which a justification
> > was given (nor hinted) on where the "expectation" of the expected sign
> > came from.
>
> Deaf to all explanations? no, making a rhetorical point.
"not given (nor hinted)" -- how could it be heard? :-) I am used to hearing rhetorics. Yours is not even a good rhetorical point.
> >
> > The ERROR was always when the user thinks of the sign of a multiple
> > regression coefficient as the sign of the SIMPLE correlation between
> > that X and Y, whereas the SIGN of the coefficient is the sign of the
> > PARTIAL correlation between that X and Y, GIVEN all the rest of the
> > independent variables in the regression!
>
> Yes, the sign of the simple correlation is important to
> pay attention to.
But it has no relevance to the sign of a MULTIPLE regression coefficient.
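For the two-predictor case this can be written out directly. A sketch in standard notation (the symbols are introduced here only for illustration): take standardized variables, let r_{y1} and r_{y2} be the simple correlations of Y with X1 and X2, and r_{12} the correlation between X1 and X2. Then the coefficient of X1 and the partial correlation of Y with X1 given X2 are

```latex
\[
\beta_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^2},
\qquad
r_{y1 \cdot 2} = \frac{r_{y1} - r_{y2}\, r_{12}}{\sqrt{(1 - r_{y2}^2)(1 - r_{12}^2)}} .
\]
```

The two share a numerator and both denominators are positive, so the coefficient always carries the sign of the PARTIAL correlation. The simple correlation alone does not settle that sign: it is negative whenever r_{y1} < r_{y2} r_{12}, even with r_{y1} > 0. With made-up values r_{y1} = 0.5, r_{y2} = 0.8, r_{12} = 0.8, the numerator is 0.5 - 0.64 = -0.14.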
> > [...]
> >
> > One of the latest national news items is the NEW SAT exam, consisting of
> > Verbal, Quantitative, and Essay. Let's say those are THREE of some
> > 10 independent variables used to predict (or fit) the GPR of admitted
>
> [snip, invented example of "opposite-sign" prediction.]
Not invented. Been through that process nearly 30 years ago!
>
> Here is a *real* example of opposite-sign prediction.
>
> It was either the SAT or another achievement test which figured
> out that they achieved more reliable estimation of Verbal by
> subtracting off some of the achievement on a Reading Speed
> sub-scale that was computed internally (not reported to users).
> Folks who read faster could get farther through the test, without
> knowing more, so it provided a *rational* correction.
What's your point? Were you using regression methods to build a PREDICTION model with the available data?
I was teaching a graduate course in Data Analysis in which each student chose his own REAL data sets to learn how to do multiple regression and model building throughout the course.
(That's why I've analyzed THOUSANDS of real data sets with regression methods from those graduate classes alone!)
One student was working for the Admissions Office of the University and used the real data used BY the university to build its prediction model(s). It was an extensive dataset with thousands of observations, with the usual SAT scores, a Math Achievement Score, some nominal and categorical variables in the students' applications, to FIT to the students' Grade Point Average at the end of their Freshman year, so that the "prediction models" were used in subsequent years to help decide whether to admit certain students, based on the predicted performance at the end of the Freshman year.
That turned out to be a GOLDEN set of data to use for pedagogical purposes to demonstrate the "expected sign" fallacy as well as showing how the SIGN of the SAT math variable in the predictive models can be POSITIVE or NEGATIVE, statistically insignificantly negative, OR statistically significantly negative, ALL depending on what OTHER variables are in the predictive models. The variable that would make the sign of the SAT Math variable NEGATIVE in predicting GPR was the PRESENCE of the Math Achievement Score in the same model, in combination with certain other variables.
Thus, one could in fact manipulate the SIGN of the SAT Math score coefficient at will, to be positive OR negative, even though there is no question that the SIMPLE correlation between that variable and GPR is consistently and significantly POSITIVE.
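A minimal sketch of the same phenomenon on SYNTHETIC data (not the Admissions Office data, which is not reproduced here). The variable names and the generating coefficients 0.8, 0.6, 1.0, -0.3 and 0.5 are assumptions, chosen only so that SAT Math and Math Achievement are strongly correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic, roughly standardized scores (all numbers are illustrative assumptions).
sat_math = rng.standard_normal(n)
math_ach = 0.8 * sat_math + 0.6 * rng.standard_normal(n)  # correlation with sat_math near 0.8
gpa = 1.0 * math_ach - 0.3 * sat_math + 0.5 * rng.standard_normal(n)


def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


simple_r = np.corrcoef(sat_math, gpa)[0, 1]    # simple correlation with GPA
b_alone = ols(gpa, sat_math)[1]                # SAT Math as the only predictor
b_with_ach = ols(gpa, sat_math, math_ach)[1]   # SAT Math with Math Achievement present

print(f"simple correlation, SAT Math vs GPA:      {simple_r:+.3f}")
print(f"SAT Math coefficient, SAT Math alone:     {b_alone:+.3f}")
print(f"SAT Math coefficient, with Math Achieve.: {b_with_ach:+.3f}")
```

Under these assumed numbers the simple correlation and the one-predictor slope are positive, while the SAT Math coefficient turns negative as soon as Math Achievement enters the model.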
One important fact to remember is that this is a PREDICTIVE model.
Granted it would be difficult to explain to the unwashed why the sign of the SAT score is NEGATIVE!
"Does that mean the lower my SAT Math score, the better chance I have to be admitted?" would be the obvious question by the student or the parents.
YES, if you knew the PREDICTION model and use it to cheat! :-) Sort of like using insider information in stock trading. Remember Martha? :) THEN, indeed a student could deliberately score low on the SAT Math and enhance his/her chances of admission IF HE KNEW THAT MODEL was used.
The PREDICTIVE model is neither meant to be a CAUSAL model nor a CONTROL model. To use it as such would just be another common abuse of regression models.
Over the years, those predictive models (developed by MY graduate students in the course, not the models actually used by the Admissions Office) with a NEGATIVE sign for SAT Math consistently stood all tests of cross-validation, subsampling, and the rest of the data-analytic techniques used to see if a developed model is "stable" and holds for future predictions.
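What such a stability check can look like, sketched on synthetic data (an assumed generating model, not the students' models and not the Admissions Office data): split the sample into random halves many times, refit on each half, count how often the SAT Math sign is negative in BOTH halves, and score the first-half fit on the second half as a holdout.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic scores; every coefficient below is an illustrative assumption.
sat_math = rng.standard_normal(n)
math_ach = 0.8 * sat_math + 0.6 * rng.standard_normal(n)
gpa = 1.0 * math_ach - 0.3 * sat_math + 0.5 * rng.standard_normal(n)

# Design matrix: intercept, SAT Math, Math Achievement.
X = np.column_stack([np.ones(n), sat_math, math_ach])

n_splits = 200
negative_in_both = 0
holdout_mse = []
for _ in range(n_splits):
    idx = rng.permutation(n)
    half_a, half_b = idx[: n // 2], idx[n // 2:]
    coef_a, *_ = np.linalg.lstsq(X[half_a], gpa[half_a], rcond=None)
    coef_b, *_ = np.linalg.lstsq(X[half_b], gpa[half_b], rcond=None)
    if coef_a[1] < 0 and coef_b[1] < 0:
        negative_in_both += 1
    # Score the half-A model on the untouched half B.
    holdout_mse.append(np.mean((gpa[half_b] - X[half_b] @ coef_a) ** 2))

frac_negative = negative_in_both / n_splits
mean_holdout_mse = float(np.mean(holdout_mse))
print(f"splits with a negative SAT Math sign in both halves: {frac_negative:.2f}")
print(f"mean holdout MSE: {mean_holdout_mse:.3f}")
```

A sign that is an accident of one sample would flip between halves; a sign that reflects a real partial relationship would not. The sketch only shows the mechanics of the check, it proves nothing about any real dataset.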
Contrary to your unsubstantiated speculation:
RU> We are avoiding artifacts that
RU> will not be consistent between samples.
> Here is a rule, I think, for using opposite-sign predictors:
> Make sure that they actually work. I think, too, there will
> have to be a face-valid interpretation of them. The easiest
> instances that I know of have involved pairs of variables,
> so that the (B-k*A) term can be explicitly used in prediction;
> also, you can figure out separately whether (B-k*A) works
> better than another model of difference, like k*log(B/A).
You are just using your ad hoc way of trying to explain the effect of the PARTIAL correlation information imbedded in a predictive model in which your intuition about its sign was wrong.

> >
> > The MIS-interpretation of the "expected sign" of multiple regression
> > coefficients gave rise to a flurry of papers on Ridge Regression, for
> > the sole purpose of making the observed sign "correct", when they could
> > give NO reason (or even know that those are not SIMPLE correlation
> > signs) why any sign should have a "positive" or "negative" expectation.
>
> But, I take it, you forever *missed* the explanation for why
> people did not like the opposite-sign predictions: They
> didn't hold up.
WRONG! I explained why they don't like it: because it's counter-intuitive and people MIS-interpret such predictive models as if they "explain" or "control" the GPR average in the students' performance.
TWO or THREE wrongs do not make one right!
> That's why the ridge-regressions *did* tend
> to work -- they replicated. "Reduced variance" was the goal.
That is why Ridge Regression was a fad for a few years. It was a naked emperor promoted by those who MISINTERPRET the signs of the multiple regression coefficients.
If the Ridge Regression enthusiasts wanted to use MSE as their criterion rather than LS, they could play Stein's game and publish some results irrelevant to sound application.
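For reference, a sketch of what the ridge estimator does to a sign, on synthetic data with an assumed generating model (nothing here is taken from the ridge papers in question). The estimator is (X'X + lambda*I)^(-1) X'y on centered, scaled predictors; as lambda grows it approaches X'y/lambda, whose signs are those of the SIMPLE correlations:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Synthetic scores; every coefficient below is an illustrative assumption.
sat_math = rng.standard_normal(n)
math_ach = 0.8 * sat_math + 0.6 * rng.standard_normal(n)
gpa = 1.0 * math_ach - 0.3 * sat_math + 0.5 * rng.standard_normal(n)

# Center and scale the predictors, center the response, so no intercept is
# needed and the penalty treats both predictors alike.
X = np.column_stack([sat_math, math_ach])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = gpa - gpa.mean()


def ridge(X, y, lam):
    """Closed-form ridge coefficients; lam = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


lams = [0.0, 0.1 * n, 1.0 * n, 10.0 * n]
sat_coefs = [ridge(X, y, lam)[0] for lam in lams]
for lam, b in zip(lams, sat_coefs):
    print(f"lambda = {lam:8.0f}   SAT Math coefficient = {b:+.4f}")
```

With these assumed numbers the SAT Math coefficient is negative at lambda = 0 and positive for large lambda: the penalty replaces the sign of the PARTIAL correlation with the sign of the SIMPLE one.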
> >
> > I have rejected more submitted journal papers based on that faulty and
> > false premise than you can imagine. But such misinterpretations are
> > EVERYWHERE in the applied journals of economics and social sciences.
> >
>
> *All* those people seem to differ greatly from you, on the
> nature of the problems in regressions.
*All* those who thought the "expected" sign should be the same as the "expected sign" of a simple correlation are WRONG, without exception. (completely orthogonal X's excepted, as noted in my introduction).
That is both a theoretical AND an empirical FACT.
>
> EVERYWHERE ... it seems to me like this ought to have provoked
> a response of curiosity. Do you *still* not wonder why?
There is nothing to wonder about when I know the fallacy of those who misinterpret coefficients, and I know the theoretical as well as proven empirical reasons (as in the University Admission data). What is there left to wonder?
I routinely took published articles from economics and the social sciences as lecture material in my graduate classes to PROVE that the authors were WRONG when they say "the expected sign", because they were dealing with models with a large number of independent (they like to use fancy words like "exogenous" too, as if that added anything of substance) variables and they say variable X is "expected" to have a positive sign in some model when they did not even TELL what the other variables are, let alone reason WHY the partial correlation should be expected to be positive!
You KNOW then and there, they were thinking "simple correlation" when it should have been the "partial correlation", whose sign (expected or observed) depends CRITICALLY and ENTIRELY on what OTHER variables are in the fitted model!
> > [snip, some]
> >
> > I've seen the "expected sign" MIS-interpreted every time I've seen
> > that term used in a multiple regression context; I've NEVER seen
> > anyone argue on why that sign is expected to be "positive" or
> > "negative" by arguing from a partial correlation point of view!
>
> Artifact, Bob, artifact.
I have reason to believe that you are in the camp of the "expected sign" abusers, from everything you've said in this post!
> We are avoiding artifacts that
> will not be consistent between samples.
That's only YOUR unsubstantiated speculation. I've explained how the Admission Office data could consistently, and significantly, support PREDICTIVE models of GPR in which the SIGN of the SAT Math variable was NEGATIVE.
> Here is some
> background of why your rant does not move me, and
> must have frustrated a good many good researchers whom
> you have reviewed.
I consider that my contribution to stop/lessen statistical ABUSE and statistical POLLUTION by those in the social and economic sciences who are no more equipped to practice statistics than they are to practice brain surgery or law, yet who, after reading a book or a chapter of a computer manual, think they can practice statistics correctly.
>
> Psychometrics figured out a long time ago that rating
> scales are not created by multiple regression of items.
> (Certain ideas in making *scales*, I believe, have carried over
> usefully to good intuitions while *using* multiple regression.)
Don't change the subject.
We are NOT talking about rating scales or any of what psychometricians do. We are SPECIFICALLY talking about the CORRECT and PROPER interpretation of the SIGN of multiple regression coefficients.
> The most common way to create additive scales in the
> social sciences makes use of simple sums of items, or
> of item responses "0,1,2,3". It takes a huge sample to
> justify using differential weighting of items, or of scores
> (for all items, or for single items).
I've had colleagues who talked about "unit weighting" too. They were talking about STATISTICS. They were exercising psychometric voodoo and quackery in the name of statistics!
< irrelevant tangent to the interpretation of the SIGN of the multiple regression coefficients snipped >
>
> Now, if the opposite sign can replicate, I would certainly
> search for the reason.
The reasons would be PARTIAL correlation of one variable with another in the PRESENCE of the remaining variables.
Simple! Too bad you never learned that.
> However, these suppressors are usually accidents.
Your inexperience with REAL data in "model building" using regression methods showed.
> Hope this helps.
Sorry, your post didn't help one bit in explaining away the common abuse by those who are totally oblivious to the DIFFERENCE between the "expected sign" of a simple correlation and that of a PARTIAL correlation.
Hope this helps. If not, I am not surprised.
Of all the textbooks on regression, the one that best articulates the FALLACY of the "expected sign" phenomenon (by people like yourself and the other social scientists and economists) is the book by Mosteller and Tukey!! "Data Analysis and Regression" (1977).
Get a copy of that book, and read the relevant chapters related to partial correlations (and the misnomer of "keeping the other variables constant" when speaking of the given variables in a partial corr.), and try to read it CAREFULLY and read it WELL.
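A sketch of what "given the other variables" amounts to computationally, on synthetic data with an assumed generating model (this illustrates the standard residual construction; it is not an excerpt from the book). Regress both GPA and SAT Math on Math Achievement, keep the residuals, and relate the residuals to each other. Nothing is held constant; each variable is only adjusted for its linear fit on the other predictor.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Synthetic scores; every coefficient below is an illustrative assumption.
sat_math = rng.standard_normal(n)
math_ach = 0.8 * sat_math + 0.6 * rng.standard_normal(n)
gpa = 1.0 * math_ach - 0.3 * sat_math + 0.5 * rng.standard_normal(n)


def residuals(v, z):
    """Residuals of v after a least-squares fit on an intercept and z."""
    Z = np.column_stack([np.ones(len(v)), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef


# Adjust both GPA and SAT Math for Math Achievement.
gpa_adj = residuals(gpa, math_ach)
sat_adj = residuals(sat_math, math_ach)

# Partial correlation = simple correlation between the two sets of residuals.
partial_r = np.corrcoef(sat_adj, gpa_adj)[0, 1]
# Slope of adjusted GPA on adjusted SAT Math (residuals have mean zero).
slope_adj = (sat_adj @ gpa_adj) / (sat_adj @ sat_adj)

# Coefficient of SAT Math in the full two-predictor regression.
X = np.column_stack([np.ones(n), sat_math, math_ach])
coef_full, *_ = np.linalg.lstsq(X, gpa, rcond=None)

print(f"partial correlation (SAT Math, GPA | Math Ach.): {partial_r:+.3f}")
print(f"slope on residuals:                              {slope_adj:+.3f}")
print(f"SAT Math coefficient in the full model:          {coef_full[1]:+.3f}")
```

The slope on the residuals reproduces the full-model coefficient, and its sign is that of the partial correlation, which depends on what the variables were adjusted FOR.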