Date: Dec 11, 2012 1:20 PM
Author: paul
Subject: Multiple regression with all dummy variables
Does a multiple regression with all dummy (indicator) variables make
sense? I work at a state university tutoring various basic subjects,
including college algebra, first-semester calculus, and a two-semester
"Statistics for Business and Economics" sequence. In recent years my
students have been taught that an alternative to the ANOVA technique
is to run a multiple regression analysis using all dummy variables. A
recent example given as a study guide for the final exam was a
comparison of used-car prices by color (white, black, blue, or
silver). Both ANOVA and a multiple regression (with black as the
excluded category) reject the null hypothesis that there is no
difference in prices by color. But the students are then told that the
multiple regression gives more information, since we can conclude from
the t-tests on individual coefficients that silver cars sell for more
than the base case (black). I thought you needed at least one measured
(scalar?) variable among the explanatory variables. It makes no sense
to do a scatter plot on just a dummy variable, so what on earth is
this "line" (or surface) you are getting from the regression?
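
As a rough check on what the fitted "surface" might be, here is a
minimal Python sketch with NumPy on made-up prices. The numbers, the
`prices` dictionary, and the variable names are all invented for
illustration and are not from the study guide. It fits the regression
with black as the excluded category and compares the coefficients with
the plain group means.

```python
import numpy as np

# Hypothetical used-car prices in $1000s, invented for illustration only.
prices = {
    "black":  [10.0, 12.0, 11.0],
    "white":  [11.0, 13.0, 12.0],
    "blue":   [9.0, 10.0, 11.0],
    "silver": [14.0, 15.0, 16.0],
}
base = "black"  # excluded (reference) category
others = [c for c in prices if c != base]

# Response vector, plus a design matrix with an intercept column
# and one 0/1 column per non-base color.
y = np.array([p for c in prices for p in prices[c]])
labels = [c for c in prices for _ in prices[c]]
X = np.array([[1.0] + [1.0 if lab == c else 0.0 for c in others]
              for lab in labels])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Compare the coefficients with the group means.
means = {c: float(np.mean(v)) for c, v in prices.items()}
print("intercept:", round(float(beta[0]), 6), "| mean(black):", means[base])
for c, b in zip(others, beta[1:]):
    print(c, "coef:", round(float(b), 6),
          "| mean difference from black:", means[c] - means[base])
```

On these invented numbers the intercept equals the mean black price
and each dummy coefficient equals that color's mean minus the black
mean, so the fitted values are just the four group means rather than a
line through a scatter plot.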

So, is having at least one measured explanatory variable a basic
requirement for regression? Has anyone proven that the individual
coefficients in an all-dummy-variable regression have no meaning?
Perhaps they follow a well-defined distribution, which might not be
Student's t. Are there any easy online sources? I did not see anything
in the basic article on regression on Wikipedia.
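
On the distribution question, a small sketch can at least show what
the t-statistic on a dummy coefficient is computing. The prices and
variable names below are invented for illustration. The sketch
compares the regression t-statistic for silver with a difference in
means divided by a standard error built from the pooled within-group
variance.

```python
import numpy as np

# Hypothetical prices in $1000s, invented for illustration only.
groups = {
    "black":  np.array([10.0, 12.0, 11.0]),
    "white":  np.array([11.0, 13.0, 12.0]),
    "blue":   np.array([9.0, 10.0, 11.0]),
    "silver": np.array([14.0, 15.0, 16.0]),
}
names = list(groups)  # "black" comes first, so it is the base case
y = np.concatenate([groups[c] for c in names])
n, k = len(y), len(names)

# Design matrix: intercept plus a 0/1 column for each non-base color.
X = np.zeros((n, k))
X[:, 0] = 1.0
row = 0
for j, c in enumerate(names):
    m = len(groups[c])
    if j > 0:
        X[row:row + m, j] = 1.0
    row += m

# OLS coefficients, residual variance, standard errors, t-statistics.
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
mse = resid @ resid / (n - k)  # pooled within-group variance
se = np.sqrt(mse * np.diag(XtX_inv))
t_reg = beta / se

# The same statistic by hand for silver vs. black.
j = names.index("silver")
diff = groups["silver"].mean() - groups["black"].mean()
t_hand = diff / np.sqrt(
    mse * (1 / len(groups["silver"]) + 1 / len(groups["black"]))
)
print("regression t:", t_reg[j], "| by-hand t:", t_hand)
```

The two numbers agree here, which suggests the coefficient test is the
familiar comparison of two group means with a pooled variance
estimate. Under the usual assumptions of normal errors with equal
variance, that statistic has a Student's t distribution with n - k
degrees of freedom.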

I'll mention that students were previously taught that, according to
the Central Limit Theorem, if you are doing hypothesis testing on a
mean and you have more than 30 or 40 data points, it's OK to assume
your test statistic is normally rather than t-distributed. They've
abandoned that nonsense, but I'm sceptical about these all-dummy
regressions.

Thanks for any help!