On Tuesday, December 11, 2012 1:20:48 PM UTC-5, paul wrote:
> Does a multiple regression with all dummy (indicator) variables make sense?
>
> In recent years my students have been taught that an alternative to using the ANOVA technique is to run a multiple regression analysis using all dummy variables.
>
> I thought you needed at least one measured (scalar?) variable among the explanatory variables -- it makes no sense to do a scatter plot on just a dummy variable,
It does to me. For example, do a scatter plot of salary v. gender (0 = male, 1 = female). You get two columns of points, from which you can eyeball both differences in location (mean/median) and dispersion (range/standard deviation).
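To make the "two columns" reading concrete, here is a small Python sketch using invented salary figures. The data and the helper name column_summary are hypothetical and not from any real study. It summarizes each column of points by location (mean, median) and dispersion (range, standard deviation), which is what the eye picks up from the plot.

```python
from statistics import mean, median, stdev

# Hypothetical data: (gender code, salary in thousands); 0 = male, 1 = female.
data = [(0, 52), (0, 61), (0, 58), (0, 70), (1, 49), (1, 55), (1, 50), (1, 58)]

def column_summary(data, code):
    """Summarize the vertical column of points sitting at x = code."""
    ys = [y for x, y in data if x == code]
    return {
        "mean": mean(ys),          # location
        "median": median(ys),      # location, robust to outliers
        "range": max(ys) - min(ys),  # dispersion
        "sd": stdev(ys),           # dispersion (sample standard deviation)
    }

for code in (0, 1):
    print(code, column_summary(data, code))
```

With these made-up numbers the female column sits lower (mean 53 vs. 60.25) and is tighter (SD about 4.2 vs. 7.5).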
> so what on earth is this "line" (or surface) you are getting from the regression?
It provides the conditional mean of the response variable as a linear function of the indicators. I suspect your concern is grounded at least partly in the fact that the function only makes sense when the arguments are all zeros and ones (i.e., the domain is discrete). That's also true, though, of other models you might find more intuitive. Suppose I regress towing power on the number of locomotives in a train. The domain of the predictor variable is discrete, so most of the points on a regression line would not represent any real-world scenario; but we still draw the line rather than discrete points for the mean power for each number of engines (and drawing discrete points rather than a line would not change the fact that the mean power is a linear function of the number of engines).
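Here is a sketch of what the "line" amounts to in the all-dummy case, again in Python with invented data for a three-level factor. The group names, the numbers, and the helpers design, solve and ols are all hypothetical. The factor is coded as an intercept plus one 0/1 indicator per non-baseline level (reference coding, one common choice), and ordinary least squares is fitted through the normal equations. The intercept comes out as the baseline group's mean and each indicator's coefficient as that group's mean minus the baseline mean, so the fitted value at any allowed combination of zeros and ones is just that group's conditional mean.

```python
from statistics import mean

# Hypothetical one-way layout: three groups, numeric response.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 15.0, 16.0],
    "C": [9.0, 7.0, 8.0],
}

def design(groups, baseline="A"):
    """Build rows [1, d_B, d_C]: an intercept plus one indicator per non-baseline level."""
    levels = [g for g in groups if g != baseline]
    X, y = [], []
    for g, ys in groups.items():
        for v in ys:
            X.append([1.0] + [1.0 if g == lvl else 0.0 for lvl in levels])
            y.append(v)
    return X, y, levels

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

X, y, levels = design(groups)
b = ols(X, y)
print("intercept:", b[0], "mean of A:", mean(groups["A"]))
for lvl, coef in zip(levels, b[1:]):
    print(lvl, "coef:", coef, "mean difference:", mean(groups[lvl]) - mean(groups["A"]))
```

With these numbers the group means are 11, 15 and 8, so the fit is intercept 11, B coefficient 4 and C coefficient -3, up to floating-point rounding. A different coding (say, one indicator per level and no intercept) changes the coefficients but not the fitted values.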