Topic:
Multiple regression with all dummy variables
Replies:
7
Last Post:
Feb 15, 2013 4:17 PM



paul
Posts:
2
Registered:
12/11/12


Posted:
Dec 11, 2012 1:20 PM


Does a multiple regression with all dummy (indicator) variables make sense? I work at a state university tutoring various basic subjects including college algebra, first-semester calculus, and a two-semester "Statistics for Business and Economics" sequence. In recent years my students have been taught that an alternative to using the ANOVA technique is to run a multiple regression analysis using all dummy variables. A recent example given as a study guide for the final exam was a comparison of used-car prices by color (white, black, blue, or silver). Both ANOVA and a multiple regression (with black as the excluded category) reject the null hypothesis that there is no difference in prices by color. But the students are then told that the multiple regression gives more information, since we can conclude from the t-tests on individual coefficients that silver cars sell for more than the base case (black). I thought you needed at least one measured (scalar?) variable among the explanatory variables. It makes no sense to do a scatter plot on just a dummy variable, so what on earth is this "line" (or surface) you are getting from the regression?
So, is having at least one measured explanatory variable a basic requirement for regression? Has anyone proven that the individual coefficients in an all-dummy-variable regression have no meaning? Perhaps they follow a well-defined distribution, which might not be Student's t. Any easy online sources? I did not see anything in the basic article on regression in Wikipedia.
I'll mention that previously students were taught that, according to the Central Limit Theorem, if you are doing hypothesis testing on a mean and you have more than 30 or 40 data points, it's OK to assume your test statistic is normally rather than t-distributed. They've abandoned that nonsense, but I'm sceptical about these all-dummy regressions.
Thanks for any help!
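
To make the setup concrete, here is a minimal sketch in Python (standard library only) of the used-car example, with black as the excluded category. The prices are made up purely for illustration, and the helper `solve` is a hypothetical name, not part of any library. Under those assumptions, the least-squares fit with three dummies appears to reduce to the group means: the intercept is the mean of the base category, each coefficient is that color's mean minus the base mean, and the overall regression F statistic matches the one-way ANOVA F statistic.

```python
from statistics import mean

# Made-up used-car prices (in $1000s) by color; illustrative data only.
prices = {
    "black": [10.0, 12.0, 11.0],
    "white": [11.0, 13.0, 12.0],
    "blue": [9.0, 11.0, 10.0],
    "silver": [14.0, 16.0, 15.0],
}
base = "black"  # excluded (reference) category
others = [c for c in prices if c != base]

# Design matrix rows: [1, is_white, is_blue, is_silver]; y holds the prices.
X, y = [], []
for color, vals in prices.items():
    for v in vals:
        X.append([1.0] + [1.0 if color == c else 0.0 for c in others])
        y.append(v)


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Ordinary least squares via the normal equations (X'X) beta = X'y.
p = len(X[0])
XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
beta = solve(XtX, Xty)

# Overall F statistic from the regression fit.
n = len(y)
grand = mean(y)
fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
ssr = sum((f - grand) ** 2 for f in fitted)
sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
f_reg = (ssr / (p - 1)) / (sse / (n - p))

# One-way ANOVA F statistic computed directly from the groups.
k = len(prices)
ssb = sum(len(v) * (mean(v) - grand) ** 2 for v in prices.values())
ssw = sum((x - mean(v)) ** 2 for v in prices.values() for x in v)
f_anova = (ssb / (k - 1)) / (ssw / (n - k))

group_means = {c: mean(v) for c, v in prices.items()}
print("intercept:", beta[0], "base mean:", group_means[base])
for c, b in zip(others, beta[1:]):
    print(c, "coefficient:", b, "mean difference:", group_means[c] - group_means[base])
print("regression F:", f_reg, "ANOVA F:", f_anova)
```

In this sketch the fitted "surface" takes only four values, one per color, so there is no line to draw on a scatter plot; the regression is just another way of writing down the four group means.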



