
Re: linear regression and multicollinearity
Posted:
Nov 6, 2007 4:23 PM


On Nov 6, 2:43 am, David Winsemius <doe_s...@comcast.n0T> wrote:
> hbe...@gmail.com wrote in news:1194325079.861472.12860@50g2000hsm.googlegroups.com:
>
> > I'm sorry, R linear regression works OK (lm function); I only have a
> > problem with the solve function for matrix equations. It seems that it
> > uses singular value decomposition and fails when the matrix A has a
> > linearly dependent column (or row).
>
> As it should fail. It will fail in more instances than when you simply
> have two rows or columns that are multiples. Form two columns that are
> random numbers, then form a 3rd column that is the sum of the first two.
> X'X will also be rank deficient in that case.
>
> > On 6 nov, 01:25, hbe...@gmail.com wrote:
> >> Hi,
>
> >> I have 2 questions:
>
> >> 1)
> >> I know that multicollinearity may cause some problems, but maybe
> >> not? Suppose I have X1, X2 predictors and Y response variable with
> >> the following data:
>
> >> X1 X2  Y
> >>  1  2  3
> >>  2  4  6
> >>  3  6  9
> >>  4  8 12
>
> >> X2 = 2*X1, so there is multicollinearity between X1 and X2.
>
> >> When I try a least squares regression for Y = b0 + b1*X1 + b2*X2,
> >> I expect Y = X1 + X2
> >> (b0 = 0 and b1 = b2 = 1), the unbiased and minimum variance estimator.
> >> But with a software package, specifically R, I get that the system
> >> is singular.
> >> It's ok if I have the X matrix
>
> >> > X
> >>      [,1] [,2] [,3]
> >> [1,]    1    1    2
> >> [2,]    1    2    4
> >> [3,]    1    3    6
> >> [4,]    1    4    8
>
> >> and then R tries to invert X'X (in R notation t(X) %*% X), which is
> >> not invertible, and I get an error.
>
> Because of columns 2 and 3.
>
> >> Of course this is not a real-world problem, but is this an error? Is
> >> it common in packages other than R?
>
> Not an error. Your X'X matrix to be inverted should have used X[,1:2].
> The dependent variable column, X[,3], is not in the hat matrix. That is
> why the regression works and your inversion did not.
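Your point about a summed random column is easy to check; here is a minimal sketch I tried (my own code, not from your post):

set.seed(42)
a <- rnorm(10)
b <- rnorm(10)
W <- cbind(a, b, a + b)   # 3rd column is the sum of the first two
qr(t(W) %*% W)$rank       # 2, not 3: W'W is rank deficient
## solve(t(W) %*% W)      # would fail: the system is singular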
Now, a clarification: X[,3] is not the dependent variable here.
X1 X2  Y
 1  2  3
 2  4  6
 3  6  9
 4  8 12
X = 1 1 2
    1 2 4
    1 3 6
    1 4 8
(the first column multiplies the intercept b0)
Y = 3 6 9 12
But maybe I understand your point: instead of inverting X'X (which is impossible here), does R invert a matrix W'W, where W contains only the linearly independent columns of X, or does it use some other technique to determine the regression coefficients b1, ..., bp???
X <- matrix(c(1,1,1,1, 1,2,3,4, 2,4,6,8), nrow = 4)
I3 <- matrix(c(1,0,0, 0,1,0, 0,0,1), nrow = 3)   # 3x3 identity
Y <- t(X) %*% X
solve(Y, I3)   ## impossible: X'X is singular (column 3 of X is 2 * column 2)
I2 <- matrix(c(1,0, 0,1), nrow = 2)   # 2x2 identity
Y <- t(X[,1:2]) %*% X[,1:2]
solve(Y, I2)   ## possible: the first two columns of X are independent
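If I read the R docs right, lm() never inverts X'X at all: it fits by a pivoted QR decomposition, which detects the independent columns on its own. A sketch of what I mean, using base R's qr() on the X defined above (my guess at the mechanism, not something from your post):

qrX <- qr(X)                   # pivoted QR of the rank-deficient X above
qrX$rank                       # 2: only two linearly independent columns
qrX$pivot                      # column ordering chosen by the pivoting
qr.coef(qrX, c(3, 6, 9, 12))   # coefficients; the aliased column comes back NA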
I'm trying to understand deeply the causes and consequences of multicollinearity, so feel free to correct me if I'm confused... Multicollinearity means that X'X is ill-conditioned; in the worst case (exact collinearity) it is not invertible at all. Because the coefficient estimators use (X'X)^{-1}, an ill-conditioned X'X makes the results very sensitive to small changes in the predictor data (hence large variance of the estimates), and that is why multicollinearity may cause a big problem???
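To see the variance claim concretely, a small simulation sketch (my own made-up numbers, not from the thread): a nearly collinear X2 makes the fitted coefficients swing wildly across replications, even though each individual fit succeeds numerically.

set.seed(1)
x1 <- 1:20
for (i in 1:3) {
  x2 <- 2 * x1 + rnorm(20, sd = 0.01)   # almost exactly 2*x1
  y  <- x1 + x2 + rnorm(20)             # true b1 = b2 = 1
  print(coef(lm(y ~ x1 + x2)))          # b1 and b2 land far from 1
}
kappa(crossprod(cbind(1, x1, x2)))      # huge condition number of X'X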
I'm looking now at http://en.wikipedia.org/wiki/Linear_least_squares, which says that when X'X is not invertible, one should use other methods based on the QR decomposition or the singular value decomposition.
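Trying the SVD route from that page, with MASS::ginv (my choice of implementation; the article doesn't prescribe one): the pseudoinverse returns the minimum-norm solution among the infinitely many exact fits, which is why it does not give the b1 = b2 = 1 I originally expected.

library(MASS)                  # ginv(): SVD-based Moore-Penrose pseudoinverse
X <- cbind(1, 1:4, 2 * (1:4))  # same rank-deficient design as before
Y <- c(3, 6, 9, 12)
b <- ginv(X) %*% Y             # minimum-norm least-squares solution
b                              # b0 = 0, b1 = 0.6, b2 = 1.2
X %*% b                        # still reproduces Y exactly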
I appreciate your help very much! I'm trying to understand the essence of the subject deeply: why, in which cases, and how much multicollinearity may or may not cause a problem (I think that matters more than memorizing rules, and some texts give only rules without deep explanations of the essence). So I'm trying to gather and unify the concepts from:
http://en.wikipedia.org/wiki/Linear_least_squares
http://en.wikipedia.org/wiki/Linear_regression
> R> X
>   V1 V2 V3
> 1  1  2  3
> 2  1  4  6
> 3  1  6  9
> 4  1  8 12
>
> R> Y <- t(X[,1:2]) %*% X[,1:2]
>
> R> Z <- matinv(Y)
> R> Z
>       V1    V2
> V1  1.50 -0.25
> V2 -0.25  0.05
> attr(,"rank")
> [1] 2
> attr(,"swept")
> [1] TRUE TRUE
>
> --
> David Winsemius
Thanks David for your response!

