Topic: linear regression and multicollinearity
Replies: 13   Last Post: Nov 15, 2007 4:28 PM

hberig@gmail.com

Posts: 10
Registered: 11/5/07
Re: linear regression and multicollinearity
Posted: Nov 6, 2007 4:23 PM

On Nov 6, 2:43 am, David Winsemius <doe_s...@comcast.n0T> wrote:
> hbe...@gmail.com wrote in news:1194325079.861472.12860@50g2000hsm.googlegroups.com:
>

> > I'm sorry, R's linear regression works OK (the lm function); I only
> > have a problem with the solve function for matrix equations. It seems
> > that it uses singular value decomposition and fails when the matrix A
> > has a linearly dependent column (or row).

>
> As it should fail. It will fail in more instances than when you simply
> have two rows or columns that are multiples. Form two columns that are
> random numbers, then form a 3rd column that is the sum of the first two.
> X'X will also be rank deficient in that case.
>
>
>
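
You're right; a quick check of that construction in R (a minimal
sketch with made-up data):

set.seed(42)
x1 <- rnorm(10)
x2 <- rnorm(10)
x3 <- x1 + x2           # exact linear combination of the first two
X  <- cbind(x1, x2, x3)

qr(t(X) %*% X)$rank     # 2, not 3: X'X is rank deficient
try(solve(t(X) %*% X))  # errors: the matrix is singular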

> > On Nov 6, 01:25, hbe...@gmail.com wrote:
> >> Hi,
>
> >> I have 2 questions:
>
> >> 1)
> >> I know that multicollinearity may cause some problems, but maybe
> >> not always? Suppose I have predictors X1, X2 and a response
> >> variable Y with the following data:
> >> X1 X2 Y
> >> -----------
> >> 1 2 3
> >> 2 4 6
> >> 3 6 9
> >> 4 8 12

>
> >> X2 = 2*X1, so there is exact multicollinearity between X1 and X2.
>
> >> When I try a least squares regression for Y = b0 + b1*X1 + b2*X2,
> >> I expect Y = X1 + X2 (b0 = 0 and b1 = b2 = 1), the unbiased and
> >> minimum variance estimator. But with a software package, namely R,
> >> I get an error that the system is singular.
> >> That's understandable: I have the X matrix

>
> >> [,1] [,2] [,3]
> >> [1,] 1 1 2
> >> [2,] 1 2 4
> >> [3,] 1 3 6
> >> [4,] 1 4 8

>
> >> and then R tries to invert X'X (in R notation, t(X) %*% X), which
> >> is not invertible, so I get an error.

>
> Because of columns 2 and 3.
>

> >> Of course this is not a real-world problem, but is this an error?
> >> Is it common in packages other than R?

>
> Not an error. Your X'X matrix to be inverted should have used X[,1:2].
> The dependent variable column, X[,3], is not in the hat matrix. That is
> why the regression works and your inversion did not.


X[,3] is not the dependent variable.

X1 X2 Y
-----------
1 2 3
2 4 6
3 6 9
4 8 12

X =
1 1 2
1 2 4
1 3 6
1 4 8

(the first column multiplies the intercept b0)

Y =
3
6
9
12
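
Note, by the way, why no unique solution can exist here: since
X2 = 2*X1, every coefficient vector with b0 = 0 and b1 + 2*b2 = 3
reproduces Y exactly, e.g. (b1, b2) = (3, 0) or (1, 1). A quick check:

X1 <- c(1, 2, 3, 4)
X2 <- 2 * X1
3 * X1 + 0 * X2  # b1 = 3, b2 = 0: gives 3 6 9 12
1 * X1 + 1 * X2  # b1 = 1, b2 = 1: the same fitted values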


But maybe I see your point: instead of inverting X'X (which is
impossible here), does R invert a matrix W'W, where W contains only the
linearly independent columns of X, or does it use some other technique
to determine the regression coefficients b1, ..., bp?

X <- matrix(c(1,1,1,1, 1,2,3,4, 2,4,6,8), nrow = 4)

XtX <- t(X) %*% X
## impossible: X'X is singular, so solve() throws an error
try(solve(XtX, diag(3)))

XtX2 <- t(X[, 1:2]) %*% X[, 1:2]
## possible: the first two columns of X are linearly independent
solve(XtX2, diag(2))
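
For what it's worth, lm() itself never inverts X'X: as far as I can
tell it solves the least squares problem with a pivoted QR
decomposition, detects the aliased column, and reports NA for its
coefficient. A small check:

x1 <- c(1, 2, 3, 4)
x2 <- 2 * x1             # the same exact collinearity as above
y  <- c(3, 6, 9, 12)
coef(lm(y ~ x1 + x2))    # x2's coefficient comes back NA (aliased)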




I'm trying to understand the causes and consequences of
multicollinearity more deeply, so feel free to correct me if I'm
confused... Multicollinearity is equivalent to an ill-conditioned X'X;
in the worst case X'X is not invertible at all. Because we use
(X'X)^{-1} for the coefficient estimators, an ill-conditioned X'X means
the results are highly sensitive to small changes in the predictor data
(hence the estimators have large variance), and this is the reason why
multicollinearity may cause a big problem ???
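
One way to see that sensitivity numerically: perturb nearly collinear
predictors by a tiny amount and watch the fitted coefficients jump
(a sketch with made-up data):

set.seed(1)
x1 <- rnorm(50)
x2 <- x1 + rnorm(50, sd = 1e-3)  # x2 is nearly identical to x1
y  <- x1 + x2 + rnorm(50)
coef(lm(y ~ x1 + x2))            # b1, b2 can be far from the true (1, 1)

x2p <- x2 + rnorm(50, sd = 1e-3) # tiny change to one predictor
coef(lm(y ~ x1 + x2p))           # the coefficients move dramatically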

I'm looking now at http://en.wikipedia.org/wiki/Linear_least_squares
which says that when X'X is not invertible, one should use other
methods based on the QR decomposition or the singular value
decomposition.
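
For example, the SVD gives the minimum-norm least squares solution
even when X'X is singular (MASS::ginv(X) %*% Y computes the same
thing); a sketch on the toy data above:

X <- cbind(1, c(1, 2, 3, 4), c(2, 4, 6, 8))
Y <- c(3, 6, 9, 12)

s <- svd(X)
keep <- s$d > max(s$d) * 1e-10  # drop the numerically zero singular value
b <- s$v[, keep] %*% ((t(s$u[, keep]) %*% Y) / s$d[keep])
b  # the minimum-norm solution among the infinitely many exact fits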



I appreciate your help very much! I'm trying to understand the essence
of the subject deeply: why, in which cases, and how much of a problem
multicollinearity may or may not cause (I think that is more important
than memorizing rules, and some texts give only rules without deep
explanations). So I'm trying to gather and unify concepts from:
http://en.wikipedia.org/wiki/Linear_least_squares
http://en.wikipedia.org/wiki/Linear_regression



>
> R> X
> V1 V2 V3
> 1 1 2 3
> 2 1 4 6
> 3 1 6 9
> 4 1 8 12
>
> R> Y <- t(X[,1:2]) %*% X[,1:2]
>
> R> Z <- matinv(Y)
> R> Z
> V1 V2
> V1 1.50 -0.25
> V2 -0.25 0.05
> attr(,"rank")
> [1] 2
> attr(,"swept")
> [1] TRUE TRUE
>
> --
> David Winsemius


Thanks, David, for your response!



