
Topic: SVD for PCA: The right most rotation matrix
Replies: 22   Last Post: Jan 4, 2013 4:19 PM

Paul

Posts: 263
Registered: 2/23/10
SVD for PCA: The right most rotation matrix
Posted: Oct 28, 2012 8:28 PM

My apologies if this appears twice. The posting of this message seems
to have been held up.

I am trying to understand SVD in the context of PCA. I have looked at
Leskovec (http://www.cs.cmu.edu/~guestrin/Class/10701-S06/Handouts/recitations/recitation-pca_svd.ppt)
and Shlens (http://www.snl.salk.edu/~shlens/pca.pdf) for intuition.

The scenario I use is a lab experiment in which m sensors
synchronously sample data at n points in time, yielding a data matrix
X with m rows and n columns. Each row contains the readings from a
single sensor/instrument, and each column contains the readings from
one instant in time. I suppose the rows could also be key words in a
data mining exercise and the columns the documents in which we try to
find those key words (as per Leskovec above), but that scenario is a
bit foggier for me because it deals with "concepts", the number of
which matches neither m nor n. So as a first step, stick with the lab
sensor/instrument scenario. Also, consider only real data, so the
data covariance matrices are diagonalizable with orthonormal
eigenvectors corresponding to simple rotations of the data in
m-space.
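
To make this setup concrete for the checks further down, here is a
minimal NumPy sketch; the sizes (m = 3, n = 100), the shared
sinusoid, and the noise level are all made up for illustration:

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 100                              # 3 sensors, 100 time samples (made up)
t = np.linspace(0.0, 1.0, n)
signal = np.sin(2 * np.pi * 5 * t)         # a common source all sensors see
X = np.outer(rng.normal(size=m), signal)   # each sensor couples differently
X += 0.1 * rng.normal(size=(m, n))         # independent sensor noise
X -= X.mean(axis=1, keepdims=True)         # center each row (sensor)
print(X.shape)                             # (3, 100): m rows, n columns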

http://en.wikipedia.org/wiki/Principal_component_analysis#Details
diagonalizes the data set X by factoring it into X=(W)(Sigma)(Vt)
where:

* For W, the columns of this (m)x(m) matrix are the orthonormal
eigenvectors of the covariance matrix (X)(Xt), where Xt is the
transpose of X.

* Specifically, (X)(Xt) contains the covariances from pairing the m
sensors/instruments rather than from pairing the n samples of m
measurements. The former is of interest to us, while for the life of
me, I can't see the relevance of the latter.

* Vt is the transpose of the (n)x(n) matrix V. The columns of V are
the orthonormal eigenvectors of the covariance matrix (Xt)(X) --
specifically, the covariances from pairing the n samples of m
measurements. This is the matrix whose relevance I can't see
(intuitively).

* Sigma is the (m)x(n) diagonal matrix of the square roots of the
eigenvalues of (X)(Xt), which are the same as the nonzero eigenvalues
of (Xt)(X).
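
A sketch that checks the above relationships numerically, with random
stand-in data in place of real sensor readings (full_matrices=True is
what yields the square (m)x(m) W and (n)x(n) Vt described above):

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 100
X = rng.normal(size=(m, n))                 # stand-in for centered sensor data
X -= X.mean(axis=1, keepdims=True)

W, s, Vt = np.linalg.svd(X, full_matrices=True)   # W: (m,m), Vt: (n,n)
Sigma = np.zeros((m, n))
Sigma[:m, :m] = np.diag(s)                  # rectangular diagonal, m <= n

print(np.allclose(X, W @ Sigma @ Vt))       # True: X = (W)(Sigma)(Vt)

# Columns of W are eigenvectors of (X)(Xt); the eigenvalues are s**2.
evals, evecs = np.linalg.eigh(X @ X.T)      # eigh returns ascending order
print(np.allclose(evals[::-1], s**2))       # True
print(np.allclose(np.abs(evecs[:, ::-1]), np.abs(W)))  # True (up to sign)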

I am trying to eke out some intuition from X=(W)(Sigma)(Vt). I find
it curious and interesting that the covariance matrix (X)(Xt) can be
viewed as a linear transformation, and the eigenvectors in W become
the orthogonal directions in which the scalings differ. Hence, they
form the basis vectors that are aligned with the principal
components. Then it becomes obvious that Sigma is simply the
anisotropic axial scaling.
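
This reading can be verified directly: changing basis to the columns
of W diagonalizes (X)(Xt), leaving the squared singular values as the
axial scalings. A sketch with the same kind of stand-in data:

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 100
X = rng.normal(size=(m, n))
X -= X.mean(axis=1, keepdims=True)

W, s, Vt = np.linalg.svd(X, full_matrices=True)
C = X @ X.T                          # covariances between sensors (up to 1/(n-1))
D = W.T @ C @ W                      # express C in the basis of W's columns
print(np.allclose(D, np.diag(s**2))) # True: anisotropic scalings s_i**2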

If X is viewed as some kind of linear transformation (and I'm not
sure if I'm actually supposed to do that), then Vt can be seen as a
rotation so that the 1st principal component aligns with the 1st
axis, the 2nd principal component aligns with the 2nd, etc., prior to
the scaling by Sigma. Finally, I would expect W to rotate the data
back to its original orientation, thus yielding X on the LHS.
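
Under that (tentative) reading of X as a map from n-space to m-space,
the three factors can be applied one at a time and compared against
applying X directly; a sketch:

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 100
X = rng.normal(size=(m, n))
X -= X.mean(axis=1, keepdims=True)

W, s, Vt = np.linalg.svd(X, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:m, :m] = np.diag(s)

v = rng.normal(size=n)           # an arbitrary vector in n-space
step1 = Vt @ v                   # rotate: principal directions onto the axes
step2 = Sigma @ step1            # scale each axis (and drop from n- to m-space)
step3 = W @ step2                # rotate back to the measured orientation
print(np.allclose(step3, X @ v)) # True: same as applying X directly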

Following Shlens's tutorial, I find the above picture is easier to
see if we rewrite the SVD formula as (Wt)(X)=(Sigma)(Vt), where the
/rows/ of Wt are the eigenvectors of the covariance matrix (X)(Xt)
between sensors/instruments. Treating them as basis vectors,
multiplying them by the columns of X simply projects the m-value
samples from each measurement instance onto the principal components,
which yields the rotation of the data points so that the principal
components align with the axes. Conversely, X=(W)[(Sigma)(Vt)] takes
the data points in the rotated state (principal components aligned
with the axes) and rotates them back so that they match the
orientation of the measured data points.
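
This rewritten form is easy to confirm numerically; in the sketch
below, each row of (Wt)(X) is one principal component's score time
series:

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 100
X = rng.normal(size=(m, n))
X -= X.mean(axis=1, keepdims=True)

W, s, Vt = np.linalg.svd(X, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:m, :m] = np.diag(s)

scores = W.T @ X                        # project each time sample onto the PCs
print(np.allclose(scores, Sigma @ Vt))  # True: (Wt)(X) = (Sigma)(Vt)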

One of the most disturbing things I haven't been able to figure out
is what V (or Vt) corresponds to in the real world. I mean, if X were
a transformation, then Vt would simply be a rotation in n-space. But
X *isn't* a transformation. And n-space is meaningless because we
would never treat the vector of data from a single sensor as a data
point (i.e., each measurement instance in time as a dimension) and
plot it in n-dimensional space. So even though V or Vt somehow
corresponds to a geometric rotation of sorts, it's in a space that is
nonsensical and has no bearing on the real world.
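
That said, the rows of Vt can at least be computed and inspected:
since (Wt)(X) = (Sigma)(Vt), row i of Vt is just the i-th principal
component's score time series rescaled to unit length. A sketch:

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 100
X = rng.normal(size=(m, n))
X -= X.mean(axis=1, keepdims=True)

W, s, Vt = np.linalg.svd(X, full_matrices=True)
scores = W.T @ X                        # = (Sigma)(Vt); one row per component
for i in range(m):
    # Row i of Vt is the i-th score time series, normalized by s[i].
    print(np.allclose(scores[i] / s[i], Vt[i]))   # True for each i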

I realize that Leskovec describes SVD differently, as documents versus
search terms, with concepts as an intermediate thing that is
determined by the SVD. The left and right singular vectors then
represent the correlation of documents versus concepts and search
terms versus concepts. However, he doesn't really delve into why the
math corresponds to that. Also, I'm much more interested in the lab
sensor/instrument scenario, where the size of the diagonal matrix
corresponds to the size of the data set (at least before dimensional
reduction).

So when I look at the mockingly simple SVD formula, I have developed
a phobia of the mysterious rotation matrix at the tail end. It has
defied my endless attempts (no joke) to understand it intuitively.
Thank you, anyone, for imparting some clear intuition on this.


