The Math Forum



Math Forum » Discussions » sci.math.* » sci.math


Topic: SVD for PCA: The right most rotation matrix
Replies: 21   Last Post: Nov 6, 2012 2:10 AM


Re: SVD for PCA: The right most rotation matrix
Posted: Nov 4, 2012 2:04 PM

Hi, Gottfried,

Thanks again for taking the time to write such a detailed response. I
gave it a quick twice-over, but it needs more than that. I'll get
back to it soon, but I have some preliminary comments.

On Nov 2, 2:02 am, Gottfried Helms <> wrote:
> it seems I made my comment more complicated than the procedure is.
> Am 01.11.2012 20:53 schrieb Paul:

>> I might be missing some linear algebra theory here, but I looked up
>> gettrans() and I'm not sure what is meant by a column rotation in
>> that context.

> No, gettrans is just a function-call in my MatMate-script-language,
> which returns a rotation-matrix. For instance, by the command:
> t1 = gettrans(X,"drei") // "drei" means "triangular"
> t1 becomes the rotation-matrix, which is required to rotate
> columnwise...

I am still not sure what a columnwise rotation is. Do you actually
switch columns around, or is it more like a geometric rotation?

> ...X to triangular shape. After that we can do the following
> with t1:
> Y = X * t1
> // Y is a lower triangular matrix (with possibly empty columns
> // to the right)
> Z = Y * t1'
> // Z now equals X, because t1*t1' = I (identity matrix)
> or, for doing a rotation to principal-components position:
> t2 = gettrans(X,"pc") // "pc" means "principal components"
> and then
> B = X * t2
> // the columns of B are now orthogonal, are the principal
> // components
> I've introduced that function "gettrans" in addition to the simple
> "rotate" function to have the rotation matrix available for later
> manipulation, or to be able to reverse a rotation later, or to apply
> the same rotation to another matrix, etc. It can also be made to work
> only on certain columns and to use only certain rows for the
> criterion. This is useful if one uses rotations which are
> implemented as iterative procedures, like "pc" or "varimax" or
> similar.

>>> The key is, that the n samples define m vectors in an
>>> n-dimensional euclidean space; simply each column of X can be seen
>>> as a spatial dimension. In that n-dimensional space there are m
>>> vectors, where the number m is smaller than n. Any rotation in
>>> that space repositions the vectors, but *not* the relation, or
>>> better: the angles, between them

>> I'm not sure why *any* rotation in n-space would not preserve
>> angles. I thought that a rotation is by definition a unitary
>> transformation (from a recent brush-up on linear algebra at
>> Wikipedia e.g.

> My remark may be obfuscating here. There is the concept of "oblique
> rotations" in factor analysis (as opposed to orthogonal rotations)
> which do not preserve the angles - and I had the impulse to exclude
> this case verbally... So this remark could just be deleted
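The distinction can be checked numerically. The following is a minimal sketch, assuming NumPy and random stand-in data; the names cosines, t_orth and t_obl are made up for illustration, and the shear is only a simple example of a non-orthogonal transformation, not the specific oblique rotations used in factor analysis.

```python
import numpy as np

def cosines(X):
    """Cosines of the angles between the row vectors of X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit @ unit.T

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 6))      # m = 3 row vectors in n = 6 dimensions

# An orthogonal matrix: the Q factor of a random square matrix.
t_orth, _ = np.linalg.qr(rng.standard_normal((6, 6)))

# A non-orthogonal transformation: a shear mixing the first two axes.
t_obl = np.eye(6)
t_obl[0, 1] = 0.8

same_after_orth = np.allclose(cosines(X @ t_orth), cosines(X))
same_after_obl = np.allclose(cosines(X @ t_obl), cosines(X))
```

Under these assumptions same_after_orth should be true and same_after_obl false: postmultiplying by an orthogonal matrix leaves X * X' unchanged, while the shear does not.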

>>> ...So we can rotate the vector model X
>>> (columnwise) first such, that
>>> sensor 1 defines the x-axis,
>>> sensor 2 and 1 define the x-y-plane
>>> sensor 3 to 1 define the x-y-z-space
>>> and so on.

>> I don't quite follow what you mean by "rotat[ing] the vector
>> [model] X columnwise". If you interpret each column of X as a
>> point (or vector) in n-space, we get what you describe (sensor 1 is
>> the x-axis, sensor 2 is the y-axis, etc.). However, a rotation is
>> not needed for this.

> If we speak of the n-dimensional space, each column represents the
> coordinates on one axis. Then each row represents one vector
> (from the origin) to some point in this n-dimensional space: for
> each sensor there is one wire from the origin into the n-space,
> and the angles between those wires (more precisely: the cosines of
> those angles) are expressed by the correlation coefficients. That
> view of statistical data may be somewhat unusual - but it is coherent
> with the operations of rotations and the finding of principal
> components - and this is what your matrix Vt stands for.

>>> In effect, that rotation provides a matrix X1 which is triangular
>>> with as many nonzero columns as the rank of the matrix (and we
>>> assume for simplicity that it equals m)

>> I think I'm missing something fundamental...the data matrix is not
>> triangular, though the (n)x(n) covariance matrix (Xt)(X) is
>> symmetric.

> No, not the data matrix X. But after X is rotated to triangular
> position by t1 then
> X1 = X * t1
> is lower triangular (with some empty columns due to the defective
> rank of X)

What is meant by rotating to triangular position? Do you mean
geometric position, or that X somehow becomes a triangular matrix by
rearranging its columns? What if there are not enough properly placed
zeros for that to be possible?
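One possible reading, sketched below, is that no columns are rearranged: X is postmultiplied by an orthogonal matrix chosen so that the product is lower triangular. This assumes NumPy and a QR factorization of X'; the helper rotation_to_lower_triangular is a hypothetical stand-in for gettrans(X,"drei"), and MatMate may compute its matrix differently.

```python
import numpy as np

def rotation_to_lower_triangular(X):
    """Return an orthogonal t (n x n) such that X @ t is lower triangular.

    Hypothetical stand-in for gettrans(X, "drei"), built on a QR
    factorization; the actual MatMate procedure may differ.
    """
    # X.T = Q @ R with Q orthogonal (n x n) and R upper triangular (n x m),
    # hence X @ Q = R.T, which is lower triangular (m x n).
    # Q is orthogonal but may include a reflection (determinant -1).
    Q, _ = np.linalg.qr(X.T, mode="complete")
    return Q

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6))      # m = 3 sensors, n = 6 samples
t1 = rotation_to_lower_triangular(X)
Y = X @ t1                           # lower triangular, last n - m columns zero
Z = Y @ t1.T                         # rotating back recovers X
```

In this sketch the zeros in Y are produced by the rotation itself, so X does not need any zeros in convenient places to begin with.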

>>> Then the matrix X1 can be rotated to the position of their
>>> principal components (we're talking already of the nonzero columns
>>> only), let's call this X2

>> I see that the data must be rotated so that the principal axes
>> align with the axes of m-space (not n-space), and then the diagonal
>> matrix Sigma performs the anisotropic axial stretching.

> No, again we rotate in the columns/the n-space. We just apply the
> (costly, because of iterations) rotation to orthogonality (which
> gives principal components) only to the first m axes in X1 (which is
> already triangular with only m significant columns)
> X2 = X1 * t2
> or equivalently
> X2 = X * t1 * t2 = X * (t1 * t2) = X * Vt
> After that X2 contains the coordinates of your sensor-measures
> after rotation in the n-space in such a way that in the first
> column the sum of squared coordinates is the maximum possible
> and in the m'th column the least possible and because
> X2 ' * X2 is diagonal we may say, that the columns are orthogonal

>>> Those two rotations together form your matrix Vt. After that, X2
>>> can be rotated by rotation of its rows to diagonal form - this is
>>> your rotation-matrix W, which rotates for the principal components
>>> with respect of the rows in X2 (and which is the same as the
>>> rotation with respect of the rows in X).

>> But W is not applied after Vt,

> ???
> If we have
> W * X * Vt
> we can also write
> W * (X * Vt)
> which is meant when I say that W is applied "after" the rotation by
> Vt in my example....

I got lost...the middle matrix should be Sigma, a diagonal matrix of

>> So the rotation by W is very intuitive to me, while the rotation by
>> Vt is not. And as I described, it's all the more mysterious when
>> you consider that X isn't actually a transformation that is applied
>> to data -- it *is* the data.

> This remark "... isn't actually a transformation..." now confuses
> me. ;-) Well, I understood X as data as well, I have no idea, where
> the idea of "being a transformation" comes from and what I am
> possibly missing here. Very likely I didn't properly catch your way
> of approaching the problem...

That's the view of X = W Sigma Vt. Sigma is an anisotropic axial
stretch while W rotates these stretch axes to the principal components
of the data in m-space. What is never explained is what Vt rotates.
In order for the rotations and stretches to apply, X=W*Sigma*Vt must
be viewed as a transformation applied to a vector (or a collection of
column vectors). Which means Vt is first applied, then Sigma, then
W. W and Vt are orthogonal rotations.
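As a minimal sketch of that transformation view (assuming NumPy, random stand-in data, and an arbitrary vector v in n-space that is not taken from the thread), the three factors can be applied one at a time and compared with X acting on v directly:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 6
X = rng.standard_normal((m, n))
W, s, Vt = np.linalg.svd(X, full_matrices=True)

Sigma = np.zeros((m, n))
Sigma[:m, :m] = np.diag(s)           # m x n axial "stretch" matrix

v = rng.standard_normal(n)           # an arbitrary vector in n-space
step1 = Vt @ v                       # rotate within n-space
step2 = Sigma @ step1                # stretch the first m axes, drop the rest
step3 = W @ step2                    # rotate within m-space
direct = X @ v                       # the same result in one step
```

The first step leaves the length of v unchanged, and step3 agrees with direct, which is all the factorization asserts; it does not by itself say what the vectors in n-space mean for the data.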

However, X isn't a transformation that is applied to data vectors, and
it is hard to imagine what vectors Vt would apply to. They would have
to be in n-space, but n-space doesn't have much meaning in the context
of finding correlations between the m data sets (one from each

> --------------------------------------------------------
> (...)

>> Furthermore, when I am seeking correlation between the m sensors,
>> it confounds me to think about why one would picture the data
>> points in n- space. As an analogy, if I am doing simple linear
>> regression on a cloud of 1000 points in the x-y plane, I don't try
>> to picture the data points in 1000-dimension space.

> Well, we might say, such a concept is superfluous, not needed. It
> just reflects a possibility which occurs when we look at the
> correlation matrix and its Cholesky factors. Say, with our m x n
> data matrix X (I use the apostrophe ' for transposition)
> R = X * X' / n // R is the m x m correlation-matrix
> then we have also with some rotation W
> Z = W * R * W' // Z = Sigma = diagonal
> but also, if we see R in its cholesky-factors L and L'
> Z = W * (L * L') * W' // Z = Sigma = diagonal
> and because any rotation-matrix t postmultiplied with its transpose
> is the identity
> Z = W * (L * I * L') * W' = W * (L * t * t' * L') * W'
> Now L is usually taken as m x m matrix as well, but there is no
> problem to expand it by empty columns to make a m x n matrix
> out of it and then to assume t such that
> L * t = X / sqrt(n)
> and then rewrite:
> Z = W * (L * t * t' * L') * W' = W * (X * t' * t * X')/n * W'
> where again (X * t' * t * X')/n = X * X' /n = R shows the
> identity of the solutions.
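The existence of such a t can be illustrated numerically. This is a sketch under assumptions: NumPy, random stand-in data with full row rank, and a QR factorization used to construct t (the names L_wide, d and t are made up here). R below equals X * X' / n as in the quote; it is a true correlation matrix only if the rows of X are centered and scaled beforehand.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 6
X = rng.standard_normal((m, n))

R = X @ X.T / n                          # m x m, as in the quoted derivation
L = np.linalg.cholesky(R)                # lower triangular, L @ L.T == R

# Expand L with n - m empty (zero) columns to an m x n matrix.
L_wide = np.hstack([L, np.zeros((m, n - m))])

# Build an orthogonal t with L_wide @ t = X / sqrt(n) from a QR factorization.
Q, Rq = np.linalg.qr(X.T / np.sqrt(n), mode="complete")
d = np.ones(n)
d[:m] = np.sign(np.diag(Rq))             # flip signs so the diagonal is positive,
t = (Q * d).T                            # matching the Cholesky convention

Z_inner = L_wide @ t @ t.T @ L_wide.T    # should reproduce R
```

With these choices L_wide @ t reproduces X / sqrt(n) and Z_inner reproduces R, which is the identity the quoted derivation relies on.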

>>> [24] t1 = gettrans(X,"Drei")
>>> t1 :
>>> 0.0856 0.0449 0.3898 0.6802 -0.4701 -0.3937
>>> 0.0929 0.0538 -0.1865 -0.1958 0.3348 -0.8963
>>> -0.8486 0.1986 0.1513 0.2615 0.3856 -0.0206
>>> -0.0516 -0.6916 -0.5339 0.4630 0.1392 0.0151
>>> 0.3812 -0.2498 0.5843 0.1452 0.6459 0.1125
>>> 0.3405 0.6441 -0.4049 0.4418 0.2858 0.1685

>> Sorry, I tried to google gettrans, but wasn't able to find much
>> beyond the fact that it is a column rotation. It's not clear to me
>> what is meant by that. Consequently, I wasn't able to follow the
>> rest of the example.

> With the given parameters X and "Drei" (="triangular") it calls the
> procedure which returns the rotation matrix that can rotate X to
> lower triangular shape. Having it stored as an explicit matrix, we
> can apply this rotation and also revert it, and further do anything
> we want with it.
> If you are using Windows, you can even download that MatMate program
> and do the steps yourself (and possibly experiment further). See my
> software-pages. It's an amateurish
> program, though it works nicely for me; if some installation
> problems occur (which is easily possible), let me know.


© The Math Forum at NCTM 1994-2018. All Rights Reserved.