Paul
Posts:
426
Registered:
2/23/10


Re: SVD for PCA: The right most rotation matrix
Posted:
Nov 4, 2012 2:04 PM


Hi, Gottfried,
Thanks again for taking the time to write such a detailed response. I gave it a quick twice-over, but it needs more than that. I'll get back to it soon, but I have some preliminary comments.
On Nov 2, 2:02 am, Gottfried Helms <he...@unikassel.de> wrote:
> it seems I made my comment more complicated than the procedure is.
>
> On 01.11.2012 20:53, Paul wrote:
>
>> I might be missing some linear algebra theory here, but I looked up
>> gettrans() and I'm not sure what is meant by a column rotation in
>> that context.
>
> No, gettrans is just a function call in my MatMate script language,
> which returns a rotation matrix. For instance, by the command:
>
> t1 = gettrans(X,"drei") // "drei" means "triangular"
>
> t1 becomes the rotation matrix, which is required to rotate
> column-wise...
I am still not sure what a column-wise rotation is. Do you actually switch columns around, or is it more like a geometric rotation?
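To make the second reading concrete: a geometric column-wise rotation can be pictured as multiplying X on the right by an orthogonal matrix, which mixes pairs of columns rather than swapping them. The following is only a minimal sketch under that assumption. The helper names `rotate_columns` and `dot` and the toy data are made up for illustration and are not part of MatMate.

```python
import math

def rotate_columns(X, i, j, theta):
    """Return X * t, where t is a plane rotation mixing columns i and j."""
    c, s = math.cos(theta), math.sin(theta)
    Y = [row[:] for row in X]          # copy, so X itself is left untouched
    for row in Y:
        a, b = row[i], row[j]
        row[i] = c * a - s * b         # new column i is a blend of old i and j
        row[j] = s * a + c * b         # new column j likewise
    return Y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two "sensors" (rows) observed at three samples (columns); hypothetical data.
X = [[3.0, 4.0, 0.0],
     [1.0, 2.0, 2.0]]
Y = rotate_columns(X, 0, 1, 0.7)

# The entries change, but row lengths and the inner product between rows do not.
print(dot(X[0], X[0]), dot(Y[0], Y[0]))
print(dot(X[0], X[1]), dot(Y[0], Y[1]))
```

No column is moved to another position; each row, seen as a vector in n-space, is turned by the same rotation, so the angles between the rows survive.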
> ...X to triangular shape. After that we can do the following
> with t1:
>
> Y = X * t1
> // Y is a lower triangular matrix (with possibly empty columns
> // to the right)
> Z = Y * t1'
> // Z now equals X, because t1*t1' = I (identity matrix)
>
> or, for doing rotation to principal components position:
>
> t2 = gettrans(X,"pc") // "pc" means "principal components"
>
> and then
>
> B = X * t2
> // the columns of B are now orthogonal, are the principal
> // components
>
> I've introduced that function "gettrans" additionally to the simple
> "rotate" function to have the rotation matrix available for later
> manipulation, or to be able to reverse a rotation later or to apply
> the same rotation to another matrix etc. It can also be made to work
> only on certain columns and using only certain rows for the
> criterion; this is then useful if one uses rotations which are
> implemented as iterative procedures like "pc" or "varimax" or
> similar.
>
>>> The key is that the n samples define m vectors in an
>>> n-dimensional euclidean space; simply each column of X can be seen
>>> as a spatial dimension. In that n-dimensional space there are m
>>> vectors, where the number m is smaller than n. Any rotation in
>>> that space repositions the vectors, but *not* the relation, or
>>> better: the angles, between them
>>
>> I'm not sure why *any* rotation in n-space would not preserve
>> angles. I thought that a rotation is by definition a unitary
>> transformation (from a recent brush-up on linear algebra at
>> Wikipedia e.g.
>> http://en.wikipedia.org/wiki/Orthogonal_matrix).
>
> My remark may be obfuscating here. There is the concept of "oblique
> rotations" in factor analysis (as opposed to orthogonal rotations)
> which do not preserve the angles - and I had the impulse to exclude
> this case verbally...
> So this remark could just be deleted
>
>>> ...So we can rotate the vector model X
>>> (column-wise) first such that
>>> sensor 1 defines the x-axis,
>>> sensor 2 and 1 define the xy-plane
>>> sensor 3 to 1 define the xyz-space
>>> and so on.
>
>> I don't quite follow what you mean by "rotat[ing] the vector
>> [model] X column-wise". If you interpret each column of X as a
>> point (or vector) in n-space, we get what you describe (sensor 1 is
>> the x-axis, sensor 2 is the y-axis, etc.). However, a rotation is
>> not needed for this.
>
> If we speak of the n-dimensional space, each column represents the
> coordinates on one axis. Then each row represents one vector
> (from the origin) to some point in this n-dimensional space: for
> each sensor there is one wire from the origin into the n-space,
> and the angles between those wires (more precisely: the cosines of
> those angles) are expressed by the correlation coefficients. That
> view of statistical data may be somehow unusual - but it is coherent
> with the operations of rotations and the finding of principal
> components - and this is what your matrix Vt stands for.
>
>>> In effect, that rotation provides a matrix X1 which is triangular
>>> with as many nonzero columns as the rank of the matrix is (and we
>>> assume for simplicity that it equals m)
>
>> I think I'm missing something fundamental...the data matrix is not
>> triangular, though the (n)x(n) covariance matrix (Xt)(X) is
>> symmetric.
>
> No, not the data matrix X. But after X is rotated to triangular
> position by t1 then
> X1 = X * t1
> is lower triangular (with some empty columns due to the defective
> rank of X)
What is meant by rotating to triangular position? Do you mean geometric position, or that X somehow becomes a triangular matrix by rearranging its columns? What if there are not enough properly placed zeros for that to be possible?
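For reference, one standard way such a "rotation to triangular position" can be produced is a sequence of plane (Givens) rotations on pairs of columns, each chosen to create one zero above the diagonal. No zeros need to be present beforehand and no columns are rearranged. This is a sketch under that assumption, not MatMate's actual gettrans code; `triangularize`, `matmul` and `transpose` are illustrative names, and the data are made up.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product A * B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def triangularize(X):
    """Return (Y, T) with T orthogonal and Y = X * T lower triangular."""
    m, n = len(X), len(X[0])
    Y = [row[:] for row in X]
    T = [[float(r == c) for c in range(n)] for r in range(n)]   # identity
    for i in range(min(m, n)):
        for j in range(i + 1, n):
            a, b = Y[i][i], Y[i][j]
            r = math.hypot(a, b)
            if r == 0.0:
                continue                      # nothing to eliminate
            c, s = a / r, b / r
            # Rotate columns i and j (of Y and of T) so that Y[i][j] becomes 0.
            for M in (Y, T):
                for row in M:
                    p, q = row[i], row[j]
                    row[i] = c * p + s * q
                    row[j] = -s * p + c * q
    return Y, T

# m = 3 sensors (rows), n = 4 samples (columns); hypothetical data.
X = [[1.0, 2.0, 2.0, 0.0],
     [3.0, 0.0, 4.0, 1.0],
     [0.0, 1.0, 1.0, 2.0]]
Y, T = triangularize(X)          # Y plays the role of X * t1
Z = matmul(Y, transpose(T))      # rotating back recovers X, since T * T' = I
for row in Y:
    print([round(v, 4) for v in row])
```

With m < n the columns to the right of the m-th come out empty, which matches the "empty columns to the right" in the quoted description.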
>>> Then the matrix X1 can be rotated to the position of their
>>> principal components (we're talking already of the nonzero columns
>>> only), let's call this X2
>>
>> I see that the data must be rotated so that the principal axes
>> align with the axes of m-space (not n-space), and then the diagonal
>> matrix Sigma performs the anisotropic axial stretching.
>
> No, again we rotate in the columns/the n-space. Just we apply the
> (costly because of iterations) rotation to orthogonality (which
> gives principal components) only to the first m axes in X1 (which is
> already triangular with only m significant columns)
>
> X2 = X1 * t2
> or equivalently
> X2 = X * t1 * t2 = X * (t1 * t2) = X * Vt
>
> After that X2 contains the coordinates of your sensor measures
> after rotation in the n-space in such a way that in the first
> column the sum of squared coordinates is the maximum possible
> and in the m'th column the least possible and because
> X2' * X2 is diagonal we may say that the columns are orthogonal
>
>>> Those two rotations together form your matrix Vt. After that, X2
>>> can be rotated by rotation of its rows to diagonal form - this is
>>> your rotation matrix W, which rotates for the principal components
>>> with respect to the rows in X2 (and which is the same as the
>>> rotation with respect to the rows in X).
>>
>> But W is not applied after Vt,
>
> ???
>
> If we have
> W * X * Vt
> we can also write
> W * (X * Vt)
> which is meant when I say that W is applied "after" the rotation by
> Vt in my example....
I got lost...the middle matrix should be Sigma, a diagonal matrix of singular values.
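One possible way to reconcile the two notations, assuming the usual SVD convention with orthogonal W (m x m) and V (n x n), is the identity below. This is only a reading of the quoted text, not something it states explicitly.

```latex
X = W \,\Sigma\, V^{T}
\quad\Longleftrightarrow\quad
W^{T} X \, V = \Sigma,
\qquad W^{T} W = I_m,\quad V^{T} V = I_n .
```

Under that reading, the "W" and "Vt" in the quoted product W * X * Vt would play the roles of W^T and V here, so Sigma is the result of the product rather than its middle factor.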
>> So the rotation by W is very intuitive to me, while the rotation by
>> Vt is not. And as I described, it's all the more mysterious when
>> you consider that X isn't actually a transformation that is applied
>> to data - it *is* the data.
>
> This remark "... isn't actually a transformation..." confuses now
> me. ;) Well, I understood X as data as well, I have no idea where
> the idea of "being a transformation" comes from and what I am
> possibly missing here. Very likely I didn't properly catch your way
> of approaching the problem...
That's the view of X = W Sigma Vt. Sigma is an anisotropic axial stretch, while W rotates these stretch axes to the principal components of the data in m-space. What is never explained is what Vt rotates. For the rotations and stretches to apply, X = W*Sigma*Vt must be viewed as a transformation applied to a vector (or a collection of column vectors), which means Vt is applied first, then Sigma, then W. W and Vt are orthogonal rotations.
However, X isn't a transformation that is applied to data vectors, and it is hard to imagine what vectors Vt would apply to. They would have to be in n-space, but n-space doesn't have much meaning in the context of finding correlations between the m data sets (one from each sensor).
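One candidate answer is that the vectors being rotated are the rows of X themselves: each sensor's n readings form one vector in n-space, and multiplying X on the right by an orthogonal V turns all m of them together until the columns of X*V are mutually orthogonal. The sketch below uses a one-sided (Hestenes-style) Jacobi iteration as a stand-in for an iterative "pc" rotation. It is an assumption-laden illustration with made-up names and data, not the MatMate implementation, and it does not sort the columns by size.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthogonalize_columns(X, sweeps=50, tol=1e-12):
    """Return (B, V) with V orthogonal, B = X * V, and B' * B diagonal."""
    n = len(X[0])
    B = [row[:] for row in X]
    V = [[float(r == c) for c in range(n)] for r in range(n)]
    scale = sum(x * x for row in X for x in row)   # overall size of the data
    for _ in range(sweeps):
        rotated = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                alpha = sum(row[i] * row[i] for row in B)
                beta = sum(row[j] * row[j] for row in B)
                gamma = sum(row[i] * row[j] for row in B)
                if abs(gamma) <= tol * scale:
                    continue                       # columns i, j already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Same plane rotation applied to the data and accumulated in V.
                for M in (B, V):
                    for row in M:
                        p, q = row[i], row[j]
                        row[i] = c * p - s * q
                        row[j] = s * p + c * q
        if not rotated:
            break
    return B, V

# m = 3 sensors (rows), n = 4 samples (columns); hypothetical data.
X = [[2.0, 0.0, 1.0, 3.0],
     [1.0, 1.0, 0.0, 2.0],
     [0.0, 2.0, 1.0, 1.0]]
B, V = orthogonalize_columns(X)
G = matmul(transpose(B), B)        # diagonal up to rounding
print([round(G[k][k], 6) for k in range(len(G))])
```

The diagonal of B' * B holds the squared singular values of X, with one entry near zero here because three rows can span at most three dimensions. The inner products between sensors, X * X', are the same before and after the rotation.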
> (...)
>
>> Furthermore, when I am seeking correlation between the m sensors,
>> it confounds me to think about why one would picture the data
>> points in n space. As an analogy, if I am doing simple linear
>> regression on a cloud of 1000 points in the x-y plane, I don't try
>> to picture the data points in 1000-dimension space.
>
> Well, we might say such a concept is superfluous, not needed. It
> just reflects a possibility which occurs when we look at the
> correlation matrix and its Cholesky factors. Say, with our m x n
> data matrix X (I use the ' apostrophe for transposition)
>
> R = X * X' / n // R is the m x m correlation matrix
>
> then we have also with some rotation W
>
> Z = W * R * W' // Z = Sigma = diagonal
>
> but also, if we see R in its Cholesky factors L and L'
>
> Z = W * (L * L') * W' // Z = Sigma = diagonal
>
> and because any rotation matrix t postmultiplied with its transpose
> is the identity
>
> Z = W * (L * I * L') * W' = W * (L * t * t' * L') * W'
>
> Now L is usually taken as m x m matrix as well, but there is no
> problem to expand it by empty columns to make a m x n matrix
> out of it and then to assume t such that
>
> L * t = X / sqrt(n)
>
> and then rewrite:
>
> Z = W * (L * t * t' * L') * W' = W * (X * t' * t * X')/n * W'
>
> where again (X * t' * t * X')/n = X * X' /n = R shows the
> identity of the solutions.
>
>>> [24] t1 = gettrans(X,"Drei")
>>> t1 :
>>> 0.0856 0.0449 0.3898 0.6802 0.4701 0.3937
>>> 0.0929 0.0538 0.1865 0.1958 0.3348 0.8963
>>> 0.8486 0.1986 0.1513 0.2615 0.3856 0.0206
>>> 0.0516 0.6916 0.5339 0.4630 0.1392 0.0151
>>> 0.3812 0.2498 0.5843 0.1452 0.6459 0.1125
>>> 0.3405 0.6441 0.4049 0.4418 0.2858 0.1685
>
>> Sorry, I tried to google gettrans, but wasn't able to find much
>> beyond the fact that it is a column rotation. It's not clear to me
>> what is meant by that. Consequently, I wasn't able to follow the
>> rest of the example.
>
> With the given parameters X and "Drei" (="triangular") it calls the
> procedure which returns that rotation matrix which can rotate X to
> lower triangular shape. Having it stored as an explicit matrix we
> can apply this rotation and also revert it and further do anything
> we want with it.
>
> If you are using Windows, you can even download that MatMate program
> and do the steps yourself (and possibly experiment further). See my
> software pages http://go.helmsnet.de/sw/matmate. It's an amateurish
> program, however working nice for me, but if some installation
> problems occur (which is easily possible) let me know.
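The Cholesky step in the quoted argument can be checked numerically without MatMate. The sketch below forms R = X * X' / n for a small made-up X, computes a lower-triangular L by the textbook Cholesky recursion, and confirms L * L' = R. The names `cholesky`, `matmul` and `transpose` are illustrative, and since the toy rows are not centered or standardized, R here is a scaled cross-product matrix rather than a true correlation matrix.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cholesky(R):
    """Lower-triangular L with L * L' = R, for symmetric positive-definite R."""
    m = len(R)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

# m = 3 sensors (rows), n = 4 samples (columns); hypothetical data.
X = [[2.0, 0.0, 1.0, 3.0],
     [1.0, 1.0, 0.0, 2.0],
     [0.0, 2.0, 1.0, 1.0]]
n = len(X[0])
R = [[v / n for v in row] for row in matmul(X, transpose(X))]
L = cholesky(R)
LLt = matmul(L, transpose(L))      # should reproduce R up to rounding
for row in L:
    print([round(v, 4) for v in row])
```

L (padded with zero columns to m x n) and X / sqrt(n) have the same row-by-row inner products, which is why an orthogonal t with L * t = X / sqrt(n) exists.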

