Topic: Determine relative importance of original variables after performing PCA
Replies: 6   Last Post: Jan 28, 2013 6:48 AM

Greg Heath

Re: Determine relative importance of original variables after performing PCA
Posted: Jan 28, 2013 6:48 AM

PLEASE DO NOT TOP-POST: 'IT IS CONSIDERED A HEINOUS BREACH OF GOOGLE GROUP ETIQUETTE TO POST REPLIES ABOVE A PREVIOUS POST.'

"Maureen " <maureen_510@hotmail.com> wrote in message <ke4rlc$lvc$1@newscl01ah.mathworks.com>...
> Initially, I am interested in dimension reduction as I wanted to reduce the plot down from 27 to either a 2 or 3 dimensional plot, that was why I decided to use PCA.

PCA ranks orthogonal linear combinations of the variables (the principal components) according to spread, i.e., variance. However, if you are not using techniques that depend on that ranking, you cannot expect the display to tell you more than the practical dimensionality of the data and which combinations of variables are nearly linearly dependent (those with negligibly small singular values).

If you are more interested in correlations among variables, then standardize the original variables to have zero mean and unit variance. The resulting covariance matrix is then the correlation coefficient matrix. In addition to providing the correlation information, projections onto the new PC planes may yield useful info.
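
A minimal sketch of that standardization, in case it helps. The data matrix X here is made up (200 observations of 5 variables on very different scales), the variable names are mine, and the implicit expansion in the arithmetic assumes R2016b or later:

```matlab
% Hypothetical data: 200 observations of 5 variables on very different scales
rng(0);                                   % reproducible random numbers
X = randn(200, 5) .* [1 10 100 0.1 5];    % implicit expansion (R2016b or later)

% Standardize each column to zero mean and unit variance
Z = (X - mean(X, 1)) ./ std(X, 0, 1);

% The covariance of the standardized data is the correlation matrix of X
R = cov(Z);

% PCA of the standardized data via the SVD: columns of V are the PC directions
[~, S, V] = svd(Z, 'econ');
pcVar  = diag(S).^2 / (size(Z, 1) - 1);   % variance along each PC, largest first
scores = Z * V(:, 1:2);                   % projection onto the first two PCs
```

R should agree with corrcoef(X) to rounding error, and plot(scores(:,1), scores(:,2), '.') gives the 2-D display.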

> I also understand that in PCA the orthogonal transformed input gives the most spread and I thought it could be helpful in visualizing my data with maximum spread on the input variables. I am not doing any form of classification, just to clarify, I do not have classes in which I hope my data will sit into.

Then use unsupervised clustering. It can tell you if classes of data appear to be present.
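
As one possible illustration only: a bare-bones k-means (Lloyd's algorithm) written out by hand so that it needs no toolbox. K-means is just one of many unsupervised clustering methods. The data, the number of clusters k, and the crude initialization below are all assumptions made for the sketch:

```matlab
% Hypothetical data: two well-separated groups in 2-D
rng(0);
X = [randn(50, 2); randn(50, 2) + 10];
k = 2;                                         % assumed number of clusters

C = X(round(linspace(1, size(X, 1), k)), :);   % initial centroids: evenly spaced rows of X
for iter = 1:100
    % Squared Euclidean distance from every point to every centroid
    D = zeros(size(X, 1), k);
    for j = 1:k
        D(:, j) = sum((X - C(j, :)).^2, 2);
    end
    [~, idx] = min(D, [], 2);                  % assign each point to its nearest centroid

    % Move each centroid to the mean of its assigned points
    Cnew = C;
    for j = 1:k
        if any(idx == j)
            Cnew(j, :) = mean(X(idx == j, :), 1);
        end
    end
    if isequal(Cnew, C), break; end            % converged: centroids no longer move
    C = Cnew;
end
```

If you have the Statistics Toolbox, kmeans(X, k) does the same job with better initialization. Either way, repeat for several values of k and see whether any grouping is stable.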

> But after plotting, I realised some overfit issue and I thought maybe I used too many input variables.

You don't have to guess. Check the condition number and rank of the data matrix:

help cond
doc cond
help rank
doc rank
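
For example, a small sketch with made-up data in which the fourth variable is an exact sum of the first two, so that one column is redundant (the matrix and names are hypothetical; implicit expansion assumes R2016b or later):

```matlab
% Hypothetical data: 4 variables, the last an exact sum of the first two
rng(0);
A = randn(100, 3);
X = [A, A(:, 1) + A(:, 2)];

Xc = X - mean(X, 1);        % center the columns

s = svd(Xc);                % singular values, largest first
r = rank(Xc);               % number of singular values above the default tolerance
c = cond(Xc);               % ratio of largest to smallest singular value

fprintf('singular values: %s\n', mat2str(s', 4));
fprintf('rank = %d of %d columns, cond = %.3g\n', r, size(Xc, 2), c);
```

Here rank reports 3 for 4 columns and cond is enormous, which says one variable can be dropped without losing anything. With real data the dependence is rarely exact, so look for singular values that are negligible relative to the largest.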

> Thus, I decided to remove some of the variables, but I do not know which constitute more and which are less significant in which I can remove. I tried by removing the variables that produce smaller projection on the plot and the result did not seem to improve, instead worsen.

What result? If you can quantify a goal or a bound for the result, then maybe we can help.

> Hence, I thought maybe I should find out which variables contribute most to the first 2 PCs for a 2D plot.
>
> Is my line of thoughts right?


I don't know what you are looking for. If you want to rank variables according to spread, just sort var(X), where size(X,2) = 27, or equivalently sort the diagonal of the covariance matrix, diag(cov(X)).
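
Something along these lines, with a made-up 27-column X standing in for your data (implicit expansion assumes R2016b or later):

```matlab
rng(0);
X = randn(500, 27) .* linspace(0.5, 5, 27);   % hypothetical 27-variable data

v = var(X);                                   % 1-by-27 row of column variances
[vSorted, order] = sort(v, 'descend');        % order(1) is the widest-spread variable

% The same variances sit on the diagonal of the covariance matrix
vFromCov = diag(cov(X))';
```

Keep in mind that this ranking depends on the units of each variable, so it is only meaningful if the variables are on comparable scales.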

> So if that is right, will STEPWISEFIT as mentioned earlier in the discussion, help in finding the variable importance? Or would some other method be more effective?

STEPWISEFIT is for linear regression. If you have no specified output variables or classes, it is of no use.

It might help if you

1. Sort the variables w.r.t. variance
2. Explain what each variable does in real life
3. Post the 27 sorted variances and the resulting correlation coefficient matrix in a form suitable for cutting and pasting.
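
One way to produce paste-friendly output for steps 1 and 3 is mat2str. A sketch, using a small made-up X in place of the 27-column data (implicit expansion assumes R2016b or later):

```matlab
rng(0);
X = randn(300, 4) .* [3 1 10 0.5];     % hypothetical stand-in for the 27-column data

[vSorted, order] = sort(var(X), 'descend');
R = corrcoef(X(:, order));             % correlations with variables in sorted order

% mat2str produces text that can be pasted straight back into MATLAB
disp(['order   = ' mat2str(order)]);
disp(['vSorted = ' mat2str(vSorted, 4)]);
disp(['R = ' mat2str(R, 3)]);
```

The second argument of mat2str limits the number of significant digits, which keeps a 27-by-27 matrix readable in a post.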

Greg


