Newsgroup: comp.soft-sys.matlab

Topic: Determine relative importance of original variables after performing PCA
Replies: 6   Last Post: Jan 28, 2013 6:48 AM

Ilya Narsky

Posts: 133
Registered: 11/7/08
Re: Determine relative importance of original variables after performing PCA
Posted: Jan 24, 2013 6:08 PM

"Maureen " <maureen_510@hotmail.com> wrote in message
news:kdrm1o$e57$1@newscl01ah.mathworks.com...
> I have 350 observations and 27 variables. I want to use PCA for
> dimension reduction, to plot the 350 observations on a 2D plot,
> which effectively means that I will only be using PC1 and PC2. My purpose
> is just to see their relationship on a 2D plot.
>
> But how do I determine which of my original variables contribute most to
> the first two principal components, and which variables are less
> important and can be discarded? I have seen many similar posts online but
> have not found a solution. Where should I go from here?
> I have read through the documentation on feature selection, and some
> people suggested using stepwisefit and other regression methods. I do not
> have much background in regression, so do correct me if I am wrong.
> Based on my reading, I believe I would need a set of criteria to
> select the features, and I do not know what those criteria should be.
> Also, stepwisefit requires a set of outputs, Y. But in my case, all 27
> variables are my features, which are the inputs, so to speak, and I do
> not have a set of outputs.
>
> So if not using regression, where do I go from here, so that I
> can determine the importance of my original set of variables? In other
> words, I need to find the contribution of the original variables to PC1
> and PC2.
>
> I would appreciate any help or suggestions. Thanks in advance!
>


PCA is not really suited for variable selection. The typical workflow is to
discard principal components, not the original variables. The simplest
approach is to keep just enough components that the cumulative percentage
of the total variance they explain (see the 5th output, explained, of the
pca function) exceeds a chosen threshold, something like 70-90%. More
refined approaches to choosing the number of components can be found in
popular textbooks and review articles.
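The component-count rule above can be sketched in a few lines. This is a minimal Python sketch, not MATLAB code, and it assumes you already have the per-component percentages in the form the pca function returns as its 5th output (explained: percentages, in decreasing order, summing to about 100). The function name `n_components_for` and the example percentages are hypothetical.

```python
from itertools import accumulate


def n_components_for(explained, threshold=80.0):
    """Return the smallest k such that the first k components explain
    at least `threshold` percent of the total variance.

    `explained` is a sequence of per-component percentages, assumed to be
    sorted in decreasing order and to sum to roughly 100.
    """
    if not 0 < threshold <= 100:
        raise ValueError("threshold must be a percentage in (0, 100]")
    # Walk the running total of explained variance, component by component.
    for k, cumulative in enumerate(accumulate(explained), start=1):
        if cumulative >= threshold:
            return k
    # Floating-point round-off can leave the total just under 100.
    return len(explained)


if __name__ == "__main__":
    # Made-up percentages, for illustration only.
    explained = [45.0, 25.0, 12.0, 8.0, 6.0, 4.0]
    for t in (70, 80, 90):
        print(t, n_components_for(explained, t))
```

With these made-up numbers the cumulative totals are 45, 70, 82, 90, 96, 100, so thresholds of 70, 80 and 90 keep 2, 3 and 4 components. In MATLAB itself the same rule is roughly `find(cumsum(explained) >= 80, 1)`.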

Here is a related thread that might help you:
http://www.mathworks.com/matlabcentral/answers/49134-determining-variables-that-contribute-to-principal-components
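If the goal is only to describe PC1 and PC2, not to select variables, one common reading of "contribution" (offered here as an assumption, not as the method in the linked thread) is the squared loading. The columns of coeff, the 1st output of pca, have unit length, so squaring the entries of one column gives each original variable's share of that component, and those shares sum to 1. The Python sketch below takes such a coefficient matrix as a plain list of rows; the function names and example numbers are hypothetical.

```python
def squared_loadings(coeff, component):
    """Share of one principal component attributable to each variable.

    `coeff` is a list of rows (one row per original variable, one column
    per component), laid out like the 1st output of MATLAB's pca. Each
    column is assumed to have unit length, so its squared entries sum to 1.
    `component` is a 0-based column index.
    """
    return [row[component] ** 2 for row in coeff]


def rank_variables(coeff, component):
    """Variable indices (0-based) sorted by decreasing squared loading."""
    shares = squared_loadings(coeff, component)
    return sorted(range(len(shares)), key=lambda i: shares[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical 3-variable, 2-component coefficient matrix with
    # orthonormal columns.
    coeff = [
        [0.8, 0.0],
        [0.6, 0.0],
        [0.0, 1.0],
    ]
    print(squared_loadings(coeff, 0))  # per-variable shares of PC1
    print(rank_variables(coeff, 0))    # variables ordered by PC1 share
```

A large share only describes what a component is made of; it does not by itself justify discarding the remaining variables.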

There are research papers on variable selection for PCA, and many more on
unsupervised variable selection, which is what you want since you do not
have a response Y. A web search will turn them up. These methods are not
available in official MATLAB products. There may be something on the File
Exchange; I have not checked.

-Ilya






© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.