Topic: Determine relative importance of original variables after performing PCA
Replies: 6   Last Post: Jan 28, 2013 6:48 AM

 Ilya Narsky Posts: 145 Registered: 11/7/08
Re: Determine relative importance of original variables after performing PCA
Posted: Jan 24, 2013 6:08 PM

"Maureen " <maureen_510@hotmail.com> wrote in message
news:kdrm1o$e57$1@newscl01ah.mathworks.com...
> I have 350 observations and 27 variables. I want to use PCA for
> dimension reduction so that I can plot the 350 observations on a 2D plot,
> which effectively means that I will only be using PC1 and PC2. My purpose
> is just to see their relationship on a 2D plot.
>
> But how do I determine which of my original variables contribute most to
> the first two principal components, and which variables are less
> important and can be discarded? I have seen many similar posts online but
> have not come up with a solution. Where should I go from here?
> I have read through the documentation on feature selection, and some
> people suggested using stepwisefit and other regression methods. I do not
> have much background in regression, so do correct me if I am wrong.
> Based on my reading, I believe I would need a set of criteria to
> select the features, and I have no idea what those criteria should
> be. Also, stepwisefit needs a set of outputs, Y. But in my case, all
> 27 variables are my features, the inputs so to speak, and I do not
> have a set of outputs.
>
> So if not using regression, where do I go from here so that I
> can determine the importance of my original set of variables? In other
> words, I need to find the contribution of the original variables to PC1
> and PC2.
>
> Appreciate any help or suggestions. Thanks in advance!
>

PCA is not really suited for variable selection. The typical workflow is to
discard principal components, not the original variables. The simplest
approach is to keep enough components that the cumulative percentage of the
total variance they explain (the cumulative sum of the 5th output of the pca
function, which is reported in percent) exceeds a chosen threshold, something
like 70-90%. Better approaches can be found in popular textbooks and review
articles.
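
As a rough illustration of the cumulative-variance rule above, here is a minimal sketch in Python with NumPy rather than MATLAB. The function name components_for_variance, the 0.8 threshold, and the synthetic 350 x 27 matrix are all assumptions made for the example; none of them comes from the original question or from any toolbox.

```python
import numpy as np

def components_for_variance(X, threshold=0.8):
    """Smallest number of principal components whose cumulative
    explained-variance fraction reaches `threshold` (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable (column)
    # Squared singular values of the centered data are proportional to
    # the variances along the principal components, largest first.
    s = np.linalg.svd(Xc, compute_uv=False)
    variances = s**2 / (X.shape[0] - 1)
    explained = variances / variances.sum()  # fraction per component
    cumulative = np.cumsum(explained)
    # First position where the cumulative fraction reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, cumulative.size)              # guard against round-off at 1.0
    return k, cumulative

# Synthetic stand-in for a 350 x 27 data matrix (assumed, not real data):
# three hidden factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(350, 3))
X = latent @ rng.normal(size=(3, 27)) + 0.1 * rng.normal(size=(350, 27))

k, cumulative = components_for_variance(X, threshold=0.8)
print("components kept:", k)
print("cumulative fraction, first 5:", cumulative[:5])
```

If the fraction reached by the first two components is well below the chosen threshold, a 2D plot of PC1 against PC2 leaves out a large share of the variation in the data.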

There are research papers on variable selection for PCA, and there are many
more papers on unsupervised variable selection, which is what you want since
you do not have a response Y. Google them if you'd like. These methods are
not implemented in official MATLAB products. There may be something on the
File Exchange; I have not checked.

-Ilya
