Re: Determine relative importance of original variables after performing PCA
Posted: Jan 24, 2013 6:08 PM


"Maureen " <maureen_510@hotmail.com> wrote in message news:kdrm1o$e57$1@newscl01ah.mathworks.com...
> I have 350 observation and 27 variables. So I want to use PCA for
> dimension reduction purpose to plot the 350 observation on a 2D plot,
> which effectively means that I will only be using PC1 and PC2. My purpose
> is just to see their relationship on a 2D plot.
>
> But how do I determine which of my original variables contribute most to
> the first two principle components and which of the variables are less
> important in which I can discard? I have saw many similar post online but
> have not come up with a solution. Where should I go from here?
> I have read through the documentation on feature selection, and some
> people suggested using stepwisefit and other regression methods. I do not
> have much background with regression, so do correct me if I am wrong.
> Based on my readings, I believe I would need to have a set of criteria to
> select the features, in which I do not have an idea what should the
> criteria be. Also there should be a set of output, Y in order to perform
> stepwisefit. But for my case, all 27 variables are my features, which is
> the input so to speak and I do not have a set of output.
>
> So if not using regression, may I know where do I go from here, so that I
> can determine the importance of my original set of variables? In other
> words, I need to find the contribution of the original variables to PC1
> and PC2.
>
> Appreciate any help/ suggestion. Thanks in advance!

PCA is not really suited for variable selection. The typical workflow is to discard principal components, not the original variables. The simplest approach is to keep enough components that the cumulative percentage of the total variance they explain (see the 5th output of the pca function) exceeds a chosen threshold, something like 0.7 to 0.9 of the total. Better approaches can be found in popular textbooks and review articles.
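A rough sketch of the cumulative-variance approach, not a definitive recipe. It assumes a MATLAB release where pca is available in the Statistics Toolbox. The matrix X (random placeholder data standing in for the 350-by-27 data set), the zscore standardization step, and the 80% threshold are all assumptions made for illustration.

```matlab
% Placeholder data: 350 observations (rows) by 27 variables (columns).
% Replace X with the real data matrix.
rng(0);                               % make the placeholder data reproducible
X = randn(350, 27);

% Standardizing the columns is an assumption made here so that no variable
% dominates purely because of its units; skip zscore if scales are comparable.
[coeff, score, ~, ~, explained] = pca(zscore(X));

% The 5th output, explained, holds the percentage (0 to 100) of total
% variance explained by each principal component.
cumExplained = cumsum(explained);

% Hypothetical threshold: keep enough components to explain 80% of variance.
threshold = 80;
numComponents = find(cumExplained >= threshold, 1);

fprintf('Components needed for %g%% of variance: %d\n', threshold, numComponents);
fprintf('PC1 and PC2 together explain %.1f%%\n', cumExplained(2));
```

If cumExplained(2) is well below the threshold, a 2D plot of score(:,1) against score(:,2) leaves out much of the variation in the data. The columns coeff(:,1) and coeff(:,2) hold the coefficients of the original variables in PC1 and PC2, which is one way to see what each variable contributes to those two components, though as noted above that is not the same as a variable selection method.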
Here is a related thread that might help you: http://www.mathworks.com/matlabcentral/answers/49134determiningvariablesthatcontributetoprincipalcomponents
There are research papers on variable selection for PCA, and many more on unsupervised variable selection, which is what you want since you do not have a response Y. Google them if you'd like. These methods are not available in official MATLAB products. There may be something on the File Exchange; I have not checked.
Ilya



