"Maureen " <email@example.com> wrote in message <firstname.lastname@example.org>...
> I have 350 observations and 27 variables. I want to use PCA for dimension reduction so that I can plot the 350 observations on a 2D plot, which effectively means that I will only be using PC1 and PC2. My purpose is just to see their relationship on a 2D plot.
>
> But how do I determine which of my original variables contribute most to the first two principal components, and which of the variables are less important so that I can discard them? I have seen many similar posts online but have not come up with a solution. Where should I go from here?
You have not indicated

1. whether the task is classification or regression
2. whether any of the 27 are outputs
3. the number of output variables
Item 1 is the most important because PCA is inappropriate for classification. Therefore, I'll assume the task is regression.
Item 2 is the next most important because PCA is only used to transform the input space. Therefore, I'll assume 27 original input variables.
Item 3 is still important because it affects which algorithms/techniques should be used.
> I have read through the documentation on feature selection, and some people suggested using stepwisefit and other regression methods.
Yes. The best criterion to use is one that optimizes a specific function of the output variables.
> I do not have much background with regression, so do correct me if I am wrong. Based on my readings, I believe I would need a set of criteria to select the features, but I have no idea what the criteria should be.
If it's regression, it is simple: just read the STEPWISEFIT documentation.
If it is classification, then you should not be using PCA because there is no reason why PCA space should be preferred over the original.
> Also, there should be a set of outputs, Y, in order to perform stepwisefit. But in my case, all 27 variables are my features, which are the inputs so to speak, and I do not have a set of outputs.
>
> So if not using regression, may I know where I go from here, so that I can determine the importance of my original set of variables? In other words, I need to find the contribution of the original variables to PC1 and PC2.
>
> Appreciate any help/suggestions. Thanks in advance!
If you don't know what you want to optimize, then there is no reason to use PCA over the original variables.
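That said, the literal question does have a mechanical answer, sketched here with heavy caveats. The contribution of each original variable to a principal component is its coefficient (loading) in that component's unit-length eigenvector of the covariance matrix. In MATLAB these coefficients are the columns of the first output of pca (princomp in older releases). The Python below is only an illustration that avoids any toolbox dependency. The function names, the 5-by-3 toy data, and the use of power iteration with deflation are all assumptions made for the example, not anything from the original post.

```python
import math

def covariance_matrix(rows):
    """Sample covariance (divides by n - 1) of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(mat, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix,
    found by plain power iteration."""
    p = len(mat)
    v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-symmetric start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigval = sum(v[i] * mv[i] for i in range(p))  # Rayleigh quotient
    return eigval, v

def top_components(rows, k=2):
    """Return [(explained_variance_fraction, loadings), ...] for the first k PCs."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total_var = sum(cov[i][i] for i in range(p))
    result = []
    for _ in range(k):
        val, vec = leading_eigenpair(cov)
        result.append((val / total_var, vec))
        # Deflation: remove the component just found so the next pass finds the next PC.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return result

def rank_variables(loadings):
    """Variable indices ordered from largest to smallest absolute loading."""
    return sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)

# Hypothetical toy data: variables 0 and 1 move together, variable 2 is small noise.
rows = [
    [2.0, 1.9, 0.1],
    [4.0, 4.1, -0.1],
    [6.0, 5.8, 0.2],
    [8.0, 8.2, -0.2],
    [10.0, 9.9, 0.0],
]

components = top_components(rows, k=2)
for idx, (frac, loadings) in enumerate(components, start=1):
    print(f"PC{idx}: explains {frac:.1%} of the variance; "
          f"loadings {[round(x, 3) for x in loadings]}; "
          f"variables ranked {rank_variables(loadings)}")
```

If the variables are on different scales, standardize each column first (equivalent to PCA on the correlation matrix); otherwise the large-variance variables dominate the loadings. Also note that a small loading on PC1 and PC2 only says the variable contributes little to those two directions. It does not show that the variable is unimportant for any particular goal.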
What do you want to do with the data? What is your ultimate goal?
P.S. I want to do carpentry with two tools. Which 2 should I use?