On Mar 14, 2:46 pm, "Aaronne " <ggyy...@hotmail.com> wrote:
> Hi Smart Guys,
>
> I have got the data (can be downloaded here: [enter link description here]) and tried to run a simple LDA-based classification using the 11 features stored in the dataset, i.e., F1, F2, ..., F11.
>
> Here is some code I wrote in Matlab using only 2 features. May I ask some questions based on this code, please?
>
>     clc; clf; clear all; close all;
>
>     %% Load the extracted features
>     features = xlsread('ExtractedFeatures.xls');
>     numSamples = 23;   % number of observations (15 Good + 8 bad)
>
>     %% Define ground truth
>     groundTruthGroup = cell(numSamples, 1);
>     groundTruthGroup(1:15) = cellstr('Good');
>     groundTruthGroup(16:end) = cellstr('bad');
>
>     %% Select features
>     featureSelected = [features(:,3), features(:,9)];
>
>     %% Run LDA
>     [ldaClass, ldaResubErr] = classify(featureSelected, featureSelected, groundTruthGroup, 'linear');
>     bad = ~strcmp(ldaClass, groundTruthGroup);
>     ldaResubErr2 = sum(bad) / numSamples;
>
>     [ldaResubCM, grpOrder] = confusionmat(groundTruthGroup, ldaClass);
>
>     %% Scatter plot
>     gscatter(featureSelected(:,1), featureSelected(:,2), groundTruthGroup, 'rgb', 'osd');
>     xlabel('Feature 3');
>     ylabel('Feature 9');
>     hold on;
>     plot(featureSelected(bad,1), featureSelected(bad,2), 'kx');
>     hold off;
>
>     %% Leave-one-out cross-validation
>     leaveOneOutPartition = cvpartition(numSamples, 'leaveout');
>     ldaClassFun = @(xtrain, ytrain, xtest) classify(xtest, xtrain, ytrain, 'linear');
>     ldaCVErr = crossval('mcr', featureSelected, ...
>         groundTruthGroup, 'predfun', ldaClassFun, 'partition', leaveOneOutPartition);
>
>     %% Display the results
>     clc;
>     disp('______________________________________ Results ______________________________________________________');
>     disp(' ');
>     disp(sprintf('Resubstitution Error of LDA (Training Error calculated by Matlab built-in): %.4f', ldaResubErr));
>     disp(sprintf('Resubstitution Error of LDA (Training Error calculated manually): %.4f', ldaResubErr2));
>     disp(' ');
>     disp('Confusion Matrix:');
>     disp(ldaResubCM);
>     disp(sprintf('Cross Validation Error of LDA (Leave One Out): %.4f', ldaCVErr));
>     disp(' ');
>     disp('______________________________________________________________________________________________________');
>
> I. My first question is how to do feature selection, for example using forward or backward feature selection, or t-test based methods?
>
> I have checked that Matlab has the `sequentialfs` method, but I am not sure how to incorporate it into my code.
>
> II. How do I use the Matlab `classify` method to do a classification with more than 2 features? Should we perform PCA first? For example, we currently have 11 features; do we run PCA to produce 2 or 3 PCs and then run the classification? (I am expecting to write a loop that adds features one by one to do a forward feature selection, not just run PCA for dimension reduction.)
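On question I: `sequentialfs` can wrap `classify` directly. A minimal sketch (untested, and it assumes columns 1:11 of `features` hold F1..F11):

    % Sketch only: forward feature selection with sequentialfs, scoring each
    % candidate subset by its leave-one-out misclassification count.
    X = features(:, 1:11);          % assumption: F1..F11 are columns 1:11
    y = groundTruthGroup;
    cvp = cvpartition(size(X,1), 'leaveout');
    critFun = @(xtrain, ytrain, xtest, ytest) ...
        sum(~strcmp(ytest, classify(xtest, xtrain, ytrain, 'linear')));
    [selected, history] = sequentialfs(critFun, X, y, 'cv', cvp);
    disp(find(selected));           % column indices of the chosen features

The default direction is forward; pass 'direction', 'backward' for backward elimination. With only 23 samples, be aware of selection bias if you also report the same leave-one-out error as your final performance estimate.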
I don't know why you think PCA should even be considered for classification dimensionality reduction.
PCA chooses the directions along which the data have the most variance, not the directions that best separate the clustered subclasses from each other.
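A toy example makes this concrete (a sketch with made-up data): two classes share a high-variance direction but are separated only along a low-variance one, so the first PC mixes the classes completely.

    % Two classes: large shared spread in dimension 1, separation only in
    % the low-variance dimension 2. PC1 tracks the shared spread and mixes
    % the classes; the separating axis ends up in the last component.
    A = [randn(100,1)*5, randn(100,1)*0.3 + 1];
    B = [randn(100,1)*5, randn(100,1)*0.3 - 1];
    X = [A; B];
    coeff = princomp(X);     % 'pca' in newer Matlab releases
    disp(coeff(:,1)');       % close to [1 0]: high variance, no class information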
You are probably better off clustering the mixture, or each class separately, and then using either LDA with a truncated/regularized pinv(Sw)*Sb, or PLSREGRESS.
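For the two-class case the pinv(Sw)*Sb route is only a few lines; a sketch (untested), reusing the two features from the code above:

    % Sketch: two-class Fisher discriminant via pinv(Sw)*Sb. A truncated or
    % Tikhonov-regularized pseudo-inverse can be swapped in for pinv when
    % Sw is ill-conditioned (likely with 11 features and only 23 samples).
    X  = [features(:,3), features(:,9)];    % same two features as above
    iG = strcmp(groundTruthGroup, 'Good');
    Xg = X(iG, :);    Xb = X(~iG, :);
    mg = mean(Xg, 1); mb = mean(Xb, 1); m = mean(X, 1);
    Sw = (Xg - repmat(mg, size(Xg,1), 1))' * (Xg - repmat(mg, size(Xg,1), 1)) + ...
         (Xb - repmat(mb, size(Xb,1), 1))' * (Xb - repmat(mb, size(Xb,1), 1));
    Sb = size(Xg,1) * (mg - m)' * (mg - m) + size(Xb,1) * (mb - m)' * (mb - m);
    [V, D] = eig(pinv(Sw) * Sb);
    [junk, k] = max(real(diag(D)));
    w = real(V(:, k));                      % discriminant direction
    scores = X * w;                         % 1-D projection to threshold

Writing it out this way also makes it obvious where to regularize: replace pinv(Sw) with pinv of a rank-truncated Sw, or with inv(Sw + lambda*eye(size(Sw))) for some small lambda.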