
Re: How to do a classification using Matlab?
Posted: Mar 15, 2013 9:23 AM


On 3/14/2013 2:46 PM, Aaronne wrote:
> Hi Smart Guys,
>
> I have got the data (can be downloaded here: [enter link description
> here][1]) and tried to run a simple LDA based classification based on
> the 11 features stored in the dataset, i.e., F1, F2, ..., F11.
>
> Here I wrote some code in Matlab using only 2 features. May I ask
> some questions based on the code I have got, please?
>
>     clc; clf; clear all; close all;
>     %% Load the extracted features
>     features = xlsread('ExtractedFeatures.xls');
>     numFeatures = 23;
>     %% Define ground truth
>     groundTruthGroup = cell(numFeatures,1);
>     groundTruthGroup(1:15) = cellstr('Good');
>     groundTruthGroup(16:end) = cellstr('bad');
>     %% Select features
>     featureSelcted = [features(:,3), features(:,9)];
>     %% Run LDA
>     [ldaClass, ldaResubErr] = classify(featureSelcted(:,1:2), featureSelcted(:,1:2), groundTruthGroup, 'linear');
>     bad = ~strcmp(ldaClass,groundTruthGroup);
>     ldaResubErr2 = sum(bad)/numFeatures;
>     [ldaResubCM,grpOrder] = confusionmat(groundTruthGroup,ldaClass);
>     %% Scatter plot
>     gscatter(featureSelcted(:,1), featureSelcted(:,2), groundTruthGroup, 'rgb', 'osd');
>     xlabel('Feature 3');
>     ylabel('Feature 9');
>     hold on;
>     plot(featureSelcted(bad,1), featureSelcted(bad,2), 'kx');
>     hold off;
>     %% Leave one out cross validation
>     leaveOneOutPartition = cvpartition(numFeatures, 'leaveout');
>     ldaClassFun = @(xtrain, ytrain, xtest)(classify(xtest, xtrain, ytrain, 'linear'));
>     ldaCVErr = crossval('mcr', featureSelcted(:,1:2), ...
>         groundTruthGroup, 'predfun', ldaClassFun, 'partition', leaveOneOutPartition);
>     %% Display the results
>     clc;
>     disp('______________________________________ Results ______________________________________________________');
>     disp(' ');
>     disp(sprintf('Resubstitution Error of LDA (Training Error calculated by Matlab buildin): %d', ldaResubErr));
>     disp(sprintf('Resubstitution Error of LDA (Training Error calculated manually): %d', ldaResubErr2));
>     disp(' ');
>     disp('Confusion Matrix:');
>     disp(ldaResubCM)
>     disp(sprintf('Cross Validation Error of LDA (Leave One Out): %d', ldaCVErr));
>     disp(' ');
>     disp('______________________________________________________________________________________________________');
>
> I. My first question is how to do feature selection, for example
> using forward or backward feature selection, or t-test based methods?
>
> I have checked that Matlab has the `sequentialfs` method, but I am
> not sure how to incorporate it into my code.
>
> II. How do I use the Matlab `classify` method to do a classification
> with more than 2 features? Should we perform PCA first? For
> example, currently we have 11 features, and we run PCA to produce 2 or
> 3 PCs and then run the classification? (I am expecting to write a loop
> to add each feature one by one to do a forward feature selection, not
> just run PCA to do a dimension reduction.)
>
> III. I have also tried to run a ROC analysis. I refer to the webpage
> [enter link description here][2], which has an implementation of a
> simple LDA method and produces the linear scores of the LDA. Then we
> can use `perfcurve` to get the ROC curve.
>
> IIIa. However, I am not sure how to use the `classify` method with
> `perfcurve` to get the ROC.
>
> IIIb. Also, how do I do a ROC with cross-validation?
>
> IIIc. After we have got the `OPTROCPT`, which is the best cutoff
> point, how can we use this cutoff point to produce a better
> classification?
>
>     %% ROC Analysis
>     featureSelcted = [features(:,3), features(:,9)];
>     groundTruthNumericalLable = [zeros(15,1); ones(8,1)];
>     % Calculate linear discriminant coefficients
>     ldaCoefficients = LDA(featureSelcted, groundTruthNumericalLable);
>     % Calculate linear scores for the training data
>     ldaLinearScores = [ones(numFeatures,1) featureSelcted] * ldaCoefficients';
>     % Calculate class probabilities
>     classProbabilities = exp(ldaLinearScores) ./ repmat(sum(exp(ldaLinearScores),2),[1 2]);
>     % Fit probabilities for scores
>     figure,
>     [FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(groundTruthNumericalLable(:,1), classProbabilities(:,1), 0);
>     plot(FPR, TPR, 'or')
>     xlabel('False positive rate (FPR, 1-Specificity)'); ylabel('True positive rate (TPR, Sensitivity)')
>     title('ROC for classification by LDA')
>     grid on;
>
> IV. Currently, I calculate the training and cross-validation errors
> with the `classify` and `crossval` functions. May I ask
> how to get those values in a summary by using `classperf`?
>
> V. If anyone knows a good tutorial on using the Matlab Statistics Toolbox
> for machine learning tasks, with a full example, please tell me.
> Some Matlab Help examples are really confusing to me because the
> examples are given in pieces and I am really a novice at machine
> learning. Sorry if I asked some questions that are not proper. Thanks
> very much for your help.
>
> A.
>
> [1]: http://ge.tt/6eijw4b/v/0
> [2]: http://matlabdatamining.blogspot.co.uk/2010/12/lineardiscriminantanalysislda.html

It sounds as if you have Statistics Toolbox. If so, then why bother rewriting discriminant analysis code? There is a good deal of information about discriminant analysis here:
http://www.mathworks.com/help/stats/discriminantanalysis1.html

There may be more information than you care to read about classification in these two sections:
http://www.mathworks.com/help/stats/supervisedlearning.html
http://www.mathworks.com/help/stats/ensemblelearning.html

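As a rough sketch of what the toolbox-only route can look like (not taken from the pages above), the code below lets `classify` return the posterior probabilities itself, so no hand-written LDA function is needed before calling `perfcurve`. The data are synthetic stand-ins generated with `randn`, not the posted spreadsheet, and the 0/1 label coding and the variable names are assumptions made for illustration only.

```matlab
% Synthetic stand-in for a 23-by-11 feature matrix (NOT the posted data).
rng(0);                                   % reproducible random numbers
labels = [zeros(15,1); ones(8,1)];        % assumed coding: 0 = 'Good', 1 = 'bad'
X = randn(23, 11);
X(labels == 1, [3 9]) = X(labels == 1, [3 9]) + 2;   % separate the classes a little

% classify accepts any number of feature columns; PCA is not required first.
% The third output holds the posterior probability of each class per row.
[predicted, resubErr, posterior] = classify(X, X, labels, 'linear');

% With numeric labels the posterior columns follow the sorted class values,
% so column 2 is the posterior probability of class 1.
[fpr, tpr, thresholds, auc] = perfcurve(labels, posterior(:,2), 1);

% Leave-one-out misclassification rate using the same classifier.
predFun = @(xtrain, ytrain, xtest) classify(xtest, xtrain, ytrain, 'linear');
looErr = crossval('mcr', X, labels, 'predfun', predFun, 'leaveout', 1);

fprintf('Apparent (training) error: %.3f\n', resubErr);
fprintf('Leave-one-out error:       %.3f\n', looErr);
fprintf('Training-set AUC:          %.3f\n', auc);
```

Note that the AUC here is computed on the training data, so it is optimistic; the leave-one-out error is the more honest of the two numbers. The `%.3f` format is used because these quantities are fractions rather than integers.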
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation

