Topic: How to do a classification using Matlab?
Replies: 4   Last Post: May 1, 2014 2:57 AM

 Aaronne Posts: 110 Registered: 6/2/11
How to do a classification using Matlab?
Posted: Mar 14, 2013 2:46 PM

Hi Smart Guys,

I have a dataset (it can be downloaded [here][1]) and tried to run a simple LDA-based classification on the 11 features stored in it, i.e., F1, F2, ..., F11.

Below is some Matlab code that uses only 2 of the features. May I ask some questions based on it, please?

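clear; close all; clc;

% NOTE: despite its name, numFeatures holds the number of samples (rows), not features.
numFeatures = 23;

%% Define ground truth
groundTruthGroup = cell(numFeatures,1);
groundTruthGroup(1:15) = cellstr('Good');
% The remaining 8 samples form the second class; the label name 'Bad' is an assumption.
groundTruthGroup(16:numFeatures) = cellstr('Bad');

%% Select features
% features is the 23-by-11 data matrix from the linked dataset; load it after the clear above.
featureSelcted = [features(:,3), features(:,9)];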

%% Run LDA
[ldaClass, ldaResubErr] = classify(featureSelcted(:,1:2), featureSelcted(:,1:2), groundTruthGroup, 'linear');

[ldaResubCM,grpOrder] = confusionmat(groundTruthGroup,ldaClass);

%% Scatter plot
gscatter(featureSelcted(:,1), featureSelcted(:,2), groundTruthGroup, 'rgb', 'osd');
xlabel('Feature 3');
ylabel('Feature 9');
hold on;
hold off;

%% Leave one out cross validation
leaveOneOutPartition = cvpartition(numFeatures, 'leaveout');
ldaClassFun = @(xtrain, ytrain, xtest)(classify(xtest, xtrain, ytrain, 'linear'));
ldaCVErr = crossval('mcr', featureSelcted(:,1:2), ...
groundTruthGroup, 'predfun', ldaClassFun, 'partition', leaveOneOutPartition);

%% Display the results
% Manual training error: fraction of training samples whose predicted label differs from the truth.
% It can differ from the built-in value, which weights the per-class errors by the prior probabilities.
ldaResubErr2 = mean(~strcmp(ldaClass, groundTruthGroup));
clc;
disp('______________________________________ Results ______________________________________________________');
disp(' ');
fprintf('Resubstitution Error of LDA (Training Error calculated by Matlab built-in): %.4f\n', ldaResubErr);
fprintf('Resubstitution Error of LDA (Training Error calculated manually): %.4f\n', ldaResubErr2);
disp(' ');
disp('Confusion Matrix:');
disp(ldaResubCM)
fprintf('Cross Validation Error of LDA (Leave One Out): %.4f\n', ldaCVErr);
disp(' ');
disp('______________________________________________________________________________________________________');

I. My first question is how to do feature selection, for example forward or backward feature selection, or t-test based methods.

I have seen that Matlab has the `sequentialfs` function, but I am not sure how to incorporate it into my code.
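
One way `sequentialfs` might be wrapped around `classify` is sketched below. It is only a sketch under stated assumptions: the matrix `X` is synthetic stand-in data (not the real dataset), the label `'Bad'` for the second class is assumed, and the names `ldaMisclass`, `cvp`, `selected` and `history` are made up for illustration.

```matlab
% Synthetic stand-in for the real data (assumption): 23 samples, 11 features,
% with features 3 and 9 shifted for the second class so there is something to find.
rng(1);
X = randn(23, 11);
X(16:end, [3 9]) = X(16:end, [3 9]) + 2;
y = [repmat({'Good'}, 15, 1); repmat({'Bad'}, 8, 1)];   % 'Bad' is an assumed label

% Criterion: number of misclassified test samples for an LDA trained on the
% training fold. sequentialfs sums this over the folds and divides by the
% total number of test observations.
ldaMisclass = @(xtrain, ytrain, xtest, ytest) ...
    sum(~strcmp(ytest, classify(xtest, xtrain, ytrain, 'linear')));

% Leave-one-out partition, matching the cross-validation in the code above.
cvp = cvpartition(numel(y), 'leaveout');

% Forward selection is the default; pass 'direction','backward' for backward selection.
[selected, history] = sequentialfs(ldaMisclass, X, y, 'cv', cvp);

disp(find(selected));   % indices of the chosen feature columns
```

With the real data, `X` would be the full 23-by-11 `features` matrix and `y` would be `groundTruthGroup`; the selected columns can then be passed to `classify` as in the code above.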

II. How do I use the Matlab `classify` function to do a classification with more than 2 features? Should we perform PCA first? For example, we currently have 11 features; should we run PCA to produce 2 or 3 PCs and then run the classification? (I expect to write a loop that adds the features one by one to do a forward feature selection, not just run PCA for dimension reduction.)
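
The loop described in II could look roughly like the sketch below, which first calls `classify` on all 11 columns directly (no PCA) and then adds features greedily by leave-one-out error. The data are synthetic stand-ins, the `'Bad'` label is assumed, and names such as `chosen`, `remaining` and `bestErr` are hypothetical.

```matlab
% Synthetic stand-in for the real data (assumption): 23 samples, 11 features.
rng(1);
X = randn(23, 11);
X(16:end, [3 9]) = X(16:end, [3 9]) + 2;
y = [repmat({'Good'}, 15, 1); repmat({'Bad'}, 8, 1)];   % 'Bad' is an assumed label

% classify accepts any number of feature columns, provided there are enough
% samples to estimate the pooled covariance matrix.
[predAll, resubErrAll] = classify(X, X, y, 'linear');

% Leave-one-out error estimate for a given set of columns.
cvp = cvpartition(numel(y), 'leaveout');
ldaFun = @(xtrain, ytrain, xtest) classify(xtest, xtrain, ytrain, 'linear');
looErr = @(cols) crossval('mcr', X(:, cols), y, 'predfun', ldaFun, 'partition', cvp);

% Greedy forward selection: add the feature that lowers the error most,
% and stop as soon as no candidate improves it.
chosen = [];
remaining = 1:size(X, 2);
bestErr = Inf;
while ~isempty(remaining)
    errs = zeros(size(remaining));
    for k = 1:numel(remaining)
        errs(k) = looErr([chosen remaining(k)]);
    end
    [minErr, idx] = min(errs);
    if minErr >= bestErr
        break;
    end
    bestErr = minErr;
    chosen = [chosen remaining(idx)];
    remaining(idx) = [];
end

fprintf('All 11 features, resubstitution error: %.4f\n', resubErrAll);
fprintf('Chosen features: %s (leave-one-out error %.4f)\n', mat2str(chosen), bestErr);
```

Selecting features on the same cross-validation folds used to report the error tends to give an optimistic estimate, so a held-out set or nested cross-validation may be preferable if enough samples are available.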

III. I have also tried to run an ROC analysis. I referred to the webpage [here][2], which has an implementation of a simple LDA method that produces the linear scores of the LDA. We can then use `perfcurve` to get the ROC curve.

IIIa. However, I am not sure how to use the `classify` function with `perfcurve` to get the ROC curve.

IIIb. Also, how do I do an ROC analysis with cross-validation?

IIIc. Once we have `OPTROCPT`, which is the best cut-off point, how can we use this cut-off point to produce a better classification?

%% ROC Analysis
featureSelcted = [features(:,3), features(:,9)];
groundTruthNumericalLable = [zeros(15,1); ones(8,1)];

% Calculate linear discriminant coefficients
% (LDA is the function from the webpage [2], not a Matlab built-in)
ldaCoefficients = LDA(featureSelcted, groundTruthNumericalLable);

% Calculate linear scores for the training data
ldaLinearScores = [ones(numFeatures,1) featureSelcted] * ldaCoefficients';

% Calculate class probabilities
classProbabilities = exp(ldaLinearScores) ./ repmat(sum(exp(ldaLinearScores),2),[1 2]);

% Compute and plot the ROC curve, treating class 0 as the positive class
figure;
[FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(groundTruthNumericalLable(:,1), classProbabilities(:,1), 0);
plot(FPR, TPR, 'or-')
xlabel('False positive rate (FPR, 1-Specificity)'); ylabel('True positive rate (TPR, Sensitivity)')
title('ROC for classification by LDA')
grid on;
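
For IIIa to IIIc, one possible approach is to take the posterior probabilities that `classify` returns as its third output and feed them to `perfcurve`. The sketch below is an assumption-laden illustration, not a confirmed answer: the data are synthetic, and `postCV`, `optThr` and `predAtCutoff` are hypothetical names. Numeric 0/1 labels are used so that the posterior columns follow the sorted label order (0 first, then 1).

```matlab
% Synthetic stand-in for the two selected features (assumption).
rng(1);
X = randn(23, 2);
X(16:end, :) = X(16:end, :) + 2;
y = [zeros(15, 1); ones(8, 1)];      % numeric labels as in the code above

% Collect out-of-fold posteriors for class 0 with leave-one-out cross-validation.
cvp = cvpartition(numel(y), 'leaveout');
postCV = zeros(numel(y), 1);
for i = 1:cvp.NumTestSets
    tr = training(cvp, i);           % logical index of the training samples
    te = test(cvp, i);               % logical index of the held-out sample
    [~, ~, p] = classify(X(te, :), X(tr, :), y(tr), 'linear');
    postCV(te) = p(:, 1);            % column 1 is the posterior for class 0
end

% Cross-validated ROC curve, treating class 0 as the positive class.
[FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(y, postCV, 0);

% Threshold on the score that corresponds to the optimal operating point.
optThr = Thr((FPR == OPTROCPT(1)) & (TPR == OPTROCPT(2)));
optThr = optThr(1);

% Apply the cut-off: class 0 when the posterior reaches the threshold, else class 1.
predAtCutoff = double(postCV < optThr);

fprintf('Cross-validated AUC: %.4f, cut-off on P(class 0): %.4f\n', AUC, optThr);
```

Replacing `postCV` with the posterior from `classify(X, X, y, 'linear')` would give the resubstitution ROC curve instead. Note that a threshold tuned on the same samples it is then evaluated on will look better than it would on new data.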

IV. Currently, I calculate the training and cross-validation errors with the `classify` and `crossval` functions. May I ask how to get those values in a summary using `classperf`?

V. If anyone knows a good tutorial with a full example of using the Matlab Statistics Toolbox for machine learning tasks, please tell me.

Some of the Matlab Help examples are really confusing to me because they are given in pieces, and I am really a novice at machine learning. Sorry if I have not asked some of the questions properly. Thanks very much for your help.
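
A rough sketch of how `classperf` might be used for IV follows. It assumes the Bioinformatics Toolbox is installed (that is where `classperf` lives), uses synthetic data and an assumed `'Bad'` label, and the names `cpTrain` and `cpCV` are hypothetical.

```matlab
% Synthetic stand-in for the two selected features (assumption).
rng(1);
X = randn(23, 2);
X(16:end, :) = X(16:end, :) + 2;
y = [repmat({'Good'}, 15, 1); repmat({'Bad'}, 8, 1)];   % 'Bad' is an assumed label

% Training (resubstitution) performance in one call.
cpTrain = classperf(y, classify(X, X, y, 'linear'));

% Cross-validated performance: create the object from the ground truth,
% then update it with the predictions of each held-out fold.
cvp = cvpartition(numel(y), 'leaveout');
cpCV = classperf(y);
for i = 1:cvp.NumTestSets
    tr = training(cvp, i);
    te = test(cvp, i);
    pred = classify(X(te, :), X(tr, :), y(tr), 'linear');
    classperf(cpCV, pred, te);       % te is the logical index of the test samples
end

fprintf('Training error rate:       %.4f\n', cpTrain.ErrorRate);
fprintf('Leave-one-out error rate:  %.4f\n', cpCV.ErrorRate);
fprintf('Leave-one-out sensitivity: %.4f\n', cpCV.Sensitivity);
fprintf('Leave-one-out specificity: %.4f\n', cpCV.Specificity);
```

The returned object also carries fields such as `CorrectRate` and `CountingMatrix`, which may cover the summary being asked for.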

A.

[1]: http://ge.tt/6eijw4b/v/0
[2]: http://matlabdatamining.blogspot.co.uk/2010/12/linear-discriminant-analysis-lda.html
