
Topic: How to do a classification using Matlab?
Replies: 3   Last Post: May 3, 2013 10:03 AM

Greg Heath

Posts: 214
Registered: 12/13/04
Re: How to do a classification using Matlab?
Posted: Mar 16, 2013 9:15 AM

On Mar 14, 2:46 pm, "Aaronne " <ggyy...@hotmail.com> wrote:
> Hi Smart Guys,
>
> I have got the data (can be downloaded here: [enter link description here][1]) and tried to run a simple LDA-based classification using the 11 features stored in the dataset, i.e., F1, F2, ..., F11.
>
> Here is some code I wrote in Matlab using only 2 of the features. May I ask some questions based on this code, please?
>
>     clc; clf; clear all; close all;
>
>     %% Load the extracted features
>     features                            = xlsread('ExtractedFeatures.xls');
>     numSamples                          = 23;   % number of observations (15 'Good' + 8 'bad')
>
>     %% Define ground truth
>     groundTruthGroup                    = cell(numSamples,1);
>     groundTruthGroup(1:15)              = cellstr('Good');
>     groundTruthGroup(16:end)            = cellstr('bad');
>
>     %% Select features
>     featureSelected                     = [features(:,3), features(:,9)];
>
>     %% Run LDA
>     [ldaClass, ldaResubErr]             = classify(featureSelected, featureSelected, groundTruthGroup, 'linear');
>     bad                                 = ~strcmp(ldaClass,groundTruthGroup);
>     ldaResubErr2                        = sum(bad)/numSamples;
>
>     [ldaResubCM,grpOrder]               = confusionmat(groundTruthGroup,ldaClass);
>
>     %% Scatter plot
>     gscatter(featureSelected(:,1), featureSelected(:,2), groundTruthGroup, 'rgb', 'osd');
>     xlabel('Feature 3');
>     ylabel('Feature 9');
>     hold on;
>     plot(featureSelected(bad,1), featureSelected(bad,2), 'kx');
>     hold off;
>
>     %% Leave one out cross validation
>     leaveOneOutPartition                = cvpartition(numSamples, 'leaveout');
>     ldaClassFun                         = @(xtrain, ytrain, xtest)(classify(xtest, xtrain, ytrain, 'linear'));
>     ldaCVErr                            = crossval('mcr', featureSelected, ...
>         groundTruthGroup, 'predfun', ldaClassFun, 'partition', leaveOneOutPartition);
>
>     %% Display the results
>     clc;
>     disp('______________________________________ Results ______________________________________________________');
>     disp(' ');
>     fprintf('Resubstitution Error of LDA (Training Error calculated by Matlab built-in): %g\n', ldaResubErr);
>     fprintf('Resubstitution Error of LDA (Training Error calculated manually): %g\n', ldaResubErr2);
>     disp(' ');
>     disp('Confusion Matrix:');
>     disp(ldaResubCM)
>     fprintf('Cross Validation Error of LDA (Leave One Out): %g\n', ldaCVErr);
>     disp(' ');
>     disp('______________________________________________________________________________________________________');
>
> I. My first question is: how do I do feature selection, for example with forward or backward feature selection, or t-test based methods?
>
> I have checked that Matlab has the `sequentialfs` function, but I am not sure how to incorporate it into my code.
>
> II. How do I use the Matlab `classify` method to do a classification with more than 2 features? Should we perform PCA first? For example, we currently have 11 features; should we run PCA to produce 2 or 3 PCs and then run the classification? (I am expecting to write a loop that adds the features one by one to do a forward feature selection, not just run PCA to do a dimension reduction.)
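
A minimal, untested sketch of what question I asks for: SEQUENTIALFS
wrapping CLASSIFY as the selection criterion. The variable names
(features, groundTruthGroup) come from the quoted code; the stratified
10-fold partition is an assumption, not something from the original post.

    % Criterion: misclassification count of CLASSIFY on each test fold;
    % SEQUENTIALFS sums these over folds and divides by the total test size.
    critFun = @(xtrain, ytrain, xtest, ytest) ...
        sum(~strcmp(ytest, classify(xtest, xtrain, ytrain, 'linear')));
    cvPart = cvpartition(groundTruthGroup, 'kfold', 10);   % assumed partition
    [isSelected, history] = sequentialfs(critFun, features, groundTruthGroup, ...
        'cv', cvPart, 'direction', 'forward');
    selectedColumns = find(isSelected);   % indices of the chosen features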


I don't know why you think PCA should even be considered for
classification dimensionality reduction.

It chooses the directions in which the data have the most spread, not the
directions that give the most separation between the clustered subclasses
relative to their within-class scatter.
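
For example (a synthetic two-feature illustration, not the poster's data):
when the classes are split along a low-variance direction, the first
principal component follows the spread and mixes the classes, while the
Fisher direction inv(Sw)*(m1-m2) recovers the split.

    % Requires the Statistics Toolbox (use PRINCOMP instead of PCA on
    % releases older than R2012b).
    rng(0);
    X1 = [5*randn(100,1), 0.5*randn(100,1) + 1];   % class 1
    X2 = [5*randn(100,1), 0.5*randn(100,1) - 1];   % class 2
    X  = [X1; X2];
    coeff = pca(X);      % first PC is ~[1 0]': the spread, not the split
    Sw = cov(X1) + cov(X2);                        % within-class scatter
    w  = Sw \ (mean(X1) - mean(X2))';              % Fisher direction ~[0 1]'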

You are probably better off clustering the mixture, or each class
separately, and then using either LDA with a truncated/regularized
pinv(Sw)*Sb or PLSREGRESS.
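
Something along these lines for the pinv(Sw)*Sb part (untested sketch; my
notation: X is the N-by-p feature matrix, labels an N-by-1 cell array of
class names, and the truncation tolerance is arbitrary):

    classes = unique(labels);
    p  = size(X, 2);
    mu = mean(X);
    Sw = zeros(p);  Sb = zeros(p);
    for i = 1:numel(classes)
        Xi = X(strcmp(labels, classes{i}), :);
        Ni = size(Xi, 1);
        Sw = Sw + (Ni - 1)*cov(Xi);                       % within-class scatter
        Sb = Sb + Ni*(mean(Xi) - mu)'*(mean(Xi) - mu);    % between-class scatter
    end
    tol = 1e-8 * norm(Sw);                % truncation tolerance (arbitrary)
    [V, D] = eig(pinv(Sw, tol) * Sb);     % discriminant directions
    [~, order] = sort(real(diag(D)), 'descend');
    nDir = min(numel(classes) - 1, p);    % at most c-1 useful directions
    W = real(V(:, order(1:nDir)));
    Xreduced = X * W;                     % then run CLASSIFY on Xreduced

PLSREGRESS wants a numeric response, so the class labels would have to be
coded numerically (e.g., as 0/1 dummy variables) before trying it the same
way.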

Hope this helps.

Greg


