Topic: pls classification,plsregress
Replies: 3   Last Post: Apr 25, 2012 11:54 AM

 Greg Heath Posts: 211 Registered: 12/13/04
Re: pls classification,plsregress
Posted: Apr 25, 2012 11:54 AM

On Apr 25, 8:21 am, "Kebi Wilcox" <kewil...@gmail.com> wrote:
> "Chenwei" wrote in message <g7ced9\$t3...@fred.mathworks.com>...
> > I need to know how use the new matlab function plsregress to
> > get a classification.I have one testing set and one training
> > set(they are matrix,each column is a spectrum with about
> > 6000 intensity points)and one vector with the state(two
> > classes= 1 or 0)for each spectrum.I don't know how to handle
> > the results of plsregress to get a classification of the
> > testing set(1 or 0) based on the training set.Someone can
> > help me? thanks...

>
> Were you able to resolve this problem, and how did you go about it?

On Aug 6 2008, 1:27 pm, Peter Perkins
<Peter.PerkinsRemoveT...@mathworks.com> wrote:
> Chenwei wrote:
> > I need to know how use the new matlab function plsregress to
> > get a classification.I have one testing set and one training
> > set(they are matrix,each column is a spectrum with about
> > 6000 intensity points)and one vector with the state(two
> > classes= 1 or 0)for each spectrum.I don't know how to handle
> > the results of plsregress to get a classification of the
> > testing set(1 or 0) based on the training set.Someone can
> > help me? thanks...

>
> I believe that the standard thing to do when using PLS regression for classification goes something roughly like the following:
>
> Fit a regression model to a set of dummy binary variables that define the classes. In the case of two classes, you already have the (one) vector of 1's and 0's. Use that fitted regression model to predict the response value for new data, i.e., a value that usually falls between 0 and 1 (a linear fit is not strictly confined to that range). Pick a threshold (e.g., 0.5) at which to discriminate between the two classes.
>
> Hope this helps.
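
Peter's recipe above might be sketched roughly as follows. This is only an illustration on synthetic spectra, not a tested solution for your data: the sizes, the extra peak that separates the classes, and ncomp = 3 are all assumptions made for the example. Note that plsregress expects one observation per row, so spectra stored as columns have to be transposed.

```matlab
% Synthetic stand-in for the spectra (assumption: real data would replace this).
rng(0);                                  % reproducible random numbers
p = 200;  nTrain = 60;  nTest = 20;      % p intensity points per spectrum
yTrain = [zeros(nTrain/2,1); ones(nTrain/2,1)];
yTest  = [zeros(nTest/2,1);  ones(nTest/2,1)];
peak   = exp(-((1:p) - 80).^2 / 50);     % class-1 spectra carry an extra peak
Strain = randn(p, nTrain) + 3*peak' * yTrain';   % columns are spectra, as in the question
Stest  = randn(p, nTest)  + 3*peak' * yTest';

% plsregress wants one observation per ROW, so transpose.
Xtrain = Strain';
Xtest  = Stest';

ncomp = 3;                               % hypothetical choice; tune by cross-validation
[~, ~, ~, ~, beta] = plsregress(Xtrain, yTrain, ncomp);

% beta holds the intercept in its first row, hence the column of ones.
scoreTest = [ones(nTest,1) Xtest] * beta;

threshold = 0.5;                         % simplest choice; see the points below
predTest  = double(scoreTest > threshold);
accuracy  = mean(predTest == yTest);
fprintf('Test accuracy: %.2f\n', accuracy);
```

The continuous scores in scoreTest are what the threshold acts on, so keeping them around makes it easy to try other thresholds later.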

1. You should extract a much smaller number of features from your
6000-element spectral vectors. The true dimensionality of the input
space is almost certainly far less than 6000. Whether down-sampling,
low-pass filtering or orthogonal projection is appropriate depends on
the data.

2. You may be able to obtain a quick idea of the true dimensionality
via SVD to estimate the RANK of the spectral input matrix.

3. I haven't had the pleasure of using PLS yet. However, I am pretty
sure that it has its own method of dimensionality reduction to
mitigate multicollinearity of the inputs.

4. You don't indicate the number of 0 and 1 measurements (N0 and N1)
that you have. If your model contains Np estimated parameters, it
would be desirable that

~0.5 < N1/N0 < ~2 and min(N0,N1) >> Np.

However, good results can still be obtained if these conditions are
not satisfied.

5. The threshold value depends on

a. The ratio of the number of 0 and 1 examples in training.
b. The expected ratio of the number of 0 and 1 examples in general use.
c. The relative importance of misclassifying 0 and 1 classes.
d. The maximum allowable error for each class.

6. It is usually a good idea to have a third, independent, validation
set to determine, post-training, a practical threshold.

7. There are various techniques to mitigate the inability to satisfy
some of the above conditions. So, be prepared for a bit of trial and
error.

8. Practicing on some of MATLAB's classification demo examples should
be worthwhile.
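
Points 2, 5 and 6 above can be sketched as follows. Treat it as a rough illustration under stated assumptions rather than a recipe: the data are synthetic, the 99% variance cutoff is an arbitrary choice, the validation scores are a stand-in for real model outputs, and the costs costFN and costFP are hypothetical.

```matlab
% (a) Rough effective rank of a spectral matrix via SVD (point 2).
rng(1);
n = 50;  p = 300;  r = 4;                          % r = latent dimension of the fake data
X  = randn(n, r) * randn(r, p) + 0.01*randn(n, p); % low-rank signal plus small noise
Xc = bsxfun(@minus, X, mean(X, 1));                % centre each column
s  = svd(Xc);                                      % singular values, largest first
energy = cumsum(s.^2) / sum(s.^2);                 % cumulative fraction of variance
kEff = find(energy >= 0.99, 1);                    % components needed for 99% (assumed cutoff)
fprintf('Effective rank estimate: %d of %d\n', kEff, min(n, p));

% (b) Threshold picked on an independent validation set (points 5 and 6).
nVal = 200;
yVal = [zeros(140,1); ones(60,1)];                 % unbalanced classes, N1/N0 < 0.5
scoreVal = yVal + 0.35*randn(nVal, 1);             % stand-in for regression outputs
costFN = 2;  costFP = 1;                           % hypothetical misclassification costs
candidates = 0:0.01:1;
cost = zeros(size(candidates));
for k = 1:numel(candidates)
    pred = scoreVal > candidates(k);
    fn = sum(~pred & yVal == 1);                   % class 1 called 0
    fp = sum( pred & yVal == 0);                   % class 0 called 1
    cost(k) = costFN*fn + costFP*fp;
end
[~, best] = min(cost);
threshold = candidates(best);
fprintf('Chosen threshold: %.2f\n', threshold);
```

With unequal class sizes or unequal costs the minimum-cost threshold will generally move away from 0.5, which is the reason for choosing it on data not used in training.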

Hope this helps.

Greg
