Re: pls classification,plsregress
Posted: Apr 25, 2012 11:54 AM
On Apr 25, 8:21 am, "Kebi Wilcox" <kewil...@gmail.com> wrote:
> "Chenwei" wrote in message <g7ced9$t3...@fred.mathworks.com>...
> > I need to know how use the new matlab function plsregress to
> > get a classification.I have one testing set and one training
> > set(they are matrix,each column is a spectrum with about
> > 6000 intensity points)and one vector with the state(two
> > classes= 1 or 0)for each spectrum.I don't know how to handle
> > the results of plsregress to get a classification of the
> > testing set(1 or 0) based on the training set.Someone can
> > help me? thanks...
>
> Where you able to resolve this problem and how did you go about it?

On Aug 6 2008, 1:27 pm, Peter Perkins <Peter.PerkinsRemoveT...@mathworks.com> wrote:
> Chenwei wrote:
> > I need to know how use the new matlab function plsregress to
> > get a classification.I have one testing set and one training
> > set(they are matrix,each column is a spectrum with about
> > 6000 intensity points)and one vector with the state(two
> > classes= 1 or 0)for each spectrum.I don't know how to handle
> > the results of plsregress to get a classification of the
> > testing set(1 or 0) based on the training set.Someone can
> > help me? thanks...
>
> I believe that the standard thing to do when using PLS regression for classification goes something roughly like the following:
>
> Fit a regression model to a set of dummy binary variables that define the classes. In the case of two classes, you already have the (one) vector of 1's and 0's. Use that fitted regression model to predict the response value for new data, i.e., a value between 0 and 1. Pick a threshold (like, 0.5) at which to discriminate between the two classes.
>
> Hope this helps.
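
The recipe quoted above (regress on the 0/1 vector, predict, then cut at a threshold) can be sketched roughly as follows. This is a dependency-free Python illustration of single-response PLS (the NIPALS form), not MATLAB's plsregress and not necessarily the algorithm it uses internally. The names pls1_fit, pls1_predict, classify and n_comp are made up for this sketch. It assumes one spectrum per ROW, so a matrix with one spectrum per column would need transposing first. Note that the fitted value is an ordinary regression output and is not guaranteed to stay between 0 and 1.

```python
# Sketch only: PLS1 (NIPALS) regression on a 0/1 response, then thresholding.
# All function and parameter names are illustrative assumptions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def xt_dot(E, v):
    """Compute E^T v for a row-major matrix E (list of rows)."""
    return [sum(E[i][j] * v[i] for i in range(len(E))) for j in range(len(E[0]))]

def pls1_fit(X, y, n_comp):
    """X: list of samples (rows), y: list of 0/1 labels, n_comp: max components."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = xt_dot(E, f)                 # direction in X most covariant with y
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:
            break                        # no covariance left to extract
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]   # scores of the training samples
        tt = dot(t, t)
        if tt < 1e-12:
            break
        p = [v / tt for v in xt_dot(E, t)]  # X loadings
        q = dot(f, t) / tt                  # y loading
        # Deflate: remove what this component explained from X and y.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w); P.append(p); Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}

def pls1_predict(model, x):
    """Predicted response for one new sample, using the same deflation steps."""
    e = [v - mu for v, mu in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = dot(e, w)
        y_hat += q * t
        e = [ev - t * pv for ev, pv in zip(e, p)]
    return y_hat

def classify(model, x, threshold=0.5):
    return 1 if pls1_predict(model, x) > threshold else 0

# Toy data (made up): 4 training "spectra" with 3 points each.
X_train = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.1, 1.0, 0.9], [0.0, 0.8, 1.0]]
y_train = [0, 0, 1, 1]
model = pls1_fit(X_train, y_train, n_comp=1)
for x_new in ([0.95, 0.15, 0.05], [0.05, 0.9, 0.95]):
    print(x_new, pls1_predict(model, x_new), classify(model, x_new))
```

With real spectra the number of components would normally be chosen by cross-validation rather than fixed in advance, and the 0.5 cut-off is only a starting point.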
1. You should extract a much smaller number of features from your spectral vectors with 6000 elements. The true dimensionality of the input space is much, much less than 6000. Whether down-sampling, low-pass filtering or orthogonal projection is appropriate depends on the data.
2. You may be able to obtain a quick idea of the true dimensionality via SVD to estimate the RANK of the spectral input matrix.
3. I haven't had the pleasure of using PLS yet. However, I am pretty sure that it has its own method of dimensionality reduction to mitigate multicollinearity of inputs.
4. You don't indicate the number of 0 and 1 measurements that you have. If your model contains Np estimated parameters it would be desirable that ~0.5 < N1/N0 < ~2 and min(N0,N1) >> Np. However, good results can still be obtained if these conditions are not satisfied.
5. The threshold value depends on
   a. The ratio of the number of 0 and 1 examples in training.
   b. The expected ratio of the number of 0 and 1 examples in general use.
   c. The relative importance of misclassifying 0 and 1 classes.
   d. The maximum allowable error for each class.
6. It is usually a good idea to have a third, independent, validation set to determine, post-training, a practical threshold.
7. There are various techniques to mitigate the inability to satisfy some of the above conditions. So, be prepared for a bit of trial and error.
8. Practicing on some of MATLAB's classification demo examples should be worthwhile.
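
As a rough illustration of points 5c and 6, the threshold can be picked after training by scanning candidate cut-offs on an independent validation set and keeping the one with the lowest total misclassification cost. The sketch below is plain Python with made-up scores; pick_threshold, cost_fn and cost_fp are hypothetical names, and it does not correct for a class ratio in general use that differs from the validation set (points 5a and 5b) or enforce a per-class error limit (point 5d).

```python
# Sketch only: choose a cut-off on held-out validation scores by minimum cost.
# cost_fn = cost of calling a true 1 a 0; cost_fp = cost of calling a true 0 a 1.

def pick_threshold(scores, labels, cost_fn=1.0, cost_fp=1.0):
    s = sorted(set(scores))
    # Candidate cut-offs: midpoints between neighbouring scores,
    # plus one below and one above every score.
    candidates = [s[0] - 1.0] + [(a + b) / 2 for a, b in zip(s, s[1:])] + [s[-1] + 1.0]
    best_thr, best_cost = None, float("inf")
    for thr in candidates:
        cost = 0.0
        for score, label in zip(scores, labels):
            pred = 1 if score > thr else 0
            if pred == 1 and label == 0:
                cost += cost_fp
            elif pred == 0 and label == 1:
                cost += cost_fn
        if cost < best_cost:      # ties keep the lowest cut-off found first
            best_thr, best_cost = thr, cost
    return best_thr, best_cost

# Made-up validation outputs from a fitted model and their true classes.
val_scores = [0.05, 0.20, 0.35, 0.45, 0.55, 0.70, 0.90]
val_labels = [0, 0, 0, 1, 0, 1, 1]

print(pick_threshold(val_scores, val_labels))                # equal costs
print(pick_threshold(val_scores, val_labels, cost_fp=5.0))   # false alarms cost more
```

Making one kind of error more expensive moves the chosen cut-off away from the class that is costly to misclassify, which is why 0.5 is not automatically the right value.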
Hope this helps.
Greg