Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: the math in classify.m
Replies: 7   Last Post: Dec 3, 2012 7:34 PM

Peter Perkins

Re: the math in classify.m
Posted: Aug 9, 2005 9:48 AM

marquito wrote:

> =====================================================
> % Pooled estimate of covariance
> [Q,R] = qr(training - gmeans(gindex,:), 0);
> R = R / sqrt(n - ngroups); % SigmaHat = R'*R
> s = svd(R);
> if any(s <= eps^(3/4)*max(s))
> error('The pooled covariance matrix of TRAINING must be positive definite.');
> end
>
> % MVN relative log posterior density, by group, for each sample
> for k = 1:ngroups
> A = (sample - repmat(gmeans(k,:), mm, 1)) / R;
> D(:,k) = log(prior(k)) - .5*sum(A .* A, 2);
> end
> ======================================================
>
> I don't know exactly what is going on there. I expected to see
> something like a multivariate Gauss distribution, like:
>
> p(x|class) = 1/sqrt((2*pi)^d * |Sigma|) * exp(-0.5 * (x-mu)*Sigma^-1*(x-mu)')
>
> or something similar to this. Could somebody verify this or explain
> what kind of magic the programmer used (why qr decomposition?).


You're right that the density formally involves the inverse of a covariance
matrix. But explicitly inverting a potentially large matrix is usually not a
good idea: it is slower and less accurate than solving a linear system. Write
down what the estimate SigmaHat would be: X0'*X0/(n-ngroups), where X0 is the
centered data matrix (bearing in mind that each row is centered by the mean of
its own class). Then write X0 as Q*R. Just like the comment says, once R has
been divided by sqrt(n-ngroups), SigmaHat is R'*R, because Q has orthonormal
columns (Q'*Q = I). Now substitute that into the quadratic form
(x-mu)*Sigma^-1*(x-mu)', and you find that it equals A*A' with A = (x-mu)/R,
which you can compute using a backsolve on the triangular R.
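
To make the substitution concrete, here is a minimal sketch that compares the
quadratic form computed with an explicit inverse against the one computed from
the triangular factor. The toy data and the names qExplicit, qQR, dx and a are
assumptions made up for illustration; only the QR and rescaling steps mirror
the quoted code.

```matlab
% Hypothetical toy data: 3 groups, 4 features (sizes chosen for illustration).
n = 60; d = 4; ngroups = 3;
gindex = repmat((1:ngroups)', n/ngroups, 1);    % group label of each training row
training = randn(n, d) + gindex * ones(1, d);   % shift each group's mean
gmeans = zeros(ngroups, d);
for k = 1:ngroups
    gmeans(k,:) = mean(training(gindex == k, :), 1);
end

% Centered data X0: each row minus the mean of its own group.
X0 = training - gmeans(gindex, :);

% Pooled covariance formed explicitly (here only for comparison).
SigmaHat = (X0' * X0) / (n - ngroups);

% The same matrix via the economy-size QR factor: X0 = Q*R with Q'*Q = I,
% so X0'*X0 = R'*Q'*Q*R = R'*R.
[~, R] = qr(X0, 0);
R = R / sqrt(n - ngroups);                      % now SigmaHat = R'*R

% Quadratic form for one sample x (a row vector) against group 1.
x = randn(1, d);
dx = x - gmeans(1, :);
qExplicit = dx * inv(SigmaHat) * dx';           % explicit inverse, for comparison only
a = dx / R;                                     % solve with triangular R
qQR = sum(a .* a);                              % dx*inv(R'*R)*dx' = (dx/R)*(dx/R)'
```

The two values qExplicit and qQR should agree to rounding error. The QR route
never forms X0'*X0, which avoids squaring the condition number of the data.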


> I'm also very interested how 'D' is calculated since I use this value
> to show the distances of a sample to the different classes. Shouldn't
> this be the probability density function of x for the different
> classes?


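It is the log of that density multiplied by the class prior probability, up to
an additive constant that is the same for every class: the normalizing term
involving |Sigma| and 2*pi is dropped because the pooled covariance is shared by
all classes. It is computed on the log scale because multivariate densities get
very, very small.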
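
As a small illustration of why the log scale matters, the sketch below
evaluates a standard multivariate normal density (identity covariance) at its
mean. The dimension d = 1000 is a made-up value chosen only to force underflow.

```matlab
d = 1000;                        % hypothetical number of features
logp = -0.5 * d * log(2*pi);     % log density of a standard MVN at its mean
p = exp(logp);                   % the density itself underflows to 0 in double precision
```

The log density stays finite and comparable across classes, while the density
itself is no longer representable.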

With a flat prior, I think the third output POSTERIOR is what you are asking for.
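
If you want to turn values like D into posterior probabilities yourself, one
common approach is to subtract each row's maximum before exponentiating and
then normalize. This is a sketch under assumptions, not necessarily the exact
code in classify.m; the matrix D and the names maxD, P, posterior and cls are
made up for illustration.

```matlab
% Hypothetical D: 2 samples by 3 groups of relative log posterior values.
D = [-1050 -1052 -1060;
      -3.2  -0.7  -5.1];

% Subtract each row's maximum before exponentiating, so that at least one
% term in every row is exp(0) = 1 and the row sum cannot underflow to zero.
maxD = max(D, [], 2);
P = exp(D - repmat(maxD, 1, size(D, 2)));

% Normalize each row so the posterior probabilities sum to 1.
posterior = P ./ repmat(sum(P, 2), 1, size(D, 2));

% Assign each sample to the group with the largest posterior probability.
[~, cls] = max(posterior, [], 2);
```

Any constant added to a whole row of D cancels in the normalization, which is
why the dropped normalizing term does not affect the posterior. With a flat
prior, the log(prior(k)) term is the same for every class and cancels as well.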

Hope this helps.

- Peter Perkins
The MathWorks, Inc.


