The Math Forum

Topic: Mahalanobis_distance and Gaussian distribution
Replies: 6   Last Post: Jan 18, 2013 12:10 PM

Richard Ulrich

Re: Mahalanobis_distance and Gaussian distribution
Posted: Jan 17, 2013 2:02 PM

On Thu, 17 Jan 2013 11:21:24 -0000, "David Jones" wrote:
[snip, a bunch]
>The Mahalanobis distances may be dimensionless with respect to the units of
>the underlying observations but that does not mean that they are immediately
>comparable across different sources of data. Even if the number of
>dimensions is the same you still need to look at context. For example, if
>used in some formal testing procedure, the power of such tests can be
>different. Consider two different sets of observations on the underlying
>quantity, one with rather more random observation error than the other.
>For different dimensions, consider the case where the dimensions are much
>more different, say 2 and 100. Then a typical value of the squared Mahalanobis
>distance for a point from the second population would be about 100, but this
>would be a very unusual value for a point from the first population. In fact the sets of
>values of distances for the two populations would hardly overlap. If this is
>meaningful for whatever way you intend to use the distances then OK. But
>many uses are of the kind where you are looking for datapoints that are
>unusual with respect to an initial distribution ... the Mahalanobis distance
>is not (without some transformation) directly usable in a comparison between
>sets of data with different dimensions, as exemplified in the case above
>where a value of 100 is unusual for one population but not the other.


I'm asking myself -- to judge which is more of an outlier, why
can't we consider the "p-value" of each distance under its own
chi-squared distribution, each with its own df?
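
To make the question concrete, here is a minimal sketch of that comparison. It assumes each population really is Gaussian with known mean and covariance, so that the squared Mahalanobis distance in p dimensions follows a chi-squared distribution with p degrees of freedom. The helper name `chi2_sf_even` is made up for this illustration, and it uses a closed form that holds only for even df; a statistics library would normally supply the upper-tail probability for any df.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail probability P(X > x) for a chi-squared X with EVEN df.

    Uses the identity P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

# The same squared distance, judged against 2 and 100 dimensions
# (the two dimensions used in the quoted example).
d2 = 100.0
p_2 = chi2_sf_even(d2, 2)
p_100 = chi2_sf_even(d2, 100)

print(f"df=2:   P(D^2 > {d2}) = {p_2:.3e}")
print(f"df=100: P(D^2 > {d2}) = {p_100:.3f}")
```

On this scale the two points become comparable as tail probabilities: a squared distance of 100 is astronomically rare with 2 dimensions and close to the middle of the distribution with 100. Whether ranking outliers by such p-values is sound in practice is exactly the open question; the sketch only shows the arithmetic.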

I'm not saying that this is a good idea. -- I *suspect* that there
is something shaky about it, or I might have heard of it being
done before, and it doesn't seem familiar. Or, is that just
because the circumstances are too rare in my reading?

Wikipedia tells me that the M distance was first used in anthropology,
for categorizing new skulls. They must have "missing" values to account
for, at times when some measurements aren't available, which
would create the same circumstance. I wonder what they do.
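
As an illustration of how missing measurements lead to the same circumstance, here is a rough sketch of one possible approach, not a description of what anthropologists actually do. It computes the squared Mahalanobis distance from only the observed coordinates, using the matching sub-vector of the mean and sub-matrix of the covariance, and returns the number of observed coordinates as the df. The function names, the use of `None` for a missing value, and the numbers are all made up for the example.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance sub-matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis_sq_observed(x, mean, cov):
    """Squared Mahalanobis distance over the non-missing coordinates of x.

    Missing values are marked with None. Returns (d2, df), where df is the
    number of observed coordinates.
    """
    obs = [i for i, v in enumerate(x) if v is not None]
    diff = [x[i] - mean[i] for i in obs]
    sub = [[cov[i][j] for j in obs] for i in obs]
    z = solve(sub, diff)  # z = sub^{-1} diff
    return sum(d * w for d, w in zip(diff, z)), len(obs)

# Hypothetical three-measurement example.
mean = [0.0, 0.0, 0.0]
cov = [[4.0, 2.0, 0.0],
       [2.0, 9.0, 0.0],
       [0.0, 0.0, 1.0]]

d2_full, df_full = mahalanobis_sq_observed([2.0, 3.0, 1.0], mean, cov)
d2_part, df_part = mahalanobis_sq_observed([2.0, None, 1.0], mean, cov)
print(d2_full, df_full)
print(d2_part, df_part)
```

Under a Gaussian model any subset of coordinates is again Gaussian, so the partial distance would be referred to a chi-squared distribution with fewer df than the complete one. The two specimens then carry distances on different scales, which is the different-dimensions situation discussed above.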

Rich Ulrich
