Date: Jan 17, 2013 2:02 PM
Author: Richard Ulrich
Subject: Re: Mahalanobis_distance and Gaussian distribution

On Thu, 17 Jan 2013 11:21:24 -0000, "David Jones" wrote:
[snip, a bunch]
>The Mahalanobis distances may be dimensionless with respect to the units of
>the underlying observations but that does not mean that they are immediately
>comparable across different sources of data. Even if the number of
>dimensions is the same you still need to look at context. For example, if
>used in some formal testing procedure, the power of such tests can be
>different. Consider two different sets of observations on the underlying
>quantity, one with rather more random observation error than the other.
>For different dimensions, consider the case where the dimensions are much
>more different, say 2 and 100. Then a typical value of Mahalanobis distance
>for a point from the second population would be 100, but this would be a very
>unusual value for a point from the first population. In fact the sets of
>values of distances for the two populations would hardly overlap. If this is
>meaningful for whatever way you intend to use the distances then OK. But
>many uses are of the kind where you are looking for datapoints that are
>unusual with respect to an initial distribution ... the Mahalanobis distance
>is not (without some transformation) directly usable in a comparison between
>sets of data with different dimensions, as exemplified in the case above
>where a value of 100 is unusual for one population but not the other.


I'm asking myself -- to judge which is more of an outlier, why
can't we consider the "p-value" of each distance under its own
chi-squared distribution, each with its own df?
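
To make the question concrete, here is a rough sketch of that
comparison. It assumes Gaussian data with known mean and covariance,
so that the *squared* M distance is chi-squared with df equal to the
number of dimensions, and it treats the value 100 from the example
above as a squared distance. The function name chi2_sf_even_df is
made up for this post; it uses a closed form that holds only for
even df, so that nothing beyond the standard library is needed.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable.

    Hypothetical helper: valid for positive even df only, using
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Assumption: 100 is a squared Mahalanobis distance, judged once
# against 2 dimensions and once against 100 dimensions.
d2 = 100.0
p_2 = chi2_sf_even_df(d2, 2)      # equals exp(-50), vanishingly small
p_100 = chi2_sf_even_df(d2, 100)  # a little under 0.5

print("df=2:   p =", p_2)
print("df=100: p =", p_100)
```

On this scale the same distance is an extreme outlier for df=2 and
entirely ordinary for df=100. Whether the p-values are the right
common scale is exactly what I am unsure about. For odd df one would
need a general routine such as scipy.stats.chi2.sf.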

I'm not saying that this is a good idea. -- I *suspect* that there
is something shaky about it, or I might have heard of it being
done before, and it doesn't seem familiar. Or, is that just
because the circumstances are too rare in my reading?

Wikipedia tells me that the M distance was first used in anthropology,
for categorizing new skulls. They must have "missing" to account
for, at times when some measurements aren't available, which
would create the same circumstance. I wonder what they do.

Rich Ulrich