
Re: Mahalanobis_distance and Gaussian distribution
Posted: Jan 17, 2013 2:02 PM


On Thu, 17 Jan 2013 11:21:24 -0000, "David Jones" <dajhawk@hotmail.co.uk> wrote:

[snip, a bunch]
>
>The Mahalanobis distances may be dimensionless with respect to the units of
>the underlying observations but that does not mean that they are immediately
>comparable across different sources of data. Even if the number of
>dimensions is the same you still need to look at context. For example, if
>used in some formal testing procedure, the power of such tests can be
>different. Consider two different sets of observations on the underlying
>quantity, one with rather more random observation error than the other.
>
>For different dimensions, consider the case where the dimensions are much
>more different, say 2 and 100. Then a typical value of Mahalanobis distance
>for a point from the second population would be 100, but this would be a very
>unusual value for a point from the first population. In fact the sets of
>values of distances for the two populations would hardly overlap. If this is
>meaningful for whatever way you intend to use the distances then OK. But
>many uses are of the kind where you are looking for datapoints that are
>unusual with respect to an initial distribution ... the Mahalanobis distance
>is not (without some transformation) directly usable in a comparison between
>sets of data with different dimensions, as exemplified in the case above
>where a value of 100 is unusual for one population but not the other.

David,
I'm asking myself: to judge which is more of an outlier, why can't we consider the "p-value" of each point under its own chi-squared distribution, even though the two distributions have different df's?
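
To make the question concrete, here is a rough sketch of that comparison, not a recommendation. It assumes Gaussian data with known mean and covariance, so that the *squared* Mahalanobis distance is chi-squared with df equal to the number of dimensions. The helper name chi2_sf_even is made up for this post, and it only handles even df, where the upper tail has a simple closed form.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail probability P(X > x) for a chi-squared X with even df.

    For df = 2m the tail has the closed form
        exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    (Hypothetical helper; odd df is not handled in this sketch.)
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)      # the j = 0 term
    total = term
    for j in range(1, df // 2):
        term *= half / j        # builds (x/2)^j / j! incrementally
        total += term
    return total

# Assumed example: the same squared distance of 100, seen in 2 and in 100 dimensions.
d2 = 100.0
p_2 = chi2_sf_even(d2, 2)       # tail probability with df = 2
p_100 = chi2_sf_even(d2, 100)   # tail probability with df = 100
print(p_2, p_100)
```

With df = 2 the tail probability is exp(-50), which is vanishingly small; with df = 100 it is a bit under 0.5. So on the p-value scale the same raw value is extreme in one case and entirely ordinary in the other.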
I'm not saying that this is a good idea.  I *suspect* that there is something shaky about it, or else I would have heard of it being done before, and it doesn't seem familiar. Or, is that just because the circumstances are too rare in my reading?
Wikipedia tells me that the Mahalanobis distance was first used in anthropology, for categorizing new skulls. They should have "missing" data to account for, at times when some measurements aren't available, which would create the same circumstance: distances computed from different numbers of dimensions. I wonder what they do.
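
I don't know what the anthropologists actually do. Purely as an illustration of how missing measurements lead to differing df's, here is one simple option: compute the squared distance from the observed measurements only, using the matching sub-vector of the mean and sub-matrix of the covariance. Under the Gaussian assumption that squared distance is chi-squared with df equal to the number of measurements observed. The function names and the use of None for a missing value are my own choices for this sketch.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def squared_distance_observed(x, mean, cov):
    """Squared Mahalanobis distance over the non-missing entries of x.

    Missing entries are marked with None (an assumption of this sketch).
    Returns (d2, k), where k is the number of observed measurements,
    i.e. the df of the reference chi-squared distribution.
    """
    idx = [i for i, v in enumerate(x) if v is not None]
    diff = [x[i] - mean[i] for i in idx]
    sub_cov = [[cov[i][j] for j in idx] for i in idx]
    z = solve(sub_cov, diff)                            # sub_cov^{-1} diff
    d2 = sum(d * w for d, w in zip(diff, z))
    return d2, len(idx)

# Made-up example: three measurements, the second one missing.
mean = [0.0, 0.0, 0.0]
cov = [[2.0, 0.5, 1.0],
       [0.5, 1.0, 0.0],
       [1.0, 0.0, 2.0]]
d2_obs, k_obs = squared_distance_observed([1.0, None, 1.0], mean, cov)
print(d2_obs, k_obs)
```

A complete skull and one with a missing measurement then give distances referred to different df's, which is the comparison problem raised above.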
 Rich Ulrich

