Date: Feb 12, 2013 4:39 PM
Author: RGVickson@shaw.ca
Subject: Re: Question: Centroid given a distance metric
On Tuesday, February 12, 2013 10:26:24 AM UTC-8, Nicolas Bonneel wrote:

> On 2/11/2013 12:17 PM, Andrey Savov wrote:
> > Was wondering if you guys can point me in the right direction.
> >
> > Are there any known/studied methods to calculate a centroid (geometric center) of finite set of points in n-dimensional real Euclidean space by only knowing a distance metric f(x,y): R^n x R^n -> R ?
>
> Have you tried posing it as an optimization problem:
>
> F(x) = \argmin \sum_i d(x, x_i)^2
>
> and running any optimization method ?
>
> There won't likely be a close form solution for an arbitrary distance
> d(x,y), but if it's smooth and the dimension not too large, you can
> manage to find a global optimum. It will not necessarily be unique
> though, but should exist if d is not a strange function (like d=\infty
> everywhere etc.).
>
> --
> Nicolas Bonneel

If, as is often the case, the distance function is convex, the sum of its squares is also convex (squaring is convex and nondecreasing on nonnegative values), so a local min will be a global min. However, the problem arises that sometimes the function F(x) = \sum_i d(x, x_i)^2 is NOT smooth: the minimum may (and in practical problems often does) lie right on top of one of the points x_i, making F non-differentiable at the optimal solution. This does not always happen, but it happens often enough that location-analysis folks have had to devise special algorithms to handle the problem.
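To make the non-smoothness concrete, here is a small sketch in Python. Everything in it is my own illustrative choice, not something from the earlier posts: the four planar points, the 1-norm distance, and the helper names `dist_l1` and `F`. For this data the minimizer of F is the data point (1, 0), and the one-sided slopes of F in the second coordinate disagree there, so no gradient exists at the optimum.

```python
def dist_l1(p, q):
    """1-norm (Manhattan) distance between two points given as tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def F(x, points):
    """Sum of squared 1-norm distances from x to every point."""
    return sum(dist_l1(x, p) ** 2 for p in points)


# Hypothetical data, chosen so the minimizer coincides with a data point.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
x_star = (1.0, 0.0)
f_star = F(x_star, points)

# One-sided difference quotients in the second coordinate at x_star.
h = 1e-6
slope_up = (F((1.0, h), points) - f_star) / h        # roughly +2
slope_down = (F((1.0, -h), points) - f_star) / (-h)  # roughly -6

# Coarse grid check that no nearby point does better than x_star.
grid = [(1.0 + 0.1 * i, 0.1 * j) for i in range(-20, 21) for j in range(-20, 21)]
is_grid_min = all(F(g, points) >= f_star for g in grid)

print(f_star, slope_up, slope_down, is_grid_min)
```

A gradient-based solver that assumes a well-defined derivative at the solution can stall or zigzag near such a kink, which is why derivative-free or subgradient-style methods tend to be used for this kind of problem.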

I would ask: why do you want to minimize the sum of squares? For Euclidean distance, that F(x) has some physical and statistical meaning, and furthermore leads to a simple solution: the minimizer is just the coordinate-wise mean of the points. However, for other norms such as the 1-norm distance d(x,y) = \sum_j |x_j - y_j| or the max-norm distance d(x,y) = \max_j |x_j - y_j|, or for a p-norm with 1 < p < 2, what significance can one attach to the sum of squares? Certainly it makes _some_ problems much harder instead of easier (for example, when d is the 1-norm distance).
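For the Euclidean case the simple solution can be checked numerically. The sketch below uses made-up sample points and hypothetical helper names (`mean_point`, `sum_sq_euclid`); it verifies the standard identity \sum_i |x - x_i|^2 = \sum_i |m - x_i|^2 + n |x - m|^2, where m is the coordinate-wise mean, which shows that any x other than m gives a strictly larger sum of squares.

```python
import math


def mean_point(points):
    """Coordinate-wise arithmetic mean of a list of equal-length tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[j] for p in points) / n for j in range(dim))


def sum_sq_euclid(x, points):
    """Sum of squared Euclidean distances from x to every point."""
    return sum(sum((xj - pj) ** 2 for xj, pj in zip(x, p)) for p in points)


# Hypothetical sample data and an arbitrary trial point.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
m = mean_point(points)
x = (1.5, 0.25)

lhs = sum_sq_euclid(x, points)
shift_sq = sum((xj - mj) ** 2 for xj, mj in zip(x, m))
rhs = sum_sq_euclid(m, points) + len(points) * shift_sq

print(m, lhs, rhs, math.isclose(lhs, rhs))
```

No comparable closed form is available for the squared 1-norm or max-norm objectives, which is the point of the question above.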

Ray Vickson