On Thu, 11 Jul 2013 16:14:05 -0500, quasi <email@example.com> wrote:
>"Julio Di Egidio" wrote:
>>"quasi" wrote:
>>>ols6000 wrote:
>>>>
>>>>One further thing to keep in mind: a method in which you select
>>>>the best of N-1 lines is O(N^2), because calculation of the sum
>>>>for each line is O(N). That means such a method isn't practical.
>>>
>>>Good point.
>>>
>>>The most efficient algorithm proposed so far has complexity O(N)
>>>_times_ the complexity of a distance sum calculation.
>>>
>>>However, a distance sum calculation by its very nature is O(N),
>>>so if each one of the distance sum calculations has to be done
>>>separately, and if each one is O(N), then, as you indicate, the
>>>whole task is O(N^2).
>>>
>>>It seems unlikely that you can do better.
>>
>>As already noted, we can indeed do better, namely O(N+N), as
>>there is no need to recalculate the cost function from scratch
>>at each step.
>
>I must have missed that.
>
>If you can avoid recalculations of the cost function at each
>step, then, as you indicate, the whole task is just O(N), and
>thus, ols6000 is potentially back in business.
The cost function SUM(y_i - y(x_i)) for the line y = a x + b becomes SUM(y_i - a x_i - b) = N (y_mean - a x_mean - b). Since y_mean is a constant, minimizing this function corresponds to maximizing a x_mean + b. Calculating x_mean is O(N), but it has to be done just once; after that, each candidate line costs only O(1) to evaluate.
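As a minimal sketch of that observation (the function name and the representation of a line as an (a, b) pair are my own choices, not from the thread; it assumes every candidate line already lies below all the points):

```python
def best_line(points, lines):
    """Pick, from candidate lines (a, b) all lying below the points,
    the one minimizing SUM(y_i - a*x_i - b)."""
    # O(N), done once: only x_mean is needed, since y_mean is the
    # same constant for every candidate line.
    x_mean = sum(x for x, _ in points) / len(points)
    # O(1) per line: minimizing the sum is maximizing a*x_mean + b.
    return max(lines, key=lambda ab: ab[0] * x_mean + ab[1])
```

With N points and M candidate lines this is O(N + M) rather than O(N * M).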
Actually, since a x_mean + b has the geometrical interpretation of the y-coordinate of the intersection of y = a x + b with the vertical line x = x_mean, repeated evaluation of that expression can be replaced by a geometrical argument: the best line is the one with the highest intersection point, which must (and can) be the point where the line x = x_mean reaches the convex hull of the data points.
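That geometrical argument can be sketched as follows (my own sketch, not code from the thread; it uses the lower half of Andrew's monotone-chain hull and assumes the points have more than one distinct x value). The best supporting line from below is then the lower-hull edge spanning x = x_mean:

```python
def cross(o, a, b):
    # Positive iff o -> a -> b makes a counterclockwise (left) turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    # Lower half of Andrew's monotone chain, left to right: O(N log N)
    # for the sort, O(N) for the scan.
    pts = sorted(points)
    hull = []
    for p in pts:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def best_supporting_line(points):
    """Return (a, b) for the line below all points whose intersection
    with x = x_mean is highest, i.e. the lower-hull edge at x_mean."""
    x_mean = sum(x for x, _ in points) / len(points)
    hull = lower_hull(points)
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x_mean <= x2:
            a = (y2 - y1) / (x2 - x1)
            return a, y1 - a * x1
```

Note x_mean always lies strictly between the leftmost and rightmost data points, so some lower-hull edge spans it.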