On Thu, 11 Jul 2013 15:54:29 -0700 (PDT), firstname.lastname@example.org wrote:
>On Thursday, July 11, 2013 2:11:21 PM UTC-7, Leon Aigret wrote: >> The cost function SUM(y_i - y(x_i)) for the line y = a x + b becomes >> SUM(y_i - a x_i - b) = N (y_mean - a x_mean - b), so minimizing this >> function corresponds with maximizing a x_mean +b. Calculating x_mean >> is O(N), but has to be done just once. > >Carrying this one step further, minimizing the cost function is just maximizing a. The problem with this is that the cost is subject to y_i-y(x_i)>=0
At this point in the analysis the cost function can and will be used to evaluate every line y = a x + b which lies below all data points (i.e. y_i-y(x_i)>=0 for all i) or, equivalently, lies below all points of the convex hull, so just maximizing a is not an option.
>> Actually, since a x_mean + b has the geometrical interpretation of >> y-coordinate of the intersection of y = a x + b and x = x_mean, >> repeated evaluation of that expression can be replaced by the >> geometrical argument that the best line is the line with the highest >> intersection point, which must (and can) be the point where the line >> x = x_mean reaches the convex hull. > >Can you explain what you mean by "the line with the highest intersection point"?
More precisely: the line that, among all lines y = a x + b that lie below the convex hull, has the highest intersection point (x_mean, a x_mean + b) with the line x = x_mean.