On Wednesday, July 10, 2013 12:45:01 PM UTC-7, Ray Vickson wrote:
> The best line must pass through at most two of the data points; here is why.
> You can formulate the best-line problem as a linear program:
>   maximize a + x_bar * b
>   subject to a + x_i * b <= y_i, i = 1,...,N
> Here, a and b are "free", not sign-restricted. This optimization problem has 2 variables (a, b) and N constraints. Its DUAL is
>   minimize sum_i y_i * v_i
>   subject to sum_i v_i = 1, sum_i x_i * v_i = x_bar and v_i >= 0 for i = 1,2,...,N.
> This problem has N variables (the dual variables v_i) and two constraints. The dual constraints are equalities because the primal variables a and b are "free"; the dual variables v_i are sign-restricted because the primal constraints are inequalities.
>
> There exists an optimal solution at a basic feasible solution of the dual; any basic solution (feasible or not) has at most as many non-zero variables as there are constraints, so in this case any basic solution has at most two positive variables. The positive dual variables correspond to tight primal constraints, so correspond to two points through which the bounding line must pass.
I know nothing about linear programming. Can you supply some references?
> Note: when we say it *must* pass through at most two data points, we mean that it *need not be forced* to pass through 3 or more; however, it can "accidentally" pass through 3 or more points. This would correspond to a dual linear program having a non-unique optimal solution.
I trust you didn't literally mean "must pass through at most two of the data points", then: if all the points were collinear, the line would pass through all of them.
Interesting that this idea doesn't directly use the convex hull, although I suspect the linear programming solvers do something closely related.
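Taking the quoted result at face value (an optimal line can be chosen to pass through two data points), here is a minimal brute-force sketch in Python that needs no LP solver. The function name best_lower_line, the tolerance eps, and the sample points are my own illustrative choices, and I'm assuming x_bar is the mean of the x_i and that at least two distinct x values occur.

```python
from itertools import combinations

def best_lower_line(points, eps=1e-9):
    """Among lines y = a + b*x through two data points that lie on or
    below every point, return (a, b, value) maximizing a + x_bar*b.
    Returns None if all points share the same x."""
    # Assumption: x_bar is the mean of the x-coordinates.
    x_bar = sum(x for x, _ in points) / len(points)
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a + b*x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        # Primal feasibility: a + x_i*b <= y_i for every data point.
        if all(a + b * x <= y + eps for x, y in points):
            value = a + x_bar * b
            if best is None or value > best[2]:
                best = (a, b, value)
    return best

# Hypothetical sample data, chosen only for illustration.
print(best_lower_line([(0, 1), (1, 0), (2, 1), (3, 3)]))
```

This tries every pair and checks it against every point, so it costs O(N^3) time and is only meant as a sanity check for small N; walking the lower convex hull or handing the problem to an LP solver should scale much better.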