On Wednesday, July 10, 2013 11:00:53 AM UTC-7, ols...@sbcglobal.net wrote: > On Wednesday, July 10, 2013 7:24:31 AM UTC-7, Leon Aigret wrote: > > > About the only thing not yet done in this thread is summarizing the > > > findings: > > > > Here is a summary of where I think we are so far. > > > > Leon Aigret says > > > The lower bound criterion turns out to be equivalent with the > > > requirement that the bounding line stays below, with just touching > > > allowed, the convex hull of the set of points. > > > > This has not been proved. Only the fact that a "best" bounding line must pass through at least one point has been shown (by quasi). > > > > If it turns out to be true that said line must pass through at least two points, then we have an O(n^2) method, which is too slow for practical use. > > > > The convex hull may be found in O(n) time, given that the data is already sorted, but even if the optimum line(s) pass through two points on the hull (and this hasn't yet been proved), the method is still O(n^2). > > > > The hypothesis that an optimum line must pass through two neighboring points of the original data is incorrect, as shown by counter-example. > > > > It may be that an optimum line must pass through two neighboring points of the convex hull, but this has not been proved.
The best line must pass through at most two of the data points; here is why.
You can formulate the best-line problem as a linear program: maximize a + x_bar * b subject to a + x_i * b <= y_i, i=1,..., N Here, a and b are "free", not sign-restricted. This optimization problem has 2 variables (a, b) and N constraints. Its DUAL is minimize sum_i y_i * v_i subject to sum_i v_i = 1, sum x_i * v_i = x_bar and v_i >= 0 for i = 1,2,...,N. This problem has N variables (the dual variables v_i) and two constraints. The dual constraints are equalities because the primal variables a and b are "free"; the dual variables v_i are sign-restricted because the primal constraints are inequalities.
There exists an optimal solution at a basic feasible solution of the dual; any basic solution (feasible or not) has at most as many non-zero variables as there are constraints, so in this case any basic solution has at most two positive variables. The positive dual variables correspond to tight primal constraints, so correspond to two points through which the bounding line must pass.
Note: when we say it *must* pass through at most two data points, we mean that it *need not be forced* to pass through 3 or more; however, it can "accidentally" pass through 3 or more points. This would correspond to a dual linear program having a non-unique optimal solution.