Search All of the Math Forum:

Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Notice: We are no longer accepting new posts, but the forums will continue to be readable.

Topic: Steps toward more efficient optimization
Replies: 11   Last Post: Aug 6, 2012 12:44 PM

 Messages: [ Previous | Next ]
 RGVickson@shaw.ca Posts: 1,677 Registered: 12/1/07
Re: Steps toward more efficient optimization
Posted: Aug 5, 2012 7:09 PM

On Sunday, August 5, 2012 12:03:26 PM UTC-7, (unknown) wrote:
> Mr. Vickson,
>
>
>
> You understand perfectly. I'm solving for A using experimentally derived B and C, not the other way around, but you get the point.
>
>
>
> I'm surprised by one thing you said: When I described how I would compose the LP problem, I used an intermediate vector Y. You rightly pointed out that Y is not necessary; the equations can be composed easily without it. However, does this matter? I had assumed that the solver would automatically process the additional complexity in a trivial amount of time.
>
>
>
> Also, thank you for mentioning the dual problem. I know nothing about that, so I will do some research into it.
>
>
>
> ----
>
>
>

> > I don't see why the problems you get are as large as you claim.
>
>
>
> This gives me the opportunity to re-explain everything -- which I can do better now that I've thought things through.
>
>
>
> I'm trying to tune a nonlinear model f(A,X)=Y by adjusting A to make Y as close as possible to its measured values. There are L elements in A (parameters), M elements in X (independent variables), and N elements in Y (dependent variables). I have P measurements.
>
>
>
> By exploring the model, I've discovered that it has the linear form g(X) x A + C = Y where g(X) is a nonlinear transformation of X and C is a vector of offsets. I don't have an explicit description of g(). All I know is what I can infer by running the model with different values of A and X.
>
>
>
> Suppose I want to find g(X0). Logically, g(X0) is the partial derivative dY0/dA. I can find g(X0), therefore, by running the model with perturbed values of A and seeing how Y0 changes. Once I've done that for all elements of A, I can replace g(X0) with an LxN matrix of known values, let's say B0. Now B0 x A + C0 = Y0.
>
>
>
> I don't want to do that for just one value of X, however; I want to do it for all values of X corresponding to the measured points. After I do that, I have B x A + C = Y, where B is an LxNP matrix, A is unchanged at Lx1, C is NPx1, and Y is NPx1. This is the equation I need to optimize.
>
>
>
> In other words, in my linear recasting of the problem, I have L independent variables, NP dependent variables, and one point to fit. That is why I am expecting so many equations. Counting the variables like you did, I expect to have ~80 As + ~20*50 Zs = ~1080 variables. The basic equations will result in ~20*50 = 1000 equalities, and coding the absolute values is going to create ~2*20*50 = 2000 inequalities. Plus, I'll have constraints that result in a handful of additional equalities (~4) and inequalities (~160).
>
>
>
> ----
>
>
>
> I've decided that, before I start working in lp_solve, I'm going to do more research and see if I can find a linear solver which uses a method of least squares. The requirement to code absolute values as inequality constraints seems inefficient for this problem.
>
>
>
> ----
>
>
>
> Also, I have a new question: Is there a convention to the symbols used for representing the sizes of matrices in LP problems? I get the impression that my use of L, M, N, and P is not standard.
>
>
>
>
>
> -TC

First let's address your question "When I described how I would compose the LP problem, I used an intermediate vector Y. You rightly pointed out that Y is not necessary; the equations can be composed easily without it. However, does this matter?"

The simple answer is that it matters a lot, and including unnecessary equations can be serious in terms of CPU time, etc. Here is why. Let's say we have a problem with 400 variables and 400 inequality constraints of the form z_k >= u_k - sum_j w_kj v_j and z_k >= sum_j w_kj v_j - u_k (where the u and v are known and the w, z are the variables). This would be a model without equations. What a simplex-based LP solver would do is introduce slack and/or surplus variables to convert all inequalities (except for sign restrictions) to equations, so the new problem would have 400 equations and 800 variables. It would work in some way with a nonsingular sub-matrix whose size is the number of constraints, in this case a 400x400 matrix,and would then use LU decomposition (or, in the old days, a matrix inverse) to get a basic feasible solution, and to update the matrix when changing to a new basis. So, the basic work would be on a number of related 400x400 matrices, which you can think of as being inverted. Now suppose, instead, that you introduce 200 new variables y_k and introduce into the model the 200 equations y_k = sum_j w_kj v_j, then re-write the inequalities as z_k >= y_k - u_k and z_k >= u_k - y_k. Now the solver still introduces a total of 400 slack/surplus variables, so now has 400+200 = 600 equations in 800+200 = 1000 variables. Now the method would work throughout with a sequence of 600x600 matrices that you can think of as being inverted. You and I know that the equations for the y_k are just a "shorthand", and do not change the basics of the model; however, the computer does not know this, so it treats the y_k as genuine new variables and their definitions as genuine new equations.

However, back when I was teaching this stuff, before retiring, I always used to advise my students to use the most convenient form for modelling, just as long as the solver technology could handle the problem in a reasonable time. So, if introducing a lot of strictly unnecessary variables helps in simplifying the modelling formulations (as well as helping to guard against modelling errors and typos, etc.) then that is the thing to do. But, when this results in a model too large to be handled by the available solvers (often student versions, for example) then they need to start eliminating those convenient extra variables to cut down on the problem size; or, if doing that improves the CPU time significantly (say reducing it from several hours to a few seconds) then again, that is what they should do. However, in a real-world design project it might not matter so much that the CPU time is in hours or days, since it maybe will be done off-line and just a few times prior to the final design. Perhaps your situation is covered by this circumstance, and if so, those extra variables may not matter much in practice.

BTW: the above account explains why, instead of solving a 400x200 problem P it is better to solve the 200x400 dual problem D, because instead of inverting 400x400 matrices in the primal problem we would be inverting 200x200 matrices in the dual. Many codes automatically print out the dual solution along with the primal solution. So, if you attack the solution of problem P by solving its dual D, the printed solution of D would also print out its (that is, D's) dual. The dual of D is P, so you can read off the solution to your original problem P by looking in the right place in the printout for D.

Now, another issue: I am not sure I "get" what you are trying to do. Could you, perhaps, cook up a very small example, say with 2 dependent variables and 3 independent variables and 2 experiments (with actual, if fictitious numbers), and write down in detail (not using matrix notation, but actual formulas) the resulting problem? That would be very helpful. If that is too time-consuming, perhaps just one dependent variable, 2 independent variables and 2 experiments would serve to show what you mean.

RGV

Date Subject Author
8/3/12 TC
8/3/12 m. m. m.
8/3/12 RGVickson@shaw.ca
8/3/12 TC
8/3/12 donstockbauer@hotmail.com
8/4/12 RGVickson@shaw.ca
8/4/12 RGVickson@shaw.ca
8/4/12 RGVickson@shaw.ca
8/5/12 TC
8/5/12 RGVickson@shaw.ca
8/6/12 TC
8/6/12 TC