An algorithm for nonlinear optimization using linear programming

Math. Program., Ser. B 100: 27–48 (2004)
Digital Object Identifier (DOI) 10.1007/s10107-003-0485-4
Richard H. Byrd · Nicholas I.M. Gould · Jorge Nocedal · Richard A. Waltz
An algorithm for nonlinear optimization using linear
programming and equality constrained subproblems
To Roger Fletcher, friend and optimization wizard, on his 65th birthday.
Received: October 1, 2002 / Accepted: August 21, 2003
Published online: November 21, 2003 – © Springer-Verlag 2003
Abstract. This paper describes an active-set algorithm for large-scale nonlinear programming based on the
successive linear programming method proposed by Fletcher and Sainz de la Maza [10]. The step computation
is performed in two stages. In the first stage a linear program is solved to estimate the active set at the solution.
The linear program is obtained by making a linear approximation to the 1 penalty function inside a trust
region. In the second stage, an equality constrained quadratic program (EQP) is solved involving only those
constraints that are active at the solution of the linear program. The EQP incorporates a trust-region constraint
and is solved (inexactly) by means of a projected conjugate gradient method. Numerical experiments are
presented illustrating the performance of the algorithm on the CUTEr [1, 15] test set.
1. Introduction
Sequential quadratic programming (SQP) constitutes one of the most successful methods for large-scale nonlinear optimization. In recent years, active set SQP methods have
proven to be quite effective at solving problems with thousands of variables and constraints [9, 11], but are likely to become very expensive as the problems they are asked
to solve become larger and larger. This is particularly so since the principal cost of SQP
methods is often dominated by that of solving the quadratic programming (QP) subproblems, and in the absence of a QP truncating scheme, this cost can become prohibitive.
These concerns have motivated us to look for a different active-set approach.
In this paper we describe a trust-region algorithm for nonlinear programming that
does not require the solution of a general quadratic program at each iteration. It can be
viewed as a so-called “EQP form” [12] of sequential quadratic programming, in which
R.H. Byrd: Department of Computer Science, University of Colorado, Boulder, CO 80309, USA,
e-mail: [email protected]. This author was supported by Air Force Office of Scientific
Research grant F49620-00-1-0162, Army Research Office Grant DAAG55-98-1-0176, and National Science
Foundation grant INT-9726199.
N.I.M. Gould: Computational Science and Engineering Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire OX11 0Qx, England, EU; e-mail: [email protected]. This author was supported in part
by the EPSRC grant GR/R46641.
J. Nocedal, R.A. Waltz: Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208-3118, USA. These authors were supported by National Science Foundation grants CCR9987818, ATM-0086579 and CCR-0219438 and Department of Energy grant DE-FG02-87ER25047-A004.
Report OTC 2002/4, Optimization Technology Center
To Roger Fletcher, with respect and admiration
28
R.H. Byrd et al.
a guess of the active set is made (using linear programming techniques) and then an
equality constrained quadratic program is solved to attempt to achieve optimality.
The idea of solving a linear program (LP) to identify an active set, followed by the
solution of an equality constrained quadratic problem (EQP) was first proposed and analyzed by Fletcher and Sainz de la Maza [10], and more recently by Chin and Fletcher [4],
but has received little attention beyond this. This “sequential linear programming-EQP
method”, or SLP-EQP in short, is motivated by the fact that solving quadratic subproblems with inequality constraints, as in the SQP method, can be prohibitively expensive
for many large problems. The cost of solving one linear program followed by an equality
constrained quadratic problem, could be much lower.
In this paper we go beyond the ideas proposed by Fletcher and Sainz de la Maza in
that we investigate new techniques for generating the step, managing the penalty parameter and updating the LP trust region. Our algorithm also differs from the approach of
Chin and Fletcher, who use a filter to determine the acceptability of the step, whereas we
employ an 1 merit function. All of this results in major algorithmic differences between
our approach and those proposed in the literature.
2. Overview of the algorithm
The nonlinear programming problem will be written as
minimize f (x)
(2.1a)
x
subject to hi (x) = 0,
gi (x) ≥ 0,
i∈E
i ∈ I,
(2.1b)
(2.1c)
where the objective function f : Rn → R, and the constraint functions hi : Rn →
R, i ∈ Egi : Rn → R, i ∈ I, are assumed to be twice continuously differentiable.
The SLP-EQP algorithm studied in this paper is a trust-region method which uses a
merit function to determine the acceptability of a step. It separates the active-set identification phase from the step computation phase – unlike SQP methods where both tasks
are accomplished by solving a quadratic program – and employs different trust regions
for each phase. First, a linear programming problem (LP) based on a linear model of
the merit function is solved. The solution of this LP defines a step, d LP , and a working set W which is a subset of the constraints active at the solution of this LP. Next, a
Cauchy step, d C , is computed by minimizing a quadratic model of the merit function
along the direction d LP . The Cauchy step plays a crucial role in the global convergence properties of the algorithm. Once the LP and Cauchy steps have been computed,
an equality constrained quadratic program (EQP) is solved, treating the constraints in
W as equality constraints and ignoring all other constraints, to obtain the EQP point
x EQP .
The trial point x T of the algorithm is chosen to lie on the line segment starting at
the Cauchy point x C = x + d C and terminating at the EQP point x EQP , where x denotes
the current iterate. The trial point x T is accepted if it provides sufficient decrease of the
merit function; otherwise the step is rejected, the trust region is reduced and a new trial
point is computed.
An algorithm for nonlinear optimization
29
Algorithm 2.1: SLP-EQP – General Outline
Initial data: x0 , 0 > 0, LP
0 > 0, 0 < ρu ≤ ρs < 1.
For k = 0, 1, . . . , until a stopping test is satisfied, perform the following steps.
LP
1. LP point Solve an LP with trust region LP
k based on linear models of (2.1) to obtain step dk ,
and working set Wk .
2. Cauchy point Compute αkLP ∈ (0, 1] as the (approximate) minimizer of q(αdkLP ) such that
αkLP dkLP ≤ k . Set dkC = αkLP dkLP and xkC = xk + dkC .
3. EQP point Compute x EQP by solving an EQP with trust region k and constraints defined by
EQP
Wk . Define dkCE = xk − xkC as the segment leading from the Cauchy point to the EQP point.
EQP
4. Trial point Compute αk ∈ [0, 1] as the (approximate) minimizer of
EQP
q(dkC + αdkCE ). Set dk = dkC + αk dkCE and xkT = xk + dk .
5. Compute
ρk =
φ(xk ; νk ) − φ(xkT ; νk )
.
qk (0) − qk (dk )
6a. If ρk ≥ ρs , choose k+1 ≥ k , otherwise choose k+1 < k .
6b. If ρk ≥ ρu , set xk+1 ← xkT , otherwise set xk+1 ← xk .
7. Compute LP
k+1 .
The algorithm is summarized in Algorithm 2.1 above. Here φ(x; ν) denotes the 1
merit function
φ(x; ν) = f (x) + ν
|hi (x)| + ν
(max(0, −gi (x)),
(2.2)
i∈E
i∈I
with penalty parameter ν. A quadratic model of φ is denoted by q. The trust-region
radius for the LP subproblem is denoted by LP , whereas is the primary (master)
trust-region radius that controls both the size of the EQP step and the total step.
An appealing feature of the SLP-EQP algorithm is that established techniques for
solving large-scale versions of the LP and EQP subproblems are readily available. Current high quality LP software is capable of solving problems with more than a million
variables and constraints, and the solution of an EQP can be performed efficiently using
an iterative approach such as the conjugate gradient method. Two of the key questions
regarding the SLP-EQP approach which will play a large role in determining its efficiency are: (i) how well does the linear program predict the optimal active set, and (ii)
what is the cost of the iteration compared to its main competitors, the interior-point and
active-set approaches?
Many details of the algorithm are yet to be specified. This will be the subject of the
following sections.
3. The Linear Programming (LP) phase
The goal of the LP phase is to make an estimate of the optimal active set W ∗ , at moderate
cost. In general terms we want to solve the problem
30
R.H. Byrd et al.
minimize ∇f (x)T d
(3.1a)
d
subject to hi (x) + ∇hi (x)T d = 0,
gi (x) + ∇gi (x)T d ≥ 0,
i∈E
(3.1b)
i∈I
(3.1c)
d∞ ≤ ,
LP
(3.1d)
where LP is a trust-region radius whose choice will be discussed in Section 3.1. The
working set W is defined as the set of constraints that are active at the solution of this
LP, if these constraints are linearly independent. Otherwise, some linearly independent
subset of these form the working set.
Working with this LP is attractive since it requires no choice of parameters, but it
has the drawback that its constraints may be infeasible. This possible inconsistency of
constraint linearizations and the trust region has received considerable attention in the
context of SQP methods; see, e.g. [5] and the references therein.
To deal with the possible inconsistency of the constraints we follow an 1 -penalty
approach in which the constraints (3.1b)–(3.1c) are incorporated in the form of a penalty
term in the model objective. Specifically, we reformulate the LP phase as the minimization of a linear approximation of the 1 merit function (2.2) subject to the trust-region
constraint. The linear approximation of the merit function φ at the current estimate x is
given by
(d) = ∇f (x)T d + ν
+ν
|hi (x) + ∇hi (x)T d|
i∈E
max(0, −gi (x) − ∇gi (x)T d).
i∈I
The working-set determination problem is then given by
minimize (d)
d
subject to d∞ ≤ LP .
The function is non-differentiable but it is well-known that this problem can be written
as the following equivalent, smooth linear program
minimize ∇f (x)T d + ν
d,r,s,t
i∈E
T
(ri + si ) + ν
ti
subject to hi (x) + ∇hi (x) d = ri − si , i ∈ E
gi (x) + ∇gi (x)T d ≥ −ti , i ∈ I
d∞ ≤ LP
r, s, t ≥ 0.
(3.4a)
i∈I
(3.4b)
(3.4c)
(3.4d)
(3.4e)
Here r, s and t are vectors of slack variables which allow for the relaxation of the equality
and inequality constraints. We denote a solution of this problem by d LP (ν).
An algorithm for nonlinear optimization
31
The working set W is defined as some linearly independent subset of the active set
A at the LP solution point which is defined as
A(d LP ) = {i ∈ E | hi (x) + ∇hi (x)T d LP = 0}
∪{i ∈ I | gi (x) + ∇gi (x)T d LP = 0}.
The span of constraint gradients in W is the same as the span of active constraint gradients. Software for linear programming typically provides this linearly independent set.
If the LP subproblem is non-degenerate the working set is synonymous with the active
set defined above. Note that we do not include all of the equality constraints in the active
set but only those whose right hand side is zero in (3.4b), for otherwise the EQP system
could be overly constrained.
In our software implementation, simple bound constraints on the variables are omitted from the merit function and handled as explicit constraints. We will ensure that
the starting point and all subsequent iterates satisfy the bounds. In particular we add
lower and upper bounds to (3.4) to ensure that the LP step satisfies the bounds. For the
sake of simplicity, however, we will omit all details concerning the handling of bounds
constraints, and will only make remarks about them when pertinent.
3.1. Trust region for the LP step
Since the model objective (3.1a) is linear, the choice of the trust-region radius LP is
much more delicate than in trust-region methods that employ quadratic models. The trust
region must be large enough to allow significant progress toward the solution, but must
be small enough so that the LP subproblem identifies only locally active constraints. We
have found that it is difficult to balance these two goals, and will present here a strategy
that appears to work well in practice and is supported by a global convergence theory.
There may, however, be more effective strategies and the choice of LP remains an open
subject of investigation.
We update the LP trust region as follows. If the trial step dk taken by the algorithm
on the most current iteration was accepted (i.e., ρk ≥ ρu ), we define
LP
if αkLP = 1
min(max{1.2dk ∞ , 1.2dkC ∞ , 0.1LP
k }, 7k ),
, (3.5)
LP
=
C
LP
LP
k+1
min(max{1.2dk ∞ , 1.2dk ∞ , 0.1k }, k ),
if αkLP < 1
whereas if the step dk was rejected we set
LP
LP
LP
k+1 = min(max{0.5dk ∞ , 0.1k }, k ).
(3.6)
The motivation for (3.5) stems from the desire that LP
k+1 be no larger than a multiple of
the norm of the trial step dk and the Cauchy step dkC , so that the LP trust region be small
enough to exclude extraneous, inactive constraints as the iterate converges to a solution.
Note that the LP trust region can decrease after an accepted step, and we include the
term 0.1LP
k to limit the rate of this decrease. Finally, we only allow the LP trust region
to increase if the step is accepted and αkLP = 1 (i.e., dkLP = dkC ). Otherwise, if αkLP < 1 this
may be an indication that our linear models are not accurate. The term 7LP
k prevents
the LP trust region from growing too rapidly in the case when it may increase.
32
R.H. Byrd et al.
When the trial step dk is rejected, (3.6) ensures that kLP does not grow. We would
LP
again want to make LP
k+1 a fraction of dk ∞ , and the term 0.1k limits the rate of
decrease.
This LP trust-region update is supported by the global convergence theory presented
in Byrd et al [2], which also provides a range of admissible values for the constants in
(3.5)–(3.6). The results in [2] show in particular that, if the sequence of iterates {xk }
is bounded, then there exists an accumulation point of the SLP-EQP algorithm that is
stationary for the merit function φ.
4. The Cauchy point
The reduction in the objective and constraints provided by the LP step can be very small.
To ensure that the algorithm has favorable global convergence properties, we will require
that the total step makes at least as much progress as a Cauchy point x C . This is a point
which provides sufficient decrease of a quadratic model of the merit function along the
LP direction d LP and subject to the restriction x C − x2 ≤ . The quadratic model,
q(d), is defined as
q(d) = (d) + 21 d T H (x, λ)d,
(4.1)
where H (x, λ) denotes the Hessian of the Lagrangian of the NLP problem (2.1) and λ
is a vector of Lagrange multiplier estimates defined later on in (5.7). To define the Cauchy point, we select 0 < τ < 1, let δ = min(1, /||d LP 2 ) and compute a steplength
0 < α LP ≤ 1 as the first member of the sequence {δτ i }i=0,1,... for which
φ(x; ν) − q(α LP d LP ) ≥ η[φ(x; ν) − (α LP d LP )],
(4.2)
where η ∈ (0, 1) is a given constant. We then define
x C = x + α LP d LP ≡ x + d C .
(4.3)
The backtracking line search used to compute α LP does not involve evaluations of the
problem functions, but rather, only evaluations of their inexpensive model approximations.
5. The EQP step
Having computed the LP step d LP which determines the working set W, we now wish
to compute a step d that attempts to achieve optimality for this working set by solving
an equality constrained quadratic program (EQP) of the form
minimize
d
1 T
2 d H (x, λ)d
+ ∇f (x)T d
subject to hi (x) + ∇hi (x)T d = 0,
gi (x) + ∇gi (x)T d = 0,
d2 ≤ .
(5.1a)
i ∈E ∩W
(5.1b)
i ∈I ∩W
(5.1c)
(5.1d)
An algorithm for nonlinear optimization
33
The trust-region radius places some restriction on the step size and prevents the step
from being unbounded in the case of negative curvature. Note that the constraints (5.1b)–
(5.1c) are consistent by definition of the working set W, but to make them compatible
with the trust region we may relax them, as will be explained below.
Let AW ∈ Rp×n represent the Jacobian matrix of the constraints in the working set where p is the number of constraints in the working set, and define a matrix
ZW ∈ Rn×(n−p) which is a null-space basis for AW (i.e., AW ZW = 0). One can
express the solution of (5.1) as
d = d N + ZW d Z ,
(5.2)
for some vector d N which satisfies the constraints (5.1b)–(5.1c) and some reduced space
vector d Z ∈ Rn−p . The vector d N will be computed here as
d N = α N d̂ N ,
(5.3)
where d̂ N is the minimum norm solution of (5.1b)–(5.1c) and α N is the largest value
in [0, 1] for which α d̂ N 2 ≤ 0.8. (If α N < 1, we replace the zeros in the right hand
sides of (5.1b) and (5.1c) by
[rE ]i = hi (x) + ∇hi (x)T d N , i ∈ E ∩ W, [rI ]i = gi (x) + ∇gi (x)T d N , i ∈ I ∩ W.)
If we define d̄ = ZW d Z as a step in the null-space of the working set constraint
gradients, then we can compute the EQP step d EQP as an approximate solution of the
problem
minimize
d̄
1 T EQP
(x, λ)d̄
2 d̄ H
subject to ∇hi (x)T d̄ = 0,
∇gi (x)T d̄ = 0,
d̄2 ≤ EQP
+ (g EQP )T d̄
(5.4a)
i ∈E ∩W
(5.4b)
i ∈I ∩W
(5.4c)
,
(5.4d)
where the definitions of the matrix H EQP (x, λ) and the vector g EQP are discussed below,
and
EQP = 2 − d N 22 .
The EQP point is computed as
x EQP = x + d N + d EQP .
(5.5)
The matrix H EQP should approximate the Hessian of the Lagrangian of the NLP
problem (2.1), and this raises the question of how to compute the Lagrange multipliers
that define H EQP . As discussed in Section 8, we will generally use a least squares multipliers approach in which inactive constraints are assigned zero multipliers. This choice
is appropriate for the termination test and most other aspects of the algorithm, but is
inadequate for the EQP step computation. The reason is that by setting the multipliers
for the inactive constraints to zero we ignore curvature information concerning violated
constraints – and this can lead to inefficiencies, as we have observed in practice. Instead
34
R.H. Byrd et al.
of this, for the EQP step we will obtain Lagrange multiplier estimates for the inactive
constraints based on our 1 penalty function.
Let us define the set of violated general constraints for the projection step d N as
V = {i ∈
/ W | hi (x) + ∇hi (x)T d N = 0} ∪ {i ∈
/ W | gi (x) + ∇gi (x)T d N < 0}, (5.6)
and denote its complement by V c . The Hessian of the quadratic model (5.4a) will be
defined as
H EQP (x, λ) = ∇ 2 f (x) + ν
sign(hi (x) + ∇hi (x)T d N )∇ 2 hi (x)
−ν
i∈V ∩I
i∈V ∩E
∇ gi (x) −
2
i∈V c ∩E
λi ∇ 2 hi (x) −
λi ∇ 2 gi (x). (5.7)
i∈V c ∩I
The terms involving ν in (5.7) are the Hessians of the penalty terms in the 1 function
φ for the violated constraint indices. Since these penalty terms are inactive for the projection step d N , they are smooth functions within some neighborhood of this point. The
signs for these terms are based on the values of the linearization of these constraints
at the projection point. We view (5.7) as the Hessian of the penalty function φ, where
inactive, violated constraints have been assigned non-zero multipliers. We should note
that, for consistency, the Hessian H EQP defined in (5.7) is the same Hessian used in the
definition of the quadratic model (4.1) used for computing the Cauchy and trial steps,
and is the only Hessian computed throughout the algorithm.
We can also incorporate linear information on the violated constraints into the EQP
step by defining
g EQP = H EQP (x, λ)d N + ∇f (x)
sign(hi (x) + ∇hi (x)T d N )∇hi (x) − ν
∇gi (x).
+ν
i∈V ∩E
(5.8)
i∈V ∩I
The last three terms in (5.8) represent the gradient of the terms in the penalty function
whose linearization is nonconstant on the working set subspace.
To summarize, these definitions are necessitated by the active-set approach followed
in this paper. In a classical SQP method, the QP solver typically enforces that the linearized constraints are satisfied throughout the step computation process. In this case, it is
not necessary to include curvature information on violated constraints since the violated
set V would be empty. By contrast our algorithm may completely ignore some of the
constraints in the EQP phase and we need to account for this.
5.1. Solution of the EQP
The equality constrained quadratic problem (5.4), with its additional spherical trustregion constraint, will be solved using a projected Conjugate-Gradient/Lanczos iteration, as implemented in the GALAHAD code GLTR of Gould et al. [14] (HSL routine
VF05 [16]). This algorithm has the feature of continuing for a few more iterations after
the first negative curvature direction is encountered.
An algorithm for nonlinear optimization
35
The projected CG/Lanczos approach applies orthogonal projections at each iteration
to keep d̄ in the null-space of AW . The projection of a vector v, say w = P v, is computed
by solving the system
w
v
I
ATW (x)
=
(5.9)
u
0
0
AW (x)
where u is an auxiliary vector; see also [13]. We use the routine MA27 from the HSL
library [16] to factor this system. We note that the projection step d N is computed without
much work since it uses the same factorization as is used in solving (5.9).
The CG iteration can be preconditioned to speed up convergence by replacing the
identity matrix in the (1,1) block of the coefficient matrix in (5.9) with a preconditioner
G which in some sense approximates H EQP . However, we will not consider preconditioners here since they require significant changes to various aspects of our algorithm.
The termination criteria used for solving this subproblem is based on the norm of the
relative residual, stopping when this value is less than some tolerance. In the tests reported
√
in Section 11 we used a termination tolerance of 1.5 × 10−8 ≈ mach , where mach
denotes the machine precision.
6. The trial step
Having computed the LP, Cauchy and EQP steps, we now combine them to define the
trial step of the iteration, d, in such a way as to obtain sufficient decrease in the quadratic
model of the penalty function.
We consider the vector leading from the Cauchy point to the EQP point,
d CE = x EQP − x C ,
where x C and x EQP are defined in (4.3) and (5.5), respectively. We then compute the steplength α EQP ∈ [0, 1] which approximately minimizes q(d C + αd CE ), where q is given
by (4.1). (If some bounds of the NLP are violated, we decrease α EQP further so that they
are satisfied.) The trial step of the iteration will be defined as
d = d C + α EQP d CE ,
where d C is the step to the Cauchy point. In practice we do not implement an exact line
search to compute α EQP , but rather use a backtracking line search that terminates as soon
as the model q has decreased.
The computation of the trial step d is similar to the dogleg method of Powell [20, 21]
for approximately minimizing a quadratic objective subject to a trust-region constraint.
As in the dogleg approach, the step is computed via a one dimensional line search along
a piecewise path from the origin to the Cauchy point x C to a Newton-like point (the
EQP point x EQP ). However, in contrast to the standard dogleg method, the model q is
not necessarily a decreasing function along the segment from the Cauchy point to the
EQP point when the Hessian is positive-definite (which is why a line search is used to
compute α EQP ). Since the minimizer can occur at x C we set α EQP = 0 if it becomes very
small (in our tests, less than 10−16 ).
36
R.H. Byrd et al.
7. Step acceptance, trust region update and SOC
Given a current point x and penalty parameter ν, a trial point, x T given by a step d is
accepted if
ρ=
ared
φ(x; ν) − φ(x T ; ν)
=
> ρu ,
pred
q(0) − q(d)
ρu ∈ [0, 1).
(7.1)
In our implementation we set ρu = 10−8 . Since we always ensure that the predicted
reduction is positive (by the choices of α LP and α EQP used to compute the trial step d),
the acceptance rule (7.1) guarantees that we only accept steps which give a reduction in
the merit function.
As is well known (Maratos [19]) steps that make good progress toward the solution
may be rejected by the penalty function φ, which may lead to slow convergence. We
address this difficulty by computing a second order correction (SOC) step [8], which
incorporates second order curvature information on the constraints.
If the trial point x T does not provide sufficient decrease of the merit function, we
compute d SOC as the minimum norm solution of
AW (x)d + cW (x T ) = 0,
where cW (x T ) is the value of the constraints in the working set at the original trial point.
In this case the trial step is computed as the sum of the original trial step and some
fraction of the second order correction step, d SOC
d ← d + τ SOC d SOC ,
where, the scalar τ SOC ∈ [0, 1] enforces satisfaction of all of the bounds on the variables.
In our algorithm we compute d SOC by solving the linear system
SOC 0
d
I
ATW (x)
=
.
(7.2)
t
−cW (x T )
0
AW (x)
Note that the computation of the second order correction step takes into account only
the constraints in the current working set (ignoring other constraints). The motivation
for this is twofold. First, it allows us to use the same coefficient matrix in (7.2) as is used
to compute projections in the CG/Lanczos routine of the EQP step (5.9) and therefore
no additional matrix factorizations are needed. Second, in the case when our working
set is accurate, we are justified in ignoring the constraints not in the working set in the
SOC step computation. Conversely, if our working set is very inaccurate it is unlikely
that a SOC step that would include all the constraints would be of much value anyway.
The SOC step could be computed selectively but for simplicity we take the conservative approach of attempting a SOC step after every rejected trial step. Another issue to
consider is from where to attempt the SOC step. There appear to be two viable options,
the trial point, x T = x + d, and the EQP point x EQP . If we attempt the SOC step from the
full EQP point, this requires an extra evaluation of the objective and constraint functions
(assuming x T = x EQP ). For this reason we attempt the SOC step from the original trial
point.
An algorithm for nonlinear optimization
37
For a current trust-region radius k and step dk , we update the (master) trust-region
radius by the following rule

if
0.9 ≤ ρ
max(k , 7dk 2 ),



max(k , 2dk 2 ),
if 0.3 ≤ ρ < 0.9
k+1 =
,
(7.3)
k ,
if 10−8 ≤ ρ < 0.3



−8
min(0.5k , 0.5dk 2 ), if
ρ < 10
where ρ is defined in (7.1) and represents the agreement between the reduction in the
merit function and the reduction predicted by the quadratic model q.
8. The Lagrange multiplier estimates
Both the LP and the EQP phases of the algorithm provide possible choices for Lagrange
multiplier estimates. However, we choose to compute least-squares Lagrange multipliers
since they satisfy the optimality conditions as well as possible for the given iterate x,
and can be computed very cheaply as we now discuss.
The multipliers corresponding to the constraints in the current working set λW are
computed by solving the system
−∇f (x)
t
I
ATW (x)
=
.
(8.1)
0
λW
0
AW (x)
Since the coefficient matrix in the system above is factored to compute projections
(5.9) in the CG/Lanczos method, the cost of computing these least-squares multipliers
is one extra backsolve which is a negligible cost in the overall iteration (considering
the CG/Lanczos method involves nCG backsolves where nCG is the number of CG/Lanczos iterations performed during the EQP phase). If any of the computed least-squares
multipliers corresponding to inequality constraints are negative, these multipliers are
reset to zero. The Lagrange multipliers λ corresponding to constraints not in the current
working set are set to zero (except in the computation of the Hessian of the Lagrangian
H EQP (x, λ) where they are assigned a penalty-based value as indicated by (5.7)). These
least squares multipliers are used in the stopping test for the nonlinear program.
9. Penalty parameter update
The choice of the penalty parameter ν in (2.2) has a significant impact on the performance
of the iteration. If the algorithm is struggling to become feasible, it can be beneficial to
increase ν. However, if ν becomes too large too quickly this can cause the algorithm to
converge very slowly. Existing strategies for updating the penalty parameter are based
on tracking the size of the Lagrange multipliers or checking the optimality conditions
for the non-differentiable merit function φ.
Here we propose a new approach for updating the penalty parameter based on the
LP phase. We take the view that, if it is possible to satisfy the constraints (3.1b)–(3.1d),
then we would like to choose ν large enough in (3.4), to do so. Otherwise, if this is
38
R.H. Byrd et al.
not possible, then we choose ν to enforce a sufficient decrease in the violation of the
linearized constraints at x, which we measure through the function
1
ξ(x, ν) =
|hi (x) + ∇hi (x)T d LP (ν)|
|E| + |I|
i∈E
+
max(0, −gi (x) − ∇gi (x)T d LP (ν)) .
i∈I
The minimum possible infeasibility value for the LP subproblem will be denoted by
ξ(x, ν∞ ), where ν∞ is meant to represent an infinite value for the penalty parameter.
At the current point xk and given νk−1 from the previous iteration, we use the following relation to define the sufficient decrease in infeasibility required by the new penalty
parameter νk :
ξ(xk , νk−1 ) − ξ(xk , νk ) ≥ (ξ(xk , νk−1 ) − ξ(xk , ν∞ )),
∈ (0, 1].
(9.1)
In our implementation we use the value = 0.1. We can now outline our strategy for
updating the penalty parameter on each iteration.
Algorithm 9.1. Penalty Parameter Update Strategy
Given: (xk , νk−1 ) and the parameters ν∞ , tol1 , tol2 and .
Solve LP (3.4) with (xk , νk−1 ) to get d LP (νk−1 ).
if d LP (νk−1 ) is feasible (i.e., ξ(xk , νk−1 ) < tol1 )
νk ← νk−1 (Case 1).
else
Solve LP (3.4) with (xk , ν∞ ) to get d LP (ν∞ ).
if d LP (ν∞ ) is feasible (i.e., ξ(xk , ν∞ ) < tol1 )
Choose some νk−1 < νk ≤ ν∞ such that ξ(xk , νk ) < tol1 (Case 2).
else if ξ(xk , νk−1 ) − ξ(xk , ν∞ ) < tol2 (no significant progress in feasibility possible)
νk ← νk−1 (Case 3).
else
Choose some νk−1 < νk ≤ ν∞ such that (9.1) is satisfied (Case 4).
end (if)
end (if)
In our implementation we set tol1 = tol2 = 10−8 , and initialize νinit = 10 where νinit is
the value of the penalty parameter at the beginning of the algorithm. In practice, instead
of using a very large penalty value for computing ξ(xk , ν∞ ), this value is computed by
setting ∇f = 0 in the linear objective (3.4a) which has the effect of ignoring the NLP
objective f (xk ) and minimizing the linear constraint violation as much as possible.
The implementation of Case 2 is achieved by increasing νk−1 by a factor of ten
and re-solving the LP until feasibility is achieved. Case 4 is implemented in a similar
manner until the condition (9.1) is satisfied with = 0.1. In Case 3 we determine that
no significant improvement in feasibility is possible for the current LP (as determined
by comparing the feasibility measure for νk−1 with the feasibility measure for ν∞ ) and
so we set νk ← νk−1 rather than increasing the penalty parameter.
An algorithm for nonlinear optimization
39
One concern with our penalty parameter update strategy is that it may require the
solution of multiple LPs per iteration. However, in practice this is only the case generally
in a small fraction of the total iterations. Typically the penalty parameter only increases
early on in the optimization calculation and then settles down to an acceptable value for
which the algorithm achieves feasibility. Moreover, it is our experience that although
this may result in multiple LP solves on some iterations, it results in an overall savings
in iterations (and total LP solves) by achieving a better penalty parameter value more
quickly, compared with rules which update the penalty parameter based on the size of
the Lagrange multipliers. In addition, we have observed that, when using a simplex LP
solver, the extra LP solves are typically very inexpensive requiring relatively few simplex iterations because of the effectiveness of warm starts when re-solving the LP with
a different penalty parameter value. (In the results reported in Section 11 the percentage
of additional simplex iterations required by Algorithm 9.1 averages roughly 4%.)
Another concern is that using this scheme the penalty parameter may become too
large too quickly and we may need to add a safeguard which detects this and reduces ν
on occasion. In practice we have noticed that this does seem to occur on a small minority
of the problems and we have implemented the following strategy for reducing ν. If there
is a sequence of five consecutive successful iterations where the iterate is feasible and
ν > 1000(λ∞ + 1), then ν is judged to be too large and is reset to ν = λ∞ + 10.
The penalty parameter ν is permitted to be decreased a maximum of two times. Although
this approach is somewhat conservative, it has proved to be quite successful in practice
in handling the few problems where ν becomes too large without adversely affecting the
majority of problems where it does not.
10. The complete algorithm
We now summarize the algorithm using the pseudo-code of Algorithm 10.1 below. We
will call our particular implementation of the SLP-EQP method the SliqueAlgorithm.
√In
our implementation the intial trust region radii are set to 0 = 1 and LP
=
0.8
/
n,
0
0
where n is the number of problem variables.
11. Numerical tests
In order to assess the potential of the SLP-EQP approach taken in Slique, we test it
here on the CUTEr [1, 15] set of problems and compare it with the state-of-the-art codes
Knitro [3, 23] and Snopt [11].
Slique 1.0 implements the algorithm outlined in the previous section. In all results
reported in this section, Slique 1.0 uses the commercial LP software package ILOG
CPLEX 8.0 [17] running the default dual simplex approach to solve the LP subproblems. Knitro 2.1 implements a primal-dual interior-point method with trust regions. It
makes use of second derivative information, and controls the barrier parameter using
a path-following approach. Snopt 6.1-1(5) is a line search SQP method in which the
search direction is determined by an active-set method for convex quadratic programming. Snopt requires only first derivatives of the objective function and constraints,
40
R.H. Byrd et al.
Algorithm 10.1: Slique Algorithm
Initial data: x0 , 0 > 0, LP
0 > 0, 0 < ρu < 1.
For k = 0, 1, . . . , until a stopping test is satisfied, perform the following steps.
1. LP point Use Algorithm 9.1 to obtain step dkLP and penalty parameter νk .
2. Working set Define the working set Wk at xk + dkLP , as described in section 3.
by
solving
(8.1).
3. Multipliers Compute
least
squares
multipliers
λW
Set λi = [λW ]i , i ∈ W; λi = 0, i ∈
/ W. If λi < 0 for i ∈ I, set λi = 0.
4. Evaluate Hessian (5.7), at xk .
5. Cauchy point Compute αkLP ∈ (0, 1] as the (approximate) minimizer of q(αdkLP ) such that
αkLP dkLP ≤ k . Set dkC = αkLP dkLP and xkC = xk + dkC .
6. EQP point Compute x EQP from (5.5) using (5.3) and the solution of (5.4), with constraints
EQP
defined by Wk . Define dkCE = xk − xkC as the segment leading from the Cauchy point to the
EQP point.
EQP
∈
[0, 1] as the (approximate) minimizer of
7. Trial point Compute αk
C
CE
q(dk + αdk ) (and such that all the simple bounds of the problem are satisfied). Set
EQP
dk = dkC + αk dkCE , xkT = xk + dk , trySOC = True.
8. Evaluate f (xkT ), h(xkT ) and g(xkT ).
9. Compute
ρk =
φ(xk ; νk ) − φ(xkT ; νk )
.
qk (0) − qk (dk )
10. If ρk ≥ ρu , set xk+1 ← xkT , f (xk+1 ) ← f (xkT ), h(xk+1 ) ← h(xkT ) and
g(xk+1 ) ← g(xkT ). Evaluate ∇f (xk+1 ); ∇hi (xk+1 ), i ∈ E; and ∇gi (xk+1 ), i ∈ I. Go to
Step 13.
11. If trySOC = True, compute d SOC by solving (7.2) and choose τ SOC so that the simple
bounds of the problem are satisfied; set xkT = xk + dk + τ SOC d SOC ; set trySOC = False;
go to step 8.
12. Set xk+1 ← xk .
13. Compute k+1 by means of (7.3).
14. Compute LP
k+1 using (3.5)–(3.6).
and maintains a (limited memory) BFGS approximation to the reduced Hessian of a
Lagrangian function. Even though Snopt uses only first derivatives (whereas Knitro
and Slique use second derivatives) it provides a worthy benchmark for our purposes
since it is generally regarded as one of the most effective active-set SQP codes available
for large-scale nonlinear optimization.
All tests described in this paper were performed on a Sun Ultra 5, with 1Gb of memory running SunOS 5.7. All codes are written in Fortran, were compiled using the Sun
f90 compiler with the “-O” compilation flag, and were run in double precision using
all their default settings except as noted below. For Snopt, the superbasics limit was
increased to 2000 to allow for the solution of the majority of the CUTEr problems.
However, for some problems this limit was still too small and so for these problems the
superbasics limit was increased even more until it was sufficiently large. Limits of 1 hour
of CPU time and 3000 outer or major iterations were imposed for each problem; if one of
these limits was reached the code was considered to have failed. The stopping tolerance
was set at 10−6 for all solvers. Although, it is nearly impossible to enforce a uniform
An algorithm for nonlinear optimization
41
stopping condition, the stopping conditions for Slique and Knitro were constructed
to be very similar to that used in Snopt.
The termination test used for Slique in the numerical results reported in this section
is as follows. Algorithm 10.1 stops when a primal-dual pair (x, λ) satisfies
max{||∇f (x) − A(x)T λ||∞ , ||g(x) ∗ λg ||∞ } < 10−6 (1 + ||λ||2 )
and max max |hi (x)|, max(0, −gi (x)) < 10−6 (1 + ||x||2 ),
i∈E
i∈I
(11.1)
(11.2)
where A denotes the Jacobian of all the constraints, λg denotes the Lagange multipliers
corresponding to the inequality constraints g(x), and g(x) ∗ λg denotes the componentwise product of the vectors g and λg .
11.1. Robustness
In order to first get a picture of the robustness of the Slique algorithm we summarize its
performance on a subset of problems from the CUTEr test set using all default sizes for
scalable problems (as of May 15, 2002). Since we are primarily interested in the performance of Slique on general nonlinear optimization problems with inequality constraints
and/or bounds on the variables (such that the active-set identification mechanism is relevant), we exclude all unconstrained problems and problems whose only constraints are
equations or fixed variables. We also exclude LPs and feasibility problems (problems
with zero degrees of freedom). In addition eight problems (ALLINQP, CHARDIS0,
CHARDIS1, CONT6-QQ, DEGENQP, HARKERP2, LUBRIF, ODNAMUR) were
removed because they could not be comfortably run within the memory limits of the
testing machine for any of the codes. The remaining 560 problems form our test set.
These remaining problems can be divided between three sets: quadratic programs (QP),
problems whose only constraints are simple bounds on the variables (BC), and everything else, which we refer to as generally constrained (GC) problems. If a problem is a
QP just involving bound constraints, it is included only in the BC set.
Although we will not show it here, the Slique implementation of the SLP-EQP
algorithm described in this paper is quite robust and efficient at solving simpler classes of problems (e.g., LPs, unconstrained problems, equality constrained problems and
feasibility problems) as evidenced in [22].
We should note that there are a few problems in CUTEr for which a solution does
not exist (for example the problem may be infeasible or unbounded). Although, it is
important for a code to recognize and behave intelligently in these cases, we do not
evaluate the ability of a code to do so here. For simplicity, we treat all instances where
an optimal solution is not found as a failure regardless of whether or not it is possible
to find such a point. Moreover, we do not distinguish between different local minima as
it is often too difficult to do so, and because we have noted that in the vast majority of
cases (roughly) the same point is found by the competing methods.
The distribution of problem types and sizes (Very Small, Small, Medium, and Large)
for our test set is shown in Table 1. We use the value n + m to characterize a problem’s
size where n is the number of variables and m is the number of general constraints (not
including bounds on the variables).
42
R.H. Byrd et al.
Table 1. CUTEr test set problem sizes and characteristics
Problem
class
VS
S
M
L
Total
Problem size
1 ≤ n + m < 100
100 ≤ n + m < 1000
1000 ≤ n + m < 10000
10000 ≤ n + m
all
QP
30
24
27
36
117
# of problems
BC
GC Total
266
59
177
76
5
47
30
61
118
13
51
100
560
107
336
Table 2. Robustness results by problem class
Problem
class
QP
BC
GC
Total
Sample
size
117
107
336
560
Slique
# Opt
% Opt
100
85.5
97
90.7
259
77.1
456
81.4
Knitro
# Opt
% Opt
113
96.6
98
91.6
295
87.8
506
90.4
Snopt
# Opt
% Opt
99
84.6
71
66.3
285
84.8
455
81.3
In Table 2 we summarize the number (# Opt) and percentage (% Opt) of problems
for which each solver reported finding the optimal solution, discriminated by problem
characteristics. On 7 problems Snopt terminates with the message “optimal, but the
requested accuracy could not be achieved” which implies that Snopt was within a factor of 10−2 of satisfying the convergence conditions. It is questionable whether or not
to count such problems as successes for testing purposes. In practice, such a message is
very useful, however, both Slique and Knitro report any problem for which it cannot
meet the desired accuracy in the stopping condition as a failure, even if it comes very
close and it is suspected that the iterate has converged to a locally optimal point. Therefore, in order to be consistent, we do not count these problems as successes for Snopt.
Since the number of such problems is small relatively speaking, their overall effect is
negligible.
Even though Slique is significantly less robust than the solver Knitro on this problem set, it is similar to Snopt in terms of overall robustness. We find this encouraging
since many features of our software implementation can be improved, as discussed in
the final section of this paper.
Next we compare in Table 3 the robustness of the solvers based on problem size.
Note the decrease in reliability of Slique as the problem size varies from medium (M)
to large (L). Included in the failures for Slique are ten large QPs in which Slique (but
not the other codes) experienced difficulties with memory and could not run properly.
Out of the remaining 27 failures for Slique on the large set, 24 of them result from
reaching the CPU limit. Clearly, for large-scale problems the current implementation of
Slique is too slow. Some of the reasons for this will be discussed later on. Snopt also
struggles on the set of large problems since many of these problems have a large reduced
space leading to expensive computations involving a dense reduced Hessian matrix.
An algorithm for nonlinear optimization
43
Table 3. Robustness results by problem size
Problem
class
VS
S
M
L
Total
Sample
size
266
76
118
100
560
Slique
# Opt
% Opt
251
94.4
52
68.4
90
76.3
63
63.0
456
81.4
Knitro
# Opt
% Opt
251
94.4
67
88.2
103
87.3
85
85.0
506
90.4
Snopt
# Opt
% Opt
250
94.0
70
92.1
83
70.3
52
52.0
455
81.3
11.2. Function evaluations and time
We now study the performance of Slique, Knitro and Snopt based on number of
function/constraint evaluations and total CPU time required to achieve convergence.
Our primary interest is in gauging the efficiency of the SLP-EQP approach on mediumscale and large-scale problems. For this reason, in this section we will restrict ourselves
to only those problems in our test set for which n + m ≥ 1000. For the number of
function/constraint evaluations we take the maximum of these two quantities. In order
to ensure that the timing results are as accurate as possible, all tests involving timing
were carried out on a dedicated machine with no other jobs running.
All the results in this section will be presented using the performance profiles proposed by Dolan and Moré [7]. In the plots πs (τ ) denotes the logarithmic performance
profile
πs (τ ) =
no. of problems where log2 (rp,s ) ≤ τ
,
total no. of problems
τ ≥ 0,
(11.3)
where rp,s is the ratio between the time to solve problem p by solver s over the lowest
time required by any of the solvers. The ratio rp,s is set to infinity (or some sufficiently
large number) whenever solver s fails to solve problem p. See [7] for more details on
the motivation and definition of the performance profiles.
First, we compare in Figures 1(a) and 1(b) the performance of the three codes on
43 problems whose only constraints are simple bounds on the variables. Although there
exist specialized approaches for solving these types of problems [6, 18, 25], it is instructive to observe the performance of Slique when the feasible region has the geometry
produced by simple bounds. Figures 1(a) and 1(b) indicate that Slique performs quite
well on this class of problems.
Next, we compare in Figures 2(a) and 2(b) the performance of Slique, Knitro
and Snopt on 63 quadratic programming problems from the CUTEr collection where
n + m ≥ 1000. We have excluded QPs which only have equality constraints. There are
both convex and nonconvex QPs in this set.
Note that Slique is similar to the other solvers in terms of function evaluations on
this set (although Knitro is more robust), but it is less efficient in terms of CPU time.
This is a bit surprising. We would expect that if Slique is similar to Snopt in terms
of number of function evaluations, that it would also be comparable or perhaps more
efficient in terms of time, since in general we expect an SLP-EQP iteration to be cheaper
than an active-set SQP iteration (and typically the number of function evaluations is
44
R.H. Byrd et al.
1
0.8
0.8
0.6
0.6
π(τ)
π(τ)
1
0.4
0.4
0.2
0
0
0.2
SLIQUE
KNITRO
SNOPT
2
4
τ
6
8
0
0
10
SLIQUE
KNITRO
SNOPT
2
(a) Function evaluations.
4
τ
6
8
10
(b) CPU time.
Fig. 1. Medium and Large Bound Constrained Problems
0.8
0.8
0.6
0.6
π(τ)
1
π(τ)
1
0.4
0.4
0.2
0
0
0.2
SLIQUE
KNITRO
SNOPT
2
4
τ
6
(a) Function evaluations.
8
10
0
0
SLIQUE
KNITRO
SNOPT
2
4
τ
6
8
10
(b) CPU time.
Fig. 2. Medium and Large Quadratic Programming Problems
similar to the number of iterations). In many of these cases, the average number of inner
simplex iterations of the LP solver per outer iteration in Slique greatly exceeds the
average number of inner QP iterations per outer iteration in Snopt. This is caused, in
part, by the inability of the current implementation of Slique to perform effective warm
starts, as will be discussed in Section 11.3.
Finally we consider the performance of the three codes on 112 generally constrained
problems. In Figures 3(a) and 3(b), we report results for the medium-scale and large-scale
generally constrained (GC) set. As in the set of quadratic programs the interior-point
code Knitro outperforms both active-set codes. Comparing the two active-set codes,
Slique is a little behind Snopt, in terms of both function evaluations and CPU time.
An algorithm for nonlinear optimization
45
1
0.8
0.8
0.6
0.6
π(τ)
π(τ)
1
0.4
0.4
0.2
0
0
0.2
SLIQUE
KNITRO
SNOPT
2
4
τ
6
8
SLIQUE
KNITRO
SNOPT
0
0
10
2
(a) Function evaluations.
4
τ
6
8
10
(b) CPU time.
Fig. 3. Medium and Large Generally Constrained Problems
11.3. Slique timing statistics and conclusions
We present below some more detailed statistics on the performance of Slique on the
CUTEr set of test problems. In Tables 4 and 5 we look at the average percentage of
time spent on various tasks in Slique based on problem characteristics and problem
size respectively. These average values are obtained by computing the percentages for
all the individual problems and then averaging these percentages over all the problems
in the test set, where all problems are given equal weight. In this way, problems which
take the most time do not dominate the timing statistics.
In these timing statistics we only include problems in which an optimal solution was
found and for which the total CPU time was at least one second. We look at the following
tasks: the solution of the LP subproblem (% LP); the solution of the EQP subproblem
(%EQP); the time spent factoring the augmented system matrix (i.e., the coefficient
Table 4. Slique timing results by problem class. Average percentage of time spent on various tasks.
Prob. class
QP
BC
GC
Total
% LP
62.7
5.3
43.2
40.5
% EQP
20.1
57.8
21.8
29.7
% AugFact
5.5
3.9
10.2
7.3
% Eval
4.5
23.5
12.6
12.6
% Other
7.2
9.5
12.2
10.0
Table 5. Slique timing results by problem size. Average percentage of time spent on various tasks.
Problem size
1 ≤ n + m < 100
100 ≤ n + m < 1000
1000 ≤ n + m < 10000
10000 ≤ n + m
Total
% LP
10.3
35.9
42.5
48.8
40.5
% EQP
16.2
16.4
33.1
34.7
29.7
% AugFact
4.8
13.2
7.7
5.0
7.3
% Eval
47.9
19.2
7.5
5.9
12.6
% Other
20.9
15.3
9.2
5.5
10.0
46
R.H. Byrd et al.
matrix in (5.9)) (% AugFact); the time spent evaluating the functions, gradients and
Hessian (% Eval); and all other time (% Other).
It is apparent from these tables that, in general, the solution of the LP subproblems
dominates the overall cost of the algorithm with the solution of the EQP being the second
most costly feature. An exception is the class of bound constrained problems where the
computational work is dominated by the EQP phase. It is also clear that the LP cost
dominates the overall cost of the algorithm more and more as the problem size grows.
A detailed examination of the data from these tests reveals that there are two sources
for the excessive LP times. For some problems, the first few iterations of Slique require
a very large number of simplex steps. On other problems, the number of LP iterations
does not decrease substantially as the solution of the nonlinear program is approached,
i.e., the warm start feature is not completely successful. Designing an effective warm
start technique for our SLP-EQP approach is a challenging research question, since
the set of constraints active at the solution of the LP subproblem often include many
trust-region constraints which may change from one iteration to the next even when
the optimal active set for the NLP is identified. In contrast, warm starts are generally
effective in Snopt for which the number of inner iterations decreases rapidly near the
solution.
We conclude this section by making the following summary observations about the
algorithm, based on the tests reported here; see also [22].
• Slique is currently quite robust and efficient for small and medium-size problems. It
is very effective for bound constrained problems of all sizes, where the LP is much
less costly.
• The strategy for updating the penalty parameter ν in Slique has proved to be effective.
Typically it chooses an adequate value of ν quickly and keeps it constant thereafter (in
our tests, roughly 90% of the iterations used the final value of ν, and ν was increased
about once per problem on the average). Therefore, the choice of the penalty parameter
does not appear to be a problematic issue in our approach.
• The active set identification properties of the LP phase are, generally, effective. This
is one of the most positive observations of this work. Nevertheless, in some problems
Slique has difficulties identifying the active set near the solution, which indicates
that more work is needed to improve our LP trust region update mechanism.
• The active-set codes, Slique and Snopt are both significantly less robust and efficient for large-scale problems overall, compared to the interior-point code Knitro. It
appears that these codes perform poorly on large problems for different reasons. The
SQP approach implemented by Snopt is inefficient on large-scale problems because
many of these have a large reduced space leading to high computing times for the
QP subproblems, or resulting in a large number of iterations due to the inaccuracy
of the quasi-Newton approximation. However, a large reduced space is not generally
a difficulty for Slique (as evidenced by its performance on the bound constrained
problems).
By contrast, the SLP-EQP approach implemented in Slique becomes inefficient for
large-scale problems because of the large computing times in solving the LP subproblems, and because warm starting these LPs can sometimes be ineffective. Warm starts
in Snopt, however, appear to be very efficient.
An algorithm for nonlinear optimization
47
12. Final remarks
We have presented a new active-set, trust-region algorithm for large-scale optimization.
It is based on the SLP-EQP approach of Fletcher and Sainz de la Maza. Among the
novel features of our algorithm we can mention: (i) a new procedure for computing
the EQP step using a quadratic model of the penalty function and a trust region; (ii)
a dogleg approach for computing the total step based on the Cauchy and EQP steps;
(iii) an automatic procedure for adjusting the penalty parameter using the linear programming subproblem; (iv) a new procedure for updating the LP trust-region radius
that allows it to decrease even on accepted steps to promote the identification of locally
active constraints.
The experimental results presented in Section 11 indicate, in our opinion, that the
algorithm holds much promise. In addition, the algorithm is supported by the global
convergence theory presented in [2], which builds upon the analysis of Yuan [24].
Our approach differs significantly from the SLP-EQP algorithm described by Chin
and Fletcher [4]. These authors use a filter for step acceptance. In the event that the
constraints in the LP subproblem are incompatible, their algorithm solves instead a
feasibility problem that minimizes the violation of the constraints while ignoring the
objective function. We prefer the 1 -penalty approach (3.4) because it allows us to work
simultaneously on optimality and feasibility, but testing would be needed to establish
which approach is preferable. The algorithm of Fletcher and Chin defines the trial step
to be either the full step to the EQP point (plus possibly a second order correction) or, if
this step is unacceptable, the Cauchy step. In contrast, our approach explores a dogleg
path to determine the full step. Our algorithm also differs in the way the LP trust region
is handled and many other algorithmic aspects.
The software used to implement the Slique algorithm is not a finished product
but represents the first stage in algorithmic development. In our view, it is likely that
significant improvements in the algorithm can be made by developing: (i) faster procedures for solving the LP subproblem, including better initial estimates of the active set,
perhaps using an interior-point approach in the early iterations, and truncating the LP
solution process when it is too time-consuming (or even skipping the LP on occasion);
(ii) improved strategies for updating the LP trust region; (iii) an improved second-order
correction strategy or a replacement by a non-monotone strategy; (iv) preconditioning
techniques for solving the EQP step; (v) mechanisms for handling degeneracy.
References
1. Bongartz, I., Conn, A.R., Gould, N.I.M., Toint, Ph.L.: CUTE: Constrained and Unconstrained Testing
Environment. ACM Transactions on Math. Softw. 21(1), 123–160 (1995)
2. Byrd, R.H., Gould, N.I.M., Nocedal, J., Waltz, R.A.: On the convergence of successive linear programming
algorithms. Technical Report OTC 2002/5, Optimization Technology Center, Northwestern University,
Evanston, IL, USA, 2002
3. Byrd, R.H., Hribar, M.E., Nocedal, J.: An interior point algorithm for large scale nonlinear programming.
SIAM J. Optim. 9(4), 877–900 (1999)
4. Chin, C.M., Fletcher, R.: On the global convergence of an SLP-filter algorithm that takes EQP steps.
Math. Program. Ser. A 96(1), 161–177 (2003)
5. Conn, A.R., Gould, N.I.M., Toint, Ph.: Trust-region methods. MPS-SIAM Series on Optimization. SIAM
publications, Philadelphia, Pennsylvania, USA, 2000
48
R.H. Byrd et al.: An algorithm for nonlinear optimization
6. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: LANCELOT: a Fortran package for Large-scale Nonlinear
Optimization (Release A). Springer Series in Computational Mathematics. Springer Verlag, Heidelberg,
Berlin, New York, 1992
7. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program.
Ser. A 91, 201–213 (2002)
8. Fletcher, R.: Practical Methods of Optimization. Volume 2: Constrained Optimization. J. Wiley and Sons,
Chichester, England, 1981
9. Fletcher, R., Leyffer, S.: Nonlinear programming without a penalty function. Math. Program. 91, 239–269
(2002)
10. Fletcher, R., Sainz de la Maza, E.: Nonlinear programming and nonsmooth optimization by successive
linear programming. Math. Program. 43(3), 235–256 (1989)
11. Gill, P.E., Murray, W., Saunders, M.A.: SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM J. Optim. 12, 979–1006 (2002)
12. Gill, P.E., Murray, W., Wright, M.H.: Practical Optimization. Academic Press, London, 1981
13. Gould, N.I.M., Hribar, M.E., Nocedal, J.: On the solution of equality constrained quadratic problems
arising in optimization. SIAM J. Scientific Comput. 23(4), 1375–1394 (2001)
14. Gould, N.I.M., Lucidi, S., Roma, M., Toint, Ph.L.: Solving the trust-region subproblem using the Lanczos
method. SIAM J. Optim. 9(2), 504–525 (1999)
15. Gould, N.I.M., Orban, D., Toint, Ph.L.: CUTEr (and SifDec), a Constrained and Unconstrained Testing
Environment, revisited. Technical Report TR/PA/01/04, CERFACS, Toulouse, France, 2003. To appear
in Transactions on Mathematical Software
16. Harwell Subroutine Library.: A catalogue of subroutines (HSL 2002). AEA Technology, Harwell, Oxfordshire, England, 2002
17. ILOG CPLEX 8.0.: User’s Manual. ILOG SA, Gentilly, France, 2002
18. Lin, C., Moré, J.J.: Newton’s method for large bound-constrained optimization problems. SIAM J. Optim.
9(4), 1100–1127 (1999)
19. Maratos, N.: Exact penalty function algorithms for finite-dimensional and control optimization problems.
PhD thesis, University of London, London, England, 1978
20. Powell, M.J.D.: A Fortran subroutine for unconstrained minimization requiring first derivatives of the
objective function. Technical Report R-6469, AERE Harwell Laboratory, Harwell, Oxfordshire, England,
1970
21. Powell, M.J.D.: A new algorithm for unconstrained optimization. In: Rosen, J.B., Mangasarian, O.L. and
Ritter, K. eds., Nonlinear Programming, London, Academic Press, 1970, pp. 31–65
22. Waltz, R.A.: Algorithms for large-scale nonlinear optimization. PhD thesis, Department of Electrical and
Computer Engineering, Northwestern University, Evanston, Illinois, USA, http://www.ece.northwestern.
edu/˜rwaltz/, 2002
23. Waltz, R.A., Nocedal, J.: KNITRO user’s manual. Technical Report OTC 2003/05, Optimization Technology Center, Northwestern University, Evanston, IL, USA, April 2003
24. Yuan, Y.: Conditions for convergence of trust region algorithms for nonsmooth optimization. Math.
Program. 31(2), 220–228 (1985)
25. Zhu, C., Byrd, R.H., Lu, P., Nocedal, J.: Algorithm 78: L-BFGS-B: Fortran subroutines for large-scale
bound constrained optimization. ACM Transactions on Math. Softw. 23(4), 550–560 (1997)