An optimal algorithm for bound and equality
constrained quadratic programming problems with
bounded spectrum ∗
Z. Dostál
VŠB-Technical University Ostrava,
Tř 17 listopadu,
CZ-70833 Ostrava, Czech Republic.
E-mail [email protected]
October 29, 2004
Abstract
An implementation of the recently proposed semi-monotonic augmented Lagrangian algorithm for solving large convex bound and equality constrained quadratic programming problems is considered. It is proved that when the algorithm is applied to a class of problems with uniformly bounded spectrum of the Hessian matrix, it finds an approximate solution at O(1) matrix-vector multiplications. The optimality results do not depend on the conditioning of the matrix which defines the equality constraints, and the theory also covers problems with dependent constraints. The theoretical results are illustrated by numerical experiments.
AMS classification: Primary 65K05; Secondary 90C20
Key words and phrases: quadratic programming, bound and equality
constraints, augmented Lagrangians, optimal algorithms
∗ This research has been supported by grants GAČR 101/04/1145, GA CAS S3086102 and 1ET400300415.
1 Introduction
An important ingredient in the development of effective methods for the solution of very large problems is the identification of algorithms that can solve some special cases with optimal (i.e. asymptotically linear) complexity. For example, if an iterative method whose rate of convergence is known in terms of the condition number is applied to a class of problems with uniformly bounded condition number, we may bound the number of iterates uniformly and obtain an optimal algorithm. This simple observation was indeed part of the success of the multigrid [25] and FETI based domain decomposition [22] methods for the solution of elliptic partial differential equations.
We shall be concerned with the problem of finding, efficiently, the minimizer of a quadratic function subject to bound and linear equality constraints, that is

  minimize q(x) subject to x ∈ Ω_BE    (1)

with Ω_BE = {x ∈ IR^n : x ≥ ℓ and Cx = 0}, q(x) = ½ x^T Ax − b^T x, A ∈ IR^{n×n} symmetric positive definite, C ∈ IR^{m×n}, and ℓ, b, x ∈ IR^n. We do not assume that C has full row rank, nor that m ≤ n, but we shall assume that Ω_BE is nonempty. We
shall also assume that C is sparse. Having in mind the optimality results mentioned
above, we shall be especially interested in problems with the matrix A large, sparse
and well conditioned. The condition on sparsity of A may be replaced by any other
condition which implies cheap evaluation of Ay for any y ∈ IRn . Such problems arise,
for example, from the discretization of semicoercive elliptic variational inequalities
(e.g. Dostál [3]) or from application of the duality based domain decomposition to
the contact problems of elasticity (e.g. Dostál, Friedlander and Santos [9], Dostál,
Gomes and Santos [14] or Dostál and Horák [16]).
There are several basic algorithms for solving (1). Probably the most popular are those based on the interior point method (e.g. [27]). For example, interior point methods were applied successfully to the solution of very large problems of nonlinear optimization with many constraints and millions of decision variables, exploiting the observation that a Hessian matrix with a special pattern of nonzero elements may be decomposed with nearly linear complexity [15]. In spite of the success of the interior point methods for many problems, it is not obvious how they can exploit a favorable distribution of the spectrum of the Hessian matrix A. The reason is that the auxiliary linear problems of the interior point method are ill conditioned due to the penalization. Another successful algorithm, a variant of the augmented Lagrangian method proposed by Conn, Gould and Toint and enhanced into the package LANCELOT [2], generates approximations of the Lagrange multipliers in the outer loop, while the bound constrained auxiliary problems with symmetric positive definite matrices are solved approximately in the inner loop. The auxiliary problems are of the type
  minimize L(x, µ^k, ρ_k) subject to x ≥ ℓ,    (2)

where

  L(x, µ^k, ρ_k) = q(x) + (µ^k)^T (Cx − d) + (ρ_k/2) ‖Cx − d‖²    (3)

is known as the augmented Lagrangian function, µ^k = (µ_1^k, . . . , µ_m^k)^T is the vector of Lagrange multipliers for the equality constraints, ρ_k is the penalty parameter, and ‖·‖ denotes the Euclidean norm.
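For illustration, the augmented Lagrangian (3) can be evaluated in MATLAB by a one-line function handle; this handle is our own illustration (it assumes the problem data A, b, C, d are available and is not part of the implementation discussed later):

  % sketch: value of the augmented Lagrangian (3) at x for a multiplier mu and penalty rho
  augL = @(x, mu, rho) 0.5*(x'*(A*x)) - b'*x + mu'*(C*x - d) + 0.5*rho*norm(C*x - d)^2;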
We use two modifications of the algorithm of Conn, Gould and Toint [8]. The first one concerns the precision of the approximate solution x^k of the auxiliary problems, which is measured by the norm of the projected gradient of the augmented Lagrangian and is controlled by a fixed multiple of the Euclidean norm of the feasibility error. Let us recall that the same precision control has already been considered by Hager [26] and by Dostál, Friedlander, Santos and Alesawi [7, 10, 11, 12] for problems with equality constraints, and by Dostál, Friedlander and Santos [8, 13] for bound and equality constraints. The second modification concerns the update rule for the penalty parameter ρ_k, which is increased until a monotonic increase of L(x^k, µ^k, ρ_k) is achieved [8]. It turned out that not only is the latter achieved with quite small values of the penalty parameter, but it also enables us to obtain results on the convergence of the feasibility error that are independent of the conditioning of the constraints.
The point of this paper is to show that the results mentioned above are sufficient
to guarantee that the algorithm implemented with the solution of the auxiliary
problems (2) by the recently proposed conjugate gradient based method with the
proportioning and reduced gradient projections [21] finds an approximate solution
of the class of problems with uniformly bounded spectrum of the Hessian matrix at
O(1) matrix-vector multiplications. If we apply this result to the class of problems with Hessian matrices that are in addition either sufficiently sparse or can be expressed as a product of sufficiently sparse matrices, then the algorithm is optimal also in the sense that it can solve such classes of problems with asymptotically linear complexity. While the results of [8] are independent of the implementation of the inner loop, the results of this paper concentrate on the implementation of the inner loop. Let us recall that these results are not a trivial corollary of the bound on the number of outer iterations proved in [8], as the multipliers µ^k that appear in the linear term of the augmented Lagrangian need not be bounded. Moreover, the precision of the solution of the auxiliary problems is controlled by the projected gradient, which is not a continuous function of the iterates.
The paper is organized as follows. In Section 2 we review the MPRGP algorithm for bound constrained quadratic programming. In Section 3 we present new results on the rate of convergence of the iterates generated by MPRGP which are important for the analysis of optimality. The semi-monotonic augmented Lagrangian algorithm SMALBE for the minimization of convex quadratic functions subject to bound and equality constraints is reviewed in Section 4. In Section 5 we present the analysis of optimality of the algorithm with the inner loop implemented by MPRGP. In Section 6 the theoretical results are illustrated by numerical experiments with benchmarks of up to more than two and a half million variables. Finally, some conclusions are discussed in Section 7.
In the whole paper, q(x) = ½ x^T Ax − b^T x will denote a strictly convex quadratic function defined on IR^n, with the Hessian matrix ∇²q = A ∈ IR^{n×n} symmetric positive definite and x, b ∈ IR^n. The eigenvalues of A will always be denoted by λ_i(A),

  λ_min(A) = λ_1(A) ≤ . . . ≤ λ_n(A) = λ_max(A) = ‖A‖.
For any nonempty sets I, J and matrix M , we shall denote by MIJ or MI the
submatrices of M that comprise rows and columns determined by the sets I and J ,
respectively. In the same spirit, given a vector x ∈ IRn , a subvector of x with the
components determined by I will be denoted by xI .
2 Bound constrained problems
We shall now be concerned with the solution of the auxiliary problem (2). To
simplify our exposition, we shall consider the problem
  minimize q(x) subject to x ∈ Ω_B    (4)

with Ω_B = {x : x ≥ ℓ} and the other notation from the introduction. Let us recall that the solution of (4) is of independent interest and arises e.g. in the application of duality based domain decomposition methods to the solution of discretized variational inequalities [29, 17, 19, 20] or in computer graphics.
It is well known that the solution to the problem (4) always exists, and it is
necessarily unique [1]. For an arbitrary n-vector x, let us denote in this section the gradient g = g(x) of q by

  g = g(x) = Ax − b.    (5)
Then the unique solution x of (4) is fully determined by the Karush-Kuhn-Tucker
optimality conditions [1] so that for i = 1, . . . , n,
  x_i = ℓ_i implies g_i ≥ 0 and x_i > ℓ_i implies g_i = 0.    (6)
Let N denote the set of all indices so that
N = {1, 2, . . . , n}.
The set of all indices for which x_i = ℓ_i is called the active set of x. We shall denote it by A(x), so that

  A(x) = {i ∈ N : x_i = ℓ_i}.

Its complement

  F(x) = {i ∈ N : x_i ≠ ℓ_i}

and its subset

  B(x) = {i ∈ N : x_i = ℓ_i and g_i > 0}
are called a free set and a binding set, respectively.
To enable an alternative reference to the Karush-Kuhn-Tucker conditions (6),
we shall introduce a notation for the free gradient ϕ and the chopped gradient β
that are defined by
  ϕ_i(x) = g_i(x) for i ∈ F(x),   ϕ_i(x) = 0 for i ∈ A(x),
  β_i(x) = 0 for i ∈ F(x),   β_i(x) = g_i^−(x) for i ∈ A(x),

where we have used the notation g_i^− = min{g_i, 0}. Thus the Karush-Kuhn-Tucker conditions (6) are satisfied iff the projected gradient g^P(x) = ϕ(x) + β(x) is equal to zero.
The projection P_Ω onto Ω is defined for any n-vector x by

  P_Ω(x) = ℓ + (x − ℓ)^+,

where y^+ denotes, for any n-vector y, the vector with entries y_i^+ = max{y_i, 0}.
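To make these definitions concrete, the following MATLAB sketch (ours; the function names are illustrative and not part of the algorithm) evaluates the gradient, the free, chopped and projected gradients, and the projection P_Ω for a given iterate x:

  % sketch: gradient splitting used for the bound constrained problem (4)
  function [g, phi, beta, gP] = gradient_splitting(A, b, l, x)
      g    = A*x - b;                 % gradient (5)
      free = x > l;                   % free set F(x); the active set A(x) is its complement
      phi  = g .* free;               % free gradient: g_i on F(x), 0 on A(x)
      beta = min(g, 0) .* ~free;      % chopped gradient: g_i^- on A(x), 0 on F(x)
      gP   = phi + beta;              % projected gradient; gP = 0 iff (6) holds
  end

  function y = project_bounds(x, l)   % projection P_Omega(x) = l + (x - l)^+
      y = l + max(x - l, 0);
  end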
The algorithm for the solution of (4) that we describe here combines the proportioning algorithm mentioned above with the reduced gradient projections. It
exploits a given constant Γ > 0, a test to decide about leaving the face, and three types of steps to generate a sequence of iterates {x^k} that approximate the solution of (4).
The expansion step is defined by

  x^{k+1} = P_Ω(x^k − α ϕ(x^k))    (7)

with the fixed steplength α ∈ (0, ‖A‖^{-1}]. This step may expand the current active set. To describe it without P_Ω, let us introduce, for any x ∈ Ω, the reduced free gradient ϕ̃(x) with the entries

  ϕ̃_i = ϕ̃_i(x) = min{(x_i − ℓ_i)/α, ϕ_i},

so that

  x^{k+1} = x^k − α ϕ̃(x^k).    (8)

If the inequality

  ‖β(x^k)‖² ≤ Γ² ϕ̃(x^k)^T ϕ(x^k)    (9)

holds, then we call the iterate x^k strictly proportional. The test (9) is used to decide
which component of the projected gradient g P (xk ) will be reduced in the next step.
Notice that the right-hand side of (9) blends the information about the current
free gradient and its part that can be used in the expansion step, while the related
relations in [4, 6, 24, 23] consider only the norm of the free gradient.
The proportioning step is defined by

  x^{k+1} = x^k − α_cg β(x^k)    (10)

with the steplength α_cg that minimizes q(x^k − αβ(x^k)). It is easy to check [28] that the α_cg which minimizes q(x − αd) for given d and x may be evaluated by the formula

  α_cg = α_cg(d) = d^T g(x) / (d^T A d).    (11)
The purpose of the proportioning step is to remove indices from the active set. Note
that if xk ∈ Ω, then xk+1 = xk − αcg β(xk ) ∈ Ω.
The conjugate gradient step is defined by

  x^{k+1} = x^k − α_cg p^k    (12)

where p^k is the conjugate gradient direction [28] which is constructed recurrently. The recurrence starts (or restarts) from p^s = ϕ(x^s) whenever x^s is generated by the expansion step or the proportioning step. If p^k is known, then p^{k+1} is given by the formulae [28]

  p^{k+1} = ϕ(x^{k+1}) − γ p^k,   γ = ϕ(x^{k+1})^T A p^k / ((p^k)^T A p^k).    (13)

The basic property of the conjugate directions p^s, . . . , p^k that are generated by the recurrence (13) from the restart p^s is their mutual A-orthogonality, i.e. (p^i)^T A p^j = 0 for i, j ∈ {s, . . . , k}, i ≠ j. It follows easily [28] that

  q(x^{k+1}) = min{q(x^s + y) : y ∈ Span{p^s, . . . , p^k}}    (14)
where Span{ps , . . . , pk } denotes the vector space of all linear combinations of the vectors ps , . . . , pk . The conjugate gradient steps are used to carry out the minimization
in the face
  W_I = {x : x_i = ℓ_i for i ∈ I}    (15)

given by I = A(x^s) efficiently.
Let us define the algorithm that we propose in the form that is convenient for
analysis.
Algorithm 1. Modified proportioning with reduced gradient projections (MPRGP).
Let x^0 ∈ Ω, α ∈ (0, ‖A‖^{-1}], and Γ > 0 be given. For k ≥ 0 and x^k known, choose x^{k+1} by the following rules:

(i) If g^P(x^k) = 0, set x^{k+1} = x^k.

(ii) If x^k is strictly proportional and g^P(x^k) ≠ 0, try to generate x^{k+1} by the conjugate gradient step. If x^{k+1} ∈ Ω, then accept it, else generate x^{k+1} by the expansion step.

(iii) If x^k is not strictly proportional, define x^{k+1} by the proportioning step.
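For illustration, a compact MATLAB sketch of Algorithm 1 follows. It is a simplified rendering of the three steps above under the stated assumptions (A symmetric positive definite, α ∈ (0, ‖A‖^{-1}]); the stopping test on the projected gradient, the iteration limit and all identifiers are our own choices and are not part of the algorithm as stated.

  function x = mprgp(A, b, l, x, Gamma, alpha, tol, maxit)
  % sketch of MPRGP (Algorithm 1); stops when the projected gradient is small
      g = A*x - b;
      p = g .* (x > l);                               % restart direction: free gradient
      for k = 1:maxit
          free = x > l;
          phi  = g .* free;                           % free gradient
          beta = min(g, 0) .* ~free;                  % chopped gradient
          if norm(phi + beta) <= tol, break; end      % step (i): projected gradient small
          phit = min((x - l)/alpha, phi);             % reduced free gradient, cf. (8)
          if norm(beta)^2 <= Gamma^2 * (phit'*phi)    % proportionality test (9)
              Ap  = A*p;
              acg = (g'*p) / (p'*Ap);                 % steplength (11) with d = p
              y   = x - acg*p;
              if all(y >= l)                          % step (ii): conjugate gradient step (12)
                  x   = y;
                  g   = g - acg*Ap;
                  phi = g .* (x > l);
                  p   = phi - ((phi'*Ap)/(p'*Ap))*p;  % recurrence (13)
              else                                    % expansion step (7), equivalently (8)
                  x = x - alpha*phit;                 % equals P_Omega(x - alpha*phi(x))
                  g = A*x - b;
                  p = g .* (x > l);                   % restart from the free gradient
              end
          else                                        % step (iii): proportioning step (10)
              acg = (beta'*g) / (beta'*(A*beta));
              x   = x - acg*beta;
              g   = A*x - b;
              p   = g .* (x > l);                     % restart from the free gradient
          end
      end
  end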
3 Convergence of the modified proportioning
Basic theoretical results concerning the algorithm read as follows:
Theorem 1. Let Γ and a_min be given constants, 0 < Γ and 0 < a_min ≤ λ_min(A), Γ̂ = max{Γ, Γ^{-1}}, let x denote the unique solution of (4), and let {x^k} denote the sequence generated by Algorithm 1 with α ∈ (0, ‖A‖^{-1}].

(i) For any k ≥ 0

  q(x^{k+1}) − q(x) ≤ ν (q(x^k) − q(x)),    (16)

where

  ν = 1 − α a_min/(2 + 2Γ̂²).    (17)

(ii) The solution error is bounded by

  ‖x^k − x‖² ≤ (2ν^k/a_min) (q(x^0) − q(x)).    (18)

(iii) If the solution x is not dual degenerate, then there is k such that x^k = x.

(iv) If

  Γ ≥ 2(√(κ(A)) + 1),    (19)

then there is k ≥ 0 such that x^k = x.
Proof: See [21].
Since we shall use the MPRGP algorithm also in the inner loop of other algorithms, it is not enough to know that it converges, as we must be able to recognize
in time that we are sufficiently near the solution. What we need is the rate of
convergence of the projected gradient. Let us recall that the result is not a trivial
consequence of (18), as the projected gradient is not a continuous function of the iterates.
Theorem 2. Let Γ and a_min be given constants, 0 < Γ and 0 < a_min ≤ λ_min(A), Γ̂ = max{Γ, Γ^{-1}}, let x denote the unique solution of (4), and let {x^k} denote the sequence generated by Algorithm 1 with α ∈ (0, ‖A‖^{-1}]. Let κ = α^{-1} min{1, λ_min(A)}^{-1}, so that the spectral condition number κ(A) of A satisfies κ(A) ≤ κ. Then the projected gradient is bounded by

  ‖g^P(x^k)‖² ≤ a_1 ν^k (q(x^0) − q(x))    (20)

with ν defined by (17) and

  a_1 = 36κ/(ν(1 − ν)).    (21)
Proof: First notice that it is enough to estimate separately β(x^k) and ϕ(x^k), as

  ‖g^P(x^k)‖² = ‖β(x^k)‖² + ‖ϕ(x^k)‖².
In particular, since for any vector d such that d^T g(x) ≥ ‖d‖²

  q(x) − q(x − αd) = α d^T g(x) − (α²/2) d^T A d ≥ (α/2) ‖d‖²,    (22)

we can combine (22) with x^k − αβ(x^k) ≥ ℓ to estimate ‖β(x^k)‖ by

  q(x^k) − q(x) = (q(x^k) − q(x^k − αβ(x^k))) + (q(x^k − αβ(x^k)) − q(x)) ≥ (α/2) ‖β(x^k)‖².

Applying (16), we get

  ‖β(x^k)‖² ≤ (2/α) (q(x^k) − q(x)) ≤ (2ν^k/α) (q(x^0) − q(x)).    (23)
To estimate ‖ϕ(x^k)‖, notice that the algorithm "does not know" about the components of the constraint vector ℓ when it generates x^{k+1} unless their indices belong to A(x^k) or A(x^{k+1}). It follows that x^{k+1} may also be considered as an iterate generated by Algorithm 1 from x^k for the problem

  minimize q(x) subject to x_i ≥ ℓ_i for i ∈ A(x^k) ∪ A(x^{k+1}).    (24)
If we denote

  q̄^k = min{q(x) : x_i ≥ ℓ_i for i ∈ A(x^k) ∪ A(x^{k+1})} ≤ q(x)

and δ^k = q(x) − q̄^k ≥ 0, we can use (16) to get

  δ^k = q(x) − q̄^k ≤ q(x^{k+1}) − q̄^k ≤ ν(q(x^k) − q̄^k) = ν(q(x^k) − q(x)) + νδ^k,

so that

  δ^k ≤ (ν/(1 − ν)) (q(x^k) − q(x)) ≤ (ν^{k+1}/(1 − ν)) (q(x^0) − q(x)).    (25)

Now observe that the indices of the unconstrained components of the minimization problem (24) are those belonging to I^k = F(x^k) ∩ F(x^{k+1}), as

  I^k = F(x^k) ∩ F(x^{k+1}) = (N \ A(x^k)) ∩ (N \ A(x^{k+1})) = N \ (A(x^k) ∪ A(x^{k+1})).
It follows that if I^k is nonempty, then by the definition of δ^k and (22)

  δ^k ≥ q(x) − q(x − α g_{I^k}(x)) ≥ (α/2) ‖g_{I^k}(x)‖².    (26)

For convenience, let us define g_I(x) = 0 for any x and the empty set I = ∅. Then (26) remains valid for I^k = ∅, so that we can combine it with (25) to get

  ‖g_{I^k}(x)‖² ≤ (2/α) δ^k ≤ (2ν^{k+1}/(α(1 − ν))) (q(x^0) − q(x)).    (27)
Since our algorithm is defined so that either I^k = F(x^k) ⊆ F(x^{k+1}) or I^k = F(x^{k+1}) ⊆ F(x^k), it follows that either

  ‖g_{F(x^k)}(x)‖² = ‖g_{I^k}(x)‖² ≤ (2ν^{k+1}/(α(1 − ν))) (q(x^0) − q(x)) ≤ (2ν^k/(α(1 − ν))) (q(x^0) − q(x))    (28)

or

  ‖g_{F(x^{k+1})}(x)‖² = ‖g_{I^k}(x)‖² ≤ (2ν^{k+1}/(α(1 − ν))) (q(x^0) − q(x)).

Using the same reasoning for x^{k−1} and x^k, we conclude that the estimate (28) is valid for any x^k such that

  F(x^k) ⊆ F(x^{k+1}) or F(x^k) ⊆ F(x^{k−1}).    (29)
Let us now recall that by (18)

  ‖g(x^k) − g(x)‖² = ‖A(x^k − x)‖² ≤ (2ν^k ‖A‖/a_min) (q(x^0) − q(x)),    (30)

so that for any k satisfying the relations (29), we get

  ‖ϕ(x^k)‖ = ‖g_{F(x^k)}(x^k)‖ ≤ ‖g_{F(x^k)}(x^k) − g_{F(x^k)}(x)‖ + ‖g_{F(x^k)}(x)‖
            ≤ √((2‖A‖/a_min) ν^k (q(x^0) − q(x))) + √((2/(α(1 − ν))) ν^k (q(x^0) − q(x)))
            ≤ 2 √((2κ/(1 − ν)) ν^k (q(x^0) − q(x))).

Combining the last inequality with (23), we get for any k satisfying the relations (29) that

  ‖g^P(x^k)‖² = ‖β(x^k)‖² + ‖ϕ(x^k)‖² ≤ (10κ/(1 − ν)) ν^k (q(x^0) − q(x)).    (31)
Now notice that the estimate (31) is valid for any iterate xk which satisfies
F(xk ) ⊆ F (xk−1 ), i. e. when xk is generated by the conjugate gradient step or the
expansion step. Thus it remains to estimate the projected gradient of the iterate xk
generated by the proportioning step. In this case F(xk−1 ) ⊆ F(xk ), so that we can
obviously use the estimate (31) to get
  ‖g^P(x^{k−1})‖ ≤ √((10κ/(1 − ν)) ν^{k−1} (q(x^0) − q(x))).    (32)

Since the proportioning step is defined by x^k = x^{k−1} − α_cg β(x^{k−1}), it follows that

  ‖g_{F(x^k)}(x^{k−1})‖ = ‖g^P(x^{k−1})‖.
Moreover, using the basic properties of the norm, we get that

  ‖ϕ(x^k)‖ = ‖g_{F(x^k)}(x^k)‖ ≤ ‖g_{F(x^k)}(x^k) − g_{F(x^k)}(x^{k−1})‖ + ‖g_{F(x^k)}(x^{k−1})‖
            ≤ ‖g(x^k) − g(x)‖ + ‖g(x) − g(x^{k−1})‖ + ‖g^P(x^{k−1})‖

and by (30) and (32)

  ‖ϕ(x^k)‖ ≤ √((2‖A‖/a_min) ν^k (q(x^0) − q(x))) + √((2‖A‖/a_min) ν^{k−1} (q(x^0) − q(x)))
              + √((10κ/(1 − ν)) ν^{k−1} (q(x^0) − q(x)))
            ≤ (√5 + 2) √((2κ/(1 − ν)) ν^{k−1} (q(x^0) − q(x))).    (33)
Combining the last inequality with (23), we get by a simple computation that

  ‖g^P(x^k)‖² = ‖ϕ(x^k)‖² + ‖β(x^k)‖² ≤ (36κ/(1 − ν)) ν^{k−1} (q(x^0) − q(x)).

Since the last estimate is obviously weaker than (31), it follows that (20) is valid for all indices k. □
4 Algorithm for bound and equality constrained problems
We shall finally be concerned with the problem (1). To describe our algorithm, we shall denote the gradient of the augmented Lagrangian by g, so that

  g(x, µ, ρ) = ∇_x L(x, µ, ρ) = Ax − b + C^T µ + ρ C^T (Cx − d).    (34)
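Correspondingly, the gradient (34) may be sketched in MATLAB as the following handle (ours, matching the handle given after (3)):

  % sketch: gradient (34) of the augmented Lagrangian at x
  gradL = @(x, mu, rho) A*x - b + C'*mu + rho*(C'*(C*x - d));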
Algorithm 2. Semi-monotonic augmented Lagrangian method for bound and equality constrained problems (SMALBE).
Given η > 0, β > 1, M > 0, ρ_0 > 0, and µ^0 ∈ IR^m, set k = 0.

Step 1. {Inner iteration with adaptive precision control.} Find x^k ≥ ℓ such that

  ‖g^P(x^k, µ^k, ρ_k)‖ ≤ min{M ‖Cx^k‖, η}.    (35)

Step 2. {Update µ.}

  µ^{k+1} = µ^k + ρ_k Cx^k.    (36)

Step 3. {Update ρ provided the increase of the Lagrangian is not sufficient.} If k > 0 and

  L(x^k, µ^k, ρ_k) < L(x^{k−1}, µ^{k−1}, ρ_{k−1}) + (ρ_k/2) ‖Cx^k‖²    (37)

then

  ρ_{k+1} = βρ_k,    (38)

else

  ρ_{k+1} = ρ_k.    (39)

Step 4. Set k = k + 1 and return to Step 1.
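The following MATLAB sketch (ours) renders the outer loop of Algorithm 2 for d = 0. The routine mprgp_inexact is a placeholder name for an inner bound constrained solver that returns x ≥ ℓ satisfying the adaptive precision control (35); the outer stopping test on ‖Cx‖ is an illustrative choice and not part of the algorithm.

  function [x, mu, rho] = smalbe(A, b, C, l, eta, beta, M, rho, mu, tol, maxit)
  % sketch of SMALBE (Algorithm 2) with d = 0
      x = zeros(size(b));
      Lold = -Inf;
      for k = 0:maxit
          Arho = A + rho*(C'*C);                      % Hessian of L(., mu, rho)
          brho = b - C'*mu;                           % L(x, mu, rho) = 0.5*x'*Arho*x - brho'*x
          x    = mprgp_inexact(Arho, brho, l, x, C, M, eta);   % Step 1: precision control (35)
          Lnew = 0.5*x'*(Arho*x) - brho'*x;           % L(x^k, mu^k, rho_k)
          mu   = mu + rho*(C*x);                      % Step 2 (36)
          if k > 0 && Lnew < Lold + 0.5*rho*norm(C*x)^2
              rho = beta*rho;                         % Step 3: increase the penalty, (37)-(38)
          end                                         % otherwise rho is kept, cf. (39)
          Lold = Lnew;
          if norm(C*x) <= tol, break; end             % illustrative outer stop on feasibility
      end
  end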
Step 1 was shown to be well defined [13], that is, any algorithm for the solution
of the auxiliary problems in Step 1 which guarantees convergence of the projected
gradient will generate either xk that satisfies (35) in a finite number of steps or a
sequence of approximations that converges to the solution of (1). The basic theoretical results concerning this algorithm are summed up in the following theorem.
Theorem 3. Let {x^k}, {µ^k} and {ρ_k} be generated by Algorithm 2 with η > 0, β > 1, M > 0, ρ_0 > 0, µ^0 ∈ IR^m. Let λ_min(A) denote the least eigenvalue of the Hessian A of q and let s ≥ 0 denote the smallest integer such that β^s ρ_0 ≥ M²/λ_min(A).

(i) The sequence {ρ_k} is bounded and

  ρ_k ≤ β^s ρ_0.    (40)

(ii) If z_0 ∈ Ω then

  Σ_{k=1}^∞ (ρ_k/2) ‖Cx^k‖² ≤ q(z_0) − L(x^0, µ^0, ρ_0) + (1 + s) η²/(2λ_min(A)).    (41)

(iii) For any k ≥ 0

  L(x^{k+1}, µ^{k+1}, ρ_{k+1}) ≥ L(x^k, µ^k, ρ_k) + (ρ_k/2) ‖Cx^k‖² + (ρ_{k+1}/2) ‖Cx^{k+1}‖² − η²/(2λ_min(A)).    (42)

(iv) ‖Cx^k‖ converges to 0.

(v) The sequence {x^k} converges to the solution x* of (1).

(vi) If the solution x* of (1) is regular, then {x^k} and {µ^k} converge to the solution x* and the vector µ* of the Lagrange multipliers of (1), respectively.

Proof: See [8]. □
We shall need the following simple Lemma to prove optimality of the inner loop.
Lemma 1. Let {x^k}, {µ^k} and {ρ_k} be generated by Algorithm 2 with η > 0, β > 1, M > 0, ρ_0 > 0, µ^0 ∈ IR^m. Let 0 < a_min ≤ min{1, λ_min(A)}, where λ_min(A) denotes the least eigenvalue of the Hessian A of the quadratic function q, and let s ≥ 0 denote the smallest integer such that β^s ρ_0 ≥ M²/a_min. Then for any k ≥ 0

  L(x^k, µ^{k+1}, ρ_{k+1}) − L(x^{k+1}, µ^{k+1}, ρ_{k+1}) ≤ η²/(2a_min) + (βρ_k/2) ‖Cx^k − d‖².    (43)
Proof: Notice that by the definition of the Lagrangian function (3)

  L(x^k, µ^{k+1}, ρ_{k+1}) = L(x^k, µ^k, ρ_k) + ρ_k ‖Cx^k − d‖² + ((ρ_{k+1} − ρ_k)/2) ‖Cx^k − d‖²
                           = L(x^k, µ^k, ρ_k) + ((ρ_{k+1} + ρ_k)/2) ‖Cx^k − d‖²,

so that by the definition of Step 2 (36) and by (42)

  L(x^k, µ^{k+1}, ρ_{k+1}) − L(x^{k+1}, µ^{k+1}, ρ_{k+1})
      = L(x^k, µ^k, ρ_k) − L(x^{k+1}, µ^{k+1}, ρ_{k+1}) + ((ρ_{k+1} + ρ_k)/2) ‖Cx^k − d‖²
      ≤ η²/(2a_min) + (βρ_k/2) ‖Cx^k − d‖². □
5 Linear complexity of semi-monotonic algorithm
To study the optimality of Algorithm 2, let T denote any set of indices and let a problem

  minimize q_t(x) s.t. x ∈ Ω_BE^t    (44)

be defined for each t ∈ T, with Ω_BE^t = {x ∈ IR^{n_t} : C_t x = 0 and x ≥ ℓ_t}, q_t(x) = ½ x^T A_t x − b_t^T x, A_t ∈ IR^{n_t×n_t} symmetric positive definite, C_t ∈ IR^{m_t×n_t}, and b_t, ℓ_t ∈ IR^{n_t}. For the sake of simplicity, we shall also assume that 0 ∈ Ω_BE^t for all t ∈ T. Our optimality result reads as follows.
Theorem 4. Let {x_t^k}, {µ_t^k} and {ρ_{t,k}} be generated by Algorithm 2 for (44) with ‖b_t‖ ≥ η_t > 0, β > 1, M > 0, ρ_{t,0} = ρ_0 > 0, µ_t^0 = 0. Let a_min denote a positive constant such that a_min ≤ λ_min(A_t), and let s ≥ 0 denote the smallest integer such that β^s ρ_0 ≥ M²/a_min. Then for each ε > 0 and t ∈ T there is an index k_t such that

  k_t ≤ (2 + s)/(ε² a_min ρ_0) + 1    (45)

and

  M^{-1} ‖g^P(x_t^{k_t}, µ_t^{k_t}, ρ_{t,k_t})‖ ≤ ‖C_t x_t^{k_t}‖ ≤ ε ‖b_t‖.    (46)

Proof: See [8]. □
Now we shall prove the main result of the paper, i. e. optimality of Algorithm
2 in terms of the matrix-vector multiplications provided Step 1 is implemented by
MPRGP.
Theorem 5. Let 0 < a_min < a_max and 0 < c_max be given constants and let the class of problems (44) satisfy

  a_min ≤ λ_min(A_t) ≤ λ_max(A_t) ≤ a_max and ‖C_t‖ ≤ c_max.    (47)

Let {x_t^k}, {µ_t^k} and {ρ_{t,k}} be generated by Algorithm 2 for (44) with ‖b_t‖ ≥ η_t > 0, β > 1, M > 0, ρ_{t,0} = ρ_0 > 0, µ_t^0 = 0. Let s ≥ 0 denote the smallest integer such that β^s ρ_0 ≥ M²/a_min and let Step 1 of Algorithm 2 be implemented by Algorithm 1 (MPRGP) with the parameters Γ > 0 and α ∈ (0, (a_max + β^s ρ_0 c_max²)^{-1}] to generate the iterates x_t^{k,0} = x_t^{k−1}, x_t^{k,1}, . . . , x_t^{k,l} = x_t^k for the solution of (44), starting from x_t^{k,0} with x_t^{−1} = 0, where l = l_{t,k} is the first index satisfying

  ‖g^P(x_t^{k,l}, µ_t^k, ρ_{t,k})‖ ≤ M ‖C_t x_t^{k,l}‖    (48)

or

  ‖g^P(x_t^{k,l}, µ_t^k, ρ_{t,k})‖ ≤ ε ‖b_t‖ min{1, M^{-1}}.    (49)

Then Algorithm 2 generates an approximate solution x_t^{k_t} of any problem (44) which satisfies (46) at O(1) matrix-vector multiplications by the Hessian of the augmented Lagrangian L_t for (44).
Proof: Let t ∈ T be fixed and let us denote by L_t(x, µ, ρ) the augmented Lagrangian for the problem (44), so that for any x ∈ IR^{n_t} and ρ ≥ 0

  L_t(x, 0, ρ) = ½ x^T (A_t + ρ C_t^T C_t) x − b_t^T x ≥ ½ a_min ‖x‖² − ‖b_t‖ ‖x‖ ≥ −‖b_t‖²/(2a_min).

Applying the latter inequality to (41) with z_0 = 0 and using the assumption ‖b_t‖ ≥ η_t, we get

  (ρ_{t,k}/2) ‖C_t x_t^k‖² ≤ Σ_{i=1}^∞ (ρ_{t,i}/2) ‖C_t x_t^i‖² ≤ q_t(z_0) − L_t(x_t^0, µ_t^0, ρ_{t,0}) + (1 + s) η_t²/(2a_min) ≤ (2 + s) ‖b_t‖²/(2a_min)
for any k ≥ 0. Thus by (43)
  L_t(x_t^{k−1}, µ_t^k, ρ_{t,k}) − L_t(x_t^k, µ_t^k, ρ_{t,k}) ≤ η_t²/(2a_min) + (βρ_{t,k−1}/2) ‖C_t x_t^{k−1}‖² ≤ (3 + s) β‖b_t‖²/(2a_min)

and, since the minimizer x̂_t^k of L_t( · , µ_t^k, ρ_{t,k}) subject to x ≥ ℓ_t satisfies (35) and is a possible choice for x_t^k, also that

  L_t(x_t^{k−1}, µ_t^k, ρ_{t,k}) − L_t(x̂_t^k, µ_t^k, ρ_{t,k}) ≤ (3 + s) β‖b_t‖²/(2a_min).    (50)

Using Theorem 2, we get that Algorithm 1, used to implement Step 1 of Algorithm 2 starting from x_t^{k,0} = x_t^{k−1}, generates x_t^{k,l} satisfying

  ‖g^P(x_t^{k,l}, µ_t^k, ρ_{t,k})‖² ≤ a_1 ν^l (L_t(x_t^{k−1}, µ_t^k, ρ_{t,k}) − L_t(x̂_t^k, µ_t^k, ρ_{t,k})) ≤ a_1 (3 + s) (β‖b_t‖²/(2a_min)) ν^l,

where

  a_1 = 36κ/(ν(1 − ν)),   κ = α^{-1} min{1, a_min}^{-1},   ν = 1 − α a_min/(2 + 2Γ̂²),   Γ̂ = max{Γ, Γ^{-1}}.
It simply follows from the inner stop rule (49) that the number of inner iterations is uniformly bounded by any index l_max which satisfies

  a_1 ν^{l_max} (3 + s) β‖b_t‖²/(2a_min) ≤ ε² ‖b_t‖² min{1, M^{-2}}.

To finish the proof, it is enough to combine this result with Theorem 4. □
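For concreteness, solving the last inequality for l_max gives an explicit uniform bound on the number of inner MPRGP iterations per outer step (this explicit form is ours, obtained directly from the display above): any integer

  l_max ≥ log( a_1 (3 + s) β / (2 a_min ε² min{1, M^{-2}}) ) / log(1/ν)

will do, and it depends only on a_min, a_max, c_max, α, Γ, β, M, ρ_0 and ε, not on t. Combined with the bound (45) on the number of outer iterations, this gives the uniform bound on the number of matrix-vector multiplications claimed in Theorem 5.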
6 Numerical experiments
We have implemented Algorithm 2 in MATLAB and solved a class of well conditioned problems of varying dimension. In all our tests, we use the quadratic forms q_t defined by the Hessian matrix A_t and the vector b_t, where each A_t is the symmetric Toeplitz matrix of order 2t² that is fully determined by the nonzero entries a_{11}^t = 12, a_{12}^t = a_{1t}^t = −1, and the vectors b = b_t are defined by the entries b_i^t = −1, i = 1, . . . , 2t². Using the Gershgorin theorem, it is easy to see that the eigenvalues λ_i of any A_t satisfy 8 ≤ λ_i ≤ 16. The lower bounds ℓ_t are defined by the MATLAB script

  l = [[-0.12 + 0.1*sin(2*pi*linspace(1,t^2,t^2)/t^2)], -inf*ones(1,t^2)]';
In all our experiments, the initial approximation for x in the first run was the zero vector. We solved the problems for t ∈ {10, 50, 250, 750} with η_t = ‖b_t‖, β = 10, µ_t^0 = 0, M = 1, ρ_{t,0} = 20 and varying C_t such that ‖C_t‖ = 1, using the stopping criterion ‖g_t(x, µ, ρ)‖ ≤ 10^{-5}‖b_t‖ and ‖C_t x‖ ≤ 10^{-5}‖b_t‖. We have not observed any update of the penalty parameter, in agreement with (40). The auxiliary problems (2) were solved by the MPRGP algorithm of Section 2 with

  α = α_{t,k} = 36^{-1} = (ρ_{t,k} + 16)^{-1} ≤ ‖A_t + ρ_{t,k} C_t^T C_t‖^{-1}.
In our first experiments, we assumed that the equality constraints are defined by
dt = 0 and by the matrix Ct with orthogonal rows defined by the MATLAB script
  C(:, n/2-t+1 : n/2+t) = (1/sqrt(2))*[speye(t,t), -rot90(speye(t,t))];
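To indicate how the test problems were assembled, the following MATLAB sketch (ours; the small value of t and all variable names are illustrative) builds A_t, b_t, ℓ_t and C_t as described above:

  t  = 10;  n = 2*t^2;
  At = spdiags(repmat([-1, -1, 12, -1, -1], n, 1), [-(t-1), -1, 0, 1, t-1], n, n);
  bt = -ones(n, 1);
  lt = [[-0.12 + 0.1*sin(2*pi*linspace(1, t^2, t^2)/t^2)], -inf*ones(1, t^2)]';
  Ct = sparse(t, n);
  Ct(:, n/2-t+1 : n/2+t) = (1/sqrt(2))*[speye(t, t), -rot90(speye(t, t))];
  % e.g. [x, mu] = smalbe(At, bt, Ct, lt, norm(bt), 10, 1, 20, zeros(t,1), 1e-5*norm(bt), 50);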
Table 1: Performance of the SMALBE for orthonormal rows

  Equality       Dimension n    Active         Matrix×vector      Outer
  constraints                   constraints    multiplications    iterations
  10                 200            48              50                9
  50                5000          1244              71                8
  250             125000         31940              82                7
  750            1125000        289288              90                7
  1150           2645000        680900             100                6
Table 2: Effect of conditioning of the equality constraints

  Equality       Dimension    Outer         Matrix×vector      Conditioning
  constraints                 iterations    multiplications
  10                 200          84             208                 9.0
  50                5000           9              75                45.3
  250             125000           7              82               107.5
  750            1125000           7              94              1061.4
Thus C_t is formed by t rows of 2t² entries which are zero except for c_{i,t²−t+i} = −c_{i,t²+t−i+1} = 1/√2, i = 1, . . . , t. The matrix C_t is designed to enforce the symmetry x_{t²−t+i} = x_{t²+t−i+1}, i = 1, . . . , t, and its rows are orthonormal. The results of the computation are in Table 1. We can observe that the number of outer iterations does not increase, while the number of matrix-vector multiplications increases only mildly with the dimension of the problem.
To see the effect of the conditioning of the constraints, we first modified the matrix C_t by adding to each row (except the last one) the following row, and then we normalized the resulting matrix. The results are in Table 2, which also includes the conditioning of the constraints. We can see that the numbers of both the outer iterations and the matrix-vector multiplications decrease even though the conditioning of the constraints deteriorates. The results are in agreement with the theory.
To see what happens when the constraints are dependent, we first formed an auxiliary matrix Ĉ_t by appending the first 0.1t rows of C_t to C_t, so that Ĉ_t had 1.1t rows, and then we modified the matrix Ĉ_t by summing and normalizing as above.
Table 3: Experiments with dependent rows

  Equality       Dimension    Outer         Matrix×vector
  constraints                 iterations    multiplications
  11                 200         143             338
  55                5000          24             107
  275             125000          11              94
  825            1125000          10             106
The resulting constraint matrix corresponding to t has the same first t rows as the constraint matrix of the previous experiments and 0.1t additional rows that are linear combinations of the first ones. We can observe that the results in Table 3 are comparable to those in Table 2. A rather surprising feature of Tables 2 and 3 is the slow convergence for the small problems. The latter can be improved by a more sophisticated choice of the parameters. For example, using ρ = 500 reduces the numbers of inner and outer iterations for t = 10 to 119 and 7, respectively. Let us recall that a large ρ need not slow down the convergence of the inner loop [5].
The results indicate that we can observe optimality in practice for well conditioned problems. More numerical experiments including solution of the discretized
elliptic variational inequality and details of implementation may be found in [8, 18].
7 Comments and conclusions
We have proved new results concerning the optimality of our recently proposed algorithm [8] for the solution of convex bound and equality constrained quadratic programming problems. While in [8] we proved optimality of the outer loop of our algorithm, in this paper we show its optimality in terms of matrix-vector multiplications. We also obtained new results concerning the convergence of the projected gradient of our algorithm for bound constrained problems. Our optimality results do not depend on the conditioning of the matrix which defines the equality constraints and remain valid even for dependent constraints.

The results were confirmed numerically on the solution of well conditioned benchmarks with varying dimension and constraints. The results of the paper are important ingredients in the development of scalable algorithms for variational inequalities based on FETI and explain experimental results observed earlier [16, 18]. The results will also be used to extend our earlier results on the optimality of FETI-DP
[20] to the semi-coercive case. We shall describe these applications in more detail
elsewhere.
References
[1] Bertsekas, D. P.: Nonlinear Optimization. Belmont: Athena Scientific 1999.
[2] Conn, A. R., Gould N. I. M., Toint Ph. L.: LANCELOT: a Fortran package
for large scale nonlinear optimization. Berlin: Springer Verlag 1992.
[3] Dostál, Z.: Duality based domain decomposition with proportioning for the
solution of free boundary problems. J. Comput. and Appl. Mathematics 63,
203-208(1995).
[4] Dostál, Z.: Box constrained quadratic programming with proportioning and
projections. SIAM J. Optimization 7, 871-887(1997).
[5] Dostál Z.: On preconditioning and penalized matrices. Num. Lin. Algebra
with Applications 6, 109-114(1999).
[6] Dostál, Z.: A proportioning based algorithm for bound constrained quadratic programming with the rate of convergence. Numer. Algorithms 34, 293-302(2003).
[7] Dostál, Z.: Semi-monotonic inexact augmented Lagrangians for quadratic programming with equality constraints. Optimiz. Methods & Software, in print.
[8] Dostál, Z.: Inexact semi-monotonic Augmented Lagrangians with optimal feasibility convergence for quadratic programming with simple bounds and equality constraints. SIAM J. Numer. Analysis, to appear.
[9] Dostál, Z., Friedlander, A., Santos, S. A.: Solution of coercive and semicoercive
contact problems by FETI domain decomposition. Contemporary Mathematics 218, 83-93(1998).
[10] Dostál, Z., Friedlander, A., Santos, S. A.: Adaptive precision control in
quadratic programming with simple bounds and/or equalities. High Performance Software for Non-linear Optimization, eds. Leone, R. De et al., G.,
Kluwer, Applied Optimization 24, 161-173(1998).
[11] Dostál, Z., Friedlander, A., Santos, S. A.: Augmented Lagrangians with adaptive precision control for quadratic programming with equality constraints.
Comput. Optimiz. and Applications 14, 37-53(1999).
[12] Dostál, Z., Friedlander, A., Santos, S. A., Alesawi, K.: Augmented Lagrangians with adaptive precision control for quadratic programming with
equality constraints: corrigendum and addendum. Comput. Optimiz. and Applications 23, 127-133(2002).
[13] Dostál, Z., Friedlander, A., Santos, S. A.: Augmented Lagrangians with adaptive precision control for quadratic programming with simple bounds and
equality constraints. SIAM J. Optimization 13, 1120-1140(2003).
[14] Dostál, Z., Gomes, F. A. M., Santos, S. A.: Solution of contact problems by
FETI domain decomposition with natural coarse space projection. Comput.
Methods in Appl. Mech. and Engineering 190, 1611-1627(2000).
[15] Gondzio, J., Sarkissian, R.: Parallel interior point solver for structured linear
programs. Mathematical Programming 96, 561-584(2003).
[16] Dostál, Z., Horák, D.: Scalability and FETI based algorithm for large discretized variational inequalities. Math. and Comput. in Simulation 61, 347-357(2003).
[17] Dostál, Z., Horák, D.: Scalable FETI with Optimal Dual Penalty for a Variational Inequality. Numerical Linear Algebra and Applications 11, 455-472(2004).
[18] Z. Dostál and D. Horák, On scalable algorithms for numerical solution of
variational inequalities based on FETI and semi-monotonic augmented Lagrangians, in Domain Decomposition Methods in Science and Engineering,
eds. R. Kornhuber et al., Springer 2004.
[19] Dostál, Z., Horák, D.: Scalable FETI with Optimal Dual Penalty for Semicoercive Variational Inequalities. Contemporary Mathematics 329, 293-302(2003).
[20] Dostál, Z., Horák, D., Stefanica, D.: A Scalable FETI–DP Algorithm for
Coercive Variational Inequalities, IMACS J. Appl. Numer. Math., in print.
[21] Dostál, Z., Schöberl, J.: Minimizing quadratic functions over non-negative
cone with the rate of convergence and finite termination, Computat. Optimiz.
with Applications, in print.
[22] Farhat, C., Mandel, J., Roux, F.-X.: Optimal convergence properties of the
FETI domain decomposition method. Comput. Methods in Appl. Mech. and
Engineering 115, 365-385(1994).
[23] Friedlander, A., Martínez, J. M., Raydan, M.: A new method for large scale box constrained quadratic minimization problems. Optimiz. Meth. and Software 5, 57-74(1995).
[24] Friedlander, A., Martínez, J. M., Santos, S. A.: A new trust region algorithm for bound constrained minimization. Appl. Math. & Optimization 30, 235-266(1994).
[25] Hackbusch, W.: Multigrid Methods and Applications. Berlin: Springer 1985.
[26] Hager, W. W.: Analysis and implementation of a dual algorithm for constraint
optimization. JOTA 79, 37-71(1993).
[27] Nocedal, J., Wright, S. J.: Numerical Optimization. New York: Springer 2000.
[28] Saad, Y.: Iterative Methods for Sparse Linear Systems. Philadelphia: SIAM 2003.
[29] Schöberl, J.: Solving the Signorini problem on the basis of domain decomposition techniques, Computing 60, 323-344(1998).