An optimal algorithm for bound and equality constrained quadratic programming problems with bounded spectrum*

Z. Dostál
VŠB-Technical University Ostrava, Tř. 17. listopadu, CZ-70833 Ostrava, Czech Republic. E-mail: [email protected]

October 29, 2004

Abstract

An implementation of the recently proposed semi-monotonic augmented Lagrangian algorithm for solving large convex bound and equality constrained quadratic programming problems is considered. It is proved that if the algorithm is applied to a class of problems with uniformly bounded spectrum of the Hessian matrix, then it finds an approximate solution at O(1) matrix-vector multiplications. The optimality results do not depend on the conditioning of the matrix which defines the equality constraints, and the theory also covers problems with dependent constraints. The theoretical results are illustrated by numerical experiments.

AMS classification: Primary 65K05; Secondary 90C20
Key words and phrases: quadratic programming, bound and equality constraints, augmented Lagrangians, optimal algorithms

* This research has been supported by grants GAČR 101/04/1145, GA CAS S3086102 and 1ET400300415.

1 Introduction

An important ingredient in the development of effective methods for the solution of very large problems is the identification of algorithms that can solve some special cases with optimal (i.e. asymptotically linear) complexity. For example, if an iterative method with a known rate of convergence in terms of the condition number is applied to a class of problems with uniformly bounded condition number, we may bound the number of iterates uniformly to get an optimal algorithm. This simple observation was indeed part of the success of the multigrid [25] and the FETI based domain decomposition [22] methods for the solution of elliptic partial differential equations.

We shall be concerned with the problem of finding efficiently the minimizer of a quadratic function subject to bound and linear equality constraints, that is,

minimize q(x) subject to x ∈ Ω_BE   (1)

with Ω_BE = {x ∈ IR^n : x ≥ ℓ and Cx = 0}, q(x) = (1/2) x^T A x − b^T x, A ∈ IR^{n×n} symmetric positive definite, C ∈ IR^{m×n}, and ℓ, b, x ∈ IR^n. We do not assume that C is a full row rank matrix, nor that m ≤ n, but we shall assume that Ω_BE is not empty. We shall also assume that C is sparse.

Having in mind the optimality results mentioned above, we shall be especially interested in problems with the matrix A large, sparse and well conditioned. The condition on the sparsity of A may be replaced by any other condition which implies cheap evaluation of Ay for any y ∈ IR^n. Such problems arise, for example, from the discretization of semicoercive elliptic variational inequalities (e.g. Dostál [3]) or from the application of duality based domain decomposition to contact problems of elasticity (e.g. Dostál, Friedlander and Santos [9], Dostál, Gomes and Santos [14] or Dostál and Horák [16]).

There are several basic algorithms for solving (1). Probably the most popular are those based on the interior point method (e.g. [27]). For example, interior point methods were applied successfully to the solution of very large problems of nonlinear optimization with many constraints and millions of decision variables, observing that a Hessian matrix with a special pattern of nonzero elements may be decomposed with nearly linear complexity [15].
In spite of the success of the interior point methods for many problems, it is not obvious how they can exploit a favorable distribution of the spectrum of the Hessian matrix A. The reason is that the auxiliary linear problems of the interior point method are ill conditioned due to the penalization.

Another successful algorithm, a variant of the augmented Lagrangian method proposed by Conn, Gould and Toint and enhanced into the package LANCELOT [2], generates approximations of the Lagrange multipliers in the outer loop, while the bound constrained auxiliary problems with symmetric positive definite matrices are solved approximately in the inner loop. The auxiliary problems are of the type

minimize L(x, µ^k, ρ_k) subject to x ≥ ℓ,   (2)

where

L(x, µ^k, ρ_k) = q(x) + (µ^k)^T (Cx − d) + (ρ_k/2) ‖Cx − d‖²   (3)

is known as the augmented Lagrangian function, µ^k = (µ^k_1, ..., µ^k_m)^T is the vector of Lagrange multipliers for the equality constraints, ρ_k is the penalty parameter, and ‖·‖ denotes the Euclidean norm. Note that d = 0 for the equality constraints of (1); we keep the general right hand side d to simplify references to the auxiliary problems.

We use two modifications of the algorithm of Conn, Gould and Toint [8]. The first one concerns the precision of the approximate solutions x^k of the auxiliary problems, which is measured by the norm of the projected gradient of the augmented Lagrangian and is controlled by a fixed multiple of the Euclidean norm of the feasibility error. Let us recall that the same precision control has already been considered by Hager [26] and Dostál, Friedlander, Santos and Alesawi [7, 10, 11, 12] for problems with equality constraints, and by Dostál, Friedlander and Santos [8, 13] for bound and equality constraints. The second modification concerns the update rule for the penalty parameter ρ_k, which is increased only until a sufficient monotonic increase of L(x^k, µ^k, ρ_k) is achieved [8]. It turned out that not only is the latter achieved with quite small values of the penalty parameter, but also that this strategy makes it possible to obtain results on the convergence of the feasibility error that are independent of the conditioning of the constraints.

The point of this paper is to show that the results mentioned above are sufficient to guarantee that the algorithm, implemented with the solution of the auxiliary problems (2) by the recently proposed conjugate gradient based method with proportioning and reduced gradient projections [21], finds an approximate solution of any problem from a class of problems with uniformly bounded spectrum of the Hessian matrix at O(1) matrix-vector multiplications. If we apply this result to a class of problems whose Hessian matrices are in addition either sufficiently sparse or can be expressed as a product of sufficiently sparse matrices, then the algorithm is optimal also in the sense that it can solve such classes of problems with asymptotically linear complexity. While the results of [8] are independent of the implementation of the inner loop, the results of this paper concentrate just on the implementation of the inner loop. Let us recall that the results are not a trivial corollary of the bound on the number of outer iterations proved in [8], as the multipliers µ^k that appear in the linear term of the augmented Lagrangian need not be bounded. Moreover, the precision of the solution of the auxiliary problems is controlled by the projected gradient, which is not continuous.
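Since the augmented Lagrangian (3) appears throughout the paper, we record for later reference a minimal MATLAB sketch that evaluates it; the function name is an illustrative choice of ours, not part of the algorithms analyzed below.

    function L = augmented_lagrangian(A, b, C, d, x, mu, rho)
    % Value of the augmented Lagrangian (3) for q(x) = x'*A*x/2 - b'*x.
    r = C*x - d;                                  % feasibility residual
    L = 0.5*x'*A*x - b'*x + mu'*r + 0.5*rho*(r'*r);
    end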
The paper is organized as follows. In Section 2 we review the MPRGP algorithm for bound constrained quadratic programming. In Section 3 we present new results on the rate of convergence of the iterates generated by MPRGP which are important for the analysis of optimality. The semi-monotonic augmented Lagrangian algorithm SMALBE for the minimization of convex quadratic functions subject to bound and equality constraints is reviewed in Section 4. In Section 5 we present the analysis of optimality of the algorithm with the inner loop implemented by MPRGP. The theoretical results are illustrated in Section 6 by numerical experiments with the solution of benchmarks with up to more than two and a half million variables. Finally, some conclusions are discussed in Section 7.

In the whole paper, q(x) = (1/2) x^T A x − b^T x will denote a strictly convex quadratic function defined on IR^n, with the Hessian matrix ∇²q = A ∈ IR^{n×n} symmetric positive definite and x, b ∈ IR^n. The eigenvalues of A will always be denoted λ_i(A),

λ_min(A) = λ_1(A) ≤ ... ≤ λ_n(A) = λ_max(A) = ‖A‖.

For any nonempty sets of indices I, J and a matrix M, we shall denote by M_IJ or M_I the submatrices of M that comprise the rows and columns determined by the sets I and J, respectively. In the same spirit, given a vector x ∈ IR^n, the subvector of x with the components determined by I will be denoted by x_I.

2 Bound constrained problems

We shall now be concerned with the solution of the auxiliary problem (2). To simplify our exposition, we shall consider the problem

minimize q(x) subject to x ∈ Ω_B   (4)

with Ω_B = {x : x ≥ ℓ} and the other notation from the introduction. Let us recall that the solution of (4) is of independent interest and arises e.g. in the application of duality based domain decomposition methods to the solution of discretized variational inequalities [29, 17, 19, 20] or in computer graphics.

It is well known that the solution of the problem (4) always exists and is necessarily unique [1]. For an arbitrary n-vector x, let us denote in this section the gradient g = g(x) of q by

g = g(x) = Ax − b.   (5)

Then the unique solution x̄ of (4) is fully determined by the Karush-Kuhn-Tucker optimality conditions [1], so that for i = 1, ..., n,

x̄_i = ℓ_i implies g_i(x̄) ≥ 0 and x̄_i > ℓ_i implies g_i(x̄) = 0.   (6)

Let N denote the set of all indices, so that N = {1, 2, ..., n}. The set of all indices for which x_i = ℓ_i is called the active set of x. We shall denote it by A(x), so that A(x) = {i ∈ N : x_i = ℓ_i}. Its complement F(x) = {i ∈ N : x_i ≠ ℓ_i} and its subset B(x) = {i ∈ N : x_i = ℓ_i and g_i > 0} are called the free set and the binding set, respectively. To enable an alternative reference to the Karush-Kuhn-Tucker conditions (6), we shall introduce notation for the free gradient φ and the chopped gradient β that are defined by

φ_i(x) = g_i(x) for i ∈ F(x),  φ_i(x) = 0 for i ∈ A(x),
β_i(x) = 0 for i ∈ F(x),  β_i(x) = g_i^-(x) for i ∈ A(x),

where we have used the notation g_i^- = min{g_i, 0}. Thus the Karush-Kuhn-Tucker conditions (6) are satisfied iff the projected gradient g^P(x) = φ(x) + β(x) is equal to zero. The projection P_Ω onto Ω is defined for any n-vector x by

P_Ω(x) = ℓ + (x − ℓ)^+,

where y^+ denotes for any n-vector y the vector with the entries y_i^+ = max{y_i, 0}.
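To make the preceding definitions concrete, the following MATLAB function evaluates the free, chopped and projected gradients together with the projection P_Ω for given A, b, ℓ and x. It is only an illustrative sketch; the function name and the detection of the active set by exact comparison are our choices here, not part of the algorithms analyzed below.

    function [gP, phi, beta, Px] = gradient_splitting(A, b, l, x)
    % Free, chopped and projected gradients of q(x) = x'*A*x/2 - b'*x on
    % Omega = {x : x >= l}, and the projection P_Omega(x) = l + (x - l)^+.
    g = A*x - b;                    % gradient of q at x
    act = (x == l);                 % active set A(x)
    phi = g;  phi(act) = 0;         % free gradient: g on F(x), zero on A(x)
    beta = zeros(size(g));
    beta(act) = min(g(act), 0);     % chopped gradient: g^- on A(x), zero on F(x)
    gP = phi + beta;                % projected gradient g^P(x)
    Px = l + max(x - l, 0);         % projection of x onto Omega
    end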
The algorithm for the solution of (4) that we describe here combines the proportioning algorithm mentioned above with reduced gradient projections. It exploits a given constant Γ > 0, a test to decide about leaving the face, and three types of steps to generate a sequence of iterates {x^k} that approximate the solution of (4).

The expansion step is defined by

x^{k+1} = P_Ω(x^k − αφ(x^k))   (7)

with the fixed steplength α ∈ (0, ‖A‖^{−1}]. This step may expand the current active set. To describe it without P_Ω, let us introduce, for any x ∈ Ω, the reduced free gradient φ̃(x) with the entries

φ̃_i = φ̃_i(x) = min{(x_i − ℓ_i)/α, φ_i},

so that

x^{k+1} = x^k − αφ̃(x^k).   (8)

If the inequality

‖β(x^k)‖² ≤ Γ² φ̃(x^k)^T φ(x^k)   (9)

holds, then we call the iterate x^k strictly proportional. The test (9) is used to decide which component of the projected gradient g^P(x^k) will be reduced in the next step. Notice that the right-hand side of (9) blends the information about the current free gradient and its part that can be used in the expansion step, while the related relations in [4, 6, 24, 23] consider only the norm of the free gradient.

The proportioning step is defined by

x^{k+1} = x^k − α_cg β(x^k)   (10)

with the steplength α_cg that minimizes q(x^k − αβ(x^k)). It is easy to check [28] that the α_cg that minimizes q(x − αd) for given d and x may be evaluated by the formula

α_cg = α_cg(d) = d^T g(x) / d^T A d.   (11)

The purpose of the proportioning step is to remove indices from the active set. Note that if x^k ∈ Ω, then x^{k+1} = x^k − α_cg β(x^k) ∈ Ω.

The conjugate gradient step is defined by

x^{k+1} = x^k − α_cg p^k,   (12)

where p^k is the conjugate gradient direction [28] which is constructed recurrently. The recurrence starts (or restarts) from p^s = φ(x^s) whenever x^s is generated by the expansion step or the proportioning step. If p^k is known, then p^{k+1} is given by the formulae [28]

p^{k+1} = φ(x^{k+1}) − γ p^k,  γ = φ(x^{k+1})^T A p^k / (p^k)^T A p^k.   (13)

The basic property of the conjugate directions p^s, ..., p^k that are generated by the recurrence (13) from the restart p^s is their mutual A-orthogonality, i.e. (p^i)^T A p^j = 0 for i, j ∈ {s, ..., k}, i ≠ j. It follows easily [28] that

q(x^{k+1}) = min{q(x^s + y) : y ∈ Span{p^s, ..., p^k}},   (14)

where Span{p^s, ..., p^k} denotes the vector space of all linear combinations of the vectors p^s, ..., p^k. The conjugate gradient steps are used to carry out efficiently the minimization in the face

W_I = {x : x_i = ℓ_i for i ∈ I}   (15)

given by I = A(x^s).

Let us define the algorithm that we propose in the form that is convenient for analysis.

Algorithm 1. Modified proportioning with reduced gradient projections (MPRGP).
Let x^0 ∈ Ω, α ∈ (0, ‖A‖^{−1}], and Γ > 0 be given. For k ≥ 0 and x^k known, choose x^{k+1} by the following rules:
(i) If g^P(x^k) = 0, set x^{k+1} = x^k.
(ii) If x^k is strictly proportional and g^P(x^k) ≠ 0, try to generate x^{k+1} by the conjugate gradient step. If x^{k+1} ∈ Ω, then accept it, else generate x^{k+1} by the expansion step.
(iii) If x^k is not strictly proportional, define x^{k+1} by the proportioning step.
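The following MATLAB sketch implements the iteration of Algorithm 1 as described above. The stopping tolerance tol, the iteration limit maxit and the reuse of the helper gradient_splitting from the previous sketch are illustrative assumptions of ours; in particular, step (i) is realized here by stopping once the projected gradient is sufficiently small.

    function x = mprgp(A, b, l, x0, Gamma, alpha, tol, maxit)
    % Sketch of Algorithm 1 (MPRGP) for min q(x) = x'*A*x/2 - b'*x s.t. x >= l,
    % with Gamma > 0 and a fixed steplength alpha in (0, 1/norm(A)].
    x = max(x0, l);                               % make the initial guess feasible
    p = [];                                       % empty p marks a restart of the CG recurrence
    for k = 1:maxit
        g = A*x - b;
        [gP, phi, beta] = gradient_splitting(A, b, l, x);
        if norm(gP) <= tol, return; end           % step (i): projected gradient small enough
        phit = min((x - l)/alpha, phi);           % reduced free gradient
        if norm(beta)^2 <= Gamma^2*(phit'*phi)    % test (9): x is strictly proportional
            if isempty(p), p = phi; end           % (re)start the recurrence from phi(x)
            Ap = A*p;
            y = x - ((g'*p)/(p'*Ap))*p;           % trial conjugate gradient step (11)-(12)
            if all(y >= l)                        % accept the conjugate gradient step
                x = y;
                [~, phi] = gradient_splitting(A, b, l, x);
                p = phi - ((phi'*Ap)/(p'*Ap))*p;  % recurrence (13)
            else                                  % expansion step (7)
                x = max(x - alpha*phi, l);
                p = [];
            end
        else                                      % proportioning step (10)
            x = x - ((g'*beta)/(beta'*(A*beta)))*beta;
            p = [];
        end
    end
    end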
3 Convergence of the modified proportioning

The basic theoretical results concerning the algorithm read as follows.

Theorem 1. Let Γ > 0 and a_min be given constants, 0 < a_min ≤ λ_min(A), Γ̂ = max{Γ, Γ^{−1}}, let x̄ denote the unique solution of (4), and let {x^k} denote the sequence generated by Algorithm 1 with α ∈ (0, ‖A‖^{−1}].
(i) For any k ≥ 0

q(x^{k+1}) − q(x̄) ≤ ν (q(x^k) − q(x̄)),   (16)

where

ν = 1 − α a_min / (2 + 2Γ̂²).   (17)

(ii) The solution error is bounded by

‖x^k − x̄‖² ≤ (2ν^k / a_min) (q(x^0) − q(x̄)).   (18)

(iii) If the solution x̄ is not dual degenerate, then there is k such that x^k = x̄.
(iv) If

Γ ≥ 2(√κ(A) + 1),   (19)

then there is k ≥ 0 such that x^k = x̄.

Proof: See [21]. □

Since we shall use the MPRGP algorithm also in the inner loop of other algorithms, it is not enough to know that it converges, as we must be able to recognize in time that we are sufficiently near the solution. What we need is the rate of convergence of the projected gradient. Let us recall that this result is not a trivial consequence of (18), as the projected gradient is not a continuous function of the iterates.

Theorem 2. Let Γ > 0 and a_min be given constants, 0 < a_min ≤ λ_min(A), Γ̂ = max{Γ, Γ^{−1}}, let x̄ denote the unique solution of (4), and let {x^k} denote the sequence generated by Algorithm 1 with α ∈ (0, ‖A‖^{−1}]. Let κ = α^{−1} min{1, λ_min(A)}^{−1}, so that the spectral condition number κ(A) of A satisfies κ(A) ≤ κ. Then the projected gradient is bounded by

‖g^P(x^k)‖² ≤ a_1 ν^k (q(x^0) − q(x̄))   (20)

with ν defined by (17) and

a_1 = 36κ / (ν(1 − ν)).   (21)

Proof: First notice that it is enough to estimate β(x^k) and φ(x^k) separately, as

‖g^P(x^k)‖² = ‖β(x^k)‖² + ‖φ(x^k)‖².

In particular, since for any vector d such that d^T g(x) ≥ ‖d‖²

q(x) − q(x − αd) = α d^T g(x) − (α²/2) d^T A d ≥ (α/2) ‖d‖²,   (22)

we can combine (22) with x^k − αβ(x^k) ≥ ℓ to estimate ‖β(x^k)‖ by

q(x^k) − q(x̄) = (q(x^k) − q(x^k − αβ(x^k))) + (q(x^k − αβ(x^k)) − q(x̄)) ≥ (α/2) ‖β(x^k)‖².

Applying (16), we get

‖β(x^k)‖² ≤ (2/α) (q(x^k) − q(x̄)) ≤ (2ν^k/α) (q(x^0) − q(x̄)).   (23)

To estimate ‖φ(x^k)‖, notice that the algorithm "does not know" about the components of the constraint vector ℓ when it generates x^{k+1} unless their indices belong to A(x^k) or A(x^{k+1}). It follows that x^{k+1} may be considered also as an iterate generated by Algorithm 1 from x^k for the problem

minimize q(x) subject to x_i ≥ ℓ_i for i ∈ A(x^k) ∪ A(x^{k+1}).   (24)

If we denote

q̄^k = min{q(x) : x_i ≥ ℓ_i for i ∈ A(x^k) ∪ A(x^{k+1})} ≤ q(x̄)

and δ^k = q(x̄) − q̄^k ≥ 0, we can use (16) to get

δ^k = q(x̄) − q̄^k ≤ q(x^{k+1}) − q̄^k ≤ ν (q(x^k) − q̄^k) = ν (q(x^k) − q(x̄)) + ν δ^k,

so that

δ^k ≤ (ν/(1 − ν)) (q(x^k) − q(x̄)) ≤ (ν^{k+1}/(1 − ν)) (q(x^0) − q(x̄)).   (25)

Now observe that the indices of the unconstrained components of the minimization problem (24) are those belonging to I^k = F(x^k) ∩ F(x^{k+1}), as

I^k = F(x^k) ∩ F(x^{k+1}) = (N \ A(x^k)) ∩ (N \ A(x^{k+1})) = N \ (A(x^k) ∪ A(x^{k+1})).

It follows that if I^k is nonempty, then by the definition of δ^k and (22)

δ^k ≥ q(x̄) − q(x̄ − α g_{I^k}(x̄)) ≥ (α/2) ‖g_{I^k}(x̄)‖².   (26)

For convenience, let us define g_I(x) = 0 for any x and the empty set I = ∅. Then (26) remains valid for I^k = ∅, so that we can combine it with (25) to get

‖g_{I^k}(x̄)‖² ≤ (2/α) δ^k ≤ (2ν^{k+1}/(α(1 − ν))) (q(x^0) − q(x̄)).   (27)

Since our algorithm is defined so that either I^k = F(x^k) ⊆ F(x^{k+1}) or I^k = F(x^{k+1}) ⊆ F(x^k), it follows that either

‖g_{F(x^k)}(x̄)‖² = ‖g_{I^k}(x̄)‖² ≤ (2ν^{k+1}/(α(1 − ν))) (q(x^0) − q(x̄)) ≤ (2ν^k/(α(1 − ν))) (q(x^0) − q(x̄))   (28)

or

‖g_{F(x^{k+1})}(x̄)‖² = ‖g_{I^k}(x̄)‖² ≤ (2ν^{k+1}/(α(1 − ν))) (q(x^0) − q(x̄)).

Using the same reasoning for x^{k−1} and x^k, we conclude that the estimate (28) is valid for any x^k such that

F(x^k) ⊆ F(x^{k+1}) or F(x^k) ⊆ F(x^{k−1}).   (29)

Let us now recall that by (18)

‖g(x^k) − g(x̄)‖² = ‖A(x^k − x̄)‖² ≤ (2ν^k ‖A‖ / a_min) (q(x^0) − q(x̄)),   (30)

so that for any k satisfying the relations (29) we get

‖φ(x^k)‖ = ‖g_{F(x^k)}(x^k)‖ ≤ ‖g_{F(x^k)}(x^k) − g_{F(x^k)}(x̄)‖ + ‖g_{F(x^k)}(x̄)‖
≤ √((2‖A‖/a_min) ν^k (q(x^0) − q(x̄))) + √((2/(α(1 − ν))) ν^k (q(x^0) − q(x̄)))
≤ 2 √((2κ/(1 − ν)) ν^k (q(x^0) − q(x̄))).

Combining the last inequality with (23), we get for any k satisfying the relations (29) that

‖g^P(x^k)‖² = ‖β(x^k)‖² + ‖φ(x^k)‖² ≤ (10κ/(1 − ν)) ν^k (q(x^0) − q(x̄)).   (31)
Now notice that the estimate (31) is valid for any iterate x^k which satisfies F(x^k) ⊆ F(x^{k−1}), i.e. when x^k is generated by the conjugate gradient step or the expansion step. Thus it remains to estimate the projected gradient of an iterate x^k generated by the proportioning step. In this case F(x^{k−1}) ⊆ F(x^k), so that we can obviously use the estimate (31) to get

‖g^P(x^{k−1})‖ ≤ √((10κ/(1 − ν)) ν^{k−1} (q(x^0) − q(x̄))).   (32)

Since the proportioning step is defined by x^k = x^{k−1} − α_cg β(x^{k−1}), it follows that

‖g_{F(x^k)}(x^{k−1})‖ = ‖g^P(x^{k−1})‖.

Moreover, using the basic properties of the norm, we get

‖φ(x^k)‖ = ‖g_{F(x^k)}(x^k)‖ ≤ ‖g_{F(x^k)}(x^k) − g_{F(x^k)}(x^{k−1})‖ + ‖g_{F(x^k)}(x^{k−1})‖
≤ ‖g(x^k) − g(x̄)‖ + ‖g(x̄) − g(x^{k−1})‖ + ‖g^P(x^{k−1})‖,

and by (30) and (32)

‖φ(x^k)‖ ≤ √((2‖A‖/a_min) ν^k (q(x^0) − q(x̄))) + √((2‖A‖/a_min) ν^{k−1} (q(x^0) − q(x̄))) + √((10κ/(1 − ν)) ν^{k−1} (q(x^0) − q(x̄)))
≤ (√5 + 2) √((2κ/(1 − ν)) ν^{k−1} (q(x^0) − q(x̄))).   (33)

Combining the last inequality with (23), we get by a simple computation that

‖g^P(x^k)‖² = ‖φ(x^k)‖² + ‖β(x^k)‖² ≤ (36κ/(1 − ν)) ν^{k−1} (q(x^0) − q(x̄)).

Since the last estimate is obviously weaker than (31), it follows that (20) is valid for all indices k. □

4 Algorithm for bound and equality constrained problems

We shall finally be concerned with the problem (1). To describe our algorithm, we shall denote the gradient of the augmented Lagrangian by g, so that

g(x, µ, ρ) = ∇_x L(x, µ, ρ) = Ax − b + C^T µ + ρ C^T (Cx − d).   (34)

Algorithm 2. Semi-monotonic augmented Lagrangian method for bound and equality constrained problems (SMALBE).
Given η > 0, β > 1, M > 0, ρ_0 > 0, and µ^0 ∈ IR^m, set k = 0.

Step 1. {Inner iteration with adaptive precision control.} Find x^k ≥ ℓ such that

‖g^P(x^k, µ^k, ρ_k)‖ ≤ min{M‖Cx^k‖, η}.   (35)

Step 2. {Update µ.}

µ^{k+1} = µ^k + ρ_k Cx^k.   (36)

Step 3. {Update ρ provided the increase of the Lagrangian is not sufficient.} If k > 0 and

L(x^k, µ^k, ρ_k) < L(x^{k−1}, µ^{k−1}, ρ_{k−1}) + (ρ_k/2) ‖Cx^k‖²,   (37)

then

ρ_{k+1} = βρ_k,   (38)

else

ρ_{k+1} = ρ_k.   (39)

Step 4. Set k = k + 1 and return to Step 1.

Step 1 was shown to be well defined [13], that is, any algorithm for the solution of the auxiliary problems in Step 1 which guarantees the convergence of the projected gradient will generate either an x^k that satisfies (35) in a finite number of steps or a sequence of approximations that converges to the solution of (1).
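A minimal MATLAB sketch of the outer loop of Algorithm 2 follows. Step 1 is realized by repeated calls of the MPRGP sketch of Section 2, so that the adaptive tolerance of (35) can be re-evaluated; the overall stopping test on ‖g^P‖ and ‖Cx‖, the use of normest for the inner steplength and the argument list are illustrative assumptions of ours rather than the implementation used for the experiments in Section 6.

    function [x, mu] = smalbe(A, b, C, l, eta, beta, M, rho, tol, maxit)
    % Sketch of Algorithm 2 (SMALBE) for min q(x) = x'*A*x/2 - b'*x
    % subject to x >= l and C*x = 0 (i.e. d = 0).
    mu = zeros(size(C,1), 1);
    x  = max(zeros(size(b)), l);
    Lold = -Inf;
    for k = 1:maxit
        Arho = A + rho*(C'*C);                    % Hessian of L(., mu, rho)
        brho = b - C'*mu;                         % so that grad_x L(x,mu,rho) = Arho*x - brho
        alpha = 1/normest(Arho);                  % steplength for the inner loop
        for j = 1:maxit                           % Step 1: adaptive precision control (35)
            x = mprgp(Arho, brho, l, x, 1, alpha, 0, 5);    % a few MPRGP iterations
            gP = gradient_splitting(Arho, brho, l, x);
            if norm(gP) <= min(M*norm(C*x), eta), break; end
        end
        Lnew = 0.5*x'*Arho*x - brho'*x;           % value of L(x^k, mu^k, rho_k)
        if norm(gP) <= tol && norm(C*x) <= tol, break; end  % overall stopping test
        mu = mu + rho*(C*x);                      % Step 2: update mu (36)
        if k > 1 && Lnew < Lold + 0.5*rho*norm(C*x)^2       % Step 3: test (37)
            rho = beta*rho;                       % increase the penalty parameter (38)
        end
        Lold = Lnew;
    end
    end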
The basic theoretical results concerning this algorithm are summed up in the following theorem.

Theorem 3. Let {x^k}, {µ^k} and {ρ_k} be generated by Algorithm 2 with η > 0, β > 1, M > 0, ρ_0 > 0, µ^0 ∈ IR^m. Let α = λ_min(A) denote the least eigenvalue of the Hessian A of q and let s ≥ 0 denote the smallest integer such that β^s ρ_0 ≥ M²/α.
(i) The sequence {ρ_k} is bounded and

ρ_k ≤ β^s ρ_0.   (40)

(ii) If z_0 ∈ Ω_BE, then

Σ_{k=1}^{∞} (ρ_k/2) ‖Cx^k‖² ≤ q(z_0) − L(x^0, µ^0, ρ_0) + (1 + s) η²/(2α).   (41)

(iii) For any k ≥ 0

L(x^{k+1}, µ^{k+1}, ρ_{k+1}) ≥ L(x^k, µ^k, ρ_k) + (ρ_k/2) ‖Cx^k‖² + (ρ_{k+1}/2) ‖Cx^{k+1}‖² − η²/(2α).   (42)

(iv) ‖Cx^k‖ converges to 0.
(v) The sequence {x^k} converges to the solution x* of (1).
(vi) If the solution x* of (1) is regular, then {x^k} and {µ^k} converge to the solution x* and to the vector µ* of Lagrange multipliers of (1), respectively.

Proof: See [8]. □

We shall need the following simple lemma to prove the optimality of the inner loop.

Lemma 1. Let {x^k}, {µ^k} and {ρ_k} be generated by Algorithm 2 with η > 0, β > 1, M > 0, ρ_0 > 0, µ^0 ∈ IR^m. Let 0 < a_min ≤ min{1, λ_min(A)}, where λ_min(A) denotes the least eigenvalue of the Hessian A of the quadratic function q, and let s ≥ 0 denote the smallest integer such that β^s ρ_0 ≥ M²/a_min. Then for any k ≥ 0

L(x^k, µ^{k+1}, ρ_{k+1}) − L(x^{k+1}, µ^{k+1}, ρ_{k+1}) ≤ η²/(2a_min) + (βρ_k/2) ‖Cx^k − d‖².   (43)

Proof: Notice that by the definition of the augmented Lagrangian function (3)

L(x^k, µ^{k+1}, ρ_{k+1}) = L(x^k, µ^k, ρ_k) + ρ_k ‖Cx^k − d‖² + ((ρ_{k+1} − ρ_k)/2) ‖Cx^k − d‖²
= L(x^k, µ^k, ρ_k) + ((ρ_{k+1} + ρ_k)/2) ‖Cx^k − d‖²,

so that by the definition of Step 2 (36) and by (42)

L(x^k, µ^{k+1}, ρ_{k+1}) − L(x^{k+1}, µ^{k+1}, ρ_{k+1}) = L(x^k, µ^k, ρ_k) − L(x^{k+1}, µ^{k+1}, ρ_{k+1}) + ((ρ_{k+1} + ρ_k)/2) ‖Cx^k − d‖²
≤ η²/(2a_min) + (βρ_k/2) ‖Cx^k − d‖². □

5 Linear complexity of the semi-monotonic algorithm

To study the optimality of Algorithm 2, let T denote any set of indices and let for any t ∈ T a problem

minimize q_t(x) s.t. x ∈ Ω_BE^t   (44)

be defined with Ω_BE^t = {x ∈ IR^{n_t} : C_t x = 0 and x ≥ ℓ_t}, q_t(x) = (1/2) x^T A_t x − b_t^T x, A_t ∈ IR^{n_t×n_t} symmetric positive definite, C_t ∈ IR^{m_t×n_t}, and b_t, ℓ_t ∈ IR^{n_t}. For the sake of simplicity, we shall also assume that 0 ∈ Ω_BE^t for all t ∈ T. Our optimality result reads as follows.

Theorem 4. Let {x_t^k}, {µ_t^k} and {ρ_{t,k}} be generated by Algorithm 2 for (44) with ‖b_t‖ ≥ η_t > 0, β > 1, M > 0, ρ_{t,0} = ρ_0 > 0, µ_t^0 = 0. Let a_min denote a positive constant such that a_min ≤ λ_min(A_t), and let s ≥ 0 denote the smallest integer such that β^s ρ_0 ≥ M²/a_min. Then for each ε > 0 and t ∈ T there is an index k_t such that

k_t ≤ (2 + s)/(a_min ρ_0 ε²) + 1   (45)

and

M^{−1} ‖g^P(x_t^{k_t}, µ_t^{k_t}, ρ_{t,k_t})‖ ≤ ‖C_t x_t^{k_t}‖ ≤ ε ‖b_t‖.   (46)

Proof: See [8]. □

Now we shall prove the main result of the paper, i.e. the optimality of Algorithm 2 in terms of matrix-vector multiplications, provided Step 1 is implemented by MPRGP.

Theorem 5. Let 0 < a_min < a_max and 0 < c_max be given constants and let the class of problems (44) satisfy

a_min ≤ λ_min(A_t) ≤ λ_max(A_t) ≤ a_max and ‖C_t‖ ≤ c_max.   (47)

Let {x_t^k}, {µ_t^k} and {ρ_{t,k}} be generated by Algorithm 2 for (44) with ‖b_t‖ ≥ η_t > 0, β > 1, M > 0, ρ_{t,0} = ρ_0 > 0, µ_t^0 = 0. Let s ≥ 0 denote the smallest integer such that β^s ρ_0 ≥ M²/a_min and let Step 1 of Algorithm 2 be implemented by Algorithm 1 (MPRGP) with the parameters Γ > 0 and α ∈ (0, (a_max + β^s ρ_0 c_max²)^{−1}] to generate the iterates x_t^{k,0} = x_t^{k−1}, x_t^{k,1}, ..., x_t^{k,l} = x_t^k for the solution of (44), starting from x_t^{k,0} = x_t^{k−1} with x_t^{−1} = 0, where l = l_{t,k} is the first index satisfying

‖g^P(x_t^{k,l}, µ_t^k, ρ_{t,k})‖ ≤ M ‖C_t x_t^{k,l}‖   (48)

or

‖g^P(x_t^{k,l}, µ_t^k, ρ_{t,k})‖ ≤ ε ‖b_t‖ min{1, M^{−1}}.   (49)

Then Algorithm 2 generates an approximate solution x_t^{k_t} of any problem (44) which satisfies (46) at O(1) matrix-vector multiplications by the Hessian of the augmented Lagrangian L_t for (44).

Proof: Let t ∈ T be fixed and let us denote by L_t(x, µ, ρ) the augmented Lagrangian for the problem (44), so that for any x ∈ IR^{n_t} and ρ ≥ 0

L_t(x, 0, ρ) = (1/2) x^T (A_t + ρ C_t^T C_t) x − b_t^T x ≥ (a_min/2) ‖x‖² − ‖b_t‖ ‖x‖ ≥ −‖b_t‖²/(2a_min).

Applying the latter inequality to (41) with z_0 = 0 and using the assumption ‖b_t‖ ≥ η_t, we get

(ρ_{t,k}/2) ‖C_t x_t^k‖² ≤ Σ_{i=1}^{∞} (ρ_{t,i}/2) ‖C_t x_t^i‖² ≤ q_t(z_0) − L_t(x_t^0, µ_t^0, ρ_{t,0}) + (1 + s) η_t²/(2a_min) ≤ (2 + s) ‖b_t‖²/(2a_min)

for any k ≥ 0. Thus by (43)

L_t(x_t^{k−1}, µ_t^k, ρ_{t,k}) − L_t(x_t^k, µ_t^k, ρ_{t,k}) ≤ η_t²/(2a_min) + (βρ_{t,k−1}/2) ‖C_t x_t^{k−1}‖² ≤ (3 + s) β ‖b_t‖²/(2a_min)

and, since the minimizer x̂_t^k of L_t(·, µ_t^k, ρ_{t,k}) subject to x ≥ ℓ_t satisfies (35) and is a possible choice for x_t^k, also that

L_t(x_t^{k−1}, µ_t^k, ρ_{t,k}) − L_t(x̂_t^k, µ_t^k, ρ_{t,k}) ≤ (3 + s) β ‖b_t‖²/(2a_min).   (50)
Using Theorem 2, we get that Algorithm 1, used to implement Step 1 of Algorithm 2 starting from x_t^{k,0} = x_t^{k−1}, generates x_t^{k,l} satisfying

‖g^P(x_t^{k,l}, µ_t^k, ρ_{t,k})‖² ≤ a_1 ν^l (L_t(x_t^{k−1}, µ_t^k, ρ_{t,k}) − L_t(x̂_t^k, µ_t^k, ρ_{t,k})) ≤ a_1 (3 + s) (β ‖b_t‖²/(2a_min)) ν^l,

where

a_1 = 36κ/(ν(1 − ν)),  κ = α^{−1} min{1, a_min}^{−1},  ν = 1 − α a_min/(2 + 2Γ̂²),  Γ̂ = max{Γ, Γ^{−1}}.

It simply follows by the inner stopping rule (49) that the number of inner iterations is uniformly bounded by any index l_max which satisfies

a_1 ν^{l_max} (3 + s) β ‖b_t‖²/(2a_min) ≤ ε² ‖b_t‖² min{1, M^{−2}}.

To finish the proof, it is enough to combine this result with Theorem 4. □

6 Numerical experiments

We have implemented Algorithm 2 in MATLAB and solved a class of well conditioned problems of varying dimension. In all our tests, we use the quadratic forms q_t defined by the Hessian matrices A_t and the vectors b_t, where each A_t is the symmetric Toeplitz matrix of order 2t² that is fully determined by the nonzero entries a_{11}^t = 12, a_{12}^t = a_{1t}^t = −1, and the vectors b_t are defined by the entries b_i^t = −1, i = 1, ..., 2t. Using the Gershgorin theorem, it is easy to see that the eigenvalues λ_i of any A_t satisfy 8 ≤ λ_i ≤ 16. The lower bounds ℓ_t are defined by the MATLAB script

    l = [[-0.12 + 0.1*sin(2*pi*linspace(1,t^2,t^2)/t^2)], -inf*ones(1,t^2)]';

In all our experiments, the initial approximation for x in the first run was the zero vector. We solved the problems for t ∈ {10, 50, 250, 750} with η_t = ‖b_t‖, β = 10, µ_t^0 = 0, M = 1, ρ_{t,0} = 20 and varying C_t such that ‖C_t‖ = 1, using the stopping criterion ‖g_t(x, µ, ρ)‖ ≤ 10^{−5} ‖b_t‖ and ‖C_t x‖ ≤ 10^{−5} ‖b_t‖. We have not observed any update of the penalty parameter, in agreement with (40). The auxiliary problems (2) were solved by the MPRGP algorithm of Section 2 with

α = α_{t,k} = 36^{−1} = (ρ_{t,k} + 16)^{−1} ≤ ‖A_t + ρ_{t,k} C_t^T C_t‖^{−1}.

In our first experiments, we assumed that the equality constraints are defined by d_t = 0 and by the matrix C_t with orthogonal rows defined by the MATLAB script

    C(:, n/2-t+1 : n/2+t) = (1/sqrt(2)) * [speye(t,t), -rot90(speye(t,t))];

Thus C_t is formed by t rows of 2t² entries which are zeros except c_{i,t²−t+i} = 1 and c_{i,t²+t−i+1} = −1, i = 1, ..., t (scaled by 1/√2 in the script, so that the rows are orthonormal). The matrix C_t is designed to enforce the kind of symmetry x_{t²−t+i}^t = x_{t²+t−i+1}^t, i = 1, ..., t. The rows of C_t are orthogonal. The results of the computation are in Table 1.

Table 1: Performance of the SMALBE for orthonormal rows

    Equality constraints               10      50      250      750     1150
    Dimension n                       200    5000   125000  1125000  2645000
    Active constraints                 48    1244    31940   289288   680900
    Matrix×vector multiplications      50      71       82       90      100
    Outer iterations                    9       8        7        7        6

We can observe that the number of the outer iterations is not increasing, while the number of the matrix-vector multiplications increases relatively mildly with the dimension of the problem.
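For completeness, the data of the benchmark described above can be assembled in MATLAB as follows. This is only a sketch under our reading of the description: we assume that the two off-diagonal bands of A_t lie at the offsets 1 and t, and we set only the components b_i = −1, i = 1, ..., 2t, as stated above; the variable names are illustrative.

    % Benchmark data for a given t (assumption: the off-diagonal entries -1 of the
    % Toeplitz matrix A_t lie on the bands with offsets 1 and t).
    n = 2*t^2;                                              % dimension of the problem
    e = ones(n,1);
    A = spdiags([-e -e 12*e -e -e], [-t -1 0 1 t], n, n);   % eigenvalues in [8,16] by Gershgorin
    b = zeros(n,1);  b(1:2*t) = -1;                         % entries b_i = -1, i = 1,...,2t
    l = [-0.12 + 0.1*sin(2*pi*linspace(1,t^2,t^2)/t^2), -inf*ones(1,t^2)]';
    C = sparse(t, n);                                       % t orthonormal rows (first experiment)
    C(:, n/2-t+1 : n/2+t) = (1/sqrt(2)) * [speye(t,t), -rot90(speye(t,t))];

With these data, the SMALBE sketch of Section 4 can be called, e.g., as [x, mu] = smalbe(A, b, C, l, norm(b), 10, 1, 20, 1e-5*norm(b), 100), which corresponds to the parameters η_t = ‖b_t‖, β = 10, M = 1 and ρ_{t,0} = 20 used above; the iteration limit 100 is an arbitrary illustrative choice.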
To see the effect of the conditioning of the constraints, we first modified the matrix C_t by adding to each row (except the last one) the following row, and then we normalized the resulting matrix. The results are in Table 2, which also includes the conditioning of the constraints.

Table 2: Effect of conditioning of the equality constraints

    Equality constraints               10      50      250      750
    Dimension                         200    5000   125000  1125000
    Outer iterations                   84       9        7        7
    Matrix×vector multiplications     208      75       82       94
    Conditioning                      9.0    45.3    107.5   1061.4

We can see that the number of both the outer iterations and the matrix-vector multiplications decreases even though the conditioning of the constraints deteriorates. The results are in agreement with the theory.

To see what happens when the constraints are dependent, we first formed an auxiliary matrix Ĉ_t by appending the first 0.1t rows of C_t to C_t, so that Ĉ_t had 1.1t rows, and then we modified the matrix Ĉ_t by summing and normalizing as above. The resulting constraint matrix corresponding to t has the same first t rows as the constraint matrix of the previous experiments and 0.1t additional rows that are linear combinations of the first ones. We can observe that the results in Table 3 are comparable to those in Table 2.

Table 3: Experiments with dependent rows

    Equality constraints               11      55      275      825
    Dimension                         200    5000   125000  1125000
    Outer iterations                  143      24       11       10
    Matrix×vector multiplications     338     107       94      106

A rather surprising feature of Table 2 and Table 3 is the slow convergence for the small problems. The latter can be improved by a more sophisticated choice of the parameters. For example, using ρ = 500 reduces the number of the inner and outer iterations for t = 10 to 119 and 7, respectively. Let us recall that a large ρ need not slow down the convergence of the inner loop [5]. The results indicate that we can observe optimality in practice for well conditioned problems. More numerical experiments, including the solution of a discretized elliptic variational inequality and details of the implementation, may be found in [8, 18].

7 Comments and conclusions

We have proved new results concerning the optimality of our recently proposed algorithm [8] for the solution of convex bound and equality constrained quadratic programming problems. While in [8] we proved the optimality of the outer loop of our algorithm, in this paper we have shown its optimality in terms of matrix-vector multiplications. We have also obtained new results concerning the convergence of the projected gradient of our algorithm for bound constrained problems. Our optimality results do not depend on the conditioning of the matrix which defines the equality constraints and remain valid even for dependent constraints. The results were confirmed numerically on the solution of well conditioned benchmarks with varying dimension and constraints.

The results of the paper are important ingredients in the development of scalable algorithms for variational inequalities based on FETI and explain experimental results observed earlier [16, 18]. The results will also be used to extend our earlier results on the optimality of FETI-DP [20] to the semi-coercive case. We shall describe these applications in more detail elsewhere.

References

[1] Bertsekas, D. P.: Nonlinear Programming. Belmont: Athena Scientific 1999.

[2] Conn, A. R., Gould, N. I. M., Toint, Ph. L.: LANCELOT: a Fortran package for large scale nonlinear optimization. Berlin: Springer Verlag 1992.

[3] Dostál, Z.: Duality based domain decomposition with proportioning for the solution of free boundary problems. J. Comput. and Appl. Mathematics 63, 203-208 (1995).

[4] Dostál, Z.: Box constrained quadratic programming with proportioning and projections. SIAM J. Optimization 7, 871-887 (1997).

[5] Dostál, Z.: On preconditioning and penalized matrices. Num. Lin. Algebra with Applications 6, 109-114 (1999).

[6] Dostál, Z.: A proportioning based algorithm for bound constrained quadratic programming with the rate of convergence. Numer. Algorithms 34, 293-302 (2003).
[7] Dostál, Z.: Semi-monotonic inexact augmented Lagrangians for quadratic programming with equality constraints. Optimiz. Methods & Software, in print.

[8] Dostál, Z.: Inexact semi-monotonic augmented Lagrangians with optimal feasibility convergence for quadratic programming with simple bounds and equality constraints. SIAM J. Numer. Analysis, to appear.

[9] Dostál, Z., Friedlander, A., Santos, S. A.: Solution of coercive and semicoercive contact problems by FETI domain decomposition. Contemporary Mathematics 218, 83-93 (1998).

[10] Dostál, Z., Friedlander, A., Santos, S. A.: Adaptive precision control in quadratic programming with simple bounds and/or equalities. In: High Performance Software for Nonlinear Optimization, eds. R. De Leone et al., Kluwer, Applied Optimization 24, 161-173 (1998).

[11] Dostál, Z., Friedlander, A., Santos, S. A.: Augmented Lagrangians with adaptive precision control for quadratic programming with equality constraints. Comput. Optimiz. and Applications 14, 37-53 (1999).

[12] Dostál, Z., Friedlander, A., Santos, S. A., Alesawi, K.: Augmented Lagrangians with adaptive precision control for quadratic programming with equality constraints: corrigendum and addendum. Comput. Optimiz. and Applications 23, 127-133 (2002).

[13] Dostál, Z., Friedlander, A., Santos, S. A.: Augmented Lagrangians with adaptive precision control for quadratic programming with simple bounds and equality constraints. SIAM J. Optimization 13, 1120-1140 (2003).

[14] Dostál, Z., Gomes, F. A. M., Santos, S. A.: Solution of contact problems by FETI domain decomposition with natural coarse space projection. Comput. Methods in Appl. Mech. and Engineering 190, 1611-1627 (2000).

[15] Gondzio, J., Sarkissian, R.: Parallel interior point solver for structured linear programs. Mathematical Programming 96, 561-584 (2003).

[16] Dostál, Z., Horák, D.: Scalability and FETI based algorithm for large discretized variational inequalities. Math. and Comput. in Simulation 61, 347-357 (2003).

[17] Dostál, Z., Horák, D.: Scalable FETI with optimal dual penalty for a variational inequality. Numerical Linear Algebra and Applications 11, 455-472 (2004).

[18] Dostál, Z., Horák, D.: On scalable algorithms for numerical solution of variational inequalities based on FETI and semi-monotonic augmented Lagrangians. In: Domain Decomposition Methods in Science and Engineering, eds. R. Kornhuber et al., Springer 2004.

[19] Dostál, Z., Horák, D.: Scalable FETI with optimal dual penalty for semicoercive variational inequalities. Contemporary Mathematics 329, 293-302 (2003).

[20] Dostál, Z., Horák, D., Stefanica, D.: A scalable FETI-DP algorithm for coercive variational inequalities. IMACS J. Appl. Numer. Math., in print.

[21] Dostál, Z., Schöberl, J.: Minimizing quadratic functions over non-negative cone with the rate of convergence and finite termination. Comput. Optimiz. and Applications, in print.

[22] Farhat, C., Mandel, J., Roux, F.-X.: Optimal convergence properties of the FETI domain decomposition method. Comput. Methods in Appl. Mech. and Engineering 115, 365-385 (1994).

[23] Friedlander, A., Martínez, J. M., Raydan, M.: A new method for large scale box constrained quadratic minimization problems. Optimiz. Meth. and Software 5, 57-74 (1995).

[24] Friedlander, A., Martínez, J. M., Santos, S. A.: A new trust region algorithm for bound constrained minimization. Appl. Math. & Optimization 30, 235-266 (1994).

[25] Hackbusch, W.: Multigrid Methods and Applications. Berlin: Springer 1985.

[26] Hager, W. W.: Analysis and implementation of a dual algorithm for constrained optimization. JOTA 79, 37-71 (1993).
[27] Nocedal, J., Wright, S. J.: Numerical Optimization. New York: Springer 2000.

[28] Saad, Y.: Iterative Methods for Sparse Linear Systems. Philadelphia: SIAM 2003.

[29] Schöberl, J.: Solving the Signorini problem on the basis of domain decomposition techniques. Computing 60, 323-344 (1998).