TRUST REGION SPECTRAL BUNDLE METHOD FOR
NONCONVEX EIGENVALUE OPTIMIZATION
P. APKARIAN∗, D. NOLL†, AND O. PROT†
Abstract. We present a non-smooth optimization technique for non-convex maximum eigenvalue functions and for non-smooth functions which are infinite maxima of eigenvalue functions. We
prove global convergence of our method in the sense that for an arbitrary starting point, every accumulation point of the sequence of iterates is critical. The method is tested on several problems in
feedback control synthesis.
Key words. Eigenvalue optimization, trust region method, proximity control, spectral bundle,
feedback control synthesis, H∞ -norm.
1. Introduction. Eigenvalue optimization has a wide spectrum of applications
in physics and engineering or statistics and finance. This includes composite materials
[13], quantum computational chemistry [57], optimal system design [44, 54], shape
optimization [15], pole placement in linear system theory, robotics [41], relaxations of
combinatorial optimization problems [24, 36], experimental design [52, 55], and much
else. Many of these problems are non-convex, but even in the realm of convexity
eigenvalue optimization has a prominent place. Semidefinite programming (SDP) is
an important class of convex programs, which may be solved by way of eigenvalue
optimization [45].
The idea to solve large semidefinite programs via eigenvalue optimization can be
traced back to [14, 23, 38]. It results from the insight that interior-point methods
are not the appropriate choice when problems get sizeable. Due to its importance in
practice, eigenvalue optimization has been intensively studied since the 1980s. Early
contributions are Wolfe [56], Cullum et al. [14], Polak and Wardi [51], or Fletcher [17].
Starting in the late 1980s, Overton has contributed a series of papers [46, 47, 48], where
in particular Newton type methods are discussed. Oustry [45] presents a synthesis of
first and second order methods suited for convex maximum eigenvalue functions.
Here our interest is in non-convex eigenvalue programs, which arise frequently in
automatic control applications and especially in controller synthesis. In particular,
solving bilinear matrix inequalities (BMIs) is a prominent application, which may be
addressed via non-convex eigenvalue optimization. In [42, 43] we have shown how
to adapt the approach of [38, 45] to handle non-convex situations. Applications and
extensions of these ideas are presented in [4, 1, 53, 8, 5].
The goal of the present paper is twofold. In a first part we investigate how to
expand on the idea of Helmberg and Rendl’s spectral bundle method [23] in order to
deal with non-convex eigenvalue programs. Non-convexity requires a new approximation technique, complementing the convex mechanism used in [23]. We achieve our
goal by a trust region type approach or, what is equivalent, by a dual approach using
proximity control. This method has antecedents in classical bundling, like Lemaréchal
[33, 34, 35, 37] or Kiwiel [30, 31, 29]. Extensions of the convex case to include bounds
constraints are given in [21].
In a second part we extend our method to address more general classes of functions which are infinite suprema of maximum eigenvalue functions. This includes
∗ CERT-ONERA, Control System Department, 2, avenue Edouard Bélin, 31055 Toulouse, France
† Université Paul Sabatier, Institut de Mathématiques, 118, route de Narbonne, 31062 Toulouse, France
optimization of the H∞ -norm, an important application in feedback control synthesis. Optimization of the H∞ norm has been pioneered by E. Polak and co-workers.
See for instance [39, 40, 49] and the references given there. Our own approach to
optimizing the H∞ norm is developed in [4, 1, 5].
The structure of the paper is as follows. After some preparations in Sections 2 - 5,
the algorithm is presented in Section 6. Convergence analysis follows in Sections 7 and
8. The semi-infinite case which includes in particular optimization of the H∞ -norm
is presented in Section 9. While the main objective of this work is the convergence
analysis of our method, we have added several numerical tests for eigenvalue programs
in Section 10 to validate the algorithm. Numerical tests for the H∞ -norm and for
related problems are presented in [6].
2. Elements from non-smooth analysis. Recall that the maximum eigenvalue function λ1 : Sm → R is convex but generally non-smooth and defined on the
space Sm of m × m symmetric or Hermitian matrices. We consider composite functions of the form f = λ1 ◦ F , where F : Rn → Sm is a class C 2 operator. Notice
that f is non-smooth due to non-smoothness of λ1 , and nonconvex unless F is an
affine operator. The case where F is affine and therefore f convex has been studied
by many authors [14, 23, 38]. Here our interest is focused on handling nonconvex f .
Notice that as a composite function, f has a favourable structure. In particular,
the Clarke sub-differential [12] is given by the chain rule
∂f(x) = F′(x)⋆ ∂λ1(F(x)),

where ∂λ1(X) is the usual sub-differential of convex analysis at X ∈ Sm, and where F′(x) is the derivative of F and F′(x)⋆ its adjoint, mapping Sm back into Rn. Recall that λ1 is itself highly structured, as it is the support function of the convex compact set
G = {G ∈ Sm : G ⪰ 0, Tr(G) = 1}.
That means λ1 (X) = max{G • X : G ∈ G}, where X • Y = Tr(XY ) denotes the
standard scalar product on Sm . Therefore (cf. [26, Example XI.1.2.5])
∂λ1 (X) = {G ∈ G : G • X = λ1 (X)}.
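When the maximum eigenvalue of X is simple, ∂λ1(X) reduces to the single rank-one matrix ee⊤ built from a normalized top eigenvector e. The following sketch checks the support-function identity G • X = λ1(X) numerically on a 2 × 2 example; the helper eig2 and the test matrix are our own, not from the paper:

```python
import math

def eig2(a, b, c):
    """Maximum eigenvalue and eigenvector of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    lam1 = mean + rad
    if b == 0.0:
        e = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        # (lam1 - c, b) solves the eigenvector equation for lam1
        n = math.hypot(lam1 - c, b)
        e = ((lam1 - c) / n, b / n)
    return lam1, e

# X = [[2, 1], [1, 0]], maximum eigenvalue 1 + sqrt(2)
lam1, e = eig2(2.0, 1.0, 0.0)
G = [[e[0] * e[0], e[0] * e[1]], [e[1] * e[0], e[1] * e[1]]]  # G = e e^T
trace_G = G[0][0] + G[1][1]                                   # must be 1, so G in G
# Frobenius product G • X for symmetric 2x2 data
inner = G[0][0] * 2.0 + 2.0 * G[0][1] * 1.0 + G[1][1] * 0.0
print(lam1, trace_G, inner)   # inner coincides with lam1
```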
3. Predicted and actual decrease. Consider the minimization of f = λ1 ◦ F
over Rn and suppose x ∈ Rn is the current iterate. If a step y = x + td away from
the current position x is taken, where ‖d‖ = 1 and t > 0, then
αx (d, t) = f (x) − f (x + td)
is called the actual decrease of f at x with regard to that step. Naturally, αx ≤ 0
corresponds to those cases where no decrease at all is achieved. In order to predict
the decrease by a step from x to y we use a convex model of f around x, namely

(1)   φ(y; x) = λ1(F(x) + F′(x)(y − x)),

where y ↦ F(x) + F′(x)(y − x) is the first-order affine approximation of F(y) in a neighbourhood of x. Clearly φ(x; x) = f(x). Notice that f = φ in those cases where
F itself is affine. (See e.g. [14, 45, 23], where the convex case has been thoroughly
discussed). We call
πx (d, t) = f (x) − φ(x + td; x)
the predicted decrease of f at x if the step from x to y = x + td is taken. Taylor’s
theorem obviously suggests that πx (d, t) ≈ αx (d, t) when t is sufficiently small. In
order to relate πx to αx , we use the following
Proposition 1. Let xk and tk > 0 be bounded sequences, ‖dk‖ = 1. Then there exists a constant L > 0 such that for every k,

(2)   πxk(dk, tk) = αxk(dk, tk) + ℓk tk²

for certain ℓk satisfying |ℓk| ≤ L.
Proof. Notice that for any given matrices A, E ∈ Sm, the estimate

λm(E) ≤ λ1(A + E) − λ1(A) ≤ λ1(E)

is satisfied. This is also known as Weyl's theorem. Let B ⊂ Rn be a bounded set. Expanding F at xk ∈ B gives F(x) = F(xk) + F′(xk)(x − xk) + R(x; xk) with ‖R(x; xk)‖ ≤ L‖x − xk‖² for some constant L > 0 and all x ∈ B. Using Xk = F(xk), Dk = F′(xk)dk, we have f(xk + tk dk) = λ1(Xk + tk Dk + R(xk + tk dk; xk)). We now apply Weyl's theorem with A = Xk + tk Dk, E = R(xk + tk dk; xk), which gives

|αxk(dk, tk) − πxk(dk, tk)| = |f(xk) − f(xk + tk dk) − (f(xk) − λ1(Xk + tk Dk))|
   = |λ1(Xk + tk Dk) − f(xk + tk dk)|
   = max {|λ1(R(xk + tk dk; xk))|, |λm(R(xk + tk dk; xk))|}
   ≤ ‖R(xk + tk dk; xk)‖ ≤ L tk²

uniformly over xk + tk dk ∈ B. That proves the claim.
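The two-sided Weyl estimate invoked at the start of the proof can be verified numerically; here is a small sketch for 2 × 2 symmetric matrices (the helper and the data A, E are our own, not from the paper):

```python
import math

def eigs2(a, b, c):
    """Both eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    return mean + rad, mean - rad

# A matrix A and a perturbation E, stored as (a, b, c) for [[a, b], [b, c]]
A = (1.0, 2.0, -1.0)
E = (0.3, -0.1, 0.2)
ApE = tuple(x + y for x, y in zip(A, E))

l1A, _ = eigs2(*A)
l1AE, _ = eigs2(*ApE)
l1E, l2E = eigs2(*E)

diff = l1AE - l1A
# Weyl: lambda_m(E) <= lambda_1(A + E) - lambda_1(A) <= lambda_1(E)
print(l2E <= diff <= l1E)
```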
4. Local models. Along with (1) we consider a second local model of f in a neighbourhood of the current iterate x, which we update recursively. Notice first that (1) is convex, because the operator y ↦ F(x) + F′(x)(y − x) is affine. Recall that

φ(y; x) = max {G • [F(x) + F′(x)(y − x)] : G ∈ G},

where

G = {G ∈ Sm : Tr(G) = 1, G ⪰ 0}.
This leads us to the second type of approximation, where G is replaced by a smaller
and easier to compute subset Gk ⊂ G. We will generate a sequence Gk ⊂ G of such
approximations and let
(3)   φk(y; x) = max {G • [F(x) + F′(x)(y − x)] : G ∈ Gk}.
Clearly φk ≤ φ. We will use the φk (·; x) to approximate f in a neighbourhood of x.
We will generate descent steps for the φk (·; x), which will ultimately lead to descent
in f at x, as the agreement between f and φk is improved at each step. The minimal
requirement for Gk is the following obvious
Lemma 2. Suppose Gk contains a sub-gradient of the form G = ee⊤ ∈ G, where e is a normalized eigenvector associated with λ1(F(x)). Then φk(x; x) = f(x).
5. Trust region versus proximity control. Let φk be the current approximation (3) of f, and let tk > 0 be some trust region radius. We consider the trust region optimization program

(4)   minimize φk(y; x), y ∈ Rn
      subject to ‖y − x‖ ≤ tk.
Let y^{k+1} be a minimizer of (4), which is a convex program. We claim that y^{k+1} satisfies the Karush-Kuhn-Tucker conditions. Indeed, without constraint qualification we can at least assert that the Fritz John necessary optimality conditions are satisfied [12], i.e., there exist σ ≥ 0, τ ≥ 0, not both zero, such that

0 ∈ σ ∂φk(y^{k+1}; x) + τ(y^{k+1} − x),   ‖y^{k+1} − x‖ ≤ tk,   τ (tk − ‖y^{k+1} − x‖) = 0.

Now suppose σ = 0; then τ > 0, which implies ‖y^{k+1} − x‖ = tk. That means y^{k+1} − x ≠ 0, and then also τ(y^{k+1} − x) ≠ 0, contradicting the first condition. In other words, we must have σ > 0. Dividing by σ, respectively assuming σ = 1, now gives the Karush-Kuhn-Tucker conditions with multiplier τk ≥ 0. Notice that τk = 0 is possible, in which case 0 ∈ ∂φk(y^{k+1}; x), meaning that y^{k+1} is a minimum of φk(·; x) alone.
Let us next consider the unconstrained penalty program

(5)   minimize φk(y; x) + (τk/2) ‖y − x‖²,   y ∈ Rn,

where τk ≥ 0 is the penalty parameter. The term (τk/2) ‖y − x‖² is sometimes introduced
as a proximity control. In fact, (5) is just an indirect way to control the radius tk
of the trust region program (4). Namely, minima of (4) and minima of (5) are in
one-to-one correspondence in the following sense: If y^{k+1} is a minimum of (5) for fixed τk > 0, then y^{k+1} also solves (4) with tk := ‖y^{k+1} − x‖ and associated multiplier τk > 0. Conversely, if y^{k+1} solves (4) and if the associated multiplier τk is strictly positive, then y^{k+1} solves (5) with that penalty parameter τk > 0. The case τk = 0 in (5) obviously corresponds to those cases in (4) where the trust region constraint is inactive.
Notice that solving (5) gives the corresponding tk = ‖y^{k+1} − x‖ and a solution
where the trust region constraint is active. Conversely, solving (4) gives a solution of
the corresponding (5), but only if a method is used where the multiplier is computed
and where its value turns out > 0. Solutions of (4) where the trust region constraint
is inactive cannot be recovered from solutions of (5).
With the models φ and φk we introduce two levels of approximation of f , so it
is not surprising that two mechanisms to adjust the degree of exactness are applied.
Firstly, if necessary we adjust tk at each step. This is done indirectly via the management of τk. Secondly, we update Gk into Gk+1 after each trial y^{k+1}. We use the standard terminology in non-smooth optimization. If y^{k+1} is not used as the next iterate, we call this a null step. If y^{k+1} is accepted and becomes the next iterate x+, we speak of a serious step.
In order to test the quality of the trial steps y^{k+1}, we use the quotient

(6)   ρk = (f(x) − f(y^{k+1})) / (f(x) − φk(y^{k+1}; x)).

We fix constants 0 < γ < 1/2 < Γ < 1. We say that f and φk(·; x) are in good agreement when ρk > Γ, and we say that the agreement is bad if ρk < γ. The bad case includes in particular situations where ρk ≤ 0. Since always f(x) − φk(y^{k+1}; x) > 0, unless 0 ∈ ∂f(x), we deduce that ρk ≤ 0 corresponds to cases where f(x) − f(y^{k+1}) ≤ 0, that is, where the proposed step y^{k+1} is not even a descent step for f. In our algorithm we use the following rule: y^{k+1} is accepted as soon as ρk ≥ γ, i.e., as soon as the step is not bad. The question is then what we shall do when agreement is bad, i.e., when
ρk < γ. Here we compute a second test parameter

ρ̃k := (f(x) − φ(y^{k+1}; x)) / (f(x) − φk(y^{k+1}; x)),

and we introduce a second control parameter γ̃, where γ < γ̃ < 1/2. We then have two possibilities. If ρk < γ and also ρ̃k < γ̃, then we decide that φ is close enough to f, but φk is too far away from φ. In this case we do not change τk, but improve the approximation Gk+1 so that φk+1 gets closer to φ. On the other hand, if ρk < γ, but ρ̃k ≥ γ̃, then we consider φ and f as not in good agreement. This can only be remedied by decreasing the trust region radius tk, or what comes down to the same, increasing τk. While doing this, we still update Gk into a better Gk+1, i.e., we still let φk approach φ, so this process is always applied.
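The management of τk and Gk just described can be condensed into a small decision routine; this is a hypothetical sketch (the constants and names are ours; the full algorithm is stated in Section 6):

```python
def inner_loop_decision(rho, rho_tilde, tau,
                        gamma=0.1, gamma_tilde=0.4, big_gamma=0.7):
    """Decide what to do with a trial point y^{k+1}, following the rho / rho-tilde
    tests. Returns (action, new_tau): action is 'serious' (accept the trial point)
    or 'null' (reject it and continue the inner loop)."""
    assert 0 < gamma < gamma_tilde < 0.5 < big_gamma < 1
    if rho >= gamma:                      # acceptance test: step is not bad
        # very good agreement (rho > Gamma): relax the proximity control
        new_tau = tau / 2 if rho > big_gamma else tau
        return "serious", new_tau
    if rho_tilde >= gamma_tilde:          # phi itself is too far from f:
        return "null", 2 * tau            # shrink trust region (double tau)
    return "null", tau                    # phi close to f, phi_k too coarse:
                                          # keep tau, only improve G_k

print(inner_loop_decision(0.8, 0.9, 1.0))
print(inner_loop_decision(0.05, 0.9, 1.0))
print(inner_loop_decision(0.05, 0.1, 1.0))
```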
6. Aggregate sub-gradients. As our convergence analysis will show, the approximations Gk ⊂ G only need to fulfill minimal requirements to guarantee convergence. We have identified the following three conditions:
(G1) ee⊤ ∈ Gk for some normalized eigenvector e associated with λ1(F(x)).
(G2) ek+1 ek+1⊤ ∈ Gk+1 for some normalized eigenvector ek+1 associated with the maximum eigenvalue λ1(F(x) + F′(x)(y^{k+1} − x)).
(G3) G∗ ∈ Gk+1 for some of the G∗ ∈ Gk where the maximum φk(y^{k+1}; x) is attained.
Below we will discuss practical choices of the sets Gk, combining ideas from [23], [45], [22], and [42].
Let Gk consist of sets of the form

(7)   αGk + Qk Yk Qk⊤,

where Yk ∈ Srk with Yk ⪰ 0, α + Tr(Yk) = 1, 0 ≤ α ≤ 1, where Qk is an m × rk matrix whose rk ≥ 1 columns form an orthonormal basis of an invariant subspace of F(x) + F′(x)(y^k − x), and where Gk ∈ G is the aggregate sub-gradient. Assume that at least one normalized eigenvector associated with the maximum eigenvalue λ1(F(x) + F′(x)(y^k − x)) is in the span of Qk. The idea is to build the new set Gk+1 along the same lines, using an updating strategy, which we now explain.
Let y^{k+1} be the solution of the new program (5), obtained with the help of Gk. The necessary optimality condition gives 0 ∈ ∂φk(y^{k+1}; x) + τk(y^{k+1} − x). Due to the structure of φk, this means there exists G∗ ∈ Gk such that

0 = F′(x)⋆ G∗ + τk(y^{k+1} − x),   G∗ = α∗ Gk + Qk Yk∗ Qk⊤,

where 0 ≤ α∗ ≤ 1, Yk∗ ⪰ 0 and α∗ + Tr(Yk∗) = 1. Now a simple method is to let

(8)   Gk+1 = α∗ Gk + Qk Yk∗ Qk⊤,
the new aggregate sub-gradient. Helmberg and Rendl [23] use a refinement of (8),
which is suited for large problem size: Let Y∗ = P D P⊤ be a spectral decomposition of the rk × rk matrix Y∗. Decompose P = [P1 P2] with corresponding spectra D1 and D2, so that P1 contains as columns the eigenvectors associated with the large eigenvalues of Y∗, and P2 the remaining columns. Now put

(9)   Gk+1 = (α∗ Gk + Qk P2 D2 P2⊤ Qk⊤) / (α∗ + Tr(D2)),

the new aggregate sub-gradient, which is an element of G. In this way only the minor part of Y∗ is kept in the aggregate sub-gradient. The dominant part of Y∗ is retained in the next eigenbasis, by letting Qk P1 be part of Qk+1. In particular,
by this procedure, at least one normalized eigenvector of the maximum eigenvalue
of F(x) + F′(x)(y^{k+1} − x) will be included in Qk+1. In order to guarantee axiom (G1), we also keep at least one normalized eigenvector associated with the maximum eigenvalue λ1(F(x)) in Qk+1, respectively in the linear hull generated by its columns. Altogether, Gk+1 consists of all αGk+1 + Qk+1 Yk+1 Qk+1⊤, where 0 ≤ α ≤ 1, Yk+1 ⪰ 0 and α + Tr(Yk+1) = 1. For this construction we have
Lemma 3. The sets Gk so defined satisfy the rules (G1)–(G3). In particular, φk(x; x) = f(x) and φk+1(y^{k+1}; x) = φ(y^{k+1}; x) for every k.
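A quick sanity check of rule (8): any matrix α∗Gk + Qk Y∗ Qk⊤ with 0 ≤ α∗ ≤ 1, Y∗ ⪰ 0 and α∗ + Tr(Y∗) = 1 again has unit trace and is positive semidefinite, hence lies in G. A toy numerical check with m = 2 and rk = 1 (the data below are our own, not from the paper):

```python
# Toy check of the aggregation rule (8) in dimension m = 2, r_k = 1.
alpha = 0.3
Gk = [[0.5, 0.0], [0.0, 0.5]]     # previous aggregate sub-gradient (trace 1, PSD)
q = [3.0 / 5.0, 4.0 / 5.0]        # single orthonormal column of Q_k
Y = 1.0 - alpha                   # 1x1 matrix Y* chosen so alpha + Tr(Y*) = 1

# G_{k+1} = alpha * G_k + Q_k Y* Q_k^T
Gnext = [[alpha * Gk[i][j] + Y * q[i] * q[j] for j in range(2)] for i in range(2)]

trace = Gnext[0][0] + Gnext[1][1]
det = Gnext[0][0] * Gnext[1][1] - Gnext[0][1] * Gnext[1][0]
# trace 1 and PSD (nonnegative diagonal and determinant) => G_{k+1} in G
print(trace, Gnext[0][0] >= 0, det >= 0)
```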
Remark. Conditions (G1 ) − (G3 ) leave a lot of freedom for the choice of the
bases Qk . In [22] Helmberg and Oustry investigate the convex case and discuss ways
to combine their two approaches [23] and [45] into a unified method. An alternative approach is that of Polak and Wardi [51]. For non-convex eigenvalue functions we have
proposed an extension of Oustry’s approach in [42, 43]. The conclusions obtained
there remain valid in the present context.
Spectral bundle algorithm for min_{x∈Rn} f(x)
Parameters: 0 < γ < γ̃ < 1/2 < Γ < 1.
1. Outer loop. If 0 ∈ ∂f(x) stop. Otherwise goto inner loop.
2. Inner loop. Initialize G1 = ∂λ1(F(x)) and the aggregate sub-gradient G1 = (1/m) Im, put counter k = 1, and choose τ1 > 0. If an old value for τ from the previous sweep is memorized, use it to initialize τ1.
3. Tangent program. At counter k with given τk > 0 and Gk solve

   min_{y∈Rn} φk(y; x) + (τk/2) ‖y − x‖².

   Solution is y^{k+1}. Find G∗ ∈ Gk where the maximum in (3) at y^{k+1} is attained. Write G∗ = α∗ Gk + Qk Yk∗ Qk⊤ according to (7).
4. Acceptance test. Check whether

   ρk = (f(x) − f(y^{k+1})) / (f(x) − φk(y^{k+1}; x)) ≥ γ.

   If this is the case put x+ = y^{k+1}. If ρk > Γ, then memorize τ+ = τk/2 in outer loop. If only γ ≤ ρk ≤ Γ, then memorize τ+ = τk. Then go back to step 1 to commence a new sweep of the outer loop. If ρk < γ then continue inner loop with step 5.
5. Agreement test. Compute

   ρ̃k = (f(x) − φ(y^{k+1}; x)) / (f(x) − φk(y^{k+1}; x))

   and put τk+1 = τk if ρk < γ and ρ̃k < γ̃, and τk+1 = 2τk if ρk < γ and ρ̃k ≥ γ̃.
6. Aggregate sub-gradient. Compute the new set Gk+1 according to (7). The new aggregate sub-gradient is Gk+1 = G∗.
7. Inner loop. Increase counter k → k + 1 and go back to step 3.
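To make the control flow of the algorithm concrete, here is a drastically simplified sketch for n = 1 and m = 2: the tangent program is solved by grid search, and Gk is replaced by the finite collection of all rank-one sub-gradients seen so far, which satisfies (G1)-(G3) but ignores the aggregation (7). The test problem F and all names are our own, not from the paper:

```python
import math

def eig2(a, b, c):
    """Max eigenvalue and eigenvector of the symmetric matrix [[a, b], [b, c]]."""
    mean, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    lam = mean + rad
    if b == 0.0:
        e = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        n = math.hypot(lam - c, b)
        e = ((lam - c) / n, b / n)
    return lam, e

# Toy nonconvex eigenvalue program: F(x) = [[x^2 - 1, x], [x, -x]] as (a, b, c)
def F(x):  return (x * x - 1.0, x, -x)
def dF(x): return (2.0 * x, 1.0, -1.0)        # F'(x) d for d = 1
def f(x):  return eig2(*F(x))[0]

def dot(G, M):  # Frobenius inner product on S2, matrices stored as (a, b, c)
    return G[0] * M[0] + 2.0 * G[1] * M[1] + G[2] * M[2]

def solve(x, gamma=0.1, gamma_t=0.4, Gamma=0.7, tau=1.0, sweeps=15):
    grid = [i / 50.0 for i in range(-200, 201)]       # y in [-4, 4]
    for _ in range(sweeps):                           # outer loop
        Fx, dFx = F(x), dF(x)
        fx, e = eig2(*Fx)
        Gset = [(e[0] * e[0], e[0] * e[1], e[1] * e[1])]   # axiom (G1)
        for _ in range(20):                           # inner loop
            def phik(y):                              # model (3) over finite Gset
                return max(dot(G, Fx) + (y - x) * dot(G, dFx) for G in Gset)
            y = min(grid, key=lambda z: phik(z) + tau / 2 * (z - x) ** 2)
            denom = fx - phik(y)
            if denom <= 1e-10:
                return x                              # (near-)critical: stop
            A = tuple(Fx[i] + (y - x) * dFx[i] for i in range(3))
            phi_y, ey = eig2(*A)                      # phi(y; x) and eigenvector
            rho = (fx - f(y)) / denom                 # acceptance test (6)
            if rho >= gamma:                          # serious step
                tau = tau / 2 if rho > Gamma else tau
                x = y
                break
            if (fx - phi_y) / denom >= gamma_t:       # agreement test
                tau *= 2                              # tighten trust region
            Gset.append((ey[0] * ey[0], ey[0] * ey[1], ey[1] * ey[1]))  # (G2), (G3)
        else:
            return x
    return x

x_star = solve(2.0)
print(x_star, f(x_star))
```

On this toy problem the loop performs serious steps from x = 2 down toward the non-smooth minimizer; it is meant to illustrate the control flow, not to replace the aggregation machinery of this section.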
7. Convergence analysis of inner loop. We have to show that the inner loop is finite, that is, finds a trial point y^{k+1} accepted in step 4 after a finite number k of steps. We prove this by showing that if the inner loop turns forever, that is, ρk < γ for all k, then 0 ∈ ∂f(x). There are two sub-cases to be discussed, depending on the decision in step 5. These will be addressed in Lemmas 4 and 5.
Our first concern is the situation where ρk < γ, but ρ̃k ≥ γ̃. Here the model suggests accepting the trial step y^{k+1}, but reality says the opposite. This situation cannot occur infinitely often.
Lemma 4. Suppose the algorithm generates an infinite sequence of trial steps y^{k+1} such that always ρk < γ. Then ρ̃k < γ̃ for some k0 and all k ≥ k0.
Proof. 1) Assume on the contrary that ρk < γ for all k, but at the same time ρ̃k ≥ γ̃ for infinitely many k ∈ N. Then according to the update rule in step 5 of the algorithm, the sequence τk tends to +∞. As a consequence of the necessary optimality condition 0 ∈ ∂φk(y^{k+1}; x) + τk(y^{k+1} − x), and due to uniform boundedness of the sub-differential ∂φk, we deduce that y^{k+1} → x. Using τk(x − y^{k+1}) ∈ ∂φk(y^{k+1}; x) calls for the sub-gradient inequality, which gives

τk (x − y^{k+1})⊤ (x − y^{k+1}) ≤ φk(x; x) − φk(y^{k+1}; x) = f(x) − φk(y^{k+1}; x),

the latter by Lemma 3, respectively by axiom (G1). In other words,

(10)   τk ‖x − y^{k+1}‖² / (f(x) − φk(y^{k+1}; x)) ≤ 1.

2) Putting dk = (y^{k+1} − x)/‖y^{k+1} − x‖ and tk = ‖y^{k+1} − x‖, we expand the test parameters as follows
ρ̃k = ρk + (f(y^{k+1}) − φ(y^{k+1}; x)) / (f(x) − φk(y^{k+1}; x))
    = ρk + (πx(dk, tk) − αx(dk, tk)) / (f(x) − φk(y^{k+1}; x))
    = ρk + ℓk ‖x − y^{k+1}‖² / (f(x) − φk(y^{k+1}; x))
    ≤ ρk + (L/τk) · τk ‖x − y^{k+1}‖² / (f(x) − φk(y^{k+1}; x))
    ≤ ρk + L/τk   (using (10)).

Since τk → ∞, we deduce lim sup_{k→∞} ρ̃k ≤ lim sup_{k→∞} ρk ≤ γ, contradicting ρ̃k ≥ γ̃ > γ for infinitely many k. That proves τk → ∞ is impossible, and therefore ρ̃k ≥ γ̃ for infinitely many k is also impossible.
Remark. Notice that the proof of Lemma 4 uses axiom (G1) only. Axioms (G2) and (G3) will be needed in the next Lemma.
As a consequence of Lemma 4 we see that the algorithm, when faced with the bad case ρk < γ, will continue to increase τk, until eventually ρ̃k < γ̃, too. From some index k0 onwards, we will then be in the first case in step 5 of the algorithm, where the parameter τk is frozen, i.e., τk =: τ for k ≥ k0. We then have no easy argument to deduce y^{k+1} → x. Here, indeed, we will have to exploit properties (G2) and (G3) of the update rule Gk → Gk+1. We follow the line of [23, Lemma 5.2].
Lemma 5. Suppose ρk < γ for all k while ρ̃k < γ̃ for some k0 and all k ≥ k0. Then 0 ∈ ∂f(x).
Proof. 1) As we mentioned already, ρk < γ and ρ̃k < γ̃ for all k ≥ k0 implies that in step 5 of the algorithm, τk =: τ is frozen for k ≥ k0. A priori we therefore do not know whether y^{k+1} → x, as we did in the proof of Lemma 4. This complicates the following analysis.
Let us introduce the function

ψk(y; x) = φk(y; x) + (τ/2) ‖y − x‖²;

then by its definition y^{k+1} is the global minimum of ψk(·; x) for k ≥ k0. Let G∗ ∈ Gk be the sub-gradient where the supremum φk(y^{k+1}; x) is attained and which is retained in Gk+1 in accordance with rule (G3) and also with step 3 of the algorithm. That means

(11)   φk(y^{k+1}; x) = G∗ • [F(x) + F′(x)(y^{k+1} − x)],

and also

(12)   ψk(y^{k+1}; x) = G∗ • [F(x) + F′(x)(y^{k+1} − x)] + (τ/2) ‖y^{k+1} − x‖².

2) We introduce the function

ψk∗(y; x) = G∗ • [F(x) + F′(x)(y − x)] + (τ/2) ‖y − x‖².
Then ψk∗(y^{k+1}; x) = ψk(y^{k+1}; x) and

(13)   ψk∗(y; x) ≤ ψk+1(y; x)

for k ≥ k0, because G∗ ∈ Gk+1. We claim that

(14)   ψk∗(y; x) = ψk∗(y^{k+1}; x) + (τ/2) ‖y − y^{k+1}‖².

The easiest way to see this is to observe that ψk∗ is quadratic and expand it, using ∇ψk∗(y) = F′(x)⋆ G∗ + τ(y − x) and ∇²ψk∗(y) = τ I. Then clearly

ψk∗(y; x) = ψk∗(y^{k+1}; x) + ∇ψk∗(y^{k+1})⊤ (y − y^{k+1}) + (τ/2) (y − y^{k+1})⊤ (y − y^{k+1}).

Formula (14) will therefore be established as soon as we show that the first order term in this expansion vanishes. But this term is

∇ψk∗(y^{k+1})⊤ (y − y^{k+1})
   = (F′(x)⋆ G∗)⊤ (y − y^{k+1}) + τ (y^{k+1} − x)⊤ (y − y^{k+1})
   = G∗ • F′(x)(y − y^{k+1}) + τ (y^{k+1} − x)⊤ (y − y^{k+1})   (definition of adjoint)
   = τ (x − y^{k+1})⊤ (y − y^{k+1}) + τ (y^{k+1} − x)⊤ (y − y^{k+1})   (optimality of y^{k+1})
   = 0.

That proves formula (14).
3) From the previous point 2) we have

(15)   ψk(y^{k+1}; x) ≤ ψk∗(y^{k+1}; x) + (τ/2) ‖y^{k+2} − y^{k+1}‖²   (using ψk∗(y^{k+1}; x) = ψk(y^{k+1}; x))
              = ψk∗(y^{k+2}; x)   (using (14))
              ≤ ψk+1(y^{k+2}; x)   (using (13))
              ≤ ψk+1(x; x)   (y^{k+2} is minimizer of ψk+1)
              = φk+1(x; x) ≤ φ(x; x).
We deduce that the sequence ψk(y^{k+1}; x) is monotonically increasing and bounded above by φ(x; x). It therefore converges to some ψ∗ ≤ φ(x; x).
Going back to (15) with this information, we see that the term (τ/2) ‖y^{k+2} − y^{k+1}‖² is now squeezed in between two convergent terms with the same limit ψ∗, and must therefore tend to zero. Consequently, ‖y^{k+1} − x‖² − ‖y^{k+2} − x‖² also tends to 0, because the sequence y^k is bounded. Recalling φk(y; x) = ψk(y; x) − (τ/2) ‖y − x‖², we deduce, using both convergence results, that

(16)   φk+1(y^{k+2}; x) − φk(y^{k+1}; x) = ψk+1(y^{k+2}; x) − ψk(y^{k+1}; x) − (τ/2) ‖y^{k+2} − x‖² + (τ/2) ‖y^{k+1} − x‖² → 0.
4) Let ek+1 be the normalized eigenvector of F(x) + F′(x)(y^{k+1} − x) associated with λ1, which we pick in step 3 of the algorithm and according to rule (G2). Then gk = F′(x)⋆ ek+1 ek+1⊤ is a sub-gradient of φk+1(·; x) at y^{k+1}. Hence by the sub-gradient inequality

φk+1(y^{k+1}; x) + gk⊤ (y − y^{k+1}) ≤ φk+1(y; x).

Since φk+1(y^{k+1}; x) = φ(y^{k+1}; x) by Lemma 3, respectively rule (G2), we have the estimate

(17)   φ(y^{k+1}; x) + gk⊤ (y − y^{k+1}) ≤ φk+1(y; x).

Now observe that

0 ≤ φ(y^{k+1}; x) − φk(y^{k+1}; x)
  = φ(y^{k+1}; x) + gk⊤ (y^{k+2} − y^{k+1}) − φk(y^{k+1}; x) − gk⊤ (y^{k+2} − y^{k+1})
  ≤ φk+1(y^{k+2}; x) − φk(y^{k+1}; x) + ‖gk‖ ‖y^{k+2} − y^{k+1}‖   (using (17)),

and this term tends to 0 due to (16), because y^{k+2} − y^{k+1} → 0, and because the sequence gk is bounded. We deduce that φ(y^{k+1}; x) − φk(y^{k+1}; x) → 0.
5) We now show that φk(y^{k+1}; x) → f(x), and then of course also φ(y^{k+1}; x) → f(x). Let δ > 0; then it follows from 4) that there exists k1 such that

(18)   φ(y^{k+1}; x) − δ ≤ φk(y^{k+1}; x)

for all k ≥ k1. Using ρ̃k ≤ γ̃ for k ≥ k1 then gives

(19)   γ̃ (φk(y^{k+1}; x) − f(x)) ≤ φ(y^{k+1}; x) − f(x) ≤ φk(y^{k+1}; x) + δ − f(x).

Assume contrary to what is claimed that lim sup_{k→∞} f(x) − φk(y^{k+1}; x) = η > 0. Then (19) gives γ̃η ≥ η − δ. Since γ̃ < 1, choosing δ small enough leads to a contradiction. For instance, δ < (1 − γ̃)η will do. This shows η = 0.
6) Having shown φ(y^{k+1}; x) → f(x) and φk(y^{k+1}; x) → f(x), we argue that y^{k+1} → x. This follows from the definition of y^{k+1}, because

ψk(y^{k+1}; x) = φk(y^{k+1}; x) + (τ/2) ‖y^{k+1} − x‖² ≤ ψk(x; x) = φk(x; x) = f(x).

Since φk(y^{k+1}; x) → f(x), we have indeed ‖y^{k+1} − x‖ → 0 by a sandwich argument.
To finish the proof, let us show that 0 ∈ ∂f(x). Notice first that the necessary optimality condition gives 0 ∈ ∂ψk(y^{k+1}; x) = ∂φk(y^{k+1}; x) + τ(y^{k+1} − x), which implies τ(x − y^{k+1}) ∈ ∂φk(y^{k+1}; x). The sub-gradient inequality gives

τ (x − y^{k+1})⊤ (y − y^{k+1}) ≤ φk(y; x) − φk(y^{k+1}; x) ≤ φ(y; x) − φk(y^{k+1}; x)

for every y. Passing to the limit, observing τ(y^{k+1} − x) → 0 and φk(y^{k+1}; x) → φ(x; x), we obtain the estimate 0 ≤ φ(y; x) − φ(x; x) for every y, which by convexity of φ(·; x) implies 0 ∈ ∂φ(x; x). Since ∂φ(x; x) = ∂f(x), we have shown 0 ∈ ∂f(x), as claimed.
Remark. Various modifications of our algorithm may be considered. For instance, whenever a null step y^{k+1} is made, that is, ρk < γ, then we should first check whether y^{k+1} gives descent in f:

(20)   f(x) − f(y^{k+1}) ≥ δ1 > 0.

If this is not the case, the trust region radius is certainly too large, so we should increase τk right away. As presented, this will also happen, but only after several null steps, bringing φk closer to φ, until the criterion in step 4 is met.
In the same vein, even when y^{k+1} gives descent in f, but only slightly, so that ρk ≥ γ fails, we may check whether

(21)   σk := (f(x) − f(y^{k+1})) / (f(x) − φ(y^{k+1}; x)) ≥ δ2

for some 1/2 < δ2 < 1. If σk < δ2, then f and φ are not in good agreement. In this case, our algorithm will keep τk fixed and will start driving φk closer to φ. Eventually this will lead to a moment where ρ̃k ≥ γ̃, and then τk will be increased to 2τk. As above one may argue that in this case we have lost time, because we have brought φk closer to φ even though φ is too far away from f, only to notice in the end that we could not avoid increasing τ. This loss of time and energy could be avoided by adding the test (21). If σk < δ2, then we increase τk right away, but keep updating φk. The convergence analysis of this is covered by the two central lemmas of this section.
8. Convergence analysis of outer loop. All we have to do now is piece things together and show sub-sequence convergence of the sequence of serious steps taken in the outer loop. From the previous chapter we know that the inner loop always ends after a finite number of steps k with a new x+ satisfying the acceptance test, unless we have finite termination due to 0 ∈ ∂f(x). Excluding this case, let us assume that xj is the sequence of serious steps, satisfying the acceptance test in step 4 of the algorithm. Since y^{k+1} accepted in step 4 becomes the new xj+1, that means

(22)   f(xj) − f(xj+1) ≥ γ (f(xj) − φkj(xj+1; xj)),

where j is the counter of the outer loop, k the counter of the inner loop, and where at the outer step j the inner loop was stopped at k = kj. Now recall from the construction that τkj(xj − xj+1) ∈ ∂φkj(xj+1; xj). The sub-gradient inequality for φkj at xj+1 therefore gives

τkj (xj − xj+1)⊤ (xj − xj+1) ≤ φkj(xj; xj) − φkj(xj+1; xj) = f(xj) − φkj(xj+1; xj),

using φkj(xj; xj) = f(xj). That means

τkj ‖xj+1 − xj‖² ≤ f(xj) − φkj(xj+1; xj) ≤ γ⁻¹ (f(xj) − f(xj+1)),
using (22). Summing up from j = 1 to j = J − 1 gives

Σ_{j=1}^{J−1} τkj ‖xj+1 − xj‖² ≤ γ⁻¹ Σ_{j=1}^{J−1} (f(xj) − f(xj+1)) = γ⁻¹ (f(x1) − f(xJ)),

and under the hypothesis that the sequence f(xJ) is bounded below, we deduce convergence of the series

Σ_{j=1}^{∞} τkj ‖xj+1 − xj‖² < ∞.

In particular, τkj ‖xj+1 − xj‖² → 0. There are now two cases to be discussed.
Consider first sub-sequences j ∈ N where gj := τkj(xj − xj+1) → 0, j ∈ N. Since gj is a sub-gradient of φkj(·; xj) at xj+1, we deduce that 0 ∈ ∂f(x̃) for every accumulation point x̃ of the sequence xj, j ∈ N. So here we have sub-sequence convergence. If we allow cases where the sequence xj is unbounded, so that no accumulation point exists, then we still have ‖gj‖ → 0 for the sub-gradients gj. That means f gets flat as ‖xj‖ → ∞. This is of course excluded if f is coercive.
The other case to be discussed concerns sub-sequences where ‖gj‖ is bounded away from 0, say τkj ‖xj − xj+1‖ ≥ μ > 0. We argue that in this case τkj must be bounded away from 0. Indeed, suppose on the contrary that τkj → 0 for a sub-sequence; then ‖xj − xj+1‖ → ∞, hence τkj ‖xj − xj+1‖² ≥ μ ‖xj − xj+1‖, contradicting the summability of τkj ‖xj − xj+1‖². That proves the claim.
With τkj bounded away from 0, we have two consequences. Firstly, we then must even have τkj → ∞, because ‖xj − xj+1‖ → 0 and τkj ‖xj − xj+1‖ ≥ μ. Secondly, summability of the series now shows that Σ_{j=1}^{∞} ‖xj+1 − xj‖ < ∞. This implies that xj is a Cauchy sequence, because

‖xj − xj+t‖ ≤ ‖xj − xj+1‖ + · · · + ‖xj+t−1 − xj+t‖ → 0   (j, t → ∞).
Let xj → x̄. We will prove that x̄ is a critical point.
Consider the null steps y^{kj}, y^{kj−1}, . . . , which were produced before y^{kj+1} = xj+1. As we know, they were all refused, so ρ < γ. Since our τ parameter tends to ∞, it is clear that before y^{kj+1} was accepted, the doubling rule in step 5 of the algorithm had been applied at least once. We are interested in the index where the doubling rule was applied for the last time before acceptance, that is, the moment where we passed from (1/2)τkj = τkj−ν to τkj = τkj−ν+1. If ν > 1, then several steps with the same τ parameter followed this moment. If ν = 1 then the doubling occurred for the last time right before the appearance of y^{kj+1}. Notice that

(f(xj) − f(y^{kj−ν+1})) / (f(xj) − φkj−ν(y^{kj−ν+1}; xj)) < γ

and

(f(xj) − φ(y^{kj−ν+1}; xj)) / (f(xj) − φkj−ν(y^{kj−ν+1}; xj)) ≥ γ̃.
For ease of notation, let us assume ν = 1. That means, y^{kj} was refused with

ρj = (f(xj) − f(y^{kj})) / (f(xj) − φkj−1(y^{kj}; xj)) < γ

and

ρ̃j = (f(xj) − φ(y^{kj}; xj)) / (f(xj) − φkj−1(y^{kj}; xj)) ≥ γ̃.

We then have τkj−1 = (1/2)τkj, and therefore

(1/2)τkj (xj − y^{kj}) ∈ ∂φkj−1(y^{kj}; xj).

Using the sub-gradient inequality and φkj−1(xj; xj) = f(xj), we obtain

(1/2)τkj (xj − y^{kj})⊤ (xj − y^{kj}) ≤ φkj−1(xj; xj) − φkj−1(y^{kj}; xj) = f(xj) − φkj−1(y^{kj}; xj),

which could also be written as

(23)   τkj ‖xj − y^{kj}‖² / (f(xj) − φkj−1(y^{kj}; xj)) ≤ 2.

Substituting this into the above expression for ρ̃j gives

ρ̃j = ρj + (f(y^{kj}) − φ(y^{kj}; xj)) / (f(xj) − φkj−1(y^{kj}; xj))
    = ρj + (πxj(dj, ‖xj − y^{kj}‖) − αxj(dj, ‖xj − y^{kj}‖)) / (f(xj) − φkj−1(y^{kj}; xj))
    = ρj + ℓj ‖xj − y^{kj}‖² / (f(xj) − φkj−1(y^{kj}; xj))
    ≤ ρj + 2L/τkj   (using (23)).

Since ρj < γ and 2L/τkj → 0, we have lim sup_{j→∞} ρ̃j ≤ γ, contradicting ρ̃j ≥ γ̃ > γ for all j.
The final case is a sub-sequence xj, j ∈ N, where τkj ‖xj − xj+1‖ ≥ μ > 0. Here Σ_{j∈N} ‖xj − xj+1‖ < ∞ does not allow us to deduce that xj, j ∈ N, is Cauchy. Nonetheless, the above argument still applies to every accumulation point of xj, which must be critical. Altogether we have proved the following
Theorem 6. Let f = λ1 ◦ F be a maximum eigenvalue function and let x1 ∈ Rn
be such that the set {x ∈ Rn : f (x) ≤ f (x1 )} is compact. Then every accumulation
point of the sequence xj of serious steps generated by the algorithm is a critical point
of f .
In practical tests we observe convergence, and the theoretical possibility of a sequence of iterates with several accumulation points never occurs. This is explained to some extent by the following
Corollary 7. Suppose f = λ1 ◦ F is convex in a bounded neighbourhood Ω of x∗ and that the iterates (serious steps) xk remain in Ω. Then the sequence xk converges to some local minimum x♯ with f(x∗) = f(x♯).
Proof. As the sequence of serious steps satisfies the acceptance condition in step 4 of the algorithm, we are in the same situation as in the convex algorithm discussed in [23]. The argument presented there shows convergence to a local minimum x♯ ∈ Ω.
Remark. The present trust region method should be compared to the approach of Fuduli et al. [18, 19] for general non-smooth and non-convex locally Lipschitz functions, where the authors design a trust region with the help of the first order affine approximations a(y) = g^⊤(y − y^{k+1}) + f(y^{k+1}), g ∈ ∂f(y^{k+1}), of the objective f at the trial points y^{k+1}. As these affine models are not support functions of the objective, the authors classify them according to whether a(x) > f(x) or a(x) ≤ f(x), and use this information to devise a trust region around the current iterate x. Their approach is certainly appealing, because it uses genuine information from the objective f. In contrast, our method uses information from the model φ(·; x) at the trial points y^{k+1}.
9. Minimizing the H∞-norm. In this section we extend our algorithm to a larger class of functions which are suprema of an infinite family of maximum eigenvalue functions. The application we primarily have in mind is the H∞-norm, but the results apply to a much larger class.
To introduce our case, consider a parametrized family of stable linear time-invariant dynamical systems

(24)    P(θ):   ẋ = A(θ)x + B(θ)w,
                z = C(θ)x + D(θ)w,

with data A(θ) ∈ R^{n_x × n_x}, B(θ) ∈ R^{n_x × n_w}, C(θ) ∈ R^{n_z × n_x}, D(θ) ∈ R^{n_z × n_w} depending smoothly on a decision parameter θ ∈ R^n. The transfer function of P(θ) is G(θ, s) = C(θ)(sI − A(θ))^{−1}B(θ) + D(θ). Here n_x is the order of the system, x its state, n_w the number of inputs, w the input vector, n_z the number of outputs, and z the output vector. As a typical example, in feedback control synthesis, P(θ) may represent a closed-loop system depending on the unknown (to be designed) feedback controller θ. The closed-loop transfer function then depends on the decision vector θ, which regroups the controller gains and possibly other design parameters, e.g. from the open-loop system [41], or scalings/multipliers in robust synthesis [6].
Typically, the performance of the unknown feedback controller is measured in the H∞ norm. Recall that the H∞ norm ‖G(θ)‖∞ of a stable system is the L2(jR) → L2(jR) operator norm of the channel w → z, where z(s) = G(θ, s)w(s). An explicit expression is

    ‖G(θ)‖∞ = sup_{ω∈R∪{∞}} σ̄(G(θ, jω)) = sup_{ω∈R∪{∞}} λ1(G(θ, jω)^H G(θ, jω))^{1/2},
where X^H is the conjugate transpose of a matrix X. We are interested in the choice of θ which minimizes the H∞ norm:

(25)    min_{θ∈R^n} ‖G(θ, ·)‖∞.

We introduce the function f(θ) = ‖G(θ, ·)‖²∞, which is then an infinite maximum of maximum eigenvalue functions:

    f(θ) = max_{ω∈R∪{∞}} f(θ, ω),    f(θ, ω) = λ1(F(θ, ω)),

where

    F(θ, ω) = G(θ, jω)^H G(θ, jω) ∈ S^m.
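For illustration, f(θ) can be approximated by maximizing λ1(G(jω)^H G(jω)) over a finite frequency grid. The sketch below is a toy illustration under assumed data (a one-state system and a uniform grid chosen here for the example), not the efficient method of [9] used in practice.

```python
import numpy as np

def hinf_sq_on_grid(A, B, C, D, omegas):
    """Approximate f = sup_w lambda_1(G(jw)^H G(jw)) on a frequency grid."""
    n = A.shape[0]
    vals = []
    for w in omegas:
        # G(jw) = C (jw I - A)^{-1} B + D
        G = C @ np.linalg.solve(1j * w * np.eye(n) - A, B) + D
        # largest eigenvalue of the Hermitian matrix G^H G
        vals.append(np.linalg.eigvalsh(G.conj().T @ G)[-1].real)
    return max(vals)

# Toy system G(s) = 1/(s+1): |G(jw)|^2 = 1/(1+w^2), so f = 1, attained at w = 0.
A = np.array([[-1.0]]); B = np.array([[1.0]])
C = np.array([[1.0]]); D = np.array([[0.0]])
f = hinf_sq_on_grid(A, B, C, D, np.linspace(0.0, 10.0, 201))
```

A grid can of course miss the active frequency between grid points; this is precisely why the frequency-extension scheme below refines the finite set of frequencies.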
Program (25) is a semi-infinite program with two sources of non-smoothness: the infinite maximum operator, and the non-smoothness of each maximum eigenvalue function f(·, ω).
The following result is a key property for the analysis of f ; see e.g. [9], [8, Lemma 1].
Lemma 8. Suppose G(θ) is stable. Then the set Ω(θ) = {ω ∈ R ∪ {∞} : f(θ) = f(θ, ω)} of active frequencies is either finite, or Ω(θ) = R ∪ {∞}, i.e., f(θ) = f(θ, ω) for every ω ∈ R ∪ {∞}.

We refer to Ω(θ) as the set of active frequencies at θ and write R∞ := R ∪ {∞}. A system where Ω(θ) = R∞ is called all-pass. In practical cases, iterates θ where G(θ, s) is all-pass are rarely encountered.
For the following we switch back to the more standard notation in optimization, where the decision variable θ is denoted x ∈ R^n. Consider the case where Ω(x) = {ω1, …, ωp} is finite. Any Ω with Ω(x) ⊂ Ω ⊂ R∞ is called an extension of Ω(x). For a given extension Ω we consider the function f_Ω(y) = max_{ω∈Ω} f(y, ω). If Ω is finite, then f_Ω is a maximum eigenvalue function, namely f_Ω(y) = λ1(F_Ω(y)), where F_Ω(y) is block diagonal with diagonal blocks F(y, ω), ω ∈ Ω, arranged in any order. We have f_Ω ≤ f and f_{Ω(x)}(x) = f_Ω(x) = f(x) for every extension Ω of Ω(x). The sub-differential of f at x is determined by Ω(x) inasmuch as

    ∂f(x) = ∂f_Ω(x) = ∂f_{Ω(x)}(x).
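The block-diagonal construction behind f_Ω(y) = λ1(F_Ω(y)) can be checked numerically; in this sketch the blocks F(y, ω) are random symmetric stand-ins (invented for illustration), and the claim verified is that the largest eigenvalue of the block-diagonal matrix equals the maximum of the blockwise largest eigenvalues.

```python
import numpy as np

def block_diag(blocks):
    """Assemble a block-diagonal matrix from square symmetric blocks."""
    n = sum(b.shape[0] for b in blocks)
    M = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        M[i:i+k, i:i+k] = b
        i += k
    return M

rng = np.random.default_rng(0)
# Stand-ins for F(y, w) at three sample frequencies: random symmetric blocks.
blocks = []
for _ in range(3):
    X = rng.standard_normal((4, 4))
    blocks.append((X + X.T) / 2)

# f_Omega = lambda_1 of the block-diagonal F_Omega ...
f_Omega = np.linalg.eigvalsh(block_diag(blocks))[-1]
# ... equals the maximum over the blockwise largest eigenvalues.
max_blockwise = max(np.linalg.eigvalsh(b)[-1] for b in blocks)
```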
Our goal is to extend the eigenvalue optimization algorithm to the case of the H∞-norm. We use the following simple idea.
  i. For a finite extension Ω of Ω(x) we know how to generate descent steps for f_Ω at x, because f_Ω is a maximum eigenvalue function.
  ii. Suppose y^{k+1} is a serious step for f_Ω satisfying the acceptance test in step 4 of the algorithm. If Ω is large enough, f_Ω is close to f, so that we may hope that the acceptance test will also be satisfied for f.
This leads to a convergent algorithm for the H∞-norm. What is needed is an increasing sequence Ω1 ⊂ Ω2 ⊂ … of finite sets whose union is dense in R∞. We then use the scheme of our algorithm, generating descent steps for f_{Ω_ℓ(x)}, where Ω(x) ∪ Ω_k ⊂ Ω_ℓ(x). If the approximation of f by f_{Ω_k(x)} is not good enough, we replace Ω_ℓ by the larger Ω_{ℓ+1}. This approach is inspired by the theory of consistent approximations of [50].
Spectral bundle algorithm for min_{x∈R^n} max_{ω∈R∪{∞}} f(x, ω)

Parameters: 0 < γ♯ < γ < 1/2.
  1. Outer loop. If 0 ∈ ∂f(x), stop; else goto inner loop.
  2. Inner loop. Initialize with a finite Ω1 containing Ω(x^1). Put ℓ = 1.
  3. Sub-program. For given Ω_ℓ consider f_{Ω_ℓ}, φ_{Ω_ℓ}(·; x) and φ^k_{Ω_ℓ}(·; x), and use the inner loop of the first algorithm (with counter k) to generate a trial step x^ℓ satisfying the test
         (f(x) − f_{Ω_ℓ}(x^ℓ)) / (f(x) − φ^k_{Ω_ℓ}(x^ℓ; x)) ≥ γ.
  4. Reality check. Test whether
         (f(x) − f(x^ℓ)) / (f(x) − φ^k_{Ω_ℓ}(x^ℓ; x)) ≥ γ♯.
  5. Decision. If this is the case, let x^+ = x^ℓ and go back to step 1. Otherwise, add new frequencies to the set Ω_ℓ to obtain Ω_{ℓ+1} and go back to step 3.
Here φ_{Ω_ℓ}(·; x) relates to f_{Ω_ℓ} as φ(·; x) relates to f in Algorithm 1, and similarly, φ^k_{Ω_ℓ}(·; x) used here plays the role of φ^k(·; x) there.
Notice that Ω_ℓ could in theory be any increasing sequence of frequencies whose union is dense, but it is preferable to adapt this sequence to the local situation at the current iterate x. Ideas on how Ω_ℓ(x) could be chosen at each step are discussed in [4].
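The interplay of steps 3-5 can be rendered on a toy scalar problem. In the sketch below, the bundle sub-program of step 3 is replaced by a plain grid search for a descent step (an assumption made for brevity, not the authors' method), and f(x, ω) = (x − ω)² over ω ∈ [0, 1] is an invented example; only the reality check and the Ω-extension logic follow the scheme above.

```python
import numpy as np

def f_of(x, w):          # toy f(x, w), smooth in x for each frequency w
    return (x - w) ** 2

def f_full(x, wgrid):    # "true" f as a max over a dense reference grid
    return max(f_of(x, w) for w in wgrid)

def minimize_f_Omega(x, Omega, radius=1.0):
    # stand-in for the bundle sub-program: grid search for a descent step
    cand = np.linspace(x - radius, x + radius, 401)
    vals = [max(f_of(c, w) for w in Omega) for c in cand]
    i = int(np.argmin(vals))
    return cand[i], vals[i]

dense = np.linspace(0.0, 1.0, 1001)   # dense reference frequencies
gamma_sharp = 0.25
x = 3.0
Omega = [0.0]                          # initial finite extension
for _ in range(20):
    x_l, f_Om = minimize_f_Omega(x, Omega)
    fx, fxl = f_full(x, dense), f_full(x_l, dense)
    # reality check: decrease predicted by f_Omega must show up in f itself
    if fx - f_Om > 0 and (fx - fxl) / (fx - f_Om) >= gamma_sharp:
        x = x_l                        # serious step accepted
    else:
        # enrich Omega with an active frequency of f at the trial point
        Omega.append(float(dense[np.argmax([f_of(x_l, w) for w in dense])]))
```

On this toy problem the loop drives x to 0.5, the minimizer of f(x) = max((x)², (x−1)²), enlarging Ω from {0} to {0, 1} along the way.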
Theorem 9. Every accumulation point of the sequence of iterates generated by the above algorithm is a critical point of f.
Proof. Notice first that 0 ∉ ∂f_{Ω_ℓ}(x) as a consequence of step 1, and because Ω(x) ⊂ Ω_ℓ for every ℓ. Therefore, the finite theory of the previous sections shows that after a finite number k of steps, an x^ℓ satisfying the test in step 3 above is found. This x^ℓ is in fact the last y^{k+1} of the inner loop of Algorithm 1 with counter k applied to f_{Ω_ℓ}. Now in the reality check (step 4) we test whether the x^ℓ found in this way also presents a satisfactory decrease in f itself. Since f_{Ω_ℓ} gets closer to f in a neighbourhood of x as Ω_ℓ is updated in step 5, this happens after a finite number of updates ℓ, leading to the new x^+. Now we mimic the proof in Section 8 with γ♯ replacing the γ used there and apply it to the sequence x, x^+, …. The conclusion is the same, namely sub-sequence convergence to a critical point.
Remark. Using the theory of consistent approximations [50], this method can in principle be applied in a fairly general context. However, a difficulty arises in step 4 of the algorithm, where the reality check requires computing function values f(x^ℓ). This is what makes the case of the H∞ norm special, because here we have an efficient way to compute function values [9]. The same idea can be used to solve problems with integral quadratic constraints (IQCs), see [6, 3].
10. Numerical tests. We present numerical tests with bilinear matrix inequalities (BMIs) arising in feedback controller synthesis. Consider a closed-loop system of the form (24), where A(K), B(K), … depend on the feedback controller K to be designed. The bounded real lemma [10] asserts that the closed-loop transfer channel w → z has H∞-norm bounded by γ∞ if and only if there exists a Lyapunov matrix X ≻ 0 such that

                     [ A(K)^⊤X + XA(K)   XB(K)    C(K)^⊤ ]
    B̃(K, X, γ∞) =   [ B(K)^⊤X           −γ∞ I    D(K)^⊤ ]  ≺ 0.
                     [ C(K)              D(K)     −γ∞ I  ]
Fixing a small threshold ε > 0, we consider the following non-linear semidefinite program:

    minimize    γ∞
    subject to  B̃(K, X, γ∞) ≼ −εI,
                εI − X ≼ 0,

which may be solved as an eigenvalue optimization program with decision variable x = (vec(K), svec(X), γ∞) if exact penalization is used. To this end, put B(K, X, γ∞) := diag[εI − X; B̃(K, X, γ∞)], and fix a penalty parameter α > 0 to solve the eigenvalue program min_x λ1(γ∞ I + αB(K, X, γ∞)) for x = (vec(K), svec(X), γ∞). An alternative approach is to fix the performance level γ∞ and to solve the eigenvalue program

(26)    min_{K,X} λ1(B(K, X, γ∞))

until a value < 0 is found. The gain value γ∞ could then be updated a few times to
improve performance. This approach has been adopted in our numerical tests, while
the exact penalty approach was used in [54].
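How the matrix B(K, X, γ∞) = diag[εI − X; B̃(K, X, γ∞)] turns the BMI into a scalar eigenvalue objective can be sketched as follows. The one-state plant and the candidate Lyapunov matrix are invented stand-ins, and no optimization over (K, X) is performed; the sketch only assembles the matrix and evaluates λ1.

```python
import numpy as np

def bmi_blocks(A, B, C, D, X, gamma):
    """Assemble the bounded-real-lemma matrix B~(K, X, gamma)."""
    top = np.hstack([A.T @ X + X @ A, X @ B, C.T])
    mid = np.hstack([B.T @ X, -gamma * np.eye(B.shape[1]), D.T])
    bot = np.hstack([C, D, -gamma * np.eye(C.shape[0])])
    return np.vstack([top, mid, bot])

def eig_objective(A, B, C, D, X, gamma, eps=1e-6):
    """lambda_1 of diag(eps*I - X, B~): negative iff X > eps*I and B~ < -eps'."""
    Bt = bmi_blocks(A, B, C, D, X, gamma)
    n = X.shape[0]
    M = np.zeros((n + Bt.shape[0],) * 2)
    M[:n, :n] = eps * np.eye(n) - X
    M[n:, n:] = Bt
    return np.linalg.eigvalsh(M)[-1]

# Stable closed-loop stand-in x' = -x + w, z = 0.5 x: H-infinity norm 0.5 < gamma = 1,
# so with X = 1 the objective value should come out negative.
A = np.array([[-1.0]]); B = np.array([[1.0]])
C = np.array([[0.5]]); D = np.array([[0.0]])
X = np.array([[1.0]])   # candidate Lyapunov matrix
val = eig_objective(A, B, C, D, X, gamma=1.0)
```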
Further testing of this approach for control problems with integral quadratic constraints (IQCs) is presented in [6].

10.1. Numerical implementation. We have performed four numerical experiments using models known in the control literature (VTOL Helicopter, Chemical Reactor, Transport Airplane, and Piezoelectric Actuator). This allows comparison with previous studies. We present both static and reduced-order controller designs. The state-space matrices of these models can be found in [32, 54], together with the results of H2 and H∞ synthesis problems.
To solve the non-convex eigenvalue optimization problem (26) we use our MATLAB implementation of the spectral bundle algorithm. The tangent sub-problem to compute the trial step y^+ requires minimizing a quadratic cost function subject to an SDP constraint (an LMI). In order to solve the tangent sub-problem efficiently, our specSDP routine [7] was used.
10.1.1. Initialization of the algorithm. The parameter values of the spectral bundle algorithm have been set to γ = 0.01, γ̃ = 0.4 and Γ = 0.6. We use tol = 1e−5 as a tolerance parameter to stop the algorithm as soon as progress in function values becomes minor, that is, f(x) − f(x^+) < 1e−5.
Initialization of the variables X and K in program (26) is a difficult task. Indeed, the cost function (26) is non-convex, and the behaviour of the algorithm may change dramatically for a bad choice of X and K. For instance, simple initializations like K = 0 and X = I are bound to fail. We have decided to start with a closed-loop stabilizing K, which is easily obtained via minimization of the spectral abscissa α(K) = max Re Λ(A(K)), where Λ(A(K)) is the spectrum of A(K). Once the initial K^0 is fixed in (26), minimizing the cost function with respect to X alone is a convex program, which can be solved using standard LMI techniques. A possible way to initialize X is therefore to choose X^0 optimal with respect to K^0. Unfortunately, this often leads to numerical problems, since X^0 can be ill-conditioned and have an exceedingly large norm. We have observed during our numerical testing that the algorithm may crash because of the difference in magnitude between X and K. To avoid these effects, we have sometimes used a scaling of X^0 to obtain decision variables of the same order.
Initializing γ∞ is easier, because the standard full order H∞ controller gives a
strict lower bound.
A delicate point is the initialization and choice of the number r_k ∈ N defining the dimension of the set G_k used to define the local model φ^k at each sweep k. As there does not seem to be any theoretical way to set the value of r_k, we have adjusted it heuristically during the computations. Figure 1 compares, for the helicopter model, static choices r_k = const and displays the behaviour of the algorithm for r_k ∈ {1, 2, 3, 4}. The ratio f_i(x^k)/f_4(x^k) is plotted for i = 1, 2, 3 and k = 1, …, 100, where f_i(x^k) is the value of the cost function for r_k = i at the k-th step x^k of the algorithm. As we can see in this plot, after some iterations the behaviour of the algorithm is best for r_k = 4, indicating that larger r_k should give better results. The results in [45] seem to indicate that r_k should be chosen in such a way that the gap between λ_{r_k} and λ_{r_k+1} is as large as possible, but our testing in [2, 4, 54] has not confirmed this. The situation is far from clear, and dynamic choices of r_k ought to be tested. The advantage of our present approach motivated by [23] over
Fig. 1. Behaviour of the cost function during the iterations for different values of r_k. The plot shows the ratios f_i(x^k)/f_4(x^k) for i = 1, 2, 3, where f_i(x^k) is the value of the cost function at step x^k for r_k = i.
the line motivated by [45] is that convergence of the method no longer hinges on the choice of r_k, respectively r_ε.
10.1.2. Non-smooth stopping tests. To check whether the algorithm has reached a critical point, respectively a minimum x* = (X*, K*), we have implemented two non-smooth stopping tests. The first uses the ε-enlarged sub-differential δ_ε λ1(X) of [45] for the maximum eigenvalue function to compute the criticality measure

    σ_ε = dist(0, δ_ε f(x*)),

where f(x) = λ1(B(K, X, γ∞)) = λ1(F(x)). This parameter is the minimum value of the small-size semidefinite program (computed using specSDP [54]):

    minimize {‖F′(x*)* G‖² : G ∈ δ_ε λ1(B(X*, K*, γ∞))},

where δ_ε λ1(X) = {Q_ε Y Q_ε^⊤ : Y ≽ 0, Tr(Y) = 1}. Here Q_ε is an m × r_ε matrix whose r_ε columns form an orthonormal basis of eigenvectors associated with those eigenvalues λ_i(X) satisfying

    i ∈ I_ε := {i : λ_i(X) > λ1(X) − ε}.

The number r_ε := max{i : i ∈ I_ε} is called the ε-multiplicity of λ1(X). In [45] it is shown that ∂λ1(X) ⊂ δ_ε λ1(X) ⊂ ∂_ε λ1(X), so that ε can roughly be interpreted in the following way: if 0 ∈ δ_ε f(x*) = F′(x*)* δ_ε λ1(B(X*, K*, γ∞)), then it is not possible to decrease f locally around x* by more than ε. See [42, Lemma 2] for more details on this stopping test.
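The ε-multiplicity r_ε itself is straightforward to compute from an eigenvalue decomposition; a minimal sketch with an invented test matrix (the function name is ours, not from the paper):

```python
import numpy as np

def eps_multiplicity(X, eps):
    """r_eps = number of eigenvalues of symmetric X within eps of lambda_1(X),
    using the strict criterion lambda_i(X) > lambda_1(X) - eps."""
    lam = np.linalg.eigvalsh(X)[::-1]        # eigenvalues in descending order
    return int(np.sum(lam > lam[0] - eps))

X = np.diag([3.0, 2.999, 1.0])
r = eps_multiplicity(X, 0.01)   # eigenvalues 3.0 and 2.999 are eps-active
```

In practice r_ε determines the size of the matrix variable Y in the SDP above, which is why a small ε keeps the stopping test cheap.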
Our second stopping test is heuristic and uses random perturbations x of x* = (X*, K*) to see whether the cost function value can be locally improved. Denoting by n_v the number of real optimization variables, we generate 100 n_v random perturbations around x* = (X*, K*). The cost function values f(X_i, K_i) for the perturbations i = 1, …, 100 n_v are used to define

    m_f := min_i f(X_i, K_i),    M_f := max_i f(X_i, K_i),

and

    p_f = #{i : f(X_i, K_i) < f(X*, K*)} / (100 n_v).

The parameter p_f (in percent) gives the proportion of perturbations for which the cost function value is improved.
10.2. Numerical results for H∞ synthesis. In Tables 1 and 2 we present
results obtained with the spectral bundle algorithm applied to the H∞ synthesis
problem. Table 1 is for static controllers, Table 2 for dynamic controllers.
10.2.1. VTOL Helicopter. State-space data for the VTOL Helicopter model are from [32, 54]; the model is described in [28]. The H∞ gain was fixed at γ∞ = 0.1542, and the optimization variables were initialized as K = [1, 1]^⊤ and X = I. The algorithm successfully solved the problem and obtained the H∞ controller

    K∞ = [ 14.06432 ]
         [ 239.5975 ].
We decided to look for a dynamic controller of order 2 with prescribed closed-loop performance γ∞ = 0.133. The algorithm was initialized with a closed-loop stabilizing K^0 and X^0 = I. The optimal Lyapunov matrix X with respect to the given K^0 was used neither in the static nor in the dynamic case, because it has a very high norm and is likely to introduce numerical problems. The algorithm successfully computed

    A_K = [ 1.672546  1.851477 ]       B_K = [ 73.76900 ]
          [ 1.849434  1.670218 ],            [ 73.68110 ],

    C_K = [ 1.309171  1.308932 ] e−02, D_K = [ 0.4300008 ]
          [ 3.245753  3.241668 ]             [ 0.7486698 ].
10.2.2. Chemical Reactor. The chemical reactor model and numerical data can be found in [27]. We fixed the performance level γ∞ = 1.1830, and initialized our algorithm with a closed-loop stabilizing K^0 together with the associated optimal X^0 (scaled for numerical convenience). This kind of initialization was used for both the static and the dynamic case. The obtained static controller is

    K∞ = [ −3.791707  −9.704666 ]
         [ −7.166853  −35.27994 ].
We also computed a dynamic controller of order 2. For γ∞ = 1.1420, we obtained

    A_K = [ −2.197969  −0.341903 ]     B_K = [ 0.4983757    1.096069 ]
          [ −0.334860   2.205355 ],          [ −6.452330  −13.82350 ],

    C_K = [ −0.1090918  1.556743 ]     D_K = [ −3.637238   −6.382226 ]
          [ −0.2769249  3.964105 ],          [ −3.506708  −19.39329 ].
The criticality measure is quite low for both the static and the dynamic case, with σ_ε = 2.0041e−04 and σ_ε = 9.7570e−04, respectively. In the static case, we have also run the algorithm with a more severe stopping criterion, to see whether the criticality decreases further. We chose to stop the algorithm if

(27)    f(x) − f(x^+) < 1e−5  and  ‖x − x^+‖ < 1e−5.

With this rule the algorithm stops after 26448 iterations. The final point verifies λ1(B̃) = −0.0079 and criticality measure σ_ε = 4.2865e−06, with r_ε = 4 and ε = 1.3813e−5. At this numerical precision, we may consider that the algorithm has reached a critical point.
10.2.3. Transport Airplane. Model and state-space data for the transport airplane are from [20]. We used a closed-loop stabilizing K^0 and the associated optimal X^0 for initialization. For γ∞ = 3.1770, our algorithm computed the static controller

    K∞ = [ 0.6340988  −0.5964908  −0.7923650  5.166775e−02  1.055142 ].

In the static case we have also tested the algorithm with the stopping criterion (27). We observed that the criticality of the final point decreased further: σ_ε = 2.4819e−05 after 241 iterations. Again, we may consider that the algorithm has reached a critical point within the chosen numerical precision.
We failed to find a dynamic controller of order 2 for the airplane model. We computed a dynamic controller of order 1 with performance γ∞ = 2.860. The H∞ controller reads

    A_K = −0.4589498,
    B_K = [ −1.133331  1.441023  1.107071  −0.116483  −1.873279 ] e−02,
    C_K = −7.108071e−03,
    D_K = [ 1.740566  −0.8878559  −0.9477933  0.1000800  2.542508 ].
10.2.4. Piezoelectric Actuator. The model of the piezoelectric actuator can be found in [11]. This study turned out to be the most difficult one. As can be seen in Tables 1 and 2, the spectral bundle algorithm at first appeared to fail on the control problem, both in the static and the dynamic case. The algorithm converged to a pair (K*, X*) with slightly positive objective value, around 1.45e−5 in the dynamic case, with the criticality parameter σ_ε quite small in both cases, indicating optimality. While it is perfectly possible that our algorithm, which is a local optimization method, converges to a local minimum with positive value and thereby fails to solve the underlying control problem, the present case turned out to be special. Namely, upon testing the obtained controller K*, we realized that it is closed-loop stabilizing and therefore solves the control problem. What happened is that the X* computed by our algorithm was not suitable, as it fails to solve the Lyapunov inequality. Nonetheless, within the prescribed numerical precision, both stopping tests indicated a local minimum.
The static controller for γ∞ = 0.0578 reads

    K∞ = [ −0.3880673  −1.837152  −10.00377 ] e+07,

and the dynamic controller of order 2, for γ∞ = 0.030, is

    A_K = [ −6.770630      −7.974432e−01 ] e+06,
          [ −7.973471e−01  −5.118056     ]

    B_K = [ 8.842279e−02  4.722429e−01  2.812368 ] e+07,
          [ 3.574044e−02  1.910070e−01  1.136064 ]

    C_K = [ −1.167814  −4.720797e−01 ] e+06,

    D_K = [ −4.930334e−01  −1.977292  −1.480471e+01 ] e+07.
The described phenomenon indicates the numerical difficulty of synthesis problems with the joint variable x = (X, K), where important disparities between the numerical ranges of the two variables K and X may occur. In particular, the Lyapunov matrix X may be very ill-conditioned. The idea of keeping K* and computing a new Lyapunov variable X associated with K* using a convex technique is systematically used in D-K iterations, where the K and X variables are optimized alternately. The advantage of this approach is that both sub-problems are then convex and can be solved by standard SDP solvers. Nonetheless, intensive testing [25, 16] has shown that D-K techniques tend to stall and should in general be avoided. We believe that joint minimization in x = (X, K) is the method of choice. This does not exclude occasional restarts.
Acknowledgement. The authors acknowledge financial support from the Agence Nationale de la Recherche (ANR) under contracts SSIA-NV-6 Controvert and NT05-1-43040 Guidage, and from Fondation d'Entreprise EADS under contract Solving challenging problems in feedback control.
REFERENCES
[1] P. Apkarian, D. Noll, Nonsmooth optimization for multidisk H∞ synthesis. European J.
Control, to appear.
[2] P. Apkarian, D. Noll, Controller design via nonsmooth multi-directional search. SIAM J.
Control and Optim., vol. 44, 2006, pp. 1923 - 1949.
[3] P. Apkarian, D. Noll, IQC analysis and synthesis via nonsmooth optimization. System and
Control Letters, to appear.
[4] P. Apkarian, D. Noll, Nonsmooth H∞ synthesis. IEEE Trans. Autom. Control, vol. 51, 2006,
pp. 71 - 86.
[5] P. Apkarian, D. Noll, Nonsmooth optimization for multiband frequency domain control design. Automatica, to appear.
Table 1
Results of static H∞ synthesis

                     Helicopter    Chemical Reactor   Airplane      Piezo Actuator
γ∞                   0.1542        1.1830             3.1770        0.0578
α                    -0.1288       -1.8721            -0.2486       -1.4202
λ_n(X)               8.3236        0.4327             1.1127e-03    7.0198e-10
λ_1(B̃)               -1.8146e-04   -4.4606e-03        -5.8368e-05   8.2255e-03
λ_{n+ny+nu}(B̃)       -1.8866e+05   -2.4616e+03        -44.4910      -399.3432
#It.                 600           1800               77            1191
[n, ny, nu]          [4,2,1]       [4,2,2]            [9,1,5]       [5,1,3]
n_v                  12            14                 50            18
r_k                  4             4                  12            10
σ_ε                  1.1560e-4     2.0041e-04         0.0114        7.0538e-3
r_ε                  3             4                  4             5
ε                    3.7732e-7     1.2308e-05         3.2401e-05    7.9430e-4
p_f                  0%            0%                 0%            0%
m_f                  6.4380e-04    6.6349e-03         4.0036e-07    570.5592
M_f                  1.9516e-02    9.2861e-02         4.3848e-05    2.2997e+04

Table 2
Results of dynamic H∞ synthesis

                     Helicopter    Chemical Reactor   Airplane      Piezo Actuator
γ∞                   0.1334        1.1420             2.8600        0.0300
α                    -0.1583       -0.8245            -0.3161       -0.4796
λ_n(X)               2.7722e-03    0.2944             6.2022e-04    -9.7169e-06
λ_1(B̃)               -5.7903e-05   -3.2326e-03        -1.7815e-04   1.4503e-05
λ_{n+ny+nu}(B̃)       -39.5053      -735.4831          -150.1174     -16.1523
#It.                 9334          1379               1781          8962
Order                2             2                  1             2
[n, ny, nu]          [6,4,3]       [6,4,4]            [10,2,6]      [6,3,5]
n_v                  33            37                 67            43
r_k                  10            6                  12            10
σ_ε                  5.1447e-04    9.7570e-04         4.2673e-02    3.2769e-02
r_ε                  5             3                  3             5
ε                    1.1137e-05    2.9419e-04         4.50352e-05   1.4537e-05
p_f                  0%            1.81%              0%            0%
m_f                  4.0061e-02    -1.8124e-09        2.1060e-03    1.1468e-03
M_f                  5.2018        2.7961e-08         2.2519e-02    7.4389e-02
[6] P. Apkarian, D. Noll, O. Prot, Nonsmooth methods for analysis and synthesis with integral
quadratic constraints. Submitted.
[7] P. Apkarian, D. Noll, J.-B. Thevenet and H.D. Tuan, A spectral quadratic-SDP method
with applications to fixed-order H2 and H∞ synthesis. Asian Control Conference, Melbourne, AU, 2004.
[8] V. Bompart, D. Noll and P. Apkarian, Second-order nonsmooth optimization of the H∞
norm. Submitted.
[9] S. Boyd, V. Balakrishnan, A regularity result for the singular values of a transfer matrix and a quadratically convergent algorithm for computing its L∞-norm. Systems and Control Letters, vol. 15, 1990, pp. 1 - 7.
[10] S. Boyd, L. El Ghaoui, E. Feron and V. Balakrishnan, Linear matrix inequalities in system and control theory, vol. 15 of SIAM Studies in Applied Mathematics, SIAM, Philadelphia, 1994.
[11] B. M. Chen, H∞ control and its applications, vol. 235 of Lecture Notes in Control and Information Sciences, Springer Verlag, New York, Heidelberg, Berlin, 1998.
[12] F. Clarke, Optimization and nonsmooth analysis. Canadian Math. Society Series, John Wiley
& Sons, New York, 1983.
[13] S. Cox, R. Lipton, Extremal eigenvalue problems for two-phase conductors. Archiv Rat. Mechanics Anal., vol. 136, 1996, pp. 101 - 117.
[14] J. Cullum, W. Donath and P. Wolfe, The minimization of certain nondifferentiable sums of eigenvalues of symmetric matrices. Math. Programming Studies, vol. 3, 1975, pp. 35 - 55.
[15] A.R. Díaz, N. Kikuchi, Solutions to shape and topology eigenvalue optimization problems using a homogenization method. Int. J. Num. Meth. Engin., vol. 35, 1992, pp. 1487 - 1502.
[16] B. Fares, D. Noll and P. Apkarian, Robust control via sequential semidefinite programming.
SIAM J. Control and Optim. vol. 40, 2002, pp. 1791 - 1820.
[17] R. Fletcher, Semidefinite matrix constraints in optimization. SIAM J. Control and Optim., vol. 23, 1985, pp. 493 - 513.
[18] A. Fuduli, M. Gaudioso and G. Giallombardo, A DC piecewise affine model and a bundling technique in nonconvex nonsmooth optimization. Optimization Methods and Software, vol. 19, 2004, pp. 89 - 102.
[19] A. Fuduli, M. Gaudioso and G. Giallombardo, Minimizing nonconvex nonsmooth functions via cutting planes and proximity control. SIAM J. Optim., vol. 14, 2004, pp. 743 - 756.
[20] D. Gangsaas, K. Bruce, J. Blight and U.-L. Ly, Applications of modern synthesis to aircraft
control: Three case studies. IEEE Trans. Autom. Control, AC-31, 1986, pp. 995 - 1014.
[21] C. Helmberg, K.C. Kiwiel, A spectral bundle method with bounds. Math. Programming, vol. 93, 2002, pp. 173 - 194.
[22] C. Helmberg, F. Oustry, Bundle methods to minimize the maximum eigenvalue function. Handbook of Semidefinite Programming. Theory, Algorithms and Applications, L. Vandenberghe, R. Saigal, H. Wolkowicz (eds.), vol. 27, 2000.
[23] C. Helmberg, F. Rendl, Spectral bundle method for semidefinite programming. SIAM J.
Optimization, vol. 10, 2000, pp. 673 - 696.
[24] C. Helmberg, F. Rendl and R. Weismantel, A semidefinite programming approach to the
quadratic knapsack problem. J. Combin. Optim. vol. 4, 2000, pp. 197 - 215.
[25] J.W. Helton, O. Merino, Coordinate optimization for bi-convex matrix inequalities. Proc.
Conf. on Decis. Control, San Diego, CA, 1997, pp. 3609 - 3613.
[26] J.-B. Hiriart-Urruty, C. Lemaréchal, Convex analysis and minimization algorithms II: Advanced theory and bundle methods, vol. 306 of Grundlehren der mathematischen Wissenschaften, Springer Verlag, New York, Heidelberg, Berlin, 1993.
[27] Y.S. Hung, A.G.J. MacFarlane, Multivariable feedback: A classical approach. Lect. Notes in Control and Information Sciences, Springer Verlag, New York, Heidelberg, Berlin, 1982.
[28] L.H. Keel, S.P. Bhattacharyya and J.W. Howe, Robust control with structured perturbations. IEEE Trans. Autom. Control, vol. 36, 1988, pp. 68 - 77.
[29] K.C. Kiwiel, Proximity control in bundle methods for convex nondifferentiable optimization. Math. Programming, vol. 46, 1990, pp. 105 - 122.
[30] K.C. Kiwiel, Methods of descent for nondifferentiable optimization, vol. 1133 of Springer Lect. Notes in Math., Springer Verlag, 1985.
[31] K.C. Kiwiel, A linearization algorithm for computing control systems subject to singular value inequalities. IEEE Trans. Autom. Control, AC-31, 1986, pp. 595 - 602.
[32] F. Leibfritz, COMPleib, COnstrained Matrix-optimization Problem LIbrary - a collection of test examples for nonlinear semidefinite programs, control system design and related problems. Tech. Report, Universität Trier, 2003.
[33] C. Lemaréchal, An extension of Davidon's method to nondifferentiable problems. Math. Programming Studies, Nondifferentiable Optimization, M.L. Balinski and P. Wolfe (eds.), North Holland, 1975, pp. 95 - 109.
[34] C. Lemaréchal, Bundle methods in nonsmooth optimization. Nonsmooth Optimization, Proc. IIASA Workshop 1977, C. Lemaréchal, R. Mifflin (eds.), 1978.
[35] C. Lemaréchal, Nondifferentiable optimization. Chapter VII in: Handbooks in Operations Research and Management Sciences, vol. 1, 1989.
[36] C. Lemaréchal, Lagrangian relaxation. In: Computational and Combinatorial Optimization, M. Jünger, D. Naddef (eds.), 2001, pp. 115 - 169.
[37] C. Lemaréchal, A. Nemirovskii and Y. Nesterov, New variants of bundle methods. Math. Programming, vol. 69, 1995, pp. 111 - 147.
[38] C. Lemaréchal, F. Oustry, Nonsmooth algorithms to solve semidefinite programs. SIAM series Advances in Linear Matrix Inequality Methods in Control, L. El Ghaoui, S.-I. Niculescu (eds.), 2000.
[39] D. Mayne, E. Polak, Algorithms for the design of control systems subject to singular value inequalities. Math. Programming Studies, vol. 18, 1982, pp. 112 - 134.
[40] D. Mayne, E. Polak and A. Sangiovanni, Computer aided design via optimization. Automatica, vol. 18, no. 2, 1982, pp. 147 - 154.
[41] K. Mombaur, Stability optimization of open-loop controlled walking robots. PhD Thesis, Universität Heidelberg, 2001.
[42] D. Noll, P. Apkarian, Spectral bundle method for nonconvex maximum eigenvalue functions:
first-order methods. Math. Programming, Series B, vol. 104, 2005, pp. 701 - 727.
[43] D. Noll, P. Apkarian, Spectral bundle method for nonconvex maximum eigenvalue functions: second-order methods. Math. Programming, Series B, vol. 104, 2005, pp. 729 - 747.
[44] D. Noll, M. Torki and P. Apkarian, Partially augmented Lagrangian method for matrix
inequality constraints. SIAM J. on Optimization, vol. 15, 2004, pp. 161 - 184.
[45] F. Oustry, A second-order bundle method to minimize the maximum eigenvalue function. Math. Programming, Series A, vol. 89, 2000, pp. 1 - 33.
[46] M. Overton, On minimizing the maximum eigenvalue of a symmetric matrix. SIAM J. on Matrix Analysis and Applications, vol. 9, 1988, pp. 256 - 268.
[47] M. Overton, Large-scale optimization of eigenvalues. SIAM J. on Optimization, vol. 2, 1992,
pp. 88 - 120.
[48] M. Overton, R. Womersley, On the sum of the largest eigenvalues of a symmetric matrix.
SIAM J. on Matrix Analysis and Applications, vol. 13, 1992, pp. 41 - 45.
[49] E. Polak, On the mathematical foundations of nondifferentiable optimization in engineering design. SIAM Review, vol. 29, 1987, pp. 21 - 89.
[50] E. Polak, Optimization: Algorithms and Consistent Approximations. Springer Series in Applied Mathematical Sciences, vol. 124, 1997.
[51] E. Polak, Y. Wardi, A nondifferentiable optimization algorithm for the design of control systems subject to singular value inequalities over the frequency range. Automatica, vol. 18, 1982, pp. 267 - 283.
[52] F. Pukelsheim, Optimal design of experiments. John Wiley & Sons, New York, 1993.
[53] J.-B. Thevenet, D. Noll and P. Apkarian, Nonlinear spectral SDP method for BMI-constrained problems: applications to control design. Informatics in Control, Automation and Robotics I, J. Braz, H. Araújo, A. Vieira and B. Encarnação (eds.), Springer Verlag, 2006, pp. 61 - 72.
[54] J.-B. Thevenet, D. Noll and P. Apkarian, Nonsmooth methods for large bilinear matrix
inequalities: applications to feedback control. Submitted.
[55] L. Vandenberghe, S. Boyd, Semidefinite programming. SIAM Review, vol. 38, 1996, pp. 49 - 95.
[56] P. Wolfe, A method of conjugate subgradients for minimizing nondifferentiable functions.
Math. Programming Studies, vol. 3. Nondifferentiable Optimization, M.L. Balinski, P.
Wolfe (eds.), North-Holland, 1975, pp. 145 - 173.
[57] Z. Zhao, B. Braams, M. Fukuda, M. Overton and J. Percus, The reduced density matrix for electronic structure calculations and the role of three-index representability. J. Chem. Physics, vol. 11, 2003.