
A DUAL AND INTERIOR POINT APPROACH
TO SOLVE CONVEX MIN-MAX PROBLEMS
JOS F. STURM and SHUZHONG ZHANG
Econometric Institute
Erasmus University Rotterdam
Abstract. In this paper we propose an interior point method for solving the dual form of min-max
type problems. The dual variables are updated by means of a scaling supergradient method. The
boundary of the dual feasible region is avoided by the use of a logarithmic barrier function. A
major difference with other interior point methods is the nonsmoothness of the objective function.
1. Introduction
Consider the following problem

    (P)    $\min_{x \in X} \max_{1 \le i \le m} f_i(x)$

where we assume that the functions $f_i(x)$, $1 \le i \le m$, are real valued convex
functions defined on a convex and compact subset $X$ of $\mathbb{R}^n$.
Clearly, we have

    $\min_{x \in X} \max_{1 \le i \le m} f_i(x) = \min_{x \in X} \max_{y \in S} y^T f(x),$

where $S$ is the $m$-dimensional unit simplex given by

    $S := \{ y \in \mathbb{R}^m : \sum_{i=1}^m y_i = 1 \text{ and } y_i \ge 0, \ 1 \le i \le m \}$

and the $m$-dimensional vector function $f(x)$ is given by

    $f(x) := (f_1(x), f_2(x), \ldots, f_m(x))^T.$
Since the function $y^T f(x)$ is convex in $x$ for fixed $y \in S$, and is concave in $y$ for
fixed $x \in X$, it follows that (see e.g. Sion [6])

    $\min_{x \in X} \max_{y \in S} y^T f(x) = \max_{y \in S} \min_{x \in X} y^T f(x).$    (1)
From now on we shall concentrate on the dual problem of (P) given by

    (D)    $\max_{y \in S} h(y)$

where the dual objective function is defined as

    $h(y) := \min_{x \in X} y^T f(x).$
Note that the domain of $h$ is $S$. Clearly, $h(y)$ is a concave function.
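For instance, take $m = 2$, $f_1(x) = x$, $f_2(x) = -x$ and $X = [-1, 1]$. Then $\min_{x \in X} \max_i f_i(x) = \min_{x \in X} |x| = 0$, attained at $x = 0$, while $h(y) = \min_{x \in [-1,1]} (y_1 - y_2) x = -|y_1 - y_2|$, whose maximum over $S$ is also $0$, attained at $y = (\frac{1}{2}, \frac{1}{2})^T$; this illustrates the duality relation (1).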
In two recent papers by Barros, Frenk, Schaible and Zhang [1, 2], fast algorithms
for solving generalized fractional programming were constructed on the basis of a
similar duality relation. The dual problem (D) can be derived using the Lagrangian
function. For a thorough discussion of the Lagrange duality theory for convex
programming, we refer to the book of Hiriart-Urruty and Lemaréchal [5].
Observe that Problem (D) has a very simple constraint set. However, the function
$h(y)$ is in general non-differentiable. Throughout this paper we shall use an oracle
to get an optimal solution $x$ of the following problem:

    $\min_{x \in X} y^T f(x)$    (2)

where $y \in S$. Using this oracle we not only know the function value $h(y) = y^T f(x)$,
but also an element belonging to the supergradient set. More precisely,

    $f(x) \in \partial h(y),$

where $\partial h(y)$ denotes the supergradient set of $h$ at the point $y$.
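The oracle is treated abstractly throughout. To fix ideas, here is a minimal sketch (ours, not from the paper) for the special case where each $f_i$ is affine, $f_i(x) = a_i^T x + b_i$, and $X$ is a box $[l, u]$, so that (2) is linear in $x$ and separates over the coordinates; the data $A$, $b$, $l$, $u$ are hypothetical:

    import numpy as np

    def oracle(y, A, b, l, u):
        """Solve (2) in the affine case f(x) = A x + b with X = [l, u]:
        min_x y^T (A x + b) is linear in x and separates coordinatewise.
        Returns a minimizer x together with f(x), an element of the
        supergradient set of h at y."""
        c = A.T @ y                    # y^T f(x) = c^T x + y^T b
        x = np.where(c >= 0, l, u)     # pick the minimizing endpoint per coordinate
        fx = A @ x + b                 # f(x)
        return x, fx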
The basic underlying idea is that we first introduce a logarithmic barrier for
Problem (D), and then apply a scaling and projection supergradient method to maximize
the barrier function. Due to the lack of differentiability of $h(y)$, the convergence
analysis differs in flavor from that of the usual path-following algorithms. The advantage of our
approach is that we do not require any knowledge of the functions $f_i$, $i = 1, 2, \ldots, m$,
or of the structure of the constraint set $X$. Remark that for the cases where $m$ is
relatively large compared to the dimension $n$, and the constraint set $X$ is simple,
solving (2) is much easier than solving the original problem.
The notation we use is as follows. The superscript of a vector is used to denote
the iteration number, e.g. in the $k$-th iteration we have $y^{(k)}$; the subscript will denote
the coordinate, e.g. the $i$-th coordinate of $y^{(k)}$ is $y_i^{(k)}$; capitalization of a vector will
denote the diagonal matrix with the elements of the vector on the diagonal, e.g.
$Y^{(k)} = \mathrm{diag}(y_1^{(k)}, \ldots, y_m^{(k)})$. We denote the all-one vector by $e$, the Euclidean norm
(the $L_2$ norm) simply by $\|\cdot\|$, and the $L_\infty$ norm by $\|\cdot\|_\infty$.
We organize the presentation in the following way. In Section 2, we will introduce
the search direction and present the new algorithm. The convergence analysis of the
algorithm is carried out in Section 3, and some concluding remarks are made in
Section 4.
2. The scaling supergradient method
We introduce now the logarithmic barrier function

    $h_\mu(y) := h(y) + \mu \sum_{i=1}^m \log y_i.$
Observe that $h_\mu(y)$ is a strictly concave function, for which the supergradient set
is given by

    $\partial h_\mu(y) = \partial h(y) + \mu Y^{-1} e.$    (3)
The concept of the logarithmic barrier was introduced by Frisch [4] to steer the
iterates away from the boundary. The optimizer of the barrier function will be a
nearly optimal solution to (D) if the multiple $\mu$ of the barrier term is small, as is
shown in the following lemma.
Lemma 1. If $\bar{y} \in S$ is such that $h_\mu(\bar{y}) = \max_{y \in S} h_\mu(y)$, then

    $h(\bar{y}) \ge \max_{y \in S} h(y) - \mu m.$
Proof. From the concavity of $h_\mu$, it follows that

    $0 \in \partial h_\mu(\bar{y}),$

i.e., there exists $\xi \in \partial h(\bar{y})$ such that

    $\xi + \mu \bar{Y}^{-1} e = 0.$

By the concavity of $h$, we have for $y^* \in \arg\max_{y \in S} h(y)$ that

    $\max_{y \in S} h(y) \le h(\bar{y}) + \xi^T (y^* - \bar{y}) = h(\bar{y}) - \mu e^T \bar{Y}^{-1} (y^* - \bar{y}) = h(\bar{y}) + \mu (m - e^T \bar{Y}^{-1} y^*) \le h(\bar{y}) + \mu m,$

where we used $\bar{y}, y^* \in S$.    □
In this paper we shall maximize $h_\mu$ over $S$ for a prefixed parameter $\mu > 0$. We
shall fix $0 < \mu < \epsilon/m$ if an $\epsilon$-optimal solution is desired.
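For instance, if $m = 10$ and an accuracy of $\epsilon = 10^{-3}$ is required, any $\mu < 10^{-4}$ suffices, since Lemma 1 then gives $h(\bar{y}) \ge \max_{y \in S} h(y) - \mu m > \max_{y \in S} h(y) - \epsilon$.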
Assume that the current iterate $y^{(k)} \in \mathring{S}$, where $\mathring{S}$ denotes the relative interior
of $S$. Calling Oracle (2) we obtain

    $x^{(k)} \in \arg\min_{x \in X} (y^{(k)})^T f(x).$

Let $g^{(k)} := f(x^{(k)}) + \mu (Y^{(k)})^{-1} e$. Hence, by (3) we know that

    $g^{(k)} \in \partial h_\mu(y^{(k)}).$
As a search direction we propose a scaled supergradient direction, which coincides
with the supergradient direction of the function $h_\mu(Y^{(k)} z)$ on the domain
$\{z : (y^{(k)})^T z = 1\}$. The scaling transformation $z = (Y^{(k)})^{-1} y$ is based on the idea of
Dikin's affine scaling algorithm [3] for linear programming. Remark that this scaling
maps the current iterate $y^{(k)}$ into the all-one vector $e$.
To simplify notation, we write

    $P_v := I_m - \frac{1}{\|v\|^2} v v^T$

to denote the orthogonal projection matrix onto the kernel of a given vector $v \in \mathbb{R}^m$.
The scaled supergradient direction we propose is $Y^{(k)} d^{(k)}$, where

    $d^{(k)} := \frac{1}{\|P_{y^{(k)}} Y^{(k)} g^{(k)}\|} P_{y^{(k)}} Y^{(k)} g^{(k)}.$
Remark that

    $Y^{(k)} d^{(k)} = \arg\max \{ (g^{(k)})^T w : e^T w = 0, \ \|(Y^{(k)})^{-1} w\| \le 1 \}.$
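In code, the direction can be computed directly from these formulas; the following sketch (ours) takes the current iterate $y^{(k)}$ and the supergradient $g^{(k)}$ as inputs:

    import numpy as np

    def scaled_direction(y, g):
        """d^(k) = P_y(Y g) / ||P_y(Y g)||, with P_v w = w - (v^T w / v^T v) v the
        projection onto the kernel of v.  By construction e^T (Y d) = 0 and
        ||Y^{-1}(Y d)|| = ||d|| = 1, matching the argmax characterization above."""
        w = y * g                         # Y^(k) g^(k)
        p = w - (y @ w) / (y @ y) * y     # P_{y^(k)} Y^(k) g^(k)
        return p / np.linalg.norm(p)      # normalize to unit length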
It is easily seen that $y^{(k)} + t_k Y^{(k)} d^{(k)} \in \mathring{S}$ if $|t_k| < 1$. In this paper, we require
that

    $0 < t_k \le \beta < 1$ for $k = 0, 1, \ldots,$

along with the classical conditions on the supergradient step lengths (cf. Shor [7]),
viz.

    $\lim_{k \to \infty} t_k = 0$ and $\sum_{k=0}^\infty t_k = \infty.$

For simplicity we let $\beta := \frac{1}{2}$. As an example, one may choose $t_k = \frac{1}{k+2}$ for
$k = 0, 1, \ldots$.
Our scaling supergradient algorithm generates the following sequence of dual
variables belonging to $\mathring{S}$:

    $y^{(0)} = \frac{1}{m} e$

and

    $y^{(k+1)} := y^{(k)} + t_k Y^{(k)} d^{(k)}$ for $k = 0, 1, 2, \ldots.$

In the next section, it will be shown that

    $\limsup_{k \to \infty} h_\mu(y^{(k)}) = \max_{y \in S} h_\mu(y).$
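As a concrete rendering (ours, not the authors' code), the whole method fits in a few lines of Python; `oracle(y)` is assumed to return a minimizer $x$ of $y^T f(x)$ together with $f(x)$, as in the sketch of Section 1, and the step sizes are $t_k = 1/(k+2)$ as suggested above:

    import numpy as np

    def scaling_supergradient(oracle, m, mu, iters):
        """Maximize h_mu(y) = h(y) + mu * sum_i log(y_i) over the unit
        simplex S by the scaling supergradient method of this section."""
        y = np.full(m, 1.0 / m)              # y^(0) = e / m
        for k in range(iters):
            x, fx = oracle(y)                # oracle call (2): x minimizes y^T f(x)
            g = fx + mu / y                  # g^(k), a supergradient of h_mu, cf. (3)
            w = y * g                        # Y^(k) g^(k)
            p = w - (y @ w) / (y @ y) * y    # P_{y^(k)} Y^(k) g^(k)
            nrm = np.linalg.norm(p)
            if nrm < 1e-12:                  # y^(k) (numerically) maximizes h_mu
                break
            t = 1.0 / (k + 2)                # t_k -> 0, sum_k t_k = infinity, t_k <= 1/2
            y = y + t * y * (p / nrm)        # y^(k+1) = y^(k) + t_k Y^(k) d^(k)
        return y

Note that $e^T Y^{(k)} d^{(k)} = 0$ keeps the iterates on the simplex, and $|t_k d_i^{(k)}| < 1$ keeps them in its relative interior.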
3. Convergence analysis
In the previous section, we have already seen that the sequence $\{y^{(k)}\}$ is contained
in the relative interior of $S$. We shall now prove that our barrier method avoids the
boundary so well that the sequence is actually contained in a closed subset of $\mathring{S}$.

By definition,

    $\|P_{y^{(k)}} Y^{(k)} g^{(k)}\| \, d^{(k)} = P_{y^{(k)}} Y^{(k)} f(x^{(k)}) + \mu P_{y^{(k)}} e.$

Using $\min_{y \in S} \|y\|^2 = \frac{1}{m}$, it follows that

    $P_{y^{(k)}} e = e - \frac{y^{(k)}}{\|y^{(k)}\|^2} \ge e - m y^{(k)}.$

Since $X$ is convex and compact, all the convex functions $f_i$, $1 \le i \le m$, are
uniformly bounded on $X$. Letting $f_\infty := \max_{x \in X} \|f(x)\|_\infty$, we have

    $\|P_{y^{(k)}} Y^{(k)} f(x^{(k)})\| \le f_\infty \|y^{(k)}\| \le f_\infty,$

so that

    $\|P_{y^{(k)}} Y^{(k)} g^{(k)}\| \, d^{(k)} \ge \mu e - (f_\infty + \mu m) y^{(k)}.$

This implies that $y_i^{(k+1)} \ge y_i^{(k)}$ for every $i$ with $y_i^{(k)} \le \frac{\mu}{f_\infty + \mu m}$. Since $0 < t_k \le \frac{1}{2}$, we have

    $y^{(k+1)} \ge \frac{1}{2} y^{(k)}$

for any $k$. Because $y^{(0)} = \frac{1}{m} e$, it follows that

    $\inf_k \min_{1 \le i \le m} y_i^{(k)} \ge c_1,$    (4)

where $c_1 := \frac{1}{2} \cdot \frac{\mu}{f_\infty + \mu m}$.
Now we use (4) and the fact that all the limit points form a closed set contained
in $\mathring{S}$ to conclude that there is one limit point, say $\bar{y}$, which attains the maximum
function value of $h_\mu(y)$ among all the limit points. Let $y^*$ be the maximum point of
$h_\mu(y)$ in $S$. We shall now concentrate on proving $h_\mu(\bar{y}) = h_\mu(y^*)$.

The proof is done by contradiction. Suppose from now on that

    $h_\mu(\bar{y}) < h_\mu(y^*).$    (5)

Let the upper level set of $h_\mu(y)$ at $\bar{y}$ be

    $L := \{ y \in S : h_\mu(y) \ge h_\mu(\bar{y}) \}.$
By this construction, there will be no other limit point in $\mathring{L}$. Clearly, $y^* \in \mathring{L}$.
Moreover, there exists a positive number $\delta$ such that

    $B(y^*, \delta) \cap S \subseteq \mathring{L},$    (6)

where $B(y^*, \delta)$ denotes the ball with center $y^*$ and radius $\delta$.
Now we turn to consider an iterative point $y^{(k)}$. Let the upper level set at $y^{(k)}$
be

    $L_k := \{ y \in S : h_\mu(y) \ge h_\mu(y^{(k)}) \}.$

Due to the concavity of $h_\mu$, the projected supergradient direction $P_e g^{(k)}$ provides
a normal direction in $S$ of a supporting hyperplane for $L_k$ at $y^{(k)}$. Let $y \in L_k$. The
distance from $y$ to the hyperplane is given by

    $\frac{(g^{(k)})^T (y - y^{(k)})}{\|P_e g^{(k)}\|}.$    (7)

Let $\hat{y} \in \mathring{S}$. Define

    $\theta^{(k)} := (d^{(k)})^T (Y^{(k)})^{-1} (\hat{y} - y^{(k)}).$    (8)
Lemma 2. Let $r > 0$. If $B(\hat{y}, r) \cap S \subseteq L_k$, then there exists some constant $c_2$ such
that

    $\theta^{(k)} \ge r c_2 (h_\mu(y^*) - h_\mu(y^{(k)})).$
Proof. Consider the following supporting hyperplane of $L_k$:

    $\{ z : (g^{(k)})^T P_e (z - y^{(k)}) = 0 \}.$

The distance from $\hat{y}$ to this supporting hyperplane is

    $(g^{(k)})^T (\hat{y} - y^{(k)}) / \|P_e g^{(k)}\|.$

As $B(\hat{y}, r) \cap S \subseteq L_k$, this implies

    $(g^{(k)})^T (\hat{y} - y^{(k)}) \ge r \|P_e g^{(k)}\|.$

Therefore,

    $\theta^{(k)} = (d^{(k)})^T (Y^{(k)})^{-1} (\hat{y} - y^{(k)}) = \frac{(g^{(k)})^T (\hat{y} - y^{(k)})}{\|P_{y^{(k)}} Y^{(k)} g^{(k)}\|} \ge r \, \frac{\|P_e g^{(k)}\|}{\|P_{y^{(k)}} Y^{(k)} g^{(k)}\|}.$    (9)
Using the Cauchy–Schwarz inequality and the supergradient inequality, we have

    $\|P_e g^{(k)}\| \ge \frac{(g^{(k)})^T (y^* - y^{(k)})}{\|y^* - y^{(k)}\|} \ge \frac{h_\mu(y^*) - h_\mu(y^{(k)})}{\|y^* - y^{(k)}\|}.$    (10)

As $y^{(k)}$ and $y^*$ both belong to the unit simplex, it follows that $\|y^* - y^{(k)}\| \le \sqrt{2}$.
Moreover, there holds

    $\|P_{y^{(k)}} Y^{(k)} g^{(k)}\| \le \|Y^{(k)} g^{(k)}\| \le \|Y^{(k)} f(x^{(k)})\| + \mu \|e\| \le f_\infty + \mu \sqrt{m}.$    (11)

From (9)-(11) it follows that

    $\theta^{(k)} \ge r c_2 (h_\mu(y^*) - h_\mu(y^{(k)}))$

for $c_2 = \frac{1}{\sqrt{2} (f_\infty + \mu \sqrt{m})}$.    □

Define

    $\rho^{(k)} := \|\hat{Y}^{-1} (\hat{y} - y^{(k)})\|.$    (12)

We have the following relation:

Lemma 3. There holds

    $(\rho^{(k+1)})^2 \le (\rho^{(k)})^2 - 2 t_k [\theta^{(k)} - (\rho^{(k)})^2 - (1 + \rho^{(k)})^2 t_k / 2].$
Proof. Since $y^{(k+1)} = y^{(k)} + t_k Y^{(k)} d^{(k)}$, we have

    $(\rho^{(k+1)})^2 = \|\hat{Y}^{-1} (\hat{y} - y^{(k)} - t_k Y^{(k)} d^{(k)})\|^2$
    $\phantom{(\rho^{(k+1)})^2} = (\rho^{(k)})^2 - 2 t_k \theta^{(k)} + 2 t_k (\hat{y} - y^{(k)})^T \hat{Y}^{-1} (I - \hat{Y}^{-1} Y^{(k)}) d^{(k)} + t_k^2 \|\hat{Y}^{-1} Y^{(k)} d^{(k)}\|^2.$    (13)

Notice that

    $\|\hat{Y}^{-1} y^{(k)} - e\|_\infty \le \|\hat{Y}^{-1} y^{(k)} - e\| = \rho^{(k)}.$    (14)

Therefore, using $\|d^{(k)}\| = 1$ it follows that

    $\|\hat{Y}^{-1} Y^{(k)} d^{(k)}\| \le \|\hat{Y}^{-1} y^{(k)}\|_\infty \|d^{(k)}\| \le 1 + \rho^{(k)}.$    (15)

Similarly, using $\|d^{(k)}\| = 1$ and the Cauchy–Schwarz inequality, we have

    $|(\hat{y} - y^{(k)})^T \hat{Y}^{-1} (I - \hat{Y}^{-1} Y^{(k)}) d^{(k)}| \le \|(I - \hat{Y}^{-1} Y^{(k)}) \hat{Y}^{-1} (\hat{y} - y^{(k)})\|$
    $\le \|(I - \hat{Y}^{-1} Y^{(k)}) e\|_\infty \|\hat{Y}^{-1} (\hat{y} - y^{(k)})\| = \|\hat{Y}^{-1} y^{(k)} - e\|_\infty \, \rho^{(k)} \le (\rho^{(k)})^2,$

where the last inequality follows from (14). Substituting the above inequality and the
inequality (15) into (13) yields the desired result.    □
Define

    $\hat{y}_\lambda := (1 - \lambda) \bar{y} + \lambda y^*,$

where $0 < \lambda < 1$, and let $\hat{y}$ be $\hat{y}_\lambda$. By (6) there exists $\tilde{h} > h_\mu(\bar{y})$ such that the ball
$B(y^*, \delta) \cap S$ will be contained in the upper level set

    $\{ y \in S : h_\mu(y) \ge \tilde{h} \}.$

Using the concavity of $h_\mu$, this implies

    $B(\hat{y}, \lambda \delta) \cap S \subseteq \{ y \in S : h_\mu(y) \ge (1 - \lambda) h_\mu(\bar{y}) + \lambda \tilde{h} \}.$    (16)

Since

    $\limsup_{k \to \infty} h_\mu(y^{(k)}) = h_\mu(\bar{y}) < \tilde{h},$

we obtain from Lemma 2 and (16) that for given $0 < \lambda < 1$ there must exist $k_1$ such
that for all $k \ge k_1$,

    $\theta^{(k)} \ge \lambda \delta c_2 (h_\mu(y^*) - \tilde{h}).$    (17)
On the other hand, by (12) we have

    $\rho^{(k)} = \|\hat{Y}^{-1} (\hat{y} - y^{(k)})\| \le \|\hat{Y}^{-1} (\hat{y} - \bar{y})\| + \|\hat{Y}^{-1} (\bar{y} - y^{(k)})\|$
    $\phantom{\rho^{(k)}} = \sqrt{\sum_{i=1}^m \left( \frac{\lambda (y_i^* - \bar{y}_i)}{(1 - \lambda) \bar{y}_i + \lambda y_i^*} \right)^2} + \|\hat{Y}^{-1} (\bar{y} - y^{(k)})\|.$    (18)

As $\bar{y} \ne y^*$ is a limit point, there is an unbounded set $K(\lambda)$ of integers such that

    $\|\hat{Y}^{-1} (\bar{y} - y^{(k)})\| \le \sqrt{\sum_{i=1}^m \left( \frac{\lambda (y_i^* - \bar{y}_i)}{(1 - \lambda) \bar{y}_i + \lambda y_i^*} \right)^2}$    (19)

for all $k \in K(\lambda)$.
Based on (17) and (18), there exists a sufficiently small constant $\lambda_0 > 0$ such
that when $\lambda = \lambda_0$ and $k \in K(\lambda_0)$, then

    $(\rho^{(k)})^2 < \min \{ \tfrac{1}{3} \lambda_0 \delta c_2 (h_\mu(y^*) - \tilde{h}), \ 1 \}.$    (20)

Let $k_1$ be chosen according to (17) for $\lambda = \lambda_0$.
Because $\lim_{k \to \infty} t_k = 0$, there is $k_2 \in K(\lambda_0)$ with $k_2 \ge k_1$, such that for all
$k \ge k_2$ we have

    $2 t_k < \tfrac{1}{3} \lambda_0 \delta c_2 (h_\mu(y^*) - \tilde{h}).$

In particular, for $k \ge k_2$ and if (20) holds, then we have

    $(1 + \rho^{(k)})^2 t_k < 2^2 t_k < \tfrac{2}{3} \lambda_0 \delta c_2 (h_\mu(y^*) - \tilde{h}).$    (21)

Using (17), (20), (21) and applying Lemma 3, it follows that

    $(\rho^{(k+1)})^2 \le (\rho^{(k)})^2 - 2 t_k (1 - \tfrac{1}{3} - \tfrac{1}{3}) \lambda_0 \delta c_2 (h_\mu(y^*) - \tilde{h}) = (\rho^{(k)})^2 - \tfrac{2}{3} t_k \lambda_0 \delta c_2 (h_\mu(y^*) - \tilde{h})$    (22)

for $k = k_2$. This implies that $\rho^{(k_2+1)} < \rho^{(k_2)}$, and so (20) and (21) hold for $k := k_2 + 1$,
and consequently (22) also holds for $k := k_2 + 1$. Recursively applying (22) yields a
contradiction, since $\sum_{j=k_2}^\infty t_j = +\infty$. This shows that inequality (5) cannot be true,
which, in turn, proves the desired convergence result. To summarize, we present the
following main theorem of this paper.
Theorem 1. There holds

    $\limsup_{k \to \infty} h_\mu(y^{(k)}) = \max_{y \in S} h_\mu(y).$
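As an illustration (ours, not part of the paper), one may run the sketch of Section 2 on the small example from the Introduction, $f_1(x) = x$ and $f_2(x) = -x$ on $X = [-1, 1]$; the iterates are expected to exhibit the behavior described by Theorem 1:

    import numpy as np

    # Toy instance: h(y) = min_{x in [-1,1]} (y1 - y2) x = -|y1 - y2|,
    # which is maximized over S at y = (1/2, 1/2) with value 0.
    def oracle(y):
        x = -1.0 if y[0] >= y[1] else 1.0   # minimizes (y1 - y2) * x on [-1, 1]
        return x, np.array([x, -x])         # minimizer and f(x) = (x, -x)

    mu = 1e-3
    y = np.array([0.9, 0.1])                # start off-center for illustration
    for k in range(5000):
        x, fx = oracle(y)
        g = fx + mu / y                     # supergradient of h_mu at y
        w = y * g
        p = w - (y @ w) / (y @ y) * y       # project Y g onto the kernel of y
        nrm = np.linalg.norm(p)
        if nrm < 1e-12:
            break
        y = y + (1.0 / (k + 2)) * y * (p / nrm)

    print(y)  # expected to approach (1/2, 1/2), cf. Theorem 1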
4. Concluding remarks
We have presented in this article an interior point method for solving a dual form
of min-max type problems. An important question left open is how to recover the primal
solutions using approximately optimal dual variables and an approximately optimal
objective value. We regard this as a topic for future research.

In a forthcoming paper, the authors will investigate a path-following scheme,
extending the current results. Finally, we remark that our convergence proof fails for
$\mu = 0$, in which case the method becomes comparable to the affine scaling algorithm
for linear programming. It remains an open question whether the convergence still
holds in that case.
References
1. A.I. Barros, J.B.G. Frenk, S. Schaible and S. Zhang, A new algorithm for generalized fractional
programming, Technical Report TI94-23, Tinbergen Institute Rotterdam, 1994.
2. A.I. Barros, J.B.G. Frenk, S. Schaible and S. Zhang, How duality can be used to solve generalized
fractional programming problems, 1994, submitted for publication.
3. I.I. Dikin, Iterative solutions of problems of linear and quadratic programming, Soviet Mathematics
Doklady 8 (1967) 674-675.
4. K.R. Frisch, The logarithmic potential method for convex programming, Institute of Economics,
University of Oslo, Oslo, Norway, 1955.
5. J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms (vol. 1),
Springer-Verlag, Berlin, 1993.
6. M. Sion, On general minimax theorems, Pacific Journal of Mathematics 8 (1958) 171-176.
7. N.Z. Shor, Minimization methods for non-differentiable functions, Springer-Verlag, Berlin,
1985.