An interior-point method for minimizing the sum of piecewise-linear convex functions
Toshihiro Kosaki
Department of Industrial Engineering and Management
Tokyo Institute of Technology
2-12-1 Oh-Okayama, Meguro-ku, Tokyo 152-8552 Japan
[email protected]
(February 26, 2010)
Abstract
We consider the problem of minimizing the sum of piecewise-linear convex functions under both
linear and nonnegativity constraints. We convert the piecewise-linear convex problem into a standard-form
linear programming problem (LP) and apply a primal-dual interior-point method to the LP. From the
solution of the converted problem, we can obtain the solution of the original problem.
We establish polynomial convergence of the interior-point method for the converted problem and devise
the computation of the Newton direction.
Keywords: optimization, piecewise-linear convex function, interior-point method
1. Introduction
In this paper, we consider the problem of minimizing the sum of piecewise-linear convex functions under both
linear constraints and nonnegativity constraints. This problem generalizes the piecewise-linear
convex problem [1, 3]. The purpose of the paper is to estimate the complexity of solving this problem. More
precisely: is there a polynomial-time method for the problem?
This paper is organized as follows. In section 2, we introduce the conversion of a linear programming
problem (LP) with free variables into a standard-form LP. The solution of the original problem is obtained
from the solution of the converted problem. In section 3, we formulate the problem of minimizing the sum
of piecewise-linear convex functions. First we convert the problem into an LP with free variables, and then
we convert that LP into a standard-form problem. We apply the interior-point method, establish
polynomial convergence of the algorithm, and devise the computation of the Newton direction.
2. The conversion from the LP with free variables into the standard form LP
We consider the following LP with free variables.
min  c^T x + f^T z
s.t. Ax + Dz = b                                                    (P)
     x ≥ 0,
where A ∈ ℜ^{m×n}, b ∈ ℜ^m, c ∈ ℜ^n, f ∈ ℜ^l, D ∈ ℜ^{m×l} are constants, and x ∈ ℜ^n and z ∈ ℜ^l are variables.
Without loss of generality, we can assume that the matrix D has full column rank, i.e. rank D = l. We
introduce new variables z^+ and z^−. The standard form is as follows.
min  c^T x + f^T z^+ − f^T z^−
s.t. [A  D  −D] (x; z^+; z^−) = b                                   (P1)
     x ≥ 0, z^+ ≥ 0, z^− ≥ 0.
The dual is as follows.
max  b^T y
s.t. A^T y ≤ c                                                      (D1)
     D^T y = f.
The transformation z = z^+ − z^− has the drawback that a single z does not determine z^+ and z^− uniquely, and
the computation is unstable. The constraints of the dual problem corresponding to the free variables of the
primal problem are equality constraints. We solve these equations for the basic variables and substitute the
result into both the objective function and the constraint functions. We obtain a dual problem with only
nonbasic variables, and we consider the primal problem corresponding to it. In the case of semidefinite
programming, this conversion was proposed in [5]. The advantage is that the size of the problem is small. We
can obtain the solution of the original problem from the solution of the converted problem. We write D_B for
the submatrix of D corresponding to the basis and D_N for the remaining submatrix. We write y_B and y_N
for the subvectors of y corresponding to D_B and D_N, and A_B, A_N, b_B, b_N for the submatrices of A and
subvectors of b corresponding to y_B and y_N. Solving the equations D^T y = f for the basic variables gives
y_B = D_B^{−T}(f − D_N^T y_N).                                      (1)
Substituting y_B into the objective function, we have

b^T y = b_B^T D_B^{−T}(f − D_N^T y_N) + b_N^T y_N
      = (b_N^T − b_B^T D_B^{−T} D_N^T) y_N + b_B^T D_B^{−T} f.      (2)
Substituting y_B into the constraint functions, we have

A_B^T D_B^{−T}(f − D_N^T y_N) + A_N^T y_N ≤ c.                      (3)
The dual problem is as follows.
max  (b_N^T − b_B^T D_B^{−T} D_N^T) y_N + b_B^T D_B^{−T} f
s.t. (A_N^T − A_B^T D_B^{−T} D_N^T) y_N ≤ c − A_B^T D_B^{−T} f.     (D2)
The primal problem is as follows.
min  (c − A_B^T D_B^{−T} f)^T x + b_B^T D_B^{−T} f
s.t. (A_N − D_N D_B^{−1} A_B) x = b_N − D_N D_B^{−1} b_B            (P2)
     x ≥ 0.
We use x^* and y^* for the solutions of the converted primal and dual problems. The solutions (x, z) and
(y_B, y_N) of the original problems are given respectively by (x^*, D_B^{−1}(b_B − A_B x^*)) and
(D_B^{−T}(f − D_N^T y^*), y^*).
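The basis substitution in (1) is easy to check numerically. The following Python sketch uses illustrative random data (not taken from the paper): it recovers y_B from an arbitrary y_N via (1) and verifies that the dual equality constraints D^T y = f of (D1) hold.

```python
import numpy as np

# Tiny illustrative instance of (P): sizes and data are assumptions, not from the paper.
rng = np.random.default_rng(0)
m, n, l = 4, 3, 2
A = rng.standard_normal((m, n))
D = rng.standard_normal((m, l))   # full column rank with probability 1
f = rng.standard_normal(l)

# Choose the first l rows of D as the basis D_B (assumed nonsingular here).
B, N = slice(0, l), slice(l, m)
D_B, D_N = D[B, :], D[N, :]

# For any y_N, equation (1) recovers y_B so that D^T y = f holds:
#   y_B = D_B^{-T} (f - D_N^T y_N)
y_N = rng.standard_normal(m - l)
y_B = np.linalg.solve(D_B.T, f - D_N.T @ y_N)
y = np.concatenate([y_B, y_N])

assert np.allclose(D.T @ y, f)   # the dual equality constraints of (D1) hold
```

Any choice of nonsingular D_B works; the point is that y_B is determined by y_N, which is what eliminates the equality constraints of (D1).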
3. Formulation of the problem to minimize the sum of piecewise-linear convex functions
We consider the convex problem of minimizing the sum of piecewise-linear convex functions under linear and
nonnegativity constraints. The sum of piecewise-linear convex functions is also convex [2]. The objective
function is nonlinear and has some nondifferentiable points. The formulation is as follows.
min_x  Σ_{p=1,...,q} max_{i_p=1,...,l_p} ((c^p_{i_p})^T x + d^p_{i_p})
s.t.   Ax = b                                                       (P3)
       x ≥ 0,
where A ∈ ℜ^{m×n}, b ∈ ℜ^m, c^p_{i_p} ∈ ℜ^n, d^p_{i_p} ∈ ℜ are constants and x ∈ ℜ^n is the variable. The objective function is
the sum of q piecewise-linear convex functions, and the p-th piecewise-linear convex function consists of l_p affine
functions. By introducing variables t_p (p = 1, . . . , q) and slack variables
s^p_{i_p} (i_p = 1, . . . , l_p, p = 1, . . . , q), this problem is written as follows:
min  Σ_{p=1,...,q} t_p
s.t. (c^p_1)^T x + d^p_1 + s^p_1 = t_p   p = 1, . . . , q
     ...
     (c^p_{i_p})^T x + d^p_{i_p} + s^p_{i_p} = t_p   p = 1, . . . , q
     ...                                                            (P4)
     (c^p_{l_p})^T x + d^p_{l_p} + s^p_{l_p} = t_p   p = 1, . . . , q
     Ax = b
     x ≥ 0
     s^p_{i_p} ≥ 0   p = 1, . . . , q,  i_p = 1, . . . , l_p.
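The epigraph reformulation behind (P4) can be checked for a single piecewise-linear term: for a fixed x, the smallest t satisfying (c_i)^T x + d_i + s_i = t with s_i ≥ 0 for all i is exactly max_i ((c_i)^T x + d_i). A minimal Python sketch with illustrative random data:

```python
import numpy as np

# One piecewise-linear convex term of (P3): f(x) = max_i (c_i^T x + d_i).
# The sizes and data are assumptions for illustration; l affine pieces in dimension n.
rng = np.random.default_rng(1)
n, l = 3, 4
C = rng.standard_normal((l, n))   # rows are the c_i^T
d = rng.standard_normal(l)
x = rng.standard_normal(n)

# Direct evaluation of the piecewise-linear convex function.
f_direct = np.max(C @ x + d)

# Epigraph form used in (P4): the smallest t with
#   c_i^T x + d_i + s_i = t,  s_i >= 0  for all i
# is exactly max_i (c_i^T x + d_i), attained with slacks s_i = t - (c_i^T x + d_i).
t = np.max(C @ x + d)
s = t - (C @ x + d)
assert np.all(s >= 0) and np.isclose(t, f_direct)
```

The slack of the active piece is zero, which is why minimizing t over the epigraph constraints reproduces the max.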
We split each t_p into the difference of two nonnegative variables and rearrange the terms, obtaining a standard-form LP:
min  Σ_{p=1,...,q} t^+_p − Σ_{p=1,...,q} t^−_p
s.t. (c^p_1)^T x − t^+_p + t^−_p + s^p_1 = −d^p_1   p = 1, . . . , q
     ...
     (c^p_{i_p})^T x − t^+_p + t^−_p + s^p_{i_p} = −d^p_{i_p}   p = 1, . . . , q
     ...                                                            (P5)
     (c^p_{l_p})^T x − t^+_p + t^−_p + s^p_{l_p} = −d^p_{l_p}   p = 1, . . . , q
     Ax = b
     x ≥ 0
     t^+_p ≥ 0, t^−_p ≥ 0   p = 1, . . . , q
     s^p_{i_p} ≥ 0   p = 1, . . . , q,  i_p = 1, . . . , l_p.
The dual problem is as follows.
max  b^T y − Σ_{p=1,...,q} Σ_{i_p=1,...,l_p} d^p_{i_p} u^p_{i_p}
s.t. A^T y + Σ_{p=1,...,q} Σ_{i_p=1,...,l_p} c^p_{i_p} u^p_{i_p} ≤ 0
     Σ_{i_p=1,...,l_p} u^p_{i_p} = −1   p = 1, . . . , q            (D3)
     u^p_{i_p} ≤ 0   p = 1, . . . , q,  i_p = 1, . . . , l_p.
The peculiarity is that the primal problem (P4) has the free variables t_p. One method of dealing with free
variables is to represent each free variable as the difference of two nonnegative variables. This method
has the drawback that the variables may diverge and numerical difficulties arise. To avoid this difficulty,
we apply the other method, introduced in section 2.
We solve one of the constraints, obtaining u^p_{l_p} = −1 − Σ_{i_p ≠ l_p} u^p_{i_p} (≤ 0) (p = 1, . . . , q), and we substitute this
relation into both the objective function and the constraint functions. The objective function is
b^T y − Σ_{p=1,...,q} Σ_{i_p=1,...,l_p} d^p_{i_p} u^p_{i_p}
  = b^T y − Σ_{p=1,...,q} Σ_{i_p ≠ l_p} d^p_{i_p} u^p_{i_p} − Σ_{p=1,...,q} d^p_{l_p} (−1 − Σ_{i_p ≠ l_p} u^p_{i_p})
  = b^T y + Σ_{p=1,...,q} Σ_{i_p ≠ l_p} (d^p_{l_p} − d^p_{i_p}) u^p_{i_p} + Σ_{p=1,...,q} d^p_{l_p}.      (4)
The constraint function is

A^T y + Σ_{p=1,...,q} Σ_{i_p=1,...,l_p} c^p_{i_p} u^p_{i_p}
  = A^T y + Σ_{p=1,...,q} Σ_{i_p ≠ l_p} c^p_{i_p} u^p_{i_p} + Σ_{p=1,...,q} c^p_{l_p} (−1 − Σ_{i_p ≠ l_p} u^p_{i_p})
  = A^T y + Σ_{p=1,...,q} Σ_{i_p ≠ l_p} (c^p_{i_p} − c^p_{l_p}) u^p_{i_p} − Σ_{p=1,...,q} c^p_{l_p} (≤ 0).      (5)
The dual problem is as follows.
max  b^T y + Σ_{p=1,...,q} Σ_{i_p ≠ l_p} (d^p_{l_p} − d^p_{i_p}) u^p_{i_p} + Σ_{p=1,...,q} d^p_{l_p}
s.t. A^T y + Σ_{p=1,...,q} Σ_{i_p ≠ l_p} (c^p_{i_p} − c^p_{l_p}) u^p_{i_p} ≤ Σ_{p=1,...,q} c^p_{l_p}
     −Σ_{i_p ≠ l_p} u^p_{i_p} ≤ 1   p = 1, . . . , q                (D4)
     u^p_{i_p} ≤ 0   p = 1, . . . , q,  i_p = 1, . . . , l_p − 1.
The primal problem is as follows.
min  Σ_{p=1,...,q} (c^p_{l_p})^T x + Σ_{p=1,...,q} s^p_{l_p} + Σ_{p=1,...,q} d^p_{l_p}
s.t. (c^p_1 − c^p_{l_p})^T x + s^p_1 − s^p_{l_p} = d^p_{l_p} − d^p_1   p = 1, . . . , q
     ...
     (c^p_{i_p} − c^p_{l_p})^T x + s^p_{i_p} − s^p_{l_p} = d^p_{l_p} − d^p_{i_p}   p = 1, . . . , q
     ...                                                            (P6)
     (c^p_{l_p−1} − c^p_{l_p})^T x + s^p_{l_p−1} − s^p_{l_p} = d^p_{l_p} − d^p_{l_p−1}   p = 1, . . . , q
     Ax = b
     x ≥ 0
     s^p_{i_p} ≥ 0   p = 1, . . . , q,  i_p = 1, . . . , l_p.
We introduce slack variables:

v_x := Σ_{p=1,...,q} c^p_{l_p} − A^T y − Σ_{p=1,...,q} Σ_{i_p ≠ l_p} (c^p_{i_p} − c^p_{l_p}) u^p_{i_p},      (6a)
v_{s^p_{l_p}} := 1 + Σ_{i_p ≠ l_p} u^p_{i_p}   p = 1, . . . , q,                                             (6b)
v_{s^p_{i_p}} := −u^p_{i_p}   p = 1, . . . , q,  i_p = 1, . . . , l_p − 1.                                    (6c)
The optimality condition is as follows:

(c^p_1 − c^p_{l_p})^T x + s^p_1 − s^p_{l_p} = d^p_{l_p} − d^p_1   p = 1, . . . , q
...
(c^p_{i_p} − c^p_{l_p})^T x + s^p_{i_p} − s^p_{l_p} = d^p_{l_p} − d^p_{i_p}   p = 1, . . . , q
...
(c^p_{l_p−1} − c^p_{l_p})^T x + s^p_{l_p−1} − s^p_{l_p} = d^p_{l_p} − d^p_{l_p−1}   p = 1, . . . , q
Ax = b
x ≥ 0
s^p_{i_p} ≥ 0   p = 1, . . . , q,  i_p = 1, . . . , l_p
A^T y + Σ_{p=1,...,q} Σ_{i_p ≠ l_p} (c^p_{i_p} − c^p_{l_p}) u^p_{i_p} + v_x = Σ_{p=1,...,q} c^p_{l_p}      (7)
−Σ_{i_p ≠ l_p} u^p_{i_p} + v_{s^p_{l_p}} = 1   p = 1, . . . , q
u^p_{i_p} + v_{s^p_{i_p}} = 0   p = 1, . . . , q,  i_p = 1, . . . , l_p − 1
v_x ≥ 0
v_{s^p_{l_p}} ≥ 0   p = 1, . . . , q
v_{s^p_{i_p}} ≥ 0   p = 1, . . . , q,  i_p = 1, . . . , l_p − 1
x^T v_x = 0
s^p_{i_p} v_{s^p_{i_p}} = 0   p = 1, . . . , q,  i_p = 1, . . . , l_p.
We use (x^*, s^{p*}_1, . . . , s^{p*}_{l_p}) for the solution of the converted primal problem (P6). The solution of the
original primal problem (P4) is given by

(x, t_p, s^p_1, . . . , s^p_{l_p}) = (x^*, (c^p_{l_p})^T x^* + s^{p*}_{l_p} + d^p_{l_p}, s^{p*}_1, . . . , s^{p*}_{l_p}).      (8)

We use (y^*, u^{p*}_1, . . . , u^{p*}_{l_p−1}) for the solution of the converted dual problem (D4). The solution of the original
dual problem (D3) is given by

(y, u^p_1, . . . , u^p_{l_p−1}, u^p_{l_p}) = (y^*, u^{p*}_1, . . . , u^{p*}_{l_p−1}, −1 − Σ_{i_p ≠ l_p} u^{p*}_{i_p}).      (9)
By the number of variables in the problem (P6), the short-step interior-point method (Algorithm
SPF [7]) can solve the problem (P6) in O(√(n + Σ_{p=1,...,q} l_p) L) iterations. We ascertain this proposition.
We use the following notations:

x̃ := (x^T, s^1_1, . . . , s^1_{l_1}, . . . , s^q_1, . . . , s^q_{l_q})^T,
ỹ := (y^T, u^1_1, . . . , u^1_{l_1−1}, . . . , u^q_1, . . . , u^q_{l_q−1})^T,
s̃ := (v_x^T, v_{s^1_1}, . . . , v_{s^1_{l_1}}, . . . , v_{s^q_1}, . . . , v_{s^q_{l_q}})^T,      (10)

Ã := [ A  O ]
     [ C  E ],      (11)

where C is the matrix obtained by stacking the rows (c^1_1 − c^1_{l_1})^T, . . . , (c^1_{l_1−1} − c^1_{l_1})^T, . . . , (c^q_1 − c^q_{l_q})^T, . . . ,
(c^q_{l_q−1} − c^q_{l_q})^T and E is the block-diagonal matrix with blocks E_p := [I_{l_p−1}  −e] ∈ ℜ^{(l_p−1)×l_p} (p = 1, . . . , q), and

b̃ := (b^T, d^1_{l_1} − d^1_1, . . . , d^1_{l_1} − d^1_{l_1−1}, . . . , d^q_{l_q} − d^q_1, . . . , d^q_{l_q} − d^q_{l_q−1})^T,
c̃ := ((Σ_{p=1,...,q} c^p_{l_p})^T, 0, . . . , 0, 1, . . . , 0, . . . , 0, 1)^T,      (12)

where, for each p, the component of c̃ corresponding to s^p_{l_p} is 1 and the components corresponding to
s^p_1, . . . , s^p_{l_p−1} are 0. The feasible interior is written by

F^0 := {(x̃, ỹ, s̃) : Ãx̃ = b̃, Ã^T ỹ + s̃ = c̃, (x̃, s̃) > 0}.      (13)
The neighborhood is written by

N := {(x̃, ỹ, s̃) ∈ F^0 : ‖X̃s̃ − µe‖_2 ≤ 0.4µ},      (14)

where e is a vector whose components are all 1. We use the following algorithm.
step0: An initial point (x̃^0, ỹ^0, s̃^0) ∈ N is given, together with a termination criterion µ^*. Set
σ := 1 − 0.4/√(n + Σ_{p=1,...,q} l_p) and µ^0 := x̃^{0T} s̃^0/(n + Σ_{p=1,...,q} l_p), and let k := 0.
step1: If the termination criterion µ^k ≤ µ^* is satisfied, then stop.
step2: Set µ^k := x̃^{kT} s̃^k/(n + Σ_{p=1,...,q} l_p). Solve

Ã∆x̃^k = 0      (15a)
Ã^T ∆ỹ^k + ∆s̃^k = 0      (15b)
S̃^k ∆x̃^k + X̃^k ∆s̃^k = σµ^k e − X̃^k s̃^k,      (15c)

and obtain the Newton direction (∆x̃^k, ∆ỹ^k, ∆s̃^k). Let (x̃^{k+1}, ỹ^{k+1}, s̃^{k+1}) := (x̃^k, ỹ^k, s̃^k) + (∆x̃^k, ∆ỹ^k, ∆s̃^k).
step3: Set k := k + 1 and go to step1.
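As a sanity check, steps 0-3 can be run on a tiny standard-form LP. The sketch below is a minimal Python implementation under illustrative assumptions (a toy two-variable instance rather than the converted problem (P6), and the Newton system solved via the normal equations of (29)); it is not the paper's code.

```python
import numpy as np

# Short-step path-following sketch (steps 0-3 above) on a toy standard-form LP:
#   min  x1   s.t.  x1 + x2 = 2,  x >= 0        (optimum: x = (0, 2))
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
c = np.array([1.0, 0.0])

# Strictly feasible start inside the neighborhood N:
x = np.array([1.0, 1.0])          # A x = b, x > 0
y = np.array([-2.0])              # s = c - A^T y = (3, 2) > 0
s = c - A.T @ y
N = len(c)
sigma = 1 - 0.4 / np.sqrt(N)
mu = x @ s / N
assert np.linalg.norm(x * s - mu) <= 0.4 * mu   # (x, y, s) in N

while mu > 1e-8:
    # Normal equations: A X S^{-1} A^T dy = -A S^{-1} (sigma*mu*e - X s)
    r = sigma * mu - x * s
    M = A @ np.diag(x / s) @ A.T
    dy = np.linalg.solve(M, -A @ (r / s))
    ds = -A.T @ dy                 # dual slack direction
    dx = (r - x * ds) / s          # from S dx + X ds = r
    x, y, s = x + dx, y + dy, s + ds
    mu = x @ s / N

assert x[0] < 1e-3 and abs(x[1] - 2.0) < 1e-3
```

With the short-step choice σ = 1 − 0.4/√N, each full Newton step keeps the iterate in N and shrinks µ by exactly the factor σ, as shown in section 3.1.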
3.1. Iteration number
We consider the kth iteration. The equality constraints are satisfied by the following relations:

Ãx̃^{k+1} = Ãx̃^k = b̃,      (16)
Ã^T ỹ^{k+1} + s̃^{k+1} = Ã^T ỹ^k + s̃^k = c̃.      (17)
Note that ∆x̃^{kT} ∆s̃^k = 0, since Ã∆x̃^k = 0 and ∆s̃^k = −Ã^T ∆ỹ^k. For the duality gap used in the
optimality criterion, we have the following estimate:

x̃^{(k+1)T} s̃^{k+1} = (x̃^k + ∆x̃^k)^T (s̃^k + ∆s̃^k)
                  = x̃^{kT} s̃^k + x̃^{kT} ∆s̃^k + s̃^{kT} ∆x̃^k + ∆x̃^{kT} ∆s̃^k
                  = (1 − 0.4/√(n + Σ_{p=1,...,q} l_p)) (n + Σ_{p=1,...,q} l_p) µ^k.      (18)
By the following chain of relations, the next point generated by the algorithm is also kept in the neighborhood:

‖X̃^{k+1} s̃^{k+1} − µ^{k+1} e‖_2
  = ‖(X̃^k + ∆X̃^k)(s̃^k + ∆s̃^k) e − µ^{k+1} e‖_2      (19)
  = ‖∆X̃^k ∆s̃^k e‖_2      (20)
  = ‖D^{−1} ∆X̃^k D ∆s̃^k e‖_2,  where D := (X̃^k)^{1/2} (S̃^k)^{−1/2}      (21)
  ≤ (√2/4) ‖D^{−1} ∆x̃^k + D ∆s̃^k‖_2^2      (22)
  = (√2/4) ‖(X̃^k S̃^k)^{−1/2} (σµ^k e − X̃^k s̃^k)‖_2^2      (23)
  ≤ (√2/4) ‖X̃^k s̃^k − σµ^k e‖_2^2 / min_i x̃^k_i s̃^k_i      (24)
  = (√2/4) (‖X̃^k s̃^k − µ^k e‖_2^2 + (1 − σ)^2 (µ^k)^2 (n + Σ_{p=1,...,q} l_p)) / min_i x̃^k_i s̃^k_i      (25)
  ≤ (√2/4) (0.4^2 + (1 − σ)^2 (n + Σ_{p=1,...,q} l_p)) (µ^k)^2 / ((1 − 0.4) µ^k)      (26)
  = (32√2/240) µ^k      (27)
  ≤ 0.4 µ^{k+1},      (28)

where (22) uses the standard bound ‖UVe‖_2 ≤ 2^{−3/2}‖u + v‖_2^2 for u^T v = 0 [7], the cross term in (25)
vanishes because e^T(X̃^k s̃^k − µ^k e) = 0, and (26) uses min_i x̃^k_i s̃^k_i ≥ (1 − 0.4)µ^k,
and where X̃ and S̃ are the diagonal matrices whose diagonal entries are the components of x̃ and s̃,
respectively. The positivity condition is also satisfied.
Because one iteration multiplies the duality gap by the factor 1 − 0.4/√(n + Σ_{p=1,...,q} l_p), the algorithm can obtain an optimal
solution in O(√(n + Σ_{p=1,...,q} l_p) L) iterations, where L is the number of bits expressing the data of the problem.
From the solution of the converted problem, we can obtain the solution of the original problem.
3.2. Newton direction
Computing the Newton direction takes most of the total computing time, so devising the computation
of the Newton direction is important. For simplicity, we consider the problem for the sum
of two piecewise-linear convex functions. The Newton direction is given by the solution of the following
system of equations:

ÃX̃^k (S̃^k)^{−1} Ã^T ∆ỹ^k = −Ã(S̃^k)^{−1} (σµ^k e − X̃^k s̃^k)      (29a)
∆s̃^k = −Ã^T ∆ỹ^k      (29b)
∆x̃^k = (S̃^k)^{−1} (σµ^k e − X̃^k s̃^k) − X̃^k (S̃^k)^{−1} ∆s̃^k.      (29c)
We use the following notations. Let M be the matrix obtained by stacking A and the rows
(c^1_1 − c^1_{l_1})^T, . . . , (c^1_{l_1−1} − c^1_{l_1})^T, (c^2_1 − c^2_{l_2})^T, . . . , (c^2_{l_2−1} − c^2_{l_2})^T, and let

Ā := M diag(x) diag(v_x)^{−1} M^T
     + blockdiag(O, diag(s^1_1 v_{s^1_1}^{−1}, . . . , s^1_{l_1−1} v_{s^1_{l_1−1}}^{−1}, s^2_1 v_{s^2_1}^{−1}, . . . , s^2_{l_2−1} v_{s^2_{l_2−1}}^{−1})),      (30)

where diag(·) denotes the diagonal matrix whose components are the elements of the input vector and the
O block corresponds to the rows of A. The coefficient matrix of (29a) is then written by

Ā + s^1_{l_1} v_{s^1_{l_1}}^{−1} (0; e; 0)(0; e; 0)^T + s^2_{l_2} v_{s^2_{l_2}}^{−1} (0; 0; e)(0; 0; e)^T,      (31)

where (0; e; 0) is the vector with ones in the positions of the rows (c^1_{i_1} − c^1_{l_1})^T and zeros elsewhere,
and (0; 0; e) is defined analogously for the second block.
e
Let A be a nonsingular matrix and B, C, D be matrices of proper size. Note that we have Sherman-MorrisonWoodbury formula [4, 6, 7]
−1
−1
(A + BDC) = A−1 − A−1 BD D + DCA−1 BD
DCA−1 .
(32)
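The identity (32) can be verified numerically. The following Python sketch uses random, well-conditioned matrices of illustrative sizes (all data are assumptions):

```python
import numpy as np

# Numerical check of the Sherman-Morrison-Woodbury formula in the form (32):
#   (A + BDC)^{-1} = A^{-1} - A^{-1} B D (D + D C A^{-1} B D)^{-1} D C A^{-1}
rng = np.random.default_rng(2)
n, k = 5, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # nonsingular
B = rng.standard_normal((n, k))
C = rng.standard_normal((k, n))
D = np.eye(k) + 0.1 * rng.standard_normal((k, k))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ D @ C)
inner = np.linalg.inv(D + D @ C @ Ainv @ B @ D)
rhs = Ainv - Ainv @ B @ D @ inner @ D @ C @ Ainv

assert np.allclose(lhs, rhs)
```

The point of (32) is that when A^{-1} is available, inverting the small k×k inner matrix replaces inverting the full n×n update.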
We use the notation

Â := Ā + s^1_{l_1} v_{s^1_{l_1}}^{−1} (0; e; 0)(0; e; 0)^T.      (33)

By using the relation

Â^{−1} = Ā^{−1} − (s^1_{l_1} v_{s^1_{l_1}}^{−1} / (1 + s^1_{l_1} v_{s^1_{l_1}}^{−1} (0; e; 0)^T Ā^{−1} (0; e; 0))) Ā^{−1} (0; e; 0)(0; e; 0)^T Ā^{−1},      (34)

we can obtain Â^{−1} from Ā^{−1}. And by using the relation

(ÃX̃^k (S̃^k)^{−1} Ã^T)^{−1} = Â^{−1} − (s^2_{l_2} v_{s^2_{l_2}}^{−1} / (1 + s^2_{l_2} v_{s^2_{l_2}}^{−1} (0; 0; e)^T Â^{−1} (0; 0; e))) Â^{−1} (0; 0; e)(0; 0; e)^T Â^{−1},      (35)
we can obtain the Newton direction. If Ā has a special structure that makes its inverse easy to compute,
then this device reduces the computation time. This method can also be applied to the problem of minimizing
the sum of three or more piecewise-linear convex functions.
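The updates (34) and (35) are rank-one (Sherman-Morrison) special cases of (32). A minimal Python sketch with illustrative data, in which the scalar alpha plays the role of s^1_{l_1} v_{s^1_{l_1}}^{−1} and w the role of (0; e; 0):

```python
import numpy as np

# Rank-one inverse update in the spirit of (34): given Abar^{-1}, the inverse of
#   Ahat = Abar + alpha * w w^T
# is obtained without refactoring. Sizes and values are assumptions.
rng = np.random.default_rng(3)
n = 6
Abar = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # nonsingular
w = np.zeros(n); w[2:5] = 1.0          # 0/1 pattern like (0; e; 0)
alpha = 0.7                            # stands in for s^1_{l_1} v^{-1}_{s^1_{l_1}}

Abar_inv = np.linalg.inv(Abar)
Ahat = Abar + alpha * np.outer(w, w)

# Update: Ahat^{-1} = Abar^{-1}
#         - alpha/(1 + alpha w^T Abar^{-1} w) * Abar^{-1} w w^T Abar^{-1}
denom = 1.0 + alpha * (w @ Abar_inv @ w)
Ahat_inv = Abar_inv - (alpha / denom) * np.outer(Abar_inv @ w, w @ Abar_inv)

assert np.allclose(Ahat_inv, np.linalg.inv(Ahat))
```

Applying two such updates in sequence, as in (34) and then (35), yields the inverse of the full coefficient matrix (31) from Ā^{−1}.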
References
[1] D. Bertsimas and J. N. Tsitsiklis: Introduction to Linear Optimization, Athena Scientific (1997).
[2] S. Boyd and L. Vandenberghe: Convex Optimization, Cambridge University Press (2004).
[3] G. B. Dantzig and M. N. Thapa: Linear Programming 2: Theory and Extensions, Springer (2003).
[4] M. Iri: Ippansenkeidaisu (in Japanese), Iwanami Shoten (2003).
[5] K. Kobayashi, K. Nakata and M. Kojima: A Conversion of an SDP Having Free Variables into the
Standard Form SDP, Computational Optimization and Applications, 36 (2007), 289-307.
[6] J. Nocedal and S. J. Wright: Numerical Optimization, Springer (2006).
[7] S. J. Wright: Primal-Dual Interior-Point Methods, SIAM Publications (1997).