© 1993 Society for Industrial and Applied Mathematics
SIAM J. OPTIMIZATION, Vol. 3, No. 2, pp. 382-397, May 1993
PARTIAL-UPDATE NEWTON METHODS FOR UNARY, FACTORABLE, AND PARTIALLY SEPARABLE OPTIMIZATION*
DONALD GOLDFARB† AND SIYUN WANG‡
Abstract. A modified Newton method for solving unary optimization problems that is based upon only partially updating an approximation to the Hessian matrix at each iteration is developed using rank-one updates.
Two partial updating criteria are presented: the first enables the method to retain the quadratic convergence
property of the classical Newton method, while the second enables it to achieve the superlinear convergence
property of quasi-Newton methods. Globally convergent modifications of the partial-update Newton method
are also given. Finally, the methods and proofs of their convergence are extended to partially separable and
factorable optimization problems.
Key words. unary function, partially separable function, factorable function, inexact Newton method, partial-update Newton method, rank-one update
AMS subject classifications. 65K05, 65K10
1. Introduction. Consider the unconstrained nonlinear optimization problem

(1.1)   \min_{x \in R^n} f(x).

Following McCormick and Sofer [20] we call problem (1.1) a unary optimization problem if f(x) takes the form

(1.2)   f(x) = \sum_{i=1}^{m} U_i(c_i(x)),

where, for i = 1, ..., m, c_i(x) = a_i^T x, a_i is a constant vector of size n x 1, and U_i(.) is a unary function, i.e., U_i(.) is a function of a single argument. Note that a separable function f(x) = \sum_{i=1}^{n} f_i(x_i), the objective function of the linear robust regression problem (e.g., see Byrd [1]), and the dual objective function of the entropy problem in information theory (e.g., see Eriksson [5]) are of the form (1.2).
Let us assume that the unary functions U_i(.), i = 1, ..., m, in (1.2) are all twice continuously differentiable. Then, using the chain rule of differentiation, the gradient vector and the Hessian matrix of function (1.2) are

(1.3)   \nabla f(x) = \sum_{i=1}^{m} \frac{dU_i(c_i(x))}{dc_i} a_i   and   \nabla^2 f(x) = \sum_{i=1}^{m} \frac{d^2 U_i(c_i(x))}{dc_i^2} a_i a_i^T,

respectively.
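To make (1.3) concrete, the following minimal sketch (illustrative code of ours, not the authors' implementation) evaluates the gradient and Hessian of a unary objective, given the vectors a_i and callables for U_i' and U_i'':

```python
import numpy as np

def unary_grad_hess(A, dU, d2U, x):
    """Gradient and Hessian of f(x) = sum_i U_i(a_i^T x), per (1.3).

    A is the m-by-n matrix whose ith row is a_i^T; dU[i] and d2U[i]
    are callables returning U_i'(c) and U_i''(c) for a scalar c.
    """
    c = A @ x                                              # c_i = a_i^T x
    g = A.T @ np.array([dU[i](ci) for i, ci in enumerate(c)])
    phi = np.array([d2U[i](ci) for i, ci in enumerate(c)])
    H = A.T @ (phi[:, None] * A)                           # A^T Phi(x) A
    return g, H
```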
For problem (1.1), if we assume that f(x) takes the form (1.2) and the Hessian matrix \nabla^2 f(x) is nonsingular for all x \in R^n, then a classical algorithm for finding a solution to (1.1) is Newton's method:
* Received by the editors January 8, 1990; accepted for publication (in revised form) March 4, 1992. This research was supported in part by National Science Foundation grants DMS 85-12277, DMS 91-06195, and CDR 84-21402, and by Office of Naval Research contract N-00014-87-K0214.
† Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027.
‡ Chemical Banking Corporation, New York, New York 10017.
Given an initial point x_0, for k = 0, 1, ..., compute a sequence of steps \{s_k\} and iterates \{x_k\} as follows: solve

(1.4)   \nabla^2 f(x_k) s_k = -\nabla f(x_k)

and set

x_{k+1} = x_k + s_k.

It is well known that, under suitable conditions, Newton's method has a local quadratic rate of convergence; i.e., there exists a positive constant \gamma such that

\|x_{k+1} - x^*\| \le \gamma \|x_k - x^*\|^2

if x_0 is sufficiently close to x^*, where x^* is a stationary point of (1.1).
On each iteration of Newton's method (1.4), after forming the Hessian matrix and gradient, we need to solve a system of linear equations, which takes O(n^3) operations. When the Hessian matrix has the special form (1.3), we can modify the above Newton method (1.4) to develop a more efficient algorithm for solving the unary optimization problem (1.1)-(1.2). To see this, let

\nabla^2 f(x) = \sum_{i=1}^{m} \phi_i(x) a_i a_i^T = A^T \Phi(x) A,

where \phi_i(x) = d^2 U_i(c_i(x))/dc_i^2, i = 1, ..., m, \Phi(x) = diag\{\phi_1(x), ..., \phi_m(x)\}, and A^T = [a_1, ..., a_m]. Clearly, as the Hessian matrix changes from step to step, only the diagonal matrix \Phi is affected. Suppose that only the jth diagonal elements of \Phi(x_k) and \Phi(x_{k-1}) differ at iteration k, i.e.,

A^T \Phi(x_k) A = A^T \Phi(x_{k-1}) A + (\phi_j(x_k) - \phi_j(x_{k-1})) a_j a_j^T,

or, equivalently,

\nabla^2 f(x_k) = \nabla^2 f(x_{k-1}) + (\phi_j(x_k) - \phi_j(x_{k-1})) a_j a_j^T.

Consequently, \nabla^2 f(x_k)^{-1} can be obtained from \nabla^2 f(x_{k-1})^{-1} (assuming that both \nabla^2 f(x_k) and \nabla^2 f(x_{k-1}) are invertible and \nabla^2 f(x_{k-1})^{-1} is given) by the well-known Sherman-Morrison rank-one updating formula [24]:

(H + \alpha w w^T)^{-1} = H^{-1} - \frac{\alpha H^{-1} w w^T H^{-1}}{1 + \alpha w^T H^{-1} w},

where H, w, and \alpha correspond to \nabla^2 f(x_{k-1}), a_j, and \phi_j(x_k) - \phi_j(x_{k-1}), respectively. Therefore, the next iterate

x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)

can be obtained in only O(n^2) arithmetic operations after evaluating \nabla f and \phi_i, i = 1, ..., m. If l entries of \Phi(x_k) and \Phi(x_{k-1}) differ, we can perform l rank-one updates and obtain the new inverse \nabla^2 f(x_k)^{-1} in O(l n^2) operations. A similar approach is used in polynomial interior point algorithms for linear and quadratic programming to reduce their complexity bounds (see, e.g., Karmarkar [18], Gonzaga [13], Goldfarb and Liu [12], and Ye [27]).
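As a sketch of the rank-one inverse update just described (an illustration with our own naming, not the authors' code), the Sherman-Morrison formula and its use when a single \phi_j changes look as follows:

```python
import numpy as np

def sherman_morrison(Hinv, alpha, w):
    """Return the inverse of H + alpha * w w^T, given Hinv = H^{-1}.

    Costs O(n^2); the denominator must be nonzero for the updated
    matrix to be invertible.
    """
    u = Hinv @ w                                    # H^{-1} w
    denom = 1.0 + alpha * (w @ u)
    return Hinv - (alpha / denom) * np.outer(u, u)

# If only phi_j changed between iterates (A[j] holds the row a_j^T):
# Hinv = sherman_morrison(Hinv, phi_new[j] - phi_old[j], A[j])
```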
The foregoing observation motivates us to consider the following modified Newton method, which we refer to as a partial-update Newton method. This method takes advantage of the computation done in previous steps and only partially updates the diagonal matrix \Phi, even though all of its diagonal elements might change at each iteration, using the rank-one updating formula given above to obtain the inverse of an approximation to the exact Hessian matrix at the current iterate. To be precise, we define a diagonal matrix \Phi^k = diag\{\hat\phi_1^k, ..., \hat\phi_m^k\}, a "working approximation" to \Phi(x_k) at step k, by \hat\phi_i^0 = \phi_i(x_0), i = 1, ..., m, and for k \ge 1,

\hat\phi_i^k = \hat\phi_i^{k-1}   if \phi_i(x_k) is "replaceable" by \hat\phi_i^{k-1},
\hat\phi_i^k = \phi_i(x_k)   otherwise,

for i = 1, ..., m. Then from \Phi^k we obtain a "working approximation" H^k = A^T \Phi^k A to \nabla^2 f(x_k), or equivalently,

H^k = \nabla^2 f(x_k) + E_k,

where E_k = \sum_{i=1}^{m} (\hat\phi_i^k - \phi_i(x_k)) a_i a_i^T, and compute

(1.5)   s_k = -(H^k)^{-1} \nabla f(x_k) = -(\nabla^2 f(x_k) + E_k)^{-1} \nabla f(x_k),

instead of computing s_k as in (1.4).
Since \nabla^2 f(x_k) s_k = -\nabla f(x_k) + r_k, where

(1.6)   r_k = E_k (\nabla^2 f(x_k) + E_k)^{-1} \nabla f(x_k),

the partial-update Newton method can be viewed as a special type of inexact Newton method. In [2], Dembo, Eisenstat, and Steihaug analyzed the convergence properties of such methods under various assumptions on the forcing sequence \{\eta_k\} of bounds on the relative residuals in (1.6), where \|r_k\| / \|\nabla f(x_k)\| \le \eta_k. Also note that, if \hat\phi_i^k = \phi_i(x_0), i = 1, ..., m, for all k \ge 1, then the partial-update Newton method reduces to the simplified Newton method x_{k+1} = x_k - \nabla^2 f(x_0)^{-1} \nabla f(x_k) for all k \ge 1.
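The following is a minimal sketch of the unmodified local method built from (1.5): it maintains the inverse of the working approximation by Sherman-Morrison updates (reusing `sherman_morrison` from the previous sketch), with `replaceable` standing in for whichever criterion of Section 2 is chosen. It is purely illustrative, with no globalization or singularity safeguards, and all names are ours:

```python
import numpy as np

def partial_update_newton(A, dU, d2U, x0, replaceable, tol=1e-5, kmax=100):
    m, n = A.shape
    x = x0.astype(float)
    c = A @ x
    phi_hat = np.array([d2U[i](c[i]) for i in range(m)])   # Phi^0 = Phi(x_0)
    Hinv = np.linalg.inv(A.T @ (phi_hat[:, None] * A))     # (H^0)^{-1}
    for k in range(1, kmax + 1):
        g = A.T @ np.array([dU[i](c[i]) for i in range(m)])
        if np.linalg.norm(g) <= tol:
            break
        x = x - Hinv @ g                 # step (1.5) with the current H^k
        c = A @ x
        g = A.T @ np.array([dU[i](c[i]) for i in range(m)])
        for i in range(m):               # partially update Phi and H^{-1}
            phi = d2U[i](c[i])
            if not replaceable(i, k, phi, phi_hat[i], np.linalg.norm(g)):
                Hinv = sherman_morrison(Hinv, phi - phi_hat[i], A[i])
                phi_hat[i] = phi
    return x
```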
In the first subsection of §2 we prove the local convergence of the conceptual partial-update Newton algorithm, and in the second subsection we establish the rates of convergence for two variants of the method determined by different partial-update criteria. Our analysis is closely related to the analysis of Dembo, Eisenstat, and Steihaug [2]. In §3, two globally convergent modifications of the partial-update Newton method are presented and some preliminary numerical results obtained using these methods are given. The final section is devoted to an extension of the partial-update Newton method to partially separable and factorable functions.
2. Local convergence results. In this section we assume:
(A1) there exists a point x^* \in R^n with \nabla f(x^*) = 0;
(A2) \nabla^2 f(x) = A^T \Phi(x) A and \nabla^2 f(x^*) is nonsingular, where A^T = [a_1, ..., a_m] is an n x m matrix with rank n, a_i is the ith column of A^T, i = 1, ..., m, and \Phi(x) = diag\{\phi_1(x), ..., \phi_m(x)\};
(A3) \Phi(x) is continuous in a neighborhood of x^*.
We use the Euclidean vector norm and the matrix norm induced from it, both of which we denote by \|.\|, and we define \beta = \|\nabla^2 f(x^*)^{-1}\|.
2.1. Local convergence. Here we show that, under assumptions (A1)-(A3), the partial-update Newton method is locally linearly convergent. First note that, under (A1)-(A3), for any \epsilon > 0 and \tau > 0 there exists a \delta_1 > 0 such that

(2.1)   \|\Phi(x) - \Phi(y)\| \le \tau

and

(2.2)   \|\nabla f(z) - \nabla f(x^*) - \nabla^2 f(x^*)(z - x^*)\| \le \epsilon \|z - x^*\|,

provided that \max\{\|x - x^*\|, \|y - x^*\|, \|z - x^*\|\} \le \delta_1.
The next result is an immediate consequence of the Perturbation Lemma in Ortega and Rheinboldt [22, Theorem 2.2.3].
LEMMA 2.1. Under assumptions (A1)-(A3), for 0 < \epsilon < 1, there exists a positive constant \delta_2 such that

(2.3)   \nabla^2 f(x_k) + E_k is nonsingular

and

(2.4)   \|(\nabla^2 f(x_k) + E_k)^{-1}\| \le 2\beta,

provided that \|x_k - x^*\| \le \delta_2 and \|\Phi^k - \Phi(x_k)\| \le \tau, where \tau = \epsilon / (2\beta\|A\|^2) < 1/(8\beta\|A\|^2).
THEOREM 2.2. Let assumptions (A1)-(A3) hold. Then there exists \delta > 0 such that, if \|x_0 - x^*\| \le \delta, the sequence \{x_k\} generated by the partial-update Newton method converges to x^*. Moreover, the convergence is linear, i.e.,

(2.5)   \|x_{k+1} - x^*\| \le t \|x_k - x^*\|,

where 0 < t < 1.
Proof. Let \epsilon > 0 be such that 0 < t \equiv 4\beta\epsilon < 1, let \tau = \epsilon / (2\beta\|A\|^2), and let \delta = \min\{\delta_1, \delta_2\}, so that (2.1)-(2.4) hold. Assuming that \|x_0 - x^*\| \le \delta, we prove (2.5) by induction. Since E_0 is the zero matrix, it is easy to verify that (2.5) is true for k = 0. Now supposing that (2.5) is true for k \le N - 1, then \|x_l - x^*\| \le t^l \|x_0 - x^*\| \le \delta, 0 \le l \le N. Hence, since |\hat\phi_i^N - \phi_i(x_N)| = |\phi_i(x_{l_i}) - \phi_i(x_N)| for some l_i, where 0 \le l_i \le N, it follows from (2.1) that

\|\Phi^N - \Phi(x_N)\| \le \max_{0 \le l \le N, 1 \le i \le m} |\phi_i(x_l) - \phi_i(x_N)| \le \tau.

Therefore, (2.2)-(2.4) hold with k = N, and

(2.6)   x_{N+1} - x^* = (\nabla^2 f(x_N) + E_N)^{-1} [(\nabla^2 f(x_N) + E_N - \nabla^2 f(x^*))(x_N - x^*) - (\nabla f(x_N) - \nabla f(x^*) - \nabla^2 f(x^*)(x_N - x^*))].

The result then follows by taking norms and using the triangle inequality. ∎
2.2. Rates of convergence. In this section we also assume
(A4) \Phi(x) is Lipschitz continuous at x^* with Lipschitz constant L.
It then follows that \nabla^2 f(x) is Lipschitz continuous at x^* with Lipschitz constant \|A\|^2 L and, from Ortega and Rheinboldt [22, Theorem 3.2.12], that (2.2) can be strengthened to

(2.7)   \|\nabla f(x_k) - \nabla f(x^*) - \nabla^2 f(x^*)(x_k - x^*)\| \le \frac{L}{2} \|A\|^2 \|x_k - x^*\|^2

for \|x_k - x^*\| sufficiently small.
We now give two "replacement" criteria for the partial-update Newton method. The first one retains the local quadratic convergence property of the classical Newton method. Under the second criterion our partial-update Newton method converges superlinearly, and there is a trade-off between the rate of convergence and the number of rank-one updates.
Criterion 1. For i = 1, ..., m, \phi_i(x_k) is replaceable by \hat\phi_i^{k-1} if

|\phi_i(x_k) - \hat\phi_i^{k-1}| \le \eta \|\nabla f(x_k)\|,

where 0 < \eta < 1. Note that Criterion 1 essentially says to keep \hat\phi_i^{k-1} as long as \|\nabla f(x_k)\| is not too small relative to |\phi_i(x_k) - \hat\phi_i^{k-1}|. Therefore, only as x_k \to x^*, i.e., only as x_k becomes very close to x^*, is \hat\phi_i^k set equal to \phi_i(x_k) if \eta is close to 1, say \eta = 0.99.
Criterion 2. For i = 1, ..., m, \phi_i(x_k) is replaceable by \hat\phi_i^{k-1} if k < p or

|\phi_i(x_k) - \hat\phi_i^{k-1}| \le \max_{k-p+1 \le j \le k} \{|\phi_i(x_j) - \phi_i(x_{j-1})|\},

where p is a given positive integer.
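In code, the two tests (as reconstructed above) are one-liners; `recent_diffs` is assumed to hold the last p consecutive changes |\phi_i(x_j) - \phi_i(x_{j-1})| for the given i, a hypothetical bookkeeping detail of ours:

```python
def criterion1(phi_new, phi_stored, grad_norm, eta=0.99):
    # keep the stored value while the deviation from it is small
    # relative to the gradient norm at x_k
    return abs(phi_new - phi_stored) <= eta * grad_norm

def criterion2(k, p, phi_new, phi_stored, recent_diffs):
    # keep the stored value early on, or while the deviation from it
    # does not exceed the largest recent consecutive change
    # (recent_diffs must be nonempty once k >= p)
    return k < p or abs(phi_new - phi_stored) <= max(recent_diffs)
```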
In order to characterize the rates of convergence for variants of the partial-update Newton method that use these two replacement criteria, we need the following lemma.
LEMMA 2.3 (Ortega and Rheinboldt [22, Theorems 9.2.8 and 9.2.9]). Let the sequence \{x_k\}, which is generated by an iterative process, converge to a limit x^*. Furthermore, let \gamma_0, \gamma_1, ..., \gamma_t be nonnegative constants. If there is a k_0 \ge 0 such that

\|x_{k+1} - x^*\| \le \|x_k - x^*\| \sum_{i=0}^{t} \gamma_i \|x_{k-i} - x^*\|   \forall k \ge k_0,

then the iterates \{x_k\} converge to x^* with R-order at least r_t, where r_t is the unique positive root of \tau^{t+1} - \tau^t - 1 = 0. Moreover, r_t \in (1, 2), r_{t+1} < r_t, and \lim_{t \to \infty} r_t = 1.
THEOREM 2.4. Let assumptions (A1), (A2), and (A4) hold and let \{x_k\} be the sequence of iterates generated by the partial-update Newton method. Then
(1) \{x_k\} is locally quadratically convergent to x^* if Criterion 1 is used, and
(2) \{x_k\} is locally superlinearly convergent to x^* with R-order at least r_p, where r_p is the unique positive root of \tau^{p+1} - \tau^p - 1 = 0, if Criterion 2 is used. Moreover, 1 < r_p < 2, r_{p+1} < r_p, and \lim_{p \to \infty} r_p = 1.
Proof. (1) Since, under Criterion 1, \|\Phi^k - \Phi(x_k)\| \le \eta \|\nabla f(x_k)\|, conclusion (1) follows from Theorem 3.4 in [2].
(2) Under Criterion 2, we have from (2.5) and (A4) that

(2.8)   \|\Phi^k - \Phi(x_k)\| \le \max_{k-p+1 \le j \le k} \{\|\Phi(x_j) - \Phi(x^*)\| + \|\Phi(x_{j-1}) - \Phi(x^*)\|\} \le 2L \|x_{k-p} - x^*\|

if k \ge p. Let \|x_0 - x^*\| \le \delta and let \delta be sufficiently small so that Theorem 2.2 holds and \|\Phi(x) - \Phi(x^*)\| \le L \|x - x^*\| for \|x - x^*\| \le \delta. It then follows from (2.6)-(2.8) that

\|x_{k+1} - x^*\| \le \|x_k - x^*\| (3\beta L \|A\|^2 \|x_k - x^*\| + 4\beta L \|A\|^2 \|x_{k-p} - x^*\|)   \forall k \ge p.

Conclusion (2) then follows from Lemma 2.3. ∎
Under Criterion 2, the parameter p determines the average number of rank-one updates required at each iteration, as well as the R-order of the convergence of the iterates to x^*. This fact is established by the following proposition.
PROPOSITION 2.5. On the average, the number of rank-one updates at each iteration is at most m/p if Criterion 2 is used.
Proof. It follows from Criterion 2 that \hat\phi_i^0 = \phi_i(x_0) for all i, i = 1, ..., m, and that after any iteration k in which \phi_i(x_k) is not replaceable by \hat\phi_i^{k-1}, i.e., \hat\phi_i^k is set equal to \phi_i(x_k), \phi_i(x_l) is replaceable by \hat\phi_i^{l-1} during at least the next p iterations l = k+1, ..., k+p. The proposition is an immediate consequence of this observation. ∎
If we use Criterion 2 and take p = m, then each iteration of the partial-update Newton method requires on the average at most one rank-one update, and hence just O(n^2) operations, the same amount of work as in quasi-Newton methods. In [6] Gay proved that Broyden's so-called "good" and "bad" rank-one update quasi-Newton methods converge superlinearly to a stationary point x^* of f(x) with R-order at least 2^{1/(2n)}. When p = m, the order of convergence of our partial-update Newton method r_m > 2^{1/(2n)} if n \le m \le cn and 1 \le c \ll n. Thus, in this special case, the lower bound on the efficiency of our method is better than the lower bound for either of Broyden's methods.
Note that, under Criterion 2, each \hat\phi_i, i = 1, ..., m, stays fixed for at least p iterations, and all may not get updated at the same time. If all of the \hat\phi_i stay fixed for exactly p iterations and they are all updated at the same time, then the partial-update Newton method under Criterion 2 reduces to the p-step method

(2.9)   x_{k,0} = x_k,   x_{k,i+1} = x_{k,i} - \nabla^2 f(x_k)^{-1} \nabla f(x_{k,i}),   i = 0, 1, ..., p-1,   x_{k+1} = x_{k,p},

considered in Traub [26], in which each major iteration consists of p simplified Newton steps. Shamanskii [23] considered the p-step Newton-like method obtained by replacing the \nabla^2 f(x_k) in (2.9) by the operator J_k whose jth column is

J_k e_j = (\nabla f(x_k + h_k e_j) - \nabla f(x_k)) / h_k

for j = 1, ..., n. (Here e_j is the jth column of the identity matrix and h_k is of order \|\nabla f(x_k)\|.)
3. Globally convergent implementations. In the first part of this section we present two modifications of the partial-update Newton method to make it globally convergent. In the second part we give some preliminary numerical results.
3.1. Globally convergent modifications. From the local convergence analysis of §2, we know that the partial-update Newton methods considered there converge rapidly to a stationary point x^* of f(x) once they get close enough to such a point. However, if these methods do not start near enough to x^*, they can fail to converge. Also, if the partially updated Hessian A^T \Phi^k A is singular, these methods are not well defined. Therefore, as in the case of Newton's method, it is necessary to modify our partial-update Newton methods so that they converge globally. In this section we propose modifications that utilize the special structure of A^T \Phi^k A to compute a positive definite approximation A^T \tilde\Phi^k A so that a descent direction is obtained.
Consider the following Wolfe-type linesearch algorithm.
ALGORITHM 3.1. For given \alpha_1 and \alpha_2, where \alpha_1 \in (0, 1/2) and \alpha_2 \in (\alpha_1, 1), and a given point x_0, determine x_{k+1}, k = 0, 1, 2, ..., as follows: If converged, stop. Otherwise, compute the descent direction d_k = -(A^T \tilde\Phi^k A)^{-1} \nabla f(x_k), where A^T \tilde\Phi^k A is nonsingular, and choose a steplength \lambda_k > 0 such that
(3.1a)   f(x_k + \lambda_k d_k) \le f(x_k) + \alpha_1 \lambda_k \nabla f(x_k)^T d_k

and

(3.1b)   \nabla f(x_k + \lambda_k d_k)^T d_k \ge \alpha_2 \nabla f(x_k)^T d_k,

and set x_{k+1} = x_k + \lambda_k d_k.
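A step acceptance test for Algorithm 3.1 can be written directly from (3.1a)-(3.1b); the sketch below (our illustration) checks a trial steplength, which a surrounding bracketing routine would adjust until both conditions hold:

```python
def wolfe_ok(f, grad, x, d, lam, a1, a2):
    """True iff steplength lam satisfies (3.1a) and (3.1b) along d."""
    gTd = grad(x) @ d                    # directional derivative, < 0
    decrease = f(x + lam * d) <= f(x) + a1 * lam * gTd         # (3.1a)
    curvature = grad(x + lam * d) @ d >= a2 * gTd              # (3.1b)
    return decrease and curvature
```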
Let the smallest and largest eigenvalues of a matrix H be denoted by \lambda_{min}(H) and \lambda_{max}(H), respectively.
THEOREM 3.1. Let \Phi(x) be continuous on an open set D, let the level set S = \{x : f(x) \le f(x_0)\} be a compact subset of D for a given x_0 \in D, and assume that \nabla f(x_k) \ne 0 for all k \ge 0 and that f has a finite number of stationary points in S. Then if there exist constants \mu_1 and \mu_2, where 0 < \mu_1 \le \mu_2, such that \mu_1 \le \lambda_{min}(A^T \tilde\Phi^k A) \le \lambda_{max}(A^T \tilde\Phi^k A) \le \mu_2 for all k \ge 0, the sequence of iterates \{x_k\} generated by Algorithm 3.1 converges to some x^* \in S with \nabla f(x^*) = 0. Moreover, the rate of convergence is at least R-linear if \nabla^2 f(x^*) is invertible.
Proof. From a fairly standard argument (e.g., see Dennis and Schnabel [3], Goldfarb [11], or Moré and Sorensen [21]), x_k \to x^* \in S with \nabla f(x^*) = 0.
Let \delta_0 > 0 and k_0 be such that \Phi(x) is continuous on the closed ball B = \bar B(x^*, \delta_0) \subset S and x_k \in B for all k \ge k_0. Then, since

\|\nabla f(x_k)\| \|d_k\| \ge -\nabla f(x_k)^T d_k = d_k^T (A^T \tilde\Phi^k A) d_k \ge \mu_1 \|d_k\|^2,

we have \mu_1 \|d_k\| \le \|\nabla f(x_k)\|, and, from (3.1b),

(1 - \alpha_2)(-\nabla f(x_k)^T d_k) \le (\nabla f(x_k + \lambda_k d_k) - \nabla f(x_k))^T d_k \le \|\nabla f(x_k + \lambda_k d_k) - \nabla f(x_k)\| \|d_k\|.

Hence we have from the mean-value theorem (e.g., see [22]) that

(3.2)   \lambda_k \ge \frac{(1 - \alpha_2) \mu_1}{\gamma_0}   \forall k \ge k_0,

where \gamma_0 = \max\{\|\nabla^2 f(x)\| : x \in B\}. Combining (3.1a) and (3.2), we see that

f(x_{k+1}) \le f(x_k) - c \|\nabla f(x_k)\|^2   \forall k \ge k_0

for some constant c > 0, and the result that the rate of convergence is at least R-linear follows from Theorem 14.1.6 in [22]. ∎
The simplest modification that ensures that the conditions on the eigenvalues of A^T \tilde\Phi^k A required by Theorem 3.1 are satisfied, assuming that \Phi(x) is continuous on S, is to define the modified "working approximation" \tilde\Phi^k = diag\{\tilde\phi_1^k, ..., \tilde\phi_m^k\} to \Phi(x_k) by the following modifications.
Modification 1. For i = 1, ..., m, set \tilde\phi_i^0 = \max\{\theta, \phi_i(x_0)\}, where \theta is a prescribed small positive constant, and at step k, k \ge 1, define \tilde\phi_i^k by the following criteria.
Criterion 1'.

\tilde\phi_i^k = \tilde\phi_i^{k-1}   if |\phi_i(x_k) - \tilde\phi_i^{k-1}| \le \eta \|\nabla f(x_k)\|,
\tilde\phi_i^k = \max\{\theta, \phi_i(x_k)\}   otherwise,

where 0 < \eta < 1.
Criterion 2'.

\tilde\phi_i^k = \tilde\phi_i^{k-1}   if k < p or |\phi_i(x_k) - \tilde\phi_i^{k-1}| \le \max_{k-p+1 \le j \le k} \{|\phi_i(x_j) - \phi_i(x_{j-1})|\},
\tilde\phi_i^k = \max\{\theta, \phi_i(x_k)\}   otherwise,

where p is a given positive integer.
Modification 1 may be overly cautious since A^T \Phi^k A can be positive definite even if some diagonal elements of \Phi^k are negative. Consequently, we now propose an alternate modification which ensures that the modified "working approximation" A^T \tilde\Phi^k A is exactly equal to the unmodified working approximation A^T \Phi^k A for all k whenever A^T \Phi^k A is positive definite for all k. The modification is based on a method of McCormick [19, §§7.3-7.4] for computing the positive part of a symmetric matrix given in dyadic form.
Modification 2. For i = 1, ..., m, determine \sigma_i^0 so that A^T \tilde\Phi^0 A = A^T (\Phi(x_0) + E_0) A is positive definite, where E_0 = diag\{\sigma_1^0, ..., \sigma_m^0\}, and at step k, k \ge 1, set

\tilde\phi_i^k = \hat\phi_i^k + \sigma_i^k,

where \hat\phi_i^k is defined by the following.
Criterion 1''.

\hat\phi_i^k = \hat\phi_i^{k-1}   if |\phi_i(x_k) - \hat\phi_i^{k-1}| \le \eta \|\nabla f(x_k)\|,
\hat\phi_i^k = \phi_i(x_k)   otherwise,

where 0 < \eta < 1.
Criterion 2''.

\hat\phi_i^k = \hat\phi_i^{k-1}   if k < p or |\phi_i(x_k) - \hat\phi_i^{k-1}| \le \max_{k-p+1 \le j \le k} \{|\phi_i(x_j) - \phi_i(x_{j-1})|\},
\hat\phi_i^k = \phi_i(x_k)   otherwise,

where p is a given positive integer, and E_k = diag\{\sigma_1^k, ..., \sigma_m^k\} is determined as described below to guarantee that A^T \tilde\Phi^k A is positive definite.
To specify how the \sigma_i^k are to be chosen in Modification 2, we define the sets U^k = \{i : \hat\phi_i^k = \tilde\phi_i^{k-1}, i = 1, ..., m\}, V^k = \{i : \hat\phi_i^k > \tilde\phi_i^{k-1}, i = 1, ..., m\}, and W^k = \{i : \hat\phi_i^k < \tilde\phi_i^{k-1}, i = 1, ..., m\}. If we set \sigma_i^k = 0 for i \in U^k \cup V^k and define \Psi^k as

\Psi^k = A^T \tilde\Phi^{k-1} A + \sum_{i \in V^k} (\phi_i(x_k) - \tilde\phi_i^{k-1}) a_i a_i^T,

then it follows from the definition of A^T \tilde\Phi^k A under Criteria 1'' or 2'' that

(3.3)   A^T \tilde\Phi^k A = \Psi^k + \sum_{j \in W^k} \bar\sigma_j^k a_j a_j^T,

where \bar\sigma_j^k = \phi_j(x_k) + \sigma_j^k - \tilde\phi_j^{k-1}. If A^T \tilde\Phi^{k-1} A is positive definite, then so is \Psi^k. Moreover, if t is any index in W^k,

\Psi^k + \bar\sigma_t^k a_t a_t^T = (\Psi^k)^{1/2} (I + \bar\sigma_t^k (\Psi^k)^{-1/2} a_t a_t^T (\Psi^k)^{-1/2}) (\Psi^k)^{1/2}

is positive definite if and only if \bar\sigma_t^k > -(1 / (a_t^T (\Psi^k)^{-1} a_t)), since I + \bar\sigma_t^k (\Psi^k)^{-1/2} a_t a_t^T (\Psi^k)^{-1/2} has all unit eigenvalues except for one which equals 1 + \bar\sigma_t^k a_t^T (\Psi^k)^{-1} a_t. Therefore, we can determine \sigma_t^k, t \in W^k, recursively as follows:
Let t be any index in W^k and consider \Psi^k + (\phi_t(x_k) + \sigma_t^k - \tilde\phi_t^{k-1}) a_t a_t^T. If we choose

(3.4)   \sigma_t^k = 0   if (\phi_t(x_k) - \tilde\phi_t^{k-1}) a_t^T (\Psi^k)^{-1} a_t + 1 > \theta,
        \sigma_t^k = \tilde\phi_t^{k-1} - \phi_t(x_k) + \min\{0, \gamma_t^k\}   otherwise,

where \gamma_t^k = (\theta - 1) / (a_t^T (\Psi^k)^{-1} a_t), a_t^T (\Psi^k)^{-1} a_t > 0, and \theta is a prescribed small positive constant, and update

(3.5)   \Psi^k := \Psi^k + (\phi_t(x_k) + \sigma_t^k - \tilde\phi_t^{k-1}) a_t a_t^T   and   W^k := W^k \setminus \{t\},

then the updated \Psi^k is positive definite, and we can repeat the above procedure until W^k is the empty set. Note that the term \min\{0, \gamma_t^k\} in (3.4) not only ensures that \Psi^k remains positive definite during its recursive computation but, more important, that \|A^T \tilde\Phi^k A\| is uniformly bounded above for all k.
Choosing E_k by the above procedure ensures that A^T \tilde\Phi^k A is positive definite. Initially, we need to determine an E_0 so that A^T \tilde\Phi^0 A = A^T (\Phi(x_0) + E_0) A is positive definite. If we write A^T (\Phi(x_0) + E_0) A as

A^T (\Phi(x_0) + E_0) A = A^T A + \sum_{i=1}^{m} (\phi_i(x_0) + \sigma_i^0 - 1) a_i a_i^T,

this can be accomplished by applying the above procedure with \phi_t(x_k) replaced by \phi_i(x_0) and \tilde\phi_t^{k-1} replaced by 1.
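The test that drives this recursion follows from the eigenvalue identity quoted above: \Psi + \bar\sigma a a^T stays positive definite exactly when 1 + \bar\sigma a^T \Psi^{-1} a > 0. A hypothetical sketch of ours (with margin theta, and a dense solve in place of the triangular solves of §3.2):

```python
import numpy as np

def rank_one_keeps_pd(Psi, sigma_bar, a, theta=1e-6):
    """True iff Psi + sigma_bar * a a^T keeps its one perturbed
    eigenvalue factor above theta; Psi is assumed symmetric
    positive definite."""
    y = np.linalg.solve(Psi, a)          # Psi^{-1} a
    return 1.0 + sigma_bar * (a @ y) > theta
```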
Modification 2 has several desirable properties. First, Algorithm 3.1 using Criterion 2'' still needs, on the average, at most m/p rank-one updates at each iteration. Second, \tilde\Phi^k = \Phi^k if A^T \Phi^k A is positive definite, and hence \tilde\Phi^k = \Phi^k, where \Phi^k is defined by Criterion 1 or 2, whenever A^T \Phi^k A is positive definite for all k. To verify this, we just need to show that \sigma_i^k = 0, for i = 1, ..., m, under our selection rule, if A^T \Phi^k A is positive definite. Since, for any s \in W^k,

A^T \Phi^k A = \Psi^k + (\phi_s(x_k) - \hat\phi_s^{k-1}) a_s a_s^T + \sum_{i \in W^k \setminus \{s\}} (\phi_i(x_k) - \hat\phi_i^{k-1}) a_i a_i^T,

it follows from the negative semidefiniteness and positive definiteness of \sum_{i \in W^k \setminus \{s\}} (\phi_i(x_k) - \hat\phi_i^{k-1}) a_i a_i^T and A^T \Phi^k A, respectively, that \Psi^k + (\phi_s(x_k) - \hat\phi_s^{k-1}) a_s a_s^T is positive definite, which implies that \sigma_s^k = 0. Updating \Psi^k and arguing inductively, one can conclude that \sigma_i^k = 0, i = 1, ..., m. Finally, as we point out in the next section, the extra computation required to implement Modification 2 is moderate.
Since \tilde\phi_i^k \ge \theta > 0 for all i and k under Modification 1, so that \lambda_{min}(A^T \tilde\Phi^k A) \ge \theta \lambda_{min}(A^T A) > 0, we then have the following theorem, which is an immediate consequence of Theorem 3.1.
THEOREM 3.2. Under the assumptions of Theorem 3.1, Algorithm 3.1, where \tilde\Phi^k is defined by either Modification 1 or 2, is globally and R-linearly convergent.
3.2. Implementation. At the kth step of Algorithm 3.1 for k \ge 1, the main computational effort involves solving (A^T \tilde\Phi^k A) d_k = -\nabla f(x_k), where

(3.6)   A^T \tilde\Phi^k A = A^T \tilde\Phi^{k-1} A + \sum_{j \in J^k} \delta_j^k a_j a_j^T,

J^k = \{j : \phi_j(x_k) is not "replaceable" by \tilde\phi_j^{k-1} at iteration k\}, and

\delta_j^k = \max\{\phi_j(x_k), \theta\} - \tilde\phi_j^{k-1}

if Modification 1 is used, and

\delta_j^k = \phi_j(x_k) + \sigma_j^k - \tilde\phi_j^{k-1}

if Modification 2 is used. Assuming that the Cholesky factorization L_{k-1} L_{k-1}^T of A^T \tilde\Phi^{k-1} A is available, the Cholesky factorization L_k L_k^T of A^T \tilde\Phi^k A can be obtained by applying a numerically stable rank-one updating procedure, such as Method C2 in Gill et al. [8] or Method 2 in Goldfarb [10], |J^k| times. If Modification 2 is used, the updates corresponding to indices j \in V^k \subseteq J^k are performed first to give the Cholesky factors of the initial matrix \Psi^k. The remaining indices in J^k and the corresponding updates of the Cholesky factors are then computed using the recursive procedure (3.4), (3.5) to determine E_k. Note that the extra cost of computing E_k is just the cost of solving |W^k| triangular systems of linear equations.
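For completeness, here is a textbook rank-one Cholesky update (a standard algorithm, not the specific Methods C2 or 2 cited above); it handles the positive case \delta_j^k > 0 via the call `chol_update(L, np.sqrt(delta) * a)`, while downdates require the more careful procedures of [8], [10]:

```python
import numpy as np

def chol_update(L, v):
    """Overwrite lower-triangular L so that L L^T becomes
    L L^T + v v^T (positive update only); O(n^2) work."""
    v = v.astype(float).copy()
    n = L.shape[0]
    for j in range(n):
        r = np.hypot(L[j, j], v[j])
        c, s = r / L[j, j], v[j] / L[j, j]
        L[j, j] = r
        if j + 1 < n:
            L[j+1:, j] = (L[j+1:, j] + s * v[j+1:]) / c
            v[j+1:] = c * v[j+1:] - s * L[j+1:, j]
    return L
```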
We now present some preliminary numerical test results. All algorithms were coded in FORTRAN, compiled with the F77 SUN FORTRAN compiler, and run in double precision arithmetic on a SUN SPARC. We used the termination condition \|\nabla f(x_k)\| \le 10^{-5} \max\{1, \|x_k\|\} and the linesearch algorithm proposed in Dennis and Schnabel [3] with the parameter settings \alpha_1 = 10^{-4} and \alpha_2 = 0.1 in the linesearch conditions (3.1a) and (3.1b). The test functions that we used were
(1) the extended Powell singular (EPS) function in n variables:

f(x) = \sum_{i=1}^{n/4} [(x_{4i-3} + 10 x_{4i-2})^2 + 5 (x_{4i-1} - x_{4i})^2 + (x_{4i-2} - 2 x_{4i-1})^4 + 10 (x_{4i-3} - x_{4i})^4],

starting at x_0 = (3, -1, 0, 1, 3, -1, 0, 1, ...);
(2) the extended Rosenbrock (ER) function in n variables:

f(x) = \sum_{i=1}^{n/2} [100 (x_{2i} - x_{2i-1}^2)^2 + (1 - x_{2i-1})^2]
     = \sum_{i=1}^{n/2} [100 x_{2i}^2 + 100 x_{2i-1}^4 + (1 - x_{2i-1})^2 + (200/3) x_{2i}^3 - (100/3)(x_{2i-1} + x_{2i})^3 - (100/3)(x_{2i} - x_{2i-1})^3],

starting at x_0 = (-1.2, 1, -1.2, 1, ...);
(3) the extended Rosenbrock cliff (ERC) function in n variables:

f(x) = \sum_{i=1}^{n/2} [((x_{2i-1} - 3)/100)^2 + (x_{2i} - x_{2i-1}) + \exp(20 (x_{2i-1} - x_{2i}))],

starting at x_0 = (0, -1, 0, -1, ...);
(4) the variably dimensioned (VD) function in n variables:

f(x) = \sum_{i=1}^{n} (x_i - 1)^2 + [\sum_{i=1}^{n} i (x_i - 1)]^2 + [\sum_{i=1}^{n} i (x_i - 1)]^4,

starting at x_0 = (\xi_i), where \xi_i = 1 - i/n, i = 1, ..., n; and
(5) the Broyden tridiagonal (BT) function in n variables:

f(x) = \sum_{i=1}^{n} [(3 - 2 x_i) x_i - x_{i-1} - 2 x_{i+1} + 1]^2
     = \sum_{i=1}^{n} [4 x_i^4 - (2/3)(4 x_i - x_{i-1} - 2 x_{i+1} + 1)^3 - (2/3)(2 x_i - x_{i-1} - 2 x_{i+1} + 1)^3 + (4/3)(3 x_i - x_{i-1} - 2 x_{i+1} + 1)^3 + (3 x_i - x_{i-1} - 2 x_{i+1} + 1)^2],

where x_0 = x_{n+1} = 0 in the summand, starting at x_0 = (-1, -1, -1, -1, ...). Note that the second expressions for the ER and BT functions are in unary form.
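The unary rewriting of the ER function can be checked numerically; the short script below (our own sanity check, not part of the paper) compares the two expressions at random points:

```python
import numpy as np

def er_standard(x):
    xo, xe = x[0::2], x[1::2]            # x_{2i-1}, x_{2i}
    return np.sum(100.0 * (xe - xo**2)**2 + (1.0 - xo)**2)

def er_unary(x):
    xo, xe = x[0::2], x[1::2]
    return np.sum(100.0 * xe**2 + 100.0 * xo**4 + (1.0 - xo)**2
                  + (200.0 / 3.0) * xe**3
                  - (100.0 / 3.0) * (xo + xe)**3
                  - (100.0 / 3.0) * (xe - xo)**3)

x = np.random.randn(8)
assert abs(er_standard(x) - er_unary(x)) <= 1e-6 * max(1.0, abs(er_standard(x)))
```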
The test results are summarized in Table 1. The quantities N_i/N_f/N_upd in each cell are, respectively, the numbers of iterations, function evaluations, and rank-one updates performed by the algorithm; the number in parentheses is the CPU time in seconds. The table presents results for the extended Powell singular, the extended Rosenbrock, the extended Rosenbrock cliff, the variably dimensioned, and the Broyden tridiagonal functions for n (the number of variables) equal to 40, 80, and 160. The column headings "PU1-1'" and "PU1-2'" refer, respectively, to the partial-update Newton method under Criteria 1' and 2' in Modification 1, while the headings "PU2-1''" and "PU2-2''" refer, respectively, to the method under Criteria 1'' and 2'' in Modification 2. In these methods we used p = \lfloor\sqrt{m}\rfloor, \eta = 0.99, and \theta = 10^{-6}. The last two columns, with the headings "Newton" and "p-Newton," give results for the modified Newton method of Gill and Murray [9] and the p-step modified Newton method of Traub [26] and Shamanskii [23], respectively, using the same termination criterion and linesearch as the other algorithms.
The test results for Modifications 1 and 2 were identical (except for the CPU time) for problems EPS, ERC, and VD because they were convex. Also, due to the structure of the extended Powell singular function, starting at the chosen x_0, it is not difficult to see that, for i = 2, ..., n/4,

\phi_{4i-3}(x_k) = \phi_1(x_k),   \phi_{4i-2}(x_k) = \phi_2(x_k),   \phi_{4i-1}(x_k) = \phi_3(x_k),   and   \phi_{4i}(x_k) = \phi_4(x_k)

at any iteration k, k \ge 1. Hence the number of rank-one updates in each iteration will be an integer multiple of n/4. The numbers N_upd associated with the extended Rosenbrock and the extended Rosenbrock cliff functions can be similarly explained.
These preliminary results show that, although the partial-update methods take more iterations and function evaluations than Gill and Murray's modified Newton method, they take less time to solve some types of problems than do modified Newton methods. In our test set this was true for the EPS, ERC, and VD sets of problems, all of which were convex. Also, method PU2-2'' took the least time to solve the largest instance of problem BT.
4. Extension to partially separable and factorable optimization. The goal of this final section is to extend our partial-update Newton method to solve partially separable and factorable minimization problems. Partially separable problems are defined by Griewank and Toint [14], [15] as problems where the objective function has a decomposition of the form

(4.1)   f(x) = \sum_{i=1}^{m} f_i(x),   x \in R^n,
TABLE 1
Numbers in cells are N_i/N_f/N_upd, with CPU seconds in parentheses.

| Problem | n   | PU1-1'                 | PU1-2'               | PU2-1''            | PU2-2''             | Newton           | p-Newton          |
|---------|-----|------------------------|----------------------|--------------------|---------------------|------------------|-------------------|
| (EPS)   | 40  | 6/17/40 (0.36)         | 7/36/20 (0.34)       | 6/17/40 (0.38)     | 7/36/20 (0.35)      | 7/14/0 (0.55)    | 9/44/0 (0.33)     |
|         | 80  | 8/24/120 (2.35)        | 8/42/40 (1.68)       | 8/24/120 (2.84)    | 8/42/40 (1.83)      | 7/14/0 (2.99)    | 11/48/0 (1.89)    |
|         | 160 | 8/24/240 (28.01)       | 10/46/80 (10.69)     | 8/24/240 (28.48)   | 10/46/80 (11.28)    | 7/14/0 (28.08)   | 16/58/0 (13.01)   |
| (ER)    | 40  | 50/238/2540 (65.46)    | 78/288/880 (15.13)   | 20/40/1000 (11.43) | 43/115/1000 (14.07) | 11/27/0 (1.39)   | 61/134/0 (2.88)   |
|         | 80  | 59/274/5640 (259.64)   | 94/350/2320 (109.09) | 20/39/2080 (72.20) | 50/118/2080 (79.98) | 11/27/0 (13.49)  | 85/189/0 (20.65)  |
|         | 160 | 53/244/10560 (1243.11) | 98/342/4400 (590.62) | 20/43/4800 (510.52)| 52/124/4080 (472.33)| 11/27/0 (69.27)  | 109/238/0 (82.13) |
| (ERC)   | 40  | 13/37/140 (1.10)       | 14/41/120 (0.99)     | 13/37/140 (1.12)   | 14/41/120 (1.10)    | 10/22/0 (1.12)   | 31/119/0 (1.79)   |
|         | 80  | 13/43/240 (5.26)       | 15/44/240 (5.44)     | 13/43/240 (5.28)   | 15/44/240 (5.48)    | 10/22/0 (5.95)   | 34/135/0 (6.79)   |
|         | 160 | 15/39/560 (40.15)      | 17/54/480 (36.36)    | 15/39/560 (41.26)  | 17/54/480 (37.42)   | 10/22/0 (47.68)  | 49/187/0 (43.36)  |
| (VD)    | 40  | 11/33/4 (0.40)         | 16/47/4 (0.53)       | 11/33/4 (0.41)     | 16/47/4 (0.55)      | 11/19/0 (1.41)   | 13/67/0 (0.62)    |
|         | 80  | 14/31/7 (1.81)         | 17/46/5 (1.93)       | 14/31/7 (1.85)     | 17/46/5 (2.00)      | 11/20/0 (18.24)  | 25/89/0 (3.59)    |
|         | 160 | 20/37/9 (12.09)        | 24/48/8 (12.23)      | 20/37/9 (13.26)    | 24/48/8 (13.09)     | 12/22/0 (104.67) | 62/169/0 (59.03)  |
| (BT)    | 40  | 18/34/97 (1.71)        | 16/33/68 (1.49)      | 8/10/214 (3.84)    | 11/21/145 (2.06)    | 5/7/0 (0.36)     | 8/16/0 (0.31)     |
|         | 80  | 15/29/176 (11.26)      | 18/32/110 (5.54)     | 11/20/327 (36.55)  | 14/26/280 (26.66)   | 5/7/0 (2.41)     | 10/21/0 (2.02)    |
|         | 160 | 19/40/179 (46.35)      | 18/40/18 (19.65)     | 14/27/483 (98.13)  | 17/33/5 (16.51)     | 5/7/0 (27.81)    | 13/27/0 (17.48)   |
where each element function f_i(.) depends on only n_i variables, where n_i is small compared to n, the total number of variables of the problem. Partially separable problems arise naturally in many different fields, such as finite elements, variational calculations, and transportation networks (see [16] for more examples). Building approximations to the low-rank Hessian of each element function separately, Griewank and Toint [14], [15] developed partitioned variable metric update algorithms and obtained encouraging numerical results [16].
Assume that f_i(x), i = 1, ..., m, in (4.1) are all twice continuously differentiable. Then the gradient vector and the Hessian matrix of function (4.1) are

\nabla f(x) = \sum_{i=1}^{m} \nabla f_i(x)   and   \nabla^2 f(x) = \sum_{i=1}^{m} \nabla^2 f_i(x),

respectively. Note that each element Hessian \nabla^2 f_i has nonzero entries in at most n_i rows and columns, since element function f_i only depends on n_i \ll n "internal" variables. We can rewrite \nabla^2 f_i(x), which we shall also denote by H_i(x), as

(4.2)   H_i(x) = \nabla^2 f_i(x) = M_i G_i(x) M_i^T,   i = 1, ..., m,

where G_i(x) consists of the n_i x n_i nonzero submatrix of \nabla^2 f_i(x) and M_i is an n x n_i matrix whose jth column is the qth column of the n x n identity matrix if x_q is the jth internal variable of f_i. For example, if n = 4 and f_i(x) is a function of only x_1 and x_4, then

H_i(x) = [ h_{11}(x)  0  0  h_{14}(x) ;
           0          0  0  0         ;
           0          0  0  0         ;
           h_{41}(x)  0  0  h_{44}(x) ].
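In code, M_i need never be formed explicitly: with the list of internal-variable indices of f_i, the product M_i G_i M_i^T is just a scatter of the dense n_i x n_i block into an n x n matrix. A hypothetical sketch:

```python
import numpy as np

def scatter_element_hessian(n, idx, Gi):
    """Embed the dense block G_i into an n-by-n matrix, H_i = M_i G_i M_i^T,
    where idx lists the internal variables of f_i (defining M_i implicitly)."""
    Hi = np.zeros((n, n))
    Hi[np.ix_(idx, idx)] = Gi
    return Hi

# The 4-variable example above: scatter_element_hessian(4, [0, 3], Gi)
# places G_i at rows/columns 1 and 4 (0-based indices 0 and 3).
```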
Factorable optimization problems are defined by McCormick [19] as problems where the objective function f(x) is a factorable function, i.e., one that can be represented as the last in a finite sequence of functions \{f^j(x)\} that are composed as follows:
(1) for j = 1, ..., n, f^j(x) = x_j;
(2) for j > n, f^j(x) equals f^k(x) + f^l(x), f^k(x) \cdot f^l(x), or T^j[f^k(x)], where T^j(.) is a function of a single variable, and k, l < j.
It is quite easy to see that a unary function is a special case of a factorable function.
As pointed out by Jackson and McCormick [17], a factorable function possesses two properties that can be exploited to produce efficient algorithms: (1) its gradient and Hessian can be computed exactly (in terms of the derivatives of the T^j(.)), automatically, and efficiently, if it is assumed to be twice continuously differentiable; (2) its Hessian is naturally given as a sum of outer products (dyads) of vectors, i.e.,

(4.3)   \nabla^2 f(x) = \sum_i \alpha_i(x) [u_i(x) v_i(x)^T + v_i(x) u_i(x)^T],

where \{u_i(x)\} and \{v_i(x)\} are n-vectors and \{\alpha_i(x)\} are scalars, all of which are available, having been required for the computation of the gradient of f(x). This dyadic structure of \nabla^2 f(x) has been used by Emami [4] to obtain a factorization of a generalized inverse of the Hessian of a factorable function, by Ghotb [7] for computing the generalized inverse of a reduced Hessian, when it is given in dyadic form, and by Sofer [25] to obtain computationally efficient techniques for constructing the generalized inverse of such a reduced Hessian and updating it. Since u v^T + v u^T = \frac{1}{2}[(u + v)(u + v)^T - (u - v)(u - v)^T], (4.3) can be rewritten as

\nabla^2 f(x) = \sum_{i=1}^{m} H_i(x),
where H_i(x) = \bar\alpha_i(x) a_i(x) a_i(x)^T, i = 1, ..., m, are rank-one matrices; a_i(x), i = 1, ..., m, are n-vectors; and \bar\alpha_i(x), i = 1, ..., m, are scalars, all of which are functions of x.
Thus, in both the partially separable and factorable cases, we can express the Hessian of f(x) as

\nabla^2 f(x) = \sum_{i=1}^{m} H_i(x),

where each H_i(x) has low rank. Because of this, it is possible to extend the partial-update Newton methods of the previous sections to solve partially separable and factorable optimization problems.
Such partial-update Newton methods for partially separable and factorable optimization compute a step direction by formula (1.5). But now the "working approximation" H^k to \nabla^2 f(x_k) takes the form

H^k = \sum_{i=1}^{m} H_i^k,

where, for i = 1, ..., m, H_i^k is a "working approximation" to the element Hessian H_i(x_k) = \nabla^2 f_i(x_k) in the methods for partially separable optimization and to the rank-one matrix H_i(x_k) = \bar\alpha_i(x_k) a_i(x_k) a_i(x_k)^T in the methods for factorable optimization. To be specific, unmodified versions of these methods initially set H_i^0 = H_i(x_0), i = 1, ..., m, and at step k (k \ge 1) set

(4.4)   H_i^k = H_i^{k-1}   if H_i(x_k) is "replaceable" by H_i^{k-1},
        H_i^k = H_i(x_k)   otherwise,
for i = 1, ..., m. Also, in analogy with the replacement criteria of §2, we have, substituting H's for \phi's and the Frobenius norm \|.\|_F (or any other matrix norm) for the absolute value |.|, the following.
Criterion 1*. For i = 1, ..., m, H_i(x_k) is replaceable by H_i^{k-1} if

\|H_i(x_k) - H_i^{k-1}\|_F \le \eta \|\nabla f(x_k)\|,

where 0 < \eta < 1.
Criterion 2*. For i = 1, ..., m, H_i(x_k) is replaceable by H_i^{k-1} if k < p or

\|H_i(x_k) - H_i^{k-1}\|_F \le \max_{k-p+1 \le j \le k} \{\|H_i(x_j) - H_i(x_{j-1})\|_F\},

where p is a given positive integer.
We note that because of the special form (4.2) of H_i(x) in the partially separable case, H_i^k can be expressed as

H_i^k = M_i G_i^k M_i^T,

where G_i^k is an n_i x n_i dense matrix. Moreover, since \|H_i^{k-1} - H_i(x_k)\|_F = \|G_i^{k-1} - G_i(x_k)\|_F and \|H_i(x_k) - H_i(x_{k-1})\|_F = \|G_i(x_k) - G_i(x_{k-1})\|_F, G's can be substituted for H's in the updating procedure (4.4) and in Criteria 1* and 2* in this case.
It is not very surprising that if we replace assumptions (A2)-(A4) by:
(A2') \nabla f_i(x), i = 1, ..., m, are all continuously differentiable in a neighborhood of x^* and \nabla^2 f(x^*) is nonsingular; and
(A3') \nabla^2 f_i(x), i = 1, ..., m, are all Lipschitz continuous at x^*,
we can prove the following local convergence results using arguments analogous to those used in §2.
THEOREM 4.1. Let \{x_k\} be the sequence of iterates generated by the partial-update Newton method for partially separable or factorable optimization. Then
(1) \{x_k\} is locally and linearly convergent to x^* under assumptions (A1) and (A2');
(2) \{x_k\} is locally quadratically convergent to x^*, if Criterion 1* is used, under assumptions (A1), (A2'), and (A3');
(3) \{x_k\} is locally superlinearly convergent to x^*, if Criterion 2* is used, under assumptions (A1), (A2'), and (A3'). Moreover, if p is finite, the rate of convergence is at least r_p, where r_p is the unique positive root of \tau^{p+1} - \tau^p - 1 = 0 and 1 < r_p < 2, and on the average each iteration needs at most m/p low-rank updates.
As in the first part of §3, we can modify the partial-update Newton methods for partially separable and factorable problems to ensure global convergence to a stationary point of f(x). These modifications, their implementation, and the results of numerical testing will be presented in a future report.
Acknowledgment. We thank two anonymous referees for their helpful suggestions.
REFERENCES
[1] R. H. Byrd, Algorithms for robust regression, in Nonlinear Optimization 1981, M. J. D. Powell, ed., Academic Press, New York, 1982, pp. 79-84.
[2] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400-408.
[3] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ, 1983.
[4] G. Emami, Evaluating strategies for Newton's method using a numerically stable generalized inverse algorithm, Ph.D. thesis, Dept. of Operations Research, George Washington Univ., Washington, DC, 1978.
[5] J. Eriksson, A note on solution of large sparse maximum entropy problems with linear equality constraints, Math. Programming, 18 (1980), pp. 146-154.
[6] D. M. Gay, Some convergence properties of Broyden's method, SIAM J. Numer. Anal., 16 (1979), pp. 623-630.
[7] F. Ghotb, Newton's method for linearly constrained optimization problems, Ph.D. thesis, Dept. of Operations Research, George Washington Univ., Washington, DC, 1980.
[8] P. E. Gill, G. H. Golub, W. Murray, and M. A. Saunders, Methods for modifying matrix factorizations, Math. Comp., 28 (1974), pp. 505-535.
[9] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press, London, 1981.
[10] D. Goldfarb, Factorized variable metric methods for unconstrained optimization, Math. Comp., 30 (1976), pp. 796-811.
[11] ——, Curvilinear path steplength algorithms for minimization which use directions of negative curvature, Math. Programming, 18 (1980), pp. 31-40.
[12] D. Goldfarb and S. Liu, An O(n^3 L) primal interior point algorithm for convex quadratic programming, Math. Programming, 49 (1991), pp. 325-340.
[13] C. C. Gonzaga, An algorithm for solving linear programming problems in O(n^3 L) operations, in Progress in Mathematical Programming, N. Megiddo, ed., Springer-Verlag, Berlin, 1989, pp. 1-28.
[14] A. Griewank and Ph. L. Toint, On the unconstrained optimization of partially separable functions, in Nonlinear Optimization 1981, M. J. D. Powell, ed., Academic Press, New York, 1982, pp. 301-312.
[15] ——, Partitioned variable metric updates for large structured optimization problems, Numer. Math., 39 (1982), pp. 119-137.
[16] A. Griewank and Ph. L. Toint, Numerical experiments with partially separable optimization problems, in Numerical Analysis: Proceedings Dundee 1983, D. F. Griffiths, ed., Lecture Notes in Math. 1066, Springer-Verlag, Berlin, 1984, pp. 203-220.
[17] R. H. F. Jackson and G. P. McCormick, The polyadic structure of factorable function tensors with applications to high-order minimization techniques, J. Optim. Theory Appl., 51 (1986), pp. 63-94.
[18] N. Karmarkar, A new polynomial time algorithm for linear programming, Combinatorica, 4 (1984), pp. 373-395.
[19] G. P. McCormick, Nonlinear Programming: Theory, Algorithms, and Applications, John Wiley, New York, 1983.
[20] G. P. McCormick and A. Sofer, Optimization with unary functions, Math. Programming, 52 (1991), pp. 167-178.
[21] J. J. Moré and D. C. Sorensen, Newton's method, in Studies in Numerical Analysis, G. H. Golub, ed., MAA Studies in Math., 24 (1982), pp. 29-82.
[22] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[23] V. E. Shamanskii, On a modification of the Newton method, Ukrain. Mat. Zh., 19 (1967), pp. 133-138. (In Russian.)
[24] J. Sherman and W. J. Morrison, Adjustment of an inverse matrix corresponding to changes in the elements of a given column or a given row of the original matrix, Ann. Math. Statist., 20 (1949), p. 621.
[25] A. Sofer, Computationally efficient techniques for generalized inversion in nonlinear programming, Ph.D. thesis, Dept. of Operations Research, George Washington Univ., Washington, DC, 1983.
[26] J. Traub, Iterative Methods for the Solution of Equations, Prentice-Hall, Englewood Cliffs, NJ, 1964.
[27] Y. Ye, An O(n^3 L) potential reduction algorithm for linear programming, Math. Programming, 50 (1991), pp. 239-258.