
Second Derivative Algorithms for Minimum Delay
Distributed Routing in Networks
DIMITRI P. BERTSEKAS, ELI M. GAFNI, AND ROBERT G. GALLAGER, FELLOW, IEEE
Abstract-We propose a class of algorithms for finding an optimal quasi-static routing in a communication network. The algorithms are based on Gallager's method [1] and provide methods for iteratively updating the routing table entries of each node in a manner that guarantees convergence to a minimum delay routing. Their main feature is that they utilize second derivatives of the objective function and may be viewed as approximations to a constrained version of Newton's method. The use of second derivatives results in improved speed of convergence and automatic stepsize scaling with respect to the level of traffic input. These advantages are of crucial importance for the practical implementation of the algorithm using distributed computation in an environment where input traffic statistics gradually change.

Paper approved by the Editor for Computer Communication of the IEEE Communications Society for publication after presentation at the 1978 International Symposium on Systems Optimization and Analysis, INRIA, Rocquencourt, France, December 1978. Manuscript received April 8, 1981; revised October 19, 1983. This work was supported in part by the Defense Advanced Research Projects Agency under Grant ONR-N00014-75-C-1183 and in part by the National Science Foundation under Grant NSF/ECS-79-19880.

D. P. Bertsekas and R. G. Gallager are with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139.

E. M. Gafni is with the Department of Computer Science, University of California, Los Angeles, CA 90024.

I. INTRODUCTION

WE consider the problem of optimal routing of messages in a communication network so as to minimize average delay per message. Primarily, we have in mind a situation where the statistics of external traffic inputs change slowly with time, as described in the paper by Gallager [1]. While algorithms of the type to be described can also be used for centralized computation, we place primary emphasis on algorithms that are well suited for real-time distributed implementation. Thus, it may be desirable to modify the algorithm of Section III to include a line search procedure before using it for centralized computation.

Two critical requirements for the success of a distributed routing algorithm are speed of convergence and relative insensitivity of performance to variations in the statistics of external traffic inputs. Unfortunately, the algorithm of [1] is not entirely satisfactory in these respects. In particular, it is impossible in this algorithm to select a stepsize that will guarantee convergence and a good rate of convergence for a broad range of external traffic inputs. The work described in this paper was motivated primarily by this consideration.

A standard approach for improving the rate of convergence and facilitating stepsize selection in optimization algorithms is to scale the descent direction using second derivatives of the objective function, as for example in Newton's method. This is also the approach taken here. On the other hand, the straightforward use of Newton's method is inappropriate for our problem, both because of large dimensionality and because of the need for algorithmic simplicity in view of our envisioned decentralized, real-time, loop-free implementation. We have thus introduced various approximations to Newton's method which exploit the network structure of the problem, simplify the computations, and facilitate distributed implementation.

In Section II we formulate the minimum delay routing problem as a multicommodity flow problem and describe a broad class of algorithms to solve the problem. This class is patterned after a gradient projection method for nonlinear programming [2], [3], as explained in [4], but differs substantively from this method in that at each iteration the routing pattern obtained is loop-free. An interesting mathematical complication arising from this restriction is that, similarly as in [1], the value of the objective function need not decrease strictly at each iteration. Gallager's original algorithm is recovered as a special case within our class except for a variation in the definition of a blocked node [compare with [1, eq. (15)]]. This variation is essential in order to maintain loop freedom during operation of our algorithms and, despite its seemingly minor nature, it has necessitated major differences in the proof of convergence from the corresponding proof of [1].

Section III describes in more detail a particular algorithm from the class of Section II. This algorithm employs second derivatives in a manner which approximates a constrained version of Newton's method [3] and is well suited for distributed computation.

The algorithm of Section III seems to work well for most quasi-static routing problems likely to appear in practice, as extensive computational experience has shown [5]. However, there are situations where the unity stepsize employed by this algorithm may be inappropriate. In Section IV we present another distributed algorithm which automatically corrects this potential difficulty whenever it arises, at the expense of additional computation per iteration. This algorithm also employs second derivatives, and is based on minimizing at each iteration a suitable upper bound to a quadratic approximation of the objective function.

Both algorithms of Sections III and IV have been tested extensively, and computational results have been documented in [5] and [6]. These results substantiate the assertions made here regarding the practical properties of the algorithms. There are also other related second derivative algorithms [7], [8] that operate in the space of path flows and exhibit similar behavior as the ones of this paper, while other more complex algorithms [12], [13] are based on conjugate gradient approximations to Newton's method and exhibit a faster rate of convergence. A survey is given in [15], and a computer code implementation can be found in [16]. These algorithms are well suited for centralized computation and virtual circuit networks but, in contrast with the ones of this paper, require global information at each node regarding the network topology and the total flow on each link. Depending on the mode of operation of the network this information may not be available. The algorithms of [7], [8], [12], and [13] may also require more computer memory when implemented for centralized computation.

We finally mention that while we have restricted attention to the problem of routing, the algorithms of this paper can be applied to other problems of interest in communication networks. For example, problems of optimal adaptive flow control or combined routing and flow control have been formulated in [9], [10] as nonlinear multicommodity flow problems of the type considered here, and the algorithms of this paper are suitable for their solution.

Due to space limitations the proofs of most of the results of the paper have been omitted. They can be found in two reports [11], [14].
II. A CLASS OF ROUTING ALGORITHMS
Consider a network consisting of N nodes denoted by 1, 2, ..., N and L directed links. The set of links is denoted by ℒ. We denote by (i, l) the link from node i to node l, and assume that the network is connected in the sense that for any two nodes m, n there is a directed path from m to n. The flow on each link (i, l) for any destination j is denoted by f_il(j). The total flow on each link (i, l) is denoted by F_il, i.e.,

$$F_{il} = \sum_{j=1}^{N} f_{il}(j). \tag{1}$$

The vector of all flows f_il(j), (i, l) ∈ ℒ, j = 1, ..., N, is denoted by f.

We are interested in the numerical solution of the following multicommodity network flow problem:

$$\text{(MFP)} \qquad \text{minimize } D(f) = \sum_{(i,l)\in\mathcal{L}} D_{il}(F_{il})$$

subject to

$$\sum_{l\in O(i)} f_{il}(j) - \sum_{m\in I(i)} f_{mi}(j) = r_i(j), \qquad \forall\, i = 1,\ldots,N,\ i \ne j,$$

$$f_{il}(j) \ge 0, \qquad \forall\, (i,l)\in\mathcal{L},\ i = 1,\ldots,N,\ j = 1,\ldots,N,$$

$$f_{jl}(j) = 0, \qquad \forall\, (j,l)\in\mathcal{L},\ j = 1,\ldots,N,$$

where, for i ≠ j, r_i(j) is a known traffic input at node i destined for j, and O(i) and I(i) are the sets of nodes l for which (i, l) ∈ ℒ and (l, i) ∈ ℒ, respectively.
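To make the objective concrete, the following sketch evaluates D(f) for a tiny example under an M/M/1-style delay model D_il(F) = F/(C_il − F). The paper does not fix a particular D_il; the delay function, link capacities, and flow values below are illustrative assumptions only.

```python
# Minimal sketch of the (MFP) objective, assuming M/M/1-style link delay
# D_il(F) = F / (C_il - F); the example network and numbers are illustrative.
import math

links = {          # (i, l) -> capacity C_il
    (1, 2): 10.0,
    (2, 3): 10.0,
    (1, 3): 5.0,
}

def link_delay(F, C):
    """D_il(F): convex, increasing, tends to infinity as F -> C (Assumption 2)."""
    return F / (C - F) if F < C else math.inf

def total_delay(total_flows):
    """D(f) = sum over links of D_il(F_il), with F_il the total link flow."""
    return sum(link_delay(total_flows[e], C) for e, C in links.items())

# Total link flows F_il aggregated over all destinations (cf. eq. (1)).
F = {(1, 2): 4.0, (2, 3): 4.0, (1, 3): 2.0}
print(total_delay(F))
```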
The standing assumptions throughout the paper are as follows.

1) r_i(j) ≥ 0, i, j = 1, ..., N, i ≠ j.

2) Each function D_il is defined on an interval [0, C_il), where C_il is either a positive number (the link capacity) or +∞; D_il is convex, continuous, and has strictly positive and continuous first and second derivatives on [0, C_il), where the derivatives at 0 are defined by taking the limit from the right. Furthermore, D_il(F_il) → ∞ as F_il → C_il.

3) (MFP) has at least one feasible solution f satisfying F_il < C_il for all (i, l) ∈ ℒ.

For notational convenience in describing various algorithms in what follows, we will suppress the destination index and concentrate on a single destination, chosen for concreteness to be node N. Our definitions, optimality conditions, and algorithms are essentially identical for each destination, so this notational simplification should not become a source of confusion. In the case where there are multiple destinations it is possible to implement our algorithms in at least two different ways: either iterate simultaneously for all destinations (the "all-at-once" version), or iterate sequentially one destination at a time in a cyclic manner with intermediate readjustment of link flows in the spirit of the Gauss-Seidel method (the "one-at-a-time" version). The remainder of our notation follows in large measure the one employed in [1]. In addition, all vectors will be considered to be column vectors, transposition will be denoted by a superscript T, and the standard Euclidean norm of a vector will be denoted by |·|, i.e., x^T x = |x|² for any vector x. Vector inequalities are meant to be componentwise, i.e., for x = (x_1, ..., x_n) we write x ≥ 0 if x_i ≥ 0 for all i = 1, ..., n.

Let t_i be the total incoming traffic at node i,

$$t_i = r_i + \sum_{m\in I(i)} f_{mi}, \qquad i = 1,\ldots,N-1,$$

and for t_i > 0 let φ_il be the fraction of t_i that travels on link (i, l),

$$\phi_{il} = \frac{f_{il}}{t_i}, \qquad (i,l)\in\mathcal{L},\ i = 1,\ldots,N-1. \tag{2}$$
Then it is possible to reformulate the problem in terms of the variables φ_il as follows [1]. For each node i ≠ N we fix an order of the outgoing links (i, l), l ∈ O(i). We identify with each collection {φ_il | (i, l) ∈ ℒ, i = 1, ..., N − 1} a column vector φ = (φ_1^T, φ_2^T, ..., φ_{N−1}^T)^T, where φ_i is the column vector with coordinates φ_il, l ∈ O(i). Let

$$\Phi = \Big\{ \phi \ \Big|\ \phi_{il} \ge 0,\ \forall\, l\in O(i),\ \ \sum_{l\in O(i)} \phi_{il} = 1,\ i = 1,\ldots,N-1 \Big\},$$

and let Φ° be the subset of Φ consisting of all φ for which there exists a directed path (i, l), ..., (m, N) from every node i = 1, ..., N − 1 to the destination N along which φ_il > 0, ..., φ_mN > 0. Clearly, Φ and Φ° are convex sets, and the closure of Φ° is Φ. It is shown in [1] that for every φ ∈ Φ° and r = (r_1, r_2, ..., r_{N−1}) with r_i ≥ 0, i = 1, ..., N − 1, there exist unique vectors t(φ, r) = (t_1(φ, r), ..., t_{N−1}(φ, r)) and f(φ, r) with coordinates f_il(φ, r), (i, l) ∈ ℒ, i ≠ N, satisfying

$$t(\phi,r) \ge 0, \qquad f(\phi,r) \ge 0,$$

$$t_i(\phi,r) = r_i + \sum_{\substack{m\in I(i)\\ m\ne N}} f_{mi}(\phi,r), \qquad i = 1,2,\ldots,N-1,$$

$$f_{il}(\phi,r) = t_i(\phi,r)\,\phi_{il}, \qquad (i,l)\in\mathcal{L},\ i\ne N.$$

Furthermore, the functions t(φ, r), f(φ, r) are twice continuously differentiable in the relative interior of their domain of definition Φ° × {r | r ≥ 0}. The derivatives at the relative boundary can also be defined by taking the limit through the relative interior. Furthermore, for every r ≥ 0 and every f which is feasible for (MFP) there exists φ ∈ Φ° such that f = f(φ, r).

It follows from the above discussion that the problem can be written in terms of the variables φ_il as

$$\text{minimize } D(\phi,r) = \sum_{(i,l)\in\mathcal{L}} D_{il}\big(f_{il}(\phi,r)\big) \quad \text{subject to } \phi\in\Phi, \tag{3}$$

where we write D(φ, r) = ∞ if f_il(φ, r) ≥ C_il for some (i, l) ∈ ℒ.

Similarly as in [1], our algorithms generate sequences of loop-free routing variables φ, and this allows efficient computation of various derivatives of D.
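As a small illustration of the map φ ↦ (t(φ, r), f(φ, r)) defined above, the sketch below computes node traffics and link flows by processing each node after all of its upstream neighbors, which is possible when the routing is loop-free. The three-node example and its numbers are assumptions for illustration only.

```python
# Sketch: compute t(phi, r) and f(phi, r) for a single destination (node N),
# assuming a loop-free routing so nodes can be processed upstream-to-downstream.
# The tiny example network and inputs are illustrative assumptions.
from graphlib import TopologicalSorter

N = 3                                    # destination node
r = {1: 2.0, 2: 1.0}                     # external inputs r_i (destined for N)
phi = {                                  # routing fractions phi_il, summing to 1 per node
    1: {2: 0.5, 3: 0.5},
    2: {3: 1.0},
}

def topological_order(phi):
    """Order nodes so that i precedes l whenever phi_il > 0 (possible when loop-free)."""
    ts = TopologicalSorter()
    for i, out in phi.items():
        for l, frac in out.items():
            if frac > 0.0:
                ts.add(l, i)             # edge i -> l: i must come first
    return list(ts.static_order())

def node_flows(phi, r, dest):
    order = topological_order(phi)       # upstream nodes before downstream ones
    t = {i: r.get(i, 0.0) for i in order}
    f = {}
    for i in order:
        if i == dest:
            continue
        for l, frac in phi[i].items():
            f[(i, l)] = t[i] * frac      # f_il = t_i * phi_il
            t[l] = t.get(l, 0.0) + f[(i, l)]
    return t, f

t, f = node_flows(phi, r, N)
print(t, f)
```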
For a given φ ∈ Φ° we say that node k is downstream from node i if there is a directed path from i to k and, for every link (l, m) on the path, we have φ_lm > 0. We say that node i is upstream from node k if k is downstream from i. We say that φ is loop-free if there is no pair of nodes i, k such that i is both upstream and downstream from k. For any φ ∈ Φ° and r ≥ 0 for which D(φ, r) < ∞, the partial derivatives ∂D(φ, r)/∂φ_il can be computed using the following equations [1]:

$$\frac{\partial D}{\partial \phi_{il}} = t_i\Big[ D'_{il}(f_{il}) + \frac{\partial D}{\partial r_l} \Big], \tag{4}$$

$$\frac{\partial D}{\partial r_i} = \sum_{l\in O(i)} \phi_{il}\Big[ D'_{il}(f_{il}) + \frac{\partial D}{\partial r_l} \Big], \qquad \frac{\partial D}{\partial r_N} = 0, \tag{5}$$

where D'_il denotes the first derivative of D_il with respect to f_il. The equations above uniquely determine ∂D/∂φ_il and ∂D/∂r_i, and their computation is particularly simple if φ is loop-free. In a distributed setting each node i computes ∂D/∂φ_il and ∂D/∂r_i via (4), (5) after receiving the value of ∂D/∂r_l from all its immediate downstream neighbors. Because φ is loop-free, the computation can be organized in a deadlock-free manner starting from the destination node N and proceeding upstream [1].
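A compact way to picture this upstream sweep: starting from the destination, each node combines its links' marginal delays D'_il with the values ∂D/∂r_l received from downstream, per (4), (5). The sketch below simulates the sweep sequentially; the network, flows, and M/M/1-style delay derivative are illustrative assumptions, not data from the paper.

```python
# Sketch of the upstream sweep of (4), (5): node i can compute dD/dr_i and
# dD/dphi_il only after receiving dD/dr_l from every downstream neighbor l.
# Example network, flows, and delay model are illustrative assumptions.
def d_prime(F, C):
    """D'_il for D_il(F) = F/(C - F), i.e., C/(C - F)**2."""
    return C / (C - F) ** 2

N = 3                                            # destination
phi = {1: {2: 0.5, 3: 0.5}, 2: {3: 1.0}}         # loop-free routing variables
t = {1: 2.0, 2: 2.0, 3: 3.0}                     # node traffics t_i
cap = {(1, 2): 10.0, (2, 3): 10.0, (1, 3): 5.0}  # link capacities C_il
f = {(i, l): t[i] * p for i, out in phi.items() for l, p in out.items()}

dD_dr = {N: 0.0}                                 # boundary condition dD/dr_N = 0
dD_dphi = {}
for i in [2, 1]:           # each node after all of its downstream neighbors
    dD_dr[i] = sum(p * (d_prime(f[i, l], cap[i, l]) + dD_dr[l])
                   for l, p in phi[i].items())                               # (5)
    for l, p in phi[i].items():
        dD_dphi[i, l] = t[i] * (d_prime(f[i, l], cap[i, l]) + dD_dr[l])      # (4)

print(dD_dr)
print(dD_dphi)
```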
A necessary condition for optimality is given by (see [1])

$$\frac{\partial D}{\partial \phi_{il}} = \min_{m\in O(i)} \frac{\partial D}{\partial \phi_{im}} \quad \text{if } \phi_{il} > 0, \qquad
\frac{\partial D}{\partial \phi_{il}} \ge \min_{m\in O(i)} \frac{\partial D}{\partial \phi_{im}} \quad \text{if } \phi_{il} = 0, \tag{6}$$

where all derivatives are evaluated at the optimum. These equations are automatically satisfied for i such that t_i = 0, and for t_i > 0 the conditions are equivalent, through use of (4) and (5), to

$$\frac{\partial D}{\partial r_i} = \min_{m\in O(i)} \delta_{im},$$

where δ_im is defined by

$$\delta_{im} = D'_{im}(f_{im}) + \frac{\partial D}{\partial r_m}. \tag{7}$$

In fact, if (6) holds for all i (whether t_i = 0 or t_i > 0), then it is sufficient to guarantee optimality (see [1, Theorem 3]).
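The optimality test (6), (7) is easy to check numerically at a node once the δ_il are known: every flow-carrying link should attain the minimum of the δ_im. The check below is a small illustration with assumed values.

```python
# Sketch: check the per-node optimality condition (6)/(7) given delta_il:
# every link with phi_il > 0 should attain min_m delta_im (within a tolerance).
# The example values are assumptions.
def node_is_optimal(phi_i, delta_i, tol=1e-9):
    d_min = min(delta_i.values())
    return all(delta_i[l] <= d_min + tol for l, p in phi_i.items() if p > 0.0)

print(node_is_optimal({2: 0.7, 3: 0.3}, {2: 0.9, 3: 0.4}))   # False: link 2 carries
                                                             # flow but is not cheapest
print(node_is_optimal({2: 0.0, 3: 1.0}, {2: 0.9, 3: 0.4}))   # True
```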
We consider the class of algorithms

$$\phi_i^{k+1} = \phi_i^k + \Delta\phi_i^k, \qquad i = 1,\ldots,N-1, \tag{8}$$

where, for each i, the vector Δφ_i^k with components Δφ_il^k, l ∈ O(i), is any solution of the problem

$$\begin{aligned}
&\text{minimize} && \sum_{l\in O(i)} \delta_{il}\,\Delta\phi_{il} + \frac{t_i}{2\alpha}\,\Delta\phi_i^T M_i^k \Delta\phi_i\\
&\text{subject to} && \sum_{l\in O(i)} \Delta\phi_{il} = 0, \qquad \phi_{il}^k + \Delta\phi_{il} \ge 0,\ \forall\, l\in O(i),\\
& && \Delta\phi_{il} = 0,\ \forall\, l\in B(i;\phi^k).
\end{aligned} \tag{9}$$

The scalar α is a positive parameter and δ_i is the vector with components {δ_im} given by (7). All derivatives in (8) and (9) are evaluated at φ^k and f(φ^k, r). For each i for which t_i(φ^k, r) > 0, the matrix M_i^k is some symmetric matrix which is positive definite on the subspace {u_i | Σ_{l∈O(i)} u_il = 0}, i.e., u_i^T M_i^k u_i > 0 for all u_i ≠ 0 with Σ_{l∈O(i)} u_il = 0. This condition guarantees that the solution to problem (9) exists and is unique. For nodes i for which t_i(φ^k, r) = 0 the definition of M_i^k is immaterial. The set of indexes B(i; φ^k) is specified in the following definition.

Definition: For any φ ∈ Φ° and i = 1, ..., N − 1, the set B(i; φ), referred to as the set of blocked nodes for φ at i, is the set of all l ∈ O(i) such that φ_il = 0 and either ∂D(φ, r)/∂r_i ≤ ∂D(φ, r)/∂r_l, or there exists a link (m, n) such that m = l or m is downstream of l and we have φ_mn > 0, ∂D(φ, r)/∂r_m ≤ ∂D(φ, r)/∂r_n. (Such a link will be referred to as an improper link.)

We refer to [1] for a description of the method for generating the sets B(i; φ^k) in a manner suitable for distributed computation. Our definition of B(i; φ^k) differs from the one of [1] primarily in that a special device that facilitated the proof of convergence given in [1] is not employed [compare with [1, eq. (15)]].
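The blocked sets can be built once the downstream relation and the values ∂D/∂r_i are known at every node. The sketch below applies the definition directly on a small example; it is a centralized illustration of the definition only (the paper points to [1] for a genuinely distributed construction), and the example values are assumptions.

```python
# Centralized illustration of the blocked-set definition B(i; phi):
# l in O(i) is blocked if phi_il = 0 and either dD/dr_i <= dD/dr_l, or some
# "improper" link (m, n) with phi_mn > 0 and dD/dr_m <= dD/dr_n exists with
# m = l or m downstream of l.  Example data are assumptions.
phi = {1: {2: 1.0, 3: 0.0}, 2: {3: 1.0}, 3: {4: 1.0}}   # destination is node 4
dD_dr = {1: 0.9, 2: 0.6, 3: 1.1, 4: 0.0}

def downstream_set(phi, start):
    """All nodes reachable from `start` along links carrying flow (phi > 0)."""
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        for v, p in phi.get(u, {}).items():
            if p > 0.0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def blocked_set(i, phi, dD_dr):
    blocked = set()
    for l, p in phi[i].items():
        if p > 0.0:
            continue
        if dD_dr[i] <= dD_dr[l]:
            blocked.add(l)
            continue
        # improper link (m, n): carries flow but its tail's marginal cost
        # does not exceed its head's
        candidates = {l} | downstream_set(phi, l)
        for m in candidates:
            for n, q in phi.get(m, {}).items():
                if q > 0.0 and dD_dr[m] <= dD_dr[n]:
                    blocked.add(l)
    return blocked

print({i: blocked_set(i, phi, dD_dr) for i in phi})
```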
The following proposition shows some of the properties of the algorithm. Its proof can be found in [11], [14].

Proposition 1:
a) If φ^k is loop-free, then φ^{k+1} is loop-free.
b) If φ^k is loop-free and Δφ^k = 0 solves problem (9), then φ^k is optimal.
c) If φ^k is optimal, then φ^{k+1} is also optimal.
d) If Δφ_i^k ≠ 0 for some i for which t_i(φ^k, r) > 0, then there exists a positive scalar η̄^k such that

$$D(\phi^k + \eta\,\Delta\phi^k, r) < D(\phi^k, r), \qquad \forall\,\eta\in(0,\bar\eta^k]. \tag{10}$$

The following proposition is the main convergence result regarding the class of algorithms (8), (9). Its proof will not be given in view of its complexity and length. It may be found in [11]. The proposition applies to the multiple destination case in the "all-at-once" as well as the "one-at-a-time" version.

Proposition 2: Let the initial routing φ^0 be loop-free and satisfy D(φ^0, r) ≤ D_0, where D_0 is some scalar. Assume also that there exist two positive scalars λ, Λ such that the sequences of matrices {M_i^k} satisfy the following two conditions.
a) The absolute value of each element of M_i^k is bounded above by Λ.
b) There holds

$$\lambda\,|u_i|^2 \le u_i^T M_i^k u_i$$

for all u_i in the subspace {u_i | Σ_{l∈O(i)} u_il = 0}.

Then there exists a positive scalar ᾱ (depending on D_0, λ, and Λ) such that for all α ∈ (0, ᾱ] and k = 0, 1, ..., the sequence {φ^k} generated by algorithm (8), (9) satisfies

$$D(\phi^{k+1}, r) \le D(\phi^k, r).$$

Furthermore, every limit point of {φ^k} is an optimal solution of problem (3).

Another interesting result, which will not be given here but can be found in [11], states that, after a finite number of iterations, improper links do not appear further in the algorithm, so that for rate of convergence analysis purposes the potential presence of improper links can be ignored. Based on this fact it can be shown, under a mild assumption, that for the single destination case the rate of convergence of the algorithm is linear [11].

The class of algorithms (8), (9) is quite broad, since different choices of matrices M_i^k yield different algorithms. A specific choice of M_i^k yields Gallager's algorithm [1] (except for the difference in the definition of B(i; φ^k) mentioned earlier). This choice is the one for which M_i^k is diagonal with all elements along the diagonal being unity except the (l̄, l̄)th element, which is zero, where l̄ is a node for which δ_il̄ = min_{m∈O(i)} δ_im. In the next section we describe a specific algorithm involving a choice of M_i^k based on second derivatives of D_il. The convergence result of Proposition 2 is applicable to this algorithm.
III. AN ALGORITHM BASED ON SECOND DERIVATIVES
A drawback of the algorithm of [1] is that a proper range of the stepsize parameter α is hard to determine. In order for the algorithm to have guaranteed convergence for a broad range of inputs r, one must take α quite small, but this will lead to a poor speed of convergence for most of these inputs. It appears that in this respect a better choice of the matrices M_i^k can be based on second derivatives. This tends to make the algorithm to a large extent scale-free, and for most problems likely to appear in practice a choice of the stepsize α near unity results in both convergence and reasonably good speed of convergence for a broad range of inputs r. This is supported by extensive computational experience, some of which is reported in [5] and [6].
We use the notation D'_il and D''_il for the first and second derivatives of D_il, evaluated at the current flow f_il(φ, r). We have already assumed that D''_il is positive on the set [0, C_il). We would like to choose the matrices M_i^k to be diagonal with the elements t_i^{-2} ∂²D(φ^k, r)/[∂φ_il]², l ∈ O(i), along the diagonal. This corresponds to an approximation of a constrained version of Newton's method (see [3]), where the off-diagonal terms of the Hessian matrix of D are set to zero. This type of approximated version of Newton's method is often employed in solving large-scale unconstrained optimization problems. Unfortunately, the second derivatives ∂²D/[∂φ_il]² are difficult to compute. However, it is possible to compute easily upper and lower bounds to them which, as shown by computational experiments, are sufficiently accurate for practical purposes.
Calculation of Upper and Lower Bounds to Second Derivatives
We compute ∂²D/[∂φ_il]², evaluated at a loop-free φ ∈ Φ°, for all links (i, l) ∈ ℒ for which l ∉ B(i; φ). Since l ∉ B(i; φ) and φ is loop-free, the node l is not upstream of i. It follows that ∂t_i/∂φ_il = 0 and ∂D'_il/∂φ_il = D''_il t_i. Using again the fact that l is not upstream of i, we have ∂t_i/∂r_l = 0. Differentiating (4), we thus finally obtain

$$\frac{\partial^2 D}{[\partial\phi_{il}]^2} = t_i^2\Big[ D''_{il} + \frac{\partial^2 D}{[\partial r_l]^2} \Big]. \tag{11}$$

A little thought shows that the second derivative ∂²D/[∂r_l]² is given by the more general formula

$$\frac{\partial^2 D}{[\partial r_l]^2} = \sum_{(j,k)\in\mathcal{L}} D''_{jk}\big[q_{jk}(l)\big]^2,$$

where q_jk(l) is the portion of a unit of flow originating at l which goes through link (j, k). We leave the verification of this fact to the reader. However, calculation of ∂²D/[∂r_l]² using this formula is complicated, and in fact there seems to be no easy way to compute this second derivative exactly. However, upper and lower bounds to it can be easily computed, as we now show. Since φ is loop-free, if φ_lm > 0 then m is not upstream of l, and therefore ∂t_l/∂r_l = 1 and ∂D'_lm/∂r_l = D''_lm φ_lm. A similar reasoning shows that ∂(∂D/∂r_m)/∂r_l = Σ_{n∈O(l)} φ_ln ∂²D/∂r_m∂r_n. Combining the above relations with (5), we obtain

$$\frac{\partial^2 D}{[\partial r_l]^2} = \sum_{m\in O(l)} \phi_{lm}^2 D''_{lm} + \sum_{m\in O(l)}\sum_{n\in O(l)} \phi_{lm}\phi_{ln}\,\frac{\partial^2 D}{\partial r_m \partial r_n}. \tag{12}$$

Since ∂²D/∂r_m∂r_n ≥ 0, by setting ∂²D/∂r_m∂r_n to zero for m ≠ n we obtain the lower bound

$$\frac{\partial^2 D}{[\partial r_l]^2} \ge \sum_{m\in O(l)} \phi_{lm}^2\Big[ D''_{lm} + \frac{\partial^2 D}{[\partial r_m]^2} \Big]. \tag{13}$$

By applying the Cauchy-Schwarz inequality in conjunction with the general formula above, we also obtain

$$\frac{\partial^2 D}{\partial r_m\partial r_n} \le \Big[\frac{\partial^2 D}{[\partial r_m]^2}\Big]^{1/2}\Big[\frac{\partial^2 D}{[\partial r_n]^2}\Big]^{1/2}. \tag{14}$$

Using this fact in (12), together with the observation that (Σ_m φ_lm a_m)² ≤ Σ_m φ_lm a_m² whenever Σ_{m∈O(l)} φ_lm = 1, we obtain the upper bound

$$\frac{\partial^2 D}{[\partial r_l]^2} \le \sum_{m\in O(l)} \phi_{lm}\Big[ \phi_{lm} D''_{lm} + \frac{\partial^2 D}{[\partial r_m]^2} \Big]. \tag{15}$$
Consider now the quantities R_l and R̄_l generated, starting from the destination, by the recursions

$$R_N = \bar R_N = 0, \qquad
R_l = \sum_{m\in O(l)} \phi_{lm}^2\big(D''_{lm} + R_m\big), \qquad
\bar R_l = \sum_{m\in O(l)} \phi_{lm}\big(\phi_{lm} D''_{lm} + \bar R_m\big). \tag{16}$$

It is now easy to see that we have, for all l,

$$R_l \le \frac{\partial^2 D}{[\partial r_l]^2} \le \bar R_l. \tag{17}$$

The computation is carried out by passing R̄_l and R_l upstream together with ∂D/∂r_l, and this is well suited for a distributed algorithm. Upper and lower bounds Q̄_il, Q_il for ∂²D/[∂φ_il]², l ∉ B(i; φ), are obtained simultaneously by means of the equations [cf. (11)]

$$Q_{il} = t_i^2\big(D''_{il} + R_l\big), \qquad \bar Q_{il} = t_i^2\big(D''_{il} + \bar R_l\big). \tag{18}$$
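The upstream sweep of (16)-(18) can piggyback on the same message pattern as the first-derivative computation of (4), (5). The sketch below runs the recursion sequentially on a toy loop-free routing; the delay model and numbers are illustrative assumptions.

```python
# Sketch of the upstream recursion (16)-(18): each node l forms R_l, Rbar_l
# from its downstream neighbors' values and passes them upstream; node i then
# brackets d^2 D / d phi_il^2 between Q_il and Qbar_il.  Example data assumed.
def d2(F, C):
    """Second derivative of D_il(F) = F/(C - F), i.e., 2C/(C - F)**3."""
    return 2.0 * C / (C - F) ** 3

N = 3
phi = {1: {2: 0.5, 3: 0.5}, 2: {3: 1.0}}
t = {1: 2.0, 2: 2.0, 3: 3.0}
cap = {(1, 2): 10.0, (2, 3): 10.0, (1, 3): 5.0}
f = {(i, l): t[i] * p for i, out in phi.items() for l, p in out.items()}

R, Rbar = {N: 0.0}, {N: 0.0}
for l in [2, 1]:                       # each node after its downstream neighbors
    R[l] = sum(p * p * (d2(f[l, m], cap[l, m]) + R[m]) for m, p in phi[l].items())
    Rbar[l] = sum(p * (p * d2(f[l, m], cap[l, m]) + Rbar[m]) for m, p in phi[l].items())

Q, Qbar = {}, {}
for i, out in phi.items():
    for l in out:                      # assuming no link of the example is blocked
        Q[i, l] = t[i] ** 2 * (d2(f[i, l], cap[i, l]) + R[l])
        Qbar[i, l] = t[i] ** 2 * (d2(f[i, l], cap[i, l]) + Rbar[l])

print(R, Rbar)
print(Q, Qbar)
```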
It is to be noted that in some situations occurring frequently in practice the upper and lower bounds Q̄_il and Q_il coincide and are equal to the true second derivative. This will occur if φ_lm φ_ln ∂²D/∂r_m∂r_n = 0 for all m ≠ n. For example, if the routing pattern is as shown in Fig. 1 (only links that carry flow are shown), then Q_il = Q̄_il = ∂²D/[∂φ_il]² for all (i, l) ∈ ℒ, l ∉ B(i; φ). A typical case where Q_il ≠ Q̄_il, and the discrepancy materially affects the algorithm to be presented, is when flow originating at i splits and joins again on its way to N, as shown in Fig. 2.

Fig. 1. Case where the upper and lower bounds on the second derivative are equal.

Fig. 2. Case where the upper and lower bounds on the second derivative are not equal.

The Algorithm

The following algorithm seems to be a reasonable choice. If t_i ≠ 0, we take M_i^k in (9) to be the diagonal matrix with the elements Q̄_il/t_i², l ∈ O(i), along the diagonal, where Q̄_il is the upper bound computed from (18) and (14)-(16), and α is a positive scalar chosen experimentally. (In most cases α = 1 is satisfactory.) Convergence of this algorithm can be easily established by verifying that the assumption of Proposition 2 is satisfied. A variation of the method results if we use, in place of the upper bound Q̄_il, the average of the upper and lower bounds (Q̄_il + Q_il)/2. This, however, requires additional computation and communication between nodes.

Problem (9) can be written for t_i ≠ 0 as

$$\begin{aligned}
&\text{minimize} && \sum_{l\notin B(i;\phi^k)}\Big[\delta_{il}\,\Delta\phi_{il} + \frac{\bar Q_{il}}{2\alpha t_i}\,(\Delta\phi_{il})^2\Big]\\
&\text{subject to} && \sum_{l\notin B(i;\phi^k)} \Delta\phi_{il} = 0, \qquad \phi^k_{il} + \Delta\phi_{il} \ge 0,
\end{aligned} \tag{19}$$

and can be solved using a Lagrange multiplier technique. By introducing the expression (18) for Q̄_il and carrying out the straightforward calculation, we can write the corresponding iteration (8) as

$$\phi_{il}^{k+1} = \max\Big\{0,\ \phi_{il}^k - \frac{\alpha\,t_i\,(\delta_{il}-\mu_i)}{\bar Q_{il}}\Big\},\quad l\notin B(i;\phi^k), \qquad \phi_{il}^{k+1} = 0,\quad l\in B(i;\phi^k), \tag{20}$$

where μ_i is a Lagrange multiplier determined from the condition

$$\sum_{l\in O(i)} \phi_{il}^{k+1} = 1.$$

The equation above is piecewise linear in the single variable μ_i and is nearly trivial computationally. Note from (20) that α plays the role of a stepsize parameter. For t_i = 0 the algorithm sets, consistently with problem (9), φ_il̄^{k+1} = 1 for the node l̄ for which δ_il̄ is minimum over all δ_il, and sets φ_il^{k+1} = 0 for l ≠ l̄.

It can be seen that (20) is such that all routing variables φ_il such that δ_il < μ_i will be increased or stay fixed at unity, while all routing variables such that δ_il > μ_i will be decreased or stay fixed at zero. In particular, the routing variable with smallest δ_il will either be increased or stay fixed at unity, similarly as in Gallager's algorithm.
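Under the reconstruction of (19), (20) given above, each node's update reduces to a one-dimensional search for the multiplier μ_i. The sketch below does this by bisection on the piecewise-linear function μ ↦ Σ_l φ_il^{k+1}(μ); all inputs are illustrative assumptions, and the formula mirrors the reconstruction above rather than claiming to be the paper's exact expression.

```python
# Sketch of the per-node update (20) (as reconstructed above): find the
# multiplier mu_i so that the updated fractions sum to one, then project.
# Inputs (delta, Qbar, phi, t_i, alpha, blocked) are illustrative assumptions.
def update_node(phi_i, delta_i, Qbar_i, t_i, alpha=1.0, blocked=frozenset()):
    free = [l for l in phi_i if l not in blocked]

    def new_phi(mu):
        return {l: max(0.0, phi_i[l] - alpha * t_i * (delta_i[l] - mu) / Qbar_i[l])
                for l in free}

    def total(mu):                     # nondecreasing, piecewise linear in mu
        return sum(new_phi(mu).values())

    lo = min(delta_i[l] for l in free) - 1.0
    hi = max(delta_i[l] for l in free) + 1.0
    while total(lo) > 1.0:             # expand the bracket downward if needed
        lo -= (hi - lo)
    while total(hi) < 1.0:             # expand the bracket upward if needed
        hi += (hi - lo)
    for _ in range(60):                # bisection to solve total(mu) = 1
        mid = 0.5 * (lo + hi)
        if total(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    out = new_phi(0.5 * (lo + hi))
    out.update({l: 0.0 for l in blocked})
    return out

phi_i = {2: 0.7, 3: 0.3}               # current fractions at node i
delta_i = {2: 0.9, 3: 0.4}             # delta_il = D'_il + dD/dr_l
Qbar_i = {2: 1.5, 3: 1.2}              # upper bounds from (18)
print(update_node(phi_i, delta_i, Qbar_i, t_i=2.0))
```

Note how flow is shifted toward the link with the smaller δ_il, as the discussion of (20) above indicates.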
and carrying out the
insituationssuchastheonedepictedbyFig.
3. Here r 1 =
straightforwardcalculationwecanwritethecorresponding
r2 = r 3 = r4 > 0 , r s = r 6 = 0 and node 7 is the only destinaiteration (8) as
tion. If the algorithm of the previous section is applied to this
examplewith a = 1 , thenitcanbeverifiedthateach
of
the nodes 1, 2 , 3 , and 4 will adjust its routing variables according to what would be Newton’s method
if all other variables
remained unchanged. If we assume symmetric initial conditions
Iv.
ail
ail
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-32, NO.8 , AUGUST 1984
916
Fig. 3. Network for which a stepsize 01
=
1 leads to divergence
and that the first and second derivativesD , 7 ' , D57" and D,, ',
aremuchlargerthanthecorrespondingderivatives
of
all other links, then the algorithm would lead to a change
of
flowaboutfourtimeslargerthan.appropriate.Thus,for
example,astepsize
CY = 1 / 4 is appropriate,while (Y = 1can
lead to divergence.
The algorithm proposed in this section bypasses these difficulties at the expense of additional computation per iteration. We show that if the initial flow vector is near optimal, then the algorithm is guaranteed to reduce the value of the objective function at each iteration and to converge to the optimum with a unity stepsize. The algorithm "upper bounds" a quadratic approximation to the objective function D. This is done by first making a trial change Δφ* in the routing variables using algorithm (8), (9). The link flows that would result from this change are then calculated going from the "most upstream" nodes downstream towards the destination. Based on the calculated trial flows, the algorithm "senses" situations like the one in Fig. 3 and finds a new change Δφ̄. We describe the algorithm for the case of a single destination (node N). The algorithm for the case of more than one destination consists of sequences of single destination iterations whereby all destinations are taken up cyclically (i.e., the one-at-a-time mode of operation).
The Upper Bound

At the typical iteration of the algorithm, we have a vector of loop-free routing variables φ and a corresponding flow vector f. Let Δf denote an increment of flow such that f + Δf is feasible. A constrained version of Newton's method [3] is obtained if Δf is chosen to minimize the quadratic objective function

$$N(\Delta f) = \sum_{i,l}\Big[ D'_{il}\,\Delta f_{il} + \tfrac{1}{2} D''_{il}\,(\Delta f_{il})^2 \Big] \tag{22}$$

subject to f + Δf ∈ F, where F is the set of feasible flow vectors. Let Δφ be a change in φ corresponding to Δf and let

$$\bar\phi = \phi + \Delta\phi. \tag{23}$$

Let t be the vector of total traffic at the network nodes [cf. (1)], and let Δt be the corresponding change in t. Then

$$\Delta t_l = \sum_{i\in I(l)}\big[ t_i\,\Delta\phi_{il} + \Delta t_i\,(\phi_{il} + \Delta\phi_{il}) \big], \tag{24}$$

$$\Delta f_{il} = t_i\,\Delta\phi_{il} + \Delta t_i\,(\phi_{il} + \Delta\phi_{il}) = t_i\,\Delta\phi_{il} + \Delta t_i\,\bar\phi_{il}. \tag{25}$$

Substituting (25) in (22), we can express N(Δf) in terms of Δφ and Δt:

$$N(\Delta f) = \sum_{i,l}\Big[ D'_{il}\big(t_i\Delta\phi_{il} + \Delta t_i\bar\phi_{il}\big) + \tfrac{1}{2} D''_{il}\big(t_i\Delta\phi_{il} + \Delta t_i\bar\phi_{il}\big)^2 \Big]. \tag{26}$$
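The coupling through Δt in (24)-(26) is easy to see numerically: a change at one node perturbs the traffic of every node downstream. The sketch below propagates Δt and Δf downstream for a given Δφ and evaluates the quadratic model N(Δf) of (22); the example network, delay model, and trial change are assumptions.

```python
# Sketch: given a change Delta-phi, propagate Delta-t downstream via (24),
# form Delta-f via (25), and evaluate the quadratic model N(Delta f) of (22).
# Network, delay model, and the candidate change are illustrative assumptions.
def d1(F, C): return C / (C - F) ** 2          # D'_il for D_il = F/(C-F)
def d2(F, C): return 2.0 * C / (C - F) ** 3    # D''_il

phi = {1: {2: 0.5, 3: 0.5}, 2: {3: 1.0}}
dphi = {1: {2: -0.1, 3: 0.1}, 2: {3: 0.0}}      # candidate change, rows sum to 0
t = {1: 2.0, 2: 2.0, 3: 3.0}
cap = {(1, 2): 10.0, (2, 3): 10.0, (1, 3): 5.0}
f = {(i, l): t[i] * p for i, out in phi.items() for l, p in out.items()}

dt = {i: 0.0 for i in t}
df = {}
for i in [1, 2]:                                # downstream order: i before its successors
    for l, p in phi[i].items():
        df[i, l] = t[i] * dphi[i][l] + dt[i] * (p + dphi[i][l])   # (25)
        dt[l] += df[i, l]                                         # (24)

N_model = sum(d1(f[e], cap[e]) * df[e] + 0.5 * d2(f[e], cap[e]) * df[e] ** 2
              for e in f)                                          # (22)
print(dt, N_model)
```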
We would like to minimize the expression (26) by a distributed algorithm in which each node i selects Δφ_il for each of its outgoing links (i, l). The difficulty here is that the nodes are all coupled through the vector Δt; a change Δφ_il generates a change Δt_n at each node n downstream of the link (i, l). In what follows, we will first eliminate the dependence of N(Δf) on the linear terms in Δt; we then proceed to upper bound N(Δf) in such a way as to eliminate the quadratic terms in Δt. Finally, we show how each node i can select Δφ_il for its outgoing links so as to approximately minimize the upper bound to N(Δf). We start by combining the two terms in (26) that are linear in Δt,

$$N(\Delta f) = \sum_{i,l}\Big[ D'_{il}\,t_i\Delta\phi_{il} + \tfrac{1}{2} D''_{il}\,(t_i\Delta\phi_{il})^2 \Big]
+ \sum_{i,l} z_{il}\,\bar\phi_{il}\,\Delta t_i
+ \tfrac{1}{2}\sum_{i,l} D''_{il}\,(\bar\phi_{il}\,\Delta t_i)^2, \tag{27}$$

where

$$z_{il} = D'_{il} + D''_{il}\,t_i\,\Delta\phi_{il}. \tag{28}$$

We can interpret z_il, to first order, as the derivative of D_il evaluated at the flow t_i φ̄_il. The following simple lemma will eliminate Δt from the second term in (27); we state it in greater generality than needed here since we will use it again on the quadratic term in Δt.

Lemma 1: Let p_il be real for each (i, l) ∈ ℒ, and for each node i let T_i, T̄_i, d_i, d̄_i be variables related by

$$T_l = \bar T_l + \sum_{i\in I(l)} p_{il}\,T_i, \tag{29}$$

$$\bar d_i = d_i + \sum_{l\in O(i)} p_{il}\,\bar d_l. \tag{30}$$

Then

$$\sum_i d_i\,T_i = \sum_i \bar d_i\,\bar T_i.$$

Proof: Using (30) and then (29), we have

$$\sum_i \bar d_i\,\bar T_i = \sum_i \bar d_i\Big(T_i - \sum_{m\in I(i)} p_{mi}\,T_m\Big)
= \sum_i \bar d_i\,T_i - \sum_m T_m \sum_{i\in O(m)} p_{mi}\,\bar d_i
= \sum_i \bar d_i\,T_i - \sum_m T_m\big(\bar d_m - d_m\big)
= \sum_i d_i\,T_i. \qquad \text{Q.E.D.}$$

To use this lemma on the second term of (27), associate φ̄_il with p_il, Δt_i with T_i, and Σ_m t_m Δφ_mi with T̄_i; then (24) is precisely relation (29). Defining d_i = Σ_{l∈O(i)} z_il φ̄_il and D̄'_i by

$$\bar D'_N = 0, \tag{31}$$

$$\bar D'_i = \sum_{l\in O(i)} \bar\phi_{il}\big( z_{il} + \bar D'_l \big), \tag{32}$$

and associating D̄'_i with d̄_i, the lemma asserts that

$$\sum_{i,l} z_{il}\,\bar\phi_{il}\,\Delta t_i = \sum_{i,l} \bar D'_l\,t_i\,\Delta\phi_{il}. \tag{33}$$

It can be seen that D̄'_i can be calculated in a distributed fashion starting from the destination and proceeding upstream, similarly as in algorithm (8), (9).
Using (33) in (27), we have

$$N(\Delta f) = \sum_{i,l}\big(D'_{il} + \bar D'_l\big)\,t_i\,\Delta\phi_{il}
+ \frac{1}{2}\sum_{i,l} D''_{il}\,(\Delta t_i\,\bar\phi_{il})^2
+ \frac{1}{2}\sum_{i,l} D''_{il}\,(t_i\,\Delta\phi_{il})^2. \tag{34}$$

All of the terms in (34) except for the (Δt_i)² terms can be calculated in a distributed fashion, moving upstream as in (8), (9). We recall now that the algorithm is going to use the algorithm of (8), (9) to first calculate a trial change Δφ*. We next show how Δφ* will be used to upper bound (Δt_i)² in such a way that Lemma 1 can be employed on the result. For all (i, l) ∈ ℒ we define

$$\Delta\phi_{il}^{*+} = \max(0, \Delta\phi_{il}^*), \qquad \Delta\phi_{il}^{*-} = \big|\min(0, \Delta\phi_{il}^*)\big|, \tag{35}$$

$$\Delta t_l^{*+} = \sum_{i\in I(l)}\big[ t_i\,\Delta\phi_{il}^{*+} + \Delta t_i^{*+}\,(\phi_{il} + \Delta\phi_{il}^{*+}) \big], \tag{36}$$

$$\Delta t_l^{*-} = \sum_{i\in I(l)}\big[ t_i\,\Delta\phi_{il}^{*-} + \Delta t_i^{*-}\,\phi_{il} \big]. \tag{37}$$

The quantities Δt_l^{*+}, Δt_l^{*−} are well defined by virtue of the fact that the set of links

$$\mathcal{L}^* = \{ (i,l)\in\mathcal{L} \mid \phi_{il} > 0 \ \text{or}\ \phi_{il} + \Delta\phi_{il}^* > 0 \}$$

forms an acyclic network [in view of the manner in which the sets of blocked nodes B(i; φ) are defined in algorithm (8), (9)]. As a result, Δt_l^{*+} and Δt_l^{*−} are zero for all nodes l which are the "most upstream" in this acyclic network. Starting from these nodes and proceeding downstream, the computation of Δt_l^{*+} and Δt_l^{*−} can be carried out in a distributed manner for each l using (36) and (37).
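The quantities Δt_l^{*+}, Δt_l^{*−} of (36), (37) are accumulated as the computation moves downstream over the acyclic network ℒ*. The sketch below performs that sweep sequentially on a toy trial change; the data are assumptions.

```python
# Sketch of the downstream recursions (36), (37): accumulate the positive and
# negative trial traffic changes Delta-t*+ and Delta-t*- node by node, moving
# from the "most upstream" nodes toward the destination.  Data are assumed.
phi = {1: {2: 0.5, 3: 0.5}, 2: {3: 1.0}}
dphi_star = {1: {2: -0.1, 3: 0.1}, 2: {3: 0.0}}   # trial change from (8), (9)
t = {1: 2.0, 2: 2.0, 3: 3.0}

dt_plus = {i: 0.0 for i in t}
dt_minus = {i: 0.0 for i in t}
for i in [1, 2]:                                   # downstream (topological) order
    for l, p in phi[i].items():
        d = dphi_star[i][l]
        d_plus, d_minus = max(0.0, d), abs(min(0.0, d))           # (35)
        dt_plus[l] += t[i] * d_plus + dt_plus[i] * (p + d_plus)   # (36)
        dt_minus[l] += t[i] * d_minus + dt_minus[i] * p           # (37)

print(dt_plus, dt_minus)
```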
We next define the same positive and negative parts for the actual change Δφ̄ to be determined,

$$\Delta\bar\phi_{il}^{+} = \max(0, \Delta\bar\phi_{il}), \qquad \Delta\bar\phi_{il}^{-} = \big|\min(0, \Delta\bar\phi_{il})\big|, \tag{38}$$

and the corresponding quantities Δt_l^{+}, Δt_l^{−}, defined as in (36), (37) with Δφ̄^{+}, Δφ̄^{−} in place of Δφ^{*+}, Δφ^{*−} [(39), (40)]. The following constraints are now placed on Δφ̄:

$$\Delta\bar\phi_{il} > 0 \quad \text{only if } \Delta\phi_{il}^* > 0, \tag{41a}$$

$$\Delta\bar\phi_{il} < 0 \quad \text{only if } \Delta\phi_{il}^* < 0, \tag{41b}$$

$$\sum_{l\in O(i)} \Delta\bar\phi_{il} = 0, \qquad \phi_{il} + \Delta\bar\phi_{il} \ge 0. \tag{41c}$$

With these constraints, Δt_l^{+} and Δt_l^{−} are also well defined; Δt_l^{+} is interpreted as the increase in flow at l due to increases in Δφ̄, omitting the effects of decreases in Δφ̄. Similarly, Δt_l^{−} is an upper bound to the magnitude of the decrease in flow at l due to decreases in Δφ̄. It follows easily that

$$(\Delta t_l)^2 \le (\Delta t_l^{+})^2 + (\Delta t_l^{-})^2. \tag{42}$$

With the constraint (41), it is also easy to see, from (39), (40), that for each l

$$\Delta t_l^{+} > 0 \quad \text{only if } \Delta t_l^{*+} > 0, \qquad
\Delta t_l^{-} > 0 \quad \text{only if } \Delta t_l^{*-} > 0. \tag{43}$$

The following proposition yields the desired upper bound.

Proposition 3: Under the constraint (41),

$$N(\Delta f) \le \sum_{i\ne N} t_i\, Q_i(\Delta\bar\phi_i), \tag{44}$$

where Q_i(Δφ̄_i) is a quadratic function of the variables Δφ̄_il of node i alone, given by (45)-(48); it is constructed from D'_il, D̄'_l, D''_il, the trial quantities Δt_l^{*+}, Δt_l^{*−}, and the scalars D̄_i''^{+}, D̄_i''^{−} generated recursively, moving upstream from the destination, by

$$\bar D_i''^{+} = \sum_{l\in O(i)}\big[ D''_{il}\,\bar\phi_{il}^2\,\Delta t_l^{*+} + \bar D_l''^{+}\,\bar\phi_{il} \big], \qquad
\bar D_i''^{-} = \sum_{l\in O(i)}\big[ D''_{il}\,\bar\phi_{il}^2\,\Delta t_l^{*-} + \bar D_l''^{-}\,\bar\phi_{il} \big], \qquad i\ne N, \tag{49}$$

$$\bar D_N''^{+} = \bar D_N''^{-} = 0. \tag{50}$$
Proof: In view of (34), it will suffice to show that (51) and (52) hold. Define T_l, for all nodes l, to satisfy (53); we exclude all nodes l for which the denominator Δt_l^{*+} [and hence, by (41) and (43), also the numerator] is 0. Using the Cauchy-Schwarz inequality on (39), we have (Δt_l^{+})²/Δt_l^{*+} ≤ T_l. Applying Lemma 1, associating T_l with the first term of (53) and d̄_i with D̄_i''^{+} in (49), we obtain (55). The second term can be considered as a sum over just those terms for which Δφ̄_il^{+} > 0. Handling the Δt_i^{−} term in the same way, we obtain (56). Combining (55), (56) with (51), (52) completes the proof. Q.E.D.
The Algorithm

The algorithm can now be completely defined. After the routing increment Δφ* is calculated in a distributed manner by means of algorithm (8), (9), each node i computes the quantities Δt_i^{*+} and Δt_i^{*−}. This is done recursively and in a distributed manner by means of (36), (37), starting from the "most upstream" nodes and proceeding downstream towards the destination. When this downstream propagation of information reaches the destination, indicating that all nodes have completed the computation of Δt_i^{*+} and Δt_i^{*−}, the destination gives the signal for initiation of the second phase of the iteration, which consists of computation of the actual routing increments Δφ̄. To do this, each node i must receive the values of D̄_l', D̄_l''^{+}, and D̄_l''^{−} from its downstream neighbors l, and then determine the increments Δφ̄_il which minimize Q_i(Δφ̄_i) subject to the constraint (41), and the new routing variables

$$\bar\phi_{il} = \phi_{il} + \Delta\bar\phi_{il}.$$

Then node i proceeds to compute D̄_i', D̄_i''^{+}, and D̄_i''^{−} via (32), (49), and (50), and broadcasts these values to all upstream neighbors. Thus, proceeding recursively upstream from the destination, each node computes the actual routing increments Δφ̄_i in much the same way as the trial routing increments Δφ_i* were computed earlier.
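The iteration just described is a two-sweep protocol over the acyclic flow graph: one pass downstream for the trial quantities, one pass upstream for the actual update. The sketch below shows only this coordination pattern with placeholder per-node computations; the payload functions are hypothetical stand-ins, not the paper's formulas.

```python
# Coordination skeleton of the two-phase iteration: a downstream sweep (trial
# traffic changes) followed by an upstream sweep (actual routing increments).
# The per-node "work" here is a placeholder; only the message ordering is real.
from graphlib import TopologicalSorter

edges = {(1, 2), (1, 3), (2, 3)}          # acyclic flow-carrying links, dest = 3

ts = TopologicalSorter()
for i, l in edges:
    ts.add(l, i)                          # i must be processed before l
downstream_order = list(ts.static_order())
upstream_order = list(reversed(downstream_order))

state = {n: {} for n in downstream_order}

# Phase 1: downstream sweep (stands in for the computation of (36), (37)).
for n in downstream_order:
    preds = [i for (i, l) in edges if l == n]
    state[n]["trial"] = 1 + sum(state[i]["trial"] for i in preds)

# Phase 2: upstream sweep (stands in for computing and broadcasting
# D-bar', D-bar''+, D-bar''- and choosing the actual increments).
for n in upstream_order:
    succs = [l for (i, l) in edges if i == n]
    state[n]["actual"] = state[n]["trial"] + sum(state[l]["actual"] for l in succs)

print(state)
```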
We now analyze the descent properties of the algorithm. We assume a single destination, but the proof extends trivially to the case where we have multiple destinations and the algorithm is operated in the one destination at a time mode.

In view of the fact that each function D_il is strictly convex, it follows that there is a unique optimal set of total link flows {f_il* | (i, l) ∈ ℒ}. It is clear that given any ε > 0 there exists a scalar γ_ε such that for all feasible total link flow vectors f satisfying

$$| f_{il} - f_{il}^* | < \gamma_\epsilon, \qquad \forall\,(i,l)\in\mathcal{L}, \tag{57}$$

we have (58). The strict positivity assumption on D''_il also implies that for each γ_ε > 0 there exists a scalar δ(γ_ε) such that every feasible f satisfying Σ_{i,l} D_il(f_il) ≤ δ(γ_ε) also satisfies (57) and hence also (58). Furthermore, δ(γ_ε) can be taken arbitrarily large provided γ_ε is sufficiently large. We will make use of this fact in the proof of the following result, the proof of which may be found in [14].

Proposition 4: Let φ and φ̄ be two successive vectors of routing variables generated by the algorithm of this section (with stepsize α = 1), and let f and f̄ be the corresponding vectors of link flows. Assume that for some ε with 0 < ε < 1, γ_ε is the scalar corresponding to ε as in (57), (58), δ(γ_ε) is such that (57) [and hence also (58)] holds for all feasible f satisfying

$$\sum_{i,l} D_{il}(f_{il}) \le \delta(\gamma_\epsilon), \tag{59}$$

and f satisfies (59). Then D(f̄) ≤ D(f).

The preceding proposition shows that the algorithm of this section does not increase the value of the objective function once the flow vector f enters a region of the form {f | Σ_{i,l} D_il(f_il) ≤ δ(γ_ε)}, and that the size of this region increases as the third derivative of D_il becomes smaller. Indeed, if each function D_il is quadratic then (58) is satisfied for all ε > 0 and the algorithm will not increase the value of the objective for all feasible f. These facts can be used to show that if the starting total flow vector f is sufficiently close to the optimal, the algorithm of this section will converge to the optimal solution. The proof is similar to the one of Proposition 2 as given in [11] and is omitted.

We cannot expect to be able to guarantee theoretical convergence when the starting routing variables are far from optimal, since this is not a generic property of Newton's method, which the algorithm attempts to approximate. However, in a large number of computational experiments with objective functions typically arising in communication networks, and with starting flow vectors which were far from optimal [5], we have never observed divergence or an increase of the value of the objective function in a single iteration. In any case, it is possible to prove a global convergence result for the version of the algorithm whereby the expression Q_i(Δφ̄_i) is replaced by the corresponding expression (61),
where α is a sufficiently small positive scalar stepsize. The preceding analysis can be easily modified to show that if we introduce a stepsize α as in (61), then the algorithm of this section is a descent algorithm at all flows in a suitable region Φ'. From this it follows that, given any starting point φ⁰ ∈ Φ', there exists a scalar ᾱ > 0 such that for all stepsizes α ∈ (0, ᾱ] the algorithm of this section does not increase the value of the objective function at each subsequent iteration. This fact can be used to prove a convergence result similar to the one of Proposition 2.
REFERENCES
[1] R. G. Gallager, "A minimum delay routing algorithm using distributed computation," IEEE Trans. Commun., vol. COM-25, pp. 73-85, 1977.
[2] A. A. Goldstein, "Convex programming in Hilbert space," Bull. Amer. Math. Soc., vol. 70, pp. 709-710, 1964.
[3] E. S. Levitin and B. T. Polyak, "Constrained minimization problems," USSR Comput. Math. Math. Phys., vol. 6, pp. 1-50, 1966.
[4] D. P. Bertsekas, "Algorithms for nonlinear multicommodity network flow problems," in Proc. Int. Symp. Syst. Optimization and Analysis, A. Bensoussan and J. L. Lions, Eds. Springer-Verlag, pp. 210-224, 1979.
[5] D. P. Bertsekas, E. Gafni, and K. S. Vastola, "Validation of algorithms for optimal routing of flow in networks," in Proc. IEEE Conf. Decision and Control, San Diego, CA, pp. 220-227, Jan. 1979.
[6] K. S. Vastola, "A numerical study of two measures of delay for network routing," M.S. thesis, Dep. Elec. Eng., Univ. Illinois, Urbana, Sept. 1979.
[7] D. P. Bertsekas, "A class of optimal routing algorithms for communication networks," in Proc. 5th Int. Conf. Comput. Commun. (ICCC-80), Atlanta, GA, pp. 71-76, Oct. 1980.
[8] D. P. Bertsekas and E. Gafni, "Projection methods for variational inequalities with application to the traffic assignment problem," Mass. Inst. Technol., Cambridge, Rep. LIDS-P-1043, June 1980; see also Math. Programming Study, vol. 17, pp. 139-159, 1982.
[9] S. J. Golestaani, "A unified theory of flow control and routing in data communication networks," Lab. Inform. and Decision Syst., Mass. Inst. Technol., Cambridge, Rep. LIDS-TH-963, Jan. 1980.
[10] R. G. Gallager and S. J. Golestaani, "Flow control and routing algorithms for data networks," in Proc. 5th Int. Conf. Comput. Commun. (ICCC-80), Atlanta, GA, Oct. 1980, pp. 779-784.
[11] E. M. Gafni, "Convergence of a routing algorithm," Lab. Inform. and Decision Syst., Mass. Inst. Technol., Cambridge, Rep. LIDS-TH-907, May 1979.
[12] D. P. Bertsekas and E. M. Gafni, "Projected Newton methods and optimization of multicommodity flows," Lab. Inform. and Decision Syst., Mass. Inst. Technol., Cambridge, Rep. LIDS-P-1140, Aug. 1981; see also IEEE Trans. Automat. Contr., Dec. 1983.
[13] E. M. Gafni, "The integration of routing and flow control for voice and data in computer communication networks," Ph.D. dissertation, Dep. Elec. Eng. and Comput. Sci., Mass. Inst. Technol., Cambridge, Aug. 1982.
[14] D. P. Bertsekas, E. M. Gafni, and R. G. Gallager, "Second derivative algorithms for minimum delay distributed routing in networks," Lab. Inform. and Decision Syst., Mass. Inst. Technol., Cambridge, Rep. LIDS-P-1082-A, Aug. 1982.
[15] D. P. Bertsekas, "Optimal routing and flow control methods for communication networks," in Analysis and Optimization of Systems, A. Bensoussan and J. L. Lions, Eds. Springer-Verlag, pp. 615-643, 1982.
[16] D. P. Bertsekas, R. Gendron, and W. K. Tsai, "Implementation of an optimal multicommodity network flow algorithm based on gradient projection and a path flow formulation," Lab. Inform. and Decision Syst., Mass. Inst. Technol., Cambridge, Rep. LIDS-P-1364, Jan. 1984.

Dimitri P. Bertsekas was born in Athens, Greece, in 1942. He received the Mechanical and Electrical Engineering Diploma from the National Technical University of Athens, Greece, in 1965, the M.S.E.E. degree from George Washington University in 1969, and the Ph.D. degree in system science from the Massachusetts Institute of Technology, Cambridge, in 1971.

He has held faculty positions with the Department of Engineering-Economic Systems, Stanford University (1971-1974) and the Department of Electrical Engineering, University of Illinois, Urbana (1974-1979). He is currently Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. He consults regularly with private industry. He has done research in the areas of estimation and control of stochastic systems, nonlinear and dynamic programming, and data communication networks, and has written numerous papers in each of these areas. He is the author of Dynamic Programming and Stochastic Control (Academic, 1976) and Constrained Optimization and Lagrange Multiplier Methods (Academic, 1982), and the coauthor of Stochastic Optimal Control: The Discrete-Time Case (Academic, 1978).