Max Plus Algebra and Queues
Bernd Heidergott
EURANDOM research fellow
Vrije Universiteit
Department of Econometrics and Operations Research
De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
email:[email protected]
Lecture 1: Max Plus Algebra
The Beginnings
Basic Definitions
Define ε := −∞ and e := 0, and denote by Rmax the set R ∪ {ε}. For
a, b ∈ Rmax, we define the operations ⊕ and ⊗ by

    a ⊕ b := max(a, b)   and   a ⊗ b := a + b.
The set Rmax together with the operations ⊕ and ⊗ is called max-plus algebra
and is denoted by
Rmax = (Rmax , ⊕, ⊗, ε, e) .
As in conventional algebra, we simplify the notation by letting the operation
⊗ have priority over the operation ⊕. For example,
5 ⊗ −9 ⊕ 7 ⊗ 1
has to be understood as
(5 ⊗ −9) ⊕ (7 ⊗ 1) .
Notice that (5 ⊗ −9) ⊕ (7 ⊗ 1) = 8, whereas 5 ⊗ (−9 ⊕ 7) ⊗ 1 = 13.
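These operations are easy to experiment with. The following minimal Python
sketch (an illustration of ours, not part of the notes) encodes ⊕ and ⊗ with
ε represented as float('-inf') and reproduces the two evaluations above:

    # Max-plus scalar operations (illustrative sketch).
    EPS = float('-inf')   # the zero element: epsilon = -infinity
    E = 0.0               # the unit element: e = 0

    def oplus(a, b):
        """a (+) b = max(a, b)."""
        return max(a, b)

    def otimes(a, b):
        """a (x) b = a + b, with epsilon absorbing."""
        return EPS if EPS in (a, b) else a + b

    print(oplus(otimes(5, -9), otimes(7, 1)))   # (5 (x) -9) (+) (7 (x) 1) = 8
    print(otimes(otimes(5, oplus(-9, 7)), 1))   # 5 (x) (-9 (+) 7) (x) 1 = 13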
Extension of operations to −∞
Clearly, max(a, −∞) = max(−∞, a) = a and a + (−∞) = −∞ + a = −∞
for any a ∈ Rmax, so that

    a ⊕ ε = ε ⊕ a = a   and   a ⊗ ε = ε ⊗ a = ε,                   (1)

for any a ∈ Rmax.
Examples
5 ⊕ 3 = max(5, 3) = 5, 5 ⊕ ε = max(5, −∞) = 5, 5 ⊗ ε = 5 − ∞ =
−∞ = ε, e ⊕ 3 = max(0, 3) = 3, and 5 ⊗ 3 = 5 + 3 = 8.
Fun with Max Plus
Powers are introduced in max-plus algebra in the natural way, using the
associative property. We denote the set of natural numbers including zero by
N and define, for x ∈ Rmax,

    x⊗n := x ⊗ x ⊗ · · · ⊗ x  (n times) = x + x + · · · + x  (n times) = n × x,

for all n ∈ N with n ≠ 0; for n = 0 we define x⊗0 := e (= 0). For example,

    5⊗3 = 3 × 5 = 15.
Inspired by this, we similarly introduce general powers of real numbers as

    x⊗α := α × x,

for α ∈ R. For example,

    8⊗(1/2) = (1/2) × 8 = 4

and

    12⊗(−1/4) = −(1/4) × 12 = −3 = 3⊗(−1).
What is √(−1) in max-plus algebra? Show that Fermat's last theorem
(x^n + y^n = z^n has no nonzero integer solutions for x, y, and z when n > 2)
fails to hold in max-plus algebra.
Let's get abstract
Algebraic properties
• Associativity:

  ∀x, y, z ∈ Rmax :  x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z

  and

  ∀x, y, z ∈ Rmax :  x ⊗ (y ⊗ z) = (x ⊗ y) ⊗ z.
• Commutativity:

  ∀x, y ∈ Rmax :  x ⊕ y = y ⊕ x   and   x ⊗ y = y ⊗ x.
• Distributivity of ⊗ over ⊕:
∀x, y, z ∈ Rmax :
x ⊗ (y ⊕ z) = (x ⊗ y) ⊕ (x ⊗ z) .
• Existence of a zero element:
∀x ∈ Rmax :
x ⊕ ε = ε ⊕ x = x.
• Existence of a unit element:
∀x ∈ Rmax :
x ⊗ e = e ⊗ x = x.
• The zero is absorbing for ⊗:
∀x ∈ Rmax :
x ⊗ ε = ε ⊗ x = ε.
• Idempotency of ⊕:
∀x ∈ Rmax :
x ⊕ x = x.
Definition 1 A semiring is a nonempty set R endowed with two binary operations ⊕R and ⊗R such that
• ⊕R is associative and commutative with zero element εR ;
• ⊗R is associative, distributes over ⊕R , and has unit element eR ;
• εR is absorbing for ⊗R .
Such a semiring is denoted by R = (R, ⊕R , ⊗R , εR , eR ). If ⊗R is commutative,
then R is called commutative, and if ⊕R is idempotent, then it is called
idempotent.
Max-plus algebra is an example of a commutative and idempotent semiring. Here are examples of other meaningful semirings:
• Identify ⊕R with conventional addition, denoted by +, and ⊗R with
conventional multiplication, denoted by ×. Then the zero and unit
element are εR = 0 and eR = 1, respectively. The object Rst =
(R, +, ×, 0, 1) – the subscript st refers to “standard” – is an instance of
a semiring over the real numbers. Since conventional multiplication is
commutative, Rst is a commutative semiring. Note that Rst fails to be
idempotent. However, as is well known, Rst is a ring and even a field
with respect to the operations + and × .
• Min-plus algebra is defined as Rmin = (Rmin, ⊕′, ⊗, ε′, e), where Rmin =
  R ∪ {+∞}, ⊕′ is the operation defined by a ⊕′ b := min(a, b) for all
  a, b ∈ Rmin, and ε′ := +∞. Note that Rmin is an idempotent, commutative
  semiring.
• Consider Rmin,max = (R̄, ⊕′, ⊕, ε′, ε), with R̄ = R ∪ {ε, ε′}, and set
  ε ⊕ ε′ = ε′ ⊕ ε = ε′. Then Rmin,max is an idempotent, commutative
  semiring. In the same vein, Rmax,min = (R̄, ⊕, ⊕′, ε, ε′) is an idempotent,
  commutative semiring, provided that one defines ε ⊕′ ε′ = ε′ ⊕′ ε = ε.
• Let F = {f | f(s) ≤ f(t) for all s ≤ t, s, t ∈ R} denote the set of
  wide-sense increasing functions. Define

      (f ⊕ g)(t) = min(f(t), g(t))   for all t.

  For t > 0, let

      (f ⊗ g)(t) = inf_{0≤s≤t} { f(t − s) + g(s) },

  and let (f ⊗ g)(t) = 0 for t ≤ 0. Set ε(t) = +∞ for all t, and let
  e(t) = +∞ for t > 0 and e(t) = 0 for t ≤ 0. Then (F, ⊕, ⊗, ε(·), e(·)) is
  an idempotent, commutative semiring. [This is used in Le Boudec and
  Thiran.]
• As a last example of a semiring of a somewhat different nature, let
S be a nonempty set. Denote the set of all subsets of S by R; then
(R, ∪, ∩, ∅, S), with ∅ the empty set, and ∪ and ∩ the set-theoretic
union and intersection, respectively, is a commutative, idempotent
semiring. The same applies to (R, ∩, ∪, S, ∅).
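All these instances share one interface: an addition, a multiplication, and
the two distinguished elements. A small generic sketch (ours; the names
Semiring, MAXPLUS, MINPLUS, and STANDARD are hypothetical) makes the pattern
concrete:

    # A tiny generic semiring container (illustrative sketch).
    from dataclasses import dataclass
    from typing import Callable

    @dataclass(frozen=True)
    class Semiring:
        add: Callable[[float, float], float]   # (+)_R
        mul: Callable[[float, float], float]   # (x)_R
        zero: float                            # eps_R
        one: float                             # e_R

    MAXPLUS = Semiring(add=max, mul=lambda a, b: a + b,
                       zero=float('-inf'), one=0.0)
    MINPLUS = Semiring(add=min, mul=lambda a, b: a + b,
                       zero=float('inf'), one=0.0)
    STANDARD = Semiring(add=lambda a, b: a + b, mul=lambda a, b: a * b,
                        zero=0.0, one=1.0)

    # Idempotency of (+) holds in max-plus and min-plus, but not in R_st:
    print(MAXPLUS.add(3, 3), MINPLUS.add(3, 3), STANDARD.add(3, 3))  # 3 3 6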
Can one define ⊖?

Lemma 1 Let R = (R, ⊕R, ⊗R, εR, eR) be a semiring. Idempotency of ⊕R
implies that inverse elements with respect to ⊕R do not exist. Indeed, if
a ⊕R b = εR, then a = a ⊕R εR = (a ⊕R a) ⊕R b = a ⊕R b = εR, so only εR
itself admits an inverse.
Matrices and vectors
The set of n × m matrices with underlying max-plus algebra is denoted by
Rmax^{n×m}. For n ∈ N with n ≠ 0, define n := {1, 2, . . . , n}. The element
of a matrix A ∈ Rmax^{n×m} in row i and column j is denoted by aij and,
occasionally, by [A]ij, for i ∈ n and j ∈ m.
The sum of matrices A, B ∈ Rmax^{n×m}, denoted by A ⊕ B, is defined by

    [A ⊕ B]ij = aij ⊕ bij = max(aij, bij),                         (2)

for i ∈ n and j ∈ m. Note that for A, B ∈ Rmax^{n×m} it holds that
A ⊕ B = B ⊕ A.
For A ∈ Rmax^{n×m} and α ∈ Rmax, the scalar multiple α ⊗ A is defined by

    [α ⊗ A]ij = α ⊗ aij,

for i ∈ n and j ∈ m.
For matrices A ∈ Rmax^{n×l} and B ∈ Rmax^{l×m}, the matrix product A ⊗ B is
defined by

    [A ⊗ B]ik = ⊕_{j=1}^{l} aij ⊗ bjk = max_{j∈l} { aij + bjk },   (3)

for i ∈ n and k ∈ m. This is just like in conventional algebra, with +
replaced by max and × by +. For example, let

    A = | e  ε |        and        B = | −1  11 |
        | 3  2 |                       |  1   ε | ;
then the elements of A ⊗ B are given by
[A ⊗ B]11 = e ⊗ (−1) ⊕ ε ⊗ 1 = max(0 − 1, −∞ + 1) = −1,
[A ⊗ B]12 = e ⊗ 11 ⊕ ε ⊗ ε = max(0 + 11, −∞ − ∞) = 11,
[A ⊗ B]21 = 3 ⊗ (−1) ⊕ 2 ⊗ 1 = max(3 − 1, 2 + 1) = 3,
and
[A ⊗ B]22 = 3 ⊗ 11 ⊕ 2 ⊗ ε = max(3 + 11, 2 − ∞) = 14,
yielding, in matrix notation,

    A ⊗ B = | −1  11 |
            |  3  14 | .
Notice that the matrix product in general fails to be commutative.¹ Indeed,
for the above A and B,

    B ⊗ A = | 14  13 |
            |  1   ε |  ≠  A ⊗ B.
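The 2 × 2 computation above is mechanical enough to hand to a computer. A
small sketch (ours, not from the notes) of the max-plus matrix product,
checked against A ⊗ B and B ⊗ A:

    # Max-plus matrix product (illustrative sketch).
    EPS = float('-inf')

    def mp_matmul(A, B):
        """[A (x) B]_ik = max_j (a_ij + b_jk)."""
        n, l, m = len(A), len(B), len(B[0])
        return [[max(A[i][j] + B[j][k] for j in range(l)) for k in range(m)]
                for i in range(n)]

    A = [[0.0, EPS],
         [3.0, 2.0]]
    B = [[-1.0, 11.0],
         [ 1.0,  EPS]]

    print(mp_matmul(A, B))   # [[-1.0, 11.0], [3.0, 14.0]]
    print(mp_matmul(B, A))   # [[14.0, 13.0], [1.0, -inf]]: not commutative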
Let ℰ(n, m) denote the n × m matrix with all elements equal to ε, and denote
by E(n, m) the n × m matrix defined by

    [E(n, m)]ij := e for i = j, and ε otherwise.
¹ This explains why commutativity of ⊗ = + is "ignored" for Rmax.
If n = m, then E(n, n) is called the n × n identity matrix. When their
dimensions are clear from the context, ℰ(n, m) and E(n, m) will also be
written as ℰ and E, respectively.
For Rmax^{n×m}, the matrix addition ⊕, as defined in (2), is associative,
commutative, and has zero element ℰ(n, m). For Rmax^{n×n}, the matrix
product ⊗, as defined in (3), is associative, distributive with respect to ⊕,
has unit element E(n, n), and ℰ(n, n) is absorbing for ⊗. The structure

    Rmax^{n×n} = ( Rmax^{n×n}, ⊕, ⊗, ℰ, E ),

with ⊕ and ⊗ as defined in (2) and (3), respectively, constitutes a
noncommutative, idempotent semiring.
The transpose of A ∈ Rmax^{n×m}, denoted by A^⊤, is defined in the usual way.
As before, also in matrix addition and multiplication, the operation ⊗ has
priority over the operation ⊕.
The elements of Rmax^n := Rmax^{n×1} are called vectors. The jth element of a
vector x ∈ Rmax^n is denoted by xj, which will also be written as [x]j. The
vector in Rmax^n with all elements equal to e is called the unit vector and
is denoted by u; in formulas, [u]j = e for j ∈ n.
For A ∈ Rmax^{n×n}, denote the kth power of A by A⊗k, defined by

    A⊗k := A ⊗ A ⊗ · · · ⊗ A   (k times),                          (4)

for k ∈ N with k ≠ 0, and set A⊗0 := E(n, n). Notice that [A⊗k]ij has to be
carefully distinguished from (aij)⊗k. Indeed, the former is element (i, j) of
the kth power of A, whereas the latter is the kth power of element (i, j) of
A.
A mapping f from Rmax^n to Rmax^n is called affine if f(x) = A ⊗ x ⊕ b for
some A ∈ Rmax^{n×n} and b ∈ Rmax^n. If b = ℰ (the vector with all elements
equal to ε), then f is called linear. A recurrence relation x(k + 1) =
f(x(k)), for k ∈ N, is called affine (resp., linear) if f is an affine
(resp., linear) mapping.
A matrix A ∈ Rmax^{n×m} is called regular if A contains at least one element
different from ε in each row. Regularity is a mere technical condition, for
if A fails to be regular, it contains redundant rows, and any system modeled
by x(k + 1) = A ⊗ x(k) can also be modeled by a reduced regular version of A
in which all redundant rows and related columns are skipped.
We call a matrix A ∈ Rmax^{n×n} irreducible if no permutation matrix P exists
such that P^⊤ ⊗ A ⊗ P is a block upper triangular matrix (equivalently, the
digraph of A is strongly connected). If a matrix is not irreducible, it is
called reducible.
Let A be a random element in Rmax^{n×m} defined on a probability space
(Ω, A, P). We call A integrable if

    E[ 1_{Aij > ε} |Aij| ] < ∞,   1 ≤ i ≤ n, 1 ≤ j ≤ m.

In words, integrability of a matrix is defined through integrability of its
non-ε elements. If A is integrable, then the expected value of A is given by
the matrix E[A] with

    (E[A])ij = E[ 1_{Aij > ε} Aij ]   for P(Aij ≠ ε) > 0,
    (E[A])ij = ε                      for P(Aij = ε) = 1,

for 1 ≤ i ≤ n, 1 ≤ j ≤ m.
We say that A(k) has fixed support if the probability that (A(k))ij equals ε
is either 0 or 1 and does not depend on k.
Heaps of pieces (no queueing)
Let P denote a finite set. In the example below we have P = {a, b, c}. A
sequence of "pieces" out of P is called a heap. For example, w = a b a c b
is a heap; see the figure. Denote the upper contour of heap w by a vector
xH(w) ∈ Rmax^n, where (xH(w))r is the height of the heap on column r. For
example, let

    M(a) = | 1  1  ε  ε  ε |        M(b) = | e  ε  ε  ε  ε |
           | 1  1  ε  ε  ε |               | ε  1  2  2  ε |
           | ε  ε  e  ε  ε |               | ε  1  2  2  ε |
           | ε  ε  ε  e  ε |               | ε  e  1  1  ε |
           | ε  ε  ε  ε  e | ,             | ε  ε  ε  ε  e | ,

and

    M(c) = | e  ε  ε  ε  ε |
           | ε  e  ε  ε  ε |
           | ε  ε  e  ε  ε |
           | ε  ε  ε  1  1 |
           | ε  ε  ε  2  2 | ;
then xH(a b a c b) = (3, 4, 4, 3, 3)^⊤, when starting from ground level. The
upper contour of the heap a b a c b is indicated by the boldfaced line in
Figure 1.4. For heap w and piece η ∈ P, write w η for the heap resulting
from piling piece η on heap w. Note that the order in which the pieces fall
is of importance. The upper contour follows the recurrence relation

    xH(w η) = M(η) ⊗ xH(w).

In words, the upper contour of a heap of pieces follows a max-plus recurrence
relation.
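The recurrence can be checked directly. The following sketch (ours) encodes
the matrices M(a), M(b), M(c) from above and piles the heap w = a b a c b on
level ground; it reproduces the upper contour (3, 4, 4, 3, 3):

    # Upper contour of a heap via max-plus products (illustrative sketch).
    EPS = float('-inf')

    def mp_matvec(M, x):
        return [max(M[i][j] + x[j] for j in range(len(x)))
                for i in range(len(M))]

    M = {
        'a': [[1, 1, EPS, EPS, EPS],
              [1, 1, EPS, EPS, EPS],
              [EPS, EPS, 0, EPS, EPS],
              [EPS, EPS, EPS, 0, EPS],
              [EPS, EPS, EPS, EPS, 0]],
        'b': [[0, EPS, EPS, EPS, EPS],
              [EPS, 1, 2, 2, EPS],
              [EPS, 1, 2, 2, EPS],
              [EPS, 0, 1, 1, EPS],
              [EPS, EPS, EPS, EPS, 0]],
        'c': [[0, EPS, EPS, EPS, EPS],
              [EPS, 0, EPS, EPS, EPS],
              [EPS, EPS, 0, EPS, EPS],
              [EPS, EPS, EPS, 1, 1],
              [EPS, EPS, EPS, 2, 2]],
    }

    x = [0, 0, 0, 0, 0]            # ground level
    for piece in 'abacb':          # heap w = a b a c b
        x = mp_matvec(M[piece], x)
    print(x)                       # [3, 4, 4, 3, 3]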
For a given sequence ηk, k ∈ N, the asymptotic growth rate of the heap model
is given by

    lim_{k→∞} (1/k) xH(k),

provided that the limit exists. For example, if ηk, k ∈ N, represents a
particular schedule, like η1 = a, η2 = b, η3 = c, η4 = a, η5 = b, η6 = c, and
so forth, then the above limit measures the efficiency of schedule a b c.
For a given sequence ηk, k ∈ N, the asymptotic form of xH(k) can be studied,
where form means the relative differences of the components of xH(k). More
precisely, in studying the shape of the upper contour the actual height of
the heap is disregarded. To that end, the vector of relative differences in
xH(w), called the shape vector, is denoted by s(w). For example, the shape of
heap w = a b a c b in Figure 1.4 is obtained by letting the boldfaced line
(the upper contour) sink to the ground level, yielding the vector
s(w) = (0, 1, 1, 0, 0)^⊤. More formally, the shape vector is defined as

    sr(w) = (xH(w))r − min{ (xH(w))p : p ∈ R },   r ∈ R,

where R denotes the set of columns.
Suppose that the sequence in which the pieces appear cannot be controlled
(their arrivals may be triggered by an external source). For instance, ηk,
k ∈ N, is a random sequence such that pieces a, b, and c appear with equal
probability. Set s(k) := s(η1, η2, . . . , ηk). Since pieces fall in random
order, s(k) is a random variable. Using probabilistic arguments, one can
identify sufficient conditions such that the probability distribution of s(k)
converges to a limiting distribution, say, F. Hence, the asymptotic shape of
the heap is given by the probability distribution F. By means of F, for
example, one can determine the probability that the completion times of tasks
typically differ by more than t time units over the resources, yielding an
indication of how well balanced the schedule ηk, k ∈ N, is.
The pleasures and frustrations of Max Plus
Great didactic tool; powerful deterministic theory; dynamic-system view
(projective space); stochasticity spoils the fun: a one-day wonder?
Deep concepts explained with Max Plus
Coupling from the past, weak convergence vs. a.s. convergence.
Example 1 Consider

    B = | 1  ε  ε  ε |        and        C = | 1  ε  ε  ε |
        | ε  2  ε  ε |                       | ε  2  ε  ε |
        | 0  ε  ε  ε |                       | ε  0  ε  ε |
        | ε  0  ε  ε |                       | 0  ε  ε  ε | .
Let A(k) be an i.i.d. sequence with state-space {B, C} such that
P(B = A(k)) = p = 1 − P(C = A(k)),
for some p ∈ (0, 1), which implies that A(k) fails to have fixed support (!).
Note that, for x(0) = (0, 0, 0, 0)^⊤,

    x1(k) = k   and   x2(k) = 2k,                                  (5)

for any k. Observe that (5) implies that

    x3(k + 1) = k    if A(k) = B,
    x3(k + 1) = 2k   if A(k) = C.

In the same vein,

    x4(k + 1) = 2k   if A(k) = B,
    x4(k + 1) = k    if A(k) = C.
Hence, it follows that

    lim_{k→∞} x1(k)/k = 1   and   lim_{k→∞} x2(k)/k = 2,

whereas for i = 3, 4

    lim sup_{k→∞} xi(k)/k = 2 ≠ 1 = lim inf_{k→∞} xi(k)/k,

and the sample-path limits fail to exist.
It is worth noting that x(k)/k converges weakly as k tends to ∞ towards
the random vector (Y1 , Y2 , Y3 , Y4 ), where Y1 = 1 and Y2 = 2 with probability
one and Y3 = 1, Y4 = 2 with probability p and Y3 = 2, Y4 = 1 with probability
1 − p.
Let Y (k) := A(0) ⊗ A(−1) ⊗ · · · ⊗ A(−k) ⊗ x0 , for k ≥ 0. Then, for k ≥ 1,
Yi (k)/k is independent of k and thus converges with probability one for i ∈ 4.
The random variable Y (k)/k, for k ≥ 1, is the weak limit of x(k)/k as k
tends to ∞.
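The contrast between the forward orbit and the backward products is easy to
see numerically. In the sketch below (ours), x3(k)/k along the forward orbit
keeps flipping between values near 1 and 2 with the last matrices drawn,
while Y3(k)/k of the backward product is pinned down by the single draw A(0);
this is the coupling-from-the-past effect behind the weak limit above.

    # Forward orbit vs. backward products for Example 1 (illustrative sketch).
    import random
    EPS = float('-inf')

    def mp_matmul(A, B):
        return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)]
                for row in A]

    B = [[1, EPS, EPS, EPS], [EPS, 2, EPS, EPS],
         [0, EPS, EPS, EPS], [EPS, 0, EPS, EPS]]
    C = [[1, EPS, EPS, EPS], [EPS, 2, EPS, EPS],
         [EPS, 0, EPS, EPS], [0, EPS, EPS, EPS]]

    random.seed(1)
    draw = lambda: B if random.random() < 0.5 else C
    x0 = [[0], [0], [0], [0]]
    x = x0                                        # forward orbit
    P = [[0 if i == j else EPS for j in range(4)] for i in range(4)]
    for k in range(1, 31):
        x = mp_matmul(draw(), x)                  # x(k) = A(k-1) (x) x(k-1)
        P = mp_matmul(P, draw())                  # P = A(0) (x) ... (x) A(-k)
        Y = mp_matmul(P, x0)
        if k % 10 == 0:
            # x3(k)/k oscillates; Y3(k)/k is fixed by the draw of A(0)
            print(k, x[2][0] / k, Y[2][0] / k)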
Ergodicity and i.i.d.
Example 2 What happens if we consider a stationary and ergodic sequence
instead of an i.i.d. sequence? Let Ω = {ω1, ω2} and P(ωi) = 1/2, for i = 1, 2.
Define the shift operator θ by θ(ω1) = ω2 and θ(ω2) = ω1. Then θ is
stationary and ergodic. Consider matrices A, B with A ≠ B, and let

    {A(k, ω1) : k ∈ N} = A, B, A, B, . . .
    {A(k, ω2) : k ∈ N} = B, A, B, A, . . .
The sequence {A(k)} is thus stationary and ergodic. But with probability one
we never observe a sequence of n occurrences in a row of either A or B, for
n > 1.
The goal of the first three lectures
... is to prove the following theorem:
Assume that assumptions (W1), (W2), and (W3) are satisfied and denote
the maximal Lyapunov exponent of {A(k)} by a. If ν > a, then the sequence

    W(k + 1) = A(k) ⊗ C(σ(k + 1)) ⊗ W(k) ⊕ B(k)

converges with strong coupling² to a unique stationary regime W, with

    W = D(0) ⊕ ⊕_{j≥1} C(−τ(−j)) ⊗ D(j),

where D(0) = B ◦ θ^{−1} and

    D(j) = ⊗_{i=1}^{j} A(−i) ⊗ B(−(j + 1)),   j ≥ 1.
Lecture 2: Max Plus and queues
A zoology of mappings
Closed networks (autonomous equations):
x(k + 1) = A(k) ⊗ x(k)
with A(k) irreducible.
Open networks (non-autonomous equations):

    x(k + 1) = A(k) ⊗ x(k) ⊕ B(k) ⊗ u(k)

with A(k) irreducible, or

    x(k + 1) = A(k) ⊗ x(k)

with A(k) reducible.
² . . . which implies total variation convergence, which implies weak
convergence.
Max Plus models of queueing networks
Example 3 Consider a closed system of J single-server queues in tandem,
with infinite buffers. In the system, customers have to pass through the queues
consecutively so as to receive service at each server. After service completion
at the J th server, the customers return to the first queue for a new cycle of
service.
We denote the number of customers initially residing at queue j by
nj . We assume that there are J customers circulating through the network
and that initially there is one customer in each queue, that is, nj = 1 for
1 ≤ j ≤ J. Figure 1 shows the initial state of the tandem network; customers
are represented by the symbol '•'.
[Figure 1: The closed tandem queueing system at initial state nj = 1 for
1 ≤ j ≤ J.]
Let σj(k) denote the kth service time at queue j, and let xj(k) be the time
of the kth service completion at node j. The time evolution of the system can
then be described by a J-dimensional vector x(k) = (x1(k), . . . , xJ(k))
following the homogeneous equation³

    x(k + 1) = A(k) ⊗ x(k),                                        (6)

where the matrix A(k) looks like

    A(k − 1) = | σ1(k)    ε      ε     · · ·     ε      σ1(k) |
               | σ2(k)  σ2(k)    ε     · · ·     ε        ε   |
               |   ε    σ3(k)  σ3(k)             ε        ε   |
               |   ·             ·       ·                ·   |     (7)
               |   ε    · · ·      σJ−1(k)   σJ−1(k)      ε   |
               |   ε    · · ·        ε        σJ(k)    σJ(k)  |

for k ≥ 1. Observe that A(k) is irreducible.

³ Models can be algebraically computed. Not part of this lecture.
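Iterating (6) gives the throughput of the cycle. The following sketch (ours;
J = 3 with arbitrary exponential service rates) estimates the cycle time
x1(k)/k, whose limit is the inverse throughput:

    # Closed tandem of Example 3 with J = 3 queues (illustrative sketch;
    # the rates are an arbitrary choice of ours).
    import random
    EPS = float('-inf')
    random.seed(42)
    rates = [1.0, 1.5, 2.0]            # service rates of queues 1..3

    def A_of_k():
        s = [random.expovariate(r) for r in rates]
        # row j carries sigma_j at columns j-1 and j (queue 1 feeds on queue J)
        A = [[EPS] * 3 for _ in range(3)]
        for j in range(3):
            A[j][j] = s[j]
            A[j][(j - 1) % 3] = s[j]
        return A

    x, K = [0.0, 0.0, 0.0], 100_000
    for _ in range(K):
        A = A_of_k()
        x = [max(A[i][j] + x[j] for j in range(3)) for i in range(3)]
    print(x[0] / K)   # estimated cycle time (Lyapunov exponent of {A(k)})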
Example 4 We now consider the open variant of the tandem network in
Example 3. Let queue 0 represent an external arrival stream of customers.
Each customer who arrives at the system has to pass through queues 1 to J
and then leaves the system. We assume that the system starts empty. Denoting the number of customers initially present at queue j by nj , we assume
nj = 0 for 1 ≤ j ≤ J. Figure 2 shows the initial state of the tandem network.
[Figure 2: The open tandem queueing system at initial state nj = 0 for
1 ≤ j ≤ J.]
Again, we let xj(k) denote the time of the kth service completion at station
j. In particular, we let x0(k) denote the kth arrival epoch at the system.
The time evolution of the system can then be described by a
(J + 1)-dimensional vector x(k) = (x0(k), . . . , xJ(k)) following the
homogeneous equation

    x(k + 1) = A(k) ⊗ x(k),                                        (8)
where the matrix A(k − 1) looks like

    A(k−1) =
    | σ0(k)                    ε                     ε               · · ·    ε    |
    | σ0(k) ⊗ σ1(k)          σ1(k)                   ε               · · ·    ε    |
    | σ0(k) ⊗ σ1(k) ⊗ σ2(k)  σ1(k) ⊗ σ2(k)         σ2(k)                     ε    |   (9)
    |         ·                    ·                   ·               ·      ·    |
    | σ0(k) ⊗ · · · ⊗ σJ(k)  σ1(k) ⊗ · · · ⊗ σJ(k)  σ2(k) ⊗ · · · ⊗ σJ(k)  · · ·  σJ(k) |

for k ≥ 1.⁴

Alternatively, we could describe the system via a J-dimensional vector
x̂(k) = (x̂1(k), . . . , x̂J(k)) following the inhomogeneous equation

    x̂(k + 1) = Â(k) ⊗ x̂(k) ⊕ B(k) ⊗ τ(k + 1),                     (10)

where the matrix Â(k) looks like (9), except that the first column and the
first row are missing, that is, (Â(k))ij = (A(k))i+1,j+1 for 1 ≤ i, j ≤ J;
the vector B(k) is given by

    B(k) = | σ1(k + 1)                                  |
           | σ1(k + 1) ⊗ σ2(k + 1)                      |
           |     ·                                      |
           | σ1(k + 1) ⊗ σ2(k + 1) ⊗ · · · ⊗ σJ(k + 1)  |

for k ≥ 0; and

    τ(k) = Σ_{i=1}^{k} σ0(i)

denotes the kth arrival time. Notice that Bj(0), for 1 ≤ j ≤ J, denotes the
time it takes the first customer from entering the system until departing
from station j (this requires a proof).

⁴ The source is modeled as a node of the network.
Example 5 (Example 4 revisited) We consider the system as described in
the above example. However, in contrast to Example 4, we let xj (k) denote
the time of the k th beginning of service at station j, with 1 ≤ j ≤ J. The
standard non-autonomous equation now reads

    x(k + 1) = A(k) ⊗ x(k) ⊕ B(k) ⊗ τ(k + 1),                      (11)

with A(k) given by

    (A(k))ij = ε                                   for i < j,
    (A(k))ij = σj(k) ⊗ ⊗_{h=j}^{i−1} σh(k + 1)     for i ≥ j,

for 1 ≤ i, j ≤ J, where we set σj(0) = 0, and

    B(k) = | 0                                            |
           | σ1(k + 1)                                    |
           | σ1(k + 1) ⊗ σ2(k + 1)                        |
           |     ·                                        |
           | σ1(k + 1) ⊗ σ2(k + 1) ⊗ · · · ⊗ σJ−1(k + 1)  |

for k ≥ 0. An element Bj(0) denotes the time it takes the first customer from
entering the system until reaching station j (this requires a proof). Notice
that A(k) is reducible.
Example 3 and Example 4 model sequences of departure times from the queues
via a max-plus recurrence relation, and a model for beginning-of-service
times is given in Example 5. We now turn to another important application of
max-plus linear models: waiting times.
Example 6 Consider the open tandem network described in Example 5. Let
Wj (k) be the time the k th customer arriving at the network spends in the
system until the beginning of her/his service at station j. Then, the vector of
waiting times W (k) = (W1 (k), . . . , WJ (k)) follows the recurrence relation
W (k + 1) = A(k) ⊗ C(σ0 (k + 1)) ⊗ W (k) ⊕ B(k) ,
k ≥0,
with W (0) = (0, . . . , 0) and C(r) a matrix with diagonal entries −r and all
other entries equal to ε.
Taking J = 1, the above recurrence relation for the waiting times reads

    W(k + 1) = σ1(k) ⊗ (−σ0(k + 1)) ⊗ W(k) ⊕ 0
             = max( σ1(k) − σ0(k + 1) + W(k), 0 ),   k ≥ 0,

with σ1(0) = 0, which is Lindley's equation for the actual waiting time in a
G/G/1 queue.
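Lindley's recursion is immediately simulable. A sketch (ours, with
exponential interarrival and service times at rates of our choosing) that
estimates the mean waiting time and compares it with the M/M/1 value
ρ/(μ − λ):

    # Lindley's equation W(k+1) = max(W(k) + sigma_1(k) - sigma_0(k+1), 0)
    # (illustrative sketch).
    import random
    random.seed(0)

    lam, mu = 0.8, 1.0          # arrival and service rate (rho = 0.8), ours
    W, K, total = 0.0, 200_000, 0.0
    service = 0.0               # sigma_1(0) = 0, as in the text
    for k in range(K):
        inter = random.expovariate(lam)        # sigma_0(k+1)
        W = max(W + service - inter, 0.0)
        total += W
        service = random.expovariate(mu)       # sigma_1(k+1)
    print(total / K)   # compare with the M/M/1 value rho/(mu - lam) = 4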
If we had let x(k) describe departure times at the stations, cf. Example 4,
then W(k) would yield the vector of sojourn times of the kth customer. In
other words, Wj(k) would model the time the kth customer arriving at the
network spends in the system until leaving station j.
In the above examples the positions which are equal to ε are fixed and
the randomness is generated by letting the entries different from ε be random
variables. The next example is of a different kind. Here, the matrix as a whole
is random, that is, the values of the elements are completely random in the
sense that an element can with positive probability be equal to ε or finite.
Example 7 Consider a cyclic tandem queueing network consisting of a
single-server station and a multi-server station, each with deterministic
service times. Service times at the single-server station equal σ, whereas
service times at the multi-server station equal σ′. Three customers circulate
in the network. Initially, one customer is in service at station 1, the
single-server station, one customer is in service at station 2, the
multi-server station, and the third customer is just about to enter station
2. The time evolution of this network is described by a max-plus linear
sequence x(k) = (x1(k), . . . , x4(k)), where x1(k) is the kth beginning of
service at the single-server station and x2(k) is the kth departure epoch
from the single-server station; x3(k) is the kth beginning of service at the
multi-server station and x4(k) is the kth departure epoch from the
multi-server station.
The system then follows
x(k + 1) = D2 ⊗ x(k) ,
where

    D2 = | σ   ε   σ′  ε |
         | σ   ε   ε   ε |
         | ε   e   ε   e |
         | ε   ε   σ′  ε | ,

with x(0) = (0, 0, 0, 0). Figure 3 shows the initial state of this system.
[Figure 3: The initial state of the multi-server system (three customers).]
Consider the cyclic tandem network again, but now one of the servers of the
multi-server station has broken down. The system is thus a tandem network
with two single-server stations. Initially, one customer is in service at
station 1, one customer is in service at station 2, and the third customer is
waiting at station 2 for service. This system follows
x(k + 1) = D1 ⊗ x(k) ,
where

    D1 = | σ   ε   σ′  ε |
         | σ   ε   ε   ε |
         | ε   e   σ′  ε |
         | ε   ε   σ′  ε | ,

with x(0) = (0, 0, 0, 0). Figure 4 shows the initial state of the system with
breakdown.

[Figure 4: The initial state of the multi-server system with breakdown (three
customers).]
Assume that whenever a customer enters station 2, the second server of the
multi-server station breaks down with probability θ. Let Aθ(k) have the
distribution

    P( Aθ(k) = D1 ) = θ   and   P( Aθ(k) = D2 ) = 1 − θ;

then

    xθ(k + 1) = Aθ(k) ⊗ xθ(k)

describes the time evolution of the system with breakdowns. That the above
recurrence relation indeed models the sample-path dynamics of the system with
breakdowns is not obvious, and a proof is required.
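With D1 and D2 as reconstructed above, the growth rate of xθ(k) interpolates
between the two deterministic regimes. The sketch below (ours, with the
arbitrary choice σ = 1, σ′ = 2) estimates it for a few values of θ:

    # Growth rate of the breakdown model (illustrative sketch; sigma = 1,
    # sigma' = 2 are our arbitrary numbers).
    import random
    EPS = float('-inf')
    s, sp = 1.0, 2.0
    D2 = [[s, EPS, sp, EPS], [s, EPS, EPS, EPS],
          [EPS, 0.0, EPS, 0.0], [EPS, EPS, sp, EPS]]
    D1 = [[s, EPS, sp, EPS], [s, EPS, EPS, EPS],
          [EPS, 0.0, sp, EPS], [EPS, EPS, sp, EPS]]

    random.seed(5)
    K = 50_000
    for theta in (0.0, 0.5, 1.0):
        x = [0.0] * 4
        for _ in range(K):
            A = D1 if random.random() < theta else D2
            x = [max(a + xi for a, xi in zip(row, x)) for row in A]
        print(theta, x[0] / K)   # 1.0 for theta = 0, up to 2.0 for theta = 1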
Particularities on waiting times
We consider the basic recurrence relation

    x(k + 1) = A(k) ⊗ x(k) ⊕ B(k) ⊗ τ(k + 1).                      (12)
We assume that there is only one input stream. If we consider only component
xj(k), we can subtract τ(k + 1) on both sides of equation (12) and get

    Wj(k + 1) = xj(k + 1) − τ(k + 1)
              = (A(k) ⊗ x(k))j ⊗ (−τ(k + 1)) ⊕ Bj(k).
Let σ0(k) denote the kth interarrival time, that is,

    τ(k) = Σ_{i=1}^{k} σ0(i);

then

    (A(k) ⊗ x(k))j ⊗ (−τ(k + 1)) = ⊕_i Aji(k) ⊗ (−σ0(k + 1)) ⊗ (xi(k) − τ(k))
                                 = ⊕_i Aji(k) ⊗ (−σ0(k + 1)) ⊗ Wi(k).
Let C(h) denote the diagonal matrix with −h on the diagonal and ε elsewhere.
Then we can write the expression on the right-hand side of the above equation
as

    ⊕_i Aji(k) ⊗ (−σ0(k + 1)) ⊗ Wi(k) = ( A(k) ⊗ C(σ0(k + 1)) ⊗ W(k) )_j.

Combining the above formulas, we obtain the following vectorial form of the
recurrence relation for W(k + 1):

    W(k + 1) = A(k) ⊗ C(σ0(k + 1)) ⊗ W(k) ⊕ B(k).                  (13)
Lemma 2 Let x0 = u in (12). If W (0) = u in (13), then W (1) = B(0).
By using elementary matrix operations in the max-plus algebra, equation (13)
can be rewritten as

    W(k + 1) = ⊗_{i=0}^{k} ( A(i) ⊗ C(σ0(i + 1)) ) ⊗ W(0)
               ⊕ ⊕_{i=0}^{k} ( ⊗_{j=i+1}^{k} A(j) ⊗ C(σ0(j + 1)) ) ⊗ B(i),   (14)

with W(0) = x0.
When it comes to queueing networks, we obtain from (14) a closed-form
expression for the vector of (k + 1)st waiting/sojourn times in an open
queueing network that is initially empty and whose sequence of interarrival
times is given by {σ0 (k)}. More precisely, depending on whether we model
beginning of service or departure times by x(k), Wj (k) models the time the
k th arriving customer spends in the system until her/his service at server
j starts, or until she/he departs from server j. Equation (14) is called the
forward construction of waiting times.
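For concreteness, here is a sketch (ours) of the forward construction for the
open tandem of Example 4 with J = 2 stations and exponential input data (the
rates are our arbitrary choice); it iterates (13) and estimates the mean
stationary sojourn times:

    # Vector recursion W(k+1) = A(k) (x) C(sigma_0(k+1)) (x) W(k) (+) B(k)
    # (illustrative sketch).
    import random
    EPS = float('-inf')
    random.seed(7)
    lam, mus = 0.5, (1.2, 1.5)   # arrival and service rates, our choice

    def step(W, s1, s2, a):
        # A(k) of Example 4 (departure times, J = 2) with C(a) merged in:
        A = [[s1 - a, EPS], [s1 + s2 - a, s2 - a]]
        B = [s1, s1 + s2]        # B(k): first-customer passage times
        return [max(max(A[i][j] + W[j] for j in range(2)), B[i])
                for i in range(2)]

    W, K, tot = [0.0, 0.0], 100_000, [0.0, 0.0]
    for k in range(K):
        a = random.expovariate(lam)                      # sigma_0(k+1)
        s1, s2 = (random.expovariate(m) for m in mus)    # sigma_1, sigma_2
        W = step(W, s1, s2, a)
        tot = [t + w for t, w in zip(tot, W)]
    print([t / K for t in tot])   # mean sojourn times at stations 1 and 2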
Subadditive ergodic theory
Subadditive processes are doubly indexed processes X = {X_{m,n} : m, n ∈ N}
satisfying the following conditions:

(S1) If i < j < k, then X_{i,k} ≤ X_{i,j} + X_{j,k} a.s.

(S2) The joint distributions of the process {X_{m+1,n+1} : m < n} are the
     same as those of {X_{m,n} : m < n}.

(S3) The expected value gn = E[X_{0,n}] exists and satisfies gn ≥ −cn for
     some finite constant c > 0 and all n ≥ 1.
We now state Kingman's subadditive ergodic theorem: if X is a subadditive
process (that is, (S1), (S2), and (S3) hold), then the limit

    ξ = lim_{n→∞} X_{0,n} / n

exists almost surely and satisfies E[ξ] = λ, where λ = lim_{n→∞} gn/n.
Condition (S2), on the shift {X_{m,n}} → {X_{m+1,n+1}}, is a stationarity
condition. If all events defined in terms of X that are invariant under this
shift have probability zero or one, then X is ergodic. In this case, the
limiting random variable ξ is almost surely constant and equal to λ. Note
that the limit also holds when expected values are considered.
Lemma 3 Let {A(k)} be a stationary sequence of a.s. regular and integrable
matrices in Rmax^{J×J}. Then {−||x_{n,m}||min : m > n ≥ 0} and
{||x_{n,m}||max : m > n ≥ 0} are subadditive ergodic processes.
Theorem 1 Let {A(k)} be a stationary sequence of a.s. regular, integrable
square matrices. Then finite constants λtop and λbot exist such that for all
(non-random) finite initial conditions x0:

    λbot := lim_{k→∞} ||x(k)||min / k  ≤  λtop := lim_{k→∞} ||x(k)||max / k   a.s.

and

    lim_{k→∞} (1/k) E[||x(k)||min] = λbot  ≤  lim_{k→∞} (1/k) E[||x(k)||max] = λtop.

The above limits also hold for random initial conditions provided that the
initial condition is a.s. finite and integrable.

The constant λtop is called the top or maximal Lyapunov exponent of {A(k)},
and λbot is called the bottom Lyapunov exponent.
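Both exponents can be estimated by simulation. The sketch below (ours, for an
arbitrary random 2 × 2 matrix whose two exponents differ) prints
||x(k)||min/k and ||x(k)||max/k for a large k:

    # Monte Carlo estimates of the bottom and top Lyapunov exponents
    # (illustrative sketch; the random matrix is our arbitrary choice).
    import random
    EPS = float('-inf')
    random.seed(3)

    def draw():   # an a.s. regular, integrable random matrix
        u = random.random()
        return [[u, EPS], [2 * u, 1.0]]

    x, K = [0.0, 0.0], 200_000
    for _ in range(K):
        A = draw()
        x = [max(A[i][j] + x[j] for j in range(2)) for i in range(2)]
    print(min(x) / K, max(x) / K)   # estimates of lambda_bot <= lambda_top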
Lecture 3: Stability of waiting times
We consider the following situation. An open queueing network with J stations
is given such that the vector of departure times from the stations, denoted
by x(k), follows the recurrence relation
    x(k + 1) = A(k) ⊗ x(k) ⊕ τ(k + 1) ⊗ B(k),                      (15)

with x(0) = u, where τ(k) denotes the time of the kth arrival to the system.
We denote by σ0(k) the kth interarrival time, so that the kth arrival of a
customer at the network happens at time

    τ(k) = Σ_{i=1}^{k} σ0(i),   k ≥ 1,

with τ(0) = 0. Then Wj(k) = xj(k) − τ(k) denotes the time the kth
customer arriving to the system spends in the system until completion of
service at server j. The vector of k th sojourn times, denoted by W (k) =
(W1 (k), . . . , WJ (k)), follows the recurrence relation
W (k + 1) = A(k) ⊗ C(σ0 (k + 1)) ⊗ W (k) ⊕ B(k) ,
k ≥0,
with W (0) = e, where C(h) denotes a diagonal matrix with −h on the
diagonal and ε elsewhere. Alternatively, xj (k) in (15) may model the times
of the k th beginning of service at station j. With this interpretation of x(k),
Wj (k) defined above represents the time spent by the k th customer arriving
to the system until beginning of her/his service at j. For example, in the
G/G/1 queue W (k) models the waiting time.
In the following we will establish sufficient conditions for W (k) to converge
to a unique stationary regime. The main technical assumptions are:
(W1) For k ∈ Z, let A(k) ∈ Rmax^{J×J} be a.s. regular, and assume that the
     maximal Lyapunov exponent of {A(k)} exists.

(W2) There exists a fixed number α, with 1 ≤ α ≤ J, such that the vector
     B^α(k) = (Bj(k) : 1 ≤ j ≤ α) has finite elements for any k, and
     Bj(k) = ε for α < j ≤ J and any k.

(W3) The sequence {(A(k), B^α(k))} is stationary and ergodic, and independent
     of {τ(k)}, where τ(k) is given by

         τ(k) = Σ_{i=1}^{k} σ(i),   k ≥ 1,

     with τ(0) = 0 and {σ(k) : k ∈ Z} a stationary and ergodic sequence of
     positive random variables with mean ν ∈ (0, ∞).
Provided that {A(k)} is a.s. regular and stationary, integrability of A(k)
is a sufficient condition for (W1). In terms of queueing networks, the main
restriction imposed by these conditions stems from the non-negativity of the
diagonal of A(k). The part of condition (W3) that concerns the arrival stream
of the network is, for example, satisfied for Poisson arrival streams.
Theorem 2 Assume that assumptions (W1), (W2), and (W3) are satisfied and
denote the maximal Lyapunov exponent of {A(k)} by a. If ν > a, then the
sequence

    W(k + 1) = A(k) ⊗ C(σ(k + 1)) ⊗ W(k) ⊕ B(k)

converges with strong coupling to a unique stationary regime W, with

    W = D(0) ⊕ ⊕_{j≥1} C(−τ(−j)) ⊗ D(j),

where D(0) = B ◦ θ^{−1} and

    D(j) = ⊗_{i=1}^{j} A(−i) ⊗ B(−(j + 1)),   j ≥ 1.
It is worth noting that β(w), defined in the proof of Theorem 2, fails to be
a stopping time adapted to the natural filtration of {(A(k), B(k)) : k ≥ 0}.
More precisely, β(w) is measurable with respect to the σ-field
σ((A(k), B(k)) : k ≥ 0) but, in general, {β(w) = m} ∉ σ((A(k), B(k)) :
m ≥ k ≥ 0), for m ∈ N.
Due to the max-plus formalism, the proof of Theorem 2 is a rather
straightforward extension of the proof of the classical result for the G/G/1
queue.
Lecture 4: Structural Insights (on matrices and
graphs)
Matrices and graphs
A directed graph G is a pair (N , D), where N is a finite set of elements called
nodes (or vertices) and D ⊂ N × N is a set of ordered pairs of nodes called
arcs (or edges). If (i, j) ∈ D, then we say that G contains an arc from i to j,
and the arc (i, j) is called an incoming arc at j and an outgoing arc at i.
A directed graph is also called a digraph in the literature. A directed graph
is called weighted if a weight w(i, j) ∈ R is associated with any arc (i, j) ∈ D.
From now on we will deal exclusively with weighted directed graphs and will
refer to them as “graphs” for simplicity.
To any n × n matrix A over Rmax a graph can be associated, called the
communication graph of A. The graph will be denoted by G(A). The set of nodes
of the graph is given by N(A) = n, and a pair (i, j) ∈ n × n is an arc of the
graph if aji ≠ ε (this is not a typo!); in symbols, (i, j) ∈ D(A) ⇔ aji ≠ ε,
where D(A) denotes the set of arcs of the graph.
For any two nodes i, j, a sequence of arcs p = ((ik , jk ) ∈ D(A) : k ∈ m),
such that i = i1 , jk = ik+1 , for k < m, and jm = j is called a path from i to
j. The path is then said to consist of the nodes i = i1 , i2 , . . . , im , jm = j and
to have length m. The latter will be denoted as |p|l = m. Further, if i = j,
then the path is called a circuit. A circuit p = ((i1, i2), (i2, i3), . . . ,
(im, i1)) is called elementary if, restricted to the circuit, each of its
nodes has only one incoming and one outgoing arc or, more formally, if nodes
ik and il are different for k ≠ l. A circuit consisting of just one arc, from
a node to itself, is also called a self-loop. A matrix A ∈ Rmax^{n×n} is
called irreducible if there is a path from any node to any other node in the
communication graph G(A). If a matrix is not irreducible, it is called
reducible.
The set of all paths from i to j of length m ≥ 1 is denoted by P(i, j; m).
For an arc (i, j) in G(A), the weight of (i, j) is given by aji (again, this
is not a typo!), and the weight of a path in G(A) is defined as the sum of
the weights of all arcs constituting the path. More formally, for
p = ((i1, i2), (i2, i3), . . . , (im, im+1)) ∈ P(i, j; m) with i = i1 and
j = im+1, define the weight of p, denoted by |p|w, through

    |p|w = ⊗_{k=1}^{m} a_{i_{k+1} i_k}.

Note that in conventional notation |p|w = Σ_{k=1}^{m} a_{i_{k+1} i_k}. The average
weight of a path p is given by |p|w /|p|l . For circuits the notions of weight,
length, and average weight are defined similarly as for paths. Also, the phrase
circuit mean is used instead of the phrase average circuit weight.
Paths in G(A) can be combined in order to construct a new path. For
example, let p = ((i1 , i2 ), (i2 , i3 )) and q = ((i3 , i4 ), (i4 , i5 )) be two paths in
G(A). Then,
    p ◦ q = ((i1, i2), (i2, i3), (i3, i4), (i4, i5))
is a path in G(A) as well. The operation ◦ is called the concatenation of paths.
Clearly, the operation is not commutative, even when both p ◦ q and q ◦ p are
defined.
Example 8 Let

    A = |  ε  15   ε |
        |  ε   ε  14 |
        | 10   ε  12 | .

The communication graph of A is shown in Figure 5. The graph G(A) has
[Figure 5: The communication graph of matrix A in Example 8.]
node set N (A) = {1, 2, 3} and arc set D(A) = {(1, 3), (3, 2), (2, 1), (3, 3)}.
Specifically, G(A) consists of two elementary circuits, namely, ρ =
((1, 3), (3, 2), (2, 1)) and θ = (3, 3). The weight of ρ is given by
|ρ|w = a12 + a23 + a31 = 39,
and the length of ρ equals |ρ|l = 3. Circuit θ has weight |θ|w = a33 = 12 and
is of length 1.
Theorem 3 Let A ∈ Rmax^{n×n}. It holds for all k ≥ 1 that

    [A⊗k]ji = max{ |p|w : p ∈ P(i, j; k) },

where [A⊗k]ji = ε in the case where P(i, j; k) is empty, i.e., when no path
of length k from i to j exists in G(A).
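The theorem is easy to check on the matrix of Example 8; the sketch below
(ours) computes A⊗2 and A⊗3 and reads off two path weights:

    # Checking Theorem 3 on the matrix of Example 8 (illustrative sketch).
    EPS = float('-inf')

    def mp_matmul(A, B):
        return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)]
                for row in A]

    A = [[EPS, 15, EPS],
         [EPS, EPS, 14],
         [10, EPS, 12]]

    A2 = mp_matmul(A, A)
    A3 = mp_matmul(A2, A)
    print(A2[0][2])  # 29: heaviest path of length 2 from node 3 to node 1
    print(A3[0][0])  # 39: the circuit rho = ((1,3),(3,2),(2,1)) traversed once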
Definition 2 The cyclicity of a graph G, denoted by σG , is defined as follows:
• If G is strongly connected, then its cyclicity equals the greatest common
divisor of the lengths of all elementary circuits in G. If G consists of
just one node without a self-loop, then its cyclicity is defined to be one.
• If G is not strongly connected, then its cyclicity equals the least common
multiple of the cyclicities of all maximal strongly connected subgraphs
of G.
Solving Linear Equations
For A ∈ Rmax^{n×n}, let

    A+ := ⊕_{k=1}^{∞} A⊗k.                                         (16)
The element [A+]ij yields the maximal weight of any path from j to i (the
value [A+]ij = +∞ is possible). Indeed, by definition,

    [A+]ij = max{ [A⊗k]ij : k ≥ 1 },

where [A⊗k]ij is the maximal weight of a path from j to i of length k; see
Theorem 3.
Lemma 4 Let A ∈ Rmax^{n×n} be such that any circuit in G(A) has average
circuit weight less than or equal to e. Then it holds that

    A+ = ⊕_{k=1}^{∞} A⊗k = A ⊕ A⊗2 ⊕ A⊗3 ⊕ · · · ⊕ A⊗n ∈ Rmax^{n×n}.
Theorem 4 Let A ∈ Rmax^{n×n} and b ∈ Rmax^n. If the communication graph G(A)
has maximal average circuit weight less than or equal to e, then the vector
x = A* ⊗ b, with A* := E ⊕ A+, solves the equation x = (A ⊗ x) ⊕ b. Moreover,
if the circuit weights in G(A) are negative, then the solution is unique.
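By Lemma 4 the star can be computed with finitely many terms; the sketch
below (ours, with a hypothetical 2 × 2 matrix whose circuits have negative
weight) builds A* = E ⊕ A+ and checks that x = A* ⊗ b indeed solves
x = (A ⊗ x) ⊕ b:

    # Solving x = (A (x) x) (+) b via x = A* (x) b (illustrative sketch).
    EPS = float('-inf')

    def mp_matmul(A, B):
        return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)]
                for row in A]

    def mp_star(A):
        """A* = E (+) A (+) ... (+) A^n (valid for circuit weights <= e)."""
        n = len(A)
        S = [[0.0 if i == j else EPS for j in range(n)] for i in range(n)]
        P = [row[:] for row in S]
        for _ in range(n):
            P = mp_matmul(P, A)
            S = [[max(s, p) for s, p in zip(rs, rp)] for rs, rp in zip(S, P)]
        return S

    A = [[-2.0, EPS], [1.0, -3.0]]   # all circuits have negative weight
    b = [[3.0], [0.0]]
    x = mp_matmul(mp_star(A), b)
    lhs = [[max(v[0], w[0])] for v, w in zip(mp_matmul(A, x), b)]
    print(x, lhs)                    # both equal [[3.0], [4.0]]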
Analysis of irreducible matrices (deterministic)
Definition 3 Let A ∈ Rmax^{n×n} be a square matrix. If µ ∈ Rmax is a scalar
and v ∈ Rmax^n is a vector that contains at least one finite element such
that

    A ⊗ v = µ ⊗ v,

then µ is called an eigenvalue of A and v an eigenvector of A associated with
eigenvalue µ.
Denote the set of elementary circuits of the communication graph of A by
C(A).

Theorem 5 Any irreducible matrix A ∈ Rmax^{n×n} possesses one and only one
eigenvalue. This eigenvalue, denoted by λ(A), is a finite number equal to the
maximal average weight of circuits in G(A), i.e.,

    λ(A) = max_{γ∈C(A)} |γ|w / |γ|l.
Theorem 5 is the max-plus analogue of the Perron-Frobenius theorem
in conventional linear algebra, which states that an irreducible square
nonnegative matrix has a largest eigenvalue that is positive and real, where
largest means largest in modulus.
We now state the celebrated cyclicity theorem of max-plus algebra.
Theorem 6 Let A ∈ Rmax^{n×n} be an irreducible matrix with eigenvalue λ and
cyclicity σ = σ(A). Then there is an N such that

    A⊗(k+σ) = λ⊗σ ⊗ A⊗k

for all k ≥ N. Here σ(A) denotes the cyclicity of the critical graph of A,
the graph made up of those nodes and arcs of G(A) that lie on circuits of
maximal average weight.
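Both theorems can be verified numerically on Example 8, where the critical
circuit is ρ, so λ = 13 and σ = 3. The sketch below (ours) computes λ(A) from
the diagonals of the powers A⊗k and then searches for the smallest N from
which A⊗(k+3) = 39 ⊗ A⊗k holds:

    # Checking Theorems 5 and 6 on Example 8 (illustrative sketch).
    EPS = float('-inf')

    def mp_matmul(A, B):
        return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)]
                for row in A]

    A = [[EPS, 15, EPS], [EPS, EPS, 14], [10, EPS, 12]]
    n = len(A)

    # Theorem 5: closed paths of length k <= n have maximal weight [A^k]_ii.
    P, lam = [row[:] for row in A], EPS
    for k in range(1, n + 1):
        lam = max(lam, max(P[i][i] / k for i in range(n)))
        P = mp_matmul(P, A)
    print(lam)   # 13.0 = |rho|_w / |rho|_l

    # Theorem 6 with sigma = 3 and lambda^(x)3 = 39.
    powers = [A]
    for _ in range(60):
        powers.append(mp_matmul(powers[-1], A))
    def periodic(k):   # does A^(k+3) = 39 (x) A^k hold?
        return all(powers[k + 2][i][j] == powers[k - 1][i][j] + 39
                   for i in range(n) for j in range(n))
    print(next((k for k in range(1, 55)
                if all(periodic(m) for m in range(k, 55))), None))  # 3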
Analysis of reducible matrices (deterministic)
Let G = (N, D) denote a graph with node set N and arc set D. For i, j ∈ N,
node j is said to be reachable from node i, denoted by iRj, if there exists a
path from i to j. A graph G is called strongly connected if for any two nodes
i, j ∈ N, node j is reachable from node i. A matrix A ∈ Rmax^{n×n} is called
irreducible if its communication graph G(A) is strongly connected. If a
matrix is not irreducible, it is called reducible.
To better deal with graphs that are not strongly connected, we say for
nodes i, j ∈ N that node j communicates with node i, denoted as iCj, if
either i = j or there exists a path from i to j and a path from j to i. Hence,
iCj ⇐⇒ i = j or [iRj and jRi]. Note that the relation “communicates
with” is an equivalence relation. Indeed, its reflexivity and symmetry follow
by definition, and its transitivity follows by the concatenation of paths.
If a graph G = (N , D) is not strongly connected, then not all nodes of
N communicate with each other. In this case, given a node, say, node i, it
is possible to distinguish the subset of nodes that communicate with i from
the subset of nodes that do not communicate with i. In the first subset all
nodes communicate with each other, whereas in the second subset not all
nodes necessarily communicate with each other. In the latter case a further
subdivision of the nodes is possible. Repeated application of the previous idea
therefore yields that the node set N can be partitioned as N1 ∪ N2 ∪ · · · ∪ Nq ,
where Nr , r ∈ q, denotes a subset of nodes that communicate with each
other but not with other nodes of N. Recall that a partitioning of a set is a
division into nonempty subsets such that their union is the whole set and
their mutual intersections are all empty.
Given the above partitioning of N , it is possible to focus on subgraphs
of G, denoted by Gr = (Nr , Dr ), r ∈ q, where Dr denotes the subset of D of
arcs that have both the begin node and the end node in Nr . If Dr 6= ∅, the
subgraph Gr = (Nr , Dr ) is known as a maximal strongly connected subgraph
(m.s.c.s.) of G = (N , D). By definition, nodes in Nr do not communicate
with nodes outside Nr . However, it can happen that iRj for some i ∈ Nr
and j ∈ Nr0 with r 6= r0 , but then the converse (i.e., jRi) does not hold.
We denote by [i] := {j ∈ N : iCj} the set of nodes that communicate with node
i. These nodes together form the equivalence class in which i is contained.
Hence, given node i ∈ N, there exists an r ∈ q such that i ∈ Nr and [i] = Nr.
Note that the above partitioning covers all nodes of N . If a node of G is
contained in one or more circuits, it communicates with certain other nodes or
with itself in case one of the circuits actually is a self-loop. In any case, the arc
set of the associated subgraph is not empty. However, if the graph G contains
a node that is not contained in any circuit of G, say, node i, then node i does
not communicate with other nodes and it communicates only with itself.
Then, by definition, node i forms an equivalence class on its own, so that
[i] = {i}. Because there does not even exist an arc from i to itself, it follows
that the associated subgraph is given by ([i], ∅); i.e., the node set consists
of node i only and the arc set is empty. Further, although it is not strongly
connected, ([i], ∅) will be referred to as an m.s.c.s. This is merely done for
convenience. Hence, in the following all subgraphs Gr = (Nr , Dr ), r ∈ q,
introduced above are referred to as m.s.c.s.’s.
We define the reduced graph, denoted by G̃ = (Ñ, D̃), by Ñ = {[i1], . . . ,
[iq]} and ([ir], [is]) ∈ D̃ if r ≠ s and there exists an arc (k, l) ∈ D for
some k ∈ [ir] and l ∈ [is]. Hence, the number of nodes in the reduced graph
is exactly the number of m.s.c.s.'s in the graph. The reduced graph models
the interdependency of the m.s.c.s.'s.
Note that the reduced graph does not contain circuits. Indeed, if the reduced
graph contained a circuit, then two or more m.s.c.s.'s would be connected to
each other by means of a circuit, forming a new m.s.c.s. larger than the
m.s.c.s.'s it contains. However, this would contradict the fact that these
subgraphs already were maximal and strongly connected.
Let Arr denote the matrix obtained by restricting A to the nodes in [ir ],
for all r ∈ q, i.e., [Arr ]kl = akl for all k, l ∈ [ir ]. Notice that for all r ∈ q either
Arr is irreducible or Arr = ε. It is easy to see that because the reduced graph
does not contain any circuits, the original reducible matrix A, possibly after
a relabeling of the nodes in G(A), can be written in the form

    A = | A11  A12  A13  · · ·  A1q |
        |  ℰ   A22  A23  · · ·  A2q |
        |  ℰ    ℰ   A33  · · ·  A3q |
        |  ·                     ·  |
        |  ℰ    ℰ    ℰ   · · ·  Aqq | ,
with matrices Asr, 1 ≤ s < r ≤ q, of appropriate size. Each finite entry in
Asr corresponds to an arc from a node in [ir] to a node in [is]. The block
upper triangular form shown above is said to be a normal form of matrix A.
Note that the normal form of a matrix is not unique.
The set of direct predecessors of node i is denoted by π(i); more formally,

    π(i) := {j ∈ n : (j, i) ∈ D}.

Moreover, denote the set of all predecessors of node i by

    π+(i) := {j ∈ n : jRi},

and set π*(i) = {i} ∪ π+(i). In words, π(i) is the set of nodes immediately
upstream of i; π+(i) is the set of all nodes from which node i can be
reached; and π*(i) is the set of all nodes from which node i can be reached,
including node i itself. In the same vein, denote the set of direct
successors of node i by σ(i); more formally,

    σ(i) := {j ∈ n : (i, j) ∈ D};

write

    σ+(i) := {j ∈ n : iRj}

for the set of all successors of node i, and set σ*(i) = {i} ∪ σ+(i). In
words, σ(i) is the set of nodes immediately downstream of i; σ+(i) is the set
of all nodes that can be reached from node i; and σ*(i) is the set of all
nodes that can be reached from i, including node i itself.
Example 9 Let

    A = | ε   0   ε   ε   ε   ε    ε   ε   ε   ε |
        | ε   ε  −3   ε   ε   ε    ε   ε   ε   ε |
        | ε   4   ε   0   ε   ε    ε   ε   ε   ε |
        | 0   ε   ε   ε   ε   ε    ε   ε   ε   ε |
        | ε  16   ε   ε   ε  −5    ε   ε   ε   ε |
        | ε   ε   ε   ε   ε   ε    0   ε   ε   ε |
        | ε   ε   ε   ε   9   ε    ε   ε   ε   ε |
        | ε   ε   ε   ε   ε   ε   0.5  ε   ε   ε |
        | ε   ε   ε   ε   ε   6    ε   ε   ε   ε |
        | 9   ε   ε   ε   ε   ε    ε   ε   ε   ε | .
The communication graph of A is shown in Figure 6. The graph G(A) has node
set N(A) = {1, 2, . . . , 10} and arc set D(A) = {(2, 1), (3, 2), (2, 3),
(4, 3), (1, 4), (2, 5), (6, 5), (7, 6), (5, 7), (7, 8), (6, 9), (1, 10)}.
Specifically, G(A) contains
[Figure 6: The communication graph of matrix A in Example 9.]
three elementary circuits ρ = ((2, 3), (3, 2)), θ = ((4, 3), (3, 2), (2, 1),
(1, 4)), and η = ((6, 5), (5, 7), (7, 6)). The graph is not strongly
connected: for example, 2R5 holds, but 5R2 does not; in words, node 5 is
reachable from node 2, but the converse is not true.
The predecessor and successor sets are given for node 10, for example, by
π(10) = {1} , π + (10) = {1, 2, 3, 4} , π ∗ (10) = {1, 2, 3, 4, 10},
and
σ(10) = σ + (10) = ∅ , σ ∗ (10) = {10}.
For node 5 these sets read
π(5) = {2, 6} , π + (5) = π ∗ (5) = {1, 2, 3, 4, 5, 6, 7},
and
σ(5) = {7} , σ + (5) = σ ∗ (5) = {5, 6, 7, 8, 9}.
There are five m.s.c.s.’s in G(A) with the set of nodes [1] = [2] = [3] =
[4] = {1, 2, 3, 4}, [5] = [6] = [7] = {5, 6, 7}, [8] = {8}, [9] = {9}, and
[10] = {10}. Because |ρ|l = 2 and |θ|l = 4, the m.s.c.s. corresponding to, for
instance, [2] has cyclicity 2, being the greatest common divisor of all circuit
lengths in [2]. Because |η|l = 3, the m.s.c.s. corresponding to, say [5], has
cyclicity 3. The other m.s.c.s.'s have cyclicity 1 by definition. Therefore,
the graph G(A) has cyclicity 6, being the least common multiple of the
cyclicities of all m.s.c.s.'s. Hence, σG(A) = 6. The reduced graph is
depicted in Figure 7.
[Figure 7: The reduced graph of G(A) in Example 9.]
Based on the reduced graph, let [i1 ] = [8] = {8}, [i2 ] = [9] = {9},
[i3 ] = [5] = {5, 6, 7}, [i4 ] = [10] = {10}, and [i5 ] = [2] = {1, 2, 3, 4}. The
corresponding matrices are

    A11 = A22 = A44 = ε,   A33 = | ε  −5   ε |        A55 = | ε  0   ε  ε |
                                 | ε   ε   0 | ,            | ε  ε  −3  ε |
                                 | 9   ε   ε |              | ε  4   ε  0 |
                                                            | 0  ε   ε  ε | .
If both the rows and columns of A are placed into the order

    8, 9, 5, 6, 7, 10, 1, 2, 3, 4,

obtained from placing the elements of the sets [i1], [i2], [i3], [i4], and
[i5] one after another, then the following normal form of A is the result:

    | ε   ε   ε   ε  0.5  ε   ε   ε   ε   ε |
    | ε   ε   ε   6   ε   ε   ε   ε   ε   ε |
    | ε   ε   ε  −5   ε   ε   ε  16   ε   ε |
    | ε   ε   ε   ε   0   ε   ε   ε   ε   ε |
    | ε   ε   9   ε   ε   ε   ε   ε   ε   ε |
    | ε   ε   ε   ε   ε   ε   9   ε   ε   ε |
    | ε   ε   ε   ε   ε   ε   ε   0   ε   ε |
    | ε   ε   ε   ε   ε   ε   ε   ε  −3   ε |
    | ε   ε   ε   ε   ε   ε   ε   4   ε   0 |
    | ε   ε   ε   ε   ε   ε   0   ε   ε   ε | .

In particular, the diagonal blocks of this normal form of A, starting in the
upper left corner and going down to the lower right corner, are given by A11,
A22, A33, A44, and A55, respectively.
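The classes and their topological order can be computed mechanically; the
sketch below (ours) runs Kosaraju's two-pass algorithm on the arc set of
Example 9 and recovers the five m.s.c.s.'s:

    # m.s.c.s.'s of G(A) in Example 9 via Kosaraju (illustrative sketch).
    arcs = [(2, 1), (3, 2), (2, 3), (4, 3), (1, 4), (2, 5),
            (6, 5), (7, 6), (5, 7), (7, 8), (6, 9), (1, 10)]
    nodes = range(1, 11)
    succ = {v: [j for (i, j) in arcs if i == v] for v in nodes}
    pred = {v: [i for (i, j) in arcs if j == v] for v in nodes}

    order, seen = [], set()
    def dfs(v, nbrs, out):
        seen.add(v)
        for w in nbrs[v]:
            if w not in seen:
                dfs(w, nbrs, out)
        out.append(v)

    for v in nodes:                  # first pass: record finishing order
        if v not in seen:
            dfs(v, succ, order)
    classes, seen = [], set()
    for v in reversed(order):        # second pass: on the reversed graph
        if v not in seen:
            cls = []
            dfs(v, pred, cls)
            classes.append(sorted(cls))
    print(classes)   # [[1, 2, 3, 4], [10], [5, 6, 7], [8], [9]]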
Let {x(k) : k ∈ N} be a sequence in Rmax^n, and assume that for all j ∈ n
the quantity ηj, defined by

    ηj = lim_{k→∞} xj(k)/k,

exists. The vector η = (η1, η2, . . . , ηn)^⊤ is called the cycle-time vector
of the sequence x(k). If all ηj have the same value, this value is also
called the asymptotic growth rate of the sequence x(k).
Theorem 7 Consider the recurrence relation x(k + 1) = A ⊗ x(k) for k ≥ 0,
with a square regular matrix A ∈ Rmax^{n×n} and an initial condition
x(0) = x0. Let ξ = lim_{k→∞} x(k; x0)/k be the cycle-time vector of A, and
let λ[i] denote the eigenvalue of the matrix obtained by restricting A to the
nodes of the m.s.c.s. [i] (with λ[i] = ε if this restriction equals ε).

1. For all j ∈ n,

       ξ[j] = ⊕_{i∈π*(j)} λ[i].

2. For all j ∈ n and any x0 ∈ R^n,

       lim_{k→∞} (1/k) xj(k; x0) = ⊕_{i∈π*(j)} λ[i].
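For the matrix of Example 9 the two relevant circuit means are λ[2] =
|ρ|w/|ρ|l = 1/2 and λ[5] = |η|w/|η|l = 4/3, so Theorem 7 predicts a rate 1/2
for nodes 1-4 and 10, and 4/3 for nodes 5-9. A sketch (ours) confirming this
by iteration:

    # Cycle-time vector of the matrix A from Example 9 (illustrative sketch).
    EPS = float('-inf')
    A = [[EPS, 0, EPS, EPS, EPS, EPS, EPS, EPS, EPS, EPS],
         [EPS, EPS, -3, EPS, EPS, EPS, EPS, EPS, EPS, EPS],
         [EPS, 4, EPS, 0, EPS, EPS, EPS, EPS, EPS, EPS],
         [0, EPS, EPS, EPS, EPS, EPS, EPS, EPS, EPS, EPS],
         [EPS, 16, EPS, EPS, EPS, -5, EPS, EPS, EPS, EPS],
         [EPS, EPS, EPS, EPS, EPS, EPS, 0, EPS, EPS, EPS],
         [EPS, EPS, EPS, EPS, 9, EPS, EPS, EPS, EPS, EPS],
         [EPS, EPS, EPS, EPS, EPS, EPS, 0.5, EPS, EPS, EPS],
         [EPS, EPS, EPS, EPS, EPS, 6, EPS, EPS, EPS, EPS],
         [9, EPS, EPS, EPS, EPS, EPS, EPS, EPS, EPS, EPS]]

    x, K = [0.0] * 10, 6000
    for _ in range(K):
        x = [max(a + xi for a, xi in zip(row, x)) for row in A]
    print([round(v / K, 3) for v in x])
    # approximately [0.5, 0.5, 0.5, 0.5, 1.333, ..., 1.333, 0.5]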
Ergodic theorems (stochastic)
Irreducible matrices
We say that A(k) has fixed support if the probability that (A(k))ij equals ε
is either 0 or 1 and does not depend on k.
With the definition of fixed support at hand, we say that a random matrix A is irreducible if (a) it has fixed support and (b) it is irreducible with
probability one. For random matrices, irreducibility thus implies fixed support.
Theorem 8 Let {A(k)} be a stationary sequence of integrable and irreducible
matrices in Rmax^{J×J} such that all finite elements are non-negative and all
diagonal elements are different from ε. Then a finite constant λ exists such
that for any non-random finite initial condition x0:

    lim_{k→∞} xj(k)/k = lim_{k→∞} ||x(k)||min/k = lim_{k→∞} ||x(k)||max/k = λ   a.s.   (17)

and

    lim_{k→∞} (1/k) E[xj(k)] = lim_{k→∞} (1/k) E[||x(k)||min] = lim_{k→∞} (1/k) E[||x(k)||max] = λ,

for 1 ≤ j ≤ J. The above limits also hold true for random initial conditions
provided that the initial condition is a.s. finite and integrable.
Reducible matrices
Let {A(k)} be a sequence of matrices in Rmax^{J×J} with fixed support. If we
replace any element of A(k) that is different from ε by e, then the resulting
communication graph of A(k), denoted by Ge(A), is independent of k (and thus
non-random). Let Ger(A) denote the reduced graph of Ge(A). As in the
deterministic scenario, we denote by [i] := {j ∈ {1, . . . , J} : iCj} the
set of nodes of the m.s.c.s. that contains i. The set of all nodes j such
that there exists a path from j to i in Ge(A) is denoted by π+(i). We denote
by λtop_[i] the top Lyapunov exponent associated with the matrix obtained by
restricting A(k) to the nodes in [i]. In case i is an isolated node, or a
node with only incoming or outgoing arcs, we set λtop_[i] = ε.
Theorem 9 Let {A(k)} be a stationary sequence of integrable matrices in
Rmax^{J×J} with fixed support such that with probability one all finite
elements are non-negative and the diagonal elements are different from ε. For
any (non-random) finite initial value x0 it holds true that

    lim_{k→∞} xj(k)/k = λj   a.s.,   with   λj = ⊕_{i∈π*(j)} λtop_[i],

and

    lim_{k→∞} (1/k) E[xj(k)] = λj,

for 1 ≤ j ≤ J. The above limits also hold for random initial conditions
provided that the initial condition is a.s. finite and integrable.

For the proof, we set π*(i) = {i} ∪ π+(i), and we define the predecessor sets

    [≤ i] = ∪_{j∈π*(i)} [j]   and   [< i] = [≤ i] \ [i].
Beyond fixed support
The real challenge is to establish ergodic results without relying on fixed
support. This is ongoing research!
References
[1] Altman, E., B. Gaujal, and A. Hordijk. Discrete-Event Control of
    Stochastic Networks: Multimodularity and Regularity. Lecture Notes in
    Mathematics, vol. 1829. Springer, Berlin, 2003.

[2] Baccelli, F., G. Cohen, G.J. Olsder, and J.P. Quadrat. Synchronization
    and Linearity. John Wiley and Sons, 1992. (Out of print; accessible via
    the max-plus web portal at http://maxplus.org.)

[3] Heidergott, B. Max Plus Stochastic Systems and Perturbation Analysis.
    Springer, New York, 2006.

[4] Heidergott, B., G.J. Olsder, and J. van der Woude. Max Plus at Work:
    Modeling and Analysis of Synchronized Systems. Princeton University
    Press, Princeton, 2006.

[5] Le Boudec, J.Y., and P. Thiran. Network Calculus: A Theory of
    Deterministic Queueing Systems for the Internet. Lecture Notes in
    Computer Science, no. 2050. Springer, Berlin, 1998.

[6] Mairesse, J. Products of irreducible random matrices in the (max,+)
    algebra. Advances in Applied Probability, 29:444-477, 1997.

[7] McEneaney, W. Max-Plus Methods for Nonlinear Control and Estimation.
    Birkhäuser, Boston, 2006.
Appendix
The shift-operator
Many stochastic concepts, such as stationarity or coupling, can be expressed
through the shift-operator in a very elegant manner. Let (Ω, F, P) be a
probability space. We call a mapping θ : Ω → Ω a shift-operator if

• θ is a bijective and measurable mapping from Ω onto itself, and

• the law P is left invariant by θ, namely E[X] = E[X ◦ θ] for any
  measurable and integrable random variable X.

For any n, m ∈ Z, we set θ^n ◦ θ^m = θ^{n+m}. In particular, θ^0 is the
identity and (θ^n)^{−1} = θ^{−n}. By convention, the composition operator '◦'
has highest priority in all formulas, that is, X ◦ θY means (X ◦ θ)Y.
The shift operator allows one to define sequences of random variables. To see
this, let X be a measurable mapping defined on (Ω, F) and set X(n, ω) =
X(θ^n ω), for n ∈ T ⊂ Z. Because the law P is invariant, the distribution of
X(n) is independent of n. This motivates the following definition. We call
{X(t) : t ∈ T}, with X(t) an R-valued random variable defined on (Ω, F) and
T ⊂ Z, θ-stationary if

    X(t; ω) = X(0, θ^t ω),   ω ∈ Ω,                                (18)

for any t. We call a sequence X = {X(t) : t ∈ T} compatible with the shift
operator θ if a version of X exists satisfying (18). Moreover, we call X
stationary if X is compatible with a shift operator θ such that X is
θ-stationary.
The shift θ is called ergodic if

    lim_{n→∞} (1/n) Σ_{k=1}^{n} X ◦ θ^k = E[X]   a.s.,

for any measurable and integrable function X : Ω → R. We call a sequence
X = {X(t) : t ∈ T} ergodic if X is compatible with an ergodic shift operator.
An event A ∈ F is called invariant if P(A) = P(θ^t A) for any t, where
θ^t A = {θ^t ω : ω ∈ A}. Ergodicity of a shift operator is characterized by
Birkhoff's pointwise ergodic theorem: the shift operator θ is ergodic if (and
only if) the only events in F that are invariant are Ω and ∅.
Let X = {X(t) : t ∈ T} be a sequence of random elements on a state space S.
For m ≥ 1, let α ∈ S^m be a sequence of states such that

    (X(t + m − 1), X(t + m − 2), . . . , X(t)) = α

with positive probability. Define the sequence of hitting times of X on α as
follows:

    T0 = inf{ t ≥ 0 : (X(t + m − 1), X(t + m − 2), . . . , X(t)) = α }

and, for k > 0,

    Tk+1 = inf{ t > Tk + m : (X(t + m − 1), X(t + m − 2), . . . , X(t)) = α }.
Coupling Convergence
We say that there is coupling convergence in finite time (or, merely,
coupling) of a sequence {Xn} to a stationary sequence {Y ◦ θ^n} if

    lim_{n→∞} P( ∀k : X_{n+k} = Y ◦ θ^{n+k} ) = 1,

or, equivalently, if there exists an a.s. finite random variable N such that

    X_{N+k} = Y ◦ θ^{N+k},   k ≥ 0.
Result: Coupling (convergence) implies total variation convergence.
Result: Convergence in total variation implies convergence in distribution
(or, weak convergence) but the converse is not true.
Strong Coupling Convergence and Goldstein’s Maximal Coupling
We say that there is strong coupling convergence in finite time (or, merely,
strong coupling) of a sequence {Xn} to a stationary sequence {Y ◦ θ^n} if

    N′ = inf{ n ≥ 0 | ∀k ≥ 0 : X_{n+k} ◦ θ^{−n−k} = Y }

is finite with probability one.
Result: Strong coupling convergence implies coupling convergence but the
converse is not true.
We illustrate this with the following example. Let ξm, with ξm ∈ Z and
E[ξ1] = ∞, be an i.i.d. sequence and define Xn, for n ≥ 1, as follows:

    Xn = ξ0              for X_{n−1} = 0,
    Xn = X_{n−1} − 1     for X_{n−1} ≥ 2,
    Xn = X_{n−1}         for X_{n−1} = 1,

where X0 = 0. It is easily checked that {Xn} couples with the constant
sequence 1 after ξ0 − 1 transitions. To see that {Xn} fails to converge in
strong coupling, observe that the shift operator applies to the 'stochastic
noise' ξm as well. Specifically, for k ≥ 0,

    Xn ◦ θ^{−k} = ξ0 ◦ θ^{−k}              for X_{n−1} ◦ θ^{−k} = 0,
    Xn ◦ θ^{−k} = X_{n−1} ◦ θ^{−k} − 1     for X_{n−1} ◦ θ^{−k} ≥ 2,
    Xn ◦ θ^{−k} = X_{n−1} ◦ θ^{−k}         for X_{n−1} ◦ θ^{−k} = 1,

where X0 ◦ θ^{−k} = 0 and ξ_{−k} = ξ0 ◦ θ^{−k}. This implies

    N′ = inf{ n ≥ 0 | ∀k ≥ 0 : X_{n+k} ◦ θ^{−n−k} = 1 }
       = inf{ n ≥ 0 | ∀k ≥ 0 : ξ_{n+k} − 1 ≤ n }
       = ∞   a.s.
Result (Goldstein's maximal coupling): Let {Xn} and Y be defined on a Polish
state space. If {Xn} converges with coupling to Y, then a version {X̃n} of
{Xn} and a version Ỹ of Y defined on the same probability space exist such
that {X̃n} converges with strong coupling to Ỹ.