Max Plus Algebra and Queues

Bernd Heidergott
EURANDOM research fellow
Vrije Universiteit, Department of Econometrics and Operations Research
De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
email: [email protected]

Lecture 1: Max Plus Algebra

The Beginnings

Basic Definitions

Define ε := −∞ and e := 0, and denote by R_max the set R ∪ {ε}. For a, b ∈ R_max, we define the operations ⊕ and ⊗ by

    a ⊕ b := max(a, b)   and   a ⊗ b := a + b.

The set R_max together with the operations ⊕ and ⊗ is called max-plus algebra and is denoted by

    R_max = (R_max, ⊕, ⊗, ε, e).

As in conventional algebra, we simplify the notation by letting the operation ⊗ have priority over the operation ⊕. For example, 5 ⊗ −9 ⊕ 7 ⊗ 1 has to be understood as (5 ⊗ −9) ⊕ (7 ⊗ 1). Notice that (5 ⊗ −9) ⊕ (7 ⊗ 1) = 8, whereas 5 ⊗ (−9 ⊕ 7) ⊗ 1 = 13.

Extension of operations to −∞

Clearly, max(a, −∞) = max(−∞, a) = a and a + (−∞) = −∞ + a = −∞ for any a ∈ R_max, so that

    a ⊕ ε = ε ⊕ a = a   and   a ⊗ ε = ε ⊗ a = ε,    (1)

for any a ∈ R_max.

Examples: 5 ⊕ 3 = max(5, 3) = 5, 5 ⊕ ε = max(5, −∞) = 5, 5 ⊗ ε = 5 − ∞ = −∞ = ε, e ⊕ 3 = max(0, 3) = 3, and 5 ⊗ 3 = 5 + 3 = 8.

Fun with Max Plus

Powers are introduced in max-plus algebra in the natural way, using the associative property. We denote the set of natural numbers including zero by N and define, for x ∈ R_max,

    x^{⊗n} := x ⊗ x ⊗ ⋯ ⊗ x  (n times) = x + x + ⋯ + x = n × x

for all n ∈ N with n ≠ 0, and for n = 0 we set x^{⊗0} := e (= 0). For example, 5^{⊗3} = 3 × 5 = 15. Inspired by this, we similarly introduce general powers of real numbers as

    x^{⊗α} = α × x,   for α ∈ R.

For example,

    8^{⊗1/2} = (1/2) × 8 = 4   and   12^{⊗(−1/4)} = −(1/4) × 12 = −3 = 3^{⊗(−1)}.

What is √−1 in max-plus algebra?

Exercise: Show that Fermat's last theorem (x^n + y^n = z^n has no nonzero integer solutions for x, y and z when n > 2) fails to hold in max-plus algebra.

Let's get abstract

Algebraic properties:

• Associativity: ∀x, y, z ∈ R_max: x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z and x ⊗ (y ⊗ z) = (x ⊗ y) ⊗ z.
• Commutativity: ∀x, y ∈ R_max: x ⊕ y = y ⊕ x and x ⊗ y = y ⊗ x.
• Distributivity of ⊗ over ⊕: ∀x, y, z ∈ R_max: x ⊗ (y ⊕ z) = (x ⊗ y) ⊕ (x ⊗ z).
• Existence of a zero element: ∀x ∈ R_max: x ⊕ ε = ε ⊕ x = x.
• Existence of a unit element: ∀x ∈ R_max: x ⊗ e = e ⊗ x = x.
• The zero is absorbing for ⊗: ∀x ∈ R_max: x ⊗ ε = ε ⊗ x = ε.
• Idempotency of ⊕: ∀x ∈ R_max: x ⊕ x = x.

Definition 1 A semiring is a nonempty set R endowed with two binary operations ⊕_R and ⊗_R such that
• ⊕_R is associative and commutative with zero element ε_R;
• ⊗_R is associative, distributes over ⊕_R, and has unit element e_R;
• ε_R is absorbing for ⊗_R.
Such a semiring is denoted by R = (R, ⊕_R, ⊗_R, ε_R, e_R). If ⊗_R is commutative, then R is called commutative, and if ⊕_R is idempotent, then it is called idempotent.

Max-plus algebra is an example of a commutative and idempotent semiring.
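As a computational aside (not part of the original notes), the scalar definitions above are straightforward to mirror in code. The following minimal Python sketch encodes ε as IEEE −∞, so that the absorption rule (1) comes for free from floating-point arithmetic; all names are our own.

EPS = float('-inf')   # the zero element: eps = -infinity
E = 0.0               # the unit element: e = 0

def oplus(a, b):
    # a ⊕ b = max(a, b)
    return max(a, b)

def otimes(a, b):
    # a ⊗ b = a + b; since -inf + x = -inf in IEEE arithmetic, eps is absorbing
    return a + b

def power(x, alpha):
    # x^{⊗ alpha} = alpha × x, for real alpha
    return alpha * x

print(oplus(5, 3), otimes(5, 3))        # 5 8
print(oplus(5, EPS), otimes(5, EPS))    # 5 -inf
print(power(5, 3), power(8, 0.5))       # 15 4.0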
Here are examples of other meaningful semirings:

• Identify ⊕_R with conventional addition, denoted by +, and ⊗_R with conventional multiplication, denoted by ×. Then the zero and unit element are ε_R = 0 and e_R = 1, respectively. The object R_st = (R, +, ×, 0, 1) — the subscript st refers to "standard" — is an instance of a semiring over the real numbers. Since conventional multiplication is commutative, R_st is a commutative semiring. Note that R_st fails to be idempotent. However, as is well known, R_st is a ring and even a field with respect to the operations + and ×.

• Min-plus algebra is defined as R_min = (R_min, ⊕′, ⊗, ε′, e), where R_min := R ∪ {+∞}, ⊕′ is the operation defined by a ⊕′ b := min(a, b) for all a, b ∈ R_min, and ε′ := +∞. Note that R_min is an idempotent, commutative semiring.

• Consider R_min,max = (R̄, ⊕′, ⊕, ε′, ε), with R̄ = R ∪ {ε, ε′}, and set ε ⊕ ε′ = ε′ ⊕ ε = ε′. Then R_min,max is an idempotent, commutative semiring. In the same vein, R_max,min = (R̄, ⊕, ⊕′, ε, ε′) is an idempotent, commutative semiring provided that one defines ε ⊕′ ε′ = ε′ ⊕′ ε = ε.

• Let F = {f | f(s) ≤ f(t) for all s ≤ t, s, t ∈ R} denote the set of wide-sense increasing functions. Define (f ⊕ g)(t) = min(f(t), g(t)) for all t. For t > 0, let

    (f ⊗ g)(t) = inf_{0≤s≤t} { f(t − s) + g(s) },

and let (f ⊗ g)(t) = 0 for t ≤ 0. Set ε(t) = +∞ and e(t) = +∞ for t > 0, and ε(t) = e(t) = 0 for t ≤ 0. Then (F, ⊕, ⊗, ε(·), e(·)) is an idempotent, commutative semiring. [This is the setting used in network calculus; see Le Boudec and Thiran [5].]

• As a last example of a semiring of a somewhat different nature, let S be a nonempty set. Denote the set of all subsets of S by R; then (R, ∪, ∩, ∅, S), with ∅ the empty set and ∪ and ∩ the set-theoretic union and intersection, respectively, is a commutative, idempotent semiring. The same applies to (R, ∩, ∪, S, ∅).

Can one define ⊖?

Lemma 1 Let R = (R, ⊕_R, ⊗_R, ε_R, e_R) be a semiring. Idempotency of ⊕_R implies that inverse elements with respect to ⊕_R do not exist.

Matrices and vectors

The set of n × m matrices with underlying max-plus algebra is denoted by R_max^{n×m}. For n ∈ N with n ≠ 0, define n̲ := {1, 2, …, n}. The element of a matrix A ∈ R_max^{n×m} in row i and column j is denoted by a_ij and, occasionally, by [A]_ij, for i ∈ n̲ and j ∈ m̲.

The sum of matrices A, B ∈ R_max^{n×m}, denoted by A ⊕ B, is defined by

    [A ⊕ B]_ij = a_ij ⊕ b_ij = max(a_ij, b_ij)    (2)

for i ∈ n̲ and j ∈ m̲. Note that A ⊕ B = B ⊕ A for A, B ∈ R_max^{n×m}.

For A ∈ R_max^{n×m} and α ∈ R_max, the scalar multiple α ⊗ A is defined by

    [α ⊗ A]_ij = α ⊗ a_ij

for i ∈ n̲ and j ∈ m̲.

For matrices A ∈ R_max^{n×l} and B ∈ R_max^{l×m}, the matrix product A ⊗ B is defined by

    [A ⊗ B]_ik = ⊕_{j=1}^{l} a_ij ⊗ b_jk = max_{j∈l̲} { a_ij + b_jk }    (3)

for i ∈ n̲ and k ∈ m̲. This is just like in conventional algebra, with + replaced by max and × by +. For example, let

    A = ( e  ε )        B = ( −1  11 )
        ( 3  2 ),           (  1   ε );

then the elements of A ⊗ B are given by

    [A ⊗ B]_11 = e ⊗ (−1) ⊕ ε ⊗ 1 = max(0 − 1, −∞ + 1) = −1,
    [A ⊗ B]_12 = e ⊗ 11 ⊕ ε ⊗ ε = max(0 + 11, −∞ − ∞) = 11,
    [A ⊗ B]_21 = 3 ⊗ (−1) ⊕ 2 ⊗ 1 = max(3 − 1, 2 + 1) = 3,
    [A ⊗ B]_22 = 3 ⊗ 11 ⊕ 2 ⊗ ε = max(3 + 11, 2 − ∞) = 14,

yielding, in matrix notation,

    A ⊗ B = ( −1  11 )
            (  3  14 ).

Notice that the matrix product in general fails to be commutative (this explains why commutativity of ⊗ = + is "ignored" for R_max). Indeed, for the above A and B,

    B ⊗ A = ( 14  13 )  ≠  A ⊗ B.
            (  1   ε )

Let 𝓔(n, m) denote the n × m matrix with all elements equal to ε, and denote by E(n, m) the n × m matrix defined by [E(n, m)]_ij := e for i = j and ε otherwise. If n = m, then E(n, n) is called the n × n identity matrix. When their dimensions are clear from the context, 𝓔(n, m) and E(n, m) will also be written as 𝓔 and E, respectively.

For R_max^{n×m}, the matrix addition ⊕, as defined in (2), is associative, commutative, and has zero element 𝓔(n, m). For R_max^{n×n}, the matrix product ⊗, as defined in (3), is associative, distributive with respect to ⊕, and has unit element E(n, n); moreover, 𝓔(n, n) is absorbing for ⊗.
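The matrix operations can be coded just as directly. Here is a small Python sketch (ours, not from the notes) of the product (3); it reproduces the worked example above.

EPS = float('-inf')
E = 0.0

def mat_otimes(A, B):
    # max-plus matrix product (3): [A ⊗ B]_ik = max_j (a_ij + b_jk)
    return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[E, EPS], [3.0, 2.0]]
B = [[-1.0, 11.0], [1.0, EPS]]
print(mat_otimes(A, B))   # [[-1.0, 11.0], [3.0, 14.0]]
print(mat_otimes(B, A))   # [[14.0, 13.0], [1.0, -inf]] -- not commutative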
The structure

    R_max^{n×n} = (R_max^{n×n}, ⊕, ⊗, 𝓔, E),

with ⊕ and ⊗ as defined in (2) and (3), respectively, constitutes a noncommutative, idempotent semiring.

The transpose of A ∈ R_max^{n×m}, denoted by A^⊤, is defined in the usual way. As before, also in matrix addition and multiplication, the operation ⊗ has priority over the operation ⊕.

The elements of R_max^n := R_max^{n×1} are called vectors. The jth element of a vector x ∈ R_max^n is denoted by x_j, which will also be written as [x]_j. The vector in R_max^n with all elements equal to e is called the unit vector and is denoted by u; in formula, [u]_j = e for j ∈ n̲.

For A ∈ R_max^{n×n}, the kth power of A, denoted by A^{⊗k}, is defined by

    A^{⊗k} := A ⊗ A ⊗ ⋯ ⊗ A  (k times),    (4)

for k ∈ N with k ≠ 0, and A^{⊗0} := E(n, n). Notice that [A^{⊗k}]_ij has to be carefully distinguished from (a_ij)^{⊗k}. Indeed, the former is element (i, j) of the kth power of A, whereas the latter is the kth power of element (i, j) of A.

A mapping f from R_max^n to R_max^n is called affine if f(x) = A ⊗ x ⊕ b for some A ∈ R_max^{n×n} and b ∈ R_max^n. If b = 𝓔, then f is called linear. A recurrence relation x(k + 1) = f(x(k)), for k ∈ N, is called affine (resp. linear) if f is an affine (resp. linear) mapping.

A matrix A ∈ R_max^{n×m} is called regular if A contains at least one element different from ε in each row. Regularity is a mere technical condition, for if A fails to be regular, it contains redundant rows, and any system modeled by x(k + 1) = A ⊗ x(k) can also be modeled by a reduced regular version of A in which all redundant rows and related columns are skipped.

We call a matrix A ∈ R_max^{n×n} irreducible if no permutation matrix P exists such that P ⊗ A ⊗ P^⊤ is a block upper triangular matrix (equivalently, the digraph of A is strongly connected). If a matrix is not irreducible, it is called reducible.

Let A be a random element of R_max^{n×m} defined on a probability space (Ω, 𝓐, P). We call A integrable if

    E[ 1_{A_ij > ε} |A_ij| ] < ∞,   1 ≤ i ≤ n, 1 ≤ j ≤ m.

In words, integrability of a matrix is defined through integrability of its non-ε elements. If A is integrable, then the expected value of A is given by the matrix E[A] with

    (E[A])_ij = E[ 1_{A_ij > ε} A_ij ]   for P(A_ij ≠ ε) > 0,
    (E[A])_ij = ε                        for P(A_ij = ε) = 1,

for 1 ≤ i ≤ n, 1 ≤ j ≤ m. We say that A(k) has fixed support if the probability that (A(k))_ij equals ε is either 0 or 1 and does not depend on k.

Heaps of pieces (no queueing)

Let 𝓟 denote a finite set. In the example below we have 𝓟 = {a, b, c}. A sequence of "pieces" out of 𝓟 is called a heap. For example, w = a b a c b is a heap; see the figure. Denote the upper contour of heap w by a vector x_H(w) ∈ R_max^n, where (x_H(w))_r is the height of the heap on column r. For example, with the pieces a, b, and c of Figure 1.4, let M(a), M(b), M(c) ∈ R_max^{5×5} be the associated piece matrices: for each piece η, the entry [M(η)]_{r r′} equals the height difference between the upper contour of η on column r and its lower contour on column r′ whenever both columns are occupied by η, [M(η)]_{rr} = e for the columns r not occupied by η, and all other entries equal ε. Then x_H(a b a c b) = (3, 4, 4, 3, 3)^⊤, when starting from ground level. The upper contour of the heap a b a c b is indicated by the boldfaced line in Figure 1.4.

For a heap w and a piece η ∈ 𝓟, write w η for the heap resulting from piling piece η on heap w. Note that the order in which the pieces fall is of importance. The upper contour follows the recurrence relation

    x_H(w η) = M(η) ⊗ x_H(w).

In words, the upper contour of a heap of pieces follows a max-plus recurrence relation.
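To see the recurrence at work numerically, here is a small Python sketch (ours; the pieces are illustrative, not those of Figure 1.4). It builds M(η) for a flat rectangular piece occupying a set of columns and piles a short heap.

EPS = float('-inf')

def piece_matrix(n, cols, height):
    # M(eta) for a flat piece on the columns `cols` with the given height:
    # the entry is `height` on cols x cols, e = 0 on the diagonal elsewhere,
    # and eps everywhere else
    M = [[EPS] * n for _ in range(n)]
    for r in range(n):
        if r in cols:
            for c in cols:
                M[r][c] = float(height)
        else:
            M[r][r] = 0.0
    return M

def pile(M, x):
    # x_H(w eta) = M(eta) ⊗ x_H(w)
    return [max(m + xi for m, xi in zip(row, x)) for row in M]

a = piece_matrix(5, {0, 1}, 1)   # hypothetical piece on columns 1-2, height 1
b = piece_matrix(5, {2, 3}, 2)   # hypothetical piece on columns 3-4, height 2
x = [0.0] * 5                    # ground level
for M in (a, b, a):
    x = pile(M, x)
print(x)   # upper contour of the heap a b a: [2.0, 2.0, 2.0, 2.0, 0.0]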
For a given sequence η_k, k ∈ N, the asymptotic growth rate of the heap model is given by

    lim_{k→∞} (1/k) x_H(k),

provided that the limit exists. For example, if η_k, k ∈ N, represents a particular schedule, like η_1 = a, η_2 = b, η_3 = c, η_4 = a, η_5 = b, η_6 = c, and so forth, then the above limit measures the efficiency of the schedule a b c.

For a given sequence η_k, k ∈ N, the asymptotic form of x_H(k) can be studied, where form means the relative differences of the components of x_H(k). More precisely, in studying the shape of the upper contour, the actual height of the heap is disregarded. To that end, the vector of relative differences in x_H(w), called the shape vector, is denoted by s(w). For example, the shape of heap w = a b a c b in Figure 1.4 is obtained by letting the boldfaced line (the upper contour) sink to the ground level, yielding the vector s(w) = (0, 1, 1, 0, 0)^⊤. More formally, the shape vector is defined as

    s_r(w) = (x_H(w))_r − min{ (x_H(w))_p : p ∈ R },   r ∈ R.

Suppose that the sequence in which the pieces appear cannot be controlled (their arrivals may be triggered by an external source). For instance, η_k, k ∈ N, is a random sequence such that pieces a, b, and c appear with equal probability. Set s(k) := s(η_1, η_2, …, η_k). Since pieces fall in random order, s(k) is a random variable. Using probabilistic arguments, one can identify sufficient conditions such that the probability distribution of s(k) converges to a limiting distribution, say F. Hence, the asymptotic shape of the heap is given by the probability distribution F. By means of F, for example, one can determine the probability that the completion times of tasks typically differ by more than t time units over the resources, yielding an indication of how well balanced the schedule η_k, k ∈ N, is.

The pleasures and frustrations of Max Plus

Great didactic tool; powerful deterministic theory; dynamic-system view (projective space); stochasticity spoils the fun: a one-day wonder?

Deep concepts explained with Max Plus

Coupling from the past; weak convergence vs. a.s. convergence.

Example 1 Consider

    B = ( 1  ε  ε  ε )        C = ( 1  ε  ε  ε )
        ( ε  2  ε  ε )            ( ε  2  ε  ε )
        ( 0  ε  ε  ε )            ( ε  0  ε  ε )
        ( ε  0  ε  ε ),           ( 0  ε  ε  ε ).

Let A(k) be an i.i.d. sequence with state space {B, C} such that

    P(B = A(k)) = p = 1 − P(C = A(k)),

for some p ∈ (0, 1), which implies that A(k) fails to have fixed support (!). Note that

    x_1(k) = k   and   x_2(k) = 2k,    (5)

for any k. Observe that (5) implies that

    x_3(k) = k    if A(k) = B,
    x_3(k) = 2k   if A(k) = C.

In the same vein,

    x_4(k) = 2k   if A(k) = B,
    x_4(k) = k    if A(k) = C.

Hence, it follows that

    lim_{k→∞} x_1(k)/k = 1   and   lim_{k→∞} x_2(k)/k = 2,

whereas for i = 3, 4,

    lim sup_{k→∞} x_i(k)/k = 2 ≠ 1 = lim inf_{k→∞} x_i(k)/k,

and the sample-path limits fail to exist. It is worth noting that x(k)/k converges weakly, as k tends to ∞, towards the random vector (Y_1, Y_2, Y_3, Y_4), where Y_1 = 1 and Y_2 = 2 with probability one, and Y_3 = 1, Y_4 = 2 with probability p and Y_3 = 2, Y_4 = 1 with probability 1 − p.

Let Y(k) := A(0) ⊗ A(−1) ⊗ ⋯ ⊗ A(−k) ⊗ x_0, for k ≥ 0. Then, for k ≥ 1, Y_i(k)/k is independent of k and thus converges with probability one, for 1 ≤ i ≤ 4. The random variable Y(k)/k, for k ≥ 1, is the weak limit of x(k)/k as k tends to ∞.
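Example 1 is easy to check by simulation. The sketch below (ours) draws A(k) ∈ {B, C} i.i.d. with p = 1/2 and prints x(k)/k: the first two components settle at 1 and 2, while x_3(k)/k and x_4(k)/k keep flipping between (roughly) 1 and 2, depending on the last draw.

import random

EPS = float('-inf')
B = [[1, EPS, EPS, EPS], [EPS, 2, EPS, EPS],
     [0, EPS, EPS, EPS], [EPS, 0, EPS, EPS]]
C = [[1, EPS, EPS, EPS], [EPS, 2, EPS, EPS],
     [EPS, 0, EPS, EPS], [0, EPS, EPS, EPS]]

def mat_vec(A, x):
    # max-plus matrix-vector product
    return [max(a + xi for a, xi in zip(row, x)) for row in A]

random.seed(42)
x, k = [0, 0, 0, 0], 10_000
for _ in range(k):
    x = mat_vec(random.choice([B, C]), x)   # p = 1/2
print([xi / k for xi in x])   # x3/k, x4/k depend on the last draw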
Ergodicity and i.i.d.

Example 2 What happens if we consider a stationary and ergodic sequence instead of an i.i.d. sequence? Let Ω = {ω_1, ω_2} and P(ω_i) = 1/2 for i = 1, 2. Define the shift operator θ by θ(ω_1) = ω_2 and θ(ω_2) = ω_1. Then θ is stationary and ergodic. Consider the matrices A, B, with A ≠ B, and let

    {A(k, ω_1) : k ∈ N} = A, B, A, B, …
    {A(k, ω_2) : k ∈ N} = B, A, B, A, …

The sequence {A(k)} is thus stationary and ergodic. But with probability one we never observe a sequence of n occurrences in a row of either A or B, for n > 1.

The goal of the first three lectures

… is to prove the following theorem: Assume that assumptions (W1), (W2) and (W3) are satisfied and denote the maximal Lyapunov exponent of {A(k)} by a. If ν > a, then the sequence

    W(k + 1) = A(k) ⊗ C(σ(k + 1)) ⊗ W(k) ⊕ B(k)

converges with strong coupling (which implies total variation convergence, which implies weak convergence) to a unique stationary regime W, with

    W = D(0) ⊕ ⊕_{j≥1} C(−τ(−j)) ⊗ D(j),

where D(0) = B ∘ θ^{−1} and

    D(j) = ⊗_{i=1}^{j} A(−i) ⊗ B(−(j + 1)),   j ≥ 1.

Lecture 2: Max Plus and Queues

A zoology of mappings

Closed networks (autonomous equations): x(k + 1) = A(k) ⊗ x(k) with A(k) irreducible. Open networks (non-autonomous equations): x(k + 1) = A(k) ⊗ x(k) ⊕ B(k) ⊗ u(k) with A(k) irreducible, or x(k + 1) = A(k) ⊗ x(k) with A(k) reducible.

Max Plus models of queueing networks

Example 3 Consider a closed system of J single-server queues in tandem, with infinite buffers. In the system, customers have to pass through the queues consecutively so as to receive service at each server. After service completion at the Jth server, the customers return to the first queue for a new cycle of service. We denote the number of customers initially residing at queue j by n_j. We assume that there are J customers circulating through the network and that initially there is one customer in each queue, that is, n_j = 1 for 1 ≤ j ≤ J.

[Figure 1: The closed tandem queueing system at initial state n_j = 1 for 1 ≤ j ≤ J; customers are represented by the symbol '•'.]

Let σ_j(k) denote the kth service time at queue j, and let x_j(k) be the time of the kth service completion at node j. Then the time evolution of the system can be described by a J-dimensional vector x(k) = (x_1(k), …, x_J(k)) following the homogeneous equation (such models can be derived algebraically; this is not part of this lecture)

    x(k + 1) = A(k) ⊗ x(k),    (6)

where the matrix A(k − 1) is given by

    A(k−1) = ( σ_1(k)   ε        ε       ⋯        σ_1(k) )
             ( σ_2(k)   σ_2(k)   ε       ⋯        ε      )
             ( ε        σ_3(k)   σ_3(k)           ⋮      )    (7)
             ( ⋮                 ⋱       ⋱        ε      )
             ( ε        ⋯        ε       σ_J(k)   σ_J(k) )

for k ≥ 1. Observe that A(k) is irreducible.

Example 4 We now consider the open variant of the tandem network of Example 3. Let queue 0 represent an external arrival stream of customers (the source is modeled as a node of the network). Each customer who arrives at the system has to pass through queues 1 to J and then leaves the system. We assume that the system starts empty. Denoting the number of customers initially present at queue j by n_j, we assume n_j = 0 for 1 ≤ j ≤ J.

[Figure 2: The open tandem queueing system at initial state n_j = 0 for 1 ≤ j ≤ J.]

Again, we let x_j(k) denote the time of the kth service completion at station j. In particular, we let x_0(k) denote the kth arrival epoch at the system. The time evolution of the system can then be described by a (J + 1)-dimensional vector x(k) = (x_0(k), …, x_J(k)) following the homogeneous equation

    x(k + 1) = A(k) ⊗ x(k),    (8)

where the matrix A(k − 1) is given by

    A(k−1) = ( σ_0(k)                  ε                  ε                ⋯   ε      )
             ( σ_0(k)⊗σ_1(k)           σ_1(k)             ε                ⋯   ε      )
             ( σ_0(k)⊗σ_1(k)⊗σ_2(k)    σ_1(k)⊗σ_2(k)      σ_2(k)               ⋮      )    (9)
             ( ⋮                       ⋮                                   ⋱   ε      )
             ( σ_0(k)⊗⋯⊗σ_J(k)         σ_1(k)⊗⋯⊗σ_J(k)    σ_2(k)⊗⋯⊗σ_J(k)  ⋯   σ_J(k) )

for k ≥ 1.
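For concreteness, the following Python sketch (ours) assembles the lower triangular matrix (9) from one vector of service times σ_0(k), …, σ_J(k); the entry in row j, column i is the cumulative sum σ_i(k) ⊗ ⋯ ⊗ σ_j(k).

EPS = float('-inf')

def open_tandem_matrix(sigma):
    # build the (J+1) x (J+1) matrix of (9); sigma = [sigma_0(k), ..., sigma_J(k)]
    n = len(sigma)
    A = [[EPS] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            A[j][i] = sum(sigma[i:j + 1])   # sigma_i(k) ⊗ ... ⊗ sigma_j(k)
    return A

for row in open_tandem_matrix([2.0, 1.0, 3.0]):   # illustrative times, J = 2
    print(row)
# [2.0, -inf, -inf]
# [3.0, 1.0, -inf]
# [6.0, 4.0, 3.0]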
Alternatively, we could describe the system via a J-dimensional vector x̂(k) = (x̂_1(k), …, x̂_J(k)) following the inhomogeneous equation

    x̂(k + 1) = Â(k) ⊗ x̂(k) ⊕ B(k) ⊗ τ(k + 1),    (10)

where the matrix Â(k) looks like (9), except that the first column and the first row are missing, that is, (Â(k))_ij = (A(k))_{i+1, j+1} for 1 ≤ i, j ≤ J; the vector B(k) is given by

    B(k) = ( σ_1(k+1), σ_1(k+1) ⊗ σ_2(k+1), …, σ_1(k+1) ⊗ σ_2(k+1) ⊗ ⋯ ⊗ σ_J(k+1) )^⊤

for k ≥ 0; and

    τ(k) = Σ_{i=1}^{k} σ_0(i)

denotes the kth arrival time. Notice that B_j(0), for 1 ≤ j ≤ J, denotes the time it takes the first customer from entering the system until departing from station j (this requires a proof).

Example 5 (Example 4 revisited) We consider the system as described in the above example. However, in contrast to Example 4, we let x_j(k) denote the time of the kth beginning of service at station j, with 1 ≤ j ≤ J. The standard non-autonomous equation now reads

    x(k + 1) = A(k) ⊗ x(k) ⊕ B(k) ⊗ τ(k + 1),    (11)

with A(k) given by

    (A(k))_ij = ε                                     for i < j,
    (A(k))_ij = σ_j(k) ⊗ ⊗_{h=j}^{i−1} σ_h(k + 1)     for i ≥ j,

for 1 ≤ i, j ≤ J, where we set σ_j(0) = 0, and

    B(k) = ( 0, σ_1(k+1), σ_1(k+1) ⊗ σ_2(k+1), …, σ_1(k+1) ⊗ σ_2(k+1) ⊗ ⋯ ⊗ σ_{J−1}(k+1) )^⊤

for k ≥ 0. An element B_j(0) denotes the time it takes the first customer from entering the system until reaching station j (this requires a proof). Notice that A(k) is reducible.

Examples 3 and 4 model sequences of departure times from the queues via a max-plus recurrence relation, and a model for beginning-of-service times is given in Example 5. We now turn to another important application of max-plus linear models: waiting times.

Example 6 Consider the open tandem network described in Example 5. Let W_j(k) be the time the kth customer arriving at the network spends in the system until the beginning of her/his service at station j. Then the vector of waiting times W(k) = (W_1(k), …, W_J(k)) follows the recurrence relation

    W(k + 1) = A(k) ⊗ C(σ_0(k + 1)) ⊗ W(k) ⊕ B(k),   k ≥ 0,

with W(0) = (0, …, 0) and C(r) a matrix with diagonal entries −r and all other entries equal to ε. Taking J = 1, the above recurrence relation for the waiting times reads

    W(k + 1) = σ_1(k) ⊗ (−σ_0(k + 1)) ⊗ W(k) ⊕ 0
             = max( σ_1(k) − σ_0(k + 1) + W(k), 0 ),   k ≥ 0,

with σ_1(0) = 0, which is Lindley's equation for the actual waiting time in a G/G/1 queue.

If we had let x(k) describe departure times at the stations, cf. Example 4, then W(k) would yield the vector of sojourn times of the kth customer. In other words, W_j(k) would model the time the kth customer arriving at the network spends in the system until leaving station j.

In the above examples the positions that are equal to ε are fixed, and the randomness is generated by letting the entries different from ε be random variables. The next example is of a different kind. Here the matrix as a whole is random; that is, an element can with positive probability be equal to ε or finite.
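Since the J = 1 case is exactly Lindley's recursion, it can be simulated in a few lines. The sketch below (ours) uses exponential service and interarrival times purely for illustration.

import random

def lindley(services, interarrivals):
    # W(k+1) = max(sigma_1(k) - sigma_0(k+1) + W(k), 0); pairing services[k]
    # with interarrivals[k] realizes the index shift sigma_0(k+1)
    W, path = 0.0, []
    for s, a in zip(services, interarrivals):
        W = max(s - a + W, 0.0)
        path.append(W)
    return path

random.seed(7)
n = 5
services = [random.expovariate(1.0) for _ in range(n)]       # mean 1
interarrivals = [random.expovariate(0.8) for _ in range(n)]  # mean 1.25: stable
print(lindley(services, interarrivals))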
Example 7 Consider a cyclic tandem queueing network consisting of a single server and a multi-server, each with deterministic service times. Service times at the single-server station equal σ, whereas service times at the multi-server station equal σ′. Three customers circulate in the network. Initially, one customer is in service at station 1 (the single server), one customer is in service at station 2 (the multi-server), and the third customer is just about to enter station 2. The time evolution of this network is described by a max-plus linear sequence x(k) = (x_1(k), …, x_4(k)), where x_1(k) is the kth beginning of service at the single-server station and x_2(k) is the kth departure epoch at the single-server station; x_3(k) is the kth beginning of service at the multi-server station and x_4(k) is the kth departure epoch from the multi-server station. The system then follows

    x(k + 1) = D_2 ⊗ x(k),

where

    D_2 = ( σ  ε  σ′  ε )
          ( σ  ε  ε   ε )
          ( ε  e  ε   e )
          ( ε  ε  σ′  ε ),

with x(0) = (0, 0, 0, 0).

[Figure 3: The initial state of the multi-server system (three customers).]

Consider the cyclic tandem network again, but suppose one of the servers of the multi-server station has broken down. The system is thus a tandem network with two single-server stations. Initially, one customer is in service at station 1, one customer is in service at station 2, and the third customer is waiting at station 2 for service. This system follows

    x(k + 1) = D_1 ⊗ x(k),

where

    D_1 = ( σ  ε  σ′  ε )
          ( σ  ε  ε   ε )
          ( ε  e  σ′  ε )
          ( ε  ε  σ′  ε ),

with x(0) = (0, 0, 0, 0).

[Figure 4: The initial state of the multi-server system with breakdown (three customers).]

Assume that whenever a customer enters station 2, the second server of the multi-server station breaks down with probability θ. Let A_θ(k) have distribution

    P(A_θ(k) = D_1) = θ   and   P(A_θ(k) = D_2) = 1 − θ;

then

    x_θ(k + 1) = A_θ(k) ⊗ x_θ(k)

describes the time evolution of the system with breakdowns. That the above recurrence relation indeed models the sample-path dynamics of the system with breakdowns is not obvious, and a proof is required.

Particularities on waiting times

We consider the basic recurrence relation

    x(k + 1) = A(k) ⊗ x(k) ⊕ B(k) ⊗ τ(k + 1).    (12)

We assume that there is only one input stream. If we consider only component x_j(k), we can subtract τ(k + 1) on both sides of equation (12) and get

    W_j(k + 1) = x_j(k + 1) − τ(k + 1) = (A(k) ⊗ x(k))_j ⊗ (−τ(k + 1)) ⊕ B_j(k).

Let σ_0(k) denote the kth interarrival time, that is, τ(k) = Σ_{i=1}^{k} σ_0(i); then

    (A(k) ⊗ x(k))_j ⊗ (−τ(k + 1)) = ⊕_i A_ji(k) ⊗ (−σ_0(k + 1)) ⊗ (x_i(k) − τ(k))
                                  = ⊕_i A_ji(k) ⊗ (−σ_0(k + 1)) ⊗ W_i(k).

Let C(h) denote a diagonal matrix with −h on the diagonal and ε elsewhere. Then we can write the expression on the right-hand side of the above equation as follows:

    ⊕_i A_ji(k) ⊗ (−σ_0(k + 1)) ⊗ W_i(k) = ( A(k) ⊗ C(σ_0(k + 1)) ⊗ W(k) )_j.

Combining the above formulas, we obtain the following vectorial form of the recurrence relation for W(k + 1):

    W(k + 1) = A(k) ⊗ C(σ_0(k + 1)) ⊗ W(k) ⊕ B(k).    (13)

Lemma 2 Let x_0 = u in (12). If W(0) = u in (13), then W(1) = B(0).

By using elementary matrix operations in the max-plus algebra, equation (13) can be rewritten as

    W(k + 1) = ⊗_{i=0}^{k} ( A(i) ⊗ C(σ_0(i + 1)) ) ⊗ W(0)
               ⊕ ⊕_{i=0}^{k} ( ⊗_{j=i+1}^{k} A(j) ⊗ C(σ_0(j + 1)) ) ⊗ B(i),    (14)

with W(0) = x_0. When it comes to queueing networks, we obtain from (14) a closed-form expression for the vector of (k + 1)st waiting/sojourn times in an open queueing network that is initially empty and whose sequence of interarrival times is given by {σ_0(k)}. More precisely, depending on whether we model beginning-of-service or departure times by x(k), W_j(k) models the time the kth arriving customer spends in the system until her/his service at server j starts, or until she/he departs from server j. Equation (14) is called the forward construction of waiting times.
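Numerically, one would iterate (13) rather than expand (14). A small Python sketch (ours; the two-station data are hypothetical): since C(h) is diagonal with −h, multiplying A(k) by C(h) from the right just subtracts h from every finite entry of A(k).

EPS = float('-inf')

def mat_vec(A, x):
    return [max(a + xi for a, xi in zip(row, x)) for row in A]

def waiting_times(As, Bs, sigma0s, W0):
    # iterate (13): W(k+1) = A(k) ⊗ C(sigma_0(k+1)) ⊗ W(k) ⊕ B(k)
    W = list(W0)
    for A, Bv, h in zip(As, Bs, sigma0s):
        AC = [[a - h if a != EPS else EPS for a in row] for row in A]
        W = [max(m, b) for m, b in zip(mat_vec(AC, W), Bv)]
    return W

As = [[[1.0, EPS], [2.5, 1.5]]] * 3   # hypothetical A(k), J = 2
Bs = [[1.0, 2.5]] * 3                 # hypothetical B(k)
print(waiting_times(As, Bs, [2.0, 2.0, 2.0], [0.0, 0.0]))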
Subadditive ergodic theory

Subadditive processes are doubly indexed processes X = {X_mn : m, n ∈ N} satisfying the following conditions:

(S1) If i < j < k, then X_ik ≤ X_ij + X_jk a.s.
(S2) For m ≥ 0, the joint distributions of the process {X_{m+1, n+1} : m < n} are the same as those of {X_mn : m < n}.
(S3) The expected value g_n = E[X_0n] exists and satisfies g_n ≥ −cn for some finite constant c > 0 and all n ≥ 1.

We now state Kingman's subadditive ergodic theorem: if X is a subadditive process (that is, (S1), (S2) and (S3) hold), then the limit

    ξ = lim_{n→∞} X_0n / n

exists almost surely, and E[ξ] = λ, where λ = lim_{n→∞} g_n/n = inf_{n≥1} g_n/n.

Condition (S2), on the shift {X_mn} → {X_{m+1, n+1}}, is a stationarity condition. If all events defined in terms of X that are invariant under this shift have probability zero or one, then X is ergodic. In this case, the limiting random variable ξ is almost surely constant and equal to λ. Note that the limit also holds when expected values are considered.

Lemma 3 Let {A(k)} be a stationary sequence of a.s. regular and integrable matrices in R_max^{J×J}. Then {−‖x_nm‖_min : m > n ≥ 0} and {‖x_nm‖_max : m > n ≥ 0} are subadditive ergodic processes.

Theorem 1 Let {A(k)} be a stationary sequence of a.s. regular, integrable square matrices. Then finite constants λ_top and λ_bot exist such that for all (non-random) finite initial conditions x_0:

    λ_bot := lim_{k→∞} ‖x(k)‖_min / k  ≤  λ_top := lim_{k→∞} ‖x(k)‖_max / k    a.s.

and

    lim_{k→∞} E[‖x(k)‖_min]/k = λ_bot  ≤  lim_{k→∞} E[‖x(k)‖_max]/k = λ_top.

The above limits also hold for random initial conditions provided that the initial condition is a.s. finite and integrable.

The constant λ_top is called the top or maximal Lyapunov exponent of {A(k)}, and λ_bot is called the bottom Lyapunov exponent.
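In practice, λ_top and λ_bot can be estimated by iterating the recurrence and dividing by k, exactly as in Theorem 1. A Monte Carlo sketch (ours, with a hypothetical 2 × 2 fixed-support matrix):

import random

EPS = float('-inf')

def mat_vec(A, x):
    return [max(a + xi for a, xi in zip(row, x)) for row in A]

def sample_matrix():
    # hypothetical i.i.d. random matrix, a.s. regular, with fixed support
    return [[random.uniform(0, 2), EPS],
            [random.uniform(0, 1), random.uniform(0, 3)]]

random.seed(0)
x, k = [0.0, 0.0], 20_000
for _ in range(k):
    x = mat_vec(sample_matrix(), x)
print(max(x) / k, min(x) / k)   # estimates of lambda_top and lambda_bot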
Lecture 3: Stability of Waiting Times

We consider the following situation. An open queueing network with J stations is given such that the vector of departure times from the stations, denoted by x(k), follows the recurrence relation

    x(k + 1) = A(k) ⊗ x(k) ⊕ τ(k + 1) ⊗ B(k),    (15)

with x(0) = u, where τ(k) denotes the time of the kth arrival to the system. We denote by σ_0(k) the kth interarrival time, so that the kth arrival of a customer at the network happens at time

    τ(k) = Σ_{i=1}^{k} σ_0(i),   k ≥ 1,

with τ(0) = 0. Then W_j(k) = x_j(k) − τ(k) denotes the time the kth customer arriving to the system spends in the system until completion of service at server j. The vector of kth sojourn times, denoted by W(k) = (W_1(k), …, W_J(k)), follows the recurrence relation

    W(k + 1) = A(k) ⊗ C(σ_0(k + 1)) ⊗ W(k) ⊕ B(k),   k ≥ 0,

with W(0) = e, where C(h) denotes a diagonal matrix with −h on the diagonal and ε elsewhere. Alternatively, x_j(k) in (15) may model the time of the kth beginning of service at station j. With this interpretation of x(k), W_j(k) defined above represents the time spent by the kth customer arriving to the system until the beginning of her/his service at j. For example, in the G/G/1 queue, W(k) then models the waiting time. In the following we will establish sufficient conditions for W(k) to converge to a unique stationary regime.

The main technical assumptions are:

(W1) For k ∈ Z, let A(k) ∈ R_max^{J×J} be a.s. regular, and assume that the maximal Lyapunov exponent of {A(k)} exists.

(W2) There exists a fixed number α, with 1 ≤ α ≤ J, such that the vector B^α(k) = (B_j(k) : 1 ≤ j ≤ α) has finite elements for any k, and B_j(k) = ε for α < j ≤ J and any k.

(W3) The sequence {(A(k), B^α(k))} is stationary and ergodic, and independent of {τ(k)}, where τ(k) is given by

    τ(k) = Σ_{i=1}^{k} σ(i),   k ≥ 1,

with τ(0) = 0 and {σ(k) : k ∈ Z} a stationary and ergodic sequence of positive random variables with mean ν ∈ (0, ∞).

Provided that {A(k)} is a.s. regular and stationary, integrability of A(k) is a sufficient condition for (W1). In terms of queueing networks, the main restriction imposed by these conditions stems from the non-negativity of the diagonal of A(k). The part of condition (W3) that concerns the arrival stream of the network is, for example, satisfied for Poisson arrival streams.

Theorem 2 Assume that assumptions (W1), (W2) and (W3) are satisfied and denote the maximal Lyapunov exponent of {A(k)} by a. If ν > a, then the sequence

    W(k + 1) = A(k) ⊗ C(σ(k + 1)) ⊗ W(k) ⊕ B(k)

converges with strong coupling to a unique stationary regime W, with

    W = D(0) ⊕ ⊕_{j≥1} C(−τ(−j)) ⊗ D(j),

where D(0) = B ∘ θ^{−1} and

    D(j) = ⊗_{i=1}^{j} A(−i) ⊗ B(−(j + 1)),   j ≥ 1.

It is worth noting that β(w), defined in the proof of Theorem 2, fails to be a stopping time adapted to the natural filtration of {(A(k), B(k)) : k ≥ 0}. More precisely, β(w) is measurable with respect to the σ-field σ((A(k), B(k)) : k ≥ 0) but, in general, {β(w) = m} ∉ σ((A(k), B(k)) : 0 ≤ k ≤ m), for m ∈ N. Due to the max-plus formalism, the proof of Theorem 2 is a rather straightforward extension of the proof of the classical result for the G/G/1 queue.

Lecture 4: Structural Insights (on matrices and graphs)

Matrices and graphs

A directed graph G is a pair (N, D), where N is a finite set of elements called nodes (or vertices) and D ⊂ N × N is a set of ordered pairs of nodes called arcs (or edges). If (i, j) ∈ D, then we say that G contains an arc from i to j; the arc (i, j) is called an incoming arc at j and an outgoing arc at i. A directed graph is also called a digraph in the literature. A directed graph is called weighted if a weight w(i, j) ∈ R is associated with any arc (i, j) ∈ D. From now on we will deal exclusively with weighted directed graphs and will refer to them simply as "graphs".

To any n × n matrix A over R_max a graph can be associated, called the communication graph of A and denoted by G(A). The set of nodes of the graph is given by N(A) = n̲, and a pair (i, j) ∈ n̲ × n̲ is an arc of the graph if a_ji ≠ ε (this is not a typo!); in symbols,

    (i, j) ∈ D(A)  ⇔  a_ji ≠ ε,

where D(A) denotes the set of arcs of the graph.

For any two nodes i, j, a sequence of arcs p = ((i_k, j_k) ∈ D(A) : k ∈ m̲) such that i = i_1, j_k = i_{k+1} for k < m, and j_m = j is called a path from i to j. The path is then said to consist of the nodes i = i_1, i_2, …, i_m, j_m = j and to have length m. The latter will be denoted by |p|_l = m. Further, if i = j, then the path is called a circuit. A circuit p = ((i_1, i_2), (i_2, i_3), …, (i_m, i_1)) is called elementary if, restricted to the circuit, each of its nodes has only one incoming and one outgoing arc or, more formally, if nodes i_k and i_l are different for k ≠ l. A circuit consisting of just one arc, from a node to itself, is also called a self-loop.
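The transpose convention is a common stumbling block, so here is a small check in Python (ours): it recovers the arc set of the matrix of Example 8 below directly from the rule (i, j) ∈ D(A) ⇔ a_ji ≠ ε.

EPS = float('-inf')

def arcs(A):
    # arc set of G(A): (i, j) is an arc iff a_ji != eps (nodes are 1-based)
    n = len(A)
    return sorted((i + 1, j + 1) for i in range(n) for j in range(n)
                  if A[j][i] != EPS)

A = [[EPS, 15, EPS],
     [EPS, EPS, 14],
     [10, EPS, 12]]    # the matrix of Example 8 below
print(arcs(A))         # [(1, 3), (2, 1), (3, 2), (3, 3)]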
A matrix A ∈ R_max^{n×n} is called irreducible if in the communication graph G(A) there is a path from any node to any other node. If a matrix is not irreducible, it is called reducible.

The set of all paths from i to j of length m ≥ 1 is denoted by P(i, j; m). For an arc (i, j) in G(A), the weight of (i, j) is given by a_ji (again, this is not a typo!), and the weight of a path in G(A) is defined as the sum of the weights of all arcs constituting the path. More formally, for p = ((i_1, i_2), (i_2, i_3), …, (i_m, i_{m+1})) ∈ P(i, j; m) with i = i_1 and j = i_{m+1}, the weight of p, denoted by |p|_w, is defined through

    |p|_w = ⊗_{k=1}^{m} a_{i_{k+1} i_k}.

Note that in conventional notation |p|_w = Σ_{k=1}^{m} a_{i_{k+1} i_k}. The average weight of a path p is given by |p|_w / |p|_l. For circuits the notions of weight, length, and average weight are defined as for paths. Also, the phrase circuit mean is used instead of the phrase average circuit weight.

Paths in G(A) can be combined in order to construct a new path. For example, let p = ((i_1, i_2), (i_2, i_3)) and q = ((i_3, i_4), (i_4, i_5)) be two paths in G(A). Then

    p ∘ q = ((i_1, i_2), (i_2, i_3), (i_3, i_4), (i_4, i_5))

is a path in G(A) as well. The operation ∘ is called the concatenation of paths. Clearly, the operation is not commutative, even when both p ∘ q and q ∘ p are defined.

Example 8 Let

    A = ( ε    15   ε  )
        ( ε    ε    14 )
        ( 10   ε    12 ).

[Figure 5: The communication graph of matrix A in Example 8.]

The graph G(A) has node set N(A) = {1, 2, 3} and arc set D(A) = {(1, 3), (3, 2), (2, 1), (3, 3)}. Specifically, G(A) consists of two elementary circuits, namely ρ = ((1, 3), (3, 2), (2, 1)) and θ = (3, 3). The weight of ρ is given by |ρ|_w = a_12 + a_23 + a_31 = 39, and the length of ρ equals |ρ|_l = 3. Circuit θ has weight |θ|_w = a_33 = 12 and is of length 1.

Theorem 3 Let A ∈ R_max^{n×n}. For all k ≥ 1 it holds that

    [A^{⊗k}]_ji = max{ |p|_w : p ∈ P(i, j; k) },

where [A^{⊗k}]_ji = ε in the case where P(i, j; k) is empty, i.e., when no path of length k from i to j exists in G(A).

Definition 2 The cyclicity of a graph G, denoted by σ_G, is defined as follows:
• If G is strongly connected, then its cyclicity equals the greatest common divisor of the lengths of all elementary circuits in G. If G consists of just one node without a self-loop, then its cyclicity is defined to be one.
• If G is not strongly connected, then its cyclicity equals the least common multiple of the cyclicities of all maximal strongly connected subgraphs of G.

Solving Linear Equations

For A ∈ R_max^{n×n}, let

    A^+ := ⊕_{k=1}^{∞} A^{⊗k}.    (16)

The element [A^+]_ij yields the maximal weight of any path from j to i (the value [A^+]_ij = +∞ is possible). Indeed, by definition,

    [A^+]_ij = max{ [A^{⊗k}]_ij : k ≥ 1 },

where [A^{⊗k}]_ij is the maximal weight of a path from j to i of length k; see Theorem 3.

Lemma 4 Let A ∈ R_max^{n×n} be such that any circuit in G(A) has average circuit weight less than or equal to e. Then it holds that

    A^+ = ⊕_{k=1}^{∞} A^{⊗k} = A ⊕ A^{⊗2} ⊕ A^{⊗3} ⊕ ⋯ ⊕ A^{⊗n} ∈ R_max^{n×n}.

Theorem 4 Let A ∈ R_max^{n×n} and b ∈ R_max^n. If the communication graph G(A) has maximal average circuit weight less than or equal to e, then the vector x = A* ⊗ b, with A* := E ⊕ A^+, solves the equation x = (A ⊗ x) ⊕ b. Moreover, if the circuit weights in G(A) are negative, then the solution is unique.
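Theorem 4 translates directly into code. The sketch below (ours) computes A* = E ⊕ A ⊕ ⋯ ⊕ A^{⊗n} by repeated max-plus multiplication and verifies the fixed-point property on a 2 × 2 example with negative circuit weights.

EPS = float('-inf')

def mat_mul(A, B):
    return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def star(A):
    # A* = E ⊕ A ⊕ A^2 ⊕ ... ⊕ A^n (enough terms when circuit means are <= e)
    n = len(A)
    S = [[0.0 if i == j else EPS for j in range(n)] for i in range(n)]   # E
    P = [row[:] for row in S]
    for _ in range(n):
        P = mat_mul(P, A)
        S = [[max(s, p) for s, p in zip(rs, rp)] for rs, rp in zip(S, P)]
    return S

A = [[EPS, -1.0], [-2.0, EPS]]   # the only circuit has mean -1.5 < 0
b = [1.0, 0.0]
x = [max(r + bb for r, bb in zip(row, b)) for row in star(A)]   # x = A* ⊗ b
print(x)   # [1.0, 0.0]
print([max(max(A[i][j] + x[j] for j in range(2)), b[i]) for i in range(2)])
# the same vector again: x = (A ⊗ x) ⊕ b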
Analysis of irreducible matrices (deterministic)

Definition 3 Let A ∈ R_max^{n×n} be a square matrix. If µ ∈ R_max is a scalar and v ∈ R_max^n is a vector that contains at least one finite element such that

    A ⊗ v = µ ⊗ v,

then µ is called an eigenvalue of A and v an eigenvector of A associated with the eigenvalue µ.

Denote the set of elementary circuits of the communication graph of A by C(A).

Theorem 5 Any irreducible matrix A ∈ R_max^{n×n} possesses one and only one eigenvalue. This eigenvalue, denoted by λ(A), is a finite number and equal to the maximal average weight of circuits in G(A), i.e.,

    λ(A) = max_{γ∈C(A)} |γ|_w / |γ|_l.

Theorem 5 is the max-plus analogue of the Perron–Frobenius theorem in conventional linear algebra, which states that an irreducible square nonnegative matrix has a largest eigenvalue that is positive and real, where largest means largest in modulus.

We now state the celebrated cyclicity theorem of max-plus algebra.

Theorem 6 Let A ∈ R_max^{n×n} be an irreducible matrix with eigenvalue λ and cyclicity σ = σ(A). Then there is an N such that

    A^{⊗(k+σ)} = λ^{⊗σ} ⊗ A^{⊗k}

for all k ≥ N.
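Because the maximal circuit mean is attained by an elementary circuit, whose length is at most n, λ(A) can be computed from the diagonals of the first n max-plus powers of A. A brute-force Python sketch (ours), checked against Example 8 above:

EPS = float('-inf')

def mat_mul(A, B):
    return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def eigenvalue(A):
    # lambda(A) = max over 1 <= k <= n and j of [A^{⊗k}]_jj / k
    n, lam, P = len(A), EPS, A
    for k in range(1, n + 1):
        if k > 1:
            P = mat_mul(P, A)
        for j in range(n):
            if P[j][j] != EPS:
                lam = max(lam, P[j][j] / k)
    return lam

A = [[EPS, 15, EPS],
     [EPS, EPS, 14],
     [10, EPS, 12]]
print(eigenvalue(A))   # 13.0 = max(39/3, 12/1), as in Example 8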
Analysis of reducible matrices (deterministic)

Let G = (N, D) denote a graph with node set N and arc set D. For i, j ∈ N, node j is said to be reachable from node i, denoted by iRj, if there exists a path from i to j. A graph G is called strongly connected if for any two nodes i, j ∈ N, node j is reachable from node i. A matrix A ∈ R_max^{n×n} is called irreducible if its communication graph G(A) is strongly connected; if a matrix is not irreducible, it is called reducible.

To better deal with graphs that are not strongly connected, we say for nodes i, j ∈ N that node j communicates with node i, denoted by iCj, if either i = j or there exist both a path from i to j and a path from j to i. Hence,

    iCj  ⇔  i = j or [iRj and jRi].

Note that the relation "communicates with" is an equivalence relation. Indeed, its reflexivity and symmetry follow by definition, and its transitivity follows by the concatenation of paths.

If a graph G = (N, D) is not strongly connected, then not all nodes of N communicate with each other. In this case, given a node, say node i, it is possible to distinguish the subset of nodes that communicate with i from the subset of nodes that do not communicate with i. In the first subset all nodes communicate with each other, whereas in the second subset not all nodes necessarily communicate with each other. In the latter case a further subdivision of the nodes is possible. Repeated application of the previous idea therefore yields that the node set N can be partitioned as N_1 ∪ N_2 ∪ ⋯ ∪ N_q, where N_r, r ∈ q̲, denotes a subset of nodes that communicate with each other but not with other nodes of N. Recall that a partitioning of a set is a division into nonempty subsets such that the joint union is the whole set and the mutual intersections are all empty.

Given the above partitioning of N, it is possible to focus on subgraphs of G, denoted by G_r = (N_r, D_r), r ∈ q̲, where D_r denotes the subset of arcs of D that have both their begin node and their end node in N_r. If D_r ≠ ∅, the subgraph G_r = (N_r, D_r) is known as a maximal strongly connected subgraph (m.s.c.s.) of G = (N, D). By definition, nodes in N_r do not communicate with nodes outside N_r. However, it can happen that iRj for some i ∈ N_r and j ∈ N_{r′} with r ≠ r′, but then the converse (i.e., jRi) does not hold.

We denote by [i] := {j ∈ N : iCj} the set of nodes, containing node i, that communicate with each other. These nodes together form the equivalence class in which i is contained. Hence, given node i ∈ N, there exists an r ∈ q̲ such that i ∈ N_r and [i] = N_r. Note that the above partitioning covers all nodes of N.

If a node of G is contained in one or more circuits, it communicates with certain other nodes, or with itself in case one of the circuits actually is a self-loop. In any case, the arc set of the associated subgraph is not empty. However, if the graph G contains a node, say node i, that is not contained in any circuit of G, then node i does not communicate with other nodes; it communicates only with itself. Then, by definition, node i forms an equivalence class on its own, so that [i] = {i}. Because there does not even exist an arc from i to itself, it follows that the associated subgraph is given by ([i], ∅); i.e., the node set consists of node i only and the arc set is empty. Further, although it is not strongly connected, ([i], ∅) will also be referred to as an m.s.c.s.; this is merely done for convenience. Hence, in the following, all subgraphs G_r = (N_r, D_r), r ∈ q̲, introduced above are referred to as m.s.c.s.'s.

We define the reduced graph, denoted by G̃ = (Ñ, D̃), by Ñ = {[i_1], …, [i_q]} and ([i_r], [i_s]) ∈ D̃ if r ≠ s and there exists an arc (k, l) ∈ D for some k ∈ [i_r] and l ∈ [i_s]. Hence, the number of nodes in the reduced graph is exactly the number of m.s.c.s.'s in the graph. The reduced graph models the interdependency of the m.s.c.s.'s. Note that the reduced graph does not contain circuits. Indeed, if the reduced graph contained a circuit, then two or more m.s.c.s.'s would be connected to each other by means of a circuit, forming a new m.s.c.s. larger than the m.s.c.s.'s it contains; this would contradict the fact that these subgraphs already were maximal and strongly connected.

Let A_rr denote the matrix obtained by restricting A to the nodes in [i_r], for r ∈ q̲, i.e., [A_rr]_kl = a_kl for all k, l ∈ [i_r]. Notice that for all r ∈ q̲ either A_rr is irreducible or A_rr = ε. It is easy to see that, because the reduced graph does not contain any circuits, the original reducible matrix A, possibly after a relabeling of the nodes in G(A), can be written in the form

    A = ( A_11   A_12   ⋯      ⋯     A_1q )
        ( 𝓔      A_22   ⋯      ⋯     A_2q )
        ( 𝓔      𝓔      A_33          ⋮   )
        ( ⋮                    ⋱      ⋮   )
        ( 𝓔      𝓔      ⋯      𝓔     A_qq ),

with matrices A_sr, 1 ≤ s < r ≤ q, of appropriate size. Each finite entry of A_sr corresponds to an arc from a node in [i_r] to a node in [i_s]. The block upper triangular form shown above is said to be a normal form of matrix A. Note that the normal form of a matrix is not unique.

The set of direct predecessors of node i is denoted by π(i); more formally, π(i) := {j ∈ n̲ : (j, i) ∈ D}. Moreover, denote the set of all predecessors of node i by π^+(i) := {j ∈ n̲ : jRi}, and set π*(i) = {i} ∪ π^+(i). In words, π(i) is the set of nodes immediately upstream of i; π^+(i) is the set of all nodes from which node i can be reached; and π*(i) is the set of all nodes from which node i can be reached, including node i itself.

In the same vein, denote the set of direct successors of node i by σ(i) := {j ∈ n̲ : (i, j) ∈ D}; write σ^+(i) := {j ∈ n̲ : iRj} for the set of all successors of node i; and set σ*(i) = {i} ∪ σ^+(i). In words, σ(i) is the set of nodes immediately downstream of i; σ^+(i) is the set of all nodes that can be reached from node i; and σ*(i) is the set of all nodes that can be reached from i, including node i itself.
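The predecessor sets π*(j) are just transitive closures and can be computed in a few lines. A Python sketch (ours), using a Floyd–Warshall-style boolean closure:

EPS = float('-inf')

def predecessor_sets(A):
    # pi*(j) for every node j (1-based); arc (i, j) present iff a_ji != eps
    n = len(A)
    reach = [[A[j][i] != EPS for j in range(n)] for i in range(n)]
    for h in range(n):                     # transitive closure
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][h] and reach[h][j])
    return {j + 1: {j + 1} | {i + 1 for i in range(n) if reach[i][j]}
            for j in range(n)}

A = [[EPS, 15, EPS],
     [EPS, EPS, 14],
     [10, EPS, 12]]
print(predecessor_sets(A))   # strongly connected: pi*(j) = {1, 2, 3} for all j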
Example 9 Let A ∈ R_max^{10×10} be the matrix whose communication graph is shown in Figure 6. [The 10 × 10 matrix A is displayed here; its finite entries correspond one-to-one to the weighted arcs of Figure 6.] The graph G(A) has node set N(A) = {1, 2, 3, …, 10} and arc set

    D(A) = {(2,1), (3,2), (2,3), (4,3), (1,4), (2,5), (6,5), (7,6), (5,7), (7,8), (6,9), (1,10)}.

[Figure 6: The communication graph of matrix A in Example 9.]

Specifically, G(A) contains three elementary circuits,

    ρ = ((2, 3), (3, 2)),   θ = ((4, 3), (3, 2), (2, 1), (1, 4)),   η = ((6, 5), (5, 7), (7, 6)).

The graph is not strongly connected; for example, 2R5, but it does not hold that 5R2. In words, node 5 is reachable from node 2, but the converse is not true. The predecessor and successor sets are given, for node 10, by

    π(10) = {1},   π^+(10) = {1, 2, 3, 4},   π*(10) = {1, 2, 3, 4, 10},

and

    σ(10) = σ^+(10) = ∅,   σ*(10) = {10}.

For node 5 these sets read

    π(5) = {2, 6},   π^+(5) = π*(5) = {1, 2, 3, 4, 5, 6, 7},

and

    σ(5) = {7},   σ^+(5) = σ*(5) = {5, 6, 7, 8, 9}.

There are five m.s.c.s.'s in G(A), with node sets [1] = [2] = [3] = [4] = {1, 2, 3, 4}, [5] = [6] = [7] = {5, 6, 7}, [8] = {8}, [9] = {9}, and [10] = {10}. Because |ρ|_l = 2 and |θ|_l = 4, the m.s.c.s. corresponding to, for instance, [2] has cyclicity 2, being the greatest common divisor of all circuit lengths in [2]. Because |η|_l = 3, the m.s.c.s. corresponding to, say, [5] has cyclicity 3. The other m.s.c.s.'s have cyclicity 1 by definition. Therefore, the graph G(A) has cyclicity 6, being the least common multiple of the cyclicities of all m.s.c.s.'s. Hence, σ_{G(A)} = 6.

[Figure 7: The reduced graph of G(A) in Example 9, with nodes [8], [9], [5], [10], [2].]

Based on the reduced graph, let [i_1] = [8] = {8}, [i_2] = [9] = {9}, [i_3] = [5] = {5, 6, 7}, [i_4] = [10] = {10}, and [i_5] = [2] = {1, 2, 3, 4}. The corresponding diagonal blocks are A_11 = A_22 = A_44 = ε, while A_33 (on the nodes {5, 6, 7}) and A_55 (on the nodes {1, 2, 3, 4}) are irreducible. [The 3 × 3 block A_33 and the 4 × 4 block A_55 are displayed here; their finite entries are the weights of the arcs of the circuit η and of the circuits ρ and θ, respectively.]

If both the rows and columns of A are placed in the order 8, 9, 5, 6, 7, 10, 1, 2, 3, 4, obtained from placing the elements of the sets [i_1], [i_2], [i_3], [i_4], and [i_5] one after another, then a normal form of A results. [The resulting 10 × 10 block upper triangular matrix is displayed here.] In particular, the diagonal blocks of this normal form of A, starting in the upper left corner and going down to the lower right corner, are given by A_11, A_22, A_33, A_44, and A_55, respectively.

Let {x(k) : k ∈ N} be a sequence in R_max^n, and assume that for all j ∈ n̲ the quantity η_j, defined by

    η_j = lim_{k→∞} x_j(k)/k,

exists. The vector η = (η_1, η_2, …, η_n)^⊤ is called the cycle-time vector of the sequence x(k). If all η_j's have the same value, this value is also called the asymptotic growth rate of the sequence x(k).

Theorem 7 Consider the recurrence relation x(k + 1) = A ⊗ x(k), for k ≥ 0, with a square regular matrix A ∈ R_max^{n×n} and an initial condition x(0) = x_0. Let ξ = lim_{k→∞} x(k; x_0)/k be the cycle-time vector of A, and let λ_[i] denote the eigenvalue of the m.s.c.s. block of A associated with the class [i] (with λ_[i] = ε if that block equals 𝓔).

1. For all j ∈ n̲:  ξ_j = ⊕_{i∈π*(j)} λ_[i].

2. For all j ∈ n̲ and any x_0 ∈ R^n:  lim_{k→∞} x_j(k; x_0)/k = ⊕_{i∈π*(j)} λ_[i].
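Theorem 7 can be observed numerically. In the sketch below (ours), node 1 is an m.s.c.s. with eigenvalue 3 and node 2 one with eigenvalue 1; since π*(2) = {1, 2}, both growth rates equal max(3, 1) = 3.

EPS = float('-inf')

def mat_vec(A, x):
    return [max(a + xi for a, xi in zip(row, x)) for row in A]

A = [[3.0, EPS],    # self-loop of weight 3 at node 1
     [0.0, 1.0]]    # node 2: self-loop of weight 1, fed by node 1
x, k = [0.0, 0.0], 2_000
for _ in range(k):
    x = mat_vec(A, x)
print([xi / k for xi in x])   # both components close to 3.0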
Ergodic theorems (stochastic)

Irreducible matrices

We say that A(k) has fixed support if the probability that (A(k))_ij equals ε is either 0 or 1 and does not depend on k. With the definition of fixed support at hand, we say that a random matrix A is irreducible if (a) it has fixed support and (b) it is irreducible with probability one. For random matrices, irreducibility thus implies fixed support.

Theorem 8 Let {A(k)} be a stationary sequence of integrable and irreducible matrices in R_max^{J×J} such that all finite elements are non-negative and all diagonal elements are different from ε. Then a finite constant λ exists such that for any non-random finite initial condition x_0:

    lim_{k→∞} ‖x(k)‖_min / k = lim_{k→∞} ‖x(k)‖_max / k = lim_{k→∞} x_j(k)/k = λ    a.s.    (17)

and

    lim_{k→∞} E[x_j(k)]/k = lim_{k→∞} E[‖x(k)‖_min]/k = lim_{k→∞} E[‖x(k)‖_max]/k = λ,

for 1 ≤ j ≤ J. The above limits also hold for random initial conditions provided that the initial condition is a.s. finite and integrable.

Reducible matrices

Let {A(k)} be a sequence of matrices in R_max^{J×J} with fixed support. If we replace any element of A(k) that is different from ε by e, then the resulting communication graph of A(k), denoted by G^e(A), is independent of k (and thus non-random). Let G^e_r(A) denote the reduced graph of G^e(A). As in the deterministic scenario, we denote by [i] the set of nodes of the m.s.c.s. that contains i. The set of all nodes j such that there exists a path from j to i in G^e(A) is denoted by π^+(i). We denote by λ_top[i] the top Lyapunov exponent associated with the matrix obtained by restricting A(k) to the nodes in [i]. In case i is an isolated node, or a node with only incoming or outgoing arcs, we set λ_top[i] = ε.

Theorem 9 Let {A(k)} be a stationary sequence of integrable matrices in R_max^{J×J} with fixed support such that, with probability one, all finite elements are non-negative and the diagonal elements are different from ε. For any (non-random) finite initial value x_0 it holds that

    lim_{k→∞} x_j(k)/k = λ_j    a.s.,   with   λ_j = ⊕_{i∈π*(j)} λ_top[i],

and

    lim_{k→∞} E[x_j(k)]/k = λ_j,

for 1 ≤ j ≤ J. The above limits also hold for random initial conditions provided that the initial condition is a.s. finite and integrable.

For the proof, we set π*(i) = {i} ∪ π^+(i), and we define the predecessor sets

    [≤ i] = ∪_{j∈π*(i)} [j]   and   [< i] = [≤ i] \ [i].

Beyond fixed support

The real challenge is to establish ergodic results without relying on fixed support. This is ongoing research!

References

[1] Altman, E., B. Gaujal, and A. Hordijk. Discrete-Event Control of Stochastic Networks: Multimodularity and Regularity. Lecture Notes in Mathematics, vol. 1829. Springer, Berlin, 2003.

[2] Baccelli, F., G. Cohen, G.J. Olsder, and J.P. Quadrat. Synchronization and Linearity. John Wiley and Sons, 1992. (This book is out of print and can be accessed via the max-plus web portal at http://maxplus.org.)

[3] Heidergott, B. Max Plus Stochastic Systems and Perturbation Analysis. Springer, New York, 2006.

[4] Heidergott, B., G.J. Olsder, and J. van der Woude. Max Plus at Work: Modeling and Analysis of Synchronized Systems. Princeton University Press, Princeton, 2006.

[5] Le Boudec, J.-Y., and P. Thiran. Network Calculus: A Theory of Deterministic Queueing Systems for the Internet. Lecture Notes in Computer Science, no. 2050. Springer, Berlin, 1998.

[6] Mairesse, J. Products of irreducible random matrices in the (max,+) algebra. Advances in Applied Probability, 29:444–477, 1997.
[7] McEneaney, W. Max-Plus Methods for Nonlinear Control and Estimation. Birkhäuser, Boston, 2006.

Appendix

The shift operator

Many stochastic concepts, such as stationarity or coupling, can be expressed through the shift operator in a very elegant manner. Let (Ω, F, P) be a probability space. We call a mapping θ : Ω → Ω a shift operator if
• the mapping θ is a bijective and measurable mapping from Ω onto itself,
• the law P is left invariant by θ, namely E[X] = E[X ∘ θ] for any measurable and integrable random variable X.

For any n, m ∈ Z, we set θ^n ∘ θ^m = θ^{n+m}. In particular, θ^0 is the identity and (θ^n)^{−1} = θ^{−n}. By convention, the composition operator '∘' has highest priority in all formulae; that is, X ∘ θ Y means (X ∘ θ) Y.

The shift operator allows one to define sequences of random variables. To see this, let X be a measurable mapping defined on (Ω, F) and set X(n, ω) = X(θ^n ω) for n ∈ T ⊂ Z. Because the law P is invariant, the distribution of X(n) is independent of n. This motivates the following definition. We call {X(t) : t ∈ T}, with X(t) an R-valued random variable defined on (Ω, F) and T ⊂ Z, θ-stationary if

    X(t; ω) = X(0, θ^t ω),   ω ∈ Ω,    (18)

for any t. We call a sequence X = {X(t) : t ∈ T} compatible with the shift operator θ if a version of X exists satisfying (18). Moreover, we call X stationary if X is compatible with a shift operator θ so that X is θ-stationary.

The shift θ is called ergodic if

    lim_{n→∞} (1/n) Σ_{k=1}^{n} X ∘ θ^k = E[X]    a.s.,

for any measurable and integrable function X : Ω → R. We call a sequence X = {X(t) : t ∈ T} ergodic if X is compatible with an ergodic shift operator.

An event A ∈ F is called invariant if P(A) = P(θ^t A) for any t, where θ^t A = {θ^t ω : ω ∈ A}. Ergodicity of a shift operator is characterized by Birkhoff's pointwise ergodic theorem: the shift operator θ is ergodic if (and only if) the only events in F that are invariant are Ω and ∅.

Let X = {X(t) : t ∈ T} be a sequence of random elements on a state space S. For m ≥ 1, let α ∈ S^m be a sequence of states such that (X(t + m − 1), X(t + m − 2), …, X(t)) = α with positive probability. Define the sequence of hitting times of X on α as follows:

    T_0 = inf{ t ≥ 0 : (X(t + m − 1), X(t + m − 2), …, X(t)) = α }

and, for k ≥ 0,

    T_{k+1} = inf{ t > T_k + m : (X(t + m − 1), X(t + m − 2), …, X(t)) = α }.

Result: If X is a stationary and ergodic sequence compatible with the shift operator θ, then it holds that (i) T_k < ∞ with probability one for all k, and (ii) lim_{k→∞} T_k = ∞ with probability one.

Coupling Convergence

We say that there is coupling convergence in finite time (or, merely, coupling) of a sequence {X_n} to a stationary sequence {Y ∘ θ^n} if

    lim_{n→∞} P( ∀k : X_{n+k} = Y ∘ θ^{n+k} ) = 1,

or, equivalently, there exists an a.s. finite random variable N such that

    X_{N+k} = Y ∘ θ^{N+k},   k ≥ 0.

Result: Coupling (convergence) implies total variation convergence.

Result: Convergence in total variation implies convergence in distribution (or weak convergence), but the converse is not true.

Strong Coupling Convergence and Goldstein's Maximal Coupling

We say that there is strong coupling convergence in finite time (or, merely, strong coupling) of a sequence {X_n} to a stationary sequence {Y ∘ θ^n} if

    N′ = inf{ n ≥ 0 | ∀k ≥ 0 : X_{n+k} ∘ θ^{−n−k} = Y }

is finite with probability one.

Result: Strong coupling convergence implies coupling convergence, but the converse is not true.
We illustrate this with the following example. Let ξ_m, with ξ_m ∈ Z and E[ξ_1] = ∞, be an i.i.d. sequence and define X_n, for n ≥ 1, by

    X_n = ξ_0          for X_{n−1} = 0,
    X_n = X_{n−1} − 1   for X_{n−1} ≥ 2,
    X_n = 1            for X_{n−1} = 1,

where X_0 = 0. It is easily checked that {X_n} couples with the constant sequence 1 after ξ_0 − 1 transitions. To see that {X_n} fails to converge in strong coupling, observe that the shift operator applies to the 'stochastic noise' ξ_m as well. Specifically, for k ≥ 0,

    X_n ∘ θ^{−k} = ξ_0 ∘ θ^{−k}             for X_{n−1} ∘ θ^{−k} = 0,
    X_n ∘ θ^{−k} = X_{n−1} ∘ θ^{−k} − 1     for X_{n−1} ∘ θ^{−k} ≥ 2,
    X_n ∘ θ^{−k} = 1                        for X_{n−1} ∘ θ^{−k} = 1,

where X_0 ∘ θ^{−k} = 0 and ξ_{−k} = ξ_0 ∘ θ^{−k}. This implies

    N_0 = inf{ n ≥ 0 | ∀k ≥ 0 : X_{n+k} ∘ θ^{−n−k} = 1 }
        = inf{ n ≥ 0 | ∀k ≥ 0 : ξ_{−(n+k)} ≤ n + k }
        = ∞    a.s.

Result (Goldstein's maximal coupling): Let {X_n} and Y be defined on a Polish state space. If {X(n)} converges with coupling to Y, then a version {X̃(n)} of {X(n)} and a version Ỹ of Y defined on the same probability space exist such that {X̃(n)} converges with strong coupling to Ỹ.
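The counterexample above is easy to simulate. The sketch below (ours) draws a stand-in value for ξ_0 — the actual argument needs a heavy-tailed ξ with E[ξ_1] = ∞ — and shows the countdown coupling with the constant sequence 1.

import random

def chain(xi0, n):
    # X_0 = 0; X_n = xi_0 if X_{n-1} = 0, X_{n-1} - 1 if X_{n-1} >= 2, else 1
    X, path = 0, []
    for _ in range(n):
        X = xi0 if X == 0 else (X - 1 if X >= 2 else 1)
        path.append(X)
    return path

random.seed(3)
xi0 = random.randint(2, 8)   # stand-in draw; heavy tails break strong coupling
print(chain(xi0, 10))        # xi0, xi0 - 1, ..., 2, 1, 1, 1, ...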