
ALMOST SURE CONVERGENCE
OF
RANDOM GOSSIP ALGORITHMS
Giorgio Picci
with T. Taylor, ASU Tempe AZ.
Wolfgang Runggaldier’s Birthday, Brixen, July 2007
CONSENSUS FOR RANDOM GOSSIP
ALGORITHMS
Consider a finite set of nodes, representing, say, wireless sensors or distributed computing units. Can they achieve a common goal by exchanging information only locally?
The nodes exchange information locally for the purpose of forming a common estimate of some physical variable x.
Each node k forms its own estimate xk(t), t ∈ Z+, and updates it by exchanging information with a neighbor.
Neighboring pairs are chosen randomly.
Q: will all local estimates {xk(t), k = 1, . . . , n} converge to the same value as t → ∞?
DYNAMICS OF RANDOM GOSSIP
ALGORITHMS
While two nodes vi and vj are in communication, they exchange information to refine their own estimates using the neighbor’s estimate.
Model this adjustment in discrete time by a simple symmetric linear relation
xi(t + 1) = xi(t) + p(xj(t) − xi(t))
xj(t + 1) = xj(t) + p(xi(t) − xj(t))
where p is a positive gain parameter modeling the speed of adjustment. For stability we need |1 − 2p| ≤ 1, hence 0 ≤ p ≤ 1.
For p = 1/2 one takes the average of the two measurements, so that xi(t + 1) = xj(t + 1).
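As a quick illustration of the pairwise update above (a minimal sketch, not part of the original slides; the function name is ad hoc), note that the update preserves the sum xi + xj, and hence the network-wide average:

```python
def gossip_update(x, i, j, p=0.5):
    """Symmetric pairwise gossip update with gain 0 <= p <= 1."""
    xi, xj = x[i], x[j]            # read both values before overwriting
    x[i] = xi + p * (xj - xi)
    x[j] = xj + p * (xi - xj)

x = [1.0, 5.0, 9.0]
gossip_update(x, 0, 2)             # p = 1/2: nodes 0 and 2 average
# x is now [5.0, 5.0, 5.0]; the sum (hence the mean) is unchanged
```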
DYNAMICS OF RANDOM GOSSIP
ALGORITHMS
The whole coordinate vector x(t) ∈ Rn evolves according to x(t + 1) = A(e)x(t), the matrix A(e) ∈ Rn×n depending on the edge e = vivj selected at that particular time instant. A(e) coincides with the identity except on the rows and columns indexed by i and j, where the diagonal entries are 1 − p and the (i, j) and (j, i) entries are p; compactly,

A(e) = In − p (1vi − 1vj)(1vi − 1vj)^T
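A numerical check of this rank-one formula (an illustrative sketch, not from the slides; assumes NumPy, and the helper name is ad hoc): the resulting matrix is doubly stochastic and reproduces the pairwise update coordinatewise.

```python
import numpy as np

def gossip_matrix(n, i, j, p=0.5):
    """A(e) = I_n - p (1_vi - 1_vj)(1_vi - 1_vj)^T for the edge e = (vi, vj)."""
    d = np.zeros(n)
    d[i], d[j] = 1.0, -1.0
    return np.eye(n) - p * np.outer(d, d)

A = gossip_matrix(4, 1, 3, 0.3)
# doubly stochastic: every row and every column sums to 1
assert np.allclose(A.sum(axis=0), 1.0) and np.allclose(A.sum(axis=1), 1.0)

x = np.array([2.0, 10.0, -1.0, 4.0])
y = A @ x
# rows i and j implement x_i + p (x_j - x_i); all other coordinates are untouched
assert np.isclose(y[1], x[1] + 0.3 * (x[3] - x[1]))
assert np.allclose(y[[0, 2]], x[[0, 2]])
```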
EIGENSPACES
The vector 1vi has the ith entry equal to 1 and all other entries zero.
A(e) is a symmetric doubly stochastic matrix. The value 1 − 2p is a simple eigenvalue associated with the eigenvector (1vi − 1vj):
A(e)(1vi − 1vj) = (1vi − 1vj) − p ‖1vi − 1vj‖² (1vi − 1vj) = (1 − 2p)(1vi − 1vj)
since ‖1vi − 1vj‖² = 2. The orthogonal (codimension-one) subspace (1vi − 1vj)⊥ is the eigenspace of the eigenvalue 1.
Let 1 := [1, . . . , 1]^T. We want x(t) to converge to the subspace {1} := {α1 ; α ∈ R}. This would automatically be true for a fixed irreducible doubly stochastic matrix.
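The eigenstructure stated above can be verified numerically (an illustrative sketch, assuming NumPy; not part of the original slides):

```python
import numpy as np

n, i, j, p = 4, 1, 3, 0.3
d = np.zeros(n)
d[i], d[j] = 1.0, -1.0
A = np.eye(n) - p * np.outer(d, d)

# simple eigenvalue 1 - 2p on span{1_vi - 1_vj}, eigenvalue 1 on its orthogonal complement
eigvals = np.sort(np.linalg.eigvalsh(A))
assert np.isclose(eigvals[0], 1 - 2 * p)
assert np.allclose(eigvals[1:], 1.0)

v = np.array([1.0, 2.0, -3.0, 2.0])   # v is orthogonal to 1_vi - 1_vj
assert np.allclose(A @ v, v)          # hence fixed by A(e)
```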
A CONTROLLABILITY LEMMA
Lemma 1 Let G = (V, E) be a graph. Then the edge vectors 1vi − 1vj span {1}⊥, i.e.
span {1vi − 1vj : (vivj) ∈ E} = 1⊥
iff G is connected.
Corollary 1 Let G′ = (V, E′) with E′ ⊆ E be a subgraph of G. Let {ei : 1 ≤ i ≤ m′} be an ordering of E′, and let π denote a permutation of {1, 2, · · · , m′}. Let B(E′, π) = ∏_{i=1}^{m′} A(eπi), where the product is ordered from right to left. Then
‖B(E′, π)|_{1⊥}‖ < 1
if and only if G′ is connected.
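The corollary can be illustrated numerically (a sketch under the assumption of uniform gain p = 1/2 and iid constructions; helper names are ad hoc): the ordered product over a connected edge set is a strict contraction on 1⊥, while over a disconnected edge set it is not.

```python
import numpy as np

def gossip_matrix(n, i, j, p=0.5):
    d = np.zeros(n)
    d[i], d[j] = 1.0, -1.0
    return np.eye(n) - p * np.outer(d, d)

def norm_on_1perp(B):
    """Operator norm of B restricted to the subspace orthogonal to 1."""
    n = B.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n   # orthogonal projector onto 1-perp
    return np.linalg.norm(P @ B @ P, 2)

n = 4
connected = [(0, 1), (1, 2), (2, 3)]      # a spanning tree: connected
disconnected = [(0, 1), (2, 3)]           # two components

for edges, expect_contracting in [(connected, True), (disconnected, False)]:
    B = np.eye(n)
    for (i, j) in edges:
        B = gossip_matrix(n, i, j) @ B    # product ordered from right to left
    contracting = norm_on_1perp(B) < 1 - 1e-9
    assert contracting == expect_contracting
```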
THE EDGE PROCESS
Let Ω = E^N be the space of all semi-infinite sequences taking values in E, and let σ : Ω → Ω denote the shift map: σ(e0, e1, e2, · · · , en, · · · ) = (e1, e2, · · · , en, · · · ). Let evk : Ω → E denote the evaluation on the kth term.
Let µ denote an ergodic shift-invariant probability measure on Ω, so that the edge process e(k) : ω ↦ evk(ω) is ergodic.
Special cases: e(k) is iid, or an ergodic Markov chain. However, what we
shall do works for general ergodic processes.
Consider the function
C : Ω × Z → Rn×n,   C(ω, t) := ∏_{i=0}^{t−1} A(evi(ω)) = ∏_{i=0}^{t−1} A(ev0(σ^i ω)),
which by stationarity of e obeys the composition rule C(ω, t + s) = C(σ^t ω, s)C(ω, t), with C(ω, 0) = I. Such a function is called a matrix cocycle.
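To make the composition rule concrete, here is an illustrative sketch (iid uniform edge draws and ad hoc names, assuming NumPy): the cocycle over t + s steps factors through the shifted sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 0.5
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]

def A_of(e):
    i, j = e
    d = np.zeros(n)
    d[i], d[j] = 1.0, -1.0
    return np.eye(n) - p * np.outer(d, d)

# one sample path omega: 20 iid uniform edge draws
omega = [edges[k] for k in rng.integers(0, len(edges), size=20)]

def C(omega, t):
    """Cocycle C(omega, t) = A(e_{t-1}) ... A(e_1) A(e_0), with C(omega, 0) = I."""
    B = np.eye(n)
    for e in omega[:t]:
        B = A_of(e) @ B
    return B

# composition rule: C(omega, t + s) = C(sigma^t omega, s) C(omega, t)
t, s = 7, 5
assert np.allclose(C(omega, t + s), C(omega[t:], s) @ C(omega, t))
```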
MULTIPLICATIVE ERGODIC THEOREM
Theorem 1 [Oseledets’ Multiplicative Ergodic Theorem] Let µ be a shift-invariant probability measure on Ω and suppose that the shift map σ : Ω → Ω is ergodic and that log+ ‖C(ω, t)‖ is in L1. Then the limit
Λ = lim_{t→∞} [C(ω, t)^T C(ω, t)]^{1/(2t)}   (1)
exists with probability one, is symmetric and nonnegative definite, and is µ-a.s. independent of ω. Let λ1 < λ2 < · · · < λk, k ≤ n, be the distinct eigenvalues of Λ, let Ui denote the eigenspace of λi, and let Vi = ⊕_{j=1}^{i} Uj. Then for u ∈ Vi − Vi−1,
lim_{t→∞} (1/t) log ‖C(ω, t)u‖ = log(λi).   (2)
The numbers λi are called the Lyapunov exponents of C.
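The limit (2) suggests a direct numerical estimate of the top exponent on 1⊥ (an illustrative sketch, not from the slides: iid uniform edges on a spanning tree, gain p = 0.3 chosen so that no single step annihilates the vector; renormalizing at every step avoids floating-point underflow).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 0.3
edges = [(0, 1), (1, 2), (2, 3)]   # a spanning tree, so G is connected

def A_of(i, j):
    d = np.zeros(n)
    d[i], d[j] = 1.0, -1.0
    return np.eye(n) - p * np.outer(d, d)

# estimate (1/t) log ||C(omega, t) u|| along one sample path, for u in 1-perp
t = 2000
u = np.array([1.0, -1.0, 0.0, 0.0])
log_norm = 0.0
for _ in range(t):
    i, j = edges[rng.integers(len(edges))]
    u = A_of(i, j) @ u
    nrm = np.linalg.norm(u)
    log_norm += np.log(nrm)    # accumulate the log, then renormalize
    u /= nrm

top_exponent_on_1perp = log_norm / t
assert top_exponent_on_1perp < 0   # strict contraction on 1-perp: consensus
```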
MULTIPLICATIVE ERGODIC THEOREM
The Lyapunov exponents control the exponential rate of convergence (or
non-convergence) to consensus.
The matrices A(e) are doubly stochastic, as is any product of them, in particular C(ω, t). It follows that the constant functions on V, {1}, as well as the mean-zero functions in {1}⊥, are invariant under the action of this cocycle and of its transpose. Thus these subspaces are also invariant under the limiting matrix Λ of Oseledets’ theorem.
There is a Lyapunov exponent associated with the subspace {1} which, it
is not difficult to see, is one.
There are n − 1 Lyapunov exponents associated with the subspace 1⊥, so
the key point is to characterize them.
CONVERGENCE TO CONSENSUS
For x ∈ Rn use the symbol x̄ := (1/n) ∑_{i=1}^{n} xi. The main convergence result follows.
Theorem 2 Let G = (V, E) be a connected graph and let e(t) be an ergodic stochastic process taking values in E. Suppose that the support of the probability distribution induced by e(t) is all of E. Let the gossip algorithm be initialized at x(0) = x0. Then there is a (deterministic) constant |λ| < 1 and a (random) constant Kλ such that
‖x(t) − x̄0 1‖ < Kλ λ^t ‖x0 − x̄0 1‖
µ-almost surely.
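Theorem 2 can be illustrated by simulation (a sketch under ad hoc assumptions: iid uniform edge selection on a connected cycle, gain p = 0.3, 800 steps): the error from the initial average decays geometrically, almost surely.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 0.3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # a connected cycle graph

x = rng.normal(size=n)             # initial local estimates x0
target = x.mean()                  # x-bar_0: each pairwise update preserves the mean
err0 = np.linalg.norm(x - target)

for t in range(800):
    i, j = edges[rng.integers(len(edges))]         # iid uniform edge selection
    xi, xj = x[i], x[j]
    x[i] = xi + p * (xj - xi)
    x[j] = xj + p * (xi - xj)

assert np.isclose(x.mean(), target)                # the average is invariant
assert np.linalg.norm(x - target) < 1e-6 * err0    # geometric decay to consensus
```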
OPEN QUESTIONS
• Rate of convergence (for L2...)
• Multiple gossiping: more than one pair of communicating nodes (edges) per time slot.
• Convergence is merely associated with the time T it takes the algorithm to visit a spanning tree with positive probability. Indeed, the actual rate of convergence of the algorithm is determined by T.
• Much remains to be done !!!
REFERENCES
W. Runggaldier (circa 1970): STILLE WASSER GRÜNDEN TIEF (“still waters run deep”), unpublished (although well known among specialists).