Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466
Selected Topics in
Probability and Stochastic Processes
Steve Dunbar
Fastest Mixing Markov Chain
Rating
Mathematicians Only: prolonged scenes of intense rigor.
Question of the Day
What is the stationary distribution of a Markov chain? If a Markov chain
is symmetric, that is, represented by a symmetric matrix, then what is the
specific stationary distribution? What determines the rate at which the
Markov chain approaches the stationary distribution?
Key Concepts
1. The asymptotic rate of convergence of the Markov chain to the stationary distribution depends on the second-largest eigenvalue modulus of the transition probability matrix $P$, called the mixing rate and denoted by $\mu$ or $\mu(P)$.

2. If $P$ is an $n \times n$ symmetric stochastic matrix, then
$$\mu(P) = \|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2$$
where $\|\cdot\|_2$ denotes the spectral norm.

3. The eigenvalues and eigenvectors of the tridiagonal matrix
$$P_0 = \begin{pmatrix} 1/2 & 1/2 & & & \\ 1/2 & 0 & 1/2 & & \\ & \ddots & \ddots & \ddots & \\ & & 1/2 & 0 & 1/2 \\ & & & 1/2 & 1/2 \end{pmatrix}$$
can be determined by solving a recursive system of equations. The eigenvalues are $\lambda_j = \cos\frac{(j-1)\pi}{n}$ for $j = 1, \dots, n$. In particular, the largest eigenvalue is $1$.

4. The mixing rate $\mu(P_0) = \cos(\pi/n)$ for $P_0$ is the smallest among all symmetric stochastic tridiagonal matrices.
Vocabulary
1. A matrix for which the row sums $\sum_j P_{ij} = 1$ is a stochastic matrix.

2. The second-largest eigenvalue modulus of $P$ is called the mixing rate; it governs the asymptotic rate of convergence of the Markov chain to the stationary distribution.
Mathematical Ideas
This section is a survey, review, analysis, and in-depth investigation of the article “Fastest Mixing Markov Chain on a Path” by Stephen Boyd, Persi Diaconis, Jun Sun, and Lin Xiao, The American Mathematical Monthly, Volume 113, Number 1, January 2006, pages 70–74, [1].
Introduction to Fastest Mixing Markov Chain
This article considers the problem of assigning transition probabilities to the edges of a graph in such a way that the resulting Markov chain mixes as rapidly as possible. The problem is specialized in that the graph is a path, so each transition is either from a vertex to itself or to a nearest neighbor. The article proves that the fastest mixing is obtained when each edge has a transition probability of 1/2. This result is intuitive.
Figure 1: A graph with transition probabilities
Consider a graph with $n \ge 2$ vertices, labeled $1, 2, \dots, n$, with $n - 1$ edges connecting adjacent vertices and with a loop at each vertex, as shown in Figure 1. Consider the Markov chain, that is to say the random walk, on this graph, with transition probability from vertex $i$ to vertex $j$ denoted $P_{ij}$. The requirement that transitions can occur only on an edge or loop of the graph is equivalent to $P_{ij} = 0$ when $|i - j| > 1$. Thus $P$ is a tridiagonal matrix. Since the $P_{ij}$ are transition probabilities, we have $P_{ij} \ge 0$ and $\sum_j P_{ij} = 1$. Since the row sums are 1, we say that $P$ is a stochastic matrix. In matrix-vector terms, we can write
$$P\mathbf{1} = \mathbf{1} \tag{1}$$
where $\mathbf{1}$ is the $n \times 1$ vector whose entries are all 1. Therefore, $1$ is an eigenvalue of the matrix $P$.
Furthermore, the article requires that the transition probabilities be symmetric, so that $P_{ij} = P_{ji}$. This means that $P$ is a symmetric, doubly stochastic, tridiagonal matrix. Since $P\mathbf{1} = \mathbf{1}$, it follows that $(1/n)\mathbf{1}^T P = (1/n)\mathbf{1}^T$, so the uniform distribution is a stationary distribution for the probability transition matrix, that is, for the Markov chain.
The asymptotic rate of convergence of the Markov chain to the stationary distribution depends on the second-largest eigenvalue modulus of $P$, called the mixing rate. The mixing rate is denoted by $\mu(P)$:
$$\mu(P) = \max_{i=2,\dots,n} |\lambda_i(P)|.$$
See below for more information and proofs. The smaller $\mu(P)$ is, the faster the Markov chain converges to its stationary distribution.
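For instance, for the path on $n = 3$ vertices, Lemma 3 below shows that the matrix $P_0$ has eigenvalues $1$, $\cos(\pi/3) = 1/2$, and $\cos(2\pi/3) = -1/2$, so $\mu(P_0) = 1/2$ and the distance to the stationary distribution asymptotically shrinks by about half at each step.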
Motivation and Application
The following is an application and motivation for this Markov chain. A processor is located at each vertex of the graph. Each edge represents a direct network connection between the adjacent processors. The processor could be a computer in a network or it could be a human worker, such as a line of barbers or assemblers in a workshop. Each processor has a job load or queue to finish; say processor $i$ has load $q_i(t)$ at time $t$, where $q_i(t)$ is a positive real number. At each step, the goal is to shift loads across the edges in such a way as to balance the load. The shifting is done before any processing or work begins, so the total amount of work to be done is constant. More precisely, we would like $q_i(t) \to \bar{q} = (1/n)\sum_i q_i(0)$ as $t \to \infty$. Moreover, we would like this balancing to take place as fast as possible. We intend to show that the balance can be accomplished fastest by shifting one-half of the load imbalance across each edge from the more loaded to the less loaded processor.
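As a concrete sketch of this balancing scheme (in Python with NumPy; the initial loads are hypothetical values chosen for illustration), repeatedly applying the matrix $P_0$ studied below drives the loads to their common average:

    import numpy as np

    def p0(n):
        # n x n matrix with 1/2 on the sub- and superdiagonals
        # and 1/2 in the two corner diagonal entries
        P = 0.5 * (np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))
        P[0, 0] = P[n - 1, n - 1] = 0.5
        return P

    n = 5
    P = p0(n)
    q = np.array([8.0, 1.0, 3.0, 0.0, 3.0])  # hypothetical initial loads
    for t in range(50):
        q = P @ q    # shift half of each edge imbalance per step
    print(q)         # every entry approaches the average load, 3.0

The total load is conserved at every step because $P_0$ is doubly stochastic, and the convergence factor per step is the mixing rate $\mu(P_0) = \cos(\pi/5) \approx 0.809$.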
Proofs about Fastest Mixing
Lemma 1. If $P$ is an $n \times n$ symmetric stochastic matrix, then
$$\mu(P) = \|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2$$
where $\|\cdot\|_2$ denotes the spectral norm.
Remark 1. Recall that the spectral norm, also called the operator norm, is the natural norm of a matrix induced by the $L^2$ or Euclidean vector norm. The spectral norm is also the maximum singular value of the matrix, that is, the square root of the maximum eigenvalue of $A^H A$, where $A^H$ is the conjugate transpose.
Proof. Note that $\mathbf{1}$ is the eigenvector of $P$ associated with the eigenvalue $\lambda = 1$ by equation (1). Also
$$(1/n)\mathbf{1}\mathbf{1}^T \mathbf{1} = \mathbf{1}.$$
Let $u^{(2)}, \dots, u^{(n)}$ be the other $n - 1$ eigenvectors of $P$ corresponding to the eigenvalues $\lambda_2, \dots, \lambda_n$. Because the eigenvectors are orthogonal, taking the inner product of $u^{(j)}$ with the eigenvector $\mathbf{1}$ shows that $\sum_{i=1}^n u_i^{(j)} = 0$ for $j = 2, \dots, n$. Therefore
$$(1/n)\mathbf{1}\mathbf{1}^T u^{(j)} = 0.$$
Then the eigenvalues of $P - (1/n)\mathbf{1}\mathbf{1}^T$ are $\lambda_1 = 0, \lambda_2, \dots, \lambda_n$ with eigenvectors $\mathbf{1}, u^{(2)}, \dots, u^{(n)}$ respectively. Since $P - (1/n)\mathbf{1}\mathbf{1}^T$ is symmetric, it is orthogonally similar to a diagonal matrix, and its spectral norm is equal to the maximum magnitude of its eigenvalues, i.e. $\max\{|\lambda_2|, \dots, |\lambda_n|\}$, which is $\mu(P)$.
Remark 2. Note that the article has a slight mistake here, since the article asserts that $\mu(P) = \max\{\lambda_2, -\lambda_n\}$. This implies that $\lambda_2 > 0 > \lambda_n$, which is not always true, although it is true for the matrices considered here.
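A quick numerical sanity check of Lemma 1 (a sketch in Python with NumPy; the random tridiagonal chain is an arbitrary choice, not from the article) compares the second-largest eigenvalue modulus with the spectral norm:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    # random symmetric stochastic tridiagonal matrix: edge weights in
    # (0, 0.5) keep the diagonal entries nonnegative
    w = rng.uniform(0.0, 0.5, n - 1)
    P = np.diag(w, 1) + np.diag(w, -1)
    P += np.diag(1.0 - P.sum(axis=1))      # fix the row sums to 1

    eigs = np.sort(np.linalg.eigvalsh(P))  # ascending; eigs[-1] is 1
    mu = max(abs(eigs[0]), abs(eigs[-2]))  # second-largest modulus
    norm = np.linalg.norm(P - np.ones((n, n)) / n, 2)
    print(mu, norm)                        # agree to machine precision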
Lemma 2. If $P$ is an $n \times n$ symmetric stochastic matrix and if $y$ and $z$ in $\mathbb{R}^n$ satisfy
$$\mathbf{1}^T y = 0 \tag{2}$$
$$\|y\|_2 = 1 \tag{3}$$
$$(z_i + z_j)/2 \le y_i y_j \quad \text{for } i, j \text{ with } P_{ij} \ne 0 \tag{4}$$
then $\mu(P) \ge \mathbf{1}^T z$.
Proof. Let the eigenvectors of $P$ be $\{u^{(1)} = \mathbf{1}, u^{(2)}, \dots, u^{(n)}\}$, and by the Principal Axes Theorem, we may take these to be an orthogonal basis of $\mathbb{R}^n$ with $u^{(2)}, \dots, u^{(n)}$ orthonormal. Let $y$ be as specified in the hypotheses (2) and (3) and let
$$y = \sum_{i=1}^n \alpha_i u^{(i)}.$$
Hypothesis (2) forces $\alpha_1 = 0$, and by hypothesis (3), $\|y\|_2 = \sqrt{\sum_{i=1}^n \alpha_i^2} = 1$. By using the orthogonality of $\mathbf{1}$ and $u^{(i)}$ for $i = 2, \dots, n$,
$$\left(P - (1/n)\mathbf{1}\mathbf{1}^T\right)\left(\sum_{i=1}^n \alpha_i u^{(i)}\right) = \sum_{i=1}^n \alpha_i \lambda_i u^{(i)} = \sum_{i=2}^n \alpha_i \lambda_i u^{(i)}.$$
The last sum on the right side intentionally starts at $2$ because the first eigenvalue of $P - (1/n)\mathbf{1}\mathbf{1}^T$ is $0$.
By Lemma 1,
$$\mu(P) = \|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2.$$
By definition of the spectral norm,
$$\|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2 = \max_{\|w\|=1} \|(P - (1/n)\mathbf{1}\mathbf{1}^T)w\|_2.$$
Specializing to the vector $y$ with $\|y\|_2 = 1$ and using the definition of the 2-norm,
$$\begin{aligned}
\max_{\|w\|=1} \|(P - (1/n)\mathbf{1}\mathbf{1}^T)w\|_2 &\ge \|(P - (1/n)\mathbf{1}\mathbf{1}^T)y\|_2 \\
&= \sqrt{y^T (P - (1/n)\mathbf{1}\mathbf{1}^T)^T (P - (1/n)\mathbf{1}\mathbf{1}^T) y} \\
&= \sqrt{\left(\sum_{i=1}^n \alpha_i u^{(i)}\right)^{\!T} (P - (1/n)\mathbf{1}\mathbf{1}^T)^T (P - (1/n)\mathbf{1}\mathbf{1}^T) \left(\sum_{i=1}^n \alpha_i u^{(i)}\right)} \\
&= \sqrt{\left(\sum_{i=1}^n \alpha_i \lambda_i u^{(i)}\right)^{\!T} \left(\sum_{i=1}^n \alpha_i \lambda_i u^{(i)}\right)} \\
&= \sqrt{\sum_{i=1}^n \alpha_i^2 \lambda_i^2}.
\end{aligned}$$
Apply Jensen's Inequality for the concave down function $\sqrt{\cdot}$ to the convex combination defined by the $\alpha_i^2$, which satisfy $\sum_{i=1}^n \alpha_i^2 = 1$:
$$\sqrt{\sum_{i=1}^n \alpha_i^2 \lambda_i^2} \ge \sum_{i=1}^n \alpha_i^2 |\lambda_i| \ge \sum_{i=1}^n \alpha_i^2 \lambda_i.$$
Now unwind the expression back into vector notation, again using the orthogonality of the eigenvectors:
$$\sum_{i=1}^n \alpha_i^2 \lambda_i = \left(\sum_{i=1}^n \alpha_i u^{(i)}\right)^{\!T} (P - (1/n)\mathbf{1}\mathbf{1}^T) \left(\sum_{i=1}^n \alpha_i u^{(i)}\right) = y^T (P - (1/n)\mathbf{1}\mathbf{1}^T) y = y^T P y = \sum_{i,j} P_{ij} y_i y_j,$$
where $y^T (1/n)\mathbf{1}\mathbf{1}^T y = 0$ by hypothesis (2).
Now use hypothesis (4):
$$\sum_{i,j} P_{ij} y_i y_j \ge \sum_{i,j} \frac{1}{2}(z_i + z_j) P_{ij} = \frac{1}{2}\left(z^T P \mathbf{1} + \mathbf{1}^T P z\right) = \mathbf{1}^T z,$$
since $P\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^T P = \mathbf{1}^T$. This establishes the lemma.
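The heart of the proof is the intermediate bound $\mu(P) \ge y^T P y$ for any unit vector $y$ orthogonal to $\mathbf{1}$. A numerical spot-check of that bound (a sketch; the random chain and test vectors are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    w = rng.uniform(0.0, 0.5, n - 1)   # random symmetric tridiagonal chain
    P = np.diag(w, 1) + np.diag(w, -1)
    P += np.diag(1.0 - P.sum(axis=1))

    eigs = np.sort(np.linalg.eigvalsh(P))
    mu = max(abs(eigs[0]), abs(eigs[-2]))

    for _ in range(1000):
        y = rng.normal(size=n)
        y -= y.mean()                  # enforce 1^T y = 0
        y /= np.linalg.norm(y)         # enforce ||y||_2 = 1
        assert mu >= y @ P @ y - 1e-12
    print("bound verified")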
Lemma 3. The eigenvalues of the tridiagonal matrix
$$P_0 = \begin{pmatrix} 1/2 & 1/2 & 0 & \cdots & 0 \\ 1/2 & 0 & 1/2 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1/2 & 0 & 1/2 \\ 0 & \cdots & 0 & 1/2 & 1/2 \end{pmatrix}$$
are $\lambda_1 = 1$ and $\lambda_j = \cos\frac{(j-1)\pi}{n}$ for $j = 2, \dots, n$.
Remark 3. The following proof is adapted from Feller [2, Section XVI.2, pages 388–391]. In fact, Feller finds the eigenvalues and eigenvectors for the more general tridiagonal matrix
$$P = \begin{pmatrix} q & p & 0 & \cdots & 0 \\ q & 0 & p & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & q & 0 & p \\ 0 & \cdots & 0 & q & p \end{pmatrix}.$$
Proof. The proof proceeds by directly finding the solution of the linear system $P_0 u = \lambda u$. The problem is treated as the $(n-2) \times (n-2)$ linear system defined by equations $2, \dots, n-1$ in the variables $u_2, \dots, u_{n-1}$, using the first and last equations as boundary conditions which determine the values of $\lambda$ that permit a nontrivial solution. The equations are
$$\lambda u_1 = (1/2)u_1 + (1/2)u_2 \tag{5}$$
$$\lambda u_j = (1/2)u_{j-1} + (1/2)u_{j+1}, \quad j = 2, \dots, n-1 \tag{6}$$
$$\lambda u_n = (1/2)u_{n-1} + (1/2)u_n \tag{7}$$
Equation (6) is satisfied by $u_j = s^j$ provided
$$\lambda s = (1/2) + (1/2)s^2,$$
so $s_+ = \lambda + \sqrt{\lambda^2 - 1}$ and $s_- = \lambda - \sqrt{\lambda^2 - 1}$. Then the general solution is of the form $u_j = A(\lambda)s_+^j + B(\lambda)s_-^j$. Applying the first equation (5), obtain
$$\begin{aligned}
\lambda(As_+ + Bs_-) &= (1/2)(As_+ + Bs_-) + (1/2)(As_+^2 + Bs_-^2) \\
0 &= A[(1 - 2\lambda)s_+ + s_+^2] + B[(1 - 2\lambda)s_- + s_-^2] \\
0 &= A[s_+ - 1] + B[s_- - 1].
\end{aligned}$$
Applying the last equation (7), obtain
$$\begin{aligned}
\lambda(As_+^n + Bs_-^n) &= (1/2)(As_+^{n-1} + Bs_-^{n-1}) + (1/2)(As_+^n + Bs_-^n) \\
0 &= As_+^{n-1}[1 - 2\lambda s_+ + s_+] + Bs_-^{n-1}[1 - 2\lambda s_- + s_-] \\
0 &= As_+^{n-1}[-s_+^2 + s_+] + Bs_-^{n-1}[-s_-^2 + s_-].
\end{aligned}$$
Combining these equations gives $s_+^n = s_-^n$. Note that $s_+ s_- = 1$, so $s_+^{2n} = 1$ and $s_-^{2n} = 1$. That is, both $s_+$ and $s_-$ are $2n$th roots of unity. Therefore, $s_+$ and $s_-$ can be written in the form
$$e^{i\pi j/n} = \cos\frac{\pi j}{n} + i\sin\frac{\pi j}{n}$$
for $j = 0, 1, 2, \dots, 2n - 1$.
Thus, the eigenvalues must be among the solutions of $s_+(\lambda) = e^{i\pi j/n}$, or
$$\begin{aligned}
\lambda + \sqrt{\lambda^2 - 1} &= e^{i\pi j/n} \\
\sqrt{\lambda^2 - 1} &= -\lambda + e^{i\pi j/n} \\
\lambda^2 - 1 &= \lambda^2 - 2\lambda e^{i\pi j/n} + e^{2i\pi j/n} \\
-1 &= -2\lambda e^{i\pi j/n} + e^{2i\pi j/n} \\
\frac{1}{2e^{i\pi j/n}} + \frac{e^{i\pi j/n}}{2} &= \lambda \\
\frac{e^{-i\pi j/n}}{2} + \frac{e^{i\pi j/n}}{2} &= \lambda \\
\cos(\pi j/n) &= \lambda.
\end{aligned}$$
So to each $j$ we can find a root $\lambda_j$, namely
$$\lambda_j = \cos(\pi j/n), \quad j = 0, 1, 2, \dots, 2n - 1.$$
However, some of these roots are repeated, since
$$\cos(\pi(n - j)/n) = \cos(\pi(n + j)/n)$$
for $j = 1, \dots, n$. So there are $n$ eigenvalues, $\lambda_1 = 1$ and $\lambda_j = \cos\frac{(j-1)\pi}{n}$ for $j = 2, \dots, n$.
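The eigenvalue formula is easy to check numerically; here is a minimal sketch in Python with NumPy (the size $n = 8$ is an arbitrary choice):

    import numpy as np

    n = 8
    # assemble P0: 1/2 on the sub- and superdiagonals, 1/2 in the corners
    P0 = 0.5 * (np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))
    P0[0, 0] = P0[n - 1, n - 1] = 0.5

    computed = np.sort(np.linalg.eigvalsh(P0))[::-1]   # descending order
    exact = np.cos(np.arange(n) * np.pi / n)           # cos((j-1)pi/n)
    print(np.allclose(computed, exact))                # True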
Theorem 4. The value $\mu(P_0) = \cos(\pi/n)$ for
$$P_0 = \begin{pmatrix} 1/2 & 1/2 & & & \\ 1/2 & 0 & 1/2 & & \\ & \ddots & \ddots & \ddots & \\ & & 1/2 & 0 & 1/2 \\ & & & 1/2 & 1/2 \end{pmatrix}$$
is the smallest among all symmetric stochastic tridiagonal matrices.
Proof. The proof proceeds by constructing a pair of vectors $y$ and $z$ that satisfy the assumptions of Lemma 2 for any symmetric tridiagonal stochastic matrix $P$. Furthermore, $\mathbf{1}^T z = \cos(\pi/n)$, so every such $P$ has $\mu(P) \ge \cos(\pi/n)$ and the mixing rate $\mu(P_0) = \cos(\pi/n)$ is the fastest possible. By Lemma 3 and the definition of the mixing rate,
$$\mu(P_0) = \lambda_2 = -\lambda_n = \cos(\pi/n).$$
Take $y = u^{(2)}$, the second (unit) eigenvector of $P_0$, whose entries are $y_i = \sqrt{2/n}\,\cos\frac{(2i-1)\pi}{2n}$, so the assumptions (2) and (3) in Lemma 2 are automatically satisfied. Take $z$ to be the vector with
$$z_i = \frac{1}{n}\left(\cos\frac{\pi}{n} + \frac{\cos\frac{(2i-1)\pi}{n}}{\cos(\pi/n)}\right)$$
for $i = 1, \dots, n$.
Note that
$$\begin{aligned}
\sum_{j=1}^n \cos((2j-1)\pi/n) &= \Re\left(\sum_{j=1}^n \exp(i(2j-1)\pi/n)\right) \\
&= \Re\left(\sum_{j=1}^n \exp(i2j\pi/n)\exp(-i\pi/n)\right) \\
&= \Re\left(\exp(-i\pi/n)\exp(i2\pi/n)\sum_{j=0}^{n-1}\exp(i2\pi/n)^j\right) \\
&= \Re\left(\exp(i\pi/n)\,\frac{1 - \exp(i2\pi)}{1 - \exp(i2\pi/n)}\right) \\
&= 0.
\end{aligned}$$
Then it is easy to verify that $\mathbf{1}^T z = \cos(\pi/n)$.
Now check that $y$ and $z$ satisfy hypothesis (4) of Lemma 2. For the off-diagonal entries,
$$\frac{z_i + z_{i+1}}{2} = \frac{1}{n}\cos\frac{\pi}{n} + \frac{1}{n}\cdot\frac{1}{\cos(\pi/n)}\cdot\frac{1}{2}\left[\cos\frac{(2i-1)\pi}{n} + \cos\frac{(2i+1)\pi}{n}\right].$$
Using the cosine sum formula
$$\frac{1}{2}\left[\cos\frac{(2i-1)\pi}{n} + \cos\frac{(2i+1)\pi}{n}\right] = \cos\frac{2i\pi}{n}\cos\frac{\pi}{n},$$
this simplifies to
$$\frac{z_i + z_{i+1}}{2} = \frac{1}{n}\left[\cos\frac{\pi}{n} + \cos\frac{2i\pi}{n}\right].$$
Using the cosine sum formula again,
$$\frac{1}{n}\left[\cos\frac{\pi}{n} + \cos\frac{2i\pi}{n}\right] = \frac{2}{n}\cos\frac{(2i-1)\pi}{2n}\cos\frac{(2i+1)\pi}{2n} = y_i y_{i+1}.$$
Therefore equality holds in inequality (4) for the adjacent entries resulting from the nonzero subdiagonal and superdiagonal entries. For the diagonal entries, check that $(z_i + z_i)/2 = z_i \le y_i^2$. That is, check that
$$\cos\frac{\pi}{n} + \frac{\cos\frac{(2i-1)\pi}{n}}{\cos\frac{\pi}{n}} \le 2\cos^2\frac{(2i-1)\pi}{2n}.$$
Using the double-angle formula for the cosine,
$$2\cos^2\frac{(2i-1)\pi}{2n} = 1 + \cos\frac{(2i-1)\pi}{n}.$$
Therefore, it suffices to show
$$\cos\frac{\pi}{n} + \frac{\cos\frac{(2i-1)\pi}{n}}{\cos\frac{\pi}{n}} \le 1 + \cos\frac{(2i-1)\pi}{n},$$
and moving all terms to the right, obtain
$$1 - \cos\frac{\pi}{n} - \frac{\cos\frac{(2i-1)\pi}{n}}{\cos\frac{\pi}{n}} + \cos\frac{(2i-1)\pi}{n} \ge 0.$$
This can be factored as
$$\left[1 - \cos\frac{\pi}{n}\right]\left[1 - \frac{\cos\frac{(2i-1)\pi}{n}}{\cos\frac{\pi}{n}}\right] \ge 0.$$
This is true because
$$\frac{\cos\frac{(2i-1)\pi}{n}}{\cos\frac{\pi}{n}} \le 1$$
for $i = 1, \dots, n$. This establishes the theorem.
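The optimality claim can also be probed numerically. The following sketch in Python with NumPy samples random symmetric stochastic tridiagonal matrices (restricting the edge weights to $(0, 0.5)$ is an arbitrary choice that keeps the diagonal entries nonnegative) and confirms that none mixes faster than $P_0$:

    import numpy as np

    def mu(P):
        # second-largest eigenvalue modulus of a symmetric stochastic matrix
        eigs = np.sort(np.linalg.eigvalsh(P))
        return max(abs(eigs[0]), abs(eigs[-2]))

    def tridiag(w):
        # symmetric stochastic tridiagonal matrix with edge weights w
        P = np.diag(w, 1) + np.diag(w, -1)
        return P + np.diag(1.0 - P.sum(axis=1))

    n = 7
    rng = np.random.default_rng(1)
    best = mu(tridiag(0.5 * np.ones(n - 1)))       # this matrix is P0
    print(np.isclose(best, np.cos(np.pi / n)))     # True
    trials = [mu(tridiag(rng.uniform(0, 0.5, n - 1))) for _ in range(1000)]
    print(all(m >= best - 1e-12 for m in trials))  # True: P0 is fastest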
Problems to Work for Understanding
1. Show that balancing the workload of a line of processors by shifting one-half of the load imbalance across each edge from the more loaded to the less loaded processor can be represented by the tridiagonal matrix $P_0$.
2. Show that
$$\frac{1}{2}\left[\cos\frac{(2i-1)\pi}{n} + \cos\frac{(2i+1)\pi}{n}\right] = \cos\frac{2i\pi}{n}\cos\frac{\pi}{n}$$
and
$$\frac{1}{n}\left[\cos\frac{\pi}{n} + \cos\frac{2i\pi}{n}\right] = \frac{2}{n}\cos\frac{(2i-1)\pi}{2n}\cos\frac{(2i+1)\pi}{2n}.$$
3. Show that the four eigenvalues of
$$P_0 = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1/2 & 1/2 \end{pmatrix}$$
are $\cos(j\pi/4)$ for $j = 0, 1, 2, 3$, that is, $1, \sqrt{2}/2, 0, -\sqrt{2}/2$, the real parts of four of the eighth roots of unity.
4. Show that the eigenvalues of the more general tridiagonal matrix
$$P = \begin{pmatrix} q & p & 0 & \cdots & 0 \\ q & 0 & p & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & q & 0 & p \\ 0 & \cdots & 0 & q & p \end{pmatrix}$$
are $\lambda_1 = 1$ and $\lambda_j = 2\sqrt{pq}\,\cos\frac{(j-1)\pi}{n}$ for $j = 2, \dots, n$. Then show directly that $P_0$ is the fastest mixing among all such tridiagonal matrices $P$.
5. Use mathematical software to numerically evaluate the eigenvalues of the matrices $P_0$ for sizes $n = 2, \dots, 8$ and show that the values agree with the exact eigenvalues in Lemma 3.
References
[1] Stephen Boyd, Persi Diaconis, Jun Sun, and Lin Xiao. Fastest mixing Markov chain on a path. American Mathematical Monthly, 113(1):70–74, January 2006.

[2] William Feller. An Introduction to Probability Theory and Its Applications, Volume I. John Wiley and Sons, third edition, 1973.
I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable effort to present current and accurate information for public use; however, I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.
I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don't guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reflects the thoughts, interests, and opinions of its author. It does not explicitly represent official positions or policies of my employer.
Information on this website is subject to change without notice.
Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1
Email to Steve Dunbar, sdunbar1 at unl dot edu
Last modified: Processed from LaTeX source on April 16, 2010