The Transportation Metric
and Path Coupling
By Levin, Peres and Wilmer
Presented by Oleg Zlydenko
Coupling
• Analyze the mixing time of a Markov chain $P$
• Devise a way to advance two states $(x, y) \in \Omega \times \Omega$:
  • In both coordinates it looks like $P$ (= coupling)
  • For any two starting states, they eventually meet
• A bound on $t_{\mathrm{mix}}$ depends on the time it takes the chains to meet
Review
• Theorem: Let $\{(X_t, Y_t)\}_{t=0}^{\infty}$ be a sticky coupling of $P$, where $X_0 = x$, $Y_0 = y$. Let $\tau_{\mathrm{couple}}$ be the first time the chains meet. Then:
  $\|P^t(x,\cdot) - P^t(y,\cdot)\|_{TV} \le \mathbf{P}_{x,y}\{\tau_{\mathrm{couple}} > t\}$
• Which leads to: $d(t) \le \max_{x,y \in \Omega} \mathbf{P}_{x,y}\{\tau_{\mathrm{couple}} > t\}$
• Then we find the minimal $t$ that ensures $d(t) < \varepsilon$
• This is a bound on $t_{\mathrm{mix}}(\varepsilon)$
The Path Coupling Technique
• Analyze the mixing time of a Markov chain $P$
• Devise a way to advance two states $(x, y) \in \Omega \times \Omega$:
Coupling
• In both coordinates it looks like $P$ (= coupling)
• For any two starting states, they eventually meet
• A bound on $t_{\mathrm{mix}}$ depends on the time it takes the chains to meet
Path Coupling
• In both coordinates it looks like $P$ (= coupling)
• For some two starting states, they tend to “get closer”
• A bound on $t_{\mathrm{mix}}$ depends on how quickly the states get closer
Plan
• The Transportation Metric
• The main theorem of Path Coupling
• Bounding mixing time
• Example – Fast Mixing for Colorings
The Transportation Metric
The Transportation Metric
• We have a state space $\Omega$ and metric $\rho$ between states
[Figure: Markov chain on the states Assistant Professor, Associate Professor, Tenured Professor, Out on the Street, and Dead, with transition probabilities on the edges]
• $\rho$ could be the difference in salaries
  • $\rho(\text{street}, \text{dead}) = 3$
  • $\rho(\text{assistant}, \text{dead}) = 15$
• $\rho$ is a metric
  • Non-negative
  • Symmetric
  • Triangle inequality
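The Transportation Metric
• We have a state space $\Omega$ and metric $\rho$ between states
• The transportation metric (or Wasserstein metric) is defined on distributions over $\Omega$:
  • $\rho_K(\mu, \nu) = \inf\{\mathbb{E}[\rho(X, Y)] : (X, Y) \text{ is a coupling of } \mu, \nu\}$
  • Intuitively, it's the distance between two distributions
  • If $\rho(x, y) = \mathbf{1}\{x \ne y\}$, then $\rho_K(\mu, \nu) = \|\mu - \nu\|_{TV}$
  • If $\rho(x, y) \ge \mathbf{1}\{x \ne y\}$, then $\rho_K(\mu, \nu) \ge \|\mu - \nu\|_{TV}$
• Recall: $\|\mu - \nu\|_{TV} = \inf_{(X, Y)} \Pr\{X \ne Y\}$, the infimum taken over couplings $(X, Y)$ of $\mu, \nu$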
Optimal Coupling
• An optimal coupling $(X_*, Y_*)$ of $\mu, \nu$ achieves:
  • $\rho_K(\mu, \nu) = \mathbb{E}[\rho(X_*, Y_*)]$
• Lemma 1: an optimal coupling exists
  • We already used this lemma in a previous lesson, for $\rho(x, y) = \mathbf{1}\{x \ne y\}$
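For the indicator metric, an optimal coupling can be written down explicitly. The following is a minimal sketch, not taken from the slides: it keeps $X = Y$ on the overlap $\min(\mu, \nu)$ and spreads the remaining mass over pairs with $X \ne Y$. The function names and the example distributions are hypothetical.

```python
from fractions import Fraction


def tv_distance(mu, nu):
    """Total variation distance between two distributions given as dicts."""
    states = set(mu) | set(nu)
    return sum(abs(mu.get(s, 0) - nu.get(s, 0)) for s in states) / 2


def optimal_indicator_coupling(mu, nu):
    """Joint law of (X, Y), keyed by (x, y), that makes X = Y as often as possible."""
    states = sorted(set(mu) | set(nu))
    overlap = {s: min(mu.get(s, 0), nu.get(s, 0)) for s in states}
    leftover = 1 - sum(overlap.values())  # equals the total variation distance
    joint = {(s, s): overlap[s] for s in states if overlap[s] > 0}
    if leftover > 0:
        for x in states:
            for y in states:
                excess_x = mu.get(x, 0) - overlap[x]  # positive only where mu > nu
                excess_y = nu.get(y, 0) - overlap[y]  # positive only where nu > mu
                mass = excess_x * excess_y / leftover
                if mass > 0:
                    joint[(x, y)] = joint.get((x, y), 0) + mass
    return joint


# Made-up example distributions on the states a, b, c.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
nu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
joint = optimal_indicator_coupling(mu, nu)
prob_differ = sum(mass for (x, y), mass in joint.items() if x != y)
print(prob_differ, tv_distance(mu, nu))
```

Under this construction $\Pr\{X \ne Y\}$ matches the total variation distance, which is the case $\rho(x, y) = \mathbf{1}\{x \ne y\}$ of the definition of $\rho_K$.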
Metric
• Lemma 2: $\rho_K$ is a metric on the space of probability distributions on $\Omega$
  • Non-negativity: $\rho$ is a metric, hence non-negative, so $\rho_K$ is an infimum of a non-negative set
  • Symmetry: $\mathbb{E}[\rho(X, Y)] = \mathbb{E}[\rho(Y, X)]$
  • Triangle inequality:
    • For 3 distributions $\mu, \nu, \eta$ on $\Omega$: $\rho_K(\mu, \nu) + \rho_K(\nu, \eta) \ge \rho_K(\mu, \eta)$
    • Proof over next few slides
Lemma 2 – Proof 1/2
• Let $\mu, \nu, \eta$ be distributions on $\Omega$
• Let $p, q$ be the distributions on $\Omega \times \Omega$ of the optimal couplings of $(\mu, \nu)$ and $(\nu, \eta)$ respectively
• Define a distribution: $r(x, y, z) = \frac{p(x, y) \cdot q(y, z)}{\nu(y)}$
• The projection of $r$ on the first 2 coordinates is $p$:
  • $\sum_{z \in \Omega} r(x, y, z) = \frac{p(x, y)}{\nu(y)} \cdot \sum_{z \in \Omega} q(y, z) = p(x, y)$
• Similarly, the projection of $r$ on the last 2 coordinates is $q$
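The gluing construction can be checked numerically on a small example. This is a sketch under assumptions: `p` and `q` stand for two couplings that share the middle marginal $\nu$, `r` is the glued distribution on triples, the helper names are hypothetical, and the numbers are made up (the projection identities hold for any such pair of couplings, optimal or not).

```python
from fractions import Fraction


def glue(p, q, nu):
    """r(x, y, z) = p(x, y) * q(y, z) / nu(y), skipping y with nu(y) = 0."""
    r = {}
    for (x, y), pxy in p.items():
        for (y2, z), qyz in q.items():
            if y2 == y and nu.get(y, 0) > 0:
                r[(x, y, z)] = pxy * qyz / nu[y]
    return r


def project(r, keep):
    """Marginal of r on the coordinate positions listed in keep."""
    out = {}
    for key, mass in r.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0) + mass
    return out


F = Fraction
nu = {0: F(1, 4), 1: F(3, 4)}
# p couples mu = (1/2, 1/2) with nu; q couples nu with eta = (1/2, 1/2).
p = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 1): F(1, 2)}
q = {(0, 0): F(1, 4), (1, 0): F(1, 4), (1, 1): F(1, 2)}

r = glue(p, q, nu)
xz = project(r, (0, 2))  # a coupling of mu and eta
print(xz)
```

The projection on the first and last coordinates is the coupling of $\mu$ and $\eta$ that the triangle inequality argument relies on.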
Intuition
[Figure: a cube with axes $x$, $y$, $z$]
• $r(x, y, z) = \frac{p(x, y) \cdot q(y, z)}{\nu(y)}$ is a distribution on $\Omega^3$
• Each cell of a cube holds a probability
• The projection on $x, y$ is $p$ (a coupling of $\mu, \nu$)
  • The projection on $x$ is $\mu$, the projection on $y$ is $\nu$
• The projection on $y, z$ is $q$ (a coupling of $\nu, \eta$)
  • The projection on $y$ is $\nu$, the projection on $z$ is $\eta$
• So the projection on $x, z$ is a coupling of $\mu, \eta$
Lemma 2 – Proof 2/2
• Let $(X, Y, Z)$ be a random vector with distribution $r$
• Since $\rho$ is a metric:
  • $\rho(X, Z) \le \rho(X, Y) + \rho(Y, Z)$
• Take expectation on both sides:
  • $\mathbb{E}[\rho(X, Z)] \le \mathbb{E}[\rho(X, Y)] + \mathbb{E}[\rho(Y, Z)]$
• Since $p, q$ are distributions of the optimal couplings:
  • $\mathbb{E}[\rho(X, Y)] + \mathbb{E}[\rho(Y, Z)] = \rho_K(\mu, \nu) + \rho_K(\nu, \eta)$
• Note that $(X, Z)$ is a coupling of $\mu, \eta$:
  • $\rho_K(\mu, \eta) \le \mathbb{E}[\rho(X, Z)] \le \rho_K(\mu, \nu) + \rho_K(\nu, \eta)$
Summary so far
• Given a state space $\Omega$ and metric $\rho$ between states, we can define a new metric $\rho_K$ between distributions
• We can use it to bound the total variation distance, if $\rho(x, y) \ge 1$ for all $x \ne y \in \Omega$:
  • $\rho_K(\mu, \nu) \ge \|\mu - \nu\|_{TV}$
Path Coupling
Path Metric
• Suppose we have a connected graph $G = (V, E_0)$
  • $V = \Omega$, a state space of a Markov chain
  • The edges don't have to match permissible transitions
• In addition, we have a length function $\ell$ on edges
  • $\ell(x, y) \ge 1$ for all edges $\{x, y\}$
  • The length of a path is the sum of $\ell(x, y)$ for edges $\{x, y\}$ on the path
• The path metric on $\Omega$:
  • $\rho(x, y) = \min\{\ell(\xi) : \xi \text{ is a path from } x \text{ to } y\}$
  • Why is it a metric?
[Figure: the professor Markov chain (Assistant Professor, Associate Professor, Tenured Professor, Out on the Street, Dead) with transition probabilities and edge lengths]
• $\rho(\text{associate}, \text{dead}) = 2$
• $\rho(\text{street}, \text{dead}) = 3$
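The path metric is a shortest-path computation. Below is a minimal sketch using Dijkstra's algorithm; the function name and the four-state example graph are hypothetical and are not the graph from the figure.

```python
import heapq


def path_metric(edge_lengths, source):
    """Shortest-path distances rho(source, .) for undirected edges with lengths >= 1."""
    adjacency = {}
    for (x, y), length in edge_lengths.items():
        if length < 1:
            raise ValueError("the slides assume l(x, y) >= 1 on every edge")
        adjacency.setdefault(x, []).append((y, length))
        adjacency.setdefault(y, []).append((x, length))

    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry, a shorter path was already found
        for y, length in adjacency.get(x, []):
            candidate = d + length
            if candidate < dist.get(y, float("inf")):
                dist[y] = candidate
                heapq.heappush(heap, (candidate, y))
    return dist


# Made-up example: the direct edge a-c is longer than the detour through b.
example_edges = {("a", "b"): 1, ("b", "c"): 1.5, ("a", "c"): 3, ("c", "d"): 2}
print(path_metric(example_edges, "a"))
```

Because every edge has length at least 1, any two distinct states end up at distance at least 1, which is the property used later to compare $\rho_K$ with total variation.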
Summary so far
• Given a state space $\Omega$ and metric $\rho$ between states, we can define a new metric $\rho_K$ between distributions
• We can use it to bound the total variation distance, if $\rho(x, y) \ge 1$ for all $x \ne y \in \Omega$:
  • $\rho_K(\mu, \nu) \ge \|\mu - \nu\|_{TV}$
• We can generate a metric $\rho$ between states by extending distances between some states with the path metric
Main Theorem (Bubley, Dyer)
• Let $G = (\Omega, E_0)$, $\ell$ and the path metric $\rho$ be as previously defined
• Assume for each edge $\{x, y\} \in E_0$ we have a coupling $(X_1, Y_1)$ of the distributions $P(x,\cdot)$, $P(y,\cdot)$ such that, for some $\alpha > 0$:
  • $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x, y) = e^{-\alpha} \cdot \ell(x, y)$
• Then for any two distributions $\mu, \nu$ on $\Omega$:
  • $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$
Motivation
• Recall: $\ell(x, y) \ge 1$ for all edges $\{x, y\}$
• Then $\rho(x, y) \ge \mathbf{1}\{x \ne y\}$
  • $\Pr\{X \ne Y\} = \mathbb{E}[\mathbf{1}\{X \ne Y\}] \le \mathbb{E}[\rho(X, Y)]$
  • Taking $\inf$ over all couplings $(X, Y)$ of $\mu, \nu$:
  • $\|\mu - \nu\|_{TV} \le \rho_K(\mu, \nu)$
• So bounding $\rho_K$ (with the theorem) provides a bound on the mixing time
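Bounding Mixing Time
• Corollary: Suppose that the hypotheses of the Bubley-Dyer Theorem hold. Then:
  $d(t) = \max_{x \in \Omega} \|P^t(x,\cdot) - \pi\|_{TV} \le e^{-\alpha t} \cdot \mathrm{diam}(\Omega)$
  where $\mathrm{diam}(\Omega) = \max_{x,y \in \Omega} \rho(x, y)$ is the diameter of $\Omega$
• And consequently:
  $t_{\mathrm{mix}}(\varepsilon) = \min\{t : d(t) \le \varepsilon\} \le \frac{-\log \varepsilon + \log \mathrm{diam}(\Omega)}{\alpha}$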
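The corollary turns a contraction rate and a diameter into a concrete number of steps. The following sketch evaluates both bounds; the function names and the example values of $\alpha$, $\mathrm{diam}(\Omega)$ and $\varepsilon$ are hypothetical, and rounding up to a whole number of steps is an assumption of the sketch (the slide states the bound without rounding).

```python
import math


def distance_bound(alpha, diameter, t):
    """Upper bound e^{-alpha t} * diam(Omega) on d(t) from the corollary."""
    return math.exp(-alpha * t) * diameter


def tmix_bound(alpha, diameter, eps):
    """Smallest integer t with e^{-alpha t} * diam(Omega) <= eps."""
    if alpha <= 0 or diameter < 1 or not 0 < eps < 1:
        raise ValueError("need alpha > 0, diameter >= 1 and 0 < eps < 1")
    return math.ceil((-math.log(eps) + math.log(diameter)) / alpha)


# Made-up example: contraction rate 0.1, diameter 10, target distance 1/4.
t = tmix_bound(alpha=0.1, diameter=10, eps=0.25)
print(t, distance_bound(0.1, 10, t))
```

Note that the bound grows only logarithmically in the diameter but inversely in $\alpha$, so a slow contraction rate is the expensive part.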
Corollary – Proof
• Recall that the theorem provides:
  • $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$
• Applying it $t$ times: $\rho_K(\mu P^t, \nu P^t) \le e^{-\alpha t} \cdot \rho_K(\mu, \nu)$
• Also, by definition: $\rho_K(\mu, \nu) \le \mathrm{diam}(\Omega)$
• Specifically:
  • Choosing $\nu = \pi$ gives $\nu P^t = \pi P^t = \pi$
  • Choosing $\mu$ the distribution that always returns $x$, $\mu P^t = P^t(x,\cdot)$
  • $\|P^t(x,\cdot) - \pi\|_{TV} \le \rho_K(P^t(x,\cdot), \pi) \le e^{-\alpha t}\, \mathrm{diam}(\Omega)$
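Bubley and Dyer Theorem – Proof 1/5
• Have: $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x, y)$ for $\{x, y\} \in E_0$ (for $\alpha > 0$), where $(X_1, Y_1)$ is a coupling of $P(x,\cdot)$, $P(y,\cdot)$
• Recall:
  • $\rho_K(\mu, \nu) = \inf\{\mathbb{E}[\rho(X, Y)] : (X, Y) \text{ is a coupling of } \mu, \nu\}$
• So, for edges $\{x, y\} \in E_0$: $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$
• Lemma: For all $x, y \in \Omega$:
  $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$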
Bubley and Dyer Theorem – Proof 2/5
• Lemma: For all $x, y \in \Omega$:
  $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$
• Proof:
  • Consider the shortest path (according to $\rho$) between $x, y$: $x = x_0, x_1, \ldots, x_r = y$
  • $\rho_K(P(x,\cdot), P(y,\cdot)) \le \sum_{i=1}^{r} \rho_K(P(x_{i-1},\cdot), P(x_i,\cdot))$ (the triangle inequality)
  • $\le e^{-\alpha} \cdot \sum_{i=1}^{r} \rho(x_{i-1}, x_i)$ (since $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$ for edges)
  • $= e^{-\alpha} \cdot \rho(x, y)$ (we chose the shortest path)
Bubley and Dyer Theorem – Proof 3/5
• Have: $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$
• Want: $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$ for any two distributions $\mu, \nu$ on $\Omega$
• Let $\eta$ be an optimal coupling of $\mu, \nu$, i.e.:
  • $\rho_K(\mu, \nu) = \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
• We can choose for every $x, y \in \Omega$ an optimal coupling $\theta_{x,y}$ of $P(x,\cdot)$, $P(y,\cdot)$:
  • $\sum_{u,w \in \Omega} \rho(u, w) \cdot \theta_{x,y}(u, w) \le e^{-\alpha} \cdot \rho(x, y)$
Bubley and Dyer Theorem – Proof 4/5
• Have: $\eta$ optimal coupling of $\mu, \nu$, and $\theta_{x,y}$ optimal coupling of $P(x,\cdot)$, $P(y,\cdot)$
  • $\rho_K(\mu, \nu) = \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
  • $\sum_{u,w \in \Omega} \rho(u, w) \cdot \theta_{x,y}(u, w) \le e^{-\alpha} \cdot \rho(x, y)$
• Define a distribution $\theta = \sum_{x,y \in \Omega} \eta(x, y) \cdot \theta_{x,y}$
  • Choose 2 starting states with probabilities $\eta$ and then advance them on the Markov chain
• $\theta$ is a coupling of $\mu P, \nu P$:
  • $\sum_{w \in \Omega} \theta(u, w) = \sum_{x,y,w \in \Omega} \eta(x, y) \cdot \theta_{x,y}(u, w)$
  • $= \sum_{x,y \in \Omega} \eta(x, y) \cdot \sum_{w \in \Omega} \theta_{x,y}(u, w) = \sum_{x,y \in \Omega} \eta(x, y) \cdot P(x, u)$
  • $= \sum_{x \in \Omega} P(x, u) \cdot \sum_{y \in \Omega} \eta(x, y)$
  • $= \sum_{x \in \Omega} P(x, u) \cdot \mu(x) = \mu P(u)$
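The marginal computation above can be replayed on a two-state chain. This is a sketch under assumptions: the helper names are hypothetical, the transition matrix and the starting coupling are made up, and the one-step couplings are taken to be independent (not optimal), which is enough to check that the mixture is a coupling of the two pushed-forward distributions.

```python
from fractions import Fraction

F = Fraction


def push_forward(mu, P):
    """Distribution mu P, where P[x][u] is the transition probability from x to u."""
    out = {}
    for x, mass in mu.items():
        for u, pxu in P[x].items():
            out[u] = out.get(u, 0) + mass * pxu
    return out


def mix_couplings(start_coupling, one_step_coupling):
    """Mixture: sum over (x, y) of start_coupling(x, y) * one_step_coupling(x, y)."""
    mixed = {}
    for (x, y), mass in start_coupling.items():
        for (u, w), t in one_step_coupling(x, y).items():
            mixed[(u, w)] = mixed.get((u, w), 0) + mass * t
    return mixed


def marginal(joint, index):
    out = {}
    for key, mass in joint.items():
        out[key[index]] = out.get(key[index], 0) + mass
    return out


P = {0: {0: F(1, 2), 1: F(1, 2)}, 1: {0: F(1, 4), 1: F(3, 4)}}
mu = {0: F(1, 2), 1: F(1, 2)}
nu = {0: F(1, 4), 1: F(3, 4)}
start = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 1): F(1, 2)}  # a coupling of mu, nu


def independent(x, y):
    # Stand-in one-step coupling of P(x, .) and P(y, .); an assumption of this sketch.
    return {(u, w): P[x][u] * P[y][w] for u in P[x] for w in P[y]}


mixed = mix_couplings(start, independent)
print(marginal(mixed, 0), push_forward(mu, P))
```

Replacing `independent` with couplings that contract the distance is what yields the factor $e^{-\alpha}$ in the theorem.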
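Bubley and Dyer Theorem – Proof 5/5
• Have:
  • $\theta = \sum_{x,y \in \Omega} \eta(x, y) \cdot \theta_{x,y}$, a coupling of $\mu P, \nu P$
  • $\rho_K(\mu, \nu) = \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
  • $\sum_{u,w \in \Omega} \rho(u, w) \cdot \theta_{x,y}(u, w) \le e^{-\alpha} \cdot \rho(x, y)$
• Want: $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$ for any two distributions $\mu, \nu$ on $\Omega$
  • $\rho_K(\mu P, \nu P) \le \sum_{u,w \in \Omega} \rho(u, w) \cdot \theta(u, w)$
  • $= \sum_{u,w \in \Omega} \sum_{x,y \in \Omega} \rho(u, w) \cdot \eta(x, y) \cdot \theta_{x,y}(u, w)$
  • $\le e^{-\alpha} \cdot \sum_{x,y \in \Omega} \eta(x, y) \cdot \rho(x, y)$
  • $= e^{-\alpha} \cdot \rho_K(\mu, \nu)$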
Summary so far
• Given a state space $\Omega$ and a connected graph with lengths $\ell$ on edges, we can define a metric $\rho$ between states (path metric)
• We can then define a metric $\rho_K$ between distributions (transportation metric)
• Given some conditions on $\rho$, we can bound the mixing time (Corollary of Bubley-Dyer theorem)
The Path Coupling Technique
• Analyze the mixing time of a Markov chain $P$
• Devise a way to advance two states $(x, y) \in \Omega \times \Omega$:
  • In both coordinates it looks like $P$ (= coupling)
  • For some two starting states, they tend to “get closer”
• A bound on $t_{\mathrm{mix}}$ depends on how quickly the states get closer
The Path Coupling Technique
• Decide which states are “close”
  • Define $G = (\Omega, E_0)$, and a distance function $\ell$ on edges
  • Extend $\ell$ to a metric $\rho$ on $\Omega$ using the path metric
• Devise a way to advance two states $(x, y) \in \Omega \times \Omega$:
  • In both coordinates it looks like $P$ (= coupling)
  • For 2 adjacent starting states, they tend to “get closer”
  • $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x, y)$
• A bound on $t_{\mathrm{mix}}$ depends on $\alpha$ and $\mathrm{diam}(\Omega)$
  • $t_{\mathrm{mix}}(\varepsilon) \le \frac{-\log \varepsilon + \log \mathrm{diam}(\Omega)}{\alpha}$
  • We want as many edges as we can (so $\mathrm{diam}(\Omega)$ is smaller)
  • And we want the distance shrinking rapidly (so $\alpha$ is bigger)
Fast Mixing for Colorings
Reminder: $q$-colorings
• Proper $q$-colorings of $G = (V, E)$ are elements $x \in \{1, 2, \ldots, q\}^V$ s.t.: $x(v) \ne x(w)$ for $\{v, w\} \in E$
• Many uses for coloring problems
  • Voting choices on the graph of (people, friendships)
    • People tend to change votes according to friends
  • Time slots on the graph of (tasks, conflicts) = scheduling
    • Conflicting tasks can't be executed in the same time slot
• Many interesting things to analyze:
  • How does a “random” coloring look?
  • How long does it take a process to converge?
Reminder
Metropolis chain
• A vertex $v$ is chosen uniformly at random
• A color $c$ is chosen uniformly at random
• If updating the color of $v$ to $c$ yields a proper $q$-coloring, accept it
Glauber dynamics
• A vertex $v$ is chosen uniformly at random
• A color $c$ is chosen uniformly at random among the admissible colors
  • Colors that don't appear at the neighbors of $v$
• Why are they different?
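One step of each chain can be sketched as follows. This is an illustration under assumptions, not code from the talk: the graph is an adjacency dict, colors are the integers 1 to $q$, the function names are hypothetical, and the example graph (a path on three vertices with $q = 5$) is made up.

```python
import random


def allowable_colors(coloring, graph, w, q):
    """Colors in 1..q that do not appear at the neighbors of w."""
    used = {coloring[u] for u in graph[w]}
    return [c for c in range(1, q + 1) if c not in used]


def glauber_step(coloring, graph, q, rng=random):
    """Pick a uniform vertex, then a uniform color among its allowable colors."""
    w = rng.choice(sorted(graph))
    new = dict(coloring)
    new[w] = rng.choice(allowable_colors(coloring, graph, w, q))
    return new


def metropolis_step(coloring, graph, q, rng=random):
    """Pick a uniform vertex and a uniform color; accept only if still proper."""
    w = rng.choice(sorted(graph))
    c = rng.randint(1, q)
    new = dict(coloring)
    if all(coloring[u] != c for u in graph[w]):
        new[w] = c  # otherwise the chain stays where it is
    return new


def is_proper(coloring, graph):
    return all(coloring[v] != coloring[u] for v in graph for u in graph[v])


# Made-up example: path 0 - 1 - 2, maximum degree 2, q = 5 colors.
graph = {0: [1], 1: [0, 2], 2: [1]}
state = {0: 1, 1: 2, 2: 1}
rng = random.Random(0)
for _ in range(100):
    state = glauber_step(state, graph, 5, rng)
print(state, is_proper(state, graph))
```

The difference between the two chains shows up in the holding probability: the Metropolis step can reject and stay put, while the Glauber step always resamples from the allowable set (which may return the current color).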
Theorem
• Consider the Glauber dynamics chain for proper $q$-colorings of a graph $G = (V, E)$ with $n$ vertices and maximum degree $\Delta$
• If $q > 2 \cdot \Delta$, then the mixing time satisfies:
  • $t_{\mathrm{mix}}(\varepsilon) \le \frac{q - \Delta}{q - 2\Delta} \cdot n \cdot (\log n - \log \varepsilon)$
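Comparison
Metropolis chain
• $q > 2 \cdot \Delta$
• $t_{\mathrm{mix}}(\varepsilon) \le \frac{q}{q - 2\Delta}\, n (\log n - \log \varepsilon)$
Glauber dynamics
• $q > 2 \cdot \Delta$
• $t_{\mathrm{mix}}(\varepsilon) \le \frac{q - \Delta}{q - 2\Delta}\, n (\log n - \log \varepsilon)$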
Theorem – Proof 1/8
• Two colorings are neighbors if they differ in 1 node
  • Defines the graph $G = (\Omega, E_0)$
  • Define $\ell(x, y) = 1$ for edges $\{x, y\} \in E_0$
  • The metric is $\rho(x, y) = \sum_{v \in V} \mathbf{1}\{x(v) \ne y(v)\}$
• We only need to define a way to generate 2 new colorings starting with colorings $\{x, y\} \in E_0$!
• Denote the unique vertex where $x, y$ differ by $v$
• Denote: $A_w(x)$ is the set of allowable colors for node $w$ in coloring $x$
Theorem – Proof 2/8
• Choose a vertex $w$ at random
• If $w$ is not a neighbor of $v$, choose a color at random from $A_w(x) = A_w(y)$, and update $w$ with it
  • Works when $w = v$ as well
  • So far, it's consistent with a coupling
• If $w$ is a neighbor of $v$, assume WLOG $|A_w(x)| \le |A_w(y)|$
  • Choose a random color $c \in A_w(y)$, update $y$ at $w$ with $c$
  • The update of $x$ at $w$ depends on the configuration
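Theorem – Proof 3/8
• Case $x(v) \notin A_w(y)$, $y(v) \notin A_w(x)$
[Figure: colorings $x$ and $y$ around the vertices $v$ and $w$, next to the palette of all colors]
• We have the same allowable colors in both configurations, so color $w$ in $x$ with $c$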
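Theorem – Proof 4/8
• Case $x(v) \in A_w(y)$, $y(v) \in A_w(x)$
[Figure: colorings $x$ and $y$ around the vertices $v$ and $w$, next to the palette of all colors; $x(v)$ is black and $y(v)$ is purple]
• If $c$ is not black, we can color $w$ with it
• If $c$ is black (the color $x(v)$), we can swap it for purple (the color $y(v)$)
• All allowable colors for $w$ in $x$ are chosen with equal probability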
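Theorem – Proof 5/8
• Case $x(v) \in A_w(y)$, $y(v) \notin A_w(x)$
[Figure: colorings $x$ and $y$ around the vertices $v$ and $w$, next to the palette of all colors; $x(v)$ is black]
• If $c$ is not black, we can color $w$ with it
• If $c$ is black (the color $x(v)$), we draw a random allowable color for $w$
• The probability of every color in $A_w(x)$ is: $\frac{1}{4} + \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{3}$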
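The arithmetic of this last case can be verified with exact fractions. The sketch below assumes, as in the slide's example, four allowable colors for $w$ in $y$ and three in $x$, the extra one being black $= x(v)$; the function name and the other color names are made up.

```python
from fractions import Fraction


def x_update_law(allowed_y, allowed_x, x_v):
    """Law of x's new color at w when A_w(y) is A_w(x) plus the color x(v)."""
    law = {c: Fraction(0) for c in allowed_x}
    p_c = Fraction(1, len(allowed_y))  # c is uniform on A_w(y)
    for c in allowed_y:
        if c != x_v:
            law[c] += p_c  # x copies the color drawn for y
        else:
            for c2 in allowed_x:  # c is not allowed in x: redraw uniformly
                law[c2] += p_c * Fraction(1, len(allowed_x))
    return law


allowed_y = ["black", "red", "green", "blue"]
allowed_x = ["red", "green", "blue"]
law = x_update_law(allowed_y, allowed_x, "black")
print(law)
```

Each color in $A_w(x)$ receives probability $\frac{1}{3}$, so the update of $x$ is uniform on its own allowable set, and the two chains receive different colors only when $c = x(v)$, which has probability $\frac{1}{|A_w(y)|}$.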
Theorem – Proof 6/8
• We have a coupling $(X_1, Y_1)$ of $P(x,\cdot)$, $P(y,\cdot)$ for two states $x, y$ that differ only at vertex $v$
• Now we need to bound $\mathbb{E}_{x,y}[\rho(X_1, Y_1)]$ in order to use the corollary (by some function $e^{-\alpha} \cdot \rho(x, y)$)
  • $\rho(X_1, Y_1)$ decreases to 0 iff we chose $v$, i.e. w.p. $\frac{1}{n}$
  • $\rho(X_1, Y_1)$ increases to 2 iff we chose a neighbor of $v$, w.p. $\frac{\deg(v)}{n}$, and we updated with different colors
    • So we got $c = x(v)$, w.p. $\le \frac{1}{|A_w(y)|} \le \frac{1}{q - \Delta}$
  • In total: $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} + \frac{\deg(v)}{n} \cdot \frac{1}{q - \Delta}$
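Theorem – Proof 7/8
• $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} + \frac{\deg(v)}{n} \cdot \frac{1}{q - \Delta}$
• $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} \cdot \left(1 - \frac{\Delta}{q - \Delta}\right)$
• Since $q > 2 \cdot \Delta$, $\frac{\Delta}{q - \Delta} < 1$, so $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] < 1$
  • The distance is indeed decreasing
• Denoting $c(\Delta, q) = 1 - \frac{\Delta}{q - \Delta} = \frac{q - 2\Delta}{q - \Delta}$ and using the inequality $e^{x} \ge 1 + x$ we get:
  $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{c(\Delta, q)}{n} \le e^{-\frac{c(\Delta, q)}{n}}$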
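Theorem – Proof 8/8
• $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\frac{c(\Delta, q)}{n}}$, so $\alpha = \frac{c(\Delta, q)}{n}$
• Applying the corollary $\left(t_{\mathrm{mix}}(\varepsilon) \le \frac{-\log \varepsilon + \log \mathrm{diam}(\Omega)}{\alpha}\right)$ with $\mathrm{diam}(\Omega) = n$:
  $t_{\mathrm{mix}}(\varepsilon) \le \frac{-\log \varepsilon + \log n}{c(\Delta, q) / n} = \frac{q - \Delta}{q - 2\Delta}\, n (\log n - \log \varepsilon)$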
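To get a feel for the size of this bound, it can be evaluated numerically. This is a minimal sketch with hypothetical function names and made-up parameter values; it only restates the formulas $c(\Delta, q) = 1 - \frac{\Delta}{q - \Delta}$ and $t_{\mathrm{mix}}(\varepsilon) \le \frac{q - \Delta}{q - 2\Delta}\, n (\log n - \log \varepsilon)$.

```python
import math


def contraction_constant(delta, q):
    """c(Delta, q) = 1 - Delta / (q - Delta); positive exactly when q > 2 * Delta."""
    if q <= 2 * delta:
        raise ValueError("the theorem needs q > 2 * Delta")
    return 1 - delta / (q - delta)


def glauber_tmix_bound(n, delta, q, eps):
    """Bound n * (log n - log eps) / c(Delta, q) on the mixing time."""
    return n * (math.log(n) - math.log(eps)) / contraction_constant(delta, q)


# Made-up example: 100 vertices, maximum degree 3, 7 colors, eps = 1/4.
n, delta, q, eps = 100, 3, 7, 0.25
c = contraction_constant(delta, q)
bound = glauber_tmix_bound(n, delta, q, eps)
print(c, bound)
```

With $q$ just above $2\Delta$ the factor $\frac{q - \Delta}{q - 2\Delta}$ is large (here 4), and it approaches 1 as $q$ grows, so the bound tends to $n (\log n - \log \varepsilon)$ when colors are plentiful.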
Known Results
• Some more results for Glauber dynamics on proper $q$-colorings:
  • For $q \ge \frac{11}{6} \cdot \Delta$, it is known that $t_{\mathrm{mix}} = O(n \cdot \log n)$
  • For triangle-free graphs with maximum degree $\Delta = \Omega(\log n)$, we can improve the bound on $q$ to: $q \ge 1.49\ldots \cdot \Delta$
  • For the empty graph: $t_{\mathrm{mix}} \ge \frac{1}{2} \cdot n \cdot \log n - O(n)$
Questions?