The Transportation Metric and Path Coupling

By Levin, Peres and Wilmer
Presented by Oleg Zlydenko
Coupling
• Analyze the mixing time of a Markov chain $M$
• Devise a way to advance a pair of states $(x, y) \in \Omega \times \Omega$:
• In both coordinates it looks like $M$ (= coupling)
• For any two starting states, they eventually meet
• A bound on $t_{mix}$ depends on the time it takes the chains to meet
Review
• Theorem: Let $\{(X_t, Y_t)\}_{t=0}^{\infty}$ be a sticky coupling of $P$, where $X_0 = x$, $Y_0 = y$. Let $\tau_{couple}$ be the first time the chains meet.
Then: $\| P^t(x, \cdot) - P^t(y, \cdot) \|_{TV} \le \mathbf{P}_{x,y}\{\tau_{couple} > t\}$
• Which leads to: $d(t) \le \max_{x,y \in \Omega} \mathbf{P}_{x,y}\{\tau_{couple} > t\}$
• Then we find the minimal $t$ for which this bound ensures $d(t) < \epsilon$
• This $t$ is an upper bound on $t_{mix}(\epsilon)$
The Path Coupling Technique
• Analyze the mixing time of a Markov chain $M$
• Devise a way to advance a pair of states $(x, y) \in \Omega \times \Omega$:
Coupling:
• In both coordinates it looks like $M$ (= coupling)
• For any two starting states, they eventually meet
• A bound on $t_{mix}$ depends on the time it takes the chains to meet
Path Coupling:
• In both coordinates it looks like $M$ (= coupling)
• For some pairs of starting states, they tend to "get closer"
• A bound on $t_{mix}$ depends on how quickly the states get closer
Plan
• The Transportation Metric
• The main theorem of Path Coupling
• Bounding mixing time
• Example – Fast Mixing for Colorings
The Transportation Metric
The Transportation Metric
• We have a state space $\Omega$ and a metric $\rho$ between states
[Figure: transition diagram of a Markov chain on the states Assistant Professor, Associate Professor, Tenured Professor, Out on the Street, and Dead]
• $\rho$ could be the difference in salaries
• $\rho(street, dead) = 3$
• $\rho(assistant, dead) = 15$
• $\rho$ is a metric
• Non-negative (and zero only between identical states)
• Symmetric
• Triangle inequality
The Transportation Metric
• We have a state space $\Omega$ and a metric $\rho$ between states
• The transportation metric (or Wasserstein metric) is defined on distributions over $\Omega$:
• $\rho_K(\mu, \nu) = \inf\{\mathbb{E}[\rho(X, Y)] : (X, Y) \text{ is a coupling of } \mu, \nu\}$
• Intuitively, it is the cheapest way to move the mass of $\mu$ onto $\nu$, when moving a unit of mass from $x$ to $y$ costs $\rho(x, y)$
• If $\rho(x, y) = \mathbf{1}_{\{x \ne y\}}$, then $\rho_K(\mu, \nu) = \|\mu - \nu\|_{TV}$
• If $\rho(x, y) \ge \mathbf{1}_{\{x \ne y\}}$, then $\rho_K(\mu, \nu) \ge \|\mu - \nu\|_{TV}$
• Recall: $\|\mu - \nu\|_{TV} = \inf_{(X, Y)} \Pr\{X \ne Y\}$, where the infimum is over couplings of $\mu, \nu$
Optimal Coupling
• An optimal coupling $(X_*, Y_*)$ of $\mu, \nu$ achieves:
• $\rho_K(\mu, \nu) = \mathbb{E}[\rho(X_*, Y_*)]$
• Lemma 1: an optimal coupling exists
• We already used this lemma in a previous lesson, for $\rho(x, y) = \mathbf{1}_{\{x \ne y\}}$
Metric
• Lemma 2: $\rho_K$ is a metric on the space of probability distributions on $\Omega$
• Non-negativity: $\rho$ is a metric, hence non-negative, so $\rho_K$ is an infimum of a set of non-negative numbers
• Symmetry: $\mathbb{E}[\rho(X, Y)] = \mathbb{E}[\rho(Y, X)]$, and $(Y, X)$ is a coupling of $\nu, \mu$ whenever $(X, Y)$ is a coupling of $\mu, \nu$
• Triangle inequality:
• For 3 distributions $\mu, \nu, \eta$ on $\Omega$: $\rho_K(\mu, \nu) + \rho_K(\nu, \eta) \ge \rho_K(\mu, \eta)$
• Proof over next few slides
Lemma 2 – Proof 1/2
• Let $\mu, \nu, \eta$ be distributions on $\Omega$
• Let $p, q$ be the distributions on $\Omega \times \Omega$ of the optimal couplings of $(\mu, \nu)$ and $(\nu, \eta)$ respectively
• Define a distribution: $r(x, y, z) = \frac{p(x, y) \cdot q(y, z)}{\nu(y)}$ (and $r(x, y, z) = 0$ when $\nu(y) = 0$)
• The projection of $r$ on the first 2 coordinates is $p$:
• $\sum_{z \in \Omega} r(x, y, z) = \frac{p(x, y)}{\nu(y)} \cdot \sum_{z \in \Omega} q(y, z) = p(x, y)$, because $\sum_{z \in \Omega} q(y, z) = \nu(y)$
• Similarly, the projection of $r$ on the last 2 coordinates is $q$
Intuition
[Figure: a cube indexed by $(x, y, z)$, with labels $\mu$, $p$ and $\eta$]
• $r(x, y, z) = \frac{p(x, y) \cdot q(y, z)}{\nu(y)}$ is a distribution on $\Omega^3$
• Each cell of a cube holds a probability
• The projection on $(x, y)$ is $p$ (a coupling of $\mu, \nu$)
• The projection on $x$ is $\mu$, the projection on $y$ is $\nu$
• The projection on $(y, z)$ is $q$ (a coupling of $\nu, \eta$)
• The projection on $y$ is $\nu$, the projection on $z$ is $\eta$
• So the projection on $(x, z)$ is a coupling of $\mu, \eta$
Lemma 2 – Proof 2/2
• Let $(X, Y, Z)$ be a random vector with distribution $r$
• Since $\rho$ is a metric:
• $\rho(X, Z) \le \rho(X, Y) + \rho(Y, Z)$
• Take expectations on both sides:
• $\mathbb{E}[\rho(X, Z)] \le \mathbb{E}[\rho(X, Y)] + \mathbb{E}[\rho(Y, Z)]$
• Since $p, q$ are the distributions of the optimal couplings:
• $\mathbb{E}[\rho(X, Y)] + \mathbb{E}[\rho(Y, Z)] = \rho_K(\mu, \nu) + \rho_K(\nu, \eta)$
• Note that $(X, Z)$ is a coupling of $\mu, \eta$, so:
• $\rho_K(\mu, \eta) \le \mathbb{E}[\rho(X, Z)] \le \rho_K(\mu, \nu) + \rho_K(\nu, \eta)$
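The gluing step in this proof can be checked numerically on a tiny example. The following is a sketch under stated assumptions: the state space $\{0, 1, 2\}$, the metric $\rho(a, b) = |a - b|$, and the couplings `p` and `q` are made up for illustration and are not claimed to be optimal; the helper names `glue` and `project` are hypothetical. The projections and the triangle inequality for expectations hold for any pair of couplings that share the middle marginal $\nu$.

```python
from fractions import Fraction as F

def glue(p, q, nu):
    """r(x, y, z) = p(x, y) * q(y, z) / nu(y), skipping y with nu(y) = 0."""
    r = {}
    for (x, y), pxy in p.items():
        for (y2, z), qyz in q.items():
            if y == y2 and nu.get(y, 0) > 0:
                r[(x, y, z)] = pxy * qyz / nu[y]
    return r

def project(r, keep):
    """Marginal of r on the coordinates listed in `keep`."""
    out = {}
    for key, mass in r.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0) + mass
    return out

def expected_distance(coupling):
    """E[rho(A, B)] for rho(a, b) = |a - b| (an assumed example metric)."""
    return sum(mass * abs(a - b) for (a, b), mass in coupling.items())

# Assumed example: mu = (1/2, 1/2, 0), nu = (0, 1/2, 1/2), eta = (1/3, 0, 2/3).
nu = {1: F(1, 2), 2: F(1, 2)}
p = {(0, 1): F(1, 2), (1, 2): F(1, 2)}                   # a coupling of mu, nu
q = {(1, 0): F(1, 3), (1, 2): F(1, 6), (2, 2): F(1, 2)}  # a coupling of nu, eta

r = glue(p, q, nu)
xz = project(r, (0, 2))  # a coupling of mu, eta
print(expected_distance(xz), expected_distance(p) + expected_distance(q))
```

The projection on $(x, z)$ is generally not an optimal coupling of $\mu, \eta$; the proof only needs it to be some coupling, since $\rho_K(\mu, \eta)$ is an infimum.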
Summary so far
• Given a state space $\Omega$ and a metric $\rho$ between states, we can define a new metric $\rho_K$ between distributions
• We can use it to bound the total variation distance, if $\rho(x, y) \ge 1$ for all $x \ne y \in \Omega$:
• $\rho_K(\mu, \nu) \ge \|\mu - \nu\|_{TV}$
Path Coupling
Path Metric
• Suppose we have a connected graph $G = (V, E_0)$
• $V = \Omega$, the state space of a Markov chain
• The edges don't have to match permissible transitions
• In addition, we have a length function $\ell$ on edges
• $\ell(x, y) \ge 1$ for all edges $\{x, y\}$
• The length $\ell(s)$ of a path $s$ is the sum of $\ell(x, y)$ over the edges $\{x, y\}$ on the path
• The path metric on $\Omega$:
• $\rho(x, y) = \min\{\ell(s) : s \text{ is a path from } x \text{ to } y\}$
• Why is it a metric? It is non-negative and symmetric because path lengths are, and joining a shortest path from $x$ to $y$ with one from $y$ to $z$ gives a path from $x$ to $z$, which yields the triangle inequality
[Figure: the professor chain from the earlier slide, now with lengths $\ell$ drawn on some of the edges]
• $\rho(associate, dead) = 2$
• $\rho(street, dead) = 3$
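Computing the path metric from the edge lengths is an all-pairs shortest-path problem. The sketch below uses the Floyd-Warshall algorithm on a small made-up graph; the states `a` to `d` and their edge lengths are assumptions for illustration and do not reproduce the figure on the slide.

```python
import math

def path_metric(states, edge_lengths):
    """All-pairs shortest path lengths for an undirected graph (Floyd-Warshall)."""
    rho = {(a, b): (0 if a == b else math.inf) for a in states for b in states}
    for (a, b), length in edge_lengths.items():
        rho[(a, b)] = rho[(b, a)] = min(rho[(a, b)], length)
    for k in states:              # allow paths that pass through k
        for a in states:
            for b in states:
                via_k = rho[(a, k)] + rho[(k, b)]
                if via_k < rho[(a, b)]:
                    rho[(a, b)] = via_k
    return rho

# Hypothetical graph: every edge length is at least 1, as the slide requires.
states = ["a", "b", "c", "d"]
edge_lengths = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 4, ("c", "d"): 1}
rho = path_metric(states, edge_lengths)
print(rho[("a", "c")], rho[("a", "d")])
```

In this example the direct edge from `a` to `c` has length 4, but the path through `b` has length 3, so the path metric differs from the edge length there.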
Summary so far
• Given a state space $\Omega$ and a metric $\rho$ between states, we can define a new metric $\rho_K$ between distributions
• We can use it to bound the total variation distance, if $\rho(x, y) \ge 1$ for all $x \ne y \in \Omega$:
• $\rho_K(\mu, \nu) \ge \|\mu - \nu\|_{TV}$
• We can generate a metric $\rho$ between states by extending distances between some states with the path metric
Main Theorem (Bubley, Dyer)
• Let $G = (\Omega, E_0, \ell)$ and the path metric $\rho$ be as previously defined
• Assume that for each edge $\{x, y\} \in E_0$ we have a coupling $(X_1, Y_1)$ of the distributions $P(x, \cdot), P(y, \cdot)$ such that, for some $\alpha > 0$:
• $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x, y) = e^{-\alpha} \cdot \ell(x, y)$
• Then for any two distributions $\mu, \nu$ on $\Omega$:
• $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$
Motivation
• Recall: $\ell(x, y) \ge 1$ for all edges $\{x, y\}$
• Then $\rho(x, y) \ge \mathbf{1}_{\{x \ne y\}}$
• $\Pr\{X \ne Y\} = \mathbb{E}[\mathbf{1}_{\{X \ne Y\}}] \le \mathbb{E}[\rho(X, Y)]$
• Taking $\inf$ over all couplings $(X, Y)$ of $\mu, \nu$:
• $\|\mu - \nu\|_{TV} \le \rho_K(\mu, \nu)$
• So bounding $\rho_K$ (with the theorem) provides a bound on the mixing time
Bounding Mixing Time
• Corollary: Suppose that the hypotheses of the Bubley-Dyer theorem hold. Then:
$d(t) = \max_{x \in \Omega} \|P^t(x, \cdot) - \pi\|_{TV} \le e^{-\alpha t} \cdot diam(\Omega)$
where $diam(\Omega) = \max_{x,y \in \Omega} \rho(x, y)$ is the diameter of $\Omega$
• And consequently:
$t_{mix}(\epsilon) = \min\{t : d(t) \le \epsilon\} \le \frac{-\log \epsilon + \log diam(\Omega)}{\alpha}$
Corollary – Proof
• Recall that the theorem provides:
• $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$
• Applying it $t$ times: $\rho_K(\mu P^t, \nu P^t) \le e^{-\alpha t} \cdot \rho_K(\mu, \nu)$
• Also, by definition: $\rho_K(\mu, \nu) \le diam(\Omega)$
• Specifically:
• Choosing $\nu = \pi$ gives $\nu P^t = \pi P^t = \pi$
• Choosing $\mu$ to be the distribution that always returns $x$ gives $\mu P^t = P^t(x, \cdot)$
• $\|P^t(x, \cdot) - \pi\|_{TV} \le \rho_K(P^t(x, \cdot), \pi) \le e^{-\alpha t} \cdot diam(\Omega)$
Bubley and Dyer Theorem – Proof 1/5
• Have: a coupling $(X_1, Y_1)$ of $P(x, \cdot), P(y, \cdot)$ with $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x, y)$ for $\{x, y\} \in E_0$ (for $\alpha > 0$)
• Recall:
• $\rho_K(\mu, \nu) = \inf\{\mathbb{E}[\rho(X, Y)] : (X, Y) \text{ is a coupling of } \mu, \nu\}$
• So: $\rho_K(P(x, \cdot), P(y, \cdot)) \le e^{-\alpha} \cdot \rho(x, y)$ for edges $\{x, y\} \in E_0$
• Lemma: For all $x, y \in \Omega$:
$\rho_K(P(x, \cdot), P(y, \cdot)) \le e^{-\alpha} \cdot \rho(x, y)$
Bubley and Dyer Theorem – Proof 2/5
• Lemma: For all $x, y \in \Omega$:
$\rho_K(P(x, \cdot), P(y, \cdot)) \le e^{-\alpha} \cdot \rho(x, y)$
• Proof:
• Consider the shortest path (according to $\rho$) between $x, y$: $x = x_0, x_1, \dots, x_r = y$
• $\rho_K(P(x, \cdot), P(y, \cdot)) \le \sum_{i=1}^{r} \rho_K(P(x_{i-1}, \cdot), P(x_i, \cdot))$ (the triangle inequality)
• $\le e^{-\alpha} \cdot \sum_{i=1}^{r} \rho(x_{i-1}, x_i)$ (since $\rho_K(P(x, \cdot), P(y, \cdot)) \le e^{-\alpha} \cdot \rho(x, y)$ for edges)
• $= e^{-\alpha} \cdot \rho(x, y)$ (we chose the shortest path)
Bubley and Dyer Theorem – Proof 3/5
• Have: $\rho_K(P(x, \cdot), P(y, \cdot)) \le e^{-\alpha} \cdot \rho(x, y)$
• Want: $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$ for any two distributions $\mu, \nu$ on $\Omega$
• Let $\eta$ be an optimal coupling of $\mu, \nu$, i.e.:
• $\rho_K(\mu, \nu) = \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
• We can choose for every $x, y \in \Omega$ an optimal coupling $\theta_{x,y}$ of $P(x, \cdot), P(y, \cdot)$:
• $\sum_{u,w \in \Omega} \rho(u, w) \cdot \theta_{x,y}(u, w) \le e^{-\alpha} \cdot \rho(x, y)$
Bubley and Dyer Theorem – Proof 4/5
• Have: $\eta$ optimal coupling of $\mu, \nu$, and $\theta_{x,y}$ optimal coupling of $P(x, \cdot), P(y, \cdot)$
• $\rho_K(\mu, \nu) = \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
• $\sum_{u,w \in \Omega} \rho(u, w) \cdot \theta_{x,y}(u, w) \le e^{-\alpha} \cdot \rho(x, y)$
• Define a distribution $\theta = \sum_{x,y \in \Omega} \eta(x, y) \cdot \theta_{x,y}$
• In words: choose 2 starting states $(x, y)$ according to $\eta$, so that they have distributions $\mu, \nu$, and then advance them on the Markov chain using $\theta_{x,y}$
• $\theta$ is a coupling of $\mu P, \nu P$:
• $\sum_{w \in \Omega} \theta(u, w) = \sum_{x,y,w \in \Omega} \eta(x, y) \cdot \theta_{x,y}(u, w)$
• $= \sum_{x,y \in \Omega} \eta(x, y) \cdot \sum_{w \in \Omega} \theta_{x,y}(u, w) = \sum_{x,y \in \Omega} \eta(x, y) \cdot P(x, u)$
• $= \sum_{x \in \Omega} P(x, u) \cdot \sum_{y \in \Omega} \eta(x, y)$
• $= \sum_{x \in \Omega} P(x, u) \cdot \mu(x) = \mu P(u)$
• Similarly, $\sum_{u \in \Omega} \theta(u, w) = \nu P(w)$
Bubley and Dyer Theorem – Proof 5/5
• Have:
• $\theta = \sum_{x,y \in \Omega} \eta(x, y) \cdot \theta_{x,y}$, a coupling of $\mu P, \nu P$
• $\rho_K(\mu, \nu) = \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
• $\sum_{u,w \in \Omega} \rho(u, w) \cdot \theta_{x,y}(u, w) \le e^{-\alpha} \cdot \rho(x, y)$
• Want: $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$ for any two distributions $\mu, \nu$ on $\Omega$
• $\rho_K(\mu P, \nu P) \le \sum_{u,w \in \Omega} \rho(u, w) \cdot \theta(u, w)$
• $= \sum_{u,w \in \Omega} \sum_{x,y \in \Omega} \rho(u, w) \cdot \eta(x, y) \cdot \theta_{x,y}(u, w)$
• $\le e^{-\alpha} \cdot \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
• $= e^{-\alpha} \cdot \rho_K(\mu, \nu)$
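The key structural fact in the last two proof slides, that mixing per-pair couplings $\theta_{x,y}$ with weights $\eta(x, y)$ gives a coupling of $\mu P, \nu P$, can be verified on a toy chain. This is only a sketch: the 3-state transition matrix, the distributions, and the use of independent (product) couplings in place of optimal ones are assumptions made to keep the example short. The marginal computation does not depend on optimality.

```python
from fractions import Fraction as F

# Assumed toy chain: lazy random walk on the path 0 - 1 - 2.
P = [[F(1, 2), F(1, 2), 0],
     [F(1, 4), F(1, 2), F(1, 4)],
     [0, F(1, 2), F(1, 2)]]
states = range(3)

def step(mu):
    """One step of the chain applied to a distribution: mu -> mu P."""
    return [sum(mu[x] * P[x][u] for x in states) for u in states]

def product_coupling(x, y):
    """Independent coupling of P(x, .) and P(y, .); NOT optimal in general."""
    return {(u, w): P[x][u] * P[y][w] for u in states for w in states}

def mix_couplings(eta, theta_xy):
    """theta = sum over (x, y) of eta(x, y) * theta_{x,y}."""
    theta = {}
    for (x, y), weight in eta.items():
        for (u, w), mass in theta_xy(x, y).items():
            theta[(u, w)] = theta.get((u, w), 0) + weight * mass
    return theta

def marginals(theta):
    first = [sum(m for (u, _), m in theta.items() if u == s) for s in states]
    second = [sum(m for (_, w), m in theta.items() if w == s) for s in states]
    return first, second

mu = [1, 0, 0]
nu = [0, F(1, 2), F(1, 2)]
eta = {(0, 1): F(1, 2), (0, 2): F(1, 2)}   # a coupling of mu and nu
theta = mix_couplings(eta, product_coupling)
print(marginals(theta) == (step(mu), step(nu)))
```

To reproduce the contraction inequality itself one would need optimal (or at least contracting) couplings $\theta_{x,y}$, which this sketch deliberately does not attempt.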
Summary so far
• Given a state space $\Omega$ and a connected graph with lengths $\ell$ on edges, we can define a metric $\rho$ between states (path metric)
• We can then define a metric $\rho_K$ between distributions (transportation metric)
• Given a coupling that contracts $\rho$ in expectation across every edge, we can bound the mixing time (Corollary of the Bubley-Dyer theorem)
The Path Coupling Technique
• Analyze the mixing time of a Markov chain $M$
• Devise a way to advance a pair of states $(x, y) \in \Omega \times \Omega$:
• In both coordinates it looks like $M$ (= coupling)
• For some pairs of starting states, they tend to "get closer"
• A bound on $t_{mix}$ depends on how quickly the states get closer
The Path Coupling Technique
• Decide which states are "close"
• Define $G = (\Omega, E_0)$, and a length function $\ell$ on edges
• Extend $\ell$ to a metric $\rho$ on $\Omega$ using the path metric
• Devise a way to advance a pair of states $(x, y) \in \Omega \times \Omega$:
• In both coordinates it looks like $M$ (= coupling)
• For 2 adjacent starting states, they tend to "get closer"
• $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x, y)$
• A bound on $t_{mix}$ depends on $\alpha$ and $diam(\Omega)$
• $t_{mix}(\epsilon) \le \frac{-\log \epsilon + \log diam(\Omega)}{\alpha}$
• We want as many edges as we can (so $diam(\Omega)$ is smaller)
• And we want the distance to shrink rapidly (so $\alpha$ is bigger)
Fast Mixing for Colorings
Reminder: $q$-colorings
• Proper $q$-colorings of $G = (V, E)$ are elements $x \in \{1, 2, \dots, q\}^V$ s.t. $x(v) \ne x(w)$ for $\{v, w\} \in E$
• Many uses for coloring problems
• Voting choices on the graph of (people, friendships)
• People tend to change votes according to friends
• Time slots on the graph of (tasks, conflicts) = scheduling
• Conflicting tasks can't be executed in the same time slot
• Many things are interesting to analyze:
• What does a "random" coloring look like?
• How long does it take a process to converge?
Reminder
Metropolis chain:
• A vertex $v$ is chosen uniformly at random
• A color $k$ is chosen uniformly at random
• If updating the color of $v$ to $k$ yields a proper $q$-coloring, accept it
Glauber dynamics:
• A vertex $v$ is chosen uniformly at random
• A color $k$ is chosen uniformly from the admissible colors
• Admissible colors: colors that don't appear at the neighbors of $v$
• Why are they different?
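One way to see the difference between the two chains is to write both update rules side by side. The sketch below is an illustration under stated assumptions: the graph is given as adjacency lists, colors are numbered $0, \dots, q-1$ rather than $1, \dots, q$, and the function names are made up. The Metropolis step may propose a forbidden color and then leaves the coloring unchanged, while the Glauber step always draws from the allowed colors.

```python
import random

def is_allowed(graph, coloring, v, color):
    """True if no neighbor of v currently has this color."""
    return all(coloring[u] != color for u in graph[v])

def metropolis_step(graph, coloring, q, rng):
    """Propose a uniform color; accept only if the coloring stays proper."""
    v = rng.randrange(len(coloring))
    k = rng.randrange(q)
    if is_allowed(graph, coloring, v, k):
        coloring[v] = k          # otherwise the chain stays where it is
    return coloring

def glauber_step(graph, coloring, q, rng):
    """Pick uniformly among the colors allowed at v (never rejects)."""
    v = rng.randrange(len(coloring))
    allowed = [k for k in range(q) if is_allowed(graph, coloring, v, k)]
    coloring[v] = rng.choice(allowed)   # non-empty whenever q > deg(v)
    return coloring

# Hypothetical example: a 4-cycle with q = 5 colors.
graph = [[1, 3], [0, 2], [1, 3], [0, 2]]
rng = random.Random(0)
coloring = [0, 1, 0, 1]
for _ in range(100):
    coloring = glauber_step(graph, coloring, 5, rng)
print(coloring)
```

Both steps keep a proper coloring proper; they differ only in how often the chosen vertex actually gets a fresh color.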
Theorem
• Consider the Glauber dynamics chain for proper $q$-colorings of a graph $G = (V, E)$ with $n$ vertices and maximum degree $\Delta$
• If $q > 2 \cdot \Delta$, then the mixing time satisfies:
• $t_{mix}(\epsilon) \le \frac{q - \Delta}{q - 2\Delta} \cdot n \cdot (\log n - \log \epsilon)$
Comparison
Metropolis chain:
• $q > 2 \cdot \Delta$
• $t_{mix}(\epsilon) \le \frac{q}{q - 2\Delta} \cdot n \cdot (\log n - \log \epsilon)$
Glauber dynamics:
• $q > 2 \cdot \Delta$
• $t_{mix}(\epsilon) \le \frac{q - \Delta}{q - 2\Delta} \cdot n \cdot (\log n - \log \epsilon)$
Theorem – Proof 1/8
• Two colorings are neighbors if they differ in exactly 1 node
• This defines the graph $G = (\Omega, E_0)$
• Define $\ell(x, y) = 1$ for edges $\{x, y\} \in E_0$
• The metric is $\rho(x, y) = \sum_{v \in V} \mathbf{1}_{\{x(v) \ne y(v)\}}$
• We only need to define a way to generate 2 new colorings starting with neighboring colorings $\{x, y\} \in E_0$!
• Denote the unique vertex where $x, y$ differ by $v$
• Denote by $A_w(x)$ the set of allowable colors for node $w$ in coloring $x$
Theorem – Proof 2/8
• Choose a vertex $w$ uniformly at random
• If $w$ is not a neighbor of $v$, choose a color at random from $A_w(x) = A_w(y)$, and update $w$ with it in both colorings
• Works when $w = v$ as well
• So far, it's consistent with a coupling
• If $w$ is a neighbor of $v$, assume WLOG $|A_w(x)| \le |A_w(y)|$
• Choose a random color $U \in A_w(y)$, update $y$ at $w$ with $U$
• The update of $x$ at $w$ depends on the configuration
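Theorem – Proof 3/8
Case: $x(v) \notin A_w(y)$, $y(v) \notin A_w(x)$
[Figure: colorings $x$ and $y$ around the vertices $v$ and $w$, shown next to the palette "All Colors"]
• We have the same allowable colors in both configurations, so color $w$ in $x$ with $U$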
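Theorem – Proof 4/8
Case: $x(v) \in A_w(y)$, $y(v) \in A_w(x)$
[Figure: colorings $x$ and $y$ around the vertices $v$ and $w$, shown next to the palette "All Colors"; in the figure $x(v)$ is black and $y(v)$ is purple]
• If $U$ is not black ($U \ne x(v)$), we can color $w$ in $x$ with it
• If $U$ is black ($U = x(v)$), we can swap it for purple ($y(v)$)
• All allowable colors for $w$ in $x$ are chosen with equal probability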
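Theorem – Proof 5/8
Case: $x(v) \in A_w(y)$, $y(v) \notin A_w(x)$
[Figure: colorings $x$ and $y$ around the vertices $v$ and $w$, shown next to the palette "All Colors"; in the figure $x(v)$ is black]
• If $U$ is not black ($U \ne x(v)$), we can color $w$ in $x$ with it
• If $U$ is black ($U = x(v)$), we draw a uniformly random allowable color for $w$ in $x$
• The probability of every color in $A_w(x)$ is (here $|A_w(y)| = 4$ and $|A_w(x)| = 3$): $\frac{1}{4} + \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{3}$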
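The three cases can be folded into one routine that computes the exact joint law of the coupled update, which makes it possible to check both that each coordinate is a Glauber step and how the distance behaves. This is a sketch under assumptions: colorings are tuples, colors are numbered $0, \dots, q-1$, the graph is given as adjacency lists, and all function names and the example graph are hypothetical. "Small" and "big" below stand for the colorings with fewer and more allowed colors at $w$, which is the WLOG from the slides.

```python
from fractions import Fraction

def allowed(graph, coloring, w, q):
    """A_w(coloring): colors not used by any neighbor of w."""
    used = {coloring[u] for u in graph[w]}
    return [c for c in range(q) if c not in used]

def recolor(coloring, w, c):
    return coloring[:w] + (c,) + coloring[w + 1:]

def glauber_law(graph, x, q):
    """Exact law {x1: prob} of one Glauber step from x."""
    n, law = len(x), {}
    for w in range(n):
        a = allowed(graph, x, w, q)
        for c in a:
            x1 = recolor(x, w, c)
            law[x1] = law.get(x1, 0) + Fraction(1, n * len(a))
    return law

def coupled_step(graph, x, y, v, q):
    """Exact joint law {(x1, y1): prob}; x, y differ only at vertex v."""
    n, law = len(x), {}
    for w in range(n):
        ax, ay = allowed(graph, x, w, q), allowed(graph, y, w, q)
        if v not in graph[w]:            # includes w == v; here ax == ay
            outcomes = [(recolor(x, w, c), recolor(y, w, c),
                         Fraction(1, n * len(ax))) for c in ax]
        else:
            swap = len(ax) > len(ay)     # WLOG: "small" has fewer colors
            small, big = (y, x) if swap else (x, y)
            a_small, a_big = (ay, ax) if swap else (ax, ay)
            outcomes = []
            for u in a_big:              # U is uniform on the bigger set
                p = Fraction(1, n * len(a_big))
                if u != small[v]:        # U is allowed in both colorings
                    picks = [(u, p)]
                elif big[v] in a_small:  # swap U for the other color of v
                    picks = [(big[v], p)]
                else:                    # redraw uniformly for "small"
                    picks = [(c, p / len(a_small)) for c in a_small]
                for c, pc in picks:
                    s1, b1 = recolor(small, w, c), recolor(big, w, u)
                    outcomes.append((b1, s1, pc) if swap else (s1, b1, pc))
        for x1, y1, p in outcomes:
            law[(x1, y1)] = law.get((x1, y1), 0) + p
    return law

def expected_hamming(law):
    return sum(p * sum(a != b for a, b in zip(x1, y1))
               for (x1, y1), p in law.items())

# Hypothetical example: path 0 - 1 - 2, q = 5, colorings differing at v = 0.
graph = [[1], [0, 2], [1]]
law = coupled_step(graph, (0, 1, 2), (2, 1, 2), 0, 5)
print(expected_hamming(law))
```

In this example the neighbor $w = 1$ falls into the third case, with 4 allowed colors in one coloring and 3 in the other, matching the arithmetic on the slide.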
Theorem – Proof 6/8
• We have a coupling $(X_1, Y_1)$ of $P(x, \cdot), P(y, \cdot)$ for two states $x, y$ that differ only at vertex $v$
• Now we need to bound $\mathbb{E}_{x,y}[\rho(X_1, Y_1)]$ (by some function $e^{-\alpha} \cdot \rho(x, y)$) in order to use the corollary
• $\rho(X_1, Y_1)$ decreases to 0 iff we chose $v$, i.e. w.p. $1/n$
• $\rho(X_1, Y_1)$ increases to 2 iff we chose a neighbor $w$ of $v$, w.p. $\deg(v)/n$, and we updated with different colors
• So we got $U = x(v)$, w.p. $\le \frac{1}{|A_w(y)|} \le \frac{1}{q - \Delta}$
• In total: $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} + \frac{\deg(v)}{n} \cdot \frac{1}{q - \Delta}$
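Theorem – Proof 7/8
• $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} + \frac{\deg(v)}{n} \cdot \frac{1}{q - \Delta}$
• Since $\deg(v) \le \Delta$: $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} \cdot \left(1 - \frac{\Delta}{q - \Delta}\right)$
• Since $q > 2 \cdot \Delta$, $\frac{\Delta}{q - \Delta} < 1$, so $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] < 1$
• The distance is indeed decreasing in expectation
• Denoting $c(\Delta, q) = 1 - \frac{\Delta}{q - \Delta} = \frac{q - 2\Delta}{q - \Delta}$ and using the inequality $e^x \ge 1 + x$ we get:
$\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{c(\Delta, q)}{n} \le e^{-\frac{c(\Delta, q)}{n}}$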
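Theorem – Proof 8/8
• $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\frac{c(\Delta, q)}{n}}$, where $\rho(x, y) = 1$ for neighboring colorings
• Applying the corollary ($t_{mix}(\epsilon) \le \frac{-\log \epsilon + \log diam(\Omega)}{\alpha}$) with $\alpha = \frac{c(\Delta, q)}{n}$ and $diam(\Omega) \le n$:
$t_{mix}(\epsilon) \le \frac{-\log \epsilon + \log n}{c(\Delta, q) / n} = \frac{q - \Delta}{q - 2\Delta} \cdot n \cdot (\log n - \log \epsilon)$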
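To get a feel for the size of this bound, it can be evaluated for concrete parameters. The sketch below is illustrative: the function names and the example values of $n$, $\Delta$, $q$ and $\epsilon$ are assumptions, and the result is rounded up because $t$ is an integer number of steps.

```python
import math

def contraction_rate(n, max_degree, q):
    """alpha = c(Delta, q) / n with c(Delta, q) = (q - 2 Delta) / (q - Delta)."""
    if q <= 2 * max_degree:
        raise ValueError("the bound requires q > 2 * max_degree")
    return (q - 2 * max_degree) / ((q - max_degree) * n)

def glauber_tmix_bound(n, max_degree, q, eps):
    """(q - Delta) / (q - 2 Delta) * n * (log n - log eps), rounded up."""
    alpha = contraction_rate(n, max_degree, q)
    return math.ceil((math.log(n) - math.log(eps)) / alpha)

# Hypothetical parameters: 100 vertices, maximum degree 4, 9 colors.
n, max_degree, q = 100, 4, 9
alpha = contraction_rate(n, max_degree, q)
print(alpha, glauber_tmix_bound(n, max_degree, q, 0.25))
# The step 1 - alpha <= exp(-alpha) used on the previous slide:
print(1 - alpha <= math.exp(-alpha))
```

As $q$ approaches $2\Delta$ from above, $c(\Delta, q)$ tends to 0 and the bound blows up, which is why the theorem needs $q > 2 \cdot \Delta$.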
Known Results
• Some more results for Glauber dynamics on proper $q$-colorings:
• For $q \ge \frac{11}{6} \cdot \Delta$, it is known that $t_{mix} = O(n \cdot \log n)$
• For triangle-free graphs with maximum degree $\Delta = \Omega(\log n)$, we can improve the bound on $q$ to: $q \ge 1.49\ldots \cdot \Delta$
• For the empty graph: $t_{mix} \ge \frac{1}{2} \cdot n \cdot \log n - c(q) \cdot n$
Questions?