The Transportation Metric and Path Coupling

By Levin, Peres and Wilmer
Presented by Oleg Zlydenko
Coupling
• Analyze the mixing time of a Markov chain 𝑀
• Devise a way to advance a pair of states $(x, y) \in \Omega \times \Omega$:
  • In both coordinates it looks like 𝑀 (= coupling)
  • For any two starting states, they eventually meet
• A bound on $t_{mix}$ depends on the time it takes the chains to meet
Review
• Theorem: Let $\{(X_t, Y_t)\}_{t=0}^{\infty}$ be a sticky coupling of $P$ (the chains stay together once they meet), where $X_0 = x$, $Y_0 = y$. Let $\tau_{couple}$ be the first time the chains meet.
  Then: $\|P^t(x,\cdot) - P^t(y,\cdot)\|_{TV} \le P_{x,y}\{\tau_{couple} > t\}$
• Which leads to: $d(t) \le \max_{x,y \in \Omega} P_{x,y}\{\tau_{couple} > t\}$
• Then we find the minimal $t$ that ensures $d(t) < \epsilon$
• This is a bound on $t_{mix}(\epsilon)$
The Path Coupling Technique
• Analyze the mixing time of a Markov chain 𝑀
• Devise a way to advance a pair of states $(x, y) \in \Omega \times \Omega$:
Coupling
• In both coordinates it looks like 𝑀 (= coupling)
• For any two starting states, they eventually meet
• A bound on $t_{mix}$ depends on the time it takes the chains to meet
Path Coupling
• In both coordinates it looks like 𝑀 (= coupling)
• For some two starting states, they tend to “get closer”
• A bound on $t_{mix}$ depends on how quickly the states get closer
Plan
• The Transportation Metric
• The main theorem of Path Coupling
• Bounding mixing time
• Example – Fast Mixing for Colorings
The Transportation Metric
The Transportation Metric
• We have a state space Ω and metric 𝜌 between
states
[Figure: Markov chain on the states Assistant Professor, Associate Professor, Tenured Professor, Out on the Street and Dead, with transition probabilities on the arrows]
• $\rho$ could be the difference in salaries
  • $\rho(\text{street}, \text{dead}) = 3$
  • $\rho(\text{assistant}, \text{dead}) = 15$
• $\rho$ is a metric
  • Non-negative
  • Symmetric
  • Triangle inequality
The Transportation Metric
• We have a state space Ω and metric 𝜌 between
states
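• The transportation metric (or Wasserstein metric) is defined on distributions over $\Omega$:
  • $\rho_K(\mu, \nu) = \inf\{\mathbb{E}[\rho(X, Y)] : (X, Y) \text{ is a coupling of } \mu, \nu\}$
  • Intuitively, it is the distance between two distributions
  • If $\rho(x, y) = \mathbf{1}_{x \ne y}$, then $\rho_K(\mu, \nu) = \|\mu - \nu\|_{TV}$
  • If $\rho(x, y) \ge \mathbf{1}_{x \ne y}$, then $\rho_K(\mu, \nu) \ge \|\mu - \nu\|_{TV}$
• Recall: $\|\mu - \nu\|_{TV} = \inf_{(X,Y)} \Pr\{X \ne Y\}$, where the infimum is over couplings $(X, Y)$ of $\mu, \nu$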
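The identity for the indicator metric can be checked on a small example. The sketch below is illustrative only: the function names and the two distributions are assumptions, not part of the slides. It builds the coupling that keeps the shared mass on the diagonal and confirms that its expected cost under $\rho(x, y) = \mathbf{1}_{x \ne y}$ equals the total variation distance.

```python
from fractions import Fraction


def total_variation(mu, nu):
    """Total variation distance: half the L1 distance between mu and nu."""
    states = set(mu) | set(nu)
    return sum(abs(mu.get(s, 0) - nu.get(s, 0)) for s in states) / 2


def diagonal_coupling(mu, nu):
    """Return a coupling {(x, y): prob} of mu and nu that maximizes Pr{X = Y}."""
    states = sorted(set(mu) | set(nu))
    tv = total_variation(mu, nu)
    joint = {}
    for s in states:
        # Shared mass min(mu, nu) stays on the diagonal (X = Y = s).
        shared = min(mu.get(s, 0), nu.get(s, 0))
        if shared > 0:
            joint[(s, s)] = shared
    for x in states:
        excess_x = max(mu.get(x, 0) - nu.get(x, 0), 0)
        for y in states:
            excess_y = max(nu.get(y, 0) - mu.get(y, 0), 0)
            # Leftover mass is spread off the diagonal in product form.
            if excess_x > 0 and excess_y > 0:
                joint[(x, y)] = excess_x * excess_y / tv
    return joint


def expected_cost(joint, rho):
    """E[rho(X, Y)] under the joint distribution."""
    return sum(prob * rho(x, y) for (x, y), prob in joint.items())


# Hypothetical example distributions on the states a, b, c.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
nu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
joint = diagonal_coupling(mu, nu)
indicator = lambda x, y: 1 if x != y else 0
print(total_variation(mu, nu))          # 1/2
print(expected_cost(joint, indicator))  # 1/2
```

For a general metric $\rho$ the optimal coupling usually has to be found by solving a linear program; the diagonal construction is only guaranteed optimal for the indicator metric.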
Optimal Coupling
• An optimal coupling $(X_*, Y_*)$ of $\mu, \nu$ achieves:
  • $\rho_K(\mu, \nu) = \mathbb{E}[\rho(X_*, Y_*)]$
• Lemma 1: an optimal coupling exists
  • We already used this Lemma in a previous lesson, for $\rho(x, y) = \mathbf{1}_{x \ne y}$
Metric
• Lemma 2: $\rho_K$ is a metric on the space of probability distributions on $\Omega$
  • Non-negativity: $\rho$ is a metric, hence non-negative, so $\rho_K$ is an infimum of a set of non-negative numbers
  • Symmetry: $\mathbb{E}[\rho(X, Y)] = \mathbb{E}[\rho(Y, X)]$
  • Triangle inequality:
    • For 3 distributions $\mu, \nu, \eta$ on $\Omega$: $\rho_K(\mu, \nu) + \rho_K(\nu, \eta) \ge \rho_K(\mu, \eta)$
    • Proof over next few slides
Lemma 2 – Proof 1/2
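• Let $\mu, \nu, \eta$ be distributions on $\Omega$
• Let $p, q$ be the distributions on $\Omega \times \Omega$ of the optimal couplings of $(\mu, \nu)$ and $(\nu, \eta)$ respectively
• Define a distribution: $r(x, y, z) = \frac{p(x, y) \cdot q(y, z)}{\nu(y)}$ (with $r(x, y, z) = 0$ when $\nu(y) = 0$)
• The projection of $r$ on the first 2 coordinates is $p$
  • $\sum_{z \in \Omega} r(x, y, z) = \frac{p(x, y)}{\nu(y)} \cdot \sum_{z \in \Omega} q(y, z) = p(x, y)$, since $\sum_{z \in \Omega} q(y, z) = \nu(y)$
• Similarly, the projection of $r$ on the last 2 coordinates is $q$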
Intuition
[Figure: a cube over $\Omega^3$, labeled with $\mu$, $\eta$ and $p$]
• $r(x, y, z) = \frac{p(x, y) \cdot q(y, z)}{\nu(y)}$ is a distribution on $\Omega^3$
  • Each cell of a cube holds a probability
• The projection on $(x, y)$ is $p$ (a coupling of $\mu, \nu$)
  • The projection on $x$ is $\mu$, the projection on $y$ is $\nu$
• The projection on $(y, z)$ is $q$ (a coupling of $\nu, \eta$)
  • The projection on $y$ is $\nu$, the projection on $z$ is $\eta$
• So the projection on $(x, z)$ is a coupling of $\mu, \eta$
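The gluing construction above can be tried numerically. This is a minimal sketch under assumed inputs: the couplings `p` and `q` and the helper names `glue` and `project` are made up for illustration. It builds $r$ and checks its projections.

```python
from fractions import Fraction


def glue(p, q, nu):
    """Build r(x, y, z) = p(x, y) * q(y, z) / nu(y), skipping y with nu(y) = 0."""
    r = {}
    for (x, y), p_xy in p.items():
        for (y2, z), q_yz in q.items():
            if y == y2 and nu[y] > 0:
                r[(x, y, z)] = p_xy * q_yz / nu[y]
    return r


def project(r, keep):
    """Sum out all coordinates of r except the positions listed in keep."""
    out = {}
    for cell, prob in r.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, 0) + prob
    return out


# Hypothetical example on Omega = {0, 1}:
# mu = (1/2, 1/2), nu = (1/4, 3/4), eta = (1/2, 1/2).
nu = {0: Fraction(1, 4), 1: Fraction(3, 4)}
p = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2)}  # couples mu, nu
q = {(0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 2)}  # couples nu, eta

r = glue(p, q, nu)
print(project(r, (0, 1)) == p)  # True: the (x, y) projection is p
print(project(r, (1, 2)) == q)  # True: the (y, z) projection is q
print(project(r, (0, 2)))       # a coupling of mu and eta
```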
Lemma 2 – Proof 2/2
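• Let $(X, Y, Z)$ be a random vector with distribution $r$
• Since $\rho$ is a metric:
  • $\rho(X, Z) \le \rho(X, Y) + \rho(Y, Z)$
• Take expectation on both sides:
  • $\mathbb{E}[\rho(X, Z)] \le \mathbb{E}[\rho(X, Y)] + \mathbb{E}[\rho(Y, Z)]$
• Since $p, q$ are distributions of the optimal couplings:
  • $\mathbb{E}[\rho(X, Y)] + \mathbb{E}[\rho(Y, Z)] = \rho_K(\mu, \nu) + \rho_K(\nu, \eta)$
• Note that $(X, Z)$ is a coupling of $\mu, \eta$:
  • $\rho_K(\mu, \eta) \le \mathbb{E}[\rho(X, Z)] \le \rho_K(\mu, \nu) + \rho_K(\nu, \eta)$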
Summary so far
• Given a state space Ω and metric 𝜌 between states,
we can define a new metric 𝜌𝐾 between distributions
• We can use it to bound the Total Variation distance, if $\rho(x, y) \ge 1$ for all $x \ne y \in \Omega$:
  • $\rho_K(\mu, \nu) \ge \|\mu - \nu\|_{TV}$
Path Coupling
Path Metric
• Suppose we have a connected graph $G = (V, E_0)$
  • $V = \Omega$, the state space of a Markov chain
  • The edges don’t have to match permissible transitions
• In addition, we have a length function $\ell$ on edges
  • $\ell(x, y) \ge 1$ for all edges $\{x, y\}$
• The length of a path is the sum of $\ell(x, y)$ over the edges $\{x, y\}$ on the path
• The path metric on $\Omega$:
  • $\rho(x, y) = \min\{\ell(s) : s \text{ is a path from } x \text{ to } y\}$
  • Why is it a metric?
[Figure: the same Markov chain (Assistant Professor, Associate Professor, Tenured Professor, Out on the Street, Dead), now with edge lengths shown alongside the transition probabilities]
• $\rho(\text{associate}, \text{dead}) = 2$
• $\rho(\text{street}, \text{dead}) = 3$
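A path metric can be computed from the edge lengths with an all-pairs shortest path routine. The sketch below uses Floyd-Warshall on a made-up four-vertex graph; the vertex names and lengths are assumptions, since the lengths in the slide's figure are not recoverable.

```python
import math


def path_metric(vertices, edge_lengths):
    """All-pairs shortest path lengths (Floyd-Warshall) on an undirected graph.

    edge_lengths maps an edge (x, y) to its length; each length should be >= 1.
    """
    rho = {(x, y): (0 if x == y else math.inf) for x in vertices for y in vertices}
    for (x, y), length in edge_lengths.items():
        rho[(x, y)] = min(rho[(x, y)], length)
        rho[(y, x)] = min(rho[(y, x)], length)
    for k in vertices:
        for x in vertices:
            for y in vertices:
                # Relax the x -> y distance through the intermediate vertex k.
                if rho[(x, k)] + rho[(k, y)] < rho[(x, y)]:
                    rho[(x, y)] = rho[(x, k)] + rho[(k, y)]
    return rho


# Hypothetical graph: the direct edge a-c (length 4) is longer than a-b-c (length 3).
vertices = ["a", "b", "c", "d"]
edge_lengths = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 4, ("c", "d"): 1}
rho = path_metric(vertices, edge_lengths)
print(rho[("a", "c")])  # 3
print(rho[("a", "d")])  # 4
```

Symmetry holds because the edges are undirected, and the triangle inequality holds because joining a shortest path from $x$ to $y$ with one from $y$ to $z$ gives some path from $x$ to $z$.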
Summary so far
• Given a state space Ω and metric 𝜌 between states,
we can define a new metric 𝜌𝐾 between distributions
• We can use it to bound the Total Variation distance, if $\rho(x, y) \ge 1$ for all $x \ne y \in \Omega$:
  • $\rho_K(\mu, \nu) \ge \|\mu - \nu\|_{TV}$
• We can generate a metric $\rho$ between states by extending distances between some states with the path metric
Main Theorem (Bubley, Dyer)
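• Let $G = (\Omega, E_0)$, the length function $\ell$ and the path metric $\rho$ be as previously defined
• Assume that for each edge $\{x, y\} \in E_0$ we have a coupling $(X_1, Y_1)$ of the distributions $P(x,\cdot)$, $P(y,\cdot)$ such that, for some $\alpha > 0$:
  • $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x, y) = e^{-\alpha} \cdot \ell(x, y)$
• Then for any two distributions $\mu, \nu$ on $\Omega$:
  • $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$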
Motivation
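• Recall: $\ell(x, y) \ge 1$ for all edges $\{x, y\}$
  • Then $\rho(x, y) \ge \mathbf{1}_{x \ne y}$
  • $\Pr\{X \ne Y\} = \mathbb{E}[\mathbf{1}_{X \ne Y}] \le \mathbb{E}[\rho(X, Y)]$
• Taking $\inf$ over all couplings $(X, Y)$ of $\mu, \nu$:
  • $\|\mu - \nu\|_{TV} \le \rho_K(\mu, \nu)$
• So bounding $\rho_K$ (with the theorem) provides a bound on the mixing time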
Bounding Mixing Time
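• Corollary: Suppose that the hypotheses of the Bubley-Dyer Theorem hold. Then:
  $d(t) = \max_{x \in \Omega} \|P^t(x,\cdot) - \pi\|_{TV} \le e^{-\alpha t} \cdot \mathrm{diam}(\Omega)$
  where $\mathrm{diam}(\Omega) = \max_{x,y \in \Omega} \rho(x, y)$ is the diameter of $\Omega$
• And consequently:
  $t_{mix}(\epsilon) = \min\{t : d(t) \le \epsilon\} \le \left\lceil \frac{-\log \epsilon + \log \mathrm{diam}(\Omega)}{\alpha} \right\rceil$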
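As a quick sanity check of the corollary, the bound can be evaluated numerically. The helper below is a sketch with a hypothetical name and made-up parameter values; it rounds up to an integer number of steps and confirms that $e^{-\alpha t} \cdot \mathrm{diam}(\Omega)$ first drops below $\epsilon$ at that step.

```python
import math


def path_coupling_tmix_bound(alpha, diameter, eps):
    """Smallest integer t with exp(-alpha * t) * diameter <= eps."""
    if alpha <= 0 or diameter < 1 or not 0 < eps < 1:
        raise ValueError("need alpha > 0, diameter >= 1 and 0 < eps < 1")
    return math.ceil((math.log(diameter) - math.log(eps)) / alpha)


# Illustrative values only: contraction rate 0.1, diameter 10, eps = 1/4.
t = path_coupling_tmix_bound(alpha=0.1, diameter=10, eps=0.25)
print(t)                                 # 37
print(math.exp(-0.1 * t) * 10 <= 0.25)   # True
print(math.exp(-0.1 * (t - 1)) * 10 <= 0.25)  # False
```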
Corollary – Proof
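• Recall that the theorem provides:
  • $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$
  • Applying it $t$ times: $\rho_K(\mu P^t, \nu P^t) \le e^{-\alpha t} \cdot \rho_K(\mu, \nu)$
• Also, by definition: $\rho_K(\mu, \nu) \le \mathrm{diam}(\Omega)$
• Specifically:
  • Choosing $\nu = \pi$ gives $\nu P^t = \pi P^t = \pi$
  • Choosing $\mu$ to be the distribution that always returns $x$ gives $\mu P^t = P^t(x,\cdot)$
  • $\|P^t(x,\cdot) - \pi\|_{TV} \le \rho_K(P^t(x,\cdot), \pi) \le e^{-\alpha t} \cdot \mathrm{diam}(\Omega)$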
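The step from the bound on $d(t)$ to the bound on $t_{mix}(\epsilon)$ is not spelled out on the slide. One way to fill it in, using only the quantities already defined, is to solve for $t$:

```latex
\begin{align*}
e^{-\alpha t} \cdot \mathrm{diam}(\Omega) \le \epsilon
  &\iff -\alpha t + \log \mathrm{diam}(\Omega) \le \log \epsilon \\
  &\iff t \ge \frac{-\log \epsilon + \log \mathrm{diam}(\Omega)}{\alpha}.
\end{align*}
```

Any integer $t$ at least this large gives $d(t) \le \epsilon$, so $t_{mix}(\epsilon)$ is at most the right-hand side rounded up.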
Bubley and Dyer Theorem – Proof 1/5
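• Have: $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x, y)$ for $\{x, y\} \in E_0$ (for $\alpha > 0$)
• Recall:
  • $\rho_K(\mu, \nu) = \inf\{\mathbb{E}[\rho(X, Y)] : (X, Y) \text{ is a coupling of } \mu, \nu\}$
• So, for edges $\{x, y\} \in E_0$: $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$
• Lemma: For all $x, y \in \Omega$:
  $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$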
Bubley and Dyer Theorem – Proof 2/5
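• Lemma: For all $x, y \in \Omega$:
  $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$
• Proof:
  • Consider a shortest path in $G$ between $x$ and $y$ (its length is $\rho(x, y)$):
    $x = x_0, x_1, \ldots, x_r = y$
  • $\rho_K(P(x,\cdot), P(y,\cdot)) \le \sum_{i=1}^{r} \rho_K(P(x_{i-1},\cdot), P(x_i,\cdot))$ (the triangle inequality)
  • $\le e^{-\alpha} \cdot \sum_{i=1}^{r} \rho(x_{i-1}, x_i)$ (since $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$ for edges)
  • $= e^{-\alpha} \cdot \rho(x, y)$ (we chose the shortest path)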
Bubley and Dyer Theorem – Proof 3/5
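• Have: $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x, y)$
• Want: $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$ for any two distributions $\mu, \nu$ on $\Omega$
• Let $\eta$ be an optimal coupling of $\mu, \nu$, i.e.:
  • $\rho_K(\mu, \nu) = \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
• We can choose for every $x, y \in \Omega$ an optimal coupling $\theta_{x,y}$ of $P(x,\cdot), P(y,\cdot)$:
  • $\sum_{u,w \in \Omega} \rho(u, w) \cdot \theta_{x,y}(u, w) \le e^{-\alpha} \cdot \rho(x, y)$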
Bubley and Dyer Theorem – Proof 4/5
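• Have: $\eta$ optimal coupling of $\mu, \nu$, and $\theta_{x,y}$ optimal coupling of $P(x,\cdot), P(y,\cdot)$
  • $\rho_K(\mu, \nu) = \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
  • $\sum_{u,w \in \Omega} \rho(u, w) \cdot \theta_{x,y}(u, w) \le e^{-\alpha} \cdot \rho(x, y)$
• Define a distribution $\theta = \sum_{x,y \in \Omega} \eta(x, y) \cdot \theta_{x,y}$
  • Intuition: choose 2 starting states $(x, y)$ according to $\eta$ (so they have marginals $\mu, \nu$), and then advance them one step on the Markov chain using $\theta_{x,y}$
• $\theta$ is a coupling of $\mu P, \nu P$:
  • $\sum_{w \in \Omega} \theta(u, w) = \sum_{x,y,w \in \Omega} \eta(x, y) \cdot \theta_{x,y}(u, w)$
  • $= \sum_{x,y \in \Omega} \eta(x, y) \cdot \sum_{w \in \Omega} \theta_{x,y}(u, w) = \sum_{x,y \in \Omega} \eta(x, y) \cdot P(x, u)$
  • $= \sum_{x \in \Omega} P(x, u) \cdot \sum_{y \in \Omega} \eta(x, y)$
  • $= \sum_{x \in \Omega} P(x, u) \cdot \mu(x) = \mu P(u)$
  • Similarly, $\sum_{u \in \Omega} \theta(u, w) = \nu P(w)$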
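The marginal computation above does not depend on the $\theta_{x,y}$ being optimal; it holds for any couplings of $P(x,\cdot)$ and $P(y,\cdot)$. The sketch below checks it on a made-up two-state chain, using independent couplings as stand-ins. All names and numbers are assumptions for illustration.

```python
from fractions import Fraction


def push_forward(mu, P):
    """One step of the chain: (mu P)(u) = sum_x mu(x) * P(x, u)."""
    out = {}
    for x, mass in mu.items():
        for u, prob in P[x].items():
            out[u] = out.get(u, 0) + mass * prob
    return out


def independent_coupling(P, x, y):
    """A (generally non-optimal) coupling of P(x, .) and P(y, .)."""
    return {(u, w): pu * pw for u, pu in P[x].items() for w, pw in P[y].items()}


def mix_couplings(eta, P):
    """theta = sum_{x,y} eta(x, y) * theta_{x,y}."""
    theta = {}
    for (x, y), weight in eta.items():
        for (u, w), prob in independent_coupling(P, x, y).items():
            theta[(u, w)] = theta.get((u, w), 0) + weight * prob
    return theta


def marginals(joint):
    """Return the first and second marginals of a joint distribution."""
    first, second = {}, {}
    for (u, w), prob in joint.items():
        first[u] = first.get(u, 0) + prob
        second[w] = second.get(w, 0) + prob
    return first, second


# Hypothetical two-state chain and a coupling eta of mu and nu.
P = {0: {0: Fraction(1, 2), 1: Fraction(1, 2)},
     1: {0: Fraction(1, 4), 1: Fraction(3, 4)}}
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: Fraction(1, 4), 1: Fraction(3, 4)}
eta = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2)}

first, second = marginals(mix_couplings(eta, P))
print(first == push_forward(mu, P))   # True
print(second == push_forward(nu, P))  # True
```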
Bubley and Dyer Theorem – Proof 5/5
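• Have:
  • $\theta = \sum_{x,y \in \Omega} \eta(x, y) \cdot \theta_{x,y}$, a coupling of $\mu P, \nu P$
  • $\rho_K(\mu, \nu) = \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
  • $\sum_{u,w \in \Omega} \rho(u, w) \cdot \theta_{x,y}(u, w) \le e^{-\alpha} \cdot \rho(x, y)$
• Want: $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$ for any two distributions $\mu, \nu$ on $\Omega$
  • $\rho_K(\mu P, \nu P) \le \sum_{u,w \in \Omega} \rho(u, w) \cdot \theta(u, w)$
  • $= \sum_{u,w \in \Omega} \sum_{x,y \in \Omega} \rho(u, w) \cdot \eta(x, y) \cdot \theta_{x,y}(u, w)$
  • $\le e^{-\alpha} \cdot \sum_{x,y \in \Omega} \rho(x, y) \cdot \eta(x, y)$
  • $= e^{-\alpha} \cdot \rho_K(\mu, \nu)$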
Summary so far
• Given a state space Ω and a connected graph with
lengths ℓ on edges, we can define a metric 𝜌
between states (path metric)
• We can then define a metric 𝜌𝐾 between
distributions (transportation metric)
• Given a coupling that contracts $\rho$ in expectation on every edge, we can bound the mixing time (Corollary of Bubley-Dyer theorem)
The Path Coupling Technique
• Analyze the mixing time of a Markov chain 𝑀
• Devise a way to advance a pair of states $(x, y) \in \Omega \times \Omega$:
  • In both coordinates it looks like 𝑀 (= coupling)
  • For some two starting states, they tend to “get closer”
• A bound on $t_{mix}$ depends on how quickly the states get closer
The Path Coupling Technique
• Decide which states are “close”
  • Define $G = (\Omega, E_0)$, and a distance function $\ell$ on edges
  • Extend $\ell$ to a metric $\rho$ on $\Omega$ using the path metric
• Devise a way to advance a pair of states $(x, y) \in \Omega \times \Omega$:
  • In both coordinates it looks like 𝑀 (= coupling)
  • For 2 adjacent starting states, they tend to “get closer”
  • $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x, y)$
• A bound on $t_{mix}$ depends on $\alpha$ and $\mathrm{diam}(\Omega)$
  • $t_{mix}(\epsilon) \le \left\lceil \frac{-\log \epsilon + \log \mathrm{diam}(\Omega)}{\alpha} \right\rceil$
  • We want as many edges as we can (so $\mathrm{diam}(\Omega)$ is smaller)
  • And we want the distance shrinking rapidly (so $\alpha$ is bigger)
Fast Mixing for Colorings
Reminder: 𝑞-colorings
• Proper $q$-colorings of $G = (V, E)$ are elements $x \in \{1, 2, \ldots, q\}^V$ s.t. $x(v) \ne x(w)$ for $\{v, w\} \in E$
• Many uses for coloring problems
• Voting choices on the graph of (people, friendships)
• People tend to change votes according to friends
• Time slots on the graph of (tasks, conflicts) = scheduling
• Conflicting tasks can’t be executed in the same time slot
• Many interesting things to analyze:
• How does a “random” coloring look?
• How long does it take a process to converge?
Reminder
Metropolis chain
• A vertex $v$ is chosen uniformly at random
• A color $k$ is chosen uniformly at random
• If updating the color of $v$ to $k$ yields a proper $q$-coloring, accept it
Glauber dynamics
• A vertex $v$ is chosen uniformly at random
• A color $k$ is chosen uniformly at random among the admissible colors
  • Colors that don’t appear at the neighbors of $v$
• Why are they different?
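One step of each chain can be sketched as follows. This is an illustrative implementation, not code from the talk: the function names are made up, colors are numbered $0, \ldots, q-1$ instead of $1, \ldots, q$, and the graph is given as an adjacency dictionary. It shows the difference between the two rules: the Metropolis step may reject the proposed color and leave the coloring unchanged, while the Glauber step always recolors from the admissible set (possibly with the current color).

```python
import random


def is_proper(coloring, adj):
    """True if no edge has both endpoints with the same color."""
    return all(coloring[v] != coloring[w] for v in adj for w in adj[v])


def allowable_colors(coloring, adj, q, v):
    """Colors that do not appear at the neighbors of v."""
    used = {coloring[w] for w in adj[v]}
    return [k for k in range(q) if k not in used]


def metropolis_step(coloring, adj, q, rng):
    """Propose a uniform color for a uniform vertex; accept only if proper."""
    v = rng.choice(sorted(adj))
    k = rng.randrange(q)
    new = dict(coloring)
    if all(coloring[w] != k for w in adj[v]):
        new[v] = k  # accepted; otherwise the coloring stays as it was
    return new


def glauber_step(coloring, adj, q, rng):
    """Recolor a uniform vertex with a uniform admissible color."""
    v = rng.choice(sorted(adj))
    new = dict(coloring)
    new[v] = rng.choice(allowable_colors(coloring, adj, q, v))
    return new


# Hypothetical example: a 4-cycle (maximum degree 2) with q = 5 colors.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rng = random.Random(0)
x = {0: 0, 1: 1, 2: 0, 3: 1}
for _ in range(100):
    x = glauber_step(x, adj, 5, rng)
print(is_proper(x, adj))  # True
```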
Theorem
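• Consider the Glauber dynamics chain for proper $q$-colorings of a graph $G = (V, E)$ with $n$ vertices and maximum degree $\Delta$
• If $q > 2 \cdot \Delta$, then the mixing time satisfies:
  • $t_{mix}(\epsilon) \le \left\lceil \frac{q - \Delta}{q - 2\Delta} \cdot n \cdot (\log n - \log \epsilon) \right\rceil$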
Comparison
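Metropolis chain
• $q > 2 \cdot \Delta$
• $t_{mix}(\epsilon) \le \left\lceil \frac{q}{q - 2\Delta} \cdot n (\log n - \log \epsilon) \right\rceil$
Glauber dynamics
• $q > 2 \cdot \Delta$
• $t_{mix}(\epsilon) \le \left\lceil \frac{q - \Delta}{q - 2\Delta} \cdot n (\log n - \log \epsilon) \right\rceil$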
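To get a feel for the two bounds, they can be evaluated side by side. The helper names and the parameter values below are assumptions chosen for illustration; the formulas are the ones on the slide, rounded up to whole steps.

```python
import math


def _scaled_bound(factor, n, eps):
    """ceil(factor * n * (log n - log eps))."""
    return math.ceil(factor * n * (math.log(n) - math.log(eps)))


def metropolis_bound(n, q, max_degree, eps):
    """Upper bound on t_mix(eps) with leading factor q / (q - 2 * Delta)."""
    if q <= 2 * max_degree:
        raise ValueError("the bound needs q > 2 * max_degree")
    return _scaled_bound(q / (q - 2 * max_degree), n, eps)


def glauber_bound(n, q, max_degree, eps):
    """Upper bound on t_mix(eps) with leading factor (q - Delta) / (q - 2 * Delta)."""
    if q <= 2 * max_degree:
        raise ValueError("the bound needs q > 2 * max_degree")
    return _scaled_bound((q - max_degree) / (q - 2 * max_degree), n, eps)


# Illustrative values: n = 100 vertices, maximum degree 4, q = 10 colors, eps = 1/4.
print(metropolis_bound(100, 10, 4, 0.25))  # 2996
print(glauber_bound(100, 10, 4, 0.25))     # 1798
```

Since $q - \Delta \le q$, the Glauber bound is never larger than the Metropolis bound for the same parameters.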
Theorem – Proof 1/8
• Two colorings are neighbors if they differ at exactly 1 vertex
  • This defines the graph $G = (\Omega, E_0)$
  • Define $\ell(x, y) = 1$ for edges $\{x, y\} \in E_0$
  • The metric is $\rho(x, y) = \sum_{v \in V} \mathbf{1}_{x(v) \ne y(v)}$
• We only need to define a way to generate 2 new colorings starting with neighboring colorings $\{x, y\} \in E_0$
  • Denote the unique vertex where $x, y$ differ by $v$
  • Denote by $A_w(x)$ the set of allowable colors for vertex $w$ in coloring $x$
Theorem – Proof 2/8
• Choose a vertex $w$ uniformly at random
• If $w$ is not a neighbor of $v$, choose a color uniformly at random from $A_w(x) = A_w(y)$, and update $w$ with it in both colorings
  • Works when $w = v$ as well
  • So far, it’s consistent with a coupling
• If $w$ is a neighbor of $v$, assume WLOG $|A_w(x)| \le |A_w(y)|$
  • Choose a uniformly random color $U \in A_w(y)$, update $y$ at $w$ with $U$
  • The update of $x$ at $w$ depends on the configuration
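Theorem – Proof 3/8
• Case: $x(v) \notin A_w(y)$, $y(v) \notin A_w(x)$
[Figure: the colorings $x$ and $y$ around the vertices $v$ and $w$, next to the set of all colors]
• We have the same allowable colors in both configurations, so color $w$ in $x$ with $U$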
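Theorem – Proof 4/8
• Case: $x(v) \in A_w(y)$, $y(v) \in A_w(x)$
[Figure: the colorings $x$ and $y$ around the vertices $v$ and $w$, next to the set of all colors; black is $x(v)$ and purple is $y(v)$]
• If $U$ is not black, we can color $w$ in $x$ with it
• If $U$ is black, we can swap it for purple
• All allowable colors for $w$ in $x$ are chosen with equal probability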
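Theorem – Proof 5/8
• Case: $x(v) \in A_w(y)$, $y(v) \notin A_w(x)$
[Figure: the colorings $x$ and $y$ around the vertices $v$ and $w$, next to the set of all colors; black is $x(v)$, with $|A_w(y)| = 4$ and $|A_w(x)| = 3$]
• If $U$ is not black, we can color $w$ in $x$ with it
• If $U$ is black, we draw a uniformly random allowable color for $w$ in $x$
• The probability of every color in $A_w(x)$ is: $\frac{1}{4} + \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{3}$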
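The three cases for a neighbor $w$ of $v$ can be combined into one exact computation of the joint law of the two new colors at $w$. The sketch below is an illustration under stated assumptions: the function name and the color names are made up, and it assumes (as in the WLOG step) that $|A_w(x)| \le |A_w(y)|$. It lets one confirm that the color assigned in $x$ is uniform on $A_w(x)$, as a coupling requires.

```python
from fractions import Fraction


def coupled_neighbor_update(allowed_x, allowed_y, color_yv):
    """Joint law {(new color in x, new color in y): prob} of the update at w.

    Assumes x and y differ only at v, w is a neighbor of v,
    and len(allowed_x) <= len(allowed_y). color_yv is the color y(v).
    """
    joint = {}
    p_u = Fraction(1, len(allowed_y))  # U is uniform on A_w(y)

    def add(cx, cy, prob):
        joint[(cx, cy)] = joint.get((cx, cy), 0) + prob

    for u in allowed_y:
        if u in allowed_x:
            add(u, u, p_u)         # U is allowed in x too: use the same color
        elif color_yv in allowed_x:
            add(color_yv, u, p_u)  # here U = x(v): swap it for y(v)
        else:
            for c in allowed_x:    # here U = x(v): redraw uniformly from A_w(x)
                add(c, u, p_u / len(allowed_x))
    return joint


def x_marginal(joint):
    """Law of the new color at w in the coloring x."""
    out = {}
    for (cx, _), prob in joint.items():
        out[cx] = out.get(cx, 0) + prob
    return out


# The last case above with hypothetical color names: x(v) = black, y(v) = purple.
joint = coupled_neighbor_update({"red", "green", "blue"},
                                {"black", "red", "green", "blue"}, "purple")
print(x_marginal(joint)["red"])                          # 1/3
print(sum(p for (cx, cy), p in joint.items() if cx != cy))  # 1/4
```

The second printed value is the probability that the two chains receive different colors at $w$, which is exactly the event $U = x(v)$.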
Theorem – Proof 6/8
• We have a coupling $(X_1, Y_1)$ of $P(x,\cdot)$, $P(y,\cdot)$ for two states $x, y$ that differ only at vertex $v$
• Now we need to bound $\mathbb{E}_{x,y}[\rho(X_1, Y_1)]$ by some $e^{-\alpha} \cdot \rho(x, y)$ in order to use the corollary
  • $\rho(X_1, Y_1)$ decreases to 0 iff we chose $v$, i.e. w.p. $1/n$
  • $\rho(X_1, Y_1)$ increases to 2 iff we chose a neighbor $w$ of $v$, w.p. $\deg(v)/n$, and we updated with different colors
    • So we got $U = x(v)$, w.p. $\le \frac{1}{|A_w(y)|} \le \frac{1}{q - \Delta}$
• In total: $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} + \frac{\deg(v)}{n} \cdot \frac{1}{q - \Delta}$
Theorem – Proof 7/8
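• $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} + \frac{\deg(v)}{n} \cdot \frac{1}{q - \Delta}$
• Since $\deg(v) \le \Delta$: $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} \cdot \left(1 - \frac{\Delta}{q - \Delta}\right)$
• Since $q > 2 \cdot \Delta$, $\frac{\Delta}{q - \Delta} < 1$, so $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] < 1$
  • The distance is indeed decreasing
• Denoting $c(\Delta, q) = 1 - \frac{\Delta}{q - \Delta} = \frac{q - 2\Delta}{q - \Delta}$ and using the inequality $e^x \ge 1 + x$ we get:
  $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{c(\Delta, q)}{n} \le e^{-\frac{c(\Delta, q)}{n}}$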
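The chain of inequalities on this slide can be spot-checked numerically. The sketch below uses made-up values of $n$, $q$ and $\Delta$ and hypothetical function names; it evaluates the one-step bound for every possible $\deg(v) \le \Delta$ and compares it with $1 - c(\Delta, q)/n$ and $e^{-c(\Delta, q)/n}$.

```python
import math


def one_step_bound(n, q, max_degree, deg_v):
    """1 - 1/n + (deg(v)/n) * 1/(q - Delta): bound on the expected distance."""
    return 1 - 1 / n + (deg_v / n) / (q - max_degree)


def contraction_constant(q, max_degree):
    """c(Delta, q) = (q - 2 * Delta) / (q - Delta), positive when q > 2 * Delta."""
    return (q - 2 * max_degree) / (q - max_degree)


# Illustrative parameters: n = 50 vertices, maximum degree 4, q = 9 colors.
n, q, max_degree = 50, 9, 4
c = contraction_constant(q, max_degree)
for deg_v in range(max_degree + 1):
    bound = one_step_bound(n, q, max_degree, deg_v)
    # The small tolerance only absorbs floating-point rounding.
    print(deg_v, bound <= 1 - c / n + 1e-12, 1 - c / n <= math.exp(-c / n))
```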
Theorem – Proof 8/8
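• $\mathbb{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\frac{c(\Delta, q)}{n}}$
• Applying the corollary ($t_{mix}(\epsilon) \le \left\lceil \frac{-\log \epsilon + \log \mathrm{diam}(\Omega)}{\alpha} \right\rceil$) with $\alpha = \frac{c(\Delta, q)}{n}$ and $\mathrm{diam}(\Omega) \le n$:
  $t_{mix}(\epsilon) \le \left\lceil \frac{-\log \epsilon + \log n}{c(\Delta, q) / n} \right\rceil = \left\lceil \frac{q - \Delta}{q - 2\Delta} \cdot n (\log n - \log \epsilon) \right\rceil$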
Known Results
• Some more results for Glauber dynamics on proper
𝑞-colorings:
  • For $q \ge \frac{11}{6} \cdot \Delta$, it is known that $t_{mix} = O(n \cdot \log n)$
  • For triangle-free graphs with maximum degree $\Delta = \Omega(\log n)$, we can improve the bound on $q$ to $q \ge 1.49\ldots \cdot \Delta$
  • For the empty graph: $t_{mix} \ge \frac{1}{2} \cdot n \cdot \log n - c(q) \cdot n$
Questions?
