The Transportation Metric and Path Coupling
By Levin, Peres and Wilmer
Presented by Oleg Zlydenko

Coupling
• Analyze the mixing time of a Markov chain $P$
• Devise a way to advance two states $(x, y) \in \Omega \times \Omega$:
  • In both coordinates it looks like $P$ (= coupling)
  • For any two starting states, they eventually meet
• A bound on $t_{mix}$ depends on the time it takes the chains to meet

Review
• Theorem: Let $\{(X_t, Y_t)\}_{t=0}^{\infty}$ be a sticky coupling of $P$, where $X_0 = x$ and $Y_0 = y$. Let $\tau_{couple}$ be the first time the chains meet. Then:
  $\|P^t(x,\cdot) - P^t(y,\cdot)\|_{TV} \le \mathbf{P}_{x,y}\{\tau_{couple} > t\}$
• Which leads to:
  $d(t) \le \max_{x,y \in \Omega} \mathbf{P}_{x,y}\{\tau_{couple} > t\}$
• Then we find the minimal $t$ for which the right-hand side is at most $\varepsilon$
• This is a bound on $t_{mix}(\varepsilon)$

The Path Coupling Technique
• Analyze the mixing time of a Markov chain $P$
• Devise a way to advance two states $(x, y) \in \Omega \times \Omega$:
  Coupling:
  • In both coordinates it looks like $P$ (= coupling)
  • For any two starting states, they eventually meet
  • A bound on $t_{mix}$ depends on the time it takes the chains to meet
  Path Coupling:
  • In both coordinates it looks like $P$ (= coupling)
  • For some pairs of starting states, they tend to "get closer"
  • A bound on $t_{mix}$ depends on how quickly the states get closer

Plan
• The Transportation Metric
• The main theorem of Path Coupling
• Bounding mixing time
• Example – Fast Mixing for Colorings

The Transportation Metric
• We have a state space $\Omega$ and a metric $\rho$ between states
  [Figure: a Markov chain on the states Assistant Professor, Associate Professor, Tenured Professor, Out on the Street and Dead, with transition probabilities on the arrows]
• $\rho$ could be the difference in salaries, e.g.:
  • $\rho(\text{street}, \text{dead}) = 3$
  • $\rho(\text{assistant}, \text{dead}) = 15$
• $\rho$ is a metric:
  • Non-negative, and zero only between equal states
  • Symmetric
  • Triangle inequality
• The transportation metric (or Wasserstein metric) is defined on distributions over $\Omega$:
  $\rho_K(\mu, \nu) = \inf\{\mathbf{E}[\rho(X, Y)] : (X, Y) \text{ is a coupling of } \mu, \nu\}$
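As a concrete sanity check of the definition, here is a minimal Python sketch under an assumption the slides do not make: the states are the integers $0, \ldots, m-1$ and $\rho(x, y) = |x - y|$. On a line the monotone (quantile) coupling is optimal, so $\rho_K$ reduces to the sum of absolute differences between the two cumulative distribution functions. The function names are hypothetical, chosen for illustration only.

```python
from fractions import Fraction
from itertools import accumulate


def transport_distance_on_line(mu, nu):
    """rho_K(mu, nu) for distributions on {0, ..., m-1} with rho(x, y) = |x - y|.

    Assumes the line metric: there the monotone coupling is optimal and
    rho_K(mu, nu) = sum_k |F_mu(k) - F_nu(k)|, with F the cumulative distribution.
    """
    if len(mu) != len(nu):
        raise ValueError("mu and nu must live on the same state space")
    return sum(abs(a - b) for a, b in zip(accumulate(mu), accumulate(nu)))


def tv_distance(mu, nu):
    """Total variation distance: half the L1 distance between the vectors."""
    return sum(abs(a - b) for a, b in zip(mu, nu)) / 2


half = Fraction(1, 2)
mu = [half, half, 0]   # uniform on {0, 1}
nu = [0, half, half]   # uniform on {1, 2}
print(transport_distance_on_line(mu, nu))  # 1: every unit of mass moves one step
print(tv_distance(mu, nu))                 # 1/2
```

Since $|x - y| \ge \mathbf{1}\{x \ne y\}$ on the integers, the first number can never be smaller than the second; on a two-point space the two coincide, matching the special case $\rho(x,y) = \mathbf{1}\{x \ne y\}$.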
• Intuitively, it is the distance between two distributions: the least expected cost of moving the mass of $\mu$ onto $\nu$, when moving mass from $x$ to $y$ costs $\rho(x,y)$
• If $\rho(x,y) = \mathbf{1}\{x \ne y\}$, then $\rho_K(\mu,\nu) = \|\mu - \nu\|_{TV}$
• If $\rho(x,y) \ge \mathbf{1}\{x \ne y\}$, then $\rho_K(\mu,\nu) \ge \|\mu - \nu\|_{TV}$
  (recall $\|\mu - \nu\|_{TV} = \inf \Pr\{X \ne Y\}$, the infimum taken over couplings $(X,Y)$ of $\mu, \nu$)

Optimal Coupling
• An optimal coupling $(X^*, Y^*)$ of $\mu, \nu$ achieves:
  $\rho_K(\mu,\nu) = \mathbf{E}[\rho(X^*, Y^*)]$
• Lemma 1: an optimal coupling exists
  • We already used this lemma in a previous lesson, for $\rho(x,y) = \mathbf{1}\{x \ne y\}$

Metric
• Lemma 2: $\rho_K$ is a metric on the space of probability distributions on $\Omega$
  • Non-negativity: $\rho$ is a metric, hence non-negative, so $\rho_K$ is an infimum of a set of non-negative numbers
  • Zero only for equal distributions: if $\rho_K(\mu,\nu) = 0$, then by Lemma 1 an optimal coupling has $\mathbf{E}[\rho(X^*,Y^*)] = 0$, so $X^* = Y^*$ almost surely and $\mu = \nu$
  • Symmetry: $\mathbf{E}[\rho(X,Y)] = \mathbf{E}[\rho(Y,X)]$, and $(X,Y)$ is a coupling of $\mu,\nu$ iff $(Y,X)$ is a coupling of $\nu,\mu$
  • Triangle inequality: for 3 distributions $\mu, \nu, \eta$ on $\Omega$: $\rho_K(\mu,\nu) + \rho_K(\nu,\eta) \ge \rho_K(\mu,\eta)$
    • Proof over the next few slides

Lemma 2 – Proof 1/2
• Let $\mu, \nu, \eta$ be distributions on $\Omega$
• Let $p, q$ be the distributions on $\Omega \times \Omega$ of the optimal couplings of $(\mu, \nu)$ and $(\nu, \eta)$ respectively
• Define a distribution:
  $r(x,y,z) = \frac{p(x,y)\, q(y,z)}{\nu(y)}$ (and $r(x,y,z) = 0$ when $\nu(y) = 0$)
• The projection of $r$ on the first 2 coordinates is $p$:
  $\sum_{z \in \Omega} r(x,y,z) = \frac{p(x,y)}{\nu(y)} \sum_{z \in \Omega} q(y,z) = p(x,y)$, since $\sum_{z \in \Omega} q(y,z) = \nu(y)$
• Similarly, the projection of $r$ on the last 2 coordinates is $q$

Intuition
  [Figure: a cube, with $p$ and $q$ marked on two of its faces]
• $r(x,y,z) = \frac{p(x,y)\, q(y,z)}{\nu(y)}$ is a distribution on $\Omega^3$
  • Each cell of a cube holds a probability
• The projection on $(x, y)$ is $p$ (a coupling of $\mu, \nu$)
  • The projection on $x$ is $\mu$, the projection on $y$ is $\nu$
• The projection on $(y, z)$ is $q$ (a coupling of $\nu, \eta$)
  • The projection on $y$ is $\nu$, the projection on $z$ is $\eta$
• So the projection on $(x, z)$ is a coupling of $\mu, \eta$

Lemma 2 – Proof 2/2
• Let $(X, Y, Z)$ be a random vector with distribution $r$
• Since $\rho$ is a metric:
  $\rho(X,Z) \le \rho(X,Y) + \rho(Y,Z)$
• Take expectation on both sides:
  $\mathbf{E}[\rho(X,Z)] \le \mathbf{E}[\rho(X,Y)] + \mathbf{E}[\rho(Y,Z)]$
• Since $p, q$ are distributions of the optimal couplings:
  $\mathbf{E}[\rho(X,Y)] + \mathbf{E}[\rho(Y,Z)] = \rho_K(\mu,\nu) + \rho_K(\nu,\eta)$
• Note that $(X, Z)$ is a coupling of $\mu, \eta$:
  $\rho_K(\mu,\eta) \le \mathbf{E}[\rho(X,Z)] \le \rho_K(\mu,\nu) + \rho_K(\nu,\eta)$

Summary so far
• Given a state space $\Omega$ and a metric $\rho$ between states, we can define a new metric $\rho_K$ between distributions
• We can use it to bound the total variation distance: if $\rho(x,y) \ge 1$ for all $x \ne y \in \Omega$, then
  $\rho_K(\mu,\nu) \ge \|\mu - \nu\|_{TV}$

Path Coupling

Path Metric
• Suppose we have a connected graph $G = (V, E_0)$
  • $V = \Omega$, the state space of a Markov chain
  • The edges don't have to match permissible transitions
• In addition, we have a length function $\ell$ on edges
  • $\ell(x,y) \ge 1$ for all edges $\{x,y\}$
  • The length of a path is the sum of $\ell(x,y)$ over the edges $\{x,y\}$ on the path
• The path metric on $\Omega$:
  $\rho(x,y) = \min\{\ell(\xi) : \xi \text{ is a path from } x \text{ to } y\}$
• Why is it a metric? (Lengths are at least 1, so $\rho(x,y) \ge 1$ for $x \ne y$; $G$ is undirected, so $\rho$ is symmetric; and joining a path from $x$ to $y$ with a path from $y$ to $z$ gives a path from $x$ to $z$, which yields the triangle inequality.)
  [Figure: the same career states, now with lengths on some of the edges]
  • $\rho(\text{associate}, \text{dead}) = 2$
  • $\rho(\text{street}, \text{dead}) = 3$

Summary so far
• Given a state space $\Omega$ and a metric $\rho$ between states, we can define a new metric $\rho_K$ between distributions
• We can use it to bound the total variation distance, if $\rho(x,y) \ge 1$ for all $x \ne y \in \Omega$:
  $\rho_K(\mu,\nu) \ge \|\mu - \nu\|_{TV}$
• We can generate a metric $\rho$ between states by extending distances between some states with the path metric

Main Theorem (Bubley, Dyer)
• Let $G = (\Omega, E_0)$, $\ell$ and the path metric $\rho$ be as previously defined
• Assume that for each edge $\{x,y\} \in E_0$ we have a coupling $(X_1, Y_1)$ of the distributions $P(x,\cdot), P(y,\cdot)$ such that, for some $\alpha > 0$:
  $\mathbf{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x,y) = e^{-\alpha} \cdot \ell(x,y)$
  (the equality assumes, as we may, that every edge is a shortest path between its endpoints: deleting a longer edge changes neither $\rho$ nor connectivity)
• Then for any two distributions $\mu, \nu$ on $\Omega$:
  $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$

Motivation
• Recall: $\ell(x,y) \ge 1$ for all edges $\{x,y\}$
• Then $\rho(x,y) \ge \mathbf{1}\{x \ne y\}$
• $\Pr\{X \ne Y\} = \mathbf{E}[\mathbf{1}\{X \ne Y\}] \le \mathbf{E}[\rho(X,Y)]$
• Taking $\inf$ over all couplings $(X,Y)$ of $\mu, \nu$:
  $\|\mu - \nu\|_{TV} \le \rho_K(\mu,\nu)$
• So bounding $\rho_K$ (with the theorem) provides a bound on the mixing time

Bounding Mixing Time
• Corollary: Suppose that the hypotheses of the Bubley–Dyer theorem hold.
  Then:
  $d(t) = \max_{x \in \Omega} \|P^t(x,\cdot) - \pi\|_{TV} \le e^{-\alpha t} \cdot \mathrm{diam}(\Omega)$
  where $\mathrm{diam}(\Omega) = \max_{x,y \in \Omega} \rho(x,y)$ is the diameter of $\Omega$
• And consequently:
  $t_{mix}(\varepsilon) = \min\{t : d(t) \le \varepsilon\} \le \left\lceil \frac{-\log \varepsilon + \log \mathrm{diam}(\Omega)}{\alpha} \right\rceil$

Corollary – Proof
• Recall that the theorem provides:
  $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$
• Applying it $t$ times:
  $\rho_K(\mu P^t, \nu P^t) \le e^{-\alpha t} \cdot \rho_K(\mu, \nu)$
• Also, by definition: $\rho_K(\mu,\nu) \le \mathrm{diam}(\Omega)$, since every coupling has $\mathbf{E}[\rho(X,Y)] \le \mathrm{diam}(\Omega)$
• Specifically:
  • Choosing $\nu = \pi$ gives $\nu P^t = \pi P^t = \pi$
  • Choosing $\mu$ to be the distribution that always returns $x$ gives $\mu P^t = P^t(x,\cdot)$
  • $\|P^t(x,\cdot) - \pi\|_{TV} \le \rho_K(P^t(x,\cdot), \pi) \le e^{-\alpha t} \cdot \mathrm{diam}(\Omega)$
• Taking the maximum over $x$ gives $d(t) \le e^{-\alpha t} \cdot \mathrm{diam}(\Omega)$, which is at most $\varepsilon$ as soon as $t \ge \frac{-\log \varepsilon + \log \mathrm{diam}(\Omega)}{\alpha}$

Bubley and Dyer Theorem – Proof 1/5
• Have: $\mathbf{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x,y)$ for $\{x,y\} \in E_0$ (for $\alpha > 0$), where $(X_1, Y_1)$ is a coupling of $P(x,\cdot), P(y,\cdot)$
• Recall:
  $\rho_K(\mu,\nu) = \inf\{\mathbf{E}[\rho(X,Y)] : (X,Y) \text{ is a coupling of } \mu, \nu\}$
• So, for edges:
  $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x,y)$
• Lemma: For all $x, y \in \Omega$: $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x,y)$

Bubley and Dyer Theorem – Proof 2/5
• Lemma: For all $x, y \in \Omega$: $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x,y)$
• Proof:
  • Consider a shortest path (according to $\rho$) between $x, y$: $x = x_0, x_1, \ldots, x_r = y$
  • $\rho_K(P(x,\cdot), P(y,\cdot)) \le \sum_{k=1}^{r} \rho_K(P(x_{k-1},\cdot), P(x_k,\cdot))$ (the triangle inequality)
  • $\le e^{-\alpha} \cdot \sum_{k=1}^{r} \rho(x_{k-1}, x_k)$ (the bound $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x,y)$ for edges)
  • $= e^{-\alpha} \cdot \rho(x,y)$ (we chose a shortest path)

Bubley and Dyer Theorem – Proof 3/5
• Have: $\rho_K(P(x,\cdot), P(y,\cdot)) \le e^{-\alpha} \cdot \rho(x,y)$ for all $x, y \in \Omega$
• Want: $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$ for any two distributions $\mu, \nu$ on $\Omega$
• Let $q$ be an optimal coupling of $\mu, \nu$, i.e.:
  $\rho_K(\mu,\nu) = \sum_{x,y \in \Omega} q(x,y)\, \rho(x,y)$
• We can choose for every $x, y \in \Omega$ an optimal coupling $\theta_{x,y}$ of $P(x,\cdot), P(y,\cdot)$:
  $\sum_{u,w \in \Omega} \rho(u,w)\, \theta_{x,y}(u,w) \le e^{-\alpha} \cdot \rho(x,y)$

Bubley and Dyer Theorem – Proof 4/5
• Have: $q$ an optimal coupling of $\mu, \nu$, and $\theta_{x,y}$ an optimal coupling of $P(x,\cdot), P(y,\cdot)$:
  • $\rho_K(\mu,\nu) = \sum_{x,y \in \Omega} q(x,y)\, \rho(x,y)$
  • $\sum_{u,w \in \Omega} \rho(u,w)\, \theta_{x,y}(u,w) \le e^{-\alpha} \cdot \rho(x,y)$
• Define a distribution $\theta = \sum_{x,y \in \Omega} q(x,y)\, \theta_{x,y}$
  • Choose 2 starting states $(x, y)$ with probabilities $q(x,y)$, and then advance them on the Markov chain using $\theta_{x,y}$
• $\theta$ is a coupling of $\mu P, \nu P$:
  $\sum_{w \in \Omega} \theta(u,w) = \sum_{x,y,w \in \Omega} q(x,y)\, \theta_{x,y}(u,w) = \sum_{x,y \in \Omega} q(x,y)\, P(x,u)$
  $= \sum_{x \in \Omega} P(x,u) \sum_{y \in \Omega} q(x,y) = \sum_{x \in \Omega} P(x,u)\, \mu(x) = (\mu P)(u)$
  and similarly the second marginal is $\nu P$

Bubley and Dyer Theorem – Proof 5/5
• Have:
  • $\theta = \sum_{x,y \in \Omega} q(x,y)\, \theta_{x,y}$, a coupling of $\mu P, \nu P$
  • $\rho_K(\mu,\nu) = \sum_{x,y \in \Omega} q(x,y)\, \rho(x,y)$
  • $\sum_{u,w \in \Omega} \rho(u,w)\, \theta_{x,y}(u,w) \le e^{-\alpha} \cdot \rho(x,y)$
• Want: $\rho_K(\mu P, \nu P) \le e^{-\alpha} \cdot \rho_K(\mu, \nu)$ for any two distributions $\mu, \nu$ on $\Omega$
• $\rho_K(\mu P, \nu P) \le \sum_{u,w \in \Omega} \rho(u,w)\, \theta(u,w)$
  $= \sum_{u,w \in \Omega} \sum_{x,y \in \Omega} \rho(u,w)\, q(x,y)\, \theta_{x,y}(u,w)$
  $\le e^{-\alpha} \cdot \sum_{x,y \in \Omega} q(x,y)\, \rho(x,y)$
  $= e^{-\alpha} \cdot \rho_K(\mu, \nu)$

Summary so far
• Given a state space $\Omega$ and a connected graph with lengths $\ell$ on edges, we can define a metric $\rho$ between states (the path metric)
• We can then define a metric $\rho_K$ between distributions (the transportation metric)
• If one step of some coupling shrinks the expected distance between adjacent states by a factor $e^{-\alpha}$, we can bound the mixing time (Corollary of the Bubley–Dyer theorem)

The Path Coupling Technique
• Analyze the mixing time of a Markov chain $P$
• Devise a way to advance two states $(x, y) \in \Omega \times \Omega$:
  • In both coordinates it looks like $P$ (= coupling)
  • For some pairs of starting states, they tend to "get closer"
• A bound on $t_{mix}$ depends on how quickly the states get closer

The Path Coupling Technique
• Decide which states are "close"
  • Define $G = (\Omega, E_0)$, and a length function $\ell$ on edges
  • Extend $\ell$ to a metric $\rho$ on $\Omega$ using the path metric
• Devise a way to advance two states $(x, y) \in \Omega \times \Omega$:
  • In both coordinates it looks like $P$ (= coupling)
  • For 2 adjacent starting states, they tend to "get closer":
    $\mathbf{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-\alpha} \cdot \rho(x,y)$
• A bound on $t_{mix}$ depends on $\alpha$ and $\mathrm{diam}(\Omega)$:
  $t_{mix}(\varepsilon) \le \left\lceil \frac{-\log \varepsilon + \log \mathrm{diam}(\Omega)}{\alpha} \right\rceil$
• We want as many edges as we can (so $\mathrm{diam}(\Omega)$ is smaller)
• And we want the distance shrinking rapidly (so $\alpha$ is bigger)

Fast Mixing for Colorings

Reminder: $q$-colorings
• Proper $q$-colorings of $G = (V, E)$ are elements $x \in \{1, 2, \ldots, q\}^V$ such that $x(v) \ne x(w)$ for all $\{v, w\} \in E$
• Many uses for coloring problems:
  • Voting choices on the graph of (people, friendships)
    • People tend to change votes according to friends
  • Time slots on the graph of (tasks, conflicts) = scheduling
    • Conflicting tasks can't be executed in the same time slot
• Many things are interesting to analyze:
  • What does a "random" coloring look like?
  • How long does it take a process to converge?

Reminder
Metropolis chain:
• A vertex $v$ is chosen uniformly at random
• A color $c$ is chosen uniformly at random
• If updating the color of $v$ to $c$ yields a proper $q$-coloring, accept it
Glauber dynamics:
• A vertex $v$ is chosen uniformly at random
• A color $c$ is chosen uniformly among the admissible colors
  • Colors that don't appear at the neighbors of $v$
• Why are they different? Given $v$, the Metropolis chain moves to a uniform admissible color only with probability (number of admissible colors)$/q$ and otherwise stays put; Glauber dynamics always does.

Theorem
• Consider the Glauber dynamics chain for proper $q$-colorings of a graph $G = (V, E)$ with $n$ vertices and maximum degree $\Delta$
• If $q > 2\Delta$, then the mixing time satisfies:
  $t_{mix}(\varepsilon) \le \left\lceil \frac{q - \Delta}{q - 2\Delta} \cdot n \cdot (\log n - \log \varepsilon) \right\rceil$

Comparison
Metropolis chain:
• $q > 2\Delta$
• $t_{mix}(\varepsilon) \le \left\lceil \frac{q}{q - 2\Delta} \cdot n \cdot (\log n - \log \varepsilon) \right\rceil$
Glauber dynamics:
• $q > 2\Delta$
• $t_{mix}(\varepsilon) \le \left\lceil \frac{q - \Delta}{q - 2\Delta} \cdot n \cdot (\log n - \log \varepsilon) \right\rceil$

Theorem – Proof 1/8
• Two colorings are neighbors if they differ at exactly 1 vertex
  • This defines the graph $G = (\Omega, E_0)$
• Define $\ell(x,y) = 1$ for edges $\{x,y\} \in E_0$
• The metric is $\rho(x,y) = \sum_{v \in V} \mathbf{1}\{x(v) \ne y(v)\}$
• We only need to define a way to generate 2 new colorings starting from colorings $x, y$ with $\{x,y\} \in E_0$!
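The two update rules from the Reminder slide can be sketched in Python as follows. This is a hedged illustration, not code from the source: colors are $0, \ldots, q-1$, the graph is an adjacency dict, and the function names are hypothetical. The Metropolis step proposes from all $q$ colors and rejects improper results, while the Glauber step samples directly from the admissible colors (a non-empty set whenever $q > \Delta$).

```python
import random


def allowed_colors(coloring, w, adj, q):
    """Admissible colors for w: colors in {0, ..., q-1} not used by any neighbor of w."""
    used = {coloring[u] for u in adj[w]}
    return [c for c in range(q) if c not in used]


def glauber_step(coloring, adj, q, rng):
    """Pick a uniform vertex and recolor it with a uniform admissible color."""
    new = list(coloring)
    w = rng.randrange(len(new))
    new[w] = rng.choice(allowed_colors(new, w, adj, q))
    return new


def metropolis_step(coloring, adj, q, rng):
    """Pick a uniform vertex and a uniform color; accept only if the result is proper."""
    new = list(coloring)
    w = rng.randrange(len(new))
    c = rng.randrange(q)
    if all(new[u] != c for u in adj[w]):
        new[w] = c
    return new


def is_proper(coloring, adj):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[v] != coloring[u] for v in adj for u in adj[v])


# Example: the 4-cycle (maximum degree 2) with q = 5 > 2 * 2 colors.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rng = random.Random(0)
x = [0, 1, 0, 1]
for _ in range(1000):
    x = glauber_step(x, adj, 5, rng)
print(x, is_proper(x, adj))
```

Both steps keep a proper coloring proper. A rejected Metropolis proposal leaves the coloring unchanged, which is one way to read the $q$ versus $q - \Delta$ numerators on the Comparison slide.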
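• Denote the unique vertex where $x, y$ differ by $v$
• Denote by $A_w(x)$ the set of allowable colors for vertex $w$ in coloring $x$

Theorem – Proof 2/8
• Choose a vertex $w$ uniformly at random
• If $w$ is not a neighbor of $v$, choose a color uniformly at random from $A_w(x) = A_w(y)$, and update $w$ with it in both colorings
  • Works when $w = v$ as well
  • So far, it's consistent with a coupling
• If $w$ is a neighbor of $v$, assume WLOG $|A_w(x)| \le |A_w(y)|$
  • Choose a uniformly random color $c \in A_w(y)$, and update $y$ at $w$ with $c$
  • The update of $x$ at $w$ depends on the configuration

Theorem – Proof 3/8
  [Figure: the palette of all colors, and vertices $v$, $w$ in $x$ and in $y$]
• Case $x(v) \notin A_w(y)$, $y(v) \notin A_w(x)$:
  • We have the same allowable colors in both configurations, so color $w$ in $x$ with $c$

Theorem – Proof 4/8
• Case $x(v) \in A_w(y)$, $y(v) \in A_w(x)$:
  • If $c$ is not black ($c \ne x(v)$), we can color $w$ with it
  • If $c$ is black ($c = x(v)$), we can swap it for purple ($y(v)$)
  • All allowable colors for $w$ in $x$ are chosen with equal probability

Theorem – Proof 5/8
• Case $x(v) \in A_w(y)$, $y(v) \notin A_w(x)$:
  • If $c$ is not black ($c \ne x(v)$), we can color $w$ with it
  • If $c$ is black ($c = x(v)$), we draw a random allowable color for $w$ from $A_w(x)$
  • In the figure $|A_w(y)| = 4$ and $|A_w(x)| = 3$, so the probability of every color in $A_w(x)$ is:
    $\frac{1}{4} + \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{3}$
  • In general $A_w(y) = A_w(x) \cup \{x(v)\}$ here, and $\frac{1}{|A_w(y)|} + \frac{1}{|A_w(y)|} \cdot \frac{1}{|A_w(x)|} = \frac{1}{|A_w(x)|}$

Theorem – Proof 6/8
• We have a coupling $(X_1, Y_1)$ of $P(x,\cdot), P(y,\cdot)$ for two states $x, y$ that differ only at vertex $v$
• Now we need to bound $\mathbf{E}_{x,y}[\rho(X_1, Y_1)]$ by some $e^{-\alpha} \cdot \rho(x,y)$ in order to use the corollary
• $\rho(X_1, Y_1)$ decreases to 0 iff we chose $v$, i.e. w.p. $1/n$
• $\rho(X_1, Y_1)$ increases to 2 iff we chose a neighbor of $v$, w.p. $\deg(v)/n$, and we updated with different colors
  • So we got $c = x(v)$, w.p. $\le \frac{1}{|A_w(y)|} \le \frac{1}{q - \Delta}$ (since $w$ has at most $\Delta$ neighbors, $|A_w(y)| \ge q - \Delta$)
• In total:
  $\mathbf{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} + \frac{\deg(v)}{n} \cdot \frac{1}{q - \Delta}$

Theorem – Proof 7/8
• $\mathbf{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n} + \frac{\deg(v)}{n} \cdot \frac{1}{q - \Delta}$
• Using $\deg(v) \le \Delta$:
  $\mathbf{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{1}{n}\left(1 - \frac{\Delta}{q - \Delta}\right)$
• Since $q > 2\Delta$, $\frac{\Delta}{q - \Delta} < 1$, so $\mathbf{E}_{x,y}[\rho(X_1, Y_1)] < 1$
  • The distance is indeed decreasing
• Denoting $c(\Delta, q) = 1 - \frac{\Delta}{q - \Delta} = \frac{q - 2\Delta}{q - \Delta}$ and using the inequality $e^{x} \ge 1 + x$, we get:
  $\mathbf{E}_{x,y}[\rho(X_1, Y_1)] \le 1 - \frac{c(\Delta, q)}{n} \le e^{-c(\Delta, q)/n} = e^{-c(\Delta, q)/n} \cdot \rho(x,y)$

Theorem – Proof 8/8
• $\mathbf{E}_{x,y}[\rho(X_1, Y_1)] \le e^{-c(\Delta, q)/n} \cdot \rho(x,y)$
• Applying the corollary, $t_{mix}(\varepsilon) \le \left\lceil \frac{-\log \varepsilon + \log \mathrm{diam}(\Omega)}{\alpha} \right\rceil$, with $\alpha = c(\Delta, q)/n$ and $\mathrm{diam}(\Omega) \le n$:
  $t_{mix}(\varepsilon) \le \left\lceil \frac{-\log \varepsilon + \log n}{c(\Delta, q)/n} \right\rceil = \left\lceil \frac{q - \Delta}{q - 2\Delta} \cdot n \cdot (\log n - \log \varepsilon) \right\rceil$

Known Results
• Some more results for Glauber dynamics on proper $q$-colorings:
  • For $q \ge \frac{11}{6} \cdot \Delta$, it is known that $t_{mix} = O(n \cdot \log n)$
  • For triangle-free graphs with maximum degree $\Delta = \Omega(\log n)$, we can improve the bound on $q$ to $q \ge 1.49\ldots \cdot \Delta$
  • For the empty graph: $t_{mix} \ge \frac{1}{2} \cdot n \cdot \log n - c(\varepsilon) \cdot n$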
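The coupling described in Theorem – Proof 2/8 through 6/8 can be checked exactly on a small graph. The Python sketch below is an illustration under stated assumptions, not the authors' code: colors are $0, \ldots, q-1$, the graph is an adjacency dict, and all function names are hypothetical. It builds the joint law of the two new colors at the chosen vertex $w$ and computes $\mathbf{E}_{x,y}[\rho(X_1, Y_1)]$ with exact fractions for two colorings of the 4-cycle ($\Delta = 2$, $q = 5$) that differ only at $v = 0$.

```python
from collections import defaultdict
from fractions import Fraction


def allowed(coloring, w, adj, q):
    """A_w(x): colors in {0, ..., q-1} not used by any neighbor of w."""
    used = {coloring[u] for u in adj[w]}
    return [c for c in range(q) if c not in used]


def neighbor_coupling(ax, ay, xv, yv):
    """Joint law {(new color of w in x, new color of w in y): prob} for w adjacent to v."""
    if len(ax) > len(ay):
        # WLOG |A_w(x)| <= |A_w(y)|: otherwise run with the roles of x and y swapped.
        swapped = neighbor_coupling(ay, ax, yv, xv)
        return {(cx, cy): p for (cy, cx), p in swapped.items()}
    joint = defaultdict(Fraction)
    for c in ay:                      # y is updated with c, uniform on A_w(y)
        p = Fraction(1, len(ay))
        if c != xv:
            joint[(c, c)] += p        # c is allowable in x as well
        elif len(ax) == len(ay):
            joint[(yv, c)] += p       # swap x(v) for y(v)
        else:
            for d in ax:              # redraw uniformly from A_w(x)
                joint[(d, c)] += p / len(ax)
    return dict(joint)


def marginals(joint):
    """Marginal laws of the x-color and the y-color under a joint law."""
    mx, my = defaultdict(Fraction), defaultdict(Fraction)
    for (cx, cy), p in joint.items():
        mx[cx] += p
        my[cy] += p
    return dict(mx), dict(my)


def one_step_joint(x, y, w, adj, q):
    """Coupled update at the chosen vertex w, for colorings differing only at v."""
    v = next(i for i in range(len(x)) if x[i] != y[i])
    ax, ay = allowed(x, w, adj, q), allowed(y, w, adj, q)
    if w in adj[v]:
        return neighbor_coupling(ax, ay, x[v], y[v])
    # w is v or not adjacent to v: A_w(x) = A_w(y), use the same color in both.
    return {(c, c): Fraction(1, len(ax)) for c in ax}


def expected_distance(x, y, adj, q):
    """E_{x,y}[rho(X_1, Y_1)] with rho the Hamming distance and w uniform on V."""
    n = len(x)
    total = Fraction(0)
    for w in range(n):
        for (cx, cy), p in one_step_joint(x, y, w, adj, q).items():
            x1, y1 = list(x), list(y)
            x1[w], y1[w] = cx, cy
            total += Fraction(1, n) * p * sum(a != b for a, b in zip(x1, y1))
    return total


adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # 4-cycle, Delta = 2
x, y = [0, 1, 0, 1], [2, 1, 0, 1]                     # differ only at v = 0
q, delta, n = 5, 2, 4
bound = 1 - Fraction(1, n) + Fraction(delta, n) * Fraction(1, q - delta)
print(expected_distance(x, y, adj, q), bound)         # 7/8 11/12
```

The exact value $7/8$ sits below the bound $1 - \frac{1}{n} + \frac{\deg(v)}{n} \cdot \frac{1}{q - \Delta} = 11/12$ from Proof 6/8. The `marginals` helper can be used to confirm that each coordinate of `neighbor_coupling` is uniform on its own allowable set, i.e. that the construction really is a coupling.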