Markov Chains and
Mixing Times
By Levin, Peres and Wilmer
Chapter 4: Introduction to Markov Chain Mixing,
Sections 4.1-4.4, pp. 47-61.
Presented by Dani Dorfman
Planned topics
• Total Variation Distance
• Coupling
• The Convergence Theorem
• Measuring Distance from Stationary
Definition
• Given two distributions 𝜇, 𝜈 on Ω, we define the total variation distance to be:
$\|\mu - \nu\|_{TV} = \max_{A \subseteq \Omega} |\mu(A) - \nu(A)|$
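• To make the definition concrete, here is a minimal Python sketch (the function name and toy distributions are mine, not from the book) that computes $\|\mu - \nu\|_{TV}$ by brute force over all events 𝐴 ⊆ Ω; it is exponential in |Ω| and only meant for tiny state spaces:

```python
from itertools import chain, combinations

def tv_by_definition(mu, nu):
    """Total variation via the max over all events A (tiny Omega only).

    mu, nu: dicts mapping each state x in Omega to its probability.
    """
    omega = list(mu)
    events = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
    return max(abs(sum(mu[x] for x in A) - sum(nu[x] for x in A)) for A in events)

# Hypothetical example: a fair coin versus a biased coin.
print(tv_by_definition({'H': 0.5, 'T': 0.5}, {'H': 0.9, 'T': 0.1}))  # 0.4
```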
Example
• Coin-tossing frog: the frog jumps between the east pad 𝑒 and the west pad 𝑤, with
$P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}, \qquad \pi = \left( \frac{q}{p+q}, \frac{p}{p+q} \right)$
[Diagram: the frog jumps 𝑒 → 𝑤 with probability 𝑝 and 𝑤 → 𝑒 with probability 𝑞, staying put with probabilities 1−𝑝, 1−𝑞.]
• Define $\mu_0 = (1, 0)$ and $\Delta_t = \mu_t(e) - \pi(e) \;\; (= \pi(w) - \mu_t(w))$.
• An easy computation shows:
$\|\mu_t - \pi\|_{TV} = |\Delta_t| = |1-p-q|^t \, \Delta_0$
“An Easy Computation”
• Induction on 𝑡:
• Base case 𝑡 = 0: $\Delta_0 = (1-p-q)^0 \Delta_0$.
• Step 𝑡 → 𝑡 + 1:
$\Delta_{t+1} = \mu_{t+1}(e) - \pi(e) = (1-p)\mu_t(e) + q(1 - \mu_t(e)) - \pi(e)$
$= (1-p-q)\mu_t(e) + q - \pi(e) = (1-p-q)\mu_t(e) + q - \frac{q}{p+q}$
$= (1-p-q)\mu_t(e) - (1-p-q)\pi(e) = (1-p-q)\Delta_t$
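• A quick numerical check of the example (the values of 𝑝 and 𝑞 are my own illustrative choices): iterate $\mu_{t+1} = \mu_t P$ and compare $\|\mu_t - \pi\|_{TV}$, computed with the half-𝐿¹ formula of the upcoming Proposition 4.2, against the predicted geometric decay:

```python
import numpy as np

p, q = 0.3, 0.2                                # illustrative values
P = np.array([[1 - p, p], [q, 1 - q]])         # states ordered (e, w)
pi = np.array([q / (p + q), p / (p + q)])      # stationary distribution
mu = np.array([1.0, 0.0])                      # mu_0 concentrated on e

for t in range(6):
    tv = 0.5 * np.abs(mu - pi).sum()           # ||mu_t - pi||_TV
    predicted = abs(1 - p - q) ** t * (1.0 - pi[0])   # |1-p-q|^t * Delta_0
    print(t, tv, predicted)                    # the two columns agree
    mu = mu @ P                                # advance one step
```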
Proposition 4.2
• Let 𝜇 and 𝜈 be two probability distributions on Ω. Then:
$\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|$
[Figure: the densities of 𝜇 and 𝜈, split into regions I, II, III; 𝐵 is the set where 𝜇 ≥ 𝜈, and the shaded region has area 𝜇(𝐴) − 𝜈(𝐴).]
Proof
• Define 𝐵 = {𝑥 | 𝜇(𝑥) ≥ 𝜈(𝑥)} and let 𝐴 ⊆ Ω be an event. Clearly:
$\mu(A) - \nu(A) \le \mu(A \cap B) - \nu(A \cap B) \le \mu(B) - \nu(B)$
• A parallel argument gives:
$\nu(A) - \mu(A) \le \nu(A \cap B^C) - \mu(A \cap B^C) \le \nu(B^C) - \mu(B^C)$
• Note that both upper bounds are equal (they differ by $\mu(\Omega) - \nu(\Omega) = 0$).
• Taking 𝐴 = 𝐵 achieves the upper bounds; therefore:
$\|\mu - \nu\|_{TV} = \mu(B) - \nu(B) = \frac{1}{2}\left[\mu(B) - \nu(B) + \nu(B^C) - \mu(B^C)\right] = \frac{1}{2}\sum_{x \in \Omega} |\mu(x) - \nu(x)|$
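• A sanity check of Proposition 4.2 in Python (random distributions of my own making): the half-𝐿¹ sum must coincide with the brute-force maximum over events from the definition:

```python
import numpy as np
from itertools import chain, combinations

rng = np.random.default_rng(0)
mu = rng.random(5); mu /= mu.sum()             # two random distributions
nu = rng.random(5); nu /= nu.sum()             # on a 5-point space

half_l1 = 0.5 * np.abs(mu - nu).sum()
events = chain.from_iterable(combinations(range(5), r) for r in range(6))
brute = max(abs(mu[list(A)].sum() - nu[list(A)].sum()) for A in events)
print(np.isclose(half_l1, brute))              # True
```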
Remarks
• From the last proof we easily deduce:
$\|\mu - \nu\|_{TV} = \sum_{x \in \Omega,\, \mu(x) \ge \nu(x)} [\mu(x) - \nu(x)]$
• Notice that $\|\cdot\|_{TV}$ is half the 𝐿¹ distance and therefore satisfies the triangle inequality:
$\|\mu - \nu\|_{TV} \le \|\mu - \omega\|_{TV} + \|\omega - \nu\|_{TV}$
Proposition 4.5
• Let 𝜇 and 𝜈 be two probability distributions on Ω. Then:
$\|\mu - \nu\|_{TV} = \frac{1}{2} \sup_{f:\, \max_x |f(x)| \le 1} \left| \sum_{x \in \Omega} f(x)\mu(x) - \sum_{x \in \Omega} f(x)\nu(x) \right|$
Proof
• Clearly the following function achieves the supremum:
$f^*(x) = \begin{cases} 1 & \mu(x) - \nu(x) \ge 0 \\ -1 & \mu(x) - \nu(x) < 0 \end{cases}$
• Therefore:
$\frac{1}{2}\sum_{x \in \Omega} \left[f^*(x)\mu(x) - f^*(x)\nu(x)\right] = \frac{1}{2}\sum_{x:\, \mu(x) \ge \nu(x)} [\mu(x) - \nu(x)] + \frac{1}{2}\sum_{x:\, \mu(x) < \nu(x)} [\nu(x) - \mu(x)]$
$= \frac{1}{2}\|\mu - \nu\|_{TV} + \frac{1}{2}\|\mu - \nu\|_{TV} = \|\mu - \nu\|_{TV}$
where each sum equals $\|\mu - \nu\|_{TV}$ by the first remark above.
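• The same kind of numerical check for Proposition 4.5 (again with random distributions I picked): 𝑓* attains $\|\mu - \nu\|_{TV}$, and random test functions with $\max|f| \le 1$ never exceed it:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = rng.random(6); mu /= mu.sum()
nu = rng.random(6); nu /= nu.sum()
tv = 0.5 * np.abs(mu - nu).sum()

f_star = np.where(mu >= nu, 1.0, -1.0)                     # the optimizer f*
print(np.isclose(0.5 * abs(f_star @ (mu - nu)), tv))       # True
fs = rng.uniform(-1, 1, size=(1000, 6))                    # random f with |f| <= 1
print((0.5 * np.abs(fs @ (mu - nu)) <= tv + 1e-12).all())  # True
```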
Definition
• A coupling of two probability distributions 𝜇, 𝜈 is a pair of random variables (𝑋, 𝑌), defined on one probability space, s.t. 𝑃(𝑋 = 𝑥) = 𝜇(𝑥), 𝑃(𝑌 = 𝑦) = 𝜈(𝑦).
• Given a coupling (𝑋, 𝑌) of 𝜇, 𝜈 one can define 𝑞(𝑥, 𝑦) = 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦), which represents the joint distribution of (𝑋, 𝑌). Thus:
$\mu(x) = \sum_{y \in \Omega} q(x, y), \qquad \nu(y) = \sum_{x \in \Omega} q(x, y)$
Example
• 𝜇, 𝜈 both represent a fair coin flip. We can build several couplings (a numerical version follows below):
• Independent coupling: (𝑋, 𝑌) s.t. $\forall x, y:\ P(X = x, Y = y) = \frac{1}{4}$, giving
$q = \begin{pmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{pmatrix}, \qquad P(X \ne Y) = \frac{1}{2}$
• Identical coupling: (𝑋, 𝑌) s.t. 𝑋 = 𝑌, with $\forall x:\ P(X = Y = x) = \frac{1}{2}$, giving
$q = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}, \qquad P(X \ne Y) = 0$
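• Both couplings written out in Python (labels and layout are mine): row sums of the joint matrix 𝑞 recover 𝜇, column sums recover 𝜈, and $P(X \ne Y)$ is the off-diagonal mass:

```python
import numpy as np

q_indep = np.full((2, 2), 0.25)            # X, Y independent fair coins
q_equal = np.diag([0.5, 0.5])              # X = Y

for q in (q_indep, q_equal):
    print(q.sum(axis=1), q.sum(axis=0))    # marginals: (0.5, 0.5) each
    print(q.sum() - np.trace(q))           # P(X != Y): 0.5, then 0.0
```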
Proposition 4.7
• Let 𝜇 and 𝜈 be two probability distributions on Ω. Then:
$\|\mu - \nu\|_{TV} = \inf_{(X,Y)} P(X \ne Y)$
where the infimum is over all couplings (𝑋, 𝑌) of 𝜇 and 𝜈.
Proof
• In order to show $\|\mu - \nu\|_{TV} \le \inf_{(X,Y)} P(X \ne Y)$, note that for every coupling (𝑋, 𝑌) and every 𝐴 ⊆ Ω:
$\mu(A) - \nu(A) = P(X \in A) - P(Y \in A) \le P(X \in A, Y \notin A) \le P(X \ne Y)$
• Thus it suffices to find a coupling (𝑋, 𝑌) s.t. $P(X \ne Y) = \|\mu - \nu\|_{TV}$.
Proof Cont.
[Figure: the densities of 𝜇 and 𝜈 split into regions I, II, III, where III is the common mass min{𝜇, 𝜈}.]
Proof Cont.
• Define the coupling (𝑋, 𝑌) as follows:
• With probability $p = 1 - \|\mu - \nu\|_{TV}$ take 𝑋 = 𝑌, distributed according to 𝛾_III.
• O/w take 𝑋, 𝑌 independently from $B = \{x \mid \mu(x) - \nu(x) > 0\}$ and 𝐵^𝐶 according to the distributions 𝛾_I, 𝛾_II respectively.
• In the second case 𝑋 ∈ 𝐵 and 𝑌 ∈ 𝐵^𝐶, so 𝑋 ≠ 𝑌; clearly:
$P(X \ne Y) = 1 - p = \|\mu - \nu\|_{TV}$
Proof Cont.
All that is left is to define 𝛾_I, 𝛾_II, 𝛾_III:
$\gamma_I(x) = \begin{cases} \dfrac{\mu(x) - \nu(x)}{\|\mu - \nu\|_{TV}} & \mu(x) - \nu(x) > 0 \\ 0 & \text{else} \end{cases}$
$\gamma_{II}(x) = \begin{cases} \dfrac{\nu(x) - \mu(x)}{\|\mu - \nu\|_{TV}} & \mu(x) - \nu(x) \le 0 \\ 0 & \text{else} \end{cases}$
$\gamma_{III}(x) = \frac{\min\{\mu(x), \nu(x)\}}{1 - \|\mu - \nu\|_{TV}}$
• Note that:
$\mu = p\gamma_{III} + (1 - p)\gamma_I, \qquad \nu = p\gamma_{III} + (1 - p)\gamma_{II}$
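• A sketch of this optimal coupling in Python (random 𝜇 ≠ 𝜈 of my own, so that TV > 0): mass min{𝜇, 𝜈} goes on the diagonal via 𝛾_III, and the leftover mass is paired off as 𝛾_I × 𝛾_II, so that $P(X \ne Y) = \|\mu - \nu\|_{TV}$ exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = rng.random(5); mu /= mu.sum()
nu = rng.random(5); nu /= nu.sum()                   # mu != nu, so tv > 0
tv = 0.5 * np.abs(mu - nu).sum()

q = np.diag(np.minimum(mu, nu))                      # X = Y part (weight 1 - tv)
g1 = np.maximum(mu - nu, 0) / tv                     # gamma_I, supported on B
g2 = np.maximum(nu - mu, 0) / tv                     # gamma_II, supported on B^C
q += tv * np.outer(g1, g2)                           # X != Y part (weight tv)

print(np.allclose(q.sum(axis=1), mu), np.allclose(q.sum(axis=0), nu))  # True True
print(np.isclose(q.sum() - np.trace(q), tv))                           # True
```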
Theorem 4.9
• Suppose that 𝑃 is irreducible and aperiodic, with stationary
distribution 𝜋. Then $\exists \alpha \in (0,1),\ C > 0$ s.t.:
$\forall t:\ \max_{x \in \Omega} \|P^t(x, \cdot) - \pi\|_{TV} \le C\alpha^t$
Lemma (Prop. 1.7)
• If 𝑃 is irreducible and aperiodic, then ∃𝑟 > 0 s.t.:
$\forall x, y:\ P^r(x, y) > 0$
Proof:
• For each 𝑥 define $\mathcal{T}(x) = \{t \mid P^t(x, x) > 0\}$; by aperiodicity, $\gcd \mathcal{T}(x) = 1$ for every 𝑥.
• Each 𝒯(𝑥) is closed under addition.
• From number theory: since $\gcd \mathcal{T}(x) = 1$ and 𝒯(𝑥) is closed under addition, $\forall x\ \exists r_x$ s.t. $\forall r > r_x:\ r \in \mathcal{T}(x)$.
• From irreducibility, $\forall x, y\ \exists r_{x,y} < n$ s.t. $P^{r_{x,y}}(x, y) > 0$, where 𝑛 = |Ω|.
• Taking $r := n + \max_{x \in \Omega} r_x$ ends the proof: $P^r(x, y) \ge P^{r - r_{x,y}}(x, x)\, P^{r_{x,y}}(x, y) > 0$, since $r - r_{x,y} > r_x$.
Proof of Theorem 4.9
• The last lemma gives us the existence of 𝑟 s.t. $\forall x, y:\ P^r(x, y) > 0$.
• Let Π be the |Ω| × |Ω| matrix whose every row is 𝜋.
• ∃𝛿 > 0 s.t. $\forall x, y \in \Omega:\ P^r(x, y) \ge \delta\pi(y) = \delta\Pi(x, y)$.
• Let 𝑄 be the stochastic matrix derived from the equation:
$P^r = (1 - \theta)\Pi + \theta Q \qquad [\theta = 1 - \delta]$
• Clearly: $P\Pi = \Pi P = \Pi$, and 𝑀Π = Π for every stochastic matrix 𝑀.
• By induction one can see:
$\forall k:\ P^{rk} = (1 - \theta^k)\Pi + \theta^k Q^k$
Proof of Induction
• Case 𝑘 = 1 holds by definition.
• 𝑘 → 𝑘 + 1:
$P^{r(k+1)} = P^{rk}P^r = \left[(1 - \theta^k)\Pi + \theta^k Q^k\right]P^r = (1 - \theta^k)\Pi + \theta^k Q^k\left[(1 - \theta)\Pi + \theta Q\right]$
$= (1 - \theta^k)\Pi + \theta^k(1 - \theta)Q^k\Pi + \theta^{k+1}Q^{k+1} = (1 - \theta^k)\Pi + \theta^k(1 - \theta)\Pi + \theta^{k+1}Q^{k+1}$
$= (1 - \theta^{k+1})\Pi + \theta^{k+1}Q^{k+1}$
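• A numerical check of the decomposition on a small chain (the chain, size, and seed are my own choices; for a strictly positive 𝑃 one may take 𝑟 = 1):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)   # P > 0, so r = 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))]); pi /= pi.sum()
Pi = np.tile(pi, (n, 1))                                    # every row is pi

delta = (P / pi).min()                   # largest delta with P >= delta * Pi
theta = 1 - delta
Q = (P - (1 - theta) * Pi) / theta       # stochastic by the choice of delta

for k in (1, 2, 5):
    lhs = np.linalg.matrix_power(P, k)
    rhs = (1 - theta ** k) * Pi + theta ** k * np.linalg.matrix_power(Q, k)
    print(k, np.allclose(lhs, rhs))      # True
```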
Proof of Theorem 4.9 Cont.
• The induction derives:
$P^{rk+j} = P^{rk}P^j = \left[(1 - \theta^k)\Pi + \theta^k Q^k\right]P^j$
• Therefore, since $\Pi P^j = \Pi$:
$\forall j:\ P^{rk+j} - \Pi = \theta^k(Q^k P^j - \Pi)$
• Finally, each row of $Q^k P^j$ and the corresponding row of Π are probability distributions, and the total variation distance between two distributions is at most 1, so:
$\forall x:\ \|P^{rk+j}(x, \cdot) - \pi\|_{TV} \le \theta^k$
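• Illustrating the conclusion on the same kind of toy chain (my own example; 𝑟 = 1, 𝑗 = 0): $\max_x \|P^k(x, \cdot) - \pi\|_{TV}$ stays below the geometric envelope $\theta^k$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)   # P > 0, so r = 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))]); pi /= pi.sum()
theta = 1 - (P / pi).min()                # theta = 1 - delta

for k in range(1, 6):
    Pk = np.linalg.matrix_power(P, k)
    d = 0.5 * np.abs(Pk - pi).sum(axis=1).max()   # max_x ||P^k(x,.) - pi||_TV
    print(k, d <= theta ** k + 1e-12)             # True: geometric envelope
```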
Definitions
• Given a stochastic matrix 𝑃 with its stationary distribution 𝜋, we define:
$d(t) = \max_{x \in \Omega} \|P^t(x, \cdot) - \pi\|_{TV}$
$\bar{d}(t) = \max_{x, y \in \Omega} \|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV}$
Lemma 4.11
• For every stochastic matrix 𝑃 and its stationary distribution 𝜋:
$d(t) \le \bar{d}(t) \le 2d(t)$
Proof:
• The second inequality follows from the triangle inequality.
• For the first, note that by stationarity: $\pi(A) = \sum_{y \in \Omega} \pi(y)P^t(y, A)$.
Proof Cont.
$\|P^t(x, \cdot) - \pi\|_{TV} = \max_{A \subseteq \Omega} |P^t(x, A) - \pi(A)| = \max_{A \subseteq \Omega} \left| \sum_{y \in \Omega} \pi(y)\left[P^t(x, A) - P^t(y, A)\right] \right|$
$\le \max_{A \subseteq \Omega} \sum_{y \in \Omega} \pi(y)\left|P^t(x, A) - P^t(y, A)\right| \le \sum_{y \in \Omega} \pi(y) \max_{A \subseteq \Omega} \left|P^t(x, A) - P^t(y, A)\right|$
$= \sum_{y \in \Omega} \pi(y)\|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} \le \sum_{y \in \Omega} \pi(y)\bar{d}(t) = \bar{d}(t)$
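• Lemma 4.11 checked numerically on a toy chain (my own construction): $d(t) \le \bar{d}(t) \le 2d(t)$ at every 𝑡:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))]); pi /= pi.sum()

for t in (1, 2, 4, 8):
    Pt = np.linalg.matrix_power(P, t)
    d = 0.5 * np.abs(Pt - pi).sum(axis=1).max()
    dbar = max(0.5 * np.abs(Pt[x] - Pt[y]).sum()
               for x in range(n) for y in range(n))
    print(t, d <= dbar + 1e-12 <= 2 * d + 1e-12)   # True
```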
Observations
$d(t) = \max_{\mu} \|\mu P^t - \pi\|_{TV}, \qquad \bar{d}(t) = \max_{\mu, \nu} \|\mu P^t - \nu P^t\|_{TV}$
Lemma 4.12
• The $\bar{d}$ function is submultiplicative, i.e. $\forall s, t:\ \bar{d}(s + t) \le \bar{d}(s)\bar{d}(t)$.
Proof:
• Fix 𝑥, 𝑦 ∈ Ω and let (𝑋_𝑠, 𝑌_𝑠) be the optimal coupling of $P^s(x, \cdot), P^s(y, \cdot)$, i.e. one with $P(X_s \ne Y_s) = \|P^s(x, \cdot) - P^s(y, \cdot)\|_{TV}$ (it exists by the proof of Proposition 4.7).
• Note that:
$P^{t+s}(x, w) = (P^s P^t)(x, w) = \sum_{z \in \Omega} P^s(x, z)P^t(z, w) = E\left[P^t(X_s, w)\right]$
• The same argument gives us:
$P^{t+s}(y, w) = E\left[P^t(Y_s, w)\right]$
Proof Cont.
• Note:
$P^{t+s}(x, w) - P^{t+s}(y, w) = E\left[P^t(X_s, w) - P^t(Y_s, w)\right]$
• Summing over all 𝑤 yields:
$\|P^{t+s}(x, \cdot) - P^{t+s}(y, \cdot)\|_{TV} = \frac{1}{2}\sum_{w \in \Omega} \left|E\left[P^t(X_s, w) - P^t(Y_s, w)\right]\right| \le \frac{1}{2}E\left[\sum_{w \in \Omega} \left|P^t(X_s, w) - P^t(Y_s, w)\right|\right]$
$= E\left[\|P^t(X_s, \cdot) - P^t(Y_s, \cdot)\|_{TV}\right] \le \bar{d}(t)\,P(X_s \ne Y_s) \le \bar{d}(t)\bar{d}(s)$
• (The second-to-last inequality holds because the inner TV distance vanishes on the event 𝑋_𝑠 = 𝑌_𝑠 and is at most $\bar{d}(t)$ otherwise.)
• Taking the maximum over 𝑥, 𝑦 gives $\bar{d}(s + t) \le \bar{d}(s)\bar{d}(t)$.
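• Lemma 4.12 checked the same way (toy chain of my own): $\bar{d}(s + t) \le \bar{d}(s)\bar{d}(t)$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)

def dbar(t):
    Pt = np.linalg.matrix_power(P, t)
    return max(0.5 * np.abs(Pt[x] - Pt[y]).sum()
               for x in range(n) for y in range(n))

for s, t in [(1, 1), (1, 2), (2, 3)]:
    print(s, t, dbar(s + t) <= dbar(s) * dbar(t) + 1e-12)   # True
```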
Remarks
• From submultiplicativity (and $\bar{d}(t) \le 1$) we note that $\bar{d}(t)$ is non-increasing.
• Also:
$\forall c:\ d(ct) \le \bar{d}(ct) \le \bar{d}(t)^c$