Expectation-Propagation performs smooth gradient descent
Guillaume Dehaene
Advances in Approximate Bayesian Inference 2016

Computational troubles in Bayesian inference (slide 2)
If we want to approximate $\pi(\theta)$, the main Gaussian approximations are:
1. Laplace approximation + Gradient Descent
2. Variational Bayes (and a variant)
3. Expectation Propagation

Laplace + Gradient Descent (slide 3)
Laplace = Gaussian approximation at the mode.
Computed using Gradient Descent on $\psi = -\log \pi$.
(Figure: the probability $\pi$.)

Laplace + Gradient Descent (slide 4)
The mathematically conservative choice:
- Gradient Descent is well understood
- Laplace is exact in the large-data limit

Physical intuitions (slide 5)
Gradient Descent ↔ dynamics of a sliding object.
(Figure: the $-\log$ probability landscape.)

Linking GD, VB and EP (slide 6)
VB and EP iterate Gaussian approximations.
We can define an algorithm that:
- iterates Gaussians,
- computes the Laplace approximation,
- does Gradient Descent.

Algorithm 1: disguised gradient descent (slide 7)
- Initialize with any Gaussian $q_0$
- Loop:
  - $\mu_t = E_{q_t}(\theta)$
  - $r = \psi'(\mu_t)$
  - $\beta = \psi''(\mu_t)$
  - $q_{t+1}(\theta) \propto \exp\!\left( -r\,(\theta-\mu_t) - \tfrac{\beta}{2}(\theta-\mu_t)^2 \right)$
The new mean is $\mu_{t+1} = \mu_t - \psi'(\mu_t)/\psi''(\mu_t)$: this is Newton's method!
(A runnable sketch of Algorithms 1 and 2 follows slide 14.)

Algorithm 1: disguised gradient descent (slide 8)
Newton's method: $\psi \approx$ quadratic.
Disguised GD: $\pi \approx \exp(-\text{quadratic})$, i.e. $\pi \approx$ Gaussian.

Variational Bayes Gaussian approximation (slide 9)
The Variational Bayes approach: minimize $KL(q, \pi) = E_q\!\left( \log \tfrac{q}{\pi} \right)$ over Gaussian $q$.
Local minima satisfy (Opper, Archambeau, 2007):
- $E_q(\psi') = 0$
- $E_q(\psi'') = \operatorname{var}(q)^{-1}$

Algorithm 2: smoothed gradient descent (slide 10)
- Initialize with any Gaussian $q_0$
- Loop:
  - $\mu_t = E_{q_t}(\theta)$
  - $r = E_{q_t}\!\left(\psi'(\theta)\right) \approx \psi'(\mu_t)$
  - $\beta = E_{q_t}\!\left(\psi''(\theta)\right) \approx \psi''(\mu_t)$
  - $q_{t+1}(\theta) \propto \exp\!\left( -r\,(\theta-\mu_t) - \tfrac{\beta}{2}(\theta-\mu_t)^2 \right)$

Algorithm 2: smoothed gradient descent (slide 11)
(Figure only.)

α-Divergence minimization (slide 12)
If instead of KL we minimize the α-divergence $D_\alpha(q, \pi) = \int \pi^{1-\alpha} q^{\alpha}$, then local minima $q^*$ are such that, with the hybrid $h^* \propto \pi^{1-\alpha} (q^*)^{\alpha}$:
- $E_{h^*}(\psi') = 0$
- $E_{h^*}\!\left( (\theta - \mu^*)\,\psi' \right) = 1$

Algorithm 3: hybrid smoothing GD (slide 13)
- Initialize with any Gaussian $q_0$
- Loop:
  - $h_t \propto \pi^{1-\alpha} q_t^{\alpha}$
  - $\mu_t = E_{h_t}(\theta)$
  - $r = E_{h_t}\!\left(\psi'(\theta)\right) \approx \psi'(\mu_{t-1})$
  - $\beta = \operatorname{var}(h_t)^{-1} E_{h_t}\!\left( (\theta-\mu_t)\,\psi'(\theta) \right) \approx \psi''(\mu_t)$
  - $q_{t+1}(\theta) \propto \exp\!\left( -r\,(\theta-\mu_t) - \tfrac{\beta}{2}(\theta-\mu_t)^2 \right)$

Interpreting algorithm 3 (slide 14)
The only difference (not obvious for the β-term): replacing $q_t$, a poor approximation of $\pi$, by a superior hybrid approximation $h_t \propto \pi^{1-\alpha} q_t^{\alpha}$.
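To make Algorithms 1 and 2 concrete, here is a minimal 1-D sketch in Python (mine, not code from the talk). It assumes $\psi'$ and $\psi''$ are available as vectorized functions; the names disguised_gd and smoothed_gd, the Gauss-Hermite quadrature, and all default settings are illustrative choices rather than anything prescribed by the slides.

```python
import numpy as np

def disguised_gd(psi_prime, psi_second, mu0, n_iter=50):
    """Algorithm 1 (slide 7): the Gaussian iterates are summarized by their mean;
    the update mu <- mu - psi'(mu)/psi''(mu) is exactly Newton's method."""
    mu = mu0
    for _ in range(n_iter):
        r = psi_prime(mu)          # r    = psi'(mu_t)
        beta = psi_second(mu)      # beta = psi''(mu_t)
        mu = mu - r / beta         # mean of q_{t+1} ∝ exp(-r(θ-μ) - β/2 (θ-μ)²)
    return mu, 1.0 / psi_second(mu)    # Laplace-style mean and variance

def smoothed_gd(psi_prime, psi_second, mu0, v0=1.0, n_iter=50, n_quad=30):
    """Algorithm 2 (slide 10): same update, but psi' and psi'' are averaged under
    the current Gaussian q_t (Gauss-Hermite quadrature replaces the point
    evaluation).  Fixed points satisfy the VB conditions of slide 9:
    E_q(psi') = 0 and E_q(psi'') = 1/var(q)."""
    x, w = np.polynomial.hermite_e.hermegauss(n_quad)   # nodes/weights for N(0,1)
    w = w / w.sum()
    mu, v = mu0, v0
    for _ in range(n_iter):
        theta = mu + np.sqrt(v) * x          # quadrature nodes under q_t
        r = np.dot(w, psi_prime(theta))      # r    = E_{q_t}(psi')
        beta = np.dot(w, psi_second(theta))  # beta = E_{q_t}(psi'')
        mu = mu - r / beta                   # mean of q_{t+1}
        v = 1.0 / beta                       # variance of q_{t+1}
    return mu, v
```

Algorithm 3 (slide 13) has the same structure, except that the expectations are taken under the hybrid $h_t \propto \pi^{1-\alpha} q_t^{\alpha}$, which is no longer Gaussian, so the Gauss-Hermite quadrature would have to be replaced by another integration scheme (for example a grid, as in the EP sketch further below).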
Expectation Propagation (slide 15)
Assume that the target can be factorized: $\pi(\theta) \propto \prod_i f_i(\theta)$.
EP then seeks a Gaussian approximation of each factor: $q_i(\theta) \approx f_i(\theta)$.
These approximations are improved iteratively.

Algorithm 4: classic Expectation Propagation (slide 16)
- Loop:
  - Compute the $i$-th hybrid $h_i \propto f_i(\theta) \prod_{j \neq i} q_j(\theta)$ and its mean and variance: $\mu_i = E_{h_i}(\theta)$, $v_i = \operatorname{var}(h_i)$
  - New $i$-th approximation: $q_i(\theta) = \exp\!\left( -\tfrac{(\theta-\mu_i)^2}{2 v_i} \right) \Big/ \prod_{j \neq i} q_j(\theta) \;\approx f_i(\theta)$

Algorithm 5: smooth EP (slide 17)
Factorizing $\pi$ has split the energy landscape: $\psi(\theta) = \sum_i \psi_i(\theta)$.
For each component $\psi_i(\theta)$, use a different smoothing: $h_i \propto f_i \prod_{j \neq i} q_j$.
Then, update $q_i \approx f_i = \exp(-\psi_i)$.

Algorithm 5: smooth EP (slide 18)
- Initialize with any Gaussians $q_1, q_2, \dots, q_n$
- Loop:
  - $h_i \propto f_i \prod_{j \neq i} q_j$
  - $\mu_i = E_{h_i}(\theta)$
  - $r = E_{h_i}\!\left(\psi_i'(\theta)\right)$
  - $\beta = \operatorname{var}(h_i)^{-1} E_{h_i}\!\left( (\theta-\mu_i)\,\psi_i'(\theta) \right)$
  - $q_i(\theta) \propto \exp\!\left( -r\,(\theta-\mu_i) - \tfrac{\beta}{2}(\theta-\mu_i)^2 \right)$
(A runnable sketch of this algorithm follows slide 21.)

Classic vs Smooth EP (slide 19)
Algorithm 4:
- computationally efficient
- completely unintuitive
Algorithm 5:
- intuitive: linked to Newton's method
- tractable to analysis
Which should we choose?

Conclusion (slide 20)
- Algorithm 1: iterating on Gaussians to perform GD
- Algorithm 2: smoothed GD computes the VB approximation
- Algorithm 3: hybrid smoothing computes the $D_\alpha$ approximation
- Algorithm 5: a more complicated hybrid smoothing which computes the EP approximation
We can re-use our understanding of Newton's method when we think about EP.
A possible path towards improved EP algorithms?

Conclusion (slide 21)
This might prove a path towards theoretical results on EP. It gives an intuitive proof of the link between EP and VB:
- the only difference between Algorithms 2 and 5 is the smoothing distribution, $q_t$ versus the hybrids $h_i$;
- in the limit where all $h_i \approx q$, EP $\approx$ VB;
- this corresponds to a large number of weak factors.
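Similarly, here is a minimal 1-D sketch of Algorithm 5 (slides 17 and 18); it is my own illustration, not code from the talk. Sites are stored through their natural parameters, the hybrid expectations are computed on a user-supplied grid, and there is no damping or safeguard against negative site precisions; the names smooth_ep, n_sweeps, init_prec and all defaults are illustrative assumptions.

```python
import numpy as np

def smooth_ep(psi_list, dpsi_list, grid, n_sweeps=20, init_prec=1.0):
    """Algorithm 5 (slide 18), sketched in 1-D with grid integration.
    psi_list[i] is psi_i = -log f_i and dpsi_list[i] its derivative (vectorized).
    Each site q_i is stored via natural parameters: q_i(θ) ∝ exp(a_i θ - b_i θ²/2)."""
    n = len(psi_list)
    a = np.zeros(n)               # linear natural parameters of the sites
    b = np.full(n, init_prec)     # site precisions
    for _ in range(n_sweeps):
        for i in range(n):
            # cavity = product of all the other sites
            a_cav = a.sum() - a[i]
            b_cav = b.sum() - b[i]
            # hybrid h_i ∝ f_i × cavity, normalized on the grid
            log_h = -psi_list[i](grid) + a_cav * grid - 0.5 * b_cav * grid ** 2
            h = np.exp(log_h - log_h.max())
            h /= h.sum()
            mu_i = np.dot(h, grid)
            var_i = np.dot(h, (grid - mu_i) ** 2)
            # smoothed derivatives of psi_i under the hybrid (slide 18)
            r = np.dot(h, dpsi_list[i](grid))
            beta = np.dot(h, (grid - mu_i) * dpsi_list[i](grid)) / var_i
            # new site: q_i(θ) ∝ exp(-r(θ-μ_i) - β/2 (θ-μ_i)²)
            a[i] = beta * mu_i - r
            b[i] = beta
    prec = b.sum()                         # global approximation q = Π q_i
    return a.sum() / prec, 1.0 / prec      # (mean, variance) of q
```

Classic EP (Algorithm 4, slide 16) would instead moment-match the hybrid directly: replace the two "smoothed derivative" lines and the site update by `a[i] = mu_i / var_i - a_cav` and `b[i] = 1.0 / var_i - b_cav`.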
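A small usage example, assuming the two sketches above have been pasted into the same Python session. The toy target (a standard Gaussian prior times four Student-t likelihood factors with made-up observations y and scale nu) is purely illustrative.

```python
import numpy as np

# Illustrative toy target: pi(θ) ∝ exp(-θ²/2) × Π_k Student-t factor(θ - y_k; ν).
y = np.array([-1.0, 0.0, 0.5, 1.0])    # made-up observations
nu = 4.0

def make_site(yk):
    psi = lambda t: 0.5 * (nu + 1) * np.log1p((t - yk) ** 2 / nu)   # psi_k
    dpsi = lambda t: (nu + 1) * (t - yk) / (nu + (t - yk) ** 2)     # psi_k'
    return psi, dpsi

sites = [(lambda t: 0.5 * t ** 2, lambda t: t)]        # Gaussian prior factor
sites += [make_site(yk) for yk in y]
psi_list = [p for p, _ in sites]
dpsi_list = [d for _, d in sites]

def psi_prime(t):                      # ψ' of the full target (ψ = Σ ψ_i)
    t = np.atleast_1d(t)
    d = t[:, None] - y[None, :]
    return t + ((nu + 1) * d / (nu + d ** 2)).sum(axis=1)

def psi_second(t):                     # ψ'' of the full target
    t = np.atleast_1d(t)
    d = t[:, None] - y[None, :]
    return 1.0 + ((nu + 1) * (nu - d ** 2) / (nu + d ** 2) ** 2).sum(axis=1)

mu_vb, v_vb = smoothed_gd(psi_prime, psi_second, mu0=0.0)
mu_ep, v_ep = smooth_ep(psi_list, dpsi_list, grid=np.linspace(-10.0, 10.0, 2001))
print("smoothed GD (VB-like):", mu_vb, v_vb)
print("smooth EP:            ", mu_ep, v_ep)
```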