A Fresh Look at Bayes' Theorem from Information Theory

Tan Bui-Thanh
Computational Engineering and Optimization (CEO) Group
Department of Aerospace Engineering and Engineering Mechanics
Institute for Computational Engineering and Sciences (ICES)
The University of Texas at Austin

Babuska Series, ICES, Sep 9, 2016

Outline
1 Bayesian Inversion Framework
2 Entropy
3 Relative Entropy
4 Bayes' Theorem and Information Theory
5 Conclusions

Large-scale computation under uncertainty: inverse electromagnetic scattering
Randomness: random errors in measurements are unavoidable, and so is the inadequacy of the mathematical model (Maxwell's equations).
Challenge: how do we invert for the invisible shape/medium using computational electromagnetics with $O(10^6)$ degrees of freedom?

Large-scale computation under uncertainty: full-waveform seismic inversion
Randomness: random errors in seismometer measurements are unavoidable, and so is the inadequacy of the mathematical model (elastodynamics).
Challenge: how do we image the Earth's interior using a forward computational model with $O(10^9)$ degrees of freedom?

Inverse Shape Electromagnetic Scattering Problem
Maxwell's equations:
$$\nabla \times E = -\mu \frac{\partial H}{\partial t} \ \text{(Faraday)}, \qquad \nabla \times H = \epsilon \frac{\partial E}{\partial t} \ \text{(Ampere)},$$
where $E$ is the electric field, $H$ the magnetic field, $\mu$ the permeability, and $\epsilon$ the permittivity.

Forward problem (discontinuous Galerkin discretization):
$$d = G(x),$$
where $G$ maps the shape parameters $x$ to the electric/magnetic field $d$ at the measurement points.

Inverse problem: given (possibly noise-corrupted) measurements of $d$, infer $x$.
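Throughout, it helps to keep a concrete toy model in mind. The sketch below sets up a hypothetical observation process in this spirit; the map `forward_map`, the noise level `sigma`, and the true parameter are illustrative assumptions, not the actual scattering operator.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_map(x):
    # Hypothetical stand-in for the parameter-to-observable map G(x):
    # sends a scalar shape parameter x to two synthetic "measurements".
    return np.array([np.sin(x), x**2])

x_true = 0.8                 # the unknown we would like to infer
sigma = 0.05                 # assumed measurement-noise standard deviation
d = forward_map(x_true) + sigma * rng.standard_normal(2)
print("noisy measurements d =", d)
```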
The Bayesian Statistical Inversion Framework
Bayes' theorem:
$$\pi_{\text{post}}(x \mid d) \propto \pi_{\text{like}}(d \mid x) \times \pi_{\text{prior}}(x)$$

Bayes' theorem for inverse electromagnetic scattering
Prior knowledge: the obstacle is smooth, e.g. for a radial parameterization $r(\theta)$ of the shape,
$$\pi_{\text{pr}}(x) \propto \exp\left(-\int_0^{2\pi} \big(r''(\theta)\big)^2 \, d\theta\right)$$
Likelihood: additive Gaussian noise, for example,
$$\pi_{\text{like}}(d \mid x) \propto \exp\left(-\frac{1}{2}\, \| G(x) - d \|_{C_{\text{noise}}}^2\right)$$

Entropy
Definition: we define the uncertainty in a random variable $X$ distributed by $0 \le \pi(x) \le 1$ as
$$H(X) = -\int \pi(x) \log \pi(x)\, dx \ge 0$$

(Photos of Wiener, Shannon, and Kolmogorov, copied from Sergio Verdu.)
Wiener: "...for it belongs to the two of us equally"
Shannon: "...a mathematical pun"
Kolmogorov: "...has no physical interpretation"
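Returning to the definition, its discrete analogue $H(X) = -\sum_i \pi_i \log \pi_i$ is easy to probe numerically; the helper below is a minimal sketch (the function name is ours).

```python
import numpy as np

def entropy(p):
    # H = -sum_i p_i log p_i, with the convention 0 log 0 = 0
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # log 4 ~ 1.386 nats
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~ 0.168: nearly deterministic
```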
Entropy of the uniform distribution
Let $U$ be a uniform random variable with values in $X$, and $|X| < \infty$:
$$\pi(u) := \frac{1}{|X|} \implies H(U) = \log |X|$$
How uncertain is the uniform random variable? It is the most uncertain one:
$$H(X) \le H(U)$$

100 years of the uniform distribution. (Photo of Hermann Weyl; source: Christoph Aistleitner.)

Gaussian and Maximum Entropy
Maximum-entropy distribution: for $X$ with known mean and variance, which $\pi(x)$ has maximum entropy?
$$\max_{\pi(x)} H(X) = -\int \pi(x) \log \pi(x)\, dx$$
subject to
$$\int x\, \pi(x)\, dx = \mu, \qquad \int (x - \mu)^2\, \pi(x)\, dx = \sigma^2, \qquad \int \pi(x)\, dx = 1.$$
The solution is the Gaussian distribution: $\pi(x) = \mathcal{N}(\mu, \sigma^2)$.

Relative Entropy
Abraham Wald (1945), Harold Jeffreys (1945):
$$D(\pi \,\|\, q) := \int \pi(x) \log\left(\frac{\pi(x)}{q(x)}\right) dx$$

Kullback-Leibler divergence = Relative Entropy
Solomon Kullback and Richard Leibler (1951):
$$D(\pi \,\|\, q) := \int \pi(x) \log\left(\frac{\pi(x)}{q(x)}\right) dx \overset{\text{discrete}}{=} \sum_i \pi_i \log\left(\frac{\pi_i}{q_i}\right)$$

Information Inequality
The most important inequality in information theory:
$$D(\pi \,\|\, q) \ge 0$$
Can we see it easily? (It follows, for instance, from Jensen's inequality applied to the convex function $-\log$.)
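A minimal numerical illustration of the inequality, using random strictly positive probability vectors (the helper `kl` is an assumption of this sketch, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

def kl(p, q):
    # D(p || q) = sum_i p_i log(p_i / q_i), assuming q_i > 0 wherever p_i > 0
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

for _ in range(5):
    p, q = rng.dirichlet(np.ones(6)), rng.dirichlet(np.ones(6))
    print(f"D(p||q) = {kl(p, q):.4f}")   # always >= 0
print("D(p||p) =", kl(p, p))             # and = 0 iff p = q
```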
From Relative Entropy to Bayes' Theorem
Toss a $k$-sided die $n$ times, with the prior distribution of the faces $\{p_i\}_{i=1}^k$, $\sum_{i=1}^k p_i = 1$.
Let $n_i$ be the number of times we see face $i$; then $n_i/n \to p_i$.
What is the likelihood that these $n$ faces are also distributed by the posterior distribution $\{q_i\}_{i=1}^k$, $\sum_{i=1}^k q_i = 1$?
The likelihood of $\{n_i\}_{i=1}^k$ being distributed by $\{q_i\}_{i=1}^k$ (multinomial distribution):
$$L := \frac{n!}{\prod_{i=1}^k n_i!} \prod_{i=1}^k q_i^{n_i}$$

Take the log-likelihood:
$$\log L = \log(n!) - \sum_i \log(n_i!) + \sum_i n_i \log(q_i)$$
Stirling's approximation, $\log n! \approx n \log n - n$, gives
$$\log L = n \log n - \sum_i n_i \log n_i + \sum_i n_i \log q_i + \underbrace{\sum_i n_i - n}_{=\,0}$$
Relative entropy = average negative log-likelihood:
$$\frac{1}{n}(-\log L) = \sum_i \frac{n_i}{n} \log\left(\frac{n_i/n}{q_i}\right) = \sum_i p_i \log\left(\frac{p_i}{q_i}\right) = D(p \,\|\, q)$$
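This asymptotic identity can be checked empirically: simulate $n$ tosses of the $p$-die, evaluate the multinomial log-likelihood under $q$, and compare with $D(p\|q)$. The particular $p$, $q$, and $n$ below are arbitrary choices for the sketch.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(2)
p = np.array([0.1, 0.2, 0.3, 0.4])       # face probabilities of the die
q = np.array([0.25, 0.25, 0.25, 0.25])   # candidate "posterior" distribution
n = 100_000

counts = rng.multinomial(n, p)           # n_i = number of times face i appears

# log L = log n! - sum_i log n_i! + sum_i n_i log q_i
logL = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts) \
       + float(np.sum(counts * np.log(q)))
print(-logL / n, "vs D(p||q) =", float(np.sum(p * np.log(p / q))))
```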
From Relative Entropy to Bayes' Theorem (continued)
Relative entropy = average negative log-likelihood:
$$\frac{1}{n}(-\log L) = D(p \,\|\, q)$$
Writing $\sum \to \int$:
$$-\int \log(L(x))\, p(x)\, dx = \int \log\left(\frac{p(x)}{q(x)}\right) p(x)\, dx,$$
and matching integrands gives
$$q(x) = L(x)\, p(x),$$
which (after normalization) is Bayes' theorem.

From Optimization to Bayes' Theorem
Inverse problem: given the observation model
$$d = G(x) + \varepsilon,$$
the inverse task is: given $d$, infer $x$.
Statistical inversion: prior knowledge $X \sim \pi_{\text{prior}}(x)$; look for the posterior distribution $\pi_{\text{post}}(x)$ that combines the prior information and the information from the data.
The likelihood: assume $\varepsilon \sim \mathcal{N}(0, C)$, so
$$\pi_{\text{like}}(x) = \exp\left(-\frac{1}{2}\, \| d - G(x) \|_C^2\right)$$

Prior Elicitation
Try to get the best prior information, i.e. the discrepancy relative to the posterior is minimized. Conversely, with the best prior, the information gained in the posterior should not be large. Equivalently,
$$\pi_{\text{post}} = \arg\min_{\pi(x)} D(\pi \,\|\, \pi_{\text{prior}}) = \arg\min_{\pi(x)} \int \pi(x) \log\left(\frac{\pi(x)}{\pi_{\text{prior}}(x)}\right) dx$$
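Before bringing in the data term, note that the update $q(x) = L(x)\,p(x)$ is one line on a grid. The sketch below reuses the toy forward map from the first snippet; the $\mathcal{N}(0,1)$ prior, the grid, and the pretend data are illustrative assumptions.

```python
import numpy as np

xs = np.linspace(-2.0, 2.0, 2001)        # grid over the parameter x
dx = xs[1] - xs[0]

def forward_map(x):
    return np.stack([np.sin(x), x**2])   # same toy stand-in for G

sigma = 0.05
d = np.array([0.72, 0.63])               # pretend noisy measurements

prior = np.exp(-0.5 * xs**2)             # assumed N(0, 1) prior, unnormalized
misfit = np.sum((d[:, None] - forward_map(xs))**2, axis=0)
like = np.exp(-0.5 * misfit / sigma**2)  # Gaussian likelihood L(x)

post = like * prior                      # q(x) = L(x) p(x), up to a constant
post /= post.sum() * dx                  # normalize to a density
print("posterior mean =", np.sum(xs * post) * dx)
```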
How about information from the data?
We want to find $x$ that matches the data as well as we can. "Equivalently": we want to find the posterior distribution such that $\| d - G(x) \|_C^2$ is minimized. One approach: minimize the mean squared error,
$$\pi_{\text{post}} = \arg\min_{\pi(x)} \int \pi(x)\, \| d - G(x) \|_C^2\, dx = \arg\min_{\pi(x)} -\int \pi(x) \log(\pi_{\text{like}}(x))\, dx$$
(the two objectives agree up to a factor of 2, which does not change the minimizer).

Prior + data information
From the prior:
$$\pi_{\text{post}} = \arg\min_{\pi(x)} D(\pi \,\|\, \pi_{\text{prior}}) = \arg\min_{\pi(x)} \int \pi(x) \log\left(\frac{\pi(x)}{\pi_{\text{prior}}(x)}\right) dx$$
From the likelihood:
$$\pi_{\text{post}} = \arg\min_{\pi(x)} -\int \pi(x) \log(\pi_{\text{like}}(x))\, dx$$
A compromise:
$$\pi_{\text{post}} = \arg\min_{\pi(x)} -\int \pi(x) \log(\pi_{\text{like}}(x))\, dx + \int \pi(x) \log\left(\frac{\pi(x)}{\pi_{\text{prior}}(x)}\right) dx$$
subject to
$$\int \pi(x)\, dx = 1 \quad \text{and} \quad \pi(x) \ge 0.$$
Does it have a solution $\pi_{\text{post}}(x)$? Is it unique? How do we solve it? Lagrangian + calculus of variations. The solution is Bayes' theorem:
$$\pi_{\text{post}}(x \mid d) = \frac{\pi_{\text{like}}(d \mid x) \times \pi_{\text{prior}}(x)}{\int \pi_{\text{like}}(d \mid x) \times \pi_{\text{prior}}(x)\, dx}$$
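The variational claim can be sanity-checked in a discrete setting: build an arbitrary prior and likelihood, evaluate the compromise objective at the Bayes posterior and at other candidate densities, and observe that Bayes wins. Everything below (the size k, the random prior, the random likelihood) is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 50
prior = rng.dirichlet(np.ones(k))        # an arbitrary discrete prior p_i
like = rng.uniform(0.1, 1.0, size=k)     # arbitrary likelihood values L_i > 0

def objective(pi):
    # J(pi) = -sum_i pi_i log L_i + sum_i pi_i log(pi_i / prior_i)
    return -np.sum(pi * np.log(like)) + np.sum(pi * np.log(pi / prior))

bayes = like * prior / np.sum(like * prior)   # Bayes' theorem
print("J(Bayes) =", objective(bayes))
for _ in range(3):
    other = rng.dirichlet(np.ones(k))         # any other density does worse
    print("J(other) =", objective(other))
```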
Conclusions
1 Information theory provides an intuitive and fresh view of Bayes' theorem.
2 Relative entropy → Bayes' theorem.
3 Optimization + information → Bayes' theorem.