Fast Estimation of Expected Information Gain for Bayesian
Experimental Design Based on Laplace Approximation
Quan Long
Marco Scavino
Raúl Tempone
Suojin Wang
Computer, Electrical and Mathematical Sciences & Engineering, King Abdullah University of Science and Technology, KSA
Department of Statistics, Texas A&M University, College Station, TX 77843, USA
[email protected], [email protected], [email protected],
[email protected]
Introduction

Shannon-type expected information gain is an important utility in evaluating the usefulness of a proposed experiment that involves uncertainty. Its estimation, however, cannot rely solely on Monte Carlo sampling methods, which are generally too computationally expensive for realistic physical models, especially those involving the solution of stochastic partial differential equations. In this work we present a new methodology, based on the Laplace approximation of the posterior probability density function, to accelerate the estimation of the expected information gain in the model parameters and predictive quantities of interest. Furthermore, to deal with the issue of dimensionality in complex problems, we use sparse quadratures for the integration over the prior. We show the accuracy and efficiency of the proposed method via several nonlinear numerical examples, including the design of a single-parameter one-dimensional cubic polynomial model and the choice of current pattern for impedance tomography.
Multi-dimensional Inference
Let us consider additive Gaussian experimental noise,

$$y_i = g(\theta, \xi) + \epsilon_i,$$

where both $y_i$ and $\theta$ are vectors and $\epsilon_i$ is Gaussian noise independent of $\theta$, with covariance $\Sigma_\epsilon$. Before the Laplace approximation is applied, the posterior of the parameter vector reads

$$p(\theta \mid \{y_i\}) \propto \frac{p(\theta)}{\prod_{i=1}^{M} p(y_i)} \exp\left(-\frac{1}{2} \sum_{i=1}^{M} r_i^T \Sigma_\epsilon^{-1} r_i\right),$$

where $r_i = g(\theta_d) + \epsilon_i - g(\theta)$ is the residual for the $i$-th measurement corresponding to one design parameter, and $\theta_d$ is the "true" parameter of the system.
The Laplace approximation leads to the normality of the posterior pdf for the parameters:

$$p(\theta \mid \{y_i\}) \approx p(\hat\theta \mid \{y_i\}) \exp\left(-\frac{(\theta - \hat\theta)^T \Sigma^{-1} (\theta - \hat\theta)}{2}\right), \qquad (1)$$

where $\hat\theta$ and $\Sigma$ are derived below.
We first present how to obtain $\Sigma$, assuming a Gaussian prior for $\theta$, $\theta \sim \mathcal{N}(\theta_0, \Sigma_p)$. Define the following quantity, the negative logarithm of the original parameter posterior before approximation:

$$F(\theta) := -\log p(\theta \mid \{y_i\}) = \frac{1}{2}\sum_{i=1}^{M} r_i^T \Sigma_\epsilon^{-1} r_i - \log p(\theta) + C_1 = \frac{1}{2}\sum_{i=1}^{M} r_i^T \Sigma_\epsilon^{-1} r_i + \frac{1}{2}(\theta - \theta_0)^T \Sigma_p^{-1}(\theta - \theta_0) + C_2,$$

where $C_1$ and $C_2$ are both constants. A Taylor expansion of $F(\theta)$ around $\hat\theta$ yields

$$F(\theta) \approx F(\hat\theta) + \nabla F(\hat\theta)(\theta - \hat\theta) + \frac{1}{2}(\theta - \hat\theta)^T \nabla\nabla F(\hat\theta)(\theta - \hat\theta),$$

where the Jacobian of the negative log posterior with respect to the parameters $\theta$ is

$$\nabla F(\hat\theta) = -\sum_{i=1}^{M} J_g(\hat\theta)^T \Sigma_\epsilon^{-1} r_i + \Sigma_p^{-1}(\hat\theta - \theta_0)$$

and the corresponding Hessian is

$$H_f = \nabla\nabla F(\hat\theta) = -\sum_{i=1}^{M} H_g(\hat\theta)^T \Sigma_\epsilon^{-1} r_i + M J_g(\hat\theta)^T \Sigma_\epsilon^{-1} J_g(\hat\theta) + \Sigma_p^{-1},$$

where $J_g$ and $H_g$ are the Jacobian and Hessian of $g$ with respect to $\theta$. For $M$ sufficiently large, the magnitude of the first term is proportional only to $\sqrt{M}$. Therefore, we can disregard the term

$$-\sum_{i=1}^{M} H_g(\hat\theta)^T \Sigma_\epsilon^{-1} r_i$$

and obtain the approximation

$$H_f(\hat\theta) \approx M J_g^T(\hat\theta) \Sigma_\epsilon^{-1} J_g(\hat\theta) + \Sigma_p^{-1}. \qquad (2)$$

Now consider the maximum a posteriori solution of the parameters,

$$\hat\theta := \arg\min_\theta \left[ \sum_{i=1}^{M} r_i^T \Sigma_\epsilon^{-1} r_i + (\theta - \theta_0)^T \Sigma_p^{-1}(\theta - \theta_0) \right].$$

Indeed, averaging the residuals over the noise, we have

$$\hat\theta = \arg\min_\theta \left[ M (g(\theta_d) - g(\theta))^T \Sigma_\epsilon^{-1} (g(\theta_d) - g(\theta)) + (\theta - \theta_0)^T \Sigma_p^{-1}(\theta - \theta_0) \right].$$

Eventually the covariance matrix of the posterior is

$$\Sigma = H_f^{-1}(\hat\theta).$$

Expected Information Gain

The expected information gain in the multi-dimensional case can be approximated in the following way:

$$I \approx \int_{\Theta} \left[ -\frac{1}{2} \log\frac{|\Sigma|}{|\Sigma_p|} - \frac{d}{2} + \frac{(\hat\theta - \theta_0)^T \Sigma_p^{-1}(\hat\theta - \theta_0)}{2} + \frac{\Sigma : \Sigma_p^{-1}}{2} \right] p(\theta_d)\, d\theta_d \approx \int_{\Theta} \frac{1}{2} \log\frac{|\Sigma_p|}{|\Sigma|}\, p(\theta_d)\, d\theta_d,$$

where $d$ is the dimension of $\theta$ and the symbol $:$ denotes the summation of the component-wise product of two tensors.
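As a concrete illustration, the following is a minimal Python/NumPy sketch of the simplified estimator $I \approx \int \frac{1}{2}\log(|\Sigma_p|/|\Sigma|)\, p(\theta_d)\, d\theta_d$, using Monte Carlo draws from the prior and evaluating the Jacobian at $\theta_d$ in place of $\hat\theta$, consistent with the small-noise argument above. The names (`laplace_eig`, `jac_g`) are illustrative, not taken from the paper's code.

```python
import numpy as np

def laplace_eig(jac_g, prior_draws, Sigma_eps, Sigma_p, M):
    """Laplace-based estimate of I ~ E_prior[ 0.5 * log(|Sigma_p| / |Sigma|) ].

    jac_g       : callable, theta -> (n_data, n_param) Jacobian of g
    prior_draws : (n_samples, n_param) array of draws of the "true" theta_d
    """
    Sigma_eps_inv = np.linalg.inv(Sigma_eps)
    Sigma_p_inv = np.linalg.inv(Sigma_p)
    _, logdet_Sp = np.linalg.slogdet(Sigma_p)
    gains = []
    for theta_d in prior_draws:
        J = jac_g(theta_d)
        # Approximate Hessian of the negative log posterior, eq. (2),
        # with theta_hat replaced by theta_d (small-noise regime).
        H_f = M * J.T @ Sigma_eps_inv @ J + Sigma_p_inv
        # Sigma = H_f^{-1}, so log(|Sigma_p|/|Sigma|) = log|Sigma_p| + log|H_f|.
        _, logdet_Hf = np.linalg.slogdet(H_f)
        gains.append(0.5 * (logdet_Sp + logdet_Hf))
    return float(np.mean(gains))
```

Sparse quadrature nodes and weights can replace the prior draws and the plain average when the dimension of $\theta_d$ makes Monte Carlo integration over the prior too slow.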
Prediction of Quantity of Interest

Often we are interested in some physical quantity of interest, commonly defined as a function of $\theta$ plus an independent error, i.e.,

$$Q = \tau(\theta) + \epsilon_Q,$$

where the prediction error $\epsilon_Q$ is independent of $\theta$. Thus, the uncertainty in $Q$ comes from the direct combination of the two sources $\theta$ and $\epsilon_Q$. Since the posterior pdf of the parameters is concentrated around $\hat\theta$, a small-noise assumption can be applied to propagate the randomness from $\theta$ to $Q$. Linearization of $\tau$ at $\hat\theta$ leads to

$$\tau(\theta) \approx \tau(\hat\theta) + \frac{\partial \tau}{\partial \theta}(\theta - \hat\theta).$$

We can consequently conclude that $Q \mid \{y_i\}$ is Gaussian:

$$p(Q \mid \{y_i\}) = \frac{1}{\sqrt{2\pi}\,\sigma_{Q|\{y_i\}}} \exp\left(-\frac{(Q - \hat Q)^2}{2\sigma^2_{Q|\{y_i\}}}\right), \qquad (3)$$

with

$$\sigma^2_{Q|\{y_i\}} = \frac{\partial \tau}{\partial \theta}^T \Sigma\, \frac{\partial \tau}{\partial \theta} + \sigma^2_Q,$$

where $\sigma^2_Q$, which is assumed to be a known constant, is the variance of $\epsilon_Q$, and $\hat Q = \tau(\hat\theta)$.
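In code, the linearized propagation above is a one-liner; this hypothetical helper (the names are ours, not the paper's) evaluates $\sigma^2_{Q|\{y_i\}}$:

```python
import numpy as np

def predictive_variance(grad_tau, Sigma, sigma2_Q):
    """sigma^2_{Q|{y_i}} = (dtau/dtheta)^T Sigma (dtau/dtheta) + sigma^2_Q."""
    grad_tau = np.asarray(grad_tau)
    return float(grad_tau @ Sigma @ grad_tau + sigma2_Q)
```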
The expected information gain in $Q$ is therefore

$$I = \int\!\!\int \left( \log p(Q \mid \{y_i\}) - \log p(Q) \right) p(Q \mid \{y_i\})\, dQ\; p(\{y_i\})\, d\{y_i\} = H(Q) - H(Q \mid \{y_i\}),$$

where

$$H(Q) = -\int (\log p(Q))\, p(Q)\, dQ$$

and

$$H(Q \mid \{y_i\}) = -\int\!\!\int (\log p(Q \mid \{y_i\}))\, p(Q \mid \{y_i\})\, dQ\; p(\{y_i\})\, d\{y_i\}.$$
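The marginal entropy $H(Q)$ generally has no closed form. One option, sketched below under the assumption that prior-predictive samples of $Q$ are cheap to draw, is a kernel density (resubstitution) estimate; `entropy_Q` is an illustrative name:

```python
import numpy as np
from scipy.stats import gaussian_kde

def entropy_Q(q_samples):
    """H(Q) ~ -mean(log p_hat(Q_j)), with p_hat a Gaussian KDE of p(Q)."""
    p_hat = gaussian_kde(q_samples)
    return float(-np.mean(np.log(p_hat(q_samples))))
```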
We substitute expression (3) into $H(Q \mid \{y_i\})$ and obtain

$$H(Q \mid \{y_i\}) \approx 0.5 \int \left[ \log 2\pi + \log\left( \frac{\partial \tau}{\partial \theta}^T \Sigma\, \frac{\partial \tau}{\partial \theta} + \sigma^2_Q \right) + 1 \right] p(\theta_d)\, d\theta_d.$$
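Correspondingly, this Laplace-based $H(Q \mid \{y_i\})$ reduces to an average of Gaussian entropies over the prior. A minimal sketch, with illustrative names and with $\Sigma$ supplied by the user (e.g. via eq. (2)):

```python
import numpy as np

def conditional_entropy_Q(grad_tau, Sigma_fn, sigma2_Q, prior_draws):
    """H(Q|{y_i}) ~ 0.5 * E[ log(2 pi) + log(grad^T Sigma grad + sigma2_Q) + 1 ]."""
    vals = []
    for theta_d in prior_draws:
        grad = np.asarray(grad_tau(theta_d))
        s2 = grad @ Sigma_fn(theta_d) @ grad + sigma2_Q
        vals.append(0.5 * (np.log(2 * np.pi) + np.log(s2) + 1.0))
    return float(np.mean(vals))
```

The expected information gain in $Q$ then follows as the difference `entropy_Q(...) - conditional_entropy_Q(...)`.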
Generalization to Under-Determined Model

For under-determined models, where the data are informative only along some directions of the parameter space, the Laplace-based estimate generalizes to

$$I \approx \int_{\Theta_0} \left[ -\log\left[p_s(0)\right] - \log\left( (\sqrt{2\pi})^n\, |\Sigma_{s|t}|^{1/2} \right) - \frac{n}{2} \right] p(\theta_0)\, d\theta_0,$$

where $\Sigma_{s|t}$ is the projected covariance matrix.
Numerical Examples
Model with one parameter

We consider the following simple model for the scalar data $y$:

$$y(\theta, \xi) = \theta^3 \xi^2 + \theta \exp(-|0.2 - \xi|) + \epsilon, \qquad \theta \sim \mathcal{U}(-1, 1), \quad \epsilon \sim \mathcal{N}(0, 10^{-3}).$$

Figure 1: Performances of the Laplace approximation and Monte Carlo sampling in computing the expected information gain, M = 10.
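For this example the prior is uniform rather than Gaussian, so the prior-precision term in eq. (2) drops out and $\Sigma = \sigma_\epsilon^2 / (M\, g'(\theta)^2)$; the expected information gain then becomes the average Kullback–Leibler divergence between the Gaussian posterior and the $\mathcal{U}(-1,1)$ prior. A hypothetical sketch of this reduction (boundary truncation of the posterior is ignored):

```python
import numpy as np

M, sigma2_eps = 10, 1e-3

def dg_dtheta(theta, xi):
    # d/dtheta of g(theta, xi) = theta^3 * xi^2 + theta * exp(-|0.2 - xi|)
    return 3.0 * theta**2 * xi**2 + np.exp(-abs(0.2 - xi))

def eig_one_param(xi, n=10_000, rng=np.random.default_rng(0)):
    theta = rng.uniform(-1.0, 1.0, n)
    Sigma = sigma2_eps / (M * dg_dtheta(theta, xi) ** 2)
    # KL( N(theta_hat, Sigma) || U(-1,1) ) = log 2 - 0.5 * log(2 pi e Sigma)
    return float(np.mean(np.log(2.0) - 0.5 * np.log(2 * np.pi * np.e * Sigma)))

print([round(eig_one_param(xi), 2) for xi in (0.25, 0.5, 0.75)])
```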
Model with two indistinguishable parameters

We change the model in the first example to

$$y = (\alpha\theta_1 + \beta\theta_2)^3 \xi^2 + (\alpha\theta_1 + \beta\theta_2) \exp(-|0.2 - \xi|) + \epsilon,$$

with $\theta \sim \mathcal{N}(\theta_0, \Sigma_p)$, $\epsilon \sim \mathcal{N}(0, \sigma_m^2)$, and

$$\theta_0 = [0.5\;\; 0.5]^T, \qquad \Sigma_p(1,1) = \Sigma_p(2,2) = 0.1, \qquad \Sigma_p(1,2) = \Sigma_p(2,1) = 0.$$

Figure 2: Expected information gain for different design parameters. α = 0.7, β = 0.3. In the left figure σ²ₘ = 0.01, while in the right figure σ²ₘ = 0.001.

Figure 3: Logarithmic plot of the absolute consecutive difference of the expected information gain. σ²ₘ = 0.01, α = 1, β = 1.
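This model can be plugged directly into the `laplace_eig` sketch from the inference section; a hypothetical usage at a single design point ξ, with σ²ₘ = 0.01 and M = 10:

```python
import numpy as np

alpha, beta, xi, M = 0.7, 0.3, 0.5, 10
theta_0 = np.array([0.5, 0.5])
Sigma_p = np.diag([0.1, 0.1])
Sigma_eps = np.array([[0.01]])  # scalar observation noise, sigma_m^2 = 0.01

def jac_g(theta):
    u = alpha * theta[0] + beta * theta[1]
    du = 3.0 * u**2 * xi**2 + np.exp(-abs(0.2 - xi))
    return np.array([[du * alpha, du * beta]])  # 1 x 2 Jacobian w.r.t. (theta_1, theta_2)

rng = np.random.default_rng(0)
prior_draws = rng.multivariate_normal(theta_0, Sigma_p, size=2_000)
print(laplace_eig(jac_g, prior_draws, Sigma_eps, Sigma_p, M))  # laplace_eig as sketched above
```

Because only the combination αθ₁ + βθ₂ enters the model, the data-misfit part of $H_f$ is informative only along the direction (α, β); the Gaussian prior keeps $H_f$ invertible, which is precisely the under-determined situation addressed above.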
Impedance Tomography

We apply the methodology to the design of current patterns in impedance tomography, where the unknown parameters are piecewise constant conductivities on nine subdomains.

Figure 4: Admissible locations of boundary sources and subdomains of piecewise constant random conductivities. exp(θ) ∼ N(0, Σₚ), Σₚ(3,3) = Σₚ(5,5) = Σₚ(7,7) = 1, Σₚ(1,1) = Σₚ(2,2) = Σₚ(4,4) = Σₚ(6,6) = Σₚ(8,8) = Σₚ(9,9) = 0.01.

Figure 5: Information gains computed for all the possible combinations of current sources. The parameter space is defined by 9 conductivities.

Figure 6: Two examples of current patterns and corresponding potential contours inducing the most information gain. (a) Current pattern of the 35th scenario. (b) Current pattern of the 36th scenario.

Figure 7: Two examples of current patterns and corresponding potential contours inducing the least information gain. (a) Current pattern of the 5th scenario. (b) Current pattern of the 52nd scenario.
Acknowledgement
Support of this work by the AEA UT-KAUST project entitled “Predictability and uncertainty quantification for models of porous media” is
gratefully acknowledged. Quan Long, Marco Scavino and Raúl Tempone are members of the KAUST SRI Center for Uncertainty Quantification in Computational Science and Engineering.
References
Quan Long, Marco Scavino, Raúl Tempone, Suojin Wang. Fast estimation of expected information gains for Bayesian experimental designs based on Laplace approximation. Computer Methods in Applied Mechanics and Engineering, Vol. 259, pp. 24–39, 2013.

Quan Long, Marco Scavino, Raúl Tempone, Suojin Wang. A projection method for optimal Bayesian experimental design. Preprint, 2013.