The Bouncy Particle Sampler: A Non-Reversible Rejection-Free Markov Chain Monte Carlo Method
Alexandre Bouchard-Côté, Sebastian J. Vollmer, Arnaud Doucet
Presented by Changyou Chen
January 20, 2017
Changyou Chen
The Bouncy Particle Sampler: A Non-Reversible Rejection-Free Markov Chain Monte Carlo M
Introduction
Outline
1. Introduction
2. The Bouncy Particle Sampler
3. Numerical Results

SG-MCMC vs. Bouncy Particle Sampler
SG-MCMC: diffusion based, approximate simulation.
Bouncy particle: Poisson process based, exact simulation.
[Figure: trace of x against t with the bouncy/jumping points marked, and a count plot.]

Bouncy Particle Sampler
1. A Poisson-process-based MCMC sampler:
   - velocity and direction depend on the target distribution (the model posterior);
   - the bouncing (velocity-changing) times are driven by a Poisson process parametrized by the model posterior.
2. The stationary distribution equals the model posterior distribution.
3. Since the Poisson process can be simulated exactly, no error is introduced, and the algorithm is rejection free.
4. Theoretically sound.

The Bouncy Particle Sampler

Poisson Processes
1. A Poisson process $N(t)$ is a counting process of rate $\lambda$ if the inter-arrival times are i.i.d. exponential with mean $1/\lambda$. (Here $t$ can be considered as one-dimensional time for simplicity; the generalization to general spaces is straightforward.)
2. When $\lambda$ depends on $t$, it is called an inhomogeneous Poisson process.
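The definition of a homogeneous Poisson process translates directly into a simulation recipe: keep adding exponential inter-arrival times. The following is a minimal sketch, assuming NumPy is available; the function name `poisson_arrivals` and the chosen rate and horizon are illustrative and not taken from the slides.

```python
import numpy as np

def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process of the given rate on [0, horizon]."""
    times = []
    t = rng.exponential(1.0 / rate)  # inter-arrival times are exponential with mean 1/rate
    while t <= horizon:
        times.append(t)
        t += rng.exponential(1.0 / rate)
    return np.array(times)

rng = np.random.default_rng(0)
arrivals = poisson_arrivals(rate=2.0, horizon=1000.0, rng=rng)

# Over a long horizon, the number of arrivals per unit time approaches the rate.
empirical_rate = len(arrivals) / 1000.0
```

For an inhomogeneous process the inter-arrival times are no longer i.i.d., which is why the later slides rely on time-scale transformation, thinning, or superposition.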

Basic Setup
1. The goal is to sample from a target distribution:
   $p(x) \propto e^{-U(x)}$
2. In Bayesian models, we are given data $D = \{d_1, \dots, d_N\}$, a generative model (likelihood) $p(D \mid x) = \prod_{i=1}^{N} p(d_i \mid x)$ and a prior $p(x)$, and we want to sample from the posterior:
   $p(x \mid D) \propto p(x)\, p(D \mid x) = p(x) \prod_{i=1}^{N} p(d_i \mid x)$
3. $U(x)$ is defined as:
   $U(x) \triangleq -\sum_{i=1}^{N} \log p(d_i \mid x) - \log p(x)$

Basic Setup
1. Like HMC, the parameter space is augmented with a velocity variable.
2. Define the following two quantities:
   Poisson process intensity: $\lambda(x, v) = \max\{0, \langle \nabla U(x), v \rangle\}$
   Velocity bounce (reflection) operator: $R(x)v = v - 2 \frac{\langle \nabla U(x), v \rangle}{\|\nabla U(x)\|^2} \nabla U(x)$
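The two quantities defined on this slide can be sketched in a few lines of NumPy. This is an illustration only: the function names `intensity` and `reflect` and the test point are assumptions, and $U(x) = \|x\|^2$ is used because it is the energy of the deck's Gaussian example.

```python
import numpy as np

def intensity(grad_U, x, v):
    """lambda(x, v) = max{0, <grad U(x), v>}."""
    return max(0.0, float(np.dot(grad_U(x), v)))

def reflect(grad_U, x, v):
    """R(x) v: reflect v against the hyperplane orthogonal to grad U(x)."""
    g = grad_U(x)
    return v - 2.0 * np.dot(g, v) / np.dot(g, g) * g

grad_U = lambda x: 2.0 * x          # gradient of U(x) = ||x||^2
x = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])            # moving uphill, so the intensity is positive
v_new = reflect(grad_U, x, v)       # the component of v along the gradient is flipped
```

The reflection keeps the norm of the velocity and flips the sign of $\langle \nabla U(x), v \rangle$, so the intensity drops to zero immediately after a bounce.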

Algorithm Illustration
1. A one-dimensional example with $U(x) = x^2$.
2. The particle changes velocity (bounces) at the first arrival time of an inhomogeneous Poisson process with intensity $\lambda(x, v)$.
3. In addition, a random bounce (velocity refreshment) happens at the first arrival time of a Poisson process with constant intensity.
[Figure: $U(x) = x^2$ with Poisson process intensity $\lambda(x, v) = \max\{0, \langle \nabla U(x), v \rangle\}$. Where $\nabla U > 0$: if $v < 0$ the probability of a bounce is 0; if $v > 0$ the probability of a bounce is positive.]

Basic BPS Algorithm
$\lambda(x, v) = \max\{0, \langle \nabla U(x), v \rangle\}$, $R(x)v = v - 2 \frac{\langle \nabla U(x), v \rangle}{\|\nabla U(x)\|^2} \nabla U(x)$,
where $I_d$ denotes the $d \times d$ identity matrix, $\|\cdot\|$ the Euclidean norm, and $\langle w, z \rangle = w^t z$ the scalar product between column vectors $w, z$. The algorithm also performs a velocity refreshment at random times distributed according to the arrival times of a homogeneous Poisson process of intensity $\lambda^{\mathrm{ref}} \ge 0$, $\lambda^{\mathrm{ref}}$ being a parameter of the BPS algorithm. Throughout the paper, we use the terminology "event" for a time at which either a bounce or a refreshment occurs. The basic version of the BPS algorithm proceeds as follows:

Algorithm 1 Basic BPS algorithm
1. Initialize the state and velocity $(x^{(0)}, v^{(0)})$ arbitrarily on $\mathbb{R}^d \times \mathbb{R}^d$.
2. While more events $i = 1, 2, \dots$ requested do
   (a) Simulate the first arrival time $\tau_{\mathrm{bounce}} \in (0, \infty)$ of an inhomogeneous Poisson process of intensity $\chi(t) = \lambda(x^{(i-1)} + v^{(i-1)} t, v^{(i-1)})$.
   (b) Simulate $\tau_{\mathrm{ref}} \sim \mathrm{Exp}(\lambda^{\mathrm{ref}})$.
   (c) Set $\tau_i \leftarrow \min(\tau_{\mathrm{bounce}}, \tau_{\mathrm{ref}})$ and compute the next position
       $x^{(i)} \leftarrow x^{(i-1)} + v^{(i-1)} \tau_i$.   (3)
   (d) If $\tau_i = \tau_{\mathrm{ref}}$, sample the next velocity $v^{(i)} \sim \mathcal{N}(0_d, I_d)$.
   (e) If $\tau_i = \tau_{\mathrm{bounce}}$, compute the next velocity $v^{(i)}$ using
       $v^{(i)} \leftarrow R(x^{(i)})\, v^{(i-1)}$,   (4)
       which is the vector obtained once $v^{(i-1)}$ bounces on the plane tangential to the gradient of the energy function at $x^{(i)}$.
3. End While.
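Algorithm 1 can be sketched end to end for the Gaussian target $U(x) = \|x\|^2$ (covariance $\frac{1}{2} I_d$), for which the deck gives the bounce time in closed form. This is a minimal illustration rather than the authors' implementation: the function name `bps_gaussian`, the dimension, the refreshment rate, and the choice to report the time average of $x_1^2$ along the piecewise-linear path are all assumptions made for the example.

```python
import numpy as np

def bps_gaussian(n_events, dim, refresh_rate, rng):
    """Basic BPS for U(x) = ||x||^2; returns the time average of x_1^2 along the path."""
    x = rng.standard_normal(dim)
    v = rng.standard_normal(dim)
    total_time, integral = 0.0, 0.0
    for _ in range(n_events):
        # (a) Bounce time in closed form; e = -log V with V ~ U(0, 1) is Exp(1).
        xv, vv = np.dot(x, v), np.dot(v, v)
        e = rng.exponential(1.0)
        if xv <= 0.0:
            tau_bounce = (-xv + np.sqrt(vv * e)) / vv
        else:
            tau_bounce = (-xv + np.sqrt(xv ** 2 + vv * e)) / vv
        # (b) Refreshment time.
        tau_ref = rng.exponential(1.0 / refresh_rate)
        # (c) Move along the straight line until the first event.
        tau = min(tau_bounce, tau_ref)
        # Exact integral of (x_1 + v_1 t)^2 over [0, tau].
        integral += x[0] ** 2 * tau + x[0] * v[0] * tau ** 2 + v[0] ** 2 * tau ** 3 / 3.0
        total_time += tau
        x = x + v * tau
        if tau_ref < tau_bounce:
            v = rng.standard_normal(dim)                 # (d) refreshment
        else:
            g = 2.0 * x                                  # gradient of U at the new position
            v = v - 2.0 * np.dot(g, v) / np.dot(g, g) * g  # (e) bounce
    return integral / total_time

second_moment = bps_gaussian(n_events=20000, dim=2, refresh_rate=1.0,
                             rng=np.random.default_rng(0))
```

Because the trajectory is continuous in time, expectations are estimated by integrating along each linear segment rather than by averaging the event positions; under this target the time average of $x_1^2$ should approach the marginal variance $1/2$.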

Simulating bounce time using a time-scale transformation
1. The goal is to simulate the first arrival time of an inhomogeneous Poisson process with intensity $\chi(t) = \max\{0, \langle \nabla U(x + vt), v \rangle\}$.
2. Let $\Xi(t) = \int_0^t \chi(s)\, ds$ be the cumulative intensity.
3. The probability that the first arrival time satisfies $\tau > u$ is the probability of zero arrivals in $[0, u]$:
   $P(\tau > u) = \exp(-\Xi(u)) \frac{(\Xi(u))^0}{0!} = \exp(-\Xi(u))$   (1)
4. Hence, $\tau = \Xi^{-1}(-\log V)$, where $V \sim \mathcal{U}(0, 1)$, and $\Xi^{-1}(p) = \inf\{t : \Xi(t) \ge p\}$ is the first time at which $\Xi(t) \ge p$.

Simulating bounce time using a time-scale transformation
1. When the target distribution is log-concave ($U(x)$ is convex):
   $\Xi(\tau) = \int_0^{\tau} \lambda(x + vs, v)\, ds = \int_{\tau^*}^{\tau} \lambda(x + vs, v)\, ds$,
   where $\tau^* = \arg\min_{t : t \ge 0} U(x + vt)$ is the minimizer of $U$ along the ray (the intensity is zero before $\tau^*$).
2. After some simplifications,
   $U(x + v\tau) - U(x + v\tau^*) = -\log V$, $V \sim \mathcal{U}(0, 1)$.   (2)
3. Solve through line search if not explicitly solvable.
[Figure: $U(x) = x^2$ with Poisson process intensity $\lambda(x, v) = \max\{0, \langle \nabla U(x), v \rangle\}$. Where $\nabla U > 0$: if $v < 0$ the probability of a bounce is 0; if $v > 0$ the probability of a bounce is positive.]
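When equation (2) has no explicit solution, a simple line search suffices for convex $U$, since both the directional derivative and $U(x + vt)$ beyond $\tau^*$ are non-decreasing in $t$. The sketch below uses plain bisection with a doubling bracket; the helper names `bisect`, `expand`, and `bounce_time` are hypothetical, and the code assumes $U$ is convex and grows without bound along the ray.

```python
import numpy as np

def bisect(f, lo, hi, iters=200):
    """Root of a non-decreasing function f on [lo, hi] with f(lo) <= 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expand(f, start=1.0):
    """Double the bracket until f becomes non-negative."""
    hi = start
    while f(hi) < 0.0:
        hi *= 2.0
    return hi

def bounce_time(U, grad_U, x, v, V):
    """Solve U(x + v tau) - U(x + v tau_star) = -log V for tau >= tau_star."""
    slope = lambda t: np.dot(grad_U(x + v * t), v)   # d/dt U(x + v t)
    # tau_star minimises U along the ray; it is 0 if U already increases at t = 0.
    tau_star = 0.0 if slope(0.0) >= 0.0 else bisect(slope, 0.0, expand(slope))
    target = U(x + v * tau_star) - np.log(V)
    gap = lambda t: U(x + v * (tau_star + t)) - target
    return tau_star + bisect(gap, 0.0, expand(gap))

U = lambda x: float(np.dot(x, x))
grad_U = lambda x: 2.0 * x
x = np.array([1.0, -0.5])
v = np.array([-1.0, 0.3])
tau = bounce_time(U, grad_U, x, v, V=0.37)
```

For this quadratic energy the result can be checked against the closed-form expression of the Gaussian example; for other convex energies only `U` and `grad_U` need to change.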

Example
$U(x + v\tau) - U(x + v\tau^*) = -\log V$
Example (Gaussian distributions)
Consider the target distribution to be a zero-mean multivariate Gaussian of covariance matrix $\frac{1}{2} I_d$, so that $U(x) = \|x\|^2$. Then
$\tau = \frac{1}{\|v\|^2} \begin{cases} -\langle x, v \rangle + \sqrt{-\|v\|^2 \log V} & \text{if } \langle x, v \rangle \le 0 \\ -\langle x, v \rangle + \sqrt{\langle x, v \rangle^2 - \|v\|^2 \log V} & \text{otherwise} \end{cases}$   (3)

Simulating bounce time via adaptive thinning
1. The idea is to define an easy-to-simulate Poisson process whose intensity upper bounds $\chi(t)$, i.e.:
   $\bar{\chi}_s(t) = 0$ for all $t < s$, and $\bar{\chi}_s(t) \ge \chi(t)$ for all $t \ge s$.   (4)

2.3.2 Simulation using adaptive thinning
In scenarios where it is difficult to solve the time-scale transformation equation, the use of a thinning procedure to simulate $\tau$ provides another alternative. Assume we have access to local-in-time upper bounds $\bar{\chi}_s(t)$ on $\chi(t)$ as in (4), and that we can simulate the first arrival time of the inhomogeneous Poisson process $\bar{\Pi}_s$ with intensity $\bar{\chi}_s(t)$ defined on $[s, \infty)$. Algorithm 2 shows the pseudocode for the adaptive thinning procedure.

Algorithm 2 Simulation of the first arrival time of an inhomogeneous Poisson process through thinning
1. Set $s \leftarrow 0$, $\tau \leftarrow 0$.
2. Do
   (a) Set $s \leftarrow \tau$.
   (b) Sample $\tau$ as the first arrival point of $\bar{\Pi}_s$ of intensity $\bar{\chi}_s$.
   (c) While $V > \chi(\tau) / \bar{\chi}_s(\tau)$, where $V \sim \mathcal{U}(0, 1)$.
3. Return $\tau$.

The event $V > \chi(\tau) / \bar{\chi}_s(\tau)$ corresponds to a rejection step in the thinning algorithm but, in contrast to rejection steps that occur in standard MCMC samplers, in the BPS algorithm this just means that ...
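Algorithm 2 is easiest to see with a constant upper bound, in which case the bounding process is homogeneous and restarting it at $s = \tau$ just means drawing a fresh exponential waiting time. The sketch below makes that simplifying assumption; the function name `first_arrival_by_thinning` and the test intensity $\chi(t) = 1 + \sin t$ with bound 2 are illustrative choices, not taken from the slides.

```python
import numpy as np

def first_arrival_by_thinning(chi, chi_bar, rng):
    """First arrival of a Poisson process with intensity chi(t) <= chi_bar (a constant)."""
    tau = 0.0
    while True:
        tau += rng.exponential(1.0 / chi_bar)      # next arrival of the bounding process
        if rng.uniform() <= chi(tau) / chi_bar:    # accept with probability chi / chi_bar
            return tau

rng = np.random.default_rng(1)
chi = lambda t: 1.0 + np.sin(t)                    # bounded above by 2
samples = np.array([first_arrival_by_thinning(chi, 2.0, rng) for _ in range(20000)])

# Compare with P(tau > 1) = exp(-Xi(1)), where Xi(1) = 2 - cos(1).
empirical_survival = np.mean(samples > 1.0)
exact_survival = np.exp(-(2.0 - np.cos(1.0)))
```

A tighter bound lowers the number of rejected proposals, which is the motivation for the local-in-time (adaptive) bounds $\bar{\chi}_s$ in the algorithm.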

Simulating bounce time using superposition
1. Assume $U(x)$ can be decomposed as:
   $U(x) = \sum_{j=1}^{m} U^{[j]}(x)$.   (5)
2. Let $\chi^{[j]}(t) = \max\{0, \langle \nabla U^{[j]}(x + tv), v \rangle\}$; then we have
   $\chi(t) \le \sum_{j=1}^{m} \chi^{[j]}(t)$.   (6)
3. Therefore, letting $\tau^{[j]}$ be the first arrival time w.r.t. $\chi^{[j]}(t)$, the time
   $\tau = \min_{j = 1, \dots, m} \tau^{[j]}$   (7)
   is the first arrival time of a Poisson process with the bounding intensity $\sum_{j} \chi^{[j]}(t)$, which can serve as the proposal in thinning.
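The superposition property behind equation (7) can be checked numerically in the simplest case of constant intensities: the minimum of independent exponential first-arrival times is itself exponential with the summed rate, and factor $j$ attains the minimum with probability proportional to its rate. The following sketch assumes constant rates purely for illustration; the function name and the rates are hypothetical.

```python
import numpy as np

def first_arrival_superposition(rates, rng):
    """Simulate each factor's first arrival and return the earliest one with its index."""
    taus = rng.exponential(1.0 / np.asarray(rates, dtype=float))
    j = int(np.argmin(taus))
    return taus[j], j

rng = np.random.default_rng(2)
rates = [0.5, 1.0, 1.5]
draws = [first_arrival_superposition(rates, rng) for _ in range(20000)]
mean_tau = np.mean([t for t, _ in draws])        # should approach 1 / sum(rates)
share_last = np.mean([j == 2 for _, j in draws])  # should approach 1.5 / 3.0
```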

Example
Example (Logistic regression)
Let $\{\ell_r \in \mathbb{R}^d\}_{r=1}^{R}$ be the data and $c_r \in \{0, 1\}$ the label of datum $\ell_r$. The parameter $x$ is assigned a standard multivariate Gaussian prior.
$U(x) = \frac{\|x\|^2}{2} + \sum_{r=1}^{R} \underbrace{\log(1 + \exp\langle \ell_r, x \rangle) - c_r \langle \ell_r, x \rangle}_{U^{[r]}(x)}$.   (8)
1. An upper bound for the intensity $\chi^{[r]}(t)$ corresponding to $U^{[r]}(x)$ is
   $\chi^{[r]}(t) \le \bar{\chi}^{[r]} = \sum_{k=1}^{d} \mathbf{1}[(-1)^{c_r} v_k \ge 0] \cdot \ell_{rk} \cdot |v_k|$,   (9)
   where each $\bar{\chi}^{[r]}$ is a constant given $v$.
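The bound in equation (9) can be checked numerically against the exact per-datum intensity. The sketch below assumes non-negative covariates $\ell_{rk} \ge 0$, which the bound as written appears to require but which the slide does not state; the function names and the random test data are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def factor_intensity(ell, c, x, v, t):
    """chi^[r](t) = max{0, <grad U^[r](x + t v), v>} for one datum (ell, c)."""
    grad = ell * (sigmoid(np.dot(ell, x + t * v)) - c)
    return max(0.0, float(np.dot(grad, v)))

def factor_bound(ell, c, v):
    """Constant bound of equation (9); assumes every entry of ell is non-negative."""
    active = ((-1.0) ** c * v) >= 0.0
    return float(np.sum(active * ell * np.abs(v)))

rng = np.random.default_rng(3)
ell = rng.uniform(0.0, 2.0, size=5)      # non-negative covariates (assumption)
x = rng.standard_normal(5)
v = rng.standard_normal(5)
# Largest violation of the bound over both labels and a grid of times (non-positive if the bound holds).
worst_gap = max(
    factor_intensity(ell, c, x, v, t) - factor_bound(ell, c, v)
    for c in (0, 1)
    for t in np.linspace(0.0, 10.0, 101)
)
```

Because the bound does not depend on $t$, the bounding process for each datum is homogeneous, which makes the thinning and superposition steps cheap to simulate.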

Theoretical results
Peters and de With (2012) present an informal proof establishing the fact that the BPS with $\lambda^{\mathrm{ref}} = 0$ admits $\pi$ as invariant distribution. We provide in Appendix A a rigorous proof of this $\pi$-invariance result for $\lambda^{\mathrm{ref}} \ge 0$ and prove that the resulting process is additionally ergodic when $\lambda^{\mathrm{ref}} > 0$. In the following we denote by $P_t(z, dz')$ the transition kernel of the continuous-time Markov process $z(t) = (x(t), v(t))$.

Proposition 1. For any $\lambda^{\mathrm{ref}} \ge 0$, the infinitesimal generator associated to the transition kernel $P_t$ of the BPS is given, for any given continuously differentiable function $h : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, by
$\mathcal{L}h(z) = \lim_{t \to 0} \frac{\int P_t(z, dz')\, h(z') - h(z)}{t}$
$= -\lambda(x, v)\, h(z) + \langle \nabla_x h, v \rangle + \lambda^{\mathrm{ref}} \int \left(h(x, v') - h(x, v)\right) \psi(dv') + \lambda(x, v)\, h(x, R(x)v)$,
where we recall that $\psi(v)$ denotes the standard multivariate Gaussian density on $\mathbb{R}^d$.
This transition kernel is non-reversible and $\rho$-invariant, where
$\rho(z) = \pi(x)\, \psi(v)$.   (12)
If we add the condition $\lambda^{\mathrm{ref}} > 0$, we get the following stronger result.

Theorem 1. If $\lambda^{\mathrm{ref}} > 0$ then $\rho$ is the unique invariant probability measure of the transition kernel of the BPS and the corresponding process satisfies a strong law of large numbers for $\rho$-almost every $z^{(0)}$ and $h \in L^1(\rho)$:
$\lim_{T \to \infty} \frac{1}{T} \int_0^T h(z(t))\, dt = \int h(z)\, \rho(z)\, dz$ a.s.
We exhibit in Section 4.1 a simple example where $P_t$ is not ergodic for $\lambda^{\mathrm{ref}} = 0$.

Basic proof idea
Derive the Fokker-Planck equation for the algorithm:
$\frac{\partial}{\partial t} \mu_t(z) = (\mathcal{L}^* \mu_t)(z)$,
where $z \triangleq (x, v)$, $\mu_t(z)$ is the density of $z$ at time $t$, and $\mathcal{L}^*$ is the adjoint of the generator $\mathcal{L}$.
1. Assume the stationary distribution is $\mu(z) = \pi(x)\psi(v)$.
2. Write out the joint distribution of $z$ and the jump times of the Poisson process.
3. Calculate the marginal distribution of $z$, and get the corresponding density $\mu_t(z)$ for time $t$.
4. Verify that $\frac{d\mu_t(z')}{dt} = 0$ for all $z'$.

The local bouncy particle sampler
Divide the parameters into groups using a factor graph.
[Figure: factor graph over the variables $x_1, \dots, x_4$ with factors $F = \{f_a, f_b, f_c\}$, and the positions of $x_1, \dots, x_4$ against time $t$ along a local BPS trajectory.]
The energy is decomposed into:
$U(x) = \sum_{f \in F} U_f(x)$   (10)
Define local intensity functions $\lambda_f$ and local bouncing matrices $R_f$:
$\lambda_f(x, v) = \max\{0, \langle \nabla U_f(x), v \rangle\}$   (11)
$R_f(x)v = v - 2 \frac{\langle \nabla U_f(x), v \rangle}{\|\nabla U_f(x)\|^2} \nabla U_f(x)$   (12)

The local bouncy particle sampler
1. The next bounce time $\tau$ is the first arrival time of an inhomogeneous Poisson process with intensity $\chi(t) = \sum_{f \in F} \chi_f(t)$.
2. Sample a factor $f$ with probability $\frac{\chi_f(\tau)}{\chi(\tau)}$, and update the components of the parameter $x$ and velocity $v$ related to factor $f$, following the basic BPS sampler.
3. Efficient implementation via a priority queue or thinning-based methods (when the number of factors is large).
Theorem
The local BPS still has $\rho(z) = \pi(x)\psi(v)$ as its stationary distribution.
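Step 2 of the local scheme has two parts: choosing which factor bounces, and reflecting only the velocity components attached to that factor. A minimal sketch follows; the helper names `pick_factor` and `local_bounce`, the index-list representation of a factor's variables, and the numbers used are assumptions made for illustration, and the priority-queue bookkeeping mentioned in step 3 is left out.

```python
import numpy as np

def pick_factor(chi_at_tau, rng):
    """Sample a factor index with probability chi_f(tau) / chi(tau)."""
    p = np.asarray(chi_at_tau, dtype=float)
    return int(rng.choice(len(p), p=p / p.sum()))

def local_bounce(v, idx, grad_f):
    """Reflect only the components v[idx] against grad_f, the gradient of U_f
    with respect to the variables of factor f; other components are untouched."""
    v = v.copy()
    g = np.asarray(grad_f, dtype=float)
    v[idx] = v[idx] - 2.0 * np.dot(g, v[idx]) / np.dot(g, g) * g
    return v

rng = np.random.default_rng(4)
picks = [pick_factor([1.0, 3.0], rng) for _ in range(20000)]
share_second = np.mean(picks)          # should approach 3 / (1 + 3)

v = np.array([1.0, 2.0, 3.0])
v_new = local_bounce(v, idx=[0, 1], grad_f=[1.0, 0.0])   # the third component is not in the factor
```

Only the variables attached to the chosen factor are affected by a bounce, which is what allows the local scheme to avoid recomputing event times for unrelated factors.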
Numerical Results

Multivariate Gaussian distributions
[Figure 3: Left: a trajectory of the BPS for $\lambda^{\mathrm{ref}} = 0$; the center of the space is never explored. Right: ESS per CPU second for increasing dimension $d$ for the process with refreshment (fitted slope = -1.24).]
1. When $\lambda^{\mathrm{ref}} = 0$, the center of the space is never explored (non-ergodic).
2. Optimal scaling of ESS vs. dimension $d$:
   BPS: $\approx d^{-1.24}$ (empirically)
   HMC: $d^{-1.25}$ (theoretically)
   random walk MH: $d^{-2}$ (theoretically)

Comparison of the global and local schemes
1. Test on a sparse Gaussian field (not defined).
[Figure 4: Relative errors for Gaussian chain-shaped random fields. Facets contain results for pairwise precisions 0.1-0.9. Each summarizes the relative errors of 200 (100 local, 100 global) BPS executions, each run for a fixed computational budget (a wall-clock time of 60 seconds). Axes: relative error against pairwise precision parameter; legend: is_local (false, true).]

Comparisons of refreshment schemes
1. Global refreshment: basic BPS sampler.
2. Local refreshment: local BPS with partial components of $v$ refreshed.
3. Restricted refreshment: restrict $v$ to have unit norm when refreshing.
4. Restricted partial refreshment: local BPS version of restricted refreshment.

Comparisons of refreshment schemes
[Figure 5: Comparison of four refreshment schemes. The top panel shows results for a 100-dimensional problem, and the bottom one, for a 1000-dimensional problem. The box plots summarize the marginal variance (in log scale) of the variable with index 50 over 100 executions of BPS for each of the refreshment schemes. Axes: relative error against refreshment type (GLOBAL, LOCAL, PARTIAL, RESTRICTED); legend: refreshment rate (0.01, 0.1, 1, 10).]
1. The local refreshment scheme is less sensitive to $\lambda^{\mathrm{ref}}$.

Comparisons with HMC
Multivariate Gaussian:
[Figure 6: Relative error of marginal variance estimates for a fixed computational budget (30s), against dimension. Sampling methods: BPS(adapt=false,fit-metric=false); Stan/HMC(adapt=true,fit-metric=false,nuts=false); Stan/HMC(adapt=true,fit-metric=false,nuts=true); Stan/HMC(adapt=true,fit-metric=true,nuts=false); Stan/HMC(adapt=true,fit-metric=true,nuts=true).]
We run several methods for this test case, each for a wall clock time of 30 seconds, and measure the relative error on the reconstructed marginal variances. We use Stan [12] as a reference implementation for the HMC algorithms. Different HMC versions are explored by enabling and disabling the NUTS methodology for determining path lengths.

Comparisons with HMC
Gaussian random field:
[Figure 7: Relative error (log scale) for d = 10 (left), d = 100 (middle) and d = 1000 (right), averaged over 10 of the dimensions and 40 runs, against percent of samples processed; methods: BPS, Stan. Each panel is run on a fixed computational budget (corresponding in each panel to the wall clock time taken by 2000 Stan iterations).]

Bayesian logistic regression
Use the superposition trick presented previously.
Compared with FireFly, the only scalable algorithm with the same convergence rate as traditional MCMC.
[Figure: ESS per likelihood evaluation against number of datapoints (R); algorithms: BPS constant refresh rate, Tuned FireFly.]

Bayesian inference of evolutionary parameters
A model from phylogenetics, used to infer evolutionary parameters (details omitted).
[Figure 11: Maximum, median and minimum ESS/s (statistic across parameters) for BPS (left panel, "RF: ESS/s for different statistics") and HMC (right panel, "HMC: ESS/s for different statistics"). The experiments are replicated 10 times with different random seeds.]

Bayesian inference of evolutionary parameters
[Figure 12: Estimate of the ACF of the log-likelihood statistic (logDensity) for BPS (left) and HMC (right), for lags 0 to 50. A similar behavior is observed for the ACF of the other statistics.]
[Figure: BPS: ESS/s for different refresh rates (left); HMC: ESS/s for different values of epsilon (right).]

Thanks for your attention!!!