Pliable Rejection Sampling

Pliable Rejection Sampling
Akram Erraqabi
Michal Valko
Alexandra Carpentier
Odalric-Ambrym Maillard
Montréal Institute of Learning Algorithms, Canada
Inria Lille - Nord Europe, France
Institut für Mathematik - U. of Potsdam, Germany
Inria Saclay - Île-de-France, France
SequeL – Inria Lille
ICML 2016
New York, June 2016
Short Review
Rejection Sampling
Short Review
Rejection Sampling
Goal: Sample from a target density f (not easy to sample from)
Tool: Use a proposal density g (from which sampling is quite easy)
M verifies f ≤ M g
Sampling Algo:
envelope M g
proposal g
1. Sample x from g
2. Accept x as a sample
from f with
(x)
probability Mf g(x)
target f
x
Pliable Rejection Sampling
SequeL - 1/15
Short Review
Rejection Sampling
Short Review
Rejection Sampling
Goal: Sample from a target density f (not easy to sample from)
Tool: Use a proposal density g (from which sampling is quite easy)
M verifies f ≤ M g
Sampling Algo:
envelope M g
proposal g
1. Sample x from g
2. Accept x as a sample
from f with
(x)
probability Mf g(x)
target f
x
Pliable Rejection Sampling
SequeL - 1/15
Short Review
Rejection Sampling
Short Review
Rejection Sampling
Goal: Sample from a target density f (not easy to sample from)
Tool: Use a proposal density g (from which sampling is quite easy)
M verifies f ≤ M g
Sampling Algo:
envelope M g
R
1. Sample x from g
proposal g
2. Accept x as a sample
from f with
(x)
probability Mf g(x)
A
target f
x
acceptance rate =
Pliable Rejection Sampling
A
A+R
=
1
M
SequeL - 1/15
Short Review
Rejection Sampling
Short Review
Rejection Sampling
Goal: Sample from a target density f (not easy to sample from)
Tool: Use a proposal density g (from which sampling is quite easy)
M verifies f ≤ M g
Sampling Algo:
envelope M g
R
proposal gCan
A
Question:
1. Sample x from g
we increase the acceptance
2. Accept x as a sample
rate?
from f with
(x)
probability Mf g(x)
target f
x
acceptance rate =
Pliable Rejection Sampling
A
A+R
=
1
M
SequeL - 1/15
The setting
The setting
Let d ≥ 1 and let f be a density on Rd .
Goal:
Given a number n of requests to f , what is the number T of
samples Y1 , . . . , YT that we can generate such that they are
i.i.d. and sampled according to f ?
acceptance rate =
Pliable Rejection Sampling
T
n
SequeL - 2/15
Can we increase the acceptance rate?
ARS
Can we increase the acceptance rate?
Adaptive Rejection Sampling
Adaptive Rejection Sampling
(ARS) [Gilks and Wild 1992]
I
The target f is assumed to be
log-concave (unimodal)
I
The envelope is made of
tangents at a set of points S
I
At each rejection, the sample
is added to S
Pliable Rejection Sampling
log(f )
SequeL - 3/15
Can we increase the acceptance rate?
ARS
Can we increase the acceptance rate?
Adaptive Rejection Sampling
Adaptive Rejection Sampling
(ARS) [Gilks and Wild 1992]
I
The target f is assumed to be
log-concave (unimodal)
I
The envelope is made of
tangents at a set of points S
I
At each rejection, the sample
is added to S
Pliable Rejection Sampling
log(f )
SequeL - 3/15
Can we increase the acceptance rate?
ARS
Can we increase the acceptance rate?
Adaptive Rejection Sampling
Adaptive Rejection Sampling
(ARS) [Gilks and Wild 1992]
I
The target f is assumed to be
log-concave (unimodal)
I
The envelope is made of
tangents at a set of points S
I
At each rejection, the sample
is added to S
log(f )
Very strong assumption!
Pliable Rejection Sampling
SequeL - 3/15
Can we increase the acceptance rate?
Improved ARS versions
Can we increase the acceptance rate?
Improved ARS versions
Adaptive Rejection
Metropolis Sampling (ARMS)
[Gilks, Best and Tan 1995]
Convex-Concave Adaptive
Rejection Sampling [Gorur and
Tuh 2011]
I
Can deal with non-log-concave
densities.
I
Decomposes the target as
convex + concave
I
Performs a
Metropolis-Hastings control
for each accepted sample
I
Builds piecewise linear upper
bounds (tangents, secant
lines)
I
At each rejection, the sample
is added to S
I
At each rejection, the sample
is added to S
Pliable Rejection Sampling
SequeL - 4/15
Can we increase the acceptance rate?
Improved ARS versions
Can we increase the acceptance rate?
Improved ARS versions
Adaptive Rejection
Metropolis Sampling (ARMS)
[Gilks, Best and Tan 1995]
Convex-Concave Adaptive
Rejection Sampling [Gorur and
Tuh 2011]
I
Can deal with non-log-concave
densities.
I
Decomposes the target as
convex + concave
I
Performs a
Metropolis-Hastings control
for each accepted sample
I
Builds piecewise linear upper
bounds (tangents, secant
lines)
I
At each rejection, the sample
is added to S
I
At each rejection, the sample
is added to S
Correlated samples!
Pliable Rejection Sampling
SequeL - 4/15
Can we increase the acceptance rate?
Improved ARS versions
Can we increase the acceptance rate?
Improved ARS versions
Adaptive Rejection
Metropolis Sampling (ARMS)
[Gilks, Best and Tan 1995]
Convex-Concave Adaptive
Rejection Sampling [Gorur and
Tuh 2011]
I
Can deal with non-log-concave
densities.
I
Decomposes the target as
convex + concave
I
Performs a
Metropolis-Hastings control
for each accepted sample
I
Builds piecewise linear upper
bounds (tangents, secant
lines)
I
At each rejection, the sample
is added to S
I
At each rejection, the sample
is added to S
Correlated samples!
Pliable Rejection Sampling
Convexity assumption!
SequeL - 4/15
A Pliable Solution
Folding the envelope
A Pliable Solution
Folding the envelope
envelope M g
R
proposal g
A
target f
acceptance rate =
A
A+R
=
1
c
M
Pliable Rejection Sampling
SequeL - 5/15
A Pliable Solution
Folding the envelope
A Pliable Solution
Folding the envelope
Better proposal means smaller
rejection area R
envelope M g
Smaller R means g should have a
similar “shape” to f
R
proposal g
A
target f
acceptance rate =
A
A+R
=
1
c
M
Pliable Rejection Sampling
SequeL - 5/15
A Pliable Solution
Folding the envelope
A Pliable Solution
Folding the envelope
Better proposal means smaller
rejection area R
Smaller R means g should have a
similar “shape” to f
target f
acceptance rate =
A
A+R
=
1
c
M
Pliable Rejection Sampling
SequeL - 5/15
A Pliable Solution
Folding the envelope
A Pliable Solution
Folding the envelope
Better proposal means smaller
rejection area R
Smaller R means g should have a
similar “shape” to f
For this purpose:
I Build an estimate fb
estimate fb
target f
acceptance rate =
A
A+R
=
1
c
M
Pliable Rejection Sampling
SequeL - 5/15
A Pliable Solution
Folding the envelope
A Pliable Solution
Folding the envelope
Better proposal means smaller
rejection area R
Smaller R means g should have a
similar “shape” to f
cg
envelope M
b
estimate fb
For this purpose:
I
Build an estimate fb
I
Translate it uniformly
target f
acceptance rate =
A
A+R
=
1
c
M
Pliable Rejection Sampling
SequeL - 5/15
A Pliable Solution
Folding the envelope
A Pliable Solution
Folding the envelope
Better proposal means smaller
rejection area R
Smaller R means g should have a
similar “shape” to f
cg
envelope M
b
R
For this purpose:
I Build an estimate fb
A
I
estimate fb
Translate it uniformly
target f
acceptance rate =
A
A+R
=
1
c
M
Pliable Rejection Sampling
SequeL - 5/15
A Pliable Solution
Folding the envelope
A Pliable Solution
Folding the envelope
Better proposal means smaller
rejection area R
Smaller R means g should have a
similar “shape” to f
cg
envelope M
b
R
For this purpose:
I Build an estimate fb
A
I
estimate fb
Translate it uniformly
target f
! It should be easy to sample
4
acceptance rate =
A
A+R
=
1
c
M
Pliable Rejection Sampling
from gb ... and fb !
SequeL - 5/15
A Pliable Solution
Folding the envelope
A Pliable Solution
Folding the envelope
Better proposal means smaller
rejection area R
Smaller R means g should have a
similar “shape” to f
cg
envelope M
b
R
For this purpose:
I Build an estimate fb
A
I
estimate fb
Translate it uniformly
target f
! It should be easy to sample
4
acceptance rate =
A
A+R
=
1
c
M
Pliable Rejection Sampling
from gb ... and fb !
SequeL - 5/15
A Pliable Solution
Folding the envelope
Visualizing a 2D example
Multimodal case
π π 1 + sin 4πy −
f (x, y) ∝ 1 + sin 4πx −
2
2
5
4
3
2
1
1
0
1
0.8
0.5
0.6
0.4
0.2
0
0
Figure: 2D target density (orange) and the pliable proposal (blue)
Pliable Rejection Sampling
SequeL - 6/15
Pliable Rejection Sampling
The PRS algorithm
Pliable Rejection Sampling
Step 1: Estimating f
I
f is defined on [0, A]d , bounded and smooth.
I
K is a positive kernel on Rd (product kernel).
I
Let X1 , . . . , XN ∼ U[0,A]d . The (modified) kernel regression
estimate is
N
Xi − x
Ad X
f
(X
)K
fb(x) =
i
N hd
h
k=1
For an unbounded support density, some extra information is needed to
construct a kernel-based estimate.
Pliable Rejection Sampling
SequeL - 7/15
Pliable Rejection Sampling
The PRS algorithm
Pliable Rejection Sampling
Step 1: Estimating f
I
f is defined on [0, A]d , bounded and smooth.
I
K is a positive kernel on Rd (product kernel).
I
Let X1 , . . . , XN ∼ U[0,A]d . The (modified) kernel regression
estimate is
N
Xi − x
Ad X
f
(X
)K
fb(x) =
i
N hd
h
k=1
Cost: N requests to f out of n.
For an unbounded support density, some extra information is needed to
construct a kernel-based estimate.
Pliable Rejection Sampling
SequeL - 7/15
Pliable Rejection Sampling
The PRS algorithm
Pliable Rejection Sampling
Bounding the gap
Theorem 1
The estimate fb is such that with probability larger than 1 − δ, for any
point x ∈ [0, A]d ,
s !
log(N Ad/δ) 2s+d
b
f (x) − f (x) ≤ H0
N
where H0 is a constant that depends on the problem parameters.
s is the degree to which f can be expanded as a Taylor expression.
Pliable Rejection Sampling
SequeL - 8/15
Pliable Rejection Sampling
The PRS algorithm
Pliable Rejection Sampling
Bounding the gap
Theorem 1
The estimate fb is such that with probability larger than 1 − δ, for any
point x ∈ [0, A]d ,
s !
log(N Ad/δ) 2s+d
b
f (x) − f (x) ≤ H0
N
where H0 is a constant that depends on the problem parameters.
s is the degree to which f can be expanded as a Taylor expression.
Remaing Budget: n − N .
Pliable Rejection Sampling
SequeL - 8/15
Pliable Rejection Sampling
The PRS algorithm
Pliable Rejection Sampling
Step 2: Generating Samples
I
I
I
Remaining requests to f : n − N
s
2s+d
Let rN = Ad HC log(NNAd/δ)
Construct the pliable proposal gb out of fb:
gb =
I
1
N
fb + rN U[0,A]d
PN
i=1 f (Xi ) + rN
Perform rejection sampling using gb and the empirical rejection
sampling constant
P
1
f (Xi ) + rN
N
c
M = 1 Pi
i f (Xi ) − 5rN
N
Pliable Rejection Sampling
SequeL - 9/15
Pliable Rejection Sampling
The PRS algorithm
The algorithm
target f
Pliable Rejection Sampling
Algorithm: Pliable Rejection Sampling (PRS)
Input: s, n, δ, HC
Initial Sampling
Draw uniformly at random N
samples on [0, A]d
Estimation of f
Estimate f using these N samples by
kernel regression
Generating the samples
Sample n − N samples from the
pliable proposal gb and perform
c as the
Rejection Sampling using M
envelope constant
Output: n
b accepted samples
SequeL - 10/15
Pliable Rejection Sampling
The PRS algorithm
The algorithm
target f
Pliable Rejection Sampling
Algorithm: Pliable Rejection Sampling (PRS)
Input: s, n, δ, HC
Initial Sampling
Draw uniformly at random N
samples on [0, A]d
Estimation of f
Estimate f using these N samples by
kernel regression
Generating the samples
Sample n − N samples from the
pliable proposal gb and perform
c as the
Rejection Sampling using M
envelope constant
Output: n
b accepted samples
SequeL - 10/15
Pliable Rejection Sampling
The PRS algorithm
The algorithm
estimate fb
target f
Pliable Rejection Sampling
Algorithm: Pliable Rejection Sampling (PRS)
Input: s, n, δ, HC
Initial Sampling
Draw uniformly at random N
samples on [0, A]d
Estimation of f
Estimate f using these N samples by
kernel regression
Generating the samples
Sample n − N samples from the
pliable proposal gb and perform
c as the
Rejection Sampling using M
envelope constant
Output: n
b accepted samples
SequeL - 10/15
Pliable Rejection Sampling
The PRS algorithm
The algorithm
Algorithm: Pliable Rejection Sam-
cg
proposal M
b
estimate fb
target f
Pliable Rejection Sampling
pling (PRS)
Input: s, n, δ, HC
Initial Sampling
Draw uniformly at random N
samples on [0, A]d
Estimation of f
Estimate f using these N samples by
kernel regression
Generating the samples
Sample n − N samples from the
pliable proposal gb and perform
c as the
Rejection Sampling using M
envelope constant
Output: n
b accepted samples
SequeL - 10/15
Pliable Rejection Sampling
The Main Result
A bound on the acceptance rate
The asymptotic performace
Theorem 2
Under RTheorem 1’s assumptions and if H0 < HC ,
8rN ≤ [0,A]d f (x)dx. Then, for n large enough, we have with
probability larger than 1 − δ that
"
s #
log (nAd/δ) 3s+d
n
b ≥n 1−O
.
n
where n
b is the number of i.i.d. samples generated by PRS.
Pliable Rejection Sampling
SequeL - 11/15
Pliable Rejection Sampling
The Main Result
A bound on the acceptance rate
The asymptotic performace
Theorem 2
Under RTheorem 1’s assumptions and if H0 < HC ,
8rN ≤ [0,A]d f (x)dx. Then, for n large enough, we have with
probability larger than 1 − δ that
"
s #
log (nAd/δ) 3s+d
n
b ≥n 1−O
.
n
where n
b is the number of i.i.d. samples generated by PRS.
Convergence Rate ↑
with s
Pliable Rejection Sampling
Convergence Rate ↓
with d
SequeL - 11/15
Experiements
Peakiness
Experiments
Scaling with peakiness
f∝
e−x
(1+x)a ,
a defines the peakiness level
1
1
PRS
Astar
SRS
0.9
0.8
0.8
Acceptance Rate
0.7
Acceptance Rate
0.7
PRS
Astar
SRS
0.9
0.6
0.6
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0
0
2
4
6
8
10
12
Peakiness
n = 10
14
16
4
18
20
2
4
6
8
10
12
14
Peakiness
(b) n = 10
16
18
20
5
Figure: Acceptance rate vs. peakiness
Pliable Rejection Sampling
SequeL - 12/15
Experiements
2D example
Experiements
Two dimensional example
5
4
3
2
1
1
0
1
0.8
n = 106
PRS
A? sampling
SRS
0.5
0.6
0.4
0.2
0
acceptance rate
66.4%
76.1%
25.0%
0
standard deviation
0.45%
0.80%
0.01%
Table: 2D example: Acceptance rates averaged over 10 trials
Pliable Rejection Sampling
SequeL - 13/15
Experiements
An inference problem
Experiments
The Clutter problem
n = 105 , 1D
PRS
A? sampling
SRS
acceptance rate
79.5%
89.4%
17.6%
standard deviation
0.2%
0.8%
0.1%
n = 105 , 2D
PRS
A? sampling
SRS
acceptance rate
51,0%
56.1%
2.10−3 %
standard deviation
0.4%
0.5%
10−5 %
Table: Clutter problem: Acceptance rates averaged over 10 trials
Pliable Rejection Sampling
SequeL - 14/15
Experiements
An inference problem
Conclusion
+ PRS deals with a wide class of functions
+ PRS has guarantees: asymptotically we accept everything (whp)
+ PRS is a perfect sampler
+ (whp) the samples are iid (unlike MCMC)
+ PRS’s empirical performance is comparable to state of the art
+ We have an extension to densities with unbounded support
− PRS works only for small and moderate dimensions
+ in favorable cases, it can scale to high dimensions as well
− It does not work well for peaky distributions (posteriors)
Possible extension: Iterative PRS by re-estimating f several times
(use the gathered samples to increase its precision)
Pliable Rejection Sampling
SequeL - 15/15
Thank you!
Questions? feel free to come for a little chat!
SequeL – Inria Lille
ICML 2016
Akram Erraqabi
[email protected]
sequel.lille.inria.fr