Pliable Rejection Sampling

Akram Erraqabi, Michal Valko, Alexandra Carpentier, Odalric-Ambrym Maillard

Montréal Institute of Learning Algorithms, Canada
Inria Lille - Nord Europe, France
Institut für Mathematik - U. of Potsdam, Germany
Inria Saclay - Île-de-France, France

SequeL – Inria Lille
ICML 2016, New York, June 2016

Short Review: Rejection Sampling

Goal: sample from a target density f (not easy to sample from).
Tool: use a proposal density g (from which sampling is quite easy), with a constant M that verifies f ≤ M g.

Sampling algorithm:
1. Sample x from g.
2. Accept x as a sample from f with probability $\frac{f(x)}{M g(x)}$.

[Figure: target f under the envelope M g built from the proposal g; A is the accepted area under f, R is the rejected area between f and M g.]

Acceptance rate = $\frac{A}{A+R} = \frac{1}{M}$

Question: can we increase the acceptance rate?
The setting

Let d ≥ 1 and let f be a density on $\mathbb{R}^d$.

Goal: given a number n of requests to f, what is the number T of samples $Y_1, \dots, Y_T$ that we can generate such that they are i.i.d. and sampled according to f?

Acceptance rate = $\frac{T}{n}$

Can we increase the acceptance rate? Adaptive Rejection Sampling

Adaptive Rejection Sampling (ARS) [Gilks and Wild 1992]
- The target f is assumed to be log-concave (unimodal).
- The envelope is made of tangents at a set of points S.
- At each rejection, the sample is added to S.

[Figure: plot of log(f).]

Very strong assumption!
Can we increase the acceptance rate? Improved ARS versions

Adaptive Rejection Metropolis Sampling (ARMS) [Gilks, Best and Tan 1995]
- Can deal with non-log-concave densities.
- Performs a Metropolis-Hastings control for each accepted sample.
- At each rejection, the sample is added to S.
Correlated samples!

Convex-Concave Adaptive Rejection Sampling [Görür and Teh 2011]
- Decomposes the target as convex + concave.
- Builds piecewise linear upper bounds (tangents, secant lines).
- At each rejection, the sample is added to S.
Convexity assumption!
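For reference, the accept/reject step that all of these variants build on can be sketched in a few lines. This is a minimal illustration, not the authors' code: the Beta(2, 2) target, the uniform proposal, and the helper name `rejection_sample` are assumptions chosen so that f ≤ M g holds with M = 1.5. The sketch checks empirically that the acceptance rate is close to 1/M.

```python
import numpy as np

def rejection_sample(f, g_sample, g_pdf, M, n, rng):
    """Draw n proposals from g; keep each x with probability f(x) / (M * g(x))."""
    x = g_sample(n, rng)                    # step 1: sample from the proposal g
    u = rng.uniform(size=n)                 # one uniform draw per proposal
    accept = u * M * g_pdf(x) < f(x)        # step 2: accept/reject test
    return x[accept]

# Illustrative target (an assumption): Beta(2, 2) density, f(x) = 6 x (1 - x) on [0, 1].
f = lambda x: 6.0 * x * (1.0 - x)
g_pdf = lambda x: np.ones_like(x)           # uniform proposal density on [0, 1]
g_sample = lambda n, rng: rng.uniform(size=n)
M = 1.5                                     # max of f, so f <= M * g everywhere

rng = np.random.default_rng(0)
n = 100_000
samples = rejection_sample(f, g_sample, g_pdf, M, n, rng)
acceptance_rate = samples.size / n
print(f"acceptance rate: {acceptance_rate:.3f}, 1/M: {1.0 / M:.3f}")
```

A tighter envelope (a smaller M) directly raises the acceptance rate, which is what the adaptive methods above try to achieve under their respective assumptions.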
A Pliable Solution: Folding the envelope

- A better proposal means a smaller rejection area R.
- A smaller R means g should have a similar "shape" to f.

For this purpose:
- Build an estimate $\hat f$ of the target f.
- Translate it uniformly to obtain the envelope $\hat M \hat g$.

Caution: it should be easy to sample from $\hat g$ ... and $\hat f$!

[Figure: target f under the envelope M g of a generic proposal g (accepted area A, rejected area R), then the estimate $\hat f$ and the folded envelope $\hat M \hat g$ around the same target.]

Acceptance rate = $\frac{A}{A+R} = \frac{1}{\hat M}$

Visualizing a 2D example: multimodal case

\[ f(x, y) \propto \left(1 + \sin\left(4\pi x - \frac{\pi}{2}\right)\right)\left(1 + \sin\left(4\pi y - \frac{\pi}{2}\right)\right) \]

Figure: 2D target density (orange) and the pliable proposal (blue)

The PRS algorithm, Step 1: Estimating f

- f is defined on $[0, A]^d$, bounded and smooth.
- K is a positive kernel on $\mathbb{R}^d$ (product kernel).
- Let $X_1, \dots, X_N \sim U_{[0,A]^d}$.

The (modified) kernel regression estimate, with bandwidth h > 0, is

\[ \hat f(x) = \frac{A^d}{N h^d} \sum_{i=1}^{N} f(X_i)\, K\left(\frac{X_i - x}{h}\right) \]

Cost: N requests to f out of n.

For an unbounded support density, some extra information is needed to construct a kernel-based estimate.
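The estimation step can be illustrated with a small one-dimensional sketch. It is written under stated assumptions rather than taken from the talk: d = 1, A = 1, a Gaussian kernel for K, a hand-picked bandwidth `h = 0.03`, and a target that is one factor of the 2D example (which integrates to 1 on [0, 1]). The function name `kernel_estimate` is hypothetical.

```python
import numpy as np

def kernel_estimate(x, X, fX, h, A=1.0):
    """Modified kernel regression estimate in 1D (d = 1):
    f_hat(x) = A / (N h) * sum_i f(X_i) K((X_i - x) / h), with K Gaussian (an assumption)."""
    z = (X[None, :] - np.atleast_1d(x)[:, None]) / h   # shape (len(x), N)
    K = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)     # positive kernel, integrates to 1
    return A / (X.size * h) * (K * fX[None, :]).sum(axis=1)

def f(x):
    """Illustrative target on [0, 1]: one factor of the 2D example."""
    return 1.0 + np.sin(4.0 * np.pi * x - np.pi / 2.0)

rng = np.random.default_rng(0)
N = 20_000                                  # number of requests spent on estimation
X = rng.uniform(0.0, 1.0, size=N)           # X_1, ..., X_N ~ U[0, 1]
fX = f(X)                                   # the N requests to f
h = 0.03                                    # hypothetical bandwidth

grid = np.linspace(0.1, 0.9, 81)            # interior points, away from boundary effects
f_hat = kernel_estimate(grid, X, fX, h)
max_err = np.max(np.abs(f_hat - f(grid)))
print(f"max |f_hat - f| on the grid: {max_err:.3f}")
```

The bandwidth trades smoothing bias against variance for a fixed N; the value used here is only illustrative.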
The PRS algorithm: Bounding the gap

Theorem 1. The estimate $\hat f$ is such that, with probability larger than $1 - \delta$, for any point $x \in [0, A]^d$,

\[ \left|\hat f(x) - f(x)\right| \le H_0 \left(\frac{\log(NAd/\delta)}{N}\right)^{\frac{s}{2s+d}} \]

where $H_0$ is a constant that depends on the problem parameters, and s is the degree up to which f admits a Taylor expansion.

Remaining budget: n − N.

The PRS algorithm, Step 2: Generating samples

- Remaining requests to f: n − N.
- Let
\[ r_N = A^d H_C \left(\frac{\log(NAd/\delta)}{N}\right)^{\frac{s}{2s+d}} \]
- Construct the pliable proposal $\hat g$ out of $\hat f$:
\[ \hat g = \frac{\hat f + r_N\, U_{[0,A]^d}}{\frac{1}{N}\sum_{i=1}^{N} f(X_i) + r_N} \]
- Perform rejection sampling using $\hat g$ and the empirical rejection sampling constant
\[ \hat M = \frac{\frac{1}{N}\sum_{i=1}^{N} f(X_i) + r_N}{\frac{1}{N}\sum_{i=1}^{N} f(X_i) - 5 r_N} \]
The PRS algorithm

Algorithm: Pliable Rejection Sampling (PRS)
Input: s, n, δ, $H_C$
Initial sampling: draw uniformly at random N samples on $[0, A]^d$.
Estimation of f: estimate f using these N samples by kernel regression.
Generating the samples: sample n − N samples from the pliable proposal $\hat g$ and perform rejection sampling using $\hat M$ as the envelope constant.
Output: $\hat n$ accepted samples

[Figure: target f, estimate $\hat f$, and proposal $\hat M \hat g$.]

The Main Result: a bound on the acceptance rate (asymptotic performance)

Theorem 2. Under Theorem 1's assumptions, and if $H_0 < H_C$ and $8 r_N \le \int_{[0,A]^d} f(x)\,dx$, then, for n large enough, we have with probability larger than $1 - \delta$ that

\[ \hat n \ge n \left[ 1 - O\left( \left(\frac{\log(nAd/\delta)}{n}\right)^{\frac{s}{3s+d}} \right) \right] \]

where $\hat n$ is the number of i.i.d. samples generated by PRS.
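To make the algorithm and the acceptance guarantee concrete, here is a minimal one-dimensional sketch of the three PRS stages. It is an illustration under assumptions, not the authors' implementation: d = 1 and A = 1 (so the uniform density on [0, 1] equals 1), a normalized target, a Gaussian kernel, and hand-picked values for `N`, `h`, and `H_C`, none of which come from the talk. The envelope is only valid when $f \le \hat M \hat g$, which the theory guarantees with high probability for a suitable $H_C$.

```python
import numpy as np

def f(x):
    """Illustrative normalized target on [0, 1] (zero outside): 1 + sin(4*pi*x - pi/2)."""
    inside = (x >= 0.0) & (x <= 1.0)
    return np.where(inside, 1.0 + np.sin(4.0 * np.pi * x - np.pi / 2.0), 0.0)

def f_hat(x, X, fX, h, chunk=1000):
    """Kernel estimate 1/(N h) * sum_i f(X_i) K((X_i - x)/h), Gaussian K, A = d = 1."""
    out = np.empty(x.size)
    for start in range(0, x.size, chunk):              # chunked to bound memory use
        z = (X[None, :] - x[start:start + chunk, None]) / h
        K = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
        out[start:start + chunk] = (K * fX[None, :]).sum(axis=1) / (X.size * h)
    return out

def prs(n, N, s, delta, H_C, h, rng):
    # Initial sampling: N uniform requests to f.
    X = rng.uniform(size=N)
    fX = f(X)
    m = fX.mean()                                      # (1/N) sum_i f(X_i)
    # Confidence width r_N and empirical constant M_hat (with A = d = 1).
    r_N = H_C * (np.log(N / delta) / N) ** (s / (2 * s + 1))
    M_hat = (m + r_N) / (m - 5.0 * r_N)
    # Sample n - N points from g_hat, a mixture of f_hat / m and the uniform density.
    k = n - N
    from_uniform = rng.uniform(size=k) < r_N / (m + r_N)
    centers = rng.choice(X, size=k, p=fX / fX.sum())   # pick X_i with prob. prop. to f(X_i)
    x = np.where(from_uniform, rng.uniform(size=k),
                 centers + h * rng.standard_normal(k))
    inside = (x >= 0.0) & (x <= 1.0)
    g_hat = (f_hat(x, X, fX, h) + r_N * inside) / (m + r_N)
    # Rejection sampling with envelope M_hat * g_hat.
    accept = rng.uniform(size=k) * M_hat * g_hat < f(x)
    return x[accept], M_hat

rng = np.random.default_rng(0)
n, N = 30_000, 5_000                                   # hypothetical budget split
samples, M_hat = prs(n, N, s=2, delta=0.05, H_C=0.5, h=0.03, rng=rng)
rate_phase2 = samples.size / (n - N)
print(f"accepted {samples.size} of {n - N} proposals; 1/M_hat = {1.0 / M_hat:.3f}")
```

In this sketch the acceptance rate of the sampling stage should be close to $1/\hat M$, and the overall rate $\hat n / n$ is lower because the N estimation requests produce no samples. As N grows, $r_N$ shrinks and $\hat M$ approaches 1.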
Remark on Theorem 2: the convergence rate improves as s increases (↑) and degrades as d increases (↓).

Experiments: scaling with peakiness

\[ f \propto \frac{e^{-x}}{(1+x)^{a}} \]

where a defines the peakiness level.

Figure: Acceptance rate vs. peakiness for PRS, A* sampling and SRS; panels (a) n = 10^4 and (b) n = 10^5.

Experiments: two-dimensional example

n = 10^6              PRS      A* sampling   SRS
acceptance rate       66.4%    76.1%         25.0%
standard deviation    0.45%    0.80%         0.01%

Table: 2D example: acceptance rates averaged over 10 trials

Experiments: an inference problem (the Clutter problem)

n = 10^5, 1D          PRS      A* sampling   SRS
acceptance rate       79.5%    89.4%         17.6%
standard deviation    0.2%     0.8%          0.1%

n = 10^5, 2D          PRS      A* sampling   SRS
acceptance rate       51.0%    56.1%         2 × 10^-3 %
standard deviation    0.4%     0.5%          10^-5 %

Table: Clutter problem: acceptance rates averaged over 10 trials

Conclusion

+ PRS deals with a wide class of functions
+ PRS has guarantees: asymptotically we accept everything (with high probability, whp)
+ PRS is a perfect sampler
+ (whp) the samples are i.i.d. (unlike MCMC)
+ PRS's empirical performance is comparable to the state of the art
+ We have an extension to densities with unbounded support
− PRS works only for small and moderate dimensions
+ In favorable cases, it can scale to high dimensions as well
− It does not work well for peaky distributions (posteriors)

Possible extension: iterative PRS, re-estimating f several times (using the gathered samples to increase its precision).

Thank you! Questions? Feel free to come for a little chat!
SequeL – Inria Lille ICML 2016 Akram Erraqabi [email protected] sequel.lille.inria.fr