New algorithms for global optimization with Gaussian processes

Emile Contal (PhD student with Nicolas Vayatis) -- [email protected]
CMLA, ENS Cachan -- June 17, 2013

Outline: Introduction -- Gaussian Processes -- GP-UCB -- Parallelism

Introduction

Motivating example

[Figure: noisy observations $y_{x_1}, y_{x_2}, y_{x_3}, y_{x_4}$ of an unknown function at points $x_1, \dots, x_4$; which point $x_{n+1}$ should be queried next?]

Sequential global optimization

Setup
- an unknown function $f : \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subset \mathbb{R}^d$
- goal: reach $f(x^\star) = \max_{x \in \mathcal{X}} f(x)$
- sequential queries $x_1, x_2, \dots \in \mathcal{X}$
- noisy observations $y_1, y_2, \dots \in \mathbb{R}$, such that $y_t = f(x_t) + \epsilon_t$ where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ iid

Examples
- heavy numerical experiments
- laboratory experiments
- sensor placement

Objective

Challenges
- high-dimensional search space
- expensive evaluations
- exploration / exploitation trade-off

Cumulative regret
- instantaneous regret: $r_t = f(x^\star) - f(x_t)$
- cumulative regret: $R_T = \sum_{t=1}^{T} \big( f(x^\star) - f(x_t) \big)$

Gaussian Processes

Framework

Definition: $f \sim \mathcal{GP}(m, k)$, with mean function $m : \mathcal{X} \to \mathbb{R}$ and kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$, when for all $x_1, \dots, x_n$,
$$\big( f(x_1), \dots, f(x_n) \big) \sim \mathcal{N}(\mu, \mathbf{C}),$$
with $\mu[x_i] = m(x_i)$ and $\mathbf{C}[x_i, x_j] = k(x_i, x_j)$.

[Figure: four sample paths drawn from 1D Gaussian processes.]

Posterior distribution

Bayesian inference (Rasmussen and Williams, 2005): at iteration $T$, with observations $Y_{X_T}$ at $X_T = \{x_1, \dots, x_T\}$, the posterior mean and variance are given by
$$\mu_{T+1}(x) = \mathbf{k}_T(x)^\top \mathbf{C}_T^{-1} Y_{X_T}, \qquad (1)$$
$$\sigma_{T+1}^2(x) = k(x, x) - \mathbf{k}_T(x)^\top \mathbf{C}_T^{-1} \mathbf{k}_T(x), \qquad (2)$$
where $\mathbf{C}_T = \mathbf{K}_T + \sigma^2 \mathbf{I}$ and $\mathbf{K}_T = [k(x_t, x_{t'})]_{x_t, x_{t'} \in X_T}$.
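Equations (1) and (2) amount to a few lines of linear algebra. The following is a minimal sketch for a zero-mean 1D GP with an RBF kernel of unit prior variance; the function names, length scale, and noise level are illustrative choices, not part of the slides.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def gp_posterior(x_train, y_train, x_test, noise_var=0.01, length_scale=0.2):
    """Posterior mean and variance of a zero-mean GP (Eqs. 1-2)."""
    K = rbf_kernel(x_train, x_train, length_scale)
    C = K + noise_var * np.eye(len(x_train))            # C_T = K_T + sigma^2 I
    k_star = rbf_kernel(x_train, x_test, length_scale)  # k_T(x), one column per test point
    alpha = np.linalg.solve(C, y_train)
    mu = k_star.T @ alpha                               # Eq. (1)
    v = np.linalg.solve(C, k_star)
    var = 1.0 - np.sum(k_star * v, axis=0)              # Eq. (2), with k(x, x) = 1
    return mu, var
```

Note that the posterior variance collapses toward the noise floor at observed points and returns to the prior variance far from them, which is what drives the confidence bounds used below.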
[Figure: GP posterior on $[-1, 1]$, posterior mean with confidence envelope around the observations.]

GP-UCB

Upper and lower confidence bounds

Definition:
$$f_T^+(x) = \mu_T(x) + \beta_T \sigma_T(x),$$
$$f_T^-(x) = \mu_T(x) - \beta_T \sigma_T(x).$$

Property: $\forall x \in \mathcal{X}$, $\forall T \geq 1$, $f(x) \in \big[ f_T^-(x), f_T^+(x) \big]$ with high probability.

GP-UCB (Srinivas et al., 2012)

Algorithm 1: GP-UCB
for $t = 0, 1, \dots$ do
    compute $\mu_t$ and $\sigma_t^2$ (Eqs. 1, 2) with $y_1, \dots, y_{t-1}$
    $x_t \leftarrow \arg\max_{x \in \mathcal{X}} f_t^+(x)$
    query $x_t$ and observe $y_t$

[Figures: successive iterations of GP-UCB on a 1D example; the queries progressively concentrate around the maximum.]
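Algorithm 1 can be sketched end to end on a finite grid. This is a self-contained toy implementation, assuming an RBF kernel with unit prior variance, a fixed $\beta$, and a 1D search space; all numeric settings are illustrative.

```python
import numpy as np

def kern(a, b, ls=0.2):
    """RBF kernel with unit variance (illustrative choice)."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls ** 2))

def gp_ucb(f, grid, n_iter=25, beta=2.0, noise=0.01):
    """GP-UCB on a finite grid: at each step query argmax of mu + beta * sigma."""
    rng = np.random.default_rng(0)
    xs = [grid[rng.integers(len(grid))]]               # one random initial query
    ys = [f(xs[0]) + rng.normal(0, np.sqrt(noise))]
    for _ in range(n_iter - 1):
        X, Y = np.array(xs), np.array(ys)
        C = kern(X, X) + noise * np.eye(len(X))        # C_t = K_t + sigma^2 I
        ks = kern(X, grid)
        mu = ks.T @ np.linalg.solve(C, Y)              # Eq. (1)
        var = np.maximum(1.0 - np.sum(ks * np.linalg.solve(C, ks), axis=0), 0.0)
        x_next = grid[int(np.argmax(mu + beta * np.sqrt(var)))]  # argmax f_t^+
        xs.append(x_next)
        ys.append(f(x_next) + rng.normal(0, np.sqrt(noise)))
    return np.array(xs), np.array(ys)
```

Early on the variance term dominates and the queries spread out (exploration); once the posterior mean is informative, the argmax of $f_t^+$ settles near the maximum (exploitation).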
Regret bounds

Theorem (Srinivas et al., 2012): $\forall \delta > 0$, set $\forall t \leq T$, $\beta_t = 2 \log\!\big( |\mathcal{X}|\, t^2 \pi^2 / (6\delta) \big)$; then
$$\Pr\Big[ R_T \leq \sqrt{C\, T\, \beta_T\, \gamma_T} \Big] \geq 1 - \delta, \qquad \text{where } C = \frac{8}{\log(1 + \sigma^{-2})}.$$

Information gain
- $H(X)$ denotes the information entropy
- $I(X) = H(Y_X) - H(Y_X \mid f)$
- $\gamma_T = \max_{X \subset \mathcal{X},\, |X| = T} I(X)$

Bounds on $\gamma_T$
- linear kernel: $\gamma_T \in O(d \log T)$
- RBF kernel: $\gamma_T \in O\big( (\log T)^{d+1} \big)$
- Matérn kernel: $\gamma_T \in O\big( T^{\alpha} \log T \big)$, with $\alpha = \frac{d(d+1)}{2\nu + d(d+1)} \leq 1$

Parallelism

Motivating example

[Figure: given the four observations, a batch of candidate queries $x^{(1)}, x^{(2)}, x^{(3)}$ is selected simultaneously.]

Setup

Notation: at iteration $T$, select a batch of $K$ queries $X_T^K = \big( x_T^{(1)}, \dots, x_T^{(K)} \big)$.

Complexity: finding the $K$ points $X_T^K$ that maximize $I(X_T^K)$ is NP-hard.

Heuristic: thanks to the submodularity of $I$, the greedy strategy is a good approximation (Guestrin et al., 2005).
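For a GP, the information gain of a set of inputs does not depend on the observed values, and the greedy step reduces to picking the point of maximal posterior variance given the points already selected. A minimal sketch, assuming an RBF kernel with unit prior variance on a 1D grid (the function name and parameters are illustrative):

```python
import numpy as np

def greedy_info_gain(grid, K, ls=0.2, noise=0.01):
    """Greedy near-optimal batch for the information gain I: each step adds
    the grid point with the largest posterior variance conditioned on the
    points chosen so far (no observations needed)."""
    chosen = []
    for _ in range(K):
        if not chosen:
            var = np.ones(len(grid))  # prior variance, k(x, x) = 1
        else:
            X = np.array(chosen)
            C = np.exp(-((X[:, None] - X[None, :]) ** 2) / (2 * ls ** 2)) \
                + noise * np.eye(len(X))
            ks = np.exp(-((X[:, None] - grid[None, :]) ** 2) / (2 * ls ** 2))
            var = 1.0 - np.sum(ks * np.linalg.solve(C, ks), axis=0)  # Eq. (2)
        chosen.append(grid[int(np.argmax(var))])
    return np.array(chosen)
```

The selected points are space-filling, which is the behavior the submodularity argument of Guestrin et al. (2005) certifies to be within a constant factor of the NP-hard optimum.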
Relevant region

Definition:
$$x_t^\bullet = \arg\max_{x \in \mathcal{X}} f_t^-(x), \qquad y_t^\bullet = f_t^-(x_t^\bullet),$$
$$\mathcal{R}_t = \big\{ x \in \mathcal{X} \mid f_t^+(x) \geq y_t^\bullet \big\}.$$

[Figure: relevant region on a 1D example -- the points whose upper bound exceeds the best lower bound.]

GP-UCB-PE (Contal et al., 2013)

Algorithm 2: GP-UCB-PE
for $t = 0, 1, \dots$ do
    compute $\mu_t$ and $\sigma_t^2$ (Eqs. 1, 2) with $y_1, \dots, y_{t-1}$
    $x_t^0 \leftarrow \arg\max_{x \in \mathcal{X}} f_t^+(x)$
    compute $\mathcal{R}_t$
    for $k = 1, \dots, K-1$ do
        compute $\widehat{\sigma}_t^{(k)}$ (Eq. 2)
        $x_t^k \leftarrow \arg\max_{x \in \mathcal{R}_t} \widehat{\sigma}_t^{(k)}(x)$
    query $\{x_t^k\}_{k < K}$

[Figures: one batch on a 1D example -- $x^0$ maximizes the upper bound, then $x^1$ maximizes the updated variance inside the relevant region.]

Regret bounds

Theorem (Contal et al., 2013): $\forall \delta > 0$, with $\beta_t$ defined as above,
$$\Pr\Big[ R_{TK} \leq \sqrt{C_1\, TK\, \beta_T\, \gamma_{TK}} + C_2 \Big] \geq 1 - \delta, \qquad \text{where } C_1 = \frac{36}{\log(1 + \sigma^{-2})} \text{ and } C_2 = \frac{\pi}{\sqrt{6}}.$$

Corollary: when the cost of a batch is fixed and $K \ll T$, the improvement of the parallel strategy over the sequential one is $\sqrt{K}$.
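One batch of Algorithm 2 can be sketched as follows: the first point is the usual UCB query, and the remaining $K-1$ are pure-exploration points of maximal updated variance restricted to the relevant region. This is a self-contained toy version with an RBF kernel on a 1D grid; the helper name and settings are illustrative.

```python
import numpy as np

def ucb_pe_batch(x_obs, y_obs, grid, K=3, beta=2.0, ls=0.2, noise=0.01):
    """One GP-UCB-PE batch: a UCB point, then K-1 pure-exploration points of
    maximal variance inside the relevant region R_t (1D sketch)."""
    def post(X, Y, Z):
        C = np.exp(-((X[:, None] - X[None, :]) ** 2) / (2 * ls ** 2)) \
            + noise * np.eye(len(X))
        ks = np.exp(-((X[:, None] - Z[None, :]) ** 2) / (2 * ls ** 2))
        mu = ks.T @ np.linalg.solve(C, Y)
        var = np.maximum(1.0 - np.sum(ks * np.linalg.solve(C, ks), axis=0), 0.0)
        return mu, var

    mu, var = post(x_obs, y_obs, grid)
    sd = np.sqrt(var)
    y_low = np.max(mu - beta * sd)                   # y_t^bullet, best lower bound
    region = (mu + beta * sd) >= y_low               # relevant region R_t
    batch = [grid[int(np.argmax(mu + beta * sd))]]   # x^0: plain UCB point
    for _ in range(K - 1):
        # recompute the variance as if the batch points were observed; the
        # variance does not depend on the outcomes, so dummy zeros suffice
        _, var = post(np.concatenate([x_obs, np.array(batch)]),
                      np.concatenate([y_obs, np.zeros(len(batch))]), grid)
        masked = np.where(region, var, -np.inf)      # restrict to R_t
        batch.append(grid[int(np.argmax(masked))])
    return np.array(batch)
```

Because the GP variance is independent of the unobserved outcomes, the whole batch can be planned before any of the $K$ evaluations return, which is what makes the parallel strategy feasible.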
References

Contal, E., Buffoni, D., Robicquet, A., and Vayatis, N. (2013). Parallel Gaussian process optimization with pure exploration. Submitted to ECML (pending approval).

Guestrin, C., Krause, A., and Singh, A. (2005). Near-optimal sensor placements in Gaussian processes. In Proceedings of ICML, pages 265-272. ACM.

Rasmussen, C. E. and Williams, C. (2005). Gaussian Processes for Machine Learning. MIT Press.

Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W. (2012). Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250-3265.