New algorithms for global optimization with Gaussian processes

Emile Contal
PhD student with Nicolas Vayatis
[email protected]
CMLA, ENS Cachan
June 17, 2013
Introduction
Gaussian Processes
GP-UCB
Parallelism
References
Motivating example

[Figure: noisy observations yx1 , . . . , yx4 of an unknown function at points x1 , . . . , x4 ]
Motivating example

[Figure: the same observations; where should the next query xn+1 be chosen?]
Sequential global optimization

Setup
- unknown f : X → R, where X ⊂ R^d
- maximizer x⋆ with f(x⋆) = max_{x ∈ X} f(x)
- queries x_1, x_2, . . . ∈ X
- noisy observations y_1, y_2, . . . ∈ R, such that y_t = f(x_t) + ε_t, where ε_t ∼ N(0, σ²) i.i.d.

Examples
- heavy numerical experiments
- laboratory experiments
- sensor placement
Objective

Challenges
- search space in high dimension
- evaluations are expensive
- exploration / exploitation trade-off

Cumulative regret
- instantaneous regret: r_t = f(x⋆) − f(x_t)
- cumulative regret: R_T = Σ_{t=1}^T ( f(x⋆) − f(x_t) )
Framework

Definition
f ∼ GP(m, k), with mean function m : X → R and kernel function k : X × X → R+, when for all x_1, . . . , x_n,

    ( f(x_1), . . . , f(x_n) ) ∼ N(µ, C) ,

with µ[x_i] = m(x_i) and C[x_i, x_j] = k(x_i, x_j).
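As a concrete illustration, the finite-dimensional distributions above can be sampled directly; a minimal numpy sketch, where the squared-exponential kernel and zero mean are assumed choices for the example:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel k(x, x') = exp(-|x - x'|^2 / (2 l^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Zero mean function, RBF covariance on a 1D grid
x = np.linspace(-1, 1, 50)
C = rbf_kernel(x, x) + 1e-10 * np.eye(len(x))  # small jitter for numerical stability

# (f(x_1), ..., f(x_n)) ~ N(0, C): one draw from the GP prior
rng = np.random.default_rng(0)
f = rng.multivariate_normal(np.zeros(len(x)), C)
```

Repeated draws with different kernels produce the kind of sample paths shown in the 1D examples below.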
Four 1D examples

[Figure omitted: four one-dimensional GP examples]
Posterior distribution

Bayesian inference (Rasmussen and Williams, 2005)
At iteration T, with observations Y_{X_T} at X_T = {x_1, . . . , x_T}, the posterior mean and variance are given by

    µ_{T+1}(x) = k_T(x)^⊤ C_T^{-1} Y_{X_T}                      (1)
    σ²_{T+1}(x) = k(x, x) − k_T(x)^⊤ C_T^{-1} k_T(x) ,          (2)

where C_T = K_T + σ² I and K_T = [k(x_t, x_{t'})]_{x_t, x_{t'} ∈ X_T}.
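Equations (1)–(2) translate into a few lines of numpy; a minimal sketch, assuming a unit-variance RBF kernel (the helper names and the data are illustrative, not the authors' code):

```python
import numpy as np

def rbf(a, b, ell=0.2):
    """Squared-exponential kernel (an assumed choice for illustration)."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_posterior(x_query, X_T, Y_T, noise_var=0.01):
    """Posterior mean/variance of Eq. (1)-(2), with C_T = K_T + sigma^2 I."""
    C_T = rbf(X_T, X_T) + noise_var * np.eye(len(X_T))
    k_T = rbf(X_T, x_query)                  # k_T(x) for each query point
    alpha = np.linalg.solve(C_T, Y_T)        # C_T^{-1} Y_{X_T}
    mu = k_T.T @ alpha                       # Eq. (1)
    v = np.linalg.solve(C_T, k_T)            # C_T^{-1} k_T(x)
    var = 1.0 - np.sum(k_T * v, axis=0)      # Eq. (2), k(x, x) = 1 for this kernel
    return mu, var

X_T = np.array([-0.5, 0.0, 0.7])
Y_T = np.array([0.2, 1.0, -0.3])
xq = np.linspace(-1, 1, 5)
mu, var = gp_posterior(xq, X_T, Y_T)
```

Solving against C_T rather than forming its inverse keeps the computation numerically stable; the noise term σ²I guarantees C_T is well conditioned.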
Example

[Figure omitted: GP posterior on [−1, 1]]
Upper and lower bounds

Definition
    f_T^+(x) = µ_T(x) + β_T σ_T(x)
    f_T^−(x) = µ_T(x) − β_T σ_T(x)

Property
For all x ∈ X and all T ≥ 1,
    f(x) ∈ [ f_T^−(x), f_T^+(x) ]  with high probability.
GP-UCB (Srinivas et al., 2012)

Algorithm 1: GP-UCB
for t = 0, 1, . . . do
    Compute µ_t and σ_t² (Eq. 1, 2) with y_1, . . . , y_{t−1}
    x_t ← argmax_{x ∈ X} f_t^+(x)
    Query x_t and observe y_t
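Algorithm 1 over a finite candidate grid can be sketched as follows; the RBF kernel, the fixed β, and the test function are assumptions made for the illustration:

```python
import numpy as np

def rbf(a, b, ell=0.2):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def posterior(xq, X, Y, noise=0.01):
    """Posterior mean and deviation of Eq. (1)-(2)."""
    C = rbf(X, X) + noise * np.eye(len(X))
    k = rbf(X, xq)
    mu = k.T @ np.linalg.solve(C, Y)
    var = 1.0 - np.sum(k * np.linalg.solve(C, k), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def gp_ucb(f, grid, T=20, beta=2.0, noise=0.01, seed=0):
    """GP-UCB on a grid: always query the maximizer of mu + beta * sigma."""
    rng = np.random.default_rng(seed)
    # Start from one random query so the posterior is defined
    X = [grid[rng.integers(len(grid))]]
    Y = [f(X[0]) + np.sqrt(noise) * rng.standard_normal()]
    for _ in range(T):
        mu, sigma = posterior(grid, np.array(X), np.array(Y), noise)
        x_next = grid[np.argmax(mu + beta * sigma)]  # upper confidence bound
        X.append(x_next)
        Y.append(f(x_next) + np.sqrt(noise) * rng.standard_normal())
    return np.array(X), np.array(Y)

f = lambda x: -(x - 0.3) ** 2          # unknown function, maximum at x = 0.3
grid = np.linspace(-1, 1, 201)
X, Y = gp_ucb(f, grid, T=30)
```

Early iterations are dominated by the βσ term (exploration); once the deviation shrinks, the mean term takes over and queries concentrate near the maximizer.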
Example

[Figure omitted: successive iterations of GP-UCB on a 1D function]
Regret bounds

Theorem (Srinivas et al. (2012))
For all δ > 0, set β_t = 2 log( |X| t² π² / (6δ) ) for all t ≤ T. Then

    Pr[ R_T ≤ √( C T β_T γ_T ) ] ≥ 1 − δ ,

where C = 8 / log(1 + σ^{−2}).

Information gain (1/2)
- H(X): information entropy
- I(X) = H(Y_X) − H(Y_X | f)
- γ_T = max_{X ⊂ 𝒳, |X| = T} I(X)
Information gain (2/2)
- For the linear kernel, γ_T ∈ O( d log T )
- For the RBF kernel, γ_T ∈ O( (log T)^{d+1} )
- For the Matérn kernel, γ_T ∈ O( T^α log T ), with α = d(d+1) / (2ν + d(d+1)) ≤ 1
Motivating example

[Figure: the observations yx1 , . . . , yx4 ; where should a batch of queries x^(1), x^(2), x^(3) be placed?]
Setup

Notation
At iteration T, select a batch of K queries X_T^K = { x_T^(1), . . . , x_T^(K) }.

Complexity
Finding the K points X_T^K that maximize I(X_T^K) is NP-hard.

Heuristic
Due to the submodularity of I, the greedy strategy is a good approximation (Guestrin et al., 2005).
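For a GP, the information gain of a set X has the closed form I(X) = ½ log det(I + σ⁻² K_X), so the greedy heuristic can be sketched in a few lines; the kernel and the grid are illustrative assumptions:

```python
import numpy as np

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def info_gain(X, noise=0.01):
    """I(X) = 1/2 log det(I + sigma^-2 K_X) for a unit-variance GP kernel."""
    K = rbf(X, X)
    _, logdet = np.linalg.slogdet(np.eye(len(X)) + K / noise)
    return 0.5 * logdet

def greedy_batch(grid, K, noise=0.01):
    """Greedily add the candidate with the largest marginal information gain."""
    chosen = []
    for _ in range(K):
        gains = [info_gain(np.array(chosen + [x]), noise) for x in grid]
        chosen.append(grid[int(np.argmax(gains))])
    return np.array(chosen)

grid = np.linspace(-1, 1, 41)
batch = greedy_batch(grid, K=3)
```

Because correlated points add little determinant, the greedy rule naturally spreads the batch across the domain, which is the behavior the submodularity argument certifies up to a (1 − 1/e) factor.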
Relevant region

Definition
    x_t^• = argmax_{x ∈ X} f_t^−(x)
    y_t^• = f_t^−(x_t^•)
    R_t = { x ∈ X | f_t^+(x) ≥ y_t^• }
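On a finite grid the definition reduces to a one-line comparison; a small sketch with hypothetical inputs (β fixed for the example):

```python
import numpy as np

def relevant_region(mu, sigma, beta=2.0):
    """Mask of points whose upper bound exceeds the best lower bound y_t."""
    lower = mu - beta * sigma      # f_t^-(x)
    upper = mu + beta * sigma      # f_t^+(x)
    y_best = np.max(lower)         # y_t = f_t^-(x_t)
    return upper >= y_best         # boolean mask for R_t

mu = np.array([0.0, 0.5, 1.0, 0.2])
sigma = np.array([0.1, 0.1, 0.1, 0.8])
mask = relevant_region(mu, sigma)
```

Here the last point enters R_t through its large deviation alone: any point whose upper bound still reaches the best lower bound might hide the maximizer, so it cannot be discarded.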
Example

[Figure omitted: the relevant region R_t of a GP posterior on [−1, 1]]
GP-UCB-PE (Contal et al., 2013)

Algorithm 2: GP-UCB-PE
for t = 0, 1, . . . do
    Compute µ_t and σ_t² (Eq. 1, 2) with y_1, . . . , y_{t−1}
    x_t^0 ← argmax_{x ∈ X} f_t^+(x)
    Compute R_t^+
    for k = 1, . . . , K − 1 do
        Compute σ_t^(k) (Eq. 2)
        x_t^k ← argmax_{x ∈ R_t^+} σ_t^(k)(x)
    Query { x_t^k }_{k < K}
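A minimal numpy sketch of one batch selection of Algorithm 2 (the RBF kernel and fixed β are assumptions; note that the posterior deviation of Eq. 2 depends only on the query locations, so it can be updated with the pending points before their outcomes are observed):

```python
import numpy as np

def rbf(a, b, ell=0.2):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def posterior(grid, X, Y, noise=0.01):
    C = rbf(X, X) + noise * np.eye(len(X))
    k = rbf(X, grid)
    mu = k.T @ np.linalg.solve(C, Y)
    var = 1.0 - np.sum(k * np.linalg.solve(C, k), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def select_batch(grid, X, Y, K=3, beta=2.0, noise=0.01):
    """One iteration of GP-UCB-PE: a UCB query, then K-1 pure-exploration queries."""
    mu, sigma = posterior(grid, X, Y, noise)
    batch = [grid[np.argmax(mu + beta * sigma)]]       # x_t^0: UCB query
    y_dot = np.max(mu - beta * sigma)                  # best lower bound y_t
    region = mu + beta * sigma >= y_dot                # relevant region R_t
    for _ in range(K - 1):
        # Augment with the pending queries; dummy zero outcomes are fine
        # because the deviation of Eq. 2 ignores the observed values
        X_aug = np.concatenate([X, np.array(batch)])
        Y_aug = np.concatenate([Y, np.zeros(len(batch))])
        _, sigma_k = posterior(grid, X_aug, Y_aug, noise)
        masked = np.where(region, sigma_k, -np.inf)    # restrict to R_t
        batch.append(grid[int(np.argmax(masked))])     # max posterior deviation
    return np.array(batch)

grid = np.linspace(-1, 1, 101)
X = np.array([-0.5, 0.2, 0.8])
Y = np.array([0.1, 0.9, -0.4])
batch = select_batch(grid, X, Y, K=3)
```

Each pending query collapses the deviation around itself, so the pure-exploration steps automatically pick fresh locations inside the relevant region.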
Example

[Figure omitted: a batch selection of GP-UCB-PE on [−1, 1], showing the successive queries x0 and x1]
Regret bounds

Theorem (Contal et al. (2013))
For all δ > 0, with β_t defined as above,

    Pr[ R_{TK} ≤ √( C_1 T K β_T γ_{TK} ) + C_2 ] ≥ 1 − δ ,

where C_1 = 36 / log(1 + σ^{−2}) and C_2 = π/√6.

Corollary
When the cost of a batch is fixed and K ≪ T, the improvement of the parallel strategy over the sequential one is √K.
References

Contal, E., Buffoni, D., Robicquet, A., and Vayatis, N. (2013). Parallel Gaussian process optimization with pure exploration. Submitted to ECML (pending approval).

Guestrin, C., Krause, A., and Singh, A. (2005). Near-optimal sensor placements in Gaussian processes. In Proceedings of ICML, pages 265–272. ACM.

Rasmussen, C. E. and Williams, C. (2005). Gaussian Processes for Machine Learning. MIT Press.

Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W. (2012). Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265.