Optimization of Noisy Functions - University of Wisconsin

Optimization of Noisy Functions - University of Wisconsin–Madison

Optimization of Noisy Functions
Geng Deng
Michael C. Ferris
University of Wisconsin-Madison
INFORMS, October 15, 2008
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
1 / 26
A general problem formulation
We formulate a noisy optimization problem as
min f (x) = E[F (x, ξ(ω))],
x∈S
ξ(ω) is a random component arising in some simulation process.
The sample response function F (x, ξ(ω))
I
I
I
typically does not have a closed form, thus cannot provide gradient or
Hessian information
is normally computationally expensive
is affected by uncertain factors in simulation
The underlying objective function f (x) has to be estimated.
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
2 / 26
WISOPT two-phase optimization framework
1
Phase I is a global exploration step. The algorithm explores the
entire domain and proceeds to determine potentially good subregions
for future investigation.
2
Phase II is a local exploitation step. Local optimization algorithms
are applied to determine the final solution.
500
1
400
0.8
300
0.6
200
0.4
100
0.2
0.025
0
0.02
−100
2.5
2
0.015
−0.235
1.5
1
0.5
0
−0.5
−0.5
0
Michael Ferris (University of Wisconsin)
0.5
1
1.5
2
2.5
−0.24
0.01
−0.245
−0.25
0.005
Simulation-Based Optimization
−0.255
INFORMS, 15 Oct
3 / 26
The flow chart of WISOPT
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
4 / 26
WISOPT Phase I: a classification based global search
Classifier: surrogate for indicator function of the level set


N


X
1
F (x, ξj ) ≤ c
L(c) = {x f (x) ≤ c} ' x f¯(x) =


N
j=1
c is a quantile point of the responses
The level set corresponds to promising regions
Training set: space-filling samples (points) from the whole domain
(e.g. mesh grid; the Latin Hypercube Sampling)
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
5 / 26
Classifiers predict new refined samples as promising
(a) Training samples in L(c)
are classified as positive and
others are negative.
The
solid circle represents estimated L(c).
(b) Classify a set of more
refined space-filling samples.
Four points are predicted as
positive and rest are negative.
The classifier is refined.
Validate the subset of the identified promising points by performing
additional simulations
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
6 / 26
Imbalanced data
To identify the top promising regions, the best 10% of the training
samples are labeled as ‘+’, and the rest are ‘-’
The imbalance of the training set causes low classification accuracy,
especially for positive members
Balance the training data set
I
Under-sample of the negative class using one-sided selection
F
F
I
Use 1-NN and retain only those negative samples needed to predict
training set
Clean the dataset with Tomek links
Over-sample of the positive class by duplicating positive samples
Adjust the misclassification penalty
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
7 / 26
Cleaning the dataset with Tomek links
(c) Determine the pairs of
Tomek links
Michael Ferris (University of Wisconsin)
(d) Remove the negative samples participating as Tomek
links
Simulation-Based Optimization
INFORMS, 15 Oct
8 / 26
Assemble classifiers using a voting scheme
Use g-means on training set to determine which to use
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
9 / 26
Classifier Phase I approach
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
10 / 26
Banana example
2−
−
−
−
−
1−
−
0.5 −
−
−
0
−
−
−0.5 −
−
−1 −
−
−
−1.5
−
−
−2 −
−2
1.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−1.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
−1
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−0.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
0
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
0.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
1
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
Original
−
−
−
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
1.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
2
−
−
−
2
−
−
1.5
−
−
−
−
1
0.5
−
0
−0.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−1
−
−
−1.5
−
−
−2
−2
− −
−1.5
−1
−
−
−0.5
Predicted
0
0.5
1
1.5
−
2
Training
2
1.5
1
0.5
0
−0.5
−1
−1.5
−2
−2
Michael Ferris (University of Wisconsin)
−1.5
−1
−0.5
0
0.5
1
1.5
Simulation-Based Optimization
2
INFORMS, 15 Oct
11 / 26
The non-parametric “linking” idea
2−
−
−
−
−
1−
−
0.5 −
−
−
0
−
−
−0.5 −
−
−1 −
−
−
−1.5
−
−
−2 −
−2
1.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−1.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
−1
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−0.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
0
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
0.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
1
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
1.5
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
2
2
1.5
1
0.5
0
−0.5
−1
−1.5
−2
−2
−1.5
Original / sse(h)
−1
−0.5
0
0.5
1
1.5
2
Data / Result
4
9
x 10
2
8
1.5
7
1
6
0.5
5
0
4
−0.5
3
−1
2
−1.5
1
0
0.1
0.2
0.3
0.4
0.5
Michael Ferris (University of Wisconsin)
0.6
0.7
0.8
0.9
1
−2
−2
−1.5
−1
Simulation-Based Optimization
−0.5
0
0.5
1
1.5
2
INFORMS, 15 Oct
12 / 26
Determine subregion radius by non-parametric regression
The idea is to determine the best ‘window size’ for non-parametric local
quadratic regression
1
∆ ∈ arg minh sse(h)
2
sse(h) is the sum of squared error of knock-one out prediction. Given
a window-size h and a point y , the knock-one out predicted value is
Qhy (y ), where Qhy (x) is a quadratic regression function constructed
using the data points within the ball {x| kx − y k ≤ h}/{y }.
1
Qhy (x) = c + g T (x − y ) + (x − y )T H(x − y )
2
.
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
13 / 26
Phase II: refine solution
Local optimization methods to handle noise
Derivative-free methods
Basic approach: reduce function uncertainty by averaging multiple
samples per point.
Potential difficulty:
efficiency of algorithm vs number of simulation runs
We apply Bayesian approach to determine appropriate number of
samples per point, while simultaneously enhancing the algorithm
efficiency
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
14 / 26
Quadratic model construction and trust region subproblem
solution
For iteration k = 1, 2, . . .,
···
Construct a quadratic model via interpolation
1
Q(x, ξ) = F (xk , ξ) + gQT (ξ)(x − xk ) + (x − xk )T GQ (ξ)(x − xk )
2
The model is unstable since interpolating noisy data
Solve the trust region subproblem
sk∗ (ξ) = arg mins Q(xk + s, ξ)
s.t. ksk2 ≤ ∆k
The solution is thus unstable
···
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
15 / 26
Bayesian posterior distributions
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
16 / 26
Constraining the variance of solutions (Monte Carlo
validation)
Generate ‘sample quadratic functions’ that could arise given current
function evaluations.
Trial solutions are generated within a trust region. The standard
deviation of the solutions are constrained.
n
max std([s ∗(1) (i), s ∗(2) (i), · · · , s ∗(M) (i)]) ≤ β∆k .
i=1
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
17 / 26
Simulation calibration
Detailed individual-woman level discrete event simulation of
Wisconsin Breast Cancer Incidence (using 4 processes):
I
I
I
I
Breast cancer natural history
Breast cancer detection
Breast cancer treatment
Non-breast cancer mortality among US women
Replicate breast cancer surveillance data: 1975-2000
In Situ Inc./100K pop.
70
60
SEER
50
40
30
20
WCRS
10
0
1975
1985
1995
Ye ar
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
18 / 26
Application to WBCE
500,000 points x generated uniformly at random
Using CONDOR (120 machines) can evaluate approximately 1000 per
day
I
I
F (x, ξ) involves simulation of 3 million women
363 are in L(10): “simulated points out of data envelope”
Using Phase I: 10,000 points evaluated, 220 points suggested, 195 are
in L(10)
Phase I results in new points (all are good), but 2 of which seem
better than the “experts” best solution
Phase II: application was not necessary
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
19 / 26
Design a coaxial antenna for hepatic tumor ablation
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
20 / 26
Simulation of the electromagnetic radiation profile
Finite element models (COMSOL MultiPhysics v3.2) are used to generate
the electromagnetic (EM) radiation fields in liver given a particular design
Metric
Lesion radius
Axial ratio
S11
Measure of
Size of lesion in radial direction
Proximity of lesion shape to a sphere
Tail reflection of antenna
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
Goal
Maximize
Fit to 0.5
Minimize
INFORMS, 15 Oct
21 / 26
Two-phase approach to optimize antenna design
parameters
Uniform LHS to generate 1,000 design samples to evaluate with the
FE simulation model (range [-0.1409, 0.2903])
c = −0.0354 the 10% quantile. L(c) has 100 positive samples (900
negative)
Balancing procedure: 200 positive vs. 269 negative samples
3 (of 6 tested using g-means) classifiers in ensemble
Refined data: 20,000 designs, 1914 predicted by classifiers as positive,
74.5% correctly
The best Phase I design has value -0.2238
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
22 / 26
30
30
25
25
20
20
Percent
Percent
Coaxial antenna design
15
10
10
5
0
−0.4
15
5
−0.35
−0.3
−0.25
−0.2
−0.15
−0.1
−0.05
0
0
−0.4
−0.35
Objective Value
−0.3
−0.25
−0.2
−0.15
−0.1
−0.05
0
Objective Value
(e) First stage initial designs
(f) Designs predicted by classifiers
Phase II started from 5 points predicted: value -0.2238
Phase II returned an optimal solution: value -0.2501
Total simulations used = 1000 + 1914 + 750
DIRECT (4000): -0.2064; SnobFit (4000): -0.1955
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
23 / 26
Noisy extension: changing liver properties
Dielectric tissue properties varied within ±10% of average properties
to simulate the individual variation.
WISOPT yields an optimal design that is a 27.3% improvement over
the original design and is more robust in terms of lesion shape and
efficiency.
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
24 / 26
Ambulance simulation
An ambulance is called when an emergency call occurs. Determine the
locations of the ambulance bases such that the expected response time to
emergency calls is minimized.
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
(g) The distribution of emergency calls
Michael Ferris (University of Wisconsin)
(h) The locations of ambulance bases
Simulation-Based Optimization
INFORMS, 15 Oct
25 / 26
Conclusions and future work
Coupling statistical and optimization techniques can effectively
process noisy function optimizations
Significant gains in system performance and robustness are possible
WISOPT framework allows multiple methods to be “hooked” up
Future work:
Problems with general constraints
More optimization algorithms in both phases
A phase transition module with variable radii
Michael Ferris (University of Wisconsin)
Simulation-Based Optimization
INFORMS, 15 Oct
26 / 26

Download Report

Optimization of Noisy Functions - University of Wisconsin–Madison

Paperzz.com

Your Paperzz