Design of experiments
in nonlinear models
Luc Pronzato
Université Côte d’Azur, CNRS, I3S, France
Outline
1 DoE: objectives & examples
2 DoE based on asymptotic normality
3 Construction of (locally) optimal designs
4 Problems with nonlinear models
5 Small-sample properties
6 Nonlocal optimum design

Luc Pronzato (CNRS), Design of experiments in nonlinear models, La Réunion, nov. 2016
1 DoE: objectives & examples
A/ Parameter estimation

Ex1: Weighing with a two-pan balance
☞ Determine the weights of 8 objects with mass m_i, i = 1,…,8; i.i.d. errors ε_i ∼ N(0, σ²)

Method a: weigh each object successively
→ y_i = m_i + ε_i, i = 1,…,8
→ estimated weights: m̂_i = y_i ∼ N(m_i, σ²)
Repeat 8 times and average the results: m̂_i ∼ N(m_i, σ²/8) (with 64 observations…)
Method b: more sophisticated…

y_1 =  m_1 + m_2 + m_3 + m_4 + m_5 + m_6 + m_7 + m_8 + ε_1
y_2 =  m_1 + m_2 + m_3 − m_4 − m_5 − m_6 − m_7 + m_8 + ε_2
y_3 =  m_1 − m_2 − m_3 + m_4 + m_5 − m_6 − m_7 + m_8 + ε_3
y_4 =  m_1 − m_2 − m_3 − m_4 − m_5 + m_6 + m_7 + m_8 + ε_4
y_5 = −m_1 + m_2 − m_3 + m_4 − m_5 + m_6 − m_7 + m_8 + ε_5
y_6 = −m_1 + m_2 − m_3 − m_4 + m_5 − m_6 + m_7 + m_8 + ε_6
y_7 = −m_1 − m_2 + m_3 + m_4 − m_5 − m_6 + m_7 + m_8 + ε_7
y_8 = −m_1 − m_2 + m_3 − m_4 + m_5 + m_6 − m_7 + m_8 + ε_8

→ m̂_8 = (1/8) Σ_{i=1}^8 y_i = m_8 + (ε_1 + ε_2 + ε_3 + ε_4 + ε_5 + ε_6 + ε_7 + ε_8)/8 ∼ N(m_8, σ²/8) (idem for m_j, j ≤ 7)

☞ 8 observations only, against 64 with method a!
Here, selection of a good design = combinatorial problem:
y_k = Σ_{i=1}^8 f_{ki} m_i + ε_k = f_k^T m + ε_k
(e.g., in Method b, f_2 = [1 1 1 −1 −1 −1 −1 1]^T)

y = F m + ε with
• method a: F_a = I_8
• method b: F_b = 8 × 8 Hadamard matrix, F_b^T F_b = 8 I_8

LS estimator:
m̂ = arg min_m Σ_{k=1}^n [y_k − f_k^T m]² = (Σ_{k=1}^n f_k f_k^T)^{-1} Σ_{k=1}^n y_k f_k = (F^T F)^{-1} F^T y

⇒ Choose the f_k's such that M_n = (1/n) Σ_{k=1}^n f_k f_k^T = (1/n) F^T F is nonsingular

E{m̂} = m (no bias)
E{(m̂ − m)(m̂ − m)^T} = (σ²/n) M_n^{-1}

⇒ minimize a scalar function of M_n^{-1}
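The gain of Method b over Method a can be checked numerically. A minimal sketch in NumPy; the Hadamard matrix below is built by the Sylvester construction, which satisfies F_b^T F_b = 8 I_8 like the design on the slide even if the row ordering differs:

```python
import numpy as np

sigma2 = 1.0

# Method a: identity design, one object per weighing (8 observations)
Fa = np.eye(8)

# Method b: an 8x8 Hadamard matrix (Sylvester construction)
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
Fb = np.kron(np.kron(H2, H2), H2)           # Fb.T @ Fb = 8 * I8

def ls_covariance(F, sigma2):
    """Covariance of the LS estimator: sigma^2 (F^T F)^{-1}."""
    return sigma2 * np.linalg.inv(F.T @ F)

var_a = np.diag(ls_covariance(Fa, sigma2))  # sigma^2 per weight
var_b = np.diag(ls_covariance(Fb, sigma2))  # sigma^2 / 8 per weight

print(var_a[0], var_b[0])                   # 1.0 vs 0.125
```

Method b thus matches the precision of 64 single-object weighings while using only 8 observations.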
In this particular situation: combinatorial problem (since f_{ki} ∈ {−1, 0, 1}) [Fisher 1925 …]

More generally, when the design variables (inputs) are real numbers, optimum design for parameter estimation is obtained by optimization of a scalar function of the (asymptotic) covariance matrix of the estimator.
Ex2 [D'Argenio 1981]: two-compartment model in pharmacokinetics

A product x is injected into the blood (→ input u(t));
x_C(t) (product in blood) moves to another tissue → x_P(t)

→ Linear differential equations:
dx_C(t)/dt = −(K_EL + K_CP) x_C(t) + K_PC x_P(t) + u(t)
dx_P(t)/dt = K_CP x_C(t) − K_PC x_P(t)

We observe the concentration of x in blood: y(t_i) = x_C(t_i)/V + ε(t_i),
the errors ε(t_i) are i.i.d. N(0, σ²), σ = 0.2 µg/ml

There are 4 unknown parameters θ = (K_CP, K_PC, K_EL, V)
The profile of the input u(t) is given (fast infusion 75 mg/min for 1 min, then 1.45 mg/min)

☞ simulated experiments with «true» parameter values
θ̄ = (0.066 min⁻¹, 0.038 min⁻¹, 0.0242 min⁻¹, 30 l)
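The compartment model above is easy to simulate; here is a minimal sketch using a fixed-step explicit Euler scheme with the «true» parameter values quoted on the slide (the step size and solver choice are illustrative assumptions, not from the original study):

```python
import numpy as np

K_CP, K_PC, K_EL, V = 0.066, 0.038, 0.0242, 30.0   # min^-1, min^-1, min^-1, l

def u(t):
    """Infusion profile: 75 mg/min for the first minute, then 1.45 mg/min."""
    return 75.0 if t < 1.0 else 1.45

dt, T = 0.01, 720.0                 # Euler step (min) and horizon (min)
n_steps = int(T / dt)
xC, xP = 0.0, 0.0                   # amounts in central / peripheral compartments
traj = []
for k in range(n_steps):
    t = k * dt
    dxC = -(K_EL + K_CP) * xC + K_PC * xP + u(t)
    dxP = K_CP * xC - K_PC * xP
    xC += dt * dxC
    xP += dt * dxP
    traj.append(xC / V)             # noiseless concentration y(t) = xC(t)/V, mg/l

print(max(traj))                    # peak blood concentration
```

Adding N(0, σ²) noise at chosen sampling times t_i then reproduces the simulated observations used in the design comparison below.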
Experimental variables = sampling times t_i, 1 ≤ t_i ≤ 720 min

• «conventional» design: t = (5, 10, 30, 60, 120, 180, 360, 720) (in min)
• «optimal» design (for θ̄): t = (1, 1, 10, 10, 74, 74, 720, 720) (in min)
(assumes that independent measurements at the same time are possible)

→ 400 simulations
→ 400 sets of 8 observations each, for each design
→ 400 parameter estimates (LS) for each design…
☞ histograms of θ̂_i (and approximated marginals [Pázman & P 1996])
☞ the «optimal» design gives more precise estimation

[Figures: histograms of the 400 LS estimates under each design, for K̂_EL (K̄_EL = 0.0242), K̂_CP (K̄_CP = 0.066), K̂_PC (K̄_PC = 0.038) and V̂ (V̄ = 30); the histograms for the «optimal» design are more concentrated around the true values.]
B/ Model discrimination

Ex3 [Box & Hill 1967]: chemical reaction A → B
2 design variables: x = (time t, temperature T)
Reaction of 1st, 2nd, 3rd or 4th order?
→ 4 candidate model structures:

η^(1)(x, θ_1) = exp[−θ_11 t exp(−θ_12/T)]
η^(2)(x, θ_2) = 1 / [1 + θ_21 t exp(−θ_22/T)]
η^(3)(x, θ_3) = 1 / [1 + 2θ_31 t exp(−θ_32/T)]^{1/2}
η^(4)(x, θ_4) = 1 / [1 + 3θ_41 t exp(−θ_42/T)]^{1/3}
Simulated experiment

Observations with the 2nd structure («true»): y(x_j) = η^(2)(x_j, θ̄_2) + ε_j, with
• θ̄_2 = (400, 5000)^T the «true» value (unknown) of the parameters in model 2
• (ε_j) i.i.d. N(0, σ²), σ = 0.05
Admissible experimental domain: 0 ≤ t ≤ 150, 450 ≤ T ≤ 600

Sequential design: after the observation of y(x_j), j = 1,…,k,
• estimate θ̂_i^k (LS) for i = 1, 2, 3, 4
• compute the posterior probability π_i(k) that model i is correct, for i = 1, 2, 3, 4
Initialization: π_i(0) = 1/4, i = 1,…,4, and x_1,…,x_4 are given
[Figures: probabilities π_i(k) versus the number of observations k, with π_2 (the «true» model η^(2)) tending to 1 while π_1, π_3, π_4 vanish; and the design points x_k selected in the (t, T) domain, with several points replicated at a few support locations.]
Design for discrimination is not considered in the following.

A simple sequential method for discriminating between two structures η^(1)(x, θ_1), η^(2)(x, θ_2) [Atkinson & Fedorov 1975]:
• after observation of y(x_1),…,y(x_k), estimate θ̂_1^k and θ̂_2^k for both models
• place the next point x_{k+1} where [η^(1)(x, θ̂_1^k) − η^(2)(x, θ̂_2^k)]² is maximum
• k → k + 1, repeat

If more than two models: estimate θ̂_i^k for all of them, place the next point using the two models with the best and second best fit
(see [Atkinson & Cox 1974; Hill 1978] for surveys)
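The sequential loop above can be sketched in a few lines. The two rival models below are toy choices for illustration (not those of Ex3), and the grid-based LS fit is a crude stand-in for a proper optimizer:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 201)              # candidate design points

def eta1(x, th):  return th[0] * np.exp(-th[1] * x)      # rival model 1
def eta2(x, th):  return th[0] / (1.0 + th[1] * x)       # rival model 2

def ls_fit(model, xs, ys, grid):
    """Crude LS estimation over a parameter grid (placeholder for a real optimizer)."""
    best, best_j = None, np.inf
    for th in grid:
        j = np.sum((ys - model(np.asarray(xs), th)) ** 2)
        if j < best_j:
            best, best_j = th, j
    return best

grid = [(a, b) for a in np.linspace(0.5, 2, 16) for b in np.linspace(0.05, 1, 20)]
true_theta = (1.0, 0.3)                      # data generated from model 2
xs = [0.0, 5.0, 10.0]                        # initial design
ys = [eta2(x, true_theta) + 0.01 * rng.standard_normal() for x in xs]

for k in range(5):
    th1 = ls_fit(eta1, xs, ys, grid)
    th2 = ls_fit(eta2, xs, ys, grid)
    # next point: where the two fitted responses differ most
    x_next = X[np.argmax((eta1(X, th1) - eta2(X, th2)) ** 2)]
    xs.append(x_next)
    ys.append(eta2(x_next, true_theta) + 0.01 * rng.standard_normal())

print(xs[3:])                                # the 5 sequentially chosen points
```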
2 DoE based on asymptotic normality
A/ Regression models

y_i = y(x_i) = η(x_i, θ) + ε_i

System: physical experimental device, experimental conditions x_i, i = 1, 2,…, n
Model(θ): mathematical equations, parameters θ = (θ_1,…,θ_p)^T
(response η(x, θ) known explicitly or result of simulation of ODEs or PDEs)
Criterion: similarity between y_i = y(x_i) and η(x_i, θ), i = 1, 2,…, n, e.g.,
J(θ) = (1/n) Σ_{i=1}^n [y_i − η(x_i, θ)]² (LS)
Remarks:
• Model(θ) also provides derivatives
∂η(x, θ)/∂θ = (∂η(x, θ)/∂θ_1,…, ∂η(x, θ)/∂θ_p)^T
(plus higher-order derivatives if necessary)
via simulation of sensitivity functions, or automatic differentiation (adjoint code)
• Criterion: other criteria than LS can be used, e.g.,
J(θ) = (1/n) Σ_{i=1}^n |y_i − η(x_i, θ)| (→ robust estimation), including
Maximum-Likelihood (ML) estimation in more general settings
(J(θ) = (1/n) Σ_{i=1}^n log π(y_i | θ) → max!)
• We always assume independent observations y(x_i) (independent ε_i in regression); often much more difficult otherwise!

Most of the following can be found in [P & Pázman: Design of Experiments in Nonlinear Models, Springer, 2013]
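The sensitivities ∂η/∂θ mentioned above can be checked numerically. A minimal sketch for a toy response η(x, θ) = θ_0 exp(−θ_1 x) (my own illustrative model, not from the slides), comparing analytic sensitivities against central finite differences; adjoint code or AD would take this role for ODE/PDE models:

```python
import numpy as np

def eta(x, th):
    return th[0] * np.exp(-th[1] * x)

def sens_analytic(x, th):
    """Gradient of eta w.r.t. theta: (exp(-th1 x), -th0 x exp(-th1 x))."""
    e = np.exp(-th[1] * x)
    return np.array([e, -th[0] * x * e])

def sens_fd(x, th, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    g = np.zeros(2)
    for j in range(2):
        tp, tm = np.array(th, float), np.array(th, float)
        tp[j] += h
        tm[j] -= h
        g[j] = (eta(x, tp) - eta(x, tm)) / (2 * h)
    return g

x0, th0 = 2.0, np.array([1.5, 0.4])
print(sens_analytic(x0, th0), sens_fd(x0, th0))   # the two should agree
```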
B/ LS estimation

θ̂^n = arg min_θ (1/n) Σ_{i=1}^n [y_i − η(x_i, θ)]²

Linear model: η(x, θ) = f^T(x) θ → θ̂^n = (F^T F)^{-1} F^T y,
with y = (y_1,…,y_n)^T and F^T = (f(x_1),…,f(x_n))
⇒ choose the x_i such that M_n = (1/n) F^T F has full rank
(M_n = normalized information matrix)

Since y_i = f^T(x_i) θ̄ + ε_i for some θ̄ and E{ε_i} = 0 for all i, E{y} = Fθ̄ and E{θ̂^n} = θ̄
Also, Var(θ̂^n) = E{(θ̂^n − θ̄)(θ̂^n − θ̄)^T} = σ² (F^T F)^{-1} = (σ²/n) M_n^{-1} when the ε_i are i.i.d. with finite variance σ²
⇒ choose the x_i to minimize a scalar function of M_n^{-1}
(see Example 1: weighing with a two-pan balance)
Nonlinear model: η(x, θ)

Under «standard» assumptions (θ ∈ Θ compact, η(x, θ) continuous in θ for all x…) and for a suitable sequence (x_i),
θ̂^n → θ̄ a.s. as n → ∞ (strong consistency)

Moreover, under «standard» regularity assumptions (η(x, θ) twice continuously differentiable in θ for all x…), for i.i.d. errors ε_i with finite variance σ², for a suitable sequence (x_i),
√n (θ̂^n − θ̄) →d N(0, σ² M^{-1}) as n → ∞ (asymptotic normality)
with M = lim_{n→∞} (1/n) Σ_{i=1}^n [∂η(x_i, θ)/∂θ]_{θ̄} [∂η(x_i, θ)/∂θ^T]_{θ̄} (information matrix)

⇒ choose the x_i (design) to minimize a scalar function of M^{-1}, or maximize a function Φ(M)
= classical approach for DoE
(see Example 2 with a two-compartment model)
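For a finite design, the normalized information matrix and the resulting asymptotic covariance σ² M^{-1}/n are a few lines of NumPy. A sketch for the same toy model η(x, θ) = θ_0 exp(−θ_1 x) with an arbitrary nominal θ and design (both are illustrative assumptions):

```python
import numpy as np

theta0 = np.array([1.0, 0.5])                # nominal parameter value
design = np.array([0.2, 1.0, 3.0, 6.0])      # one observation at each x

def grad_eta(x, th):
    """Sensitivity of eta(x, theta) = th0 * exp(-th1 * x) w.r.t. theta."""
    e = np.exp(-th[1] * x)
    return np.array([e, -th[0] * x * e])

n = len(design)
# normalized information matrix M = (1/n) sum_i grad grad^T
M = sum(np.outer(grad_eta(x, theta0), grad_eta(x, theta0)) for x in design) / n

sigma2 = 0.01
asym_cov = sigma2 * np.linalg.inv(M) / n     # approximate Var(theta_hat)
print(np.diag(asym_cov))
```

Comparing np.diag(asym_cov) across candidate designs is exactly the comparison the optimality criteria of Section C/ formalize.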
Remarks:
Weighted LS: suppose heteroscedastic errors, var{ε_i} = E{ε_i²} = E{ε²(x_i)} = σ²(x_i)

The weighted LS estimator θ̂^n_WLS minimizes J_WLS(θ) = (1/n) Σ_{i=1}^n w(x_i) [y_i − η(x_i, θ)]²

Strong consistency and asymptotic normality: √n (θ̂^n_WLS − θ̄) →d N(0, C) as n → ∞, where C = M_a^{-1} M_b M_a^{-1} and
M_a = lim_{n→∞} (1/n) Σ_{i=1}^n w(x_i) [∂η(x_i, θ)/∂θ]_{θ̄} [∂η(x_i, θ)/∂θ^T]_{θ̄}
M_b = lim_{n→∞} (1/n) Σ_{i=1}^n w²(x_i) σ²(x_i) [∂η(x_i, θ)/∂θ]_{θ̄} [∂η(x_i, θ)/∂θ^T]_{θ̄}

☞ C ⪰ M^{-1}, with M = lim_{n→∞} (1/n) Σ_{i=1}^n (1/σ²(x_i)) [∂η(x_i, θ)/∂θ]_{θ̄} [∂η(x_i, θ)/∂θ^T]_{θ̄}
☞ C = M^{-1} when w(x) ∝ σ^{-2}(x)
⇒ choose the best estimator, then the best design
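The sandwich formula C = M_a^{-1} M_b M_a^{-1} and the optimality of w(x) ∝ σ^{-2}(x) can be verified numerically. A sketch for a linear toy model f(x) = (1, x)^T with σ²(x) = 1 + x² (my own example values; the limits are replaced by finite-n averages):

```python
import numpy as np

xs = np.array([0.5, 1.0, 2.0, 4.0])
f = lambda x: np.array([1.0, x])
sig2 = lambda x: 1.0 + x ** 2

def sandwich(w):
    """C = Ma^{-1} Mb Ma^{-1} for weight function w."""
    n = len(xs)
    Ma = sum(w(x) * np.outer(f(x), f(x)) for x in xs) / n
    Mb = sum(w(x) ** 2 * sig2(x) * np.outer(f(x), f(x)) for x in xs) / n
    Mai = np.linalg.inv(Ma)
    return Mai @ Mb @ Mai

C_ols = sandwich(lambda x: 1.0)               # ordinary LS, w = 1
C_opt = sandwich(lambda x: 1.0 / sig2(x))     # optimal weights

M = sum(np.outer(f(x), f(x)) / sig2(x) for x in xs) / len(xs)
Minv = np.linalg.inv(M)

# C_opt equals M^{-1}; C_ols - M^{-1} is positive semidefinite (C >= M^{-1})
print(np.allclose(C_opt, Minv), np.linalg.eigvalsh(C_ols - Minv).min() >= -1e-10)
```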
Remarks (continued):
• One may also consider the case var{ε_i} = σ²(x_i, θ) (errors with parameterized variance)
☞ use two-stage LS: 1/ use w(x) ≡ 1 → θ̂^n_(1); 2/ use w(x) = σ^{-2}(x, θ̂^n_(1))
or use iteratively-reweighted LS (i.e., go on with more stages), or penalized LS…
• Similar asymptotic results for ML estimation: √n (θ̂^n_ML − θ̄) →d N(0, σ² M_F^{-1}) as n → ∞, with M_F = Fisher information matrix
• Model(θ) = linear ODE, experimental design = system input u(t)
☞ (≈ simple) analytic expression for M
☞ optimal input design ⇔ optimal control problem (frequency domain → optimal combination of sinusoidal signals) [Goodwin & Payne 1977; Zarrop 1979; Ljung 1987; Walter & P 1994, 1997]
C/ Design based on the information matrix

Maximize Φ(M), but which Φ(·)?
☞ There are many possibilities!

LS estimation in linear regression with i.i.d. errors N(0, σ²):
R(θ̂^n, α) = {θ ∈ R^p : (θ − θ̂^n)^T M_n (θ − θ̂^n) ≤ (σ²/n) χ²_p(1 − α)}
= confidence region (ellipsoid) at level α: Prob{θ̄ ∈ R(θ̂^n, α)} = α
(asymptotically true in nonlinear situations, e.g., nonlinear regression)

☞ Most criteria can be related to geometrical properties of R(θ̂^n, α)
☞ Nonlinear model ⇒ M = M(θ) depends on the θ where η(x, θ) is linearized: for the moment use a nominal value θ⁰
☞ locally optimum design
A few choices for Φ(·)

A-optimality: maximize −trace[M^{-1}] ⇔ maximize 1/trace[M^{-1}]
⇔ minimize the sum of squared lengths of the axes of the (asymptotic) confidence ellipsoids R(θ̂^n, α)

E-optimality: maximize λ_min(M)
⇔ minimize the longest axis of R(θ̂^n, α)

D-optimality: maximize log det M
⇔ minimize the volume of R(θ̂^n, α) (proportional to 1/√(det M))
Very much used:
• a D-optimal design is invariant under reparameterization: det M′(β(θ)) = det M(θ) det^{-2}(∂β/∂θ^T)
• often leads to repeating the same experimental conditions (replications)
(remember Ex2: dim(θ) = 4 → 4 different sampling times, several observations at each)
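The three criteria are one-liners on a given information matrix, and the reparameterization invariance of the D-criterion is easy to check for a linear map β = Bθ (for which M′ = B^{-T} M B^{-1}); the matrices below are arbitrary positive-definite examples:

```python
import numpy as np

def phi_A(M):
    return -np.trace(np.linalg.inv(M))      # A-optimality
def phi_E(M):
    return np.linalg.eigvalsh(M)[0]         # E-optimality: smallest eigenvalue
def phi_D(M):
    return np.log(np.linalg.det(M))         # D-optimality

M1 = np.array([[2.0, 0.5], [0.5, 1.0]])
M2 = np.diag([1.5, 1.5])

# Invariance of the D-criterion: for beta = B theta, M' = B^{-T} M B^{-1},
# hence det M' = det M / (det B)^2, a constant factor independent of the design
B = np.array([[2.0, 1.0], [0.0, 1.0]])
Binv = np.linalg.inv(B)
Mprime = Binv.T @ M1 @ Binv
print(phi_D(M1), phi_D(Mprime))
```

Different criteria generally rank designs differently; maximizing any of them trades off the axes of the confidence ellipsoid in its own way.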
D_s-optimality: only s < p parameters of interest (and p − s «nuisance» parameters) → θ^T = (θ_1^T, θ_2^T), with θ_1 the vector of the s parameters of interest

Partition M(θ) = [M_11 M_12; M_21 M_22] and M^{-1}(θ) = [A_11 A_12; A_21 A_22], with
A_11 = [M_11 − M_12 M_22^{-1} M_21]^{-1}
A_12 = −[M_11 − M_12 M_22^{-1} M_21]^{-1} M_12 M_22^{-1}
A_21 = −M_22^{-1} M_21 [M_11 − M_12 M_22^{-1} M_21]^{-1}
A_22 = M_22^{-1} + M_22^{-1} M_21 [M_11 − M_12 M_22^{-1} M_21]^{-1} M_12 M_22^{-1}

→ maximize Φ_Ds[M] = det[M_11 − M_12 M_22^{-1} M_21]

☞ Useful for model discrimination:
if η^(2)(x, θ_2) = η^(1)(x, θ_1) + δ(x, θ_{2\1}) (nested models),
estimate θ_{2\1} in η^(2) to decide whether η^(1) or η^(2) is more appropriate, see [Atkinson & Cox 1974]
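The block-inverse identity behind the D_s-criterion, that the (1,1) block of M^{-1} is the inverse of the Schur complement M_11 − M_12 M_22^{-1} M_21, is easy to confirm numerically (arbitrary positive-definite example with s = 2 parameters of interest out of p = 4):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
M = A @ A.T + 4 * np.eye(4)         # a positive-definite "information matrix"
s = 2                               # number of parameters of interest
M11, M12 = M[:s, :s], M[:s, s:]
M21, M22 = M[s:, :s], M[s:, s:]

schur = M11 - M12 @ np.linalg.inv(M22) @ M21
A11 = np.linalg.inv(M)[:s, :s]

print(np.allclose(A11, np.linalg.inv(schur)))   # True: A11 = schur^{-1}
phi_Ds = np.linalg.det(schur)                    # the Ds-criterion value
```

Maximizing det(schur) therefore minimizes the volume of the confidence ellipsoid for θ_1 alone, marginalizing over the nuisance block.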
3 Construction of (locally) optimal designs
A/ Exact design

n observations at X_n = (x_1,…,x_n) in a regression model (for simplicity)
Each design point x_i can be anything, e.g. a point in a subset X of R^d

☞ maximize Φ(M_n) w.r.t. X_n, with M_n = M(X_n, θ⁰) = (1/n) Σ_{i=1}^n [∂η(x_i, θ)/∂θ]_{θ⁰} [∂η(x_i, θ)/∂θ^T]_{θ⁰}

• If the problem dimension n × d is not too large → standard algorithm (but with constraints, local optima…)
• Otherwise, use an algorithm that takes the particular form of the problem into account

Exchange methods: at iteration k, exchange one support point x_j for a better one x* in X (design space), better in the sense of Φ(·):
X_n^k = (x_1,…, x_j,…, x_n), with x_j ↔ x*
[Fedorov 1972]: consider all n possible exchanges successively, each time starting from X_n^k, retain the «best» one among these n → X_n^{k+1}

X_n^k = (x_1,…, x_j,…, x_n), with each x_i ↔ x_i*

One iteration → n optimizations of dimension d followed by ranking n criterion values
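A Fedorov-type exchange step can be sketched with a finite candidate set standing in for the n continuous d-dimensional optimizations. The model below, f(x) = (1, x, x²)^T on [−1, 1] with a D-criterion, is my own toy example:

```python
import numpy as np

cand = np.linspace(-1.0, 1.0, 41)              # candidate design points
f = lambda x: np.array([1.0, x, x * x])

def log_det_M(design):
    """D-criterion of an exact design (log det of the normalized info matrix)."""
    M = sum(np.outer(f(x), f(x)) for x in design) / len(design)
    sign, ld = np.linalg.slogdet(M)
    return ld if sign > 0 else -np.inf

design = list(np.linspace(-0.9, 0.9, 6))       # initial 6-point design
for _ in range(20):                            # exchange iterations
    base = log_det_M(design)
    best_gain, best = 0.0, None
    for j in range(len(design)):               # try exchanging each point...
        for x_new in cand:                     # ...for each candidate
            trial = design[:j] + [x_new] + design[j + 1:]
            gain = log_det_M(trial) - base
            if gain > best_gain:
                best_gain, best = gain, (j, x_new)
    if best is None:                           # no improving exchange: stop
        break
    design[best[0]] = best[1]

print(sorted(design))   # tends to cluster on {-1, 0, 1}, the D-optimal support
```

As the slides warn, such exchange schemes only guarantee a local optimum of Φ(·).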
[Mitchell 1974]: DETMAX algorithm

If one additional observation were allowed → optimal choice
X_n^{k+} = (x_1,…, x_j,…, x_n, x*_{n+1})
Then remove one support point to return to an n-point design:
☞ consider all n + 1 possible cancellations, retain the least penalizing in the sense of Φ(M_n)
☞ globally, exchange some x_j with x*_{n+1}
[= excursion of length 1; longer excursions are possible…]

One iteration → 1 optimization of dimension d followed by ranking n + 1 criterion values
3 Construction of (locally) optimal designs
A/ Exact design
DETMAX has simpler iterations than Fedorov's exchange, but usually requires more iterations
Dead ends are possible:
• DETMAX: the point to be removed is x_{n+1}*
• Fedorov: no possible improvement when optimizing one x_i at a time
→ both give local optima only
Other methods:
Branch and bound: guaranteed convergence, but complicated [Welch 1982]
Rounding an optimal design measure (support points x_i and associated
weights w_i*, i = 1, ..., m, presented next in B/):
choose n integers r_i (r_i = nb. of replications of observations at x_i)
such that Σ_{i=1}^m r_i = n and r_i/n ≈ w_i*
(e.g., maximize min_{i=1,...,m} r_i/w_i* = Adams apportionment, see [Pukelsheim &
Rieder 1992])
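The rounding step above can be sketched as a sequential version of Adams apportionment; giving each new observation to the currently most under-represented support point maximizes min_i r_i/w_i*. The function name and greedy formulation are mine:

```python
def adams_round(w, n):
    """Round design weights w (summing to 1) into replication counts r_i with
    sum(r) = n, maximizing min_i r_i / w_i (Adams apportionment)."""
    m = len(w)
    assert n >= m, "need at least one observation per support point"
    r = [1] * m                                   # every support point is kept
    for _ in range(n - m):
        # the next observation goes to the most under-represented point
        j = min(range(m), key=lambda i: r[i] / w[i])
        r[j] += 1
    return r

print(adams_round([0.5, 0.25, 0.25], 10))   # → [5, 3, 2]
```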
B/ Design measures: approximate design theory
[Chernoff 1953; Kiefer & Wolfowitz 1960; Fedorov 1972; Silvey 1980; Pázman
1986; Pukelsheim 1993; Fedorov & Leonov 2014...]
(Nonlinear) regression, n observations at X_n = (x_1, ..., x_n) with i.i.d. errors:
M(X_n, θ⁰) = (1/n) Σ_{i=1}^n ∂η(x_i, θ)/∂θ |_{θ⁰} · ∂η(x_i, θ)/∂θ⊤ |_{θ⁰}
(the additive form is essential — related to the independence of observations)
Suppose that several x_i's coincide (replications): only m < n different x_i's
M(X_n, θ⁰) = Σ_{i=1}^m (r_i/n) ∂η(x_i, θ)/∂θ |_{θ⁰} · ∂η(x_i, θ)/∂θ⊤ |_{θ⁰}
r_i/n = proportion of observations collected at x_i
      = «percentage of experimental effort» at x_i
      = weight w_i of support point x_i
M(X_n, θ⁰) = Σ_{i=1}^m w_i ∂η(x_i, θ)/∂θ |_{θ⁰} · ∂η(x_i, θ)/∂θ⊤ |_{θ⁰}, with Σ_{i=1}^m w_i = 1
Ô design X_n ⇔ { x_1 ... x_m ; w_1 ... w_m }
= normalized discrete distribution on the x_i, with constraints r_i/n = w_i
à Release the constraints: use w_i ≥ 0 with Σ_{i=1}^m w_i = 1
Ô ξ = discrete probability measure on X (= design space),
with support points x_i and associated weights w_i = «approximate design»
More general expression: ξ = any probability measure on X:
M(ξ, θ⁰) = ∫_X ∂η(x, θ)/∂θ |_{θ⁰} · ∂η(x, θ)/∂θ⊤ |_{θ⁰} ξ(dx), with ∫_X ξ(dx) = 1
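For a discrete design measure, M(ξ, θ⁰) is just a weighted sum of rank-one matrices. A minimal sketch, using a one-dimensional exponential regression η(x, θ) = θ1·exp(−θ2·x) with nominal value θ⁰ = (1, 2) as an illustrative choice:

```python
import numpy as np

# Information matrix M(ξ, θ⁰) = Σ_i w_i f(x_i) f(x_i)ᵀ of a discrete design
# measure ξ = Σ_i w_i δ_{x_i}; model and nominal value are illustrative.
th1, th2 = 1.0, 2.0
def f(x):                                   # sensitivity ∂η(x, θ)/∂θ at θ⁰
    return np.array([np.exp(-th2 * x), -th1 * x * np.exp(-th2 * x)])

def info_matrix(support, weights):
    """M(ξ, θ⁰) for ξ with the given support points and weights."""
    return sum(w * np.outer(f(x), f(x)) for x, w in zip(support, weights))

M = info_matrix([0.0, 0.5], [0.5, 0.5])     # equal weights at x = 0 and x = 0.5
print(M.shape, np.linalg.det(M) > 0)        # (2, 2) True: ξ is non-degenerate
```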
M(ξ) ∈ convex closure of M = set of rank-one matrices
M(δ_x) = ∂η(x, θ)/∂θ |_{θ⁰} · ∂η(x, θ)/∂θ⊤ |_{θ⁰}
M(ξ) is symmetric p × p: it lives in a q-dimensional space, q = p(p+1)/2
ξ = w_1 δ_{x_1} + w_2 δ_{x_2} + w_3 δ_{x_3} (3 points are enough for q = 2)
Caratheodory Theorem:
M(ξ) can be written as a linear combination of at most q + 1 elements of M:
M(ξ) = Σ_{i=1}^m w_i ∂η(x_i, θ)/∂θ |_{θ⁰} · ∂η(x_i, θ)/∂θ⊤ |_{θ⁰}, with m ≤ p(p+1)/2 + 1
⇒ consider discrete probability measures with at most p(p+1)/2 + 1 support points
(true in particular for the optimum design!)
[Even better: for many criteria Φ(·), if ξ* is optimal (maximizes Φ[M(ξ)]) then M(ξ*) is
on the boundary of the convex closure of M, and p(p+1)/2 support points are enough]
Suppose we found an optimal ξ* = Σ_{i=1}^m w_i* δ_{x_i}
+ for a given n, choose the r_i so that r_i/n ≈ w_i*
→ rounding of an approximate design
+ Sometimes, ξ* can be implemented without any approximation: ξ = power
spectral density of an input signal
→ design of an optimal input for an ODE model in the frequency domain
Why are design measures interesting?
How do they simplify the optimization problem?
C/ Optimal design measures
à Maximize Φ(·), concave w.r.t. M(ξ), over a convex set
Ex: D-optimality: ∀ M_1 ≽ O, M_2 ≽ O with M_2 ∝̸ M_1, ∀ α, 0 < α < 1,
log det[(1 − α)M_1 + αM_2] > (1 − α) log det M_1 + α log det M_2
⇒ log det[·] is (strictly) concave
convex set + concave criterion ⇒ a unique optimum!
ξ* is optimal ⇔ the directional derivative is ≤ 0 in all directions
=⇒ «Equivalence Theorem» [Kiefer & Wolfowitz 1960]
Ξ = set of probability measures on X; Φ(·) concave; φ(ξ) = Φ[M(ξ)]
F_φ(ξ; ν) = lim_{α→0⁺} {φ[(1−α)ξ + αν] − φ(ξ)}/α
= directional derivative of φ(·) at ξ in the direction ν
Equivalence Theorem: ξ* maximizes φ(ξ) ⇔ max_{ν∈Ξ} F_φ(ξ*; ν) ≤ 0
Ô Takes a simple form when Φ(·) is differentiable:
ξ* maximizes φ(ξ) ⇔ max_{x∈X} F_φ(ξ*; δ_x) ≤ 0
+ Check the optimality of ξ* by plotting F_φ(ξ*; δ_x)
Ex: D-optimal design
ξ*_D maximizes log det[M(ξ)] w.r.t. ξ ∈ Ξ
⇔ max_{x∈X} d(ξ*_D, x) ≤ p
⇔ ξ*_D minimizes max_{x∈X} d(ξ, x) w.r.t. ξ ∈ Ξ
where d(ξ, x) = ∂η(x, θ)/∂θ⊤ |_{θ⁰} M⁻¹(ξ) ∂η(x, θ)/∂θ |_{θ⁰}
Moreover, d(ξ*_D, x_i) = p = dim(θ) for any x_i = support point of ξ*_D
Ex: η(x, θ) = θ_1 exp(−θ_2 x) (p = 2), i.i.d. errors, X = R⁺ [θ⁰_2 = 2]
→ d(ξ, x) as a function of x, for
ξ_2 = { 0.01, 0.75 ; 1/2, 1/2 } and ξ*_D = { 0, 1/θ_2 = 0.5 ; 1/2, 1/2 }
[Figure: d(ξ_2, x) and d(ξ*_D, x) for x ∈ [0, 2]; for ξ*_D, d(ξ*_D, x) ≤ p = 2
everywhere, with equality at the two support points]
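The Equivalence Theorem check in this example can be reproduced numerically: for the design with weight 1/2 at x = 0 and x = 1/θ_2, the variance function d(ξ, x) stays below p = 2 and touches it at the support points. A sketch; the value θ_1 = 1 is an arbitrary choice (d(ξ, x) does not depend on it here):

```python
import numpy as np

# Numerical check of the Kiefer–Wolfowitz Equivalence Theorem for the
# D-optimal design of η(x, θ) = θ1·exp(−θ2·x) with θ2⁰ = 2.
th1, th2 = 1.0, 2.0
f = lambda x: np.array([np.exp(-th2 * x), -th1 * x * np.exp(-th2 * x)])

support, weights = [0.0, 1.0 / th2], [0.5, 0.5]
M = sum(w * np.outer(f(x), f(x)) for x, w in zip(support, weights))
Minv = np.linalg.inv(M)
d = lambda x: float(f(x) @ Minv @ f(x))     # variance function d(ξ, x)

xs = np.linspace(0.0, 2.0, 401)
print(round(max(d(x) for x in xs), 6))      # ≈ p = 2, reached on the support
print(round(d(0.0), 6), round(d(0.5), 6))   # both ≈ 2
```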
KW Eq. Th. relates optimality in the θ space to optimality in the y space (i.i.d. errors):
n var[η(x, θ̂ⁿ)] → σ² ∂η(x, θ)/∂θ⊤ |_{θ̄} M⁻¹(ξ, θ̄) ∂η(x, θ)/∂θ |_{θ̄} = σ² d(ξ, x), n → ∞
à D-optimality ⇔ G-optimality:
ξ*_D minimizes the maximum of the prediction variance over X
[Figure: η(x, θ̄) and η(x, θ̄) ± 2 st.d. prediction bands for x ∈ [0, 2]]
→ put the next observation where d(ξ, x) is large
Remark:
Eq. Th. = stationarity condition = necessary-and-sufficient condition for optimality
≠ duality property!
Dual problem to D-optimum design:
Define S = { ∂η(x, θ)/∂θ |_{θ⁰} : x ∈ X } [S ∪ −S = Elfving's set]
E* = minimum-volume ellipsoid centered at 0 that contains S
Lagrangian theory ⇒ E* = { z ∈ Rᵖ : z⊤ M⁻¹(ξ*_D) z ≤ p }, where ξ*_D is D-optimum
support points of ξ*_D = contact points between E* and S
In general, there are few contact points → repeat observations at the same place (see [Yang
2010, Dette & Melas 2011])
There exist dual problems for other criteria Φ(·)
(= one of the main topics in [Pukelsheim 1993])
Ex: η(x, θ) = θ_1/(θ_1 − θ_2) [exp(−θ_2 x) − exp(−θ_1 x)], θ = (1, 5), X = R⁺
[Figure: the ellipsoid E* and the set S in R², with two contact points]
⇒ the D-optimum design ξ*_D is supported on two points
D/ Construction of an optimal design measure
Ξ = set of probability measures on X; Φ(·) concave and differentiable;
φ(ξ) = Φ[M(ξ)]
Concavity =⇒ for any ξ ∈ Ξ, φ(ξ*) ≤ φ(ξ) + max_{x∈X} F_φ(ξ; δ_x)
Fedorov–Wynn Algorithm: a sort of steepest ascent
1: Choose ξ¹ not degenerate (det M(ξ¹) > 0)
2: Compute x_k* = arg max_{x∈X} F_φ(ξᵏ; δ_x)
   If F_φ(ξᵏ; δ_{x_k*}) < ε, stop: ξᵏ is ε-optimal
3: ξᵏ⁺¹ = (1 − α_k)ξᵏ + α_k δ_{x_k*} (delta measure at x_k*) [Vertex Direction]
   k → k + 1, return to Step 2
Step-length α_k?
à α_k = arg max φ(ξᵏ⁺¹) [= (d(ξᵏ, x_k*) − p)/(p[d(ξᵏ, x_k*) − 1]) for D-optimal design [Fedorov 1972]]
Ô monotone convergence
à α_k > 0, lim_{k→∞} α_k = 0, Σ_{k=1}^∞ α_k = ∞ [[Wynn 1970] for D-optimal design]
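The algorithm above can be sketched on a grid. The model η(x, θ) = θ1·exp(−θ2·x) at θ⁰ = (1, 2), the grid X = [0, 2], and the stopping tolerance are illustrative assumptions:

```python
import numpy as np

# Fedorov–Wynn vertex-direction algorithm for D-optimal design on a grid,
# with Fedorov's optimal step length; model and grid are illustrative.
th1, th2 = 1.0, 2.0
X = np.linspace(0.0, 2.0, 201)
F = np.stack([np.exp(-th2 * X), -th1 * X * np.exp(-th2 * X)], axis=1)  # rows f(x_i)ᵀ
p = F.shape[1]

w = np.full(len(X), 1.0 / len(X))           # ξ¹ = uniform: non-degenerate
for k in range(20000):
    M = F.T @ (w[:, None] * F)              # M(ξᵏ) = Σ_i w_i f(x_i) f(x_i)ᵀ
    d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)  # d(ξᵏ, x) on the grid
    i = int(np.argmax(d))
    if d[i] - p < 1e-3:                     # Eq. Th.: max_x d(ξ, x) ≤ p ⇒ stop
        break
    a = (d[i] - p) / (p * (d[i] - 1.0))     # Fedorov's optimal α_k for D
    w *= 1.0 - a                            # ξᵏ⁺¹ = (1 − α_k)ξᵏ + α_k δ_{x_k*}
    w[i] += a

print(X[w > 0.05], w[w > 0.05])             # mass concentrates near {0, 0.5}
```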
Remarks:
Consider sequential design, where one x_i at a time enters M(X):
M(X_{k+1}) = [k/(k+1)] M(X_k) + [1/(k+1)] ∂η(x_{k+1}, θ)/∂θ |_{θ⁰} · ∂η(x_{k+1}, θ)/∂θ⊤ |_{θ⁰}
with x_{k+1} = arg max_{x∈X} F_φ(ξᵏ; δ_x)
⇔ Wynn algorithm with α_k = 1/(k+1)
Guaranteed convergence to the optimum
Faster methods exist:
• remove support points from ξᵏ (≈ allow α_k to be < 0) [Atwood 1973;
Böhning 1985, 1986]
• combine with gradient projection (or a second-order method) [Wu 1978]
• use a multiplicative algorithm [Titterington 1976; Torsney 1983–2009; Yu
2010] [for D- or A-optimal design, far from the optimum]
• combine different methods [Yu 2011]
Still an active topic...
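The multiplicative algorithm mentioned above has a particularly simple update for D-optimality, w_i ← w_i·d(ξ, x_i)/p: the new weights stay nonnegative and still sum to one, since Σ_i w_i d(ξ, x_i) = trace(I_p) = p. A sketch, with the same illustrative model η(x, θ) = θ1·exp(−θ2·x) at θ⁰ = (1, 2) on a grid over [0, 2]:

```python
import numpy as np

# Multiplicative algorithm (Titterington/Torsney) for D-optimality on a grid;
# model, grid, and stopping tolerance are illustrative choices.
th1, th2 = 1.0, 2.0
X = np.linspace(0.0, 2.0, 201)
F = np.stack([np.exp(-th2 * X), -th1 * X * np.exp(-th2 * X)], axis=1)
p = F.shape[1]

w = np.full(len(X), 1.0 / len(X))
for _ in range(20000):
    M = F.T @ (w[:, None] * F)
    d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)
    if d.max() - p < 1e-3:      # Equivalence Theorem stopping rule
        break
    w *= d / p                  # Σ_i w_i d_i = trace(M⁻¹ M·p/p)... = p, sum stays 1

print(X[w > 0.05], w[w > 0.05])  # mass concentrates near {0, 1/θ2 = 0.5}
```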
Remarks: Usually, X = compact subset of R^d (e.g., the probability simplex for
mixture experiments)
→ discretized into X_ℓ with ℓ elements (a grid — or better, a low-discrepancy
sequence, see [Niederreiter 1992])
Ô the algorithms above may be slow when ℓ is large
à Combine a continuous search for support points in X with the optimization of a
design measure with few support points, say m ≪ ℓ
à Exploit the guaranteed (and fast) convergence of algorithms for m small
+ use the Eq. Th. to check optimality [Yang et al., 2013; P & Zhigljavsky, 2014]
Ex: D-optimal design for
η(x, θ) = θ_0 + θ_1 exp(−θ_2 x_1) + θ_3/(θ_3 − θ_4) [exp(−θ_4 x_2) − exp(−θ_3 x_2)]
with x = (x_1, x_2) ∈ X = [0, 2] × [0, 10] (and p = 5, θ⁰_2 = 2, θ⁰_3 = 0.7, θ⁰_4 = 0.2)
Additive model [Schwabe 1995]: ξ*_D = tensor product of the optimal designs for
β_0^(1) + β_1^(1) exp(−β_2^(1) x_1)
and
β_0^(2) + β_1^(2) [exp(−β_2^(2) x_2) − exp(−β_1^(2) x_2)]/(β_1^(2) − β_2^(2))
Use the Equivalence Th. to construct ξ*_D (with arbitrary precision — Maple)
[weight 1/9 at (0, 0.46268527927, 2) ⊗ (0, 1.22947139883, 6.85768905493)]
à 7 iterations of the algorithm in [P & Zhigljavsky, 2014] yield a ξ such that
max_{x∈X} F_φ(ξ; δ_x) < 10⁻⁵
What if Φ(·) is not differentiable? (e.g., maximize Φ(M) = λ_min(M))
Φ(·) concave, X discretized into X_ℓ, ℓ not too large
Ô optimal design ⇐⇒ optimal vector of weights w ∈ R^ℓ,
with w_i ≥ 0 and Σ_{i=1}^ℓ w_i = 1
• subgradients (↔ directional derivatives)
• general methods for non-differentiable optimization (cutting-plane method
[Kelley 1960], level method [Nesterov 2004]), see Chap. 9 of [P & Pázman
2013]
4 Problems with nonlinear models
Linear regression: easy
The expectation surface Sη = {η(θ) = (η(x1 , θ), . . . , η(xn , θ))> : θ ∈ Rp } is flat
and linearly parameterized
Nonlinear regression: may be a bit tricky
Sη is curved (intrinsic curvature) and nonlinearly parameterized (parametric
curvature) [Bates & Watts 1980]
Ex: η(x, θ) = θ_1 {x}_1 + θ_1³ (1 − {x}_1) + θ_2 {x}_2 + θ_2² (1 − {x}_2)
X = (x_1, x_2, x_3), x_1 = (0, 1), x_2 = (1, 0), x_3 = (1, 1), θ ∈ [−3, 4] × [−2, 2]
[Figure: the expectation surface S_η plotted in the (η_1, η_2, η_3) space]
Two important difficulties:
1. Asymptotically (n → ∞) — or if σ² is small enough — all seems fine
(use linear approximations),
but the distribution of θ̂ⁿ may be far from normal for small n (or for large σ²)
à small-sample properties
2. Everything is local (depends on θ): if we linearize, where do we linearize?
(choice of a nominal value θ⁰)
à nonlocal optimum design
5 Small-sample properties
A/ A classification of regression models
Suppose that
y_i = y(x_i) = η(x_i, θ̄) + ε_i, with E{ε_i} = 0 and E{ε_i²} = σ²(x_i) for all i
Divide y_i and η(x_i, θ̄) by σ(x_i) Ô one may suppose that σ²(x) = σ² for all x
Denote
y = (y_1, ..., y_n)⊤ and η(θ) = (η(x_1, θ), ..., η(x_n, θ))⊤,
ε = (ε_1, ..., ε_n)⊤, so that E{ε} = 0 and Var(ε) = σ² I_n
We suppose η(x, θ) twice continuously differentiable w.r.t. θ for any x
ä Expectation surface: S_η = { η(θ) : θ ∈ Rᵖ }
ä Orthogonal projector onto the tangent space to S_η at η(θ):
P_θ = (1/n) ∂η(θ)/∂θ⊤ · M⁻¹(X, θ) · ∂η(θ)/∂θ (an n × n matrix)
(both depend on X)
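The projector P_θ can be computed directly from the n×p Jacobian J = ∂η(θ)/∂θ⊤, since the 1/n factors cancel: P_θ = J (J⊤J)⁻¹ J⊤. A sketch, with the illustrative model η(x, θ) = θ1·exp(−θ2·x) at θ = (1, 2) and an arbitrary 5-point design:

```python
import numpy as np

# Orthogonal projector P_θ = (1/n)·J·M⁻¹(X, θ)·Jᵀ onto the tangent space of
# S_η at η(θ); model, θ, and design points are illustrative choices.
th1, th2 = 1.0, 2.0
X = np.array([0.0, 0.25, 0.5, 1.0, 1.5])            # n = 5 design points
J = np.stack([np.exp(-th2 * X), -th1 * X * np.exp(-th2 * X)], axis=1)
n, p = J.shape

M = J.T @ J / n                                     # M(X, θ), p × p
P = J @ np.linalg.inv(M) @ J.T / n                  # P_θ, n × n

print(np.allclose(P @ P, P), np.allclose(P, P.T))   # idempotent and symmetric
print(round(np.trace(P)))                           # trace(P_θ) = p = 2
```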
A classification of regression models [Pázman 1993]
Intrinsically linear models
ä The expectation surface S_η = { η(θ) : θ ∈ Rᵖ } is flat (a plane) — intrinsic
curvature ≡ 0
ä A (continuously differentiable) reparameterization exists that makes the model
linear
ä P_θ H_ij(θ) = H_ij(θ), where H_ij(θ) = ∂²η(θ)/∂θ_i∂θ_j
Observing at p different x_i only (with replications) makes the model intrinsically linear
Parametrically linear models
ä M(X, θ) = constant
ä P_θ H_ij(θ) = 0 — parametric curvature ≡ 0
Linear models
ä η(x, θ) = f⊤(x) θ + c(x)
ä the model is both intrinsically and parametrically linear
Flat models
ä A reparameterization exists that makes the information matrix constant
ä Riemannian curvature tensor ≡ 0: R_hijk(θ) = T_hjik(θ) − T_hkij(θ) ≡ 0, where
T_hjik(θ) = [H_hj(θ)]⊤ [I_n − P_θ] H_ik(θ)
If all parameters but one appear linearly, then the model is flat
A classification of regression models [Pázman 1993] (bis)
B/ Density of the LS estimator
Suppose ε ∼ N(0, σ² I_n)
Intrinsically linear models (in particular, with repetitions at p points):
→ exact distribution:
θ̂ⁿ ∼ q(θ|θ̄) = [n^{p/2} det^{1/2} M(X, θ) / ((2π)^{p/2} σᵖ)] exp{ −(1/2σ²) ‖η(θ) − η(θ̄)‖² }
Ex: η(x, θ) = exp(−θx), θ̄ = 2, 15 observations at the same x = 1/2 (σ² = 1)
Ex: η(x, θ) = x θ³, θ̄ = 0, all observations at the same x ≠ 0
[Figure: the exact density q(θ|θ̄) of θ̂ⁿ for θ ∈ [−2, 2], clearly non-normal]
Flat models: approximate density of θ̂n
q(θ|θ̄) = det[Q(θ, θ̄)] / [(2π)^{p/2} σ^p n^{p/2} det^{1/2} M(X, θ)] · exp(−‖Pθ[η(θ) − η(θ̄)]‖² / (2σ²))
where {Q(θ, θ̄)}ij = {n M(X, θ)}ij + [η(θ) − η(θ̄)]ᵀ [In − Pθ] Hij(θ)
Remarks:
Other (more complicated) approximations exist for non-flat models
Approximation of the density of the penalized LS estimator arg minθ∈Θ ‖y − η(θ)‖² + 2w(θ) (which includes the case of Bayesian estimation)
We also know the (approximate) marginal densities of the LS estimator θ̂n [Pázman & P 1996]
5 Small-sample properties
C/ Confidence regions
Suppose ε ∼ N(0, σ²In), define e(θ) = y − η(θ)
⇒ eᵀ(θ̄) Pθ̄ e(θ̄)/σ² ∼ χ²p
⇒ eᵀ(θ̄) [In − Pθ̄] e(θ̄)/σ² ∼ χ²n−p
and they are independent
→ exact confidence regions at level α:
{θ ∈ Rp : eᵀ(θ) Pθ e(θ)/σ² < χ²p[1 − α]} (if σ² known)
{θ ∈ Rp : [(n − p)/p] eᵀ(θ) Pθ e(θ) / (eᵀ(θ) [In − Pθ] e(θ)) < Fp,n−p[1 − α]} (if σ² unknown)
(but they are not of minimum volume, and may be composed of disconnected subsets…)
→ approximate confidence regions based on the likelihood ratio (usually connected):
{θ ∈ Rp : ‖e(θ)‖² − ‖e(θ̂)‖² < σ² χ²p[1 − α]} (if σ² known)
{θ ∈ Rp : ‖e(θ)‖²/‖e(θ̂)‖² < 1 + [p/(n − p)] Fp,n−p[1 − α]} (if σ² unknown)
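The likelihood-ratio region with σ² known is easy to evaluate on a grid of θ values. The sketch below uses our own toy setup for the one-parameter model η(x, θ) = exp(−θx), with the χ²₁ quantile at 0.95 (≈ 3.841) hardcoded.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.1, 2.0, 20)
theta_true, sigma = 2.0, 0.1
y = np.exp(-theta_true * x) + sigma * rng.normal(size=x.size)

def rss(thetas):
    """||e(theta)||^2 = ||y - eta(theta)||^2 for a 1-D array of theta values."""
    return ((y[:, None] - np.exp(-np.outer(x, thetas))) ** 2).sum(axis=0)

grid = np.linspace(0.5, 4.0, 3501)
r = rss(grid)
theta_hat = grid[np.argmin(r)]   # grid least-squares estimate
chi2_1_95 = 3.841                # 0.95 quantile of chi^2 with 1 d.o.f.
# LR region: { theta : ||e(theta)||^2 - ||e(theta_hat)||^2 < sigma^2 chi^2_1[0.95] }
region = grid[r - r.min() < sigma**2 * chi2_1_95]
```

By construction the region always contains θ̂; whether it is an interval or a union of pieces depends on the model and the data, which is exactly the point made on the slide.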
5 Small-sample properties
D/ Design based on small-sample properties
3 main ideas (exact design only), based on:
a) the (approximate) volume of (approximate) confidence regions (not necessarily of minimum volume) [Hamilton & Watts 1985; Vila 1990; Vila & Gauchi 2007] (ellipsoidal approximation → D-optimality)
b) the (approximate or exact) density of θ̂n:
e.g., minimize ∫ ‖θ − θ̄‖² q(θ|θ̄) dθ w.r.t. X (using stochastic approximation, [Pázman & P 1992, Gauchi & Pázman 2006])
c) higher-order approximation of optimality criteria, using ϕ(y|X, θ̄) = N(η(X, θ̄), σ²In):
minimize the MSE ∫ ‖θ̂n(y) − θ̄‖² ϕ(y|X, θ̄) dy [Clarke 1980]
minimize the entropy −∫ log[q(θ̂n(y)|θ̄)] ϕ(y|X, θ̄) dy [P & Pázman 1994] (usual normal approximation → D-optimality)
→ explicit (but rather complicated) expressions (they depend on 3rd-order derivatives of η(x, θ) w.r.t. θ)
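Idea b) can be mimicked by brute force for η(x, θ) = exp(−θx): simulate the LS estimator under a candidate one-point design and compare Monte-Carlo mean-squared errors. This is a sketch with settings of our own choosing, not the stochastic-approximation scheme of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(42)
theta_bar, sigma, n_rep, n_mc = 2.0, 0.1, 10, 20000

def mc_mse(x):
    """Monte-Carlo MSE of the LS estimator for n_rep observations at a single x.
    With repetitions at one point, theta_hat = -log(y_bar)/x when y_bar > 0."""
    y_bar = np.exp(-theta_bar * x) + sigma / np.sqrt(n_rep) * rng.normal(size=n_mc)
    y_bar = np.clip(y_bar, 1e-6, None)   # guard: estimator undefined for y_bar <= 0
    theta_hat = -np.log(y_bar) / x
    return np.mean((theta_hat - theta_bar) ** 2)

mse_good = mc_mse(0.5)   # x = 1/theta_bar, the locally D-optimal point
mse_bad = mc_mse(1.5)    # a poorly placed one-point design
```

The MSE at the well-chosen point is much smaller; the badly placed design also suffers from near-zero responses, where the estimator becomes very unstable.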
5 Small-sample properties
E/ One additional difficulty
Overlapping of Sη, local minimizers…
[Figure: overlapping expectation surface, with points A, B and C marked]
⚠ Important and difficult problem, often neglected!
5 Small-sample properties
E/ One additional difficulty
What can we do at the design stage?
→ extensions of the usual optimality criteria, e.g.,
maximize φ̃E(X) = minθ ‖η(θ) − η(θ0)‖² / ‖θ − θ0‖²
or
maximize φ̃E(ξ) = minθ ∫ [η(x, θ) − η(x, θ0)]² ξ(dx) / ‖θ − θ0‖²
⇒ corresponds to E-optimal design if the model is linear (maximize λmin M(ξ)), see Chap. 7 of [P & Pázman 2013], [Pázman & P, 2014]
⚠ All approaches presented so far are local (the optimal design depends on the unknown θ̄ ← θ0)
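For the scalar model η(x, θ) = exp(−θx), the extended criterion φ̃E can be evaluated by brute force on a θ grid. The design points, weights and grids below are arbitrary choices of ours; the minimum of the ratio is attained far from θ0, so φ̃E ends up well below the local information M(ξ, θ0), which is its limit as θ → θ0.

```python
import numpy as np

theta0 = 2.0
xs = np.array([0.25, 0.5, 1.0])   # support points of a (hypothetical) design xi
w = np.array([0.3, 0.4, 0.3])     # corresponding design weights

def phi_E_tilde(theta_grid):
    """min over the grid of  int [eta(x,th) - eta(x,th0)]^2 xi(dx) / (th - th0)^2."""
    num = ((np.exp(-np.outer(theta_grid, xs)) - np.exp(-theta0 * xs)) ** 2) @ w
    return np.min(num / (theta_grid - theta0) ** 2)

# grid excluding a small neighborhood of theta0 (the ratio tends to M there)
grid = np.concatenate([np.linspace(0.01, 1.99, 400), np.linspace(2.01, 10.0, 800)])
phi = phi_E_tilde(grid)
M0 = (xs**2 * np.exp(-2 * theta0 * xs)) @ w   # local information M(xi, theta0)
```

This is precisely the global-versus-local gap the criterion is designed to control.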
6 Nonlocal optimum design
Ex: η(x, θ) = exp(−θx), yi = η(xi, θ̄) + εi, θ > 0, x ∈ X = [0, ∞)
⇒ M(ξ, θ0) = ∫X x² exp(−2θ0x) ξ(dx)
→ ξ*D = ξ*A = … = δ1/θ0
Objective: remove the dependence on the nominal value θ0
3 main classes of (related) methods:
① Average optimum design: maximize Eθ{φ(X, θ)} (or Eθ{φ(ξ, θ)})
② Maximin optimum design: maximize minθ{φ(X, θ)} (or minθ{φ(ξ, θ)})
→ Between ① and ②: regularized maximin criteria, quantile and probability-level criteria
③ Sequential design
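The one-point locally optimal design δ1/θ0 above can be checked numerically; a small sketch (grid and bounds are ours), maximizing M(δx, θ0) = x² exp(−2θ0x) over x.

```python
import numpy as np

theta0 = 2.0
x = np.linspace(0.0, 5.0, 100001)
# information of the one-point design delta_x for eta(x, theta) = exp(-theta x)
info = x**2 * np.exp(-2 * theta0 * x)
x_star = x[np.argmax(info)]   # should be 1/theta0 (here 0.5)
```

Setting the derivative 2x − 2θ0x² to zero gives x = 1/θ0, in agreement with the slide.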
6 Nonlocal optimum design
A/ Average Optimum design
Nothing special: probability measure µ(dθ) on Θ ⊆ Rp
φ(·, θ0) → φAO(·) = ∫Θ φ(·, θ) µ(dθ)
No difficulty if Θ = {θ(1), …, θ(M)} is finite and µ = Σ_{i=1}^M αi δθ(i) (integral → finite sum)
* Approximate design theory: φAO(·) is concave when each φ(·, θ) is concave
→ same properties and same algorithms as in Section 3 for design measures
* Exact design: same algorithms as in Section 3
(for continuous distributions µ, use stochastic approximation to avoid evaluations of integrals [P & Walter 1985])
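With a finite Θ the average criterion is a finite sum, so average-optimal design is a plain optimization. A sketch for one-point designs in the exponential example, with a hypothetical Θ = {1, 2, 3} and uniform µ: the average log-det criterion is 2 log x − 2 x·mean(Θ), maximized at x = 1/mean(Θ).

```python
import numpy as np

thetas = np.array([1.0, 2.0, 3.0])   # finite Theta, uniform weights mu
x = np.linspace(1e-3, 3.0, 300001)
# average D-criterion for one-point designs:
#   phi_AO(x) = mean over Theta of log M(delta_x, theta) = 2 log x - 2 mean(theta) x
phi_AO = np.mean(np.log(x[None, :]**2 * np.exp(-2 * thetas[:, None] * x[None, :])), axis=0)
x_AO = x[np.argmax(phi_AO)]          # here 1/mean(Theta) = 0.5
```

So the average-optimal one-point design concentrates at 1/mean(Θ), between the locally optimal points 1/θ(i).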
6 Nonlocal optimum design
A/ Average Optimum design
A Bayesian interpretation:
Suppose µ = prior distribution with density π(θ)
→ entropy −∫Θ π(θ) log[π(θ)] dθ
Posterior distribution of θ: π(θ|X, y) = ϕ(y|X, θ) π(θ) / ϕ(y|X)
Gain in information = decrease of entropy
The entropy may increase, but the expected gain in information I(X) is always positive [Lindley 1956]
→ I(X) = Ey{∫Θ (π(θ|X, y) log[π(θ|X, y)] − π(θ) log[π(θ)]) dθ}
where the expectation Ey{·} is for the marginal ϕ(y|X)
If the experiment is informative enough (σ² small, n large enough):
maximizing I(X) ≈ maximizing ∫ log det M(X, θ) π(θ) dθ
6 Nonlocal optimum design
A/ Average Optimum design
Which prior π(θ)? The expected gain in information is maximum when π(·) = noninformative (Jeffreys) prior:
π*(θ) = det^{1/2} M(ξ, θ) / ∫Θ det^{1/2} M(ξ, θ) dθ → maximize ∫ det^{1/2} M(ξ, θ) dθ
πν(θ) = det^{1/2} M(ν, θ) / ∫Θ det^{1/2} M(ν, θ) dθ → uniform distribution of the responses η(·, θ) (for the metric defined by ν) [Bornkamp 2011]
Ex: η(x, θ) = exp(−θx), α-quantiles of η(x, θ) for different π
[Figure: two panels: left, π uniform on Θ = [1, 10]; right, πν with ν uniform on X = [0, 2]; quantiles of η(x, θ) plotted over x ∈ [0, 2]]
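The prior πν of [Bornkamp 2011] can be tabulated numerically for this example (grid sizes below are ours). Since the information M(ν, θ) decreases with θ here, the prior puts more mass on small θ.

```python
import numpy as np

# pi_nu(theta) proportional to det^{1/2} M(nu, theta), nu uniform on X = [0, 2],
# for the one-parameter model eta(x, theta) = exp(-theta x).
xg = np.linspace(0.0, 2.0, 2001)
tg = np.linspace(1.0, 10.0, 2001)    # Theta = [1, 10] as in the slide's example
# M(nu, theta) = (1/2) int_0^2 x^2 exp(-2 theta x) dx, by trapezoidal quadrature
M = np.trapz(xg[None, :]**2 * np.exp(-2 * tg[:, None] * xg[None, :]), xg, axis=1) / 2.0
pi = np.sqrt(M)
pi /= np.trapz(pi, tg)               # normalize to a probability density on Theta
```

The normalization makes πν directly usable as the measure µ(dθ) in the average criterion above.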
6 Nonlocal optimum design
B/ Maximin Optimum design
φ(·, θ0) → φMmO(·) = minθ∈Θ φ(·, θ)
* Exact design:
Θ finite → same algorithms as in Section 3
Θ compact subset of Rp → relaxation method to solve a sequence of maximin problems with finite (and growing) sets Θ(k) [P & Walter 1988]
* Approximate design:
φMmO(·) is concave when each φ(·, θ) is concave,
⚠ but φMmO(·) is non-differentiable!
→ maximize φMmO(ξ) using a specific algorithm for concave non-differentiable maximization (cutting plane, level method…, see Section 3)
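For the one-parameter example with D-efficiency φ(δx, θ) = M(δx, θ)/M(δ1/θ, θ) = (θx)² exp(2 − 2θx), the maximin one-point design over Θ = [1, 10] can be found by brute force. A sketch with our own grids; for this efficiency the maximin point balances the two endpoints, eff(x, 1) = eff(x, 10), which gives x = ln(100)/18 ≈ 0.256.

```python
import numpy as np

tg = np.linspace(1.0, 10.0, 901)   # Theta = [1, 10]
xg = np.linspace(0.01, 1.0, 991)   # candidate one-point designs delta_x
# D-efficiency of delta_x for eta = exp(-theta x):
#   eff(x, theta) = M(delta_x, theta) / M(delta_{1/theta}, theta)
eff = (tg[:, None] * xg[None, :])**2 * np.exp(2.0 - 2.0 * tg[:, None] * xg[None, :])
phi_MmO = eff.min(axis=0)          # worst-case efficiency over Theta
x_star = xg[np.argmax(phi_MmO)]
phi_star = phi_MmO.max()           # maximin efficiency, about 0.29 here
```

The worst-case efficiency of roughly 29% illustrates how demanding the maximin criterion is on a wide Θ.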
6 Nonlocal optimum design
B/ Maximin Optimum design
How to check optimality of ξ*?
φ(·, θ0) differentiable: maxx∈X Fφ(ξ*; δx, θ0) ≤ 0? → plot Fφ(ξ*; δx, θ0) as a function of x
φMmO(·) not differentiable: maxν∈Ξ FφMmO(ξ*; ν) ≤ 0 cannot be exploited directly
Equivalence Theorem:
ξ* maximizes φMmO(ξ) ⇔ maxν∈Ξ FφMmO(ξ*; ν) ≤ 0
⇔ maxν∈Ξ minθ∈Θ(ξ*) Fφ(ξ*; ν, θ) ≤ 0, with Θ(ξ) = {θ : φ(ξ, θ) = φMmO(ξ)}
⇔ maxx∈X ∫Θ(ξ*) Fφ(ξ*; δx, θ) µ*(dθ) ≤ 0 for some probability measure µ* on Θ(ξ*)
Once ξ* is determined, solve an LP problem:
µ* on Θ(ξ*) minimizes maxx∈X ∫Θ(ξ*) Fφ(ξ*; δx, θ) µ(dθ)
→ plot ∫Θ(ξ*) Fφ(ξ*; δx, θ) µ*(dθ) (should be ≤ 0)
6 Nonlocal optimum design
B/ Maximin Optimum design
Ex: η(x, θ) = θ1 exp(−θ2 x), p = 2, X = [0, 2], θ2 ∈ [0, θ2max]
φ(ξ, θ) = det^{1/p} M(ξ, θ) / det^{1/p} M(ξ*D, θ) (D-efficiency, ∈ [0, 1])
∫Θ(ξ*) Fφ(ξ*; δx, θ) µ*(dθ) for θ2max = 2 (solid line, 2 support points) and θ2max = 20 (dashed line, 4 support points)
[Figure: both curves plotted over x ∈ [0, 1], staying ≤ 0]
6 Nonlocal optimum design
C/ Regularized Maximin Optimum design
Suppose φ(·, θ) > 0 for all θ; µ a probability measure on Θ ⊂ Rp compact
φMmO(ξ) = minθ∈Θ φ(ξ, θ) ≤ φq(ξ) = [∫Θ φ^{−q}(ξ, θ) µ(dθ)]^{−1/q} (differentiable)
with φ−1(ξ) = φAO(ξ), φ0(ξ) = exp ∫Θ log[φ(ξ, θ)] µ(dθ), and φq(ξ) → φMmO(ξ) as q → ∞ (and φq(·) concave for q ≥ −1)
Moreover, Θ = {θ(1), …, θ(M)}, µ = (1/M) Σi δθ(i) ⟹ φMmO(ξ*q)/φMmO(ξ*MmO) ≥ M^{−1/q}
Ex: η = exp(−θx), φ(ξ, θ) = M(ξ, θ)/M(ξ*, θ) (= efficiency); plot of φq(δx) as a function of x
[Figure: φq(δx) for q = −0.8, 0.4 and 40, x ∈ [0, 1]]
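The regularized criterion is simply a generalized (power) mean. With a finite Θ and hypothetical criterion values φ(ξ, θ(i)) of our own choosing, the stated properties are easy to check: φ−1 = φAO, φMmO ≤ φq, monotone decrease in q, and the M^{−1/q} closeness to the minimum.

```python
import numpy as np

# hypothetical values phi(xi, theta^(i)), i = 1..M, for a fixed design xi
phi_vals = np.array([0.3, 0.6, 0.9])

def phi_q(q):
    """phi_q(xi) = ( mean of phi^{-q} over a uniform finite mu )^{-1/q}."""
    return np.mean(phi_vals ** (-q)) ** (-1.0 / q)

phi_MmO = phi_vals.min()   # the criterion being regularized
```

For q = 40 and M = 3 the bound gives φ40 ≤ φMmO · 3^{1/40}, already within about 3% of the true minimum.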
6 Nonlocal optimum design
D/ Quantiles and probability level criteria
A/ ξ*AO is good on average for µ(dθ) on Θ, but may be bad for some θ
⚠ for ψ(·) increasing, maximizing ψ(∫Θ φ(·, θ) µ(dθ)) (AO-optimality) is different from maximizing ∫Θ ψ[φ(·, θ)] µ(dθ)
B/ ξ*MmO often depends on the boundary of Θ
(⇒ we often simply replace the dependence on θ0 by a dependence on θmax)
u given → Pu(ξ) = µ{φ(ξ, θ) ≥ u}
α ∈ (0, 1) given → Qα(ξ) = max{u : Pu(ξ) ≥ 1 − α}
[Figure: p.d.f. of φ(ξ; θ); the left-tail area α defines u = Qα(ξ), with Pu(ξ) = 1 − α]
6 Nonlocal optimum design
D/ Quantiles and probability level criteria
• maximizing Pu(ξ) = µ{φ(ξ, θ) ≥ u} is well adapted to φ(ξ, θ) = efficiency ∈ (0, 1)
• Qα(ξ) → φMmO as α → 0
• for ψ(·) increasing, using ψ[φ(ξ, θ)] does not change Pu(ξ) and Qα(ξ)
⚠ Pu(ξ) and Qα(ξ) are generally not concave!
(but we can compute directional derivatives and maximize)
Ex: η = exp(−θx), φ(ξ, θ) = M(ξ, θ); plot of Qα(δx) as a function of x (with φAO(δx) and φMmO(δx))
[Figure: Qα(δx) for α = 0.5, 0.1 and 0.05, x ∈ [0, 1]]
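Pu and Qα are plain functionals of the distribution of φ(ξ, θ) under µ; for a one-point design in the exponential example they can be approximated on a θ grid. A sketch (x = 0.3 and the grids are our choices): Qα is the lower α-quantile of φ(δx, θ), and it decreases toward the maximin value as α → 0.

```python
import numpy as np

thetas = np.linspace(1.0, 10.0, 10001)   # grid version of mu ~ uniform on [1, 10]

def Q_alpha(x, alpha):
    """Q_alpha(delta_x) = max{u : mu{phi(delta_x, theta) >= u} >= 1 - alpha},
    i.e. the lower alpha-quantile of phi = M(delta_x, theta) under mu."""
    phi = x**2 * np.exp(-2.0 * thetas * x)
    return np.quantile(phi, alpha)

x = 0.3
q50, q10, q01 = Q_alpha(x, 0.5), Q_alpha(x, 0.1), Q_alpha(x, 0.01)
phi_min = (x**2 * np.exp(-2.0 * thetas * x)).min()   # maximin value for this x
```

By construction q50 ≥ q10 ≥ q01 ≥ φMmO(δx), with q01 already close to the worst case.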
6 Nonlocal optimum design
E/ Sequential design
θ0 → design: X1 = arg maxX φ(X, θ0)
→ observe: y1 = y1(X1)
→ estimate: θ̂1 = arg minθ J(θ; y1, X1)
→ design: X2 = arg maxX φ({X1, X}, θ̂1)
→ observe: y2 = y2(X2)
→ estimate: θ̂2 = arg minθ J(θ; {y1, y2}, {X1, X2}) (growing data set and design)
→ design: X3 = arg maxX φ({X1, X2, X}, θ̂2), etc.
+ Replace the unknown θ by the best current guess θ̂k
(there exist variants with Bayesian estimation and average optimality)
⚠ Consistency of θ̂n? Asymptotic normality (for design based on M)?
(Xk depends on y1, …, yk−1 ⟹ independence is lost)
6 Nonlocal optimum design
E/ Sequential design
→ No problem if each Xi has size ≥ p = dim(θ) (batch-sequential design)
If n observations in total and two stages only: what size for the first batch?
→ it should be proportional to √n (which does not say much…)
→ Full sequential design: Xk = {xk} (batches of size 1)
→ convergence properties difficult to investigate
M(Xk+1, θ̂k) = k/(k+1) M(Xk, θ̂k) + 1/(k+1) ∂η(xk+1, θ)/∂θ|θ̂k · ∂η(xk+1, θ)/∂θᵀ|θ̂k
with xk+1 = arg maxx∈X Fφ(ξk; δx |θ̂k) ⇔ Wynn's algorithm [1970] with αk = 1/(k+1)
• some convergence results for Bayesian estimation [Hu 1998]
• no general convergence results for LS and ML estimation; some results when X is finite (X = {x(1), …, x(ℓ)}) [P 2009, 2010]
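The full-sequential loop (batches of size 1) can be sketched for the exponential example. This is our own toy setup: a finite design space, grid least squares, and a deliberately wrong initial guess θ0 = 1; each new point maximizes the current information gain, here simply f²(x, θ̂k) = x² exp(−2θ̂k x) since p = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, sigma = 2.0, 0.05
X = np.linspace(0.1, 2.0, 20)            # finite design space
theta_grid = np.linspace(0.1, 6.0, 5901) # grid for least-squares estimation

xs, ys = [], []
theta_hat = 1.0                          # initial nominal value theta^0
for k in range(40):
    # next point: argmax of f^2(x, theta_hat) (one-step-ahead D-optimality, p = 1)
    x_next = X[np.argmax(X**2 * np.exp(-2.0 * theta_hat * X))]
    xs.append(x_next)
    ys.append(np.exp(-theta_true * x_next) + sigma * rng.normal())
    # re-estimate theta by grid least squares on all data collected so far
    xa, ya = np.array(xs), np.array(ys)
    rss = ((ya[None, :] - np.exp(-np.outer(theta_grid, xa))) ** 2).sum(axis=1)
    theta_hat = theta_grid[np.argmin(rss)]
```

As θ̂k approaches θ̄ = 2, the selected points concentrate near 1/θ̄ = 0.5, the locally optimal design; the convergence caveats on the slide (dependence of Xk on past observations) apply to exactly this kind of loop.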
References
References I
Atkinson, A., Cox, D., 1974. Planning experiments for discriminating between models (with discussion).
Journal of the Royal Statistical Society B 36, 321–348.
Atkinson, A., Fedorov, V., 1975. The design of experiments for discriminating between two rival models.
Biometrika 62 (1), 57–70.
Atwood, C., 1973. Sequences converging to D-optimal designs of experiments. Annals of Statistics 1 (2),
342–352.
Bates, D., Watts, D., 1980. Relative curvature measures of nonlinearity. Journal of the Royal Statistical Society
B 42, 1–25.
Böhning, D., 1985. Numerical estimation of a probability measure. Journal of Statistical Planning and
Inference 11, 57–69.
Böhning, D., 1986. A vertex-exchange-method in D-optimal design theory. Metrika 33, 337–347.
Box, G., Hill, W., 1967. Discrimination among mechanistic models. Technometrics 9 (1), 57–71.
Chernoff, H., 1953. Locally optimal designs for estimating parameters. Annals of Math. Stat. 24, 586–602.
Clarke, G., 1980. Moments of the least-squares estimators in a non-linear regression model. Journal of the
Royal Statistical Society B 42, 227–237.
D’Argenio, D., 1981. Optimal sampling times for pharmacokinetic experiments. Journal of Pharmacokinetics
and Biopharmaceutics 9 (6), 739–756.
Dette, H., Melas, V., 2011. A note on de la Garza phenomenon for locally optimal designs. Annals of Statistics
39 (2), 1266–1281.
Fedorov, V., 1972. Theory of Optimal Experiments. Academic Press, New York.
Fedorov, V., Leonov, S., 2014. Optimal Design for Nonlinear Response Models. CRC Press, Boca Raton.
Fisher, R., 1925. Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh.
Gauchi, J.-P., Pázman, A., 2006. Designs in nonlinear regression by stochastic minimization of functionals of
the mean square error matrix. Journal of Statistical Planning and Inference 136, 1135–1152.
Goodwin, G., Payne, R., 1977. Dynamic System Identification: Experiment Design and Data Analysis.
Academic Press, New York.
Hamilton, D., Watts, D., 1985. A quadratic design criterion for precise estimation in nonlinear regression
models. Technometrics 27, 241–250.
Hill, P., 1978. A review of experimental design procedures for regression model discrimination. Technometrics
20, 15–21.
Hu, I., 1998. On sequential designs in nonlinear problems. Biometrika 85 (2), 496–503.
Kelley, J., 1960. The cutting plane method for solving convex programs. SIAM Journal 8, 703–712.
Kiefer, J., Wolfowitz, J., 1960. The equivalence of two extremum problems. Canadian Journal of Mathematics
12, 363–366.
Ljung, L., 1987. System Identification, Theory for the User. Prentice-Hall, Englewood Cliffs.
Mitchell, T., 1974. An algorithm for the construction of “D-optimal” experimental designs. Technometrics 16,
203–210.
Nesterov, Y., 2004. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Dordrecht.
Niederreiter, H., 1992. Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia.
Pázman, A., 1986. Foundations of Optimum Experimental Design. Reidel (Kluwer group), Dordrecht (co-pub.
VEDA, Bratislava).
Pázman, A., 1993. Nonlinear Statistical Models. Kluwer, Dordrecht.
Pázman, A., Pronzato, L., 1992. Nonlinear experimental design based on the distribution of estimators.
Journal of Statistical Planning and Inference 33, 385–402.
Pázman, A., Pronzato, L., 1996. A Dirac function method for densities of nonlinear statistics and for marginal
densities in nonlinear regression. Statistics & Probability Letters 26, 159–167.
Pázman, A., Pronzato, L., 2014. Optimum design accounting for the global nonlinear behavior of the model.
Annals of Statistics 42 (4), 1426–1451.
Pronzato, L., 2009. Asymptotic properties of nonlinear estimates in stochastic models with finite design space.
Statistics & Probability Letters 79, 2307–2313.
Pronzato, L., 2010. One-step ahead adaptive D-optimal design on a finite design space is asymptotically
optimal. Metrika 71 (2), 219–238 (DOI: 10.1007/s00184-008-0227-y).
Pronzato, L., Pázman, A., 1994. Second-order approximation of the entropy in nonlinear least-squares
estimation. Kybernetika 30 (2), 187–198, Erratum 32(1):104, 1996.
Pronzato, L., Pázman, A., 2013. Design of Experiments in Nonlinear Models. Asymptotic Normality,
Optimality Criteria and Small-Sample Properties. Springer, LNS 212, New York.
Pronzato, L., Walter, E., 1985. Robust experiment design via stochastic approximation. Mathematical
Biosciences 75, 103–120.
Pronzato, L., Walter, E., 1988. Robust experiment design via maximin optimization. Mathematical Biosciences
89, 161–176.
Pronzato, L., Zhigljavsky, A., 2014. Algorithmic construction of optimal designs on compact sets for concave
and differentiable criteria. Journal of Statistical Planning and Inference 154, 141–155.
Pukelsheim, F., 1993. Optimal Experimental Design. Wiley, New York.
Pukelsheim, F., Rieder, S., 1992. Efficient rounding of approximate designs. Biometrika 79 (4), 763–770.
Schwabe, R., 1995. Designing experiments for additive nonlinear models. In: Kitsos, C., Müller, W. (Eds.),
MODA4 – Advances in Model-Oriented Data Analysis, Spetses (Greece), June 1995. Physica Verlag,
Heidelberg, pp. 77–85.
Silvey, S., 1980. Optimal Design. Chapman & Hall, London.
Titterington, D., 1976. Algorithms for computing D-optimal designs on a finite design space. In: Proc. of the
1976 Conference on Information Science and Systems. Dept. of Electronic Engineering, John Hopkins
University, Baltimore, pp. 213–216.
Torsney, B., 1983. A moment inequality and monotonicity of an algorithm. In: Kortanek, K., Fiacco, A. (Eds.),
Proc. Int. Symp. on Semi-infinite Programming and Applications. Springer, Heidelberg, pp. 249–260.
Torsney, B., 2009. W-iterations and ripples therefrom. In: Pronzato, L., Zhigljavsky, A. (Eds.), Optimal Design
and Related Areas in Optimization and Statistics. Springer, Ch. 1, pp. 1–12.
Vila, J.-P., 1990. Exact experimental designs via stochastic optimization for nonlinear regression models. In:
Proc. Compstat, Int. Assoc. for Statistical Computing. Physica Verlag, Heidelberg, pp. 291–296.
Vila, J.-P., Gauchi, J.-P., 2007. Optimal designs based on exact confidence regions for parameter estimation of
a nonlinear regression model. Journal of Statistical Planning and Inference 137, 2935–2953.
Walter, E., Pronzato, L., 1994. Identification de Modèles Paramétriques à Partir de Données Expérimentales.
Masson, Paris, 371 pages.
Walter, E., Pronzato, L., 1997. Identification of Parametric Models from Experimental Data. Springer,
Heidelberg.
Welch, W., 1982. Branch-and-bound search for experimental designs based on D-optimality and other criteria.
Technometrics 24 (1), 41–48.
Wu, C., 1978. Some algorithmic aspects of the theory of optimal designs. Annals of Statistics 6 (6),
1286–1301.
Wynn, H., 1970. The sequential generation of D-optimum experimental designs. Annals of Math. Stat. 41,
1655–1664.
Yang, M., 2010. On de la Garza phenomenon. Annals of Statistics 38 (4), 2499–2524.
Yang, M., Biedermann, S., Tang, E., 2013. On optimal designs for nonlinear models: a general and efficient
algorithm. Journal of the American Statistical Association 108 (504), 1411–1420.
Yu, Y., 2010. Strict monotonicity and convergence rate of Titterington’s algorithm for computing D-optimal
designs. Comput. Statist. Data Anal. 54, 1419–1425.
Yu, Y., 2011. D-optimal designs via a cocktail algorithm. Stat. Comput. 21, 475–481.
Zarrop, M., 1979. Optimal Experiment Design for Dynamic System Identification. Springer, Heidelberg.