Robust Experimental Design for Multivariate Generalized Linear

Tel-Aviv University
Faculty of Exact Sciences
Department of Statistics and Operations Research
Designing Factorial Experiments with Binary Response
Hovav A. Dror & David M. Steinberg
July 2006
International Conference on DOE – Nankai University
Overview
• Introduction – Designs for GLMs
• Local D-optimal Designs
• Robust Designs
• Sequential Designs
• Conclusions

Technical reports and MATLAB macros available at
www.math.tau.ac.il/~dms/GLM_Design
Robust Experimental Design for multivariate GLM
D-optimal GLM designs

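Theory parallels that for the linear model, but with one crucial difference.
Fisher's information matrix changes:
$$F^T F \;\longrightarrow\; F^T W F, \qquad W = V^{-1} \left( \frac{d\mu}{d\eta} \right)^2, \qquad \eta = F\beta$$
The weights $W$ are a function of $\eta$, and therefore of the unknown coefficients $\beta$.
D-optimality: maximize $|F^T W F|$
(hence Local D-optimal designs, and Local D-Efficiency)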
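The weighted information matrix on the D-optimal GLM designs slide can be illustrated with a small sketch. It assumes canonical links (logit for binary data, log for Poisson counts), for which the weight reduces to mu(1 - mu) and mu respectively. The function names and the two-point design are hypothetical and are not taken from the authors' MATLAB macros.

```python
import math

def glm_weight(eta, family):
    """GLM weight w = (dmu/deta)^2 / V(mu), assuming the canonical link."""
    if family == "logistic":
        mu = 1.0 / (1.0 + math.exp(-eta))
        return mu * (1.0 - mu)
    if family == "poisson":
        return math.exp(eta)
    raise ValueError("unknown family")

def information_matrix(F, beta, family):
    """Return F^T W F, with W diagonal and evaluated at eta = F beta."""
    p = len(beta)
    M = [[0.0] * p for _ in range(p)]
    for row in F:
        eta = sum(f * b for f, b in zip(row, beta))
        w = glm_weight(eta, family)
        for i in range(p):
            for j in range(p):
                M[i][j] += w * row[i] * row[j]
    return M

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [r[:] for r in M]
    n, d = len(A), 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(A[r][k]))
        if abs(A[piv][k]) < 1e-14:
            return 0.0
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            d = -d
        d *= A[k][k]
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= factor * A[k][c]
    return d

# Hypothetical one-factor example: model rows f(x) = (1, x) at x = -1 and x = +1.
design = [[1.0, -1.0], [1.0, 1.0]]
beta = [0.0, 2.0]

# With beta = 0 the Poisson weights are all 1, so this is the linear-model |F^T F|.
d_linear = det(information_matrix(design, [0.0, 0.0], "poisson"))
# With a logistic model the same design is down-weighted by mu(1 - mu) at eta = +/-2.
d_logit = det(information_matrix(design, beta, "logistic"))
```

The same design therefore has a different D-criterion value for each coefficient vector, which is why an optimal design is only "local".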
Introduction – Visualization
Introduction – Main Objectives

• Construction of an algorithm to find Local D-optimal Designs
• Generalization: from locally optimal designs to robust designs (which take account of the uncertainty in the model parameters)
• Further robustness – for different link functions, linear predictors, etc.
• Sequential design – use data to estimate the model and improve the design as the experiment runs.
Local D-optimal designs – Algorithm
• Mimics algorithms for linear models.
• Main element – a row exchange procedure.
• Rows are added or deleted, weighting the regression functions in accord with the mean value.

Timing: 1 second for a 16-point Poisson regression with 5 variables + interactions (accuracy 2 decimal places)
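A minimal sketch of an exchange search for a local D-optimal design follows. It is not the authors' algorithm: it swaps one design row at a time for the best candidate row (a simple Fedorov-style exchange) rather than adding and deleting rows, and it uses a hypothetical one-factor logistic model with a 21-point candidate grid, assumed coefficients beta = (0, 3) and an arbitrary starting design.

```python
import math

def weight(eta):
    """Logistic GLM weight mu * (1 - mu)."""
    mu = 1.0 / (1.0 + math.exp(-eta))
    return mu * (1.0 - mu)

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [r[:] for r in M]
    n, d = len(A), 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(A[r][k]))
        if abs(A[piv][k]) < 1e-14:
            return 0.0
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            d = -d
        d *= A[k][k]
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= factor * A[k][c]
    return d

def info_det(rows, beta):
    """|F^T W F| for the given model rows, with weights evaluated at beta."""
    p = len(beta)
    M = [[0.0] * p for _ in range(p)]
    for f in rows:
        w = weight(sum(a * b for a, b in zip(f, beta)))
        for i in range(p):
            for j in range(p):
                M[i][j] += w * f[i] * f[j]
    return det(M)

def row_exchange(candidates, start, beta, max_passes=50):
    """Replace one row at a time by the candidate that most increases |F^T W F|."""
    design = [r[:] for r in start]
    best = info_det(design, beta)
    for _ in range(max_passes):
        improved = False
        for i in range(len(design)):
            for cand in candidates:
                trial = design[:i] + [cand] + design[i + 1:]
                value = info_det(trial, beta)
                if value > best * (1.0 + 1e-9):
                    design, best, improved = trial, value, True
        if not improved:
            break
    return design, best

# Hypothetical setup: rows f(x) = (1, x), x on a grid over [-1, 1], four runs.
candidates = [[1.0, i / 10] for i in range(-10, 11)]
start = [[1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [1.0, 1.0]]
beta = [0.0, 3.0]
design, best = row_exchange(candidates, start, beta)
```

A search of this kind only guarantees that no single swap improves the design, so in practice it would usually be restarted from several starting designs.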
Overview
• Introduction
• Local D-optimal Designs
• Robust Designs
  – Clustering: Motivating Example
  – Clustering vs. Bayesian Designs
  – Clustering vs. Compromise Designs
  – Linear Predictor and Link function Robustness
  – Ink Production Example
• Sequential Designs
• Conclusions
Clustering – Motivating Example
Proximity of 25 local D-optimal designs for a logistic model with intercept value uncertainty:
$$\eta = \beta_0 + \beta_x x + \beta_y y + \beta_{xy} xy, \qquad \beta_x = \beta_y = 2, \quad \beta_{xy} = 0.2, \quad \beta_0 \sim U[0, 2]$$
[Figure: the 25 local designs in the (x, y) plane; both axes run from -1 to 1]
CLUSTERING vs. BAYESIAN DESIGNS (1)

Chaloner & Larntz (1989)
Design Criterion: maximize the mean (over a prior distribution) of the log determinant of the information matrix
$$p(x; \mu, \beta) = \frac{1}{1 + \exp(-\beta (x - \mu))}, \qquad \mu \sim U[-1, 1], \quad \beta \sim U[6, 8], \quad x \in [-1, 1]$$
Their optimal Bayesian Design:
• Uses 7 support points
• Reported value of -4.5783 for the criterion
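The Bayesian criterion on this slide can be approximated by plain Monte Carlo, as in the sketch below. It assumes the information matrix is taken with respect to (mu, beta) and that the design is an approximate design with weights summing to one. The two designs compared are hypothetical and are not the Chaloner and Larntz design, so the sketch is not expected to reproduce the reported value.

```python
import math
import random

def log_det_info(xs, weights, mu, beta):
    """log|M| for p(x) = 1 / (1 + exp(-beta (x - mu))), parameters (mu, beta)."""
    m11 = m12 = m22 = 0.0
    for x, wt in zip(xs, weights):
        p = 1.0 / (1.0 + math.exp(-beta * (x - mu)))
        v = wt * p * (1.0 - p)
        g_mu, g_beta = -beta, x - mu  # derivatives of eta = beta (x - mu)
        m11 += v * g_mu * g_mu
        m12 += v * g_mu * g_beta
        m22 += v * g_beta * g_beta
    d = m11 * m22 - m12 * m12
    return math.log(d) if d > 0 else float("-inf")

def bayes_criterion(xs, weights, n_draws=2000, seed=0):
    """Average log determinant over draws mu ~ U[-1, 1], beta ~ U[6, 8]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        mu = rng.uniform(-1.0, 1.0)
        beta = rng.uniform(6.0, 8.0)
        total += log_det_info(xs, weights, mu, beta)
    return total / n_draws

# Hypothetical designs: seven equally spaced points versus the two end points.
seven = [-1.0 + i / 3.0 for i in range(7)]
spread = bayes_criterion(seven, [1.0 / 7.0] * 7)
two_point = bayes_criterion([-1.0, 1.0], [0.5, 0.5])
```

Spreading the support over the interval protects against the uncertainty in mu, which is why the seven-point design scores higher than the two end points.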
CLUSTERING vs. BAYESIAN DESIGNS (2)


K-means Clustering over 100 Local Designs
Local Designs' coefficients: Low-Discrepancy sequence (Niederreiter's)
Evaluated over 10,000 coefficient vectors
Both designs (almost) meet sufficient requirements for optimality proof
[Figure: Average Log Determinant of the Information Matrix (vertical axis, -5.4 to -4) against Number of Support Points (horizontal axis, 0 to 20), with the Chaloner and Larntz (1989) reported value marked]
CLUSTERING vs. BAYESIAN DESIGNS (3)


Expect Bayesian designs to be generally better.
But if Clustering does not fall far behind, it offers:
• Simplicity of creation
• Considerably lower computational needs
• Extension to multivariate problems – almost trivial
Clustering vs. Multivariate Compromise Designs (1)

Woods, Lewis, Eccleston and Russell
(Technometrics, May 2006):
– A method for finding exact designs for
experiments in which there are several
explanatory variables
– Use Simulated Annealing to find a design with
the same criterion as Chaloner & Larntz
– They note that evaluating the integral is too
computationally intensive for incorporation
within a search algorithm, and therefore
average over a partial set
Clustering vs. Multivariate Compromise Designs (2)
• Crystallography experiment
– 4 variables
  • (rate of agitation during mixing, volume of composition, temperature and evaporation rate)
– Affect the probability that a new product is formed
– First order logistic model (with no interactions)
– 16 (/48) observations
– Parameter space:

Parameter   Range
β0          [-3, 3]
β1          [4, 10]
β2          [5, 11]
β3          [-6, 0]
β4          [-2.5, 3.5]

Performance evaluated using median and minimum Local D-Efficiencies relative to 10,000 random parameter vectors
(demonstrating algorithm's superiority)
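For reference, the local D-efficiency used as the performance measure is commonly defined as below. The slides do not spell out the normalisation, so the exponent $1/p$ and the symbols $\xi$ and $\xi^*_\beta$ are assumptions based on standard usage.

$$\mathrm{Eff}_D(\xi; \beta) = \left( \frac{\left| F_\xi^T W(\beta) F_\xi \right|}{\left| F_{\xi^*_\beta}^T W(\beta) F_{\xi^*_\beta} \right|} \right)^{1/p}$$

Here $\xi$ is the design being assessed, $\xi^*_\beta$ is the local D-optimal design for the coefficient vector $\beta$ with the same number of runs, and $p$ is the number of model coefficients ($p = 5$ for a first-order model in four variables). The median and minimum are then taken over the 10,000 sampled vectors $\beta$.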
Clustering vs. Multivariate Compromise Designs (3)
Design                     Median Efficiency   Minimum Efficiency
Standard 2^4 factorial     0.07                0.003
Woods' Compromise design   0.41                0.12
Clustering vs. Multivariate Compromise Designs (4)

Clustering procedure (1):
– First, created Local Designs for 100 parameter vectors (Niederreiter sequence)
– 1,600 points K-means clustering (K=16)
Timing annotations on the slide: 30 seconds, 0.25 seconds

Design                   Median Efficiency   Minimum Efficiency   Minutes
Standard 2^4 factorial   0.07                0.003
Woods' Compromise        0.41                0.12                 7
Clustering (1)           0.40 ]0.38,0.42[    0.09 ]0.06,0.12[     1
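The clustering step can be sketched with a bare-bones K-means (Lloyd's algorithm), taking the cluster centres of the pooled local design points as the robust design. The three tiny "local designs" below are made-up data, and the farthest-first initialisation is an assumption chosen to keep the example deterministic. The slides' own run pools 1,600 points and uses K = 16.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            groups[nearest].append(p)
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

# Hypothetical local designs: each has four runs in two factors.
local_designs = [
    [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)],
    [(-0.9, -1.0), (-1.0, 0.9), (0.9, -1.0), (1.0, 0.9)],
    [(-0.8, -1.0), (-1.0, 0.8), (0.8, -1.0), (1.0, 0.8)],
]
pooled = [pt for d in local_designs for pt in d]

# The K cluster centres form the candidate robust design.
design = kmeans(pooled, k=4)
```

Each centre averages the points that the different local designs place in the same region, so the result is a compromise across the sampled coefficient vectors.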
Clustering vs. Multivariate Compromise Designs (5)

Clustering procedure (2):
– Choose the clustering with the highest average log determinant of the information matrix, over N clustering repetitions:

Design                   Median Efficiency     Minimum Efficiency   Minutes
Standard 2^4 factorial   0.07                  0.003
Woods' Compromise        0.41                  0.12                 7
Clustering (1)           0.40                  0.09                 1
Clustering (2)           0.42 [0.416,0.430]    0.096 [0.06,0.13]    1
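One way to write the selection rule of clustering procedure (2) is given below. The index $r$ over clustering repetitions and the sample $\beta_1, \dots, \beta_M$ of coefficient vectors are notation introduced here for illustration; the slides do not state how many vectors are averaged over at this step.

$$r^* = \arg\max_{r = 1, \dots, N} \; \frac{1}{M} \sum_{m=1}^{M} \log \left| F_r^T W(\beta_m) F_r \right|$$

Here $F_r$ is the model matrix of the design produced by the $r$-th clustering run, and the design from run $r^*$ is the one retained.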
Clustering vs. Multivariate Compromise Designs (6)
Fast procedure (slide annotation: 20 seconds)
Examine effect of # of Support points
[Figure: Approximate Efficiency (vertical axis, 0 to 0.45) against Number of Support Points (horizontal axis, 0 to 100); two series: Median Efficiency and Minimum Efficiency]
Clustering vs. Multivariate Compromise Designs (7)
Crystallography experiment - summary
Design                   Median Efficiency      Minimum Efficiency     Minutes
Standard 2^4 factorial   0.07                   0.003
Woods' Compromise        0.41                   0.12                   7
Clustering (1)           0.40                   0.09                   1
Clustering (2)           0.42                   0.096                  1
Clustering (3)           0.423 [0.415, 0.432]   0.177 [0.141, 0.213]   2.5
Clustering vs. Multivariate Compromise Designs (6)
Advantageous byproduct of clustering:
[Figure: Approximate Efficiency (vertical axis, 0 to 0.45) against Number of Support Points (horizontal axis, 0 to 100); two series: Median Efficiency and Minimum Efficiency; slide annotation: 20 seconds]
Robustness for Linear Predictors
and Link functions





• (again from Woods et al.)
• 2 variables
• 2 linear predictors: with / without interactions
• 2 link functions: Probit / CLL
• Given (known) coefficients values

Efficiency of each design (columns) under each model (rows):

Link     Linear predictor   Clustering   Woods   d3     d4     d5     d6
Probit   No interaction     0.75         0.77    1.00   0.34   0.99   0.30
Probit   Interaction        0.81         0.80    0.00   1.00   0.00   0.97
CLL      No interaction     0.64         0.64    0.99   0.24   1.00   0.11
CLL      Interaction        0.85         0.86    0.00   0.97   0.00   1.00
Ink Production Example (1)





• A Poisson Model
• 5 Variables
• Normally distributed uncertainty about the coefficient values
• Uncertainty about interaction effects
• Centroid design reasonably efficient
Ink Production Example (2)



• 5 Tubes, each with a different chemical
• Each tube: Chosen concentration (fixed volume)
• Ink quality classification: # of imperfect marks (on a standard printed test page)

• Low concentrations – low quality, unusable
• High concentrations – expensive
• Model building based on experts' opinions
Ink Production Example (3)

Model building based on experts' opinions

            First-order          With Interactions
Term        Estimate   S.E.      Estimate   S.E.
Intercept   -1.52      0.21      -2.35      0.69
x1          -4.30      0.20      -5.53      0.94
x2          -1.79      0.16      -2.99      0.82
x3          -3.39      0.24      -3.95      0.59
x4          -0.28      0.32      -0.86      0.54
x5           0.23      0.30       0.41      0.36
x1x2                             -2.07      1.32
x1x3                             -1.13      0.98
Ink Production Example (4)

Full Factorial D-Efficiency:
[Figure: distribution of the full factorial design's D-efficiency; horizontal axis: Efficiency (0 to 1); vertical axis: 0 to 2500]
Ink Production Example (5)

Cluster Design D-Efficiency:
[Figure: distribution of the cluster design's D-efficiency; horizontal axis: Efficiency (0 to 1); vertical axis: 0 to 4000]
Ink Production Example (6)

Centroid Design D-Efficiency:
[Figure: distribution of the centroid design's D-efficiency; horizontal axis: Efficiency (0 to 1); vertical axis: 0 to 4000]
Ink Production Example (6)
[Figure: two panels side by side, Centroid Design D-Efficiency and Cluster Design D-Efficiency; horizontal axes: Efficiency (0 to 1); vertical axes: 0 to 4000]
Ink Production Example (7)
$$\text{Required Sample Size} \propto \frac{1}{D_{\mathrm{EFF}}}$$
[Figure: Equivalent Sample Size (vertical axis, 0 to 100) against Efficiency (horizontal axis, 0 to 1)]
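As a worked illustration of this proportionality, assume the efficiency is normalised per parameter (the usual $1/p$ power), so that it scales directly with the number of runs. The symbols $N_{\text{required}}$ and $N_{\text{optimal}}$ are introduced here for illustration only.

$$N_{\text{required}} \approx \frac{N_{\text{optimal}}}{D_{\mathrm{EFF}}}, \qquad D_{\mathrm{EFF}} = 0.5 \;\Rightarrow\; N_{\text{required}} \approx 2\, N_{\text{optimal}}, \qquad D_{\mathrm{EFF}} = 0.25 \;\Rightarrow\; N_{\text{required}} \approx 4\, N_{\text{optimal}}$$

Under that assumption, a design with low efficiency for the true coefficients needs correspondingly more runs to match the information provided by the local D-optimal design.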
Sequential Designs
• Good design requires knowledge of the coefficients.
• Use the data thus far to assess the model and the coefficients.
• Augment the design accordingly.
• Bayesian framework is natural.
Sequential Designs
Current methods:
• Bruceton (Dixon and Mood 1948)
• Langlie (1965)
• Neyer (1994)
• Wang, Smith & Ye (2006)
Sequential Designs
Current methods are limited to:
• One-factor experiments.
• Fully sequential experiments.
Our method can be applied with many factors and in both fully sequential and group-sequential settings.
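The general idea of a sequential design with Bayesian updating can be sketched as follows. This is a generic illustration and not the authors' procedure: it uses a hypothetical one-factor logistic model for brevity (the method in the slides handles many factors), a small discrete prior over coefficient vectors, simulated responses, and a fully sequential rule that adds one run at a time.

```python
import math
import random

def prob(x, b):
    """Logistic response probability for coefficients b = (b0, b1)."""
    return 1.0 / (1.0 + math.exp(-(b[0] + b[1] * x)))

def log_det(xs, b):
    """log|F^T W F| for model rows (1, x) with weights evaluated at b."""
    m11 = m12 = m22 = 0.0
    for x in xs:
        p = prob(x, b)
        w = p * (1.0 - p)
        m11 += w
        m12 += w * x
        m22 += w * x * x
    d = m11 * m22 - m12 * m12
    return math.log(d) if d > 0 else -1e9

def run_sequential(true_b, particles, candidates, n_added, seed=0):
    """Add runs one at a time, updating a discrete posterior after each response."""
    rng = random.Random(seed)
    weights = [1.0 / len(particles)] * len(particles)
    xs, ys = [], []

    def observe(x):
        # Simulate a binary response and apply Bayes' rule to the weights.
        y = 1 if rng.random() < prob(x, true_b) else 0
        xs.append(x)
        ys.append(y)
        for i, b in enumerate(particles):
            weights[i] *= prob(x, b) if y else 1.0 - prob(x, b)
        total = sum(weights)
        for i in range(len(weights)):
            weights[i] /= total

    for x in (-1.0, 1.0):  # two initial runs so the information matrix is nonsingular
        observe(x)
    for _ in range(n_added):
        # Next point: maximise the posterior-averaged log determinant.
        nxt = max(
            candidates,
            key=lambda c: sum(w * log_det(xs + [c], b) for w, b in zip(weights, particles)),
        )
        observe(nxt)
    return xs, ys, weights

# Hypothetical prior support, candidate grid and "true" coefficients.
particles = [(b0, b1) for b0 in (-1.0, 0.0, 1.0) for b1 in (1.0, 2.0, 3.0, 4.0)]
candidates = [i / 10 for i in range(-10, 11)]
xs, ys, weights = run_sequential((0.0, 3.0), particles, candidates, n_added=20)
```

A group-sequential variant would choose several points before observing any responses; the weight update itself would be unchanged.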
Efficiency Comparison
48 points

Design             Median   5% quantile
One-stage ROBUST   0.67     0.30
SEQUENTIAL         0.98     0.85

[Figure: distributions of Efficiency (horizontal axis, 0 to 1) for the one-stage robust design and the sequential design]
Summary & Conclusions



• Local D-optimal designs for GLMs can be easily found
• Clustering a database of local D-optimal designs creates a robust design
• Clustering is robust to many types of uncertainty:
  – parameter space, linear predictors, link functions, …
• Simple procedure, minimal computational resources
• Speed allows exploration of various designs and investigation of different numbers of support points
• Matched or outperformed more sophisticated and complex design optimization methods in the examples shown
• Efficient sequential designs by combining the ideas with a Bayesian updating approach.