Tel-Aviv University, Faculty of Exact Sciences, Department of Statistics and Operations Research

Designing Factorial Experiments with Binary Response
Hovav A. Dror & David M. Steinberg
July 2006, International Conference on DOE – Nankai University

Overview
– Introduction – designs for GLMs
– Local D-optimal designs
– Robust designs
– Sequential designs
– Conclusions
Technical reports and MATLAB macros are available at www.math.tau.ac.il/~dms/GLM_Design

D-optimal GLM Designs
– The theory is like that for the linear model, but with a crucial difference: Fisher's information matrix changes from $F^T F$ to $F^T W F$, with $W = \mathrm{diag}(w_i)$, $w_i = (d\mu_i / d\eta_i)^2 / V(\mu_i)$ and $\eta_i = f(x_i)^T \beta$.
– D-optimality: maximize $\det(F^T W F)$ (local D-optimal designs and local D-efficiency).

Introduction – Visualization
[Figure-only slide]

Introduction – Main Objectives
– Construct an algorithm to find local D-optimal designs.
– Generalize from locally optimal designs to robust designs, which take account of the uncertainty in the model parameters.
– Provide further robustness – to different link functions, linear predictors, etc.
– Sequential design – use the data to estimate the model and improve the design as the experiment runs.

Local D-optimal Designs – Algorithm
– Mimics algorithms for linear models; the main element is a row-exchange procedure.
– Rows are added or deleted, weighting the regression functions in accord with the mean value.
– Timing: 1 second for a 16-point Poisson regression with 5 variables plus interactions (accuracy to 2 decimal places).

Overview – Robust Designs
– Clustering: motivating example
– Clustering vs. Bayesian designs
– Clustering vs. compromise designs
– Linear predictor and link function robustness
– Ink production example

Clustering – Motivating Example
– Proximity of 25 local D-optimal designs for a logistic model with uncertainty about the intercept:
  $\eta = \beta_0 + \beta_x x + \beta_y y + \beta_{xy} xy$, with $\beta_x = \beta_y = 2$, $\beta_{xy} = 0.2$ and $\beta_0 \sim U(0, 2)$.
[Figure: support points of the 25 local designs over the design region $x, y \in [-1, 1]$]

Clustering vs. Bayesian Designs (1)
– Chaloner & Larntz (1989) design criterion: maximize the mean, over a prior distribution, of the log determinant of the information matrix.
– Model: $p(x; \mu, \beta) = 1 / (1 + \exp(-\beta(x - \mu)))$, with $\mu \sim U(-1, 1)$, $\beta \sim U(6, 8)$ and $x \in [-1, 1]$.
– Their optimal Bayesian design uses 7 support points, with a reported criterion value of -4.5783.

Clustering vs. Bayesian Designs (2)
– K-means clustering over 100 local designs; the local designs' coefficients are taken from a low-discrepancy sequence (Niederreiter's).
– Evaluated over 10,000 coefficient vectors.
– Both designs (almost) meet the sufficient requirements for a proof of optimality.
– A small illustrative sketch of the clustering procedure follows.
[Figure: average log determinant of the information matrix versus number of support points (0–20), with the Chaloner & Larntz (1989) reported value marked]
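The sketch below (Python; the talk's own MATLAB macros are at the URL on the overview slide and are not reproduced here) illustrates the clustering idea on the Chaloner & Larntz model. Because the locally D-optimal design for a two-parameter logistic model has two support points, each local design is found by brute force over pairs of grid candidates, and the pooled support points are then clustered with K-means. Plain uniform draws stand in for the Niederreiter sequence, and replicating support points in proportion to cluster size is one possible choice rather than the authors' stated rule.

```python
# Minimal sketch: clustering local D-optimal designs for the logistic model
#   p(x; mu, beta) = 1 / (1 + exp(-beta * (x - mu))),  x in [-1, 1],
#   mu ~ U(-1, 1), beta ~ U(6, 8).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 1.0, 41)             # candidate design points

def info_matrix(x, mu, beta, w=None):
    """F'WF for the two-parameter logistic model, optional replication weights w."""
    x = np.asarray(x, dtype=float)
    p = 1.0 / (1.0 + np.exp(-beta * (x - mu)))
    wt = p * (1.0 - p)                        # GLM weight (dmu/deta)^2 / V(mu)
    if w is not None:
        wt = wt * w
    F = np.column_stack([-beta * np.ones_like(x), x - mu])   # d(eta)/d(mu, beta)
    return (F * wt[:, None]).T @ F

def local_d_optimal(mu, beta):
    """Best two-point design on the grid for a single parameter vector."""
    best, best_det = None, -np.inf
    for i, x1 in enumerate(grid):
        for x2 in grid[i + 1:]:
            d = np.linalg.det(info_matrix([x1, x2], mu, beta))
            if d > best_det:
                best, best_det = (x1, x2), d
    return best

# 1) Pool the support points of 100 local designs (uniform draws stand in for
#    the Niederreiter low-discrepancy sequence used in the talk).
thetas = np.column_stack([rng.uniform(-1, 1, 100), rng.uniform(6, 8, 100)])
pooled = np.array([x for mu, beta in thetas for x in local_d_optimal(mu, beta)])

# 2) K-means: cluster centres become support points; cluster sizes are used
#    here as replication weights (one possible choice).
k = 7
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pooled.reshape(-1, 1))
support = km.cluster_centers_.ravel()
weights = np.bincount(km.labels_, minlength=k) / len(pooled)

# 3) Evaluate the average log determinant over freshly sampled parameter
#    vectors (per-run scale, so not directly the value quoted on the slide).
evals = np.column_stack([rng.uniform(-1, 1, 2000), rng.uniform(6, 8, 2000)])
crit = np.mean([np.log(np.linalg.det(info_matrix(support, mu, beta, weights)))
                for mu, beta in evals])
print("support points:", np.round(np.sort(support), 3))
print("average log det of F'WF:", round(float(crit), 3))
```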
Clustering vs. Bayesian Designs (3)
– We expect the Bayesian design to be generally better.
– But if clustering does not fall far behind, it offers:
  – simplicity of construction;
  – considerably smaller computational requirements;
  – an almost trivial extension to multivariate problems.

Clustering vs. Multivariate Compromise Designs (1)
– Woods, Lewis, Eccleston and Russell (Technometrics, May 2006):
  – a method for finding exact designs for experiments with several explanatory variables;
  – simulated annealing is used to find a design under the same criterion as Chaloner & Larntz;
  – they note that evaluating the integral is too computationally intensive to incorporate within a search algorithm, and therefore average over a partial set of parameter vectors.

Clustering vs. Multivariate Compromise Designs (2)
– Crystallography experiment – 4 variables (rate of agitation during mixing, volume of composition, temperature and evaporation rate) that affect the probability that a new product is formed.
– First-order logistic model (no interactions); 16 (/48) observations.
– Parameter space (chosen to demonstrate the algorithm's superiority):

  Parameter   β0        β1        β2        β3        β4
  Range       [-3, 3]   [4, 10]   [5, 11]   [-6, 0]   [-2.5, 3.5]

– Performance is evaluated by the median and minimum local D-efficiencies relative to 10,000 random parameter vectors.

Clustering vs. Multivariate Compromise Designs (3)

  Design                     Median efficiency   Minimum efficiency
  Standard 2^4 factorial     0.07                0.003
  Woods' compromise design   0.41                0.12

Clustering vs. Multivariate Compromise Designs (4)
– Clustering procedure (1):
  – create local designs for 100 parameter vectors (Niederreiter sequence) – about 30 seconds;
  – K-means clustering (K = 16) of the resulting 1,600 design points – about 0.25 seconds.
– Comparison (ranges over clustering repetitions in brackets):

  Design                     Median efficiency    Minimum efficiency   Minutes
  Standard 2^4 factorial     0.07                 0.003                –
  Woods' compromise design   0.41                 0.12                 7
  Clustering (1)             0.40 [0.38, 0.42]    0.09 [0.06, 0.12]    1

Clustering vs. Multivariate Compromise Designs (5)
– Clustering procedure (2): over N clustering repetitions, keep the clustering with the highest average log determinant of the information matrix. Added to the comparison:

  Design           Median efficiency     Minimum efficiency   Minutes
  Clustering (2)   0.42 [0.416, 0.430]   0.096 [0.06, 0.13]   1

Clustering vs. Multivariate Compromise Designs (6)
– An advantageous byproduct of clustering: the procedure is fast (about 20 seconds), so the effect of the number of support points can be examined.
[Figure: approximate median and minimum efficiency versus number of support points (0–100)]

Clustering vs. Multivariate Compromise Designs (7)
– Crystallography experiment – summary:

  Design                     Median efficiency      Minimum efficiency     Minutes
  Standard 2^4 factorial     0.07                   0.003                  –
  Woods' compromise design   0.41                   0.12                   7
  Clustering (1)             0.40 [0.38, 0.42]      0.09 [0.06, 0.12]      1
  Clustering (2)             0.42 [0.416, 0.430]    0.096 [0.06, 0.13]     1
  Clustering (3)             0.423 [0.415, 0.432]   0.177 [0.141, 0.213]   2.5
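To complement the comparison above, here is a sketch of how a single local D-optimal design, and hence a local D-efficiency, can be computed for the first-order logistic model of the crystallography example. The Fedorov-style point exchange, the 3^4 candidate grid and the particular parameter vector (drawn from the ranges quoted above) are illustrative assumptions; this is not the authors' MATLAB implementation.

```python
# Minimal sketch: a local D-optimal 16-run design for a first-order logistic
# model in 4 variables, found by a simple point-exchange search over a
# candidate grid, and the local D-efficiency of the standard 2^4 factorial.
import itertools
import numpy as np

def model_matrix(X):
    """Intercept plus four main effects."""
    return np.column_stack([np.ones(len(X)), X])

def log_det_info(X, beta):
    """log |F'WF| for the logistic GLM; -inf if the matrix is singular."""
    F = model_matrix(X)
    p = 1.0 / (1.0 + np.exp(-F @ beta))
    w = p * (1.0 - p)
    sign, logdet = np.linalg.slogdet((F * w[:, None]).T @ F)
    return logdet if sign > 0 else -np.inf

def exchange_design(beta, candidates, n=16, n_starts=5, seed=0):
    """Point exchange: repeatedly swap a design row for a better candidate."""
    rng = np.random.default_rng(seed)
    best_X, best_val = None, -np.inf
    for _ in range(n_starts):
        X = candidates[rng.choice(len(candidates), n, replace=False)]
        improved = True
        while improved:
            improved = False
            for i in range(n):
                cur = log_det_info(X, beta)
                trial = X.copy()
                for c in candidates:
                    trial[i] = c
                    val = log_det_info(trial, beta)
                    if val > cur + 1e-10:
                        X, cur, improved = trial.copy(), val, True
        if cur > best_val:
            best_X, best_val = X.copy(), cur
    return best_X, best_val

# Candidate grid {-1, 0, 1}^4 and one illustrative parameter vector taken from
# the quoted ranges (beta0 in [-3, 3], beta1 in [4, 10], ...).
candidates = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=4)))
beta = np.array([0.0, 7.0, 8.0, -3.0, 0.5])

X_opt, ld_opt = exchange_design(beta, candidates)
X_fact = np.array(list(itertools.product([-1.0, 1.0], repeat=4)))   # 2^4 factorial
ld_fact = log_det_info(X_fact, beta)

n_par = 5   # number of model parameters
print("local D-efficiency of the 2^4 factorial:",
      round(float(np.exp((ld_fact - ld_opt) / n_par)), 3))
```

Repeating this for many parameter vectors drawn from the quoted ranges, and summarizing the resulting efficiencies by their median and minimum, gives the kind of comparison shown in the tables above.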
Robustness for Linear Predictors and Link Functions (again from Woods et al.)
– 2 variables.
– 2 linear predictors: with / without an interaction term.
– 2 link functions: probit / complementary log-log (CLL).
– Coefficient values are given (known).
– Design efficiencies under each of the four models:

  Model                     Clustering   Woods   d3     d4     d5     d6
  Probit, no interaction    0.75         0.77    1.00   0.34   0.99   0.30
  Probit, interaction       0.81         0.80    0.00   1.00   0.00   0.97
  CLL, no interaction       0.64         0.64    0.99   0.24   1.00   0.11
  CLL, interaction          0.85         0.86    0.00   0.97   0.00   1.00

Ink Production Example (1)
– A Poisson model with 5 variables.
– Normally distributed uncertainty about the coefficient values.
– Uncertainty about interaction effects.
– The centroid design is reasonably efficient.

Ink Production Example (2)
– 5 tubes, each with a different chemical; for each tube a concentration is chosen (fixed volume).
– Ink quality classification: the number of imperfect marks on a standard printed test page.
– Low concentrations – low quality, unusable; high concentrations – expensive.
– Model building is based on expert opinion.

Ink Production Example (3)
– Model built from expert opinion:

  Term        First-order model      With interactions
              Estimate    S.E.       Estimate    S.E.
  Intercept   -1.52       0.21       -2.35       0.69
  x1          -4.30       0.20       -5.53       0.94
  x2          -1.79       0.16       -2.99       0.82
  x3          -3.39       0.24       -3.95       0.59
  x4          -0.28       0.32       -0.86       0.54
  x5           0.23       0.30        0.41       0.36
  x1x2          –           –        -2.07       1.32
  x1x3          –           –        -1.13       0.98

Ink Production Example (4)
– Full factorial design, D-efficiency:
[Figure: histogram of D-efficiencies (0–1) over sampled coefficient vectors]

Ink Production Example (5)
– Cluster design, D-efficiency:
[Figure: histogram of D-efficiencies (0–1) over sampled coefficient vectors]

Ink Production Example (6)
– Centroid design, D-efficiency:
[Figure: histogram of D-efficiencies (0–1) over sampled coefficient vectors]
– Centroid design and cluster design compared:
[Figure: the two D-efficiency histograms side by side]

Ink Production Example (7)
– The required sample size is proportional to 1 / D_EFF.
[Figure: equivalent sample size (0–100) versus efficiency (0–1)]

Sequential Designs
– A good design requires knowledge of the coefficients.
– Use the data gathered so far to assess the model and the coefficients, and augment the design accordingly.
– A Bayesian framework is natural.

Sequential Designs
– Current methods: Bruceton (Dixon and Mood, 1948); Langlie (1965); Neyer (1994); Wang, Smith & Ye (2006).

Sequential Designs
– Current methods are limited to one-factor, fully sequential experiments.
– Our method can be applied with many factors and in both fully sequential and group-sequential settings (a small illustrative sketch follows).
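The sketch below shows one way such a scheme can look for a two-variable logistic model: coefficient uncertainty is carried by a weighted parameter sample, each new run is chosen to maximize the expected log determinant of the augmented information matrix, and the weights are updated by the likelihood of the observed response. The particle-style update, the normal prior, the candidate grid, the first-stage 2^2 factorial and the "true" coefficients are all illustrative assumptions, not the authors' procedure.

```python
# Minimal sketch of a Bayesian sequential design loop for a 2-variable logistic
# model: pick each new run to maximize the expected log|F'WF| under the current
# (particle-approximated) posterior, observe, and reweight by the likelihood.
import itertools
import numpy as np

rng = np.random.default_rng(1)

def model_matrix(X):
    return np.column_stack([np.ones(len(X)), X])        # intercept + 2 main effects

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_det_info(X, beta):
    F = model_matrix(X)
    p = logistic(F @ beta)
    w = p * (1.0 - p)
    sign, logdet = np.linalg.slogdet((F * w[:, None]).T @ F)
    return logdet if sign > 0 else -np.inf

true_beta = np.array([0.5, 2.0, -1.5])                  # unknown in a real experiment
particles = rng.normal([0.0, 1.5, -1.0], 1.0, size=(500, 3))     # prior sample
candidates = np.array(list(itertools.product(np.linspace(-1, 1, 5), repeat=2)))

# First stage: a 2^2 factorial (the talk would start from a robust first-stage
# design); observe the responses and reweight the particles by their likelihood.
X_run = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
y_run = rng.binomial(1, logistic(model_matrix(X_run) @ true_beta)).astype(float)
P = logistic(model_matrix(X_run) @ particles.T)                  # (runs, particles)
weights = np.prod(np.where(y_run[:, None] == 1, P, 1.0 - P), axis=0)
weights /= weights.sum()

# Fully sequential augmentation, one run at a time (a group-sequential version
# would add several runs per stage in the same way).
for stage in range(12):
    scores = [np.sum(weights * np.array([log_det_info(np.vstack([X_run, [c]]), b)
                                         for b in particles]))
              for c in candidates]
    x_next = candidates[int(np.argmax(scores))]
    y_next = rng.binomial(1, logistic(model_matrix(x_next[None, :]) @ true_beta)[0])
    X_run = np.vstack([X_run, x_next[None, :]])
    y_run = np.append(y_run, y_next)
    p_part = logistic(model_matrix(x_next[None, :]) @ particles.T).ravel()
    weights *= p_part if y_next == 1 else (1.0 - p_part)
    weights /= weights.sum()

print("runs chosen after the first stage:")
print(X_run[4:])
```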
Efficiency Comparison
– One-stage robust design (48 points): median efficiency 0.67, 5% quantile 0.30.
– Sequential design: median efficiency 0.98, 5% quantile 0.85.
[Figure: histograms of efficiencies (0–1) for the one-stage robust design and the sequential design]

Summary & Conclusions
– Local D-optimal designs for GLMs can be found easily.
– Clustering a database of local D-optimal designs creates a robust design.
– Clustering is robust to many types of uncertainty: parameter space, linear predictors, link functions, …
– The procedure is simple and needs minimal computational resources.
– Its speed allows exploration of various designs and investigation of different numbers of support points.
– It outperforms more sophisticated and complex design-optimization methods.
– Efficient sequential designs are obtained by combining these ideas with a Bayesian updating approach.