Dickinson, A. W. (1957). "Some sequential design procedures." (Office of Naval Research.)

SOME SEQUENTIAL DESIGN PROCEDURES
by
Allan W. Dickinson
University of North Carolina
This research was supported by the Office of Naval Research under the contract for research in probability and statistics at Chapel Hill. Reproduction in whole or in part is permitted for any purpose of the United States Government.
Institute of Statistics
Mimeograph Series No. 169
April, 1957
ACKNOWLEDGEMENT
I would like to express my sincere appreciation to
Dr. G. E. Nicholson, Jr., for reading promptly and carefully many rough drafts of this thesis and for a number
of helpful suggestions.
I am also indebted to Dr. R.
C. Bose for his careful reading of the manuscript, and
to Dr. A. H. E. Grandage for his aid and instruction
concerning the use of the IBM 650 computer.
I acknowledge gratefully financial assistance from
the Institute of Statistics and from the Office of Naval
Research without whose help this work would not have
been possible.
Finally, I give thanks to Mrs. L. O. Kattsoff for typing the thesis, and to Miss Martha Jordan and Mrs. J. K. Spencer for otherwise aiding me in its preparation.
TABLE OF CONTENTS

CHAPTER                                                            PAGE

ACKNOWLEDGEMENT . . . . . . . . . . . . . . . . . . . . . . . . .    ii

INTRODUCTION  . . . . . . . . . . . . . . . . . . . . . . . . . .    iv

I.  A STUDY OF SOME PROCEDURES FOR LOCATING THE OPTIMUM POINT . .     1

II. APPLICATION OF THE KIEFER-WOLFOWITZ STOCHASTIC APPROXIMATION
    PROCEDURE TO A SECOND DEGREE REGRESSION FUNCTION  . . . . . .    38

APPENDIX  . . . . . . . . . . . . . . . . . . . . . . . . . . . .    62

BIBLIOGRAPHY  . . . . . . . . . . . . . . . . . . . . . . . . . .    68
INTRODUCTION
In this thesis, I will treat some of the sequential experimentation procedures that have arisen in recent years. Special emphasis will be placed upon optimization techniques.
Chapter I contains essentially descriptive material. Here the Hotelling technique [17], the Kiefer-Wolfowitz procedure [18], the Friedman-Savage method [14], Blum's procedure [5], and the Box-Wilson process [10] are described and compared on certain very general points set forth at the beginning of the chapter.
Chapter II contains a development of some of the operating characteristics of one of the above techniques, namely, the Kiefer-Wolfowitz stochastic approximation procedure for estimating the maximum of a regression function in the case where the true regression is quadratic. In particular, a method is suggested for determining from a preliminary experiment values for some of the arbitrary constants which appear in the recursive relation defining the Kiefer-Wolfowitz procedure.
This method is studied with the aid of the Monte Carlo approach in the appendix.
1. The figures in square brackets refer to the bibliography.
CHAPTER I
A STUDY OF SOME PROCEDURES FOR LOCATING
THE OPTIMUM POINT1

1.1. Terminology.
Many of the standard terms and expressions of statistics will
be used in a special sense in this paper.
These definitions are given
below.
(1.1.1) Independent Variable.
An independent variable, denoted by x_i, is any variable that is under the control of the experimenter.
(1.1.2)
Response.
A response is defined to be any variable of interest that is
not an independent variable. We shall denote a response by the symbol
y.
(1.1.3) Response Surface.
A representation of the response in terms of the independent
variables is said to be a response surface.
(1.1.4) Single Independent Variable Case.
The single independent variable situation is defined to be that where there is a single independent variable x, and we are attempting to express the response as a function of x, say y = f(x).
1. Sponsored by the Office of Naval Research under the contract for research in probability and statistics at Chapel Hill. Reproduction in whole or in part is permitted for any purpose of the United States Government.
(1.1.5) Multiple Independent Variable Case.
The multiple independent variable case is defined to be that where there are k independent variables, x_1, x_2, ..., x_k, and we are attempting to express the response as a function of (x_1, x_2, ..., x_k), say y = f(x_1, x_2, ..., x_k).
(1.1.6) Observation.
An observation is defined to be a determination of the k+1 values (x_1, x_2, ..., x_k; y).
(1.1.7) Experiment.
The determination of one or more observations according to
some pre-arranged plan is said to be an experiment.
(1.1.8) Experimental Design.
A pre-arranged plan describing the values of the independent variables for each observation is defined to be an experimental design.
(1.1.9) Sequential Procedure.
A method of experimentation by means of which the values of the independent variables for the n-th observation are determined, by a procedure laid down in advance of the experiment, from the values of the response on one or more of the preceding n-1 observations is called a sequential procedure.
(1.1.10) Row Vector.
A row vector consisting of the p elements (x_1, x_2, ..., x_p) shall be denoted by the symbol x.
(1.1.11) Optimum Point.
A combination of the values of the independent variables which maximizes the response is called an optimum point and is designated by θ in the single independent variable situation and by the vector θ in the multiple independent variable situation. We shall consider only the problem of maximizing the response, since the minimization problem can be handled by maximizing the negative of the response.
(1.1.12) NID Notation.
A random variable w is said to be NID(u,v) if, in every sample, the w_i are identically and independently distributed with mean u and variance v.
1.2. The Purpose of Experimentation.
Although all experimentation is undertaken with the ultimate goal of increasing man's knowledge about the world in which he lives, the short-term goal of an experiment may be modified by the funds or equipment available, the extent of the experimenter's knowledge concerning the phenomena under investigation, or by the phase of the investigation. The primary objectives may be summarized as:
(1.2.1) Deciding on important independent variables;
(1.2.2) Mapping the response surface over the area of interest;
(1.2.3) Optimizing a response or a combination of responses;
(1.2.4) Obtaining an insight into the basic mechanism of the system.
The objective of any particular experiment may be one or several of these goals, but usually the experimenter will progress in his knowledge and understanding of the system by taking steps in the order listed above. In addition to (1.2.1) through (1.2.4) it should be the objective of every experiment to:
(1.2.5) Point out the direction in which further research should proceed.
No scientific investigation is an end in itself. It is the essence of all progress that we continue to accumulate information and to modify and, if necessary, discard our ideas and our theories as new evidence comes to light.

    The laws of science constitute a growing, ever-changing organism, but the changes are generally evolutionary in character and represent modifications, improvements, and extensions more often than drastic repudiation of the old.2
1.3. Deficiencies of Factorial Design.
There has been a growing realization of late that there exist many special experimental problems for which factorial designs and their modifications (the Latin squares, the balanced incomplete block designs, the confounded designs, and the fractionally replicated designs) may not provide a realistic or an economically feasible solution. The experimenter, for instance, may be interested in picking out the most important variables from a large group of potentially important variables; or he may be working with a number of populations with unknown means and wish to pick a group of three or four of these populations, say those with the largest means, for further study.
Alternatively, he may wish to optimize some response as a function of the independent variables under his control. We shall be concerned in this thesis primarily with the problem of optimizing a response. The complete factorial designs have been found to be unsatisfactory for this purpose mainly because:

2. E. Bright Wilson, Jr., An Introduction to Scientific Research. New York: McGraw-Hill, 1952, p. 30.
(1.3.1) The conclusions obtained from the analysis apply only to a discrete set of factor combinations, while in reality many of the variables are continuous.
(1.3.2) When the number of factors is large, the number of observations required even for a 2^n experiment becomes prohibitively large. Fractionally replicated designs are some help here but not the whole answer.
(1.3.3) With many factors much of the information obtained from the experiment concerns the high-order interactions. Quite often this is information of little value to the experimenter because:
    i) In some fields high-order interactions have been shown to be usually small when compared with the main effects and first order interactions.
    ii) Even when one of the high-order interactions is large, its physical interpretation may be difficult.
(1.3.4) These designs devote observations to regions which may be of little interest to the experimenter since, in the light of the results, these regions may be far from a maximum.
We are thus led to seek designs which:
(1.3.5) Take into consideration the continuous nature of some of the variables.
(1.3.6) Provide degrees of freedom for estimating the main effects and the low-order interactions but do not devote observations to determining the high-order interactions.
(1.3.7) Concentrate most of the observations in the neighborhood of the maximum.
As a result of these considerations, several new experimental design and sequential experimentation techniques have come into use in recent years. It is the purpose of this chapter to list some of these techniques, to provide a list of questions which it is hoped will be of assistance in comparing these techniques, and to describe briefly some of the optimum-locating techniques and indicate the general area where each might prove useful.
1.4. The New Techniques.
In 1941, Hotelling [17] treated the problem of maximizing the response due to a single independent variable. Friedman and Savage [14] proposed, in 1947, a sequential one-factor-at-a-time approach to the problem of approximating the maximum or mapping the area in the neighborhood of the maximum of a response based on several independent variables. Sequential plans for the single independent variable case have been devised by Robbins and Monro [22], 1951, and by Kiefer and Wolfowitz [18], 1952. Robbins and Monro considered the problem of estimating x such that f(x) = a, where a is pre-determined, and Kiefer and Wolfowitz obtained a sequential procedure for estimating x such that f(x) is a maximum. Blum [5], 1954, extended the Kiefer-Wolfowitz procedure to the multi-dimensional case. Dixon and Mood [13], 1948, presented a sequential procedure for sensitivity testing.
Box and Wilson [10], 1951, introduced a method of sequential progression by the method of "steepest ascent", and have contributed a whole new class of experimental designs ranging from the first order designs to the central and non-central composite designs and the rotatable designs.
Bechhofer [3], 1954, and Bechhofer, Dunnett and Sobel [4], 1954, have given procedures for choosing from among a group of populations a sub-group with the following property: a probability p is chosen by the experimenter; then the probability that the chosen sub-group contains the population with the greatest (smallest) mean is greater than or equal to p.
Satterthwaite [23], 1956, has introduced the random balance designs. These designs are constructed to screen a large group of variables and pick out those which have a noticeable effect on the response.
1.5. Some Questions.
We list here some questions which we hope will prove helpful in determining the usefulness of and the limitations on the new techniques.
(1.5.1) Number of Observations.
What is the number of observations needed for a given result? Is this number fixed or variable? If it is variable, do we know its probability distribution or can we place bounds upon it?
(1.5.2) Reliability of Predictors.
In those cases where we obtain regression coefficients, what are the standard deviations and biases of these regression coefficients? What is var(ŷ)? What is var(θ̂)?
(1.5.3) Is the Procedure Robust?
How sensitive is the procedure to the existence of interactions? What is the effect in the sequential procedures of changing the starting point, or in the "steepest ascent" procedure of changing the initial region? Is the procedure at the mercy of time trends? Can we adapt the design to some form of blocking when serious non-homogeneity exists? What is the effect of using the wrong model? How about biases, non-normality, etc.? Will a large experimental error seriously damage the efficiency of the procedure?
(1.5.4) Can We Test for Goodness of Fit and Change the Model if Necessary?
Can we make at least a rough test for goodness of fit? If such a test indicates another model, can we make this change? Can we add a variable during the course of the experimentation? If the first experiments indicate that we should move to a new experimental region, can we salvage the information obtained from these first experiments, i.e., can we "nest" the design?
(1.5.5) Opportunity to Use Past Knowledge.
Does the technique offer the investigator opportunity to make use of his knowledge and skill?
(1.5.6) Interpretation of Results.
Can the results of the analysis be interpreted in such a way that the experimenter with little or no statistical training will clearly understand them?
(1.5.7) Computational Considerations.
What about the computational procedures involved? Will they become too cumbersome? Can they be programmed on an electronic computer? What is known about the distributions of the estimates obtained from the sequential plans? What is the effect of optional stopping in the sequential plans?
We shall discuss below some of the new designs which may be used to estimate the location of the maximum response. We shall take up separately the single independent variable designs and the multiple independent variable designs.
Single Independent Variable Designs

We shall discuss here the Hotelling procedure and the Kiefer-Wolfowitz procedure.

1.6. The Hotelling Procedure.

The Hotelling procedure is described in [17] and is summarized briefly below.
The experiment is divided into two parts: an initial experiment to determine the approximate location of the maximum response, and a second experiment to determine a more precise estimate of the location of the maximum response and to map the response in the vicinity of the maximum.
For the initial experiment assume a fifth order regression model3, i.e.,

(1.6.1)    y = β_0 + β_1 x + β_2 x^2 + β_3 x^3 + β_4 x^4 + β_5 x^5 + e,

where e is NID(0, σ^2). Take at least 6 equally spaced levels of x: x_(1), x_(2), ..., x_(p), p ≥ 6, and make n determinations of the response (n ≥ 2) at each level of x. The total number of observations in this initial stage is then np ≥ 12. Fit a fifth degree curve to these observations with the use of orthogonal polynomials, thus obtaining the prediction equation

(1.6.2)    ŷ = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + b_5 x^5,

where b_i is the least squares estimate of β_i, for i = 0, 1, ..., 5.
Determine the value of x which maximizes (1.6.2). Call this value t. This can be determined graphically or by differentiating (1.6.2) with respect to x, solving the resulting quartic equation, and taking the root which maximizes ŷ. Now conduct a second experiment in

3. We assume a fifth order regression model in order to obtain estimates of β_2, β_3, β_4, and β_5 which are used in obtaining the design points for the second stage of the experiment.
the neighborhood of t, taking n_1 observations.4 The experimental designs for this phase of the investigation are given in [17]. Designs specifying three levels of x are given for both equal and unequal frequencies in the cells corresponding to the levels of x. Since we will limit ourselves to a small neighborhood of t, a quadratic model should adequately describe the response in this region.
Thus we assume the model

(1.6.3)    y = β'_0 + β'_1(x-t) + β'_2(x-t)^2 + e',

where e' is NID(0, σ'^2), and obtain the least squares estimates b'_i of the β'_i. The equation of prediction becomes

(1.6.4)    ŷ = b'_0 + b'_1(x-t) + b'_2(x-t)^2.

Differentiating (1.6.4) with respect to (x-t) and setting equal to zero, we obtain

(1.6.5)    b'_1 + 2b'_2(x-t) = 0,

and thus,

(1.6.6)    θ̂_1 = -b'_1 / (2b'_2),

so that θ̂ = θ̂_1 + t is the best estimate of the abscissa at the maximum response.
Discussion.
Note that the total number of observations is fixed, being np + n_1.4 The estimates of the regression coefficients, both in the first and second phases of experimentation, are minimum variance estimates, and the variance matrix for these regression coefficients is obtained in the standard manner. Furthermore, the designs in phase two are constructed so that the cubic bias in the regression coefficients is zero and the quartic bias is minimized. Also, the expectation of the square of the total error in θ̂, up to and including terms of fourth order in the bias, is minimized by a proper balance between the sampling variance and the quartic bias.

4. n_1 is for all practical purposes arbitrary.
The variance of ŷ for any x, say x', is estimated by

(1.6.7)    Var(ŷ | x') = Σ_{i=0}^{2} Σ_{j=0}^{2} c_ij (x')^{i+j} s^2,

where s^2 is the estimate of σ'^2, and where c_ij is the appropriate term from the inverse normal matrix C, with rows and columns indexed by x^0, x^1, x^2:

                  x^0   x^1   x^2
           x^0  | c_00  c_01  c_02 |
    C  =   x^1  | c_10  c_11  c_12 |
           x^2  | c_20  c_21  c_22 |

We can obtain confidence limits for θ by a device due to Fieller; see [15]. Let (x-t) = z. Then from (1.6.5),

    ŷ' = b'_1 + 2b'_2 z = δ, say.
The variance of this quantity can be estimated by

(1.6.8)    v(δ) = s^2 (c_11 + 4z c_12 + 4z^2 c_22).

Thus,

(1.6.9)    δ^2 ≤ t^2_{α, n_1-3} v(δ),

where t_{α, n_1-3} is the α percentage point for the Student's t distribution with n_1 - 3 degrees of freedom. Substituting from (1.6.8),

(1.6.10)    (b'_1 + 2b'_2 z)^2 ≤ t^2 s^2 (c_11 + 4z c_12 + 4z^2 c_22),

that is,

(1.6.11)    z^2 (4b'_2^2 - 4t^2 s^2 c_22) + z (4b'_1 b'_2 - 4t^2 s^2 c_12) + (b'_1^2 - t^2 s^2 c_11) ≤ 0,

where t^2 = t^2_{α, n_1-3}. Thus, by solving (1.6.11) for z, we can obtain confidence limits for z with confidence coefficient (1-α).
Since x = z + t, we can then get confidence limits for x, i.e., for θ. These confidence limits will be correct, however, only if the true regression is quadratic. If the true regression is not quadratic, the confidence band for θ will be wider than that indicated by the computed limits due to the presence of bias effects.
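Numerically, obtaining the Fieller limits amounts to finding the roots of the quadratic inequality (1.6.11) in z. The following sketch illustrates this; the fitted values b'_1 and b'_2, the variance estimate s^2, the c_ij, and the degrees of freedom are all invented for the example and do not come from the thesis.

```python
import numpy as np

# Invented second-stage results: quadratic fit y_hat = b0 + b1*z + b2*z**2
# in z = x - t, with n1 = 24 observations (so n1 - 3 = 21 df).
b1, b2 = 0.04, -0.90
s2 = 0.0025                       # estimate of sigma'^2
c11, c12, c22 = 0.08, 0.0, 0.60   # entries of the inverse normal matrix C
t2 = 2.080 ** 2                   # squared two-sided 5% Student t point, 21 df

# Coefficients of the quadratic in z from (1.6.11).
A = 4 * b2 ** 2 - 4 * t2 * s2 * c22
B = 4 * b1 * b2 - 4 * t2 * s2 * c12
C = b1 ** 2 - t2 * s2 * c11
z_lo, z_hi = sorted(np.roots([A, B, C]).real)
# Confidence limits for theta follow from x = z + t; here we report z only.
print(z_lo, z_hi)
```

The interval (z_lo, z_hi) contains the point estimate -b'_1/(2b'_2) whenever the discriminant is positive and the leading coefficient A is positive; otherwise the Fieller set can be the whole line or the complement of an interval.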
Another advantage of the Hotelling technique is that the order of the experimental runs for either or both phases of the investigation can be randomized so that a time trend will not seriously bias the estimate of θ.
The Hotelling approach probably does not offer as much opportunity for the employment of the knowledge and skill of the experimenter as do some of the experimental techniques discussed below. About the only contribution that the experimenter can make towards designing the experiment is in deciding what range to cover and how to space the x's. A check on the adequacy of the model may be obtained by running 2 or 3 confirmatory observations at points slightly displaced from the estimated maximum and comparing the observed with the predicted response for these points. This idea of taking confirmatory observations is a very important one and can be carried out with all of the experimental techniques discussed in this section.
The main results of interest to the experimenter will be the estimate of θ, the regression equation for predicting y in the neighborhood of θ, and the residual variance, with perhaps an indication of how the response behaves in a region distant from θ. All of these concepts are capable of simple interpretation by a man with the bare minimum of statistical training.
Finally, most of the computation involved is the standard least squares type of computation, which should present no difficulty to the statistician. Orthogonal polynomials can be employed in the initial phase of the experiment, and, if a lot of this work is going to be done, it might prove useful to compute a set of orthogonal polynomials for the designs involving unequally spaced x-intervals in phase two.
In summary, the Hotelling procedure is a fixed sample size procedure well-suited to estimating the maximum of a response based on a single independent variable and to mapping the function in the neighborhood of the maximum response. It is also a procedure for which the mathematical properties of the estimates obtained are well-known and in many senses optimum.5
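In modern terms the two-stage scheme might be sketched as follows. This is an illustration only, not the designs of [17]: the response function, noise level, spacings, and sample sizes are invented, ordinary polynomial least squares stands in for the orthogonal-polynomial computation, and the optimal second-stage three-level designs are replaced by a naive equal-frequency one.

```python
import numpy as np

rng = np.random.default_rng(1)

def observe(x, sigma=0.05):
    # Hypothetical true response with its maximum at x = 1.3 (invented).
    return np.exp(-(x - 1.3) ** 2) + rng.normal(0.0, sigma, size=np.shape(x))

# Phase 1: n = 2 responses at p = 6 equally spaced levels (np = 12);
# fit a fifth degree curve to obtain the prediction equation.
levels = np.linspace(0.0, 3.0, 6)
x1 = np.repeat(levels, 2)
quintic = np.polynomial.Polynomial.fit(x1, observe(x1), deg=5)

# t = value of x maximizing the fitted quintic: a root of the quartic derivative.
crit = quintic.deriv().roots()
crit = crit[np.isreal(crit)].real
crit = crit[(crit >= levels[0]) & (crit <= levels[-1])]
t = crit[np.argmax(quintic(crit))] if crit.size else levels[np.argmax(quintic(levels))]

# Phase 2: n1 = 24 observations at three levels around t; fit a quadratic
# in (x - t), then theta_hat = t - b1/(2*b2) as in (1.6.6).
x2 = t + np.tile([-0.3, 0.0, 0.3], 8)
b0, b1, b2 = np.polynomial.polynomial.polyfit(x2 - t, observe(x2), deg=2)
theta_hat = t - b1 / (2.0 * b2)
print(round(float(theta_hat), 2))   # should land near the true optimum 1.3
```

Note that `theta_hat` corrects the phase-1 value t by the fitted quadratic, so modest error in t is tolerable as long as the quadratic model holds in the second-stage neighborhood.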
1.7. The Kiefer-Wolfowitz Procedure.
The Kiefer-Wolfowitz procedure [18] for the stochastic estimation of the maximum of a regression function may be described as follows:

Select two sequences of positive terms, {a_n} and {c_n} (n = 1, 2, ...). See (2.1) for the rules governing the choice of these sequences. Let z_1 be the best advance estimate of θ. Then at each stage of the sequential procedure obtain the experimental points (z_n - c_n, y_{2n-1}) and (z_n + c_n, y_{2n}), and let z_{n+1} be determined by the rule

(1.7.1)    z_{n+1} = z_n + (a_n / c_n)(y_{2n} - y_{2n-1}).

At the n-th stage of the sequential procedure let z_{n+1} be the best estimate of θ. Kiefer and Wolfowitz have shown [18] that, under very general conditions, the most important being that the function possess a unique maximum in the region of interest, z_n will converge in probability to θ.

5. Many of the optimum properties of this technique are based upon large sample approximations. The estimates obtained may be far from optimum when the sample is small.
In Chapter II of this thesis a number of the properties of the sequential procedure for the case when the regression function is a quadratic function are investigated. I shall state here what appear to be the very general properties of the procedure, and shall refer to Chapter II for a more detailed description of these properties for the special case considered there.
Discussion.
N, the number of observations, is unknown, of course, in advance of the experiment, and, in fact, there is no well-known procedure for terminating the sequence of experimentation (see (2.43) for suggestions), let alone any knowledge of the probability distribution of N. This fact seems to be the main deterrent to widespread use of the procedure. If at least bounds for N can be determined for a given situation, then the sequential procedure gives promise of being more efficient than a fixed sample size experiment to accomplish the same objective. Perhaps the Monte Carlo method might be used to generate stochastic sequences and thus study various stopping procedures and their operating characteristics.
In addition to merely estimating θ, we can, of course, map the response in the neighborhood of the maximum by assuming, say, a quadratic model and computing the least squares regression coefficients. This procedure can also give us another estimate of θ since, as in (1.6.6),

(1.7.2)    θ̂ = -b_1 / (2b_2).
Exact statements about the biases and variances of the regression coefficients so obtained cannot be made in advance of the experiment,
but their properties should in general be good since it is the tendency
of the stochastic procedure to concentrate observations in the neighborhood of the maximum response.
Thus the function should be well-mapped
in this neighborhood.
Some possible disadvantages of the procedure are that the rate of convergence of z_n to θ may depend heavily upon the choice of z_1 and that the process may be quite vulnerable to time-trends. An advantage is that we do not have to worry about experimenting in the wrong region. The sequential procedure will, in due time, usually6 lead us to the right region.
The chance for the utilization of the skill of the investigator is somewhat limited to the choice of z_1 and the choice of the sequences {a_n} and {c_n}, which are essentially scale factors. The proper choice of these two sequences is essential in order that z_n converge to θ rapidly. See (2.48) and (2.49).
In the case where the regression coefficients are estimated, it is easy to check upon the adequacy of the model. Since x will, in general, not be limited to three levels as in phase two of the Hotelling scheme, we can compute the additional sums of squares due to cubic, quartic, ... effects and test these for significance in the standard manner. Alternatively, confirmatory observations may be taken.

6. The procedure may not lead to the region of the maximum response if there is more than one maximum. In such a case we may be led to a local maximum smaller than the absolute maximum. This is true of nearly all of the sequential optimization techniques.
The computational work and its interpretation are, in the case where we are merely estimating θ, extremely simple, making the procedure admirably suited to wide-scale application by non-statisticians.

In summary, the Kiefer-Wolfowitz procedure shows promise for the purpose of locating the position of the maximum response and for describing the response in the vicinity of the maximum. Its principal advantages are: simplicity of use, a well-defined procedure for progressing to the region of the maximum response, concentration of the observations in the neighborhood of the maximum response, the presence of a check for goodness of fit, and possibly an increase in efficiency over the fixed sample size experiments. The principal disadvantages of the Kiefer-Wolfowitz procedure are: the absence of a general stopping rule, lack of knowledge concerning the probability distribution of N, dependence of the rate of convergence upon the choice of the starting point, and sensitivity of the procedure to time trends and other forms of non-homogeneity.
Multiple Independent Variable Situation

We shall now consider three techniques that may be used in the multiple independent variable situation. These techniques are: the Friedman-Savage procedure, the Blum procedure, and the Box-Wilson procedure. Many of the ideas and arguments used here are identical with those used in the single independent variable situation. The complexity of the problem is, of course, increased over that of the single independent variable case due to the multiplicity of factors and due to two added functions of the experiment:

i) to separate out important independent variables;
ii) to detect interactions.

1.8. The Friedman-Savage Procedure.

The procedure of Friedman and Savage is treated in [14]. A description of the Friedman-Savage procedure is as follows:
(1.8.1) Choose an initial factor combination. Presumably this will be the best advance estimate of the optimal factor combination.
(1.8.2) Order the independent variables. One suggestion is to rank them in their estimated order of importance.
(1.8.3) Holding all other independent variables constant, vary the levels of the first until an approximate optimum is achieved. Either the Hotelling or the Kiefer-Wolfowitz ideas might be employed here.
(1.8.4) Using the optimal level of the first independent variable and holding all other independent variables but the second at their initial levels, find an approximate optimal level for the second. Continue in this manner until all independent variables have been investigated. Call this process a round of experimentation.
(1.8.5) Continue experimenting in this fashion until the changes in response become small by some standard. Use for the initial factor combination of each round the estimate of those levels of the independent variables maximizing the response for the previous round.
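The one-factor-at-a-time rounds above can be sketched concretely. Everything specific here is invented for the illustration: the two-variable response (optimum at (1.0, -0.5)), the grids of candidate levels, the noise level, and the use of three fixed rounds in place of a "changes become small" stopping standard; within each factor the approximate optimum is taken simply as the best observed level.

```python
import random

random.seed(3)

def response(x, sigma=0.05):
    # Hypothetical response with no interactions; optimum at (1.0, -0.5).
    return -(x[0] - 1.0) ** 2 - (x[1] + 0.5) ** 2 + random.gauss(0.0, sigma)

levels = [[-2.0 + 0.5 * i for i in range(9)],   # candidate levels for x1
          [-2.0 + 0.5 * i for i in range(9)]]   # candidate levels for x2

x = [0.0, 0.0]           # (1.8.1) initial factor combination: best advance guess
for round_no in range(3):              # (1.8.5) fixed number of rounds here
    for j in range(len(x)):            # (1.8.2) variables in their given order
        best_level, best_y = x[j], response(x)
        for level in levels[j]:        # (1.8.3)-(1.8.4) vary one factor only
            trial = x[:]
            trial[j] = level
            y = response(trial)
            if y > best_y:
                best_level, best_y = level, y
        x[j] = best_level              # hold this factor at its approximate optimum

print(x)
```

With no interactions and small error, the first round already lands near the optimum, in line with the remark below on the procedure's efficiency in that case.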
Discussion.
As in the case of the Kiefer-Wolfowitz procedure, the sample size N is a random variable about which little is known. Friedman and Savage suggest that the procedure may be more efficient than a factorial design when there are no interactions and when experimental error is small. Since the tendency of the procedure is to concentrate observations in the neighborhood of the maximum response, it may be possible to map the response surface fairly well in the vicinity of the maximum, but because of the possibly restricted approach to the maximum, this mapping may not be so complete as that of some of the other sequential procedures.

It appears that the procedure may be very sensitive to the choice of the starting factor combination and that it may be quite inefficient in the presence of large interactions or large experimental error. It is also evident that we are quite at the mercy of time trends. On the positive side, as in the case of many sequential procedures, the model that we choose is a very general one, and the procedure may be relatively insensitive to non-normality, etc.
This technique more than any of the others resembles the pre-analysis-of-variance technique of classical experimentation. It thus should be easy for the experimenter to understand and offers considerable scope for the use of his knowledge and ingenuity. Another advantage is that we can easily add new variables which come to mind or drop those of the original variables which in the initial stages of experimentation prove to be unimportant. The addition of a new variable in many of the designs is accomplished by repeating the whole experiment with the new factor at a different level, possibly doubling the cost of the experiment. With the Friedman-Savage procedure we may add a new variable, even in the final stages of the experiment, simply by holding all the old variables at their estimated optimum levels and varying the new factor over its set of levels, i.e., perhaps by as few as five or six additional observations.
The computational burden is not heavy.
It is suggested that at
least in the early rounds the optimization procedure may be done graphically.
In summary, the advantages of the Friedman-Savage procedure are: it is easily understood by the experimenter since it probably conforms to his ideas on experimentation; we can add or drop variables during the course of the experiment with ease; the model is very general; the procedure concentrates observations in the neighborhood of the maximum response; and the procedure is possibly more efficient7 than the corresponding factorial experiment when interactions and experimental error are small. The disadvantages of the technique are that not much is known about the probability distribution of N; the procedure may be quite sensitive to the choice of the starting factor combination; may be inefficient in the presence of large interactions or large experimental error; and is quite vulnerable to time-trends.

7. When there are no interactions and no experimental error, the Friedman-Savage process will converge to the maximum response in one round of experimentation.
1.9. The Blum Procedure.

The Kiefer-Wolfowitz procedure for stochastic estimation of the maximum of a regression function with one independent variable has been extended to the multiple independent variable case by Blum [5], 1954. His procedure is described below.
Choose
(1
't an ~
and
."
~
on
!
as two infinite sequenoes of posi-
tive numbers satisfying (?1) and let
I
0
I
(1.9.1)
U be the matrix:
\
i
\
I
I
i
\
I
I
I
I
o
oomposed of the row vectors
J
Uk
I
/
23
~l
=
(~,O,O. ,.,0)
~2
=
(0, u 2 ' 0, .,., 0)
,..
(0, O. O.••• , uk)
,
•
~k
Let !i be the row vector (Xli' x 2i '
be the measured response for the factor combination !i'
Let Y*
-n
be the vector
») ,
\
(1.9,2)
(' (Yx +c u.. -Yx ), (Yx +c u -Yx )" "'(Yx +c u -Yx
-n n-.L -n
-n n-2 -n
-n n-k -n
and define a recursive sequence of random vectors by choosing
~l
ar-
bitrarily and letting
a
x
= X + ~ y*
-n+l -n
c -n
n
•
Very general conditions for the convergence of x_n to θ are given in [5].
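The recursion (1.9.1)-(1.9.3) can be sketched in a short modern routine. The quadratic test response, the noise level, and the constants in the sequences a_n = R/n and c_n = S/n^k below are hypothetical illustrations, not values from the text, and the u_i of (1.9.1) are taken as unit steps.

```python
import random

def blum_search(response, x1, a_seq, c_seq, n_rounds, rng):
    """Blum's multivariate Kiefer-Wolfowitz search, recursion (1.9.3).

    Each round measures the response at x_n and at x_n + c_n*u_i for every
    coordinate direction (here the u_i of (1.9.1) are taken as unit steps),
    then moves x_n along the estimated slope.
    """
    k = len(x1)
    x = list(x1)
    for n in range(1, n_rounds + 1):
        a, c = a_seq(n), c_seq(n)
        y0 = response(x) + rng.gauss(0.0, 0.1)       # Y at x_n
        for i in range(k):
            xi = list(x)
            xi[i] += c                               # x_n + c_n * u_i
            yi = response(xi) + rng.gauss(0.0, 0.1)
            x[i] += (a / c) * (yi - y0)              # i-th component of (1.9.3)
    return x

# Hypothetical quadratic response with maximum at theta = (2, -1).
theta = (2.0, -1.0)
response = lambda x: -((x[0] - theta[0]) ** 2 + (x[1] - theta[1]) ** 2)

rng = random.Random(1957)
est = blum_search(response, [0.0, 0.0],
                  a_seq=lambda n: 0.4 / n,            # a_n = R/n
                  c_seq=lambda n: 1.0 / n ** 0.25,    # c_n = S/n^k, 0 < k < 1/2
                  n_rounds=400, rng=rng)
```

With moderate error the iterate drifts toward the optimum, though the one-sided differences of (1.9.2) carry a small bias of order c_n.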
The operating characteristics of the Blum procedure appear to be very similar to those given for the Kiefer-Wolfowitz procedure and will not be restated here. Note that the term Y_{x_n} appears in each component of Y*_n, which means that if the experimental error is large, the procedure may be very slow to converge. Perhaps the designs of Plackett and Burman [19] could be used to advantage here. They present a sequence of designs for use when the sample size is of the type 4n, where
the estimate of each main effect is orthogonal to the estimate of all other main effects. Thus, for example, if there are eight independent variables, the procedure outlined in (1.9.3) requires nine observations per round, and the component of the change in factor x_p (p = 1, 2, ..., 8) is determined from two observations. Using one of the Plackett and Burman designs, we could take, say, 12 observations, and the component of the change in factor x_p would be determined with the use of all 12 observations. Thus, by increasing the number of experiments by one third, we have increased the efficiency of the experiment by a factor of six; and furthermore, with three degrees of freedom left for error, we can obtain some idea concerning the significance of the various factors. Perhaps the confounding of interactions with main effects which occurs in the Plackett and Burman designs will prevent their use in this context, but I can see no reason why this confounding should be any more serious than that which would occur with the use of (1.9.3).
As an extension of the results given in (2.48), it appears that the proper choice of the sequences {a_n} and {c_n} and the matrix U is essential for reasonably rapid convergence of x_n to θ. Possibly a preliminary experiment to yield rough estimates of the second degree regression coefficients would be helpful in the choice of these factors. Again, this is a field where the Monte Carlo method might be used to study the properties of a procedure based upon such a choice.
(1.10) The Box-Wilson Procedure.
The Box-Wilson technique is described in [10] and is outlined in steps (1.10.4) to (1.10.9) below.
We will first introduce a coding of the independent variables which will simplify the presentation of the designs. Suppose that there are k independent variables x_1, x_2, ..., x_k. Further, suppose that N observations are to be taken. Then we introduce a new set of variables z_1, z_2, ..., z_k, obtained by a linear coding (1.10.1) of the x_i, and such that the z_i obey the relations

(1.10.2)   Σ_{u=1}^N z_iu = 0   and   Σ_{u=1}^N z_iu² = N ,   where i = 1, 2, ..., k.
Box and Wilson then proceed as follows:
(1.10.4) An initial experiment is conducted somewhere in the region of interest (presumably in the neighborhood of the best estimate of the optimum factor combination). The initial experiment is often a factorial or fractional factorial experiment with all independent variables at two levels. The design is called a first-order design, and the model adopted is:

   y_u = β_0 + Σ_{i=1}^k β_i z_iu + Σ_{i<j} β_ij z_iu z_ju + e_u ,

where the e_u are NID(0, σ²).
(1.10.5) Compute the regression coefficients corresponding to the main effects and two-factor interactions in (1.10.4). If the main effects are large compared with the two-factor interactions, conduct a second experiment (again using a first-order design) in which the factor levels are changed in the direction of largest response in the initial experiments. This is called the path of steepest ascent. Continue this process until the first-order effects are small in comparison with the two-factor interactions. Box calls such a region a near-stationary region.
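A minimal sketch of the first-order fit and the resulting steepest-ascent step may make these two steps concrete; the 2² design and the noiseless response y = 10 + 3z_1 + 4z_2 are hypothetical.

```python
def first_order_fit(Z, y):
    """Least squares fit of y = b0 + sum b_i z_i for a two-level design.

    For a 2^k factorial in coded units the columns of Z are orthogonal,
    so each b_i is a simple contrast: b_i = sum(z_iu * y_u) / N.
    """
    N, k = len(Z), len(Z[0])
    b0 = sum(y) / N
    b = [sum(Z[u][i] * y[u] for u in range(N)) / N for i in range(k)]
    return b0, b

def steepest_ascent_step(b, step):
    """Move `step` coded units along the direction of largest response increase."""
    norm = sum(bi * bi for bi in b) ** 0.5
    return [step * bi / norm for bi in b]

# Hypothetical 2^2 factorial in coded units, noiseless responses from
# y = 10 + 3 z1 + 4 z2 (no interaction).
Z = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
y = [10 + 3 * z1 + 4 * z2 for z1, z2 in Z]
b0, b = first_order_fit(Z, y)
move = steepest_ascent_step(b, step=1.0)
```

The fitted contrasts recover the main effects exactly here, and the step is simply the normalized gradient of the first-order surface.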
(1.10.6) Take additional observations to determine the quadratic components of the main effects. A whole series of designs has been developed for this purpose, and these designs are discussed briefly in (1.10.10). The model becomes:

   y_u = β_0 + Σ_{i=1}^k β_i z_iu + Σ_{i=1}^k β_ii z_iu² + Σ_{i<j} β_ij z_iu z_ju + e_u .

(1.10.7) Compute the least squares estimates of the β_p. Denote these by b_p. The equation of prediction becomes

   ŷ_u = b_0 + Σ_{i=1}^k b_i z_iu + Σ_{i=1}^k b_ii z_iu² + Σ_{i<j} b_ij z_iu z_ju .

By differentiating this expression with respect to z_1, z_2, ..., z_k, setting equal to zero, and then solving the resulting system of k simultaneous equations in k unknowns, the estimates z_1⁰, z_2⁰, ..., z_k⁰ which maximize the response can be found. The estimated maximum response is ŷ⁰, the value of (1.10.7) at this point.
(1.10.8) The expression (1.10.7) is reduced to the canonical form

   ŷ = ŷ⁰ + Σ_{i=1}^k B_i w_i² ,

where each w_i is a linear function in the z_i.
(1.10.9) Confirmatory observations are taken.
(1.10.10) The Designs.
The Box-Wilson designs which we shall consider are of two types: the first-order designs, which are used to locate the direction of steepest ascent, and the second-order designs, which are used to estimate the regression coefficients in a near-stationary region. (Gardiner, Grandage, and Hader present an excellent discussion of third-order designs in [16].)
Let us here introduce another notational device. Let

   z_0 = z_0 ,  z_1 = z_1 ,  ... ,  z_k = z_k ,
   z_1 z_2 = z_{k+1} ,  z_1 z_3 = z_{k+2} ,  ... ,  z_{k-1} z_k = z_f ,
   z_1² = z_{f+1} ,  z_2² = z_{f+2} ,  ... ,  z_k² = z_m ,

where

   f = k + k(k−1)/2 ,

and

   m = k + k(k−1)/2 + k = k(k+3)/2 .
Optimum first-order designs, in the sense that the variances of the regression coefficients become smallest, are achieved by the use of factorial designs with all factors at two levels. These designs also have the property that

   Σ_{u=1}^N z_iu z_ju = 0 ,   where i = 1, 2, ..., f ;  j = 1, 2, ..., f ;  i ≠ j .
When the number of independent variables is five or more, fractional factorials may be used in this phase of the investigation.
One criterion for the choice of good second-order designs is the criterion of rotatability. With a rotatable design, the response surface is determined with equal precision at points equi-distant from the experimental center. With two independent variables, the regular figures, i.e., points equally spaced on a circle, with the addition of one or more center points, will give rotatable designs providing that the number of points on the circle is five or more.
Box and Wilson propose the central composite designs, which may be constructed as follows. Suppose that the last first-order experiment performed was a 1/2^p replicate of a 2^k factorial. Designate the k factor levels at the center of the (1/2^p) × 2^k design as (0, 0, ..., 0), and add 2k + 1 additional observations to the set of 2^(k−p) observations by taking the factor combinations:

   (0, 0, ..., 0), (±a, 0, ..., 0), (0, ±a, ..., 0), ..., (0, 0, ..., ±a) .

This portion of the design is known as the cross-polytrope. By a proper adjustment between a and the radius of the hypercube, the central composite designs become rotatable designs.
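The construction just described can be sketched as follows. The choice a = 2^(k/4) for a full 2^k cube is the usual rotatability condition; the function name and the worked case k = 2 are my own illustration.

```python
from itertools import product

def central_composite(k, alpha):
    """Points of a central composite design in coded units: a full 2^k
    factorial cube plus the 2k+1 cross-polytrope points (one center
    point and axial points at distance alpha along each axis)."""
    cube = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    star = [[0.0] * k]                       # center point
    for i in range(k):
        for s in (alpha, -alpha):
            pt = [0.0] * k
            pt[i] = s
            star.append(pt)
    return cube + star

# For a full 2^k cube, rotatability requires alpha = 2**(k/4);
# with k = 2 this is sqrt(2).
design = central_composite(2, alpha=2 ** 0.5)
```

The k = 2 design has 4 + 5 = 9 points: the square, its center, and four axial points on a circle of radius √2.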
While rotatability is a desired property in these designs, it is not the sole criterion for the choice of good second-order designs. It is possible to choose a, for instance, so that cov(b_i, b_j) = 0, where i = 1, 2, ..., m; j = 1, 2, ..., m; i ≠ j. These are called orthogonal designs. Box and Wilson [6] also give non-central composite designs.
These are designs which concentrate the observations taken during the second-order phase of the experiment in a particular corner of the last first-order design.
Discussion.
The Box-Wilson procedure is, like the other procedures we have been discussing, a sequential procedure. It is not a formal sequential procedure in the sense of the Kiefer-Wolfowitz procedure because, although the direction of steepest ascent is determined by earlier observations, the magnitude of the step taken in this direction must be chosen by the experimenter. The number of observations, N, is clearly a random variable, but one on which, it seems, plausibility, if not probability, bounds can be placed.
Let us consider the central composite designs. If the first-order experiments cover a sufficiently wide experimental region and successive first-order designs are spaced far enough apart, it seems to this writer unlikely that more than three of these will be necessary to reach a near-stationary region. Thus, if each of the first-order designs is a 1/2^p replicate of a 2^k factorial, and if the second-order design is a central composite design, lower and upper bounds for N, call them N_ℓ and N_h, can be computed and are shown in Table 1.10 for k = 2, 3, ..., 6. The number of experiments in a 3^k factorial design is shown for comparative purposes.⁸
The central composite designs may be used, of course, as designs in their own right -- without the preliminary use of first-order designs. If there are several responses, for example, we cannot carry out the first-order experimentation sequence on all of them simultaneously. In that case, our primary concern may be in mapping each response over the area of interest. The sample size for this type of experiment is that given in the N_ℓ column of Table 1.10.
The variances and covariances of the estimated regression coefficients can, of course, be obtained from:

8. The abbreviation c.p. is used to indicate cross-polytrope.
(1.10.11)   Var(b_i) = c_ii s² ,   for i = 0, 1, 2, ..., m ,

and

   Cov(b_i, b_j) = c_ij s² ,   where i = 0, 1, 2, ..., m ;  j = 0, 1, 2, ..., m ;  i ≠ j ,

where c_ij is the element in the i-th row and j-th column of the inverse normal matrix C, and

   s² = (residual sum of squares) / (N − m − 1) .

The variance of the estimated response at any point z' = (z'_1, z'_2, ..., z'_k) may be estimated from

   V(ŷ | z') = s² Σ_{i=0}^m Σ_{j=0}^m c_ij z'_i z'_j .
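For a small design, (1.10.11) can be checked directly by inverting the normal matrix. The 2² first-order design and the value s² = 2 below are hypothetical; for this orthogonal design the inverse normal matrix is simply I/4.

```python
def normal_matrix_inverse(X):
    """Inverse of X'X by Gauss-Jordan elimination; its element c_ij gives
    Var(b_i) = c_ii s^2 and Cov(b_i, b_j) = c_ij s^2 as in (1.10.11)."""
    m = len(X[0])
    A = [[sum(X[u][i] * X[u][j] for u in range(len(X))) for j in range(m)]
         + [1.0 if j == i else 0.0 for j in range(m)] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[m:] for row in A]

def predicted_variance(C, zp, s2):
    """V(y_hat | z') = s^2 * sum_i sum_j c_ij z'_i z'_j."""
    m = len(C)
    return s2 * sum(C[i][j] * zp[i] * zp[j] for i in range(m) for j in range(m))

# Hypothetical 2^2 first-order design; the first column is z_0 = 1.
X = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
C = normal_matrix_inverse(X)          # here X'X = 4I, so C = I/4
v = predicted_variance(C, [1.0, 1.0, 1.0], s2=2.0)
```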
The Box-Wilson designs lend themselves to blocking [9] and thus can be used when real non-homogeneity or a time trend is anticipated. Interactions can be detected with the first-order designs, and the direction of steepest ascent determined with the aid of these interactions. On the other hand, the Blum and Friedman-Savage procedures assume in each round of experimentation that the effects are additive, and for this reason may require more experiments when large interactions do exist.
(1.10.12) The goodness of fit can be checked in one of several ways.
i) If the design permits, we may add cubic or higher order terms to the model and test whether or not the increase in the regression sum of squares is significant.
ii) We may know or have an idea of the magnitude of the experimental error variance and compare the obtained residual variance with this estimate; or we may replicate some of the experimental points and divide the error variance into two parts, that due to replication and a remainder, and compare these two variances.
iii) We may plot Δy = (y − ŷ) against ŷ and examine for indications of correlation.
iv) We may take confirmatory observations. These are perhaps best taken along the canonical axes.
(1.10.13) If one of the procedures suggested in (1.10.12) indicates that we have chosen the wrong model, we can either:
i) In those cases where the model permits, write a new regression equation including the cubic or higher order terms determined in (1.10.12, i), if by doing so we find that the fit is sufficiently improved; or,
ii) Add points to the experimental design. The Box-Wilson designs adapt themselves very nicely to "nesting", for instance. Gardiner, Grandage and Hader [16] give procedures for nesting a second-order design into a third-order design.
The third degree regression coefficients calculated from the second-order design in (i) may be biased. If unbiased estimates of these coefficients are desired, or if the second-order design does not furnish sufficient degrees of freedom to estimate all the relevant third-degree regression coefficients, then additional observations as in (ii) are required. Note that the cost of (i) may be quite small when compared with that of (ii). Perhaps the procedure (i) may be used to indicate whether additional points should be taken or not.
An important use of the canonical equations is as follows. Quite often one of the canonical coefficients, i.e., the B_i, will be very large when compared with all the others. Suppose that this is B_p. What this often means is that the canonical variable corresponding to B_p, which itself is a linear combination of the z_i, is the fundamental variable of the system. We give two examples to illustrate this point.
Example 1. Suppose that in a chemical experiment x_1 and x_2 represent the logarithms of the concentrations of ingredients one and two, respectively. Suppose further that B_1 >> B_2 and that the canonical variable corresponding to B_1 is approximately K(x_1 − x_2), with K some constant. This then would indicate that the absolute concentration of neither ingredient is the controlling factor, but rather that the ratio of their concentrations is the important variable.
Example 2. Secondly, a type of response surface frequently encountered is that of the ridge system. See [6] for a representation of some possible response surfaces. Again suppose that there are two independent variables x_1 and x_2, but that their product x_1 x_2 is really the fundamental variable of the system, and that the equation of prediction (1.10.14) is quadratic in the product x_1 x_2. Then there will be a wide variety of pairs (x_1, x_2) for which the response is at a maximum. Upon differentiating (1.10.14) with respect to x_1 x_2 and setting equal to zero, we obtain a linear equation (1.10.15) in x_1 x_2. Thus, ŷ is maximized for all points (x_1, x_2) lying on a hyperbola, say x_1 x_2 = 3/2, i.e., (1, 3/2), (3/2, 1), (3, 1/2), ... .
Hence, with multiple responses, we can often obtain the locus of the maximum response for, say, the first response and then move along this locus to a point where the second response is also maximized.
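A hypothetical prediction equation of this ridge type, chosen so that its maximum falls on the hyperbola x_1 x_2 = 3/2 quoted above, is ŷ = 5 + 3(x_1 x_2) − (x_1 x_2)²; the coefficients are illustrative only.

```python
def y_hat(x1, x2):
    """Hypothetical ridge surface: quadratic in the product w = x1 * x2,
    with d(y_hat)/dw = 3 - 2w = 0 at w = 3/2."""
    w = x1 * x2
    return 5.0 + 3.0 * w - w * w

# Every point on the hyperbola x1 * x2 = 3/2 gives the same maximal response.
on_ridge = [y_hat(1.0, 1.5), y_hat(1.5, 1.0), y_hat(3.0, 0.5)]
off_ridge = y_hat(1.0, 1.0)
```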
These designs do offer the experimenter an opportunity for the use of his knowledge in choosing the position of the first-order design, the levels of the independent variables, the probable degree of the actual response surface, and in estimating which variables are likely to have quadratic effects and which pairs of variables are likely to interact. The interpretation of the results, while more complex than that for any of the other designs discussed here, is still within the experimenter's grasp. When there are only two independent variables, or perhaps only two that turn out to be important, and with somewhat more difficulty when there are three independent variables, the response surface can be represented in the form of a contour drawing [6] which displays the significant features of the surface. These contour representations are perhaps most easily obtained from the canonical equations.
The computational procedures involved are none other than those of multiple regression, made somewhat easier by the presence of several zeros in the matrix of the normal equations. If electronic computing facilities are available, the uncoded independent variables may be used, and all of the computational work, including the inversion of large matrices, should proceed with little difficulty.
In summary, the Box-Wilson procedure is useful in all phases of experimental work: in separating out the important independent variables, in mapping the response surface, in locating the position of the maximum response, in gaining understanding about the mechanism of the system, and in pointing out the direction in which further experimentation should proceed. The properties of the regression coefficients so obtained are well-known and, in many respects, optimum. Goodness of fit may be tested by a number of procedures, and if the fit is shown to be bad, a new model may be set up or the design augmented by additional points to correct this difficulty. The designs themselves are extremely versatile and can be set up in different ways according to the immediate objective of the experiment. The designs offer the experimenter an opportunity to use his skill and knowledge, and the contour representation of the response surface is easily understood by him.
1.11.
Summary.
This chapter has discussed briefly some of the inadequacies of
factorial design in the response surface problem and has attempted to
outline a few of the procedures recently developed for this purpose.
The treatment given these procedures has been by no means complete and
many of the arguments presented are clearly heuristic.
I will have
served my purpose if I have managed to indicate at least some of the
problems that will be met in the application of these designs.
CHAPTER II
APPLICATION OF THE KIEFER-WOLFOWITZ STOCHASTIC APPROXIMATION PROCEDURE TO A SECOND DEGREE REGRESSION FUNCTION

We are interested in applying the Kiefer-Wolfowitz procedure for estimating a maximum to a quadratic function and in determining the operating characteristics of the procedure in this case. The general method is given by Kiefer and Wolfowitz [18] and is described briefly below.
The sequences {a_n} and {c_n} are chosen subject to the restrictions (2.1):

(2.1.1)   c_n → 0 ,
(2.1.2)   Σ_{n=1}^∞ a_n = ∞ ,
(2.1.3)   Σ_{n=1}^∞ a_n c_n < ∞ ,
(2.1.4)   Σ_{n=1}^∞ (a_n/c_n)² < ∞ .

One choice for these sequences, and the one which we shall consider in this paper, is (2.2):

(2.2.1)   a_n = R/n ,
(2.2.2)   c_n = S/n^k ,

where R and S are arbitrary positive constants and 0 < k < 1/2.
It is easily seen that (2.1.1) and (2.1.2) are verified, and since

   Σ_{n=1}^∞ a_n c_n = Σ_{n=1}^∞ RS / n^(1+k) ,

and

   Σ_{n=1}^∞ (a_n/c_n)² = Σ_{n=1}^∞ (R/S)² (1/n^(2−2k)) ,

and since the well known series Σ_{n=1}^∞ 1/n^p converges for p > 1, it is seen that all of the conditions of (2.1) are satisfied by (2.2).
The sequential procedure is then described as follows. Let θ be the value of x for which y = f(x) is a maximum, and let z_1 be the best advance estimate of θ. At each stage of the experiment two experimental points are obtained, (x_{2n−1}, y_{2n−1}) and (x_{2n}, y_{2n}), where x_{2n−1} = z_n − c_n and x_{2n} = z_n + c_n, and where y_i is the measured response at x_i. z_{n+1} is then given by

(2.3)   z_{n+1} = z_n + (a_n/c_n)(y_{2n} − y_{2n−1}) .

Conditions on f(x) which are necessary for the convergence of z_n to θ are given in [18].
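The procedure (2.2)-(2.3) may be sketched as below; the quadratic regression function, the error standard deviation, and the constants R, S, and k are all hypothetical.

```python
import random

def kiefer_wolfowitz(f, z1, R, S, k, n_pairs, sigma, rng):
    """One-dimensional Kiefer-Wolfowitz procedure (2.3) with the
    sequences a_n = R/n and c_n = S/n^k of (2.2)."""
    z = z1
    for n in range(1, n_pairs + 1):
        a, c = R / n, S / n ** k
        y_minus = f(z - c) + rng.gauss(0.0, sigma)   # y_{2n-1}
        y_plus = f(z + c) + rng.gauss(0.0, sigma)    # y_{2n}
        z = z + (a / c) * (y_plus - y_minus)         # recursion (2.3)
    return z

# Hypothetical quadratic regression with maximum at theta = 3.
theta = 3.0
f = lambda x: -2.0 * (x - theta) ** 2 + 10.0

rng = random.Random(169)
z_final = kiefer_wolfowitz(f, z1=0.0, R=0.125, S=1.0, k=0.25,
                           n_pairs=500, sigma=0.2, rng=rng)
```

Here R has been set at −1/(4a) = 1/8 for the chosen curvature a = −2, anticipating the recommendation of (2.49) below.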
Now suppose that the true regression is quadratic, i.e., f(x) is given by
(2.4)   y_u = f(x_u) + ε_u = a x_u² + b x_u + c + ε_u ,

where the errors of measurement, i.e., the ε_u, are NID(0, σ²). Then, from (2.4),

(2.5)   y_{2n−1} = a(z_n − c_n)² + b(z_n − c_n) + c + ε_{2n−1}
               = a(z_n² − 2c_n z_n + c_n²) + b(z_n − c_n) + c + ε_{2n−1} ,

and

        y_{2n} = a(z_n + c_n)² + b(z_n + c_n) + c + ε_{2n} .
Employing (2.3), we obtain

(2.6.1)   z_{n+1} = z_n + (a_n/c_n)[4a c_n z_n + 2b c_n + ε_{2n} − ε_{2n−1}]
                = (1 + 4a a_n) z_n + 2b a_n + (a_n/c_n)(ε_{2n} − ε_{2n−1}) .

Remembering that θ = −b/2a, we can rewrite (2.6.1) free of the parameter b as

(2.6.2)   z_{n+1} = z_n(1 + 4a a_n) − 4a a_n θ + (a_n/c_n)(ε_{2n} − ε_{2n−1}) .

Since E(ε_{2n} − ε_{2n−1}) = 0, we have

(2.7.1)   E(z_{n+1} | z_n) = z_n(1 + 4a a_n) − 4a a_n θ ,

or

(2.7.2)   E(z_{n+1} | z_1) = E(z_n | z_1)(1 + 4a a_n) − 4a a_n θ .
•
Also,
a
var (Zn+1 1 Zn) = (~)
(2.8)
2
2
2ae
Rewriting (2.7.2) slightly, we see that
We are now led to
THEOREM 2.1-
E("n+lj "1) • [<"l-e) 1!l(l+4aa1)]
+
e
PROOF BY MATHBM ATICAL INDUCTION.
(i) Take the
case
n
= 1.
Then from (2.7.2), we obtain
(2.10)
So the theorem ho1d3 for n = 1.
(ii) Assume that the theorem is true for n = k, i.e., that

(2.11.1)   E(z_{k+1} | z_1) = (z_1 − θ) ∏_{i=1}^k (1 + 4a a_i) + θ .

Then we wish to show that the theorem is true for n = k + 1, i.e., that

(2.11.2)   E(z_{k+2} | z_1) = (z_1 − θ) ∏_{i=1}^{k+1} (1 + 4a a_i) + θ .

However, from (2.7.2), it is seen that

(2.12)   E(z_{k+2} | z_1) = (1 + 4a a_{k+1}) E(z_{k+1} | z_1) − 4a a_{k+1} θ .

Substituting (2.11.1) into (2.12), we obtain the form (2.11.2). Thus we have proved Theorem 2.1.
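Theorem 2.1 can be checked by the Monte Carlo method, averaging recursion (2.6.2) over many replications and comparing with the closed form; all numerical constants below are hypothetical.

```python
import random

def simulate_mean(a, theta, R, S, k, n_steps, sigma, trials, seed):
    """Average z_{n+1} over many replications of recursion (2.6.2) and
    return it together with the closed form of Theorem 2.1."""
    rng = random.Random(seed)
    z1 = 0.0
    total = 0.0
    for _ in range(trials):
        z = z1
        for n in range(1, n_steps + 1):
            a_n, c_n = R / n, S / n ** k
            noise = rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)  # eps_2n - eps_{2n-1}
            z = z * (1 + 4 * a * a_n) - 4 * a * a_n * theta + (a_n / c_n) * noise
        total += z
    # Theorem 2.1:  E(z_{n+1} | z_1) = (z_1 - theta) * prod(1 + 4 a a_i) + theta
    prod = 1.0
    for i in range(1, n_steps + 1):
        prod *= 1 + 4 * a * (R / i)
    return total / trials, (z1 - theta) * prod + theta

# Hypothetical parameters: a = -1, theta = 2.
mc_mean, exact_mean = simulate_mean(a=-1.0, theta=2.0, R=0.1, S=1.0,
                                    k=0.25, n_steps=30, sigma=0.5,
                                    trials=4000, seed=7)
```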
THEOREM 2.2.   E(z_{n+1} | z_1) → θ as n → ∞ .

PROOF. Since f(x) is a quadratic function possessing a maximum, the sign of a will be negative, and if the sequence {a_n} is chosen to be R/n, R > 0, as indicated in (2.2), then we may write

(2.14)   1 + 4a a_n = 1 − K/n ,   where K = −4aR > 0 .

Now (z_1 − θ) is finite, = Q say. The infinite product ∏_{m=1}^∞ (1 − K/m) vanishes by a well-known theorem; hence, given any ε/Q > 0, there exists an N such that

   | ∏_{m=1}^N (1 − K/m) | < ε/Q ,

and hence

   | (z_1 − θ) ∏_{m=1}^N (1 − K/m) | < ε .

Therefore, | E(z_{N+1} | z_1) − θ | < ε; but ε is arbitrarily small. Hence, E(z_{n+1} | z_1) → θ.
Let us make the following notation. Let

(2.16.1)   p_n = 1 + 4a a_n ,

and let

(2.16.2)   R_{n,t} = p_n p_{n−1} ··· p_{n−t+1} (a_{n−t}/c_{n−t}) ,   for t ≥ 1 ,

and let R_{n,0} = a_n/c_n. Then we may state
THEOREM 2.3.   var(z_{n+1} | z_1) = { Σ_{t=0}^{n−1} R²_{n,t} } 2σ² .

PROOF BY MATHEMATICAL INDUCTION.
(i) Take the case n = 1. We see from (2.8) that var(z_2 | z_1) = (a_1/c_1)² 2σ². The theorem states that var(z_2 | z_1) = R²_{1,0} 2σ² = (a_1/c_1)² 2σ². Hence, the theorem is true for n = 1.
(ii) Assume that the theorem is true for n = k, i.e.,

(2.17.1)   var(z_{k+1} | z_1) = { Σ_{t=0}^{k−1} R²_{k,t} } 2σ² .

We attempt to show

(2.17.2)   var(z_{k+2} | z_1) = { Σ_{t=0}^{k} R²_{k+1,t} } 2σ² .

From (2.6.1), we see that

(2.18)   var(z_{k+2} | z_1) = p²_{k+1} var(z_{k+1} | z_1) + (a_{k+1}/c_{k+1})² 2σ² .
Making use of the induction assumption (2.17.1) and the notation (2.16), we obtain

   var(z_{k+2} | z_1) = { p²_{k+1} Σ_{t=0}^{k−1} R²_{k,t} + (a_{k+1}/c_{k+1})² } 2σ²
                    = { Σ_{t=0}^{k} R²_{k+1,t} } 2σ² .

But this is expression (2.17.2). Hence, Theorem 2.3 is proved.
Note that, as a corollary, we have obtained the recursive relation

(2.20)   Σ_{t=0}^{n} R²_{n+1,t} = p²_{n+1} Σ_{t=0}^{n−1} R²_{n,t} + (a_{n+1}/c_{n+1})² .

We may also calculate at this point E(z_{n+1} − θ)². Making use of the relation

(2.21)   E(z_{n+1} − θ)² = var(z_{n+1} | z_1) + { E(z_{n+1} − θ | z_1) }² ,

we see that

(2.22)   E(z_{n+1} − θ)² = { Σ_{t=0}^{n−1} R²_{n,t} } 2σ² + (z_1 − θ)² ∏_{i=1}^{n} p_i² .
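The recursive relation (2.20) can be verified numerically against the direct definition (2.16.2); the constants a, R, S, and k below are hypothetical.

```python
def p(n, a, R):
    """p_n = 1 + 4 a a_n of (2.16.1), with a_n = R/n."""
    return 1 + 4 * a * (R / n)

def R_nt(n, t, a, R, S, k):
    """R_{n,t} of (2.16.2): p_n p_{n-1} ... p_{n-t+1} * (a_{n-t}/c_{n-t})."""
    prod = 1.0
    for j in range(n - t + 1, n + 1):
        prod *= p(j, a, R)
    m = n - t
    return prod * (R / m) / (S / m ** k)

def S_direct(n, a, R, S, k):
    """Sum over t of R_{n,t}^2, computed from the definition."""
    return sum(R_nt(n, t, a, R, S, k) ** 2 for t in range(n))

def S_recursive(n, a, R, S, k):
    """The same sum built up by recursion (2.20)."""
    s = 0.0
    for m in range(1, n + 1):
        s = p(m, a, R) ** 2 * s + ((R / m) / (S / m ** k)) ** 2
    return s

# Hypothetical constants: a = -1, R = 0.1, S = 1, k = 1/4.
d = S_direct(12, -1.0, 0.1, 1.0, 0.25)
r = S_recursive(12, -1.0, 0.1, 1.0, 0.25)
```

Multiplying either quantity by 2σ² gives var(z_{n+1} | z_1) of Theorem 2.3.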
THEOREM 2.4.   var(z_n | z_1) is a null sequence in n .

PROOF. From (2.20), we see that

   var(z_{n+1} | z_1) = p_n² var(z_n | z_1) + (a_n/c_n)² 2σ² ,

and, in fact,

(2.24)   var(z_{n+k} | z_1) = p²_{n+k−1} p²_{n+k−2} ··· p²_n var(z_n | z_1)
             + [ (a_{n+k−1}/c_{n+k−1})² + p²_{n+k−1} (a_{n+k−2}/c_{n+k−2})² + ··· + p²_{n+k−1} ··· p²_{n+1} (a_n/c_n)² ] 2σ² .

Now, there exists M such that for all n ≥ M, p_n² ≤ 1. Thus, for all n ≥ M,

(2.25)   lim var(z_{n+k} | z_1) ≤ lim p²_{n+k−1} p²_{n+k−2} ··· p²_n var(z_n | z_1) + lim 2σ² Σ_{m=n}^{n+k−1} (a_m/c_m)² ,

where the limits are taken as n → ∞, k → ∞, k − n → ∞.
Consider the second term on the right-hand side. From (2.2), we see that this expression may be written as

(2.26)   lim (2R²σ²/S²) Σ_{m=n}^{n+k−1} 1/m^p ,

where 1 < p < 2. Consider the sequence { Σ_{m=n}^{n+k−1} 1/m^p }, every term of which is less than the corresponding term of { Σ_{m=n}^∞ 1/m^p }. But this latter sequence is a null sequence in n and hence, by comparison, { Σ_{m=n}^{n+k−1} 1/m^p } is a null sequence in n, and so 2σ² Σ_{m=n}^{n+k−1} (a_m/c_m)² is a null sequence in n.
The first term on the right-hand side of (2.25) may be written, using the notation of (2.14), as

(2.27)   lim ∏_{m=n}^{n+k−1} (1 − K/m)² var(z_n | z_1) ,

where K > 0. It has been shown in (2.14) that, given any ε > 0, there exists an N such that | ∏_{m=1}^{N} (1 − K/m) | < ε, and, if we take ε such that 0 < ε ≤ 1,

(2.28)   ∏_{m=1}^{N} (1 − K/m)² ≤ ε² ≤ ε .

Thus, { ∏_{m=1}^{n} (1 − K/m)² } is a null sequence in n.
Now we shall show that, for large enough n, { var(z_n | z_1) } is a monotone decreasing sequence in n. Again, from (2.20), and using the notation of (2.14), we may write

(2.30)   var(z_{n+1} | z_1) = (1 − K/n)² var(z_n | z_1) + (2R²σ²/S²)(1/n^p) ,

where 1 < p < 2 and K > 0, and

(2.31)   lim_{n→∞} (2R²σ²/S²)(1/n^p) = 0 .

Thus, var(z_n | z_1) is, for large enough n, monotone decreasing in n. Hence, since { ∏_{m=n}^{n+k−1} (1 − K/m)² } is a null sequence in n, so is { ∏_{m=n}^{n+k−1} (1 − K/m)² var(z_n | z_1) }. But this means that var(z_{n+k} | z_1) may be written as the sum of two null sequences, and hence is itself a null sequence. Thus, var(z_n | z_1) is a null sequence.
COROLLARY.   E(z_{n+1} − θ)² is a null sequence in n .

PROOF.

   E(z_{n+1} − θ)² = var(z_{n+1} | z_1) + { E(z_{n+1} − θ | z_1) }² .

The first term on the right-hand side we have just shown to be a null sequence; the second term on the right-hand side can be shown, with the aid of (2.28), to be a null sequence. Thus, E(z_{n+1} − θ)² is the sum of two null sequences and hence is itself a null sequence.
We now seek cov(z_{n+1}, z_n | z_1). We shall prove

THEOREM 2.5.   cov(z_{n+1}, z_n | z_1) = p_n var(z_n | z_1) .

PROOF.

(2.32)   cov(z_{n+1}, z_n | z_1) = E(z_{n+1} z_n | z_1) − E(z_{n+1} | z_1) E(z_n | z_1) .

We shall work first with the left-hand term in the last expression. Employing (2.6.2),

(2.33)   z_{n+1} z_n = z_n²(1 + 4a a_n) − 4a a_n θ z_n + (a_n/c_n) z_n (ε_{2n} − ε_{2n−1}) ,

and

(2.34)   E(z_{n+1} z_n | z_1) = (1 + 4a a_n) E(z_n² | z_1) − 4a a_n θ E(z_n | z_1) ;

but

   E(z_n² | z_1) = var(z_n | z_1) + { E(z_n | z_1) }² .
Thus, writing A = (z_1 − θ) ∏_{i=1}^{n−1} p_i + θ = E(z_n | z_1), we have

   E(z_{n+1} z_n | z_1) = p_n { [ Σ_{t=0}^{n−2} R²_{n−1,t} ] 2σ² + A² } − 4a a_n θ A .
The second of the expressions in (2.32) may be written as

(2.35)   E(z_{n+1} | z_1) E(z_n | z_1) = [ (z_1 − θ) ∏_{i=1}^{n} p_i + θ ] [ (z_1 − θ) ∏_{i=1}^{n−1} p_i + θ ] = (p_n A − 4a a_n θ) A ,

and (2.32) becomes

(2.36)   cov(z_{n+1}, z_n | z_1) = p_n [ Σ_{t=0}^{n−2} R²_{n−1,t} ] 2σ² + { p_n A² − 4a a_n θ A − (p_n A − 4a a_n θ) A } .

The term in brackets vanishes, leaving

   cov(z_{n+1}, z_n | z_1) = p_n var(z_n | z_1) .

It now becomes a simple matter to calculate the correlation between z_n and z_{n+1}. We shall make use of

(2.37)   ρ(z_n, z_{n+1} | z_1) = cov(z_{n+1}, z_n | z_1) / √( var(z_n | z_1) var(z_{n+1} | z_1) ) .
On substituting into (2.37) the expressions that have been obtained for variances and covariances, we obtain

(2.38)   ρ(z_n, z_{n+1} | z_1) = p_n √( Σ_{t=0}^{n−2} R²_{n−1,t} / Σ_{t=0}^{n−1} R²_{n,t} ) .

Thus,

(2.39)   ρ²(z_n, z_{n+1} | z_1) = p_n² Σ_{t=0}^{n−2} R²_{n−1,t} / Σ_{t=0}^{n−1} R²_{n,t} ,

and, using the recursive relation given in (2.20), this becomes

(2.40)   ρ²(z_n, z_{n+1} | z_1) = p_n² Σ_{t=0}^{n−2} R²_{n−1,t} / [ p_n² Σ_{t=0}^{n−2} R²_{n−1,t} + (a_n/c_n)² ] .

Thus, we may write

(2.41)   ρ²(z_n, z_{n+1} | z_1) = 1 / (1 + Δ) ,

where Δ is given by

(2.42)   Δ = (a_n/c_n)² / [ p_n² Σ_{t=0}^{n−2} R²_{n−1,t} ] .

Now, from (2.2), (a_n/c_n)² = (R/S)² (1/n^(2−2k)). For very large n we may regard (a_n/c_n)², (a_{n−1}/c_{n−1})², ... as being of the same magnitude. Taking this and remembering the definitions of p_n and R_{n−1,t}, we have

   Δ ≐ 1 / [ (1 − K/n)² + (1 − K/n)²(1 − K/(n−1))² + ··· ] ,

and thus, as n → ∞,

   Δ → 1 / (1 + 1 + 1 + ···) = 0 .

We have used the fact that (1 − K/n) = (1 + 4aR/n) approaches unity as n approaches infinity. Thus, ρ²(z_n, z_{n+1} | z_1) → 1 as n → ∞. But there always exists an N such that for all n ≥ N, p_n is positive. Furthermore, the square root in expression (2.38) is conventionally taken to be positive. Thus, we have proved

THEOREM 2.6.

(2.43)   ρ(z_n, z_{n+1} | z_1) → 1   as   n → ∞ .
Let us now consider the problem of optional stopping. We desire some procedure whereby the information obtained from the observations is utilized to terminate the sequential procedure whenever the location of the maximum has been estimated with suitable precision. We shall mention two procedures which might be considered for this position and shall elaborate upon the second.
(2.43.1) Consider the sign of (y_{2n} − y_{2n−1}). Record this sign for each successive pair of observations and terminate the experiment after q changes in sign, where q is an integer specified in advance of the experiment.
(2.43.2) Consider a group of s successive z's, where s is an integer. Let z_m = max(z_{n−s+2}, z_{n−s+3}, ..., z_{n+1}), and let z_ℓ = min(z_{n−s+2}, z_{n−s+3}, ..., z_{n+1}). Then terminate the experiment when z_m − z_ℓ < k, where k is pre-designated.
Let us consider the case for s = 2. Then we will terminate the experiment when |z_{n+1} − z_n| < k. Since (z_{n+1} − z_n) is normally distributed, Pr( |z_{n+1} − z_n| < k ) can be computed from the cumulative normal distribution function and is seen to be

(2.44)   Pr( |z_{n+1} − z_n| < k | z_1 ) = Φ( (k − u)/v ) − Φ( (−k − u)/v ) ,

where Φ is the cumulative normal distribution function, u = E(z_{n+1} − z_n | z_1), and v² = var(z_{n+1} − z_n | z_1). We now proceed to determine u and v. Using Theorem 2.1, it is seen that

(2.45)   u = (z_1 − θ) ∏_{i=1}^{n−1} (1 + 4a a_i) [ (1 + 4a a_n) − 1 ] = 4a a_n (z_1 − θ) ∏_{i=1}^{n−1} (1 + 4a a_i) .

We obtain var(z_{n+1} − z_n | z_1) from the expression

(2.46)   var(z_{n+1} − z_n | z_1) = { Σ_{t=0}^{n−1} R²_{n,t} + Σ_{t=0}^{n−2} R²_{n−1,t} − 2 p_n Σ_{t=0}^{n−2} R²_{n−1,t} } 2σ² .

Using the recursive relation (2.20),

(2.47)   v² = { (p_n − 1)² Σ_{t=0}^{n−2} R²_{n−1,t} + (a_n/c_n)² } 2σ² .
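The s = 2 stopping rule of (2.43.2) is easily studied by simulation, in the spirit of the Monte Carlo suggestions made elsewhere in this thesis; the parameters below, including the stopping constant, are hypothetical.

```python
import random

def run_until_stopped(a, theta, z1, R, S, kexp, kstop, sigma, max_pairs, rng):
    """Run recursion (2.6.2) and stop the first time |z_{n+1} - z_n| < kstop,
    the s = 2 rule of (2.43.2); returns the number of pairs used."""
    z = z1
    for n in range(1, max_pairs + 1):
        a_n, c_n = R / n, S / n ** kexp
        noise = rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)
        z_next = z * (1 + 4 * a * a_n) - 4 * a * a_n * theta + (a_n / c_n) * noise
        if abs(z_next - z) < kstop:
            return n
        z = z_next
    return max_pairs

rng = random.Random(42)
# Hypothetical setting: a = -1, theta = 0, stopping constant kstop = 0.05.
stops = [run_until_stopped(-1.0, 0.0, 5.0, 0.25, 1.0, 0.25,
                           0.05, 0.1, 200, rng) for _ in range(500)]
mean_stop = sum(stops) / len(stops)
```

The empirical distribution of the stopping trial estimates the differences (r_k − r_{k−1}) discussed below, without knowledge of a being required in the simulation itself.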
Substituting expressions (2.45) and (2.47) into (2.44), we obtain the probability that the experiment terminates on or before the (n+1)-th trial (2n + 2 observations). This probability will be expressed in terms of the two known sequences {a_n} and {c_n} and the unknown parameter a. Call this probability r_{n+1}. Then the probability that the experiment terminates on the k-th trial is (r_k − r_{k−1}). This probability depends upon the unknown parameter a in a complex way. Thus, in a given situation, it would seem difficult to specify r_k in advance and attach meaningful confidence limits to the estimate of θ obtained from the process. Perhaps a preliminary experiment to obtain an estimate of a would be helpful in this context. See (2.49) below for a discussion of how an estimate of a from a preliminary experiment might be used to determine the sequences {a_n} and {c_n} in order that z_n converge to θ fairly rapidly.
(2.48) The proper choice of the sequences {a_n} and {c_n} is necessary in order to insure efficient convergence of z_n to θ. We will here, as in (2.2), confine ourselves to sequences of the type

(2.48.1)   a_n = R/n ,   c_n = S/n^k ,

where R > 0, S > 0, and 0 < k < 1/2.
(2.48.2) Recalling (2.9),

   E(z_{n+1} − z_n | z_n) = 4a a_n (z_n − θ) ,
we see that there are two types of behavior against which we must guard.
(2.48.3) First, if |4a a_n| >> 1, then E(z_{n+1} | z_1) will oscillate back and forth over θ in steps of increasing magnitude until we reach N such that |4a a_N| ≤ 2, after which point E(z_n) will begin to converge to θ.
(2.48.4) Second, if |4a a_n| << 1, convergence to θ may be very slow.
(2.48.5) Thus, we see that careful selection of the sequence {a_n} is necessary in order that z_n converge rapidly to θ. We will now show that the selection of the sequence {c_n} is not so critical.
An inspection of the expansion of the expression given in Theorem 2.3 for var(z_{n+1} | z_1) will reveal that the c_n terms appear only in the denominator of var(z_{n+1} | z_1). Thus, increasing the size of c_n can only reduce var(z_{n+1} | z_1). Furthermore, this reduction will occur in the well-known proportional manner: if we take {c'_n} = {λ c_n} and leave {a_n} fixed, then var(z_{n+1} | z_1)' = (1/λ²) var(z_{n+1} | z_1). Also note that E(z_{n+1} − z_n | z_1) is free of c_n; hence, by changing {c_n}, we do not harm the convergence properties of the procedure. Thus, it is recommended that we take c_n to be as large as possible, subject to the restriction that we do not take it so large that the procedure is yielding z_n values outside of the "range of interest" of the experiment. See [18] on this point.
(2.49) We present here a method by which the sequences {a_n} and {c_n} may be chosen from a preliminary experiment.
(2.49.1) Choose 5 equi-spaced x-points over the area of interest. Let us call these x-points P − 2Δ, P − Δ, P, P + Δ, and P + 2Δ.
(2.49.2) Make one observation of the response at each x-point. Call these responses y_1, y_2, y_3, y_4, and y_5, where y_1 is the response at (P − 2Δ), y_2 is the response at (P − Δ), etc.
(2.49.3) Now, estimate a and θ by the method of least squares. It is seen that

   â = (2y_1 − y_2 − 2y_3 − y_4 + 2y_5) / (14Δ²) ,

and that

   θ̂ = P − b̂ / (2â) ,   where   b̂ = (−2y_1 − y_2 + y_4 + 2y_5) / (10Δ) .

(2.49.4) Then choose z_1 = θ̂, R = −1/(4â), and S = P, and proceed as outlined in (2.3). We explain briefly these choices. θ̂ is, of course, the estimate of the location of the maximum obtained from the preliminary experiment and is the logical choice for z_1. Furthermore, if R = −1/(4â), then, from (2.9), we see that

   E(z_{n+1} − z_n | z_n) = 4a a_n (z_n − θ) = −(a/â)(z_n − θ)/n ,

which is approximately −(z_n − θ)/n when â is close to a. Perhaps, by choosing the sequence {a_n} in this manner, some measure of protection against the behavior mentioned in (2.48.3) and (2.48.4) may be obtained. This seems an area where research through the use of the Monte Carlo method might prove very useful. Finally, we have chosen S equal to P in line with the considerations of (2.48.5).
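The computations of (2.49.1)-(2.49.4) may be sketched as follows; the check uses a noiseless hypothetical quadratic with a = −1 and θ = 3.

```python
def preliminary_choices(P, delta, ys):
    """From five responses y1..y5 at P-2D, P-D, P, P+D, P+2D, compute the
    least squares quadratic coefficients and the choices of (2.49.4):
    z1 = theta_hat, R = -1/(4 a_hat), S = P."""
    y1, y2, y3, y4, y5 = ys
    # Orthogonal-polynomial contrasts for 5 equi-spaced points.
    a_hat = (2*y1 - y2 - 2*y3 - y4 + 2*y5) / (14 * delta**2)
    b_hat = (-2*y1 - y2 + y4 + 2*y5) / (10 * delta)       # slope at x = P
    theta_hat = P - b_hat / (2 * a_hat)
    return theta_hat, -1.0 / (4 * a_hat), P

# Noiseless check against a known quadratic y = -(x - 3)^2 + 10,
# i.e., a = -1 and theta = 3 (hypothetical values).
f = lambda x: -(x - 3.0) ** 2 + 10.0
P, delta = 5.0, 1.0
ys = [f(P + j * delta) for j in (-2, -1, 0, 1, 2)]
z1, R, S = preliminary_choices(P, delta, ys)
```

Without error the contrasts recover a and θ exactly, so z_1 = 3 and R = −1/(4·(−1)) = 1/4.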
At this point we might mention that there are two distinct ways in which the sequential procedure might be used to estimate θ. We may terminate the experiment after 2t observations have been taken and take θ̂ = z_{t+1}; or we may use all of the experimental points, (x_i, y_i), i = 1, 2, ..., 2t, or those of the experimental points that are in the vicinity of the maximum, to estimate θ from a least squares regression equation of suitable degree. Note that in this case the sequential procedure serves to concentrate the experimental points in the vicinity of the maximum. How the standard errors and biases of the regression coefficients so obtained compare with those of regression coefficients obtained from a fixed sample size experiment using equally-spaced x-increments, or with those obtained from an experiment employing the Hotelling procedure, is a very important question which should be further pursued.
If the experimental error is large, the sequential procedure may be slow to converge. A device that may be of assistance in this situation is that of replication, i.e., we can make k determinations of y at each level of x and take

   z_{n+1} = z_n + (a_n/c_n)( ȳ_{2n} − ȳ_{2n−1} ) ,

where

   ȳ_j = (1/k) Σ_{i=1}^{k} y_{ji} ,   j = 2n−1, 2n ,   and i = 1, 2, ..., k .

(2.52) Summary.
I have in this chapter discussed briefly some of the operating characteristics of the Kiefer-Wolfowitz stochastic approximation procedure when the true regression is quadratic, and have attempted to indicate some of the considerations that must be taken even in this simple situation. The treatment has been by no means rigorous or complete, and there is no justification for believing that the procedures suggested here will work at all well if the true regression is other than quadratic.
APPENDIX

The Monte Carlo method has been used to generate stochastic sequences with the IBM 650 computer for the purpose of evaluating the procedure suggested in Section (2.49) and of determining the sequences {a_n} and {c_n} appearing in the application of the Kiefer-Wolfowitz procedure. The physical situation simulated in this study is described below.
Suppose that x and y are functionally related by means of a quadratic equation, but that there is an experimental error involved in measuring y, so that the observed value is the true value plus an error δ_u, where the δ_u are NID(0, σ²).¹ Suppose that one observation on y is taken for each of the following x-points: x = −200, x = 0, x = 200, x = 400, and x = 600 (i.e., in the notation of (2.49), P = 200). Now the true maximum value for y occurs at x = μ = 0, so we assume that the initial ignorance of the location of the optimum point is such that the true optimum point is located half-way between the experimental center and the boundary of the experimental region.

¹ δ_u was determined as follows: let v_iu assume each of the values from 0 to 99 with probability 1/100, where i = 1, 2, ..., 10 and u = 1, 2, .... Then δ_u = ½(v_1u + v_2u + ··· + v_10,u − 495); σ² has been estimated from 125 of the δ_u generated by the machine and was found to be approximately 5400.
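The footnote's pseudo-random error generator can be imitated directly; since each δ_u is a centered sum of ten independent uniform digit-pairs, it is approximately normal by the central limit theorem. A sketch (the seed and sample size are illustrative):

```python
import random

def delta(rng):
    """One experimental error, built as in the footnote: ten values
    v_i drawn uniformly from 0, 1, ..., 99, then (sum - 495) / 2,
    which has mean zero and an approximately normal distribution."""
    return (sum(rng.randrange(100) for _ in range(10)) - 495) / 2.0

rng = random.Random(1957)
sample = [delta(rng) for _ in range(125)]
mean = sum(sample) / len(sample)
print(round(mean, 1))  # close to zero
```

A machine without a built-in normal generator, such as the IBM 650 of the study, makes this sum-of-uniforms construction a natural choice.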
Now, on the basis of these five experimental points, suppose that a quadratic regression ŷ = b_0 + b_1x + b_2x² is fitted by the method of least squares, and that the location of the abscissa of the maximum ordinate is estimated from z_1 = −b_1/(2b_2).
Then set R = −400, S = 200, k = 5,
and apply the Kiefer-Wolfowitz procedure to obtain z_2, z_3, ..., z_26. Call this an experimental sequence. An experimental sequence then consists of 55 observations, i.e., five preliminary observations and 25 pairs of sequential observations.
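The whole experimental sequence lends itself to simulation. In the compact sketch below, the quadratic response, the error standard deviation, and the gain sequences a_n and c_n are stand-ins chosen for illustration, not the exact choices of the study:

```python
import random
import numpy as np

def experimental_sequence(seed, n_steps=25):
    """Five preliminary observations, a quadratic least-squares fit to
    obtain z_1, then n_steps Kiefer-Wolfowitz pairs: 55 observations."""
    rng = random.Random(seed)
    f = lambda x: -0.001 * x * x                  # true maximum at x = 0
    obs = lambda x: f(x) + rng.gauss(0.0, 73.5)   # sigma^2 ~ 5400
    xs = np.array([-200.0, 0.0, 200.0, 400.0, 600.0])
    ys = np.array([obs(x) for x in xs])
    b2, b1, _ = np.polyfit(xs, ys, 2)             # preliminary fit
    z = [-b1 / (2.0 * b2)]                        # z_1 from the fit
    for n in range(1, n_steps + 1):
        a_n = 200.0 / n                           # illustrative a_n
        c_n = 100.0 / n ** (1.0 / 3.0)            # illustrative c_n
        diff = obs(z[-1] + c_n) - obs(z[-1] - c_n)
        z.append(z[-1] + (a_n / c_n) * diff)
    return z

seq = experimental_sequence(seed=3)
print(len(seq), round(seq[0], 1), round(seq[-1], 1))
```

Repeating this with different seeds reproduces, in miniature, the ensemble of 111 sequences whose behavior is tabulated below.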
The IBM 650 computer has been utilized to compute 111 such experimental sequences. Some of the results are summarized below. Three examples of these experimental sequences are given in Table (3.1).
TABLE 3.1
EXPERIMENTAL SEQUENCES

         Example 1   Example 2   Example 3
z_1         31.3       160.8        38.9
z_2         -3.5      -167.3        56.1
z_3          0.8       -14.6        31.2
z_4          6.6       -15.3        25.7
z_5         -2.7         0.8        22.9
z_6         -1.7        20.8        22.8
z_7        -14.0        26.5        35.0
z_8        -12.5        26.0        39.3
z_9         -8.5         5.7        37.5
z_10         2.3         3.6        25.9
z_11        -1.1       -16.0        27.8
z_12        -8.8        -8.4        28.2
z_13       -17.2         0.0        24.9
z_14        -9.0        16.1        23.8
z_15        -0.8         8.6        20.2
z_16         5.8         0.5        17.4
z_17         6.7        -8.7        19.7
z_18         7.7        10.0        17.7
z_19         8.5         4.2        17.3
z_20        11.5        12.4        20.6
z_21        10.8         8.8        22.4
z_22         9.7        17.2        23.6
z_23        13.8        14.0        20.2
z_24        13.9        20.9        16.4
z_25         9.4         9.3        15.7
z_26         5.4        -1.6        15.1
Note that in Example 2, z_n converges quite rapidly to zero after over-correcting for an unusually poor first estimate, and that, in Example 3, z_n remains positive throughout the entire sequence.

The means and variances of z_n for selected n have been computed and appear in Table (3.2) below. All of the variances are based on 110 degrees of freedom.
TABLE 3.2

  n   Number of     z̄_n    var(z_n)   Estimated
      experiments                      var(z_n)²
  1        5        8.64      3038       2480
  2        7        9.07      2707       1766
  3        9        4.45      1937        853
  5       13        2.34      1505        577
  8       19        0.80      1040        485
 11       25        0.04       713        368
 16       35        0.31       542        346
 21       45        1.16       387        263
 26       55        1.48       301        246

2. The estimated variance of z_n has been computed assuming that var(z_n) decreases as the reciprocal of the number of experiments. Let var(z_n) = V_n and let N_n be the total number of experiments performed following the determination of z_n. Then assume that N_nV_n is an estimate of σ_1², where σ_1² is the variance of z_1; and let s_1² = (1/9) Σ N_nV_n, where the summation is over the nine values of n appearing in Table 3.2. Then the estimated variance of z_n is obtained from s_1²/N_n.
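The fitting rule of footnote 2 reduces to a one-line computation: each product N_nV_n estimates the same constant, so the products are averaged and the average is divided back by N_n. A sketch with illustrative N_n and V_n, not the table's values:

```python
def fitted_variances(N, V):
    """Fit var(z_n) = s1^2 / N_n by the rule of footnote 2: average
    the products N_n * V_n to estimate s1^2, then divide by each N_n."""
    s1_sq = sum(n * v for n, v in zip(N, V)) / len(N)
    return [s1_sq / n for n in N]

# If the variances follow the reciprocal law exactly, the fit recovers them
N = [1, 2, 4, 8, 16]
V = [1200.0 / n for n in N]
print(fitted_variances(N, V))  # → [1200.0, 600.0, 300.0, 150.0, 75.0]
```

With empirical variances the fitted column smooths the sampling noise, which is the sense in which the predicted and observed variances are compared below.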
Note that the means are all positive. This may perhaps be attributed to the fact that the mean for z_1 was rather large (but not significantly different from zero at the .05 level) and that the other means are correlated with the first.

Of special interest is the behavior of var(z_n) for increasing n. It appears that, over the range of values of n appearing in the study, var(z_n) decreases roughly as the reciprocal of the number of experiments. Following this assumption, a least squares curve has been fitted to the data of Table 3.2. The predicted values for var(z_n) are shown in the last column of that table, and are seen to agree well with the experimentally obtained variances.
There remain several points of interest that could be investigated with the aid of the Monte Carlo method. A few of these are indicated below.

(i) The influence of k on var(z_n) could be studied.

(ii) Various stopping rules could be evaluated.

(iii) The influence of the choice of the initial region might be investigated.

(iv) The operating characteristics of the estimation procedure in the case where the true regression is not quadratic might be studied.

(v) Various modifications of the estimation procedure could be investigated. For example, we might take three observations at each stage of the sequential procedure, say at z_n − c_n, z_n, and z_n + c_n, and let z_n+1 be the estimate of the location of the maximum found by fitting a quadratic regression to these three points.
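For modification (v), the quadratic through three equally spaced points has a closed-form vertex, so no explicit regression computation is needed. A sketch (the function name is invented; c plays the role of the spacing c_n):

```python
def three_point_estimate(z, c, y_minus, y_mid, y_plus):
    """Abscissa of the maximum of the quadratic through the three
    points (z - c, y_minus), (z, y_mid), (z + c, y_plus)."""
    denom = y_plus - 2.0 * y_mid + y_minus   # ~ c^2 times the curvature
    return z - c * (y_plus - y_minus) / (2.0 * denom)

# Exact for a noise-free parabola with maximum at x = 3
f = lambda x: -(x - 3.0) ** 2 + 10.0
print(three_point_estimate(0.0, 1.0, f(-1.0), f(0.0), f(1.0)))  # → 3.0
```

With noisy observations the denominator can be small or of the wrong sign, so such a modification would presumably need a safeguard before z_n+1 is accepted.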
BIBLIOGRAPHY

[1] Anderson, R. L., "Recent Advances in Finding Best Operating Conditions," Journal of the American Statistical Association, XLVIII (1953), 789-798.

[2] Anscombe, F. J., "Fixed Sample Size Analysis of Sequential Observations," Biometrics, X (1954), 89-100.

[3] Bechhofer, Robert E., "A Single Sample Multiple Decision Procedure for Ranking Means of Normal Populations with Known Variances," Annals of Mathematical Statistics, XXV (1954), 16-39.

[4] Bechhofer, Robert E., Dunnett, Charles W., and Sobel, Milton, "A Two-Sample Multiple Decision Procedure for Ranking Means of Normal Populations with a Common Unknown Variance," Biometrika, XLI (1954), 170-176.

[5] Blum, J. R., "Multidimensional Stochastic Approximation Methods," Annals of Mathematical Statistics, XXV (1954), 737-744.

[6] Box, G. E. P., "The Exploration and Exploitation of Response Surfaces: Some General Considerations and Examples," Biometrics, X (1954), 16-60.

[7] Box, G. E. P., "Multi-factor Designs of First Order," Biometrika, XXXIX (1952), 49-57.

[8] Box, G. E. P., and Hunter, J. S., "A Confidence Region for the Solution of a Set of Simultaneous Equations with an Application to Experimental Design," Biometrika, XLI (1954), 190-199.

[9] Box, G. E. P., and Hunter, J. S., "The Exploration and Exploitation of Response Surfaces: III. The Experimental Designs," to be published.

[10] Box, G. E. P., and Wilson, K. B., "On the Experimental Attainment of Optimum Conditions," Journal of the Royal Statistical Society, Series B, XIII (1951), 1-45.

[11] Box, G. E. P., and Youle, P. V., "The Exploration and Exploitation of Response Surfaces: An Example of the Link between the Fitted Surface and the Basic Mechanism of the System," Biometrics, XI (1955), 287-323.

[12] Davies, Owen L., The Design and Analysis of Industrial Experiments, New York: Oliver and Boyd, 1954.

[13] Dixon, W. J., and Mood, A. M., "A Method for Obtaining and Analyzing Sensitivity Data," Journal of the American Statistical Association, XLIII (1948), 109-126.

[14] Eisenhart, Hastay, and Wallis, Techniques of Statistical Analysis, New York: McGraw-Hill, 1947.

[15] Fieller, E. C., "The Biological Standardization of Insulin," Journal of the Royal Statistical Society, Supplement, VII (1940), 2-54.

[16] Gardiner, D. A., Grandage, A. H. E., and Hader, R. J., "Some Third Order Rotatable Designs," Institute of Statistics Mimeo Series, No. 149, 1956.

[17] Hotelling, Harold, "Experimental Determination of the Maximum of a Function," Annals of Mathematical Statistics, XII (1941), 20-45.

[18] Kiefer, J., and Wolfowitz, J., "Stochastic Estimation of the Maximum of a Regression Function," Annals of Mathematical Statistics, XXIII (1952), 462-466.

[19] Plackett, R. L., and Burman, J. P., "The Design of Optimum Multifactor Experiments," Biometrika, XXXIII (1946), 305-325.

[20] Read, D. R., "The Design of Chemical Experiments," Biometrics, X (1954), 1-15.

[21] Robbins, Herbert, "Some Aspects of the Sequential Design of Experiments," Bulletin of the American Mathematical Society, LVIII (1952), 527-535.

[22] Robbins, Herbert, and Monro, Sutton, "A Stochastic Approximation Method," Annals of Mathematical Statistics, XXII (1951), 400-407.

[23] Satterthwaite, Franklin E., "Random Balance Experimental Designs."

[24] Wolfowitz, J., "On the Stochastic Approximation Method of Robbins and Monro," Annals of Mathematical Statistics, XXIII (1952), 457-461.