Meydrech, E.F. and Kupper, L.L.; (1972). "A new approach to mean square error estimation of resoponse surfaces."

This research is being partially supported by National Institutes of Health,
Division of General Medical Schmees Grant No. STOl GM00038, and by
Department of Health, Education, and Welfare, National Institute of
Environmental Health Sciences Grant No. STOl ES00133.
t
•
A NEW APPROACH TO MEAN SQUARE
ERROR ESTIMATION OF RESPONSE SURFACES
by
Edward F. Meydrech and Lawrence L. Kupper
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 840
•
SEPTEMBER 1972
ABSTRACT
Box and Draper (1959) considered the problem of choosing a
response surface design to minimize integrated mean square error J
when the true response n=~i~l + ~2~2' represented as a polynomial of
degree d
2
in p variables, is approximated by a polynomial
lower degree d
A
y=~iEl
of
A
~l
for which the coefficient vector El is the vector
l
of usual least squares estimates.
Based on motivation supplied by
Korts (1971) and others, the use of a generalized estimator of the
form El =
A
~~l'
where
~
constants, is proposed.
is a diagonal matrix: of appropriately chosen
The choice of
~
which minimizes J depends
unfortunately on the unknown elements of the parameter vectors
and
~2'
~l
However, when the parameter space can be restricted by
specifying bounds (however crude) for the elements of
~l
and
~2
relative
to 02/N, it becomes possible to determine a set of ~'s, each K in the
set depending only on the given bounds and on the choice of experimental
design and each K in the set making J smaller in value than when
(the Box-Draper case).
~=!
Within this set can be found that particular
choice of K which minimizes the maximum of J over the restricted
parameter space.
Analogous results are obtained when a modified
criterion J*, the expected value of J with respect to a prior distri•
bution for the parameters, is considered; in this case, the results
depend on the choice of prior only through its second moments.
important special cases (dl-l, d 2=2),
The
(d l -2, d 2-3, p-l) and (dl-l,
d =3, p=l) are considered in detail and it is seen that the use of this
2
generalized estimator provides the same flexibility as the approach of
ii
Karson, Manson t and Hader (1969) with regard to the choice of experimental
design.
,
A method is also provided for assessing the accuracy of the
restrictions placed on the parameter space.
TABLE OF CONTENTS
PAGE
LIST OF TABLES •
iii
LIST OF FIGURES •
iv
CHAPTER
I
1.1
1.2
-
II
··
·
39
····
···
·· ····
Quadratic-Cube Case
Linear-Cubic Case
····
SUMMARY AND SOME ESTIMATION CONSIDERATIONS
LIST OF REFERENCES
1
5
10
21
·
···
Univariate Case
Mu1tivariab1e Case.
1
10
CONSIDERATION OF J FOR THE CASES d =2, d =3, p=l
1
2
AND d =1, d 2=3, pml.
1
4.1
4.2
V
··
Univariate Case
Mu1tivariab1e Case.
·
··
CONSIDERATION OF J* WHEN d 1=1, d =2.
2
3.1
3.2
IV
··
CONSIDERATION OF J WHEN d =1, d =2
2
1
2.1
2.2
III
·············
Introduction. .
············
Discussion of New Approach.
··
··
GENERAL CONSIDERATIONS
......················
39
45
54
54
58
63
67
iii
LIST OF TABLES
PAGE
TABLE
and~2
2.1.1.
Best k1
for Given (M1 , M2).
2.2.1.
Best Values of k
5.1.
Values of ~2a~ Which Give the Same A for
= ~2M/(1+~2M)
and
A
Dif ferent Values of,
. •• 18
~2 . .
.31
A
\I
and a=. 05. •
. . . . . . . . .66
iv
LIST OF FIGURES
PAGE
FIGURE
•
2.1.1.
Positioning of the Optimal Value of k
Relative to (2.1.13). • • • • • • .
1
• • . • • • • . 15
CHAPTER I
•
GENERAL CONSIDERATIONS
•
1.1
Introduction
Response surface methodology is concerned with a functional
relationship
--
n
f(E;;8)
between a response nand p non-stochastic factors E;1,E;2, ••• ,E;p which
usually represent quantitative variables such as time, temperature, etc.
Geometrically, f(E;;8) represents a surface in a (p+l)-dimensional space
--
whose coordinates are the p E;'s and n.
assumed to depend on parameters
The shape of this surface can be
el ,8 2 , ••• ,8 r ,
the values of which are
usually unknown and must be estimated from experimental data typically
consisting of N measured values of n corresponding to N specified combinations of levels of the factors
E;'
-u
=
(E;l u ,E;2 u ,. .. ,E;pu ) ,
u"'l, 2, ••• , N.
The set of points {E; } N is called the experimental design.
-u u... l
Since the
nature of the variables whose levels are represented by the {E;iu} will
change from one application to another, it is useful to define the
experimental design in terms of "standardized" variables {x iu } of the
general form
2
1=1,2, .•. ,p, u=1,2, ... ,N,
X.
III
where
f
..
N
~i
~ L i.,iu and 5 i is a scale factor.
u=1
N
L x = 0,
u=l iu
is the origin of
Note that
i=L,2, ... ,p, so that the center of interest of the
~'s
the x's .
In practice, the form of the function
f(~;~)
and the usual procedure (see Box and Wilson (1951»
is generally unknown
is to assume that the
unknown response function can be represented at every point in
X,
some
specified region of interest in the standardized space, by a low order
polynomial.
The coefficients of this polynomial are then the unknown
parameters and can be estimated by standard least squares methods.
Such
polynomial response functions are usually easy to fit mathematically, but
are very untrustworthy when extrapolated to points outside the region of
experimentation.
The properties of polynomial graduating functions are influenced
by the choice of experimental design.
Initially, criteria for judging
the goodness of designs were largely concerned with variance considerations.
The question of bias due to inadequacy of the approximating
polynomial was given only minor attention.
In reality, of course, a
graduating function such as a polynomial will always fail, to some extent,
to represent the true relationship exactly.
In an effort to take "bias error" as well as "variance error"
into account, Box and Draper (1959, 1963) adopted mean square error
integrated over the region of interest X as a basic design criterion.
,
In particular, if y(~) denotes the value of the estimated response at
the point
~'
= (x l ,x 2 , •.• ,xp ) in X where this value is obtained by fitting
by least squares a polynomial of degree d l
(~l)
in x ,x , ••• ,xp ' and if
l 2
3
n(!) is the true response .( x given exactly by a polynomial of higher
degree d , then they
conc~rned
themselves with the problem of characterp+d
l
izing by choice of experimental design the (
) coefficients of y{~)
2
•
p
which minimize the multiple integral
(l.l.l)
This quantity J is the mean
square error at
~
averaged uniformly over
X and normalized with respect
to the number of observations and the experimental error variance 0 2 •
To illustrate the partition of J into a variance component and a bias
component, (l.l.l) can be written as
J
= V+B,
where
V
NU
= --2
o
f
X
A
Var[y(x)]dx
--
(1.1.2)
and
(1.1.3) .
The minimization of J by choice of design will depend upon the relative
magnitudes of the V and B contributions.
There are two reasons why one might consider fitting a polynomial
of lower degree when the true response might possibly be represented by
one of higher order.
In some instances, the data necessary for fitting
the higher order equation may not be available; for example, there may be
a physical limitation on the number of experimental units or cost considerations may be prohibitive.
Secondly, it is not always clear that the change
4
in integrated mean square error caused by elIminating B and altering V
will give smaller J than when fitting the lower order polynomial.
It
is possible that a sizeable increase in V may result when the higher
order equation is fitted.
•
Taking X to be some simple region such as a hypersphere, Box
and Draper (1959, 1963) considered the cases dl=l, d =2 and d =2, d =3
2
l
2
and were led to the somewhat unexpected conclusion that unless the V
contribution was many times larger than the B contribution, the optimal
experimental designs for minimizing J were remarkably close to those
obtained by ignoring V completely and minimizing B alone.
Draper and
Lawrence reached similar conclusions when the region of interest is taken
to be cuboidal (1965) and when the intensity of interest in the factor
space is represented, not by a uniform weighting over a spherical or
cuboidal region, but by a symmetric multivariate distribution weight
...
function (1967).
Motivated by the above work, Karson, Manson, and Hader (1969)
developed an alternative approach which entailed using a method of estimation aimed directly at minimizing B and then employing any additional
flexibility in the choice of design to minimize V.
With this procedure,
they were able to find designs with smaller J than those suggested by
Box and Draper.
We should emphasize that Karson, Manson, and Hader
compared their methodology only to the Box-Draper "all-bias" case (Le.,
to that special class of designs which minimize B with least squares
estimation).
•
This is not too objectionable since Box and Draper con-
cluded that these "all-bias" designs are usually reasonable to use.
However, the Karson-Manson-Hader procedure may not in general give smaller
J than the Box-Draper approach for other designs since comparison depends
5
on the values of the unknown coefficients.
Another drawback to their
"minimum bias estimator" is that, in some cases, its use requires that
p+d
2
one obtain least squares estimates of all the (p ) coefficients in
n(~).
1.2
Discussion of New Approach
Blight (1971) considered the problem of mean square error (MSE)
estimation using an estimator which was a constant multiple of an unbiased estimator of the parameter of interest, and he gave sufficient
conditions for uniform improvement (in terms of MSE) over unbiased estimation.
His general approach was motivated by the fact that such esti-
mators have been shown to be uniformly better than the ordinary unbiased
estimator of powers of the standard deviation in normal samples (e.g.,
see Markowitz (1968), .Stein (1964), and Stuart (1969».
= ~~+~
For the linear model!
with the standard assumptions, one
= ~'y
can show that the particular linear function b
MSE{b)
and
A
~
= ~'S02+(1-S'~)2~2
= (x'x)
............
-1
is of the form kB, where k
which minimizes
= ~2/[~2+(~,~)-l02]
x'y
is the best linear unbiased estimator
.............
of~.
In this
same setting, Korts (1971) demonstrated that the b which maximizes the
minimum over {~: I~I
this multiple being
o=
2
M} of Pr(~-£
2
b
2
"-
~£) is also a multiple of ~,
2[1+(l+20)~]-1, where
log
In the mu1tivariable case ......y
e
= X~+e,
-_ .....
o<
£
< M.
Korts obtained (under a mild restric-
- = e'y
-- ,.,which jointly
tion) the analogous result that the elements of b
satisfy his "closeness" criterion are the elements of
--
K~,
where K is a
-
6
specified diagonal matrix and
A
S=
,.."
,.."
-1
- -X'y.
(X'X)
.....
One can relate the implications of the above work to the problem
of minimizing integrated mean square error in the following way.
In the
setting of Section 1.1, let n(~) = ~i~l + ~i~2 and let y(~) be of the
form ~i21.
The vector ~1 is made up of terms required for the poly-
nomial of degree d l ; the vector
~2
consists of the additional higher
order terms required to represent the polynomial of degree d , while
2
and
~2
are the corresponding vectors of unknown coefficients.
~l
With this
framework in mind, we define a new estimator
A
' )-1, =
= ~ (~1~1
~ll
~~1'
b
-1
p+d
1
where K is a size ( p . ) diagonal matrix of appropriately chosen constants, ~1 is the matrix of values taken by the terms in
~1
experimental combinations of the independent variables, and
for the N
~
is the
column vector of the measured values y = n(x ) + e , u=1,2, ••• ,N. We
.
u
-u
u
make the usual assumptions that E(e )=0, E(e 2)=02, and E(e e ,)=0 for
u
u
u u
u+u' .
!5=!,
Note that in the special case when
our generalized estimator
A
21 is simply the least squares estimator
When y(~)
A
= ~i~~l'
~1
employed by Box and Draper.
then (1.1. 2) and (1.1. 3) take the respective
forms
v = NQ
J
X
' ( 'X )-1
~1~ ~1-1
!5 ~1 dx
(1.2.1)
and
NQ
B =J [~i~(~1+~2) - (~i~1~2~2)]2d~,
02 X
where
~2
is the counterpart of !1 when
"alias matrix"
~2
replaces
~1
(1.2.2)
and where the
7
has for its elements the quantities which measure the extent of bias in
A
~l
in accordance with the equation
E{B-1 ) = (3 1
+ ~~2.
Note that (1. 2 .,1) depends'! neither on the elements of (3' = {~i' ~i> nor
on
0
2
•
In general, the particular choice of
diagonal elements of
~)
~
(i.e., the choice of the
which minimizes the sum of (1.2.l) and (1.2.2)
will depend, unfortunately, on the unknown elements of the parameter
vector (3 and also on 0 2 •
However, if the parameter space can be
restricted by specifying bounds for the elements of (3 relative to 02/N,
it is possible (at least in all the cases we have considered) to determine a set of
~'s,
each
~
in the set depending only on the given bounds
and on the choice of experimental design and each
~
in the set making J
smaller in value than when K=I (the Box-Draper case).
be found that particular choice of
~
over the restricted parameter space.
Within this set can
which minimizes the maximum of J
Although the complexity of expres-
sions (1.2.l) and (1.2.2) makes this methodology difficult to implement
for general d , d , and p, this is no real problem since most of the
l
2
important special cases can be dealt with quite successfully.
For
example, the case dl=l, d 2-2 is considered in detail in Chapter II, and
the cases d =2, d -3, p-l and dl=l, d 2-3, p-l are discussed in Chapter IV.
2
l
A more common Bayesian approach would involve minimizing by choice
of
~
the quantity obtained by averaging J over a prior distribution for
the unknown parameters.
The introduction of such a prior distribution
8
is yet another way to base estimation on any a priori information which
11I;IY
Ill' <ivai Luble concerning the parameter values.
example, if sufficient knowledge of a parameter
As all important
B is
available so that
a constant M can be specified for which it is known that
one possible choice for a prior on
[-M,M].
B is
IBI
~ M, then
the uniform distribution on
The use of such a "vague" or "non-informative" prior provides
a formal way of expressing ignorance with regard to the value of the
parameter over: the range:permitted.
In other words, nothing is said
about the value of the parameter, except the basic fact that it is
restricted to lie within certain definite limits.
An excellent dis-
cussion of the philosophy involved in specifying prior distributions
for parameters can be found in the third chapter of Jeffreys (1961).
Now, in our particular situation, let
tribution for the elements of
B'
~
=
(B',
S')
~l
~2
g(~)
denote a prior dis-
with domain
B.
Then, our
second approach is concerned with minimizing
where V and B are given by (1.2.1) and (1.2.2).
This approach is not
only quite amenable to analysis, but also leads to meaningful results
whose interpretations do not necessitate an exact specification of the
prior distribution.
the case dl=l, d
a
2
2.
These advantages are illustrated in Chapter III for
It is interesting to note that this concept of
taking the expected value of J with respect to some prior distribution
is somewhat similar in motivation to the notion that Box and Draper
(1959) had of averaging J over all orthogonal rotations of the response
surface.
9
Another important advantage of our procedure over that of Box
and Draper is that it enjoys the same flexibility as the Karson-MansonHader technique with regard to the choice of experimental design.
This
additional flexibility arises because the use of our generalized estimator results in smaller values of J (or J*) than when
choice of design.
~=!
for any
Such "uniform" improvement is not unexpected in
,
view of the work by Blight and others cited earlier.
CHAPTER II
CONSIDERATION OF J WHEN dl=l, d =2
2
2.1
Univariate Case
The special
proble~
of how best to fit a straight line to a
possibly quadratic response has received considerable attention in the
literature.
For example, 'David and Arens (1959) considered two functions
of MSE as criteria for finding optimal locations for the independent
variable.
-e
Box and Draper and Karson, Manson, and Hader also illustrated
their approaches for the case dl=l, d 2=2, p=l.
For simplicity and without loss of generality, we take the region
of interest X to be the closed interval [-1,1].
We assume that the true
response is given by the quadratic expression
(2.1.1)
Our suggested estimator of n(x) takes the form
(2.1.2)
"-
61
N
N
I
u=l
I
x (y -y)/
x 2 are the usual least squares
u u
u=l u
estimates of intercept and slope calculated via N pairs of data points
where 13 0
Y and
=
N
I
(x ,y ), u=l,2, ••• ,N, for which
x = O. In words, we are considering
u u
u=l u
the situation of fitting a straight line when the true response, while
roughly linear, may contain a quadratic component.
Emphasis is placed
on attaining an "optimal" straight line fit with a small, fixed total
11
\lumber of observations N, rather than on detecting departures from
linearity.
For the latter purpose, more than two abscissae would be
needed.
Now, let ~
j=2,3,4.
1
j
= -N
N
I1
.
x J denote the j-th moment of the design points,
u
u=
Then, since
-1
Q
f
=
1
dx = 2
-1
and
it follows from (1.1.2) that
N
V = ---
f1
20 2 -1
= ~
f
1
-1
A
Var[y(x)]dx
(k~+k~x2/~2)dx
(2.1.3)
Similarly, if we note that E(~O)
= SO+~2S2
and E(~I) = el+~3s2/~2' then
from (1.1.3) we have
N
B = ---
1
/ {E[y(x)]-n(x)}2dx
20 2 -1
N
= --20 2
/1
A
2
A
2
[kOE(eO)+klxE(e1)-(eO+elx+e2x )] dx
-1
1
• -!- / [(k O-l)SO+(k1 -l)e 1x + (kO~2+klX~3/~2-x2)e2]2dx
20 2 -1
= [kO(aO+~2a2)-(aO+a2/3)]2
+
~[kl(al+~3a2/~2)-al]2
+
4a~/45,
(2.1.4)
12
where a
i
= ~i/(a/~), i=0,1,2.
The sampling error in y(x) is propor-
tional to a/~ so that a~, for example, may be interpreted as being a
"standardized" measure of the strength of the linear component of T)(x).
v=
1 + 1/3J.l
(2.1.5)
2
and
(2.1.6)
which correspond exactly to the expressions on pages 627-8 of Box and
Draper (1959).
smallest when J.l
Clearly, (2.1.5) is minimized when J.l =1 and (2.1.6) is
2
2 = 1/3 and J.l 3
Draper criterion J
bd
= O.
Thus, the minimization of the Box-
, the sum of (2.1.5) and (2.1.6), by choice of
design clearly depends on the relative magnitudes of the variance and
bias contributions.
The values of k
O
and k
l
which minimize (2.1.4) can be determined
by inspection, and it is a straightforward exercise to show that J, the
sum of (2.1.3) and (Z.1.4), is minimized when
k
o=
(aO+J.lZa Z) (a O+a Z!3)
1+ (aO+J.lZa Z)
2
(Z.1.7)
and
k
1
=
J.l2al(al+J.l3a2!J.l2)
1+J.l2(al+J.l3aZ!J.l2)
(2.1.8)
2
For that rare occasion when the {ail are essentially known, the above
results will be useful.
known.
Almost invariably, however, the {ail are un-
This more realistic situation is the one of primary interest
13
tll us <lnd will
bl~
discussed In conJunctlon wlth whnt follows.
Now, the value of J for k
O
and k
1
given by (2.1.7) and (2.1.H)
can be shown to be less than the corresponding value when k =k =1 (the
o
1
Box-Draper case) for any choice of design for which ].13=0 (the optimal
third moment value when kO=kl=l).
In fact, for any design with ].13=0
(e.g., one symmetric about zero), it is true that (J-J
for values of k
O
and k
l
bd
other than (2.1.7) and (2.1.8).
) is negative
To prove this
result and to get the most from its implications, it is much easier to
take kO=l.
Note that the structure of J permits us to treat k
separately.
From (2.1.2) with this alteration, we then have
O
and k
l
(2.1.9)
and, for ].13=0, the sum of (2.1.3) and (2.1.4) becomes
(2.1.10)
J
Since J
bd
is (2.1.10) with kl=l, it follows that
(2.1.11)
which would involve a
that (J-J
bd
O
if k
O
were included.
) will be negative for any k
l
From (2.1.11), it follows
satisfying
(].12a~-1)
(1+].12a~)
(2.1.12)
Note that ].12a~/(1+].12a~) is (2.1.8) with ].13-0.
Unfortunately, the lower bound in (2.1.12) depends on a~.
To
circumvent this obstacle, we suppose that a constant M can be specified
l
14
for which it is known that a~ < Ml .
In any practical situation, such an
upper bound will always exist; and, since our approach permits the
experimenter to be as conservative as he wishes in his specification
of M , essentially no loss in either generality or flexibility is
l
suffered by restricting the parameter space in this simple way.
Of
course, any knowledge the experimenter has gained from prior experience
or from keen insight can be put to good use.
Some other methods for
dealing with the problem of not knowingaf will be discussed in
Chapter S.
With af < Ml , it follows from the monotonicity in af of the lower
bound of (2.1.12) that
Thus, any k
l
satisfying
(2.1.13)
will also satisfy (2.1.12).
1/(1+~2Ml)'
The width of the interval (2.1.13) is
which is never less than l/(l+Ml ).
For small values of Ml , (~2Ml-l)/(1+~2M1) may be negative; for
large values of M , it is clear that any k satisfying (2.1.13) will be
l
l
quite close to unity.
for any
~2'
It is important to mention that (2.1.13) holds
a phenomenon which will provide us with a great deal of
flexibility in the choice of design.
Also, we note that ~2af/(l+~2af),
the optimal choice for kl,is never greater than the upper bound in
15
(2.1.13) but is possibly less than the lower bound.
Figure 2.1.1
illustrates the two possible locations for the optimal value of k
If M is chosen so that a~ ~ M /2, then the
l
1
relative to (2.1.13).
optimal value of k
l
l
will be contained in (2.1.13).
Careful specifica-
tion of the restricted parameter space can help to insure that this
happens.
_ _ a~ < M /2
a~ ~
o
1
M/2
...............................................
"""-
", ,
" ,,
I
/
/
'I
1/
/
"-
"-
"
.......
........
-----
,/
~/
/
/
I
/
-"
1
FIGURE 2.1.1.
POSITIONING OF THE OPTIMAL VALUE OF
k l RELATIVE TO (2.1.13)
16
Thus, for a~ < M , any k 1 satisfying (2.1.13) will result in a
l
smaller integrated MSE than when kl=l.
One possible explanation for
such "uniform" improvement is that in our approach we are willing to
tolerate a "structural" change in the bias term in order to achieve a
reduction in the variance contribution.
This flexibility in the choice of k
~2)
l
(and also in the choice of
can be used to satisfy other meaningful design criteria as well.
More specifically, that particular k
1
which minimizes the maximum of
J over the restricted parameter space {(a~, a~): a~ < M , a~ < M } can
l
be easily determined.
2
Clearly, (2.1.10) is a maximum when a~=Ml and
a~=M~, so that the k l minimiZing this maximum is ~2Ml/(1+~2Ml)' which
is exactly the upper bound in (2.1.13).
The corresponding optimal choice for
~2
is the appropriate root
of the polynomial obtained by differentiating (2.1.10) with a~=Ml'
a~=M2' and with k
l = ~2Ml/(1+~2Ml).
3
This polynomial is
.
L aj~~'
j=O
where
a2
=
a1 -
and
By writing the polynomial in the form
(2.1.14)
17
we can see that it has exactly one positive root.
Although no explicit
expression for this root has been found, it is clear from (Z.1.14) that
it is greater than 1/3, is monotonically increasing in M , and is
l
mnnotonically decreasing in MZ'
If we take VZ=l in (Z.1.14) and set
the result less than or equal to zero, the resulting inequality
gives the relationship between the values of M and M for which the
Z
l
root is at least unity, in which case we take VZ=l; if not (e.g., when
M
Z
~
1/4), the best choice for Vz is the value of the root in the open
interval (1/3, 1).
Table 2.1.1 gives the best choices for k
tions of values of Ml and MZ '
l
and V for some combina-
z
We note that VZMl/(l+VZM ) is monotonically
l
increasing in M and monotonically decreasing in M •
l
2
These are reason-
able results since a large value of M indicates a strong linear effect
l
which should be associated with a large value of k •
l
Similarly, a large
value of M would indicate a strong quadratic effect and it would be
Z
logical to have the linear effect suppressed somewhat via a smaller value
It is of interest to compare our estimator (2.1.9) to the alternative procedure suggested by Karson, Manson, and Hader.
For the particular
problem under study, their estimator in the "full rank" case is
o
y(x) •
(I, x) [ :
1
a = (1,
x)AS
~~
18
TABLE 2.1.1.
(M ,M )
1 2
e
(.25, .25)
(.25,.5)
(.25,1)
(.25,2)
(.25,4)
(.25,6)
(.25,8)
(.25,10)
(.25,13)
(.25,16)
(.25,19)
(.25,22)
(.25,25)
( .5, .25)
(.5,.5)
(.5,1)
(.5,2)
(.5,4)
(.5,6)
(.5,8)
(.5,10)
(.5,13)
(.5,16)
(.5,19)
(.5,22)
(.5,25)
(1,.25)
(1,.5)
(1,1)
(1,2)
(1,4)
(1,6)
(1,8)
(1,10)
(1,13)
(1,16)
(1,19)
(1,22)
(1,25)
(2, .25)
(2, .5)
(2,1)
(2,2)
k
1
.084
.081
.079
.078
.077
.077
.077
.077
.077
.077
.077
.077
.077
.182
.164
.154
.148
.146
.145
.144
.144
.144
.144
.143
.143
.143
.373
.326
.294
.274
.263
.259
.256
.255
.254
.253
.253
.252
.252
.602
.548
.500
.462
BEST k1 AND
~2
.368
.351
.342
.338
.336
.335
.334
.334
.334
.334
.334
.334
.334
.445
.392
.363
.348
.341
.338
.337
.336
.336
.335
.335
.335
.335
.595
.485
.416
.377
.356
.349
.345
.343
.341
.339
.338
.338
.337
.756
.606
.500
.430·
~2
FOR GIVEN (M ,M )
1 2
(M ,M )
1 2
(2,4)
(2,6)
(2,8)
(2,10)
(2,13)
(2,16)
(2,19)
(2,22)
(2,25)
(4,.25)
(4, .5)
(4,1)
(4,2)
(4,4)
(4,6)
(4,8)
(4,10)
(4,13)
(4,16)
(4,19)
(4,22)
(4,25)
(6, .25)
(6, .5)
(6,1)
(6,2)
(6,4)
(6,6)
(6,8)
(6,10)
(6,13)
(6,16)
(6,19)
(6,22)
(6,25)
(8, .25)
(8, .5)
(8,1)
(8,2)
(8,4)
(8,6)
(8,8)
(8,10)
k
1
.436
.425
.420
.416
.413
.410
.409
.408
.407
.776
.737
.698
.661
.630
.615
.606
.601
.595
.591
.588
.586
.585
.845
.816
.785
.755
.727
.714
.705
.699
.693
.689
.686
.684
.682
.. 882
.859
.834
.809
.785
.773
.765
.760
~2
.386
.370
.361
.356
.351
.348
.346
.344
.343
.867
.702
.577
.487
.425
.399
.385
.376
.367
.361
.357
.354
.352
.909
.739
.610
.514
.445
.415
.399
.388
.377
.370
.364
.361
.357
.931
.760
.628
.529
.457
.425
.407
.395
19
TABLE 2.1.1. (Continued)
(M ,M )
l 2
e
(8,13)
(8,16)
(8,19)
(8,22)
(8,25)
(10,.25)
(10,.5)
(10,1)
(10,2)
(10,4)
(10,6)
(10,8)
(10,10)
(10,13)
(10,16)
(10,19)
(10,22)
(10,25)
(13, .25)
(13, .5)
(13,1)
(13 ,2)
(13,4)
(13,6)
(13,8)
(13,10)
(13,13)
(13,16)
(13,19)
(13,22)
(13,25)
(16,.25)
(16,.5)
(16,1)
(16,2)
(16,4)
(16,6)
(16,8)
(16,10)
(16,13)
(16,16)
(16,19)
k
l
]J2
.754
.750
.747
.745
.743
.904
.885
.865
.843
.823
.812
.805
.800
.795
.791
.788
.786
.785
.926
.911
.894
.877
.860
.851
.845
.840
.836
.833
.830
.828
.827
.939
.927
.913
.899
.884
.876
.871
.867
.863
.860
.858
.383
.375
.369
.365
.362
.945
.772
.639
.538
.464
.432
.413
.400
.387
.379
.373
.368
.364
.957
.784
.649
.547
.472
.438
.418
.405
.392
.383
.376
.371
.367
.965
.791
.656
.553
.478
.442
.422
.409
.395
.385
.378
(M ,M )
1 2
(16,22)
(16,25)
(19,.25)
(19,.5)
(19,1)
(19,2)
(19,4)
(19,6)
(19,8)
(19,10)
(19,13)
(19,16)
(19,19)
(19,22)
(19,25)
(22,.25)
(22,.5)
(22,1)
(22,2)
(22,4)
(22,6)
(22,8)
(22,10)
(22,13)
(22,16)
(22,19)
(22,22)
(22,25)
(25,.25)
(25,.5)
(25,1)
(25,2)
(25,4)
(25,6)
(25,8)
(25,10)
(25,13)
(25,16)
(25,19)
(25,22)
(25,25)
kl
]J2
.857
.855
.949
.938
.926
.914
.901
.894
.890
.887
.883
.880
.878
.877
.876
.955
.946
.936
.925
.914
.908
.904
.901
.898
.895
.894
.892
.891
.961
.953
.943
.934
.924
.918
.915
.912
.909
.907
.905
.904
.903
.373
.369
.970
.796
.661
.557
.480
.445
.425
.411
.397
.387
.380
.375
.371
.974
.800
.664
.560
.483
.448
.427
.413
.398
.389
.382
.376
.372
.977
.803
.667
.563
.485
.449
.428
.414
.400
.390
.383
.377
.375
20
where
a = (X'X)-lX'y
- - --
is the vector of ordinary least squares estimates
of all the parameters in (2.1.1).
moments (with
~3=0),
When expressed in terms of design
their integrated MSE takes the form
(2.1.15)
Here, it is required that (~4~~~) > 0, which necessitates taking measurements at no less than three distinct abscissae.
Since their pro-
cedure minimizes B for any choice of design, their minimum B (=4a~/45)
is independent of
~2
and
~4'
The difference between (2.1.10) and (2.1.15) is
(J-J
kmh ) =
[(k~-1)/3~2 + ; a~(k1-1)2]
(2.1.16)
+ (~2- 1/3)2[a~-(~4-~~)-1].
Thus, when a~ < M and a~ < M2 , (2.1.16) is negative for any k
1
1
satisfying (2.1.13) for which the design satisfies
(2.1.17)
As an illustration, suppose we agree to take n/2 observations at
each of the two points x
O<n<N.
Since ~4 = ~4p
= ±~,
= ~2~2
0<~~1,
for p
and (N-n) observations at x = 0,
= n/N,
we have (~4-~~)
= ~2(~2_~2)'
t denotes the value of our best choice for ~2 (determined as described
t
earlier), then we naturally require ~2p = ~2' so that ~2 must satisfy
If
~2
or, equivalently,
21
Since we wish
~2
to be as close to
~~
as possible so that (2.1.16) is
as negative as possible, it is clear that the best design specifies
taking only one observation at x=O if N is odd and only two if N is
even.
We note that an optimal Karson-Manson-Hader design may not
satisfy (2.1.17) depending on the value of M •
2
For example, in case
(a) on pages 466-9 of their Technometrics article, they considered
fitting a three-point design with observations at 0 and
optimal value of
~
is
1:273
±~.
Their
2~4
so that (~4-~~) = --9-- = 8/81, which satisfies
(2.1.17) for M no greater than 81/8.
2
In general, any fair comparison
among the three alternatives will depend on a~.
mention that in the "less than full rank" case
It is important to
(I t~1
= 0) i t is
sufficient that the Karson-Manson-Hader "minimum bias estimator" be
identical to the Box-Draper least squares estimator with an "all-bias"
design in order for
~
to be estimable.
Briefly summarizing our findings, if upper bounds (no matter how
crude) can be specified for a~ and a~, then the judicious use of our
generalized estimator (2.1.9) leads to smaller integrated MSE than is
achieved either by Box-Draper or by Karson-Manson-Hader, and it results
in the minimization of the maximum integrated MSE over the restricted
parameter space.
We feel that these are strong recommendations in
favor of our approach.
2.2
Multivariable Case
The selection of the region of interest X is an important con-
sideration in the multivariable case.
Motivated by the strategy of
22
sequential exploration proposed by Box and Wilson (1951), Box and
Draper felt that designs which symmetrically generate information would
be reasonable to use.
They interpreted "symmetric" to mean that the
region X should be a sphere and they consider the problem of approximating over this spherical region a quadratic relationship by a plane
in x l ,x 2 , •.• ,xp •
Draper and Lawrence (1965) thought it was important
to include the "corners" omitted by a sphere and so worked with a
cuboidal region of interest in the above setting.
We note that the
closed interval [-1, 1] is a special case of both types of regions.
We shall first consider a cuboidal region of interest X defined
by -1
~
xi
~
1, i=1,2, ••• ,p.
The true response is assumed to be given
by the quadratic expression
n(!) = (SO+Slxl+···+Spxp ) + (Sllxf+S22x~+ ••• +Sppx;
+ S12xlx2+Sl3xlx3+···+Sp_l,pxp_lxp)
= ~i~l + ~i~2'
(2.2.1)
where
and
Our suggested estimator of
n(~)
takes the form
23
A
=
~i ~ ~l'
(2.2.2)
with
-and
where
is a (p+1)xN matrix with xl'
- u = (1,x1 u ,x 2u , •.• ,xpu ) and ¥.-_ is the column
vector of the observed responses yu = n(x-u )+e u , u=1,2, ..• ,N.
where
is the (p+1) x p(p+1)/2 "alias matrix" and
x'
-2 =
with x- ' u = (x12 u ,x 22 u ,
2
•••
2
,xpu
,xl ux 2u ,xl ux u , ••• ,xp- 1 ,uxpu ).
3
Since
it follows from (1.1.2) that
Then,
24
(2.2.3)
•
where "tr" denotes the trace operator,
~11 = Q
f
X
C= ;
~i~l'
and
~l~i dx.
(2.2.4)
It is to be understood that the integration in (2.2.4) is performed
"e1ementwise" with respect to the entries in the matrix
~l~i'
With the appropriate substitutions, we can write (1.1.3) as
•
- 2~2~21(~~1+~~2-~1)+g2~22~2'
(Z.Z.5)
where
~21 = Q
f
X
,
(Z.Z.6)
,
(2.2.7)
~2~1 d~,
~22 = Q fX !2!Z d!,
Q:1 =
,
aIN ~1'
and
~
IN
~2 "'-6
a -Z'
25
The entries in
~l
and
gz measure the "size" of the elements of
~2' respectively, relative to the sampling error
~l
and
(o/;N).
For the cuboidal region of interest defined earlier,
n- l
f
-
-
dx
X
= 2P •
And, by performing the necessary integrations over the hypercube, we have
0'
~11 =
~21
e
[:
-- [:p
1
3
-
II
]
(2.2.8)
3-p
:] .
(2.2.9)
and
~I
~22
1
-o
+1 l'
s-p -p-p
= "9
o
(2.2.10)
I
-p (p-1)
2
where 1 is a (p x1) column vector of ones and I is the identity matrix
-p
-p
of order p.
1 N
1 N
Let ~ij = N L xi x. and ~hij = N L ~uXiuXju' 1 ~ h,i,j ~ p.
u=l
u JU
u=l
With (2.2.8), (2.2.9), and (2.2.10), it can be shown from (2.2.3) and
(2.2.5) that the elements of J - V+B are
(2.2.11)
and
26
+
+
1t=lr k t (k t -1)at i=lr j=ir ai]' h=lr cgh~hi]'
t [~g i~l j~i ~l cgh~hij]2
4
+ 45
where c
ij
all
aij
r
r
2 + 1p-1
\
2
aii
9 l
a i ]·,
i=l
i=l j=i+1
(2.2.12)
1
denotes the i,j-th element of C- .
i=1,2, .•. ,p,
= 1
+1
r
c
ii
(2.2.13)
3 i=l
and
B = Bdt =
[ I! ~ijaij
i=l j=i
+
-;
!
i=l
~ ~l[i~l j~iaij
2
a ii ]
r cgh~hij]2
h=l
(2.2.14)
The Draper-Lawrence criterion J
dt
is the sum of (2.2.13) and (2.2.14).
The basic conclusions concerning the minimization of integrated
MSE by choice of experimental design when the region of interest is
cuboidal are consistent with those for spherical regions.
That is
27
(see Box-Draper (1959, p. 639),
Jd~
is minimized for all a ij and c
moments
1i)
~hij
ij .
when the third
of the design are chosen to be zero.
Vdt alone is minimized when
~hij=O, ~ij=O,
and ~ii=l
for all h, i, and j.
iii)
Bdt alone is minimized when ~hij=O, ~ij=O, and ~ii=1/3
for all h, i, and j.
iv)
If values of a
obtained by averaging
ij
Jd~
over all
orthogonal rotations of the response surface (see
Appendix 2 of Box-Draper (1959»
are used, then J
dt
is minimized when ~hij=O, ~ij=O and ~ii=~2 (say) for
all h, i, and j.
Under these conditions, (2.2.13) and (2.2.14) become
(2.2.15)
and
= [(~2-1/3)2+4/45]
~
L a i2 .
i=l
+ 2(~2-1/3)2
1
! a
9 i=l j=i+1 ij
+!
We can easily see that
(2.2.l6), so that J
dt
P~l
2
~2=1
p-1
L
L a a
i=l j=i+1 ii jj
•
(2.2.16)
minimizes (2.2.15) and
is smallest for some value of
1 (the particular value of
~2
~
will depend on
~l
and
~2=1/3
~2
minimizes
between 1/3 and
~2)'
As in Section 2.1, further progress will be made much easier
by taking k
D
O
1; and, in order to make valid comparisons with Draper and
Lawrence, we shall also take
~hij=O, ~~j=O,
and
~ii=~2
for all h, i,
28
and j.
Our estimator then becomes
(2.2.17)
and the sum of (2.2.11) and (2.2.12) takes the form
J
4
+ 45
!
i=l
2
aii
P-1!
+.!. \
9
l
i=l j=i+1
('12
~ij
•
(2.2.18)
It is then a simple exercise to show that (2.2.18) is minimized when
i=1,2, ... ,p.
The difference between (2.2.18) and
Jd~'
(2.2.19)
the sum of (2.2.15) and
(2.2.16) is given by
(2.2.20)
Because of the structure of (2.2.20), it is clear that we can consider
each of the p terms in the sum separately, and it is crucial to note
that each of these terms is exactly of the form (2.1.11).
Thus, if
we are sure that a~ < Mi for ~ i, then we can proceed exactly as in
the univariate case and conclude that the i-th term in (2.2.20) is
negative for any k
i
satisfying
(2.2.21)
29
j
r
110
prior kllowh~dge concerning (Y.~ 16 available, we must content our-
selves with taking ki=l.
The implications here are clear:
if we can
specify an upper bound for at least ~ of the a~, then (2.2.18) can be
made smaller in value than Jd~.
Of course, the more {a~} we have
knowledge of, the better we can do.
As in the univariate case, the flexibility in the choice of k.
1
evidenced by (2.2.21) can be used to satisfy a minimax criterion.
(2.2.18) is maximized at a~=Mi for any {aj} (j;i) and for any {a
{a~j}' it follows that the value of k
i
Since
ii
} and
which minimizes this maximum is
given by
(2.2.22)
This value is the upper bound in (2.2.21) and represents quite a
reasonable choice for a particular value of k
i
in (2.2.21).
By inspection of (2.2.18), we see that the minimax solution for
~2 requires that we have prior knowledge concerning all of the {a~} and
--
the {a
for
ii
~2
}.
1
If such knowledge is not available, then a reasonable choice
is 1/3, the value which minimizes B.
In this case, we would be
comparing our procedure with that of Box and Draper only in the "allbias" situation; this is exactly the comparison made by Karson, Manson,
and Hader.
Now, if we can specify that a~ < Mi and laill < Mii for ~~~ i,
it is then possible to use this information to find
for
~2.
The appropriate choice for
~2
II
minimax Holutlun
is the only positive root of the
polynomial equation that results from differentiating (2.2.18) with
a{=Mi , aii=Mii , and k i
= ~2Mi/(1+~2Mi)'
i=1,2, ••• ,p.
This polynomial
30
equatiun Is oE the form
1 LP [M
1+ 1M
(~2-1/3) = 6M2
~2 i
o 1=1
]2
(2.2.23)
'
p
r J.
where M =
M'i' The complexity of this expression dictates that the
O i=l
root must be found numerically (we have used Newton's method). It is
easy to see by inspection of (2.2.23) that we can expect only one root
greater than 1/3.
the {Mi }
As in the univariate case, a relationship between
and M exists which can be used to tell whether this positive
O
root will exceed unity.
If the {M } and M satisfy
i
O
4M~ ~ i=lr [l:~J.']
(2.2.24)
2
then the root is at least unity and we take
~2=1.
If (2.2.24) does
not hold (e.g., when M ~ 1]p/2) , we use for ~2 the value of the root in
O
the open interval (1/3, 1).
We should point out that by taking
~ij=O
for all i and j, we have eliminated the {a } from consideration.
ij
Since the mu1tivariable extention of the Karson-Manson-Hader
approach has not been published at this time, a comparison with their
work will not be made in this section.
We would expect such a compari-
son to produce results analogous to those found for the univariate
case.
Table 2.2.1 gives the best values for the {ki}i~l and ~2 when
p-2,3,4,S,7,10.
In order to look at a wide range of p values and still
maintain reasonable table size, we consider only the special case Mi-M,
i=1,2, •• "p, which implies that
ki·k·~2M/(1+~2M).
Not unreasonably,
31
TABLE 2.2. L
BEST VALUES OF k = ].12M/ (1+].12M) AND ].12
p=2
(M,M )
O
(.5,.5)
(.5,2)
(.5,5)
(.5,10)
(.5,15)
(.5,25)
(2,.5)
(2,2)
(2,5)
(2,10)
(2,15)
(2,25)
(5,.5)
(5,2)
(5,5)
(5,10)
(5,15)
(5,25)
(10,.5)
(10,2)
(10,5)
(10,10)
(10,15)
(10,25)
(15, .5)
(15,2)
(15,5)
(15,10)
(15,15)
(15,25)
(25,.5)
(25,2)
(25,5)
(25,10)
(25,15)
(25,25)
p=3
k
].12
k
].12
.213
.148
.144
.143
.143
.143
.657
.462
.413
.403
.402
.401
.833
.715
.652
.633
.629
.626
.901
.843
.796
.778
.773
.771
.938
.892
.856
.841
.837
.835
.962
.934
.909
.899
.896
.894
.540
.348
.336
.334
.334
.333
.959
.430
.352
.338
.336
.334
1.000
.502
.374
.345
.338
.335
1.000
.538
.389
.350
.341
.336
1.000
.551
.396
.352
.342
.337
1.000
.563
.402
.355
.343
.337
.238
.151
.144
.143
.143
.143
.667
.483
.419
.405
.402
.401
.833
.735
.661
.636
.630
.627
.909
.856
.804
.781
.775
.771
.938
.901
.862
.844
.839
.835
.962
.939
.914
.901
.897
.894
.624
.356
.337
.334
.334
.334
1.000
.467
.360
.340
.337
.335
1.000
.554
.391
.350
.341
.336
1.000
.593
.410
.357
.345
.338
1.000
.608
.418
.361
.346
.338
1.000
.620
.426
.364
.348
.339
32
TABLE 2.2.1 (Continued)
p=4
(M,M )
O
(.5,.5)
(.5,2)
(.5,5)
(.5,10)
(.5,15)
(.5,25)
(2, .5)
(2,2)
(2,5)
(2,10)
(2,15)
(2,25)
(5, .5)
(5,2)
(5,5)
(5,10)
(5,15)
(5,25)
(10,.5)
(10,2)
(10,5)
(10,10)
(10,15)
(10,25)
(15,.5)
(15,2)
(15,5)
(15,10)
(15,15)
(15,25)
(25,.5)
(25,2)
(25,5)
(25,10)
(25,15)
(25,25)
p=5
k
]12
k
]12
.259
.154
.145
.143
.143
.143
.667
.500
.424
.407
.403
.401
.833
.749
.670
.640
.632
.628
.909
.865
.811
.785
.777
.772
.938
.908
.868
.847
.840
.836
.962
.943
.918
.903
.898
.895
.699
.363
.338
.335
.334
.334
1.000
.500
.369
.343
.338
.335
1.000
.596
.406
.355
.343
.337
1.000
.639
.429
.364
.348
.339
1.000
.654
.438
.369
.350
.340
1.000
.667
.446
.373
.353
.341
.278
.156
.145
.143
.143
.143
.667
.515
.430
.408
.404
.401
.833
.760
.677
.643
.634
.628
.909
.871
.817
.788
.779
.773
.938
.912
.872
.849
.842
.837
.962
.946
.921
.905
.899
.895
.768
.370
.339
.335
.334
.334
1.000
.530
.377
.345
.339
.335
1.000
.633
.420
.360
.346
.338
1.000
.678
.445
.371
.352
.340
1.000
.694
.456
.376
.354
.341
1.000
.709
.464
.381
.357
.342
33
TABLE 2.2.1 (Continued)
=.
~_:
~
-
-~
.
:
__
:.~:--,:. .•
:.:
.0
.':
"-
-_.. :- _ :-~"~'-:'. ::.··l;:.;:·... .:::..:-w-
p-7
(M,M )
O
(.5,.5)
(.5,2)
(.5,5)
(.5,10)
(.5,15)
(.5,25)
(2,.5)
(2,2)
(2,5)
(2,10)
(2,15)
(2,25)
(5, .5)
(5,2)
(5,5)
(5,10)
(5,15)
(5,25)
(10,.5)
(10,2)
(l0,5)
(10,10)
(10,15)
(10,25)
(15, .5)
(15,2)
(15,5)
(15,10)
(15,15)
(15,25)
(25, .5)
(25,2)
(25,5)
(25,10)
(25,15)
(25,25)
=--==-.:r'.:':~
p=10
k
)J2
k
)J2
.308
.161
.146
.144
.143
.143
.667
.538
.440
.411
.405
.402
.833
.777
.690
.649
.637
.630
.909
.881
.826
.793
.782
.774
.938
.919
.879
.854
.844
.838
.962
.951
.925
.908
.901
.896
.892
.385
.342
.336
.334
.334
1.000
.582
.392
.350
.341
.336
1.000
.696
.445
.369
.350
.340
1.000
.743
.475
.383
.358
.343
1.000
.760
.486
.389
.362
.344
1.000
.774
.496
.395
.365
.346
.333
.169
.147
.144
.143
.143
.667
.565
.453
.416
.407
.403
.833
.795
.705
.657
.641
.631
.909
.892
.837
.800
.786
.776
.938
.927
.887
.859
.848
.840
.962
.955
.930
.912
.904
.898
1.000
.405
.346
.336
.335
.334
1.000
.649
.413
.356
.344
.337
1.000
.773
.478
.383
.357
.342
1.000
.823
.512
.400
.367
.347
1.000
.840
.524
.408
.372
.349
1.000
.854
.535
.414
.376
.351
34
k and P2 are monotonically increasing in M and monotonically decreasing
in M for given p.
O
Thus, strong linear effects indicated by large
values of M result in a value of k close to 1.
effects tend to suppress k.
Significant quadratic
For fixed M and M ' we see that k and P2
O
are monotonically increasing in p.
For the spherical region of interest defined by
p
L
x~ 2 1,
i=l
we make use of the relationship
in order to perform the integrations required to obtain the matrices
(2.2.4), (2.2.6), and (2.2.7).
integral and 6
i
represents the power of Xi in the particular matrix
element being integrated.
6 is odd.
i
The expression above is a Dirichlet
We note that the integral is zero if any
We then have that the volume of a sphere in p dimensions
is
-
dx •
and, it also follows directly that
[r(l!2)]P
r(t +
1)
35
0'
~11 ..
(2.2.25)
o
(p+2)-ll
-p
~21 = (p+2) -1 [ Q!p
__
(2.2.26)
and
Q
--
21 +1 l'
-p -P-P
~22
= (p+2)
-1
(p+4)
-1
o
(2.2.27)
I
-p(p-l)
2
With (2.2.25), (2.2.26) and (2.2.27), it can be shown that
(2.2.3) and (2.2.5) become
(2.2.28)
and
2
P
[P
I
i=l
PIP
-
+2(k O-1)aO ~ a ii +
P
i=l
+
+2 ~ k~(k~-l)a~
a ij
c
P
~=l
i=1 j=i
h=l
+
+2
k
a ij
c
P
g=l g i=l j=i
h-1
+
1
[2(P+2)
(p+2) 2 (p+4)
i-I
2
P
1
P [
l
kO
P
l
P
l
P
j-i
P
P
L
L
P
l
~ ~ijaij -
l
f a~i
gh
~hij
I
+2
a
P
i=1 ii
]2
~h
~hij
]2
(2.2.29)
f a~j-2[ i=l
I a ii] 2]
+ (P+2)!
i=l j=i+1
36
If we let kO=l and k =1 for all i, then
i
1
P
ii
V = Vbd = 1 + +2 L c
p
i=1
(2.2.30)
and
+
1
2
(p+2) (p+4)
[
2{p+2)
P
L a~i+{p+2) p-1
L
i=l
P
[ P
]
aI j - 2 i=Llaii
i=l j=i+l
L
2J
'
(2.2.3l)
which correspond exactly to the expressions on page 636 of Box and
Draper.
that J
By arguing as we did for a cuboidal region of interest, we see
bd
, the sum of (2.2.30) and (2.2.3l), is minimized when all the
{~hij} and {~ij} vanish and when the {~ii} are all equal (to ~2' say).
It then follows that
(2.2.32)
and
With the above assumptions, the sum of (2.2.28) and (2.2.29)
takes the form
37
p
p
P
2
1
2
1
I
2(
2 +
1 +
I k + p+2 i=l a i ki-l)
(p+2) (p+4) I a~i
1l 2 (P+2) i=l
i
i=l
J
+
[(~2
1 2
- p+2) -
2
(p+2) 2 (p+4)
HP
I
a
r
i=l ii
1
+
(P+2)(p+4)
p-l
p
I
i=l
I
a2 •
j=i+l ij
(2.2.34)
Since the terms in (2.2.34) involving k i differ from the analogous terms
in (2.2.18) only by a common multiple, it is clear that J is minimized
when
i=1,2, ••• ,p,
and that J is less than J
one i.
bd
(2.2.35)
if we can specify that a~ < M for at least
i
Also, we can argue as in the cuboidal case to specify the
minimax value of k
i
as being the upper bound in (2.2.21).
To determine the minimax choice for M2 , one must note that
in (2.2.34) may be positive or negative depending on the choice of 1l •
2
For laiil < M , it is clear that the maximum of (2.2.34) occurs at
ii
p
I
a
ii
= 0 (which can always be arranged by proper specification of the
(M)
i=l
1
p
{Mii }) if 11 2 < +2 1 +
+4' and at I a ii - MO if
P
P
i-I
11 2 > p~2 (1 + ~P;4J· The minimax :01ution is the value of 11 2 which
I a , namely 1l 2 • p;2 (1 +~ P;4)·
i-I ii
is not overly appealing, since the
equalizes J for these extremes of
This procedure for selecting 11
2
38
value of
~2
chosen depends only on p and not on any of the available
information concerning the {a~} and {a
use
lht.'
ii
}.
An alternative would be to
Box-Draper "all-bias" design value liZ
suspect the minimization of J to require that
==
~2
1/(p+2); but, since we
be in the open
1
interval (p+2' 1), our choice seems slightly more reasonable.
CHAPTER III
•
CONSIDERATION OF J* WHEN dl=l, d 2=2
3.1
Univariate Case
Consideration of the Bayesian J* criterion necessitates our
introducing a
and a •
2
a
i
pr~or
distribution for the unknown parameters a
o'
aI'
In particular, let gi(a i ) represent a prior distribution for
with domain B , i=0,1,2.
i
It is standard practice to assume that
Also, we shall take E(a ) = 0, since a priori we have no reason to
2
expect that a straight line model is inadequate.
Although we will not
do so, it would be an easy matter to incorporate any knowledge we might
have concerning experimental error, say in the form of a prior distribution for 1/0 2 (see Lindley (1964), chapter 5).
In this case, the
modification would be to put E(l/02) in place of 1/0 2 everywhere it
appears.
Our problem then becomes one of choosing k
O
and k
l
to minimize
J* ... V* + B*
(3.1.1)
V* • V • k~ + k~/3~2
(3.1.2)
where
from (2.1.3), and, for B given by (2.1.4),
40
•
(3.1. 3)
Here, Y = E(a~) is the average over the prior on
i
~
measure of the magnitude of
S1
S.
~
of the standardized
introduced in Section 2.1.
Clearly, J*
depends on the prior distribution only through its second moments.
is worth noting that (3.1.3) is minimized when kO=kl=l,
~3=0,
~2=1/3,
It
and
so that consideration of the B* criterion alone leads one back to
the results of Box and Draper concerning the minimization of integrated
squared bias.
By differentiating the sum of (3.1.2) and (3.1.3) with respect
to
~3'
k ' and k l and then solving the resulting set of homogeneous
O
equations, we find that J* is minimized when
of k
O
and k
l
~3=0
and when the values
are given by
(3.1.4)
and
(3.1.5)
For any choice of
~2 ~
optimal values of k
J* when k
O
O
1/3 (and for
and k
l
~3·0),
the value of J* for these
can be shown to be less than the value of
and/or k 1 equals one.
41
The value of \.1 2 which minimizes J* can be found as follows.
We
may wr He the sum of (3.1. 2) and (3.1. 3) for \.1 =0 as
3
•
= = -
•
where k
O
and k
l
are given by (3.1.4) and (3.1.5).
Thus, we can
equivalently find the value of \.1 2 which maximizes f(\.I2)=f O(\.I2)+f l (\.I2).
Now, f l (\.I2) is monotonically increasing in \.12' and fO(\.IZ) has a simple
1
maximum at \.1 2 = 3(l+1/Y O).
Thus, if YO
~
lIZ, so that sampling error
is more than twice the size of the "variation" in B ' we should take
O
\.1 2=1 (ko is asserting itself herel).
When YO > 1/2, the best choice
for \.IZ must be obtained by solving
in conjunction with (3.1.4) and (3.1.5).
This results in the fourth
degree polynomial equation
(3.1.6)
where
•
and
42
From the structure of these coefficients, it follows from Descartes'
...
Rule of Signs (see MacDuffee
(1954), p. 60) that for YO > 1/2 the
above polynomial has at most one positive root.
Since
f'(~2) <
0
1
for 0 ~ ~2 ~ 3(l+l/yO) and since f'(~2) > 0 for ~2 large enough,
f'(~2)
= 0 has exactly one positive root.
than or equal to one, we take
~2
~2=1;
If this root is greater
otherwise, the optimal choice for
1
is the value of the root in the open interval (3
1
+~,
1).
YO
The above methodology requires that the y's be known; e.g.,
when gi<Si) is the uniform distribution on [-Mi , Mil, then specification of Yi = NM~/3a2 necessitates knowing the value of Mi/a.
-
y's are unknown, we can proceed as in Section 2.1.
When the
For the most part,
only results will be presented since the analytical development closely
\
parallels that given before, the sole exception being that now consideration is given to both k
O
and k 1 •
If Jtd denotes the sum of (3.1.2) and (3.1.3) when k =k =1, then
O 1
for
~3=0
we have
(3.1.7)
•
For
~2 ~
any k
O
1/3, the first bracketed expression in (3.1.7) is negative for
such that
(YO+~2Y2/3)-[1+Y2~2(~2-l/3)]
1+YO+ll~Y2
(3.1.8)
43
the second bracketed expression is negative for any k
1
for which
(3.1.9)
Now, if Yi < M!, i=0,1,2, then, for
~2 ~
1/3, any k O satisfying
(3.1.10)
also satisfies (3.1.8); and similarly, any k
1
satisfying
(3.1.11)
also satisfies (3.1.9).
We note that (3.1.4) satisfies (3.1.8) but not
necessarily (3.1.10), and that (3.1.5) satisfies (3.1.9) but not
necessarily (3.1.11).
Now, the particular choices of k O' k 1 , and
~2
which minimize
the maximum of J* over the restricted y-space are seen to be the values
of (3.1.4), (3.1.5), and the appropriate root
are obtained by using
Mt
~~
(say) of (3.1.6) which
for Yi everywhere it appears.
Here, one minor
drawback to working with k O is that (M~+~2M~/3)/(1+M~+~~~) satisfies
..
(3.1.10) only i f
(3.1.12)
If
M~
is too large,
greater than
1
~~
may not satisfy (3.1.12) since it is always
3(1+1/M~).
If this happens, the most reasonable tactic is
44
e
\.
to use that value of
~2
for which equality holds in (3.1.12).
Of course,
one can avoid this slight complication altogether by setting kO=l in
(3.1.1) to begin with, in which case the methodology will be identical
to that in Section 2.1.
This is because (3.1.7) with kO=l becomes
(J*-J* )
bd
(3.1.13)
which is exactly (2.1.11) with Y1 in place of a~.
It would be of interest to examine the methodology of Karson,
Manson, and Hader with regard to the J* criterion.
would require minimizing, via
Co
Their approach
and c l ' the quantity
\
(3.1.14)
Clearly, (3.1.14) is minimized at cO·E(B o) and cl=E(B l ).
V~h
Thus, with
as the first three terms of (2.1.14), the Karson-Manson-Hader
criterion
J~h
becomes
(3.1.15)
where Vi •
N
~
Var(Si) for i-O,l,2.
The difference between J* and
e
\
Note that
J~h
V2~Y2
since E(S2)-0.
is given by
45
(J*-J~mh)
=
(k~-1)+(k~-1)/3~2-(~2-1/3)2/(114-~~)
+ V (k -l)2_VO + jrl(k -1)2_ V / 3
1
O O
l
(3.1.16)
If we do not set kO-l, it becomes a difficult task to determine when
(J*-J~mh)
is negative.
is negative for any k
1
But, when k OD l, VI < M!, and V2 < M~, (3.1.16)
satisfying (3.1.11) and for any design satisfying
Thus, comparison of the two methods depends on the value of
3.2
M~.
Multivariable Case
For p independent variables, the prior distribution for 8'=(8',8'),
-1-2
where ~l and ~2 are defined in Section 2.2, will be assumed to be of the
form
where gi(8 ) and gjk(8 jk) represent prior distributions for 8i and 8jk ,
i
respectively.
Also, we shall require that
E(~2)=O.
We are concerned with choosing a matrix of constants
J* - V* + B*,
with V*(-V) given by (2.2.3) and with
(3.2.2)
46
e
\
B*
for B given by (2.2.5) and
=
fb
g(~)
"g(6)dr3
--
by (3.2.1).
With (2.2.8), (2.2.9), and (2.2.10) for the cuboidal region of
interest introduced in Section 2.2, we have from (2.2.11) that
(3.2.4)
and from (2.2.5) that
B*
= a'KA
Ka +a'A'KA KAa +a'Aa
-1--11--1 -2- --11---2 -1--1
=
Y (k
o
~
-li+!.
~ ~
~
~
y (k -li+!.
y
[k
cghll
]2
3 i:1 i i
3 i:1 j:i ij g:l gh=l
hij
0
(3.2.5)
where Y = E(a~j) is the average of a~j over the prior distribution On
ij
aij • The terms of (3.2.5) depend only on the second moments of the
prior distributions.
..
It can be shown with arguments similar to those on page 637 of
Box and Draper (1959) that B* is minimized if all the {llhij} vanish and
if llij-O for
i~j.
In this case (3.2.5) may then be written as
47
p
+ :5
L Yii
i=l
p-1
+
~ i=l
L
p
L
j=i+1
Yij ·
(3.2.6)
We see that (3.2.6) is minimized when all the k's are equal to one
and
~ii=1/3
for all i.
And, we can appeal to the results on page 637
of Box and Draper (1959) to show that V* is minimized at
~ij=O, i~j.
This, coupled with the results above on the minimization of B*, implies
that J* is minimized when the {~hij} and the {~ij} (i;j) are all zero.
Thus, J* becomes
(3.2.7)
We find that (3.2.7) is minimized when
(3.2.8)
and
i-1,2, ••• ,p.
(3.2.9)
For these optimal values, it can be shown that (3.2.7) is smaller than
48
when at least ~ k
i
is unity for any choice of the {~ii}'
The set of {~ii} which minimizes J* can be found by rewriting
(3.2.7) as
(3.2.10)
where k
O
and the {k } are as in (3.2.8) and (3.2.9).
i
monotonically increasing in
~ii;
and, (Yo + ;
Now, k
Li~l ~iiYii)kO
i
is
has a
maximum at
,
i=1,2, ••• ,p.
Thus, for YO ~ 1/2 we should take ~ii=l for all i.
If YO > 1/2, the
best. choice for the {~ii} is the set of values found by solving
iteratively the p equations
i=1,2, ••• ,p
in conjunction with (3.2.8) and (3.2.9).
One can argue as in Section
3.1 that a single positive value for each
~ii
value is greater than or equal to one, we take
optimal choice for
~ii
will lie between
will be found.
~ii-1;
If this
otherwise, the
49
and 1.
Thus, when YO and all the {Y } and {Y } are known, considerai
ii
tion of the J* criterion results in optimal values of the {Vii} which
are not necessarily equal.
result that J
dt
This is in contrast to the Draper-Lawrence
is minimized when V =V for all i.
ii 2
By taking ki=l, i=O,1,2, .•. ,p, in the sum of (3.2.4) and (3.2.5),
we find that the Draper-Lawrence criterion Jat is
(3.2.11)
As for J*, Jat is minimized when all {V hij } are zero and when Vij=O
for ifj.
Thus, (3.2.11) becomes
(3.2.12)
We see that the values of {Vii} which minimize (3.2.12) will be equal
only if all the {Y
ii
} are equal.
For kO·l, the difference between (3.2.12) and (3.2.7) is given by
50
•
(J*-Ja~)
p
=
~ i~l [(k~-1)/~ii+Yi(ki-1)2],
which is exactly (2.2.20) with a~=Yi'
for some i, the
k
1
i~th
(3.2.13)
Thus, if we can specify Y < Mt
i
term of the sum in (3.2.13) is negative for any
satisfying
{3.2.14)
Since J* for k =l is maximized at yi=Mt for any {y.} (j;i) and for any
O
J
{Y
ii
} and {y
ij
}, this maximum is minimized for
{3.2.15)
With (3.2.15), (J*-Ja~) is negative for any choice of {~ii}'
By inspection of (3.2.7) with k =l, we see that the minimax
O
solution for
~ii
requires (in addition to placing an upper bound on Y )
i
only that we assume Yii < Mti'
knowledge of all the {a
for
~2')
Then, J* with
ii
~n
} was necessary to specify the minimax value
Yii=M~i
minimized with respect to
is minimized,
(We remember that in the J criterion
~ii
and with k i given by (3.2.17) is
when
expression analogous in structure to the middle three
terms of (2.1.10).
Thus, if we can assume that Y <
i
M~
and Y < Mti
ii
51
for any i,we can make
(J*-J~~)
negative using (3.2.15) and the minimax
choice of llii' which is the root of (2.1.14) with Ml=Mt and M =Mti'
2
If
we can only assume that Y < Mt, we again suggest taking llii=1/3.
i
If we can specify that YO <
M~
and that y. < M* and Yii < M*
1
i
ii
for all i=1,2, ... ,p, it is then possible to include k
methodology.
O
in our
Unfortunately, as in the univariate case, provision for
the minimax solution for k
O
is a hardship.
Thus, we suggest that it
is more practical to take kO=l.
For the spherical region of interest of Section 2.2 with
(2.2.26), (2.2.27), and (2.2.28), V* is given by (2.2.28) and B* is
given by
(3.2.16)
As in the cuboidal case, J*, the sum of (2.2.28) and (3.2.16), is
minimized when the odd moments are zero.
Thus, J* becomes
(3.2.17)
52
whicll is minimized for values of k
O
and (kil given by
(3.2.18)
and (3.2.9).
Also, (3.2.17) will be further minimized at
~ii=l,
i=1,2, ••• ,p, if YO ~ l/(p+l), or at the set of {~ii} which simultaneously satisfies
with (3.2.18) and (3.2.9), if YO > 1/(p+1).
than or equal to one, we take
~ii
~ii=l;
If the i-th root is greater
otherwise, the optimal choice for
is the value of the root between
and 1.
Since the Box-Draper criterion Jtd' the sum of (2.2.28) and
(3.2.16) with ki=l, i-O,1,2, ••• ,p, differs from (3.2.11) oniy by the
coefficients of several terms, we can argue as in the cuboidal case
and find that J&d is minimized when the odd moments are zero.
J&d is given by
Then,
53
P
1
J~d = 1 + p+2
L
i=l
_1_+
\lii
12
+
(p+2) (p+4)
P
1 2
- p+2)
L Yii(\lii
1=1
[ (2p+2) P
L Yii + (p+2)
i=l
p-l
L
P
L y. ] ,
i=l j=i+1
(3.2.19)
1.j
which can be further minimized by choice of {\lii}.
From the similarity of (3.2.17) with (3.2.7) and of (3.2.19)
with (3.2.12), it follows for k =l that J* is less than Jtd for k
O
i
satisfying (3.2.14) if we can specify y. < M* for at least one i.
i
1.
We can then choose the minimax value of k. given by (3.2.17), which
1.
makes (J*-Jtd) negative for any \lii.
If we can also specify that
Y < Mti' it is then possible to find a minimax value for \lii by
ii
solving for the appropriate root of the polynomial
This polynomial equation is analogous to (2.1.14) with Ml=Mt, M2=Mti'
and slightly different coefficients.
Hence, it is possible to describe
completely the behavior of this root.
As in the cuboidal case, we again suggest that it is not
worthwhile to include k
known upper bounds.
O
in our estimator even when all the y's have
CHAPTER IV

CONSIDERATION OF J FOR THE CASES d₁=2, d₂=3, p=1 AND d₁=1, d₂=3, p=1

4.1  Quadratic-Cubic Case
Box and Draper (1963) considered the problem of fitting over a p-sphere a quadratic polynomial to a possibly cubic response.  Draper and Lawrence (1965) looked at this same problem for a cuboidal region of interest.  We shall consider only the univariate case (p=1), which means that the above regions both reduce to the closed interval [−1, 1].  We assume that the true response is given by the cubic expression

    η(x) = β₀ + β₁x + β₂x² + β₃x³.

Our estimator takes the form

    ŷ(x) = k₀β̂₀ + k₁β̂₁x + k₂β̂₂x²,

where β̂₀, β̂₁, and β̂₂ are the elements of the vector β̂₁ of least squares estimates given by

    β̂₁ = (X₁′X₁)⁻¹X₁′y,

for which X₁′ is the 3×N matrix whose u-th column is x₁ᵤ = (1, xᵤ, xᵤ²)′, and y is the column vector of the observed responses yᵤ = η(xᵤ) + eᵤ, u = 1,2,...,N.
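As an illustration of this least squares fit, the following is a minimal sketch (ours, not the paper's); the design points, true coefficients, and noise level are all hypothetical.

    # Quadratic least squares fit beta1_hat = (X1'X1)^{-1} X1'y on a
    # symmetric design in [-1, 1]; the true response is cubic, so the
    # slope estimate is biased toward beta1 + beta3*mu4/mu2.
    import numpy as np

    x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])           # hypothetical design
    X1 = np.column_stack([np.ones_like(x), x, x**2])    # u-th row (1, x_u, x_u^2)

    rng = np.random.default_rng(0)
    y = 1.0 + 0.8*x + 0.3*x**2 - 0.4*x**3 + rng.normal(0.0, 0.1, x.size)

    beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print(beta_hat)    # (beta0_hat, beta1_hat, beta2_hat)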
As before, let μⱼ = (1/N) Σᵤ₌₁ᴺ xᵤʲ, j = 2,3,4, denote the j-th moment of the design points, and assume that μ₃ = 0 as did Box and Draper.  Since μ₃ = 0, it follows from (1.1.2) that

    V = (N/2σ²) ∫₋₁¹ Var[ŷ(x)] dx

takes the form (4.1.1), where the design points are chosen so that μ₄ > μ₂².
Similarly, B is given by (4.1.2), where αᵢ = √N βᵢ/σ, i = 0,1,2,3.  When k₀ = k₁ = k₂ = 1, the sum of (4.1.1) and (4.1.2) is the Box-Draper (and Draper-Lawrence) criterion

    J_bd = 1 + 1/(3μ₂) + [(μ₂ − 1/3)² + 4/45]/(μ₄ − μ₂²) + (α₃²/3)(μ₄/μ₂ − 3/5)² + 4α₃²/175.    (4.1.3)
From the last two terms of (4.1.3), it is clear that an "all-bias" design satisfies μ₄/μ₂ = 3/5.  When μ₄/μ₂ = 3/5, it can be shown that V_bd (and hence J_bd) is minimized when μ₂ = 1 − √(2/5) ≈ 0.3675.
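A quick numerical check (ours, not part of the original) of this minimizing value, using the variance portion of (4.1.3) with μ₄ = 3μ₂/5:

    # Along the all-bias path mu4/mu2 = 3/5, the variance terms of (4.1.3)
    # are minimized near mu2 = 1 - sqrt(2/5), approximately 0.3675.
    import numpy as np

    def v_bd(mu2):
        mu4 = 3.0 * mu2 / 5.0
        return 1.0 + 1.0/(3.0*mu2) + ((mu2 - 1.0/3.0)**2 + 4.0/45.0)/(mu4 - mu2**2)

    grid = np.linspace(0.30, 0.45, 3001)
    print(grid[np.argmin(v_bd(grid))], 1.0 - np.sqrt(2.0/5.0))   # both ~0.3675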
Now, it is clear from inspection of (4.1.1) that the values of k₀ and k₂ which minimize J will depend on each other.  To alleviate this problem, we will set k₀ = 1.  In this case, J becomes (4.1.4), whose last term is again 4α₃²/175.  The difference between (4.1.4) and (4.1.3) is then negative for any k₁ satisfying (4.1.5) and for any k₂ satisfying

    [α₂²(μ₄ − μ₂²) + 10μ₂/3 − 1] / [1 + α₂²(μ₄ − μ₂²)] < k₂ < 1.
In order to proceed further with k₁, we will set μ₄/μ₂ = 3/5.  This restriction corresponds to considering only the class of designs which minimizes B_bd (or, equivalently, B when all the k's are unity) and also implies that μ₂ < 3/5.  Further development with k₂ requires that μ₂ < 3/10.  This would be much too restrictive, so we will take k₂ = 1, which means that α₂² no longer enters the picture.
Now, for |α₁| < M₁ and |α₃| < M₃, it follows from the monotonicity of the lower bound of (4.1.5) (with μ₄/μ₂ = 3/5) that any k₁ satisfying (4.1.6) will also satisfy (4.1.5).  Note the similarity in structure of (4.1.6) to intervals previously found for k₁.  Also, in the same way as before, the minimax k₁ can be shown to be

    k₁ = μ₂(M₁ + 3M₃/5)² / [1 + μ₂(M₁ + 3M₃/5)²];    (4.1.7)

this is the choice which minimizes J when α₁ = M₁ and α₃ = M₃.
This value is the upper bound in (4.1.6) and makes (J − J_bd) negative for any value of μ₂.  Note that μ₂(α₁ + 3α₃/5)²/[1 + μ₂(α₁ + 3α₃/5)²], the value of k₁ which minimizes (4.1.4) for μ₄/μ₂ = 3/5, will never exceed (4.1.7), but may be less than the lower bound in (4.1.6).
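For concreteness, a small computation (ours) of the minimax shrinkage factor (4.1.7); the bounds M₁ and M₃ below are hypothetical.

    # Minimax k1 of (4.1.7) for the quadratic-cubic case with mu4/mu2 = 3/5,
    # given the assumed bounds |alpha1| < M1 and |alpha3| < M3.
    def k1_minimax(mu2, M1, M3):
        t = mu2 * (M1 + 3.0 * M3 / 5.0) ** 2
        return t / (1.0 + t)

    print(k1_minimax(mu2=0.3675, M1=2.0, M3=1.0))   # about 0.713 for these bounds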
The minimax value of μ₂ is the root of the polynomial equation obtained by differentiating (4.1.4) with respect to μ₂ for k₁ given by (4.1.7), k₂ = 1, μ₄/μ₂ = 3/5, α₁ = M₁, and α₃ = M₃; this polynomial equation is given by (4.1.8).  For large values of (M₁ + 3M₃/5)², k₁ is very close to one and J will approximately equal J_bd, which is minimized for μ₄/μ₂ = 3/5 when μ₂ ≈ 0.3675.  When (M₁ + 3M₃/5)² is small, there is very little bias, and the root of (4.1.8) is near μ₂ = 3(1 − 2/√5) ≈ 0.3167, the value of μ₂ which minimizes (4.1.1) with k₁ = 0 and k₀ = k₂ = 1.
4.2  Linear-Cubic Case

We have only considered situations where d₂ − d₁ = 1.  It would also be of interest to look at a case for which d₂ − d₁ > 1.  Karson, Manson, and Hader investigated such a situation when they illustrated their "minimum bias estimator" for the case d₁=1, d₂=3, p=1.

For the closed interval [−1, 1], we assume that the true response is given by the cubic (d₂=3) expression

    η(x) = β₀ + β₁x + β₂x² + β₃x³.

We choose to consider the straight-line estimator (d₁=1) of Chapter II, namely

    ŷ(x) = k₀β̂₀ + k₁β̂₁x.

From (2.1.3), it follows that V is given by (4.2.1).
And, since E(β̂₀) = β₀ + β₂μ₂ and E(β̂₁) = β₁ + β₃μ₄/μ₂, it follows from (1.1.3) that

    B = (N/2σ²) ∫₋₁¹ {E[ŷ(x)] − η(x)}² dx

takes the form (4.2.2), where αᵢ = √N βᵢ/σ, i = 0,1,2,3.  The Box-Draper criterion J_bd, the sum of (4.2.1) and (4.2.2) with k₀ = k₁ = 1, becomes (4.2.3).  From the last four terms of (4.2.3), we see that an "all-bias" design would require that μ₂ = 1/3 and μ₄ = 1/5.
The difference between J and J_bd is negative for any k₀ satisfying

    [(α₀ + α₂/3)² − α₂²(μ₂ − 1/3)² − 1] / [1 + (α₀ + α₂μ₂)²] < k₀ < 1    (4.2.4)

and for any k₁ satisfying (4.2.5).
To continue further with k₀ would require that μ₂ = 1/3, and to make any headway with k₁ necessitates that μ₄/μ₂ = 3/5.  So, in order to retain some flexibility with regard to choice of design, we will set k₀ = 1.  Then, with μ₄/μ₂ = 3/5, (4.2.5) and (4.1.5) are exactly the same.  Thus, by the arguments for the quadratic-cubic case, it is clear that (J − J_bd) is negative for any k₁ satisfying (4.1.6); the minimax value of k₁ is given by (4.1.7).
For α₂² < M₂, the minimax value of μ₂ is obtained by setting α₁ = M₁, α₃ = M₃, k₀ = 1, and μ₄/μ₂ = 3/5 in J, the sum of (4.2.1) and (4.2.2), and then differentiating with respect to μ₂ with k₁ given by (4.1.7).  This value of μ₂ (which will always be between 1/3 and 1) is the root of the polynomial equation (4.2.6).
It can be shown that the Karson-Manson-Hader criterion J_kmh is given by

    J_kmh = 1 + (μ₂ − 1/3)²/(μ₄ − μ₂²) + ⋯ ,    (4.2.7)

where μ₆ = (1/N) Σᵤ₌₁ᴺ xᵤ⁶.  Here, it is required that (μ₄ − μ₂²) > 0 and (μ₂μ₆ − μ₄²) > 0, which again necessitates taking measurements at no fewer than three distinct abscissae.  For μ₄/μ₂ = 3/5, these two design requirements necessitate that μ₂ < 3/5 and μ₆/μ₂ > 9/25, so that all the terms of (4.2.7) are positive.
For μ₄/μ₂ = 3/5, the difference between J and J_kmh involves the ratio (μ₂ − 1/3)²/[μ₂(3/5 − μ₂)] and, for |α₁| < M₁ and |α₃| < M₃, can be made negative by taking any k₁ satisfying (4.1.6) in conjunction with any μ₂ satisfying the corresponding inequality.  The choice between these two methods again depends on the value of μ₂.
CHAPTER V
SUMMARY AND SOME ESTIMATION CONSIDERATIONS
Box and Draper considered the problem of choosing a response surface design to minimize integrated mean square error J when the true response η = x₁′β₁ + x₂′β₂, represented as a polynomial of degree d₂ in p variables, is approximated by a polynomial ŷ = x₁′β̂₁ of lower degree d₁ for which the coefficient vector β̂₁ is the vector of usual least squares estimates.  In this paper, we have suggested the use of a generalized estimator of the form β̃₁ = Kβ̂₁, where K is a diagonal matrix of appropriately chosen constants.  The choice of K which minimizes J depends on the unknown elements of the parameter vector β′ = (β₁′, β₂′); but, when it is possible to specify bounds (no matter how crude) for the elements of β relative to σ²/N, the methodology we have developed leads to a set of K's, each depending only on the given bounds and on the choice of experimental design.  Each K in this set makes J smaller in value than when K = I (the Box-Draper case).  It is then possible to find that particular choice of K which minimizes the maximum of J over the restricted parameter space.
We have shown that our new procedure compares favorably with the Box-Draper and Karson-Manson-Hader approaches for the particular cases (d₁=1, d₂=2), (d₁=2, d₂=3, p=1), and (d₁=1, d₂=3, p=1).  In addition, analogous results for the case d₁=1, d₂=2 were obtained when a modified criterion J*, the average of J over a prior distribution for the elements of β, was considered.  Here, only the second moments of the prior come into play.  Since the validity of our procedure is contingent upon an accurate restriction of the parameter space, it would be useful to have a method for checking the reasonableness of the limitations imposed.
For the case d₁=1, d₂=2, p=1 of Section 2.1, our methodology necessitates specifying an upper bound M₁ for the unknown parameter α₁² = Nβ₁²/σ².  We have at our disposal a particular value for μ₂ and an unbiased estimator β̂₁ of β₁.  With a "pure error" sum of squares obtained by replication, we can also produce an unbiased estimate s² of σ² based on, say, ν degrees of freedom.  Under the usual assumptions, β̂₁ is normal with variance σ²/(Nμ₂).  Thus, μ₂Nβ̂₁²/σ² has the noncentral chi-square distribution with one degree of freedom and noncentrality parameter λ = μ₂α₁².  Since νs²/σ² is independent of β̂₁ and follows the central chi-square distribution with ν degrees of freedom, it follows that μ₂α̂₁² = μ₂Nβ̂₁²/s² is distributed as the noncentral F distribution with 1 and ν degrees of freedom and noncentrality parameter λ = μ₂α₁².  Incidentally, it can be shown that an unbiased estimator of α₁² is [(ν − 2)/ν]α̂₁² − 1/μ₂.
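The stated unbiasedness can be checked in one line from the moments just cited: since E[χ′²₁(λ)] = 1 + λ and E(σ²/s²) = ν/(ν − 2) for ν > 2,

    E(α̂₁²) = E(Nβ̂₁²/σ²) E(σ²/s²) = (1/μ₂ + α₁²) ν/(ν − 2),

so that E{[(ν − 2)/ν]α̂₁² − 1/μ₂} = α₁².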
It is our objective to construct an upper one-sided confidence interval for α₁².  Let F′₁,ν,a(λ) denote the a-th percentage point of the noncentral F distribution with 1 and ν degrees of freedom and noncentrality parameter λ.  Since Pr[F′₁,ν(λ) ≤ μ₂α̂₁²] is a decreasing function of λ (see Johnson and Kotz (1970), p. 193), the value of λ, say λ̂, which satisfies F′₁,ν,a(λ̂) = μ₂α̂₁² for the observed value of μ₂α̂₁² will be the upper bound for all values of λ for which F′₁,ν,a(λ) ≤ μ₂α̂₁².  Thus, we can state that

    Pr(α₁² ≤ λ̂/μ₂) ≥ (1 − a)

when λ̂ satisfies F′₁,ν,a(λ̂) = μ₂α̂₁².
We can interpret this result in the following manner.  If M₁ is greater than or equal to λ̂/μ₂, our choice of an upper bound for α₁² seems to be a reasonable one.  However, when M₁ is less than λ̂/μ₂, it is apparent that we have been too restrictive in our choice of bound.  In this case, if time and cost limitations are not prohibitive, we could consider repeating the experiment with a new bound for α₁².  Since λ̂/μ₂ is a random variable, we cannot simply use it as the new M₁ because k₁ itself would then be a random variable.  However, it would probably be reasonable to replace M₁ by some number "close" to λ̂/μ₂.
Since the percentage points and probability integral tables of the F′ distribution are not fully tabulated, an F approximation (Patnaik (1949)) to the F′ distribution can be used to solve F′₁,ν,a(λ̂) = μ₂α̂₁² for λ̂.  Here, F′₁,ν(λ) is approximated by C·F_{ν₁,ν}, where C = (1 + λ) and ν₁ = (1 + λ)²/(1 + 2λ).  Then, λ̂ is found by computing (ν₁ + √(ν₁(ν₁ − 1)))·F_{ν₁,ν,a} for different values of ν₁ until equality with μ₂α̂₁² is obtained.  The appropriate choice for λ̂ is then (ν₁ − 1) + √(ν₁(ν₁ − 1)).  For a = .05 and several choices of ν, Table 5.1 gives the values of μ₂α̂₁² which result in the same value of λ̂.
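With modern software, the defining equation can be solved directly rather than through tables.  The following is a minimal sketch (our construction, not the paper's), assuming SciPy's noncentral F distribution scipy.stats.ncf; the function and variable names are ours.

    # Solve F'_{1,nu,a}(lam) = mu2 * alpha1_hat^2 for lam, the quantity the
    # text obtains from Patnaik's (1949) approximation and Table 5.1.
    from scipy.optimize import brentq
    from scipy.stats import ncf

    def lambda_hat(stat, nu, a=0.05, lam_max=1e4):
        """stat is the observed mu2 * N * beta1_hat**2 / s**2."""
        # ncf.ppf(a, 1, nu, lam) is the lower a-th percentage point of the
        # noncentral F(1, nu; lam) distribution; it increases in lam, so the
        # root is unique whenever stat exceeds the central F percentage point.
        return brentq(lambda lam: ncf.ppf(a, 1, nu, lam) - stat, 0.0, lam_max)

    # Cross-check against Table 5.1: nu = 3, a = .05, observed value .588
    lam = lambda_hat(0.588, nu=3)
    print(lam)          # roughly 4.449
    print(lam / 0.35)   # upper bound for alpha1^2, taking mu2 = 0.35 for illustration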
In the multivariable case of Chapter II, it should be clear from inspection of the "alias matrix" A that E(β̂ᵢ) = βᵢ, i = 1,2,...,p.  Thus, we can follow the above reasoning and show that each μ₂α̂ᵢ² = μ₂Nβ̂ᵢ²/s², i = 1,2,...,p, where s² unbiasedly estimates σ² and is independent of each β̂ᵢ, follows the noncentral F distribution with 1 and ν degrees of freedom and noncentrality parameter λᵢ = μ₂αᵢ².  By applying the Bonferroni inequality (see Feller (1967), p. 110) and following the methodology of the univariate case, we can make a joint probability statement giving upper bounds for each αᵢ², i = 1,2,...,p.  This is done by solving, as before, for the value of λᵢ, say λ̂ᵢ, for which F′₁,ν,a/p(λ̂ᵢ) = μ₂α̂ᵢ².  The statement Pr(αᵢ² ≤ λ̂ᵢ/μ₂ for every i = 1,2,...,p) is then made with confidence at least (1 − a).
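A sketch (ours, under the same assumptions as the previous sketch) of the Bonferroni step, handling each coordinate at level a/p:

    # Joint upper bounds for alpha_i^2, i = 1,...,p; by the Bonferroni
    # inequality the joint confidence is at least 1 - a.
    from scipy.optimize import brentq
    from scipy.stats import ncf

    def lambda_hat(stat, nu, a):
        return brentq(lambda lam: ncf.ppf(a, 1, nu, lam) - stat, 0.0, 1e4)

    def joint_upper_bounds(stats, mu2, nu, a=0.05):
        """stats[i] is the observed mu2 * N * beta_i_hat**2 / s**2."""
        p = len(stats)
        return [lambda_hat(s, nu, a / p) / mu2 for s in stats]

    print(joint_upper_bounds([0.9, 2.3, 0.4], mu2=0.35, nu=8))   # illustrative values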
TABLE 5.1.  VALUES OF μ₂α̂₁² WHICH GIVE THE SAME λ̂ FOR
            DIFFERENT VALUES OF ν AND a = .05

                          μ₂α̂₁²
     ν=3       ν=8       ν=15      ν=30      ν=60          λ̂

    .178      .178      .174      .174      .174         2.414
    .588      .616      .627      .632      .632         4.449
   1.135     1.239     1.269     1.299     1.314         6.464
   1.752     1.970     2.046     2.103     2.141         8.472
   2.410     2.766     2.915     3.018     3.064        10.477
   3.101     3.613     3.842     3.990     4.085        12.481
   3.809     4.506     4.815     5.032     5.156        14.483
   4.529     5.420     5.823     6.102     6.277        16.485
   5.261     6.353     6.840     7.210     7.444        18.487
   8.965    11.177    12.268    13.123    13.654        28.491
  12.757    16.153    17.930    19.352    20.300        38.494
  20.348    26.238    29.510    32.306    34.210        58.496
  27.983    36.489    41.259    45.552    48.573        78.497
  35.620    46.664    53.231    58.902    62.982        98.497
  43.015    57.001    65.126    72.416    77.913       118.498
  73.815    98.353   112.717   127.081   137.654       198.499
  88.960   118.792   136.754   154.237   166.949       238.499
 150.611   201.747   232.109   262.870   287.240       398.499
LIST OF REFERENCES

Blight, B. J. N. (1971).  Some general results on reduced mean square error estimation.  The American Statistician 25(3): 24-25.

Box, G. E. P. and Draper, N. R. (1959).  A basis for the selection of a response surface design.  Journal of the American Statistical Association 54: 622-654.

Box, G. E. P. and Draper, N. R. (1963).  The choice of a second order rotatable design.  Biometrika 50: 335-352.

Box, G. E. P. and Wilson, K. B. (1951).  On the experimental attainment of optimum conditions.  Journal of the Royal Statistical Society, Ser. B 13: 1-45.

David, H. A. and Arens, B. E. (1959).  Optimal spacing in regression analysis.  Annals of Mathematical Statistics 30: 1072-1081.

Draper, N. R. and Lawrence, W. E. (1965).  Designs which minimize model inadequacies: Cuboidal regions of interest.  Biometrika 52: 111-118.

Draper, N. R. and Lawrence, W. E. (1967).  Sequential designs for spherical weight functions.  Technometrics 9: 517-529.

Feller, W. (1967).  An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed.  John Wiley and Sons, Inc., New York.

Jeffreys, H. (1961).  Theory of Probability.  Oxford University Press, London.

Johnson, N. L. and Kotz, S. (1970).  Continuous Univariate Distributions - 2.  Houghton Mifflin Co., Boston.

Karson, M. J., Manson, A. R., and Hader, R. J. (1969).  Minimum bias estimation and experimental design for response surfaces.  Technometrics 11: 461-475.

Korts, D. C. (1971).  Estimators based on a closeness criterion over a restricted parameter space for the general linear model of full rank.  University of North Carolina Institute of Statistics Mimeo Series No. 741, April, 1971.

Lindley, D. V. (1964).  Probability and Statistics from a Bayesian Point of View.  Cambridge University Press, Cambridge.

MacDuffee, C. C. (1954).  Theory of Equations.  John Wiley and Sons, Inc., New York.

Markowitz, E. (1968).  Minimum mean square error estimation of the standard deviation of the normal distribution.  The American Statistician 22(3): 26.

Patnaik, P. B. (1949).  The non-central χ²- and F-distributions and their applications.  Biometrika 36: 202-232.

Stein, C. (1964).  Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean.  Annals of the Institute of Statistical Mathematics 16: 155-160.

Stuart, A. (1969).  Reduced mean square error estimation of σᵖ in normal samples.  The American Statistician 23(4): 27-28.