MINIMUM BIAS APPROXIMATION
OF A GENERAL REGRESSION MODEL
WITH AN APPLICATION TO RATIONAL MODELS
Robert Cote, A. R. Manson and R. J. Hader
Institute of Statistics
Mimeograph Series No. 756
Raleigh - July 1971
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
2. DEVELOPMENT OF THE PROBLEM
   2.1 Choice of Minimum Bias Model in C
   2.2 Estimation of the min $B_g$ Model
   2.3 Performance of the min $B_g$ Approach
3. APPROXIMATING RATIONAL FUNCTIONS BY POLYNOMIAL FUNCTIONS
4. SUMMARY AND CONCLUSIONS
5. REFERENCES
6. APPENDIX
   A.1 Theorem 2.1
   A.2 Theorem 2.2
   A.3 Theorem 2.3
   A.4 Lemma 2.1
   A.5 Lemma 2.2
   A.6 Theorem 2.4
   A.7 Theorem 2.5
LIST OF TABLES

3.1  $\min B_g(\gamma)$ when $\theta' = (1,2,4)$ and $\eta_0 = \alpha_0 + \alpha_1 x$
3.2  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x$ and $\gamma = 1.01$
3.3  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x$ and $\gamma = 1.50$
3.4  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x$ and $\gamma = 5.00$
3.5  $\min B_g(\gamma)$ when $\theta' = (1,2,4)$ and $\eta_0 = \alpha_0 + \alpha_1 x + \alpha_2 x^2$
3.6  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x + \alpha_2 x^2$ and $\gamma = 1.01$
3.7  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x + \alpha_2 x^2$ and $\gamma = 1.50$
3.8  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x + \alpha_2 x^2$ and $\gamma = 5.00$
LIST OF FIGURES

3.1  Contours of constant $V$ for $L(N,N_0,N_1,N_2;\,\gamma) = L(4,0,1,1;\ 1.01)$
3.2  Contours of constant $V$ for $L(N,N_0,N_1,N_2;\,\gamma) = L(4,0,1,1;\ 1.50)$
3.3  Contours of constant $V$ for $L(N,N_0,N_1,N_2;\,\gamma) = L(8,4,1,1;\ 5.00)$
3.4  Contours of constant $V$ for $Q(N,N_0,N_1,N_2;\,\gamma) = Q(10,0,4,1;\ 1.01)$
3.5  Contours of constant $V$ for $Q(N,N_0,N_1,N_2;\,\gamma) = Q(5,1,1,1;\ 1.50)$
3.6  Contours of constant $V$ for $Q(N,N_0,N_1,N_2;\,\gamma) = Q(6,2,1,1;\ 5.00)$
1. INTRODUCTION

The problem of optimum design in regression has drawn the attention of many writers. In particular, Elfving (1952), Kiefer (1959, 1961), Kiefer and Wolfowitz (1959), and Hoel and Levine (1964) investigated different optimality criteria applying to regression problems. Also, from a different viewpoint, Folks (1958) and Box and Draper (1959, 1963) introduced bias considerations in their optimality criterion.
The latter studied the situation where the model, $\eta(x)$, is a polynomial of degree $d + k - 1$ and is approximated by a polynomial of degree $d - 1$. The vector $x_1$ is made up of terms required for the polynomial of degree $d - 1$; the vector $x_2$ is made up of additional higher order terms required for the polynomial of degree $d + k - 1$, while $\beta_1$ and $\beta_2$ are corresponding vectors of regression coefficients. The estimator $b_1$ is $b_1 = (X_1'X_1)^{-1}X_1'Y$, where $X_1$ is the matrix of values taken by the variates in $x_1$ for the $N$ experimental runs, and $Y$ is the vector of the $N$ uncorrelated random variables with $E(Y_i) = \eta(x_i)$ and $E[Y_i - \eta(x_i)]^2 = \sigma^2$ $(i = 1, 2, \ldots, N)$.
Box and Draper were primarily interested in finding designs which minimize the mean square error (MSE) of $\hat{y}(x)$ averaged over some region of interest $R$, namely,

$$J = \frac{N\Omega}{\sigma^2} \int_R \mathrm{MSE}[\hat{y}(x)]\,dx, \qquad \Omega^{-1} = \int_R dx .$$

$J$ can be split into two parts, $J = V + B$, where

$$V = \frac{N\Omega}{\sigma^2} \int_R \mathrm{Var}[\hat{y}(x)]\,dx$$

is the averaged variance of $\hat{y}(x)$ over $R$, and

$$B = \frac{N\Omega}{\sigma^2} \int_R \{E[\hat{y}(x)] - \eta(x)\}^2\,dx$$

is the averaged squared bias of $\hat{y}(x)$ over $R$. In their work they noted that unless $V$ was many times larger than $B$, the minimum $J$ designs were remarkably close to those obtained by ignoring $V$ completely. Thus, they showed that to minimize $B$ alone the design matrix $X = [X_1 : X_2]$ should satisfy

$$(X_1'X_1)^{-1}X_1'X_2 = W_{11}^{-1}W_{12} \tag{1.1}$$

where $X_2$ is the matrix of values of $x_2$ and $W_{1j} = \Omega \int_R x_1 x_j'\,dx$, $j = 1, 2$. Equation (1.1) will be called the Box-Draper conditions for minimum $B$.
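As a minimal numerical sketch of these conditions (the quadratic-over-line setting, the three point design, and all names below are illustrative assumptions, not material from the original study), equation (1.1) can be checked directly for a single factor on $R = [-1, 1]$:

```python
import numpy as np

# Illustrative check of the Box-Draper conditions (1.1): fit a line
# (x1' = (1, x)) when the true model is quadratic (x2 = x^2) over the
# region R = [-1, 1] with uniform weight, so Omega = 1/2.
W11 = np.array([[1.0, 0.0], [0.0, 1.0 / 3.0]])   # Omega * int_R x1 x1' dx
W12 = np.array([[1.0 / 3.0], [0.0]])             # Omega * int_R x1 x2' dx

def bd_lhs(points):
    """(X1'X1)^{-1} X1'X2 for a one-factor design."""
    x = np.asarray(points, dtype=float)
    X1 = np.column_stack([np.ones_like(x), x])
    X2 = (x ** 2).reshape(-1, 1)
    return np.linalg.solve(X1.T @ X1, X1.T @ X2)

# A three point design at 0 and +-a satisfies (1.1) when 2a^2/3 = 1/3.
a = 1.0 / np.sqrt(2.0)
print(np.allclose(bd_lhs([-a, 0.0, a]), np.linalg.solve(W11, W12)))  # True
```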
Karson, Manson and Hader (1969) presented a different approach to this problem. Accepting the Box and Draper result that $B$ is the dominating factor in mean square error, they minimized $B$ by choice of estimator rather than by choice of design. The minimum bias estimator of $\eta(x)$ which they obtained was $\hat\eta(x) = x_1'A(X'X)^-X'Y$, where $A$ is a matrix of known constants, $A = [I : W_{11}^{-1}W_{12}]$. The symbol $S^-$ will denote a generalized inverse of a square matrix $S$, which satisfies the equation $S = SS^-S$. The estimator $\hat\eta(x)$ achieves minimum $B$ for any design for which $A\beta$ is estimable, where $\beta' = (\beta_1' : \beta_2')$. Using the design flexibility remaining, they found designs with smaller $J$ than is given by designs which satisfy the Box-Draper conditions.

The following section will generalize the Karson, Manson and Hader method to a larger class of models. Some results which hold in this more general situation will be given. This approach will be compared with the Box and Draper method. As a direct application, a brief discussion of the problem of approximating the ratio of polynomials by simple polynomials will be presented. The technique developed will be illustrated with examples. For these examples, designs satisfying the bias and variance criteria will be developed.
2. DEVELOPMENT OF THE PROBLEM

Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a measure space where $\mathcal{X}$ is a compact set and $\mu$ a finite measure defined on the $\sigma$-algebra $\mathcal{A}$ of subsets of $\mathcal{X}$. Let $L_2(\mathcal{X}, \mathcal{A}, \mu)$ be the class of all square integrable, real valued functions defined a.e. on $\mathcal{X}$, i.e., functions $u$ with $\int u^2\,d\mu < \infty$. Consider $n$ linearly independent continuous functions $f_1, f_2, \ldots, f_n$ in $L_2(\mathcal{X}, \mathcal{A}, \mu)$. Let $\theta_1, \theta_2, \ldots, \theta_n$ be $n$ unknown constants in some vector space $\Theta$ over $E^1$, the field of real numbers. (In the sequel $E^m$ will denote the $m$-fold cartesian product of the set of real numbers.) These quantities will be written as vectors: $\theta' = (\theta_1, \theta_2, \ldots, \theta_n)$ and $f'(x) = (f_1(x), f_2(x), \ldots, f_n(x))$.

If $\mathcal{X}$ is a $k$-dimensional space, its elements are actually $k$-tuples $(x_1, x_2, \ldots, x_k)$ and any function defined on $\mathcal{X}$ is a $k$-variable function. However, an element of the $k$-dimensional space $\mathcal{X}$ shall be denoted by one letter with or without subscript, $x$. Over the space $\mathcal{X}$, a hypersurface in a $(k+1)$-dimensional space is assumed to be described by

$$\eta(x) = \sum_{j=1}^{n} \theta_j f_j(x) = \theta' f(x), \tag{2.1}$$

where the $f_j$ $(j = 1, 2, \ldots, n)$ are known functions. The set $\mathcal{X}$ will also be regarded as the operability region, i.e., for each $x \in \mathcal{X}$ it is possible to obtain a value of the random variable $Y(x)$ which has mean $\theta' f(x)$. The random variables $Y(x_i)$ and $Y(x_j)$ are assumed to have a known correlation apart from a constant $\sigma^2$. If several $x_i$'s are equal, then the corresponding $Y(x_i)$'s will be treated as different random variables.
A "close approximation" of the true response given in the
equation (2.1) is intuitively appealing.
estimation of all parameters
e.
In situations where the
is expensive, difficult or even
J
impossible, a convenient approach is to approximate the assumed true
model by a "relatively simpler one".
Thus, the experimenter selects a
class of such models;
s
= . .E
J =1
The g. (j
J
= 1,
with s s: n .
Ci.g.(X)}
J J
2, ... , s) are linearly independent, continuous, real
valued functions in L , and the parameters
2
are to be estimated.
Ci
j
€
@ (j
= 1,
2, ... , s)
These quantities will be written as vectors:
approach, the error of approximation stems from two sources:
the
sampling error and the bias error due to the failure of the chosen
~O(x) €
C to exactly represent the true response
would like to minimize both errors.
~(x).
Obviously, one
However, the minimization of the
sampling error must take place in the estimation space, by estimation
procedures, and in the operability region, X, by choice of design.
The minimization of the bias error has to be accomplished in the parameter space, @, and in L space by choice of ~O(x)
2
€
C.
sampling error is defined only after the choice of ~O(x)
Moreover, the
€
C is made.
6
Thus, the only alternative is the conditional minimizat;ion of the
sampling error,
!.~.,
the minimization of the sampling error given
that the minimum bias has been achieved within C.
2.1 Choice of Minimum Bias Model in C

Let $B(\eta_0)$ denote the averaged squared bias over $\mathcal{X}$ due to the choice of $\eta_0(x) \in C$, namely,

$$B(\eta_0) = \Omega \int_{\mathcal{X}} [\eta_0(x) - \eta(x)]^2\,d\mu(x), \tag{2.2}$$

where $\Omega^{-1} = \int_{\mathcal{X}} d\mu(x)$. The minimization

$$B = \min_{\eta_0 \in C} B(\eta_0)$$

is equivalent to

$$B = \min_{(\alpha,\,g)\,\in\,\Theta^s \times L_2^s} B(\alpha' g),$$

where $(\alpha, g)$ is an element of the product space $\Theta^s \times L_2^s$; $\Theta^s$ is the $s$-fold cartesian product of $\Theta$, and $L_2^s$ the $s$-fold cartesian product of $L_2$.

The minimization of $B(\alpha' g)$ over $\Theta^s \times L_2^s$ is a formidable task if at all possible. Even if one reduces the domain of minimization to $\Theta^s \times M^s$, where $M = M(f_1, f_2, \ldots, f_n)$ is the space spanned by the functions $f_i$ $(i = 1, 2, \ldots, n)$, the task is not materially simplified. To reduce the minimization problem to workable size, it becomes necessary to fix a set of functions $g' = (g_1, g_2, \ldots, g_s)$ and minimize over $\Theta^s$ only. The notation $B_g$ will then denote the fact that the $g_i$ $(i = 1, 2, \ldots, s)$ may not be an optimum choice.

The functions $\eta$, $\eta_0$, $f_j$, $g_j$ are defined on $\mathcal{X}$, and their integration will be over $\mathcal{X}$ unless otherwise specified. Henceforth, the variate $x$ and the domain of integration $\mathcal{X}$ shall be omitted.

For a given set of functions $(g_1, g_2, \ldots, g_s)$, equation (2.2) is written as

$$B_g(\eta_0) = \alpha' W_{gg}\alpha - 2\alpha' W_{gf}\theta + \theta' W_{ff}\theta, \tag{2.3}$$

where the

$$W_{hk} = \Omega \int h\,k'\,d\mu \tag{2.4}$$

are real valued matrices. The following theorem is appropriate at this point.

Theorem 2.1

A necessary and sufficient condition for the matrix $W_{gg}$ to be positive definite is that the $g_i$ $(i = 1, 2, \ldots, s)$ be linearly independent. (For a proof, see Appendix A.1.)

Corollary 2.1

As defined in equation (2.4):

(i) the matrix $W_{ff}$ is positive definite;
(ii) $W_{gf}' W_{gg}^{-1} W_{gf}$ is positive semi-definite;
(iii) $W_{ff} - W_{gf}' W_{gg}^{-1} W_{gf}$ is positive semi-definite.
Using Theorem 2.1, equation (2.3) may be written as

$$B_g(\eta_0) = (\alpha - W_{gg}^{-1}W_{gf}\theta)'\,W_{gg}\,(\alpha - W_{gg}^{-1}W_{gf}\theta) + \theta'(W_{ff} - W_{gf}'W_{gg}^{-1}W_{gf})\theta . \tag{2.5}$$

The second term of the R.H.S. of equation (2.5) is constant in $\alpha$. Therefore, from Theorem 2.1, it follows that

$$\min B_g = \min_{\alpha \in \Theta^s} B_g(\eta_0) = \theta'(W_{ff} - W_{gf}'W_{gg}^{-1}W_{gf})\theta$$

if and only if

$$\alpha = W_{gg}^{-1}W_{gf}\theta = A\theta,$$

where $A = W_{gg}^{-1}W_{gf}$. Moreover, $\min B_g$ is unique. Choosing the set of functions $g_i$ $(i = 1, 2, \ldots, s)$ is equivalent to selecting a subclass $C_g \subset C$, where

$$C_g = \{\eta_0 : \eta_0 = \alpha' g,\ g \text{ is given}\} \quad\text{with } s \le n .$$

The minimum bias model (min $B_g$ model) in $C_g$ which is used to approximate $\eta = \theta' f$ is written as

$$\eta_0 = \alpha' g \quad\text{with}\quad \alpha = A\theta . \tag{2.6}$$

The choice of the functions $g_i$ $(i = 1, 2, \ldots, s)$ as a subset of $(f_1, f_2, \ldots, f_n)$ is a natural and appealing one, and, if the set of functions $g_i$ is a subset of $(f_1, f_2, \ldots, f_n)$ so that $g_k = f_k$ for $k = 1, 2, \ldots, s$, then the matrix $A$ always takes the form

$$A = [I_s : W_{11}^{-1}W_{12}],$$

where $W_{11} = W_{gg}$ and $W_{12} = \Omega \int f_1 f_2'\,d\mu$ with $f_2' = (f_{s+1}, f_{s+2}, \ldots, f_n)$.
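The computation of $A$ reduces to region moments. As a hedged sketch (the quartic true model, the uniform measure on $[-1,1]$, and all names below are illustrative assumptions), the minimum bias coefficients for $g' = (1, x)$ against $f' = (1, x, x^2, x^3)$ are:

```python
import numpy as np

# Illustrative computation of A = W_gg^{-1} W_gf for g' = (1, x) and
# f' = (1, x, x^2, x^3) on X = [-1, 1] with uniform mu, where
# Omega * int x^k dmu = 0 for odd k and 1/(k+1) for even k.
mom = lambda k: 0.0 if k % 2 else 1.0 / (k + 1)

W_gg = np.array([[mom(i + j) for j in range(2)] for i in range(2)])
W_gf = np.array([[mom(i + j) for j in range(4)] for i in range(2)])
A = np.linalg.solve(W_gg, W_gf)
print(A)          # [[1, 0, 1/3, 0], [0, 1, 0, 3/5]], i.e. A = [I_2 : W11^{-1}W12]

theta = np.array([1.0, 2.0, 4.0, 3.0])   # hypothetical theta
alpha = A @ theta                        # min B_g coefficients, alpha = A theta
```

Here $\alpha_0 = \theta_0 + \theta_2/3$ and $\alpha_1 = \theta_1 + 3\theta_3/5$, in agreement with the form $A = [I_s : W_{11}^{-1}W_{12}]$ above.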
2.2 Estimation of the min $B_g$ Model

A question of great interest is the estimability of the min $B_g$ model given in equation (2.6). Assume that a vector of $N$ random variables (observations) $Y'(x) = [Y(x_1), Y(x_2), \ldots, Y(x_N)]$ satisfies

(1) $E[Y(x)] = F\theta$,
(2) $E\{[Y(x) - \eta(x)][Y(x) - \eta(x)]'\} = \Sigma\sigma^2$.

$F$ is an $N \times n$ matrix whose $(i,j)$th element is $f_j(x_i)$, $\Sigma$ is an $N \times N$ positive definite matrix of known constant elements, and $\sigma^2$ is a real, positive constant. The Gauss-Markoff Theorem (Scheffé, 1959) applies, i.e., $\alpha = A\theta$ is linearly estimable if and only if the design matrix $F$ satisfies

$$A(F'\Sigma^{-1}F)^-(F'\Sigma^{-1}F) = A,$$

where the class of design matrices satisfying this condition will be denoted by $\mathcal{F}_{MB}$. This class $\mathcal{F}_{MB}$ is also the class of designs which achieve

$$\min B_g = \theta'(W_{ff} - W_{gf}'W_{gg}^{-1}W_{gf})\theta .$$

An equivalent necessary and sufficient condition for estimability of $A\theta$ is given in the following theorem.

Theorem 2.2

The vector of parameters $\alpha = A\theta$ is linearly estimable if and only if $A' = V_1 T$, where $V_1$ is an $n \times r$ matrix whose columns are the orthonormal characteristic vectors corresponding to the $r$ non-zero characteristic roots of $(F'\Sigma^{-1}F)$ with $s \le r \le n$, and $T$ is an $r \times s$ matrix of full rank. (For a proof, see Appendix A.2.)
This theorem essentially brings out the fact that the process of linear estimation takes place in an $r$-dimensional subspace of the $n$-dimensional space spanned by the columns of $(F'\Sigma^{-1}F)$. The following theorem is then a natural one.

Theorem 2.3

If $\alpha = A\theta$ is estimable where $A = W_{gg}^{-1}W_{gf}$ is a matrix of full rank, then the matrix $A(F'\Sigma^{-1}F)^-A'$ is non-singular. (For a proof, see Appendix A.3.)

Moreover, the columns of $A'$ are in the space generated by the columns of $V_1$ (Theorem 2.2), which is the space generated by the columns of $(F'\Sigma^{-1}F)^-$ (Rao, 1967, p. 184). Thus, it can be shown (see Graybill, 1969) that $A(F'\Sigma^{-1}F)^-A'$ is invariant for any generalized inverse of $(F'\Sigma^{-1}F)$.

For a fixed $F \in \mathcal{F}_{MB}$ the BLUE of $\alpha$ is

$$\hat\alpha = A(F'\Sigma^{-1}F)^- F'\Sigma^{-1} Y(x),$$

and the minimum variance linear estimator of $\eta$ when using the min $B_g$ model is

$$\hat\eta_0(F) = \hat\alpha' g .$$

Now $\mathrm{Var}[\hat\eta_0(F)] = g'A(F'\Sigma^{-1}F)^-A'g\,\sigma^2$ which, when averaged over the region $\mathcal{X}$, is denoted by $V_F$, namely,

$$V_F = \frac{N\Omega}{\sigma^2}\int \mathrm{Var}[\hat\eta_0(F)]\,d\mu = N\,\mathrm{Tr}[A(F'\Sigma^{-1}F)^-A'W_{gg}] .$$

The notation $\mathrm{Tr}[H]$ will be used to denote the trace of the square matrix $H$. A great deal of design flexibility remains at this point. One way to use this flexibility is to achieve

$$\min V = \min_{F \in \mathcal{F}_{MB}} V_F,$$

i.e., to obtain $\min V$ having achieved $\min B_g$. This design flexibility may also be used to satisfy other criteria, as will be shown in Section 3.
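A sketch of the variance computation, continuing the illustrative polynomial example above with $\Sigma = I_N$ and an arbitrary five point design (both assumptions not drawn from the text):

```python
import numpy as np

# Sketch of V_F = N Tr[A (F'F)^- A' W_gg] for g' = (1, x),
# f' = (1, x, x^2, x^3) on [-1, 1] with uniform mu and Sigma = I_N;
# the Moore-Penrose pseudoinverse serves as the generalized inverse.
mom = lambda k: 0.0 if k % 2 else 1.0 / (k + 1)
W_gg = np.array([[mom(i + j) for j in range(2)] for i in range(2)])
W_gf = np.array([[mom(i + j) for j in range(4)] for i in range(2)])
A = np.linalg.solve(W_gg, W_gf)

xs = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])        # hypothetical design
F = np.column_stack([xs ** j for j in range(4)])  # (i,j)th element f_j(x_i)
V_F = len(xs) * np.trace(A @ np.linalg.pinv(F.T @ F) @ A.T @ W_gg)
print(V_F)
```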
2.3 Performance of the min $B_g$ Approach

To approximate a given function, say $\eta(x)$, on the basis of $N$ experimental observations, various methods aimed at satisfying both bias and variance criteria can be presented. These methods could thus be compared on the basis of the compromise they offer between variance and bias.

Consider the situation where the set of functions $g_i$ $(i = 1, 2, \ldots, s)$ is a subset of the functions $f_i$ $(i = 1, 2, \ldots, n)$ so that $g_k = f_k$ for $k = 1, 2, \ldots, s$. The full model is then written as

$$\eta = \theta_1' f_1 + \theta_2' f_2, \tag{2.7}$$

where $\theta_1' f_1$ contains the first $s$ terms and $\theta_2' f_2$ the remaining $(n-s)$ terms. The design matrix $F = [F_1 : F_2]$ is written so as to match the partitioning of the model $\eta$. The following classes of designs are defined:

$$\mathcal{F}_{FM} = \{F : \theta \text{ is estimable}\}, \quad \mathcal{F}_{MB} = \{F : A\theta \text{ is estimable}\}, \quad \mathcal{F}_{RM} = \{F : \theta_1 \text{ is estimable}\}, \tag{2.8}$$

together with the class $\mathcal{F}_{BD}$ of designs which satisfy the Box-Draper conditions of equation (1.1). Also, the following notation will be used:

$$V_*(F) = \frac{N\Omega}{\sigma^2} \int \mathrm{Var}[\hat\eta_0(F)]\,d\mu \tag{2.9}$$

and

$$V_* = \min_{F \in \mathcal{F}_*} V_*(F) . \tag{2.10}$$
Lemma 2.1

If $A = [A_1 : A_2]$ and $A_1$ is an $s \times s$ matrix of full rank, then a necessary condition for $A\theta$ to be estimable is that $F_1$ be of full rank. (For a proof, see Appendix A.4.) In particular, this lemma applies when $A = [I_s : W_{gg}^{-1}W_{gf_2}]$. The following lemma establishes a relationship between the classes of designs defined in (2.8).

Lemma 2.2

Using the notation of equation (2.8),

(i) $\mathcal{F}_{FM} \subseteq \mathcal{F}_{MB} \subseteq \mathcal{F}_{RM}$,
(ii) $\mathcal{F}_{BD} \subseteq \mathcal{F}_{MB}$,
(iii) when $N = s$, $\mathcal{F}_{MB} = \mathcal{F}_{BD}$.

(For a proof, see Appendix A.5.)
Theorem 2.4

Using the notation defined in equations (2.9) and (2.10),

(i) $V_{FM} \ge V_{MB}$,
(ii) $V_{MB} \ge V_{RM}$,
(iii) $V_{MB} \le V_{BD} = s$.

(For a proof, see Appendix A.6.)

From (iii) of Lemma 2.2, strict equality is achieved in (iii) of Theorem 2.4 when $N = s$ and $\mathcal{F}_{MB} \ne \phi$. Moreover, many examples exist where the strict inequality holds in (iii) of Theorem 2.4 (see Karson, Manson and Hader, 1969).

Consider the situation where $N = s$; then the min $B_g$ model cannot contain more than $s$ parameters. A logical method of obtaining protection against bias would be to add more terms to the fitted model. This is obviously not possible when $N = s$. An alternative method of obtaining additional protection against bias would be to add more terms to the assumed true model and obtain minimum bias protection via the fitted model. Unfortunately, this extra bias protection is accompanied by an increase in variance error, as shown in the following theorem.

Theorem 2.5

To the model given in equation (2.7),

$$\eta_1 = \theta_1' f_1 + \theta_2' f_2 = \theta' f,$$

add $t$ more terms, say $\theta_3' f_3$. The new model is then

$$\eta_2 = \theta' f + \theta_3' f_3 .$$

Let $C_g = \{\eta_0 : \eta_0 = \alpha' g$ where $g_k = f_k$ for $k = 1, 2, \ldots, s\}$ be the class of simpler models. If $V_i$ denotes the averaged variance of the $\min V|\min B_g$ linear estimator of $\eta_i$ for $i = 1, 2$, then $V_1 \le V_2$. (For a proof, see Appendix A.7.)
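A small numerical illustration of Theorem 2.5 for a fixed design (the design points and models below are assumptions; the theorem itself concerns the minimizing designs): adding the term $\theta_3 x^3$ to an assumed quadratic true model, while still fitting $g' = (1, x)$, cannot decrease the averaged variance.

```python
import numpy as np

# V for the min B_g estimator of g' = (1, x) when the assumed true model has
# n_terms monomial terms 1, x, ..., x^(n_terms - 1); uniform mu on [-1, 1].
mom = lambda k: 0.0 if k % 2 else 1.0 / (k + 1)
W_gg = np.array([[mom(i + j) for j in range(2)] for i in range(2)])

def V(n_terms, xs):
    W_gf = np.array([[mom(i + j) for j in range(n_terms)] for i in range(2)])
    A = np.linalg.solve(W_gg, W_gf)
    F = np.column_stack([xs ** j for j in range(n_terms)])
    return len(xs) * np.trace(A @ np.linalg.pinv(F.T @ F) @ A.T @ W_gg)

xs = np.array([-1.0, -0.6, 0.0, 0.6, 1.0])   # hypothetical design
print(V(3, xs) <= V(4, xs))                  # True: V1 <= V2
```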
3. APPROXIMATING RATIONAL FUNCTIONS BY POLYNOMIAL FUNCTIONS

A polynomial form in $x$ over $E^1$ is an expression of the form $P_n(x{:}a) = \sum_{i=0}^{n} a_i x^i$ with $a_n \ne 0$ if $n > 0$. The degree of the polynomial form is $n$, and $a_i \in E^1$ $(i = 0, 1, \ldots, n)$. A rational form is defined to be the ratio $P_n(x{:}a)/P_m(x{:}b)$, where $P_n(x{:}a)$ and $P_m(x{:}b)$ are polynomial forms in $x$ of degree $n$ and $m$ respectively. If $x$ is regarded as a variable with a given domain, these forms are called functions, namely, polynomial functions and rational functions. These definitions can be generalized to functions of several variables. For example, a polynomial function in $(x_1, x_2, \ldots, x_n)$ is defined recursively as a function in $x_n$ over the ring $D[x_1, x_2, \ldots, x_{n-1}]$ of polynomials. (For more details, see Birkhoff and MacLane, 1965.) The same notation will be used for a polynomial in one or in several variables, recalling that a power of $x$ may actually represent a product such as $\prod_{i=1}^{k} x_i^{t_i}$ when dealing with a $k$-variable polynomial function.

Assume that the following functional relationship between the response $\eta$ and the factor $x$ holds:

$$\eta(x) = R_{n,m}(x{:}\theta,\gamma) = \frac{P_n(x{:}\theta)}{P_m(x{:}\gamma)},$$

where the operability region $\mathcal{X}$ is a closed bounded set of $E^k$, and

$$P_n(x{:}\theta) = \sum_{j=0}^{q} \theta_j x^{t_j} \quad\text{and}\quad P_m(x{:}\gamma) = \sum_{j=0}^{u} \gamma_j x^{v_j}$$

are real valued polynomial functions of degree $n$ and $m$ respectively.
The class of relatively simpler models to be considered is

$$C_g = \Big\{\eta_0 : \eta_0 = P_s(x{:}\alpha) = \sum_{j=0}^{s} \alpha_{s_j} x^{s_j}\Big\} \quad\text{with } s \le q,$$

where the $s_j$ $(j = 0, 1, \ldots, s)$ are known integers and the $\alpha_{s_j}$ are unknown constants in $E^1$. No claim is made concerning any optimality criterion in the choice of $C_g$; this class of models is simple and illustrates the point that the set of functions $g_i$ $(i = 0, 1, \ldots, s)$ need not be a subset of the functions $f_i$ $(i = 0, 1, \ldots, q)$.
Using notation similar to that used in Section 2,

$$B_g(\eta_0) = \Omega \int \{P_s(x{:}\alpha) - R_{n,m}(x{:}\theta,\gamma)\}^2\,dx, \quad\text{with } \Omega^{-1} = \int dx,$$

and where the measure $\mu$ is simply the usual Lebesgue measure defined on the Borel $\sigma$-algebra of subsets of $\mathcal{X}$. If, for the appropriate range of $j$, one defines

$$f_j(x) = \frac{x^{t_j}}{P_m(x{:}\gamma)}, \quad \theta_j = \theta_{t_j}; \qquad g_j(x) = x^{s_j}, \quad \alpha_j = \alpha_{s_j},$$

then the results of Section 2 apply when restricting $x^{t_j}/P_m(x{:}\gamma) \in L_2$ $(j = 0, 1, \ldots, q)$ and $\gamma$ to be known. The matrix $A = W_{gg}^{-1}W_{gf}$, where $W_{gg}$ is a $(s{+}1)\times(s{+}1)$ matrix whose $(i,j)$th element is

$$\Omega \int x^{s_i + s_j}\,dx,$$

and $W_{gf}$ is a $(s{+}1)\times(q{+}1)$ matrix whose $(i,j)$th element is

$$\Omega \int \frac{x^{s_i + t_j}}{P_m(x{:}\gamma)}\,dx .$$

The minimum averaged squared bias is

$$\min B_g = \theta'(W_{ff} - W_{gf}'W_{gg}^{-1}W_{gf})\theta,$$

where $W_{ff}$ is a $(q{+}1)\times(q{+}1)$ matrix whose $(i,j)$th element is

$$\Omega \int \frac{x^{t_i + t_j}}{[P_m(x{:}\gamma)]^2}\,dx .$$
Assume that the vector of the $N$ observable responses $Y'(x) = [Y(x_1), Y(x_2), \ldots, Y(x_N)]$ satisfies the conditions

(1) $E[Y(x)] = R\theta$,
(2) $E\{[Y(x) - \eta(x)][Y(x) - \eta(x)]'\} = I_N\sigma^2$,

where $R$ is an $N\times(q{+}1)$ matrix whose $(i,j)$th element is $x_i^{t_j}/P_m(x_i{:}\gamma)$ and $\sigma^2$ is a real, positive constant. For any design matrix $R \in \mathcal{F}_{MB}$ (the class of designs for which $\min B_g$ is achieved) the BLUE of $\alpha$ is

$$\hat\alpha = A(R'R)^- R'Y(x),$$

and the variance of $\hat\eta_0$ is

$$\mathrm{Var}(\hat\eta_0) = g'A(R'R)^-A'g\,\sigma^2, \quad\text{with } g' = (x^{s_0}, x^{s_1}, \ldots, x^{s_s}) .$$

For a fixed design matrix $R \in \mathcal{F}_{MB}$ the averaged $\mathrm{Var}(\hat\eta_0)$ over $\mathcal{X}$ is

$$V_R = N\,\mathrm{Tr}[A(R'R)^-A'W_{gg}] .$$

There still remains the flexibility of choosing $R \in \mathcal{F}_{MB}$. One way to take advantage of this flexibility is to choose $R$ so as to achieve minimum $V_R$, or simply to achieve $V_R < s$ (when possible) to give improvement over those designs which satisfy the Box-Draper conditions of equation (1.1). Moreover, this flexibility may be used to minimize $V_R$ within the class of designs which have equal spacing, as is illustrated in Figure 3.2.
As illustration, let

$$\eta(x) = R_{2,1}(x{:}\theta,\gamma) = \frac{\theta_0 + \theta_1 x + \theta_2 x^2}{\gamma + x} \tag{3.1}$$

be the rational function to be approximated, where $\theta_0$, $\theta_1$, $\theta_2$ are unknown parameters in $E^1$ and $x$, the controllable variable, ranges over the operability region $\mathcal{X} = [-1, 1]$. In order to use a linear estimation procedure and to avoid undefined terms, it is necessary to restrict $\gamma$ to be known and $|\gamma| > 1$. The class of simpler models is taken to be $C_g = \{\eta_0 : \eta_0 = P_1(x{:}\alpha) = \alpha_0 + \alpha_1 x\}$. The matrix $A = W_{gg}^{-1}W_{gf}$, where

$$W_{gf} = \frac12\begin{bmatrix} Z & (2-\gamma Z) & (\gamma^2 Z - 2\gamma) \\ (2-\gamma Z) & (\gamma^2 Z - 2\gamma) & (\tfrac23 + 2\gamma^2 - \gamma^3 Z) \end{bmatrix}$$

with $Z = \log_e(\gamma+1) - \log_e(\gamma-1)$. Thus $B_g$ depends on the specified value of $\gamma$, and its minimum may be written as

$$\min B_g(\gamma) = \theta'(W_{ff} - W_{gf}'W_{gg}^{-1}W_{gf})\theta,$$

where

$$W_{ff} = \begin{bmatrix} \dfrac{1}{Z_1} & \Big(\dfrac{Z}{2} - \dfrac{\gamma}{Z_1}\Big) & \Big(1 + \dfrac{\gamma^2}{Z_1} - \gamma Z\Big) \\[6pt] & \Big(1 + \dfrac{\gamma^2}{Z_1} - \gamma Z\Big) & \Big(\dfrac{3\gamma^2 Z}{2} - \dfrac{\gamma^3}{Z_1} - 2\gamma\Big) \\[6pt] \text{Symmetric} & & \Big(\dfrac13 + 3\gamma^2 + \dfrac{\gamma^4}{Z_1} - 2\gamma^3 Z\Big) \end{bmatrix}$$

with $Z_1 = \gamma^2 - 1$. Numerical values of $\min B_g(\gamma)$ can be computed for any value of $\theta$. For example, if $\theta' = (1, 2, 4)$, $\min B_g(\gamma)$ has been computed for different values of $\gamma$ as shown in Table 3.1.
Table 3.1  $\min B_g(\gamma)$ when $\theta' = (1,2,4)$ and $\eta_0 = \alpha_0 + \alpha_1 x$

  γ        min B_g(γ)
  ----     ----------
  1.01     320.9836
  1.50       1.1653
  5.00       0.0510
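A numerical sketch of $\min B_g(\gamma)$ for this example (the quadrature grid is an assumption; exact values follow from the closed forms above), whose output can be compared with Table 3.1:

```python
import numpy as np

# min B_g(gamma) = theta'(W_ff - W_gf' W_gg^{-1} W_gf) theta for
# eta = (th0 + th1 x + th2 x^2)/(gamma + x) and g' = (1, x) on [-1, 1],
# with all W matrices built by trapezoidal quadrature (Omega = 1/2).
def min_Bg(theta, gamma, npts=200001):  # fine grid: integrand peaks at x = -1
    x = np.linspace(-1.0, 1.0, npts)
    avg = lambda u: 0.25 * np.sum((u[:-1] + u[1:]) * np.diff(x))
    f = [x ** j / (gamma + x) for j in range(3)]
    g = [x ** 0, x]
    W_gg = np.array([[avg(a * b) for b in g] for a in g])
    W_gf = np.array([[avg(a * b) for b in f] for a in g])
    W_ff = np.array([[avg(a * b) for b in f] for a in f])
    M = W_ff - W_gf.T @ np.linalg.solve(W_gg, W_gf)
    return theta @ M @ theta

theta = np.array([1.0, 2.0, 4.0])
for gamma in (1.01, 1.50, 5.00):
    print(gamma, min_Bg(theta, gamma))
```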
Assume that the $N$ observable responses satisfy the conditions

(1) $E[Y(x)] = R\theta$,
(2) $E\{[Y(x) - \eta(x)][Y(x) - \eta(x)]'\} = I_N\sigma^2$,

where $R$ is an $N\times 3$ matrix whose $(i,j)$th element is $x_i^j/(\gamma + x_i)$ and $\sigma^2$ is a real, positive constant. For a given matrix $R$ such that $\alpha$ is estimable, the averaged $\mathrm{Var}[\hat\eta_0]$ over $[-1, 1]$ is

$$V_R = N\,\mathrm{Tr}[A(R'R)^-A'W_{gg}] .$$
Let $\mathcal{D}$ be the class of 3, 4 and 5 point designs which are symmetric with respect to the origin. The $N$ observations are spaced as follows: $N = N_0 + 2N_1 + 2N_2$, where $N_0$ is the number of observations taken at $x = 0$, $N_1$ is the number of observations at $x = \pm\ell_1$, and $N_2$ is the number of observations at $x = \pm\ell_2$. The above designs can be better described by the following schematic presentation:

         N2        N1        N0        N1        N2
    |-----+---------+---------+---------+---------+-----|   x        (3.2)
   -1    -ℓ2       -ℓ1        0        ℓ1        ℓ2    +1
For this class of designs, minimum $V_R$, $R \in \mathcal{D}$, has been computed and the $\min V|\min B_g$ designs tabulated for $N = 3, 4, \ldots, 15$. These designs depend on the particular value of $\gamma$. Tables 3.2, 3.3 and 3.4 give such designs for $\gamma = 1.01$, 1.50 and 5.00 respectively. These tables indicate that more and more design points are forced to the boundary of the operability region $\mathcal{X}$ as the value of $\gamma$ increases, i.e., as $\gamma$ increases the bias decreases and therefore has less and less influence on the choice of design.
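A hedged sketch of the search that stands behind Tables 3.2 through 3.4 (the grid resolution and quadrature are assumptions; the tables themselves were presumably produced by a finer optimization): for a fixed allocation $(N_0, N_1, N_2)$, scan $(\ell_1, \ell_2)$ and keep the design minimizing $V_R$.

```python
import numpy as np

# min V | min B_g search for the linear model g' = (1, x): build A and W_gg
# by quadrature, then evaluate V_R = N Tr[A (R'R)^- A' W_gg] over a grid of
# symmetric designs N0*(0), N1*(+-l1), N2*(+-l2).
gamma = 1.01
x = np.linspace(-1.0, 1.0, 200001)
avg = lambda u: 0.25 * np.sum((u[:-1] + u[1:]) * np.diff(x))  # Omega * integral
f = [x ** j / (gamma + x) for j in range(3)]
g = [x ** 0, x]
W_gg = np.array([[avg(a * b) for b in g] for a in g])         # ~ diag(1, 1/3)
W_gf = np.array([[avg(a * b) for b in f] for a in g])
A = np.linalg.solve(W_gg, W_gf)

def V_R(N0, N1, N2, l1, l2):
    pts = np.array([0.0] * N0 + [l1, -l1] * N1 + [l2, -l2] * N2)
    R = np.column_stack([pts ** j / (gamma + pts) for j in range(3)])
    return len(pts) * np.trace(A @ np.linalg.pinv(R.T @ R) @ A.T @ W_gg)

grid = np.linspace(0.05, 1.0, 96)   # coarse illustrative grid
best = min((V_R(0, 1, 1, l1, l2), l1, l2)
           for l1 in grid for l2 in grid if l1 < l2)
print(best)   # compare with the N = 4 row of Table 3.2
```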
For the following class of simpler models,

$$C_g = \{\eta_0 : \eta_0 = P_2(x{:}\alpha) = \alpha_0 + \alpha_1 x + \alpha_2 x^2\},$$

$$W_{gg} = \begin{bmatrix} 1 & 0 & \tfrac13 \\ 0 & \tfrac13 & 0 \\ \tfrac13 & 0 & \tfrac15 \end{bmatrix}, \qquad W_{gf} = \frac12\begin{bmatrix} Z & (2-\gamma Z) & (\gamma^2 Z - 2\gamma) \\ (2-\gamma Z) & (\gamma^2 Z - 2\gamma) & (\tfrac23 + 2\gamma^2 - \gamma^3 Z) \\ (\gamma^2 Z - 2\gamma) & (\tfrac23 + 2\gamma^2 - \gamma^3 Z) & (\gamma^4 Z - 2\gamma^3 - \tfrac23\gamma) \end{bmatrix}.$$

For the same set of parameters used in the previous example, i.e., $\theta' = (1, 2, 4)$, the values of $\min B_g(\gamma)$ appear in Table 3.5. From Tables 3.1 and 3.5 one notices that, for a fixed value of $\gamma$, $\min B_g(\gamma)$ when approximating $\eta$ by a linear model is larger than when using a quadratic model. In general it is easy to show that $B_g(L,\gamma) \ge B_g(Q,\gamma)$, where $B_g(L,\gamma)$ and $B_g(Q,\gamma)$ denote $\min B_g(\gamma)$ when approximating $\eta$ by a linear and a quadratic model respectively.
Using the class of designs described in (3.2), minimum $V_R$, $R \in \mathcal{D}$, has been computed and the $\min V|\min B_g$ designs appear in Tables 3.6, 3.7 and 3.8.
Table 3.2  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x$ and $\gamma = 1.01$

   N   N0   N1    ℓ1      N2    ℓ2      min V    p
  ---  ---  ---  ------   ---  ------   ------  ---
   3    1    1   0.8901    0   ------   1.8362   3
   4    0    1   0.5788    1   0.9160   1.7157   4
   5    1    1   0.7136    1   0.9245   1.7099   5
   6    0    2   0.6972    1   0.9398   1.6503   4
   7    1    2   0.7518    1   0.9425   1.6669   5
   8    0    3   0.7395    1   0.9534   1.6206   4
   9    1    3   0.7713    1   0.9543   1.6397   5
  10    0    4   0.7634    1   0.9629   1.6015   4
  11    1    4   0.7847    1   0.9629   1.6202   5
  12    0    5   0.7802    1   0.9702   1.5873   4
  13    1    5   0.7954    1   0.9698   1.6049   5
  14    0    6   0.8673    1   1.0000   1.5730   4
  15    1    6   0.8739    1   1.0000   1.5923   5

p = number of distinct design points
Table 3.3  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x$ and $\gamma = 1.50$

   N   N0   N1    ℓ1      N2    ℓ2      min V    p
  ---  ---  ---  ------   ---  ------   ------  ---
   3    1    1   0.7908    0   ------   1.8748   3
   4    0    1   0.3874    1   0.8428   1.8583   4
   5    1    1   0.5602    1   0.8711   1.8574   5
   6    0    2   0.5013    1   0.9130   1.8415   4
   7    1    2   0.5846    1   0.9276   1.8453   5
   8    0    3   0.5478    1   0.9681   1.8298   4
   9    1    3   0.6022    1   0.9780   1.8349   5
  10    0    4   0.5760    1   1.0000   1.8203   4
  11    1    4   0.6155    1   1.0000   1.8263   5
  12    0    5   0.5911    1   1.0000   1.8176   4
  13    1    5   0.6226    1   1.0000   1.8236   5
  14    0    6   0.6008    1   1.0000   1.8194   4
  15    1    6   0.6272    1   1.0000   1.8246   5

p = number of distinct design points
Table 3.4  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x$ and $\gamma = 5.00$

   N   N0   N1    ℓ1      N2    ℓ2      min V    p
  ---  ---  ---  ------   ---  ------   ------  ---
   3    1    1   0.8106    0   ------   1.8772   3
   4    0    1   0.0889    1   1.0000   1.7956   4
   5    1    1   0.3993    1   1.0000   1.8147   5
   6    2    1   0.6156    1   1.0000   1.8255   5
   7    3    1   0.8346    1   1.0000   1.8193   5
   8    4    2   1.0000    0   ------   1.7939   3
   9    3    1   0.3656    2   1.0000   1.8040   5
  10    4    1   0.6038    2   1.0000   1.8112   5
  11    5    2   0.9814    1   1.0000   1.8106   5
  12    6    3   1.0000    0   ------   1.7939   3
  13    5    1   0.3236    3   1.0000   1.8001   5
  14    6    1   0.5871    3   1.0000   1.8281   5
  15    7    4   1.0000    0   ------   1.8034   3

p = number of distinct design points
Table 3.5  $\min B_g(\gamma)$ when $\theta' = (1,2,4)$ and $\eta_0 = \alpha_0 + \alpha_1 x + \alpha_2 x^2$

  γ        min B_g(γ)
  ----     ----------
  1.01     251.6719
  1.50       0.0754
  5.00       0.0005
Each table corresponds to a particular value of $\gamma$, i.e., 1.01, 1.50 and 5.00 respectively. In the Box and Draper approach, when $\min B_g$ is achieved, $\min V = s$ (the number of parameters in the reduced model). The design flexibility given by the $\min B_g$ method allows one to always obtain $\min V \le s$ [Theorem 2.4 (iii)] and, in many situations, $\min V < s$ for infinitely many designs. As illustration, variance contours have been drawn for six designs which are shown in Figures 3.1 through 3.6. $L(N,N_0,N_1,N_2;\,\gamma)$ denotes the $\min V|\min B_g$ design with a total of $N$ observations which has been used to approximate $R_{2,1}(x{:}\theta,\gamma)$ when $\eta_0$ is a linear model. Similar notation is used when $\eta_0$ is a quadratic model, i.e., $Q(N,N_0,N_1,N_2;\,\gamma)$. Another way of taking advantage of the design flexibility offered by the $\min B_g$ method is to choose the $\min V|\min B_g$ design within the class of designs having equal spacing, which is given by the relations $\ell_2 = 2\ell_1$ or $\ell_1 = 2\ell_2$, as shown in Figure 3.2.
Table 3.6  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x + \alpha_2 x^2$ and $\gamma = 1.01$

   N   N0   N1    ℓ1      N2    ℓ2      min V    p
  ---  ---  ---  ------   ---  ------   ------  ---
   3    1    1   0.9412    0   ------   3.3145   3
   4    0    1   0.7201    1   0.9730   2.1807   4
   5    1    1   0.7788    1   0.9768   2.0986   5
   6    0    2   0.8585    1   1.0000   1.8194   4
   7    1    2   0.8743    1   1.0000   1.8337   5
   8    0    3   0.8623    1   1.0000   1.7000   4
   9    1    3   0.8741    1   1.0000   1.7238   5
  10    0    4   0.8645    1   1.0000   1.6423   4
  11    1    4   0.8739    1   1.0000   1.6661   5
  12    0    5   0.8659    1   1.0000   1.6088   4
  13    1    5   0.8736    1   1.0000   1.6310   5
  14    0    6   0.8667    1   1.0000   1.5874   4
  15    1    6   0.8732    1   1.0000   1.6077   5

p = number of distinct design points
Table 3.7  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x + \alpha_2 x^2$ and $\gamma = 1.50$

   N   N0   N1    ℓ1      N2    ℓ2      min V    p
  ---  ---  ---  ------   ---  ------   ------  ---
   3    1    1   0.9409    0   ------   2.6938   3
   4    0    1   0.2723    1   1.0000   2.3120   4
   5    1    1   0.4662    1   1.0000   2.2474   5
   6    0    2   0.4576    1   1.0000   2.2270   4
   7    1    2   0.5385    1   1.0000   2.2559   5
   8    0    3   0.5093    1   1.0000   2.2733   4
   9    1    2   0.3820    2   1.0000   2.2633   5
  10    0    3   0.4014    2   1.0000   2.2351   4
  11    1    3   0.4613    2   1.0000   2.2322   5
  12    0    4   0.4576    2   1.0000   2.2270   4
  13    1    4   0.4996    2   1.0000   2.2392   5
  14    0    5   0.4892    2   1.0000   2.2442   4
  15    1    4   0.4199    3   1.0000   2.2384   5

p = number of distinct design points
Table 3.8  $\min V|\min B_g$ designs when $\eta_0 = \alpha_0 + \alpha_1 x + \alpha_2 x^2$ and $\gamma = 5.00$

   N   N0   N1    ℓ1      N2    ℓ2      min V    p
  ---  ---  ---  ------   ---  ------   ------  ---
   3    1    1   1.0000    0   ------   2.4215   3
   4    0    1   0.8888    1   1.0000   2.1520   4
   5    1    1   0.1970    1   1.0000   2.2268   5
   6    2    1   0.5122    1   1.0000   2.3253   5
   7    3    2   1.0000    0   ------   2.1931   3
   8    4    2   1.0000    0   ------   2.1452   3
   9    5    2   1.0000    0   ------   2.1696   3
  10    4    1   0.3348    2   1.0000   2.2460   5
  11    5    3   1.0000    0   ------   2.1651   3
  12    6    3   1.0000    0   ------   2.1452   3
  13    7    3   1.0000    0   ------   2.1563   3
  14    8    3   1.0000    0   ------   2.1867   3
  15    7    4   1.0000    0   ------   2.1562   3

p = number of distinct design points
If the a priori knowledge of $\gamma$ is diffuse rather than sharp, so that only a range for $\gamma$ can be specified, then one could use the design flexibility available to achieve $\min B_g$ for a grid of values of $\gamma$ spread over the range specified.
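A speculative sketch of this grid idea (the design, grid values, and tolerance below are all assumptions): since $\min B_g$ is attained by the estimator whenever $A\theta$ is estimable, one may check the estimability condition $A(R'R)^-(R'R) = A$ for every $\gamma$ in the specified grid under a single candidate design.

```python
import numpy as np

# Check that one design keeps A(gamma) theta estimable over a grid of gamma
# values, so min B_g is attainable whichever grid value is adopted.
def estimable(pts, gamma, tol=1e-6):
    x = np.linspace(-1.0, 1.0, 50001)
    avg = lambda u: 0.25 * np.sum((u[:-1] + u[1:]) * np.diff(x))
    f = [x ** j / (gamma + x) for j in range(3)]
    g = [x ** 0, x]
    W_gg = np.array([[avg(a * b) for b in g] for a in g])
    W_gf = np.array([[avg(a * b) for b in f] for a in g])
    A = np.linalg.solve(W_gg, W_gf)
    R = np.column_stack([pts ** j / (gamma + pts) for j in range(3)])
    RtR = R.T @ R
    return np.allclose(A @ np.linalg.pinv(RtR) @ RtR, A, atol=tol)

pts = np.array([-0.9160, -0.5788, 0.5788, 0.9160])   # N = 4 design, Table 3.2
print(all(estimable(pts, gam) for gam in (1.01, 1.5, 2.0, 3.0, 5.0)))
```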
[Figure 3.1  Contours of constant $V$ for $L(N,N_0,N_1,N_2;\,\gamma) = L(4,0,1,1;\,1.01)$; axes $\ell_1$ (horizontal) and $\ell_2$ (vertical).]

[Figure 3.2  Contours of constant $V$ for $L(N,N_0,N_1,N_2;\,\gamma) = L(4,0,1,1;\,1.50)$; axes $\ell_1$ and $\ell_2$.]

[Figure 3.3  Contours of constant $V$ for $L(N,N_0,N_1,N_2;\,\gamma) = L(8,4,1,1;\,5.00)$; axes $\ell_1$ and $\ell_2$. The 5 point design collapses to a 3 point design because $\ell_1 = \ell_2$.]

[Figure 3.4  Contours of constant $V$ for $Q(N,N_0,N_1,N_2;\,\gamma) = Q(10,0,4,1;\,1.01)$; axes $\ell_1$ and $\ell_2$.]

[Figure 3.5  Contours of constant $V$ for $Q(N,N_0,N_1,N_2;\,\gamma) = Q(5,1,1,1;\,1.50)$; axes $\ell_1$ and $\ell_2$.]

[Figure 3.6  Contours of constant $V$ for $Q(N,N_0,N_1,N_2;\,\gamma) = Q(6,2,1,1;\,5.00)$; axes $\ell_1$ and $\ell_2$.]
4. SUMMARY AND CONCLUSIONS

A general regression model $\eta(x)$ is approximated by the BLUE of a relatively simpler model $\eta_0(x)$. The choice of $\eta_0(x)$ within a fixed class is made so as to minimize the bias. The approximation $\hat\eta_0(x)$, subject to satisfying the bias criterion, obtains minimum variance. It is not necessary that the relatively simpler model $\eta_0(x)$ be made up of terms of the full model. The minimum bias technique has been applied to a situation where $\eta(x)$ is a rational function and $\eta_0(x)$ is a simple polynomial function. Examples have been given where $\hat\eta_0(x)$ satisfies both bias and variance criteria. Illustration of the great deal of design flexibility allowed by this method has been provided, in particular:

(1) An infinite number of designs achieving minimum bias also give $V < s$ (the variance obtained by designs satisfying the Box-Draper conditions when the reduced model contains $s$ terms).

(2) An infinite number of designs achieving $\min B_g$ give $V < s$ and allow equal spacing of design levels.

(3) It is possible to obtain designs which will give $\min B_g$ for several values of $\gamma$, i.e., protection against inexact knowledge of $\gamma$ may be obtained by gridwise protection over a range of $\gamma$ values.
5. REFERENCES

Birkhoff, G. and S. MacLane. (1965). A Survey of Modern Algebra, Third Edition. The Macmillan Company, New York.

Box, G. E. P. and N. R. Draper. (1959). A Basis for the Selection of a Response Surface Design. Jour. Amer. Statist. Assoc. 54:622-654.

Box, G. E. P. and N. R. Draper. (1963). The Choice of a Second Order Rotatable Design. Biometrika 50:335-352.

Elfving, G. (1952). Optimum Allocation in Linear Regression Theory. Ann. Math. Statist. 23:255-262.

Folks, D. L. (1958). Comparison of Designs for Exploration of Response Relationships. Unpublished Ph.D. Thesis, Iowa State College, Microfilm, Inc., Ann Arbor, Michigan.

Gantmacher, F. R. (1960). The Theory of Matrices, Volume 1. Chelsea Publishing Company, New York.

Graybill, F. A. (1969). Introduction to Matrices with Applications in Statistics. Wadsworth Publishing Company, Belmont, California.

Hoel, P. G. and A. Levine. (1964). Optimal Spacing and Weighing in Polynomial Prediction. Ann. Math. Statist. 35:1553-1560.

Karson, M. J., A. R. Manson and R. J. Hader. (1969). Minimum Bias Estimation and Experimental Design for Response Surfaces. Technometrics 11:461-475.

Kiefer, J. (1959). Optimum Experimental Design. J. Roy. Statist. Soc. (Series B) 21:272-319.

Kiefer, J. (1961). Optimum Designs in Regression Problems II. Ann. Math. Statist. 32:298-325.

Kiefer, J. and J. Wolfowitz. (1959). Optimum Design in Regression Problems. Ann. Math. Statist. 30:271-294.

Rao, C. R. (1967). Linear Statistical Inference and its Applications. John Wiley and Sons, New York.

Scheffé, H. (1959). The Analysis of Variance. John Wiley and Sons, London, England.
6. APPENDIX
A.1 Theorem 2.1

A necessary and sufficient condition for the matrix $W_{gg}$ to be positive definite is that the $g_i$ $(i = 1, 2, \ldots, s)$ be linearly independent.

Proof

Since $W_{gg}$ is positive semi-definite, it suffices to prove that $W_{gg}$ is not singular (Gantmacher, 1960, p. 305). To prove the non-singularity of $W_{gg}$ is equivalent to proving the following: a necessary and sufficient condition for the matrix $W_{gg}$ to be singular is that the $g_i$ $(i = 1, 2, \ldots, s)$ be linearly dependent.

Assume that the $g_i$ $(i = 1, 2, \ldots, s)$ are linearly dependent; then there exists a vector of constants $c \ne 0$ such that $c'g(x) = 0$ a.e. on $\mathcal{X}$. Therefore

$$c'W_{gg}c = \Omega\int [c'g(x)]^2\,d\mu = 0 .$$

So $W_{gg}$ is not a positive definite matrix. However, since $W_{gg}$ is positive semi-definite, it must therefore be singular.

Assume now that the $g_i$ are linearly independent. For any vector of constants $c \ne 0$, $c'g(x) \ne 0$ a.e. on $\mathcal{X}$. Hence

$$c'W_{gg}c = \Omega\int [c'g(x)]^2\,d\mu > 0 .$$

Thus $W_{gg}$ is positive definite and is therefore non-singular.
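A small numerical illustration of this theorem (the grid quadrature and the particular functions are assumptions): $W_{gg}$ has strictly positive eigenvalues for linearly independent $g_i$, and a zero eigenvalue, up to quadrature rounding, once one $g_i$ is a linear combination of the others.

```python
import numpy as np

# Smallest eigenvalue of W_gg for an independent and a dependent set of g's
# on [-1, 1] with uniform mu (trapezoidal quadrature).
x = np.linspace(-1.0, 1.0, 4001)
avg = lambda u: 0.25 * np.sum((u[:-1] + u[1:]) * np.diff(x))

indep = [x ** 0, x, x ** 2]           # linearly independent
dep = [x ** 0, x, 2.0 * x - 3.0]      # third = 2*(second) - 3*(first)
for gs in (indep, dep):
    W = np.array([[avg(a * b) for b in gs] for a in gs])
    print(np.linalg.eigvalsh(W).min())   # > 0 only in the independent case
```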
A.2 Theorem 2.2

The vector of parameters $\alpha = A\theta$ is estimable if and only if $A' = V_1 T$, where $V_1$ is an $n \times r$ matrix whose columns are the orthonormal characteristic vectors corresponding to the $r$ non-zero characteristic roots of $(F'\Sigma^{-1}F)$ with $s \le r \le n$, and $T$ is an $r \times s$ matrix of full rank.

Proof

We know that $A\theta$ is estimable if and only if the rows of $A$ are in the column space of $(F'\Sigma^{-1}F)$ (Scheffé, 1959). Since the characteristic vectors corresponding to the non-zero characteristic roots of $(F'\Sigma^{-1}F)$ span the same space as do the columns of $(F'\Sigma^{-1}F)$, we have the required result.
A.3 Theorem 2.3

If $\alpha = A\theta$ is estimable and $A = W_{gg}^{-1}W_{gf}$ is of full rank, then the matrix $A(F'\Sigma^{-1}F)^-A'$ is non-singular.

Proof

Assume that $A(F'\Sigma^{-1}F)^-A'$ is singular. Then there exists a vector $d \ne 0$ such that

$$A(F'\Sigma^{-1}F)^-A'd = 0 . \tag{A.3.1}$$

Using the fact that $\alpha$ is estimable and premultiplying by $d'$, equation (A.3.1) is written as

$$d'A(F'\Sigma^{-1}F)^-(F'\Sigma^{-1}F)(F'\Sigma^{-1}F)^-A'd = 0 . \tag{A.3.2}$$

Since $\Sigma^{-1}$ is positive definite, there exists a non-singular matrix $P$ such that $\Sigma^{-1} = PP'$. Thus equation (A.3.2) may be written as

$$d'A(F'\Sigma^{-1}F)^-F'PP'F(F'\Sigma^{-1}F)^-A'd = 0 . \tag{A.3.3}$$

From equation (A.3.3) it follows that

$$[P'F(F'\Sigma^{-1}F)^-A'd]'[P'F(F'\Sigma^{-1}F)^-A'd] = 0 \tag{A.3.4}$$

and hence

$$P'F(F'\Sigma^{-1}F)^-A'd = 0 . \tag{A.3.5}$$

Premultiplying equation (A.3.5) by $F'P$ gives

$$F'\Sigma^{-1}F(F'\Sigma^{-1}F)^-A'd = 0 . \tag{A.3.6}$$

Since $\alpha$ is estimable, equation (A.3.6) becomes

$$A'd = 0,$$

which is a contradiction since $A$ is of full rank. Therefore $A(F'\Sigma^{-1}F)^-A'$ is non-singular.
A.4 Lemma 2.1

If $A = [A_1 : A_2]$ and $A_1$ is an $s \times s$ matrix of full rank, then a necessary condition for $A\theta$ to be estimable is that $F_1$ be of full rank.

Proof

Assume that $A\theta$ is estimable. Then $A = CF$ where $C$ is an $s \times N$ matrix. So $[A_1 : A_2] = C[F_1 : F_2]$, i.e., $A_1 = CF_1$. Therefore $A_1\theta_1$ is estimable. Since the rank of $A_1$ is $s$, the rank of $F_1$ must be at least $s$. Thus $F_1$ is of full rank.
A.5 Lemma 2.2

Using the notation of equation (2.8),

(i) $\mathcal{F}_{FM} \subseteq \mathcal{F}_{MB} \subseteq \mathcal{F}_{RM}$,
(ii) $\mathcal{F}_{BD} \subseteq \mathcal{F}_{MB}$,
(iii) when $N = s$, $\mathcal{F}_{MB} = \mathcal{F}_{BD}$.

Proof

(i) If $\theta$ is estimable then $A\theta$ is also. A necessary condition for $A\theta$ to be estimable is that $F_1$ be of full rank (Lemma 2.1); therefore $\theta_1$ is estimable. Hence $\mathcal{F}_{FM} \subseteq \mathcal{F}_{MB} \subseteq \mathcal{F}_{RM}$.

(ii) The estimability condition for $A\theta$ is $A(F'F)^-F'F = A$. But if $F \in \mathcal{F}_{BD}$, then $F_1$ is of full rank and the Box-Draper conditions give $W_{gg}^{-1}W_{gf_2} = (F_1'F_1)^{-1}F_1'F_2$, so that one can write

$$A = [I_s : (F_1'F_1)^{-1}F_1'F_2] = (F_1'F_1)^{-1}F_1'F . \tag{A.5.1}$$

Thus, if $F \in \mathcal{F}_{BD}$, equation (A.5.1) gives

$$A(F'F)^-F'F = (F_1'F_1)^{-1}F_1'F(F'F)^-F'F = (F_1'F_1)^{-1}F_1'F = A,$$

since $F(F'F)^-F'F = F$. Hence $F \in \mathcal{F}_{MB}$ and $\mathcal{F}_{BD} \subseteq \mathcal{F}_{MB}$.

(iii) When $N = s$ and $\mathcal{F}_{MB} = \phi$, the result obviously holds. However, when $N = s$ and $\mathcal{F}_{MB} \ne \phi$, take $F \in \mathcal{F}_{MB}$; then $F_1$ is an $s \times s$ matrix of full rank (Lemma 2.1), and $A = CF$ forces $C = F_1^{-1}$, so that the only way for $A\theta$ to be estimable is for

$$(F_1'F_1)^{-1}F_1'F_2 = W_{gg}^{-1}W_{gf_2},$$

i.e., $F \in \mathcal{F}_{BD}$. Therefore $\mathcal{F}_{MB} = \mathcal{F}_{BD}$ when $N = s$.
A.6 Theorem 2.4

Using the notation defined in equations (2.9) and (2.10),

(i) $V_{FM} \ge V_{MB}$,
(ii) $V_{MB} \ge V_{RM}$,
(iii) $V_{MB} \le V_{BD} = s$.

Proof

(i) For any $F \in \mathcal{F}_{FM}$,

$$V_{FM}(F) = N\,\mathrm{Tr}[(F'F)^{-1}W_{ff}] \quad\text{and}\quad V_{MB}(F) = N\,\mathrm{Tr}[A(F'F)^{-1}A'W_{gg}] = N\,\mathrm{Tr}[(F'F)^{-1}W_{gf}'W_{gg}^{-1}W_{gf}] .$$

Therefore, by Corollary 2.1, it follows that $V_{FM}(F) \ge V_{MB}(F)$. Since $\mathcal{F}_{FM} \subseteq \mathcal{F}_{MB}$ (Lemma 2.2, (i)), $V_{FM} \ge V_{MB}$.

(ii) For any $F = [F_1 : F_2] \in \mathcal{F}_{MB}$, $F_1$ is of full rank (Lemma 2.1) and

$$V_{MB}(F) = N\,\mathrm{Tr}[A(F'F)^-A'W_{gg}] \quad\text{and}\quad V_{RM}(F) = N\,\mathrm{Tr}[(F_1'F_1)^{-1}W_{gg}] .$$

One can write $A(F'F)^-A'$ as

$$A(F'F)^-A' = (F_1'F_1)^{-1} + Q\Delta^- Q' \tag{A.6.1}$$

where $\Delta = [F_2'F_2 - F_2'F_1(F_1'F_1)^{-1}F_1'F_2]$ is a positive semi-definite matrix and $Q = [W_{gg}^{-1}W_{gf_2} - (F_1'F_1)^{-1}F_1'F_2]$, so that $Q\Delta^- Q'$ is positive semi-definite. Therefore, for any $F \in \mathcal{F}_{MB}$, $V_{MB}(F) \ge V_{RM}(F)$. Hence Lemma 2.2 (i) gives $V_{MB} \ge V_{RM}$.

(iii) When $F \in \mathcal{F}_{BD}$, $Q = 0$ and the strict equality holds in equation (A.6.1), so that, with the design moments matching the region moments ($F_1'F_1 = NW_{gg}$),

$$V_{MB}(F) = V_{BD}(F) = N\,\mathrm{Tr}[(F_1'F_1)^{-1}W_{gg}] = \mathrm{Tr}[I_s] = s .$$

From Lemma 2.2 (ii), it follows that $V_{MB} \le V_{BD} = s$.
A.7 Theorem 2.5

To the model given in equation (2.7),

$$\eta_1 = \theta_1' f_1 + \theta_2' f_2 = \theta' f,$$

add $t$ more terms, say $\theta_3' f_3$. The new model is

$$\eta_2 = \theta' f + \theta_3' f_3 .$$

Let $C_g = \{\eta_0 : \eta_0 = \alpha' g$ where $g_k = f_k$ $(k = 1, 2, \ldots, s)\}$ be the class of simpler models. Let $V_i$ denote the averaged variance of the $\min V|\min B_g$ linear estimator of $\eta_i$ $(i = 1, 2)$. Then $V_1 \le V_2$.

Proof

Let the design matrix $F^* = [F : F_3]$ be partitioned so as to match the partitioning of the model $\eta_2$. Then $V_1 = N\,\mathrm{Tr}[A_1(F'F)^-A_1'W_{gg}]$ and $V_2 = N\,\mathrm{Tr}[A(F^{*\prime}F^*)^-A'W_{gg}]$, where $A_1 = [I_s : W_{gg}^{-1}W_{gf_2}]$ and $A = [A_1 : W_{gg}^{-1}W_{gf_3}]$.

The matrix $A(F^{*\prime}F^*)^-A'$ may be expanded as follows:

$$A(F^{*\prime}F^*)^-A' = A_1(F'F)^-A_1' + U\Delta^- U'$$

where $U = [A_1(F'F)^-F'F_3 - W_{gg}^{-1}W_{gf_3}]$ and $\Delta = [F_3'F_3 - F_3'F(F'F)^-F'F_3]$ is a positive semi-definite matrix. Therefore $U\Delta^- U'$ is positive semi-definite, and

$$V_2 \ge V_1 .$$