SOME RESULTS ON THE PROBABILITY DISTRIBUTION OF THE
LATENT ROOTS OF A SYMMETRIC MATRIX OF CONTINUOUSLY
DISTRIBUTED ELEMENTS, AND SOME APPLICATIONS
TO THE THEORY OF RESPONSE SURFACE ESTIMATION

Prepared under Contract No. DA-36-034-ORD-1517 (RD)
(Experimental Designs for Industrial Research)

by

H. R. van der Vaart

Institute of Statistics
Mimeo. Series No. 189
January, 1958
CONTENTS

1  Introduction
   1.1  Preliminaries, notations, statement of the problem
   1.2  Definition of certain key matrices

2  Distribution-free results on the latent roots distribution
   2.1  Some theorems connected with the bias of L as an estimator of Λ
   2.2  Some theorems on the variances and covariances of the elements of L and L̂
   2.3  A condition for symmetry in the distribution of the latent roots

3  Application of the preceding theory to some aspects of the estimation of quadratic response surfaces
   3.1  A few general considerations
   3.2  Bias in the estimation of type of quadratic surface
   3.3  The variance of the coefficients of the canonical form
   3.4  Proposal of a new approach to some problems in the evaluation of response surfaces

Literature cited
SOME RESULTS ON THE PROBABILITY DISTRIBUTION OF THE
LATENT ROOTS OF A SYMMETRIC MATRIX OF CONTINUOUSLY
DISTRIBUTED ELEMENTS, AND SOME APPLICATIONS
TO THE THEORY OF RESPONSE SURFACE ESTIMATION

by

H. R. van der Vaart
Institute of Statistics, University of North Carolina
Raleigh, North Carolina, U. S. A.
and
Leiden University, Netherlands

1  Introduction

1.1  Preliminaries, notations, statement of the problem.
Consider a k × k real matrix M, each of its elements being allowed to take values in a certain space. If a probability measure is defined over the product Π_M of these k² spaces, M will be called a random matrix. This probability measure will be called singular (relative to the k²-dimensional Lebesgue measure to be defined on Π_M) if a subset of Π_M exists which has Lebesgue measure zero and probability measure one (because of probabilities being non-negative the more sophisticated definitions of Saks, 1937, p. 30, or Doob, 1953, p. 611 can be simplified in this context). This subset will be called the singular set of the probability measure in question (relative to the afore-said Lebesgue measure), or alternatively its set of increase (cf. l.c.). We shall denote this singular set of the probability measure over Π_M by the term M-space, and the probability distribution (= probability measure) over M-space by the term (probability) distribution of M. We may need a term to denote Π_M and use "enlarged M-space" to this end.

Examples of singular probability measures defined on Π_M: the elements of M may be restricted to integral values, M may be prescribed to be orthogonal (M-space ½k(k−1)-dimensional), or symmetric (M-space ½k(k+1)-dimensional). In the second and third example M-space is a hypersurface in Π_M and may be defined for instance by equations regarding the elements of M, or by parametrization of these elements.
We shall briefly discuss the case of a symmetric random matrix M. Here M-space may be defined as a subset of Π_M by the parametrization

    m_ij = t_ij   (j ≥ i; i = 1, …, k),
    m_ij = t_ji   (j < i; i = 1, …, k).                                   (1.1,1)

However, it is slightly more convenient to identify the t_ij and the m_ij (j ≥ i) and so to use (m_11, …, m_1k, m_22, …, m_2k, …, m_{k−1,k−1}, m_{k−1,k}, m_kk)-space for M-space, and to use the projection by (1.1,1) of the probability distribution over the set defined on this m_ij (1 ≤ i ≤ j ≤ k)-space for the distribution of M. We shall call this distribution of a symmetric random matrix M continuous if it is absolutely continuous with respect to ½k(k+1)-dimensional Lebesgue measure on m_ij (1 ≤ i ≤ j ≤ k)-space.
Let again ₁M and ₂M be two k × k real (otherwise arbitrary) matrices. As is well known, they are said to be equal, ₁M = ₂M, if each of the k² elements of ₁M equals the corresponding element of ₂M. If now ₁M and ₂M are in addition random, they will be called equivalent (relative to the probability measure considered) if ₁M = ₂M with probability one (cf. Kolmogoroff, 1933 (1950), p. 33, for M a scalar; if one needs to distinguish between this type of equivalence and the one described by Cramér, 1946, p. 40 seqq., one may add the qualification "stochastical" to the above-defined type). This entails that the probabilities of any subsets of the enlarged ₁M-space and of the enlarged ₂M-space are equal as soon as these subsets can be transformed into each other by the equation ₁M = ₂M; hence the distributions of ₁M and of ₂M are the same: if ₁M and ₂M are equivalent random matrices, the expectation of f(₁M) equals the expectation of f(₂M) (provided these expectations exist).
Before stating the problem to be investigated, let us add some more notation conventions. We shall write the expectation of a matrix for the matrix of the expectations:

    ℰ(M) = ℰ((m_ij))_{i,j = 1, …, k} = ((ℰ m_ij))_{i,j = 1, …, k} = ((μ_ij))_{i,j = 1, …, k};      (1.1,2)

if the Greek capital Μ were different from the Latin capital M, we would have replaced the fourth member of (1.1,2) by one single Greek capital in accordance with the next convention: a Greek small letter will denote a parameter, and the corresponding Latin small letter a random variable closely related to it (the Greek letter denotes the expectation of the Latin letter, or the Latin letter denotes an estimator of the Greek one, for instance); a Greek capital will denote a square matrix of parameters, a Latin capital a square matrix of random variables. "Estimating Μ" (Μ = ℰ M) will stand for "jointly estimating the μ_ij (i, j = 1, …, k)". A symbol like m_i. will denote the 1 × k matrix or row vector consisting of the elements in the i-th row of M; m_.j will denote the k × 1 matrix or column vector consisting of the elements in the j-th column of M. Sometimes we shall denote vectors by underlining (printing in bold type) small letters without any subscript.
Now we shall state our problem. Let Φ = ((φ_ij)), i, j = 1, …, k, be a real symmetric k × k matrix; let F = ((f_ij)), i, j = 1, …, k, be a real symmetric k × k random matrix which is continuously distributed and satisfies

    ℰ(F) = Φ.                                                            (1.1,3)

As both Φ and F are real and symmetric, their latent roots, k in number, will be real. Denote the latent roots of Φ by λ_h and those of F by ℓ_h (h = 1, …, k), and assign the subscripts in such a way that

    λ_1 ≤ λ_2 ≤ … ≤ λ_k,                                                 (1.1,4)
    ℓ_1 ≤ ℓ_2 ≤ … ≤ ℓ_k.                                                 (1.1,5)

The object of this paper is to investigate the joint distribution of the ℓ_h, where the ℓ_h will be allowed to be negative -- contrary to a number of previous investigations on the distribution of latent roots. Though two or more roots λ_h may be equal, the probability that two or more roots ℓ_h be strictly equal is always zero because of F being continuously distributed. The following notations will be used:

    Λ = ((δ_gh λ_h)),   g, h = 1, …, k,
    L = ((δ_gh ℓ_h)),   g, h = 1, …, k,                                  (1.1,6)

where δ_gh is the well known Kronecker symbol.

The distribution of L (note that L-space will be "k-dimensional" in the sense that its points are defined by k coordinates) has an interest of its own as well as in connection with certain specific problems. Chapter 3 will discuss one of these specific problems, viz. the estimation of the type of response surfaces (cf. for instance Box and Youle, 1955), whereas Chapter 2 will give a number of distribution-free results on the distribution of L (i.e. results which are to a large extent independent of the distribution of F).

R e m a r k.  The above-mentioned condition that F be continuously distributed will be maintained throughout the paper. Yet it may be weakened on some occasions, which are left to the reader to point out. One example is the zero probability of two or more roots ℓ_h being strictly equal. Certain discrete distributions of F will share this property with the continuous distributions.
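The set-up (1.1,3)-(1.1,6) is easy to reproduce numerically. The following minimal sketch assumes NumPy; the particular Φ and the normal perturbation of its elements are arbitrary illustrative choices, since the paper only requires a continuous joint distribution of the f_ij with ℰ(F) = Φ.

    import numpy as np

    rng = np.random.default_rng(0)

    def random_symmetric_F(Phi, sigma=1.0):
        """One realization of a symmetric random matrix F with E(F) = Phi.

        The elements on and above the diagonal are independent normal
        variables centred at the corresponding elements of Phi (an
        illustrative choice only)."""
        k = Phi.shape[0]
        E = np.triu(rng.normal(0.0, sigma, size=(k, k)))   # noise on and above the diagonal
        E = E + np.triu(E, 1).T                            # reflect to make the noise symmetric
        return Phi + E

    Phi = np.diag([-1.0, 0.5, 2.0])          # hypothetical Phi with latent roots -1, 0.5, 2
    lam = np.sort(np.linalg.eigvalsh(Phi))   # lambda_1 <= ... <= lambda_k, eq. (1.1,4)
    F = random_symmetric_F(Phi)
    ell = np.sort(np.linalg.eigvalsh(F))     # ell_1 <= ... <= ell_k, eq. (1.1,5)
    print(lam, ell)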
1.2  Definition of certain key matrices.

As is well known, real orthogonal matrices Y and U can always be found such that

    Φ Y = Y Λ,    F U = U L.                                             (1.2,1a)

Here the columns of Y constitute the mutually orthogonal latent vectors of Φ, while y_.h corresponds to λ_h; the columns of U constitute the mutually orthogonal latent vectors of F, while u_.h corresponds to ℓ_h. It is well known that if y_.h is a latent vector, so is −y_.h; if u_.h is a latent vector, so is −u_.h; besides, if two or more latent roots of Φ or F are equal, λ_{h_1} = … = λ_{h_d} say, then there is an infinite set of systems of corresponding mutually orthogonal latent vectors y_.h_1, y_.h_2, …, y_.h_d, say, or u_.h_1, u_.h_2, …, u_.h_d, say. Neither of these possibilities will have consequences in the course of our investigations, because the elements of Y will be seen to drop out of all interesting developments at an early stage. The probability of two or more latent roots of F being equal is zero (cf. the second paragraph of section 1.1); hence the columns of U are uniquely defined with probability one, except for their sign; as for this point see the last paragraph but one in this section.
For easy reference the following formulae, equivalent to (1.2,1a), are listed:

    Φ = Y Λ Y′,        F = U L U′,                                       (1.2,1b,c)
    Y′ Φ = Λ Y′,       U′ F = L U′,                                      (1.2,1d,e)
    Y′ Φ Y = Λ,        U′ F U = L.                                       (1.2,1f,g)

Another real matrix to play an important rôle in this investigation is V, defined by one of the equivalent relations

    V = Y′U,    V′ = U′Y,    YV = U,    V′Y′ = U′,    Y = UV′,    Y′ = VU′.        (1.2,2)

Note that as U is a random matrix so is V, and that if Y and U are restricted so as to be proper orthogonal, so is V.
As for the signs of the column vectors of U, it is evident that the equations (1.2,2), once Y has been selected, establish a one-one correspondence between the signs of the columns of U and V. Now in deriving the distribution-free results of this paper these signs will be seen to be of no consequence. However, they will become important when the distribution of L is studied under more precise assumptions, e.g., that F be multinormally distributed. Then the Haar measure on the orthogonal group of U and V matrices will come into play (cf. Anderson, 1951, and James, 1954), and the sets of admitted matrices U and V will then have to be chosen in harmony with the normalization of this measure.

Both Y and U, hence V, will be restricted throughout to be proper orthogonal, i.e., their determinants are equal to +1.
2  Distribution-free results on the latent roots distribution

Under the assumptions of section 1.1, viz., if the k × k real symmetric random matrix F is distributed continuously and if

    ℰ(F) = Φ,                                                            (2,1)

one finds from equations (1.2,1g), (2,1), (1.1,2), and (1.2,2) that

    Λ = Y′ Φ Y = Y′ ℰ(F) Y = ℰ(Y′ F Y) = ℰ(V U′ F U V′) = ℰ(V L V′).      (2,2)

Thus, though the off-diagonal elements of the real symmetric random matrix

    Y′ F Y = V L V′ = L̂, say,                                            (2,3)

are different from zero with probability one, and though ℓ̂_11 may exceed ℓ̂_22 with positive probability (see the Remark at the end of this section), yet this matrix L̂ would be an expectation-unbiased estimator of Λ (as for this concept, see van der Vaart, 1957a, section 2). Unfortunately, in the usual cases one will not know Φ or Y, but only be able to observe a value, or rather a realization, of the random matrix F. Hence among the three matrices Y, V, U occurring in equations (1.2,2), (2,2) and (2,3), Y and V will be essentially unknown, so that generally speaking L̂ cannot serve as an estimator of Λ. Note that if Y should happen to be the identity matrix I (which would entail that Φ is the diagonal matrix Λ), then equations (2,2) and (2,3) show that F = L̂ rather than the commonly used U′FU = L ought to serve as an estimator of Λ. This seems to be a first corroboration of a conjecture made by R. L. Anderson (and privately communicated to the present author during his stay at the Institute of Statistics, Raleigh) to the effect that if some off-diagonal element of F happens to be smaller than its standard error, it might be a good idea to estimate the response surface as if this off-diagonal element were zero. This leads to altering U and to using this altered U in the estimate U′FU of Λ (if all off-diagonal elements of F are small -- which is a fairly probable occurrence in the above-mentioned case where Y = I, Φ = Λ -- then this conjecture correctly leads to U = I, hence to F as an estimator of Λ). It is intended to give this matter further consideration in another paper.

It is clear from equation (2,2) that though the off-diagonal elements of L are identically zero, yet this matrix L may leave much to be desired when used as an estimator of Λ. As this use is quite common in the evaluation of response surfaces (see for instance Box and Youle, 1955, p. 295, and Box and Hunter, 1957, p. 239, and section 3 of the present paper), it is worth while to investigate the properties of L as an estimator of Λ. In section 2.1 some bias properties of L will be investigated, section 2.2 will be concerned with variances and covariances of the elements of L and L̂, whereas section 2.3 will investigate a sufficient condition to be imposed on the distribution of F in order that the distribution of L will be symmetrical in a certain sense.
R e m a r k s.  Note that because the off-diagonal elements of Λ and L equal zero by definition, no confusion can arise from using a single subscript in λ_h, ℓ_h, the diagonal elements of Λ and L, respectively. However, the diagonal elements of L̂ ought to be denoted by ℓ̂_hh or ℓ̂_gg as opposed to ℓ̂_gh (g ≠ h) for the off-diagonal elements. Yet we shall frequently write ℓ̂_h or ℓ̂_g instead of ℓ̂_hh or ℓ̂_gg: if one subscript is attached, this symbol will represent a diagonal element of L̂; if two subscripts are attached, the symbol will have its usual meaning.

The proof that the off-diagonal elements of L̂ are different from zero with probability one is similar to the proof of Lemma 2.1,1 in the next section: the probability measure (absolutely continuous with respect to Lebesgue measure) which is defined on F-space is ½k(k+1)-dimensional, whereas the dimension of the set of points with ℓ̂_gh = Σ_{i,j} y_ig f_ij y_jh = 0 (g ≠ h) is less than ½k(k+1); hence the probability of this point set is zero.
An example showing that P(ℓ̂_1 > ℓ̂_2) may be positive is given by: let k = 2, Y = I, and let Φ = Λ, the diagonal matrix with diagonal elements λ_1 < λ_2; then F = ((f_ij)), i, j = 1, 2, and L̂ = Y′FY = F. Let the distribution of F be multinormal; then by section 1.1 ℰ(f_11) = λ_1 < λ_2 = ℰ(f_22), yet P(f_11 > f_22) is known to be positive.
2.1  Some theorems connected with the bias of L as an estimator of Λ.

The first theorem is on absence of bias rather than on its presence.

T h e o r e m  2.1,1.  If F is continuously distributed and ℰ(F) = Φ, then

    ℰ(tr L) = ℰ(tr L̂) = tr Λ = tr Φ = tr ℰ(F).                          (2.1,1)

R e m a r k.  As usual, tr Φ stands for Σ_i φ_ii, etc.

Proof is immediate from equation (2,2) and the following properties of traces:

    tr ℰ(F) = ℰ(tr F)  if F is any random matrix (cf. eq. (1.1,2));
    tr (AB) = tr (BA)  for any square matrices A and B;

for instance,

    tr Λ = (by eq. (2,2)) tr ℰ(VLV′) = ℰ[tr(VLV′)] = ℰ[tr(LV′V)] = (since V is orthogonal) ℰ(tr L).       (2.1,3)
Now, in order to prove the next two theorems on bias, we need

L e m m a  2.1,1.  If the probability distribution of F is continuous and ℰ(F) = Φ, then

    P(ℓ̂_1 − ℓ_1 > 0) = 1,                                               (2.1,4a)
    P(ℓ_k − ℓ̂_k > 0) = 1.                                               (2.1,4b)

Proof.  As V is an orthogonal matrix,

    Σ_{h=1}^{k} v_{1h}² = 1,    Σ_{h=1}^{k} v_{kh}² = 1.

Equation (2,3) yields ℓ̂_1 = Σ_h v_{1h}² ℓ_h and ℓ̂_k = Σ_h v_{kh}² ℓ_h. Hence

    ℓ̂_1 − ℓ_1 = Σ_h v_{1h}² (ℓ_h − ℓ_1),                                (2.1,5a)
    ℓ_k − ℓ̂_k = Σ_h v_{kh}² (ℓ_k − ℓ_h).                                (2.1,5b)

As v_{gh}² ≥ 0 and ℓ_h − ℓ_g ≥ 0 if h > g (cf. inequalities (1.1,5)), it is trivial that ℓ̂_1 − ℓ_1 and ℓ_k − ℓ̂_k are essentially non-negative. Hence all that needs proof is

    P(ℓ̂_1 − ℓ_1 = 0) = 0,                                               (2.1,6a)
    P(ℓ_k − ℓ̂_k = 0) = 0.                                               (2.1,6b)

We shall write down the proof for equation (2.1,6a). The equation

    F = U L U′ = Y V L V′ Y′,

which ensues from equations (1.2,1c) and (1.2,2), parametrizes (for this term see e.g. Busemann, 1947, p. 246) the ½k(k+1)-dimensional F-space (F is symmetric!) in terms of the Cartesian product of ½k(k−1)-dimensional V-space (V is orthogonal!) and k-dimensional L-space. Now equation (2.1,5a) shows that the condition ℓ̂_1 − ℓ_1 = 0 can be satisfied only if ℓ_1 = … = ℓ_g < ℓ_{g+1} and 0 = v_{1,g+1} = … = v_{1k}, for some g = 1, …, k. Hence ℓ̂_1 − ℓ_1 = 0 defines a union of k subsets of the Cartesian product of V-space and L-space of dimension lower than ½k(k+1). In consequence the image of ℓ̂_1 − ℓ_1 = 0 in F-space has ½k(k+1)-dimensional Lebesgue measure zero, and as the probability distribution over F-space is assumed to be absolutely continuous with respect to this Lebesgue measure over F-space, it follows that

    P(ℓ̂_1 − ℓ_1 = 0) = 0

(cf. equation (3.5) in Busemann, l.c.).
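For k = 2 the content of (2.1,5a) and (2.1,5b) can be written out in full; the following short worked computation (an illustration, using only the orthogonality of V) makes the non-negativity explicit:

\[
\hat{\ell}_1 = v_{11}^2 \ell_1 + v_{12}^2 \ell_2 ,\qquad v_{11}^2 + v_{12}^2 = 1 ,
\]
\[
\hat{\ell}_1 - \ell_1 = v_{12}^2 (\ell_2 - \ell_1) \ge 0 ,\qquad
\ell_2 - \hat{\ell}_2 = v_{21}^2 (\ell_2 - \ell_1) = v_{12}^2 (\ell_2 - \ell_1) \ge 0 ,
\]

with equality only if v_12 = 0 or ℓ_1 = ℓ_2, each of which has probability zero under a continuous distribution of F.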
L e m m a  2.1,2.

    P[ Σ_{i=1}^{k} Σ_{j=1}^{k} (y_{i1} y_{j1} − u_{i1} u_{j1}) f_ij > 0 ] = 1.

Proof.  It is easily seen from equations (2,3) and (1.2,1g) that ℓ̂_1 − ℓ_1 is the element in the first row and first column of the first member of the matrix equality (which ensues from equations (1.2,1a) and (1.2,2))

    L̂ − L = Y′FY − U′FU.                                                (2.1,8)

Writing down the element in the first row and first column of the second member of (2.1,8) and applying Lemma 2.1,1 yields the desired result. A similar argument pertains to ℓ_k − ℓ̂_k.
Now we are ready to prove that the smallest latent root of F is a negatively expectation-biased estimator of the smallest latent root of Φ, that the largest latent root of F is a positively expectation-biased estimator of the largest latent root of Φ, and that their difference is a positively expectation-biased estimator of the corresponding difference.

T h e o r e m  2.1,2.  If F is continuously distributed and ℰ(F) = Φ, then

    ℰ(ℓ_1) < λ_1,    ℰ(ℓ_k) > λ_k,    ℰ(ℓ_k − ℓ_1) > λ_k − λ_1.

Proof.  As by equations (2,3) and (2,2) λ_1 = ℰ(ℓ̂_1) and λ_k = ℰ(ℓ̂_k), one finds readily

    λ_1 − ℰ(ℓ_1) = ℰ(ℓ̂_1 − ℓ_1) = ∫ (ℓ̂_1 − ℓ_1) dP > 0,

the last inequality sign holding true because of Lemma 2.1,1; see for instance Halmos, 1950, p. 104. Similarly

    ℰ(ℓ_k) − λ_k = ∫ (ℓ_k − ℓ̂_k) dP > 0,
    ℰ(ℓ_k − ℓ_1) − (λ_k − λ_1) = ∫ (ℓ_k − ℓ̂_k) dP + ∫ (ℓ̂_1 − ℓ_1) dP > 0.
The next theorem contains a result closely related to distribution-bias:

T h e o r e m  2.1,3.  If F is continuously distributed and ℰ(F) = Φ, then for every real ξ

    P(ℓ_1 ≤ ξ) ≥ P(ℓ̂_1 ≤ ξ),                                            (2.1,10a)
    P(ℓ_k ≥ ξ) ≥ P(ℓ̂_k ≥ ξ),                                            (2.1,10b)
    P(ℓ_k − ℓ_1 ≥ ξ) ≥ P(ℓ̂_k − ℓ̂_1 ≥ ξ).                                (2.1,10c)

R e m a r k.  If ℓ̂_1 and ℓ̂_k could be regarded as estimators (which, unfortunately, they cannot in general, cf. section 2) and if each of the three inequalities in this theorem were strict for at least one value of ξ, then these inequalities would mean that ℓ_1 would be negatively, and ℓ_k and ℓ_k − ℓ_1 positively, distribution-biased estimators of λ_1, λ_k, and λ_k − λ_1 with respect to the comparing estimators ℓ̂_1, ℓ̂_k, and ℓ̂_k − ℓ̂_1 (cf. van der Vaart, 1957a, definition 3). As it is, this theorem may provide us with limits for certain probabilities and in a special case with a proof of median-bias (see corollary b of this theorem).
Proof.  The proof ensues immediately from the simple lemmata 1 and 1′ in section 4 of van der Vaart (1957a). In lemma 1, contending that

    if P(z ≤ ζ) = 1, then P(m ≤ ξ) ≤ P(m + z ≤ ξ + ζ), whether the random variables m and z are independent or not,

substitute ζ by 0, z by (ℓ_1 − ℓ̂_1), m by ℓ̂_1 to find (2.1,10a). In lemma 1′, contending that

    if P(z ≥ ζ) = 1, then P(m ≥ ξ) ≤ P(m + z ≥ ξ + ζ), whether m and z are independent or not,

substitute ζ by 0, z by (ℓ_k − ℓ̂_k), m by ℓ̂_k to find (2.1,10b), and ζ by 0, z by (ℓ_k − ℓ̂_k − ℓ_1 + ℓ̂_1), m by ℓ̂_k − ℓ̂_1 to find (2.1,10c).
C o r o l l a r y  a  of Theorem 2.1,3.  If each interval in f_ij (1 ≤ i ≤ j ≤ k)-space has positive probability, then the inequalities (2.1,10a, b and c) are strict.
Proof.  We may remark that by interval we mean a set of points in F-space such that

    α_ij < f_ij < β_ij    (1 ≤ i ≤ j ≤ k),

where α_ij as well as β_ij are allowed to be either finite or infinite (in the latter case α_ij = −∞, β_ij = +∞). Now if we want to prove the three inequalities (2.1,10a, b and c) to be strict, it will be sufficient to show that

    P[(ℓ̂_1 > ξ_1) ∩ (ℓ_1 < ξ_1)] > 0,                                   (2.1,11a)
    P[(ℓ̂_k < ξ_k) ∩ (ℓ_k > ξ_k)] > 0,                                   (2.1,11b)
    P[(ℓ̂_k − ℓ̂_1 < ξ) ∩ (ℓ_k − ℓ_1 > ξ)] > 0,                           (2.1,11c)

respectively (cf. equation (4.2) in Lemma 1 in van der Vaart, 1957a). As by the condition of the corollary each interval containing any point in F-space, hence each corresponding neighbourhood of any point in the Cartesian product of V-space and L-space, has positive probability, all we have to show is that there are points in (V-space) × (L-space) in which both members of the intersections in (2.1,11a), (2.1,11b) and (2.1,11c) are satisfied. For, we may then surround such points by a neighbourhood (corresponding to an interval in F-space) all points of which satisfy the same inequalities, whence the proof will be complete.

Now, given any ξ_1, choose ℓ_1 < ξ_1, choose ℓ_2 (and ℓ_3, …, ℓ_k not smaller than ℓ_2) with ℓ_2 − ξ_1 > ξ_1 − ℓ_1, and choose v_{11}² < 1/2. Then

    ℓ̂_1 = Σ_h v_{1h}² ℓ_h ≥ v_{11}² ℓ_1 + (1 − v_{11}²) ℓ_2,

so that

    ℓ̂_1 − ξ_1 ≥ v_{11}² (ℓ_1 − ξ_1) + (1 − v_{11}²)(ℓ_2 − ξ_1) > (1 − 2 v_{11}²)(ξ_1 − ℓ_1) > 0,

while ℓ_1 − ξ_1 < 0; this settles (2.1,11a). Similarly, given any ξ_k, choose ℓ_k > ξ_k, choose ℓ_{k−1} (and ℓ_1, …, ℓ_{k−2} not larger than ℓ_{k−1}) with ξ_k − ℓ_{k−1} > ℓ_k − ξ_k, and choose v_{kk}² < 1/2. Then

    ℓ̂_k = Σ_h v_{kh}² ℓ_h ≤ v_{kk}² ℓ_k + (1 − v_{kk}²) ℓ_{k−1},

so that

    ℓ̂_k − ξ_k ≤ v_{kk}² (ℓ_k − ξ_k) − (1 − v_{kk}²)(ξ_k − ℓ_{k−1}) < (−1 + 2 v_{kk}²)(ξ_k − ℓ_{k−1}) < 0,

while ℓ_k − ξ_k > 0; this settles (2.1,11b). Finally, if in (2.1,11c) ξ > 0, write ξ = ξ_k − ξ_1 and combine the two preceding choices, taking care that ℓ_1 ≤ ℓ_2 ≤ … ≤ ℓ_{k−1} ≤ ℓ_k and that v_{11}² and v_{kk}² belong to one and the same orthogonal matrix V, which is clearly possible: then ℓ̂_k − ℓ̂_1 < ξ_k − ξ_1 = ξ, whereas ℓ_k − ℓ_1 > ξ_k − ξ_1 = ξ. If in (2.1,11c) ξ ≤ 0, the construction just given does not apply. Because of P(ℓ_k − ℓ_1 > 0) = 1, all we have to show in this case is that there are points in the Cartesian product of V-space and L-space in which ℓ̂_k − ℓ̂_1 < ξ. Choose ℓ_1 and ℓ_k such that ℓ_1 − ℓ_k < ξ ≤ 0, and choose v_{k1}² = 1 = v_{1k}². Then

    ℓ̂_k − ℓ̂_1 = Σ_h (v_{kh}² − v_{1h}²) ℓ_h = ℓ_1 − ℓ_k < ξ.
C o r o l l a r y  b  of Theorem 2.1,3.  In order that ℓ_1, ℓ_k and ℓ_k − ℓ_1 be negatively, positively and positively median-biased estimators of λ_1, λ_k, and λ_k − λ_1 respectively, it is sufficient that the ½k(k+1)-variate probability density function of the f_ij (1 ≤ i ≤ j ≤ k) be symmetrical with respect to the point with coordinates φ_ij = ℰ(f_ij) (1 ≤ i ≤ j ≤ k).

Proof.  As the probability density function is determined only up to its values on a set of Lebesgue measure (on F-space) zero, the statement of this corollary ought to be read with this qualification in mind. Now put ξ = λ_1, λ_k, λ_k − λ_1 in the inequalities (2.1,10a), (2.1,10b), (2.1,10c), respectively, of Theorem 2.1,3. Then the proof will be complete if the right-hand members of these inequalities can be shown to be at least one half. Because of equations (2,3) and (1.2,1f), Y′FY = L̂ and Y′ΦY = Λ. Thus the medians of

    Σ_{i,j} y_{i1} y_{j1} (f_ij − φ_ij),    Σ_{i,j} y_{ik} y_{jk} (f_ij − φ_ij),    Σ_{i,j} (y_{ik} y_{jk} − y_{i1} y_{j1})(f_ij − φ_ij)

are to be shown to be zero. As these expressions when equated to zero represent hyperplanes in F-space which contain the center of symmetry, the proof is evident.

(It was originally thought that ℓ_k and ℓ_1 are median-biased for any continuous distribution of F, cf. van der Vaart, 1957b; for other than symmetrical distributions we have not been able to prove or disprove this idea, however; for symmetrical distributions the median-bias of ℓ_k − ℓ_1 has been added to the original idea.)
Finally we want to remark that though the existence of negative and positive expectation-bias of the latent roots is to a large extent distribution-free, and the existence of median-bias is so to a lesser extent, the amounts of the expectation-bias, and the amounts of the median-bias, depend on the distribution of F. In case k = 2, the formula

    ℓ̂_1 − ℓ_1 = v_{12}² (ℓ_2 − ℓ_1),

a special case of (2.1,5a), will turn out to be useful in computations of this amount under more precise assumptions concerning the distribution of F.
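As an illustration of the last remark, the amount of expectation-bias for k = 2 can be estimated directly by simulation. The sketch below assumes NumPy; the multinormal model with independent elements and Y = I is a hypothetical special case, not the general setting of the paper.

    import numpy as np

    rng = np.random.default_rng(2)
    lam1, lam2 = 0.0, 1.0                  # hypothetical latent roots of Phi (Y = I here)
    s11, s22, s12 = 0.5, 0.5, 0.5          # standard deviations of f_11, f_22, f_12
    reps = 200000

    f11 = rng.normal(lam1, s11, reps)
    f22 = rng.normal(lam2, s22, reps)
    f12 = rng.normal(0.0, s12, reps)

    # latent roots of the 2 x 2 symmetric matrix ((f11, f12), (f12, f22))
    mean = 0.5 * (f11 + f22)
    half_gap = np.sqrt(0.25 * (f11 - f22) ** 2 + f12 ** 2)
    l1, l2 = mean - half_gap, mean + half_gap

    beta = l2.mean() - lam2                # expectation-bias of the largest root
    print("beta  =", beta)
    print("check:", lam1 - l1.mean())      # equals beta by Theorem 2.1,1 (trace unbiasedness)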
2.2  Some theorems on the variances and covariances of the elements of L and L̂.

As we are going to use many summation signs in the next two sections, we want to simplify their notation: we shall replace

    Σ_{g=1}^{k}  by  Σ_g,      Σ_{i=1}^{k} Σ_{j=1}^{k}  by  Σ_{i,j},      Σ_{i=1}^{k−1} Σ_{j=i+1}^{k}  by  Σ_{i<j}.

To begin with, we shall prove a theorem on the sum of the variances of the latent roots of F, which will be very useful as a check on computations concerning these variances:
T h e o r e m  2.2,1.  If F is continuously distributed and ℰ(F) = Φ, then

    Σ_g var ℓ_g = Σ_{i,j} var f_ij + Σ_{i,j} φ_ij² − Σ_g (ℰ ℓ_g)²
                = Σ_{i,j} var f_ij + Σ_g λ_g² − Σ_g (ℰ ℓ_g)².            (2.2,1)

Proof.  The equality of the last two members of (2.2,1) follows from

    Σ_g λ_g² = Σ_{i,j} φ_ij².                                            (2.2,2)

Besides the definition of trace and the orthogonality of Y, the equations (1.2,1b) and (2.1,3) have been applied here. In quite the same way one obtains

    Σ_g ℓ_g² = Σ_{i,j} f_ij².                                            (2.2,3)

The equality of the first two members of (2.2,1) then results if Σ_g (ℰ ℓ_g)² is subtracted from both sides of the equation

    Σ_g ℰ(ℓ_g²) = Σ_{i,j} ℰ(f_ij²) = Σ_{i,j} var f_ij + Σ_{i,j} φ_ij²,

which ensues from taking expectations in (2.2,3).
If k = 2, Theorem 2.2,1 can be reduced in an interesting way:

C o r o l l a r y  of Theorem 2.2,1.  If k = 2, then under the conditions of Theorem 2.2,1

    var ℓ_1 + var ℓ_2 = var f_11 + var f_22 + 2 var f_12 − 2 β (λ_2 − λ_1) − 2 β²,      (2.2,4a)

where

    β = ℰ(ℓ_2) − λ_2 = λ_1 − ℰ(ℓ_1) > 0.                                               (2.2,4b)

Proof.  As for (2.2,4b), compare Theorems 2.1,1 and 2.1,2. Equation (2.2,4a) ensues from equation (2.2,1) in Theorem 2.2,1, since for k = 2 equation (2.2,4b) gives ℰ(ℓ_1) = λ_1 − β and ℰ(ℓ_2) = λ_2 + β.

In those cases in which var ℓ_1 = var ℓ_2, this corollary enables us to compute the variances of both latent roots as soon as β is known. Section 2.3 will derive a condition which is sufficient for these variances to be equal.
A result on L̂ which is analogous to Theorem 2.2,1 on L is contained in

T h e o r e m  2.2,2.  If F is continuously distributed and ℰ(F) = Φ, then

    Σ_{g,h} var ℓ̂_gh = Σ_{i,j} var f_ij.                                (2.2,5)

The main difference with Theorem 2.2,1 is that L is a diagonal matrix while L̂ is not, and that L̂ is an expectation-unbiased estimator of Λ, while L is not.

Proof.  Using equations (2,3) and (1.2,1c) one finds by the same kind of argument that was used in equation (2.2,2):

    tr(L̂ L̂) = tr(L L) = tr(F F),    i.e.    Σ_{g,h} ℓ̂_gh² = Σ_g ℓ_g² = Σ_{i,j} f_ij².

After taking expectations and subtracting Σ_{g,h} (ℰ ℓ̂_gh)² from the first member and Σ_{i,j} φ_ij² from the last member -- note that ℰ L̂ = Λ, hence ℰ ℓ̂_gh = δ_gh λ_h, cf. eq. (1.1,6), so that these two quantities are equal by (2.2,2) -- one finds equation (2.2,5).
Finally we list the following result on the covariances of certain elements of L and L̂:

T h e o r e m  2.2,3.  If F is continuously distributed and ℰ(F) = Φ, then

    Σ_{g,h} cov(ℓ_g, ℓ_h) = Σ_{i,j} cov(f_ii, f_jj) = Σ_{g,h} cov(ℓ̂_gg, ℓ̂_hh).        (2.2,6)

Proof.  It is easy to prove from the definition of variances and covariances that

    Σ_{g,h} cov(ℓ_g, ℓ_h) = ℰ(tr L)² − (ℰ tr L)² = var(tr L).

Likewise

    Σ_{i,j} cov(f_ii, f_jj) = var(tr F),

and

    Σ_{g,h} cov(ℓ̂_gg, ℓ̂_hh) = ℰ(tr L̂)² − (ℰ tr L̂)² = var(tr L̂).

Now as a consequence of equation (2.1,3) and the orthogonality of U and V we find, using equations (1.2,1g) and (2,3),

    tr L = tr(U′FU) = tr F,    tr L̂ = tr(VLV′) = tr L.

This proves the theorem.

Note that if k = 2 this theorem together with equations (2.2,4) of the Corollary of Theorem 2.2,1 enables us to compute cov(ℓ_1, ℓ_2) solely from the amount of expectation-bias β.
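The identities of Theorems 2.2,1 and 2.2,3 lend themselves to a direct numerical check. A minimal sketch follows, assuming NumPy; the particular Φ and the normal perturbation of the elements of F are arbitrary illustrative assumptions. It estimates both sides of (2.2,1) and of var(tr L) = var(tr F) from simulated matrices.

    import numpy as np

    rng = np.random.default_rng(3)
    Phi = np.array([[1.0, 0.3, 0.0],
                    [0.3, 0.0, 0.2],
                    [0.0, 0.2, -1.0]])               # hypothetical Phi
    k, reps, sigma = 3, 50000, 0.5
    lam = np.sort(np.linalg.eigvalsh(Phi))

    ells, Fs = [], []
    for _ in range(reps):
        E = np.triu(rng.normal(0.0, sigma, (k, k)))
        F = Phi + E + np.triu(E, 1).T                # symmetric, E(F) = Phi
        ells.append(np.sort(np.linalg.eigvalsh(F)))
        Fs.append(F)
    ells, Fs = np.array(ells), np.array(Fs)

    lhs = ells.var(axis=0).sum()                                         # sum_g var l_g
    rhs = Fs.var(axis=0).sum() + (lam ** 2).sum() - (ells.mean(axis=0) ** 2).sum()
    print("Theorem 2.2,1:", lhs, rhs)                                    # approximately equal
    print("Theorem 2.2,3:", ells.sum(axis=1).var(),
          np.trace(Fs, axis1=1, axis2=2).var())                          # var(tr L) = var(tr F)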
2.3  A condition for symmetry in the distribution of the latent roots.

In connection with the Corollary concerning the case k = 2 of Theorem 2.2,1 we observed that it would simplify certain computations if we knew that var ℓ_1 = var ℓ_2: in fact the knowledge of β = ℰ(ℓ_2 − λ_2) would then be sufficient to compute var ℓ_1 and var ℓ_2. Hence it would be interesting to know conditions which, if satisfied by the distribution of F, would guarantee that var ℓ_1 = var ℓ_2.

As this paper is restricted to distribution-free results, we shall replace this equality of variances by a condition which can be handled more readily by distribution-free methods: that of some kind of symmetry in the joint distribution of ℓ_1 and ℓ_2. If q denotes the joint probability density of ℓ_1 and ℓ_2, then because of definition (1.1,5) of ℓ_1 and ℓ_2

    q(ℓ_1, ℓ_2) = 0    if ℓ_1 > ℓ_2.

Hence a symmetry like q(ℓ_1, ℓ_2) = q(ℓ_2, ℓ_1) would lead to q differing from zero only if ℓ_1 = ℓ_2. A far more interesting case is afforded by the symmetry defined by

    q(ℓ_1, ℓ_2) = q(γ − ℓ_2, γ − ℓ_1)    for some constant γ.            (2.3,1a)

We shall consider the problem for arbitrary k instead of k = 2. We shall first list a few properties of such distributions of L as satisfy (2.3,1a), and then find sufficient conditions on the distribution of F in order that the distribution of the latent roots of F satisfies (2.3,1a). Condition (2.3,1a) is not the only possible generalization of the case of only two latent roots, but it is well suited to our purpose.
Writing condition (2.3,1a) -- extended in the obvious way to the joint distribution of ℓ_1, …, ℓ_k -- as a condition of equivalence of two random matrices (see section 1.1) makes the development transparent. If we use the symbol ∼ for equivalence, (2.3,1a) assumes the form

    L ∼ γ I − *L.                                                        (2.3,1b)

Here I is the identity matrix, while L = ((δ_gh ℓ_h)) and *L = ((δ_gh ℓ_{k+1−h})), g, h = 1, …, k.

As the expectations of functions of equivalent matrices are equal (cf. section 1.1), we have

    ℰ L + ℰ *L = γ I,

whence

    ℰ ℓ_g + ℰ ℓ_{k+1−g} = γ    (g = 1, …, k)                             (2.3,2a)

and

    k γ = ℰ tr L + ℰ tr *L = 2 ℰ tr L = 2 tr Λ = 2 tr Φ.                 (2.3,2b)

Considering the direct product of L and L (cf. Cramér, 1946, p. 81; as both factors equal L, we need not distinguish between left and right direct product), we find

    ℰ[L × L] = ℰ[(γI − *L) × (γI − *L)] = γ²(I × I) − γ I × (ℰ *L) − (ℰ *L) × γ I + ℰ(*L × *L).

Subtracting from this equality the equality

    (ℰL) × (ℰL) = [ℰ(γI − *L)] × [ℰ(γI − *L)] = γ²(I × I) − γ I × (ℰ *L) − (ℰ *L) × γ I + (ℰ *L) × (ℰ *L),

we find

    ℰ(L × L) − (ℰL) × (ℰL) = ℰ(*L × *L) − (ℰ *L) × (ℰ *L),

whence from the definition of direct product

    cov(ℓ_g, ℓ_h) = cov(ℓ_{k+1−g}, ℓ_{k+1−h})    (g, h = 1, …, k),

in particular var ℓ_g = var ℓ_{k+1−g}; whence it appears that (2.3,1b) is sufficient to guarantee the equality of certain variances.
Hence we want now to find a condition for the distribution of F which is sufficient in order that L ∼ γI − *L. We are going to prove

T h e o r e m  2.3,1.  Denote by Ω_1 and Ω_2 two arbitrary real, non-singular, k × k matrices, which may be non-random or random, and let L and *L be defined as in (2.3,1b). Let F be a real symmetric, random k × k matrix. Then in order that the random (diagonal) matrices L and γI − *L be equivalent, it is sufficient that some pair of matrices Ω_1 and Ω_2 exists such that the random matrices

    Ω_1⁻¹ F Ω_1    and    γ I − Ω_2⁻¹ F Ω_2

are equivalent. This condition entails that

    k γ = 2 tr ℰ(F) = 2 tr Φ,                                            (2.3,4a)

and, in case Ω_1 and Ω_2 are not random,

    ℰ {Ω_1⁻¹ F Ω_1}_{g,h} + ℰ {Ω_2⁻¹ F Ω_2}_{g,h} = γ δ_gh    (g, h = 1, …, k).        (2.3,4b)

Here δ_gh is the Kronecker symbol and the symbol {M}_{i,j} (M any matrix) stands for m_ij.

R e m a r k.  As Ω_1 and Ω_2 may be random as well as non-random, we could not conform to our general notation system of Latin capitals for random matrices, Greek capitals for non-random matrices.
Proof.  The k latent roots ℓ_1 ≤ ℓ_2 ≤ … ≤ ℓ_k of any matrix F define the matrix L as a function of F:

    L = ψ(F).

Though F and L may be random, the functional relation ψ is not. It is well known that the latent roots of the matrices Ω_1⁻¹ F Ω_1 and F are the same, hence

    ψ(Ω_1⁻¹ F Ω_1) = L.                                                  (2.3,5a)

Likewise it is well known that if ℓ_1 ≤ … ≤ ℓ_k are the latent roots of F, then γ − ℓ_k ≤ … ≤ γ − ℓ_1 are the latent roots of (γI − F), so

    ψ(γI − F) = γI − *L.                                                 (2.3,5b)

Hence, as the latent roots of (γI − F) and of Ω_2⁻¹ (γI − F) Ω_2 = γI − Ω_2⁻¹ F Ω_2 are the same, we find

    ψ(γI − Ω_2⁻¹ F Ω_2) = γI − *L.                                       (2.3,5c)

Thus the matrix (γI − *L) is connected with the matrix (γI − Ω_2⁻¹ F Ω_2) by exactly the same functional relationship ψ by which L is connected with (Ω_1⁻¹ F Ω_1). Hence if two matrices Ω_1 and Ω_2 can be found such that (Ω_1⁻¹ F Ω_1) and (γI − Ω_2⁻¹ F Ω_2) are equivalent, then L and (γI − *L) are equivalent. Finally, if (Ω_1⁻¹ F Ω_1) and (γI − Ω_2⁻¹ F Ω_2) are equivalent, then

    ℰ tr (Ω_1⁻¹ F Ω_1) = ℰ tr (γI − Ω_2⁻¹ F Ω_2),

or

    tr Φ = k γ − tr Φ,

which proves equation (2.3,4a), which is as it should be because of (2.3,2b). Further the expectations of corresponding elements of the two equivalent matrices (Ω_1⁻¹ F Ω_1) and (γI − Ω_2⁻¹ F Ω_2) are equal, which proves equation (2.3,4b).
E x a m p l e.  Take k = 2, Ω_1 = I, and Ω_2 = ((0, 1), (−1, 0)) (rows listed). Write

    Ω_2⁻¹ F Ω_2 = *F = (( f_22, −f_12), (−f_21, f_11)).

Then Theorem 2.3,1 shows that if

    F ∼ γ I − *F,                                                        (2.3,6a)

then L ∼ γI − *L. It is interesting to see which consequences condition (2.3,6a) has for the first and second order moments of the distribution of F. We find from

    ℰ F = γ I − ℰ *F

that

    φ_11 + φ_22 = γ;

in this case equation (2.3,4b) yields just this. Furthermore, from

    F × F ∼ (γI − *F) × (γI − *F)

we derive, by the same argument which has yielded equations (2.3,2a) and (2.3,2b),

    ℰ[F × F − (ℰF) × (ℰF)] = ℰ[*F × *F − (ℰ*F) × (ℰ*F)],

whence from the definition of direct product

    var f_11 = var f_22,    cov(f_11, f_12) = −cov(f_22, f_12).

There are no consequences for ℰ(f_12), var f_12, or cov(f_11, f_22).

Finally one might note that with the simple Ω_1 and Ω_2 applied in this example the resulting condition (2.3,6a) is very closely related to the type of symmetry in the distribution of F one would intuitively expect to guarantee the desired symmetry in the distribution of L. Yet with more general Ω_1 and Ω_2 the resulting symmetry appears to be somewhat more intricate.
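A quick numerical illustration of the example above (assuming NumPy; the distributions chosen are hypothetical): if f_11 and f_22 are written as a common centre c plus independent, identically and symmetrically distributed deviations, and f_12 is independent of them (it need not be symmetric), condition (2.3,6a) holds with γ = 2c, and the simulated latent roots show var ℓ_1 ≈ var ℓ_2 and ℰℓ_1 + ℰℓ_2 = γ.

    import numpy as np

    rng = np.random.default_rng(4)
    c, reps = 1.0, 200000                       # gamma = 2c
    a = rng.normal(0.0, 0.7, reps)              # f_11 = c + a
    b = rng.normal(0.0, 0.7, reps)              # f_22 = c + b, a and b i.i.d. and symmetric
    f12 = rng.gamma(2.0, 0.3, reps) - 0.6       # f_12 independent, not symmetric

    mean = 0.5 * (2 * c + a + b)
    half_gap = np.sqrt(0.25 * (a - b) ** 2 + f12 ** 2)
    l1, l2 = mean - half_gap, mean + half_gap   # latent roots of ((c+a, f12), (f12, c+b))

    print("var l_1, var l_2:", l1.var(), l2.var())            # approximately equal
    print("E l_1 + E l_2, gamma:", l1.mean() + l2.mean(), 2 * c)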
3  Application of the preceding theory to some aspects of the estimation of quadratic response surfaces

3.1  A few general considerations.

Let ξ and β be real k × 1 matrices (column vectors), Φ a k × k real symmetric matrix, and η a real scalar variable, a function of ξ, β and Φ, which has the value η_0 if ξ is a vector of zeros; here η, ξ, β and Φ are non-random quantities, β and Φ being thought of as unknown constants (parameters) in a given estimation problem, ξ being supposed to be given, and to assume different (fixed) values, in different experiments connected with the same given problem (ξ_1, …, ξ_k may represent a set of k "conditions" like temperature, humidity, etc., each of which may be made to vary a great deal over different experiments concerning one problem; η may represent the "yield" or "response" of some process which is made to take place at each of a number of different levels of this set of k conditions).
The equation of a quadratic response surface has the form

    η = η_0 + β′ ξ + ξ′ Φ ξ,                                             (3.1,1)

which determines the yield η as a function of the k conditions represented by the vector ξ. If ξ and η could be experimentally realized and observed with exactitude, there would be little need for statistics in this problem. However, as soon as one of them cannot, statistics has to come in. As in most, if not all, statistical work on the estimation theory of response surfaces, we are going to assume that ξ can indeed be experimentally realized with exactitude, but η can not: instead of the "true" yield η we observe realizations of a stochastic variable y with

    ℰ(y) = η,    var y = σ²,                                             (3.1,2)

cf. Box and Wilson, 1951, p. 2, Box and Hunter, 1957, p. 198, Bloemena, 1956, p. 8, Cochran and Cox, 1957, p. 335. Note that in equations (3.1,2) η is a function of ξ according to equation (3.1,1) whereas σ² is not: var y is assumed to have the same value no matter what values the conditions ξ have. It should be emphasized that we will investigate only quadratic response surfaces, hence will not consider possible consequences of the model being at fault, i.e. of specification errors.

The general procedure now is to choose wisely a set of N, say, values of ξ, i.e. a set of N combinations of experimental conditions, and for each of these to observe y, finally to estimate η_0, β and Φ from these data. This estimation is usually done by the method of least squares (cf. Box and Wilson, 1951, p. 5, Box and Hunter, 1957, p. 198), which under the assumptions stated yields expectation-unbiased estimators of η_0, β and Φ. If one wants to find out the type of quadratic surface one is dealing with, one may reduce its equation to canonical form if η_0, β and Φ are known exactly; Box and Wilson, 1951, pp. 23-24, Box, 1954, p. 35, Box and Youle, 1955, p. 289, and Box and Hunter, 1957, p. 239 ("A fitted second degree equation can be interpreted most readily by writing it in the canonical form") apply this device also if only estimates of η_0, β and Φ are known. The next section will discuss this practice.
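As a concrete illustration of the general procedure just described, the following sketch fits the quadratic model (3.1,1) by ordinary least squares. NumPy is assumed; the design, the true parameters and the error level are arbitrary hypothetical choices, and the symmetric matrix F of fitted second-order coefficients is assembled so that ξ′Fξ reproduces the fitted quadratic and interaction terms.

    import numpy as np

    rng = np.random.default_rng(5)
    k, N, sigma = 2, 30, 0.2
    xi = rng.uniform(-1.0, 1.0, size=(N, k))            # N experimental conditions (hypothetical design)

    eta0, beta = 1.0, np.array([0.5, -0.3])
    Phi = np.array([[-0.8, 0.4], [0.4, 0.1]])           # true (unknown) parameters
    eta = eta0 + xi @ beta + np.einsum('ni,ij,nj->n', xi, Phi, xi)
    y = eta + rng.normal(0.0, sigma, N)                 # observed yields, eq. (3.1,2)

    # regression matrix: 1, xi_1, xi_2, xi_1^2, xi_2^2, xi_1*xi_2
    X = np.column_stack([np.ones(N), xi[:, 0], xi[:, 1],
                         xi[:, 0] ** 2, xi[:, 1] ** 2, xi[:, 0] * xi[:, 1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y0, b = coef[0], coef[1:3]
    F = np.array([[coef[3], coef[5] / 2.0],
                  [coef[5] / 2.0, coef[4]]])            # symmetric estimate of Phi
    print(y0, b, F)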
Before we start this discussion, however, it will be appropriate to devote some thought to the question of scaling, i.e. to the question of the choice of the units in which x_1, x_2, …, x_k are to be measured. Before an equation like (3.1,1) can be written down, one must have chosen a set of units. A change in the units of some or all x's will bring about a change in the value of β and Φ, and by choosing these units one can make the elements of β and of Φ have any value one wants. On top of this, the papers by Box and his associates use a kind of normalization of the units, i.e. of coding of the N values of ξ for which y is going to be observed, which is handled slightly differently in different papers. Box and Wilson, 1951, give a certain amount of theory on an equation like (3.1,1) without mentioning units at all (not restricting themselves to second degree polynomials), and then on p. 8 introduce a kind of normalization with the sole purpose of being able to compare two designs, i.e. two different sets of values of the vector ξ for which y is to be observed: in each of the two designs they change the units in which ξ_i will be measured in such a way that

    s_i² = Σ_{u=1}^{N} (ξ_iu − ξ̄_i)² / N                                 (3.1,3)

will have the same value in both designs (i = 1, …, k). Here ξ_iu is the value of the condition represented by ξ_i in the u-th set of experimental conditions (u = 1, …, N), and ξ̄_i is the mean of ξ_iu over u. In Box and Youle, too, 1955, p. 292, the ξ's are coded, though obviously with the aim of getting simple ξ-values ("for convenience"), and no mention is made of any quantity like s_i²; evidently they allow s_i² to have different values for different i-values. In Box and Hunter, 1957, p. 196 the original idea is used again and even to a greater extent: standardized levels

    x_iu = (ξ_iu − ξ̄_i) / (c s_i)    (u = 1, …, N; i = 1, …, k),          (3.1,4)

cf. (3.1,3), are introduced (c an arbitrary constant, independent of i) so that

    Σ_{u=1}^{N} x_iu = 0    (i = 1, …, k).                               (3.1,4a)
So in this set-up, if one would calculate a quantity similar to s_i² (cf. equation (3.1,3)) in terms of these new variables x_iu, then s_i² would have the same value for all i, that is for each experimental condition. As Box and Hunter, 1957, p. 196, write, equation (3.1,4) allows us to derive designs (i.e. sets of N values of the vector ξ) which cover the region of immediate interest in any given experimental problem, from one fixed design in terms of the variables x. (Of course more than one fixed design exists, according to the properties which one desires for the design one is going to apply, but by means of (3.1,4) each fixed design in (x_1, …, x_k)-space can be transformed to serve on an infinite range of regions to be explored in (ξ_1, …, ξ_k)-space.)

All this has an interesting consequence for the concept of rotatable designs. A rotatable design (defined on p. 205 of Box and Hunter, 1957) leads to the variance of the prediction ŷ of the true yield η being a function only of the distance*) between the point in (x_1, …, x_k)-space where one wants to predict and the center of the design. However, as the whole of their theory is based upon the standardized variables x rather than the original variables ξ (cf. equation (4) on page 196 and the consistent use of x-symbols rather than ξ-symbols throughout Box and Hunter, l.c.), this definition of rotatability needs to be restated in terms of the original variables ξ if one wants to see what it really amounts to. Now of course the transformation (3.1,4) does not hold just for the points in which y will be observed. In general it runs

    x_i = (ξ_i − ξ̄_i) / (c s_i)    (i = 1, …, k)                          (3.1,4b)

(where the ξ̄_i (i = 1, …, k) determine the center of the design, i.e. the center of the region of immediate experimental interest), which means that the variance of ŷ in "rotatable" designs is a function of

    Σ_{i=1}^{k} (ξ_i − ξ̄_i)² / (c² s_i²).

In view of the criticism which is often voiced against rotatable designs, and which consists in questioning the justification of the definition of distance involved, it seems in order to state exactly what (3.1,4b) reveals as the origin of this distance definition: if the values of s_i² for different i are different, this is caused by the fact that the range of interest in the variables ξ_i was thought by the experimenter to differ from one i-value to another. Now any experimenter who wants to explore yield as a function of experimental conditions has to make up his mind about this point whether he uses rotatable designs or does not use them. Hence this seems hardly a cause for criticism of rotatable designs. All they do is roughly this: they make the contour lines of equally exact prediction follow the boundary of the experimentally interesting region as closely as possible.

*) rather than Euclidean distance or yet another type of distance.

In the subsequent section we are not going to use the transformation (3.1,4b) and we are not going to replace ξ by x. Once the units have been chosen, our theory will apply, whether or not these units have been chosen so as to standardize the variables.
3.2  Bias in the estimation of type of quadratic surface.

If one wants to know the type of quadratic surface represented by equation (3.1,1), then -- as we reminded the reader in the previous section -- one may reduce this equation to canonical form, i.e. rotate the coordinate axes in such a way that the transform of ξ′Φξ = Σ_{i,j} φ_ij ξ_i ξ_j no longer contains cross-product terms like φ_ij ξ_i ξ_j (i ≠ j). Referring to section 1.2, one would effect the transformation

    ξ = Y ζ,                                                             (3.2,1)

by which according to (1.2,1a) the equation (3.1,1) of the quadratic response surface acquires the form

    η = η_0 + β′ Y ζ + ζ′ Λ ζ,                                           (3.2,2)

where Y is orthogonal and Λ is diagonal.

The general shape of a surface represented by (3.2,2) can be assessed much easier than that of a surface represented by (3.1,1). Of course, if the surface is truly quadratic the matrix Λ decides on the shape everywhere on the surface. If it is not truly quadratic, this will be possible only in the neighbourhood of ξ = 0, or ζ = 0, respectively (equations (3.1,1) and (3.2,2) being regarded as the first terms of a Taylor series).

Assume for a moment that k = 2. It is clear that in order to assess the shape of the surface one may just as well calculate contour lines from equation (3.1,1), and plot them, inferring the general shape of the surface from them, cf. Cochran and Cox, 1957, p. 352. Note that reducing equation (3.1,1) to canonical form and plotting contour lines are essentially identical methods: one may regard reduction to canonical form as a time-saving device by which to plot contour lines, or in other words, the contour lines are not changed by the transformation (3.2,1).

Now in the papers cited in the second paragraph of section 3.1, η_0, β and Φ are unknown but (expectation-unbiased) estimates y_0, b and F are available. The authors of these papers then reduce to canonical form the second degree polynomial

    y_0 + b′ ξ + ξ′ F ξ,    cf. (3.1,1),

i.e. -- if we use the notation and the matrices introduced in section 1.2 -- they effect the transformation ξ = U ζ to obtain

    y_0 + b′ U ζ + ζ′ L ζ,                                               (3.2,4)

and then base their inference about the general shape of the response surface (3.1,1) on the diagonal matrix L. Others, like Mason, 1956, did not compute L, but plotted

    y = y_0 + b′ ξ + ξ′ F ξ,

which, as was shown above, amounts to the same thing.
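The reduction to canonical form applied to the fitted surface is simply an eigendecomposition of the fitted matrix F. The sketch below continues the hypothetical least-squares example of section 3.1 (NumPy assumed; the numerical values of F and b are illustrative): it computes U and L = U′FU and reports the signs of the diagonal elements of L, on which the inference about the type of surface is based.

    import numpy as np

    # F and b as estimated from the data, e.g. by the least-squares sketch in section 3.1
    F = np.array([[-0.75, 0.43],
                  [0.43, 0.12]])            # hypothetical fitted second-order matrix
    b = np.array([0.48, -0.31])             # hypothetical fitted first-order coefficients

    ell, U = np.linalg.eigh(F)              # F U = U L with L = diag(ell), cf. eq. (1.2,1a)
    L = np.diag(ell)
    print("canonical coefficients (diagonal of L):", ell)
    print("rotated first-order coefficients b'U:", b @ U)

    # crude type assessment from the signs of the canonical coefficients:
    if np.all(ell > 0):
        print("fitted surface suggests a minimum")
    elif np.all(ell < 0):
        print("fitted surface suggests a maximum")
    else:
        print("fitted surface suggests a saddlepoint (col, minimax)")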
Unfortunately, however, Theorem 2.1,2 and Theorem 2.1,3 together with its corollaries a and b show that L is a biased estimator of Λ according to different definitions of bias: the smallest latent root ℓ_1 of F is too small, the largest latent root ℓ_k of F is too large, and their difference is too large (a) in the sense of expectation-bias (on the average) under quite weak conditions, (b) in the sense of median-bias under stronger conditions which, for instance, are satisfied by F being multinormally distributed, and (c) again under quite weak conditions in the sense of distribution-bias with respect to ℓ̂_1, ℓ̂_k, ℓ̂_k − ℓ̂_1, where L̂ has been defined by (2,3) and is no true estimator in that it contains elements (Y or V) which are unknown and cannot be estimated. Yet (c) completes the picture set by (a) and (b), which shows that using L as an estimator of Λ (or drawing contour lines of

    y = y_0 + b′ ξ + ξ′ F ξ)

leads to underestimating λ_1, the smallest root of Λ, and to overestimating λ_k, the largest root of Λ, as well as λ_k − λ_1, their difference, both on the average and too frequently.

R e m a r k.  There is a difference, of course, between underestimation "on the average" and "too frequently" (cf. van der Vaart, 1957a): the frequency (probability) of winding up with an estimate smaller than the true value may be less than one half, yet the mean of these estimates may be smaller than the true value.

Hence, if either the smallest latent root of Φ is positive and "close" to zero, or the largest latent root is negative and "close" to zero, this method will tend to lead unduly often to the conclusion that the roots have different signs whereas in reality they have not (or that one root is zero whereas either all are positive or all are negative): in other words, the conclusion of this method of analysis will be, more frequently than it should, that there is a minimax (col, or saddlepoint) if there is a minimum or a maximum or a stationary ridge in reality, or that there is a stationary ridge if there is a minimum or a maximum in reality.
oonclusicn from the theory.
(1)
On p. 548 of the book on design and analysis of indust:oial experiments edited
by ~g~~~ (1956) sn example is given of an analysis which in the first step yields
one negative and one smaller positive latent root (besides a negative one that is
very close to zero).
After a series of exper:i.Inents with the explicit aim of
checking the increase of the yield in the direction indicated by this positive
root (whereby the yield turned out to be "remarkably cons"ta...1 tll in this direction)
another estimate was made possible by the accumulation of a larger boQy of data;
this time the positive root was negligibly small.
!i. ~ m.
~
!:
k.
The behaviour of this positive root is in accordance with
oertain provisional result s l'Te obtaL"1ed if F is multinormally distributed.
More data result
i.~
a decrease of the overall variance; the amount of bias
is a monotonous inoreasing funetion of this variance hence decrea.ses with
j
it. Another interesting point hero is that under certain conditions the
varia.."1ce of the estimators.e g of the latent roots Ag also decreaselwi th
the overall V'ariance, and in such a way that the statistical significance
of the difference between zero and the latent root in questio n remains
unaltered.
Yet it seems justified to sny (as is done on p. 548 of ~~11~~,
l.c.) t."'lat the estimate of such a latent root turns out to be negligibly
small - though the authors do net give an explicit statement as to what
they mean by this term: this seems to be another example of the concept of
material significance as opposl.t,e to the ooncept of sta.tistical significance introduced by !!gg~~~ and ~~~gBP. (1954).
Finally all this means
that L, though it is a biased estimator of 1}, in the application to
- 35 response surface estimation is a consistent estimator of
J1
in the sense
that the estimates are closer as the design contains more points - whether
one makes a number of replicates of a simple d0sign or uses more elaborate
designs wM.ch contain a large number of points.
estimation of
est1mator of
.Il
Ii
From the point of view of
this reaul t counteracts (at least as long as no better
is constructed) the third requirement on an experimental
design for estill1a ting response surfaces, which 'trIas laid down by
~Q~
and
Hunter (1957), p. 197: lilt should not contain an excessively large number
=-==c:lOJ=
of experimental points".
(2)  Mason, 1956, p. 94, Fig. 5.6, presents a diagram of contour lines of a fitted second degree response surface which clearly shows a saddlepoint (= col = minimax). What he writes concerning this occurrence is very interesting and in direct agreement with the conclusion of our preceding Remark: "Such a surface would appear to be difficult to interpret agronomically. One would certainly like some substantiation of this type of pattern before extending its application too far. A more complete sampling by observation points of the critical region is perhaps in order."

(3)  At the session on applications of response surface designs at the Atlantic City Meeting of the American Statistical Association (September 1957) there were reports of similar experiences.

As a conclusion to this section on bias in the estimation of the type of quadratic surface we want to make two statements. Firstly, so far I have never found a report of the estimation of Λ by L resulting in a minimum or a maximum in cases where a saddlepoint was to be expected. Theoretically this must happen, though "too infrequently", and it would be interesting to hear of any such case. Secondly, we have refrained entirely (cf. the end of the first paragraph of section 3.1) from considering the possible effects of specification errors. It would seem necessary, however, to investigate the effects of fitting a second degree surface in a third degree situation, both on the relevance of the matrix Λ (maybe the problem to be considered for general surfaces, even non-polynomial ones, would be the estimation of curvature rather than the estimation of Λ; the problem thus stated seems to include even the problem which the method of steepest ascent sets out to solve), and on bias in the estimation of Λ.

3.3  The variance of the coefficients of the canonical form.
If the canonical form (3.2,4) of the estimator y_0 + b′ξ + ξ′Fξ of the second member of the equation (3.1,1) of a quadratic response surface is used as an estimator of the canonical form (3.2,2) of this response surface, i.e. if L is used as an estimator of Λ, it is certainly worth while to know something about the distribution of L. We have already found a number of results on the expectation and median of the elements of L. We shall now make a few observations on their variance, based on Theorem 2.2,1 and its corollary.

The existing literature does not elaborate on this point. Box and Youle (1955), p. 295, state that "appropriate standard errors for these constants can be shown to be of the same order of magnitude as those of the original quadratic and interaction terms". Box and Hunter (1957), p. 240, state that "for any rotatable design the variances of the coefficients are the same in every orientation and since the B_ii (our ℓ_g) are simply the 'quadratic effects' in the directions of the canonical variables they have the same standard errors as have the quadratic effects b_ii (our f_ii) before transformation".

Let us consider the latter, more precise, statement from the standpoint of equality (2.2,4a) in the Corollary of Theorem 2.2,1. If var ℓ_1 = var f_11 and var ℓ_2 = var f_22, then this equality would yield

    β² + β (λ_2 − λ_1) = var f_12.
This would mean that once Φ = ℰ(F) (determining λ_1 and λ_2, hence λ_2 − λ_1) and var f_12 would be fixed, no other feature of the distribution of F could have any influence on the amount of expectation-bias β: neither var f_11 or var f_22, nor any higher moments; β (being positive by definition) would be fixed at

    β = −½ (λ_2 − λ_1) + ½ √[ (λ_2 − λ_1)² + 4 var f_12 ].

This seems very unlikely, and provisional calculations based on F being multinormally distributed seem to indicate that it is not true.

Where, then, would be the error in the argument on p. 240 of Box and Hunter (1957)? It is based on the top of p. 208 of their paper, which states that "every variance and covariance of the b's (f's in our notation) ... must remain constant under rotation", i.e. constant if the design, or the set of N different values of the vector ξ for which y is to be observed, is rotated. Applying this to the rotation which reduces (using our notation)

    y_0 + b′ ξ + ξ′ F ξ

to canonical form, they conclude that for instance var ℓ_1 = var f_11. This must be wrong, for this would mean that var ℓ_12 = var f_12, whereas ℓ_12 = 0 per definition, hence with probability one, so that var ℓ_12 = 0. The reason why their statement on p. 208 cannot be applied to the rotation which reduces this second degree polynomial to canonical form seems to be that this rotation is a random one (characterized by the matrix U), depending on F, whereas the rotations discussed on p. 208 do not depend on F, and may not be random at all.
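The point at issue here is again easy to check numerically. In the sketch below (NumPy assumed; a multinormal F with independent elements and Y = I, a hypothetical special case) the simulated var ℓ_1 and var ℓ_2 differ visibly from var f_11 and var f_22, while the identity (2.2,4a) of the Corollary of Theorem 2.2,1 continues to hold.

    import numpy as np

    rng = np.random.default_rng(6)
    lam1, lam2 = 0.0, 0.5
    s11 = s22 = 0.3
    s12 = 0.5
    reps = 200000

    f11 = rng.normal(lam1, s11, reps)
    f22 = rng.normal(lam2, s22, reps)
    f12 = rng.normal(0.0, s12, reps)

    mean = 0.5 * (f11 + f22)
    half_gap = np.sqrt(0.25 * (f11 - f22) ** 2 + f12 ** 2)
    l1, l2 = mean - half_gap, mean + half_gap        # ordered latent roots of F

    print("var f_11, var l_1:", s11 ** 2, l1.var())  # in general not equal
    print("var f_22, var l_2:", s22 ** 2, l2.var())
    beta = l2.mean() - lam2
    lhs = l1.var() + l2.var()
    rhs = s11 ** 2 + s22 ** 2 + 2 * s12 ** 2 - 2 * beta * (lam2 - lam1) - 2 * beta ** 2
    print("corollary (2.2,4a):", lhs, rhs)           # these two do agree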
3.4  Proposal of a new approach to some problems in the evaluation of response surfaces.

Suppose the surface really has an optimum. Then the combination of the methods of steepest ascent and of higher order designs in order to determine this optimum might be proved to be consistent under rather weak conditions by using results and methods in multidimensional stochastic approximation. See for instance Blum (1954) and the basic idea behind the paper by Dvoretzky (1956) on stochastic approximation = numerical iteration + a stochastic element. In this train of thought the results of Crockett and Chernoff (1955) might be applied.
LITERATURE CITED

Anderson, T. W. (1951). The asymptotic distribution of certain characteristic roots and vectors. Proc. Second Berkeley Symposium on Math. Statistics and Probability (1950), p. 103-130.

Bloemena, A. R. (1956). Experimentele bepaling van optimale condities (Overzichtsrapport). Rapport S 204 (Ov 6) van de Statistische Afdeling van het Mathematisch Centrum, Amsterdam (Dutch).

Blum, J. R. (1954). Multidimensional stochastic approximation methods. Ann. Math. Stat. 25, p. 737-744.

Box, G.E.P. (1954). The exploration and exploitation of response surfaces: some general considerations and examples. Biometrics 10, p. 16-60.

Box, G.E.P. and Hunter, J. S. (1957). Multi-factor experimental designs for exploring response surfaces. Ann. Math. Stat. 28, p. 195-241.

Box, G.E.P. and Youle, P. V. (1955). The exploration and exploitation of response surfaces: an example of the link between the fitted surface and the basic mechanism of the system. Biometrics 11, p. 287-323.

Box, G.E.P. and Wilson, K. B. (1951). On the experimental attainment of optimum conditions. Jour. Roy. Stat. Soc., Ser. B, 13, p. 1-45.

Busemann, H. (1947). Intrinsic area. Annals of Mathematics 48, p. 234-267.

Cochran, W. G. and Cox, G. M. (1957). Experimental Designs, 2nd ed., New York, Wiley, and London, Chapman and Hall, xiv + 611 p.

Crockett, J. B. and Chernoff, H. (1955). Gradient methods of maximization. Pacific Journal of Mathematics 5, p. 33-50. (Also Cowles Commission Papers, New Ser. No. 92.)

Davies, O. L. (Editor) (1956). The design and analysis of industrial experiments; published for Imp. Chem. Ind. Lim. by Oliver and Boyd, London and Edinburgh, and Hafner, New York, 2nd ed., xiii + 636 p.

Doob, J. L. (1953). Stochastic processes. New York, Wiley, and London, Chapman and Hall, vii + 654 p.

Dvoretzky, A. (1956). On stochastic approximation, in: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (1955). Univ. of California Press, Vol. 1, p. 39-55.

Mason, D. D. (1956). Functional models and experimental designs for characterizing response curves and surfaces, p. 76-98 in: Methodological procedures in the economic analysis of fertilizer use data (based on a symposium held in 1955 and edited by Baum, E. L., Heady, E. O., Blackmore, J.). Iowa State Coll. Press, xx + 218 p.

Saks, S. (1937). Theory of the integral. Monografie Matematyczne, tom 7; 2nd revised ed., translated by L. C. Young; New York, Hafner, vi + 347 p.

Vaart, H. R. van der (1957a). Some general remarks on the definition of the concept of biased estimators. Institute of Statistics Mimeo. Series, 18 p.

Vaart, H. R. van der (1957b). On some distribution-free bias properties of the latent roots of real symmetric random matrices. Ann. Math. Stat. 28, p. 1069-1070, abstract No. 41.