ON THE USE OF GENERAL POSITIVE QUADRATIC FORMS
IN SIMULTANEOUS INFERENCE

by

Yosef Hochberg
Highway Safety Research Center
and
Department of Biostatistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 1030
September 1975
1. Introduction and Summary.
The matrix in the quadratic form used to
project simultaneous confidence intervals for linear functions of the
parameter vector in Scheffe's S-method is conventionally chosen to be the
inverse of the dispersion matrix of the allied estimators.
(See Scheffé, 1959, Ch. 3 and Miller, 1966, Ch. 2).
The first attempt to introduce optimality criteria for choosing
the 'best' quadratic form seems to be that of Hoel (1951) where the case
of confidence bands for a simple straight line regression is discussed.
Hoel (1951) introduced a weight function for the values of the independent
variable and proposed the criterion of minimum weighted average distance
between the bands.
It turns out that the Working-Hotelling quadratic form
(a special case of the S-method in straight line regression) is not necessarily optimal.
The optimal choice depends on the specific distribution
attached to the 'independent' variable.
Bohrer (1973) proved that the S-method is optimal (for practical
confidence levels) under the criterion of minimum average length of confidence intervals for linear functions with uniform weight in some elliptical region.
Besides this result, it seems that little has been recorded
on features of the S-method (see discussion following Wynn and Bloomfield,
1971).
The first goal of this work is to discuss an additional optimality feature of the S-method. This is given in Section 2. Another point discussed here is the use of general quadratic forms as a convenient tool for discriminating among various linear parametric functions (by the allocation of lengths of confidence intervals) in a way different from that obtained by using the S-method.
In a linear regression setup several writers (see, e.g., Wynn and Bloomfield, 1971, and their references) have sharpened the usual bands obtained from the S-method by restricting the bands to finite intervals. We feel that problems might arise where an investigator has main interest in a region which is away from the vector of means of the independent variables but would still like to have bands over the entire space.
This situation is discussed in Section 3 where elliptical bands
are constructed with minimum length of confidence interval set at a prespecified point.
Similar techniques can be used in any ANOVA setup.
General p.d.
quadratic forms can be used to construct discriminatory simultaneous
confidence intervals on different types of estimable parametric functions.
2. On optimality of the Working-Hotelling-Scheffé method (the S-method).

Let $\theta = (\theta_1, \ldots, \theta_k)' \in R^k$ be the unknown vector of parameters, let $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_k)'$ be the corresponding estimators with the distributional property $\hat\theta \sim N(\theta, \sigma^2 V)$, where $V$ is positive definite (p.d.) and completely known, $\sigma^2$ is unknown but an unbiased estimator $s_\nu^2$ is available such that $s_\nu^2$ is independent of $\hat\theta$ and $\nu s_\nu^2/\sigma^2$ is distributed as a central chi-square variate with $\nu$ d.f.
In its utmost generality, the projection technique for setting simultaneous confidence intervals on all linear functions $b'\theta$, $b \in R^k$, starts with a general convex confidence set $D$, say, for $\theta$, then projects on each $b$ the corresponding pair of tangents to $D$. The S-method chooses for $D$ the set

$D^* = \{\theta : (\hat\theta - \theta)' V^{-1} (\hat\theta - \theta) \le s_\nu^2 d\},$

where $d$ is determined by the confidence coefficient (see Scheffé, 1959, Ch. 3).
Among all sets of the same confidence coefficient, $D^*$ has minimum volume. This can be verified by simple arguments. However, we are looking for an optimality characteristic of the S-method in terms of the projected confidence intervals for the $b'\theta$. We will now discuss such an optimality feature of $D^*$ in the class of all ellipsoids.
First we give the simultaneous estimation procedure based on a general p.d. quadratic form in $\hat\theta - \theta$. Let $M$: $k \times k$ be any p.d. matrix and put $t = (\hat\theta - \theta)/s_\nu$; then we have

$1 - \alpha = \Pr\{t'Mt \le \xi_\alpha\} = \Pr\{|b'(\hat\theta - \theta)| \le s_\nu (\xi_\alpha\, b'M^{-1}b)^{1/2} \text{ for all } b \in R^k\},$    (2.1)

where $\xi_\alpha$ is the $(1-\alpha)$-th quantile of a random variable distributed as

$Q = \sum_{j=1}^{k} \lambda_j \chi_1^2 \big/ (\chi_\nu^2/\nu),$

where $\chi_\nu^2$ denotes a central chi-square variate of $\nu$ d.f., the $\lambda_j$'s are the $k$ non-zero roots of $VM$, and all the chi-square variates in the expression given for $Q$ are independent. This is easily proved using the well known result on the distribution of a general p.d. quadratic form in normal variates (see for example Box, 1954) and Scheffé's projection argument. Note that only when $k = 2$ must both $\lambda_1$ and $\lambda_2$ be positive; for $k > 2$ we may have some negative roots. These roots will, however, satisfy the conditions $\sum_i \lambda_i > 0$ and $\sum_i (1/\lambda_i) > 0$; see Graybill (1969), Ch. 12. The problem of obtaining $\xi_\alpha$ or estimating it is discussed later.
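When no tabulated values fit the case at hand, $\xi_\alpha$ can also be estimated by simulating $Q$ directly from its definition. The sketch below (the function name and simulation size are illustrative choices, not part of the paper) does this with NumPy:

```python
import numpy as np

def xi_alpha_mc(lams, nu, alpha, n_sims=200_000, seed=0):
    """Monte Carlo estimate of xi_alpha, the (1 - alpha)-quantile of
    Q = sum_j lam_j * chi2_1 / (chi2_nu / nu)."""
    rng = np.random.default_rng(seed)
    lams = np.asarray(lams, dtype=float)
    num = rng.chisquare(1, size=(n_sims, lams.size)) @ lams  # sum_j lam_j chi2_1
    den = rng.chisquare(nu, size=n_sims) / nu                # independent chi2_nu / nu
    return np.quantile(num / den, 1.0 - alpha)

# With lam_1 = lam_2 = 1 (the S-method case) Q is distributed as 2*F_{2,nu},
# so for nu = 23, alpha = 0.10 the estimate should land near 2*F = 5.10.
print(xi_alpha_mc([1.0, 1.0], nu=23, alpha=0.10))
```

The equal-roots case gives a built-in check, since there $Q$ reduces to $2F_{2,\nu}$.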
For any given $x$ we define $\mathrm{LCI} = 2(\xi_\alpha\, x'M^{-1}x)^{1/2}$. This is the length between the upper and lower $1-\alpha$ confidence bounds on $x'\theta$ in (2.1). Our problem is to choose a 'good' $M$. Rather than trying to minimize the weighted mean of the LCI we will look for an $M$ that minimizes the weighted mean of the squared LCI's. This approach is more convenient than Hoel's, since on introducing a distribution for the values of $x \in R^k$ we find that the weighted mean squared LCI depends only on the mean vector and the dispersion matrix of $x$, while the mean weighted LCI depends on the specific distribution assumed.

Thus, assume that a weight function (a distribution) is introduced for the values of $x$ in $R^k$ such that the mean vector of $x$ is $\mu$ and the dispersion matrix is $\Sigma$. The expected squared LCI is then seen from (2.1) to be

$E(\mathrm{LCI}^2) = 4\xi_\alpha\,(\mu'M^{-1}\mu + \mathrm{tr}(M^{-1}\Sigma)).$    (2.2)
Explicit minimization of (2.2) with respect to a real p.d. matrix $M$ for any given $\mu$ and $\Sigma$ seems difficult. (Note that $\xi_\alpha$ is a rather complicated function of $M$.) Clearly the S-method (i.e., $M = V^{-1}$) is not expected to be optimal for all $\Sigma$ or for all $\mu$. If we take $\mu = 0$ and $\Sigma = V^{-1}$ (this is quite reasonable in a linear regression setup where the independent variables have been centered around their means) we get for (2.2) the simplified expression

$E(\mathrm{LCI}^2) = 4\xi_\alpha \sum_{j=1}^{k} \lambda_j^{-1},$    (2.3)

where the $\lambda_j$'s are the roots of $VM$.
We note that (2.3) is invariant if the $\lambda_i$ are changed proportionally; thus we may arbitrarily fix $\sum \lambda_i$. For $k = 2$ we now let $\lambda_1 + \lambda_2 = 1$ and prove that (2.3) is minimized when $\lambda_1 = \lambda_2 = \frac{1}{2}$. It is not difficult to verify that this is implied by the following result.
Let $X$, $Z$, and $Y$ be independent random variables where $X$ and $Y$ are distributed as $\chi_1^2$ variates; then we have:

Lemma 2.1. For any $a > 0$ the quantity $\Pr\{\lambda X + (1-\lambda)Y \le a\lambda(1-\lambda)Z\}$ is maximized when $\lambda = \frac{1}{2}$.

Proof. The problem is automatically reduced, by conditioning on $Z$, to proving that the quantity

$f(\lambda, a) = \Pr\{\lambda X + (1-\lambda)Y \le a\lambda(1-\lambda)\}$

is maximized by $\lambda = \frac{1}{2}$ for all values of $a$. On letting $h(\cdot)$ denote the probability density function of a $\chi_1^2$ variable we have

$f'(\lambda, a) = \frac{\partial f(\lambda, a)}{\partial \lambda} = \int_{y=0}^{a\lambda} h[(1-\lambda)(a - y/\lambda)]\, h(y)\, (y/\lambda^2 - a)\, dy.$

When substituting for $h(\cdot)$ and $\lambda = \frac{1}{2}$ we get

$f'(\lambda = \tfrac{1}{2}) \propto \int_{y=0}^{a/2} [(\tfrac{a}{2} - y)y]^{-1/2}\, (4y - a)\, dy = 0,$

since the integrand is antisymmetric about $y = a/4$. The second derivative, $f''$, is negative at $\lambda = \frac{1}{2}$, since

$f''(\lambda = \tfrac{1}{2}) = \int_{0}^{a/2} h(y)\big[(4y - a)^2\, h'(\tfrac{a}{2} - y) - 16y\, h(\tfrac{a}{2} - y)\big]\, dy$

and $h'(x)$, the derivative of $h(x)$, is negative for all values of $x > 0$. This demonstrates that $\lambda = \frac{1}{2}$ is at least a local maximum. It seems difficult to show analytically that this is the global maximum.
However, numerical integration (see Table 1) supports the statement that at $\lambda = 1/2$ the function $f(\lambda, a)$ indeed attains a unique global maximum for any $a > 0$.
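The numerical support for Lemma 2.1 can be reproduced with modern quadrature in place of the routines of Section 4. The sketch below (an illustration, not the original program) evaluates $f(\lambda, a)$ by the same conditioning device, substituting $y = t^2$ to remove the $\chi_1^2$ density's singularity at the origin:

```python
import numpy as np
from scipy import stats, integrate

def f_lemma(lam, a):
    """f(lam, a) = Pr{lam*X + (1 - lam)*Y <= a*lam*(1 - lam)}, X, Y iid chi2_1,
    computed by conditioning on Y; the substitution y = t^2 removes the
    chi2_1 density's singularity at the origin."""
    chi1 = stats.chi2(1)
    g = lambda t: chi1.cdf((1.0 - lam) * (a - t * t / lam)) \
        * np.sqrt(2.0 / np.pi) * np.exp(-t * t / 2.0)
    val, _ = integrate.quad(g, 0.0, np.sqrt(a * lam))
    return val

# At lam = 1/2, X + Y ~ chi2_2, so f(1/2, a) = 1 - exp(-a/4); a = 2 gives
# 0.393, matching the lam = 0.500 column of Table 1 at a = 2.000.
assert abs(f_lemma(0.5, 2.0) - (1.0 - np.exp(-0.5))) < 1e-6

# A grid search places the maximizer at lam = 1/2 for each a, as in Lemma 2.1.
for a in (2.0, 6.0, 12.0):
    grid = np.linspace(0.05, 0.95, 19)
    best = grid[np.argmax([f_lemma(l, a) for l in grid])]
    assert abs(best - 0.5) < 1e-6
```

The closed form at $\lambda = 1/2$ follows because there the left side is a $\chi_2^2$ variate, whose CDF is $1 - e^{-u/2}$.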
3. Simultaneous confidence bands of regression surfaces with minimum LCI at an arbitrary point.
The Working-Hotelling-Scheffé bands on regression surfaces have minimum LCI at the vector of means of the independent variables.
In some
experimental situations it might be appropriate to have simultaneous
confidence bands the LCI of which is minimized at a different point.
This state of affairs arises, for example, when the bands are constructed
mainly for prediction purposes and the range of future most likely values
for the independent variables is removed from their current sample means.
We now consider regression surfaces with intercepts and suppose that the
independent variables have been centered around their means.
We let $M^{-1} = (m_{ij}^{(-1)})_{i,j=1,\ldots,k}$, $V = (v_{ij})_{i,j=1,\ldots,k}$, and look for a p.d. matrix $M$ such that the LCI (obtained from (2.1)) at a point $x + \mu = (1, x_2+\mu_2, \ldots, x_k+\mu_k)$ is proportional to the LCI obtained by the S-method at the point $x$, for all $x$. Thus, taking the proportionality constant to be one, we must satisfy the identity

$(x+\mu)'M^{-1}(x+\mu) = x'Vx$    (3.1)

for all $x$. Since the independent variables are centered, $v_{1i} = 0$ for $i = 2, \ldots, k$, and (3.1) gives

$m_{ij}^{(-1)} = v_{ij}, \quad i,j = 2,\ldots,k,$

$m_{1i}^{(-1)} = -\sum_{j=2}^{k} \mu_j v_{ij}, \quad i = 2,\ldots,k,$    (3.2)

$m_{11}^{(-1)} = v_{11} + \sum_{i=2}^{k}\sum_{j=2}^{k} v_{ij}\,\mu_i\mu_j.$
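As a check on the construction, the following sketch builds $M^{-1}$ by the rule (3.2) from a synthetic centered-design $V$ (all the specific numbers below are invented for illustration) and verifies the identity (3.1) numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
# Synthetic dispersion matrix V for a centered design: the intercept estimator
# is uncorrelated with the slope estimators, so v_{1i} = 0 for i >= 2.
B = rng.standard_normal((k - 1, k - 1))
V = np.zeros((k, k))
V[0, 0] = 1.0 / 25.0                  # v_11 = 1/n for the intercept
V[1:, 1:] = B @ B.T + np.eye(k - 1)   # positive definite slope block
mu = np.array([0.0, -1.3, 0.7])       # shift vector; first coordinate is 0

# Build M^{-1} as in (3.2):
Minv = V.copy()
Minv[0, 1:] = Minv[1:, 0] = -V[1:, 1:] @ mu[1:]
Minv[0, 0] = V[0, 0] + mu[1:] @ V[1:, 1:] @ mu[1:]

# Verify (3.1): (x + mu)' M^{-1} (x + mu) = x' V x whenever x_1 = 1.
for _ in range(5):
    x = np.concatenate(([1.0], rng.standard_normal(k - 1)))
    assert abs((x + mu) @ Minv @ (x + mu) - x @ V @ x) < 1e-10
```

The identity holds exactly (up to rounding) because the linear and constant terms introduced by the shift are absorbed into the first row and column of $M^{-1}$.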
When $k = 2$, for example, we have

$M^{-1} = \begin{pmatrix} v_{11} + \mu^2 v_{22} & -\mu v_{22} \\ -\mu v_{22} & v_{22} \end{pmatrix},$

which gives

$VM = \begin{pmatrix} 1 & \mu \\ a/\mu & 1 + a \end{pmatrix}, \quad a = \mu^2 v_{22}/v_{11}.$

The roots of $VM$ are thus determined from the equation

$(1-\lambda)(1-\lambda+a) - a = 0.$    (3.3)

As noted earlier, when $k = 2$ both roots are positive and the implementation of the method is easy by using the tabulated values of $\xi_\alpha$ given in Table 2. The values of $\xi_\alpha$ depend on $\nu$, $\alpha$ and the roots $\lambda_1, \lambda_2$; however, one may fix $\lambda_1 + \lambda_2 = 2$, say, as we did in our tables. This way the number of parameters is three: $\nu$, $\alpha$, and $\lambda$, $0 < \lambda < 1$. The use of the method and a comparison with the S-method is demonstrated by the following numerical example in a simple linear regression setup, $E(Y_i) = \alpha + \beta X_i$, $i = 1, \ldots, n$.

Example. (From Draper and Smith (1966), 1.2.) Using lower case letters $x$, $x_i$, etc. for deviations from the means, we have $n = 25$, $\bar X = 52.6$, $\bar Y = 9.424$, $s_\nu^2 = 0.7926$, $\sum x_i^2 = 7154.42$, $\hat\beta = -0.079829$, and $\mu = -17.2$ ($\mu$ chosen arbitrarily).
Using Scheffé's method we have the bands

$\hat Y = \hat\alpha + \hat\beta x \pm s_\nu \{\xi_\alpha (0.04 + 0.00014 x^2)\}^{1/2}, \quad \xi_\alpha = 2F_{2,\nu}^{(\alpha)}.$
Using the above described method, which sets the minimum LCI at $\mu = -17.2$, we get here

$M^{-1} = \begin{pmatrix} 0.08135 & 0.00240 \\ 0.00240 & 0.00014 \end{pmatrix},$

$a = 1.03377$; $\lambda_1 = 2.65751$; $\lambda_2 = 0.37629$. So the bands are

$\hat Y = \hat\alpha + \hat\beta x \pm s_\nu \{\xi_\alpha (0.08135 + 0.00480x + 0.00014x^2)\}^{1/2}.$    (3.4)

(Note that the sign of the linear term places the minimum of the band width at $x = -17.2$, as required.)
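The example's quantities can be reproduced from $n$, $\sum x_i^2$ and $\mu$ alone; the following sketch applies (3.2) and (3.3) to the figures quoted above:

```python
import numpy as np

n, Sxx, mu = 25, 7154.42, -17.2
v11, v22 = 1.0 / n, 1.0 / Sxx        # V = diag(1/n, 1/sum x_i^2), centered x

# M^{-1} from (3.2) with k = 2:
Minv = np.array([[v11 + mu**2 * v22, -mu * v22],
                 [-mu * v22,          v22     ]])

# Roots of VM from (3.3): (1-l)(1-l+a) - a = 0, i.e. l^2 - (2+a)l + 1 = 0.
a = mu**2 * v22 / v11
lam = np.roots([1.0, -(2.0 + a), 1.0])

print(round(Minv[0, 0], 5), round(Minv[0, 1], 5))  # 0.08135 0.0024
print(round(a, 5))                                 # 1.03377
print(np.round(np.sort(lam), 5))
```

The roots should agree with the text's 0.37629 and 2.65751 to about four decimals (their product is exactly 1, since the constant term of the quadratic in (3.3) is 1).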
To obtain $\xi_\alpha$ from our table one should have used the entries $\nu = 23$, $\lambda = 2\,\mathrm{Min}(\lambda_1, \lambda_2)/(\lambda_1 + \lambda_2) = 0.24806$, and the appropriate value obtained from the table must then be multiplied by $(\lambda_1 + \lambda_2)/2$. These entries are not provided by our tables and thus either interpolation or a simple well known approximation, that we now consider, must be used.
That approximation is given by

$\xi_\alpha \approx g\,h\,F_{h,\nu}^{(\alpha)},$

where $F_{h,\nu}^{(\alpha)}$ is the $(1-\alpha)$-th quantile of a central $F$ distribution with d.f. $h$ and $\nu$, and where

$g = \sum_j \lambda_j^2 \Big/ \sum_j \lambda_j \quad \text{and} \quad h = \Big(\sum_j \lambda_j\Big)^2 \Big/ \sum_j \lambda_j^2.$

See for example Box (1954). This approximation is based on equating the first two moments of $Q$ (see definition following (2.1)) with those of a $g\chi_h^2/(\chi_\nu^2/\nu)$ variable and solving for $g$ and $h$. It can be used conveniently for any $k > 2$ for which exact tabulation is not practical.
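The two-moment approximation is straightforward to program; in the sketch below (the function name is ours) scipy's $F$ quantile stands in for published $F$ tables:

```python
import numpy as np
from scipy import stats

def box_approx_xi(lams, nu, alpha):
    """Two-moment (Box, 1954) approximation: match the first two moments of
    sum_j lam_j*chi2_1 with those of g*chi2_h, giving g = sum(lam^2)/sum(lam)
    and h = (sum(lam))^2/sum(lam^2); then Q ~ g*h*F_{h,nu} approximately."""
    lams = np.asarray(lams, dtype=float)
    g = np.sum(lams**2) / np.sum(lams)
    h = np.sum(lams)**2 / np.sum(lams**2)
    return g, h, g * h * stats.f.ppf(1.0 - alpha, h, nu)

g, h, xi = box_approx_xi([2.65751, 0.37629], nu=23, alpha=0.10)
print(round(g, 5), round(h, 5))  # 2.37456 1.27762
print(round(xi, 3))
```

Run on the example's roots, this reproduces the $g$ and $h$ quoted below, and the resulting $\xi$ value should fall near the text's approximate 8.49991.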
For our example and $\alpha = 0.10$ we have: $g = 2.37456$, $h = 1.27762$, and $\xi_{.10} \approx 8.49991$. The exact value (see Section 4 for details on computations) is $\xi_{.10} = 8.31868 = (3.0338)(5.484)/2$, where the value 5.484 was computed, as were the entries of Table 2, for $\lambda = 2\,\mathrm{Min}(\lambda_1, \lambda_2)/(\lambda_1 + \lambda_2) = 0.248$ and $\nu = 23$. Both bands, i.e., the S-method bands and those based on (3.4) with the exact evaluation of $\xi_{.10}$, are given in Figure 1.
4. Computations.
Let $X$, $Y$, $Z$ be independent, where $X$ and $Y$ are each distributed as a $\chi_1^2$ variable and $Z$ is distributed as a $\chi_\nu^2$. Table 1 was constructed by computing by brute force

$f(\lambda, a) = \Pr\{\lambda X + (1-\lambda)Y \le a\lambda(1-\lambda)\} = \int_{y=0}^{a\lambda} \Pr\{X \le (1-\lambda)(a - y/\lambda)\}\, h(y)\, dy,$

where $h(\cdot)$ is the density of a $\chi_1^2$.
The outer integral was computed by the IBM scientific subroutine DQG32, based on Gaussian quadrature. A subroutine for computing the CDF of a $\chi_\nu^2$ variable was used from the library of the Biostatistics Department of the University of North Carolina at Chapel Hill. The range of integration for the outer integral was split into a number of sub-intervals with attention to the shape of $h(\cdot)$. When $\lambda = .5$ the results of our numerical integrations coincide with the available tables to the third decimal point.
l
Table 2 involved similar computations.
Here we used numerical
integrations as above with a simple search procedure that terminated when
the gap between the upper and lower bounds became less than --0.01.
The range for the integral over values of z in calculating
00
f
z-O
az/(2-A)V
J
y=O
pdx -<~
VA
where F (.) is the CDF ofax 2 variate, was taken from zero to 4v in all
V
v
entries of the table.
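The iterated integral for $\Pr\{Q \le a\}$ can be reproduced with scipy in place of DQG32 and the local CDF subroutine; the truncation of the outer range at $4\nu$ follows the text (function names are ours):

```python
import numpy as np
from scipy import stats, integrate

def prob_Q_le_a(a, lam, nu):
    """Pr{lam*X + (2 - lam)*Y <= a*Z/nu} for X, Y ~ chi2_1 and Z ~ chi2_nu,
    all independent, as the iterated integral of Section 4.  The outer range
    is truncated at 4*nu, following the text."""
    chi1, chinu = stats.chi2(1), stats.chi2(nu)

    def inner(z):
        # integrate over y with the substitution y = t^2, which removes the
        # chi2_1 density's singularity at the origin
        top = a * z / ((2.0 - lam) * nu)
        g = lambda t: chi1.cdf((a * z / nu - (2.0 - lam) * t * t) / lam) \
            * np.sqrt(2.0 / np.pi) * np.exp(-t * t / 2.0)
        val, _ = integrate.quad(g, 0.0, np.sqrt(top))
        return val * chinu.pdf(z)

    val, _ = integrate.quad(inner, 0.0, 4.0 * nu, limit=200)
    return val

# Sanity check: at lam = 1 both roots equal 1, so Q ~ 2*F_{2,nu} and the
# probability at a = 2*F_{2,23}^{(0.10)} should come out near 0.90.
print(prob_Q_le_a(2 * stats.f.ppf(0.90, 2, 23), 1.0, 23))
```

Coupling this routine with a bisection search on $a$ gives $\xi_\alpha$, which is how entries in the style of Table 2 can be regenerated.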
FIGURE 1. The S-method bounds; x-axis in deviations from the mean.
REFERENCES

Bohrer, R. (1973). An optimality property of Scheffé bounds. Annals of Statistics, 1, 766-772.

Box, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one way classification. Ann. Math. Statist., 25, 290-302.

Draper, N. and Smith, H. (1966). Applied Regression Analysis. Wiley, New York.

Graybill, F. A. (1969). Introduction to Matrices with Applications in Statistics. Wadsworth, Belmont, California.

Hoel, P. G. (1951). Confidence regions for linear regression. Proc. Second Berkeley Symp. on Math. Statist. and Prob. Berkeley: University of California Press, 75-81.

Miller, R. G., Jr. (1966). Simultaneous Statistical Inference. McGraw-Hill, New York.

Scheffé, H. (1959). The Analysis of Variance. Wiley, New York.

Wynn, H. P. and Bloomfield, P. (1971). Simultaneous confidence bands in regression analysis. J. Roy. Statist. Soc., Ser. B, 33, 202-217.
Table 1: Values of f(lambda, a)

                          lambda
     a      0.100   0.200   0.300   0.400   0.500
   0.500    0.070   0.093   0.107   0.114   0.117
   1.000    0.133   0.177   0.203   0.217   0.221
   1.500    0.189   0.251   0.287   0.306   0.313
   2.000    0.238   0.317   0.361   0.386   0.393
   2.500    0.284   0.376   0.428   0.457   0.465
   3.000    0.325   0.428   0.487   0.518   0.528
   3.500    0.361   0.475   0.538   0.572   0.583
   4.000    0.394   0.517   0.585   0.620   0.632
   4.500    0.425   0.555   0.626   0.663   0.675
   5.000    0.454   0.589   0.663   0.701   0.713
   5.500    0.480   0.621   0.696   0.735   0.747
   6.000    0.505   0.650   0.727   0.765   0.777
   6.500    0.526   0.675   0.752   0.791   0.803
   7.000    0.547   0.698   0.776   0.814   0.826
   7.500    0.568   0.721   0.798   0.836   0.847
   8.000    0.586   0.741   0.817   0.854   0.865
   8.500    0.603   0.759   0.834   0.870   0.881
   9.000    0.620   0.775   0.850   0.885   0.895
   9.500    0.635   0.791   0.863   0.897   0.907
  10.000    0.649   0.805   0.876   0.909   0.918
  10.500    0.663   0.818   0.887   0.919   0.928
  11.000    0.676   0.830   0.898   0.928   0.936
  11.500    0.688   0.841   0.907   0.936   0.944
  12.000    0.699   0.850   0.914   0.942   0.950
Table 2: Values of xi_alpha(lambda, nu)  (lambda_1 + lambda_2 fixed at 2; lambda = 2 Min(lambda_1, lambda_2)/(lambda_1 + lambda_2))

  alpha = 0.05
                               lambda
   nu     0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
   10    9.65   9.33   9.07   8.84   8.66   8.47   8.35   8.28   8.22
   15    8.74   8.54   8.27   7.97   7.82   7.66   7.56   7.45   7.40
   20    8.49   8.10   7.81   7.62   7.43   7.23   7.10   7.03   7.00
   30    8.12   7.82   7.51   7.32   7.16   6.96   6.84   6.73   6.67
   40    7.86   7.66   7.38   7.09   6.89   6.79   6.61   6.53   6.50

  alpha = 0.10
                               lambda
   nu     0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
   10    6.39   6.20   6.08   6.01   5.95   5.91   5.88   5.86   5.85
   15    5.97   5.81   5.69   5.59   5.52   5.47   5.43   5.41   5.40
   20    5.78   5.62   5.47   5.38   5.30   5.25   5.22   5.19   5.18
   30    5.59   5.42   5.28   5.19   5.09   5.05   5.02   5.00   4.98
   40    5.52   5.35   5.21   5.11   5.02   4.96   4.92   4.90   4.89