ON THE USE OF A MINIMAX REGRET FUNCTION
TO SET SIGNIFICANCE POINTS IN PRIOR TESTS
OF ESTIMATION

by

Richard J. Brook

Institute of Statistics
Mimeograph Series No. 856
Raleigh - December
TABLE OF CONTENTS

                                                                   Page
LIST OF TABLES ................................................ vi
LIST OF FIGURES ............................................... vii

1. QUADRATIC RISK FUNCTION FOR THE PREDICTED VALUE OF Y ....... 1
   1.1  Introduction and Review of Literature ................. 1
   1.2  The Linear Model and Restrictions Defined ............. 3
   1.3  The Preliminary Test of Estimation .................... 5
   1.4  The Estimator y* ...................................... 8
   1.5  The Bias Function of y* ............................... 9
   1.6  The Quadratic Risk Function of y* ..................... 15
   1.7  Invariance under Orthogonal Transformations ........... 20

2. FURTHER COMMENTS ON THE RESULTS OF CHAPTER 1 ............... 22
   2.1  Comparison with Larson and Bancroft's Results ......... 22
   2.2  Comparison with Wallace's Results ..................... 25
   2.3  Comparison with Sawa and Hiromatsu's Results .......... 26

3. MINIMAX REGRET FUNCTION FOR Y* ............................. 28
   3.1  A Minimax Regret Function Defined ..................... 28
   3.2  Some Properties of the Functions M(θ,λ), r(θ,λ)
        and s(θ,λ) ............................................ 31

4. MINIMAX REGRET FUNCTION FOR THE BETA VECTOR ................ 38
   4.1  Introductory Comments ................................. 38
   4.2  Bias of the Beta Vector ............................... 39
   4.3  The Mean Square Error of the Estimated Beta Vector .... 39
   4.4  A Partial Check on the Mean Square Error .............. 44
   4.5  Orthogonal Transformations ............................ 45
   4.6  Minimax Regret Functions for the Estimated Beta Vector  46
        4.6.1  A Weighted Risk Function ....................... 46
        4.6.2  A Sawa-Type Risk Function ...................... 47
        4.6.3  Minimax Regret Condition Based on the Mean
               Square Error ................................... 49

5. RESULTS AND CONCLUSIONS .................................... 56
   5.1  Computer Procedure Followed to Obtain Critical Values
        Based on the Predicted Value of y ..................... 56
   5.2  Results Based on the Predicted Value of y ............. 58
   5.3  Computer Procedure for the Case of the Estimated Beta
        Vector ................................................ 62
   5.4  Results Based on the Estimated Beta Vector ............ 64
   5.5  Percentage Alpha Levels if λ* Is a Critical Point of
        the Non-Central F ..................................... 67
   5.6  Efficiency of the Estimators Based on the Optimal
        Critical Values ....................................... 70
   5.7  Maximizing the Minimum Relative Efficiency ............ 72
   5.8  Conclusions ........................................... 78

LIST OF REFERENCES ............................................ 81
LIST OF TABLES

                                                                   Page
5.1  Optimal values for the predicted value of y .............. 59
5.2  Optimal values for the estimated beta vector ............. 65
5.3  Alpha percent level if λ* is from the non-central F with
     non-centrality = 1/2 ..................................... 68
5.4  Alpha percent level if λ* is from the non-central F with
     non-centrality = m/2 ..................................... 69
5.5  Critical values obtained by maximizing the minimum
     relative efficiency of the estimator, Xβ*, when the
     number of restrictions is two ............................ 76
LIST OF FIGURES

                                                                   Page
3.1  Quadratic risk functions for different λ ................. 30
4.1  The regret condition for the estimated beta vector ....... 54
5.1  The quadratic risk function of y* for optimal critical
     values ................................................... 63
5.2  Efficiency of the estimator Xβ* relative to the least
     squares estimator, Xb .................................... 71
5.3  Efficiency of the estimator Xβ* relative to the estimator
     based on the 5 percent F value ........................... 73
5.4  Efficiency of Xβ* relative to the "best" estimator ....... 75
1. QUADRATIC RISK FUNCTION FOR THE PREDICTED VALUE OF Y

1.1 Introduction and Review of Literature

When an applied statistician is faced with the problem of estimating certain parameters of interest, he has at least two choices before him. From the data at his disposal, he can use the method of ordinary least squares to obtain one set of estimates. Another set of estimates can be found by the method of restricted least squares. In the latter case, the constraints may take the form of pooling two sets of data. A different kind of restriction is the use of prior knowledge to estimate certain components of the β vector, or linear combinations of these.
All these constraints can be conveniently written as Hβ = h, where H is an m×k matrix of known constants, β is a k×1 vector of unknown parameters, and h is an m×1 vector of known constants. This formulation includes the following common restrictions:

(i) One parameter is a constant multiple of another.

(ii) One parameter has a certain constant value; e.g., β₁ = 2.

(iii) A linear combination of some of the parameters has a constant value.

(iv) Wallace¹ points out that the pooling problem is equivalent to choosing H to be (I_k, −I_k), where I_k is the k×k identity matrix, and there are k components of the pooled β vector.

¹T. D. Wallace, Professor, N. C. State University, Raleigh, N. C., Weaker criteria and tests for linear restrictions, to be published in Econometrica.
Theil and Goldberger [1961] indicate how certain a priori inequality constraints on the β vector can also be included in this formulation.

In practice, it is not known whether the constraints are exact, so the experimenter must decide whether to use the ordinary least squares estimate, b, or the restricted estimator, β̂. The data itself can be used to decide which set of estimates to use by means of a prior test of estimation.
The test statistic takes the form

    u = [SSE(β̂) − SSE(b)] / [c·SSE(b)],

where SSE(·) signifies the sum of squares for error using that estimate of β, and c is a constant that makes u the ratio of two mean squares. A sequential estimator of β can now be defined as

    β* = b   if u > λ
       = β̂   if u < λ,

where λ is an appropriate constant and, generally, will be a certain critical value of Snedecor's F distribution.
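As a minimal sketch of how this sequential rule operates (in Python, with hypothetical numbers; the function name and inputs are illustrative, not from the text):

```python
def pretest_estimate(b, beta_hat, sse_b, sse_r, m, n_minus_k, lam):
    """Choose between the o.l.s. estimate b and the restricted
    estimate beta_hat by the prior test of estimation."""
    # u is a ratio of two mean squares: the increase in the error sum
    # of squares due to the restrictions, over m, divided by the
    # unrestricted error mean square.
    u = ((sse_r - sse_b) / m) / (sse_b / n_minus_k)
    return (b if u > lam else beta_hat), u

# Hypothetical numbers: the restrictions raise SSE from 20 to 30,
# with m = 2 restrictions and n - k = 10 error degrees of freedom.
est, u = pretest_estimate([1.2, 0.8], [1.0, 1.0], 20.0, 30.0, 2, 10, lam=4.0)
print(u)    # u = (10/2)/(20/10) = 2.5
print(est)  # u < 4, so the restricted estimate is kept
```

With a smaller critical value, say λ = 2, the same data would reject the restrictions and the o.l.s. estimate would be kept instead.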
It is well known that β* is a biased estimator. In Chapter 4, its bias and quadratic risk function will be evaluated. Similarly, y* = Xβ* is a sequential estimator for the predicted value of y. In the remaining sections of Chapter 1, the bias and quadratic risk function for y* will be evaluated. To do this, a method will be used similar to that employed by Larson and Bancroft [1963], although they considered the simpler case of zero restrictions instead of general linear restrictions. Ashar [1970] also evaluated the bias and mean square error for a sequential estimator for a two-variable model with zero restrictions.
In the definition of β*, or Xβ*, there is a certain arbitrariness, as λ is chosen at will by the experimenter. Sawa and Hiromatsu [1971], in a similar context, defined a minimax regret function in order to specify a value for λ. Their general approach will be used to define a regret function for y* in Chapter 3, and for β* in Chapter 4.

Toro-Vizcarrondo and Wallace [1968] and Wallace² proposed other criteria than testing whether the linear restrictions are true, which is equivalent to testing that the non-centrality parameter, θ, is zero. Their criteria lead to testing whether θ is less than 1/2, or m/2, where m is the number of restrictions. During this study, comparisons will be made with the results they obtained.
1.2 The Linear Model and Restrictions Defined

Consider the general linear model

    y = Xβ + ε,

where y and ε are n×1 vectors and X is an n×k matrix of fixed constants. n is assumed larger than k, and X has rank k, that is, X is of full rank. It is not necessary that the column vectors of X be orthogonal. The error vector ε is assumed to be distributed as a multivariate normal with mean zero and variance-covariance matrix σ²I. That is, the errors εᵢ, i = 1, 2, 3, ..., n, are identically and independently distributed as N(0, σ²). The ordinary least squares estimate (o.l.s.) of β is b = S⁻¹X'y, where S = X'X.
If a set of exact linear restrictions is placed on the β vector, viz., Hβ = h, where H is an m×k matrix of rank m and h is an m×1 vector, then the restricted estimator of β can be found by the method of Lagrange multipliers, as in Goldberger [1964, p. 256]. The expression Z = (y − Xβ)'(y − Xβ) − 2μ'(Hβ − h) is minimized with respect to β and μ, the m×1 vector of Lagrange multipliers. The restricted estimator for β is found to be

    β̂ = b − S⁻¹H'[HS⁻¹H']⁻¹(Hb − h).                          (1.2.1)

The distribution of these two estimators, b and β̂, will be needed in the next section. Clearly,

    b ∼ N(β, S⁻¹σ²).                                          (1.2.2)

As H, S, and h consist of fixed constants, then

    E(β̂) = β − S⁻¹H'[HS⁻¹H']⁻¹(Hβ − h).                       (1.2.3)

The variance of β̂ can be found from (1.2.1); on simplifying, this becomes

    var β̂ = S⁻¹σ² − S⁻¹H'[HS⁻¹H']⁻¹HS⁻¹σ²,

and

    β̂ ∼ N(β − S⁻¹H'[HS⁻¹H']⁻¹(Hβ − h), var β̂),               (1.2.4)

where the var β̂ is given above. If it is known that the linear restrictions are exact, then β̂ should be chosen in preference to b, as both estimators would be unbiased, and it can be shown that

    var β̂ ≤ var b.                                            (1.2.5)

This follows from the fact that the matrix S⁻¹H'[HS⁻¹H']⁻¹HS⁻¹ is non-negative definite. It should be noted that (1.2.5) holds whether or not the restrictions are exact.

²See footnote 1.
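A small numerical illustration of (1.2.1), under assumed data (k = 2 regressors, n = 3 observations, and the single restriction that the coefficients sum to one — all hypothetical), shows that the restricted estimate satisfies Hβ̂ = h exactly while the o.l.s. estimate need not:

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# hypothetical data: H = [1, 1], h = 1 (coefficients sum to one)
X = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
y = [1.0, 2.0, 3.0]
H = [1.0, 1.0]
h = 1.0

# S = X'X and its inverse (2x2 case)
Xt = transpose(X)
S = [[sum(Xt[i][t] * Xt[j][t] for t in range(3)) for j in range(2)]
     for i in range(2)]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
Sinv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

# b = S^{-1} X'y
b = matvec(Sinv, matvec(Xt, y))

# With m = 1, HS^{-1}H' is a scalar, so (1.2.1) reads
# beta_hat = b - S^{-1}H' (HS^{-1}H')^{-1} (Hb - h).
SinvHt = matvec(Sinv, H)
HSinvHt = sum(hi * v for hi, v in zip(H, SinvHt))
adj = (sum(hi * bi for hi, bi in zip(H, b)) - h) / HSinvHt
beta_hat = [bi - v * adj for bi, v in zip(b, SinvHt)]

print(b, sum(b))                # the o.l.s. estimate violates the restriction
print(beta_hat, sum(beta_hat))  # the restricted estimate satisfies it
```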
1.3 The Preliminary Test of Estimation

In this section, a few comments will be given on the preliminary test of estimation, and the distributions of some important statistics will be derived, as these distributions will be needed in later sections. If it is not known whether the linear restrictions are exact, then a prior test could be performed, namely testing the null hypothesis,

    H₀: Hβ = h.

Rao [1965] shows that a U.M.P. test of this hypothesis is based on the statistic

    u = {[SSE(β̂) − SSE(b)]/m} / {SSE(b)/(n−k)}.               (1.3.1)

This expression can be simplified by substituting the value of β̂ found in (1.2.1). The resulting expression for u can be written as
    u = Q / (m σ̂²),                                           (1.3.2)

where

    Q = (Hb − h)'[HS⁻¹H']⁻¹(Hb − h)                            (1.3.3)

and σ̂² = SSE(b)/(n−k), that is, the sample estimate for σ² with n−k degrees of freedom. Now, from (1.2.2), b ∼ N(β, S⁻¹σ²), and consequently

    Hb − h ∼ N(Hβ − h, HS⁻¹H'σ²).                              (1.3.4)
As H was assumed to be an m×k matrix of rank m, and S is a k×k matrix of rank k, it follows that HS⁻¹H' is an m×m matrix of rank m. Graybill [1961, p. 84] shows that this is a sufficient condition, along with (1.3.4), to claim that

    Q/σ² = (Hb − h)'[HS⁻¹H']⁻¹(Hb − h)/σ²                      (1.3.5)

is distributed as a non-central χ² with parameters m and θ, which is usually written as χ'²(m;θ), where θ is the non-centrality parameter and

    θ = (Hβ − h)'[HS⁻¹H']⁻¹(Hβ − h) / (2σ²).                   (1.3.6)

Under the null hypothesis, the distribution of Q/σ² reduces to a central χ² with m degrees of freedom, that is, χ²(m). It is well known that

    V = (n−k)σ̂²/σ² ∼ χ²(n−k).                                 (1.3.7)
In Section 1.5, use will be made of the fact that b and V are independently distributed, and also that V and Q are independent sums of squares. It will be useful to justify these two statements at this point. The first fact is well known and is proven in detail in Graybill [1961, p. 113]. In brief, b = (X'X)⁻¹X'y is a linear form in y, and

    V = (1/σ²) y'[I − XS⁻¹X']y,                                (1.3.8)

which is a quadratic form in y. Independence of b and V follows from a theorem quoted by Graybill [1961, p. 87].
To show the independence of V and Q, (1.3.8) can be rewritten, substituting Xβ + ε for y, to give

    V = (1/σ²) ε'[I − XS⁻¹X']ε.                                (1.3.9)

Now, b = (X'X)⁻¹X'y = (X'X)⁻¹X'(Xβ + ε) = β + S⁻¹X'ε. Substituting this value for b in (1.3.3), and recalling that, under the null hypothesis, Hβ = h, then, under the null hypothesis, Q reduces to the following central quadratic form:

    Q = ε'{XS⁻¹H'[HS⁻¹H']⁻¹HS⁻¹X'}ε.                           (1.3.10)

Johnson and Kotz [1970, p. 176] point out that, to show the independence of quadratic forms, only the central forms need to be considered. The necessary and sufficient condition for V and Q to be independent is that the product of the matrices [I − XS⁻¹X'] from (1.3.9) and {XS⁻¹H'[HS⁻¹H']⁻¹HS⁻¹X'} from (1.3.10) be zero. By straightforward multiplication, this is seen to be true.
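This matrix product can also be checked numerically. The sketch below uses an arbitrary small design (an assumption for illustration: n = 3, k = 2, one restriction), and verifies that [I − XS⁻¹X']{XS⁻¹H'[HS⁻¹H']⁻¹HS⁻¹X'} vanishes:

```python
def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

# hypothetical design: n = 3, k = 2, one restriction H = (1, -2)
X = [[1.0, 2.0], [1.0, 0.0], [2.0, 1.0]]
H = [[1.0, -2.0]]

S = matmul(transpose(X), X)
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
Sinv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

# P = I - X S^{-1} X'  (the matrix of the quadratic form V)
XSiXt = matmul(matmul(X, Sinv), transpose(X))
P = [[(1.0 if i == j else 0.0) - XSiXt[i][j] for j in range(3)]
     for i in range(3)]

# R = X S^{-1} H' [H S^{-1} H']^{-1} H S^{-1} X'  (the matrix of Q)
XSiHt = matmul(matmul(X, Sinv), transpose(H))        # 3x1
HSiHt = matmul(matmul(H, Sinv), transpose(H))[0][0]  # scalar, since m = 1
R = [[XSiHt[i][0] * XSiHt[j][0] / HSiHt for j in range(3)]
     for i in range(3)]

PR = matmul(P, R)
print(max(abs(PR[i][j]) for i in range(3) for j in range(3)))  # ~0
```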
Two additional facts will prove useful in Section 1.5. As b and V are independent, and b ∼ N(β, S⁻¹σ²), the joint density of b and V can be written as

    f(b,V) = K f₂(V) exp[−(b−β)'S(b−β) / (2σ²)],               (1.3.11)

where K is a normalizing constant and f₂(·) is the density of V. As Q and Vσ² = (n−k)σ̂² are independent sums of squares, then

    u = Q/(m σ̂²)                                              (1.3.12)

is distributed as the non-central F distribution with parameters m, n−k and θ.
1.4 The Estimator y*

In this section, the common estimator for the predicted value of y,³ y*, based on preliminary tests of estimation, will be considered. Its bias and mean square error will be evaluated. y* will be taken to be

    y* = Xβ* = Xb   if u > λ
             = Xβ̂   if u < λ,                                 (1.4.1)

where λ is the critical point of Snedecor's F statistic with m and n−k degrees of freedom at the desired type I error level. This could also be written as

    y* = Xβ* = Xβ̂·1_[0,λ)(u) + Xb·1_[λ,∞)(u),

where 1_[a,b)(u), the characteristic function, equals 1 if u is in [a,b) and 0 if u is in [a,b)ᶜ, the complement of [a,b). In terms of the null hypothesis, H₀: Hβ = h, the estimator Xβ̂ is chosen if H₀ is accepted, but Xb is chosen if H₀ is rejected at the desired α-level.

³From now on, the (∼) will be omitted from y, b, β̂, β*, β, but the symbols will still represent vectors.
1.5 The Bias Function of y*

Let A₁ denote the set of values of (b,V) for which u > λ, and let A₀ be its complement. The probability of the set A₁ is given by

    P(A₁) = ∫∫_{A₁} K f₂(V) exp[−(b−β)'S(b−β)/(2σ²)] db dV     (1.5.1)

and, by transforming the variables (b,V) to u as defined by (1.3.2),

    P(A₁) = ∫_{u>λ} f_u(u; m, n−k, θ) du = g(θ,λ),             (1.5.2)

where f_u(u; m, n−k, θ) is the density of the non-central F with m and n−k degrees of freedom; that is, u ∼ F(m, n−k; θ), as was shown in (1.3.12). However, rather than using the density function f_u(u; m, n−k, θ), a transformation

    t = [m/(n−k)]u = Q/[(n−k)σ̂²]

will be employed. The resulting density f_t(t; m, n−k, θ) is the ratio of two independent sums of squares. Kempthorne [1967, p. 221] shows that

    f_t(t; m, n−k, θ)
      = Σ_{i=0}^∞ {e^(−θ) θ^i / [i! B(m/2 + i, (n−k)/2)]}
        · t^(i+m/2−1) / (1+t)^(i+(m+n−k)/2),                   (1.5.3)

where B(p,q) is the Beta function with parameters p and q. Now, the expected value of y* can be considered separately over the sets A₀ and A₁, for these form a partition of the sample space.
Thus,

    E(y*) = E(Xb|A₁)P(A₁) + E(Xβ̂|A₀)P(A₀).                    (1.5.4)

As b is an unbiased estimate of β, then

    E(b|A₀)P(A₀) + E(b|A₁)P(A₁) = β.                           (1.5.5)

Using the form of β̂ in (1.2.1), E(y*) reduces to

    E(y*) = X E(b|A₁)P(A₁) + X E(b|A₀)P(A₀)
            − XS⁻¹H'[HS⁻¹H']⁻¹[H E(b|A₀)P(A₀) − h P(A₀)].      (1.5.6)

To evaluate this expression, it is necessary to find E(b|A₀)P(A₀). For convenience, E(b|A₁)P(A₁) will be evaluated first. One way of finding this is to differentiate (1.5.1) and (1.5.2) with respect to the β vector. Theil [1971, pp. 30-31] gives a review of vector and matrix differentiation. From his comments, as S is symmetric, it is clear that

    ∂/∂β exp[−(b−β)'S(b−β)/(2σ²)]
      = [S(b−β)/σ²] exp[−(b−β)'S(b−β)/(2σ²)].                  (1.5.7)

From (1.5.1) and (1.5.7),

    σ² ∂P(A₁)/∂β = E[S(b−β)|A₁]P(A₁).                          (1.5.8)

But, from (1.5.2),

    ∂P(A₁)/∂β = g'(θ,λ) ∂θ/∂β,                                 (1.5.9)

where g'(θ,λ) denotes ∂g(θ,λ)/∂θ. From (1.3.6),

    θ = (Hβ−h)'[HS⁻¹H']⁻¹(Hβ−h) / (2σ²),

so that

    ∂θ/∂β = H'[HS⁻¹H']⁻¹(Hβ−h) / σ².                           (1.5.10)

From (1.5.8), (1.5.9) and (1.5.10),

    E[S(b−β)|A₁]P(A₁) = H'[HS⁻¹H']⁻¹(Hβ−h) g'(θ,λ).

This expression can be premultiplied by S⁻¹, as S is a full-rank matrix of constants. Thus,

    E[(b−β)|A₁]P(A₁) = S⁻¹H'[HS⁻¹H']⁻¹(Hβ−h) g'(θ,λ)           (1.5.11)

and, as β is a vector of constants, E(β|A₁)P(A₁) = β g(θ,λ). Hence,

    E(b|A₁)P(A₁) = S⁻¹H'[HS⁻¹H']⁻¹(Hβ−h) g'(θ,λ) + β g(θ,λ).   (1.5.12)

From the unbiasedness of b, it then follows that

    E(b|A₀)P(A₀) = β[1 − g(θ,λ)]
                   − S⁻¹H'[HS⁻¹H']⁻¹(Hβ−h) g'(θ,λ).            (1.5.13)
As the bias of y* is E(y*) − Xβ, then, from (1.5.6) and (1.5.13), and bearing in mind that H, S and h are fixed,

    bias y* = −XS⁻¹H'[HS⁻¹H']⁻¹Hβ[1 − g(θ,λ)]
              + XS⁻¹H'[HS⁻¹H']⁻¹HS⁻¹H'[HS⁻¹H']⁻¹(Hβ−h) g'(θ,λ)
              + XS⁻¹H'[HS⁻¹H']⁻¹h[1 − g(θ,λ)].                 (1.5.14)

On simplifying, this leads to

    bias y* = XS⁻¹H'[HS⁻¹H']⁻¹(Hβ−h){g'(θ,λ) + g(θ,λ) − 1}.    (1.5.15)

This expression can be simplified by evaluating g'(θ,λ). From (1.5.2) and (1.5.3), g(θ,λ) can be written as

    g(θ,λ) = ∫_{u>λ} f_u(u; m, n−k, θ) du
           = ∫_{t>mλ/(n−k)} f_t(t; m, n−k, θ) dt.              (1.5.16)
This equation can be differentiated with respect to θ. To do this, it would be convenient if the last expression could be differentiated under the integral sign. Bartle [1964, p. 307] proves that sufficient conditions for this are that f_t(t; m, n−k, θ) and ∂f_t(t; m, n−k, θ)/∂θ be continuous in θ. From (1.5.3) and from (1.5.17) and (1.5.18) below, it is clear that these conditions are met. Now,

    ∂f_t(t; m, n−k, θ)/∂θ
      = −f_t(t; m, n−k, θ)
        + Σ_{i=1}^∞ {e^(−θ) θ^(i−1) / [(i−1)! B(m/2 + i, (n−k)/2)]}
          · t^(i+m/2−1) / (1+t)^(i+(m+n−k)/2).                 (1.5.17)

The second term on the right of (1.5.17) can be simplified by putting j = i−1:

    Σ_{i=1}^∞ {e^(−θ) θ^(i−1) / [(i−1)! B(m/2 + i, (n−k)/2)]}
      · t^(i+m/2−1) / (1+t)^(i+(m+n−k)/2)
    = Σ_{j=0}^∞ {e^(−θ) θ^j / [j! B(m*/2 + j, (n−k)/2)]}
      · t^(j+m*/2−1) / (1+t)^(j+(m*+n−k)/2)
    = f_t(t; m+2, n−k, θ),  where m* = m+2,                    (1.5.18)

and this is the density of the ratio of two sums of squares with m+2 degrees of freedom in the numerator and n−k degrees of freedom in the denominator. Define

    r(θ,λ) = ∫_{t>mλ/(n−k)} f_t(t; m+2, n−k, θ) dt
           = P{[(m+2)/(n−k)] F(m+2, n−k; θ) > mλ/(n−k)}.

In comparison,

    g(θ,λ) = P{[m/(n−k)] F(m, n−k; θ) > mλ/(n−k)}.

With this definition of r(θ,λ),

    g'(θ,λ) = −g(θ,λ) + r(θ,λ)                                 (1.5.19)

and, substituting into (1.5.15),

    bias y* = XS⁻¹H'[HS⁻¹H']⁻¹(Hβ−h){r(θ,λ) − 1}.              (1.5.20)
As could be expected, the bias function is zero if the conditions Hβ = h are exact, and the absolute value of the bias function increases as |Hβ−h| increases, that is, to the extent that β does not satisfy the restrictions. From (1.2.4), β̂ ∼ N(β − S⁻¹H'[HS⁻¹H']⁻¹(Hβ−h), var β̂), so that

    |bias Xβ̂| = |−XS⁻¹H'[HS⁻¹H']⁻¹(Hβ−h)|.

This leads to

    |bias y*| = |bias Xβ̂| |1 − r(θ,λ)| ≤ |bias Xβ̂|,           (1.5.21)

as r(θ,λ), being a probability, lies between 0 and 1. As Xb is an unbiased estimator of Xβ, the value of |bias y*| lies between that of |bias Xb| and |bias Xβ̂|. That is,

    |bias Xb| ≤ |bias y*| ≤ |bias Xβ̂|.                        (1.5.22)

Now, the estimator Xb will be chosen with probability P(A₁) = g(θ,λ) and Xβ̂ with probability P(A₀) = 1 − g(θ,λ).
It may be thought, naively, that |bias y*| would be given by the expression

    g(θ,λ)|bias Xb| + [1 − g(θ,λ)]|bias Xβ̂| = [1 − g(θ,λ)]|bias Xβ̂|.

For θ and λ in the open interval (0,+∞), this expression is actually larger than the correct expression of (1.5.21). This can be verified by employing an argument analogous to that which will be used later in Lemma 3.2.2 to show that g(θ,λ) < r(θ,λ), and hence that 1 − r(θ,λ) < 1 − g(θ,λ).
It is not clear from the expression for bias y* in (1.5.20) whether the absolute value of this bias function increases or decreases as θ, the non-centrality parameter, increases without limit. For large θ, |Hβ−h| increases, but r(θ,λ) tends to 1, as Lemma 3.2.3 will demonstrate. This problem can be posed in another way by considering

    (bias y*)'(bias y*) = (Hβ−h)'[HS⁻¹H']⁻¹(Hβ−h)[r(θ,λ) − 1]²
                        = 2σ²θ[r(θ,λ) − 1]².                   (1.5.23)

The problem reduces to whether θ goes to infinity faster than [r(θ,λ) − 1]² goes to zero. As Lemma 3.2.7 will show more formally, θ[r(θ,λ) − 1] → 0 as θ → +∞. Also, r(θ,λ) is an increasing function of θ, so that r(θ,λ) − 1 → 0 as θ → +∞. Consequently, θ[r(θ,λ) − 1]², and hence the quadratic form (bias y*)'(bias y*), → 0 as θ → +∞, which, in turn, implies that bias y* → 0 as θ → +∞.
1.6 The Quadratic Risk Function of y*

The mean square error of y* is E(y* − Xβ)(y* − Xβ)'. Instead of the mean square error, however, its trace will be evaluated and, in the third chapter, this will be used to define a minimax regret function. The problem, then, is to evaluate M(θ,λ) = E(Xβ* − Xβ)'(Xβ* − Xβ). The motivation for using M(θ,λ) is that this is the quadratic risk function of y*. Thus,

    M(θ,λ) = E(Xβ* − Xβ)'(Xβ* − Xβ)
           = E[(Xb−Xβ)'(Xb−Xβ)|A₁]P(A₁) + E[(Xβ̂−Xβ)'(Xβ̂−Xβ)|A₀]P(A₀)
           = E[(b−β)'S(b−β)|A₁]P(A₁) + E[(β̂−β)'S(β̂−β)|A₀]P(A₀).   (1.6.1)

Noting the form of β̂ from (1.2.1), then

    M(θ,λ) = E[(b−β)'S(b−β)|A₁]P(A₁)
             + E[(b − β − S⁻¹H'[HS⁻¹H']⁻¹[Hb−h])'
                 S(b − β − S⁻¹H'[HS⁻¹H']⁻¹[Hb−h])|A₀]P(A₀).    (1.6.2)

Expanding the second expectation gives four terms under the expected value:

    (a) (b−β)'S(b−β);
    (b) −[Hb−h]'[HS⁻¹H']⁻¹HS⁻¹S(b−β);
    (c) −(b−β)'S S⁻¹H'[HS⁻¹H']⁻¹[Hb−h];
    (d) [Hb−h]'[HS⁻¹H']⁻¹HS⁻¹S S⁻¹H'[HS⁻¹H']⁻¹[Hb−h].

Now, (d) reduces to [Hb−h]'[HS⁻¹H']⁻¹[Hb−h], which is seen to be the Q defined in Section 1.3. As H(b−β) = (Hb−h) − (Hβ−h), (b) can be written as

    −Q + [Hb−h]'[HS⁻¹H']⁻¹(Hβ−h),                              (1.6.3)

and (c) is merely the transpose of (b). Also,

    E[(b−β)'S(b−β)|A₁]P(A₁) + E[(b−β)'S(b−β)|A₀]P(A₀)
      = E[(b−β)'S(b−β)] = kσ²

from (1.2.2) and the fact that S is a k×k matrix of rank k.
Also, Q = (Hb−h)'[HS⁻¹H']⁻¹(Hb−h) is symmetric, so that Q = Q'. (1.6.2) can then be written

    M(θ,λ) = E[(b−β)'S(b−β)|A₁]P(A₁) + E[(b−β)'S(b−β)|A₀]P(A₀)
             − 2E[Q|A₀]P(A₀)
             + 2E[(Hb−h)'[HS⁻¹H']⁻¹(Hβ−h)|A₀]P(A₀)
             + E[Q|A₀]P(A₀).                                   (1.6.4)

With the above simplifications, (1.6.4) becomes

    M(θ,λ) = kσ² − E[Q|A₀]P(A₀)
             + E[(Hb−h)'[HS⁻¹H']⁻¹(Hβ−h)|A₀]P(A₀)
             + E[(Hβ−h)'[HS⁻¹H']⁻¹(Hb−h)|A₀]P(A₀).             (1.6.5)

In (1.5.13), an expression was obtained for E(b|A₀)P(A₀), so that it only remains to evaluate E(Q|A₀)P(A₀). From (1.5.16),

    E(Q|A₁)P(A₁) = ∫_{u>λ} Q f_u(u; θ) du.

As u = Q/(m σ̂²), and it was shown in Section 1.3 that Q and σ̂² are independent, then, letting q = Q/σ² for convenience,

    E(Q|A₁)P(A₁) = σ² ∫₀^∞ ∫_{q > mλσ̂²/σ²} q f₁(q; m, θ) f₂(σ̂²) dq dσ̂²,   (1.6.6)

where f₁(·;·), f₂(·) are the respective marginal densities of q and σ̂². From (1.3.5), f₁(q; m, θ) is a non-central χ² density function with m degrees of freedom and with non-centrality parameter θ.
This density is

    f₁(q; m, θ) = Σ_{i=0}^∞ {e^(−θ) θ^i / [i! 2^(i+m/2) Γ(i+m/2)]}
                  · q^(i+m/2−1) e^(−q/2),

where Γ(·) is the gamma function. Multiplying this density by q can be thought of as increasing the value of m by 2 in the exponent of q. In the denominator, m can also be increased by 2 if the whole expression under the summation sign is multiplied by a factor 2[i + m/2] = (2i + m). This gives

    q f₁(q; m, θ)
      = Σ_{i=0}^∞ {e^(−θ) θ^i · 2i / [i! 2^(i+(m+2)/2) Γ(i+(m+2)/2)]}
          · q^(i+(m+2)/2−1) e^(−q/2)
        + m Σ_{i=0}^∞ {e^(−θ) θ^i / [i! 2^(i+(m+2)/2) Γ(i+(m+2)/2)]}
          · q^(i+(m+2)/2−1) e^(−q/2).                          (1.6.7)

The last term on the right-hand side of the expression (1.6.7) is obviously m f₁(q; m+2, θ), and if, in the first term, j is put equal to i−1, then, in a method similar to that explained in (1.5.18), the first term becomes 2θ f₁(q; m+4, θ). Putting these results back in (1.6.6) gives

    E(Q|A₁)P(A₁) = σ² ∫₀^∞ ∫_{q > mλσ̂²/σ²}
                   [2θ f₁(q; m+4, θ) + m f₁(q; m+2, θ)] f₂(σ̂²) dq dσ̂².   (1.6.8)
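The series identity just used, q·f₁(q; m, θ) = 2θ·f₁(q; m+4, θ) + m·f₁(q; m+2, θ), can be checked numerically. A sketch, written in the thesis's parameterization (θ is half the usual non-centrality parameter; the values m = 2 and θ = 1.5 are illustrative):

```python
import math

def f1(q, m, theta, terms=80):
    """Series form of the non-central chi-square density used above:
    sum over i of  e^{-theta} theta^i / (i! 2^{i+m/2} Gamma(i+m/2))
                   * q^{i+m/2-1} e^{-q/2}."""
    total = 0.0
    for i in range(terms):
        a = i + m / 2.0
        log_coef = (-theta + i * math.log(theta) - math.lgamma(i + 1)
                    - a * math.log(2.0) - math.lgamma(a))
        total += math.exp(log_coef + (a - 1.0) * math.log(q) - q / 2.0)
    return total

# check q*f1(q; m) = 2*theta*f1(q; m+4) + m*f1(q; m+2) at a few points
m, theta = 2, 1.5
for q in (0.5, 2.0, 7.3):
    lhs = q * f1(q, m, theta)
    rhs = 2 * theta * f1(q, m + 4, theta) + m * f1(q, m + 2, theta)
    print(q, lhs, rhs)  # the two sides agree
```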
In Section 1.5, the statistic

    t = Q/[(n−k)σ̂²] = q/[(n−k)σ̂²/σ²]

was employed. The above expression in (1.6.8) can be transformed in terms of this statistic to give

    E(Q|A₁)P(A₁) = σ²[2θ ∫_{t>mλ/(n−k)} f_t(t; m+4, n−k, θ) dt
                       + m ∫_{t>mλ/(n−k)} f_t(t; m+2, n−k, θ) dt].   (1.6.9)

The second integral is the function r(θ,λ), defined in Section 1.5 to be

    r(θ,λ) = P{[(m+2)/(n−k)] F(m+2, n−k; θ) > mλ/(n−k)}.

Similarly, define

    s(θ,λ) = P{[(m+4)/(n−k)] F(m+4, n−k; θ) > mλ/(n−k)}.

From (1.6.9), and recalling that q = Q/σ², it follows that

    E(Q|A₁)P(A₁) = σ²[2θ s(θ,λ) + m r(θ,λ)].                   (1.6.10)

As E(Q) = σ²(m + 2θ) for the non-central distribution of Q/σ², it follows that

    E(Q|A₀)P(A₀) = σ²[m(1 − r(θ,λ)) + 2θ(1 − s(θ,λ))].         (1.6.11)

To complete the task of this section, there are two remaining terms of (1.6.5) which need to be evaluated so that M(θ,λ) may be found. Consider the third term on the right of (1.6.5), viz., E[(Hb−h)'[HS⁻¹H']⁻¹(Hβ−h)|A₀]P(A₀). H, h and S all consist of fixed constants, and, from (1.5.13) and (1.5.19),

    H E(b|A₀)P(A₀) = Hβ[1 − r(θ,λ)] + h[r(θ,λ) − g(θ,λ)]       (1.6.12)

and, as h P(A₀) = h[1 − g(θ,λ)], clearly,

    H E(b|A₀)P(A₀) − h P(A₀) = (Hβ−h)[1 − r(θ,λ)]              (1.6.13)

and, hence,

    E[(Hb−h)'[HS⁻¹H']⁻¹(Hβ−h)|A₀]P(A₀)
      = [1 − r(θ,λ)](Hβ−h)'[HS⁻¹H']⁻¹(Hβ−h)
      = 2σ²θ[1 − r(θ,λ)]                                       (1.6.14)

from the definition of θ in (1.3.6). The remaining term to be evaluated in (1.6.5) is merely the transpose of this, which gives the same value as found in (1.6.14), for it is a scalar quantity. The quadratic risk function for y* can now be written

    M(θ,λ) = σ²[k − m + m r(θ,λ) + 2θ(1 − 2r(θ,λ) + s(θ,λ))].   (1.6.15)
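Equation (1.6.15) can be checked by simulation. The sketch below assumes a convenient special case (an assumption for illustration: σ² = 1, S = I, H = (I_m, 0), h = 0, with k = 3, m = 2, n − k = 10), in which the restricted estimator simply zeroes the first m components of b; the Monte Carlo risk of the pretest estimator then agrees with the formula to sampling accuracy:

```python
import math, random

def simulate_risk(beta, m, n_k, lam, n_sim=50_000, seed=7):
    """Monte Carlo estimate of E(beta* - beta)'S(beta* - beta) in the
    special case S = I, H = (I_m, 0), h = 0, sigma^2 = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        b = [bi + rng.gauss(0, 1) for bi in beta]
        s2 = sum(rng.gauss(0, 1) ** 2 for _ in range(n_k)) / n_k
        u = sum(x * x for x in b[:m]) / (m * s2)          # (1.3.2)
        est = b if u > lam else [0.0] * m + b[m:]         # beta*
        total += sum((e - t) ** 2 for e, t in zip(est, beta))
    return total / n_sim

def tail(num_df, thresh_factor, theta, n_k, n_sim=50_000, seed=11):
    # P{chi'^2_num_df(2*theta) > thresh_factor * chi^2_{n-k}}
    rng = random.Random(seed)
    shift = math.sqrt(2 * theta)
    hits = 0
    for _ in range(n_sim):
        num = (rng.gauss(0, 1) + shift) ** 2
        num += sum(rng.gauss(0, 1) ** 2 for _ in range(num_df - 1))
        if num > thresh_factor * sum(rng.gauss(0, 1) ** 2 for _ in range(n_k)):
            hits += 1
    return hits / n_sim

k, m, n_k, lam = 3, 2, 10, 2.5
beta = [1.0, 0.5, 2.0]
theta = sum(b * b for b in beta[:m]) / 2.0   # (1.3.6) in this special case
r = tail(m + 2, m * lam / n_k, theta, n_k, seed=11)
s = tail(m + 4, m * lam / n_k, theta, n_k, seed=12)
formula = k - m + m * r + 2 * theta * (1 - 2 * r + s)   # (1.6.15)
simulated = simulate_risk(beta, m, n_k, lam)
print(formula, simulated)  # agree to Monte Carlo accuracy
```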
1.7 Invariance under Orthogonal Transformations

In the discussion so far, no restrictions of orthogonality have been placed on the x's. Consider, now, an orthogonal transformation of the x's, defined by

    Z = XC,

where C is a k×k orthogonal matrix; that is, CC' = I and C'X'XC = Z'Z, where Z'Z is a diagonal matrix. Suppose, now, that the parameter space and the H matrix are also transformed as follows:

    α = C'β   and   H* = HC.

The non-centrality of (1.3.6) becomes

    θ = (Hβ−h)'[HS⁻¹H']⁻¹(Hβ−h) / (2σ²)                        (1.7.1)
      = (HCC'β−h)'[HC(C'SC)⁻¹C'H']⁻¹(HCC'β−h) / (2σ²).

Now, CC' = I implies that C⁻¹ = C', and

    C'SC = C'X'XC = Z'Z,                                       (1.7.2)

so that θ in (1.7.1) can be written

    θ = (H*α−h)'[H*(Z'Z)⁻¹H*']⁻¹(H*α−h) / (2σ²).               (1.7.3)

These transformations obviously leave θ invariant. As the quadratic risk function, M(θ,λ) of (1.6.15), is expressed in terms of θ, it is invariant under this set of orthogonal transformations. Consider, now, the bias function of (1.5.20). In the light of (1.7.2) above, the bias function becomes

    bias y* = Z(Z'Z)⁻¹H*'[H*(Z'Z)⁻¹H*']⁻¹(H*α−h){r(θ,λ) − 1}.

Clearly, the bias function, as well as the quadratic risk function, remains invariant under the above set of orthogonal transformations.
2. FURTHER COMMENTS ON THE RESULTS OF CHAPTER 1

2.1 Comparison with Larson and Bancroft's Results

Larson and Bancroft [1963] addressed themselves to the problem of finding an estimator for the expected value of y, that is y*, for a true population model, y = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ + ε. The x's were divided into two groups. The first group, {x₁, x₂, ..., xₘ; m < k}, consisted of those variables which the experimenter felt were necessary for accurate prediction. A preliminary F test was used to decide whether to include the other group, {xₘ₊₁, xₘ₊₂, ..., xₖ}, in the prediction model. In other words, an F test was used to test the hypothesis H₀: βₘ₊₁ = βₘ₊₂ = ... = βₖ = 0, using as the test statistic

    F₀ = (bₘ₊₁² + bₘ₊₂² + ... + bₖ²) / [(k−m)V],

where V is the experimental error mean square. Thus, the predicted value of y, y*, can be written

    y* = Σ_{i=0}^k bᵢxᵢ   if F₀ > λ
       = Σ_{i=0}^m bᵢxᵢ   if F₀ < λ,

where λ is an appropriate critical value of Snedecor's F test, for a type I error of (say) 5 percent. They obtained the result that

    bias of y* = [1 − h(θ)] Σ_{i=m+1}^k βᵢxᵢ,                  (2.1.1)

where h(θ) is the r(θ,λ) used in this paper.
To obtain this result, they assumed that the x's were orthogonal, and indeed were so scaled that each column sum of squares was unity, so that each bᵢ had variance σ². Under their assumptions, and using zero restrictions as they did, their formulation is equivalent to taking h to be the zero vector. Also,

    H = (0_{k−m,m}, I_{k−m}),

where I_{k−m} is an identity matrix of rank (k−m) and 0_{k−m,m} is a (k−m)×m matrix of zeros. Their X'X matrix would take the form

    X'X = ( D_m        0_{m,k−m} )
          ( 0_{k−m,m}  I_{k−m}   ),

where D_m is a diagonal m×m matrix. HS⁻¹H' would then be

    (0_{k−m,m}, I_{k−m}) ( D_m⁻¹      0_{m,k−m} ) ( 0_{m,k−m}' )
                         ( 0_{k−m,m}  I_{k−m}   ) ( I_{k−m}    ),

which simplifies to I_{k−m}. The bias function as given in (1.5.20) was

    bias y* = XS⁻¹H'[HS⁻¹H']⁻¹(Hβ−h){r(θ,λ) − 1}.              (2.1.2)

Under Larson and Bancroft's assumptions, the expression in (2.1.2) reduces to the expression in (2.1.1). Furthermore, it is of interest to note that the general quadratic risk function, found in (1.6.15) to be

    M(θ,λ) = σ²[k − m + m r(θ,λ) + 2θ(1 − 2r(θ,λ) + s(θ,λ))],   (2.1.3)

reduces with their simpler model to their result that
    Mean square error of y*
      = σ²{1/n + Σ_{i=1}^m xᵢ² + h(θ) Σ_{i=m+1}^k xᵢ²
           + [s(θ) − 2h(θ) + 1](Σ_{i=m+1}^k βᵢxᵢ/σ)²}.          (2.1.4)

The s(θ) in (2.1.4) is the s(θ,λ) defined in Section 1.6, and the term 1/n enters into their expression because they included a constant term, β₀, in their model. Another apparent discrepancy between (2.1.3) and (2.1.4) is that m independent restrictions are used in this paper, whereas Larson and Bancroft used (k−m). As the xᵢ are orthogonal, and because of the scaling mentioned above for the bᵢ,

    (Σ_{i=m+1}^k βᵢxᵢ/σ)²

reduces to 2θ. By a suitable scaling, Σ_{i=1}^m xᵢ² is seen to be equal to m, which is analogous to the term "k−m" in (2.1.3).
Although Larson and Bancroft developed the expressions (2.1.1) and (2.1.4) by assuming orthogonality of the x's, they included a proof by David that the bias function would be unaltered by non-orthogonality. The proof given relies on a transformation of the sample space of the x's and of the parameter space of the β's. In Section 1.7, it was shown that, under such transformations, not only the bias of y* but also the quadratic risk function remains invariant, and that this is so for general linear restrictions. However, they appear to have overlooked the point that, for invariance, the H matrix must also be transformed, which raises a problem for the case of zero restrictions. If Larson and Bancroft are working with the transformed model of Z, α and H*, as the appendix seems to imply, then H* would equal (0_{k−m,m}, I_{k−m}). But, in the untransformed model, H ≠ (0_{k−m,m}, I_{k−m}), so that the original restrictions would not be zero restrictions but messy restrictions of the form of linear combinations of the β's being zero.
2.2 Comparison with Wallace's Results

T. D. Wallace⁴ introduced the term "weak mean squared error" and defined it as: "The restricted estimator, β̂, is better in weak mean squared error if and only if E(β̂−β)'S(β̂−β) ≤ E(b−β)'S(b−β)." For the set of restrictions introduced in Section 1.2, he showed that β̂ is better in weak mean squared error if and only if the non-centrality parameter θ ≤ m/2. This provides a method of checking the expression from (1.6.15), that

    M(θ,λ) = σ²[k − m + m r(θ,λ) + 2θ(1 − 2r(θ,λ) + s(θ,λ))].

b, the o.l.s. estimate of β, would always be chosen if the critical value of the u-statistic, λ, is zero. From the definitions of r(θ,λ) and s(θ,λ), they would both become 1 if λ = 0. That is,

    E(b−β)'S(b−β) = M(θ,0) = σ²k.                              (2.2.1)

On the other hand, as λ → ∞, r(θ,λ) and s(θ,λ) both tend to zero, and β̂ would be chosen with probability 1. That is,

    E(β̂−β)'S(β̂−β) = M(θ,∞) = σ²[k − m + 2θ].                 (2.2.2)

Thus, β̂ is better in weak mean squared error if and only if M(θ,∞) ≤ M(θ,0), which implies that 2θ ≤ m, which in turn implies that θ ≤ m/2. This provides a useful check on the expression given for M(θ,λ).

⁴Unpublished paper. See footnote 1.
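This check is easy to reproduce from (1.6.15) directly (σ² = 1; the values of k, m and θ below are illustrative):

```python
def M(theta, r, s, k, m):
    # quadratic risk (1.6.15) with sigma^2 = 1
    return k - m + m * r + 2 * theta * (1 - 2 * r + s)

k, m = 5, 2
for theta in (0.0, 0.5, 1.0, 3.0):
    # lambda = 0: r = s = 1, b always chosen, risk k sigma^2
    assert M(theta, 1.0, 1.0, k, m) == k
    # lambda -> infinity: r = s = 0, beta_hat always chosen
    assert M(theta, 0.0, 0.0, k, m) == k - m + 2 * theta
print("limiting cases check out")
```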
2.3 Comparison with Sawa and Hiromatsu's Results

Sawa and Hiromatsu [1971] set up a model similar to Larson and Bancroft [1963], with orthogonal x's and q zero restrictions. Evidently aiming to focus attention on the β vector itself rather than the predicted value of y, they considered the following estimator for a linear combination of β's, with b partitioned as (b₁, b₂) to conform with the q restrictions:

    γ̂_λ = c₁'b₁ + c₂'b₂   if F₀ > λ
         = c₁'b₁           if F₀ < λ,

where F₀ = b₂'b₂/(q s²) and s² is the sample variance. They obtained an expression, (2.3.1), for the mean square error of γ̂_λ, in which h(θ,λ) is analogous to the r(θ,λ) of Section 1.5. This suffers from the disadvantage that the expression depends on the vectors c₁ and c₂. To overcome this, they defined a risk function, R(θ,λ), by taking a supremum of the mean square error over these vectors,
which gives

    R(θ,λ) = h(θ,λ) + θ[s(θ,λ) − 2h(θ,λ) + 1],                 (2.3.2)

where their θ is twice the non-centrality used in Chapter 1. Unfortunately, they are rather careless in their definitions, as on page 6 they define

    h(θ,λ) = P{F(q+2, n−k, θ) ≥ λ}   and   s(θ,λ) = P{F(q+4, n−k, θ) > λ},

whereas they should have included the constant factors m/(n−k), (m+2)/(n−k) and (m+4)/(n−k), as was shown in Section 1.6. Furthermore, they are not consistent, as they give another incorrect version in a footnote on page 14, stating:

    h(θ,λ) = e^(−θ/2) Σ_{j=0}^∞ [(θ/2)^j / j!] P{χ²_{q+2+2j} ≥ λ χ²_{n−k}}.   (2.3.4)

This should be, using their definition of h,

    h(θ,λ) = e^(−θ/2) Σ_{j=0}^∞ [(θ/2)^j / j!]
             P{χ²_{q+2+2j} > [q/(n−k)] λ χ²_{n−k}}.             (2.3.5)

Using (2.3.4), they omitted a factor of 1/2 on the right-hand side of the expression they obtained in (2.3.6). Fortunately, these omissions have not invalidated the risk function they obtained, which is quoted above in (2.3.2). It is interesting to notice the similarity of form of their expression in (2.3.1) and the quadratic risk function of (1.6.15). Further comments on their risk function will be made in Section 4.6.
28
3.
3~1
It has been
MINIMAX REGRET FUNCTION FOR Y*
. A Minimax Regret Function Defined
sho~.in
section 1.7 that the quadratic risk
function~
M(6,A), based'onan estimator for the predicted -value of y, is invariant·
un~er
groups of. orthogonal transformations'ofthe sample, parameter and
restrictioI). spaces.
It 'is appealing, then p to define a minimax regret
function based on this quadratic risk function in .an att.empt to find an
optimal •.critical value p A'* t' of the prior test of estimation.
When values of the quadratic risk function of y*, M(θ,λ), are computed for a given value of λ = λ₀ in the open interval (0,+∞), it is found that the graph of M(θ,λ₀) follows the general shape of the curve labelled λ = λ₀ in Figure 3.1. The quadratic risk functions for Xb and Xβ̂ are represented by the lines labelled λ=0 and λ=∞, respectively. The curve of M(θ,λ₀) for λ₀ in (0,+∞) has the following characteristics. When the non-centrality parameter, θ, is zero, M(θ,λ₀) lies between q.r.f.(Xβ̂), the quadratic risk function for Xβ̂, and q.r.f.(Xb). As θ increases, M(θ,λ₀) increases and its graph cuts that of q.r.f.(Xb) at a point between θ = m/4 and θ = m/2. M(θ,λ₀) then reaches a maximum at a point θ = θ_U where θ_U ≥ m/2. It would appear that the graph of M(θ,λ₀) is unimodal, but this is not essential to the definition of the minimax regret condition given below.

If θ were known, then the best estimator of y, as far as the quadratic risk function is concerned, would be Xβ̂ for θ < m/2, and Xb for θ ≥ m/2. Consider inf_λ M(θ,λ), that is, the infimum (which in this context will equal the minimum) of M(θ,λ) over all values of λ. Clearly,
    inf_λ M(θ,λ) = M(θ,+∞), or q.r.f.(Xβ̂), for θ < m/2
                 = M(θ,0), or q.r.f.(Xb), for θ ≥ m/2.    (3.1.1)

The minimax regret function, REG(θ,λ), is then defined as

    REG(θ,λ) = M(θ,λ) - inf_λ M(θ,λ).    (3.1.2)

For θ < m/2 and small values of m and n-k, it is found, empirically, that REG(θ,λ) takes on a maximum at θ = 0. For larger values of m and n-k, there is a θ_L > 0 at which the maximum occurs. In Figure 3.1, this value of REG(θ,λ) is labelled δ_L. For θ ≥ m/2, REG(θ,λ) takes on a maximum value at θ_U, and this value of REG(θ,λ) is labelled δ_U in Figure 3.1. Heuristically, these two distances can be thought of as the maximum additional penalty for choosing Xβ* instead of the optimal estimators, Xβ̂ and Xb, at the points θ = θ_L and θ = θ_U, respectively.

The minimax regret procedure is to seek the λ = λ* which makes both δ_L and δ_U as small as possible. It is found, however, that as the critical value, λ, increases, the distance labelled δ_L decreases but δ_U increases. This can be seen in Figure 3.1 by the relative positions of the graphs labelled λ=1 and λ=4.

Thus, to minimize the minimax regret function, REG(θ,λ), over all values of θ and λ, the procedure will be to seek a λ = λ* such that

    δ_L = δ_U,    (3.1.3)

or, in other words,

    sup_{θ<m/2} REG(θ,λ*) = sup_{θ≥m/2} REG(θ,λ*).    (3.1.4)
[Figure 3.1 plots the quadratic risk functions M(θ,λ) against θ for λ = 0, 1, 4 and ∞, between the levels σ²(k-m) and σ²k, with the distances δ_L and δ_U marked near θ = m/2.]

Figure 3.1. Quadratic risk functions for different λ
In terms of the quadratic risk function of y*, the aim is to find λ* such that

    sup_{θ<m/2} [M(θ,λ*) - M(θ,+∞)] = sup_{θ≥m/2} [M(θ,λ*) - M(θ,0)].    (3.1.5)

If the expression for M(θ,λ), as found in (1.6.15), is substituted in (3.1.5), the minimax regret condition implies the following relationship:

    m r(θ_L,λ*) + 2θ_L[-2r(θ_L,λ*) + s(θ_L,λ*)]
        = m r(θ_U,λ*) - m + 2θ_U[1 - 2r(θ_U,λ*) + s(θ_U,λ*)],    (3.1.6)

where θ_L, θ_U maximize the quantities on the left and right, respectively.
3.2 Some Properties of the Functions M(θ,λ), r(θ,λ) and s(θ,λ)

In section 3.1, some general properties were noted of the graph of M(θ,λ). These properties will be dealt with in this section more formally by a series of lemmas.

Lemma 3.2.1

M(θ,0) = q.r.f.(Xb) is a constant function of θ, and M(θ,+∞) = q.r.f.(Xβ̂) is a linear function of θ.

Proof

Recall that

    M(θ,λ) = σ²{k - m + m r(θ,λ) + 2θ[1 - 2r(θ,λ) + s(θ,λ)]}.    (3.2.1)

From the definitions of r(θ,λ) and s(θ,λ) given in section 1.6, it is clear that r(θ,0) and s(θ,0) are both unity, and r(θ,∞), s(θ,∞) are both zero. Substituting these values in (3.2.1) gives the results

    M(θ,0) = σ²k    (3.2.2)

and

    M(θ,+∞) = σ²(k - m + 2θ).    (3.2.3)

Clearly, the quadratic risk functions for Xb and Xβ̂ are represented by the lines labelled λ=0 and λ=∞, respectively. M(θ,λ) inherits many of its characteristics from the functions r(θ,λ) and s(θ,λ). For this reason, attention will be focused on these functions in the next two lemmas.
Now r(θ,λ) was defined after (1.5.18) in terms of the ratio of two sums of squares, that is,

    r(θ,λ) = P{ ((m+2)/(n-k)) F(m+2, n-k; θ) ≥ mλ/(n-k) }.    (3.2.4)

Equivalently,

    r(θ,λ) = P{ χ'²(m+2; θ) ≥ (mλ/(n-k)) χ²(n-k) }.    (3.2.5)

By a well-known transformation, for example see Wilks [1943, p. 187], r(θ,λ) can be expressed in terms of the incomplete Beta function. If X ~ F(2p,2q), then Y = pX/(pX+q) has the Beta distribution with parameters p and q, and P{Y ≤ y} = I_y(p,q), where I_y(p,q) is the incomplete Beta function ratio,

    I_y(p,q) = (1/B(p,q)) ∫₀^y t^(p-1) (1-t)^(q-1) dt,    (3.2.6)

and B(p,q) = Γ(p)Γ(q)/Γ(p+q) can be termed the complete Beta function. With this notation,

    r(θ,λ) = 1 - e^(-θ) Σ_{j=0}^∞ (θ^j/j!) I_y((m+2)/2 + j, (n-k)/2),  where y = mλ/(mλ + n-k).    (3.2.7)

s(θ,λ) will have the same form as (3.2.7) with (m+4) replacing the factor (m+2). A third function, w(θ,λ), can be defined with (m+6) replacing (m+2). Perhaps more succinct notation would be the following:

    V(p) = 1 - e^(-θ) Σ_{j=0}^∞ (θ^j/j!) I_y(p/2 + j, (n-k)/2),    (3.2.8)

so that

    r(θ,λ) = V(m+2),  s(θ,λ) = V(m+4),  w(θ,λ) = V(m+6).    (3.2.9)
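The series (3.2.7) through (3.2.9) can be checked numerically. The sketch below is an illustration, not part of the thesis: it evaluates V(p) by the Poisson-weighted incomplete-Beta sum, using scipy's betainc for I_y, and uses scipy's noncentral F, whose noncentrality equals twice the thesis's θ, as an independent check.

```python
# V(p) of (3.2.8) via the Poisson-weighted incomplete Beta series, checked
# against the noncentral F tail probability. Illustrative sketch only;
# scipy's noncentrality parameter equals twice the thesis's theta.
import math
from scipy.special import betainc
from scipy.stats import ncf

def V(p, theta, lam, m, nk, terms=200):
    y = m * lam / (m * lam + nk)
    total, w = 0.0, math.exp(-theta)          # w = e^-theta * theta^j / j!
    for j in range(terms):
        total += w * betainc(p / 2 + j, nk / 2, y)
        w *= theta / (j + 1)
    return 1.0 - total

def v_ncf(p, theta, lam, m, nk):
    # Same probability as a noncentral F tail: chi'2(p; 2 theta)/chi2(n-k) >= m lam/(n-k)
    return ncf(p, nk, 2.0 * theta).sf(m * lam / p)

m, nk, theta, lam = 4, 20, 1.5, 2.0
r = V(m + 2, theta, lam, m, nk)
s = V(m + 4, theta, lam, m, nk)
w = V(m + 6, theta, lam, m, nk)
print(r, s, w)   # Lemma 3.2.2 below: w > s > r
```

The agreement of the two routes confirms the change of variable leading to (3.2.7): the event χ²(p+2j)/χ²(n-k) ≥ mλ/(n-k) is exactly the event that the Beta((p+2j)/2, (n-k)/2) variable exceeds y = mλ/(mλ+n-k).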
Lemma 3.2.2

For θ and λ in the open interval (0,+∞),

    w(θ,λ) > s(θ,λ) > r(θ,λ).

In other words, r(θ,λ) is an increasing function of the numerator degrees of freedom. Equivalently, ΔV(p) > 0 for all integer p.

Proof

The result can be obtained from the difference

    s(θ,λ) - r(θ,λ) = Σ_{j=0}^∞ e^(-θ) (θ^j/j!) [I_y((m+2)/2 + j, (n-k)/2) - I_y((m+4)/2 + j, (n-k)/2)].    (3.2.10)

Jordan [1962, p. 84] gives the following useful relationship:

    I_y(p,q) = (Γ(p+q)/(Γ(p+1)Γ(q))) y^p (1-y)^q + I_y(p+1, q).    (3.2.11)

In other words, I_y(p,q) - I_y(p+1,q) = (Γ(p+q)/(Γ(p+1)Γ(q))) y^p (1-y)^q. The expression in (3.2.10) can then be written as

    s(θ,λ) - r(θ,λ) = Σ_{j=0}^∞ e^(-θ) (θ^j/j!) (Γ(p+q)/(Γ(p+1)Γ(q))) y^p (1-y)^q > 0,
        with p = (m+2+2j)/2 and q = (n-k)/2,    (3.2.12)

as y is in the interval (0,1). Consequently, w(θ,λ) > s(θ,λ) follows from a similar argument.
Lemma 3.2.3

For θ, λ in the open interval (0,+∞), r(θ,λ), s(θ,λ) and w(θ,λ) are decreasing functions of λ, but increasing functions of θ.

Proof

It is clear from the definition of r(θ,λ) in (3.2.5) that as λ increases then r(θ,λ) decreases. It was shown in (1.5.19) that ∂g(θ,λ)/∂θ = -g(θ,λ) + r(θ,λ). In a similar way, or directly from (3.2.7),

    ∂r(θ,λ)/∂θ = -r(θ,λ) + s(θ,λ),

and this expression is positive from Lemma 3.2.2. The lemma is now proved for r(θ,λ), and analogously, the results follow for s(θ,λ) and w(θ,λ).
Lemma 3.2.4

When θ = 0, and λ is in (0,+∞), M(θ,λ) lies between the quadratic risk function for Xβ̂ and the quadratic risk function for Xb. That is,

    M(0,+∞) < M(0,λ) < M(0,0).

Proof

Now r(0,λ) and s(0,λ) are both unity when λ = 0, and both tend to zero as λ tends to infinity. Thus, for λ in (0,+∞),

    σ²(k-m) < M(0,λ) = σ²{k - m + m r(0,λ)} < σ²k.    (3.2.14)
Lemma 3.2.5

The graph of M(θ,λ), the quadratic risk function of Xβ*, cuts that of Xb at θ = θ_c, where θ_c is in the open interval (m/4, m/2), provided that λ is in (0,+∞).

Proof

From (3.2.1) and (3.2.2), θ_c is defined by

    kσ² = σ²{k - m + m r(θ_c,λ) + 2θ_c[1 - 2r(θ_c,λ) + s(θ_c,λ)]},

which simplifies to

    θ_c = (m/2) · (1 - r(θ_c,λ)) / (1 - 2r(θ_c,λ) + s(θ_c,λ)).    (3.2.15)

The lemma will be proved if it can be shown that

    1/2 < (1 - r(θ_c,λ)) / (1 - 2r(θ_c,λ) + s(θ_c,λ)) < 1,    (3.2.16)

or

    1/2 < (1 - r(θ_c,λ)) / (1 - r(θ_c,λ) + s(θ_c,λ) - r(θ_c,λ)) < 1.    (3.2.17)

The right-hand inequality in (3.2.17) follows from the fact that s(θ_c,λ) - r(θ_c,λ) > 0, as shown in (3.2.12). As s(θ_c,λ) is a probability, it is less than 1 for λ in (0,+∞), so that the denominator satisfies

    1 - r(θ_c,λ) + s(θ_c,λ) - r(θ_c,λ) < 2[1 - r(θ_c,λ)],

which proves the left-hand inequality, and completes the proof of the lemma.
Lemma 3.2.6

The infimum of M(θ,λ) is M(θ,+∞), the q.r.f.(Xβ̂), when θ < m/2, and is M(θ,0), the q.r.f.(Xb), when θ ≥ m/2.

Proof

Now s(θ,λ) - r(θ,λ) > 0 from Lemma 3.2.2. For θ < m/2,

    M(θ,λ) = σ²{k - m + m r(θ,λ) + 2θ[1 - 2r(θ,λ) + s(θ,λ)]}
           > σ²{k - m + 2θ + m r(θ,λ) - 2θ r(θ,λ)}
           > σ²{k - m + 2θ} = M(θ,+∞),

the last step following because r(θ,λ) > 0 and m - 2θ > 0. Also, 1 - 2r(θ,λ) + s(θ,λ) = [1 - r(θ,λ)] + [s(θ,λ) - r(θ,λ)] > 0; hence, for θ ≥ m/2,

    M(θ,λ) ≥ σ²{k - m + m r(θ,λ) + m[1 - 2r(θ,λ) + s(θ,λ)]}
           = σ²{k + m[s(θ,λ) - r(θ,λ)]} ≥ σ²k = M(θ,0).

Lemma 3.2.7

As θ tends to infinity, M(θ,λ) approaches M(θ,0), the quadratic risk function for Xb.

Proof

    M(θ,λ) = σ²{k - m + m r(θ,λ) + 2θ[1 - 2r(θ,λ) + s(θ,λ)]}
           = σ²{k - m + m r(θ,λ) + 2θ[2{1 - r(θ,λ)} - {1 - s(θ,λ)}]}.

Gun [1965, p. 54] has shown that, with the present notation, θ[1 - r(θ,λ)] → 0 as θ → +∞, and θ[1 - s(θ,λ)] → 0 as θ → +∞, so that 2θ[2{1 - r(θ,λ)} - {1 - s(θ,λ)}] → 0. As r(θ,λ) is an increasing function of θ, lim_{θ→+∞} r(θ,λ) = 1. Hence the result that M(θ,λ) → σ²k = M(θ,0).

As r(θ,λ) and s(θ,λ) are continuous functions of both θ and λ, it is clear that the regret function of (3.1.2), REG(θ,λ) = M(θ,λ) - inf_λ M(θ,λ), will attain a maximum value for θ in each of the intervals [0, m/2] and [m/2, +∞), as REG(θ,λ) decreases to zero as θ goes to infinity.
4. MINIMAX REGRET FUNCTION FOR THE BETA VECTOR

4.1 Introductory Comments

In Chapter 1, the estimator, Xβ*, of the predicted value of y was studied, and its bias and its quadratic risk function were obtained as neat, mathematical expressions. On the basis of the latter, a minimax regret function was defined in Chapter 3 to seek an optimum critical point for the preliminary test of estimation.

In this chapter, the bias and the mean square error of β* will be found by methods similar to those used in Chapter 1. In contrast to the bias and quadratic risk function of Xβ*, the bias and mean square error of β* will not be invariant under orthogonal transformations. It will not be possible to find an optimal critical value independent of the design matrix, S = X'X, or the restriction matrix, H. In a given experiment, when S and H are specified, it will be possible to find an optimal critical value, λ₀, by setting up a minimax condition similar to that used in Chapter 3.

Differentiation of matrices and vectors will again be greatly involved, and it will be helpful to recall the expression found in (1.5.7):

    ∂/∂β [(b-β)'S(b-β)] = -2S(b-β).

Differentiating with respect to β' would give the transpose of this, namely,

    ∂/∂β' [(b-β)'S(b-β)] = -2(b-β)'S,

as S is symmetric. This expression can be differentiated a second time, as shown in Theil [1971, p. 31], giving the result

    ∂²/∂β∂β' [(b-β)'S(b-β)] = 2S.

4.2 Bias of the Beta Vector
The model, restrictions, and test statistic are the same as in Chapter 1. The estimator of the β vector is given by

    β* = b   if U ≥ λ, on the set A₁,
       = β̂   if U < λ, on the set A₀.

The bias of β* can be found in a similar way to the bias of y* found in (1.5.20); thus,

    bias β* = S⁻¹H'[HS⁻¹H']⁻¹(Hβ - h)(r(θ,λ) - 1).    (4.2.1)

As a partial check on this bias function, it is obvious that it takes the value zero if the restrictions are exact, that is, if Hβ = h, or if λ takes the value zero. This latter case is equivalent to always choosing the O.L.S. estimator of β, namely b, which is, of course, unbiased.
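The bias function can also be checked by direct simulation of the preliminary-test estimator. The sketch below is an illustration, not part of the thesis: the matrices S and H and the vectors β and h are made-up example values, and scipy's noncentral F, with noncentrality taken as twice the thesis's θ, supplies r(θ,λ).

```python
# Monte Carlo check of (4.2.1): bias beta* = S^-1 H'[H S^-1 H']^-1 (H beta - h)(r-1).
# Hypothetical example data; U is the usual F statistic for H beta = h, sigma^2 = 1.
import numpy as np
from scipy.stats import ncf

rng = np.random.default_rng(0)
k, m, nk, lam = 3, 2, 12, 2.0
X = rng.normal(size=(nk + k, k))
S = X.T @ X                                    # S = X'X
H = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]])
beta = np.array([0.5, -0.3, 0.2])
h = H @ beta - np.array([0.4, -0.2])           # restrictions deliberately inexact
Si = np.linalg.inv(S)
Gi = np.linalg.inv(H @ Si @ H.T)
delta = H @ beta - h
theta = delta @ Gi @ delta / 2.0               # thesis's non-centrality parameter
r = ncf(m + 2, nk, 2.0 * theta).sf(m * lam / (m + 2))
bias_formula = (Si @ H.T @ Gi @ delta) * (r - 1.0)

n = 200_000
L = np.linalg.cholesky(Si)
b = beta + rng.normal(size=(n, k)) @ L.T       # b ~ N(beta, S^-1)
s2 = rng.chisquare(nk, size=n) / nk            # independent error-variance estimate
d = b @ H.T - h
U = np.einsum('ij,jk,ik->i', d, Gi, d) / (m * s2)   # prior F statistic
adj = d @ (Si @ H.T @ Gi).T                    # restricted-estimator correction
est = np.where((U >= lam)[:, None], b, b - adj)
bias_mc = est.mean(axis=0) - beta
print(bias_formula, bias_mc)
```

The simulated bias agrees with the closed form within Monte Carlo error, which supports the identification of r(θ,λ) as the probability attached to the (m+2)-degree-of-freedom noncentral F.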
4.3 The Mean Square Error of the Estimated Beta Vector

In this section, the mean square error of β* will be evaluated, that is, E(β*-β)(β*-β)'. In section 1.6, it may be recalled, it was the trace of the mean square error of y*, that is, E(Xβ*-Xβ)'(Xβ*-Xβ), which was the more easily obtained.

It is convenient to consider the transpose of each of the expressions in (1.5.8) through (1.5.10). The mean square error of β* will contain an expression E[(b-β)(b-β)'|A₁]P(A₁), and to evaluate this it is necessary to differentiate (4.3.2) a second time to obtain ∂²P(A₁)/∂β∂β'. Combining this with the first-derivative results in (4.3.1) through (4.3.4) (or, alternatively, following the derivation leading to (1.5.11)) gives

    E[-S/σ² + S(b-β)(b-β)'S/σ⁴ | A₁]P(A₁)
        = g'(θ,λ) H'[HS⁻¹H']⁻¹H/σ²
        + g''(θ,λ) H'[HS⁻¹H']⁻¹(Hβ-h)(Hβ-h)'[HS⁻¹H']⁻¹H/σ⁴.    (4.3.6)

Now S is symmetric, so that S' = S. Pre- and post-multiplying the expression in (4.3.6) by S⁻¹σ², and recalling from (1.5.2) that g(θ,λ) is P(A₁), the following result is obtained:

    E[(b-β)(b-β)'|A₁]P(A₁) = σ²S⁻¹ g(θ,λ)
        + σ²S⁻¹H'[HS⁻¹H']⁻¹HS⁻¹ g'(θ,λ)
        + S⁻¹H'[HS⁻¹H']⁻¹(Hβ-h)(Hβ-h)'[HS⁻¹H']⁻¹HS⁻¹ g''(θ,λ).    (4.3.7)

Now

    E[(b-β)(b-β)'|A₀]P(A₀) + E[(b-β)(b-β)'|A₁]P(A₁) = var b = S⁻¹σ².

Also, g'(θ,λ) = -g(θ,λ) + r(θ,λ) by (1.5.19). Similarly, it can be shown that r'(θ,λ) = -r(θ,λ) + s(θ,λ), so that g''(θ,λ) = g(θ,λ) - 2r(θ,λ) + s(θ,λ). Substituting these values back in (4.3.7) leads to

    E[(b-β)(b-β)'|A₀]P(A₀) = σ²S⁻¹[1 - g(θ,λ)]
        - σ²S⁻¹H'[HS⁻¹H']⁻¹HS⁻¹[-g(θ,λ) + r(θ,λ)]
        - S⁻¹H'[HS⁻¹H']⁻¹(Hβ-h)(Hβ-h)'[HS⁻¹H']⁻¹HS⁻¹[g(θ,λ) - 2r(θ,λ) + s(θ,λ)].    (4.3.8)

A parallel argument gives the first-moment expression E[(b-β)'|A₀]P(A₀) of (4.3.9).

Denote the mean square error matrix of β*, E(β*-β)(β*-β)', by MSE(θ,λ). Then, as {A₀, A₁} form a partition of the parameter space,

    MSE(θ,λ) = E[(b-β)(b-β)'|A₁]P(A₁) + E[(β̂-β)(β̂-β)'|A₀]P(A₀).    (4.3.10)

As β̂ = b - S⁻¹H'[HS⁻¹H']⁻¹(Hb-h), MSE(θ,λ) is found to be the sum of the following four terms:

(a) E[(b-β)(b-β)'], as this term appears in both sets A₀ and A₁,
(b) -S⁻¹H'[HS⁻¹H']⁻¹ E[(Hb-h)(b-β)'|A₀]P(A₀),
(c) the transpose of (b), and
(d) S⁻¹H'[HS⁻¹H']⁻¹ E[(Hb-h)(Hb-h)'|A₀]P(A₀) [HS⁻¹H']⁻¹HS⁻¹.

Note that in (b) and (d), use is made of the fact that S and H are matrices of known constants. The term (b) can be simplified using the identity

    (Hb-h) = H(b-β) + (Hβ-h),    (4.3.11)

giving

    (b) = -S⁻¹H'[HS⁻¹H']⁻¹ E{H(b-β)(b-β)' + (Hβ-h)(b-β)' | A₀}P(A₀)
        = -S⁻¹H'[HS⁻¹H']⁻¹ [H Ω₁ + (Hβ-h) Ω₂'],    (4.3.12)

where Ω₁ is the expression in (4.3.8) and Ω₂ is the transpose of the expression in (4.3.9). On simplifying,

    (b) = σ²S⁻¹H'[HS⁻¹H']⁻¹HS⁻¹[r(θ,λ) - 1]
        + S⁻¹H'[HS⁻¹H']⁻¹(Hβ-h)(Hβ-h)'[HS⁻¹H']⁻¹HS⁻¹[s(θ,λ) - r(θ,λ)].    (4.3.13)

As this expression is symmetric, (c) reduces to this same expression. The expression (d) can be expanded into four terms, as

    (Hb-h)(Hb-h)' = H(b-β)(b-β)'H' + H(b-β)(Hβ-h)' + (Hβ-h)(b-β)'H' + (Hβ-h)(Hβ-h)'.    (4.3.14)

Thus,

    E[(Hb-h)(Hb-h)'|A₀]P(A₀) = HΩ₁H' + HΩ₂(Hβ-h)' + (Hβ-h)Ω₂'H' + (Hβ-h)(Hβ-h)'[1 - g(θ,λ)],

where Ω₁ and Ω₂, as mentioned above, are the expressions found in (4.3.8) and (4.3.9). Substituting the values for Ω₁ and Ω₂ into the above expression and simplifying leads to

    (d) = σ²S⁻¹H'[HS⁻¹H']⁻¹HS⁻¹[1 - r(θ,λ)]
        + S⁻¹H'[HS⁻¹H']⁻¹(Hβ-h)(Hβ-h)'[HS⁻¹H']⁻¹HS⁻¹[1 - s(θ,λ)].    (4.3.15)

Summing the four terms (a), (b), (c), and (d) leads to the following. That is,

    MSE(θ,λ) = σ²S⁻¹ + σ²S⁻¹H'[HS⁻¹H']⁻¹HS⁻¹[r(θ,λ) - 1]
        + S⁻¹H'[HS⁻¹H']⁻¹(Hβ-h)(Hβ-h)'[HS⁻¹H']⁻¹HS⁻¹[1 - 2r(θ,λ) + s(θ,λ)].    (4.3.16)

4.4 A Partial Check on the Mean Square Error

When λ = ∞, β̂ is chosen with probability 1, and r(θ,∞) and s(θ,∞) both equal zero, so that (4.3.16) becomes

    MSE(θ,∞) = E(β̂-β)(β̂-β)'.
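Both limiting cases of this partial check can be verified mechanically. The sketch below is illustrative only: S, H, β and h are made-up values, and the formula (4.3.16) is evaluated at r = s = 1 and at r = s = 0 and compared with σ²S⁻¹ and with the exact mean square error of the restricted estimator β̂.

```python
# Check (4.3.16) at its two limits: MSE(theta,0) = sig2 S^-1, and
# MSE(theta,inf) = E(beta-hat - beta)(beta-hat - beta)'. Hypothetical S, H, beta, h.
import numpy as np

def mse_formula(r, s, S, H, beta, h, sig2=1.0):
    Si = np.linalg.inv(S)
    Gi = np.linalg.inv(H @ Si @ H.T)
    C = Si @ H.T @ Gi @ H @ Si
    v = Si @ H.T @ Gi @ (H @ beta - h)
    return sig2 * Si + sig2 * C * (r - 1.0) + np.outer(v, v) * (1.0 - 2.0 * r + s)

rng = np.random.default_rng(1)
k = 4
B = rng.normal(size=(10, k))
S = B.T @ B
H = rng.normal(size=(2, k))
beta = rng.normal(size=k)
h = H @ beta - rng.normal(size=2)              # inexact restrictions
Si = np.linalg.inv(S)
Gi = np.linalg.inv(H @ Si @ H.T)
d = H @ beta - h

# lambda = 0: r = s = 1, b is always chosen
mse0 = mse_formula(1.0, 1.0, S, H, beta, h)

# lambda = inf: r = s = 0, the restricted estimator is always chosen
P = np.eye(k) - Si @ H.T @ Gi @ H              # beta-hat - beta = P(b-beta) - Si H' Gi d
v = Si @ H.T @ Gi @ d
mse_inf_direct = P @ Si @ P.T + np.outer(v, v)
mse_inf = mse_formula(0.0, 0.0, S, H, beta, h)
```

The direct computation for β̂ uses only the linearity of the restricted estimator in b, so the agreement is exact up to rounding.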
This expression is the same as obtained by Wallace⁵, providing a partial check for (4.3.16). When λ = 0, b is chosen with probability 1, and r(θ,0) and s(θ,0) both equal 1, so that (4.3.16) becomes MSE(θ,0) = E(b-β)(b-β)' = σ²S⁻¹, as expected.

⁵Unpublished paper. See footnote 1.

4.5 Orthogonal Transformations

In section 1.7, a set of orthogonal transformations was defined on the sample, parameter and restriction spaces by a matrix C, such that CC' = I, the parameter vector transforming to C'β and the restriction matrix to H* = HC.    (4.5.1)

Under this set of transformations, the following can easily be justified:

    (Hβ - h)(Hβ - h)' = (HCC'β - h)(HCC'β - h)'.    (4.5.2)

In section 1.7, it was shown that the bias of y* and also its quadratic risk function are invariant under the orthogonal transformations defined in that section. Under these same transformations, the bias of β* would become the corresponding rotation of the vector S⁻¹H'[HS⁻¹H']⁻¹(Hβ - h)(r(θ,λ) - 1).    (4.5.3)

Clearly, this is not invariant under these transformations. Let MSE*(θ,λ) be the transform of MSE(θ,λ) with the above substitutions made, as shown by the statements (4.5.1) through (4.5.3). From the form of MSE(θ,λ) as found in (4.3.16), it follows that

    C[MSE(θ,λ)]C' = MSE*(θ,λ),  or  MSE(θ,λ) = C'[MSE*(θ,λ)]C.    (4.5.4)

Clearly, MSE(θ,λ) is not invariant under this set of transformations. Whereas a regret function based on the loss function E(Xβ*-Xβ)'(Xβ*-Xβ) did not depend on the specific values taken by the S and H matrices, this will not be the case with the loss function E(β*-β)'(β*-β).

The question remains whether, for given values of S and H, a meaningful regret function can be defined in order to obtain an optimal critical value for the prior test. Some possible approaches to this problem are explored in the next section.

4.6 Minimax Regret Functions for the Estimated Beta Vector

4.6.1 A Weighted Risk Function

To overcome the problems mentioned in the previous section, a weighted risk function, R(θ,λ), could be defined by

    R(θ,λ) = E(β*-β)'S(β*-β) = tr{S · MSE(θ,λ)},    (4.6.1)

and a minimax regret equation set up as in (3.1.3). The risk function of (4.6.1), however, is equivalent to the quadratic risk function, M(θ,λ), defined in (1.6.1), as E(β*-β)'S(β*-β) = E(Xβ*-Xβ)'(Xβ*-Xβ) when S = X'X. Any conclusions drawn from this weighted risk function would be applicable to the predicted value of y and not to the estimated beta vector.

4.6.2 A Sawa-Type Risk Function

It was pointed out in section 2.3 that Sawa and Hiromatsu studied the problem of a linear combination c'β = c₁'β₁ + c₂'β₂ of the βᵢ. By a normalizing process involving the supremum of the quadratic risk function over the vector β₂, they arrived at the risk function of (2.3.2), viz.,

    R(θ,λ) = h(θ,λ) + θ[s(θ,λ) - 2h(θ,λ) + 1].    (4.6.2)

The supremum they defined would be attained when the c₂ vector had the same direction as the β₂ vector, for they made use of the Cauchy-Schwarz inequality

    |c₂'β₂| ≤ ||c₂|| ||β₂||.    (4.6.3)

They then defined a regret function as in (3.1.2),

    REG(θ,λ) = R(θ,λ) - inf_λ R(θ,λ),    (4.6.4)

and they sought the optimal value of λ, λ*, such that

    REG(0,λ*) = REG(θ₀,λ*) = sup_θ REG(θ,λ*),    (4.6.5)

where θ₀ is the point at which the regret attains its maximum over the upper range of θ.
This necessitated evaluating R(θ,λ) at the four points, in particular at θ = 0 and at θ = θ₀. But in their formulation θ = β₂'β₂/(2σ²), so that θ = 0 implies that β₂ = 0, whereas θ = θ₀ implies that β₂ = β₂⁰, where β₂⁰ is non-zero. It has been shown above that the supremum would only be attained if c₂ had the same direction as β₂, and as c₂ is fixed but β₂ is arbitrary, it is clear that the supremum cannot be attained at both θ = 0 and θ = θ₀. On the other hand, the quadratic risk function of y*, M(θ,λ), obtained in section 1.6, did not require taking the supremum but gave a value comparable with their R(θ,λ). This indicates that whereas they had tried to find an optimum λ, λ*, relating to a regret function for the β vector, or at least to linear combinations of the βᵢ, their arguments actually apply to a regret function involving the predicted value of y.

It may be tempting to set up a Sawa-type risk function in the general context of this paper, although the above comments suggest caution. A risk function, R(θ,λ), could be defined by

    R(θ,λ) = sup_β tr MSE(θ,λ) = sup_β E(β*-β)'(β*-β).    (4.6.6)

Lancaster [1969, p. 109] gives a theorem showing that, for any real k×1 vector y, with S a real symmetric matrix,

    μ₁ y'y ≤ y'Sy ≤ μ_k y'y,    (4.6.7)

and, in particular, y'y ≤ (1/μ₁) y'Sy, where μ₁, μ_k are, respectively, the smallest and largest eigenvalues of S.
It is clear that the y vector could be replaced, in turn, by the vectors (β̂-β), (β*-β) or (b-β). If expectations are taken, the inequalities remain valid. In the spirit of the risk function of Sawa and Hiromatsu quoted in section 2.3, let

    R(θ,λ) = (1/μ₁) M(θ,λ).    (4.6.8)

If the minimax regret condition of (4.6.5) is set up, it is clear that both sides of the equation are identical with the results obtained from the predicted value of y, y*, except that both sides are inflated by the factor 1/μ₁. It is clear from the argument above that the supremum over β of tr MSE(θ,λ) cannot, in fact, be simultaneously attained for two different values of θ. Any conclusions drawn from this procedure will only strictly be valid for the predicted value of y.

4.6.3 Minimax Regret Condition Based on the Mean Square Error

In the two previous sections, a risk function was defined either as a weighting, or by taking the supremum, of the mean square error of the estimated beta vector. Both of these approaches led to the same critical value as given by the predicted value of y. In this section, a critical value specific to the estimated beta vector will be obtained by taking the trace of the mean square error as the risk function. That is,

    R(θ,λ) = tr MSE(θ,λ).    (4.6.9)
In (4.3.16) the mean square error was given by

    MSE(θ,λ) = σ²S⁻¹ + σ²S⁻¹H'[HS⁻¹H']⁻¹HS⁻¹[r(θ,λ) - 1]
        + S⁻¹H'[HS⁻¹H']⁻¹(Hβ-h)(Hβ-h)'[HS⁻¹H']⁻¹HS⁻¹[1 - 2r(θ,λ) + s(θ,λ)],    (4.6.10)

so that, taking traces, with A = [HS⁻¹H']⁻¹HS⁻²H',

    R(θ,λ) = σ²{tr S⁻¹ + tr A [r(θ,λ) - 1]}
        + (Hβ-h)'A[HS⁻¹H']⁻¹(Hβ-h)[1 - 2r(θ,λ) + s(θ,λ)].    (4.6.11)

Consider, now, the ratio

    Λ = (Hβ-h)'A[HS⁻¹H']⁻¹(Hβ-h) / (Hβ-h)'[HS⁻¹H']⁻¹(Hβ-h).

It is then of the form z'Ez/z'Fz, and by a simple extension of the theorem quoted in (4.6.7) from Lancaster [1969, p. 109],

    μ₁ ≤ Λ ≤ μ_m,    (4.6.12)

where μᵢ, i = 1, 2, ..., m, are the ordered characteristic roots of EF⁻¹ = A. As (Hβ-h)'[HS⁻¹H']⁻¹(Hβ-h) = 2σ²θ, the risk function can be simplified further:

    R(θ,λ) = σ²{tr S⁻¹ + tr A (r(θ,λ) - 1) + 2θΛ[1 - 2r(θ,λ) + s(θ,λ)]}.    (4.6.13)

A regret function and minimax regret condition can now be defined in a manner similar to that of section 3.1. The analogs of (3.1.1) and (3.1.2) are, respectively,
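The bounds in (4.6.12), and the fact that tr A is the sum of the characteristic roots, can be illustrated numerically. The sketch below uses made-up S and H and numpy only; it is not part of the thesis.

```python
# Check that Lambda of (4.6.12) lies between the extreme characteristic roots
# of A = [H S^-1 H']^-1 H S^-2 H', and that tr A equals their sum.
# Hypothetical S and H for illustration.
import numpy as np

rng = np.random.default_rng(2)
k, m = 5, 3
B = rng.normal(size=(12, k))
S = B.T @ B
H = rng.normal(size=(m, k))
Si = np.linalg.inv(S)
G = H @ Si @ H.T
Gi = np.linalg.inv(G)
A = Gi @ H @ Si @ Si @ H.T

mu = np.sort(np.linalg.eigvals(A).real)   # the roots of A are real and positive here
lams = []
for _ in range(500):
    d = rng.normal(size=m)                # plays the role of H beta - h
    lams.append((d @ A @ Gi @ d) / (d @ Gi @ d))
lams = np.array(lams)

# c = tr A / (2 Lambda) of (4.6.14); with Lambda equal to the mean root, c = m/2
c_mean = np.trace(A) / (2.0 * mu.mean())
```

The ratio is a generalized Rayleigh quotient, so its range is exactly [μ₁, μ_m]; setting Λ to the arithmetic mean of the roots reproduces the Chapter 3 dividing point m/2, as noted at the end of this section.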
    inf_λ R(θ,λ) = R(θ,+∞)  for θ < c
                 = R(θ,0)   for θ ≥ c,    (4.6.14)

where

    c = tr A / (2Λ),

and

    REG(θ,λ) = R(θ,λ) - inf_λ R(θ,λ).    (4.6.15)

For θ ≥ c,

    REG(θ,λ) = σ²{tr A [r(θ,λ) - 1] + 2θΛ[1 - 2r(θ,λ) + s(θ,λ)]}.    (4.6.16)

As r(θ,λ) is a probability, and from (3.2.12) s(θ,λ) - r(θ,λ) > 0, then [1 - 2r(θ,λ) + s(θ,λ)] > 0, and

    sup_{θ≥c} REG(θ,λ) ≤ sup_{θ ≥ trA/(2μ_m)} σ²{tr A [r(θ,λ) - 1] + 2θμ_m[1 - 2r(θ,λ) + s(θ,λ)]}.    (4.6.17)

It should be noted that the supremum on the right-hand side of (4.6.17) is taken over the larger set of values of θ, [tr A/(2μ_m), +∞), which includes the set of values of θ on the left-hand side, [c, +∞), of (4.6.17).

For θ < c,

    REG(θ,λ) = σ²{tr A r(θ,λ) + 2θΛ[-2r(θ,λ) + s(θ,λ)]}.    (4.6.18)

To find the supremum of this quantity, it is necessary to know the sign of the expression E = [-2r(θ,λ) + s(θ,λ)]. As θ tends to infinity, this expression tends to -1. Values of E were computed at 3,960 points, for m in the interval [1,120], n-k in [2,120], λ in [0,3], and θ in [0, m/2].
The results indicated that E ≤ 0, and that it is monotonically decreasing with θ to -1, and monotonically increasing with λ to zero.

For the triplets of values (m, n-k, λ) considered in Chapter 5, the supremum of REG(θ,λ) for θ < c will therefore occur when Λ takes the value μ₁, the smallest characteristic root of the matrix A. Again, as c is unknown, the supremum will be taken over the larger set of θ values from 0 to tr A/(2μ₁). Thus,

    sup_{θ<c} REG(θ,λ) ≤ sup_{0≤θ<trA/(2μ₁)} σ²{tr A r(θ,λ) + 2θμ₁[-2r(θ,λ) + s(θ,λ)]}.    (4.6.19)

The minimax procedure will be to search for an optimal λ, λ₀, which minimizes the expressions in (4.6.17) and (4.6.19). Empirical results will show that as the critical value, λ, increases, then sup_{θ<c} REG(θ,λ) decreases but sup_{θ≥c} REG(θ,λ) increases. The optimal minimax regret value λ₀ will be that value which equalizes the expressions in (4.6.17) and (4.6.19). Thus, λ₀ satisfies the equation

    REG(θ_L,λ₀) = REG(θ_U,λ₀),    (4.6.20)

where θ_L, θ_U maximize the expressions on the left and right, respectively.

An important question arises as to whether it is possible to find such a θ_U which maximizes the quantity on the right of (4.6.17) while at the same time being compatible with the condition that Λ attain its maximum. Now, Lancaster [1969, p. 110] shows that Λ attains the value μ_m when Hβ-h = z_m, a characteristic vector of the matrix A corresponding to the characteristic root μ_m. This implies that

    θ = z_m'[HS⁻¹H']⁻¹z_m / (2σ²).
No restriction has yet been placed on σ², which can take values in the open interval (0,+∞), so that θ may attain the value of θ_U to maximize the quantity in (4.6.17). Similarly for (4.6.19), Λ attains the value μ₁ when Hβ-h = z₁ and

    θ = z₁'[HS⁻¹H']⁻¹z₁ / (2σ²).    (4.6.21)

For all values of σ², it would be possible to find a θ_L which satisfies both (4.6.19) and (4.6.21).

It is instructive to notice a connection between the regret condition based on the estimated beta vector and the regret condition of Chapter 3 based on the predicted value of y. Now the trace of a matrix is the sum of its characteristic roots, that is, tr A = Σ_{i=1}^m μᵢ. If Λ is taken to be the arithmetic mean of the μᵢ, that is, Λ = (1/m) Σ_{i=1}^m μᵢ, it is clear that the value of c = tr A/(2Λ) = m/2, and the regret functions of (4.6.16) and (4.6.18) are then equivalent to the regret functions based on the predicted value of y. This is intuitively appealing, as basing a risk function on the predicted value of y is felt to be an averaging process.
The minimax regret condition of this section is illustrated by Figure 4.1. The risk function, R(θ,λ), as found in the expression (4.6.13), contains the unknown variable Λ, which can take on any value between μ₁ and μ_m. For a given value of Λ, the graph of R(θ,λ) lies between the graphs of R(θ,λ) for μ₁ and μ_m, as illustrated by Figure 4.1(a).

[Figure 4.1, panels (a) through (c), plots the risk and regret functions against θ, with the dividing points θ = tr A/(2μ_m), θ = tr A/(2Λ) and θ = tr A/(2μ₁) marked.]

Figure 4.1. The regret condition for the estimated beta vector
In Chapter 3, for tr A/(2Λ) = m/2, the aim was to find a value of the critical point, λ, which equalized the distances δ_L and δ_U. For unknown Λ these distances cannot be evaluated, but in Figure 4.1(b) and (c) it can be seen that these distances are maximized by the distances Δ_L and Δ_U. The minimax regret criterion minimizes these maximum deviations in mean square error, and this occurs when Δ_L = Δ_U.
5. RESULTS AND CONCLUSIONS

5.1 Computer Procedure Followed to Obtain Critical Values Based on the Predicted Value of y

It was shown in section 3.1 that the optimal critical value, λ*, of the prior test of estimation satisfies equation (3.1.4), viz.,

    REG(θ_L,λ*) = sup_{θ<m/2} REG(θ,λ*) = sup_{θ≥m/2} REG(θ,λ*) = REG(θ_U,λ*),    (5.1.1)

where θ_L, θ_U are the values of θ which maximize the quantities on the left and right, respectively.

The procedure followed was to give values to m, the numerator degrees of freedom, n-k, the denominator degrees of freedom, and λ, the critical value of the prior F test.

To find θ_U, a computer search is carried out by increasing θ in small increments from a starting value of m/2. The slope of the quadratic risk function of y*, M(θ,λ), is calculated at each θ value. In this way, a small interval, I, along the θ axis is located such that the slope of M(θ,λ) changes from positive to negative in the interval I. By successively reducing the size of I, the optimal value for θ, θ_U, is obtained. In practice, this value is assumed to be reached when successive iterations differ by less than 10⁻⁴.

In a similar way, the optimal value, θ_L, is obtained for θ < m/2. Denote the left- and right-hand sides of (5.1.1) by δ_L and δ_U, respectively. If these differ by less than 10⁻⁷, then the current value of λ is taken to be the optimal value, λ*. Otherwise, λ is increased or decreased according to whether δ_L is less than, or more than, δ_U,
respectively. That this procedure leads to an optimal value of λ confirms that, for increasing λ, REG(θ_L,λ) decreases but REG(θ_U,λ) increases.

As a check that the maximum regret for θ ≥ m/2 occurs at θ_U, the slope of the regret function is checked for each λ at twenty points between θ_U and 20m. For the optimal critical point, λ*, θ_U is found to be less than 2m, so that checking that the slope is negative for values of θ up to 20m provides strong confirmation that the maximum regret occurs at θ_U.

Sawa and Hiromatsu [1971] gave values of λ* for m=1. Their values agreed closely with those of Table 5.1, except that their values were generally lower by 0.001. These authors, however, assumed that the supremum of the regret function for θ < 1/2 occurred at θ = 0. While this is true for the case of m=1, it does not hold for larger values of m. Their assumption actually leads to smaller values of λ*, particularly for large denominator degrees of freedom.

Two subroutines were used for the non-central F. Both of these were written by Mr. James Goodnight for generating the tables in Goodnight and Wallace.⁶ One of these subroutines uses a quick approximation by means of a central F, while the other uses an iterative procedure until a given required accuracy is obtained.

⁶James Goodnight and T. D. Wallace, N. C. State University, Raleigh, N. C., Operational techniques and tables for making weak MSE tests for restrictions in regressions, to be published in Econometrica.
The approximate subroutine was used to generate optimal values of λ* for the numerator and denominator degrees of freedom listed in Table 5.1. The results were checked with the more accurate subroutine for m = 1, 2, 4, 8, and the discrepancies between the values of λ* obtained by the two subroutines were less than 5.5 percent, 2 percent, 0.55 percent, and 0.13 percent for m = 1, 2, 4, and 8, respectively. As the discrepancies are small for higher values of m, the values listed in Table 5.1 were obtained from the accurate subroutine for m = 1, 2, 4, 8 and from the fast approximation for the remainder of the table.

Certain monotone properties of θ_L, θ_U and λ* emerged which were used to limit the area of search when the slower, more accurate subroutine was used to generate the final tables of results for m = 1, 2, 4, 8. These properties are detailed in the next section.

5.2 Results Based on the Predicted Value of y

From Table 5.1, it is clear that the outstanding property of the optimal critical value, λ*, is that it does not vary much from the value 2. When m, the numerator degrees of freedom, is 1, λ* only decreases by about 0.1 as the denominator degrees of freedom increase from 2 to 120. For large m, this decrease is more pronounced, and when m = 120, λ* decreases by about 0.3 as the denominator degrees of freedom increase from 2 to 120. For each value of m, λ* decreases with the denominator degrees of freedom, which helps considerably in the search technique, for when the denominator degrees of freedom are increased, the previous value of λ* is used as an upper limit for the starting interval. For a given value of n-k, λ* increases with m, the numerator degrees of freedom.
Table 5.1. Optimal values for the predicted value of y

[Columns: n-k, λ*, α(F), θ_L, θ_U, tabulated in blocks for m = 1, 2, 4, 8, 16, 24, 60, 120 and n-k = 2, 4, 8, 16, 24, 60, 120.]

m = number of restrictions
n = number of points in the sample space
k = number of parameters
λ* = optimal critical value of the prior test of estimation
θ_L, θ_U = optimal lower and upper theta values
α(F) = percentage alpha level for the central F

*These values were computed using the approximate subroutine.
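The search of section 5.1 can be sketched in a few lines. The implementation below is illustrative and is not the thesis program: it replaces the slope-tracking iteration with grid maxima and a bisection on λ, uses the regret expressions of (3.1.6), and assumes scipy's noncentral F with noncentrality 2θ.

```python
# Bisection for the minimax-regret critical value lam* of (3.1.4): equalize
# sup REG over theta < m/2 with sup REG over theta >= m/2 (sigma^2 = 1).
# Illustrative sketch; grid search replaces the thesis's slope iteration.
import numpy as np
from scipy.stats import ncf

def regrets(lam, m, nk):
    def pr(th, p):   # P{chi'2(p; 2 th) >= (m lam/(n-k)) chi2(n-k)}
        return ncf(p, nk, 2.0 * th).sf(m * lam / p)
    th_lo = np.linspace(0.0, m / 2.0, 300, endpoint=False)
    th_hi = np.linspace(m / 2.0, 20.0 * m, 3000)
    # branches of (3.1.6): m r + 2 th (s - 2r)  and  m r - m + 2 th (1 - 2r + s)
    reg_lo = m * pr(th_lo, m + 2) + 2 * th_lo * (pr(th_lo, m + 4) - 2 * pr(th_lo, m + 2))
    reg_hi = m * pr(th_hi, m + 2) - m + 2 * th_hi * (1 - 2 * pr(th_hi, m + 2) + pr(th_hi, m + 4))
    return reg_lo.max(), reg_hi.max()

def lam_star(m, nk, lo=0.5, hi=5.0, iters=40):
    # delta_L falls and delta_U rises in lam, so their difference changes sign once
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        d_L, d_U = regrets(mid, m, nk)
        lo, hi = (mid, hi) if d_L > d_U else (lo, mid)
    return 0.5 * (lo + hi)

ls = lam_star(1, 24)
print(ls)   # Table 5.1 reports optimal critical values near 2 throughout
```

The bisection exploits exactly the monotone behaviour reported in section 5.1: as λ increases, δ_L decreases and δ_U increases, so a sign change of δ_L - δ_U brackets λ*.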
Although the optimal values of the critical point, λ*, of the prior test of estimation fluctuate only slightly over the whole range of numerator and denominator degrees of freedom, the corresponding alpha levels vary considerably, as shown by Table 5.1. For denominator degrees of freedom of at least four, the percentage alpha level decreases monotonically with numerator and denominator degrees of freedom. This reflects the fact that the significance points of a central F decrease monotonically with numerator degrees of freedom, provided that the denominator degrees of freedom are at least four.

If a prior test of estimation is performed with alpha at the 5 percent level, then the unrestricted least squares estimate will be chosen with probability of 5 percent. Clearly, from Table 5.1, the minimax regret function, as compared to a 5 percent F test, generally increases the probability that the unrestricted least squares estimate will be chosen, and in many cases this probability is greater than 15 percent. For large m and n-k, however, the alpha level decreases until it is only 0.1 percent for m and n-k both equal to 120.
Table 5.1 also gives optimum values, θ_L and θ_U, of the non-centrality parameter. As the numerator degrees of freedom, m, increase, both θ_L and θ_U increase. As the denominator degrees of freedom, n-k, increase, θ_U decreases. For small m and n-k, θ_L is zero. This occurs as r(θ,λ) and s(θ,λ) are relatively large for small values of m and n-k. This would be expected from the definition of r(θ,λ) as found in (3.2.4), and from the fact that, for a given alpha level, the critical points of a non-central F distribution are monotonically decreasing with numerator and denominator degrees of freedom, as can be seen from the tables in
Goodnight and Wallace.⁷

When r(θ,λ) and s(θ,λ) are large, it is found that [-2r(θ,λ) + s(θ,λ)] is large and negative, so that the maximum of the regret function from the left-hand side of (3.1.6) occurs when θ_L is zero. For large degrees of freedom in the numerator and denominator, θ_U approaches, from above, half the numerator degrees of freedom. This can be seen in Figure 5.1 as the distance δ_U moves to the point θ = m/2 as m increases from 2 to 120. It should be noted that these graphs are scaled so that both the quadratic risk function and the values of theta are expressed in terms of m. It can be seen also that as m increases the shape of the graph changes, so that the maximum regret, δ_L, for θ < m/2 occurs at θ = 0 for m = 2 but moves away from zero for m = 8 and m = 120.

⁷See footnote 6.
Computer Procedure for the Case of the
EstimatedBet~Vector
The procedure followed to obtain optimum values based on the beta vector was similar to that used for the predicted value of y. The only difference was that, instead of equalizing the distances θ_L and θ_U of Figure 3.1, the aim was to equalize the distances μ_L and μ_U of Figure 4.1. As μ_L depends on the smallest root of A, μ₁, and μ_U depends on the largest root, μ_m, extensive tables would be needed to cover the possible values that μ₁ and μ_m could take.
For a given set of data, S and H would be fixed and known, so that the matrix

    A = [HS⁻¹H′]⁻¹HS⁻²H′

could be evaluated and the roots μ₁ and μ_m obtained. Denote μᵢ/trA by μᵢ*. For the case of the predicted value of y, each of the μᵢ* equals 1/m.

⁷See footnote 6.
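As an illustration, the matrix A and the normalized roots μᵢ* can be computed directly. The matrices S and H below are small hypothetical 2 x 2 examples chosen only for the sketch, not data from this study; for the 2 x 2 case the inverse and the characteristic roots have simple closed forms.

```python
# Sketch: form A = (H S^-1 H')^-1 (H S^-2 H') for a hypothetical 2x2 case
# and normalize its characteristic roots by tr A.  S and H are made-up
# illustrative matrices, not data from the study.

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def t2(M):
    """Transpose of a 2x2 matrix."""
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]

S = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical design matrix X'X
H = [[1.0, 1.0], [0.0, 1.0]]   # hypothetical restriction matrix

S1 = inv2(S)                   # S^-1
S2 = mul2(S1, S1)              # S^-2
A = mul2(inv2(mul2(mul2(H, S1), t2(H))), mul2(mul2(H, S2), t2(H)))

# Characteristic roots of a 2x2 matrix from its trace and determinant.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = (tr * tr - 4 * det) ** 0.5
mu_1, mu_m = (tr - disc) / 2, (tr + disc) / 2   # smallest, largest roots

mu_star = [mu_1 / tr, mu_m / tr]               # mu_i* = mu_i / tr A
print(mu_star)
```

By construction the μᵢ* sum to unity, with the smallest at most 1/m and the largest at least 1/m (here m = 2).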
Figure 5.1. The quadratic risk function of y for optimal critical values (QRF plotted against θ, with the vertical scale marked at σ²(k-m) and the horizontal scale at m/2)

For the estimated beta vector, 0 < μ₁* ≤ 1/m and 1/m ≤ μ_m* < 1. As the μᵢ* sum to unity, for a given value of μ_m* the smallest root satisfies the more restrictive inequality

    0 < μ₁* ≤ (1 - μ_m*)/(m - 1) .

In the next section, optimal values will be found for m = 4 and n-k = 8 and 60, and for selected values of μ₁* and μ₄* satisfying the conditions

    0 < μ₁* ≤ 0.25   and   (1 - μ₁*)/3 ≤ μ₄* ≤ 1 - 3μ₁* .

5.4 Results Based on the Estimated Beta Vector
From Table 5.2, it can be seen that as μ₁, the smallest characteristic root of A, decreases, or, in other words, as μ₁* decreases, the optimal critical value, λ₀, increases. This would be expected from (4.6.19), for [1-2r(θ,λ) + s(θ,λ)] being negative implies that the regret for θ < c increases as μ₁ decreases. This means that a larger value of λ must be chosen to reduce this regret so that it equals the regret for θ > c. To illustrate that λ₀ increases as μ₁* decreases, consider the triplets of values (μ₁*, μ₄*, λ₀) = (0.20, 0.30, 1.841) and (0.10, 0.30, 1.937). Consider the regret for θ ≥ c, as found in (4.6.17). This involves the expression 2θμ_m[1-2r(θ,λ) + s(θ,λ)], and as the term in brackets is non-negative, an increase in μ_m implies an increase in the regret, and a smaller value of λ must be chosen to equalize this regret with the regret for θ < c. Thus, an increase in μ_m implies a decrease in λ₀.
Table 5.2. Optimal values for the estimated beta vector*

m=4, n-k=8

  μ₁*    μ₄*     λ₀     α(F)    θ_L    θ_U
  0.25   0.25   1.997   18.7    0.4    4.6
  0.20   0.30   1.841   21.4    0.5    4.2
         0.35   1.706   24.0    0.4    3.9
         0.40   1.599   26.4    0.3    3.6
  0.15   0.35   1.730   23.5    0.7    3.9
         0.45   1.526   28.2    0.5    3.4
         0.55   1.384   32.1    0.4    3.1
  0.10   0.30   1.931   19.7    1.4    4.3
         0.40   1.663   25.0    1.2    3.7
         0.50   1.481   29.4    1.0    3.3
         0.60   1.351   33.1    0.8    3.0
         0.70   1.251   36.3    0.7    2.8
  0.05   0.35   1.930   19.8    2.8    4.2
         0.45   1.672   24.8    2.4    3.6
         0.55   1.495   29.0    2.2    3.2
         0.65   1.365   32.7    1.9    2.9
         0.75   1.264   35.9    1.8    2.7
         0.85   1.184   38.7    1.6    2.6

m=4, n-k=60

  μ₁*    μ₄*     λ₀     α(F)    θ_L    θ_U
  0.25   0.25   1.951   11.2    0.7    3.9
  0.20   0.30   1.822   13.5    0.8    3.5
         0.35   1.698   16.1    0.7    3.2
         0.40   1.599   18.5    0.6    3.0
  0.15   0.35   1.734   15.3    1.0    3.1
         0.45   1.543   20.0    0.8    2.9
         0.55   1.410   24.0    0.7    2.6
  0.10   0.30   1.947   11.3    1.8    3.7
         0.40   1.688   16.3    1.5    3.1
         0.50   1.516   20.7    1.3    2.8
         0.60   1.391   24.7    1.1    2.5
         0.70   1.296   28.1    1.0    2.4
  0.05   0.35   1.968   10.9    3.1    3.6
         0.45   1.718   15.6    2.7    3.1
         0.55   1.546   19.9    2.4    2.8
         0.65   1.419   23.7    2.2    2.5
         0.75   1.321   27.1    2.0    2.4
         0.85   1.242   30.2    1.8    2.2

m = number of restrictions
n = number of points in the sample space
k = number of parameters
λ₀ = critical value of the prior test of estimation
θ_L, θ_U = optimal lower and upper theta values
α(F) = percentage alpha level for the central F
μᵢ* = μᵢ/trA, i = 1,...,4, where μᵢ are the ordered roots of the matrix A

*These values were computed from the approximate subroutine.
An example of this can be seen in the table: when μ₁* = 0.10, as μ₄* increases from 0.30 to 0.70, λ₀ decreases from 1.937 to 1.251. It seems clear from the table that the value of μ₄* has more influence than μ₁* on the value of λ₀. Consider m = 4, n-k = 60, and μ₄* = 0.35. As μ₁* increases threefold from 0.05 to 0.15, λ₀ only decreases from 1.968 to 1.734. When μ₁* = 0.10, however, if μ₄* is doubled from 0.30 to 0.60, λ₀ decreases considerably from 1.947 to 1.391.
5.5 Percentage Alpha Levels if λ* Is a Critical Point of the Non-Central F

Wallace⁸ proposed other criteria for linear restrictions in regression based on whether the mean square error of the restricted estimator was less than the mean square error of the unrestricted estimator. His "Strong MSE" criterion assumed that the test statistic, u, is distributed as a non-central F with the non-centrality parameter equal to 1/2. On the other hand, his "Weak MSE" criterion assumes that u ~ F′(m/2). One motivation for using these criteria is that they increase the probability of choosing the restricted estimators over the ordinary F test, as the restricted estimators have smaller variance even though they are biased under the alternate hypothesis. In section 5.2, it was noted that the minimax regret function in general chooses the unrestricted estimators with higher probability than the usual 5 percent level F test. The percentage alpha levels in Tables 5.3 and 5.4 show the percentage probability of choosing the unrestricted estimator assuming u is distributed as a non-central F with 1/2 or m/2 as non-centrality parameter. These tables re-emphasize that the Wallace
Table 5.3. Alpha percent level if λ* is from the non-central F with non-centrality = 1/2ᵃ

  m\n-k     1      2      4      8     16     24     60    120
    2     45.5   42.4   39.7   37.8   36.2   35.7   35.2   35.0
    4     41.5   36.7   32.2   28.8   26.3   25.5   24.4   24.0
    8     39.2   33.2   26.9   21.3   17.5   16.1   14.3   13.7
   16     38.0   31.2   23.3   15.8   10.7    8.8    6.4    5.6
   24     37.6   30.5   21.8   13.6    7.9    5.8    3.4    2.6
   60     37.1   29.6   20.0   10.4    4.2    2.3    0.5    0.2
  120     37.0   29.3   19.3    9.3    3.0    1.3    0.1    0.1

ᵃThe table gives 100α = 100 ∫_{λ*}^{∞} g(F′(1/2)) dF′(1/2).
criteria increase the probability of choosing restricted estimators except when the non-centrality is 1/2 and m and n are large.
5.6 Efficiency of the Estimators Based on the "Optimal" Critical Values
The quadratic risk function for the predicted value of the dependent variable has been considered in some detail in Chapter 3, and in section 5.2, graphs of the quadratic risk function were displayed for three values of m, the number of restrictions on the parameter space. Another way of considering the properties of the quadratic risk function is afforded by the concept of relative efficiency.

In Figure 5.2, graphs are drawn for three values of m for the efficiency of the estimator Xβ* relative to the least squares estimator, Xb. This relative efficiency will be designated by R.E.(λ*, 0) and defined by

    R.E.(λ*, 0) = (q.r.f. for λ = λ*)/(q.r.f. for λ = 0) ,

where λ* is the optimal value of λ obtained by the minimax regret condition. When the non-centrality parameter, θ, is zero, then R.E.(λ*, 0) takes on the value

    k/[k - m + m r(0,λ)] ,   where r(0,λ) = Pr{F(m+2, n-k) ≤ [m/(m+2)]λ} .

From standard F tables, it is found that this probability decreases as m increases. This is reflected in the graphs in that, for θ = 0, R.E.(λ*, 0) is large when m = 120 and decreases as m decreases.

The graphs are scaled as the non-centrality parameter, θ, is expressed as a multiple of the number of restrictions, m. Even with this scaling factor, the relative efficiency with m large is everywhere greater than the relative efficiency for small m. In fact,
71
106·
RE
104
1.2
1.0 r--\1"-'~\r-------:...--:::;;:::-------------=======--
~m=120
m
0 8
=2
0
n=k
m
"2
Figure 5 2.
0
•
m
3m
"2
2m
= 120
e
Efficiency of the estimator XS * relative to the least
squares estimator~ Xb
72
for m = 120, the relative efficiency only falls below 1 for a relatively small interval of θ. Of course, the shape of the curve of R.E.(λ*, 0) will alter as the number of parameters, k, is altered. This can be compared with the graphs of the quadratic risk function as in Figure 5.1, where altering the value of k will only effect a translation of the curve, but its shape will be unaffected. As k increases, both numerator and denominator of R.E.(λ*, 0) increase, so that the value of R.E.(λ*, 0) becomes closer to unity for all values of θ, or, in other words, the curves tend to collapse towards the line R.E.(λ*, 0) = 1.
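The collapse towards unity can be checked numerically from the θ = 0 expression k/[k - m + m r(0,λ)]. In the sketch below the value r is only a hypothetical stand-in for the probability r(0,λ); the qualitative behavior in k does not depend on the particular value chosen.

```python
# Sketch: R.E.(lambda*, 0) at theta = 0 as a function of k, using the
# closed form k / (k - m + m*r).  The value r below is a hypothetical
# stand-in for r(0, lambda), not a value computed from F tables.

def re_at_zero(k, m, r):
    """Relative efficiency k / (k - m + m r) at theta = 0."""
    return k / (k - m + m * r)

m, r = 2, 0.6          # m restrictions; hypothetical r(0, lambda)
values = {k: re_at_zero(k, m, r) for k in (4, 10, 100)}
print(values)
```

Since 0 < r < 1 the denominator is smaller than k, so R.E.(λ*, 0) exceeds unity and approaches it from above as k grows, which is the collapse described above.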
In Figure 5.3, the efficiency of the estimator using the optimal critical value, λ*, relative to that of the 5 percent F value is plotted for the same three values of m. This could be designated as R.E.(λ*, λ₀.₀₅). The curves reflect the fact that, except for large values of m and n, the critical values λ* are smaller than the 5 percent F values. Consequently, the q.r.f. using λ* is larger than the q.r.f. using λ₀.₀₅ when θ is small, but for larger θ, and certainly when θ is greater than m/2, the opposite is true. Again, as k, the number of parameters, increases, R.E.(λ*, λ₀.₀₅) becomes closer to unity.
5.7 Maximizing the Minimum Relative Efficiency
In the previous section, the efficiency of the estimator, Xβ*, was compared with the ordinary least squares estimator and the traditional two-stage estimator at the 5 percent level. A more interesting measure is the efficiency relative to the "best" estimator, which is the restricted estimator, Xβ̂, when θ is in the interval [0, m/2], and the least squares estimator, Xb, when θ is in (m/2, +∞).

Figure 5.3. Efficiency of the estimator Xβ* relative to the estimator based on the 5 percent F value (R.E. plotted against θ at m/2, m and 2m, with n-k = 120)

As the quadratic risk functions for Xβ*, Xb and Xβ̂ are given respectively by M(θ,λ), M(θ,0) = σ²k and M(θ,+∞), the relative efficiency is then defined by

    R.E.(θ,λ) = M(θ,+∞)/M(θ,λ)   when θ is in [0, m/2]
                                                               (5.7.1)
              = M(θ,0)/M(θ,λ)    when θ is in (m/2, +∞) .

In a similar approach to that of the minimax
regret function of Chapter 3, a maximin regret function based on this relative efficiency could be used to determine an optimal value of λ, the critical value of the prior test of estimation. This maximin approach would be to find the critical value, λ₀, which satisfies the condition

    inf_{θ ≤ m/2} R.E.(θ, λ₀) = inf_{θ > m/2} R.E.(θ, λ₀)          (5.7.2)

where λ₀ is in the interval [0, +∞). It can be seen from Table 5.5 and Figure 5.4 that this criterion leads to a slightly higher critical value and lower alpha level than the minimax criterion. As the total number of parameters, k, increases, however, the critical value decreases slightly.
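A condition such as (5.7.2) can be solved by a one-dimensional search, since raising λ lowers one infimum and raises the other. The two curves below are toy monotone stand-ins for the infima of R.E.(θ,λ) over the two θ-intervals, not the actual relative-efficiency functions of this study; the bisection itself is the point of the sketch.

```python
# Sketch: find lambda_0 equalizing two monotone curves by bisection,
# in the spirit of condition (5.7.2).  f_left and f_right are toy
# stand-ins for inf R.E. over theta <= m/2 and theta > m/2; they are
# not the study's actual relative-efficiency curves.

def f_left(lam):
    """Toy stand-in: falls as lambda increases."""
    return 1.0 / (1.0 + 0.3 * lam)

def f_right(lam):
    """Toy stand-in: rises as lambda increases."""
    return lam / (lam + 2.0)

def equalize(lo=0.0, hi=10.0, tol=1e-10):
    """Bisect on the sign of f_left - f_right."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_left(mid) > f_right(mid):
            lo = mid          # left curve still above: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam_0 = equalize()
print(lam_0)
```

With real risk functions the same search applies unchanged, provided the two infima are monotone in λ in opposite directions.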
Figure 5.4. Efficiency of Xβ* relative to the "best" estimator (R.E. plotted against θ from 1.0 to 3.0 for m = 2, n = 60, with the optimal value of θ marked)
Table 5.5. Critical values obtained by maximizing the minimum relative efficiency of the estimator, Xβ*, when the number of restrictions is two

  n-k    k     λ*     θ_U    α(F)   α(F′(1/2))   α(F′(m/2))
    2    4   3.079    4.7    24.5      33.2         40.9
         6   2.647    4.3    27.4      36.7         44.9
         8   2.487    4.1    28.7      38.2         46.6
    4    4   2.782    3.7    17.4      27.2         36.2
         6   2.452    3.4    20.1      30.7         40.2
         8   2.327    3.3    21.3      32.2         42.0
    8    4   2.606    3.1    13.3      23.6         33.5
         6   2.336    2.9    15.7      27.0         37.5
         8   2.231    2.9    16.8      28.5         39.2
   16    4   2.508    2.9    11.1      21.6         32.0
         6   2.270    2.7    13.4      24.9         36.0
         8   2.176    2.6    14.4      26.4         37.7
   24    4   2.473    2.8    10.3      21.0         31.6
         6   2.246    2.6    12.5      24.2         35.5
         8   2.157    2.6    13.5      25.6         37.2
   60    4   2.429    2.7     9.4      20.2         31.0
         6   2.217    2.5    11.5      23.3         34.9
         8   2.132    2.5    12.5      24.7         36.5
  120    4   2.415    2.6     9.1      19.9         30.8
         6   2.207    2.5    11.2      23.0         34.7
         8   2.124    2.4    12.1      24.4         36.3

n = number of points in the sample space
k = number of parameters
λ* = critical value for the prior test of estimation
θ_U = optimal theta value
α(F) = percentage alpha level for the central F
α(F′(1/2)) = percentage alpha level for the non-central F with non-centrality = 1/2
α(F′(m/2)) = percentage alpha level for the non-central F with non-centrality = m/2 = 1
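For m = 2 the α(F) column can be checked directly, since the upper-tail area of a central F(2, ν) has the closed form (1 + 2λ/ν)^(-ν/2). The sketch below reproduces two entries of Table 5.5 to about a tenth of a percentage point (the tabulated values themselves came from an approximate subroutine).

```python
# Sketch: central F(2, nu) upper-tail area in closed form,
# alpha = (1 + 2*lambda/nu)**(-nu/2), used to check the alpha(F)
# column of Table 5.5 for m = 2 restrictions.

def alpha_f2(lam, nu):
    """P(F(2, nu) > lam) as a percentage."""
    return 100.0 * (1.0 + 2.0 * lam / nu) ** (-nu / 2.0)

# (n-k, lambda*, tabulated alpha(F)) triples from Table 5.5
checks = [(2, 3.079, 24.5), (4, 2.782, 17.4)]
for nu, lam, tabulated in checks:
    print(nu, round(alpha_f2(lam, nu), 1), tabulated)
```

The closed form is exact only for numerator degrees of freedom equal to two; the other tables in this chapter require the incomplete beta function or non-central F probabilities.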
The lower optimal value of θ, θ_L, was found to be zero for each of the three cases considered. The upper optimal value of θ, θ_U, is the same value, for a given λ, which maximizes the regret function of Chapter 3.
5.8 Conclusions
In the context of a general linear model with linear restrictions,
the traditional prior test of estimation leads to a choice between the
restricted or unrestricted estimators.
The resulting composite estimator
is biased.
This study has led to explicit expressions for the bias function of the composite estimator, Xβ*, in the case of estimating the predicted value of the dependent variable, y. As expected, the value of this bias function is found to be always less in absolute value than that of the restricted estimator, while the unrestricted estimator is, of course, unbiased.
Explicit expressions were also obtained for the quadratic risk function for the predicted value of y. This risk function was shown to reach a maximum value when the non-centrality parameter, θ, takes the value θ_U, dependent on the numerator and denominator degrees of freedom and the critical value, λ, of the prior test. Except for some interval of θ between one-fourth of the numerator degrees of freedom and θ_U, the graph of the quadratic risk function lies between that of the restricted and unrestricted estimators.
A minimax regret function based on the quadratic risk function of Xβ* was then proposed to determine the optimum value of the critical value, λ = λ*. This value, λ*, was found to be invariant under orthogonal transformations of the sample, parameter and restriction spaces. This is to be expected, as the bias and quadratic risk function of Xβ* are invariant under these transformations.
From the empirical results, it is clear that the minimax regret function is generally more conservative in choosing the biased restricted estimator than the traditional F test at (say) the 5 percent level. This may be expected, as no prior information is assumed on the distribution of the non-centrality parameter, and large values of this non-centrality parameter imply large values of the bias and, hence, of the quadratic risk of the restricted estimator.
Instead of the minimax criterion to set the value of λ, a similar criterion could be used based on the efficiency of the estimator relative to the "best" estimator, which is the ordinary least squares estimate when the non-centrality parameter θ is in the interval (m/2, +∞), but is the restricted estimator for θ in the interval [0, m/2]. The value of λ, λ₀, is sought which maximizes the minimum efficiency of the estimator. In general, this procedure leads to somewhat larger values of λ, and smaller alpha levels, than does the minimax procedure. The optimum values, λ₀, in this case do depend on k, the total number of parameters, and decrease as k increases.
Attention is then focused on the estimate, β*, of the beta vector, and expressions are found for the bias and the mean square error of β*. It is found that these expressions are not invariant to orthogonal transformations, so that it would not be possible to find an optimum value of λ which did not depend on the design matrix S = X′X or the matrix H which determines the restrictions on the beta vector. In a particular case, when S and H are known, an optimum value of λ, λ₀, can be found, and it depends on the smallest, μ₁, and the largest, μ_m, characteristic roots of the matrix A = [HS⁻¹H′]⁻¹HS⁻²H′. The empirical results show that λ₀ decreases as μ_m or as μ₁ increases.
On the other hand, it is shown that the minimax condition for the predicted value of y depends on an averaging process and involves a constant factor, the reciprocal of the number of restrictions.

The main value of the minimax regret condition is that it does not depend on an arbitrary level of the prior F test. In the absence of other information about the non-centrality parameter, and hence of the proposed restrictions, it leads to an estimator with a small quadratic risk function over the whole range of the non-centrality parameter.
6. LIST OF REFERENCES
Ashar, V. G. 1970. On the use of preliminary tests in regression analysis. Unpublished Ph.D. thesis, Department of Statistics, North Carolina State University at Raleigh. University Microfilms, Ann Arbor, Michigan.

Bartle, R. G. 1964. The Elements of Real Analysis. John Wiley and Sons, Inc., New York, New York.

Goldberger, A. S. 1964. Econometric Theory. John Wiley and Sons, Inc., New York, New York.

Graybill, F. A. 1961. An Introduction to Linear Statistical Models. McGraw-Hill Book Company, New York, New York.

Gun, A. 1965. The use of a preliminary test for interactions in the estimation of factorial means. Unpublished Ph.D. thesis, Department of Statistics, North Carolina State University at Raleigh. University Microfilms, Ann Arbor, Michigan.

Johnson, N. L. and S. Kotz. 1970. Continuous Univariate Distributions. Houghton Mifflin, New York, New York.

Jordan, K. 1962. Calculus of Finite Differences. Chelsea Publishing Company, New York, New York.

Kempthorne, O. 1967. The Design and Analysis of Experiments. John Wiley and Sons, Inc., New York, New York.

Lancaster, P. 1969. Theory of Matrices. Academic Press, New York, New York.

Larson, H. J. and T. A. Bancroft. 1963. Biases in prediction by regression for certain incompletely specified models. Biometrika 50:391-402.

Rao, C. R. 1965. Linear Statistical Inference and Its Applications. John Wiley and Sons, Inc., New York, New York.

Sawa, T. and T. Hiromatsu. 1971. Minimax regret significance points for a preliminary test in regression analysis. Technical Report No. 39, The Economic Series, Stanford University, Stanford, California.

Theil, H. 1971. Principles of Econometrics. John Wiley and Sons, Inc., New York, New York.

Theil, H. and A. S. Goldberger. 1961. On pure and statistical estimation in economics. International Economic Review 2(1):102-126.

Toro-Vizcarrondo, C. E. and T. D. Wallace. 1968. A test of the mean square error criterion for restrictions in linear regression. Journal of the American Statistical Association 63:558-572.

Wilks, S. S. 1943. Mathematical Statistics. Princeton University Press, Princeton, New Jersey.