Smith, W.L.; (1975)On the asymptotic behavior of certain sums of probabilities."

ESTIMATION ON RESTRICTED PARAMETER SPACES
by
William Franklin Watson, Jr.
Institute of Statistics
Mimeograph Series No. 1026
Raleigh, N. C.
ABSTRACT
WATSON, WILLIAM
Spaces.
FRANKLIN~
JR.
Esti.mation on Restricted Parameter
(Under the direction of HUBERTUS ROBERT VAN DER VAART and
BENEE FRANK SWINDEL.)
The problem of finding point estimates of parameters when the
feasible parameter space is a proper and convex subset of Euclidean
m~space
waS studied.
The algorithms of maximum likelihood estimation
for the parameters of linear models, restricted in such a manner, were
reviewed for the case in which the elements of the error vector have
a normal distribution.
These estimators were shown to be biased, to
possess a type ,of consistency, and, in the univariate case, to have a
mean square error no larger than the unrestricted maximum likelihood
estimator.
Also, these estimators were shown to map all unrestricted
estimates which are not in the feasible parameter space to the boundary
of the feasible parameter space.
It would be difficult to believe
that the parameter is on the boundary so often.
The Bayesian estimators, the median and mean of the posterior
distribution, were shown to have different unpleasant properties when
the parameter space is a proper, convex subset in Euclidean m-space,
The median of the posterior distribution
W3.S
found to take on
points on the boundary of the feasible parameter space only if a
supporting hyperplane of the posterior contained at least half of the
probability mass of the posterior distribution.
S i.mi.larly, the mean
of the posterior distributi.on would never take on Some of the points
in the feasible parameter space as estimates unless the posterior
distribution tended to a dengenerate distribution at these points for
Some point in the sample space.
However. the mean of the univariate and a bivariate truncated
normal posterior distribution, were, shown to take on every point in
the support of the posterior for some value of the random variable.
Assuming the prior density to be proportional to either a uniform,
exponential, or truncated normal density over the feasible space, zero
elsewhere, lead to a truncated normal posterior when the random
variable was distributed normally.
A detailed examination was made of the estimators for the mean
parameter of a univariate normal distribution for the situation in
which the parameter; was known to be contained in a half-line.
Neither the mean of appropriate truncated normal posteriors using
any of the priors mentioned above nor the restricted maximum likelihood estimators had uniformly smaller mean square error over the
feasible parameter space.
The regret function was then introduced
and was defined to be the difference in the mean square error of an
estimator at a point in parameter space and the smallest mean square
error of the candidate estimators of that point.
The strategy chosen
waS to find an estimator which would minimize, among the candidate
estimators, the maximum regret over the sample space.
Joined
estimation procedures were proposed, in which the mean of a posterior
(exponential prior) was used over a portion of the sample space and
maximum likelihood procedures were used over the remainder of the
sample space.
An optimal joined estimator was found to give an 18%
reduction in maximum regret over the best of the classical estimators.
To extend the technique) opti.mal Bayesian estimators of this type were
found for se.veral subsets of the sample space.
The resulting
estimator gave a 48% reduction in the maximum regret over what was
found for the best of the classical estimators.
found for a bivariate example.
Similar results were
ESTIMATION ON RESTRICTED
PARAMETER SPACES
by
WILLIAM FRANKLIN WATSON JR.
A thesis submitted to the Graduate Faculty of
North Carolina State University at Raleigh
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
DEPARTMENTS OF FORESTRY AND STATISTICS
RALEIGH
1 9 7 4
APPROVED BY:
Co-Chairman of Advisory Committee
Co-Chairman of Advisory Committee
BIOGRAPHY
William F. Watson Jr. was born September 18, 194.5, i.n Tifton,
Georgia, and was raised in the farming community of Eldorado which is
near Tifton.
He received his elementary and secondary education in
the Tift County, Georgia, school system and was graduated from Tift
County High School in 1963.
He attended Abraham Baldwin Agricultural College, Auburn
University. and the University of Georgia.
From the latter, he
received a Bachelor of Science degree in forestry in 1967 and the
Master of Science degree in 1969.
In 1969 he was inducted into the U.S. Army where he served as a
computer programmer and systems analyst for the
Infantry Divisions in the Republi.c of Viet Nam.
th
~
and
rd
2~
Upon his release from
active duty in 1971, he entered North Carolina State University to
pursue the Doctor of Philosophy degree.
In 1974, he assumed a
research position with the Forestry Department at Mississippi'State
University.
The author is married to the former Linda Diane Hamlin and they
have one son, Hank.
iii
ACKNOWLEDGMENTS
Any expression of gratitude would be insufficient for the CoChairmen of thi.s author's Advisory Committee, Professors' H. R.
van der Vaart and B. F. Swindel.
These gentlemen were extremely
generous with their time, and provided the counsel which inspired
many of the significant portions of this study.
A special word of
thanks is due Professor W. 1. Hafley who served as guide through the
admin.istrative hurdles while serving as Co-Chairman of the Advisory
Committee.
The author wishes to also thank the other members of the
Graduate Faculty who served on his committee, T. M. Gerig, T. E. Maki,
and T. O. Perry; all of whom made significant contributions to his
educational experience.
Professor J. M. Danby's suggestions dealing
with the problems of numerical integration encountered in this study
were also appreciated.
The author was supported during his graduate study by the North
Carolina Agricultural Experiment Station.
Sincere gratitude is extended to the author's wife and Son for
their sacrifices, and especially to his wife for her efforts in the
completion of this paper.
iv
TABLE OF CONTENTS
Page
1.
INTRODUCTION • • • •
1.1
1.2
1.3
1.4
2.
The Problem •
Terminology
Review of Literature
Scope, Objectives, and Organization of This Paper
MAXIMUM LIKELIHOOD ESTIMATION ON RESTRICTED PARAMETER SPACES
2.1
2.2
2.3
2.4
General Discussion
Quadratic Programming
Isotonic Regression • •
Properties of the Restricted Maximum Likelihood
Est ima. te s .
3.
0
•
•
9
•
•
•
•
•
•
•
•
•
•
Properties and Problems • • • • •
Alternative Bayesian Procedures •
BAYESIAN ESTIMATORS DERIVED FROM TRUNCATED NORMAL POSTERIOR
DISTRIBUTIONS • • • • • • • • • • • • •
4.1
4.2
4.3
4.4
5.
0
BAYES IAN ESTIMATION ON RESTRICTED PARAMETER SPACES
3.1
3.2
4.
1
Mean of a Truncated Normal Distribution • • • • •
Priors Producing a Truncated Normal Posterior
Distribution for the Problem of Isotonic
Regression . • • . . • • • . • . • • ••
Construction of Several Bayesian Estimators and
Comparison With the Restricted Maximum
Likelihood Estimators • • • • • • • • • •
Comparison of Mean Square Errors of Restricted
Maximum Likelihood Estimators and Bayesian
Estimators • • • •
• • • • • •
IMPROVED ESTIMATORS
5.1
5.2
5.3
5.4
5.5
5.6
5.7
Joining Estimators
The Criterion of Regret • • ••
The Application of Minimax Regret to the
Construction of a Joined Estimator
Other Joined Estimators • • • • • •
Extending the Technique • • • • • • • •
Estimating Two Ordered Parameters • • •
Estimators for m Ordered Parameters •
1
3
3
4
6
6
7
14
18
23
23
26
30
30
32
36
49
58
58
60
61
69
77
84
94
v
~BLE
OF CONTENTS
(Continued)
Page
6.
SUMMARY
7.
LIST OF REFERENCES
103
8.
APPENDIX
106
8.1
8.2
8.3
107
112
8.4
Theorems and Proofs
••••
Values of the Function f(x)/F(x) • • • •
The Mean of a Truncated Multivariate Normal
Posterior Distribution • • • • • •
Truncated Normal Posteriors Arising From Unequal
Samples From Several Populat.ions • • • • • • •
98
us
133
1.
L 1
INTRODUCTION
The Problem
The statistician is often confronted with the problem of
estimating the parameters for the linear model
(l.1.1)
y=X~+~.
In this problem,
y
is an
n
n X m design matrix,
~,
is an
m element vector containing the
known parameters, and
~
is an
n
components.
element vector of responses,
X
is an
un~
element vector of the random
There are many situations where the true value of the
parameter vector
~
(Euclidean m-space).
is known to lie in a proper subset of
m
R
Often such information can be written as linear
inequality constraints in terms of the parameter vector
~.
An
example of such restrictions is
(l.1.2)
where
C
elements.
is a matrix of order
k X m ,and
d
is a vector of
Not all restrictions are of this simple form:
k
Hudson
(1969) cites a case of polynomial regression where the derivative of
the polynomial must be positive over an interval.
Modelers of growth in biological population can often rule out
subsets of the parameter space,
m
R , because values of the parameters
in these sets would violate known biological laws.
An example of such
a violation would be a model for the amount of wood fiber accumulated
in the bole of a tree at various ages with parameters which give decreasing predictions of fiber accumulation over age.
2
It would be desirable i f statisti.cians could prescribe a uniform
set of rules for the modeler who has prior information that the true
value of the parameters are certain to be found in a proper subset of
m
R •
Unfortunately, such a set of rules has not been forthcoming.
Searle (1971) has listed the alternatives that have been proposed to
resolve the problem of negati.ve variance components.
Many of these
alternatives are applicable to the problem of estimation when the true
values of the parameters of a linear model are known to be in a subset
of
m
R •
Some statisticians view estimates which violate known constraints
as being indicative of a failure of the model to represent the true
situation, and investigate alternative formulations of the model.
Others choose to ignore occasional violations of the constraints if
the unrestricted estimates possess good properties otherwise.
Another
group prefers to incorporate the restrictions into the estimating
process.
They realize that infeasible estimates are otherwise
possible even when the model is correct due to the randomness of
sampling and the construction of the estimators.
Even those who agree to incorporate the restrictions on the
parameter space in the estimation procedure do not agree on which
estimation procedure best takes advantage of this additional information.
Many statisticians feel that maximizing the likelihood
function over the set of feasible parameters is the most desirable
alternative.
The Bayesians, however, suggest that prior probabilities
should be assigned to the elements of the feasible parameter space,
and classical Bayesian techniques be invoked.
Actually, we will see
that each of these alternatives has discouraging properties.
3
1.2
Terminology
Several expressions have been used to describe the subset of
in which the true value of the parameter is known to lie.
This
m
R
sub~
set will usually be called the restricted or feasible parameter space,
occasionally simply the parameter space.
The unrestricted least squares estimator is also the unrestricted
maximum likelihood estimator when the likelihood is proportional to a
normal distribution.
In convex progrannning literature, the term basic
Eo1ution refers to the value
x
-0
of
x
minimum for a convex objective function
which gives the global
F(~)
•
This paper will deal
with the normal likelihood function and for our purposes the termS
unrestricted least squares estimate, unrestricted maximum likelihood
estimate, and basic solution or basic estimate will be considered
synonomous.
Similarly, restricted least squares estimate, restricted
maximum likelihood estimate, and minimum feasible solution are inter=
changed.
The term Bayesian estimate is often used for the mean of the
posterior distribution in discussions of Bayesian techniques.
Should
any other meaning be intended, it will be made clear by the text.
1.3
Review of Literature
Many statisticians have taken an interest in the problem of
finding estimators for the parameters of the linear model (1.1.1)
where the parameters can be restricted as in (1.1.2).
Most of the
earlier work has simply attempted to find the least squares
estimator which satisfies the restrictions
(c.~. ~ . .&.,
Judge and
Takayama (1966), Mantel (1969), Lovell and Prescott (1970), Zellner
4
(l96l), and Malinvaud (1966), page 31'7.
Finding the restricted least
squares solution is an application. of quadratic progranuning which is
covered in most convex or nonlinear progranuning texts
(~.• .&.,
Boot
(1964), Hadley (l964) , and Kunzi, Krelle, and Oettli (1966».
A particular class of restricted least squares estimators, viz.,
those in isotonic regression, received much attention.
(A brief
discussion of the problems for which isotonic regression is appropriate,
is contained in Section 2.3.)
by Ayer
~
Early work in this area was performed
ale (1955) and Brunk (1958) and a recent text by Barlow and
others (1972) was devoted entirely to the subject.
The book contains
a nearly complete bibliography.
Bayesian procedures for the unrestricted regression problem have
been discussed for example by Raiffa and Schlaifer (1961) and by
Zellner (1971).
However, Bayesian estimators for
parameter space have not received much attention.
mentioned the topic and Barlow
~~.
~
in a restricted
Bartholomew (1965)
(1972, p. 95) discussed the
mode of the posterior as a possible estimate under the conditions considered here.
1.4
Scope, Objectives, and Organization of This Paper
This paper will concentrate on point estimators for the para-
meters of the linear model which satil?fy constraints (1.1.2).
Attention will be restricted to situations in which the vector
(1.1.1) has an n-variate normal distribution.
.§.
in
This paper will con-
sider the case of full rank design matrices only.
This paper will have two objectives.
The first will be detailing
the properties of maximum likelihood estimation on restricted parameter
5
spaces for the normal likelihood function.
The second will be to
determine if Bayesian techniques or Some other estimation procedure
will give properties superior to the maximum likelihood estimates.
The maximum likelihood estimation procedure for restricted
para~
meter spaces and normal likelihood functions will be considered in
Chapter 2.
A simple quadratic programming algorithm will be reviewed
there to give readers unfamilar with quadratic programming an under=
standing of the mappings onto feasible parameter space carried out by
maximum likelihood estimation under restrictions.
Chapter 3 will deal with Bayesian estimati.on on restricted parameter spaces and illuminate Some seemingly unknown differences in
Bayesian estimation on restricted parameter spaces as compared to
estimation on parameter spaces which include all the elements of
m
R
Chapter 4 will be devoted to incorporating these findings into a
situation where the likelihood function is the normal distribution
function.
The Bayesian estimators for a flexible class of prior
distributions will be presented.
Properties of the means of the
resulting posterior distributions will also be discussed.
Finally, in Chapter 5 the possibility of combining Some of the
previously presented estimators will be explored.
The aim will be to
profit from the improvements in the mean square error made by some
estimators over certain sets of the feasible parameter space while
minimizing inflation of the mean square error.
6
MAXIMUM LIKELIHOOD ESTIMATION ON
2.
RESTRICTED PARAMETER SPACES
2.1
General Discussion
When the vector
!.
mean zero and covariance
in (loLl) has a normal distribution with
.
matr~x
2
cr I , the parameter
has a
~
like lihood func tion
112
exp C~ - cr (~ - .~) I (y
2
--"'-~-
(
=
?_'licr 2.n/2
)
To maximize this function with respect to
~,
Xfi»
•
(2.1.0
it is necessary to
minimize the res idual Sum of squares;
~(~)
X
When
= (z -
Xfi)/(Z -
is of full column rank, the value of
(2.1.2) over
m
R
(2.1.2)
~
which minimizes
is the least squares estimator
I
(X X)
By the
Xfi) •
Gauss~Markov
~,l
I
X'y
theorem this is the best linear unbiased estimator.
One approach to finding an estimator for
~
which satisfies
constraints such as (1.1.2) is to maximize (2.1.1) for
appropriate subset of
m
R
0
~
on the
The execution of the maximization of a
likelihood function on a proper subset of
not: altogether new to the statistician.
m
R
is not easyo but it is
For eX.ample, any problem
which includes constructing a likelihood ratio test of the hypothesis
H:
C/~ ~ .~
7
conceptually involves finding the point at which (2.1.1) is maximized
on a proper su b set
0
f
Rm
0
Most introductory stati.s tical texts give
a method for finding the estimators of
known i! giori.
subset of
~
when elements of
are
~
This also is an example of estimation on a convex
m
R •
Now consider a normal li.kelihood function of a mean vector"
with more than one element.
with respect to
~
b&.'
If one wishes to maximize this likelihood
over a proper subset,
S
J
of Euclidean space and
the global maximum of the li.keIihood is not contained in
S
classical analysis procedures are not adequate to find the solutions.
One can ut,ilize, however" some of the techniques of nonlinear (in fact,
quadratic) programming to obtain such maximum likelihood estimators.
2.2
Quadratic Programming
The algorithms of quadratic programming provide' methods for
minimizing a convex quadratic function
subject to the restrictions that
(2 2 2)
C2:!~ b .
For
Q(~)
0
to be strictly cpnvex, it is necessary for
D
to be
positive definite (see Kunzi, Krelle, and Oettl i (1966)" page 39)
For the function
X'X
'f(~)
0
in (2 1.2) to be strictly convex, the matrix
0
must be positive definite.
column rank o
0
This is assured by
X
having full
8
Boot (1964) notes that i f the restrictions in (2.2.2) were
equalities the desired solution could be found by the standard
application of Lagrangian multipliers.
But. the restrictions are in-
equalities and there exist situations where the basic solution to
(2.2.1) satisfies the restrictions.
In situations where this is not
the case. Some or all of the restrictions must be invoked to obtain
the required solution.
The restricted solution will then lie on the
boundry of the feasible parameter space, that is, it will satisfy some
of the inequalities in (2.2.2) as equalities; see Theorem 1 in Section
8.1.
These equalities will here be called !lbinding" constraints.
Illustrations would perhaps clarify the situation.
Consider a
two parameter linear model of full rank, where the parameters to be
estimated are restricted to the first quadrant, that is,
~ ~
O.
contours of the objective function (2.1.2) are then ellipses.
The
Figures
2.1, 2.2, and 2.3 give examples of the optimum feasible solutions
that can be obtained when the basic solution is infeasible.
The
ellipse shown in each figure is the contour corresponding to the
minimum of the criterion function on the feasible sample space.
In Figure 2.1 the basic solution violates the constraint
and the optimal feasible solution lies on the line
13
2
=0
is the binding constraint.
~2':
o.
~2 ~
0
Thus,
In Figure 2.2 0 the basic solution
violates both the constraints, but t.he optimal feasible solution Is
on the line
~1
and
- 0
~
1
=0
is the bind tng con3traint.
Figure
2.3 illustrates a situation where 0nly nne constraint is violated by
the basic
and on
~.2
e~ti.mate,
=0
;
13
but the optimal feaSible
1
= 0
and
s2
=0
~L'l.ution
lies en
~l':
are the bi.nding COllstraint2.
0
9
----+--~~----------Sl
Figure 2.1
An example of a solution to a quadratic programming problem in which the basic estimate violates the constraint .S2 ~ 0 , and the same constraint is binding
Figure 2.2
An example of a solution to a quadratic programming
problem in which the basic estimate violates both constraints, and the constraint Sl ~ 0 is binding
10
l:l
....2
-.p;:.::a.I-e--------------- f\
Figure 2.3
An example of a solution to a quadratic programming
problem in which the constraint 1:1 ~ 0 is violated
2
by the basic estimate, and both constraints are binding
11
From these examples i.t is apparent that a slgntficant problem io
bind~
quadratic programming is that of deciding which. constraints are
ing.
out~,
An algorithm due to The il and Van de Panne (1960) wi.ll be
lined for findi.ng the binding con,straints and the optimum feasible
solution.
If the restrictions which are binding are known) then optimum
feasible estimates would be found by the straight forward use of
Lagrangian multipliers.
For example" if only a subset,
S. of the
original restrictions stated a3 equalities are binding) then the
minimum of
(2.2.1) under the restrictions (2.2.2) could be found by
minimizi.ng
(2.2.3)
where
C
Q
s~
::;
d
-8
describes the binding constraints in the set
Taking the derivative of (2.2.3) with respect to
§.
S •
and
,,,-
setting it equal to zero, one finds
solution under
S ,
Q
~s
"the
optimum feasible
'
~.,
(2.2.4)
Premultiply
(2.2.4) by
The matrix
cs
Thus,)
Cs
and substitute
d
~s
for
C ~
(~
'is
•
Then
always has full row rank" (see Boot (1964), page 99).
12
1
Substituting this expression for
in (2.2.4) gives
~s
(2.2.5)
To discover the binding constraints,
S, which give in turn the
minimum feasible solution by (2.2.5), Theil and van de Panne (1960)
recommended the following procedure.
to a collection of
k
In this discussion
Sk
constraints which are being imposed as
equality constraints.
1)
Find the basic solution,
i.~.,
the unrestricted
a0
If
1::..
satisfies
the constraints, it is the minimum feasible
solution.
2)
If
a
violates any of the constraints, the
~o
sets
8
1
will be formed by taking one at a
a time each of the constraints which are
violated by
~o
.
The restricted estimate,
is then found by (2.2.5) for each set,
If any
l
satisfies all the consl
straints it is the desired solution, and the
corresponding set
8
1
is the set of binding
constraints.
3)
If the optimal solution was not found in Step
2, sets of constraints
S2
adding one at a time to each
are found by
8
1
' each of
the constraints violated by the corresponding
refers
13
The restricted estimate ~
is then
8
1
2
found for each unique set of constraints 8 •
2
§.S
If an estimate
a
.t::. s
violates none of the
'
2
constraints, it is the optimal solution if and
only if the estimates found by eliminating
S!
2
either-of the constraints in
violate the
omitted constraint.
4)
I f the optimal solution is not found in Step
3, sets of constraints,
8
3
,
are constructed
,by adding one at a time to each of the sets
S2
,
the constraints found to be violated by
~
• Sets, s' , which
2
52
fail to satisfy the final condition given in
the corresponding
Step 3 are not considered in this step.
restricted estimates,
The
~s
,are then found
3
for each unique sets of constraints S3. If
an estimate,
a
I
,
violates no constraints
s3
.
and the three estimates found by eliminating
one of the constraints in
SI
3
violates the
~Sl is the optimal
3
If a feasible estimate fails to
constraint omitted, then
solution.
satisfy the last condition, the corresponding
set,
S3' is not considered in subsequent
steps.
5)
The process is continued as in Step 4 by considering successively larger sets of constraints
14
A given feasible estimate
Sk
if each of
k
,[s'
,k
is optimal
estimates found by eliminati.ng
one of the constraints in
straint omitted.
S~
violates the con-
The algorithm is continued
until such an optimal feasible solution is
found.
Kunzi, Krelle, and Oettli (1966) give proofs that the preceding
algorithm will lead to the solution of the quadratic programming
problem.
Their proofs are based on the saddle point theorem given by
Kuhn and Tucker (1951).
The optimal solution
aJ::.s
will be uni.que, although i.t is possible
to reach the same solution with different sets of "binding" constraints.
This can occur if the point of tangency of the criterion function
(2.1.2) and one of the restrictions forming the boundary of the
feasible parameter space is also the point of intersection of other
restrictions.
Such is the case in Figure 2.4.
There are many algorithms for finding the solutions to the
quadratic programming problem.
One of the more recent contributions
is Mantel's (1969) paper in which he gives a procedure which can
simplify calculations in Some instances.
However~
the
Theil~van
de
Panne algorithm has a geometrical interpretation which is somewhat
easier to grasp.
2 3
0
Isotonic
Regressio~
In many cases the expectation of an observation can be expressed
as
E(y . .)
~J
= s.
J
.
This is equivalent to havi.ng a design matrix with
15
--o;+::~------------
Figure 2.4
a1
An example of a feasible solution to a quadratic programming problem found for the binding constraintS2
where
Sl = 0
=0
is also satisfied by the feasible solution
16
o
or
1
for each element, and only one
1
per row.
Then the
which would maximize (2.1.1) subject to a given.3et of order
restrictions on the parameters would be called the isotonic regression
with respect to the particular restrictions.
Maximizing (2.1.1) for
this particular problem has sufficient applications and has generated
enough interest to warrant a book devoted entirely to the subject
(see Barlow
~
al. (1972».
As an estimating procedure isotonic regression could prove
extremely useful for the growth model problem in which observations
have been made over several time intervals.
For many biological
phenomena the growth model should be restricted to being monotonically
increasing.
In the absence of further knowledge of the functional
form of the growth process, the maximum likelihood estimates under
the assumption of normality would be the isotonic regression with respect to restrictions that
(2.3.1)
a simple ordering of the
Sus.
The restriction given i.n (2.3.1) can be expressed in the form
(1.1.2) by letting
follows:
d
equal the null vector and by defining
C
as
17
=1
o
c =
1
o
o
0
~l
1
o
0
o
o
-1
o
o
o
o 0
-1
(2.3.2)
1
(m=l) X m
A solution satisfying the simple ordering restriction indicated
in (2.3.1) and (2.3.2) consists of first finding the basic estimates
by either unrestricted least squares or unrestricted weighted least
squares.
Then an algorithm called pooling adjacent violaters is
applied.
This procedure involves taking the weighted average of
adjacent estimates (e .~.,
.
S.
~
> ~.1+ 1 ) which violate the restrictions.
This pooled estimate is then assigned as the isotonic estimates for
each parameter.
Any pooled estimates will be considered as a block
if further pooling is required to obtain the desired order.
The
weight of the block will be the Sum of the weights for the basic
estimates.
Pooling is continued until the isotonic estimates
satisfy the ordering imposed on the parameters.
A method of steepest
descent gives a strategy for choosing violators which would be the
most efficient in many cases (see Kruskal (1964».
18
2.4
Properties of the Rest:ricted M.axi.mumt,ikelihood Estimates
A property of an estimator which is usually conSidered desirable
is consistency.
For if one has a consistent estimator" the true
value of the parameter can be bought.
That is with a suffi.ciently
large sample size the estimate has a nearly degenerate distribution
at the true value of the parameter.
The basic estimates are consistent in a certain, reasonable
sense when
variance
£
2
(J
I
in (1.1.1) is distributed normally with mean zero and
In particular, if the experiment represented by
(1.1.1) is repeated
k
times, yielding
k
independent vectors
2
:L.~ "" MVN (XI2.,(J I) ,
then the design matrix for the enti.re experiment is
x
x
,X
where there are
k
k
=
submatrices all identical to
and
_ (X! X) - 1/ k •
Now the basic estimates are
X.
Then
19
k
(XIX) -lX' ( L;
y.)/k.
J
j=l
were
t_e
h
h vector
.th
is the o b servationor
t h e J-f
y.
J
the experiment (1.1.1).
distribution with mean
Recall that
X~
Yj
..
f
repet~t~on 0
has an n-variate normal
) covariance matrix
2
(j
k
I
.
Then
L;
1..
j=l
would have an n-variate normal distribution with mean
variance
Thus,
normal with mean
~
S
~k
would be distributed as an
and variance
(j
2
(XIX)
-1
/k
kX~
J
and
m variate
Thus, as
k
becomes large, the covariance matrix becomes a zero matriX, and the
distribution of
ak
is degenerate at
~.
To show that the restricted estimates satisfying (1.1.2) are
consistent in the same sense, observe that
lim
Pr(C/~k < ~) ~ 0 •
k~co
This is a consequence of the convergence of the distribution of the
basic estimator to a degenerate distribution
which i.s known to satisfy (1.1.2).
a~
the true value of
§.
This implies that as the sample
si.ze increases, the basic estimates will vi.olate the restrictions on
the model with a probability of zero.
If the basic estimates fail to
violate the restrictions, then the basic estimates are the restricted
maximum likelihood estimates.
(Barlow
~
al. (1972) gives an
equivalent proof for the case in which the restricted estimates are an
isotonic regression with respect to a quasi-ordering.)
The restricted maximum likelihood estimators are, in general,
biased.
Mantel (1969) gives the following example which illustrates
20
the reason.
Consider the one parameter model in which the one
meter cannot be negative.
para~
The restricted maximum likelihood estimate
S,
~ , is zero when the basic estimate,
is less than zero, and is
equal to the basic estimate when the basic estimate satisfies the
restriction.
where
p(S)
The expected value of
r
is
i.s the probability density of the basic estimate.
The
basic estimate is unbiased and its expectation is
Note that
if
p~) >
a
is less than zero on the interval
°
anywhere on
S
[- CD ,OJ
so
[-CD ,0) ; so
= E (S)
<
"" )
E (S
If the basic estimates lie outside the feasible parameter space
with probability near zero, the restri.cted maxi.mum likelihood
estimators can have little bias.
This property mi.ght encourage the
modeler to be conservative in formulating the restrictions, and he
might include elements in the "feasible" parameter space which are not
feasible.
This type of "bordering" of the feasible parameter space
would be paid for by negating some of the gains made in reducing mean
square error by restricting the parameter space.
21
Barlow
~
al. (1972), page 64, gives a theorem for the isotonic
N
regreSSi.on estimates,
§., of ordered parameters which shows that these
estimates have a smaller mean square error than the unrestricted least
squares estimates.
The theorem states that
A
(13.1. - 13.)
1
where
~
2
1
~
is the true value of the parameter,
estimate of
~
and
w
is a vector of weights.
expectation of (2.4.1) with the
W.
1
(2.4.1)
w.
the least squares
Taking the
equal shows that the mean square
error of the isotonic regression estimator is less than the mean
square error· for the unrestricted estimator.
(This is a straight-
forward application of the comparison theorem.)
This result of the
inequal ity in (2.4.1) is that for ordered
fi, isotonic estimates
reduce the mean square error although for
~
not 'near' a boundary,
the reduction would be expected to be small.
In this example, it is possible to show that
th~
mean square
error can be decreased because the isotonic regression estimates are
the nearest points in the restricted parameter space to the basic
estimates.
The general restricted maximum likelihood estimate or
restricted least squares estimate is not usually the nearest point in
the restricted parameter space to the basic estimate; so the same
proof would not hold.
Mantel (1969) states, without proof, that the mean square error
for any unrestricted least squares estimator is larger than for the
restricted estimator.
Judge and Takayama (1966) concluded that for a
broad class of problems, Mantel's contention is true.
Their
22
conclusions were b.ased on Zellner's (1961) work in which a mapping
similar to the one given in the proof for the isotonic regression
example above was considered.
A final property to consider is that the restricted estimates
will always be boundary points of the restricted parameter space
whenever the basic solution is infeasible.
This property is shown
in Section 2.2 above and more formally by Theorem 1 in Section 8.1.
Thus, restricted maximum likelihood estimates
wi~l
pile up on boundary
points - points which barely satisfy the restrictions.
The property
is unappealing because the same restricted estimate could be obtained
for the case where the basic estimate satisfies the restrictions
exactly and when the basic estimates grossly violate the restrictions.
To summarize, Some properties of the restricted maximum likelihood
estimator are:
it is consistent in a reasonable sense, but it is
biased; it can have small mean square error, but the sampling
distribution is somewhat unattractive.
23
3.
3.1
BAYES IAN ESTIMATION ON RESTRICTED PARAMETER SPACES
Properties and Problems
The problem described in the introduction lends itself well to
the Bayesian philosophy of estimation.
One wishes to find estimators
for certain parameters where the true values cannot possibly belong to
a certain subset of Euclidean space.
fine a
"prio~'
The Bayesian approach would de=
desnity which assigns probability zero to this
impossible subset.
The next step in the Bayesian approach would be to specify the
prior density on the feasible parameter space.
In the situation where
little additional prior information is available with regard to the
true value of the true value of the parameters being estimated, a
uniform prior is often chosen.
~
that no greater degree of
The uniform prior has the interpretation
priori belief is placed in anyone point in
the feasible parameter space than in any other.
The final step in the
Bayesian approach would be to cpmpute the "posterior" distribution,
that is the conditional distribution of the parameter given the
observations, and to estimate the parameter by Some measure of the
central tendency of this posterior distribution:
its mean (most
frequently), its median, or its mode.
The Bayesian approach does seem to be appropriate for finding
estimates in the situations described here, but few publications have
addressed this problem.
Bartholomew (1965) discussed the special
problems of constructing interval estimates when the parameter space
is restricted.
Barlow
~
ale (1972) discussed the use of the mode of
24
the posterior as an est.imator when the parameter space is restricted.
These are the only references that were found whi.ch dealt with
Bayesian estimation on restricted parameter spaces.
The mode of the posterior density is the same as the traditional
maximum likelihood estimator when a uniform prior distribution is used.
This is true whether the feasible parameter space is a proper subset
of Euclidean space or not.
In case the feasible part of parameter
space is a proper subset of the Euclidean space, this estimator will be
bunched up at the boundary of the feasible space.
This is an un-
pleasant property of this estimator mentioned earlier.
The Bayesian estimator most often used is the mean of the
posterior distribution.
Now the mean of any distribution will be con··
tained in the convex hull of the support of that distribution.
Since
the support of the posterior distribution is a proper or improper subset of the support of the prior dis tribution, this is a somewhat
attractive property.
However, this Bayesian estimator also has an
unpleasant property:
it can aSsume values on t.he finite part. of the
boundary of t.he convex hull of the support of the posterior distri=
bution if and only if the post.erior distribution is degenerate at (a
flat subset of) this finite part
coordinates); see Theorem 8.3.
(i.~.,
the part with finite
In fact the mean of the posterior
distribution will always be bounded away from that boundary unless the
posterior distribution is degenerate at it.
This property is particularly easy to observe when a beta prior
with parameters
a and
~
is assumed for t.he parameter
binomial distribut.ion with density
e
of a
25
n; x
p(xle) = (x>e
e)
(1 -
n-x
; x
= 1, 2,
e
The posterior distribution for
(a + x)
meters
and
'"
e
The parameters
value of
a
e
n
(S
= (a
and
S
x) I (a
The mean of the posterior is
+S +
n) •
could never take on the value of
e
al(a + 13 + n) , nor between
=n
0
or
1.
cannot take any values between
(a + n)/(a + S + n)
e
reader will see this by finding the value of
x
11
are greater than zero, so for a given
fact, it is easy to see that
and
s; 1
is a beta distribution with para-
+ n-x) •
+
o s: e
.•• , n
for
and
x
=0
In
0
1, (the
and for
).
The mean of the posterior distributions for the continuous
conjugate distributions given by'Raiffa and Sch1aifer (1961) show the
same property.
As an example, consider the rectangular distribution
with dens ity
f(K~e)
=
lie,
0 s: x s: e,
e > 0
where the real life problem indicates that the feasible part of
parameter space is given by
a
6
[y,.CD),
... ,
density for a sample
xn )
0
< y.
of size
n
The joint
is
e-n ,
o
otherwise
I/Bennee F. Swinde1 suggested this example in personal communications
26
where X(n)
is the largest order statistic.
The conjugate prior
density is
> 1,
n'
p(e)
Y
s;
e
s;co
cC:
otherwise
Th en th e mean
0
·
f th e po S t erl,or,
A
e=
n
M
/I
,
s~nce
n + n' > 2 , 1',S
~l
n/l-l
n"-l
n I '_2 n 1/ -2 = n" -2 M ,
M
where
M = Max(X(n)'Y) ,
strictly positive, and
n'l = n
'"e
+ n ' • Since
Y > 0,
has a minimum distance of
M also is
Y/(n"-2)
from
Y , the finite boundary of the feasible parameter space.
Thus, the Bayesian estimator (in the sense of the mean of the
posterior distribution) seemS to be as unappealing as the maximum
likelihood estimator,
(i.~.,
the mode of the posterior distribution for
a uniform prior), since legitimate values of the parameters,
i.~.,
values carrying positive probability density in both the prior and the
posterior distributions, will be ignored by the estimation process.
3.2
Alternative Bayesian Procedures
The mean of the posterior distribution is the appropriate
estimator for a parameter
e
when the loss function is the squared
error
=
A
(e - e)
2
•
(3.2 . .1.)
27
Tiao and Box (1973) have suggested that other loss functions not be
overlooked.
For example the loss function
Ie
~
eI >
e
for
e
small and positive:>
I'e-el<e
gives rise to the mode of the posterior distribution as an estimator
of the parameter.
The expected value of the
1085
function
is minimized by the median of the posterior distribution.
These two additional loss functions and the corresponding
estimators seem inadequate for the particular problem under con=
sideration.
For example, when a uniform prior is used, it was
observed before that the mode of the posterior is also the maximum
likelihood estimate,see also Barlow
~~.
(1972), page 95.
The median is similar to the mean of the posterior in that it,
too, excludes feasible values of the estimator.
A point in a one=
dimensional parameter space can be the median of a continuous
posterior distribution i f and only if the cumulative posterior
distribution is equal
1
~
at the point.
Thus. for
ab~olutely
continuous posterior distributions. finite boundary points on the
convex hull of the support would again be excluded as estimates.
In
fact, any posterior distribut.ion which does not assign a probability
of
1
2
or more to a flat boundary point will fail to give boundary
points as medians of the
po~terior.
28
In the search foe an estimator of a restricted parameter with
appealing sampling properties, estimators whi.ch accumulate at boundary
points and estimators which never include certain neighborhoods of
boundary points have so far been encountered.
It would seem that it
should be possible to find an estimation process that would avoid both
extremes.
For example, one might think that the problem can be
solved by choosing a pri.or such that the mean of the resulting
posterior can aSSume any value in the feasible parameter space.
This
can be done by assigning posi.tive prior probability to points outside
the feas ible parameter space..
This would be analogous to "averaging"
the feasible parameter space and the unrestricted parameter space.
A specific example of how this could be carried out can be
constructed using the rectangular process with a hyperbolic prior
cited earlier in the section.
Recall that in this example the feasible
parameter space is . [y. co),
0 < Y , and the mean of the posterior
is
(nl'-I)M/(n'I~2)
,where
M is the larger of
largest order statistic of the sample.
Y
and the
Thus, the mean of the
posterior will never fall in the interval
[y,(n'J~1)y/(n/-2)J
which is a non-empty subset of the feasible parameter space.
,
If,
however, the chosen prior assigns positive value over the interval
[(n 1/ -2)y / (n 1/ -1).
CO
j
according to the hyperbolic distribution" then
the minimum value of the mean of the posterior is
y
0
Thus, the mean
of the posterior can be forced to cover the entire feasible parameter
space by assigning positive prior probability to elements not contained in the feasible parameter space.
29
This example illustrates that the prIor can be manipulated to
achieve estimates that exhaust the feasible parameter space.
It
should be noted, however, that for the combination of prior and
likelihood chosen) the resulting estimator now has the same shortcoming as the maximum likelihood estimator; . !.. .t=.., for all samples in
which the maximum order statistic is less than
the estimate of
e
is
(nl!,,2)y/(nll~l)
,
y •
The evidence presented thus far indicates that for the problem
presented in the introductjon, Bayesian procedures can rectify the
undesirable accumulation of estimates on the boundary of the feasible
parameter space.
The cost of this rectification seems to be that the
mean or median of the posterior will not approach the finite
boundaries of the feasible parameter space unless the mass of the
posterior accumulates at these boundaries.
The mode of the posterior
distribution re.presents a distinct alternative to maximum likelihood
estimators only if the prior chosen is not uniform.
Other general properties of Bayesian estimation on a restricted
parameter space will not be explored here.
In the remainder of this
paper, attention will be restricted to some specific estimation
problems regarding the m-variate normal distribution where inequality
constraints can be placed on the elements of the mean vector.
In these
specific sit.nations, specific Bayesian estimators will be explored.
30
4.
BAYESIAN ESTIMATORS DERIVED FROM TRUNCATED NORMAL
POSTERIOR DISTRIBl:TI0NS
Raiffa and Schlaifer (1961) listed as a desirable trait of a
prior density that it should lead to analytically tractable posterior
distributions.
In Section 4.3, several reasonable priors will be
listed which yield a truncated normal post.erL)r distribution for the
situation in which the observations are a sample from a normal
distribution.
All of these reasonable priors assign positive
probability only to the feasible parameter space, such as the space
defined by (1.1.2). ·The truncated normal distribution does have a
mean and mode which can be expressed with some degree of convenience.
Of course, the mode is that point which gives the maximum of the
normal de.nsity function on the restricted parameter space.
of the mode
Properties
(1.£., the restricted maximum likel ihood estimator) and
algorithms for finding it were discussed at length in the second
chapter of this paper.
The mean of the posterior distribution is the value of the
parameter minimizing the loss function (3.2.1).
To explore tract=
ability of means of truncated normal distribu.tii1Os,. we will first
consider the univariate truncated normal distribution
g(x)
= e-(xJ..L)
=0
2
/20
2
for
x ~ a ,
for
x:
< a
(4.1.1)
31
Cram~r (1951), page 248, gives the Urat moment of (4.1.1) as
E(x) = ~
+afCa:)/O _ F(a;:))
,I.-a" /. Al-a
+ af ( ~)h
. ~~)
= ~
Here
f(x)
F(x)
and
a
a
(4 .1 .2)
•
are the density and distribution function
respectively for a normal random variable with mean zero and variance
one.
Equation (4.1.2) involves only sta.ndardized normal density and
distribution functions and is easily evaluated for values of (a1J.) la
3.
less than
f(~)
a
However, as
(l - F(~»
and
a
(a""\-L)
goes to co, both the functions
approach zero rapidly.
The attendent
computational problems are easily taken care of by using a continued
fraction expression for
(1 .. F(x»
(eL
Abramowitz and Stegum (1%4),
page 932), namely
1 - F(x)
f(X)[;+
~+ ~+ ~+~+
= f(x) GF(x)
... J
for
x
> 0
(4 1.3)
•
0
Substituting (4.1.3) into (4.1.2) gives
E(x) =
=
II.
r-
+ af(~)/(f«~)CF(a~»
"a
a
"a
for
,~> 0
a
~ + a I CF (~) •
a
Section 8.2 contains a table of the values we computed of
f(x)/F(x)
=
l/CF(-x)
for
-10 ~ x ~ 5 •
(4.1.4)
32
Thus, in univariate truncated normal posterior distribllti.ons
finding the mean of the posteri.or is not difficul t for any given
set of values of
~
•
a,
and
a.
In our applications
seen to be a function of the observations.
~
will be
The obvious next question
is what happens for multivariate truncated normal distributions?
problem is dealt with in the Appendix, Section 8.3,
This
Equation (8.3.11)
gives the mean for a broad class of posterior distributions, and
(8.3.13) gives Cram~rls result for the univariate case as derived from
(8.3.11).
4.2
Priors Producing a Truncated Normal Posterior Distri.bution for
the Problem of Isotonic Regression
In Chapter 3 the uniform prior was said to be applicable when
the modeler can not specify in advance that anyone point in the
parameter space is more likely than any other.
It was also noted in
that chapter that a uniform prior yields a posterior which is
proportional to the likelihood for those points in the parameter
space that belong to the support of the prior.
Consi.der the case of one observation taken from each of
normal populations.
i
=
1, 2,
4",
The populations have means
m , and all have variance,
a
2
~
i. ' respectively,
,known.
Let the
uniform prior be assigned over the feasible parameter space,
defined by the simple ordering
m
A
33
Note that, the feasible parameter spaces defined by general linear
inequalities
C~ ~ ~
are considered in the Appendix, Section 8.3.
is
In this case the joint density of
where
:L
Y1" Y2"
is the vector with components
~
case of (8.3.3), and the Bayesian estimator of
n
and with
This posteriDr density is a special
support as given in (8.3.14).
If
••• " Ym
is a special case
observations are taken from each population, the density
function is
~
C') ~nm exp( ~
f(yl~) = (
m
n
~
~
(y ij
1=1 j=l
=
(I..[1n
'li
C') -nmexp (~(
m
n
~
~
i=l j=l
m
•
Here"
Yi
y ..
:lJ
eXP(-i.~l nG i
-
re f ers to t h e
J~- 0
2.
Y ij
-
2
~.) / (2c
1.
m
~
~
1,
.2
»
~i)2/(2c2».
.th
on
»
ny. ) ! C2c
i=l
b servat.lon
.
,th
l'
on t 'h e 1
. - popuatlon,
an d
is the mean of the observations from the
posterior density of
~
.2
A
is
.th
1~"~'
population.
Then, the
34
m
-
n
?
m)
2
t\f"'I"i::
"L.. Y:. ~ n L: Y~) Il 2a) )
\V '<'11 a ) ~nm exp ('" '(~
1 J
.
..
i=1 1.' \
1.=1 j=l
(4.2.1)
Therefore (4.2.• 1) is a truncated normal posteri.or distribution of the
form (8.3.3).
the
(The case where the number of observations differs for
m populations
i~;
considered in the Appendix" Section 8.4).
An exponential prior and a normal joint density also yield a
truncated normal posterior.
The exponential prior is appealing to
those modelers who know that the true value of the parameter is more
likely to barely satisfy the restrictions than to be far away from the
boundary of the feasible parameter space.
Again, the case of one observati.on per population for
populations will be considered first.
mean
known.
I-L.
1.
respectively,
m normal
Again, the populations have
i = 1. 2 ••••• m • and all have variance
An exponent. ia 1 pr i.o r for a simple orde n.ng
j S ~
C1
2
35
~
•••
~.
4L m
~ f.L
m~
l)/e m"" .1)
= K eKp(~ICft)
for
e
bl<,.
A ~
(4.2.2)
and
p~La)
Here,
l/e.
1.
K
J
=0
othendsE •
is a scalar constant,.
and the matrix
C
.a
is the vector whose Edements are
(2'3
_ •. ' •.2)
'.
. _ .g·~v'e·n·
.L
1.0
l"S
•
The resulting posterior
dens 1. ty is:
• K
exp(~ ~/~)/
ex p
=
(~ «z ~,
2
0' C)) '.:,H:) ! «(y
SAexp( ~«y~0'2C '.e.)
~~)
I
=
~~~~C) )
«Y"O' 2 C. :9)
The posterior (4.2.3) is again of the form (8.3.3) with support given
in (8.3.14).
The last prior considered here is a truncated
multivari~te 122.rma1.
This prior gives a truncated normal posterior when the observations are
taken from a random variable.normally distributed.
The truncated
36
normal is a candidate prior when the modeler knows that Some i.nterior
points in the feas ible paramet.er space are more li.kely t.o be the true
value of the parameter than any other points.
The truncated normal prior for
m mean parameters
~i
of the
densities of the observed variables considered here is
elsewhere •
- 0
As before,
K
is the normalizing constant.
density on
A
is
Then" the posterior
2
2
y+a £)!
2 2
Q1+a
QI
(4.2.4)
4.3
Construction of Several Bayesian Estimators and Comparison
W.~th
the Restricted Maximum Likelihood Estimators
In Section 3.1 concern was expressed that the mean cf the
posterior distribution might not yield all the points in the
feasible parameter space as estimates.
Example distributions were
given which showed this property for all possible values of the
37
observations on the random variable.
A theorem Is gi.ven in the
Appendix" Section 8. I" whi.ch shows that all points in the convex hull
of the support could be obtained as Bayesian estimates only if the
posterior became degenerate at tlle boundary points.
It can be shown that the truncated normal posterior does become
degenerate at boundary points for some observed values of the random
variable.
To illustrate, consider the univariate truncated normal
pos terior:
~
1
(II. =y)
2
= V2IT- ~_e ~
CO
fS
/2a
2
~
2
2
a
Here
S
f.L e [8,':0)
otherwise •
is a fini.te number.
The posterior probability that
[a,b]c[S,OO)
where
for
1
,_f,,_y) /2a
\~ e \J-i' .
c\J,
V2.ffa
F(x)
f.L
lies in an interval
is:
is the distribution function for the nOlrmally distributed
random variable with mean zero., and varian.ce one.
First. the case in which
S
=
a < b
will be considered.
by (4.301),
Pr4J,e[S,bJ) = 1 -
F«y~b)/(J)/F«y=S)/(J)
0
Then
38
As
y
goes to
'·co,
Pr4L e: [Sob])
becomes
L.
This can be seen
L'Hospital!s rule to
by application of
V~S
FCy,~b
n)/F("-~)
a
a'
F( (y~b) fa) /F{ (y.. S) fa)
1 im
y~=
co
.:
Here
f(x)
f(y~b)fa)ff«y.,S)fa)
Urn
y~=
is the density function for a random variable normally
distributed with mean zero and variance one.
1"
~m
y~=
Since
co
b
.fUy·~b) fa) =
f «y-S) fa)
>
CO
S ~
y(b-S)
y~-
e
lim
In the case
S
< a
~
a
0
co
(4.3.3) is equal to zero.
y~~
Then
(4.3.3)
lim
y~-
lim
and
(4.3.2)
co
Tht'B"
Pr(f.Le:[S,b])
L.
CO
b ,
Pr(~e: [a, b])
F(X- a )
= ~_ a .;_'
f'(b~)
a
v~b
F("--)
CL
39
and both
and
F(~~b-)/F(Y~S)
(J
have limits of zero as
(4.3.2).
Thus~
y
(J
goes to
=
the probabi.lity that
in the interior of
[S, 00) ... 0
as
from the results found for
CD
is in any interval entirely
I-L
y -+
=
00"
and the mass of the
probability is accumulated at the boundary point,
this truncated normal posterior tends to
at
S
as
y
goes to
=
of this posterior approaches
S •
Therefore~
a degenerate distribution
This implies that for
00.
S •
y ... = 00
the mean
Thus, the mean of a truncated normal
posterior can take on all the values in the support of the posterior
provided the observed values of the random variable can take on all
negative ~ real values.
We will see examples of this in the
discussion below.
The case where one observation is made on a population distri=
i-L ~
but ion normally with mean
sidered.
space
0
and variance one will now be
con'~
A uniform prior i.s assumed over the feas ible parameter
[0,+(0) •
Then according to (4.2.1), the resulting
pos ter ior dens ity on
0
S;
i-L
< + 00 is
40
so that by (4.1.2), the Bayesian esti.mator
(t.~."
the mean of the
posterior density) is
~
As
y
=y +
+00,
approaches
f(y)
approaches unity very rapidly.
as
y
(4.3.4)
f(y)/F(y) •
tends to zero and
Thus,
becomes large and positive.
approaches
For
y < 0 ,
j..L
F(y)
y
very rapidly
can be expressed
i.n terms of the continued fraction given in (4.1.3) and (4.104)0
j..L
=y +
.i.~.,.
l/CF{=y) •
The value of a continued fraction,
a
a
l
Z
a
3
~.~~.
b
1
b
2
b
u
0
0
:)
3
lies between two successive convergents if" for every term
a.
the continued fraction,
1.
and
b.
~
The terms of
for
-y, while negative values of
~.
~
is equal to
GF(~y)
being examined, so the value of
1
( ,~
~
1.
CF(x)
have positive integers
yare
would be in the interval
=V
,~.~~)
-y " 2
y' + 1 "
i.~.,
Land
between the first two convergents.
U. where:
L
= Y + L1 = 0
..y
and
of
are positive (Abramowitz and
Stegun (1964), page 19).
a .• b.
a ./b.
Thus,
j..L
must be between
41
u=
y +
1
~~~-~.
,-=:f~
?
y-
As
y
y
tends to
goes to
,~
00
-00.
function of
y
Ass~mipg
,
+
1
U approaches zero. as
approaches zero as
~
Figure 4.1 contains a plotting of
for the interval
[~lO)10J
~
as a
•
an eX,E0nential ,E.Iior over the feasible parameter space
for the same density of the observed variable gives somewhat
different Bayes estimators.
For this case, the mean of the
posterior distri.bution is:
=y
~
lIe +
f(y~l/e)/F(y~l/e)
(4.3.5)
as follows from (4.1.2.), since the posterior distribution here is
y ~
a normal distribution with mean
lie
truncated at zero.
Figure 4.2 gives a plot of the estimators,
values of
tends to
e •
Note that as
y - lIe •
which for
~
, for several
becomes large and pos iti ve.
y
e
the maximum likelihood estimator.
f1
near zero is quite different from
No poInt 1.n the feasible para-
meter space is excluded as an estimate of
~
.
42
10.0
7.5
5.0
2.5
0.0
------
-10.0
0.0
-5.0
5.0
10.0
y
Maximum likelihood estimate
Bayesian estimate (uniform prior)
Figure 4.1
Bayes estimates (uniform prior) and maximum likelihood
estimates of the mean, ~ ~ 0 , of a normal distribution
when the observation is y
43
,
10.0
"
7.5
~
1/,
.,
rJ
I
'1//
.'•
.'
5.0
~
"
0.0 ....
./
-
-10.0
". ,../
.a=..;;;;;;;,.-._ _--'
-5.0
/
"I ,
./
/
I
• I'
.' I
/
2.5
I
.
I
/
/
II
/
0.0
5.0
10.0
y
Maximum likelihood estimate
Bayesian estimate (exponential prior,
(exponential prior,
(exponential prior,
Figure 4.2
e = 5)
e = 0.5)
e = 0.25)
--.-._-----
Bayes estimates (exponential priors) and maximum
likelihood estimates of the mean, ~ ~ 0 , of a normal
distribution when the observation is y
44
For the same density of the observed variables and a
truncat~d
[0" co) "mean
nermal prior with positive probability over
e,
and
variance one. the mean of the posterior is;
Y+e + iiittelL.\f11
F «Y+8) /1(2)
2
~
The values for
Figure 4.3.
It should be noted that for
approaches zero.
(y+l)/2 •
For
e=1
y e [-10,10J , and
when
y
y
are shown in
negative,
positive and large,
~
~
approaches
Again, all points in the feasible parameter space are
found as estimates of
~
for Some value of
y •
For the case of the bivariate truncated normal, as is given in
(8.3.18), with a simple ordering on the parameters
l-1i' again all
points in the feasible parameter space can be found as estimates, for
some value of
X.
The exprEssion in (8.3.19) gives the expected
value of thi.s posterior.
When
(Y;?" Y )
l
is negative,
~l
and
can be written as continued fraction as was shown in (4.1.3).
Thus "
and
(4.3.6)
Then
1-11
would be between
L
and
u where
45
10.0
7.5
5.0
2.5
..........
..........
-..._---"
0.0 ....- - - - - - - - - - '
-10.0
..5.0
0.0
5.0
10.0
y
Maximum likelihood estimate
Bayesian estimate (prior~ a normal
(1~1) truncated at zero)
Figure 4.3
Bayes estimates (truncated normal prior) and maximum
likelihood estimates of the mean~ ~ ~ 0 , of a normal
distribution when the observation is y
46
L = y
c
The values of
vergents for
approaches
(Yz-Y 1)
1
a
+IJ7.
-
+
Land
CF(x)
U
in
are found by substituting the firs t. two
~ 1
(Y2+Yl)/2 ; so
approaches
-
1
.
As
~l
-
(Y2~Yl)
(YZ-Y )
1
- 00 ,
L
tends to zero as
(Y2+Yl)!2
00 •
Similarly, it can be shown that
as
goes to
con~·
approaches
-00.
~2 ~
(Y2+Yl)/Z
This expression,
tends to zero
(y +y )/2, is the
2 l
Same as would be found by isotonic regression for a simple ordering
on
~l
and
parameters.
~2
when the basic estimates violate the ordering on the
(See the discussion following (2.3.2).)
approaches
- 00,
the Bayesian estimates
to the maximum likelihood estimates.
tends to one; and
~1
becomes
y1
As
and
(Y2~Yl)
~2
Thus, as
and
~2
tend
becomes large and
becomes
y2.
These
limits are again the isotonic regression estimates when the basic
estimates satisfy the ordering on the parameters"
Since the isotonic
regression estimates are on the boundary of the feasible parameter
space when the basic solution violates the ordering, the2e Bayesian
estimates will take on values in the neighborhood of the boundary with
a positive probability.
47
It would be desirable to determi.ne if, 1.n general, the Bayesian
estimates are close to the maximum likelihood estimates for some
observations.
This is not possible analytically because of the
complexity of the
multi~ormal
integ'ra1, wh,ich Kendall and Stuart
(1969, pp. 350-353) have pointed out.
These difficulti.es would not
arise when the n-variate density function is a product of
independent univariate normal density functions.
n
It is difficult to
conceive of practical situations in which a truncated normal posterior
would havE' this property.
For this reason, the remairling discussion
will be limited to the univariate and bivariate truncated normal
posteriors.
The univariate and bivariate Bayesian estimators discussed here
are usually consistent in the case of a uniform prior.
In both cases,
the Bayesian estimator (cf. (4.3.4) and (4.3.6») consists of the unrestricted maximum likelihood estimator with an additional term of the
form
± Af(b/A)/F(b/A)
Here
A
•
(4.3.7)
is the standard deviation of the unrestricted maximum
likelihood estimator times a positive constant,
c.
random variable is distributed normally with variance
unrestricted maxi.mum likelihood estimator has variance
sample of size
n.
a
2
the
2
a In
for a
By application of (4.2.1) and (4.1.2), the
Bayesian estimator is
p;
Thus, i f the
= y + ~n f «y-a) (\fUla)) IF «y-a) 0[D.la»)
•
48
Therefore, (4.3.6) is
~
feb ~n/(Qj»/F(b ~n/(cc»
n
The value of
b
•
(4.3.8)
in the univariate case is
f.L
uniform prior in which
> a ; see (4.1.2).
(;-a)
for the
Then as was shown in
(2.4.1),
Prey
as sample size goes
probability one,
to
-a ~ 0)
Then
CD.
b ~n/(ce)
0
b
will be positive with
will approach
CD,
and
feb ~n/(ce»/F(b ~n/(ce»
becomes zero, since
feb ~n/(ccr»
becomes zero and
F(b ~n/(ccr»
approaches one.
In the bivariate case
by
b
is
(;2-)\)
where
f.L 1 < f.L2 •
Again
(2.4.1)
Pr(Y2
as
n
goes to
Thus, as
n
CD.
SO by the same argument,
(4.3.8) becomes zero.
becomes large, these Bayesian estimators approach the
unrestricted maximum likelihood estimator which is consistent.
Notice that these estimators are not consistent when the
feasible parameter space is a closed set.
the argument for
when
f.L
= a.
f(x)
and
F(x)
For example when
f.L
~
a ,
in (4.3.8) would approach zero
Then (4.3.8) would approach a positive quantity as
increased, and therefore, the estimator would not be consistent.
n
49
For the normal and exponential prior';:; discussed here, it i.s
possible that
equal to
y
b
- a 2 Ie
true value of
probability one as
f(x)/F(x)
if
~
is negative as
n
goes to co
.
For example
for the exponenti.al prior with
were less than
n
2
a Ie ,
became large.
would approach
~
b
~
~
0
.
b
is
If the
would be negative wi.th
Then since the argument of
co, the estimate would become zero.
Thus,
were larger than zero, the Bayesian estimator would not be con-
sis tent.
The same si.tuation exists for the truncated normal prior
distribution.
4.4
Comparison of Mean Sqyare Errors of Restricted Maximum Likelihood
Estimators and Bayesian Estimators
The mean of a truncated normal posterior does seem to solve the
problem of accumulating estimates on the boundary points that the
restricted maximum likelihood estimator presented.
This gain is made
without incurring the problems anticipated in Section 3.1.
That is,
estimates are found in any neighborhood of the boundary for the univariate and bivariate truncated normal posterior.
The restricted maximum likelihood estimators are consistent.
Bayesian estimators for a uniform prior with support on an open set
are also consistent; however, other priors do not necessarily lead
to consistent estimators.
Another fitting comparison of the Bayes and maxi.mum likelihood
estimators is with respect to mean square error.
Does one of these
estimators have mean square error uniformly smaller than any other?
This questi.on will be studied in depth for the univari.ate case.
50
Without: loss of generality, numerical examples will only be given
prior distributi.ons with support
fOt
[0, ro) .
The restricted likelihood estimator of one observati.on from the
univariate normal, with mean known to be larger than
"
IJ.ML
= a
when
y < a ,
=y
when
y
~
a, is
a •
The meal'. square error for this estimator is
222
+ sro (Yj1) e-(Y1L) /('2c)d •
a
\{'r TTCT
Y
Integrating by parts, the last integral becomes
Then,
(4.4.1)
For the same sampling density and a uniform prior for
greater than
IJ.
a, the mean square error of the Bayes estimator is
51
(4.4.2)
This expression does not lend itself well to analytical examination.
Howeyer, numeri.cal approximat.ions of this formula can be found by
applyi.ng Gaussian-Hermite quadrature formulas.
An explanation of the
technique is contained in Ghizzetti and Ossicini (1970).
programs used to evaluate (4.4.2.) were
DQH32
and
System/360 Scientific Subroutine Package (1970).
plotting of (4.4.1) and (4.4.2) for a equal zero,
and
~
s [0,8J.
DQH64
The computer
i.n the
Figure 4.4 gives a
a
2
equal to one,
(The values of these functions from which Figure 4.4
waS made are given in Table 4.1.)
Neither estimator has uniformly
smaller mean square error than the other.
The Bayesian estimator from an exponential prior and with the
same density of the observed variable has the following mean square
error; (in this case
a
~
is set equal to zero)
2
1
J I(2rT a
2
2
e-(Y~) /(2c )dy •
Figure 4.5 was found by evaluating (4.4.3) by program
above~
Again,
(4 4 3)
•.
DQH64. , see
a 2 was set equal to one. As can be seen in Figure
4.5, the estimates found from exponential priors do not give uniformly
smaller mean square errors than the restricted maximum likelihoCld
esti.mates either.
In fact, the Bayesian estimator whi.ch gives the
52
1'.00
"
"
/
/f
/.
/
0.75
'-"
Mean
Square
Error
/
/
I
I
.
I
0.25
I
I
I
.I
I
.
I
0.50
I
..'
/1
•
•
•
•
•
•
0.00
0.0
2.0
4.0
6.0
8.0
~
Lower envelope
Maximum likelihood estimator
Bayesian estimator (uniform prior)
Figure 4.4
a
Plots of the lower envelope and the mean square error
for the maximum likelihood estimator and a Bayesian
estimator (uniform prior)
aThe term lower envelope will be introduced in Chapter 5
53
~--
5.00
;
/
3.75
I
/
I
2.50
Mean
Square
Error
I
/
I
.-
.-.-._._.~._.­
1.25
0.00
0.0 .
2.0
4.0
6.0
8.0
Lowe r enve lope
Maximum likelihood estimator
Bayesian estimator (exponential prior,
(exponential prior,
(exponential prior,
Figure 4.5
e = 0.5)
e = 2.0)
e = 6.0)
----------------
Plots of the lower envelope and mean square error for
the maximum likelihood estimator and several Bayesian
estimators (exponential priors)
54
Table 4.1
Mean square error for the maximum like1i.hood estimator
and a Bayesian estimator (uniform prior)
MiSE
MSE
I-i.
Bayesian
estimator
estimator
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2,.9
3.0
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
4.0
0.91554
0.84237
O. '7'7987
0.72739
0.68427
0.64983
0.62339
0.60427
0.59179
0.58528
0.58408
0.58754
0.59504
0.60598
0.61979
0.63592
0.65388
0.67320
0.69346
0.71426
0.73526
0.75617
0.77672
0.79669
0.81590
0.83421
0.85151
0.86771
0.88277
0.89667
0.90939
0.92096
0.93142
0.94080
0.94916
0.95656
0.96308
0.96878
4.97374
0.97803
0.50473
0.51'788
0.53788
0.56325
0.59256
0.62454
0.65802
0.69198
0.72555
0.75803
0.78885
0.81761
0.84401
0.86791
0.88923
0.90801
0.92435
0.93837
0.95028
0.96021
0.96855
0.97535
0.98085
0.98527
0.98878
0.99153
0.99367
0.99531
0.99656
0.99750
0.99820
0.99872
0.99910
0.99937
0.99956
0.99970
0.99980
0.99986
0.99991
0.99994
MSE
MSE
ML
I-i.
Bayesian
estimat.or
estimator
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
5.0
5.1
5.2
5.3
5.4
5.5
5.6
5.7
5.8
5.9
6.0
6.1
6.2
6.3
6.4
6.5
6.6
6.7
6.8
6.9
7.0
7.1
7.2
7.3
7.4
7.5
7.6
7.7
7.8
7.9
8.0
0.981'71
0.98485
0.98751
0.98977
0.99165
0.99323
0.99453
0.99561
0.99649
0.99721
0.99780
0.99827
0.99864
0.99894
0.99918
0.99937
0.99952
0.99963
0.99972
0.99979
0.99984
0.99988
0.99991
0.99994
0.99995
0.99997
0.99997
0.99998
0.99999
0.99999
0.99999
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
0.99996
0.99997
0.99998
0.99999
0.99999
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
L.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
L.OOOOO
ML
5.5
greatest improvement for sITUilller values of
as
I-L
I-L
' performs the poorest
increases.
Now the case of the truncated normal prior will be considered.
The prior examined here will be proportional to a normal density
with mean parameter
A and variance
0
and the prior will be zero elsewhere.
uni.variate normal density with mean
over the interval
[0, ex:»
,
The observations again have a
and variance
I-L
a
2
Then the
posterior is the univariate case of (4.2.4) and using (4.1.2) the
Bayesian estimator is found to be:
2
" = oY+o' A
I-L
2
oia
+
g
2
2
~ f «OY+o' A,) / (cia »
2
2
2'
0+0'
F«cyfa A)/(Cia
»
The mean square error of this estimator is
This function was also evaluated by Gaussian-Hermite quadrature for
a
2
A,
and
C equal to one.
A plot of the values of this
function are shown in Figure 4.6 for
I-L
in the interval
[0,8J
On
this interval, the mean square error for the Bayesian estimator was
smaller only in the neighborhood of
A'
The same conclusion can be
drawn from Figure 4.7 in which the mean square error is plotted for
the Same example with
A set equal to 3.
56
/
12.0
I
I"
I
I
9,,0
I
I
Mean
Square
Error
6.0
3.0
0.0
>-<---""
0.0
2.0
;; <
.;
/
/
/
4.0
/
/
/
/
/
I
/
I
/
I
l
6.0
8.0
Maximum likelihood estimator
Bayesian estimator
Figure 4.6
-------
Plots of the mean square error for the maximum
likelihood estimator and a Bayesian estimator (prior,
normal (1,1) truncated at zero)
57
I
I
I
J
I
I
I
.J
6.0
J
J
I
Mean Square
Error
3.0
I
I
I
\
I
\
I''
,
"
\
II
,
0.0
I
I
.
I
l
~''''''---.,;//'
0.0
2.0
4.0
6.0
8.0
11
Maximum likelihood estimator
Bayesian estimator
Figure 4.7
-------
Plots of the mean square error for the maximum likelihood estimator and a Bayesian estimator (prior, normal
(3,1) truncated at zero)
58
5.
5.1
IMPROVED ES'r:nvIATOHS
Joining Estimators
It was shown in Section 4.4 that none of the Bayesian estimators
presented have uniformly smaller mean square error than do restricted
maximum likelihood estimators and vice versa.
value of the parameter
j..L
However, if the true
happened to be near the boundary of the
feasible parameter space, an exponential prior has been found which
gave a smaller mean square error of the resulting statistic for
values of
near the bOillldary (see Figure
j..L
mean square error for values of
~
4.5).
t~e
This improvement in
near the boundary corresponds to
sacrifices in mean square error for values of
j..L
away from the bOlw.dary.
The restricted maximum likelihood estimator had larger mean square
error near the boundary, but is vastly superior to the Bayesian
estimators found from exponential priors at points farther from the
boundary.
The Bayesian estimators found from a illliform prior had a
mean square error which was smaller than that of the restricted
maximum likelihood estimator for values of
j..L
in the feasible parameter
space away from the boundary, and larger near the boundary.
(The
uniform prior will not be considered separately in the remainder of
this paper since it can be derived as a limiting case of exponeEtial.
priors.)
All this suggests that a model.er having information only that
the mean,
j..L,
of some normal density function belongs to a certain
half-line might try to combi.ne the better properties of both type3
of estimators.
59
Combined estimators are not, foreigh to
2
Lat.isticians.
In fdct,
the restricted maximum likelihood proc-.edll:!.'es me:'1t.i.oned in Chapter 2
are essentially combined estimators.
If
thf~
illJrestricted estimates
are points in the feasible parameter space, they are the restricted
maximum likelihood estimates.
If the unrestricted estimates are not
points in the feas ible parameter space, a.YJ.other algorithm is employed
to pI'oduce the restricted estimates.
Other- combined estimators have been considered for entirely
different situations.
Bancroft (1944), Mosteller (1948), and Gun
(1965) have studied esti~ation procedures with a preliminary significance test.
Their estimators are found by first testing to determine
if the estimates from several populations are significaIltly different.
If significance is found, individual population estimates are used.
Otherwise, the estimates from the various populations are pooled; note
that the significance levels recormnended for these situations are
larger than the significaIlce levels normally used.
Consider the univariate normal density with a
not less than d.
rr~ean
known to be
An estimator with generally smaller mean square
error could hopefu.lly be created by using a Bayesian estimator derived
from the exponential prior when the unrestricted maximum likelihood
estimate is near the boundary or falls outside the feasible parameter
space.
The unrestricted maximum likelihood estimate would be taken
as the estimate in all other situations.
Finding such aa estimator
which does give a reduction in mean square error is a f,)rmidable task,
A good value
e of
the parameter of the exponential prior must be
found, and the part of sample space in which the maximum 1ikd.ihc,Iod
60
estimator is to be used must be determined
0
Of ,:;ourse, a criterion
of goodrless must be established t:J dictate the ch'Jlces.
502
The Criterion of R.egret
In the framework of statistical decision tbeory, the msan square
error of any estimator is often regarded as the expected value of the
(quadratic) loss suffered when using that estimatcr (the loss being a
consequenee of the fact that the estimator is not equal to the value
of the parameter being estimated).
ThE. expected 10s8 is a function of'
IJ.; its value also depends on the estimati.on procedure used:
the case of Bayesian estimator's, it depends on the
thus, in
e characterizing
a
particular prior within a family of priors; more basically, :it depends
on the family of priors.
Similarly, it depends on whether one uses a
(restricted or unrestricted) maximum likelihood estimator or a Bayesian
estimator.
which is
fronted.
I
The task of somehow combining several estimators; each of
good' for some IJ.-values,
!
poor' for others; must be con-
Now, for each point IJ. in the feasible parameter space we
can determine the infimum of the expected less cm'responding to aLL
competing estimators; the value of this inf'imUI'l will, of couyse,
depend on the class of competing estimators.
Thus, a functi.()!:l of IJ.
which will be called the lower e!1Ve}ope (of the expected 1 CBS fUI1(ticn)
will be defined.
This lower envelope indicates the best "re
Ca::J.
pc·s-
sibly do with the available estimator's if for each exper.'.iment the trvle
IJ.-value is known.
Since this is not kYle-'WYl, the
expe.~ted
108s can be
no smaller than the lower envelope; no mattsr how the estima:,ors
previously studied a:re combined,.
Thus,.it must 'ce ac::.eptJe;J. that the
mean square error of the combined estimatol's for
,[OSi
IJ.-va.l.ues wiJ..1
61
exceed the value of the lower envelope.
The difference between the
two will be cal.led regret (cf. Savage (1.954), Sections 9.4, 9.5, and
9.8, and Savage (1968)).
A corribined estimator will be sought which
minimizes this regret, which again, depends on the class of competing
estimators and, of course, on I.L.
The plan is to define a real-valued criterion summarizing the
behavior of this regret function over the feasible parameter space,
then to select such a
I
combination 1 of the above estimators as to make
this real number as small as possible.
There are many such criteria
available. Gun (1965) suggested using the Ll-norm of the regret
function in the situation he studied.
dates.
Other
Ln -norms are also candi.
Of course, the computation of such norms requiTes selection
of a measure over the feasible parameter space.
A criterion which can
be implemented with less difficulty is maximizing the regret function
over the feasible parameter space and minimizing this over the competing estimators.
Thus, the criterion would be minimax regret.
Minimax procedures are described by Wald (1950) and Savage (1954).
As Wald has stated, minimax is applicable when a particular prior
cannot be justified.
in this section.
This is more in line with the situation proposed
The minimax criterion is a pessimistic approach, but
it does protect against large losses.
5.3 The Application of Minimax Regret to the Construetion of a ,Joined
Estimator
Consider again a sample of size one, y, from a normal distribution
wi th unknown mean IJ.
~
d and known variance
o
ct.
to investigate joined estimators of the form
The obj ective now is
62
IJ.
where
Ct
> d,
lJ.
J
:=
e
for
y<Ct
.'
~L
for
y <::
,
lJ.
a denotes
the Bayesian estimator corresponding to an
exponential prior with parameter
maximum likelihood estimator.
mator thus depends on IJ.,
objective is to choose
Ct
and ~L denotes the (unrestricted)
The regret function for such an esti-
e and
e and
a,
Ct
Ct will be dena ted by R(IJ.,
e,
Ct).
The
so as to minimize
max R(IJ., 8, Ct) •
IJ. <:: d
The pair (a', Ct') which minimizes
combined estimator,
~.~.,
(5.3.1)
characterizes the optimum
one chooses the Bayesian estimator corre-
sponding to the exponential prior with parameter
a'
when the
unrestricted maximum likelihood estimate is less than Ct', rold chooses
the unrestricted maximum likelihood estimate otherwise.
To find the values of
Ct
and
e which minimize
(5.3.1), one first
must determine the lower envelope of the family of mean square error
curves .
~'he
initial step is to determine the lower envelope of the
mean square error (see
(4.3.5))
ing to an exponential prior with
of all Bayesian estimators eorrespond-
e€
(0, 00) •
Then it will turn Gv.t
.
that for no value of IJ. the mean square errcr of tbe restricted maxiffilIDl
likelihood estimator or the mean square error of the Bayesian estimator corresponding to the uniform pri.or is less than the constructed
lower envelope.
Therefore, this lower envelope is the lower envelope
for the class of competing estimators mentioned in Section 5 ..2.
approximation for it was f'OUlld by numerical methods.
An
This will be
done first for the case of d=O and
2
(1
:=
1.
'I'a:ble 5. 1 gives the
]'01' ~
approximation to the lower envelope that was found as follows.
equal 0.1 or 0.2, candidate values for e were found by increasing e
For values for ~ €
0.1.
by steps of length
[0.3, 8.oJ
such candidate
values were found by incrementing e either by 0.1 or by half the
difference in the optimizing e for the two preceding values of
~,
By comparing the val.ues in 'I'abl.e 4.1 with the
whichever was larger.
val'.les in Table 5.1 the reader wi.l.l eas ily convince himself that the
function tabulated in Table 5.1 gives the sought after lower envelope.
The next step was finding the mean square error for the joined
estimator.
This mean square error is (cf. equation (4.3.4))
S~ [y
1
e
- l/e + f(y - l/e)/F(y - lie) -
-(y-~)
2
/2
~J2
dy
J2TI
+
eD
S
Ct
(y-~)
for any given values of
2
1
J2n
Ct
and~.
The second term in this expression
reduces to (cf. equation (4.4.1))
(5.3.3.)
where f(x) arId F(x) are normal density and distribution functions
respecti vely for the uni vCl.riate normal di st;ribution with mean zero
and variance one.
64
Table 5.1
I-L
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3.0
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
4.0
Approximation for the lower envelope of the mean square
errors for the estimators derived from the truncated
normal posterior formed with exponential priors
e gwmg the
minimum MSE
0.1
0.2
0.3
0.4
0.4
0.5
0.6
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.4
1.5
1.7
1.9
2.1
2.4
2.7
3.0
3.4
3.7
4.2
4.7
5.5
6.2
7.0
8.1
9.3
10.4
12.1
14.7
17.2
19.9
22.4
26.2
32.0
37.7
M.inimum
MSE
0.00010
0.00156
0.00617
0.01551
0.03034
0.05150
0.07704
0.10812
0.14293
0.18153
0.22284
0.26607
0.31062
0.35597
0.40132
0.44628
0.49040
0.53311
0.57427
0.61357
0.65079
0.68582
0.71854
0.74895
0.77700
0~80276
0.82625
0.84761
0.86690
0.88423
0.89972
0.91352
0.92572
0.93648
0.94592
0.95415
0.96130
0.96748
0.97280
0.97735
I-L
4.1
4.2
4.3
4A
4.5
4.6
4.7
4.8
4.9
5.0
5.1
5.2
5.3
5.4
5.5
5.6
5.7
5.8
5.9
6.0
6.1
6.2
6.3
6.4
6.5
6.6
6.7
6.8
6.9
7.0
7.1
7.2
7.3
7.4
7.5
7.6
7.7
7.8
7.9
8.0
e gIvIng the
minimum MSE
43.5
52.2
65.1
78.1
91.1
1l0.5
139.7
168.9
212.7
256.5
322.2
420.7
519.2
667.0
814.8
1036.5
1369.0
1701.6
2366.6
3031. 7
4029.2
5026.8
7022.0
9017.1
12010.0
16499.0
23232.6
29966.2
43433.9
56900.6
77101.7
107403.2
152855.3
221033.5
323300.6
425568.0
630102.3
936904.0
1243705.7
1857309.1
Minimum
MSE
0.98122
0.98451
0.98728
0.98960
0.99154
0.99315
0.99448
0.99558
0.99647
0.99720
0.99779
0.99826
0.99864
0.99894
0.99918
0.99937
0.99952
0.99963
0.99972
0.99979
0.99984
0.99988
0.99991
0.99994
0.99995
0.99997
0.99997
0.99998
0.99999
0.Q9999
0.99999
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
1.00000
The first term of (5.3.2) must be evaluated by numerical proce
dures.
An algoritr.iIll very useftD. in the mini z'1.tion over
Ct,
evaillates
this term by the Hermitian forr!lU-la using the first derivative (see
System 360/Scientific SU'Jroutine Package (1970) subprogram DQHFE, or
Hildebrand (1956)).
This algorithm approximatES the value of the
integral at several equidistant points over the interval of integration
as follows.
Define
Z.
J.
== z. (x.) :=
1
1
S xi
y(x)
a
dx
at equidistant points xi which satisfy the following relationship
X.
1
--
a
+ (i-l)h .
The value of zl is assigned to be zero, and all other values of zi
are found by the formula
z.1 - z.1- 1 + h(y.1- l+Y·+h(y~
1 - y~)/6)/2
.
1
11
where y~ is the derivative of the function y(x) at x. and y. is equal
1
to y(x.).
1
1
1
The maximum error will be less than
4 4
s h y (v)/720
4
where s is the total length of the interval aIld y (v) is the fourth
derivative of y(x) eval.uated at v
€
[Xl' x ]
n
Therefore, the first term in (5.3.2)
a
S_ro
Cy
- l/e + f(y - l/e)/F(y- lie) - fJoJ
.
2
66
could be evaluated at several values of
(5.3.4) is very near zero for
Ct
Ct
in one pass.
The value of
In Figure 4,2
ywt greater than -10,
it can be seen that 1\11' the unr'estricted maximum. likelihood estimator,
would be less than
~E
for y less than zero.
For y less than 10 and
..
greater than zero, it is easily seen that (~E -~)
2....
<
(~1
2
- ~) .
Thus,
S 10
_
CD
[y - l/e + f(y - l/e)/F(y - l/e) - ~J
<
S
10
(Y_IL)2
...
- CD
1
J2TI
- ( y_IL)
e'"
1
2
J2TI
e
_(y_~)2/2
dy
2/2
dy
The last integral can be found to oe
-(-10 - ~)f(-lO - ~) + F(-lO - ~)
following the steps outlined in (4.4.1).
Since
~ ~
0
f(-lO - ~) ~ f(-lO)
and
F(-lO - ~) ~ F(-lO) ,
and both f(-lO) and F(-lO) are very near zero; therefore, (5.3.4)
can be closely approximated by evaluating it over the interval
[-10, CtJ.
In this way regret was computed for values of
greater than zero.
Here, the values of
~
€
~,
8, and Ct, all
[0.1, 8,0] were integer
multiples of 0.1; joi:qing points, Ct, were allowed to take on values
between 0.25 and 5.0 which were integer multiples of 0.25; values of
~
67
e were
0.25, 0,5, 0.75, 0,875, loU, 1.'5, 2.0, and 2,5,
Table 5.2
gives the maximum regret for each (e,~) pair considered, where regret
is the value of (5.3.2) for each 8, ~, and j.J. considered minus the lower
envelope given in Table 5.1 for that value of j.J..
As can be seen in Table 5.2, the values
e'
and ~' which minimize
(5.3,1) seem to lie in the intervals
0. 75 ::;
e'
:$
L
°
and
L 25 :$ ~' :$ 1. 75
and the associated regret is at most 0.4799L
The maximum likelihood
estimator from. the same likelihood function has a maximum regret of
0.58386 (when compared to the same lower envelope).
The Bayesian
estimator from the uniform prior and the same likelihood has a maximum
regret of 0.91544.
j.J.J
The optimal joined estimator is given by
y - 1/.875 + f(y - 1/.875)/F(y - 1/.875)
= y
Note that j.J. is discontinuous at y equal 1.50, which
J
<
for
y
for
1,50::; y .
re~ults
1.50
in an
interval of feasible j.J.-space being unattainable by this estimator.
So this estimator which is the first result of an attempt to construct a small mean square error estimator onto the feasible parameter
space, fails again 'to exhaust the feasible parameter space.
following section will discuss remedies for this.
The
Table 5.2
Maximum regrets for joined estimators
e
GV
0.250
0.500
0.750
0.875
1.000
1.500
2.000
2.500
0.93493
1.04432
1.17629
1.32916
1.50075
1.69316
1. 90199
2.12449
2.35616
2.59129
2.82360
3.04708
3.25652
3.44794
3.61877
3.76780
3.89493
4.00099
4.08747
4.15626
1.18827
1.34450
1.51911
1.71163
1.91475
2.12679
2.34798
2.57122
2.79261
3.00735
3.21102
3.39976
3.57070
3.72205
3.85313
3.96414
4.05603
4.13027
4.18864
4.23358
1.33428
1.51355
1.70844
1.91584
2.13144
2.35314
2.57464
2.79291
3.00438
3.20498
3.39130
3.56061
3.71118
3.84223
3.95389
4.04697
4.12281
4.18309
4.22963
4.26433
1.42621
1.61907
1. 82655
2.04095
2.26498
2.48818
2.70802
2.92155
3.12555
3.31649
3.49162
3.64891
'3.78727
3.90648
4.00706
4.09013
4.15717
4.20993
4.25020
4.27981
Maximum Regret
0.25
0.50
0.75
LOO
1.25
L50
1. 75
2.00
2.25
2.50
2.75
3.00
3.25
3.50
3.75
4.00
4.25
4.50
4.75
5.00
3.07098
3.49415
3.86061
4.14993
4.35675
4.48888
4.56236
4.59578
4.60572
4.60440
4.59934
4.59428
4.59054
4.58822
4.58694
4.58630
4.58602
4.58590
4.58585
4.58583
0.53246
0.53429
0.52072
0.49140
0.53673
0.61666
0.73520
0.89043
1.08054
1.30178
1.54858
1.81380
2.08932
2 ~3666 7
2.63773
2.89539
3.13399
3.34964
3.54016
3.70499
0.55373
0.56364
0.55959
0.53712
0.49792
0.48414
0.55142
0.64532
0.76166
0.89494
1.03829
1.18555
1.33086
r.46924
1.59648
1.71115
1.81087
1.89621
1. 96708
2.02414
0.56275
0.57622
0.57641
0.55870
0.52453
0.47991
0.49056
0.45494
0.65849
0.76550
0.87963
0.99545
1.10815
1.21384
1.30973
1.39417
1.46650
1.52683
1.57582
1.61448
(j\
co
69
5.4
Other Joined Estimators
As a combination of two estimat.ors, one Bayesian estimator and.
the maximum likelihood estimator, an
est~nator
was created in Section
5.3 which had smaller maximum regret than
any of the classical estima-
tors previously considered in this paper.
This suggests that maximum
regret could be decreased further by combining several Bayesian
estimators with the maximum. likelihood estimator.
'1'0 explore these
possibilities, the case of a sample of size one from a univariate
normal density will again be examined.
This section will consider
the case where the mean of this density is known to be non-negative
(~.~., d=O) and the variance is one.
Instead of attempting to find one optimum interval as was done
in Section 5.3, the domain of the observation will now be divided
into several fixed intervals.
are given in Table 5.3.)
(The intervals considered in this case
A search was carried out to find the optimal
Bayesian estimator (exponential prior) for each interval in Table 5.3;
the maximum likelihood estimator will be used for [5.00,
CD ).
The mean square error for such a joined estimator where q+ 1
intervals are considered is
(5.4.1)
where a
l
= 0, and p(y) is the normal density function with mean I.J.
and variance one.
The estimator
estimator and is a function of y.
~L
is the maximum likelihood
The estimators I.J.. are the Bayesian
l
70
Table 5.3
Values of the parameters j..L and e and the intervals
used in the st.epwise optimL<.ing process a
;:;q;"=
~~=
~
Values of
e used
Values of
j..L used
Intervals
;;on8 ide red
0.125
0.2
( .. co " 0000)
0.125
0.250
0.4
[0.00, 0025)
0.250
0.375
0.6
[0.25, 0.50)
0.250
0.500
0.8
[0.50, 0.75)
0.375
0.625
1.0
[0.75" 1.00)
0.875
0.750
1.2
[l.00" 1.25)
1.250
0.875
1.4
[1.25, 1.50)
1.250
1.000
1.6
[1. 50, 1. 75)
1.250
1.250
1.8
[1. 75" 2.00)
1. 750
1.500
2.0
[2.00, 2.25)
1.750
1.750
2.5
[2.25, 2.50)
2.000
2.000
3.0
[2.50, 2.75)
2.500
2.500
3.5
[2.75, 3.00)
2.500
3.000
4.0
[3.00, 3.25)
2.500
3.500
4.5
[3.25, 3.50)
2.500
4.000
5.0
[3.50, 3.75)
4.000
4.500
5.5
[3.75, 4.00)
3.000
10.000
6.0
[4.00 0 4.25)
4.000
11.000
6.5
[4.25, 4· 050)
3.000
12.000
7.0
[4. 50" 4.75)
3.500
[4.75 0 5.00)
2.500
13 .000
Optimal value of
e on each interval
14.000
a
The meaning of the first two columns is explained in the text
71
est.fmateI's found by as signing d,ii'i'erell t e:<:ponential priors, eha.raetf:'':··
ized by the.: r parameter 8., em e&,ch iJ.-inte:,:"val and theref\;;re they are
1
l'Lucticns of ;y and 8 ..
1
'L'he
p'.coblem is trlen to chc()se the parameters
so as to minimize
,- 4 '"-?)
( J.
where LE(IJ.) is the above-mentioned lower envelope.
This problem of finding an optirnu."rl
. estimator in each int,erval
evokes memories of dynamic programming.
(1962).)
(See Bellman and Dreyfus,
The interva.ls correspond to the stages in the dynamic
progrannning problem, choosing the 8.1 on each interval corresponds
to
.
the activities, and maximizing regret corcesponds to the objective
flllction which is to be min:U:nized.
Howev€!:, the problem of finding
the 8. so as to minimize maximum regret cannot be restated in terms
1
of a recursive relation since the choice c,f the 8. in anyone ,:'nterval
.1
affects the maximum regret fun,etian as a whole.
'rhis property
vi~)late,s
the underlying assumptions of dynamic p:rogl'ammlng.
Thus, to deterrr.ine the cholc;e of (8 , 8 , ... , 8 ) wh leh wc;u~id
1
g
2
trlily be optimal would require the evaluation ':,f ('5.4.2) for all l',)int.s
in a q-dimensional space.
Note that the eva.!.Jlatiorl (if (5.4.2) is qu..:.te
costly
j
even at one point
(a.-I '
P
8 , .
2
H, aq ).
2/
.~
To reduee
C;OT:rpV.tpI'
costs, cnly relat i vel:y few alte'Yl.ative values for each courdinate
al
werE: examined; these are given in the first column of Table
Also,
~i.3.
0
in determining the maximlJl11p in (5.4.2) only a few !J.-valueE WF.:re llsed;
see thf-c second column of Table 5· '3.
Even so, the cost of the cOrr...tlmtationalwork is p:'C'obibiti.ve.
'Therefore, an approximation was used. which is simil.ar to the stepwl se
inclLLsionmethod. cf regression an.aJ,ys.i.s
constructed sequent.i.all:y as follows.
'-
wa.s
F'irst define IJo 0 as
c
e
with
A joined estimator, !J.p"
0
10
for
Y<5,
for
y
~
5 ,
where lJo (y) is the Bayesian estimator corresponding to the exponential
e
prior on [0,
estimator.
where
a is
CD )
with
e equal
Then define
II
""c:l
10, and ~L (y) 1s the maximum likelihood
as
ferr
y
~
0 ,
for
:v <
0,
chosen from among the candidl),te vuLu,::,s listed in Tab Ie :)·3
,g!The first integral in (5)-1-.2) is
evalUEi.~,ed a.s
a.
f ··11);,
fur rea..sc,ns
given with respect to (5.304), All but the :Last two terms are
evaluated using the stibprogrmn DOHFE i.n "the System 360/Scde:(lUfic
SubrolJ.tine Package (1970). The next t.o last term is identic3.1 :,0
(5.2.3) and is evaluated using the n.ormal density and distribution functions given there.
in such a way that j.J. .. (;y-) wiTl Lave the sma11.,:st p03fdbleic8Jcimum
C,l..
regre.t
Then defi.ne
0
II.
r- c 2
as
1
for
:.l
E:
R~![ CL 0, 0025)
,
for
e is
where
chosen from among the same Candidate va:!.ues so that
will have the smaLlest pos;3ible
rr~a.xim.!JEl
regret.
f.L
c2
(y)
This process is
exce:pt on the y-
cont.i.nued so that j.J.c3 (y) will
interval [0.25, 0.50), where
and
e is
again chosen so as to minimizE' ma.xiY!lurn regret of j.J.. (y), aIld
cj
Eventually the e!ltire y-intervaJ [0.0, 5.0) will be divided
so on.
into intervals of length 0.25 and. on each int.erval se(lUentiall;y, the
e will
parameter
at the i
th
step.
be chosen so as to rnin.imize maximum regret "'f
.. j.J. c: i
The rna.xiIfl~:un regret of j.J.-l (y) .1 s equal to 0.304207,
c2_
which is indeed sUbstantially less than the value found for the ,io:Lned
estimators presented in Section 5,3.
at each stage are shown in Table 5.3,
estimator is discontinuc'us at
ma{l~
The optimal values of
It shou.l:l be noted that this
of the ;;21 ,jo.:!n points,
It wO'.lld be desirable to refine th,ec
y-~i[j;erva:Ls
ar..d
val~e.:;
attempted in the preceding process, bu.+: this :is t·.:;o costly
tion.
for
If' the prr)cess could be contin.ued
e in
t~rrn8
of the
observatio~lS
C(.Juld then be slibstituted. for
e 1:0
e chosen
H.
all
e
cpera-
continu.':'us flLncticxcc,
could be fuued,
,":,;f
e ),
This fUl1ctL:,r;
the ex:prs8s10r, :Cor the Ba;v2::.daJ:l
74
estimator andwoul.d yield an es timato!' whieti vJ,Yu.ld giVE;; a IllE1X:hr.um
la:r l.i.kel:ihood.
0
e listed
Using the valCles of
tions e(y) were constructed.
i.n Table c;,3 as a basis some func-
The Di:S,ximUln regret for the estimatc'I'S
fuund by snbBtituting theBe functions o!'t,hco observations for
e were
then fOUlld by approximating the mean square errors by using the s11bpr()gr~
~.~.,
DQH64 in the Systerr.. 360/Sc.ienti £'i c Subrcutine Package (1970),
Gaussian-Hermite quadrature.
As for the construction of these functions e(y), first consider
Figure 5 0l, which depicts the e-values of Table 503,
'The variability
of these e-values after the twelfth interval cOll.id be ignored in
searching for fWlCtions e(y).
When the observation y is large and
e
is large, the Bayesian est:irnate tends to the maximmll likeliho()d
estimate as was shown in Section 4.2.0
the
e' s
of
e or
Therefore, the variability of
is most likely due to the B8..yesian estimate for these values
arw larger
e not
being
s.ign:i.f'.ic~ant.1:y
different from the
max:imum .likelihood estimate for observations that are large.
Notice in Figure
5.1
that when the obS2rvation is larger than
0.:'), when one ignores the variability of' e f:yr' obse!'vations larger
than 2.75, a linear function of the observations (y) seems te, fit~
the values of e as they depend on y.
for several I.i.near f'unct.iorls e(y).
The Elaximu:rn regret; was Cc.'illputed
'Table 504 givestte funct,.lcns tbat
w'ere considered, the interval on which t,he 1in.ear Y'mr;t:ion
aDd the value used for e on the rernaindE:r cfthe
observation, in colun-L."ls 1, 2,s.nd 3 re'3pcctively,
dcn~ain
1Iiaf
used,
of the
Nc,te that t.hese
75
4
3
Value of
e
2
1
1 2 3
4
5
6
7 8
9 10 11 12 13 14 15 16
Interval
Figure 5.1
A plot of the optimal value of
optimizing procedure
e
found by the stepwise
76
Linear functions of the observations used in place of
in the exponential prior
Table 5 4
0
Linear function
e = ~O.05 +
Y
S, the set on which
the linear function
was used
Value of
e used
on SC
b
Maximum
regret
Y
e [O.l,ooJ
0.1
0.361671
e
=
~O.25
+ 0.75y
y
E:
[0.2,ooJ
0.1
0.387803
e
= =0.10
+ 1.50y
y
E:
[0.1, co ]
0.1
0.441211
=0.70 + 1.50y
y Ii:
[O.5,co]
0.1
0.550220
e
e
e
=
~·1.13
+ 1.14y
y E:
[0.75.ooJ
0.125
O.552f06
=
~1.38
+ 1.63y
y
[1.0,00 j
0.25
0.569923
E:
77
i\~1::tions
gave some improvement .in the maximJI.rJ regret
of t;wo es+:·imators.
giv~:~;
Figuc::E' '5.2
CVf.j"(·
'::;b::
,jo.in~
a plotting ()f the meaIl square
er'rcr fa::' thE; estimat)T fe,und by Gsl:.'lg tbe linear f;m.c:tion
e -0 . 025
+ O. 75y
when y is greater than or equal to 0.2, and letting 8
.1 otherwi.se.
This would yield the estimator
:i -
j.J.
1/ ( -() •0:::-:5 + 0.75:-1)
+ fey - 1/(-0.025 + 0.75y)J/F[Y - 1/(-0.025 + 0.75y)J
for
y - 10 + fey - 10)/F(y - 10)
5.5
y < 0.2 ,
Extending the Technique
All of the work so far reported in this chapter wa.s concerned
with a samp;t.e of size one from a univariate nonnal density with
7ariance one and mean, jJo, not. less t.han zero.
for any sample of sizE' n
~
Analogous results h'Jld
1 from any univariate normal density wi.th
Cd,
a mean known to lie in the interval
CD )
0
Given a sample size
n and mean y from a normal dist.rib:ltion with unknown
jJo
2: d- anIi
eXJ)ecj.~at:'on
kr:lOwn VarIance
"
1"
. l.lll1C·l0n
n
t'
th e mean, j.J.,
0 2 , .tl.:Jle .1"k
I elh()Gd
I~ Col' ..
is
_)
L ( jJo 1Y
By Sllbst.ituting
the~i
r;·f
y
-
ex:
e
2
-n(y-jJo) /20
2
for y and 0 / n for
cl
2
in the fcllowing re3lilts,
are appli.::ab12 to samples of' anY size
110
78
"
/
.,-- ......
/
I
1.0
/
...... ,
"
............ .............
.............
·
I
I
I
I
I
Mean
Square
Error
I
I
0.5
I
/
I
I
I
/
0.0
0.0
2.0
4.0
6.0
8.0
Lowe r enve lope
Bayesian estimator
Figure 5.2
-------
Lower envelope and mean square error for a Bayesian
estimator using a continuous function of the observations
for the parameter
79
'The posterior using the expo:nential
pric~
is
o
elsewhere .
Using (4.1.2) the mean of this poster::Lcr is fC'lmd tc) 'be
~
.: y _
i/ e +
_Of((;Y~-=-O~/ e_=--_~Jj 01
(Fie -
F'((y -
d)/o)
The mean square err'Dr of this estimator is a function of' jJo, 0,
e,
defined for
e>
0,
(1
> \), jJo
d:
2:
~.J
(Y-jJo)~
1
2
J
,} J R'[jJo
-
(jJo-d
f
\ 0
Q+
e
+
F' (jJo-d
-- +
:=
e
.Q)
e
-
tl
0
where the transfoI1nation y
- Q)
11
jJo + Otl was used.
2
2"
o
e
1
J2
e~2" d.
2
du
$
Note that replaci:rlg d
-)i;
by 0, and "by 1 yields e, function X
r(jJo +
(1-",8), defin,=d for 8> 0,
U
_
l::)
e J2
e)
• '1
F'(jJo +- u ~
a..'1.d.t~hat
~)
o
jJo 2:
0:
20
both f1.mctions defined on their appropriate domain.
square error for the estimator giveQ in
(5.5.1)
Thus tbe mEan
is expressed in terms
of the mean square e:['ror for the Bayesian estimator found when the
variance is one, the feasible parameter space is the positive half
line, and the prior is an exponential density with parameter
!.~.,
the situation discussed in Section
5.3.
a/o,
Therefore, the lower
envelope fOT the mean square error for estimators of the form (5.5.1)
and variance
equ8~
tc 0
2
can l,e found in Table
~5.1.
The v8.Lw of the
minimum meaD square error at the point l.J.' in the feasible parameter
rl
space equ.als
times the value of the minimum mean square error at
the point (l.J.' - d) /0 in the feas ible parameter space [0,
in Table 5.1.
00 ),
as found
Likewise the values of the regret function in the
general case can be found from the values of the regret function in
the special case of the previous sections.
In Section 5.4 we considered estimators for l.J. ;::: 0 from samples of
size 1 from N(l.J.,l), where these estimators were obtained from Bayesian
estimators with exponential prior, -6e -l.J./
tion
e(y).
e,
The mean square er.ror for such
by replacing
e by
a func:-
estimator is defined
&'1
for l.J. ;::: 0 and equals
SR'
[y - l.J. -
e(y)
f(y -
+
ety))
---~l'"-
F(y -
eTYJ)
2
)
e
In the more general case we cons irier es·timat.oI's for ~' ;::: d from
sampes
1
.",. (,l.J. , ,02 ...I, were
h
. 1·
estimatcJl'S
f Slze
one (y ' , say ) f rom 1,
tuese
(>.
are obtained from Bayesian estimators with exponential prior ~
-( '-d)/e ,
-8e ,l.J.
by replacing
e by
a function 8' (y').
The mean sguhres
81
j.J.' ~
error for such an estimator is defined f·or
f(~
(J
(J
d and equals
,
eI (y I ) )
- e (~
I
') )
This integral is easily seen to reduce to the ahcve one by means of'
the substitutions
y
e' (y')
:=
(J
e(y)
Note that these substitutions are compatible with the reduction in
equation
(5.5.2).
Therefore, i t is possible to find the value at
of the regret function of any estimator for
(J
y'
where y'
rv
2
eI (y ')
+ cr
f
j.J.' ~
j.J.'
d of the form
(~
(J)
cr - e I (y ')
F(~
cr)
o - eTYT)
N(j.J.' ,i), from the vallIe at j.J. of the regret fWlction of the
estimator for j.J.
~
0 of the form
1
8G1
Y -
f(y
+
F(y _
where y '" N(j.J. ,1), simply by taking 0
t ·lon a;t j.J.
regret. f
une
:=
1
-e(y))
2
1 )
em
times the value of the latter
j.J.'- d ~
<>n'd cornpll-ted
wJ.-~_t.. 1,-l e(y)
,_L
(J
e'·(~~~Q'l_)
(J
Similarly, when one has found the fUllr.:tion e(y) which minimizes
maxi.mum regret for the problem of est,imating j.I. ;;:: 0 from one C:bservation of y
rv
N(j.I.,l), one can immediately conclude that the fill1ction
e' (y') which minimizes maximum regret for the problem of estimating
j.I.'
~
d from one observation of y'
rv
N(j.I.',f,2) is given by
y'- d
e'(y') == 0 . e(y) == 0 . e(-o-) .
The mean square error for t.he joined estimator considered in
Section 503 was given by (503.2).
Making the transformation u
'-=
y-j.l.,
this expression becomes
J
Ct_"!,
'-A-J
(u _ lie + f(u + j.I. - l~e) _
F(u+ j.I. - 1 e)
2
e -u
/2
du +
J
CD
Ct-j.I.
u
e
2
For the case of the normal density with
-u
2
1
J2TI
/2
du .
j.I.'
E:
Cd,
CD )
and variance
the mean square error for the joined estimator would be
,)
f((y - o"-;e' - d)/O)
F((y - 02/ e,
1
-
d)/o)
2 _2
e
- (y-j.l.') /20
dy
J2TI
2
e
,~
_ (y_j.I. ') / 20c:.
dy.
J2n
Making the traxlsformation u ==
~
o
, this expression becomes
0
2
,
S
(01'- CD
1-"')/0
[u -
i/e'
o/e'
f(u _.
-------7r=-~J
+ 0
o/e'
F'(u -
(5.5.4)
would equ.al
i
ella
stitutions (1-"'- d)/O for 1-",
The optimal values of
01
and
2
(5.5.3)
times
e
for
0.75 $
e$
1.25 $
01
1
e'
(5.5.3))
1.
$ 1.75 .
e' /a $
1
e'
0
or
o. 75
The optimal choice for
0::
0 $
$
e
du
2
e
/2 du .
-u
(5.5.3)
for the case
for the general case would be
0·75 $
J2TI
when in
and
So the optimal
_u2 /2
1
and (()('- d)/a for
e found
variance one (~.~., the case given by
2
+ I-" ; d)
1.1
Then
1-"' - d)
0
-
+
.
would be
or
1.250+ d$OI' $1.750+ d.
were
1.10
the sub01
€
are made.
[0,
CD)
and
'5,6
Est:lr!lating TI<TO OrCier2d Parameters
'It,,: rlet;l!ods in this ::hapter can be ,spplied in
the
f'2,rameter model 1:1. whi,c;h t,he
tvJO
order·:d.
Wi,thant loss
(,tC'
.J,
~
two pal"iillletE:;:rs
gen'-Y"llty"
'8V
II.
<
II.
,e",o,_, ,
,0;
""'1""1
- 1""2·
•
sim.ilar man:::,'.::!, ':,u
are Y..Ilcwn to
t,~
The Eay,;os ian
est,ima-tors fer the two pa:r'a..'TIeters when a :mifc.rnn prior is cus,Surned a.re
given it' (803.19).
For the case of a sarr-ple of size one from a
'ulvariate normal diEtributic,n with covariance matrix
WOJJd
(lr,
arid fur the
be
(5·6,1)
+
wbere
f( (Y2
F( (Y2
As a measure analcgous to mean square ereor for this vector-·valu.:':d
estimator, the following sCf:Llar expressioE will be used
~)
(~
I
which in this prob,1em eQuals
S00_ 00 S00_ 00 [(''Yl
+ (y'2 - 20
o
2
/e
+ AO/
+
202/
J2 -
e~,,-J
c:.
2
r. ,~ !J.l),2
AO',/ ,.;2
J
1
85
The first two terms are the variances of Y and Y , respectively.
l
2
making the transformation
ul
=
Y2 - Yl
u
2
=
Y + Y
l
2
more tractable expressions can be found for the r'emaining terms.
Thus,
By
86
In this case
2
- 20 /
e) / (0 ,/2) )
After integrating with respect to u expression
2
involve only one variable, ul
between 1-1
2
and 1J.
Expression
1
= Y2
- Yl'
(5.6.2)
is seen to
Nute that the difference
completely specifies the mean squa:re error.
(5.6.2)
can be evaluated Llsing the same numerical
methods as were used for the one parameter case considered previously
in this chapter.
2
0
=
The lower envelope for values of 1-1
2
- 1J.
1
and
1 was approximated in exactly the same manner as the univariate
case.(See Table 5.5, where ~ 1-1
= 1J. 2 - 1J.1 ·)
Section 2.3 discussed the restricted maximum likelihood estimator
for (1J.1' 1J. ) under the given conditions.
2
this estimator is given by
The mean square error for
87
Table 5.5
Approximati.on for the lower envelope of the mean square
error for estimat.ors of ordered pa rame t.e 'r s
J
.
=:;z:z
~
e giving the
minimum MSE
Minimum
MSE
~
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2..9
3.0
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
4.0
2.1
2.1
2.4
2.6
2.7
2.8
3.0
3.2
3.4
3.6
3.8
4.0
4.3
4.6
4.9
5.2
5.5
5.8
6.2
6.7
7.1
7.6
8.0
8.7
9.4
10.0
10.7
11.4
12.4
13.4
14.4
15.9
16.7
18.6
19.5
21.9
24.3
25.5
28.4
31.4
1.10346
1.11865
1.12878
1.14301
1.15793
1.17420
1.19115
1.2091'7
1.22811
1..24787
1.26842
1.28972
1.31166
1.33421
1.36723
1.38063
1.40435
1.42832
1.45243
1.47659
1.50071
1.52469
1.54848
1.57196
1. 59509
1.61778
1.63997
1.66161
1.68261
1. 70297
1.72263
1.74156
1.75973
1. 77710
1.79369
1.80946
1.82443
1.83858
1.85193
1.86448
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
5.0
5.1
5.2
5.3
5.4
5.5
5.6
5.7
5.8
5.9
6.0
6.1
6.2
6.3
6.4
6.5
6.6
6.7
6.8
6.9
7.0
7.1
7.2
7.3
7.4
7.5
7.6
7.7
7.8
7.9
8.0
e glvl,ng the
mini.mum MSE
34.4
37.3
40.3
44.8
49.2
55.9
62.5
69.2
'75.9
82.6
92.6
102.6
117.6
132.6
147.6
170.2
192.7
215.2
249.0
282.8
316.6
367.3
418.0
468.6
544.7
620.7
734.7
848.7
1019.8
1190.9
1361.9
161.8.5
1875.1
2259.9
2644.8
3029.7
3607.0
4472.9
5338.9
6204.8
=-~
Mintmum
MSE
1.87625
1.8872,5
1.89752
1.90707
1. 91592
1.92411
1.93166
1.93861
1.94497
1. 95080
1.95611
1.96094
1..96533
1.96930
1.97288
1..9'7610
1. 9"7899
1. 98158
1. 98388
1. 98594
1.98775
1. 98937
1.99080
1. 99205
1. 9931a
1.99411
1. 99495
1. 99568
1.99632
1. 99687
1.99734
1.99775
1. 99810
1. 99840
1. 99865
1. 99887
1. 99906
1.99921
1. 99935
1.99946
;n exp( -
~(~
-
~)' (l..
-
~) )dL1 dy 2
The first integral is ~ the sum of the variances of Yl and Y .
2
making the transformation
~
= ~ - ~,
By
and then
1
u
J2
expression (5.6.3) becomes
A)
MSE ( 1Jo..
"'"ML
-
-
1 + -1
2
J X;J
-00
CD
(~l-
"""2)/J2
( v 2 -I-
1
v2) -1
2
2n
e -v'v/2
_:!...J dv dv
1
2
(5·6.4)
Here the functions f(x) and F(x) are the density and distribution
functions for the univariate normal distribution with mean zero and
variance one.
Thus, the mean square error for the restricted maximum
likelihood estimator is a fU!:lction of (""2 - ""1) also.
The maximum regret for the maximum likelihood estimator calculated
for values of b. "" = ""2 - ""1 given in Table 5·5 was found to be
0.420623.
This compares favorably to the maximum regret of 0.835614
found for the Bayesian estimator using a lL'1.i.form prior.
To determine if these maximum regrets could be decreased even
more, the process of joining Bayesian estimators (for exponential
priors) with maximum likelihood estimators was examined for this case.
Following the procedure outlined in Section 5.4, an attempt was
made to find the optimal value of
This was done for the case of 0
e and
values of
2
e on
=1
each of several intervals.
and the ordering ""2
~
""1'
(The
b. "" that were used are given in Table 5.6.)
The mean square error for these combined estimators can be
written as in (5.4.1) for the eXhaustive sets Ii'
i
=
1,2, ... , q+l,
"
MSE(~)
"
(~L - ~)/(~L - ~) P(~)dyldy2
Here the functions ~. (~, e) are the Bayesian estimators from exponential
l
priors, and
~L
is the restricted maximum likeli.hood estimator.
Since
the interval I + 1 will be required to be in the feasible parameter
q
space,
~L
"
~L
= y...
is the unrestricted maximum likelihood estimator,
l· ~. ,
The term (y.. - ~) I (y.. - ~) is found to occur in everyone
of the q+l integrands in (5.6.5).
Thus, (5.6.5) can be expressed as
(See the derivation of(5.6.2.))
90
Table 5.6
Values of the parameters e and I-L and the intervals
used in the stepwise optimizing process for two ordered
parameters a
Value!3 of
e used
Values of
4l. used
Intervals
consi.dered (Y2
Yl )
Optimal value of
on each interval
e
2.0
0.2
[co , 0.00)
4.0
2'05
0.4
[0.00, 0.25)
2.5
3.0
0.6
[0.25, 0.50)
2.5
4.0
0.8
[0.50, 0.75)
2.5
5.0
1.0
[0.75, l.00)
2.5
10.0
1.2
[ 1.00, 1.25)
10 .. 0
15.0
1.4
[ 1.25, 1.50)
10.0
20.0
1.6
[ 1.50, 1.75)
5.0
25.0
1.8
[1.75, 2.00)
10.0
30.0
2.0
[2.00, 2.25)
10.0
40.0
2.5
[2.25, 2.50)
5.0
50.0
3.0
[2.50. 2.75)
5.0
75.0
3.5
[2.75, 3.00)
10.0
100.0
4.0
[3.00, 3.25)
10.0
125.0
4.5
[3.25, 3.50)
10.0
150.0
5.0
[3.50, 3.75)
10.0
200.0
5.5
[3.75, 4.00)
10.0
250.0
6.0
. [4... 00, 4.25)
10.0
6.5
[4.25, 4.50)
10.0
7.0
[ 4 • 50, 4.75)
10.0
[4.75, 5.00)
10.0
aSee the text for an explanation of columns one and two
91
.,
MSE(~)
Here
f((Ul
A.l
- 2i/s i )/(0 ,j2))
F( (ul - 202/6i)/(0
j2))
The variable ul is y2 - Yl' so the intervals I i are found by dividing
the (Yl' Y2)-Plane into disjoint sets based on the values of Y2 - Y ·
l
(The intervals used for this example are also given in Table 5.6.)
The stepwise optimizing procedure described in Section 5.4 was
utilized to gain some idea of a proper function to use for
minimizing maximum regret.
e in
The values of S for each interval which
optimized the minimax regret by this procedure are listed in Table
5.6.
Using the Ba;yesian estimator corresponding to the listed S on
the appropriate interval and the maximum l.ikelihood estimator for an
observation in which y 2 -Yl
regret was 0.316849.
~
5·0 yielded an estimator :whose maximum
(Figure 5.3 gives a plotting of tne mean square
error for this estimator.)
To determine if the stepwise optimizing algorithm could be
improved upon, the maximum regret was fOillld for several other comb inations of the
e' s
for the various intervals,
The Bayesian estimator
93
for the exponential prior with OEe of the values of
a listed
was llsed
on the following intervals when
y
2
- Y
1
e [-
e
00, 0)
Y2 - Yl e [0.0, 0.25)
:=
2.0, 2.5, 3.0, 4.0;
e = 2.0,
6
:=
2.0, 2·5, 3·0;
2.5, 3.0, 4.0;
6
e
2·5, 3·0;
:=
2·5, 3.0, 4.0, 5.0, 10.0;
and when
e = 10.0
For the interval Y e [5. 0, (0) the maximum likelihood estimator was
the assigned estimator.
The maximum regret was evaluated for the com-
bined estimators found by using all possible combinations of the
candidate estimators listed for the various intervals.
values of
procedure.
6, as in rable 5.6, were found to be optimal by this
Of the other combinations tested, it was found that
:=
of
e
2 - Yl ) e [0.0, 0.25) by
3.0, gave a maximum regret of 0.316861. Thus, the optimal choice
replacing the optimal
6
The same
e must
near
in the interval (Y
decrease from approximately
2.5 at y
~
0.50.
4.0 at y
In the vicinity of y
:=
:=
0.0 to a value of
1.0, the value of
e must
begin increasing to a value which yields all estimator that differs
from the maximum likelihood estimator by a negligible amount,
e
From (5.6.2) it can be sei"n that the mean square e.cror of' ~e is
a function
~(~ !J., 6, 0)
of
~,
e and
0, such that
= ,.,2 • ~(~
.§.
(1,1 )
(3
U
= (32
. ~*(~
a
.§.)
'0
say; which allows the campu.tatioD of mea.'1 square errors
in the more general case with arbitrary (but known)
discussed special 'case with 0
is the optimal choice for
and when
.
a2 l.S
=
e when
1.
estimators
from the above
Thus, as in Section 5.5, if 6'
Y - Y is in the interval [~, a ]
1
2
2
equal to one, then 08' would be the optimal choice on
the interval [0 ~,
a(Y
2
i
bf
a
a ] when
2
a2
is not equal to one.
Likewise, if
2
- Y ) is an' optimal continuous function when" equals one,
l
2
ae«Y
2
e when
- Yl)/O) would be the optimal continuous function to use for
0
2
is not equal to one.
Therefore,
ar~
results found for the
case in which the covariance matrix is the identity matrix, could be
made to apply to the case where the
5.7
covari~'1ce matrix
2
is 0 1.
Estimators for m Ordered Parameters
Suppose that instead of estimating the meWl vector for a bivariate
normal distribution, a modeler needed to estimate the mean vector for
a mul.tivariate normal distribution where the
satisf'y a certain order.
(~omponents
were known to
This modeler could construct estimators of
the type illustrated in this chapter, but his problem wO'.Jld be <:ons iderably more complex than the cas es dealt with thus far.
Consider, for example, thE; ca3e -,{hen :m is equal to 3 andt.he
2
covariance matrix is 0 1.
Assuming an exponential prior of the form
given in (4,2.2), the posterior would be as in (4,2.3), a.rtd the
Bayesian estimator of !:!: would follow upon substitution of Y:.. - C '.e. for
y:..
in (8,3·17).
(The estimator for this example is given by expressions
(8,3.22) and (8.3,23).
The mean square error for this case is
Note that
2
is the sum of the variances of Y , Y , and Y and would equal. 30
l
2
3
Making the transformation
Ui
= Yi
A.
for i = 1, 2, 3, MSE(!:!:e) becomes
- ~i
0
2
2 '!*/p* )]
+ (C' ~ + cr H '!**
/p )' (C I ~ + o-H
with
--I
0
1J. 2 - IJ. l
.
\)
1J.
exp(-b
l*
- IJ.
l
l
*"
*/402 )F(~
bl
2
*
*/402 )F( b~
2
=
exp( -b
3
-
b 2* '
co , V )211
l
-
bl *'
co, V )2TI
2
0
2
J2
0
2
J2
o
L
and
Substi.tuting for the Y in (8.3.20) b * a.nd b * are found to be
l
i
2
'b l *
b2
*
=
e
2/ l + 0 2/ 82 '
u 2 - u l - 2 0'
= u
3
- u 2 - 202/8 2 + 0 2/8 l ·
97
Thus, the expression is a function of
the elements of
(l,.
8 , 8 , and differences in
1
2
~, !.~.,
The lower envelope for the mean square error would be a function
of the differences in the means also.
However, approximating the
minimum value of (5.7.2) for fixed values of the elements in (5.7.3)
would require searching over the possible values of both 8 and 8 ,
1
2
The search method used for the univariate and bivariate cases would
not be applicable in this situation.
Based on the results obtained in the univariate and bivariate
cases discussed earlier in this chapter, it would seem likely that
functions of the observations could be constructed to use for the
parameters 8 and 8 which would give a reduction inmaximurn regret
1
2
in this case.
The domain of the observations could be divided into a
grid based on values of Y2 - Y and Y - Y '
1
2
3
The
for~m
of the func-
tions optimal for 8 and 8 in each set formed by the grid and for
1
2
several values for the elements of
(5.7.3). Again, it would be
necessary to find an appropriate multidimensional numerical integration algorithm for evaluating the integral on the various elements
of the grid.
Thus, the task of finding improved estimators would be
considerably more involved than was the case in the univariate and
bivariate situations.
98
6.
SUMMARY
Quite often a modeler knows that the true values of the parameters
in his model could not possibly be contained in certain sets of the
parameter space.
This paper has examined such a situation for a linear
model whose errors are distributed nonnally with a known covariance
matrix .. Attention was restricted to the case where the moderler knows
linear inequalities which define the feasible parameter space.
Three
alternative estimation techniques were presented which took into
account these restrictions on the parameter space.
The literature contains many treatises on maximizing the likelihood function with restrictions of this sort.
Maximizing the normal
likelihood function is equivalent to minimizing a quadratic function,
and the algorithms of quadratic programming give solutions to the
problems of minimizing a quadratic function.
Special, simplified
algorithms exist for certain design matrices and for the cases when
the restrictions are orderings of the parameters.
Estimates in these
cases are called the isotonic regression with respect to the ordering.
The restricted maximum likelihood estimators were shown to have
same desirable properties.
They possess a type of consistency and
give a smaller mean square error than the unrestricted estimators in
same cases.
A property of these estimators which is unappealing is
that all of the unrestricted estimates which violate the restrictions
will be mapped to a boundary of the feasible parameter space.
The
consequences of this property is that many unrestricted estimates which
are quite different are mapped to the same point on the boundary of
the feasible parameter space by these restricted maximum likelihood
99
procedures, so that they pile up on the
bOQ~dary.
It is hard to
believe that the true parameter values are so often right on the
boundary.
Bayesian estimators are used frequently in situations where the
modeler knows that some subsets of the parameter space are more likely
to contain the true value of the parameters than are other subsets.
However, there have been few publications which deal with assigning a
zero prior probability to portions of the sample space.
For this
reason, Chapter 3 dealt with the basic properties of Bayesian estimation on restricted parameter spaces.
The mean of the
most commonly used.
~osterior
distribution is the Bayesian estimator
In Chapter 3, it was shown that this Bayesian
estimator would not take on some values of the feasible parameter
space Wlless the posterior distribution became degenerate at the
bOWldary for some set of observations.
too, are Wlappealing.
These Bayesian estimators,
Other measures of the central tendency of the
posterior distribution did not seem to yield viable alternatives for
estimation on the restricted parameter space.
Several different types of priors were illustrated which would
yield a trWlcated normal posterior for the case of normal likelihood
function.
The truncated normal posterior was shown to degenerate at
the boundary for some observati.ons in the univariate and bivariate
case.
Thus, the mean of these truncated normal posteriors could give
estimates in every point of the feasible parameter space.
The expression for the expectation of the truncated multivariate
normal posteriors was found to include multivariate normal distribution
100
functions.
These distribution functions are analytically tractable
only for special. cases and these sIJecial cases do not necessarily
coincide with the conunon situations.
The problem of estimating the mean of a univariate distribution
was examined in detail to determine if some of the Bayesian estimators
proposed would give a uniformly smaller mean square error than was
found for the restricted maximum likelihood estimator.
No such
estimator was found, but many of the Bayesian estimators would give a
smaller mean square error over a portion of the parameter space.
The third estimator examined consisted of a Bayesian estimator
over a portion of the sample space and the maximum likelihood
estimator over the remainder.  It was hoped that this would take
advantage of the smaller mean square errors found for Bayesian estimators
near the boundary, without incurring the larger mean square errors
that the Bayesian estimators had away from the boundary.  As a measure
of how well one was doing in reconciling these two goals, the regret
function was introduced, and the criterion of goodness chosen was
minimax regret (see Section 5.2).  For the case in which the variance
of the underlying density is one, an optimal estimator of this type
was found to be one in which the mean of the posterior found from an
exponential prior with $\theta$ equal to 0.875 is used for an observation
less than 1.5, and the unrestricted maximum likelihood estimator is
used for an observation greater than 1.5.  This estimator had a maximum
regret of 0.47991, a decrease from the maximum regret of 0.58386 found
for the restricted maximum likelihood estimator.
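This joined rule can be written down compactly.  The sketch below is a minimal illustration for the half-line restriction mu >= 0 with variance one; it assumes the standard result, also used in the appendix (cf. (8.4.2)), that an exponential prior with parameter theta on [0, infinity) combined with a normal likelihood gives a truncated normal posterior centred at y - sigma^2/theta, whose mean then follows the same formula as (8.3.13).  The function names and the scipy dependency are illustrative.

    from scipy.stats import norm

    def truncated_normal_posterior_mean(center, sigma=1.0):
        """Mean of a N(center, sigma^2) density truncated to [0, infinity)."""
        d = center / sigma
        return center + sigma * norm.pdf(d) / norm.cdf(d)

    def joined_estimate(y, theta=0.875, cutoff=1.5, sigma=1.0):
        """Exponential-prior Bayes estimate below the cutoff, unrestricted MLE above it."""
        if y < cutoff:
            return truncated_normal_posterior_mean(y - sigma**2 / theta, sigma)
        return y  # unrestricted maximum likelihood estimate

    print(joined_estimate(0.3), joined_estimate(2.4))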
Next, a procedure was proposed in which a different estimator
would be used on each of several preassigned intervals in the sample
space.  Using a stepwise procedure for optimizing the choice of estimators,
an estimation procedure was found which would reduce the maximum regret
to 0.304207.  These results indicate that some continuous function
of the observations could be used for the parameter $\theta$ in the Bayesian
estimator and should lead to an estimator giving smaller maximum
regret.  Based on the limited information obtained here, some linear
functions to use for $\theta$ were examined; the best gave a maximum regret
of 0.361671.

The examined choices of functions for $\theta$ were good for a univariate
normal density with variance one, whose mean was known to be greater than
zero.  In Section 5.5, a method was shown for choosing the optimal function
of the observations for $\theta$ for other variance values, for other feasible
parameter spaces, or for other sample sizes.

Section 5.6 showed that in the problem of estimating two ordered
parameters the maximum regret could again be reduced in the same manner
given for the univariate case.  This procedure was still found to be
relatively simple, since the mean square errors of the Bayesian and
maximum likelihood estimators are functions of the difference in the
ordered parameters.  However, extending these algorithms to more than
two parameters, with a simple ordering, was shown to be a problem of
much greater magnitude.  For these cases the mean square errors of the
estimators become functions with arguments of higher dimensionality.
The pedestrian techniques of analysis used for the one-dimensional
case were found to be no longer adequate.
This study has shown that point estimators can be constructed
which make fuller use of precise information regarding the
parameter space.  The criterion of minimizing maximum regret is particularly
applicable in the situation in which it is difficult to specify
a particular prior distribution for the parameter.  However, optimizing
this criterion function was found to be most difficult, and computer
costs were excessive even for the crudest of approximations.  This
author would suggest that those interested in extending this method
give top priority to the development of algorithms for finding the
parameter $\theta$, as a function of the observations, which would give the
optimum for this criterion.  A better optimizing algorithm would make
for a much simpler task of extending this technique to the case of m
ordered parameters.

This study did not exhaust the Bayesian alternatives for estimation
on restricted parameter spaces.  This area of study has been
virtually untapped thus far.  The mode of the posterior was examined
only for the uniform prior.  Under a more intensive study of other
priors, the mode of the posterior might be found to yield estimators
with more desirable properties than the estimators presented here.
LIST OF REFERENCES

Abramowitz, M., and I. A. Stegun. 1964. Handbook of Mathematical
Functions. National Bureau of Standards, Washington, D.C.

Ayer, M., H. D. Brunk, G. M. Ewing, W. T. Reid, and E. Silverman.
1955. An empirical distribution function for sampling with
incomplete information. Annals of Mathematical Statistics 26:641-647.

Bancroft, T. A. 1944. On biases in estimation due to the use of
preliminary tests of significance. Annals of Mathematical
Statistics 15:190-204.

Barlow, R. E., D. J. Bartholomew, J. M. Bremner, and H. D. Brunk.
1972. Statistical Inference Under Order Restrictions.
John Wiley and Sons, Inc., New York City, New York.

Bartholomew, D. J. 1965. A comparison of some Bayesian and
frequentist inferences. Biometrika 52:19-35.

Bellman, R. E., and S. E. Dreyfus. 1962. Applied Dynamic Programming.
Princeton University Press, Princeton, New Jersey.

Birnbaum, Z. W., and P. L. Meyer. 1953. On the effect of truncation
in some or all co-ordinates of a multinormal population.
Journal of the Indian Society of Agricultural Statistics 2:17-27.

Boot, J. C. B. 1964. Quadratic Programming: Algorithms,
Anomalies, Applications. North-Holland Publishing Company,
Amsterdam.

Brunk, H. D. 1958. On the estimation of parameters restricted by
inequalities. Annals of Mathematical Statistics 29:437-453.

Cramér, H. 1951. Mathematical Methods of Statistics. Princeton
University Press, Princeton, New Jersey.

Curnow, R. N., and C. W. Dunnett. 1962. The numerical evaluation
of multivariate normal integrals. Annals of Mathematical
Statistics 33:571-579.

Dutt, J. E. 1973. A representation of multivariate normal probability
integrals by integral transforms. Biometrika 60:637-645.

Ghizzetti, A., and A. Ossicini. 1970. Quadrature Formulae.
Academic Press Inc., New York City, New York.

Gun, A. 1965. The use of a preliminary test for interactions in the
estimation of factorial means. Institute of Statistics Mimeograph
Series, Number 436. North Carolina State University,
Raleigh, North Carolina.

Gupta, S. S. 1963. Probability integrals of multivariate normal and
multivariate t. Annals of Mathematical Statistics 34:792-828.

Hadley, G. F. 1964. Nonlinear and Dynamic Programming. Addison-
Wesley Publishing Company, Reading, Massachusetts.

Hildebrand, F. B. 1956. Introduction to Numerical Analysis. McGraw-
Hill Book Company, Hightstown, New Jersey.

Hudson, D. J. 1969. Least squares fitting of a polynomial constrained
to be either non-negative, non-decreasing, or convex. Journal of
the Royal Statistical Society 31:113-118.

Judge, G. G., and T. Takayama. 1966. Inequality restrictions in
regression analysis. Journal of the American Statistical
Association 61:166-181.

Kendall, M. G., and A. Stuart. 1969. The Advanced Theory of
Statistics. Vol. 1. 3rd ed. Hafner Publishing Company, Inc.,
New York City, New York.

Kruskal, J. B. 1964. Nonmetric multidimensional scaling: a numerical
method. Psychometrika 29:115-129.

Kunzi, H. P., W. Krelle, and W. Oettli. 1966. Nonlinear Programming.
Translated by F. Levin. Blaisdell Publishing Company, Waltham,
Massachusetts.

Lovell, M. C., and E. Prescott. 1970. Multiple regression with
inequality constraints: pretesting bias, hypothesis testing and
efficiency. Journal of the American Statistical Association
65:913-925.

Malinvaud, E. 1966. Statistical Methods of Econometrics. Rand
McNally and Company, Chicago, Illinois.

Mantel, N. 1969. Restricted least squares regression and quadratic
programming. Technometrics 11:763-773.

Milton, R. C. 1972. Computer evaluation of the multivariate normal
integral. Technometrics 14:881-887.

Mosteller, F. 1948. On pooling data. Journal of the American
Statistical Association 43:231-242.

Raiffa, H., and R. Schlaifer. 1961. Applied Statistical Decision
Theory. Division of Research, Graduate School of Business
Administration, Harvard University, Boston, Massachusetts.

Savage, I. R. 1968. Statistics: Uncertainty and Behavior. Houghton
Mifflin Company, Boston, Massachusetts.

Savage, L. J. 1954. The Foundations of Statistics. John Wiley and
Sons, Inc., New York City, New York.

Searle, S. R. 1971. Linear Models. John Wiley and Sons, Inc.,
New York City, New York.

System/360 Scientific Subroutine Package. 1970. International
Business Machines Corporation, White Plains, New York.

Theil, H., and C. Van de Panne. 1961. Quadratic programming as an
extension of conventional quadratic maximization. Journal of
the Institute of Management Science 1:1-20.

Tiao, G. C., and G. E. P. Box. 1973. Some comments on Bayes
estimators. The American Statistician 27:12-14.

Wald, A. 1950. Statistical Decision Functions. John Wiley and
Sons, Inc., New York City, New York.

Zellner, A. 1961. Linear Regression with Inequality Constraints
on the Coefficients. Mimeographed Report 6109 of the
International Center for Management Science.

Zellner, A. 1971. An Introduction to Bayesian Inference in
Econometrics. John Wiley and Sons, Inc., New York City, New York.
8.1  Theorems and Proofs

In this appendix conventional mathematical and statistical
symbolism and the terminology of Section 1.2 will be used without
further explanation.
Theorem 1
Suppose the matrix
D
is positive definite and one wishes to
minimize the function
F(~)
= x'Dx
arbitrary, closed set
B
If the basic estimate
minimizes
F
and if
x
-s
x
-s
where
x
is restricted to an
~
is not in
among all the boundary points of
B
B , then
is a minimal feasible solution.
-Proof
Since
D
function on
is positive definite,
n
R
any
~r:f: ~
for
0 < A < 1.
F(x ) <
O
If
~r
F(~r)
(£!.
F(x)
is a strictly convex
~.~., Kunzi et al., 1966, p. 38).
Since
F(~)
Therefore for
F(~)
is the global minimum of
,
, and then
is an interior point of
lies on the boundary of
the boundary of
B.
B, choose
A so that
Then for any point,
x
-r
(A~r
in
B, there exists a point on the boundary,
+
(l-A),~)
B not on
~b'
such
108
Therefore, the
is a boundary point
x e B which minimizes
which minimizes
x
-s
F(~)
for all the boundary
points.
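A small numerical illustration of Theorem 1 is given below; it is a sketch only, assuming scipy is available and using an arbitrary positive definite D and a feasible set of the form B = {x : x >= 1}.  When the unrestricted minimizer (the origin, for F(x) = x'Dx) lies outside B, the constrained minimizer returned by a general-purpose optimizer sits on the boundary of B.

    import numpy as np
    from scipy.optimize import minimize

    # Arbitrary positive definite D; F(x) = x'Dx has its unrestricted minimum at the origin.
    D = np.array([[2.0, 0.5], [0.5, 1.0]])
    F = lambda x: x @ D @ x

    # Feasible set B = {x : x >= 1} (componentwise), which excludes the unrestricted minimizer.
    res = minimize(F, x0=np.array([2.0, 2.0]), bounds=[(1.0, None), (1.0, None)])
    print(res.x)  # the constrained minimizer lies on the boundary of B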
Theorem 2

A non-degenerate probability distribution with support within a half-closed half line (closed interval) has a mean bounded away from the finite end point(s) of that half line (interval).  That is, if

1)  $F(u)$ is a distribution function with support $B \subset D \subset R^1$, where $D = (-\infty,\infty)$, or $D = [s,+\infty)$ for some $s \in R^1$, or $D = (-\infty,t]$ for some $t \in R^1$, or $D = [s,t]$ (without loss of generality take $D = [s,+\infty)$), and

2)  $\mu = \int u\,dF(u)$ ,

then there exists an $\epsilon > 0$ such that $\mu \ge s + \epsilon$.

Proof

Note that, by the definition of $D$, $F(s-\delta) = F(s-0) = 0$ for every $\delta > 0$.  Now, for any $\delta > 0$,

$\mu = \int_D u\,dF(u) \;\ge\; s\,F(s+\delta) + (s+\delta)\,[1 - F(s+\delta)] \;=\; s + \delta\,[1 - F(s+\delta)] \;=\; s + \delta\,h(\delta)$ , say.

There exists a $\delta > 0$ with $h(\delta) > 0$; otherwise $F(s+\delta) = 1$ for all $\delta > 0$, so that $dF(s) = F(s) - F(s-0) = 1$ and the distribution is degenerate at $s$, a contradiction of the first statement in the theorem.  Choose such a $\delta > 0$.  Then

$\mu \;\ge\; s + \epsilon$ ,  where  $\epsilon = \delta\,h(\delta) > 0$ .
Theorem 3

A non-degenerate probability distribution function, $F_n(u)$, on $R^n$ with support within a convex proper subset, $D^n$, of $R^n$, closed with respect to boundary points with finite coordinates, has a mean bounded away from every boundary point, $s$, of $D^n$ with finite coordinates.  That is, if

1)  $F_n(u)$ is a distribution function with support $B^n \subset D^n \subset R^n$ ,

2)  $\bar{u} = \int u\,dF_n(u) = \int_{B^n} u\,dF_n(u)$ , and

3)  $s$ is a finite boundary point of $D^n$ with finite coordinates,

then $\bar{u}$ is bounded away from $s$.

Proof

Let $c'u = d$, with $c \in R^n$, $c'c = 1$, $d \in R^1$, be a supporting hyperplane of $D^n$ containing $s$.  Without loss of generality assume

(8.1.1)  $c'u \ge d$  for every  $u \in D^n$ .

Consider a random variable, say $U$, with distribution function $F_n(u)$, and define a scalar random variable $W = c'U$.  Then,

(8.1.2)  $P(W \ge d) = 1$ .

Moreover, it is clear that the distribution function of $W$ satisfies the hypotheses of Theorem 2.  Thus, by Theorem 2, there exists an $\epsilon > 0$ such that

(8.1.3)  $E(W) = c'\bar{u} \ge d + \epsilon$ .

Define $C$, $n \times n$, orthonormal, with first row equal to $c'$.  Then the squared distance between $\bar{u}$ and $s$ is seen to satisfy

$\|\bar{u} - s\|^2 = (C\bar{u} - Cs)'(C\bar{u} - Cs) \;\ge\; (c'\bar{u} - c's)^2$

(i.e., the squared length of a vector is not less than the square of its first coordinate)

$= (c'\bar{u} - d)^2$  (since $s$ lies in the hyperplane $c'u = d$).

Therefore, by (8.1.3), $\|\bar{u} - s\|^2 \ge \epsilon^2 > 0$; i.e., $\bar{u}$ is not closer than $\epsilon > 0$ to $s$, an arbitrary boundary point of $D^n$ with finite coordinates.  Thus $\bar{u}$ is bounded away from every boundary point of $D^n$ with finite coordinates.
Table 8.1  Values of the function f(x)/F(x), where f and F denote the
           standard normal density and distribution functions

    x     f(x)/F(x)       x     f(x)/F(x)       x     f(x)/F(x)
 -10.0   10.0980930     -6.0    6.1584826     -2.0    2.3732147
  -9.9    9.9990463     -5.9    6.0609159     -1.9    2.2849464
  -9.8    9.9000187     -5.8    5.9634228     -1.8    2.1973124
  -9.7    9.8010092     -5.7    5.8660049     -1.7    2.1103573
  -9.6    9.7020197     -5.6    5.7686663     -1.6    2.0241289
  -9.5    9.6030493     -5.5    5.6714095     -1.5    1.9386768
  -9.4    9.5041008     -5.4    5.5742397     -1.4    1.8540564
  -9.3    9.4051723     -5.3    5.4771595     -1.3    1.7703276
  -9.2    9.3062668     -5.2    5.3801737     -1.2    1.6875515
  -9.1    9.2073832     -5.1    5.2832870     -1.1    1.6057968
  -9.0    9.1085224     -5.0    5.1865034     -1.0    1.5251350
  -8.9    9.0096865     -4.9    5.0898285     -0.9    1.4456425
  -8.8    8.9108744     -4.8    4.9932661     -0.8    1.3674021
  -8.7    8.8120880     -4.7    4.8968239     -0.7    1.2904985
  -8.6    8.7133284     -4.6    4.8005056     -0.6    1.2150249
  -8.5    8.6145945     -4.5    4.7043190     -0.5    1.1410770
  -8.4    8.5158901     -4.4    4.6082706     -0.4    1.0687561
  -8.3    8.4172134     -4.3    4.5123672     -0.3    0.9981660
  -8.2    8.3185673     -4.2    4.4166174     -0.2    0.9294158
  -8.1    8.2199516     -4.1    4.3210268     -0.1    0.8626174
  -8.0    8.1213675     -4.0    4.2256069      0.0    0.7978845
  -7.9    8.0228167     -3.9    4.1303644      0.1    0.7353317
  -7.8    7.9243002     -3.8    4.0353117      0.2    0.6750731
  -7.7    7.8258181     -3.7    3.9404573      0.3    0.6172208
  -7.6    7.7273731     -3.6    3.8458128      0.4    0.5618827
  -7.5    7.6289663     -3.5    3.7513905      0.5    0.5091604
  -7.4    7.5305977     -3.4    3.6572037      0.6    0.4591471
  -7.3    7.4322701     -3.3    3.5632658      0.7    0.4119247
  -7.2    7.3339844     -3.2    3.4695911      0.8    0.3675614
  -7.1    7.2357426     -3.1    3.3761969      0.9    0.3261089
  -7.0    7.1376456     -3.0    3.2830982      1.0    0.2875999
  -6.9    7.0393953     -2.9    3.1903143      1.1    0.2520463
  -6.8    6.9412937     -2.8    3.0978661      1.2    0.2194365
  -6.7    6.8432426     -2.7    3.0057716      1.3    0.1897350
  -6.6    6.7452450     -2.6    2.9140568      1.4    0.1628812
  -6.5    6.6473007     -2.5    2.8227444      1.5    0.1387897
  -6.4    6.5494137     -2.4    2.7318611      1.6    0.1173516
  -6.3    6.4515858     -2.3    2.6414347      1.7    0.0984359
  -6.2    6.3538198     -2.2    2.5514956      1.8    0.0818925
  -6.1    6.2561178     -2.1    2.4620771      1.9    0.0675557

Table 8.1  (Continued)

    x     f(x)/F(x)       x     f(x)/F(x)       x     f(x)/F(x)
   2.0    0.0552479      3.1    0.0032700      4.1    0.0000893
   2.1    0.0447836      3.2    0.0023857      4.2    0.0000589
   2.2    0.0359748      3.3    0.0017234      4.3    0.0000385
   2.3    0.0286341      3.4    0.0012326      4.4    0.0000249
   2.4    0.0225796      3.5    0.0008729      4.5    0.0000160
   2.5    0.0176378      3.6    0.0006120      4.6    0.0000064
   2.6    0.0136466      3.7    0.0004248      4.7    0.0000040
   2.7    0.0104572      3.8    0.0002920      4.8    0.0000024
   2.8    0.0079357      3.9    0.0001987      4.9    0.0000015
   2.9    0.0059637      4.0    0.0001338      5.0    0.0000009
   3.0    0.0044378
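The function tabulated above can be evaluated directly from the standard normal density and distribution functions.  The short sketch below (assuming scipy is available; function name illustrative) does so in a numerically stable way by working with the logarithms, which matters for large negative x.

    import numpy as np
    from scipy.stats import norm

    def density_over_cdf(x):
        """Compute f(x)/F(x) for the standard normal, stably, via exp(log f - log F)."""
        return np.exp(norm.logpdf(x) - norm.logcdf(x))

    for x in (-10.0, -2.0, 0.0, 2.0, 4.0):
        print(f"{x:6.1f}  {density_over_cdf(x):.7f}")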
8.3  The Mean of a Truncated Multivariate Normal Posterior Distribution

Consider the situation in which the posterior distribution has the following form:

(8.3.1)  $p(\mu \mid y) = \exp\{-(y-\mu)'V(y-\mu)/2\} \Big/ \int_A \exp\{-(y-\mu)'V(y-\mu)/2\}\,d\mu$

on the convex set $A = \{\mu : C\mu \ge f\}$, and $p(\mu \mid y) = 0$ elsewhere.  The mean of this posterior would be

(8.3.2)  $E(\mu) = \int_A \mu\,p(\mu \mid y)\,d\mu$ .

Evaluating (8.3.2) is no easy task in the multivariate case.  Finding the normalizing constant of the probability density (the denominator of (8.3.1)) requires evaluating a multivariate normal probability integral.  Kendall and Stuart (1969), pages 350-353, Curnow and Dunnett (1962), Gupta (1963), and Dutt (1973), to mention a few, have given solutions to this integral for special cases of the region of integration.  Abramowitz and Stegun (1964), pages 956-957, give techniques which can be adapted to evaluating a bivariate normal probability integral on a convex set.  Milton (1972) illustrated the use of multidimensional Simpson quadrature to evaluate multivariate normal probability integrals such as these.  For the cases considered by these authors their techniques provide relatively inexpensive methods of evaluating such integrals on computers.  However, the technique which handles the more general situation, Simpson quadrature, becomes quite expensive as the dimensionality increases.
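For moderate dimensions, present-day numerical libraries make these probability integrals routine.  The following is a sketch only, assuming a recent scipy is available and that A is the special case of a componentwise upper-bounded rectangle, of evaluating the normalizing constant of (8.3.1); the particular values of V, y and b are illustrative.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Posterior precision V and observation y, illustrative values only.
    V = np.array([[2.0, 0.5], [0.5, 1.0]])
    y = np.array([0.3, -0.2])
    cov = np.linalg.inv(V)

    # Normalizing constant of (8.3.1) when A = {mu : mu <= b} componentwise:
    # (2*pi)^(m/2) |V|^(-1/2) times the normal probability of the region.
    b = np.array([1.0, 0.5])
    prob = multivariate_normal(mean=y, cov=cov).cdf(b)
    const = (2 * np.pi) ** (len(y) / 2) * np.linalg.det(V) ** -0.5 * prob
    print(prob, const)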
For many practical problems, the numerator of expression (8.3.2) (after substitution of (8.3.1) into (8.3.2)) can also be reduced to evaluating a multivariate normal probability integral.  This occurs when (8.3.2) reduces to finding the mean of a posterior distribution

(8.3.3)  $p(\mu \mid y) = \exp\{-(y-\mu)'V(y-\mu)/2\} \Big/ \int_{a_m}^{e_m} \cdots \int_{a_1}^{e_1} \exp\{-(y-\mu)'V(y-\mu)/2\}\,d\mu$ ,

with $V$ positive definite, on the set $\{\mu : a_i \le \mu_i \le e_i,\ i = 1, \ldots, m\}$, and $p(\mu \mid y) = 0$ elsewhere.  Later in this section, the Bayesian estimator for a simple ordering of the mean parameters will be derived by making such a transformation.  This example should aid the reader in formulating other problems of this sort so that the reduction which follows can be utilized.

The mean of (8.3.3) would be

$E(\mu) = \int_{a_m}^{e_m} \cdots \int_{a_1}^{e_1} \mu\,\exp\{-(y-\mu)'V(y-\mu)/2\}\,d\mu \Big/ \int_{a_m}^{e_m} \cdots \int_{a_1}^{e_1} \exp\{-(y-\mu)'V(y-\mu)/2\}\,d\mu$ .

Making the transformation $z = y - \mu$, the mean can then be written
(8.3.4)  $E(\mu) = y - \int_{y_m-e_m}^{y_m-a_m} \cdots \int_{y_1-e_1}^{y_1-a_1} z\,\exp(-z'Vz/2)\,dz \Big/ \int_{y_m-e_m}^{y_m-a_m} \cdots \int_{y_1-e_1}^{y_1-a_1} \exp(-z'Vz/2)\,dz = y - D(z)/P$ , say.

Following a method used by Birnbaum and Meyer (1953), $D(z)$ can be simplified to an expression involving only normal probability integrals.  By expressing the quadratic form as a sum, the elements of $D(z)$ can be expressed in the same manner as the following expression for the first element:

$D_1(z) = \int_{y_m-e_m}^{y_m-a_m} \cdots \int_{y_1-e_1}^{y_1-a_1} z_1 \exp\Big\{-\Big(z_1^2 v_{11} + 2 z_1 \sum_{j=2}^m v_{1j} z_j + \sum_{i=2}^m \sum_{j=2}^m v_{ij} z_i z_j\Big)/2\Big\}\,dz$ .

Defining $S_1(z_1)$ as

$S_1(z_1) = \int_{y_m-e_m}^{y_m-a_m} \cdots \int_{y_3-e_3}^{y_3-a_3} \int_{y_2-e_2}^{y_2-a_2} \exp\Big\{-\Big(2 z_1 \sum_{j=2}^m v_{1j} z_j + \sum_{i=2}^m \sum_{j=2}^m v_{ij} z_i z_j\Big)/2\Big\}\,dz_2\,dz_3 \cdots dz_m$ ,

then

$D_1(z) = \int_{y_1-e_1}^{y_1-a_1} z_1 \exp(-z_1^2 v_{11}/2)\,S_1(z_1)\,dz_1$ .

Integrating by parts (noting that $\partial \exp(-z'Vz/2)/\partial z_1 = -\big(\sum_j v_{1j} z_j\big)\exp(-z'Vz/2)$) gives

(8.3.5)  $v_1' D(z) = S_1(y_1-e_1)\exp\{-(y_1-e_1)^2 v_{11}/2\} - S_1(y_1-a_1)\exp\{-(y_1-a_1)^2 v_{11}/2\}$ ,

where $v_1'$ is the first row of $V$.
Repeating this process for the $i$-th element of $D(z)$, one finds that

(8.3.6)  $v_i' D(z) = S_i(y_i-e_i)\exp\{-(y_i-e_i)^2 v_{ii}/2\} - S_i(y_i-a_i)\exp\{-(y_i-a_i)^2 v_{ii}/2\}$ ,

where

(8.3.7)  $S_i(z_i) = \int \cdots \int \exp\Big\{-\Big(2 z_i \sum_{j \ne i} v_{ij} z_j + \sum_{k \ne i} \sum_{j \ne i} v_{kj} z_k z_j\Big)/2\Big\}\,dz_1 \cdots dz_{i-1}\,dz_{i+1} \cdots dz_m$ ,

the integration being over $y_j - e_j \le z_j \le y_j - a_j$ for $j \ne i$, and the vector $v_i'$ is the $i$-th row of $V$.  Thus, from (8.3.5) and (8.3.6) it can be seen that

(8.3.8)  $V D(z) = l$ ,

where the $i$-th element of $l$ would be

$l_i = S_i(y_i-e_i)\exp\{-(y_i-e_i)^2 v_{ii}/2\} - S_i(y_i-a_i)\exp\{-(y_i-a_i)^2 v_{ii}/2\}$ .

Call the matrix $V$ with the $i$-th row and column deleted $V_i$, and denote the vector whose elements form the $i$-th row of $V$ with the $i$-th element deleted by $q_i$.  (Note that $V_i$ would be positive definite if $V$ is positive definite.)  Let $z_{-i}$ be the vector of all the elements of $z$ with the exception of $z_i$.  Then

$S_i(x) = \int \cdots \int \exp\{-(2 x\,q_i' z_{-i} + z_{-i}' V_i z_{-i})/2\}\,dz_{-i}$ .

Completing the square, $S_i(x)$ can be expressed as

$S_i(x) = \exp(x^2 q_i' V_i^{-1} q_i/2) \int_{y_m-e_m}^{y_m-a_m} \cdots \int_{y_{i+1}-e_{i+1}}^{y_{i+1}-a_{i+1}} \int_{y_{i-1}-e_{i-1}}^{y_{i-1}-a_{i-1}} \cdots \int_{y_1-e_1}^{y_1-a_1} \exp\{-(z_{-i} + x V_i^{-1} q_i)' V_i (z_{-i} + x V_i^{-1} q_i)/2\}\,dz_{-i}$ .

Making the transformation $t = z_{-i} + x V_i^{-1} q_i$, $S_i(x)$ becomes

(8.3.9)  $S_i(x) = \exp(x^2 q_i' V_i^{-1} q_i/2)\,(2\pi)^{(m-1)/2}\,|V_i|^{-1/2} \int_{c}^{d} (2\pi)^{-(m-1)/2}\,|V_i|^{1/2}\,\exp(-t' V_i t/2)\,dt$ .

The vector $c$ would be

$c = x V_i^{-1} q_i + (y_1-e_1, \ldots, y_{i-1}-e_{i-1},\; y_{i+1}-e_{i+1}, \ldots, y_m-e_m)'$

and $d$ would be

$d = x V_i^{-1} q_i + (y_1-a_1, \ldots, y_{i-1}-a_{i-1},\; y_{i+1}-a_{i+1}, \ldots, y_m-a_m)'$ ,

and the integral is a multivariate normal probability integral.

Thus, the elements of $l$ consist of exponential functions, known constants, and multivariate normal probability integrals.  From (8.3.8), it is easily seen that

(8.3.10)  $D(z) = V^{-1} l$ ,

and substituting this expression into (8.3.4) it can be seen that

(8.3.11)  $E(\mu) = y - V^{-1} l / P$ .

So, for posterior distributions of the form of (8.3.3), finding the Bayesian estimates for an observation $y$ consists of evaluating multivariate normal probability integrals.
In the univariate case, the function $S_1(z_1)$ would have the value one for all values of $z_1$.  Then in (8.3.10), $l$ would be a scalar and would equal

$l = \exp\{-(y-e)^2 v_{11}/2\} - \exp\{-(y-a)^2 v_{11}/2\}$ .

The covariance matrix $V^{-1}$ would be the scalar $\sigma^2$, so

$l = \exp\{-(y-e)^2/(2\sigma^2)\} - \exp\{-(y-a)^2/(2\sigma^2)\}$ ,

and from (8.3.4) and (8.3.11)

(8.3.12)  $E(\mu) = y - \dfrac{\sigma^2\big[\exp\{-(y-e)^2/(2\sigma^2)\} - \exp\{-(y-a)^2/(2\sigma^2)\}\big]}{\int_{y-e}^{y-a} \exp\{-z^2/(2\sigma^2)\}\,dz} = y - \dfrac{\sigma\big[\exp\{-(y-e)^2/(2\sigma^2)\} - \exp\{-(y-a)^2/(2\sigma^2)\}\big]}{\sqrt{2\pi}\,\big[F((y-a)/\sigma) - F((y-e)/\sigma)\big]}$ ,

where $F(x)$ is the distribution function for a normally distributed random variable with mean zero and variance one.  Then for $e = \infty$, equation (8.3.12) becomes

(8.3.13)  $E(\mu) = y + \sigma\,f\big((y-a)/\sigma\big)\big/F\big((y-a)/\sigma\big)$ ,

where $f(x)$ is the normal density function with mean zero and variance one.  Notice that expression (8.3.13) is identical to Cramér's result quoted in (4.1.2).
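A short sketch of (8.3.12) and its half-line limit (8.3.13) follows, assuming scipy is available; the function names and the numerical values in the check are illustrative.

    import numpy as np
    from scipy.stats import norm

    def truncated_mean(y, a, e, sigma):
        """Mean of a N(y, sigma^2) density truncated to [a, e]; equation (8.3.12)."""
        num = np.exp(-(y - e) ** 2 / (2 * sigma**2)) - np.exp(-(y - a) ** 2 / (2 * sigma**2))
        den = np.sqrt(2 * np.pi) * (norm.cdf((y - a) / sigma) - norm.cdf((y - e) / sigma))
        return y - sigma * num / den

    def half_line_mean(y, a, sigma):
        """Limit e -> infinity; equation (8.3.13), Cramer's result (4.1.2)."""
        d = (y - a) / sigma
        return y + sigma * norm.pdf(d) / norm.cdf(d)

    # A large upper limit e reproduces the half-line value.
    print(truncated_mean(0.4, 0.0, 50.0, 1.0), half_line_mean(0.4, 0.0, 1.0))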
For the next example, consider the posterior given in (8.3.3), but with support, $A$, defined as

(8.3.14)  $A = \{\mu : \mu_1 \le \mu_2 \le \cdots \le \mu_m\}$ ,

i.e., a simple ordering of the $\mu_i$.  The mean of this posterior would be

$E(\mu) = \dfrac{\int_{-\infty}^{\infty}\int_{-\infty}^{\mu_m}\cdots\int_{-\infty}^{\mu_3}\int_{-\infty}^{\mu_2} \mu\,\exp\{-(y-\mu)'V(y-\mu)/2\}\,d\mu}{\int_{-\infty}^{\infty}\int_{-\infty}^{\mu_m}\cdots\int_{-\infty}^{\mu_3}\int_{-\infty}^{\mu_2} \exp\{-(y-\mu)'V(y-\mu)/2\}\,d\mu}$ ,

with the integration over $\mu_1$ innermost.  Make the one-to-one transformation $z = H(y - \mu)$, where

$H = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \\ 0 & 0 & 0 & \cdots & 0 & -1 \end{pmatrix}$ ,

or, equivalently,

(8.3.15)  $z_i = y_{i+1} - y_i - \mu_{i+1} + \mu_i$ , $i = 1, \ldots, m-1$ ,  and  $z_m = -y_m + \mu_m$ .

Now the region of integration in terms of the $\mu$-coordinates is defined by the inequalities

$\mu_i \le \mu_{i+1}$ , $i = 1, \ldots, m-1$ ,

which entail the following inequalities in terms of the $z$-coordinates:

(8.3.16)  $-\infty < z_i < y_{i+1} - y_i$ , $i = 1, \ldots, m-1$ ,  and  $-\infty < z_m < +\infty$ .

On the other hand, for any set of $z$-values satisfying the second set of inequalities one can find a set of $\mu$-values, according to equations (8.3.15), which satisfy the first set of inequalities.  In fact, solving (8.3.15) one finds

$\mu_m = y_m + z_m$  and  $\mu_i = y_i + z_i + z_{i+1} + \cdots + z_m$ ,  $i = 1, \ldots, m-1$ .

Therefore $z_m < +\infty$ implies $\mu_m < +\infty$, and $z_{m-1} < y_m - y_{m-1}$ implies

$\mu_{m-1} = y_{m-1} + z_m + z_{m-1} < y_{m-1} + z_m + y_m - y_{m-1} = z_m + y_m = \mu_m$ ,

and so on for the indices $i < m-1$.  This proves that the region of integration in terms of the $z$-coordinates is given by the inequalities (8.3.16).
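The transformation (8.3.15) and the equivalence of the two sets of inequalities can be checked numerically.  The sketch below (plain numpy, names illustrative) builds H for a given m and verifies that an ordered choice of the mu_i corresponds exactly to the box constraints (8.3.16) on z.

    import numpy as np

    def ordering_matrix(m):
        """H of (8.3.15): z = H(y - mu), z_i = y_{i+1}-y_i - mu_{i+1}+mu_i, z_m = mu_m - y_m."""
        H = -np.eye(m)
        for i in range(m - 1):
            H[i, i + 1] = 1.0
        return H

    m = 4
    rng = np.random.default_rng(0)
    y = rng.normal(size=m)
    mu = np.sort(rng.normal(size=m))          # satisfies mu_1 <= ... <= mu_m
    z = ordering_matrix(m) @ (y - mu)
    b = np.diff(y)                            # b_i = y_{i+1} - y_i
    print(np.all(z[:-1] <= b), np.all(np.diff(mu) >= 0))  # both True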
Thus

$E(\mu) = y - H^{-1}\Big[\int_{-\infty}^{\infty}\int_{-\infty}^{b_{m-1}}\cdots\int_{-\infty}^{b_1} z\,\exp(-z'Qz/2)\,dz \Big/ \int_{-\infty}^{\infty}\int_{-\infty}^{b_{m-1}}\cdots\int_{-\infty}^{b_1} \exp(-z'Qz/2)\,dz\Big]$ ,

where $b_i = y_{i+1} - y_i$ and $Q = H^{-1}{}' V H^{-1}$.  Note that $Q$ is positive definite since $H^{-1}$ is non-singular and $V$ is positive definite, and $Q$ is symmetric since $V$ is symmetric.  Thus,

(8.3.17)  $E(\mu) = y - H^{-1} D(z)/P = y - H^{-1} Q^{-1} l / P$ ,

by applying the argument which derives (8.3.11) from (8.3.4).  The term $D(z)$ in (8.3.17) is similar to the term $D(z)$ in (8.3.4).  By substituting $b_i$ for $(y_i - a_i)$, $i = 1, 2, \ldots, m-1$, $+\infty$ for $(y_m - a_m)$, $-\infty$ for $(y_i - e_i)$, $i = 1, 2, \ldots, m$, and $Q$ for $V$ in the derivation of (8.3.10), (8.3.17) can be expressed in terms of multivariate normal integrals and other functions more easily evaluated.  An example will now be given to show how this can be accomplished in the case $m = 2$.
Consider the posterior

(8.3.18)  $p(\mu \mid y) = \exp\{-(y-\mu)'V(y-\mu)/2\} \Big/ \int_A \exp\{-(y-\mu)'V(y-\mu)/2\}\,d\mu$  for $\mu \in A$, and $p(\mu \mid y) = 0$ elsewhere,

where the set $A$ is $\{\mu : \mu_1 \le \mu_2\}$ and $V = I\sigma^{-2}$.  Here

$H = \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix}$ ,  $H^{-1} = \begin{pmatrix} -1 & -1 \\ 0 & -1 \end{pmatrix}$ ,  and  $Q = H^{-1}{}'VH^{-1} = \sigma^{-2}\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ .

Notice that, with $Q$ playing the role of $V$ in (8.3.8) and (8.3.9), $q_1 = \sigma^{-2}$, $V_1 = 2\sigma^{-2}$, $V_1^{-1} = \sigma^2/2$, and $q_1'V_1^{-1}q_1 = \sigma^{-2}/2$.

For this example, the appropriate substitutions in (8.3.8) would be to set $(y_1 - e_1)$ and $(y_2 - e_2)$ equal to $-\infty$, $(y_1 - a_1)$ equal to $b_1 = (y_2 - y_1)$, and $(y_2 - a_2)$ equal to $+\infty$.  Then, solving for $S_1(y_2 - y_1)$ and $S_2(+\infty)$ in (8.3.9), one finds that

$S_1(y_2 - y_1) = \sigma\sqrt{\pi}\,\exp\{(y_2-y_1)^2/(4\sigma^2)\}$ ,  $l = \begin{pmatrix} -\sigma\sqrt{\pi}\,\exp\{-(y_2-y_1)^2/(4\sigma^2)\} \\ 0 \end{pmatrix}$ ,  and  $P = 2\pi\sigma^2\,F\big((y_2-y_1)/(\sqrt{2}\,\sigma)\big)$ .

Then substituting for $l$ and $P$ in (8.3.17) gives

(8.3.19)  $E(\mu) = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} - \begin{pmatrix} \sigma/\sqrt{2} \\ -\sigma/\sqrt{2} \end{pmatrix} \dfrac{f\big((y_2-y_1)/(\sqrt{2}\,\sigma)\big)}{F\big((y_2-y_1)/(\sqrt{2}\,\sigma)\big)}$ .
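For the two-parameter ordering, the posterior mean in (8.3.19) involves only the univariate normal density and distribution functions.  The sketch below (assuming scipy; function name illustrative) computes both coordinates and shows that the estimates respect the ordering even when the observations do not.

    import numpy as np
    from scipy.stats import norm

    def ordered_pair_posterior_mean(y1, y2, sigma=1.0):
        """Posterior mean of (mu1, mu2) for a uniform prior on {mu1 <= mu2}; cf. (8.3.19)."""
        d = (y2 - y1) / (np.sqrt(2) * sigma)
        shift = sigma / np.sqrt(2) * norm.pdf(d) / norm.cdf(d)
        return y1 - shift, y2 + shift

    print(ordered_pair_posterior_mean(1.0, 0.0))   # estimates respect mu1 <= mu2 although y1 > y2
    print(ordered_pair_posterior_mean(0.0, 3.0))   # nearly the unrestricted estimates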
Now consider an example in which a random sample of size one is taken from each of three populations, where the $i$-th population has a normal density with mean $\mu_i$ ($i = 1, 2, 3$) and known variance $\sigma^2$.  Suppose the parameters $\mu_i$ are unknown, but are known to satisfy the ordering

$\mu_1 \le \mu_2 \le \mu_3$ .

The set in $\mu$-space defined by this ordering will again be called $A$.  The Bayesian estimators for the $\mu_i$ from an exponential prior of the form (4.2.2) can be derived by finding the expected value of a posterior distribution of the following form:

(8.3.20)  $p(\mu \mid y) = \exp\{-(X-\mu)'V(X-\mu)/2\} \Big/ \int_A \exp\{-(X-\mu)'V(X-\mu)/2\}\,d\mu$  on $A$, and zero elsewhere,

where $V = I\sigma^{-2}$ and $X$ is the vector of observations adjusted for the prior parameters $\theta_i$,

$X_1 = y_1 - \sigma^2/\theta_1$ ,  $X_2 = y_2 + \sigma^2(1/\theta_1 - 1/\theta_2)$ ,  $X_3 = y_3 + \sigma^2(1/\theta_2 - 1/\theta_3)$

(cf. the development leading to (8.4.2)).
The matrix $Q$ is

$Q = H^{-1}{}' V H^{-1} = \sigma^{-2}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}$ ,  with  $H = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix}$  and  $H^{-1} = -\begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$ .

The expression for $P$ would then be

$P = \int_{-\infty}^{\infty}\int_{-\infty}^{b_2}\int_{-\infty}^{b_1} \exp(-z'Qz/2)\,dz$ ,  with  $b_1 = X_2 - X_1$  and  $b_2 = X_3 - X_2$ .

The function given for $P$ is sometimes denoted in a way similar to the univariate normal distribution function (cf. Birnbaum and Meyer (1953)).  Adopting this notation,

$P = (2\pi)^{3/2}\,|Q|^{-1/2}\,F_3(b_1, b_2, \infty, Q)$ .

In evaluating the numerator of (8.3.17) a substitution should be made into (8.3.10) as was outlined in the paragraph prior to (8.3.18), i.e., $y_i - a_i = b_i$ ($i = 1, 2$), $y_3 - a_3 = \infty$, and $y_i - e_i = -\infty$ ($i = 1, 2, 3$).  Then the above-mentioned $Q$ will correspond to $V$ in (8.3.8), and so the vector $l$ given there would have values

(8.3.21)  $l = \begin{pmatrix} -S_1(b_1)\exp(-b_1^2 Q_{11}/2) \\ -S_2(b_2)\exp(-b_2^2 Q_{22}/2) \\ 0 \end{pmatrix}$ .

Recall that in expression (8.3.9) $V_i$ is the matrix $V$ with the $i$-th row and column deleted, and $q_i$ is the $i$-th row of $V$ with the $i$-th element deleted.  With $Q$ in the role of $V$,

$V_1 = \begin{pmatrix} 2 & 2 \\ 2 & 3 \end{pmatrix}\sigma^{-2}$ ,  $V_1^{-1} = \begin{pmatrix} 3 & -2 \\ -2 & 2 \end{pmatrix}\dfrac{\sigma^2}{2}$ ,  $V_2 = \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}\sigma^{-2}$ ,  $V_2^{-1} = \begin{pmatrix} 3 & -1 \\ -1 & 1 \end{pmatrix}\dfrac{\sigma^2}{2}$ ,

and substituting these into (8.3.9) it is found that each integral is a bivariate normal distribution function, so that

$S_1(b_1) = \exp\{b_1^2/(4\sigma^2)\}\,(2\pi)\,|V_1|^{-1/2}\,F_2\big(b_2 + b_1/2,\ \infty,\ V_1\big)$

and

$S_2(b_2) = \exp\{3 b_2^2/(4\sigma^2)\}\,(2\pi)\,|V_2|^{-1/2}\,F_2\big(b_1 + b_2/2,\ \infty,\ V_2\big)$ .

Then in (8.3.21),

$l = \begin{pmatrix} -(2\pi)\,|V_1|^{-1/2}\exp\{-b_1^2/(4\sigma^2)\}\,F_2(b_2 + b_1/2,\ \infty,\ V_1) \\ -(2\pi)\,|V_2|^{-1/2}\exp\{-b_2^2/(4\sigma^2)\}\,F_2(b_1 + b_2/2,\ \infty,\ V_2) \\ 0 \end{pmatrix}$ .
Substituting $X$ for $y$ and $Q$ for $V$ in (8.3.17), one finds

(8.3.22)  $E(\mu) = X - H^{-1} Q^{-1} l / P$ .

Recall that $Q^{-1} = \sigma^2 H H'$, so that $H^{-1} Q^{-1} = \sigma^2 H'$.  Then, substituting the expressions found for $l$ and $P$, one finds

(8.3.23)  $E(\mu) = X - \sigma^2 \begin{pmatrix} -1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix} l \Big/ \big[(2\pi)^{3/2}\,|Q|^{-1/2}\,F_3(b_1, b_2, \infty, Q)\big]$ .
Suppose $n_i$, $i = 1, 2, \ldots, m$, independent observations are made on each of $m$ populations, and that each population has a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$, known.  Let the $j$-th observation from the $i$-th population be denoted by $y_{ij}$.  Then the joint density of the $y_{ij}$ would be denoted by

$f(y \mid \mu) = (\sqrt{2\pi})^{-n}\,\prod_{i=1}^m \sigma_i^{-n_i}\,\exp\Big(-\sum_{i=1}^m \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2/(2\sigma_i^2)\Big)$ ,  where  $n = \sum_{i=1}^m n_i$ .

Assuming a uniform prior for $\mu$ over a set $A$, and setting the prior equal to zero over the complement of $A$, which is $A^c$, yields a posterior density on $A$ which is

$p(\mu \mid y) = \dfrac{\exp\Big(-\sum_{i=1}^m \sum_{j=1}^{n_i} (y_{ij}-\mu_i)^2/(2\sigma_i^2)\Big)}{\int_A \exp\Big(-\sum_{i=1}^m \sum_{j=1}^{n_i} (y_{ij}-\mu_i)^2/(2\sigma_i^2)\Big)\,d\mu} = \dfrac{\exp\Big(-\sum_{i=1}^m \big[\sum_{j=1}^{n_i} y_{ij}^2 - 2\mu_i \sum_{j=1}^{n_i} y_{ij} + n_i \mu_i^2\big]/(2\sigma_i^2)\Big)}{\int_A \exp\Big(-\sum_{i=1}^m \big[\sum_{j=1}^{n_i} y_{ij}^2 - 2\mu_i \sum_{j=1}^{n_i} y_{ij} + n_i \mu_i^2\big]/(2\sigma_i^2)\Big)\,d\mu}$

$= \dfrac{\exp\Big(-\sum_{i=1}^m (-2\mu_i \bar{y}_i + \mu_i^2)\,n_i/(2\sigma_i^2)\Big)}{\int_A \exp\Big(-\sum_{i=1}^m (-2\mu_i \bar{y}_i + \mu_i^2)\,n_i/(2\sigma_i^2)\Big)\,d\mu}$ ,  where  $\bar{y}_i = \sum_{j=1}^{n_i} y_{ij}/n_i$ .

Completing the square, the posterior is found to be

(8.4.1)  $p(\mu \mid y) = \exp\{-(\mu - \bar{y})' D (\mu - \bar{y})/2\} \Big/ \int_A \exp\{-(\mu - \bar{y})' D (\mu - \bar{y})/2\}\,d\mu$ .

Here $D$ is a diagonal matrix with elements $n_i/\sigma_i^2$.  Thus, (8.4.1) is a truncated normal posterior density.
If instead of a uniform prior, an exponential prior similar to (4.2.2) is assumed, the posterior density is proportional to

$\exp\Big(-\sum_{i=1}^m \big[\sum_{j=1}^{n_i} y_{ij}^2 - 2\mu_i \sum_{j=1}^{n_i} y_{ij} + n_i \mu_i^2\big]/(2\sigma_i^2) + \sum_{i=1}^m \mu_i (1/\theta_{i-1} - 1/\theta_i)\Big)$ ,  with $1/\theta_0 = 0$ .

The term $\exp\big(-\sum_{i}\sum_{j} y_{ij}^2/(2\sigma_i^2)\big)$ would cancel with the same term in the normalizing constant.  Then, following the precise steps given for (8.4.1), the posterior would be

(8.4.2)  $p(\mu \mid y) \propto \exp\Big(-\sum_{i=1}^m \big[\mu_i^2 - 2\mu_i\big(\bar{y}_i + (\sigma_i^2/n_i)(1/\theta_{i-1} - 1/\theta_i)\big)\big]\,n_i/(2\sigma_i^2)\Big)$  on $A$ .

Then, defining the vector $t$ to have elements $t_1 = \bar{y}_1 - \sigma_1^2/(n_1\theta_1)$, $t_2 = \bar{y}_2 - \sigma_2^2/(n_2\theta_2) + \sigma_2^2/(n_2\theta_1)$, etc. (the $i$-th element corresponding to the term with $\mu_i$ in expression (8.4.2)), and completing the square, the posterior is found to be

$p(\mu \mid y) = \exp\{-(\mu - t)' D (\mu - t)/2\} \Big/ \int_A \exp\{-(\mu - t)' D (\mu - t)/2\}\,d\mu$ ,

where $D$ is the diagonal matrix with elements $n_i/\sigma_i^2$.  This too is of the form of (8.3.3), which is a truncated normal posterior.  A truncated normal prior could be handled similarly.
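The two posteriors just derived differ only in their location vectors.  The following is a small sketch (plain numpy, illustrative names) assembling the precision matrix D and the location vector t of (8.4.2) from the sufficient statistics; the uniform-prior case of (8.4.1) is recovered by taking every 1/theta_i equal to zero.

    import numpy as np

    def truncated_normal_posterior_parameters(ybar, n, sigma2, theta=None):
        """Precision matrix D and location vector t of the truncated normal posterior (8.4.2).

        ybar, n, sigma2 are the sample means, sample sizes and known variances of the m
        populations; theta holds the exponential-prior parameters (None gives the uniform
        prior of (8.4.1), i.e. every 1/theta_i equal to zero).
        """
        ybar, n, sigma2 = map(np.asarray, (ybar, n, sigma2))
        m = len(ybar)
        inv_theta = np.zeros(m) if theta is None else 1.0 / np.asarray(theta)
        D = np.diag(n / sigma2)
        # i-th element: ybar_i + (sigma_i^2/n_i)(1/theta_{i-1} - 1/theta_i), with 1/theta_0 = 0.
        prev = np.concatenate(([0.0], inv_theta[:-1]))
        t = ybar + (sigma2 / n) * (prev - inv_theta)
        return D, t

    D, t = truncated_normal_posterior_parameters([0.2, 0.5, 0.4], [4, 4, 4], [1.0, 1.0, 1.0],
                                                 theta=[0.875, 0.875, 0.875])
    print(np.diag(D), t)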