Hall, Wm. J.; (1953)Estimation of the parameters of a bivariate normal population from truncated samples." (Navy Research)

.
13STIi:fATION OF THE PA.,.wmTERS OF A BIVARIATE
IWRHAL POPULATION FROM TRUNCATED SAMPLES
by
WHo JACKSON HALL
Institute of statistics, University of North Carolina
•
Special report to the Office of Naval Research
of work at Chapel Hill under Project ~m 042 031
for research in probability and statistics.
Institute of Statistics
Mimeograph Series No. 78
August 1953
";
, \.
~oduction
and summary.
l"aximum likelihood estimates of the
parameters of a bivariate normal distribution are obtained for a sample
in which only those observations falling in a specific region can be
measured, all other observations being called "unmeasured observations".
Two cases are treated, the number of unmeasured observations being unknown (Case I) or known (Case II).
Explicit expressions are obtained
when the region of truncation is a rectangle or an infinite strip.
The
asymptotic covariance matrix is obtained simultaneously with the solution.
t-.Te denote the bivariate normal density function with parameters
~x' ax' ~y' cry' and p (sometimes denoted ~l' ~2' ~3' ~4' ~5 for con-
venience) by ¢(x,y).
Then, i.n Case I, the likelihood of a sample of n
independent observations all in a region R is
1
n
II ¢(x. ,y.)
pn i=l
1.
1.
where
p
= Pr~(x,y) in R_7 = \ ¢(x,y)
dx
dy
R.
and, in Case II, the likelihood of a sample of N independent observations of which n observations occur in Rand N-n elsewhere is
-2n
(~T) (l_p)~T-n II 0{x. ,Y ) •
n
i=l ~ i
The partial derivative of the logarithmic likelihood,L , with respect to
one of the parameters, say A, is
at
ox
where f{n,p)
=
= _ f(n,p)
dp
'at
+
n/p in Case I and f{n,p) = (N-n)/{l-p) in Case II.
To
obtain the maximum likelihood estimates, all five partial derivatives
are equated to zero and solved for the unknorm parameters.
Iter~tive solut~on
of the maximum likelihood equations.
To solve
the five estimating equations simultaneously, we propose a Newton iterative procedure.
Choosing an initial trial solution, we approxiaate the
system of equations by a linear system
series expansion.
o
at
= d"X
usin~
the linear terms in a Taylor
Thus
(j=l, ... ,5)
j
where a subscript (l) denotes evaluation at the first trial point and a
superscript (1) denotes the first trial value.
In matrix notation we have
-3-
~mere f
is the (column) vector with elements
*~.
(j=l, •.• ,S), d(l) is
J
the (column) vector with elements (Aj_Aj(l», and
:\ = (a J.J
.. )
Second trial values are obtained from the first from
(assuming \ 1 ) non-singular), and by substituting these values for the
intti.al ones further estimates are obtained, and so on until stability
is reached.
(It
~-lay
not be necessary to recalculate the A matrix at
each step if its elements are sufficiently stationary.
i1hen it is recal-
culated, its inverse may be obtained quickl~r by iteration L-~_7.
~ecision
•
.
of
•
~stimates.
lbe
as~aptotic
inverse of the matrix with elements -::(-
2
covariance matrix is the
~.~X~), and thus may be estimated
I
by A-I.
)
J
1bis estin~te is obtained simultaneously with the solution of the
estimating equations.
-4Rectgular truncation. Here we develop explicitly the estimating
equations tor a rectangularly truncated population.
Let the region R be a
rectangle, boWlded by the line. x • hl' x • h 2 , Y • k l , 1· k 2 (h1<h 2 , k 1< k 2 ).
Then
k
P
•
h
2
\
2
)
~x.y) dx dy •
k1 hl
~Tow
_(x,y) may be expre.Md a. a power series in p with Hennite
functions as coefficients-
where
G (t)
•
(_l)v...!..
v
L
~n dt
v
• -
1
~n
e
_t 2/2
V
e- t
2/
2
( '11= 1 , 2 ,
•••
)
•
(See R. A. Fisher's introduction, pp.xxvi-xxviii, ~2_7.) For negative sub-
scripts, the Hermite functions are defined by the two following
t
1 -
~
-CD
relationB~
Since the series in p converges 'uniformly in x and y, and since
~._G~ (t)
dt ;.
- Gv- l(t), we have
11 2 (.,2
P
-
\
\
vfo
~ Gv(x) Gv(Y) cIx dy
"1 ~l
l-There
'1').
J.
=
(i=1,2)
and
After calculating the deriva.tives of the l. r,s function, we obtain the
following derivatives:
(COX)
TAELE:S OF PARTIALLY BALANCED
DESIGNS 1VITH TI-lO ASSOCIATE
CLASSES
By R. C. Bose, W. H. Clatworthy
and S. S. Shrikhande
(Inst. of Stat. Mimeo. Series
No. 77)
Name
Date
-6-
I
Op
1
(
1:
=
di:x
1;
1,0
(J
x
)
(1)
1
dP
d<Tx
~
dl)
!
I
!
dp = 61 ,1
d2D
Of.l.x
---';;'2
=
.,~
!
I
I
i
1
.~ i;2,0
a
x
02
~-;L.jax =
!
(I' Xl, 1 -I- 2: 2 ,0)
I
(
I
~x
=
2
d "0
dll.'x'y
aIL =
1 (I' Z2 1 + Z3 0)
a 2
,
,
x
1
-- z1,1
(J (J
x,y
I
,
!
i
I
i
2
d P _
df.L)~ay -
(j~y
=~
(1'1. 2 1+ );1 2)
,
,
I
(2)
)
\
02 p
1 ~'
~x~p = ax "'2,1
-7\
\
\
\
I
I
\.
all others being obtained by symmetry.
('1'he interchange of x and y re-
quires the interchange of the order of the subscripts on the Z
functions.)
r,s
In Case I,
o f ()
ou
n,p =
~.' n!p,2.
and in Case I:r.,
(4)
~p
f(n,p)
:=
(N-n)/(1-p)2.
Calculating the derivatives of log ¢(X,7),
~~e
find'
-8i1
•Z
(}
'd!J..-
~=l
log
:c
0( x.; ,Y.;) =
I
I
(5)
Z
i
_~\L log ¢
dcrx
J-
Z op loc.
¢
(
'\
I
I
\
\
I
npm
01
=
2-
a a (l-p )
x y
(6)
= -
11
lOW (1 + P 2 )
----2-2
-).ax ( loop
111
01
-
7
2p m
10-
-9-
o2
Z
np m1l
dcrF log ¢:: .-.
~
Yo
Y
crxy
cr (1- p )
= -
n 2 ~(l + 3p 2)(m + m )
20
02
(l-p )
- 2p (3 + p2) m - (1-p4
11
t7
j
all others being obtained by symmetry; we have denoted
(r,s = 0,1,2) .
m
rs
Using (1) and (5),
=
1'19
may now calculate the elements of the
o
-f(n,p) -:.
O~
+
n
0
z ~. -. log
i=1 o~
f
vector:
¢(X'1.'Yi)
.
and using (1), (2), (3), (4), and (6), we ;:lay calculate the elements of the
A matrix'
-10-
(8)
(The required Z
functions may be computed from tables of the tetrachoric
r,s
functions ~~~7, using the relation.iV1 lv(t) = Gv_l(t). )
The estimates for truncation over an infinite quarter-plane may be
obtained by letting one of the hIs and one of the kls in the discussion
above Co to
.~ 00.
Initial
estL~ates
may be obtained by approximating the region R by
an infinite strip and using the linear truncation estimation method which
follows •
.~ear truncation.
loVe shall consider independently the estimation
problem when the region R is an infinite strip, bounded by x
= hI
and x
= h2
(h l < h 2). Here, we shall distinguish three CalGS' (I) the number of unmeasured observations is unknown, (II) the nrnnbers of unmeasured observations
in each truncated half-plane are known (say nl observations in R =
l
["(x,y) ~ x < hL7 and n2 in R2 '" L(x,y)' x >h 2_7, nJ + n2 '" N-n), and (III)
only the total nrnnber N-n of unmeasured observations is known.
NoW the marginal distribution of x is independent of
and is simply a truncated univariate normal distribution.
~
y
, a , and p,
y
Thus, in all
r4
and (]' may be estimated by the methods of A. C. Cohen
7
x
x
- for truncated univariate normal distributions, the three cases above corres-
three cases,
f.l.
ponding to the three cases enumerated by Cohen.
The likelihood functions in the three cases
are~
-Il-
(
(9)
~
l
n
l
-
(I)
II ¢(xi,Yi)
. 1
p n J.C
11
11
(II)
k1 P1
1
P2
2
11
II ¢(xi,Y;)
1=1
11
(III)
k (1-p )N-n
II ¢(x:t,Y;)
2
i=l
where
P1::
r()
7
--(hl-U. )
Pr,
L x,y in R _. :: ~ -. x
1
-
O'x
and
t
~ (t)
.)
-co
.J:.... e-t
-12ft
2
/2 dt
kl and k2 are constants. Since p, P1' and P2 are all independent of ~Y' O'y'
and p , the maximum likelihood equations for these three parameters will be
the same in all three cases; namely, from (5) and (9),
-12-
~ = -
OOy
n_ - (1 - p 2 +p m - m )
cr (l-p 2)
11 02
= 0
y
where L denotes the logarithmic likelihood, m being defined by (7). Havrs
ing obtained estimates of ~x and crx by Cohen's method, these three equations
may be solved for estimates of !..Ly , cry , and p, yielding (after some algebraic
manipulation) :
2
I-L
y
- 2 m10 '~x +!..Lx )
=
(10)
m
+
'-!..L
(mOl,
Y)
-!..Lx
10
(11)
p
2
r~'" 2 _ ( m , - 2
'
mlO!..L
20
x +
-
x
substitute
r,1
' - ~O' mOl'
m20 I
~
-
';I~
7
2)_ .
=
where mrs' denotes sample moments about the origin.
. 11
~x
- .La
,
for
If
!-Lx
= ~O',
then we
-13in equations (10) and (11).
Since the estimates of ~x and a x are independent of (1 y , a y , and p,
we find that the A matrix is now the direct sum of two sub-matrices, and
likelflse for its inverse.
~
x
The first inverse sub-matrix (corresponding to
and a ) is given by Cohen
x
;-4 7,
- -
the second may be obtained by inverting
the matrix whose elements are given by (8).
Aclcnowledgement.
Acknowledgement is hereby made to Professor H.
Rotelling for proposing this problem and for helpful suggestions, and to
Professor 8. N. Roy for reading the manuscript.
Harold Hotelling, IIPractical problems of matrix calculation",
Berkele S osium on Mathematical Statistics and Pro abi i y, Univarsity of Ca ifornia Press {1949J, pp. 275-93.
British Association for the ~dvancement of Science, Mathematical Tables, Vol. I, Cambridge University Press.
Tables for Statisticians and Biometricians,
Part II, Cambridge University Press (1931),
pp. 74-7.
I~stimating the mean and variance of normal
populations from singly truncated and doubly
truncated sample sit, Annals of Mathematical
Statistics, Vol. XXI (1950), pp. 551-69~ .