
TOWARDS RECONSTRUCTION OF AN UN-PAIRED RANDOM SAMPLE
Donna Lucas*
Department of Statistics
University of North Carolina at Chapel Hill
*This work was supported in part by the National Science Foundation under
grant MCS78-01434 and by the Army Research Office under Contract
DAAG29-77-C-0035.
DONNA LUCAS. Towards Reconstruction of an Un-paired Random Sample.
(Under the direction of INDRA M. CHAKRAVARTI and NORMAN L. JOHNSON.)
The problem of re-pairing a broken random sample, as first presented by DeGroot, Feder, and Goel (1971), is considered.

The probability of occurrence of a sufficient condition for equivalency of the maximum likelihood and optimal solutions when the broken random sample is from a bivariate normal distribution is examined. This requires an investigation of the distribution of the product of ranges of the component variables in samples from a bivariate normal distribution.

For the problem of matching one particular observation of the un-paired random sample, a procedure based on the most probable rank of the concomitant of that observation is proposed. The maximum likelihood solution to the matching problem for multivariate normal data when each observation vector has been separated into three components is given.

Additionally, reconstruction of partially broken random samples is presented in the framework of regression and experimental designs. For the case in which the partially un-paired random sample arises from a bivariate normal distribution, the maximum likelihood estimator of the correlation coefficient is derived. Where X and Y have an absolutely continuous bivariate distribution, the joint distribution of the ith ordered X and jth ordered Y is discussed as a useful tool in examining the consequences of estimation and testing procedures applied to re-paired samples.
ACKNOWLEDGEMENTS

With sincere appreciation I acknowledge the guidance and encouragement of my advisors, Professors I.M. Chakravarti and N.L. Johnson. I also wish to thank members of my examination committee, Professors R.J. Carroll, R.W. Helms, and P.K. Sen, for their interest and helpful comments. To other members of the faculty of the Department of Statistics and the Department of Biostatistics who have contributed to my graduate education I extend thanks. In particular I wish to mention the friendship and advice of Professor M.R. Leadbetter.

I express appreciation to the National Science Foundation and the Department of Statistics for financial support throughout my graduate education.

Special thanks go to Ms. Mary Riddick for an excellent job of typing and cheerful assistance with many details.
TABLE OF CONTENTS

CHAPTER I: INTRODUCTION
    1.1. The Matching Problem ........ 1
    1.2. The Multivariate Normal Case ........ 9
    1.3. Outline of Proposed Research ........ 12

CHAPTER II: COMPARISON OF MAXIMUM LIKELIHOOD AND OPTIMAL SOLUTIONS
    2.1. Results for n = 4 ........ 14
    2.2. Distribution of the Product of Ranges of Correlated Deviates
        2.2.1. Introduction ........ 19
        2.2.2. Exact Distribution for n = 2 ........ 24
        2.2.3. Lower and Upper Bounds ........ 25
        2.2.4. Calculations for Larger Samples ........ 31
        2.2.5. Moments when n = 2 ........ 35
        2.2.6. First and Second Moments when n = 3 ........ 39
        2.2.7. Lognormal Approximation ........ 49

CHAPTER III: MATCHING A SINGLE OBSERVATION
    3.1. Application of the Distribution of the Rank of the Concomitant ........ 55
    3.2. The Most Probable Rank of the First Concomitant ........ 58
    3.3. The Most Probable Rank of the nth Concomitant ........ 62
    3.4. General Results for n = 2 ........ 64

CHAPTER IV: RE-PAIRING A PARTIALLY BROKEN SAMPLE ........ 69

CHAPTER V: THE JOINT DISTRIBUTION OF ORDER STATISTICS OF CORRELATED VARIABLES
    5.1. Preliminary Computations ........ 76
    5.2. Joint Distribution of $X_{i:n}$ and $Y_{j:n}$ ........ 80
    5.3. Joint Distribution of $X_{i:n}$, $X_{j:n}$, $Y_{k:n}$ and $Y_{\ell:n}$ ........ 82

CHAPTER VI: RECONSTRUCTION OF A TWICE BROKEN SAMPLE
    6.1. Trivariate Normal Case ........ 86
    6.2. A Three-Dimensional Assignment Problem ........ 91
    6.3. Multivariate Normal Case ........ 97

CHAPTER VII: MATCHING UNDER OTHER MODELS
    7.1. Experimental Designs ........ 100
    7.2. Regression ........ 105

BIBLIOGRAPHY ........ 107
CHAPTER I

INTRODUCTION

1.1. The Matching Problem
Many versions of random matching problems have been discussed.
One such problem is the following, as given by Feller (1957).
Two
identical decks of N cards are each put into random order and then
compared.
A match occurs whenever a card occupies the same position in
both decks.
The frequency distribution of the number of matches is
calculated.
We will investigate a different type of matching problem,
in which available information about the objects to be matched is used
with the goal of obtaining a more correct pairing than would result under
random matching.
This problem was introduced by DeGroot, Feder, and
Goel (1971).
Let $T$ denote an $r$-dimensional random vector ($r \ge 1$) and $U$ an $s$-dimensional random vector ($s \ge 1$). Let $r + s = p$. These random vectors have a joint $p$-dimensional probability distribution, which we assume is known. A random sample of size $n$ is drawn from this probability distribution:
$$\begin{bmatrix} t_1 \\ u_1 \end{bmatrix},\ \begin{bmatrix} t_2 \\ u_2 \end{bmatrix},\ \ldots,\ \begin{bmatrix} t_n \\ u_n \end{bmatrix}.$$
However, before this sample can be observed, each observation is broken into two components, $t_i$ and $u_i$. Then $t_1, t_2, \ldots, t_n$ are observed in some random order which we denote by $v_1, v_2, \ldots, v_n$, and $u_1, u_2, \ldots, u_n$ are observed in an independent random order $w_1, w_2, \ldots, w_n$. The observed vectors $v_1, v_2, \ldots, v_n$ and $w_1, w_2, \ldots, w_n$ are termed a broken random sample from the joint distribution of $T$ and $U$. Due to the
randomization, it is not known how the vectors were paired in the original unbroken sample.
The general problem is to re-pair the vectors in
some optimal manner, attempting to reconstruct the original random
sample.
A research worker could encounter such a problem due to accidental
causes or negligence. Also, a situation similar to the following could
occur. Measurement of a particular characteristic is taken on each
member of a set of individuals.
Later in the course of research, it is
decided to measure other characteristics of these same individuals.
However, because of lack of foresight in recording results, it is not
possible to identify each earlier measurement with a specific individual
and thus with a later set of observations.
A re-pairing procedure is
required in order to jointly utilize all data.
In the first presentation of this problem, DeGroot, Feder, and Goel (1971) assume that the joint distribution of $T$ and $U$ can be represented by a probability density function of the form
$$f(t, u) = \alpha(t)\,\beta(u)\,\exp[\gamma(t)\,\delta(u)], \qquad t \in R^r,\ u \in R^s, \tag{1.1}$$
where $\alpha$, $\beta$, $\gamma$, and $\delta$ are real-valued functions of the indicated vectors. If $T$ and $U$ are both one-dimensional and jointly have a bivariate normal distribution, then the pdf is of the form (1.1). If $r = 1$ or $s = 1$ and the joint distribution of $T$ and $U$ is multivariate normal, then the pdf is also of this form.
DeGroot, Feder, and Goel (1971) propose re-pairing the broken random sample in the way which maximizes the probability of a completely correct set of $n$ matches. Let $\Phi$ denote the set of all $n!$ possible permutations of the integers $1, 2, \ldots, n$, and let $\phi = [\phi(1), \phi(2), \ldots, \phi(n)]$ denote one such permutation. If the vector $w_{\phi(i)}$ were paired with the vector $v_i$ in the unbroken sample, $i = 1, 2, \ldots, n$, then the joint pdf for the entire sample would be
$$\prod_{i=1}^{n} f(v_i, w_{\phi(i)}) = \Big[\prod_{i=1}^{n}\alpha(v_i)\Big]\Big[\prod_{i=1}^{n}\beta(w_i)\Big]\exp\Big[\sum_{i=1}^{n}\gamma(v_i)\,\delta(w_{\phi(i)})\Big]. \tag{1.2}$$
The maximum likelihood (ML) solution to the matching problem is the permutation $\phi$ for which (1.2) is maximized. Let $x_i = \gamma(v_i)$ and $y_i = \delta(w_i)$ for $i = 1, 2, \ldots, n$. We assume, without loss of generality, that the vectors $v_1, v_2, \ldots, v_n$ and $w_1, w_2, \ldots, w_n$ are indexed so that
$$x_1 \le x_2 \le \cdots \le x_n \quad \text{and} \quad y_1 \le y_2 \le \cdots \le y_n. \tag{1.3}$$
Furthermore, let $x' = (x_1, x_2, \ldots, x_n)$, $y' = (y_1, y_2, \ldots, y_n)$, and $y(\phi)' = (y_{\phi(1)}, y_{\phi(2)}, \ldots, y_{\phi(n)})$ for any permutation $\phi \in \Phi$. Then we see from (1.2) that the ML solution is the permutation $\phi$ for which $x'\,y(\phi)$ is maximized. Since the components of $x$ and $y$ are ordered according to (1.3), by a well known result (see Hardy, Littlewood, and Pólya (1967), page 261) $x'\,y(\phi) \le x'\,y$ for all $\phi \in \Phi$. The ML solution is the permutation $\phi^* = (1, 2, \ldots, n)$, which pairs $w_i$ with $v_i$, $i = 1, 2, \ldots, n$.
To find the posterior probability of obtaining a completely correct set of $n$ matches, we note that all $n!$ permutations in $\Phi$ have the same prior probability of being the permutation of the unbroken sample. Applying (1.2), we express the posterior probability $p(\phi)$ of any $\phi \in \Phi$, after the broken random sample has been observed, as
$$p(\phi) = \exp[x'\,y(\phi)]\Big/\sum_{\psi\in\Phi}\exp[x'\,y(\psi)]. \tag{1.4}$$
The permutation $\phi^* = (1, 2, \ldots, n)$, which is the ML solution, has the highest posterior probability.
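For small $n$ the posterior (1.4) can be enumerated directly. The following Python sketch (with made-up scores $x_i = \gamma(v_i)$ and $y_i = \delta(w_i)$, not code from the thesis) computes $p(\phi)$ over all $n!$ permutations and confirms that, with both sequences sorted, the identity permutation attains the highest posterior probability:

```python
import math
from itertools import permutations

def posterior(x, y):
    """Posterior p(phi) of (1.4): p(phi) proportional to exp(x' y(phi)),
    normalized over all n! permutations phi."""
    perms = list(permutations(range(len(y))))
    weights = [math.exp(sum(xi * y[p[i]] for i, xi in enumerate(x)))
               for p in perms]
    total = sum(weights)
    return {p: wt / total for p, wt in zip(perms, weights)}

# Made-up sorted scores for n = 4 (hypothetical data).
x = [-1.2, -0.3, 0.5, 1.4]
y = [-0.9, -0.1, 0.6, 1.1]
post = posterior(x, y)
ml = max(post, key=post.get)   # the ML solution phi*
print(ml)                      # (0, 1, 2, 3): pair w_i with v_i
```

Because both score sequences are sorted, the rearrangement inequality guarantees the identity permutation maximizes $x'\,y(\phi)$, hence the posterior.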
Considering the ML solution for the bivariate normal case, let $x_i = v_i/\sigma_T$ and $y_i = [\rho/(1-\rho^2)]\,w_i/\sigma_U$, $i = 1, 2, \ldots, n$, where $\sigma_T^2$ is the variance of $T$, $\sigma_U^2$ the variance of $U$, and $\rho$ the correlation coefficient. Assuming that the observations of the broken random sample are indexed so that (1.3) holds, the ML solution is to pair $w_i$ with $v_i$, $i = 1, 2, \ldots, n$. This is equivalent to ordering the observations according to $v_1 \le v_2 \le \cdots \le v_n$ and $w_1 \le w_2 \le \cdots \le w_n$, and pairing $w_i$ with $v_i$ if $\rho > 0$, $w_{n+1-i}$ with $v_i$ if $\rho < 0$, for $i = 1, 2, \ldots, n$. Of course, if $\rho = 0$, then all permutations are equally likely. We note that the ML solution for the bivariate normal case depends only on the sign of $\rho$. Thus for $\rho > 0$, the ML solution dictates the same re-pairing procedure, regardless of the magnitude of $\rho$.

DeGroot and Goel (1976) investigate properties of the ML solution for the bivariate normal case. Monte Carlo methods are used to obtain estimates of the probability of a completely correct set of $n$ matches, the expected number of correct matches, and the probability of no correct matches when applying the ML procedure. The results show that this method performs better than random matching with regard to all three criteria.
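In practice the bivariate normal ML rule is just a sort, since only the sign of $\rho$ matters. A minimal Python sketch (illustrative data, not from the thesis):

```python
def ml_repair(v, w, rho):
    """ML re-pairing for a broken bivariate normal sample: sort both
    components and pair matching ranks when rho > 0, reversed ranks
    when rho < 0; the magnitude of rho is irrelevant."""
    if rho == 0:
        raise ValueError("rho = 0: all pairings are equally likely")
    return list(zip(sorted(v), sorted(w, reverse=(rho < 0))))

# Hypothetical broken sample of size 3.
pairs = ml_repair([0.4, -1.1, 2.0], [1.3, -0.2, -0.8], rho=0.7)
print(pairs)   # [(-1.1, -0.8), (0.4, -0.2), (2.0, 1.3)]
```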
As a further example, consider the case in which $T$ is one-dimensional, $U$ is $s$-dimensional, and their joint distribution is multivariate normal. We partition the mean vector and covariance matrix of the joint distribution as
$$\mu = \begin{bmatrix} \mu_T \\ \mu_U \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} \sigma_T^2 & \sigma_{TU}' \\ \sigma_{TU} & \Sigma_{UU} \end{bmatrix}.$$
Let $\sigma_{T\cdot U}^2 = \sigma_T^2 - \sigma_{TU}'\,\Sigma_{UU}^{-1}\,\sigma_{TU}$. Examination of the joint pdf leads to choosing $x_i = v_i$ and
$$y_i = (1/\sigma_{T\cdot U}^2)\big[\mu_T + \sigma_{TU}'\,\Sigma_{UU}^{-1}(w_i - \mu_U)\big], \qquad i = 1, 2, \ldots, n.$$
Assuming that (1.3) holds, we pair $w_i$ with $v_i$, $i = 1, 2, \ldots, n$, to obtain the ML solution. For the multivariate normal distribution,
$$E(T \mid U = w_i) = \mu_T + \sigma_{TU}'\,\Sigma_{UU}^{-1}(w_i - \mu_U).$$
Thus the ML solution is to calculate the values $z_i = E(T \mid U = w_i)$, $i = 1, 2, \ldots, n$, index the observations in the broken random sample so that $v_1 \le v_2 \le \cdots \le v_n$ and $z_1 \le z_2 \le \cdots \le z_n$, and pair $w_i$ with $v_i$, $i = 1, 2, \ldots, n$.
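The conditional-mean rule is easy to sketch in Python; for concreteness the snippet below takes $s = 2$ and inverts $\Sigma_{UU}$ by hand (all numbers are hypothetical, chosen only for illustration):

```python
def conditional_means(ws, mu_t, mu_u, sigma_tu, sigma_uu):
    """z_i = E(T | U = w_i) = mu_T + sigma_TU' Sigma_UU^{-1} (w_i - mu_U),
    written out for s = 2 so the 2x2 inverse can be done by hand."""
    (a, b), (c, d) = sigma_uu
    det = a * d - b * c
    # row vector g = sigma_TU' Sigma_UU^{-1}
    g = [(sigma_tu[0] * d - sigma_tu[1] * c) / det,
         (-sigma_tu[0] * b + sigma_tu[1] * a) / det]
    return [mu_t + g[0] * (w[0] - mu_u[0]) + g[1] * (w[1] - mu_u[1])
            for w in ws]

def repair_by_regression(vs, ws, mu_t, mu_u, sigma_tu, sigma_uu):
    """Pair the sorted v's with the w's sorted by z_i = E(T | U = w_i)."""
    zs = conditional_means(ws, mu_t, mu_u, sigma_tu, sigma_uu)
    ws_by_z = [w for _, w in sorted(zip(zs, ws))]
    return list(zip(sorted(vs), ws_by_z))

# Hypothetical broken sample: scalar v's, 2-dimensional w's.
vs = [0.9, -1.2, 0.4]
ws = [[1.0, 0.2], [-0.5, -1.0], [0.3, 1.5]]
pairs = repair_by_regression(vs, ws, 0.0, [0.0, 0.0],
                             [0.8, 0.5], [[1.0, 0.3], [0.3, 1.0]])
print(pairs)
```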
DeGroot, Feder, and Goel (1971) also investigate the problem of re-pairing the broken random sample in order to maximize the expected number of correct matches. The solution which meets this requirement is referred to as the optimal solution. It is still assumed that the joint pdf is of the form (1.1). The sample is from an absolutely continuous distribution, and so we may assume that $x_1 < x_2 < \cdots < x_n$ and $y_1 < y_2 < \cdots < y_n$. For any permutation $\pi \in \Phi$, denote by $M(\pi)$ the expected number of correct matches when $w_{\pi(i)}$ is paired with $v_i$, $i = 1, 2, \ldots, n$. Let $N(\pi, \phi)$ be the number of values of $i$, $i = 1, 2, \ldots, n$, for which $\pi(i) = \phi(i)$, where $\phi$ is also a permutation in $\Phi$. When the broken sample is re-paired according to $\pi$, and the pairing in the original sample was actually $\phi$, then $N(\pi, \phi)$ is the number of correct matches. Thus,
$$M(\pi) = \sum_{\phi\in\Phi} N(\pi, \phi)\,p(\phi) \quad \text{for any } \pi \in \Phi, \tag{1.5}$$
where $p(\phi)$ is the posterior probability given by (1.4). Considering (1.4), we see that the permutation which maximizes $M(\pi)$ is the same as that which maximizes
$$V(\pi) = \sum_{\phi\in\Phi} N(\pi, \phi)\,\exp[x'\,y(\phi)]. \tag{1.6}$$
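For small $n$, $M(\pi)$ in (1.5) can be computed exactly by brute force. The Python sketch below (hypothetical data) does so for $n = 4$ and checks that the optimal permutation fixes the two extreme observations:

```python
import math
from itertools import permutations

def expected_matches(x, y):
    """M(pi) of (1.5): sum over phi of N(pi, phi) p(phi), with the
    posterior p(phi) proportional to exp(x' y(phi)) as in (1.4)."""
    n = len(x)
    perms = list(permutations(range(n)))
    w = {phi: math.exp(sum(x[i] * y[phi[i]] for i in range(n)))
         for phi in perms}
    total = sum(w.values())
    return {pi: sum(sum(pi[i] == phi[i] for i in range(n)) * w[phi]
                    for phi in perms) / total
            for pi in perms}

# Hypothetical sorted observations, n = 4.
x = [-1.5, -0.2, 0.1, 1.3]
y = [-1.0, -0.4, 0.3, 0.9]
M = expected_matches(x, y)
optimal = max(M, key=M.get)
print(optimal, round(M[optimal], 4))
```

A useful sanity check: averaging $M(\pi)$ over all permutations gives 1, the expected number of correct matches under random pairing.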
The following result is shown by DeGroot, Feder, and Goel (1971).

Theorem 1.1. Let $\phi \in \Phi$ be a permutation such that $\phi(i) > \phi(j)$ for some integers $i$ and $j$, $1 \le i < j \le n$. Let $\psi$ be defined by the relations $\psi(i) = \phi(j)$, $\psi(j) = \phi(i)$, and $\psi(k) = \phi(k)$ for all other $k$. Let
$$\Delta^* = \cdots$$
If $\Delta^* \ge 0$ for every pair of integers $(q, h)$ such that either $1 \le q < \phi(j)$ and $j < h \le n$, or $\phi(i) < q \le n$ and $1 \le h < i$, then $M(\psi) > M(\phi)$.

Applying this theorem, it is established that if $\phi^0 \in \Phi$ is a permutation for which the expected number of correct matches is maximized, then $\phi^0(1) = 1$ and $\phi^0(n) = n$. Thus for $n = 2$ and $n = 3$ the optimal and ML solutions are identical.
Additionally, sufficient conditions are found under which, for an arbitrary $n > 3$, the optimal solution is simply to pair $w_i$ with $v_i$, $i = 1, 2, \ldots, n$. The following theorem gives conditions which, as DeGroot, Feder, and Goel note and as can be observed from simple numerical examples, are usually more restrictive than necessary.

Theorem 1.2. If $(x_n - x_2)(y_{n-1} - y_1) \le 1$ and $(x_{n-1} - x_1)(y_n - y_2) \le 1$, then the expected number of correct matches is maximized by pairing $w_i$ with $v_i$, $i = 1, 2, \ldots, n$.

It follows that if $(x_n - x_1)(y_n - y_1) \le 1$, the optimal solution is the same as the ML solution. For the bivariate normal case, this condition is
$$\frac{\rho}{1-\rho^2}\cdot\frac{v_{n:n} - v_{1:n}}{\sigma_T}\cdot\frac{w_{n:n} - w_{1:n}}{\sigma_U} \le 1. \tag{1.7}$$
In Chapter II we examine the probability with which this condition holds, and thus are led to consider the problem of the distribution of the product of ranges of the component variables in samples from a bivariate normal distribution.
Chew (1973) extends the matching problem as first introduced to a more general class of bivariate distributions, those whose densities possess a monotone likelihood ratio (MLR). A pdf $f(x, y)$ has a non-decreasing MLR if for any $x_1 < x_2$ and $y_1 < y_2$, $g(x_1, x_2; y_1, y_2) \ge 0$, where
$$g(a, b;\, c, d) = f(a, c)f(b, d) - f(a, d)f(b, c).$$
We say that $f(x, y)$ possesses a non-increasing MLR if $g(x_1, x_2; y_1, y_2) \ge 0$ for every $x_1 \le x_2$ and $y_1 \ge y_2$.

Suppose then that we have a broken random sample $x_1, x_2, \ldots, x_n$, $y_1, y_2, \ldots, y_n$ from a bivariate distribution whose pdf $f(x, y)$ possesses a non-decreasing MLR. Assume that the observations are indexed so that (1.3) holds. If the observations $x_i$ and $y_{\phi(i)}$ were paired in the original unbroken sample, for $i = 1, 2, \ldots, n$ and $\phi \in \Phi$, then the joint pdf for the sample would be
$$L(\phi;\, x, y) = \prod_{i=1}^{n} f(x_i, y_{\phi(i)}). \tag{1.8}$$
The ML solution, which maximizes (1.8) among all $\phi \in \Phi$, is $\phi^* = (1, 2, \ldots, n)$, due to the fact that $f(x, y)$ possesses a non-decreasing MLR. Also, given the observations of the broken random sample, the posterior probability of a completely correct set of $n$ matches for any $\phi \in \Phi$ is
$$p(\phi) = L(\phi;\, x, y)\Big/\sum_{\psi\in\Phi} L(\psi;\, x, y). \tag{1.9}$$
This is maximized by the ML solution, pairing $y_i$ with $x_i$ for $i = 1, 2, \ldots, n$. If the broken random sample is from a distribution whose density possesses a non-increasing MLR, the ML solution is to pair $y_{n+1-i}$ with $x_i$ for $i = 1, 2, \ldots, n$.

Chew (1973) and Goel (1975) establish results concerning optimal pairing for this enlarged class of bivariate distributions. As is done for the class of distributions whose pdf's are of the form (1.1), sufficient conditions are obtained under which the optimal solution is equivalent to the ML solution. Additionally, Chew (1973) compares the expected number of correct matches for the ML solution with that for random matching, when the broken random sample is from a distribution whose density possesses an MLR. If the $n$ objects to be paired are distinct, then the expected number of correct matches when pairing randomly is 1. It is shown that $M(\phi^*) \ge 1$.

1.2. The Multivariate Normal Case
Suppose that $r$, the dimension of $T$, and $s$, the dimension of $U$, are both greater than 1, and $T$ and $U$ have a joint multivariate normal distribution. Assume that the mean vector is $0$, and let the covariance matrix $\Sigma$ be partitioned according to
$$\Sigma_{p\times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},$$
where $\Sigma_{11}$ is $r\times r$, $\Sigma_{12}$ is $r\times s$, $\Sigma_{21}$ is $s\times r$, and $\Sigma_{22}$ is $s\times s$. Let
$$\Sigma^{-1} = \begin{bmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{bmatrix},$$
partitioned in the same way.

Given the observations in the broken random sample, the likelihood function of a permutation $\phi \in \Phi$ is
$$L(\phi) = \exp\Big[-\sum_{i=1}^{n} v_i'\,\Sigma^{12}\,w_{\phi(i)}\Big], \tag{1.10}$$
omitting the factors which do not depend on $\phi$. Maximizing $L(\phi)$ is equivalent to maximizing the posterior probability of a completely correct set of $n$ matches, and $L(\phi)$ is maximized if
$$C(\phi) = \sum_{i=1}^{n} v_i'\,\Sigma^{12}\,w_{\phi(i)} \tag{1.11}$$
is minimized.
DeGroot and Goel (1976) show that determining the ML solution to the matching problem in this case is equivalent to solving a corresponding linear assignment problem. Assume, without loss of generality, that $r \le s$. Then $\operatorname{rank}(\Sigma_{12}) \le r$. Let $x_i = v_i$ and $y_i = \Sigma^{12}\,w_i$. We can write
$$C(\phi) = \sum_{i=1}^{n} x_i'\,y_{\phi(i)}. \tag{1.12}$$
Let $P = ((p_{ij}))$ be an $n\times n$ matrix with $p_{ij} = x_i'\,y_j$. Finding the permutation which minimizes (1.11) is the same as choosing $a_{ij}$, $i, j = 1, 2, \ldots, n$, to minimize
$$C = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\,p_{ij},$$
subject to the following restrictions:
$$\sum_{j=1}^{n} a_{ij} = 1, \qquad i = 1, 2, \ldots, n,$$
$$\sum_{i=1}^{n} a_{ij} = 1, \qquad j = 1, 2, \ldots, n,$$
$$a_{ij} = 0 \text{ or } 1.$$
This is the standard linear assignment problem, with cost matrix $P$. There is no simple solution, but algorithms are available. The Minimum Entry algorithm, a suboptimal rule which is easy to calculate, is used by DeGroot and Goel (1976). In Monte Carlo studies, this algorithm proves to be almost identical to the ML rule.
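The assignment formulation can be checked by brute force for small $n$; a realistic implementation would use the Hungarian method, and the cost matrix below is made up for illustration:

```python
from itertools import permutations

def solve_assignment(P):
    """Solve the linear assignment problem min sum_i p[i][phi(i)] over
    permutations phi by exhaustive search (feasible only for small n)."""
    n = len(P)
    best = min(permutations(range(n)),
               key=lambda phi: sum(P[i][phi[i]] for i in range(n)))
    return best, sum(P[i][best[i]] for i in range(n))

# Made-up cost matrix p_ij = x_i' y_j for n = 3.
P = [[4.0, 1.0, 3.0],
     [2.0, 0.0, 5.0],
     [3.0, 2.0, 2.0]]
phi, cost = solve_assignment(P)
print(phi, cost)   # (1, 0, 2) with cost 5.0
```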
Applying the concept of canonical correlations and canonical variables, another matching procedure for the multivariate normal case is developed by DeGroot and Goel (1976). Denote the canonical correlations between $T$ and $U$ by $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r \ge 0$. Let the corresponding canonical variables for $T$ be $\alpha_1 = a_1'\,T, \ldots, \alpha_r = a_r'\,T$, and the corresponding canonical variables for $U$ be $\beta_1 = b_1'\,U, \ldots, \beta_r = b_r'\,U$. It can be shown that
$$v'\,\Sigma^{12}\,w = -\sum_{k=1}^{r}\lambda_k\,(a_k'\,v)(b_k'\,w). \tag{1.13}$$
We denote by $\alpha_{ki}$ the value of $\alpha_k$ when $T = v_i$, and by $\beta_{ki}$ the value of $\beta_k$ when $U = w_i$, for $i = 1, 2, \ldots, n$, and $k = 1, 2, \ldots, r$. Then, for $C(\phi)$ as given by (1.11),
$$C(\phi) = -\sum_{k=1}^{r}\lambda_k\sum_{i=1}^{n}\alpha_{ki}\,\beta_{k\phi(i)}. \tag{1.14}$$
Since $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$, when $\lambda_1$ is large relative to $\lambda_2, \ldots, \lambda_r$, a rule which maximizes $\sum_{i=1}^{n}\alpha_{1i}\,\beta_{1\phi(i)}$ should be close to the ML rule, which minimizes $C(\phi)$. Hence, assume that the observations of the broken random sample are indexed so that $\alpha_{11} < \alpha_{12} < \cdots < \alpha_{1n}$ and $\beta_{11} < \beta_{12} < \cdots < \beta_{1n}$. The canonical rule specifies that $w_i$ be paired with $v_i$, $i = 1, 2, \ldots, n$. If $\operatorname{rank}(\Sigma_{12}) = 1$, the canonical rule and ML rule are identical. In a Monte Carlo study for $r = 2$, it was found that the performance of the canonical rule was fairly similar to that of the ML and Minimum Entry rules for $\lambda_2$ small compared to $\lambda_1$, the canonical rule being the simplest of the three to calculate. For the case in which $\lambda_2$ was not small compared to $\lambda_1$, the ML and Minimum Entry rules were better than the canonical rule.
1.3. Outline of Proposed Research
The problem of matching one particular observation of the broken
random sample, rather than re-pairing all observations, is investigated
in Chapter III.
A procedure, based on the most probable rank of the
concomitant of the observation to be matched, is proposed.
This may be
applied to a broken random sample from any absolutely continuous bivariate
distribution.
Additionally, the most probable rank of the concomitant is
studied for the case in which the concomitant corresponds to either the
smallest or largest ordered observation.
Another matching problem considered is the following.
Suppose, when
sampling from a bivariate normal distribution, some of the observations
are still paired and the others have each been separated into two
components.
In Chapter IV the maximum likelihood solution to this
problem is given.
Of interest is the case for which no prior knowledge
of the correlation coefficient is available.
In Chapter V the joint distribution of the $i$th ordered X and the
$j$th ordered Y is discussed, where X and Y have an absolutely continuous
bivariate distribution.
This distribution will be useful in examining
the consequences of estimation and hypothesis testing procedures applied
in situations where the random sample is either partially or completely
un-paired.
Chapter VI extends the matching problem for multivariate normal data
to the case in which each observation vector is separated into three
components.
When the joint distribution is trivariate normal, the maxi-
mum likelihood solution is of simple form for certain correlation structures.
For other less commonly occurring correlation structures, the
maximum likelihood problem is shown to be a special case of the three-dimensional assignment problem.
Reconstruction of partially broken random samples in the framework
of regression and experimental designs is discussed in Chapter VII.
Questions for further research are presented.
CHAPTER II
COMPARISON OF MAXIMUM LIKELIHOOD AND OPTIMAL SOLUTIONS
In this chapter we first compare the ML solution with the optimal
solution, that solution for which the expected number of correct matches
is maximized, when the broken random sample is from a bivariate normal
distribution and n = 4.
Then we examine the probability of occurrence
of a sufficient condition for equivalency for general n, and consequently
consider the distribution of the product of ranges of the component variables in samples from a bivariate normal distribution.
2.1. Results for n = 4

Assume that the broken random sample is from a standardized bivariate normal distribution, with correlation $\rho > 0$. For $n = 2$ and $n = 3$, the ML and optimal solutions are the same. For general $n$, a sufficient condition for equivalency of the two rules is given by (1.7). DeGroot and Goel (1976) give results of a Monte Carlo study on reconstruction of broken random samples, in which it is observed that of 10,000 samples for $n = 4$ and $\rho = .05(.05).95$, not a great number fulfilled this sufficient condition. Nevertheless, the ML and optimal rules were identical for all samples. Hence, we further consider conditions for equivalency of these two solutions for the case $n = 4$.

Denote the ordered observations from a broken random sample of size 4 by $x_1 < x_2 < x_3 < x_4$ and $y_1 < y_2 < y_3 < y_4$.
The ML solution to re-pairing is the permutation $\phi^* = (1, 2, 3, 4)$. Letting $\phi^0$ represent the optimal solution, we know that $\phi^0(1) = 1$ and $\phi^0(4) = 4$. It follows that either $\phi^0 = \phi^*$ or $\phi^0 = \phi_1 = (1, 3, 2, 4)$. For the ML and optimal solutions to be the same, we must have $V(\phi^*) \ge V(\phi_1)$, where $V(\phi_1)$ is as defined by (1.6). Equivalently, we must have
$$V(\phi^*) - V(\phi_1) \ge 0. \tag{2.1}$$
Let $r = \rho/(1-\rho^2)$. As $\rho$ increases from 0 to 1, $r$ increases from 0 to $\infty$. Let $f(r) = V(\phi^*) - V(\phi_1)$. Noting that
$$\begin{aligned}\Phi = \{&(1,2,3,4), (1,3,2,4), (4,2,3,1), (4,3,2,1),\\ &(2,1,3,4), (2,3,1,4), (3,1,2,4), (3,2,1,4),\\ &(1,2,4,3), (1,4,2,3), (1,3,4,2), (1,4,3,2),\\ &(4,1,3,2), (4,3,1,2), (4,1,2,3), (4,2,1,3),\\ &(3,2,4,1), (3,4,2,1), (2,3,4,1), (2,4,3,1),\\ &(2,1,4,3), (2,4,1,3), (3,1,4,2), (3,4,1,2)\},\end{aligned}$$
we evaluate $f(r)$ as follows:
$$f(r) = 2\Big\{\exp\Big[r\sum_{i=1}^{4} x_iy_i\Big] - \exp[r(x_1y_1 + x_2y_3 + x_3y_2 + x_4y_4)]\Big\} + \cdots + \Big\{\exp[r(x_1y_2 + x_2y_4 + x_3y_3 + x_4y_1)] - \exp[r(x_1y_2 + x_2y_3 + x_3y_4 + x_4y_1)]\Big\}. \tag{2.2}$$
(2.2)
We observe that the first four expressions of (2.2) enclosed by
braces {••. } are non-negative, due to the facts that (1) r
(2) (xiYj +
e
~Yi) >
(xiY i +
~Yj)
for i
k, j
<
<
~
0,
i, and
~
(3) {exp(t s2) - exp(t sl)} increases with t for t
0, 0
~
sl
<
s2'
That the remaining two tenns enclosed by braces be non-negative is the
condition given by Theorem 1.1 for M(<p*)
~
M(<Pl ).
Obviously there are
cases in which these latter tenns are negative but are outweighed by the
positive tenns, yielding a positive fer).
Immediately from the above expression for $f(r)$, it follows that $f(0) = 0$. Of course for $\rho = 0$ all permutations are equally likely and thus $M(\phi^*) = M(\phi_1)$. Also, the behavior of $f(r)$ as $r \to \infty$ can be evaluated. The maximum value of $\sum_{i=1}^{4} x_iy_{\phi(i)}$ is achieved for $\phi = \phi^* = (1,2,3,4)$, and $\sum_{i=1}^{4} x_iy_i$ can be either positive, zero, or negative. If it is positive, then $f(r) \to +\infty$ as $r \to \infty$. If $\sum_{i=1}^{4} x_iy_i = 0$, then $\sum_{i=1}^{4} x_iy_{\phi(i)} < 0$ for all $\phi \ne \phi^*$ ($\phi \in \Phi$), and so $f(r) \to +2$ as $r \to \infty$. Finally, when $\sum_{i=1}^{4} x_iy_{\phi(i)} < 0$ for all $\phi \in \Phi$, $f(r) \to 0$ as $r \to \infty$.
We next find $f'(r)$ to further investigate the behavior of $f(r)$.
$$f'(r) = 2\Big\{\Big(\sum_{i=1}^{4} x_iy_i\Big)\exp\Big[r\sum_{i=1}^{4} x_iy_i\Big] - (x_1y_1 + x_2y_3 + x_3y_2 + x_4y_4)\exp[r(x_1y_1 + x_2y_3 + x_3y_2 + x_4y_4)]\Big\} + \cdots + \Big\{\cdots + (x_1y_2 + x_2y_4 + x_3y_3 + x_4y_1)\exp[r(x_1y_2 + x_2y_4 + x_3y_3 + x_4y_1)]\Big\}. \tag{2.3}$$
Evaluating this expression at $r = 0$,

(2.4)

Since $x_2 < x_3$ and $y_2 < y_3$, $f'(0) > 0$. Recalling that $f(0) = 0$, we can say that for "small" $\rho$ the ML and optimal solutions are the same.
Continuing, we can obtain $f''(r)$ and evaluate it at $r = 0$:
$$f''(0) = 2\Big\{\Big(\sum_{i=1}^{4} x_iy_i\Big)^2 - (x_1y_1 + x_2y_3 + x_3y_2 + x_4y_4)^2\Big\} + \cdots + 2\Big\{(x_1y_4 + x_2y_2 + x_3y_3 + x_4y_1)^2 - (x_1y_4 + x_2y_3 + x_3y_2 + x_4y_1)^2\Big\}.$$
Upon simplification,
$$f''(0) = 8(x_3 - x_2)(y_3 - y_2)\big[(x_2 + x_3)(y_2 + y_3) + (x_1 + x_4)(y_1 + y_4)\big]. \tag{2.5}$$
Thus, $f''(0)$ is positive when $[(x_2 + x_3)(y_2 + y_3) + (x_1 + x_4)(y_1 + y_4)]$ is positive.
In concluding this section, we note that we have not found a simple necessary and sufficient condition for equivalency of the ML and optimal solutions when $n = 4$. However, our investigation of the behavior of $f(r)$ seems to indicate that the ML rule can be used as a substitute for the optimal rule without a significant loss in terms of expected number of correct matches, as is certainly confirmed by the Monte Carlo study of DeGroot and Goel (1976).
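The behavior of $f(r)$ is also easy to probe numerically from the definition (1.6), with $r = \rho/(1-\rho^2)$ multiplying the standardized scores. The sketch below (made-up ordered data) checks that $f(0) = 0$ and that $f(r)$ is positive for large $r$ when $\sum x_iy_i > 0$:

```python
import math
from itertools import permutations

def V(pi, x, y, r):
    """V(pi) of (1.6), with the factor r multiplying x' y(phi):
    sum over phi of N(pi, phi) exp(r x' y(phi))."""
    total = 0.0
    for phi in permutations(range(4)):
        matches = sum(pi[i] == phi[i] for i in range(4))
        total += matches * math.exp(r * sum(x[i] * y[phi[i]] for i in range(4)))
    return total

def f(r, x, y):
    """f(r) = V(phi*) - V(phi_1), phi* = (1,2,3,4), phi_1 = (1,3,2,4)."""
    return V((0, 1, 2, 3), x, y, r) - V((0, 2, 1, 3), x, y, r)

# Made-up ordered observations with sum x_i y_i > 0.
x = [-1.4, -0.5, 0.2, 1.0]
y = [-1.1, -0.3, 0.4, 1.2]
print(f(0.0, x, y))         # 0.0: at rho = 0 all pairings equally likely
print(f(10.0, x, y) > 0.0)  # dominant term exp(r * sum x_i y_i) wins
```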
2.2. Distribution of the Product of Ranges of Correlated Deviates

2.2.1. Introduction
Let $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, be $n$ independent random variables from a standardized bivariate normal distribution with correlation $\rho$. When the $X$'s are arranged in ascending order as
$$X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n},$$
we denote the $Y$'s paired with these order statistics by
$$Y_{[1:n]},\ Y_{[2:n]},\ \ldots,\ Y_{[n:n]},$$
and call these the concomitants of the ordered $X$'s. Similarly, let the ordered $Y$'s be
$$Y_{1:n} \le Y_{2:n} \le \cdots \le Y_{n:n},$$
and denote their concomitants by
$$X_{[1:n]},\ X_{[2:n]},\ \ldots,\ X_{[n:n]}.$$
Let the range of the $X$'s and the range of the $Y$'s be
$$W_{Xn} = X_{n:n} - X_{1:n} \quad \text{and} \quad W_{Yn} = Y_{n:n} - Y_{1:n}.$$
We denote the product of these ranges by $W_{XY}(n,\rho) = W_{Xn}\,W_{Yn}$.

Introducing further notation, we say that $X \stackrel{d}{=} Y$ if $X$ and $Y$ are identically distributed. Also, $X \stackrel{st}{\ge} Y$ if $P(X > x) \ge P(Y > x)$ for all $x$, and $X \stackrel{st}{\le} Y$ if $P(X > x) \le P(Y > x)$ for all $x$.
When $\rho > 0$, the sufficient condition (1.7) for the ML and optimal solutions to be identical can be expressed as
$$W_{Xn}\,W_{Yn} \le (1-\rho^2)/\rho.$$
For $\rho < 0$ the sufficient condition is
$$(X_{n:n} - X_{1:n})(Y_{1:n} - Y_{n:n}) \ge (1-\rho^2)/\rho,$$
which we re-write as
$$W_{Xn}\,W_{Yn} \le (1-\rho^2)/(-\rho).$$
Noting that $W_{XY}(n,\rho) \stackrel{d}{=} W_{XY}(n,-\rho)$, we consider the distribution of $W_{XY}(n,\rho)$ for $\rho > 0$ and general $n$. In particular, we investigate
$$P[W_{XY}(n,\rho) \le (1-\rho^2)/\rho]. \tag{2.6}$$
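Probability (2.6) is straightforward to estimate by simulation. A quick Monte Carlo sketch (the replication count and seed are arbitrary choices):

```python
import math
import random

def prob_condition(n, rho, reps=20000, seed=1):
    """Monte Carlo estimate of P[W_XY(n, rho) <= (1 - rho^2)/rho], rho > 0,
    simulating the standardized bivariate normal via
    Y = rho X + sqrt(1 - rho^2) Z with Z independent of X."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    bound = (1.0 - rho * rho) / rho
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rho * x + s * rng.gauss(0.0, 1.0) for x in xs]
        if (max(xs) - min(xs)) * (max(ys) - min(ys)) <= bound:
            hits += 1
    return hits / reps

# For n = 2, rho = .30 the exact value (Table 2.1) is .88102.
print(prob_condition(2, 0.30))
```

The estimate should fall within a few standard errors of the tabulated exact value, and the probability drops sharply as $n$ grows.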
When $\rho = 1$,
$$P[W_{XY}(n,1) \le w] = P[W_{Xn} \le \sqrt{w}\,] = F_{W_n}(\sqrt{w}\,), \tag{2.7}$$
where $F_{W_n}(t)$ is the cdf of the range for a sample of size $n$ from a standard normal population. When $\rho = 0$, the ranges $W_{Xn}$ and $W_{Yn}$ are independent, and
$$P[W_{XY}(n,0) \le w] = \int_0^\infty F_{W_n}(w/t)\,f_{W_n}(t)\,dt, \tag{2.8}$$
where $f_{W_n}(t)$ is the pdf corresponding to $F_{W_n}(t)$. Distribution functions of $W_{XY}(n,0)$ and $W_{XY}(n,1)$ are displayed in Figure 2.1 for $n = 6$ and $n = 10$. Tables of Harter (1970) for the pdf and probability integral of the range were utilized in these calculations. Numerical integration was performed using Simpson's rule together with Newton's 3/8 rule.
[Figure 2.1. Distribution functions of $W_{XY}(n,0)$ and $W_{XY}(n,1)$ for $n = 6$ and $n = 10$.]
Additionally, we note that asymptotically $W_{Xn}$ and $W_{Yn}$ are independent. (See Geffroy (1958, 1959).)

2.2.2. Exact Distribution for n = 2
Suppose $n = 2$. Then,
$$P[W_{XY}(2,\rho) \le w] = P[(X_{2:2} - X_{1:2})(Y_{2:2} - Y_{1:2}) \le w] = P[-w < (X_2 - X_1)(Y_2 - Y_1) \le w] = P[-w/2 < U\cdot Y \le w/2],$$
where $U$ and $Y$ have a standardized bivariate normal distribution with correlation $\rho$. Thus,
$$P[W_{XY}(2,\rho) \le (1-\rho^2)/\rho] = P[-(1-\rho^2)/2\rho < U\cdot Y \le (1-\rho^2)/2\rho]. \tag{2.9}$$
Applying the results of Aroian, Taneja, and Cornwell (1978) on the distribution of the product of bivariate normal variables,
$$P[U\cdot Y \le z] = .5 + \pi^{-1}\int_0^\infty g(z,\rho,t)\,dt,$$
where
$$g(z,\rho,t) = t^{-1}G^{-1}\Big\{\Big(\tfrac{G+1}{2}\Big)^{1/2}\sin tz - \Big(\tfrac{G-1}{2}\Big)^{1/2}\cos tz\Big\}.$$
To evaluate (2.9), numerical methods and computer code as described by Cornwell, Aroian, and Taneja (1977) were used. Results are displayed in the second column of Table 2.1, for $\rho = .01, .05(.05).95$.
2.2.3. Lower and Upper Bounds

We now derive lower and upper bounds for $P[W_{XY}(n,\rho) \le w]$. First, note that we may express $Y_i/\sqrt{1-\rho^2}$ in the form
$$Y_i/\sqrt{1-\rho^2} = \rho X_i/\sqrt{1-\rho^2} + Z_i, \qquad i = 1, 2, \ldots, n, \tag{2.10}$$
where the $Z_i$'s are mutually independent standard normal random variables, and the $Z_i$'s are independent of the $X_i$'s. The distribution of $(Y_i/\sqrt{1-\rho^2} \mid X_i = x_i)$ is normal $N(\rho x_i/\sqrt{1-\rho^2},\,1)$.

Applying (2.10), let us order the $Y$'s. Then
$$Y_{i:n}/\sqrt{1-\rho^2} = \rho X_{[i:n]}/\sqrt{1-\rho^2} + Z_{[i]},$$
where $Z_{[i]}$ is that $Z_i$ accompanying $Y_{i:n}$. It follows that
Table 2.1

$P[W_{XY}(2,\rho) \le (1-\rho^2)/\rho]$

ρ      Lower Bound   Exact Value   Upper Bound   (Exact-LB)/(UB-LB)
.01      1.00000       1.00000       1.00000
.05       .99998        .99999        .99999
.10       .99607        .99744        .99777         .806
.15       .97447        .98367        .98590         .805
.20       .93182        .95716        .96329         .805
.25       .87393        .92172        .93338         .804
.30       .80796        .88102        .89937         .799
.35       .73979        .83745        .86329         .791
.40       .67317        .79239        .82629         .779
.45       .60998        .74657        .78892         .763
.50       .55087        .70032        .75137         .745
.55       .49584        .65370        .71361         .725
.60       .44449        .60656        .67544         .702
.65       .39622        .55862        .63647         .676
.70       .35041        .50942        .59613         .647
.75       .30626        .45826        .55353         .615
.80       .26247        .40411        .50732         .578
.85       .21846        .34521        .45523         .535
.90       .17211        .27816        .39278         .480
.95       .11886        .19419        .30749         .399
$$\Big(W_{Yn}/\sqrt{1-\rho^2}\ \Big|\ W_{Xn} = t\Big) \stackrel{st}{\le} \rho t/\sqrt{1-\rho^2} + W_{Zn}. \tag{2.11}$$
Now considering the distribution of $W_{XY}(n,\rho)$,
$$P[W_{XY}(n,\rho) \le w] = \int_0^\infty F_{W_{Yn}}\big(w/t \mid W_{Xn} = t\big)\,f_{W_n}(t)\,dt.$$
From (2.11),
$$F_{W_{Yn}}\big(w/t \mid W_{Xn} = t\big) \ge P\big[\rho t + \sqrt{1-\rho^2}\,W_{Zn} \le w/t\big] = F_{W_n}\big((w/t - \rho t)/\sqrt{1-\rho^2}\big).$$
Thus,
$$P[W_{XY}(n,\rho) \le w] \ge \int_0^\infty F_{W_n}\big((w/t - \rho t)/\sqrt{1-\rho^2}\big)\,f_{W_n}(t)\,dt, \tag{2.12}$$
and
$$P[W_{XY}(n,\rho) \le (1-\rho^2)/\rho] \ge \int_0^\infty F_{W_n}\Big(\frac{\sqrt{1-\rho^2}}{\rho t} - \frac{\rho t}{\sqrt{1-\rho^2}}\Big)\,f_{W_n}(t)\,dt. \tag{2.13}$$
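Where Harter's tables are not at hand, $F_{W_n}$ and the bound (2.13) can be computed directly. The sketch below uses the classical order-statistics identity $F_{W_n}(w) = n\int\phi(x)\,[\Phi(x+w)-\Phi(x)]^{n-1}\,dx$ (an assumption imported from standard order-statistics theory, not derived in this text) together with Simpson's rule, which the text itself employs:

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def F_range(w, n, lo=-8.0, hi=8.0, steps=200):
    """cdf of the range of n iid standard normals, via
    F_{W_n}(w) = n * integral phi(x) [Phi(x + w) - Phi(x)]^{n-1} dx,
    evaluated with Simpson's rule on a truncated domain."""
    if w <= 0.0:
        return 0.0
    h = (hi - lo) / steps
    def g(x):
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        return phi * (Phi(x + w) - Phi(x)) ** (n - 1)
    s = g(lo) + g(hi)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return n * s * h / 3.0

def lower_bound(n, rho, t_hi=8.0, steps=100):
    """Midpoint-rule evaluation of the lower bound (2.13), with the
    range pdf f_{W_n} approximated by differencing F_{W_n}."""
    q = math.sqrt(1.0 - rho * rho)
    dt = t_hi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        fw = (F_range(t + 0.5 * dt, n) - F_range(t - 0.5 * dt, n)) / dt
        total += F_range(q / (rho * t) - rho * t / q, n) * fw * dt
    return total

# For n = 2, rho = .30, Table 2.1 gives the lower bound as .80796.
print(round(lower_bound(2, 0.30), 3))
```

For $n = 2$ the result can be cross-checked against the closed form $F_{W_2}(w) = 2\Phi(w/\sqrt{2}) - 1$.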
To find an upper bound for $P[W_{XY}(n,\rho) \le w]$, we again use the fact that $(Y_i/\sqrt{1-\rho^2} \mid X_i = x_i)$ is normal $N(\rho x_i/\sqrt{1-\rho^2},\,1)$. We can view the range of the $Y_i/\sqrt{1-\rho^2}$, given the $X_i$'s, as the range of $n$ independent normally distributed random variables, each having unit variance, but with the difference between the largest and smallest of their expected values being $\rho\,W_{Xn}/\sqrt{1-\rho^2}$. We will show that this range is stochastically larger than the range of $n$ independent standard normal random variables.

Let $W_\delta$ denote the range of $Z_1 + \delta_1, Z_2 + \delta_2, \ldots, Z_n + \delta_n$, where the $Z_i$ are mutually independent standard normal random variables. We note that
$$P[W_\delta \le s] = P\Big\{\bigcap_{i=2}^{n}\bigcap_{j<i}\big(-s \le (Z_i + \delta_i) - (Z_j + \delta_j) \le s\big)\Big\}.$$
Denote the joint pdf of $Z_1, Z_2, \ldots, Z_n$ by $f(z)$, and let $\delta' = (\delta_1, \delta_2, \ldots, \delta_n)$. Let
$$E = \Big\{(z_1, z_2, \ldots, z_n):\ \bigcap_{i=2}^{n}\bigcap_{j<i}\big(-s \le z_i - z_j \le s\big)\Big\}.$$
It can be seen that $E$ is a convex set in $n$-space, symmetric about the origin. Employing this notation,
$$P[W_\delta \le s] = \int_E f(z - \delta)\,dz.$$
We apply the following theorem of Anderson (1955).

Theorem. Let $f(z) \ge 0$ be a function such that (i) $f(z) = f(-z)$, (ii) $\{z:\ f(z) \ge u\} = K_u$ is convex for every $u$ ($0 < u < \infty$), and (iii) $\int_E f(z)\,dz < \infty$ (in the Lebesgue sense). Let $E$ be a convex set in $n$-space, symmetric about the origin. Then
$$\int_E f(z + k\delta)\,dz \ge \int_E f(z + \delta)\,dz \quad \text{for } 0 \le k \le 1.$$

The multivariate normal density (positive definite covariance matrix) satisfies the conditions of the theorem, and we take $k = 0$ to give
$$P[W_\delta \le s] \le P\Big\{\bigcap_{i=2}^{n}\bigcap_{j<i}\big(-s \le Z_i - Z_j \le s\big)\Big\} = F_{W_n}(s).$$
It follows that
$$F_{W_{Yn}}\big(w/t \mid W_{Xn} = t\big) \le F_{W_n}\big(w/(t\sqrt{1-\rho^2}\,)\big), \tag{2.14}$$
and therefore
$$P[W_{XY}(n,\rho) \le (1-\rho^2)/\rho] \le \int_0^\infty F_{W_n}\big(\sqrt{1-\rho^2}/(\rho t)\big)\,f_{W_n}(t)\,dt. \tag{2.15}$$
We mention one other method of finding an upper bound for $P[W_{XY}(n,\rho) \le w]$. However, in calculations it was found that this bound improved upon (2.15) only for $n = 2$, for which case we already know the exact value.

We apply (2.10) and order the $X$'s. Then
$$Y_{[r:n]}/\sqrt{1-\rho^2} \stackrel{d}{=} \rho X_{r:n}/\sqrt{1-\rho^2} + Z_{(r)}, \qquad r = 1, 2, \ldots, n,$$
where $Z_{(r)}$ is that $Z_i$ accompanying $Y_{[r:n]}$. Since the joint distribution of the $Z_i$ is independent of the joint distribution of the $X_i$, ordering the $X$'s does not affect the distributional properties of the $Z$'s. The $Z_{(r)}$ are mutually independent standard normal random variables. Thus,
$$\big(Y_{[n:n]} - Y_{[1:n]}\big)/\sqrt{1-\rho^2} \stackrel{d}{=} \rho\,W_{Xn}/\sqrt{1-\rho^2} + T,$$
where $T = Z_{(n)} - Z_{(1)}$ is a normal $N(0,2)$ random variable. Because $[Y_{n:n} - Y_{1:n}] \ge [Y_{[n:n]} - Y_{[1:n]}]$, we have
$$\big(Y_{n:n} - Y_{1:n}\big)/\sqrt{1-\rho^2} \stackrel{st}{\ge} \rho\,W_{Xn}/\sqrt{1-\rho^2} + T.$$
It follows that
$$F_{W_{Yn}}\big(w/t \mid W_{Xn} = t\big) \le \Phi\big((w/t - \rho t)/(\sqrt{1-\rho^2}\,\sqrt{2}\,)\big),$$
$$P[W_{XY}(n,\rho) \le w] \le \int_0^\infty \Phi\big((w/t - \rho t)/(\sqrt{1-\rho^2}\,\sqrt{2}\,)\big)\,f_{W_n}(t)\,dt, \tag{2.16}$$
$$P[W_{XY}(n,\rho) \le (1-\rho^2)/\rho] \le \int_0^\infty \Phi\Big(\Big[\frac{\sqrt{1-\rho^2}}{\rho t} - \frac{\rho t}{\sqrt{1-\rho^2}}\Big]\Big/\sqrt{2}\Big)\,f_{W_n}(t)\,dt. \tag{2.17}$$
n
In Table 2.2 are displayed the lower bound (2.13) and the upper
bound (2.15) calculated for n = 3(1)6(2)16 and ρ = .05, .15, .30, .50,
.70, and .90.  Harter's tables were used to obtain values of F_Wn(w) and
f_Wn(w).  Values of the pdf tabulated at sufficiently small intervals for
the numerical integration were available only through n = 16.  Calculations
of the lower bound (2.13) and the upper bound (2.15) for n = 2
are compared with the exact values in Table 2.1.
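As a quick sanity check on bounds of this kind, the probability P[Wxy(n,ρ) ≤ (1−ρ²)/ρ] can also be estimated directly by simulation.  The sketch below is illustrative only (it is not part of the original computations): it draws samples from a standardized bivariate normal with correlation ρ, forms the product of the X-range and the Y-range, and counts how often the product falls below the threshold.

```python
import math
import random

def simulate_prob(n, rho, trials=20000, seed=1):
    # Estimate P[Wxy(n, rho) <= (1 - rho^2)/rho] by direct simulation:
    # draw n pairs from a standardized bivariate normal with correlation rho,
    # multiply the range of the X's by the range of the Y's, and count
    # how often the product falls at or below the threshold.
    rng = random.Random(seed)
    threshold = (1.0 - rho * rho) / rho
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(trials):
        xs, ys = [], []
        for _ in range(n):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            xs.append(z1)
            ys.append(rho * z1 + s * z2)   # Y = rho X + sqrt(1-rho^2) Z
        wxy = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if wxy <= threshold:
            hits += 1
    return hits / trials
```

Estimates produced this way fall between the tabulated lower and upper bounds, and they reproduce the qualitative behavior of Table 2.2: the probability decreases rapidly in both n and ρ.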
2.2.4.  Calculations for Larger Samples

Gumbel (1947) obtains the asymptotic distribution of the range W_n
for a large sample from an initial distribution of the exponential type.
Let the reduced range R = α_n(W_n − 2u_n), where α_n and u_n are the parameters
of the distribution of the extremes of a variable symmetrical about zero.
Then its asymptotic distribution function Ψ(r) and asymptotic density
function ψ(r) may be expressed as

    Ψ(r) = 2 e^{−r/2} K₁(2 e^{−r/2}),    ψ(r) = 2 e^{−r} K₀(2 e^{−r/2}),

where K₀ and K₁ are the modified Bessel functions of the second kind of
orders zero and unity.  The asymptotic distribution function F(w) and
density function f(w) of W_n are related to Ψ(r) and ψ(r) by

    F(w) = Ψ(r) and f(w) = α_n ψ(r),  where r = α_n(w − 2u_n).   (2.18)

For samples from the normal distribution, Gumbel finds that if the estimates of α_n and u_n are based on moments of the range, then the asymptotic
distribution of the range gives a good fit to the calculated distribution
when the sample size is at least 10.  These estimators of α_n and u_n are
Table 2.2
Lower and Upper Bounds for P[Wxy(n,ρ) ≤ (1−ρ²)/ρ]

          ρ = .05             ρ = .15             ρ = .30
  n    Lower    Upper      Lower    Upper      Lower    Upper
  3   .99986   .99991     .88481   .92917     .49257   .65682
  4   .99949   .99969     .75023   .83249     .25704   .41435
  5   .99876   .99924     .60212   .71318     .12215   .23728
  6   .99753   .99847     .46313   .58872     .05458   .12715
  8   .99315   .99566     .24982   .36848     .00967   .03202
 10   .98565   .99072     .12371   .21248     .00155   .00721
 12   .97470   .98333     .05782   .11572     .00023   .00151
 14   .96011   .97325     .02593   .06046     .00003   .00030
 16   .94198   .96044     .01127   .03061     .0       .00006

          ρ = .50             ρ = .70             ρ = .90
  n    Lower    Upper      Lower    Upper      Lower    Upper
  3   .19907   .37788     .07634   .20036     .01795   .06970
  4   .05995   .15372     .01376   .05234     .00151   .00926
  5   .01631   .05529     .00224   .01192     .00011   .00106
  6   .00416   .01838     .00034   .00250     .00001   .00011
  8   .00024   .00175     .00001   .00009     .0       .0
 10   .00001   .00015     .0       .0         .0       .0
 12   .0       .00001     .0       .0         .0       .0
 14   .0       .0         .0       .0         .0       .0
 16   .0       .0         .0       .0         .0       .0
    1/α_n = √3 σ_Wn / π,                (2.19)

and

    2u_n = μ_Wn − 2γ/α_n,               (2.20)

where μ_Wn and σ_Wn are the mean and standard deviation of the range of a
sample of size n from a standard normal population, and γ stands for
Euler's constant.
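The reduced-range functions Ψ and ψ can be evaluated numerically from the Bessel-function expressions above.  The sketch below (a modern illustration, not part of the thesis) computes K₀ and K₁ from the standard integral representation K_ν(z) = ∫₀^∞ e^{−z cosh t} cosh(νt) dt with a simple trapezoidal rule, which is assumed adequate for the arguments that occur here.

```python
import math

def bessel_k(nu, z, steps=2000, t_max=30.0):
    # K_nu(z) = integral_0^inf exp(-z cosh t) cosh(nu t) dt,
    # approximated with the trapezoidal rule on [0, t_max].
    h = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        val = math.exp(-z * math.cosh(t)) * math.cosh(nu * t)
        total += (0.5 if i in (0, steps) else 1.0) * val
    return total * h

def reduced_range_cdf(r):
    # Psi(r) = 2 exp(-r/2) K_1(2 exp(-r/2)) = z K_1(z) with z = 2 exp(-r/2)
    z = 2.0 * math.exp(-r / 2.0)
    return z * bessel_k(1, z)

def reduced_range_pdf(r):
    # psi(r) = 2 exp(-r) K_0(2 exp(-r/2))
    z = 2.0 * math.exp(-r / 2.0)
    return 2.0 * math.exp(-r) * bessel_k(0, z)
```

Evaluating these at the cutoff points used below reproduces the stated tail values Ψ(−3.345) ≈ 1.002 × 10⁻⁴ and ψ(−3.669) ≈ 1.001 × 10⁻⁴.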
Before applying the results of Gumbel to the evaluation of lower
bound (2.13) and upper bound (2.15), we make some preliminary calculations.  If z = 2 e^{−r/2} is so large that z⁻³ is negligible,
then the expression for K₁(z) becomes

    K₁(z) = √(π/(2z)) e^{−z} (1 + 3/(8z) − 15/(128z²)),

and Ψ(r) becomes

    Ψ(r) = √π exp(−r/4 − 2e^{−r/2}) [1 + (3/16)e^{r/2} − (15/512)e^{r}].

For r₀ = −3.345, z⁻³ = 8.276 × 10⁻⁴, and Ψ(r₀) = 1.002 × 10⁻⁴.  Hence
we say that Ψ(r), and with it the distribution-function factor in the
integrand of (2.15), is negligible for

    r ≤ r₀, that is, for argument values at most c₀ = r₀/α_n + 2u_n.   (2.21)

Also, if z⁻³ ≪ 1, the expression for K₀(z) becomes

    K₀(z) = √(π/(2z)) e^{−z} (1 − 1/(8z) + 9/(128z²)),

and ψ(r) becomes

    ψ(r) = √π exp(−3r/4 − 2e^{−r/2}) [1 − (1/16)e^{r/2} + (9/512)e^{r}].

For r* = −3.669, z⁻³ = 5.091 × 10⁻⁴, and ψ(r*) = 1.001 × 10⁻⁴.  Hence,
we say that the density factor f_Wn(t) is negligible for

    α_n(t − 2u_n) ≤ r*, or equivalently t ≤ c* = r*/α_n + 2u_n,   (2.22)

since α_n is no larger than 3 for the n we consider.  Combining (2.21)
and (2.22), we see that the integrand of our asymptotic approximation
to the upper bound (2.15) is negligible over the entire range of integration if

    (1 − ρ²)/ρ² ≤ c₀² c*².               (2.23)

Simplifying (2.23), this condition is ρ ≥ ρ₀ = (c₀²c*² + 1)^{−1/2},
where c₀ = r₀/α_n + 2u_n and c* = r*/α_n + 2u_n.
In Table 2.3 we display the values of μ_Wn and σ²_Wn for n = 20(20)100,
taken from the tables of Harter (1970).  Also, the estimates 1/α̂_n and
2û_n calculated according to (2.19) and (2.20) are given.  For each
sample size considered, ρ₀ = (c₀²c*² + 1)^{−1/2} is computed.  Table 2.4
displays the lower and upper bounds for P[Wxy(n,ρ) ≤ (1−ρ²)/ρ], for
n = 20(20)100.  Results are given for ρ = .025(.025).100(.050).200.
These calculations employed tabulations of the distribution function and density function of the reduced range in Probability
Tables for the Analysis of Extreme-Value Data (1953) and the transformations (2.18).

2.2.5.  Moments when n = 2

As was discussed in Section 2.2.2,

    Wxy(2,ρ) = 2|UV|,
Table 2.3

   n      μ_Wn            σ²_Wn          1/α̂_n       2û_n        ρ₀
  20   3.7349501196   .5309837904    .4017458    3.2711619    .277
  40   4.3215543564   .4478115084    .3689422    3.8956356    .146
  60   4.6385564145   .4082466789    .3522671    4.2318880    .111
  80   4.8535488369   .3836172763    .3414757    4.4593383    .096
 100   5.0151872729   .3662417546    .3336527    4.6300078    .083
Table 2.4
Lower and Upper Bounds for P[Wxy(n,ρ) ≤ (1−ρ²)/ρ]

          ρ = .025            ρ = .050            ρ = .075
  n    Lower    Upper      Lower    Upper      Lower    Upper
 20   .99973   .99975     .89784   .92623     .38202   .49201
 40   .99925   .99942     .58917   .67546     .02653   .06124
 60   .99815   .99866     .29828   .39828     .00070   .00339
 80   .99622   .99731     .12472   .19985     .00001   .00011
100   .99332   .99527     .04500   .08844     .0       .0

          ρ = .100            ρ = .150            ρ = .200
  n    Lower    Upper      Lower    Upper      Lower    Upper
 20   .06491   .12886     .00036   .00252     .0       .00002
 40   .00010   .00078     .0       .0         .0       .0
 60   .0       .0         .0       .0         .0       .0
 80   .0       .0         .0       .0         .0       .0
100   .0       .0         .0       .0         .0       .0
where U and V have a standardized bivariate normal distribution with
correlation ρ.  Hence, the moments of the distribution of Wxy(2,ρ) can
be evaluated by utilizing expressions for moments and absolute moments
of the bivariate normal distribution, as given by Johnson and Kotz (1972).

    E[Wxy(2,ρ)²] = 4·E(U²V²) = 4(1 + 2ρ²).

    E[Wxy(2,ρ)³] = 8·E(|U³V³|) = 16π⁻¹{√(1−ρ²)(4 + 11ρ²) + 3ρ(3 + 2ρ²) sin⁻¹ρ}.

    E[Wxy(2,ρ)⁴] = 16·E(U⁴V⁴) = 384ρ²(3 + ρ²) + 144.

We also note the following for ρ = 0 and ρ = 1, applying values for
moments of the univariate normal distribution.  (See, e.g., Johnson and
Kotz (1970).)

    E[Wxy(2,0)²] = 4(EU²)² = 4.

    E[Wxy(2,0)³] = 8(E|U³|)² = 8(2√(2/π))² = 64π⁻¹.

    E[Wxy(2,0)⁴] = 16(EU⁴)² = 16(3)² = 144.

    E[Wxy(2,1)^k] = 2^k E(U^{2k}) = 2^k (2k−1)(2k−3)···3·1.

For ρ = 0, .01, .05(.05).95, .99, 1.00, the expectation, second moment
about zero, and variance of Wxy(2,ρ) are displayed in Table 2.5.  The
third and fourth central moments, μ₃ and μ₄, were also computed.  For
each value of ρ considered, an index of skewness, α₃ = μ₃(μ₂)^{−3/2}, and
an index of kurtosis, α₄ = μ₄(μ₂)^{−2}, are given in Table 2.5.  Figure
2.2 shows a plot of these moment ratios for
ρ = .01, .05(.05).95, .99.
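The n = 2 moments are simple to evaluate; the sketch below (a modern illustration) packages the first two moments.  The first-moment expression uses the standard bivariate normal absolute moment E|UV| = (2/π)(√(1−ρ²) + ρ sin⁻¹ρ), which is not displayed above but is consistent with the tabulated values.

```python
import math

def wxy2_moments(rho):
    # Moments of Wxy(2, rho) = 2|UV| for standardized bivariate normal (U, V)
    # with correlation rho:
    #   E[Wxy]   = 2 E|UV| = (4/pi)(sqrt(1-rho^2) + rho*arcsin(rho))
    #   E[Wxy^2] = 4 E(U^2 V^2) = 4(1 + 2 rho^2)
    mean = (4.0 / math.pi) * (math.sqrt(1.0 - rho ** 2) + rho * math.asin(rho))
    second = 4.0 * (1.0 + 2.0 * rho ** 2)
    return mean, second, second - mean ** 2
```

At ρ = .5 this reproduces the Table 2.5 row (1.435991, 6.000000, 3.937930).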
2.2.6.  First and Second Moments when n = 3

For n = 3, we use the following identities in evaluating moments of
the product of ranges of X and Y:

    W_X = (1/2)(|X₂−X₁| + |X₃−X₁| + |X₃−X₂|),
    W_Y = (1/2)(|Y₂−Y₁| + |Y₃−Y₁| + |Y₃−Y₂|).

Thus,

    Wxy(3,ρ) = (1/4)( |X₂−X₁||Y₂−Y₁| + |X₂−X₁||Y₃−Y₁| + |X₂−X₁||Y₃−Y₂|
        + |X₃−X₁||Y₂−Y₁| + |X₃−X₁||Y₃−Y₁| + |X₃−X₁||Y₃−Y₂|
        + |X₃−X₂||Y₂−Y₁| + |X₃−X₂||Y₃−Y₁| + |X₃−X₂||Y₃−Y₂| ).   (2.24)

Let

    U₁ = (X₂−X₁)/√2,  U₂ = (X₃−X₁)/√2,  U₃ = (X₃−X₂)/√2,
    V₁ = (Y₂−Y₁)/√2,  V₂ = (Y₃−Y₁)/√2,  V₃ = (Y₃−Y₂)/√2.
The joint distribution of Uᵢ and Vⱼ is standardized bivariate normal,
i = 1, 2, 3, j = 1, 2, 3, and
Table 2.5
Distribution of Wxy(2,ρ)

   ρ     E[Wxy(2,ρ)]   E[Wxy(2,ρ)²]   Var[Wxy(2,ρ)]     α₃(ρ)        α₄(ρ)
  .00    1.273240       4.000000       2.378860       2.513228    12.594180
  .01    1.273303       4.000800       2.379500       2.513840    12.600378
  .05    1.274831       4.020000       2.394806       2.528412    12.747270
  .10    1.279611       4.080000       2.442596       2.571685    13.179168
  .15    1.287590       4.180000       2.522112       2.636769    13.815782
  .20    1.298790       4.320000       2.633144       2.714951    14.557325
  .25    1.313240       4.500000       2.775401       2.796944    15.302737
  .30    1.330977       4.720000       2.948500       2.874525    15.969242
  .35    1.352052       4.980000       3.151955       2.941526    16.502304
  .40    1.376527       5.280000       3.385173       2.994154    16.876379
  .45    1.404476       5.620000       3.647447       3.030786    17.089688
  .50    1.435991       6.000000       3.937930       3.051520    17.156748+
  .55    1.471184       6.420000       4.255618       3.057631+   17.101107
  .60    1.510190       6.880000       4.599326       3.051086    16.949686
  .65    1.553179       7.380000       4.967635       3.034191    16.729141
  .70    1.600362       7.920000       5.358841       3.009336    16.463782
  .75    1.652008       8.500000       5.770870       2.978856    16.174778
  .80    1.708479       9.120000       6.201100       2.945014    15.880396
  .85    1.770274       9.780000       6.646130       2.910024    15.596625
  .90    1.838154      10.480000       7.101190       2.876257    15.338928
  .95    1.913455      11.220000       7.558690       2.846769    15.125959
  .99    1.981201      11.840800       7.915643       2.830223    15.011998
 1.00    2.000000      12.000000       8.000000       2.828427    15.000000

(+ marks the column maximum.)
FIGURE 2.2
Distribution of Wxy(2,ρ): (α₃, α₄) for ρ = .01, .05(.05).95, .99
[Plot of α₄ (from 12 to 18) against α₃ (from 2.45 to 3.05); the points
run from ρ = .01 to ρ = .99.]
    Corr(Uᵢ, Vⱼ) = ρ, i = j;  ρ/2, |i−j| = 1;  −ρ/2, |i−j| = 2.

Also, the joint distributions of Uᵢ and Uⱼ, and of Vᵢ and Vⱼ, are
standardized bivariate normal, i = 1, 2, 3, j = 1, 2, 3, where

    Corr(Uᵢ, Uⱼ) = Corr(Vᵢ, Vⱼ) = 1/2, |i−j| = 1;  −1/2, |i−j| = 2.
We now write Wxy(3,ρ) as

    Wxy(3,ρ) = (1/2) Σ_{i=1}^{3} Σ_{j=1}^{3} |Uᵢ Vⱼ|,

and evaluate its expectation as follows.

    E(|UᵢVᵢ|) = 2π⁻¹(√(1−ρ²) + ρ sin⁻¹ρ),  i = 1, 2, 3,

    E(|UᵢVⱼ|) = 2π⁻¹(√(1−ρ²/4) + (ρ/2) sin⁻¹(ρ/2)),  |i−j| = 1 or 2

(the sign of the correlation ±ρ/2 does not affect this absolute moment).
Thus,

    E[Wxy(3,ρ)] = (1/2){6π⁻¹(√(1−ρ²) + ρ sin⁻¹ρ) + 12π⁻¹(√(1−ρ²/4) + (ρ/2) sin⁻¹[ρ/2])}

        = 3π⁻¹{√(1−ρ²) + 2√(1−ρ²/4) + ρ(sin⁻¹ρ + sin⁻¹[ρ/2])}.   (2.25)
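Expression (2.25) is straightforward to evaluate numerically; the sketch below (not part of the original tables) checks it against the two endpoints implied by the text, E[Wxy(3,0)] = 9/π and E[Wxy(3,1)] = 3√3/π + 2.

```python
import math

def wxy3_mean(rho):
    # Expression (2.25):
    # E[Wxy(3, rho)] = (3/pi){sqrt(1 - rho^2) + 2 sqrt(1 - rho^2/4)
    #                        + rho (arcsin(rho) + arcsin(rho/2))}
    return (3.0 / math.pi) * (
        math.sqrt(1.0 - rho ** 2)
        + 2.0 * math.sqrt(1.0 - 0.25 * rho ** 2)
        + rho * (math.asin(rho) + math.asin(0.5 * rho))
    )
```

At ρ = 1 the square of this value is (3√3/π + 2)² = 13.351621, the quantity used in the ρ = 0 check below Table 2.6.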
Calculations of the above expression for the first moment of Wxy(3,ρ)
are performed for ρ = 0, .01, .05(.05).95, .99, 1.00 and displayed in the
first column of Table 2.6.

We next wish to find an expression for the second moment about zero
of Wxy(3,ρ).  Squaring,

    Wxy(3,ρ)² = (1/4){ Σ_{i=1}^{3} Σ_{j=1}^{3} Uᵢ²Vⱼ²
        + 2 Σ |UᵢVⱼUₖVₗ| },

where the latter sum extends over all 36 unordered pairs of distinct
index pairs (i,j) ≠ (k,l); its terms are of the three forms |Uᵢ²VⱼVₖ|,
|UᵢUⱼVₖ²|, and |UᵢUⱼVₖVₗ|.

Proceeding to evaluate E[Wxy(3,ρ)²], we first apply the relationship

    E(X²Y²) = 1 + 2ρ²,

where X and Y have a standardized bivariate normal distribution with
correlation ρ.  Thus,

    E(Uᵢ²Vⱼ²) = 1 + 2ρ², i = j;  1 + ρ²/2, i ≠ j,

and

    E{ Σ_{i} Σ_{j} Uᵢ²Vⱼ² } = 3(1 + 2ρ²) + 6(1 + ρ²/2) = 9(1 + ρ²).
We next note that

    E(|X²YZ|) = 2π⁻¹{(ρ₂₃ + 2ρ₁₂ρ₁₃) sin⁻¹ρ₂₃ + (1 + ρ₁₂² + ρ₁₃²)√(1 − ρ₂₃²)},   (2.26)

where the joint distribution of X, Y and Z is standardized trivariate
normal, with correlation coefficients ρ₁₂, ρ₁₃, and ρ₂₃.  In the evaluation of E[Wxy(3,ρ)²], the terms of interest which are of the form
|X²YZ| have correlation coefficients as follows:

    (ρ₁₂, ρ₁₃, ρ₂₃) = (ρ, ρ/2, 1/2):
        |U₁²V₁V₂|, |U₂²V₁V₂|, |U₂²V₂V₃|, |U₃²V₂V₃|,
        |U₁U₂V₁²|, |U₁U₂V₂²|, |U₂U₃V₂²|, |U₂U₃V₃²|;

    (ρ₁₂, ρ₁₃, ρ₂₃) = (ρ, −ρ/2, −1/2):
        |U₁²V₁V₃|, |U₃²V₁V₃|, |U₁U₃V₁²|, |U₁U₃V₃²|;

    (ρ₁₂, ρ₁₃, ρ₂₃) = (ρ/2, −ρ/2, 1/2):
        |U₁²V₂V₃|, |U₃²V₁V₂|, |U₁U₂V₃²|, |U₂U₃V₁²|;

    (ρ₁₂, ρ₁₃, ρ₂₃) = (ρ/2, ρ/2, −1/2):
        |U₂²V₁V₃|, |U₁U₃V₂²|.
Applying (2.26),

    E(|U₁²V₁V₂|) = 2π⁻¹[(1/2 + ρ²) sin⁻¹(1/2) + (1 + (5/4)ρ²)√(1 − 1/4)]
        = (1/6 + √3/π) + ρ²(1/3 + 5√3/(4π)),

and the same value results for every term with correlations (ρ, ρ/2, 1/2)
or (ρ, −ρ/2, −1/2).  For the remaining terms,

    E(|U₁²V₂V₃|) = 2π⁻¹[(1/2 − (1/2)ρ²) sin⁻¹(1/2) + (1 + (1/2)ρ²)√(1 − 1/4)]
        = (1/6 + √3/π) + ρ²(−1/6 + √3/(2π)),

and the same value results for the terms with correlations (ρ/2, −ρ/2, 1/2)
and (ρ/2, ρ/2, −1/2).  Hence the sum of the eighteen terms of the forms
|Uᵢ²VⱼVₖ| and |UᵢUⱼVₖ²| has expectation

    12{(1/6 + √3/π) + ρ²(1/3 + 5√3/(4π))} + 6{(1/6 + √3/π) + ρ²(−1/6 + √3/(2π))}
        = 3(1 + ρ²)(1 + 6√3/π).

Summarizing, we have established that

    (1/4) E{ Σ_{i} Σ_{j} Uᵢ²Vⱼ² + 2 × [the eighteen three-index terms] }
        = (1/4){9(1 + ρ²) + 6(1 + ρ²)(1 + 6√3/π)} = (3/4)(5 + 12√3/π)(1 + ρ²).
These eighteen terms account for all of the cross products except those
of the form |UᵢUⱼVₖVₗ|.  Since E(|U₁U₂V₁V₂|) = E(|U₂U₃V₂V₃|),
E(|U₁U₂V₂V₃|) = E(|U₂U₃V₁V₂|), and
E(|U₁U₂V₁V₃|) = E(|U₁U₃V₁V₂|) = E(|U₁U₃V₂V₃|) = E(|U₂U₃V₁V₃|),
we have

    E[Wxy(3,ρ)²] = A(ρ) + 2E(|U₁U₂V₁V₂|) + 2E(|U₁U₂V₂V₃|)
        + 4E(|U₁U₂V₁V₃|) + E(|U₁U₃V₁V₃|),   (2.27)

where

    A(ρ) = (3/4)(5 + 12√3/π)(1 + ρ²).   (2.28)

Exact results for absolute moments of the multivariate normal distribution are available only for the bivariate and trivariate cases.  Thus,
we approximate the last four terms in (2.27) with a power series expansion given by Kamat (1952).
    E(|WXYZ|) = 4π⁻²{ 1 + (1/2) Σ_{i<j} ρᵢⱼ² + Σ_{i<j<k} ρᵢⱼρᵢₖρⱼₖ
        + (1/24) Σ_{i<j} ρᵢⱼ⁴ + (1/4) Σ ρᵢⱼ²ρₖₗ²
        − (1/4) Σ (ρᵢⱼ²ρᵢₖ² + ρᵢⱼ²ρⱼₖ² + ρᵢₖ²ρⱼₖ²)
        + Σ ρᵢⱼρᵢₖρⱼₗρₖₗ + ··· },   (2.29)

where the first quartic product sum extends over pairs {i,j}, {k,l} with
no common index, the second over triples i < j < k, and the last over the
three four-cycles on the indices; W, X, Y, and Z have a multivariate
normal distribution with correlation coefficients ρ₁₂, ρ₁₃, ρ₁₄, ρ₂₃,
ρ₂₄, and ρ₃₄.  Of interest here are the following cases:
                  ρ₁₂     ρ₁₃     ρ₁₄     ρ₂₃     ρ₂₄     ρ₃₄
    |U₁U₂V₁V₂|    1/2     ρ       ρ/2     ρ/2     ρ       1/2
    |U₁U₂V₁V₃|    1/2     ρ      −ρ/2     ρ/2     ρ/2    −1/2
    |U₁U₂V₂V₃|    1/2     ρ/2    −ρ/2     ρ       ρ/2     1/2
    |U₁U₃V₁V₃|   −1/2     ρ      −ρ/2    −ρ/2     ρ      −1/2

Applying (2.29),

    E(|U₁U₂V₁V₂|) ≈ π⁻²{61/12 + 9ρ² + (17/12)ρ⁴},

    E(|U₁U₂V₁V₃|) ≈ π⁻²{61/12 + (27/8)ρ² − (59/96)ρ⁴},

    E(|U₁U₂V₂V₃|) ≈ π⁻²{61/12 + (27/8)ρ² − (59/96)ρ⁴},

and

    E(|U₁U₃V₁V₃|) ≈ π⁻²{61/12 + 9ρ² + (17/12)ρ⁴}.
Then summing,

    B(ρ) = 2E(|U₁U₂V₁V₂|) + 2E(|U₁U₂V₂V₃|) + 4E(|U₁U₂V₁V₃|) + E(|U₁U₃V₁V₃|)
        ≈ π⁻²{183/4 + (189/4)ρ² + (9/16)ρ⁴}.   (2.30)

Values of A(ρ), B(ρ), A(ρ) + B(ρ) = E[Wxy(3,ρ)²], and the corresponding approximation for the variance of Wxy(3,ρ) are given in Table
2.6.  Note that for ρ = 0,

    E[Wxy(3,0)²] = (E[Wxy(3,1)])² = (3√3/π + 2)² = 13.351621,

and

    E[Wxy(3,0)²] − A(0) = (3√3/π + 2)² − (3/4)(5 + 12√3/π) = 4.639659,

which we compare with {A(0) + B(0)} = 13.347404 and B(0) = 4.635444.
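The closed form (2.28) and the polynomial approximation (2.30) are easy to evaluate; the sketch below (a modern check, not part of the thesis) reproduces the ρ = 0 comparison just quoted.

```python
import math

def A(rho):
    # (2.28): A(rho) = (3/4)(5 + 12 sqrt(3)/pi)(1 + rho^2), exact.
    return 0.75 * (5.0 + 12.0 * math.sqrt(3.0) / math.pi) * (1.0 + rho ** 2)

def B(rho):
    # (2.30): series approximation to the four remaining absolute moments,
    # B(rho) ~= pi^{-2}(183/4 + (189/4) rho^2 + (9/16) rho^4).
    return (183.0 / 4.0 + (189.0 / 4.0) * rho ** 2
            + (9.0 / 16.0) * rho ** 4) / math.pi ** 2
```

At ρ = 0 the approximation A(0) + B(0) = 13.347404 differs from the exact value (3√3/π + 2)² = 13.351621 only in the third decimal place.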
2.2.7.  Lognormal Approximation

We use the first two moments of Wxy(2,ρ), and the fact that it has
zero probability of taking a negative value, to fit a lognormal curve.
For Wxy(3,ρ), we follow the same procedure, substituting the approximation for the second moment.

Suppose that the random variable T has a lognormal distribution.
Then Z = log(T) is normally distributed.  Let μ denote the expected value
of Z and σ its standard deviation.  Then

    E(T) = exp(μ + (1/2)σ²)  and  E(T²) = exp(2μ + 2σ²).

Letting μ_W and σ²_W denote the expected value and variance of Wxy(n,ρ),
we equate them with the expected value and variance of T, which gives

    σ = (log[σ²_W/μ²_W + 1])^{1/2}  and  μ = log(μ_W) − (1/2)σ².

We approximate the distribution function of Wxy(n,ρ) as follows:
    P[Wxy(n,ρ) ≤ w] = P[log{Wxy(n,ρ)} ≤ log w]
        ≈ P(Z ≤ log w) = Φ((log w − μ)/σ),

where Φ(·) is the cumulative distribution function of a standard normal
random variable.

Lognormal curves were fitted in this manner for selected values of
ρ.  The distribution of Wxy(2,ρ) is known, as discussed in Section 2.2.2.
Thus we can compare this distribution with the lognormal fit.  In Figure
2.3, a comparison of the two distribution functions for ρ = .5 is
exhibited.  Also, P[Wxy(2,ρ) ≤ (1−ρ²)/ρ] was approximated by the lognormal
curve for ρ = .01, .05(.05).95.  These results are compared with the exact
values in Table 2.7.
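The fitting recipe above can be written out directly; the sketch below (a modern illustration) matches the first two moments and evaluates the approximating probability, with Φ computed via the error function.

```python
import math

def lognormal_fit_prob(mean, second_moment, w):
    # Fit a lognormal by matching mean and variance of Wxy(n, rho):
    #   sigma^2 = log(var/mean^2 + 1),  mu = log(mean) - sigma^2/2,
    # then approximate P[Wxy <= w] by Phi((log w - mu)/sigma).
    var = second_moment - mean ** 2
    sigma2 = math.log(var / mean ** 2 + 1.0)
    mu = math.log(mean) - 0.5 * sigma2
    z = (math.log(w) - mu) / math.sqrt(sigma2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With the n = 2, ρ = .5 moments (1.435991, 6.0) and w = 1.5 this returns the Table 2.7 entry .711895.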
For n = 3 and p = • 5, the fitted curve, along with the lower bound
(2.12) and the upper bound (2.14) for the distribution function, are
displayed in Figure 2.4.
It can be observed that for small w, the approx-
imated value of P[Wxy(3, .5)
~
w] falls below the lower bound.
The fitted
lognormal curve was also used to approximate P[Wxy(3,p) ~ (1_p2)/p].
In
Table 2.8, these approximations are compared to the corresponding lower
and upper bounds.
51'
~oJ
f~
....
.......
...u ~
>
f.
I
~ .3
'(
~
~~
U"\
N
i;<
'"0'
....
e::
,8
+J
'"N
4i
l-
!.)
&
e::
.~ ...8
u•
..."
.D
...·n...
,~
.......~
oJ
..,
0
-0
--<
.~
.,.
..,
co
"':
0:
~
~
N
N
U"\
--<
~
,.
C
....r
,.,'"
"
~'
x
o
52
e
Table 2.7
2
p
e
!:.e.p
P[Wxy (2,p) .,; (l-p 2)/p]
.01
99.990021
Exact Value
1.000000
.05
19.950001
.999988
.999619
.10
9.900004
.997438
.995591
.15
6.516668
.983670
.984886
.20
4.800000
.957156
.966587
.25
3.750000
.921724
.940746
.30
3.033334
.8~1023
.907760
.35
2.507143
.837447
.868086
.40
2.100000
.792386
.822099
.45
1.772222
.746572
.770018
.50
1.500000
.700325
.711895
.55
1.268182
.653699
.647643
.60
1.066667
.606561
.577107
.65
.8~8462
.558620
.500194
.70
. 72~571
.509415
.417088
.75
.583333
.458263
.328623
.80
.450000
.404113
.236924
.85
.326471
.345211
.146564
.90
.211111
.278160
.066532
.95
.102632
.194192
.012696
LOgnonna1 Pit
1.000000
53
r-------------------------------------------T~
54
Table 2.8
p
.01
e
e
P[Wxy(3,p)
Lower BOlllld
1.000000
$
2
(l-p )/p]
LOgnonna1 Fit
1.000000
Upper BOlmd
1.000000
.05
.999855
. 99~118
.999914
.10
.977297
.982823
.986332
.15
.884813
.934457
.929174
.20
.750184
.857058
.842140
.25
.613134
.761874
.747494
.30
.492568
.659615
.656815
.35
.393143
.557996
.574315
.40
.313334
.461793
.500781
.45
.249773
.373605
.435642
.50
.199070
.294626
.377882
.55
.158394
.225251
.326421
.60
.125468
.165511
.280261
.65
.098550
.115322
.238508
.70
.076339
.074606
.200356
.75
.057894
.043298
.165079
.80
.042099
.021212
.131998
.85
.028989
.007775
.100449
.90
.017947
.001595
.069701
.95
.008851
.000066
.038607
.99
.000526
.000000
.009607
OW'TER III
MATOUNG A SINGLE OBSERVATION
3.1.
Application of the Distribution of the Rank of the Concomitant
= 1, 2, ... , n, be n independent random variables
Let (Xi' Vi)' i
from some bivariate distribution.
As before, we denote the ordered X's
by
X
l:n -< X2:n
~
~
X
n:n
and the concomitants of these order statistics by
Y[l:n]' Y[2:nJ' ... , Y[n:n] .
Suppose we have a broken random sample from this bivariate distribution.
We observe the XIS in some random order and the V's in some independent
random order, not knowing the original pairing of X's and Y's.
Let us
consider the problem of matching one particular X, rather than that of
reconstructing the entire sample.
DeGroot, Feder, and Goe1 (1971) investigate this problem for the
case in which the joint distribution of X and Y can be represented by a
probability density flUlction of the form
f(x,y)
=
a(x) 8(y)eXY
for
(x,y)
E
2
R ,
(3.1)
56
where ex and f3 are arbitrary real-valued flU1ctions.
Denote the ordered
observations of the lU1-paired sample by x l : n ~ x 2 : n ~ ... ~ xn :n and
Yl:n ~ Y2: n ~ ... ~ Yn : n ' Suppose one wishes to match x l : n ' Pairing
Yl: n with x l : n n~ximizes the posterior probability of obtaining a correct
match. Similarly, this criterion leads to pairing Yn : n with xn : n ' A
general solution is not pursued.
Assume only that the joint distribution function of X and Y is
absolutely continuous, and that X and Yare standardized variables, as
ordering is location and scale invariant.
We propose the following
Let Rr,n represent the rank of
th
Y[r:n] among the nY's. Suppose one wishes to match the r ordered X.
Then pair with it the kth ordered Y, where
procedure for matching one observation.
P{R
r,n
k}
=
=
max P{R
r,n
l~s~n
= s}
David, 0' Connell, and Yang (1977) derive an expression for
P{R
r,n
= s}
P{R
. r ,n
=
s}
=
n ~
~
Leo Leo
t
\ C ek er - 1 - k es - l - k en-r-s+l+k f(
)dx d
k 1 2
3
4
x,Y
y,
k=O
L
(3.2)
where
el(x,y) = P{X
~
x, Y
~
Y}
e 2 (x,y) = P{X
~
x, Y
e 3 (x,y) = P{X
>
x, Y
~
y} ,
e4 (x,y) = P{X
>
x, Y > y} ,
t = min(r-l, s-l)
,
>
y}
e
57
and
=
Let TIrs = P{R
= s}.
r,n
P{R
= s}.
r,n
Relation 3.1:
x to
(n-l)!
k! (r-I-k)! (s-f-k)! (n-r-s+l+k)!
The following two synunetry relations hold for
If there exist monotone increasing transfonnations from
X' and from Y to Y' such that the joint pdf g(x', y') of X' and Y'
is syrrnnetric (i. e. g(x', y') = g(y', x')), then
r,s = 1, 2, .•. , n .
Relation 3.2:
If f(x,y) = fe-x, -y), then
TIrs
= TIn+l - r , n+l-s
r,s=1,2, ... ,n.
Numerical evaluation of P{R r,n = s} is given by David, O'Connell,
,
and Yang (1977) for the case in which the joint distribution of X and Y
is bivariate nonnal, n = 9, and p = 0.1(0.1)0.9,0.95.
It is noted that
for small and intermediate values of p, holding r constant,
necessarily maximized by letting s = r.
Suppose n = 9,
p
rs is not
= 0.5, and
TI
Then the maximum value of TI 6s is obtained for s = 7. For p = 0.3,
the maXL'1lum value of TI 6s is obtained for s = 8. In these two examples,
if the 6th ordered X is matched based on the distribution of R ,n ' the Y
6
which is paired with this observation is not the same as that Y which
r = 6.
would be paired with it under complete re-pairing according to the maximum likelihood rule.
58
We observe from the available numerical results that for
p
= 0.1(0.1)0.9,0.95,
From Relation 3.2 it follows that
Since TIrs (p) = TIr,n-s
+1 (-p), r,s
-p = 0.1(0.1)0.9,0.95,
= 1, 2, .•. , n, we have for
and
We wish to further investigate the proposed method for matching an individual X based on the distribution of the rank of its concomitant when
that X is either the smallest or largest X.
3.2.
The Most Probable Rank of the First Concomitant
Let us consider the distribution of the rank of Y[l:n]' when the
joint distribution of X and Y is Gumbel's bivariate exponential distribution.
The marginal distributions of both X and Yare standard exponen-
tial, and the joint pdf is
59
f(x,y) = e-(x+y+exy) {(l+ex)(l+ey) - e},
as stated by Johnson and Kotz (1972).
x > 0, Y > 0, 0
When
e = 0,
dent, and the correlation decreases 'as e increases.
P{Y
>
$
e
$
1, (3.3)
X and Yare indepenAlso, we note that
ylX = x} decreases as X increases, so we say that Y is stochasti-
cally decreasing in X.
(See Barlow and Proschan (1975).)
From (3.2), we have
n-l
P{Rl,n = s} = n(s-l)
,00
,00
)-00 )-00
s-l n-s
83 e4 f(x,y)dx dy .
(3.4)
When the joint distribution of X and Y is Gumbel's bivariate exponential
distribution,
e3 (x,y) = e -x
e
-(x+y+exy) ,
and
Making the appropriate substitutions,
P{Rl,n
(e- x _ e-(x+y+exy))s-l (e-(x+y+eXY)J n- s
= s } = n (n-l),ooo,ooo
s-l )o)n
· e-(x+y+eXY){(l+ex)(l+ey) - e} dx dy
n-l)
,000
= n (s-l ) 0
,000
)
n
e- nx (1 _ e-(y+exy))s-l e-(n-s+l) (y+exy)
· {(1+8x)(1+ey) - e} dx dy
s-l
= n(~=~) I
(_l)k (skI) ~ e- nx ( ~ e-(n-s+l+k) (y+exy)
k=O
· {(l+ex)(l+ey) - e} dy)dx .
60
-ax
-- J001 _e
x dx.
Let E1(a)
'
Integrating, we have
F e- nx ( F (l+ex) e-(n-s+1+k) (y+exy)dy)dx = - r - _1--::-..,.........
JO
n(n-s+1+k)
(1)
JO
e ~ e- nx ( ~ y(l+ex) e-(n-s+1+k)(y+exy)dy)dx
(2)
n
8
1
= ----..,...2 e
(n-s+1+k)
n
E1 (8) ,
n
-e F e- nx ( F e-(n-s+1+k) (y+exy)dy)dx = 1
ee E (n)
In
In
(n-s+l+k)
18 .
O
O
( 3)
Combining tenns,
To further simplify the above expression, we apply the following.
Identity:
~L (-1) k (x)
k
k=O
1 2=
1 z+x
~
1
L (z+l+k)' z = 0,1,2, ...
(z+l+k)
(z+l+x) ( x ) k=O
(3.6)
Proof:
(tu)z (l+tu)x =
x
L
k=O
x
(1) JO J1
-1 0
(L
k=O
(~) (tu)z+k
(~)(tu)Z+k)dtdu
= (_l)z
x
L
k=O
(_l)k (~)
1~~
(z+l+k) 2 .
61
(2)
16
Repeatedly integrating by parts,
x
(tu)Z (l+tu)X dt
= z!x! l
(_l)k
k=O
f_Ol f1o (tu)z
(l+tu)x dt =
1
(l+u)x-k Uz+k .
(z+l+k)! (x-k)!
( _l)z
(z+l+x) (Z+X)
x
x
L
k=O
1
(z+l+k) •
This shows the identity.
It follows now from (3.5) and (3.6) that
n
P{R
l,n
1
= s} = - +
n
e
e E
l
n
(-e){
Stl. l
1
(n-s+l+k) - I} .
k=O
(3.7)
Comparing P{R ,n = s} and P{RI ,n = s-l},
I
n
n s-l
P{RI ,n = s} - P{RI ,n = s-l} = ee ElC e) { k~O
~
(n-~+l+k)
s-2
1
- k~O
\ (n-s+ 2+k)}
n
1
= (n-s+l)
e n
e El(e) .
Thus, we have shown that
(3.8)
We note that any monotone increasing transfonnations applied separately
to X and Y do not change the values of TIrs ' Due to this fact, (3.8)
holds not only for X and Y with a joint probability density function of
the form (3.3), but for all other variates having distributions which
can be derived by such transfonnations.
for an even wider class of distributions.
We conjecture that (3.8) holds
Whenever (3.8) does hold,
62
matching a single observation based on the most probable rank of its
concomitant specifies pairing Yn : n with xl : n '
3.3.
The Most Probable Rank of the nth Concomitant
Still assuming that X and Yare distributed according to Gumbel's
bivariate exponential distribution, we now examine the probability
distribution of Rn,n . Due to Relation 3.1, P{Rn,n = l} may be obtained
from (3.7). The condition of Relation 3.2 is not satisfied by the joint
pdf (3.3), so we will derive an expression for P{Rn,n = s}, s = 1,2, ... ,n.
It follows from (3.2) that
P{Rn,n = s} = n( n-l
s- 1)
r r
-00-00
n-s
8s-l
1 82 f(x,y)dx dy .
For Gumbel's bivariate exponential distribution,
8 (x,y)
1
=
1 - e
-x
e
-y
+ e
-(x+y+8xy) ,
8 (X,y) = e- y - e-(X+Y+8XY) •
2
and
Substituting, we have
e
-x
e -y + e -(x+Y+8XY) }s-l
. {e- y - e-(X+y+8xy)}n-s e-(X+y+8XY){(1+8X) (1+8y) - 8}dxdy
= n(~=i)
s-l s-l-k n-s+k
L
L
L
k=O
j=O
£=0
.
(_l)k+J+£ (s-l)(s-~-k)(n-s+k)
k
J
£
. ~ ~ e-(n-s+l+k)y e- x [(j+£+1)+(£+1)8Y] {(1+8X)(1+8y) - 8}dxdy
(3.9)
63
Integrating and sinplifying,
(1)
-8 ~ e-(n-s+l+k)Y (/; e- X [(j+t+l)+(t+1)8y]dx)dy
= __1_
[In-s+1+k) U+ t +1)] E [(n-s+l+k) (j+t+1)]
t+l exp
It+1) 8
1
It+1) 8
(2)
/; (1+8y)e-ln-S+1+k)Y(~e- X [(j+t+l)+(t+1)8y]dx)dy
=_
+
(3)
j
exp[ln-S+1+k)p+t+1)] E [(n-s+1+k)(j+t+l)]
(t+l)2 e
(t+l e
1
It+l)8
1
(t+l) (n-s+1+k)
~ (1+8y)e-(n-S+l+kiY(~x e- X [(j+t+l)+(t+1)8Y]dx)dy
----.:...:..:~~--- dx
.
Noting that
e -ax
-a
-ax
r,1 -Z
dx = e - a r; ~ dx,
x
and combining tenns, we obtain
n-1 s-l s-l-k n-s+k
P{~,n = s} = n(s-l) I
I
I
k=O j=O t=O
.
{I
(t+l)ln-s+k+l) -
j
---~2~-----­
(t+l)
lj+t+l)
a
~
0 ,
64
e
Calculations of the distribution of Rn,n for n = 2(1)8,10,12 and
= O.OS, 0.1(0.1)0.9,0.95,1.00 show the following ordering:
(3.11)
n1 > TIn 2 > ••• > TInn
TI
Displayed in Table 3.1 are the m.nnerica1 results for n = 3,6,10 and
e = 0.1(0.2)0.9,1.0. We believe it is likely that the ordering (3.11)
holds for Gumbel's bivariate exponential distribution, although we have
not shown this.
As
for the ordering (3.8), we conjecture that (3.11)
holds for a wider class of bivariate distributions, perhaps those for
which Yis stochastically decreasing in X.
3.4.
General Results for n = 2
Additionally, we consider the case for which n
stochastically increasing in X.
where x1 : 2
<
x2: 2 .
Suppose Xl : 2
=
2 and Y is
= x l : 2 and X2 : 2 = x 2: 2 '
Then
The random variables (Xl'Y 1) and (X2 ,Y Z) are independent and identically
distributed. Thus, the conditional pdf of Y[r:2] given Xr : 2 = xr : Z is
r
= 1,2 .
65
From this observation and the assumption that Y is stochastically
increasing in X, it follows that
> P{Y > ylX
= x l : Z} = P{Y[l:Z]> y!Xl : Z = XI:z} for all y.
Hence,
P{Y[2:2] > Y[l:Z]IXl : Z = x1 : Z' XZ: Z = xZ: 2}
= ~oo f~oo fyltlX = xl : Z) fyCulx = xZ: 2)dt
for any xl : 2
<
x Z: Z'
du
This shows that
P{RZ, Z = 2}
Since P{RZ, Z = I} + P{RZ,. Z = Z}
'TT
>
l/Z .
= 1,
ZZ > 'TT ZI .
Due to the relationships P{RI , 2 = I} + P{RZ, Z = I}
P{RI , 2 = 2} + P{RZ, Z = Z} = 1, we also have
l3.l2)
= 1 and
l3.l3)
66
It can be shown similarly that for Y stochastically decreasing in X
(3.14)
67
Table 3.1
P{R
n,n
= s}
Gumbel's Bivariate Exponential Distribution
n = 3
8\s
.1
1
.3602
2
.3384
3
.3013
.3
.4096
.3404
.2500
.5
.4544
.3355
.7
.4955
.3262
.1783
.9
.5336
.3142
.1522
1.0
.5517
.3073
.1409
+-
.2101
e
n = 6
8\s
.1
1
.1904
2
.1838
3
.1759
4
.1661
5
.1529
6
.1309
.3
.2358
.2113
.1854
.1572
.1252
.0850
.5
.2788
.2316
.1868
.1436
.1013
.0578
.7
.3196
.2460
.1831
.1288
.0819
.0406
.9
.3585
.2556
.1761
.1143
.0662
.0292
1.0
.3773
.2588
.1719
.1074
.0597
.0250
+-
Table 3.1 (Continued)
n
8\5
1
.1
= 10
.1191
2
.1162
3
.1131
4
.1096
5
.1058
6
.1014
7
.0962
8
.0898
9
.0813
10
.0675
.3
.1562
.1448
.1332
.1214
.1094
.0964
.0831
.0688
.0530
.0338
.5
.1920
.1688
.1468
.1259
.1059
.0869
.0688
.0515
.0348
.0184
.7
.2267
.1888
.1552
.1256
.0993
.0762
.0559
.0384
.0233
.0106
.9
.2602
.2051
.1597
.1221
.0911
.0657
.0451
.0286
.0159
.0064
1.0
.2766
.2120
.1606
.1196
.0868
.0608
.0404
.0248
.0132
.0050
+
+
0\
00
e
e
e
CHAPTER IV
RE- PAIRING A PARTIALLY BROKEN SAMPLE
Let T and U have some absolutely continuous bivariate distribution.
A sample of size n + m = p is drawn from this distribution.
Before n
of the observations are recorded, each of these npairs is separated
into its two components.
... ,
The observations - t l , t 2 , .•. , tn' and
un - are available in the form xl' Xz' ... , Xn and
Yl' Y2' ... , yn' where indexing is such that
< ••• <
y .
(4.1)
n
The original pairing of these observations is tmknown.
The remaining
m members of the sample are observed in paired form (ti' ui ),
i = n+l, n+2, ... , p. Relabel t i as Xi and u i as Yi' i = n+l, n+2, •.. , p.
Where ¢ denotes the set of permutations of the n integers, 1, 2, ... , n,
we wish to select
~ E ¢
and pair
Y~(i)
with Xi' i = 1, 2, ..• , n.
We first asstnne that the parameters of the joint distribution of T
and U are known.
tion
~,
When the original pairing is specified by the permuta-
the likelihood function of the sample of size p is
n
L(~)
Choosing
~
~
P
IT
f(x.,y.)
•
= IT f(x i , Y~(i))·
1
1
i=l
~
i=n+l
(4.2)
to maximize this likelihood function is equivalent to choosing
to maximize
70
=
Ll (</»
n
TI f(x i ,
i=l
Y~(i))
(4.3)
.
'I'
This is the maximum likelihood problem discussed in Chapter I.
If the
joint pdf of T and U possesses a non-decreasing monotone likelihood
ratio, (4.3) and thus the likelihood function (4.2) are maximized by
</>*
= (1,2, ..• ,n).
If the joint pdf of T and U possesses a non-increasing
monotone likelihood ratio, the maximum likelihood solution is
</>*
= (n, n-l, ... , 2, 1).
The m paired observations of the sample do
not contribute to the solution.
Suppose the joint distribution of T and U is a bivariate normal
distribution, with ECT) = flr' E(U) =
correlation coefficient p.
111'
2
2
Var(T) = aT' Var(U) = aU ' and
Assume that all the parameters are unknown.
For the case m = 0, i.e. all observations un-paired, DeGroot and Goel
(1975) obtain the following maximum likelihood solution:
(4.4)
a""2
T
",,2
=
"". 2
:{2
aU = L (y. -aU) /p ,
. 1
1=
p=r
</>
=
I
1
(4.5)
(4.6)
i=l
</> =
(4.7)
Similarly, we will give the maximum likelihood solution when m > O.
The log likelihood function of the sample of size p is, omitting a
constant term,
71
{ i=lI
1
+ .
I
(4.8)
1.=n+1
From consideration of the above expression, the maximum likelihood estimators of ~, ].1U' ai ' and a~ are fOlmd to be those given by (4.4) and
(4.5), as would be obtained when the entire sample is lUlbroken.
estimators do not depend on the pairing
We now eva1 uate (4 . 8)
involving
'"
L(~,p)
~
= -
at
~
These
in any way.
'"
'"
",2
d ",2
..
lJ.r'
].1U' aT ' an aU' offil.ttl.ng tenus not
or p.
1
2n log(l-p 2)
2
2 (1- p )
'"
'"
'"
x -lJ..n
y -u..
x -lJ..n
y
-].1'"
:E
(
&
'1)
(
i
'U)]}
- 2p[ I (~' 1)( p(&) U) +
L
eU
i=l or
aU
i=n+1 or
n
Let us establish the following notation:
r
n
n~
-u..
x -Q..., y
= I (i",'1)( p(~) rU)/n
i=l
eJT
au
(4.9)
and
Using this notation, we write
2
= -p{ 10 g (1 - P ) + 1 2 [1 L"'(~,p)
~
2
(l-p )
p
rp~
~
]} .
We wish to choose ~ and p to maximize L(~,p), where ~
Ip I
< 1.
Maximizing L(~,p) is equivalent to minimizing
(4.10)
E <P
and
72
1
2 [1 - prpq)
(l-p )
•
Let us hold ~ constant and minimize F(~,p) with respect to p.
aF _
1
ap - -(l--"""'p2:--)
[-p - r
2p(1- r :)
p~
p~
+
(1- p2)
] .
Setting this equal to zero, we have
3
p
-
2
rp~
p + p -
rp~
=0 .
(1 + p2)(p - rp~) = 0 .
This equation is satisfied by
p = rp~.
Note that
1+r 2
pp
=
1-r
2
> 0 •
p~
Substituting
p = rp~
for
p
in (4.10),
2
log (1- r p)
= -p{
2 P
+ I} .
Choosing ~ to maximize L($,rp~) is equivalent to choosing ~ to minimize
(l-r~~). Hence ~ is that pennutation for which r~~ is maximized. Recall
that
max r
~d)
It follows that
A,
n'jl
= r n'jlA,*
and
73
2)
r2
max (rp</>
=
p</>*
</>d>
Sunmarizing, we give the maximum likelihood solution for </> and p.
A
</>
= </> *
¢ = </>*
n
Obviously, as p
-+
A
As ~
-+
d
and
= r p </>*
1"f
p = r p </>*
if
A
p
2
2
r p </>* > r p </>*
r~</>* > r~</>*
.
1, the solution approaches
¢=
</>
an
=
A
</>*
and
</>*
and
p
A
p
= r p</>*
if
2
2
rn</>* > r n </>*
=r
if
r
if
r
.
p</>*
2
2
>
rn</>*
n</>*
0, the solution approaches
¢ = </>*
and
p = r P'l'*
~
m
< 0 •
Additionally, we mention the above maximum likelihood problem when
the prior information about p includes only its sign and not its magnitude.
p > O.
Suppose we know
mize L(</>,p), where </>
E
p=
o.
r p </>* when r p </>* >
<P
Then we wish to choose </> and p to maxi-
and 0 < p < 1.
= </>*
and
When r p </>* < 0, a reasonable estimator of p
A
cannot be found by considering L(</> ,p).
likelihood solution is
It follows that ~
$ = </>* and
If we know p < 0, the maximum
p = rp~
when rp~ < O.
when r p </>* > 0 we do not obtain a reasonable estimator of p.
In this case,
Even though
74
r n ¢* > 0 and r n ¢* < 0, as shown by DeGroot and Gael (1975), r p ¢* and
r p ¢* will not always be positive and negative, respectively.
However,
this sign agreement will hold with high probability.
Suppose now that we have a partially broken random sample from a bivariate normal distribution with $\mu_T$ and $\mu_U$ known, and $\sigma_T^2$, $\sigma_U^2$, and $\rho$ unknown. Without loss of generality, let $\mu_T = \mu_U = 0$. From consideration of the log likelihood function, it follows that the maximum likelihood estimators of $\sigma_T^2$ and $\sigma_U^2$ are
$$\hat\sigma_T^2 = \sum_{i=1}^{p} x_i^2\big/p \qquad\text{and}\qquad \hat\sigma_U^2 = \sum_{i=1}^{p} y_i^2\big/p\,,$$
as would be obtained from a completely unbroken sample.
Let
$$r_n^{(1)} = \frac{1}{n}\sum_{i=1}^{n}\Bigl(\frac{x_i}{\hat\sigma_T}\Bigr)\Bigl(\frac{y_i}{\hat\sigma_U}\Bigr), \qquad r_{m\phi}^{(1)} = \frac{1}{m}\sum_{i=n+1}^{p}\Bigl(\frac{x_i}{\hat\sigma_T}\Bigr)\Bigl(\frac{y_{\phi(i)}}{\hat\sigma_U}\Bigr),$$
and
$$r_{p\phi}^{(1)} = \frac{n}{p}\, r_n^{(1)} + \frac{m}{p}\, r_{m\phi}^{(1)}\,.$$
It can be shown that the log likelihood function evaluated at $\hat\sigma_T^2$ and $\hat\sigma_U^2$ is maximized by
$$\hat\phi = \phi^* \;\text{ and }\; \hat\rho = r^{(1)}_{p\phi^*} \quad\text{if } \bigl(r^{(1)}_{p\phi^*}\bigr)^2 > \bigl(r^{(1)}_{p\tilde\phi^*}\bigr)^2\,,$$
$$\hat\phi = \tilde\phi^* \;\text{ and }\; \hat\rho = r^{(1)}_{p\tilde\phi^*} \quad\text{if } \bigl(r^{(1)}_{p\tilde\phi^*}\bigr)^2 > \bigl(r^{(1)}_{p\phi^*}\bigr)^2\,.$$
Other combinations of known and unknown parameters for the bivariate normal case can be considered in a similar manner to find maximum likelihood estimators of the unknown parameters and the unknown pairing $\phi$ of the broken portion of the sample.
CHAPTER V
THE JOINT DISTRIBUTION OF ORDER STATISTICS OF CORRELATED VARIABLES

5.1. Joint Distribution of $X_{i:n}$ and $Y_{j:n}$

Having reconstructed a broken random sample or broken portion of a random sample from a bivariate distribution whose pdf possesses a monotone likelihood ratio, we have re-paired observations of the form $(x_{i:n},\, y_{j:n})$, where $j = i$ or $j = n+1-i$, $i,j = 1, 2, \ldots, n$. In order to obtain and evaluate estimation and testing procedures for such re-paired samples, we need to know the joint distribution of $X_{i:n}$ and $Y_{j:n}$. Mood (1941) gives the joint distribution of the medians of correlated variables and shows that it is asymptotically normal. It is shown by Siddiqui (1960) that the joint distribution of $X_{i:n}$ and $Y_{j:n}$ is asymptotically normal. Independently of these results we derive an exact form for the distribution function and corresponding pdf of $X_{i:n}$ and $Y_{j:n}$.

Let $X$ and $Y$ have the absolutely continuous joint cdf $F(x,y)$ and pdf $f(x,y)$. Since ordering of the $X$'s and ordering of the $Y$'s are location and scale invariant, we consider the standardized variables. A description of the occurrence of the event $\{X_{i:n} \le x,\; Y_{j:n} \le y\}$ is displayed in the following 2x2 table:
              Y <= y       Y > y      total
  X <= x        s           t-s         t
  X > x        u-s       n-t-u+s       n-t
  total         u           n-u         n

where $t = i, i+1, \ldots, n$, $u = j, j+1, \ldots, n$, and $s = a_{tu}, a_{tu}+1, \ldots, b_{tu}$ for $a_{tu} = \max(0,\, t+u-n)$, $b_{tu} = \min(t,u)$. Let
$$C_s(t,u,n) = \frac{n!}{s!\,(t-s)!\,(u-s)!\,(n-t-u+s)!}\,,$$
and $C_s(t,u,n) = 0$ for $s < 0$, $t-s < 0$, $u-s < 0$, or $n-t-u+s < 0$.
Also, let
$$\theta_1(x,y) = P\{X \le x,\; Y \le y\}\,, \qquad \theta_2(x,y) = P\{X \le x,\; Y > y\}\,,$$
$$\theta_3(x,y) = P\{X > x,\; Y \le y\}\,, \qquad \theta_4(x,y) = P\{X > x,\; Y > y\}\,.$$
Then we can write
$$P\{X_{i:n} \le x,\; Y_{j:n} \le y\} = \sum_{t=i}^{n}\sum_{u=j}^{n}\sum_{s=a_{tu}}^{b_{tu}} C_s(t,u,n)\,\theta_1^{s}\,\theta_2^{t-s}\,\theta_3^{u-s}\,\theta_4^{n-t-u+s}. \tag{5.1}$$
Introducing further notation, let
$$\theta_{ix} = \frac{\partial}{\partial x}\,\theta_i\,, \qquad \theta_{iy} = \frac{\partial}{\partial y}\,\theta_i\,, \qquad i = 1, 2, 3, 4\,.$$
We differentiate the cdf (5.1) with respect to $x$. Using the relationships $\theta_{1x} + \theta_{3x} = 0$ and $\theta_{2x} + \theta_{4x} = 0$, and combining terms, gives
$$\frac{\partial}{\partial x}\,P\{X_{i:n} \le x,\; Y_{j:n} \le y\} = n\,\theta_{1x}\sum_{u=j}^{n}\;\sum_{s=0}^{b_{i-1,u-1}} C_s(i-1,\,u-1,\,n-1)\,\theta_1^{s}\,\theta_2^{i-s-1}\,\theta_3^{u-s-1}\,\theta_4^{n-i-u+s+1}$$
$$\qquad\qquad + n\,\theta_{2x}\sum_{u=j}^{n-1}\;\sum_{s=0}^{b_{i-1,u}} C_s(i-1,\,u,\,n-1)\,\theta_1^{s}\,\theta_2^{i-s-1}\,\theta_3^{u-s}\,\theta_4^{n-i-u+s}.$$
We differentiate the above expression with respect to $y$ and apply the relationships $\theta_{1y} + \theta_{2y} = 0$, $\theta_{3y} + \theta_{4y} = 0$, $\theta_{1xy} + \theta_{2xy} = 0$. Combining the resulting terms, we have the following expression for the joint pdf of $X_{i:n}$ and $Y_{j:n}$:
$$f_{X_{i:n}Y_{j:n}}(x,y) = n(n-1)\Bigl[\theta_{1x}\theta_{1y}\sum_{s=0}^{b_{i-2,j-2}} C_s(i-2,j-2,n-2)\,\theta_1^{s}\theta_2^{i-s-2}\theta_3^{j-s-2}\theta_4^{n-i-j+s+2}$$
$$\quad + \theta_{1x}\theta_{3y}\sum_{s=0}^{b_{i-2,j-1}} C_s(i-2,j-1,n-2)\,\theta_1^{s}\theta_2^{i-s-2}\theta_3^{j-s-1}\theta_4^{n-i-j+s+1}$$
$$\quad + \theta_{2x}\theta_{1y}\sum_{s=0}^{b_{i-1,j-2}} C_s(i-1,j-2,n-2)\,\theta_1^{s}\theta_2^{i-s-1}\theta_3^{j-s-2}\theta_4^{n-i-j+s+1}$$
$$\quad + \theta_{2x}\theta_{3y}\sum_{s=0}^{b_{i-1,j-1}} C_s(i-1,j-1,n-2)\,\theta_1^{s}\theta_2^{i-s-1}\theta_3^{j-s-1}\theta_4^{n-i-j+s}\Bigr]$$
$$\quad + n\,f(x,y)\sum_{s=0}^{b_{i-1,j-1}} C_s(i-1,j-1,n-1)\,\theta_1^{s}\theta_2^{i-s-1}\theta_3^{j-s-1}\theta_4^{n-i-j+s+1}. \tag{5.2}$$
(Note that we assume $\sum_{s=0}^{d}(\cdot) = 0$ for $d < 0$.) When $i = j$, (5.2) becomes
$$f_{X_{i:n}Y_{i:n}}(x,y) = n(n-1)\Bigl[f_X(x\,|\,Y \le y)F_Y(y)\Bigl\{f_Y(y\,|\,X \le x)F_X(x)\sum_{s=0}^{i-2} C_s(i-2,i-2,n-2)\,\theta_1^{s}(\theta_2\theta_3)^{i-s-2}\theta_4^{n-2i+s+2}$$
$$\quad + f_Y(y\,|\,X > x)(1-F_X(x))\sum_{s=0}^{i-2} C_s(i-2,i-1,n-2)\,\theta_1^{s}\theta_2^{i-s-2}\theta_3^{i-s-1}\theta_4^{n-2i+s+1}\Bigr\}$$
$$\quad + f_X(x\,|\,Y > y)(1-F_Y(y))\Bigl\{f_Y(y\,|\,X \le x)F_X(x)\sum_{s=0}^{i-2} C_s(i-1,i-2,n-2)\,\theta_1^{s}\theta_2^{i-s-1}\theta_3^{i-s-2}\theta_4^{n-2i+s+1}$$
$$\quad + f_Y(y\,|\,X > x)(1-F_X(x))\sum_{s=0}^{i-1} C_s(i-1,i-1,n-2)\,\theta_1^{s}(\theta_2\theta_3)^{i-s-1}\theta_4^{n-2i+s}\Bigr\}\Bigr]$$
$$\quad + n\,f(x,y)\sum_{s=0}^{i-1} C_s(i-1,i-1,n-1)\,\theta_1^{s}(\theta_2\theta_3)^{i-s-1}\theta_4^{n-2i+s+1}. \tag{5.3}$$
The following symmetry relations hold for (5.1) and (5.2).

Relation 5.1: If the joint pdf of $X$ and $Y$ is symmetric (i.e. $f(x,y) = f(y,x)$), then
$$f_{X_{i:n}Y_{j:n}}(x,y) = f_{X_{j:n}Y_{i:n}}(y,x)\,, \qquad i,j = 1, 2, \ldots, n\,.$$

Relation 5.2: If $f(x,y) = f(-x,-y)$, then
$$f_{X_{i:n}Y_{j:n}}(x,y) = f_{X_{(n+1-i):n}Y_{(n+1-j):n}}(-x,-y)\,, \qquad i,j = 1, 2, \ldots, n\,.$$

Proof: Let $X_i' = -X_i$ and $Y_i' = -Y_i$, $i = 1, 2, \ldots, n$. Then $X'_{i:n} = -X_{(n+1-i):n}$ and $Y'_{j:n} = -Y_{(n+1-j):n}$, and the relation follows.
Relation 5.3: If the joint distribution of $X$ and $Y$ is standardized bivariate normal with correlation coefficient $\rho$, then
$$f_{X_{i:n}Y_{j:n}}(x,y;\,\rho) = f_{X_{i:n}Y_{(n+1-j):n}}(x,-y;\,-\rho)\,, \qquad i,j = 1, 2, \ldots, n\,.$$

Proof: Let $Y_i' = -Y_i$, $i = 1, 2, \ldots, n$. The joint distribution of $X_i$ and $Y_i'$ is standardized bivariate normal with correlation coefficient $-\rho$. Then, since $Y'_{j:n} = -Y_{(n+1-j):n}$, the relation follows.
5.2. Preliminary Computations

Suppose the joint distribution of $X$ and $Y$ is standardized bivariate normal. Let $\phi(x)$ and $\Phi(x)$ denote the standard normal pdf and cdf. Also, let $\phi(x,y;\rho)$ and $\Phi(x,y;\rho)$ denote the standardized bivariate normal pdf and cdf, where $\rho$ is the correlation coefficient. Then
$$\theta_1(x,y) = \Phi(x,y;\rho)\,, \qquad \theta_2(x,y) = \Phi(x) - \Phi(x,y;\rho)\,,$$
$$\theta_3(x,y) = \Phi(y) - \Phi(x,y;\rho)\,, \qquad \theta_4(x,y) = 1 - \Phi(x) - \Phi(y) + \Phi(x,y;\rho)\,. \tag{5.4}$$
Additionally,
$$f_X(x\,|\,Y \le y)\,F_Y(y) = \phi(x)\,\Phi\Bigl[\frac{y-\rho x}{\sqrt{1-\rho^2}}\Bigr], \qquad f_X(x\,|\,Y > y)\,(1-F_Y(y)) = \phi(x)\,\Phi\Bigl[\frac{\rho x-y}{\sqrt{1-\rho^2}}\Bigr],$$
$$f_Y(y\,|\,X \le x)\,F_X(x) = \phi(y)\,\Phi\Bigl[\frac{x-\rho y}{\sqrt{1-\rho^2}}\Bigr], \qquad f_Y(y\,|\,X > x)\,(1-F_X(x)) = \phi(y)\,\Phi\Bigl[\frac{\rho y-x}{\sqrt{1-\rho^2}}\Bigr]. \tag{5.5}$$
For example, let $n = 4$. Substituting the expressions (5.5) into the form of the joint pdf of $X_{i:n}$ and $Y_{j:n}$ (5.3), we have
$$f_{X_{1:4}Y_{1:4}}(x,y) = 4\,\theta_4^2\Bigl[3\,\phi(x)\,\Phi\Bigl[\frac{\rho x-y}{\sqrt{1-\rho^2}}\Bigr]\phi(y)\,\Phi\Bigl[\frac{\rho y-x}{\sqrt{1-\rho^2}}\Bigr] + \phi(x,y;\rho)\,\theta_4\Bigr],$$
and
$$f_{X_{2:4}Y_{2:4}}(x,y) = 12\Bigl[\phi(x)\,\Phi\Bigl[\frac{y-\rho x}{\sqrt{1-\rho^2}}\Bigr]\Bigl\{\phi(y)\,\Phi\Bigl[\frac{x-\rho y}{\sqrt{1-\rho^2}}\Bigr]\theta_4^2 + 2\,\phi(y)\,\Phi\Bigl[\frac{\rho y-x}{\sqrt{1-\rho^2}}\Bigr]\theta_3\theta_4\Bigr\}$$
$$\quad + 2\,\phi(x)\,\Phi\Bigl[\frac{\rho x-y}{\sqrt{1-\rho^2}}\Bigr]\Bigl\{\phi(y)\,\Phi\Bigl[\frac{x-\rho y}{\sqrt{1-\rho^2}}\Bigr]\theta_2\theta_4 + \phi(y)\,\Phi\Bigl[\frac{\rho y-x}{\sqrt{1-\rho^2}}\Bigr]\bigl[\theta_2\theta_3 + \theta_1\theta_4\bigr]\Bigr\}$$
$$\quad + \phi(x,y;\rho)\,\theta_4\bigl[2\,\theta_2\theta_3 + \theta_1\theta_4\bigr]\Bigr],$$
where the $\theta_i$'s are given by (5.4). Expressions for $f_{X_{3:4}Y_{3:4}}(x,y)$ and $f_{X_{4:4}Y_{4:4}}(x,y)$ can be obtained directly from (5.3) or by applying Relation 5.2. We note that the expressions given in Section 5.1 perhaps appear somewhat formidable, but evaluations can be performed using the computer.
Suppose we have a broken random sample of size $n$ from a standardized bivariate normal distribution. Assume it is known that $\rho > 0$, and the sample is re-paired accordingly. Let us estimate $\rho$ from the re-paired sample with the estimator
$$\hat\rho = \sum_{i=1}^{n} X_{i:n}\,Y_{i:n}\big/ n\,.$$
We use the joint pdf of $X_{i:n}$ and $Y_{i:n}$ (5.3) and numerical integration procedures to calculate $E_\rho(X_{i:n}Y_{i:n})$ for $n = 2, 3, 4$ and $\rho = 0.1(0.2)0.9$. Table 5.1 displays $E_\rho\bigl(\sum_{i=1}^{n} X_{i:n}Y_{i:n}/n\bigr)$ for the indicated values of $n$ and $\rho$. In Table 5.2 is given $\mathrm{Corr}_\rho(X_{i:n},\, Y_{i:n})$. Values of the expectation and variance of $X_{i:n}$ necessary for these calculations are obtained from the tables of Tietjen, Kahaner, and Beckman (1977).
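The entries of Table 5.1 can be spot-checked by simulation. The sketch below is our own illustration, not the thesis's numerical integration: it draws broken samples from a standardized bivariate normal and re-pairs by matching ranks, the rule appropriate when $\rho > 0$.

```python
import math
import random

def mc_expected_repaired_r(n, rho, reps=200_000, seed=1):
    """Monte Carlo estimate of E_rho(sum_i X_{i:n} Y_{i:n} / n) for a
    broken sample of size n re-paired by sorting both coordinates."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(z1)
            ys.append(rho * z1 + c * z2)   # correlation rho with z1
        xs.sort()
        ys.sort()
        total += sum(x * y for x, y in zip(xs, ys)) / n
    return total / reps
```

With $n = 2$ and $\rho = 0$ the simulated mean should fall near $1/\pi \approx .3183$, the first entry of Table 5.1; with $\rho = 0.5$ it should fall near $.60248$.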
5.3. Joint Distribution of $X_{i:n}$, $X_{j:n}$, $Y_{k:n}$, and $Y_{\ell:n}$

Consider the occurrence of the event $\{X_{i:n} \le x_1,\; X_{j:n} \le x_2,\; Y_{k:n} \le y_1,\; Y_{\ell:n} \le y_2\}$, for $1 \le i < j \le n$, $1 \le k < \ell \le n$, $x_1 < x_2$, and $y_1 < y_2$, as displayed in the following 3x3 table:

                   Y <= y1    y1 < Y <= y2          y2 < Y              total
  X <= x1            p             q                t-p-q                 t
  x1 < X <= x2       r             s                v-r-s                 v
  x2 < X           u-p-r         w-q-s       n-t-u-v-w+p+q+r+s          n-t-v
  total              u             w                n-u-w                 n

where $t = i, i+1, \ldots, n$, $u = k, k+1, \ldots, n$, $v = a_{tj}, a_{tj}+1, \ldots, n-t$, and $w = a_{u\ell}, a_{u\ell}+1, \ldots, n-u$, for $a_{cd} = \max(0,\, d-c)$; $p = 0, 1, \ldots, b_{tu}$, $q = 0, 1, \ldots, b_{t-p,\,w}$, $r = 0, 1, \ldots, b_{u-p,\,v}$, and $s = 0, 1, \ldots, b_{w-q,\,v-r}$, for $b_{cd} = \min(c,d)$; and $p, q, r, s$ are chosen so that $(n-t-u-v-w+p+q+r+s) \ge 0$. Using the same approach as for $X_{i:n}$ and $Y_{j:n}$, we can find the joint distribution function for $X_{i:n}$, $X_{j:n}$, $Y_{k:n}$, and $Y_{\ell:n}$.
This expression and the corresponding pdf are quite cumbersome,
and we will not pursue the general case further here.
It would be of
interest to find an approximation to this distribution.
Let us now look at a special case of the above problem, the joint distribution of $X_{1:n}$, $X_{n:n}$, $Y_{1:n}$, and $Y_{n:n}$.
Let
$$p_1 = P\{X \le x_1,\; Y \le y_1\}\,, \qquad p_2 = P\{X \le x_1,\; y_1 < Y \le y_2\}\,,$$
$$p_3 = P\{x_1 < X \le x_2,\; Y \le y_1\}\,, \qquad p_4 = P\{x_1 < X \le x_2,\; y_1 < Y \le y_2\}\,.$$
Then we can write
$$P\{X_{1:n} \le x_1,\; X_{n:n} \le x_2,\; Y_{1:n} \le y_1,\; Y_{n:n} \le y_2\} = \sum_{t=1}^{n}\sum_{u=1}^{n}\sum_{s=a_{tu}}^{b_{tu}} C_s(t,u,n)\, p_1^{s}\, p_2^{t-s}\, p_3^{u-s}\, p_4^{n-t-u+s}. \tag{5.6}$$
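The triple sum (5.6) can be checked mechanically: the event says that all observations fall in the box $(-\infty,x_2]\times(-\infty,y_2]$, with at least one $X \le x_1$ and at least one $Y \le y_1$, so inclusion-exclusion gives a closed form. The comparison below is our own verification sketch; the cell probabilities are arbitrary illustrative values.

```python
from math import factorial

def C(s, t, u, n):
    # multinomial cell count; 0 if any cell count would be negative
    cells = (s, t - s, u - s, n - t - u + s)
    if any(a < 0 for a in cells):
        return 0
    denom = 1
    for a in cells:
        denom *= factorial(a)
    return factorial(n) // denom

def cdf_triple_sum(p1, p2, p3, p4, n):
    """Right-hand side of (5.6)."""
    return sum(C(s, t, u, n) * p1**s * p2**(t - s) * p3**(u - s)
               * p4**(n - t - u + s)
               for t in range(1, n + 1)
               for u in range(1, n + 1)
               for s in range(max(0, t + u - n), min(t, u) + 1))

def cdf_incl_excl(p1, p2, p3, p4, n):
    """Inclusion-exclusion on the events 'no X <= x1' and 'no Y <= y1'."""
    return ((p1 + p2 + p3 + p4)**n - (p3 + p4)**n
            - (p2 + p4)**n + p4**n)
```

The two evaluations agree to machine precision for any cell probabilities, which confirms the index ranges in (5.6).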
Let $p_{ix_j} = \dfrac{\partial}{\partial x_j}\,p_i$ and $p_{iy_j} = \dfrac{\partial}{\partial y_j}\,p_i$, $i = 1, 2, 3, 4$, $j = 1, 2$. Differentiating the cdf (5.6) with respect to $x_1$ and simplifying with the relationships $p_{1x_1} + p_{3x_1} = 0$ and $p_{2x_1} + p_{4x_1} = 0$, we have
$$\frac{\partial}{\partial x_1}\Bigl[P\{X_{1:n} \le x_1,\; X_{n:n} \le x_2,\; Y_{1:n} \le y_1,\; Y_{n:n} \le y_2\}\Bigr]$$
$$\quad = n\,p_{1x_1}\sum_{u=1}^{n}\binom{n-1}{u-1}\, p_3^{u-1}\, p_4^{n-u} + n\,p_{2x_1}\sum_{u=1}^{n-1}\binom{n-1}{u}\, p_3^{u}\, p_4^{n-u-1}.$$
Continuing to differentiate, we obtain the following expression for the joint pdf:
$$f_{X_{1:n}X_{n:n}Y_{1:n}Y_{n:n}}(x_1, x_2, y_1, y_2) = n(n-1)\,p_4^{n-4}\Bigl\{\bigl[f(x_1,y_1)f(x_2,y_2) + f(x_1,y_2)f(x_2,y_1)\bigr]p_4^2$$
$$\quad + (n-2)\bigl[f(x_1,y_1)\,p_{4x_2}p_{4y_2}p_4 + f(x_1,y_2)\,p_{4x_2}p_{3y_1}p_4 + f(x_2,y_1)\,p_{2x_1}p_{4y_2}p_4 + f(x_2,y_2)\,p_{2x_1}p_{3y_1}p_4$$
$$\qquad + (n-3)\,p_{2x_1}p_{4x_2}p_{3y_1}p_{4y_2}\bigr]\Bigr\}, \tag{5.7}$$
where
$$p_{2x_1} = f_X(x_1\,|\,y_1 < Y \le y_2)\,P(y_1 < Y \le y_2)\,, \qquad p_{4x_2} = f_X(x_2\,|\,y_1 < Y \le y_2)\,P(y_1 < Y \le y_2)\,,$$
$$p_{3y_1} = f_Y(y_1\,|\,x_1 < X \le x_2)\,P(x_1 < X \le x_2)\,, \qquad p_{4y_2} = f_Y(y_2\,|\,x_1 < X \le x_2)\,P(x_1 < X \le x_2)\,.$$
Table 5.1

$E_\rho\bigl(\sum_{i=1}^{n} X_{i:n} Y_{i:n}/n\bigr)$

   rho      n=2        n=3        n=4
   0.0    .31831     .47746     .57392
   0.1    .36722     .50858     .59593
   0.3    .47828     .58447     .65345
   0.5    .60248     .67291     .72166
   0.7    .74167     .77800     .80531
   0.9    .90041     .90842     .91523
   1.0   1.00000    1.00000    1.00000
Table 5.2

$\mathrm{Corr}_\rho(X_{i:n},\, Y_{i:n}) = \mathrm{Corr}_\rho(X_{(n+1-i):n},\, Y_{(n+1-i):n})$

            n=2              n=3                    n=4
   rho      i=1        i=1        i=2        i=1        i=2
   0.0      .0         .0         .0         .0         .0
   0.1    .07175     .05357     .07446     .03834     .06986
   0.3    .23467     .19566     .22755     .16601     .21484
   0.5    .41686     .36597     .39410     .32694     .37378
   0.7    .62105     .57035     .58714     .52984     .56112
   0.9    .85391     .82181     .83208     .79366     .81113
   1.0   1.00000    1.00000    1.00000    1.00000    1.00000
CHAPTER VI
RECONSTRUCTION OF A TWICE BROKEN SAMPLE

6.1. Trivariate Normal Case

In this chapter we discuss the situation in which the broken random sample to be reconstructed is formed when each observation is separated, not into two components, but into three components. Suppose the random vector $T' = (T_1, T_2, T_3)$ has a standardized trivariate normal distribution. Prior to the recording of $n$ complete observations in a random sample from this distribution, each three-dimensional observation vector is broken into its three components. Denote the ordered $T_1$'s, $T_2$'s, and $T_3$'s by
$$x_1 < x_2 < \cdots < x_n\,, \qquad y_1 < y_2 < \cdots < y_n\,, \qquad z_1 < z_2 < \cdots < z_n\,. \tag{6.1}$$
We wish to reconstruct the original $n$ vectors, and will choose permutations $\phi$ and $\pi$ of the $n$ integers $(1, 2, \ldots, n)$, pairing $y_{\phi(i)}$ and $z_{\pi(i)}$ with $x_i$, $i = 1, 2, \ldots, n$. Let the correlation matrix of $T$ be
$$R = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix}.$$
If the true pairing of the sample were described by $(\phi,\pi)$, then the joint pdf would be
$$L(\phi,\pi) = \prod_{i=1}^{n} (2\pi)^{-3/2}\,\Delta^{-1/2}\exp\Bigl\{-\tfrac{1}{2}\Delta^{-1}\bigl[(1-\rho_{23}^2)x_i^2 + (1-\rho_{13}^2)y_{\phi(i)}^2 + (1-\rho_{12}^2)z_{\pi(i)}^2$$
$$\qquad + 2(\rho_{13}\rho_{23}-\rho_{12})x_i\, y_{\phi(i)} + 2(\rho_{12}\rho_{23}-\rho_{13})x_i\, z_{\pi(i)} + 2(\rho_{12}\rho_{13}-\rho_{23})y_{\phi(i)}\, z_{\pi(i)}\bigr]\Bigr\},$$
where $\Delta = |R| = 1 - \rho_{23}^2 - \rho_{13}^2 - \rho_{12}^2 + 2\rho_{12}\rho_{13}\rho_{23}$. Omitting terms not involving $\phi$ or $\pi$, choosing $(\phi,\pi)$ to maximize $L(\phi,\pi)$ is equivalent to choosing $(\phi,\pi)$ to maximize $F(\phi,\pi)$, where
$$F(\phi,\pi) = \sum_{i=1}^{n}\bigl[(\rho_{12} - \rho_{13}\rho_{23})x_i\, y_{\phi(i)} + (\rho_{13} - \rho_{12}\rho_{23})x_i\, z_{\pi(i)} + (\rho_{23} - \rho_{12}\rho_{13})y_{\phi(i)}\, z_{\pi(i)}\bigr]. \tag{6.2}$$
Let $\rho_{ij\cdot k}$ denote the partial correlation between $T_i$ and $T_j$ given $T_k$. Also, let
$$\theta_1 = |\rho_{12} - \rho_{13}\rho_{23}|\,, \qquad \theta_2 = |\rho_{13} - \rho_{12}\rho_{23}|\,, \qquad \theta_3 = |\rho_{23} - \rho_{12}\rho_{13}|\,.$$
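For concreteness, these quantities are immediate to compute from the correlations. The sketch below is our own helper (the function name is ours); it reproduces the values used in the worked example of Section 6.2.

```python
import math

def thetas(r12, r13, r23):
    """Return (theta_1, theta_2, theta_3) and the three partial
    correlations for a standardized trivariate normal."""
    t1 = abs(r12 - r13 * r23)
    t2 = abs(r13 - r12 * r23)
    t3 = abs(r23 - r12 * r13)
    p12_3 = (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))
    p13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))
    p23_1 = (r23 - r12 * r13) / math.sqrt((1 - r12**2) * (1 - r13**2))
    return (t1, t2, t3), (p12_3, p13_2, p23_1)
```

Note that each $\theta_i$ is the absolute value of the numerator of the corresponding partial correlation, so the sign pattern of the cases below is exactly the sign pattern of the partial correlations.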
We now consider the maximization of $F(\phi,\pi)$.

Case (1). $\rho_{12\cdot3} > 0$, $\rho_{13\cdot2} > 0$, $\rho_{23\cdot1} > 0$. For this situation, we write $F(\phi,\pi)$ as
$$F(\phi,\pi) = \sum_{i=1}^{n}\bigl[\theta_1 x_i\, y_{\phi(i)} + \theta_2 x_i\, z_{\pi(i)} + \theta_3 y_{\phi(i)}\, z_{\pi(i)}\bigr].$$
Because $\theta_1$, $\theta_2$, and $\theta_3$ are each positive, the permutations which maximize $F(\phi,\pi)$ are $\phi^* = (1, 2, \ldots, n)$ and $\pi^* = (1, 2, \ldots, n)$. In other words, we assign $y_i$ and $z_i$ to $x_i$, $i = 1, 2, \ldots, n$, where the observations are indexed according to (6.1).

Case (2). $\rho_{12\cdot3} > 0$, $\rho_{13\cdot2} < 0$, $\rho_{23\cdot1} < 0$. Here
$$F(\phi,\pi) = \sum_{i=1}^{n}\bigl[\theta_1 x_i\, y_{\phi(i)} - \theta_2 x_i\, z_{\pi(i)} - \theta_3 y_{\phi(i)}\, z_{\pi(i)}\bigr].$$
Choosing $(\phi,\pi) = (\phi^*, \tilde\pi^*)$, where $\tilde\pi^* = (n, n-1, \ldots, 2, 1)$, maximizes $\sum_{i=1}^{n} x_i y_{\phi(i)}$, minimizes $\sum_{i=1}^{n} x_i z_{\pi(i)}$, minimizes $\sum_{i=1}^{n} y_{\phi(i)} z_{\pi(i)}$, and thus maximizes $F(\phi,\pi)$.

Case (3). $\rho_{12\cdot3} < 0$, $\rho_{13\cdot2} < 0$, $\rho_{23\cdot1} < 0$. Here
$$F(\phi,\pi) = -\sum_{i=1}^{n}\bigl[\theta_1 x_i\, y_{\phi(i)} + \theta_2 x_i\, z_{\pi(i)} + \theta_3 y_{\phi(i)}\, z_{\pi(i)}\bigr].$$
There is no simple solution here, as there was in the previous two cases. We wish to choose $(\phi,\pi)$ to minimize $-F(\phi,\pi)$. This problem is equivalent to selecting $w_{ijk}$'s to minimize $G_1$, where
$$G_1 = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} w_{ijk}\, c_{ijk}\,, \tag{6.3}$$
$$c_{ijk} = \theta_1 x_i y_j + \theta_2 x_i z_k + \theta_3 y_j z_k\,, \qquad i,j,k = 1, 2, \ldots, n\,, \tag{6.4}$$
and the $w_{ijk}$'s are subject to the following restrictions:
$$w_{ijk} = 0 \text{ or } 1\,, \qquad i,j,k = 1, 2, \ldots, n\,, \tag{6.5}$$
$$\sum_{j=1}^{n}\sum_{k=1}^{n} w_{ijk} = 1\,, \quad i = 1, 2, \ldots, n\,; \qquad \sum_{i=1}^{n}\sum_{k=1}^{n} w_{ijk} = 1\,, \quad j = 1, 2, \ldots, n\,; \qquad \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ijk} = 1\,, \quad k = 1, 2, \ldots, n\,. \tag{6.6}$$
Setting $w_{ijk} = 1$ specifies $\phi(i) = j$ and $\pi(i) = k$. The $3n$ restrictions (6.6), which taken together are actually $(3n-2)$ independent restrictions, mean that each individual component may be assigned to only one reconstructed vector.
Case (4). $\rho_{12\cdot3} > 0$, $\rho_{13\cdot2} > 0$, $\rho_{23\cdot1} < 0$. Here
$$F(\phi,\pi) = \sum_{i=1}^{n}\bigl[\theta_1 x_i\, y_{\phi(i)} + \theta_2 x_i\, z_{\pi(i)} - \theta_3 y_{\phi(i)}\, z_{\pi(i)}\bigr].$$
As for Case (3), there is no simple maximum likelihood solution. We wish to choose $(\phi,\pi)$ to minimize $-F(\phi,\pi)$. We can view this as the problem of selecting $w_{ijk}$'s so that $G_2$ is minimized, where
$$G_2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} w_{ijk}\, d_{ijk}\,, \tag{6.7}$$
$$d_{ijk} = -\theta_1 x_i y_j - \theta_2 x_i z_k + \theta_3 y_j z_k\,, \qquad i,j,k = 1, 2, \ldots, n\,, \tag{6.8}$$
and the $w_{ijk}$'s are subject to restrictions (6.5) and (6.6).
Case (5). $\rho_{23\cdot1} = 0$, i.e. $\rho_{23} = \rho_{12}\rho_{13}$. Suppose a given situation is described by either Case (3) or Case (4). If one $\theta_i$ is small relative to the others, then the maximum likelihood problem is close to Case (5), and the similarity increases as the relative magnitude of that $\theta_i$ decreases. For Case (5),
$$F(\phi,\pi) = \sum_{i=1}^{n}\bigl[(\rho_{12} - \rho_{13}\rho_{23})x_i\, y_{\phi(i)} + (\rho_{13} - \rho_{12}\rho_{23})x_i\, z_{\pi(i)}\bigr].$$
Since we have assumed that $\rho_{23} = \rho_{12}\rho_{13}$,
$$F(\phi,\pi) = \sum_{i=1}^{n}\bigl[\rho_{12}(1-\rho_{13}^2)x_i\, y_{\phi(i)} + \rho_{13}(1-\rho_{12}^2)x_i\, z_{\pi(i)}\bigr].$$
Note that $\mathrm{sgn}[\rho_{12}(1-\rho_{13}^2)] = \mathrm{sgn}(\rho_{12})$ and $\mathrm{sgn}[\rho_{13}(1-\rho_{12}^2)] = \mathrm{sgn}(\rho_{13})$. If $\rho_{23} > 0$, and thus $\mathrm{sgn}(\rho_{12}) = \mathrm{sgn}(\rho_{13})$, then the choice of $\phi$ and the choice of $\pi$ are the same: for $\rho_{12} > 0$ and $\rho_{13} > 0$, we choose $(\phi^*, \pi^*)$ to maximize $F(\phi,\pi)$; for $\rho_{12} < 0$ and $\rho_{13} < 0$, we choose $(\tilde\phi^*, \tilde\pi^*)$, where $\tilde\phi^* = \tilde\pi^* = (n, n-1, \ldots, 2, 1)$. If $\rho_{23} < 0$, and so $\mathrm{sgn}(\rho_{12}) \ne \mathrm{sgn}(\rho_{13})$, we let $\phi(n+1-i) = \pi(i)$, $i = 1, 2, \ldots, n$: for $\rho_{12} > 0$ and $\rho_{13} < 0$, we choose $(\phi^*, \tilde\pi^*)$; for $\rho_{12} < 0$ and $\rho_{13} > 0$, we choose $(\tilde\phi^*, \pi^*)$.
6.2. A Three-Dimensional Assignment Problem

We now consider the solution of the three-dimensional matching problem which arises in Case (3) and Case (4) of reconstructing a twice broken sample from the standardized trivariate normal distribution. First, recall the standard linear assignment problem with cost matrix $(p_{ij})$, mentioned in Chapter I. The objective of this problem is to choose $a_{ij}$'s, subject to restrictions (6.10) and (6.11), in such a way that the total cost $g_1$ is minimized, where
$$g_1 = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\, p_{ij}\,, \tag{6.9}$$
$$\sum_{j=1}^{n} a_{ij} = 1\,, \quad i = 1, 2, \ldots, n\,; \qquad \sum_{i=1}^{n} a_{ij} = 1\,, \quad j = 1, 2, \ldots, n\,, \tag{6.10}$$
and
$$a_{ij} = 0 \text{ or } 1\,, \qquad i,j = 1, 2, \ldots, n\,. \tag{6.11}$$
When condition (6.11) is replaced by the restriction
$$a_{ij} \ge 0\,, \qquad i,j = 1, 2, \ldots, n\,, \tag{6.12}$$
we have a linear programming problem and so can apply linear programming methods. It is known that an optimal solution (i.e. a solution for which total cost is minimized) to the standard linear assignment problem (6.9), (6.10), and (6.11) is the same as an optimal solution to the linear programming problem (6.9), (6.10), and (6.12). (See Dantzig (1963).)
For our three-dimensional matching problem, such a result does not hold. Consider replacing condition (6.5) by the restriction that the decision variables be non-negative,
$$w_{ijk} \ge 0\,, \qquad i,j,k = 1, 2, \ldots, n\,. \tag{6.13}$$
The problem (6.3), (6.5), and (6.6) is not necessarily equivalent to the linear programming problem given by (6.3), (6.6), and (6.13). For example, suppose $n = 4$. Let a suitably chosen set of the $c_{ijk}$ take small values and all other $c_{ijk} = 100$. The choice of $w_{ijk}$'s which minimizes (6.3), subject to the restrictions (6.6) and (6.13), can then place non-integer weight on several triples, with all other $w_{ijk} = 0$. The resulting total cost is 8. The non-zero decision variables are not integers, and there does not exist an integer solution satisfying (6.6) which gives a total cost less than or equal to 8. Nevertheless, we conjecture that quite often an optimal solution to a problem of the form (6.3), (6.6), and (6.13) will in fact be integer-valued, when the $c_{ijk}$'s are those which arise in efforts to reconstruct a twice broken sample. In such instances, linear programming methods will suffice. It would be of interest to find conditions on the three-dimensional cost array under which an optimal solution is integer-valued, and investigate the probability of occurrence of these conditions.
Pierskalla (1968) considers the multidimensional assignment problem and proposes for its solution an algorithm based on a tree search technique of the branch-and-bound variety. The three-dimensional assignment problem is to select $w_{ijk}$'s to minimize
$$g_2 = \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{r} w_{ijk}\, c_{ijk} \tag{6.14}$$
subject to
$$\sum_{j=1}^{q}\sum_{k=1}^{r} w_{ijk} = 1\,, \quad i = 1, 2, \ldots, p\,; \qquad \sum_{i=1}^{p}\sum_{k=1}^{r} w_{ijk} \le 1\,, \quad j = 1, 2, \ldots, q\,; \qquad \sum_{i=1}^{p}\sum_{j=1}^{q} w_{ijk} \le 1\,, \quad k = 1, 2, \ldots, r\,, \tag{6.15}$$
$$w_{ijk} = 0 \text{ or } 1\,, \qquad \text{all } i, j, k\,, \tag{6.16}$$
where $p \le q \le r$ and the $c_{ijk}$'s are costs of assignment. The resulting special case when $p = q = r = n$, and thus all constraints of (6.15) are equality constraints, has the same form as the three-dimensional matching problem (6.3), (6.5), and (6.6). Therefore, we may apply Pierskalla's algorithm to this problem.

We also present two algorithms which will give feasible solutions (i.e. solutions which satisfy the restrictions (6.5) and (6.6)) but not necessarily optimal solutions for problems of the form (6.3), (6.5), and (6.6), although the algorithms do aim to produce low-cost solutions. These algorithms, which are easily calculated, are generalizations of algorithms used for two-dimensional problems. Method I generalizes the Minimum Entry algorithm mentioned in Chapter I, and Method II the Vogel Advanced Start Method. (See, e.g., Gaver and Thompson (1973).)
We illustrate these methods using a random sample of size 3 generated from a standardized trivariate normal distribution with correlation matrix
$$R = \begin{pmatrix} 1.0 & -0.6 & -0.4 \\ -0.6 & 1.0 & -0.1 \\ -0.4 & -0.1 & 1.0 \end{pmatrix}.$$
For this distribution, $\rho_{12\cdot3} = -.701815$, $\rho_{13\cdot2} = -.577897$, and $\rho_{23\cdot1} = -.463713$. Also, $\theta_1 = .64$, $\theta_2 = .46$, and $\theta_3 = .34$. The observations in the random sample, prior to breaking, are
$$(1.0996,\; -.9723,\; -.8493)\,, \qquad (-.0114,\; -1.0815,\; 1.6381)\,, \qquad\text{and}\qquad (-1.1085,\; 1.4289,\; 1.1716)\,.$$
Ordering according to (6.1), we have

    i        1          2          3
   x_i   -1.1085     -.0114     1.0996
   y_i   -1.0815     -.9723     1.4289
   z_i    -.8493     1.1716     1.6381

The cost array, where $c_{ijk}$ is as defined by (6.4), is given in Table 6.1. Note that the correct matching is specified by $w_{132} = 1$, $w_{213} = 1$, $w_{321} = 1$, and all other $w_{ijk} = 0$.
Method I

(1) Select the smallest entry in the cost array, $c_{i_1 j_1 k_1}$. Let $w_{i_1 j_1 k_1} = 1$, and set all other decision variables in row $i_1$, column $j_1$, and file $k_1$ equal to 0. Delete row $i_1$, column $j_1$, and file $k_1$ from the cost array.

(2) Repeat step (1) with the remaining elements of the cost array. Continue this procedure until all $w_{ijk}$'s have been assigned values.

Applying this method to the cost array displayed in Table 6.1, we see that $c_{i_1 j_1 k_1} = c_{133} = -1.0532$. Let $w_{133} = 1$ and other $w_{ijk} = 0$ whenever $i = 1$, $j = 3$, or $k = 3$. Delete row 1, column 3, and file 3 from the cost array. The smallest of the remaining $c_{ijk}$'s is $c_{311} = -0.8784$. Let $w_{311} = 1$ and other $w_{ijk} = 0$ for $i = 3$, $j = 1$, or $k = 1$. Finally, set $w_{222} = 1$. We note that Method I has correctly matched $y_3$ with $x_1$ and $z_1$ with $x_3$. The other matches are not correct. The resulting total cost of this assignment is $-2.318$.
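A compact implementation of Method I is sketched below; this is our own illustration, with `cost[i][j][k]` indexed from zero rather than one.

```python
def method_one(cost):
    """Minimum-entry (Method I) sketch for a cubic cost array cost[i][j][k]:
    repeatedly pick the smallest remaining entry, then delete its row,
    column, and file.  Returns the chosen triples and their total cost."""
    n = len(cost)
    rows, cols, files = set(range(n)), set(range(n)), set(range(n))
    picks = []
    while rows:
        i, j, k = min(((i, j, k) for i in rows for j in cols for k in files),
                      key=lambda t: cost[t[0]][t[1]][t[2]])
        picks.append((i, j, k))
        rows.remove(i)
        cols.remove(j)
        files.remove(k)
    return picks, sum(cost[i][j][k] for i, j, k in picks)
```

Run on the Table 6.1 array, it reproduces the picks $w_{133}$, $w_{311}$, $w_{222}$ and the total cost $-2.318$ described above.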
Method II

(1) For each row $i$ in the cost array, compute the difference between the two smallest entries, $c_{i j_2 k_2} - c_{i j_1 k_1}$. Do the same for each column and each file.

(2) Go to the row, column, or file corresponding to the largest of these differences and select the smallest entry in it, say $c_{i_1 j_1 k_1}$. Let $w_{i_1 j_1 k_1} = 1$, and set all other decision variables in row $i_1$, column $j_1$, and file $k_1$ equal to 0. Delete row $i_1$, column $j_1$, and file $k_1$ from the cost array.

(3) Repeat steps (1) and (2) with the remaining elements of the cost array. Continue this procedure until all decision variables are set equal to 0 or 1.
Table 6.1
Cost Array

  k = 1
  i \ j       1          2          3
    1       1.5126     1.4037     -.9933
    2        .3246      .2923     -.4186
    3       -.8784     -.8331      .1634

  k = 2
  i \ j       1          2          3
    1       -.2610     -.2950    -1.0419
    2       -.4291     -.3864      .5526
    3       -.5993     -.4790     2.1675

  k = 3
  i \ j       1          2          3
    1       -.6704     -.6870    -1.0532
    2       -.6030     -.5430      .7768
    3       -.5348     -.3972     2.6301
Application of Method II to the cost array given in Table 6.1 yields $w_{132} = 1$, $w_{321} = 1$, $w_{213} = 1$, and all other $w_{ijk} = 0$. This assignment gives a correctly reconstructed sample. The total cost of this assignment is $-2.478$.

Additionally, we consider the linear programming problem (6.3), (6.6), and (6.13) for this sample. There are $n^3 = 27$ decision variables and $3n - 2 = 7$ equality constraints. Using a simplex algorithm program by Neebe (1978), the optimal solution was found to be $w_{132} = 1$, $w_{321} = 1$, $w_{213} = 1$, and all other $w_{ijk} = 0$. This optimal solution, which we note is integer-valued, specifies the correct matching.
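Because $n = 3$ gives only $(3!)^2 = 36$ candidate pairs of permutations, the optimality of this assignment can also be confirmed exhaustively; the check below is our own, not part of the thesis.

```python
from itertools import permutations

def best_assignment(cost):
    """Exhaustive solution of the small three-dimensional assignment
    problem: score sum_i cost[i][phi(i)][pi(i)] over all permutation
    pairs (phi, pi) and return the minimizer (zero-based indices)."""
    n = len(cost)
    best, best_val = None, float("inf")
    for phi in permutations(range(n)):
        for pi in permutations(range(n)):
            v = sum(cost[i][phi[i]][pi[i]] for i in range(n))
            if v < best_val:
                best, best_val = (phi, pi), v
    return best, best_val
```

On the Table 6.1 array the minimizer corresponds to $w_{132}$, $w_{213}$, $w_{321}$ with total cost $-2.478$, agreeing with both Method II and the linear programming solution.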
6.3. Multivariate Normal Case

Suppose the $p$-dimensional random vector $T' = (T_1', T_2', T_3')$ has a standardized multivariate normal distribution. A random sample of size $n$ is taken from this distribution. Before the observations are recorded, each observation vector is broken into three components, of dimensions $p_1$, $p_2$, and $p_3$, where $p_1 + p_2 + p_3 = p$. Denote the $T_1$'s, $T_2$'s, and $T_3$'s, each observed in some random order, by
$$x_1, x_2, \ldots, x_n\,, \qquad y_1, y_2, \ldots, y_n\,, \qquad\text{and}\qquad z_1, z_2, \ldots, z_n\,.$$
We will select permutations $\phi$ and $\pi$ of the $n$ integers $(1, 2, \ldots, n)$, matching $y_{\phi(i)}$ and $z_{\pi(i)}$ with $x_i$, $i = 1, 2, \ldots, n$, in an attempt to reconstruct the original $n$ observations.

Let the correlation matrix of $T$ be partitioned according to
$$R = \begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix},$$
where $R_{ij}$ is $p_i \times p_j$, $i,j = 1, 2, 3$. Also, let
$$R^{-1} = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix}.$$
If the pairing of the original sample were described by $(\phi,\pi)$, then the joint pdf would be
$$L(\phi,\pi) = \prod_{i=1}^{n} (2\pi)^{-p/2}\, |R|^{-1/2}\exp\bigl\{-\tfrac{1}{2}\,(x_i',\, y_{\phi(i)}',\, z_{\pi(i)}')\, R^{-1}\, (x_i',\, y_{\phi(i)}',\, z_{\pi(i)}')'\bigr\}.$$
Omitting terms not involving $\phi$ or $\pi$,
$$L(\phi,\pi) = \exp\Bigl\{-\sum_{i=1}^{n}\bigl(x_i' C_{12}\, y_{\phi(i)} + x_i' C_{13}\, z_{\pi(i)} + y_{\phi(i)}' C_{23}\, z_{\pi(i)}\bigr)\Bigr\}.$$
Let us choose $(\phi,\pi)$ to maximize the likelihood function $L(\phi,\pi)$, or equivalently to minimize $A(\phi,\pi)$, where
$$A(\phi,\pi) = \sum_{i=1}^{n}\bigl(x_i' C_{12}\, y_{\phi(i)} + x_i' C_{13}\, z_{\pi(i)} + y_{\phi(i)}' C_{23}\, z_{\pi(i)}\bigr).$$
This problem is identical to that of selecting $w_{ijk}$'s to minimize $G_3$, where
$$G_3 = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} w_{ijk}\, h_{ijk}\,, \qquad h_{ijk} = x_i' C_{12}\, y_j + x_i' C_{13}\, z_k + y_j' C_{23}\, z_k\,, \qquad i,j,k = 1, 2, \ldots, n\,,$$
and the $w_{ijk}$'s are subject to conditions (6.5) and (6.6). The form is that of a three-dimensional assignment problem, as discussed in Section 6.2.
CHAPTER VII
MATCHING UNDER OTHER MODELS

7.1. Experimental Designs

In this chapter we briefly introduce matching problems in the context of experimental designs and regression. First, let us mention the work of Bose and Mahalanobis (1938), motivated by the following situation. In an agricultural experiment in India, the yields from individual plots were stored side by side in labelled bags. Accidentally, two of the bags were damaged, and the contents of those two bags were mixed up. The combined yield contained in the two damaged bags could be measured, but the individual yields of the two plots were not known. Bose and Mahalanobis proposed a method for reconstruction of the individual yields, utilizing the known combined yield and minimizing the residual sum of squares. We approach the matching problem which arises when all individual responses are available, but some of them have become unlabelled.
Consider the usual model for a randomized complete block design with both block and treatment effects fixed. There are $n$ blocks and $r$ treatments. Letting $X_{ij}$ denote the response for the $i$th block and $j$th treatment,
$$X_{ij} = \mu + \rho_i + \tau_j + \varepsilon_{ij}\,, \qquad i = 1, 2, \ldots, n\,, \quad j = 1, 2, \ldots, r\,,$$
where $\mu$ is a constant, the $\rho_i$'s and $\tau_j$'s are constants subject to the restrictions $\sum_{i=1}^{n}\rho_i = 0$ and $\sum_{j=1}^{r}\tau_j = 0$, and the $\varepsilon_{ij}$'s are independent $N(0,\sigma^2)$. Let
$$X_{i\cdot} = \sum_{j=1}^{r} X_{ij}\,, \quad \bar X_{i\cdot} = X_{i\cdot}/r\,, \qquad X_{\cdot j} = \sum_{i=1}^{n} X_{ij}\,, \quad \bar X_{\cdot j} = X_{\cdot j}/n\,, \qquad X_{\cdot\cdot} = \sum_{i=1}^{n}\sum_{j=1}^{r} X_{ij}\,, \quad \bar X_{\cdot\cdot} = X_{\cdot\cdot}/nr\,.$$
Suppose that all $nr$ responses are known and that, with the exception of two responses, each is identified with its block-treatment combination. Denote the un-matched responses by $x_1 < x_2$. It is not known whether $X_{mp} = x_1$ and $X_{qs} = x_2$, or $X_{mp} = x_2$ and $X_{qs} = x_1$, for the appropriate $m, p, q, s$ where $1 \le m, q \le n$, $1 \le p, s \le r$, $m \ne q$, and $p \ne s$. Let us assign $x_1$ and $x_2$ in that way which minimizes the residual sum of squares,
$$\mathrm{SSE} = \sum_{i=1}^{n}\sum_{j=1}^{r}\{X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X_{\cdot\cdot}\}^2 = \sum_{i=1}^{n}\sum_{j=1}^{r} X_{ij}^2 - \sum_{i=1}^{n} X_{i\cdot}^2/r - \sum_{j=1}^{r} X_{\cdot j}^2/n + X_{\cdot\cdot}^2/nr\,. \tag{7.1}$$
Introducing further notation, let
$$X'_{m\cdot} = \sum_{j\ne p} X_{mj}\,, \qquad X'_{q\cdot} = \sum_{j\ne s} X_{qj}\,, \qquad X'_{\cdot p} = \sum_{i\ne m} X_{ip}\,, \qquad X'_{\cdot s} = \sum_{i\ne q} X_{is}\,.$$
We re-write SSE as
$$\mathrm{SSE} = X_{mp}^2 + X_{qs}^2 - \{(X'_{m\cdot} + X_{mp})^2 + (X'_{q\cdot} + X_{qs})^2\}/r - \{(X'_{\cdot p} + X_{mp})^2 + (X'_{\cdot s} + X_{qs})^2\}/n + X_{\cdot\cdot}^2/nr$$
$$\qquad + (\text{terms not involving } X_{mp} \text{ or } X_{qs})$$
$$= X_{mp}^2 + X_{qs}^2 - \{(2X_{mp}X'_{m\cdot} + X_{mp}^2) + (2X_{qs}X'_{q\cdot} + X_{qs}^2)\}/r - \{(2X_{mp}X'_{\cdot p} + X_{mp}^2) + (2X_{qs}X'_{\cdot s} + X_{qs}^2)\}/n$$
$$\qquad + X_{\cdot\cdot}^2/nr + (\text{terms not involving } X_{mp} \text{ or } X_{qs})\,.$$
Since $(X_{mp} + X_{qs}) = (x_1 + x_2)$ and $(X_{mp}^2 + X_{qs}^2) = (x_1^2 + x_2^2)$ for either assignment of the un-matched responses, the assignment which minimizes SSE is that which minimizes
$$A = -\{X_{mp}X'_{m\cdot} + X_{qs}X'_{q\cdot}\}/r - \{X_{mp}X'_{\cdot p} + X_{qs}X'_{\cdot s}\}/n\,,$$
or equivalently that which maximizes
$$B = X_{mp}\{X'_{m\cdot}/r + X'_{\cdot p}/n\} + X_{qs}\{X'_{q\cdot}/r + X'_{\cdot s}/n\}\,.$$
Assuming that $x_1 < x_2$, the following assignment minimizes SSE:
$$X_{mp} = x_1 \text{ and } X_{qs} = x_2 \quad\text{if } \{X'_{m\cdot}/r + X'_{\cdot p}/n\} < \{X'_{q\cdot}/r + X'_{\cdot s}/n\}\,,$$
$$X_{mp} = x_2 \text{ and } X_{qs} = x_1 \quad\text{if } \{X'_{q\cdot}/r + X'_{\cdot s}/n\} < \{X'_{m\cdot}/r + X'_{\cdot p}/n\}\,. \tag{7.2}$$
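Rule (7.2) can be sanity-checked against direct minimization of (7.1). The sketch below is our own illustration on a small hypothetical 3x3 layout; the function names and the numbers in the example are ours.

```python
def rcb_sse(X):
    """Residual sum of squares (7.1) for a complete block layout X[i][j]
    with n blocks (rows) and r treatments (columns)."""
    n, r = len(X), len(X[0])
    grand = sum(map(sum, X)) / (n * r)
    row = [sum(X[i]) / r for i in range(n)]
    col = [sum(X[i][j] for i in range(n)) / n for j in range(r)]
    return sum((X[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(n) for j in range(r))

def assign_two(row_rest, col_rest, x1, x2, r, n):
    """Rule (7.2): row_rest = (X'_m., X'_q.), col_rest = (X'_.p, X'_.s);
    returns (X_mp, X_qs) for un-matched responses x1 < x2."""
    a = row_rest[0] / r + col_rest[0] / n   # coefficient of X_mp in B
    b = row_rest[1] / r + col_rest[1] / n   # coefficient of X_qs in B
    return (x1, x2) if a < b else (x2, x1)
```

For any layout with cells $(m,p)$ and $(q,s)$ blanked, the rule's choice should never have a larger residual sum of squares than the swapped assignment.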
By proceeding similarly to the above, it can be shown that if the two un-matched responses $x_1 < x_2$ both arise from block $m$, the residual sum of squares is minimized by assigning
$$X_{mp} = x_1 \text{ and } X_{ms} = x_2 \quad\text{if } X'_{\cdot p} < X'_{\cdot s}\,, \qquad X_{mp} = x_2 \text{ and } X_{ms} = x_1 \quad\text{if } X'_{\cdot s} < X'_{\cdot p}\,. \tag{7.3}$$
When the two un-matched responses are from different blocks but both receive the same treatment, this criterion specifies assigning
$$X_{mp} = x_1 \text{ and } X_{qp} = x_2 \quad\text{if } X'_{m\cdot} < X'_{q\cdot}\,, \qquad X_{mp} = x_2 \text{ and } X_{qp} = x_1 \quad\text{if } X'_{q\cdot} < X'_{m\cdot}\,. \tag{7.4}$$
Matching rules for the randomized block design can be derived for cases in which more than two responses are not identified with block-treatment combinations. Denote the un-matched observations by $x_1 < x_2 < \cdots < x_k$. These are to be assigned to $X_{a_1 b_1}, X_{a_2 b_2}, \ldots, X_{a_k b_k}$. Assume first that none of the un-matched observations are in the same block or receive the same treatment. Then the residual sum of squares is minimized by assigning
$$X_{a_i b_i} = x_i\,, \qquad i = 1, 2, \ldots, k\,, \tag{7.5}$$
where indexing is so that
$$\{X'_{a_1\cdot}/r + X'_{\cdot b_1}/n\} < \{X'_{a_2\cdot}/r + X'_{\cdot b_2}/n\} < \cdots < \{X'_{a_k\cdot}/r + X'_{\cdot b_k}/n\}\,.$$
When un-matched responses occur in the same block or receive the same treatment, the assignment rule must be adjusted accordingly. Suppose for example that $b_1 = b_2$, and let $X''_{\cdot b_1} = \sum_{i \ne a_1, a_2} X_{i b_1}$. In this situation, minimizing SSE is equivalent to maximizing
$$C = X_{a_1 b_1}\{X'_{a_1\cdot}/r + (X''_{\cdot b_1} + X_{a_2 b_1})/n\} + X_{a_2 b_1}\{X'_{a_2\cdot}/r + (X''_{\cdot b_1} + X_{a_1 b_1})/n\} + \sum_{i=3}^{k} X_{a_i b_i}\{X'_{a_i\cdot}/r + X'_{\cdot b_i}/n\}\,.$$
Matching rules can be established in this manner for other experimental designs. Suppose the design is a latin square with fixed treatment and block effects. For $n$ rows, columns, and treatments, the model is
$$X_{ijk} = \mu + \rho_i + \kappa_j + \tau_k + \varepsilon_{ijk}\,,$$
where $\mu$ is a constant, the $\rho_i$'s, $\kappa_j$'s, and $\tau_k$'s are subject to the restrictions $\sum_{i=1}^{n}\rho_i = \sum_{j=1}^{n}\kappa_j = \sum_{k=1}^{n}\tau_k = 0$, and the $\varepsilon_{ijk}$'s are independent $N(0,\sigma^2)$. Let $x_1 < x_2$ represent the two un-matched responses to be assigned to $X_{a_1 b_1 c_1}$ and $X_{a_2 b_2 c_2}$. In this situation, the residual sum of squares is minimized by assigning
$$X_{a_1 b_1 c_1} = x_1 \text{ and } X_{a_2 b_2 c_2} = x_2 \quad\text{if } \{X'_{a_1\cdot\cdot} + X'_{\cdot b_1\cdot} + X'_{\cdot\cdot c_1}\} < \{X'_{a_2\cdot\cdot} + X'_{\cdot b_2\cdot} + X'_{\cdot\cdot c_2}\}\,,$$
$$X_{a_1 b_1 c_1} = x_2 \text{ and } X_{a_2 b_2 c_2} = x_1 \quad\text{if } \{X'_{a_2\cdot\cdot} + X'_{\cdot b_2\cdot} + X'_{\cdot\cdot c_2}\} < \{X'_{a_1\cdot\cdot} + X'_{\cdot b_1\cdot} + X'_{\cdot\cdot c_1}\}\,,$$
where the primed quantities denote the corresponding row, column, and treatment totals omitting the un-matched cells. When $a_1 = a_2$, $b_1 = b_2$, or $c_1 = c_2$, appropriate matching rules follow, and the case of more than two un-matched responses can be considered. This entire procedure extends in a straightforward way to multifactor designs.
Developing matching rules in the framework of various experimental designs is only an initial step. An assessment of the behavior of such rules, concerning for example probability of correct matching, would be of interest. With respect to the ensuing estimation and testing, the effects of including the matched responses in the analysis should be evaluated. It would be useful to determine under which circumstances it is preferable to include the matched responses instead of proceeding as if those responses were missing. Possibly there are alternative criteria which give improved matching rules.
7.2. Regression

Suppose the model of interest is linear regression with one independent variable,
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\,, \qquad i = 1, 2, \ldots, p\,, \tag{7.6}$$
where $X_i$ is the value of the independent variable, assumed known without error, $\beta_0$ and $\beta_1$ are parameters to be estimated, $Y_i$ is the value of the response variable corresponding to $X_i$, and the $\varepsilon_i$'s are independent random error terms with $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$, $i = 1, 2, \ldots, p$. If, of the $p$ responses, only $m$, namely $Y_{n+1}, Y_{n+2}, \ldots, Y_p$, are properly identified with values of the independent variable, we wish to pair the remaining responses with $X_1, X_2, \ldots, X_n$. Denote the un-matched responses by $y_1 \le y_2 \le \cdots \le y_n$. If the original pairing were specified by $\phi \in \Phi$, the residual sum of squares would be
$$\mathrm{SSE} = \sum_{i=1}^{p}(Y_i - \bar Y)^2 - \Bigl\{\sum_{i=1}^{n}(X_i - \bar X)(y_{\phi(i)} - \bar Y) + \sum_{i=n+1}^{p}(X_i - \bar X)(Y_i - \bar Y)\Bigr\}^2 \Big/ \sum_{i=1}^{p}(X_i - \bar X)^2\,,$$
where $\bar X = \sum_{i=1}^{p} X_i/p$ and $\bar Y = \sum_{i=1}^{p} Y_i/p$. Choosing $\phi$ to minimize SSE is equivalent to choosing $\phi$ to maximize
$$F = \Bigl\{\sum_{i=1}^{n}(X_i - \bar X)(y_{\phi(i)} - \bar Y) + \sum_{i=n+1}^{p}(X_i - \bar X)(Y_i - \bar Y)\Bigr\}^2\,. \tag{7.7}$$
A similar procedure to establish a matching rule can be followed, of course, if the model is second or higher order in one independent variable or involves more independent variables. The comments of the preceding section also hold here, in that the consequences of including such re-paired responses in the analysis remain to be evaluated.
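For small numbers of un-matched responses, (7.7) can be maximized by direct enumeration over the $n!$ pairings. The sketch below is our own brute-force illustration; the data in the example are hypothetical.

```python
from itertools import permutations

def repair_regression(x_unmatched, y_unmatched, matched):
    """Choose the pairing phi of the un-matched responses maximizing (7.7).
    `matched` is a list of (X_i, Y_i) pairs whose labels survived.
    Returns the chosen (X, y) pairs for the un-matched portion."""
    xs_all = list(x_unmatched) + [x for x, _ in matched]
    ys_all = list(y_unmatched) + [y for _, y in matched]
    p = len(xs_all)
    xbar = sum(xs_all) / p
    ybar = sum(ys_all) / p
    # cross-product contribution of the correctly labelled responses
    fixed = sum((x - xbar) * (y - ybar) for x, y in matched)
    best, best_val = None, -1.0
    for phi in permutations(y_unmatched):
        v = (fixed + sum((x - xbar) * (yy - ybar)
                         for x, yy in zip(x_unmatched, phi))) ** 2
        if v > best_val:
            best, best_val = phi, v
    return list(zip(x_unmatched, best))
```

When the identified responses already indicate a positive slope, the maximizer pairs the ordered $y$'s with the ordered $X$'s, as in the bivariate normal case with $\rho > 0$.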
BIBLIOGRAPHY
•
Anderson, T. W. (1955).
The integral of a synnnetric tmimodal ftmction
over a synnnetric convex set and some probability inequalities.
Proceedings of the American MathematicaZ Society 6, 170-176.
Aroian, L.A., Taneja, V.S., and Cornwell, L.W. (1978).
Mathematical
forms of the distribution of the product of two normal variables.
Communications in Statistics A7, 165-172.
Bose, 5.5. (1938).
The estimation of mixed-up yields and their
standard errors.
Sankhy7I 4, 112-120.
Bose, 5.5. and Mahalanobis, P.C. (1938).
Individual yields in the case
of mixed-up yields of two or more plots in field experiments.
Sankhya 4, 103-111.
Chew, M.e. (1973).
On
pairing observations from a distribution with
monotone likelihood ratio.
The AnnaZs of Statistics 1, 433-445.
Cornwell, L.W., Aroian, L.A., and Taneja, V.S. (1977).
uation of the product of two nonnal variables.
Numerical eval-
JournaZ of Statis-
ticaZ Computation and SimuZation 7, 123-132.
Dantzig, G.B. (1963).
Linear Programming and Extensions.
Princeton
University Press, Princeton, New Jersey.
David, H.A. (1973).
Concomitants of order statistics.
BuZZetin of the
InternationaZ StatisticaZ Institute 45, 295-300.
David, H.A. and O'COImell, M.J., and Yang, 5.5. (1977).
Distribution
and expected value of the rank of a concomitant of an order
statistic.
The AnnaZs of Statistics 5, 216-223.
108
DeGroot, M.H., Feder, P.I., and Goel, P.K. (1971).
Matchmaking.
The
Annals of Mathematioal Statistios 42, 578-593.
Estimation of the correlation
DeGroot, M.H., and Goel, P.K. (1975).
coefficient from a broken random sample.
Technical Report No. 105,
•
Department of Statistics, Carnegie-Mellon University.
DeGroot, M.H. and Goel, P.K. (1976).
variate normal data.
Feller, W. (1957).
oations, 1.
The matching problem for multi-
Sankhya B 38, 14-29.
An Introduotion to ProbabiUty Theory and its AppU-
John Wiley and Sons, New York.
Gaver, D. P. and Thompson, G. L. (1973).
Mode ls in Operations Researoh.
Programming and Probabi Zi ty
Brooks/Cole Publishing Company,
Monterey, California.
Geffroy, J. (1958, 1959).
extremes.
Contribution
a la
theorie des valeurs
pub Zioations de l' Insti tut de Statistique de l' Universi te
de Paris 7, 37-121; 8, 123-184.
Goel, P.K. (1975).
On re-pairing observations in a broken random sample.
The Annals of Statistios 3, 1364-1369.
Gould, H.W. (1972). Combinatorial Identities, A Standardized Set of Tables Listing 500 Binomial Coefficient Summations. Morgantown, West Virginia.
Gumbel, E.J. (1947). The distribution of the range. The Annals of Mathematical Statistics 18, 384-412.
Gumbel, E.J. (1953). Introduction to Probability Tables for the Analysis of Extreme-Value Data. National Bureau of Standards, Applied Mathematics Series, 22. Washington, D.C.: U.S. Government Printing Office.
Hardy, G.H., Littlewood, J.E., and Pólya, G. (1967). Inequalities, 2nd ed. Cambridge University Press.
Harter, H.L. (1970). Order Statistics and their Use in Testing and Estimation, 1. Aerospace Research Laboratories, Washington, D.C.
Johnson, N.L. and Kotz, S. (1970). Distributions in Statistics: Continuous Univariate Distributions - 1. John Wiley and Sons, New York.
Johnson, N.L. and Kotz, S. (1972). Distributions in Statistics: Continuous Multivariate Distributions. John Wiley and Sons, New York.
Kamat, A.R. (1952). Incomplete and absolute moments of the multivariate normal distribution with some applications. Biometrika 40, 20-34.

Kurtz, T.E., Link, R.F., Tukey, J.W., and Wallace, D.L. (1966). Correlated ranges of correlated deviates. Biometrika 53, 191-197.

Mood, A.M. (1941). On the joint distribution of the medians in samples from a multivariate population. The Annals of Mathematical Statistics 12, 268-278.
Nair, K.R. (1940). The application of the technique of analysis of covariance to field experiments with several missing or mixed-up plots. Sankhyā 4, 581-588.

Neebe, A.W. (1978). LPSBA, A linear programming code. Technical Report No. 78-7, Department of Operations Research and Systems Analysis, University of North Carolina at Chapel Hill.
Pierskalla, W.P. (1968). The multidimensional assignment problem. Operations Research 16, 422-431.
Siddiqui, M.M. (1960). Distribution of quantiles in samples from a bivariate population. Journal of Research of the National Bureau of Standards Section B 64, 145-150.
Tietjen, G.L., Kahaner, D.K., and Beckman, R.J. (1977). Variances and covariances of the normal order statistics for sample sizes 2 to 50. Selected Tables in Mathematical Statistics Volume V, 1-74. American Mathematical Society, Providence, Rhode Island.