Harrell, F. E., Jr. (1979). "Statistical Inference for Censored Multivariate Distributions Based on Induced Order Statistics."

STATISTICAL INFERENCE FOR CENSORED MULTIVARIATE DISTRIBUTIONS
BASED ON INDUCED ORDER STATISTICS
by
Frank E. Harrell, Jr.
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1230
MAY 1979
STATISTICAL INFERENCE FOR CENSORED MULTIVARIATE DISTRIBUTIONS
BASED ON INDUCED ORDER STATISTICS
by
Frank E. Harrell, Jr.
A Dissertation submitted to the faculty of
the University of North Carolina at Chapel Hill in
partial fulfillment of the requirements for the degree of
Doctor of Philosophy in the Department of
Biostatistics
Chapel Hill
1979
Approved by:
Reader
FRANK EANES HARRELL, JR.  Statistical Inference for Censored
Multivariate Distributions Based on Induced Order Statistics.
(Under the direction of PRANAB KUMAR SEN.)
This research is concerned with methods of multivariate analysis
applicable to life testing problems with concomitant variables
observable only for failing units.  With this setup the author
attempts to provide:

1) estimates of parameters from censored bivariate normal
   observations, begun by Watterson [1959],
2) the likelihood ratio test of independence based on censored
   bivariate normal observations,
3) methods for inference based on censored multivariate normal
   distributions, especially by studying multiple and partial
   correlations,
4) parameter estimates and likelihood ratio tests of independence
   for other selected bivariate distributions, and
5) locally most powerful rank tests of independence for general
   censored bivariate distributions.

These methods also have applications in inference from biased
samples resulting from a selection process.
ACKNOWLEDGMENTS
The author wishes to express much appreciation to his adviser,
Professor P. K. Sen, who suggested the topic of this dissertation,
and who was constantly available for counsel and guidance.  The
insight provided by Professor Sen was invaluable throughout the
research process.  The other members of the committee, Professors
C. E. Davis, K. L. Lee, D. Quade, and O. D. Williams, also provided
helpful suggestions and ideas which went into the dissertation.

The author also wishes to thank his wife, Marielouise Worden
Harrell, for her patience and assistance in his graduate studies,
and his parents, Mr. and Mrs. Frank E. Harrell, Sr., for their
constant encouragement and understanding throughout his studies.
Finally, the author wishes to acknowledge the support provided
by the Lipids Research Clinics Program through the National Heart,
Lung and Blood Institute, contract NIH-NHLBI-71-2243 from the
National Institutes of Health.

F.E.H.
TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . .  vii
LIST OF FIGURES  . . . . . . . . . . . . . . . . . . . . . . . viii

CHAPTER

I.   INTRODUCTION AND REVIEW OF LITERATURE . . . . . . . . . .    1
     1.1.  Introduction . . . . . . . . . . . . . . . . . . . .   1
     1.2.  Notation and Definitions . . . . . . . . . . . . . .   2
     1.3.  Methods for Analyzing Censored Survival Data When
           All Concomitant Data are Available . . . . . . . . .   4
     1.4.  Framework for Survival Analysis with Incomplete
           Concomitant Data . . . . . . . . . . . . . . . . . .  11
     1.5.  Censored Bivariate Normal Distribution . . . . . . .  13
     1.6.  Summary  . . . . . . . . . . . . . . . . . . . . . .  19

II.  ESTIMATION OF PARAMETERS OF CENSORED BIVARIATE NORMAL
     DISTRIBUTIONS AND TEST OF INDEPENDENCE BASED ON
     INDUCED ORDER STATISTICS . . . . . . . . . . . . . . . . .  20
     2.1.  Introduction . . . . . . . . . . . . . . . . . . . .  20
     2.2.  Estimation of σ_x² using only Induced Order
           Statistics . . . . . . . . . . . . . . . . . . . . .  20
     2.3.  Cramér-Rao Lower Bounds for Variances of
           Unbiased Estimators  . . . . . . . . . . . . . . . .  24
     2.4.  Maximum Likelihood Estimation  . . . . . . . . . . .  33
     2.5.  Test of H₀: ρ = 0  . . . . . . . . . . . . . . . . .  41
           2.5.1.  Likelihood Ratio Test  . . . . . . . . . . .  41
           2.5.2.  Distribution of Test Statistic under
                   Censoring  . . . . . . . . . . . . . . . . .  44

III. STATISTICAL INFERENCE FOR CENSORED MULTIVARIATE
     NORMAL DISTRIBUTIONS BASED ON INDUCED ORDER
     STATISTICS . . . . . . . . . . . . . . . . . . . . . . . .  53
     3.1.  Introduction . . . . . . . . . . . . . . . . . . . .  53
     3.2.  Maximum Likelihood Estimation  . . . . . . . . . . .  53
     3.3.  Inference about Independence of X and Y  . . . . . .  62
           3.3.1.  Estimation of Multiple Correlation
                   Coefficient  . . . . . . . . . . . . . . . .  62
           3.3.2.  Likelihood Ratio Test of Independence  . . .  64
           3.3.3.  Distribution of Test Statistic under
                   Censoring  . . . . . . . . . . . . . . . . .  66
     3.4.  Inference about Independence of X⁽²⁾ and Y
           given X⁽¹⁾ . . . . . . . . . . . . . . . . . . . . .  72
           3.4.1.  Estimation of Partial Correlation
                   Coefficients . . . . . . . . . . . . . . . .  72
           3.4.2.  Distributions of Test Statistic under
                   Censoring  . . . . . . . . . . . . . . . . .  75
     3.5.  Inference about Independence of X⁽¹⁾ and
           (X⁽²⁾, Y)  . . . . . . . . . . . . . . . . . . . . .  83
           3.5.1.  Estimation of Canonical Correlations . . . .  83
           3.5.2.  Null Distribution of a Test Statistic  . . .  84

IV.  STATISTICAL INFERENCE FOR OTHER MULTIVARIATE
     DISTRIBUTIONS  . . . . . . . . . . . . . . . . . . . . . .  86
     4.1.  Introduction . . . . . . . . . . . . . . . . . . . .  86
     4.2.  Inference when Order Statistics are from any
           Parent Distribution but Concomitant Variables
           are Conditionally Normally Distributed . . . . . . .  87
           4.2.1.  Cramér-Rao Lower Bounds  . . . . . . . . . .  87
           4.2.2.  Estimation . . . . . . . . . . . . . . . . .  91
           4.2.3.  Likelihood Ratio Test of Independence  . . .  94
     4.3.  Example: Order Statistics from Extreme Value
           Parent Distribution  . . . . . . . . . . . . . . . .  94
     4.4.  Example: Order Statistics from Exponential
           Parent Distribution  . . . . . . . . . . . . . . . .  99

V.   DISTRIBUTION-FREE TESTS OF INDEPENDENCE BASED ON
     INDUCED ORDER STATISTICS . . . . . . . . . . . . . . . . . 111
     5.1.  Introduction . . . . . . . . . . . . . . . . . . . . 111
     5.2.  Locally Most Powerful Rank Test of Independence  . . 111
     5.3.  Empirical Power Studies  . . . . . . . . . . . . . . 118
     5.4.  Asymptotic Relative Efficiencies of Linear
           Rank Tests . . . . . . . . . . . . . . . . . . . . . 122
     5.5.  Hoeffding's General Test of Independence as
           Applied to Censored Bivariate Samples  . . . . . . . 124

VI.  SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH . . . . . . . 131
     6.1.  Summary  . . . . . . . . . . . . . . . . . . . . . . 131
     6.2.  Suggestions for Further Research  . . . . . . . . . . 132

BIBLIOGRAPHY  . . . . . . . . . . . . . . . . . . . . . . . . . 133
LIST OF TABLES

Table
2.1  Cramér-Rao Lower Bounds for Variances of Unbiased
     Estimators of the Parameters of the Bivariate Normal
     Distribution based on Order Statistics and Induced
     Order Statistics . . . . . . . . . . . . . . . . . . . . .  32
2.2  Estimated Moments of Modified Maximum Likelihood
     Estimators of the Bivariate Normal Parameters  . . . . . .  40
2.3  Estimated Power of 5% r*-Test for Censored Data  . . . . .  51
4.1  Estimated Power of 5% Test for Censored Data from
     the Extreme Value Distribution . . . . . . . . . . . . . .  99
4.2  Estimated Power of 5% Test for Censored Data under
     the Log-Linear Model . . . . . . . . . . . . . . . . . . . 110
5.1  Estimated Power of 5% Normal Scores Test for Censored
     Data from the Bivariate Normal Distribution  . . . . . . . 119
5.2  Estimated Power of 5% Normal Scores Test for Censored
     Data from the Extreme Value/Normal Distribution  . . . . . 121
5.3  Estimated Power of 5% Hoeffding D Test for Censored
     Bivariate Normal Distribution  . . . . . . . . . . . . . . 129

LIST OF FIGURES

Figure
4.1  Graph of Marginal Density of X_{n[i]} when Y has an
     Exponential Distribution . . . . . . . . . . . . . . . . . 106
CHAPTER I
INTRODUCTION AND REVIEW OF LITERATURE
1.1.
Introduction
In life testing studies, life length or time to failure of an
item is the major response variable.  Many parametric statistical
methods are now available for analyzing the relationship between
life length and a set of associated variables when the life length
measurements are censored in some way and the associated variables
are observed for all n censored or uncensored units.  The literature
review included here describes many of these methods as background
for investigating the problem of concomitant data being available
only for units that actually fail.
Selection problems can also be placed into this setting.  For
example, suppose that 1000 students take a college's entrance exam
and the highest 100 scorers enter the college.  Once there they take
a second exam.  The problem may be to estimate the population mean
score of the second type of exam or the population correlation
between the two exams given this biased sample, where the population
of interest consists of all high school graduates.
1.2. Notation and Definitions

In most life testing studies, observations are accumulated
sequentially in time and are censored, i.e., only a subset of the
set of ordered life lengths is observed.  Suppose that n items in an
experiment have life lengths Y_1,...,Y_n and let
Y_{n,1} ≤ Y_{n,2} ≤ ... ≤ Y_{n,n} be the order statistics
corresponding to Y_1,...,Y_n, with Y_{n,j} ≤ Y_{n,j+1} ≤ ... ≤ Y_{n,k}
being observable, 1 ≤ j ≤ k ≤ n.  If j > 1, k = n, we have censoring
on the left; for j = 1, k < n, we have censoring on the right, the
typical type for survival studies.

When j or k is a random variable, the censoring is said to be of
type I; an example is an experiment with a pre-specified duration,
where k is the number of failures or deaths occurring during that
time and j = 1.  If k is pre-specified, the censoring is of type II
and the duration of the experiment is thus a random variable.  In
multi-stage censoring schemes or staggered entry experiments (called
progressive censoring by some authors; see, for example, Cohen
[1965]), subjects are withdrawn from the study without having failed
at each of a number of stages (or, equivalently, they enter the
study at different times), but the statistical analysis is not
performed until the end of the study.  In true progressive
censoring, statistical tests are made throughout the experiment to
possibly terminate it early with rejection of the hypothesis of no
effect of a treatment on life length.
Most published statistical methods, as well as those proposed
here, are designed for analyzing type II right-censored data in
which the smallest k out of n order statistics of life length are
observed for fixed k.  However, for both type I and type II
censoring the maximum likelihood estimation procedures are the same
and asymptotically the estimators for either type will have
equivalent properties.  When k is a random variable, the
distribution of derived estimators depends on the distribution of k
and hence may be more complicated.

Let X_1,...,X_n denote concomitant variates corresponding to
failure times Y_1,...,Y_n so that (X_1,Y_1),...,(X_n,Y_n) are
independent and identically distributed random vectors
(i.i.d.r.v.).  Define

    X_{n[i]} = X_j  if  Y_{n,i} = Y_j  for  i,j = 1,...,n.        (1.2.1)

Bhattacharya [1974] has termed the X_{n[i]} the induced order
statistics, while David and Galambos [1974] have named these the
concomitants of order statistics.  We will call X_{n[1]},...,X_{n[k]},
1 ≤ k ≤ n, the first k of n induced order statistics.  Note that
these variates are not necessarily in any order.  When concomitant
data are observable only for failing units, then only the first k of
n induced order statistics are observable.
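The definition in (1.2.1) is easy to realize in a small simulation: generate i.i.d. pairs, order them by the Y's, and keep the concomitants of the k smallest.  A minimal sketch follows; the bivariate normal choice is only an illustrative assumption (it anticipates section 1.5), and nothing in the definition requires it.

```python
import numpy as np

rng = np.random.default_rng(0)

def induced_order_sample(n, k, rho=0.5):
    """Draw n i.i.d. standard bivariate normal pairs (X_i, Y_i) with
    correlation rho, then return the k smallest order statistics
    Y_{n,1} <= ... <= Y_{n,k} together with their concomitants
    X_{n[1]}, ..., X_{n[k]} as in (1.2.1)."""
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
    order = np.argsort(y)                 # ranks the life lengths Y
    return x[order[:k]], np.sort(y)[:k]   # (X_{n[i]}, Y_{n,i}), i = 1,...,k

x_ind, y_ord = induced_order_sample(n=100, k=20)
```

Note that the concomitants x_ind are not themselves in any order; only the Y's are.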
1.3. Methods for Analyzing Censored Survival Data When
     All Concomitant Data are Available

Assume that the conditional distribution function (d.f.) of Y_i
given x_i is of the form F_{βx_i}(y), with probability density
function (p.d.f.) f_{βx_i}(y), for i = 1,...,n; here, the x_i may be
p(≥1)-vectors, β is an unknown p-vector, and βx_i is a vector
product.  The likelihood function of a type II right-censored sample
(and given the x_i) is

    n^{[k]} { ∏_{i=1}^{k} f_{βx_{n[i]}}(y_{n,i}) } ∏_{i=k+1}^{n} [1 - F_{βx_{n[i]}}(y_{n,k})] .        (1.3.1)

Various authors have used (1.3.1) to draw statistical inference [on
β and other parameters associated with the p.d.f. f_{βx}(y)] using
the principle of maximum likelihood.  See Zippin and Armitage [1966]
for the exponential distribution and Glasser [1967] for the
exponential hazard distribution, for example.

A good general description of the setup of the likelihood
function is given by David and Moeschberger [1978], generalized to
more than one cause of failure and multiple censoring points.
Consider the case of type I multi-stage censoring in which an
individual's failure is observed only if it occurs within some
predetermined time interval, the length of which may vary from
subject to subject.  In this case, subjects can enter the study at
different times with the termination point being the same for all,
so the times are adjusted to start at zero for each subject.
Starting with n subjects, the ith subject is observed until
censoring time B_i.  The lifetime for a subject, Y_i, is known only
if Y_i ≤ B_i.  The likelihood of the sample is then

    L = ∏ f(y_i) ∏ [1 - F(B_i)] ,        (1.3.2)

where the first product is over the known failures and the second is
over the censored times.  If the B_i are random, as in random entry
times to the study, the estimation procedure is the same but the
properties of the estimators will depend on the distribution of the
entry times.  For type II censoring, it is assumed that all B_i are
equal.  A discussion of this also appears in Cohen [1965], aimed
more at the case of multi-batch censoring without covariates.
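As a concrete instance of (1.3.2), take f to be the exponential density with rate λ; the two products then reduce to a closed-form MLE, the number of observed failures divided by the total time on test.  A minimal sketch, with hypothetical data:

```python
import numpy as np

def exp_censored_mle(z, delta):
    """MLE of the exponential rate lambda from right-censored data.
    z[i] = min(Y_i, B_i) and delta[i] = 1 iff Y_i was observed.
    For f(y) = lam * exp(-lam * y), maximizing
    L = prod_{failures} f(y_i) * prod_{censored} [1 - F(B_i)]
    gives lam_hat = sum(delta) / sum(z)."""
    z, delta = np.asarray(z, float), np.asarray(delta, float)
    return delta.sum() / z.sum()

# hypothetical sample: three failures plus two units censored at B_i = 4.0
lam_hat = exp_censored_mle([1.0, 2.5, 0.5, 4.0, 4.0], [1, 1, 1, 0, 0])  # 3/12 = 0.25
```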
Estimation of the parameter vector β is usually carried out by
maximizing the likelihood in (1.3.2) by an iterative process, and
associated tests of hypotheses are performed by computing likelihood
ratios and using large sample theory pertaining to the distribution
of -2 log L, where L is the likelihood ratio.  Halperin [1952]
showed that the maximum likelihood estimate (MLE) of a single
parameter with no covariates is in some sense optimal under type II
right censoring with mild regularity conditions on the underlying
density.  He also gave an outline of the proof that, in the multiple
parameter case, the MLE is weakly consistent, efficient, and
asymptotically multi-normally distributed.  An additional paper on
this subject is by Silvey [1961], which includes a discussion of the
properties of the likelihood for dependent random variables.
Maximum likelihood estimators can seldom be derived explicitly.
Because of this the properties of the estimators are difficult to
study.  Saw [1961] was able to study the moments of the MLE of the
mean (μ) and standard deviation (σ) derived from a type II right
censored normal sample.  He found the estimators to be badly biased
for a large proportion of censoring (small k/n).  For k/(n+1) = .2,
he derived

    μ - E(μ̂) = 5.538 σ/(n+1) + O((n+1)⁻²) ,        (1.3.3)

and the amount of bias is worse for smaller k/(n+1).
An explicit and asymptotically efficient method for deriving
unbiased estimators was given by Plackett [1958] and formalized by
Chan [1967].  Chan's formulation is for a general location-scale
distribution and double censoring.  Here the sample consists of
Y_{n,j}, Y_{n,j+1},...,Y_{n,k}, 1 ≤ j ≤ k ≤ n, from the parent
density θ₂⁻¹ f((y - θ₁)/θ₂), where θ₁ and θ₂ are respectively the
location and scale parameters.  Let Z_{n,i} = (Y_{n,i} - θ₁)/θ₂ and
u_{n,i} = E(Z_{n,i}).  The log likelihood function is (Chan [1967])

    L = log[n!/{(j-1)!(n-k)!}] - (k-j+1) log θ₂ + Σ_{i=j}^{k} log f(Z_{n,i})
        + (j-1) log F(Z_{n,j}) + (n-k) log[1 - F(Z_{n,k})] ,        (1.3.4)

where F(z) is the d.f. corresponding to the p.d.f. f(z).  The
strategy is to expand ∂L/∂θ_r, r = 1,2, in a Taylor series about
Z_{n,i} = u_{n,i}, j ≤ i ≤ k, and then to ignore terms which
converge in probability to zero.  Thus, aside from these remaining
terms of order greater than one, which converge in probability to
zero,

    (1.3.5)

where

    (1.3.6)

    L^{(2)} = -(j-1)[(Y_{n,j} - θ₁)/θ₂ - u_{n,j}](f_j/P_j + u_{n,j} f'_j/P_j - u_{n,j} f_j²/P_j²)
              - Σ_{i=j}^{k} [(Y_{n,i} - θ₁)/θ₂ - u_{n,i}](u_{n,i} f'_i/f_i - u_{n,i} f'_i²/f_i² + f'_i/f_i)        (1.3.7)

    (1.3.8)

where

    f_i = f(u_{n,i}),  P_i = F(u_{n,i}),  q_i = 1 - P_i,  f'_i = df(x)/dx|_{x=u_{n,i}} ,

and so forth.

The first term of (1.3.5) also converges in probability to zero,
so the estimation procedure is to solve L_r^{(2)} = 0, r = 1,2.
Denote the estimators satisfying this requirement by θ₁* and θ₂*.
The fact that (θ₁*, θ₂*) are unbiased follows from the fact that
E(L^{(2)}) = 0.  Chan also shows that estimators derived from
replacing u_{n,i} by F⁻¹(i/(n+1)) have the same asymptotic
properties.  The estimators (θ₁*, θ₂*) can be called linearized
maximum likelihood estimators.
Saw [1962] developed a procedure similar to that of Chan for the
multiple parameter case and for general likelihood functions in
which the likelihood is expressed as a polynomial with a remainder
term.  This procedure can be used in many cases to derive simple and
efficient moment estimators.  Saw applied this to right censored
normal samples (see also Saw [1959]) to derive simple and efficient
estimators of the mean and variance.  These estimators require
computation of only two expected values as opposed to Chan's
estimators, which involve calculation of u_{n,j},...,u_{n,k}.

Modified maximum likelihood techniques could perhaps be applied
to the more complicated likelihood in (1.3.1) involving covariates.
This technique might eliminate the need for iterative solutions and
provide unbiased estimates of regression parameters.  At any rate,
there seems to be no rigorous proof of the optimality of MLE's
derived from expressions such as (1.3.1) which involve covariates.
Many authors in the literature who discuss survival analysis with
covariates seem to assume that theorems concerning identically
distributed observations apply.
There are other methods in the literature for analyzing censored
survival data than the conventional parametric maximum likelihood
technique.  Cox [1972] developed a conditional likelihood model
which could be called a mixed parametric/nonparametric approach.  In
his model, which allows for arbitrary right censoring, only the
ranks of the failure times are used.  The model assumes that for
some (unspecified) monotonic transformation d(y) of the failure
times, d(Y) - βx has an extreme value distribution.  This implies
that the ratio of hazard rates for observations i and j at each
fixed failure time y is exp(βx_i - βx_j).  [The hazard rate is
defined to be f(y)/(1-F(y)).]  This mixed approach has been explored
for general distributions by Kalbfleisch [1978].
Brown, Hollander, and Korwar [1974] developed a nonparametric
test of association between an arbitrarily right censored failure
time and one uncensored concomitant variable.  The test is based on
Kendall's tau and is easily applied.  The data for the test consist
of n independent pairs (X_1,Y_1),...,(X_n,Y_n), where Y_i is known
only if Y_i ≤ B_i.  Let Z_i = min(Y_i, B_i) and
δ_i = I{Y_i uncensored} = I{Y_i ≤ B_i}, where I{A} denotes the
indicator function for the set A.  Let

    a_ij =  1, if X_i > X_j
            0, if X_i = X_j        (1.3.9)
           -1, if X_i < X_j

and

    b_ij =  1, if it can be inferred that Y_i > Y_j
            0, if Y_i = Y_j or if "uncertain"        (1.3.10)
           -1, if it can be inferred that Y_i < Y_j ,

so that

    b_ij =  1, (δ_j = 1 and Z_i > Z_j) or (δ_i = 0, δ_j = 1 and Z_i = Z_j)
           -1, (δ_i = 1 and Z_i < Z_j) or (δ_i = 1, δ_j = 0 and Z_i = Z_j)        (1.3.11)
            0, otherwise.

The test statistic for independence is

    S = Σ_{i<j} a_ij b_ij ,        (1.3.12)

and the test can be performed by either finding the permutation
distribution of S or by using a normal approximation to its null
distribution.

Brown et al. [1974] proposed a modification of this test which
uses the Kaplan-Meier [1958] empirical survival curve to gain more
information in the "uncertain" case b_ij = 0.  This test is more
complicated and may make little improvement when much censoring is
present.  In fact, in the example in their paper, in which little
censoring is present, the significance levels for the more
complicated test are generally larger than for the simple test in
(1.3.12).  When X_i is a binary variable, (1.3.12) reduces to the
modified two-sample test of Gehan [1965].
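The scoring rules above can be transcribed directly into code.  A sketch follows; because the display for (1.3.9) is partly lost here, the Kendall-type score a_ij = sign(X_i - X_j) on the uncensored concomitant is an assumption, as is summing a_ij b_ij over pairs i < j in the usual form of such statistics.

```python
import numpy as np

def bhk_statistic(x, z, delta):
    """Kendall-type statistic S = sum_{i<j} a_ij * b_ij for censored data,
    with a_ij = sign(x_i - x_j) (x uncensored) and b_ij as in (1.3.11):
    z[i] = min(Y_i, B_i), delta[i] = 1 iff Y_i is uncensored."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            a = np.sign(x[i] - x[j])
            if (delta[j] == 1 and z[i] > z[j]) or \
               (delta[i] == 0 and delta[j] == 1 and z[i] == z[j]):
                b = 1       # it can be inferred that Y_i > Y_j
            elif (delta[i] == 1 and z[i] < z[j]) or \
                 (delta[i] == 1 and delta[j] == 0 and z[i] == z[j]):
                b = -1      # it can be inferred that Y_i < Y_j
            else:
                b = 0       # tied or "uncertain"
            s += a * b
    return s

# perfectly concordant, fully uncensored data give the maximal value
s = bhk_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1, 1, 1])  # s = 3
```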
Majumdar and Sen [1978] have considered generalized linear rank
statistics for progressive censoring with staggered entry.  Miller
[1976] estimated parameters of a linear regression on failure times
by means of weighted regression using weights constructed from the
Kaplan-Meier curve.  A general theory of rank analysis of covariance
under progressive censoring when there is no staggered entry has
been developed by Sen [1977].  The asymptotic theory of multiple
likelihood ratio tests in progressive censoring has been studied by
Sen [1976] for the single parameter case.
1.4. Framework for Survival Analysis with Incomplete Concomitant Data

All methods in section 1.3 assume that though only
Y_{n,1},...,Y_{n,k} are observable, the entire set (X_1,...,X_n)
[or equivalently X_{n[1]},...,X_{n[n]}] is given prior to
experimentation so that (1.3.1) is properly defined.  In many other
life testing problems, especially those arising in toxicological or
other invasive or destructive studies, X_{n[i]} is observable only
when Y_{n,i} is observable (for i ≤ n), so that the second factor on
the right hand side of (1.3.1) is no longer deterministic.  This is
particularly true if observing X_i necessitates the failure of the
ith unit, so that for the surviving (n-k) units, the X_i are not
observable.
As an example, suppose that an experiment is performed in which
a toxic substance is injected in constant amounts at regular time
intervals in mice until death occurs.  At death, the amount of the
toxic substance taken up by the kidneys is measured to test the
hypothesis that death occurs when a certain amount is precipitated
there.  For mice alive at the end of the study, the toxic substance
may have been dissipated by means other than by the kidneys' action.
Suppose that for time or cost reasons the experimenter, having
started with 30 mice, had decided to stop the experiment after the
first 10 deaths occur.  He may want to study the relationship
between survival time and absorption of the toxic substance by the
kidney, the latter being measurable only at the termination of life
and removal of a kidney.

As another example, suppose that in an experiment designed to
study the relationship between arteriosclerosis and life length,
rhesus monkeys are fed an atherogenic diet.  Autopsies are performed
on the first 20 monkeys to die and the amount of fatty plaque
deposited on the walls of the aorta is measured.  The experimenter
may wish to quantify the strength of relationship between this
measurement and life length.  Other examples of this type arise in
selection problems, particularly in educational testing.
The necessity of observing concomitant data for all individuals
is not required if we make the assumption that (X_i,Y_i) has a
multivariate distribution of known form.  Suppose that (X_i,Y_i) has
p.d.f. f(x,y) and the conditional density of X_i given Y_i = y_i is
g_{y_i}(x).  By an appeal to Lemma 1 of Bhattacharya [1974], we
conclude that given Y_{n,1},...,Y_{n,k}, the X_{n[1]},...,X_{n[k]}
are (conditionally) independently distributed, and the conditional
p.d.f. of X_{n[j]} given the Y_{n,i}, 1 ≤ i ≤ k, is given by

    g_{y_{n,j}}(x)   for  j = 1,...,k  and every  1 ≤ k ≤ n.        (1.4.1)

This result provides the basis for much of the statistical inference
based on induced order statistics.
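Result (1.4.1) licenses a two-stage way of generating the censored concomitants: draw the Y order statistics first, then draw each X_{n[j]} from g_{y_{n,j}}.  The sketch below compares this with direct sorting of full bivariate samples for a standard bivariate normal pair (an illustrative assumption), where g_y is the N(ρy, 1-ρ²) density; the two constructions should agree in distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, rho, reps = 50, 10, 0.7, 20000

def direct():
    """Sort full (X, Y) samples by Y; keep concomitants of the k smallest."""
    x = rng.standard_normal((reps, n))
    y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal((reps, n))
    idx = np.argsort(y, axis=1)[:, :k]
    return np.take_along_axis(x, idx, axis=1).mean(axis=0)

def via_lemma():
    """Draw Y order statistics, then X_{n[j]} ~ N(rho * y_{n,j}, 1 - rho^2)."""
    y = np.sort(rng.standard_normal((reps, n)), axis=1)[:, :k]
    return (rho * y + np.sqrt(1 - rho ** 2) * rng.standard_normal((reps, k))).mean(axis=0)

m_direct, m_lemma = direct(), via_lemma()   # mean of X_{n[j]}, j = 1,...,k
```

With 20000 replicates the two vectors of means agree to Monte Carlo accuracy.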
1.5. Censored Bivariate Normal Distribution

Watterson [1959] considered the case where (X_i,Y_i) have a
bivariate normal distribution.  Denote the p.d.f. of (X_i,Y_i) by
φ_θ(x,y) with θ = (μ_x, μ_y, σ_x², σ_y², ρ), where μ_x, μ_y are the
means, σ_x², σ_y² the variances, and ρ the correlation coefficient
of X and Y.  By (1.4.1), given Y_{n,1},...,Y_{n,k}, the
X_{n[1]},...,X_{n[k]} are independently normally distributed with
means

    μ_x + ρσ_x(y_{n,j} - μ_y)/σ_y        (1.5.1)

and variances

    σ_x²(1-ρ²)        (1.5.2)

for j = 1,...,k.  Let then X = (X_{n[1]},...,X_{n[k]})',
Y = (Y_{n,1},...,Y_{n,k})', and Z = σ_y⁻¹(Y - μ_y 1).  The joint
(conditional) p.d.f. of X given Y is then

    (1.5.3)

which is the k-variate normal density with mean vector

    μ_x 1 + ρσ_x Z        (1.5.4)

and covariance matrix

    σ_x²(1-ρ²) I ,        (1.5.5)

which implies

    (1.5.6)

Thus if we let

    EZ = u = (u_{n,1},...,u_{n,k})'  and  E(Z - EZ)(Z - EZ)' = V = ((v_{n,ij})) ,        (1.5.7)

then, from (1.5.4), (1.5.5) and (1.5.6) we obtain

    E(X) = μ_x 1 + ρσ_x u ,   E(X - EX)(X - EX)' = σ_x²(1-ρ²) I + ρ²σ_x² V .        (1.5.8)
For k ≤ n ≤ 20, u and V are tabulated in Sarhan and Greenberg
[1962, pp. 193-205] and several approximations for these moments for
higher values of n are in David [1970, pp. 65-7].  Harter [1961]
suggests that a good approximation for u_{n,i} is

    u_{n,i} ≈ Φ₁⁻¹((i - α_n)/(n - 2α_n + 1)) ,  α_n = .4 ,  n > 50 ,        (1.5.9)

where Φ₁ denotes the standard univariate normal d.f.  The u_{n,i}
can also be approximated with high accuracy by numerical integration
or by using the power series of Saw [1958] for select k/(n+1) not
too near 0 or 1.  The result of Saw [1959, section 6] may be used to
interpolate the power series for values of k/(n+1) not tabled.
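The approximation (1.5.9) is cheap to evaluate; here it is checked against Monte Carlo values of u_{n,i} = E(Z_{n,i}).  Reading (1.5.9) as Φ₁⁻¹((i - 0.4)/(n + 0.2)) follows Harter's [1961] plotting position with α_n = .4; the reconstruction of the garbled display is an assumption to that extent.

```python
import numpy as np
from statistics import NormalDist

def u_harter(n, i, alpha=0.4):
    """Harter's approximation u_{n,i} ~ Phi^{-1}((i - alpha)/(n - 2*alpha + 1))."""
    return NormalDist().inv_cdf((i - alpha) / (n - 2 * alpha + 1))

rng = np.random.default_rng(2)
n = 100
u_mc = np.sort(rng.standard_normal((20000, n)), axis=1).mean(axis=0)  # Monte Carlo u_{n,i}
u_ap = np.array([u_harter(n, i) for i in range(1, n + 1)])
```

For n = 100 the two vectors agree to within a few hundredths even in the tails.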
From (1.5.8) we obtain

    Cov(X_{n[i]}, X_{n[j]}) = σ_x²(1-ρ²) + ρ²σ_x² v_{n,ii} ,  i = j ,        (1.5.10)

and using (1.5.4),

    Cov(X_{n[i]}, Y_{n,j}) = v_{n,ij} Cov(X_i, Y_i) .        (1.5.11)
All of these results can be found in Watterson [1959].
From (1.5.7) and the definition of Z,

    E(Y) = μ_y 1 + σ_y u   and   E(Y - μ_y 1)(Y - μ_y 1)' = σ_y² V .        (1.5.12)

Let a denote a k×1 vector of constants.  Then

    E(a'Y) = μ_y a'1 + σ_y a'u ,   V(a'Y) = σ_y² a'Va ,        (1.5.13)

from which the best linear unbiased estimators (BLUE's) of μ_y and
σ_y can be determined.
However, if a'a is minimized instead of a'Va, as in Gupta
[1952], the estimators are much simpler.  The result is

    μ̂_y = α'Y ,   α = k⁻¹ 1 - ū_k (u - ū_k 1)/ Σ_{m=1}^{k} (u_{n,m} - ū_k)² ,        (1.5.14)

    σ̂_y = β'Y ,   β = (u - ū_k 1)/ Σ_{m=1}^{k} (u_{n,m} - ū_k)² ,        (1.5.15)

where

    ū_k = k⁻¹ Σ_{i=1}^{k} u_{n,i} .        (1.5.16)

For censoring on either side, Sarhan and Greenberg [1962,
pp. 208-11] computed the minimum efficiency of the simpler
estimators with respect to the BLUE's to be over 84% for both the
mean and standard deviation for n ≤ 15, and in general the
efficiencies are well over 90%.
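A sketch of the simplified estimators (1.5.14)-(1.5.16): besides the censored order statistics, all that is needed is the vector u, for which a Monte Carlo stand-in replaces the Sarhan-Greenberg tables below.  The function name and the simulated check are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)

def gupta_estimates(y_ord, u):
    """mu_hat = alpha'Y and sigma_hat = beta'Y from the k smallest order
    statistics, with beta = (u - u_bar*1) / sum((u_m - u_bar)^2) and
    alpha = (1/k)*1 - u_bar*beta, as in (1.5.14)-(1.5.16)."""
    k, ubar = len(y_ord), u.mean()
    beta = (u - ubar) / ((u - ubar) ** 2).sum()
    alpha = np.full(k, 1.0 / k) - ubar * beta
    return alpha @ y_ord, beta @ y_ord

n, k, mu, sigma = 40, 25, 10.0, 2.0
u = np.sort(rng.standard_normal((20000, n)), axis=1)[:, :k].mean(axis=0)  # u_{n,i}

ests = np.array([gupta_estimates(np.sort(mu + sigma * rng.standard_normal(n))[:k], u)
                 for _ in range(4000)])
mu_bar, sigma_bar = ests.mean(axis=0)   # averages near (10, 2): unbiasedness
```

The constraints α'1 = 1, α'u = 0, β'1 = 0, β'u = 1 make both estimators unbiased, which the simulated averages reflect.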
Now consider linear estimation using X:

    E(a'X) = μ_x a'1 + ρσ_x a'u ,        (1.5.17)

    V(a'X) = σ_x²(1-ρ²) a'a + ρ²σ_x² a'Va .        (1.5.18)

To get an unbiased estimator of μ_x we require a'1 = 1, a'u = 0.
Minimizing the variance is a problem since it depends on the unknown
parameter ρ.  If we minimize for ρ = 0, we have the estimators

    μ̂_x = α'X   and   (ρσ_x)ˆ = β'X ,        (1.5.19)

where α and β are defined in (1.5.14) and (1.5.15) respectively.
Alternatively, minimization can be done for ρ = 1, yielding the same
vectors as in the BLUE's of μ_y and σ_y.  For n = 10 and all
possible ways of left and/or right censoring, Watterson [1959]
compared the relative efficiencies of the two for the worst possible
value of ρ (0 or 1) and found the simpler estimators in (1.5.19) to
be often better than the others and never much worse.  Note that
from (1.5.17), no linear estimator in X exists for ρ or σ_x.

Estimation of ρ and σ_x will be the subject of much of the work
in later chapters.  Estimation of μ_x and σ_x will perhaps be of
more interest in selection problems, while estimation of ρ and
testing ρ = 0 will be of interest in both selection problems and
life testing studies.
We note from (1.5.3) and (1.5.7) that the basic assumption of
bivariate normality can be roughly checked by examining the
conditional normality of X by usual regression methods and by
examining a plot of y_{n,i} vs. u_{n,i}, i = 1,...,k, for linearity.

David and Galambos [1974] also discuss this setup.  They give
asymptotic results for the expected rank of X_{n[i]} in X_1,...,X_n
for i/n → λ (0 < λ < 1) as n → ∞.  They also show that for fixed k,
X_{n[1]} - E(X_{n[1]}),...,X_{n[k]} - E(X_{n[k]}) are asymptotically
independently normally distributed with mean zero as n → ∞ for
|ρ| < 1.  For our purposes, however, we will need k = pn with
0 < p ≤ 1, p fixed, as n → ∞.

1.6. Summary
Chapter I provides the background necessary for analyzing
censored data when concomitant information is available only as
induced order statistics for uncensored observations.  In chapters
to follow, estimators for the remaining parameters, ρ and σ_x, of
the bivariate normal distribution will be developed and a test of
independence will be studied.  These methods will be extended to the
multivariate normal distribution, thereby providing many tools for
statistical inference for censored data that are parallel to those
commonly used for complete data.  Next, some other bivariate
distributions will be considered.  Finally, some nonparametric tests
of independence will be studied and compared to parametric tests by
both empirical power calculations and asymptotic results.
CHAPTER II

ESTIMATION OF PARAMETERS OF CENSORED BIVARIATE
NORMAL DISTRIBUTIONS AND TEST OF INDEPENDENCE
BASED ON INDUCED ORDER STATISTICS

2.1. Introduction

Suppose that (X_i,Y_i) follows a bivariate normal distribution
with parameters μ_x, μ_y, σ_x², σ_y², and ρ.  Estimators of μ_x and
ρσ_x using X_{n[1]},...,X_{n[k]} have been developed by Watterson
[1959], while the estimation of μ_y and σ_y using the order
statistics Y_{n,1},...,Y_{n,k} can be made as in Sarhan and
Greenberg [1962].  However, the estimation of σ_x and ρ poses
certain difficulties and constitutes a major objective of this
chapter.  In deriving estimators of σ_x and ρ we also derive an
estimator of μ_x, using both order statistics and induced order
statistics, which is slightly better than Watterson's.  We will also
show that the classical Pearsonian correlation coefficient has
desirable properties as a test statistic for testing H₀: ρ = 0, and
we will estimate its power function.

2.2. Estimation of σ_x² using only Induced Order Statistics

Sometimes one may be interested in estimating population
parameters from censored data even when the ordered variates, Y,
are not observable.  This problem arises often in selection
processes.  For example, suppose in a class of 100 students, the 20
tallest are determined by having the students order themselves by
increasing height.  These 20 students are asked to do high-jumps to
determine their maximum jumping capability.  Later, the athletic
instructor wishes to estimate the population variability of jumping
capacity, not just for the tallest students, using only the observed
jumps and the ranks of the students' heights out of 100.

As we have seen in section 1.5, the population standard
deviation of X_i, σ_x, is not estimable by any linear function of X.
It is natural to try a quadratic estimator of the form X'AX.  The
variance of such an estimator would depend on the unknown parameter
ρ, so the variance cannot be minimized for all ρ.  Following the
Watterson approach, we might minimize the variance of X'AX when
ρ = 0, which amounts to minimizing tr A² subject to
E(X'AX) = σ_x².  The authors have found by simulations that this
estimator, which requires the solution of k simultaneous equations,
has little to gain over a much simpler estimator presented below.
Let

    σ̂_x² = Σ_{i=1}^{k} c_i (X_{n[i]} - μ̂_x)² ,        (2.2.1)

where μ̂_x = Σ_{i=1}^{k} α_i X_{n[i]} is Watterson's estimator,
defined in (1.5.19), and the c_i are real constants.  Now

    E(X_{n[i]} μ̂_x) = Σ_{j≠i} α_j {ρ²σ_x²(v_{n,ij} + u_{n,i}u_{n,j}) + μ_x²
                      + ρμ_xσ_x(u_{n,i} + u_{n,j})} + α_i E(X_{n[i]}²)
                    = ρ²σ_x² Σ_{j=1}^{k} α_j v_{n,ij} + μ_x² + ρμ_xσ_x u_{n,i}
                      + α_i σ_x²(1-ρ²) ,

using α'1 = 1 and α'u = 0, and hence

    E[(X_{n[i]} - μ̂_x)²] = σ_x²(1-ρ²)(1 - 2α_i + α'α)
                           + ρ²σ_x²{v_{n,ii} + u_{n,i}² - 2 Σ_{j=1}^{k} α_j v_{n,ij} + α'Vα} .

Therefore,

    E(σ̂_x²) = σ_x²(a'c + ρ² b'c) ,        (2.2.2)

where

    a_i = 1 - 2α_i + α'α = (k-1)/k + ū_k{ū_k + 2(u_{n,i} - ū_k)}/ Σ_{m=1}^{k} (u_{n,m} - ū_k)² ,

    b_i = -a_i + v_{n,ii} + u_{n,i}² - 2 Σ_{j=1}^{k} α_j v_{n,ij}
          + Σ_{r=1}^{k} Σ_{s=1}^{k} α_r α_s v_{n,rs} ,        (2.2.3)

for i = 1,...,k, with a = (a_1,...,a_k)', b = (b_1,...,b_k)', and
c = (c_1,...,c_k)'.  For an unbiased estimator we require a'c = 1
and b'c = 0, which the choice

    c_i = {b_i (a'b) - a_i (b'b)}/{(a'b)² - (a'a)(b'b)} ,  1 ≤ i ≤ k ,        (2.2.4)

satisfies.

An estimator of ρ can also be obtained by considering the
Watterson estimator of ρσ_x (based on X alone) and dividing the same
by the square root of σ̂_x² in (2.2.1).  However, such an estimator
does not
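Returning to (2.2.4): whatever the vectors a and b in (2.2.3) are numerically, the weights c enforce the two unbiasedness constraints by construction.  A couple of lines verify this for arbitrary, hypothetical a and b:

```python
import numpy as np

def c_weights(a, b):
    """c_i = (b_i*(a'b) - a_i*(b'b)) / ((a'b)^2 - (a'a)(b'b)), as in (2.2.4);
    by construction a'c = 1 and b'c = 0 whenever the denominator is nonzero."""
    ab, aa, bb = a @ b, a @ a, b @ b
    return (b * ab - a * bb) / (ab ** 2 - aa * bb)

a = np.array([0.9, 1.1, 1.0, 1.2])   # hypothetical values, for illustration only
b = np.array([0.3, -0.2, 0.5, 0.1])
c = c_weights(a, b)                  # a@c = 1 and b@c = 0 up to rounding
```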
2.3
Cra~r-Rao
Lower Bounds for Variances of Unbiased Estimators
Denote the parent bivariate normal density of (Xi,Y i ) by
2 2
<l>a(x,y)
where -a- (llx
,lly,0x,0
y ,p), ll,ll
xy stand for the means,
_
2 2
0,0
for the variances, and p for the correlation coefficient of
x y
-
By (1.5.3) the joint (conditional) pdf of X given Y is
k
IT {<I>l«xi-ll -po
i-l
where
<1>1
x
1
0- (Y
x y
'---;!2
i- ll »/0 {l-p-)/o {l-p- }
n,
y
x
x
is the standard univariate normal density.
pdf for the first
k
of
n
-
(2.3.1)
The joint
order statistics is
(2.3.2)
y <y < ••• <y
1- 2-
where
~l
k
is the standard univariate normal
n[k] _ n ••• (n-k+l).
the joint pdf of
~
df
and
Multiplying (2.3.1) and (2.3.2), we obtain that
and
Y is equal to
k
n[k]{ IT <l>a(xi'Yi)} [1-
i-l ...
Y <y
1- 2
Let
-
~l«Yk••• <y
-
k
II )/Oy)]n-k
(2.3.3)
Y
•
L k denote the log likelihood function based on (2.3.3) aside
n,
25
from constant terms.
Then letting

  Z = (Y - μ_y 1)/σ_y,   Z_{n,k} = (Y_{n,k} - μ_y)/σ_y,   T = (X - μ_x 1)/σ_x,

we have

  L_{n,k} = (n-k) log[1 - Φ_1(Z_{n,k})] - k log σ_x - k log σ_y - (k/2) log(1-ρ²)
            - (T'T - 2ρT'Z + Z'Z)/{2(1-ρ²)}.                                 (2.3.4)

Then we have

  (∂/∂μ_x)L_{n,k} = 1'(T - ρZ)/{σ_x(1-ρ²)},
  (∂/∂σ_x)L_{n,k} = -k/σ_x + (T'T - ρT'Z)/{σ_x(1-ρ²)},
  (∂/∂μ_y)L_{n,k} = (n-k)σ_y^{-1}φ_1(Z_{n,k})/[1 - Φ_1(Z_{n,k})] - (1-ρ²)^{-1}σ_y^{-1} 1'(ρT - Z),
  (∂/∂σ_y)L_{n,k} = (n-k)σ_y^{-1}Z_{n,k}φ_1(Z_{n,k})/[1 - Φ_1(Z_{n,k})] - σ_y^{-1}(1-ρ²)^{-1}(ρT'Z - Z'Z) - kσ_y^{-1},
  (∂/∂ρ)L_{n,k}  = kρ/(1-ρ²) + {(1+ρ²)T'Z - ρ(T'T + Z'Z)}/(1-ρ²)².          (2.3.5)
With notations in (1.5.7), we let u = (u_{n,1},…,u_{n,k})' and w = (w_1,…,w_k)', where w_i = v_{n,ii} + u_{n,i}².  Also let

  ψ(n,k; a,b,c,d) = E{φ_1^a(Z_{n,k}) Φ_1^{-b}(Z_{n,k}) [1 - Φ_1(Z_{n,k})]^{-c} Z_{n,k}^d},   (2.3.6)

where a, b, c, d are non-negative numbers and the pdf of Z_{n,k} is

  g_{n,k}(z) = {n!/[(k-1)!(n-k)!]} φ_1(z) Φ_1^{k-1}(z) [1 - Φ_1(z)]^{n-k}     (2.3.7)

for k = 1,…,n.
Noting that from (1.5.8)

  E(1'T) = ρ 1'u,   E(T'Z) = ρ 1'w,   E(Z'Z) = 1'w,                          (2.3.8)

we find the following results after differentiating each side of each equation in (2.3.5) with respect to μ_x, μ_y, σ_x, σ_y, and ρ and taking expected values of the negations of the resulting second partial derivatives:

  (∂²/∂μ_x²)L_{n,k} = -k/{σ_x²(1-ρ²)},   I_11 = k/{σ_x²(1-ρ²)}               (2.3.9)

  (∂²/∂μ_x∂σ_x)L_{n,k} = -1'(2T - ρZ)/{σ_x²(1-ρ²)}                           (2.3.10)

  I_13 = E{-(∂²/∂μ_x∂σ_x)L_{n,k}} = ρ 1'u/{σ_x²(1-ρ²)}                       (2.3.11)

  (∂²/∂μ_x∂μ_y)L_{n,k} = kρ/{σ_xσ_y(1-ρ²)},
  I_12 = E{-(∂²/∂μ_x∂μ_y)L_{n,k}} = -kρ/{σ_xσ_y(1-ρ²)}                       (2.3.12)

  I_14 = E{-(∂²/∂μ_x∂σ_y)L_{n,k}} = -ρ 1'u/{σ_xσ_y(1-ρ²)}                    (2.3.13)

  I_15 = E{-(∂²/∂μ_x∂ρ)L_{n,k}} = 1'u/{σ_x(1-ρ²)}                            (2.3.14)

  I_23 = E{-(∂²/∂μ_y∂σ_x)L_{n,k}} = -ρ² 1'u/{σ_xσ_y(1-ρ²)}                   (2.3.15)

  I_33 = E{-(∂²/∂σ_x²)L_{n,k}} = {2k(1-ρ²) + ρ² 1'w}/{σ_x²(1-ρ²)}            (2.3.16)

  I_34 = E{-(∂²/∂σ_x∂σ_y)L_{n,k}} = -ρ² 1'w/{σ_xσ_y(1-ρ²)},
  I_35 = E{-(∂²/∂σ_x∂ρ)L_{n,k}} = ρ(1'w - 2k)/{σ_x(1-ρ²)}                    (2.3.17)

  (∂²/∂ρ²)L_{n,k} = {k(1+ρ²)(1-ρ²) - (1+3ρ²)(T'T + Z'Z) + 2ρ(3+ρ²)T'Z}/(1-ρ²)³   (2.3.18)

  I_55 = E{-(∂²/∂ρ²)L_{n,k}} = {2kρ² + (1-ρ²)1'w}/(1-ρ²)²,
  I_25 = E{-(∂²/∂μ_y∂ρ)L_{n,k}} = -ρ 1'u/{σ_y(1-ρ²)}                         (2.3.19)

  I_54 = E{-(∂²/∂ρ∂σ_y)L_{n,k}} = -ρ 1'w/{σ_y(1-ρ²)}                         (2.3.20)

  (∂²/∂μ_y²)L_{n,k} = -(n-k)σ_y^{-2}φ_1(Z_{n,k})[1-Φ_1(Z_{n,k})]^{-1}{φ_1(Z_{n,k})[1-Φ_1(Z_{n,k})]^{-1} - Z_{n,k}} - k/{σ_y²(1-ρ²)},
  I_22 = E{-(∂²/∂μ_y²)L_{n,k}} = {(n-k)[ψ(n,k;2,0,2,0) - ψ(n,k;1,0,1,1)] + k/(1-ρ²)}/σ_y²   (2.3.21)

  (∂²/∂μ_y∂σ_y)L_{n,k} = -(n-k)φ_1(Z_{n,k})[1-Φ_1(Z_{n,k})]^{-1}{1 - Z_{n,k}² + φ_1(Z_{n,k})[1-Φ_1(Z_{n,k})]^{-1}Z_{n,k}}/σ_y² + σ_y^{-2}(1-ρ²)^{-1} 1'(ρT - 2Z),
  I_24 = E{-(∂²/∂μ_y∂σ_y)L_{n,k}} = {(n-k)[ψ(n,k;1,0,1,0) - ψ(n,k;1,0,1,2) + ψ(n,k;2,0,2,1)] + 1'u(2-ρ²)/(1-ρ²)}/σ_y²   (2.3.22)

  (∂²/∂σ_y²)L_{n,k} = -(n-k)Z_{n,k}φ_1(Z_{n,k})[1-Φ_1(Z_{n,k})]^{-1}{2 - Z_{n,k}² + Z_{n,k}φ_1(Z_{n,k})[1-Φ_1(Z_{n,k})]^{-1}}/σ_y² + (2ρT'Z - 3Z'Z)/{σ_y²(1-ρ²)} + k/σ_y²,
  I_44 = E{-(∂²/∂σ_y²)L_{n,k}} = {(n-k)[2ψ(n,k;1,0,1,1) - ψ(n,k;1,0,1,3) + ψ(n,k;2,0,2,2)] + 1'w(3-2ρ²)/(1-ρ²) - k}/σ_y²,   (2.3.23)

and where I_jℓ = I_ℓj for every j, ℓ = 1,…,5.
Des Raj [1953], who discussed maximum likelihood estimation from bivariate samples when Y is truncated instead of censored, developed equations similar to these.  The expected values of the second partial derivatives are different in that case.  The ψ-functions, which include all moments of single order statistics, can be evaluated for given n, k by numerical integration to almost any desired precision (12 significant digits were used for the computations in this paper).
To complete the evaluation of the Fisher information matrix I = ((I_jℓ)), we need to evaluate 1'u and 1'w.  For this, as in Saw [1958], we note that given Z_{n,k}, and disregarding the ordering of (Z_{n,1},…,Z_{n,k-1}), the joint distribution of (Z_{n,1},…,Z_{n,k-1}) is reducible to that of k-1 independent random variables from a univariate normal d.f., truncated from above at Z_{n,k}.  Denote (Z_{n,1},…,Z_{n,k-1}) in random order by (S_1,…,S_{k-1}), so that for 1 ≤ i ≤ k-1,

  E(S_i|Z_{n,k}) = -φ_1(Z_{n,k})/Φ_1(Z_{n,k}),
  E(S_i²|Z_{n,k}) = 1 - Z_{n,k}φ_1(Z_{n,k})/Φ_1(Z_{n,k}),                    (2.3.24)

and, in general, for r ≥ 0,

  E(S_i^{r+2}|Z_{n,k}) = (r+1)E(S_i^r|Z_{n,k}) - Z_{n,k}^{r+1}φ_1(Z_{n,k})/Φ_1(Z_{n,k}).   (2.3.25)
Hence

  1'u = Σ_{i=1}^k E(Z_{n,i}) = E(Z_{n,k}) + Σ_{i=1}^{k-1} E(S_i)
      = E(Z_{n,k}) + Σ_{i=1}^{k-1} E{E(S_i|Z_{n,k})}
      = E(Z_{n,k}) + Σ_{i=1}^{k-1} E{-φ_1(Z_{n,k})/Φ_1(Z_{n,k})}
      = ψ(n,k;0,0,0,1) - (k-1)ψ(n,k;1,1,0,0)                                 (2.3.26)

and

  1'w = Σ_{i=1}^k E(Z_{n,i}²) = E(Z_{n,k}²) + Σ_{i=1}^{k-1} E(S_i²)
      = E(Z_{n,k}²) + Σ_{i=1}^{k-1} E{E(S_i²|Z_{n,k})}
      = E(Z_{n,k}²) + Σ_{i=1}^{k-1} E{1 - Z_{n,k}φ_1(Z_{n,k})/Φ_1(Z_{n,k})}
      = ψ(n,k;0,0,0,2) + (k-1)[1 - ψ(n,k;1,1,0,1)].                          (2.3.27)
Note that in Saw's notation, ψ(n,k; a,a,0,b) is W(k/(n+1), n; a,b).  Let A_22, A_24, and A_44 denote respectively the linear combinations of ψ-functions in (2.3.21), (2.3.22), and (2.3.23).  For three selected sample sizes, the values for use in determining the information matrix I are:

  n     k     A_22      A_24           A_44       1'u        1'w
  20    10    .618616   .0471597       .746412    -7.67491   10.0000
  100   20    .412067   -.000905918    -.276115   -27.7132   43.0867
  1000  50    .189672   -.203353       .157445    -102.906   218.888
The Cramér-Rao lower bounds for variances of unbiased estimators are presented in the following table for three sample sizes, five values of ρ, and σ_x = σ_y = 1.  These bounds are the diagonal elements of I^{-1} and also represent approximate variances of the maximum likelihood estimators, μ̂_x, μ̂_y, σ̂_x, σ̂_y, ρ̂ (i.e., solutions to (2.3.5)), as predicted by asymptotic theory.
TABLE 2.1
CRAMER-RAO LOWER BOUNDS FOR VARIANCES

  n     k    ρ     V[μ̂_x]   V[μ̂_y]   V[σ̂_x]   V[σ̂_y]   V[ρ̂]
  20    10   0     .2433     .0761     .0500     .0601     .2433
             .1    .2417     .0761     .0514     .0601     .2372
             .3    .2283     .0761     .0618     .0601     .1916
             .5    .2015     .0761     .0775     .0601     .1181
             .9    .1078     .0761     .0787     .0601     .0049
  100   20   0     .4598     .0567     .0250     .0345     .2134
             .1    .4557     .0567     .0266     .0345     .2077
             .3    .4235     .0567     .0385     .0345     .1653
             .5    .3590     .0567     .0562     .0345     .0984
             .9    .1333     .0567     .0564     .0345     .0032
  1000  50   0     .6170     .0508     .0100     .0155     .1410
             .1    .6114     .0508     .0112     .0155     .1370
             .3    .5661     .0508     .0200     .0155     .1081
             .5    .4755     .0508     .0330     .0155     .0630
             .9    .1584     .0508     .0322     .0155     .0017
It is interesting to note that for a large proportion of censoring (small k/n), μ_x cannot be estimated well from induced order statistics, yet σ_x can be estimated with more accuracy than σ_y for small ρ.  Also, lower bounds for variances of estimators of μ_y and σ_y are independent of ρ, as one would expect; μ_y and σ_y can be estimated efficiently with a small number of order statistics whether or not induced order statistics are observable.  The difficulty in estimating μ_x stems from ρ being unknown, so the selection process on Y causes an unknown location bias in the sample of induced order statistics.  For example, if ρ were known to be zero, μ_x would be estimated by the sample mean with variance σ_x²/k = .02 at n = 1000, k = 50, instead of .6170 σ_x².
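The .2433 entries of Table 2.1 can be reproduced directly: at ρ = 0 (with σ_x = σ_y = 1) the only coupling of μ_x in the information matrix, as reconstructed above, is with ρ through I_15 = 1'u, so the bounds for μ̂_x and ρ̂ come from inverting the 2×2 block [[k, 1'u],[1'u, 1'w]]. A quick numerical check using the values quoted earlier for n = 20, k = 10 (NumPy assumed):

```python
import numpy as np

# Values for n = 20, k = 10 (sigma_x = sigma_y = 1, rho = 0)
k, one_u, one_w = 10, -7.67491, 10.0000

# At rho = 0 the (mu_x, rho) block of the information matrix is
# [[I_11, I_15], [I_15, I_55]] = [[k, 1'u], [1'u, 1'w]].
info = np.array([[k, one_u], [one_u, one_w]])
bounds = np.linalg.inv(info)

v_mu_x, v_rho = bounds[0, 0], bounds[1, 1]
print(round(v_mu_x, 4), round(v_rho, 4))  # both agree with Table 2.1
```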
2.4.  Maximum Likelihood Estimation
An iterative method may be used to estimate θ by using (2.3.5).  Des Raj [1953] set up the implicit solutions to the likelihood equations for the case of double truncation on Y from a bivariate normal sample, which is similar to our case relating to censoring.  Assuming the theorem outlined by Halperin [1952] to be valid for our bivariate case, the maximum likelihood estimator (MLE) will be consistent, asymptotically normally distributed, and asymptotically efficient.  Note that since Y_{n,1},…,Y_{n,k} are not independent, knowledge of the asymptotic properties of the MLE may demand additional regularity conditions [viz. Sen [1976]].  Nevertheless, for bivariate normal d.f.'s, optimal properties hold under quite general setups.

The complexity of the solutions to (2.3.5) makes the study of the properties of the MLE difficult for any finite sample size.  It will be shown in a more general setting in section 4.2, however, that the MLE of μ_y and σ_y is the usual MLE based on Y_{n,1},…,Y_{n,k} alone.
Nonetheless, the results obtained by Saw [1961] for estimating σ_y in this case imply that the MLE of σ_y will have undesirable properties for small k or n.  In particular, the MLE may have considerable bias (relative to its standard error).  The linearization of the log-likelihood function of Y_{n,1},…,Y_{n,k} described in Chan [1967], which results in simple and efficient estimators of μ_y and σ_y, will solve this problem.

There are other good estimators of μ_y and σ_y besides the ones considered by Chan.  For instance, Saw [1959] developed an unbiased estimator of μ_y which is a linear combination of Y_{n,k} and (k-1)^{-1}Σ_{i=1}^{k-1} Y_{n,i} and has asymptotic efficiency (a function of k/(n+1)) ≥ 94% for k/(n+1) as low as .25.  He also considered a linear combination of Σ_{i=1}^{k-1}(Y_{n,i} - Y_{n,k})² and {Σ_{i=1}^{k-1}(Y_{n,i} - Y_{n,k})}² as an unbiased estimator of σ_y² which is of 100% asymptotic efficiency for all k/(n+1).  Linear unbiased estimators of σ_y have also been developed (for example, see Gupta [1952]).
Since for the estimation of (μ_y, σ_y) the induced order statistics do not contribute any information, it seems natural to estimate (μ_y, σ_y) by some efficient procedure based only on Y = (Y_{n,1},…,Y_{n,k})'.  Denote the estimators by (μ_y*, σ_y*) and then consider the equations in (2.3.5) for μ_x, σ_x, and ρ, wherein μ_y, σ_y are replaced by μ_y*, σ_y*.  Equating these functions to 0, we get three equations in the three unknown parameters (μ_x, σ_x, ρ), and denote these solutions (μ_x*, σ_x*, ρ*) as the modified MLE.
For this purpose, we write

  X̄ = k^{-1}Σ_{i=1}^k X_n[i],   Ȳ = k^{-1}Σ_{i=1}^k Y_{n,i},
  S_x² = k^{-1}Σ_{i=1}^k (X_n[i] - X̄)²,   S_y² = k^{-1}Σ_{i=1}^k (Y_{n,i} - Ȳ)²,
  S_xy = k^{-1}Σ_{i=1}^k (X_n[i] - X̄)(Y_{n,i} - Ȳ).                          (2.4.1)

Equating (∂/∂μ_x)L_{n,k} in (2.3.5) to 0, we get

  μ_x* = X̄ - ρ*σ_x*(Ȳ - μ_y*)/σ_y*.                                         (2.4.2)

Equating (∂/∂σ_x)L_{n,k} to 0, we get (with T and Z evaluated at the starred parameters)

  T'T - ρ*T'Z = k(1 - ρ*²).                                                  (2.4.3)

Combining (2.4.2) and (2.4.3) we have

  S_x²/σ_x*² - ρ*S_xy/(σ_x*σ_y*) = 1 - ρ*².                                  (2.4.4)
Equating (∂/∂ρ)L_{n,k} to 0,

  ρ*(1-ρ*²) - ρ*[{S_x² + (X̄-μ_x*)²}/σ_x*² + {S_y² + (Ȳ-μ_y*)²}/σ_y*²] + (1+ρ*²)k^{-1}T'Z = 0.   (2.4.5)

Using (2.4.2), the terms in (X̄-μ_x*) and (Ȳ-μ_y*) cancel, and this simplifies to

  ρ*(1-ρ*²) - ρ*(S_x²/σ_x*² + S_y²/σ_y*²) + (1+ρ*²)S_xy/(σ_x*σ_y*) = 0.      (2.4.6)

Substituting the identity in (2.4.4) into (2.4.6),

  ρ* = σ_y* S_xy/(σ_x* S_y²).                                                (2.4.7)

The similarity of ρ* to the Pearson product moment correlation should be noted, for (2.4.7) can be re-written

  ρ* = (S_xy/S_x S_y)[(S_x/σ_x*)(σ_y*/S_y)].                                 (2.4.8)

From (2.4.4) and (2.4.7) we get

  σ_x*² = S_x² - S_xy²/S_y² + σ_y*² S_xy²/S_y⁴.                              (2.4.9)

Substituting (2.4.7) into (2.4.2),

  μ_x* = X̄ + S_xy(μ_y* - Ȳ)/S_y².                                           (2.4.10)
Thus, having estimated μ_y, σ_y by μ_y*, σ_y*, we may estimate μ_x, σ_x, and ρ by (2.4.10), (2.4.9), and (2.4.7), respectively.  One nice feature of these estimators is that as k/n → 1 (i.e., the amount of censoring decreases), the estimators approach the ordinary MLE, which are known to be efficient for complete samples of any size.  Again, if μ_y*, σ_y* are MLE based on Y_{n,1},…,Y_{n,k}, the entire set of estimators is MLE.
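The modified MLE (2.4.7), (2.4.9), and (2.4.10) are simple closed forms once (μ_y*, σ_y*²) are supplied. A sketch (function and variable names are ours; NumPy assumed); as just noted, substituting the complete-sample values μ_y* = Ȳ and σ_y*² = S_y² collapses the formulas to the ordinary MLE X̄, S_x², and the sample correlation r:

```python
import numpy as np

def modified_mle(x, y, mu_y_star, var_y_star):
    """Modified MLE (2.4.7), (2.4.9), (2.4.10) from induced order
    statistics x = X_n[i] and censored order statistics y = Y_n,i,
    given estimates mu_y_star, var_y_star of mu_y, sigma_y^2."""
    xbar, ybar = x.mean(), y.mean()
    sx2 = ((x - xbar) ** 2).mean()          # S_x^2 (divisor k)
    sy2 = ((y - ybar) ** 2).mean()          # S_y^2
    sxy = ((x - xbar) * (y - ybar)).mean()  # S_xy
    var_x = sx2 - sxy**2 / sy2 + var_y_star * sxy**2 / sy2**2   # (2.4.9)
    rho = np.sqrt(var_y_star) * sxy / (np.sqrt(var_x) * sy2)    # (2.4.7)
    mu_x = xbar + sxy * (mu_y_star - ybar) / sy2                # (2.4.10)
    return mu_x, var_x, rho

rng = np.random.default_rng(0)
y = np.sort(rng.normal(size=50))[:20]                  # first k of n order statistics
x = 0.5 * y + np.sqrt(1 - 0.25) * rng.normal(size=20)  # concomitants, rho = .5
# Complete-sample stand-ins reduce the estimators to the ordinary MLE:
mu, var, rho = modified_mle(x, y, y.mean(), ((y - y.mean()) ** 2).mean())
```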
Since E(X|Y) = μ_x 1 + ρσ_x(Y - μ_y 1)/σ_y, it follows that E(X̄|Y) = μ_x + ρσ_x(Ȳ - μ_y)/σ_y and

  E(S_xy|Y) = k^{-1}Σ_{i=1}^k [μ_x + ρσ_x(Y_{n,i} - μ_y)/σ_y](Y_{n,i} - Ȳ) = ρσ_x S_y²/σ_y.   (2.4.11)

And so from (2.4.10), E(μ_x*|Y) = μ_x + ρσ_x(μ_y* - μ_y)/σ_y.  Therefore, μ_x* is an unbiased estimator of μ_x if μ_y* is an unbiased estimator of μ_y.  Since V(X̄|Y) = σ_x²(1-ρ²)/k,

  V(μ_x*|Y) = σ_x²(1-ρ²){1 + (μ_y* - Ȳ)²/S_y²}/k.                            (2.4.12)

Also,

  V(S_xy|Y) = σ_x²(1-ρ²)S_y²/k.                                              (2.4.13)

Combining (2.4.13) and (2.4.11) we find

  E(S_xy²|Y) = σ_x²(1-ρ²)S_y²/k + ρ²σ_x²S_y⁴/σ_y².                           (2.4.14)

It can also be shown that E(S_x²|Y) = σ_x²{k^{-1}(k-1)(1-ρ²) + ρ²S_y²/σ_y²}.  Combining this with (2.4.14) and (2.4.9) we have

  E(σ_x*²|Y) = σ_x²(1-ρ²)(k-2)/k + σ_y*²σ_x²{(1-ρ²)/(kS_y²) + ρ²/σ_y²},      (2.4.15)

and if σ_y*² is unbiased for σ_y²,

  E(σ_x*²) = σ_x²{(1-ρ²)(k-2)/k + ρ²} + σ_x²(1-ρ²)k^{-1}E(σ_y*²/S_y²).       (2.4.16)

It can also be shown that a corresponding expression holds for V(σ_x*²|Y).   (2.4.17)

Note that from (2.4.11), if σ_y*² is unbiased for σ_y², then an unbiased estimator for the population covariance of X and Y, ρσ_xσ_y, is given by

  σ_y*² S_xy/S_y².                                                           (2.4.18)
In order to estimate the expected values of σ_x* and ρ* as well as the variances of μ_x*, σ_x*, and ρ*, for each of three different combinations of (n,k) and five different values of ρ, 500 random sets of (X, Y) were generated and the statistics were evaluated.  μ_y*, σ_y* were taken as the estimators due to Gupta [1952] in (1.5.14) and (1.5.15).  Also, for comparison, the variance of Watterson's estimator of μ_x, μ̃_x defined by (1.5.19), was estimated.  For μ_y* and μ̃_x the u_{n,i} in (1.5.14) were approximated by (1.5.9).  For simplicity, we have taken σ_x = σ_y = 1.  We have also estimated the trace efficiency of the modified MLE, which is the ratio of the sum of the five Cramér-Rao lower bounds from Table 2.1 to the sum of the estimated variances of the five modified MLE.
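The data generation used in such an experiment is straightforward: draw a bivariate normal sample and keep the k smallest Y's together with their concomitants. A miniature sketch (replication count and seed illustrative; NumPy assumed); note that the plain correlation of the censored pairs is attenuated by the selection on Y — the familiar restriction-of-range effect — which is part of what the modified MLE corrects:

```python
import numpy as np

def censored_sample(rng, n, k, rho):
    """Draw (X_i, Y_i) bivariate normal, censor at the k-th smallest Y:
    return the concomitants X_n[i] and order statistics Y_n,i, i <= k."""
    y = rng.normal(size=n)
    x = rho * y + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    order = np.argsort(y)[:k]          # indices of the k smallest Y's
    return x[order], y[order]

# Miniature experiment: mean correlation of censored pairs sits well
# below the population rho = .5 because of the selection on Y.
rng = np.random.default_rng(1)
rs = [np.corrcoef(*censored_sample(rng, 20, 10, 0.5))[0, 1] for _ in range(300)]
mean_r = float(np.mean(rs))
```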
Table 2.2.  Estimated moments of modified maximum likelihood estimators

  n    k   ρ    V[μ̃_x]  V[μ_x*]  V[μ_y*]  E[σ_x*]  V[σ_x*]  V[σ_y*]  E[ρ*]  V[ρ*]  Trace Eff.
  20   10  0    .281     .297     .074     1.021    .077     .078     -.020  .236   .902
           .1   .287     .293     .092     .998     .074     .088     .060   .213   .884
           .3   .281     .293     .082     1.043    .079     .079     .275   .196   .862
           .5   .265     .276     .087     .998     .102     .086     .449   .172   .749
           .9   .123     .126     .093     1.014    .106     .083     .881   .024   .764
  100  20  0    .495     .508     .076     1.038    .042     .045     -.044  .169   .955
           .1   .548     .564     .087     1.059    .047     .051     .080   .176   .859
           .3   .487     .507     .090     1.054    .050     .049     .256   .152   .868
           .5   .400     .420     .081     1.020    .057     .046     .442   .115   .865
           .9   .202     .209     .087     1.014    .080     .048     .885   .009   .667
  1000 50  0    .663     .685     .078     1.045    .019     .022     -.016  .114   .931
           .1   .601     .616     .091     1.044    .018     .025     .092   .105   .983
           .3   .624     .648     .094     1.048    .024     .024     .274   .103   .875
           .5   .477     .508     .091     1.030    .032     .025     .462   .066   .923
           .9   .187     .188     .082     .987     .038     .023     .887   .003   .777
From Table 2.2 we see that μ̃_x is consistently slightly better than μ_x*, although a small part of this may have to do with the approximation used for the u_{n,i}.  The efficiency of μ_x* by simulation is between 66% and 102% according to the bounds in Table 2.1.  As expected, μ_x* has large variance for large censoring proportion.  On the other hand, σ_x is estimated as well as σ_y in almost every case and has lower variance in some cases.  Also, σ_x* appears to have small bias and is 53-103% efficient.  The efficiency of ρ* is between 69% and 130% for the cases studied except when ρ = .9.  The bias in ρ* is very low except for moderate ρ (ρ = .3 or .5 resulted in maximum estimated bias of .06).  Note that V(ρ*) would be even smaller if, for example, Saw's [1959] estimator of μ_y were used.
2.5.  Test of H_0: ρ = 0

2.5.1.  Likelihood Ratio Test
From the results of the previous section, if (μ_y*, σ_y*) are MLE of (μ_y, σ_y), they are MLE independent of any restriction on ρ.  Thus the modified MLE of section 2.4 may be incorporated to prescribe a likelihood ratio test for H_0: ρ = 0.  As before, μ_y*, σ_y*; μ_x*, σ_x*, ρ* are the estimators considered in section 2.4 except that now we require μ_y*, σ_y* to be true MLE.  Over the parameter space restricted by ρ = 0, the parallel estimators are μ_y*, σ_y*, μ̄_x*, σ̄_x*, where

  μ̄_x* = X̄  and  σ̄_x*² = S_x²                                              (2.5.1.1)

are defined by (2.4.1).
From the definition of L_{n,k} in (2.3.4), the log ratio of the maximum likelihood under the null hypothesis and the maximum likelihood over the whole parameter space, multiplied by -2, results in a likelihood ratio statistic of

  X = k log{σ̄_x*²/(σ_x*²(1-ρ*²))} + σ̄_x*^{-2} Σ(X_n[i] - μ̄_x*)² + σ_y*^{-2} Σ(Y_{n,i} - μ_y*)²
      - (1-ρ*²)^{-1}{σ_x*^{-2} Σ(X_n[i] - μ_x*)² - 2ρ*σ_x*^{-1}σ_y*^{-1} Σ(X_n[i] - μ_x*)(Y_{n,i} - μ_y*)
      + σ_y*^{-2} Σ(Y_{n,i} - μ_y*)²},                                       (2.5.1.2)

where all summations are from 1 to k.  We now use the relation

  Σ(Y_{n,i} - μ_y*)² = k{S_y² + (Ȳ - μ_y*)²},                                (2.5.1.3)

and further, by (2.4.10),

  Σ(X_n[i] - μ_x*)² = kS_x² + kS_xy²(Ȳ - μ_y*)²/S_y⁴                         (2.5.1.4)

and

  Σ(X_n[i] - μ_x*)(Y_{n,i} - μ_y*) = kS_xy + kS_xy(Ȳ - μ_y*)²/S_y².          (2.5.1.5)

Using (2.4.4) and (2.4.7),

  (1-ρ*²)σ_x*² = S_x²(1 - r*²),                                              (2.5.1.6)

so the first term in (2.5.1.2) is -k log(1 - r*²), where

  r* = S_xy/(S_x S_y)                                                        (2.5.1.7)

and S_x, S_y, S_xy are defined by (2.4.1).  Thus r* is the ordinary product moment correlation of the set (X_n[i], Y_{n,i}), i = 1,…,k.  The remaining terms of X become

  k + σ_y*^{-2} Σ(Y_{n,i} - μ_y*)² - (1-ρ*²)^{-1}{σ_x*^{-2} Σ(X_n[i] - μ_x*)²
      - 2ρ*σ_x*^{-1}σ_y*^{-1} Σ(X_n[i] - μ_x*)(Y_{n,i} - μ_y*) + σ_y*^{-2} Σ(Y_{n,i} - μ_y*)²}.

Using (2.5.1.3) through (2.5.1.6), we get, on dividing by k, an expression
which after simplification is seen to be zero.  Hence the likelihood ratio statistic is -k log(1 - r*²), with r* given by (2.5.1.7), which is identical to the likelihood ratio statistic for a complete sample of size k.  This suggests that an "optimum" test of independence using censored data would still be the same test as if the data were not censored, and that we may use r* (or some function of r*) as a test statistic.  As with a complete sample it will be advantageous to consider as a test statistic

  T* = (k-2)r*²/(1 - r*²) = S_xy²/{(k-2)^{-1}(S_x²S_y² - S_xy²)}.            (2.5.1.8)

Note that this test is identical to that one would obtain from the usual conditional regression test based on the conditional distribution in (2.3.1).
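Operationally the test is just the usual correlation test applied to the k censored pairs, with T* referred to the F(1, k-2) distribution under H_0. A sketch (names ours; SciPy assumed):

```python
import numpy as np
from scipy import stats

def r_star_test(x, y):
    """Likelihood ratio test of rho = 0 from a censored sample:
    r* is the ordinary Pearson correlation of (X_n[i], Y_n,i) and
    T* = (k-2) r*^2/(1 - r*^2) is referred to F(1, k-2) under H0."""
    k = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t = (k - 2) * r**2 / (1 - r**2)
    pval = stats.f.sf(t, 1, k - 2)
    return r, t, pval

rng = np.random.default_rng(2)
y = np.sort(rng.normal(size=40))[:15]
x = rng.normal(size=15)              # independent of Y, so H0 holds
r, t, p = r_star_test(x, y)
```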
2.5.2.  Distribution of Test Statistic under Censoring

A statistic for testing the hypothesis that X and Y are independent is T* in (2.5.1.8).  Define

  U* = σ_y^{-2} Σ_{i=1}^k (Y_{n,i} - Ȳ)² = kS_y²/σ_y².                       (2.5.2.1)

Let F_δ(t; a,b) be the non-central F d.f. with degrees of freedom (DF) (a,b) and non-centrality parameter δ, and G_{n,k}(u) = P{U* ≤ u} for u > 0.  Finally, let

  F*_ρ(t; a,b) = ∫_0^∞ F_{ρ²u/(1-ρ²)}(t; a,b) dG_{n,k}(u),   t > 0.          (2.5.2.2)

Then, we have the following

Theorem 2.5.2.1:  For every t > 0 and given U*,

  P{T* ≤ t | U*} = F_{ρ²(1-ρ²)^{-1}U*}(t; 1, k-2).                           (2.5.2.3)

Hence P{T* ≤ t} = F*_ρ(t; 1, k-2), t > 0.  Under H_0: ρ = 0, F*_0(t; 1, k-2) = F_0(t; 1, k-2) is the central F d.f. with DF (1, k-2).
Proof:  From (1.5.3), given Y, X_n[1],…,X_n[k] are (conditionally) independently normally distributed with means μ_x + ρσ_xσ_y^{-1}(Y_{n,i} - μ_y), 1 ≤ i ≤ k, and a common variance σ_x²(1-ρ²).  Hence kS_xy is conditionally normally distributed with mean

  (Y - Ȳ1)'{μ_x 1 + ρσ_xσ_y^{-1}(Y - μ_y 1)} = kρσ_x S_y²/σ_y = ρσ_xσ_y U*

and variance

  σ_x²(1-ρ²)(Y - Ȳ1)'(Y - Ȳ1) = σ_x²σ_y²(1-ρ²)U*.

Therefore

  k²S_xy²/{σ_x²σ_y²(1-ρ²)U*} ~ χ_1²(ρ²U*(1-ρ²)^{-1}),                        (2.5.2.4)

where χ_p²(Δ) stands for the non-central chi-square d.f. with p DF and non-centrality parameter Δ.  Further,

  k²(S_x²S_y² - S_xy²)/σ_y² = X'AX,                                          (2.5.2.5)-(2.5.2.6)

where

  A = U*(I_k - k^{-1}11') - σ_y^{-2}(Y - Ȳ1)(Y - Ȳ1)',                       (2.5.2.7)

and given Y, X is conditionally normally distributed with mean vector

  μ_x 1 + ρσ_xσ_y^{-1}(Y - μ_y 1)                                            (2.5.2.8)

and dispersion matrix σ_x²(1-ρ²)I_k.  Further, it can be shown that U*^{-1}A is an idempotent matrix and that the rank of A is k-2.  The non-centrality parameter, the quadratic form of A in the conditional mean of X, is zero, since A1 = 0 and A(Y - μ_y 1) = A(Y - Ȳ1) = 0.  Thus given Y,

  X'AX/{σ_x²(1-ρ²)U*} ~ χ²_{k-2}(0),                                         (2.5.2.9)

which implies its unconditional distribution is also χ²_{k-2}(0).  Finally, noting that (Y - Ȳ1)'A = 0', we obtain from (2.5.2.5) and (2.5.2.6) that given Y, S_xy and {S_x²S_y² - S_xy²} are conditionally independent.  Hence (2.5.2.3) follows from (2.5.1.8), (2.5.2.4), and (2.5.2.9), and the fact that this conditional d.f. depends on Y through U* alone.  Finally, by (2.5.2.3), P(T* ≤ t) = E{P(T* ≤ t | U*)} = F*_ρ(t; 1, k-2).  Under H_0: ρ = 0, this is equal to F_0(t; 1, k-2) for all t > 0, i.e., F*_0 = F_0.   Q.E.D.
From Theorem 2.5.2.1, T* has the same conditional distribution as the statistic from an uncensored sample, and under H_0: ρ = 0 it has the classical (variance ratio) distribution with DF (1, k-2), and the test can be made without any difficulty.  Let F*_α be the upper 100α% point of F_0(t; 1, k-2), i.e., F_0(F*_α; 1, k-2) = 1 - α.  Then the power of the test based on T* (or r*) and corresponding to the level of significance α (0 < α < 1) is given by

  1 - F*_ρ(F*_α; 1, k-2).                                                    (2.5.2.10)
In general, for k < n, G_{n,k}, the d.f. of U*, is quite complicated, and hence analytical solutions for (2.5.2.10) are difficult to obtain.  However, one can obtain one or two moment approximations for it by computing the moments of U* by the formulae of Saw [1958].  Unfortunately these formulae contain additional errors not reported in the Corrigenda to Saw [1958].  Let ψ_ab = ψ(n,k; a,a,0,b), defined by (2.3.6).  Then, noting that U* = (Z - Z̄1)'(Z - Z̄1), where Z = (Z_{n,1},…,Z_{n,k})', Z̄ = k^{-1}Σ_{i=1}^k Z_{n,i}, with Z_{n,i} = (Y_{n,i} - μ_y)/σ_y, 1 ≤ i ≤ k, relate to the standard normal d.f., we find after deriving Saw's formulae expressions for E(U*) and E(U*²), and that the coefficient of ψ_13 in Saw's expression (3.6) concerning E(U*²) should be -(3k² - 14k + 15) instead of (5k² - 10k + 1).  These findings have been confirmed by Professor Saw in a personal communication.  Note that for a complete sample (k = n), E(U*) = k-1, V(U*) = 2(k-1).
Letting ζ = ρ²(1-ρ²)^{-1} and using the properties of the F-distribution,

  E(T*|U*) = (1 + ζU*)(k-2)/(k-4),                                           (2.5.2.11)

  E(T*²|U*) = {3 + 6ζU* + ζ²U*²}(k-2)²/{(k-4)(k-6)},                         (2.5.2.12)

which imply that

  E(T*) = {1 + ζE(U*)}(k-2)/(k-4),                                           (2.5.2.13)

  E(T*²) = {3 + 6ζE(U*) + ζ²E(U*²)}(k-2)²/{(k-4)(k-6)}.                      (2.5.2.14)

For the one moment approximation, we suppose that T* ~ F_{1,k-2}(λ̃), where F_{a,b}(δ) has the d.f. F_δ(t; a,b) defined after (2.5.2.1).  Thus we require λ̃ = ζE(U*) and replace the distribution of T* with that of F_{1,k-2}(ζE(U*)).  For the two moment approximation, we replace the distribution of T* by aF_{b,k-2}(0) and equate its moments to (2.5.2.13) and (2.5.2.14).  Thus

  E(T*) = a(k-2)/(k-4) = {1 + ζE(U*)}(k-2)/(k-4),  so  a = 1 + ζE(U*),       (2.5.2.15)

and

  E(T*²) = a²(k-2)²(b+2)/{b(k-4)(k-6)},                                      (2.5.2.16)

which implies b = 2/{E(T*²)(k-4)(k-6)/[a²(k-2)²] - 1}.  Note that when ρ = 0, a = b = 1 and λ̃ = 0, so that both approximations give the correct null d.f.  Also, we limit ourselves to k > 4 for the one-moment approximation or k > 6 for the two-moment approximation.
For the different n, k the moments of U* are as follows:

  n     k    E(U*)     V(U*)
  20    10   3.43246   3.87797
  100   20   4.22029   3.60135
  1000  50   6.79179   4.41258

For each n, k, ρ the resulting (a, b) values are:

                     ρ:  0    .1      .3      .5      .9
  n=20,   k=10    a:     1    1.035   1.339   2.144   15.633
                  b:     1    1.001   1.057   1.312   3.731
  n=100,  k=20    a:     1    1.043   1.417   2.407   18.992
                  b:     1    1.002   1.085   1.443   5.174
  n=1000, k=50    a:     1    1.069   1.672   3.264   29.954
                  b:     1    1.004   1.182   1.845   9.063
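The (a, b) values above follow mechanically from (2.5.2.13)-(2.5.2.16) and the moments of U*. A sketch (function name ours):

```python
def approx_constants(k, e_u, v_u, rho):
    """One- and two-moment approximation constants: a = 1 + zeta E(U*)
    per (2.5.2.15); b solved from (2.5.2.16)."""
    zeta = rho**2 / (1.0 - rho**2)
    a = 1.0 + zeta * e_u                                 # (2.5.2.15)
    e_u2 = v_u + e_u**2                                  # E(U*^2)
    # E(T*^2) from (2.5.2.14)
    e_t2 = (3.0 + 6.0 * zeta * e_u + zeta**2 * e_u2) * (k - 2) ** 2 / ((k - 4) * (k - 6))
    ratio = e_t2 * (k - 4) * (k - 6) / (a**2 * (k - 2) ** 2)
    b = 2.0 / (ratio - 1.0)                              # from (2.5.2.16)
    return a, b

# n = 20, k = 10, rho = .5, with E(U*), V(U*) from the table above
a, b = approx_constants(10, 3.43246, 3.87797, 0.5)
```

This reproduces a = 2.144 and b = 1.312 from the table.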
For each n, k, ρ the power of the r*-test at the 5% level under censoring was estimated by generating 2000 k×1 random Y vectors.  The exact upper 5% critical values were used and the power was calculated by averaging 2000 conditional non-central F probabilities to estimate the unconditional distribution of U* as in (2.5.2.2).  In order to compare the power of the test using censored data to that using a complete bivariate sample, the complete sample size n* required to achieve the same simulated power using the Fisher-Yates Z-test is given.
TABLE 2.3
ESTIMATED POWER OF 5% r*-TEST FOR CENSORED DATA

                     Power from      Power from
                     One Moment      Two Moment      Simulated
  n     k     ρ      Approximation   Approximation   Power      n*
  20    10    .1     .0531           .0531           .0532      6
              .3     .0810           .0802           .0814      6
              .5     .1570           .1522           .1581      6
              .9     .9160           .8331           .8256      7
  100   20    .1     .0544           .0544           .0544      7
              .3     .0939           .0921           .0942      7
              .5     .2025           .1929           .2026      8
              .9     .9796           .9468           .9349      9
  1000  50    .1     .0576           .0575           .0576      10
              .3     .1266           .1219           .1273      10
              .5     .3140           .2967           .3140      11
              .9     .9995           .9985           .9956      13
The two-moment approximation is better for ρ = .9, but the one-moment approximation is better overall in estimating the power function.  This is fortunate because the one-moment formula is easier to calculate, since only the first moment of U* need be computed.  Note that a complete sample of about one-fifth the size (k) of the observed censored sample would achieve the same power when n = 1000, k = 50, in which a high proportion of observations are censored.  However, because of considerations of cost of failures or controlling the duration of the experiment, this type of sacrifice may be necessary.
CHAPTER III

STATISTICAL INFERENCE FOR CENSORED MULTIVARIATE
NORMAL DISTRIBUTIONS BASED ON INDUCED ORDER STATISTICS

3.1.  Introduction

Let (X_1, Y_1),…,(X_n, Y_n) be independent and identically distributed random (p+1)-vectors, where X is a 1×p vector.  There is no difficulty in (1.2.1) in defining X_n[i] as an induced order statistic, where X_n[i] = X_j if Y_{n,i} = Y_j, for i, j = 1,…,n.  In this chapter we will extend the results of Chapter 2 by supposing that (X_i, Y_i) has a (p+1)-variate normal d.f. with density φ_θ(x,y), where θ denotes the mean, variance, and covariance parameters.  Maximum likelihood estimation as in section 2.4 will be used to estimate θ, and a likelihood ratio test of H_0: X and Y are independent will be developed as in section 2.5.1.  Other tools of statistical inference commonly used in complete data problems, namely tests of partial and canonical correlation, will be developed for censored multivariate normal data.
3.2.  Maximum Likelihood Estimation

By (1.4.1), the joint (conditional) pdf of X given Y is

  Π_{i=1}^k φ^(1)(x_i),                                                      (3.2.1)

where x_i = (x_{i1}, x_{i2},…,x_{ip}) and φ^(1)(x_i) is the conditional density of X_i given Y_{n,i}.  Multiplying (3.2.1) by (2.3.2) we obtain the joint pdf of X and Y,   (3.2.2)

where n^[k] is defined in (2.3.2).  The log likelihood function based on (3.2.2) is, aside from constant terms,

  L_{n,k} = (n-k) log[1 - Φ_1((Y_{n,k} - μ_y)/σ_y)] - (k/2) log|V|
            - ½ Σ_{i=1}^k (U_i - μ)'V^{-1}(U_i - μ),                          (3.2.3)

where U_i = (X_n[i], Y_{n,i})', μ = (μ_x, μ_y) with μ_x = E(X_i), μ_y = E(Y_i), and V is the dispersion matrix of (X_i, Y_i).  Partition V as

  V = [ V_xx    V_xy ]
      [ V'_xy   σ_y² ],                                                      (3.2.4)
where V_xx is the p×p dispersion matrix of X_i, V_xy is the p×1 covariance vector of X_i and Y_i, and σ_y² is the (scalar) variance of Y_i.

Following section 2.4 we estimate μ_y and σ_y² using the vector Y only; denote these estimators by μ_y* and σ_y²* respectively.  For simplicity, the * will be dropped for the present.  So we must estimate μ_x, V_xx, and V_xy, and we consider the non-fixed terms of L_{n,k}, when multiplied by -2, to be:

  k log|V| + Σ_{i=1}^k (U_i - μ)'V^{-1}(U_i - μ).                             (3.2.5)

Now define

  X̄ = k^{-1}Σ_{i=1}^k X_n[i],   Ȳ = k^{-1}Σ_{i=1}^k Y_{n,i},   Ū = (X̄, Ȳ),
  S_xx = k^{-1}Σ_{i=1}^k (X_n[i] - X̄)'(X_n[i] - X̄),
  S_xy = k^{-1}Σ_{i=1}^k (X_n[i] - X̄)'(Y_{n,i} - Ȳ),                         (3.2.6)
  S = [ S_xx    S_xy ]
      [ S'_xy   S_yy ].

After dividing by k, (3.2.5) can be written

  log|V| + tr V^{-1}S + (μ - Ū)'V^{-1}(μ - Ū).                               (3.2.7)
We find that |V| = σ_y²|A^{-1}| and

  V^{-1} = [ A                 -σ_y^{-2}A V_xy                      ]
           [ -σ_y^{-2}V'_xy A   σ_y^{-2} + σ_y^{-4}V'_xy A V_xy ],           (3.2.8)-(3.2.9)

  (μ - Ū)'V^{-1}(μ - Ū) = (μ_x - X̄)'A(μ_x - X̄) - 2σ_y^{-2}(μ_y - Ȳ)(μ_x - X̄)'A V_xy
      + σ_y^{-2}(μ_y - Ȳ)² + σ_y^{-4}(μ_y - Ȳ)²V'_xy A V_xy,                 (3.2.10)

where the symmetric matrix A is defined to be

  A = (V_xx - σ_y^{-2}V_xy V'_xy)^{-1}.                                      (3.2.11)

Therefore, ignoring fixed terms, (3.2.7) implies that we must minimize

  log|A^{-1}| - 2σ_y^{-2}S'_xy A V_xy + σ_y^{-4}S_yy V'_xy A V_xy + tr A S_xx
      + (μ_x - X̄)'A(μ_x - X̄) - 2σ_y^{-2}(μ_y - Ȳ)(μ_x - X̄)'A V_xy
      + σ_y^{-4}(μ_y - Ȳ)²V'_xy A V_xy.                                      (3.2.12)
Differentiating (3.2.12) with respect to μ_x and equating to 0 gives

  2A(μ_x - X̄) - 2σ_y^{-2}(μ_y - Ȳ)A V_xy = 0,                               (3.2.13)

so

  μ_x - X̄ = σ_y^{-2}V_xy(μ_y - Ȳ).                                          (3.2.14)

Let v_ij denote an element of V_xx, i, j = 1,…,p.  Also let Δ_ij = ∂V_xx/∂v_ij; Δ_ij is all zeros except for elements (i,j) and (j,i), which are unity.
Using

  ∂A/∂v_ij = -A Δ_ij A,                                                      (3.2.15)

  ∂ log|A^{-1}|/∂v_ij = tr A Δ_ij,                                           (3.2.16)

we have

  ∂L/∂v_ij = tr A Δ_ij + 2σ_y^{-2}S'_xy A Δ_ij A V_xy - σ_y^{-4}S_yy V'_xy A Δ_ij A V_xy
      - tr A Δ_ij A S_xx - (μ_x - X̄)'A Δ_ij A(μ_x - X̄)
      + 2σ_y^{-2}(μ_y - Ȳ)(μ_x - X̄)'A Δ_ij A V_xy - σ_y^{-4}(μ_y - Ȳ)²V'_xy A Δ_ij A V_xy
    = tr[{A + 2σ_y^{-2}A V_xy S'_xy A - σ_y^{-4}S_yy A V_xy V'_xy A - A S_xx A
      - A(μ_x - X̄)(μ_x - X̄)'A + 2σ_y^{-2}(μ_y - Ȳ)A V_xy(μ_x - X̄)'A
      - σ_y^{-4}(μ_y - Ȳ)²A V_xy V'_xy A}Δ_ij].                              (3.2.17)

Since this trace is zero for every i, j = 1,…,p, we conclude by the Lemma of Smith [1978] that the matrix in braces must be zero.  Using (3.2.14), the terms involving (μ_x - X̄) cancel, and this implies

  A + 2σ_y^{-2}A V_xy S'_xy A - σ_y^{-4}S_yy A V_xy V'_xy A - A S_xx A = 0_{p×p}.   (3.2.18)

Pre- and postmultiplying by σ_y⁴ A^{-1} yields

  σ_y⁴ I_p + [2σ_y² V_xy S'_xy - S_yy V_xy V'_xy - σ_y⁴ S_xx]A = 0,          (3.2.19)

or

  A^{-1} = S_xx + σ_y^{-4}S_yy V_xy V'_xy - 2σ_y^{-2}V_xy S'_xy.             (3.2.20)

Also, from (3.2.11),

  A^{-1} = V_xx - σ_y^{-2}V_xy V'_xy.                                        (3.2.21)

Therefore,

  V_xx = S_xx + σ_y^{-4}S_yy V_xy V'_xy - 2σ_y^{-2}V_xy S'_xy + σ_y^{-2}V_xy V'_xy.   (3.2.22)
Next we need to find the derivative of (3.2.12) with respect to V_xy.  The required differentials all follow from

  dA = σ_y^{-2}A(dV_xy V'_xy + V_xy dV'_xy)A.                                (3.2.23)

Differentiating each term of (3.2.12) in this way (3.2.24)-(3.2.28), combining, premultiplying by -½σ_y²A^{-1}, and equating to zero yields an equation linear in V_xy (3.2.29) in which, by (3.2.20), the expression in brackets reduces to σ_y⁴A^{-1}.  Solving for V_xy and adding the * notation to denote the estimators, we have

  V*_xy = σ_y*² S_yy^{-1} S_xy.                                              (3.2.30)
Substituting (3.2.30) into (3.2.14),

  μ_x* = X̄ + S_yy^{-1}(μ_y* - Ȳ)S_xy.                                       (3.2.31)

Substituting (3.2.30) into (3.2.22),

  V*_xx = S_xx + S_yy^{-2}(σ_y*² - S_yy)S_xy S'_xy.                          (3.2.32)

These modified MLE's are direct extensions of those for the bivariate case given respectively in (2.4.18), (2.4.10), and (2.4.9).
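In matrix form the estimators (3.2.30)-(3.2.32) are as easy to compute as their bivariate counterparts. A sketch (names ours; NumPy assumed), together with the complete-sample reduction noted for the bivariate case:

```python
import numpy as np

def modified_mle_multivariate(x, y, mu_y_star, var_y_star):
    """Modified MLE (3.2.30)-(3.2.32): x is k x p (concomitants),
    y is length k (censored order statistics)."""
    k = len(y)
    xbar, ybar = x.mean(axis=0), y.mean()
    xc, yc = x - xbar, y - ybar
    s_xx = xc.T @ xc / k
    s_xy = xc.T @ yc / k                     # p-vector S_xy
    s_yy = (yc ** 2).mean()
    v_xy = var_y_star * s_xy / s_yy                                # (3.2.30)
    mu_x = xbar + (mu_y_star - ybar) * s_xy / s_yy                 # (3.2.31)
    v_xx = s_xx + (var_y_star - s_yy) * np.outer(s_xy, s_xy) / s_yy**2  # (3.2.32)
    return mu_x, v_xx, v_xy

rng = np.random.default_rng(4)
yf = rng.normal(size=60)
xf = np.column_stack([0.6 * yf, -0.3 * yf]) + rng.normal(size=(60, 2))
order = np.argsort(yf)[:25]
x, y = xf[order], yf[order]
# Complete-sample stand-ins reduce the estimators to Xbar, S_xx, S_xy:
mu, vxx, vxy = modified_mle_multivariate(x, y, y.mean(), ((y - y.mean()) ** 2).mean())
```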
Using (3.2.1),

  E(X_i|Y_i) = μ_x + σ_y^{-2}(Y_i - μ_y)V'_xy,                               (3.2.33)

which implies

  E(X̄|Y) = μ_x + σ_y^{-2}(Ȳ - μ_y)V'_xy                                     (3.2.34)

and

  E(S_xy|Y) = k^{-1}E[Σ_{i=1}^k X'_n[i](Y_{n,i} - Ȳ) | Y] = σ_y^{-2}S_yy V_xy.   (3.2.35)

So from (3.2.31), (3.2.34), and (3.2.35),

  E(μ_x*|Y) = μ_x + σ_y^{-2}(μ_y* - μ_y)V'_xy,                               (3.2.36)

so μ_x* is unbiased for μ_x if μ_y* is unbiased for μ_y.  From (3.2.30) and (3.2.35),

  E(V*_xy|Y) = σ_y*²σ_y^{-2}V_xy,                                            (3.2.37)

so V*_xy is unbiased for V_xy if σ_y*² is unbiased for σ_y².

By the invariance property of maximum likelihood estimators we may take as the modified MLE of the vector of correlations between X and Y

  ρ* = σ_y*^{-1}[Diag(V*_xx)]^{-1/2}V*_xy,                                   (3.2.38)

using (3.2.30) and (3.2.32).
3.3.  Inference about Independence of X and Y

3.3.1.  Estimation of the Multiple Correlation Coefficient

With the setup in (3.2.4), the square of the population multiple correlation coefficient between X_i and Y_i is

  ρ̄² = σ_y^{-2} V'_xy V_xx^{-1} V_xy.                                        (3.3.1.1)

The modified maximum likelihood estimator of ρ̄² is

  ρ̄*² = σ_y*^{-2} V*'_xy V*_xx^{-1} V*_xy,                                   (3.3.1.2)

where σ_y* is defined in section 2.4 and V*_xy and V*_xx are defined respectively in (3.2.30) and (3.2.32).  Expanding V*_xy and V*_xx this is

  ρ̄*² = σ_y*² S_yy^{-2} S'_xy{S_xx + S_yy^{-2}(σ_y*² - S_yy)S_xy S'_xy}^{-1}S_xy.   (3.3.1.3)

Now using, for example, Rao [1965, p. 29]: if P is a non-singular p×p matrix, W a column vector, s a scalar, and M = W'P^{-1}W, then

  (P + sWW')^{-1} = P^{-1} - sP^{-1}WW'P^{-1}/(1 + sM),                      (3.3.1.4)

which implies further that

  W'(P + sWW')^{-1}W = M/(1 + sM).                                           (3.3.1.5)

Using this, (3.3.1.3) becomes

  ρ̄*² = σ_y*² S_yy^{-1} R*²/{1 + S_yy^{-1}(σ_y*² - S_yy)R*²},                (3.3.1.6)

or, with c = S_yy^{-1}σ_y*²,

  ρ̄*² = cR*²/{1 + (c-1)R*²},                                                 (3.3.1.7)

where R*² is the ordinary sample squared multiple correlation coefficient defined by

  R*² = S_yy^{-1} S'_xy S_xx^{-1} S_xy.                                       (3.3.1.8)

Note that for a complete sample, c approaches unity since in that case both S_yy and σ_y*² are consistent estimates of the same quantity.
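Equations (3.3.1.7)-(3.3.1.8) say the modified MLE of the squared multiple correlation is the ordinary R*² shrunk (or expanded) through c = σ_y*²/S_yy. A sketch (names ours; NumPy assumed):

```python
import numpy as np

def multiple_correlation_sq(x, y, var_y_star=None):
    """R*^2 of (3.3.1.8) and, when var_y_star is supplied, the modified
    MLE rho-bar*^2 of (3.3.1.7) via c = var_y_star / S_yy."""
    k = len(y)
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    s_xx = xc.T @ xc / k
    s_xy = xc.T @ yc / k
    s_yy = (yc ** 2).mean()
    r2 = s_xy @ np.linalg.solve(s_xx, s_xy) / s_yy       # (3.3.1.8)
    if var_y_star is None:
        return r2
    c = var_y_star / s_yy
    return c * r2 / (1 + (c - 1) * r2)                   # (3.3.1.7)

rng = np.random.default_rng(5)
y = np.sort(rng.normal(size=40))[:15]
x = np.column_stack([0.7 * y, rng.normal(size=15)]) + 0.2 * rng.normal(size=(15, 2))
r2 = multiple_correlation_sq(x, y)
```

With c = 1 (i.e., σ_y*² = S_yy) the estimator reduces to R*² itself, which is the complete-sample behavior noted above.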
3.3.2.  Likelihood Ratio Test of Independence

As in section 2.5.1, unrestricted parameter estimates for the (p+1)-variate censored normal distribution are

  θ* = (μ_y*, σ_y*, μ_x*, V*_xx, V*_xy).                                     (3.3.2.1)

X_i and Y_i are independent if and only if V_xy = 0.  The estimators over the parameter space restricted by V_xy = 0 are

  θ̄* = (μ_y*, σ_y*, μ̄_x*, V̄*_xx, 0),                                       (3.3.2.2)

where μ̄_x* = X̄ and V̄*_xx = S_xx as defined by (3.2.6).  Now under H_0: V_xy = 0, A in (3.2.11) evaluated at θ̄* becomes

  S_xx^{-1}.                                                                 (3.3.2.3)

Evaluating A at θ* using (3.2.30) through (3.2.32) we obtain

  A* = (S_xx - S_yy^{-1}S_xy S'_xy)^{-1},                                    (3.3.2.4)

with the quantities involved defined in (3.2.6).  Using these results along with (3.2.12), the maximum log likelihood under H_0 is proportional to

  p + log|S_xx|,                                                             (3.3.2.5)

and the unrestricted maximum is

  log|A*^{-1}| + tr A*S_xx - S_yy^{-1}S'_xy A* S_xy.                         (3.3.2.6)

By, for example, Rao [1965, p. 29],

  |A*^{-1}| = |S_xx|(1 - S_yy^{-1}S'_xy S_xx^{-1}S_xy).                       (3.3.2.7)

We obtain using (3.3.1.4) that

  tr(P + sWW')^{-1}P = p - sM/(1 + sM),                                      (3.3.2.8)

with the notation as in section 3.3.1, where P is a non-singular p×p matrix and M = W'P^{-1}W.  So (3.3.2.7), (3.3.1.5), and (3.3.2.8) imply that (3.3.2.6) equals

  log|S_xx| + log(1 - S_yy^{-1}S'_xy S_xx^{-1}S_xy) + p.                      (3.3.2.9)

Subtracting the restricted maximum (3.3.2.5), we obtain as the likelihood ratio criterion

  log(1 - R*²),                                                              (3.3.2.10)

where R*² is the ordinary squared multiple correlation coefficient.  Thus the test for independence in multivariate normal complete samples is optimal for testing independence with censored multivariate normal samples, in the sense that this test statistic is the likelihood ratio criterion.
3.3.3.  Distribution of Test Statistic under Censoring

We take as a test statistic for testing H_0: V_xy = 0

  T* = {R*²/p}/{(1 - R*²)/(k-1-p)},                                          (3.3.3.1)

where R*² is defined in (3.3.1.8) and p is the dimension of X_n[i].  Let F_δ(t; a,b) denote the non-central F d.f. as in section 2.5.2.  Let ρ̄ denote the population multiple correlation coefficient of X and Y in (3.3.1.1) and let

  U* = kσ_y^{-2}S_yy = σ_y^{-2}Σ_{i=1}^k (Y_{n,i} - Ȳ)²,                     (3.3.3.2)

  λ = ρ̄²(1 - ρ̄²)^{-1}U*,                                                    (3.3.3.3)

  G_{n,k}(u) = P{U* ≤ u},   u > 0.                                           (3.3.3.4)

Then with F*_ρ̄(t; a,b) defined in (2.5.2.2) we have the following

Theorem 3.3.3.1:  For every t > 0,

  P{T* ≤ t | U*} = F_λ(t; p, k-1-p).                                         (3.3.3.5)

Hence P{T* ≤ t} = F*_ρ̄(t; p, k-1-p), t > 0.  Under H_0: V_xy = 0, F*(t; p, k-1-p) = F_0(t; p, k-1-p) is the central F d.f. with DF (p, k-1-p).
Proof:
By (3.3.1.5) with
p .. S
...xx
w...
S
...xy
S ..
S-l
YY
68
(3.3.3.6)
From (3.2.1), (3.2.4), and (3.2.33), conditional on $Y_{n,i}$, $X_{n[i]}$ has a $p$-variate normal distribution with mean
$$\mu_x + \sigma_y^{-2}\,(Y_{n,i} - \mu_y)\,V_{xy} \qquad(3.3.3.7)$$
and variance matrix
$$V_{xx} - \sigma_y^{-2}\,V_{xy} V_{xy}', \qquad(3.3.3.8)$$
and the $X_{n[i]}$, $1 \le i \le k$, are independent. So $S_{xy}$ is conditionally $p$-variate normal with mean
$$\sigma_y^{-2}\,S_{yy}\,V_{xy} \qquad(3.3.3.9)$$
and variance
$$k^{-1}\,S_{yy}\,(V_{xx} - \sigma_y^{-2}\,V_{xy} V_{xy}'). \qquad(3.3.3.10)$$
Let $L$ be any non-zero $p$-vector and $Q_i = X_{n[i]}'\,L$, $1 \le i \le k$. Conditional on $Y_{n,i}$, $Q_i$ is univariate normal with
$$E(Q_i) = \mu_x' L + \sigma_y^{-2}\,(Y_{n,i} - \mu_y)\,V_{xy}' L,$$
$$V(Q_i) = L'\,(V_{xx} - \sigma_y^{-2}\,V_{xy} V_{xy}')\,L,$$
and $Q_1, \dots, Q_k$ are independent.
Define $Q = (Q_1, \dots, Q_k)'$. Conditional on $Y$, $Q$ is $k$-variate normal with moments
$$E(Q) = \mu_x' L\,1 + \sigma_y^{-2}\,(Y - \mu_y 1)\,V_{xy}' L, \qquad(3.3.3.12)$$
$$V(Q) = L'\,(V_{xx} - \sigma_y^{-2}\,V_{xy} V_{xy}')\,L \cdot I_k. \qquad(3.3.3.13)$$
Let $H = I - k^{-1}\,1\,1'$, so that $H^2 = H$. Note that $S_{xx} = k^{-1} X' H X$ and $S_{xy} = k^{-1} X' H Y$, with $X$, $Y$, and $1$ ($k \times 1$) defined in section 3.2. Let
$$E = k\,(S_{xx} - S_{yy}^{-1}\,S_{xy} S_{xy}'). \qquad(3.3.3.14)$$
Then
$$L' E L = L' X'\,[H - k^{-1} S_{yy}^{-1}\,H Y Y' H]\,X L = Q' J Q. \qquad(3.3.3.15)$$
Using (3.3.3.13) and the fact that $J$ is idempotent,
$$[J\,V(Q)]^2 = [L'\,(V_{xx} - \sigma_y^{-2}\,V_{xy} V_{xy}')\,L]^2\,J. \qquad(3.3.3.16)$$
The non-centrality parameter of $Q' J Q$ is $\mu' J \mu$, where $\mu = E(Q)$ is given by (3.3.3.12); this is seen to be zero. Since rank$(J) = k - 2$ it follows that
$$L' E L \sim L'\,(V_{xx} - \sigma_y^{-2}\,V_{xy} V_{xy}')\,L \cdot \chi^2_{k-2}. \qquad(3.3.3.17)$$
Since this is true for every $L$, we have by Rao [1965, p. 452]
$$k\,(S_{xx} - S_{yy}^{-1}\,S_{xy} S_{xy}') \sim W_p(k-2,\; V_{xx} - \sigma_y^{-2}\,V_{xy} V_{xy}'), \qquad(3.3.3.18)$$
where $W_p(m, \Sigma)$ denotes the central Wishart distribution of dimension $p$, $m$ degrees of freedom, and matrix parameter $\Sigma$. Since this distribution does not depend on $Y$, the unconditional distribution of $E$ is the central Wishart distribution.
Now
$$L' S_{xy} = k^{-1}\,L' X' H Y = k^{-1}\,Q'(H Y).$$
Based on (3.3.3.13) and (3.3.3.15), $L' S_{xy}$ and $L' E L$ are conditionally independent, since $J(H Y) = 0$. Since the conditional independence holds for any $L$, $S_{xy}$ and $E$ are conditionally independent by Rao [1965, p. 453]. Finally, from (3.3.3.9) and (3.3.3.10), $k^{1/2} S_{xy}$ is conditionally $p$-variate normal with mean and variance given respectively by
$$k^{1/2}\,\sigma_y^{-2}\,S_{yy}\,V_{xy} \qquad\text{and}\qquad S_{yy}\,(V_{xx} - \sigma_y^{-2}\,V_{xy} V_{xy}').$$
We now invoke the theorem on the distribution of Hotelling's $T^2$ statistic (see Rao [1965, p. 458]). In Rao's notation we let
$$c = S_{yy}, \quad d = k^{1/2}\,S_{xy}, \quad \delta = k^{1/2}\,\sigma_y^{-2}\,S_{yy}\,V_{xy}, \quad S = E, \quad \Sigma = V_{xx} - \sigma_y^{-2}\,V_{xy} V_{xy}'. \qquad(3.3.3.19)$$
Then
$$(k-1-p)\,c^{-1}\,d'\,S^{-1}\,d\,/\,p = T^*$$
has conditional d.f. $F_\lambda(t;\,p,\,k-1-p)$ with non-centrality parameter $\lambda = c^{-1}\,\delta'\,\Sigma^{-1}\,\delta$. Using (3.3.1.5),
$$\lambda = \{\bar\rho^2/(1 - \bar\rho^2)\}\,U^* = \Delta, \qquad(3.3.3.20)$$
with $\bar\rho^2$ and $U^*$ defined respectively in (3.3.1.1) and (3.3.3.2). Equation (3.3.3.5) follows from the fact that this conditional d.f. depends on $Y$ only through $U^*$. Finally, by (3.3.3.5),
$$P\{T^* \le t\} = E\{P(T^* \le t \mid U^*)\} = F^*_{\bar\rho}(t;\,p,\,k-1-p).$$
Under $H_0\colon V_{xy} = 0$, $\Delta = 0$ for all $U^*$, and in that case this is equal to $F_0(t;\,p,\,k-1-p)$. Q.E.D.
From Theorem 3.3.3.1, $T^*$ has the same conditional distribution as the statistic from an uncensored sample, and under $H_0\colon V_{xy} = 0$ it has the classical F distribution with DF $(p,\,k-1-p)$. Note that other authors, e.g. Anderson [1958, pp. 89-96], obtain the distribution of the sample multiple correlation coefficient by conditioning on $X_i$. In our case this would not lead to a normal conditional distribution of $Y_i$, so instead we have had to condition on $Y_i$. The complicated unconditional non-null distribution of the complete sample multiple correlation coefficient can be found in Anderson [1958, pp. 93-5].
As in section 2.5.2, exact power calculations for this test are difficult to obtain. Here we may also take a one-moment approximation to the distribution of $T^*$ using (3.3.3.20), by taking
$$\hat\lambda = \{\bar\rho^2/(1 - \bar\rho^2)\}\,E(U^*). \qquad(3.3.3.21)$$
$E(U^*)$ can be calculated as in section 2.5.1. The estimated power of the test of size $\alpha$ is
$$1 - F_{\hat\lambda}(F^*_\alpha;\,p,\,k-1-p), \qquad(3.3.3.22)$$
where $F_0(F^*_\alpha;\,p,\,k-1-p) = 1 - \alpha$. This provides a much simpler method for higher moment power approximations in the complete sample case $(n = k)$ than does the result obtained by conditioning on $X_i$ (see Anderson [1958, p. 95]), since the non-centrality parameter is a function only of the univariate $Y_i$, $1 \le i \le n$.
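The one-moment approximation can be sketched computationally as follows (an illustrative sketch only: the function name and the supplied value of $E(U^*)$ are hypothetical, and the noncentral F routines are taken from scipy rather than from this work):

```python
# Sketch of the one-moment power approximation: replace U* by E(U*) in
# the noncentrality of (3.3.3.20), then evaluate the noncentral F
# survival function at the central F critical value, as in (3.3.3.22).
from scipy.stats import f as f_dist, ncf

def approx_power(p, k, rho_bar, e_ustar, alpha=0.05):
    """Estimated power of the size-alpha test with DF (p, k-1-p)."""
    df1, df2 = p, k - 1 - p
    lam = rho_bar**2 / (1.0 - rho_bar**2) * e_ustar  # one-moment noncentrality
    f_crit = f_dist.ppf(1.0 - alpha, df1, df2)       # central critical value F_alpha
    dist = ncf(df1, df2, lam) if lam > 0 else f_dist(df1, df2)
    return dist.sf(f_crit)                           # 1 - F_lambda(F_alpha)
```

When $\bar\rho = 0$ the noncentrality vanishes and the routine returns the nominal size $\alpha$.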
3.4. Inference about Independence of $X^{(2)}$ and $Y$ given $X^{(1)}$

3.4.1. Estimation of Partial Correlation Coefficients

We make the following partitions, with $p_1 + p_2 = p$:
$$V_{xx} = \begin{pmatrix} V_{11} & V_{12} \\ V_{12}' & V_{22} \end{pmatrix}, \qquad V_{xy} = \begin{pmatrix} V_{1y} \\ V_{2y} \end{pmatrix}, \qquad(3.4.1.1)$$
where $V_{11}$ is $p_1 \times p_1$, $V_{12}$ is $p_1 \times p_2$, $V_{22}$ is $p_2 \times p_2$, $V_{1y}$ is $p_1 \times 1$, and $V_{2y}$ is $p_2 \times 1$, and similarly for $V^*_{xx}$, $V^*_{xy}$, $S_{xx}$, and $S_{xy}$.
Let
$$d = S_{yy}^{-2}\,(\sigma_y^{*2} - S_{yy}). \qquad(3.4.1.2)$$
Then from (3.2.32) and (3.2.30),
$$V^*_{xx} = \begin{pmatrix} S_{11} + d\,S_{1y} S_{1y}' & S_{12} + d\,S_{1y} S_{2y}' \\ S_{12}' + d\,S_{2y} S_{1y}' & S_{22} + d\,S_{2y} S_{2y}' \end{pmatrix}, \qquad(3.4.1.3)$$
$$V^*_{xy} = \sigma_y^{*2}\,S_{yy}^{-1} \begin{pmatrix} S_{1y} \\ S_{2y} \end{pmatrix}. \qquad(3.4.1.4)$$
The conditional covariance matrix of $(X_i^{(2)\prime},\,Y_i)$ given $X_i^{(1)}$ is given by
$$W = \begin{pmatrix} V_{22} - V_{12}' V_{11}^{-1} V_{12} & V_{2y} - V_{12}' V_{11}^{-1} V_{1y} \\ (V_{2y} - V_{12}' V_{11}^{-1} V_{1y})' & \sigma_y^2 - V_{1y}' V_{11}^{-1} V_{1y} \end{pmatrix} = \begin{pmatrix} W_{22} & W_{2y} \\ W_{2y}' & W_{yy} \end{pmatrix}. \qquad(3.4.1.5)$$
Using (3.4.1.3) and (3.3.1.4),
$$V_{11}^{*-1} = S_{11}^{-1} - d\,S_{11}^{-1} S_{1y} S_{1y}' S_{11}^{-1}\,/\,(1 + d\,S_{1y}' S_{11}^{-1} S_{1y}), \qquad(3.4.1.6)$$
which along with (3.4.1.3) and (3.4.1.4) provides the modified MLE of $W$,
$$W^*_{yy} = \sigma_y^{*2}\,(1 - S_{yy}^{-1} S_{1y}' S_{11}^{-1} S_{1y})\,/\,(1 + d\,S_{1y}' S_{11}^{-1} S_{1y}). \qquad(3.4.1.7)$$
Thus the modified MLE of the partial correlation vector of $X^{(2)}$ and $Y$ given $X^{(1)}$ is
$$\rho^*_{x^{(2)}y|x^{(1)}} = [\operatorname{diag} W^*_{22}]^{-1/2}\,(S_{2y} - S_{12}' S_{11}^{-1} S_{1y})\;\sigma_y^*\,S_{yy}^{-1}\,(1 - S_{yy}^{-1} S_{1y}' S_{11}^{-1} S_{1y})^{-1/2}\,(1 + d\,S_{1y}' S_{11}^{-1} S_{1y})^{-1/2}. \qquad(3.4.1.10)$$
The multiple (or "multiple partial") correlation between $X^{(2)}$ and $Y$ given $X^{(1)}$ is defined in a manner analogous to the multiple correlation between $X$ and $Y$ (3.3.1.1), except that the covariance matrix is replaced by $W$. This reduces to
$$\rho^2_{y2|1} = (\bar\rho^2 - \rho^2_{y1})\,/\,(1 - \rho^2_{y1}), \qquad(3.4.1.11)$$
where $\bar\rho$ is the multiple correlation coefficient between $X$ and $Y$ defined by (3.3.1.1) and $\rho_{y1}$ is the multiple correlation coefficient between $X^{(1)}$ and $Y$. Using (3.3.1.7), the modified MLE of $\rho^2_{y2|1}$ is
$$\rho^{*2}_{y2|1} = (\bar\rho^{*2} - \rho^{*2}_{y1})\,/\,(1 - \rho^{*2}_{y1}), \qquad(3.4.1.12)$$
where $\rho^{*2}_{y1}$ is obtained from $R^{*2}_{y1}$, the ordinary sample squared multiple correlation coefficient between $X^{(1)}$ and $Y$,
$$R^{*2}_{y1} = S_{yy}^{-1}\,S_{1y}'\,S_{11}^{-1}\,S_{1y}. \qquad(3.4.1.13)$$
Combining (3.4.1.12) and (3.3.1.7),
$$\rho^{*2}_{y2|1} = (R^{*2} - R^{*2}_{y1})\,/\,\{(1 - R^{*2}_{y1})\,[c + (1 - c)\,R^{*2}]\}, \qquad(3.4.1.14)$$
where as before $c = S_{yy}\,\sigma_y^{*-2}$ is a "correction factor for censoring".

3.4.2. Distribution of Test Statistic under Censoring

A statistic for testing the independence of $X^{(2)}$ and $Y$ given $X^{(1)}$, $H_0\colon V_{2y} - V_{12}'\,V_{11}^{-1}\,V_{1y} = 0$, or equivalently $H_0\colon \rho^2_{y2|1} = 0$, is
$$S^* = \{R^{*2}_{y2|1}/p_2\}\,/\,\{(1 - R^{*2}_{y2|1})/(k-1-p)\}, \qquad(3.4.2.1)$$
with the sample squared multiple partial correlation coefficient defined by
$$R^{*2}_{y2|1} = (R^{*2} - R^{*2}_{y1})\,/\,(1 - R^{*2}_{y1}) \qquad(3.4.2.2)$$
and its components defined in (3.3.1.8) and (3.4.1.13).
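For concreteness, the statistic $S^*$ of (3.4.2.1)-(3.4.2.2) can be computed from the censored sample as in the following sketch (illustrative code with assumed names and data layout, not from the dissertation):

```python
# Sketch of S*: X holds the k induced order statistics X_{n[i]} as rows,
# y the order statistics Y_{n,i}; the first p1 columns form X^(1).
import numpy as np

def s_star(X, y, p1):
    k, p = X.shape
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    Sxx, Sxy, Syy = Xc.T @ Xc / k, Xc.T @ yc / k, yc @ yc / k
    r2 = Sxy @ np.linalg.solve(Sxx, Sxy) / Syy                     # R*^2
    r2_y1 = Sxy[:p1] @ np.linalg.solve(Sxx[:p1, :p1], Sxy[:p1]) / Syy
    r2_part = (r2 - r2_y1) / (1.0 - r2_y1)                         # R*^2_{y2|1}
    p2 = p - p1
    return (r2_part / p2) / ((1.0 - r2_part) / (k - 1 - p))
```

Under $H_0$ the statistic is referred to the central F distribution with DF $(p_2,\,k-1-p)$.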
Let $\bar\rho^2$, $\rho^2_{y1}$, and $\rho^2_{y2|1}$ denote the population correlations defined in section 3.4.1, and let
$$\lambda = \{\rho^2_{y2|1}\,/\,(1 - \rho^2_{y2|1})\}\,P^*, \qquad(3.4.2.3)$$
$$G_{n,k}(u) = P\{P^* \le u\}, \quad u > 0, \qquad(3.4.2.4)$$
where $P^*$ is a function of $Y$ and $X^{(1)}$ identified in the proof below. Then with $F_\lambda(t;\,a,b)$ denoting the non-central F d.f. as before and $F^*_{\rho_{y2|1}}(t;\,a,b)$ defined in (2.5.2.2), we have the following

Theorem 3.4.2.1: For every $t > 0$ and given $P^*$,
$$P\{S^* \le t \mid P^*\} = F_\lambda(t;\,p_2,\,k-1-p). \qquad(3.4.2.5)$$
Hence
$$P\{S^* \le t\} = F^*_{\rho_{y2|1}}(t;\,p_2,\,k-1-p), \quad t > 0.$$
Under $H_0\colon \rho_{y2|1} = 0$, $F^*_{\rho_{y2|1}}(t;\,p_2,\,k-1-p) = F_0(t;\,p_2,\,k-1-p)$ is the central F d.f. with DF $(p_2,\,k-1-p)$.
Proof: Equation (3.3.1.5) and the matrix partitions given in (3.4.1.5) lead us to the relation
$$S^* = (k-1-p)\,w'\,A^{-1}\,w\,/\,\{p_2\,S_{yy}\,(1 - R^{*2}_{y1})\}, \qquad(3.4.2.6)$$
where
$$w = S_{2y} - S_{12}'\,S_{11}^{-1}\,S_{1y}, \quad\text{and}$$
$$A = S_{22} - S_{12}'\,S_{11}^{-1}\,S_{12} - S_{yy}^{-1}\,(1 - R^{*2}_{y1})^{-1}\,(S_{2y} - S_{12}' S_{11}^{-1} S_{1y})(S_{2y} - S_{12}' S_{11}^{-1} S_{1y})'. \qquad(3.4.2.7)$$
Making partitions as in section 3.4.1 and letting
$$\mu_x = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad X^{(1)} = (X^{(1)}_{n[1]}, \dots, X^{(1)}_{n[k]})', \quad X^{(2)} = (X^{(2)}_{n[1]}, \dots, X^{(2)}_{n[k]})', \qquad(3.4.2.8)$$
we have
$$\bar X^{(1)} = k^{-1} \sum_{i=1}^k X^{(1)}_{n[i]}, \qquad \bar X^{(2)} = k^{-1} \sum_{i=1}^k X^{(2)}_{n[i]}. \qquad(3.4.2.9)$$
Now conditional on $Y_{n,i}$, $X_{n[i]}$ is $p$-variate normal with mean given by (3.4.2.10) and
$$V(X_{n[i]} \mid Y_{n,i}) = \begin{pmatrix} V_{11} - \sigma_y^{-2} V_{1y} V_{1y}' & V_{12} - \sigma_y^{-2} V_{1y} V_{2y}' \\ V_{12}' - \sigma_y^{-2} V_{2y} V_{1y}' & V_{22} - \sigma_y^{-2} V_{2y} V_{2y}' \end{pmatrix}.$$
So conditional on $Y_{n,i}$ and $X^{(1)}_{n[i]}$, $X^{(2)}_{n[i]}$ is $p_2$-variate normal with mean given by (3.4.2.11) and
$$V(X^{(2)}_{n[i]} \mid Y_{n,i},\,X^{(1)}_{n[i]}) = (V_{22} - \sigma_y^{-2} V_{2y} V_{2y}') - (V_{12}' - \sigma_y^{-2} V_{2y} V_{1y}')\,(V_{11} - \sigma_y^{-2} V_{1y} V_{1y}')^{-1}\,(V_{12} - \sigma_y^{-2} V_{1y} V_{2y}'),$$
which, using (3.3.1.4), is (3.4.2.12), and the $X^{(2)}_{n[i]}$, $1 \le i \le k$, are conditionally independent.
We have
$$E(w \mid X^{(1)}, Y) = \sigma_y^{-2}\,S_{yy}\,(1 - R^{*2}_{y1})\,(1 - \rho^2_{y1})^{-1}\,(V_{2y} - V_{12}'\,V_{11}^{-1}\,V_{1y}), \qquad(3.4.2.13)$$
using the conditional independence of the $X^{(2)}_{n[i]}$. Let $L$ be any non-zero $p_2$-vector and $Q_i = X^{(2)\prime}_{n[i]}\,L$, $1 \le i \le k$. Conditional on $Y_{n,i}$ and $X^{(1)}_{n[i]}$, $Q_i$ is univariate normal with
$$E(Q_i \mid Y_{n,i},\,X^{(1)}_{n[i]}) = \mu_2' L + \sigma_y^{-2}(Y_{n,i} - \mu_y)\,V_{2y}' L + [X^{(1)}_{n[i]} - \mu_1 - \sigma_y^{-2}(Y_{n,i} - \mu_y)\,V_{1y}]'\,(V_{11} - \sigma_y^{-2} V_{1y} V_{1y}')^{-1}\,(V_{12} - \sigma_y^{-2} V_{1y} V_{2y}')\,L \qquad(3.4.2.15)$$
and
$$V(Q_i \mid Y_{n,i},\,X^{(1)}_{n[i]}) = L'\,\Sigma\,L,$$
where $\Sigma$ denotes the matrix (3.4.2.12), and $Q_1, \dots, Q_k$ are conditionally independent. Let $Q = (Q_1, \dots, Q_k)' = X^{(2)} L$. It follows that $Q$ is conditionally $k$-variate normal with
$$V(Q \mid X^{(1)}, Y) = L'\,\Sigma\,L \cdot I_k. \qquad(3.4.2.16)$$
Let $E = k\,A$. Then
$$L'\,E\,L = Q'\,J\,Q, \qquad(3.4.2.17)$$
where $J$ is the analogue of the matrix in (3.3.3.15) with $H Y$ replaced by $H\,(Y - X^{(1)} S_{11}^{-1} S_{1y})$. It can be verified that $J$ is idempotent. Therefore,
$$[J\,V(Q \mid X^{(1)}, Y)]^2 = (L'\,\Sigma\,L)^2\,J. \qquad(3.4.2.18)$$
The non-centrality parameter, $\mu' J \mu$ with $\mu = E(Q \mid X^{(1)}, Y)$, can be shown to be zero. Note that rank$(J) = k - 2 - p_1$. It follows that conditionally
$$L'\,E\,L \sim L'\,\Sigma\,L \cdot \chi^2_{k-2-p_1}. \qquad(3.4.2.19)$$
Since this is true for every $L$,
$$E \sim W_{p_2}(k - 2 - p_1,\;\Sigma), \qquad(3.4.2.20)$$
where as before $W_{p_2}(m, \Sigma)$ denotes the central Wishart distribution. We note in passing that since the parameters of this conditional distribution do not depend on $X^{(1)}$ or $Y$, $E$ also has an unconditional Wishart distribution. This result is also a direct consequence of (3.3.3.18) and properties of the Wishart distribution. However, for our purposes we needed to verify that the conditional distribution (given $Y$ and $X^{(1)}$) is also a Wishart distribution.
To demonstrate that $w$ and $E$ are conditionally independent, we see that
$$(Y' H - S_{1y}'\,S_{11}^{-1}\,X^{(1)\prime} H)\,J = 0'. \qquad(3.4.2.21)$$
Then since $L' w$ and $L' E L$ are conditionally independent for every $L$, $w$ and $E$ are conditionally independent. From (3.4.2.16), $d = k^{1/2}\,w$ is conditionally $p_2$-variate normal with
$$V(d \mid X^{(1)}, Y) = S_{yy}\,(1 - R^{*2}_{y1})\,\Sigma.$$
We again use Rao's theorem concerning Hotelling's $T^2$ distribution [1965, p. 458], with $S = E$, $c = S_{yy}\,(1 - R^{*2}_{y1})$, and $\delta$ given by $k^{1/2}$ times (3.4.2.13). Thus
$$(k-1-p)\,c^{-1}\,d'\,S^{-1}\,d\,/\,p_2 = S^*$$
has conditional d.f. $F_\lambda(t;\,p_2,\,k-1-p)$ with $\lambda$ (3.4.2.3) given by $c^{-1}\,\delta'\,\Sigma^{-1}\,\delta$, using (3.3.1.5). Equation (3.4.2.5) follows from the fact that the conditional d.f. depends on $Y$ and $X^{(1)}$ only through $P^*$. This implies
$$P(S^* \le t) = E\{P(S^* \le t \mid P^*)\} = F^*_{\rho_{y2|1}}(t;\,p_2,\,k-1-p).$$
Under $H_0\colon \rho^2_{y2|1} = 0$, $\lambda = 0$ for all $P^*$, and in that case $F^*_{\rho_{y2|1}}$ is equal to $F_0(t;\,p_2,\,k-1-p)$. Q.E.D.
From Theorem 3.4.2.1, $S^*$ has the classical null distribution. Power approximation for the $S^*$ test will be more difficult than in the preceding sections. For a complete sample some simplification may be made, for in that case $R^{*2}_{y1}$ is a consistent estimator of $\rho^2_{y1}$ in (3.4.2.3).
3.5. Inference about Independence of $X^{(1)}$ and $(X^{(2)}, Y)$

3.5.1. Estimation of Canonical Correlations

Let the random variable $X$ be partitioned as in section 3.4.1. We wish to test $H_0\colon X^{(1)}$ is independent of $(X^{(2)}, Y)$, where $Y$ may be subject to censoring. Let the covariance matrix of $(X^{(1)\prime}, X^{(2)\prime}, Y)$ be partitioned as
$$\begin{pmatrix} V_{11} & V_{12} & V_{1y} \\ V_{12}' & V_{22} & V_{2y} \\ V_{1y}' & V_{2y}' & \sigma_y^2 \end{pmatrix}. \qquad(3.5.1.1)$$
The hypothesis of interest is then $H_0\colon (V_{12}\;\;V_{1y}) = 0$. Let $T = (X^{(2)\prime}, Y)'$ and
$$V(T) = \begin{pmatrix} V_{22} & V_{2y} \\ V_{2y}' & \sigma_y^2 \end{pmatrix} = V_T. \qquad(3.5.1.2)$$
Also let $V_{1T} = (V_{12}\;\;V_{1y})$ and let $V^*_{11}$, $V^*_T$, and $V^*_{1T}$ denote modified MLEs of the corresponding parameters as given in section 3.2. Population canonical correlations, the largest of which is the maximum correlation between any linear combination of $T$ and any linear combination of $X^{(1)}$, can be estimated by finding eigenvalues of
$$V_{11}^{*-1}\,V^*_{1T}\,V_T^{*-1}\,V_{1T}^{*\prime}. \qquad(3.5.1.3)$$
3.5.2. Null Distribution of a Test Statistic

From (3.4.2.10), conditional on $Y_{n,i}$, $X^{(1)}_{n[i]}$ is $p_1$-variate normal with mean given by (3.5.2.1) and
$$V(X^{(1)}_{n[i]} \mid Y_{n,i}) = V_{11} - \sigma_y^{-2}\,V_{1y} V_{1y}'.$$
Under $H_0\colon V_{1T} = 0$, $X^{(1)}_{n[i]}$, $1 \le i \le k$, is (unconditionally) normal with parameters $\mu_1$ and $V_{11}$, and $X^{(1)}_{n[i]}$ is independent of $Y_{n,i}$ as well as of $X^{(2)}_{n[i]}$. Thus by Anderson [1958, p. 324], eigenvalues of
$$S_{11}^{-1}\,S_{1T}\,S_T^{-1}\,S_{1T}', \qquad S_{1T} = (S_{12}\;\;S_{1y}), \qquad(3.5.2.2)$$
have the classical null distribution, and the test of $H_0\colon V_{1T} = 0$ can be carried out without difficulty. The matrices involved are ordinary sample sums of products matrices. The non-null distribution of the eigenvalues will be exceedingly complex.
CHAPTER IV

STATISTICAL INFERENCE FOR OTHER MULTIVARIATE DISTRIBUTIONS

4.1. Introduction.

Suppose that $(X_i, Y_i)$ has a bivariate d.f. such that $Y_i$ has marginal density $g_\lambda(y)$, with at least the first two moments existing, where $\lambda$ is a $1 \times \ell$ vector. Suppose also that, conditional on $Y_i$, $X_i$ has a normal distribution with linear regression and homoscedasticity. Define
$$E(X_i \mid Y_i = y) = \mu_x + \beta\,(y - \mu_y), \qquad V(X_i \mid Y_i = y) = \sigma^2. \qquad(4.1.1)$$
It follows that
$$\operatorname{Corr}(X_i, Y_i) = \rho = \beta\,\sigma_y/\sigma_x. \qquad(4.1.2)$$
The information matrix of $\theta = (\mu_x, \sigma, \beta, \lambda)$ and the large sample covariance matrix of $\hat\theta$ will be calculated. Joint MLE or modified MLE of $\theta$ will be derived, with the MLE of $\lambda$ being independent of the MLE of $\mu_x$, $\sigma$, and $\beta$. These estimators will be used to estimate $\rho$ and $\sigma_x$. The likelihood ratio test of $H_0\colon \beta = 0$ will be displayed. The test statistic does not depend on $g_\lambda$. The properties of the test will be studied for $Y_i$ having an exponential or extreme value distribution.
4.2. Inference when Order Statistics are from Any Parent Distribution but Concomitant Variables are Conditionally Normally Distributed

4.2.1. Cramér-Rao Lower Bounds

The joint density of $Y_{n,1}, \dots, Y_{n,k}$ is
$$\frac{n!}{(n-k)!}\,\Big\{\prod_{i=1}^k g_\lambda(y_i)\Big\}\,[1 - G_\lambda(y_k)]^{n-k}, \qquad y_1 \le y_2 \le \dots \le y_k, \qquad(4.2.1.1)$$
where $G_\lambda$ is the d.f. corresponding to the density $g_\lambda$. The conditional density of $X_{n[1]}, \dots, X_{n[k]}$ given $Y_{n,1}, \dots, Y_{n,k}$ is, by Lemma 1 of Bhattacharya [1974],
$$\prod_{i=1}^k \big\{\phi\big((x_i - \mu_x - \beta(y_{n,i} - \mu_y))/\sigma\big)/\sigma\big\}, \qquad(4.2.1.2)$$
and $X_{n[1]}, \dots, X_{n[k]}$ are conditionally independent. As before, $\phi$ denotes the univariate standard normal density. Denote the log likelihood function of $Y_{n,1}, \dots, Y_{n,k}$ by
$$L^Y_{n,k} = \sum_{i=1}^k \log g_\lambda(Y_{n,i}) + (n-k)\,\log[1 - G_\lambda(Y_{n,k})]. \qquad(4.2.1.3)$$
The total log likelihood function $L_{n,k}$ of $(X_{n[i]},\,Y_{n,i},\ 1 \le i \le k)$ is then, aside from constant terms,
$$L_{n,k} = L^Y_{n,k} - k\log\sigma - \frac{1}{2\sigma^2}\,Z'Z, \qquad(4.2.1.4)$$
where
$$Z = X - \mu_x 1 - \beta\,(Y - \mu_y 1)$$
and $X = (X_{n[1]}, \dots, X_{n[k]})'$, $Y = (Y_{n,1}, \dots, Y_{n,k})'$. Let $\tau$ denote the $1 \times (\ell - 1)$ parameter vector in $\lambda$ aside from $\mu_y$. The partial derivatives of $L_{n,k}$ with respect to $\mu_x$, $\sigma$, $\beta$, $\mu_y$, and $\tau$ are given by (4.2.1.5) through (4.2.1.9).
Now let the information matrix derived from $L^Y_{n,k}$ be
$$I_\lambda = \begin{pmatrix} I_{\mu_y\mu_y} & I_{\mu_y\tau} \\ I_{\mu_y\tau}' & I_{\tau\tau} \end{pmatrix} \qquad(4.2.1.10)$$
and let
$$u_{n,i} = E(Y_{n,i} - \mu_y)/\sigma_y, \quad u = (u_{n,1}, \dots, u_{n,k})', \quad w_i = E(Y_{n,i} - \mu_y)^2/\sigma_y^2, \quad w = (w_1, \dots, w_k)'. \qquad(4.2.1.11)$$
Then the information matrix $I$ for $\theta = (\mu_x, \sigma, \beta, \mu_y, \tau)$ is
$$I = \begin{pmatrix} k/\sigma^2 & 0 & \sigma_y 1'u/\sigma^2 & -k\beta/\sigma^2 & 0 \\ 0 & 2k/\sigma^2 & 0 & 0 & 0 \\ \sigma_y 1'u/\sigma^2 & 0 & \sigma_y^2\,1'w/\sigma^2 & -\beta\sigma_y 1'u/\sigma^2 & 0 \\ -k\beta/\sigma^2 & 0 & -\beta\sigma_y 1'u/\sigma^2 & I_{\mu_y\mu_y} + k\beta^2/\sigma^2 & I_{\mu_y\tau} \\ 0 & 0 & 0 & I_{\mu_y\tau}' & I_{\tau\tau} \end{pmatrix}. \qquad(4.2.1.12)$$
The inverse of the upper left $3 \times 3$ submatrix of $I$ is
$$\begin{pmatrix} \sigma^2\,1'w/kd & 0 & -\sigma^2\,1'u/kd\sigma_y \\ 0 & \sigma^2/2k & 0 \\ -\sigma^2\,1'u/kd\sigma_y & 0 & \sigma^2/d\sigma_y^2 \end{pmatrix}, \qquad d = 1'w - (1'u)^2/k. \qquad(4.2.1.13)$$
This yields
$$I^{-1} = \begin{pmatrix} \sigma^2\,1'w/kd + \beta^2 f & 0 & -\sigma^2\,1'u/kd\sigma_y & \beta f \\ 0 & \sigma^2/2k & 0 & 0 \\ -\sigma^2\,1'u/kd\sigma_y & 0 & \sigma^2/d\sigma_y^2 & 0 \\ \beta f & 0 & 0 & f \end{pmatrix} \qquad(4.2.1.14)$$
(rows and columns for $\tau$ omitted), where $f$ is the $(\mu_y, \mu_y)$ element of $I_\lambda^{-1}$, $I_\lambda$ being $\ell \times \ell$ and $I_{\tau\tau}$ being $(\ell - 1) \times (\ell - 1)$.
When $g_\lambda$ is a one-parameter distribution such as the exponential, $\tau$ is not defined. For this case we obtain the large sample covariance matrix
$$\begin{pmatrix} \sigma^2\,1'w/kd + \beta^2/I_{\mu_y\mu_y} & 0 & -\sigma^2\,1'u/kd\sigma_y & \beta/I_{\mu_y\mu_y} \\ 0 & \sigma^2/2k & 0 & 0 \\ -\sigma^2\,1'u/kd\sigma_y & 0 & \sigma^2/d\sigma_y^2 & 0 \\ \beta/I_{\mu_y\mu_y} & 0 & 0 & I_{\mu_y\mu_y}^{-1} \end{pmatrix} \qquad(4.2.1.15)$$
and $\sigma_y$ is a function of $\mu_y$. To place this within the framework of section 2.3, Cramér-Rao lower bounds on the parameters $\rho$ and $\sigma_x$, which are derived from $\beta$, $\sigma$, and $\sigma_y$, can be calculated from (4.2.1.14) by standard large-sample methods.
4.2.2. Estimation.

As in previous chapters, we will place a $*$ by a parameter to denote an estimator which is derived from maximum likelihood estimators of $\lambda$ or by another efficient method such as linearization of $L^Y_{n,k}$ (Chan [1967]). A $\hat{}$ over a parameter will denote a true MLE.

To estimate $\mu_x$, $\sigma$, $\beta$, we set the partial derivatives of $L_{n,k}$ in (4.2.1.5) through (4.2.1.7) to zero. Define $\bar X$, $\bar Y$, $S_x^2$, $S_y^2$, and $S_{xy}$ as in (2.4.1). Dropping $*$ and $\hat{}$ for the moment, we find from (4.2.1.5)
$$\bar X - \mu_x = \beta\,(\bar Y - \mu_y), \qquad(4.2.2.1)$$
from (4.2.1.6)
$$\sigma^2 = S_x^2 - 2\beta S_{xy} + \beta^2 S_y^2, \qquad(4.2.2.2)$$
and
$$\beta = S_{xy}/S_y^2 \qquad(4.2.2.3)$$
from (4.2.1.7). Note that $\hat\beta$ is the usual regression estimator. Since it is MLE not depending on $L^Y_{n,k}$ we write
$$\beta^* = \hat\beta = S_{xy}/S_y^2. \qquad(4.2.2.4)$$
Substituting $\hat\beta$ into (4.2.2.2) we find
$$\hat\sigma^2 = S_x^2 - S_{xy}^2/S_y^2, \qquad(4.2.2.5)$$
the usual residual squared error estimator from regression. Using (4.2.2.1), (4.2.1.8) becomes
$$(\partial/\partial\mu_y)\,L_{n,k} = (\partial/\partial\mu_y)\,L^Y_{n,k}, \qquad(4.2.2.6)$$
which along with (4.2.1.9) implies that $(\mu_y^*, \tau^*)$ are MLE or other efficient estimators based on $L^Y_{n,k}$ alone. From (4.2.2.1) we complete the estimation by
$$\mu_x^* = \bar X - \beta^*\,(\bar Y - \mu_y^*), \qquad(4.2.2.7)$$
which, of course, is the same as (2.4.10).
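In code, the closed-form estimators (4.2.2.3), (4.2.2.5), and (4.2.2.7) amount to the following sketch (illustrative names only; $\mu_y^*$ must be supplied by an efficient estimator based on the order statistics alone):

```python
# Sketch of the section 4.2.2 estimators from the k censored pairs.
import numpy as np

def concomitant_estimates(x, y, mu_y_star):
    xbar, ybar = x.mean(), y.mean()
    s2y = ((y - ybar) ** 2).mean()                      # S_y^2
    sxy = ((x - xbar) * (y - ybar)).mean()              # S_xy
    beta = sxy / s2y                                    # (4.2.2.3)
    sigma2 = ((x - xbar) ** 2).mean() - sxy ** 2 / s2y  # (4.2.2.5)
    mu_x = xbar - beta * (ybar - mu_y_star)             # (4.2.2.7)
    return mu_x, sigma2, beta
```

For exactly linear data the slope is recovered and the residual variance estimate is zero.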
For the derived parameters we obtain, using (4.1.2),
$$\sigma_x^{*2} = \hat\sigma^2 + \beta^{*2}\,\sigma_y^{*2}, \qquad \rho^* = \beta^*\,\sigma_y^*/\sigma_x^*, \qquad(4.2.2.8)$$
as in (2.4.9) and (2.4.7). When the estimators of $\lambda$ are MLE it follows that $\sigma_x^*$ and $\rho^*$ are also. This statement can be proved by writing $\sigma_x^*$ and $\rho^*$ as functions of the MLE of $\theta$ and noting that the transformation is one-to-one.

Formulas for the expected values of estimators in section 2.4 are also valid here. In particular, (2.4.11) and (2.4.13) imply
$$E(\beta^*) = \beta, \qquad V(\beta^*) = (\sigma^2/k)\,E(S_y^{-2}), \qquad(4.2.2.9)$$
so $\beta^*$ is unbiased. This exact variance should be compared with the large sample variance from (4.2.1.14),
$$V(\beta^*) \approx \sigma^2/d\,\sigma_y^2, \qquad(4.2.2.10)$$
which for a complete sample $(n = k)$ equals
$$\sigma^2/n\,\sigma_y^2. \qquad(4.2.2.11)$$
It can also be shown, by Jensen's inequality, that
$$V(\beta^*) \ge \sigma^2/\,k\,E(S_y^2). \qquad(4.2.2.12)$$
4.2.3. Likelihood Ratio Test of Independence

By (4.2.2.6), estimation of $\lambda$ is unaffected by restrictions on $\beta$. Hence the log likelihood ratio depends only on the conditional normality of $X$. Thus the argument in section 2.5.1 applies: the likelihood ratio statistic is a function of the product moment correlation $r^*$, and we use the test statistic $T^*$ in (2.5.1.8). Theorem 2.5.2.1 used only the premise of conditional normality, so conditional on $S_y^2$, $T^*$ has an F d.f. with DF $(1,\,k-2)$ and non-centrality parameter $\{\rho^2/(1-\rho^2)\}\,k\,S_y^2/\sigma_y^2$. The unconditional distribution of $T^*$ then depends only on $g_\lambda(y)$, and under $H_0\colon \beta = 0$, $\rho = 0$ and $T^*$ has an unconditional F d.f.
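The resulting test is easy to carry out; a minimal sketch (assuming only that the $k$ censored pairs are supplied, and using scipy for the F distribution):

```python
# Sketch of the test of H0: beta = 0 via the product moment correlation
# r* of the k pairs; the F-ratio T* has DF (1, k-2) under H0.
import numpy as np
from scipy.stats import f as f_dist

def independence_test(x, y):
    k = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t_star = (r ** 2) / ((1.0 - r ** 2) / (k - 2))
    return t_star, f_dist.sf(t_star, 1, k - 2)   # statistic and p-value
```

Uncorrelated data give $T^* = 0$ and p-value 1; strongly associated pairs give a small p-value.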
4.3.
-.
Example:
Let
Order Statistics from Extteme Value Parent Distribution
Yl' ••• 'Y
n
denote random variables from the extreme value
type I distribution with
G (y). exp[
A
.
d.f.
_e-(y-~)/e] ,
..atJ<y<
CXl
,
e>0
This distribution is a useful model for life-testing.
(4.3.1)
An
interpreta-
tion of its parameters is given by the moments,
E(Y i ) -
~
+
ye
(4.3.2)
95
where $\gamma$ is Euler's constant (.57721...). The location parameter $\xi$ can be efficiently estimated by linear combinations of order statistics while the scale parameter $\theta$ cannot; see Johnson and Kotz [1970, p. 284] for a summary of efficiencies of various estimators. Harter and Moore [1968] showed that while the MLE of $\xi$ and $\theta$ have considerable bias, the MLE of $\xi$ based on a censored sample is still comparable to the best linear unbiased estimator in terms of mean squared error, and the MLE of $\theta$ is superior to the best linear unbiased estimator.
Let
$$\Gamma_w(a) = \int_0^w u^{a-1}\,e^{-u}\,du \qquad(4.3.3)$$
denote the incomplete gamma function and $\Gamma_w'(a) = (\partial\Gamma_w(t)/\partial t)\big|_{t=a}$, etc. Let
$$\gamma'(a) = \Gamma_\infty'(a) - \Gamma_{-\log p}'(a), \qquad \gamma''(a) = \Gamma_\infty''(a) - \Gamma_{-\log p}''(a), \qquad(4.3.4)$$
where $p = k/n$ is the proportion of order statistics observed. $\gamma'(a)$, $\gamma''(a)$, etc. are related to incomplete polygamma functions.
Specializing the results of Harter and Moore [1968] to the case of censoring on the right, we find as elements of the information matrix of $(\xi, \theta)$ quantities $a$, $b$, and $c$, e.g.
$$-[\gamma'(2) + p\,q\log p + (1-p)^{-1}\,p\,q^2\log p] = b$$
and
$$\dots + p\,q\log p\,\{2 + q + (1-p)^{-1}\,q\log p\} = c, \qquad(4.3.5)$$
where $q = \log(-\log p)$ and $L^Y_{n,k}$ is the log likelihood function based on the order statistics as defined in (4.2.1.3). So the "asymptotic information matrix" of $(\xi, \theta)$ is
$$\frac{n}{\theta^2}\begin{pmatrix} a & b \\ b & c \end{pmatrix}. \qquad(4.3.6)$$
To draw inferences from the induced order statistics $X$ we need to identify the transformed parameters
$$\mu_y = E(Y_i) = \xi + \gamma\theta, \qquad \sigma_y = [V(Y_i)]^{1/2} = \theta\pi/\sqrt6, \qquad(4.3.7)$$
since the parameters associated with $X$ are defined by conditional moments of $X$ given $Y$. The large sample covariance matrix of the MLE of $\xi + \gamma\theta$ and $\theta\pi/\sqrt6$ ($\mu_y$ and $\sigma_y$ respectively) is
$$\frac{\theta^2}{n\,(ac - b^2)}\begin{pmatrix} \gamma^2 a + c - 2\gamma b & (\pi/\sqrt6)\,(\gamma a - b) \\ (\pi/\sqrt6)\,(\gamma a - b) & (\pi^2/6)\,a \end{pmatrix} \qquad(4.3.8)$$
with corresponding information matrix
$$\frac{n}{\theta^2}\begin{pmatrix} a & (\sqrt6/\pi)\,(b - \gamma a) \\ (\sqrt6/\pi)\,(b - \gamma a) & (6/\pi^2)\,(\gamma^2 a + c - 2\gamma b) \end{pmatrix}. \qquad(4.3.9)$$
Using this, the asymptotic joint lower bounds for the variances of estimators of $(\mu_x, \sigma, \beta, \mu_y, \sigma_y)$ can be readily obtained using (4.2.1.14). For example, we find that the lower bound for the variance of $\mu_x^*$ from (4.2.1.14) is
$$\sigma^2\,1'w/kd + \beta^2\,V(\mu_y^*), \qquad(4.3.10)$$
where $V(\mu_y^*)$ is the upper left element of (4.3.8), and $u$, $w$, and $d$ in (4.2.1.13) can be computed using the first two moments of extreme value order statistics (see Johnson and Kotz [1970, p. 279]).
To study the test of independence of $X_i$ and $Y_i$, or $H_0\colon \beta = 0$, we need to study the distribution of
$$U^* = \sigma_y^{-2}\,k\,S_y^2 = (6/\pi^2)\sum_{i=1}^k (Z_{n,i} - \bar Z)^2, \qquad(4.3.11)$$
where $Z_{n,i}$, $1 \le i \le k$, are order statistics from the standard extreme value distribution ($\xi = 0$, $\theta = 1$). $E(U^*)$ is most easily approximated by simulation. For this purpose, 2000 $Y$ vectors were generated for each $n, k$ combination. The average values of $U^*$ are as follows:
n       k      E(U*)
20      10     1.518
100     20     1.250
1000    50     1.419          (4.3.12)
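The simulation just described can be sketched as follows (a sketch under the stated model; the replication count and random seed are arbitrary choices, not from the original study):

```python
# Monte Carlo approximation of E(U*) in (4.3.11): U* is (6/pi^2) times
# the sum of squared deviations of the k smallest standard extreme
# value order statistics out of n.
import numpy as np

def mean_u_star(n, k, reps=2000, seed=0):
    rng = np.random.default_rng(seed)
    vals = np.empty(reps)
    for r in range(reps):
        z = np.sort(rng.gumbel(size=n))[:k]     # k smallest of n
        vals[r] = (6.0 / np.pi ** 2) * ((z - z.mean()) ** 2).sum()
    return vals.mean()
```

For $n = 20$, $k = 10$ the average should be near the tabulated value 1.5.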
To estimate the unconditional power of the test, the conditional power was computed for each simulated value of $U^*$ and each value of $\rho$ by using the noncentral F distribution. These 2000 powers were then averaged to estimate the unconditional power of the test. For the one moment power approximations, the corresponding noncentral F probabilities were evaluated using $E(U^*)$ in (4.3.12). The results are in Table 4.1.
TABLE 4.1
ESTIMATED POWER OF 5% TEST FOR CENSORED DATA FROM
THE EXTREME VALUE DISTRIBUTION

n       k      rho    Power    Power from One Moment Approximation
20      10     .1     .0514    .0514
               .3     .0636    .0636
               .5     .0967    .0965
               .9     .5686    .6079
100     20     .1     .0513    .0513
               .3     .0628    .0628
               .5     .0940    .0939
               .9     .5677    .5889
1000    50     .1     .0516    .0516
               .3     .0656    .0656
               .5     .1035    .1035
               .9     .6581    .6738

The power approximation is apparently satisfactory.
4.4. Example: Order Statistics from Exponential Parent Distribution

Let $Y_i$ have density function
$$g_\lambda(y) = \sigma_y^{-1}\,\exp(-y/\sigma_y), \qquad y > 0, \qquad(4.4.1)$$
so that $\mu_y = \sigma_y$.
This implies
$$L^Y_{n,k} = \log\frac{n!}{(n-k)!} - k\log\sigma_y - \sigma_y^{-1}\Big[\sum_{i=1}^k Y_{n,i} + (n-k)\,Y_{n,k}\Big], \qquad(4.4.2)$$
$$\hat\sigma_y = k^{-1}\Big(\sum_{i=1}^k Y_{n,i} + (n-k)\,Y_{n,k}\Big) = \bar Y + k^{-1}(n-k)\,Y_{n,k}. \qquad(4.4.3)$$
The MLE $\hat\sigma_y$ has several desirable properties, among them being that it is the best linear unbiased estimator for $\sigma_y$ (Sarhan and Greenberg [1962, p. 354]). Its variance is $\sigma_y^2/k$, which is also the reciprocal of
$$I_{\sigma_y} = k\,\sigma_y^{-2}. \qquad(4.4.4)$$
Using $\hat\sigma_y$ we can estimate $\mu_x$, $\sigma_x$, and $\rho$ with (4.2.2.7) and (4.2.2.8), and using (2.4.15), (2.4.16), and (2.4.17), we find that
$$\mu_x^* = \bar X + (n-k)\,S_{xy}\,Y_{n,k}/k\,S_y^2, \qquad E(\mu_x^*) = \mu_x \qquad(4.4.5)$$
(note that the last term may be large for a large amount of censoring); the unbiasedness holds since
$$E(\hat\sigma_y) = \sigma_y. \qquad(4.4.6)$$
From Sarhan and Greenberg [1962, p. 343], the order statistics have the following moments, with $Z_{n,i} = Y_{n,i}/\sigma_y$:
$$a_{n,i} = E(Z_{n,i}) = \sum_{t=1}^{i} (n-t+1)^{-1}, \qquad(4.4.7)$$
$$b_{n,i} = V(Z_{n,i}) = \operatorname{Cov}(Z_{n,i},\,Z_{n,j}) = \sum_{t=1}^{i} (n-t+1)^{-2}, \qquad i \le j.$$
So
$$1'u = \sum_{i=1}^k E(Z_{n,i} - 1) = -(n-k)\,a_{n,k}, \qquad(4.4.8)$$
using (4.4.3), and
$$1'w = \sum_{i=1}^k E(Z_{n,i} - 1)^2 = \sum_{i=1}^k (b_{n,i} + a_{n,i}^2 - 2a_{n,i} + 1).$$
Now
$$\sum_{i=1}^k b_{n,i} = \sum_{i=1}^k \sum_{t=1}^i (n-t+1)^{-2} = \sum_{i=1}^k (k-i+1)(n-i+1)^{-2}$$
$$= \sum_{i=1}^k \{(n-i+1) + [(k-i+1)-(n-i+1)]\}\,(n-i+1)^{-2} = a_{n,k} - (n-k)\,b_{n,k} \qquad(4.4.9)$$
and
$$\sum_{i=1}^k a_{n,i}^2 = \sum_{i=1}^k (k-i+1)(n-i+1)^{-2} + 2\sum_{i>l} (k-i+1)(n-i+1)^{-1}(n-l+1)^{-1}$$
$$= a_{n,k} - (n-k)\,b_{n,k} + 2\sum_{i>l} \{(n-i+1) + [(k-i+1)-(n-i+1)]\}\,(n-i+1)^{-1}(n-l+1)^{-1}. \qquad(4.4.10)$$
The rightmost term is
$$2\sum_{i>l} (n-l+1)^{-1} - 2(n-k)\sum_{i>l} (n-i+1)^{-1}(n-l+1)^{-1},$$
whose first term is
$$2\sum_{l=1}^k (k-l)(n-l+1)^{-1} = 2\sum_{l=1}^k \{(n-l+1) + [(k-l)-(n-l+1)]\}\,(n-l+1)^{-1} = 2k - 2(n-k+1)\,a_{n,k}$$
and second term is
$$(n-k)\,(b_{n,k} - a_{n,k}^2),$$
so (4.4.10) equals
$$2k - [1 + 2(n-k)]\,a_{n,k} - (n-k)\,a_{n,k}^2,$$
implying
$$\sum_{i=1}^k (b_{n,i} + a_{n,i}^2) = 2k - 2(n-k)\,a_{n,k} - (n-k)\,(b_{n,k} + a_{n,k}^2). \qquad(4.4.11)$$
Thus
$$1'w = k - (n-k)\,E(Z_{n,k}^2), \qquad(4.4.12)$$
$$d = 1'w - (1'u)^2/k = k - (n-k)\,E(Z_{n,k}^2) - (n-k)^2\,a_{n,k}^2/k. \qquad(4.4.13)$$
With these quantities, the large sample covariance matrix in (4.2.1.15) can readily be evaluated.
An indication of the power of the $r^*$ correlation test is given by the expected value of
$$U^* = \sum_{i=1}^k Z_{n,i}^2 - k^{-1}\Big(\sum_{i=1}^k Z_{n,i}\Big)^2. \qquad(4.4.14)$$
By combining (4.4.9) and (4.4.11), the expected value of the first term is
$$E\Big(\sum_{i=1}^k Z_{n,i}^2\Big) = 2k - 2(n-k)\,a_{n,k} - (n-k)\,(b_{n,k} + a_{n,k}^2). \qquad(4.4.15)$$
Using the fact that
$$V\Big[\sum_{i=1}^k Z_{n,i} + (n-k)\,Z_{n,k}\Big] = k,$$
we obtain
$$E\Big\{\Big(\sum_{i=1}^k Z_{n,i}\Big)^2\Big\} = k + \Big(\sum_{i=1}^k E\,Z_{n,i}\Big)^2 - (n-k)^2\,b_{n,k} - 2(n-k)\,\operatorname{Cov}\Big(\sum_{i=1}^k Z_{n,i},\,Z_{n,k}\Big). \qquad(4.4.16)$$
The second term is obtained from (4.4.8). For the covariance term,
$$\operatorname{Cov}\Big(\sum_{i=1}^k Z_{n,i},\,Z_{n,k}\Big) = \sum_{i=1}^k \operatorname{Cov}(Z_{n,i},\,Z_{n,k}) = \sum_{i=1}^k b_{n,i} \quad(\text{since } i \le k)$$
$$= a_{n,k} - (n-k)\,b_{n,k} \quad\text{by (4.4.9)}. \qquad(4.4.17)$$
Thus we obtain
$$E\Big\{\Big(\sum_{i=1}^k Z_{n,i}\Big)^2\Big\} = k + k^2 - 2(n-k)(k+1)\,a_{n,k} + (n-k)^2\,(b_{n,k} + a_{n,k}^2) \qquad(4.4.18)$$
and, using (4.4.15),
$$E(U^*) = k - 1 + k^{-1}(n-k)\,[2\,a_{n,k} - n\,(b_{n,k} + a_{n,k}^2)]. \qquad(4.4.19)$$
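The moments (4.4.7) and the closed form (4.4.19) can be checked numerically; the following sketch (illustrative code using exact rational arithmetic) evaluates $E(U^*)$ both from (4.4.19) and directly from the order statistic moments:

```python
# a_{n,i}, b_{n,i} of (4.4.7) and two routes to E(U*) of (4.4.19).
from fractions import Fraction

def a(n, i):   # a_{n,i} = sum_{t=1}^i 1/(n-t+1)
    return sum(Fraction(1, n - t + 1) for t in range(1, i + 1))

def b(n, i):   # b_{n,i} = sum_{t=1}^i 1/(n-t+1)^2
    return sum(Fraction(1, (n - t + 1) ** 2) for t in range(1, i + 1))

def e_u_star(n, k):
    # Closed form (4.4.19).
    return (k - 1) + Fraction(n - k, k) * (2 * a(n, k) - n * (b(n, k) + a(n, k) ** 2))

def e_u_star_direct(n, k):
    # E[sum Z^2] - (1/k) E[(sum Z)^2], using Cov(Z_i, Z_j) = b_{n,min(i,j)}.
    ez2 = sum(b(n, i) + a(n, i) ** 2 for i in range(1, k + 1))
    var_sum = sum(b(n, min(i, j)) for i in range(1, k + 1) for j in range(1, k + 1))
    mean_sum = sum(a(n, i) for i in range(1, k + 1))
    return ez2 - Fraction(1, k) * (var_sum + mean_sum ** 2)
```

For a complete sample ($n = k$) the formula reduces to $E(U^*) = k - 1$, as it should since $V(Z_i) = 1$.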
For large amounts of censoring, $E(U^*)$ is small in comparison with the normal distribution (for $n = 1000$, $k = 50$: $E(U^*) = 6.79$ for the normal distribution, 0.011 for the exponential). Now since $p = k/n$, for large $n$ and fixed $p$ we have $a_{n,k} \to -\log(1-p)$ and $b_{n,k} \approx (n-k)^{-1} - n^{-1} = p/[n(1-p)]$, so that
$$(k-1)^{-1}\,E(U^*) \to 1 - p^{-2}\,(1-p)\,\log^2(1-p). \qquad(4.4.20)$$
This quantity does not exceed .5 until $p > .95$. Since $U^*$ relates to the non-centrality parameter of the conditional F distribution, power characteristics of the test of
$H_0\colon \beta = 0$ will be inferior under this exponential model with any substantial amount of censoring. Where the exponential distribution is most dense (in its left tail), there is little variation in the smallest order statistics. Either the linear regression model or the assumption of homoscedastic variances is unreasonable for this setup. Thus while the extreme value distribution model performed quite well, this theory may be of limited utility in some non-normal models.
It is interesting to consider the marginal density of $X$ in this model where $Y$ has the exponential distribution. Taking $\mu_x = 0$ and $\sigma_x = \sigma_y = 1$ for simplicity, $X$ has the standard normal density when $\rho = 0$, and when $\rho \ne 0$ it has density
$$f(x) = \int_0^\infty [2\pi(1-\rho^2)]^{-1/2}\,\exp\{-[x - \rho(y-1)]^2/2(1-\rho^2)\}\,\exp(-y)\,dy \qquad(4.4.21)$$
$$= |\rho|^{-1}\,\exp\{(1 - 3\rho^2 - 2\rho x)/2\rho^2\}\;\Phi\big((x + \rho - (1-\rho^2)/\rho)\,/\,\sqrt{1-\rho^2}\big),$$
where $\Phi(\cdot)$ is the standard normal d.f. This density closely resembles the normal density for moderate $\rho$ and becomes more skewed as $\rho$ becomes large in absolute value; however, its first two moments remain unchanged. A graph of $f(x)$ is shown in Figure 4.1.
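The closed form in (4.4.21) can be verified numerically, e.g. as follows (a sketch for $\rho > 0$ under the stated standardization $\mu_x = 0$, $\sigma_x = \sigma_y = 1$; for $\rho < 0$ the sign of the argument of $\Phi$ flips):

```python
# Check of (4.4.21): closed form vs. direct quadrature of the defining
# integral for the marginal density of X when Y is standard exponential.
import math
from scipy.integrate import quad
from scipy.stats import norm

def f_closed(x, rho):
    s2 = 1.0 - rho ** 2
    return (1.0 / abs(rho)) \
        * math.exp((1.0 - 3.0 * rho ** 2 - 2.0 * rho * x) / (2.0 * rho ** 2)) \
        * norm.cdf((x + rho - s2 / rho) / math.sqrt(s2))

def f_integral(x, rho):
    s2 = 1.0 - rho ** 2
    g = lambda y: math.exp(-(x - rho * (y - 1.0)) ** 2 / (2.0 * s2)) \
        / math.sqrt(2.0 * math.pi * s2) * math.exp(-y)
    return quad(g, 0.0, math.inf)[0]
```

The two evaluations agree to quadrature accuracy over a range of $x$ and moderate $\rho$.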
Figure 4.1. Graph of Marginal Density of $X_i$ when $Y_i$ has an Exponential Distribution
A model in which the marginal distribution of $X$ should depend on $\rho$ is one in which both $X$ and $Y$ are reflections of another variable with a strength of association $\rho$.

The problems with this exponential model can be remedied if, instead of linear regression holding, the regression on $X_i$ is log-linear, i.e.,
$$E(X_i \mid Y_i) = \mu_x + \beta\,(T_i - \mu_y), \qquad\text{where } T_i = \log Y_i. \qquad(4.4.22)$$
As far as estimation of parameters associated with $X_i$ is concerned, we could use the extreme value distribution for $Y_i$ as discussed earlier. This is a generalization of the log linear model, since $-T_i$ has the extreme value distribution with parameters $\xi = -\log\sigma_y$, $\theta = 1$. However, the present discussion will be for the log linear model with $Y_i$ having an exponential distribution.
Using properties of the extreme value distribution,
$$\mu_T = E(T_i) = \log\sigma_y - \gamma, \qquad(4.4.23)$$
and since $2k\,\hat\sigma_y/\sigma_y \sim \chi^2_{2k}$ (Epstein and Sobel [1953], in Sarhan and Greenberg [1962, p. 362]), it follows that the moment generating function of $\log\hat\sigma_y$ is
$$m(t) = (\sigma_y/k)^t\,\Gamma(t+k)/\Gamma(k). \qquad(4.4.24)$$
Therefore,
$$E(\log\hat\sigma_y) = \log(\sigma_y/k) + \psi(k), \qquad V(\log\hat\sigma_y) = m''(0) - m'(0)^2 = \psi'(k), \qquad(4.4.25)$$
where
$$\psi(k) = \Gamma'(k)/\Gamma(k) = -\gamma + \sum_{l=1}^{k-1} l^{-1} \qquad(4.4.26)$$
and
$$\psi'(k) = \pi^2/6 - \sum_{l=1}^{k-1} l^{-2} \approx k^{-1}.$$
Note that
$$\lim_{k\to\infty}\,[\psi(k) - \log k] = \lim_{k\to\infty} \psi'(k) = 0.$$
An unbiased estimator of $\mu_T$ is
$$\mu_T^* = \log\hat\sigma_y + \log k - \psi(k) - \gamma, \qquad(4.4.27)$$
with variance $\psi'(k)$.
From Johnson and Kotz [1970, p. 285], for any $k, n$, $V(\mu_T^*) = \psi'(k)$ is equal to the variance of the MLE of $E(T_i)$ for a complete sample of size $k$. From $\mu_T^*$ an unbiased estimator of $\mu_x$ can be produced, as well as simple estimators of $\rho$ and $\sigma_x$, since $V(T_i)$ is a known constant. All parameters are estimated in the manner described previously with $Y$ replaced by $T = \log Y$, e.g.
$$\beta^* = S_{xT}/S_T^2. \qquad(4.4.28)$$
The power function of the test of $H_0\colon \beta = 0$ for this model is different from that of section 4.3, since here we are sampling from the opposite tail of the distribution of $Y_i$. The average values of $U^*$ here are:

n       k      E(U*)
20      10     5.653
100     20     12.244
1000    50     30.545

The power estimates, calculated as in Table 4.1, are shown in Table 4.2.
TABLE 4.2
ESTIMATED POWER OF 5% TEST FOR CENSORED DATA
UNDER THE LOG-LINEAR MODEL

n       k      rho    Power     Power from One Moment Approximation
20      10     .1     .0560     .0560
               .3     .1102     .1100
               .5     .2486     .2569
               .9     .9079     .9957
100     20     .1     .0629     .0628
               .3     .1804     .1808
               .5     .4516     .4810
               .9     .9953     .9999
1000    50     .1     .0847     .0846
               .3     .3910     .3988
               .5     .8289     .8783
               .9     1.0000    1.0000

The one moment approximation is again satisfactory. By comparison with Table 2.3, the power characteristics of this model, when it holds, are greatly superior to those of the bivariate normal model under censoring.
CHAPTER V

DISTRIBUTION FREE TESTS OF INDEPENDENCE
BASED ON INDUCED ORDER STATISTICS

5.1. Introduction.

In this chapter we will derive locally most powerful rank tests for independence of $X_i$ and $Y_i$ when $(X_i, Y_i)$ are from a general bivariate distribution. The tests will be explored in more detail for the case when $X_i$ is conditionally normally distributed or when linear regression-type alternatives are appropriate. Empirical power studies will be made for $Y_i$ having a normal or extreme value distribution. An asymptotic study will then be made of relative efficiencies of rank tests of independence with regard to local regression alternatives. Finally, an omnibus nonparametric test of independence due to Hoeffding [1948] will be studied for the case of censored bivariate samples.
5.2. Locally Most Powerful Rank Test of Independence

Suppose that $Y_i$ has density $g(y)$ with d.f. $G(y)$, and $f_\theta(x|y)$ is the conditional density of $X_i$ given $Y_i = y$. We assume that $\theta$ is a parameter such that $X_i$ and $Y_i$ are independent if $\theta = 0$, and that $\theta$ is not a parameter of $g(y)$. When $\theta = 0$ we write $f_0(x|y) = f_0(x)$. With $X_{n[i]}$ and $Y_{n,i}$ defined as in section 1.4, the joint density of $X$ and $Y$ is
$$\frac{n!}{(n-k)!}\Big\{\prod_{i=1}^k f_\theta(x_i|y_i)\,g(y_i)\Big\}\,[1 - G(y_k)]^{n-k}, \qquad y_1 \le y_2 \le \dots \le y_k. \qquad(5.2.1)$$
The density when $X_i$ and $Y_i$ are independent is
$$\frac{n!}{(n-k)!}\Big\{\prod_{i=1}^k f_0(x_i)\,g(y_i)\Big\}\,[1 - G(y_k)]^{n-k}, \qquad y_1 \le y_2 \le \dots \le y_k. \qquad(5.2.2)$$
The likelihood ratio then is
$$L_{n,k} = \prod_{i=1}^k \{f_\theta(X_{n[i]} \mid Y_{n,i})\,/\,f_0(X_{n[i]})\}. \qquad(5.2.3)$$
It can easily be seen that
$$Q_{n,k} = (\partial L_{n,k}/\partial\theta)\big|_{\theta=0} = \sum_{i=1}^k \big[\partial\log f_\theta(X_{n[i]} \mid Y_{n,i})/\partial\theta\big]\big|_{\theta=0}. \qquad(5.2.4)$$
Let $S_{n,i}$ be the rank of $Y_{n,i}$ among $Y_1, \dots, Y_n$ (so $S_{n,i} = i$), and $S_n = (S_{n,1}, \dots, S_{n,k})$. For continuous distributions, the probability of a tie is zero and ranks are well defined with probability 1. For the induced order statistics, the rank of $X_{n[i]}$ among $X_1, \dots, X_n$ is unknown. Define $R_{k,i}$ to be the rank of $X_{n[i]}$ among $X_{n[1]}, \dots, X_{n[k]}$, and $R_k = (R_{k,1}, \dots, R_{k,k})$. Then following, for example, Cox and Hinkley [1974, p. 188], the locally most powerful rank test (l.m.p.r.t.) for testing $H_0\colon \theta = 0$ using only $R_k$ and $S_n$ is based on the statistic
$$T_{n,k} = E(Q_{n,k} \mid R_k,\,S_n,\,\theta = 0) = \sum_{i=1}^k c_{n,k}(i,\,R_i). \qquad(5.2.5)$$
Computation of $c_{n,k}(i,\,R_i)$, and therefore of $T_{n,k}$, is facilitated by using the fact that under $H_0$, $X_{n[1]}, \dots, X_{n[k]}$ represents a complete random sample of size $k$ from $f_0(x)$. In that case $X_{n[R_1]}, \dots, X_{n[R_k]}$ are the order statistics of the $k$ i.i.d. r.v. $X_{n[i]}$, $1 \le i \le k$, where we have abbreviated $R_{k,i}$ by $R_i$. Now under $H_0\colon X$ and $Y$ independent, $R_k$ can take on any of the $k!$ permutations of the integers $1, \dots, k$. This fact gives rise to an exact permutation test for $H_0\colon \theta = 0$.
Hoeffding [1951] showed that, under H_0,

    E(T_{n,k}) = k^{-1} Σ_{i=1}^k Σ_{j=1}^k c_{n,k}(i,j), and    (5.2.6)

    V(T_{n,k}) = (k-1)^{-1} Σ_{i=1}^k Σ_{j=1}^k d²_{n,k}(i,j),

where

    d_{n,k}(i,j) = c_{n,k}(i,j) - k^{-1} Σ_{h=1}^k c_{n,k}(i,h) - k^{-1} Σ_{h=1}^k c_{n,k}(h,j)
                   + k^{-2} Σ_{g=1}^k Σ_{h=1}^k c_{n,k}(g,h).    (5.2.7)
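The permutation moments (5.2.6)-(5.2.7) can be sketched directly: given the k-by-k score matrix c_{n,k}(i,j), form the doubly centered scores d_{n,k}(i,j) and assemble the null mean and variance of T_{n,k}. This is an illustrative sketch with hypothetical helper names, not code from the dissertation.

```python
# Sketch of (5.2.6)-(5.2.7): exact null mean and variance of
# T = sum_i c(i, R_i) when R is a uniform random permutation.
def permutation_moments(c: list[list[float]]) -> tuple[float, float]:
    k = len(c)
    row = [sum(c[i]) / k for i in range(k)]                 # row means
    col = [sum(c[i][j] for i in range(k)) / k for j in range(k)]  # column means
    grand = sum(row) / k                                    # grand mean
    # doubly centered scores d(i,j), eq. (5.2.7)
    d = [[c[i][j] - row[i] - col[j] + grand for j in range(k)] for i in range(k)]
    mean = k * grand                                        # = k^{-1} sum_i sum_j c(i,j)
    var = sum(d[i][j] ** 2 for i in range(k) for j in range(k)) / (k - 1)
    return mean, var
```

For a product score c(i,j) = a(i) b(j) this reduces to the familiar linear-rank-statistic moments with mean k·ā·b̄ and variance (k-1)^{-1} Σ(a-ā)² Σ(b-b̄)².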
He also showed that under suitable conditions on the c_{n,k}(i,j), V(T_{n,k})^{-1/2} (T_{n,k} - E T_{n,k}) converges in distribution to the standard normal distribution under H_0 as k → ∞. [Note that we have introduced the dependence of c_{n,k}(i,j) on the additional parameter n, which was not the case in Hoeffding [1951]; however, the same theory still applies.] Motoo (1957) weakened the conditions on the c_{n,k}(i,j) to the following: for every ε > 0,

    lim_{k→∞} Σ_{(i,j): |d_{n,k}(i,j)|/d_{n,k} > ε} d²_{n,k}(i,j)/d²_{n,k} = 0,    (5.2.8)

where d²_{n,k} = Σ_{i=1}^k Σ_{j=1}^k d²_{n,k}(i,j).

For many densities f_θ, the quantity c_{n,k}(i,j) will factor into a product a_{n,k}(i) b_{n,k}(j), so that the resulting test statistic is a linear rank statistic and the resulting test is easy to apply. There are distributions, however, for which such a factorization is not possible. An example is the Gumbel bivariate exponential distribution.
Assume now the more restricted model

    f_θ(x|y) = [2π(1-θ²)]^{-1/2} exp{-(x-θy)²/[2(1-θ²)]},    (5.2.9)

which does yield linear rank statistics. Location and scale parameters can be ignored because the rank tests to be considered are location and scale invariant. For this model,

    Q_{n,k} = Σ_{i=1}^k X_{n[i]} Y_{n,i}.

Define u_{k,i} = E(X_{k,i}), where X_{k,i} is the i-th order statistic out of k from f_0(x) = (2π)^{-1/2} exp(-x²/2). Also define t_{n,i} = E(Y_{n,i}), where as before Y_{n,i} is the i-th order statistic out of n from the parent distribution g(y). Then

    E(X_{n[i]} | R_k, θ = 0) = u_{k,R_i}  and  E(Y_{n,i} | S_{n,i}) = t_{n,i},    (5.2.10)

and since X_{n[i]} and Y_{n,i} are independent when θ = 0,

    T_{n,k} = E(Q_{n,k} | R_k, S_n, θ = 0) = Σ_{i=1}^k u_{k,R_i} t_{n,i}.    (5.2.11)
This statistic is useful not only for the conditional normal model but also for general monotonic association alternatives. As in Ghosh and Sen [1971], one can also define "mixed rank statistics" which are asymptotically efficient. These are

    M^X_{n,k} = Σ_{i=1}^k X_{n[i]} t_{n,i}   or   M^Y_{n,k} = Σ_{i=1}^k u_{k,R_i} Y_{n,i}.    (5.2.12)

The statistic M^Y_{n,k} might be preferable since the scores t_{n,i} become more dependent on g(y) for large amounts of censoring.

Since under H_0 all permutations of the integers 1,…,k for R_i, 1 ≤ i ≤ k, are equally likely, T_{n,k} provides a nonparametric test of independence. By Theorem II.3.1.c of Hájek and Šidák [1967, p. 61], under H_0,

    E(T_{n,k}) = k t̄ ū    (5.2.13)

and

    V(T_{n,k}) = σ²_u Σ_{i=1}^k (t_{n,i} - t̄)²,

where

    ū = k^{-1} Σ_{i=1}^k u_{k,i},   t̄ = k^{-1} Σ_{i=1}^k t_{n,i},  and    (5.2.14)

    σ²_u = (k-1)^{-1} Σ_{i=1}^k (u_{k,i} - ū)².
These results hold true for all t_{n,i} and u_{k,i}, whether or not these scores are defined as in (5.2.7). Assume that Σ_{i=1}^k (t_{n,i} - t̄)² > 0, that t_{n,1},…,t_{n,k} are any real constants, and that there exists a square integrable function τ(u) such that

    lim_{k→∞} ∫₀¹ [u_{k,1+[vk]} - τ(v)]² dv = 0.    (5.2.15)

Hájek and Šidák [1967] have shown (Theorem V.1.6.a) that under H_0, T_{n,k} is asymptotically normal for k → ∞ with mean and variance given in (5.2.13). The asymptotic variance of T_{n,k} can also be used:

    lim_{k→∞} V(T_{n,k}) = [Σ_{i=1}^k (t_{n,i} - t̄)²] ∫₀¹ [τ(u) - τ̄]² du,  where τ̄ = ∫₀¹ τ(u) du.    (5.2.16)
So for large k, critical values from the standard normal distribution can be compared to

    Z_{n,k} = (k-1)^{1/2} r_{n,k},    (5.2.17)

where

    r_{n,k} = Σ_{i=1}^k (u_{k,R_i} - ū)(t_{n,i} - t̄) / [Σ_{i=1}^k (u_{k,i} - ū)² Σ_{i=1}^k (t_{n,i} - t̄)²]^{1/2}    (5.2.18)

can be used as a rank measure of correlation. Another approximation is to treat

    (k-2)^{1/2} r_{n,k} / (1 - r²_{n,k})^{1/2}    (5.2.19)

as an approximate t random variable with k-2 D.F. This approximation is accurate for small sample sizes in many situations (e.g. logistic or normal generating distributions).
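The computation of r_{n,k} can be sketched as follows. The helper names are hypothetical, and the expected order statistics u_{k,i} and t_{n,i} are approximated by Blom-type plotting positions Φ^{-1}((i - 3/8)/(m + 1/4)) rather than by exact expectations; this is a sketch of the idea, not the dissertation's exact computation.

```python
# Sketch of the rank correlation r_{n,k} of (5.2.18) for a censored
# bivariate sample: the k concomitants X_{n[1]},...,X_{n[k]} (ordered
# by the k smallest Y's) are ranked among themselves and scored.
from statistics import NormalDist

def blom_scores(m: int, total: int) -> list[float]:
    """Approximate E of the i-th normal order statistic out of `total`,
    for i = 1..m, via Blom's formula Phi^{-1}((i - .375)/(total + .25))."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (total + 0.25)) for i in range(1, m + 1)]

def rank_correlation(x_conc: list[float], n: int) -> float:
    """x_conc: concomitants X_{n[1]},...,X_{n[k]}; n: full sample size."""
    k = len(x_conc)
    u = blom_scores(k, k)   # u_{k,i}: normal scores for the X ranks
    t = blom_scores(k, n)   # t_{n,i}: scores for the first k of n Y order statistics
    # R_i = rank of X_{n[i]} among the k observed concomitants
    order = sorted(range(k), key=lambda i: x_conc[i])
    R = [0] * k
    for rank, i in enumerate(order, start=1):
        R[i] = rank
    a = [u[R[i] - 1] for i in range(k)]   # u_{k,R_i}
    abar = sum(a) / k
    tbar = sum(t) / k
    num = sum((a[i] - abar) * (t[i] - tbar) for i in range(k))
    den = (sum((ui - abar) ** 2 for ui in u)
           * sum((ti - tbar) ** 2 for ti in t)) ** 0.5
    return num / den
```

The t-approximation of (5.2.19) then treats (k-2)^{1/2} r / (1 - r²)^{1/2} as a t variate with k-2 degrees of freedom.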
The paper by Shirahata [1975] also deals with the ℓ.m.p.r.t. under censoring, but in the case where X and Y are both time-censored, i.e., X_i is observed if it is one of the n_1 smallest X_i and Y_i is observed if it is one of the n_2 smallest Y_i.

5.3. Empirical Power Studies
In this section we study power properties of the rank tests of independence discussed in section 5.2, assuming the conditional normality model for X given Y and taking g(y) to be the normal or the extreme value distribution. We write ρ instead of θ to be consistent with discussions of parametric tests. The power is estimated here by generating 1000 values of r_{n,k} for each n, k, ρ, estimating the 95th percentile of |r_{n,k}| when ρ = 0, and computing the proportion of |r_{n,k}| greater than this critical value when ρ ≠ 0. Blom scores defined in (1.4.9) are employed for u_{k,i}, and for t_{n,i} when g(y) is the normal density. The empirical critical values are:
        n       k     Empirical Critical     Critical Value Estimated
                            Value                  from (5.2.19)
       20      10           .620                      .632
      100      20           .457                      .444
     1000      50           .274                      .279
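The simulation procedure just described can be sketched in a few lines. For brevity this illustrative version uses simple Spearman-type rank scores in place of the Blom scores of (1.4.9) and 500 replications instead of 1000, so its critical values and powers only roughly track the tables; all names are illustrative choices, not the author's.

```python
# Monte Carlo sketch: empirical critical value and power for the rank
# test of independence applied to a censored bivariate normal sample.
import random

def censored_rank_corr(n: int, k: int, rho: float, rng: random.Random) -> float:
    # X | Y ~ N(rho*Y, 1 - rho^2); retain the k smallest Y's and their X's
    pairs = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        x = rho * y + (1 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((y, x))
    pairs.sort()                        # order by Y
    xs = [x for _, x in pairs[:k]]      # induced order statistics X_{n[i]}
    order = sorted(range(k), key=lambda i: xs[i])
    r = [0] * k
    for j, i in enumerate(order):
        r[i] = j + 1                    # R_i = rank of X_{n[i]} among the k observed
    # Spearman correlation of the pairs (i, R_i), i = 1..k
    mean = (k + 1) / 2
    num = sum((i + 1 - mean) * (r[i] - mean) for i in range(k))
    den = sum((i + 1 - mean) ** 2 for i in range(k))
    return num / den

def estimate_power(n, k, rho, crit, reps, rng):
    hits = sum(abs(censored_rank_corr(n, k, rho, rng)) > crit for _ in range(reps))
    return hits / reps

rng = random.Random(1979)
null = sorted(abs(censored_rank_corr(20, 10, 0.0, rng)) for _ in range(500))
crit = null[int(0.95 * 500)]            # empirical 95th percentile under rho = 0
power = estimate_power(20, 10, 0.9, crit, 500, rng)
```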
When X and Y are truly from a bivariate normal distribution and normal scores are used for u_{k,i} and t_{n,i}, the following power estimates result:

                              TABLE 5.1
            ESTIMATED POWER OF 5% NORMAL SCORES TEST FOR
        CENSORED DATA FROM THE BIVARIATE NORMAL DISTRIBUTION

        n       k       ρ      Estimated Power
       20      10      .1           .061
                       .3           .086
                       .5           .142
                       .9           .743
      100      20      .1           .045
                       .3           .082
                       .5           .139
                       .9           .895
     1000      50      .1           .064
                       .3           .148
                       .5           .319
                       .9           .993
Except for n = 100, k = 20, ρ = .9, these results compare quite favorably to the power of the best parametric test for the bivariate normal case as found in table 2.3. The normal scores test statistic has a null distribution that is independent of f_θ(x|y) and g(y) in the sense that it has a permutation distribution depending only on the assumption that all permutations of R_k are equally likely. Therefore, the normal scores test may be preferred over the parametric test, which assumes that either f_0(x) or g(y) is normal before its null distribution is known. The normal scores test may also be appropriate in the presence of non-linear regression or heteroscedasticity.

In order to examine power properties of the normal scores test when the scores are not optimal, we now assume the extreme value/normal model of section 4.3 but continue to use normal scores for t_{n,i} and u_{k,i}. The power estimates are found in table 5.2.
                              TABLE 5.2
            ESTIMATED POWER OF 5% NORMAL SCORES TEST FOR
      CENSORED DATA FROM THE EXTREME VALUE/NORMAL DISTRIBUTION

        n       k       ρ      Estimated Power
       20      10      .1           .062
                       .3           .071
                       .5           .102
                       .9           .499
      100      20      .1           .043
                       .3           .056
                       .5           .064
                       .9           .513
     1000      50      .1           .054
                       .3           .089
                       .5           .106
                       .9           .652
By comparing those results with table 4.1, we see that very little power is lost relative to the parametric test even though only the scores u_{k,i} are optimal. The normal scores test of independence may therefore perform well for many bivariate distributions.
5.4. Asymptotic Relative Efficiencies of Linear Rank Tests

Assume that optimum scores for X are generated asymptotically from a square integrable function φ₁(u) as in (5.2.15), and likewise that scores for Y are generated by ψ₁(u) satisfying

    lim_{n→∞} ∫₀^p [t_{n,1+[un]} - ψ₁(u)]² du = 0,    (5.4.1)

where k = pn and p ∈ (0,1] is fixed. We assume that the linear rank statistic from φ₁(u) and ψ₁(u) is locally most powerful for testing H_0: θ = 0 and is asymptotically efficient relative to the best parametric test in the Pitman (local alternatives) sense. Now let φ₂(u) and ψ₂(u) be other corresponding score functions which are not necessarily optimal. Define

    ρ²_X = [∫₀¹ (φ₁(u) - φ̄₁)(φ₂(u) - φ̄₂) du]² / {∫₀¹ (φ₁(u) - φ̄₁)² du ∫₀¹ (φ₂(u) - φ̄₂)² du}    (5.4.2)

and

    ρ²_Y = [∫₀^p (ψ₁(u) - ψ̄₁)(ψ₂(u) - ψ̄₂) du]² / {∫₀^p (ψ₁(u) - ψ̄₁)² du ∫₀^p (ψ₂(u) - ψ̄₂)² du},    (5.4.3)

where

    φ̄_i = ∫₀¹ φ_i(u) du  and  ψ̄_i = p^{-1} ∫₀^p ψ_i(u) du,   i = 1, 2.    (5.4.4)
Note that ρ²_X and ρ²_Y can be interpreted as squared correlation coefficients between corresponding "correct" and "incorrect" scores. We now argue heuristically, based on the results of Ghosh and Sen [1971] and Basu [1967], that the asymptotic relative efficiency of the linear rank test based on (φ₂, ψ₂) to that based on (φ₁, ψ₁) is given by

    e = ρ²_X ρ²_Y.    (5.4.5)

For example, choosing correct X-scores (φ₂(u) = φ₁(u) = Φ^{-1}(u)) yields ρ²_X = 1. However, choosing Wilcoxon scores for X (φ₂(u) = u) when the optimal scores are normal scores (φ₁(u) = Φ^{-1}(u)) yields ρ²_X = 3/π. If this same mistake is made in choosing scores for Y (ψ₁(u) = Φ^{-1}(u), ψ₂(u) = u), the following component quantities result:

    ψ̄₁ = p^{-1} ∫₀^p Φ^{-1}(u) du = -φ(a)/p,    (5.4.6)

    ψ̄₂ = p^{-1} ∫₀^p u du = p/2,    (5.4.7)

    p^{-1} ∫₀^p u Φ^{-1}(u) du = -φ(a) + Φ(2^{1/2} a)/(2pπ^{1/2}),    (5.4.8)

    p^{-1} ∫₀^p [Φ^{-1}(u)]² du = 1 - aφ(a)/p,    (5.4.9)

where a = Φ^{-1}(p) and φ(u) = (2π)^{-1/2} e^{-u²/2}, so that

    ρ²_Y = [Φ(2^{1/2} a)/(2pπ^{1/2}) - φ(a)/2]² / {(p²/12)[1 - aφ(a)/p - φ²(a)/p²]}.    (5.4.10)

This squared correlation of Y-score functions has respective values .636, .902, and .955 (3/π) when p = .00057, .5, and 1, respectively. Thus, for example, very little efficiency is lost in using the Spearman correlation coefficient (which incorporates Wilcoxon scores) instead of the normal scores correlation for censored bivariate normal data.
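The value ρ²_X = 3/π for Wilcoxon versus normal scores can be checked numerically: compute the squared correlation between φ₂(u) = u and φ₁(u) = Φ^{-1}(u) over (0,1) by a midpoint rule. This is an illustrative sketch (the function name and grid size are my choices, not the author's).

```python
# Numerical check of rho_X^2 = 3/pi: squared correlation between
# Wilcoxon scores u and normal scores Phi^{-1}(u) on (0,1).
from math import pi
from statistics import NormalDist

def score_corr_sq(m: int = 100_000) -> float:
    inv = NormalDist().inv_cdf
    us = [(i + 0.5) / m for i in range(m)]    # midpoint grid on (0,1)
    p1 = [inv(u) for u in us]                 # normal scores
    p2 = us                                   # Wilcoxon scores
    m1 = sum(p1) / m
    m2 = sum(p2) / m
    cov = sum((a - m1) * (b - m2) for a, b in zip(p1, p2)) / m
    v1 = sum((a - m1) ** 2 for a in p1) / m
    v2 = sum((b - m2) ** 2 for b in p2) / m
    return cov * cov / (v1 * v2)

rho2 = score_corr_sq()   # close to 3/pi = .9549
```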
5.5. Hoeffding's General Test of Independence as Applied to Censored Bivariate Samples

Suppose that two random variables have a d.f. H(x,y) with continuous joint and marginal density functions. Let

    Δ = ∫_{-∞}^{∞} ∫_{-∞}^{∞} D²(x,y) dH(x,y),    (5.5.1)

where

    D(x,y) = H(x,y) - H(x,∞)H(∞,y).    (5.5.2)

The "D" test based on a random sample {(X_i, Y_i), 1 ≤ i ≤ n} was devised by Hoeffding in 1948. It depends only on the ranks of the X and Y vectors and tests H_0: Δ = 0, being consistent against all alternatives to independence. The test can also be applied to a censored bivariate sample {(X_{n[i]}, Y_{n,i}), 1 ≤ i ≤ k}, since under the null hypothesis all permutations of the ranks R_k are equally likely (fixing the ranks S_n), due to {X_{n[i]}, 1 ≤ i ≤ k} representing a random sample of size k under H_0.

Hoeffding derived an unbiased estimator of Δ based on a U-statistic. Alternatively, Δ can be estimated with the sample bivariate empirical distribution:

    Δ_n = ∫_{-∞}^{∞} ∫_{-∞}^{∞} D²_n(x,y) dH_n(x,y),    (5.5.3)

where

    H_n(x,y) = n^{-1} Σ_{i=1}^n I{X_i ≤ x, Y_i ≤ y},
    D_n(x,y) = H_n(x,y) - H_n(x,∞) H_n(∞,y),    (5.5.4)

and I{A} is the indicator function for the set A. Blum, Kiefer, and Rosenblatt [1961] developed another test of this type in the spirit of Kolmogorov-Smirnov tests. Their test statistic is

    B_n = sup_{x,y} D_n(x,y).    (5.5.5)
For a censored bivariate sample the following information is available:

    F_{n,k}(x) = k^{-1} Σ_{i=1}^n I{X_i ≤ x} c_i = k^{-1} Σ_{i=1}^k I{X_{n[i]} ≤ x},    (5.5.6)

    G_{n,k}(y) = n^{-1} Σ_{i=1}^k I{Y_{n,i} ≤ y},    (5.5.7)

    H_{n,k}(x,y) = n^{-1} Σ_{i=1}^k I{X_{n[i]} ≤ x} I{Y_{n,i} ≤ y},    (5.5.8)

and

    D_{n,k}(x,y) = H_{n,k}(x,y) - F_{n,k}(x) G_{n,k}(y),    (5.5.9)

where

    c_i = I{Y_i ≤ Y_{n,k}}.    (5.5.10)

When Y_{n,k} ≥ y the following simplifications hold:

    G_{n,k}(y) = n^{-1} Σ_{i=1}^n I{Y_i ≤ y}    (5.5.11)

and

    H_{n,k}(x,y) = n^{-1} Σ_{i=1}^n I{X_i ≤ x} I{Y_i ≤ y}.    (5.5.12)
These sample distribution functions have the following properties:

    E[G_{n,k}(y) | Y_{n,k} ≥ y] = G(y)    (5.5.13)

and

    E[H_{n,k}(x,y) | Y_{n,k} ≥ y] = H(x,y),    (5.5.14)

and if θ denotes the dependence parameter as in section 5.2,

    E[F_{n,k}(x) | θ = 0] = F(x)    (5.5.15)

and

    E[D_{n,k}(x,y) | Y_{n,k} ≥ y, θ = 0] = 0,    (5.5.16)

since

    E[c_i] = k/n.    (5.5.17)

Here F(x) and G(y) are respectively the marginal d.f. of X_i and Y_i.
A statistic for testing independence is

    Δ̂_{n,k} = ∫_{-∞}^{∞} ∫_{-∞}^{Y_{n,k}} D²_{n,k}(x,y) dH_{n,k}(x,y).    (5.5.18)

An alternative test statistic analogous to (5.5.5) is

    B_{n,k} = sup_{-∞ < x < ∞, -∞ < y ≤ Y_{n,k}} D_{n,k}(x,y).
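Since H_{n,k} places mass n^{-1} on each of the k uncensored pairs, the statistic (5.5.18) reduces to an average of D²_{n,k} over the observed points. A minimal sketch, with hypothetical function names:

```python
# Sketch of Delta-hat_{n,k} of (5.5.18) for a censored bivariate sample.
def delta_hat(x_obs: list[float], y_obs: list[float], n: int) -> float:
    """x_obs[i] = X_{n[i+1]} and y_obs[i] = Y_{n,i+1} (y_obs sorted); k = len(x_obs)."""
    k = len(x_obs)

    def F(x):      # F_{n,k}(x), eq. (5.5.6)
        return sum(xi <= x for xi in x_obs) / k

    def G(y):      # G_{n,k}(y), eq. (5.5.7)
        return sum(yi <= y for yi in y_obs) / n

    def H(x, y):   # H_{n,k}(x,y), eq. (5.5.8)
        return sum(xi <= x and yi <= y for xi, yi in zip(x_obs, y_obs)) / n

    def D(x, y):   # D_{n,k}(x,y), eq. (5.5.9)
        return H(x, y) - F(x) * G(y)

    # integrate D^2 against H_{n,k}: mass 1/n at each observed pair
    return sum(D(xi, yi) ** 2 for xi, yi in zip(x_obs, y_obs)) / n
```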
These statistics should perhaps be studied in more detail. Hoeffding's D test, however, can be carried out without difficulty. It would be interesting to see how such an omnibus test of independence performs with respect to a simple test directed at simple alternatives.
To this end, 1000 censored bivariate normal random vectors were generated for different n, k, and ρ. The empirical upper 95th percentile when ρ = 0 was used as a critical value since the null distribution of D is tabulated only for very small sample sizes. The resulting critical values are:
        n       k      D.95
       20      10     .00503
      100      20     .00218
     1000      50     .00067
Power is estimated by computing the proportion of observed D statistics greater than D.95. The results are in table 5.3.
                              TABLE 5.3
              ESTIMATED POWER OF 5% HOEFFDING D TEST FOR
            CENSORED BIVARIATE NORMAL DISTRIBUTION

        n       k       ρ      Estimated Power
       20      10      .1           .066
                       .3           .084
                       .5           .117
                       .9           .573
      100      20      .1           .045
                       .3           .075
                       .5           .118
                       .9           .782
     1000      50      .1           .055
                       .3           .113
                       .5           .236
                       .9           .970
130
By comparison with table 2.3 the power is reduced from that of the
likelihood ratio test but is not devastated.
is the smallest for the case
n· 1000
proportion of censoring is present.
k· 50
The power reduction
in which a high
CHAPTER VI
SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH
6.1.
Summary.
Many tools of classical multivariate theory have been extended
for use in analyzing censored multivariate samples consisting of
order statistics of one variate and induced order statistics of
other variates. Maximum likelihood and simplified estimators have been developed for means, variances, covariances, correlation
coefficients, and multiple, partial and canonical correlation coefficients.
Statistics for testing total or partial independence of
random vectors have also been derived.
Linear rank tests for inde-
pendence in the bivariate case have also been proposed and
Hoeffding's general test of independence in this case has been
studied.
Properties of estimators have been studied by deriving Cramér-Rao lower bounds for dispersion and by simulations.
Power pro-
perties of some of the proposed test procedures have been studied
by deriving conditional power functions and by making empirical
power calculations.
The rank tests for independence were shown to
have many advantages over the parametric tests since the power
132
characteristics of the two types of tests are similar while the
rank tests make fewer assumptions.
6.2.
Suggestions for Further Research
There are several open areas remaining in this research.
For
one, the unconditional distribution of test statistics such as
(2.5.1.8) could be derived if the distribution of the sample
variance calculated from a censored normal sample could be obtained.
This may not turn out to be a tractable problem.
Also, it would
be advantageous to prove formally that the partial correlation test
in section 3.4.2 is actually the likelihood ratio test [the author
could find no formal proof of this even for the complete sample
case].
Another question worth considering is how much information regarding tests of independence is lost when only induced order
statistics are observed and not the entire vector of concomitant
data.
This question could be studied by comparing the power
characteristics of the rank test proposed in section 5.2 with those
of the test proposed by, for example, Brown et al. [1974].
BIBLIOGRAPHY
Anderson, T. W. 1958. An Introduction to Multivariate Statistical Analysis. New York: Wiley.

Basu, A. P. 1967. On the large sample properties of a generalized Wilcoxon-Mann-Whitney statistic. Annals of Mathematical Statistics 38: 905-15.

Bhattacharya, P. K. 1974. Convergence of sample paths of normalized sums of induced order statistics. Annals of Statistics 2: 1034-9.

Blum, J. R., Kiefer, J., and Rosenblatt, M. 1961. Distribution free tests of independence based on the sample distribution function. Annals of Mathematical Statistics 32: 485-98.

Brown, B. W., Hollander, M., and Korwar, R. M. 1974. Nonparametric tests of independence for censored data with applications to heart transplant studies. In Reliability and Biometry, Editors F. Proschan and R. J. Serfling, Philadelphia: S.I.A.M. (pp. 327-54).

Chan, L. K. 1967. Remark on the linearized maximum likelihood estimate [for type II double censoring]. Annals of Mathematical Statistics 38: 1876-81.

Cohen, A. C. 1965. Maximum likelihood estimation in the Weibull distribution based on complete and on censored samples. Technometrics 7: 579-88.
Cox, D. R. 1972. Regression models and life-tables. Journal of the Royal Statistical Society Series B 34: 187-202.

Cox, D. R., and Hinkley, D. V. 1974. Theoretical Statistics. London: Chapman and Hall Ltd.

David, H. A. 1970. Order Statistics. New York: Wiley.

David, H. A., and Galambos, J. 1974. The asymptotic theory of concomitants of order statistics. Journal of Applied Probability 11: 762-70.

David, H. A., and Moeschberger, M. L. 1978. The Theory of Competing Risks. London: Griffin.
Des Raj 1953. On estimating the parameters of bivariate normal
population from doubly and singly linearly truncated samples.
Sankhya 12: 277-90.
Gehan, E. A. 1965. A generalized Wilcoxon test for comparing
arbitrarily singly censored samples. Biometrika 52: 203-23.
Ghosh, M., and Sen, P. K. 1971. On a class of rank order tests for
regression with partially informed stochastic predictors.
Annals of Mathematical Statistics 42: 650-61.
Glasser, M. 1965. Regression analysis with dependent variable censored. Biometrics 21: 300-7.

Glasser, M. 1967. Exponential survival with covariance. Journal of the American Statistical Association 62: 561-8.
Gupta, A. K. 1952. Estimation of the mean and standard deviation
of a normal population from a censored sample. Biometrika 39:
260-73.
Hájek, J., and Šidák, Z. 1967. Theory of Rank Tests. New York: Academic Press.
Halperin, M. 1952. Maximum likelihood estimation in truncated
[meaning censored] samples. Annals of Mathematical Statistics
23: 226-38.
Harter, H. L. 1961. Expected values of normal order statistics.
Biometrika 48: 151-65. Correction 48: 476.
Harter, H. L., and Moore, A. H. 1968. Maximum likelihood estimation,
from doubly censored samples, of the parameters of the first
asymptotic distribution of extreme values. Journal of the
American Statistical Association 63: 889-901.
Hoeffding, W. 1948. A non-parametric test of independence. Annals of Mathematical Statistics 19: 546-57.

Hoeffding, W. 1951. A combinatorial central limit theorem. Annals of Mathematical Statistics 22: 558-66.
135
Johnson, N. L., and Kotz, S. 1970. Distributions in Statistics: Continuous Univariate Distributions - 1. Boston: Houghton Mifflin Company.

Kalbfleisch, J. D. 1978. Likelihood methods and nonparametric tests. Journal of the American Statistical Association 73: 167-70.

Kaplan, E. L., and Meier, P. 1958. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53: 457-81.

Majumdar, H., and Sen, P. K. 1978. Nonparametric testing for simple regression under progressive censoring with staggering entry and random withdrawal. Communications in Statistics - Theory and Methods A7(4): 349-71.
Miller, R. G. 1976. Least squares regression with censored data.
Biometrika 63: 449-64.
Motoo, M. 1957. On the Hoeffding's combinatorial central limit theorem. Annals of the Institute of Statistical Mathematics, Tokyo 8: 145-54.
Plackett, R. L. 1958. Linear estimation from censored data. Annals
of Mathematical Statistics 29: 351-60. [Note: errors in this
paper are reported in Chan, 1967].
Rao, C. R. 1965. Linear Statistical Inference and Applications.
New York: Wiley.
Sarhan, A. E., and Greenberg, B. G. (Editors) 1962. Contributions to Order Statistics. New York: Wiley.
Saw, J. G. 1958. Moments of sample moments of censored samples
from a normal population. Biometrika 45: 211-21. Correction
45: 587-8.
Saw, J. G. 1959. Estimation of the normal population parameters
given a singly censored sample. Biometrika 46: 150-5.
Saw, J. G. 1961. The bias of the maximum likelihood estimates of
the location and scale parameters given a type II censored
normal sample. Biometrika 48: 448-51.
Saw, J. G. 1962. Efficient moment estimators when the variables are
dependent with special reference to type II censored normal data.
Biometrika 49: 155-61.
136
Sen, P. K. 1976. Weak convergence of progressively censored likelihood ratio statistics and its role in asymptotic theory of life testing. Annals of Statistics 4: 1247-57.
Sen, P. K. 1977. Rank analysis of covariance under progressive
censoring. Institute of Statistics Mimeo Series No. 1118,
University of North Carolina.
Shirahata, S. 1975. Locally most powerful rank tests for independence with censored data. Annals of Statistics 3: 241-5.
Silvey, S. D. 1961. A note on maximum likelihood in the case of dependent random variables. Journal of the Royal Statistical Society Series B 23: 444-52.
Smith, D. W. 1978. A simplified approach to the maximum likelihood
estimation of the covariance matrix. American Statistician
32: 28-9.
Watterson, G. A. 1959. Linear estimation in censored samples
from multivariate normal populations. Annals of Mathematical
Statistics 30: 814-24.
Zippin, C., and Armitage, P. 1966. Use of concomitant variables
and incomplete survival information in the estimation of an
exponential survival parameter. Biometrics 22: 665-72.