Creason, J.P. (1978). "The Theory and Application of a General Iterative Maximum Likelihood Procedure to Randomly Censored Univariate and Bivariate Normal Linear Models."

BIOMATHEMATICS TRAINING PROGRAM
THE THEORY AND APPLICATION OF A GENERAL
ITERATIVE MAXIMUM LIKELIHOOD PROCEDURE TO RANDOMLY
CENSORED UNIVARIATE AND BIVARIATE NORMAL LINEAR MODELS
by
J.P. Creason
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1173
JUNE 1978
THE THEORY AND APPLICATION OF A GENERAL ITERATIVE
MAXIMUM LIKELIHOOD PROCEDURE TO RANDOMLY
CENSORED UNIVARIATE AND BIVARIATE
NORMAL LINEAR MODELS
by
John Paul Creason
A Dissertation submitted to the faculty
of The University of North Carolina at
Chapel Hill in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy in the Department
of Biostatistics.
Chapel Hill
1978
JOHN PAUL CREASON. The Theory and Application of a General Iterative Maximum Likelihood Procedure to Randomly Censored Univariate and Bivariate Normal Linear Models. (Under the direction of LAWRENCE L. KUPPER.)
A general iterative maximum likelihood procedure for estimation of the parameters of a randomly censored univariate or bivariate normal distribution is developed, based on Orchard and Woodbury's "Missing Information Principle" (MIP).
This procedure is applied as a general solution to univariate and bivariate k-sample estimation problems and multiple linear regression estimation problems in the presence of random censoring on one or both variables.
In the univariate cases, the procedure is applied to data sets from the literature for which specific methods have been developed and for which solutions are therefore known.
Simulations are run in several bivariate cases to establish the small-sample characteristics of the estimates under several censoring regimens, and to demonstrate the general applicability of the procedure.
Likelihood ratio tests are derived for use under random censorship conditions for both univariate and bivariate normal problems and, in the bivariate case, are evaluated through simulation studies that investigate the behavior of these tests under the null hypothesis for relatively small samples.
The power of the likelihood ratio test is also investigated for some small-sample problems subject to several different censoring regimens.
The above results are extended to the multivariate multiple regression case where no more than two dependent variables are censored.
Approximate methods are given for situations where more than two variables are censored in any observation vector.
Three direct applications of the developed theory and iterative procedures to Environmental Protection Agency data sets, consisting of measurements of trace residues of pollutants in human tissues and of pertinent covariates of interest, are presented to demonstrate the utility of these procedures.
ACKNOWLEDGEMENTS
The author wishes to thank the members of the doctoral committee, Dr. Lawrence Kupper, Dr. Michael Symons, Dr. Linda Little, Dr. Victor Hasselblad, Dr. William Nelson, and Dr. Wilson Riggan, for their assistance during this research, as well as for their review of the manuscript.
The author is especially grateful to Dr. Hasselblad for originally encouraging the pursuit of this topic, as well as for his continuing support and helpful suggestions.
Dr. Kupper, the author's faculty advisor, is also due special thanks for his patience and encouragement throughout the course of this dissertation.
The author thanks Mrs. Betty Bradford for her excellent typing of an extremely difficult manuscript.
Finally, special recognition is due the author's wife, rIsa, for her many sacrifices and for her encouragement during the many hours spent in completing this dissertation, and also the author's children, Kyle and Jennifer, for their special patience and support during the long course of study.
TABLE OF CONTENTS
Chapter
1. INTRODUCTION AND REVIEW OF THE LITERATURE
   1.1 Applications and Definitions of Terms
   1.2 Review of the Literature
       1.2.1 Univariate normal estimation
       1.2.2 Estimation in linear parametric models
       1.2.3 Bivariate normal estimation
       1.2.4 The "Missing Information Principle" (MIP) in the missing data case
   1.3 Statement of Specific Purpose and Content of this Work
2. MAXIMUM LIKELIHOOD ESTIMATION OF PARAMETERS IN UNIVARIATE LINEAR MODELS WITH THE DEPENDENT VARIABLE SUBJECT TO TYPE I CENSORING
   2.1 Estimation in the one-sample Univariate Normal Case
   2.2 Estimation in the k-sample Univariate Normal Case
   2.3 Estimation in the Univariate Multiple Regression Case
   2.4 Sampling Errors of Estimates
       2.4.1 The one-sample case
       2.4.2 The k-sample case
   2.5 Applications to Literature Data Sets
       2.5.1 Examples in the one-sample case
       2.5.2 An example in the k-sample case
       2.5.3 An example in the Linear Regression Case
3. MAXIMUM LIKELIHOOD ESTIMATION OF PARAMETERS OF THE BIVARIATE NORMAL DISTRIBUTION WITH BOTH VARIABLES SUBJECT TO CENSORING
   3.1 A Simplified Estimation Procedure Using the MIP in the Singly-censored Multivariate Normal Distribution Case
   3.2 Bias Relative to True Maximum Likelihood Estimators of the Simplified Procedure in the Bivariate Normal Case with Fixed Type I Censoring
   3.3 Maximum Likelihood Estimation of the Parameters of a Bivariate Normal Distribution in the Presence of Type I Random Censoring
   3.4 A Simulation Study of the Bivariate Normal Distribution Estimation Procedure
   3.5 Estimation in the k-sample Bivariate Case
   3.6 Estimation in the Bivariate Multiple Regression Case
4. SIMULATION STUDIES OF HYPOTHESIS TESTING PROCEDURES IN THE PRESENCE OF TYPE I CENSORING
   4.1 The Univariate One-sample Case
   4.2 Power Considerations in the Univariate One-sample Case
   4.3 The Univariate k-sample Case
   4.4 The Univariate Multiple Regression Case
   4.5 Tests of Means in the Bivariate Normal Case
   4.6 The Bivariate Normal Linear Model Case
5. APPLICATION OF RESULTS TO ENVIRONMENTAL DATA SETS
   5.1 Tissue Levels of 10 Selected Trace Elements in Maternal Venous Blood, Cord Blood, and Placenta
   5.2 Polychlorinated Hydrocarbons and Polychlorinated Biphenyls in Human Plasma
   5.3 HANES Data: Trace Elements in Scalp Hair and Blood
6. SUMMARY AND CONCLUDING REMARKS
LIST OF REFERENCES
LIST OF TABLES
Table
1.1.1 Proportion of Volunteers Having Plasma Chlorinated Hydrocarbon Residues and Mean Values of Measurable Residues Distributed by Age and Sex
1.1.2 Trace Metal Levels in Maternal Blood
1.1.3 Trace Metal Levels in Cord Blood
1.1.4 Trace Metal Levels in Placental Tissue
1.2.1.1 Contributions to Log Likelihood and Its Derivatives for a Time-to-Tumor Experiment (Lea, 1945)
2.4.2.1 Contributions to Log Likelihood and Its Derivatives in the k-sample Case
2.4.3.1 Contributions to Log Likelihood and Its Derivatives in the Univariate Multiple Regression Case
2.5.1.1 Failure Times for 37 Locomotive Control Devices
2.5.1.2 Results of the MIP Procedure for the Locomotive Control Device Problem
2.5.1.3 Life in Hours of 119 Burned Out Electric Light Bulbs
2.5.1.4 Results of the MIP Procedure for the Electric Light Bulb Problem
2.5.1.5 Number of Shrimp Moulting and Dying Without Moulting on Successive Days after Treatment with Hormone Preparation
2.5.1.6 Results of the MIP Procedure for the Shrimp Moulting Problem
2.5.2.1 Tensile Strengths of Three Types of Copper Wires
2.5.2.2 Results of the MIP Procedure for the Copper Wire Problem
2.5.2.3 Maximum Likelihood Parameter Estimates for the Copper Wire Problem - Hahn and Miller Results
2.5.3.1 Data from Glasser (1965) for Linear Regression
3.4.1 Summary Statistics of Maximum Likelihood Estimates for Censored Bivariate Normal Samples of Size N=25, with $\rho=0.0$
3.4.2 Summary Statistics of Maximum Likelihood Estimates for Censored Bivariate Normal Samples of Size N=25, with $\rho=-0.5$
3.4.3 Summary Statistics of Maximum Likelihood Estimates for Censored Bivariate Normal Samples of Size N=25, with $\rho=0.5$
4.1.1 Means and Variances of 1000 Simulations of the Null Hypothesis Test Statistics and p-Values for the Univariate 1-Sample Case (N=25)
4.1.2 Percentiles of 1000 Simulations of $\Pr(T \le t_{df})$ and $\Pr(\chi^2_1 \ge X^2_1)$ Under the Null Hypothesis for the Univariate 1-Sample Case
4.2.1 Empirical Power of the T and $X^2$ Test Statistics Under Several Alternative Hypotheses as Derived from 1000 Simulation Runs at Each Alternative and Each Censoring Regimen
4.3.1 Means and Variances of 1000 Simulations of Null Hypothesis Test Statistics for the Univariate 2-Sample Case
4.3.2 Percentiles of 1000 Simulations of $\Pr(t_{df} \ge T)$ and $\Pr(\chi^2_1 \ge X^2_1)$ Under the Null Hypothesis for the Univariate 2-Sample Case
4.3.3 Means and Variances of 1000 Simulations of the Null Hypothesis Test Statistic ($X^2_3$) for the Univariate 4-Sample Case ($N_1=N_2=N_3=N_4=25$)
4.3.4 Percentiles of 1000 Simulations of $\Pr(\chi^2_3 \ge X^2_3)$ Under the Null Hypothesis for the Univariate 4-Sample Case
4.4.1 Means and Variances of 1000 Simulations of the Null Hypothesis Test Statistics and Coefficient Estimates for the Univariate Multiple Regression Case with Two Uncorrelated Independent Variables (N=25)
4.4.2 Percentiles of 1000 Simulations of $\Pr(\chi^2_2 \ge X^2_U)$ and $\Pr(\chi^2_2 \ge X^2_R)$ Under the Null Hypothesis for the Univariate Multiple Regression Case with Two Uncorrelated Independent Variables (N=25)
4.4.3 Means and Variances of 1000 Simulations of the Null Hypothesis Test Statistics and Coefficient Estimates for the Univariate Multiple Regression Case with Two Correlated ($\rho=0.5$) Independent Variables (N=25)
4.4.4 Percentiles of 1000 Simulations of $\Pr(\chi^2_2 \ge X^2_U)$ and $\Pr(\chi^2_2 \ge X^2_R)$ Under the Null Hypothesis for the Univariate Multiple Regression Case with Two Correlated ($\rho=0.5$) Independent Variables (N=25)
4.5.1 Means and Variances of 1000 Simulations of the Null Hypothesis Test Statistics and p-Values for the Bivariate Normal Case with $\rho'=0.0$ (N=25)
4.5.2 Percentiles of 1000 Simulations of $\Pr(\chi^2_2 \ge X^2_B)$ and $\Pr(\chi^2_2 \ge X^2_{BU})$ Under the Null Hypothesis for the Bivariate Normal Case with $\rho'=0.0$ (N=25)
4.5.3 Means and Variances of the Null Hypothesis Test Statistics and p-Values for the Bivariate Normal Case with $\rho'=-0.5$ (N=25)
4.5.4 Percentiles of 1000 Simulations of $\Pr(\chi^2_2 \ge X^2_B)$ and $\Pr(\chi^2_2 \ge X^2_{BU})$ Under the Null Hypothesis for the Bivariate Normal Case with $\rho'=0.5$ (N=25)
4.5.5 Means and Variances of 1000 Simulations of the Null Hypothesis Test Statistics and p-Values for the Bivariate Normal Case with $\rho'=0.8$ (N=25)
4.5.6 Percentiles of 1000 Simulations of $\Pr(\chi^2_2 \ge X^2_B)$ and $\Pr(\chi^2_2 \ge X^2_{BU})$ Under the Null Hypothesis for the Bivariate Normal Case with $\rho'=0.8$ (N=25)
4.6.1 Means and Variances of 1000 Simulations of the Null Hypothesis Test Statistics and Coefficient Estimates for the Bivariate Multiple Regression Case with Two Independent Variables, with $\rho_{12}=\rho_{34}=0.0$ (N=25)
4.6.2 Percentiles of 1000 Simulations of $\Pr(\chi^2_4 \ge X^2_{BR})$ and $\Pr(\chi^2_4 \ge X^2_{BRU})$ Under the Null Hypothesis for the Bivariate Multiple Regression Case with $\rho_{12}=0$ and $\rho_{34}=0$ (N=25)
4.6.3 Means and Variances of 1000 Simulations of the Null Hypothesis Test Statistics and Coefficient Estimates for the Bivariate Multiple Regression Case with Two Independent Variables, with $\rho_{12}=0$ and $\rho_{34}=0.5$ (N=25)
4.6.4 Percentiles of 1000 Simulations of $\Pr(\chi^2_4 \ge X^2_{BR})$ and $\Pr(\chi^2_4 \ge X^2_{BRU})$ Under the Null Hypothesis for the Bivariate Multiple Regression Case with $\rho_{12}=0$ and $\rho_{34}=0.5$ (N=25)
4.6.5 Means and Variances of 1000 Simulations of the Null Hypothesis Test Statistics and Coefficient Estimates for the Bivariate Multiple Regression Case with Two Independent Variables, with $\rho_{12}=0.5$ and $\rho_{34}=0$ (N=25)
      Percentiles of 1000 Simulations of $\Pr(\chi^2_4 \ge X^2_{BR})$ and $\Pr(\chi^2_4 \ge X^2_{BRU})$ Under the Null Hypothesis for the Bivariate Multiple Regression Case with $\rho_{12}=0.5$ and $\rho_{34}=0$ (N=25)
5.1.1 Maximum Likelihood Estimates of Trace Element Levels in Maternal Blood (N=156)
5.1.2 Maximum Likelihood Estimates of Trace Element Levels in Cord Blood (N=159)
5.1.3 Maximum Likelihood Estimates of Trace Element Levels in Placenta (N=141)
5.1.4 Covariance Matrix and Correlation Matrix ML Estimates for Maternal Blood (8 censored), Cord Blood (0 censored), and Placenta (33 censored) Boron Levels (N=159)
5.2.1 Maximum Likelihood Estimates of the Means and Variance of the Logs of Six Plasma Residues by Four Race-Residence Groups
5.2.2 Maximum Likelihood Estimates of the Correlation Matrix for Plasma Chlorinated Hydrocarbon Pesticides
5.2.3 Hypothesis Tests of Effects of Demographic Variables on Selected Plasma Chlorinated Hydrocarbon Residues Using Likelihood Ratio Statistics
5.3.1 Maximum Likelihood Estimates of Means and Covariance Matrix of Logs of HANES Scalp Hair Trace Element Data (N=168)
5.3.2 HANES Study: Regression Analysis of Logs of Scalp Hair Arsenic Data (N=199, Number Left Censored = 31)
5.3.3 Maximum Likelihood Estimates of Means and Covariance Matrix of Logs of HANES Blood Trace Element Data (N=102)
5.3.4 HANES Study: Regression Analysis of Logs of Blood Trace Element Data
5.3.5 Bivariate Multiple Linear Regression Analysis of HANES Log Scalp Hair Arsenic and Log Blood Arsenic Data (5 Scalp Hair and 14 Bloods Left-censored out of 38)
5.3.6 Maximum Likelihood Estimates of Log Scalp Hair to Log Blood Correlations in HANES Trace Element Data
LIST OF FIGURES
Figure
3.2.1 Bias of Estimated $\mu_1$ (Relative to the True ML Estimator) Produced by the Simplified Estimation Procedure of Section 3.1 for $P_1$ and $P_2$ Probability of Censoring Variables $X_1$ and $X_2$, Respectively
3.2.2 Bias of Estimated $\sigma_{12}$ (Relative to the True ML Estimator) Produced by the Simplified Estimation Procedure of Section 3.1 for $P_1$ and $P_2$ Probability of Censoring Variables $X_1$ and $X_2$, Respectively
3.2.3 Bias of Estimated $\sigma_{11}$ (Relative to the True ML Estimator) Produced by the Simplified Estimation Procedure of Section 3.1 for $P_1$ and $P_2$ Probability of Censoring Variables $X_1$ and $X_2$, Respectively
Chapter 1
INTRODUCTION AND REVIEW OF THE LITERATURE
This chapter presents a review of censored data analysis using maximum likelihood estimation procedures under the assumption of an underlying normal distribution.
Applications in which these procedures are useful are pointed out, and applications for which new methodology will be developed are presented.
Definitions to be used in the text are also introduced.
The "Missing Information Principle" and its application to censored data problems are detailed.
The specific purpose and content of the present work are described in the last section of the chapter.
1.1 Applications and definitions of terms
Data are said to be censored if the values of all observations outside some known or randomly obtained limits are unknown, but the number of such observations is known.
The limits are termed "truncation points" or "censoring points" in most of the literature.
Examples of censored data arise in life testing, in dose-response studies, in biological assays, and in survival time experiments, among others.
If the values of all observations below some (not necessarily fixed) point are unknown, the sample is said to be singly censored to the left.
If the values of all observations above some point are unknown, the sample is said to be singly censored to the right.
If both situations exist, the sample is said to be doubly censored.
The censoring procedure is also designated as being either Type I or Type II.
Type I censoring occurs
if the values of all observations beyond some known point (or points)
are censored.
Hence, the actual number of censored observations in the
total sample is a random variable.
Type II censoring is said to occur
when a fixed proportion of the sample size is censored at the lower
and/or upper ends of the range of the random variable. The number of
censored observations is therefore fixed in Type II censoring.
The idea of a censored sample is sometimes confused with that of a
truncated distribution.
A truncated distribution is one that cannot be
observed in part or parts of its range.
It is not possible to obtain
observations from within these removed parts of the distribution.
Trun-
cation is therefore a property of the distribution, whereas censoring is
a property of the sample.
This study is concerned initially with Type I single and double censoring, with the censoring point(s) for each sample observation vector being random variables distributed independently of each other and of the censored variables themselves.
This will be termed Type I random censoring.
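The three censoring schemes defined above can be contrasted in a short simulation. The following is a minimal Python sketch written for illustration, not code from this study; the function names and the uniform distribution of detection limits in the demonstration are assumptions. Left censoring is used throughout, matching the minimum-detectable-level examples that motivate this work.

```python
import random


def censor_type1_fixed(values, limit):
    """Type I left censoring at one fixed, known limit.
    The number of censored values is a random variable."""
    observed = [v for v in values if v >= limit]
    return observed, len(values) - len(observed)


def censor_type2(values, k):
    """Type II left censoring: the k smallest values are censored,
    so the number censored is fixed in advance."""
    ordered = sorted(values)
    return ordered[k:], k


def censor_type1_random(values, limits):
    """Type I random left censoring: observation i has its own limit, drawn
    independently of the other limits and of the values themselves.
    Returns (recorded value, censored flag) pairs; a censored value is
    recorded as its limit, i.e. it is known only to lie below that limit."""
    return [(v, False) if v >= c else (c, True) for v, c in zip(values, limits)]


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(10)]
    mdl = [rng.uniform(-1.0, 0.5) for _ in range(10)]  # hypothetical per-sample limits
    print(censor_type1_fixed(x, 0.0)[1], "censored at a fixed limit")
    print(sum(flag for _, flag in censor_type1_random(x, mdl)), "censored at random limits")
```

Only the Type II count is fixed before sampling; under both Type I schemes the number censored depends on the data.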
Most of the literature examples of censored data analysis have resulted from life-test situations.
For example, Gupta (1952) discusses log survival time of ten mice with termination after seven have died, resulting in Type II single censoring.
Cohen (1957) uses a time-mortality experiment looking at log survival time of forty-seven subjects, where the first reading is delayed twenty hours, with two early failures, and the experiment is terminated after eighty hours with five survivors.
Here we have a fixed Type I double censoring situation.
Hahn and Miller (1968a) look at the tensile strength of three types of wire, with readings on each type restricted to a range of 5050 to 5150 pounds, resulting in a 3-sample problem with fixed Type I double censoring on each treatment group.
These are but a few of the examples
available, but they give a feel for the type of problems that have
originated the interest in censored sample estimation procedures.
Although life-testing is not the problem originating interest in the methods to be developed in this paper, the methods to be proposed are in fact directly applicable to life-test situations where the time metameter is assumed to be normally distributed.
The specific problem type that originated interest in this study is demonstrated by Table 1.1.1, taken from Finklea et al. (1972).
Of the three plasma chlorinated hydrocarbon residues measured, one (PCB) has from 25% to 55% of its values censored to the left, while the other two experience little or no censoring.
The problem then is to estimate the means and covariance matrix of these variables assuming a log-normal distribution, and to test for effects due to factors such as sex and age.
Tables 1.1.2 through 1.1.4 are taken from a recent article by Creason et al. (1976) dealing with maternal-fetal tissue sets measured for concentrations of eighteen trace elements.
In this case, the censoring points for each observation are random variables, dependent upon the amount of tissue collected.
The minimum detectable level (MDL) is high for low tissue weight samples and low for high tissue weight samples.
(The high tissue weight samples contain more total trace element, and hence, following ashing, the concentration in the residual ash is higher than for a low tissue weight sample, producing a lower MDL for the high tissue weight samples.)
Here, then, are cases of Type I random single censoring, the censoring being on the left.
From Tables 1.1.1 through 1.1.4 it is obvious that the possibility of bivariate or multivariate
Table 1.1.1
Proportion of Volunteers Having Plasma Chlorinated Hydrocarbon Residues
and Mean Values of Measurable Residues Distributed by Age and Sex
Plasma residues
Age
N
Per cent
Mean of
with
measurable
measurable
residues
residues
DDE
pp'DDT
PCB
N
(ppb)
Per cent
Mean of
Per cent
Mean of
with
measurable
with
measurable
measurable
residues
measurable
residues
residues
(ppb)
residues
N
(ppb)
Female
0-4
5-9
10-19
20-39
40-59
60+
6
39
82
91
67
20
33.3
43.6
32.9
46.2
55.2
40.0
4.25
5.31
4.64
4.48
5.89
5.09
21
94.9
97.6
98.8
97.9
95.7
100.0
5.01
4.48
3.73
3.39
3.12
3.31
39
42
83
95
70
21
94.9
100.0
100.0
100.0
100.0
100.0
3.58
5.24
4.98
4.45
4.51
5.00
45
53
101
82
67
25
95.6
100.0
97.0
97.6
98.5
100.0
5.63
4.60
4.43
3.72
3.52
6.55
45
53
101
82
67
25
100.0
100.0
100.0
100.0
100.0
100.0
5.42
5.95
5.41
5.92
6.02
7.50
39
42
83
95
70
Male
0-4
5-9
10-19
20-39
40-59
60+
4
48
96
78
63
25.0
35.4
40.6
55.1
22
45.5
41.3
4.20
3.35
4.58
5.43
5.10
5.77
Table 1.1.2
Trace Metal Levels in Maternal Blood
(µg/100 ml)

ELEMENT  #OBS         PER CENT    MEAN       MINIMUM    MAXIMUM    GEOM MEAN   GM/GSD (a)  GM X GSD (a)
                      CENSORED
PB       186            .00       33.27        4.70     178.00       27.81       15.80        48.95
CD       186           3.23        3.17         .10      31.30        1.72         .52         5.67
CU       183            .00       79.34       25.20     149.00       75.72       54.91       104.43
ZN       181            .00      662.00      270.00    1075.00      646.00      514.00       810.00
HG       177          25.42         .96         .10       6.60         .41         .11         1.49
LI       185           1.62         .56         .03       2.38         .39         .16          .96
SE       182            .00       11.68        3.30      34.60       10.84        7.36        15.96
FE       187            .00    48565.00    15200.00   67100.00    48000.00    40644.00     56687.00
BA       185            .00        8.65        2.00      40.00        7.43        4.37        12.61
B        187           4.81       10.04        1.00      49.00        7.70        3.69        16.09
CR       178            .56       11.1         1.0       90.0         6.9         2.8         17.4
NI       182          14.29        8.5          .7       85.0         3.8         1.2         12.3
AG       [illegible]  41.71         .40         .06       3.80         .26         .10          .65
V        181          48.07        1.3          .3        3.3         1.2          .8          1.7
SN       186          11.83        4.6          .9       24.0         3.7         2.0          6.9
MN       186           5.38        3.49         .60      23.00        2.71        1.35         5.44

(a) These values give a range that should include approximately 68% of the population concentrations, assuming the underlying distribution of concentrations is log normal.
Table 1.1.3
Trace Metal Levels in Cord Blood
(µg/100 ml)

ELEMENT  #OBS         PER CENT    MEAN       MINIMUM    MAXIMUM    GEOM MEAN   GM/GSD (a)  GM X GSD (a)
                      CENSORED
PB       180            .56       31.77        2.70     136.00       26.86       14.56        49.55
CD       185           4.32        2.76         .10      18.20        1.65         .52         5.26
CU       186            .00       45.46       10.30     127.00       41.77       27.33        64.08
ZN       185            .00      499.00      170.00    1100.00      478.00      354.00       645.00
HG       185          22.70        1.33         .10      11.50         .58         .15         2.22
LI       185           1.62         .78         .05       5.21         .51         .19         1.36
SE       180            .00       12.71        3.70      35.60       11.83        8.16        17.15
FE       185            .00    50825.00    27200.00   78900.00    50369.00    43857.00     57848.00
BA       183            .00       10.01        2.00      45.00        8.32        4.71        14.72
B        [illegible]    .00       13.62         .70      94.00       10.74        5.44        21.20
CR       [illegible]    .00       12.4         1.0       73.0         7.9         3.2         19.6
NI       182           5.49        8.1          .7       56.0         4.5         1.6         12.9
AG
186
29.03
V
187
41. 71
SN
187
7.49
MN
186
.54
.53
.33
.13
.08
9.00
1.6
.5
6.0
1.4
2.2
2.3
5.6
.9
42.0
4.31
2.2
8.5
4.18
.85
21.00
3.56
2.00
6.34
.84
(a) These values give a range that should include approximately 68% of the population concentrations, assuming the underlying distribution of concentrations is log normal.
Table 1.1.4
Trace Metal Levels in Placental Tissue
(µg/100 gm)

ELEMENT  #OBS   PER CENT       MEAN       MINIMUM    MAXIMUM       GEOM MEAN   GM/GSD (a)  GM X GSD (a)
                CENSORED
PB       165      .00          37.26        8.10      150.00         31.56       17.91        55.61
CD       169      .00           4.41         .90       15.80          3.74        2.08         6.73
CU       166      .00          42.21        4.40      [illegible]    33.75       15.82        72.01
ZN       166      .00        1344.         600.      3200.         1267.        906.        1771.
HG       164     9.15           2.39         .10       81.90           .65         .14         2.90
LI       167      .00            .65         .05        3.35           .49         .21         1.12
SE       160      .00          14.39        6.10       24.80         13.85       10.42        18.41
FE       169      .00       30420.00     3800.00    91400.00      28930.00    14280.00     47100.00
BA       167      .00          10.1         3.0        31.0           8.9         5.4         14.5
B        169    20.12           8.3         2.0        47.0           6.1         2.9         12.7
CR       167     4.19           6.3         1.0        27.3           4.8         2.3         10.1
NI       169    [illegible]     3.4          .8        32.0           2.2         1.0          5.2
AG       168    60.12            .47         .09        4.40           .26         .09          .71
V        169    88.76           1.5          .8         8.0           1.3          .9          2.1
SN       168    30.36           5.0         2.0        29.0           3.9         2.1          7.4
MN       167      .60           9.4         1.0        50.0           6.9         3.3         14.3

(a) These values give a range that should include approximately 68% of the population concentrations, assuming the underlying distribution of concentrations is log normal.
censoring must be considered.
For example, if two tissues are collected
and a trace pollutant measured in each, bivariate Type I random censoring of the two trace pollutant concentrations is a very real possibility.
Or, if two different trace pollutants are measured within a single tissue, both trace pollutant concentrations may again be subject to Type I
random censoring.
In time-to-tumor experiments, if two organs are being
examined simultaneously for carcinogenesis, time-to-tumor might be censored for one or both organs by the death or sacrifice of the animal.
Maximum likelihood estimation procedures to deal with these situations
will be developed in this study.
The multivariate case where the number
of variables is greater than two is more complex, but special cases will
be presented (where no more than two variables are censored in an observation vector) and approximations suggested for other cases that will
lead to good estimation and hypothesis testing procedures in the multivariate normal distribution situation.
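The interval columns footnoted in Tables 1.1.2 through 1.1.4 can be checked directly. Assuming, as the footnote does, that concentrations are log normal, log X has mean log GM and standard deviation log GSD, so (GM/GSD, GM x GSD) is the one-standard-deviation interval on the log scale. The hypothetical helpers below recover the GSD separately from both interval columns of the PB row of Table 1.1.2 and compute the coverage of that interval; this is an illustrative sketch, not part of the original analysis.

```python
import math


def implied_gsd(gm, gm_over_gsd, gm_times_gsd):
    """GSD recovered separately from the two interval columns of a table row."""
    return gm / gm_over_gsd, gm_times_gsd / gm


def lognormal_one_gsd_coverage():
    """P(GM/GSD < X < GM*GSD) when log X ~ N(log GM, (log GSD)^2).
    Taking logs, this is P(-1 < Z < 1) for a standard normal Z."""
    return math.erf(1.0 / math.sqrt(2.0))


if __name__ == "__main__":
    # PB row of Table 1.1.2 (maternal blood): GM 27.81, GM/GSD 15.80, GM x GSD 48.95
    print(implied_gsd(27.81, 15.80, 48.95))
    print(round(lognormal_one_gsd_coverage(), 4))
```

Both columns imply a GSD of about 1.76 for lead in maternal blood, and the coverage of about 0.683 is the "approximately 68%" quoted in the footnote.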
Very little work has been done on hypothesis-testing problems for
censored normal distribution situations.
Another aspect of this work
will be the development of hypothesis-testing procedures using the likelihood ratio approach for Type I randomly censored samples from univariate or bivariate normal distributions, and a comparison of these
results with other suggested procedures from the literature, where
possible.
1.2 Review of the Literature
1.2.1 Univariate Normal Estimation
A number of authors give maximum likelihood estimators for the parameters of a univariate normal distribution when the distribution is Type I censored, with the censoring point fixed.
The problem was originally treated by Bliss and Stevens (1937), who proposed to use a successive approximations technique.
Stevens, in an appendix to Bliss's paper, actually described a solution for both the singly and doubly Type I censored cases, but only the single censoring case was of interest to Bliss.
Stevens' approach was to write down the likelihood equations and take partial derivatives with respect to $\mu$ and $\sigma$.
Initial estimates of $\mu$ and $\sigma$ obtained by graphical methods were then inserted into the normal equations.
Newton's method for iteration to new estimates of the parameters, based on a Taylor's series expansion, then proceeded using the following scheme (see, for example, Rao (1952)):
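$$\frac{\partial^2 L}{\partial\mu^2}\,\delta\mu + \frac{\partial^2 L}{\partial\mu\,\partial\sigma}\,\delta\sigma = -\frac{\partial L}{\partial\mu} \tag{1.2.1.1}$$
$$\frac{\partial^2 L}{\partial\mu\,\partial\sigma}\,\delta\mu + \frac{\partial^2 L}{\partial\sigma^2}\,\delta\sigma = -\frac{\partial L}{\partial\sigma} \tag{1.2.1.2}$$
where all derivatives are evaluated at the current estimates $(\mu_0, \sigma_0)$.
By evaluating the partial derivatives using prior parameter estimates, the corrections $\delta\mu$ and $\delta\sigma$ can be obtained from these simultaneous equations and the cycle repeated until convergence is obtained.
An estimate of the information matrix is also obtained from this approach, since the coefficients on the left-hand side of the above equations are the necessary second partial derivatives evaluated at the estimated parameter points.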
Lea (1945) modified the approach of Stevens to apply to the Type I random censoring situation in looking at time-to-tumor experiments with unrelated deaths in the sample.
Lea wrote down the contributions to the log likelihood under each condition as shown in Table 1.2.1.1.
The Newton iteration scheme was then applied, resulting in corrections to the estimated parameters given by
$$\delta\mu = \sigma\,(AG + BH), \qquad \delta\sigma = \sigma\,(AH + BJ) \tag{1.2.1.3}$$
where $G = E/(CE - D^2)$, $H = -D/(CE - D^2)$ and $J = C/(CE - D^2)$, with $A$, $B$, $C$, $D$ and $E$ the totals, over all animals, of the contributions defined in Table 1.2.1.1.
As in Stevens (1937), the contributions to $A$ through $E$ are recomputed using the new estimated parameters, and iterations are continued until convergence is achieved.
The variance-covariance matrix of the parameter estimates is approximated by
$$\hat\sigma^2\begin{pmatrix} G & H \\ H & J \end{pmatrix}$$
using values obtained from the last iteration of the estimation procedure.

Table 1.2.1.1
Contributions to Log Likelihood and Its Derivatives
for a Time-to-Tumor Experiment (Lea, 1945)

Quantity                                                    Contribution by an animal       Contribution by an animal
                                                            getting a tumor at log-time t   not getting a tumor by log-time t
$L$                                                         $\log(z/\sigma)$                $\log q$
$A = \sigma\,\partial L/\partial\mu$                        $x$                             $z/q$
$B = \sigma\,\partial L/\partial\sigma$                     $x^2 - 1$                       $xz/q$
$C = -\sigma^2\,\partial^2 L/\partial\mu^2$                 $1$                             $z^2/q^2 - xz/q$
$D = -\sigma^2\,\partial^2 L/\partial\mu\,\partial\sigma$   $2x$                            $z(1 - x^2)/q + xz^2/q^2$
$E = -\sigma^2\,\partial^2 L/\partial\sigma^2$              $3x^2 - 1$                      $xz(2 - x^2)/q + x^2z^2/q^2$

Here $x = (t - \mu)/\sigma$, $z = \phi(x)$ is the standard normal ordinate at $x$, and $q = 1 - \Phi(x)$.
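Lea's scheme can be sketched in a few lines of Python. This is an illustrative implementation written for this review, not Lea's original computation; the function names, the simulated data generator, and the step-halving safeguard (which Lea's scheme does not include) are assumptions of the sketch. Tumor log-times are treated as exact observations and tumor-free deaths or sacrifices as random right-censoring times, with $A$ through $E$ accumulated from the two columns of Table 1.2.1.1 and the corrections taken from (1.2.1.3).

```python
import math
import random


def _phi(x):
    """Standard normal ordinate z at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _upper(x):
    """Upper-tail area q = 1 - Phi(x), via erfc to keep precision in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def lea_newton(tumor_times, survivor_times, mu, sigma, tol=1e-10, max_iter=100):
    """Iterate the corrections (1.2.1.3) using totals A-E from Table 1.2.1.1.

    tumor_times: log-times at which tumors were observed.
    survivor_times: log-times at which tumor-free animals died or were sacrificed.
    Returns (mu, sigma, approximate covariance matrix of the estimates)."""

    def loglik(m, s):
        ll = sum(-math.log(s) - 0.5 * ((t - m) / s) ** 2 for t in tumor_times)
        return ll + sum(math.log(_upper((t - m) / s)) for t in survivor_times)

    for _ in range(max_iter):
        A = B = C = D = E = 0.0
        for t in tumor_times:  # first column of Table 1.2.1.1
            x = (t - mu) / sigma
            A += x
            B += x * x - 1.0
            C += 1.0
            D += 2.0 * x
            E += 3.0 * x * x - 1.0
        for t in survivor_times:  # second column of Table 1.2.1.1
            x = (t - mu) / sigma
            r = _phi(x) / _upper(x)  # r = z/q
            A += r
            B += x * r
            C += r * r - x * r
            D += r * (1.0 - x * x) + x * r * r
            E += x * r * (2.0 - x * x) + x * x * r * r
        det = C * E - D * D
        G, H, J = E / det, -D / det, C / det
        d_mu = sigma * (A * G + B * H)
        d_sigma = sigma * (A * H + B * J)
        # Safeguard added for this sketch: halve the step if it would make
        # sigma non-positive or lower the log likelihood.
        base = loglik(mu, sigma)
        step = 1.0
        while step > 1e-6:
            new_sigma = sigma + step * d_sigma
            if new_sigma > 0.0 and loglik(mu + step * d_mu, new_sigma) >= base - 1e-9 * (1.0 + abs(base)):
                break
            step *= 0.5
        mu, sigma = mu + step * d_mu, sigma + step * d_sigma
        if abs(step * d_mu) < tol and abs(step * d_sigma) < tol:
            break
    s2 = sigma * sigma
    return mu, sigma, [[s2 * G, s2 * H], [s2 * H, s2 * J]]


def simulate_tumor_experiment(n, mu, sigma, c_mu, c_sigma, seed=0):
    """Hypothetical generator: tumor log-time T ~ N(mu, sigma^2) and an independent
    unrelated-death log-time C ~ N(c_mu, c_sigma^2); T is seen only if T <= C."""
    rng = random.Random(seed)
    tumors, survivors = [], []
    for _ in range(n):
        t, c = rng.gauss(mu, sigma), rng.gauss(c_mu, c_sigma)
        if t <= c:
            tumors.append(t)
        else:
            survivors.append(c)
    return tumors, survivors
```

The sketch starts from whatever values the caller supplies (for example, the mean and standard deviation of the tumor times), playing the role of Stevens' graphical first estimates. For very large standardized survivor times, $q$ underflows and the ratio $z/q$ would need an asymptotic approximation, which is omitted here.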
Hald (1949) examined Type I fixed single censoring using an approach entirely different from earlier efforts.
Using the point of truncation ($\xi$) as the origin, Hald wrote down the log likelihood and differentiated it with respect to $\mu$ and $\sigma$.
He then rewrote these normal equations in terms of the standardized truncation point, $\zeta = (\xi - \mu)/\sigma$, rather than $\mu$.
By manipulating these new normal equations, a complex function of $\zeta$ and sample values only was obtained.
An auxiliary function $f(h, y)$, dependent upon the fraction censored ($h$) and a sample statistic ($y$), was tabled by Hald to provide $\hat\zeta$.
A second auxiliary function, $g(h, \zeta)$, was then employed to obtain $\hat\sigma$, and these two parameter estimates were then used to obtain $\hat\mu$.
Limits, as $n \to \infty$, of the second partials with respect to $\mu$ and $\sigma$ were used to derive expressions for the asymptotic variance-covariance matrix of the estimates.
Cohen (1950) used an approach similar to Hald's for the Type I doubly censored case.
Cohen took derivatives of the log likelihood with respect to $\sigma$ and the left censoring point $\xi_L$ (expressed in standard population units).
He then manipulated these normal equations to obtain the estimating equations
$$\sigma\,(Y_1 - Y_2 - \xi_L) - \nu_1 = 0, \qquad \sigma^2\left[1 - \xi_L\,(Y_1 - Y_2 - \xi_L) - Y_2 R/\sigma\right] - \nu_2 = 0 \tag{1.2.1.4}$$
where
$R$ = range of uncensored values (the distance between the left and right censoring points),
$\xi_R$ = right censoring point $= \xi_L + R/\sigma$,
$\phi(x) = (2\pi)^{-1/2}\exp(-x^2/2)$ and $\Phi(y) = \int_{-\infty}^{y}\phi(x)\,dx$,
$n_1$, $n_2$ = number of censored values in the left and right tails, respectively,
$n_0$ = number of uncensored values,
$Y_1 = (n_1/n_0)\,\phi(\xi_L)/\Phi(\xi_L)$ and $Y_2 = (n_2/n_0)\,\phi(\xi_R)/[1 - \Phi(\xi_R)]$,
and $\nu_1$ and $\nu_2$ are the first and second sample moments referred to the left terminus $x_0$; i.e., $\nu_k = \sum_{i=1}^{n_0} x_i^k / n_0$, with the $x_i$ measured from $x_0$.
The equations (1.2.1.4) must be solved simultaneously using a modified Newton-Raphson technique for solving two equations in two unknowns.
Both equations require only normal curve ordinates and areas to be evaluated for desired values of $\xi_L$ and $\sigma$.
After solutions are obtained, the mean is estimated by $\hat\mu = x_0 - \hat\sigma\hat\xi_L$.
Second partial derivatives with respect to $\sigma$ and $\xi_L$ are derived, and final estimates of the parameters are inserted to provide an estimate of the variance-covariance matrix for $\hat\sigma$ and $\hat\xi_L$.
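The doubly censored equations (1.2.1.4) can be solved numerically as sketched below. Cohen used a modified Newton-Raphson technique; this illustration instead assumes a plain Newton step with a forward-difference Jacobian and simple step-halving, and takes $R$ as the distance between the two censoring points. All function names are hypothetical, and the data generator exists only to exercise the solver.

```python
import math
import random


def _phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Phi(x), computed via erfc so that both tails keep their precision."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def cohen1950(values, x_left, x_right, n1, n2, sigma0, xi0, tol=1e-11, max_iter=200):
    """Solve (1.2.1.4) for (sigma, xi_L) and return (mu_hat, sigma_hat).
    values: the n0 uncensored observations; n1, n2: counts censored below x_left
    and above x_right."""
    n0 = len(values)
    R = x_right - x_left
    nu1 = sum(v - x_left for v in values) / n0  # moments about the left terminus
    nu2 = sum((v - x_left) ** 2 for v in values) / n0

    def residuals(sigma, xi):
        xi_r = xi + R / sigma
        y1 = (n1 / n0) * _phi(xi) / _cdf(xi)
        y2 = (n2 / n0) * _phi(xi_r) / _cdf(-xi_r)  # 1 - Phi(xi_r) = Phi(-xi_r)
        f1 = sigma * (y1 - y2 - xi) - nu1
        f2 = sigma ** 2 * (1.0 - xi * (y1 - y2 - xi) - y2 * R / sigma) - nu2
        return f1, f2

    sigma, xi = sigma0, xi0
    for _ in range(max_iter):
        f1, f2 = residuals(sigma, xi)
        norm = abs(f1) + abs(f2)
        if norm < tol:
            break
        h = 1e-7  # forward-difference Jacobian
        a1, a2 = residuals(sigma + h, xi)
        b1, b2 = residuals(sigma, xi + h)
        j11, j21 = (a1 - f1) / h, (a2 - f2) / h
        j12, j22 = (b1 - f1) / h, (b2 - f2) / h
        det = j11 * j22 - j12 * j21
        d_sigma = (-f1 * j22 + f2 * j12) / det
        d_xi = (-f2 * j11 + f1 * j21) / det
        step = 1.0  # halve the Newton step until the residuals shrink
        for _ in range(40):
            new_sigma, new_xi = sigma + step * d_sigma, xi + step * d_xi
            if new_sigma > 0.0:
                g1, g2 = residuals(new_sigma, new_xi)
                if abs(g1) + abs(g2) < norm:
                    break
            step *= 0.5
        sigma, xi = new_sigma, new_xi
    return x_left - sigma * xi, sigma  # mu_hat = x0 - sigma_hat * xi_L_hat


def simulate_doubly_censored(n, mu, sigma, x_left, x_right, seed=0):
    """Hypothetical generator of a fixed Type I doubly censored normal sample."""
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    inside = [v for v in data if x_left <= v <= x_right]
    n1 = sum(1 for v in data if v < x_left)
    n2 = sum(1 for v in data if v > x_right)
    return inside, n1, n2
```

Starting values can be the uncensored mean and standard deviation, with $\xi_L$ started at $(x_0 - \bar x)/s$.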
Cohen (1957) further reduced his earlier equations (1950) for Type I fixed double censoring by expressing
$$\sigma = \frac{R}{\xi_R - \xi_L}, \tag{1.2.1.5}$$
resulting in the estimating equations (1.2.1.6), where $s^2 = \nu_2 - \nu_1^2$ is the sample variance of the uncensored observations:
$$\frac{Y_1 - Y_2 - \xi_L}{\xi_R - \xi_L} - \frac{\nu_1}{R} = 0, \qquad \frac{1 + \xi_L Y_1 - \xi_R Y_2 - (Y_1 - Y_2)^2}{(\xi_R - \xi_L)^2} - \frac{s^2}{R^2} = 0 \tag{1.2.1.6}$$
with $Y_1$ and $Y_2$ defined as in (1.2.1.4).
Iteration procedures for solution of equations (1.2.1.6) are explicitly outlined in one section of his work, with suggestions for first approximations of the equation solutions.
Second partial derivatives with respect to $\mu$ and $\sigma$ are written down for this case, and estimates of the parameters may be inserted to obtain an approximation of the variance-covariance matrix of $\hat\mu$ and $\hat\sigma$ by inversion of the estimated information matrix.
For the Type I singly censored case, Cohen (1957) presents an alternative solution based on the following equations (assuming left censoring, so that $n_2 = 0$ and $Y_2 = 0$):
$$\frac{1 + \xi_L Y_1 - Y_1^2}{(Y_1 - \xi_L)^2} - \frac{\nu_2 - \nu_1^2}{\nu_1^2} = 0 \tag{1.2.1.7}$$
$$\sigma = \frac{\nu_1}{Y_1 - \xi_L} \tag{1.2.1.8}$$
$$\mu = x_0 - \sigma\,\xi_L \tag{1.2.1.9}$$
The first equation is solved for $\xi_L$, $\sigma$ is calculated using the second, and $\mu$ using the third.
The family of curves given by (1.2.1.7) is plotted as functions of $\xi_L$ for various values of $h = n_1/(n_0 + n_1)$ in Cohen's paper, so that a solution for $\xi_L$ is available graphically, with $\sigma$ and $\mu$ then immediately available through equations (1.2.1.8) and (1.2.1.9), respectively.
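Cohen relied on plotted curves to solve (1.2.1.7); a numerical root search can serve the same purpose. The sketch below, an illustration rather than Cohen's procedure, bisects (1.2.1.7) over $\xi_L$, restricting the search to values with $Y_1 - \xi_L > 0$ so that (1.2.1.8) yields a positive $\sigma$. The search bracket $[-10, 10]$ and all function names are assumptions.

```python
import math
import random


def _phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cohen1957_left(values, x0, n1):
    """Solve (1.2.1.7) for xi_L, then apply (1.2.1.8) and (1.2.1.9).
    values: the n0 uncensored observations (all >= x0); n1: number censored below x0.
    Returns (mu_hat, sigma_hat)."""
    n0 = len(values)
    ratio = n1 / n0
    nu1 = sum(v - x0 for v in values) / n0
    nu2 = sum((v - x0) ** 2 for v in values) / n0
    target = (nu2 - nu1 ** 2) / nu1 ** 2

    def y1(xi):
        return ratio * _phi(xi) / _cdf(xi)

    # Feasible roots need Y1 - xi_L = nu1/sigma > 0, i.e. xi_L below the point where Y1(xi) = xi.
    xi_max = _bisect(lambda xi: xi - y1(xi), -10.0, 10.0)

    def g(xi):
        y = y1(xi)
        return (1.0 + xi * y - y * y) / (y - xi) ** 2 - target

    xi_l = _bisect(g, -10.0, xi_max - 1e-9)
    sigma = nu1 / (y1(xi_l) - xi_l)
    return x0 - sigma * xi_l, sigma


def simulate_left_censored(n, mu, sigma, x0, seed=0):
    """Hypothetical generator of a fixed Type I left-censored normal sample."""
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    kept = [v for v in data if v >= x0]
    return kept, n - len(kept)
```

Bisection is used here only because it needs no starting values, in the same spirit as reading $\xi_L$ off Cohen's curves.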
Cohen (1959) further examined the Type I and Type II fixed single censoring case and obtained simplified estimators by writing the normal equations in terms of three parameters: $\xi = (x_0 - \mu)/\sigma$, $\mu$, and $\sigma$. He obtains as normal equations in the Type I singly censored case
$$x_0 - \mu = \sigma\xi \qquad (1.2.1.10)$$
$$\bar x - \mu = \sigma Y \qquad (1.2.1.11)$$
$$s^2 + (\bar x - \mu)^2 = \sigma^2(1 + \xi Y) \qquad (1.2.1.12)$$
where
$$Y(h,\xi) = \frac{h}{1-h}\cdot\frac{\phi(-\xi)}{1 - \Phi(-\xi)} .$$
The system of equations (1.2.1.10) through (1.2.1.12) is replaced by the equivalent system
$$\sigma^2 = s^2 + \lambda(\bar x - x_0)^2 \qquad (1.2.1.13)$$
$$\mu = \bar x - \lambda(\bar x - x_0) \qquad (1.2.1.14)$$
$$\frac{1 - Y(Y - \xi)}{(Y - \xi)^2} = \frac{s^2}{(\bar x - x_0)^2} \qquad (1.2.1.15)$$
where $\bar x$ and $s^2$ are the uncensored sample moments and $\lambda = \lambda(h,\xi) = Y/(Y - \xi)$; this follows because (1.2.1.10) and (1.2.1.11) give $\bar x - x_0 = \sigma(Y - \xi)$, hence $\sigma Y = \lambda(\bar x - x_0)$. For a given sample, $\lambda$ is determined directly as the value of $\lambda$ corresponding to the sample statistics $h$ and $s^2/(\bar x - x_0)^2$. With $h$, $\bar x$, $x_0$ and $s^2$ available from the sample, it is only necessary to read $\lambda$ from a table provided by Cohen and later greatly extended by Cohen (1961), or from graphs also provided, and to calculate $\sigma$ and $\mu$ using (1.2.1.13) and (1.2.1.14).
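Where Cohen's tables or graphs are not at hand, $\lambda$ can be found numerically. The following is a minimal sketch, not Cohen's procedure: it solves (1.2.1.15) for $\xi$ by bisection (our substitution for the table lookup), forms $\lambda = Y/(Y - \xi)$, and applies (1.2.1.13) and (1.2.1.14) for a sample left-censored at $x_0$. The function names `cohen_Y`, `solve_xi` and `cohen_estimates` are hypothetical, and the search bracket $[-10, 10]$ is an assumption suited to moderate censoring.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def cohen_Y(h, xi):
    """Y(h, xi) = (h/(1-h)) * phi(-xi)/(1 - Phi(-xi)) = (h/(1-h)) * phi(xi)/Phi(xi)."""
    return (h / (1.0 - h)) * phi(xi) / Phi(xi)


def _bisect(f, a, b, tol=1e-13):
    """Root of f on [a, b] by bisection; assumes f(a) < 0 < f(b)."""
    while b - a > tol:
        m = 0.5 * (a + b)
        if f(m) < 0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)


def solve_xi(h, gamma, lo=-10.0, hi=10.0):
    """Solve (1 - Y(Y - xi)) / (Y - xi)**2 = gamma, equation (1.2.1.15), for xi."""
    # Admissible xi satisfy Y - xi > 0; find the boundary where xi = Y.
    upper = _bisect(lambda x: x - cohen_Y(h, x), lo, hi)

    def g(x):
        d = cohen_Y(h, x) - x
        return 1.0 - cohen_Y(h, x) * d - gamma * d * d  # equals 1 at the boundary

    if g(lo) >= 0:
        raise ValueError("no sign change in bracket; widen lo")
    return _bisect(g, lo, upper)


def cohen_estimates(x_uncensored, x0, n_censored):
    """Cohen-type estimates (mu, sigma^2, lambda) for a sample left-censored at x0."""
    n0 = len(x_uncensored)
    h = n_censored / (n0 + n_censored)
    xbar = sum(x_uncensored) / n0
    s2 = sum((x - xbar) ** 2 for x in x_uncensored) / n0  # moment (divisor n0)
    xi = solve_xi(h, s2 / (xbar - x0) ** 2)
    Y = cohen_Y(h, xi)
    lam = Y / (Y - xi)
    mu = xbar - lam * (xbar - x0)            # (1.2.1.14)
    sigma2 = s2 + lam * (xbar - x0) ** 2     # (1.2.1.13)
    return mu, sigma2, lam
```

For example, `cohen_estimates([1.0, 1.5, 2.0, 2.5, 3.0], 0.5, 2)` treats two further observations as lying below 0.5; because $\lambda > 0$, the estimated mean falls below $\bar x$ and the estimated variance exceeds $s^2$, as (1.2.1.13) and (1.2.1.14) require.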
Estimators in the Type II singly censored case turn out to be identical with those for Type I singly censored samples when we set the smallest (or largest) uncensored observation equal to $x_0$.
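Formulae for the Type I and Type II singly censored asymptotic variance-covariance matrices are given by Cohen in terms of $\xi$ and $h$:
$$V(\hat\mu) \approx \frac{\sigma^2}{E(n)}\cdot\frac{\theta_{22}}{\theta_{11}\theta_{22} - \theta_{12}^2}, \qquad V(\hat\sigma) \approx \frac{\sigma^2}{E(n)}\cdot\frac{\theta_{11}}{\theta_{11}\theta_{22} - \theta_{12}^2},$$
$$\mathrm{Cov}(\hat\mu,\hat\sigma) \approx \frac{\sigma^2}{E(n)}\cdot\frac{-\theta_{12}}{\theta_{11}\theta_{22} - \theta_{12}^2} \qquad (1.2.1.16)$$
where $E(n)$ is the expected number of uncensored observations and the $\theta_{ij}$ are given in the left-censored case by

Type I censored samples:
$$\theta_{11}(\xi) = 1 + Z(\xi)[Z(-\xi) + \xi], \qquad \theta_{12}(\xi) = Z(\xi)\{1 + \xi[Z(-\xi) + \xi]\}, \qquad \theta_{22}(\xi) = 2 + \xi\,\theta_{12}(\xi)$$
Type II censored samples:
$$\theta_{11}(h,\xi) = 1 + Y(h,\xi)[Z(-\xi) + \xi], \qquad \theta_{12}(h,\xi) = Y(h,\xi)\{1 + \xi[Z(-\xi) + \xi]\}, \qquad \theta_{22}(h,\xi) = 2 + \xi\,\theta_{12}(h,\xi) \qquad (1.2.1.17)$$
with $Z(\xi) = \phi(\xi)/[1 - \Phi(\xi)]$. For samples censored on the right, $\theta_{ij}(-\xi)$ from (1.2.1.17) are substituted into (1.2.1.16). As $(n_0 + n_1) \to \infty$, the $\theta_{ij}$ for Type II censored samples approach those for Type I censored samples, and
$$1 - \Phi(\xi) = \lim_{n_0 + n_1 \to \infty}\frac{n_0}{n_0 + n_1},$$
so that limiting values of the estimated variance and covariance for both types of censored samples in this sense become the same.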
Sampford (1952) examined the problem of estimation of the parameters
of a univariate normal distribution under random single right censoring
in what he called the "accidental death" model.
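In Sampford's model the censoring is assumed to be independent of the censored distribution, which is also an assumption in the results to be established in this work. Sampford's approach is very similar to that of Lea (1945), with the exception that Sampford uses a different parameterization of the normal distribution, with the density function given by
$$f(x) = \frac{\beta}{\sqrt{2\pi}}\exp\left[-\tfrac12(\alpha + \beta x)^2\right], \qquad \beta > 0 \qquad (1.2.1.18)$$
so that $\alpha = -\mu/\sigma$ and $\beta = 1/\sigma$. The normal equations for $\alpha$ and $\beta$ for right censoring are then
$$\frac{\partial\ln L}{\partial\alpha} = \sum_{x'} v(\eta') - \sum_{x}\eta = 0, \qquad \frac{\partial\ln L}{\partial\beta} = \frac{n}{\beta} + \sum_{x'} x'v(\eta') - \sum_{x} x\eta = 0 \qquad (1.2.1.19)$$
where
$\eta = \alpha + \beta x$,
$v(\eta) = \phi(\eta)/[1 - \Phi(\eta)]$,
$n$ = number of uncensored values,
$x'$ = censored values,
$x$ = uncensored values.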
Using the Maclaurin expansion to the first order of small quantities, as in Lea (1945), the increments to be added to the first estimates $\alpha_1$ and $\beta_1$ to yield the revised estimates $\alpha_2$ and $\beta_2$ are determined after reduction of the equations by (1.2.1.20), where
$$z = v(\eta) - \eta\lambda(\eta), \qquad \lambda(\eta) = v^2(\eta) - \eta v(\eta) = \frac{dv}{d\eta},$$
$$w = \begin{cases}1 & \text{for an uncensored value}\\ \lambda & \text{for a censored value}\end{cases}, \qquad \bar x = \frac{\sum wx}{\sum w}, \qquad \alpha' = \alpha + \beta\bar x .$$
Further cycles can be carried out until convergence is achieved. The asymptotic variances of $\alpha$ and $\beta$ are then estimated by the reciprocals of the denominators of (1.2.1.20) as calculated for the final iteration.
Blight (1970) derived the maximum likelihood equations for the estimation of the parameters of an exponential family from a Type I censored sample. He found the maximum likelihood estimates by an iterative procedure involving the expected values of the estimators conditional on the previous estimates. The exponential family member must be parameterized such that
$$\frac{\partial\ln f(x,\theta)}{\partial\theta} = A\{\varphi(x) - \theta\} \qquad (1.2.1.21)$$
where $A$ is a positive definite matrix and a function of $\theta$ alone. If the total number of observations is $N$ and the number of uncensored values is $n$, while the number of censored values in the subrange $S_j$ is $n_j$, $j = 1,2,\ldots,k$, then the likelihood equations become
$$N\hat\theta = \sum_{j=1}^{n}\varphi(x_j) + \sum_{j=1}^{k} n_j E_j(\hat\theta) \qquad (1.2.1.22)$$
where $E_j(\theta)$ is the expectation of $\varphi(x)$ conditional on $x$ lying in $S_j$. The right-hand side of this equation is seen to be the form of the maximum likelihood estimator for a complete set of $N$ values in which the contribution of a value that is not exactly observed is replaced by its expected contribution conditional on $\theta$ being equal to the maximum likelihood estimate. Hence his iterative procedure is given by the recurrence relation in the estimator
$$\hat\theta_{i+1} = N^{-1}\left[\sum_{j=1}^{n}\varphi(x_j) + \sum_{j=1}^{k} n_j E_j(\hat\theta_i)\right], \qquad i = 0,1,\ldots \qquad (1.2.1.23)$$
where $\hat\theta_0$ is some initial estimate. In the case of the normal distribution, Blight's method requires a transformation of parameters so that $\theta = (\mu, \sigma^2 + \mu^2)' = (\theta_1, \theta_2)'$, corresponding to $\varphi(x) = (x, x^2)'$. He also derives new estimates serially, using the new estimate of $\theta_1$ in estimating $\theta_2$ in each iteration.
In the vicinity of the maximum likelihood estimates we have approximately
$$E_j(\hat\theta_i) \approx E_j(\hat\theta) + B_j(\hat\theta)(\hat\theta_i - \hat\theta), \qquad \text{where } B_j(\theta) = \frac{\partial E_j(\theta)}{\partial\theta'} \qquad (1.2.1.24)$$
Using this, Blight shows that
$$\hat\theta_{i+1} - \hat\theta \approx N^{-1}\sum_{j=1}^{k} n_j B_j(\hat\theta)(\hat\theta_i - \hat\theta) \qquad (1.2.1.25)$$
and hence that a necessary and sufficient condition for convergence is that the largest eigenvalue of $N^{-1}\sum_{j=1}^{k} n_j B_j(\hat\theta)$ must be less than one; convergence is then geometric with rate equal to the largest eigenvalue. These eigenvalues are shown to be all positive. Therefore the expectation of the maximum likelihood estimator of an element of $\theta$ is an increasing function of that element.
The asymptotic variance-covariance matrix of the estimators $\hat\theta$ is found by Blight to be given by the inverse of the Fisher information matrix $I(\theta)$, where
$$I(\theta) = NA\left\{P_0(V + EE') + \sum_{j=1}^{k} P_j E_j E_j' - \theta\theta'\right\}A \qquad (1.2.1.26)$$
and
$P_0$ = probability of an uncensored observation,
$P_j$ = probability of an observation in subrange $S_j$,
$A$ = the inverse of the variance-covariance matrix of $\varphi(x)$ where $x$ is unrestricted,
$V$ = the variance-covariance matrix for $\varphi(x)$ when $x$ is restricted to be uncensored,
$E$ = the expectation of $\varphi(x)$ when $x$ is restricted to be uncensored.

1.2.2
Estimation in Linear Parametric Models
Sampford and Taylor (1959) considered randomized block experiments with Type I single censoring and developed methods analogous to those of Sampford (1952) for their analysis. In the 2-sample (matched pairs) case, Sampford's procedure is directly applied to the paired data differences. An adjustment for bias of the estimate of the variance in the presence of censoring is proposed to be $\sum w/(\sum w - 1)$, analogous to the uncensored case, where $w$ is defined as in (1.2.1.20).
For experiments with more than two treatments, the authors decide on the following procedure:
1. Obtain initial estimates.
2. Insert expected values for the censored values using current estimates of the parameters.
3. Recompute the parameter estimates.
4. Revise the estimates in 2. and iterate until convergence.
5. Run ANOVA using the estimated values for censored values.
6. Revise $\sigma^2$ by dividing the residual sum of squares about the fitted additive model $y_{ij} = \mu + \rho_i + t_j$ by $(mn - r + \sum\lambda_{ij})$, where
   m = number of blocks,
   n = number of treatments,
   r = number of censored values,
   and $\lambda_{ij}$ is defined as in (1.2.1.20).
7. Go to 2. and repeat the whole process until everything converges.
The final estimate of $\sigma^2$ is obtained by dividing by $(m-1)(n-1) - r + \sum\lambda_{ij}$ in the final iteration. Tests of significance of treatment differences are approximated by a t-test with "degrees of freedom" given by the divisor of the final estimate of $\sigma^2$. Hence to compare treatments $t_j$ and $t_k$ we would have
$$t_{(m-1)(n-1)-r+\sum\lambda_{ij}} = \frac{\hat t_j - \hat t_k}{\left[s^2\left(1/\sum_i w_{ij} + 1/\sum_i w_{ik}\right)\right]^{1/2}} \qquad (1.2.2.1)$$
where $w_{ij} = 1$ for uncensored values and $w_{ij} = \lambda_{ij}$ for censored values.
Taylor (1973) used simulated data for the above procedures using a t-test with d.f. $(m-1)(n-1)(mn - r + \sum\lambda_{ij})/mn$, and found the variance estimate to be too low. With d.f. given by $(m-1)(n-1) - r + \sum\lambda_{ij}$ the variance was too high, resulting in more conservative tests. Both approximations were good, however.
Glasser (1965) used Lea's method directly extended to the regression situation with Type I randomly singly censored data.
For the right
censored case, the contributions to the log likelihood and its derivatives under each condition (censored or uncensored) were derived and
tabled.
Using Newton's method equations exactly as in Lea (1945), initial estimates of parameters were inserted in the summed terms from this table and corrections computed. New estimates were then generated and iterations repeated until convergence. The asymptotic variance-covariance matrix is estimated by $\hat\sigma^2$ times the inverse of the coefficient matrix of the iterative equations as it appears in the last iteration.
1.2.3
Bivariate Normal Estimation
Most of the methods available for handling censored data in the univariate case are not applicable to the bivariate situation. Cohen (1955) presents a maximum likelihood solution for the bivariate case where only one variable is censored, and the second variable is not measured when the first is censored. His method is an extension of his univariate methodology mentioned earlier. Singh (1960) published a maximum likelihood solution of the bivariate case where the truncation (rather than censoring) is with respect to both variables, and where the variables are required to be uncorrelated. Rosenbaum (1961) obtains moment estimates for a bivariate normal distribution singly truncated with respect to both variables, where the variables may be correlated. From these moments the parameters of the distribution may be estimated.
1.2.4
The "Missing Information Principle" (MIP) in the Missing Data Case
The MIP as presented by Orchard and Woodbury (1972) represents the framework upon which the results in this work are based. The objective of the MIP is the development of a general principle for obtaining maximum likelihood estimates from incomplete data using as nearly as possible the same techniques as would be used for complete data. Orchard and Woodbury introduce the general philosophy for dealing with the problem of incomplete data. They develop theory and show applications of their techniques in several general cases concerning missing data. Following Woodbury (1971), the approach is explained as follows. Suppose that the complete data consist of the random variable $X$ and that the incomplete-data image of it is $Y$. The likelihood equations may be written
$$L(X\mid\theta) = L(X\mid Y;\theta)\,L(Y\mid\theta) \qquad (1.2.4.1)$$
By taking the appropriate partial derivatives the score vector may be correspondingly written
$$\mathrm{Sco}(\theta\mid X) = \frac{\partial\ln L(X\mid\theta)}{\partial\theta} = \mathrm{Sco}(\theta, X\mid Y) + \mathrm{Sco}(\theta\mid Y) \qquad (1.2.4.2)$$
where $\mathrm{Sco}(\theta\mid X)$ is the score for the parameter $\theta$ as defined in the equation. Since a score has expectation zero under the distribution that generates it, applying this to the conditional distribution of $X$ given $Y$ shows that $E\{\mathrm{Sco}(\theta,X\mid Y)\mid Y\} = 0$; consequently, by taking the conditional expectation of (1.2.4.2) we obtain
$$\mathrm{Sco}(\theta\mid Y) = E\{\mathrm{Sco}(\theta\mid X)\mid Y\} \qquad (1.2.4.3)$$
In a similar manner a partitioning of the information matrix is obtained:
$$I(\theta,\theta\mid X) = I(\theta,\theta\mid Y) + I(\theta,\theta,X\mid Y) \qquad (1.2.4.4)$$
Equation (1.2.4.4) states simply that the total information is equal to the information contained in the available data plus the lost information. Equation (1.2.4.3) states that the score for the incomplete data may be obtained by taking the conditional expectation of the score for the complete data, with the missing values being regarded as random variables. Equations (1.2.4.3) and (1.2.4.4) may be regarded as the fundamental relationships of the MIP.
Beale and Little (1975) examine six estimation procedures applied to multivariate normal populations subjected to random deletions. The estimations are found by
(1) Ordinary least squares using complete observations only
(2) A method proposed by Buck (1960), which is essentially a single iteration of the MIP
(3) The MIP, with (N-1) used instead of N to derive the estimated covariance matrix (corrected MIP)
(4) A method that estimates the means, variances and covariances of the independent variables only by corrected MIP, uses them to fit missing values of the independent variables, and then uses ordinary least squares on all observations for which the dependent variable is present
(5) Method 4, but with incomplete observations given appropriately reduced weights
(6) Method 5, but using a covariance matrix for all the variables, found by Method 3, to find the fitted values and estimate the weights.
The criterion for judging the effectiveness of each estimator was the residual sum of squares of deviations (S) of the observed and fitted values of the dependent variable when the deleted values were restored. The data were generated from a multivariate normal population with one variable identified as the dependent variable and between 2 and 4 independent variables. They took 50, 100, or 200 observations and deleted either 5, 10, 20 or 40 per cent of the observed values of each variable, the values being chosen randomly and independently for each variable. The average value of S was computed for each of the methods over 10 sets of random numbers for seven covariance matrices and each number of observations and deletion pattern above. The conclusion is that the corrected MIP consistently beat the other methods using the S criterion. When the straight MIP was tested versus the corrected MIP, the results were almost identical to those of Method 3. Beale and Little derive Orchard and Woodbury's MIP, following their argument but emphasizing that the effect of the principle is to replace a maximization problem by a fixed point problem. They give a formal definition of the MIP that clarifies its logic, and they follow Orchard and Woodbury in showing that the principle leads to a simple iterative algorithm for finding estimators that are maximum likelihood when the population is multivariate normal. The formal definition of the MIP as presented by Beale and Little is developed as follows:
The MIP is concerned with the situation in which there are random variables that can be grouped into two vectors $z$ and $y$ with joint distribution depending on the vector $\theta$ of parameters, where $y$ has been observed but $z$ has not been observed. In our application of the principle, $\theta$ represents the set of means and the covariance matrix for the multivariate normal distribution, $y$ represents the complete observations, while $z$ represents the missing values in the incomplete observations. We wish to find $\hat\theta$, the estimate which maximizes the log-likelihood $L(y;\theta)$ of $y$ given $\theta$. But it may not be easy to compute this directly. On the other hand, it may be much easier to find the value of $\theta$ that maximizes the log-likelihood $L(z,y;\theta)$ of $z$ and $y$ given $\theta$, for any complete set of data $(z,y)$. Furthermore, we may be able to find the value of $\theta$ which maximizes the expected value of $L(z,y;\theta)$ if $z$ is treated as a random variable with some known distribution. The appropriate formulae can often be derived by imagining that the sample is replicated an arbitrarily large number of times, with $y$ taking the same value in all replications but with $z$ taking its known distribution. This procedure is central to the Missing Information Principle, which is now (formally) described. Let $f(z\mid y;\theta)$ denote the probability density function for the conditional distribution of $z$ given $y$ and $\theta$, and let $L(z\mid y;\theta)$ denote $\ln f(z\mid y;\theta)$. Then we know that
$$L(z,y;\theta) = L(y;\theta) + L(z\mid y;\theta) \qquad (1.2.4.5)$$
Now take any assumed value $\theta_A$ for $\theta$. This, together with the observed value of $y$, defines a distribution for $z$, and we can take expectations of both sides of (1.2.4.5), integrating out with respect to $z$. This is expressed by the equation
$$E\{L(z,y;\theta)\mid y;\theta_A\} = L(y;\theta) + E\{L(z\mid y;\theta)\mid y;\theta_A\} \qquad (1.2.4.6)$$
If the distribution of $z$ has a probability density element $f(z\mid y;\theta_A)\,dz$, then (1.2.4.6) can be equivalently written as
$$\int L(z,y;\theta)f(z\mid y;\theta_A)\,dz = L(y;\theta) + \int L(z\mid y;\theta)f(z\mid y;\theta_A)\,dz \qquad (1.2.4.7)$$
We can now find the value $\theta_M$ of $\theta$ that maximizes the left-hand side of (1.2.4.7). This may depend on $\theta_A$, so we may write
$$\theta_M = \varphi(\theta_A) \qquad (1.2.4.8)$$
Equation (1.2.4.8) represents a transformation from the vector $\theta_A$ to the vector $\theta_M$. We now define the Missing Information Principle: Estimate $\theta$ by a fixed point of the transformation $\varphi$, namely a value of $\theta$ such that
$$\theta = \varphi(\theta) \qquad (1.2.4.9)$$
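Read as an algorithm, (1.2.4.9) asks only for repeated application of the map $\varphi$ until it stops moving. The minimal sketch below shows such an iteration; the helper `fixed_point` and the toy map are hypothetical illustrations, not taken from Beale and Little. In the toy case $N = 10$ values come from $N(\mu, 1)$ and $m = 4$ of them are missing; replacing each missing value by its expectation $\mu_A$ and maximizing gives $\mu_M = (\sum y + m\mu_A)/N$, whose fixed point is the mean of the observed values and which contracts at rate $m/N$, the fraction of information that is missing.

```python
def fixed_point(phi_map, theta0, tol=1e-12, max_iter=10_000):
    """Iterate theta <- phi_map(theta) until successive values agree to tol."""
    theta = list(theta0)
    for it in range(1, max_iter + 1):
        new = list(phi_map(theta))
        if max(abs(a - b) for a, b in zip(new, theta)) < tol:
            return new, it
        theta = new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical toy MIP map: 6 observed values, m = 4 missing, N = 10.
y_obs = [1.2, 0.7, 2.1, 1.6, 0.9, 1.5]
N, m = 10, 4


def toy_map(theta):
    # Each missing value contributes its expectation theta_A = theta[0].
    return [(sum(y_obs) + m * theta[0]) / N]


theta_hat, iterations = fixed_point(toy_map, [0.0])
```

The iteration settles on $\sum y/6 \approx 1.3333$ in a few dozen steps, since each step removes a fraction $1 - m/N = 0.6$ of the remaining error.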
Beale and Little justify their approach by showing that the maximum likelihood estimator of $\theta$ is a root of the fixed point equations, and conversely that every root of the fixed point equations is a maximum or stationary value of the likelihood. Hence, if the likelihood is differentiable, any solution of the fixed point equations automatically satisfies the likelihood equations found by setting the partial derivatives of the likelihood equal to zero.
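The equations for the multivariate normal case with missing data are now developed. Let the matrix $X$ ($N \times n$) denote the complete set of variables, $P_i$ the set of variables observed in the $i$th observation, and $P_T$ the total set of variables observed. Then $\theta = (\mu, \Sigma)$, $\theta_A = (\mu_A, \Sigma_A)$ and $\theta_M = (\mu_M, \Sigma_M) = \varphi(\theta_A)$, and
$$L(X;\mu,\Sigma) = -\tfrac12\sum_{i=1}^{N}\sum_{j=1}^{n}\sum_{k=1}^{n}\sigma^{jk}(x_{ij}-\mu_j)(x_{ik}-\mu_k) + \tfrac12 N\ln(\det\Sigma^{-1}) \qquad (1.2.4.10)$$
where $\sigma^{jk}$ is the $jk$th element of $\Sigma^{-1}$. Taking expectations with $\theta = \theta_A$ and the known variables fixed,
$$E\{L(X;\mu,\Sigma)\mid P_T;\mu_A,\Sigma_A\} = -\tfrac12\sum_{i}\sum_{j}\sum_{k}\sigma^{jk}\left[(x_{ijA}-\mu_j)(x_{ikA}-\mu_k) + \sigma_{jkAP_i}\right] + \tfrac12 N\ln(\det\Sigma^{-1}) \qquad (1.2.4.11)$$
where $x_{ijA}$ equals $x_{ij}$ when variable $j$ is in $P_i$ and otherwise equals its conditional expectation given the variables in $P_i$ under $\theta_A$, and $\sigma_{jkAP_i}$ is the corresponding conditional covariance of $x_{ij}$ and $x_{ik}$ (zero if either variable is observed). Maximizing with respect to $\mu$ and $\Sigma$ gives
$$\mu_{jM} = N^{-1}\sum_{i=1}^{N}x_{ijA} \qquad (1.2.4.12)$$
$$\sigma_{jkM} = N^{-1}\sum_{i=1}^{N}\left[(x_{ijA}-\mu_{jM})(x_{ikA}-\mu_{kM}) + \sigma_{jkAP_i}\right], \qquad 1 \le j \le k \le n \qquad (1.2.4.13)$$
Setting $\mu_A = \mu_M = \hat\mu$ and $\Sigma_A = \Sigma_M = \hat\Sigma$, the fixed point equations are
$$\hat x_{ij} = \begin{cases} x_{ij}, & j \in P_i\\ E(x_{ij}\mid x_{ik},\ k \in P_i;\ \hat\mu,\hat\Sigma), & j \notin P_i\end{cases} \qquad (1.2.4.14)$$
$$\hat\mu_j = N^{-1}\sum_{i=1}^{N}\hat x_{ij} \qquad (1.2.4.15)$$
$$\hat\sigma_{jk} = N^{-1}\sum_{i=1}^{N}\left[(\hat x_{ij}-\hat\mu_j)(\hat x_{ik}-\hat\mu_k) + \hat\sigma_{jkP_i}\right] \qquad (1.2.4.16)$$
These are the equations developed by Woodbury and Hasselblad (1970). To find the ML estimates, initial estimates of $\mu$ and $\Sigma$ are inserted into (1.2.4.14) and cycled through (1.2.4.14) through (1.2.4.16) until convergence is achieved.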
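For two variables the cycle (1.2.4.14) through (1.2.4.16) is short enough to write out. The sketch below is a minimal pure-Python illustration, assuming each row has at most one of its two values missing and taking available-case moments as starting values (our choice); the name `em_bivariate` is hypothetical. A missing value is filled by its regression on the observed one, and the conditional variance enters the corresponding diagonal element of $\hat\Sigma$.

```python
def em_bivariate(rows, tol=1e-12, max_iter=10_000):
    """Cycle equations (1.2.4.14)-(1.2.4.16) for a bivariate normal sample.

    rows: list of (x1, x2); either entry (but not both) may be None.
    Returns (mu, S) with S = [[s11, s12], [s12, s22]] using divisor N.
    """
    N = len(rows)
    o1 = [r[0] for r in rows if r[0] is not None]
    o2 = [r[1] for r in rows if r[1] is not None]
    mu = [sum(o1) / len(o1), sum(o2) / len(o2)]
    s11 = sum((v - mu[0]) ** 2 for v in o1) / len(o1)
    s22 = sum((v - mu[1]) ** 2 for v in o2) / len(o2)
    s12 = 0.0
    for _ in range(max_iter):
        filled, c11, c22 = [], 0.0, 0.0
        for x1, x2 in rows:
            if x2 is None:                       # (1.2.4.14): regress x2 on x1
                x2 = mu[1] + s12 / s11 * (x1 - mu[0])
                c22 += s22 - s12 ** 2 / s11      # conditional variance of x2
            elif x1 is None:                     # (1.2.4.14): regress x1 on x2
                x1 = mu[0] + s12 / s22 * (x2 - mu[1])
                c11 += s11 - s12 ** 2 / s22
            filled.append((x1, x2))
        m1 = sum(f[0] for f in filled) / N       # (1.2.4.15)
        m2 = sum(f[1] for f in filled) / N
        n11 = (sum((f[0] - m1) ** 2 for f in filled) + c11) / N   # (1.2.4.16)
        n22 = (sum((f[1] - m2) ** 2 for f in filled) + c22) / N
        n12 = sum((f[0] - m1) * (f[1] - m2) for f in filled) / N  # no correction:
        # the conditional covariance is zero when one member of the pair is observed
        change = max(abs(m1 - mu[0]), abs(m2 - mu[1]),
                     abs(n11 - s11), abs(n22 - s22), abs(n12 - s12))
        mu, s11, s22, s12 = [m1, m2], n11, n22, n12
        if change < tol:
            return mu, [[s11, s12], [s12, s22]]
    raise RuntimeError("iteration did not converge")


rows = [(1, 2), (2, 3.5), (3, 3), (4, 5), (5, None), (6, None)]
mu_hat, S_hat = em_bivariate(rows)
```

Because $x_1$ is complete in this example, the ML estimate has a closed form that the iteration should reproduce: the complete-case mean of $x_2$ (3.375) plus the complete-case slope (0.85) times the difference in $x_1$ means (3.5 - 2.5), giving 4.225.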
In a recently published report, Dempster et al. (1977) present what they term the "EM algorithm" for maximum likelihood estimation from incomplete data. They acknowledge that some of the theory underlying the EM algorithm was presented in the MIP of Orchard and Woodbury, as well as by others, but they develop results which assert that successive iterations of the type contemplated in this study never decrease the likelihood, and that convergence implies a stationary point of the likelihood, as shown by Beale and Little. Sufficient conditions for convergence of an EM (and hence an MIP) algorithm are also given, and the rate of convergence of such an algorithm close to a stationary point under specified assumptions about the likelihood function is presented. Their conclusion is that if the information loss due to incompleteness is small, then the algorithm will converge rapidly. Hence in the censored case, where more information is retained than when a variable is missing entirely, convergence to a solution should be quite rapid for moderate amounts of censoring. The iterative scheme for the univariate case in the presence of Type I censoring is also suggested in this report, although explicit formulae are not derived.
1.3
Statement of Specific Purpose and Content of This Work
The specific purpose and content of this work are the following:
1.
The development of an iterative maximum likelihood procedure for estimation of the parameters of a randomly censored bivariate normal distribution based on Orchard and Woodbury's "Missing Information Principle".
2.
Application of the MIP as a general solution to univariate and
bivariate multiple linear regression estimation problems in the presence of random censoring on the dependent variables.
3.
The development of a likelihood ratio test for hypothesis
testing on the developed estimates under random censorship conditions.
4.
A comparison using simulation procedures of the likelihood ratio test with the approximate t-test of Sampford and Taylor (1959) and with other approximate tests under random censorship conditions in order to evaluate their respective behaviors under the null hypothesis.
5.
Extension of the above results to the multivariate multiple
regression situation when no more than two dependent variables are censored in any observation vector.
6.
A demonstration of direct application of the developed theory
and iterative procedures to several Environmental Protection Agency
data sets consisting of measurements of trace residue concentrations in
human tissues and of pertinent covariates of interest.
In Chapter 2, the problem of maximum likelihood estimation in the k-sample case and the multiple regression case when the dependent variable is subject to Type I random censoring is considered. Sampling errors of estimates are also derived for each of these cases. The estimation procedures developed using the MIP methodology are shown to obtain the same estimates as other authors for several examples from the literature.
Chapter 3 first considers approximate maximum likelihood solutions to the multivariate normal estimation problem with fixed Type I single censoring on the variables. A numerical procedure for maximum likelihood estimation of the variance-covariance matrix with missing data, developed by Woodbury and Hasselblad (1970) using the MIP, is modified for the censored case. If more than one variable is censored in an observation vector, this method leads to biased estimates relative to the true maximum likelihood estimates. These biases are investigated for varying percent censored and varying correlations between variables in the bivariate normal case. Unbiased consistent iterative estimation methods for the bivariate normal case are developed as an extension of the biased approach, and simulations are run to investigate the characteristics of the estimation procedure. The procedure is also shown to be applicable to the multivariate normal distribution with at most two variables censored in any observation vector.
Chapter 3 then looks at maximum likelihood estimation of parameters in a bivariate linear model with both dependent variables randomly censored. As in Chapter 2, the k-sample and the multiple regression cases are examined.
Chapter 4 deals with tests of hypotheses involving the maximum likelihood estimators developed in the earlier chapters. Likelihood ratio tests are investigated, and simulations are run using computer-generated data to establish the behavior of these tests under the null hypothesis. Where possible, other suggested test statistics are computed and compared to the likelihood results. The power of the likelihood ratio tests in the univariate case is also examined through simulation studies.
Chapter 5 demonstrates the usefulness of these new techniques in the analysis of three data sets drawn from studies conducted by the Environmental Protection Agency.
Finally, Chapter 6 summarizes the results of this investigation and presents general conclusions as well as suggestions for further research.
Chapter 2
MAXIMUM LIKELIHOOD ESTIMATION OF PARAMETERS IN UNIVARIATE
LINEAR MODELS WITH THE DEPENDENT VARIABLE
SUBJECT TO TYPE I CENSORING
In this chapter methodology is developed for maximum likelihood estimation using the MIP in the univariate one-sample, k-sample and multiple regression cases, where the dependent variable is subject to random censoring. This new methodology will be termed the "restricted information principle," or RIP, for easier reference in later sections. All of the univariate methodology is actually a straightforward application of equations (1.2.4.14) through (1.2.4.16). However, the expectations involved are now conditioned on the information available in an observation, this information being the range in which a censored observation is known to lie. Hence, we have a restricted information rather than a missing information principle at work. The univariate cases will now be examined individually so that the estimation procedures specific to each problem may be explicitly detailed. The techniques are applied to data sets from the literature and are shown to obtain the same estimates as the literature sources.
2.1
Estimation in the One-sample Univariate Normal Case
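In the one-sample univariate normal case we have the $Y_i$ independently identically distributed (iid) as $N(\mu,\sigma^2)$, so that
$$Y_i = \mu + e_i, \quad i = 1,2,\ldots,n, \qquad E(e_ie_k) = \begin{cases}\sigma^2, & i = k\\ 0, & i \ne k\end{cases} \qquad (2.1.1)$$
For an observation $Y_i$ censored on the left at point $c_i$ we have
$$E(Y_i\mid Y_i<c_i) = (\Phi(a_i))^{-1}\int_{-\infty}^{a_i}(\sigma y + \mu)\phi(y)\,dy = \mu + \sigma(\Phi(a_i))^{-1}\int_{-\infty}^{a_i}y\phi(y)\,dy = \mu - \sigma\frac{\phi(a_i)}{\Phi(a_i)} \qquad (2.1.2)$$
while for right censoring we have
$$E(Y_i\mid Y_i>c_i) = \mu + \sigma\frac{\phi(a_i)}{1-\Phi(a_i)} \qquad (2.1.3)$$
where $a_i = (c_i - \mu)/\sigma$, $\phi(x)$ is the standard normal density evaluated at $x$, and $\Phi(x)$ is the cumulative normal evaluated at $x$. Also we have
$$E(e_i^2\mid Y_i<c_i) = \sigma^2 - \sigma(c_i-\mu)\frac{\phi(a_i)}{\Phi(a_i)} \qquad (2.1.4)$$
$$E(e_i^2\mid Y_i>c_i) = \sigma^2 + \sigma(c_i-\mu)\frac{\phi(a_i)}{1-\Phi(a_i)} \qquad (2.1.5)$$
Hence, using the RIP, the approach for this problem is straightforward. After obtaining initial estimates for $\mu$ and for $\sigma$, censored values are estimated. New parameter estimates are then computed: $\hat\mu$ as the mean of the data with censored values replaced by their conditional expectations, and a new $\hat\sigma^2$ given (in the case of left censoring) by
$$\hat\sigma^2 = n^{-1}\sum_{i=1}^{n}\left[(y_i^* - \hat\mu)^2 + \delta_iE(\hat\sigma_p^2\mid Y_i<c_i)\right] \qquad (2.1.6)$$
where
$y_i^* = Y_i$ for uncensored observations,
$y_i^* = E(Y_i\mid Y_i<c_i)$ for censored observations,
$\delta_i = 0$ if $Y_i$ is uncensored and $\delta_i = 1$ if $Y_i$ is censored,
with $\hat\sigma_p^2$ = previous estimate of $\sigma^2$. Here $E(\hat\sigma_p^2\mid Y_i<c_i)$ denotes the conditional variance of $Y_i$ about its conditional mean, evaluated at the previous estimates. But $E(\sigma^2\mid Y<c)$ is given by
$$\begin{aligned}E(\sigma^2\mid Y<c) &= \int_{-\infty}^{c}\left(y-\mu+\sigma\frac{\phi(a)}{\Phi(a)}\right)^2(2\pi\sigma^2)^{-1/2}(\Phi(a))^{-1}\exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]dy\\ &= \sigma^2\int_{-\infty}^{a}\left(z+\frac{\phi(a)}{\Phi(a)}\right)^2\left(\sqrt{2\pi}\,\Phi(a)\right)^{-1}e^{-z^2/2}\,dz\\ &= \sigma^2\int_{-\infty}^{a}z^2\left(\sqrt{2\pi}\,\Phi(a)\right)^{-1}e^{-z^2/2}\,dz + 2\sigma^2\phi(a)(\Phi(a))^{-2}\int_{-\infty}^{a}\frac{z}{\sqrt{2\pi}}e^{-z^2/2}\,dz + \sigma^2\left(\frac{\phi(a)}{\Phi(a)}\right)^2\end{aligned} \qquad (2.1.7)$$
where the last term follows because the truncated density integrates to one. The first term of (2.1.7) is integrated by parts with $u = z/[\sqrt{2\pi}\,\Phi(a)]$, $dv = z e^{-z^2/2}\,dz$, giving
$$\sigma^2\left[-\frac{a\phi(a)}{\Phi(a)} + \int_{-\infty}^{a}\frac{e^{-z^2/2}}{\sqrt{2\pi}\,\Phi(a)}\,dz\right] = \sigma^2\left[1 - \frac{a\phi(a)}{\Phi(a)}\right] \qquad (2.1.8)$$
The second term of (2.1.7) is seen to be
$$2\sigma^2\phi(a)(\Phi(a))^{-2}\left[-\frac{1}{\sqrt{2\pi}}e^{-z^2/2}\right]_{-\infty}^{a} = -2\sigma^2\left(\frac{\phi(a)}{\Phi(a)}\right)^2 \qquad (2.1.9)$$
and so (2.1.7) finally becomes
$$E(\sigma^2\mid Y<c) = \sigma^2\left[1 - \frac{a\phi(a)}{\Phi(a)} - \left(\frac{\phi(a)}{\Phi(a)}\right)^2\right] \qquad (2.1.10)$$
Using this result in (2.1.6), we have for left censoring
$$\hat\sigma^2 = n^{-1}\sum_{i=1}^{n}\left[(y_i^*-\hat\mu)^2 + \delta_i\hat\sigma_p^2\left(1 - \frac{a_i\phi(a_i)}{\Phi(a_i)} - \left(\frac{\phi(a_i)}{\Phi(a_i)}\right)^2\right)\right] \qquad (2.1.11)$$
while for right censoring the above procedure leads to
$$\hat\sigma^2 = n^{-1}\sum_{i=1}^{n}\left[(y_i^*-\hat\mu)^2 + \delta_i\hat\sigma_p^2\left(1 + \frac{a_i\phi(a_i)}{1-\Phi(a_i)} - \left(\frac{\phi(a_i)}{1-\Phi(a_i)}\right)^2\right)\right] \qquad (2.1.12)$$
where $y_i^*$, $\delta_i$ and $\hat\sigma_p^2$ are defined as in (2.1.6) (with $y_i^* = E(Y_i\mid Y_i>c_i)$ for censored observations), and $a_i = (c_i - \hat\mu_p)/\hat\sigma_p$ is evaluated at the previous estimates. These new parameter estimates are used in equations (2.1.2) or (2.1.3) to obtain new estimates of the censored values, and iterations then continue until convergence is achieved. Initial values are obtained using the uncensored observations to estimate $\mu$ and $\sigma^2$ for use in the estimating equations.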
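The left-censored iteration can be sketched in a few lines of Python. This is a minimal illustration assuming left censoring only (right censoring would use (2.1.3) and (2.1.12) instead); the name `rip_one_sample` and the stopping rule on successive parameter changes are our assumptions. Each pass replaces censored values by (2.1.2), accumulates the conditional-variance terms of (2.1.10), and updates $\hat\mu$ and $\hat\sigma^2$ by (2.1.11).

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def rip_one_sample(values, censored, tol=1e-12, max_iter=10_000):
    """RIP iteration for a left-censored normal sample.

    values[i]   -- Y_i if observed, or the censoring point c_i if censored
    censored[i] -- True when only Y_i < c_i is known
    Returns (mu_hat, sigma2_hat, iterations).
    """
    n = len(values)
    unc = [v for v, d in zip(values, censored) if not d]
    mu = sum(unc) / len(unc)                              # initial values from
    sigma2 = sum((v - mu) ** 2 for v in unc) / len(unc)   # uncensored data only
    for it in range(1, max_iter + 1):
        sigma = math.sqrt(sigma2)
        ystar, extra = [], 0.0
        for v, d in zip(values, censored):
            if not d:
                ystar.append(v)
                continue
            a = (v - mu) / sigma
            r = phi(a) / Phi(a)
            ystar.append(mu - sigma * r)                  # (2.1.2)
            extra += sigma2 * (1.0 - a * r - r * r)       # (2.1.10)
        new_mu = sum(ystar) / n
        new_sigma2 = (sum((y - new_mu) ** 2 for y in ystar) + extra) / n  # (2.1.11)
        done = abs(new_mu - mu) < tol and abs(new_sigma2 - sigma2) < tol
        mu, sigma2 = new_mu, new_sigma2
        if done:
            return mu, sigma2, it
    raise RuntimeError("RIP iteration did not converge")


vals = [2.1, 3.4, 1.0, 4.2, 1.0, 2.8, 3.9, 1.0, 2.5, 3.1]
cens = [v == 1.0 for v in vals]   # three values below a detection limit of 1.0
mu_hat, s2_hat, n_iter = rip_one_sample(vals, cens)
```

At convergence the estimates should satisfy the censored-sample likelihood equations, which offers a convenient check that the fixed point is a stationary point of the likelihood, as Beale and Little's result implies.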
2.2
Estimation in the k-sample Univariate Normal Case
In the k-sample case we have the $Y_{ij}$ iid as $N(\mu_j,\sigma^2)$, $j = 1,2,\ldots,k$, so that
$$Y_{ij} = \mu_j + e_{ij}, \quad i = 1,2,\ldots,n_j,\ j = 1,2,\ldots,k, \qquad \sum_{j=1}^{k}n_j = N,$$
$$E(Y_{ij}) = \mu_j, \qquad E(e_{ij}e_{kl}) = \begin{cases}\sigma^2, & ij = kl\\ 0, & ij \ne kl\end{cases} \qquad (2.2.1)$$
Analogous to Section 2.1, for censored observations $Y_{ij}$ we find
$$E(Y_{ij}\mid Y_{ij}<c_{ij}) = \mu_j - \sigma\frac{\phi(a_{ij})}{\Phi(a_{ij})} \qquad (2.2.2)$$
$$E(Y_{ij}\mid Y_{ij}>c_{ij}) = \mu_j + \sigma\frac{\phi(a_{ij})}{1-\Phi(a_{ij})} \qquad (2.2.3)$$
$$\hat\mu_j = n_j^{-1}\sum_{i=1}^{n_j}y_{ij}^* \qquad (2.2.4)$$
$$\hat\sigma^2 = N^{-1}\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left[(y_{ij}^*-\hat\mu_j)^2 + \delta_{ij}\hat\sigma_p^2V_{ij}\right] \qquad (2.2.5)$$
where
$$V_{ij} = 1 - a_{ij}\frac{\phi(a_{ij})}{\Phi(a_{ij})} - \left(\frac{\phi(a_{ij})}{\Phi(a_{ij})}\right)^2$$
for left-censored observations,
$$V_{ij} = 1 + a_{ij}\frac{\phi(a_{ij})}{1-\Phi(a_{ij})} - \left(\frac{\phi(a_{ij})}{1-\Phi(a_{ij})}\right)^2$$
for right-censored observations, and $a_{ij} = (c_{ij} - \mu_j)/\sigma$, with other terms defined analogously to Section 2.1. Initial parameter estimates for the iterative process may be computed assuming a univariate one-sample model as in Section 2.1. Censored values are estimated using these initial values, and the $\hat\mu_j$ and $\hat\sigma^2$ are computed inserting the estimated values in place of the censored values, with $\hat\sigma^2$ obtained from (2.2.5). The recursive process is repeated until convergence is obtained. This process has been found to be effective in all examples examined.
2.3
Estimation in the Univariate Multiple Regression Case
In the univariate multiple regression case we have $Y = X\beta + e$, where $Y$ ($n\times1$) is a random observed vector, $e$ ($n\times1$) is a random vector, $X$ is an $(n\times k)$ matrix of rank $k$ consisting of known fixed quantities, and $\beta$ ($k\times1$) is a vector of unknown parameters, where $k<n$. The $e_i$'s are independently identically distributed as normal with mean zero and variance $\sigma^2$ unknown. In this situation, as is well known, the normal equations yield
$$\hat\beta = (X'X)^{-1}X'Y \qquad (2.3.1)$$
$$\hat\sigma^2 = n^{-1}(Y - X\hat\beta)'(Y - X\hat\beta) \qquad (2.3.2)$$
For an observation $Y_j$ left-censored at point $c_j$ we have from (2.1.2) that
$$E(Y_j\mid Y_j<c_j) = \sum_{i=1}^{k}\beta_iX_{ji} - \sigma\frac{\phi(a_j)}{\Phi(a_j)} \qquad (2.3.3)$$
while for right censoring
$$E(Y_j\mid Y_j>c_j) = \sum_{i=1}^{k}\beta_iX_{ji} + \sigma\frac{\phi(a_j)}{1-\Phi(a_j)} \qquad (2.3.4)$$
where $a_j = \left(c_j - \sum_{i=1}^{k}\beta_iX_{ji}\right)/\sigma$, and $\phi(x)$ and $\Phi(x)$ are defined as previously. Also we have
$$E(e_j^2\mid Y_j<c_j) = \sigma^2\left[1 + \sigma^{-1}\left(\sum_{i=1}^{k}\beta_iX_{ji} - c_j\right)\frac{\phi(a_j)}{\Phi(a_j)}\right] \qquad (2.3.5)$$
and
$$E(e_j^2\mid Y_j>c_j) = \sigma^2\left[1 - \sigma^{-1}\left(\sum_{i=1}^{k}\beta_iX_{ji} - c_j\right)\frac{\phi(a_j)}{1-\Phi(a_j)}\right] \qquad (2.3.6)$$
After obtaining initial estimates of $\beta$ and $\sigma^2$, the successive substitutions procedure is established. Censored $Y_j$'s are replaced by their estimated expected values, and new $\hat\beta$ are computed using the normal equations. $\hat\sigma^2$ is computed as previously:
$$\hat\sigma^2 = n^{-1}\sum_{j=1}^{n}\left[\left(y_j^* - \sum_{i=1}^{k}\hat\beta_iX_{ji}\right)^2 + \delta_jE(\hat\sigma_p^2\mid Y_j\ \text{censored})\right] \qquad (2.3.7)$$
where
$y_j^* = Y_j$ for uncensored observations,
$y_j^* = E(Y_j\mid Y_j<c_j)$ for left-censored observations,
$y_j^* = E(Y_j\mid Y_j>c_j)$ for right-censored observations,
and where $\delta_j = 0$ for uncensored observations and $\delta_j = 1$ for censored observations. In this situation, $E(\hat\sigma_p^2\mid Y_j\ \text{censored})$ is given by one of the following:
$$E(\hat\sigma_p^2\mid Y_j<c_j) = \hat\sigma_p^2\left[1 - a_j\frac{\phi(a_j)}{\Phi(a_j)} - \left(\frac{\phi(a_j)}{\Phi(a_j)}\right)^2\right] \qquad (2.3.8)$$
and
$$E(\hat\sigma_p^2\mid Y_j>c_j) = \hat\sigma_p^2\left[1 + a_j\frac{\phi(a_j)}{1-\Phi(a_j)} - \left(\frac{\phi(a_j)}{1-\Phi(a_j)}\right)^2\right] \qquad (2.3.9)$$
where $a_j = \left[c_j - \sum_{i=1}^{k}\hat\beta_iX_{ji}\right]/\hat\sigma_p$. The RIP iterative scheme is therefore established. After estimating censored values initially assuming $Y$ is univariate normal with mean and variance unknown, first estimates of $\beta$ and $\sigma^2$ are obtained. These parameter estimates are used to estimate the censored values, and the iterative process is repeated until convergence to the maximum likelihood solution.
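A minimal pure-Python sketch of this scheme follows, handling left and right censoring together. It makes a few assumptions of its own: starting values come from least squares on the uncensored observations only, the normal equations (2.3.1) are solved by Gauss-Jordan elimination rather than an explicit inverse, and the names `ols` and `rip_regression` are hypothetical. The k-sample case of Section 2.2 is the special case in which the columns of $X$ are group indicators.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    cols = [[row[i] for row in X] for i in range(k)]
    M = [[dot(cols[i], cols[j]) for j in range(k)] + [dot(cols[i], y)]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * w for u, w in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def rip_regression(X, y, status, tol=1e-12, max_iter=10_000):
    """RIP iteration for Y = X beta + e with censored responses.

    status[j]: 0 = observed, -1 = left-censored at y[j], +1 = right-censored at y[j].
    """
    n = len(y)
    obs = [j for j in range(n) if status[j] == 0]
    beta = ols([X[j] for j in obs], [y[j] for j in obs])
    sigma2 = sum((y[j] - dot(X[j], beta)) ** 2 for j in obs) / len(obs)
    for it in range(1, max_iter + 1):
        sigma = math.sqrt(sigma2)
        ystar, extra = [], 0.0
        for xj, yj, sj in zip(X, y, status):
            m = dot(xj, beta)
            if sj == 0:
                ystar.append(yj)
                continue
            a = (yj - m) / sigma
            if sj < 0:
                r = phi(a) / Phi(a)
                ystar.append(m - sigma * r)               # (2.3.3)
                extra += sigma2 * (1.0 - a * r - r * r)   # (2.3.8)
            else:
                r = phi(a) / Phi(-a)                      # phi / (1 - Phi)
                ystar.append(m + sigma * r)               # (2.3.4)
                extra += sigma2 * (1.0 + a * r - r * r)   # (2.3.9)
        new_beta = ols(X, ystar)                          # (2.3.1)
        resid = sum((ys - dot(xj, new_beta)) ** 2 for xj, ys in zip(X, ystar))
        new_sigma2 = (resid + extra) / n                  # (2.3.7)
        change = max([abs(u - w) for u, w in zip(new_beta, beta)]
                     + [abs(new_sigma2 - sigma2)])
        beta, sigma2 = new_beta, new_sigma2
        if change < tol:
            return beta, sigma2, it
    raise RuntimeError("RIP iteration did not converge")


X = [[1.0, float(t)] for t in range(10)]
y = [2.0, 2.0, 3.1, 3.8, 5.2, 5.9, 7.1, 8.0, 8.8, 9.5]
status = [-1, -1, 0, 0, 0, 0, 0, 0, 0, 1]   # two below 2.0, one above 9.5
beta_hat, s2_reg, it_reg = rip_regression(X, y, status)
```

For a quick check, the likelihood score for $\beta$ should vanish at the returned estimates.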
2.4
Sampling Errors of Estimates
The asymptotic variance-covariance matrix of the parameter esti-
mates developed in Sections 2.1 through 2.3 is obtained by inverting
the matrix whose elements are negatives of expected values of the second
order derivatives of the log likelihood functions.
Each case shall be
examined in turn.
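2.4.1
The one-sample case
Cohen (1959) obtained the expression for the asymptotic variance-covariance matrix of $(\hat\mu,\hat\sigma)$ in the one-sample case for both Type I fixed censoring and Type II censoring. The components of this matrix are given by
$$V(\hat\mu) \approx \frac{\sigma^2}{n}\cdot\frac{G_{22}}{G_{11}G_{22}-G_{12}^2}, \qquad V(\hat\sigma) \approx \frac{\sigma^2}{n}\cdot\frac{G_{11}}{G_{11}G_{22}-G_{12}^2}, \qquad \mathrm{Cov}(\hat\mu,\hat\sigma) \approx \frac{\sigma^2}{n}\cdot\frac{-G_{12}}{G_{11}G_{22}-G_{12}^2} \qquad (2.4.1.1)$$
where $n$ is the sample size, and $G_{11}$, $G_{12}$, $G_{22}$ are respectively $-[\sigma^2/n]E(\partial^2L/\partial\mu^2)$, $-[\sigma^2/n]E(\partial^2L/\partial\mu\,\partial\sigma)$ and $-[\sigma^2/n]E(\partial^2L/\partial\sigma^2)$. The $G_{ij}$ for singly censored samples of Types I and II are:

Type I Censored Samples
$$G_{11}(a) = 1 - \Phi(a) + \phi(a)\left(\frac{\phi(a)}{\Phi(a)} + a\right), \qquad G_{12}(a) = \phi(a)\left[1 + a\left(\frac{\phi(a)}{\Phi(a)} + a\right)\right], \qquad G_{22}(a) = 2[1 - \Phi(a)] + aG_{12}(a) \qquad (2.4.1.2)$$
Type II Censored Samples
$$G_{11}(a) = \frac{n_0}{n} + \frac{n_c}{n}\cdot\frac{\phi(a)}{\Phi(a)}\left(\frac{\phi(a)}{\Phi(a)} + a\right), \qquad G_{12}(a) = \frac{n_c}{n}\cdot\frac{\phi(a)}{\Phi(a)}\left[1 + a\left(\frac{\phi(a)}{\Phi(a)} + a\right)\right], \qquad G_{22}(a) = \frac{2n_0}{n} + aG_{12}(a)$$
where
$n_0$ = number of complete observations,
$n_c$ = number of censored observations,
$n$ = total sample size = $n_0 + n_c$.
For samples censored on the right, $G_{ij}(-a)$ from (2.4.1.2) are substituted into (2.4.1.1) with the sign of $G_{12}$ reversed.
In order to obtain the variance-covariance matrix of $(\hat\mu,\hat\sigma^2)$, we must pre- and post-multiply the matrix given in (2.4.1.1) by
$$J = \begin{bmatrix}\partial\mu/\partial\mu & \partial\mu/\partial\sigma\\ \partial\sigma^2/\partial\mu & \partial\sigma^2/\partial\sigma\end{bmatrix} = \begin{bmatrix}1 & 0\\ 0 & 2\sigma\end{bmatrix} \qquad (2.4.1.3)$$
This leaves $V(\hat\mu)$ unchanged and gives
$$\mathrm{Cov}(\hat\mu,\hat\sigma^2) \approx 2\sigma\,\mathrm{Cov}(\hat\mu,\hat\sigma), \qquad V(\hat\sigma^2) \approx 4\sigma^2\,V(\hat\sigma) \qquad (2.4.1.4)$$
In the case of random censoring, the distribution of the censoring variable would have to be known in order to determine the asymptotic variance-covariance matrix of $(\hat\mu,\hat\sigma^2)$. If one is willing to assume that the second partial derivatives are close to their expectations, an estimate of the variance-covariance matrix may be obtained by using the results in the following section, letting $k=1$.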
2.4.2
The k-sample case
Here we have the density given by
$$f(Y_{ij}) = (2\pi\sigma^2)^{-1/2}\exp\left[-\frac{(Y_{ij}-\mu_j)^2}{2\sigma^2}\right] \qquad (2.4.2.1)$$
Let $y_{ij} = (Y_{ij}-\mu_j)/\sigma$ and $a_{ij} = (c_{ij}-\mu_j)/\sigma$, where $c_{ij}$ is the censoring point for a censored observation $Y_{ij}$, with $\phi$ and $\Phi$ defined as previously. The contribution of each observation to the log likelihood and its derivatives is given in Table 2.4.2.1. Note that $\partial^2\ln f(Y_{ij})/\partial\mu_j\,\partial\mu_l = 0$ for $j \ne l$, and also that the inverse of the asymptotic variance-covariance matrix for $(\hat\mu,\hat\sigma^2)$ is given by
$$\frac{n}{\sigma^2}\begin{bmatrix}G_{k11} & G_{k12}\\ G_{k21} & G_{k22}\end{bmatrix} \qquad (2.4.2.2)$$
where in the left censoring case $G_{k11}$ is a $k\times k$ diagonal matrix whose $j$th diagonal term, obtained by summing the contributions to $-\sigma^2\partial^2L/\partial\mu_j^2$ in Table 2.4.2.1, is
$$\left\{n_{0j} + \sum_{\substack{\text{censored}\\ \text{in class } j}}R(a_{ij})\left[R(a_{ij}) + a_{ij}\right]\right\}\Big/n \qquad (2.4.2.3)$$
with $n_{0j}$ the number of uncensored observations in class $j$, while $G_{k12} = G_{k21}'$ is a vector of length $k$ with $j$th term given by
$$\left\{\sum_{\substack{\text{uncensored}\\ \text{in class } j}}\frac{Y_{ij}-\mu_j}{\sigma} + \sum_{\substack{\text{censored}\\ \text{in class } j}}\frac{R(a_{ij})\left[a_{ij}R(a_{ij}) + a_{ij}^2 - 1\right]}{2}\right\}\Big/n\sigma \qquad (2.4.2.4)$$
and finally $G_{k22}$ is given by
$$\left\{\sum_{\substack{\text{all uncensored}\\ \text{observations}}}\left[\frac{(Y_{ij}-\mu_j)^2}{\sigma^2} - \frac12\right] + \sum_{\substack{\text{all censored}\\ \text{observations}}}\frac{a_{ij}R(a_{ij})\left[a_{ij}R(a_{ij}) + a_{ij}^2 - 3\right]}{4}\right\}\Big/n\sigma^2 \qquad (2.4.2.5)$$
where $R(a) = \phi(a)/\Phi(a)$.

Table 2.4.2.1
Contributions to Log Likelihood and Its Derivatives in the k-sample Case
$$\begin{array}{llll}
\text{Quantity} & \text{Not Censored} & \text{Left Censored} & \text{Right Censored}\\
L & \log(\phi(y_{ij})/\sigma) & \log\Phi(a_{ij}) & \log(1-\Phi(a_{ij}))\\
\sigma\,\partial L/\partial\mu_j & y_{ij} & -R(a_{ij}) & R(-a_{ij})\\
\sigma^2\,\partial L/\partial\sigma^2 & (y_{ij}^2-1)/2 & -a_{ij}R(a_{ij})/2 & a_{ij}R(-a_{ij})/2\\
-\sigma^2\,\partial^2L/\partial\mu_j^2 & 1 & R(a_{ij})[R(a_{ij})+a_{ij}] & R(-a_{ij})[R(-a_{ij})-a_{ij}]\\
-\sigma^3\,\partial^2L/\partial\sigma^2\partial\mu_j & y_{ij} & R(a_{ij})[a_{ij}R(a_{ij})+a_{ij}^2-1]/2 & R(-a_{ij})[a_{ij}R(-a_{ij})-a_{ij}^2+1]/2\\
-\sigma^4\,\partial^2L/\partial(\sigma^2)^2 & y_{ij}^2-1/2 & a_{ij}R(a_{ij})[a_{ij}R(a_{ij})+a_{ij}^2-3]/4 & a_{ij}R(-a_{ij})[a_{ij}R(-a_{ij})-a_{ij}^2+3]/4
\end{array}$$
$R(a) = \phi(a)/\Phi(a)$. For $j \ne m$, $\partial^2L/\partial\mu_j\,\partial\mu_m = 0$.

For observations censored on the right, $-a_{ij}$ is substituted for $a_{ij}$ in equations (2.4.2.3) through (2.4.2.5), with the sign reversed on the $G_{k12}$ terms. If one is willing to assume that the second partial derivatives are close to their expectations, an estimate of the variance-covariance matrix of $(\hat\mu,\hat\sigma^2)$ may be obtained by inserting parameter estimates into equations (2.4.2.3) through (2.4.2.5), using these values for the $G_{kij}$ in equation (2.4.2.2), and inverting that matrix.
2.4.3
The univariate multiple regression case
The density of Y is given in this case by
j
2 -~
P
2
2
(
exp-{(Y,- E X 6 ) 120 }
J
i=l ji i
feY,) = (21T0)
J
where p is the number of
ind~pendent
2.4.3.1)
variables in the regression.
Let
p
Y.-~
z.
J
=
~
CJ
.. B.
J
~
~
and a.
J
=
c.-lX·iB,
J
]
~
v
for Yj censored at point c j .
Then
the contributions to the log likelihood and its derivatives are found to
be as in Table 2.4.3.1.
If we let R(a.) = 'H3..)/~(a,), n
]
J
J
0
= number of
observed dependent
variables and n c the number of censored dependent
.
variables, then the inverse of the asymptotic variance-covariance matrix
is given by summation over the contributions in Table 2.4.3.1 to be
-
2
(l/'J )E
Il~ll
~2l
8
~~12
II
1~l\.22 J
(2.4.3.2)
Table 2.4.3.1
Contributions to Log Likelihood and Its Derivatives
In the Univariate Multiple Regression Case
4uantity
10g(ep (z.) /0)
log(<p(a.»
oaL/aiL
XjiZ j
-XjiR(a j )
X.. R (-a. )
2 dL/O{(2)
(z.-1)/2
2
J
-a R(a )/2
a.R(-a.)/2
L
10g(1-4>(a.»
1
J
1
0
J1
j
j
2
J
X ..
_}a 2 L/d(02)2
2
2
(z.-l/2)/o
2
2
a.R(a.)(a.R(a.)+a.-3)/40
XjiZ/o
X.. R(a.)(a.R(a.)+a.-l)/20
J1
J
J
J
J
)
.) 'I
-0- rL/atJ.
1
2
a (0 )
-o2;j2L/aB·J13k
1
.
XjiX ki
J
)1)
J
J
)
J
J
2
XjiR(-a j ) (R(-aj)-a j )
2
X.. R(a.)(R(a.)+a.)
J1
1
J
J
-,/d2L/at1~
R(a)
Right Censored
Left Censored
Not Censored
J
J
2
X.. Xk·R(a.)(R(a.)+a.)
J1 1
J
J
J
a.R(-a.)(a.R(-a.)-a~+3)/402
J
J
J
J
J
2
X.. R(-a.)(a.R(-a.)-a.+1)j20
J
J1
J
J
J
X.. Xk·R(-a.)(R(-a.)-a.)
J1
'1
]
]
J
= ~(a)/~(a).
.p-
o
e
e
e
41
where the jth term of the p x p sub-matrix G
is given by
Rll
left censoring:
[GRll ] jk ,.
(2.4.3.3)
right censoring:
where 8.
~
= { Ol
uncensored observations
censored observations
while the jth term of the pxl vector G
is given by
Rl2
left censoring:
and G
is given by
R22
left censoring:
~
(z:-O.5)+o.25 :
[R(ai)ai(R(aiai+ai-3)]]
1
uncensored
cens.
observations
obs.
(2.4.3.5)
right censoring:
8
R22
= cr -2 [ :
(Z~-O.5)+o.25~
uncensored
observations
[R(-a.)a.(R(-a.)a.-a:+3)]]
cens.
1
1
1
1
1
obs.
If we assume that the second partials are close to their expectations, an
estimate of the covariance matrix of the maximum likelihood estimates is
obtained by evaluating the terms of equation (2.4.3.2) using (2.4.3.3)
through (2.4.3.5) and inverting this matrix.
?
...
~
:)
Applications to Literature Data Sets
Data sets from several literature sources provide an opportunity
for comparison of results in the one-sample, the k-sample, and the
42
multiple regression cases.
The convergence criteria for the RIP pro-3
cedures in each example was taken to be /Uk+l-Uk!<ok+l·lO
and
~
I ~2
~2
~
jOk+l-okl<ok+l·10
2.5.1
-3
.
~
~
Convergence was attained in every case.
Examples in the one-sample case
Example 1.
Hahn and Shapiro (1967) presented data on a control de-
vice tested on n 96 diesel locomotives.
2
After 135,000 miles of service
the n =59 unfailed units were removed from further testing.
The failure
c
times for the n =37 units that failed are given in Table 2.5.1.1.
o
Table 2.5.1.1
Failure Times for 37 Locomotive Control
Devices (in thousands of miles)
22.5
54.5
77 .0
84.0
112.5
120.0
37.5
57.5
78.5
91.5
113.5
122.5
46.0
66.5
80.0
93.5
116.0
123.0
48.5
68.0
81.5
102.5
117.0
127.5
51.5
69.5
82.0
107.0
118.5
131.0
53.0
76.5
83.0
108.5
119.0
132.5
134.0
The underlying failure distribution was taken to be a log-normal distribution (See Hahn
&
Shapiro).
In this case we have Type I fixed
censoring on the right, with c. .. c
J
..
10g
10
(135.0).
The results
obtained from the RIP procedures are given in Table 2.5.1.2.
Letting u "" 2.2218 and ;2 .. 0.0936. and therefore ~."" ~ "" c:~ "" -0.2990
J
'J
in this one-sample case, equations (2.4.2.3) through (2.4.2.5) yield
°11""
(37+33.4157)/96
8
(-36.4l63+26.4549/2)/(96xO.3059) .. -0.7896
12
=
= 0.7335
G .. (29.4636+(-0.5036/4»/(96xO.0936) .. 2.4529
22
(2.5.1.1)
43
so that the estimate of the variance-covariance matrix is given by
- -1
1-0.7335
0.0936
96
I
-0.7896
-0.7896
2.4529
2.0863
=
J
0.0936
96
0.6716
0.6716 ]
(2.5.1.2)
0.6239
Tab Ie 2. 5 . 1. 2
Results of the RIP Procedure for the
Locomotive Control Device Problem
iteration
\.Ii
a.~
1. 9207
0.0307
0.0358
0.0483
0.0583
0.0661
0.0723
0.0771
0.0809
0.0838
0.0861
0.0878
0.0892
(1)
0
1
2
3
4
5
6
7
8
9
10
11
2.1020
2.1362
2.1586
2.1744
2.1860
2.1946
2.2011
2.2060
2.2098
2.2126
2.2148
iteration
(i)
12
13
14
15
16
17
18
19
20
21
22
a
ui
2.2165
2.2178
2.2188
2.2196
2.2207
2.2207
2.2210
2.2213
2.2215
2.2217
2.2218
i
0.0903
0.0911
0.0917
0.0922
0.0926
0.0929
0.0931
0.0933
0.0934
0.0935
0.0936
Since this is a fixed Type I censoring situation, Cohen's procedure
(1957, 1961) may be applied to obtain the maximum likelihood estimates.
We have
x = 1.9207,
0.6976 and h
s
2
= 0.03066,
= 59/96 = 0.6146.
(1961) yields A = 1.4387.
2.2223 and
~?
0- •
X
o
?
= 10g10(135) , and hence s 2 /(x-x
)- =
o
Linear interpolation in Cohen's Table 2
Hence,
\.12
1.9207 + 1.4387(2.1303-1.9207) =
0.03066 + 1.4387(1.9207 - 2.1303)
2
= 0.0938.
values compare quite closely to those from the RIP procedure.
These
Estimates
of the variances and covariances using expected values in this case are
obtained from equations (2.4.1.4) for this fixed Type I example.
Cohen
(1961) has tabled values of these variance factors for selected values
of a.
Equations (2.4.1.4) and Cohen's Table 3 yield
44
~
V(w",0.0938(2.1186)
96
Cov(
~
Il,
"2
(T
~
0.0938(2xO.3063)(1.132l)
96
)
0.0938(0.6935)
96
(2.5.1.3)
V(~2) '" O.0938(4xO.0938) (1. 7038) .. 0.0938(0.6393)
96
96
These values compare quite closely to those obtained form (2.5.1.2)
using the parameter estimates inserted into the equations for random
censoring given by equations (2.4.2.2) through (2.4.2.5).
Example 2.
In this example of Type II censoring, given both by
Gupta (1952) and Cohen (1959), a sample of 300 electric light bulbs
were tested until n =119 had burned out, with the distribution of
o
observed values as given in Table 2.5.1.3.
Table 2.5.1.3
Life in Hours of 119 Burned Out Electric Light Bulbs
Frequency
Life in hours
950-1000
1000-1050
1050-1100
1100-1150
1150-1200
1200-1250
1250-1300
1300-1350
1350-1400
1400-1450
2
2
3
6
7
12
16
20
24
27
Total
119
p = 0.3966, i = 1304.832, s2
Gupta's procedure yields
and d
~
=
= 145.168,
,
~
so that
= 0.3666 and hence cr
= 202.1
= 12198.25
hours and
1502.1 hours, while Cohen (1959) gives h = 0.60333, A = 1.36 and
hence
J
= 202.1 hours and
to this data set with c
the following results:
=
\.l
= 1502.3
hours.
The RIP procedure applied
14.50 and all values divided by 100 gives
45
Table 2.5.1. 4
Results of the RIP Procedure for the
Electric Light Bulb Prob lem
~
iteration
\J
i
·10
-2
;:.10- 2
~
iteration
1.
\J
i ·10
-2
;2. 10 -2
i
0
13.0483
121. 98
12
14.9869
394.09
1
14.2359
153.35
13
14.9947
397.40
2
14.4659
210.08
14
15.0007
399.93
3
14.6145
255.06
15
15.0052
401. 87
4
14.7186
290.18
16
15.0086
403.36
5
14.7940
317 . 43
17
15.0113
404.49
6
14.8496
338.48
18
15.0133
405.36
7.
14.8909
354.71
19
15.0149
406.03
8
14.9220
367.18
20
15.0161
406.54
9
14.9455
376.77
21
15.0170
406.93
10
14.9632
384.13
11
14.9767
~
Hence the RIP procedure yields
\J
~2
a
1501.7 and a
=
40693 or cr
= 201.7.
Even with over 60% censoring of the distribution as in this example,
convergence to essentially the same estimates as those given by the
Gupta and the Cohen procedures is achieved.
Equations (2.4.2.3) through (2.4.2.5) yield: 8
8
12
=
-0.1180, and 8 22
= 0.0564,
is given by
...
I 0.7448
40693
300
I
1_-0.1180
-0.1180
0.0564
11
= 0.7448,
so that the ML estimate covariance matrix
r=
40693
300
- 2.0076
I
4.1977
I
(2.5.1.4)
I
i
I,- 4.1977
26.5001
J
46
Using Cohen's Table 3 (1961), we obtain
~
(~)
Var
~
40693(2.022)
300
)
=
=
40693(4x4.068) (l.635)
300
~2
Cov(~,cr
~2
=
Var(cr)
40693(2x2.0l7) (1.051)
300
=
=
(2.5.1.5)
40693(4.2397)
300
40693(26.6066)
300
From (2.5.1.4) and (2.5.1.5) we see that the variance factors obtained
using expected values and using the sample parameter estimates in the
general equations (2.4.2.2) through (2.4.2.5) are quite comparable.
Example 3.
Sampford (1952) presents the following example of Type I
random censoring:
Tab 1e 2. 5 . 1. 5
Number of Shrimp Moulting and Dying Without
Moulting on Successive Days after Treatment
with Hormone Preparation
(I Moulting
Day
1
2
3
4
5
6
7
8
9
Survivors
Totals n
o
= 47
n
c
IF
Dying
8
7
4
4
8
7
5
5
2
2
6
4
1
6
3
9
9
5
34
1.
0.5
1.5
2.5
3.5
4.5
5.5
6.5
7.5
8.5
9.0
t
=1+
log ('r)
0.70
1.18
1. 40
1. 54
1. 65
1. 74
1. 81
1. 88
1.93
1. 95
= 82
In this case death before moulting acts as a random censoring
mechnism on the moulting time distribution.
One necessary assumption
in this example, as mentioned by Sampford, is independent and uncorrelated action between cause of death and cause of moult.
metameter to be analyzed is t
=
The time
(1 + log (day of moulting».
Sampford's
47
procedure yields a
.
-1A?
~')
cr
""
==
0.42.
==
-3.19, b
:a
1. 55, so that
\1
:a
a
- "" 2.060 and
_
b
The RIP iteration scheme yields the following:
bTab le 2. 5 . 1. 6
Results of the RIP Procedure for the Shrimp Moulting Problem
iteration
(i)
iteration
,,2
\1
cr2
\1
(i)
a
1. 44
0.16
10
2.04
0.39
1
1. 80
0.18
11
2.04
0.40
2
1. 88
0.23
12
2.05
0.40
3
1. 93
0.27
13
2.05
0.40
4
1.98
0.30
14
2.05
0.41
5
1. 98
0.33
15
2.05
0.41
6
2.00
0.35
16
2.05
0.41
7
2.02
0.36
17
2.06
0.41
8
2.03
0.37
18
2.06
0.41
9
2.03
0.38
19
2.06
0.41
The asymptotic variance-covariance matrix for this random censoring example is not available through Cohen's tables since they apply
only to fixed censoring results.
(2.4.2.5) we have:
011
==
Using equations (2.4.2.3) through
0.69169,
012 == -0.35421, and 022
so that the covariance matrix of the
~~
0.41
129
::I
,_-0.3542
2.5.2
l
2.1991
-0.3542 ]
0.5295
0.52951,
estimates 13 given by
-1
0.6917
:a
0.41
129
1.4711
1.4711]
(2.5.1.6)
2.8726
An example in the k-sample case
This example, based on data from Wine (1964) deals with the com-
parison of the tensile strengths of three types of copper wire.
Ten
independent random samples for each of the three wire types are given
48
and tensile strength measurements are obtained from instruments which
can only read out values between 5050 and 5150 pounds.
After subtract-
ing 5000 pounds from each value, the results are as shown in Table 2.5.2.1.
Table 2.5.2.1
Tensile Strengths of Three Types of Copper Wire
Wire Type 1
Wire Type 2
<50
50
75
85
90
105
Wire Type 3
<50
<SO
<50
50
50
55
65
120
130
150
110
115
120
130
70
70
80
80
90
90
100
130
150
>150
This is an example of two-sided fixed Type I censoring, with censoring on the left at 50 and on the right at 150.
If one assumes tensile
strength to be normally distributed with a common standard deviation
for the copper wire types, the RIP procedure yields the results given
in Table 2.5.2.2.
Tab Ie 2. 5 . 2 . 2
Results of the RIP Procedure for the Copper Wire Problem
iteration
(i)
0
1
..,
'-
3
4
5
uli
97.778
91. 737
91. 218
91.180
91.168
91.164
u2i
88.571
72.785
70.444
70.173
70.115
70.100
u3i
95.356
102.174
102.677
102.71J
102.725
102. 727
~2
cr.
l.
888.540
1296.842
1326.130
1337.900
1341. 234
1342.138
49
Convergence was achieved after five iterations.
Inserting the
final parameter estimates in equations (2.4.2.2) through (2.4.2.5) yields
1342.138
30
3.0600
0.0117
-0.0033
-3.4228
0.0117
3.2952
-0.0114
-11.8819
-0.0033
-0.0114
3.0545
3.3187
-3.4228
-11.8819
3.3187 3471. 9932
Hahn and Miller (1968b) obtained the maximum likelihood equations for
several populations with a common variance under fixed censoring within
each population and solved these equations iteratively using a "secant
method" of solving general simultaneous nonlinear equations.
This nu-
merical method is presented in more detail by Jeeves (1958) and by
Wolfe (1959). among others.
Hahn and Miller obtained the results given
in Table 2.5.2.3 for this example.
The results from the RIP procedure
are also given for comparison.
Table 2.5.2.3
Maximum Likelihood Parameter Estimates for the Copper
Problem - Hahn and Miller Results
Population Number
1
2
Estimate of population means
91.2
70.1
Estimated standard error of
estimated population means
11.79
12.14
Estimated common deviation = 36.63
Standard error of standard deviation estimate
3
102.7
11.76
= 5.69
RIP Procedure Results
Population Number
1
2
3
Estimate of population means
91.2
70.1
102.7
Estimated standard error of
estimated population means
11. 70
12.14
Estimated common standard deviation = 36.64
Standard error or standard deviation estimate
11.69
5.38
50
2.5.3
An example in the Linear Regression Case
Glasser (1965), in developing his extension of Lea's (1945) pro-
cedures to linear regression as detailed in Chapter 1, presents the
following example abstracted from Griffing et al (1961) concerning 16
patients with primary lung tumors.
The logarithm of the length of
survival in days was taken as the dependent variable (Y).
Age (Xl) and
a score rating at the onset of therapy (X ) are the independent variables.
2
These data are as shown in Table 2.5.3.1.
Applying the RIP procedure as given in Section 2.3 to this Type I
random censoring example, convergence to the maximum likelihood estimates
is achieved in 10 iterations to give So
and
~2
0
= 1266.63.
=
8
1
= -0.602,
6
Application of Glasser's procedure yields 6
o
~
61 = -0.602, 62
= 195.57,
10.210 and cr
2
~
'}
(35.6)~
=
2
= 10.201
= 195.54,
1267.36.
"""
..
,..
.. 2
The estimated asymptotic variance-covariance matrix of (6 ,6 .6 ,0
0
1 2
is computed using equations (2.4.3.2) through (2.4.3.5), with the following results:
83.845
(symmetric)
-1.257
0.0236
-1.973
-0.0129
0.4591
i- -38 .103
-0.2613
12'.2333
1266.63
16
4668.007
J
while Glasser's equavalent results give
l
83.861
1267.36
16
i
J
(symme tric)
0.0236
-1.257
-1.974
-0.0129
0.4595
I_ -38.145
-0.2613
12.2800
4676.160
J
)
51
Table 2.5.3.1
Data from Glasser (1965) for Linear Regression
y
100 log survival
194
223
194
198
223
159
213
180
232
192
>215
>205
>248
>242
>256
>256
Xl
Age
X
2
rating/l0
42
67
62
52
57
58
55
63
44
62
51
64
54
64
54
4
6
4
6
5
6
6
57
7
5
7
7
10
8
3
9
9
Chapter 3
LIKELIHOOD ESTIMATION OF PARAMETERS OF THE
BIVARIATE NORMAL DISTRIBUTION WITH BOTH
VARIABLES SUBJECT TO CENSORING
~XIMUM
In Chapter 3 maximum likelihood estimation using the RIP in the
case of a bivariate normal population with random censoring occuring
on both variables is investigated.
First, a simplified procedure for
the multivariate normal case is introduced, and biases of the procedure
are investigated in the bivariate fixed censoring case.
estimation methodology is developed.
Then, unbiased
The unbiased procedure is also
shown to be applicable in the case of a multivariate normal distribution with at most two variables censored in any data vector.
In
section 3.4 the ease with which the RIP theory may be extended to bivariate linear models is demonstrated.
Both the bivariate k-samp1e
and bivariate multiple regression cases are examined.
Simulations are
run to investigate the small sample characteristics of the estimates
in the bivariate normal case.
3.1
A Simplified Estimation Procedure Using the RIP in the
Singly-censored Multivariate Normal Distribution Case
The general scheme for estimation in the multivariate normal case
will use the marginal estimates developed in Chapter 2 as starting
points for an iterative process using the RIP procedure.
Assume we are given N observations for a k-dimensional multivariate normal distribution.
. bl e
'lar1a
(:
' th
,-rom
th
e.l.
0
Let
A
ij
denote the value of the jth
b
'
servatl.on
'lector,
50
h
i
tat
::0
1
,)
,,-,
•••
"an
d
,,~
53
j
=
1,2, ... ,k.
If x .. is observed, denote it as y .. and if it is cen1J
1J
sored denote it by Zij'
Let c
be the known censoring point for the jth
j
component of the distribution.
The log-likelihood function for
x = [Xij]NXk is given by
L = ~~ln(2rr)/2 + ~Nlnl~-
1
1-
N
~E
(xi-~)'t-
i=l -
1
(3.1.1)
(xi-~)
-
--
The log likelihood of the observed data is given by
L'
where E
Z
data.
(3.1.2)
= EZ (L)
denotes integration of the function with respect to the censored
Setting first partial derivatives of L' with respect to
~
equal
to zero gives
(3.1.3)
therefore, as we found earlier, u is estimated by
(3.1.4)
where now if just one observation from a particular vector is censored,
t h en t h e expecte d va 1 ue
0
..
i h tf t hat J.th 0 bservat10n
1S .
g1ven b y t h erg
hand side of (2.1.2) or (2.1.3) with u and cr replaced by the conditional mean and variance.
By using previous estimates of u and
equation (3.1.4) gives an iteration scheme for u.
t,
If more than one
observation is censored, (3.1.4) requires the evaluation of the multivariate normal probability integral.
ly difficult.
in£orma~ion
This would in general be extreme-
Therefore, the procedure will be simplified by not using
about censoring on one variable in computing the expected
value af any other censored value.
In order to estimate
t,
the first
partial derivative of L' wi~h respect to the elements of +-1, denoted
as
jm
, is needed.
This
deriva~ive
results in the equation
54
N
cr.
Jm
=
(3.1.5)
E 0:: (Xij-)..I,) (xi -)..I ) IY) IN
Z i=l
J
~
m
which is the standard sum of squares and cross products with censored
values estimated from)..l and ~ as in (3.1.4), except when both x
x. are censored.
1m
be added.
Thus
N
A
where Qijm equals one if x
ij
and x
im
lJ
A
(3.1.6)
+ 0ij E (cr j IY)/N
m
m Z
m
)
are both censored, and zero other-
Equation (3.1.6) also involves )..I and
t
on both sides of the
~
equation.
and
In that case, the additional term of E (cr 11) must
z jm
cr. = L: «E (x. ,) - )..I.)(E (xi) Jm
i=l Z 1J
J
Z
m
wise.
ij
If Ez(crjmIY) could be evaluated easily from )..I and
,
t,
then equa-
tions (3.1. 4) and (3.1. 6) form a "successive substitutions" iteration
scheme.
rne evaluation of (3.1.6) has the same difficulties as (3.1.4)
in the multivariate case when more than one value is censored.
The con-
ditional expectations involve the multivariate normal probability integral.
Because of this, these terms are left out of the covariances.
The conditional variances, ignoring other censored variables, are added
to the cross products.
This reduces the bias on the diagonal elements,
and insures convergence to the maximum likelihood estimates in the univariate case.
3.2
Bias Relative to True Maximum Likelihood Estimators of the
Simplified Procedure in the Bivariate Normal Case with
Fixed Type I Censoring
In order to obtain estimates of the biases (relative to the true
ML estimators) produced by the simplifications instituted in the estimation procedure introduced in Section 3.1, the bivariate normal case
under fixed Type I censoring is examined.
For any observation from the
bivariate normal, there are three cases pertinent to the problem:
,.
55
(1) xl and x
2
both present, (2) xl or x
and (3) xl and x
estimates.
2
both censored.
In particular
to the ML estimators.
considered.
~l'
2
censored and the other present,
Only case (3) creates biases in the
~2'
all'
0
12
and 022 are all biased relative
Because of symmetry, only
~l'
all and 012 will be
Rosenbaum (1961) obtains concise formulae for the moments
of a bivariate normal distribution singly truncated with respect to
both variables.
be derived.
From these, the biases of the parameter estimates may
We have in the left-censored case
(3.2.1)
E«xl-~1)(xZ-~2) Ix l <c 1 ,x 2 <c Z)
2
E«xl-~l) !x l <c l ,x 2 <c Z)
and
(cl-\.l·)/o.
~
~
i
1
=
(c - p c )/(1-p Z) ~
Z
l
c
Z
=
Z \~
(cZ-pcl)/(l-p )
C
~
b
b
[ei-20ef2 +e~
°3 =
.
P
~(x)
and
1-p
= correlation
=
and
= allm ZO
=
where with a.
(3.2.3)
=
l
.
1,2
= censoring
z = censoring
~
P1
= prob.
Pz
:a
point for xl
point for x
2
xl censored
prob. x censored
2
coefficient = a 12/ (° cr 22)
11
cumulative standard bivariate
coefficient = ;)
~(x)
(3.2.Z)
(011022)m ll
:&
~ormal
~2
with correlation
the standard normal density and cumulative normal
density we have
mlO
mil
=
(~2(al,aZ'p»
-1
(~(al)~(bZ)
+ p~(aZ)~(bl»
(3.2.4)
= (~Z(al,a2,p)-1(~2(al,aZ'p)p + pal~(al)~(b2)
+
p a iP(a )1(b )
Z
1
Z
+
?
«l-p~)/Z7T)
~
-Hb3»
0.2.5)
56
mZO = (~2(a1,a2'p»
and
+
Letting ¢2
P
2
-1
(~2(a1,a2'p) + a1~(al)~(b2)
Z
(3.2.6)
~
aZ4>(a ZH (b ) +p «l-p ) /2TT) Hb )
l
3
= ¢2(a l ,a Z 'p)
the bias for
~l
is seen to be
~
-1
E(~l)-~l = -~2(~1-all~(al)/¢(al) - (~I-all~2 (~(a1)¢(b2)
0.2.7)
+ p¢(a )¢(b »»
Z
1
~
all(~2.(al)/¢(al)
- .(al)~(bZ) - p$(a Z)¢(b 1 »
A graph of this bias is given in Figure 3.Z.1 for varying probabilities of censorship on each variable.
The absolute bias is seen to
be less than .02011 for all p<0.6, even with 30 percent of both variables
censored.
For p>O, the negative bias increases rapidly with p, reaching
a maximum around
ps 0.8 with 30 percent censoring of both variables.
The bias for olZ is found in the same manner to be given by
(3.2.8)
A graph of this bias is given in Figure 3.2.2.
The bias is
seen to be quite severe as p increases towards 1, being greatest for
P
l
= Pz = 0.1
and least for PI s P2
situation is reversed for negative
P2
= 0.3
and smallest for P
1
= 0.3
0,
This
the bias being largest for Pl
= P~"- = 0.1,
nearly so great as positive p.
for large positive p.
although the bias is not
=
57
0.01,.--
----.
."..
~".'
.... -- ...............
".'
/
./#
0.00
......
,"
--_ .... ------------ "
"
Pl-O.10
P2-0.30
......
,
,
"
~
~"
,
"- ,
Pl-O.IO
P2==0.lO
\.
-0.01
. '"
\
\
\,
,
,
"
-0.02
Pl==O. 30
.'.P2==O.30
"
"
'-~
,,/
-0.03
-1.0
-0.5
0.0
0.5
1.0
COR R E L A T ION
Figure 3.2.1
Bias of Estimated w (Relative to ~he True ML Estimator) Produced
l
by the Simplified Estimation Procedure of Section 3.1 for Pl and
P2 probability of Censoring Variables xl and x
2
Respectively.
58
O.3r-
..• .
.....-
..
•
.
.
...•
PI=O.lO
PZ=0.30
...
........
.
/
.
..,
/
/
/'
N
N
/
....
....
':)
,/
o
",'
'-'
/'
PI=0.30
PZ=0.30
i Jl
<:
H
;:Q
PI=O.IO
P2=0.10
0.0
-0 •
11.....-~__..L_.L__J..__"1.._...I.___..L_.L_~__..L_..L__"_._....l_..L__J.._L.._...I.___I._"__--I
-1.0
-0.5
0.0
0.5
1.0
COR R E L A T ION
Figure 3.2.2
Bias of Estimated 012 (Relative to the True ML Estimator) Produced
by the Simplified Estimation Procedure of Section 3.1 for PI and
P2 Probability of Censoring Variables
Xl
and
X
z Respectively.
59
0.10,--
...,
.
Pl:oO.30
........ - ......, P2:o0.30
/
/
/
I
0.00
.,
.
"
"
Pl=O .10
P2=0.30
......• ", .
.... - . . . . .....
/
I "
I "
.-
',_ ,
.. ~~,
"
Pl=O.lO
P2=0.10
-0.05
/
-0 .10 '--_'_-...I_~-.L._.._. ....._'___'__"""_
0.0
-0.5
-1.0
/
_...a._~___I_..&o_. ......_...a._........_ ' _ _ . . J
_L_.........
0.5
1.0
COR R E L A T ION
Figure 3.2.3
Bias of Estimated all (Relative to the True ML
EstL~ator)
Produced
by the Simplified Estimation Procedure of Section 3.1 for Pl and
P2 Probability of Censoring Variances xl and x ' respectively.
2
60
The bias for 011 is obtained in the same manner as above and
is given by
(3.2.9)
A graph of this bias is shown in Figure 3.Z.3.
The bias for
increases with increasing percent censored of each variable.
is seen to be negative for p>O and positive for p<O.
For PI
the positive bias reaches a maximum of 0.06Z 011 for
negative bias reaches a maximum of -0.07Z 011 for p
3.3
p=
=
v~
ll
The bias
=
Pz = 0.3
-0.56 and the
+ 0.66.
Maximum Likelihood estimation of the parameters of a bivariate
normal distribution in the presence of Type I random censoring
In order to demonstrate the applicability of the RIP procedure to
this case, the normal equations for this particular case will be ex'plicitly derived.
normal density with mean vector
(
1
: ';11
!
-
:
1Z
and variance-covariance matrix
l
~??i
--)
(YIZ,Y
ZZ
) , ... ,(Yln,Y
Zn
) of size n from this distribution with both
variates subject to random Type I censoring.
assume left-censoring.
no
n
n
l
2
=
Without loss of generality,
Further, let
number of observations with neither 1
Llumber of observations
~vith
1
nor 'lZ censored
YI only censored
= number of observations with Y
Z
only censored
n J = number of observations with Y and Y2 both censored
I
61
c
li
= censoring point of Yli (if censored)
censoring point of Y?"
(if censored)
_1
and
r
li
qli =
~Z(Yl'YZi)dYl
Y only censored
1
_00
qZi
=
(3.3.1)
1
otherwise
(2"~~Z(Yli'Y2)dY2
Y only censored
Z
_00
(3.3.Z)
=1
otnerwise
Y and
l
-00
I
censored
_00
(3.3.3)
=1
e
Y~
otherwise
Then the log likelihood Eor this sample is given by
0.3.4)
(cont)
In order to obtain equations for the maximum likelihood estimates
of t:1P cneans, the followine; results are needed:
_exl
(cont)
62
and likewise
(cont)
(3.3.6)
and
(3.3.7)
Therefore, using results (3.3.4) through (3.3.7), we have
aLia\.! 1 =
2 -1
-(1-0)
[ - ~ (Y -\l1)/cr
1i
ll +
P(crllJZ2)-~(~
.1
uncensored
-
?
(l-p-)
-1
[- L
(YZi-U Z»]
Z
uncensored
I
E{Y1-\..l1 Yl<cU'YZi}/iJll+ P(0"1lu
L
Z2
)--2
Y 1 only
censored
(con t)
E{Yl-U 1 !Y1<c 1i ,
and '11
2
censored
L
1
+ o (0"1 10"Z"")
-
"-
->,;.!...
'i
1
censored
But equation (3.3.3) is just the usual one for the partial
w.r.t. u ' with the censored values replaced by their conditional
1
'2xpec ted
va1'Jes.
The
partial 'Ni tit respp-c t to u) Ls seen to give
0.3.8)
63
symmetric results to equation (3.3.8).
In order to obtain equations for the maximum likelihood estimates
of the variance and covariance, the following results are needed:
(3.3.9)
-(]
-!~
11
(3.3.10)
Differentiating (3.3.4) with respect to
using results
(3.3.8) through (3.3.11) we have
~
-n-(l-p-)
-1
[E
(-(Y
uncensored
Y
?
- (l-p-)
-1
1
and Y
?
li
-u )-/0
1
11
Z
Z
1 C- E {(Y 1 -u 1 ) /allIYl<c1i'Y2i}
Y1 on y
[I:
censored
1
+ p(YZi-uz)/ai:
'J
- C1-0-)
-1
[I:
y
.E{({1-U1)/a~liY1<C1i'YZi})J
2
censored
64
(cont)
Equation (3.3.12) is seen to be the usual maximum likelihood equation with cross product terms involving censored observations replaced
I.
by the expectation of such terms.
symmetrical results.
respect to
p
The equation for aL/a(0~2) gives
Now for the covariance term, the partial with
will be examined.
The following results are again aeeded:
+ (Y2i- Il ?) 2 102Z- Zp (0110ZZ) -~l"(Y2i-IlZ)E{Yl-~1IYl<cli'Y Zi })
0.3.13)
+
((Yli-~1)2/vll
-
2p(allcrZ2)-~(Yli-Wl)E{Y2-~2!Y2<c2i'Yli})
0.3.14)
65
and so we have finally
2
? -1
1
3L/3p = np/(l-p) -r (l-p-) [l: (YU-Wl)(Y2i-W2)/(011022) 1
uncensored
+
~(011022)
-~
(Y2i-U2)E{YI-Ul!Yl<cli'Y2i}
1
only censored
+
(011a22)-\~E{(YI-Ul)(Y2-:J2)
I..
IYl<Cli'Y2<c2i}]
both censored
:2 -2
2
-~
2
- ~(l-p) [~«Y1i-Ul) lOll -2p(oll022) (Yli-Ul)(Y2i-u:2) + (Y 2i -:J 2 ) 102Z)
uncensored
+
~(E{(Yl-~l)210l1IY1<Cli'Y2i}
1
-2p
(J 11 a 22) --2 (Y 2i-W
2 ) E{y l-U l !Y1<c l i ' Y2i }
1
only censored
~
+ ~ ( (Y li-:J 1) .:. I all- 2 p (cr 11 cr 22)
-~
2
(Y l i -u 1) E{Y2-:J 2 Iy2 <Czi ' Yl i }
2
only censored
~
2
2
+ Z E{(Y1-u l ) 1011-2p(Oll022)-~(Y1-Ul)(Y2-U2) + (YZ-U Z) 10221
both
censored
V
<c l~' Y2 <c ') . 'J.
........
-~
• 1
1
J
(3.3.16)
66
From the above equations we see that the maximum likelihood equations for the variance-covariance matrix are the usual ones with cross
product terms involving censored values replaced by the conditional expectation of such terms.
This is exactly the solution presented by
direct application of the RIP to the fixed censoring point multivariate
normal problem in Section 3.1.
The general scheme for estimation of the parameters of the bivariate
normal will therefore follow that presented in Sections 3.1, except that
in the case where both variables are censored, correction terms will be
added to the matrix of cross-products to obtain unbiased estimators.
The necessary correction terms are obtained from equations (3.2.1) through
(3.2.6) presented earlier.
Beginning with initial estimates of the
parameters, expected values of the censored values are obtained.
New
estimates of the parameters are computed using the standard maximum
likelihood equations with expected values used in place of censored
values and expected values of cross product terms in place of those
terms involving censored values.
These new estimates are then used to
continue the iterative process until convergence to the maximum likelihood solution.
It is clear from the development of equations (3.3.8), (3.3.12)
and (3.3.16) that this solution is immediately extendable to the
multivariate normal case with no more than two of the variables censored in any observation vector, since the only adjustment necessary
involves conditioning on the additional
obse~,ed
values in the multi-
variate data vector, which presents no difficulties.
67
3.4
A simulation study of the bivariate normal
distribution estimation procedure
The estimation procedure developed in Section 3.3 will provide
maximum likelihood estimates of the parameters of a randomly censored
bivariate normal distribution.
However, the behavior of these esti-
mates for moderate sample sizes and varying degrees of censorship is
unknown;
the asymptotic variance-covariance matrix of the estimates
would be difficult to derive explicitly in any case.
Therefore, the
Ten groups of 1000 sets of
following simulation study was carried out:
25 bivariate normal random variables were generated using the polar
method, with parameters U
l
=
~2
= 0,
all
= a 22 =
where p' is a selected value between 0 and 1.
1, and
p
=
p',
Nine of the data sets
were subjected to Type I censoring in the upper tail, using censoring
points selected from the normal distribution function that corresponded
to the 70
th
,80
th
,and 90
th
percentiles, such that the following com-
binations of uncensored and/or censored marginal distributions were
obtained:
(10%,
20~O,
(0%,
O~O,
(0%, 10%), (0%, 20%),
(10%,30%),
(20%,20%),
(O~~,
30;~),
(10%, 10%),
(20%,30%) and (30%, 30%).
The
estimation procedure of Section 3.3 was applied to each of the data
sets to obtain the estimates u l '
~istics
~2'
all' 022' and 012'
for each set of 1000 estimates
corr.espondi~g
combinations of censoring were then computed.
carried out tor 9' = - 0.9(.1)0.9.
Summary sta-
to each of the
This procedure was
Hence, the additional impact
of the correlation on the estimation procedure and on the variancecovariance matrix of the estimates was examined in the simulations.
fhe summary statistics corresponding to D' = -0.5,
f;'
=
+0.5 are ;siven in r.lbles 3.4.l
throu~h
the overall results of the simulations.
0'
~
0.0, and
3.4.3 as examples of
68
Table 3.4.1
Summary Statistics of Maximum Likelihood Estimates
for Censored Bivariate Normal Samples of
Size N=25, with ρ = 0.0

Per cent of      Per Cent of Variable 1 Censored
Variable 2              0              10              20              30
Censored Estimate    Mean      SD    Mean      SD    Mean      SD    Mean      SD
10       μ1         0.010   0.197   0.005   0.212
         μ2         0.013   0.202  -0.001   0.201
         σ11        0.959   0.276   0.999   0.324
         σ22        0.992   0.313   0.974   0.328
         σ12       -0.006   0.201   0.010   0.201
20       μ1         0.008   0.205   0.009   0.203   0.011   0.204
         μ2         0.016   0.209  -0.003   0.201   0.009   0.213
         σ11        0.963   0.274   0.958   0.302   0.972   0.346
         σ22        0.990   0.336   0.980   0.326   0.985   0.351
         σ12       -0.001   0.218   0.005   0.205   0.001   0.207
30       μ1        -0.001   0.198   0.004   0.198   0.011   0.214   0.011   0.210
         μ2         0.008   0.022   0.009   0.219   0.014   0.214  -0.006   0.208
         σ11        0.966   0.285   0.971   0.319   0.995   0.334   0.975   0.365
         σ22        0.988   0.384   1.000   0.391   0.991   0.376   0.988   0.380
         σ12       -0.003   0.206  -0.011   0.218  -0.002   0.214  -0.005   0.248
Table 3.4.2
Summary Statistics of Maximum Likelihood Estimates
for Censored Bivariate Normal Samples of
Size N=25, with ρ = -0.5

Per cent of      Per Cent of Variable 1 Censored
Variable 2              0              10              20              30
Censored Estimate    Mean      SD    Mean      SD    Mean      SD    Mean      SD
10       μ1         0.010   0.196  -0.001   0.205
         μ2         0.009   0.206  -0.003   0.207
         σ11        0.952   0.262   0.988   0.309
         σ22        0.991   0.303   0.989   0.317
         σ12       -0.487   0.217  -0.499   0.231
20       μ1        -0.011   0.204   0.005   0.205   0.010   0.209
         μ2         0.015   0.211   0.000   0.214   0.004   0.210
         σ11        0.968   0.284   0.973   0.313   0.975   0.338
         σ22        1.003   0.349   0.992   0.342   0.999   0.358
         σ12       -0.493   0.236  -0.503   0.236  -0.496   0.244
30       μ1         0.003   0.199   0.007   0.206  -0.001   0.209   0.006   0.222
         μ2         0.017   0.221   0.005   0.210   0.028   0.213   0.019   0.219
         σ11        0.947   0.275   0.992   0.329   0.986   0.331   0.995   0.373
         σ22        0.987   0.378   1.002   0.371   0.985   0.368   0.989   0.377
         σ12       -0.485   0.240  -0.500   0.248  -0.493   0.250  -0.488   0.256
Table 3.4.3
Summary Statistics of Maximum Likelihood Estimates
for Censored Bivariate Normal Samples of
Size N=25, with ρ = 0.5

Per cent of      Per Cent of Variable 1 Censored
Variable 2              0              10              20              30
Censored Estimate    Mean      SD    Mean      SD    Mean      SD    Mean      SD
10       μ1         0.011   0.212  -0.009   0.200
         μ2         0.000   0.198   0.007   0.210
         σ11        0.971   0.285   0.989   0.316
         σ22        0.982   0.320   0.985   0.326
         σ12        0.491   0.230   0.487   0.237
20       μ1        -0.002   0.197   0.010   0.200  -0.002   0.208
         μ2         0.009   0.213   0.005   0.208   0.001   0.207
         σ11        0.972   0.265   0.989   0.320   0.982   0.346
         σ22        0.990   0.354   0.981   0.340   0.969   0.347
         σ12        0.493   0.229   0.496   0.246   0.487   0.245
30       μ1        -0.004   0.201   0.008   0.200   0.002   0.205   0.008   0.217
         μ2         0.015   0.227   0.017   0.221  -0.002   0.220  -0.002   0.221
         σ11        0.979   0.283   0.978   0.305   0.968   0.330   0.984   0.370
         σ22        1.009   0.383   1.002   0.401   0.959   0.359   0.974   0.359
         σ12        0.494   0.242   0.496   0.245   0.470   0.235   0.492   0.253
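For comparison purposes, the asymptotic standard deviations of the estimated parameters in the uncensored case are given by

$$
\begin{aligned}
\mathrm{Std\,Dev}(\hat\mu_1) &= \mathrm{Std\,Dev}(\hat\mu_2) = (\sigma^2/25)^{1/2} = 0.200,\\
\mathrm{Std\,Dev}(\hat\sigma_{11}) &= \mathrm{Std\,Dev}(\hat\sigma_{22}) = (48\sigma^4/25^2)^{1/2} = 0.277,\\
\mathrm{Std\,Dev}(\hat\sigma_{12}) &= \left((1+\rho'^2)/24\right)^{1/2} =
\begin{cases} 0.204 & \text{for } \rho' = 0.0,\\ 0.228 & \text{for } \rho' = \pm 0.5. \end{cases}
\end{aligned}
\tag{3.4.1}
$$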
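As a quick check, the reference values in (3.4.1) can be reproduced by evaluating the printed expressions directly. This is a minimal Python sketch assuming $\sigma_{11} = \sigma_{22} = \sigma^2$; the function name is hypothetical, and the divisors ($n^2$ for the variance estimates, $n-1$ for the covariance) are taken from (3.4.1) as printed rather than derived anew.

```python
import math


def uncensored_asymptotic_sds(n=25, rho=0.0, sigma2=1.0):
    """Standard deviations of the estimates in (3.4.1) for an uncensored sample."""
    sd_mean = math.sqrt(sigma2 / n)                               # mu_1, mu_2
    sd_var = math.sqrt(2.0 * (n - 1) * sigma2 ** 2 / n ** 2)      # sigma_11, sigma_22 (48/25^2 for n = 25)
    sd_cov = math.sqrt(sigma2 ** 2 * (1.0 + rho ** 2) / (n - 1))  # sigma_12
    return sd_mean, sd_var, sd_cov


for rho in (0.0, 0.5, -0.5):
    print(rho, [round(v, 3) for v in uncensored_asymptotic_sds(rho=rho)])
```

With $n = 25$ this gives the tabulated 0.200, 0.277 and 0.204, and 0.228 for the covariance when $\rho' = \pm 0.5$.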
Several points became obvious during the execution of these simulations. First, the estimation procedure behaved well and converged rapidly to the parameter estimates, even with 30% of each variate censored. Second, the procedure behaved consistently for all values of $\rho'$. Third, the mean values of the parameter estimates were always very close to the expected values for all degrees of censoring and all values of $\rho'$. The only suggestion of a trend in the estimates was for large $|\rho'|$ and asymmetric censoring of the two variates, which produced slightly higher estimates of the variance of the more heavily censored variate as well as of the covariance.
As observed in Tables 3.4.1 through 3.4.3, the computed standard deviations of the sample estimates correspond quite closely with the asymptotic values for the uncensored case, with the standard deviations increasing with increasing degrees of censorship. These increases in the standard deviation relative to the asymptotic values for the uncensored case amounted to at most 10% for the mean estimates and 40% for the variance estimates, both occurring with 30% of one and/or both variables censored.
3.5
Estimation for the k-sample bivariate case
In the k-sample bivariate case, observations are collected on k independent groups of sampling units, with responses described by a bivariate normal random variable with mean vector $\boldsymbol\mu_j$ in the $j$th group and a covariance matrix $\Sigma$ common to all groups. When both variables are subject to random Type I single censoring, the maximum likelihood estimates of $\boldsymbol\mu_j$, $j = 1, 2, \ldots, k$, and $\Sigma$ are obtained by a direct application of the MIP as developed in Chapters 2 and 3. The conditional expectations of the censored variables given the observed portion of the observation vector are used in place of the censored values in the standard maximum likelihood equations for the mean matrix, with conditional expectations of cross-product terms involving censored variables used in the equations for the pooled estimate of the covariance matrix. As shown in Sections 3.2 and 3.3, adjustments to cross products involving estimates of censored variables are only necessary when both variables are censored, in which case equations (3.2.1) through (3.2.6) are used to calculate the correction terms.
The extension of the k-sample bivariate case to the k-sample multivariate case with no more than 2 of the variables censored in any observation vector is straightforward, involving conditioning on the additional observed variables in the multivariate vector in computing expected means and covariance matrices.
3.6
Estimation in the bivariate multiple regression case
In the bivariate multiple regression case we have

$$\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \mathbf{e} \tag{3.6.1}$$

where

$$\mathbf{Y} = \begin{bmatrix} Y_{11} & Y_{12} \\ \vdots & \vdots \\ Y_{n1} & Y_{n2} \end{bmatrix}$$

is a random matrix of observed and randomly censored values,

$$\mathbf{e} = \begin{bmatrix} e_{11} & e_{12} \\ \vdots & \vdots \\ e_{n1} & e_{n2} \end{bmatrix}$$

is a random matrix, $\mathbf{X}$ is an $n \times k$ matrix of rank $k$ consisting of known fixed quantities, and $\boldsymbol\beta$ is a $k \times 2$ matrix of unknown parameters, where $k \le n$. The $(e_{i1}, e_{i2})$ pairs are independently identically distributed as bivariate normal with mean vector $\mathbf{0}$ and covariance matrix $\Sigma$ unknown. The normal equations for this situation are of course the multivariate analog of the univariate case given by equations (2.3.1) and (2.3.2) of Section 2.3.
If only one observation of an observation pair is censored, its expected value is given by (2.3.3) or (2.3.4) with the mean and variance replaced by the conditional mean and variance given the observed
value.
The expected value of the cross product term contribution to
the variance is given by (2.3.3) through (2.3.6), again with those
means and variances conditional on the observed value of the pair.
On the other hand, if both observations are censored, the conditional
means and conditional covariance terms are computed from Rosenbaum's
equations (3.2.1) through (3.2.6) as described in Section 3.2.
The iterative scheme for the bivariate multiple regression case
is therefore seen to be the analog of the univariate multiple regression case, with adjustments to means and covariance terms necessary
where both observations are censored.
The multivariate multiple regression case with no more than two observations censored in any one
vector is again a straightforward extension of this scheme, with conditioning of the means and variances on the additional observed
variables being the only necessary adjustment to make.
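When only one member of a pair is censored, the replacement described above amounts to the moments of a truncated normal taken from the conditional distribution of the censored variate given the observed one. The Python sketch below assumes upper-tail (right) censoring and uses the standard truncated-normal moment formulas, which we take to correspond to (2.3.3) through (2.3.6) with the conditional mean and variance substituted; the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def upper_truncated_moments(m, v, c):
    """E[Y] and E[Y^2] for Y ~ N(m, v) given Y > c."""
    s = math.sqrt(v)
    a = (c - m) / s
    lam = STD_NORMAL.pdf(a) / STD_NORMAL.cdf(-a)  # inverse Mills ratio phi(a) / Phi(-a)
    ey = m + s * lam
    ey2 = m * m + 2.0 * m * s * lam + v * (1.0 + a * lam)
    return ey, ey2


def censored_y2_given_y1(y1, c2, mu, sigma):
    """Expected value, expected square and expected cross product for a pair
    whose first member y1 is observed and whose second is censored above c2.

    mu = (mu1, mu2); sigma = (s11, s12, s22).
    """
    mu1, mu2 = mu
    s11, s12, s22 = sigma
    cond_mean = mu2 + (s12 / s11) * (y1 - mu1)  # E[Y2 | Y1 = y1]
    cond_var = s22 - s12 * s12 / s11            # Var[Y2 | Y1 = y1]
    e_y2, e_y2sq = upper_truncated_moments(cond_mean, cond_var, c2)
    return e_y2, e_y2sq, y1 * e_y2              # E[Y1 Y2] = y1 E[Y2 | ...]


print(censored_y2_given_y1(1.0, 0.5, (0.0, 0.0), (1.0, 0.5, 1.0)))
```

Using `cdf(-a)` rather than `1 - cdf(a)` keeps the tail probability accurate when the censoring point lies far above the conditional mean.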
Chapter 4
SIMULATION STUDIES OF HYPOTHESIS TESTING PROCEDURES
IN THE PRESENCE OF TYPE I CENSORING
Methods of testing hypotheses concerning parameter estimates in the censored sample case are often necessary. In this chapter, the likelihood ratio ($\lambda$) is investigated through simulation studies to determine its behavior for moderate sample sizes for the experimental situations most likely to be encountered. These situations include the univariate 1-sample, k-sample, and multiple regression cases, and the bivariate 1-sample and linear model cases. Under the null hypothesis, $-2 \ln \lambda$ should have an asymptotic chi square distribution with degrees of freedom equal to the difference in the number of parameters in the two hypothesized models.
Other tests previously suggested for the censored normal situation are also computed where possible in the simulation studies to examine their behavior relative to the likelihood ratio. These include Sampford and Taylor's (1959) approximate t-test in the univariate 1-sample and k-sample cases, as well as the asymptotic z-value using the estimated asymptotic variance-covariance matrix of the maximum likelihood estimators as developed in Chapter 2.
4.1
The Univariate One-Sample Case
As mentioned in Chapter 1, Sampford and Taylor (1959) give a method
of analysis for randomized block experiments with censored observations, and state that it can be used for other experimental designs.
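Taylor (1973) carried out extensive simulation studies to verify these results. This method of analysis for testing $H_0: \mu = \mu^*$ employs a test of the form

$$
T = \frac{(\hat\mu - \mu^*)\left(\sum_i w_i - 1\right)^{1/2}}{\hat\sigma},
\qquad
w_i = \begin{cases}
\dfrac{\phi(a_i)}{\Phi(-a_i)}\left(\dfrac{\phi(a_i)}{\Phi(-a_i)} - a_i\right) & \text{for right censoring},\\[1ex]
1 & \text{for uncensored observations},
\end{cases}
\tag{4.1.1}
$$

with $a_i = (c_i - \mu^*)/\hat\sigma$ and $c_i$ = censoring point for observation $i$. The degrees of freedom for this test are taken to be $\sum_i w_i - 1$.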
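A Python sketch of (4.1.1) follows. It assumes right censoring, standardizes each censoring point as $(c_i - \mu^*)/\hat\sigma$ as read above (passing $\hat\mu$ instead is a one-argument change through the hypothetical `mu_ref` parameter), and takes $\hat\mu$ and $\hat\sigma$ as already computed by the maximum likelihood procedure. All names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sampford_taylor_weight(a):
    """Weight w_i for an observation right-censored at standardized point a."""
    r = STD_NORMAL.pdf(a) / STD_NORMAL.cdf(-a)  # phi(a) / Phi(-a)
    return r * (r - a)


def sampford_taylor_t(mu_hat, sigma_hat, mu_star, censor_points, n_uncensored, mu_ref=None):
    """Approximate t statistic and its degrees of freedom, following (4.1.1).

    censor_points: censoring points c_i of the censored observations.
    mu_ref: mean used to standardize c_i (defaults to mu_star).
    """
    if mu_ref is None:
        mu_ref = mu_star
    weights = [1.0] * n_uncensored  # uncensored observations get weight 1
    weights += [sampford_taylor_weight((c - mu_ref) / sigma_hat) for c in censor_points]
    total = sum(weights)
    t = (mu_hat - mu_star) * math.sqrt(total - 1.0) / sigma_hat
    return t, total - 1.0


# Uncensored check: the weights sum to n, so the degrees of freedom are n - 1 = 24.
print(sampford_taylor_t(0.2, 1.0, 0.0, [], 25))
```

Each censored weight lies strictly between 0 and 1, since $\lambda(\lambda - a) = 1 - \mathrm{Var}(Z \mid Z > a)$ for a standard normal $Z$; censoring therefore reduces the effective sample size, as the falling mean df in Table 4.1.1 shows.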
A second possible course of action is an appeal to the asymptotic normality of the maximum likelihood estimators. The test value for testing $H_0: \mu = \mu^*$ is then

$$z = (\hat\mu - \mu^*)/[\mathrm{Var}(\hat\mu)]^{1/2} \tag{4.1.2}$$

where $\mathrm{Var}(\hat\mu)$ is obtained from the asymptotic variance-covariance matrix given by equation (2.4.1.2).
The likelihood function may be evaluated under each hypothesized model and the ratio evaluated, with $-2 \ln \lambda$ being asymptotically chi square with 1 degree of freedom in testing $H_0: \mu = \mu^*$. This is a test of interest here, since it is generally applicable and yet has not been investigated in the censored sample situation.
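The likelihood ratio in this case is given by

$$
\lambda = \left(\frac{\hat\sigma^2}{\tilde\sigma^2}\right)^{n_0/2}
\exp\left(-\sum_{\text{uncensored}} \frac{(Y_i - \mu^*)^2}{2\tilde\sigma^2} + \sum_{\text{uncensored}} \frac{(Y_i - \hat\mu)^2}{2\hat\sigma^2}\right)
\prod_{\text{censored}} \frac{\Phi\left(-(c_i - \mu^*)/\tilde\sigma\right)}{\Phi\left(-(c_i - \hat\mu)/\hat\sigma\right)}
\tag{4.1.3}
$$

where $\hat\sigma^2$ and $\tilde\sigma^2$ are the variance estimates under $H_1$ and $H_0$, respectively, $n_0$ is the number of uncensored values, and $c_i$ is the censoring point of censored observation $Y_i$.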
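Evaluating (4.1.3) requires the estimates under both hypotheses. The Python sketch below obtains them with an EM iteration for a right-censored normal sample, replacing each censored observation's first and second moments by their conditional expectations in the spirit of the MIP used in Chapters 2 and 3; holding $\mu$ at $\mu^*$ gives the $H_0$ estimates. It is an illustrative assumption, not the author's program, and all names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def _upper_moments(mu, s2, c):
    """E[Y], E[Y^2] for Y ~ N(mu, s2) given Y > c."""
    s = math.sqrt(s2)
    a = (c - mu) / s
    lam = STD_NORMAL.pdf(a) / STD_NORMAL.cdf(-a)
    return mu + s * lam, mu * mu + 2.0 * mu * s * lam + s2 * (1.0 + a * lam)


def fit_censored_normal(y_obs, c_cens, mu_fixed=None, max_iter=1000, tol=1e-12):
    """ML estimates (mu, sigma^2) by EM; mu is held at mu_fixed when given (H0)."""
    n = len(y_obs) + len(c_cens)
    mu = mu_fixed if mu_fixed is not None else sum(y_obs) / len(y_obs)
    s2 = sum((y - mu) ** 2 for y in y_obs) / len(y_obs) or 1.0  # fallback start
    for _ in range(max_iter):
        # E-step: expected sufficient statistics under the current (mu, s2).
        moments = [_upper_moments(mu, s2, c) for c in c_cens]
        sum_y = sum(y_obs) + sum(m1 for m1, _ in moments)
        sum_y2 = sum(y * y for y in y_obs) + sum(m2 for _, m2 in moments)
        # M-step: complete-data ML estimates.
        new_mu = mu_fixed if mu_fixed is not None else sum_y / n
        new_s2 = (sum_y2 - 2.0 * new_mu * sum_y + n * new_mu ** 2) / n
        converged = abs(new_mu - mu) < tol and abs(new_s2 - s2) < tol
        mu, s2 = new_mu, new_s2
        if converged:
            break
    return mu, s2


def minus_two_log_lambda(y_obs, c_cens, mu_star):
    """-2 ln(lambda) for H0: mu = mu_star, in the form of (4.1.3)."""
    mu_hat, s2_hat = fit_censored_normal(y_obs, c_cens)
    mu_til, s2_til = fit_censored_normal(y_obs, c_cens, mu_fixed=mu_star)
    n0 = len(y_obs)
    log_lam = 0.5 * n0 * math.log(s2_hat / s2_til)
    log_lam -= sum((y - mu_til) ** 2 for y in y_obs) / (2.0 * s2_til)
    log_lam += sum((y - mu_hat) ** 2 for y in y_obs) / (2.0 * s2_hat)
    for c in c_cens:
        log_lam += math.log(STD_NORMAL.cdf(-(c - mu_til) / math.sqrt(s2_til)))
        log_lam -= math.log(STD_NORMAL.cdf(-(c - mu_hat) / math.sqrt(s2_hat)))
    return -2.0 * log_lam


print(round(minus_two_log_lambda([-1.1, 0.2, -0.3, 0.5, -0.8, 0.1, -0.05, 0.35], [0.6, 0.6], 0.0), 4))
```

With no censored values the statistic reduces to $n \ln(\tilde\sigma^2/\hat\sigma^2)$, the familiar uncensored form, which is a convenient sanity check.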
For this simulation study, four groups of 1000 sets of random normal deviate ($\mu = 0$ and $\sigma^2 = 1$) samples of size 25 were generated using the polar method, and subjected to 0%, 10%, 20%, and 30% Type I fixed censoring, as described previously. Fixed censoring was used in order to simplify interpretation of results. (Fixed censoring may be considered random censoring with probability 1 that the censoring point $c_i$ is equal to a constant $c$.) Each of the three test statistics was computed for the 1000 samples within each group and the distributions of the test statistics were examined.
The results of the simulations are quite encouraging. First, the z and T test statistics produced essentially identical values in all of the simulation groups tested (Table 4.1.1). All three test statistics behaved nicely when the null hypothesis was true, with sample means and variances for the z and T statistics being very close to the expected values ($\mu = 0.0$, $\sigma^2 = 1.0$) and with the $\chi^2$ sample mean and variance values being quite close to their expected values ($\mu = 1.0$, $\sigma^2 = 2.0$). Under the null hypothesis, the transformation of the sample statistics to p-values of their theoretical distributions should produce a uniform (0,1) distribution with mean and variance equal to 0.5 and 0.083 respectively. This transformation of the T and $\chi^2$ sample values indeed produces a distribution of sample p-values with means and variances extremely close to these values, as seen in Table 4.1.1. (The p-transformed z-values are not given since they were essentially identical to the T-values.)
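The p-value transformation used in these comparisons can be illustrated in a few lines. Under $H_0$, $\Pr(\chi^2_1 \le X^2)$ is uniform on (0,1), with mean 0.5 and variance $1/12 \approx 0.083$. The Python sketch below checks this for uncensored standard normal deviates, using the identity $\Pr(\chi^2_1 \le x) = 2\Phi(\sqrt{x}) - 1$; it illustrates the transformation only and does not reproduce the censored simulations.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance

STD_NORMAL = NormalDist()


def chi2_1_cdf(x):
    """Pr(chi-square with 1 df <= x) = 2 Phi(sqrt(x)) - 1."""
    return 2.0 * STD_NORMAL.cdf(math.sqrt(x)) - 1.0


rng = random.Random(25)
p_values = [chi2_1_cdf(rng.gauss(0.0, 1.0) ** 2) for _ in range(20000)]
p_sorted = sorted(p_values)
percentiles = {q: p_sorted[int(q / 100 * len(p_sorted))] for q in (10, 30, 50, 70, 90, 95, 99)}
print(round(fmean(p_values), 3), round(pvariance(p_values), 3))
print({q: round(v, 3) for q, v in percentiles.items()})
```

The empirical percentiles printed here play the same role as the rows of Table 4.1.2: under a correctly calibrated test each should sit close to its nominal value.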
Table 4.1.1
Means and Variances of 1000 Simulations of the Null
Hypothesis Test Statistics and p-Values for
the Univariate 1-Sample Case (N=25)

            %
            Censored   Z        T        df       Pr(t_df≤T)  χ²_1     Pr(χ²_1≤X²_1)
Means       0           0.033    0.032   24.000   0.495       1.017    0.505
            10          0.000    0.002   23.549   0.489       1.028    0.499
            20         -0.014   -0.006   22.842   0.499       1.028    0.508
            30          0.020    0.039   21.907   0.488       0.954    0.494
Variances   0           1.081    1.038    0.000   0.081       2.005    0.083
            10          1.084    1.048    0.120   0.083       2.336    0.085
            20          1.058    1.041    0.391   0.082       1.899    0.084
            30          0.932    0.967    0.849   0.081       1.702    0.084
Table 4.1.2
Percentiles of 1000 Simulations of Pr(t_df ≤ T)
and Pr(χ²_1 ≤ X²_1) Under the Null Hypothesis
for the Univariate 1-Sample Case

                  %          Percentiles
                  Censored   10     30     50     70     90     95     99
Pr(t_df ≤ T)      0          0.103  0.298  0.488  0.682  0.899  0.945  0.985
                  10         0.092  0.282  0.494  0.689  0.885  0.943  0.993
                  20         0.106  0.297  0.505  0.696  0.891  0.940  0.987
                  30         0.085  0.286  0.484  0.694  0.880  0.935  0.983
Pr(χ²_1 ≤ X²_1)   0          0.106  0.307  0.501  0.697  0.909  0.952  0.988
                  10         0.094  0.289  0.505  0.707  0.897  0.951  0.994
                  20         0.109  0.306  0.515  0.708  0.905  0.953  0.990
                  30         0.086  0.288  0.494  0.703  0.888  0.948  0.988
In order to further ensure the fit of the T and $\chi^2$ sample statistics to their theoretical distributions under the several censoring regimens, the cumulative frequency distributions of the p-values from these statistics were examined. As is obvious from Table 4.1.2, there is an excellent fit of these sample statistics to their predicted distributional forms under the null hypothesis. The likelihood ratio test statistic p-value distribution was consistently closer to the true percentiles than was that of the T-statistic for all censoring regimens, with the T-statistic producing slightly more conservative results. That is, the T-statistic would provide slightly fewer significant tests under the null hypothesis than would the maximum likelihood test. Both tests, however, appear to be extremely close to theoretical $\alpha$-levels under the null hypothesis.
4.2
Power Considerations in the Univariate One-sample Case
The power of the T and likelihood ratio statistics was investigated by repeating the simulations of Section 4.1 with the generated samples having a selected mean different from the null hypothesis value of 0. Three different values for the alternative mean were used ($\mu = 0.329$, $\mu = 0.392$, and $\mu = 0.515$) in order to obtain an idea of the shape of the power curve under various censoring regimens. These particular values were chosen so as to produce values in the middle part of the power curve. The censoring points were selected as in Section 4.1, so that in fact for the generated samples with higher means, more of the values were censored than would have been censored under the null distribution.
Table 4.2.1
Empirical Power of the T and χ² Test Statistics Under Several Alternative Hypotheses as
Derived from 1000 Simulation Runs at Each Alternative
and Each Censoring Regimen

                        Selected Level of the Test
                        α = 0.10        α = 0.05        α = 0.01
                        T       χ²      T       χ²      T       χ²
0% Censoring*
  H1A: μ = 0.329        0.514   0.536   0.361   0.393   0.110   0.121
  H1B: μ = 0.392        0.572   0.604   0.441   0.465   0.217   0.236
  H1C: μ = 0.515        0.796   0.816   0.695   0.710   0.426   0.466
10% Censoring*
  H1A: μ = 0.329        0.476   0.518   0.357   0.390   0.107   0.129
  H1B: μ = 0.392        0.594   0.631   0.452   0.495   0.196   0.240
  H1C: μ = 0.515        0.799   0.830   0.684   0.713   0.385   0.463
20% Censoring*
  H1A: μ = 0.329        0.438   0.491   0.299   0.354   0.082   0.116
  H1B: μ = 0.392        0.555   0.594   0.450   0.500   0.202   0.278
  H1C: μ = 0.515        0.771   0.807   0.638   0.695   0.334   0.441
30% Censoring*
  H1A: μ = 0.329        0.408   0.465   0.252   0.324   0.061   0.098
  H1B: μ = 0.392        0.494   0.546   0.400   0.492   0.146   0.245
  H1C: μ = 0.515        0.748   0.794   0.615   0.680   0.298   0.451

T: Pr(t_df ≥ T); χ²: Pr(χ²_1 ≥ X²_1).
* The per cent censored represents that portion expected to be censored under H0.
Under H1 a greater number of observations are in fact censored, since the mean
of the distribution is higher, with the variance remaining the same.
Two points are immediately obvious from a summary table of these
simulations.
First, the likelihood ratio statistic is consistently
more powerful than the T-statistic for all censoring regimens.
Second, there is generally a loss of power as the percent censored increases, especially for the T-statistic. The likelihood ratio test statistic suffers much less loss in power with increasing censoring, this loss amounting to 25% in the very worst case found, with the loss usually being much less than this.
The conclusion to be drawn from this power investigation, then, is that the likelihood ratio test statistic is definitely the best of the test procedures examined, both in regard to behavior under the null hypothesis and in regard to power under various alternative hypotheses.
4.3
The Univariate k-sample Case
The same three types of test statistics may be employed in the k-sample case as were used in the 1-sample case: an approximate 2-sample t-test for comparing two treatment means (Taylor, 1973), a z-statistic using the asymptotic variance-covariance matrix of the maximum likelihood estimators to obtain the variance of any linear combination of parameters, and the likelihood ratio ($\lambda$), with $-2 \ln \lambda$ compared to a chi square distribution with degrees of freedom equal to the difference in the number of parameters in the two models.
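For $H_0: \mu_1 = \mu_2 = \cdots = \mu_k = \mu$ versus $H_1: \mu_j \ne \mu_{j'}$ for some $j \ne j'$, the likelihood ratio is given by

$$
\lambda = \left(\frac{\hat\sigma^2}{\tilde\sigma^2}\right)^{n_0/2}
\exp\left(-\sum_{\text{uncensored}} \frac{(Y_{ij} - \tilde\mu)^2}{2\tilde\sigma^2} + \sum_{\text{uncensored}} \frac{(Y_{ij} - \hat\mu_j)^2}{2\hat\sigma^2}\right)
\prod_{\text{censored}} \frac{\Phi\left(-(c_{ij} - \tilde\mu)/\tilde\sigma\right)}{\Phi\left(-(c_{ij} - \hat\mu_j)/\hat\sigma\right)}
\tag{4.3.1}
$$

where $\hat\sigma^2$ and $\hat\mu_j$, $j = 1, 2, \ldots, k$, are the parameter estimates under $H_1$, $\tilde\sigma^2$ and $\tilde\mu$ are the parameter estimates under $H_0$, $n_0$ = number of uncensored values, and $c_{ij}$ is the censoring point of censored observation $Y_{ij}$.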
The first simulation study carried out is for a 2-sample case, so that the three tests may be directly compared. One thousand samples of n=50 random normal deviates were generated and each sample grouped into two groups of 25. The maximum likelihood estimates were then computed for the two-sample model and the one-sample model, and the three test statistics generated for each of the thousand samples. The T-statistic is obtained by computing
(4.3.2)
where $w_{i\cdot} = \sum_{j=1}^{n_i} w_{ij}$ with $w_{ij}$ defined as in (4.1.1). The degrees of freedom for the test are taken to be $w_{1\cdot} + w_{2\cdot} - 2$. The z-statistic is obtained by computing
(4.3.3)
where the variances and covariances are obtained from the asymptotic variance-covariance matrix as given by (2.4.1.4).
Goodness-of-fit tests showed that both z and T were well approximated by a normal distribution in this null hypothesis case under all censoring regimens. The variances of all three test statistics are higher than their asymptotic expected values, even for 0% censoring. The p-value transformations of the T and $\chi^2$ statistics indicate a close fit to a uniform (0,1) variable, although both statistics produced slightly too many significant differences under the null hypothesis, as seen in Tables 4.3.1 and 4.3.2. This excess amounted to very little in all cases, even with up to 30% censored in both samples. The means of each of the three test statistics over the 1000 simulations were very close to the expected values. The T-statistic always had a smaller variance than the z and appears to be the preferred test of the two in the 2-sample case. The likelihood ratio test behaved very nicely for all censoring regimens and also provided a very good test procedure under the null hypothesis, as seen in Table 4.3.2.
The second simulation study in the univariate k-sample case is for a 4-sample situation, with the likelihood ratio statistic, $-2 \ln \lambda$, the test of interest. In this case, for $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, we will have a test statistic that should have (asymptotically) a chi square distribution with $p = 4 - 1 = 3$ degrees of freedom. Four sets of one thousand samples of n=100 random normal deviates were generated and grouped into four groups of 25, with 0, 10, 20, and 30 per cent censored, respectively. Both a four-sample and a one-sample model were fit to each data set and the likelihood function evaluated using the estimated parameters fit for each model. The test statistic $-2 \ln \lambda$ was then evaluated and the distribution of the 1000 test statistics obtained was examined.
The simulation results indicate that a closer approximation to the asymptotic distribution occurs when censoring is present than when it is not present for this type of test (Tables 4.3.3 and 4.3.4). When the $\alpha$-level selected is 0.10 or less, the significance levels obtained under the null hypothesis will be very close to the expected $\alpha$-levels, as can be seen by the percentile values in Table 4.3.4. The worst case for this particular simulation study was in fact the non-censored data sets, when the test would produce a slightly greater number of significant results than expected under the null hypothesis.

Table 4.3.1
Means and Variances of 1000 Simulations of Null
Hypothesis Test Statistics for the
Univariate 2-Sample Case (N1=N2=25)

            %
            Censored   Z        T        df       Pr(t_df≤T)  χ²_1     Pr(χ²_1≤X²_1)
Means       0           0.024    0.023   48.000   0.505       1.111    0.513
            10         -0.007   -0.007   47.129   0.512       1.112    0.521
            20         -0.050   -0.049   45.709   0.498       1.032    0.507
            30          0.016    0.016   43.854   0.495       1.068    0.505
Variances   0           1.150    1.104    0.000   0.087       2.529    0.089
            10          1.147    1.097    0.233   0.086       2.202    0.088
            20          1.067    1.011    0.782   0.083       2.131    0.084
            30          1.119    1.045    1.791   0.083       2.514    0.084

Table 4.3.2
Percentiles of 1000 Simulations of Pr(t_df ≤ T)
and Pr(χ²_1 ≤ X²_1) Under the Null Hypothesis
for the Univariate 2-Sample Case

                  %          Percentiles
                  Censored   10     30     50     70     90     95     99
Pr(t_df ≤ T)      0          0.097  0.296  0.511  0.728  0.903  0.956  0.995
                  10         0.094  0.311  0.520  0.717  0.906  0.953  0.992
                  20         0.095  0.296  0.510  0.695  0.894  0.945  0.987
                  30         0.100  0.293  0.498  0.683  0.906  0.956  0.992
Pr(χ²_1 ≤ X²_1)   0          0.099  0.303  0.523  0.740  0.911  0.962  0.996
                  10         0.096  0.319  0.532  0.730  0.915  0.959  0.994
                  20         0.098  0.303  0.521  0.709  0.904  0.951  0.990
                  30         0.103  0.301  0.510  0.698  0.916  0.963  0.995

Table 4.3.3
Means and Variances of 1000 Simulations of the Null
Hypothesis Test Statistic (χ²_3) for the
Univariate 4-Sample Case (N1=N2=N3=N4=25)

%
Censored   Mean    Variance
0          3.212   7.116
10         3.078   6.601
20         3.021   5.919
30         3.076   5.783

Table 4.3.4
Percentiles of 1000 Simulations of Pr(χ²_3 ≤ X²_3)
Under the Null Hypothesis for the
Univariate 4-Sample Case

%          Percentiles
Censored   10     30     50     70     90     95     99
0          0.112  0.316  0.526  0.730  0.918  0.966  0.993
10         0.089  0.310  0.508  0.715  0.903  0.958  0.990
20         0.104  0.3?0  0.520  0.712  0.901  0.951  0.991
30         0.109  0.309  0.518  0.723  0.902  0.950  0.990
4.4
The Univariate Multiple Regression Case
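The likelihood ratio in this linear model case is given by

$$
\lambda = \left(\frac{\hat\sigma^2}{\tilde\sigma^2}\right)^{n_0/2}
\exp\left(-\sum \frac{(Y_i - \mathbf{X}_{1i}\tilde{\mathbf{b}})^2}{2\tilde\sigma^2} + \sum \frac{(Y_i - \mathbf{X}_{2i}\hat{\mathbf{b}})^2}{2\hat\sigma^2}\right)
\prod_{\text{censored}} \frac{\Phi\left(-(c_i - \mathbf{X}_{1i}\tilde{\mathbf{b}})/\tilde\sigma\right)}{\Phi\left(-(c_i - \mathbf{X}_{2i}\hat{\mathbf{b}})/\hat\sigma\right)}
\tag{4.4.1}
$$

where the exponential contains only uncensored $Y_i$'s and, under $H_0$,
(4.4.2)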
Since both the degree of censoring of the dependent variable and the collinearity of the independent variables could be important factors, one series of simulations was done with uncorrelated independent variables while a second set was done with a correlation of 0.5 between the two independent variables used. In each series, one thousand samples of n=25 sets of three uncorrelated random normal deviates were generated using the polar method, and the variable used as the dependent variable was subjected to 0%, 10%, 20%, and 30% censoring for each study, respectively. For the series with correlated independent variables, the two remaining variables were first transformed to produce the desired correlation of 0.5 between them. The model generated was therefore
(4.4.3)
Two test statistics were calculated for these simulated tests, one being the straightforward likelihood ratio statistic, $\chi^2_R$ say, and the other being the likelihood ratio statistic obtained by using unbiased estimators of the variances in place of the maximum likelihood estimates, $\chi^2_{Ru}$. In order to obtain the unbiased estimates, the "effective degrees of freedom" was computed under each model as $\sum_i w_i$, with $w_i$ defined as in (4.1.1). The maximum likelihood estimates were then multiplied by $n/\sum_i w_i$ to obtain the unbiased values. The likelihood functions were then evaluated using these values, and $-2 \ln \lambda$ computed as previously.
The $b_i$, $i = 1, 2$, were extremely close to their expected values for all censoring regimens and for both correlated and uncorrelated independent variables (Tables 4.4.1 and 4.4.3). The sample variances of the estimates were about 20% higher in the correlated case than in the uncorrelated case, but remained fairly stable across all censoring regimens. The likelihood ratio statistic $\chi^2_R$ showed a 10 to 15% positive bias in its mean relative to the asymptotic expected value for all censoring regimens, including 0% censoring, in both the correlated and uncorrelated cases. The bias was consistently greater in the correlated case, as were the sample variances of $\chi^2_R$. The bias showed a steady decline with increasing censoring for both correlated and uncorrelated independent variables.
Table 4.4.1
Means and Variances of 1000 Simulations of the Null Hypothesis
Test Statistics and Coefficient Estimates for the
Univariate Multiple Regression Case with
Two Uncorrelated Independent
Variables (N=25)

            %
            Censored   b1       b2       χ²_R     Pr(χ²_2≤χ²_R)  χ²_Ru    Pr(χ²_2≤χ²_Ru)
Means       0          -0.002    0.009   2.281    0.539          2.106    0.499
            10          0.005    0.004   2.275    0.534          2.080    0.491
            20         -0.024   -0.012   2.207    0.534          1.974    0.483
            30          0.004    0.004   2.217    0.521          1.933    0.459
Variances   0           0.050    0.044   4.690    0.085          4.690    0.099
            10          0.050    0.050   4.996    0.086          4.970    0.099
            20          0.055    0.049   4.113    0.085          4.244    0.100
            30          0.053    0.052   5.112    0.087          4.970    0.104
Table 4.4.2
Percentiles of 1000 Simulations of Pr(χ²_2 ≤ χ²_R) and Pr(χ²_2 ≤ χ²_Ru)
Under the Null Hypothesis for the Univariate Multiple
Regression Case with Two Uncorrelated
Independent Variables (N=25)

                   %          Percentiles
                   Censored   10     30     50     70     90     95     99
Pr(χ²_2 ≤ χ²_R)    0          0.121  0.330  0.559  0.751  0.929  0.963  0.994
                   10         0.113  0.345  0.555  0.739  0.927  0.965  0.995
                   20         0.107  0.340  0.561  0.733  0.921  0.959  0.988
                   30         0.109  0.317  0.522  0.724  0.929  0.968  0.995
Pr(χ²_2 ≤ χ²_Ru)   0          0.040  0.269  0.519  0.728  0.923  0.960  0.993
                   10         0.026  0.280  0.510  0.713  0.919  0.961  0.994
                   20         0.007  0.262  0.509  0.703  0.912  0.952  0.986
                   30         0.000  0.220  0.449  0.686  0.917  0.961  0.994
Table 4.4.3
Means and Variances of 1000 Simulations of the Null Hypothesis
Test Statistics and Coefficient Estimates for the
Univariate Multiple Regression Case with
Two Correlated (ρ=0.5) Independent
Variables (N=25)

            %
            Censored   b1       b2       χ²_R     Pr(χ²_2≤χ²_R)  χ²_Ru    Pr(χ²_2≤χ²_Ru)
Means       0           0.003   -0.015   2.327    0.539          2.151    0.500
            10          0.004    0.001   2.294    0.533          2.100    0.490
            20         -0.008   -0.001   2.263    0.537          2.029    0.487
            30         -0.012    0.005   2.228    0.532          1.944    0.472
Variances   0           0.065    0.065   5.425    0.085          5.425    0.098
            10          0.069    0.068   5.474    0.086          5.436    0.100
            20          0.066    0.073   4.801    0.087          4.705    0.102
            30          0.072    0.073   4.549    0.085          4.428    0.102
Table 4.4.4
Percentiles of 1000 Simulations of Pr(χ²_2 ≤ χ²_R) and Pr(χ²_2 ≤ χ²_Ru)
Under the Null Hypothesis for the Univariate Multiple
Regression Case with Two Correlated (ρ=0.5)
Independent Variables (N=25)

                   %          Percentiles
                   Censored   10     30     50     70     90     95     99
Pr(χ²_2 ≤ χ²_R)    0          0.120  0.340  0.550  0.762  0.931  0.968  0.995
                   10         0.122  0.329  0.549  0.743  0.925  0.964  0.997
                   20         0.105  0.349  0.560  0.752  0.923  0.963  0.991
                   30         0.120  0.341  0.541  0.745  0.923  0.964  0.990
Pr(χ²_2 ≤ χ²_Ru)   0          0.039  0.280  0.508  0.740  0.925  0.965  0.994
                   10         0.035  0.262  0.501  0.717  0.917  0.960  0.996
                   20         0.000  0.272  0.507  0.723  0.912  0.956  0.989
                   30         0.005  0.249  0.471  0.703  0.909  0.960  0.988
The percentiles of the p-values of the $\chi^2_R$ and $\chi^2_{Ru}$ statistics show that both will produce slightly too many significant results under the null hypothesis for $\alpha = 0.10$ or lower. This excess is consistently less for the "unbiased" $\chi^2_{Ru}$ statistic under all censoring regimens, and this statistic appears to provide a reliable method of hypothesis testing under these censoring conditions. There was little if any difference in the percentiles between the uncorrelated case (Table 4.4.2) and the correlated case (Table 4.4.4) for both these test statistics.
4.5
Tests of Means in the Bivariate Normal Case
Hypotheses concerning the mean vector $\boldsymbol\mu$ may be tested by evaluating the likelihood functions using the estimated parameters under each hypothesis, the parameters being computed using the MIP theory developed in Section 3.2, and by then evaluating the likelihood ratio. The behavior of $-2 \ln \lambda$ under the null hypothesis can be evaluated through simulation studies for $H_0: (\mu_1, \mu_2) = (\mu_1^*, \mu_2^*)$, $\Sigma$ unspecified, for censored sampling cases as in the earlier sections of this chapter.
The likelihood ratio in this case is given by

$$\lambda = \cdots \prod_{j=1}^{3} \prod_{i=1}^{n} \left(\tilde q_{ji} / \hat q_{ji}\right) \tag{4.5.1}$$

where $\hat{\boldsymbol\mu}$ and $\hat\Sigma$ are the parameter estimates under $H_1$, $\tilde\Sigma$ is the estimated variance-covariance matrix under $H_0$, $q_{ji}$ ($j = 1, 2, 3$; $i = 1, 2, \ldots, n$) is defined as in (3.3.1) through (3.3.3), with $\tilde q_{ji}$ evaluated under $H_0$ and $\hat q_{ji}$ evaluated under $H_1$, and where the exponential includes uncensored $\mathbf{Y}$ only.
One thousand sets of 25 random bivariate normal deviates ($\mu_1 = \mu_2 = 0$, $\sigma_{11} = \sigma_{22} = 1$, $\rho = \rho'$) were generated as previously described in Section 3.4 for each case of 0%, 10%, 20%, and 30% fixed Type I censoring, with $\rho' = 0.0$, 0.5, and 0.8. The censoring was carried out on the variables as detailed in Section 3.4, with the resultant combinations of censoring as described there. The hypothesis $H_0: (\mu_1, \mu_2) = (0, 0)$ was examined by computing the likelihood function using maximum likelihood estimators under each hypothesized model and computing the likelihood ratio statistic $\chi^2_B = -2 \ln \lambda$. The "unbiased" likelihood ratio statistic $\chi^2_{Bu}$ was also computed by using an unbiased estimate of the variance-covariance matrix in the likelihood functions. This unbiased estimate was obtained by computing $w_{1\cdot} = \sum_i w_{1i}$ and $w_{2\cdot} = \sum_i w_{2i}$ as in Section 3.4 and then multiplying each $\hat\sigma_{ij}$ by $n/[(w_{i\cdot} - 3)(w_{j\cdot} - 3)]^{1/2}$.
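The covariance adjustment just described can be written as a short Python function. This is a minimal sketch that assumes the weight sums $w_{1\cdot}$ and $w_{2\cdot}$ are already available and reads the divisor as $[(w_{i\cdot}-3)(w_{j\cdot}-3)]^{1/2}$; the function name is hypothetical.

```python
import math


def adjusted_covariance(sigma_hat, w_dot, n):
    """Scale each ML element sigma_ij by n / sqrt((w_i. - 3)(w_j. - 3))."""
    return [
        [sigma_hat[i][j] * n / math.sqrt((w_dot[i] - 3.0) * (w_dot[j] - 3.0)) for j in range(2)]
        for i in range(2)
    ]


# With no censoring every weight is 1, so w_1. = w_2. = n and each element is scaled by n / (n - 3).
print(adjusted_covariance([[0.96, 0.48], [0.48, 0.96]], (25.0, 25.0), 25))
```

Because the same factor multiplies $\hat\sigma_{12}$ and $\hat\sigma_{21}$, the adjusted matrix stays symmetric.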
The likelihood ratio statistics $\chi^2_B$ and $\chi^2_{Bu}$ both had a positive bias relative to their asymptotic expected value of 2.00, and their variances were also positively biased for all $\rho'$ (Tables 4.5.1, 4.5.3, and 4.5.5). The bias of the means increased in magnitude with $\rho'$ in general, although this was not true for all simulated censoring regimens. The $\chi^2_{Bu}$ means were closer to 2.00 than were the corresponding $\chi^2_B$ means in all cases examined. For this relatively small sample size (N=25) and the degree of censoring involved, these test statistics conform fairly well to their asymptotic expected means and variances for all $\rho'$. An examination of the percentiles of the $\chi^2_{Bu}$ and $\chi^2_B$ p-values (Tables 4.5.2, 4.5.4 and 4.5.6) shows that although the percentiles are not
Table 4.5.1
Means and Variances of 1000 Simulations of the Null Hypothesis
Test Statistics and P-values for the Bivariate
Normal Case with ρ'=0.0 (N=25)

            % X1       % X2
            Censored   Censored   χ²_B     Pr(χ²_2≤χ²_B)  χ²_Bu    Pr(χ²_2≤χ²_Bu)
Means       0          0          2.143    0.510          2.102    0.500
            0          10         2.089    0.510          2.051    0.500
            0          20         2.092    0.513          2.056    0.505
            0          30         2.281    0.533          2.247    0.525
            10         10         2.052    0.514          2.017    0.506
            10         20         2.217    0.535          2.184    0.527
            10         30         2.187    0.525          2.157    0.517
            20         20         2.090    0.507          2.060    0.499
            20         30         2.172    0.527          2.144    0.520
            30         30         2.052    0.506          2.026    0.500
Variances   0          0          4.906    0.088          4.906    0.091
            0          10         4.415    0.083          4.416    0.086
            0          20         4.163    0.085          4.163    0.088
            0          30         5.310    0.086          5.310    0.089
            10         10         3.791    0.080          3.791    0.083
            10         20         4.472    0.083          4.472    0.086
            10         30         4.835    0.085          4.335    0.087
            20         20         4.622    0.084          4.622    0.087
            20         30         4.510    0.081          4.509    0.034
            30         30         4.216    0.084          4.215    0.086
Table 4.5.2
Percentiles of 1000 Simulations of Pr(χ²_2 ≤ χ²_B) and Pr(χ²_2 ≤ χ²_Bu)
Under the Null Hypothesis for the Bivariate
Normal Case with ρ'=0.0 (N=25)

                   % X1       % X2       Percentiles
                   Censored   Censored   10     30     50     70     90     95     99
Pr(χ²_2 ≤ χ²_B)    0          0          0.098  0.310  0.505  0.723  0.917  0.964  0.993
                   0          10         0.116  0.306  0.513  0.705  0.918  0.960  0.990
                   0          20         0.107  0.313  0.521  0.713  0.910  0.957  0.988
                   0          30         0.129  0.328  0.534  0.758  0.926  0.966  0.993
                   10         10         0.114  0.320  0.526  0.709  0.899  0.955  0.989
                   10         20         0.111  0.339  0.556  0.748  0.916  0.954  0.995
                   10         30         0.122  0.333  0.528  0.741  0.915  0.959  0.994
                   20         20         0.103  0.307  0.508  0.700  0.907  0.966  0.994
                   20         30         0.116  0.334  0.530  0.732  0.910  0.962  0.993
                   30         30         0.106  0.303  0.513  0.709  0.903  0.955  0.994
Pr(χ²_2 ≤ χ²_Bu)   0          0          0.080  0.295  0.495  0.717  0.915  0.963  0.993
                   0          10         0.099  0.292  0.504  0.699  0.916  0.960  0.990
                   0          20         0.091  0.300  0.512  0.707  0.909  0.956  0.988
                   0          30         0.115  0.317  0.526  0.753  0.924  0.966  0.993
                   10         10         0.098  0.309  0.517  0.703  0.897  0.954  0.989
                   10         20         0.098  0.330  0.550  0.743  0.915  0.954  0.995
                   10         30         0.108  0.322  0.521  0.737  0.914  0.958  0.99?
                   20         20         0.088  0.296  0.501  0.696  0.905  0.966  0.994
                   20         30         0.103  0.324  0.524  0.728  0.909  0.962  0.993
                   30         30         0.096  0.294  0.506  0.706  0.901  0.955  0.994
97
Table 4.5.3
Means and Variances of the Null Hypothesis Test Statistics
and P-Values for the Bivariate Normal Case
with $\rho' = 0.5$ (N=25)

% X1      % X2
Censored  Censored   $\chi^2_B$   $\Pr(\chi^2_2 \le \chi^2_B)$   $\chi^2_{BU}$   $\Pr(\chi^2_2 \le \chi^2_{BU})$

Means
 0         0         2.176     0.524     2.135     0.514
 0        10         2.186     0.521     2.148     0.512
 0        20         2.008     0.504     1.972     0.495
 0        30         2.133     0.516     2.099     0.508
10        10         2.100     0.513     2.070     0.506
10        20         2.098     0.512     2.071     0.505
10        30         2.112     0.512     2.087     0.506
20        20         2.116     0.520     2.094     0.515
20        30         2.155     0.515     2.136     0.510
30        30         2.217     0.527     2.203     0.524

Variances
 0         0         4.505     0.086     4.505     0.089
 0        10         4.623     0.087     4.623     0.090
 0        20         3.674     0.085     3.674     0.083
 0        30         4.405     0.088     4.406     0.091
10        10         4.258     0.086     4.253     0.088
10        20         4.606     0.084     4.603     0.086
10        30         4.483     0.087     4.485     0.089
20        20         4.244     0.081     4.243     0.083
20        30         4.990     0.086     4.988     0.088
30        30         4.752     0.085     4.749     0.086
Table 4.5.4
Percentiles of 1000 Simulations of $\Pr(\chi^2_2 \le \chi^2_B)$ and $\Pr(\chi^2_2 \le \chi^2_{BU})$
Under the Null Hypothesis for the Bivariate
Normal Case with $\rho' = 0.5$ (N=25)

% X1      % X2                         Percentiles
Censored  Censored    10     30     50     70     90     95     99

$\Pr(\chi^2_2 \le \chi^2_B)$
 0         0        0.098  0.327  0.551  0.735  0.914  0.968  0.994
 0        10        0.111  0.312  0.528  0.737  0.923  0.966  0.994
 0        20        0.108  0.305  0.505  0.711  0.906  0.950  0.988
 0        30        0.105  0.309  0.528  0.739  0.917  0.951  0.995
10        10        0.094  0.324  0.511  0.732  0.915  0.950  0.991
10        20        0.094  0.324  0.524  0.718  0.899  0.954  0.993
10        30        0.096  0.308  0.506  0.733  0.913  0.956  0.994
20        20        0.109  0.337  0.534  0.722  0.904  0.954  0.994
20        30        0.093  0.303  0.534  0.729  0.910  0.959  0.996
30        30        0.112  0.343  0.540  0.726  0.933  0.965  0.989

$\Pr(\chi^2_2 \le \chi^2_{BU})$
 0         0        0.080  0.313  0.541  0.730  0.913  0.967  0.994
 0        10        0.093  0.299  0.519  0.731  0.922  0.965  0.993
 0        20        0.092  0.292  0.496  0.705  0.904  0.949  0.987
 0        30        0.089  0.297  0.520  0.734  0.915  0.950  0.994
10        10        0.079  0.313  0.503  0.728  0.914  0.949  0.991
10        20        0.083  0.315  0.518  0.715  0.897  0.954  0.993
10        30        0.085  0.299  0.501  0.729  0.912  0.955  0.993
20        20        0.100  0.328  0.530  0.718  0.903  0.954  0.994
20        30        0.086  0.297  0.529  0.727  0.909  0.958  0.996
30        30        0.106  0.337  0.535  0.724  0.933  0.965  0.989
Table 4.5.5
Means and Variances of 1000 Simulations of the Null Hypothesis
Test Statistics and P-Values for the Bivariate
Normal Case with $\rho' = 0.8$ (N=25)

% X1      % X2
Censored  Censored   $\chi^2_B$   $\Pr(\chi^2_2 \le \chi^2_B)$   $\chi^2_{BU}$   $\Pr(\chi^2_2 \le \chi^2_{BU})$

Means
 0         0         2.470     0.555     2.429     0.546
 0        10         2.214     0.533     2.175     0.524
 0        20         2.091     0.515     2.053     0.506
 0        30         2.073     0.512     2.038     0.503
10        10         2.099     0.504     2.075     0.499
10        20         2.224     0.535     2.202     0.531
10        30         2.172     0.519     2.150     0.514
20        20         2.123     0.522     2.113     0.520
20        30         2.405     0.557     2.397     0.555
30        30         2.188     0.526     2.189     0.526

Variances
 0         0         5.703     0.088     5.703     0.091
 0        10         4.578     0.081     4.578     0.084
 0        20         4.107     0.083     4.106     0.086
 0        30         4.208     0.082     4.208     0.085
10        10         4.887     0.085     4.883     0.087
10        20         4.494     0.082     4.492     0.084
10        30         4.707     0.086     4.705     0.088
20        20         4.059     0.081     4.055     0.082
20        30         5.229     0.084     5.230     0.084
30        30         4.730     0.083     4.723     0.083
Table 4.5.6
Percentiles of 1000 Simulations of $\Pr(\chi^2_2 \le \chi^2_B)$ and $\Pr(\chi^2_2 \le \chi^2_{BU})$
Under the Null Hypothesis for the Bivariate Normal
Case with $\rho' = 0.8$ (N=25)

% X1      % X2                         Percentiles
Censored  Censored    10     30     50     70     90     95     99

$\Pr(\chi^2_2 \le \chi^2_B)$
 0         0        0.119  0.366  0.583  0.768  0.946  0.974  0.994
 0        10        0.125  0.354  0.542  0.736  0.922  0.964  0.994
 0        20        0.121  0.314  0.512  0.725  0.910  0.958  0.991
 0        30        0.106  0.319  0.517  0.706  0.903  0.960  0.992
10        10        0.113  0.302  0.500  0.699  0.920  0.965  0.995
10        20        0.117  0.351  0.550  0.739  0.916  0.962  0.994
10        30        0.097  0.321  0.524  0.727  0.926  0.965  0.994
20        20        0.124  0.336  0.530  0.715  0.915  0.959  0.989
20        30        0.133  0.357  0.575  0.768  0.929  0.966  0.995
30        30        0.121  0.328  0.546  0.734  0.912  0.964  0.993

$\Pr(\chi^2_2 \le \chi^2_{BU})$
 0         0        0.101  0.353  0.574  0.763  0.944  0.974  0.993
 0        10        0.108  0.342  0.532  0.731  0.921  0.963  0.994
 0        20        0.105  0.301  0.503  0.720  0.908  0.957  0.990
 0        30        0.090  0.307  0.508  0.701  0.902  0.960  0.992
10        10        0.101  0.295  0.495  0.697  0.919  0.964  0.995
10        20        0.109  0.342  0.543  0.735  0.914  0.962  0.994
10        30        0.087  0.316  0.519  0.723  0.925  0.965  0.994
20        20        0.120  0.328  0.526  0.714  0.915  0.959  0.989
20        30        0.131  0.356  0.575  0.768  0.929  0.965  0.995
30        30        0.123  0.330  0.545  0.734  0.912  0.963  0.993
as close to their asymptotic values as was found in the univariate case, they still fit fairly well for all censoring regimens. The percentiles show that, for $\alpha < 0.10$, both test statistics will produce slightly more significant tests than expected under $H_0$, this excess being less using $\chi^2_{BU}$ than using $\chi^2_B$. This excess increases with increasing $\rho'$, the worst cases being encountered for $\rho' = 0.8$ in these simulations. Even in this case, if one tests using $\alpha = 0.05$ or less, the true significance level will almost always be within 0.3 to 1.5 per cent of that value (Table 4.5.6), even for the case of 30% censoring on one or both variates.
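The percentile tables can be read as a calibration check. If a statistic followed its asymptotic $\chi^2_2$ distribution exactly, $\Pr(\chi^2_2 \le \chi^2)$ would be uniform on (0, 1): its mean would be 0.5, its variance $1/12 \approx 0.083$, each percentile would equal its column heading, and the rejection rate at level $\alpha$ would equal $\alpha$. The Python sketch below performs that check on simulated statistics. The function names are hypothetical, it uses a nearest-rank percentile (the text does not say which percentile definition was used), and the closed form $\Pr(\chi^2_2 \le x) = 1 - e^{-x/2}$ holds only for 2 degrees of freedom.

```python
import math
import random

def chi2_cdf_2df(x):
    """Pr(chi2_2 <= x); this closed form is valid only for 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) if x > 0 else 0.0

def nearest_rank_percentile(values, q):
    """q-th percentile (0 < q <= 100) by the nearest-rank rule."""
    ordered = sorted(values)
    k = max(0, math.ceil(q / 100.0 * len(ordered)) - 1)
    return ordered[k]

def null_calibration(stats, cdf, alpha=0.05, qs=(10, 30, 50, 70, 90, 95, 99)):
    """Percentiles of cdf(stat) and the empirical rejection rate at level alpha."""
    u = [cdf(s) for s in stats]
    percentiles = {q: nearest_rank_percentile(u, q) for q in qs}
    rejection_rate = sum(v > 1.0 - alpha for v in u) / len(u)
    return percentiles, rejection_rate

rng = random.Random(2024)
# Exact chi-square(2) draws: an exponential distribution with mean 2 (rate 0.5).
exact = [rng.expovariate(0.5) for _ in range(20000)]
# Hypothetical statistic with a 10% upward bias, mimicking a slightly liberal test.
inflated = [1.1 * s for s in exact]

pct_exact, rate_exact = null_calibration(exact, chi2_cdf_2df)
pct_infl, rate_infl = null_calibration(inflated, chi2_cdf_2df)
print(round(pct_exact[95], 3), round(rate_exact, 3))
print(round(pct_infl[95], 3), round(rate_infl, 3))
```

In theory a 10% upward bias in the statistic moves the 95th percentile of $\Pr(\chi^2_2 \le \chi^2)$ from 0.95 to about 0.963 and the true level of a nominal 0.05 test to about 0.066, the same kind of departure seen in the tables.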
4.6 The Bivariate Normal Linear Model Case

As in the previous bivariate cases, the parameters of this model may be computed using the MIP theory of Chapter 3 under both $H_0$ and $H_1$. The likelihood ratio is then evaluated by inserting these parameter estimates into each likelihood function. The likelihood ratio in this case, (4.6.1), is the ratio of the likelihood maximized under $H_0$ to that maximized under $H_1$; it contains the factor

$$\frac{|\hat{\Sigma}_1|^{n/2}}{|\hat{\Sigma}_0|^{n/2}}$$

together with terms in the quantities $q_{ji}$. Here $\hat{\beta}_0$ and $\hat{\Sigma}_0$ are the parameter estimates under $H_0$, $\hat{\beta}_1$ and $\hat{\Sigma}_1$ are the parameter estimates under $H_1$, and $q_{ji}$, $j = 1, 2, 3$; $i = 1, 2, \ldots, n$, is defined as in (3.3.1) through (3.3.3), with one set of $q_{ji}$ evaluated under $H_0$ and the other evaluated under $H_1$.
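A likelihood ratio such as (4.6.1) is presumably turned into a test statistic in the usual way, $\chi^2 = -2 \ln \lambda$, and the simulations below refer it to a $\chi^2$ distribution with 4 degrees of freedom (Tables 4.6.1 through 4.6.6 report $\Pr(\chi^2_4 \le \chi^2_{BR})$). The sketch assumes $\lambda$ is the $H_0$ likelihood over the $H_1$ likelihood and takes the two maximized log-likelihoods as given; computing them under censoring needs the Chapter 3 machinery, which is not reproduced here. The function names and the log-likelihood values are hypothetical.

```python
import math

def lr_statistic(loglik_h0, loglik_h1):
    """-2 ln(lambda), with lambda = L(H0 estimates) / L(H1 estimates)."""
    return -2.0 * (loglik_h0 - loglik_h1)

def chi2_cdf_4df(x):
    """Pr(chi2_4 <= x) = 1 - exp(-x/2) * (1 + x/2); valid only for 4 df."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Hypothetical maximized log-likelihoods, for illustration only.
stat = lr_statistic(-101.3, -99.0)
cdf_value = chi2_cdf_4df(stat)   # values near 1 are evidence against H0
print(round(stat, 3), round(cdf_value, 3))
```

Reporting $\Pr(\chi^2_4 \le \chi^2)$ rather than the upper-tail probability is why large table entries, not small ones, correspond to significant results.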
Since the degree of censoring of the dependent variables, the correlation of the dependent variables, and the collinearity of the independent variables could be important factors, a series of three simulation studies was done using (1) uncorrelated dependent variables and two uncorrelated independent variables, (2) uncorrelated dependent variables and two independent variables having $\rho_{34} = 0.5$, and (3) two dependent variables with $\rho_{12} = 0.5$ and two uncorrelated independent variables ($\rho_{34} = 0.0$). In each series, one thousand samples of n=25 sets of four uncorrelated random normal deviates were generated using the polar method, with the first two being designated dependent variables and the last two independent variables. The dependent and/or independent variables were transformed to create the appropriate correlation for each simulation, and the likelihood ratio statistic, $\chi^2_{BR}$, and the likelihood ratio statistic using unbiased estimators, $\chi^2_{BRU}$, were generated for each sample. The effective degrees of freedom were generated as in Section 4.5.
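The data-generation step of this design can be sketched as follows. The "polar method" is taken here to be Marsaglia's polar method for normal deviates, and the pairwise transform $y_2 = \rho z_1 + \sqrt{1-\rho^2}\, z_2$ is one standard way to give two standard normals correlation $\rho$; the text does not spell out which transformation was used, so that choice is an assumption. All function names are hypothetical, and the censoring step is omitted.

```python
import math
import random

def polar_pair(rng):
    """Marsaglia polar method: returns two independent N(0, 1) deviates."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:  # accept points strictly inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

def correlate(z1, z2, rho):
    """Turn two independent standard normals into a pair with correlation rho."""
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2

def simulate_sample(n, rho12, rho34, rng):
    """n rows of (y1, y2, x1, x2): y's are the dependent variables, x's the
    independent ones. Under H0 the y's do not depend on the x's."""
    rows = []
    for _ in range(n):
        y1, y2 = correlate(*polar_pair(rng), rho12)
        x1, x2 = correlate(*polar_pair(rng), rho34)
        rows.append((y1, y2, x1, x2))
    return rows

def sample_corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(7)
# One replicate of series (3): rho12 = 0.5, rho34 = 0, n = 25.
sample = simulate_sample(25, rho12=0.5, rho34=0.0, rng=rng)
```

Series (1) and (2) only change the two correlation arguments; each of the 1000 replicates would then be censored and analyzed under $H_0$ and $H_1$.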
The $b_{ij}$, $i = 1, 2$; $j = 1, 2$, were extremely close to their expected values for all censoring regimens for all three simulation series (Tables 4.6.1 through 4.6.6). The sample variances of the estimates were about 20% higher when the independent variables were correlated ($\rho_{34} = 0.5$), but were unchanged for correlated versus uncorrelated dependent variables. These variances remained stable across all censoring regimens. The likelihood ratio statistic $\chi^2_{BR}$ showed a 10 to 20% positive bias in its mean relative to the asymptotic expected value for all
Table 4.6.1
Means and Variances of 1000 Simulations of the Null Hypothesis
Test Statistics and Coefficient Estimates for the
Bivariate Multiple Regression Case with
Two Independent Variables, with
$\rho_{12} = \rho_{34} = 0.0$ (N=25)

% X1      % X2
Censored  Censored  $b_{11}$  $b_{12}$  $b_{21}$  $b_{22}$  $\chi^2_{BR}$  $\Pr(\chi^2_4 \le \chi^2_{BR})$  $\chi^2_{BRU}$  $\Pr(\chi^2_4 \le \chi^2_{BRU})$

Means
 0         0      -0.004  -0.004   0.000  -0.005    4.576   0.544    4.226   0.504
 0        10       0.003  -0.001  -0.004  -0.005    4.619   0.557    4.248   0.515
 0        20      -0.010   0.060  -0.005  -0.013    4.600   0.553    4.192   0.507
 0        30       0.016   0.007  -0.002  -0.002    4.626   0.553    4.162   0.501
10        10       0.004  -0.009  -0.010   0.005    4.739   0.560    4.348   0.516
10        20       0.012  -0.018  -0.001  -0.005    4.681   0.553    4.258   0.506
10        30       0.002   0.011  -0.005   0.005    4.618   0.546    4.137   0.492
20        20       0.010  -0.006   0.014  -0.004    4.739   0.561    4.272   0.509
20        30      -0.001   0.008   0.006   0.008    4.651   0.556    4.130   0.497
30        30       0.003  -0.005   0.007  -0.003    4.739   0.557    4.159   0.494

Variances
 0         0       0.046   0.048   0.044   0.049   11.214   0.088   11.214   0.096
 0        10       0.049   0.049   0.050   0.051   10.140   0.087   10.108   0.096
 0        20       0.047   0.063   0.050   0.052   10.676   0.087   10.623   0.096
 0        30       0.049   0.048   0.058   0.053   11.161   0.087   10.933   0.097
10        10       0.045   0.051   0.056   0.051   11.810   0.088   11.745   0.097
10        20       0.045   0.041   0.055   0.045   11.524   0.092   11.420   0.102
10        30       0.049   0.051   0.055   0.053   11.600   0.088   11.421   0.099
20        20       0.052   0.053   0.049   0.051   11.582   0.088   11.408   0.098
20        30       0.056   0.050   0.055   0.054   10.685   0.084   10.471   0.096
30        30       0.055   0.055   0.052   0.060   11.777   0.091   11.433   0.104
Table 4.6.2
Percentiles of 1000 Simulations of $\Pr(\chi^2_4 \le \chi^2_{BR})$ and $\Pr(\chi^2_4 \le \chi^2_{BRU})$
Under the Null Hypothesis for the Bivariate
Multiple Regression Case with
$\rho_{12} = 0$ and $\rho_{34} = 0$ (N=25)

% X1      % X2                         Percentiles
Censored  Censored    10     30     50     70     90     95     99

$\Pr(\chi^2_4 \le \chi^2_{BR})$
 0         0        0.123  0.352  0.550  0.762  0.942  0.975  0.998
 0        10        0.127  0.356  0.586  0.784  0.939  0.971  0.995
 0        20        0.112  0.372  0.571  0.771  0.935  0.970  0.997
 0        30        0.129  0.361  0.568  0.773  0.934  0.972  0.987
10        10        0.132  0.371  0.576  0.874  0.945  0.974  0.998
10        20        0.121  0.354  0.576  0.783  0.945  0.975  0.997
10        30        0.106  0.363  0.558  0.757  0.946  0.979  0.998
20        20        0.121  0.376  0.582  0.773  0.945  0.975  0.996
20        30        0.139  0.376  0.572  0.770  0.943  0.976  0.996
30        30        0.120  0.362  0.581  0.787  0.945  0.975  0.997

$\Pr(\chi^2_4 \le \chi^2_{BRU})$
 0         0        0.069  0.288  0.497  0.730  0.933  0.971  0.997
 0        10        0.073  0.288  0.534  0.754  0.929  0.966  0.994
 0        20        0.054  0.298  0.512  0.734  0.922  0.969  0.996
 0        30        0.060  0.279  0.501  0.731  0.920  0.964  0.995
10        10        0.070  0.303  0.521  0.750  0.935  0.970  0.997
10        20        0.059  0.281  0.515  0.745  0.935  0.970  0.996
10        30        0.042  0.281  0.488  0.710  0.935  0.974  0.997
20        20        0.057  0.296  0.513  0.733  0.932  0.969  0.995
20        30        0.063  0.283  0.485  0.724  0.926  0.970  0.994
30        30        0.044  0.257  0.494  0.738  0.931  0.968  0.996
Table 4.6.3
Means and Variances of 1000 Simulations of the Null Hypothesis
Test Statistics and Coefficient Estimates for the
Bivariate Multiple Regression Case with
Two Independent Variables, with
$\rho_{12} = 0$ and $\rho_{34} = 0.5$ (N=25)

% X1      % X2
Censored  Censored  $b_{11}$  $b_{12}$  $b_{21}$  $b_{22}$  $\chi^2_{BR}$  $\Pr(\chi^2_4 \le \chi^2_{BR})$  $\chi^2_{BRU}$  $\Pr(\chi^2_4 \le \chi^2_{BRU})$

Means
 0         0      -0.007   0.012   0.015  -0.006    4.419   0.537    4.069   0.495
 0        10       0.011  -0.007   0.000  -0.010    4.636   0.560    4.265   0.518
 0        20       0.004  -0.006  -0.002  -0.010    4.754   0.565    4.344   0.519
 0        30       0.002  -0.004  -0.079  -0.011    4.873   0.574    4.401   0.522
10        10      -0.001  -0.009  -0.007   0.007    4.359   0.539    3.969   0.493
10        20       0.008  -0.011   0.007  -0.007    4.414   0.534    3.989   0.484
10        30       0.008  -0.012   0.002   0.004    4.679   0.561    4.198   0.507
20        20       0.009   0.004   0.006  -0.019    4.706   0.556    4.237   0.503
20        30       0.008   0.002  -0.003   0.003    4.668   0.563    4.148   0.504
30        30       0.004   0.009   0.006   0.008    4.716   0.566    4.137   0.502

Variances
 0         0       0.062   0.065   0.057   0.056    9.676   0.086    9.676   0.095
 0        10       0.061   0.063   0.066   0.069   10.137   0.088   10.107   0.097
 0        20       0.061   0.064   0.069   0.075   11.039   0.085   10.966   0.095
 0        30       0.059   0.063   0.075   0.073   11.860   0.087   11.661   0.099
10        10       0.064   0.061   0.059   0.063    8.789   0.082    8.729   0.091
10        20       0.064   0.057   0.069   0.069   10.412   0.085   10.287   0.095
10        30       0.071   0.066   0.069   0.070   10.527   0.084   10.369   0.094
20        20       0.068   0.075   0.074   0.073   11.977   0.088   11.793   0.099
20        30       0.066   0.067   0.079   0.070   10.349   0.082   10.139   0.094
30        30       0.075   0.083   0.073   0.069   10.465   0.086   10.204   0.098
Table 4.6.4
Percentiles of 1000 Simulations of $\Pr(\chi^2_4 \le \chi^2_{BR})$ and $\Pr(\chi^2_4 \le \chi^2_{BRU})$
Under the Null Hypothesis for the Bivariate
Multiple Regression Case with
$\rho_{12} = 0$ and $\rho_{34} = 0.5$ (N=25)

% X1      % X2                         Percentiles
Censored  Censored    10     30     50     70     90     95     99

$\Pr(\chi^2_4 \le \chi^2_{BR})$
 0         0        0.123  0.330  0.557  0.754  0.933  0.974  0.994
 0        10        0.113  0.367  0.608  0.779  0.935  0.967  0.996
 0        20        0.148  0.381  0.588  0.778  0.951  0.977  0.996
 0        30        0.137  0.388  0.606  0.801  0.995  0.979  0.998
10        10        0.124  0.352  0.557  0.746  0.918  0.957  0.994
10        20        0.119  0.335  0.549  0.748  0.930  0.967  0.997
10        30        0.136  0.383  0.578  0.770  0.940  0.979  0.995
20        20        0.127  0.352  0.586  0.776  0.936  0.977  0.998
20        30        0.145  0.397  0.589  0.776  0.940  0.970  0.996
30        30        0.137  0.363  0.595  0.785  0.939  0.972  0.997

$\Pr(\chi^2_4 \le \chi^2_{BRU})$
 0         0        0.069  0.266  0.504  0.721  0.923  0.969  0.993
 0        10        0.060  0.300  0.558  0.747  0.925  0.961  0.995
 0        20        0.081  0.311  0.532  0.739  0.942  0.972  0.996
 0        30        0.069  0.301  0.534  0.761  0.929  0.974  0.995
10        10        0.066  0.283  0.498  0.706  0.905  0.950  0.993
10        20        0.057  0.257  0.483  0.706  0.917  0.960  0.996
10        30        0.063  0.301  0.511  0.725  0.926  0.974  0.994
20        20        0.061  0.270  0.519  0.733  0.934  0.971  0.997
20        30        0.065  0.305  0.513  0.728  0.923  0.961  0.996
30        30        0.056  0.263  0.515  0.736  0.921  0.964  0.995
Table 4.6.5
Means and Variances of 1000 Simulations of the Null Hypothesis
Test Statistics and Coefficient Estimates for the
Bivariate Multiple Regression Case with
Two Independent Variables, with
$\rho_{12} = 0.5$ and $\rho_{34} = 0$ (N=25)

% X1      % X2
Censored  Censored  $b_{11}$  $b_{12}$  $b_{21}$  $b_{22}$  $\chi^2_{BR}$  $\Pr(\chi^2_4 \le \chi^2_{BR})$  $\chi^2_{BRU}$  $\Pr(\chi^2_4 \le \chi^2_{BRU})$

Means
 0         0       0.006   0.000  -0.001  -0.002    4.629   0.557    4.278   0.517
 0        10      -0.003   0.000  -0.003   0.002    4.569   0.544    4.196   0.501
 0        20       0.015  -0.017   0.007  -0.021    4.739   0.564    4.326   0.517
 0        30       0.002   0.010  -0.003   0.003    4.598   0.546    4.128   0.493
10        10      -0.001   0.002  -0.008  -0.001    4.532   0.547    4.149   0.502
10        20       0.001  -0.001   0.006   0.009    4.599   0.560    4.176   0.512
10        30       0.015  -0.006   0.008  -0.008    4.596   0.552    4.122   0.498
20        20       0.001  -0.002   0.000  -0.003    4.571   0.549    4.072   0.492
20        30       0.000  -0.078   0.008  -0.054    4.931   0.577    4.359   0.517
30        30      -0.004   0.000  -0.005   0.001    4.732   0.556    4.185   0.495

Variances
 0         0       0.049   0.047   0.052   0.045    9.90    0.087    9.90    0.096
 0        10       0.050   0.042   0.047   0.048   10.984   0.087   10.945   0.097
 0        20       0.050   0.048   0.049   0.058   11.241   0.082   11.156   0.092
 0        30       0.052   0.044   0.051   0.054   11.824   0.087   11.666   0.098
10        10       0.050   0.048   0.051   0.047   10.207   0.085   10.160   0.095
10        20       0.052   0.049   0.051   0.054    9.487   0.083    9.379   0.092
10        30       0.051   0.047   0.058   0.055   10.979   0.084   10.709   0.094
20        20       0.050   0.051   0.054   0.053   10.494   0.086   10.277   0.097
20        30       0.051   0.051   0.061   0.061   12.058   0.089   11.673   0.102
30        30       0.057   0.057   0.055   0.053   11.978   0.088   11.658   0.100
Table 4.6.6
Percentiles of 1000 Simulations of $\Pr(\chi^2_4 \le \chi^2_{BR})$ and $\Pr(\chi^2_4 \le \chi^2_{BRU})$
Under the Null Hypothesis for the Bivariate
Multiple Regression Case with
$\rho_{12} = 0.5$ and $\rho_{34} = 0$ (N=25)

% X1      % X2                         Percentiles
Censored  Censored    10     30     50     70     90     95     99

$\Pr(\chi^2_4 \le \chi^2_{BR})$
 0         0        0.134  0.355  0.569  0.783  0.936  0.974  0.994
 0        10        0.119  0.344  0.558  0.757  0.948  0.973  0.996
 0        20        0.152  0.376  0.572  0.776  0.942  0.982  0.997
 0        30        0.115  0.351  0.568  0.766  0.932  0.976  0.998
10        10        0.134  0.343  0.578  0.756  0.936  0.968  0.995
10        20        0.129  0.385  0.584  0.766  0.936  0.965  0.995
10        30        0.106  0.367  0.574  0.762  0.934  0.973  0.996
20        20        0.141  0.350  0.568  0.768  0.937  0.976  0.997
20        30        0.124  0.384  0.616  0.812  0.956  0.979  0.996
30        30        0.119  0.365  0.565  0.774  0.954  0.977  0.997

$\Pr(\chi^2_4 \le \chi^2_{BRU})$
 0         0        0.079  0.292  0.518  0.753  0.926  0.970  0.993
 0        10        0.063  0.280  0.502  0.723  0.939  0.969  0.996
 0        20        0.088  0.302  0.511  0.741  0.932  0.978  0.997
 0        30        0.050  0.264  0.494  0.720  0.918  0.973  0.997
10        10        0.074  0.271  0.524  0.720  0.926  0.962  0.995
10        20        0.068  0.312  0.525  0.727  0.921  0.958  0.994
10        30        0.044  0.283  0.508  0.718  0.919  0.967  0.995
20        20        0.066  0.263  0.497  0.720  0.921  0.970  0.996
20        30        0.049  0.294  0.536  0.762  0.940  0.972  0.995
30        30        0.049  0.272  0.479  0.719  0.941  0.970  0.996
censoring regimens, including no censoring on either variable. The bias remained fairly stable over all censoring regimens in every case. The unbiased statistic $\chi^2_{BRU}$ was consistently much closer to the expected values than $\chi^2_{BR}$, and should be used as the test statistic in these hypothesis testing situations.

The percentiles of the p-values of the $\chi^2_{BR}$ and $\chi^2_{BRU}$ statistics (Tables 4.6.2, 4.6.4 and 4.6.6) show that under the null hypothesis too many significant results will be produced for $\alpha = 0.10$ or lower. The amount of excess was definitely less for the $\chi^2_{BRU}$ statistic under all censoring regimens in all three simulation studies, and was consistent across these three simulations. In order to obtain fairly accurate $\alpha$-levels under the null hypothesis in these hypothesis testing situations, a conservative procedure is to carry out the test with the $\chi^2_{BRU}$ statistic at the $\alpha$-level implied by the average entry in the column of Tables 4.6.2, 4.6.4 and 4.6.6 headed by the desired $(1-\alpha)$ percentile. A less conservative approach would of course be to run tests at selected $\alpha$-levels, and be cognizant of the fact that marginally significant results should be viewed with caution.
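One reading of the conservative procedure, sketched below with the $\chi^2_{BRU}$ rows of Table 4.6.2: for a desired true level of 0.05, average the 95th-percentile column, reject only when the observed $\Pr(\chi^2_4 \le \chi^2_{BRU})$ exceeds that average (i.e., test at the nominal level one minus the average), and, equivalently, compare $\chi^2_{BRU}$ with the matching $\chi^2_4$ quantile. This interpretation and the helper names are assumptions, not the author's code.

```python
import math

# 95th-percentile column of Pr(chi2_4 <= chi2_BRU) in Table 4.6.2
# (rho12 = rho34 = 0), one entry per censoring regimen, in table order.
P95_BRU_TABLE_4_6_2 = [0.971, 0.966, 0.969, 0.964, 0.970,
                       0.970, 0.974, 0.969, 0.970, 0.968]

def chi2_cdf_4df(x):
    """Pr(chi2_4 <= x) = 1 - exp(-x/2) * (1 + x/2)."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0) if x > 0 else 0.0

def chi2_quantile_4df(p, lo=0.0, hi=100.0, iters=200):
    """Invert chi2_cdf_4df by bisection (the CDF is increasing in x)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_4df(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

avg = sum(P95_BRU_TABLE_4_6_2) / len(P95_BRU_TABLE_4_6_2)
nominal_alpha = 1.0 - avg                 # nominal level giving a true level near 0.05
critical = chi2_quantile_4df(avg)         # reject H0 when chi2_BRU exceeds this
usual_critical = chi2_quantile_4df(0.95)  # textbook 5% critical value for 4 df
print(round(avg, 4), round(nominal_alpha, 4), round(critical, 2), round(usual_critical, 2))
```

With these entries the average is 0.9691, so the test would be run at a nominal level of about 0.031, and the critical value rises from about 9.49 to about 10.6.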
Chapter 5
APPLICATION OF RESULTS TO ENVIRONMENTAL DATA SETS

This chapter demonstrates the usefulness of the theory developed in Chapters 1 through 4 through direct application to the analysis of three different data sets collected by the U.S. Environmental Protection Agency (EPA). All three data sets deal with the detection and measurement of very low levels of selected chemicals in human tissues, these chemicals being polychlorinated organics in one data set and trace elements in the other two. In each data set, the censoring value is determined by the minimum detectable level (MDL) of the chemical under study. Since a larger tissue sample contains more of the chemical, and the tissue is ashed and concentrated prior to analysis, the "wet weight" tissue concentration MDL is directly dependent upon the original tissue sample weight: the larger the sample, the lower the concentration MDL. This tissue sample weight could vary considerably due to many conditions at the time of collection. Hence, here we are presented with random Type I left censoring.
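The censoring mechanism described above can be sketched in a few lines. Each sample carries its own concentration MDL, modeled here as a fixed absolute detection limit divided by the sample weight, and a value below its own MDL is recorded only as "below that MDL". The detection limit, weight range and log-normal parameters are hypothetical values chosen for illustration, not figures from the EPA studies.

```python
import random

def simulate_random_left_censoring(n, mu_log, sigma_log, abs_limit,
                                   weight_range, seed=0):
    """Return (recorded_value, censored, mdl) triples.

    Hypothetical model: concentration MDL = abs_limit / sample_weight, so
    heavier samples have lower MDLs. Values below their own MDL are censored
    and recorded at that MDL (random Type I left censoring)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        weight = rng.uniform(*weight_range)           # weight varies at collection
        mdl = abs_limit / weight                      # per-sample censoring point
        conc = rng.lognormvariate(mu_log, sigma_log)  # true level, unseen if censored
        if conc < mdl:
            records.append((mdl, True, mdl))
        else:
            records.append((conc, False, mdl))
    return records

data = simulate_random_left_censoring(200, mu_log=0.0, sigma_log=1.0,
                                      abs_limit=0.5, weight_range=(0.5, 2.0), seed=1)
n_censored = sum(c for _, c, _ in data)
print(n_censored, "of", len(data), "censored")
```

Because the censoring point changes from record to record but is always known, each censored record contributes its own left-tail probability to the likelihood.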
5.1 Tissue levels of 10 selected trace elements in maternal venous blood, cord blood, and placenta

The original investigation (Creason et al., 1976) from which this data is taken was aimed at gaining information on the levels of trace elements present in selected maternal-fetal tissues, and on the interrelationships of these levels. The study rationale and materials and methods of collection are fully described in that study, and Tables 1.1.2 through 1.1.4 were drawn directly from it.
Previous investigations have indicated that a log-normal distribution would provide a good fit to the tissue trace element concentrations. Hence, logs of the data were taken prior to analysis. Ten trace elements were chosen from that data set for demonstration purposes: Lead (Pb), Cadmium (Cd), Mercury (Hg), Lithium (Li), Selenium (Se), Boron (B), Chromium (Cr), Nickel (Ni), Tin (Sn) and Manganese (Mn). Using the techniques described in Section 3.3 for a multivariate normal distribution with randomly censored data, the mean vector and the variance-covariance matrix of the logs of the ten trace elements were estimated. Only observation vectors with measurements on every trace element were included in the analysis, resulting in the sample sizes indicated in Tables 5.1.1 through 5.1.3. In applying the multivariate technique one must be concerned as to whether more than two variables in an observation vector are censored. For the maternal-fetal tissues, two observation vectors of the 159 measured had three tissues censored for maternal blood; five observation vectors of the 156 measured for cord blood had three tissues censored; and eleven observation vectors of the 141 measured for placenta had three or more tissues censored. In these cases, the corrections to the covariance terms were included for the first two censored variables encountered in the vector, with the third censored variable only being corrected in its variance term. This procedure should produce very little bias due to the relatively rare occurrence of more than bivariate censoring in the ten variables measured.

The estimation procedure converged quite rapidly for these three tissues, requiring eight iterations for maternal blood, six for cord blood and fifteen for placenta. The maximum likelihood estimates obtained for the parameters of interest are shown in Tables 5.1.1 through 5.1.3.
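Tables 5.1.1 through 5.1.3 report each element on the original scale. Assuming, as is standard and consistent with this chapter's geometric means being exponentials of mean logs, that GM = exp(mean of the logs) and GSD = exp(standard deviation of the logs), the last two columns are the back-transformed limits of one standard deviation on either side of the mean log, an interval that covers about 68% of a log-normal population. The sketch below rebuilds the maternal-blood lead row from its rounded GM and GSD; the function name is hypothetical.

```python
import math

def geometric_summary(mean_log, sd_log):
    """GM, GSD and the GM/GSD, GMxGSD limits from natural-log-scale estimates.
    The limits are exp(mean_log -/+ sd_log)."""
    gm = math.exp(mean_log)
    gsd = math.exp(sd_log)
    return {"GM": gm, "GSD": gsd, "GM/GSD": gm / gsd, "GMxGSD": gm * gsd}

# Maternal-blood lead (Table 5.1.1), back-transformed from the rounded GM and GSD.
pb = geometric_summary(math.log(26.34), math.log(1.72))
print({k: round(v, 2) for k, v in pb.items()})
```

Starting from the rounded GM and GSD gives 15.31 and 45.30, within rounding of the tabulated 15.33 and 45.25, which were presumably computed from unrounded estimates.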
Table 5.1.1
Maximum Likelihood Estimates of Trace Element
Levels in Maternal Blood (N=156)
(μg/100 ml)

           Percent      Geometric   Geometric
Element    Censored^a   Mean (GM)   Std Dev (GSD)   GM/GSD   GM×GSD
Pb          0.0         26.34       1.72            15.33    45.25
Cd          3.2          1.58       3.23             0.49     5.11
Hg         26.3          0.31       5.00             0.06     1.50
Li          1.9          0.38       2.45             0.15     0.93
Se          0.0         10.88       1.44             7.54    15.70
B           5.8          7.97       2.04             3.91    16.26
Cr          0.6          6.87       2.49             2.75    17.13
Ni         14.7          3.38       3.33             1.01    11.26
Sn         11.5          3.66       1.95             1.88     7.13
Mn          6.4          2.56       2.02             1.26     5.18

^a The censoring point is different for each trace element, and can vary from sample to sample for a given element as a result of varying amounts of blood collected for analysis. The actual censoring points and numbers censored at those points are:
Cd: 0.100(4), 0.200(1)
Hg: 0.100(37), 0.150(1)
Li: 0.050(3)
B: 1.000(1), 2.000(2)
Cr: 2.10(1)
Ni: 0.700(1), 0.800(1), 0.900(2), 1.000(15), 1.100(4)
Sn: 1.700(1), 2.00(12), 2.100(1), 2.200(2), 2.300(1), 2.700(1)
Mn: 0.600(1), 0.890(1), 0.900(1), 0.950(1), 1.000(5), 1.300(1)
Table 5.1.2
Maximum Likelihood Estimates of Trace Element
Levels in Cord Blood (N=159)
(μg/100 ml)

           Percent      Geometric   Geometric
Element    Censored^a   Mean (GM)   Std Dev (GSD)   GM/GSD   GM×GSD
Pb          0.0         27.46       1.79            15.36    49.08
Cd          1.9          1.77       3.07             0.58     5.43
Hg         22.0          0.48       5.09             0.09     2.43
Li          0.6          0.55       2.52             0.22     1.38
Se          0.0         12.03       1.45             8.31    17.41
B           0.0         10.45       1.92             5.45    20.03
Cr          0.0          7.67       2.09             3.66    16.05
Ni          5.7          4.27       3.02             1.42    12.89
Sn          4.4          4.31       1.93             2.23     8.33
Mn          0.6          3.52       1.79             1.97     6.29

^a The censoring point is different for each trace element, and can vary from sample to sample for a given element as a result of varying amounts of blood collected for analysis. The actual censoring points and numbers censored at those points are:
Cd: 1.00(2), 2.00(1)
Hg: 1.000(33), 1.100(2)
Li: 0.050(1)
Ni: 0.700(1), 0.900(1), 1.000(4), 1.100(2), 1.400(1)
Sn: 1.000(1), 1.900(1), 2.000(3), 2.200(1), 2.300(1)
Mn: 1.000(1)
Table 5.1.3
Maximum Likelihood Estimates of Trace Element
Levels in Placenta (N=141)
(μg/100 g)

           Percent      Geometric   Geometric
Element    Censored^a   Mean (GM)   Std Dev (GSD)   GM/GSD   GM×GSD
Pb          0.0         31.61       1.73            18.26    54.74
Cd          0.0          3.81       1.81             2.10     6.91
Hg          9.9          0.58       5.11             0.11     2.97
Li          0.0          0.46       2.38             0.19     1.09
Se          0.0         14.05       1.32            10.65    19.52
B          22.7          5.15       2.42             2.13    12.46
Cr          5.0          4.75       2.20             2.16    10.45
Ni         46.1          1.46       3.52             0.42     5.13
Sn         29.8          3.42       2.26             1.52     7.71
Mn          0.7          7.17       2.14             3.35    15.33

^a The censoring point is different for each trace element, and can vary from sample to sample for a given element as a result of varying amounts of tissue collected for analysis. The actual censoring points and numbers censored at those points are:
Hg: 1.000(14)
B: 2.00(13), 3.000(10), 3.500(1), 4.000(5), 5.000(1), 5.500(1), 6.000(1)
Cr: 1.000(3), 2.000(4)
Ni: 0.800(1), 1.000(40), 2.000(20), 2.330(1), 3.000(2)
Sn: 2.000(24), 2.330(1), 3.000(11), 4.000(4), 5.000(2)
Mn: 2.000(1)
Table 5.1.4
Covariance Matrix and Correlation Matrix ML Estimates for Maternal
Blood (8 censored), Cord Blood (0 censored), and
Placenta (33 censored) Boron Levels (N=159)

Variance-Covariance Estimates
                  Maternal Blood   Cord Blood   Placenta
Maternal Blood        0.5669
Cord Blood            0.0692         0.4521
Placenta             -0.1005         0.0411      0.7949

Correlation Estimates
                  Maternal Blood   Cord Blood   Placenta
Maternal Blood        1
Cord Blood            0.137          1
Placenta             -0.150          0.069       1
Another aspect of this data set to be examined was the interrelationship of trace element levels in different tissues. As an example, consider the measurement of Boron in maternal blood, cord blood, and placenta. Maternal blood and placenta measurements involve censored values, while there are no censored values in cord blood. The results of applying the estimation procedure for the variance-covariance matrix, and hence for the correlation matrix, of the logs of these measurements are given in Table 5.1.4. Only four iterations were required for convergence. The type of situation demonstrated by Boron levels in these tissues, wherein levels are below the MDL in one or two tissues but not in others, has occurred with great frequency in similar studies carried out by EPA scientists. Hence a definite need for the development of this estimation procedure has been demonstrated.
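The correlation estimates in Table 5.1.4 follow from the estimated covariance matrix through $r_{jk} = s_{jk} / \sqrt{s_{jj} s_{kk}}$. The short sketch below, with a hypothetical function name, reproduces the tabulated correlations from the tabulated Boron covariances.

```python
import math

def cov_to_corr(cov):
    """Correlation matrix r_jk = s_jk / sqrt(s_jj * s_kk) from a covariance matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

# Boron (logs), Table 5.1.4; order: maternal blood, cord blood, placenta.
boron_cov = [[0.5669, 0.0692, -0.1005],
             [0.0692, 0.4521, 0.0411],
             [-0.1005, 0.0411, 0.7949]]
boron_corr = cov_to_corr(boron_cov)
for row in boron_corr:
    print([round(r, 3) for r in row])
```

The off-diagonal results round to 0.137, -0.150 and 0.069, matching the table.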
5.2 Polychlorinated hydrocarbons and polychlorinated biphenyls in human plasma

This data set is drawn from Finklea et al. (1972), and is the source of Table 1.1.1. In that particular paper, a percentile transform was taken and this transformed data was then analyzed, with censored data replaced by one-half their cumulative frequency. In the analysis to be carried out here, logs of the data are taken, and the censored estimation procedures and likelihood ratio hypothesis testing procedures detailed earlier are applied.

First, for each of the six residues measured (op'DDT, DDD, DDE, pp'DDT, Dieldrin, and PCB), the geometric means were estimated for each of four race-residence groups (Table 5.2.1) using the univariate k-sample iterative techniques developed in Section 2.2. The censoring points
Table 5.2.1
Maximum Likelihood Estimates of the Means and Variances of the Logs
of Six Plasma Residues by Four Race-Residence Groups
Residue
and
Group
op'DDT
Rural
Urban
Urban
Rural
DDD
Rural
Urban
Urban
Rural
DDE
Rural
Urban
Urban
Rural
Number
of
Observations
Number
Censored
Mean
of
Logs
29
black
black
white
white
139
175
199
210
124
102
178
black
black
white
white
139
175
199
210
130
101
66
136
black
black
white
white
139
175
199
210
0
2
black
black
white
white
139
175
199
210
0
2
2
13
Common
Variance
of Logs
24
0.03
0.14
0.12
0.04
2.156
-3.942
-1.779
-1. 570
-2.095
2
0.02
0.17
0.21
0.12
0.417
2.082
1.599
0.841
1.110
0
0
Geometric
Means
(ppb)
2.404
-3.498
-1.960
-2.145
-3.278
106
3
pp'DDT
Rural
Urban
Urban
Rural
Iterations
for
Convergence
8.02
4.95
2.32
3.03
0.581
1.948
1.385
0.298
0.290
7.01
3.99
1.35
1.34
Table 5.2.1 (Continued)
Number
of
Observations
Number
Censored
black
black
white
white
139
175
199
210
98
122
36
78
Black
black
white
white
107
151
166
192
102
94
Residue
and
Group
Dieldrin
Rural
Urban
Urban
Rural
PCB
Rural
Urban
Urban
Rural
Iterations
for
Convergence
Mean
of
Logs
Common
Variance
of Logs
-2.500
-2.585
-1.314
-1.499
26
0.08
0.08
0.27
0.22
2.049
-2.169
-0.347
0.153
0.526
74
(ppb)
1.923
12
77
Geometric
Means
0.11
0.71
1.17
1.69
(MDLs) of the six residues are given in this study as 0.1 or 0.2 parts per billion (ppb) for all residues except PCB, and as 1.0 ppb for PCB. The number of each residue censored in each group is shown in Table 5.2.1 as well. The high degree of censoring of some residues is quite obvious from this table.
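For a single residue, the k-sample problem behind Table 5.2.1 (a mean of the logs for each group, one common variance, and left censoring at known per-observation points) can be sketched as an EM-style iteration. Each censored log value is replaced by its conditional first and second moments under the current estimates, and the parameters are then re-estimated. Orchard and Woodbury's Missing Information Principle is closely related to what is now usually called the EM algorithm, but the code below is a hedged illustration, not the author's Section 2.2 algorithm. It uses the maximum-likelihood divisor N for the variance and an arbitrarily chosen convergence tolerance, and all names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for pdf and cdf

def censored_moments(mu, sigma, c):
    """E[X] and E[X^2] for X ~ N(mu, sigma^2) conditional on X < c."""
    z = (c - mu) / sigma
    mills = _STD.pdf(z) / _STD.cdf(z)        # lower-tail inverse Mills ratio
    m1 = mu - sigma * mills
    var = sigma ** 2 * (1.0 - z * mills - mills ** 2)
    return m1, var + m1 ** 2

def em_k_sample(groups, tol=1e-8, max_iter=1000):
    """groups: one list per group of (log_value, censored) pairs; for a censored
    pair, log_value is the log of its censoring point (MDL).
    Returns (group means, common variance, iterations used)."""
    n_total = sum(len(g) for g in groups)
    # Start by treating censoring points as if they were observed values.
    mus = [sum(v for v, _ in g) / len(g) for g in groups]
    var = sum((v - mus[k]) ** 2 for k, g in enumerate(groups) for v, _ in g) / n_total
    for it in range(1, max_iter + 1):
        sigma = math.sqrt(var)
        new_mus, ss = [], 0.0
        for k, g in enumerate(groups):
            s1 = s2 = 0.0
            for v, censored in g:
                m1, m2 = censored_moments(mus[k], sigma, v) if censored else (v, v * v)
                s1 += m1
                s2 += m2
            mu_k = s1 / len(g)
            new_mus.append(mu_k)
            ss += s2 - len(g) * mu_k ** 2    # expected sum of squared deviations
        new_var = ss / n_total
        converged = (max(abs(a - b) for a, b in zip(new_mus, mus)) < tol
                     and abs(new_var - var) < tol)
        mus, var = new_mus, new_var
        if converged:
            return mus, var, it
    return mus, var, max_iter

# Simulated check (hypothetical data): two groups with true means 0 and 1,
# standard deviation 1, and every value below the log-MDL of -0.5 censored.
rng = random.Random(3)
groups = []
for true_mu in (0.0, 1.0):
    g = []
    for _ in range(3000):
        x = rng.gauss(true_mu, 1.0)
        g.append((-0.5, True) if x < -0.5 else (x, False))
    groups.append(g)
means, common_var, iterations = em_k_sample(groups)
print([round(m, 3) for m in means], round(common_var, 3), iterations)
```

With no censored values the iteration stops after one pass at the ordinary group means and pooled ML variance. Geometric means such as those in Table 5.2.1 would then be the exponentials of the estimated group means.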
The correlation matrix of the log-transformed residues can be estimated using the iterative method of Chapter 3. However, as in the example of Section 5.1, when more than two variables in an observation vector are censored, biases in the estimation procedure are introduced. In this data set, more than two variables are censored in over 250 of the 723 observation vectors due to the relatively high degree of censoring occurring on many of the residues. With this caution well in mind, the estimation procedure was applied to this heavily multiply censored data set. When more than two variables were censored, corrections to the covariance term of the first two censored variables encountered in the vector were made. The variables were originally ordered so that the most heavily censored variables were first in the data vector. Forty-six iterations were required for convergence in this case, with the resultant correlation matrix as given in Table 5.2.2. The correlations shown were quite close to those obtained using the percentile transform as given in the original paper by Finklea et al.

As a final step in the re-analysis of this data set, a linear model was fit to each residue, the model being

$$\log(\text{residue}) = \beta_0 + \beta_1(\text{age}) + \beta_2(\text{sex}) + \beta_3(\text{groups}) + \text{error}$$

The group variable was also broken down into its three individual degrees of freedom as a Race Effect, Residence Effect, and Race-Residence
Table 5.2.2
a
Maximum Likelihood Estimates of the Correlation Matrix
for Plasma Chlorinated Hydrocarbon Pesticides
(Log Transformation)
pp'DDT
DDE
op'DDT
DDD
Dieldrin
PCB
a
pp'DDT
N=723
DDE
N=723
op'DDT
N=723
1
0.767
0.150
1
DDD
N=723
Dieldrin
N=723
PCB
N=616
-0.145
-0.160
-0.243
0.060
-0.047
-0.244
-0.264
1
0.706
0.213
0.041
1
0.276
0.099
1
0.146
1
These estimates should be regarded with caution due to the high degree of
multiple censoring occurring throughout the observation vectors.
Table 5.2.3
Hypothesis Tests of Effects of Demographic Variables on Selected Plasma
Chlorinated Hydrocarbon Residues Using Likelihood Ratio Statistics
Effect
d. f.
pp'DDT
2
X
P
X
DDE
2
op'DDT
2
P
X
I
II
P
X
a
Age
1
10.9
0.001
37.0 <0.001
0.8
NS
Sex
1
13.0
<0.001
35.0 <0.001
3.5
NS
3
445.9
<0.001
331.9 <0.001
73.4
<0.001
Groups
Race
1
433.9
<0.001
299.0 <0.001
,I
0.01
NS
I
Res
Race-Res
Total
1
26.1
68.9 <0.001
28.9
<0.001
9.1 <0.003
452.8
<0.001
358.0 <0.001
1
5
<0.001
!
:
DDD
2
73.0
<0.001
1.5
NS
81.0
<0.001
Dieldrin
2
X
P
.•
\
P
PCB
2
X
P
2.4
NS
3.9
0.05
1. 31
NS
0.9
NS
1.2
NS
0.01
NS
116.7<0.001
l
I
91. 7 <0.001
121.01 <0.001
50.1 <0.001
86.8
<0.001
105.50 <0.001
90.2 <0.001
0.1
NS
22.90 <0.001
19.1 <0.001
2.0
NS
50.20 <0.001
I
125.5 <0.001
101. 7 <0.001
:
126.20 <0.001
I
d
NS signifies p
>
0.05
Percent at Each Residue Below Minimum Detectable:
Residue
Percent Censored
pp'DDT
DDE
op'DDT
DDD
Dieldrin
PCB
2.4
0.3
70.5
59.9
46.2
56.3
Interaction. By fitting complete and incomplete models and computing their likelihoods as described in Chapter 4, likelihood ratio statistics can be computed to test for the effect due to each parameter involved, the likelihoods being computed as demonstrated in Section 4.6 for censored data.
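The likelihood ratio statistics of Table 5.2.3 are referred to $\chi^2$ distributions with 1, 3 or 5 degrees of freedom. For integer degrees of freedom the upper-tail probability has a closed form (a Poisson sum for even df, and erfc plus a finite series for odd df), so the reported p-values can be checked with the Python standard library alone. The sketch below is an illustrative check with a hypothetical function name, not the software used in the original analysis.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability Pr(chi2_df > x) for integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i              # (x/2)^i / i!
            total += term
        return math.exp(-half) * total
    m = (df - 1) // 2
    term, acc = 1.0, 0.0
    for i in range(1, m + 1):
        if i > 1:
            term *= x / (2 * i - 1)       # x^(i-1) / (1*3*...*(2i-1))
        acc += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * acc)

# Statistics quoted in Table 5.2.3, with their degrees of freedom.
for stat, df in [(10.9, 1), (3.9, 1), (73.4, 3)]:
    print(stat, df, f"{chi2_sf(stat, df):.4f}")
```

For example, 10.9 on 1 degree of freedom gives about 0.0010 and 3.9 gives about 0.048, consistent with the 0.001 and 0.05 entries in the table.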
The results of these tests are given in Table 5.2.3.

5.3 HANES Data: trace elements in scalp hair and blood
In a co-operative effort carried out in 1972, EPA joined with HEW in collecting trace element data during an on-going survey of randomly selected participants from two communities, one being New York City and the other a small community in Washington State. This on-going survey is called the Health and Nutrition Examination Survey, or HANES. By using the facilities available through HANES, EPA obtained scalp hair samples and some blood samples for trace element analysis, and through a questionnaire also obtained some covariate information. This pilot study resulted in complete information on 168 scalp hair samples and 102 blood samples, with approximately 38 subjects donating both blood and hair for analysis. One set of objectives of the study was to determine whether pollutant burdens as measured by scalp hair (a long-term indicator) and blood (a short-term indicator) varied between an urban and a rural setting, and whether the scalp hair and blood trace element levels were related. The overall levels of these trace elements were also of interest. The trace elements under study were Lead (Pb), Cadmium (Cd), Mercury (Hg), Arsenic (As), Selenium (Se), Chromium (Cr), Silver (Ag), Vanadium (V), Tin (Sn), and Manganese (Mn). In scalp hair, Arsenic is the only element censored to any degree (29 of 168 below MDL), while in blood, 6 of the 10 trace elements had 15 or more of the 102 samples below MDL, with Vanadium having the maximum
Table 5.3.1
Maximum Likelihood Estimates of Means and Covariance Matrix
of Logs of HANES Scalp Hair Trace Element Data (N=168)

Scalp Hair
Element                   Pb      Cd      Hg      As      Se      Cr      Ag      V       Sn      Mn
Mean of Logs             2.661  -0.034  -0.024  -3.171  -0.828  -0.689  -1.715  -1.989  -0.620  -0.147
Geometric Means (μg/g)  14.300   0.970   0.980   0.040   0.440   0.500   0.180   0.140   0.540   0.860
Number Censored          0       0       1      29       1       0       0       3       0       0
Covariance Matrix of Logs of Scalp Hair Trace Elements
Pb
0.875
Cd
Hg
As
Se
Cr
0.398
0.395
0.441
0.123
0.469
0.585
0.119
0.313
0.184
0.725
0.258
0.367
0.048
0.129
0.313
0.053
-0.106
0.130
1.176
0.258
0.086
0.424
0.370
-0.054
0.267
0.116
3.214
0.145
0.665
0.729
0.218
0.120
0.257
0.749
0.214
0.002
-0.026
0.025
0.041
2.305
1.203
1.062
1.155
0.902
2.873
0.835
0.929
0.790
2.075
0.740
0.750
1. 519
0.562
Ag
V
Sn
1.077
~h1
Censoring Points for Hg:
As:
Se:
V:
0.050
0.002. 0.003(18). 0.005(2), 0.010(6), 0.012. 0.014
0.038
0.002. 0.004(2)
Table 5.3.2
HANES Study: Regression Analysis of Logs of Scalp Hair
Arsenic Data* (N=199, Number Left-Censored = 31)

                  Means     Variances   Coefficients
Sex              -0.0653      0.9957       0.1554
Linear Age        3.5201      5.4807      -0.9506
Quadratic Age    17.8718    316.0151       0.1255
City              0.0754      0.9943       0.3234
As               -2.9569      3.2796

Sources of Variation     χ²          d.f.   p value
Sex                       1.47        1      NS
Total Age                18.20        2      0.001
   linear                17.95        1      0.001
   quadratic             18.03        1      0.001
City                     [illegible]  1      0.01
Overall                  27.90        4      0.001

*Sex was coded 1 = male, -1 = female; city was coded 1 = New York City,
-1 = Moses Lake, Washington; "Age" = age in years/10 and quadratic
age = ("Age")².
Table 5.3.3
Maximum Likelihood Estimates of Means and Covariance Matrix
of Logs of HANES Blood Trace Element Data (N=102)

Blood   Mean of Logs   Geometric Mean (µg/100 ml)   Number Left-Censored
Pb       -1.2429          28.9                          0
Cd       -4.2798           1.4                          1
Hg       -5.5330           0.4                         17
As       -7.3508           0.06                        29
Se       -2.0728          12.6                          0
Cr       -2.5985           7.4                          0
Ag       -5.8408           0.3                         30
V        -4.8766           0.8                         67
Sn       -3.4130           3.3                         22
Mn       -3.6175           2.7                         15
Covariance Matrix of Logs of Blood Trace Elements
Pb
0.2189
Cd
Hg
As
0.1031
0.0239
0.0147
-0.0038
0.0054
0.0911
0.0180
-0.0216
0.1143
1. 3633
-0.1366
0.4018
0.0278
-0.0286
-0.1412
0.1060
-0.0283
0.1843
1.4293
-0.4215
0.0880
0.098
0.1054
-0.0312
-0.1085
-0.0586
4.2669
0.0140
-0.411
-0.0941
0.0129
0.2590
0.3367
0.0014
0.0490
0.0129
0.0140
-0.0464
0.6651
0.1218
-0.0368
-0.0008
0.2670
1.4195
-0.0400
0.0487
0.1044
0.3354
0.0879
0.0677
0.4761
0.1086
0.1506
Se
Cr
Ag
V
Sn
Mn
0.8148
Censoring Points (in µg/100 ml)
Cd: 0.100
Hg: 0.100(15), 0.200(2)
As: 0.020(28), 0.100
Ag: 0.080, 0.090(7), 0.100(20), 0.200, 1.000
V:  0.800(5), 0.900(20), 1.000(39), 1.100, 1.300, 2.000
Sn: 1.700(2), 1.800, 1.900, 2.000(18)
Mn: 0.900(5), 1.000(10)
Table 5.3.4
HANES STUDY: Regression Analysis of Logs
of Blood Trace Element Data

MERCURY (16 left-censored out of 104)
             Means     Variances   Coefficients
Sex        -0.2115      0.9553       0.0464
Age(l)      2.8654      4.7088       0.3224
Age(q)     12.9192    251.0843      -0.0580
City       -0.5384      0.7101      -0.2974
ln(Hg)     -5.5731      1.3529

Source        χ²     d.f.   p value
Sex           0.09    1      NS
Age(l)        2.55    1      NS
Age(q)        4.32    1      .038
Total Age     6.73    2      .035
City          5.02    1      .025
Overall      13.52    4      <.01
ARSENIC (38 left-censored out of 110)
             Means     Variances   Coefficients
Sex        -0.2182      0.9524      -0.1193
Age(l)      2.8609      4.7257       0.2091
Age(q)     12.9105    250.7066      -0.0141
City       -0.5273      0.7220       1.1519
ln(As)     -7.6174      4.2310

Source        χ²     d.f.   p value
Sex           0.26    1      NS
Age(l)        0.29    1      NS
Age(q)        0.02    1      NS
Total Age     1.61    2      NS
City         26.58    1      <.001
Overall      31.08    4      <.001
SILVER (34 left-censored out of 112)
             Means     Variances   Coefficients
Sex        -0.2321      0.9461      -0.0871
Age(l)      2.8732      4.7400      -0.1453
Age(q)     12.9954    251.1833       0.0060
City       -0.5357      0.7130      -0.1943
ln(Ag)     -5.8725      1.6470

Source        χ²     d.f.   p value
Sex           0.37    1      NS
Age(l)        0.34    1      NS
Age(q)        0.05    1      NS
Total Age     3.11    2      NS
City          1.58    1      NS
Overall       5.76    4      NS
VANADIUM (73 left-censored out of 109)
             Means     Variances   Coefficients
Sex        -0.2477      0.9386      -0.0220
Age(l)      2.9184      4.7061       0.0907
Age(q)     13.2228    252.0942      -0.0044
City       -0.5413      0.7070       0.3115
ln(V)      -4.9548      0.5015

Source        χ²     d.f.   p value
Sex           0.17    1      NS
Age(l)        0.12    1      NS
Age(q)        0.20    1      NS
Total Age     2.18    2      NS
City         11.34    1      <.001
Overall      15.44    4      <.01
Table 5.3.4 (cont)

TIN (23 left-censored out of 110)
             Means     Variances   Coefficients
Sex        -0.2727      0.9256       0.0664
Age(l)      2.8564      4.6623       0.0529
Age(q)     12.8211    248.6282       0.0047
City       -0.5455      0.7025      -0.157
ln(Sn)     -3.4610      0.4861

Source        χ²     d.f.   p value
Sex           0.82    1      NS
Age(l)        0.12    1      NS
Age(q)        0.01    1      NS
Total Age     7.59    2      .022
City          0.04    1      NS
Overall       8.61    4      NS
MANGANESE (16 left-censored out of 113)
             Means     Variances   Coefficients
Sex        -0.2389      0.9429       0.0910
Age(l)      2.8841      4.7113      -0.0299
Age(q)     13.0291    249.0881       0.0054
City       -0.5398      0.7086       0.3280
ln(Mn)     -3.5920      0.8669

Source        χ²     d.f.   p value
Sex           0.97    1      NS
Age(l)        0.02    1      NS
Age(q)        0.01    1      NS
Total Age     0.01    2      NS
City         10.08    1      <.01
Overall      11.64    4      .020
Table 5.3.5
Bivariate Multiple Linear Regression Analysis of HANES Log Scalp Hair Arsenic and
Log Blood Arsenic Data (5 Scalp Hair and 14 Bloods Left-Censored out of 38)

                                          Coefficients
                Means     Variances   Scalp Hair    Blood
Sex             0.2105      0.9557      0.1198     -0.1452
Age(l)          3.6395      5.4534     -0.2736      0.3034
Age(q)         18.6992    318.2345      0.0456     -0.0251
City           -0.6316      0.6011      0.4842      1.5281
ln(Hair As)    -3.1074      2.5902
ln(Bld As)     -7.6127      4.0675

Source        χ²     d.f.   p value
Sex           0.38    2      NS
Age(l)        0.38    2      NS
Age(q)        0.26    2      NS
Total Age     1.08    4      NS
City         14.30    2      <.001
Overall      20.78    8      <.01
Table 5.3.6
Maximum Likelihood Estimates of Log Scalp Hair to Log
Blood Correlations in HANES Trace Element Data

       Number of      Number of Scalp Hair   Number of Bloods
       Observations   Left-Censored          Left-Censored      Correlation
Hg         40               0                      7               0.0463
As         40               6                     15               0.2948
Ag         38               0                     10               0.1125
V          37               0                     26              -0.1350
Sn         35               0                      6              -0.0880
Mn         38               0                     11               0.3440
number (67) and Silver being the next most censored with 30.
Censoring
points were random, as in the maternal-fetal study.
First, let us examine the scalp hair. Means and variances of the
logs of the trace elements and their covariance matrix were estimated
using the iterative methods of Chapter 3. Results are shown in
Table 5.3.1. Four iterations were required for convergence of the
estimation procedure in this case.
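For a single trace element, the kind of iteration referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in this study: it assumes the log concentrations are normal, treats each censored value as known only to lie below its own detection limit (random censoring), and writes the iteration in EM form (Dempster, Laird, and Rubin, 1977), in which the sufficient statistics of each censored value are replaced by their conditional expectations under the current estimates, in the spirit of the Missing Information Principle. The function names (`censored_normal_em`, `norm_pdf`, `norm_cdf`) are hypothetical.

```python
import math
import random


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def censored_normal_em(observed, limits, tol=1e-10, max_iter=1000):
    """Iterative ML estimates of (mu, sigma^2) for a left-censored normal sample.

    observed -- values measured at or above their detection limits
    limits   -- one entry per censored value: the limit it is known to lie below
    Returns (mu, sigma^2, iterations used).
    """
    n = len(observed) + len(limits)
    # Starting values from the measured values alone.
    mu = sum(observed) / len(observed)
    var = sum((x - mu) ** 2 for x in observed) / len(observed)
    for it in range(max_iter):
        sd = math.sqrt(var)
        s1 = sum(observed)                 # sum of E[X] over the sample
        s2 = sum(x * x for x in observed)  # sum of E[X^2] over the sample
        for c in limits:
            z = (c - mu) / sd
            lam = norm_pdf(z) / norm_cdf(z)            # phi(z) / Phi(z)
            s1 += mu - sd * lam                        # E[X | X < c]
            s2 += mu * mu + var - sd * lam * (c + mu)  # E[X^2 | X < c]
        new_mu = s1 / n
        new_var = s2 / n - new_mu * new_mu
        if abs(new_mu - mu) < tol and abs(new_var - var) < tol:
            return new_mu, new_var, it + 1
        mu, var = new_mu, new_var
    return mu, var, max_iter


# Hypothetical check on simulated data: N(0, 1) values, each with a detection
# limit drawn at random from three levels (random censoring).
random.seed(1)
observed, limits = [], []
for _ in range(20000):
    x = random.gauss(0.0, 1.0)
    c = random.choice([-1.0, -0.5, 0.0])
    if x < c:
        limits.append(c)
    else:
        observed.append(x)
mu_hat, var_hat, iters = censored_normal_em(observed, limits)
print(f"{len(limits)} censored; mu = {mu_hat:.3f}, sigma^2 = {var_hat:.3f}, {iters} iterations")
```

With no censored values the first pass simply returns the sample mean and the ML (divisor n) variance; each censored value contributes the truncated-normal moments E[X | X < c] = μ − σλ and E[X² | X < c] = μ² + σ² − σλ(c + μ), where λ = φ(z)/Φ(z).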
One may also look at each scalp hair trace element separately, in a linear
model relating each measurement to the covariates of interest (here, linear
age, quadratic age, sex, and city effects). Table 5.3.2 gives the results for
a linear model fit to the Arsenic measurements, together with the tests of
significance outlined in Chapter 4.
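A regression with a censored response, as in Table 5.3.2, can be handled by the same device: at every pass each censored log concentration is replaced by its conditional mean and variance given that it lies below its detection limit, with the normal mean taken as x_i'β. The sketch below assumes normal errors with constant variance and known, observation-specific detection limits; it is an EM-style iteration offered for illustration and is not claimed to reproduce the Chapter 4 algorithm exactly. All names, including the small Gauss-Jordan solver used to avoid external dependencies, are hypothetical.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def solve(a, b):
    """Solve a small linear system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def censored_regression_em(X, y, censored, tol=1e-9, max_iter=2000):
    """EM-style ML fit of y = X beta + e, e ~ N(0, sigma^2), with left-censored y.

    X -- rows of covariates, each starting with 1.0 for the intercept
    y -- measured value, or the detection limit where censored[i] is True
    Returns (beta, sigma^2, iterations used).
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]

    def fit(resp):  # least squares coefficients for a filled-in response
        return solve(xtx, [sum(X[i][j] * resp[i] for i in range(n)) for j in range(p)])

    def mean(b, row):
        return sum(bj * xj for bj, xj in zip(b, row))

    beta = fit(y)  # start by treating the limits as if they were measurements
    var = sum((y[i] - mean(beta, X[i])) ** 2 for i in range(n)) / n
    for it in range(max_iter):
        sd = math.sqrt(var)
        ey, cond_var = [], 0.0
        for i in range(n):
            if censored[i]:
                m = mean(beta, X[i])
                z = (y[i] - m) / sd
                lam = norm_pdf(z) / norm_cdf(z)
                ey.append(m - sd * lam)                        # E[y_i | y_i < c_i]
                cond_var += var * (1.0 - z * lam - lam * lam)  # Var[y_i | y_i < c_i]
            else:
                ey.append(y[i])
        new_beta = fit(ey)
        rss = sum((ey[i] - mean(new_beta, X[i])) ** 2 for i in range(n))
        new_var = (rss + cond_var) / n
        done = abs(new_var - var) < tol and all(abs(u - w) < tol for u, w in zip(new_beta, beta))
        beta, var = new_beta, new_var
        if done:
            return beta, var, it + 1
    return beta, var, max_iter


# Hypothetical check: y = 0.5 + 1.0 x + N(0, 1) noise, limits drawn at random.
random.seed(2)
X, y, cens = [], [], []
for _ in range(4000):
    x = random.uniform(-1.0, 1.0)
    yi = 0.5 + 1.0 * x + random.gauss(0.0, 1.0)
    c = random.choice([0.0, 0.5])
    X.append([1.0, x])
    cens.append(yi < c)
    y.append(c if yi < c else yi)
beta_hat, var_hat, iters = censored_regression_em(X, y, cens)
print(f"{sum(cens)} censored; beta = {[round(b, 3) for b in beta_hat]}, sigma^2 = {var_hat:.3f}")
```

In an analysis shaped like Table 5.3.2, each row of X would hold 1, the Sex and City codes (±1), "Age" (years/10), and its square; the simulated data here serve only to check that the iteration recovers known parameters.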
The uncensored trace elements can
of course either be analyzed in the standard way or subjected to the
same likelihood ratio test as Arsenic.
The other scalp hair trace
elements were analyzed, but are not presented here, for the sake of
brevity.
The blood data can be analyzed in the same way as the scalp hair data.
The blood data, however, are subject to a good deal more censoring than
the scalp hair data. Because more than two variables are censored in some
observation vectors, only biased estimates are available for the
correlation coefficients. There were in fact 26 occurrences of more than
two variables censored in an observation vector among the 102 samples
available. Thirty iterations were required for convergence to the
estimates in this example.
Six trace elements were subject to extensive left censoring in
the blood data, as seen in Table 5.3.3. These six trace elements
(Mercury, Arsenic, Silver, Vanadium, Tin, and Manganese) were subjected
to a regression analysis of their log values, and the coefficients
were tested using the likelihood ratio statistics developed in
Chapter 4. Results of these analyses are detailed in Table 5.3.4.
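Each p value in Table 5.3.4 comes from referring a likelihood ratio statistic, −2 times the difference between the maximized log-likelihoods of the reduced and full models, to a χ² distribution whose degrees of freedom equal the number of parameters removed. The sketch below shows only that final step, assuming the usual large-sample χ² reference distribution; `chi2_sf` and `lr_test` are hypothetical helper names, and the tail probability uses the standard recurrence Q(ν + 2, x) = Q(ν, x) + (x/2)^(ν/2) e^(−x/2) / Γ(ν/2 + 1) so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from df = 1 (normal tail via erfc) or df = 2 (exponential tail).
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    # Step up two degrees of freedom at a time:
    # Q(nu + 2) = Q(nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q


def lr_test(loglik_full, loglik_reduced, df):
    """Likelihood ratio statistic -2 (l_reduced - l_full) and its chi-square p value."""
    stat = -2.0 * (loglik_reduced - loglik_full)
    return stat, chi2_sf(stat, df)


# The Mercury panel of Table 5.3.4: (source, chi-square, d.f.)
for source, stat, df in [("Sex", 0.09, 1), ("Age(l)", 2.55, 1), ("Age(q)", 4.32, 1),
                         ("Total Age", 6.73, 2), ("City", 5.02, 1), ("Overall", 13.52, 4)]:
    print(f"{source:10s} chi2 = {stat:6.2f}  d.f. = {df}  p = {chi2_sf(stat, df):.3f}")
```

Applied to the Mercury statistics, the computed tail probabilities agree with the printed NS, .038, .035, .025, and <.01 entries.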
The techniques worked well, converging quickly and giving
reasonable results on close examination.
If one considers the long term and short term pollutant burden
indicators (scalp hair and blood, respectively) together in looking for
differences due to covariates of interest, one is presented with a
bivariate multiple linear regression problem. For example, one may relate
(log scalp hair Arsenic, log blood Arsenic) pairs of trace element
measurements to age, sex, and city of residence, as in the univariate
analyses previously discussed. When the methods of Chapter 4 are applied
to the available data pairs, the results are as detailed in Table 5.3.5.
The only significant difference detected was between cities, a result
consistent with the two univariate analyses of log(Arsenic) and with the
observed city means for Arsenic.
Again, no problems were encountered in achieving convergence or obtaining
a χ² test of significance, either in this analysis or in any of the other
bivariate analyses run but not presented here.
Correlations of trace element concentrations across tissues are
also of interest in this study.
The maximum likelihood estimates of the
scalp hair to blood correlations of logs of the trace elements are given
in Table 5.3.6 for those trace elements subject to censoring in one or
both variables.
Rapid convergence was attained for all these covariance
estimates, including those with high degrees of censoring in both blood
and scalp hair trace element levels.
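Five of the six pairs in Table 5.3.6 (Hg, Ag, V, Sn, and Mn) have no left-censored scalp hair values, so only the blood member of each pair is censored. For that simpler case the correlation estimate can be sketched as below. This is a minimal EM-style illustration assuming a bivariate normal model for the logs; it does not cover pairs such as As, where both members are censored and moments of a truncated bivariate normal are needed. The function and variable names are hypothetical.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def bivariate_em_y_censored(xs, ys, censored, tol=1e-10, max_iter=2000):
    """EM-style ML estimates for a bivariate normal (X, Y) when only Y is left-censored.

    ys[i] is the measured Y, or its detection limit where censored[i] is True.
    Returns (mu_x, mu_y, var_x, var_y, cov_xy, iterations used).
    """
    n = len(xs)
    mx = sum(xs) / n
    vx = sum((x - mx) ** 2 for x in xs) / n  # X is complete, so these stay fixed
    my = sum(ys) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    for it in range(max_iter):
        b = cxy / vx                 # slope of the regression of Y on X
        s = math.sqrt(vy - b * cxy)  # residual SD of Y given X
        sy = syy = sxy = 0.0
        for x, y, cens in zip(xs, ys, censored):
            if cens:
                m = my + b * (x - mx)  # E[Y | X = x]
                z = (y - m) / s
                lam = norm_pdf(z) / norm_cdf(z)
                ey = m - s * lam                         # E[Y | X = x, Y < c]
                eyy = m * m + s * s - s * lam * (y + m)  # E[Y^2 | X = x, Y < c]
            else:
                ey, eyy = y, y * y
            sy += ey
            syy += eyy
            sxy += x * ey
        new_my = sy / n
        new_vy = syy / n - new_my ** 2
        new_cxy = sxy / n - mx * new_my
        if max(abs(new_my - my), abs(new_vy - vy), abs(new_cxy - cxy)) < tol:
            return mx, new_my, vx, new_vy, new_cxy, it + 1
        my, vy, cxy = new_my, new_vy, new_cxy
    return mx, my, vx, vy, cxy, max_iter


# Hypothetical check: correlation 0.5, Y censored at randomly chosen limits.
random.seed(3)
xs, ys, cens = [], [], []
for _ in range(10000):
    x = random.gauss(0.0, 1.0)
    y = 0.5 * x + math.sqrt(0.75) * random.gauss(0.0, 1.0)
    c = random.choice([-0.6, -0.3, 0.0])
    xs.append(x)
    cens.append(y < c)
    ys.append(c if y < c else y)
mx, my, vx, vy, cxy, iters = bivariate_em_y_censored(xs, ys, cens)
rho = cxy / math.sqrt(vx * vy)
print(f"{sum(cens)} of {len(ys)} censored; correlation = {rho:.3f}, {iters} iterations")
```

Because X is fully observed, its mean and variance are fixed at their sample values; only the Y moments and the cross-product E[XY] = x·E[Y | X = x, Y < c] change from one pass to the next.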
Chapter 6
SUMMARY AND CONCLUDING REMARKS
An iterative maximum likelihood procedure for estimation of the
parameters of a randomly censored univariate or bivariate normal distribution has been established, based upon Orchard and Woodbury's
"Missing Information Principle".
This procedure has been shown to be
immediately extendable to multivariate normal situations wherein no
more than two variables are censored in any one observation vector.
Using the same basic approach, a procedure for estimation of parameters
in univariate and bivariate k-sample and multiple linear regression
models is established using iterative "Restricted Information Principle" methods.
These procedures are shown to produce results in the
univariate case that are identical to those obtained using methods
established for very specific problems in the literature.
Simulation
studies are used to establish the effectiveness of the procedure in
the bivariate cases.
Sampling errors of the estimates are derived in
the univariate case, and are estimated through simulation studies in
the bivariate cases.
Likelihood ratio tests are developed for hypothesis testing in
the various models under random censorship conditions.
Simulation studies are again used to evaluate the behavior of these tests for
moderate sample sizes, as well as to evaluate the behavior of several
approximate tests suggested by earlier authors for specific applications.
The likelihood ratio tests are shown to produce significance levels
quite close to the desired one under the null hypothesis.
In addition, a simulation study of the power of the likelihood ratio
test under alternative hypotheses in the univariate one-sample case,
under varying censoring regimens, establishes that the likelihood ratio
test is clearly the best of the test procedures examined.
There was of course a loss of power with increased censoring in all of
the tests considered, although this loss was small in most cases.
These estimation and hypothesis testing procedures were applied to
randomly censored data obtained from the U.S. Environmental Protection
Agency and were found in every case to be effective, converging quickly
to maximum likelihood estimates and providing hypothesis testing procedures
that were easily applied using the computer programs developed
for that purpose.
Further investigation of the power of the likelihood ratio tests
under random censorship conditions would be enlightening, and could be
carried out using the simulation procedures established in this study.
In addition, the biases introduced when the procedures are used in the multivariate normal case with more than two variables censored in some observation vectors need to be explored further, and the best iterative
schemes to reduce this bias need to be established.
The "Restricted Information Principle" could also be applied to randomly
censored distributions other than the multivariate normal. Distributions
of interest must be established so that the iterative methods developed
here can be modified and applied to these alternative distributions.
The procedures developed in this work were motivated by one particular
problem: data sets consisting of trace element measurements with many
measurements below the minimum detectable level of the instrument being
used. However, it is immediately apparent that a great many other problems
involving life testing with random censorship, where the underlying
distributions of lifetimes (or their transformations) can be considered
normal, can also be solved using the methods in this work. These
applications can be investigated and enlarged upon as use of the
iterative methods expands.
LIST OF REFERENCES
Beale, E. and Little, R. (1975). Missing values in multivariate analysis. Journal of the Royal Statistical Society, Series B. 37: 129-145.
Blight, B.J.N. (1970). Estimation from a censored sample for the exponential family. Biometrika. 57: 389-395.
Buck, S.F. (1960). A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society, Series B. 22: 302-306.
Cohen, A.C. Jr. (1950). Estimating the mean and variance of normal populations from singly truncated and doubly truncated samples. Annals of Mathematical Statistics. 21: 557-569.
Cohen, A.C. Jr. and Woodward, J. (1953). Tables of Pearson-Lee-Fisher functions of singly truncated normal distributions. Biometrics. 9: 489-497.
Cohen, A.C. Jr. (1955). Restriction and selection in samples from bivariate normal distributions. Journal of the American Statistical Association. 50: 884-893.
Cohen, A.C. Jr. (1957). On the solution of estimating equations for truncated and censored samples from normal populations. Biometrika. 44: 225-236.
Cohen, A.C. Jr. (1959). Simplified estimators for the normal distribution when samples are singly censored or truncated. Technometrics. 1: 217-237.
Cohen, A.C. (1961). Tables for maximum likelihood estimates: singly truncated and singly censored samples. Technometrics. 3: 535-541.
Creason, J.P. et al (1976). Maternal-fetal tissue levels of 16 trace elements in 8 selected continental United States communities. Proceedings of Trace Substances in Environmental Health - X. D.D. Hemphill, editor, 53-62.
Dempster, A.P., Laird, N., and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 39: 1-21.
Finklea, J., et al (1972). Polychlorinated biphenyl residues in human plasma expose a major urban pollution problem. American Journal of Public Health. 62: 645-651.
Glasser, M. (1965). Regression analysis with dependent variable censored. Biometrics. 21: 300-307.
Griffing, J., et al (1961). One year's clinical experience with 5-fluorouracil and X-ray--a preliminary report. Cancer Chemotherapy Reports. 12: 63-88.
Gupta, A.K. (1952). Estimation of the mean and standard deviation of a normal population from a censored sample. Biometrika. 39: 260-273.
Hahn, G. and Miller, J. (1968a). Methods and computer programs for estimating parameters in a regression model from censored data. General Electric Research and Development Center TIS Report 68C-277.
Hahn, G. and Miller, J. (1968b). Time-sharing computer programs for estimating parameters of several normal populations and for regression estimation from censored data. General Electric Research and Development TIS Report 68-C-366.
Hahn, G. and Shapiro, S. (1967). Statistical Models in Engineering. John Wiley and Sons, Inc., New York, N.Y.
Hald, A. (1949). Maximum likelihood estimation of the parameters of a normal distribution which is truncated at a known point. Skandinavisk Aktuarietidskrift. 32: 119-134.
Jeeves, T.A. (1958). Secant modification of Newton's methods. Comm. Assoc. Comp. Mach. 1: 9.
Lea, D. (1945). The biological assay of carcinogens. Cancer Research. 5: 633-639.
Orchard, T. and Woodbury, M. (1972). A missing information principle: Theory and applications. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 697-715.
Rao, C.R. (1952). Advanced statistical methods in biometric research. John Wiley and Sons, Inc., New York, N.Y.
Rosenbaum, S. (1961). Moments of a truncated bivariate normal distribution. Journal of the Royal Statistical Society, Series B. 23: 405-408.
Sampford, M.R. (1952). The estimation of response time distributions. II. Multi-stimulus distributions. Biometrics. 8: 307-369.
Sampford, M.R. and Taylor, J. (1959). Censored observations in randomized experiments. Journal of the Royal Statistical Society, Series B. 21: 214-237.
Singh, N. (1960). Estimation of parameters of a multivariate normal population from truncated and censored samples. Journal of the Royal Statistical Society, Series B. 22: 307-311.
Stevens, W.L. (1937). "The truncated normal distribution", appendix to "The calculation of the time-mortality curve" by C.I. Bliss. Annals of Applied Biology. 24: 815-852.
Taylor, J. (1973). The analysis of designed experiments with censored observations. Biometrics. 29: 35-43.
Wine, R.L. (1964). Statistics for Scientists and Engineers. Prentice-Hall, Inc., Englewood Cliffs, N.J.
Wolfe, P. (1959). The secant method for simultaneous nonlinear equations. Comm. Assoc. Comp. Mach. 2: 12.
Woodbury, M. (1971). "Discussion of paper by Hartley and Hocking." Biometrics. 27: 808-817.
Woodbury, M. and Hasselblad, V. (1970). Maximum likelihood estimates of the variance-covariance matrix from the multivariate normal. SH}~E National Meeting, Denver, Colorado, March, 1970.