LOGISTATIC DOSE-RESPONSE MODELS UNDER
COMPLIANCE MEASUREMENT ERROR FOR SOME
PROBLEMS IN INHALATION TOXICOLOGY
by
Mario Chen Mok
Department of Biostatistics
University of North Carolina
Institute of Statistics
Mimeo Series No. 2180T
May 1997
Logistic Dose-Response Models
under Compliance Measurement Error for
some Problems in Inhalation Toxicology
by
Mario Chen Mok
A dissertation submitted to the faculty of The University of North Carolina at
Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of
Philosophy in the Department of Biostatistics, School of Public Health
Chapel Hill
1997
Approved by:
Advisor
Reader
Reader
© 1997
Mario Chen Mok
ALL RIGHTS RESERVED
ABSTRACT
MARIO CHEN MOK: Logistic Dose-Response Models under Compliance Measurement
Error for some Problems in Inhalation Toxicology.
(Under the direction of Dr. PRANAB KUMAR SEN.)
In a classical measurement error model, true regressors, though unknown, vary stochastically around the specified dose level. In inhalation toxicology problems, however, it is reasonable to assume that the administered dose is not completely absorbed into the system because a portion of the dose is either metabolized or deflected by the different mechanisms in the body. This results in stochastic compliance of the administered dose, typically being smaller in magnitude. We introduce a model for compliance based on a Beta distribution.
Logistic models are among the most widely used models in dose-response analysis. In this research, we are primarily interested in stochastic compliance of various types and in the study of its impact on logistic models. We consider suitable adjustments for compliance of the administered doses, and develop appropriate approximate models that are likely to have reduced bias and to yield more efficient statistical conclusions than logistic models that ignore such measurement error.
We develop compliance-adjusted models for the classical logistic model, a
bivariate logistic model, and a segmented logistic model. These compliance-adjusted
models are categorized into three types: (i) Low (or near zero) dose levels, (ii)
moderate dose levels, and (iii) high dose levels. The low dose setting is appropriate
when studying agents in the environment, which are normally found in low levels.
Moderate doses are commonly encountered in dosimetry studies, where higher doses
than those encountered in the environment are needed to obtain a suitable number of
responses. High dose levels are more common in accelerated life testing problems.
A set of data provided by the Radiation Effects Research Foundation is analyzed
to illustrate the practical application of the approximate models.
In addition, a
simulation study is carried out to examine the statistical performance of the models under
different situations.
ACKNOWLEDGMENTS
I would like to express my gratitude to my advisor, Dr. Pranab K. Sen. The
completion of this research would not have been possible without his constant guidance
and patience.
Special thanks to Dr. Keith Muller for his continued mentorship
throughout the years I have been in the Department. Thanks also to the rest of my
committee members, Dr. Michael Symons, Dr. Barry Margolin, and Dr. Dana Loomis
for their valuable comments. I would also like to thank my academic advisor, Dr. Kant
Bangdiwala, and my friend Dr. Sergio Munoz for their support and friendship, and for
giving me the opportunity to work in the Center for Health Promotion and Disease
Prevention, an experience I highly value.
I am grateful to the Radiation Effects Research Foundation for allowing me to use their data.
Finally, I would like to thank my family and all my friends, both here and in my
country, Costa Rica, for their love and support, and for giving meaning to the work I do.
To my god children,
Wendy, Herbert, and Angelo,
and my nieces, Iris and Julie.
I hope my work contributes to making a better world for them.
Table of Contents
I. Introduction and Literature Review
1.1 Introduction
1.2 Literature Review
1.2.1 Pharmacokinetics
Physiologically Based Pharmacokinetic Models
1.2.2 Dose-Response Analysis
Tolerance Models
Multistage Models
1.2.3 PB-PK and Dose-Response Models
1.2.4 Accounting for Measurement Error
Measurement Error Models
Differential and Nondifferential Error
Likelihood Function Accounting for Measurement Error
Approximations
1.3 Synopsis of Research
II. Logistic Model under Compliance Measurement Error
2.1 Introduction
2.2 A Model for Compliance
2.3 Development of the Models
2.3.1 Model Approximation for Low Dose Studies
2.3.2 Model Approximation for Moderate Dose Studies
2.3.3 Model Approximation for High Dose Studies
2.4 Bootstrap for the Approximated Logistic Models
2.4.1 Compliance Distribution
2.4.2 Estimation of the Parameters of the Model
III. Bivariate Logistic Model under Compliance Measurement Error
3.1 Introduction
3.2 Development of the Models
3.3 Estimation
IV. Segmented Logistic Model under Compliance Measurement Error
4.1 Introduction
4.2 Development of the Models
4.3 Estimation
4.3.1 Probability Model
4.3.2 Asymptotic Distribution
4.3.3 Estimating Equations
4.3.4 Further Details for Application
V. Numerical Results
5.1 Introduction
5.2 Application to the Study of Radiation Effects
5.2.1 The Atomic Bomb Survivor Data
5.2.2 Methods for Estimation
5.2.3 Application of the Bootstrap
5.2.4 Results
5.2.5 Discussion
5.3 Simulation Study
5.3.1 Results of the Simulation Study
5.3.2 Discussion
VI. Remarks
Appendix
References
CHAPTER I
INTRODUCTION AND LITERATURE REVIEW
1.1 Introduction
Contemporary exposure to a variety of toxic ambient air pollutants represents a
continuous threat to public health. Major pollutants such as ozone and carbon monoxide
have been subjects of extensive research. Attention is usually focused on specific health
effects.
Cancer, for example, has been of primary concern. Other health effects of
interest are asthma, respiratory infections, bronchial distress, allergies, developmental,
and reproductive toxicity.
The assessment of risk of the different health effects is a very complex problem. It deals with the assessment of exposure levels, relevant measures of dose, dose-response relationships, and health effects. Models of exposure can be very complicated because they must include the identification of emission sources and models of atmospheric transport. Once an agent comes in contact with an individual, we are concerned with how the agent is distributed throughout the body, and how much of it is absorbed and reaches a target site of the body.
Dose-response relationships are of primary
importance in establishing a causal relationship between a particular agent and a
particular health effect.
Several dose-response models have been used to study the risks imposed by the
different agents in the environment.
Among those, logistic models have been used
extensively and will be the focus of this research. The classical logistic (regression)
model rests on a fundamental assumption that the administered dose levels are known
without appreciable error and account for the true regressors, so that the logit can be
expressed as a linear function of the dosage. This relation breaks down when the true
regressors differ (even stochastically) from the administered doses, as may be due to
measurement error (of some kind or another), or other forms of (stochastic) compliance.
In a classical measurement error model, true regressors, though unknown, vary stochastically around the specified dose level. In inhalation toxicology problems, however, it is reasonable to assume that the administered dose is not completely absorbed into the system, because a portion of the dose is either metabolized or deflected by the different mechanisms in the body. This results in stochastic compliance of the administered dose, typically being smaller in magnitude.
The problem of compliance may occur when studying the effects of agents as they are encountered in the environment, as well as in controlled laboratory experiments. When studying the effects of agents in free-living humans, which is the interest in environmental epidemiology, there is little control over the amount of the agent that is actually inhaled and ultimately absorbed into the body. Therefore, accounting for measurement error, in particular compliance measurement error, becomes relevant. In toxicology experiments, although there is much more control of the dose levels, there is always the chance that a portion of the administered dose is not absorbed into the body, in particular if the interest is the effect of the agent on a particular organ. Therefore, taking compliance measurement error into account is also of interest.
In this context, pharmacokinetic principles become relevant to study how an administered dose is distributed in the body from its point of entry to its delivery to a target organ. Pharmacokinetics, and physiologically based pharmacokinetic (PB-PK) models in particular, have been used to estimate relevant measures of dose (e.g., organ-specific dose). Measures of dose from pharmacokinetic models have been used in dose-response analysis and might be used to study the problem of stochastic compliance.
This chapter presents a review of the issues concerned with dose-response models and measurement error that relate to our interest in measurement error of the compliance type in the logistic model.
The literature review in this chapter focuses
mainly on three areas. First, we start with a review of pharmacokinetic principles and
the development and importance of PB-PK models. Understanding these principles will
help us to appropriately incorporate this type of information into the dose-response
analysis.
Second, we introduce two types of dose-response models which are extensively used: tolerance-based models and multistage models. Here we focus on their fundamental characteristics and current estimation methods. We also try to provide some insight into how these models have been, and continue to be, extended to account for new situations. A section with some comments on the relationships between PB-PK models and dose-response models is included. Third, we present a review of the literature on measurement error, where we discuss the relevant concepts and current methods to account for the problem of measurement error. Finally, in the last section of this chapter, Section 1.3, we provide a synopsis of the new work presented here.
1.2 Literature Review
1.2.1 Pharmacokinetics
Pharmacokinetics refers to the distribution of a chemical and its metabolites from
the point of entry to the different organs and tissues of the body. Pharmacokinetic
variables include: factors that determine the rate and extent of absorption of a substance;
factors that govern the distribution and elimination of the substance, and the formation,
distribution and elimination of biologically active metabolites; and factors that affect the
rate or extent of reaction of the substance with the target site (Morris, 1990).
Pharmacokinetic models consist of numerous parameters which represent actual
physiological functions that control the distribution and disposition of contaminants in
the body.
In classical pharmacokinetics, the body is reduced to a few compartments, and the drug uptake in these compartments is described by solving a series of differential equations. One disadvantage of the classical models is their lack of a physiological basis, which precludes extrapolation across species. This limitation led to the need for physiologically based models.
In the physiologically based approach, the body is separated into several
anatomical compartments that are linked together by a flow network, the body fluid
system. PB-PK models differ from classical models in that they are based on the actual
physiology and anatomy of the test animal, and therefore they provide a better base for
extrapolation. For an overview of the recent research to develop PB-PK models for a
few classes of chemicals, see Andersen (1994).
There are several reviews in the literature on PB-PK models, their applications and principles (Gerlowski and Jain, 1983; Andersen, Krewski and Withey, 1993).
Physiologically Based Pharmacokinetic Models (PB-PK)
PB-PK models are important because of their explicit biological structure. They
are suitable for dose, dose-route, and interspecies extrapolation. Some chemicals are
known to be carcinogenic only after they have been metabolized. Furthermore, some
have suggested the use of the concentration of a particular intracellular element in the
target tissue, such as DNA adducts, as the biologically effective dose (BED). PB-PK
models allow the prediction of concentration of BED from exposure or administered
dose.
More importantly, the relationship between the administered dose and the BED can be nonlinear (Hoel, Kaplan, and Anderson, 1983). For example, when high doses are administered, metabolic and transport processes could become saturated. When nonlinearities are present, high- to low-dose extrapolation based on administered dose can be highly misleading, through either under- or overestimation.
PB-PK models allow extrapolation of dose-response functions from one route of entry to another (e.g., inhalation vs. ingestion). In each case, the BED can be calculated in the organ or tissue of interest.
PB-PK models allow for interspecies extrapolation because the models are based on physical and chemical principles that apply across species. They can be scaled from one species to another by scaling the appropriate parameters. For instance, it has been observed that many anatomical and physiological variables can be correlated as functions of body weight (Dedrick, 1973).
In PB-PK models, the body is represented by several separate compartments corresponding to regions of the body such as organs (e.g., lung, liver), or some physiologically similar tissue such as muscle or fat.
A compartment is included in a model if it represents a region of substantial
uptake of the chemical of interest; if it is related to a clearance process; or if it is a region
of special interest due to its response to the chemical. A compartment can represent
several organs or tissues lumped together. In principle, a PB-PK model could model the
whole body including every organ and tissue. However, this would be costly because of
the number of parameters that would need to be estimated. In addition, it would usually be
unnecessary for the study of a particular chemical. The available knowledge of
the substance in the body is used to simplify the model into internally homogeneous
compartments that are relevant for the study of the specific substance.
A single compartment is considered to consist of three well-mixed phases or subcompartments: a) a vascular space through which the compartment is perfused with blood; b) an interstitial space; and c) an intracellular space consisting of the tissue cells that comprise the compartment, which is usually the ultimate site of action of a chemical. A chemical is carried by blood into the vascular space; diffusion can occur across capillary membranes into the interstitial space; and finally trans-membrane diffusion can occur to reach the intracellular space. The chemical may also enter by direct diffusion from another compartment, and then move across the subcompartments by diffusion.
Mathematically, the model is described as a series of mass balance equations of
the chemical in each compartment or subcompartment. The mass balances are expressed
as differential equations representing the following (Dedrick, 1973):
[Rate of accumulation of chemical in compartment]
    = [Rate of absorption or injection]
    + [Rate of flow in with blood] - [Rate of flow out with blood]
    + [Rate of diffusion in] - [Rate of diffusion out]
    + [Rate of formation by chemical reaction]
    - [Rate of conversion by chemical reaction]
    - [Rate of excretion by physiological process].
Some simplification may occur because not all the terms are relevant in a given
compartment.
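To illustrate how one such mass balance is solved numerically, the sketch below integrates a single tissue compartment by Euler's method. It keeps only three terms of the balance (flow in with blood, flow out with blood, conversion by chemical reaction) and rests on assumptions that are ours, not the text's: the compartment is flow-limited (venous blood leaves in equilibrium with the tissue, so the venous concentration is the tissue concentration divided by a tissue:blood partition coefficient P), the arterial concentration is constant, and metabolism is first-order. The function names and parameter values are hypothetical.

```python
def simulate_compartment(c_arterial, flow, volume, partition, k_met, dt=0.01, t_end=50.0):
    """Euler integration of a flow-limited tissue compartment (hypothetical sketch).

    dA/dt = Q * (C_a - C_v) - k_met * A, with C_v = (A / V) / P.
    Assumes a constant arterial concentration C_a; returns the amount A at t_end.
    """
    amount = 0.0
    for _ in range(int(round(t_end / dt))):
        c_tissue = amount / volume
        c_venous = c_tissue / partition   # flow-limited equilibrium assumption
        rate_in = flow * c_arterial       # rate of flow in with blood
        rate_out = flow * c_venous        # rate of flow out with blood
        rate_met = k_met * amount         # rate of conversion by chemical reaction
        amount += dt * (rate_in - rate_out - rate_met)
    return amount


def steady_state_amount(c_arterial, flow, volume, partition, k_met):
    """Closed-form steady state, obtained by setting dA/dt = 0."""
    return flow * c_arterial / (flow / (volume * partition) + k_met)
```

With C_a = 10, Q = V = 1, P = 2 and k_met = 0.5, the simulated amount approaches the closed-form steady state of 10; setting k_met = 0 removes the conversion term and the amount approaches C_a V P = 20. A realistic PB-PK model couples many such equations through the blood flow network and would normally use an adaptive ODE solver rather than a fixed Euler step.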
A series of parameters needs to be estimated before solving the mass balance equations (Lutz and Dedrick, 1987):
a) Anatomical. These parameters relate to the size and structure of the body, for example, body weight, organ size, or tissue space.
b) Physiological. These include blood flow rate, clearance, and metabolic rates.
c) Clearance. This relates to mechanisms of removal of chemicals from the body, for example, by the kidneys through urinary elimination, or by the liver through biliary and fecal elimination.
d) Metabolism. Metabolism may be responsible for the toxic effect of the chemical by forming a reactive species. Metabolism may also be an essential step for detoxification, by reducing the chemical to an innocuous form or by helping its removal.
e) Thermodynamics.
Thermodynamic parameters help to describe the interaction of
chemicals with tissues in terms of protein binding and tissue partitioning.
f) Transport. In some instances, the rate of tissue uptake can be controlled by the rate of
transport across membranes. This transport may occur by passive diffusion or
carrier-mediated transport.
All parameter values need to be specified before solving the differential
equations. Physiological parameters can be determined from experimental methods. For
some species, these parameters can be found in the literature.
In vivo or in vitro experiments may be used to determine kinetic parameters. In addition, in vivo-in vitro correlations may be appropriate for estimating some of the parameters.
1.2.2 Dose-Response Analysis
As mentioned before, dose-response relationships are usually studied independently, using data from other sources. Because little or no adequate human data exist, dose-response models are usually developed based on data from animal studies (bioassays).
In a long-term bioassay test animals are observed for a major portion of their life
span. The development of the first tumor of the type of interest is recorded after or
during exposure to several doses of an agent.
The objectives of a bioassay are to compare treated with untreated groups, investigate the existence of a dose-response relationship, and, in some cases, fit a dose-response model.
According to the
International Agency for Research on Cancer (IARC, 1986) a standard bioassay has the
following characteristics:
1) Both sexes of F-344 rats and B6C3F1 mice, 6-8 weeks old.
2) 4 groups (control, maximum tolerated dose, and 2 intermediate doses).
3) Minimum of 50 animals per sex per species per dose group.
These characteristics are similar to the "standard NCI bioassay" (Sontag, Page, and
Saffiotti, 1976), in which for each sex-species studied there are three groups of 50
animals each. In addition, this is the type of bioassay being carried out by the National
Toxicology Program (NTP).
The dose selection is generally based on previous
subchronic toxicity studies, where the maximum tolerated dose is predicted.
It is
recommended that the highest dose employed should be the maximum tolerated dose,
defined as "The highest dose of the test agent during the chronic study that can be
predicted not to alter the animals' normal longevity from effects other than
carcinogenicity" (Sontag, Page, and Saffiotti, 1976).
The optimality and general statistical issues of these bioassays have been studied by Portier and Hoel (1983) and Haseman (1984).
In addition, Finney (1978) provided a comprehensive review of
experimental designs and statistical analysis for other types of biological assays.
In addition to the obvious problem of extrapolating results from animals to humans, deriving a dose-response function has some other problems that need consideration. First, bioassays must use high doses to get enough incidence among a limited number of animals; therefore, high- to low-dose extrapolation becomes a problem. Second, three or four dose groups are generally used in a standard bioassay. As a consequence, there is not adequate information on the shape of the dose-response function (Morris, 1990).
Tolerance Models
Tolerance models assume that each subject has its own tolerance level to an agent; a health effect is produced only after the subject is exposed to doses higher than this level. There are various possibilities for the choice of a tolerance distribution function. Among them, the normal (which leads to the probit model), the logistic, and the Weibull distribution functions are commonly used. These models express the probability of a lifetime response, $\pi$, as a function of dose, $d$,

$$\pi(d) = F[g(d)], \qquad (1.1)$$

where $g(\cdot)$ is usually a linear function of $d$.

1) Probit model

$$\pi(d) = \Phi(\alpha + \beta \log d), \qquad (1.2)$$

where $\Phi(\cdot)$ represents the standard normal distribution function.

2) Logistic model

$$\pi(d) = \frac{\exp(\alpha + \beta d)}{1 + \exp(\alpha + \beta d)}. \qquad (1.3)$$

3) Weibull model

$$\pi(d) = 1 - \exp[-(\alpha + \beta_0 d^{\beta_1})], \qquad (1.4)$$

where the background incidence rate is given by $1 - \exp[-\alpha]$, and $\beta_1$ is a shape parameter.
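As a concrete check on equations (1.2) through (1.4), the sketch below evaluates the three tolerance models with the Python standard library only. The standard normal distribution function is computed from `math.erf`; the function names are hypothetical, and the probit form is valid only for d > 0 because it uses log d.

```python
import math


def probit_model(d, alpha, beta):
    """Probit model (1.2): pi(d) = Phi(alpha + beta * log d), defined for d > 0."""
    z = alpha + beta * math.log(d)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def logistic_model(d, alpha, beta):
    """Logistic model (1.3), written in the equivalent form 1 / (1 + exp(-eta))."""
    eta = alpha + beta * d
    return 1.0 / (1.0 + math.exp(-eta))


def weibull_model(d, alpha, beta0, beta1):
    """Weibull model (1.4): pi(d) = 1 - exp[-(alpha + beta0 * d**beta1)]."""
    return 1.0 - math.exp(-(alpha + beta0 * d ** beta1))
```

For example, the logistic model with alpha = -1 and beta = 0.5 gives a response probability of 0.5 at d = 2, and the Weibull model at d = 0 returns the background incidence 1 - exp(-alpha).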
The probit, logistic, and Weibull models can be viewed as special cases of the broad class of models known as Generalized Linear Models (GLMs), introduced by Nelder and Wedderburn (1972). A GLM consists of three components: a random component, a systematic component, and a link function.
1. The random component identifies the distribution of the response variable. In a GLM, the components of the response variable $Y = (Y_1, \ldots, Y_n)$ come from a distribution in the exponential family. That is, the independent $Y_i$'s have probability density function of the form

$$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\}, \qquad (1.5)$$

for some specific functions $a(\cdot)$, $b(\cdot)$, and $c(\cdot)$. For data from bioassays, the underlying distribution is binomial, which belongs to the exponential family of distributions.
2. The systematic component relates a linear predictor $\eta = (\eta_1, \ldots, \eta_n)$ to a set of covariates,

$$\eta = X\beta. \qquad (1.6)$$

Here $X$ is a model matrix or design matrix. In dose-response analysis, $X$ would contain the specification of the dose group.

3. The link between the random and the systematic components is a function linking the expected value of the response variable to the linear predictor. Let $\mu_i = E(Y_i)$; then $\eta_i = g(\mu_i)$, where $g(\cdot)$ is called the link function.
With the same random and systematic components, different link functions lead to the different tolerance models: probit, logistic, and Weibull. For the tolerance models, the systematic component is given by

$$g(\pi_i) = \eta_i = \alpha + \beta d_i, \quad i = 1, \ldots, n. \qquad (1.7)$$

Additional covariates may also be included.
The link functions commonly used are

1. The logit function (logistic model)

$$g(\pi) = \log\left\{\frac{\pi}{1 - \pi}\right\}. \qquad (1.8)$$

2. The probit or inverse normal function (probit model)

$$g(\pi) = \Phi^{-1}(\pi). \qquad (1.9)$$

3. The complementary log-log function (Weibull model with zero background incidence)

$$g(\pi) = \log[-\log(1 - \pi)]. \qquad (1.10)$$

All these functions are differentiable and increasing on (0,1).
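The three link functions (1.8) through (1.10), together with their inverses, can be written directly with the Python standard library; the inverse links map a linear predictor eta = alpha + beta d back to a response probability. This is a sketch with hypothetical function names; `statistics.NormalDist` (Python 3.8 or later) supplies the normal CDF and its inverse.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, used by the probit link


def logit(p):            # (1.8)
    return math.log(p / (1.0 - p))


def probit(p):           # (1.9): inverse standard normal distribution function
    return _STD_NORMAL.inv_cdf(p)


def cloglog(p):          # (1.10)
    return math.log(-math.log(1.0 - p))


# Inverse links: linear predictor eta -> probability pi.
def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)


def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))
```

Applying a link and then its inverse returns the original probability, and the logit and probit links both map pi = 0.5 to eta = 0, whereas the complementary log-log link is asymmetric and maps pi = 1 - exp(-1) to eta = 0.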
From bioassay data, each independent $Y_i$ has the binomial distribution with index $m_i$ and parameter $\pi_i$. Based on this, the log likelihood may be written as follows:

$$l(\pi; y) = \sum_{i=1}^{n} \left[ y_i \log\left(\frac{\pi_i}{1 - \pi_i}\right) + m_i \log(1 - \pi_i) \right], \qquad (1.11)$$

where $y_i$ is the number of responses among $m_i$ subjects treated with dose $d_i$.
For the different tolerance models, the log likelihood stated above can be expressed in terms of the parameters $\alpha$ and $\beta$ by using the corresponding link function. The maximum likelihood estimates of the parameters can then be obtained by iteratively reweighted least squares, described in detail by McCullagh and Nelder (1989, pp. 40-43, 114-117).
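A minimal sketch of iteratively reweighted least squares for the logistic case of (1.7) and (1.11) follows; with the canonical logit link, IRLS coincides with Newton-Raphson. It uses only the standard library, solves the 2x2 weighted normal equations by hand, and has no safeguards against separation or fitted probabilities of exactly 0 or 1. The function name is hypothetical; in practice one would use the GLM routine of a statistics package.

```python
import math


def fit_logistic_irls(doses, responses, totals, tol=1e-10, max_iter=50):
    """Fit logit(pi_i) = alpha + beta * d_i to grouped binomial data by IRLS.

    doses[i] = d_i, responses[i] = y_i (number responding), totals[i] = m_i.
    Returns (alpha, beta).
    """
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the weighted normal equations X'WX b = X'Wz for X = [1, d].
        s_w = s_wd = s_wdd = s_wz = s_wdz = 0.0
        for d, y, m in zip(doses, responses, totals):
            eta = alpha + beta * d
            pi = 1.0 / (1.0 + math.exp(-eta))
            w = m * pi * (1.0 - pi)          # working weight
            z = eta + (y - m * pi) / w       # working (adjusted) response
            s_w += w
            s_wd += w * d
            s_wdd += w * d * d
            s_wz += w * z
            s_wdz += w * d * z
        det = s_w * s_wdd - s_wd * s_wd
        new_alpha = (s_wdd * s_wz - s_wd * s_wdz) / det
        new_beta = (s_w * s_wdz - s_wd * s_wz) / det
        converged = abs(new_alpha - alpha) + abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    return alpha, beta
```

If the observed proportions lie exactly on a logistic curve (for instance y_i = m_i / (1 + exp(2 - d_i))), the score equations hold at alpha = -2, beta = 1, so the fit recovers those values; this makes a convenient sanity check.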
The goodness of fit of the model can be tested by using the deviance function given by

$$D(y; \hat{\pi}) = 2l(\tilde{\pi}; y) - 2l(\hat{\pi}; y) = 2\sum_i \left\{ y_i \log\left(\frac{y_i}{\hat{\mu}_i}\right) + (m_i - y_i) \log\left(\frac{m_i - y_i}{m_i - \hat{\mu}_i}\right) \right\}, \qquad (1.12)$$

where $\hat{\pi}$ are the fitted probabilities under any given model, $H_0$, with $\hat{\mu}_i = m_i \hat{\pi}_i$; and $\tilde{\pi}_i = y_i / m_i$ corresponds to the point at which the log likelihood attains its maximum. The deviance function, $D(y; \hat{\pi})$, is asymptotically or approximately distributed as $\chi^2_{n-p}$, where $p$ is the number of parameters under $H_0$.
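The binomial deviance (1.12) can be computed term by term as sketched below, taking mu_i = m_i times the fitted probability. Groups with y_i = 0 or y_i = m_i use the usual convention 0 log 0 = 0. The function name is hypothetical.

```python
import math


def binomial_deviance(responses, totals, fitted_probs):
    """Deviance (1.12): 2 * sum{ y log(y/mu) + (m - y) log((m - y)/(m - mu)) }, mu = m * pi_hat."""
    dev = 0.0
    for y, m, p in zip(responses, totals, fitted_probs):
        mu = m * p
        if y > 0:                     # 0 * log(0) is taken as 0
            dev += y * math.log(y / mu)
        if y < m:
            dev += (m - y) * math.log((m - y) / (m - mu))
    return 2.0 * dev
```

Evaluated at the saturated fit, fitted_probs[i] = y_i / m_i, the deviance is zero; for a single group with y = 5, m = 10 and a fitted probability of 0.25 it equals 10 log(4/3), approximately 2.877.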
The deviance function can also be used to compare two nested models. Let $H_0$ denote the test model and $H_A$ an extended model with an additional covariate. The reduction in deviance

$$D(y; \hat{\pi}_0) - D(y; \hat{\pi}_A) = 2l(\hat{\pi}_A; y) - 2l(\hat{\pi}_0; y) \qquad (1.13)$$

is identical to the likelihood-ratio test statistic, and it is approximately distributed as $\chi^2_1$.
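For one additional covariate, the reduction in deviance is referred to a chi-square distribution with one degree of freedom. Because a chi-square(1) variable is the square of a standard normal Z, the tail probability has the closed form P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x/2)), so no statistics library is needed. A sketch, with a hypothetical function name:

```python
import math


def lr_test_one_df(deviance_null, deviance_alt):
    """Reduction in deviance (1.13) and its approximate chi-square(1) p-value."""
    stat = deviance_null - deviance_alt
    # P(chi2_1 > x) = erfc(sqrt(x / 2)); a non-positive reduction gives p = 1.
    p_value = math.erfc(math.sqrt(stat / 2.0)) if stat > 0 else 1.0
    return stat, p_value
```

A reduction of about 3.841 (the square of 1.96) yields a p-value of approximately 0.05, the familiar critical value for a one-degree-of-freedom test.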
Some Generalizations of Tolerance Models
Some have proposed different generalizations of the usual dose-response models
described above. These generalizations aim to improve the fit of the models by making
the model more flexible.
Prentice (1976) proposed to model dose-response by using a four-parameter class. It includes one parameter for location, one for scale, and two for shape:

$$P(d) = \int_{-\infty}^{y} \frac{\exp(w m_1)\,[1 + \exp(w)]^{-(m_1 + m_2)}}{\beta(m_1, m_2)}\, dw, \qquad (1.14)$$

where $y = (d - \mu)/\sigma$, and $\beta(\cdot, \cdot)$ represents the beta function.
The logistic and probit models are special cases of this class of models. The use of this class of models has two main purposes. First, it allows one to choose between different models, for example, between logistic and probit, by choosing parameter values in a single model. The author presents a score test for this purpose. Second, the use of a flexible parametric model is expected to improve the fit at the extremes, so that more reliable low-dose extrapolations can be obtained.
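Because (1.14) has no closed form in general, it can be evaluated numerically. The sketch below uses the midpoint rule, truncates the lower limit of integration at a large negative value, and computes the beta function through `math.lgamma`. With m1 = m2 = 1 the integrand reduces to the logistic density exp(w)/[1 + exp(w)]^2 and B(1, 1) = 1, so the result should match the logistic distribution function. The function name, truncation point, and grid size are assumptions chosen for illustration, and y must exceed the truncation point.

```python
import math


def prentice_cdf(d, mu, sigma, m1, m2, lower=-50.0, n=20000):
    """Midpoint-rule evaluation of (1.14), with -infinity truncated at `lower`."""
    y = (d - mu) / sigma
    log_beta = math.lgamma(m1) + math.lgamma(m2) - math.lgamma(m1 + m2)
    h = (y - lower) / n
    total = 0.0
    for i in range(n):
        w = lower + (i + 0.5) * h
        # log of exp(w*m1) * [1 + exp(w)]^-(m1+m2) / B(m1, m2)
        total += math.exp(m1 * w - (m1 + m2) * math.log1p(math.exp(w)) - log_beta)
    return total * h
```

For m1 = m2 = 1 and standardized dose y = 1, this returns approximately 1/(1 + e^-1), about 0.7311; any choice with m1 = m2 gives a curve symmetric about y = 0.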
Stukel (1988) proposed a generalization of the logistic model that included two shape parameters. This allows different symmetric and asymmetric probability curves. In 1990, Stukel recognized the usefulness of these models for estimating $ED_{100p}$, the dose associated with a 100p% response rate, in dose-response modeling. In the model she proposed, the shape parameters were designed to modify the behavior of the logistic curve in the extremes, where the fit of the standard logistic model can be poor. The model is given by
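$$\mu_\alpha = \frac{\exp[h_\alpha(\eta)]}{1 + \exp[h_\alpha(\eta)]}, \qquad (1.15)$$

where $\eta = \alpha + \beta d$. The $h_\alpha(\eta)$ are strictly increasing nonlinear functions of $\eta$, indexed by the shape parameters $\alpha_1$ and $\alpha_2$, defined as follows. For $\eta \ge 0$ (or $\mu \ge \frac{1}{2}$),

$$h_\alpha(\eta) = \begin{cases} \alpha_1^{-1}\{\exp(\alpha_1 |\eta|) - 1\}, & \alpha_1 > 0, \\ \eta, & \alpha_1 = 0, \\ -\alpha_1^{-1}\log(1 - \alpha_1 |\eta|), & \alpha_1 < 0, \end{cases} \qquad (1.16)$$

and for $\eta \le 0$ (or $\mu \le \frac{1}{2}$),

$$h_\alpha(\eta) = \begin{cases} -\alpha_2^{-1}\{\exp(\alpha_2 |\eta|) - 1\}, & \alpha_2 > 0, \\ \eta, & \alpha_2 = 0, \\ \alpha_2^{-1}\log(1 - \alpha_2 |\eta|), & \alpha_2 < 0. \end{cases} \qquad (1.17)$$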
A score test can be used to choose between different subfamilies, and maximum
likelihood estimates can be obtained using the delta algorithm. In addition, approximate
confidence intervals can be constructed based on a first-order Taylor expansion.
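The shape transformation of Stukel's model can be sketched as below, following Stukel's (1988) piecewise definition of h as we understand it: a1 governs the upper tail (eta >= 0) and a2 the lower tail (eta <= 0), and a1 = a2 = 0 gives h(eta) = eta, the standard logistic model (1.3). The function names are hypothetical; `math.expm1` and `math.log1p` are used for accuracy when a1 or a2 is near zero.

```python
import math


def stukel_h(eta, a1, a2):
    """Piecewise shape transformation h_alpha(eta) of (1.16)-(1.17)."""
    x = abs(eta)
    if eta >= 0:
        if a1 > 0:
            return math.expm1(a1 * x) / a1          # a1^-1 {exp(a1|eta|) - 1}
        if a1 < 0:
            return -math.log1p(-a1 * x) / a1        # -a1^-1 log(1 - a1|eta|)
        return eta
    if a2 > 0:
        return -math.expm1(a2 * x) / a2             # -a2^-1 {exp(a2|eta|) - 1}
    if a2 < 0:
        return math.log1p(-a2 * x) / a2             # a2^-1 log(1 - a2|eta|)
    return eta


def stukel_prob(d, alpha, beta, a1, a2):
    """mu_alpha = exp[h(eta)] / (1 + exp[h(eta)]) with eta = alpha + beta * d, as in (1.15)."""
    h = stukel_h(alpha + beta * d, a1, a2)
    return 1.0 / (1.0 + math.exp(-h))
```

Positive shape parameters stretch the corresponding tail (h grows faster than eta), negative ones compress it, and the transformation remains strictly increasing in eta for every choice of a1 and a2.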
More recently, Devidas et al. (1993) proposed to use a generalized logistic model with one shape parameter to address the low-dose problem. The model is given by

$$P(y = 1 \mid x, \theta, \beta, \alpha) = [1 + \exp(-\theta - \beta x)]^{-\alpha}. \qquad (1.18)$$

They pointed out that the model has an advantage over the general model proposed by Prentice (1976), namely the availability of a closed-form algebraic expression for $\mu$, the mean of $y$.
The Multivariate Case
The standard GLM (McCullagh and Nelder, 1989) assumes that observations are
independent. However, in many studies, it is common to observe a vector of responses
for each subject in the study.
In dose-response studies, one might be interested in
characterizing the dependency of each response on dose and/or describing the
association between responses.
In these situations, the analysis needs to take into
account the problem of correlated observations.
Liang and Zeger (1986) proposed an extension of GLM for the analysis of
longitudinal data, where an outcome variable is observed at several time points. In their
work, a standard GLM is used for each marginal model, and no form is specified for the
joint distribution of the outcome variables.
Instead, they proposed generalized
estimating equations (GEE) that give consistent estimates of regression parameters and
of their variances under weak assumptions about the joint distribution. The idea is to use
a "working" correlation matrix to characterize the dependencies among responses from
the same subject to construct general estimating equations.
A modification of the methods proposed by Liang and Zeger (1986) was proposed by Lipsitz, Laird and Harrington (1991). They proposed to model the correlation matrix as a function of the pairwise odds ratios between binary responses. This approach is appealing because the odds ratio has many desirable properties; some find odds ratios easier to interpret than other measures of association.
In this same line of research, Liang, Zeger, and Qaqish (1992) discussed new GEE approaches for the estimation of parameters of multivariate regression models for categorical data. These methods can be applied both when the responses comprise repeated observations on one variable, as in longitudinal studies, and when the observations correspond to different variables. A noteworthy discussion of marginal models in contrast with log-linear models is also given in this paper.
On the other hand, full likelihood methods have also been proposed in the literature, and likelihood methods are usually preferred by many. However, in the multivariate case, likelihood-based methods become very complicated when considering a vector of responses of general dimension. Therefore, in most cases, methods are well developed for the bivariate case only. Some authors have worked more specifically on correlated binary responses; see, for example, McDonald (1993), Lipsitz, Laird and Harrington (1990), and Prentice (1986). Lesaffre and Molenberghs (1991) discuss the probit model for the multivariate case, and in 1994 Molenberghs and Lesaffre proposed a multivariate Plackett distribution for ordinal categorical responses. Finally, it is worth mentioning the work by Lang and Agresti (1994). They proposed a maximum likelihood approach using Lagrange's method of undetermined multipliers: the models are used to specify constraints on the cell probabilities, and the likelihood is then maximized subject to these constraints.
Multistage Models
Multistage models are biologically based. They assume that a health effect (cancer in particular) develops as a result of a sequence of random events, which is believed to be how genetic change usually occurs. The discussion in this section assumes that cancer is the health effect of interest, because the multistage theory was developed specifically for carcinogenesis. However, the multistage theory can be, and has been, applied to non-cancer endpoints.
Numerous models have been developed under the theory of multistage models. In this section, three types of these models are presented: the Armitage-Doll model, two-stage models, and multipath multistage models. They illustrate important concepts in carcinogenesis. Some details are given on the estimation of tumor incidence and the parameters of the model for the general case.
Tan (1991) and Portier and Sherman (1994) provide a more extensive and detailed review of multistage models. Another important review of the mathematics of multistage models is given by Whittemore and Keller (1978).
The Armitage-Doll Model
The Armitage-Doll model is considered the first practical and acceptable model of carcinogenesis. Armitage and Doll (1954) postulated that cancer develops from a single cell that must pass through a series of irreversible mutations to become malignant. The transition probabilities (per unit time) from one stage to the next are assumed to be linear functions of dose $d$, i.e., $\mu_i(d) = a_i + b_i d$ for $i = 1, \ldots, k$, where $k$ is the number of stages. In addition, these transition probabilities are assumed to be stage-specific and time-homogeneous.
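To make the role of the linear transition rates concrete, the sketch below uses the standard small-rate approximation to the $k$-stage model (a well-known result, not derived in this section): the incidence at age $t$ is roughly $\mu_1(d)\mu_2(d)\cdots\mu_k(d)\, t^{k-1}/(k-1)!$. The function name and all rate values are hypothetical, chosen only for illustration.

```python
import math

def armitage_doll_incidence(t, d, a, b):
    """Approximate incidence rate at age t for a k-stage Armitage-Doll model.

    Uses the small-rate approximation h(t) ~ mu_1(d) ... mu_k(d) * t**(k-1) / (k-1)!,
    with linear transition rates mu_i(d) = a_i + b_i * d.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must have one entry per stage")
    k = len(a)
    rates = [ai + bi * d for ai, bi in zip(a, b)]
    return math.prod(rates) * t ** (k - 1) / math.factorial(k - 1)

# Hypothetical rates for a three-stage model; stage 2 is not dose-related here,
# so incidence is a quadratic polynomial in dose.
a = [1e-3, 2e-3, 5e-4]
b = [1e-4, 0.0, 2e-4]
for dose in (0.0, 1.0, 10.0):
    print(dose, armitage_doll_incidence(t=60.0, d=dose, a=a, b=b))
```

Because each dose-related stage contributes one linear factor, the approximate incidence is a polynomial in dose, which is the origin of the polynomial dose terms in multistage dose-response models.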
Two major criticisms were raised against the $k$-stage Armitage-Doll model. First, there was increasing biological evidence that carcinogenesis involves fewer stages, probably just two or three. Second, the Armitage-Doll model does not account for cell kinetics, that is, the birth and death of cells.
Two-stage Model of Carcinogenesis
Considerable attention has been given to multistage models of carcinogenesis with only two stages. According to these models, there are three types of cells: normal stem cells, intermediate (or initiated) cells, and malignant cells. These models generally assume that a malignant cell develops into a tumor with probability one.
First, in view of these criticisms, Armitage and Doll (1957) modified their initial $k$-stage model by reducing the number of stages to two and including a deterministic birth and death process for intermediate cells. This model rests on several assumptions: (a) the probability that any normal cell undergoes a first change is $p_1$ per unit time, which is proportional to the concentration $d_1$ of the agent initiating the first change; (b) each initiated cell gives rise to an exponentially growing clone, which contains $e^{kt}$ cells at time $t$ after the first change, where $k$ is a constant; and (c) any such clone has probability $p_2 e^{kt}$ per unit time of experiencing a second change, where $p_2$ is proportional to the concentration $d_2$ of the agent initiating the second change. Based on these assumptions, they concluded that the incidence rate per person is
$$N p_1 \left\{ 1 - \exp\left[ -\frac{p_2}{k} \left( e^{kt} - 1 \right) \right] \right\}, \qquad (1.19)$$
where $N$ is the mean number of cells per person that are exposed to the risk of a first change.
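Expression (1.19) can be checked directly against assumptions (a) to (c): integrating, over the time $s$ of the first change, the rate $N p_1$ times the second-change hazard of the clone times its probability of having avoided a second change so far reproduces the closed form. The sketch below does this numerically; the function names and parameter values are hypothetical.

```python
import math

def ad1957_incidence(t, N, p1, p2, k):
    """Closed form (1.19): N p1 {1 - exp[-(p2/k)(e^{kt} - 1)]}."""
    return N * p1 * (-math.expm1(-(p2 / k) * math.expm1(k * t)))

def ad1957_incidence_numeric(t, N, p1, p2, k, n=20_000):
    """Rebuild (1.19) from assumptions (a)-(c) by integrating over the time s
    of the first change (midpoint rule).

    A clone started at time s has second-change hazard p2*exp(k*(t - s)) at
    time t, and has avoided a second change up to t with probability
    exp[-(p2/k)(exp(k*(t - s)) - 1)].
    """
    h = t / n
    total = 0.0
    for i in range(n):
        age = t - (i + 0.5) * h  # age of a clone initiated at the midpoint s
        hazard = p2 * math.exp(k * age)
        no_second_change = math.exp(-(p2 / k) * math.expm1(k * age))
        total += hazard * no_second_change * h
    return N * p1 * total

# Hypothetical values, chosen only to exercise the formula.
params = dict(N=1e6, p1=1e-7, p2=1e-4, k=0.1)
print(ad1957_incidence(50.0, **params), ad1957_incidence_numeric(50.0, **params))
```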
Next, a two-stage model with a stochastic, rather than deterministic, birth-death process was developed. This model assumes that normal cells grow at a deterministic rate and intermediate cells grow stochastically. Intermediate cells are assumed to arise from normal cells as a Poisson process with intensity $\mu_1 X(t)$, where $\mu_1$ is a constant mutation rate from the normal stage to the intermediate stage and $X(t)$ is the number of normal cells at time $t$. In a small time period $(t, t + \Delta t)$, an intermediate cell can divide into two intermediate cells with probability $\beta \Delta t + o(\Delta t)$; it can die with probability $\delta \Delta t + o(\Delta t)$; or it can divide into one intermediate and one malignant cell with probability $\mu_2 \Delta t + o(\Delta t)$. The probability of no event in the small time period is $1 - (\beta + \delta + \mu_2)\Delta t + o(\Delta t)$, while the probability of more than one event is $o(\Delta t)$. The constants $\mu_1$ and $\mu_2$ are mutation rates per cell per year, $\beta$ is the birth rate of intermediate cells per cell per year, and $\delta$ is the death rate of intermediate cells per cell per year. This model, usually known as the MVK model, was developed by Moolgavkar and Venzon (1979) and Moolgavkar and Knudson (1981). Furthermore, the non-homogeneous case has been considered, with birth, death, and mutation rates defined as functions of time (Moolgavkar, Dewanji and Venzon, 1988; Moolgavkar and Luebeck, 1990).
Finally, a further generalization of the two-stage model has been considered, allowing both normal and intermediate cells to grow stochastically (Moolgavkar, Dewanji and Venzon, 1988; Portier, Kopp-Schneider and Sherman, 1995). In any small time period $(t, t + \Delta t)$, a cell in stage $i$ can replicate, resulting in two identical cells, with probability $\beta_i(t)\Delta t + o(\Delta t)$; it can die with probability $\delta_i(t)\Delta t + o(\Delta t)$; it can mutate to the next higher stage $(i + 1)$ with probability $\mu_{i+1}(t)\Delta t + o(\Delta t)$; or it can remain in stage $i$ with probability $1 - [\beta_i(t) + \delta_i(t) + \mu_{i+1}(t)]\Delta t + o(\Delta t)$, where $i = 0, 1$ indicates the normal and the intermediate stage, respectively. The probability of more than one event is $o(\Delta t)$.
A birth and death process is disregarded in stage 2 (the malignant stage) because of the assumption that a tumor will develop from a malignant cell with certainty. However, this assumption is not needed when there is information on the growth kinetics of malignant cells.
Dewanji, Moolgavkar and Luebeck (1991) presented the mathematical developments needed to take explicit account of the proliferation of malignant cells. In this case, a malignant clone has a non-zero probability of becoming extinct.
The birth-death process and the mutation process are assumed to be independent, and each cell goes through these processes independently of other cells. In addition, the rates of the processes are usually assumed to be piecewise constant because of mathematical difficulties.
Multiple pathway models of carcinogenesis
There is some biological evidence that cancer may develop through multiple pathways. Tan (1991) described several examples in which carcinogenesis seems to involve two pathways, either a one-stage and a two-stage model or two two-stage models. He then presented the mathematical development for a general multipath model involving a one-stage model and two two-stage models. In this model there are five types of cells: normal cells, three types of initiated cells, and malignant cells. A normal cell can become malignant by going through the following pathways:
1. A normal cell may undergo one mutation and become malignant at rate $\mu$ (one-stage).
2. A normal cell may undergo two mutations, transforming from the normal stage to initiated stage 1 or initiated stage 2 at rate $\alpha_1$ or $\alpha_2$, respectively, and then from either of these initiated stages to the malignant stage at rate $\beta_1$ or $\beta_2$, respectively (two-stage).
3. A normal cell may undergo three mutations: after the first transformation into initiated stage 1 or 2, it transforms into initiated stage 3 at rate $\omega_1$ or $\omega_2$, respectively, and finally from initiated stage 3 to the malignant stage at rate $\beta_3$. In this case, initiated stage 3 is a necessary stage towards malignancy, while initiated stages 1 and 2 are exchangeable.
As in the MVK model, intermediate cells proliferate via a birth-death process with their corresponding birth and death rates. The mathematics become more complicated with additional pathways, and therefore more extensive data are needed to estimate the parameters of these models.
Tumor Incidence
In risk assessment, the estimation of the tumor incidence function is of interest. Let $T$ be the random variable representing the time of appearance of the first malignant cell; then the tumor incidence function is given by
$$h(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P(t \le T < t + \Delta t \mid T \ge t). \qquad (1.20)$$
An analytical approximation to the cumulative distribution function of $T$ is $P(T \le t) \approx 1 - \exp\{-E[X(t)]\}$ (Whittemore and Keller, 1978), where $X(t)$ is the random variable representing the number of malignant cells at time $t$. The incidence function can be estimated from this approximation. Nowadays, however, several methods are available for estimating the exact incidence function.
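As an illustration of the approximation $P(T \le t) \approx 1 - \exp\{-E[X(t)]\}$, the sketch below assumes, for simplicity, a two-stage model with a constant pool of $N$ normal cells and constant rates $\mu_1, \beta, \delta, \mu_2$ (assumptions made here, not in the text). The mean equations $dE[I]/dt = \mu_1 N + (\beta - \delta)E[I]$ for intermediate cells and $dE[M]/dt = \mu_2 E[I]$ for malignant cells give $E[M(t)] = \mu_1 \mu_2 N [(e^{gt} - 1)/g^2 - t/g]$ with $g = \beta - \delta$. Names and values are hypothetical.

```python
import math

def expected_malignant(t, N, mu1, mu2, beta, delta):
    """E[M(t)] in a two-stage model with a constant pool of N normal cells,
    constant rates, and no intermediate or malignant cells at time 0.

    Solves dE[I]/dt = mu1*N + (beta - delta)*E[I] and dE[M]/dt = mu2*E[I].
    """
    g = beta - delta  # net growth rate of intermediate cells
    if abs(g * t) < 1e-8:
        return mu1 * mu2 * N * t * t / 2.0  # limit as g -> 0
    return mu1 * mu2 * N * (math.expm1(g * t) / g**2 - t / g)

def approx_tumor_cdf(t, **params):
    """Whittemore-Keller type approximation P(T <= t) ~ 1 - exp{-E[M(t)]}."""
    return -math.expm1(-expected_malignant(t, **params))

params = dict(N=1e7, mu1=1e-7, mu2=1e-6, beta=0.1, delta=0.09)  # hypothetical
for age in (20.0, 50.0, 80.0):
    print(age, approx_tumor_cdf(age, **params))
```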
In what follows, the two-stage model is used as the basis for our presentation. This simplifies the notation, and the generalization to more stages follows readily.
Let $X_0(t)$, $X_1(t)$, and $X_2(t)$ denote the numbers of normal, intermediate, and malignant cells at time $t$, respectively. Then the probability generating function at time $t$, starting with a normal cell at time 0, is given by
$$\Psi(x_0, x_1, x_2; t) = \sum_{i,j,k} P\{X_0(t) = i, X_1(t) = j, X_2(t) = k \mid X_0(0) = 1, X_1(0) = 0, X_2(0) = 0\}\, x_0^i x_1^j x_2^k. \qquad (1.21)$$
The survival function $P(T > t) = P[X_2(t) = 0]$, that is, the probability of no tumor at time $t$, equals the probability that the second stage contains no cells at time $t$. In terms of the probability generating function, this probability is $\Psi(1, 1, 0; t)$. Therefore, the incidence function can be obtained as
$$h(t) = -\frac{\Psi'(1, 1, 0; t)}{\Psi(1, 1, 0; t)}, \qquad (1.22)$$
where $\Psi'$ denotes the derivative with respect to $t$.
Kolmogorov forward and backward equations have been proposed to obtain $\Psi(1, 1, 0; t)$. For a stochastic two-stage model, Moolgavkar, Dewanji and Venzon (1988) used Kolmogorov forward equations, while Portier, Kopp-Schneider and Sherman (1996) proposed the use of Kolmogorov backward equations for estimating tumor incidence. Applications of these methods to other multistage models can be found in Tan (1991) and Portier and Sherman (1994).
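The backward-equation idea can be made concrete in the same simplified setting (constant pool of $N$ normal cells, constant rates; both are assumptions made here). Let $\phi(t)$ be the probability that a single intermediate cell present at time 0 has produced no malignant cell by time $t$. Conditioning on the first event gives $\phi'(t) = \beta\phi^2 + \delta - (\beta + \delta + \mu_2)\phi$ with $\phi(0) = 1$, and since initiations form a Poisson process with rate $\mu_1 N$, $P(T > t) = \exp\{-\mu_1 N \int_0^t [1 - \phi(s)]\,ds\}$ and $h(t) = \mu_1 N [1 - \phi(t)]$. The sketch integrates the equation for $\psi = 1 - \phi$ with a fixed-step Runge-Kutta scheme; it illustrates the idea and is not the specific algorithm of the papers cited above.

```python
import math

def two_stage_survival(t, N, mu1, mu2, beta, delta, steps=20_000):
    """Return (S(t), h(t)) for a two-stage model with constant rates.

    psi(u) = 1 - phi(u) solves psi' = mu2 + (beta - delta - mu2)*psi - beta*psi**2,
    psi(0) = 0; the cumulative hazard is mu1*N times the integral of psi.
    The pair (psi, integral of psi) is advanced with classical RK4.
    """
    def rhs(psi):
        return mu2 + (beta - delta - mu2) * psi - beta * psi * psi

    dt = t / steps
    psi, cum = 0.0, 0.0  # cum = integral of psi from 0 to the current time
    for _ in range(steps):
        p1 = psi
        k1 = rhs(p1)
        p2 = psi + 0.5 * dt * k1
        k2 = rhs(p2)
        p3 = psi + 0.5 * dt * k2
        k3 = rhs(p3)
        p4 = psi + dt * k3
        k4 = rhs(p4)
        cum += dt * (p1 + 2 * p2 + 2 * p3 + p4) / 6.0  # d(cum)/du = psi
        psi += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return math.exp(-mu1 * N * cum), mu1 * N * psi

S, h = two_stage_survival(50.0, N=1e7, mu1=1e-7, mu2=1e-6, beta=0.1, delta=0.09)
print("P(T > 50) =", S, " h(50) =", h)
```

When the rates are small, $1 - \phi(s)$ stays small and the result agrees closely with the approximation $1 - \exp\{-E[X(t)]\}$ discussed earlier.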
Parameter Estimation
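Based on the binomial distribution, the log-likelihood is given by (omitting terms that do not involve the parameters)
$$\ell = \sum_i \left\{ y_i \log P(d_i) + (m_i - y_i) \log[1 - P(d_i)] \right\}, \qquad (1.23)$$
where $y_i$ is the number of responses among $m_i$ animals observed a fixed time $t$ after being treated with dose $d_i$. $P(d_i)$ can be obtained from the incidence function evaluated at dose $d_i$ as follows:
$$P(d_i) = 1 - S(d_i, t) = 1 - \exp\left\{ -\int_0^t h(d_i, s)\, ds \right\}, \qquad (1.24)$$
where $S(d_i, t)$ is the survival function.
The log-likelihood depends on the parameters of a given model, such as the $\beta$'s, $\alpha$'s, and $\mu$'s, through this expression for $P(d_i)$. The maximum likelihood estimates can then be obtained from this log-likelihood function.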
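A minimal sketch of (1.23)-(1.24): the function below evaluates the binomial log-likelihood for any user-supplied model of $P(d)$. As a hypothetical stand-in for a fitted multistage model, it uses the one-hit form $P(d) = 1 - \exp[-(q_0 + q_1 d)]$, chosen only because it is short ($q_0$, $q_1$ are illustrative names), and maximizes over a coarse grid. The dose groups and counts are invented for illustration.

```python
import math

def binomial_loglik(prob, doses, responders, group_sizes):
    """Binomial log-likelihood (1.23), up to an additive constant, for P(d) = prob(d)."""
    ll = 0.0
    for d, y, m in zip(doses, responders, group_sizes):
        p = min(max(prob(d), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += y * math.log(p) + (m - y) * math.log(1.0 - p)
    return ll

def one_hit(q0, q1):
    """Hypothetical P(d) = 1 - S(d, t), with cumulative hazard q0 + q1*d."""
    return lambda d: -math.expm1(-(q0 + q1 * d))

# Hypothetical bioassay: doses, animals with tumors, animals per group.
doses = [0.0, 1.0, 2.0, 4.0]
tumors = [2, 5, 9, 15]
animals = [50, 50, 50, 50]

grid = [i / 1000 for i in range(1, 301)]
best = max(((q0, q1) for q0 in grid[:100] for q1 in grid),
           key=lambda q: binomial_loglik(one_hit(*q), doses, tumors, animals))
print("grid MLE (q0, q1):", best)
```

In practice a numerical optimizer would replace the grid search; the grid is used here only to keep the example dependency-free.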
1.2.3 PB-PK and Dose-Response Models
The importance of PB-PK models in risk assessment lies in the fact that they can be used, on the basis of animal studies, to provide a measure of the BED in humans, such as the dose delivered to the target tissue. The BED can then be used in the appropriate dose-response model instead of the administered or exposure dose. Because a more meaningful measure of dose is used, the dose-response relationship is expected to be represented more realistically, and low-dose extrapolation should improve as a consequence.
Combining PB-PK modeling with tolerance dose-response models appears to be straightforward: PB-PK models would provide the appropriate BED, which would then be used along with the incidence data to estimate the parameters of the chosen tolerance model. This, however, ignores the fact that the BED is an estimate subject to error, and hence a random variable. Research is needed in this area.
On the other hand, incorporating PB-PK models into multistage-type models involves additional considerations. First, we need to consider the type of carcinogen under study. An agent can be an initiator, a promoter, or both. Initiators are agents that cause the initial DNA damage. Promoters are agents that act on initiated cells, promoting their clonal expansion. An agent that is both an initiator and a promoter is called a complete carcinogen. In terms of the MVK two-stage model, initiators are associated with the mutation rates, while promoters are related to the birth and death rates of initiated cells: a promoter would increase the birth rate and/or decrease the death rate. Second, we must select an appropriate association between the BED and the different parameters of the model. A linear relationship is usually assumed. Tan and Singh (1987), for instance, proposed the use of a linear relationship after properly accounting for the non-linearities between the BED and the administered dose. For promoters, we can choose to model the difference between the birth and death rates, $\beta - \delta$, as a function of the BED instead of modeling each rate separately.
1.2.4 Accounting for Measurement Error
The problem of measurement error has been studied extensively. The book by Fuller (1987) provides a comprehensive review of the subject, mainly in the context of normal linear models. Interest in measurement error in nonlinear models started in the early 1980s and has grown dramatically since then. Most of the work has focused on binary regression, especially the logistic model, but methods have also been proposed for generalized linear models in general and for quasi-likelihood problems. A good account of this work on nonlinear errors-in-variables modeling is given in the book by Carroll, Ruppert, and Stefanski (1995).
Measurement Error Models
Suppose a dichotomous response variable $Y$ depends on the predictor vectors $X$ ($p_1 \times 1$) and $U$ ($p_2 \times 1$) through a model of the form
$$P(Y = 1 \mid x, u) = F(\alpha + \beta_1' x + \beta_2' u), \qquad (1.25)$$
where $F(\cdot)$ is a known distribution function such as those used for tolerance models (e.g., logistic, normal, Weibull). Here we have two kinds of predictors: the vector $U$, which represents the predictors that can be observed without error, and $X$, which represents those predictors that can only be observed through the surrogate $Z$.
A model for the relationship between $X$ and $Z$ needs to be specified in the analysis of measurement error problems, and several such models have been proposed in the literature. Carroll, Ruppert, and Stefanski (1995) distinguish two general types: 1) error models, which focus on the conditional distribution of $Z$ given $X$ and $U$; and 2) regression calibration models, which focus on the conditional distribution of $X$ given $Z$ and $U$. A general model of the first type would be
$$Z = \gamma_0 + \gamma_1' X + \gamma_2' U + \epsilon, \quad \text{with } E(\epsilon \mid X, U) = 0. \qquad (1.26)$$
Similarly, a general model of the second type can be taken as
$$X = \eta_0 + \eta_1' Z + \eta_2' U + \epsilon_*, \quad \text{with } E(\epsilon_* \mid Z, U) = 0. \qquad (1.27)$$
These models have been used extensively in their simplest forms, that is, $Z = X + \epsilon$ and $X = Z + \epsilon_*$.
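The practical difference between the two simplest forms, $Z = X + \epsilon$ and $X = Z + \epsilon_*$, already shows up in ordinary linear regression, used here only because its behavior is easy to state: with classical error, the least-squares slope of $Y$ on $Z$ is attenuated by the factor $\sigma_X^2/(\sigma_X^2 + \sigma_\epsilon^2)$, while with the Berkson-type form (and $\epsilon_*$ independent of $Z$) it is not. The simulation below is a sketch with hypothetical variances; it is not taken from the cited references.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2024)
n, beta, sd_x, sd_err = 100_000, 2.0, 1.0, 0.5

# Classical error model: Z = X + e, and the response depends on the true X.
x = [rng.gauss(0.0, sd_x) for _ in range(n)]
z_classical = [xi + rng.gauss(0.0, sd_err) for xi in x]
y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
slope_classical = ols_slope(z_classical, y)

# Berkson-type model: X = Z + e*, with Z fixed first and X scattering around it.
z_berkson = [rng.gauss(0.0, sd_x) for _ in range(n)]
x_berkson = [zi + rng.gauss(0.0, sd_err) for zi in z_berkson]
y_berkson = [beta * xi + rng.gauss(0.0, 1.0) for xi in x_berkson]
slope_berkson = ols_slope(z_berkson, y_berkson)

reliability = sd_x**2 / (sd_x**2 + sd_err**2)
print(f"classical: {slope_classical:.3f} (theory {beta * reliability:.3f})")
print(f"Berkson:   {slope_berkson:.3f} (theory {beta:.3f})")
```

For binary regression the attenuation under classical error is only approximate, but the qualitative contrast between the two forms is similar.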
Most of the work on measurement error uses additive error models such as those above. However, an additive model is not a requirement; sometimes a multiplicative model might be more appropriate. A model of the form
$$X = Z\epsilon, \quad \text{with } E(\epsilon \mid Z) = 1, \qquad (1.28)$$
has also received some attention in the literature.
Differential and Nondifferential Error
There are basically two types of measurement error associated with the set of predictors $X$: differential and nondifferential. The distinction is whether or not the measurement error depends on the response variable $Y$. Under nondifferential error, if $X$ is available, $Z$ contains no additional information about $Y$. This implies that
$$f_{Y \mid X, Z, U}(y \mid x, z, u) = f_{Y \mid X, U}(y \mid x, u), \qquad (1.29)$$
or equivalently
$$f_{Z \mid X, Y, U}(z \mid x, y, u) = f_{Z \mid X, U}(z \mid x, u). \qquad (1.30)$$
This distributional assumption has also been termed "conditional independence" in the measurement error literature (Reade-Christopher and Kupper, 1995).
Nondifferential measurement error can be justified in many situations, especially when the predictors are measured before the response, as is the case in prospective studies. Differential error, on the other hand, might occur in case-control studies, where the response is obtained first and can therefore affect the observation of the predictors. Selective recall and interviewer bias are usually cited as sources of differential error in case-control studies (Flegal, Keyl, and Nieto, 1991). Another way differential error may arise in case-control studies is described by Carroll, Ruppert, and Stefanski (1995, p. 16). The example relates to nutritional studies in which the true predictor would be long-term diet before diagnosis. In the case-control setting, diet is reported only after diagnosis, and a subject may change their diet because of the diagnosis of disease. Therefore, the diet reported after diagnosis is still associated with disease status, even after accounting for long-term diet before diagnosis, which leads to differential error. Differential error is usually thought unlikely in prospective studies, as mentioned before, but it is still plausible (Wacholder, Dosemeci, and Lubin, 1991).
Most of the work in the literature focuses on nondifferential error because, under nondifferential measurement error, the parameters of the model can still be estimated even when the true predictors $X$ are not observable. When differential measurement error occurs, the true predictors have to be obtained for at least some of the study subjects. The methods presented in the following sections assume nondifferential error. See Carroll, Ruppert and Stefanski (1995, Section 14.2) for an account of methods for problems with differential measurement error.
Likelihood Function Accounting for Measurement Error
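The full likelihood for the data is given by
$$\prod_i f_{Y, Z \mid U}(y_i, z_i \mid u_i) = \prod_i \left\{ \int f_{Y \mid U, X}(y_i \mid u_i, x)\, f_{Z \mid U, X}(z_i \mid u_i, x)\, f_{X \mid U}(x \mid u_i)\, dx \right\}, \qquad (1.31)$$
for assumed densities $f_{Y \mid U, X}(y \mid u, x)$, $f_{Z \mid U, X}(z \mid u, x)$, and $f_{X \mid U}(x \mid u)$. In some restrictive cases, estimates can be obtained by maximizing this likelihood (Armstrong, 1985). Another approach is based on the conditional likelihood
$$\prod_i f_{Y \mid Z, U}(y_i \mid z_i, u_i) = \prod_i \left\{ \int f_{Y \mid U, X}(y_i \mid u_i, x)\, f_{X \mid U, Z}(x \mid u_i, z_i)\, dx \right\}. \qquad (1.32)$$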
First, the nuisance parameters of $f_{Z \mid U, X}(z \mid u, x)$ and $f_{X \mid U}(x \mid u)$ are estimated from the validation dataset, and these estimates are plugged into the conditional likelihood, yielding a pseudo-likelihood function. Second, estimates of the parameters of interest are obtained by maximizing the pseudo-likelihood function over the primary dataset. See Gong and Samaniego (1981) for more details on pseudo maximum likelihood estimation.
Carroll et al. (1984) used the conditional approach assuming a normally distributed error vector. They present the general case and give details for the probit model. This approach, however, has been criticized for its computational difficulty and its lack of robustness to misspecification of the densities involved. Maximum quasi-likelihood estimation (Armstrong, 1985) and semiparametric estimation (Carroll and Wand, 1991) have been proposed to overcome some of these criticisms.
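One reason the probit model is convenient in the conditional approach is a closed-form integral: if $X \mid Z = z$ is normal with mean $m(z)$ and variance $s^2$, then $\int \Phi(\alpha + \beta x) f_{X \mid Z}(x \mid z)\,dx = \Phi\{(\alpha + \beta m)/\sqrt{1 + \beta^2 s^2}\}$. The sketch below checks this identity numerically; the normality of $X \mid Z$ and the parameter values are assumptions for illustration.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x, mean=0.0, sd=1.0):
    zs = (x - mean) / sd
    return math.exp(-0.5 * zs * zs) / (sd * math.sqrt(2.0 * math.pi))

def probit_marginal_closed_form(alpha, beta, m, s):
    """P(Y=1 | z) when P(Y=1 | x) = Phi(alpha + beta*x) and X | z ~ N(m, s^2)."""
    return norm_cdf((alpha + beta * m) / math.sqrt(1.0 + beta**2 * s**2))

def probit_marginal_numeric(alpha, beta, m, s, n=4000):
    """Same quantity by Simpson's rule over m +/- 10 s (n must be even)."""
    lo, hi = m - 10 * s, m + 10 * s
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * norm_cdf(alpha + beta * x) * norm_pdf(x, m, s)
    return total * h / 3.0

print(probit_marginal_closed_form(-1.0, 0.8, 1.5, 0.6))
print(probit_marginal_numeric(-1.0, 0.8, 1.5, 0.6))
```

The closed form shows that, under these assumptions, the observed-data model is still a probit model with attenuated coefficients, which is what makes the pseudo-likelihood easy to evaluate.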
Approximations
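The model relating the response variable to the observed variables $Z$ and $U$ is given by
$$P(Y = 1 \mid z, u) = \int P(Y = 1 \mid x, u)\, f_{X \mid Z, U}(x \mid z, u)\, dx = \int F(\alpha + \beta_1' x + \beta_2' u)\, f_{X \mid Z, U}(x \mid z, u)\, dx. \qquad (1.33)$$
Now, replacing $F(\alpha + \beta_1' x + \beta_2' u)$ with its Taylor series expansion with respect to $x$ about $E(X \mid z, u)$ (written here for a scalar $X$), we have
$$P(Y = 1 \mid z, u) = F(\alpha + \beta_1 E(X \mid z, u) + \beta_2' u) + \frac{1}{2} \operatorname{var}(X \mid z, u) \left. \frac{\partial^2 F(\alpha + \beta_1 x + \beta_2' u)}{\partial x^2} \right|_{x = E(X \mid z, u)} + R, \qquad (1.34)$$
with
$$R = \sum_{k=3}^{\infty} \frac{1}{k!} E\{[X - E(X \mid z, u)]^k \mid z, u\} \left. \frac{\partial^k F(\alpha + \beta_1 x + \beta_2' u)}{\partial x^k} \right|_{x = E(X \mid z, u)}. \qquad (1.35)$$
The term in the expansion that is linear in $[x - E(X \mid z, u)]$ vanishes when taking the expectation.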
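To see what the second-order term in (1.34) adds, the sketch below takes a scalar $X$ with $X \mid z, u$ normal (an assumption made here) and compares the exact integral (1.33) with the first-order approximation $F(\alpha + \beta_1 E(X \mid z, u) + \beta_2' u)$ and the second-order approximation that adds $\frac{1}{2}\operatorname{var}(X \mid z, u)\,\beta_1^2 F''$. For the logistic $F$, $F'' = F(1 - F)(1 - 2F)$. The parameter values are hypothetical.

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def exact_marginal(eta0, beta1, s, n=4000):
    """Integral of F(eta0 + beta1*x) over x ~ N(0, s^2), by Simpson's rule.

    eta0 stands for alpha + beta1*E(X|z,u) + beta2'u, so x is the centered X.
    """
    lo, hi = -10 * s, 10 * s
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        dens = math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))
        total += w * logistic(eta0 + beta1 * x) * dens
    return total * h / 3.0

def first_order(eta0):
    return logistic(eta0)

def second_order(eta0, beta1, s):
    p = logistic(eta0)
    return p + 0.5 * s**2 * beta1**2 * p * (1 - p) * (1 - 2 * p)

eta0, beta1, s = -2.0, 1.0, 0.7
print(exact_marginal(eta0, beta1, s), first_order(eta0), second_order(eta0, beta1, s))
```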
An approximation is then obtained by keeping only a few terms of the expansion in the model. Rosner et al. (1989 and 1990) used only the first term in their developments, whereas Kuha (1994) considered an approximate model with two terms (second order). Based on these approximations, the authors provided a way to obtain corrected parameter estimates by a simple transformation of the naive estimates. In addition, the delta method was used to obtain standard errors of the estimates. In this work, the measurement error model was taken to be the standard multivariate linear model
$$X = \eta_0' + \eta_1' Z + \eta_2' U + \epsilon_*, \qquad (1.36)$$
where $\eta_0$ ($1 \times p_1$), $\eta_1$ ($p_1 \times p_1$) and $\eta_2$ ($p_2 \times p_1$) are parameter matrices, and $\epsilon_*$ ($p_1 \times 1$) is an independent random vector with zero mean and covariance matrix $\Omega$. By using this measurement error model, the procedure takes into account both systematic and random within-person error.
On the other hand, when predictors are measured with error, estimators of the parameters of a regression model, such as the logistic model, are known to be asymptotically biased (Stefanski and Carroll, 1985). These authors have proposed methods that adjust the estimates by using approximations aimed at reducing the bias: Stefanski (1985 and 1989) used this approach for generalized linear models, and Stefanski and Carroll (1985) for the case of logistic models.
The approach is based on expanding the usual estimators of the parameters to study the nature of the bias. This information is then used to develop adjustments to the estimators. The measurement error model in this work is taken to be
$$Z = X + \sigma e, \qquad (1.37)$$
where $\sigma$ is a scalar and $e$ is defined as usual. In this context, the proposed bias-adjusted estimators require small measurement error asymptotics, that is, $\sigma \to 0$ as $n \to \infty$.
1.3 Synopsis of Research
In this research, we are primarily interested in stochastic compliance of various types and the study of its impact on logistic models. We consider suitable adjustments for compliance of the administered doses and develop appropriate approximate models that are likely to yield reduced bias and more efficient statistical conclusions than logistic models that ignore such measurement error (compliance of the dose levels). Based on these approximations, suitable statistical analysis schemes are considered. A major part of the statistical analysis relates to efficient estimation of the related parameters and examines their role in drawing statistical conclusions.
In Chapter II, we introduce a model for compliance based on a beta distribution, which is used in the development of the models in the remainder of Chapter II as well as in Chapters III and IV. In Chapter II, we take the classical logistic model and develop suitable adjustments to account for compliance measurement error by incorporating information from the compliance distribution. In Chapter III, we extend the models developed in Chapter II to accommodate a bivariate outcome, where the correlation of two binary outcomes is taken into consideration. Another extension is considered in Chapter IV, where we develop approximate models for the segmented logistic model accounting for compliance.
Chapter V illustrates the practical application of the proposed methods and examines their performance under different situations. The present work was originally motivated by the dose-response problem in inhalation toxicology, where the problem of compliance arises naturally. However, compliance of the administered dose arises in a wide variety of situations. One example is dosimetry studies, particularly those involving subhuman primates, where only part of the allotted dose is actually consumed while the rest is carelessly wasted. Another example is the study of radiation effects, where the organs are naturally shielded by the surrounding tissues of the body. In Chapter V, we use a dataset provided by the Radiation Effects Research Foundation (RERF) on the atomic bomb survivors in Japan to illustrate the use of the proposed methods in practice. In addition, Chapter V presents the results of a simulation study conducted to examine whether and how the proposed methods reduce bias and improve overall statistical efficiency. Several compliance patterns that can be encountered in practice are studied. The final chapter, Chapter VI, presents a few concluding remarks.
CHAPTER II
LOGISTIC MODEL UNDER COMPLIANCE MEASUREMENT ERROR
2.1 Introduction
In this chapter, we take the classical logistic model and adjust it to account for compliance measurement error in the measure of dose. As discussed in the following section, we assume a beta distribution to model this type of measurement error; the parameters of the beta distribution therefore become part of the model.
Approximate models, rather than exact models, are developed here. The idea is to develop models that reduce the bias incurred by ignoring the measurement error while keeping the models simple enough for use in practice.
These compliance-adjusted models are categorized into three types: (i) low (or near-zero) dose levels, (ii) moderate dose levels, and (iii) high dose levels. The low-dose setting is appropriate when studying agents in the environment, which are normally found at low levels. Moderate doses are commonly encountered in dosimetry studies involving subhuman primates, where doses higher than those encountered in the environment are needed to obtain a suitable number of responses. High dose levels are more common in accelerated life testing problems, which also generally use subhuman primates.
The compliance models considered in the present chapter are based on appropriate beta distributional models for the compliance factor. For each of the three types (i), (ii) and (iii) mentioned above, different approximations are developed, and suitable statistical analyses based on them are proposed. We present the theory needed for the estimation of the parameters of these models and discuss their use in drawing statistical conclusions.
2.2 A Model for Compliance
Recall that at dose level $D$, the binary response variable $Y$ takes on the values 0 and 1 with respective probabilities $1 - \pi(D)$ and $\pi(D)$, where in logistic models we set, at $D = d$,
$$\pi(d) = P[Y = 1 \mid d] = F[\alpha + \beta d] = \{1 + \exp[-(\alpha + \beta d)]\}^{-1}, \quad d > 0, \qquad (2.1)$$
where $\alpha$ and $\beta$ are parameters. If $\pi(d)$ is assumed to be nondecreasing in $d$, then $\beta$ is nonnegative, while $\pi(0) = \{1 + e^{-\alpha}\}^{-1}$ represents the baseline response probability. In the above formulation, $D$ stands for the actual dose level incorporated in the experimental protocol, which is not necessarily equal to the administered dose level.
The point of distinction is that in a measurement error or a stochastic compliance model, the actual input dose $D$ is stochastic and may differ from the administered dose level $V$ (mostly nonstochastic). In a measurement error model, we let $D = V + \epsilon$, where $\epsilon$ represents the measurement error, and it is generally assumed that $\epsilon$ has a distribution symmetric around 0; often $\epsilon$ is taken to be normally distributed with mean 0 and variance $\sigma_\epsilon^2$ (which may or may not be known, and may or may not be small). In a stochastic compliance model, on the contrary, we take
$$D = UV, \quad 0 \le U \le 1, \qquad (2.2)$$
where the stochastic factor $U$ represents the proportion of the administered dose ($V$) that relates to the actual intake dose $D$. The point to notice is that $U$ typically has a distribution centered somewhere in $(0, 1)$, and $E(U)$ is generally smaller than one.
In this context, we propose to assume a Beta distribution with parameters $(a, b)$ for the compliance factor $U$, i.e., $U \sim \text{Beta}(a, b)$, with
$$E(U) = \frac{a}{a + b} = \gamma \qquad (2.3)$$
and
$$\text{Var}(U) = \frac{ab}{(a + b)^2 (a + b + 1)} = \Omega. \qquad (2.4)$$
The distribution of $D$, its expected value, and its variance can be easily obtained, since $D$ is a one-to-one transformation of $U$ (note that $V$ is assumed to be fixed). Here $\gamma$ represents the expected proportion of the administered dose that reaches the target site.
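Because $D = vU$ with $v$ fixed, (2.3) and (2.4) give $E(D \mid v) = \gamma v$ and $\text{Var}(D \mid v) = v^2 \Omega$. The short simulation below checks this; the function names and the values $a = 2$, $b = 5$, $v = 10$ are hypothetical.

```python
import random

def compliance_moments(a, b):
    """gamma = E(U) and Omega = Var(U) for U ~ Beta(a, b), as in (2.3)-(2.4)."""
    gamma = a / (a + b)
    omega = a * b / ((a + b) ** 2 * (a + b + 1))
    return gamma, omega

def simulate_intake(v, a, b, n, seed=1):
    """Draw actual doses D = U*v with U ~ Beta(a, b) for a fixed administered dose v."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) * v for _ in range(n)]

a, b, v = 2.0, 5.0, 10.0
gamma, omega = compliance_moments(a, b)
draws = simulate_intake(v, a, b, n=200_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"E(D|v): simulated {mean:.3f}, theory {gamma * v:.3f}")
print(f"Var(D|v): simulated {var:.3f}, theory {v**2 * omega:.3f}")
```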
The advantage of using a Beta distribution is its flexibility in modeling different patterns of compliance. Different combinations of the parameters $a$ and $b$ lead to a range of compliance patterns that are likely to be encountered in a given population. Several cases are possible:
1) $a < 1$ and $b < 1$. A given exposure or administered dose is more likely to be either highly absorbed or almost unabsorbed. This pattern might arise in populations that can be separated into two groups whose characteristics lead to different compliance patterns. For instance, blue- and white-collar workers are exposed differently to outdoor pollution.
2) $a < 1 < b$. Low compliance of dose is more likely to occur. Clearance mechanisms, for example, might work effectively for some chemicals.
3) $b < 1 < a$. High compliance of dose is more likely to occur. A higher proportion of the dose is expected to be consumed or absorbed.
4) $a > 1$ and $b > 1$. In this case, moderate compliance is more likely to occur: there is more probability around moderate true doses and less probability away from the mode.
Knowledge of the specific agent under study is needed to decide which of the above compliance patterns is most appropriate. PB-PK models can be useful in this regard.
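The four cases can be encoded directly. The helper below is a hypothetical convenience, not part of the original development; in case 4 it also reports the mode $(a - 1)/(a + b - 2)$ of the Beta density, and it flags the boundary cases $a = 1$ or $b = 1$, which the list does not cover.

```python
def compliance_pattern(a, b):
    """Label the shape of a Beta(a, b) compliance density using cases 1)-4) above."""
    if a <= 0 or b <= 0:
        raise ValueError("Beta parameters must be positive")
    if a < 1 and b < 1:
        return "U-shaped: compliance tends to be near 0 or near 1"
    if a < 1 < b:
        return "low compliance more likely (density unbounded at 0)"
    if b < 1 < a:
        return "high compliance more likely (density unbounded at 1)"
    if a > 1 and b > 1:
        mode = (a - 1) / (a + b - 2)
        return f"moderate compliance, unimodal with mode {mode:.2f}"
    return "boundary case (a == 1 or b == 1)"

for params in [(0.5, 0.5), (0.5, 3.0), (3.0, 0.5), (2.0, 4.0)]:
    print(params, compliance_pattern(*params))
```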
2.3 Development of the Models
Under model (2.1), the relevant dose measure is $D$. However, $D$ is usually not directly observable. In this case, it might be more intuitive to replace $D$ by its expected value for the given dose $V = v$ and examine to what extent that works in (2.1). Therefore, as a first approach, consider fitting a model of the form
$$\pi^0(v) = P[Y = 1 \mid E(D \mid v)] = F[\alpha + \beta E(D \mid v)] = F[\alpha + \beta \gamma v]. \qquad (2.5)$$
This model is clearly an improvement over the naive model (i.e., a model using the administered dose level $V$) because it adjusts the dose measure by the average proportion that actually relates to $D$. Model (2.5), however, does not take into consideration the variability associated with the different compliance patterns discussed before. The model that relates the response variable $Y$ to the administered dose while accounting for the random variable $D$ is given by
$$\begin{aligned} \pi^*(v) = P\{Y = 1 \mid v\} &= \int_0^v P\{Y = 1 \mid d, v\}\, f_{D \mid V}(d \mid v)\, dd \\ &= \int_0^v P\{Y = 1 \mid d\}\, f_{D \mid V}(d \mid v)\, dd \\ &= \int_0^v F\{\alpha + \beta d\}\, f_{D \mid V}(d \mid v)\, dd. \end{aligned} \qquad (2.6)$$
The assumption of nondifferential error was used in going from the first to the second line of expression (2.6).
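Substituting $d = uv$ turns (2.6) into an integral over the compliance factor, $\pi^*(v) = \int_0^1 F(\alpha + \beta u v) f_U(u)\,du$. The sketch below evaluates it with a simple midpoint rule and compares it with $\pi^0(v)$ from (2.5). It assumes $a, b \ge 1$ so that the Beta density has no endpoint singularities; parameter values are hypothetical.

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def beta_pdf(u, a, b):
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(u) + (b - 1) * math.log1p(-u))

def pi_star(v, alpha, beta, a, b, n=20_000):
    """pi*(v) from (2.6) as an integral over U ~ Beta(a, b), by the midpoint rule.

    Midpoint nodes never touch u = 0 or 1; accuracy is best when a >= 1 and b >= 1.
    """
    h = 1.0 / n
    return h * sum(logistic(alpha + beta * (i + 0.5) * h * v) * beta_pdf((i + 0.5) * h, a, b)
                   for i in range(n))

def pi_zero(v, alpha, beta, a, b):
    """Model (2.5): dose replaced by its expectation gamma*v."""
    return logistic(alpha + beta * a / (a + b) * v)

alpha, beta, a, b, v = -3.0, 0.5, 4.0, 2.0, 4.0
print(pi_star(v, alpha, beta, a, b), pi_zero(v, alpha, beta, a, b))
```

With these values every linear predictor $\alpha + \beta u v$ is negative, where the logistic curve is convex, so Jensen's inequality makes $\pi^*(v)$ exceed $\pi^0(v)$.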
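An approximation can be obtained by replacing $F(\alpha + \beta d)$ with its $K$-th order Taylor series expansion with respect to $d$ about $E(D \mid v)$:
$$\pi^*(v) \approx \sum_{k=0}^{K} \left[ \frac{1}{k!} \left. \frac{\partial^k F(\alpha + \beta x)}{\partial x^k} \right|_{x = E(D \mid v)} \right] \int_0^v [d - E(D \mid v)]^k f_{D \mid V}(d \mid v)\, dd. \qquad (2.7)$$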
By taking advantage of the properties of the Beta distribution, we can express (2.7) in a somewhat simpler form based on the moments of $U$:
$$\pi^*(v) \approx \sum_{k=0}^{K} \frac{1}{k!} \left. \frac{\partial^k F(\alpha + \beta x)}{\partial x^k} \right|_{x = E(D \mid v)} \left\{ v^k \sum_{j=0}^{k} \binom{k}{j} (-\gamma)^{k-j} E(U^j) \right\}, \qquad (2.8)$$
where
$$E(U^j) = \frac{\Gamma(a + b)\, \Gamma(a + j)}{\Gamma(a)\, \Gamma(a + j + b)}.$$
Note that the first term in the expansion corresponds to $\pi^0(v)$. We propose to use $\pi^0(v)$ as a starting point and use some terms of the expansion to adjust the parameters of the model correspondingly.
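The bracketed factor in (2.8) is the $k$-th central moment of $D$ given $v$, $E[(D - \gamma v)^k] = v^k \sum_j \binom{k}{j}(-\gamma)^{k-j} E(U^j)$. The sketch below computes it from the gamma-function formula for $E(U^j)$ and checks that $k = 1$ gives 0 and $k = 2$ gives $v^2 \Omega$, as (2.3) and (2.4) require. Function names and values are hypothetical.

```python
import math

def beta_raw_moment(j, a, b):
    """E(U^j) = Gamma(a+b)Gamma(a+j) / [Gamma(a)Gamma(a+b+j)] for U ~ Beta(a, b)."""
    return math.exp(math.lgamma(a + b) + math.lgamma(a + j)
                    - math.lgamma(a) - math.lgamma(a + b + j))

def dose_central_moment(k, v, a, b):
    """E[(D - gamma*v)^k] with D = U*v, via the binomial sum in (2.8)."""
    gamma = a / (a + b)
    return v**k * sum(math.comb(k, j) * (-gamma) ** (k - j) * beta_raw_moment(j, a, b)
                      for j in range(k + 1))

a, b, v = 2.0, 5.0, 10.0
omega = a * b / ((a + b) ** 2 * (a + b + 1))
for k in range(4):
    print(k, dose_central_moment(k, v, a, b))
```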
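A second-order approximation of $\pi^*(v)$ specifically for the logistic model can be easily obtained by using (2.8) with $K = 2$:
$$\pi^*(v) \approx \pi^0(v) + \frac{1}{2} v^2 \Omega \left. \frac{\partial^2 F(\alpha + \beta x)}{\partial x^2} \right|_{x = E(D \mid v)} \approx \pi^0(v) + \frac{1}{2} v^2 \Omega \beta^2 \pi^0(v) [1 - \pi^0(v)] [1 - 2\pi^0(v)]. \qquad (2.9)$$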
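A quick numerical check of (2.9), with hypothetical parameter values: it evaluates $\pi^0(v)$, the second-order approximation, and the exact $\pi^*(v)$ obtained by integrating over $U$. Since the correction term is proportional to $\Omega$, the approximation should be most accurate when compliance varies little (large $a + b$).

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def beta_pdf(u, a, b):
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(u) + (b - 1) * math.log1p(-u))

def pi_exact(v, alpha, beta, a, b, n=20_000):
    """pi*(v) from (2.6), integrating over U ~ Beta(a, b) by the midpoint rule."""
    h = 1.0 / n
    return h * sum(logistic(alpha + beta * (i + 0.5) * h * v) * beta_pdf((i + 0.5) * h, a, b)
                   for i in range(n))

def pi_second_order(v, alpha, beta, a, b):
    """Approximation (2.9): pi0 + (1/2) v^2 Omega beta^2 pi0 (1 - pi0)(1 - 2 pi0)."""
    gamma = a / (a + b)
    omega = a * b / ((a + b) ** 2 * (a + b + 1))
    p0 = logistic(alpha + beta * gamma * v)
    return p0 + 0.5 * v**2 * omega * beta**2 * p0 * (1 - p0) * (1 - 2 * p0)

alpha, beta, v = -3.0, 0.8, 4.0
for a, b in [(4.0, 2.0), (40.0, 20.0)]:
    p0 = logistic(alpha + beta * a / (a + b) * v)
    print(a, b, round(p0, 4), round(pi_second_order(v, alpha, beta, a, b), 4),
          round(pi_exact(v, alpha, beta, a, b), 4))
```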
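Here we can take the logit transformation and show that the resulting model is a linear model plus an adjustment,
$$\operatorname{logit}[\pi^*(v)] \approx \operatorname{logit}[\pi^0(v)] + G(v; a, b, \alpha, \beta) \approx \alpha + \beta\gamma v + G(v; a, b, \alpha, \beta), \qquad (2.10)$$
with
$$G(v; a, b, \alpha, \beta) = \log \left\{ \frac{1 + \frac{1}{2} v^2 \Omega \beta^2 [1 - \pi^0(v)][1 - 2\pi^0(v)]}{1 - \frac{1}{2} v^2 \Omega \beta^2 \pi^0(v)[1 - 2\pi^0(v)]} \right\}. \qquad (2.11)$$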
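Now we work on (2.11) to introduce suitable and practical adjustments to the basic linear model. We can consider expanding $G(\cdot)$ as follows:
$$G(v; a, b, \alpha, \beta) = \log[1 + H(v; a, b, \alpha, \beta)] = H(v; a, b, \alpha, \beta) - \frac{1}{2}[H(v; a, b, \alpha, \beta)]^2 + \frac{1}{3}[H(v; a, b, \alpha, \beta)]^3 - \cdots \qquad (2.12)$$
with
$$H(v; a, b, \alpha, \beta) = \frac{\frac{1}{2} v^2 \Omega \beta^2 [1 - 2\pi^0(v)]}{1 - \frac{1}{2} v^2 \Omega \beta^2 \pi^0(v)[1 - 2\pi^0(v)]}. \qquad (2.13)$$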
We further expand (2.11) by expanding (2.13). At this point, however, we consider expanding (2.13) around $v_0 \in \{0, v_1, \infty\}$, where $v_1$ is an intermediate dose producing an intermediate probability of response, say $\pi^0(v_1) = \pi_1$, with $0.2 < \pi_1 < 0.8$. This way, we can develop adjustments that are more appropriate for studies using low, moderate or high dose levels. A first-order expansion of (2.13) is considered in the following presentation, that is,
$$H(v; a, b, \alpha, \beta) \approx H(x; a, b, \alpha, \beta)\big|_{x = v_0} + (v - v_0) \left. \frac{\partial H(x; a, b, \alpha, \beta)}{\partial x} \right|_{x = v_0}. \qquad (2.14)$$
2.3.1 Model Approximation for Low Dose Studies
When $v_0 = 0$, it can be shown that (2.13) is approximately equal to zero, which implies that (2.11) is also approximately equal to zero. Therefore, no adjustments are needed and (2.10) becomes
$$\operatorname{logit}[\pi^*(v)] \approx \alpha + \beta\gamma v, \qquad (2.15)$$
which is simply the usual logistic model based on the expected value of $D$, that is,
$$\operatorname{logit}[\pi^*(v)] \approx \operatorname{logit}[\pi^0(v)]. \qquad (2.16)$$
2.3.2 Model Approximation for Moderate Dose Studies
Let us consider $\pi_1 = 1/2$, so that $v_1$ is the dose needed to obtain a probability of response of one half. Such a dose is sometimes referred to as the ED50 (effective dose for 50% response). In this case, a zero-order expansion of (2.11) in combination with the first-order expansion (2.14) leads to
(2.17)
Therefore, the corrected model is given by
(2.18)
For other values of $v_1$ such that $\pi_1$ is close to, but not equal to, 1/2, the corrected model is similar to (2.18), but its expression is much more involved.
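A sketch of the algebra behind the moderate-dose case may help. It is worked out here from (2.5) and (2.11) to (2.14), keeping only the leading term $G \approx H$ of (2.12); it is a reconstruction, not a transcription of the displays (2.17) and (2.18). Write $c(v) = \frac{1}{2} v^2 \Omega \beta^2$.

```latex
% From (2.13) and (2.5):
H(v) = \frac{c(v)\,[1 - 2\pi^0(v)]}{1 - c(v)\,\pi^0(v)[1 - 2\pi^0(v)]},
\qquad
\frac{d\pi^0(v)}{dv} = \beta\gamma\,\pi^0(v)[1 - \pi^0(v)].
% At v = v_1, \pi^0(v_1) = 1/2, so 1 - 2\pi^0(v_1) = 0 and the denominator equals 1:
H(v_1) = 0,
\qquad
\left.\frac{\partial H}{\partial v}\right|_{v = v_1}
  = -2\,c(v_1) \left.\frac{d\pi^0}{dv}\right|_{v = v_1}
  = -\tfrac{1}{4}\,\gamma\,\Omega\,\beta^3 v_1^2 .
% Leading term of (2.12), G \approx H, combined with (2.14) at v_0 = v_1:
G(v; a, b, \alpha, \beta) \approx -\tfrac{1}{4}\,\gamma\,\Omega\,\beta^3 v_1^2\,(v - v_1),
\qquad
\operatorname{logit}[\pi^*(v)] \approx \alpha + \beta\gamma v - \tfrac{1}{4}\,\gamma\,\Omega\,\beta^3 v_1^2\,(v - v_1).
```

The adjustment is linear in $v$, so under these approximations the corrected model remains a logistic model in $v$ with modified intercept and slope.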
2.3.3 Model Approximation for High Dose Studies
When $v_0 \to \infty$, it can be shown that approximation (2.14) is equal to $-1$. In this case, the series expansion (2.12) does not converge. Therefore, the simple approach does not work here, and further work is needed to deal with this situation.
We therefore returned to expression (2.11) and studied the rate of convergence of the terms involved in it. This led to

$$G(v; a, b, \alpha, \beta) = -2\log(v) + \log\left[\frac{2}{\beta^2\eta}\right] + O(v^{-2}). \tag{2.19}$$

Therefore, the corrected model for high dose studies is given by

$$\operatorname{logit}[\pi^*(v)] \approx \left(\alpha + \log\left[\frac{2}{\beta^2\eta}\right]\right) + \beta\gamma v - 2\log(v). \tag{2.20}$$
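The high-dose form can be checked against the same second-order approximation. The sketch assumes $\eta = \operatorname{Var}(U)$ and reads the correction in (2.19)-(2.20) as $\log[2/(\beta^2\eta)] - 2\log v$. It then compares that line with the exact logit of $\pi_0 + \tfrac12 v^2\eta\beta^2\pi_0(1-\pi_0)(1-2\pi_0)$ at increasing doses. The helper names are hypothetical and the parameters are illustrative.

```python
import math

def logit_second_order(v, alpha, beta, gamma, eta):
    """logit of pi0 + c pi0 (1 - pi0)(1 - 2 pi0), c = 0.5 eta beta^2 v^2, pi0 = F(alpha + beta gamma v).
    Uses 1 - pi = (1 - pi0)[1 + c pi0 (2 pi0 - 1)] to avoid cancellation; intended for large v."""
    lin = alpha + beta * gamma * v
    one_minus_p0 = 1.0 / (1.0 + math.exp(lin))
    p0 = 1.0 / (1.0 + math.exp(-lin))
    c = 0.5 * eta * beta ** 2 * v ** 2
    one_minus_p = one_minus_p0 * (1.0 + c * p0 * (2.0 * p0 - 1.0))
    return math.log1p(-one_minus_p) - math.log(one_minus_p)

def logit_high_dose(v, alpha, beta, gamma, eta):
    """High-dose corrected model, as read in (2.20)."""
    return alpha + math.log(2.0 / (beta ** 2 * eta)) + beta * gamma * v - 2.0 * math.log(v)

if __name__ == "__main__":
    alpha, beta, gamma, eta = -2.0, 1.0, 0.5, 0.02   # illustrative values only
    for v in (10.0, 20.0, 40.0, 80.0):
        gap = logit_high_dose(v, alpha, beta, gamma, eta) - logit_second_order(v, alpha, beta, gamma, eta)
        print(f"v={v:5.1f}  gap={gap:.5f}")
```

Up to exponentially small terms, the gap equals $\log(1 + 1/c)$ with $c = \tfrac12\eta\beta^2 v^2$. It therefore shrinks at the $O(v^{-2})$ rate stated in (2.19).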
2.4 Bootstrap for the Approximated Logistic Models
The bootstrap is a computationally intensive technique for making inferences on parameters based on a given sample. The bootstrap method estimates the shape of the sampling distribution of a statistic by resampling a large number of times from the available sample. Its advantage is that it requires neither strong distributional assumptions nor the derivation of complicated analytical expressions. In this section we consider the bootstrap method to draw statistical conclusions about the parameters of our models.
First, we prove that the bootstrap is consistent when applied to our models. We base our discussion on the results given by Mammen (1992) for the consistency of the bootstrap method. In his paper, Mammen shows that asymptotic normality is necessary and sufficient for consistency of the bootstrap in the case of linear functionals of the form
(2.21)
Therefore, we intend to provide estimators of the parameters of the different models that can be expressed as linear functionals, and then study whether asymptotic normality holds for these statistics.
2.4.1 Compliance Distribution
The parameters of the Beta distribution can be estimated by maximum likelihood based on an independent study. These maximum likelihood estimators can be expressed as linear functions based on the score function. First, suppose there are $n$ observations $X_1, \dots, X_n$ available for the estimation of the Beta parameters. Let $\theta = (a, b)^t$, with $\hat\theta_n = (\hat a, \hat b)^t$ the corresponding maximum likelihood estimators. Thus,

$$\sqrt{n}(\hat\theta_n - \theta) = [I(\theta)]^{-1}\, n^{-1/2}\, U_n(\theta) + o_p(1), \tag{2.22}$$

with $I(\theta)$ the Fisher information matrix

$$I(\theta) = E\left\{\left[\frac{\partial}{\partial\theta}\log f(X_i, \theta)\right]\left[\frac{\partial}{\partial\theta}\log f(X_i, \theta)\right]^t\right\}, \tag{2.23}$$

and the corresponding score function

$$U_n(\theta) = \sum_{i=1}^n \frac{\partial}{\partial\theta}\log f(X_i, \theta), \tag{2.24}$$

where $f(X_i, \theta)$ is the Beta density function. Using the multivariate version of the Central Limit Theorem we may show that $n^{-1/2}U_n(\theta) \xrightarrow{\mathcal{D}} N[0, I(\theta)]$. It then follows from the Slutsky theorem that

$$\sqrt{n}(\hat\theta_n - \theta) \xrightarrow{\mathcal{D}} N\left(0, [I(\theta)]^{-1}\right). \tag{2.25}$$

For a detailed proof of these results see Sen and Singer (1993), pp. 205-209. More specifically, for the Beta distribution the Fisher information matrix is given by

$$I(\theta) = \begin{pmatrix} \psi'(a) - \psi'(a+b) & -\psi'(a+b) \\ -\psi'(a+b) & \psi'(b) - \psi'(a+b) \end{pmatrix}, \tag{2.26}$$

where $\psi(\cdot)$ is the digamma function (the derivative of the logarithm of the gamma function) and $\psi'(\cdot)$ is known as the trigamma function (the second derivative of the logarithm of the gamma function).
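The Fisher information (2.26) needs only the trigamma function. To keep the sketch dependency-free, it evaluates $\psi'$ with the recurrence $\psi'(x) = \psi'(x+1) + 1/x^2$, followed by the standard asymptotic series. A library routine such as SciPy's polygamma could be used instead. The function names and inputs are illustrative assumptions.

```python
import math

def trigamma(x):
    """Trigamma function psi'(x) for x > 0, standard library only.

    Applies psi'(x) = psi'(x + 1) + 1/x**2 until x >= 6, then the asymptotic
    series 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9).
    """
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    tail = inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return acc + inv + 0.5 * inv2 + tail

def beta_fisher_information(a, b):
    """Per-observation Fisher information matrix (2.26) for theta = (a, b)."""
    t_ab = trigamma(a + b)
    return [[trigamma(a) - t_ab, -t_ab],
            [-t_ab, trigamma(b) - t_ab]]

def mle_asymptotic_cov(a, b, n):
    """Approximate Cov(a_hat, b_hat) = [I(theta)]^{-1} / n, following (2.25)."""
    (i11, i12), (i21, i22) = beta_fisher_information(a, b)
    det = i11 * i22 - i12 * i21
    return [[i22 / (n * det), -i12 / (n * det)],
            [-i21 / (n * det), i11 / (n * det)]]

if __name__ == "__main__":
    cov = mle_asymptotic_cov(8.0, 4.0, n=100)   # illustrative a, b and sample size
    print("SE(a_hat) ~", math.sqrt(cov[0][0]), " SE(b_hat) ~", math.sqrt(cov[1][1]))
```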
2.4.2 Estimation of the Parameters of the Model
First, note that the approximate models given in Section 2.3 can be expressed as the usual GLM for a logistic model

$$\operatorname{logit}[\pi^*(v)] \approx \alpha^* + \beta^* v, \tag{2.27}$$

where $\alpha^*$ and $\beta^*$ are functions of the corresponding parameters of the Beta distribution, $(\gamma, \eta)$, and of the parameters of interest $(\alpha, \beta)$. The functions $\alpha^*$ and $\beta^*$ have different forms for each of the approximate models:
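a) Low Dose

$$\alpha^* = \alpha, \quad\text{and}\quad \beta^* = \beta\gamma. \tag{2.28}$$

b) Moderate Dose
(2.29)
c) High Dose

$$\alpha^* = \alpha + \log\left[\frac{2}{\beta^2\eta}\right], \quad\text{and}\quad \beta^* = \beta\gamma. \tag{2.30}$$

Now suppose that there are $k \ge 2$ dose levels $v_1, \dots, v_k$, and that corresponding to the $i$th dose level there are $n_i$ binary responses $Y_{ij}$, $j = 1, \dots, n_i$, $1 \le i \le k$. Let

$$p_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij} \tag{2.31}$$

and

$$Z_i = \log\left[\frac{p_i}{1 - p_i}\right]. \tag{2.32}$$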
Now, one may obtain estimators for $\alpha^*$ and $\beta^*$ that are linear in the $Z_i$ by using the weighted least-squares method, that is, by minimizing a measure of dispersion of the form

$$Q_n(\alpha^*, \beta^*) = \sum_{i=1}^k w_i\,[Z_i - \alpha^* - \beta^* v_i]^2, \tag{2.33}$$

with

$$w_i = n_i p_i (1 - p_i). \tag{2.34}$$

The corresponding estimators for $\alpha^*$ and $\beta^*$ are called minimum logit estimators. They are linear in the $Z_i$'s, with coefficients depending on the $v_i$ and $p_i$ as follows
(2.35)
and
k
7l =L
i=l
(2.36)
with
and
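The asymptotic multinormality can be easily shown. Recall that the $n_i p_i$ have binomial distributions, which can be approximated by normal distributions. Therefore, we can use the delta method to verify that

$$\sqrt{n_i}\left(Z_i - \operatorname{logit}[\pi(v_i)]\right) \xrightarrow{\mathcal{D}} N\left(0, \{\pi(v_i)[1 - \pi(v_i)]\}^{-1}\right), \quad i = 1, \dots, k, \tag{2.37}$$

and these statistics are independent. Let

$$Z = (Z_1, \dots, Z_k)^t \quad\text{and}\quad \mu = (\alpha^* + \beta^* v_1, \dots, \alpha^* + \beta^* v_k)^t.$$

Then we can state that the $Z_i$'s are asymptotically multivariate normal,

$$\sqrt{N}(Z - \mu) \xrightarrow{\mathcal{D}} N(0, \Sigma), \tag{2.38}$$

with

$$\Sigma = N\cdot\operatorname{diag}\left(\{n_1\pi(v_1)[1-\pi(v_1)]\}^{-1}, \dots, \{n_k\pi(v_k)[1-\pi(v_k)]\}^{-1}\right),$$

where $N = \sum_{i=1}^k n_i$.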
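A minimal sketch of the minimum logit fit follows. It assumes the Berkson weights $w_i = n_i p_i(1 - p_i)$ for (2.34), which match the variances $\{n_i\pi(v_i)[1-\pi(v_i)]\}^{-1}$ in (2.38). It requires $0 < r_i < n_i$, so that every empirical logit is finite. It reports the usual weighted least squares (co)variances, which are not rescaled by $N$. The names and data are hypothetical.

```python
import math

def minimum_logit(doses, n, r):
    """Weighted least-squares fit of Z_i = logit(p_i) on v_i, minimizing (2.33).

    Assumes Berkson weights w_i = n_i p_i (1 - p_i) and 0 < r_i < n_i.
    Returns (alpha*, beta*, Var(alpha*), Var(beta*), Cov(alpha*, beta*)).
    """
    p = [ri / ni for ri, ni in zip(r, n)]
    z = [math.log(pi / (1.0 - pi)) for pi in p]
    w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
    sw = sum(w)
    vbar = sum(wi * vi for wi, vi in zip(w, doses)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (vi - vbar) ** 2 for wi, vi in zip(w, doses))
    sxz = sum(wi * (vi - vbar) * (zi - zbar) for wi, vi, zi in zip(w, doses, z))
    beta_star = sxz / sxx
    alpha_star = zbar - beta_star * vbar
    # WLS (co)variances when Var(Z_i) is approximately 1 / w_i
    var_beta = 1.0 / sxx
    var_alpha = 1.0 / sw + vbar ** 2 * var_beta
    cov_ab = -vbar * var_beta
    return alpha_star, beta_star, var_alpha, var_beta, cov_ab

if __name__ == "__main__":
    doses = [0.5, 1.0, 1.5, 2.0, 2.5]   # illustrative data only
    n = [50, 50, 50, 50, 50]
    r = [8, 13, 21, 30, 38]
    a_s, b_s, va, vb, cab = minimum_logit(doses, n, r)
    print(f"alpha* = {a_s:.3f} (SE {math.sqrt(va):.3f}), beta* = {b_s:.3f} (SE {math.sqrt(vb):.3f})")
```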
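Now we can use a multivariate version of the Delta method (Theorem 3.4.5, Sen and Singer, 1993) to conclude that $\hat\alpha^*$ and $\hat\beta^*$ are asymptotically normal
where
with the vector-valued function $g: \mathbb{R}^k \to \mathbb{R}^2$
$g(Z) =$
Therefore, we have
(2.39)

$$G(\mu)\,\Sigma\,G(\mu)^t = N\cdot\begin{pmatrix}\operatorname{Var}(\hat\alpha^*) & \operatorname{Cov}(\hat\alpha^*, \hat\beta^*)\\ \operatorname{Cov}(\hat\alpha^*, \hat\beta^*) & \operatorname{Var}(\hat\beta^*)\end{pmatrix}$$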
t;{i}
2
Var(1l*) =
{ni 1r(vi)[l-
1r(Vi)]} -1,
(2.40)
k
Var(7/) =
L
(2.41 )
i=l
and
k
cov(a·,7l) =
L
i=l
Now we need to express $\hat\alpha$ and $\hat\beta$ as functions of $(\hat\alpha^*, \hat\beta^*)$ and $(\hat a, \hat b)$. For each case we have
a) Low Dose

$$\hat\alpha = \hat\alpha^*, \quad\text{and}\quad \hat\beta = \frac{\hat\beta^*}{\hat\gamma}. \tag{2.43}$$
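b) Moderate Dose

$$\hat\alpha = \left[\delta(\hat\alpha^*, \hat\gamma, \hat\eta)\right]^{1/3} + \frac{4\hat\gamma^2}{3\hat\eta}\left[\delta(\hat\alpha^*, \hat\gamma, \hat\eta)\right]^{-1/3}, \tag{2.44}$$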
with
and
(2.45)
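c) High Dose

$$\hat\alpha = \hat\alpha^* - \log\left(\frac{2\hat\gamma^2}{\hat\beta^{*2}\hat\eta}\right), \tag{2.46}$$

and

$$\hat\beta = \frac{\hat\beta^*}{\hat\gamma}. \tag{2.47}$$

Recall that $\hat\gamma = \hat a/(\hat a + \hat b)$ and $\hat\eta = \hat a\hat b/[(\hat a + \hat b)^2(\hat a + \hat b + 1)]$. Since $(\hat\alpha^*, \hat\beta^*)$ and $(\hat a, \hat b)$ are linear functions, a linearization of $\hat\alpha$ and $\hat\beta$ can be easily obtained by retaining first-order terms in a Taylor series expansion.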
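So far, we have shown that $(\hat\alpha^*, \hat\beta^*)$ and $(\hat a, \hat b)$ are asymptotically normal. Now, since $(\hat\alpha^*, \hat\beta^*)$ and $(\hat a, \hat b)$ are independent, they may be considered jointly asymptotically normal. Therefore, we can use the multivariate version of the Delta method once again to conclude that $\hat\alpha$ and $\hat\beta$ are also asymptotically normally distributed, with parameters depending on each case (2.28), (2.29) or (2.30):

$$\sqrt{N+n}\left[\begin{pmatrix}\hat\alpha\\ \hat\beta\end{pmatrix} - \begin{pmatrix}\alpha\\ \beta\end{pmatrix}\right] \xrightarrow{\mathcal{D}} N[0, \Sigma(\alpha, \beta)].$$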
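For low dose, the asymptotic variance is given by

$$\Sigma(\alpha, \beta) = (N+n)\begin{pmatrix}\operatorname{Var}(\hat\alpha^*) & \dfrac{\operatorname{Cov}(\hat\alpha^*, \hat\beta^*)}{\gamma}\\[2mm] \dfrac{\operatorname{Cov}(\hat\alpha^*, \hat\beta^*)}{\gamma} & \dfrac{\operatorname{Var}(\hat\beta^*)}{\gamma^2} + \dfrac{\beta^{*2}\operatorname{Var}(\hat\gamma)}{\gamma^4}\end{pmatrix}. \tag{2.48}$$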
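For high dose,

$$\Sigma(\alpha, \beta) = (N+n)\begin{pmatrix}\operatorname{Var}(\hat\alpha) & \operatorname{Cov}(\hat\alpha, \hat\beta)\\ \operatorname{Cov}(\hat\alpha, \hat\beta) & \operatorname{Var}(\hat\beta)\end{pmatrix},$$

where

$$\operatorname{Var}(\hat\alpha) = \operatorname{Var}(\hat\alpha^*) + \frac{4}{\beta^*}\operatorname{Cov}(\hat\alpha^*, \hat\beta^*) + \frac{4}{\beta^{*2}}\operatorname{Var}(\hat\beta^*) + \frac{4}{\gamma^2}\operatorname{Var}(\hat\gamma) - \frac{4}{\gamma\eta}\operatorname{Cov}(\hat\gamma, \hat\eta) + \frac{\operatorname{Var}(\hat\eta)}{\eta^2}, \tag{2.49}$$

$$\operatorname{Var}(\hat\beta) = \frac{\operatorname{Var}(\hat\beta^*)}{\gamma^2} + \frac{\beta^{*2}\operatorname{Var}(\hat\gamma)}{\gamma^4}, \tag{2.50}$$

and

$$\operatorname{Cov}(\hat\alpha, \hat\beta) = \frac{\operatorname{Cov}(\hat\alpha^*, \hat\beta^*)}{\gamma} + \frac{2\operatorname{Var}(\hat\beta^*)}{\gamma\beta^*} + \frac{2\beta^*\operatorname{Var}(\hat\gamma)}{\gamma^3} - \frac{\beta^*\operatorname{Cov}(\hat\gamma, \hat\eta)}{\gamma^2\eta}, \tag{2.51}$$

where $\operatorname{Var}(\hat\alpha^*)$, $\operatorname{Var}(\hat\beta^*)$ and $\operatorname{Cov}(\hat\alpha^*, \hat\beta^*)$ were presented as expressions (2.40), (2.41) and (2.42), respectively. $\operatorname{Var}(\hat\gamma)$, $\operatorname{Var}(\hat\eta)$ and $\operatorname{Cov}(\hat\gamma, \hat\eta)$ can be obtained from the corresponding Fisher information matrix (expression (2.26)) based on the appropriate transformation.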
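The delta-method step can be written generically as $J\Sigma J^t$. Here $\Sigma$ is block diagonal because $(\hat\alpha^*, \hat\beta^*)$ and $(\hat\gamma, \hat\eta)$ come from independent samples. The sketch assumes the following maps:

- low dose: $\alpha = \alpha^*$, $\beta = \beta^*/\gamma$;
- high dose: $\alpha = \alpha^* - \log[2\gamma^2/(\beta^{*2}\eta)]$, $\beta = \beta^*/\gamma$.

It checks the analytic Jacobian against central differences. All inputs and helper names are illustrative.

```python
import math

def low_dose_map(theta):
    """(alpha*, beta*, gamma, eta) -> (alpha, beta) for the low-dose model (2.43)."""
    a_s, b_s, g, _ = theta
    return [a_s, b_s / g]

def high_dose_map(theta):
    """(alpha*, beta*, gamma, eta) -> (alpha, beta) for the high-dose model (reading of (2.46)-(2.47))."""
    a_s, b_s, g, e = theta
    return [a_s - math.log(2 * g ** 2 / (b_s ** 2 * e)), b_s / g]

def high_dose_jacobian(theta):
    """Analytic Jacobian of high_dose_map; rows are the gradients of alpha and beta."""
    a_s, b_s, g, e = theta
    return [[1.0, 2.0 / b_s, -2.0 / g, 1.0 / e],
            [0.0, 1.0 / g, -b_s / g ** 2, 0.0]]

def numeric_jacobian(f, theta, h=1e-6):
    """Central-difference Jacobian, used to cross-check the analytic one."""
    cols = []
    for j in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[j] += h
        dn[j] -= h
        cols.append([(x - y) / (2 * h) for x, y in zip(f(up), f(dn))])
    return [[cols[j][i] for j in range(len(theta))] for i in range(len(cols[0]))]

def joint_cov(cov_star, cov_ge):
    """Block-diagonal covariance of (alpha*, beta*, gamma, eta) from two independent samples."""
    return [cov_star[0] + [0.0, 0.0], cov_star[1] + [0.0, 0.0],
            [0.0, 0.0] + cov_ge[0], [0.0, 0.0] + cov_ge[1]]

def delta_cov(jac, cov):
    """Delta-method covariance J Sigma J^t."""
    k = len(cov)
    return [[sum(jac[r][i] * cov[i][j] * jac[s][j] for i in range(k) for j in range(k))
             for s in range(len(jac))] for r in range(len(jac))]

if __name__ == "__main__":
    theta = [-1.0, 0.8, 0.6, 0.02]               # illustrative estimates
    cov_star = [[0.04, -0.01], [-0.01, 0.005]]   # illustrative Cov(alpha*, beta*)
    cov_ge = [[1e-4, 2e-6], [2e-6, 4e-6]]        # illustrative Cov(gamma, eta)
    S = delta_cov(high_dose_jacobian(theta), joint_cov(cov_star, cov_ge))
    print("Var(alpha_hat) =", S[0][0], " Var(beta_hat) =", S[1][1], " Cov =", S[0][1])
```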
Straightforward but lengthy computations lead to expressions for the asymptotic variance for the moderate dose case. We do not present these here; the computations are similar to those presented for low and high dose.
In sum, we have shown that $(\hat\alpha, \hat\beta)$ can be expressed as linear functionals and that asymptotic normality holds. Therefore, based on Mammen (1992), we can conclude that the bootstrap is consistent.
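Given the consistency result, a bootstrap for the low-dose model might look like the sketch below. It is a hypothetical implementation, not the author's procedure. Responses are resampled within each dose group, and the compliance sample is resampled independently. A 0.5 continuity correction keeps resampled logits finite. $\gamma$ is estimated by the sample mean of the compliance values rather than by maximum likelihood.

```python
import math
import random

def fit_low_dose(doses, groups, compliance):
    """Low-dose fit: minimum-logit alpha*, beta* (Berkson weights), then
    alpha_hat = alpha*_hat and beta_hat = beta*_hat / gamma_hat, as in (2.28) and (2.43).
    Assumptions: a 0.5 continuity correction, and gamma_hat = mean of the compliance sample."""
    z, w = [], []
    for ys in groups:
        n, r = len(ys), sum(ys)
        p = (r + 0.5) / (n + 1.0)
        z.append(math.log(p / (1.0 - p)))
        w.append(n * p * (1.0 - p))
    sw = sum(w)
    vbar = sum(wi * v for wi, v in zip(w, doses)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (v - vbar) ** 2 for wi, v in zip(w, doses))
    beta_s = sum(wi * (v - vbar) * (zi - zbar) for wi, v, zi in zip(w, doses, z)) / sxx
    alpha_s = zbar - beta_s * vbar
    gamma_hat = sum(compliance) / len(compliance)
    return alpha_s, beta_s / gamma_hat

def bootstrap(doses, groups, compliance, reps=500, seed=1):
    """Nonparametric bootstrap: resample responses within each dose group and,
    independently, the compliance sample; refit each replicate."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        g_star = [[rng.choice(ys) for _ in ys] for ys in groups]
        u_star = [rng.choice(compliance) for _ in compliance]
        draws.append(fit_low_dose(doses, g_star, u_star))
    return draws

def percentile_ci(values, level=0.95):
    """Simple percentile interval from bootstrap replicates."""
    s = sorted(values)
    lo = int(math.floor((1 - level) / 2 * (len(s) - 1)))
    hi = int(math.ceil((1 + level) / 2 * (len(s) - 1)))
    return s[lo], s[hi]

if __name__ == "__main__":
    rng = random.Random(2024)
    alpha, beta, a, b = -2.0, 1.5, 8.0, 4.0   # illustrative "true" values
    doses = [0.5, 1.0, 1.5, 2.0]
    groups = [[int(rng.random() < 1 / (1 + math.exp(-(alpha + beta * rng.betavariate(a, b) * v))))
               for _ in range(200)] for v in doses]
    compliance = [rng.betavariate(a, b) for _ in range(100)]
    print("estimate:", fit_low_dose(doses, groups, compliance))
    draws = bootstrap(doses, groups, compliance)
    print("95% percentile CI for beta:", percentile_ci([d[1] for d in draws]))
```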
CHAPTER III
BIVARIATE LOGISTIC MODEL UNDER
COMPLIANCE MEASUREMENT ERROR
3.1 Introduction
In many research studies, it is common to observe several responses for each experimental unit. Sometimes the effect of an agent on two different outcomes is of interest. Examples are cancer in two different organs, a main health effect and a side effect, or two different health effects such as cancer and asthma. In addition, one might also be interested in describing the association between responses. In such cases a multivariate approach is needed to account for correlated observations.
The usual bivariate logistic model is such that the marginal models are given by the classical univariate logistic model. As we mentioned before, these models assume that the true regressors are known without appreciable error, so that the logit transformation leads to a linear relationship with the dose. This relationship is disturbed by the presence of measurement error. Here, we are interested in studying the impact of measurement error of the compliance type on the usual bivariate logistic model.
In this chapter, we extend the models proposed in chapter II to accommodate a bivariate outcome. In this situation, the marginal models reduce to the models in chapter II. The bivariate model should account for dependencies between the outcomes of interest, as well as for the problem of compliance measurement error affecting the dose. We consider suitable adjustments for compliance, and develop appropriate approximations for the bivariate logistic model.
Following the same ideas used in chapter II, these compliance-adjusted models are categorized into three types: (i) low (or near zero) dose levels, (ii) moderate dose levels, and (iii) high dose levels. The compliance factor is modeled using the Beta distribution, as described in Section 2.2. Based on this, suitable approximations are proposed for each case (i), (ii) and (iii), and appropriate statistical procedures are considered.
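3.2 Development of the Models
In the bivariate setting, the response variable is composed of two binary outcomes, so $Y = (Y_1, Y_2)^t$ is a vector of binary responses. For a regressor $X = x$, the bivariate logistic model assumes

$$\pi_{00}(x; \beta) = F(Y_1, Y_2) = \left[1 + e^{-(\alpha_1 + \beta_1 x)} + e^{-(\alpha_2 + \beta_2 x)}\right]^{-1}, \tag{3.1}$$

with $\beta = (\alpha_1, \beta_1, \alpha_2, \beta_2)^t$. This bivariate model is such that the marginal models are univariate logistic,

$$\pi_{0\cdot}(x; \beta) = \left[1 + e^{-(\alpha_1 + \beta_1 x)}\right]^{-1} \tag{3.2}$$

and

$$\pi_{\cdot 0}(x; \beta) = \left[1 + e^{-(\alpha_2 + \beta_2 x)}\right]^{-1}. \tag{3.3}$$

The remaining quadrant probabilities can then be easily obtained as follows:

$$\pi_{01}(x; \beta) = \pi_{0\cdot}(x; \beta) - \pi_{00}(x; \beta) = \frac{e^{-(\alpha_2+\beta_2 x)}}{\left[1 + e^{-(\alpha_1+\beta_1 x)}\right]\left[1 + e^{-(\alpha_1+\beta_1 x)} + e^{-(\alpha_2+\beta_2 x)}\right]}, \tag{3.4}$$

$$\pi_{10}(x; \beta) = \pi_{\cdot 0}(x; \beta) - \pi_{00}(x; \beta) = \frac{e^{-(\alpha_1+\beta_1 x)}}{\left[1 + e^{-(\alpha_2+\beta_2 x)}\right]\left[1 + e^{-(\alpha_1+\beta_1 x)} + e^{-(\alpha_2+\beta_2 x)}\right]}, \tag{3.5}$$

and

$$\pi_{11}(x; \beta) = 1 - \pi_{01}(x; \beta) - \pi_{10}(x; \beta) - \pi_{00}(x; \beta) = \frac{\left[2 + e^{-(\alpha_1+\beta_1 x)} + e^{-(\alpha_2+\beta_2 x)}\right]e^{-(\alpha_1+\beta_1 x)-(\alpha_2+\beta_2 x)}}{\left[1 + e^{-(\alpha_1+\beta_1 x)}\right]\left[1 + e^{-(\alpha_2+\beta_2 x)}\right]\left[1 + e^{-(\alpha_1+\beta_1 x)} + e^{-(\alpha_2+\beta_2 x)}\right]}. \tag{3.6}$$

Then the odds ratio can be used to measure the association between $Y_1$ and $Y_2$.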
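The quadrant probabilities are easy to compute directly from (3.1)-(3.3). A short algebraic consequence of (3.1)-(3.6), checked numerically in the test, is that the odds ratio $\pi_{00}\pi_{11}/(\pi_{01}\pi_{10})$ equals $2 + e^{-(\alpha_1+\beta_1 x)} + e^{-(\alpha_2+\beta_2 x)}$. The sketch uses hypothetical names. The usage applies the low-dose adjustment (3.9) with an assumed value of $\gamma$.

```python
import math

def bivariate_cells(x, a1, b1, a2, b2):
    """Quadrant probabilities (pi00, pi01, pi10, pi11) of the bivariate logistic model (3.1)-(3.6)."""
    A = math.exp(-(a1 + b1 * x))
    B = math.exp(-(a2 + b2 * x))
    p00 = 1.0 / (1.0 + A + B)     # (3.1)
    p0_ = 1.0 / (1.0 + A)         # marginal (3.2)
    p_0 = 1.0 / (1.0 + B)         # marginal (3.3)
    p01 = p0_ - p00               # (3.4)
    p10 = p_0 - p00               # (3.5)
    p11 = 1.0 - p00 - p01 - p10   # (3.6)
    return p00, p01, p10, p11

def odds_ratio(x, a1, b1, a2, b2):
    """Association between Y1 and Y2 at regressor value x."""
    p00, p01, p10, p11 = bivariate_cells(x, a1, b1, a2, b2)
    return (p00 * p11) / (p01 * p10)

if __name__ == "__main__":
    a1, b1, a2, b2, gamma = 0.5, -0.3, 0.1, 0.4, 0.7   # illustrative values only
    # Low-dose adjustment (3.9): alpha_i* = alpha_i, beta_i* = beta_i * gamma, evaluated at dose v.
    for v in (0.5, 1.0, 2.0):
        cells = bivariate_cells(v, a1, b1 * gamma, a2, b2 * gamma)
        print(v, [round(c, 4) for c in cells], round(odds_ratio(v, a1, b1 * gamma, a2, b2 * gamma), 4))
```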
In dose-response analysis, we assume that the models presented above hold for $X = D$, which is stochastic in nature and may differ from the administered dose $V$. Recall that in the stochastic compliance case, we are using a model of the form

$$D = UV, \quad 0 \le U \le 1,$$

for the relationship between the administered dose ($V$) and the actual intake dose ($D$). We also assume that the compliance factor ($U$) has a Beta distribution, i.e. $U \sim \text{Beta}(a, b)$, with $E(U) = \gamma$ and $\operatorname{Var}(U) = \eta$, as described in Section 2.2.
We can then use the approximate models developed in chapter II to account for measurement error in the bivariate model presented here. The models in chapter II are taken to be the marginal models for our bivariate model. From these marginal models, we define the bivariate model and the corresponding quadrant probabilities. Let us define the adjusted approximate model for the bivariate logistic as follows:

$$\pi_{00}(v; \beta^*) = F^*(Y_1, Y_2) = \left[1 + e^{-(\alpha_1^* + \beta_1^* v)} + e^{-(\alpha_2^* + \beta_2^* v)}\right]^{-1}, \tag{3.8}$$

with $\beta^* = (\alpha_1^*, \beta_1^*, \alpha_2^*, \beta_2^*)^t$. The elements of $\beta^*$ are functions of the parameters of the Beta distribution, $(\gamma, \eta)$, and of the parameters of interest $\beta$. Based on the models in chapter II as marginal models, we have
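a) Low Dose

$$\alpha_i^* = \alpha_i, \quad\text{and}\quad \beta_i^* = \beta_i\gamma, \quad\text{with } i \in \{1, 2\}. \tag{3.9}$$

b) Moderate Dose

$$\alpha_i^* = \alpha_i + \tfrac{1}{4}\eta\beta_i^3\gamma v_{1i}^3, \quad\text{and}\quad \beta_i^* = \beta_i\gamma - \tfrac{1}{4}\eta\beta_i^3\gamma v_{1i}^2, \quad\text{with } i \in \{1, 2\}, \tag{3.10}$$

where $v_{1i}$ denotes the ED50 of the $i$th marginal model.
c) High Dose

$$\alpha_i^* = \alpha_i + \log\left[\frac{2}{\beta_i^2\eta}\right], \quad\text{and}\quad \beta_i^* = \beta_i\gamma, \quad\text{with } i \in \{1, 2\}. \tag{3.11}$$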
Similarly, we define each of the remaining quadrant probabilities, $\pi_{01}(v; \beta^*)$, $\pi_{10}(v; \beta^*)$ and $\pi_{11}(v; \beta^*)$, as well as the adjusted odds ratio, based on these adjusted quantities.
3.3 Estimation
We can obtain maximum likelihood estimators for $\beta^*$ by maximizing the corresponding likelihood function. First, suppose that there are $k$ independent subsamples, where the $i$th subsample is characterized by the dose level $v_i$. Within the $i$th subsample, we have $n_i$ independent observations. The corresponding counts are $n_{js}^{(i)}$, the number of occurrences of $(j, s)$ in the $i$th subsample. Given $v_i$, these counts are multinomially distributed with $n_i$ replications and probability vector $\pi_i = \{\pi_{js}^{(i)}(v_i; \beta^*)\}$. Then the log-likelihood function is given, up to an additive constant, by

$$l(\beta^*) = \sum_{i=1}^k \sum_{j=0}^1 \sum_{s=0}^1 n_{js}^{(i)}\log\pi_{js}^{(i)}(v_i; \beta^*), \tag{3.12}$$

and the score function is

$$U(\beta^*) = \frac{\partial l(\beta^*)}{\partial\beta^*} = \sum_{i=1}^k\left\{\sum_{j=0}^1\sum_{s=0}^1 \frac{n_{js}^{(i)}}{\pi_{js}^{(i)}(v_i; \beta^*)}\,\frac{\partial\pi_{js}^{(i)}(v_i; \beta^*)}{\partial\beta^*}\right\}. \tag{3.13}$$

We follow the results of Sen and Singer (1993, pp. 254-255) to obtain the Fisher information matrix for each subsample
(3.14)
(details on this expression are provided in the appendix). The Fisher information matrix for the entire sample is then given by

$$I_N(\beta^*) = \frac{1}{N}\sum_{i=1}^k n_i I_i(\beta^*), \tag{3.15}$$

where $N = \sum_{i=1}^k n_i$. The maximum likelihood estimators can then be obtained iteratively, and we can conclude that

$$\sqrt{N}(\hat\beta^* - \beta^*) \xrightarrow{\mathcal{D}} N\left(0, [I_N(\beta^*)]^{-1}\right). \tag{3.16}$$
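Alternatively, we can combine estimates of $\beta^*$ from each subsample by using a weighted least squares approach. This approach would simplify the iterative process required to obtain maximum likelihood estimates based on the entire sample. We may consider

$$Q(\beta^*) = \sum_{i=1}^k (\hat\beta_i^* - \beta^*)^t\, n_i I_i(\hat\beta_i^*)\,(\hat\beta_i^* - \beta^*), \tag{3.17}$$

where $\hat\beta_i^*$ is the maximum likelihood estimate of $\beta^*$ based on the $i$th subsample. Now let

$$\hat S = \begin{pmatrix}\hat\beta_1^*\\ \vdots\\ \hat\beta_k^*\end{pmatrix} \quad\text{and}\quad S^* = \mathbf{1}_{k\times 1}\otimes\beta^*,$$

where $\mathbf{1}_{k\times 1}$ is a $k \times 1$ matrix with all elements equal to 1. Then, by minimizing $Q(\beta^*)$ with respect to $\beta^*$, we obtain a weighted least squares estimate of $\beta^*$,

$$\hat\beta^*_{WLS} = \sum_{i=1}^k A_i(\hat S)\,\hat\beta_i^*, \tag{3.18}$$

with

$$A_i(\hat S) = \left[\sum_{l=1}^k n_l I_l(\hat\beta_l^*)\right]^{-1} n_i I_i(\hat\beta_i^*). \tag{3.19}$$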
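Setting the gradient of (3.17) to zero gives $\hat\beta^*_{WLS} = [\sum_i n_i I_i(\hat\beta_i^*)]^{-1}\sum_i n_i I_i(\hat\beta_i^*)\hat\beta_i^*$, which is an information-weighted average of the subsample estimates. The sketch below implements that combination with a small Gauss-Jordan solver. The per-subsample information matrices are assumed to be supplied by the caller, and the helper names are hypothetical.

```python
def solve(mat, rhs):
    """Solve mat * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(mat)
    aug = [[float(x) for x in row] + [float(r)] for row, r in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def wls_combine(estimates, infos, sizes):
    """beta*_WLS = [sum_i n_i I_i]^{-1} sum_i n_i I_i beta_i, the minimizer of (3.17)."""
    p = len(estimates[0])
    total = [[0.0] * p for _ in range(p)]
    rhs = [0.0] * p
    for b_i, I_i, n_i in zip(estimates, infos, sizes):
        Ib = matvec(I_i, b_i)
        for r in range(p):
            rhs[r] += n_i * Ib[r]
            for c in range(p):
                total[r][c] += n_i * I_i[r][c]
    return solve(total, rhs)

if __name__ == "__main__":
    # Two subsamples with illustrative 2-parameter estimates and information matrices.
    est = [[-1.9, 0.95], [-2.1, 1.05]]
    info = [[[0.20, 0.05], [0.05, 0.10]], [[0.25, 0.06], [0.06, 0.12]]]
    print(wls_combine(est, info, [100, 150]))
```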
Now, before studying the distributional properties of $\hat\beta^*_{WLS}$, first note that

$$\begin{aligned}\sqrt{N}(\hat\beta^*_{WLS} - \beta^*) &= \sqrt{N}\left[\sum_{i=1}^k A_i(\hat S)\hat\beta_i^* - \sum_{i=1}^k A_i(S^*)\beta^*\right]\\ &= \sqrt{N}\sum_{i=1}^k\left[A_i(\hat S)\hat\beta_i^* - A_i(S^*)\beta^*\right]\\ &= \sqrt{N}\sum_{i=1}^k\left\{[A_i(\hat S) - A_i(S^*)]\beta^* + A_i(S^*)(\hat\beta_i^* - \beta^*) + [A_i(\hat S) - A_i(S^*)](\hat\beta_i^* - \beta^*)\right\}.\end{aligned} \tag{3.20}$$

We can simplify this expression by replacing $[A_i(\hat S) - A_i(S^*)]\beta^*$ with a first-order Taylor series expansion as follows
(3.21)
with
(3.22)
where $I_m$ is the $m \times m$ identity matrix. Then a first-order approximation for (3.20) is given by
(3.23)
which represents a much simpler function of the $\hat\beta_l^*$'s, $l = 1, \dots, k$.
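Now, since each $\hat\beta_l^*$ is an independent maximum likelihood estimate of $\beta^*$, we can state that the $\hat\beta_l^*$'s are asymptotically multivariate normal,

$$\sqrt{N}(\hat S - S^*) \xrightarrow{\mathcal{D}} N(0, \Sigma), \tag{3.24}$$

with

$$\Sigma = N\cdot\operatorname{diag}\left\{[n_1 I_1(\beta^*)]^{-1}, \dots, [n_k I_k(\beta^*)]^{-1}\right\}. \tag{3.25}$$

Therefore, we can use a multivariate version of the Delta method (Theorem 3.4.5, Sen and Singer, 1993) to conclude that $\hat\beta^*_{WLS}$ is asymptotically normal,

$$\sqrt{N}(\hat\beta^*_{WLS} - \beta^*) \xrightarrow{\mathcal{D}} N\left[0, G(S^*)\,\Sigma\,G(S^*)^t\right], \tag{3.26}$$

where

$$G(S^*) = \frac{\partial}{\partial\hat S^t}\left[g(\hat S)\right]\Big|_{\hat S = S^*}, \tag{3.27}$$

with
(3.28)
Then we have

$$G(S^*)\,\Sigma\,G(S^*)^t = N\sum_{l=1}^k\left\{\left[\sum_{i=1}^k H_{il}(S^*) + A_l(S^*)\right][n_l I_l(\beta^*)]^{-1}\left[\sum_{i=1}^k H_{il}(S^*) + A_l(S^*)\right]^t\right\}. \tag{3.29}$$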
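Now recall that $(a, b)$, the parameters of the Beta distribution, are estimated independently, and that their estimators are asymptotically normal, as proven in Section 2.4. Therefore, we can state that $(\hat a, \hat b)$ and $\hat\beta^*_{WLS}$ are jointly asymptotically normal. A final application of the multivariate Delta method proves the asymptotic normality of $\hat\beta$, which is a function of $(\hat a, \hat b)$ and $\hat\beta^*_{WLS}$. The parameters of the final normal distribution depend on each case (low, moderate or high dose):

$$\sqrt{N+n}\,(\hat\beta - \beta) \xrightarrow{\mathcal{D}} N[0, \Sigma(\beta)], \tag{3.30}$$

with

$$\Sigma(\beta) = (N+n)\begin{pmatrix}\operatorname{Var}(\hat\alpha_1) & \operatorname{Cov}(\hat\alpha_1, \hat\beta_1) & \operatorname{Cov}(\hat\alpha_1, \hat\alpha_2) & \operatorname{Cov}(\hat\alpha_1, \hat\beta_2)\\ & \operatorname{Var}(\hat\beta_1) & \operatorname{Cov}(\hat\beta_1, \hat\alpha_2) & \operatorname{Cov}(\hat\beta_1, \hat\beta_2)\\ & & \operatorname{Var}(\hat\alpha_2) & \operatorname{Cov}(\hat\alpha_2, \hat\beta_2)\\ & & & \operatorname{Var}(\hat\beta_2)\end{pmatrix}$$

(a symmetric matrix; only the upper triangle is shown).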
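For low dose, we have

$$\operatorname{Var}(\hat\alpha_1) = \operatorname{Var}(\hat\alpha_1^*), \tag{3.31}$$

$$\operatorname{Cov}(\hat\alpha_1, \hat\beta_1) = \frac{\operatorname{Cov}(\hat\alpha_1^*, \hat\beta_1^*)}{\gamma}, \tag{3.32}$$

$$\operatorname{Cov}(\hat\alpha_1, \hat\alpha_2) = \operatorname{Cov}(\hat\alpha_1^*, \hat\alpha_2^*), \tag{3.33}$$

$$\operatorname{Cov}(\hat\alpha_1, \hat\beta_2) = \frac{\operatorname{Cov}(\hat\alpha_1^*, \hat\beta_2^*)}{\gamma}, \tag{3.34}$$

$$\operatorname{Var}(\hat\beta_1) = \frac{\operatorname{Var}(\hat\beta_1^*)}{\gamma^2} + \frac{\beta_1^{*2}\operatorname{Var}(\hat\gamma)}{\gamma^4}, \tag{3.35}$$

$$\operatorname{Cov}(\hat\beta_1, \hat\alpha_2) = \frac{\operatorname{Cov}(\hat\beta_1^*, \hat\alpha_2^*)}{\gamma}, \tag{3.36}$$

$$\operatorname{Cov}(\hat\beta_1, \hat\beta_2) = \frac{\operatorname{Cov}(\hat\beta_1^*, \hat\beta_2^*)}{\gamma^2} + \frac{\beta_1^*\beta_2^*\operatorname{Var}(\hat\gamma)}{\gamma^4}, \tag{3.37}$$

$$\operatorname{Var}(\hat\alpha_2) = \operatorname{Var}(\hat\alpha_2^*), \tag{3.38}$$

$$\operatorname{Cov}(\hat\alpha_2, \hat\beta_2) = \frac{\operatorname{Cov}(\hat\alpha_2^*, \hat\beta_2^*)}{\gamma}, \tag{3.39}$$

and

$$\operatorname{Var}(\hat\beta_2) = \frac{\operatorname{Var}(\hat\beta_2^*)}{\gamma^2} + \frac{\beta_2^{*2}\operatorname{Var}(\hat\gamma)}{\gamma^4}. \tag{3.40}$$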
For high dose
(3.41)
(3.43)
(3.45)
(3.47)
(3.48)
and
(3.50)
As before, straightforward but lengthy computations lead to expressions for the asymptotic variance for the moderate dose case. We chose not to present these here; the computations are similar to those presented for low and high dose.
CHAPTER IV
SEGMENTED LOGISTIC MODEL UNDER
COMPLIANCE MEASUREMENT ERROR
4.1 Introduction
Another extension of the classical logistic model is the segmented logistic model. In dose-response analysis under the segmented approach with three segments, a dose is considered low, moderate or high according to two threshold points, $d_1^*$ and $d_2^*$, at which the effect of an agent is believed to change considerably. Under the segmented logistic model, a logistic model is assumed for each segment, but the parameters of the model differ from segment to segment.
In this chapter, we are interested in developing approximations for the segmented logistic model to adjust for stochastic compliance. Up to three segments are considered, and as a result, three situations may arise:

1) $0 < V \le d_1^*$: the administered dose does not exceed the lower threshold point. In this case, only one segment is involved in the model.
2) $d_1^* < V \le d_2^*$: the administered dose exceeds the lower threshold point but not the upper threshold point. In this case, when considering compliance measurement error of the dose, the adjusted model will involve up to two segments.
3) $V > d_2^*$: the administered dose exceeds both threshold points, and therefore all three segments need to be considered in the compliance-adjusted model.

The compliance models considered in this chapter are based on appropriate Beta distributional models for the compliance factor, as in previous chapters. Different approximations are proposed for each of the three cases 1), 2) and 3) above, and based on them, suitable statistical analysis schemes are developed.
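4.2 Development of the Models
Under a segmented model with three segments and threshold points $d_1^*$ and $d_2^*$, we have

$$\pi(d) = \begin{cases} F(\alpha_1 + \beta_1 d), & d \le d_1^*,\\ F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d) - F(\alpha_2 + \beta_2 d_1^*), & d_1^* < d \le d_2^*,\\ F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d_2^*) - F(\alpha_2 + \beta_2 d_1^*) + F(\alpha_3 + \beta_3 d) - F(\alpha_3 + \beta_3 d_2^*), & d > d_2^*. \end{cases} \tag{4.1}$$

The model that relates the response variable $Y$ to the administered dose $V$, while treating $D$ as a random variable, combines the underlying models at each segment. This results in one of three situations: 1) $0 < V \le d_1^*$, 2) $d_1^* < V \le d_2^*$, and 3) $V > d_2^*$.
1) $0 < V \le d_1^*$

$$\pi^*(v) = \int_0^v F(\alpha_1 + \beta_1 d)\, f_{D|V}(d|v)\,dd. \tag{4.2}$$

2) $d_1^* < V \le d_2^*$

$$\pi^*(v) = \int_0^{d_1^*} F(\alpha_1 + \beta_1 d)\, f_{D|V}(d|v)\,dd + \int_{d_1^*}^{v}\left[F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d) - F(\alpha_2 + \beta_2 d_1^*)\right] f_{D|V}(d|v)\,dd. \tag{4.3}$$

3) $V > d_2^*$

$$\begin{aligned}\pi^*(v) = {}& \int_0^{d_1^*} F(\alpha_1 + \beta_1 d)\, f_{D|V}(d|v)\,dd\\ &+ \int_{d_1^*}^{d_2^*}\left[F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d) - F(\alpha_2 + \beta_2 d_1^*)\right] f_{D|V}(d|v)\,dd\\ &+ \int_{d_2^*}^{v}\left[F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d_2^*) - F(\alpha_2 + \beta_2 d_1^*) + F(\alpha_3 + \beta_3 d) - F(\alpha_3 + \beta_3 d_2^*)\right] f_{D|V}(d|v)\,dd.\end{aligned} \tag{4.4}$$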
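Because $D = UV$, the integrals (4.2)-(4.4) are expectations $E[\pi(Uv)]$ under the compliance law. The sketch evaluates the segmented curve (4.1) and that expectation by Simpson's rule. It splits the $u$-range at $d_1^*/v$ and $d_2^*/v$, so that no panel straddles a kink. It assumes $U \sim \text{Beta}(a, b)$ with $a, b \ge 1$. The function names are hypothetical, and the parameter values are illustrative and chosen so that $\pi(d)$ stays below 1.

```python
import math

def logistic(x):
    """Numerically stable logistic CDF."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def segmented_pi(d, params, d1, d2):
    """Segmented logistic curve (4.1); params = [(a1, b1), (a2, b2), (a3, b3)].
    The construction makes the curve continuous at d1 and d2."""
    (a1, b1), (a2, b2), (a3, b3) = params
    F = logistic
    if d <= d1:
        return F(a1 + b1 * d)
    base = F(a1 + b1 * d1)
    if d <= d2:
        return base + F(a2 + b2 * d) - F(a2 + b2 * d1)
    return base + F(a2 + b2 * d2) - F(a2 + b2 * d1) + F(a3 + b3 * d) - F(a3 + b3 * d2)

def beta_pdf(u, a, b):
    """Beta(a, b) density for a, b >= 1."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(u) + (b - 1) * math.log(1.0 - u) - log_norm)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0

def pi_star_segmented(v, params, d1, d2, a, b, n=2000):
    """pi*(v) = E[pi(U v)], U ~ Beta(a, b): (4.2)-(4.4) after substituting d = u v."""
    g = lambda u: segmented_pi(u * v, params, d1, d2) * beta_pdf(u, a, b)
    cuts = [0.0] + [c / v for c in (d1, d2) if c < v] + [1.0]
    return sum(simpson(g, lo, hi, n) for lo, hi in zip(cuts, cuts[1:]))

if __name__ == "__main__":
    params = [(-3.0, 1.0), (-2.0, 0.5), (-1.0, 0.3)]   # illustrative values only
    for v in (1.0, 3.0, 8.0):
        print(v, round(pi_star_segmented(v, params, 2.0, 5.0, 20.0, 5.0), 5))
```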
Corrections are then developed by using approximate models based on Taylor series expansions, similar to those developed for the univariate logistic model in Chapter II. In the logistic case, $F(x) = \exp(x)/[1 + \exp(x)]$, and we consider each situation in turn.
1) $0 < V \le d_1^*$. In this case, only one segment is involved in the model. Therefore, the developments described in Section 2.3 can be directly applied here. Furthermore, if we consider this case a low dose case, we can use a model similar to (2.15) for low dose studies,

$$\operatorname{logit}[\pi^*(v)] \approx \alpha_1 + \beta_1\gamma v \approx \operatorname{logit}[\pi^0_{11}(v)], \tag{4.6}$$

with

$$\pi^0_{11}(v) = \frac{\exp(\alpha_1 + \beta_1\gamma v)}{1 + \exp(\alpha_1 + \beta_1\gamma v)}, \tag{4.7}$$

that is, the logistic model evaluated at $x = E(D|v) = \gamma v$.
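2) $d_1^* < V \le d_2^*$. In this case, two segments are involved in the model. One approach is to develop approximations for the terms corresponding to each segment separately. That is,

$$\pi^*(v) = \pi^*_{12}(v) + \pi^*_{22}(v), \tag{4.8}$$

with

$$\pi^*_{12}(v) = \int_0^{d_1^*} F(\alpha_1 + \beta_1 d)\, f_{D|V}(d|v)\,dd, \tag{4.9}$$

and

$$\pi^*_{22}(v) = \int_{d_1^*}^{v}\left[F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d) - F(\alpha_2 + \beta_2 d_1^*)\right] f_{D|V}(d|v)\,dd. \tag{4.10}$$

An approximation can be obtained by replacing $F(\alpha_1 + \beta_1 d)$ in (4.9) and $F(\alpha_2 + \beta_2 d)$ in (4.10) with their second-order Taylor expansions with respect to $d$ about $E(D|D \le d_1^*)$ and $E(D|d_1^* < D \le v)$, respectively:

$$\begin{aligned}\pi^*_{12}(v) &\approx \int_0^{d_1^*}\Big\{F(\alpha_1 + \beta_1 x)\big|_{x=E(D|D\le d_1^*)} + [d - E(D|D\le d_1^*)]\,\frac{\partial F(\alpha_1 + \beta_1 x)}{\partial x}\Big|_{x=E(D|D\le d_1^*)}\\ &\qquad + \tfrac{1}{2}[d - E(D|D\le d_1^*)]^2\,\frac{\partial^2 F(\alpha_1 + \beta_1 x)}{\partial x^2}\Big|_{x=E(D|D\le d_1^*)}\Big\} f_{D|V}(d|v)\,dd\\ &\approx \left\{\pi^0_{12}(v) + \tfrac{1}{2}v^2\eta_{12}\beta_1^2\pi^0_{12}(v)[1 - \pi^0_{12}(v)][1 - 2\pi^0_{12}(v)]\right\}P_{12},\end{aligned} \tag{4.11}$$

and

$$\begin{aligned}\pi^*_{22}(v) &\approx \int_{d_1^*}^{v}\Big\{F(\alpha_2 + \beta_2 x)\big|_{x=E(D|d_1^*<D\le v)} + [d - E(D|d_1^*<D\le v)]\,\frac{\partial F(\alpha_2 + \beta_2 x)}{\partial x}\Big|_{x=E(D|d_1^*<D\le v)}\\ &\qquad + \tfrac{1}{2}[d - E(D|d_1^*<D\le v)]^2\,\frac{\partial^2 F(\alpha_2 + \beta_2 x)}{\partial x^2}\Big|_{x=E(D|d_1^*<D\le v)} + F(\alpha_1 + \beta_1 d_1^*) - F(\alpha_2 + \beta_2 d_1^*)\Big\} f_{D|V}(d|v)\,dd\\ &\approx \left\{\pi^0_{22}(v) + \tfrac{1}{2}v^2\eta_{22}\beta_2^2\pi^0_{22}(v)[1 - \pi^0_{22}(v)][1 - 2\pi^0_{22}(v)]\right\}P_{22} + \left\{F(\alpha_1 + \beta_1 d_1^*) - F(\alpha_2 + \beta_2 d_1^*)\right\}P_{22},\end{aligned} \tag{4.12}$$

where $\gamma_{12}$, $\eta_{12}$, $\gamma_{22}$ and $\eta_{22}$ are the incomplete Beta means and variances for each segment, respectively:

$$\gamma_{12} = E(U|U \le u_1^*), \quad \eta_{12} = \operatorname{Var}(U|U \le u_1^*), \quad\text{with } u_1^* = d_1^*/v,$$

$$\gamma_{22} = E(U|u_1^* < U \le 1), \quad \eta_{22} = \operatorname{Var}(U|u_1^* < U \le 1),$$

$$\pi^0_{ij}(v) = \frac{\exp(\alpha_i + \beta_i\gamma_{ij}v)}{1 + \exp(\alpha_i + \beta_i\gamma_{ij}v)}, \quad\text{with } i = 1, 2 \text{ and } j = 2,$$

$$P_{12} = \int_0^{d_1^*} f_{D|V}(d|v)\,dd \quad\text{and}\quad P_{22} = \int_{d_1^*}^{v} f_{D|V}(d|v)\,dd. \tag{4.13}$$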
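The approximation (4.11)-(4.12) can be compared with a direct evaluation of (4.3). The sketch computes the incomplete Beta quantities $P_{j2}$, $\gamma_{j2}$, $\eta_{j2}$ of (4.13) numerically. It uses $\operatorname{Var}(D|\cdot) = v^2\eta$, which is our reading of the factor multiplying $\eta$ in the second-order term. The function names and parameter values are hypothetical.

```python
import math

def logistic(x):
    """Numerically stable logistic CDF."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def beta_pdf(u, a, b):
    """Beta(a, b) density for a, b >= 1."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(u) + (b - 1) * math.log(1.0 - u) - log_norm)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0

def truncated_moments(lo, hi, a, b):
    """P(lo < U <= hi), E(U | lo < U <= hi) and Var(U | lo < U <= hi) for U ~ Beta(a, b)."""
    p = simpson(lambda u: beta_pdf(u, a, b), lo, hi)
    m1 = simpson(lambda u: u * beta_pdf(u, a, b), lo, hi) / p
    m2 = simpson(lambda u: u * u * beta_pdf(u, a, b), lo, hi) / p
    return p, m1, m2 - m1 * m1

def second_order_term(alpha, beta, g, eta, v):
    """pi0 + 0.5 v^2 eta beta^2 pi0 (1 - pi0)(1 - 2 pi0), with pi0 = F(alpha + beta * g * v)."""
    p0 = logistic(alpha + beta * g * v)
    return p0 + 0.5 * v * v * eta * beta * beta * p0 * (1 - p0) * (1 - 2 * p0)

def approx_case2(v, a1, b1, a2, b2, d1, a, b):
    """Second-order approximation (4.11) + (4.12) for d1 < v <= d2."""
    u1 = d1 / v
    P12, g12, e12 = truncated_moments(0.0, u1, a, b)
    P22, g22, e22 = truncated_moments(u1, 1.0, a, b)
    shift = logistic(a1 + b1 * d1) - logistic(a2 + b2 * d1)
    return (second_order_term(a1, b1, g12, e12, v) * P12
            + (second_order_term(a2, b2, g22, e22, v) + shift) * P22)

def exact_case2(v, a1, b1, a2, b2, d1, a, b):
    """Direct numerical evaluation of (4.3) with D = U v."""
    u1 = d1 / v
    seg1 = lambda u: logistic(a1 + b1 * u * v) * beta_pdf(u, a, b)
    seg2 = lambda u: (logistic(a1 + b1 * d1) + logistic(a2 + b2 * u * v)
                      - logistic(a2 + b2 * d1)) * beta_pdf(u, a, b)
    return simpson(seg1, 0.0, u1) + simpson(seg2, u1, 1.0)

if __name__ == "__main__":
    args = (3.0, -3.0, 1.0, -3.5, 1.2, 2.0, 20.0, 5.0)   # v, a1, b1, a2, b2, d1*, a, b (illustrative)
    print("exact:", exact_case2(*args), " approx:", approx_case2(*args))
```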
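We can further expand $\pi^*(v)$ by expanding (4.11) and (4.12) with respect to $v$. We expand (4.11) about $v = 0$, and (4.12) about $v = v_1$, with $v_1$ such that $\pi^0_{22}(v_1) = 1/2$. A first-order expansion leads to

$$\pi^*_{12}(v) \approx P_{12}\left\{\pi^0_{12}(0) + \beta_1\gamma_{12}\pi^0_{12}(0)[1 - \pi^0_{12}(0)]v\right\}, \tag{4.14}$$

$$\pi^*_{22}(v) \approx P_{22}\left\{\tfrac{1}{2} + F(\alpha_1 + \beta_1 d_1^*) - F(\alpha_2 + \beta_2 d_1^*) + (v - v_1)\left[\tfrac{1}{4}\beta_2\gamma_{22} - \tfrac{1}{16}\beta_2^3\gamma_{22}\eta_{22}v_1^2\right]\right\}. \tag{4.15}$$

Therefore, the final model is given by

$$\begin{aligned}\pi^*(v) \approx {}& \pi^0_{12}(0)P_{12} + \left\{\tfrac{1}{2} + F(\alpha_1 + \beta_1 d_1^*) - F(\alpha_2 + \beta_2 d_1^*) - v_1\left[\tfrac{1}{4}\beta_2\gamma_{22} - \tfrac{1}{16}\beta_2^3\gamma_{22}\eta_{22}v_1^2\right]\right\}P_{22}\\ &+ \left\{\beta_1\gamma_{12}\pi^0_{12}(0)[1 - \pi^0_{12}(0)]P_{12} + \left[\tfrac{1}{4}\beta_2\gamma_{22} - \tfrac{1}{16}\beta_2^3\gamma_{22}\eta_{22}v_1^2\right]P_{22}\right\}v.\end{aligned} \tag{4.16}$$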
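3) $V > d_2^*$. In this case, all three segments are involved in the model. We now have

$$\pi^*(v) = \pi^*_{13}(v) + \pi^*_{23}(v) + \pi^*_{33}(v), \tag{4.17}$$

with

$$\pi^*_{13}(v) = \int_0^{d_1^*} F(\alpha_1 + \beta_1 d)\, f_{D|V}(d|v)\,dd, \tag{4.18}$$

$$\pi^*_{23}(v) = \int_{d_1^*}^{d_2^*}\left[F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d) - F(\alpha_2 + \beta_2 d_1^*)\right] f_{D|V}(d|v)\,dd, \tag{4.19}$$

and

$$\pi^*_{33}(v) = \int_{d_2^*}^{v}\left[F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d_2^*) - F(\alpha_2 + \beta_2 d_1^*) + F(\alpha_3 + \beta_3 d) - F(\alpha_3 + \beta_3 d_2^*)\right] f_{D|V}(d|v)\,dd. \tag{4.20}$$

An approximation can be obtained by replacing $F(\alpha_1 + \beta_1 d)$, $F(\alpha_2 + \beta_2 d)$ and $F(\alpha_3 + \beta_3 d)$ with second-order Taylor expansions with respect to $d$ about the corresponding incomplete Beta means,

$$\pi^*_{13}(v) \approx \left\{\pi^0_{13}(v) + \tfrac{1}{2}v^2\eta_{13}\beta_1^2\pi^0_{13}(v)[1 - \pi^0_{13}(v)][1 - 2\pi^0_{13}(v)]\right\}P_{13}, \tag{4.21}$$

$$\pi^*_{23}(v) \approx \left\{\pi^0_{23}(v) + \tfrac{1}{2}v^2\eta_{23}\beta_2^2\pi^0_{23}(v)[1 - \pi^0_{23}(v)][1 - 2\pi^0_{23}(v)]\right\}P_{23} + \left\{F(\alpha_1 + \beta_1 d_1^*) - F(\alpha_2 + \beta_2 d_1^*)\right\}P_{23}, \tag{4.22}$$

and

$$\begin{aligned}\pi^*_{33}(v) \approx {}& \left\{\pi^0_{33}(v) + \tfrac{1}{2}v^2\eta_{33}\beta_3^2\pi^0_{33}(v)[1 - \pi^0_{33}(v)][1 - 2\pi^0_{33}(v)]\right\}P_{33}\\ &+ \left\{F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d_2^*) - F(\alpha_2 + \beta_2 d_1^*) - F(\alpha_3 + \beta_3 d_2^*)\right\}P_{33},\end{aligned} \tag{4.23}$$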
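where

$$\gamma_{13} = E(U|U \le u_1^*), \quad \eta_{13} = \operatorname{Var}(U|U \le u_1^*),$$

$$\gamma_{23} = E(U|u_1^* < U \le u_2^*), \quad \eta_{23} = \operatorname{Var}(U|u_1^* < U \le u_2^*),$$

$$\gamma_{33} = E(U|u_2^* < U \le 1), \quad \eta_{33} = \operatorname{Var}(U|u_2^* < U \le 1),$$

with $u_j^* = d_j^*/v$,

$$\pi^0_{ij}(v) = \frac{\exp(\alpha_i + \beta_i\gamma_{ij}v)}{1 + \exp(\alpha_i + \beta_i\gamma_{ij}v)}, \quad\text{with } i \in \{1, 2, 3\} \text{ and } j \in \{2, 3\},$$

$$P_{13} = \int_0^{d_1^*} f_{D|V}(d|v)\,dd, \quad P_{23} = \int_{d_1^*}^{d_2^*} f_{D|V}(d|v)\,dd \quad\text{and}\quad P_{33} = \int_{d_2^*}^{v} f_{D|V}(d|v)\,dd. \tag{4.24}$$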
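Finally, we expand (4.21) with respect to $v$ around $v = 0$, (4.22) around $v = v_1$, with $v_1$ such that $\pi^0_{23}(v_1) = 1/2$, and (4.23) around $v_0$, where $v_0 \to \infty$. This leads to

$$\pi^*_{13}(v) \approx P_{13}\left\{\pi^0_{13}(0) + \beta_1\gamma_{13}\pi^0_{13}(0)[1 - \pi^0_{13}(0)]v\right\}, \tag{4.25}$$

$$\pi^*_{23}(v) \approx P_{23}\left\{\tfrac{1}{2} + F(\alpha_1 + \beta_1 d_1^*) - F(\alpha_2 + \beta_2 d_1^*) + (v - v_1)\left[\tfrac{1}{4}\beta_2\gamma_{23} - \tfrac{1}{16}\beta_2^3\gamma_{23}\eta_{23}v_1^2\right]\right\}, \tag{4.26}$$

$$\pi^*_{33}(v) \approx \left\{1 + F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d_2^*) - F(\alpha_2 + \beta_2 d_1^*) - F(\alpha_3 + \beta_3 d_2^*)\right\}P_{33}. \tag{4.27}$$

Therefore, the final model is given by

$$\begin{aligned}\pi^*(v) \approx {}& P_{13}\pi^0_{13}(0) + P_{23}\left\{\tfrac{1}{2} + F(\alpha_1 + \beta_1 d_1^*) - F(\alpha_2 + \beta_2 d_1^*) - v_1\left[\tfrac{1}{4}\beta_2\gamma_{23} - \tfrac{1}{16}\beta_2^3\gamma_{23}\eta_{23}v_1^2\right]\right\}\\ &+ P_{33}\left\{1 + F(\alpha_1 + \beta_1 d_1^*) + F(\alpha_2 + \beta_2 d_2^*) - F(\alpha_2 + \beta_2 d_1^*) - F(\alpha_3 + \beta_3 d_2^*)\right\}\\ &+ \left\{P_{13}\beta_1\gamma_{13}\pi^0_{13}(0)[1 - \pi^0_{13}(0)] + P_{23}\left[\tfrac{1}{4}\beta_2\gamma_{23} - \tfrac{1}{16}\beta_2^3\gamma_{23}\eta_{23}v_1^2\right]\right\}v.\end{aligned} \tag{4.28}$$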
4.3 Estimation
4.3.1 Probability Model
To estimate the parameters of the segmented model, we first suppose that there are $k$ independent subsamples, where the $i$th subsample is characterized by the dose level $v_i$. Within the $i$th subsample, we have $n_i$ observations. The corresponding counts are $r_i$, the number of occurrences of the event of interest in the $i$th subsample, and we let $r = (r_1, \dots, r_k)^t$. Also let $\pi = [\pi_1(\theta), \dots, \pi_k(\theta)]^t$ denote the corresponding probabilities of the event, where $\theta$ is a vector of the parameters of the segmented model. Then the probability law is given by the product binomial model

$$P(r; \theta) = \prod_{i=1}^k \binom{n_i}{r_i}\pi_i(\theta)^{r_i}[1 - \pi_i(\theta)]^{n_i - r_i}.$$

Three of the most common approaches to estimating $\theta$ are maximum likelihood (ML), minimum chi-squared (MCS) and modified minimum chi-squared (MMCS). All three methods of estimation produce BAN (best asymptotically normal) estimators (Sen and Singer, 1993). We propose to use the MMCS method because of its computational advantages.
The MMCS estimator is defined as the value $\hat\theta$ that minimizes

$$Q_N(\theta) = \sum_{i=1}^k \frac{n_i\,[r_i - n_i\pi_i(\theta)]^2}{r_i(n_i - r_i)}. \tag{4.29}$$
4.3.2 Asymptotic Distribution
We can derive the asymptotic distribution of $\hat\theta$ by using the same technique employed by Sen and Singer (1993, pp. 254-256) for the estimator of $\theta$ obtained via maximum likelihood. First, let $\|u\| \le K$, $0 < K < \infty$, and consider the Taylor series expansion
(4.30)
where $N = \sum_{i=1}^k n_i$ and $\theta^*$ belongs to the line segment joining $\theta$ and $\theta + N^{-1/2}u$. Then we define

$$A(u) = Q_N(\theta + N^{-1/2}u) - Q_N(\theta) = -\frac{1}{\sqrt{N}}u^tU_N + \frac{1}{2N}u^tV_Nu + \frac{1}{2N}u^tW_Nu, \tag{4.31}$$

where
(4.32)
and
(4.33)
Now note that:
i) Recall that $r_i$ has a binomial distribution. Hence it follows from the Central Limit Theorem that
(4.35)
Note also that the $\{r_i\}$ are independent. Then from the multivariate version of the Delta method (Theorem 3.4.6, Sen and Singer, 1993), it can be easily shown that
(4.36)
with

$$I_N(\theta) = \frac{1}{N}\sum_{i=1}^k n_i I_i(\theta), \tag{4.37}$$

where $I_i(\theta)$ is the Fisher information matrix for the $i$th subsample, given by

$$I_i(\theta) = \frac{1}{\pi_i(\theta)[1 - \pi_i(\theta)]}\,\frac{\partial\pi_i(\theta)}{\partial\theta}\,\frac{\partial\pi_i(\theta)}{\partial\theta^t}. \tag{4.38}$$
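ii)

$$V_N = \frac{\partial^2 Q_N(\theta)}{\partial\theta\,\partial\theta^t} = \sum_{i=1}^k\left\{\frac{2n_i^3}{r_i(n_i - r_i)}\,\frac{\partial\pi_i(\theta)}{\partial\theta}\,\frac{\partial\pi_i(\theta)}{\partial\theta^t} - \frac{2n_i^2\,[r_i - n_i\pi_i(\theta)]}{r_i(n_i - r_i)}\,\frac{\partial^2\pi_i(\theta)}{\partial\theta\,\partial\theta^t}\right\}. \tag{4.39}$$

Note that, relative to the first term, the second term in the above expression is of order $O(N^{-1/2})$. The expression is therefore dominated by the first term, so it follows that
(4.40)
iii) Since
is a continuous function of $\theta$, we have
Then we may conclude that
(4.41)
where the minimum is attained at
(4.42)
This also corresponds to the minimum of $Q_N(\theta)$, which is attained at the MMCS estimator $\hat\theta$ of $\theta$. Therefore, we have
(4.43)
which implies that
(4.44)
Using the Slutsky theorem we may conclude that

$$\sqrt{N}(\hat\theta - \theta) \xrightarrow{\mathcal{D}} N\left\{0, [I_N(\theta)]^{-1}\right\}. \tag{4.45}$$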
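4.3.3 Estimating Equations
The estimating equations are given by

$$\frac{\partial Q_N(\theta)}{\partial\theta} = 0, \tag{4.46}$$

which can be solved iteratively by using the Newton-Raphson method. The $a$th-step estimate of $\theta$ is provided by

$$\hat\theta^{(a)} = \hat\theta^{(a-1)} - \left\{\frac{\partial^2 Q_N(\theta)}{\partial\theta\,\partial\theta^t}\Big|_{\theta = \hat\theta^{(a-1)}}\right\}^{-1}\left\{\frac{\partial Q_N(\theta)}{\partial\theta}\Big|_{\theta = \hat\theta^{(a-1)}}\right\}, \tag{4.47}$$

where
(4.48)
and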
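The MMCS criterion (4.29) and the update (4.47) can be sketched as follows. Two simplifications are assumptions:

- A two-parameter logistic curve stands in for $\pi_i(\theta)$. The segmented models would supply their own derivatives, (4.52)-(4.67).
- The Hessian is replaced by the first term of (4.39), which the text argues dominates. This makes the iteration a Gauss-Newton variant of Newton-Raphson.

Starting values come from a simple logit regression, in the spirit of the initial estimates of Section 4.3.4. The function names are hypothetical.

```python
import math

def logistic(x):
    """Numerically stable logistic CDF."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def model(theta, v):
    """Hypothetical stand-in for pi_i(theta): a two-parameter logistic curve."""
    return logistic(theta[0] + theta[1] * v)

def model_grad(theta, v):
    """d pi_i / d theta for the stand-in model."""
    p = model(theta, v)
    return [p * (1 - p), p * (1 - p) * v]

def mmcs_objective(theta, doses, n, r):
    """Q_N(theta) = sum_i n_i [r_i - n_i pi_i(theta)]^2 / [r_i (n_i - r_i)], as in (4.29)."""
    return sum(ni * (ri - ni * model(theta, v)) ** 2 / (ri * (ni - ri))
               for v, ni, ri in zip(doses, n, r))

def gradient_and_hessian(theta, doses, n, r):
    """Gradient of Q_N and the first (dominant) term of its Hessian (4.39)."""
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for v, ni, ri in zip(doses, n, r):
        c = ri * (ni - ri)
        res = ri - ni * model(theta, v)
        d = model_grad(theta, v)
        for j in range(2):
            g[j] -= 2.0 * ni ** 2 * res / c * d[j]
            for k in range(2):
                H[j][k] += 2.0 * ni ** 3 / c * d[j] * d[k]
    return g, H

def initial_estimate(doses, n, r):
    """Unweighted least squares on empirical logits (a simple starting value)."""
    z = [math.log(ri / (ni - ri)) for ni, ri in zip(n, r)]
    k = len(doses)
    vb, zb = sum(doses) / k, sum(z) / k
    b = sum((v - vb) * (zi - zb) for v, zi in zip(doses, z)) / sum((v - vb) ** 2 for v in doses)
    return [zb - b * vb, b]

def mmcs_fit(doses, n, r, iters=50, tol=1e-12):
    """Iterate theta <- theta - H^{-1} g, the update form of (4.47) with the approximate Hessian."""
    theta = initial_estimate(doses, n, r)
    for _ in range(iters):
        g, H = gradient_and_hessian(theta, doses, n, r)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        step = [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                (H[0][0] * g[1] - H[1][0] * g[0]) / det]
        theta = [t - s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta

if __name__ == "__main__":
    doses = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative data only
    n = [1000] * 5
    r = [119, 269, 500, 731, 881]
    print("MMCS estimate:", mmcs_fit(doses, n, r))
```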
4.3.4 Further Details for Application
In the following presentation, we restrict our attention to situations 2) $d_1^* < V \le d_2^*$ and 3) $V > d_2^*$. Estimation of the parameters in situation 1) $0 < V \le d_1^*$ is a direct application of the results presented in chapter II, because it is equivalent to the low dose model. Some details on the quantities involved in the matrices of first derivatives of $\pi_i(\theta)$ are given for each situation in the following sections.
2) $d_1^* < V \le d_2^*$
The vector of parameters corresponding to this model is given by

$$\theta = (\alpha_1, \beta_1, \alpha_2, \beta_2, \gamma_{12}, \gamma_{22}, \eta_{22}, P_{12}, P_{22})^t.$$

Recall that $\alpha_1$, $\beta_1$, $\alpha_2$ and $\beta_2$ are the parameters of interest, and that $\gamma_{12}$, $\gamma_{22}$, $\eta_{22}$, $P_{12}$ and $P_{22}$ are the parameters associated with the Beta distribution. The latter represent characteristics of the different segments involved in the model.
Initial estimates
We propose to use the low and moderate dose models of Chapter II to obtain initial values for $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$, respectively. For this purpose, we would need additional subsamples with dose levels in the low dose range, i.e., $V \le d_1^*$. On the other hand, we estimate the parameters of the Beta distribution $(a, b)$ from an independent sample. Based on these estimates, we can obtain initial values for $\gamma_{12}$, $\gamma_{22}$, $\eta_{22}$, $P_{12}$ and $P_{22}$. We then allow these parameters to be updated by the iterative process.
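Matrix of first derivatives of $\pi_i(\theta)$
The derivatives of $\pi_i(\theta)$ with respect to each of the parameters in the model are needed in the computation of the Fisher information matrix (4.38), as well as in the development of the iterative process (4.47):

$$\frac{\partial\pi_i(\theta)}{\partial\theta} = \left\{\frac{\partial\pi_i(\theta)}{\partial\alpha_1}, \frac{\partial\pi_i(\theta)}{\partial\beta_1}, \frac{\partial\pi_i(\theta)}{\partial\alpha_2}, \frac{\partial\pi_i(\theta)}{\partial\beta_2}, \frac{\partial\pi_i(\theta)}{\partial\gamma_{12}}, \frac{\partial\pi_i(\theta)}{\partial\gamma_{22}}, \frac{\partial\pi_i(\theta)}{\partial\eta_{22}}, \frac{\partial\pi_i(\theta)}{\partial P_{12}}, \frac{\partial\pi_i(\theta)}{\partial P_{22}}\right\}^t,$$

with
(4.52)
(4.53)
(4.54)
(4.55)
(4.56)
(4.57)
(4.58)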
3) $V > d_2^*$
The vector of parameters corresponding to this model is given by

$$\theta = (\alpha_1, \beta_1, \alpha_2, \beta_2, \alpha_3, \beta_3, \gamma_{13}, \gamma_{23}, \eta_{23}, P_{13}, P_{23}, P_{33})^t.$$

Now the parameters of interest are $\alpha_1$, $\beta_1$, $\alpha_2$, $\beta_2$, $\alpha_3$ and $\beta_3$, and $\gamma_{13}$, $\gamma_{23}$, $\eta_{23}$, $P_{13}$, $P_{23}$ and $P_{33}$ are the parameters associated with the Beta distribution.
Initial estimates
As in the previous case, we propose to use the low, moderate and high dose models of Chapter II to obtain initial values for $(\alpha_1, \beta_1)$, $(\alpha_2, \beta_2)$ and $(\alpha_3, \beta_3)$, respectively. Here we would need additional subsamples with dose levels in the low and moderate dose ranges, that is, $V \le d_1^*$ and $d_1^* < V \le d_2^*$, respectively. Again, we estimate the parameters of the Beta distribution $(a, b)$ from an independent sample. Based on these estimates, we obtain initial values for $\gamma_{13}$, $\gamma_{23}$, $\eta_{23}$, $P_{13}$, $P_{23}$ and $P_{33}$. We then allow these parameters to be updated by the iterative process.
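Matrix of first derivatives of $\pi_i(\theta)$
The derivatives of $\pi_i(\theta)$ with respect to each of the parameters in the model corresponding to this dose level case are now provided:

$$\frac{\partial\pi_i(\theta)}{\partial\theta} = \left\{\frac{\partial\pi_i(\theta)}{\partial\alpha_1}, \frac{\partial\pi_i(\theta)}{\partial\beta_1}, \frac{\partial\pi_i(\theta)}{\partial\alpha_2}, \frac{\partial\pi_i(\theta)}{\partial\beta_2}, \frac{\partial\pi_i(\theta)}{\partial\alpha_3}, \frac{\partial\pi_i(\theta)}{\partial\beta_3}, \frac{\partial\pi_i(\theta)}{\partial\gamma_{13}}, \frac{\partial\pi_i(\theta)}{\partial\gamma_{23}}, \frac{\partial\pi_i(\theta)}{\partial\eta_{23}}, \frac{\partial\pi_i(\theta)}{\partial P_{13}}, \frac{\partial\pi_i(\theta)}{\partial P_{23}}, \frac{\partial\pi_i(\theta)}{\partial P_{33}}\right\}^t,$$

with

$$\begin{aligned}\frac{\partial\pi_i(\theta)}{\partial\alpha_1} = {}& \pi^0_{13}(0)[1 - \pi^0_{13}(0)]P_{13} + \beta_1\gamma_{13}\pi^0_{13}(0)[1 - \pi^0_{13}(0)][1 - 2\pi^0_{13}(0)]P_{13}v_i\\ &+ F(\alpha_1 + \beta_1 d_1^*)[1 - F(\alpha_1 + \beta_1 d_1^*)]P_{23} + F(\alpha_1 + \beta_1 d_1^*)[1 - F(\alpha_1 + \beta_1 d_1^*)]P_{33},\end{aligned} \tag{4.59}$$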
(4.61)
87fi((J) { .
•
•
a~n23 }
1'23
8{32 = -dt F(a2+f32d t )[l-F(a2+f32dt )] + 8{3i~3 P23 + 7P23Vi,(4.62)
+ d;F(a2
- dt F(Q2
+ f32~)[l + f32d;)[l -
F(a2
F(a2
+ f32~)]P33
+ f32d;)]P33
(4.64)
(4.65)
(4.66)
(4.67)
CHAPTER V
NUMERICAL RESULTS
5.1 Introduction
The purpose of this chapter is to illustrate the use of the methods proposed in previous chapters to draw statistical conclusions in practice, and to examine the performance of these methods under different conditions. In addition, this chapter is intended to provide some insight into how the conclusions may change if compliance measurement error is ignored.
This chapter is divided into two major sections. The first section illustrates the application of the proposed methods to a situation that can be encountered in practice. To this end, a set of data provided by the Radiation Effects Research Foundation (RERF) containing information on the atomic bomb survivors of Japan is analyzed.
The second section of this chapter presents the results of a small simulation study. The study examines how successful the proposed methods are in reducing bias and improving overall statistical efficiency. Several compliance patterns are considered. In each case, the proposed models are compared against a model ignoring compliance measurement error and a model considering only a first-order adjustment.
5.2 Application to the Study of Radiation Effects
As pointed out before, the study of radiation effects is a problem well suited to the proposed models. After a certain amount of radiation is released into the environment, we can expect that only a portion of this amount would actually reach particular organs. Shielding of one kind or another (e.g., the body, houses or other kinds of structures) plays an important role in this matter.
We start this section by describing the atomic bomb survivor data; the appropriate models are then used to analyze the data. In this presentation, emphasis is given to the use of the bootstrap methods. We conclude the section with a discussion of the results and of the effects of ignoring compliance measurement error.
5.2.1 The Atomic Bomb Survivor Data
A great deal of effort has been devoted to estimating the radiation dose that people in the cities of Hiroshima and Nagasaki, Japan, received as a result of the dropping of the atomic bombs in 1945. The latest revision of the dosimetry system was carried out in 1986 and led to the new DS86 system, which has been used for dose estimation at RERF since then. A description of the entire system is provided in the DS86 final report (Roesch, 1987). Doses from both gamma ray and neutron exposure have been calculated for each survivor based on recorded distance from the hypocenters. Shielding information collected by questionnaires is used to adjust these doses. Finally, organ doses are estimated based on an existing Monte Carlo transport analog (Kerr, 1979). This analog is used to investigate depth dose distributions in several cylindrical tissue phantoms of different sizes.
For more than 30 years, the Atomic Bomb Casualty Commission (ABCC) and its successor, RERF, have been following a fixed cohort of atomic bomb survivors, the Life Span Study (LSS) cohort. The RERF dataset used in this chapter corresponds to the data used in the analyses of LSS cancer incidence for the period 1958-1987 for solid tumors, and 1950-1987 for leukemia, lymphoma and myeloma. The papers by Thompson et al. (1994) and Preston et al. (1994) present the results of RERF's analysis of these data.
Data on individual survivors are not provided in the dataset; instead, the data were stratified on:
1) City
   1 = Hiroshima
   2 = Nagasaki
2) Sex
   1 = Male
   2 = Female
3) DS86 Colon Dose (in Gy)
   1 = 0.00 - 0.01
   2 = 0.01 - 0.10
   3 = 0.10 - 0.20
   4 = 0.20 - 0.50
   5 = 0.50 - 1.00
   6 = 1.00 - 1.50
   7 = 1.50 - 2.00
   8 = 2.00 - 3.00
   9 = 3.00 - 4.00
   10 = 4.00 or more
4) Age at Exposure (in years)
   1 = 0 - 5
   2 = 5 - 10
   3 = 10 - 15
   4 = 15 - 20
   5 = 20 - 25
   6 = 25 - 30
   7 = 30 - 35
   8 = 35 - 40
   9 = 40 - 45
   10 = 45 - 50
   11 = 50 - 55
   12 = 55 - 60
   13 = 60 or more
5) Calendar Time
   1 = 1 Jan 1958 through 31 Dec 1960
   2 = 1 Jan 1961 through 31 Dec 1965
   3 = 1 Jan 1966 through 31 Dec 1970
   4 = 1 Jan 1971 through 31 Dec 1975
   5 = 1 Jan 1976 through 31 Dec 1980
   6 = 1 Jan 1981 through 31 Dec 1985
   7 = 1 Jan 1986 through 31 Dec 1987
6) Shielded kerma
   1 = < 4 Gy
   2 = >= 4 Gy
For each stratum defined by these six cross-classification variables, information is given on the number of people, person-years, mean dose, and number of tumors, among others. The solid tumor table was defined using colon dose. A separate computer file was provided containing organ dose adjustment factors specific to city and age at exposure. This information would allow us to compute organ doses for each of the fifteen organs incorporated into the DS86 system. The leukemia, lymphoma, and myeloma cancer table was provided in a separate computer file, and was defined using bone marrow dose. The solid tumor table includes case counts for 31 specific tumor types as well as for all solid tumors lumped together. The leukemia table includes counts for all lymphomas, non-Hodgkin's lymphomas, multiple myeloma, all leukemias, acute myelogenous leukemia, chronic myelogenous leukemia, acute lymphocytic leukemia, and adult T-cell leukemia.
For the purpose of applying our models, we need the number of people and the corresponding number of tumors in each stratum. We collapsed the information over Calendar Time, however, because we are not interested here in the time of the event. In addition, in order to estimate the compliance distribution, we use the distribution of people across the different organ dose groups. Since the solid tumor dataset is defined based on colon dose, we decided to use colon tumors in our presentation here. Table 1 presents a summary of this information by Colon Dose.
Table 1. Distribution of people and tumor counts by Colon Dose.

DS86 Colon Dose (Gy)    Number of People (%)    Number of Tumors
0.00 - 0.01             42702 ( 53.4)           234
0.01 - 0.10             21479 ( 26.9)           115
0.10 - 0.20              5307 (  6.6)            30
0.20 - 0.50              5858 (  7.3)            35
0.50 - 1.00              2882 (  3.6)            21
1.00 - 1.50              1051 (  1.3)            13
1.50 - 2.00               393 (  0.5)             5
2.00 - 3.00               297 (  0.4)             4
3.00 - 4.00                 3 (  0.0)             0
Total                   79972 (100.0)           457

Note: This table does not include shielded kerma >= 4.0 Gy.
Questions have been raised about the accuracy of the dose estimates for people with shielded kerma above 4 Gy (Pierce, Stram and Vaeth, 1990). The data provided are structured in a way that makes it easy to identify the groups with shielded kerma above 4 Gy. These groups have been excluded from the analyses in the published reports and were excluded from our analysis here too. This exclusion helps us define the amount of administered dose: as a result, we can set V = 4.0. A consequence of this exclusion is that we are left with only one administered dose group, which cannot be analyzed with logistic models. In order to apply the proposed methods, we decided to simulate data for another dose group based on the data structure of the atomic bomb survivor data. Although the results obtained by analyzing these data cannot be used to draw real-life conclusions about radiation effects, they are still useful for the purposes of this chapter. For the simulated group, we assumed an administered dose of V = 6.0 and a 25% increase in tumor incidence with respect to the atomic bomb survivor group. In sum, we have:
i   Group                   Administered dose (V_i)   # people (n_i)   # of tumors
1   Atomic Bomb Survivors   4.0                       79972            457
2   Simulated               6.0                       79972            571.25
5.2.2 Methods for Estimation
We use the maximum likelihood method to estimate the parameters of the beta distribution assumed for the compliance distribution of the dose. The data needed for this purpose are the distribution of subjects across the different colon dose groups. These data are presented in Table 1 for the atomic bomb survivor group and are the same for the simulated group. Therefore, we have a total of 159944 subjects distributed across the different colon dose groups following the same proportions as indicated in Table 1.
We modified the results given by Johnson and Kotz (1970) to account for the grouping of the data. Let $y_1, \ldots, y_l$ be the midpoints of the corresponding colon dose groups. The maximum likelihood equations for the estimators $(\hat a, \hat b)$ of $(a, b)$ are

$$\psi(\hat a) - \psi(\hat a + \hat b) = \log G_1, \qquad \psi(\hat b) - \psi(\hat a + \hat b) = \log G_2, \tag{5.1}$$

where

$$G_1 = \prod_{j=1}^{l} y_j^{\,m_j/N}, \qquad G_2 = \prod_{j=1}^{l} (1 - y_j)^{m_j/N}, \tag{5.2}$$

and $\psi(\cdot)$ is the digamma function, with $l$ = number of colon dose groups, $m_j$ = number of people in colon dose group $j$, and $N$ = total number of people in the sample. For our problem, $l = 9$ and $N = 159944$.

The asymptotic covariance matrix of $\sqrt{N}\,\hat a$ and $\sqrt{N}\,\hat b$ (as $N \to \infty$) is given by

$$\begin{pmatrix} \psi'(a) - \psi'(a+b) & -\psi'(a+b) \\ -\psi'(a+b) & \psi'(b) - \psi'(a+b) \end{pmatrix}^{-1}. \tag{5.3}$$

Johnson and Kotz (1970) also provided a first approximation for $(\hat a, \hat b)$:

$$\hat a_0 = \frac{1}{2}\,\frac{1 - G_2}{1 - G_1 - G_2}, \tag{5.4}$$

$$\hat b_0 = \frac{1}{2}\,\frac{1 - G_1}{1 - G_1 - G_2}. \tag{5.5}$$
Using these values as initial estimates, we can obtain solutions of the maximum
likelihood equations iteratively.
We can use the formulae given in Chapter II to obtain estimates of $(\alpha^*, \beta^*)$, and then of $(\alpha, \beta)$. Estimates for the covariance matrices were also provided in Chapter II. The data in this chapter can be considered a low dose case, and therefore the corresponding results of Chapter II are applied here. The model in this case is given by

$$\operatorname{logit}[\pi^*(v)] \approx \alpha + \beta\gamma v. \tag{5.6}$$

Recall also that the results on $(\alpha^*, \beta^*)$ are of interest in the present chapter because they represent the model when compliance measurement error is ignored. This model is given by

$$\operatorname{logit}[\pi^*(v)] \approx \alpha^* + \beta^* v. \tag{5.7}$$
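With only two administered dose groups, a minimum logit fit of (5.7) has two parameters for two empirical logits, so it passes exactly through them whatever weights are used. The Python sketch below (function names hypothetical) computes $(\hat\alpha^*, \hat\beta^*)$ from the two-group summary given earlier and then maps them to $(\hat\alpha, \hat\beta)$ by matching the right-hand sides of (5.6) and (5.7), which gives $\alpha = \alpha^*$ and $\beta = \beta^*/\gamma$. This mapping is read off the two displayed models rather than taken from the Chapter II formulae; with $\hat\gamma = 0.0307728$ it reproduces, up to rounding, the estimates reported in Section 5.2.4.

```python
import math


def empirical_logit(tumors, people):
    """log(p / (1 - p)) for the observed tumor proportion p = tumors / people."""
    p = tumors / people
    return math.log(p / (1.0 - p))


def two_group_min_logit(doses, tumors, people):
    """Exact fit of logit[pi*(v)] = alpha* + beta* v through two empirical logits."""
    v1, v2 = doses
    l1 = empirical_logit(tumors[0], people[0])
    l2 = empirical_logit(tumors[1], people[1])
    beta_star = (l2 - l1) / (v2 - v1)
    alpha_star = l1 - beta_star * v1
    return alpha_star, beta_star


def low_dose_adjust(alpha_star, beta_star, gamma):
    """Match alpha + beta * gamma * v in (5.6) with alpha* + beta* v in (5.7)."""
    return alpha_star, beta_star / gamma


# Two-group summary: V = 4.0 (survivors) and V = 6.0 (simulated, 25% more tumors).
alpha_star, beta_star = two_group_min_logit((4.0, 6.0), (457, 457 * 1.25), (79972, 79972))
alpha_hat, beta_hat = low_dose_adjust(alpha_star, beta_star, gamma=0.0307728)
print(f"alpha* = {alpha_star:.5f}, beta* = {beta_star:.7f}, beta = {beta_hat:.4f}")
```

Because $\alpha$ is not adjusted in the low dose case, the adjusted and unadjusted intercepts coincide, and all of the compliance correction goes into the slope.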
On the other hand, estimates of $(\alpha^*, \beta^*)$ are obtained as part of the process needed to estimate the parameters of interest $(\alpha, \beta)$. As a result, bootstrap estimates $(\hat\alpha^*_b, \hat\beta^*_b)$ are obtained along with the bootstrap estimates $(\hat\alpha_b, \hat\beta_b)$ from each resample. Therefore, an estimate of the covariance matrix as well as the confidence region for $(\alpha^*, \beta^*)$ can be obtained following the procedures described in the following section.
5.2.3 Application of the Bootstrap
The idea of the bootstrap is to generate an empirical estimate of the statistic's sampling distribution. This is done by taking random samples with replacement from the original sample as if it were the population, and then obtaining estimates of the parameters of interest based on each of these "resamples". These resamples are called bootstrap samples.
The basic bootstrap algorithm is outlined by Efron and Tibshirani (1993, p. 47). Here we present the steps needed for our models based on the data at hand:
1. Draw, with replacement, a simple random sample of size $n_i$ from the sample corresponding to each administered dose level. A special provision is needed here to account for the fact that the data are grouped according to the cross-classification described earlier in this chapter. For each element resampled, we identify its corresponding stratum, and its contribution to the tumor incidence is the proportion of tumors that occurred in that particular stratum among the subjects classified in it.
For the resampling of the data used to estimate the parameters of the compliance distribution (organ dose distribution), we generate a sample using the proportion of people in each organ dose group in the sample as the probability distribution.
This set of samples constitutes one bootstrap sample.
2. Calculate the statistics of interest $(\hat\alpha, \hat\beta)$ as described in Section 5.2.2. This yields the bootstrap estimates $(\hat\alpha_b, \hat\beta_b)$.
3. Repeat steps 1 and 2 B times, where B is a large number. For estimating standard errors, B should typically be 25 to 200. Much larger values of B are needed for confidence intervals (Efron and Tibshirani, 1993, Section 6.4). We chose to use B = 500 for these analyses.
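Step 1 can be sketched in Python as below. These are hypothetical helpers, not the code used for this work. They assume each dose group is supplied as a list of (people, tumors) pairs, one per stratum of the cross-classification, with at least one person per stratum. Drawing $n_i$ people with replacement is the same as drawing strata with probability proportional to their size, and each draw contributes its stratum's tumor proportion, as described in step 1. The compliance data are resampled using the organ dose group proportions as the probability distribution.

```python
import random
from collections import Counter


def resample_dose_group(strata, rng):
    """One resample of a grouped dose group; returns (n_i, resampled tumor contribution)."""
    sizes = [people for people, _ in strata]
    rates = [tumors / people for people, tumors in strata]  # tumor proportion per stratum
    n = sum(sizes)
    # Drawing n people with replacement = drawing strata with probability proportional to size.
    counts = Counter(rng.choices(range(len(strata)), weights=sizes, k=n))
    return n, sum(k * rates[j] for j, k in counts.items())


def resample_compliance_groups(group_counts, rng):
    """Resample the organ dose group counts using their observed proportions."""
    total = sum(group_counts)
    counts = Counter(rng.choices(range(len(group_counts)), weights=group_counts, k=total))
    return [counts[j] for j in range(len(group_counts))]


def bootstrap_sample(dose_groups, compliance_counts, rng):
    """One bootstrap sample: every dose group plus the compliance data (step 1)."""
    return ([resample_dose_group(strata, rng) for strata in dose_groups],
            resample_compliance_groups(compliance_counts, rng))


rng = random.Random(2024)
toy_group = [(1200, 10), (800, 3), (500, 4)]  # hypothetical strata, for illustration only
table1_counts = [42702, 21479, 5307, 5858, 2882, 1051, 393, 297, 3]
groups, comp = bootstrap_sample([toy_group], table1_counts, rng)
```

Because each resampled tumor contribution is a sum of stratum proportions, the resampled tumor counts need not be integers. The simulated group's 571.25 tumors is not an integer either.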
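The bootstrap estimate of the covariance matrix can be obtained as the sample covariance matrix of the B replications,

$$S_B = \frac{1}{B-1}\sum_{b=1}^{B}\begin{pmatrix}\hat\alpha_b - \bar\alpha \\ \hat\beta_b - \bar\beta\end{pmatrix}\begin{pmatrix}\hat\alpha_b - \bar\alpha & \hat\beta_b - \bar\beta\end{pmatrix}, \tag{5.8}$$

where

$$\bar\alpha = \frac{1}{B}\sum_{b=1}^{B}\hat\alpha_b, \qquad \bar\beta = \frac{1}{B}\sum_{b=1}^{B}\hat\beta_b, \tag{5.9}$$

with $(\hat\alpha_b, \hat\beta_b)$ the estimates of $(\alpha, \beta)$ based on the $b$-th bootstrap re-sample.

Furthermore, consider the bootstrap of the pivot $U_b$ (5.10) to construct the joint confidence region for $(\alpha, \beta)$. A bootstrap confidence region for $(\alpha, \beta)$, based on $U_b$ and having confidence level $1 - \alpha$, is given by (5.11), where $u(\alpha)$ is an upper $\alpha$-point of the distribution of the $U_b$'s.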
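Given the B bootstrap pairs, the covariance matrix and a joint region can be computed as sketched below in Python (hypothetical helper names). The covariance follows the sample-covariance description above, and the divisor B − 1 is an assumption. The explicit form of the pivot (5.10) is not recoverable here, so the sketch assumes a common choice, the quadratic form $U_b = (\hat\theta_b - \hat\theta)^\top S_B^{-1} (\hat\theta_b - \hat\theta)$ with $\theta = (\alpha, \beta)^\top$. Treat it as an illustration, not as the exact construction behind Figures 2 and 3. The region is then the set of $\theta$ with $(\hat\theta - \theta)^\top S_B^{-1} (\hat\theta - \theta) \le u$, where $u$ is the empirical upper quantile of the $U_b$.

```python
import math


def bootstrap_covariance(estimates):
    """Sample covariance matrix (divisor B - 1) of B bootstrap pairs (alpha_b, beta_b)."""
    B = len(estimates)
    mean_a = sum(a for a, _ in estimates) / B
    mean_b = sum(b for _, b in estimates) / B
    saa = sum((a - mean_a) ** 2 for a, _ in estimates) / (B - 1)
    sbb = sum((b - mean_b) ** 2 for _, b in estimates) / (B - 1)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in estimates) / (B - 1)
    return ((saa, sab), (sab, sbb))


def quadratic_form(dx, dy, cov):
    """(dx, dy) S^{-1} (dx, dy)^T for a 2x2 covariance matrix S."""
    (saa, sab), (_, sbb) = cov
    det = saa * sbb - sab * sab
    return (sbb * dx * dx - 2.0 * sab * dx * dy + saa * dy * dy) / det


def bootstrap_region(theta_hat, estimates, level=0.95):
    """Assumed quadratic-form pivot; returns the cutoff u and a membership test for the region."""
    cov = bootstrap_covariance(estimates)
    u = sorted(quadratic_form(a - theta_hat[0], b - theta_hat[1], cov) for a, b in estimates)
    cutoff = u[min(len(u) - 1, math.ceil(level * len(u)) - 1)]

    def contains(alpha, beta):
        return quadratic_form(theta_hat[0] - alpha, theta_hat[1] - beta, cov) <= cutoff

    return cutoff, contains
```

In use, `estimates` would hold the 500 pairs $(\hat\alpha_b, \hat\beta_b)$ and `theta_hat` the point estimate. Under this assumed pivot the region is an ellipse centred at the point estimate.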
5.2.4 Results
First we obtain the maximum likelihood estimates for $(a, b)$. The estimates are

$$\begin{pmatrix}\hat a \\ \hat b\end{pmatrix} = \begin{pmatrix}0.3777 \\ 11.8353\end{pmatrix},$$

which lead to the estimates of the mean and variance of the compliance distribution, $\hat\gamma = 0.0307728$ and $\hat\eta = 0.0022471$.
Figure 1 presents the estimated compliance distribution.
Figure 1. Estimated Compliance Distribution (estimated density plotted against u, for 0 <= u <= 0.3)
The minimum logit estimates of $(\alpha^*, \beta^*)$ are

$$(\hat\alpha^*, \hat\beta^*) = (-5.60818,\ 0.1122907),$$

which are obtained based on the colon tumor incidence data. Then we obtain the estimates of the parameters of interest by combining the previous estimates appropriately:

$$(\hat\alpha, \hat\beta) = (-5.60818,\ 3.6490205),$$

with estimated covariance matrix given by
$$S_{500} = \begin{pmatrix} 0.0268594 & -0.16457 \\ -0.16457 & 1.0482874 \end{pmatrix},$$

obtained based on 500 bootstrap re-samples. In addition, the bootstrap 95% joint confidence region is obtained based on these results and is presented in Figure 2.
Figure 2. 95% Confidence region, with adjustment for Compliance (alpha plotted against beta)
Finally, the bootstrap estimate of the covariance matrix of $(\hat\alpha^*, \hat\beta^*)$ is obtained as

$$S^*_{500} = \begin{pmatrix} 0.0268594 & -0.005064 \\ -0.005064 & 0.000991 \end{pmatrix},$$

and the corresponding bootstrap 95% confidence region is presented in Figure 3.
Figure 3. 95% Confidence Region, Ignoring Compliance (alpha* plotted against beta*)
5.2.5 Discussion
The estimates of the parameters of the compliance distribution indicate that we have a case of $a < 1 < b$, in which low compliance of the administered dose is more likely to occur. That is, only a small portion of the radiation dose reached the colon in most of the people. On average, this proportion is estimated to be about 3%. If we ignore this compliance factor, the impact of the radiation dose is represented by the parameter $\beta^*$, which was estimated to be 0.112. This value indicates the increase in the logit of the incidence of colon tumor per unit increase in the radiation dose (gamma rays). By examining the confidence region presented in Figure 3, we can say with 95% confidence that this parameter is positive.
On the other hand, based on our model to account for compliance measurement error, the impact of the radiation dose is represented by the parameter $\beta$, which was estimated to be 3.649. In turn, this value indicates the increase in the logit of the incidence of colon tumor per unit increase in the radiation dose that is expected to reach the colon on average. We can also conclude that this parameter is positive based on the joint confidence region (Figure 2).
As expected, the model with adjustment for compliance estimated a stronger relationship between radiation dose and colon tumor incidence. The drawback, however, is that this estimate has higher variability due to the need to estimate the parameters of the compliance distribution (see Figure 4). Nonetheless, this increase in variability better reflects the uncertainties in the estimates of dose.
The problem of measurement error in dose-response analysis of the atomic bomb survivor data has been considered by Pierce, Stram and Vaeth (1990). They use the tissue kerma at the location of the survivor, corrected for shielding and terrain, as the observed measure of dose, and adopt a parametric form for its distribution given the true dose. Several error models were considered, such as lognormal, contaminated lognormal, and normal. They also make an assumption about the distribution of the true dose by using a Weibull distribution. These assumptions are used to estimate the average true dose given an observed dose, and to develop suitable adjustments for the dose-response model. They claim that these adjustments can be applied to organ dose, and show that for the LSS cohort the average true dose given the observed dose is substantially less than the observed dose, which is consistent with our developments.
Figure 4. 95% Confidence Region with and without adjusting for Compliance (panels: alpha* against beta*, without adjustment for compliance; alpha against beta, with adjustment for compliance)
5.3 Simulation Study
Each replication in this simulation study was set up to mimic the type of bioassays commonly used by the National Toxicology Program (NTP), which also corresponds to the "standard NCI bioassay" (Sontag, Page, and Saffiotti, 1976; Portier and Hoel, 1983; Haseman, 1984). Typically, each experiment contains 50 animals in each of three dose groups, where the doses are chosen as proportions of the maximum tolerated dose. Therefore, we decided to simulate 150 observations, divided among three dose groups, in each replication. The doses were chosen to give responses of (0.02, 0.05, 0.1) for the application of the models developed for low dose cases; (0.4, 0.5, 0.6) for the models for moderate dose cases; and (0.9, 0.95, 0.98) for the models for high dose cases.
The models were examined under several compliance distributions. Two factors were varied in a factorial design, the mean $\gamma \in \{0.25, 0.50, 0.75\}$ and the variance $\eta \in \{0.05\gamma, 0.1\gamma, 0.2\gamma\}$, yielding nine different compliance distributions. However, the last combination, $\gamma = 0.75$ and $\eta = 0.2\gamma$, presented some numerical problems when performing the iterative estimation due to the likelihood of extreme values. Therefore, the present study examines the results for eight cases of the compliance distribution (see Figures 5 to 12). Furthermore, we assume that an independent study of similar size is conducted to study and estimate the parameters of the compliance distribution.
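Each design point fixes the mean $\gamma$ and variance $\eta$ of the beta compliance distribution. The shape parameters follow from the usual moment relations $a = \gamma k$, $b = (1 - \gamma)k$ with $k = a + b = \gamma(1-\gamma)/\eta - 1$. The short Python sketch below (hypothetical helper name) enumerates the nine combinations. It reproduces the $(a, b)$ pairs listed with Figures 5 to 12 and Table 2. It also shows that the excluded case $\gamma = 0.75$, $\eta = 0.2\gamma$ gives $a = 0.1875$ and $b = 0.0625$, a strongly U-shaped density that plausibly explains the numerical problems mentioned above.

```python
def beta_shape_from_mean_var(gamma, eta):
    """Beta(a, b) with mean gamma and variance eta; requires 0 < eta < gamma * (1 - gamma)."""
    if not 0.0 < eta < gamma * (1.0 - gamma):
        raise ValueError("variance must lie in (0, gamma * (1 - gamma))")
    k = gamma * (1.0 - gamma) / eta - 1.0  # k = a + b
    return gamma * k, (1.0 - gamma) * k


design = {}
for gamma in (0.25, 0.50, 0.75):
    for factor in (0.05, 0.1, 0.2):  # eta = factor * gamma
        design[(gamma, factor)] = beta_shape_from_mean_var(gamma, factor * gamma)

for (gamma, factor), (a, b) in design.items():
    print(f"gamma={gamma:.2f}  eta={factor:.2f}*gamma  a={a:.4f}  b={b:.4f}")
```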
Figure 5. Beta density, a = 3.5, b = 10.5
Figure 6. Beta density, a = 1.625, b = 4.875
Figure 7. Beta density, a = 0.6875, b = 2.0625
Figure 8. Beta density, a = 4.5, b = 4.5
Figure 9. Beta density, a = 2, b = 2
Figure 10. Beta density, a = 0.75, b = 0.75
Figure 11. Beta density, a = 3, b = 1
Figure 12. Beta density, a = 1.125, b = 0.375
(Each figure plots the density against u on [0, 1].)
In sum, each replication consists of incidence data for 150 observations from
three dose groups, and an additional 150 observations on compliance generated from a
beta distribution under the stated assumptions.
The parameters of interest are $\alpha$ and $\beta$ from model (2.6). For all cases (low, moderate, and high dose) and all situations (the eight different compliance distributions), these parameters were fixed at $\alpha = -5.5$ and $\beta = 3.5$. With these values and the corresponding response levels, (2.6) was solved numerically to obtain the administered doses needed to produce such responses. Table 2 presents the values for the administered doses for each case and situation.
Table 2. Values for the administered doses for the three dose groups in each case (Low dose, Moderate dose, and High dose) for each of the different compliance distributions.

                              Low Dose                Moderate Dose           High Dose
              π(V_i):         0.02   0.05   0.10      0.40   0.50   0.60      0.90    0.95     0.98
γ     η       a       b       V1     V2     V3        V1     V2     V3        V1      V2       V3
0.25  0.0125  3.5     10.5    1.6    2.43   3.125     5.63   6.5    7.53      14.67   19.16    26.51
0.25  0.025   1.625   4.875   1.435  2.14   2.77      5.64   6.875  8.51      24.65   39.35    71.34
0.25  0.050   0.6875  2.0625  1.23   1.79   2.33      5.93   8.16   11.79     96.09   265.23   1008.65
0.50  0.025   4.5     4.5     0.86   1.33   1.695     2.805  3.145  3.53      5.93    7.3      9.42
0.50  0.050   2       2       0.81   1.235  1.525     2.72   3.14   3.68      8.35    12.14    19.6
0.50  0.100   0.75    0.75    0.745  1.115  1.415     2.55   3.14   4.1       24.49   61.32    207.47
0.75  0.0375  3       1       0.59   0.91   1.175     1.86   2.055  2.27      3.72    4.69     6.36
0.75  0.075   1.125   0.375   0.57   0.88   1.125     1.785  1.98   2.225     5.44    9.545    20.84
Each replication is analyzed under three methods: 1) ignoring compliance measurement error, where no adjustment is used; 2) using adjustments based on a first-order approximation; and 3) using the proposed adjustments, which are based on a second-order approximation. Recall that the first-order adjusted model corresponds to the model using the expected value of the true dose $d$, which is given by model (2.5). The methods are compared based on bias, variance, and mean squared error (MSE).
Finally, keep in mind that in the low dose case, the proposed model corresponds to the first-order adjusted model. In addition, the first-order adjustments only adjust the $\beta$ parameter, while the $\alpha$ parameter remains the same as in the unadjusted model.
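The three summary measures can be computed from the R = 1,000 replicated estimates as in the Python sketch below (hypothetical function name). With the variance computed on divisor R − 1 and the MSE on divisor R, the identity MSE = (R − 1)/R · Var + Bias² holds exactly. This convention is our assumption, but it appears consistent with the small gaps between MSE and Var + Bias² in the tables that follow. For example, 4.016 × 0.999 + 0.411² ≈ 4.181 for the first row of Table 3.

```python
def summarize(estimates, true_value):
    """Mean, bias, variance (divisor R - 1) and MSE (divisor R) of replicated estimates."""
    R = len(estimates)
    mean = sum(estimates) / R
    bias = mean - true_value
    var = sum((e - mean) ** 2 for e in estimates) / (R - 1)
    mse = sum((e - true_value) ** 2 for e in estimates) / R
    return mean, bias, var, mse


# Toy replications of an estimator whose target value is 2.5.
mean, bias, var, mse = summarize([1.0, 2.0, 3.0], 2.5)
```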
5.3.1 Results of the Simulation Study
Table 3 and Table 4 summarize the results for the low dose case for the alpha and
beta parameters of the model respectively. Table 5 and Table 6 present the results for the
moderate dose case, and Table 7 and Table 8 present the results corresponding to the
high dose case.
Table 3. Results for the alpha parameter of the model for a low dose case based on 1,000 replications

γ      η        Mean     Bias    Var     MSE
0.25   0.0125   -5.089   0.411   4.016   4.181
0.25   0.025    -5.071   0.429   3.878   4.058
0.25   0.050    -5.068   0.432   3.791   3.974
0.50   0.025    -5.125   0.375   4.248   4.384
0.50   0.050    -5.391   0.109   5.465   5.472
0.50   0.100    -5.203   0.297   4.434   4.518
0.75   0.0375   -5.042   0.458   3.884   4.090
0.75   0.075    -5.107   0.393   4.160   4.310
Table 4. Results for the beta parameter of the model for a low dose case based on 1,000 replications

                  Ignoring CME                       Adjusting for CME
γ      η        Mean    Bias    Var     MSE         Mean    Bias    Var     MSE
0.25   0.0125   0.907   -2.59   0.526   7.252       3.620   0.12    8.36    8.37
0.25   0.025    1.017   -2.48   0.652   6.817       4.078   0.58    10.50   10.82
0.25   0.050    1.208   -2.29   0.906   6.156       4.857   1.36    14.95   16.78
0.50   0.025    1.692   -1.81   1.874   5.142       3.386   -0.11   7.49    7.50
0.50   0.050    2.053   -1.45   2.892   4.985       4.115   0.62    11.70   12.06
0.50   0.100    2.083   -1.42   2.803   4.809       4.180   0.68    11.35   11.80
0.75   0.0375   2.371   -1.13   3.611   4.882       3.163   -0.34   6.42    6.53
0.75   0.075    2.534   -0.97   4.180   5.110       3.376   -0.12   7.42    7.43
Table 5. Results for the alpha parameter of the model for a moderate dose case based on 1,000 replications

                  Ignoring CME / First-order          Adjusting for CME: Second order
γ      η        Mean     Bias    Var     MSE        Mean    Bias     Var     MSE
0.25   0.0125   -2.779   2.721   2.003   9.402      5.481   10.981   0.264   120.85
0.25   0.025    -1.970   3.530   1.021   13.483     3.879   9.379    0.130   88.09
0.25   0.050    -1.173   4.327   0.382   19.107     2.675   8.175    0.050   66.88
0.50   0.025    -3.534   1.966   3.218   7.082      7.657   13.157   0.495   173.61
0.50   0.050    -2.674   2.826   1.860   9.843      5.455   10.955   0.249   120.26
0.50   0.100    -1.676   3.824   0.754   15.378     3.782   9.282    0.116   86.28
0.75   0.0375   -4.078   1.422   4.277   6.294      9.293   14.793   0.806   219.65
0.75   0.075    -3.665   1.835   3.474   6.840      6.817   12.317   0.548   152.26
Table 6. Results for the beta parameter of the model for a moderate dose case based on 1,000 replications

                 Ignoring CME                      Adjusting for CME: First order       Adjusting for CME: Second order
γ     η        Mean   Bias    Var    MSE        Mean   Bias    Var    MSE          Mean    Bias    Var      MSE
0.25  0.0125   0.423  -3.077  0.046  9.51       1.695  -1.805  0.747  13.74        -3.335  -6.835  1.87     48.59
0.25  0.025    0.280  -3.220  0.020  10.39      1.125  -2.375  0.332  17.76        -2.276  -5.776  4.70     38.06
0.25  0.050    0.136  -3.364  0.005  11.32      0.546  -2.954  0.081  22.68        -3.408  -6.908  4457.24  4500.51
0.50  0.025    1.117  -2.383  0.320  6.00       2.233  -1.267  1.276  10.68        -4.888  -8.388  5.22     75.57
0.50  0.050    0.840  -2.660  0.182  7.26       1.680  -1.82   0.727  13.83        -3.485  -6.985  7.46     56.24
0.50  0.100    0.513  -2.987  0.069  8.99       1.029  -2.471  0.281  18.52        -1.813  -5.313  116.32   144.44
0.75  0.0375   1.976  -1.524  1.000  3.32       2.636  -0.864  1.782  8.87         -5.993  -9.493  1.72     91.85
0.75  0.075    1.833  -1.667  0.866  3.64       2.441  -1.059  1.536  9.71         -4.782  -8.282  22.79    91.35
Table 7. Results for the alpha parameter of the model for a high dose case based on 1,000 replications

                 Ignoring CME / First-order          Adjusting for CME: Second order
γ     η        Mean    Bias    Var    MSE          Mean    Bias    Var    MSE
0.25  0.0125   0.471   5.971   3.169  38.823       -1.018  4.482   1.665  21.750
0.25  0.025    1.446   6.946   1.113  49.362       1.079   6.579   0.611  43.897
0.25  0.050    2.097   7.597   0.362  58.078       3.687   9.187   0.310  84.709
0.50  0.025    -0.160  5.340   5.144  33.649       -2.509  2.991   2.794  11.738
0.50  0.050    1.138   6.638   1.653  45.719       -0.284  5.216   0.878  28.087
0.50  0.100    2.046   7.546   0.405  57.341       2.295   7.795   0.326  61.091
0.75  0.0375   0.192   5.692   4.071  36.464       -2.644  2.856   2.205  10.362
0.75  0.075    1.686   7.186   0.803  52.441       -0.213  5.269   0.500  28.449
Table 8. Results for the beta parameter of the model for a high dose case based on 1,000 replications

                                                       Adjusting for CME
                 Ignoring CME                       First order                        Second order
γ     η        Mean   Bias    Var       MSE      Mean   Bias    Var     MSE        Mean   Bias    Var      MSE
0.25  0.0125   0.123  -3.377  0.0101    11.411   0.495  -3.005  0.1617  9.193      0.936  -2.564  0.154    6.73
0.25  0.025    0.034  -3.466  0.0009    12.012   0.138  -3.362  0.0145  11.320     0.349  -3.151  0.018    9.95
0.25  0.050    0.002  -3.498  6.9×10⁻⁶  12.234   0.009  -3.491  0.0001  12.185     0.039  -3.461  4×10⁻⁴   11.98
0.50  0.025    0.412  -3.088  0.1101    9.648    0.825  -2.675  0.4428  7.600      1.411  -2.089  0.399    4.76
0.50  0.050    0.138  -3.362  0.0137    11.319   0.276  -3.224  0.0551  10.451     0.619  -2.881  0.058    8.36
0.50  0.100    0.011  -3.489  0.0001    12.173   0.022  -3.478  0.0006  12.095     0.087  -3.413  0.002    11.65
0.75  0.0375   0.562  -2.938  0.2140    8.844    0.751  -2.749  0.3822  7.941      1.348  -2.152  0.355    4.98
0.75  0.075    0.113  -3.387  0.0114    11.482   0.151  -3.349  0.0203  11.238     0.431  -3.069  0.031    9.45
The estimation of the parameters under the unadjusted model presented several characteristics that were common in all situations. First, the bias in the estimation of the alpha parameter was always positive. Second, in the estimation of the beta parameter, the bias was always negative. In addition, this bias was smaller for situations with higher $\gamma$.
For the low dose case, the estimation of the alpha parameter contains a small bias, which in all situations was positive. In the estimation of the beta parameter, the proposed adjustments successfully reduce bias. The reduction in bias seems to be greater in situations where the compliance distribution has small variance. This was the case when $\gamma$ was 0.25 and 0.50, but not when $\gamma$ was taken to be 0.75. The MSEs are higher under the adjusted model due to the increase in variances.
For the moderate dose case, there was a positive bias in the estimation of the alpha parameter. The biases were smaller for compliance distributions with smaller variance. The adjusted estimates for alpha came out to have substantially larger bias and correspondingly larger MSEs. For the beta parameter, the bias is always negative under all three methods. The proposed model presented an increase in the bias as well as in the variance, leading to substantially higher MSEs. The first-order adjustments presented smaller biases in all situations. However, the variances came out to be higher, which led to slightly higher MSEs than under the unadjusted model.
Finally, for the high dose case, the bias in the estimation of the alpha parameter was positive in all situations. The adjusted estimates presented less bias in situations where the compliance distribution has small variance. The bias was larger when the variance was taken to be $0.2\gamma$. The variance of these estimates came out to be smaller under the adjusted model, and the MSEs were also smaller, except for those cases where the bias was larger. For the estimation of the beta parameter, the first-order adjustments reduce bias in all situations, and the proposed adjustments reduce bias even further. The variance increases under the first-order model and increases even more in most cases under the proposed model for high dose. Despite this, the MSEs are smaller in all situations.
5.3.2 Discussion
The present simulation study confirms the presence of bias in the estimation of
the parameters of the model when compliance measurement error is ignored. The alpha
parameter is estimated with a positive bias under all different compliance distributions
studied here. At the same time, the beta parameter is estimated with a negative bias in all
situations studied. The beta parameter represents the effect of the dose, and the negative
bias indicates an attenuation effect due to ignoring compliance measurement error. This
attenuation effect is expected under the compliance model, where the true dose is always
assumed to be lower than the administered dose. Also, as expected, the bias is smaller
for situations where the compliance distribution has larger values for its mean, which
correspond to the situations where the mean of the true doses is closer to the
administered dose.
Among all three cases, the adjustments proposed for the moderate dose case do not seem to work adequately. Both bias and MSE are larger in the estimation of both the alpha and beta parameters when using the adjusted model in comparison with the unadjusted model. This result is common across all the types of compliance distributions considered here. A possible explanation of this failure is the fact that, in the moderate dose case, the adjustments are based on an expansion of the model around a dose $V_t$, where $V_t$ was taken such that $\pi_0(V_t) = \pi_t$ with $\pi_t = 1/2$. A more appropriate centering value for the expansion could have been $\pi_t = 1/2 + \epsilon$, and ignoring $\epsilon$ could have introduced non-negligible noise into the approximation. Other values for $\pi_t$ were considered during the development of the adjustments for the moderate case, but the resulting adjustments were considered too complex to be practical. For now, we can recommend the use of a model with first-order adjustments. Model (2.6), the model using the expected value of the true dose, resulted in a reduction of the bias in the estimation of the beta parameter under all situations.
The adjustments for the low dose case seem to work in reducing the bias in the estimation of the parameters under all types of compliance distributions. This reduction in bias seems to be more important in situations where the compliance distribution has small variance. On the other hand, the MSEs turned out to be larger due to the larger variability of the adjusted estimates. Comparing the MSEs, however, is appropriate only if the distributions of both estimates are comparable, in particular, if they are normally distributed. Under non-normality, the mean and variance are not enough to describe the distribution of the estimates. Here both estimates, with and without adjusting for compliance measurement error, are asymptotically normal. However, rates of convergence are important when comparing the distributions of the estimates, especially in small samples. In any case, methods that are nearly unbiased are usually preferable, and the issue of increased variability needs to be studied further in both the low dose case and the moderate case with first-order adjustments.
Finally, the results of this simulation study show that the adjustments for the high
dose case are successful in achieving both a reduction in bias as well as a reduction in
MSE, which translates into an improvement in the overall efficiency of the estimation.
These characteristics are maintained when comparing with either the unadjusted model
or the model with first-order adjustments.
CHAPTER VI
REMARKS
The problem of compliance is common in many applications. In inhalation toxicology studies, it is reasonable to expect that the dose administered is not completely absorbed due to the normal mechanisms working within a system, such as metabolism and clearance mechanisms, among others. Experiments using subhuman primates might not be able to ensure that the dose distributed is actually consumed. The problem of compliance can also be found in human studies; one example is the study of radiation effects, as described in Chapter V. In all these situations, the more relevant measure of dose is one smaller in magnitude than the administered dose.
In dose-response analysis, the dose refers to the actual dose incorporated in the experimental protocol, which might not necessarily be equal to the administered dose. The point of distinction is that in a measurement error or a stochastic compliance model, the actual input dose $D$ is stochastic and may differ from the administered dose level $V$ (mostly nonstochastic). In a measurement error model, in general, we let $D = V + U$, where $U$ represents the measurement error, and it is generally assumed that $U$ has a distribution symmetric around 0; often $U$ is taken to be normally distributed with mean 0 and variance $\sigma_U^2$. In a stochastic compliance model, on the contrary, we take

$$D = UV, \qquad 0 \le U \le 1,$$

where the stochastic factor $U$ represents the proportion of the administered dose, $V$, that relates to the actual dose $D$.
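The contrast between the two models can be illustrated with a small simulation. The Python sketch below is illustrative only. The administered dose V = 4.0, the error standard deviation 0.5, and the Beta(3.5, 10.5) compliance distribution (mean 0.25) are arbitrary choices, and the function name is hypothetical. Under $D = V + U$ the simulated doses scatter around $V$. Under $D = UV$ they stay in $[0, V]$ and average about $E(U)\,V$, well below $V$.

```python
import random


def simulate_doses(v, sigma, a, b, n, seed=0):
    """Draw actual doses under classical error (D = V + U) and stochastic compliance (D = UV)."""
    rng = random.Random(seed)
    classical = [v + rng.gauss(0.0, sigma) for _ in range(n)]
    compliance = [rng.betavariate(a, b) * v for _ in range(n)]
    return classical, compliance


classical, compliance = simulate_doses(v=4.0, sigma=0.5, a=3.5, b=10.5, n=20000)
print(sum(classical) / len(classical))    # close to V = 4.0
print(sum(compliance) / len(compliance))  # close to E(U) * V = 0.25 * 4.0 = 1.0
```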
One contribution of this dissertation work is the proposed model for stochastic compliance. Multiplicative models have been used before for modeling the measurement error problem. Typically, $U$ is assumed to follow a Lognormal or Gamma distribution with $E(U) = 1$, where no particular reason is offered for the choice of distribution. The point to notice is that, in the compliance measurement error problem, $U$ typically has a distribution centered somewhere in (0, 1), and $E(U)$ is generally smaller than one. In this context, the Beta distribution is appropriate, providing a great deal of flexibility to account for potential compliance patterns while having $E(U) < 1$, as desired.
We have proposed adjustments for logistic models to account for compliance
measurement error. Approximate models were proposed for the classical logistic model
for one binary outcome; a bivariate logistic model; and a segmented logistic model.
Moreover, adjustments were developed for situations in which the dose is considered
low, moderate or high.
Therefore, we believe that a wide range of studies can be
analyzed by using the proposed models.
The ideas proposed here for the development of the models, as well as the proposed model for compliance based on the beta distribution, can be useful in other models and might be the subject of further research. Another area for further research is extending the bivariate case to higher dimensions. The general multivariate situation, however, is currently being studied by many, where the problem under consideration is accounting for the dependencies among the responses. Accounting for measurement error of the compliance type in these models can be a next step for research.
Another type of dependence that might need to be considered is that arising from subjects within each dose group due to clustering of some kind. These types of dependencies are likely to affect second-order adjustments, and in some situations might make the proposed adjustments inappropriate.
The simulation study presented in Chapter V shows that the proposed models have reduced bias in most cases. The approximate models have reduced bias for the low and high dose cases. For the moderate case, a first-order adjustment seems to be successful in reducing bias, but the proposed adjustments based on second-order approximations are not. Further research is needed to study the failure of the second-order adjustments. For the low dose case, the mean squared error seems to be greatly affected by the adjustments. However, this situation might improve when the distribution of compliance has small variance, for which the reduction in bias would be more substantial. For example, in the study of radiation effects presented in Chapter V, we can see that despite the substantial increase in the variability of the estimate of β, the difference between this estimate and the estimate obtained ignoring compliance measurement error is even more striking. This provides evidence of the magnitude of the bias incurred by ignoring such measurement error in the administered dose. In any case, methods that are nearly unbiased are usually preferable, and the issue of increased variability needs to be studied further in both the low dose case and the moderate case with first-order adjustments.
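The bias incurred by ignoring compliance can be reproduced in a small simulation. The settings below are illustrative assumptions, not the simulation design of Chapter V: responses follow a logistic model in the absorbed dose U·d with U ~ Beta(40, 10), so E(U) = 0.8, but the model is fitted on the administered dose d. Under a first-order approximation the naive slope should be close to β·E(U), that is, attenuated toward zero, and dividing by E(U) gives a crude first-order correction. The helper names are hypothetical.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_counts(doses, n_per_dose, alpha, beta, a, b, seed=1):
    """Binary responses generated from the absorbed dose U * d, U ~ Beta(a, b)."""
    rng = random.Random(seed)
    counts = []
    for d in doses:
        y = 0
        for _ in range(n_per_dose):
            x = rng.betavariate(a, b) * d        # true (absorbed) dose
            if rng.random() < expit(alpha + beta * x):
                y += 1
        counts.append(y)
    return counts

def fit_logistic_grouped(doses, counts, n_per_dose, iters=25):
    """Newton-Raphson for logit P(Y = 1) = b0 + b1 * d on grouped binomial data."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for d, y in zip(doses, counts):
            p = expit(b0 + b1 * d)
            w = n_per_dose * p * (1.0 - p)      # binomial weight
            r = y - n_per_dose * p             # score residual
            g0 += r
            g1 += r * d
            h00 += w
            h01 += w * d
            h11 += w * d * d
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det      # Newton step: inverse information times score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

alpha, beta, a, b = -2.0, 1.0, 40.0, 10.0      # illustrative values only
doses = [0.0, 1.0, 2.0, 3.0, 4.0]
counts = simulate_counts(doses, 10000, alpha, beta, a, b)
_, slope_naive = fit_logistic_grouped(doses, counts, 10000)
mu = a / (a + b)
print(slope_naive, beta * mu, slope_naive / mu)  # naive, first-order prediction, crude correction
```

Grouping by dose level keeps the fit cheap; with individual-level data the same Newton-Raphson update applies with unit weights.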
On the other hand, the simulation study shows that the proposed model for the high dose case is successful in achieving all the desired characteristics: reduction of bias while still controlling the size of the mean squared error. These characteristics are present in the model with first-order adjustments, and improve even further in the model with second-order adjustments.
Several points are worth mentioning with respect to the additional data needed for the estimation of the compliance distribution. First, the need for independent studies to assess the measurement error process is common, and not particular to our situation. An alternative would be to resort to additional assumptions. Second, in cases where true doses are very difficult or even impossible to obtain, the true dose measurements in the independent study are at best estimates, usually with large errors. This seems to be the case when using organ dose estimates, as in the radiation effects study of Chapter V, where phantoms were used for the estimation. We believe, however, that using this type of data to estimate the compliance distribution is less problematic than using the estimated doses directly in fitting a model and attempting other adjustments for measurement error.
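One simple way to estimate the compliance distribution from an independent study, assuming it yields paired administered and true (or estimated true) doses, is to form the ratios U_i = true dose / administered dose and match the first two moments of a Beta distribution. This is a method-of-moments sketch with hypothetical data and names; the estimation procedure used in this dissertation may differ. When the "true" doses are themselves estimates, their errors carry over into these ratios.

```python
import statistics

def beta_method_of_moments(ratios):
    """Match Beta(a, b) mean and variance to observed compliance ratios."""
    m = statistics.mean(ratios)
    v = statistics.variance(ratios)      # sample variance, n - 1 denominator
    if not (0.0 < m < 1.0) or v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("ratios are not consistent with a Beta distribution")
    c = m * (1.0 - m) / v - 1.0          # c = a + b
    return m * c, (1.0 - m) * c

# Hypothetical validation pairs: (administered dose, true dose)
pairs = [(2.0, 1.2), (2.0, 1.4), (4.0, 3.2), (4.0, 3.0), (1.0, 0.65)]
ratios = [true / administered for administered, true in pairs]
a_hat, b_hat = beta_method_of_moments(ratios)
print(a_hat, b_hat)
```

The fitted parameters reproduce the sample mean and variance exactly, since a/(a + b) = m and ab/((a + b)^2 (a + b + 1)) = m(1 − m)/(c + 1) = v.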
In sum, we believe that the problem of stochastic compliance is present in many situations, and its appropriate consideration is relevant. In this dissertation we provide logistic models that account for compliance measurement error and that are likely to be appropriate in a large variety of situations.
Appendix

Computations for $I_i(\beta^*)$

Some details of the computation of (3.14) are given in this appendix. Let

$$I_i(\beta^*) = \sum_{j=0}^{1} \sum_{s=0}^{1} K_i^{(js)}(v_i; \beta^*).$$

The aim of this section is to present computational expressions for the $4 \times 4$ matrices $K_i^{(js)}(v_i; \beta^*)$.
First, let $E_1 = e^{-(\alpha_1^* + \beta_1^* v_i)}$ and $E_2 = e^{-(\alpha_2^* + \beta_2^* v_i)}$ to simplify the presentation of these expressions somewhat. We now present expressions for each $(j, s)$ term.
For $j = 0$, $s = 0$, the triplets $(a, b, c)$ are specified for each $(m, n)$ cell as follows:

$$(a, b, c) = \begin{pmatrix} (0,2,0) & (0,1,1) & (1,2,0) & (1,1,1) \\ & (0,0,2) & (1,1,1) & (1,0,2) \\ & & (2,2,0) & (2,1,1) \\ & & & (2,0,2) \end{pmatrix}$$

For $j = 0$, $s = 1$, with

$$(a, b, c, d, e) = \begin{pmatrix} (0,0,2,1,2,3) & (0,0,1,1,1,1) & (0,1,2,1,2,3) & (0,1,1,1,1,1) \\ & (0,0,0,1,0,-1) & (1,1,1,1,1,1) & (0,1,0,1,0,-1) \\ & & (0,2,2,1,2,3) & (1,2,1,1,1,1) \\ & & & (0,2,0,1,0,-1) \end{pmatrix}$$

For $j = 1$, $s = 0$, with

$$(a, b, c, d, e) = \begin{pmatrix} (0,0,1,0,0,-1) & (1,0,1,1,1,1) & (0,1,1,0,0,-1) & (0,1,1,1,1,1) \\ & (0,0,1,2,2,3) & (1,1,1,1,1,1) & (0,1,1,2,2,3) \\ & & (0,2,1,0,0,-1) & (0,2,1,1,1,1) \\ & & & (0,2,1,2,2,3) \end{pmatrix}$$

For $j = 1$, $s = 1$, we present the expressions more explicitly because they are much more cumbersome,
with
Q=
v; Ai
v;A 1 A 2 v;A~
where
$$A_1 = \{E_1 + (\alpha_1^* + \beta_1^* v_i)[2 + E_1 + E_2]\}[1 + E_1][1 + E_2][1 + E_1 + E_2] - E_1[2 + E_1 + E_2][1 + E_2][2 + 2E_1 + E_2],$$

$$A_2 = \{E_2 + (\alpha_2^* + \beta_2^* v_i)[2 + E_1 + E_2]\}[1 + E_1][1 + E_2][1 + E_1 + E_2] - E_2[2 + E_1 + E_2][1 + E_2][2 + E_1 + 2E_2].$$
REFERENCES
Andersen, M.E. (1994). Physiologically-Based Pharmacokinetic Modeling. Drug Information Journal 28, 247-254.
Andersen, M.E., Krewski, D. and Withey, J.R. (1993). Physiological Pharmacokinetics and Cancer Risk Assessment. Cancer Letters 69, 1-14.
Armitage, P. and Doll, R. (1954). The Age Distribution of Cancer and Multistage Theory of Carcinogenesis. British Journal of Cancer 8, 1-12.
Armitage, P. and Doll, R. (1957). A Two-Stage Theory of Carcinogenesis in Relation to the Age Distribution of Human Cancer. British Journal of Cancer 11, 161-169.
Armstrong, B. (1985). Measurement Error in the Generalized Linear Model. Communications in Statistics - Simulation and Computation 14, 529-544.
Bois, F.Y., Zeise, L. and Tozer, T.N. (1990). Precision and Sensitivity of Pharmacokinetic Models for Cancer Risk Assessment: Tetrachloroethylene in Mice, Rats, and Humans. Toxicology and Applied Pharmacology 102, 300-315.
Carroll, R.J., Ruppert, D., and Stefanski, L.A. (1995). Measurement Error in Nonlinear Models. Chapman and Hall, London.
Carroll, R.J., Spiegelman, C.H., Lan, K.K.G., Bailey, K.T. and Abbott, R.D. (1984). On Errors-in-Variables for Binary Regression Models. Biometrika 71, 19-25.
Carroll, R.J. and Wand, M.P. (1991). Semiparametric Estimation in Logistic Measurement Error Models. Journal of the Royal Statistical Society - Series B 53, 573-585.
Conolly, R.B., Reitz, R.H., Clewell, H.J., and Andersen, M.E. (1988). Pharmacokinetics, Biochemical Mechanism and Mutation Accumulation: a Comprehensive Model of Chemical Carcinogenesis. Toxicology Letters 43, 189-200.
Conolly, R.B. and Andersen, M.E. (1991). Biologically Based Pharmacodynamic Models: Tools for Toxicological Research and Risk Assessment. Annual Review of Pharmacology and Toxicology 31, 503-523.
Conolly, R.B. and Andersen, M.E. (1993). An Approach to Mechanism-based Cancer Risk Assessment for Formaldehyde. Environmental Health Perspectives Supplements 101, 169-176.
Crump, K.S. (1994). Limitations of Biological Models of Carcinogenesis for Low-Dose Extrapolation. Risk Analysis 14, 883-886.
Dedrick, R.L. (1973). Animal Scale-Up. Journal of Pharmacokinetics and Biopharmaceutics 1, 435-461.
Devidas, M., George, E.O., and Zelterman, D. (1993). Generalized Logistic Models for Low-Dose Response Data. Statistics in Medicine 12, 881-892.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, London.
Finney, D.J. (1978). Statistical Method in Biological Assay. Hafner Publishing Company, New York.
Flegal, K.M., Keyl, P.M., and Nieto, F.J. (1991). Differential Misclassification Arising from Nondifferential Errors in Exposure Measurement. American Journal of Epidemiology 134, 1233-1244.
Fuller, W.A. (1987). Measurement Error Models. John Wiley and Sons, New York.
Gerlowski, L.E. and Jain, R.K. (1983). Physiologically Based Pharmacokinetic Modeling: Principles and Applications. Journal of Pharmaceutical Sciences 72, 1103-1127.
Gong, G. and Samaniego, F.J. (1981). Pseudo Maximum Likelihood Estimation: Theory and Applications. The Annals of Statistics 9, 861-869.
Haseman, J.K. (1984). Statistical Issues in the Design, Analysis and Interpretation of Animal Carcinogenicity Studies. Environmental Health Perspectives 58, 385-392.
Hoel, D.G., Kaplan, N.L., and Anderson, M.W. (1983). Implication of Nonlinear Kinetics on Risk Estimation in Carcinogenesis. Science 219, 1032-1037.
IARC (1986). Long-Term and Short-Term Assays for Carcinogens: a Critical Appraisal. No. 83. Lyon: International Agency for Research on Cancer.
Johnson, N.L. and Kotz, S. (1970). Continuous Univariate Distributions-1. New York: John Wiley and Sons.
Johnson, N.L. and Kotz, S. (1970). Continuous Univariate Distributions-2. New York: John Wiley and Sons.
Kendall, M.G. and Stuart, A. (1981). Regression, Structure and Functional Relationship. Biometrika 38, 11-25.
Kerr, G.D. (1979). Organ Dose Estimates for Japanese Atomic-Bomb Survivors. Health Physics 37, 487-508.
Kuha, J. (1994). Corrections for Exposure Measurement Error in Logistic Regression Models with an Application to Nutritional Data. Statistics in Medicine 13, 1135-1148.
Lang, J.B. and Agresti, A. (1994). Simultaneously Modeling Joint and Marginal Distributions of Multivariate Categorical Responses. Journal of the American Statistical Association 89, 625-632.
Lesaffre, E. and Molenberghs, G. (1991). Multivariate Probit Analysis: A Neglected Procedure in Medical Statistics. Statistics in Medicine 10, 1391-1403.
Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal Data Analysis using Generalized Linear Models. Biometrika 73, 13-22.
Liang, K.-Y., Zeger, S.L. and Qaqish, B. (1992). Multivariate Regression Analyses for Categorical Data. Journal of the Royal Statistical Society Series B 54, 3-40.
Lipsitz, S.R., Laird, N.M. and Harrington, D.P. (1990). Maximum Likelihood Regression Methods for Paired Binary Data. Statistics in Medicine 9, 1517-1525.
Lipsitz, S.R., Laird, N.M. and Harrington, D.P. (1991). Generalized Estimating Equations for Correlated Binary Data: Using the Odds Ratio as a Measure of Association. Biometrika 78, 153-160.
Lutz, R.J. and Dedrick, R.L. (1987). Implication of Pharmacokinetic Modeling in Risk Assessment Analysis. Environmental Health Perspectives 76, 97-106.
Mammen, E. (1992). Bootstrap, Wild Bootstrap, and Asymptotic Normality. Probability Theory and Related Fields 93, 439-455.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models. Chapman and Hall, London.
McDonald, B.W. (1993). Estimating Logistic Regression Parameters for Bivariate Binary Data. Journal of the Royal Statistical Society Series B 55, 391-397.
Molenberghs, G. and Lesaffre, E. (1994). Marginal Modeling of Correlated Ordinal Data Using a Multivariate Plackett Distribution. Journal of the American Statistical Association 89, 633-644.
Moolgavkar, S.H. and Venzon, D.J. (1979). Two-Event Models for Carcinogenesis: Incidence Curves for Childhood and Adult Tumors. Mathematical Biosciences 47, 55-77.
Moolgavkar, S.H. and Knudson, A.G. (1981). Mutation and Cancer: A Model for Human Carcinogenesis. Journal of the National Cancer Institute 66, 1037-1052.
Moolgavkar, S.H., Dewanji, A. and Venzon, D.J. (1988). A Stochastic Two-Stage Model for Cancer Risk Assessment. I. The Hazard Function and the Probability of Tumor. Risk Analysis 8, 383-392.
Moolgavkar, S.H. (1994). Biological Models of Carcinogenesis and Quantitative Cancer Risk Assessment. Risk Analysis 14, 879-882.
Moolgavkar, S.H. and Luebeck, G. (1990). Two-Event Model for Carcinogenesis: Biological, Mathematical, and Statistical Considerations. Risk Analysis 10, 323-341.
Morris, S.C. (1990). Cancer Risk Assessment: A Quantitative Approach. Marcel Dekker, Inc., New York.
Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized Linear Models. Journal of the Royal Statistical Society A 135, 370-384.
Pierce, D.A., Stram, D.O., and Vaeth, M. (1990). Allowing for random errors in radiation exposure estimates for the atomic bomb survivor data. Radiation Research 123, 275-284.
Portier, C.J. and Sherman, C.D. (1994). The potential effects of chemical mixtures on the carcinogenic process within the context of mathematical multistage model. In Yang, R. (ed.) Risk Assessment of Chemical Mixtures: Biological and Toxicological Issues. Academic Press, Inc., 25, 665-686.
Portier, C.J. and Hoel, D. (1983). Optimal Design of the Chronic Animal Bioassay. Journal of Toxicology and Environmental Health 12, 1-19.
Portier, C.J. and Kaplan, N.L. (1989). Variability of Safe Dose Estimate when using Complicated Models of the Carcinogenic Process. Applied Toxicology 13, 533-544.
Portier, C.J., Kopp-Schneider, A., and Sherman, C.D. (1996). Calculating Tumor Incidence Rates in Stochastic Models of Carcinogenesis. Mathematical Biosciences 135, 129-146.
Prentice, R.L. (1976). Generalization of the Probit and Logit Methods for Dose-Response Curves. Biometrics 32, 761-768.
Prentice, R.L. (1986). Binary Regression using an Extended Beta-Binomial Distribution, with Discussion of Correlation Induced by Covariate Measurement Errors. Journal of the American Statistical Association 81, 321-327.
Preston, D.L. et al. (1994). Cancer Incidence in Atomic Bomb Survivors. Part III: Leukemia, Lymphoma and Multiple Myeloma, 1950-1987. Radiation Research Supplement 137, S68-S97.
Reade-Christopher, S. and Kupper, L.L. (1995). On the Effects of Predictor Misclassification in Multiple Linear Regression Analysis. Communications in Statistics - Theory and Methods 24, 13-37.
Roesch, W.C. (Ed.) (1987). Final Report of US-Japan Joint Reassessment of Atomic Bomb Radiation Dosimetry in Hiroshima and Nagasaki. Radiation Effects Research Foundation, Hiroshima.
Rosner, B., Willett, W.C. and Spiegelman, D. (1989). Correction of Logistic Regression Relative Risk Estimates and Confidence Intervals for Systematic Within-Person Measurement Error. Statistics in Medicine 8, 1151-1163.
Rosner, B., Spiegelman, D. and Willett, W.C. (1990). Correction of Logistic Regression Relative Risk Estimates and Confidence Intervals for Measurement Error: The Case of Multiple Covariates Measured with Error. American Journal of Epidemiology 132, 734-745.
Sen, P.K. and Singer, J.M. (1993). Large Sample Methods in Statistics: An Introduction with Applications. Chapman and Hall, London.
Sontag, J.M., Page, N.P., and Saffiotti, U. (1976). Guidelines for Carcinogen Bioassay in Small Rodents. DHHS Publication (NIH) 76-801, National Cancer Institute, Bethesda, Md.
Stefanski, L.A. and Carroll, R.J. (1985). Covariate Measurement Error in Logistic Regression. The Annals of Statistics 13, 1335-1351.
Stefanski, L.A. (1985). The Effects of Measurement Error on Parameter Estimation. Biometrika 72, 583-592.
Stefanski, L.A. (1989). Correcting Data for Measurement Error in Generalized Linear Models. Communications in Statistics - Theory and Methods 18, 1715-1733.
Stukel, T.A. (1988). Generalized Logistic Models. Journal of the American Statistical Association 83, 426-431.
Stukel, T.A. (1990). A General Model for Estimating ED100p for Binary Response Dose-Response Data. The American Statistician 44, 19-22.
Tan, W.Y. (1991). Stochastic Models of Carcinogenesis. Marcel Dekker, New York.
Tan, W.Y. and Singh, K.P. (1987). Assessing the Effects of Metabolism of Environmental Agents on Cancer Tumor Development by a Two-Stage Model of Carcinogenesis. Environmental Health Perspectives 74, 203-210.
Thompson, D.E. et al. (1994). Cancer Incidence in Atomic Bomb Survivors. Part II: Solid Tumors, 1958-1987. Radiation Research Supplement 137, S17-S67.
Wacholder, S., Dosemeci, M., and Lubin, J.H. (1991). Blind Assignment of Exposure Does Not Always Prevent Differential Misclassification. American Journal of Epidemiology 134, 433-437.
Whittemore, A. and Keller, J.B. (1978). Quantitative Theory of Carcinogenesis. SIAM Review 20, 1-30.