Tudor, Gail E. (1991). "Survival Analysis Using Primary and Surrogate Endpoints."

SURVIVAL ANALYSIS
USING PRIMARY AND SURROGATE ENDPOINTS
by
Gail E. Tudor
Department of Biostatistics, University of
North Carolina at Chapel Hill, NC.
Institute of Statistics Mimeo Series No. 1891T
September 1991
SURVIVAL ANALYSIS
USING PRIMARY AND SURROGATE ENDPOINTS
by
Gail E. Tudor
A dissertation submitted to the faculty of the University of North
Carolina in partial fulfillment of the requirements for the degree
of Doctor of Philosophy in the Department of Biostatistics.
Chapel Hill
1991
Approved by:
Advisor
Reader
Reader
ABSTRACT

GAIL E. TUDOR. Survival Analysis Using Primary and Surrogate Endpoints.
(Under the direction of TIMOTHY M. MORGAN)
Current survival techniques do not provide a good method for handling clinical trials with a large percent of censored observations. Some methods that have been proposed to increase the power of analyses include covariate adjustment, multivariate models, use of surrogate endpoints and the use of time dependent covariates. This research proposes using time dependent surrogates of survival as outcome variables, in conjunction with observed survival time, to improve the precision in comparing the relative effects of two treatments on the distribution of survival time, or in estimating the parameters in the survival model, where the outcome measure is time to a specified event. This is in contrast to the standard method, which uses only the marginal density of survival time, T, or only the marginal density of the surrogate, X, thereby ignoring some available information.
The surrogate measure, X, may be a fixed value or a time dependent variable, X(t). X
is a summary measure of some of the covariates measured throughout the trial that provide
additional information on a subject's survival time. It is possible to model these time dependent
covariate values and relate the parameters in the model to the parameters in the distribution of
T given X.
The efficiency of the standard method compared to the proposed method is presented for several different cases which use a fixed or time dependent variable. The last example applies data from a study on the effects of lipids on coronary heart disease. The proposed method is always at least as efficient as the standard method. When the treatment affects survival by having a multiplicative effect on the conditional hazard, the proposed method is most efficient when the values of X are very heterogeneous and there is little censoring. When treatment affects the distribution of the surrogate measure, X, only, the proposed method is most efficient when the values of X are homogeneous and there is a large amount of censoring. When both are affected, the largest gain is achieved when X is very heterogeneous and there is little censoring.
ACKNOWLEDGEMENTS
I would like to thank my committee members, Drs. Timothy Morgan, Gary Koch, Ed
Davis, P.K. Sen and Gary Rozier for their support of me throughout my stay here at UNC. A
special thanks to my advisor, Dr. Timothy Morgan, who made my dissertation experience an
extremely enjoyable and educational one. Also, I would like to thank him for being so available
and so helpful.
Lastly, I would like to thank all my student friends in the Biostatistics
Department for taking the time to speak and laugh with me. You are the ones who have kept
me sane.
TABLE OF CONTENTS

Chapter                                                                Page

I. INTRODUCTION
   1.1. Introduction ................................................... 1
   1.2. Surrogate Measures ............................................. 3
   1.3. Covariate Adjustment ........................................... 5
   1.4. Multivariate Multistate Models ................................. 7
   1.5. The Use of Surrogate Measures in Combination with Observed
        Survival Time to Improve the Estimate of Survival Distributions  9
   1.6. Proposed Research .............................................. 11

II. LITERATURE REVIEW AND PRELIMINARY MODELS
   2.1. Introduction ................................................... 13
   2.2. Use of Surrogate Measures in a Multivariate Normal Model ....... 13
   2.3. A Stochastic Model for Censored Survival Data in the Presence
        of an Auxiliary Variable ....................................... 18
   2.4. Nonparametric Survival Curve Estimation Using a Time Dependent
        Dichotomous Surrogate Measure .................................. 22
   2.5. Use of a Surrogate Response Measure in a Parametric Survival
        Model .......................................................... 26

III. USE OF A SINGLE VALUE SURROGATE MEASURE
   3.1. Introducing the Likelihood Model for Survival .................. 33
   3.2. Forming the Likelihood Equations Using Given Distributions ..... 34
   3.3. Comparing the Efficiency of the Proposed Method in Estimating
        the Parameter of an Exponential Model .......................... 36
   3.4. Computing the Efficiency Ratio When Survival Time is Affected
        by Treatment ................................................... 38
   3.5. Computing the Efficiency Ratio When the Effect of Treatment is
        on the Surrogate Measure, X, Only .............................. 41
   3.6. Computing the Efficiency Ratio When Both Survival Time and the
        Surrogate Measure, X, are Affected by Treatment ................ 44
   3.7. Introducing a General Model .................................... 49
   3.8. Computing the Efficiency Ratio When Survival Time is Affected
        by Treatment ................................................... 50
   3.9. Computing the Efficiency Ratio When the Effect of Treatment is
        on the Surrogate Measure, X, Only .............................. 53
   3.10. Computing the Efficiency Ratio When Both Survival Time and
        the Surrogate Measure, X, are Affected by Treatment ............ 55
   3.11. Results Extended to Random Type I Censoring ................... 59

IV. USE OF TIME DEPENDENT SURROGATE MEASURES ........................... 60
   4.1. Introduction ................................................... 60
   4.2. Simple Specific Example ........................................ 61
   4.3. General Model .................................................. 65
   4.4. Examples ....................................................... 69

V. FORMATION AND IMPLEMENTATION OF SURROGATE MEASURES .................. 88
   5.1. General Approach ............................................... 88
   5.2. Description of Data Set ........................................ 91
   5.3. Implementation ................................................. 92
   5.4. Computing the Efficiency Ratio ................................. 94

VI. SUMMARY AND FUTURE RESEARCH ........................................ 97
   6.1. Summary ........................................................ 97
   6.2. Future Research ................................................ 101

REFERENCES ............................................................. 103
CHAPTER 1
INTRODUCTION
1.1 Introduction
A clinical trial is a prospective study comparing the effect of an intervention(s) against a control in patients with a given medical condition. Such trials are often very well designed, paying close attention to the avoidance of bias in the selection, care and evaluation of patients. Each subject is followed, from a well defined point, which becomes time zero or baseline for the study, until they reach a given endpoint. A properly planned and executed clinical trial is a powerful experimental technique for assessing the effectiveness of an intervention. To plan a trial properly, the endpoints for evaluating each subject's response to treatment must be carefully defined. Two commonly used endpoints in drug trials are the occurrence of side effects (e.g. nausea/vomiting) and a success/failure indicator (e.g. a patient whose diastolic blood pressure drops below 90 mm Hg is a success and has thus attained the endpoint) (Herson, 1989). For both of these events the issue is whether the event occurred or not. The time at which it occurs may not be important.
However, time to some event is often the primary endpoint. For example, the time it takes a patient to move from the disease-free state to having some symptom of the disease (e.g. first detection of a tumor), or survival time, the time from which a patient begins the treatment until their death (e.g. in breast cancer studies, the time from mastectomy until death), are two common primary endpoints. For simplicity the term survival time will be used as a general term encompassing time to any endpoint.
Let T be a nonnegative continuous random variable representing the survival time of an individual. The survival and cumulative distribution functions are defined by Miller (1981) as

S(t) = P(T > t)   and   F(t) = 1 - S(t),

respectively. The probability density function, also known as the unconditional failure rate, is defined as

f(t) = -dS(t)/dt.

The hazard function or conditional failure rate specifies the instantaneous failure rate conditional upon survival to time t and is defined as

λ(t) = lim_{Δt→0} P(t < T < t + Δt | T > t) / Δt = f(t)/S(t) = -d log S(t)/dt.
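These identities can be checked numerically. The sketch below is not part of the dissertation; it uses an arbitrary exponential rate to verify that f(t)/S(t) and -d log S(t)/dt agree, and that for the exponential model the hazard is constant:

```python
import math

# Exponential model with rate lam: S(t) = exp(-lam*t), f(t) = lam*exp(-lam*t),
# so the hazard f(t)/S(t) should be constant and equal to lam.
lam = 0.5

def S(t):          # survival function P(T > t)
    return math.exp(-lam * t)

def f(t):          # density, -dS/dt
    return lam * math.exp(-lam * t)

def hazard(t):     # conditional failure rate f(t)/S(t)
    return f(t) / S(t)

# Numerically check hazard(t) = -d log S(t)/dt via a finite difference.
t, dt = 2.0, 1e-6
numeric = -(math.log(S(t + dt)) - math.log(S(t))) / dt
print(hazard(t), numeric)
```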
Survival times are usually known exactly for only a portion of all experimental units
under study and the rest are censored. An observation is said to be right censored at time C if
the exact survival time is known to be greater than C. Similarly, an observation is said to be
left censored if the exact survival time is known to be less than C.
In clinical trials right censoring is more common than left censoring. Only right censoring will be considered in this dissertation.
There are several ways in which right censoring can occur. Let t1 < t2 < ... < tn be the observed survival times of n patients under study. Type II censoring occurs when the r (r < n) smallest observed survival times are uncensored; the remaining n - r items are censored at the rth smallest observed survival time, tr. Type I censoring occurs when each subject is subjected to a limited period of observation, Ci, of prespecified or known length, or some random time independent of a subject's failure time, Ti, i = 1, 2, ..., n. Ti is observed only when Ti < Ci. Type I censoring is frequently used in clinical research where a decision is made to terminate a study after a prescribed period of time. For a given subject:

Ti is the survival time,
Ci is the censoring time, and
δi is the event indicator, equal to 1 if the observed time Yi = min(Ti, Ci) equals Ti, and 0 otherwise.
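As an illustrative sketch of Type I censoring with a fixed observation limit (the failure rate and limit below are invented for illustration, not taken from the dissertation):

```python
import random

random.seed(42)
C = 3.0          # fixed Type I observation limit, the same for every subject
lam = 0.4        # hypothetical exponential failure rate

subjects = []
for _ in range(1000):
    T = random.expovariate(lam)      # true survival time
    Y = min(T, C)                    # observed time
    delta = 1 if T < C else 0        # event indicator: 1 = death observed
    subjects.append((Y, delta))

censored_frac = 1 - sum(d for _, d in subjects) / len(subjects)
# The expected censored fraction is S(C) = exp(-lam*C), about 0.30 here.
print(f"observed censored fraction: {censored_frac:.2f}")
```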
For popular models like the exponential, Weibull and Cox's proportional hazards models, the power available to detect a treatment difference depends on the observed number of events. For trials that study diseases, like metastatic lung cancer, where rapid deaths occur such that there is very little censoring, available survival techniques work very well. But for rarer events, or events that take a long time to happen, any reasonable length trial (3-10 years) may see 50-95% censoring. Often a trial to study such events would be infeasible. For example, if you only expect 1% deaths and you need 400 deaths to have sufficient power, then the total number of patients required is 40,000. For the popular models just mentioned, the variability of an estimate has an inverse relationship with the number of patients who reach the endpoint. Thus, if there are few events the precision is low and the power available to detect a treatment difference is small. If the length of the study could be extended and more deaths observed, then the variability would be reduced and the power would increase. Due to cost and time constraints, this is often not feasible. This has led investigators to consider the use of surrogate measures, covariate adjustment, multivariate models and time-dependent covariates to improve power. Each of these topics will be covered in the following sections. A review of the literature on the uses of time dependent events to improve the estimate of the survival distribution will be covered in Chapter 2.
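The sample-size arithmetic in the example above can be written out directly, using the 1% event rate and 400 required events quoted in the text:

```python
# Required number of events and expected event (death) rate, from the
# example in the text.
required_events = 400
event_rate = 0.01

# Enrollment needed so that the expected number of events meets the target:
# n * event_rate = required_events.
n_patients = required_events / event_rate
print(int(n_patients))
```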
1.2 Surrogate Measures
A surrogate variable is measured as a substitute for some other variable. In clinical trials, surrogate variables have been used in lieu of primary outcome variables of interest. Survival is the simplest and most definitive of all possible endpoints. However, the time allotted to follow subjects may not be long enough to encompass the true survival times for the majority of the subjects. Thus, a surrogate response, S, has often been sought for the true response, T, that occurs later in time. Also, events which occur before death are often of interest but may not have been measured. If an experiment is designed to show a treatment effect in variable Y, but instead variable X was measured, a variable that is pathophysiologically related to variable Y, then variable X is often considered to be a surrogate of variable Y. A good surrogate endpoint has been defined by Prentice (1989) as a response variable for which a test of the null hypothesis of no relationship to the treatment groups under comparison is also a valid test of the corresponding null hypothesis based on the true endpoint.
In clinical trials, a true endpoint generally measures the clinical benefit of a treatment, while a surrogate endpoint measures the process of a disease. In cancer studies with survival time as the primary endpoint, according to Ellenberg and Hamilton (1989), surrogate endpoints frequently employed are tumor response, time to progression, and time to reappearance of disease, since these events occur earlier and are unaffected by the use of secondary therapies. In cardiovascular studies, surrogate endpoints frequently employed are ejection fraction (which measures the effectiveness of the heart in pumping blood), arterial patency, blood cholesterol and blood pressure (Wittes et al., 1989). For example, systolic blood pressure is measured instead of stroke for testing whether ingestion of calcium prevents stroke. In defining surrogate endpoints, a distinction must be made between study endpoints, which provide the basis for reliable comparisons of therapeutic effect, and clinical endpoints, which are useful for patient management (e.g. monitoring response to treatment) but are not sensitive or specific enough to be used as a formal endpoint in a clinical trial.
The primary motivation in using surrogate endpoints in clinical trials is to replace a rare or distal endpoint by a more frequent or proximate endpoint. The use of the more frequent endpoint can lead to dramatic reductions in sample size and shorter studies, since more subjects are experiencing the endpoint and less time is required for followup. Another justification cited for utilizing surrogate endpoints is that patients may survive a long time following diagnosis, so they will probably be censored at the end of the study. Also, effective secondary therapies may exist that prolong survival independently of the effect of the primary treatment under study. Thus, if the first treatment does not benefit the patient, we want to determine this quickly so we can then try another treatment. It is often the case that the endpoint of greatest relevance is not practical or even feasible to measure.
A surrogate endpoint is usually proposed on the basis of biological rationale. There must be a strong pathophysiological relationship between the true endpoint and the surrogate endpoint. It is also important to establish statistically that the correlation with the "true" endpoint is of a sufficient amount to justify the surrogate endpoint as a basis for inference. The suitability of the surrogate response depends not only on its relationship with the true endpoint but also on the treatments or interventions under comparison. Prentice (1989) offered the following as an example: a surrogate for coronary heart disease incidence that is suitable for the comparison of anti-hypertensive treatments may well not be suitable for the comparison of cholesterol lowering drugs, since the effects of these drugs on the surrogate variable could be very different.
The context of the therapeutic intervention determines whether the variable under
examination is a variable of interest or simply a surrogate. A study looking at the effect of
ingesting calcium may have stroke as the primary endpoint. A study to test if supplementing
the diet with calcium prolongs life, may have total mortality as the true endpoint and use stroke
as the surrogate endpoint.
One must distinguish between the use of endpoints to test mechanism and the use of endpoints to test the clinical benefits of therapy. A favorable response in a surrogate endpoint does not necessarily imply that the true endpoint was favorable. For example, the fact that low cholesterol is prognostically favorable does not necessarily imply that the lowering of cholesterol will prolong one's life. As Wittes et al. (1989) said, "A surrogate endpoint is appropriate when a test of mechanism can replace a test for clinical benefits of the therapy." The persuasiveness of the surrogate endpoint then depends on the state of understanding of the disease under investigation and on the availability of studies indicating that change in the value of the surrogate leads to a change in the true endpoint.
Even after a good surrogate endpoint has been defined there are still issues for concern.
A large potential bias may exist due to the missing data in trials with surrogate endpoints. This
is because the chance of having a missing endpoint may well relate to the treatment variable of
most interest. In addition, surrogate endpoints often suffer from informative censoring because
sometimes the action of reaching an endpoint cannot be easily observed. It must be measured to
know it occurred and some subjects do not return for a visit due to inconvenience or the
unpleasantness of this process.
1.3 Covariate Adjustment

Another procedure to increase the precision or reduce the bias of trials in survival analysis is to adjust for covariate measures that were measured at the beginning of the study. Variables measured before the treatment begins, baseline variables, may be included as possible prognostic variables in a model.
If the study is not a randomized study, it is important to
measure and adjust for confounding variables when making a comparison or the comparison will
be biased. Without the use of randomization of subjects to treatment groups it is likely that
important covariates will not be evenly distributed among the treatment groups.
Even with randomization, sample covariate distributions usually differ between groups
due to random variation. This can lead to biased estimates of treatment effect when important
baseline covariates are omitted from the model.
However, it has been shown by Gail et al. (1984) that the asymptotic bias from omitting covariates is zero if the regression of the response variable on treatment and the covariates is linear or exponential.
Much of the literature on
clinical trials emphasizes the importance of adjusting for any covariates (baseline variables) for
which randomization fails to result in a nearly even balance. In addition to reducing bias the
adjustment of covariates is used as a means to increase precision. When covariates are measured
and adjusted for in an analysis, the variance of the estimate of group difference is reduced by
eliminating that part of the variance due to random variation in the covariates. The gain in
precision resulting from including covariates can be given as a function of censoring. The larger
the percent of censored subjects the smaller the amount of gain that can be achieved.
In clinical trials, the number of covariates is usually so large that they cannot all be included in the analysis. Thus, we need a way to choose a subgroup of them if we are to adjust at all. Beach and Meier (1989) cover the following methods in some detail. One common method is to choose for adjustment those covariates showing a large disparity. Another method is to choose covariates with a large influence as measured by R², the squared correlation coefficient between the outcome variable and the predictor variable, or by the test statistic for the association coefficient in the model. These covariates should provide the most gain in efficiency. A third method is based on the product of influence and disparity. Some covariates are taken a priori as obvious candidates.
Where the benefits of adjusting for covariates related to the outcome event can be shown, so too can the losses. Beach and Meier (1989) claim that covariate selection, as described earlier, can often change the conclusions of the trial even though the precision of treatment effect either changes very little or actually increases. Results given in their paper show that unless either R² or N, the sample size, is very large, selection based on any of the three methods actually increases the mean squared error, so that an unadjusted estimate would then be preferable. They conclude that, under conditions typical in clinical trials, adjustment can be expected to decrease the precision, and adjustment improves precision only if R² and N are fairly large.
All of the above views agree that the importance of adjusting for a covariate increases as
the strength of its association with survival time increases. They all argue not to include all the
baseline variables, for which randomization fails to result in a nearly even balance across
treatment groups, but only adjust for those covariates that show a strong association with the
outcome variable.
1.4 Multivariate Multistate Models
Another procedure, aimed at increasing the power available to detect a treatment
difference, is not to distinguish between looking only at survival or looking only at surrogates
but to look at the joint multivariate effect of treatment. A mathematical framework in which
models that consider more than two states (e.g. alive and dead) can be analyzed is a
multivariate counting process. Andersen (1986) presented the following definitions along with a simple three state survival model example. A multivariate counting process N = {(N1(t), N2(t), ..., Nk(t)); t ∈ [0,1]} is a stochastic process with k components counting the occurrences, as time t proceeds, of k different types of events; Nh(t) is the number of type h events in the time interval [0,t]. Each component process Nh has jumps of size +1, and no two component processes can jump simultaneously. Thus multiple events cannot occur simultaneously. The events correspond to the transitions, for an individual or a group of individuals, between various states of a stochastic process. A rate or transition intensity, αh(t), is calculated between all events.
For example, in a study of survival in Stage I cancer, the occurrence of metastases is likely to increase the death intensity (hazard). One way of modelling this would be by means of a time dependent covariate taking the value 1 at time t if metastases have occurred before time t and 0 otherwise. This corresponds to proportional death intensities α02(t) and α12(t); α01(t) is the relapse rate. See Figure 1.1.
Fig. 1.1
The development in time of a multivariate counting process N is governed by its (random) intensity process Λ = {(λ1(t), ..., λk(t)); t ∈ [0,1]}, which is given as follows. Let Idt be a small time interval of length dt around time t; then λh(t)dt is the conditional probability that Nh jumps in Idt given all that has happened until just before time t. Using the notation of Andersen (1986), let dNh(t) denote the increment of Nh over Idt and let Ft- be the sigma-algebra recording what has happened up to, but not including, t, and then write

λh(t)dt = Pr[dNh(t) = 1 | Ft-].

Here Ft- is the domain (an algebra of events) of the multivariate counting process N(u) on the interval [0,t). Since it is a sigma-algebra, if F1, F2, ... ∈ Ft- then ∪n Fn ∈ Ft-. Ft- includes all unions, intersections, complements and combinations of its members. Thus, Ft- includes all events that have happened before (but not at) time t. As a consequence we have that Fs ⊆ Ft whenever s < t, reflecting the fact that as time proceeds more and more is learned about the process.
For the multistate model in Figure 1.1, the survival function depends on all transition intensities in the model. Andersen (1986) showed how the transition probabilities could be estimated under the assumption of constant transition intensities, in which case the transition probabilities are simple explicit functions of the intensities. In the simple three state model, the survival function can be estimated directly from estimates of the integrated intensities even when covariates are included. For patients in state 0 at time 0, Andersen derived, for the simple three state model in Figure 1.1,

S0(t) = exp[-A01(t) - A02(t)] + ∫₀ᵗ exp[-A01(u) - A02(u)] exp[-(A12(t) - A12(u))] dA01(u),

where Ah(t) = ∫₀ᵗ αh(s) ds, h = 1, ..., k.
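Andersen's three-state survival function can be checked numerically in the constant-intensity special case, where Ahj(t) = αhj·t and the integral has a closed form. The sketch below uses transition rates invented purely for illustration and compares the closed form with a direct Monte Carlo simulation of the process:

```python
import math
import random

random.seed(1)
a01, a02, a12 = 0.3, 0.1, 0.5   # hypothetical constant transition intensities
t = 2.0

# Closed form of S0(t) with constant intensities:
# S0(t) = exp(-(a01+a02)t)
#         + a01*exp(-a12*t)*(1 - exp(-(a01+a02-a12)*t))/(a01+a02-a12)
k = a01 + a02 - a12
closed = math.exp(-(a01 + a02) * t) \
    + a01 * math.exp(-a12 * t) * (1 - math.exp(-k * t)) / k

# Monte Carlo: simulate the three-state process (0 = alive, 1 = metastases,
# 2 = dead) and count subjects still alive (state 0 or 1) at time t.
n, alive = 200_000, 0
for _ in range(n):
    u = random.expovariate(a01 + a02)          # first transition out of state 0
    if u > t:
        alive += 1                             # never left state 0 before t
    elif random.random() < a01 / (a01 + a02):  # the transition was 0 -> 1
        if random.expovariate(a12) > t - u:    # survives in state 1 until t
            alive += 1
sim = alive / n
print(f"closed form S0({t}) = {closed:.3f}, simulated = {sim:.3f}")
```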
The advantages of this method are that large sample properties of the estimators can be derived rigorously using counting process and martingale results, both in the case of models for homogeneous groups and in the case of regression models, and both for parametric and for semiparametric models. The limitations include the fact that few time dependent covariates can be analyzed simultaneously in this way unless very large samples are available. Therefore, if other covariates are included beyond those defining the various alive states, the assumption still has to be made that these other covariates develop in a deterministic way (e.g. can be constant) during an interval of prediction. In addition, the above method is only useful when dealing with discrete time dependent variables. Continuous covariates have to be discretized in order to fit into the framework.
1.5 The Use of Surrogate Measures in Combination with Observed Survival Time to Improve
the Estimate of Survival Distributions
A method is needed that can incorporate information provided by variables measured throughout the trial, that are associated with survival time, in a format in which it can be combined with a patient's censoring time to provide improved estimates of, and comparisons of, distributions of survival times. The method that will be proposed in Chapter 3 combines the likelihood function based on censored deaths with likelihood functions based on observed covariate trends to estimate the distribution of survival time. In this way, more information is added to the likelihood, resulting in better estimates and a gain in precision in estimating the survival distribution or testing hypotheses comparing survival distributions.
Current survival techniques do not provide a good method for handling trials with a large percent of censored observations. The power available for such studies is too small to decide whether the test treatment is better than its comparison. Some clinical trials use surrogate endpoints to reduce the amount of censoring, but then the problem arises of whether you are measuring the treatment effect that is of primary interest. This dissertation concentrates on the primary endpoint, survival. Hence, it is the treatment effect on survival that is measured, which is our main concern, and not the effect on some surrogate endpoint.
In using variables measured across time, the values will be correlated. If the variable is associated with survival time, then using some measure of the trend of these values should increase the power to detect a treatment difference. Some people have analyzed time dependent surrogates as if they were fixed, meaning that they do not change across time. Hence, only one time point is being used, as if the other time points did not matter. In cases where it is the pattern across time that is important, and not just one time point, the use of this additional information will increase the power available for making treatment comparisons.
A time dependent variable, x(t), is any variable that may affect the outcome time to event and changes values over time. There are two types of time dependent variables, internal and external. An internal covariate is observed only so long as an individual survives and is uncensored. Its values may be influenced by the treatment or therapy that the individual is exposed to. As a result, it carries information about the survival time of the corresponding individual, and must be handled differently than an external covariate. An external covariate is one that is not directly involved with the failure mechanism. It may be one of three types: (i) one whose value is measured in advance and fixed for the duration of the study, (ii) one whose total path, although not constant, is determined in advance for each individual (e.g. age), and (iii) one whose process through time may influence, but is not influenced by, the failure experience of the trial (Kalbfleisch and Prentice, 1980).
Time dependent variables are useful when they predict survival. It may be the trend over time, the mean value, the last value, or all of these and/or more that is associated with a patient's survival time. The problem is that in a randomized study internal time dependent variables should be avoided as covariates in the primary evaluation of the treatment effects. Variables measured after randomization carry information about the survival time of the corresponding individual, including information on the effect of the treatment. Including them in a conditional model as covariates along with other prognostic variables may adjust away some part of the effect of another covariate on the risk of the outcome. If a time dependent variable is used as a possible covariate when analyzing a treatment effect, one never knows whether it is an indicator of death or a risk factor. Thus, currently, in many randomized clinical trials, available information provided by many time dependent variables is not being used. Only data collected at or before baseline, or external time dependent variables, which are not directly involved with the failure mechanism, are included as possible prognostic variables of survival time in the majority of studies.
This research proposes using the time dependent information as outcome variables in conjunction with survival time to improve the estimate and comparison of survival distributions. There is information available in time dependent variables that could be used in designing a surrogate measure for survival time. Since some information on survival time already exists for every subject, the idea is not to predict a missing value, nor is it to design a surrogate measure and use it in place of the original survival measure. The idea is to add onto the information that is already available.
1.6 Proposed
Research
This dissertation uses information from time dependent surrogate measures to improve
the comparison of survival
curv~.
This is performed by incorporating information about
survival time, provided from the sunogate measures, into the likelihood equation. The proposed
joint likelihood equation uses the conditional density of survival time given a surrogate measure
X along with the marginal density of X. This is in contrast to the standard likelihood equation
which uses the marginal or conditional density of survival time and does not include any
information on sunogate measures. A treatment effect will be added to each likelihood and its
variance will be computed using the proposed likelihood and the standard likelihood equations.
In this way the gain in efficiency in estimating the treatment parameter can be calculated by
taking the ratio of the expected values of the corresponding components of the information
matrix for each method.
These efficiency ratios will be computed using different models. The first model will
assume that the conditional density of T, survival time, given X is the
parametric exponential proportional hazards density.
The surrogate measure, X, may be a
single, fixed value or a summary measure of several time dependent covariates that contain
information on survival time. Its distribution is assumed to be gamma. The results will then
be generalized by having the conditional density of T given X follow a general proportional
hazards model. The above models initially treat X as a fixed variable. To incorporate X as a
summary measure of several surrogate covariates, it is necessary to model the time dependent
covariate values and then relate the parameters in the model to parameters in the distribution of
T given X. This is illustrated through several examples.
The use of X as a summary measure that is time dependent will first be illustrated by
assuming that all measures of one covariate through time each have a gamma distribution. The
covariate is measured at discrete but frequent time points. The marginal density of the fixed
variable X will be replaced by the marginal density of a time dependent variable in the
likelihood equation.
The time dependent measures will be summarized by individual
parameters, which will then be related to the subject's survival distribution. Methods for
applying these measures, including model selection and time dependent models, will be discussed.
An example incorporating time dependent covariates into the analysis of survival time for a
study of lipid lowering will be described. Lastly, the implications, generalizations, and future
research will be discussed.
CHAPTER 2
LITERATURE REVIEW AND PRELIMINARY MODELS
2.1 Introduction
This chapter focuses on three different ways to use surrogate measures (e.g. time
dependent events) to improve the estimate of a primary outcome measure (e.g. survival time).
The first example uses surrogate measures to improve the estimate of the parameters of a
normal model when there is missing data. This is followed by a literature review of Lagakos' 1976
paper that proposed a stochastic model which utilized information available on a time
dependent event other than survival. Here, instead of having missing variables there is partial
information on survival time due to censoring. Lagakos' idea is extended to a nonparametric
example that compares two survival curves by computing one survival curve using a time
dependent dichotomous surrogate measure. The idea of using surrogate measures to improve the
precision in estimating parameters is further developed in the literature review of D.R. Cox's
1983 paper.
2.2 Use of Surrogate Measures in a Multivariate Normal Model
This dissertation proposes to use surrogate measures, not as a replacement to the true
endpoint, but in combination with the partial information on survival time. The following is an
example of how, when there is missing data on the primary variable of interest, using additional
information, provided from variables measured along with the primary variable of interest,
results in increased precision in estimating the parameters. It is often the case that a complete
data vector of 1×p variables is not available for every subject. A subject could be missing
anywhere from 1 to p variables. Thus to estimate the true mean for a given variable, the
familiar formula for the maximum likelihood estimate is not appropriate. For example, say
two variables of interest are measured, the variable Y and a second surrogate variable X.
Variable X was measured on all n1+n2 subjects but variable Y was measured on only n1 of the
subjects. It is assumed that the n2 values of Y are missing at random. Let both variables have
a normal distribution.
Then the likelihood function for this data is

  L = ∏_{i=1}^{n1} (2π)^{-1} |Σ|^{-1/2} exp[ -(1/2) (y_i - μ_y, x_i - μ_x) Σ^{-1} (y_i - μ_y, x_i - μ_x)' ]
      × ∏_{i=n1+1}^{n1+n2} (2π)^{-1/2} σ_x^{-1} exp[ -(x_i - μ_x)² / (2σ_x²) ],

where

  Σ = ( σ_y²   σ_xy )
      ( σ_xy   σ_x² ).
The maximum likelihood estimates of μ_x and μ_y can be found by taking the derivative of
the log likelihood with respect to μ_x and μ_y separately and setting these equations equal to zero.
The first derivative with respect to μ_y will contain only the information from the n1 subjects
who had both variables Y and X measured. The first derivative with respect to μ_x will contain
two components, one from the joint distribution of X and Y, and the other from the univariate
normal distribution of X given by the n2 subjects who had missing y values.
The asymptotic variance of û_y can be found by inverting the information matrix, I,
which (ordering the parameters as μ_y, μ_x) is

  I = ( n1 σ_x² / |Σ|      -n1 σ_xy / |Σ|              )
      ( -n1 σ_xy / |Σ|      n1 σ_y² / |Σ| + n2 / σ_x²  ).

Thus, the variance of the estimate of μ_y is

  var(û_y) = (σ_y² / n1) [ 1 - ρ² n2 / (n1 + n2) ],  where ρ² = σ_xy² / (σ_x² σ_y²).

Thus, with a bivariate normal distribution where n2 of the n1+n2 y values are missing, the
variance of Ȳ = Σ_{i=1}^{n1} y_i / n1, the û_y when the surrogate information is not used in the
likelihood, is σ_y²/n1, which is larger than the variance of û_y when the surrogate information is
used in the likelihood, since the latter variance is equal to

  (σ_y² / n1) [ n1 / (n1 + n2) + n2 (1 - ρ²) / (n1 + n2) ].

By using the available information on the variable X from all n1+n2 subjects, an estimate of μ_y
can be found that has a smaller variance than the sample mean of y by a factor of
(1 - ρ² n2 / (n1 + n2)).
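The variance reduction factor above can be checked by Monte Carlo simulation. The sketch below (all parameter values are hypothetical, chosen only for the demonstration) compares the plain mean of the observed y values with the maximum likelihood estimate, which for the bivariate normal reduces to the regression estimator ȳ + β̂(x̄_all - x̄_paired):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 50, 150                  # n1 complete (y, x) pairs, n2 extra x-only subjects
mu_y, mu_x = 2.0, 1.0
sig_y, sig_x, rho = 1.0, 1.0, 0.8

naive, adjusted = [], []
for _ in range(4000):
    cov = [[sig_y**2, rho * sig_y * sig_x], [rho * sig_y * sig_x, sig_x**2]]
    yx = rng.multivariate_normal([mu_y, mu_x], cov, size=n1 + n2)
    y, x = yx[:, 0], yx[:, 1]
    y_obs = y[:n1]                # y observed only on the first n1 subjects
    naive.append(y_obs.mean())    # standard estimate: ignores the surrogate x
    # borrow the extra n2 x-values through the regression of y on x
    beta = np.cov(y_obs, x[:n1])[0, 1] / x[:n1].var(ddof=1)
    adjusted.append(y_obs.mean() + beta * (x.mean() - x[:n1].mean()))

var_naive, var_adj = np.var(naive), np.var(adjusted)
theory = (sig_y**2 / n1) * (1 - rho**2 * n2 / (n1 + n2))
print(var_naive, var_adj, theory)
```

With ρ = 0.8 and n2/(n1+n2) = 0.75 the asymptotic factor is 1 - 0.64 × 0.75 = 0.52, roughly halving the variance.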
Now, suppose that these same two variables were measured but values are missing on
variable X as well as on variable Y. For example, there are n1 subjects with missing values on
X, n2 subjects with no missing values, and n3 subjects who have a missing value for variable Y.
The likelihood is a product of univariate and bivariate normal densities, and the natural
log of the likelihood is proportional to

  ln L ∝ - Σ_{i=1}^{n1} (y_i - μ_y)² / (2σ_y²)
         - (1/2) Σ_{i=n1+1}^{n1+n2} (y_i - μ_y, x_i - μ_x) Σ^{-1} (y_i - μ_y, x_i - μ_x)'
         - Σ_{i=n1+n2+1}^{n1+n2+n3} (x_i - μ_x)² / (2σ_x²).
The first derivative with respect to μ_y contains two components, one from the univariate
distribution of the n1 subjects with a missing x value and one from the bivariate distribution
arising from the n2 subjects who had no missing values. The first derivative with respect to μ_x
also contains two components, one being the derivative of the bivariate distribution of the n2
subjects with no missing values and the other from the univariate distribution of
the n3 subjects with a missing Y value but no missing X value.
The information matrix is of the form

  I = ( n1/σ_y² + n2 σ_x²/|Σ|     -n2 σ_xy/|Σ|            )
      ( -n2 σ_xy/|Σ|               n2 σ_y²/|Σ| + n3/σ_x²  ).
Now, we see that by using all the information provided on both variables, Y and X, we
can get a more precise estimate of μ_y than if we had only used the values of Y. However, since
some values are missing on X, our estimate is not as precise as it would be if there were no
missing values for X, as was seen in the first example.
Next, suppose that three variables were measured, Y, X and Z, with n2+n3 missing
values of Y, n3 missing values of X, and no missing values for variable Z. Assume the
correlation between variables X and Z is zero. The likelihood for this scenario is
  L ∝ ∏_{i=1}^{n1} (2π)^{-3/2} |Σ|^{-1/2} exp[ -(1/2)(w_i - μ)' Σ^{-1} (w_i - μ) ]
      × ∏_{i=n1+1}^{n1+n2} (2π)^{-1} |Σ_xz|^{-1/2} exp[ -(1/2)(x_i - μ_x, z_i - μ_z) Σ_xz^{-1} (x_i - μ_x, z_i - μ_z)' ]
      × ∏_{i=n1+n2+1}^{n1+n2+n3} (2π σ_z²)^{-1/2} exp[ -(z_i - μ_z)² / (2σ_z²) ],

where w_i = (y_i, x_i, z_i)', Σ is the variance-covariance matrix for all three variables, and Σ_xz
is the diagonal variance matrix for variables X and Z. The information matrix is of the form

  I = n1 Σ^{-1} + diag( 0, n2/σ_x², (n2+n3)/σ_z² ),

where

  A = |Σ| = σ_y² σ_x² σ_z² - σ_z² σ_xy² - σ_x² σ_zy² = σ_x² σ_y² σ_z² (1 - ρ_xy² - ρ_zy²).
The variance of the estimate of μ_y is now a function of all three sample sizes:

  var(û_y) = σ_y² [ (n1+n2)(n1+n2+n3) - n2 (n1+n2+n3) ρ_xy² - (n2+n3)(n1+n2) ρ_zy² ]
             / [ n1 (n1+n2)(n1+n2+n3) ]

           = (σ_y² / n1) [ 1 - ρ_xy² n2/(n1+n2) - ρ_zy² (n2+n3)/(n1+n2+n3) ].
The stronger the association between variables X and Z with variable Y, the more precise the
estimate of μ_y will be. If variables X and Z contain no information on variable Y then the
variance of û_y is equal to σ_y²/n1, the variance of the average of the n1 y values. This illustrates
how using information available from variables associated with the primary variable is always,
asymptotically, better than or at least equal to just using the available values of the variable of
interest (in this case variable Y).
In survival analysis, instead of having missing values, there is missing information on
survival time because some of the subjects are censored. Their true survival time is not known
but the censored value provides a minimum value of survival time.
By using additional
information extracted from surrogate measures it is possible to obtain a more precise estimate of
the survival distribution.
2.3 A Stochastic Model for Censored-Survival Data in the Presence of an Auxiliary Variable
Additional information may be in the form of a time dependent event that occurs
between baseline and the endpoint. In 1976 Lagakos proposed a stochastic model which utilized
information available on a time dependent event other than survival. The basic components of
the model are related to the Semi-Markov model of Weiss and Zelen (1965) and to the bivariate
exponential model introduced by Freund (1961). Unlike these two models, Lagakos' model
incorporates incomplete (censored) observations as well as covariate variables. For ease of
presentation we will consider only one time dependent surrogate event here, which shall be
referred to as "time to progression". Both survival and time to the additional event of interest,
disease progression, are assumed to be measured from a common point internal to each patient
(i.e. from entry into the study). Binary-valued random variables coded 0 or 1 are used to
denote i) whether a patient experiences the time dependent event, progression, before death (c=1)
or not (c=0), ii) whether a patient's true survival time is known (a=1, else a=0), and iii) whether
a patient experiences the time dependent event before being censored or before death, whichever
occurs first (b=1, else b=0). See Figure 2.1.
Fig. 2.1
The random variable survival time, T, may equal time till death with no progression,
x1, or time to progression plus survival time beyond progression, x2 + x3. The random variable,
P, takes on the minimum value of x1 and x2. Defined this way, P will represent time until
progression when c=1. When c=0, P equals T and will be given the interpretation of death
occurring without progression. Lagakos assumed the x_i's to be independent exponential random
variables with associated failure rates λ_i > 0 (i=1,2,3). Then the joint density of dying without
progression occurring first is

  λ1 exp(-λ1 x1) dx1 ∫_{x1}^∞ λ2 exp(-λ2 x2) dx2 = λ1 exp(-λ1 x1) exp(-λ2 x1) dx1
                                                 = λ1 exp[-x1 (λ1 + λ2)] dx1.
The joint density of progression and death where progression occurs before death is

  λ2 exp(-λ2 x2) dx2 ∫_{x2}^∞ λ1 exp(-λ1 x1) dx1 = λ2 exp[-x2 (λ1 + λ2)] dx2.

The conditional density of death given progression is

  λ3 exp[-λ3 (x2 + x3 - x2)] = λ3 exp(-λ3 x3),

and the probability of progression occurring before death is

  Pr[c=1] = ∫_0^∞ λ2 exp[-x2 (λ1 + λ2)] dx2 = λ2 / (λ1 + λ2).
When progression occurs before death, P = p = x2, T = t = x2 + x3, and the joint density of
survival time and progression time is

  f(t, p | c=1) = λ2 exp[-(λ1 + λ2) p] λ3 exp[-λ3 (t - p)] / [λ2 / (λ1 + λ2)]
               = (λ1 + λ2) λ3 exp[-(λ1 + λ2 - λ3) p - λ3 t],  for t > p > 0 and 0 otherwise.
Lagakos gave the following interpretation of the model. "A patient begins with competing
hazard rates for death and progression of λ1 and λ2, respectively. If progression occurs, the
patient's subsequent hazard rate for death is λ3. Thus, the "effect" of progression is to alter a
patient's exponential survival rate from λ1 to λ3. Moreover, survival after progression is
independent of the particular time at which progression occurred. The case λ1 = λ3 means that
progression carries with it no prognosis of subsequent survival." Note that this is only true with
the exponential (so called memoryless) distribution.
Lagakos incorporated censored observations in the following manner. Let Y be a
random variable representing "potential" censoring time. For example, when survival and
progression time are measured from a patient's entry into a study, Y might be the time between
entry into the trial and the time of analysis. Then let the random variable U denote either true
survival time (and so a=1), or a censored measure of survival (a=0). U equals the minimum of
T and Y. Similarly, V equals the minimum of P and U, either the observed time to progression
(and so b=1), or a lower bound for it (b=0). In this way four possible types of outcomes may
be observed for a patient: patient alive without the occurrence of progression, patient alive and
progression has occurred, progression and then death occurred, and death without progression.
The densities associated with these four types of observable outcomes are:

  dProb[U=V=s, a=b=0] = exp[-(λ1 + λ2) s] dI(s),
  dProb[U=s, a=0, V=p, b=1] = λ2 exp[-(λ1 + λ2 - λ3) p - λ3 s] dI(s) dp,
  dProb[U=t, V=p, a=b=1] = λ2 λ3 exp[-(λ1 + λ2 - λ3) p - λ3 t] [1 - I(t)] dp dt,
and
  dProb[U=V=t, a=1, b=0] = λ1 exp[-(λ1 + λ2) t] [1 - I(t)] dt,

where I(·) denotes the c.d.f. of the random variable Y, which is assumed independent of (T, P).
From these the likelihood function can be shown to be proportional to

  L = ∏_{j=1}^{3} λ_j^{K_j} exp[-(λ1 + λ2) Σ V_i - λ3 (Σ U_i - Σ V_i)]

for homogeneous populations. The random variables K1, K2, and K3 denote the number of
observed deaths without progression, Σ a_i (1 - b_i), the number of observed progressions, Σ b_i,
and the number of observed deaths with progression, Σ a_i b_i, respectively.
The first and second derivatives of the log likelihood are as follows:

  ∂ log L / ∂λ_j = K_j / λ_j - (δ_j1 + δ_j2) Σ V_i - δ_j3 (Σ U_i - Σ V_i)

and

  ∂² log L / ∂λ_i ∂λ_j = -K_j δ_ij / λ_j²,

for i, j = 1, 2, 3, where δ_ij is the Kronecker delta. The maximum likelihood estimates of λ1, λ2,
λ3 are thus

  λ̂_j = K_j / Σ V_i,  j = 1, 2,
  λ̂_3 = K_3 / (Σ U_i - Σ V_i).

Their asymptotic covariance matrix can be estimated by

  Côv(λ̂_i, λ̂_j) = δ_ij λ̂_j² / K_j,  i, j = 1, 2, 3.
Note that the estimates are asymptotically independent. For heterogeneous populations where
there are available values of p covariate variables, Z = (Z1, ..., Zp)', for each patient, Lagakos
incorporated this by assuming that a patient with covariate vector Z follows the model with
lambda parameters

  λ_j(Z) = α_j exp{γ_j' Z},  j = 1, 2, 3,

where α_j > 0 and γ_j = (γ_j1, ..., γ_jp)' are unknown parameters. In this way, the model has been
expanded from 3 to 3(p+1) parameters. The case γ11 = γ21 = γ31 = γ ≠ 0 corresponds to
covariate Z1 having the effect of a scale change in the amount exp{γ Z1}.
The maximum likelihood estimates of (α_j, γ_j), j = 1, 2, 3, are not in general expressible in
closed form and so iterative procedures need to be employed for their determination. The
corresponding covariance matrix can be estimated by the inverse of the sample information
matrix. Tests of significance can thus be made either by using this or by comparing differences
in maximized log likelihoods to percentage points of the appropriate chi-square distributions.
The intrinsic assumptions underlying the model are (i) the exponentiality of post-progression
survival time, (ii) the independence of post-progression survival time from time until
progression and (iii) the exponentiality and equality of the conditional distributions of time until
progression (given progression) and time until death without progression (given death without
progression).
Lagakos' model incorporates time to event information that has previously been ignored,
but it can only be used on discrete variables. His method assumes independent exponential
random variables and incorporates this new information using a bivariate extension of the
exponential distribution proposed by Freund (1961). The model allows us to test whether the
occurrence of progression significantly changes the hazard of dying. This is done by testing H0:
λ1 = λ3 using the negative log of the ratio of the likelihoods.
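Because the MLEs above are closed-form, Lagakos' model is easy to check by simulation. The sketch below (all rates and the censoring distribution are hypothetical, chosen only for illustration) generates data from the three-state exponential model with independent exponential censoring and recovers λ1, λ2, λ3 from the sufficient statistics K_j, ΣU_i and ΣV_i:

```python
import numpy as np

rng = np.random.default_rng(1)
lam1, lam2, lam3 = 0.3, 0.5, 0.8        # true hazards (hypothetical values)
n = 20000
x1 = rng.exponential(1 / lam1, n)       # latent time to death without progression
x2 = rng.exponential(1 / lam2, n)       # latent time to progression
x3 = rng.exponential(1 / lam3, n)       # survival beyond progression
y  = rng.exponential(5.0, n)            # independent "potential" censoring time Y

t = np.where(x2 < x1, x2 + x3, x1)      # survival time T
p = np.minimum(x1, x2)                  # progression (or death) time P
u = np.minimum(t, y); a = t <= y        # U = min(T, Y), death indicator a
v = np.minimum(p, u)                    # V = min(P, U)
b = (x2 < x1) & (x2 <= u)               # progression observed before censoring/death

K1 = np.sum(a & ~b)                     # deaths without progression
K2 = np.sum(b)                          # observed progressions
K3 = np.sum(a & b)                      # deaths after progression
lam1_hat = K1 / v.sum()                 # K1 / sum V_i
lam2_hat = K2 / v.sum()                 # K2 / sum V_i
lam3_hat = K3 / (u.sum() - v.sum())     # K3 / (sum U_i - sum V_i)
print(lam1_hat, lam2_hat, lam3_hat)
```

The denominators are the total exposure times at risk of each transition, which is why the estimates take the familiar events/exposure form.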
2.4 Nonparametric Survival Curve Estimation Using a Time Dependent Dichotomous Surrogate
Measure
This section develops and compares nonparametric survival estimates for the models
considered by Lagakos. The well known Kaplan-Meier estimate of S(t), the probability of
surviving beyond time t, will be compared with a method that estimates S(t) by combining
information from a time dependent event, that occurs between baseline and death, with the
subject's observed time till death. A comparison will be made of survival probabilities
computed by the two different methods for the three state model considered in Figure 2.2. For
this example a subject may take one of two paths towards the event death. Either a subject
goes from the starting state "good" to the state "fair" in time T1 and then from "fair" to
"death" in time T2, or a subject goes from the state "good" to death in time T3. Thus, for the
observed survival time, T:
  T = T3         if T3 < T1
  T = T1 + T2    if T3 > T1.
Fig. 2.2
Given the following values of T1, T2, and T3 for eight subjects, the resulting survival time, T, is
seen in the last column.

  Subject   T1    T2    T3    T
     1      2+    0+    2+    2+
     2      5+    0+    5+    5+
     3      3     1+    3+    4+
     4      2     4+    2+    6+
     5      6+    0+    6     6
     6      8+    0+    8     8
     7      3     2     3+    5
     8      6     1     6+    7
Method A computes the Kaplan-Meier estimate of the survival probability, S(t), using
the value T for every subject. Thus, this method ignores information on any intermediate
events that occur before the state death on either route. The Kaplan-Meier estimate is
computed for each time t_i using the formula

  Ŝ_km(t) = ∏_{y(j) ≤ t} (1 - d_j / n_j)     (Kaplan and Meier, 1958),

where y(j) denotes the r distinct survival times, y(1) < y(2) < ... < y(r), d_j is the number of
deaths at time y(j) and n_j is the number of subjects at risk at time y(j). Method A
estimates the survival probability, S(t), using only the value of T1 + T2 or T3 instead of using
information from all three times, as will method B.
For method B, individual Kaplan-Meier estimates of survival are computed for T1, T2,
and T3. From this information the discrete density function can be estimated for each time to
event interval using

  f̂(t_i) = Ŝ_km(t_{i-1}) - Ŝ_km(t_i),

with Ŝ_km(t_0) = 1 for i = 1.
The estimates for the three intervals, T1 (good to fair), T2 (fair to death), and T3 (good to
death), are:

  T1: good to fair           T2: fair to death          T3: good to death
  t_i   Ŝ_km(t)  f̂_T1(t)     t_i   Ŝ_km(t)  f̂_T2(t)     t_i   Ŝ_km(t)  f̂_T3(t)
   2    0.875    0.125        1    0.75     0.25         6    0.667    0.333
   3    0.583    0.292        2    0.375    0.375        8    0        0.667
   6    0.388    0.195        ∞    0        0.375        ∞    0        0
   ∞    0        0.388
Assuming T1, T2, and T3 are independent, their joint density is f_{T1,T2,T3}(t1, t2, t3) =
f1(t1) f2(t2) f3(t3). By definition of T,

  f_T(t) = Pr(T3 = t ∩ T1 > t) + Pr(T1 + T2 = t ∩ T3 > T1)
         = f3(t) ∫_t^∞ f1(v) dv ∫_0^∞ f2(w) dw + ∫_0^t f1(u) f2(t - u) ∫_u^∞ f3(w) dw du
         = f3(t) S1(t) + ∫_0^t f1(u) f2(t - u) S3(u) du.

For the discrete case the integrals are replaced by sums, so that

  f_T(t) = f3(t) S1(t) + Σ_{u ≤ t} f1(u) f2(t - u) S3(u),

and the nonparametric maximum likelihood estimate of f_T(t) is obtained by substituting the
Kaplan-Meier estimates f̂_i and Ŝ_i into this expression. For our example:
  t         f̂_T(t)    Ŝ_T(t)
  [0-3)     0         1
  [3-4)     0.031     0.969
  [4-5)     0.120     0.849
  [5-6)     0.110     0.739
  [6-7)     0.129     0.610
  [7-8)     0.033     0.578
  [8-9)     0.308     0.270
  [9- )               0.270
Method B uses all of the available information by considering the time it takes to reach
the endpoint, death, from both possible routes. Method A only uses the information provided
from the observed route. The following table presents the estimates along with their estimated
standard errors.
  t          Method A          Method B
  [0-3)      1                 1
  [3-4)      1                 0.969 (.045)
  [4-5)      1                 0.849 (.110)
  [5-6)      0.833 (.152)      0.739 (.158)
  [6-7)      0.625 (.214)      0.610 (.187)
  [7-8)      0.313 (.245)      0.578 (.182)
  [8-9)      0                 0.270 (.187)
  [9- )      0                 0
Method B has more jumps and appears to be more precise. The estimated variance of Skm(t) is
obtained by Greenwood's formula (Greenwood, 1926) for Kaplan-Meier estimates.
An asymptotic expression for the variance of Ŝ_T(t) is obtained as follows. Since it was assumed
that T1, T2 and T3 are independent, the variance of Ŝ_T(t) is the variance of a linear
combination of products of independent variables. For example, Ŝ_T(4) = 1 - f̂1(2) f̂2(1) Ŝ3(2) -
f̂1(2) f̂2(2) Ŝ3(2) - f̂1(1) f̂2(1) Ŝ3(3). The variance of Ŝ_T(4) is equal to

  var[f̂1(2) f̂2(1) Ŝ3(2)] + var[f̂1(2) f̂2(2) Ŝ3(2)] + var[f̂1(1) f̂2(1) Ŝ3(3)]
  + 2 cov[f̂1(2) f̂2(1) Ŝ3(2), f̂1(2) f̂2(2) Ŝ3(2)] + 2 cov[f̂1(2) f̂2(1) Ŝ3(2), f̂1(1) f̂2(1) Ŝ3(3)]
  + 2 cov[f̂1(2) f̂2(2) Ŝ3(2), f̂1(1) f̂2(1) Ŝ3(3)].

To compute the variance of a product of independent observations the following formula is
appropriate. Its derivation can be found in Mood et al. (1974).
For a product of independent variables,

  var[XY] = μ_y² var(X) + μ_x² var(Y) + var(X) var(Y).

This can be easily extended to three variables. To find the covariances of two products the
following formula was derived using the definition of covariance in terms of expected values:

  cov[X1Y1Z1, X2Y2Z2] = E(X1Y1Z1X2Y2Z2) - E(X1Y1Z1) E(X2Y2Z2)
                      = E(X1X2) E(Y1Y2) E(Z1Z2) - μ_X1 μ_Y1 μ_Z1 μ_X2 μ_Y2 μ_Z2
                      = [cov(X1,X2) + μ_X1 μ_X2][cov(Y1,Y2) + μ_Y1 μ_Y2][cov(Z1,Z2) + μ_Z1 μ_Z2]
                        - μ_X1 μ_Y1 μ_Z1 μ_X2 μ_Y2 μ_Z2,

where the pairs (X1,X2), (Y1,Y2) and (Z1,Z2) are mutually independent of one another.
For this survival distribution example, the X's, Y's and Z's are densities, f(t), and probabilities,
S(t). Individual covariance terms, for example cov[f̂1(3), Ŝ1(6)], are computed by defining the
density in terms of S(t). Thus,

  cov[f̂1(3), Ŝ1(6)] = cov[Ŝ1(2) - Ŝ1(3), Ŝ1(6)]
                     = cov[Ŝ1(2), Ŝ1(6)] - cov[Ŝ1(3), Ŝ1(6)].
The covariance of two survival probabilities can be estimated similarly to Greenwood's formula
for the variance of Ŝ(t), since all the components of Ŝ_T(t) are Kaplan-Meier estimates. An
estimate of the asymptotic covariance of two Kaplan-Meier estimates is given by Miller (1981):

  côv[Ŝ(t), Ŝ(u)] = Ŝ(t) Ŝ(u) Σ_{y(j) ≤ t∧u} d_j / [n_j (n_j - d_j)],

where t∧u equals the minimum of t and u.
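The Kaplan-Meier components used by method B can be reproduced with a short routine. The sketch below implements Ŝ_km(t) = ∏(1 - d_j/n_j) and evaluates it for the T1 ("good" to "fair") column of the eight-subject example; the function and variable names are ours, not the dissertation's:

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: S(t) = prod over death times y(j) <= t of (1 - d_j/n_j)."""
    times = np.asarray(times, float)
    events = np.asarray(events, bool)
    surv = {}
    s = 1.0
    for y in np.unique(times[events]):        # distinct observed death times y(j)
        n_at_risk = np.sum(times >= y)        # n_j: subjects still at risk at y(j)
        d = np.sum((times == y) & events)     # d_j: deaths at y(j)
        s *= 1 - d / n_at_risk
        surv[float(y)] = float(s)
    return surv

# T1 values for the eight subjects; the event indicator marks observed transitions
# (subjects with a "+" in the T1 column are censored for this transition)
t1 = [2, 5, 3, 2, 6, 8, 3, 6]
e1 = [0, 0, 1, 1, 0, 0, 1, 1]
print(kaplan_meier(t1, e1))   # S(2), S(3), S(6) for the good-to-fair transition
```

The printed values reproduce the 0.875, 0.583 and 0.388 entries of the T1 column in the table above.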
2.5 Use of a Surrogate Response Measure in a Parametric Survival Model
Discrete time dependent events are not the only informative time dependent covariates
that are measured in clinical trials. Much information is gathered at frequent intervals and at
the present time is not being used. Since a consequence of censoring a subject's survival time is
a loss of information about the effect of explanatory variables, it is important that a method be
found that can incorporate continuous time dependent covariates as well as discrete time
dependent covariates so that as much information as possible can be retrieved. D.R. Cox (1983)
remarked that "Sometimes it is possible to recover some of this lost information via a surrogate
response measure intended to predict the remaining lifetime." He studied this possibility
theoretically under idealized assumptions in the following manner.
Suppose there are n independent individuals observed, each with a constant p×1 vector of
explanatory variables z_i = (z_i1, ..., z_ip)'. Let the failure-time, T_i, for the individual be
exponentially distributed with rate parameter ρ_i, where log ρ_i = β'z_i, with β an unknown
parameter vector. If all individuals are observed until failure, t_i being the observed value of
T_i, the likelihood is

  L(β) = ∏_{i=1}^{n} ρ_i exp(-ρ_i t_i)

and the log likelihood is

  l(β) = Σ_i [β'z_i - exp(β'z_i) t_i].

The first derivative with respect to β_r is

  ∂l/∂β_r = Σ_i z_ir [1 - exp(β'z_i) t_i]

and the negative second derivative with respect to β_r and β_s is

  -∂²l/∂β_r ∂β_s = Σ_i z_ir z_is exp(β'z_i) t_i.

Taking expectations, we have that the information matrix for β is i_A(β) = z'z, where z is the
n×p matrix of explanatory variables.
Suppose some of the individuals experience noninformative right censoring so t_i is now
the failure-time or censoring-time. The log likelihood is now l(β) = Σ_f β'z_i - Σ_all exp(β'z_i) t_i,
where the first term is summed over failed individuals. To evaluate E(T_i), for a simple case,
take a random censorship model in which for the i-th individual the censoring time is
exponentially distributed with parameter k_i, so that the density of the observed time is a
combination of the failure density f_f and censoring density f_c:

  f_T(t) = f_c(t) ∫_t^∞ f_f(u) du + f_f(t) ∫_t^∞ f_c(u) du = (k_i + ρ_i) exp[-(k_i + ρ_i) t].
Therefore, E(T_i) = (k_i + ρ_i)^{-1} and the probability of censoring is

  c_i = ∫_0^∞ f_c(t) ∫_t^∞ f_f(u) du dt = ∫_0^∞ k_i exp[-(k_i + ρ_i) t] dt = k_i / (k_i + ρ_i).

The information is thus i_b(β) = z'z - z'cz, where c = diag(c_1, ..., c_n); in the observed
information, c_i = 1 if censored and 0 if failed.
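The censoring probability c_i = k_i/(k_i + ρ_i) and the mean observed time (k_i + ρ_i)^{-1} used above can be verified directly. A small sketch (both rates are hypothetical) simulates the random-censorship model and compares the empirical quantities with the closed forms:

```python
import numpy as np

rng = np.random.default_rng(4)
rho, k = 1.0, 0.5                       # failure and censoring rates (hypothetical)
n = 200000
t = rng.exponential(1 / rho, n)         # failure times ~ Exp(rho)
c = rng.exponential(1 / k, n)           # censoring times ~ Exp(k)
emp = np.mean(c < t)                    # fraction of subjects censored
mean_obs = np.mean(np.minimum(t, c))    # mean of the observed time min(T, C)
print(emp, k / (k + rho))               # empirical vs closed-form c_i
print(mean_obs, 1 / (k + rho))          # empirical vs closed-form E(T_i)
```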
Suppose, now, that on each censored individual there is measured a random variable X,
which is intended to be related to the remaining life-time of an individual. To be useful, the
distribution of X must depend in a known way on the parameter vector β. For simplicity Cox
chose V_i as the unobserved remaining life-time of the i-th individual measured from the instant
of censoring. Then X_i = V_i / W_i, where W_i is an independent "noise" component having some
fixed distribution independent of censoring time and of z_i. Purely for mathematical convenience,
Cox takes the W_i to have a gamma distribution with density α(αw)^{λ-1} exp(-αw)/Γ(λ), of
coefficient of variation 1/√λ. Under the assumptions made previously, V_i is exponentially
distributed with parameter ρ_i. Let V and W be jointly distributed continuous random variables
with density f_{V,W}(v, w) and let X = V/W, so that X has density

  f_X(x) = ∫ w f_{V,W}(xw, w) dw = ∫_0^∞ w ρ_i exp(-ρ_i x w) α(αw)^{λ-1} exp(-αw)/Γ(λ) dw
         = [ρ_i α^λ / Γ(λ)] ∫_0^∞ w^λ exp[-w(ρ_i x + α)] dw
         = ρ_i α^λ Γ(λ+1) / [Γ(λ) (ρ_i x + α)^{λ+1}]
         = ρ_i α^λ λ / (ρ_i x + α)^{λ+1}.

This is one way in which the distribution of X_i can be related to ρ_i and hence to β.
Under these assumptions the likelihood is

  L = ∏_f ρ_i exp(-ρ_i t_i) ∏_c exp(-ρ_i t_i) ρ_i α^λ λ / (α + ρ_i x_i)^{λ+1},  ρ_i = exp(β'z_i),

and

  log L = Σ_f β'z_i - Σ_all exp(β'z_i) t_i
          + Σ_c [β'z_i + λ log α + log λ - (λ+1) log(α + exp(β'z_i) x_i)],

where Σ_c denotes the sum over censored individuals. The first and second derivatives with
respect to the three parameters α, λ, and β are as follows:

  ∂ log L / ∂α = Σ_c [λ/α - (λ+1)/(α + exp(β'z_i) x_i)]

  ∂² log L / ∂α² = Σ_c [-λ/α² + (λ+1)/(α + exp(β'z_i) x_i)²]

  ∂ log L / ∂λ = Σ_c [1/λ + log α - log(α + exp(β'z_i) x_i)]

  ∂² log L / ∂λ² = Σ_c [-1/λ²]

  ∂ log L / ∂β_r = Σ_f z_ir - Σ_all t_i z_ir exp(β'z_i)
                   + Σ_c [z_ir - (λ+1) x_i z_ir exp(β'z_i) (α + exp(β'z_i) x_i)^{-1}]
  ∂² log L / ∂β_r ∂β_s = -Σ_all t_i z_ir z_is exp(β'z_i)
                         + Σ_c (λ+1) x_i² z_ir z_is exp(2β'z_i) (α + exp(β'z_i) x_i)^{-2}
                         - Σ_c (λ+1) x_i z_ir z_is exp(β'z_i) (α + exp(β'z_i) x_i)^{-1}

  ∂² log L / ∂α ∂β_r = Σ_c (λ+1) x_i z_ir exp(β'z_i) (α + exp(β'z_i) x_i)^{-2}

  ∂² log L / ∂α ∂λ = Σ_c [1/α - (α + exp(β'z_i) x_i)^{-1}]

and

  ∂² log L / ∂λ ∂β_r = -Σ_c x_i z_ir exp(β'z_i) / (α + exp(β'z_i) x_i).
Calculations are shown for the r-th and s-th elements of the p×1 vector β, for the i-th subject.
These results can be easily extended to matrix form by combining all second order derivatives
using all p elements of β into one large matrix. The summations over censored individuals can
be handled by using the trace of c for the 1×1 components, 1'cz for the 1×p components, and
z'cz for the p×p components. The results of such calculations will be shown after the
expected value computations. To compute the expected values of the second derivatives the
following computations were used.
Each expectation is taken with respect to the density of X_i, f_X(x) = ρ_i λ α^λ / (α + ρ_i x)^{λ+1},
with ρ_i = exp(β'z_i). First,

  E[(λ+1) / (α + exp(β'z_i) x_i)²] = ∫_0^∞ (λ+1) ρ_i λ α^λ / (α + ρ_i x)^{λ+3} dx
                                   = λ(λ+1) / [(λ+2) α²],

so that E[-∂² log L / ∂α²] = c. λ α^{-2} (λ+2)^{-1}, where c. = tr(c). Next,

  E[1 / (α + exp(β'z_i) x_i)] = λ / [(λ+1) α],

so that

  E[-∂² log L / ∂α ∂λ] = c. {-1/α + λ/[(λ+1) α]} = -c. α^{-1} (λ+1)^{-1}.

Similarly,

  E[x_i exp(β'z_i) / (α + exp(β'z_i) x_i)²] = λ / [(λ+1)(λ+2) α],
  E[x_i exp(β'z_i) / (α + exp(β'z_i) x_i)] = 1 / (λ+1),
  E[x_i² exp(2β'z_i) / (α + exp(β'z_i) x_i)²] = 2 / [(λ+1)(λ+2)],

which give E[-∂² log L / ∂α ∂β_r] = -λ α^{-1} (λ+2)^{-1} (1'cz)_r and
E[-∂² log L / ∂λ ∂β_r] = (λ+1)^{-1} (1'cz)_r. Combining these results for the β components,

  E[-∂² log L / ∂β ∂β'] = z'z - z'cz + λ(λ+2)^{-1} z'cz = z'z - 2(λ+2)^{-1} z'cz.
Thus, the new information matrix, in which α, λ and β are regarded as unknown, has the
partitioned form (ordering the parameters as λ, α, β)

  i(λ, α, β) = ( c. λ^{-2}                 -c. α^{-1}(λ+1)^{-1}        (λ+1)^{-1} 1'cz           )
               ( -c. α^{-1}(λ+1)^{-1}      c. λ α^{-2}(λ+2)^{-1}       -λ α^{-1}(λ+2)^{-1} 1'cz  )
               ( (λ+1)^{-1} z'c1           -λ α^{-1}(λ+2)^{-1} z'c1    z'z - 2(λ+2)^{-1} z'cz    ),

where c. = tr(c) and 1 is an n × 1 vector of ones. The elements c.λ^{-2}, -c.α^{-1}(λ+1)^{-1} and
c.λα^{-2}(λ+2)^{-1} are scalar values, the elements (λ+1)^{-1} 1'cz and -λα^{-1}(λ+2)^{-1} 1'cz are
vectors of size 1 × p, and the last element, z'z - 2(λ+2)^{-1} z'cz, is a matrix of dimension p × p.
If α, λ are known, so that X is directly calibrated to relate to T, the information matrix
for β is

  i_c(β) = z'z - 2(λ+2)^{-1} z'cz = z'z - z'cz + λ(λ+2)^{-1} z'cz.

Therefore, a proportion λ/(λ+2) of the loss of information from censoring has been recovered by
using a surrogate measure of survival time applied to censored observations. By incorporating a
surrogate measure of survival time into the likelihood equation, information about the unknown
parameter β is made available. Thus, a more precise estimate of the parameters in the model is
obtained.
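Cox's surrogate construction X = V/W can be checked numerically. The sketch below (parameter values hypothetical) simulates V ~ exponential(ρ) and W ~ gamma(λ, rate α) and compares the empirical distribution of X with the c.d.f. implied by f_X(x) = ρλα^λ/(ρx + α)^{λ+1}, namely F(x) = 1 - [α/(ρx + α)]^λ:

```python
import numpy as np

rng = np.random.default_rng(2)
rho, lam, alpha = 1.5, 2.0, 3.0        # hypothetical parameter values
n = 200000
v = rng.exponential(1 / rho, n)        # remaining lifetime V ~ Exp(rho)
w = rng.gamma(lam, 1 / alpha, n)       # noise W ~ Gamma(shape lam, rate alpha)
x = v / w                              # surrogate X = V / W

def cdf(q):
    # c.d.f. obtained by integrating f_X(x) = rho*lam*alpha**lam / (rho*x+alpha)**(lam+1)
    return 1 - (alpha / (rho * q + alpha)) ** lam

for q in (0.5, 1.0, 2.0, 5.0):
    print(q, np.mean(x <= q), cdf(q))  # empirical vs closed-form probability
```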
All of the above examples and literature reviews show a gain in precision in estimating
parameters or survival distributions by incorporating surrogate measures into their calculations.
In the simple multivariate normal model more precise estimates were obtained for the primary
variable by including information about other variables that are correlated with the primary
variable in the calculations. The stronger the correlation between the primary variable and the
surrogate measures the greater the gain in precision. Lagakos (1976) proposed using a
dichotomous time dependent surrogate measure to improve the estimate of survival in a three
state stochastic model. For a patient who experienced the surrogate event, his exponential
survival rate was altered from λ1 to λ3. When Lagakos' idea was extended to the
nonparametric survival curves example it was shown that using a time dependent dichotomous
surrogate event that occurred between baseline and death improved the precision of the survival
estimates. Cox (1983) showed how using the marginal density of the surrogate measure of
survival time along with the probability of surviving beyond time t given x, S_{T|X}(t|x), results
in a gain in precision in estimating the parameters of the model when the conditional density of
survival time is exponential and X is a ratio of an exponential and a gamma distributed
variable.
The next chapter proposes incorporating a surrogate measure, X, into the likelihood
equation that models time to an event. Unlike Cox's model, the marginal density of the
surrogate measure will be applied to both censored and uncensored observations. It is the joint
density of survival time, T, and the surrogate measure, X, that is used rather than the marginal
density of T. The result of incorporating X into the likelihood equation is a gain in precision in
estimating the survival distribution and a decrease in the variance of the parameter estimates.
As a result, the power to detect a treatment difference is increased.
CHAPTER 3
USE OF A SINGLE VALUE SURROGATE MEASURE
3.1 Introducing the Likelihood Model for Survival
In this chapter two methods for estimating survival curves and group differences will be
compared.
One, referred to as the standard method, is based on the standard likelihood
equation that uses the marginal density of T. The second method, the proposed method, uses
the joint distribution of T and X. It will be shown that the proposed method is at least as
efficient as the standard method. The conditional distribution of T on X will first be assumed
to be exponential and will then be generalized to proportional hazards models.
A treatment
parameter will be incorporated into the likelihood equations and it will be shown that the
amount of information recovered from using the surrogate measure, X, varies depending on
which components of the likelihood are hypothesized to be affected by treatment and by the
amount of censoring.
The standard likelihood equation used in survival analysis is based on the marginal
distribution of survival time, T, and is derived in the following manner. Given the conditional
density of T given X, f_{T|X}(t|x), and the density of X, f_X(x), the marginal distribution of T
can be found by

  f_T(t) = ∫ f_{T|X}(t|x) f_X(x) dx,

and the survival function, S(t), can then be computed by the formula

  S(t) = ∫_t^∞ f_T(u) du.

For random type I censoring, observe t_i if c_i > t_i (δ_i = 1), or c_i if t_i > c_i (δ_i = 0).
Let y_i = min(t_i, c_i). The density for t is f(y) with distribution F(y) and the density for c is
g(y) with distribution G(y). Thus, using the notation of Miller (1981),

  L_i = f(y_i)[1 - G(y_i)]   if δ_i = 1
      = g(y_i)[1 - F(y_i)]   if δ_i = 0

and L = ∏_{i=1}^{n} L_i. Assume noninformative censoring, so maximizing L is equivalent to
maximizing

  ∏_{i=1}^{n} f(y_i)^{δ_i} S(y_i)^{1-δ_i},

such that

  L = ∏_d f_T(y_i) ∏_c S_T(y_i),                                                 (3.1)
where ∏_d denotes multiplying over deaths, ∏_c denotes multiplying over censored observations,
and ∏_n will denote multiplying over all observations.
The information available about the random variable X can be included by using the
joint distribution of T and X, f_{T,X}(y_i, x_i), instead of the marginal distribution of T. In this
case the likelihood is of the form

  L = ∏_d f_{T|X}(y_i|x_i) f_X(x_i) ∏_c S_{T|X}(y_i|x_i) f_X(x_i)
    = ∏_d f_{T|X}(y_i|x_i) ∏_c S_{T|X}(y_i|x_i) ∏_n f_X(x_i).                    (3.2)
A comparison of the maximum likelihood estimates and their variances computed from likelihood 3.1, the standard method, and likelihood 3.2, the proposed method, will reveal that more precise estimates are obtained when likelihood functions are formed using the joint distribution of T and X rather than the marginal distribution of T.
3.2 Forming the Likelihood Equations Using Given Distributions
All of the likelihood equations used in this chapter will deal with the surrogate measure,
X, as a fixed random variable. The use of X as a time dependent variable will be discussed in
chapter 4.
For the first model, let the conditional density of T given X be exponential,

$$f_{T|X}(t|x) = \lambda x\, e^{-\lambda x t},$$

and let the variable X have a gamma density,

$$f_X(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}.$$

Then the joint density $f_{T,X}(y,x)$ is

$$f_{T,X}(y,x) = \lambda x\, e^{-\lambda x y}\, \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}.$$
The marginal density of T is

$$f_T(y) = \int_0^\infty \lambda x\, e^{-\lambda x y}\, \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}\, dx = \frac{\lambda\beta^\alpha}{\Gamma(\alpha)} \int_0^\infty x^{(\alpha+1)-1} e^{-(\lambda y+\beta)x}\, dx = \frac{\lambda\beta^\alpha}{\Gamma(\alpha)}\, \frac{\Gamma(\alpha+1)}{(\lambda y+\beta)^{\alpha+1}} = \frac{\lambda\alpha\beta^\alpha}{(\lambda y+\beta)^{\alpha+1}}$$
and the probability of survival beyond time t is

$$S_T(t) = \left(\frac{\beta}{\lambda t+\beta}\right)^\alpha.$$

The probability of surviving beyond time C is $\left(\beta/(\lambda C+\beta)\right)^\alpha$ and will be referred to simply as S for sections 3.2–3.6. Therefore, the likelihood function for T, when some of the observations are censored, using the marginal density, is
$$L = \prod_d \frac{\lambda\alpha}{\beta}\left(\frac{\beta}{\lambda t_i+\beta}\right)^{\alpha+1} \prod_c \left(\frac{\beta}{\lambda c_i+\beta}\right)^\alpha \tag{3.3}$$
and the likelihood function for T using the joint density of T and X is

$$L = \prod_d \lambda x_i e^{-\lambda x_i t_i} \prod_c e^{-\lambda x_i c_i} \prod_{i=1}^n \frac{\beta^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\beta x_i}. \tag{3.4}$$
So the information in the variable X about the unknown parameters is used only by the proposed method (equation 3.4) and is ignored by the standard method (equation 3.3).
For each likelihood equation the maximum likelihood estimate of the parameters can be obtained by setting the first derivative of the log likelihood, with respect to the given parameter, to zero. The corresponding variances of the MLEs will be found by taking the inverse of the information matrix. Sections 3.3–3.10 will assume fixed type I censoring at time C. The results will be generalized to random type I censoring in section 3.11.
3.3 Comparing the Efficiency of the Proposed Method in Estimating the Parameter of an Exponential Distribution

The following section will estimate the underlying conditional hazard of $f_{T|X}(t_i|x_i)$, $\lambda$, and the underlying marginal survival function, $S_T(t)$, using both methods. For simplicity, the parameters $\alpha$ and $\beta$ will be assumed known.
The natural log of the likelihood using the marginal density, the standard method, is

$$\ln L = \sum_d \left\{\ln\lambda + \ln\alpha - \ln\beta + (\alpha+1)\left[\ln\beta - \ln(\lambda t_i+\beta)\right]\right\} + \sum_c \alpha\left[\ln\beta - \ln(\lambda C+\beta)\right].$$
Taking the first derivative with respect to $\lambda$ and setting it equal to zero, we get the following equation:

$$\sum_d \left[\frac{1}{\lambda} - \frac{(\alpha+1)t_i}{\lambda t_i+\beta}\right] - \sum_c \frac{\alpha C}{\lambda C+\beta} = 0.$$

Any value of $\lambda$ that solves this equation is the MLE for $\lambda$. The asymptotic variance of $\hat\lambda$ is equal to the inverse of the expected value of the negative of the second derivative with respect to $\lambda$, where
$$\frac{d^2\ln L}{d\lambda^2} = \sum_d \left[-\frac{1}{\lambda^2} + \frac{(\alpha+1)t_i^2}{(\lambda t_i+\beta)^2}\right] + \sum_c \frac{\alpha C^2}{(\lambda C+\beta)^2}$$

and, after evaluating the expectations over $t$ and simplifying,

$$E\left(-\frac{d^2\ln L}{d\lambda^2}\right) = \frac{n\alpha\left(1-S^{1+2/\alpha}\right)}{\lambda^2(\alpha+2)}.$$
The variance of $\hat\lambda$ is thus equal to $\lambda^2(\alpha+2)/[n\alpha(1-S^{1+2/\alpha})]$. The variance decreases as $\alpha$ increases and increases as the percent of censoring increases. Note that $\alpha$ is the shape parameter of the gamma distribution. When $\alpha=1$ the distribution is exponential and the value of $f(x)$ continuously decreases as $x$ increases; this occurs when the sample of x values is heterogeneous. As $\alpha$ decreases the coefficient of variation increases; $\alpha$ is large when the sample of x values is very homogeneous.
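The score equation above has no closed-form root, but it is easy to solve numerically. A sketch under hypothetical parameter values, with α and β treated as known as in this section; the bisection assumes the score changes sign exactly once on the bracketing interval, which holds here because the score is large and positive for small λ and negative for large λ:

```python
import random

random.seed(0)
lam_true, alpha, beta, C, n = 1.0, 2.0, 2.0, 2.0, 5000   # hypothetical; C = fixed censoring time

# Simulate the model: X ~ gamma(shape alpha, rate beta), T | X = x ~ exponential(lam*x).
xs = [random.gammavariate(alpha, 1.0 / beta) for _ in range(n)]
ts = [random.expovariate(lam_true * x) for x in xs]
deaths = [t for t in ts if t < C]                         # uncensored death times
n_cens = n - len(deaths)

def score(lam):
    """First derivative of the standard-method log likelihood in lambda."""
    s = sum(1.0 / lam - (alpha + 1) * t / (lam * t + beta) for t in deaths)
    return s - n_cens * alpha * C / (lam * C + beta)

lo, hi = 1e-4, 50.0
for _ in range(80):                                       # bisection on the sign change
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
lam_hat = 0.5 * (lo + hi)
assert abs(lam_hat - lam_true) < 0.1                      # within a few standard errors
```

With n = 5000 and S = (β/(λC+β))² = 0.25, the asymptotic standard error from the variance formula above is about 0.02, so the simulated estimate should land well inside the asserted tolerance.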
In comparison, the likelihood which incorporates the joint density of T and X, the proposed method, has the following log likelihood:

$$\ln L = \sum_d \left[\ln\lambda + \ln x_i - \lambda x_i t_i\right] + \sum_c -\lambda x_i C + \sum_{all} \left[\alpha\ln\beta - \ln\Gamma(\alpha) + (\alpha-1)\ln x_i - \beta x_i\right].$$
The first derivative with respect to $\lambda$ is

$$\frac{d\ln L}{d\lambda} = \sum_d \left(\frac{1}{\lambda} - x_i t_i\right) - \sum_c x_i C.$$

The second derivative with respect to $\lambda$ is $\frac{d^2\ln L}{d\lambda^2} = \sum_d -\frac{1}{\lambda^2}$, and the expected value of its negative is

$$E\left(-\frac{d^2\ln L}{d\lambda^2}\right) = \frac{n(1-S)}{\lambda^2}.$$
Hence, the variance of $\hat\lambda$ is equal to $\lambda^2/[n(1-S)]$. The variance of $\hat\lambda$ using the proposed method does not depend on $\alpha$, but only on the percent of censoring, decreasing as the percent of censoring decreases. Using the delta method, the variance of the survival distribution estimate, $\hat S_T(t)$, is equal to

$$\left(\frac{dS_T(t)}{d\lambda}\right)^2 \mathrm{var}(\hat\lambda) = \frac{\beta^{2\alpha}\alpha^2 t^2}{(\lambda t+\beta)^{2(\alpha+1)}}\, \mathrm{var}(\hat\lambda).$$

Thus, using the standard method,

$$\mathrm{var}(\hat S_T(t)) = \frac{\beta^{2\alpha}\alpha^2 t^2}{(\lambda t+\beta)^{2(\alpha+1)}}\, \frac{\lambda^2(\alpha+2)}{n\alpha\left(1-S^{1+2/\alpha}\right)},$$

as compared to using the proposed method,

$$\mathrm{var}(\hat S_T(t)) = \frac{\beta^{2\alpha}\alpha^2 t^2}{(\lambda t+\beta)^{2(\alpha+1)}}\, \frac{\lambda^2}{n(1-S)}.$$
In using only the marginal density of T to form the likelihood equation, versus using information on both T and X, the variance of $\hat S_T(t)$ is larger by a factor of

$$\frac{(\alpha+2)(1-S)}{\alpha\left(1-S^{1+2/\alpha}\right)}. \tag{3.5}$$
As $\alpha$ decreases, the efficiency of the standard method relative to the proposed method decreases, so the proposed method is most efficient when the values of X vary a great deal from subject to subject. As the percent of censoring increases, the efficiency of the standard method relative to the proposed method increases to 1. As both $\alpha$ and the percent of censoring get larger, the gain in efficiency diminishes.
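The variance-inflation factor of equation 3.5 is easy to explore numerically; a small sketch whose assertions match the limits discussed above:

```python
# Variance inflation of the standard method relative to the proposed method
# (equation 3.5): var_standard / var_proposed.
def variance_ratio(alpha, S):
    return (alpha + 2) * (1 - S) / (alpha * (1 - S ** (1 + 2 / alpha)))

# With no censoring (S = 0) the ratio is (alpha + 2)/alpha, so the proposed
# method always helps, most dramatically for small alpha (heterogeneous X).
assert abs(variance_ratio(2.0, 0.0) - 2.0) < 1e-12
# The gain shrinks toward 1 as censoring becomes heavy (S -> 1).
assert 1.0 < variance_ratio(2.0, 0.99) < variance_ratio(2.0, 0.5) < variance_ratio(2.0, 0.0)
```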
3.4 Computing the Efficiency Ratio When Survival Time is Affected by Treatment

Now an additional term, $e^{\gamma_1 z_i}$, will be added to the likelihood equation to represent the effect of treatment. Z is a $\pm 1$ treatment indicator variable and $\gamma_1$ is the treatment parameter. First, the treatment effect will be assumed to have a multiplicative effect on the hazard of the conditional density, $f_{T|X}(t_i|x_i)$. Second, the effect of treatment on T will be assumed to be indirect, through its effect on the scale parameter $\beta$ of the marginal density, $f_X(x_i)$. Third, the effect of treatment will be placed on both simultaneously, though the effect may vary between the two; there does not have to be an equal effect. A comparison of the standard method to the proposed method will be made for each type of treatment effect. The parameters $\alpha$ and $\beta$ are no longer assumed known.
For the standard method, with treatment affecting T through the term $e^{\gamma_1 z_i}$, the likelihood equation, found by replacing $\lambda$ by $\lambda e^{\gamma_1 z_i}$, is

$$L = \prod_d \frac{\lambda\alpha e^{\gamma_1 z_i}}{\beta}\left(\frac{\beta}{\beta+\lambda e^{\gamma_1 z_i} t_i}\right)^{\alpha+1} \prod_c \left(\frac{\beta}{\beta+\lambda e^{\gamma_1 z_i} C}\right)^\alpha$$

and the natural log of the likelihood equation is

$$l = \sum_d \left[\ln\lambda + \ln\alpha + \gamma_1 z_i - \ln\beta + (\alpha+1)\left(\ln\beta - \ln(\beta+\lambda e^{\gamma_1 z_i} t_i)\right)\right] + \sum_c \alpha\left[\ln\beta - \ln(\beta+\lambda e^{\gamma_1 z_i} C)\right].$$
The first derivative with respect to $\gamma_1$ is

$$\frac{dl}{d\gamma_1} = \sum_d \left[z_i - \frac{(\alpha+1)\lambda t_i z_i e^{\gamma_1 z_i}}{\beta+\lambda e^{\gamma_1 z_i} t_i}\right] - \sum_c \frac{\alpha\lambda C z_i e^{\gamma_1 z_i}}{\beta+\lambda e^{\gamma_1 z_i} C}$$

and the partial derivative with respect to $\gamma_1$ and $\lambda$ is

$$\frac{d^2 l}{d\gamma_1\, d\lambda} = \sum_d -\frac{(\alpha+1)\beta t_i z_i e^{\gamma_1 z_i}}{(\beta+\lambda e^{\gamma_1 z_i} t_i)^2} - \sum_c \frac{\alpha\beta C z_i e^{\gamma_1 z_i}}{(\beta+\lambda e^{\gamma_1 z_i} C)^2}.$$
The expected value of the partial derivatives equals zero when we let Z take on the values of +1
or -1 and evaluate the expected value under the null hypothesis of no treatment effect. The
negative second derivative with respect to $\gamma_1$ is

$$-\frac{d^2 l}{d\gamma_1^2} = \sum_d \left[\frac{(\alpha+1)\lambda t_i z_i^2 e^{\gamma_1 z_i}}{\beta+\lambda e^{\gamma_1 z_i} t_i} - \frac{(\alpha+1)\lambda^2 t_i^2 z_i^2 e^{2\gamma_1 z_i}}{(\beta+\lambda e^{\gamma_1 z_i} t_i)^2}\right] + \sum_c \left[\frac{\alpha\lambda C z_i^2 e^{\gamma_1 z_i}}{\beta+\lambda e^{\gamma_1 z_i} C} - \frac{\alpha\lambda^2 C^2 z_i^2 e^{2\gamma_1 z_i}}{(\beta+\lambda e^{\gamma_1 z_i} C)^2}\right]$$
and its expected value under the null hypothesis, after evaluating the expectations over $t$ and $z$ and simplifying, is

$$E(I|\gamma_1=0) = E_z E_{t|z}\left(-\frac{d^2 l}{d\gamma_1^2}\right) = \frac{n\alpha\left(1-S^{1+2/\alpha}\right)}{\alpha+2}. \tag{3.6}$$
The amount of information on $\gamma_1$ increases as $\alpha$ increases and decreases as the percent of censoring increases.
For the proposed method, with treatment affecting T given X, the likelihood equation is

$$L = \prod_d \lambda x_i e^{\gamma_1 z_i} \exp(-\lambda x_i e^{\gamma_1 z_i} t_i) \prod_c \exp(-\lambda x_i e^{\gamma_1 z_i} C) \prod_{all} \frac{\beta^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\beta x_i}$$
and the natural log of the likelihood equation is

$$l = \sum_d \left[\ln\lambda + \ln x_i + \gamma_1 z_i - \lambda x_i e^{\gamma_1 z_i} t_i\right] + \sum_c -\lambda x_i e^{\gamma_1 z_i} C + \sum_{all} \left[\alpha\ln\beta - \ln\Gamma(\alpha) + (\alpha-1)\ln x_i - \beta x_i\right].$$
The first derivative with respect to $\gamma_1$ is

$$\frac{dl}{d\gamma_1} = \sum_d \left[z_i - \lambda x_i z_i e^{\gamma_1 z_i} t_i\right] + \sum_c -\lambda x_i z_i e^{\gamma_1 z_i} C$$

and the partial derivative with respect to $\gamma_1$ and $\lambda$ is

$$\frac{d^2 l}{d\gamma_1\, d\lambda} = \sum_d -x_i z_i e^{\gamma_1 z_i} t_i + \sum_c -x_i z_i e^{\gamma_1 z_i} C.$$
The expected value of the partial derivative, $E\left(\frac{d^2 l}{d\gamma_1 d\lambda}\right)$, equals zero under the null hypothesis of no treatment effect. The negative second derivative with respect to $\gamma_1$ is

$$-\frac{d^2 l}{d\gamma_1^2} = \sum_d \lambda x_i z_i^2 e^{\gamma_1 z_i} t_i + \sum_c \lambda x_i z_i^2 e^{\gamma_1 z_i} C$$

and its expected value under the null hypothesis is

$$E(I|\gamma_1=0) = E_z E_{x|z} E_{t|x,z}\left(-\frac{d^2 l}{d\gamma_1^2}\right) = n\left[1 - \frac{\lambda C\alpha\beta^\alpha}{(\beta+\lambda C)^{\alpha+1}} - \frac{\beta^\alpha}{(\beta+\lambda C)^\alpha} + \frac{\lambda C\alpha\beta^\alpha}{(\beta+\lambda C)^{\alpha+1}}\right] = n(1-S). \tag{3.7}$$
Once again we see that with the proposed method the amount of information gained does not depend on $\alpha$, but it does depend on the percent of censored subjects: the information on $\gamma_1$ increases as the percent censored decreases.
The efficiency of the standard method to the proposed method can be found by taking the ratio of $E(I|\gamma_1=0)$ for the two methods. The efficiency of the standard method to the proposed method when the treatment affects survival time, T given X, only, is

$$\frac{E_a(I|\gamma_1=0)}{E_b(I|\gamma_1=0)} = \frac{\alpha\left(1-S^{1+2/\alpha}\right)}{(\alpha+2)(1-S)}. \tag{3.8}$$

This is the same result as equation 3.5, the efficiency of the standard method to the proposed method for the parameter $\lambda$.
The efficiency of the proposed method increases as $\alpha$ decreases to 0 (or as the population of X becomes more heterogeneous) and also increases as the percent of censored observations decreases. For large values of $\alpha$ and a large percent of censoring the proposed method is no longer meaningfully more efficient than the standard method. Figure 3.1 illustrates how the efficiency ratio goes to 1 for $\alpha > 1$ when the percent censored is 50%. As $\alpha$ goes to 0, the variance of $\hat\gamma$ can be reduced to about 17% of its original value by using the proposed method instead of the standard method. It is most helpful to use the information provided by X when the values of X are heterogeneous.
[Fig. 3.1. Efficiency of the standard method to the proposed method as a function of $\alpha$ (0 to 4), when treatment affects T given X only.]
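The curve in Figure 3.1 can be tabulated directly from equation 3.8. A sketch at 50% censoring, as in the figure:

```python
# Efficiency of the standard method relative to the proposed method for the
# treatment parameter gamma_1 (equation 3.8).
def efficiency_gamma1(alpha, S):
    return alpha * (1 - S ** (1 + 2 / alpha)) / ((alpha + 2) * (1 - S))

S = 0.5                                   # 50% censoring, as in Figure 3.1
grid = [0.2, 0.5, 1.0, 2.0, 4.0]
effs = [efficiency_gamma1(a, S) for a in grid]

# The ratio rises toward 1 as alpha grows (homogeneous X), so the surrogate
# helps most when X is heterogeneous (small alpha).
assert all(e1 < e2 for e1, e2 in zip(effs, effs[1:]))
assert effs[0] < 0.2 and 0.85 < effs[-1] < 1.0
```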
3.5 Computing the Efficiency Ratio When the Effect of Treatment is on the Surrogate Measure, X, Only

Now, let the treatment effect be on the surrogate X by replacing $\beta$ with $\beta e^{\gamma_2 z_i}$. For the standard method the likelihood equation is equal to
$$L = \prod_d \frac{\lambda\alpha}{\beta e^{\gamma_2 z_i}}\left[\frac{\beta e^{\gamma_2 z_i}}{\lambda t_i+\beta e^{\gamma_2 z_i}}\right]^{\alpha+1} \prod_c \left[\frac{\beta e^{\gamma_2 z_i}}{\lambda C+\beta e^{\gamma_2 z_i}}\right]^\alpha$$

and the natural log of the likelihood equation is

$$l = \sum_d \left[\ln\lambda + \ln\alpha - \gamma_2 z_i - \ln\beta + (\alpha+1)\left(\ln\beta + \gamma_2 z_i - \ln(\beta e^{\gamma_2 z_i}+\lambda t_i)\right)\right] + \sum_c \alpha\left[\ln\beta + \gamma_2 z_i - \ln(\beta e^{\gamma_2 z_i}+\lambda C)\right].$$
The first derivative with respect to $\gamma_2$ is

$$\frac{dl}{d\gamma_2} = \sum_d \left[-z_i + (\alpha+1)\left(z_i - \frac{\beta z_i e^{\gamma_2 z_i}}{\beta e^{\gamma_2 z_i}+\lambda t_i}\right)\right] + \sum_c \left[\alpha z_i - \frac{\alpha\beta z_i e^{\gamma_2 z_i}}{\beta e^{\gamma_2 z_i}+\lambda C}\right]$$

and the partial derivative with respect to $\gamma_2$ and $\lambda$ is

$$\frac{d^2 l}{d\gamma_2\, d\lambda} = \sum_d \frac{(\alpha+1)\beta t_i z_i e^{\gamma_2 z_i}}{(\beta e^{\gamma_2 z_i}+\lambda t_i)^2} + \sum_c \frac{\alpha\beta C z_i e^{\gamma_2 z_i}}{(\beta e^{\gamma_2 z_i}+\lambda C)^2}.$$
The expected values of the partial derivatives equal zero under the null hypothesis of no treatment effect. The negative second derivative with respect to $\gamma_2$ is

$$-\frac{d^2 l}{d\gamma_2^2} = \sum_d \frac{(\alpha+1)\lambda\beta t_i z_i^2 e^{\gamma_2 z_i}}{(\beta e^{\gamma_2 z_i}+\lambda t_i)^2} + \sum_c \left[\frac{\alpha\beta z_i^2 e^{\gamma_2 z_i}}{\beta e^{\gamma_2 z_i}+\lambda C} - \frac{\alpha\beta^2 z_i^2 e^{2\gamma_2 z_i}}{(\beta e^{\gamma_2 z_i}+\lambda C)^2}\right]$$

and its expected value under the null hypothesis, after evaluating the expectations over $t$ and $z$, is

$$E(I|\gamma_2=0) = \frac{n\alpha\left(1-S^{1+2/\alpha}\right)}{\alpha+2}. \tag{3.9}$$
This is the same result as equation 3.6, the information on $\gamma_1$ for the standard method. Since the standard method only uses information obtained from T, the survival time, the amount of information obtained is the same for $\gamma_1$ and $\gamma_2$.
For the proposed method, with treatment affecting X, the likelihood equation is

$$L = \prod_d \lambda x_i e^{-\lambda x_i t_i} \prod_c \exp(-\lambda x_i C) \prod_{all} \frac{(\beta e^{\gamma_2 z_i})^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} \exp(-\beta e^{\gamma_2 z_i} x_i)$$
and the natural log of the likelihood equation is

$$l = \sum_d \left[\ln\lambda + \ln x_i - \lambda x_i t_i\right] + \sum_c -\lambda x_i C + \sum_{all} \left[\alpha(\ln\beta + \gamma_2 z_i) - \ln\Gamma(\alpha) + (\alpha-1)\ln x_i - \beta e^{\gamma_2 z_i} x_i\right].$$
The first derivative with respect to $\gamma_2$ is

$$\frac{dl}{d\gamma_2} = \sum_{all} \left[\alpha z_i - \beta x_i z_i e^{\gamma_2 z_i}\right]$$

and the partial derivatives are equal to

$$\frac{d^2 l}{d\gamma_2\, d\lambda} = 0, \qquad \frac{d^2 l}{d\gamma_2\, d\alpha} = \sum_{all} z_i, \qquad \frac{d^2 l}{d\gamma_2\, d\beta} = \sum_{all} -x_i z_i e^{\gamma_2 z_i}.$$

The expected values of the partial derivatives equal zero under the null hypothesis of no treatment effect. The second derivative with respect to $\gamma_2$ satisfies

$$-\frac{d^2 l}{d\gamma_2^2} = \sum_{all} \beta x_i z_i^2 e^{\gamma_2 z_i}$$
and its expected value under the null hypothesis is
. 2 "Y2z. 00
E(II"Y 2=0) = nE[lJzi e 1 J x1·f I (x)dx]
z
0 xz
= na.
3.10
Now the information on the treatment parameter depends on $\alpha$ and not on the percent of censoring; the opposite was found for the proposed method when the treatment affected T and not X. Here the information increases as the set of x's becomes more homogeneous: the treatment effect on x is then clearer and easier to measure, since it is the same effect for all x.
The efficiency of the standard method to the proposed method can be found by taking the ratio of $E(I|\gamma_2=0)$ for the two methods. The efficiency of the standard method to the proposed method when the treatment affects the surrogate measure, X, only, is

$$\frac{E_a(I|\gamma_2=0)}{E_b(I|\gamma_2=0)} = \frac{1-S^{1+2/\alpha}}{\alpha+2}. \tag{3.11}$$
The amount of information obtained on the parameter $\gamma_2$ using the proposed method increases as $\alpha$ increases. Thus, the efficiency of the standard method to the proposed method decreases as $\alpha$ increases and increases as the percent of censoring increases to 100%. When there is no censoring the proposed method is still more efficient, by a factor of $\alpha+2$; the larger the percent of censoring, the more important it is that information from X be used. Figure 3.2 illustrates that using the information provided by X reduces the variance of $\hat\gamma$ by at least 55% when there is 50% censoring, and when $\alpha$ is as large as 4 the variance of $\hat\gamma$ by the proposed method is about 11% of its variance using the standard method. Even with no censoring the variance is reduced by at least 50%.
[Fig. 3.2. Efficiency of the standard method to the proposed method as a function of $\alpha$ (0 to 4), when treatment affects the surrogate X only.]
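The behavior plotted in Figure 3.2 follows directly from equation 3.11; a small sketch:

```python
# Efficiency of the standard method relative to the proposed method when the
# treatment acts on the surrogate X only (equation 3.11).
def efficiency_gamma2(alpha, S):
    return (1 - S ** (1 + 2 / alpha)) / (alpha + 2)

# Even with no censoring (S = 0) the proposed method wins by a factor alpha + 2 ...
assert abs(1 / efficiency_gamma2(2.0, 0.0) - 4.0) < 1e-12
# ... and at 50% censoring the variance of gamma_2-hat is cut by more than half
# across the alpha range plotted in Figure 3.2.
assert all(efficiency_gamma2(a, 0.5) < 0.45 for a in [0.5, 1.0, 2.0, 4.0])
```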
3.6 Computing the Efficiency Ratio When Both Survival Time and the Surrogate Measure, X, are Affected by Treatment

The latter two examples allowed treatment to affect survival, T, either through T given X or through X. A more practical example is where treatment affects survival time through both survival time given X and the surrogate measure, X. For the standard method, the likelihood equation is now

$$L = \prod_d \frac{\lambda\alpha}{\beta}\, e^{z_i(\gamma_1-\gamma_2)}\left[\frac{\beta}{\beta+\lambda t_i e^{z_i(\gamma_1-\gamma_2)}}\right]^{\alpha+1} \prod_c \left[\frac{\beta}{\beta+\lambda C e^{z_i(\gamma_1-\gamma_2)}}\right]^\alpha.$$
For simplicity let $\gamma = \gamma_1 - \gamma_2$. The log of the likelihood equation is then

$$\ln L = \sum_d \left\{\ln\alpha + \ln\lambda - \ln\beta + z_i\gamma + (\alpha+1)\left[\ln\beta - \ln(\beta+\lambda t_i e^{z_i\gamma})\right]\right\} + \sum_c \alpha\left[\ln\beta - \ln(\beta+\lambda C e^{z_i\gamma})\right].$$
The first derivative with respect to $\gamma$ is

$$\frac{d\ln L}{d\gamma} = \sum_d \left[z_i - \frac{(\alpha+1)\lambda t_i z_i e^{z_i\gamma}}{\beta+\lambda t_i e^{z_i\gamma}}\right] - \sum_c \frac{\alpha\lambda C z_i e^{z_i\gamma}}{\beta+\lambda C e^{z_i\gamma}}$$
and the partial derivatives take the same form as in section 3.4. When Z takes on the values of 1 or $-1$, the expected value of the partial derivatives is 0 under the null hypothesis of equal treatment effect. The negative second derivative with respect to $\gamma$ is

$$-\frac{d^2\ln L}{d\gamma^2} = \sum_d \left[\frac{(\alpha+1)\lambda t_i z_i^2 e^{z_i\gamma}}{\beta+\lambda t_i e^{z_i\gamma}} - \frac{(\alpha+1)\lambda^2 t_i^2 z_i^2 e^{2z_i\gamma}}{(\beta+\lambda t_i e^{z_i\gamma})^2}\right] + \sum_c \left[\frac{\alpha\lambda C z_i^2 e^{z_i\gamma}}{\beta+\lambda C e^{z_i\gamma}} - \frac{\alpha\lambda^2 C^2 z_i^2 e^{2z_i\gamma}}{(\beta+\lambda C e^{z_i\gamma})^2}\right]$$
and the expected value of its negative under the null hypothesis, after evaluating the expectations over $t$ and $z$, is

$$E(I|\gamma=0) = \frac{n\alpha\left(1-S^{1+2/\alpha}\right)}{\alpha+2}. \tag{3.12}$$
This is the same result that was obtained when treatment affected T given X only, and when treatment affected X only, for the standard method. Here we are computing the expected information on $\gamma_1-\gamma_2$, which increases as $\alpha$ increases and decreases as the percent of censoring increases. The variance of $\hat\gamma$ is the inverse of $E(I|\gamma=0)$.
For the proposed method, the likelihood equation with both treatment effects present is

$$L = \prod_d \lambda x_i e^{\gamma_1 z_i} e^{-\lambda x_i e^{\gamma_1 z_i} t_i} \prod_c e^{-\lambda x_i e^{\gamma_1 z_i} C} \prod_{all} \frac{(\beta e^{\gamma_2 z_i})^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\beta e^{\gamma_2 z_i} x_i}$$
and the natural log of the likelihood is

$$\ln L = \sum_d \left[\ln\lambda + \ln x_i + \gamma_1 z_i - \lambda x_i e^{\gamma_1 z_i} t_i\right] + \sum_c -\lambda x_i e^{\gamma_1 z_i} C + \sum_{all} \left[\alpha\ln\beta + \alpha\gamma_2 z_i - \ln\Gamma(\alpha) + (\alpha-1)\ln x_i - \beta e^{\gamma_2 z_i} x_i\right].$$

For the proposed method, the variances of $\hat\gamma_1$ and $\hat\gamma_2$ will be computed separately, and then the variance of $\hat\gamma_1-\hat\gamma_2$ will be computed. The first derivative with respect to $\gamma_1$ is
$$\frac{dl}{d\gamma_1} = \sum_d \left(z_i - \lambda t_i x_i z_i e^{\gamma_1 z_i}\right) + \sum_c -\lambda C x_i z_i e^{\gamma_1 z_i}.$$

The partial derivatives are as follows:

$$\frac{d^2 l}{d\gamma_1\, d\gamma_2} = 0, \qquad \frac{d^2 l}{d\gamma_1\, d\beta} = 0, \qquad \frac{d^2 l}{d\gamma_1\, d\lambda} = \sum_d -t_i x_i z_i e^{\gamma_1 z_i} + \sum_c -C x_i z_i e^{\gamma_1 z_i}.$$
Under the null hypothesis of equal treatment effect, $\gamma_1=\gamma_2$, but they do not necessarily equal zero. Writing the expected value of the partial derivative of $\gamma_1$ and $\lambda$ as $E_z E_{x|z} E_{t|x,z}\left(\frac{d^2 l}{d\gamma_1 d\lambda}\right)$ and evaluating the inner integrals, all of the exponential terms drop out under the null hypothesis; thus the expected value of the partial derivative with respect to $\gamma_1$ and $\lambda$ is zero under the null hypothesis.
The second derivative with respect to $\gamma_1$ satisfies

$$-\frac{d^2 l}{d\gamma_1^2} = \sum_d \lambda t_i x_i z_i^2 e^{\gamma_1 z_i} + \sum_c \lambda C x_i z_i^2 e^{\gamma_1 z_i}$$
and its expected value under the null hypothesis is

$$E(I|\gamma_1=\gamma_2) = E_z E_{x|z} E_{t|x,z}\left(-\frac{d^2 l}{d\gamma_1^2}\right) = n\left[1 - \frac{\lambda C\alpha\beta^\alpha}{(\beta+\lambda C)^{\alpha+1}} - \frac{\beta^\alpha}{(\beta+\lambda C)^\alpha} + \frac{\lambda C\alpha\beta^\alpha}{(\beta+\lambda C)^{\alpha+1}}\right] = n(1-S). \tag{3.13}$$
This is the same amount of information for $\gamma_1$ that occurred when treatment was on T given X only. Having treatment affect both T and X does not change the individual information of the two treatment effects. The variance of $\hat\gamma_1$ is $1/[n(1-S)]$.
The first derivative with respect to "Y2 is
.....QL = 1:: QZ. - px.z.e"Y2zi
d"Y2
all
1
.1 1
and the partial derivatives are equal to
d 21
d"Y2dA = O.
~=
d"Y2dQ
Ez.
all 1
~ = E- x.z.e"Y2zi
d"Y2dP
Under the null
hypoth~is
all
1 1
of equal treatment effect, "Yl ="Y2' the expected values of the partial
derivatives are equal to 0, when Z takes on the values of 1 or -1. The second derivative with
respect to "Y2 is
$$-\frac{d^2 l}{d\gamma_2^2} = \sum_{all} \beta x_i z_i^2 e^{\gamma_2 z_i}$$

and its expected value under the null hypothesis is

$$E(I|\gamma_1=\gamma_2) = n E_z\left(\beta z_i^2 e^{\gamma_2 z_i} \int_0^\infty x\, f(x|z)\, dx\right) = n E_z(\alpha z_i^2) = n\alpha. \tag{3.14}$$
Thus, the variance of $\hat\gamma_2$ is $1/(n\alpha)$. The partial derivative with respect to $\gamma_1$ and $\gamma_2$ is zero, and so the variance of $\hat\gamma_1-\hat\gamma_2$ is

$$\mathrm{var}(\hat\gamma_1-\hat\gamma_2) = \frac{1}{n(1-S)} + \frac{1}{n\alpha} = \frac{\alpha+1-S}{n\alpha(1-S)}.$$

The efficiency of the standard method to the proposed method is

$$\frac{(\alpha+1-S)\left(1-S^{1+2/\alpha}\right)}{(\alpha+2)(1-S)}. \tag{3.15}$$
This ratio decreases as $\alpha$ decreases to 0 and also decreases as the percent of censoring decreases to 0%. For large $\alpha$ and a large percent of censoring the proposed method is no longer meaningfully more efficient than the standard method. This is the same scenario as Figure 3.1, where the treatment effect was on survival time, T given X, only. However, when treatment affects T and X simultaneously the greatest possible reduction in the variance of $\hat\gamma$ is 50%; this occurs when there is no censoring and $\alpha$ is close to 0.
Under the standard method the information on the treatment parameter remains the same whether the treatment effect is on T, X, or both. Under the proposed method, the information on the treatment parameter varies depending on which variable it affects. When treatment is on T given X, or on T and X simultaneously, the efficiency of the standard method to the proposed method decreases to 0 as $\alpha$ decreases to 0 and as the percent of censoring decreases to 0%; in the heterogeneous case with no censoring, the variance is cut in half by measuring X. When the treatment effect is on X, it is most important to measure X when X is very homogeneous and the percent of censoring is large.
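The combined-effect bookkeeping of this section can be sketched in a few lines (all numerical values below are hypothetical):

```python
# Proposed-method variance of gamma_1-hat - gamma_2-hat (Section 3.6): the two
# information pieces n(1-S) and n*alpha add on the variance scale because the
# cross partial derivative is zero.
def var_proposed(n, alpha, S):
    return 1 / (n * (1 - S)) + 1 / (n * alpha)

def efficiency_both(alpha, S):   # equation 3.15
    return (alpha + 1 - S) * (1 - S ** (1 + 2 / alpha)) / ((alpha + 2) * (1 - S))

n, alpha, S = 100, 2.0, 0.25
# The sum of the two variance components equals the closed form (alpha+1-S)/(n*alpha*(1-S)).
assert abs(var_proposed(n, alpha, S) - (alpha + 1 - S) / (n * alpha * (1 - S))) < 1e-12
# With no censoring and alpha near 0 the variance reduction approaches its 50% ceiling.
assert abs(efficiency_both(1e-6, 0.0) - 0.5) < 1e-3
```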
3.7 Introducing a General Model

In order to have a more general model that could be used for a wide variety of data, the conditional density $f_{T|X}(t|x)$ will be assumed to follow a proportional hazards model, $h(t)\,x\,e^{-xH(t)}$, where $h(t)$ is an arbitrary underlying hazard function and $H(t) = \int_0^t h(u)\,du$.
If the variable X has a gamma density,

$$f_X(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)},$$

then the joint density $f_{T,X}(t,x)$ is now

$$f_{T,X}(t,x) = h(t)\,x\,e^{-xH(t)}\, \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)},$$
and the marginal density of T is

$$f_T(t) = \int_0^\infty \frac{h(t)\beta^\alpha}{\Gamma(\alpha)}\, x^{(\alpha+1)-1} e^{-(H(t)+\beta)x}\, dx = \frac{h(t)\beta^\alpha}{\Gamma(\alpha)}\, \frac{\Gamma(\alpha+1)}{(H(t)+\beta)^{\alpha+1}} = \frac{h(t)\alpha}{\beta}\left(\frac{\beta}{H(t)+\beta}\right)^{\alpha+1}.$$
The probability of survival beyond time t is $S_T(t) = \left[\beta/(H(t)+\beta)\right]^\alpha$. The probability of surviving beyond time C, the time of censoring, is $\left[\beta/(H(C)+\beta)\right]^\alpha$ and will be referred to simply as S for sections 3.7–3.11.
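A Monte Carlo sketch of the general model, using a hypothetical Weibull-type hazard $h(t)=k t^{k-1}$ so that $H(t)=t^k$; it checks that the marginal censoring probability depends on the hazard only through $H(C)$:

```python
import random

random.seed(1)
alpha, beta, k, C = 2.0, 3.0, 1.5, 1.0   # hypothetical; h(t) = k*t**(k-1), so H(t) = t**k

def draw_T(x):
    """Conditional hazard x*h(t) gives S(t|x) = exp(-x*H(t)), so T = H^{-1}(E/x), E ~ Exp(1)."""
    e = random.expovariate(1.0)
    return (e / x) ** (1.0 / k)           # H^{-1}(u) = u**(1/k)

n = 200_000
survived = sum(draw_T(random.gammavariate(alpha, 1.0 / beta)) > C for _ in range(n))

S_theory = (beta / (C ** k + beta)) ** alpha   # [beta/(H(C)+beta)]**alpha
assert abs(survived / n - S_theory) < 0.01
```

Swapping in any other hazard with the same $H(C)$ leaves the simulated censoring fraction unchanged, which is why the efficiency ratios below carry over from the exponential case.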
3.8 Computing the Efficiency Ratio When Survival Time is Affected by Treatment

Now, for the standard method, with treatment affecting T given X through the term $e^{\gamma_1 z_i}$, the likelihood equation is

$$L = \prod_d \frac{h(t_i)\alpha e^{\gamma_1 z_i}}{\beta}\left(\frac{\beta}{\beta+H(t_i)e^{\gamma_1 z_i}}\right)^{\alpha+1} \prod_c \left(\frac{\beta}{\beta+H(C)e^{\gamma_1 z_i}}\right)^\alpha$$

and the natural log of the likelihood equation is

$$l = \sum_d \left[\ln h(t_i) + \ln\alpha + \gamma_1 z_i - \ln\beta + (\alpha+1)\left(\ln\beta - \ln(\beta+e^{\gamma_1 z_i}H(t_i))\right)\right] + \sum_c \alpha\left[\ln\beta - \ln(\beta+e^{\gamma_1 z_i}H(C))\right].$$
The first derivative with respect to $\gamma_1$ is

$$\frac{dl}{d\gamma_1} = \sum_d \left[z_i - \frac{(\alpha+1)H(t_i)z_i e^{\gamma_1 z_i}}{\beta+e^{\gamma_1 z_i}H(t_i)}\right] - \sum_c \frac{\alpha H(C)z_i e^{\gamma_1 z_i}}{\beta+e^{\gamma_1 z_i}H(C)}$$

and the partial derivative with respect to $\gamma_1$ and $H(t)$ is

$$\frac{d^2 l}{d\gamma_1\, dH(t)} = \sum_d \left[\frac{(\alpha+1)H(t_i)h(t_i)z_i e^{2\gamma_1 z_i}}{(\beta+e^{\gamma_1 z_i}H(t_i))^2} - \frac{(\alpha+1)h(t_i)z_i e^{\gamma_1 z_i}}{\beta+e^{\gamma_1 z_i}H(t_i)}\right] + \sum_c \left[\frac{\alpha H(C)h(C)z_i e^{2\gamma_1 z_i}}{(\beta+e^{\gamma_1 z_i}H(C))^2} - \frac{\alpha H(C)h(C)z_i e^{\gamma_1 z_i}}{\beta+e^{\gamma_1 z_i}H(C)}\right].$$
The expected values of the partial derivatives equal zero when we let Z take on the values of +1 or $-1$ and evaluate the expected value under the null hypothesis of no treatment effect. The negative second derivative with respect to $\gamma_1$ is
$$-\frac{d^2 l}{d\gamma_1^2} = \sum_d \left[\frac{(\alpha+1)H(t_i)z_i^2 e^{\gamma_1 z_i}}{\beta+e^{\gamma_1 z_i}H(t_i)} - \frac{(\alpha+1)H(t_i)^2 z_i^2 e^{2\gamma_1 z_i}}{(\beta+e^{\gamma_1 z_i}H(t_i))^2}\right] + \sum_c \left[\frac{\alpha H(C)z_i^2 e^{\gamma_1 z_i}}{\beta+e^{\gamma_1 z_i}H(C)} - \frac{\alpha H(C)^2 z_i^2 e^{2\gamma_1 z_i}}{(\beta+e^{\gamma_1 z_i}H(C))^2}\right]$$
and its expected value under the null hypothesis, after evaluating the expectations over $t$ and $z$, is

$$E(I|\gamma_1=0) = E_z E_{t|z}\left(-\frac{d^2 l}{d\gamma_1^2}\right) = \frac{n\alpha\left(1-S^{1+2/\alpha}\right)}{\alpha+2}.$$
This is the same result as equation 3.6.
The amount of information on $\gamma_1$ increases as $\alpha$ increases and decreases as the percent of censoring increases. The following equations reveal that replacing $\lambda$ by $h(t)$ does not change the resulting efficiency ratios, except that S now depends on $H(C)$ instead of $\lambda C$.
For the proposed method, with treatment affecting T given X, the likelihood equation is

$$L = \prod_d h(t_i)x_i e^{\gamma_1 z_i} \exp(-x_i e^{\gamma_1 z_i} H(t_i)) \prod_c \exp(-x_i e^{\gamma_1 z_i} H(C)) \prod_{all} \frac{\beta^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\beta x_i}$$
and the natural log of the likelihood equation is

$$l = \sum_d \left[\ln h(t_i) + \ln x_i + \gamma_1 z_i - x_i e^{\gamma_1 z_i} H(t_i)\right] + \sum_c -x_i e^{\gamma_1 z_i} H(C) + \sum_{all} \left[\alpha\ln\beta - \ln\Gamma(\alpha) + (\alpha-1)\ln x_i - \beta x_i\right].$$
The first derivative with respect to $\gamma_1$ is

$$\frac{dl}{d\gamma_1} = \sum_d \left[z_i - x_i z_i e^{\gamma_1 z_i} H(t_i)\right] + \sum_c -x_i z_i e^{\gamma_1 z_i} H(C)$$

and the partial derivative with respect to $\gamma_1$ and $H(t)$ is

$$\frac{d^2 l}{d\gamma_1\, dH(t)} = \sum_d -x_i z_i e^{\gamma_1 z_i} h(t_i) + \sum_c -x_i z_i e^{\gamma_1 z_i} h(C).$$
The expected value of the partial derivative equals zero under the null hypothesis of no treatment effect. The second derivative with respect to $\gamma_1$ satisfies

$$-\frac{d^2 l}{d\gamma_1^2} = \sum_d x_i z_i^2 e^{\gamma_1 z_i} H(t_i) + \sum_c x_i z_i^2 e^{\gamma_1 z_i} H(C)$$
and its expected value under the null hypothesis is

$$E(I|\gamma_1=0) = E_z E_{x|z} E_{t|x,z}\left(-\frac{d^2 l}{d\gamma_1^2}\right) = n\left[1 - \frac{H(C)\alpha}{\beta}S^{1+1/\alpha} - S + \frac{H(C)\alpha}{\beta}S^{1+1/\alpha}\right] = n(1-S).$$
This is the same result as in equation 3.7. The efficiency of the proposed method to the standard method when the treatment affects survival time, T given X, only, is

$$\frac{E_b(I|\gamma_1=0)}{E_a(I|\gamma_1=0)} = \frac{(\alpha+2)(1-S)}{\alpha\left(1-S^{1+2/\alpha}\right)}.$$

This is the same result as equation 3.8.
3.9 Computing the Efficiency Ratio When the Effect of Treatment is on the Surrogate Measure, X, Only

Now, let the treatment effect be on the surrogate X by replacing $\beta$ by $\beta e^{\gamma_2 z_i}$. For the standard method the likelihood equation is equal to

$$L = \prod_d \frac{h(t_i)\alpha}{\beta e^{\gamma_2 z_i}}\left[\frac{\beta e^{\gamma_2 z_i}}{H(t_i)+\beta e^{\gamma_2 z_i}}\right]^{\alpha+1} \prod_c \left[\frac{\beta e^{\gamma_2 z_i}}{H(C)+\beta e^{\gamma_2 z_i}}\right]^\alpha$$

and the natural log of the likelihood equation is

$$l = \sum_d \left[\ln h(t_i) + \ln\alpha - \gamma_2 z_i - \ln\beta + (\alpha+1)\left(\ln\beta + \gamma_2 z_i - \ln(\beta e^{\gamma_2 z_i}+H(t_i))\right)\right] + \sum_c \alpha\left[\ln\beta + \gamma_2 z_i - \ln(\beta e^{\gamma_2 z_i}+H(C))\right].$$
The first derivative with respect to $\gamma_2$ is

$$\frac{dl}{d\gamma_2} = \sum_d \left[-z_i + (\alpha+1)\left(z_i - \frac{\beta z_i e^{\gamma_2 z_i}}{\beta e^{\gamma_2 z_i}+H(t_i)}\right)\right] + \sum_c \left[\alpha z_i - \frac{\alpha\beta z_i e^{\gamma_2 z_i}}{\beta e^{\gamma_2 z_i}+H(C)}\right]$$

and the partial derivative with respect to $\gamma_2$ and $H(t)$ is

$$\frac{d^2 l}{d\gamma_2\, dH(t)} = \sum_d \frac{(\alpha+1)\beta h(t_i) z_i e^{\gamma_2 z_i}}{(\beta e^{\gamma_2 z_i}+H(t_i))^2}.$$
The expected values of the partial derivatives equal zero under the null hypothesis of no treatment effect. The negative second derivative with respect to $\gamma_2$ takes the same form as in section 3.5, with $H(t_i)$ and $H(C)$ in place of $\lambda t_i$ and $\lambda C$, and its expected value under the null hypothesis is

$$E(I|\gamma_2=0) = \frac{n\alpha\left(1-S^{1+2/\alpha}\right)}{\alpha+2}.$$
This is the same result as equation 3.9.
For the proposed method, with treatment affecting X, the likelihood equation is

$$L = \prod_d h(t_i)x_i e^{-x_i H(t_i)} \prod_c \exp(-x_i H(C)) \prod_{all} \frac{(\beta e^{\gamma_2 z_i})^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} \exp(-\beta e^{\gamma_2 z_i} x_i)$$
and the natural log of the likelihood equation is

$$l = \sum_d \left[\ln h(t_i) + \ln x_i - x_i H(t_i)\right] + \sum_c -x_i H(C) + \sum_{all} \left[\alpha(\ln\beta+\gamma_2 z_i) - \ln\Gamma(\alpha) + (\alpha-1)\ln x_i - \beta e^{\gamma_2 z_i} x_i\right].$$
The first derivative with respect to $\gamma_2$ is

$$\frac{dl}{d\gamma_2} = \sum_{all} \left[\alpha z_i - \beta x_i z_i e^{\gamma_2 z_i}\right]$$

and the partial derivatives are equal to

$$\frac{d^2 l}{d\gamma_2\, dH(t)} = 0, \qquad \frac{d^2 l}{d\gamma_2\, d\alpha} = \sum_{all} z_i, \qquad \frac{d^2 l}{d\gamma_2\, d\beta} = \sum_{all} -x_i z_i e^{\gamma_2 z_i}.$$

The expected values of the partial derivatives equal zero under the null hypothesis of no treatment effect. The second derivative with respect to $\gamma_2$ is $-\frac{d^2 l}{d\gamma_2^2} = \sum_{all} \beta x_i z_i^2 e^{\gamma_2 z_i}$, and its expected value under the null hypothesis is

$$E(I|\gamma_2=0) = n\alpha.$$
This is the same result as found in equation 3.10. The resulting efficiency ratio for treatment on X only is

$$\frac{E_a(I|\gamma_2=0)}{E_b(I|\gamma_2=0)} = \frac{1-S^{1+2/\alpha}}{\alpha+2}.$$

This is the same result as equation 3.11.
3.10 Computing the Efficiency Ratio When Both Survival Time and the Surrogate Measure, X, are Affected by Treatment

The results for when treatment affects both survival time and the surrogate, X, are as follows. For the standard method, the likelihood equation is now

$$L = \prod_d \frac{\alpha h(t_i)}{\beta}\, e^{z_i(\gamma_1-\gamma_2)}\left[\frac{\beta}{\beta+H(t_i)e^{z_i(\gamma_1-\gamma_2)}}\right]^{\alpha+1} \prod_c \left[\frac{\beta}{\beta+H(C)e^{z_i(\gamma_1-\gamma_2)}}\right]^\alpha.$$
For simplicity let $\gamma=\gamma_1-\gamma_2$. The log of the likelihood equation is then

$$\ln L = \sum_d \left\{\ln\alpha + \ln h(t_i) - \ln\beta + z_i\gamma + (\alpha+1)\left[\ln\beta - \ln(\beta+H(t_i)e^{z_i\gamma})\right]\right\} + \sum_c \alpha\left[\ln\beta - \ln(\beta+H(C)e^{z_i\gamma})\right].$$

The first derivative with respect to $\gamma$ is

$$\frac{d\ln L}{d\gamma} = \sum_d \left[z_i - \frac{(\alpha+1)H(t_i)z_i e^{z_i\gamma}}{\beta+H(t_i)e^{z_i\gamma}}\right] - \sum_c \frac{\alpha H(C)z_i e^{z_i\gamma}}{\beta+H(C)e^{z_i\gamma}}$$
and the partial derivatives take the same form as in section 3.6. When Z takes on the values of 1 or $-1$, the expected value of the partial derivative is 0 under the null hypothesis. The negative second derivative with respect to $\gamma$ is

$$-\frac{d^2\ln L}{d\gamma^2} = \sum_d \left[\frac{(\alpha+1)H(t_i)z_i^2 e^{z_i\gamma}}{\beta+H(t_i)e^{z_i\gamma}} - \frac{(\alpha+1)H(t_i)^2 z_i^2 e^{2z_i\gamma}}{(\beta+H(t_i)e^{z_i\gamma})^2}\right] + \sum_c \left[\frac{\alpha H(C)z_i^2 e^{z_i\gamma}}{\beta+H(C)e^{z_i\gamma}} - \frac{\alpha H(C)^2 z_i^2 e^{2z_i\gamma}}{(\beta+H(C)e^{z_i\gamma})^2}\right]$$
and its expected value under the null hypothesis, after evaluating the expectations over $t$ and $z$, is

$$E(I|\gamma=0) = \frac{n\alpha\left(1-S^{1+2/\alpha}\right)}{\alpha+2}.$$
This is the same result that was obtained in equation 3.12. The variance of $\hat\gamma$ is the inverse of $E(I|\gamma=0)$.
For the proposed method, the likelihood equation with both treatment effects present is

$$L = \prod_d h(t_i)x_i e^{\gamma_1 z_i} e^{-x_i e^{\gamma_1 z_i} H(t_i)} \prod_c e^{-x_i e^{\gamma_1 z_i} H(C)} \prod_{all} \frac{(\beta e^{\gamma_2 z_i})^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\beta e^{\gamma_2 z_i} x_i}$$
and the natural log of the likelihood is

$$\ln L = \sum_d \left[\ln h(t_i) + \ln x_i + \gamma_1 z_i - x_i e^{\gamma_1 z_i} H(t_i)\right] + \sum_c -x_i e^{\gamma_1 z_i} H(C) + \sum_{all} \left[\alpha\ln\beta + \alpha\gamma_2 z_i - \ln\Gamma(\alpha) + (\alpha-1)\ln x_i - \beta e^{\gamma_2 z_i} x_i\right].$$
The first derivative with respect to $\gamma_1$ is

$$\frac{dl}{d\gamma_1} = \sum_d \left[z_i - H(t_i)x_i z_i e^{\gamma_1 z_i}\right] + \sum_c -H(C)x_i z_i e^{\gamma_1 z_i}$$

and the partial derivative with respect to $\gamma_1$ and $H(t)$ is

$$\frac{d^2 l}{d\gamma_1\, dH(t)} = \sum_d -h(t_i)x_i z_i e^{\gamma_1 z_i} + \sum_c -h(C)x_i z_i e^{\gamma_1 z_i}.$$
The expected value of the partial derivative, $E\left(\frac{d^2 l}{d\gamma_1 dH(t)}\right)$, equals zero under the null hypothesis of equal treatment effect. This can be seen by following the same computations as were used for the expected value of the partial derivative with respect to $\gamma_1$ and $\lambda$. The second derivative with respect to $\gamma_1$ satisfies

$$-\frac{d^2 l}{d\gamma_1^2} = \sum_d H(t_i)x_i z_i^2 e^{\gamma_1 z_i} + \sum_c H(C)x_i z_i^2 e^{\gamma_1 z_i}$$
and its expected value under the null hypothesis is

$$E(I|\gamma_1=\gamma_2) = E_z E_{x|z} E_{t|x,z}\left(-\frac{d^2 l}{d\gamma_1^2}\right) = n\left[1 - \frac{H(C)\alpha}{\beta}S^{1+1/\alpha} - S + \frac{H(C)\alpha}{\beta}S^{1+1/\alpha}\right] = n(1-S).$$

This is the same result as equation 3.13. The variance of $\hat\gamma_1$ is $1/[n(1-S)]$.
The first derivative with respect to $\gamma_2$ is

$$\frac{dl}{d\gamma_2} = \sum_{all} \left[\alpha z_i - \beta x_i z_i e^{\gamma_2 z_i}\right]$$

and the partial derivatives are equal to

$$\frac{d^2 l}{d\gamma_2\, dH(t)} = 0, \qquad \frac{d^2 l}{d\gamma_2\, d\alpha} = \sum_{all} z_i, \qquad \frac{d^2 l}{d\gamma_2\, d\beta} = \sum_{all} -x_i z_i e^{\gamma_2 z_i}.$$

The expected values of the partial derivatives equal zero under the null hypothesis of no treatment effect. The second derivative with respect to $\gamma_2$ satisfies

$$-\frac{d^2 l}{d\gamma_2^2} = \sum_{all} \beta x_i z_i^2 e^{\gamma_2 z_i}$$
and its expected value under the null hypothesis is

$$E(I|\gamma_1=\gamma_2) = n E_z\left(\beta z_i^2 \int_0^\infty x_i\, f(x)\, dx\right) = n E_z(\alpha z_i^2) = n\alpha.$$

Thus, the variance of $\hat\gamma_2$ is $1/(n\alpha)$; this is the same result as equation 3.14. The partial derivative with respect to $\gamma_1$ and $\gamma_2$ is zero, and so the variance of $\hat\gamma_1-\hat\gamma_2$ is

$$\mathrm{var}(\hat\gamma_1-\hat\gamma_2) = \frac{1}{n(1-S)} + \frac{1}{n\alpha} = \frac{\alpha+1-S}{n\alpha(1-S)}.$$

The ratio of the variances from the standard method and the proposed method is

$$\frac{(\alpha+1-S)\left(1-S^{1+2/\alpha}\right)}{(\alpha+2)(1-S)}.$$
This is the same result as equation 3.15. All results and conclusions drawn for the exponential situation parallel the results of the likelihood equations that follow the general proportional hazards model. Thus, for any general proportional hazards model, the precision of the parameter estimates can be improved by using the likelihood equation that incorporates the surrogate measure, X. The variance and efficiency depend on $h(t)$ only through S, the probability that a subject will be censored, where C is the fixed censoring time.
3.11 Results Extended to Random Type I Censoring

The above results can be extended to random type I censoring by taking the expected value of the negative second derivatives with respect to both death time, $t_i$, and censoring time, $c_i$. Previously, the expected value was taken with respect to $t_i$ given $c_i$ fixed at time C. If C is replaced by $c_i$, the corresponding random censoring time for each subject, then by taking the expected value again, this time with respect to $c_i$, we arrive at the expected value of the negative second derivative with respect to $t$ and $c$:

$$E_{t,c}\left(-\frac{d^2\ln L}{d\gamma^2}\right) = E_c E_{t|c}\left(-\frac{d^2\ln L}{d\gamma^2}\right).$$
In order to get an expected value with respect to censoring time, the density of $c_i$, $g(c_i)$, is needed. Once $g(c_i)$ is provided, the following computation is performed:

$$E_{t,c}\left(-\frac{d^2\ln L}{d\gamma^2}\right) = \int \left[E_{t|c}\left(-\frac{d^2\ln L}{d\gamma^2}\right)\right] g(c_i)\, dc_i.$$

If the density of $c_i$ is not known, the density of the probability of surviving beyond time $c_i$, $g_{S_c(c_i)}(S_c(c_i))$, can be used instead, since the expected values depend on the time of censoring only through the function $S_c(c_i)$. Either way, a close approximation to the efficiency ratio can be computed.
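The computation above is a one-dimensional integral once $g(c)$ is specified. A sketch for the proposed method's information on $\gamma_1$, assuming (hypothetically) the exponential-gamma model of section 3.2 and an exponential censoring density $g(c)=\mu e^{-\mu c}$:

```python
import math

lam, alpha, beta, mu = 1.0, 2.0, 2.0, 0.5   # hypothetical; censoring density g(c) = mu*exp(-mu*c)

def S(c):
    """P(T > c) under the exponential-gamma model: (beta/(lam*c + beta))**alpha."""
    return (beta / (lam * c + beta)) ** alpha

def expected_info(n, upper=200.0, steps=200_000):
    """E_c E_{t|c}(-d2 lnL/d gamma_1^2): integral of the fixed-c information n*(1-S(c)) against g(c)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        c = (i + 0.5) * h
        total += n * (1.0 - S(c)) * mu * math.exp(-mu * c)
    return total * h

val = expected_info(100)
assert 0 < val < 100          # bounded by the uncensored information n
```

With these particular values the integral evaluates to roughly 59.6, i.e. random censoring here behaves like fixed censoring with an effective $S \approx 0.40$.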
CHAPTER 4
USE OF A TIME DEPENDENT SURROGATE MEASURE
4.1 Introduction
In Chapter 3, the proposed method incorporated surrogate measures whose value for a particular subject remained constant throughout the study. The following chapter will concentrate on variables that change across time. In clinical trials many covariates are measured over time; for example, a subject's blood pressure may be taken once every week for the duration of the trial. When such time dependent covariates are measured, it may be the trend over time that is important to the hazard, and not just one value.
Thus, the surrogate measure, X, may be more useful as a time dependent variable,
Xᵢ(tⱼ), where the variable X is measured for subject i at time j for j=1,...,J. The pattern that
the values of X(t) take for a given subject could be any of a variety of possibilities. For example,
the expected value of the time dependent covariates may follow a straight line, a step function
or a polynomial curve, each pattern being defined by a set of parameters. In Chapter 3 it was
assumed that the hazard depended on X. Now, the hazard may depend on the parameters that
describe the trend of the expected values of X(t). The asymptotic variance of the treatment
parameter γ₁ will depend on these parameters along with the percent of censored observations.
The following sections will incorporate parameters describing the course of a time
dependent surrogate, X(t), into the standard and proposed likelihoods to compare the resulting
asymptotic variance of the treatment parameter, γ₁, assuming different types of functions
(models) for X(t) and different hazard functions. Section 4.2 illustrates the issue with a
specific example where all distributions are known to be gamma. The second example, found in
Section 4.3, computes a very general model where the hazard depends on some arbitrary function
of the parameters of the expected value of X(t). Section 4.4 will apply the results of Section 4.3
to different models of the time dependent covariate X(t). For example, the surrogate variable,
X(t), may be a step function where the hazard jumps from value a to b when a given event
occurs. This model could be applied to the following covariates: age at first birth, menopause,
age at puberty, time to a heart attack, or how long a subject has been exposed to a given
state (condition). Another example that will be covered is when the parameter(s) describing the
trend of the expected values of X(t) follow a gamma distribution. When the time dependent
surrogate measure increases linearly across time, the hazard of the primary outcome event may
depend on the slope of the trend for each subject. This model is applicable to such measures as
cholesterol, blood pressure and the number of cancer cells found in a subject's bone marrow. The
hazard may depend on how fast the surrogate is increasing over time or on the value of the
surrogate variable at a specific point in time. For all examples Xᵢ(tⱼ) = uᵢ(tⱼ) + εᵢⱼ, where uᵢ(tⱼ)
is the expected value of Xᵢ(tⱼ) and will be modeled as a function of the parameter(s) θ through
the function g₁(θᵢ,tⱼ) = uᵢ(tⱼ). The εᵢⱼ are iid and independent of uᵢ(tⱼ) with density f_ε. The
hazard, h(t|θ), depends on some function of these parameters, or equivalently on some function
of the surrogate. To apply the method, three pieces of information must be known: the model for
u(t); the function of the parameters of u(t) on which the hazard depends, g₃(θ,t); and the
distribution of g₃(θ,t), or the distribution of θ. All of the following models will be specified
through u(t), g₃(θ,t) and f(θ), the distribution of the parameters, θ, or the distribution of g₃(θ,t).
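As an illustrative sketch of this setup (all names are hypothetical; the linear trend uᵢ(t) = bᵢt with gamma-distributed slopes anticipates the example of Section 4.2), surrogate paths Xᵢ(tⱼ) = uᵢ(tⱼ) + εᵢⱼ can be generated as follows:

```python
import random

# Hypothetical simulation of the model X_i(t_j) = u_i(t_j) + eps_ij with a
# linear trend u_i(t) = b_i * t, where the subject-specific slope b_i is
# gamma distributed and eps_ij is iid noise.

random.seed(1)

def simulate_paths(n_subj=2000, times=(1.0, 2.0, 3.0, 4.0), a2=2.0, sd=0.5):
    paths, slopes = [], []
    for _ in range(n_subj):
        b = random.gammavariate(a2, 1.0)        # slope parameter, gamma(a2, 1)
        slopes.append(b)
        paths.append([b * t + random.gauss(0.0, sd) for t in times])
    return paths, slopes

paths, slopes = simulate_paths()
mean_slope = sum(slopes) / len(slopes)          # should be close to a2 = 2
```

The parameter a2 here plays the role of the heterogeneity parameter α₂ of the next section: larger a2 relative to the scale means more spread in the subject-specific slopes.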
4.2 Simple Specific Example
The following is a very specific example that incorporates the time dependent surrogate,
Xᵢ(tⱼ), where the variable X is measured for subject i at time j for j=1,...,J, into the likelihood
equations. Assume that Xᵢ(tⱼ) is independent of Xᵢ(tₖ) for j ≠ k (i.e., εᵢⱼ independent of εᵢₖ),
and that each measure of X in time has a gamma distribution, Xᵢ(tⱼ) ~ gamma(α, aᵢ + bᵢtⱼ).
Let α be known, bᵢ ~ gamma(α₂,1), and aᵢ = 0. Thus, the expected values of Xᵢ(tⱼ)
across time are summarized by a straight line through the origin with a slope of αbᵢ, and so for I
subjects there are I different straight lines.
Given the following densities, the marginal density
and survival function of T can be computed:

f_T(t) = ∫₀^∞ λαbt e^{−b(λαt²/2 + 1)} [b^{α₂−1}/Γ(α₂)] db = λαtα₂ (λαt²/2 + 1)^{−(α₂+1)}

S_T(t) = ∫_t^∞ f_T(u) du = (λαt²/2 + 1)^{−α₂}.
As before, an additional term, e^{γ₁z}, will be added to the likelihood equation to
represent the effect of the treatment. The likelihoods for both the standard and proposed
methods will be computed in order to compare the amount of information on γ₁ obtained from
the two methods. For this example, S will represent S_T(C) = (λαC²/2 + 1)^{−α₂}, the percent of
censored subjects.
For the standard method the natural log of the likelihood equation is

ln L = Σ_d [lnλ + γ₁zᵢ + lnα + ln tᵢ + lnα₂ − (α₂+1)ln(λα e^{γ₁zᵢ} tᵢ²/2 + 1)] − Σ_c α₂ ln(λα e^{γ₁zᵢ} C²/2 + 1).

The first derivative with respect to γ₁ is

dlnL/dγ₁ = Σ_d [zᵢ − (α₂+1)λα zᵢ e^{γ₁zᵢ} tᵢ²/2 / (λα e^{γ₁zᵢ} tᵢ²/2 + 1)] − Σ_c α₂ λα zᵢ e^{γ₁zᵢ} C²/2 / (λα e^{γ₁zᵢ} C²/2 + 1).

Under the null hypothesis of γ₁ = 0 the expected values of the partial derivatives with respect to γ₁
and λ are zero. The second derivative with respect to γ₁ is

d²lnL/dγ₁² = −Σ_d (α₂+1) zᵢ² λα e^{γ₁zᵢ} tᵢ²/2 / (λα e^{γ₁zᵢ} tᵢ²/2 + 1)² − Σ_c α₂ zᵢ² λα e^{γ₁zᵢ} C²/2 / (λα e^{γ₁zᵢ} C²/2 + 1)².
Its negative expected value under H₀ is equal to

E(−d²lnL/dγ₁²) = n E_z E_{t|z}(−d²lnL/dγ₁²)

= n ∫₀^C [(α₂+1)λ²α²α₂ zᵢ² tᵢ³/2] (λαtᵢ²/2 + 1)^{−(α₂+2)} dtᵢ − n ∫₀^C [(α₂+1)λ³α³α₂ zᵢ² tᵢ⁵/4] (λαtᵢ²/2 + 1)^{−(α₂+3)} dtᵢ + n α₂ zᵢ² (λαC²/2)(λαC²/2 + 1)^{−(α₂+2)}

= n α₂ (1 − S^{1+2/α₂}) / (α₂+2).
This is the same result as equations 3.6 and 3.9, except that the information here depends on α₂,
which measures the amount of heterogeneity in the slopes, bᵢ, whereas equations 3.6 and 3.9
depended on α, which measured the amount of heterogeneity in the surrogate X.
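The reduction above can be checked numerically — a sketch, with λα folded into one rate constant `la` and z taken as 1 (an illustrative normalization, not from the text): the two death-time integrals plus the censored-subject term should reproduce the closed form nα₂(1 − S^{1+2/α₂})/(α₂+2).

```python
import math

# Midpoint-rule check that the death-time integrals plus the censored-subject
# term equal alpha2 * (1 - S**(1 + 2/alpha2)) / (alpha2 + 2).

def standard_info(la, a2, C, steps=200_000):
    dt = C / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        u = la * t * t / 2 + 1
        total += (a2 + 1) * a2 * la**2 * t**3 / 2 * u ** (-(a2 + 2)) * dt
        total -= (a2 + 1) * a2 * la**3 * t**5 / 4 * u ** (-(a2 + 3)) * dt
    U = la * C * C / 2 + 1
    total += a2 * (la * C * C / 2) * U ** (-(a2 + 2))   # censored subjects
    return total

def closed_form(la, a2, C):
    S = (la * C * C / 2 + 1) ** (-a2)                   # percent censored
    return a2 * (1 - S ** (1 + 2 / a2)) / (a2 + 2)

Cval = math.sqrt(2.0)   # gives S = 0.25 when la = 1, a2 = 2
```

With la = 1, α₂ = 2 and C = √2 the censoring fraction is S = 0.25 and both routes give 0.46875.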
For the proposed method the natural log of the likelihood equation is
ln L = Σ_d [lnλ + γ₁zᵢ + lnα + ln(aᵢ + bᵢtᵢ) − λα e^{γ₁zᵢ}(aᵢtᵢ + bᵢtᵢ²/2)] − Σ_c λα e^{γ₁zᵢ}(aᵢC + bᵢC²/2) + Σᵢ [(α₂−1)ln bᵢ − bᵢ − lnΓ(α₂)].
The first derivative with respect to γ₁ is

dlnL/dγ₁ = Σ_d [zᵢ − λα zᵢ e^{γ₁zᵢ}(aᵢtᵢ + bᵢtᵢ²/2)] − Σ_c λα zᵢ e^{γ₁zᵢ}(aᵢC + bᵢC²/2).
Under the null hypothesis of γ₁ = 0 the expected values of the partial derivatives with respect to γ₁
and λ are zero. The second derivative with respect to γ₁ is

d²lnL/dγ₁² = −Σ_d λα zᵢ² e^{γ₁zᵢ}(aᵢtᵢ + bᵢtᵢ²/2) − Σ_c λα zᵢ² e^{γ₁zᵢ}(aᵢC + bᵢC²/2).
Its expected value under H₀ is equal to

E(−d²lnL/dγ₁²) = n E_z E_{b|z} E_{t|b,z}(−d²lnL/dγ₁²)

= n E_z E_{b|z} [ zᵢ² ∫₀^C λ²α²bᵢ² (tᵢ³/2) e^{−λαbᵢtᵢ²/2} dtᵢ + zᵢ² ∫_C^∞ λ²α²bᵢ² (C²/2) tᵢ e^{−λαbᵢtᵢ²/2} dtᵢ ]

= n(1 − S).
For the proposed method the information on γ₁, using a time dependent surrogate, depends only
on the percent of censoring in the trial. This is the same result that occurred when the surrogate
was fixed. The efficiency ratio of the standard method versus the proposed method is

α₂ (1 − S^{1+2/α₂}) / [(1 − S)(α₂ + 2)].

The ratio goes to 1 as the amount of censoring goes to 100% and as the heterogeneity in the
slopes, bᵢ, decreases. This is the same scenario as seen in Figure 3.8, except that now α₂
measures the amount of heterogeneity in a parameter that describes the trend of
the expected values of X(t) rather than the amount of heterogeneity in the X values.
4.3 General Model
The following is an example of a general model that incorporates any arbitrary specified
change across time of one or more time dependent covariates with a density of f(θ), and any
arbitrary specified hazard. Denote h(tᵢ|θᵢ) = g₃(θᵢ,tᵢ) e^{γzᵢ} and denote the cumulative hazard of
g₃(θ,t) by G(θ,t), giving the conditional density of survival time given θ:

f_{T|θᵢ,z} = g₃(θᵢ,t) e^{γz} e^{−G(θᵢ,t)e^{γz}}

and the conditional survival function:

S_{T|θᵢ,z} = e^{−G(θᵢ,t)e^{γz}}.

The marginal density of survival time is

f_T(t) = ∫_θ g₃(θ,t) e^{γz} e^{−G(θ,t)e^{γz}} f(θ) dθ

and the survival function is

S_T(t) = ∫_t^∞ ∫_θ g₃(θ,u) e^{γz} e^{−G(θ,u)e^{γz}} f(θ) dθ du = ∫_θ e^{−G(θ,t)e^{γz}} f(θ) dθ.

With these functions so defined, the standard and proposed likelihoods can now be formed.
For the standard method the natural log of the likelihood equation is

ln L = Σ_d ln[∫_θ g₃(θ,tᵢ) e^{γzᵢ} e^{−G(θ,tᵢ)e^{γzᵢ}} f(θ) dθ] + Σ_c ln[∫_θ e^{−G(θ,C)e^{γzᵢ}} f(θ) dθ].

The first derivative with respect to γ is

dlnL/dγ = Σ_d { [∫_θ g₃(θ,tᵢ) zᵢ e^{γzᵢ} e^{−G(θ,tᵢ)e^{γzᵢ}} f(θ) dθ − ∫_θ g₃(θ,tᵢ) zᵢ e^{2γzᵢ} G(θ,tᵢ) e^{−G(θ,tᵢ)e^{γzᵢ}} f(θ) dθ] / ∫_θ g₃(θ,tᵢ) e^{γzᵢ} e^{−G(θ,tᵢ)e^{γzᵢ}} f(θ) dθ }
− Σ_c { ∫_θ G(θ,C) zᵢ e^{γzᵢ} e^{−G(θ,C)e^{γzᵢ}} f(θ) dθ / ∫_θ e^{−G(θ,C)e^{γzᵢ}} f(θ) dθ }.
The second derivative with respect to γ is

d²lnL/dγ² = Σ_d { [∫_θ g₃ zᵢ² e^{γzᵢ} e^{−Ge^{γzᵢ}} f dθ − 3∫_θ g₃ zᵢ² e^{2γzᵢ} G e^{−Ge^{γzᵢ}} f dθ + ∫_θ g₃ zᵢ² e^{3γzᵢ} G² e^{−Ge^{γzᵢ}} f dθ] / ∫_θ g₃ e^{γzᵢ} e^{−Ge^{γzᵢ}} f dθ
− [∫_θ g₃ zᵢ e^{γzᵢ} e^{−Ge^{γzᵢ}} f dθ − ∫_θ g₃ zᵢ e^{2γzᵢ} G e^{−Ge^{γzᵢ}} f dθ]² [∫_θ g₃ e^{γzᵢ} e^{−Ge^{γzᵢ}} f dθ]^{−2} }
+ Σ_c { [−∫_θ zᵢ² e^{γzᵢ} G(θ,C) e^{−G(θ,C)e^{γzᵢ}} f dθ + ∫_θ zᵢ² e^{2γzᵢ} G²(θ,C) e^{−G(θ,C)e^{γzᵢ}} f dθ] / ∫_θ e^{−G(θ,C)e^{γzᵢ}} f dθ
− [∫_θ zᵢ e^{γzᵢ} G(θ,C) e^{−G(θ,C)e^{γzᵢ}} f dθ]² [∫_θ e^{−G(θ,C)e^{γzᵢ}} f dθ]^{−2} },

where, in the death terms, g₃ and G are evaluated at (θ,tᵢ).
Its expected value under H₀ is equal to

E(−d²lnL/dγ²) = n ∫_θ [1 − (G(θ,C) + 1)e^{−G(θ,C)}] f(θ) dθ − n ∫_θ [2 − (G²(θ,C) + 2G(θ,C) + 2)e^{−G(θ,C)}] f(θ) dθ
+ n ∫₀^C [∫_θ g₃(θ,t)G(θ,t)e^{−G(θ,t)} f(θ) dθ]² / f_T(t) dt
+ n ∫_θ G(θ,C)e^{−G(θ,C)} f(θ) dθ − n ∫_θ G²(θ,C)e^{−G(θ,C)} f(θ) dθ + n [∫_θ G(θ,C)e^{−G(θ,C)} f(θ) dθ]² / ∫_θ e^{−G(θ,C)} f(θ) dθ,

where the first three terms come from the deaths and the last three from the censored subjects. Collecting terms,

E(−d²lnL/dγ²) = n[−1 + ∫_θ e^{−G(θ,C)} f(θ) dθ + 2 ∫_θ G(θ,C)e^{−G(θ,C)} f(θ) dθ] + n [∫_θ G(θ,C)e^{−G(θ,C)} f(θ) dθ]² / ∫_θ e^{−G(θ,C)} f(θ) dθ
+ n ∫₀^C [∫_θ g₃(θ,t)G(θ,t)e^{−G(θ,t)} f(θ) dθ]² / ∫_θ g₃(θ,t)e^{−G(θ,t)} f(θ) dθ dt.    (4.1)
The amount of information obtained on γ from the standard method can be computed using 4.1
for any specified distribution of θ and for any specified hazard function, g₃(θ,t).
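Equation 4.1 can be evaluated numerically for a concrete specification. The sketch below assumes (for illustration only) h(t) = 1 so that H(t) = t, and a gamma(a,b) frailty y = exp(β) with g₃(θ,t) = y and G(θ,t) = yt; for the gamma density the inner integrals over y have closed forms, so only the outer t-integral is done numerically.

```python
# Equation 4.1 for a gamma(a,b) multiplicative frailty with unit baseline
# hazard.  Inner y-integrals are analytic; the t-integral uses midpoint rule.

def info_standard_41(a, b, C, steps=100_000):
    S = (b / (C + b)) ** a                        # percent censored, S_T(C)
    m1 = C * a * b**a / (C + b) ** (a + 1)        # = ∫ yC e^{-yC} f(y) dy
    total = -1.0 + S + 2.0 * m1 + m1 * m1 / S
    dt = C / steps
    for k in range(steps):
        t = (k + 0.5) * dt
        # [∫ y^2 t e^{-yt} f dy]^2 / ∫ y e^{-yt} f dy = t^2 a (a+1)^2 b^a / (t+b)^(a+3)
        total += t * t * a * (a + 1) ** 2 * b**a / (t + b) ** (a + 3) * dt
    return total

def efficiency_41(a, b, C):
    S = (b / (C + b)) ** a
    return info_standard_41(a, b, C) / (1.0 - S)  # proposed method gives n(1-S)
```

With a = 2, b = 1, C = 1 (so S = 0.25) this yields 0.625, agreeing with the gamma closed form a(1 − S^{1+2/a})/((1 − S)(a + 2)) behind Table 1.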
For the proposed method the natural log of the likelihood equation is

ln L = Σ_d [ln g₃(θᵢ,tᵢ) + γzᵢ − G(θᵢ,tᵢ)e^{γzᵢ}] − Σ_c G(θᵢ,C)e^{γzᵢ} + Σᵢ ln f(θᵢ).
The first derivative with respect to γ is

dlnL/dγ = Σ_d [zᵢ − zᵢ G(θᵢ,tᵢ)e^{γzᵢ}] − Σ_c zᵢ G(θᵢ,C)e^{γzᵢ}.
The second derivative with respect to γ is

d²lnL/dγ² = −Σ_d zᵢ² G(θᵢ,tᵢ)e^{γzᵢ} − Σ_c zᵢ² G(θᵢ,C)e^{γzᵢ}.
Its expected value under H₀ is equal to

E(−d²lnL/dγ²) = n E_z E_{θ|z} [ zᵢ² ∫₀^C G(θ,t) g₃(θ,t) e^{−G(θ,t)} dt + zᵢ² G(θ,C) S_{T|θ}(C) ]

= n E_z E_{θ|z} [zᵢ² (1 − e^{−G(θ,C)})]

= n ∫_θ (1 − S_{T|θ}(C)) f(θ) dθ = n(1 − S_T(C)).    (4.2)
Using the proposed method with any arbitrary hazard function, the amount of information
obtained on γ depends only on the percent of censored observations and the sample size. This is
the same result as found in equations 3.7 and 3.13. The efficiency of the standard method to the
proposed method can be found by taking the ratio of equation 4.1 over equation 4.2. This ratio
can be used to define the efficiency ratio of the treatment parameter for any given specified
parameter(s) of change across time and for any arbitrary specified hazard. As mentioned
previously, three pieces of information must be provided in order to compute the efficiency: the
model for u(t); the function of the parameters of u(t) on which the hazard depends, g₃(θ,t); and
the distribution of g₃(θ,t) or θ. The following section will provide examples of how
equations 4.1 and 4.2 can be applied to different θ's and hazard functions, g₃(θ,t), and how the
efficiency changes depending on these values.
4.4 Examples
The general model described in Section 4.3 can be applied to a variety of time
dependent variables.
For example, suppose that the expected value of the time dependent
covariate within each subject follows a straight line; thus, Xᵢ(t) = αᵢ + βᵢt + εᵢ. So uᵢ(t) is a
function of two parameters, αᵢ and βᵢ. Every subject's expected values fall on a different line, so
there is an αᵢ and βᵢ for each of the I subjects. Assume the hazard depends on some function of
these parameters and its distribution is known. In Chapter 3 it was assumed that the hazard
depended on some arbitrary surrogate measure, X. The form of X was not defined, but it was
assumed that X had a gamma distribution with parameters α and β. α was a measure of the
strength of the association between the surrogate measure X and the survival time, T. The only
information needed was the distribution of X. As described earlier, to apply the general model
which incorporates a time dependent surrogate measure, three pieces of information are needed:
the model of u(t); g₃(θ,t); and the distribution of g₃(θ,t) or the distribution of θ. Given that the
values of X(t) follow a straight line, assume, for the first example, that the hazard is
proportional to exp(β), with exp(β) ~ gamma. A possible situation where this may be
applicable is where β measures the rate of change over time of a subject's blood pressure or
cholesterol, and the higher the rate, the higher the hazard of dying of a heart attack. To
compute the expected amount of information for this situation, let g₃(θ,t) = h(t)exp(β) and
G(θ,t) = H(t)exp(β), with exp(β) ~ gamma(a,b), such that

f_β(β) = exp(−b e^β) exp(βa) bᵃ / Γ(a).
Thus, using equation 4.1, the expected amount of information for the standard method when
y = e^β is equal to

E(−d²lnL/dγ²) = n{ −1 + ∫₀^∞ exp(−yH(C)) exp(−yb) y^{a−1} bᵃ/Γ(a) dy
+ 2 ∫₀^∞ yH(C) exp(−yH(C)) exp(−yb) y^{a−1} bᵃ/Γ(a) dy
+ [∫₀^∞ yH(C) exp(−yH(C)) exp(−yb) y^{a−1} bᵃ/Γ(a) dy]² / ∫₀^∞ exp(−yH(C)) exp(−yb) y^{a−1} bᵃ/Γ(a) dy
+ ∫₀^C [∫₀^∞ y² h(t)H(t) exp(−yH(t)) exp(−yb) y^{a−1} bᵃ/Γ(a) dy]² / ∫₀^∞ y h(t) exp(−yH(t)) exp(−yb) y^{a−1} bᵃ/Γ(a) dy dt }.
The amount of information depends on the percent of censored observations, S_T, and on the
parameter a. C and b together determine the amount of censoring through S_T(C) = bᵃ/(H(C)+b)ᵃ. By
considering a few examples, which are displayed in Table 1, we can see that in general the
efficiency of the standard method to the proposed decreases as the percent of censoring decreases
and as the parameter a decreases (that is, as the amount of homogeneity in the slopes, βᵢ, decreases). In
particular, for very small values of a, say 2 or less, the gain in efficiency goes from
25% to 70% as the amount of censoring decreases from 50% to 0%. Even with large values of a
(a > 5), if censoring is low, say less than 25%, the variance in the treatment estimator can be
reduced by 10% to 25%. These results agree with those of Chapter 3, where the surrogate, X,
followed a gamma distribution.
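The efficiency values in Table 1 follow from a single closed form — a sketch assuming the Chapter 3 gamma-hazard result with shape parameter a:

```python
# Efficiency of the standard method to the proposed when the hazard has a
# gamma(a,b) distribution; S is the percent of censoring.

def eff(S, a):
    if S == 0:
        return a / (a + 2)          # limit as censoring goes to 0
    return a * (1 - S ** (1 + 2 / a)) / ((1 - S) * (a + 2))
```

For example, eff(0.50, 1) ≈ 0.58, eff(0.90, 1) ≈ 0.90 and eff(0.50, 2) = 0.75, matching the entries of Table 1 below.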
Table 1
The Efficiency of the Standard Method to the Proposed
When g₃(θ,t) = exp(β) and exp(β) ~ gamma(a,b)
For Varying Values of a and S, the Percent of Censoring

                          a
  S        1      2      4      6      10
  0      0.33   0.50   0.67   0.75   0.83
  0.10   0.37   0.55   0.72   0.79   0.87
  0.25   0.44   0.63   0.78   0.84   0.90
  0.50   0.58   0.75   0.89   0.90   0.94
  0.90   0.90   0.95   0.98   0.98   0.98
Another example where the hazard could have a gamma distribution is if the hazard,
g₃(θ,t), is proportional to the variance of the residuals and the variances of the residuals follow a
gamma distribution. A possible situation where this may be applicable is where the values of
the time dependent surrogate have low variability for some of the subjects and jump around
sporadically for the other subjects. Examples of possible surrogate measures that behave in this
way are pulse rate and type A and type B behavior, where type A people constantly jump back
and forth between low stress and high stress, and type B people remain at low stress. Both of
these examples may be surrogates for a heart attack. The efficiency for this example is the
same as in Chapter 3 and Table 1. In Chapter 3 the fixed surrogate measure, X, assumed a
gamma distribution. In example 1 of Chapter 4 (results in Table 1), the time dependent
surrogate is a continuous variable where the expected values of X(t) follow a straight line, and the
hazard function equals the exponential of the slope of this line. Where the hazard, exp(β),
assumes a gamma distribution, the efficiency of γ is equal to the efficiency of γ in Chapter 3,
where the hazard also followed a gamma distribution. Thus, as long as the hazard has a gamma
distribution, the efficiency of the standard method to the proposed method is known. The form
of the hazard function does not matter, only its distribution. See Table 2.
Table 2
Efficiency of the Standard Method to the Proposed Method For Varying
Functions of X(t) and the Hazard, Where the Hazard Function has a
Gamma Distribution, gamma(α,β), For All Examples

                                            Efficiency
  α    X(t)             hazard             S=0      S=0.50   S=0.90
  1    X                X                  0.333    0.583    0.903
  1    α + βt           exp(β)             0.333    0.583    0.903
  1    step function    =0 if t<r          0.333    0.583    0.903
                        =b if t>r
  2    X                X                  0.500    0.750    0.950
  2    α + βt           exp(β)             0.500    0.750    0.950
  2    step function    =0 if t<r          0.500    0.750    0.950
                        =b if t>r
  10   X                X                  0.833    0.941    0.990
  10   α + βt           exp(β)             0.833    0.941    0.990
  10   step function    =0 if t<r          0.833    0.941    0.990
                        =b if t>r
Another type of function that may be considered as a model for a time dependent
surrogate measure is a step function. The hazard of the outcome of interest may depend on
some intermediate event that occurs before the primary outcome. Let the hazard function be
g₃(θ,t) = 0 if t < r and g₃(θ,t) = b if t > r, where t is the time of death and r is the time the
intermediate risk event occurred. Thus, a person's hazard jumps from 0 to the value b if the
subject experiences the intermediate event before reaching the primary outcome or before being
censored. The expected amount of information from this model will be found under three
circumstances. First let b be fixed and r be random with a gamma distribution with parameters
a and β. Here, the subjects may experience the intermediate event at any time but their
resulting hazard, b, is the same for everyone. Thus, f(θ) = f(r) = exp(−rβ) βᵃ r^{a−1}/Γ(a). A
possible situation where this may be applicable is where the risk event is starting to smoke and
the outcome of interest is death from lung cancer, or the risk event is the occurrence of a heart
attack with the endpoint being death following a heart attack. At the beginning of the study no
subjects have experienced the given risk event. Any subject who goes on to experience the
event increases their risk by an amount b, fixed for all subjects. Given this scenario, G(θ,t) = 0 if
t < r and G(θ,t) = b(t−r) if t > r.
The expected amount of information on γ for the standard method is equal to

E(−d²lnL/dγ²) = n{ −1 + ∫₀^C e^{−b(C−r)} βᵃ r^{a−1} e^{−rβ}/Γ(a) dr + ∫_C^∞ βᵃ r^{a−1} e^{−rβ}/Γ(a) dr
+ 2 ∫₀^C b(C−r) e^{−b(C−r)} βᵃ r^{a−1} e^{−rβ}/Γ(a) dr
+ [∫₀^C b(C−r) e^{−b(C−r)} βᵃ r^{a−1} e^{−rβ}/Γ(a) dr]² / [∫₀^C e^{−b(C−r)} βᵃ r^{a−1} e^{−rβ}/Γ(a) dr + ∫_C^∞ βᵃ r^{a−1} e^{−rβ}/Γ(a) dr]
+ ∫₀^C [∫₀^t b²(t−r) e^{−b(t−r)} βᵃ r^{a−1} e^{−rβ}/Γ(a) dr]² / ∫₀^t b e^{−b(t−r)} βᵃ r^{a−1} e^{−rβ}/Γ(a) dr dt }.    (4.3)

The integrals can be computed for different values of a and b using numerical integration. For
the proposed method the expected amount of information on γ is n(1 − S_T). Thus, the efficiency
of the standard method to the proposed is the ratio of equation 4.3 over (1 − S_T). Table 3 shows
that in general the efficiency of the standard method to the proposed method decreases as a
decreases, where a measures the amount of homogeneity in the times, rᵢ, and rᵢ is the time the
intermediate event occurred for the i-th subject. The efficiency also decreases as the value of the
hazard, b, increases. The use of the intermediate event in the likelihood has a big impact on
the efficiency when b is large (e.g., b = 5, 10). The efficiency of the standard method is close to
the proposed when b is small and the amount of censoring is large. In this example b is the
amount of increase in the hazard, so a large change in a subject's hazard results in a big
impact on the efficiency. The efficiency also decreases as the percent of censoring decreases.
Specifically, Table 3 illustrates that for a hazard jump of 5 or more and an a value ranging from
1 to 10, the proposed method reduces the variance of the treatment parameter by at least 70%
when there is 50% censoring or less. Even with a small hazard jump of 2, 50% censoring and
very little heterogeneity in the jumping times, the variance is reduced by at least 20%.
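Equation 4.3 can be evaluated by straightforward numerical integration — a sketch assuming r ~ gamma(a, β) with β = 1 (an illustrative choice), the midpoint rule everywhere, and the r-axis truncated at a finite rmax:

```python
import math

# Numerical evaluation of equation 4.3 and the resulting efficiency ratio for
# the zero-baseline step hazard: g3 = 0 before r and g3 = b after r.

def gamma_pdf(r, a):
    return r ** (a - 1) * math.exp(-r) / math.gamma(a)

def step_efficiency(a, b, C, n=500, rmax=40.0):
    dr = rmax / n
    rs = [(k + 0.5) * dr for k in range(n)]
    f = [gamma_pdf(r, a) for r in rs]
    # S_T(C): subjects with r > C never leave the zero-hazard state
    surv = sum(math.exp(-b * max(C - r, 0.0)) * fk * dr for r, fk in zip(rs, f))
    m1 = sum(b * (C - r) * math.exp(-b * (C - r)) * fk * dr
             for r, fk in zip(rs, f) if r < C)
    info = -1.0 + surv + 2.0 * m1 + m1 * m1 / surv
    dt = C / n
    for k in range(n):
        t = (k + 0.5) * dt
        num = sum(b * b * (t - r) * math.exp(-b * (t - r)) * fk * dr
                  for r, fk in zip(rs, f) if r < t)
        den = sum(b * math.exp(-b * (t - r)) * fk * dr
                  for r, fk in zip(rs, f) if r < t)
        if den > 0.0:
            info += num * num / den * dt
    return info / (1.0 - surv)
```

Consistent with Table 3, the computed efficiency drops sharply as the hazard jump b grows at a fixed censoring time.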
Table 3
The Efficiency of the Standard Method to the Proposed
When g₃(θ,t) = 0 if t<r, = b if t>r
with b fixed, r ~ gamma(a,β)
For varying values of a, the hazard jump b and S

b=2
                   a
  S        1       2       5       10
  0      0.235   0.342   0.571   0.711
  0.5    0.385   0.421   0.614   0.731

b=5
                   a
  S        1       2       5       10
  0      0.092   0.100   0.177   0.316
  0.5    0.170   0.171   0.233   0.317

b=10
                   a
  S        1       2       5       10
  0      0.048   0.050   0.061   0.105
  0.5    0.083   0.086   0.093   0.137
The second scenario for this step function example is to keep the same hazard function,
g₃(θ,t), but let b be random with a gamma distribution, gamma(α_b,β_b), and let r be a fixed value,
so f(θ) = f(b) = exp(−bβ_b) β_b^{α_b} b^{α_b−1}/Γ(α_b). The time the event occurs is the same for everyone
but the effect (the risk) of the event varies from subject to subject. This model could be used
when an investigator is forced to randomize the subjects before the event occurs. A possible
situation where this may be applicable is a study of people living in one city when one day a
nuclear bomb drops on their city. All the subjects are affected at the same time but the amount
of harm each receives varies. Under this scenario it is assumed that the censoring time, C, is
greater than the time the event takes place, r, and so the formula for the expected amount of
information on γ for the standard method is equal to
E(−d²lnL/dγ²) = n{ −1 + S_T(C) + 2α_b S_T(C)(1 − S_T(C)^{1/α_b}) + α_b² S_T(C)(1 − S_T(C)^{1/α_b})²
+ ∫_r^C (α_b/β_b)(α_b+1)² (1 − S_T(t)^{1/α_b})² S_T(t)^{1+1/α_b} dt }.    (4.4)

The efficiency of the standard method is the ratio of equation 4.4 over (1 − S_T(C)). Since r is
fixed and the hazard, b, has a gamma distribution, the resulting efficiency is the same as in
Chapter 3 and Table 1.
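As a numerical sketch of equation 4.4 (assuming r = 0 for simplicity, so that S_T(t) = (β_b/(t + β_b))^{α_b}, with only the t-integral computed numerically), the resulting efficiency should match the gamma closed form behind Table 1:

```python
# Equation 4.4 with a fixed jump time r = 0 and b ~ gamma(ab, bb).

def info_44(ab, bb, C, n=100_000):
    S = (bb / (C + bb)) ** ab                   # S_T(C)
    q = 1.0 - S ** (1.0 / ab)
    info = -1.0 + S + 2.0 * ab * S * q + ab * ab * S * q * q
    dt = C / n
    for k in range(n):
        t = (k + 0.5) * dt
        St = (bb / (t + bb)) ** ab
        info += (ab / bb) * (ab + 1) ** 2 * (1 - St ** (1 / ab)) ** 2 \
                * St ** (1 + 1 / ab) * dt
    return info

def efficiency_44(ab, bb, C):
    return info_44(ab, bb, C) / (1.0 - (bb / (C + bb)) ** ab)
```

With α_b = 2, β_b = 1 and C = 1 (S = 0.25), the ratio is 0.625, the same value the Chapter 3 closed form gives for a gamma(2, ·) hazard at 25% censoring.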
The third scenario that is possible with the given step function, g₃(θ,t), is letting both b
and r be random. Assume b ~ gamma(α_b,β_b), r ~ gamma(α,β), and b and r are independent,
so f(θ) = f_b(b) f_r(r). This model could be applied in cases like the first scenario, except that now a
subject's reaction to the occurrence of the event varies between subjects. Thus, each subject's
hazard increases by a different amount instead of a fixed amount. The formula for the expected
information is
E(−d²lnL/dγ²) = n{ −1 + ∫₀^C ∫₀^∞ e^{−b(C−r)} f_b(b) f_r(r) db dr + ∫_C^∞ f_r(r) dr
+ 2 ∫₀^C ∫₀^∞ b(C−r) e^{−b(C−r)} f_b(b) f_r(r) db dr
+ [∫₀^C ∫₀^∞ b(C−r) e^{−b(C−r)} f_b(b) f_r(r) db dr]² / [∫₀^C ∫₀^∞ e^{−b(C−r)} f_b(b) f_r(r) db dr + ∫_C^∞ f_r(r) dr]
+ ∫₀^C [∫₀^t ∫₀^∞ b²(t−r) e^{−b(t−r)} f_b(b) f_r(r) db dr]² / ∫₀^t ∫₀^∞ b e^{−b(t−r)} f_b(b) f_r(r) db dr dt },    (4.5)

where f_r(r) = βᵅ r^{α−1} e^{−rβ}/Γ(α) and f_b(b) = β_b^{α_b} b^{α_b−1} e^{−bβ_b}/Γ(α_b).

For the proposed method the amount of information on γ is equal to n(1 − S_T(C)) with

S_T(C) = ∫₀^C ∫₀^∞ e^{−b(C−r)} f_b(b) f_r(r) db dr + ∫_C^∞ f_r(r) dr.    (4.6)
The efficiency of the standard method to the proposed method can be expressed as the ratio of
equations 4.5 and 4.6. Table 4 illustrates in general how the efficiency of the standard method
to the proposed decreases as α decreases and as α_b decreases, where α is a measure of the amount
of homogeneity in the times of the occurrence of the intermediate event, rᵢ, and α_b measures the
amount of homogeneity in the hazard values, bᵢ. The efficiency decreases as the amount of
homogeneity in the times, rᵢ, decreases and as the amount of homogeneity in the size of the
hazard, bᵢ, decreases. The efficiency also depends on the average size of the hazard, α_b/β_b. As
the resulting hazard of experiencing the intermediate event increases, the efficiency decreases.
Thus, the greater the effect the intermediate event has on the hazard of reaching the primary
endpoint, and the more varied this effect is from subject to subject, the less efficient is the
standard method. The efficiency also decreases as the percent of censoring decreases.
Specifically, Table 4 shows how the standard method is only 10 to 65% as efficient as the
proposed method when there is heterogeneity in both b and r and censoring is 50% or less.
However, even if neither the times of the jump nor the resulting hazard is heterogeneous, the
standard method is still 10 to 25% less efficient than the proposed method when there is 50%
censoring or less.
Table 4
The Efficiency of the Standard Method to the Proposed
When g₃(θ,t) = 0 if t<r, = b if t>r
with b ~ gamma(α_b,β_b), r ~ gamma(α,β)
For varying values of α, α_b, the average hazard jump and S
[Body of the table not cleanly recoverable from the source; efficiencies for
S = 0 and S = 0.5 range from 0.188 to 0.918.]
For the last example the step function model is varied by setting g₃(θ,t) = 1 if t<r and
g₃(θ,t) = b if t>r. Now all subjects start out with a hazard of one. If they experience the risk
event before time C, or before reaching the primary endpoint, then their hazard jumps to b.
The cumulative hazard is equal to G(θ,t) = t if t<r and G(θ,t) = r + b(t−r) if t>r. This example
will also be played out with three different scenarios. For the first scenario let b be fixed and r
follow a gamma distribution, gamma(a,β); then, using equation 4.1, the expected amount of
information on γ for the standard method is
(1/n) E(−d²lnL/dγ²) = −1 + ∫_C^∞ exp(−C) r^{a−1} βᵃ e^{−rβ}/Γ(a) dr + ∫₀^C exp[−r(1−b) − bC] r^{a−1} βᵃ e^{−rβ}/Γ(a) dr
+ 2{ ∫_C^∞ C exp(−C) r^{a−1} βᵃ e^{−rβ}/Γ(a) dr + ∫₀^C [r(1−b)+bC] exp[−r(1−b) − bC] r^{a−1} βᵃ e^{−rβ}/Γ(a) dr }
+ { ∫_C^∞ C exp(−C) r^{a−1} βᵃ e^{−rβ}/Γ(a) dr + ∫₀^C [r(1−b)+bC] exp[−r(1−b) − bC] r^{a−1} βᵃ e^{−rβ}/Γ(a) dr }²
  / { ∫_C^∞ exp(−C) r^{a−1} βᵃ e^{−rβ}/Γ(a) dr + ∫₀^C exp[−r(1−b) − bC] r^{a−1} βᵃ e^{−rβ}/Γ(a) dr }
+ ∫₀^C { ∫_t^∞ t exp(−t) r^{a−1} βᵃ e^{−rβ}/Γ(a) dr + ∫₀^t b[r(1−b)+bt] exp[−r(1−b) − bt] r^{a−1} βᵃ e^{−rβ}/Γ(a) dr }²
  / { ∫_t^∞ exp(−t) r^{a−1} βᵃ e^{−rβ}/Γ(a) dr + ∫₀^t b exp[−r(1−b) − bt] r^{a−1} βᵃ e^{−rβ}/Γ(a) dr } dt.    (4.7)
As before, the information can be computed for different values of a, b and the amount of censoring,
using numerical integration. For the proposed method the expected amount of information on γ
is equal to n(1 − S_T(C)). The efficiency of the standard to the proposed is found by taking the
ratio of equation 4.7 over (1 − S_T(C)). Table 5 illustrates how the efficiency decreases as a and
the percent of censoring, S_T, decrease and as b increases. The efficiency of the standard method
to the proposed method decreases as a, the amount of homogeneity in the event times rᵢ,
decreases. The efficiency also decreases as the percent of censoring decreases and as the resulting
hazard of experiencing the intermediate event increases. Thus, the more variation in the
amount of time it takes a subject to experience the intermediate event and the larger the
resulting hazard, b, the less efficient is the standard method. More specifically, when there is a
large amount of heterogeneity in the times the event occurs (a=1) and there is 50% censoring,
the gain in efficiency using the proposed method goes from 1% to 40% as the size of the hazard,
b, goes from 1.5 to 10. Even with 90% censoring, when the hazard increases by more than 9,
the variance of the treatment parameter estimate is reduced by 10%.
Table 5
The Efficiency of the Standard Method to the Proposed
When g₃(θ,t) = 1 if t<r, = b if t>r
with b fixed, r ~ gamma(a,β)
For varying values of a, the hazard b and S

a=1
                   b
  S       1.5     5.0     10
  0      0.964   0.642   0.452
  0.10   0.97    0.654   0.480
  0.50   0.989   0.783   0.594
  0.90   0.999   0.971   0.912

b=5
                   a
  S        1       2       5       10
  0      0.642   0.781   0.892   0.991
  0.50   0.783   0.928   0.999   1.00
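Equation 4.7 can also be evaluated numerically — a sketch assuming r ~ gamma(a, β) with β = 1, the midpoint rule, and a truncated r-axis. A useful sanity check: with b = 1 the hazard never jumps, there is no heterogeneity, and the standard and proposed methods should agree exactly (efficiency 1).

```python
import math

# Numerical evaluation of equation 4.7: baseline hazard 1 before the jump
# time r, hazard b afterwards, r ~ gamma(a, 1).

def gamma_pdf(r, a):
    return r ** (a - 1) * math.exp(-r) / math.gamma(a)

def G(t, r, b):
    return t if t < r else r + b * (t - r)      # cumulative hazard

def g3(t, r, b):
    return 1.0 if t < r else b                  # hazard step function

def efficiency_47(a, b, C, n=400, rmax=40.0):
    dr = rmax / n
    rs = [(k + 0.5) * dr for k in range(n)]
    f = [gamma_pdf(r, a) for r in rs]
    surv = sum(math.exp(-G(C, r, b)) * fk * dr for r, fk in zip(rs, f))
    m1 = sum(G(C, r, b) * math.exp(-G(C, r, b)) * fk * dr for r, fk in zip(rs, f))
    info = -1.0 + surv + 2.0 * m1 + m1 * m1 / surv
    dt = C / n
    for k in range(n):
        t = (k + 0.5) * dt
        num = sum(g3(t, r, b) * G(t, r, b) * math.exp(-G(t, r, b)) * fk * dr
                  for r, fk in zip(rs, f))
        den = sum(g3(t, r, b) * math.exp(-G(t, r, b)) * fk * dr
                  for r, fk in zip(rs, f))
        info += num * num / den * dt
    return info / (1.0 - surv)
```

As in Table 5, larger jumps b lower the efficiency of the standard method at a fixed censoring time.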
For the second scenario let b be random, r be fixed, and the size of the jump between the
two hazards, b−1, have a gamma(α_b,β_b) distribution. Since the censoring time, C, is fixed and r is
fixed, the only path of interest is when C>r. Thus, the expected amount of information on γ for
the standard method, letting x = b−1, is
(1/n) E(−d²lnL/dγ²) = −1 + ∫₀^∞ e^{−[−rx+(x+1)C]} f_x(x) dx + 2 ∫₀^∞ [−rx+(x+1)C] e^{−[−rx+(x+1)C]} f_x(x) dx
+ [∫₀^∞ [−rx+(x+1)C] e^{−[−rx+(x+1)C]} f_x(x) dx]² / ∫₀^∞ e^{−[−rx+(x+1)C]} f_x(x) dx
+ ∫₀^r t² e^{−t} dt + ∫_r^C [∫₀^∞ (x+1)[−rx+(x+1)t] e^{−[−rx+(x+1)t]} f_x(x) dx]² / ∫₀^∞ (x+1) e^{−[−rx+(x+1)t]} f_x(x) dx dt,

where f_x(x) = β_b^{α_b} x^{α_b−1} e^{−xβ_b}/Γ(α_b) and G(θ,t) = −rx + (x+1)t for t > r. Replacing e^{−x}
with y, the x-integrals can be rewritten over (0,1) and evaluated numerically.
The expected amount of information can be found for varying values of r, α_b, β_b and the
percent of censoring, S. Table 6 illustrates how the efficiency of the standard method to the
proposed decreases as α_b decreases (or as the amount of heterogeneity in the hazard jump
increases), as the size of the hazard jump increases and as the percent of censoring decreases.
Specifically, Table 6 shows how both the size of the average hazard jump, b−1, and the amount
of heterogeneity in the jumps are important to the resulting efficiency. For an average size jump
of 3 at 0% censoring, the efficiency drops 10% when α_b goes from 6 to 3. At 90% censoring the
efficiency drops 8%.
Table 6
The Efficiency of the Standard Method to the Proposed
When g₃(θ,t) = 1 if t<r, = b if t>r
with b−1 ~ gamma(α_b,β_b), r fixed
For varying values of α_b and β_b, where the average
hazard jump b−1 = α_b/β_b, and S

(for β_b = 1)
         average hazard jump b−1
  S        2       3       5
  0      0.669   0.611   0.546
  0.5    0.946   0.872   0.758
  0.9    0.973   0.920   0.862

(for average hazard jump b−1 = 3)
              α_b
  S        3       6
  0      0.611   0.710
  0.5    0.872   0.947
  0.9    0.920   0.999
For the third scenario both the size of the hazard jump, b−1, and the time the risk event
occurs, r, have a gamma distribution (b−1 ~ gamma(α_b,β_b) and r ~ gamma(α,1)), and the expected
amount of information on the treatment parameter, γ, is
(1/n) E(−d²lnL/dγ²) = −1 + ∫_C^∞ e^{−C} f_r(r) dr + ∫₀^C ∫₀^∞ e^{−[r(1−b)+bC]} f_r(r) f_x(b−1) d(b−1) dr
+ 2{ ∫_C^∞ C e^{−C} f_r(r) dr + ∫₀^C ∫₀^∞ [r(1−b)+bC] e^{−[r(1−b)+bC]} f_r(r) f_x(b−1) d(b−1) dr }
+ { ∫_C^∞ C e^{−C} f_r(r) dr + ∫₀^C ∫₀^∞ [r(1−b)+bC] e^{−[r(1−b)+bC]} f_r(r) f_x(b−1) d(b−1) dr }²
  / { ∫_C^∞ e^{−C} f_r(r) dr + ∫₀^C ∫₀^∞ e^{−[r(1−b)+bC]} f_r(r) f_x(b−1) d(b−1) dr }
+ ∫₀^C { ∫_t^∞ t e^{−t} f_r(r) dr + ∫₀^t ∫₀^∞ b[r(1−b)+bt] e^{−[r(1−b)+bt]} f_r(r) f_x(b−1) d(b−1) dr }²
  / { ∫_t^∞ e^{−t} f_r(r) dr + ∫₀^t ∫₀^∞ b e^{−[r(1−b)+bt]} f_r(r) f_x(b−1) d(b−1) dr } dt,

where f_r(r) = r^{α−1} e^{−r}/Γ(α) and f_x(b−1) = β_b^{α_b} (b−1)^{α_b−1} e^{−(b−1)β_b}/Γ(α_b).
Using numerical integration, the expected amount of information on γ can be found for different
values of α, α_b, β_b and the amount of censoring. Table 7 illustrates in general how the
efficiency of the standard method to the proposed decreases as α decreases and as α_b decreases.
α is a measure of the amount of homogeneity in the times of the occurrence of the intermediate
event, rᵢ, and α_b measures the amount of homogeneity in the size of the hazard jump, b−1. The
efficiency decreases as the amount of homogeneity in the times, rᵢ, decreases and as the amount
of homogeneity in the size of the hazard jumps decreases. The efficiency also depends on the
average size of the hazard jump, α_b/β_b. As the resulting hazard of experiencing the
intermediate event increases, the efficiency decreases. Thus, the greater the effect the
intermediate event has on the hazard of reaching the primary endpoint, and the more varied this
effect is from subject to subject, the less efficient is the standard method. The efficiency also
decreases as the percent of censoring decreases. In particular, Table 7 shows that the standard
method goes from 82% efficiency to 48% efficiency as the average size of the hazard jump
increases 5-fold, from 2 to 10, when there is 25% censoring and a large amount of heterogeneity
in the jump times. Even with less heterogeneity the decrease in efficiency is substantial.
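The building block of this scenario, S_T(C) = E[exp(−G(θ,C))] with both the jump time r and the jump size b−1 random, lends itself to Monte Carlo evaluation — a sketch with illustrative parameter values; common random numbers are reused so that a larger average jump α_b/β_b gives a pathwise larger cumulative hazard and hence a smaller survival probability:

```python
import math
import random

# Monte Carlo estimate of S_T(C) = E[exp(-G(theta, C))] for the step model
# with G(theta,C) = C if C < r and r + b*(C - r) otherwise.

random.seed(7)
draws = [(random.gammavariate(1.0, 1.0),        # r ~ gamma(alpha=1, 1)
          random.gammavariate(2.0, 1.0))        # x ~ gamma(alpha_b=2, 1)
         for _ in range(50_000)]

def surv_at_C(beta_b, C=2.0):
    total = 0.0
    for r, x in draws:
        b = 1.0 + x / beta_b                    # b - 1 ~ gamma(2, scale 1/beta_b)
        Gc = C if C < r else r + b * (C - r)
        total += math.exp(-Gc)
    return total / len(draws)

s_small_jump = surv_at_C(beta_b=2.0)            # average jump 1
s_large_jump = surv_at_C(beta_b=0.5)            # average jump 4
```

Larger average jumps produce fewer censored subjects, which is one reason the efficiency gap between the two methods widens as α_b/β_b grows.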
Table 7
The Efficiency of the Standard Method to the Proposed
When g₃(θ,t) = 1 if t<r, = b if t>r
with b−1 ~ gamma(α_b,β_b), r ~ gamma(α,1)
For varying values of α and the average hazard jump b−1 = α_b/β_b
S = 0.25

           α_b/β_b = b−1
  α      0.67    2.0     4.0     10.0
  1     0.935   0.820   0.682   0.476
  2     0.952   0.865   0.757   0.586
  5     0.970   0.912   0.838   0.709
CHAPTER 5
FORMATION AND IMPLEMENTATION OF SURROGATE MEASURES
5.1 General Approach
All of the examples in Chapter 4 illustrated how the amount of information on the treatment parameter, γ, can be increased by using the proposed method, that is, a method which combines information from a surrogate measure with the primary endpoint. As was shown in Chapter 4, three pieces of information are needed to use the proposed method: the model for u(t), the function of the parameters of u(t) on which the hazard depends, g3(θ,t), and the distribution of θ or g3(θ,t). In Chapter 4 all three pieces of information were assumed known. This chapter will illustrate how to derive this information from a given data set.
To start, one must identify one or more time dependent variables that look like possible surrogate measures of survival. It would be too time consuming to model all the time dependent covariates measured in the study, so it is important to identify the top candidates. The investigator may already know of such variables from previous studies, or simple exploratory analyses can be done: for continuous variables, by plotting the surrogate measure against time for different groups of subjects (e.g., those who died early compared to those who lived a long time); for categorical variables, by computing stratified survival curves. If the subjects all increase or decrease at the same rate, or in exactly the same way, the variable provides little information about survival.
Once a potential surrogate variable has been chosen, the next step is to find u(t), the expected value of X(t), which defines the trend or pattern across time of the surrogate values within a subject. The point here is to look for a pattern among subjects that can be described by a minimum number of parameters. One way to find u(t) for a continuous variable is to plot x against time for each subject and use linear regression to fit different models (e.g., linear, quadratic, etc.). Several models may need to be fitted before finding the best one. Values like the baseline measure, the predicted baseline, the maximum value, etc., are other possible forms the surrogate may take. To decide which function of the surrogate is most helpful in providing additional information on the primary outcome, survival, these variables or parameters can be tested in a Cox proportional hazards model, which allows for censoring. The chosen surrogate will be the one most strongly associated with survival and with the largest amount of heterogeneity. Other available methods, like spline smoothing, projection pursuit and additive modeling, may also be used to find the relationship between the surrogate measure(s) and survival. A discrete event can be modeled by the distribution of time to the discrete event and/or with a binary Bernoulli distribution.
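The per-subject linear fit described above can be sketched as follows. This is a minimal illustration, not code from the dissertation; the function name and the bimonthly measurement values are hypothetical.

```python
def subject_trend(times, values):
    """Least-squares linear trend u(t) = intercept + slope*t for one
    subject's repeated surrogate measurements, along with other
    candidate summary forms mentioned in the text."""
    n = len(times)
    tbar = sum(times) / n
    xbar = sum(values) / n
    sxy = sum((t - tbar) * (x - xbar) for t, x in zip(times, values))
    stt = sum((t - tbar) ** 2 for t in times)
    slope = sxy / stt                     # closed-form simple regression
    intercept = xbar - slope * tbar
    return {"intercept": intercept, "slope": slope,
            "baseline": values[0], "maximum": max(values)}

# Hypothetical subject measured bimonthly (days 0, 60, 120, 180):
feats = subject_trend([0, 60, 120, 180], [280.0, 277.0, 274.0, 271.0])
```

The resulting slope and intercept (or baseline, maximum, etc.) would then be entered as candidate covariates in the Cox model, as described above.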
Once the function for u(t) has been found and the parameter(s), θ, identified, the function of θ on which the hazard depends must be identified. The investigator may have prior knowledge from a previous study to suggest an appropriate function; otherwise exploratory analysis will need to be done to find an appropriate form. For one or more parameters, regular regression techniques can be used to model the relationship between survival and the parameters. Modeling proceeds just as if several independent prognostic variables were being regressed on survival time. More modern methods of modeling may also be used. If θ is one dimensional, a spline smoothing technique would be appropriate. If there are several parameters, like five or six, projection pursuit or additive modeling is a powerful but complicated method to use.
For a discrete measure, u(t) is a step function. How important the risk event is in affecting the hazard can be found by regressing survival on the risk event using Cox's proportional hazards model. Information about the risk event can include whether or not it occurred and/or how much time has elapsed since baseline. To see if subjects are at different risks (i.e., a random size jump), it is possible to estimate the hazard risk of the event for different levels of another variable. Also, one needs to know whether the times at which the risk event occurred, r_i, are random or fixed. A fixed time might be the exact time (e.g., a given day) that the event occurred, or a fixed amount of time (e.g., 32 days after entering the trial). The investigator may already know the answers to these questions from previous studies or research.
The parameters of S_T(t) may be estimated using maximum likelihood based on information from both X (or θ) and T|θ. If the MLE is not expressible in closed form, iterative procedures may have to be employed.
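A minimal sketch of such an iterative procedure is Newton-Raphson on the log-likelihood. Here, purely for illustration, the censored-exponential likelihood is used, since its closed form (λ̂ = deaths divided by total person-time) provides a check; the function name and starting value are hypothetical.

```python
import math

def exp_mle_newton(n_events, total_time, lam0=1.0, tol=1e-12, max_iter=200):
    """Newton-Raphson on theta = log(lambda) for the censored-exponential
    log-likelihood l(theta) = n_events*theta - exp(theta)*total_time.
    The closed form lambda = n_events/total_time exists here; the loop
    only illustrates the generic iterative scheme."""
    theta = math.log(lam0)
    for _ in range(max_iter):
        lam_t = math.exp(theta) * total_time   # lambda * total person-time
        step = (n_events - lam_t) / lam_t      # score / (-second derivative)
        theta += step
        if abs(step) < tol:
            break
    return math.exp(theta)
```

For likelihoods with no closed form, the same scheme applies with the appropriate score and information terms.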
5.2 Description of the Data Set
The above procedures will be illustrated using data from the Lipid Research Clinics Coronary Primary Prevention Trial (LRC-CPPT). This was a multicenter, randomized, double-blind study designed to test the efficacy of cholesterol lowering in reducing the risk of CHD. The subjects consisted of 3806 asymptomatic middle-aged men (range 35-59) with primary hypercholesterolemia (plasma cholesterol level of 265 mg/dL or greater), free of, but at high risk of, CHD because of elevated LDL-C levels. The men were randomized into two groups that were similar in baseline characteristics. The treatment group received the bile acid sequestrant cholestyramine resin and the control group received a placebo.

The participants attended clinics every two months, at which endpoints were evaluated and lipid levels were determined. Information was sometimes incomplete; therefore, missing values for change in Total-C, LDL-C and HDL-C were imputed. 80-95% of the subjects would attend a bimonthly visit and have valid plasma lipid measurements performed. If a subject missed five or fewer visits in a row, the value from the current visit was used for the last five visits. If more than five visits were missed, the current measure was used for the last five visits and the baseline measure was used for the others. The primary endpoint for evaluating the treatment was the combination of definite CHD death and/or definite nonfatal myocardial infarction. All men were followed up for a minimum of seven and up to ten years. The average period of followup was 7.4 years. At the end of the trial contact was made with all of the men who were still living, including any who discontinued visits during the course of the trial. Thus, the vital status is known for all men originally entered into the study.
5.3 Implementation
Two surrogates of survival were considered as examples for this trial: total plasma cholesterol measurements and the development of an ischemic ECG response (positive exercise test). Cholesterol was measured bimonthly for the entire length of the study. Exercise ECG tests were taken semiannually or annually. Following the steps outlined above, the first step was to identify what measurements of cholesterol were to be used. To answer this question, various forms of total plasma cholesterol, LDL-C, and HDL-C were computed and entered into a Cox proportional hazards model. The covariates consisted of the slope, intercept, predicted baseline, baseline and average values of the pretreatment visits of Total-C, LDL-C, HDL-C and the ratio of Total-C over HDL-C. These variables were formed as summary variables of the cholesterol measurements, as a way to condense the information over time into a few variables or parameters that are significantly related to survival. The result of testing these variables in a Cox proportional hazards model was that the baseline and predicted baseline measurements of Total-C, LDL-C, HDL-C, and the ratio were the only significant predictors at the 0.05 level of significance. The percent change in Total-C and LDL-C from baseline to the last measurement was also tested, but the results were nonsignificant.
The current value of cholesterol was also tested to see how well it worked as a surrogate of survival. The current value was found for each subject by predicting the value from the equation of a line using the parameters previously calculated. Thus

    current cholesterol = intercept(cholesterol) + slope(cholesterol) × t_current.
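This prediction can be sketched as below; the numbers are illustrative, not values from the trial.

```python
def current_value(intercept, slope, t):
    """Predicted current surrogate value from a subject's fitted line:
    current = intercept + slope * t."""
    return intercept + slope * t

# Hypothetical subject: intercept 280 mg/dL, slope -0.05 mg/dL per day;
# predicted current cholesterol at one year (day 360):
level = current_value(280.0, -0.05, 360)
```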
The significance of this surrogate was found by applying current cholesterol as a time dependent variable in a Cox proportional hazards model. Total-C was used over LDL-C, HDL-C and the ratio of Total-C over HDL-C because no type of baseline value of cholesterol was more significant than any other, and there were several missing values for LDL-C and HDL-C. The hazard function, g3(θ,t), is now a function of the parameter estimate of the current value. Since it is known that the treatment affects cholesterol, the surrogate, and maybe survival also, the most appropriate model to use here is the one that allows for a treatment effect on both the surrogate and survival. Thus, equation 3.15 will be used to compute the efficiency. To apply this example to equation 3.15, which uses a surrogate whose distribution does not depend on time, we must create a function using the parameter estimate of the current value which eliminates any trend over time. This is done in the following way. First, using all cholesterol values, ignoring dependence within subjects, a straight line is fit to describe the overall trend of cholesterol across time. Next, the difference between the overall line and each individual's predicted line is computed (refer to this value as a residual) at three different time points: 1 year (360 days), 2 years (720 days) and 3 years (1080 days). The assumption is that the variation of the residuals is constant across time. In this way we can design a hazard function that depends on the predicted current value but is independent of time. The hazard will be computed at the three different time points to test this assumption. The hazard takes on the form exp(β_current × residual_i) for each subject. The distribution of the hazard is compared across the three time points. exp(β) is the risk ratio for one unit of change in a subject's cholesterol value. By multiplying β by the residual for a particular subject, the value of the hazard function becomes independent of time. A histogram of the individual hazards was plotted and appeared to follow a gamma distribution. For the maximum gain in efficiency, the amount of heterogeneity in the hazard should be large. The amount of heterogeneity for this hazard example can be estimated by the method of moments, dividing the squared mean of the hazard by its variance, which equals a when the distribution of the hazards is gamma. Using the method of moments, a was estimated to be 376. Since a is so large, the surrogate of current cholesterol is disregarded; the proposed method will not be any more efficient than the standard method due to the large amount of homogeneity in the hazard.
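The method-of-moments shape estimate described above, a = (mean of hazards)² / (variance of hazards), can be sketched as follows. The hazard values are illustrative; in the text, the subject-level hazards exp(β × residual_i) gave an estimate of about 376, indicating near-homogeneity.

```python
import statistics

def gamma_shape_mom(hazards):
    """Method-of-moments shape estimate a = mean(h)^2 / var(h), valid
    when the subject-level hazards follow a gamma distribution.
    A large value of a means the hazards are nearly homogeneous."""
    m = statistics.fmean(hazards)
    v = statistics.pvariance(hazards)
    return m * m / v

# Illustrative hazard values for four hypothetical subjects:
a_hat = gamma_shape_mom([2.0, 4.0, 4.0, 6.0])   # mean 4, variance 2
```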
The next example uses the results of the exercise ECG tests and illustrates how to apply a time dependent discrete surrogate measure. The information needed for this example is the hazard of the occurrence of a positive ECG test and the distribution of the occurrence times, given they are random. It is assumed for this example that the resulting hazard of a positive ECG test is the same for everyone and that the occurrence times, r, are random. To incorporate a random size jump, information is needed about the severity of the event. For example, if the event was the recurrence of cancer, information like when it occurred
or the size of the tumor needs to be measured at the time of the event. If the event was a heart attack, information on the degree of diagnosis or the number of arteries clogged could help one determine the severity of the attack. This type of analysis could be very beneficial if there is a good deal of heterogeneity among the sizes of the jumps, but it is more complex. The size of the jump, β1, is found by applying a Cox proportional hazards model to a time dependent variable which takes on the value 1 if a subject experienced a positive ECG test before a given subject's death time, or the value 0 if no positive ECG test occurred before the given subject's death time. The exponential of the resulting parameter estimate for this variable is the risk ratio resulting from experiencing a positive ECG test. The other parameters used to define this surrogate are the gamma parameters which describe the distribution of the event times. A gamma distribution is chosen for the occurrence times of a positive ECG test based on the observed frequency distribution of the times. However, it would be very difficult to estimate θ, the gamma parameters, due to the existence of left and right censoring on the event times, which would lead to using only a small percent of the data to estimate the parameters. Thus, the integral needed to compute the amount of information on the treatment parameter, γ, will be integrated over the hazard function instead of over θ.
5.4 Computing the Efficiency Ratio
The hazard function takes on three different values, depending on whether the subject started out with a positive ECG test, h(t|θ)=h(t)exp(β) and H(t|θ)=H(t)exp(β); experienced one during the trial, h(t|θ)=h(t)exp(β) and H(t|θ)=H(r) + exp(β)[H(t)-H(r)]; or never had a positive test before dying or leaving the study, h(t|θ)=h(t) and H(t|θ)=H(t). To compute the expected value of a function of the hazard, sum over the subjects and their appropriate hazards. Following the general equation 4.1 for a time dependent surrogate, and letting k=exp(β), n1 be the number of subjects who started with a positive ECG test, n2 the number of subjects who never had a positive ECG test during the study, and n3 the number of subjects who had a positive ECG test sometime during the study ("inbtwns"), with n = n1 + n2 + n3, and letting n(t)_i stand for the number of n_i at a given point in time, t,
E[d²ℓ/dγ²] =

  -1 + { (n1/n) exp[-kH(C)] + (n2/n) exp[-H(C)]
         + (1/n) Σ_inbtwns exp[-(H(r_i) + k(H(C)-H(r_i)))] }

  + 2{ (n1/n) H(C) k exp[-kH(C)] + (n2/n) H(C) exp[-H(C)]
       + (1/n) Σ_inbtwns [H(r_i) + k(H(C)-H(r_i))] exp[-(H(r_i) + k(H(C)-H(r_i)))] }

  + { [ (n1/n) H(C) k exp[-kH(C)] + (n2/n) H(C) exp[-H(C)]
        + (1/n) Σ_inbtwns [H(r_i) + k(H(C)-H(r_i))] exp[-(H(r_i) + k(H(C)-H(r_i)))] ]²
      / [ (n1/n) exp[-kH(C)] + (n2/n) exp[-H(C)]
          + (1/n) Σ_inbtwns exp[-(H(r_i) + k(H(C)-H(r_i)))] ] }

  + ∫_0^C { (n1/n) h(t) k² H(t) exp[-kH(t)] + (n2/n) h(t) H(t) exp[-H(t)]
            + (1/n) Σ_{0<r<t} k h(t) [H(r_i) + k(H(t)-H(r_i))] exp[-(H(r_i) + k(H(t)-H(r_i)))] }
          / { (n1/n) h(t) k exp[-kH(t)] + (n2/n) h(t) exp[-H(t)]
              + (1/n) Σ_{0<r<t} h(t) k exp[-(H(r_i) + k(H(t)-H(r_i)))] } dt
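The three subject-type hazards defined at the start of this section can be sketched under an exponential baseline H(t) = λt; the parameter values in the usage comments are illustrative, not estimates from the trial.

```python
import math

def cum_hazard(t, lam, k, group, r=None):
    """Cumulative hazard H(t|theta) under an exponential baseline
    H(t) = lam*t, with k = exp(beta) the risk ratio of a positive
    ECG test. group is 'pre' (positive at entry), 'never', or
    'during' (first positive test at time r < t)."""
    if group == "pre":
        return k * lam * t
    if group == "never":
        return lam * t
    # positive test at time r: baseline hazard before r, jumped after
    return lam * r + k * (lam * t - lam * r)

def survival(t, lam, k, group, r=None):
    """S(t|theta) = exp(-H(t|theta))."""
    return math.exp(-cum_hazard(t, lam, k, group, r))
```

Summing such terms over the subjects in each group gives the expectations appearing in the expression above.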
To evaluate this equation, the baseline hazard is assumed to follow an exponential density; thus, h(t)=λ and H(t)=λt. This assumption is common for highly censored data. The value of C, the time at which censoring is fixed, is computed for 90% censoring, the percent of censoring experienced in the trial:
S = (n1/n) exp(-kλC) + (n2/n) exp(-λC) + (1/n) Σ_inbtwns exp[-(λr_i + kλC - kλr_i)] = 0.90,
and C is found to be 3350 days. The MLE of λ is found by dividing the number of deaths by the total amount of person-time on study for those people who never experienced a positive exercise test, λ̂ = 0.000027. The expected amount of information can be found using Euler's method of integration, which is applicable when the intervals being integrated over are fairly small. Thus, for the last integral in the expected amount of information,
∫_0^C { (n1/n) k² λ² t exp[-kλt] + (n2/n) λ² t exp[-λt]
        + (1/n) Σ_{0<r<t} kλ [λr_i + k(λt - λr_i)] exp[-(λr_i + k(λt - λr_i))] }
      / { (n1/n) λk exp[-kλt] + (n2/n) λ exp[-λt]
          + (1/n) Σ_{0<r<t} λk exp[-(λr_i + k(λt - λr_i))] } dt,

where

A_i = [n1 k² λ² T exp(-kλT) + n2 λ² T exp(-λT)] / [n1 λk exp(-kλT) + n2 λ exp(-λT)],
nl is the number of subjects who have had a positive ECG test before time ri' n2=n-n1, the
number of subjects who have not had a positive ECG test before time ri and T is the average of
r·1 and r·1- l' A·1 and the other three integrals can be evaluated by summing over the appropriate
hazard functions for the three types of people using any available computer software package.
For 90% censoring and an increasing hazard rate of 1.86, the resulting effeciency is 0.98. For a
gain of 10% or more, either the hazard rate must increase to 2.5 or higher or the percent of
•
censoring must decrease.
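Solving for the fixed censoring time C at a given censoring fraction can be sketched by bisection. This simplifies the in-between group to a single representative jump time r̄ (the dissertation sums over the observed r_i); the group counts and r̄ below are illustrative, not the trial's values.

```python
import math

def censor_fraction(C, lam, k, n1, n2, n3, rbar):
    """Expected fraction still event-free at fixed censoring time C,
    mixing the three subject types (rbar: representative jump time
    for the group who turned positive during the study)."""
    n = n1 + n2 + n3
    s = (n1 * math.exp(-k * lam * C)
         + n2 * math.exp(-lam * C)
         + n3 * math.exp(-(lam * rbar + k * (lam * C - lam * rbar))))
    return s / n

def solve_C(target, lam, k, n1, n2, n3, rbar, lo=0.0, hi=1e6):
    """Bisection for C with censor_fraction(C) = target; the fraction
    is decreasing in C, so bisection converges."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if censor_fraction(mid, lam, k, n1, n2, n3, rbar) > target:
            lo = mid    # too many still event-free: censor later
        else:
            hi = mid
    return 0.5 * (lo + hi)
```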
This chapter illustrates the process one goes through for a preliminary investigation, either to analyze methods to use in future studies or to see if the proposed method is worth using for data that have already been collected. The methods used to estimate u(t) and g3(θ,t) are what is done in practice at the present time, but one should also consider more modern methods like spline fitting, projection pursuit and additive modeling. The LRC-CPPT data set provided an example that did not give large gains in efficiency. Lipid measurements are not the ideal surrogate of survival. Measuring lipids across time did not appear to provide any more information on survival than the baseline measurement. Thus, there is no gain in using the proposed method. In this case, the predictor after randomization was not as good as a predictor at baseline. When such is the case, the proposed method would not be used. Types of data that would work well with the proposed method are cancer studies, where strong surrogates of survival exist and the percent of censoring is lower. For example, a study of leukemia found
the rate of change of BUN (blood urea nitrogen), creatinine, chloride, and sodium to be stronger predictors of survival than their baseline measurements, and a study of lung cancer found performance status and cell type to be strong predictors.

The proposed method is used to analyze variables that are measured after randomization. A preliminary analysis looking at the efficiency for varying models and different amounts of heterogeneity and percent censoring can help an investigator design a study that eliminates measuring variables over time that appear to be weak surrogates for the proposed method, and includes measuring strong surrogates of survival that can be used later to increase the precision in making a treatment comparison or in estimating survival.
CHAPTER 6
SUMMARY AND FURTHER RESEARCH
6.1 Summary
The purpose of this work was to combine information from surrogate measures with the primary outcome to improve the estimation and comparison of survival distributions. This was done by forming a likelihood function from the joint density of survival time, T, and the surrogate measure, X. This is in contrast to the standard methods, which use the marginal density of T and S_T(t) only, or the marginal density of X only, therefore ignoring available information.
Some methods that have been proposed to increase the power available in making treatment comparisons were reviewed in Chapter 1. The question of validity was raised about conclusions deduced from using surrogate endpoints in place of primary endpoints. Other problems mentioned were that the traditional method of adjusting for covariates cannot use internal time dependent covariates, and that the gain from covariate adjustment is minimal when censoring is large. Chapters 1 and 2 discussed alternative ways of incorporating time dependent variables. Only one of the models was able to incorporate continuous time dependent variables, and both Lagakos' model and Cox's model assumed an exponential distribution for survival time. The proposed method provides a way to incorporate information from time dependent covariates that overcomes the other models' weaknesses.
The likelihood used in the proposed method was developed in Chapter 3. It can incorporate fixed or time dependent surrogate measures of the discrete or continuous form. The proposed method uses the joint density of survival time and a surrogate measure, f_{T,X}(t,x), to estimate and test hypotheses concerning f_T(t), the density of survival time. The examples in Chapter 3 revealed a gain in precision in estimating the treatment effect, γ, and S(t), survival, by using the proposed method instead of the standard method. This was done by comparing the expected amount of information on γ from the two methods. If there is a large amount of censoring, measuring the surrogate covariate improves the estimate of survival, S(t). If there is little censoring, it is worthwhile to measure a surrogate to get a more precise treatment comparison. The proposed method is always better than the standard, but how much efficiency is gained depends on what exactly the treatment is affecting: both the conditional hazard of survival time given X and the distribution of X, or survival time given the surrogate measure only, or just the distribution of the surrogate. When treatment affects the conditional distribution of survival time given the surrogate, T|X, or affects both T and X directly, the largest gains are achieved when the values of the surrogate measure are heterogeneous and the percent of censoring is low. When the treatment affects the surrogate only, the largest gains are achieved when the surrogate measure is homogeneous and the percent of censoring is high.
A general model that can be used with any arbitrary hazard and a fixed or time dependent surrogate covariate was developed in Chapter 4. Three pieces of information must be provided in order to compute the efficiency: the model for u(t), the expected value of X(t); the function of the parameters of u(t) on which the hazard depends, g3(θ,t); and the distribution of g3(θ,t) or θ. Assuming a model for u(t) and a function and distribution for g3(θ,t) and θ, the efficiency of the standard method to the proposed was computed for seven different examples. All seven examples revealed that asymptotically the proposed method of using another outcome, a surrogate of survival, along with the primary outcome is always as good as or better than the standard method. The amount of gain in efficiency depended on the percent of censoring and the amount of heterogeneity in the parameters describing the trend of the time dependent covariates. When the percent of censoring is very high (>90%) and the amount of heterogeneity in θ is small, there is little to be gained over methods that ignore effects through the surrogate.
The examples were limited to using one time dependent surrogate measure with one to three parameters, or one discrete time dependent surrogate with one or two parameters. The distributions of the parameters were assumed gamma for all seven of the examples. The gamma distribution was chosen because it describes a positive random variable and incorporates a wide variety of shapes. Two examples, one where the surrogate was a time dependent variable and the other a discrete measure, resulted in the same efficiency. Even though they had different functions for u(t) and g3(θ,t), both examples had a hazard function which followed a gamma distribution. Thus, the efficiency depends not on the form of the time dependent covariate but on the distribution of the hazard function, a function which depends on the parameters describing the trend of the time dependent surrogate. The seven examples indicate that time dependent covariates which have a large effect on the hazard contain more helpful information than just one value of a variable.
An example using data from the Lipid Research Clinics was described in Chapter 5. These data were used to illustrate how to choose a surrogate measure, how to compute u(t) and g3(θ,t), and how to find the distributions of θ or g3(θ,t). This information was then applied to the general equation 4.1 to compute the efficiency of the treatment parameter. The results using the lipids data reveal that the proposed method offers little gain in efficiency with this data set. The proposed method is aimed at clinical trials which have surrogates of survival that are very significant predictors of survival (increasing the hazard by at least 3-fold) and have a large amount of heterogeneity. The proposed method works better as the percent of censoring decreases, but a gain of 10% or more in efficiency can be achieved even with 90% censoring in cases where the surrogate has a large effect on the hazard of dying (risk ratio of 10 or higher). The lipids data did not offer a highly significant surrogate of survival. Cholesterol did not turn out to be a good surrogate of survival, for it was not significantly related to survival and the values were not very heterogeneous. This may be because so many other factors are related to cholesterol and survival, making it hard to get a clear one-to-one relationship. Past studies have shown that cholesterol explains only about 50% of the variation in CHD. The example using ECG tests was very applicable, but the discrete event of the occurrence of a
positive exercise test did not increase the hazard enough to make the proposed method worthwhile. As mentioned earlier, the proposed method is aimed at data sets which have a strong surrogate of survival that is very heterogeneous.
This dissertation adds three more survival models to the original two that are commonly used today. Now an investigator has five models from which to choose to apply to his data: i) the marginal density of survival; ii) the marginal density of the surrogate; iii) the joint density of survival, T, and the surrogate, X, with treatment affecting survival given the surrogate directly; iv) the joint density of T and X with treatment affecting the surrogate only; and v) the joint density of T and X with treatment affecting both survival and the surrogate directly. Chapters 3 and 4 show how to quantify the advantages and disadvantages of each of the three treatment models in order to decide which one to use. For the LRC-CPPT trial the investigator would choose the marginal density of the surrogate, or the joint density with treatment affecting the surrogate, as the model. But in a study like leukemia, where a strong surrogate is available, a model using the joint density of T and X would be preferable. To decide which of the three joint density models would be best, the investigator needs to consider whether treatment is affecting T|X only, X only, or both, and how strong the effect is. If nothing is known about the treatment, the best model to use would be the one that allows treatment to affect both T and X. But if one knew that treatment mainly affected X and the percent of censoring was high, the model with treatment affecting X only should be used. The joint density model that allows treatment to affect both T and X produces the smallest gains in efficiency, since each of the individual treatment effects prefers opposite values for a and the percent of censoring, S. As Chapter 3 illustrated, the model allowing a treatment effect on T|X only produces the largest gains when a is small and S is small, but the model allowing treatment to affect X only sees the largest gains in efficiency when both a and S are large. Thus, one must consider carefully where the effects of treatment may be, and how large the amount of heterogeneity and the percent of censoring are, in one's trial.
The proposed method is able to handle fixed or time dependent covariates of the continuous or discrete type. In order to find the estimate of treatment and its variance, the form of the conditional hazard, h(t|x), must be known. All results from Chapters 3 through 5 indicate that it is beneficial to use data from another outcome, a surrogate measure of survival, to help estimate survival. Using the proposed method either improves the precision of the estimates or allows for a smaller sample size while maintaining the same amount of precision obtained using the standard method. For example, an efficiency ratio of 0.60 indicates that the variance of the treatment parameter is reduced by 40% when the proposed method is used instead of the standard, or that the variance remains the same but the sample size is reduced by 40%. These advantages are a direct result of making full use of the data at hand.
6.2 Further Research
In Chapter 3, the efficiency of the three treatment models using the joint density of T and X was derived assuming the distribution of the hazard can be described by a gamma density, which is a very applicable distribution that covers many different shapes; further work could be done to expand the results to other distributions. Equation 4.1 expanded the results of the model where treatment affected T|X only to incorporate any arbitrary specified hazard that may depend on time. A general model also needs to be developed for the situation where treatment affects both T|X and X directly.

All of the above results assumed that the models chosen for u(t) and g3(θ,t) were true models. But when the true model is not known and some weaker, less correct model is used, what is the effect on the efficiency? Further research is needed to address this question of mismodeling. Another extension of this research could look at more modern methods of modeling, like spline smoothing, projection pursuit and additive modeling, as possible ways to improve the estimates of u(t) and h(t). In addition, the above results have all been derived asymptotically. The next step is to look at the efficiency of the proposed method for small sample sizes.
At the present time, there is no computer package to maximize the specific joint likelihood equations and produce the proposed maximum likelihood estimates. It is possible to specify a conditional and marginal density in a general software package and compute the estimates in this way, but this is not a user friendly approach. Thus, software needs to be developed to provide a user friendly procedure that can apply these joint likelihoods.
REFERENCES
Andersen, P.K. (1986). Time-dependent covariates and Markov processes. Modern Statistical
Methods in Chronic Disease Epidemiology. Wiley, New York.
Beach, M.L. and Meier, P. (1989). Choosing covariates in the analysis of clinical trials.
Controlled Clinical Trials 10: 161S - 175S.
Bickel, P.J. and Doksum, K.A. (1977). Mathematical Statistics: Basic Ideas and Selected
Topics. Holden-Day, Oakland.
Chiang, C.L. (1980). An Introduction to Stochastic Processes and Their Applications. Krieger,
New York.
Cox, D.R. (1972). Regression models and life-tables. J. R. Stat. Soc. B 34, 187-202.
Cox, D.R. (1983). A remark on censoring and surrogate response variables. J. R. Stat. Soc.
B 45, 391 - 393.
Cox, D.R. and D. Oakes. (1984). Analysis of Survival Data. Chapman and Hall, New York.
Ellenberg, S.S. and Hamilton, J.M. (1989). Surrogate endpoints in clinical trials: Cancer.
Statistics in Medicine Vol. 8, No. 5, 405-413.
Friedman, J.H. (1991). Multivariate Adaptive Regression Splines. The Annals of Statistics
Vol. 19, No.1, 1-141.
Freund, J.E. (1961). A bivariate extension of the exponential distribution. J. Am. Stat.
Assoc. 56, 971-977.
Friedman, L.M., Furberg, C.D. and DeMets, D.L. (1985). Fundamentals of Clinical Trials
(Second Edition). PSG, Littleton.
Gail, M.H., Wieand, S. and Piantadosi, S. (1984). Biased estimates of treatment effect in
randomized experiments with nonlinear regressions and omitted covariates. Biometrika 71,
431-444.
Greenwood, M. (1926). The natural duration of cancer. Reports on Public Health and
Medical Subjects, 33, London: Her Majesty's Stationery Office, 1-26.
Herson, J. (1989). The use of surrogate endpoints in clinical trials (An introduction to a
series of four papers). Statistics in Medicine Vol. 8, No. 5, 403-404.
Kalbfleisch, J.D. and Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data.
Wiley, New York.
Kaplan, E.L. and Meier, P. (1958). Nonparametric estimation from incomplete observations.
J. Amer. Stat. Assoc. 53, 457-481.
Lagakos, S.W. (1976). A stochastic model for censored-survival data in the presence of an
auxiliary variable. Biometrics 32, 551-559.
Lindgren, B.W. (1976). Statistical Theory (Third edition). Macmillan, New York.
Miller, R.G. (1981). Survival Analysis. Wiley, New York.
Mood, A.M., Graybill, F.A. and Boes, D.C. (1974). Introduction to the Theory of Statistics
(Third Edition). McGraw-Hill, New York.
Morgan, T.M. and Elashoff, R.M. (1986). Effect of censoring on adjusting for covariates in
comparisons of survival times. Communications in Statistics, Part A - Theory and
Methods 15, 1837-1854.
Pocock, S.J. (1983). Clinical Trials: A Practical Approach. Wiley, New York.
Prentice, R.L. (1989). Surrogate endpoints in clinical trials: Definition and operational
criteria. Statistics in Medicine. Vol. 8, No.5, 431-440.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman
and Hall, New York.
Taylor, H.M. and Karlin, S. (1984). An Introduction to Stochastic Modeling. Academic Press,
New York.
Weiss, G.B. and Zelen, M. (1965). A semi-Markov model for clinical trials. J. Appl. Prob.
2,269-285.
Wittes, J., Lakatos, E. and Probstfield, J. (1989). Surrogate endpoints in clinical trials:
Cardiovascular diseases. Statistics in Medicine, Vol. 8, No. 5, 415-425.