AREA-UNDER-THE-CURVE ANALYSIS AND OTHER ANALYSIS
STRATEGIES FOR REPEATED MEASURES CLINICAL TRIALS
by
Edward C. Bryant
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1453
December 1983
AREA-UNDER-THE-CURVE ANALYSIS
AND OTHER ANALYSIS STRATEGIES FOR
REPEATED MEASURES CLINICAL TRIALS
by
Edward Carroll Bryant
A dissertation submitted to the faculty
of the University of North Carolina at
Chapel Hill in partial fulfillment of
the requirements for the degree of
Doctor of Public Health in the Department
of Biostatistics.
Chapel Hill
1983
Approved by:
Advisor
Reader
Reader
ABSTRACT
EDWARD C. BRYANT.
Area-Under-the-Curve Analysis and other analysis
strategies for repeated measures clinical trials. (Under the direction
of DENNIS B. GILLINGS)
In the pharmaceutical clinical trial setting, the repeated
measures design is often utilized to evaluate treatment effectiveness
when response to treatment manifests itself over time. For each
study subject, a response vector $y' = (y_1, \ldots, y_p)$ is observed;
associated with $y$ is the vector $t' = (t_1, \ldots, t_p)$ which
identifies the times of observation.
The use of area-under-the-time-response curve (AUC) as a measure
of cumulative (or equivalently, mean) response is examined. The
ANOVA model using the AUC parameter is specified. Methods to estimate
AUC, such as least-squares polynomial curve estimation and methods
based on numerical integration techniques, are examined, and the
errors of the estimation procedures are compared. The use of AUC
in the treatment-by-time interaction situation is discussed.
Other analysis methods such as separate univariate analyses,
mixed model analysis, polynomial growth curve analysis, piecewise
growth curve analysis, and Zerbe-Walker test analysis are reviewed
and compared to AUC analysis with respect to assumptions, power,
and applicability to longitudinal pharmaceutical clinical trial
data.
Three clinical trial data sets are analyzed to illustrate
the various approaches to analysis.
Considerations in selecting
an analysis method, such as study objectives, the use of knowledge
regarding the response phenomena under study to motivate a modeling
approach, the presence of missing data, and the test power, are
discussed.
AUC analysis is seen to address the question of average response
to treatment over the period of observation.
Since it involves
integration of response function f(t) over time, the AUC parameter
cannot be used when interest is in describing the nature of f(t)
or in testing for differences in response curve shape among treatments.
Growth curve analysis is naturally directed at this and provides
a powerful test of treatment differences.
However, problems of
interpretation of the polynomial coefficients can result when
high-order models are required to give an adequate fit to the data and
when the model is not suggested by theory.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
Chapter
1. INTRODUCTION AND REVIEW OF THE LITERATURE
   1.1 Statement of the Problem
   1.2 Review of the Literature
       1.2.1 Separate Univariate Analyses
       1.2.2 Endpoint Analysis
       1.2.3 Univariate Mixed Model ANOVA
             a. Traditional Procedure
             b. Greenhouse-Geisser Procedure
             c. ε-Corrected Approach
             d. Conditions for Exact F-Ratios
             e. Testing for Circularity
       1.2.4 Multivariate Analysis of Variance Procedures
             a. Profile Analysis
             b. Growth Curve Analysis
       1.2.5 Summary Univariate Procedures
             a. Area-Under-the-Curve Analysis
             b. Zerbe-Walker Test
   1.3 Comparison of Analysis Methods
       1.3.1 Mixed Model Approaches
       1.3.2 MANOVA versus Mixed Model
       1.3.3 Multivariate Procedures versus Combined Univariate Tests
   1.4 Research Proposal
2. AREA-UNDER-THE-CURVE ANALYSIS
   2.1 Introduction
   2.2 Statistical Model
   2.3 Methods of Estimation
       2.3.1 AUC Approximations Which Are Linear Combinations of Observed Responses
             a. Rectangular Rule Approximation
             b. Trapezoidal Rule Approximation
             c. Interpolating Polynomials of Degree Greater than One
       2.3.2 Least Squares Polynomial Approximation
3. WEIGHTED AUC ANALYSIS
   3.1 Introduction
   3.2 Experimental Issues
   3.3 Development of Weighting Function Based on Expected Response Function
       3.3.1 Functional Form Known
       3.3.2 Functional Form Unknown
   3.4 The Weighted AUC Parameter
4. POWER OF THE AUC AND ZERBE-WALKER F-TESTS OF TREATMENT MAIN EFFECTS
5. PIECEWISE GROWTH CURVE MODELS
   5.1 Introduction
   5.2 Statistical Model
   5.3 Analysis Considerations
6. ANALYSIS OF CLINICAL TRIAL DATA SETS
   6.1 Introduction
   6.2 Study 1: Active vs. Placebo in the Treatment of Rheumatoid Arthritis
   6.3 Study 2: Active vs. Standard in the Treatment of Rheumatoid Arthritis
   6.4 Study 3: Active vs. Placebo in the Treatment of Depression
   6.5 Comparison of Analysis Results
7. ANALYSIS CONSIDERATIONS FOR REPEATED MEASURES DATA
   7.1 Study Objectives
   7.2 Knowledge of the Response Phenomena
   7.3 Properties of the Statistical Model
   7.4 Characteristics of the Observed Data
8. SUGGESTIONS FOR FURTHER RESEARCH
REFERENCES
ACKNOWLEDGMENTS
I would like to express my appreciation for the advice and
unfailing support given by my adviser, Dr. Dennis Gillings. I would
also like to thank committee members Drs. Craig Turnbull, Jonathan
Davidson, Gary Koch, and Bert Kaplan, for their comments and helpful
suggestions. I especially thank Dr. Davidson for making available
the depression clinical trial data used in the illustrative analyses.
I also acknowledge the financial support provided by the Center
for Epidemiologic Studies of the NIMH (Grant No. 2T32MH15131) that
enabled me to pursue this research. I also express my appreciation
for the computer funds provided by Hoechst-Roussel Pharmaceuticals,
Inc. in addition to the two arthritis clinical trial data sets.
To my wife, Patricia, I give special thanks and eternal love
for her devotion, encouragement, and sacrifice during this period
in our lives. I wish also to thank my parents, Earl and Dorris
Bryant, for their love and support.
Finally, I would like to express my gratitude to the faculty
and students of the Department of Biostatistics who have greatly
enriched my professional and personal life. The continuing advice
and friendship of Dr. Craig Turnbull during my studies has been
especially appreciated.
LIST OF TABLES
Table 1.1   Analysis of Variance Table for Zerbe-Walker Test Statistic, One-way Layout
Table 2.1   Analysis of Variance Table for AUC Parameter, One-way Layout
Table 2.2   Analysis of Variance Table for AUC Parameter, Balanced Fixed Effect Two-way Layout
Table 2.3   ANOVA Table Comparing AUC (Linear Interpolation Estimate) and Mixed Model ANOVA for the Test of Treatment Main Effects
Table 6.1   Study 1 Average Number of Painful Joints (NPAIN) by Treatment Group and Investigator
Table 6.2   Study 1 Mixed Model Analysis Results for NPAIN
Table 6.3   Study 1 Growth Curve Analysis Parameter Correlation Matrix
Table 6.4   Study 1 Growth Curve Analysis Results, Main Effects Third-Order Polynomial Model
Table 6.5   Study 1 AUC Analysis and Zerbe-Walker Test Results for NPAIN
Table 6.6   Study 2 Average Number of Painful Joints (NPAIN) by Treatment Group and Investigator
Table 6.7   Study 2 Univariate Analysis Results for Each Observation Time Point
Table 6.8   Study 2 Mixed Model Analysis Results for NPAIN
Table 6.9   Study 2 Growth Curve Analysis Parameter Correlation Matrix
Table 6.10  Study 2 Growth Curve Analysis Results, Fifth-Order Polynomial Model
Table 6.11  Study 2 AUC Analysis and Zerbe-Walker Test Results
Table 6.12  Study 3 Average Hamilton Depression (HAMTOT) Scores by Treatment Group
Table 6.13  Study 3 Mixed Model Analysis Results
Table 6.14  Study 3 Growth Curve Analysis Parameter Correlation Matrix
Table 6.15  Study 3 Estimates of Natural Polynomial Coefficients for Third-Order Model
Table 6.16  Study 3 AUC and Zerbe-Walker Test Results
Table 6.17  Study 3 Piecewise Growth Curve Analysis, Matrix of Parameter Correlation Coefficients
Table 6.18  Study 3 Piecewise Growth Curve Analysis Results
Table 6.19  Summary of Analysis Results for Clinical Trial Data Sets
Table 7.1   Summary Comparison of AUC and Other Methods of Analysis for Repeated Measures Data
LIST OF FIGURES
Figure 1.1  Data Array from the Repeated Measures Treatment-by-Occasions Design
Figure 1.2  Hypothetical Mean Time-Response Curves for Placebo and Active Treatments
Figure 1.3  Hypothetical Average Response to Treatment j.
Figure 1.4  Time-Response Curve of Figure 1.3 Showing Average Response f u '
Figure 1.5  Illustration of Rao's Paradox for Bivariate Case when -0.5.
Figure 2.1  Hypothetical Response Curve to Treatment j.
Figure 2.2  Time Response Curve of Figure 2.1 Showing Average Response f u '
Figure 2.3  Approximation of AUC on [t0, t1] Using the Rectangular Rule.
Figure 2.4  Approximation of AUC on [t0, t1] Using the Trapezoidal Rule.
Figure 2.5  Response Data Showing Estimated AUC over [t1, tp] as Sum of Subinterval AUC Estimates.
Figure 2.6  Linear Interpolation AUC Estimate as a Function of Estimated Average Interval Responses.
Figure 3.1  Three Possible Response Patterns for a Longitudinal Clinical Trial of Test Versus Standard Treatment
Figure 3.2  Hypothetical Serum Blood Level Time-Response Curve of a (Single-Dose) Standard Drug.
Figure 6.1  Study 1 Average NPAIN Score Time-Response Profile, by Treatment Group.
Figure 6.2  Study 1 Goodness-of-fit p-values for Polynomial Growth Curve Main Effects Model of a Given Order.
Figure 6.3  Study 2 Average NPAIN Score Time-Response Profile by Treatment Group.
Figure 6.4  Study 3 Plot of HAMTOT Means by Treatment Group.
Figure 6.5  Study 3 Goodness-of-fit p-values for Polynomial Growth Curves of Given Order.
CHAPTER 1
INTRODUCTION AND
REVIEW OF THE LITERATURE
1.1 STATEMENT OF THE PROBLEM
In many experimental situations, interest lies in examining
response to some treatment or intervention where the response
manifests itself over time.
In the pharmaceutical clinical trial
setting, in particular, evaluation of drug effectiveness often
involves the assessment of response data collected over a period of
time following initiation of drug therapy.
Decisions regarding the
efficacy of one treatment relative to another are based on comparing
response along a time continuum.
A common research design that has been used to examine response-over-time phenomena is the longitudinal repeated measures design.
The
longitudinal study is characterized by measurements taken on the same
unit of observation (subject) over a set of occasions.
Typically,
subjects are assigned on a random basis to one of g different groups.
Following initiation of treatment, responses to treatment are observed
at p pre-specified time points of interest to the investigator.
For each study subject, a vector of responses is generated. For the
ith subject from the jth treatment group, treatment responses can be
denoted by the vector $y_{ij}' = (y_{ij1}, y_{ij2}, \ldots, y_{ijp})$,
$i = 1, \ldots, n_j$; $j = 1, \ldots, g$. Associated with the response
vector $y_{ij}$ is another vector $t_{ij}$, where
$t_{ij}' = (t_{ij1}, t_{ij2}, \ldots, t_{ijp})$, which denotes the times of
observation following initiation of treatment. Most often the times
of observation are the same for all N subjects so that
$t_{ij}' = (t_1, t_2, \ldots, t_p)$ for all i, j. The response data for
N subjects from a treatment-by-occasions trial can be expressed in an
N x p array as shown in Figure 1.1.
The treatment-by-occasions design has been labeled by Campbell
and Stanley (1966) as the Multiple Time-Series design and in the
psychological literature as the Type I mixed design (Lindquist, 1953).
By whatever label, this design represents one of the most widely used
research designs in the area of clinical trial research and will
occupy the principal focus of this dissertation.
Although multiple
characteristics (e.g., diastolic and systolic blood pressures, blood
chemistry profile) may be measured at each time point, attention is
restricted here to the analysis of one characteristic measured
periodically over time.
In addition, response data is assumed to be
continuous and complete (i.e., no missing data).
Just how to go about analyzing the information generated by a
repeated measures study has long been a subject of discussion among
statisticians.
Selection of an appropriate statistical approach to
analysis is determined by a variety of considerations, including the
experimental goals, the nature of the phenomena under study, validity
of statistical model assumptions for the observed data, and
characteristics of the completed experiment such as the occurrence of
dropouts and missing data.

FIGURE 1.1
DATA ARRAY FROM THE REPEATED MEASURES
TREATMENT-BY-OCCASIONS DESIGN.

Treatment                                        Occasion                                   All p
Group      Subject                1            2            ...  k            ...  p             Occasions

1          1                      y_{111}      y_{112}      ...  y_{11k}      ...  y_{11p}
           :                      :            :                 :                 :
           n_1                    y_{n_1 11}   y_{n_1 12}   ...  y_{n_1 1k}   ...  y_{n_1 1p}
           Group 1 Means          ȳ_{.11}      ȳ_{.12}      ...  ȳ_{.1k}      ...  ȳ_{.1p}       ȳ_{.1.}

:          :                      :            :                 :                 :

j          1                      y_{1j1}      y_{1j2}      ...  y_{1jk}      ...  y_{1jp}
           :                      :            :                 :                 :
           i                      y_{ij1}      y_{ij2}      ...  y_{ijk}      ...  y_{ijp}
           :                      :            :                 :                 :
           n_j                    y_{n_j j1}   y_{n_j j2}   ...  y_{n_j jk}   ...  y_{n_j jp}
           Group j Means          ȳ_{.j1}      ȳ_{.j2}      ...  ȳ_{.jk}      ...  ȳ_{.jp}       ȳ_{.j.}

:          :                      :            :                 :                 :

g          1                      y_{1g1}      y_{1g2}      ...  y_{1gk}      ...  y_{1gp}
           :                      :            :                 :                 :
           n_g                    y_{n_g g1}   y_{n_g g2}   ...  y_{n_g gk}   ...  y_{n_g gp}
           Group g Means          ȳ_{.g1}      ȳ_{.g2}      ...  ȳ_{.gk}      ...  ȳ_{.gp}       ȳ_{.g.}

           Combined Group Means   ȳ_{..1}      ȳ_{..2}      ...  ȳ_{..k}      ...  ȳ_{..p}       ȳ_{...}

y_{ijk} = kth response of ith subject from jth group;
i = 1, ..., n_j; j = 1, ..., g; k = 1, ..., p.
This dissertation reviews statistical methods for repeated
measures data and discusses their application to clinical trial data.
Attention is focused on applications to the pharmaceutical clinical
trial setting where the form of the time-response curve is often
unknown and where a main concern is assessing, in some overall sense,
the efficacy of an investigational treatment relative to some
reference treatment.
In this chapter, various approaches to repeated measures
analysis that have appeared in the literature and which have been used
for longitudinal designs are reviewed and compared. Two additional
analysis strategies are introduced. The first is an intuitively
appealing univariate approach to assessing treatment main effects
using an area-under-the-time-effect curve (AUC) approach. The second
is the use of a piecewise growth curve analysis approach with which to
model data whose functional form changes during the period of
observation.
1.2 REVIEW OF THE LITERATURE
Early treatment of the repeated measures analysis problem often
consisted of endpoint analysis, i.e. looking at the first and last
observation from a subject's vector of responses and ignoring the rest
(Wishart, 1938).
Another strategy was to analyze the data using
classical regression techniques, thus ignoring the correlated
structure of the response variables and essentially treating the data
as a repeated cross-sectional design. Tests based on this procedure
have been shown (McGregor, 1960; Siddiqui, 1958; Watson, 1955; Elston
and Grizzle, 1962) to be too liberal, resulting in far too many
rejections of the null hypothesis. Wishart (1938) was the first to
develop repeated measures statistical techniques that incorporated the
correlated structure of the data and utilized all the information on
subject ij contained in $y_{ij}$. His method of fitting polynomial curves
to the data was essentially the basis for what is now called growth
curve analysis.
In this section,
the statistical literature on
approaches to the analysis of longitudinal data is reviewed.
Models
for analysis are either univariate or multivariate depending on the
assumptions the investigator is willing to make about the data.
The
first approach to be discussed is the use of separate univariate
analyses.
1.2.1 Separate Univariate Analyses
For treatment responses measured at p different observation
times, separate univariate analyses can be performed for each
observation time to assess significant treatment differences.
In
fact, this approach is often followed in the analysis of
pharmaceutical clinical trial efficacy data for descriptive purposes.
From the perspective of assessing the overall superiority of one
treatment relative to other treatments, however, the question of how
to combine the results is problematic.
In the situation where the separate tests of significance are
independent, Fisher's method for combining p-values for one-sided
tests of location (Fisher, 1950) or Monti's procedure (Monti, 1975) of
combining values of univariate test statistics might be used.
However, in the repeated measures study, the data from different
occasions are correlated and the above procedures are inappropriate.
For the case when the separate tests are not independent, Brown (1975)
presented an approximate test for one-sided tests under the assumption
of multivariate normality. His method consists of calculating the
first two sample moments of $X^2$ and equating these to those of a
chi-square distribution to derive an approximate distribution for $X^2$.
That the procedure is limited to one-sided tests restricts its use to
the case of two treatment groups.
A different procedure that can be used to combine non-independent
tests of significance of treatment effects is the use of multiple
comparison procedures.
One widely used multiple comparison procedure
is based on the Bonferroni inequality (e.g., see Neter and Wasserman,
1974).
To test the null hypothesis that no treatment differences
exist at any of the p observation time points, the test at each
occasion is evaluated at critical level $\alpha/p$, where $\alpha$ is the
overall Type I error level the investigator is willing to risk.
Subsequent to a significant overall test at occasion k, treatment
group contrasts can be tested employing either the Bonferroni or
Scheffé multiple comparison procedures. This approach has the
advantage over the procedure due to Brown in that two-sided tests and
multi-treatment situations can be considered.
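
As a minimal sketch of the Bonferroni screening step described above, the following Python fragment compares each per-occasion p-value with $\alpha/p$. The function name and the p-values are hypothetical and serve only as an illustration; the p-values are assumed to come from whatever univariate test was run at each occasion.

```python
def bonferroni_by_occasion(p_values, alpha=0.05):
    """Screen per-occasion tests at the Bonferroni-adjusted level alpha/p.

    p_values : one p-value per observation time (hypothetical inputs,
               e.g. from a separate univariate test at each occasion).
    alpha    : overall Type I error level the investigator accepts.
    """
    p = len(p_values)
    if p == 0:
        raise ValueError("need at least one occasion")
    level = alpha / p                       # per-occasion level alpha/p
    flagged = [pv <= level for pv in p_values]
    reject_overall = any(flagged)           # some occasion shows a difference
    return level, flagged, reject_overall


# Made-up p-values for p = 4 occasions.
level, flagged, reject_overall = bonferroni_by_occasion([0.30, 0.012, 0.04, 0.009])
```

Only occasions whose p-value falls at or below `level` would then be followed up with treatment group contrasts.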
1.2.2 Endpoint Analysis
The repeated measures design is often used in pharmaceutical
clinical research to collect safety data (e.g., blood chemistry and
hematology, vital signs, incidence of side effects) in order to
closely monitor and assess drug toxicity. While efficacy data is
usually collected at each subject visit, primary interest with respect
to treatment efficacy may be directed at patient response at the end
of the trial. That is, for efficacy response vector
$y_{ij}' = (y_{ij1}, \ldots, y_{ijp})$, it is observation $y_{ijp}$ that is
of primary interest. For example, in the treatment of depression, the
success of drug therapy may best be assessed only after a sufficient
period of time has passed for the establishment of tolerable drug
dosage levels. After this period of time, the beneficial effects of
the drug in conjunction with psychotherapy can be realized and might be
expected to be monotonically increasing until time $t_p$.
When there are no early dropouts and interest is directed at
patient response at the end of the observation period. the problem of
assessing treatment effects for the repeated measures trial is reduced
to a single univariate analysis of $y_{ijp}$. The occurrence of early
dropouts complicates the picture in that $y_{ijp}$ is not observed for some
subjects.
When dropouts occur. a univariate analysis of the last
recorded value is sometimes performed.
This is called endpoint
analysis (Gould, 1980).
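
The extraction of the last recorded value can be sketched as follows. This is an illustration only: the function name, the use of `None` to mark a visit with no recorded response, and the example vectors are assumptions, not part of the original analyses.

```python
def endpoint_value(responses):
    """Return the last recorded response in a subject's response vector.

    responses : list of length p ordered by occasion; None marks a visit
                with no recorded value (a convention assumed here).
    Returns None if the subject has no recorded response at all.
    """
    for value in reversed(responses):
        if value is not None:
            return value
    return None


# Made-up response vectors for three subjects, p = 4 occasions.
subjects = {
    "completer": [12, 9, 7, 5],
    "early_dropout": [12, 10, None, None],
    "no_data": [None, None, None, None],
}
endpoints = {name: endpoint_value(y) for name, y in subjects.items()}
```

The resulting endpoint values, one per subject, would then enter a single univariate analysis.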
The use of endpoint analysis utilizes information from each
subject randomized to treatment and avoids the problem of
non-comparable samples. An underlying assumption of endpoint analysis is
that the last available response of a patient who withdrew from the
study is the same as would have been observed if he stayed in the
trial. The validity of this assumption is difficult if not impossible
to test.
Gould (1980) proposed a new approach to using endpoint analysis
which incorporates reason-for-early-termination in deriving a non-parametric endpoint score.
At the time of early termination, one of
the following outcomes is determined:
1) Withdrawn, cured
2) Withdrawn, intolerant of drug
3) Withdrawn, drug does not work
4) Withdrawn, reasons unrelated to treatment.
Given a range of observed responses at time p, denoted by $[R_L, R_U]$, and
assuming higher scores reflect greater treatment efficacy, a subject
who terminates early is given the following score according to his
reason for dropout:
1) If cured, give score $S_1 > R_U$.
2) If intolerant of drug, score $S_2 < R_L$.
3) If drug does not work, score $S_3$ ~ $S_2$.
4) If patient withdrew for reasons unrelated to treatment,
do not assign a score.
It is assumed here that subjects who drop out are in contact with
study staff at the time of withdrawal.
Another approach using endpoint analysis examines the effect of
early dropouts on response.
In this approach, a dummy variable which
identifies whether the endpoint score is an early termination score or
not is included in the ANOVA model.
Also included is a treatment-by-dummy variable interaction term.
The presence of significant
interaction identifies a different direction of association of
response level and early termination across the treatment groups.
For
example, the response of early dropouts in the placebo group may be
lower than the average total-group response level while the response
of early dropouts in the active treatment group may be higher
(reflecting a "cure" and no further need for treatment).
Using this
approach, the nature of the early dropout phenomena can be examined
and adjusted estimates of treatment group differences can be obtained.
1.2.3 Univariate Mixed Model Analysis of Variance
a. Traditional Procedure
Consider the data array presented in Figure 1.1 for the repeated
measures treatment-by-occasions design.
A three factor mixed model
analysis of variance can be constructed with treatments and occasions
as completely crossed fixed factors and subjects as a random factor
nested within treatment groups.
Letting $y_{ijk}$ represent the kth
measurement from the ith subject in the jth treatment group, the mixed
model ANOVA can be written (Winer, 1971)
$$y_{ijk} = \mu + \alpha_j + \pi_{i(j)} + \beta_k + \alpha\beta_{jk} + \pi\beta_{ik(j)} + e_{ijk} \tag{1.1}$$
with side conditions
$$\sum_j \alpha_j = \sum_k \beta_k = \sum_j \alpha\beta_{jk} = \sum_k \alpha\beta_{jk} = \sum_k \pi\beta_{ik(j)} = 0;$$
where $\mu$, $\alpha_j$, and $\beta_k$ are the overall population mean, jth treatment
effect, and kth occasion effect, respectively; $\pi_{i(j)}$ is a random
effect associated with the ith subject in the jth treatment group;
$\alpha\beta_{jk}$ is the interaction effect for the jth treatment group and the kth
occasion; $\pi\beta_{ik(j)}$ is the interaction effect of the ith subject and
kth occasion within the jth treatment group; and $e_{ijk}$ is a random
error term. Unless multiple measurements are taken for each subject
at each data collection point, $\pi\beta_{ik(j)}$ and $e_{ijk}$ cannot be
independently estimated.
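The random error components can be grouped and expressed as
$$m_{ijk} = \pi_{i(j)} + \pi\beta_{ik(j)} + e_{ijk}. \tag{1.2}$$
Correspondingly, the expected value of observation $y_{ijk}$ is
$$E(y_{ijk}) = \mu + \alpha_j + \beta_k + \alpha\beta_{jk}. \tag{1.3}$$
Traditional assumptions about the $m_{ijk}$ (Eisenhart, 1947; Scheffé, 1959;
Greenhouse and Geisser, 1959) are that the p error terms $m_{ij1}, \ldots, m_{ijp}$
have a multivariate normal distribution with expectation zero and the
following covariance structure
$$\mathrm{Cov}(m_{ijk}, m_{i'j'k'}) = \begin{cases} \sigma^2 & \text{if } i = i',\ j = j',\ k = k' \\ \rho\sigma^2 & \text{if } i = i',\ j = j',\ k \ne k' \\ 0 & \text{otherwise.} \end{cases}$$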
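The above assumption for the covariance structure can be rewritten in
matrix notation as
$$\Sigma_j = \sigma^2 \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix} = \sigma^2\left[(1 - \rho) I_p + \rho\,\mathbf{j}\mathbf{j}'\right] \tag{1.4}$$
for all treatment groups j; $j = 1, \ldots, g$,
where $\mathbf{j}$ is a p x 1 vector of 1's.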
A matrix of the form in (1.4) is said to have the property of compound
symmetry or uniformity (Geisser, 1963).
The term compound symmetry
refers to the invariance of the covariance structure under all
permutations of the $m_{ij1}, \ldots, m_{ijp}$ (Koch, 1969). The presence of
The presence of
compound symmetry results in exact F-ratios for within-subject factors
(i.e., condition-by-treatment interaction and condition main effects).
Contingent on a non-significant interaction, treatment main effects are
then tested, assuming equal among-group variances (i.e.,
$\mathbf{j}'\Sigma_1\mathbf{j} = \cdots = \mathbf{j}'\Sigma_g\mathbf{j}$).
When considering response-over-time data, the uniform covariance
structure may not occur in many "real life" situations.
This is
because observations made on the same individual at two closely spaced
time points tend to be more highly correlated than two observations
made at longer intervals.
For example, blood pressure readings made
on the same individual one minute apart are apt to have closer values
than readings made one day apart.
For longitudinal data with an
arbitrary covariance structure the question then is whether some other
univariate analysis of variance approach to analysis is
appropriate.
b. Greenhouse-Geisser Procedure
Box (1954) derived the theoretical background for approximate
univariate ANOVA procedures for data for one group with a general
covariance structure.
Geisser and Greenhouse (1958) extended results to
the situation of several treatment groups.
The test of the hypothesis
of no treatment-by-occasion interaction in the situation of general
covariance structure was found to be the same as the usual F-test for
the uniform situation, except with modified degrees of freedom
$F((p-1)(g-1)\epsilon,\ (p-1)(N-g)\epsilon)$, where the correction factor
$\epsilon$ is defined by
$$\epsilon = \frac{p^2 (\bar\sigma_{tt} - \bar\sigma_{..})^2}{(p-1)\left(\sum_t \sum_s \sigma_{ts}^2 - 2p \sum_t \bar\sigma_{t.}^2 + p^2 \bar\sigma_{..}^2\right)} \tag{1.5}$$
where
$\sigma_{ts}$ are the elements of $\Sigma$,
$\bar\sigma_{tt}$ is the mean of the diagonal terms in $\Sigma$,
$\bar\sigma_{t.}$ is the mean of the tth row,
$\bar\sigma_{..}$ is the grand mean of terms in $\Sigma$.
In similar fashion, degrees of freedom for an approximate F-test of no
condition main effects are $F((p-1)\epsilon,\ (p-1)(N-g)\epsilon)$.
The F-test of the
hypothesis of no treatment effects is distributed exactly as F(g-1,N-g),
assuming normality and equal among-group variances.
The correction factor can be expressed in matrix terms as
$$\epsilon = \frac{(\operatorname{tr} U'\Sigma U)^2}{t \cdot \operatorname{tr}(U'\Sigma U)^2} \tag{1.6}$$
where U is a p x t orthonormal contrast matrix of interest of rank t
and 'tr' is the trace operator (Rogan et al., 1979). Box (1954)
showed $\epsilon$ is such that $(p-1)^{-1} \le \epsilon \le 1$. When the population
covariance matrix is compound symmetric then $\epsilon$ equals unity.
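
The behavior of the correction factor can be checked numerically. The sketch below evaluates the scalar form (1.5) for a p x p covariance matrix held as a list of rows; the helper names and the two example matrices (one compound symmetric, one with correlation decaying with the lag between occasions) are illustrative assumptions.

```python
def box_epsilon(sigma):
    """Correction factor of equation (1.5) for a p x p covariance matrix.

    sigma : list of p rows, each a list of p numbers (assumed symmetric).
    """
    p = len(sigma)
    diag_mean = sum(sigma[t][t] for t in range(p)) / p   # mean of diagonal terms
    row_means = [sum(row) / p for row in sigma]          # mean of each row
    grand_mean = sum(row_means) / p                      # grand mean of all terms
    sum_sq = sum(x * x for row in sigma for x in row)    # sum of squared elements
    numerator = p ** 2 * (diag_mean - grand_mean) ** 2
    denominator = (p - 1) * (
        sum_sq - 2 * p * sum(m * m for m in row_means) + p ** 2 * grand_mean ** 2
    )
    return numerator / denominator


def compound_symmetric(p, variance, rho):
    """Covariance matrix of the form (1.4)."""
    return [[variance if r == c else rho * variance for c in range(p)]
            for r in range(p)]


def decaying(p, rho):
    """Unit variances with correlation rho**|r - c| (closer occasions correlate more)."""
    return [[rho ** abs(r - c) for c in range(p)] for r in range(p)]


eps_cs = box_epsilon(compound_symmetric(3, 1.0, 0.5))   # expected to be 1 up to rounding
eps_decay = box_epsilon(decaying(3, 0.5))               # expected to lie in [1/(p-1), 1)
```

For the compound symmetric matrix the factor equals unity up to rounding error, while the decaying-correlation matrix gives a value strictly below one but not below $(p-1)^{-1}$, in line with Box's bounds.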
One can
note that for the case of two repeated measures (p = 2) the contrast
space for conditions is necessarily of one dimension and equation
(1.6) simplifies to unity, indicating that the F-test (or
equivalently the paired t-test when there are fewer than three groups)
is always exact, assuming normality and equal variances among groups.
For the case of more than two repeated measures and general
covariance structure, Greenhouse and Geisser (1959) discussed the use of
an approximate test whereby the degrees of freedom for testing
within-subject factors are reduced by the correction factor $\epsilon$.
But since $\epsilon$ is a function of the elements of the population
covariance matrix $\Sigma$, and the effects of estimating $\epsilon$ from the
sample covariance matrix S are unknown, these authors suggested for
moderate or small error degrees of freedom a conservative test for
within-subject factors. The conservative test consists of equating
$\epsilon$ to its minimum possible value of $(p-1)^{-1}$. For a less
conservative approach to analysis, the authors suggested the following
general strategy for analysis of within-subject factors:
1. First use the F-test with full unreduced degrees of
freedom.
2. If the F-test is not significant, stop here because
it will not be significant with fewer degrees of freedom.
Otherwise, test for significance with the reduced degrees
of freedom (conservative test).
3. If the conservative test is significant, stop here because
it must be significant with more degrees of freedom.
Otherwise, estimate $\epsilon$ from the sample covariance matrix
and use the approximate test.
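
The three steps above can be sketched as a small decision routine. The function below is illustrative only: its name and signature are assumptions, and the upper-tail probability of the F distribution is passed in as a callable rather than computed here, so that no particular statistical library is presumed.

```python
def three_step_test(f_stat, df1, df2, p, eps_hat, alpha, upper_tail):
    """Three-step strategy for a within-subject F-test.

    f_stat     : observed F-ratio.
    df1, df2   : unreduced degrees of freedom, e.g. (p-1)(g-1) and (p-1)(N-g).
    p          : number of occasions.
    eps_hat    : correction factor estimated from the sample covariance matrix.
    alpha      : significance level.
    upper_tail : callable (f, df1, df2) -> P(F > f), supplied by the caller.
    Returns (significant, step at which the decision was reached).
    """
    # Step 1: full, unreduced degrees of freedom.
    if upper_tail(f_stat, df1, df2) > alpha:
        return False, 1
    # Step 2: conservative test, correction factor set to its minimum 1/(p-1).
    eps_min = 1.0 / (p - 1)
    if upper_tail(f_stat, eps_min * df1, eps_min * df2) <= alpha:
        return True, 2
    # Step 3: approximate test with the estimated correction factor.
    return upper_tail(f_stat, eps_hat * df1, eps_hat * df2) <= alpha, 3
```

If SciPy is available, `scipy.stats.f.sf` accepts non-integer degrees of freedom and could be passed as `upper_tail`.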
c. $\epsilon$-Corrected Approach
An alternative analytic strategy within the univariate mixed model
framework is to forgo the above three-step procedure and test directly
for within-subject factors using F-ratios based on adjusted degrees of
freedom where $\epsilon$ is estimated from the sample covariance matrix S.
This estimate, $\hat\epsilon$, is the maximum likelihood estimate for $\epsilon$
(Anderson, 1958) when the population is multivariate normal.
Collier et al. (1967) and Stoloff (1970) investigated the effect of
using $\hat\epsilon$ for the population value $\epsilon$. Utilizing simulation
studies, they found $\hat\epsilon$ to be negatively skewed for values of
$\epsilon$ close to unity and positively skewed for very low values of
$\epsilon$, especially when sample size is small. In
particular, when the group sample size is less than twice the number of
occasions, $\hat\epsilon$ may be seriously biased if $\epsilon$ is 0.75 or above, thus
resulting in an overly conservative test.
Huynh and Feldt (1976) found the expected value of the numerator
and denominator of equation (1.5) and used the resulting ratio to develop
a ratio of unbiased estimators that is less biased than the maximum
likelihood estimator $\hat\epsilon$ when $\epsilon$ is moderately large, say
$\epsilon > 0.75$. The authors found in a review of the educational research
literature that values of $\epsilon > 0.75$ were very common. The resulting
ratio statistic, $\tilde\epsilon$, is given in terms of $\hat\epsilon$ by:
$$\tilde\epsilon = \frac{N(p-1)\hat\epsilon - 2}{(p-1)\left(N - g - (p-1)\hat\epsilon\right)} \tag{1.7}$$
The advantage of $\tilde\epsilon$ over $\hat\epsilon$ with respect to power is greatest
when N is small and $\epsilon$ close to unity. When $\epsilon$ is 0.5 or less, the
maximum likelihood estimate $\hat\epsilon$ was found to be less biased than
$\tilde\epsilon$.
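
Equation (1.7) is simple enough to evaluate directly. The sketch below uses hypothetical argument names and made-up inputs (N = 20 subjects, p = 4 occasions, g = 2 groups, $\hat\epsilon$ = 0.8) purely as an illustration.

```python
def huynh_feldt_epsilon(eps_hat, n_total, p, g):
    """Ratio statistic of equation (1.7) computed from the estimate eps_hat.

    eps_hat : maximum likelihood estimate of the correction factor.
    n_total : total number of subjects N.
    p       : number of occasions.
    g       : number of treatment groups.
    """
    numerator = n_total * (p - 1) * eps_hat - 2
    denominator = (p - 1) * (n_total - g - (p - 1) * eps_hat)
    return numerator / denominator


# Made-up inputs: 20 * 3 * 0.8 - 2 = 46 over 3 * (20 - 2 - 2.4) = 46.8.
eps_tilde = huynh_feldt_epsilon(eps_hat=0.8, n_total=20, p=4, g=2)
```

With these inputs the adjusted value exceeds $\hat\epsilon$, so the adjusted degrees of freedom are reduced less severely.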
Huynh (1978) describes two additional procedures for testing for
within-subject effects. Degrees of freedom correction factors GA and
IGA are similar to the $\hat\epsilon$ and $\tilde\epsilon$ correction factors
respectively, but the former are valid for arbitrary covariance matrices
that are unequal among treatment groups.
d. Necessary and Sufficient Conditions for Exact F-Ratios
While compound symmetry and equality of covariances were long
regarded in the statistical literature as necessary and sufficient
conditions for exact within-subject F-ratios for the mixed model
(Winer, 1962; Kirk, 1968; Greenhouse and Geisser, 1959), Huynh and Feldt
(1970) and Rouanet and Lepine (1970) independently showed that these
assumptions were sufficient but not necessary.
Assuming multivariate
normality, the necessary and sufficient conditions for within-subject
F-ratios to be distributed exactly as an F distribution are
$$U'\Sigma_j U = \lambda I, \qquad j = 1, 2, \ldots, g, \tag{1.8}$$
where U is a p x (p-1) orthonormal contrast matrix, I is the
identity matrix of rank p-1, and $\lambda$ is a positive constant.
Equation (1.8) states that the F-tests of
treatment-by-occasion interaction and occasion main effects are valid if
and only if the p-1 contrasts represented by U are independent and equally
variable.
An equivalent expression of (1.8) was given by Huynh and Feldt
(1970) as
For $Y_i$ and $Y_j$ representing separate conditions for the same
subject, exact F-tests are obtained iff all possible
differences $Y_i - Y_j$ $(i \ne j)$ are equally variable, i.e.
$$\mathrm{Var}(Y_i - Y_j) = \text{constant} \quad \text{for all } i, j \ (i \ne j). \tag{1.9}$$
The condition of compound symmetry is a special case of condition
(1.9); Rouanet and Lepine label (1.9) as the condition of circularity.
The necessary and sufficient condition expressed in (1.8) and (1.9)
is more likely to be satisfied in the experimental situation than the
traditional assumption of compound symmetry.
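However, the likelihood that
(1.9) is met in the response-over-time design is generally suspect since,
for the three occasions situation, (1.9) requires:

$$\begin{aligned}
\operatorname{Var}(Y_1 - Y_2) &= \operatorname{Var}(Y_1) + \operatorname{Var}(Y_2) - 2\sqrt{\operatorname{Var}(Y_1)\operatorname{Var}(Y_2)}\;\rho_{12} \\
\text{must} = \operatorname{Var}(Y_1 - Y_3) &= \operatorname{Var}(Y_1) + \operatorname{Var}(Y_3) - 2\sqrt{\operatorname{Var}(Y_1)\operatorname{Var}(Y_3)}\;\rho_{13} \\
\text{must} = \operatorname{Var}(Y_2 - Y_3) &= \operatorname{Var}(Y_2) + \operatorname{Var}(Y_3) - 2\sqrt{\operatorname{Var}(Y_2)\operatorname{Var}(Y_3)}\;\rho_{23}.
\end{aligned}$$
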
In fact when conditions have equal variances, (1.9) implies that all
covariances must be equal, which is just the condition of compound
symmetry.
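
The condition in (1.9) can be checked numerically for a given covariance
matrix. The sketch below is only an illustration: the function names and
the three example matrices are hypothetical and are not taken from the
studies cited. It computes the variance of every pairwise difference and
reports whether they are all equal.

```python
from itertools import combinations


def difference_variances(cov):
    """Var(Y_i - Y_j) = s_ii + s_jj - 2 s_ij for every pair i < j."""
    p = len(cov)
    return {
        (i, j): cov[i][i] + cov[j][j] - 2.0 * cov[i][j]
        for i, j in combinations(range(p), 2)
    }


def is_circular(cov, tol=1e-9):
    """True when all pairwise difference variances agree, as in (1.9)."""
    values = list(difference_variances(cov).values())
    return max(values) - min(values) <= tol


# Compound symmetry: equal variances and equal covariances.
compound_symmetric = [[4.0, 2.0, 2.0], [2.0, 4.0, 2.0], [2.0, 2.0, 4.0]]

# Unequal variances, yet every difference has variance 4, so (1.9) holds
# without compound symmetry (hypothetical values).
circular_only = [[4.0, 3.0, 4.0], [3.0, 6.0, 5.0], [4.0, 5.0, 8.0]]

# Equal variances with correlation decaying over time: (1.9) fails.
serial = [[4.0, 3.2, 2.56], [3.2, 4.0, 3.2], [2.56, 3.2, 4.0]]

if __name__ == "__main__":
    for name, cov in [("compound symmetric", compound_symmetric),
                      ("circular only", circular_only),
                      ("serial", serial)]:
        print(name, is_circular(cov), difference_variances(cov))
```

The third matrix mimics the response-over-time situation, where
observations closer in time are more highly correlated, and illustrates
why (1.9) is suspect for longitudinal data.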
e.  Testing for Circularity
Before one decides to employ the mixed model to analyze a
particular treatment-by-occasions data set, it is necessary to
evaluate whether the model's assumptions are satisfied.
Numerous
approaches to assess multivariate normality have been described in the
literature (e.g., Mardia, 1971; Malkovich and Afifi, 1973;
Gnanadesikan, 1977; Khatri, 1979).
However, each method has serious drawbacks.
For example, Mardia's asymptotic test requires an extremely
large sample size for accuracy, and the procedure discussed by
Malkovich and Afifi which is based on Roy's union-intersection
principle requires prohibitively large computer costs.
Fortunately,
the robustness of MANOVA to multivariate non-normality appears
satisfactory, at least when sample sizes are large.
Ito (1969) has
shown that in this case the effect of violation of the multivariate
normality assumption is slight on testing hypotheses about the mean
vectors but can be serious for testing covariance matrices (this is
parallel to the univariate ANOVA situation).
Greater attention has
been given to assessing violation of circularity and equal among-group
covariance matrix assumptions.
In order to assess circularity and equal among-group covariance
matrix assumptions, a sequential procedure is used.
Box's (1949)
multivariate analogue of Bartlett's M-test (1937) for homogeneity of
variance is first used to test
$H_{01}: U'\Sigma_1 U = \cdots = U'\Sigma_g U$ under the
assumption of multivariate normality.
If $H_{01}$ is not rejected, the
circularity hypothesis $H_{02}: U'\Sigma U = \lambda I$, where $\Sigma$ is the
pooled population covariance matrix across groups, is tested using
Mauchly's (1940) sphericity criterion W.
Assuming normality, if both $H_{01}$ and $H_{02}$
are not rejected, the data are judged to have the circularity
property and the mixed model within-subject F-test results are exact.
Several points are pertinent here.
First, for the test of within-
subject effects, it is the test for equality of the covariance matrix
for the contrast of interest (defined by U) that is of importance and
not the equality of the covariance matrices of the g treatment groups.
Second, the Mauchly and Box tests, in an analogous fashion to
Bartlett's test, are extremely sensitive to violations of the
assumption of multivariate normality.
The use of the Box and Mauchly
tests as preliminary procedures in the analysis of repeated measures
data is discussed further in section 1.3 comparing the mixed model
to other analytic approaches.
1.2.4  Multivariate Analysis of Variance Procedures
Statistical procedures other than the mixed model have been
developed for the analysis of the treatment-by-occasions data
displayed in Figure 1.1.
A natural extension of the univariate
general linear model that has been employed for repeated measures
studies is multivariate analysis of variance (MANOVA).
As before for
the mixed model, MANOVA assumes that the vector of responses of the
ith subject in the jth treatment group has a multivariate normal
distribution with mean vector $\mu_j' = (\mu_{j1}, \ldots, \mu_{jp})$ and
covariance matrix $\Sigma$ which is common to the g treatment groups.
Response vectors
of different individuals are assumed to be independent.
In contrast
to the mixed model, however, no assumptions regarding the structure of
the covariance matrix are necessary for MANOVA.
In this section several approaches to repeated measures analysis
using MANOVA are discussed.
The use of MANOVA for profile data
(treatment-by-conditions design) is first presented followed by a
discussion of growth curve analysis.
a.  Profile Analysis
An example of a profile analysis experimental design is a child
development study in which the maternal attitudes of mothers from
various socio-economic groups (treatments) were investigated for four
subscales (conditions) of a psychological inventory (Morrison, 1976).
Profile analysis is essentially a MANOVA procedure specifically aimed
at handling independent variables that are qualitative rather than
quantitative (Press, 1972).
Thus profile analysis applied to
longitudinal designs does not consider the ordered relationship of the
times of observation $t' = (t_1, \ldots, t_p)$. However, a discussion of the
procedure is included here for two reasons.
First, profile analysis
has been often used to analyze longitudinal data (D. Gillings,
personal communication), perhaps due to the unavailability of a better
procedure.
Second, since the test for treatment-by-occasion
interaction using traditional mixed model ANOVA is inappropriate for
longitudinal data when $U'\Sigma U$ is not compound symmetric (Koch et al.,
1980), the profile analysis test offers a better way to assess
interaction as a preliminary step to testing for treatment main
effects.
Suppose that p responses in the same metric have been obtained from
independent observational units (subjects) grouped according to g
treatment groups.
For N such subjects an N x p array of observations is
generated (see Figure 1.1).
The kth observation on the ith subject from
the jth treatment group, $y_{ijk}$, can be represented as

$$y_{ijk} = \mu_{jk} + e_{ijk}, \qquad i = 1, \ldots, N_j; \quad j = 1, \ldots, g; \quad k = 1, \ldots, p,$$

where $\mu_{jk}$ is the expected value of the kth response for the jth
treatment group and $e_{ijk}$ is a random error term.
Expressed in matrix
notation, the N x p array Y can be modeled as

$$Y = XB + E. \qquad (1.10)$$

X is an N x g design matrix representing some parameterization of
interest.
For a design incorporating covariables, X would also
contain the values of the concomitant variables.
The matrix B is the
g x p matrix of g treatment means for the p responses.
$E = (\epsilon_{11}, \epsilon_{21}, \ldots, \epsilon_{N_g g})'$ is an N x p
matrix of residuals where $\epsilon_{ij}$ has the multivariate normal
distribution $N(0, \Sigma)$, the covariance matrix $\Sigma$ being positive
definite and identical for all treatment groups.
The assumption of common
multivariate normal distribution among subjects implies there are no
missing data.
Under the multivariate general linear model defined by (1.10) the
maximum likelihood and least squares estimate of the population
parameter matrix B is

$$\hat{B} = (X'X)^{-1}X'Y \qquad (1.11)$$

$$\hat{\beta}_j = (X'X)^{-1}X'y_j$$

where $\hat{B}_{g \times p} = [\hat{\beta}_1, \ldots, \hat{\beta}_p]$, $y_j$ is the
jth column of Y, and j = 1, ..., p.
Thus the parameter estimates of group means for the p different
conditions are the same as if the data were considered a collection of
univariate data.
The correlation structure of $\Sigma$ is not involved in
computing $\hat{B}$.
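
A small sketch may make this point concrete. It assumes, purely for
illustration, a cell-means coding in which each row of X is an indicator
of treatment group membership, so that X'X is a diagonal matrix of group
sizes and X'Y holds the group column sums; the estimate in (1.11) then
reduces to the ordinary group means, one column at a time. The function
name and the data are hypothetical.

```python
def cell_means_estimate(Y, group_of_subject, n_groups):
    """Estimate B under an indicator (cell-means) design matrix X.

    With that coding, (X'X)^{-1} X'Y is the matrix of per-group column
    means, computed separately for each of the p responses.
    """
    p = len(Y[0])
    sums = [[0.0] * p for _ in range(n_groups)]
    counts = [0] * n_groups
    for row, j in zip(Y, group_of_subject):
        counts[j] += 1
        for k, value in enumerate(row):
            sums[j][k] += value
    return [[s / counts[j] for s in sums[j]] for j in range(n_groups)]


if __name__ == "__main__":
    # Five subjects, p = 3 occasions, g = 2 groups (made-up values).
    Y = [[1, 2, 3], [3, 4, 5], [10, 20, 30], [12, 22, 32], [14, 24, 34]]
    groups = [0, 0, 1, 1, 1]
    print(cell_means_estimate(Y, groups, 2))
```

No covariance between occasions enters the calculation, which is the
sense in which the estimate ignores the correlation structure.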
Tests concerning parameters of B are constructed using

$$H_0: CBU = 0 \qquad (1.12)$$

where C is a c x g contrast matrix of rank c referring to hypotheses on the
elements within columns of B (comparisons across treatments) and U is
a p x u contrast matrix referring to hypotheses on the elements
within rows of B (comparisons across conditions).
Various test
statistics have been derived to test hypotheses of the form (1.12).
The test statistics are different functions of the eigenvalues of $HE^{-1}$
where H is the between-treatments sums of cross-products matrix and E
is the within-treatments sums of cross-products matrix.
H and E are given by

$$H = (C\hat{B}U)'\,[C(X'X)^{-1}C']^{-1}\,(C\hat{B}U) \qquad (1.13)$$

$$E = U'Y'[I_N - X(X'X)^{-1}X']YU. \qquad (1.14)$$

With p < N-g, E will be non-singular with probability one.
Five
multivariate test statistics used to test (1.12) are:
1) Roy's Largest Root: R = maximum eigenvalue of $H(H + E)^{-1}$
2) Hotelling-Lawley Trace: T = tr($HE^{-1}$)
3) Wilks' Likelihood Ratio: $\Lambda = |E(H + E)^{-1}|$
4) Pillai-Bartlett Trace: V = tr($H(H + E)^{-1}$)
5) Gnanadesikan's Determinantal Criterion: U = product of s largest
roots of $H(H + E)^{-1}$, where s = number of hypothesis df's.
Only when either c or u equals one are these different tests equivalent.
In other situations the results of the tests can lead to different
conclusions regarding the null hypothesis $H_0$.
Test power is a function
of a number of factors including number of groups, number of occasions
and sample size.
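
Because each of the five criteria is a function of the same eigenvalues,
they can be computed together once the nonzero roots of $HE^{-1}$ are
known. The following sketch is illustrative only: the function name is
hypothetical, the roots are supplied directly rather than extracted from
data, and the product criterion is taken over all supplied roots on the
assumption that s equals the number of nonzero roots.

```python
import math


def manova_statistics(roots):
    """Compute the five criteria from the nonzero eigenvalues of H E^{-1}.

    If lam is an eigenvalue of H E^{-1}, then lam / (1 + lam) is the
    corresponding eigenvalue of H (H + E)^{-1}.
    """
    thetas = [lam / (1.0 + lam) for lam in roots]
    return {
        "roy_largest_root": max(thetas),
        "hotelling_lawley_trace": sum(roots),
        "wilks_likelihood_ratio": math.prod(1.0 / (1.0 + lam) for lam in roots),
        "pillai_bartlett_trace": sum(thetas),
        # Assumption: s is the number of roots supplied.
        "determinantal_criterion": math.prod(thetas),
    }


if __name__ == "__main__":
    for name, value in manova_statistics([1.0, 0.25]).items():
        print(name, round(value, 4))
```

When only one root is nonzero, every criterion is a monotone function of
that single root, which is why the tests agree when c or u equals one.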
The hypotheses of interest in profile analysis are:
1)  Are the profiles across the p conditions of the g treatment
groups parallel?
This corresponds to a test of treatment-by-condition
interaction.
The null hypothesis of no interaction can be tested
using appropriate forms of C, the contrast matrix across treatments,
and U, the contrast matrix across conditions.
2)  Do the treatment groups have the same group means?
This corresponds to a test of treatment main effects.
When there is no treatment-by-condition interaction, a univariate
one-way ANOVA using the sum of each subject's data vector,
$y_{ij\cdot} = j'y_{ij}$ (j being a p x 1 vector of ones), can
be used.
This is the mixed model ANOVA test of treatment main
effects.
When interaction is present, the above test is not sensitive
to alternatives in which certain conditions have a positive effect in
some treatment groups but a negative effect in other treatment groups
(Koch, 1969).
In this case, MANOVA is appropriate.
Rejection of the
MANOVA null hypothesis indicates either treatment-by-condition
interaction or consistent differences among treatments across
conditions.
The
lack of inference specificity of this test is a
problem and for this reason many authors (e.g., Morrison, 1976; Winer,
1971) indicate that testing for main effects is meaningful only when
there is no treatment-by-condition interaction.
In this case the
univariate procedure is preferred.
3)  Are the average responses similar for the p conditions?
This corresponds to a test of condition main effects.
Using a contrast matrix U of rank p-1 and $C = I_{g \times g}$, the appropriate
MANOVA test is Hotelling's $T^2$ statistic (Hotelling, 1931). When there
is no treatment-by-condition interaction, the $T^2$ test using the
weighted average of treatment group means over conditions to estimate
the p x 1 vector $\mu$ has maximum power.
When interaction is present,
the $T^2$ test using the unweighted average of group means may be more
powerful since the corresponding estimate of $\mu$ has been adjusted for
any group effects on the distribution of the conditions contrast
(Koch, 1969).
Significant results for the three MANOVA tests described above do
not indicate which treatment combinations are different.
Roy's union-intersection principle leads directly to a multiple
comparison procedure from which simultaneous confidence bounds on all
linear combinations of CBU can be constructed.
The $100(1-\alpha)$ percent simultaneous confidence
bounds on all functions b'CBUa of the parameters of B are specified
by

$$b'CBUa \in b'C(X'X)^{-1}X'YUa \pm x_\alpha\,[(a'U'EUa)(b'C(X'X)^{-1}C'b)]^{.5} \qquad (1.15)$$

where $x_\alpha$ is the appropriate $100\alpha$ percentage point from the Pillai
tables indicating the upper-tail cumulative distribution of the greatest
root (e.g., see Morrison, 1976).
When only a few a priori contrasts are
to be tested, shorter confidence intervals can be obtained using the
Bonferroni inequality.
b.  Growth Curve Analysis
In certain experimental situations the members of the data vector
Yij represent responses measured along some ordered dimension.
For
example, in a longitudinal repeated measures design, measurements are
recorded at specified time points (occasions).
Measurements can be
also associated with other ordered dimensions (e.g., a sequence of
doses).
While the MANOVA model just described does not consider the
ordered dimensionality of the vector members, a more involved MANOVA
model has been developed which does. This procedure is called growth
curve analysis.
When responses are measured along an ordered dimension such as
time, interest may be directed toward specifying the form of the response
curve as a function of time.
The form of the response-over-time curve
may be suggested in a particular application by theoretical
considerations.
However, in many areas of study the knowledge of the
natural laws of processes is absent and empirical functions which
provide an adequate fit to the data are utilized.
Growth curve
analysis is a multivariate statistical technique of fitting polynomials
or other functions linear in their parameters to longitudinal data.
While the technique can be applied to any type of repeated measures
design with an ordered dimension for occasions, the name "growth curve
analysis" entered the statistical literature because much of the early
development of methodologies was motivated by problems concerning
growth.
Wishart (1938) first reported the use of orthogonal polynomials in
an attempt to utilize all of the data available in a longitudinal study
presented to him for analysis.
Rao (1959) developed procedures for
parameter estimation and estimation of confidence bands for the response
curve.
For the treatment-by-occasions design the growth curve model is
described by

$$Y = XBT + E \qquad (1.16)$$

where Y is an N x p data matrix representing the data from Figure
1.1, T is the q x p ($q \le p$) design matrix within individuals, X is the
N x g design matrix across individuals, B is a g x q matrix of unknown
parameters to be estimated, and E is an N x p matrix of random
errors.
It is assumed that the rows of E are independently
distributed as a p-variate normal distribution with expectation vector
$0'_{1 \times p}$ and common covariance matrix $\Sigma_{p \times p}$.
The within-individuals design matrix, T, specifies polynomial
powers of the times of observation.
For the p-observation case, T is
of the form

$$T = \begin{bmatrix}
t_1^{0} & t_2^{0} & \cdots & t_p^{0} \\
t_1^{1} & t_2^{1} & \cdots & t_p^{1} \\
t_1^{2} & t_2^{2} & \cdots & t_p^{2} \\
\vdots & \vdots & & \vdots \\
t_1^{p-1} & t_2^{p-1} & \cdots & t_p^{p-1}
\end{bmatrix} \qquad (1.17)$$

where $t_i$ = time i; i = 1, ..., p.
Implicit in this construction of T is the assumption that all subjects
are observed at the same intervals.
A polynomial of order q-1, which
is less than p-1, may provide a sufficient fit to the data.
In this
case, $T_{p \times p}$ is reduced to $T_{q \times p}$ by eliminating the last p-q rows of
(1.17).
The polynomial parameters of B in (1.16) are correlated.
Using a Cholesky decomposition approach, the design matrix T
expressed in natural polynomial coefficients in (1.17) can be
transformed into an orthonormal design matrix $T_o$.
The growth
curve model in (1.16) can then be expressed in orthonormal
polynomials as

$$Y = XB_oT_o + E. \qquad (1.18)$$

Assuming a (q-1)th order polynomial describes the response curve
adequately, the matrix $B_o$ is a g x q matrix of orthonormal
parameters.
Using this framework, independent tests of parameters in
$B_o$ can be performed.
As in other general linear models, the
across-individual design matrix X can be constructed to identify
treatment group membership as well as contain covariable values.
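
The construction of the within-individuals design matrix and its
orthonormal counterpart can be sketched as follows. This is a minimal
illustration, not the procedure used in the text: it orthonormalizes the
rows by Gram-Schmidt rather than by an explicit Cholesky decomposition
(the two routes lead to the same orthonormal rows, up to sign, because
each new row depends only on the rows above it). The function names and
the observation times are hypothetical.

```python
import math


def polynomial_design(times, q):
    """Return the q x p matrix whose kth row holds t_i ** k, k = 0..q-1."""
    return [[float(t) ** k for t in times] for k in range(q)]


def orthonormalize_rows(rows):
    """Gram-Schmidt on the rows, taken in order of polynomial degree."""
    basis = []
    for row in rows:
        v = list(row)
        for b in basis:
            projection = sum(x * y for x, y in zip(v, b))
            v = [x - projection * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis


if __name__ == "__main__":
    times = [0, 1, 2, 4, 6]          # p = 5 unequally spaced occasions
    T = polynomial_design(times, 3)  # q = 3: constant, linear, quadratic
    T_o = orthonormalize_rows(T)
    for row in T_o:
        print([round(x, 4) for x in row])
```

The rows of the result have unit length and are mutually orthogonal, so
the product of the matrix with its transpose is the q x q identity.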
Potthoff and Roy (1964) first considered the model in (1.18) and
derived a weighted least squares estimate of $B_o$ which involves an
arbitrary matrix of weights $G^{-1}$.
They showed that the variance of $\hat{B}_o$
increases as $G^{-1}$ departs from $\Sigma^{-1}$.
The maximum likelihood (and
weighted least squares) estimate of $B_o$ was shown by Khatri (1966) to
be

$$\hat{B}_o = (X'X)^{-1}X'YS^{-1}T_o'\,(T_oS^{-1}T_o')^{-1} \qquad (1.19)$$

where
$S = Y'[I_N - X(X'X)^{-1}X']Y$.
This is identical to Potthoff and Roy's estimate of $B_o$ when G is set
equal to S.
Grizzle and Allen (1969) derived the unconditional
covariance matrix of $\hat{B}_o$, written var($\hat{\beta}_o$) and expressed in
terms of the Kronecker product $\otimes$, where var($\hat{\beta}_o$) is the
covariance matrix of the elements of $\hat{B}_o$ "rolled out" and expressed
as a column vector.
Multivariate testing procedures based on functions of $HK^{-1}$ can be used
to test the hypothesis $H_0: CB_oU = 0$.
The definition of the hypothesis matrix H and
error matrix K makes use of the matrix

$$R = (X'X)^{-1} + (X'X)^{-1}X'YS^{-1}Y'X(X'X)^{-1} - \hat{B}_o(T_oS^{-1}T_o')\hat{B}_o'.$$

Rao (1965, 1966, 1967) developed a somewhat different approach to
growth curve analysis based on an analysis of covariance framework.
He showed that unless the weight matrix $G^{-1}$ equals $\Sigma^{-1}$, sample
information for estimating $B_o$ is lost.
He suggested that the
remaining (p-q) polynomials not included in the model, or some subset,
might be used as covariables to provide a better estimate of $B_o$.
When
all p-q variables are used as covariables this is equivalent to
using $G^{-1} = S^{-1}$ as weights.
When no covariables are used this is
equivalent to the unweighted analysis discussed by Potthoff and Roy
(1964), i.e., $G^{-1} = I$.
If the p-q polynomials provide no information about the q
polynomial parameters in the model, the sample dispersion matrix, S,
is of the form (Rao, 1965)

$$\Sigma = E(S) = (N-g)(T_R'\Lambda T_R + \sigma^2 I) \qquad (1.23)$$

where $T_R$ is the q x p reduced within-individual design matrix, and
$\Lambda$ is the covariance matrix of reduced-model parameters.
$T_R'\Lambda T_R$ measures
the deviation of each individual's curve from the group curve
based on the reduced model using a (q-1)th order polynomial, and
$\sigma^2 I$
measures the deviation of each individual about his own curve.
Rao
(1965) provides a likelihood ratio test of (1.23) versus general
covariance structure; the test criterion (1.24) is denoted $\Lambda$.
When N is sufficiently large, the statistic $-2 \ln \Lambda$
is approximately
distributed as $\chi^2$ with $[(p-q)(p+q+1) - 2]/2$ degrees of freedom.
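
The degrees of freedom can be read as a difference in the number of free
covariance parameters, an interpretation offered here as a check rather
than as part of Rao's derivation: a general p x p covariance matrix has
p(p+1)/2 parameters, while the structure in (1.23) has q(q+1)/2
parameters for the covariance matrix of the reduced-model parameters plus
one for the residual variance. The sketch below, with hypothetical
function names, confirms that the two counts differ by the stated amount.

```python
def df_by_parameter_count(p, q):
    """General covariance parameters minus those of the form (1.23)."""
    general = p * (p + 1) // 2
    structured = q * (q + 1) // 2 + 1
    return general - structured


def df_by_formula(p, q):
    """Degrees of freedom as stated in the text."""
    return ((p - q) * (p + q + 1) - 2) / 2


if __name__ == "__main__":
    for p, q in [(4, 2), (5, 3), (6, 2)]:
        print(p, q, df_by_parameter_count(p, q), df_by_formula(p, q))
```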
When the dispersion matrix is not of the form in (1.23), the
exclusion of all of the p-q polynomials as covariables can result in
loss of information leading to less precise estimates of model
parameters.
Grizzle and Allen (1969) indicate that determination of
which of the p-q variables to choose as covariables can be done by
review of the correlation between variables in the estimation space
(model parameters) and variables in the error space (p-q covariables).
Uniformly high correlations would suggest using all p-q variables as
covariables while uniformly low correlations would suggest an
unweighted analysis.
Rao (1965) points out that using sample
information to select covariables is subject to criticism and is
likely to result in estimates whose precision is overestimated.
He suggests the use of an a priori procedure, such as using an analysis
of previously collected data to determine a set of useful covariables.
An alternate strategy within the polynomial curve fitting
framework is to fit separate polynomial curves over different
subintervals of the period of observation $[t_1, t_p]$.
This modeling
approach is appropriate when response to treatment follows one pattern
for some period of time and yet another for a subsequent time period.
An example of such a situation is the basal body temperature as a
function of the day of a woman's menstrual cycle (Teeter, 1982).
Prior to ovulation one (linear) submodel for basal body temperature
fits and a second (linear) submodel fits after ovulation.
In the
context of a pharmaceutical clinical trial, the initial time period
following initiation of treatment may involve a placebo effect.
In a
subsequent time period, the placebo effect disappears and gives way to
a different pattern of response.
Figure 1.2 displays a hypothetical
mean time-response curve that suggests the fitting of a piecewise
model may be appropriate.
The intersection points of the submodels are referred to as join
points and reflect the constraint that $f_i(\tau) = f_{i+1}(\tau)$,
$t_i \le \tau \le t_{i+1}$,
where $\tau$ is the join point, and $t_i$ and $t_{i+1}$ are fixed times of
observation.
Spline function theory adds further constraints that all
derivatives of the submodels agree at the join points, which are
termed knot points (Smith, 1979).
In addition, spline theory selects
the knot points for analytical convenience, i.e., they may have no
physical interpretation (Feder, 1975).
Join points can be assumed to
be either known or unknown.
Teeter (1982) discusses the use of
piecewise linear regression models to estimate the join point when
measurement error is present in the independent variables.
Situations
where the interval containing the join point is assumed either to be
known or unknown are considered.
Brunelle and Johnson (1980) consider
the use of a linear piecewise regression model in the repeated
measures clinical trial setting to estimate the slope of the line
segments representing response to treatment.
In each case the assumed
model for the two-segment model is of the form

$$f_j(t) = \beta_{j0} + \beta_{j1}t, \qquad j = 1 \ (t \le \tau), \quad j = 2 \ (t > \tau), \qquad (1.25)$$

subject to the constraint

$$\beta_{10} + \beta_{11}\tau = \beta_{20} + \beta_{21}\tau.$$

Assuming the
join point $\tau$ is known, the unknown parameters $\beta_{j0}$ and
$\beta_{j1}$ can be
estimated with the use of indicator variables.
Teeter (1982) applied
this model to the cross-sectional ovulation study mentioned above.
In
an analysis of data from a repeated measures pharmaceutical clinical
trial, Brunelle and Johnson (1980) applied ordinary least-squares
regression to (1.25) to estimate model parameters.
This approach
ignores the correlation structure of within-subject observations in
the repeated measures design and is appropriate for the
cross-sectional design only.

FIGURE 1.2
HYPOTHETICAL MEAN TIME-RESPONSE CURVES
FOR PLACEBO AND ACTIVE TREATMENTS
(Vertical axis: Response Following Initiation of Treatment.
Horizontal axis: Time Following Initiation of Treatment, 0 to 10.)
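
To show how an indicator variable enforces the join constraint of the
two-segment model (1.25) when the join point is known, the sketch below
fits the equivalent single equation
response = b0 + b1*t + b2*max(t - tau, 0) by ordinary least squares and
then recovers the four segment parameters. This parameterization, the
function names, and the data are assumptions made for illustration. As
noted above, ordinary least squares ignores within-subject correlation,
so the sketch applies to cross-sectional data only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_two_segment(times, responses, join_point):
    """Least-squares fit of two lines that meet at a known join point."""
    # Third column is (t - tau) switched on by the indicator t > tau.
    rows = [[1.0, t, (t - join_point) if t > join_point else 0.0]
            for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(3)]
    b0, b1, b2 = solve_linear_system(xtx, xty)
    return {
        "beta10": b0,
        "beta11": b1,
        "beta20": b0 - b2 * join_point,
        "beta21": b1 + b2,
    }


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4, 5, 6]
    responses = [1.0, 3.0, 5.0, 7.0, 6.5, 6.0, 5.5]  # made-up, noise-free
    print(fit_two_segment(times, responses, join_point=3.0))
```

By construction the two fitted lines take the same value at the join
point, so the constraint holds without a separate restricted estimation
step.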
In Chapter 5, another approach to the use of piecewise models to
describe repeated measures response data is discussed.
The proposed
approach, which assumes the join point is known and coincides with a
study observation time point, is an extension of the growth curve
model framework.
In Chapter 6, this piecewise growth curve model is
applied to data from a pharmaceutical clinical trial of an
investigational anti-depression drug.
1.2.5  Summary Univariate Procedures
a.  Area-Under-the-Curve Analysis
A potentially useful approach to the analysis of treatment main
effects in response-over-time studies is area-under-the-curve (AUC)
analysis.
Response to treatment j might be plotted as a function of
time as shown in Figure 1.3.
For this hypothetical curve, one can
note a rapid initial response followed by a more gradual reduction in
response after peak response is achieved.
The response curve $f_j(t)$,
when integrated over observation period $[t_1, t_p]$, provides a measure of
AUC.
That is,

$$AUC_j = \int_{t_1}^{t_p} f_j(t)\,dt. \qquad (1.26)$$

AUC represents the shaded area under the response curve $f_j(t)$ of
Figure 1.3.
This parameter, measured in response parameter units
multiplied by time units, provides a univariate measure of cumulative
response to treatment and can be viewed as a summary or index measure
of response over the period of observation.
To further illustrate the nature of the AUC response parameter,
(1.26) can be restated as

$$AUC_j = f_u\,(t_p - t_1) \quad \text{for some } f_u \text{ such that } f_{min} \le f_u \le f_{max}. \qquad (1.27)$$

This result follows directly from the theorem of the mean of integral
calculus.
Here $f_u$ represents an average response to treatment
exhibited during the period of observation $[t_1, t_p]$.
This representation of AUC is shown in Figure 1.4.
Thus the AUC parameter
can be viewed as the average response level during the observation
period multiplied by the length of observation.
In identical manner to
one-way ANOVA, the total corrected AUC sum of squares can be
partitioned into between- and within-subject sums of squares.
Specification of the ANOVA model utilizing the AUC parameter is given
in Chapter 2.

FIGURE 1.3
HYPOTHETICAL AVERAGE RESPONSE TO TREATMENT j.
(Vertical axis: Average Response.
Horizontal axis: Time Following Initiation of Treatment.)

FIGURE 1.4
TIME-RESPONSE CURVE OF FIGURE 1.3 SHOWING
AVERAGE RESPONSE $f_u$.
(Vertical axis: Response.
Horizontal axis: Time Following Initiation of Treatment.)
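
The relationship between (1.26) and (1.27) can be illustrated with a
simple numerical estimate. The sketch below uses the trapezoidal rule,
which corresponds to joining the observed responses by straight lines;
this is only one possible estimator and is not necessarily the one
developed in Chapter 2. The function names and the data are hypothetical.

```python
def auc_trapezoid(times, responses):
    """Trapezoidal-rule estimate of the area under the response curve."""
    points = list(zip(times, responses))
    return sum((t1 - t0) * (y0 + y1) / 2.0
               for (t0, y0), (t1, y1) in zip(points, points[1:]))


def average_response(times, responses):
    """AUC divided by the length of the observation period, as in (1.27)."""
    return auc_trapezoid(times, responses) / (times[-1] - times[0])


if __name__ == "__main__":
    times = [0, 1, 2, 4, 8]       # unequally spaced occasions
    responses = [0, 6, 8, 6, 2]   # rapid rise, then gradual decline
    print(auc_trapezoid(times, responses))
    print(average_response(times, responses))
```

For these made-up data the average response lies between the smallest
and largest observed responses, as (1.27) requires.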
While AUC is a fundamental concept in pharmacokinetics (Gibaldi
and Perrier, 1975) and has been extensively used in the
pharmacological literature (e.g., see Hamilton et al., 1981; Garty
and Hurwitz, 1980; Ginsburgh and McCracken, 1980), its use in clinical
trial applications is almost non-existent.
The one exception has been
the field of analgesic research which has employed AUC-like parameters
to assess treatment efficacy in relieving pain.
Using patient subjective pain ratings
(0 = no pain, ..., 3 = severe pain), the 'sum
of pain intensity differences' (SPID) is obtained by taking the
difference of the baseline rating and the score at each of the p
observation points (defined as PID's) and then summing these PID
differences.
Plotting PID values according to time of response leads
to a plot similar in shape to Figure 1.3 (e.g., see James et al.,
1982).
As first defined by Bellville et al. (1968) and as most often
used (e.g., see Kaiko, 1980; Laska and Sunshine, 1981), the SPID
parameter is an unweighted sum of PID scores.
As such, the SPID score
generally will not be equivalent to the AUC parameter.
One group of
investigators (James et al., 1982) has recognized this and has
suggested a weighted SPID score, where the weights are time interval
lengths, which is equivalent to the AUC value when $f_j(t)$ is estimated
using linear interpolation.
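
A brief sketch shows why the unweighted SPID and an AUC computed from the
same PID scores can differ when observation times are unequally spaced.
The ratings, times, and function names are hypothetical, and the
treatment of the baseline as a PID of zero at time zero is an assumption
made only to close the curve on the left. The weights used by James et
al. are not reproduced here; the AUC is simply computed by linear
interpolation.

```python
def pain_intensity_differences(baseline, scores):
    """PID at each occasion: baseline rating minus the observed rating."""
    return [baseline - s for s in scores]


def unweighted_spid(pids):
    """Sum of PID scores, ignoring the spacing of the occasions."""
    return sum(pids)


def pid_auc(times, pids):
    """Area under the PID curve by linear interpolation.

    Assumption: the curve starts at PID = 0 at time 0 (the baseline).
    """
    ts = [0.0] + list(times)
    ys = [0.0] + list(pids)
    return sum((ts[k + 1] - ts[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(ts) - 1))


if __name__ == "__main__":
    times = [0.5, 1, 2, 4, 6]   # hours after dosing
    scores = [2, 1, 1, 2, 3]    # ratings on the 0 to 3 scale
    pids = pain_intensity_differences(3, scores)
    print(unweighted_spid(pids), pid_auc(times, pids))
```

In this example the later, longer intervals receive more weight in the
AUC than in the unweighted sum, so the two summaries disagree.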
No discussion of the statistical properties of AUC analysis and
its possible role in the analysis of repeated measures data has
appeared in the statistical literature, to the knowledge of this
writer.
A major focus of this dissertation is the review of the use
of AUC analysis for the treatment-by-occasions design.
Methods to
estimate AUC are presented in Chapter 2.
b.  Zerbe-Walker Test
Zerbe and Walker (1977) introduced a summary measure somewhat
similar to the AUC parameter that can be used to assess treatment
differences with respect to response curve shape.
The Zerbe-Walker
parameter d(x,y) is based on cumulating distances between response
curves x(t) and y(t) over the interval $[t_1, t_p]$.
The distance between
two curves x(t) and y(t) over $[t_1, t_p]$ is defined as

$$d(x,y) = \left\{ \int_{t_1}^{t_p} [x(t) - y(t)]^2\,dt \right\}^{1/2}. \qquad (1.28)$$

Using (1.28), the total sum of squared distances from the grand mean
curve can be partitioned into sums of squares between groups and
within groups in a manner analogous to ordinary randomized analysis of
variance.
The Zerbe-Walker (Z-W) test ANOVA table for the one-way
layout is given in Table 1.1.

TABLE 1.1
ANALYSIS OF VARIANCE TABLE FOR
ZERBE-WALKER TEST STATISTIC: ONE-WAY LAYOUT (1)

Source            Sum of Squares    DF     Mean Square
Between Groups    B                 g-1    B/(g-1)
Within Groups     W                 n-g    W/(n-g)
Total             T                 n-1

Sums of squares:

$$B = \sum_j n_j \int_{t_1}^{t_p} [\bar{y}_{\cdot j}(t) - \bar{y}_{\cdot\cdot}(t)]^2\,dt$$

$$W = \sum_j \sum_i \int_{t_1}^{t_p} [y_{ij}(t) - \bar{y}_{\cdot j}(t)]^2\,dt$$

$$T = \sum_j \sum_i \int_{t_1}^{t_p} [y_{ij}(t) - \bar{y}_{\cdot\cdot}(t)]^2\,dt$$

Expected mean squares:

Between Groups:
$$\int_{t_1}^{t_p} \sigma^2(t)\,dt + \Big[ \sum_j n_j \int_{t_1}^{t_p} (\mu_j(t) - \bar{\mu}(t))^2\,dt \Big] / (g-1)$$

Within Groups:
$$\int_{t_1}^{t_p} \sigma^2(t)\,dt$$

where
$\bar{y}_{\cdot j}(t) = \sum_i y_{ij}(t) / n_j$;
$\bar{y}_{\cdot\cdot}(t) = \sum_j \sum_i y_{ij}(t) / n$;
$\bar{\mu}(t) = \sum_j n_j \mu_j(t) / n$.

(1) Reprinted from Zerbe and Walker (1977).

The authors suggested using the F-statistic based on the Z-W
parameter as a randomization test statistic to test
$H_0: \mu_1(t) = \cdots = \mu_g(t)$ for all t in $[t_1, t_p]$ versus
$H_a: \mu_j(t) \neq \mu_k(t)$ for some $j \neq k$
for some subinterval of $[t_1, t_p]$.
The exact permutation p-value is
determined by evaluating F for each of the
$R = n! / \prod_{j=1}^{g} n_j!$ possible
assignments of n subjects to g groups.
An approximate p-value is
obtained by evaluating F for a random sample (with replacement) of the
R possible assignments.
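
The approximate randomization procedure can be sketched in a few lines.
The version below is an illustration under several assumptions that are
not part of the original description: all subjects are observed at the
same times, each subject's curve is formed by joining the observations
with straight lines (so the integrals of squared differences have a
closed form on each interval), and the p-value counts the observed
assignment as one of the sampled assignments. The function names and the
data are hypothetical.

```python
import random


def squared_integral(diff, times):
    """Integral of the square of the piecewise-linear curve (times, diff)."""
    return sum((t1 - t0) * (d0 * d0 + d0 * d1 + d1 * d1) / 3.0
               for t0, t1, d0, d1 in zip(times, times[1:], diff, diff[1:]))


def mean_curve(curves):
    """Pointwise mean of a list of curves observed at common times."""
    return [sum(column) / len(curves) for column in zip(*curves)]


def zerbe_walker_f(curves, labels, times):
    """F = [B/(g-1)] / [W/(n-g)] with B and W as in Table 1.1."""
    groups = sorted(set(labels))
    n, g = len(curves), len(groups)
    grand = mean_curve(curves)
    between = within = 0.0
    for group in groups:
        members = [c for c, lab in zip(curves, labels) if lab == group]
        centre = mean_curve(members)
        between += len(members) * squared_integral(
            [a - b for a, b in zip(centre, grand)], times)
        within += sum(
            squared_integral([a - b for a, b in zip(c, centre)], times)
            for c in members)
    return (between / (g - 1)) / (within / (n - g))


def randomization_p_value(curves, labels, times, n_samples=999, seed=0):
    """Approximate p-value from randomly sampled group assignments."""
    rng = random.Random(seed)
    observed = zerbe_walker_f(curves, labels, times)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(shuffled)
        if zerbe_walker_f(curves, shuffled, times) >= observed:
            hits += 1
    # Assumption: include the observed assignment in the reference set.
    return (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    times = [0, 1, 2, 4]
    curves = [[0, 2, 3, 1], [0, 3, 4, 2], [1, 2, 4, 1],
              [0, 1, 1, 0], [1, 1, 2, 1], [0, 0, 1, 0]]
    labels = ["A", "A", "A", "P", "P", "P"]
    print(zerbe_walker_f(curves, labels, times))
    print(randomization_p_value(curves, labels, times))
```

With only six subjects there are few distinct assignments, so the
attainable p-values are coarse; the sketch is meant to show the
mechanics rather than to provide a usable analysis.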
Zerbe (1979) extended the use of the Z-W statistic to the case of
the randomized block repeated measures design where the subject
receives each of the g treatments in a randomized order.
He developed
an approximate randomization test for testing treatment differences
over subintervals of $[t_1, t_p]$ which does not require the evaluation of
random samples of the $(g!)^n$ permutations of g treatments to n
subjects.
The derivation of the approximate randomization test is
based on an approach used by Pitman (1937) whereby the first two
moments of the randomization test statistic $B(t) = T(t) / [T(t) + R(t)]$
(T(t) and R(t) represent treatment and residual sums of squares at
time t) are equated to the first two moments of a beta distribution.
The parameters of the beta distribution, $\nu_1$ and $\nu_2$, are then found as

$$\nu_1 = \frac{2\,(E[B(t)])^2\,(1 - E[B(t)])}{\operatorname{Var}[B(t)]} - 2\,E[B(t)], \qquad \nu_2 = \nu_1\,(1 - E[B(t)]) / E[B(t)]. \qquad (1.29)$$

In a small simulation study, p-values for the approximate test were
seen to be contained in the 99 percent confidence intervals estimated
from a random sample of 1000 permutations.
The similarity of the Zerbe-Walker statistic and the AUC
statistic is apparent from review of equations (1.26) and (1.28).
While the AUC approach detects differences in
area-under-the-time-response curve, the Zerbe-Walker statistic involves
integration of
squared differences between response curves.
These two univariate
response indices are discussed in further detail in Chapter 2.
1.3  COMPARISON OF ANALYSIS METHODS
The methods of analysis of repeated measures data that have been
discussed differ considerably with respect to applicability to
particular data situations as well as with respect to statistical
characteristics (e.g., assumptions, power, robustness).
For example,
endpoint analysis may be the method of choice to analyze treatment
differences in a trial of a drug for which a monotonically increasing
response is expected over the period of observation.
In the case of a
single-dose trial of an analgesic for the temporary relief of chronic
pain, endpoint analysis is probably inappropriate since the last
subject response may be far different from the subject's highest
response due to a lowered level of active drug metabolites in the
blood.
Here other measures, such as total cumulative response using
AUC analysis or maximum response level, are perhaps preferable.
In other experimental situations, the form of the response
function may
be of primary interest, as in the study of human growth for some age
range in some population.
Here, AUC analysis, which eliminates the
dimension of time-response through integration over time, cannot be
used.
Growth curve analysis is the method of choice in this
situation.
The effect of features of the experimental situation on the
choice of an analytic strategy for repeated measures data is discussed
more fully in Chapter 7.
In this section, the literature is reviewed
to compare the various methods of analysis with respect to statistical
characteristics such as power and robustness.
1.3.1  Mixed Model Approaches
In a Monte Carlo study, Collier et al. (1967) compared the Type I
error rates of the conventional split-plot F-test,
$\hat{\epsilon}$-adjusted, and
Greenhouse-Geisser conservative test ($\epsilon$ set equal to 1/(p-1)) for
tests of treatment-by-occasion interaction ($F_{AB}$) and occasion main
effects ($F_B$).
Various combinations of group size (n = 5, 15), $\alpha$-level
($\alpha$ = .10, .05, .025, .01), and circularity patterns ($\epsilon$ ranging
from 1 to .45) were studied for a three-treatment by four-occasion design.
Data were generated for a multivariate normal distribution with equal
number of subjects per group and equal covariance matrices among groups.
For the test of occasion main effect, $F_B$, the authors found that the
unadjusted test results in p-values becoming smaller than the nominal
critical value as $\epsilon$ decreases and with smaller sample sizes.
The anti-conservativeness was seen to increase greatly when
$\epsilon < 0.64$.
The p-values of the $\hat{\epsilon}$-adjusted test were found to be
generally very close to
nominal values, except for values of $\epsilon$ close to unity, in which case
the $\hat{\epsilon}$-adjusted
procedure was found to be conservative.
Type I error rates
for the conservative test were much less than nominal rates for all but
very low values
of $\epsilon$,
suggesting the conservative test is generally much
too conservative.
Results for tests of $F_{AB}$ were found to be similar to
those described above but with slightly larger differences between
observed and nominal Type I error rates.
Other investigators
(Stoloff, 1970; Wilson, 1975; Huynh and Feldt, 1976, 1980) found similar
results regarding Type I error rates for the unadjusted and adjusted
mixed model procedures.
Huynh and Feldt (1980) investigated the traditional F-test in the
situation of general covariance structure both when the g treatment
groups had and did not have equal covariance matrices.
The effect of
unequal group sizes was also investigated.
For unequal covariance
matrices, variation in group sample sizes and degree of circularity were
found to be important factors with respect to Type I error rates.
With
compound symmetric covariance matrices, inequality of group sizes was
found to result in liberal tests, and more so for $F_{AB}$ than $F_B$.
With
equal group sizes and non-circular, unequal covariance matrices, overly
liberal tests were again noted, especially for the interaction test.
1.3.2
MANOVA versus Mixed Model ANOVA
Like mixed model ANOVA, multivariate analysis assumes multivariate
normality. In contrast to mixed model ANOVA, however, MANOVA does not
assume any particular form of the population covariance matrix
$\Sigma$. The obvious question is why not always use MANOVA procedures
instead of the traditional mixed model or $\epsilon$-adjusted ANOVA
procedures. While multivariate tests such as Roy's largest root and
Hotelling's $T^2$ generally control for Type I errors, and the
univariate and multivariate procedures are asymptotically equivalent
(Danford et al., 1960), multivariate tests do not always give the most
powerful test of within-subject hypotheses. When circularity properties
exist the mixed model ANOVA is always at least as powerful as
multivariate procedures.

Davidson (1972) states that MANOVA is nearly as powerful as mixed
model ANOVA, assuming circularity, when the sample size per group is
greater than the number of conditions by 20 or more. In the situation
where the sample size approaches the number of conditions, the power of
the MANOVA tests is drastically reduced. In fact, when the sample size
is less than the sum of the number of treatment groups and occasions
minus one (i.e., $N < p+g-1$), multivariate tests cannot be performed
since in this case the error matrix E of equation (1.14) is singular.
In a simulation study, Rogan et al. (1979) compared MANOVA tests
of within-subjects effects with mixed model approaches (conventional
F-test, conservative, $\hat{\epsilon}$-adjusted, and
$\tilde{\epsilon}$-adjusted tests) with respect to Type I error and
power rates. A three-treatment by four-occasion design with 13 subjects
per treatment group was considered. Ten different data sets were
generated reflecting varying levels of circularity ($\epsilon$ ranged
from 1 to 0.48) and varying levels of heteroscedasticity (Box's
criterion M was used to measure departure from the assumption
$U'\Sigma_1 U = \cdots = U'\Sigma_g U$). For multivariate tests,
Hotelling's $T^2$ statistic was used for tests of condition main
effects, and Roy's largest root, Pillai-Bartlett trace, and Wilks'
$\Lambda$ were used to test for treatment-by-occasion interaction. In
the situation of circularity and homoscedasticity, the traditional
split-plot F-test ($F_{conv}$) was found to be more powerful than any
alternative test. When circularity holds but homoscedasticity does not,
$F_{conv}$ was seen to compare favorably with the $\epsilon$-adjusted
procedures with respect to power and was superior to the multivariate
procedures. As the level of circularity decreases a loss of power was
noted for all testing procedures, with the multivariate procedures
generally having less power than the $\epsilon$-adjusted tests of
treatment-by-occasion interaction. In this situation, the $F_{conv}$
test was seen to give poor Type I error control.
Based on their results the authors conclude:
"The data from this investigation do not suggest a clear-cut
rule for choosing between the $\tilde{\epsilon}$ and
$\hat{\epsilon}$-adjusted univariate F tests and their multivariate
counterparts. In general, if the value of the parameter $\epsilon$ is
greater than or equal to 0.75, the $\tilde{\epsilon}$-adjusted
procedure provides the most robust and most powerful test of both
within-subject effects, followed by the $\hat{\epsilon}$-adjusted F
test and finally the multivariate tests. If $\epsilon < 0.75$ the
multivariate procedures are consistently the most powerful tests of
either within-subjects hypothesis followed by the
$\tilde{\epsilon}$- and $\hat{\epsilon}$-adjusted F tests." (p. 282)
The authors in similar fashion investigated the effect of non-normality
and found that the multivariate $T^2$ test of condition main effects is
most powerful but overly liberal with respect to Type I error rates and
recommended using an $\epsilon$-adjusted test in this situation.
Ito and Schull (1964) studied the effects of violation of the
assumption of equal among-group covariance matrices on Hotelling's
$T^2$ test and found that while the test was fairly robust with equal
sample sizes, if sample sizes vary greatly a large effect on the level
of significance and power of the test occurs even when violations are
moderate. These effects are exacerbated as the number of occasions
increases. Holloway and Dunn (1967) found that equal sample sizes tend
to keep the level of significance of the $T^2$ test close to the nominal
value in the situation of heteroscedasticity but do not help in
maintaining the power of the test.
Approaching the problem from the concept of randomization, Koch et
al. (1980) provide some guidelines in the choice between mixed model
ANOVA and MANOVA for analysis of within-subject factors when assumptions
of normality and homoscedasticity are violated.
By randomization
arguments, the authors state that for sufficiently large samples (e.g.,
at least 20 per treatment group) randomization of subjects to treatment
results in valid approximate F-tests of treatment effects, and
randomization of conditions to subjects results in valid mixed model
tests of condition effects. In the longitudinal design, however,
occasions obviously cannot be randomized to subjects, and
randomization arguments cannot be used to motivate the mixed model
test of occasion main effect.
The mixed model F-test of treatment-by-
condition interaction cannot be justified by randomization arguments.
When there are at least 30 subjects per treatment group, the authors
consider the MANOVA procedure justifiable by randomization arguments
even when the underlying assumptions are violated.
The principal
advantage of MANOVA over the mixed model approach is that it
"permits valid tests to be undertaken for within experimental
unit hypotheses concerning the comparison of conditions for
situations where there is no randomization of conditions to
observational unit.
Similarly, it also permits tests to be
undertaken for condition x treatment interaction.
On the other
hand, it may either be inapplicable or have unsatisfactory power
when the number of observational units per experimental unit
either exceeds or is relatively large with respect to the number
of experimental units." (p. 254)
Results by Rogan et al. indicate that uniform superiority of any
of the mixed model and MANOVA approaches does not exist for all levels
of covariance matrix heterogeneity and non-circularity.
Even choosing
which multivariate test criterion to use remains an open question,
although from robustness considerations, Olson (1974) recommends the
Pillai-Bartlett trace criterion.
Addressing the question of when to
use mixed model or MANOVA analysis for the occasions (only one
treatment) design, Davidson (1972) states:
"Large differences in power occur only when small but reliable
effects are present with effects highly variable but averaging
to zero over subjects; the multivariate test is preferable in
such cases."
It has been suggested (Rouanet and Lepine, 1970; Huynh and
Feldt, 1970) that the decision of whether to use the mixed model
univariate approach or the multivariate approach might be reasonably
based on preliminary testing of circularity and homoscedasticity
assumptions using Box's modified likelihood ratio criterion M to test
$U'\Sigma_1 U = \cdots = U'\Sigma_g U$, where U is a contrast matrix of
interest for the within-subject factor space, and using Mauchly's
sphericity criterion W to test $U'\Sigma U = \lambda I$. Questions
regarding test performance in the presence of non-normality arise,
however.
Huynh and Mandeville (1979)
investigated the effect of non-normality on the Mauchly test and found
that the W criterion tends to err on the conservative side for light
tailed symmetric distributions.
Rogan et al. (1979) and Keselman et al. (1980) studied the use of the
M and W tests as preliminary tests of circularity for normal and
non-normal data and found that the tests were almost useless. Keselman
et al. state that this conclusion is supported by the
"virtually identical Type I and power rates between the sequential
strategies that used preliminary tests and the uniformly adopted
corrected degrees of freedom tests that did not. ... Thus it
appears that for all but the most minute departures from the
validity conditions, the rejection of the circularity hypothesis
is a fait accompli."
It seems then that preliminary testing of circularity has little to
recommend it.
Review of the simulation study results by Rogan et al. suggests that
the investigator wanting to choose between mixed model ANOVA and MANOVA
analysis strategies for analysis of within-subject effects must reach a
decision based on procedures that are in some sense subjective.
Calculation of the sample estimate of $\epsilon$ as an initial step will
help the decision process in that $\hat{\epsilon}$ values close to unity
clearly recommend the mixed model approach and small $\hat{\epsilon}$
values recommend MANOVA. What are small values? Based on results
presented in Rogan et al. (1979), one decision rule for choice between
MANOVA and mixed model is to use MANOVA when the data are not highly
skewed, there are reasonably more subjects per group than conditions,
and $\hat{\epsilon}$ is less than 0.75; use traditional mixed model
ANOVA when $\hat{\epsilon}$ is very close to unity (say
$\hat{\epsilon} > 0.95$); and use an adjusted mixed model procedure
otherwise.
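The decision rule above can be sketched in code. This is only a minimal
illustration, not the procedure used by Rogan et al.: the function
names `gg_epsilon` and `choose_analysis` are hypothetical, the estimate
$\hat{\epsilon}$ is computed with the standard Box/Greenhouse-Geisser
formula from a sample covariance matrix of the p occasions, and the
vague condition "reasonably more subjects per group than conditions" is
left to the analyst as a yes/no flag.

```python
def gg_epsilon(cov):
    """Box/Greenhouse-Geisser epsilon-hat from a p x p covariance matrix.

    cov is a list of p lists (assumed symmetric). The value lies between
    1/(p-1) and 1, with 1 indicating circularity.
    """
    p = len(cov)
    diag_mean = sum(cov[j][j] for j in range(p)) / p
    grand_mean = sum(sum(row) for row in cov) / (p * p)
    row_means = [sum(row) / p for row in cov]
    sum_sq = sum(x * x for row in cov for x in row)
    numerator = (p * (diag_mean - grand_mean)) ** 2
    denominator = (p - 1) * (
        sum_sq - 2 * p * sum(m * m for m in row_means) + (p * grand_mean) ** 2
    )
    return numerator / denominator


def choose_analysis(eps_hat, highly_skewed, enough_subjects):
    """Apply the cutoffs quoted in the text (0.75 and 0.95).

    highly_skewed and enough_subjects are judgment calls supplied by the
    analyst; they are assumptions of this sketch, not computed here.
    """
    if eps_hat > 0.95:
        return "mixed model ANOVA"
    if eps_hat < 0.75 and not highly_skewed and enough_subjects:
        return "MANOVA"
    return "adjusted mixed model ANOVA"


# Compound symmetric covariance matrix (p = 4, common correlation 0.5):
# circularity holds, so epsilon-hat equals 1 up to rounding.
compound_symmetric = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
print(choose_analysis(gg_epsilon(compound_symmetric), False, True))
```

In practice $\hat{\epsilon}$ would be computed from the pooled
within-group covariance matrix of the observed data rather than from a
known matrix as in this toy call.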
1.3.3
Multivariate Procedures versus Combined Univariate Tests
It might be conjectured that the properties of multivariate
analysis are extensions of the univariate case and that conclusions
reached by the two methods are generally consistent. That this is not
the case is illustrated by what has been labeled Rao's paradox (1966).
The "paradox", described by Healy (1969) for the bivariate case, is
that univariate tests for occasion effects $H_{01}: \mu_1 = 0$ and
$H_{02}: \mu_2 = 0$ can both be significant at a given $\alpha$-level
while Hotelling's $T^2$ test of $H_0: [\mu_1\ \mu_2] = [0\ 0]$ can be
non-significant, and vice versa. This situation can be explained by
review of Figure 1.5 which displays the rejection points for the
univariate tests and the 95% acceptance ellipsoid for the $T^2$ test in
the situation where the correlation between variables, $\rho$, equals
0.5. The difference in power is a function of the correlation between
variables. When the correlation is zero, the $T^2$ test has appreciably
less power when population means are equal than the combined univariate
tests. As the absolute value of $\rho$ increases the power of the $T^2$
test eventually surpasses that of the univariate test (Morrison, 1976).
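A small numerical sketch may make the paradox concrete. It rests on a
simplifying assumption that is not in the text: the covariance matrix
is treated as known (unit variances, correlation $\rho$), so the
univariate tests are z-tests and the bivariate statistic is compared to
a chi-square critical value with 2 degrees of freedom instead of the
Hotelling $T^2$ distribution. The function name `rao_paradox_demo` and
the standardized means used below are hypothetical.

```python
import math
from statistics import NormalDist


def rao_paradox_demo(z1, z2, rho=0.5, alpha=0.05):
    """Return (univariate 1 significant, univariate 2 significant,
    bivariate test significant) for standardized means z1 and z2.

    Known-covariance simplification: the bivariate statistic is
    z' R^{-1} z with R = [[1, rho], [rho, 1]], referred to chi-square(2).
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # The upper-alpha point of chi-square with 2 df is exactly -2 ln(alpha).
    chi2_crit = -2.0 * math.log(alpha)
    quad_form = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return abs(z1) > z_crit, abs(z2) > z_crit, quad_form > chi2_crit


# Both univariate tests reject, the bivariate test does not.
print(rao_paradox_demo(2.0, 2.0))
# Neither univariate test rejects, the bivariate test does.
print(rao_paradox_demo(1.9, -1.9))
```

With $\rho = 0.5$ the first point lies inside the acceptance ellipse
but outside both univariate acceptance intervals, and the second point
does the reverse, which is the pattern Figure 1.5 displays.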
FIGURE 1.5
Illustration of Rao's Paradox for Bivariate Case when $\rho = 0.5$
[Figure not reproduced: 95% acceptance ellipsoid for the $T^2$ test
overlaid on the univariate acceptance region. Key to the marked
regions: both univariate tests significant, $T^2$ not significant;
neither univariate test significant, $T^2$ significant; univariate test
for $Y_1$ significant, $T^2$ not significant; univariate test for $Y_2$
significant, $T^2$ not significant.]

Criticism of combining univariate test results through multiple
comparison procedures is often that this is a crude method which ignores
the correlation structure of the repeated measures while multivariate
approaches do incorporate such a structure. In addition, when the number
of occasions is large, the power of a method combining univariate
results across occasions, such as the Bonferroni multiple comparison
procedure, may be low. This provides a principal motivation for use
of multivariate procedures.
However, problems arise for the
multivariate approach from violations of more stringent and complex
assumptions and from problems of interpretation of test statistics.
Regarding the former point, Kowalski (1972) states,
"When
the covariance matrices are unequal, increasing the number
of variables causes a direct increase in the level of significance,
i.e. the problem of a Type I error, of the test. People have
complained long and loud about how the use of separate univariate
t-tests muddles up the composite level of significance (though the
overall level can often be calculated and almost always bounded)
but, unless the covariance matrices are equal, the same sort of
phenomena effects the multivariate test and the greater the number
of variables the greater the distortion of the nominal significance
level." (p. 122)
Methods to assess violations of assumptions in the multivariate case,
such as multivariate goodness-of-fit tests for normality, are less well
developed.
Other univariate concepts such as skewness and kurtosis have
no direct analogy in the multivariate case.
Regarding the problems of interpretation, growth curve analysis
provides a good example of some of the potential problems associated
with assessing multivariate analysis results. One important use of
growth curve analysis over the profile analysis MANOVA approach occurs
when the number of repeated measures is large relative to the number
of subjects per treatment group. The use of growth curve analysis
reduces the dimensionality of the problem to the lowest order
polynomial curve which adequately fits the data. However, in certain
experimental situations, problems of interpretation remain. For a
fitted fourth-order polynomial curve, what is the physical
interpretation of a significant fourth-order coefficient? Why, other
than mathematical convenience, has the polynomial function been chosen
to be fit to the data? For treatment comparisons, what is the
interpretation of significant differences among treatment groups with
respect to fourth-order coefficients? Grant (1956) discusses an
example where the interpretation of polynomial coefficients is
straightforward and a result of theory behind the observed phenomena.
He suggests, "It is not a particularly wise procedure to test
components for which there are not specific a priori hypotheses."
Meyers (1979), in discussing polynomial curve fitting, amplifies this
point:
"Any set of a data points can be fitted by a polynomial of order
a-1, but if the population function is not polynomial the polynomial
analysis will be misleading. It is also dangerous to identify
statistical components freely with psychological processes. It is one
thing to postulate a cubic component of (factor) A, to test for it,
and to find it significant, thus substantiating the theory. It is
another matter to assign psychological meaning to a significant
component that has not been postulated on a priori grounds."
1.4
RESEARCH PROPOSAL
This dissertation examines several new approaches to the analysis
of repeated measures data. Particular attention is given to the
applicability of analytic approaches to pharmaceutical clinical trial
efficacy data. Specific areas of interest are:
1.
Examination of the concept of area-under-the-time-response
curve (AUC) as a response parameter.
Two methods of estimating AUC are considered. One procedure
assumes $f_j(t)$ can be approximated over observation period
$[t_1,t_p]$ by a qth-order polynomial curve $\hat{y}_{ij}(t)$ estimated
from the data using growth curve analysis. The second estimation
procedure uses the observed data vector $y_{ij}$ without explicitly
specifying $f_{ij}(t)$. The accuracy of the estimation procedures is
compared.
The use of AUC analysis is compared to the mixed model ANOVA
test of treatment main effects. Conditions under which equivalent
test statistics are obtained are specified. The power of the AUC
F-statistic is compared to the Zerbe-Walker test statistic when linear
interpolation estimation is used to approximate response function f(t)
on subintervals of $[t_1,t_p]$.
The use of AUC analysis when treatment-by-time interaction is
present is considered. The issue of treatment-by-time interaction and
its relationship to the interpretation of treatment main effects is
addressed. A weighted AUC approach, which bears a certain resemblance
to the method of direct standardization in epidemiologic research, is
proposed.
2.
Development of a piecewise growth curve model for the
modeling of response to treatment whose form changes over time.
For certain response phenomena, the form of the response
curve is segmented.
That is, one response function operates for one
interval following initiation of treatment, and another function is in
effect for subsequent intervals.
Assuming the time of change of
response is known, a piecewise growth curve model is developed.
Implications of use of this model rather than a single growth curve
function over $[t_1,t_p]$ are discussed.
3.
Comparison of analysis methods through an analysis of
clinical trial data sets.
The results of analyses of clinical trial data using separate
univariate analyses, mixed model ANOVA, AUC analysis, Zerbe-Walker
test analysis, and growth curve analysis are compared.
The validity
of assumptions made by the different methods is reviewed relative to
characteristics displayed by the data. The data sets to be utilized
are:
a.
a clinical trial of the efficacy of a non-steroidal
anti-inflammatory drug in the treatment of rheumatoid arthritis. This
randomized parallel-group (active vs. placebo) study at four
investigative sites assessed the relative efficacy of active drug to
reduce pain and level of immobility due to arthritis. The response
measure of interest is the number of painful joints experienced by the
study subject, observed at baseline and weekly intervals for four
weeks following initiation of treatment. The sample size for analysis
is 92.
b.
a clinical trial of the efficacy of a non-steroidal
arthritis treatment relative to standard (aspirin) treatment. This is
an independent study of the same test treatment as is investigated in
a. above, this time measured against standard treatment. Data
characteristics differ in that data from eight investigative units are
utilized and response to treatment (number of painful joints) is
measured at 11 time points during a six and one-half month treatment
period. The sample size for analysis is 81.
c.
a clinical trial of the efficacy of an investigational
drug in the treatment of depression. The test drug, an inhibitor of
the enzyme monoamine oxidase (MAO), was tested versus
placebo in a randomized parallel-group trial initially involving 29
outpatients. Response to treatment was measured at baseline, and
Weeks 1, 2, 3, 4, and 6 following initiation of treatment. Favorable
response is indicated by a reduction in total symptomatology score
derived from the Hamilton Depression Inventory.
4.
A summary comparison of analysis strategies.
Based on issues addressed in the items above, considerations
relevant to planning the analysis of a longitudinal data set are
discussed.
The applicability of the various analysis methods to
pharmaceutical clinical trial data is reviewed in light of both
experimental and statistical modeling issues.
In the next chapter, the AUC approach to analysis of treatment
main effects for repeated measures data is considered. The
statistical model and methods to estimate AUC are presented. The
issue of the interpretation of treatment main effects when
treatment-by-time interaction is present is considered in Chapter 3. A
weighted AUC parameter is proposed for this situation. The powers of
the AUC and Zerbe-Walker tests are compared in Chapter 4. An
alternative analysis strategy, the use of piecewise growth curve
models, is presented in Chapter 5. Issues relative to the
use of piecewise models rather than unconstrained growth curve models
are reviewed. In Chapter 6, results of analyses of three clinical
trial data sets are presented in order to illustrate the applicability
of the various analysis approaches to pharmaceutical clinical trial
data and to compare inferences made by the different methods.
Characteristics of the data sets such as circularity, normality, and
existence of missing data are compared to assumptions made by the
analysis methods. Considerations in selecting an analysis method for
repeated measures data are discussed in Chapter 7 including issues
such as the use of scientific knowledge to motivate modeling
approaches, the effect of missing data on choice of analysis, and the
applicability of certain analysis methods to address particular
research questions. In Chapter 8 suggestions for future research are
outlined.
CHAPTER 2
AREA-UNDER-THE-CURVE ANALYSIS
2.1
INTRODUCTION
A potentially useful approach to the analysis of treatment main
effects in response-over-time studies is area-under-the-time-response
curve (AUC) analysis. Response to treatment j can be plotted as a
function of time as in Figure 2.1. This plot might represent the true
response curve for an individual assigned to treatment j or the true
average response curve to treatment j for some population. The
interval $[t_1,t_p]$ represents the time interval of observation. The
response to treatment j at time $t_k$, $t_1 \le t_k \le t_p$, is
represented by $f_j(t_k)$. An intuitively appealing measure of
cumulative response to treatment j is the sum of all the response
intensities $f_j(t_k)$. This sum, for continuous response function
$f_j(t)$, is represented by

(2.1)    $AUC_j = \int_{t_1}^{t_p} f_j(u)\,du.$

This parameter, measured in response parameter units multiplied by
time units, represents the shaded area under $f_j(t)$ in Figure 2.1.
The AUC parameter can be viewed as a summary or index measure of
cumulative response to treatment over the observation period
$[t_1,t_p]$.
FIGURE 2.1
HYPOTHETICAL RESPONSE CURVE TO TREATMENT j.
[Figure not reproduced. Vertical axis: Response. Horizontal axis: Time
from Initiation of Treatment j.]
FIGURE 2.2
TIME RESPONSE CURVE OF FIGURE 2.1 SHOWING AVERAGE RESPONSE $f_u$.
[Figure not reproduced. Vertical axis: Response. Horizontal axis: Time
from Initiation of Treatment j.]
In similar fashion to (2.1), the AUC parameter can be defined for
subintervals of $[t_1,t_p]$. To further illustrate the nature of the
AUC response parameter, note that (2.1) can be rewritten as

(2.2)    $AUC_j = f_u \times (t_p - t_1)$ for some $f_u$ where $f_{min} \le f_u \le f_{max}$.

Here $f_u$ represents an average response to treatment j exhibited
during the period of observation $[t_1,t_p]$. This representation is
displayed in Figure 2.2. This indicates that the AUC parameter can be
viewed as a measure of average response during the period of
observation multiplied by the length of the observation period.
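As a small worked illustration of (2.1) and (2.2), assume a
hypothetical linear response curve $f_j(t) = 12 - 2t$ observed over
$[t_1,t_p] = [0,4]$. The curve and the interval are invented for
illustration only and are not taken from the clinical trial data
analyzed later.

```latex
\[
AUC_j = \int_{0}^{4} (12 - 2u)\,du = 48 - 16 = 32, \qquad
f_u = \frac{AUC_j}{t_p - t_1} = \frac{32}{4} = 8, \qquad
f_{min} = 4 \le f_u \le f_{max} = 12 .
\]
```

The cumulative response of 32 response-by-time units is thus the same
information as an average response of 8 sustained over four time units.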
In practice the form of response function $f_j(t)$ is unknown.
However, the AUC response for each study subject i from treatment j,
$AUC_{ij}$, can be estimated based on the observed response vector
$y_{ij}$. Various methods of AUC estimation are reviewed in section
2.3. Using the AUC estimate for each study subject, the analysis can
proceed in an identical manner as for ordinary univariate ANOVA, i.e.,
the total corrected AUC sum of squares can be partitioned into between-
and within-subject sums of squares from which significance tests using
F-statistics can be constructed. The ANOVA model for the AUC parameter
is presented in section 2.2. Important experimental and statistical
considerations impact on the applicability of AUC analysis to repeated
measures data. Experimental considerations, such as seeking to
establish a statistical model to conform to study goals and to
incorporate knowledge of the form of the response function f(t), are
discussed in Chapter 7. Another experimental issue, the meaning of
tests of treatment main effects in the presence of treatment-by-time
interaction, is discussed in Chapter 3.
2.2
STATISTICAL MODEL
Assuming an AUC value, $\widehat{AUC}_{ij}$, is calculated in some
unbiased manner for subject i from treatment j with response vector
$y_{ij}' = (y_{ij1}, \ldots, y_{ijp})$, a univariate analysis of
variance center point model can be specified for the single factor
repeated measures design as

(2.3)    $\widehat{AUC}_{ij} = \mu + \alpha_j + e_{ij}$,  $i = 1, \ldots, n_j$;  $j = 1, \ldots, g$;  $\sum_j n_j = n$;  $\sum_j \alpha_j = 0$;

where $\mu$ is an overall AUC mean, $\alpha_j$ is the effect of the jth
treatment, and $e_{ij}$ is a random error term. Standard ANOVA
assumptions apply and are:
1.
$\widehat{AUC}_{ij}$'s are independent for all ij.
2.
$e_{ij} \sim N(0, \sigma^2_{AUC})$ for all ij.
An analysis of variance table showing expected mean squares of
between- and within-subject sources of AUC variation is presented in
Table 2.1. In the notation of Table 2.1, $\hat{y}_{ij}(t)$ represents
the estimated response function for subject i from treatment group j
over the observation period $[t_1,t_p]$. Review of the expected mean
squares indicates that $F = (B/(g-1)) / (R/(n-g))$ is an appropriate
test
TABLE 2.1
ANALYSIS OF VARIANCE TABLE FOR AUC PARAMETER
ONE-WAY LAYOUT

Source           DF     Sum of Squares   Mean Square   Expected Mean Square
Between Groups   g-1    B                B/(g-1)       $\sigma^2_{AUC} + \sum_j n_j [\int_{t_1}^{t_p} (\mu_j(t) - \mu(t))\,dt]^2 / (g-1)$
Within Groups    n-g    R                R/(n-g)       $\sigma^2_{AUC}$
Total            n-1    T

where
$B = \sum_j n_j [\int_{t_1}^{t_p} (\bar{y}_{\cdot j}(t) - \bar{y}_{\cdot\cdot}(t))\,dt]^2$;
$R = \sum_j \sum_i [\int_{t_1}^{t_p} (\hat{y}_{ij}(t) - \bar{y}_{\cdot j}(t))\,dt]^2$;
$T = \sum_j \sum_i [\int_{t_1}^{t_p} (\hat{y}_{ij}(t) - \bar{y}_{\cdot\cdot}(t))\,dt]^2$;
$\bar{y}_{\cdot j}(t) = \sum_i \hat{y}_{ij}(t) / n_j$;
$\bar{y}_{\cdot\cdot}(t) = \sum_j \sum_i \hat{y}_{ij}(t) / n$;
$\mu(t) = \sum_j n_j \mu_j(t) / n$.
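Because integration is linear, the integral of a difference of curves
equals the difference of the corresponding AUC values, so the sums of
squares of Table 2.1 reduce to an ordinary one-way ANOVA on the
per-subject AUC estimates. The following is a minimal sketch of that
computation; the function name `auc_anova_f` and the toy AUC values are
hypothetical, and the AUC estimates are assumed to have been computed
beforehand by one of the methods of section 2.3.

```python
def auc_anova_f(groups):
    """One-way ANOVA F statistic computed on per-subject AUC estimates.

    groups is a list with one inner list of AUC values per treatment.
    Returns (B, R, F), with B and R the between- and within-groups sums
    of squares in the notation of Table 2.1.
    """
    n = sum(len(values) for values in groups)
    g = len(groups)
    grand_mean = sum(sum(values) for values in groups) / n
    group_means = [sum(values) / len(values) for values in groups]
    between = sum(
        len(values) * (mean - grand_mean) ** 2
        for values, mean in zip(groups, group_means)
    )
    within = sum(
        (x - mean) ** 2
        for values, mean in zip(groups, group_means)
        for x in values
    )
    f_stat = (between / (g - 1)) / (within / (n - g))
    return between, within, f_stat


# Toy AUC values for two treatment groups of three subjects each.
print(auc_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

The resulting F statistic would be referred to an F distribution with
g-1 and n-g degrees of freedom under the assumptions listed for model
(2.3).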
Model (2.3) can easily be extended to the case of a multifactor
design. For example, in the case of a two-way factorial design, a
center-point model is specified as

(2.4)    $\widehat{AUC}_{ijk} = \mu + \alpha_j + \beta_k + \gamma_{jk} + e_{ijk}$,  $i = 1, \ldots, n_{jk}$;  $j = 1, \ldots, g$;  $k = 1, \ldots, l$;

with side conditions
$\sum_j \alpha_j = \sum_k \beta_k = \sum_j \gamma_{jk} = \sum_k \gamma_{jk} = 0$.
For illustrative purposes, consider model (2.4) arising from a
cooperative repeated measures clinical trial. Then $\alpha_j$ might
represent the effect of treatment j, $\beta_k$ is the effect of the kth
investigator, and $\gamma_{jk}$ is the interaction of the jth treatment
and the kth investigator. The analysis of variance table for this
design, assuming balanced data and that both factors are fixed, is
presented in Table 2.2.

An important property of the AUC parameter is the ability to
investigate treatment differences for some subinterval of the total
observation period $[t_1,t_p]$. This is done by simply replacing $t_1$
and $t_p$ in (2.3) by integral limits of interest $t_L$ and $t_U$, where
$t_1 \le t_L < t_U \le t_p$. This analysis feature is not shared by
growth curve analysis in which treatment comparisons are made based on
parameters estimated for the entire observation period $[t_1,t_p]$.
The estimation of AUC for subintervals of $[t_1,t_p]$ leads directly
to a test for treatment-by-time interaction and time main effect with
respect to AUC response. For each adjacent subinterval
$[t_k,t_{k+1}]$, $1 \le k \le p-1$, an AUC estimate can be obtained,
resulting in a p-1 dimension vector of AUC interval estimates. The
hypothesis of no treatment-by-time interaction or time main effect can
then be tested using MANOVA procedures discussed in section 1.2.4.a.

TABLE 2.2
ANALYSIS OF VARIANCE TABLE FOR AUC PARAMETER
FOR BALANCED FIXED EFFECT TWO-WAY LAYOUT

Source                                DF           Sum of Squares            Mean Square
Treatments                            g-1          TX                        TX/(g-1)
Investigators                         l-1          PI                        PI/(l-1)
Treatment by Invest. Interaction      (g-1)(l-1)   TXPI                      TXPI/((g-1)(l-1))
Residual                              gl(n-1)      R = T - TX - PI - TXPI    R/(gl(n-1))
Total                                 ngl-1        T

where
$TX = \sum_j n_{j\cdot} [\int_{t_1}^{t_p} (\bar{y}_{\cdot j \cdot}(t) - \bar{y}_{\cdot\cdot\cdot}(t))\,dt]^2$;
$PI = \sum_k n_{\cdot k} [\int_{t_1}^{t_p} (\bar{y}_{\cdot\cdot k}(t) - \bar{y}_{\cdot\cdot\cdot}(t))\,dt]^2$;
$TXPI = \sum_j \sum_k n_{jk} [\int_{t_1}^{t_p} (\bar{y}_{\cdot jk}(t) - \bar{y}_{\cdot j \cdot}(t) - \bar{y}_{\cdot\cdot k}(t) + \bar{y}_{\cdot\cdot\cdot}(t))\,dt]^2$;
$T = \sum_i \sum_j \sum_k [\int_{t_1}^{t_p} (\hat{y}_{ijk}(t) - \bar{y}_{\cdot\cdot\cdot}(t))\,dt]^2$.

Expected mean squares, with n subjects per treatment-by-investigator
cell:
Treatments:    $\sigma^2_{AUC} + nl \sum_j [\int_{t_1}^{t_p} (\mu_j(t) - \mu(t))\,dt]^2 / (g-1)$
Investigators: $\sigma^2_{AUC} + ng \sum_k [\int_{t_1}^{t_p} (\mu_k(t) - \mu(t))\,dt]^2 / (l-1)$
Interaction:   $\sigma^2_{AUC} + n \sum_j \sum_k [\int_{t_1}^{t_p} (\mu_{jk}(t) - \mu_j(t) - \mu_k(t) + \mu(t))\,dt]^2 / ((g-1)(l-1))$
Residual:      $\sigma^2_{AUC}$
While similar in construction, the AUC and Zerbe-Walker
statistics are directed at somewhat different null hypotheses.
Comparison of the ANOVA tables for the Zerbe-Walker statistic (see
Table 1.1) and the AUC statistic (see Table 2.1) for the one-way
layout helps to clarify the differences in these statistics. From
Table 1.1, the Zerbe-Walker (Z-W) test treatment sum of squares is
written as

$TXSS_{ZW} = \sum_j n_j \int_{t_1}^{t_p} (\bar{y}_{\cdot j}(t) - \bar{y}_{\cdot\cdot}(t))^2\,dt$

while the AUC treatment sum of squares, taken from Table 2.1, is

$TXSS_{AUC} = \sum_j n_j [\int_{t_1}^{t_p} (\bar{y}_{\cdot j}(t) - \bar{y}_{\cdot\cdot}(t))\,dt]^2.$

Review of these equations shows that while the Z-W statistic magnifies
any differences in curve shapes by first squaring and then
integrating, the AUC statistic identifies differences in cumulative
response over $[t_1,t_p]$ by first integrating and then squaring. Only
differences in curve shape which result in significantly different
cumulative response are detected by AUC analysis. The Zerbe-Walker
test, in contrast, is directed at any differences in curve shape and
thus includes significant treatment-by-time interaction in the
alternative hypothesis, $H_a$. The powers of the AUC and Z-W test
statistics are compared in Chapter 4.
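The contrast between "square then integrate" and "integrate then
square" can be illustrated with a small sketch. Everything specific
here is an assumption made for illustration: two hypothetical group
mean curves that cross, $t$ and $1 - t$ on $[0,1]$, equal group sizes
of 10, the function names, and the use of a fine-grid trapezoidal sum
purely as a numerical integrator.

```python
def trapz(values, h):
    """Trapezoidal sum of equally spaced values with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def treatment_sums_of_squares(mean_curves, sizes, t_start, t_end, m=1000):
    """Return (Z-W style, AUC style) treatment sums of squares.

    mean_curves: one function of t per treatment group (group mean curve).
    sizes: group sample sizes n_j.
    """
    h = (t_end - t_start) / m
    grid = [t_start + k * h for k in range(m + 1)]
    n = sum(sizes)
    grand = [
        sum(nj * f(t) for f, nj in zip(mean_curves, sizes)) / n for t in grid
    ]
    zw_ss = 0.0
    auc_ss = 0.0
    for f, nj in zip(mean_curves, sizes):
        diff = [f(t) - gm for t, gm in zip(grid, grand)]
        zw_ss += nj * trapz([d * d for d in diff], h)   # square, then integrate
        auc_ss += nj * trapz(diff, h) ** 2              # integrate, then square
    return zw_ss, auc_ss


# Crossing curves with identical area under the curve on [0, 1].
zw, auc = treatment_sums_of_squares(
    [lambda t: t, lambda t: 1.0 - t], [10, 10], 0.0, 1.0
)
print(zw, auc)
```

For these curves the AUC-style sum of squares is essentially zero
because both groups have the same cumulative response, while the
Z-W-style sum of squares is clearly positive because the curve shapes
differ.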
2.3
METHODS OF ESTIMATION
The form of the response curve $f_{ij}(t)$, which describes response
of subject i to treatment j, is in practice unknown. The task then
is to find an unbiased AUC estimate, $\widehat{AUC}_{ij}$, where
$f_{ij}(t)$ is unknown but where the observation vector $y_{ij}$ is
known.

For any continuous function f(t) defined on time interval
$[t_1,t_p]$ with known values $f' = (f(t_1), \ldots, f(t_p))$, a
polynomial function $P_q(t)$, of degree $q \le p-1$, can be used to
approximate f(t). The Weierstrass Approximation Theorem (e.g., see
Ralston and Rabinowitz, 1978) states that a continuous function on a
closed interval can be approximated to any required level of accuracy
by a polynomial of sufficiently high degree, so that as the number of
known values $f(t_k)$, $k = 1, \ldots, p$, increases, the approximating
polynomial $P_q(t)$ can be chosen to assure any required level of
accuracy. After $P_q(t)$ has been determined, AUC can then be
estimated by $\int_{t_1}^{t_p} P_q(t)\,dt$. Polynomial approximation
has several advantageous characteristics relative to approximation
techniques employing other classes of functions such as exponential or
trigonometric functions. A major advantage of this method is its
relatively simple mathematical form, a property which has made it by
far the most widely used method for computer applications (Conte,
1965). In addition, certain polynomial procedures lead to
approximations for which the error of approximation can be estimated
or at least bounded.
Polynomial approximation has often been applied to
situations where the function f(t) is of such a form that integration
cannot be performed exactly (e.g., the normal curve) or is too
complicated to be carried out. The experimental collection of
time-response data provides a different setting for the application of
polynomial approximation techniques. Here, the number of observed
values, p, is fixed; the response function f(t) is unknown but
described by the observed values of $y_{ij}$; and the observed values
include random errors of measurement, i.e.,

(2.5)    $y_{ijk} = f_{ij}(t_k) + e_{ijk}$,  $k = 1, \ldots, p$.

Methods utilizing polynomial functions to estimate response function
f(t) and AUC for the case of experimental data are reviewed in this
section. AUC estimation approaches which are linear combinations of
observed responses are first considered, followed by a review of least
squares approaches using polynomial functions.
2.3.1
AUC Approximations Which Are Linear Combinations of
Observed Responses
First assume that values of response function f(t) are known at a
set of p+1 equally spaced time points $t_0, \ldots, t_p$. Denote these
values by $f_0, \ldots, f_p$. Applying principles of finite difference
calculus (e.g., see Chapter 3 of Conte, 1965), f(t) can be approximated
by the Newton forward-difference interpolating polynomial $P_p(s)$ where

(2.6)    $P_p(s) = f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\,\Delta^2 f_0 + \cdots + \frac{s(s-1)\cdots(s-p+1)}{p!}\,\Delta^p f_0$

where h denotes interval length, $s = (t - t_0)/h$, and

$\Delta^k f_0 = f_k - \binom{k}{1} f_{k-1} + \binom{k}{2} f_{k-2} - \cdots + (-1)^k f_0$,  $f_j = f(t_0 + jh)$.

Note that
$\Delta f_0 = \Delta^1 f(t_0) = f(t_0+h) - f(t_0) = f_1 - f_0$,
and
$\Delta^2 f_0 = \Delta^2 f(t_0) = \Delta(\Delta f(t_0)) = \Delta f(t_0+h) - \Delta f(t_0)$
$= [f(t_0+2h) - f(t_0+h)] - [f(t_0+h) - f(t_0)]$
$= f_2 - 2f_1 + f_0$.
Here $P_p(s)$ is a sum of polynomials of order 0,1,...,p, each of
successively higher degree. The kth-degree polynomial $P_k(s)$ passes
through all values $f_0, \ldots, f_{k-1}$, as does $P_{k-1}(s)$, and
also passes through $f_k$.
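The forward-difference construction can be sketched in a few lines. The
function names `forward_differences` and `newton_forward` are
hypothetical, and the sample values $f_k = k^2$ (spacing h = 1, starting
at $t_0 = 0$) are chosen only because the exact answer is easy to check
by hand.

```python
def forward_differences(f):
    """Return the leading forward differences [f_0, Δf_0, Δ²f_0, ..., Δ^p f_0]."""
    leading = [f[0]]
    row = list(f)
    while len(row) > 1:
        # Each pass replaces the row by its first differences.
        row = [later - earlier for earlier, later in zip(row, row[1:])]
        leading.append(row[0])
    return leading


def newton_forward(f, s):
    """Evaluate the Newton forward-difference polynomial at s = (t - t_0) / h."""
    total = 0.0
    coeff = 1.0  # running value of s(s-1)...(s-k+1) / k!
    for k, diff in enumerate(forward_differences(f)):
        total += coeff * diff
        coeff *= (s - k) / (k + 1)
    return total


values = [0.0, 1.0, 4.0, 9.0]          # f_k = k**2 at t = 0, 1, 2, 3
print(forward_differences(values))     # second difference is f_2 - 2 f_1 + f_0
print(newton_forward(values, 2.5))     # interpolated value at t = 2.5
```

Since the sampled function is itself a quadratic, the third difference
vanishes and the interpolating polynomial reproduces $t^2$ exactly.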
For the single time interval [to,tll, an approximate AVe value
can be obtained, noting that dt - h·ds,
t,
(2.7)
AVe ..
Sf(t) dt
~
1.
h
t.
a.
SPp(s) ds.
0
Rectangular Rule Approximation
When p equals zero, equation (2.7) becomes

$$\int_{t_0}^{t_1} f(t)\,dt \approx \mathrm{AUC}_0 = h \int_0^1 f_0\,ds = h f_0. \tag{2.8}$$

This is known as the zero-order or rectangular rule approximation and
is shown as the shaded area in Figure 2.3. This approach can be
applied over each subinterval $h_k$, k=0,...,p-1, so that an estimate of
AUC over $[t_0, t_p]$ can be obtained as

$$\widehat{\mathrm{AUC}}_{RR} = \sum_{k=0}^{p-1} h_k f_k. \tag{2.9}$$

FIGURE 2.3
APPROXIMATION OF AUC ON $[t_0, t_1]$ USING THE RECTANGULAR RULE
(Axes: response f(t) versus time.)

FIGURE 2.4
APPROXIMATION OF AUC ON $[t_0, t_1]$ USING THE TRAPEZOIDAL RULE
(Axes: response f(t) versus time.)
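As a minimal sketch of the rectangular rule summed over subintervals, the following Python function adds up interval length times the response at the left end of each interval. The function name and the time and response values are hypothetical and chosen only for illustration; unequal interval lengths are allowed.

```python
def auc_rectangular(times, values):
    """Rectangular (zero-order) rule: sum of h_k * f_k over the subintervals.

    times  : observation times t_0 < t_1 < ... < t_p
    values : responses f_0, ..., f_p at those times (f_p is not used)
    """
    return sum((times[k + 1] - times[k]) * values[k]
               for k in range(len(times) - 1))


# Hypothetical observation times (hours) and responses.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
values = [0.0, 4.0, 6.0, 3.0, 1.0]

auc_rr = auc_rectangular(times, values)
# Interval contributions: 1*0 + 1*4 + 2*6 + 4*3
```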
The error of approximation of the rectangular rule over the
subinterval $[t_0, t_1]$, from Conte (1965), is

$$\mathrm{ERR}[t_0, t_1] = \int_{t_0}^{t_1} f(t)\,dt - \int_{t_0}^{t_1} P_0(t)\,dt = h^2 f'(\eta_0)/2. \tag{2.10}$$

The error over all subintervals of $[t_0, t_p]$ can be seen from use
of (2.10) to be

$$\mathrm{ERR}[t_0, t_p] = \Big(\sum_{k=0}^{p-1} h_k^2 f'(\eta_k)\Big)/2 \le \Big(h_{\max}^2 \sum_{k=0}^{p-1} f'(\eta_k)\Big)/2, \tag{2.11}$$

where $h_{\max}$ denotes the maximum interval length on $[t_0, t_p]$.
Noting that if $f'(t)$ is continuous on $[t_0, t_p]$, then
$\sum_{k=0}^{p-1} f'(\eta_k)$ equals $p\,f'(\eta)$ for some point
$\eta$ in $[t_0, t_p]$, and noting that the number of intervals
p is less than or equal to $(t_p - t_0)/h_{\max}$, the error of rectangular rule
AUC approximation over $[t_0, t_p]$ can be written as

$$\mathrm{ERR}[t_0, t_p] \le h_{\max}\,(t_p - t_0)\,f'(\eta)/2. \tag{2.12}$$

When all intervals are of equal length h, this becomes

$$\mathrm{ERR}[t_0, t_p] = h\,(t_p - t_0)\,f'(\eta)/2. \tag{2.13}$$

These results indicate that the error of rectangular rule
approximation of AUC is proportional to the interval length h, i.e.,
is an O(h) function. As the number of responses observed over
$[t_0, t_p]$ increases without bound, the approximation error tends to
zero in a linear manner.
When data values $y_0, y_1, \ldots, y_p$ which include random measurement
errors are available in the absence of knowledge of
$f_0, f_1, \ldots, f_p$, the maximum bound for error in estimating AUC
using the rectangular rule is, assuming equal intervals (Conte, 1965),

$$\mathrm{BE(RR)} = h\,\lvert e_0 + e_1 + \cdots + e_{p-1} \rvert, \tag{2.14}$$

where $e_k$ denotes the deviation of $y_k$ from true value $f_k$.
When the error terms are normally distributed with expectation zero,
the expected value of the error sum $e_0 + e_1 + \cdots + e_{p-1}$
inside the bound is zero, so the effect of sampling errors on
estimating AUC averages to zero. Hence the rectangular rule produces
an unbiased estimate of AUC.
b.
Trapezoidal Rule Approximation
By choosing a linear approximation to f(t) on the subinterval
$[t_0, t_1]$, an estimate of AUC that is more accurate than the
rectangular rule estimate can be obtained. Using linear interpolation,

$$h \int_0^1 P_1(s)\,ds = h \int_0^1 (f_0 + s\,\Delta f_0)\,ds = h(f_0 + f_1)/2. \tag{2.15}$$
This is simply the area of the trapezoid inscribed under the response
curve f(t) (see Figure 2.4). Applying the trapezoidal rule over each
subinterval of $[t_0, t_p]$, the trapezoidal rule estimate of AUC is then

$$\widehat{\mathrm{AUC}}_{TR} = h_0(f_0+f_1)/2 + h_1(f_1+f_2)/2 + \cdots + h_{p-1}(f_{p-1}+f_p)/2 = \sum_{k=0}^{p-1} h_k (f_k + f_{k+1})/2. \tag{2.16}$$

When the time intervals are equal, (2.16) reduces to

$$\widehat{\mathrm{AUC}}_{TR}(\text{equal intervals}) = h\,(f_0/2 + f_1 + \cdots + f_{p-1} + f_p/2). \tag{2.17}$$
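The two forms of the trapezoidal rule estimate can be checked against each other with a short sketch. The function names and data values below are hypothetical; the first function follows the general form (2.16) and the second the equal-interval form (2.17).

```python
def auc_trapezoid(times, values):
    """General trapezoidal rule, as in (2.16): sum of h_k * (f_k + f_{k+1}) / 2."""
    return sum((times[k + 1] - times[k]) * (values[k] + values[k + 1]) / 2.0
               for k in range(len(times) - 1))


def auc_trapezoid_equal(h, values):
    """Equal-interval form, as in (2.17): end points receive weight one-half."""
    interior = sum(values[1:-1])
    return h * (values[0] / 2.0 + interior + values[-1] / 2.0)


values = [0.0, 4.0, 6.0, 3.0, 1.0]           # hypothetical responses

# Unequally spaced observation times.
auc_unequal = auc_trapezoid([0.0, 1.0, 2.0, 4.0, 8.0], values)

# Equally spaced times: both forms should agree.
auc_general = auc_trapezoid([0.0, 1.0, 2.0, 3.0, 4.0], values)
auc_equal = auc_trapezoid_equal(1.0, values)
```

Only the general form applies when the spacing between observation times varies, which is the usual situation in clinical trial visit schedules.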
In a manner similar to the development of the error of
rectangular rule approximation of AUC, the error of trapezoidal rule
approximation over $[t_0, t_p]$ is found to be

$$\lvert \mathrm{ERR}_{TR}[t_0, t_p] \rvert \le h_{\max}^2\,(t_p - t_0)\,\lvert f''(\eta) \rvert / 12, \tag{2.18}$$

where $h_{\max}$ is the maximum subinterval length and $f''(\eta)$
denotes the second derivative of f(t) evaluated at $\eta$. This
indicates that the trapezoidal rule estimation procedure has an error
which is proportional to the square of the interval length. The
$O(h^2)$ error function for the trapezoidal rule compares to the O(h)
error function for the rectangular rule. This indicates the
superiority of the trapezoidal rule relative to the rectangular rule
in estimating AUC, which is also graphically indicated by comparison
of Figures 2.3 and 2.4.
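The difference between first-order and second-order error can be illustrated numerically. The sketch below is an illustration only: it assumes a hypothetical response curve f(t) = t^2 on [0, 1], whose true area is 1/3, and compares the absolute error of each rule when the interval length is halved.

```python
def auc_rectangular(times, values):
    """Rectangular rule: h_k * f_k summed over subintervals."""
    return sum((times[k + 1] - times[k]) * values[k]
               for k in range(len(times) - 1))


def auc_trapezoid(times, values):
    """Trapezoidal rule: h_k * (f_k + f_{k+1}) / 2 summed over subintervals."""
    return sum((times[k + 1] - times[k]) * (values[k] + values[k + 1]) / 2.0
               for k in range(len(times) - 1))


def absolute_errors(p):
    """Absolute errors of both rules for f(t) = t**2 on [0, 1] with p intervals."""
    times = [k / p for k in range(p + 1)]
    values = [t * t for t in times]
    exact = 1.0 / 3.0
    return (abs(exact - auc_rectangular(times, values)),
            abs(exact - auc_trapezoid(times, values)))


rect_10, trap_10 = absolute_errors(10)
rect_20, trap_20 = absolute_errors(20)

# Halving h roughly halves the rectangular-rule error
# and quarters the trapezoidal-rule error.
rect_ratio = rect_10 / rect_20
trap_ratio = trap_10 / trap_20
```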
When only measurements $y_0, y_1, \ldots, y_p$, subject to measurement
error, are known instead of $f_0, f_1, \ldots, f_p$, a bound to the error
of the trapezoidal-rule approximation of AUC is given by Conte (1965),
assuming equal intervals, as

$$\mathrm{BE(TR)} = h\,\lvert e_0/2 + e_1 + \cdots + e_{p-1} + e_p/2 \rvert. \tag{2.19}$$

When the error terms $e_k$ are normally distributed with expectation
zero, the expected value of the error sum inside BE(TR) is zero,
indicating that the trapezoidal-rule AUC estimate is an unbiased
estimate of the true area under the time-response curve.
The linear interpolation approach to estimating f(t) over each
subinterval of $[t_0, t_p]$ results in line segments estimating f(t)
which are the best linear unbiased estimates of the corresponding
population mean interpolated response curves (Zerbe, 1979). The
resulting trapezoidal-rule approximation of AUC represents a flexible,
easily implemented approximation technique. AUC estimates for each
subinterval of $[t_0, t_p]$ can be constructed, and AUC estimates over
the entire interval of observation $[t_0, t_p]$ or some subinterval of
interest can then be constructed as a sum of these interval estimates.
In addition, in estimating AUC using observations
$y_0, y_1, \ldots, y_p$ subject to measurement error, the
trapezoidal-rule procedure represents a stable procedure (Ralston and
Rabinowitz, 1978) affected less by large errors inherent in a subset
of the $y_0, y_1, \ldots, y_p$ than other Newton interpolating
polynomials of higher degree (to be reviewed in Section 2.3.1.c). The
trapezoidal-rule approach is the method of choice in the
pharmacological field (Gibaldi and Perrier, 1975), where AUC is an
important parameter in its own right and a means for solving for other
pharmacokinetic parameters.
In the remainder of this section, the implications of using
trapezoidal-rule AUC estimation will be further examined for the
treatment-by-occasions design. The use of the trapezoidal-rule
procedure for estimating AUC for clinical trial data is shown
graphically in Figure 2.5. Review of Figure 2.5 shows the overall AUC
estimate for the period of observation to be the sum of the areas
within each of the subinterval trapezoids. Using the responses
$y_{ij}' = (y_{ij1}, y_{ij2}, \ldots, y_{ijp})$ of subject i from
treatment j, the trapezoidal-rule estimate of AUC for the period of
observation $[t_1, t_p]$ can be expressed as the sum of trapezoidal
areas

$$\widehat{\mathrm{AUC}}_{ij} = h_1(y_{ij1} + y_{ij2})/2 + h_2(y_{ij2} + y_{ij3})/2 + \cdots + h_{p-1}(y_{ij,p-1} + y_{ijp})/2 = \sum_{k=1}^{p-1} h_k (y_{ijk} + y_{ij,k+1})/2, \tag{2.20}$$

where $h_k$ is the interval length $(t_{k+1} - t_k)$, measured in time
units, between times of observation $t_k$ and $t_{k+1}$.
Within each interval k, using (1.27), the interval estimate of AUC
is

$$\widehat{\mathrm{AUC}}_{ijk} = h_k\,\hat{f}_t(k) \tag{2.21}$$

for some $\hat{f}_t(k)$ such that
$y_{ijk} \le \hat{f}_t(k) \le y_{ij,k+1}$.
Comparison of (2.20) and (2.21) indicates that the linear
interpolation estimate of average response $\hat{f}_t(k)$ during
interval k is the average of observed responses $y_{ijk}$ and
$y_{ij,k+1}$. The AUC estimate for the entire interval of observation
$[t_1, t_p]$ is then

$$\widehat{\mathrm{AUC}}_{ij} = \sum_{k=1}^{p-1} h_k\,\hat{f}_t(k). \tag{2.22}$$

This is a weighted sum of average interval responses, where the
weights are interval lengths $h_k$. This construction of the linear
interpolation estimate of AUC is graphically displayed in Figure 2.6,
which presents the same response curve and area under the curve shown
in Figure 2.5. The linear interpolation estimate of the average
response $\hat{f}_t$ over the entire period of observation
$[t_1, t_p]$ is

$$\hat{f}_t = \widehat{\mathrm{AUC}}_{ij}/(t_p - t_1) = \sum_{k=1}^{p-1} h_k\,\hat{f}_t(k) \Big/ \sum_{k=1}^{p-1} h_k. \tag{2.23}$$

FIGURE 2.5
RESPONSE DATA SHOWING ESTIMATED AUC OVER $[t_1, t_p]$ AS SUM OF
SUBINTERVAL AUC ESTIMATES.
(Axes: response to treatment versus interval.)

FIGURE 2.6
LINEAR INTERPOLATION AUC ESTIMATE AS A FUNCTION OF ESTIMATED AVERAGE
INTERVAL RESPONSES.
(Axes: response to treatment versus interval.)
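The linear interpolation AUC estimate can be written in an
alternate form to (2.22) as a function of observed responses
$y_{ijk}$. In this form

$$\widehat{\mathrm{AUC}}_{ij} = (h_1/2)\,y_{ij1} + ([h_1 + h_2]/2)\,y_{ij2} + \cdots + ([h_{p-2} + h_{p-1}]/2)\,y_{ij,p-1} + (h_{p-1}/2)\,y_{ijp} = \sum_{k=1}^{p} ([h_{k-1} + h_k]/2)\,y_{ijk}, \tag{2.24}$$

where $h_0 = h_p = 0$.
In this construction, $\widehat{\mathrm{AUC}}_{ij}$ can be viewed as a
weighted sum of the observed responses where the weights are the
average interval lengths of the two intervals before and after the
observation. Since no observations are made before $t_1$ or after
$t_p$, $h_0$ and $h_p$ are defined to be zero.
Expressed in matrix notation (2.24) becomes

$$\widehat{\mathrm{AUC}}_{ij} = (Ch)'\,y_{ij}, \tag{2.25}$$

where

$$C = \frac{1}{2}\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix}_{p \times (p-1)}, \qquad h = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_{p-1} \end{bmatrix}_{(p-1) \times 1}, \qquad y_{ij} = \begin{bmatrix} y_{ij1} \\ y_{ij2} \\ \vdots \\ y_{ijp} \end{bmatrix}_{p \times 1}.$$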
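The matrix form (2.25) can be sketched in plain Python to show that the weight vector Ch reproduces the weights of (2.24). The function name, observation times, and responses are hypothetical; the matrix C is built exactly as displayed above, with one-half on the diagonal and the first subdiagonal.

```python
def trapezoid_weights(times):
    """Return the weight vector Ch of (2.25) for the given observation times."""
    p = len(times)
    h = [times[k + 1] - times[k] for k in range(p - 1)]   # (p-1) x 1
    # C is p x (p-1): entry 1/2 on the diagonal and on the first subdiagonal.
    C = [[0.0] * (p - 1) for _ in range(p)]
    for row in range(p):
        if row < p - 1:
            C[row][row] = 0.5
        if row > 0:
            C[row][row - 1] = 0.5
    return [sum(C[row][col] * h[col] for col in range(p - 1))
            for row in range(p)]


times = [0.0, 1.0, 2.0, 4.0, 8.0]      # hypothetical observation times
y = [0.0, 4.0, 6.0, 3.0, 1.0]          # hypothetical responses of one subject

weights = trapezoid_weights(times)     # ([h_{k-1} + h_k] / 2 for each k)
auc = sum(w * v for w, v in zip(weights, y))
```

Because the weights depend only on the observation times, they need to be computed once per visit schedule and can then be applied to every subject's response vector.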
The variance of $\widehat{\mathrm{AUC}}_{ij}$, then, is

$$\sigma^2_{AUC(ij)} = (Ch)'\,\Sigma_j\,(Ch), \tag{2.26}$$

where $\Sigma_j$ is the covariance matrix for treatment group j or for
population j from which subject ij is drawn. The assumption of equal
between-group variances is expressed as

$$\sigma^2_{AUC} = (Ch)'\,\Sigma_1\,(Ch) = \cdots = (Ch)'\,\Sigma_g\,(Ch). \tag{2.27}$$

Equation (2.27) indicates that the equality of group covariance
matrices is not required for AUC analysis assumptions. When intervals
are equally spaced (i.e., $t_{k+1} - t_k = h$ for all k=1,...,p-1), then
(2.25) reduces to

$$\widehat{\mathrm{AUC}}_{ij}(\text{equal intervals}) = h\,c'\,y_{ij}, \tag{2.28}$$

where $c'_{1 \times p} = [1/2 \;\; 1 \;\; 1 \;\; \cdots \;\; 1 \;\; 1 \;\; 1/2]$; and (2.26)
reduces to

$$\sigma^2_{AUC(\text{equal intervals})ij} = h^2\,c'\,\Sigma_j\,c. \tag{2.29}$$

When intervals are equally spaced, the AUC parameter can be seen to be
proportional to the sum of observed responses where the first and last
observations are given a reduced weight of one-half.
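A small numerical sketch of the equal-interval variance formula (2.29) follows. The covariance matrix is hypothetical (a compound-symmetry structure with variance 4 and covariance 2 at p = 3 occasions), as are the function name and the interval length h = 2.

```python
def auc_variance_equal_intervals(h, covariance):
    """Variance of the trapezoidal AUC estimate, h**2 * c' Sigma c, as in (2.29)."""
    p = len(covariance)
    c = [0.5] + [1.0] * (p - 2) + [0.5]          # end points weighted one-half
    sigma_c = [sum(covariance[row][col] * c[col] for col in range(p))
               for row in range(p)]
    return h * h * sum(c[row] * sigma_c[row] for row in range(p))


# Hypothetical within-subject covariance matrix for p = 3 occasions.
covariance = [[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]]

variance = auc_variance_equal_intervals(2.0, covariance)
```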
The definition of the linear interpolation AUC estimate above
suggests that the AUC analysis of treatment main effects is similar
to the mixed model ANOVA test for treatment main effects. The ANOVA
tables for the AUC and mixed model ANOVA procedures are presented in
Table 2.3. Review of Table 2.3 shows that the F-test statistics of the
two methods differ in the linear combination of members of observation
vector $y_{ij}$. AUC analysis involves the linear combination
$(Ch)'y_{ij}$, while the mixed model test for treatment main effects
defines the linear combination $\mathbf{j}'y_{ij}$, an unweighted sum
of the observed responses. Thus, the mixed model analysis disregards
the time structure in $t' = [t_1, t_2, \ldots, t_p]$ associated with
the observation of $y_{ij}' = (y_{ij1}, \ldots, y_{ijp})$, while the
AUC test incorporates the time points of measurement into the test of
treatment effects. As the number of times of observation increases
for a finite period of observation,
the AUC and mixed model test statistics converge to the same value.
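The contrast between the two linear combinations can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's analysis: the data are hypothetical (two treatment groups, three subjects each, three equally spaced occasions with h = 1), and `f_statistic` is an ordinary one-way ANOVA F ratio applied to whichever scalar summary is supplied.

```python
def linear_summary(weights, y):
    """Linear combination weights' y of one subject's response vector."""
    return sum(w * v for w, v in zip(weights, y))


def f_statistic(groups):
    """One-way ANOVA F ratio for scalar summaries grouped by treatment."""
    all_values = [v for g in groups for v in g]
    n_total, n_groups = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    ss_treatment = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                       for g in groups)
    ss_error = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_treatment / (n_groups - 1)) / (ss_error / (n_total - n_groups))


# Hypothetical response vectors: treatment -> list of subjects.
data = [
    [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.0, 3.0, 1.0]],   # treatment 1
    [[3.0, 5.0, 4.0], [4.0, 4.0, 6.0], [2.0, 6.0, 5.0]],   # treatment 2
]

auc_weights = [0.5, 1.0, 0.5]     # Ch for equal intervals with h = 1
sum_weights = [1.0, 1.0, 1.0]     # j, the unweighted sum

f_auc = f_statistic([[linear_summary(auc_weights, y) for y in g] for g in data])
f_sum = f_statistic([[linear_summary(sum_weights, y) for y in g] for g in data])
```

The two statistics differ only through the weights applied to each occasion, which is the point made in the comparison above.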
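TABLE 2.3
ANOVA TABLE COMPARING AUC (LINEAR INTERPOLATION ESTIMATE) AND
MIXED MODEL ANOVA FOR THE TEST OF TREATMENT MAIN EFFECTS.

SOURCE: Treatment (SSTX), df = g - 1
  AUC:          $\sum_j n_j\,(Ch)'(\bar{y}_{.j} - \bar{y}_{..})(\bar{y}_{.j} - \bar{y}_{..})'(Ch)$
  MIXED MODEL:  $\frac{1}{p}\sum_j n_j\,\mathbf{j}'(\bar{y}_{.j} - \bar{y}_{..})(\bar{y}_{.j} - \bar{y}_{..})'\mathbf{j}$

SOURCE: Error (SSE), df = N - g
  AUC:          $\sum_j \sum_i (Ch)'(y_{ij} - \bar{y}_{.j})(y_{ij} - \bar{y}_{.j})'(Ch)$
  MIXED MODEL:  $\frac{1}{p}\sum_j \sum_i \mathbf{j}'(y_{ij} - \bar{y}_{.j})(y_{ij} - \bar{y}_{.j})'\mathbf{j}$

SOURCE: Total (SST), df = N - 1
  AUC:          $\sum_j \sum_i (Ch)'(y_{ij} - \bar{y}_{..})(y_{ij} - \bar{y}_{..})'(Ch)$
  MIXED MODEL:  $\frac{1}{p}\sum_j \sum_i \mathbf{j}'(y_{ij} - \bar{y}_{..})(y_{ij} - \bar{y}_{..})'\mathbf{j}$

F-Test for Test of Treatment Main Effect
  AUC:          $\dfrac{(N-g)}{(g-1)} \cdot \dfrac{\sum_j n_j\,(Ch)'(\bar{y}_{.j} - \bar{y}_{..})(\bar{y}_{.j} - \bar{y}_{..})'(Ch)}{\sum_j \sum_i (Ch)'(y_{ij} - \bar{y}_{.j})(y_{ij} - \bar{y}_{.j})'(Ch)}$
  MIXED MODEL:  $\dfrac{(N-g)}{(g-1)} \cdot \dfrac{\sum_j n_j\,\mathbf{j}'(\bar{y}_{.j} - \bar{y}_{..})(\bar{y}_{.j} - \bar{y}_{..})'\mathbf{j}}{\sum_j \sum_i \mathbf{j}'(y_{ij} - \bar{y}_{.j})(y_{ij} - \bar{y}_{.j})'\mathbf{j}}$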
c.
Interpolating Polynomials of Degree Greater Than One
The Newton forward-difference interpolating polynomial of
equation (2.6) might be used to estimate f(t) by increasing the degree
of the interpolating polynomial to n, where n is greater than one.
The smallest subinterval of $[t_0, t_p]$ for which AUC can now be
estimated includes n+1 points. For example, to use a second order
polynomial to estimate f(t), three response points are needed. In
addition, to use (2.6) to estimate AUC, the times of measurement must
be equally spaced.
For the case of n=2, the Newton forward-difference polynomial
estimate of AUC for f(t) is

$$\int_{t_0}^{t_2} f(t)\,dt \approx h \int_0^2 P(s)\,ds. \tag{2.30}$$

After integration and retaining differences through the third order,
this becomes

$$\int_{t_0}^{t_2} f(t)\,dt \approx h\,(f_0 + 4f_1 + f_2)/3. \tag{2.31}$$

This is known as Simpson's rule (Conte, 1965) and also as the
parabolic rule (Hildebrand, 1974). Over interval $[t_0, t_p]$ comprised
of 2N subintervals of equal length h, the AUC estimate is

$$\widehat{\mathrm{AUC}}_{SR}[t_0, t_p] = h\,(f_0 + 4f_1 + 2f_2 + 4f_3 + \cdots + 4f_{2N-1} + f_{2N})/3. \tag{2.32}$$
The form of the error term is

$$\mathrm{ERR}_{SR}[t_0, t_p] = -(t_p - t_0)\,h^4 f^{(4)}(\eta)/180 \quad \text{for some } \eta \text{ in } [t_0, t_p]. \tag{2.33}$$

The error of AUC approximation is thus an $O(h^4)$ function, indicating
significantly greater accuracy than the trapezoidal rule. One can see
from (2.33) that when the true response function f(t) is a polynomial
of order three or less, Simpson's rule gives exact results and
provides a convenient method of integrating f(t).
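The exactness of Simpson's rule for polynomials of order three or less can be checked with a short sketch. The function name and the test curve f(t) = t^3 on [0, 2] (true area 4) are hypothetical choices for illustration; the function requires an even number of equal subintervals, as the rule does.

```python
def auc_simpson(h, values):
    """Composite Simpson's rule, as in (2.32), for equally spaced responses.

    h      : common interval length
    values : f_0, ..., f_{2N}; the number of subintervals must be even
    """
    n = len(values) - 1
    if n < 2 or n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    odd_sum = sum(values[1:n:2])     # f_1, f_3, ..., f_{2N-1}
    even_sum = sum(values[2:n:2])    # f_2, f_4, ..., f_{2N-2}
    return h * (values[0] + 4.0 * odd_sum + 2.0 * even_sum + values[n]) / 3.0


h = 0.5
values = [(k * h) ** 3 for k in range(5)]    # f(t) = t**3 at t = 0, 0.5, ..., 2
auc_cubic = auc_simpson(h, values)
```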
The major drawback of Simpson's rule is the requirement of equal
subinterval lengths h on period of observation $[t_0, t_p]$. In fact,
this is a disadvantage shared by all interpolating polynomials of
degree two or higher. Another problem is overfitting. As the degree
of the interpolating polynomial is increased, increasingly more
accurate approximations of AUC for f(t) can be achieved, assuming equal
intervals and that $f_0, \ldots, f_p$ are measured without error. When
observations $y_0, \ldots, y_p$ involving measurement error are
available instead of $f_0, \ldots, f_p$, however, significant errors of
estimation can occur when using a high order polynomial for a discrete
number of points. This is because the polynomial is likely to
"overfit" the data and oscillate violently around the true response
curve f(t) (Hildebrand, 1974).
When two consecutive subintervals are of equal length while
surrounding subintervals are of different lengths, some combination of
the trapezoidal rule and Simpson's rule might be employed to estimate
AUC. However, the increase in the precision of the AUC estimate using
this strategy may not offset the disadvantage of having to re-estimate
AUC using the trapezoidal rule for subintervals of interest which
overlap with Simpson's-rule subintervals. For the special case of
equally-spaced times of observation, the use of Simpson's-rule
estimation of AUC is recommended because of its greater accuracy
relative to the trapezoidal or rectangular rules.
2.3.2
Least-Squares Polynomial Approximation
An alternate approach to interpolation methods for the
approximation of unknown f(t) based on observed values
$y' = (y_1, \ldots, y_p)$ involving sampling errors is to use
least-squares procedures. For an mth degree polynomial,
$m \le p-1$, the least-squares polynomial solution is the mth degree
polynomial with coefficients which minimize the sum of squared
deviations (SSE) of observed and predicted values. This can be
written as

$$\mathrm{SSE} = \sum_{k=1}^{p} w(t_k)\Big[y_k - \sum_{j=0}^{m} a_j^{(m)} t_k^{\,j}\Big]^2, \tag{2.34}$$

where w(t) is some weight function such that $w(t_k) \ge 0$ for all k.
By differentiating (2.34) with respect to the $a_j^{(m)}$, the normal
equations are obtained and solutions for the $a_j^{(m)}$ can be
determined. The basic within-subject growth curve model E(y') = ~T
can be used to provide a weighted least squares solution for
polynomial coefficients for each subject. The resulting polynomial is
then integrated over $[t_1, t_p]$ to obtain an estimate of AUC.
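The least-squares route to an AUC estimate can be sketched in plain Python: form the normal equations for a polynomial of chosen degree, solve them, and integrate the fitted polynomial term by term. This is a minimal sketch under stated assumptions; the function names and data are hypothetical, the solver is a basic Gaussian elimination with partial pivoting, and no claim is made that this matches the computations used in the dissertation.

```python
def fit_polynomial(times, y, degree, weights=None):
    """Weighted least-squares coefficients a_0, ..., a_m from the normal equations."""
    if weights is None:
        weights = [1.0] * len(times)
    n = degree + 1
    # Normal equations A a = b with A[r][c] = sum w_k t_k^(r+c), b[r] = sum w_k y_k t_k^r.
    A = [[sum(w * t ** (r + c) for w, t in zip(weights, times))
          for c in range(n)] for r in range(n)]
    b = [sum(w * v * t ** r for w, t, v in zip(weights, times, y))
         for r in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(A[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - tail) / A[r][r]
    return coeffs


def integrate_polynomial(coeffs, lower, upper):
    """Integral of sum a_j t^j from lower to upper."""
    return sum(a * (upper ** (j + 1) - lower ** (j + 1)) / (j + 1)
               for j, a in enumerate(coeffs))


times = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]              # hypothetical responses lying on y = 1 + 2t
coeffs = fit_polynomial(times, y, degree=1)
auc_ls = integrate_polynomial(coeffs, times[0], times[-1])
```

For high polynomial degrees the normal equations become poorly conditioned, so an orthogonal-polynomial or QR-based fit would usually be preferred in practice.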
The least-squares approach differs from the previous
interpolating polynomial procedures discussed in Section 2.3.1 in
several respects. While the interpolating polynomial based on
finite-difference calculus is determined so as to pass through each
observed value $y_k$, the least-squares polynomial does not (when
$m < p-1$), but rather smooths the data so as to minimize differences
between observed and predicted points. In addition, the least-squares
polynomial is estimated based on all of the observed values in
$[t_1, t_p]$, while the interpolating polynomial is a piecewise
function whose members are defined on specific subintervals of
$[t_1, t_p]$.
The question arises of how to choose m, the degree of the
approximating polynomial, given p, the number of observed responses.
Selecting $m = p-1$ gives a perfect fit to the observed data in the
sense that it passes through all the observed values $y_k$. However,
as mentioned previously, the use of a polynomial of order p-1 can
result in an approximating curve that is overly sensitive to the
measurement error associated with the $y_k$ and that oscillates wildly
about the true curve f(t).
Ralston and Rabinowitz (1978) suggest using SSE in (2.34) as a
criterion for selecting m. They suggest plotting values of
$\mathrm{MSE}_m = \mathrm{SSE}/(p-m-1)$ for different choices of m and
choosing the value of m after which no significant decrease in
$\mathrm{MSE}_m$ occurs. When calculating AUC for each study subject,
this procedure quickly becomes impractical as the sample size
increases. Results from applying this approach to a small random
sample of study subjects may suggest a value for m so that the
procedure would not have to be applied to all study subjects.
Approaching the problem from the growth curve model framework,
other procedures for determining m might be used. A basic assumption
of growth curve analysis is that the subjects' response data can be
described by the same order polynomial curve. From this perspective,
the value of m is chosen to be the order of the polynomial curve that
would be selected in performing a growth curve analysis on the data.
Then this order of polynomial is used to approximate $f_{ij}(t)$ for
each study subject and the resulting polynomial function
$\hat{f}_{ij}(t)$ is integrated to estimate $\mathrm{AUC}_{ij}$.
Another approach within the growth curve model framework that appeals
to the logic of the Ralston and Rabinowitz procedure is to plot the
goodness-of-fit p-value of growth-curve models of ascending order and
to choose as the polynomial order the value of m after which no
significant increase in p-value occurs.
Based on the discussion in this section, different AUC estimation
procedures are utilized in Chapter 6 for the AUC analysis of several
clinical trial data sets. Based on its simplicity, ease of
implementation and adequate accuracy of approximation, the
trapezoidal-rule estimate of AUC is used in the analysis. In
addition, the use of an AUC estimate based on a growth curve
polynomial of order m, where m is chosen based on insignificant
increase in goodness-of-fit p-value, is explored. Analysis results
from these two AUC estimates are compared to each other and to results
from other analysis approaches.
CHAPTER 3
WEIGHTED AREA-UNDER-THE-CURVE ANALYSIS
3.1
INTRODUCTION
In certain longitudinal studies, different patterns of response
may be observed among the treatment groups.
In Figure 3.1, three
possible response pattern differences are displayed for a test versus
standard longitudinal clinical trial.
Figure 3.1.a displays
essentially parallel response curves where response to test treatment
is greater than response to standard treatment by a constant amount at
all observed time points following initiation of treatment (i.e., no
treatment-by-time interaction).
In Figure 3.1.b, response to test
treatment is greater than response to standard for all time points,
but the difference is not constant over all observation points.
In
Figure 3.1.c, response to test treatment is initially much higher than
response to standard but by time t3 and thereafter is less than that to
standard.
For the results displayed in Figures 3.1.b and 3.1.c, the
tests of treatment-by-time interaction could both be highly statistically
significant, reflecting the non-parallelism of the treatment group
response curves. However, the situations in Figures 3.1.b and 3.1.c
are quite dissimilar, providing examples of what Neter and Wasserman
(1974) call unimportant and important interactions, respectively.

FIGURE 3.1
THREE POSSIBLE RESPONSE PATTERNS FOR A LONGITUDINAL CLINICAL TRIAL OF
TEST VERSUS STANDARD TREATMENT
(Panels: Figure 3.1.a, No Interaction; Figure 3.1.b, Unimportant
Interaction; Figure 3.1.c, Important Treatment-by-Time Interaction.
Each panel plots treatment response over time for the test and
standard groups.)
In the no-interaction and unimportant-interaction situations,
testing for treatment main effects using the AUC analysis described in
Chapter 2 is appropriate and straightforward.
In fact, AUC analysis
may be preferable to growth curve analysis or Zerbe-Walker (Z-W) test
analysis in the case of unimportant interaction since the latter
methods may be overly sensitive to unimportant differences in curve
shape.
This over-sensitivity will tend to occur for response-over-
time data that has many observation time points and is irregular in
shape.
But how about the situation in Figure 3.1.c?
Here an AUC test
of treatment differences is non-significant since the AUC's of the two
groups are the same, but the pairwise comparisons made for each of the
observation time points might be all highly significant.
The example
in Figure 3.1.c introduces the problem of assessing treatment main
effects when treatment time-response curves are not parallel.
In
certain situations, such as a clinical trial of drugs from the same
chemical family which are expected to utilize the same mechanisms of
action and differ only in potency, the interpretation of treatment
main effects using the AUC parameter is difficult when there is
treatment-by-time interaction.
In this situation, analytical interest
centers on the detection and interpretation of an unanticipated
interaction.
In other situations, the interpretation of an AUC test
of treatment main effects in the presence of interaction may be
straightforward.
For example, in a clinical trial of active versus
placebo treatments, the time-response curves are likely to be
non-parallel over $[t_1, t_p]$; however, an AUC analysis of treatment
differences with respect to cumulative or average response is
appropriate and meaningful.
These examples suggest that aspects of
the study goals and the experimental entities under study must be
considered in assessing treatment differences in overall response when
significant treatment-by-time interaction is present.
In this
chapter, experimental issues relevant to the problem of analysis in
the interaction situation are discussed with attention given to
experimental considerations associated with pharmaceutical clinical
trials.
In addition, a generalized AUC approach to the analysis of
longitudinal data with non-parallel response curves is introduced.
3.2
EXPERIMENTAL ISSUES
To develop a rational scheme from which to assess treatment
effects in a longitudinal setting, it is important that salient
features of the particular experimental situation be considered. In
the development which follows, the discussion focuses on experimental
issues pertinent to the pharmaceutical clinical trial evaluation of
drug efficacy. Referring to Figure 3.1.c where equal AUC's were
observed, the performance of test treatment relative to standard can
be viewed as either good or bad, depending on whether the investigator
views a rapid response as desirable or undesirable. For example, in
the case of assessing analgesics for the relief of severe
postoperative pain, rapid onset of response appears desirable.
However, in the case of assessing a narcotic antagonist for reversal
of the respiratory depression associated with opiate overdose, a too
rapid response might result in risk of cardiac arrest so that a
gradual onset of response may be preferable. In yet another
experimental setting, it may be unimportant whether response is rapid
or delayed. For example, a study to compare the efficacy of different
antihypertensive agents to control high blood pressure is likely to be
interested in the overall performance over a specified period of time,
as measured by average systolic or diastolic blood pressure.
In contrast to the situation where treatments are expected to act
as dilutions of each other, the above examples illustrate experimental
situations where non-parallel response curves are anticipated and
where interest is still focused on assessing treatment effects in some
overall sense. The AUC parameter described in Chapter 2 may still be
useful in assessing treatment differences in cumulative response.
However, when differences in response curve shape among treatment
groups reflect differences in response characteristics (e.g., drug
latency) that are important with respect to deciding the relative
superiority of one treatment, AUC analysis as previously described is
inappropriate. One approach to analysis of cumulative response in the
case of treatment-by-time interaction is to use some a priori method
of weighting the various components of the p-member observation vector
$y_{ij}$ or some vector-function of $y_{ij}$ (such as the (p-1) x 1
vector of subinterval AUC responses
$\mathrm{AUC}_{ij}' = (\mathrm{AUC}_{ij1}, \ldots, \mathrm{AUC}_{ij,p-1})$)
so as to reflect desirable or expected properties of the experimental
entities under study. Attention is directed here at developing a
weighted AUC parameter, AUC-W, that can be used in the interaction
situation. Since the presence of significant treatment-by-time
interaction reflects differences in response over time, the following
development of a weighting scheme by which to weight
$\mathrm{AUC}_{ij}$ assumes that the weighting function w(t) is a
time-related function and hence envelops concepts such as drug latency
and duration of action.
Details of the particular experimental situation impact upon
whether an unweighted or weighted AUC analysis is appropriate, and in
the case of weighted-AUC analysis, affect the choice of weighting
function w(t).
In the case of the anti-hypertensive drug study, the
use of equal weights for all observation times (unweighted AUC
analysis) may be appropriate.
In the case of the analgesic study for
severe postoperative pain, it seems reasonable to weight differences
between drugs for early observation time points more highly than for
late observation time points.
For the clinical trial comparing drug
efficacy, issues related to the choice of a weighting scheme include:
1. Desirable properties of the entities under study.
For the study of a single-dose of test drug, for example,
treatment effect involves ideas such as latency, time of peak effect,
and duration.
The choice of a weighting scheme will reflect the
investigator's interest in one or more of these parameters.
2. Types of comparisons to be made.
In the case of low- versus high-dose comparison of the same
drug, for example, response to treatment involves the same sites/modes
of action and results in response curves that are parallel (Goodman
and Gilman, 1975).
For treatment comparisons of parallel response
curves, an unweighted AUC analysis is appropriate.
Unweighted
analysis also seems appropriate for active versus placebo comparisons.
For the comparison of active drugs from different chemical families
with different sites/modes of action, different patterns of response
may be anticipated and a weighted analysis may be appropriate.
3. Aspects of the study design.
As an example, the investigative trial of a drug taken
periodically over the observation period (as opposed to the
single-dose trial) often involves initial titration of dose to
effective levels that minimize side effects. Once this has been
accomplished, the comparison of relative efficacy is addressed in the
subsequent maintenance dose phase. In this situation, a weighting
scheme might give zero-weights to responses observed during the
titration study phase.
As another example, for endpoint analysis, the weighting
scheme effectively assigns zero-weights to all but the most recent
observation and a weight of one to it.
These examples suggest no hard and fast rule for the adoption of
a particular weighting scheme.
Rather the nature of the experimental
goal and aspects of the design of a particular study must be
considered by the analyst in determining a useful weighting scheme.
It is clear that whatever weighting scheme is adopted, it should
be a priori in nature. The next section will introduce one a priori
approach to weighted AUC analysis based on the concept of an expected
response function. In the case that an a priori weighting scheme
cannot be justified on sound scientific grounds, AUC analysis will be
inappropriate in assessing overall treatment differences for
non-parallel response curves when curve shape reflects drug response
dynamics that are considered important in assessing drug
effectiveness. Multivariate growth curve analysis may be useful in
assessing treatment differences when a low-order polynomial curve fits
the data well but may be of limited usefulness in describing and
interpreting irregularly shaped response curves for which high-order
coefficients must be included to provide an adequate fit.
In this
case, the analyst may be forced to approach the assessment of overall
treatment differences by combining in some fashion the univariate
comparisons at each observation time point.
3.3
DETERMINATION OF WEIGHTS FOR WEIGHTED AUC ANALYSIS USING THE
CONCEPT OF AN EXPECTED RESPONSE FUNCTION
The following development assumes linear interpolation
estimation of response function f(t) over observation period
$[t_1, t_p]$ based on the observation vector
$y_{ij}' = (y_{ij1}, \ldots, y_{ijp})$. The response vector $y_{ij}$ is
assumed to be the response of subject i to drug treatment j. The
weighted AUC parameter refers to functions of the general form

$$\text{AUC-W}_{ij} = \sum_{k=1}^{p-1} w_k\,\mathrm{AUC}_{ijk}, \qquad \text{where } \sum_{k=1}^{p-1} w_k = 1. \tag{3.1}$$

Here $\text{AUC-W}_{ij}$ denotes the weighted AUC value of the ith
subject from the jth treatment group for the period of observation
$[t_1, t_p]$; $\mathrm{AUC}_{ijk}$ denotes the unweighted AUC value for
subject i of treatment group j for the subinterval $[t_k, t_{k+1}]$,
k=1,...,p-1; and $w_k$ is some weight for $[t_k, t_{k+1}]$. Using
trapezoidal-rule approximation of AUC, the weighted AUC parameter of
equation (3.1) is essentially a double-weighting scheme employing
$w_k$ and interval length $h_k$ as weights associated with average
response value $\bar{y}(t_k, t_{k+1}) = (y_k + y_{k+1})/2$.
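The double-weighting structure of (3.1) can be made concrete with a short sketch. The a priori weights, observation times, and responses below are hypothetical, as is the function name; the subinterval AUC values use the trapezoidal rule.

```python
def weighted_auc(times, y, interval_weights):
    """AUC-W as in (3.1): sum over intervals of w_k * h_k * (y_k + y_{k+1}) / 2.

    interval_weights must have one entry per subinterval and sum to one.
    """
    if len(interval_weights) != len(times) - 1:
        raise ValueError("need one weight per subinterval")
    if abs(sum(interval_weights) - 1.0) > 1e-9:
        raise ValueError("interval weights must sum to one")
    interval_aucs = [(times[k + 1] - times[k]) * (y[k] + y[k + 1]) / 2.0
                     for k in range(len(times) - 1)]
    return sum(w * a for w, a in zip(interval_weights, interval_aucs))


times = [0.0, 1.0, 2.0, 4.0]     # hypothetical observation times
y = [2.0, 4.0, 4.0, 2.0]         # hypothetical responses of one subject
weights = [0.5, 0.3, 0.2]        # hypothetical a priori weights favouring early response

auc_w = weighted_auc(times, y, weights)
# Subinterval AUCs are 3, 4 and 6, so AUC-W = 0.5*3 + 0.3*4 + 0.2*6.
```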
Given the general idea of a weighting function, the question
arises as to the choice of a particular weighting function and its
validity. As mentioned previously, the weighting function will be
assumed for purposes of the development in this paper to be a
time-related function and hence envelop concepts such as
drug/treatment latency and duration of action. For the example of the
analgesic study previously cited, the goal of discovering an effective
new treatment with more rapid onset of action than standard drug leads
naturally to selecting a weighting scheme that is a function of desired
or expected drug latency properties. The following discussion will
utilize this idea of an expected response function to aid in the
development of an AUC weighting function w(t) for the case of
pharmaceutical clinical trials. The weighting function in turn leads
to the weights $w_1, w_2, \ldots, w_{p-1}$. Situations where the form
of the expected response function is known and where the functional
form is unknown but its general shape is known are discussed.
3.3.1
Expected Response Function
Functional Form Known
Assuming clinical response to a drug treatment is proportional to
the pharmacokinetic action of the drug in the body, it may be
possible to specify a priori the mathematical form of an expected
response function to use as a weighting function. For example, if
w(t) represents the serum blood level time-response curve after a
single dose of a standard drug (see Figure 3.2), the weights
$w_1, \ldots, w_{p-1}$ for weighted-AUC can be derived as

$$w_k = \text{weight for interval } [t_k, t_{k+1}] = \Big[\int_{t_k}^{t_{k+1}} w(t)\,dt\Big] \Big/ W, \qquad k = 1, \ldots, p-1, \tag{3.2}$$

where $W = \int_{t_1}^{t_p} w(t)\,dt$.

FIGURE 3.2
HYPOTHETICAL SERUM BLOOD LEVEL TIME-RESPONSE CURVE OF A (SINGLE-DOSE)
STANDARD DRUG.
(Axes: blood serum concentration w(t) versus time.)

Review of (3.2) shows that the weights derived from this expected
response function approach are derived as the area under the
standardized blood-concentration time-curve. Relatively high
serum-blood levels of standard drug during subinterval
$[t_k, t_{k+1}]$ result in a weight $w_k$ that is large relative to
other interval weights. Consequently the weighted AUC parameter AUC-W
of (3.1) will involve weighting the AUC value for $[t_k, t_{k+1}]$ more
than AUC values for other intervals.
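The derivation of interval weights from an expected response function can be sketched as follows. The curve used here is an assumption made purely for illustration: a difference of two exponentials, w(t) = exp(-ke*t) - exp(-ka*t), with hypothetical rate constants ke = 0.2 and ka = 1.5, chosen because its integral has a closed form. The function names and observation times are likewise hypothetical.

```python
import math


def integral_expected_response(lower, upper, ke=0.2, ka=1.5):
    """Closed-form integral of w(t) = exp(-ke*t) - exp(-ka*t) over [lower, upper]."""
    elimination = (math.exp(-ke * lower) - math.exp(-ke * upper)) / ke
    absorption = (math.exp(-ka * lower) - math.exp(-ka * upper)) / ka
    return elimination - absorption


def interval_weights(times):
    """Weights w_k of (3.2): area of w(t) on [t_k, t_{k+1}] divided by the total area W."""
    total = integral_expected_response(times[0], times[-1])
    return [integral_expected_response(times[k], times[k + 1]) / total
            for k in range(len(times) - 1)]


times = [0.0, 1.0, 2.0, 4.0, 8.0]      # hypothetical observation times (hours)
weights = interval_weights(times)

# Intervals in which the expected curve is high, or which are long,
# receive the larger weights; the weights sum to one by construction.
```

When no closed-form integral is available, the areas in (3.2) could instead be approximated numerically, for example with the trapezoidal rule applied to a tabulated concentration curve.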
A question immediately arises concerning the validity of the
underlying assumption of pharmacological response (i.e., the use of
blood plasma or serum concentrations) providing a direct measure of
clincial response.
That this is so, at least for many classes
~f
drugs in the single-dose evaluation setting, is supported by several
authors.
Goodman and Gilman (1975, p. 19) state:
"Pharmacokinetic principles relate specifically to the variation
with time of drug concentration, particularly in the blood, serum, or
plasma.
By extrapolation, they may be interpreted in terms of drug
effect.... They are not a substitute for, but rather a supplement to
clinical monitoring and judgment".
Koch-Weser (1972) states:
"Changes in the serum concentrations of most drugs are
93
accompanied by similar changes in their concentration at the site of
action and in the number of drug-receptor complexes".
Goldstein et al. (1974) discuss serum concentrations and interpatient variability with respect to response.
"When patients are given identical doses of a drug or even
identical doses per kilogram of body weight, very large differences in
response may be seen.
In some patients, the drug may have no
therapeutic effect, in others it may work well, and in still others
toxicity associated with overdosage may be seen.
Theoretically, two
sources of this variation between patients may be segregated:
differences in plasma level established by a given dose and
differences in effect produced by a given plasma level.
For most
drugs that have been studied sufficiently, the principal variation is
in the plasma level."
These citations are not meant to suggest that drug plasma
concentrations are equivalent to drug response.
Rather it is
suggested that plasma concentrations are correlated with response and
offer an objective a priori approach to the computation of meaningful
weights to be used for weighted AUC analysis when treatment-by-time
interaction is present.
The use of blood or plasma serum levels as a
basis for w(t) may be most appropriate for the single-dose trial of
drugs.
Serum concentration time-response data are usually available
for these types of investigational drugs.
Long term trials involving
repeated dosing present a problem in that the commonly performed
single-dose blood level studies are inapplicable.
In addition, doses
are often titrated to individual patient needs so that the average and
cumulative amount of drug received is not the same for all patients.
The functional form of w(t) using blood concentrations can be
specified using pharmacokinetic theory.
The two-compartment model has
been shown (e.g., see Greenblatt and Koch-Weser, 1975) to adequately
describe the absorption and elimination dynamics of many drugs. The
two-compartment model assumes that the body can be resolved into a
small central compartment (often assumed to be comprised of the blood
volume together with the extracellular fluid of tissues such as the
heart, lungs, kidneys, liver and endocrine glands) and a peripheral
compartment (muscle, skin and body fat) where drugs enter more slowly.
Assuming first-order kinetics (i.e., the rate at which a drug is
removed from a compartment is proportional to the drug concentration
in it) and instantaneous uptake, an exponential elimination process
operates and can be described by
(3.3)
[equation not recoverable from the source; it gives w(t), the concentration in the
central compartment at time t]
Parameters of the plasma concentration time curve w(t) can be solved
using Laplace transformations or using matrix differential equations
(Greenblatt and Koch-Weser, 1975).
Integration of equation (3.3) then
leads directly to the weights wk of equation (3.2).
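As an illustration of how (3.2) turns a concentration curve into interval weights, the following Python sketch integrates a biexponential curve, the form commonly associated with a two-compartment model with instantaneous uptake. The functional form w(t) = A exp(-alpha t) + B exp(-beta t), the parameter values, the observation times, and the function names are all assumptions made for this example; none of them is taken from the text.

```python
import math

def biexp_integral(t_lo, t_hi, A, alpha, B, beta):
    """Closed-form integral of A*exp(-alpha*t) + B*exp(-beta*t) over [t_lo, t_hi]."""
    part_a = A / alpha * (math.exp(-alpha * t_lo) - math.exp(-alpha * t_hi))
    part_b = B / beta * (math.exp(-beta * t_lo) - math.exp(-beta * t_hi))
    return part_a + part_b

def auc_weights(times, A, alpha, B, beta):
    """Weights w_k of (3.2): area over [t_k, t_{k+1}] divided by the total area W."""
    areas = [biexp_integral(times[k], times[k + 1], A, alpha, B, beta)
             for k in range(len(times) - 1)]
    total = sum(areas)
    return [a / total for a in areas]

# Hypothetical observation times (hours) and hypothetical curve parameters.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
weights = auc_weights(times, A=8.0, alpha=1.5, B=2.0, beta=0.2)
print([round(w, 3) for w in weights])
```

By construction the weights are positive and sum to one, so intervals with relatively high concentration receive relatively large weight.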
The derivation of the weights wk
bears a certain resemblance to
the concept of direct standardization (e.g., see Armitage, 1977)
employed in the epidemiologic comparison of populations with respect
to some relevant parameter.
For example, in the comparison of
pancreatic cancer rates of two countries, direct standardization might
be used to adjust for different age distributions of the countries
(age-by-country interaction).
Direct standardization introduces a
standard population and calculates a standardized parameter value Pj'
for population j as
(3.4)
$P_j' = \sum_k N_k P_{jk} \Big/ \sum_k N_k$
where Pjk is the category-specific rate (e.g., age-specific death
rate) in the jth study population's kth category, and Nk is the
standard population size in the kth category. The average weighted
AUC measure AUC-Wj for the jth treatment group can be written as
(3.5)
$\overline{\text{AUC-W}}_j = \sum_{k=1}^{p-1} w_k \, \overline{\text{AUC}}_{jk}$
where AUCjk is the mean unweighted AUC-value for the jth treatment
group for study interval [tk,tk+1] and wk is the corresponding weight
determined from the expected response function w(t).
Comparison of
these equations shows that the AUC weights wl, ... ,wp_l of (3.5)
derived from the expected response function correspond to the standard
population weights N1,N2,... of (3.4).
Assuming that the rationale behind weighted AUC is appropriate,
the question remains of which drug's plasma concentration time curve
(or other meaningful weighting function) should be used to calculate
w(t).
In the analgesic study example, should the standard or the test drug's
plasma concentration time curve be used?
Just as there is no unique
choice for a standard population in epidemiologic applications of
direct standardization, the choice of which drug or treatment expected
response function to use for weighted AUC is not unique.
It is felt
that the choice, as long as it reflects experimental goals, will not
greatly affect the comparison of treatments in the important
treatment-by-time interaction situation, although it will certainly
affect the absolute values of the weighted-AUC values for each
treatment group.
3.3.2
Expected Response Function
Functional Form Unknown
The value of the weights w1,w2,...,wp-1 obviously can be affected
by the choice of the weighting function w(t).
This might cause
concern to the analyst who has a general idea of expected response but
who may not know the mathematical form of such a response function.
Another problem is that the clinical response pattern may not mimic
the blood level time-response pattern.
An example of such a
situation exists for the study of monoamine oxidase (MAO) inhibitors
used in the treatment of atypical depression.
While the relationship
between serum concentrations and drug effect holds for most drugs,
Koch-Weser (1972) states,
"This is true only for drugs that act reversibly i.e., whose
action does not outlast their presence at the receptor site, but the
great majority of drugs fall into this category....
The intensity of
drugs that do not act reversibly (such as MAO inhibitors) is
presumably dependent on their initial serum concentration but is not
related to the level at any given moment."
In this situation it is desirable to develop a weighting scheme
which is invariant to small changes in expected response function
parameters and which can be used when certain less well defined
characteristics of expected response are known (e.g., knowing that
response should improve over time implies a monotonically
increasing expected response function).
In this section the effect of
different choices of a weighting function is examined.
1.
Linear Monotonic Pattern.
The linear monotonic expected response
pattern is described by w(t) = at.
[Figure: w(t) plotted against time.]
Note
that an intercept term is not necessary here
since expected response is considered relative to baseline value.
For this pattern the
weights w1,...,wp-1 are unaffected by a
change in slope a for a given interval length.
Thus for any linear
expected response function, a constant set of weights wk exists for any
given interval [t1,tp].
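The slope invariance can be checked directly. For w(t) = at the area over [tk,tk+1] is a(tk+1^2 - tk^2)/2, so the slope a cancels when each area is divided by the total. The short Python sketch below illustrates this; the visit times and the function name are hypothetical choices for the example.

```python
def linear_weights(times, slope):
    """Weights from (3.2) when w(t) = slope * t, using the exact integral."""
    areas = [slope * (times[k + 1] ** 2 - times[k] ** 2) / 2.0
             for k in range(len(times) - 1)]
    total = sum(areas)
    return [a / total for a in areas]

# Hypothetical equally spaced visits, with time measured from baseline.
times = [0, 1, 2, 3, 4]
print(linear_weights(times, slope=1.0))
print(linear_weights(times, slope=7.5))   # same weights: the slope cancels
```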
2.
General Monotonic Pattern.
A functional form that is more general
than the linear function is the monotonic
function where either w(tk) < w(tk+1)
(monotonically increasing) or w(tk) > w(tk+1)
(monotonically decreasing) for all tk in
[t1,tp].
[Figure: w(t) plotted against time.]
The situation where the
investigator expects a continuous
improvement over the study period corresponds
to selecting a monotonically increasing weighting function w(t).
Consider two monotonic exponential functions Y1 = e^{-t} and Y2 = e^{-t/2}.
Weights based on five equal interval lengths on the interval [0,5] are
presented in the following table.
One can note that changing the
coefficient by a factor of 2 results in dissimilar weights.
However,
one can see that the rank of the weights for Y1 and Y2 does not differ.

Weighting function w(t):      Y1 = e^{-t}                Y2 = e^{-t/2}
Interval                Weight Value   Rank of      Weight Value   Rank of
                                       Weight                      Weight
0-1                         .636          5             .429          5
1-2                         .234          4             .260          4
2-3                         .086          3             .158          3
3-4                         .032          2             .096          2
4-5                         .012          1             .058          1

In fact, for any monotonic expected response function used in
weighted AUC on interval [t1,tp], the rank of the subinterval weights
is invariant for equally spaced subintervals.
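The tabulated weights can be reproduced from (3.2) with a few lines of Python. This is only a sketch: the helper names are made up for the example, and the ranking convention (rank 1 for the smallest weight) is inferred from the table.

```python
import math

def exp_weights(rate, edges):
    """Weights from (3.2) for w(t) = exp(-rate * t) over the intervals in edges."""
    areas = [(math.exp(-rate * a) - math.exp(-rate * b)) / rate
             for a, b in zip(edges[:-1], edges[1:])]
    total = sum(areas)
    return [x / total for x in areas]

def ranks(values):
    """Rank 1 = smallest weight, rank n = largest weight."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for r, i in enumerate(order, start=1):
        out[i] = r
    return out

edges = [0, 1, 2, 3, 4, 5]
w1 = exp_weights(1.0, edges)   # Y1 = exp(-t)
w2 = exp_weights(0.5, edges)   # Y2 = exp(-t/2)
print([round(w, 3) for w in w1])  # [0.636, 0.234, 0.086, 0.032, 0.012]
print([round(w, 3) for w in w2])  # [0.429, 0.26, 0.158, 0.096, 0.058]
print(ranks(w1), ranks(w2))       # [5, 4, 3, 2, 1] [5, 4, 3, 2, 1]
```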
The implication here is that the exact form of the expected
response function need not be known in order to know the relative
magnitude of the weights for any monotonic weighting function w(t)
when intervals are equally spaced.
This suggests the possible use of
ranks when w(t) is monotonic but the exact form of w(t) is unknown.
For example, if w(t) is monotonically increasing, then the weights for
AUC-W can be derived as $w_k = k / \sum_{i=1}^{p-1} i$.
This simplifies to
(3.6)
$w_k(\text{ranks}) = 2k / [(p-1)p].$
When w(t) is monotonically decreasing, wk(ranks) equals 2(p-k)/[(p-1)p].
This function is a linear function of interval number and thus
essentially assumes that the expected response function w(t) is
linear.
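A minimal Python sketch of the rank-based weights follows. The function name and its `increasing` flag are illustrative choices, not notation from the text.

```python
def rank_weights(p, increasing=True):
    """Rank-based weights for the p-1 intervals between p observation times.

    Increasing w(t): w_k = 2k / ((p-1)p); decreasing w(t): w_k = 2(p-k) / ((p-1)p).
    """
    denom = (p - 1) * p
    if increasing:
        return [2 * k / denom for k in range(1, p)]
    return [2 * (p - k) / denom for k in range(1, p)]

# Example with p = 6 observation times, hence 5 intervals.
print(rank_weights(6))
print(rank_weights(6, increasing=False))
```

In both cases the weights sum to one, and the decreasing set is the increasing set in reverse order.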
3.
Symmetrical Pattern.
Assuming a symmetrical expected response
function (i.e., w(tj) = w(tk) implies that
w(tj-1) = w(tk+1) for k > j), the arguments
advanced above can be applied so that the
middle interval has the largest weight, the
two contiguous intervals are tied for the
next highest weight, etc.
4.
Skewed Pattern.
A skewed expected response pattern is
shown in the figure on the left.
[Figure: skewed w(t) plotted against time.]
For this
class of expected response functions, no
definitive relationship between time and
rank-weights is apparent.
The general shape
of the curve, a large initial increase
followed by a tapering decline, is similar to the shape of many
pharmacokinetic response curves.
This suggests, in the case of drug
evaluation, the suitability of using the blood-concentration time-response
curves discussed in section 3.3.1 to derive the weights wk.
From this review of general patterns of the expected response
function w(t), it is seen that when w(t) is expected to be generally
linear on [t1,tp] but its exact form is not known, the determination
of the weights w1,...,wp-1 is straightforward.
For a more general
monotonic pattern the ranks of possible weights are invariant to
changes in the form of w(t), and a weighting scheme based on the rank-weights
might be used.
For a symmetrical pattern, the weights based
on ranks might also be used when intervals are equally spaced.
When
the exact form of the expected response function is unknown but is
skewed in shape, no unique weighting system is determinable.
3.4
THE WEIGHTED AUC PARAMETER
From equation (3.1) the general form of the weighted AUC
parameter was shown to be $\text{AUC-W}_{ij} = \sum_k w_k \text{AUC}_{ijk}$, where AUCijk is the
unweighted AUC-value for the ith subject of the jth treatment group in
the kth time interval.
This equation can be expressed, using the
trapezoidal-rule approximation of AUC, as
(3.7)
$\text{AUC-W}_{ij} = w_1 h_1 [(y_{ij1} + y_{ij2})/2] + \cdots + w_{p-1} h_{p-1} [(y_{ij,p-1} + y_{ijp})/2]$
where yijk is the observed response at time tk, and hk is the interval
length tk+1-tk.
Expressed as a function of the observed responses
yijk, this becomes
(3.8)
$\text{AUC-W}_{ij} = \sum_{k=1}^{p} [(w_{k-1} h_{k-1} + w_k h_k)/2] \, y_{ijk}$
where $w_0 = w_p = 0 = h_0 = h_p$.
In this construction, AUC-Wij can be viewed as
a (double) weighted sum of the observed responses where the kth weight
is the average of interval lengths hk-1 and hk weighted by expected
response function weights wk-1 and wk.
With equal intervals between
observations, equation (3.8) reduces to
(3.9)
$\text{AUC-W}_{ij}(\text{equal intervals}) = h \sum_{k=1}^{p} [(w_{k-1} + w_k)/2] \, y_{ijk}$
where $w_0 = w_p = 0$.
Thus with equal time intervals, the AUC-W parameter is
a weighted sum of observed responses where the weight for yijk is the
average of the expected response function weights for intervals
adjacent to yijk.
Expressed in matrix notation, (3.7) becomes
(3.10)
$\text{AUC-W}_{ij} = h' D(w) C' y_{ij}$
where D(w) is a diagonal matrix of rank (p-1) containing the weights
wk, and h and C are defined as in equation (2.25).
The variance of
AUC-W, based on trapezoidal-rule approximation, is then
(3.11)
$\mathrm{Var}(\text{AUC-W}_{ij}) = [h' D(w) C'] \, \Sigma_j \, [C D(w) h].$
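The trapezoidal-rule form of the weighted AUC can be computed either interval by interval or as a weighted sum of the observed responses. The Python sketch below does both for one subject and shows that the two agree; the response values, times, weights, and function names are hypothetical and chosen only for illustration.

```python
def weighted_auc_by_interval(y, times, w):
    """Interval form: sum over intervals of w_k * h_k * (y_k + y_{k+1}) / 2."""
    total = 0.0
    for k in range(len(times) - 1):
        h_k = times[k + 1] - times[k]
        total += w[k] * h_k * (y[k] + y[k + 1]) / 2.0
    return total

def weighted_auc_by_response(y, times, w):
    """Response form: each y_k is multiplied by the average of the w*h products
    of its two adjacent intervals, with the boundary products taken as zero."""
    wh = [w[k] * (times[k + 1] - times[k]) for k in range(len(times) - 1)]
    wh = [0.0] + wh + [0.0]          # pad so wh[k] and wh[k+1] flank y[k]
    return sum((wh[k] + wh[k + 1]) / 2.0 * y[k] for k in range(len(y)))

# Hypothetical subject: four observations, unequal spacing, three interval weights.
times = [0.0, 1.0, 2.0, 4.0]
y = [10.0, 8.0, 5.0, 4.0]
w = [0.5, 0.3, 0.2]
print(weighted_auc_by_interval(y, times, w))
print(weighted_auc_by_response(y, times, w))
```

Writing the parameter as a linear combination of the responses is what makes the variance expression a simple quadratic form in the response covariance matrix.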
CHAPTER 4
POWER OF THE AUC AND ZERBE-WALKER F-TESTS OF
TREATMENT MAIN EFFECTS
Using the expected mean squares shown in Table 1.1 for the Zerbe-Walker
test statistic and Table 2.1 for the AUC statistic, test power
of the AUC and Zerbe-Walker (Z-W) F-statistics can be defined and
compared.
The power function h(λ) can be defined in terms of the
non-centrality parameter λ, for fixed numerator and denominator
degrees of freedom m and d and fixed α-level, as
(4.1)
$h(\lambda) = \mathrm{Prob}[\, F(m,d;\lambda) \geq F_\alpha(m,d) \,]$
where Fα(m,d) denotes the 100(1-α) percentile of the central F-distribution
with m and d degrees of freedom.
The non-centrality
parameter for the AUC fixed-effects model F-test for interval of
observation [t0,tp] is
(4.2)
$\lambda(\text{AUC}) = \dfrac{\sum_{j=1}^{g} n_j \left[ \int_{t_0}^{t_p} (\mu_{.j}(t) - \mu_{..}(t)) \, dt \right]^2}{m \, \sigma^2_{\text{AUC}}}$
where $\mu_{.j}(t)$ is the true average response function for treatment j
over [t0,tp] and $\mu_{..}(t)$ is the true average population response
function over [t0,tp].
The corresponding Z-W test non-centrality
parameter is given by
(4.3)
$\lambda(\text{Z-W}) = \dfrac{\sum_{j=1}^{g} n_j \int_{t_0}^{t_p} (\mu_{.j}(t) - \mu_{..}(t))^2 \, dt}{m \, \sigma^2_{\text{Z-W}}}$
One can note from review of (4.2) and (4.3) that λ(AUC) is greater
than zero whenever $\int_{t_0}^{t_p} \mu_{.j}(t)\,dt \neq \int_{t_0}^{t_p} \mu_{.k}(t)\,dt$ for some j,k = 1,...,g;
j≠k.
When the average AUC treatment responses are equal, λ(AUC)
equals zero.
In contrast, the Z-W non-centrality parameter λ(Z-W) is
greater than zero whenever treatment response functions are not all
identical, i.e., when $\mu_{.j}(t) \neq \mu_{.k}(t)$ for some j,k = 1,...,g; j≠k.
This indicates, as was discussed earlier in Chapter 2, that the Z-W
statistic is directed at detecting any differences in treatment
response curve shape while the AUC statistic is directed at detecting
any differences in treatment response curves which result in different
cumulative responses among treatments.
The true average response functions $\mu_{.j}(t)$ and $\mu_{..}(t)$ are
unknown and some unbiased estimates must be used instead.
In this
chapter the method of linear interpolation to estimate $\mu(t)$ on
subinterval [t0,t1] is utilized to estimate non-centrality parameters
λ(AUC) and λ(Z-W).
Since the degrees of freedom of the F-test
statistics are the same, the following result, shown by Ghosh (1973),
can be used.
For fixed m, d, and α-level
(4.4)
$\partial h(\lambda) / \partial \lambda > 0.$
This states that the F-test power function h, for constant degrees of
freedom and α-level, is an increasing function of the non-centrality
parameter.
This indicates that the power of the AUC and Zerbe-Walker
tests can be compared by comparison of their non-centrality
parameters λ(AUC) and λ(Z-W).
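Using linear interpolation to estimate the individual and average
response functions, the AUC non-centrality parameter of (4.2) can be
estimated on [t0,t1] by
(4.5)
$\hat\lambda(\text{AUC}) = \dfrac{d \sum_j n_j \left[ \int_{t_0}^{t_1} (Y_{.j}(t) - Y_{..}(t)) \, dt \right]^2}{m \sum_j \sum_i \left[ \int_{t_0}^{t_1} (Y_{ij}(t) - Y_{.j}(t)) \, dt \right]^2}$
where
$Y_{ij}(t) = a_{ij} + b_{ij} t$,
$Y_{.j}(t) = \sum_i a_{ij}/n_j + \sum_i b_{ij} t / n_j = a_{.j} + b_{.j} t$, and
$Y_{..}(t) = \sum_j a_{.j}/g + \sum_j b_{.j} t / g = a_{..} + b_{..} t$.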
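The Z-W non-centrality parameter of (4.3) can be estimated on [t0,t1]
by
(4.6)
$\hat\lambda(\text{Z-W}) = \dfrac{d \sum_j n_j \int_{t_0}^{t_1} [(a_{.j} + b_{.j} t) - (a_{..} + b_{..} t)]^2 \, dt}{m \sum_j \sum_i \int_{t_0}^{t_1} [(a_{ij} + b_{ij} t) - (a_{.j} + b_{.j} t)]^2 \, dt}.$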
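These can be represented, after integration and squaring, as
(4.7)
$\hat\lambda(\text{AUC}) = \dfrac{d \sum_j n_j [a_1^2 + a_1 b_1 (t_0 + t_1) + b_1^2 (t_0^2 + 2 t_0 t_1 + t_1^2)/4]}{m \sum_j \sum_i [a_2^2 + a_2 b_2 (t_0 + t_1) + b_2^2 (t_0^2 + 2 t_0 t_1 + t_1^2)/4]}$
and
(4.8)
$\hat\lambda(\text{Z-W}) = \dfrac{d \sum_j n_j [a_1^2 + a_1 b_1 (t_0 + t_1) + b_1^2 (t_0^2 + t_0 t_1 + t_1^2)/3]}{m \sum_j \sum_i [a_2^2 + a_2 b_2 (t_0 + t_1) + b_2^2 (t_0^2 + t_0 t_1 + t_1^2)/3]}$
where $a_1 = (a_{.j} - a_{..})$, $a_2 = (a_{ij} - a_{.j})$,
$b_1 = (b_{.j} - b_{..})$, and $b_2 = (b_{ij} - b_{.j})$.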
Review of (4.7) and (4.8) shows that the non-centrality parameters
differ only in the coefficient associated with the squared slope
terms in the numerator and denominator.
The Z-W λ-coefficient
(t0^2 + t0t1 + t1^2)/3 is greater than the AUC λ-coefficient
(t0^2 + 2t0t1 + t1^2)/4 whenever (t1-t0)^2 is greater than zero, which is
always the case, since the difference between the two coefficients is
(t1-t0)^2/12.
The non-centrality parameters of (4.7) and (4.8)
can then be rewritten, respectively, as
(4.9)
$\hat\lambda(\text{AUC}) = \dfrac{d \, (T_1 + \sum_j n_j (b_{.j} - b_{..})^2 \, h)}{m \, (T_2 + \sum_j \sum_i (b_{ij} - b_{.j})^2 \, h)}$
and
(4.10)
$\hat\lambda(\text{Z-W}) = \dfrac{d \, (T_1 + \sum_j n_j (b_{.j} - b_{..})^2 (h + \epsilon))}{m \, (T_2 + \sum_j \sum_i (b_{ij} - b_{.j})^2 (h + \epsilon))}$
where T1 and T2 are the between- and within-group sums of squares of
intercept terms and intercept-slope cross-product terms, h is the AUC
coefficient (t0^2 + 2t0t1 + t1^2)/4, and ε is the increment for the Z-W
coefficient.
By evaluating (4.9) and (4.10) to determine conditions under
which λ̂(AUC) is greater than λ̂(Z-W), the conditions under which AUC-analysis
results in a more powerful test of treatment differences in
response curve shape than the Z-W test can be determined.
It can be
shown that AUC-analysis has greater power than the Z-W test when the
proportion of variance in the slope parameter explained by between-group
differences is less than the proportion of intercept and
intercept-slope variance explained by between-group differences.
In
the case of equal slopes for all g treatment groups during [t0,t1],
the between-group slope sums of squares, $\sum_j n_j (b_{.j} - b_{..})^2$, is zero and
thus the AUC test will have greater power than the Z-W test.
When
intercepts are equal but slopes different, the Z-W test yields a more
powerful test than the AUC test.
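The comparison can be explored numerically. The Python sketch below evaluates the two estimated non-centrality parameters from summary quantities: T1 and T2 as defined for (4.9) and (4.10), and S1 and S2 as shorthand (introduced here, not in the text) for the between-group and within-group slope sums of squares. All numeric inputs are hypothetical.

```python
def lambda_auc(T1, T2, S1, S2, t0, t1, m, d):
    """Estimated AUC non-centrality: slope terms weighted by (t0 + t1)^2 / 4."""
    h = (t0 + t1) ** 2 / 4.0
    return d * (T1 + S1 * h) / (m * (T2 + S2 * h))

def lambda_zw(T1, T2, S1, S2, t0, t1, m, d):
    """Estimated Z-W non-centrality: slope terms weighted by (t0^2 + t0*t1 + t1^2) / 3."""
    c = (t0 ** 2 + t0 * t1 + t1 ** 2) / 3.0
    return d * (T1 + S1 * c) / (m * (T2 + S2 * c))

common = dict(T2=40.0, S2=5.0, t0=0.0, t1=1.0, m=1, d=20)

# Equal slopes across groups (S1 = 0): the AUC parameter is the larger one.
print(lambda_auc(T1=12.0, S1=0.0, **common), lambda_zw(T1=12.0, S1=0.0, **common))

# Equal intercepts (T1 = 0) but different slopes: the Z-W parameter is the larger one.
print(lambda_auc(T1=0.0, S1=6.0, **common), lambda_zw(T1=0.0, S1=6.0, **common))
```

Cross-multiplying the two ratios shows that the AUC parameter exceeds the Z-W parameter exactly when T1/T2 is greater than S1/S2 (for positive T2 and S2), which is the variance-proportion condition stated above.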
The above power comparisons have been based on linear
interpolation estimation of response function f(t) on subinterval
[tk,tk+1] defined by the times of observation of responses yk and
yk+1.
Assuming the linear interpolation estimate to be satisfactory over
all subintervals of [t0,tp], the power function h(λ) based on
linear interpolation is a function of the power functions within each
subinterval.
Since the calculation of between- and within-group sums
of squares for both Z-W and AUC analysis over [t0,tp] using linear
interpolation involves summing sum-of-square terms over all
subintervals, the estimated power of the tests over [t0,tp] can be
compared by averaging the power over all subintervals.
In summary, this chapter has considered the power of the Zerbe-Walker
and AUC tests to detect treatment differences in response curve
shape based on linear interpolation estimation of response function
f(t).
The construction of the AUC and Z-W response parameters,
reflected in their respective non-centrality parameters, suggests that
the Z-W test has greater power in the case of treatment-by-time
interaction.
When there is no interaction, the AUC-test is more
powerful.
It is noted that the AUC parameter is not constructed to
detect differences in the shape of treatment response curves except
where these differences lead to differences in cumulative response to
treatment.
In Chapter 6, results of analyses of clinical trial data
are presented to illustrate how the tests might typically compare with
respect to
power.
CHAPTER 5
PIECEWISE GROWTH CURVE MODELS
5.1
INTRODUCTION
In Chapter 1 the use of polynomial growth curve models to
approximate time-response function f(t) and to test for differences in
treatment response curve shapes was discussed.
Also discussed was the
possible use of piecewise models when response to treatment follows
one pattern over some subinterval of the observation period [t1,tp] and
other distinctly different patterns over other subintervals.
An
example from the pharmaceutical clinical trial setting is the active-versus-placebo
trial, where a "placebo" effect is often noted during
the initial time intervals of observation for certain types of drugs
such as psychoactive drugs.
Following this initial subjective
response to an anticipated effective treatment, subject response (e.g.,
level of depression, subjective rating of pain) may change to reflect
a more physiological response to treatment.
As another example,
Teeter (1982) investigated the use of piecewise regression models to
model body temperature as a function of the menstrual cycle.
Prior to
ovulation a linear model with negative slope was seen to fit and after
ovulation a linear model with positive slope was seen to fit.
The
piecewise model can provide a descriptively useful model which is more
easily interpreted than a higher order polynomial fit over the entire
interval [t1,tp].
It also may have advantages when interest lies in
prediction or extrapolation past the observed range of observation
(Monti et al., 1978).
Piecewise models have been studied extensively and used on an
ad hoc basis, especially in engineering applications, to smooth
observed data.
A continuous overall model on [t1,tp] is accomplished
through the constraint that the submodels meet at their endpoints,
called "join" points.
The term "spline" function denotes piecewise
models which, in addition to the constraint that the segmented curves
meet at the join points, require that derivatives of the piecewise
functions agree at the join points (Smith, 1979), thus resulting in
smooth looking curves over [t1,tp].
This distinction between spline
and piecewise functions is sometimes confused in the literature (e.g.,
see Brunelle and Johnson, 1980).
In the statistical literature,
piecewise model theory has been developed for the cross-sectional data
situation when the join points are assumed known (see, e.g., Fuller,
1969; Monti et al., 1978) or unknown (see, e.g., Hudson, 1966; Teeter,
1982).
When the join points are known, multiple regression
techniques can be used to estimate model parameters.
For example, for
the case of a three-segment linear model with two join points the
piecewise model can be written using indicator variable Ij as
(5.1)
$E(y_i) = \beta_0 + \beta_1 t_i + \beta_2 (t_i - T_1) I_1 + \beta_3 (t_i - T_2) I_2$
where
ti = time of measurement,
T1 and T2 are the join point abscissas,
Ij = 0 if ti ≤ Tj and Ij = 1 if ti > Tj, j = 1,2.
The join point abscissas may or may not coincide with observation
times t1,...,tp.
When the join points are not assumed known, the
continuity restraint is non-linear in the unknown regression parameters
and Tj (Teeter, 1982), and iterative procedures must be used to
estimate model parameters.
Quandt (1958) outlines a maximum
likelihood approach to parameter estimation while Hudson (1966)
develops an ordinary least squares estimate that is equivalent to the
maximum likelihood estimate under the assumption of normality of the
error terms.
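For the known-join-point case, the indicator-variable construction of the three-segment linear model can be sketched in Python as follows. Each row of the regression design matrix holds an intercept, the time, and one hinge term (t - Tj)Ij per join point; the join points, coefficient values, and function names are hypothetical and serve only to show that the resulting mean function is continuous with a slope change at each join point.

```python
def piecewise_row(t, joins):
    """Design-matrix row [1, t, (t - T1)*I1, (t - T2)*I2, ...],
    where Ij = 1 if t > Tj and 0 otherwise."""
    return [1.0, float(t)] + [float(t - T) if t > T else 0.0 for T in joins]

def piecewise_mean(t, coef, joins):
    """Expected response at time t for coefficients ordered like piecewise_row."""
    return sum(c * x for c, x in zip(coef, piecewise_row(t, joins)))

joins = [2.0, 5.0]                 # hypothetical join points T1, T2
coef = [10.0, -1.0, 1.5, -0.5]     # hypothetical coefficients
for t in [0, 2, 3, 5, 7]:
    print(t, piecewise_mean(t, coef, joins))
```

With these coefficients the segment slopes are -1, 0.5, and 0, and the segments meet at t = 2 and t = 5. Stacking such rows for all observation times gives the design matrix to which ordinary multiple regression can be applied.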
In the case of repeated measures data where the observations at
the different times of observation are not independent, a piecewise
model with continuity constraints within the growth curve model
suggests itself.
Assuming the join points to be known, a full within-subject
piecewise model can be constructed that fits the observed data
perfectly.
This is analogous to the situation in ordinary growth
curve analysis that a (p-1)th order polynomial curve can be
constructed to perfectly fit p observation points.
In the next
section, a statistical model for piecewise growth curve analysis is
developed.
5.2
STATISTICAL MODEL
The within-subject polynomial growth curve model for subject ij
can be written as
(5.2)
$E(y_{ij}^{*\prime}) = \beta_{ij}' T^*$
where yij* is the p x 1 vector of responses for subject ij, T* is a
q x p matrix of powers of observation time values and βij is a q x 1
matrix of polynomial coefficients to be estimated. When q equals p,
the model is a full model which fits the data perfectly.
The same
data can be perfectly fitted through the use of piecewise models with
specified join points.
Assuming the join points are known and
coincide with observed time points, the observations of the p-dimensional
observation vector yij* can be grouped into subsets to
correspond to the intervals defined by the join points.
Within each
interval containing r ≥ 2 data points an (r-1)th order polynomial can
be determined which fits the data within the interval perfectly.
The
collection of such submodels then fits the data over [t1,tp]
perfectly.
To provide clarity in the development of the piecewise growth
curve model, the case of one join point will be considered.
The
specification of models with two or more join points follows directly.
Consider the time-response data presented in Figure 1.2.
Assuming the
join point occurs at time 4, a quadratic submodel for time interval
[0,4] and a cubic submodel for interval [4,10] combine to provide a
perfect fit to the data in the sense of passing through all observed
response values y1,...,y6.
The full submodels can be specified as
(5.3)
$E(y_{ij}[t_1,t_3]') = \beta_{ij1}' T_1, \qquad E(y_{ij}[t_3,t_6]') = \beta_{ij2}' T_2$
where T1 and T2 are square matrices of powers of observation time
values and have order one less than the number of observations in
their respective subintervals.
The use of linear interpolation to
estimate f(t) over [t1,tp] can be viewed as a special case of
piecewise regression.
Here each subinterval is defined by adjacent
observations and a first-order (linear interpolated) curve constitutes
a full model for the subinterval.
In contrast to other piecewise
models, however, the linear interpolation method is motivated as an
approximation technique with parameters that in general have no
physical interpretation.
The overall piecewise model can be specified
by
(5.4)
$E(y_{ij}[t_1,t_6]') = E(y_{ij}[t_1,t_3]', \; y_{ij}[t_3,t_6]') = [\beta_{ij1}' \;\; \beta_{ij2}'] \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} = \beta_{ij}' T.$
Here yij[t1,t6] is a (p+1)th order vector which contains the response
value associated with the join point twice.
This reflects the
constraint that the submodel endpoints meet at the join point.
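Using
the six-data-point example displayed in Figure 1.2 this model can be
written for the ijth subject in transposed form as
$$
E \begin{bmatrix} y_{ij1} \\ y_{ij2} \\ y_{ij3} \\ y_{ij3} \\ y_{ij4} \\ y_{ij5} \\ y_{ij6} \end{bmatrix}
=
\begin{bmatrix}
t_1^0 & t_1^1 & t_1^2 & 0 & 0 & 0 & 0 \\
t_2^0 & t_2^1 & t_2^2 & 0 & 0 & 0 & 0 \\
t_3^0 & t_3^1 & t_3^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & t_3^0 & t_3^1 & t_3^2 & t_3^3 \\
0 & 0 & 0 & t_4^0 & t_4^1 & t_4^2 & t_4^3 \\
0 & 0 & 0 & t_5^0 & t_5^1 & t_5^2 & t_5^3 \\
0 & 0 & 0 & t_6^0 & t_6^1 & t_6^2 & t_6^3
\end{bmatrix}
\begin{bmatrix}
\beta_{ij1,0} \\ \beta_{ij1,1} \\ \beta_{ij1,2} \\ \beta_{ij2,0} \\ \beta_{ij2,1} \\ \beta_{ij2,2} \\ \beta_{ij2,3}
\end{bmatrix}
$$
where the coefficient vector contains, in order, the Interval 1
intercept, linear, and quadratic terms and the Interval 2 intercept,
linear, quadratic, and cubic terms.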
Assuming that T is common to all subjects, i.e. that subjects
have common observation times and no missing data, a between-subjects
model indicating some experimental design can be written as
(5.5)
Y = XBT + E
where Y is the N x (p+1) matrix of response data with the column of
response values at the join point repeated twice; X is an N x g
between-subject design matrix identifying treatment group membership
for the N study subjects including covariates if appropriate; E is an
N x (p+1) matrix of residuals; and B is a g x q matrix of unknown
polynomial parameters for the piecewise submodels.
When q equals p+1
a full model is specified which fits the data perfectly.
The
parameter matrix B can be partitioned to identify the parameters of
each submodel; for example B can be written as B = [B1 B2] for the case
of one join point, where B1 is a matrix of polynomial parameters for
the model fit before the join point (submodel 1) and B2 is a matrix of
polynomial parameters for the model for the interval after the join
point (submodel 2).
The full piecewise growth curve model (q = p+1) expressed in
terms of natural polynomial coefficients in (5.5) can be specified in
terms of orthonormal polynomial coefficients.
The orthonormal
polynomial coefficient matrix Torn can be determined using Cholesky
decomposition.
The procedure to determine Torn and the resulting
orthonormal model is shown below.
Starting with the model in (5.5),
$E(Y) = XBT$
$E(Y) T' = XBTT'.$
Now let TN = TT'.
TN is symmetric positive definite and thus TN
= LL' for some (p+1)-dimension lower triangular square matrix L.
Then define Torn = L^{-1}T.
By construction, Torn is orthonormal.
Thus
an orthonormal model upon which independent tests of parameters can be
based is
(5.6)
$E(Y T_{orn}') = XBL.$
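The orthonormalization step can be sketched in plain Python. The example factors TT' as LL' by a Cholesky decomposition, solves L Torn = T by forward substitution, and then forms Torn Torn', which should be close to the identity matrix. The small matrix T used here (powers 0 to 2 of three observation times) and all function names are hypothetical; a full piecewise T would be block diagonal but is treated the same way.

```python
import math

def matmul(A, B):
    """Matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cholesky(M):
    """Lower triangular L with M = L L' for a symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, B):
    """Solve L X = B for X by forward substitution (L lower triangular)."""
    n, m = len(L), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for c in range(m):
            s = B[i][c] - sum(L[i][k] * X[k][c] for k in range(i))
            X[i][c] = s / L[i][i]
    return X

# Hypothetical example: rows are powers 0..2, columns are observation times.
times = [1.0, 2.0, 3.0]
T = [[t ** power for t in times] for power in range(3)]
L = cholesky(matmul(T, transpose(T)))     # T T' = L L'
T_orn = forward_solve(L, T)               # T_orn = L^{-1} T
check = matmul(T_orn, transpose(T_orn))   # should be close to the identity
for row in check:
    print([round(x, 6) for x in row])
```

Solving the triangular system avoids forming the inverse of L explicitly, which is the usual numerical practice.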
In certain instances, a polynomial of lower order than the full
model can provide an adequate fit for one or more of the subintervals
defined by the join points.
In the case of one join point, the
piecewise model can be stated as
(5.7)
$E(Y) = X [B_{11} \;\; B_{12} \;\; B_{21} \;\; B_{22}] \begin{bmatrix} T_{11} \\ T_{12} \\ T_{21} \\ T_{22} \end{bmatrix}$
where B12 and B22 denote polynomial parameters of submodel 1 and
submodel 2 respectively thought to be zero. The reduced model can then
be written as
(5.8)
$E(Y) = X B_R T_R = X [B_{11} \;\; B_{21}] \begin{bmatrix} T_{11} \\ T_{21} \end{bmatrix}.$
The goodness-of-fit of the reduced model is tested by the test of
the null hypothesis H0: [B12 B22] = CBU = 0 where C and U are
appropriately chosen contrast matrices.
Multivariate test statistics
such as Roy's largest root and the Pillai-Bartlett trace can be used
as statistics for this test.
Just as for growth curve analysis
discussed in section 1.2.4, simultaneous confidence bounds on all
linear combinations of CBU can be constructed using Roy's union-intersection
principle.
5.3
ANALYSIS CONSIDERATIONS
Piecewise growth curve analysis can be a descriptively useful
analysis technique for repeated measures data when the form of
response to treatment changes over the period of observation [t1,tp].
In this situation, the use of a single growth curve model over [t1,tp]
can result in an inappropriate polynomial model with high-order
parameters that are difficult to interpret.
When the join points are
known from theoretical considerations or known empirically from
previous research and the join point abscissas coincide with scheduled
observation points, the piecewise growth curve model provides an
appealing model for analysis.
However, when little is known about the
response phenomena under study, the validity of the piecewise
model (and choice of particular join point abscissas) relative to a
growth curve model fit over the entire interval [t1,tp] is suspect.
Generally, the selection of a piecewise model over a regular growth
curve model should be based on some supportive scientific knowledge.
When theory suggests a possible change in the form of the
response curve over [t1,tp] but the exact location of the join points
is unknown, interest may lie in estimating the location of the join
points in addition to estimating the parameters of the response curve
in the subintervals of observation.
In this situation, the continuity
constraint is nonlinear in the unknown parameters, and an iterative
estimation approach must be used which depends on knowledge of curve
shapes on either side of the join point, and on whether the number of
observations between join points is known or unknown.
A discussion of
join point estimation approaches is beyond the scope of this chapter;
the interested reader is referred to Teeter (1982) and Hudson (1966)
for details for the case of uncorrelated error terms.
When the approximate location of the join points is known, the
piecewise growth curve model may provide a useful descriptive model
for the data.
For example, an investigator may know from clinical
experience that the first two weeks of treatment for depression may
involve a placebo effect such that the patient, in manifesting hopes
that a new treatment may provide a cure, shows temporary signs of
improvement.
In the situation of no treatment effect, the patient's
level of depression may subsequently return to baseline levels. In the
analysis of results from such a clinical trial, the choice of the Week
Two observation as the join point may be reasonable.
The
appropriateness of this model can be investigated by review of study
subject results.
When a consistent pattern of response which suggests
a change in response at Week Two is observed across patients and for
various patient population subgroups, the appropriateness of the
piecewise growth curve model is substantiated.
In the following chapter, three clinical data sets are analyzed
using analysis approaches discussed in Chapter 1.
The analysis of
data from a clinical trial of an anti-depression drug in Section 6.4
includes as an illustrative model the fitting of a piecewise growth
curve model, fit to model an initial placebo-effect phase followed by a
post-placebo effect response phase.
CHAPTER 6
ANALYSIS OF CLINICAL TRIAL DATA SETS
6.1
INTRODUCTION
In this chapter, the methods of analysis for repeated measures
data described previously are applied to three pharmaceutical clinical
trial data sets.
The data sets are analyzed to illustrate the
applicability of the analysis methods to longitudinal data with
various characteristics.
Two of the data sets are from independent
clinical trials of a nonsteroidal drug for the treatment of rheumatoid
arthritis.
The other data set is from a clinical trial of a drug
treatment for depression.
The data sets differ considerably with respect to the number of
subjects, length of the observation period, number of observations,
and type of factorial setup.
While the depression study was
conducted at a single investigative site, the arthritis studies were
cooperative trials involving up to eight investigative sites.
In each
of the studies, a number of subjects terminated study participation
before the scheduled last visit, resulting in an incomplete response
matrix Y.
While the repeated measures analysis results are based on
subjects with complete data, endpoint analysis results based on the
entire sample are included for illustrative purposes.
6.2
STUDY 1.
ANALYSIS OF TEST-DRUG VERSUS PLACEBO IN THE
TREATMENT OF RHEUMATOID ARTHRITIS
6.2.1
Introduction
The illustrative analyses in this section utilize data from a
multi-center, double-blind, placebo controlled, parallel group
randomized study of outpatients with rheumatoid arthritis.
The
efficacy of the test drug, a nonsteroidal anti-inflammatory compound,
was assessed based on a variety of joint pain and mobility response
parameters.
For the purposes here, the number of painful joints
(NPAIN) is utilized.
Patients who met the study protocol's selection
criteria were randomized to placebo or test-drug treatment and
examined at baseline and at weekly intervals for four weeks following
initiation of treatment.
Of a total of 109 subjects who were
initiated on treatment, 92 (84.4%) had complete four-week data.
This
group of 92 subjects is considered in the following analyses.
Subjects were recruited from six investigative sites.
Because of
small numbers, however, the data from some investigators are combined
for analysis, resulting in four investigator categories.
6.2.2
Data Analyses
The average number of painful joints (NPAIN) for each
investigator/treatment group combination is presented in Table 6.1.
One can note from review of Table 6.1 that, despite different levels of
NPAIN rating among investigators, the differences between
treatments in mean NPAIN are consistent across investigators.
For all but
one time point following baseline (Week 2, Investigator 2), the mean
NPAIN score for test drug is lower than that for placebo.
While the
placebo-group response profile is fairly flat over the observation
period, the active-group shows a steady decline in number of painful
joints.
This is shown graphically in Figure 6.1.
Results from separate univariate analysis at each time point are
summarized in Table 6.1.
While statistically significant differences
among investigator ratings were noted at all time points, treatment-by-investigator interaction was not found significant at any time
point.
Significant treatment differences in mean NPAIN response were
found at Week 4 (p=.0079).
Using the Bonferroni multiple comparisons
procedure and an overall α-value of .05, the Week 4 p-value falls below
α/4 = .0125, so the null hypothesis of no
treatment differences at any of the post-baseline time points is
rejected.
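The Bonferroni decision described above can be sketched in a few lines. The following is a minimal illustration, not the original computation: the treatment p-values are copied from Table 6.1, and the function name `bonferroni` and its return layout are hypothetical choices made here.

```python
# Treatment (TX) p-values at the four post-baseline visits, copied from Table 6.1.
tx_p_values = {"Week 1": 0.0767, "Week 2": 0.3316, "Week 3": 0.1316, "Week 4": 0.0079}


def bonferroni(p_values, alpha=0.05):
    """Hypothetical helper: Bonferroni threshold, adjusted p-values, rejections."""
    m = len(p_values)
    threshold = alpha / m  # each test is judged at alpha / m
    # Equivalent view: multiply each p-value by m (capped at 1), compare to alpha.
    adjusted = {name: min(1.0, p * m) for name, p in p_values.items()}
    rejected = [name for name, p in p_values.items() if p < threshold]
    return threshold, adjusted, rejected


threshold, adjusted, rejected = bonferroni(tx_p_values)
print(threshold, rejected, round(adjusted["Week 4"], 4))
```

Under these inputs only the Week 4 comparison falls below the per-test threshold, which is enough to reject the overall null hypothesis of no treatment difference at any post-baseline visit.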
Mixed model analysis of variance results are presented in Table
6.2 both including and excluding the baseline observation.
The
baseline-included analysis is useful for the analysis of within-subject factors while the baseline-excluded analysis is appropriate
for assessing treatment main effects.
Results of tests of within-subject factors are displayed based on unadjusted, $\hat{\epsilon}$-adjusted and $\tilde{\epsilon}$-
TABLE 6.1
STUDY 1 AVERAGE NUMBER OF PAINFUL JOINTS
BY TREATMENT GROUP AND INVESTIGATOR

                             Sample    Week Following Initiation of Treatment
Investigator     Treatment   Size        0       1       2       3       4
1                Placebo      13       26.92   26.46   24.00   22.38   23.77
                 Active       13       24.54   19.08   19.69   18.46   16.69
2                Placebo      17       19.70   19.59   17.75   15.44   15.57
                 Active       19       20.77   17.40   19.13   14.03   11.08
3                Placebo       6       28.67   17.83   15.00   15.83   15.50
                 Active        8       24.18   12.92   10.28   12.15   10.90
4                Placebo       8       35.72   42.09   35.59   34.20   34.34
                 Active        8       34.88   36.00   28.00   24.38   18.50
All              Placebo      44       25.97   26.42   23.23   21.65   22.21
Investigators    Active       48       24.71   20.21   19.29   16.64   13.81

Univariate Analysis Results for each Time Point. 2-sided p-values*

Model Factor                             0       1       2       3       4
TX*INV                                 .9123   .8743   .6727   .7372   .4988
INV                                    .0463   .0006   .0103   .0052   .0054
TX                                     .7011   .0767   .3316   .1316   .0079

Endpoint p-values as printed (row assignment not recoverable from the source): .0250, .0020

*Using a Bonferroni multiple comparisons procedure approach,
significance is reached when the p-value is less than α/4.
FIGURE 6.1
STUDY 1 AVERAGE NPAIN-SCORE TIME-RESPONSE PROFILE, BY TREATMENT GROUP.
[Line plot: NPAIN rating (vertical axis, 10 to 30) versus weeks following initiation of treatment (horizontal axis, 0 to 4), with one curve for Placebo and one for Active.]
TABLE 6.2
STUDY 1 MIXED MODEL ANALYSIS RESULTS FOR NPAIN.
$\epsilon$-CORRECTION FACTOR ESTIMATES AND p-VALUES
FOR UNADJUSTED AND $\epsilon$-ADJUSTED TESTS.

                                    Baseline Included   Baseline Excluded
                                    in Analysis         from Analysis
Correction Factor Estimates
  $\hat{\epsilon}$                        .8387               .9596
  $\tilde{\epsilon}$                      .9506              1.0000

Analysis Results
TX*INV*Time
  unadjusted                              .7346               .7155
  $\hat{\epsilon}$-adjusted               .7090               .7094
  $\tilde{\epsilon}$-adjusted             .7272               .7155
INV*Time
  unadjusted                              .0002               .0026
  $\hat{\epsilon}$-adjusted               .0005               .0030
  $\tilde{\epsilon}$-adjusted             .0002               .0026
TX*Time
  unadjusted                              .0293               .1049
  $\hat{\epsilon}$-adjusted               .0384               .1076
  $\tilde{\epsilon}$-adjusted             .0318               .1049
Time                                     <.0001*             <.0001*
TX*INV                                    .7811               .7082
TX                                        .0779               .0442

* This level of significance found for unadjusted and
$\epsilon$-adjusted tests.
adjusted degrees of freedom, where $\hat{\epsilon}$ is the sample estimate of $\epsilon$ and $\tilde{\epsilon}$ is the Huynh and Feldt estimator.
Review of Table 6.2 indicates no
significant treatment-by-investigator interaction but significant
investigator main effects (p=.0001).
Treatment differences in mean
NPAIN response were found to be significant (p=.0442) indicating the
efficacy of the test drug in reducing the number of painful joints.
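For readers who wish to see how the two degrees-of-freedom correction factors relate, the sketch below computes the sample (Box / Greenhouse-Geisser) estimate from a covariance matrix and then the Huynh and Feldt estimate from it. This is an illustration under stated assumptions rather than the computation used for Table 6.2: the function names are hypothetical, and the Huynh-Feldt formula is the commonly cited one with the group count taken as the number of treatment-by-investigator cells.

```python
import numpy as np


def orthonormal_contrasts(p):
    """Rows form an orthonormal basis of contrasts among p repeated measures."""
    # QR of [1 | e_1 ... e_{p-1}]: every column of Q after the first is
    # orthogonal to the vector of ones, hence a contrast.
    basis = np.column_stack([np.ones(p), np.eye(p)[:, : p - 1]])
    q, _ = np.linalg.qr(basis)
    return q[:, 1:].T


def greenhouse_geisser_epsilon(cov):
    """Sample estimate of epsilon from a p x p within-subject covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    p = cov.shape[0]
    c = orthonormal_contrasts(p)
    v = c @ cov @ c.T
    return np.trace(v) ** 2 / ((p - 1) * np.trace(v @ v))


def huynh_feldt_epsilon(eps_hat, n_subjects, n_groups, p):
    """Huynh-Feldt estimate computed from eps_hat, truncated at 1."""
    numerator = n_subjects * (p - 1) * eps_hat - 2.0
    denominator = (p - 1) * (n_subjects - n_groups - (p - 1) * eps_hat)
    return min(1.0, numerator / denominator)


# Compound symmetry satisfies sphericity, so epsilon should equal 1.
compound_symmetric = 2.0 * (0.6 * np.eye(5) + 0.4 * np.ones((5, 5)))
print(greenhouse_geisser_epsilon(compound_symmetric))

# Study 1, baseline included: 92 subjects, 5 time points, eps_hat = .8387;
# n_groups = 8 (4 investigator categories x 2 treatments) is an assumption.
print(round(huynh_feldt_epsilon(0.8387, 92, 8, 5), 4))
```

With these assumed inputs the second value is close to the .9506 shown in Table 6.2, and the baseline-excluded case (sample estimate .9596, four time points) is truncated at 1, in line with the tabled 1.0000.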
Growth curve analysis was performed initially by using a full
model including treatment-by-investigator interaction terms.
A third-order
polynomial model (goodness-of-fit p-value = .3440) provided an
adequate fit to the data and was used to test for significant
treatment-by-investigator interaction.
The test, based on Wilks'
lambda statistic, was non-significant (p=.8827), and a reduced main
effects model including treatment and investigator terms was fitted.
The correlation matrix of polynomial term parameters is presented in
Table 6.3.
In Figure 6.2, goodness-of-fit (g-o-f) p-values are
displayed for polynomial curves of various order.
The third-order
model resulted in a g-o-f p-value of .6014 and was utilized to examine
treatment differences in response curve shape.
Since the fourth-order
term was found to correlate with two of the estimation space variables
(see Table 6.3), it was included as a covariable in the analysis.
Analysis results are presented in Table 6.4.
One can see that
differences between treatment groups were found to be near-significant
with respect to the linear component of response (p=.0502) and the
cubic component
(p=.0540).
Near significance was found for treatment
group differences in the intercept term (p=.0823) despite a non-significant result (p=.7011) when a univariate test of treatment
differences at baseline was conducted.
TABLE 6.3
STUDY 1 GROWTH CURVE ANALYSIS PARAMETER
CORRELATION MATRIX (with p-values below
the diagonal) FOR MAIN EFFECTS MODEL.

                 Order of Polynomial Coefficient
          Int.      1st       2nd       3rd       4th
Int.       --     -.2587    -.1652     .2654    -.0954
1st      .0128      --      -.1031     .0341    -.1635
2nd      .1155     .3281      --      -.1095     .0196
3rd      .0106     .7468     .2989      --      -.2143
4th      .3658     .1195     .8529     .0402      --

FIGURE 6.2
STUDY 1 GOODNESS-OF-FIT p-VALUES (WILKS' LAMBDA) FOR POLYNOMIAL GROWTH CURVE MAIN
EFFECTS MODEL OF A GIVEN ORDER.
[Plot: goodness-of-fit p-value (vertical axis, 0 to 1.0) versus order of polynomial (horizontal axis, 0 to 4).]
TABLE 6.4
STUDY 1 GROWTH CURVE ANALYSIS RESULTS
MAIN EFFECTS THIRD-ORDER POLYNOMIAL MODEL
p-Value (Wilks' Lambda) for Test of Model Factor Differences

Model Parameter       Treatment    Investigator
Intercept               .0823
1st                     .0502
2nd                     .5856
3rd                     .0540
Any Differences         .0408        <.0001
AUC analysis results for the main effects model are presented in
Table 6.5.
Results are displayed according to method of AUC
estimation:
trapezoidal rule, fourth-order (full model) polynomial
approximation, and third-order (reduced model) polynomial
approximation.
Review of Table 6.5 shows that differences among
investigators in AUC response are significant throughout the
observation period.
Significant treatment differences in AUC response
were noted for the interval between Week 3 and Week 4 (p=.0286) but
not for the entire observation period (p=.1161).
Analysis results
using a reduced third-order polynomial to estimate the AUC parameter
were found to be generally closer to the results from trapezoidal-rule
AUC analysis than to the analysis based on the full polynomial model.
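The trapezoidal-rule AUC estimate used above can be sketched as follows. This is a minimal illustration only: the means are the all-investigator values from Table 6.1, whereas the analyses in Table 6.5 work from each subject's own AUC; the function names are hypothetical. Because the trapezoidal rule is linear in the responses, the AUC of a mean profile equals the mean of the subject AUCs.

```python
import numpy as np

weeks = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
# All-investigator mean NPAIN by week, copied from Table 6.1.
placebo_mean = np.array([25.97, 26.42, 23.23, 21.65, 22.21])
active_mean = np.array([24.71, 20.21, 19.29, 16.64, 13.81])


def subinterval_auc(t, y):
    """Trapezoid area for each adjacent pair of observation times.

    y may be one profile (length p) or a subjects-by-times array.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.diff(t) * (y[..., :-1] + y[..., 1:]) / 2.0


def total_auc(t, y):
    """Trapezoidal-rule AUC over the whole observation period."""
    return subinterval_auc(t, y).sum(axis=-1)


placebo_auc = total_auc(weeks, placebo_mean)
active_auc = total_auc(weeks, active_mean)
# Dividing by the length of the period expresses AUC as a mean response.
print(placebo_auc, active_auc, placebo_auc / 4.0, active_auc / 4.0)
```

The per-interval areas returned by `subinterval_auc` correspond to the Weeks 0-1, 1-2, 2-3 and 3-4 columns of Table 6.5, and they also work unchanged when the observation times are unequally spaced.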
To illustrate the use of the AUC parameter to test for treatment-by-time interaction, a MANOVA test using subinterval AUC estimates was
found to be significant (p=.0340).
This compares to a p-value of
.0581 obtained from a MANOVA test using NPAIN values for each
observation point, and a p-value of .0318 using mixed model analysis.
Results from Zerbe-Walker test analysis are included in Table
6.5.
Estimation of time-response function f(t) was based on linear
interpolation between observed responses.
Z-W analysis p-values can
be seen to be similar to AUC analysis results although generally
somewhat smaller.
TABLE 6.5
STUDY 1 AUC ANALYSIS AND ZERBE-WALKER
TEST RESULTS FOR NPAIN.
2-sided p-values

                                                     Weeks
Method of AUC Estimation    Factor    0-1     1-2     2-3     3-4     0-2     0-3     0-4

I. AUC ANALYSIS RESULTS
Trapezoidal Rule            TX       .2555   .1538   .1968   .0286   .1921   .1859   .1161
                            INV      .0001   .0001   .0007   .0013   .0001   .0001   .0002
Full (4th-order)            TX       .1242   .1695   .2624   .0308   .1330   .1585   .0977
Polynomial Model            INV      .0001   .0001   .0015   .0008   .0001   .0001   .0001
Reduced (3rd-order)         TX       .2220   .1231   .1997   .0562   .1607   .1634   .1201
Model                       INV      .0001   .0001   .0005   .0009   .0001   .0001   .0001

II. ZERBE-WALKER            TX       .1804   .1039   .1461   .0187    ----    ----   .0829
TEST RESULTS                INV      .0001   .0001   .0005   .0009    ----    ----   .0001
6.3
STUDY 2.
ANALYSIS OF TEST-DRUG VERSUS STANDARD IN THE
TREATMENT OF RHEUMATOID ARTHRITIS
6.3.1
Introduction
The analyses in this section utilize data from a multi-center,
double-blind, standard (aspirin) controlled, parallel group randomized
study of outpatients with rheumatoid arthritis.
This study
investigates the same test drug as the study analyzed in section 6.2
and uses the same response measures such as NPAIN.
Characteristics of
this data set differ, however, in the following ways:
--there are a greater number of investigators participating
in the trial.
Eleven investigative sites (grouped into
eight investigator categories, due to small sample sizes)
participated in this study.
--there are a greater number of observation time points, a
longer observation period, and the observation times are
not all equally spaced.
The NPAIN response of subjects was
measured at baseline, at Weeks 1, 2, 3 and 4, and at Months
2, 3, 4, 5, 6 and 6.5.
Of 169 patients randomized to treatment, 81 (48%) had complete data.
This group of 81 subjects is considered in the analyses presented in
the following section.
6.3.2
Data Analyses
The average number of painful joints (NPAIN) for each
investigator and treatment group combination is presented in Table
6.6.
Review of Table 6.6 shows that for the majority of observation
times following baseline, the mean NPAIN values of the test-drug group
are lower than those of the aspirin (ASA) group.
An exception to this
is noted for Investigator 5's results, where for each time point the
mean NPAIN value of the test-drug group is larger than that of the ASA
comparison group.
The mean NPAIN values are plotted in
Figure
6.3 by
treatment group.
Results from univariate analysis at each time point are presented
in Table 6.7.
Analysis results show no significant treatment-by-investigator interaction at any time point.
Investigator differences
are statistically significant for observation time points up to Month
4 but not thereafter.
Treatment differences were not found to be
significant at any time point, although the mean NPAIN values shown in
Table 6.6 show that the test-drug group tends to have a slightly lower
number of painful joints throughout the course of the six and one-half
month treatment period relative to the ASA group.
Mixed model analysis of variance results are given in Table 6.8.
As for the analysis in the previous section, results of tests on
within-subject factors are presented using unadjusted, $\hat{\epsilon}$-adjusted,
and $\tilde{\epsilon}$-adjusted degrees of freedom. The results for between-subject
factors indicate no significant treatment-by-investigator interaction
(p=.6764) or treatment main effects (p=.3592).
Investigators showed
significant differences in mean NPAIN ratings over the course of the
TABLE 6.6
STUDY 2 AVERAGE NUMBER OF PAINFUL JOINTS (NPAIN)
BY TREATMENT GROUP AND INVESTIGATOR.
SUBJECTS WITH COMPLETE DATA

                                        Investigator                           All
Time        Treatment     1      2      3      4      5      6      7      8   Inv's
0           ASA         22.0   18.8   19.0   14.0   25.0   28.2   18.0   32.4   21.5
            Test        20.4   22.8   19.0   14.7   31.0   30.4   19.8   35.0   24.1
1 Wk        ASA         34.0   15.0    7.3   11.0   15.0   15.5   12.1   28.3   14.6
            Test        15.4   17.0   11.8   11.0   24.3   20.8   11.6   30.5   17.0
2 Wks       ASA         34.0   15.0   10.3   13.7   13.3   23.2   13.0   24.5   16.4
            Test        14.8   19.8    7.7    6.3   21.1   22.4   10.0   23.5   15.7
3 Wks       ASA         34.0   15.8    6.7   10.7    9.3   21.3   14.1    6.8   13.1
            Test        11.6   16.5    6.8    3.7   22.1   23.3    8.4   22.0   14.5
1 Month     ASA         34.0   14.2    4.3    5.5   17.7   25.7   10.9   12.8   13.1
            Test        15.8   10.2    7.8   11.3   19.4   22.6    7.9   27.0   14.4
2 Months    ASA         28.0   14.5    8.3   10.2   17.0   32.2   10.7   16.2   15.7
            Test        10.5   18.0    8.7    5.3   17.1   17.8    8.5   18.5   13.1
3 Months    ASA         17.0   13.0    7.0   12.2    7.0   25.7   12.1   15.8   13.7
            Test         7.9   15.5    8.8    5.3   16.4   18.9    5.9   15.5   12.0
4 Months    ASA         11.0   14.0    6.7   15.2    9.7   25.7   13.6   20.5   15.1
            Test         7.7    8.3   13.8    8.0   18.0   14.5    9.6   11.0   11.8
5 Months    ASA         28.0   15.0    3.7   11.6   10.7   24.8   14.1   19.3   14.5
            Test        11.4   11.2    8.5    8.0   19.7   15.3    8.4   13.5   12.1
6 Months    ASA         40.0    9.8    7.5    6.4   14.0   26.0   13.7   21.8   14.7
            Test         8.5   14.2   10.2    6.7   18.6   12.0    9.0   14.5   11.8
6.5 Mos.    ASA          0.0   12.2    4.7   11.7   13.3   23.0   13.9   18.3   13.4
            Test         9.9   11.8    9.7    3.0   21.4    9.8    9.4    8.0   11.0
FIGURE 6.3
STUDY 2 AVERAGE NPAIN SCORE TIME-RESPONSE PROFILE
BY TREATMENT GROUP. SUBJECTS WITH COMPLETE DATA.
[Line plot: NPAIN rating (vertical axis, 0 to 30) versus months following initiation of treatment (horizontal axis, baseline to 7), with one curve for ASA and one for Test.]
TABLE 6.7
STUDY 2 UNIVARIATE ANALYSIS RESULTS
FOR EACH OBSERVATION TIME POINT.
SUBJECTS WITH COMPLETE DATA.

                       Model Factor
Time Point     TX*INV      INV        TX
Baseline       .9987      .0225      .4319*
Wk 1           .6757      .0094      .3798*
Wk 2           .6129      .0110      .5645
Wk 3           .1987      .0243      .9899*
Month 1        .4777      .0010      .9647*
Month 2        .4881      .0080      .1568
Month 3        .7028      .0514      .4399
Month 4        .5748      .5164      .2870
Month 5        .6155      .1870      .2675
Month 6        .2092      .1991      .1698
Month 6.5      .2972      .2825      .2641
Endpoint       .4340       ----      .2210

* Direction of difference favors ASA treatment.
Using a Bonferroni multiple comparisons procedure approach,
significance is reached for a p-value that is less than
α/10.
TABLE 6.8
STUDY 2 MIXED MODEL ANALYSIS RESULTS FOR NPAIN
$\epsilon$-CORRECTION FACTOR ESTIMATES AND p-VALUES FOR
UNADJUSTED AND $\epsilon$-ADJUSTED TESTS.

                                    Baseline Included   Baseline Excluded
                                    In the Analysis     From the Analysis
Correction Factor Estimates
  $\hat{\epsilon}$                        .6372               .6625
  $\tilde{\epsilon}$                      .8769               .9053

Analysis Results
TX*INV*Time
  unadjusted                              .0013               .0002
  $\hat{\epsilon}$-adjusted               .0078               .0015
  $\tilde{\epsilon}$-adjusted             .0024               .0003
INV*Time
  unadjusted                              .0009               .0003
  $\hat{\epsilon}$-adjusted               .0058               .0027
  $\tilde{\epsilon}$-adjusted             .0016               .0006
TX*Time
  unadjusted                              .0329               .1231
  $\hat{\epsilon}$-adjusted               .0629               .1573
  $\tilde{\epsilon}$-adjusted             .0409               .1317
Time                                     <.0001*             <.0001*
TX*INV                                    .7458               .6764
INV                                       .0239               .0275
TX                                        .4433               .3592

* This level of significance found for unadjusted and
$\epsilon$-adjusted tests.
observation period (p=.0275).
Using a full two-way model including treatment-by-investigator
interaction terms, a fifth-order polynomial curve was initially fitted
to the data.
The p-value of the goodness-of-fit test for this model
was .0942 suggesting a somewhat poor fit, but it was decided to employ
this reduced model since interpretation of model polynomial
coefficients of higher order would be clearly impossible.
The
correlation matrix of polynomial parameters is presented in Table 6.9.
Review of Table 6.9 shows no trend for any variables in the error
space to correlate consistently across variables in the estimation
space.
For this reason, no higher-order terms were included as
covariables in the model.
Review of analysis results, shown in Table 6.10, indicates that a
significant treatment-by-investigator interaction exists (p=.0026).
This is in sharp contrast to results of separate univariate analyses
and mixed model analysis discussed previously.
Assuming treatment
group main effect tests are still meaningful in the presence of
interaction, the analysis results show significant treatment group
differences in linear trend and near significance for the second- and
fourth-order polynomial coefficients.
AUC analysis results are presented in Table 6.11.
Results are
presented based on trapezoidal-rule approximation of AUC.
Review of
Table 6.11 shows no significant treatment-by-investigator interaction
over the six and one-half month treatment period (p=.6634).
Treatment
group differences in cumulative response to treatment were not found
to be significant (p=.3269).
Differences in mean NPAIN ratings among
investigators were statistically significant (p=.0442).
The use of a
TABLE 6.9
STUDY 2 GROWTH CURVE ANALYSIS
PARAMETER
CORRELATION MATRIX (with p-values below
the diagonal).

                                       Polynomial Coefficient
         Int.     1st     2nd     3rd     4th     5th     6th     7th     8th     9th    10th
Int.      --    -.3963   .3031  -.4993   .1197  -.4527   .1558  -.1768  -.0432  -.1287   .2512
1st     .0003     --    -.5478   .1898  -.1009   .4591  -.1948   .1710  -.0691   .2114   .0403
2nd     .0059   .0000     --    -.3316   .1526  -.3541  -.0612   .1365  -.0138   .0335   .0636
3rd     .0000   .0896   .0025     --    -.2921   .2777  -.1242  -.0002  -.0055  -.0502  -.1690
4th     .2870   .3701   .1738   .0081     --    -.3253   .4130   .1346  -.0545  -.2310   .2155
5th     .0000   .0000   .0012   .0121   .0030     --    -.1453   .1966   .1628   .0145  -.3168
6th     .1649   .0814   .5876   .2691   .0001   .1956     --    -.2571   .0437  -.0813   .1042
7th     .1143   .1270   .2244   .9984   .2308   .0785   .0205     --    -.1564  -.1130   .0248
8th     .7020   .5397   .9028   .9612   .6291   .1465   .6984   .1632     --    -.0728  -.2634
9th     .2523   .0582   .7668   .6565   .0380   .8977   .4704   .3151   .5182     --     .1490
10th    .0237   .7210   .5725   .1316   .0533   .0040   .3545   .8258   .0175   .1842     --
TABLE 6.10
STUDY 2 GROWTH CURVE ANALYSIS RESULTS
FIFTH-ORDER POLYNOMIAL MODEL.
p-Values (Wilks' Lambda) for Tests of Differences

Polynomial                            Model Factor
Coefficient                      Treatment    Investigator
Any diffs                          .0741         .0027
Intercept                          .5017
1st                                .0171
2nd                                .0661
3rd                                .3602
4th                                .0817
5th                                .8305

Treatment-by-Investigator
Interaction                        .0026
TABLE 6.11
STUDY 2 AUC ANALYSIS AND ZERBE-WALKER TEST RESULTS
SUBJECTS WITH COMPLETE DATA.

                                               Time Period
                         Weeks                                 Months
Factor       0-1     1-2     2-3     3-4     1-2     2-3     3-4     4-5     5-6    6-6.5   0-6.5

I. AUC Analysis Results
TX*INV      .9623   .6479   .3575   .3634   .5858   .6415   .6842   .6095   .3860   .4366   .6634
INV         .0098   .0086   .0125   .0029   .0014   .0130   .2080   .3621   .1650   .2205   .0442
TX          .3749   .8749   .7702   .9871   .4242   .2489   .3370   .2657   .1922   .1870   .3269

II. Zerbe-Walker Test Results
INV         .0104   .0095   .0247   .0007   .0017   .0158   .1866   .3463   .2176   .3062   .0563
TX          .3133   .8018   .5860   .5688   .5856   .3769   .3244   .2940   .3217   .3328   .3702
10th-order polynomial (full model) to estimate AUC resulted in
distinctly different results (TX-by-INV p=.0712; INV p=.0763; TX
p=.0756).
A MANOVA test of the null hypothesis of no treatment-by-time interaction using subinterval AUC estimates was found non-significant (p=.3415) compared to a MANOVA test p-value of .2007 when
NPAIN values for each observation time were used.
Results from Zerbe-Walker test analysis are included in Table
6.11.
Calculation of the Z-W test statistic was based on linear
interpolation between observed responses.
Review of Table 6.11 shows a
general similarity between AUC and Z-W test results.
6.4
STUDY 3.
ANALYSIS OF TEST-DRUG VERSUS PLACEBO IN THE
TREATMENT OF DEPRESSION
6.4.1
Introduction
The illustrative analyses in this section utilize
data
from a
randomized placebo-controlled parallel-group double-blind clinical
trial to assess the efficacy of a test drug in the treatment of
depression (Davidson et al., 1981).
The test drug is believed to
stimulate mental activity in depressed patients by inhibiting the
neural enzyme, monoamine oxidase (MAO).
Outpatients were selected who
had persistent depression and who failed to respond to other
psychotropic drugs.
Drug dosage was adjusted on an individual basis
to achieve 90 percent MAO inhibition.
A total of 29 patients were
randomized to either placebo or test-drug treatment groups and
followed for a six-week observation period.
At baseline and Weeks 1,
2, 3, 4 and 6, patient depression levels were assessed by physician
rater using a modification of the Hamilton Depression Scale (Davidson
et al., 1982).
The response parameter utilized in the following
analyses is the total Hamilton Scale score (HAMTOT) obtained by
summing the score of the 28 scale items.
Due to early dropouts, only
19 subjects completed the full six weeks of treatment.
Several characteristics of this data set differ from those of the
rheumatoid arthritis clinical trials previously discussed.
Data set
features include
-- a relatively small total sample size.
-- a one-way ANOVA layout.
In contrast to the arthritis
studies, where a two-way layout was used to investigate
treatment-by-investigator interaction and investigator
differences, this study was conducted at only one
investigative site.
-- the intervals between observations are not all equally
spaced.
In the following section, illustrative analyses to assess treatment
differences are presented.
6.4.2
Data Analyses
The average total Hamilton Depression Scale (HAMTOT) scores for
placebo and
test-drug
groups are presented in Table 6.12 for each
observation time point.
Results are displayed for all subjects
TABLE 6.12
STUDY 3 AVERAGE HAMTOT SCORES BY TREATMENT GROUP
ALL SUBJECTS AND SIX-WEEK COMPLETERS.

                               Weeks Following Initiation of Treatment
                               0        1        2        3        4        6     Endpoint
I. All Subjects
   Placebo        n           14       14       14       14       12        8       14
                  mean      32.86    21.07    16.71    16.21    16.92    15.00    19.21
   Test Drug      n           15       15       15       15       13       11       15
                  mean      31.20    19.40    11.60     8.80     5.69     4.55     6.73
   p-value                   .5281    .6366    .1886    .0373    .0050    .0165

II. 6-Week Completers
   Placebo (n=8)            30.25    20.50    11.00    10.62    13.12    15.00
   Test Drug (n=11)         32.36    18.18    12.00     7.91     4.82     4.55
   p-value                   .2055    .1172    .6105    .1275   <.0001    .0165

Using a Bonferroni multiple comparisons procedure approach,
significance is reached when a p-value is less than α/5.
observed at each time point and for subjects completing the six-week
observation period.
One can note that the six-week completer group
tended to exhibit lower HAMTOT scores over the course of treatment
than the all-subjects group.
One can also note that while test-drug
symptomatology steadily decreased over time, the placebo group mean
HAMTOT scores initially decreased but then began to return to baseline
levels.
A graphical display of HAMTOT means is presented in Figure
6.4 for the six-week completers.
Results from separate univariate analysis at each time point are
shown in Table 6.12.
Significant treatment differences are evident
for Weeks 3, 4, and 6.
Using the Bonferroni multiple comparisons
procedure and an α-value of 0.05, the Week 4 difference is judged
statistically significant, indicating that the null hypothesis of no
treatment differences at any of the time points is rejected.
Mixed model analysis of variance results are presented in Table
6.13 for the first three weeks of treatment and the entire six-week
treatment period.
Test results of within-subject factors are
presented based on no correction factor; based on $\hat{\epsilon}$, the maximum
likelihood estimate of the degrees-of-freedom correction factor $\epsilon$;
and based on the Huynh and Feldt estimator $\tilde{\epsilon}$. Review of Table 6.13
indicates a statistically significant treatment-by-time interaction
for Weeks 0-6 regardless of which correction factor is considered.
A
test of treatment main effects for Weeks 0-6 is non-significant
(p=.2736).
A growth curve analysis was performed for subjects with complete
six-week data
(n=19).
The parameter correlation matrix for the
complete (fifth-order polynomial) one-factor model is presented in
FIGURE 6.4
STUDY 3 PLOT OF HAMTOT MEANS BY TREATMENT GROUP
SUBJECTS WITH COMPLETE DATA
[Line plot: HAMTOT score (vertical axis, 0 to 35) versus weeks following initiation of treatment (horizontal axis, baseline to 6), with one curve for Placebo and one for Test.]
TABLE 6.13
STUDY 3 MIXED MODEL ANALYSIS RESULTS
$\epsilon$-CORRECTION FACTOR ESTIMATES AND p-VALUES
FOR UNADJUSTED AND $\epsilon$-ADJUSTED TESTS

                                    0 - 3 Weeks    0 - 6 Weeks
$\hat{\epsilon}$                       .9143          .6812
$\tilde{\epsilon}$                    1.0000          .9227
TX*Time
  unadjusted                           .2387          .0029
  $\hat{\epsilon}$-adjusted            .2413          .0097
  $\tilde{\epsilon}$-adjusted          .2387          .0039
Time                                  <.0001*        <.0001*
TX                                     .1471          .2736

* This level of significance found for unadjusted
and $\epsilon$-adjusted tests.
Table 6.14.
The goodness-of-fit of the different-order polynomial
models is presented in Figure 6.5.
Review of Figure 6.5 shows that a
polynomial model of order at least three is needed to fit the data
adequately (goodness-of-fit p-value = .3316).
Utilizing a third-order
polynomial to fit the data, treatment comparisons were made by
appropriate choice of contrast matrices C and U to test the null
hypothesis $H_0: C \xi U = 0$, where $\xi$ is a 2 x 4 matrix of orthonormal
polynomial parameters.
Estimates of natural polynomial coefficients
for both treatment groups and analysis results are given in Table
6.15.
Review of Table 6.15 shows significant treatment differences
with respect to the linear component (p=.0041) indicating that the
linear component of response to treatment of the test-drug group was
significantly larger than that of the placebo group over the six-week
observation period.
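Because the Study 3 observation times (Weeks 0, 1, 2, 3, 4 and 6) are not equally spaced, tabled orthogonal polynomial coefficients do not apply directly. One way to obtain orthonormal polynomial scores for arbitrary times is a QR decomposition of the matrix of powers of time. The sketch below illustrates that idea; it is not necessarily how the coefficients in this chapter were computed, and the function name is hypothetical. The placebo completer means from Table 6.12 are used only as example input.

```python
import numpy as np


def orthonormal_polynomials(times, degree):
    """Columns are orthonormal polynomial scores of order 0..degree at `times`."""
    t = np.asarray(times, dtype=float)
    powers = np.vander(t, N=degree + 1, increasing=True)  # columns 1, t, t^2, ...
    q, r = np.linalg.qr(powers)
    # Fix the sign of each column so that every polynomial has a positive
    # leading coefficient (QR is unique only up to column signs).
    return q * np.sign(np.diag(r))


times = [0, 1, 2, 3, 4, 6]
P = orthonormal_polynomials(times, degree=3)

# Placebo six-week completer means from Table 6.12.
placebo_means = np.array([30.25, 20.50, 11.00, 10.62, 13.12, 15.00])
ortho_coef = P.T @ placebo_means   # orthonormal polynomial coefficients
fitted = P @ ortho_coef            # least-squares cubic fit at the six times
print(np.round(ortho_coef, 3))
print(np.round(fitted, 2))
```

Since the columns of `P` are orthonormal, each coefficient is a simple inner product with the response vector, and dropping a higher-order column leaves the lower-order coefficients unchanged, which is what makes tests on individual components convenient.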
AUC analysis results for these data, using linear interpolation
estimation of response function f(t), are summarized in Table 6.16.
Treatment differences become apparent for the intervals following the
Week 2 observation time.
Considering cumulative response to treatment
over the entire six-week period, differences between the treatment
groups in AUC were not found significant (p=.1800).
The same test for
treatment differences in AUC response based on fifth-order polynomial
estimation of f(t) resulted in a p-value of .2659.
A third-order
polynomial model (used in the growth curve analyses) resulted in a p-value of .1795.
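When f(t) is estimated by a polynomial, the AUC is obtained by integrating the fitted polynomial over the interval of interest. The sketch below does this for the group-level cubic curves whose natural coefficients appear in Table 6.15, and compares the result with trapezoidal AUCs of the completer means in Table 6.12. It is an illustration at the level of group means (the reported tests use subject-level AUC estimates), and the function name is hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Natural cubic coefficients (intercept, linear, quadratic, cubic), Table 6.15.
placebo_cubic = Polynomial([30.89, -16.13, 4.10, -0.308])
test_cubic = Polynomial([31.99, -15.64, 3.08, -0.207])


def polynomial_auc(poly, lower, upper):
    """Definite integral of a fitted polynomial response curve."""
    antiderivative = poly.integ()
    return antiderivative(upper) - antiderivative(lower)


auc_placebo_poly = polynomial_auc(placebo_cubic, 0.0, 6.0)
auc_test_poly = polynomial_auc(test_cubic, 0.0, 6.0)

# Trapezoidal AUC of the six-week completer means (Table 6.12) for comparison.
weeks = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 6.0])
placebo_means = np.array([30.25, 20.50, 11.00, 10.62, 13.12, 15.00])
test_means = np.array([32.36, 18.18, 12.00, 7.91, 4.82, 4.55])
widths = np.diff(weeks)
auc_placebo_trap = np.sum(widths * (placebo_means[:-1] + placebo_means[1:]) / 2.0)
auc_test_trap = np.sum(widths * (test_means[:-1] + test_means[1:]) / 2.0)

print(auc_placebo_poly, auc_placebo_trap)
print(auc_test_poly, auc_test_trap)
```

For these group curves the two estimation methods give similar areas, which is consistent with the closeness of the third-order polynomial and linear interpolation p-values reported above.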
To illustrate the use of the AUC parameter to test for treatment-by-time interaction, a MANOVA test using subinterval AUC estimates was
found non-significant for Weeks 0-6 (p=.0782).
This
compares
to a p-
TABLE 6.14
STUDY 3 GROWTH CURVE ANALYSIS PARAMETER
CORRELATION MATRIX (with p-values below
the diagonal).

                      Polynomial Coefficient
          Int.      1st       2nd       3rd       4th       5th
Int.       --      .0468    -.0183     .5444    -.2492     .0926
1st      .8492      --      -.2235    -.3000    -.1149    -.2747
2nd      .9409     .3577      --      -.4252    -.5331     .1739
3rd      .0159     .2120     .0695      --       .0625    -.2258
4th      .3036     .6395     .0188     .7995      --       .1512
5th      .7062     .2550     .4764     .3526     .5366      --

FIGURE 6.5
STUDY 3 GOODNESS-OF-FIT p-VALUE (Wilks' Lambda)
FOR POLYNOMIAL GROWTH CURVES OF GIVEN ORDER.
[Plot: goodness-of-fit p-value (vertical axis, 0.0 to 1.0) versus order of polynomial (horizontal axis, 0 to 5).]
TABLE 6.15
STUDY 3 ESTIMATES OF NATURAL POLYNOMIAL
COEFFICIENTS FOR THIRD-ORDER MODEL WITH
p-VALUES FOR TESTS OF TREATMENT DIFFERENCES.

                                  Intercept    Linear    Quadratic    Cubic
Polynomial Parameter Estimate
  Placebo                           30.89      -16.13       4.10      -.308
  Test Drug                         31.99      -15.64       3.08      -.207
p-value for H0: parameter=0
  Placebo                          <.0001       .0003      .0001      .0375
  Test Drug                        <.0001      <.0001     <.0001      .0933
Tx Difference p-value               .2736       .0041      .7475      .5803
Any differences                     .0656*

* p-value for test of any differences between treatment
groups with respect to polynomial parameters. Tests
are based on orthonormal polynomial coefficients.
TABLE 6.16
STUDY 3
AUC AND ZERBE-WALKER TEST RESULTS
TWO-SIDED p-VALUES FOR TEST OF NO TREATMENT
DIFFERENCES.
SUBJECTS WITH COMPLETE DATA.

                                        Time Period (Weeks)
                             0-1     1-2     2-3     3-4     4-6     0-3     0-6
AUC (Linear Interpolation)  .5365   .2905   .0669   .0132   .0156   .1863   .1800
Zerbe-Walker Test           .5579   .2967   .0740    ----    ----   .1899    ----
value of .0522 for a MANOVA test using HAMTOT values for each
observation point, and a p-value of .0097 using mixed model analysis.
Included in Table 6.16 are results from the Zerbe-Walker test for
differences in treatment group curve shape for selected subintervals
of the treatment period.
Estimation of the Z-W parameter was based on
using linear interpolation estimation of response function f(t) within
each subinterval.
Review of Table 6.16 indicates that the Z-W results
are closely similar to those obtained using AUC analysis.
In summary, these results suggest that the effect of the test drug on HAMTOT scores began to manifest itself following the Week 2
visit.
When considering the entire six-week observation period, no
significant differences between treatment groups in cumulative
response were noted (AUC p-value = .1800).
A third-order polynomial
model was found to provide an adequate fit to the six-week data, and
polynomial components up to the cubic term were found to be
significantly different from zero for both treatment groups (the p-value of the cubic component for active drug reached near-significance, p=.0933).
Over this period a significant difference
between treatments with respect to the linear component of response
was noted (p=.0041).
To provide an illustrative example of the use of piecewise growth
curve analysis (see Chapter 5 for a discussion), this section
concludes with an analysis of the depression data based on a two-segment piecewise growth curve model.
It is assumed that
these
response data (see Figure 6.4) reflect one form of response for Weeks
0-2 (Time Period 1) and another form of response in the subsequent
period of observation (Time Period 2).
For example, it might be
conjectured that the first two weeks following initiation of treatment
represent primarily a period of placebo-effect response.
Based on this framework, a quadratic piecewise model fitted for
Weeks 0-2 and a cubic piecewise model fitted for Weeks 2-6 can be
constructed so as to fit the data perfectly.
A review of response-
over-time plots for each of the 19 subjects showed that the response
of most subjects in a given treatment group was similar in shape to
the average group curve displayed in Figure 6.4, thus providing some
evidence that a segmented model is appropriate.
The parameter
correlation matrix for the complete quadratic/cubic piecewise model is
given in Table 6.17.
A reduced quadratic/quadratic model was found to provide an
adequate fit to the data (goodness-of-fit p-value = .7708).
Results
from analysis using this model are presented in Table 6.18.
Assuming
the model is correct, statistically significant differences between
treatment groups were found for the linear component during Time
Period 2.
Differences with respect to the quadratic component for
Time Period 1 approached statistical significance (p=.0763).
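To make the structure of the reduced quadratic/quadratic model concrete, the sketch below evaluates the two fitted segments using the natural coefficient estimates from Table 6.18. It assumes, as an interpretation not stated explicitly in the text, that both segments are expressed in weeks since initiation of treatment (not re-centered at Week 2); under that assumption the fitted values track the completer means in Table 6.12 closely. The dictionary layout and function name are hypothetical.

```python
# (intercept, linear, quadratic) estimates for each time period, from Table 6.18.
PIECEWISE_COEF = {
    "placebo": {"period1": (30.25, -9.88, 0.13), "period2": (8.89, 0.722, 0.05)},
    "test_drug": {"period1": (32.36, -18.18, 4.00), "period2": (26.02, -8.642, 0.84)},
}
KNOT_WEEK = 2.0  # boundary between Time Period 1 (Weeks 0-2) and Time Period 2


def piecewise_response(week, group, coef=PIECEWISE_COEF, knot=KNOT_WEEK):
    """Fitted mean HAMTOT at `week`, using the segment that contains it.

    Assumes time is measured in weeks from baseline in both segments.
    Week 2 itself is evaluated with the Period 1 segment.
    """
    period = "period1" if week <= knot else "period2"
    b0, b1, b2 = coef[group][period]
    return b0 + b1 * week + b2 * week ** 2


for group in ("placebo", "test_drug"):
    fitted = [round(piecewise_response(w, group), 2) for w in (0, 1, 2, 3, 4, 6)]
    print(group, fitted)
```

The two segments are estimated separately and need not agree exactly at Week 2, so a plot of the fitted model may show a small jump at the boundary.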
Several observations concerning a "placebo effect" are suggested
from the analysis results from use of the piecewise model.
From
review of the tests of parameters for the placebo group in Time Period
1, the placebo effect for this population can be described as linear
in nature.
While the quadratic component for placebo is non-significant (p=.9372), the linear component is highly significant
TABLE 6.17
STUDY 3 PIECEWISE GROWTH CURVE ANALYSIS.
MATRIX OF PARAMETER CORRELATION COEFFICIENTS
WITH p-VALUES BELOW THE DIAGONAL.

                          Piecewise Model Coefficients
           T1INT    T1LIN    T1QUAD   T2INT    T2LIN    T2QUAD   T2CUB
T1INT       --      .4257   -.4483    .6817   -.1268    .6761   -.0023
T1LIN     .0692      --      .2723    .1192   -.4983    .3839   -.1208
T1QUAD    .0542    .2593      --     -.4307   -.1857   -.0623   -.0701
T2INT     .0013    .6268    .0656      --      .4989    .1486   -.1756
T2LIN     .6049    .0299    .4466    .0297      --     -.1800   -.4884
T2QUAD    .0015    .1047    .8001    .5438    .4608      --     -.0401
T2CUB     .9925    .6223    .7754    .4721    .0339    .8704      --

TABLE 6.18
STUDY 3 PIECEWISE (TWO-SEGMENT) GROWTH CURVE
ANALYSIS RESULTS.
NATURAL POLYNOMIAL COEFFICIENT
ESTIMATES FOR QUADRATIC/QUADRATIC MODEL WITH
p-VALUES FOR TESTS OF TREATMENT DIFFERENCES.

                                 T1INT    T1LIN    T1QUAD   T2INT    T2LIN    T2QUAD
Polynomial Parameter Estimate
  Placebo                        30.25    -9.88     0.13     8.89    0.722     0.05
  Test Drug                      32.36   -18.18     4.00    26.02   -8.642     0.84
p-value for test of
H0: parameter=0
  Placebo                       <.0001    .0001    .9372     ----    .1036    .8950
  Test Drug                     <.0001   <.0001    .0080     ----    .0494    .0186
p-value for Tx Differences       .9525    .9891    .0763     ----    .0158    .1311
(p <.0001).
Assuming a placebo effect is in operation for both groups
during Time Period 1 and that it is linear in nature, the "true"
effect of test drug during the first two weeks of treatment can be
viewed as occurring through its quadratic component (p=.0080).
6.5
COMPARISON OF ANALYSIS RESULTS
A summary presentation of selected analysis results for the three
clinical trial data sets utilized in Chapter 6 is shown in Table 6.19.
One can note from review of Table 6.19 that for each study, the
p-values for the test of treatment main effects differ considerably
across the various procedures. For these data one can also note that
growth curve analysis p-values tend to be lower than the corresponding
p-values of the other analysis methods.
If interest is in detecting any differences in curve shape over
[t1,tp], growth curve analysis clearly provides the most suitable and
powerful method of analysis.
However, pharmaceutical clinical trials
that are long-term and/or include many observation time points may
result in response curves that are irregular in shape (e.g., see
Figure 6.3) for which differences in polynomial parameters of the
fitted curves are to be expected;
the question of superiority of one
treatment relative to another still remains.
When a high-order
polynomial is required to provide an adequate fit to a set of response
data, problems of interpretation can arise.
For example, a fifth-order polynomial was fitted to the Study 2 data
and provided a somewhat poor fit (goodness-of-fit p-value = .0942).
Within this model framework, differences in the linear component of
response were significant (p=.0171) while differences with respect to
the second-order (p=.0661) and fourth-order (p=.0817) components
reached near-significance.
These results may prove difficult for an investigator
to interpret and relate to the physiological/psychological response
process under study.
This illustrates the importance of using a
statistical model that has linkage to the level of knowledge
concerning the response phenomena under study.

Table 6.19
Summary of Analysis Results for
Clinical Trial Data Sets.
Two-sided p-values.

                                       Method of Analysis
                      Model                Mixed    Growth
Study                 Factor  Bonferroni   Model    Curve     AUC      Z-W
1. R. Arthritis       TX*INV    n.s.       .7082    .8827    .7770    ----
   Test vs Placebo    INV       .0024      .0001    .0001    .0002    .0001
                      TX        .0316      .0442    .0408    .1161    .0829
2. R. Arthritis       TX*INV    n.s.       .6764    .0026    .6634    ----
   Test vs ASA        INV       .0100      .0275    .0027    .0442    .0563
                      TX        n.s.       .3592    .0741    .3269    .3702
3. Depression         TX        .0001      .2736    .0656    .1800    ----
   Test vs Placebo
This point is further emphasized by comparison of the piecewise
and ordinary growth curve analysis results for Study 3 (see section
6.4).
The fitted piecewise model assumes that response function f(t)
changes form over [t1,tp] while the ordinary growth curve model
assumes f(t) remains the same.
While the ordinary growth curve model
fitted to the six-week data found polynomial components up to the
third-order to be significantly different from zero for both treatment
groups (see Table 6.15) and significant differences between treatment
groups with respect to the linear component (p=.0041), results from
the piecewise model were fundamentally different.
The
quadratic/quadratic piecewise model found a significant quadratic
component of response to test drug in Time Period 1 and significant
treatment group differences with respect to the linear component for
Time Period 2.
Each model can provide a perfect fit to the data in
the sense of passing through observed data points, but which one is at
least approximately correct cannot be judged based on the observed data
alone but must be substantiated by theoretical arguments.
Of course,
both models are useful as exploratory analysis tools to test
hypotheses concerning the nature of response.
Several other comments regarding growth curve analysis are
suggested by the data analyses presented in this chapter.
In absence
of theoretical arguments to suggest the proper order-curve, the
selection of a reduced polynomial model for each of these three data
sets based on review of goodness-of-fit test results was found to be
somewhat subjective.
Review of the plots of the goodness-of-fit p-
values of Figures 6.2 and 6.5 shows no trend for the p-values to level
off after a certain-order curve is fitted but rather to increase
sharply after each increment in polynomial curve order.
The
combination of a multifactor design with an irregularly shaped
response curve based on many observation time points (Study 2) appears
to necessitate a high-order growth curve model in order to achieve an
adequate fit.
To model a two-factor model including treatment-by-
investigator interaction for Study 2, a fifth-order polynomial was
utilized.
The resulting test for treatment-by-investigator
interaction across all polynomial coefficients was highly significant
(p=.0026) in contrast to non-significant test results using mixed
model, MANOVA, and AUC analysis.
This suggests that growth curve
analysis may be oversensitive to slight differences in curve shape for
irregularly shaped data from multifactor studies.
Deciding which
higher order terms to include as covariates was also found to be a
subjective procedure.
Typically for these data sets, a high-order
variable would correlate highly with one of the polynomial terms in
the estimation space.
Inclusion of such a variable as a covariable
tended to result in a more significant test result for the term with
which a high correlation was observed but less significant test
results for other model terms.
This suggests that selection of high-
order polynomial terms as covariables should not be done on an ad hoc
basis but should be done based on a priori scientific knowledge about
the effect of the covariables on response.
The AUC response parameter was shown in Chapter 2 to measure
cumulative response to treatment over some defined period of
observation.
As such it can be incorporated as a response parameter
within a modeling framework such as growth curve analysis by
integrating the estimated growth curve polynomial response function
p(t).
When a specific model for response function f(t) is not known,
the AUC parameter can still be estimated through the use of
approximation techniques such as the trapezoidal rule or least-squares
polynomial approach.
In contrast to the mixed model test of treatment
differences, which ignores the ordered time dimension associated with
response, estimation of the AUC parameter incorporates the times of
observation.
As a cumulative or average response parameter, AUC
analysis is not sensitive to detecting differences in response curve
shape over [t1,tp].
Review of AUC analysis results for the three
clinical trial data sets shows the overall AUC test p-value for
treatment differences over [t1,tp] to be somewhere between the high
and low p-values observed for the set of univariate tests performed at
the individual observation points, assuming the same direction of
difference was noted for all time points.
For example, from Table 6.5
one can note that the overall AUC test p-value for treatment main
effect in Study 1 is .1161, which is between the extreme p-values
of .7011 and .0079 found from the univariate tests in Table 6.1.
Similarly, the AUC test result for Weeks 0-1 (p=.2555) is between the
p-values for Week 0 (p=.7011) and Week 1 (p=.0767).
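
The AUC estimates discussed above can be illustrated with a minimal sketch. The function names, observation times, and response values below are hypothetical and are not taken from the study data; the sketch assumes one subject observed at unequally spaced weeks and shows the trapezoidal rule, the equivalent mean response (AUC divided by interval length), a subinterval AUC, and a least-squares polynomial alternative.

```python
import numpy as np

def auc_trapezoid(times, responses):
    """Trapezoidal-rule AUC for one subject's response vector."""
    t = np.asarray(times, dtype=float)
    y = np.asarray(responses, dtype=float)
    # Sum the trapezoid areas between successive observation times.
    return float(np.sum(np.diff(t) * (y[:-1] + y[1:]) / 2.0))

def mean_response(times, responses):
    """AUC divided by the interval length: the time-averaged response."""
    return auc_trapezoid(times, responses) / (times[-1] - times[0])

def auc_polynomial(times, responses, degree):
    """AUC from a least-squares polynomial fitted over the whole interval."""
    coefs = np.polyfit(times, responses, degree)
    antiderivative = np.polyint(coefs)
    upper = np.polyval(antiderivative, times[-1])
    lower = np.polyval(antiderivative, times[0])
    return float(upper - lower)

# Hypothetical subject observed at weeks 0, 1, 2, and 4.
weeks = [0, 1, 2, 4]
scores = [30.0, 22.0, 18.0, 16.0]

full_auc = auc_trapezoid(weeks, scores)           # 26 + 20 + 34 = 80.0
early_auc = auc_trapezoid(weeks[:2], scores[:2])  # weeks 0-1 only: 26.0
average = mean_response(weeks, scores)            # 80 / 4 = 20.0
quadratic_auc = auc_polynomial(weeks, scores, 2)  # polynomial alternative
```

Each subject's AUC (or mean response) would then serve as a single response value in a univariate ANOVA; that modeling step is not shown here.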
AUC analysis was found useful to examine for response differences
on subintervals of [t1,tp].
This is in contrast to growth curve
analysis, for which treatment comparisons are based on testing
parameters estimated on the entire observation interval [t1,tp].
Zerbe-Walker analysis can also be used for tests on defined
subintervals.
Using linear interpolation to approximate f(t) over
each subinterval, the AUC and Z-W test results were found to be
generally comparable.
Based on linear interpolation approximation,
AUC analysis has the advantage in that the AUC value is much simpler
to compute than the Z-W value.
The use of least-squares polynomial
estimation over the entire interval [t1,tp] to calculate the Z-W
parameter may result in more divergent p-values than those observed
based on linear interpolation estimation.
The subinterval AUC estimates can be used in a MANOVA test to
test for time main effects.
The test for time main effects is most
often of little interest in the pharmaceutical clinical trial setting,
however, since the response function f(t) is usually expected to vary
over time, even for placebo treatment.
In the important treatment-by-
time interaction situation, differences in the shape of the response
curve are of interest, and as discussed in Chapter 3, unweighted AUC
analysis is inappropriate.
A weighted AUC procedure for the special
case of pharmaceutical trials is introduced in Chapter 3 which is
intended to evaluate treatment differences in cumulative response by
use of a weighting scheme that incorporates salient features of drug
action such as latency and duration of action.
Based on its extreme simplicity, the Bonferroni multiple
comparisons procedure may provide a useful "quick and dirty" approach
to assessing the null hypothesis of no treatment differences over
[t1,tp].
When the number of observations is large, however, this
method can be expected to be insensitive to moderate but consistent
treatment differences.
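
A minimal sketch of the Bonferroni calculation follows. The p-values are hypothetical and the function name is illustrative only; the sketch assumes the overall p-value is taken as the smallest univariate p-value multiplied by the number of time points, capped at one.

```python
def bonferroni_overall_p(p_values):
    """Bonferroni-adjusted p-value for the null hypothesis of no
    treatment difference at any observation time."""
    k = len(p_values)
    return min(1.0, k * min(p_values))

# Hypothetical univariate p-values at four visits: one marked difference.
one_large_difference = [0.70, 0.08, 0.03, 0.008]
overall = bonferroni_overall_p(one_large_difference)  # 4 * 0.008

# Hypothetical moderate but consistent differences at ten visits:
# no single test survives the adjustment.
consistent_moderate = [0.06] * 10
overall_consistent = bonferroni_overall_p(consistent_moderate)  # 10 * 0.06
```

The second case illustrates the insensitivity noted above: ten visits each with p = .06 yield an adjusted value near .60, although a cumulative measure might detect the consistent difference.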
CHAPTER 7
ANALYSIS CONSIDERATIONS FOR REPEATED MEASURES DATA
The analysis results presented in Chapter 6 illustrate the use of
several parametric approaches to the analysis of longitudinal data
from the pharmaceutical clinical trial setting.
While the general
motivation for such studies is often the determination of whether test
treatment is more effective than some comparison treatment(s), the
choice of analysis technique involves the combined consideration of a
number of factors including the study objectives, knowledge concerning
the response phenomena under study, properties of the prospective
statistical technique such as model assumptions and power, and
characteristics of the observed data such as the presence of missing
data.
A discussion of these topics and how they affect the choice of
analysis technique follows.
7.1 STUDY OBJECTIVES
An obvious consideration in formulating an analysis plan for
repeated measures data is the stated purpose of the study.
In later
phases of pharmaceutical clinical trial research of investigational
drugs, for example, assessment of test drug efficacy relative to some
other treatment (e.g., placebo or active standard) is most often the
primary study goal.
The study protocol typically defines the type of
research design and any important parameters upon which assessment of
drug efficacy is to be based.
It is important that the general notion
of efficacy (e.g., "test drug is better than standard treatment") be
further refined to reflect the characteristics of the test agents
under study.
In many bioassay situations, a new test agent (e.g.,
insulin preparation) is expected to behave as a dilution of some
standard preparation.
In this situation, the presence of a treatment-
by-time interaction (which might indicate impurities in the test
preparation or problems in production quality control) is important,
and statistical methods which identify treatment-by-time interaction
are selected.
In response-over-time studies of test versus placebo,
however, response patterns over time are expected to be different, and
the principal concern may be directed to issues such as significant
differences between treatments in average response over the period of
observation or the time of initial effect of test drug.
The repeated measures design is often used in pharmaceutical
clinical trial research to collect safety data in order to closely
monitor drug toxicity, especially in the early stages of drug
development.
While efficacy data may also be collected at each
patient visit, the primary interest with respect to treatment efficacy
may be directed at patient response at the end of the treatment
period.
This research interest might be expected for trials in which
titration of drug dosage is required over the first several weeks or
for trials in which response to treatment is expected to be
monotonically increasing over the course of the study treatment
period.
In this case, univariate analysis of the last scheduled
patient visit may be more appropriate than an analysis approach which
considers the entire vector of responses Yij.
When in addition
missing data occur, endpoint analysis may be the analysis method of
choice.
Expected or desired treatment properties and aspects of disease
processes under study should also be considered in the statement of
experimental goals and in the subsequent selection of statistical
analyses.
For example, in the case of assessing analgesics for the
relief of severe post-operative pain, rapid onset of drug effect
appears desirable.
A statistical analysis of experimental data for
this situation would use methods that assess time and level of peak
effect.
In another experimental setting, it may be unimportant
whether response is rapid or delayed; for example a study to compare
the efficacy of different anti-hypertensive agents to control high
blood pressure is likely to be interested in the overall performance of test
drug over a specified period of time.
The repeated measures analysis methods discussed in this paper
address differently the general question of whether test treatment is
more effective than some comparison treatment over the period of
observation [t1,tp].
The Bonferroni procedure null hypothesis is that
there are no treatment differences at any time point versus the
alternate hypothesis of a significant difference at one or more time
points.
The mixed model test essentially treats time as a class
variable and tests for treatment main effects by testing for
differences with respect to the unweighted sum of the members of the
observation vector Yij.
AUC analysis addresses the question of
treatment differences over [t1,tp] by testing for differences in
cumulative (or equivalently, mean) response.
Thus for the anti-
hypertension drug study cited above, AUC analysis may be the method of
choice.
Multivariate polynomial growth curve analysis approximates
response function f(t) over [t1,tp] by a polynomial function p(t) and
identifies significant differences in curve shape parameters.
The
Zerbe-Walker test is a univariate test of differences in response
curve shape.
7.2 KNOWLEDGE OF THE RESPONSE PHENOMENA
The formulation of an analysis plan should consider the theory
behind the response phenomena under study.
In certain fields of
study, experimental experience built on theoretical constructs that
have been honed on the iterative cycle of experimentation and theory
modification may suggest the appropriateness of a certain model to
explain a particular response phenomenon.
For example, in the field of
pharmacokinetics, the uptake and elimination of many types of drugs in
the body has been seen to be fit by a double exponential model
(Greenblatt and Koch-Weser, 1975).
This model implies that the rate
of drug elimination from central and peripheral body compartments at
time t is proportional to the amount of drug still present at the site
at time t.
In other instances (e.g., the elimination kinetics of
ethanol) the rate of elimination is constant so that the amount of
drug in the body is a linear function of time.
Previous study of the
response phenomena may also suggest important variables to include in
the model as covariates or interaction terms.
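
The two kinds of elimination described above can be contrasted in a short sketch. All function names and parameter values are hypothetical, chosen only to show the functional forms: a first-order (exponential) process in which a constant fraction is removed per unit time, a double exponential as the sum of two such terms, and a constant-rate process in which the amount is linear in time.

```python
import math

def first_order_amount(t, amount0, k):
    """Amount remaining when the elimination rate is proportional
    to the amount present (rate constant k)."""
    return amount0 * math.exp(-k * t)

def biexponential(t, a, alpha, b, beta):
    """Double exponential model: the sum of two first-order terms."""
    return a * math.exp(-alpha * t) + b * math.exp(-beta * t)

def zero_order_amount(t, amount0, rate):
    """Amount remaining when the elimination rate is constant;
    linear in time until the drug is exhausted."""
    return max(0.0, amount0 - rate * t)

hours = (0, 1, 2)
first = [first_order_amount(t, 100.0, 0.5) for t in hours]
zero = [zero_order_amount(t, 100.0, 20.0) for t in hours]

# First-order: the ratio between successive hours is constant.
ratios = [first[1] / first[0], first[2] / first[1]]
# Zero-order: the difference between successive hours is constant.
drops = [zero[0] - zero[1], zero[1] - zero[2]]
```

In practice the parameters of such models would be estimated from data by non-linear regression, which this sketch does not attempt.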
When the theory of the response phenomena under study is
adequately understood, the analysis of aspects of such a response
phenomena is best done by statistical model building based on the
theory.
In the case of an exponential process, for example, non-
linear regression techniques can be used to solve for model
parameters.
For example, human growth from zero to six years has been
modeled as an exponential process (Jenss and Bayley, 1937).
The
height of children between ages six and eleven has been found to be
linear (Bayer and Bayley, 1959); for a comparative study of female and
male growth patterns during this age range, a linear growth curve
model would provide a meaningful model which fits the data well.
In
contrast, the use of AUC analysis here, in effect comparing average
heights of the sexes between ages six and eleven, does not address questions
about the growth process and is inappropriate.
In many areas of study, little is known about the structure of
the response phenomena under study, and theoretically-based
statistical models cannot be constructed except on an exploratory
analysis basis.
While reasonable models for the pharmacokinetics of
certain drugs are sometimes available, time-response models for the
clinical response to the same drugs do not exist.
In this case
empirical models, constructed to adequately describe the observed data
and to address experimental goals, must be used.
7.3 PROPERTIES OF THE STATISTICAL MODEL
The methods of analysis that are reviewed in this paper are
characterized by different distributional assumptions.
The univariate
ANOVA, Z-W test and AUC analysis techniques involve univariate
normality assumptions while the mixed model and MANOVA approaches
assume error terms with multinormal distributions.
Review of residual
plots for the univariate and AUC procedures applied to the data sets
of Chapter 6 showed generally symmetric bell-shaped distributions.
While no test of multivariate normality was performed for the growth
curve models fitted to the clinical trial data sets, review of
residual plots for each time point generally showed distributions
which were symmetric about zero.
The traditional mixed model ANOVA
model also assumes for tests of within-subject factors that
circularity holds, i.e., that the variance of pairwise differences
between occasions is constant across all occasions.
This assumption, that ε equals unity, is often not met in
longitudinal trials, as was seen for the ε-estimates for the three
clinical trial data sets (the estimates ranged from .6372 to .8387).
Growth curve analysis involves further
assumptions that all subjects are measured at identical time points
and that the responses of all subjects are fit by polynomials of
identical degree.
These assumptions are not strictly necessary for
AUC and Z-W analysis, although observations that are poorly spaced and
are made at widely different time points for different subjects are
likely to result in poor (possibly biased) response parameter
estimates.
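
As a sketch of how a departure from circularity can be quantified, the following computes an ε value from a covariance matrix of the repeated measures. It assumes that the ε referred to above is the Box (1954) correction factor in the form used by Geisser and Greenhouse (1958); the function name and the two example covariance matrices are hypothetical.

```python
import numpy as np

def box_epsilon(cov):
    """Box's epsilon for a p x p covariance matrix of repeated measures.

    Equals 1 under circularity and has lower bound 1/(p-1).
    """
    s = np.asarray(cov, dtype=float)
    p = s.shape[0]
    # Centering matrix; equivalent to applying p-1 orthonormal contrasts.
    center = np.eye(p) - np.ones((p, p)) / p
    d = center @ s @ center
    return float(np.trace(d) ** 2 / ((p - 1) * np.trace(d @ d)))

p = 4
# Compound symmetry (equal variances, equal covariances) satisfies circularity.
compound_symmetry = 2.0 * (0.5 * np.eye(p) + 0.5 * np.ones((p, p)))
# Correlations that decay with the lag between visits violate it.
lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
autoregressive = 0.8 ** lags

eps_cs = box_epsilon(compound_symmetry)  # 1 up to rounding error
eps_ar = box_epsilon(autoregressive)     # below 1
```

With a sample covariance matrix in place of the hypothetical ones, the same calculation gives an estimate of ε that can be used to adjust the degrees of freedom of the within-subject F tests.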
An important consideration in the selection of an analysis
technique is the technique's power.
For the test of within-subject
factors, a review of the literature indicates that mixed model
analysis is at least as powerful as MANOVA when circularity conditions
hold.
When the circularity assumption is not valid, results by Rogan
et al. (1979) indicate that when ε is greater than .75, the
ε-adjusted procedures provide the most powerful and robust tests of
within-subject factors; as ε departs greatly from unity the
multivariate tests are consistently more powerful.
With respect to
between-subject factors, such as treatment main effects, the analysis
results of Chapter 6 suggest that growth curve analysis generally
provides the most powerful test, in the sense of detecting any
differences in response curve shape.
However, an overly sensitive
analysis may result when a high-order growth curve model is fitted to
irregularly-shaped response curves, such as seen for Study 2 in
Chapter 6.
For these data, the growth curve test for
treatment-by-investigator interaction was highly significant
(p=.0026), compared to a p-value of .6764
found using mixed model analysis and a p-value of .6634 found using
AUC analysis.
Review of treatment-investigator group means in Table
6.6 shows that one of the eight study investigators showed a different
pattern of response ratings across treatments.
While statistical power is an important analysis consideration,
selection of an analysis approach based strictly on power is
inappropriate.
As mentioned previously, the various methods of
analysis are directed at certain (sometimes different) questions.
When the interest of analysis is in detecting significant differences
among treatments in average or cumulative response over the interval
of observation [t1,tp], AUC analysis is a natural choice.
In addition, AUC analysis (and Z-W analysis) can be utilized when
interest is in examining for treatment differences on subintervals of
[t1,tp]; growth curve analysis is not directed at this interest.
7.4 CHARACTERISTICS OF THE OBSERVED DATA
Characteristics of the observed study data can affect the choice
of method of analysis.
In many clinical trial situations, complete
data are not obtained for each study subject.
Some subjects may miss
appointments or drop out of the study early, and omissions in data
recording may occur (e.g., a blood specimen may be lost or the blood
analysis may be invalid due to contamination).
The existence of
missing data can cause serious problems for the analysis and
interpretation of repeated measures study results.
Exclusion of
subject data when there are missing observations in the subject's data
vector, in addition to being subject to criticism for loss of
information and possible bias, can result in a drastic reduction in
sample size and subsequent loss of power.
No repeated measures
analysis approach that has been discussed is immune to the problems
caused by missing data.
When the amount of missing data is not too large, missing data
estimation procedures based on least-squares or likelihood approaches
(e.g., see Kleinbaum, 1973) can be used under the assumption that the
cause of the missing data to be estimated is not associated with
treatment.
When dropouts occur and are possibly related to treatment,
Gould's approach of assigning scores based on reason-for-withdrawal
might be used.
When "large" amounts of missing data exist, serious
questions about the validity of study results arise.
How large is
"large" is difficult to define, but simulation results by Simons et
al. (1978) suggest that up to 25 percent randomly missing data may not
affect study results appreciably when inter-visit correlations are not
too high.
The use of sensitivity analyses, where missing data are
estimated under various schemes ranging from extreme conservatism with
respect to the null hypothesis of no treatment differences (e.g., give
missing data of subjects in test treatment group the least favorable
observed values) to more liberal schemes (e.g., give missing values
the average score of observed values of the appropriate treatment
group), is another approach to analysis in the missing data situation.
If analyses based on the various assignment schemes yield generally
similar conclusions regarding treatment effects, the validity of study
conclusions can be judged to be reasonable despite the presence of
missing data.
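
The sensitivity analysis described above can be sketched as follows. The endpoint scores, the group sizes, and the assumption that higher scores are more favorable are all hypothetical, and only the two assignment schemes named in the text are shown; a real analysis would follow each scheme with the planned significance test rather than a simple difference in means.

```python
import numpy as np

def impute(values, scheme, higher_is_better=True):
    """Fill missing (NaN) values in one treatment group's data."""
    x = np.asarray(values, dtype=float)
    observed = x[~np.isnan(x)]
    if scheme == "least_favorable":
        # Conservative: assign the worst value observed in the group.
        fill = observed.min() if higher_is_better else observed.max()
    elif scheme == "group_mean":
        # More liberal: assign the average of the observed values.
        fill = observed.mean()
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return np.where(np.isnan(x), fill, x)

# Hypothetical endpoint scores; two test-drug subjects are missing.
test_drug = [12.0, 15.0, np.nan, 18.0, np.nan]
placebo = [10.0, 11.0, 9.0, 12.0, 10.0]

differences = {}
for scheme in ("least_favorable", "group_mean"):
    completed = impute(test_drug, scheme)
    differences[scheme] = float(completed.mean() - np.mean(placebo))
```

If the treatment comparison leads to the same conclusion under both schemes, the conclusion can be regarded as reasonably robust to the missing data.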
Other data characteristics, such as the level of measurement,
form of the distribution of responses and sample size, impact on the
choice of analysis.
For longitudinal data, the ordered dimension of
the times of response measurement should also be considered in the
selection of statistical technique.
Mixed model ANOVA and profile analysis MANOVA both
ignore the metric associated with the vector t' =
(t1,...,tp) which identifies the p times of observation.
AUC analysis
extends the mixed model and growth curve analysis extends the MANOVA
model by considering the ordered nature of the members of t.
In the
development of AUC analysis here it has been assumed that the response
data are measured on an interval or ratio scale, or are recorded on an
ordinal scale that reflects an underlying continuous response scale.
However, for nominal data or data where normality assumptions are
unreasonable, nonparametric analysis, logistic modeling, or
categorical analysis techniques are alternatives to the parametric
techniques described previously.
In summary, the choice of analysis technique for repeated
measures data involves the combined consideration of a number of
factors including study goals, existence of theory on the response
phenomena under study, and characteristics of the observed data.
For
example, when interest lies in treatment performance by the end of the
trial and missing data are prevalent, endpoint analysis may be the
analysis method of choice.
Interest in assessment of cumulative
treatment effects over the time period of observation may suggest AUC
analysis, while interest in the shape of the response curve may
suggest growth curve analysis.
In many experimental situations,
research interests are many-faceted, and a combination of analysis
techniques is appropriate in the development of an analysis plan.
The following table summarizes salient features of each analysis method
according to the analysis issues discussed in this chapter.
TABLE 7.1 SUMMARY COMPARISON OF AUC AND OTHER METHODS
OF ANALYSIS FOR REPEATED MEASURES DATA

A. STUDY OBJECTIVES

Univariate tests with Bonferroni multiple comparison procedure:
--tests for any diffs in tx-grp means over [t1,tp].
--no test for tx-by-time interaction or time main effect.

Mixed model ANOVA:
--tests for consistent diffs between txs over [t1,tp].

Polynomial growth curve model:
--tests for any diffs between tx-grp curve shapes over [t1,tp].

AUC:
--tests for tx diffs w.r.t. mean or cumulative response on [t1,tp].
--can be used for comparisons on subintervals.

Zerbe-Walker test:
--tests for any diffs in tx-grp curve shapes on [t1,tp].
--can be used for comparisons on subintervals.

B. KNOWLEDGE OF RESPONSE PHENOMENA UNDER STUDY

Univariate tests with Bonferroni multiple comparison procedure:
--form of response curve ignored.

Mixed model ANOVA:
--form of response curve ignored.

Polynomial growth curve model:
--form of response considered.
--when true form of response is not a polynomial, problems of
  interpretation.
--when form of response unknown, useful to detect diffs in tx-curve
  shapes.

AUC:
--form of response curve ignored.
--can be included w/in meaningful modeling framework.
--when form of response unknown, can still be estimated.

Zerbe-Walker test:
--form of response ignored.
--can be included w/in meaningful modeling framework.
--when form of response unknown, can still be estimated.
--has less intuitive meaning than AUC parameter.

C. STATISTICAL MODEL PROPERTIES

Univariate tests with Bonferroni multiple comparison procedure:
--univariate normality assumptions.
--low power when p is large or when consistent (moderate) diffs
  between treatments.

Mixed model ANOVA:
--multivariate normality assumptions.
--assumes circularity for w/in-subject tests.
--overly liberal tests for w/in-subject tests when no circularity.

Polynomial growth curve model:
--multivariate normality assumptions.
--low power for w/in-subject tests when n is small.
--measurement times assumed the same for all subjects.

AUC:
--univariate normality assumptions.
--low power to detect tx diffs in response curve shapes.
--measurement times may not need be the same for all subjects.

Zerbe-Walker test:
--univariate normality assumptions.
--power greater than AUC test when tx*time interaction.
--measurement times may not need be the same for all subjects.

D. CHARACTERISTICS OF OBSERVED DATA

Univariate tests with Bonferroni multiple comparison procedure:
--no assumptions regarding t.
--missing data causes noncomparability of samples at different time
  points.

Mixed model ANOVA:
--ignores metric of observation-time vector t.
--missing data causes noncomparability of samples at different time
  points.

Polynomial growth curve model:
--considers metric of t.
--missing data causes noncomparability of samples at different time
  points.

AUC:
--considers metric of t.
--missing data causes noncomparability of samples at different time
  points.

Zerbe-Walker test:
--considers metric of t.
--missing data causes noncomparability of samples at different time
  points.

COMMENTS

Univariate tests with Bonferroni multiple comparison procedure:
--"quick and dirty" method.

Mixed model ANOVA:
--appropriate for profile (as opposed to longitudinal) data.

Polynomial growth curve model:
--high computation costs if n and/or p is large.
--poor fit possible for irregularly shaped pharmaceutical clinical
  trial data as p gets large.

AUC and Zerbe-Walker test:
--using linear interpolation, Z-W test results similar to AUC test
  results. AUC parameter is easier to calculate.

SUGGESTIONS FOR FURTHER RESEARCH
The preceding chapters have discussed various approaches to the
analysis of longitudinal data with attention focused on analysis
issues associated with data from the pharmaceutical clinical trial
setting.
Area-under-the-curve (AUC) analysis was introduced and
compared to other analysis methods described in the literature.
Several important research issues that were either not covered or only
cursorily discussed merit consideration for future research.
The
following outlines such topics.
Research issues concerning the use of AUC analysis and other
techniques for the analysis of longitudinal data include
--- the estimation of missing data in AUC analysis.
The use of
least-squares polynomial regression or linear interpolation based
on the observed data to calculate AUC might be compared to other
missing data estimation approaches such as maximum likelihood
methods or the substitution of the average of group response at
the time points for missing data.
The effect of design
strategies which anticipate the occurrence of missing data such
as having more frequent measurements might be explored.
--- the use of AUC when the times of observation differ among
subjects.
The growth curve model requires that subjects be
observed at the same time points.
While this is not strictly
required for the development of the AUC parameter, the use of
observation times that are widely and unequally spaced to
estimate AUC may result in a highly inaccurate estimate.
The
effect of unequal times of observation on the accuracy of the
AUC estimate might be examined through the use of a simulation
study for various classes of response functions.
--- the application of piecewise growth curve models to clinical
trial data where a "placebo effect" is expected.
The use of
piecewise models might help to delineate the nature of the
placebo effect for certain types of response situations,
such as response mechanisms that involve psychological response.
--- the extension of AUC methodology to more than one variable.
For example, the bivariate response function f(x_t,y_t) for
systolic and diastolic blood pressure (if known) could be
integrated over time to obtain a "volume under the sphere" type
of parameter.
The use of this summary index parameter appears
limited to variables measured in the same units.
Methods of
estimating this multi-variable parameter and its potential
usefulness as a response parameter might be explored.
REFERENCES
Anderson, T.W.: An introduction to multivariate statistical analysis,
New York, Wiley (eds.), 1958.
Bartlett, M.S.: Some examples of statistical methods of research in
agriculture, J. Royal Stat Society Supplement, 4:137-183, 1937.
Bellville, J.W., Forrest, W.H., and Brown, B.W.: Clinical and
statistical methodology for cooperative clinical assays of analgesics,
Clinical Pharmacology and Therapeutics, 9:290-302, 1968.
Boik, R.J.: A priori tests in repeated measures designs: effects of
nonsphericity, Psychometrika, 46:241-255, 1981.
Box, G.E.P.: A general distribution theory for a class of likelihood
criteria, Biometrika, 36:317-346, 1949.
Box, G.E.P.: Problems in the analysis of growth and wear curves,
Biometrics, 6:362-389, 1950.
Box, G.E.P.: Some theorems on quadratic forms applied in the study of
analysis of variance problems, Annals of Math Stat, 25:290-302, 484-498,
1954.
Brown, M.B.: A method for combining non-independent, one-sided tests of
significance, Biometrics, 31:987-992, 1975.
Brunelle, R.L. and Johnson, D.W.: The use of a linear spline model in the
analysis of a repeated measures experiment through SAS, Proceedings of
SAS Users Group International Conference, 236-240, 1980.
Campbell, D.T. and Stanley, J.C.: Experimental and quasi-experimental
designs for research, Chicago, Rand McNally, 1966.
Cole, J.W. and Grizzle, J.E.: Applications of multivariate analysis of
variance to repeated measurement experiments, Biometrics, 22:810-828,
1966.
Collier, R.O., Baker, F.B., Mandeville, G.K., and Hayes, T.F.: Estimates of
test size for several test procedures based on conventional variance
ratios in the repeated measures design, Psychometrika, 32:339-353, 1967.
Conte, S.D.: Elementary Numerical Analysis, New York: McGraw-Hill,
1965.
Danford, M.B., Hughes, H.M., and McNee, R.C.: On the analysis of
repeated-measurements experiments, Biometrics, 16:547-565, 1960.
Davidson, J., Weiss, J., Sullivan, J., Turnbull, C.D., and Linnoila,
M.: A placebo-controlled evaluation of isocarboxazid in outpatients,
Monoamine oxidase inhibitors: The state of the art, Youdim and Paykel
(eds.), New York: Wiley and Sons, 115-123, 1981.
Davidson, J., and Turnbull, C.D.: Loss of appetite and weight
associated with isocarboxazid in depression, J. Clinical
Psychopharmacology, 2(4), 263-266, 1982.
Davidson, M.L.: Univariate versus multivariate tests in
repeated-measures experiments, Psych. Bull., 77(6):446-452, 1972.
Eisenhart, C.: The assumptions underlying the analysis of variance,
Biometrics, 3:1-21, 1947.
Elston, R.C. and Grizzle, J.E.: Estimation of time-response curves and
their confidence bands, Biometrics, 18:148-159, 1962.
Ertel, J.E. and Fowlkes, E.B.: Some algorithms for linear spline and
piecewise multiple linear regression, JASA, 355:640-648, 1976.
Feder, P.I.: On the asymptotic distribution theory in segmented
regression problems -- identified case, Annals of Statistics, 3(1):49-83, 1975.
Fisher, R.A.: Statistical methods for research workers, Oliver and
Boyd, 11th edition, London, 99-101, 1950.
Fuller, W.A.: Grafted polynomials as approximating functions,
Australian J. Agricultural Economics, 13:35-46, 1969.
Gaito, J.: Repeated measurements designs and counterbalancing, Psych.
Bull., 58:46-54, 1961.
Garty and Hurwitz: The effect of cimetidine and antacids on
gastrointestinal absorption of tetracycline, Clinical Pharmacology
and Therapeutics, Aug:203, 1980.
Geisser, S.: Multivariate analysis of variance for a special covariance
case, JASA, 58:660-669, 1963.
Geisser, S. and Greenhouse, S.W.: An extension of Box's results on the
use of the F distribution in multivariate analysis, Annals of Math.
Stat., 29:885-891, 1958.
Ghosh, B.K.: Some monotonicity theorems for chi-square, F, and t
distributions with applications, J. Royal Stat Society (Series B),
35:480-492, 1973.
Ghosh, M., Grizzle J.E., and Sen, P.K.: Nonparametric methods in
longitudinal studies, JASA, 68:29-36, 1973.
Gibaldi, M. and Perrier, D.: Pharmacokinetics, New York: Marcel Dekker,
Inc., 293-296, 1975.
Gill, J.L.: Combined significance of non-independent tests for repeated
measurements, J. Animal Science, 48:363-366, 1979.
Ginsburg and McCracken: Bioavailability of cefadroxil capsules and
suspension in pediatric patients, J. of International Medical Research,
8(1), p. 9, 1980.
Goodman and Gilman: The pharmacological basis of therapeutics, 1975.
Gnanadesikan, R.: Methods for statistical data analysis of multivariate
observations, New York:John Wiley and Sons, 1977.
Goldstein, H.: Longitudinal studies and the measurement of change,
The Statistician, 18:93-117, 1968.
Gould, A.L.: A new approach to the analysis of clinical drug trials
with withdrawals, Biometrics, 36:721-727, 1980.
Grant, D.: Analysis-of-variance tests in the analysis and comparison of
curves, Psych. Bull., 53:141-154, 1956.
Greenblatt and Koch-Weser: Clinical pharmacokinetics, N. England J.
Medicine, 293:702-704, 1975.
Greenhouse, S.W., and Geisser, S.: On methods in the analysis of
profile data, Psychometrika, 24:95-112, 1959.
Grizzle, J.E. and Allen, D.M.: Analysis of growth and dose-response
curves, Biometrics, 25:307-318, 1969.
Hamilton, R.: Determination of mean valproic acid serum level by assay
of a single pooled sample, Clinical Pharmacology and Therapeutics,
29:408, 1981.
Healy, M.J.R.: Rao's paradox concerning multivariate tests of
significance, Biometrics, 25:411-413, 1969.
Hildebrand: Introduction to Numerical Analysis, New York: McGraw-Hill,
second edition, 1974.
Holloway, L.N. and Dunn, O.J.: The robustness of Hotelling's T², JASA,
62:124-136, 1967.
Hotelling, H.: The generalization of Student's ratio, Annals of Math.
Stat., 2:360-378, 1931.
Hudson, D.J.: Fitting segmented curves whose join points have to be
estimated, JASA, 61:1097-1129, 1966.
Huynh, H.: Some approximate tests for repeated measurement designs,
Psychometrika, 43:161-175, 1978.
Huynh, H.: Testing the identity of trends under the restriction of
monotonicity in repeated measures designs, Psychometrika, 46:295-305,
1981.
Huynh, H. and Feldt, L.S.: Conditions under which mean square ratios in
repeated measurements designs have exact F-distributions, JASA,
65:1582-1589, 1970.
Huynh, H. and Feldt, L.S.: Estimation of the Box correction for degrees
of freedom from sample data in randomized block and split-plot designs,
J. Ed. Stat., 1:69-82, 1976.
Huynh, H. and Mandeville, G.: Validity conditions in repeated measures
designs, Psych. Bull., 86:964-973, 1979.
Huynh, H. and Feldt, L.S.: Performance of traditional F tests in
repeated measures designs under covariance heterogeneity, Comm. Stat.,
A9:61-74, 1980.
Ito, K. and Schull, W.: On the robustness of the T² test in
multivariate analysis when covariance matrices are not equal,
Biometrika, 51:71-82, 1964.
James, K.E., Forrest, W.H., and Rose, R.I.: A historical summary of
fifty-five four-point parallel line assays for post-surgical analgesic
efficacy, presented at the XI International Biometric Conference,
Toulouse, France, September 1982.
Jenss, R.M., and Bayley, N.: A mathematical method for studying growth
in children, Human Biology, 9:556-563, 1937.
Kaiko: Age and morphine analgesia, Clinical Pharmacology and
Therapeutics, Dec., p. 823, 1980.
Keselman, H.J., Rogan, J.C., Mendoza, J.L., and Breen, L.J.: Testing
the validity conditions of repeated measures F tests, Psych. Bull.,
87:479-481, 1980.
Khatri, C.G.: A note on a MANOVA model applied to problems in growth
curves, Annals Inst. Stat. Math., 18:75-86, 1966.
Khatri, C.G.: Characterizations of multivariate normality. II. Through
linear regressions, J. Multivariate Analysis, 9:589-598, 1979.
Koch, G.G.: Some aspects of the statistical analysis of 'split plot'
experiments in completely randomized layouts, JASA, 64:485-505, 1969.
Koch, G.G., Amara, I.A., Stokes, M.E., and Gillings, D.B.: Some views
on parametric and non-parametric analysis for repeated measurements and
selected bibliography, Int. Stat. Review, 4:249-265, 1980.
Kowalski, C.J.: A commentary on the use of multivariate statistical
methods in anthropometric research, Am. J. Physical Anthropology,
36:119-131, 1972.
Kowalski, C.J. and Guire, K.E.: Longitudinal data analysis, Growth,
38:131-169, 1974.
Laska and Sunshine: Fenoprofen and codeine analgesia, Clinical
Pharmacology and Therapeutics, 29:606, 1981.
Littell, R.C. and Folks, J.L.: Asymptotic optimality of Fisher's method
of combining independent tests, JASA, 66:802-806, 1971.
Mardia, K.V.: The effect of nonnormality on some multivariate tests
and robustness to nonnormality in the linear model, Biometrika,
58:105-121, 1971.
Mauchly, J.W.: Significance test for sphericity of a normal n-variate
distribution, The Annals of Math Stat, 11:204-209, 1940.
McGregor, J.R.: An approximate test for serial correlation in
polynomial regression, Biometrika, 47:111-119, 1960.
Mendoza, J.L., Toothaker, L.E., and Nicewander, W.A.: A Monte Carlo
comparison of the univariate and multivariate methods for the groups
by trials repeated measures design, Multivariate Behavioral Research,
9:165-178, 1974.
Meyers, J.: Fundamentals of Experimental Design, New York: McGraw-Hill,
1979.
Monti, K.L.: The locally optimal combination of certain multivariate
test statistics, Ph.D. dissertation, UNC at Chapel Hill, Inst. of
Statistics Mimeo Series No. 1007, 1975.
Monti, K.L., Koch, G.G., and Sawyer, J.: An application of segmented
linear regression models to the analysis of data from a cross-sectional
growth experiment, Inst. of Statistics, UNC Mimeo Series No. 1200, 1978.
Morrison, D.F.: Multivariate Statistical Methods, New York, McGraw-Hill,
1967 (2nd ed., 1976).
Moursund, D.G. and Duris, C.S.: Elementary Theory and Application of
Numerical Analysis, New York, McGraw-Hill, 1967.
Neter, J. and Wasserman, W.: Applied Linear Statistical Models,
Homewood, IL, Richard D. Irwin (ed.), 1974.
Okun, et al.: An analgesic comparison study of indoprofen versus
aspirin and placebo in surgical pain, Journal of Clinical Pharmacology,
Aug.-Sept., p. 487, 1979.
Olson, C.L.: Comparative robustness of six tests in multivariate
analysis of variance, JASA, 69:894-908, 1974.
Olson, C.L.: On choosing a test statistic in multivariate analysis of
variance, Psychological Bulletin, 83:579-586, 1976.
Pitman, E.J.G.: Significance tests which may be applicable to samples
from any populations: III. The analysis of variance, Biometrika,
29:322-335, 1937.
Potthoff, R.F. and Roy, S.N.: A generalized multivariate analysis of
variance model useful especially for growth curve problems, Biometrika,
51:313-326, 1964.
Quandt, R.E.: The estimation of the parameters of a linear regression
system obeying two separate regimes, JASA, 55:878-880, 1958.
Ralston, A., and Rabinowitz, P.: A first course in numerical
analysis, New York: McGraw-Hill, 1978.
Rao, C.R.: Some statistical methods for the comparison of growth
curves, Biometrics, 14:1-17, 1958.
Rao, C.R.: Some problems involving linear hypotheses in multivariate
analysis, Biometrika, 46:49-58, 1959.
Rao, C.R.: The theory of least squares when the parameters are
stochastic and its application to the analysis of growth curves,
Biometrika, 52:447-458, 1965.
Rao, C.R.: Covariance adjustment and related problems in multivariate
analysis. In: Multivariate Analysis, New York, Academic Press, p. 87-103, 1966.
Rao, C.R.: Least squares theory using an estimated dispersion matrix
and its application to measurement of signals, Proceedings of Fifth
Berkeley Symposium on Mathematical Statistics and Probability, Vol. I,
355-372, 1967.
Rogan, J.C., Keselman, H.J., and Mendoza, J.L.: Analysis of repeated
measurements, British Journal of Mathematical and Statistical
Psychology, 32:269-286, 1979.
Rouanet, H. and Lepine, D.: Comparison between treatments in a
repeated-measures design: ANOVA and multivariate methods, British
Journal of Mathematical and Statistical Psychology, 23:147-163, 1970.
Roy, S.N.: On a heuristic method of test construction and its use in
multivariate analysis, Annals of Mathematical Statistics, 24:220-238,
1953.
Scheffe, H.: A method for judging all contrasts in the analysis of
variance, Biometrika, 40:87, 1953.
Scheffe, H.: The Analysis of Variance, New York, John Wiley and Sons,
1959.
Siddiqui, M.M.: Covariance of least-squares estimates when residuals
are correlated, Annals of Math Stat, 29:1251-1256, 1958.
Smith, P.L.: Splines as a useful and convenient statistical tool, The
American Statistician, 33(2):57-62, 1979.
Stoloff, P.H.: Correcting for heterogeneity of covariance for repeated
measures designs of the analysis of variance, Educational and
Psychological Measurements, 30:909-924, 1970.
Symons, M.J., Gillings, D.B., and Donelan, M.A.: A practical
comparison of some multivariate analysis strategies for clinical
trials with missing data, presented at the Joint Statistical Meetings,
San Diego, Calif., August 15, 1978.
Teeter, R.A.: Effects of measurement error in piecewise regression
models, Ph.D. dissertation, Dept. of Biostatistics, UNC at Chapel
Hill, 1982.
Watson, G.S.: Serial correlation in regression analysis, Biometrika,
42:327, 1955.
Wilson, R.S.: Analysis of developmental data: Comparison among
alternative methods, Developmental Psychology, 11:676-680, 1975.
Winer, B.J.: Statistical Principles in Experimental Design, New York,
McGraw-Hill, 1962 (2nd ed., 1972).
Wishart, J.: Growth rate determinations in nutrition studies with the
bacon pig and their analysis, Biometrika, 30:16-28, 1938.
Zerbe, G.O., and Walker, J.H.: A randomization test for comparison of
groups of growth curves with different polynomial design matrices,
Biometrics, 33:653-657, 1977.
Zerbe, G.O.: Randomization analysis of randomized blocks design
extended to growth and response curves, Communications in Stat. Theory
and Methodology, A8(2):191-205, 1979.