•
.
•
A STEP-DOWN PROCEDURE FOR ANALYSIS OF
TIME-VARYING COVARIATES IN MULTI-VISIT STUDIES
by
SOLANGE ANDREONI
Department of Biostatistics
University of North Carolina
•
Institute of Statistics
Mimeo Series No. 2175T
September 1996
A Step-Down Procedure for
•
Analysis of Time-Varying Covariates
in Multi-Visit Studies
by
Solange Andreoni
A dissertation submitted to the faculty of the University of North Carolina at Chapel
Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy
in the Department of Biostatistics, School of Public Health.
Chapel Hill
1996
Approved by:
----------::;l1li;::"'"------ Reader
---~::...---_L----__r---Reader
_ _ _+=-..t:..:::..~__===;;..~=...:..._=_
Reader
•
@1996
Solange Andreoni
ALL RIGHTS RESERVED
11
Abstract
SOLANGE ANDREONI.
A Step-Down Procedure for Analysis of Time-Varying
Covariates in Multi-Visit Studies. (Under the direction of Dr. Pranab Kumar Sen.)
..
It is a common practice in many multi-visit studies to collect data on responses
variables and many potential covariates repeatedly over time for each subject. In view
of the issues of cost and precision involved, one question that may arise is whether
it is desirable to collect the covariates at each visit as opposed to collecting them
only once, that is, at the beginning of the study. In that context, a model to assess
the optimal frequency of the collection of the time-varying covariates is proposed,
and some test statistics that can be applied to test the hypotheses of interest are
discussed. Monotone multiple design multivariate models are considered since they
allow different dependent variables to have different design matrices nested within
each other. It is first assumed that the random errors and covariates come from a
multivariate normal distribution. The null hypotheses of interest are that the responses at each visit can be written as linear functions of the covariates measured at
the first visit. A step-down procedure is proposed to test the null hypotheses that
involves decomposing the hypotheses of interest into an intersection of hypotheses f(Jr
the parameters involved in each visit. The likelihood is then written as a produl't
of independent statistics under the null hypothesis, so that each hypothesis is testl'd
separately for each visit. The step-down procedure is extended for the case wllt'll
there are several responses and covariates observed at each visit, and also to all()\\
a monotone decreasing pattern of observations of the responses and covariates al(Jll!.!,
time. It is also investigated to what extent the proposed test procedure is robust t()wards some local departures from the normality assumption under both the null alld
alternative hypotheses. The use of the step-down procedure is extended to the
('il~l'
when independent individuals are sampled from a multivariate elliptically symrnet til'
distribution using different M -estimators, and corrections for the test statistics illvolved for each case are presented. Two applications of the proposed methodolog\'
are discussed using data extracted from randomized multi-visit clinical trials.
111
Acknowledgements
I would like to thank all committee members for their helpul comments, corrections, and professional advice. I especially thank Dr. Sen for his infinite patience
.
and wisdom and Dr. Stewart for his unlimited support. The completion of this work
would not have been possible without their trust and encouragement. I also gratefully
acknowledge the support from CAPES, the Rockefeller Foundation and innumerable
grants supplied by Dr. Stewart. The love and understanding provided by my family
in my frequently stressful periods are greatly appreciated. I thank my mother and
siblings for their constant support and unconditional love. This research is a direct
result of their faith on me, and is dedicated to them and the memory of my father.
IV
Co~e~s
·
1 Introduction and Literature Review
1.1
Introduction .
1
1.2
Literature Review.
3
1.3
1.2.1
Models.
3
1.2.2
Estimation .
7
1.2.3
Hypothesis Testing
1.3.2
1.4
.
2
12
Robust Methods
1.3.1
.
1
15
Spherical, Elliptical and Left-Orthogonally Invariant Distributions
16
A1-Estimation .
20
Synopsis of Research
21
1.4.1
Model
22
1.4.2
Maximum Likelihood Approach
23
1.4.3
Hypothesis of Interest
23
1.4.4
Step-Down Procedure.
25
1.4.5
Robust Estimation Approach
27
Normal Distribution Case
29
v
2.1
Introduction ..
29
2.2
Data Structure
29
2.3
Correlation Model .
31
2.4
Estimation. . . . .
33
2.5
Hypotheses of Interest
46
2.6
Step-Down Test Procedure
49
2.6.1
Distribution Under the Null Hypothesis.
51
2.6.2
Power Considerations . . . . . . . . . . .
57
3 Robustness
4
61
3.1
Introduction
61
3.2
Tests . . . .
62
3.3
Null robustness
64
3.3.1
Approach 1
65
3.3.2
Approach 2
66
3.4
Non-null robustness .
68
3.5
Correlation Model .
69
3.6
Estimation .
71
3.7
Discussion· .
78
Elliptical Distributions Case
79
4.1
Introduction . . . .
79
4.2
Correlation Model.
80
4.3
Estimation. . . . .
82
.
•
4.3.1
Maximum Likelihood Estimation
VI
84
4.3.2
5
6
lvi-Estimation
85
4.4
Null Hypothesis . . .
87
4.5
Step-Down Test Procedure
88
4.5.1
Maximum Likelihood Estimators from Normal Distribution.
91
4.5.2
Maximum Likelihood Estimates
92
4.5.3
Affine-Invariant M-Estimates
95
4.5.4
Discussion
.
97
Numerical Examples
98
5.1
Introduction
98
5.2
Example 1
98
5.3
Example 2
100
Suggestions for Future Research
103
6.1
Introduction
.
103
6.2
Conditional M-Estimates .
104
6.3
Missing Values
106
Bibliography
107
•
Vll
List of Tables
1.1
Data structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
21
2.1
Data structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
30
5.1
Statistics for Example 1. . . . . . . . .
99
5.2
Pattern of missing data for Example 2.
101
5.3
Statistics for Example 2. . . . . . . . .
102
..
Vlll
Chapter 1
Introduction and Literature
Review
1.1
Introduction
Longitudinal data, characterized by the repeated observation of one or more response
variables in the same experimental unit, arise in studies of many areas, such as
Medicine, Epidemiology, Biology, Environmental Sciences, Agronomy and Economics.
In this context, the observation of the response variables may occur at different ages,
different visits, or even different distances from a certain origin, although in this work
we will frequently refer to "time" to represent the different occasions of observation.
The experimental units (people, animals, for instance) may be classified in different groups defined by one or more factors, such as sex, race, dose, treatments, etc.
The response variables may be categorical (presence or absence of a symptom, for
example) or continuous (cholesterol level, blood pressure, FEV1, etc.).
Frequently studies involving repeated measurements are aimed at fitting simple
models to explain the response change over the occasions of observation and the
influence of other factors or covariates in the response profile over time.
As defined by Ware (1985), covariates that take a single value for each experi-
mental unit for the entire period of observation are called between-subject covariates.
Examples of such covariates are demographic variables, such as sex and race; as are
exposure or treatment variables that do not change over time. Within-subject covariates are those characteristics of an individual that vary over occasion of observation.
Frequently the term time-varying covariate is also used in this case. Examples include
the use of environmental factors and baseline measurements that vary over time either
by design or haphazardly.
Often there are many covariates measured on each occasion. In view of the issues
of cost and precision involved, one question that may arise is whether it is desirable
to collect the covariates at each visit as opposed to collecting them only once, that is,
at the beginning of the study. In Section 1.2 we present a literature review on models
that include time-varying covariates and some statistics commonly used in hypothesis
testing. Section 1.3 contains some known results concerning robust tests for covariance
matrices in a more general class of distributions other than the normal distribution.
In that context, in Section 1.4 we propose a model to assess the optimal frequency of
collection of the time-varying covariates, and we discuss some test statistics that can
be applied to test the hypothesis of interest. We start the discussion by considering
the case when there is only one response variable with the assumption that the set
of observations corresponding to a certain experimental unit is from a multivariate
normal distribution. In Chapter 2 we extend the approach taken in Section 1.4,
considering the case where there are several responses and covariate measurements
at each visit still under the normal distribution assumption. Chapter 3 treats the
cases where the distributions of the test statistics involved remain exactly the same
under a more general class of distributions that includes the normal as a special case.
vVhen the assumptions given in Chapters 2 or 3 do not seem to be reasonable, we
extend some methods of constructing robust estimators for the proposed model and
the corresponding test statistics in Chapter 4. Finally we present two applications of
the methodology in Chapter 5 and some suggestions for future reserch in Chapter 6.
It is expected that the methodology can help to design to design future multi-visit
studies.
2
..
1.2
Literature Review
The analysis of repeated measures data involves the representation of the data through
simple models that reflect the fundamental variation of the responses. Essentially,
these models involve
1. the mean value of the response variables on each measurement occasion for each
treatment;
2. the variances and covanances between the responses observed for the same
experimental unit.
Although there exist many proposals to represent the mean function of the response, we consider only linear models in this work. Generally the parameters of
these models have an easy interpretation and linear models provide a good fit to the
data. Although the main interest is usually the estimation of the mean values, it is
necessary to take into account the correlation structure of the observations on the
same experimental unit. This requirement influences the estimates of the variabilit\·
of the parameters associated with the mean function, and in some cases it can affect
the estimates of those parameters.
Some definitions (Helms, 1992) are useful to choose an appropriate model for
longitudinal data. A longitudinal study has a regularly timed schedule if measurement"
are scheduled at equal "time" intervals on each subject, and it has regularly till I , ,/
data if measurements are actually obtained at equal intervals of time on each subj('('t
A longitudinal study has consistently timed schedule if every subject is going to 1)1'
evaluated at the same set of time values, whether or not they are regularly tim('d.
and has consistently timed data if all subjects are evaluated at the same set of tin\('"
•
1.2.1
Models
The design of studies with a consistent schedule can be accomplished using multi\'ariate and univariate models. Timm (1980) provides a good summary of the literatun'
3
on the multivariate analysis of repeated measures data. The full-rank general linear
multivariate (GLM) model can be expressed as
Y =XB+E,
(1.1 )
where Y is the (n x p) matrix of the p observable responses on n experimental units;
X is the (n x q) design matrix that may contain the between-subject covariates with
p :::; n - q; B is the (q x p) matrix of unknown parameters; and E is the (n x p)
matrix of random errors. It is further assumed that the transpose of the rows of E
are independently normally distributed with mean vector 0 (p xl) and dispersion
matrix :E (p x p). In our case we consider the columns of Y to represent time and
the rows the experimental units.
The standard GLM model (1.1) assumes that each experimental unit has the
same covariance structure, and does not allow the presence of missing values. It also
has the requirement that different dependent variables share the same design matrix X. When this is not the case, such as when we have time-varying covariates, one
strategy would be to use separate univariate models and tests for each dependent variable, but this approach ignores the information contained in the covariances
amotl~
the dependent variables and increases the family-wise type I error rate generated bv
the multiple comparisons.
One multivariate model that can handle the problem of having repeated cm"miates over time is the multiple design multivariate (MDM) model (Srivastava, 19G(; I
which allows different dependent variables to have different design matrices. TIlt'
MDM model can be written as
(L~)
where Y and E are as in (1.1), X t is an (n x
md
design matrix for the tth depend('llt
variable, and {3t is an (mt xl) vector ofregression coefficients associated with the tt h
(n x 1) vector of responses Yt, t = 1, ... ,po Alternatively, the model can be writtetl
4
.
by rolling out the columns of (1.2), that is,
Y
= X/3 + € =
where Y and
€
Y1
Xl
0
0
/3 1
Y2
0
X2
0
/3 2
YP
0
0
Xp
€1
+
/3p
€2
(1.3)
€p
are (np xl) vectors with Yt and €t representing the tth column of Y and
E respectively; here X is an (np x m) block diagonal matrix with the design matrices
Xt's in the diagonal; /3 is an (mx 1) vector ofregression coefficients and m =
Under this representation, E(€d
Var ( €)
•
=0
and E(€t€f,)
= att,In,
for t,t'
L:f=l mt.
= 1, ... ,p, so
that
= ~ ® In =
Model (1.3) is called the seemingly unrelated regressions (SUR) model (Zellner, 1962:
Srivastava and Giles, 1987) in the econometric 1iterature. It should be noted that
•
(1.2) and (1.3) are equivalent to the standard multivariate linear model (1.1) when
Xl
= X 2 = ... = X p .
Patel (1986) introduced a model for the analysis of repeated measures data
incorporating covariates that change over time. The model is
T
Y =
Ae + 2: Xlr l + E,
(1.4 )
1=1
where Y is an (n x p) matrix of observations on p occasions for n experimental units: A
is an (n x k) design matrix of rank k
< n that may include between-subject covariates:
eis a (k x p) matrix with unknown parameters; Xl is an (n x p) matrix containing the
•
values of the lth covariate that changes with time;
r l is a (p x p) diagonal matrix with
diagonal elements being 'Yll, ... ,'YIp, for l = 1, ... , r; and E is an (n x p) matrix of
random errors. The model assumes that the rows of E are independently distributed
with the same normal multivariate distribution with (p x 1) mean vector 0 and (p x p)
dispersion matrix
~.
5
Other models can be proposed that allow different responses to have different
design matrices, and accommodate missing data as well. This can be acomplished
by stacking the data by experimental unit, instead of by time as in (1.3). Ware
(1985) and Jennrich and Schluchter (1986) considered that situation. Suppose that
..
we have a set of n experimental units, where we observe a response along Pi known
occasions. Let
Yi
=
(Yil, ... ,
Yip; V be the (Pi xl) vector of responses associated with
the ith experimental unit. A linear model that associates the responses of the ith
experimental unit with a set of explanatory variables is given by
l.
Yl*· = X*{3*
+e·*
(1.5 )
l ,
where Xi is the (Pi x m) known design matrix, {3* is an (m xl) vector of unknown
parameters, and ei is the (Pi xl) vector of random errors of the ith experimental
unit. It is further assumed that ei, ... , e~ are independent with ei
elements of :E i are known functions of a vector
elements do not depend on (3; that is, :E i
rv
N(O, :Ed. The
4> of unknown parameters, whose
= :E i (4))
for i
= 1, ... , n.
Model (1.5) can also be writen in the vectorial form used by Harville (1974, 1976,
•
1977)
y* = X*{3*
X* = (Xi T, ... , X~TV is a
(L:f=l Pi
+ e*,
(1.6)
x m) design matrix. This way, E(y*) = X* (3*
and Var(y*) = diag (:E l , ... , :En)' Ware (1985) and Jennrich and Schluchter (1986)
discuss several different choices for the covariance structure :E i (4)), for i = 1, ... , n.
We consider the case where we can write each :E i as :E i (4))
=
E i :E(4))ET; that is,
:E i is obtained by selecting from :E rows and columns associated with the Pi occasions of observation and :E is an unstructured (p x p) covariance matrix with
ements are zeros and ones conveniently placed to select the appropriate variances and
covariances from :E associated with the ith experimental unit.
6
1.2.2
Estimation
In the full rank GLM model (1.1), the best linear unbiased estimator of B is
..
(1. 7)
Under the assumption that the errors are normally distributed, B is also the maximum
likelihood estimator (MLE) of B. The MLE of 'E is given by
~
'E=
(Y - XB)T(Y - XB)
.
n
(1.8)
The unbiased estimator nf.j(n - q) is conventionally used, where q=rank(X), since
(1.8) is a biased estimator.
The problem of estimating the parameter vectors
131"'" I3 p in model (1.3) has
been intensively investigated. One method is the application of ordinary least squares
(OLS) to each equation. This method leads to estimates that are consistent and unbiased, but only under certain conditions do the OLS estimates possess other optimal
properties such as asymptotic efficiency. Alternatively, estimation methods allowing
for the correlation between the response variables have been considered. Srivastava
and Dwivedi (1979) reviewed some procedures used in the econometric setting. Most
of the proposed estimators have the form of a generalized least squares (GLS) for
13
using a consistent estimator S = [(stt')] of 'E. The OLS and GLS estimators of 13 are
defined as
(1.9)
and
(1.10)
with
(1.11)
.
where X = diag (Xl, ... ,Xp ). The conditions under which
(30LS
=
(3GLS
were devel-
oped by Dwivedi and Srivastava (1978). Srivastava (1967) showed that for a general
unstructured 'E the OLS and GLS estimators are identical if Xl = X 2 = ... = X p ; or
7
more generally, if and only if the X I, ... ,Xp are all non singular, and not necessarily identical, linear combinations of the same set of variables. Although
Gauss-Markov estimator of {3 given
~, ~
i3GLS
is the
is usually unknown. Zellner (1962, 1972)
proposed a two-stage generalized least squares estimator of (3 using equation (1.10)
with
~
replaced by an estimator S based on the OLS residuals. The elements of S
are
(1.12)
(X[Xt)-1 X[Yt for t = 1, ... , p. Zellner (1962) proved the weak con-
where 13t
sistency and asymptotic normality of the two-stage estimator of j3 calculated from
(1.12). Gallant (1975) proved strong consistency of the estimator under the more
general framework of a nonlinear MDM model. Kleinbaum (1973) demonstrated that
it is a best asymptotically normal (BAN) estimator when the underlying distribution
of the error terms is a p-variate normal. Other divisors can be used in place of n in
(1.12) (see Srivastava and Dwivedi (1979)).
In a subsequent paper, Zellner (1963) considered a two-equation system (p
= 2)
"
of (1.3) and established several finite sample properti.es of two-stage estimators of {3.
The estimators are, however, based on the unrestricted estimate S of :E. The elements
Stt'
...
of S are based on the residuals obtained by regressing each Yt on all regressors in
the system, and are given by
(1.13)
where 13t
= (X6'X O)-IX6'Yt and X o is a basis of [Xl' ..
X p ]. Zellner's (1963) analysis
was confined to the case where XfX 2 = 0, which may not be the case in many
applications.
Revankar (1974, 1976) specified a two-equation model in which the explanatory
variables in the second equation were a subset of the explanatory variables in the
..
first equation and derived exact expression for the variance-covariance matrix of the
two-stage estimators of {3 obtained from (1.12) and (1.13). Under this model, both
estimates of {32 coincide with the respective OLS estimator. Conniffe (1982) also
8
'"
discussed this situation relating the two Zellner estimators to the estimators obtained
from the maximum likelihood approach assuming normally distributed errors.
Telser (1964) suggested an estimation method which introduces in each equation
the disturbances of all remaining equations in the system, thus allowing study of the
correlations among the disturbances of various regression equations explicitly. This
gives an idea of the degree of multicollinearity among the explanatory variables of
different equations, and how the correlations among the disturbances can be exploited
to obtain asymptotically more efficient estimates of {3. His method involves writing
the regression equation t, t
= 1, ... ,p, as follows:
(1.14)
where V t is an (n x (p -1)) matrix having
0t
El, E2,""
is a column vector of p - 1 unknowns and
such that E(V[vd
= O. Define
13t(k)' Qt(k)
Vt
and
Et-l, Et+l,""
as its columns,
is an (n x 1) vector of disturbances
€t(k)
as estimates of
respectively at the kth round. Suppose the estimators of
Qt(l),
Ep
(3l
and
{3t, 0t
01
are
and
13t(l)
Et
and
obtained by applying least squares to the first equation in (1.14) after replacing
€t' (0) 's (t f =j:. t) in V t by
estimate
El
Et' 's,
the estimated disturbances based on least squares. ~ ow
by
and write the second equation in (1.14) as
to which the application of least squares will yield the first round estimates
€2(1)
= Y2 -
x 2132(1)' and use this in place of E2(0) in the third equation.
(32(1)
and
We have for
the kth round estimate and the tth equation the following regression to be estimated
by least squares:
At each equation we bring in the residual from the preceding estimate so that
9
This recursive process is repeated until convergence. Ruble (1968, pp. 287-294) presented an alternative derivation of the iterative estimator of Telser (1964) through a
maximum likelihood (ML) approach.
Still under model (1.3) and assuming normally distributed error terms, Kmenta
.,
and Gilbert (1968) derived the maximum likelihood estimators given by
(1.15)
where SML = [(st~L)] with
(1.16)
for t, t f = 1, ... ,po The ML solutions can be found by iterating equations (1.15) and
(1.16), beginning the iteration procedure with a starting value like the one provided
by (1.9). Generally the ML estimator of {3 is consistent, asymptotically efficient,
asymptotically normally distributed, and under certain conditions it is also unbiased
•
(Don and Magnus, 1980).
Several simulation studies have been conducted providing comparisons among
..
the two-step estimators of {3, the OLS estimator and the ML estimator. Most of them
are restricted to the case when p = 2. They are characterized by a trade-off among
(n - mr), (ml - m2), the correlation between Yl and Y2, and the correlations among
the explanatory variables in the two equations. It seems that the ML estimator is
preferable to the Zellner estimators when the correlation between Yl and Y2 is large.
The larger ml - m2is and the smaller (n - md is, the more efficient the MLE will be
relative to the Zellner estimators. Asymptotic expansions for the standard errors of
the Zellner estimators have been derived (see Srivastava and DWivedi, 1979), however,
a bootstrap work by Freedman and Peters (1984) suggests that such expansions may
•
yield results that are too small by a factor of at least two.
Patel (1986) presented an algorithm to compute maximum likelihood estimates
ofthe parameters
e, r
1 , ... ,
rr and ~ assuming model. (1.4).
Verbyla (1988) proposed
an alternative derivation of the maximum likelihood estimates of the parameters of
10
Patel's model based on the work of Verbyla and Venables (1988) that involves a
reparametrization of the model and the use of conditional maximum likelihood.
Using model (1.5), Jennrich and Schluchter (1986) proposed the application of
maximum likelihood methods to obtain estimates of (3* and ¢y. The log-likelihood to
be maximized can be written as
n
IML
= -(1/2) 2)Pi In(21r) + In l:Eil + (y; - X;{3*)T:Ei l (y; - X;{3*)}.
(1.17)
i=l
Alternatively, the method of restricted maximum likelihood (REML) can be used.
The REML estimates of the variance-covariance parameters
maximize the REML likelihood
IREML,
4> are the values that
which is the log-likelihood of any set of 1'1 - m
linearly independent error contrasts. Harville (1974) showed that the RElvIL loglikelihood can be written as
(1.18)
\\'h ere E-*i -- Yi* - x*("n
i L..i=l
X*T~-lX*)-l
"n
i ~i
i
L..i=l X*T~-l
i ~i Yi*
an d 1'1 -- "n
L..i=l Pi· It era t'lve
procedures are usually required to find the maximum likelihood estimates of {3* and
4> from (1.17) or (1.18), such as Newton-Raphson, Fisher scoring, or variations of the
EM algorithm; for details see Jennrich and Schluchter (1986). In either the ML or
REML approaches, if
:E i =
:E i (;P), where
;p is the ML or REML estimate of 4>,
the
estimates of {3* take the form of the generalized least squares estimates, that using
the notation of model (1.5) can be written as
(1.19)
with covariance matrix usually estimated by
(1. 20)
..
If {3 and
4> are estimated by the ML approach, (1.20) is the estimate of the covariance
matrix of
/3 obtained from the expected information matrix.
11
1.2.3
Hypothesis Testing
The usual linear multivariate hypothesis in model (1.1) can be stated as H o: CBU =
(Jo vs H a : CBU =1= (Jo, where the (a x q) matrix C allows contrasts corresponding
to different predictors in the design matrix, while the (p x b) matrix U allows contrasts within subject. The parameters are estimated by jj = CBU. There are four
commonly used procedures for testing the general linear multivariate hypothesis. All
four are based on functions of the eigenvalues of the following matrices:
(1.21)
and
(1. 22)
where X is the design matrix as in (1.1). Under H o, the matrices Sh = UTyTQh YU
(the matrix due to the hypothesis) and Se = UTyTQe YU (the matrix due to error) are independent central Wishart matrices with degrees of freedom min( a,b) and
"
n - q respectively. Standard multivariate criteria can be used as test statistics for
H o, for example: Wilk's generalized likelihood ratio statistic, det (Se)/ det (Sh
+ Se):
..
Hotelling's trace, trace (ShS;l); Pillai's trace, trace [Sh(Se+Sh)-l]; and Roy's largest
root,
Cl
=
Cl (ShS;l).
Except for some special cases, exact distributional results are
not available for these test statistics. Instead, asymptotic approximations based on
F or X2 distributions are employed in practice (see Timm (1980)), for both null and
non-null cases. Muller and Barton (1989) and Muller, LaVange, Ramney and Ramney (1992) presented some methods for approximating power for GLM models with
normally distributed errors.
Under models (1.2) or (1.3) and the null hypothesis H o: {C1,Bl
O} where each C t is a full rank (Ct x
md
matrix with
Ct
= 0, ... , Cp,Bp =
:S mt, McDonald (1975)
derived an exact test by assuming that the error distribution is a p-variate normal
distribution and applying Roy's union-intersection principle (see Roy, Gnanadesikan
and Srivastava (1971)). Nevertheless, the verification of the testability conditions of
12
l"
H o for McDonald's procedure can be difficult. Following McDonald (1975) let
(1. 23)
and
(1. 24)
where X o is any basis of X
= [Xl'" X p ] given by X o = XH, Co = CH, C is an
2:: mt) matrix of rank s :S q constructed using Roy's union-intersection principle,
and H is any suitable (2:: mt x q) matrix of rank q = rank(X). Under H o, the
matrices Sh = yTQh Y and Se = YTQe Yare independent central Wishart matrices
(s x
with degrees offreedom min(p,s) and n-q respectively. Standard multivariate criteria
can be used as test statistics for H o.
For model (1.3) hypotheses of the form H o: C{3 = (}o, where now C is a c x
m matrix, several statistics have been suggested in the literature. Zellner (1962)
proposed applying a parallel of the usual F-test for the univariate regression model
to the MDM model by means of a transformation of the model using an estimate
of the covariance matrix, thus taking into account the fact that the errors may be
•
correlated across equations. The form of the test statistic is
m) (c?3 - (}oY{C[X=(~-~ Q9 In)X]-lCT}~l(c?3 - (}o).
c
(y - X(3)T(:E- l Q9 In)(y - X(3)
F = (np F will,
however, have the same asymptotic distribution as
Fc,np-m,
(1.2;"))
\;/1
that is, a
distributed random variable. If:E were known, F would provide an exact test.
I)\[t
-
~
when an estimator :E is used, the resulting statistic, F, is equal to the exact F plll:-some error of order-O [np-l/2] in probability. Zellner (1962) suggested using Fc.
II
I)
II
as the null distribution, claiming that it is more conservative.
Kleinbaum (1973) derived asymptotic tests of linear hypotheses under a general
•
model and p-variate normality via Wald constructions, of which models (1.3),
(l.~) .
(1.5) and (1.6) are special cases. Any BAN estimator of {3 and any consistent estimator of:E will suffice in that test construction. In particular, for testing H o: C{3
= Oli.
where C is of a full rank c x m matrix (c :S m), the statistic is given by
(1. 26)
13
which is asymptotically X2 with c degrees of freedom under H o.
Lightner and O'Brien (1984) conducted some simulation studies comparing the
F statistic (1.25) and the Wald statistic (1.26) substituting the maximum likelihood
estimators, Zellner's estimators based on (1.12) and (1.13) and using different approximations (F and X2 ) for the distribution of the statistic under the null hypothesis.
The test of interest was whether the treatment by time interaction adjusted for one
time-varying covariate was present or not. The conclusion was that all test statistics
performed well for p = 2, but this conclusion was not reached for other dimensions.
The likelihood ratio statistic is also frequently employed to test hypotheses in
the models described in subsection 2.1. The asymptotic distribution of the likelihood
ratio statistic A is given by A = -210g A
f'V
X~ where c is the dimension of the
constraint. Rocke (1989) conducted studies to find a reasonable approximation for
the distribution of A in some particular examples.
He concluded that using the
asymptotic X~ reference distribution for testing hypotheses in SUR models results
in highly biased significance levels, in which many true null hypotheses would be
•
rejected. The bootstrap Bartlett adjustment method described there performed well
..
across the cases examined, over second order asymptotic adjustments and the
crud~'
bootstrap in which the significance level assigned to A is the fraction of the .\ 's
obtained by the bootstrap procedure greater than the observed sample A.
Bee = 0 vpr:-'ll"
< T versus H 02 : r l 1=- 0 fUl
Patel (1986) derived the likelihood ratio tests for testing HOI:
H ll :
Bee
1=- 0 and for testing H 02 :
at least one l >
TI
T -
TI
for alll
>
TI
assuming model (1.4). The null hypothesis HOI states that
functions of the parameters of
last
rl = 0
e are zero.
lilH'itl
The null hypothesis H 02 states that til!'
within-subject covariates do not provide any information beyond what
provided by the first
TI
covariates. The test for H 02 would be useful in developing,
1"
it
method for selection of within-subject covariates provided that the investigator wallts
to use them.
..
..
14
1.3
Robust Methods
For model (1.3) Chinchilli, Schwab and Sen (1989) proposed the use of aligned rank
statistics under the MDM linear model for estimating {3 and testing H o: {C {3 = O},
where C is a (c x m) matrix of full rank by using a Wald type statistic that has X 2
distribution with c degrees of freedom under H o and assumptions described in their
paper. Nonparametric procedures for the MDM model are further discussed by Saleh
and Shiraishi (1993).
Sen (1983) described the use of rank methods in ANOCOVA models that can
be applied to each equation individually.
One approach that particularly interests us is the minimization of the mean
squared error of prediction, and the following robust methods for testing hypothesis
in variance-covariance and correlation matrices have been proposed in the literature.
Elliptical distributions have been employed in two general approaches yielding
•
somewhat different results. In one, an n x m data matrix is regarded as being distributed according to an nm-dimensional elliptical distribution. Elements in different
rows of the data matrix are regarded as uncorrelated but not independent. Under
these conditions certain normal theory likelihood ratio tests remain valid without
correction (see Kariya and Sinha (1989) and Anderson, Fang and Hsu (1990)).
The other approach regards rows of the data matrix as being independent and
distributed according to an m-variate elliptical distribution. Under that situation a
number of alternative methods have been proposed in the literature. Muirhead and
Waternaux (1980) proposed a kurtosis correction on the likelihood ratio statistic for
testing hypothesis on the population correlation matrix based on the usual sample
..
estimate of the correlation matrix obtained when sampling from a normal distribution
to take into account the fact that we might be sampling from a more general elliptical
distribution.
Tyler (1982) and Tyler (1983) extended the work of Muirhead and Waternaux
(1980). He studied the robustness and efficiency properties of the normal likelihood
15
ratio test statistics for functions of the population covariance matrix under elliptical
distributions. Let V be a fixed symmetric positive-definite matrix parameter of order
m and let V n be a sequence of symmetric positive-definite random matrix estimates
of order m. Also, let H(r) be a k-dimensional multivariate function of the (m x m)
covariance matrix
r, such that the hypothesis of interest can be written as
He showed that iffor all a > 0 and all symmetric positive-definite matrices
H(ar), then .jTi{H(Vn )
-
H'(r)
L~l
r,
H(r) =
H(V)} converges to a k-variate normal distribution with
mean zero and variance-covariance matrix
Jm =
H(r) = O.
20"1
{H' (V)} (V ® V) {H' (V)} T, where
= ~{dH(r)/dvec(r)}(I + J m ),
J ii ® J ii , J ij is the (m x m) matrix of with one in the (i,j) position
and zeros elsewhere. The derivations for the asymptotic distributions rely primarily
on the asymptotic normality and the invariance properties common to the sample
covariance matrix, the maximum likelihood estimate and the M-estimates of scatter
matrices when sampling from an elliptical distribution.
Browne (1984) and Shapiro and Browne (1987) also discuss correction factors
for tests in covariance matrices.
Little (1988), Lange, Little and Taylor (1989) and Lange and Sinsheimer (1993)
provide a good reference on how to fit multivariate elliptical distributions to an n x m
data matrix that can be used to use the test procedures described by Tyler (1982)
and Tyler (1983).
1.3.1
Spherical, Elliptical and Left-Orthogonally Invariant
Distributions
An (m xl) random vector z is spherically symmetrically distributed if the distribution
of z remains the same under orthogonal transformations, that is,
£(rz) = £(z)
16
•
for all orthogonal (m x m) matrices
r.
Similarly, an (m xI) random vector z is
elliptically symmetrically distributed with mean J-t and positive definite dispersion
matrix 'It if the distribution of 'It- 1/ 2 (Z
J-t) is spherically symmetric distributed,
-
that is,
..
for all orthogonal (m x m) matrices
r.
The characteristic function has the form
If z (m x 1) is elliptically distributed with parameters J-t and 'It and has a
probability density function, then the density function has the form
(1. 27)
where
(1. 28)
9 is a scalar multiple of a density such that 9 is a function on [0, (Xl) satisfying
•
JRm
g(uTu)du = 1. The variables T = d2 and z* = ['It- 1/ 2 (Z
-
J-t)]/T 1/ 2 are indepen-
dent with z* being uniformly distributed on the unit sphere, and T having density of
the form
f T (t) --
m/2
7r
r(m/2)
t m / 2 - 1g(t)
.
(1.29)
If the appropriate moments of z exist then z has mean J-t and variance-covariance
matrix -2'I/J'(0)'lt = [E(T)/m]'lt.
Let Z be an (m x k) random matrix. We say that Z is elliptically symmetric
distributed about J.L (m x k) and dispersion matrix 1m ® 'It if the distribution of
vec['lt- 1/ 2 (Z - J-t)T]
= (1m ® 'It- 1/ 2 )vec [(Z -
J-t)T] is spherically symmetric distributed.
The following theorem generalizes some properties of the normal distribution to
the case of elliptically symmetric distributions.
•
Theorem 1.1 Let z be distributed according to the m-variate elliptical distribution
whose density is given by (1.27), and
[:J
17
I
where m1
+ m2
= m. If the second moments exist then Zl
I Z2
is elliptically distributed
with
I zd
I zd
E (Z2
Cov (Z2
1-£2
+ W21 w 1/(zl
- 1-£1)
~(Zl) (W22 - \)/21 W1/W 12 ).
•
If a probability density function has the form (1.27), so is its scale mixture
where H (a) is a distribution on (0, 00 ). When the integrand 9 is a normal density, it is
often called a normal mixture. A condition for an elliptically symmetric distribution
to be a normal mixture can be found in Kariya and Sinha (1989).
Some examples of elliptical distributions are the Nm(l-£, w) that has density
function of the form
(1.31)
The kurtosis parameter is
=
K,
0, the mean is E(Z)
=
J-l and variance-covariance
matrix Var(Z) = W.
If we set
1- t
if qi = 1
(1.32)
o
otherwise ,
where 0 < t < 1 and ,\ > O. The t-contaminated m-·variate elliptical normal density
is given by
(27r)-m/2(1- t)lwl- 1/ 2 exp
[-~(z -
I-£rW- 1(z - 1-£)]
(1.33)
+ (27r)-m/2 ,\m/2 Elwl- 1/ 2 exp [-~(Z The kurtosis parameter is
K,
= ((1
+ E(,\-2
- 1))/[1
+ E(,\-l
I-£rW- 1(z - 1-£)] .
- 1)]2) - 1. The weight
function has the form
w =
{I - t + E,\l+m/2 exp[(l -
,\) d2/2J} /
18
{I - E + t,\m/2 exp[(1 -
,\) d 2/2J}. (1.34)
•
Suppose that y is Nm(O, 1m ) and q is X~, and that y and q are independent. Let
'1' be a positive definite matrix and '1'1/2 be a symmetric square root of '1'. If
.
then z has the m-variate elliptical i-distribution with
r[(1/ + m)/2] 1'1'1-1/2
r[1//2](1/7r)m/2
The kurtosis parameter is
K,
1/
degrees of freedom given by
[1 + ~(z _ /-Lr'1'- 1(z _ /-L)]-(v+m)/2
(1.35)
1/
= 2/(1/ - 4) (1/ 2:: 5), the mean is E(Z) = /-L (1/ 2:: 2),
and the variance-covariance matrix is Var(Z) = v'1' /(v - 2)
(1/
2:: 3). The weight
function in that case is given by
W
\\Then
1/
= (1/
+ m)/(I/ + d2 ).
(1.36)
= 1 this is called the multivariate Cauchy distribution for which no moments
exist.
•
An (m x k) random matrix Z has a left-orthogonally invariant distribution about
/-L (m x k) if the distribution of r(Z - /-L) remains the same under orthogonal trans-
formations, that is,
L:(r(z - /-L)) = L:(Z - /-L)
for all orthogonal (m x m) matrices
r.
In this case, if Z has a density it has the form
(1.37)
where r is a nonnegative function on the set of (k x k) matrices. The class of leftorthogonally invariant distributions contains the class of elliptically symmetric distributions about /-L and dispersion matrix 1m 0 '1' but not vice-versa. An example of
a left-orthogonally invariant distributed variable that is not elliptically symmetric is
the matrix-variate i-distribution with density function
..
The results presented in this section can be found in Kariya and Sinha (1989)
and Muirhead (1982).
19
1.3.2
M -Estimation
For a random sample from an elliptical distribution Maronna (1976) defines affine
invariant M-estimates of location to be the solution
jJ~
and'll to a system of equations
,
=:
0
.
of the form
n-
i
n
L Wi (di)(Zi -
J.L)
(1.38)
i=i
and
n
n-
i
L wz(dT)(Zi -
J.L)(Zi - /J~r = ~
(1.39)
i=i
where
The functions Wi (odd) and Wz (even) must satisfy certain conditions as explained in
!\-'1aronna's paper. The estimators provided by the solution of (1.38) and (1.39) are
affine invariant and yield the maximum likelihood estimates of J.L and 'II for the case
of elliptically distributed variables if we take wi(di ) := wz(dT) = -2g'(di )jg(dd.
Maronna (1976) also proposed M-estimators based on an approach suggested
•
by Huber (see (Huber, 1981)). In that case, the weight functions for the Huber(q)
estimator are of the form
di
1,
:::;
k
wi(di ) =
(1.40)
k j di ,
otherwise
and
(1.41)
where k Z is chosen to be the lOOq percent point ofaXz distribution with m degrees
of freedom, and j3 is chosen so as to make the solution to the estimate of'll in (1.39)
asymptotically unbiased for the covariance matrix in a normal distribution situation.
Notice that more general M-estimates can be obtained from (1.38) and (1.39)
substituting Wi and Wz by a different set of weights for each coordinate of Zi (Singer
and Sen, 1985).
20
.
1.4
Synopsis of Research
In many multi-visit studies there are several covariates measured at each occasion.
In view of the issues of cost and precision involved, one question that arises when
..
planning similar future investigations is whether it is desirable to collect the covariates
at each visit as opposed to collecting them only once, that is, at the beginning of the
study. For that purpose consider the simple situation where there are n subjects
followed for p visits. In each visit t (t = 1, ... , p) r covariates and one response
are observed for each subject. The data structure is presented in Table 1.1. The
covariates change over time for each individual and are assumed to vary randomly for
each individual.
Table 1.1: Data structure
Covariates
Visit
..
.
1
XIll
Xll2
Xll r
1
X2ll
X212
X2lr
Y21
1
Xnll
X n l2
Xnl r
Ynl
Xl21
Xl22
Xl2r
2
X221
X222
X22r
Y22
2
X n 21
X n 22
X n 2r
Yn2
Xlpl
Xl p 2
YIp
p
X2pl
X2p2
Y2p
p
Xnpl
X np2
2
p
.
Xl
Response Variable
X 2
X p
X npr
21
Y.I
Y.2
Yll
Yl2
Ynp
1.4.1
Model
Several different models may be proposed to study these data. The one we consider
here is called the monotone multiple design multivariate (MMDM) model, which
allows different dependent variables to have different design matrices nested within
each other. It is given by
Y.l
X o Xl
Y.2
0
0
Y.p
0
0
0
0
X o Xl
0
0
0
0
0
0
{31
X2
0
0
0
{32
Xp
{3p
0
X o Xl
e.l
+
e.2
e.p
(1.42)
Y = X{3
+ €,
where each Y.t is an (n x 1) vector with the responses observed at visit t, X o is an
(n xl) matrix whose elements are ones, each X t is an (n x r) matrix containing
the values of the random covariates in visit t, each {3t is a ((tr
+ 1)
x 1) vector
of unknown regression coefficients, and e.t is an (n xl) random error vector for
t
= 1, ... ,p.
The vectors
Xit.
= (Xitl, ... , Xitr)
T
= (eil' ... , eip) are assumed
means J.L.t. = (P.tl, ... ,P.tp)T and
and
to follow a multivariate normal distribution with
eit
T
o respectively, and dispersion matrix
~ = (~xx
o
0
),
~ee
The observations on different individuals are taken to be independent. It is assumed
that the response variable at each visit is a linear function of the covariates observed
up to that visit. Comparing this model with the standard general linear model for
multivariate multiple regression we see that the standard GLM model requires the
same set of covariates to be used in the matrix X, that is, X = diag (Xl, ... , Xd. This
model is a special case of the "seemingly unrelated regression model" that allows each
response variable to have a different design matrix, that is, X
= diag (Xl, ... , X p).
We will often call model (1.42) a correlation model because the covariates are stochastic and (1.42) defines how the covariates are related to the responses.
22
•
1.4.2
Maximum Likelihood Approach
Under the assumption of normally distributed errors, the maximum likelihood estimates of f3 and :E can be found by maximizing the log-likelihood function with respect
.
to f3 and:E. However under model (1.42) direct calculation of the likelihood equations
often requires iterative procedures. An alternative is to use a suitable transformation
of the varibles involved.
If we observe for each individual i:
Xile, Yii, Xi2.,
Yi2,· .. , Xip., Yip, then, under
the assumption of global normality, the variables
Xile Yii Xi2. -
E(Xile)
I Xile)
E (Xi2. I Yii, Xile)
E(Yii
Yi2 - E(Yi2
•
I Yii, Xile, Xi2.)
Xip. - E(Xip.
I Yii,
Yip - E(Yip I Yii,
(1.43)
,Yip-i, Xile,
, Yp-i,
Xile, ...
... ,Xip-le)
,Xip.)
are independent and normally distributed. The proof is a generalization of exercise
276 from Kendall and Stuart (1967, pg 343).
The parameters of each of 2p components of the resulting likelihood can tll('!l
be estimated by maximizing 2p regression models separately. The maximum likr'lihood estimators of JL.t., f3 and :E of the original model (1.42) can be obtained k
transformation, as it is seen in Chapter 2.
1.4.3
Hypothesis of Interest
The objective of this work is to develop test procedures to assess if the covariat('s
that change over time should be included in the analysis as opposed to using olll\
the covariate measurements obtained at the first visit, taking into account the loss of
precision involved.
23
The null hypothesis of interest (Ho) is that the responses at visit t, t = 2, ... ,p,
can be written as linear functions of the covariates measured at visit 1, that is,
p
H o : {C,B = O} = n{Hot : Ct,Bt = 0, t = 2, ... ,p};
t=2
(1.44)
where C = diag(C 2 , . . . ,C p) is a (cx l: mt) matrix, C t = [0 let] is Ct x mt, t = 2, .... p.
Ct = (t-l)r, mt = rt+l,
C
=
p(p-1)r/2 and l:mt = p[r(p+1)+2J/2. The alternative
hypothesis (Hd is that the responses at visit t, t = 1, ... , p, are linear functions of
the covariates obtained at visits 1 to t.
We can rewrite the null hypothesis of interest in terms of the transformed model
as
p
Ho :
n{Hot:
(JYt·Yl· ..Yt-lXl ...Xt
t=2
=
(TYt.Yl ...Yt-lXl}
or
p
Ho :
n
for s
= 2, ... , t}
< (JYt.Yl ...Yt-lXl for some t
= 2, ... ,p.
{Hot: (JYS.Yl ...Ys-lXl ... Xs
t=2
and the alternative hypothesis
Hl
:
(JYt.Yl ...Yt-lXl ...Xt
=
(JYS.Yl ...Ys--lXl
Similarly to (1.43), under H o we observe for each individual i:
Xile, Yil,
Yi2 ... ..
Yip the variables
Xile Yil -
E(Xih)
E(Yil
I Xih)
Yi2 - E(Yi21
Yip - E(Yip
Yil,Xih)
I Yil, ... , Yip-I, Xih)
are independent and normally distributed. Under HI the new variables Yit.Yil ..
Yit-I,
Xih, ... ,
Xite t = 1, ... ,p have variance-covariance matrix given by
and under H o, Yit'Yil, ... , Yit-I,
Xih
t = 1, ... ,p have variance matrix given by
24
Observe that
1;f - 1;f
is positive semi-definite. Each diagonal element of
greater than or equal to the corresponding element in
1;f,
since
1;f
aYt.Yl ...Yt-1Xl
is
::::
aYp·Yl ...Yp_1Xl ... Xp·
1.4.4
Step-Down Procedure
Under the assumption that the responses are normally distributed and the covariates
are fixed, McDonald (1975) derived an exact test applying Roy's union-intersection
principle that can be used to test a class of null hypotheses that includes (1.44) as
a special case. For that purpose take Co = C p and X o = [In Xl ... X p ] in (1.23)
and (1.24). Standard multivariate criteria can be used as test statistics for the null
hypothesis in this case. HmF'ver, his approach does not take into account the fact
that the model under the alternative hypothesis is restricted, that is, given by model
(1.42), so his test may be sensitive to departures from the null hypothesis that are not
•
specified in the alternative. Besides that, the case when the covariates are stochastic
is not discussed.
.
Observe that the likelihood ratio test statistic for H o takes the form
_ [1~11]n/2
A~
11;01
1;0 and 1;1 denote the maximum likelihood estimators of 1; under Ho and H l
~
where
'
~
respectively. The likelihood ratio statistic can be written as
(1.46)
where
.
a denotes
the maximum likelihood estimator of the corresponding variance.
Observe that (1.46) is a product of p - 1 terms that can be individually transformed
to
F
t
=
(
n - (r + l)t)
r(t - 1)
(a
Yt.Yl ...Yt,-IXl
-a
Yt.Yl
aYt.Yl ...Yt-1Xl
)
Yt-1Xl ... X t .
Xt
Under H o, the statistics Ft are quasi-independently distributed with Ft having the
F-distribution with degrees of freedom r(t - 1) and (n - (r
25
+ l)t)
for t = 2, ... ,p.
The distri bution of Ft does not depend on Y.t and X t , t = 1, ... , t - 1, being held
fixed or not, given that Hos is true for s = 1, ... ,t.
This fact suggests the use of a step-down testing procedure (Roy et al., 1971)
to test H o. The procedure is to compare F2 with the significance point
respective degrees of freedom. If the observed value is larger than
h.
If it is accepted, then compare F3 with
12,
12
for its
then reject H o.
•
In sequence, the components are tested.
If one is rejected, the sequence is stopped and the hypothesis H o is rejected. If all
components of the null hypothesis are accepted, the composite hypothesis is accepted.
The proposed test has acceptance region W given by
n[F
p
W :
t ::;
it],
t=2
where it's are to be chosen such that
•
and the probability of accepting H o
p
p[n (Ft ::; it!Hot )]
t=2
p
= II (1 -
at)
=1-
a,
•
t=2
for a prespecified a.
We can choose the significance levels based on the investigators considerations.
In the absence of any other reason, the component significance levels can be taken
equal. If the investigator is more interested in the final measurements (like pth response) and a p is a very small number, then it will take a relatively large deviation
from the pth null hypothesis to lead to rejection.
We may observe that each Ft region itself (for t = 2, ... ,p) is an intersection of
of an infinity of t regions with the same degrees of freedom, (n - (r + l)t), but they
are different for each t.
The step-down procedure is appropriate, since there is a meaningful basis for
considering the responses in the specified order. Considering them in different orders
may, in general, lead to different conclusions.
26
1"
There are some differences between the proposed methodology and McDonald's
procedure. First McDonald asssumes that the residual mean square of y
I X is con-
stant, whereas in the proposed methodology y and X are random and we work with
the unconditional residual mean square. McDonald uses the information contained
...
in a linear combination of the original variables, reducing the problem to a one dimensional case. If in his strategy the hypothesis is not rejected that implies that
the corresponding hypothesis in our proposed strategy would not be rejected either,
but the converse may not be true. McDonald's test may be sensitive to departures
from the null hypothesis that are not specified in the the alternative hypothesis of
our proposed strategy. McDonald attempted to construct confidence regions for the
parameters associated with the mean function, whereas in this work, we are mainly
concerned with hypothesis testing of variance-covariance parameters.
In view of that, when the responses are normally distributed and the covariates
are continuous random variables, we propose a decomposition of the likelihood ratio
satatistic and the use of a step-down procedure for testing purposes. That involves
decomposing the hypothesis of interest into an intersection of hypotheses for the
parameters of each visit. The likelihood is then written as a product of independent
statistics under the null hypothesis, so that each hypothesis is tested separately for
each visit. The use of the step-down procedure in this case provides an appropriate
framework for both estimation and hypothesis testing.
The method is further discussed and generalized to the situation when there are
several responses and the missing data pattern is increasingly monotone by design.
that is, when the number of experimental units with responses and covariates followed
up to visit t is larger than for visit t + 1, t = 1, ... ,p - 1 in Chapter 2.
..
..
1.4.5
Robust Estimation Approach
Chapter 3 treats the cases where the distributions of the test statistics involved remain
exactly the same under a more general class of distributions that includes the normal
as a special case. It is shown that the step-down procedure described in Chapter 2
27
remains robust in the case that the joint distribution of all variables involved is
an univariate elliptical distribution. This robustness property is exact, in the sense
that it is not an asymptotic result, and holds under the null and the alternative
hypotheses. The strategy used to study the robustness of the proposed step-down
procedure follows the lines defined by Kariya and Sinha (1989).
On the other hand, it has been observed in the literature that tests of hypotheses
concerning the population covariance matrix based on the normal distribution theory are not robust when the observations from different subjects are independently
distributed and follow a multivariate elliptical distri.bution. In particular, the tests
are generally sensitive to the kurtosis of the variables involved. In Chapter 4 we
consider the approach taken by Tyler (1982, 1983) and Muirhead and Waternaux
(1980) to modify the step-down procedure described in Chapter 2. That approach
involves finding an adjustment to the log likelihood ratio statistic based on the normal distribution such that the test statistic is asymptotically robust over the class of
multivariate elliptical distributions with finite fourth moments. Tyler (1982, 1983)
presented general conditions under which such adjustment is possible and can be
applied to the normal likelihood test statistic when the estimator of the dispersion
matrix is taken to be the same as in the normal distribution, the maximum likelihood estimator under a specified elliptical distribution or an affine-invariant robust
1\;1 estimator of the dispersion matrix.
The proposed methodology is applied to two da1casets extracted from multi-visit
studies in Chapter 5.
.
28
Chapter 2
Normal Distribution Case
2.1
Introduction
In this chapter we extend the step-down procedure described in Section 1.4 to the
case when there are several observed responses at each visit and a planned monotone
decreasing pattern of observation of the responses and covariates over time. It is
•
assumed that the set of observations corresponding to a certain experimental unit
follows a multivariate normal distribution.
2.2
Data Structure
Consider the situation where there are n subjects followed for p visits. In each visit t
(t = 1, ... ,p) r random covariates and q responses are observed for nt subjects, such
that n =
nl :::::
n2 ::::: ... ::::: n p and the nt subjects have their covariates and responses
measurements observed in the prior visits. At the beginning of the study
..
f design
variables that do not change over time are recorded for all individuals and are not
considered to be random. The data structure is presented in Table 2.1. Note that the
r covariates change over time for each individual, and that n p individuals have their
covariate and response measurements observed at all p visits,
np-l -
np
individuals
have their covariate and response measurements observed at the first p - 1 visits, and
so on, such that nl - n2 individuals have their covariate and response measurements
observed only at the first visit. It is assumed that n p >
f + 1 + p(r + q)
- q and that
the dropping out is independent of the responses and covariates.
..
Table 2.1: Data structure
fl.
Visit
Design
Random
Variables
Covariates
Responses
1
X IOI
x lO !
X lli
X ll2
X llr
Ylli
Yllq
1
X 201
x 20 !
X 211
X 212
X 21r
Y211
Y21q
1
Xiol
Xio!
Xill
Xil2
Xilr
Yilq
Yilq
1
X nlol
x nlo !
x nlll
X nl12
x nllr
Ynlll
Ynllq
2
X I21
X I22
X I2r
YI21
YI2q
2
X 221
X 222
X 22r
Y221
Y22q
2
Xi21
Xi22
Xi2r
Yi21
Yi2q
2
X n221
X n222
X n22r
Yn221
Y n 22q
P
X lPI
X lP2
x lpr
YIPI
Ylpq
P
X 2PI
X 2P2
x 2pr
Y2PI
Y2pq
P
Xipl
XiP2
Xipr
Yipl
Yipq
P
x nppi
x npp2
x nppr
Ynppi
Ynppq
•
..
,
30
2.3
Correlation Model
Let Xio. be the ((I
1, Xii., Xi2., ... ,
+ 1)
Xit.
x 1) vector of fixed design covariates with first element being
be the (r x 1) vectors of random covariates, Yil., Yi2., ... ,
be the (q x 1) vectors of responses, and eil., ei2., ... ,
eit.
Yite
be the (q x 1) vectors of
random errors associated with each individual i, for i = 1, ... , nt and t = 1, ... , p.
Let
X(t) -
1
XlOl
x lO !
XTlO •
1
X 201
x 20!
x T20 •
Xnto!
X ntO •
o -
=
x. o! ],
T
1 x ntOI
X(t)
U
= [ X. OI
X lUl
X lUr
XTlU •
X 2U l
X 2ur
X 2U •
T
= [ X. UI
X. U2
X. ur ],
= [ Y.UI
Y.U2
Y.uq ],
= [ e. UI
e. U2
e. uq
T
X ntu •
X ntUI
T
YlUl
YlUq
YIU.
T
y(t)
U
Y2U.
Y2Ul
T
E(t)
U
YntUI
Yntu.
e lUl
e TlU •
e2Ul
e 2U •
T
T
entUI
..
],
e ntu •
and
for u = 1, ... , t and t
1, ... ,p.
Suppose that x~;~ =
(xft. x:[;•... xft.V
and
e~;). = (e;l. e:[;•... eft.V are uncorrelated random vectors distributed according to
31
a multivariate normal distribution with mean
(2.1 )
•
and variance matrix
(2.2)
where
8
:E(t)
-
(t) --
xo
[8 XIO 8
X20· . •
:EXIXI.XO
:ExIX2.XO
:Ex2XI.XO
:Ex2X2.XO
:EXtXI.XO
:EXtX2.XO
8xto,
]
xx.xo -
and
:E(t) =
:E e1e1
:E e1e2
:E e2e1
:E e2e2
:E ete1
:E ete2
.
ee
Note that 8
x(to)
corresponds to the first t(f -+- 1) columns of 8
responds to the first tr columns and rows of :E~).xo =
:E xx . xo '
x(po)
= 8-
xo,
~(t)
~XX.XO
cor-
and :E~~ corresponds
to the first tq columns and rows of :E~~) = :E ee • The fact that the sets of covariate
observations and random errors associated with different individuals are uncorrelated
implies that they are independent under the assumption of normality.
In this context the appropriate correlation model that incorporates a monotone
multiple design multivariate (MMDM) linear model for the vector of responses at the
tth visit of the ith individual is given by
T
Yit. = B to Xio. -+-
BT
t
(t)
Xi..
..
-+- eit.,
(2.3)
where B to is the ((f -+- 1) x q) matrix of unknown regression coefficients associated to
the non random covariates, B t =
(B~ B~
... Bft)Y is the (tr x q) matrix with Btu
being the (r x q) matrix of unknown regression coefHcients associated to the random
32
covariates observed at visit u for u = 1, ... , t, i = 1, ... , nt, and t = 1, ... ,po Then
(t) _
Zi.. -
((t)T
(t)T)T'~ =
Xi.. Yi..
e
1, ... , nt are norma 11 y d'Istn'b ute d WIt
. h mean
e- xo(t)T Xioe
(t) _
J.L zi -
(
B(t)T.
o X we
)
(2.4)
+ B(t)Te(t)T
.
xo X we
and variance matrix
~(t)
zz
~(t)
=
(
~(t)
xx.xo
xX.xo
B(t)T~(t)
xx.xo
~(t)
ee
B(t)
+ B(t)T~(t)
B(t)
xx.xo
)
(2.5 )
where
(2.6)
and
B(t) =
B l l B 2l
Btl
0
B 22
B t2
0
0
B tt
(2.7)
•
Note again that B~t) corresponds to the matrix formed by the first t(f
of B~p) = B o ,
rows of B(p)
~~Jxox
2.4
B(t)
= B,
+ 1)
columns
corresponds to the matrix formed by first tq columns and tT
and that ~~~.xox corresponds to the first tq columns and rows of
= ~yy.xox = ~ee + BT~xx.xoB.
Estimation
The usual method of maximum likelihood involves the maximization of the joint
(t)T ei..
(t)T)T &lor ~. -· 'b'
d Istn
utlOn 0 f (Xi..
..
and ~. Since x~:~ and e~;~ for i =
+ 1,
nt+ 1 + 1,
nt+l
, nt
,nt
- ,
1
an d t -
and t = 1,
. h respect to B
,p WIt
,p are uncorrelated
random vectors, thus independent under the normality assumption, the likelihood
•
under the model described in Section 2.3 is given by
33
Il:(t) 1-(nt-;t±l)exp[_~f-trl:(t)-1 ~ (x(t)_S(t)T
xx.xo
2
Xx.xo.
I.. xo
II
t=1
t=l
l=nt±l +1
p
where
~
n p +l
~
X
w.
I..
(28)
)(x(t)_S(t)T X · )T]
xo!D·
.
= 0, N = (q + r) L:f=1 nt, and l: = diag(Inp ® l:~l.xo' Inp_I-np ® l:~~x~' ... ,
I nl -nz ® l:1~.xo' I np ® l:~~) , Inp_I-np ® l:~~-1) , ... , I nl -nz ® l:W)· Direct calculation of
•
the likelihood equations under the MMDM model often requires iterative procedures.
We use an alternative approach based on a orthogonal transformation of x... and e...
(see Fujisawa (1995)) because this method yields explicit solutions of the maximum
likelihood estimators of Band l: for the MMDM model. For that purpose observe
that to maximize (2.8) is equivalent to maximize the product of likelihoods whose
parameters have an one to one relationship with the parameters of (2.8), that is,
..
(2.9)
with respect to B to , Btl and
~c
d
~c
h
(t-l)
_
-
=
for t
(T
T
""'xx.xo
an
l:~x.xo
= diag(l:xlxl.xo, ... , l:xpXp.XOXI",Xp_I)'
""'ee' W
ere Xi..
Sxto
Xil.Xiz."
(t-l)-l
l:ee
l:el ...et_l
l:XtXt .XOX1 ...Xt-l
l:
XtXt·Xo
- l:
1, ... , p,
T)T
.X it - l•
34
and
(t-l)
,ei..
_
-
Set
for t
(T
T
=
2, ... , p,
T)T
eileeiz•·· .e it -
l:~e
= diag(l:elel' ... , l:epep.el ...ep_I)'
_
et -
[ST
ST]T
etl' . .
ett-l
,
Xt Xl .. ·Xt-I·XO
l:Xt Xl ...Xt-l.XO
Sxt
l:(t.-l)-Il:
Xl'.XO
Xl .. ·Xt-l Xt·Xo'
1•
'
.
(2.10)
and
for t = 2, ... ,p. The parameters of each of the 2p components of the likelihood can
then be estimated by maximizing 2p multivariate regression models separately. The
"
maximum likelihood estimators of Band :E of the original correlation model can be
obtained by transformation.
To find the maximum likelihood estimators of the parameters involved we are
going to use the following two well known theorems repeatedly.
Theorem 2.1 Let 4> and T be symmetric positive definite matrices and
f(i~) =
14>I-n exp[-ntr (4)- l T)]. Then f(4)) is maximized with respect to 4> when 4> = T.
Theorem 2.2 Let 4> be a (u x v) matrix, T i be a (v x v) symmetric positive def-
inite matrix, T 2 be a (u xu) symmetric positive definite matrix, and and f (4))
•
..
-ntr (T l 4> T T 2 4». Then f(4)) is maximized with respect to 4> when 4> = O.
Consider the terms of the likelihood (2.9) that correspond to the distribution of x ....
that is,
.
_
[Xlt.
OT
.
_
~ xtoXw.
OT ( (t-i) _
~ xt Xi..
O(t-i)T
. )]T]
~ xo
x w•
..
•
First take the likelihood corresponding to the distribution of X. Ie
35
Theorem 2.3 For the likelihood junction (2.11) the maximum likelihood estimators
oj e XIO and ~XIX1'XO are
(2.12)
and
~XIX1'XO
(2.13)
Proof Observe that
Take
e
nl
A
XIO
T
~ Xio. X io •
' "
=
{
}
-1 { 'nl
"
T
~ Xio. X ile
}
_{ (l)T (1)}-1{ (l)T (I)}
X o X o
X o Xl
.
So we can write
nl
-tr ~;llXl'XO L(Xil.
-
e~IOXiO.)(Xil.
-
e~IOXiO.r =
i=l
-n1
tr
~;I~I.XOtXIXI'XO + tr ~;11XI'XO (8~10 - e~~IO) {f: xio.xfo.} (8
XIO -
e X10 )
~=1
where
tXIXI.XO
is given by (2.13). Using Theorem 2.2 the above last term is maximized
at the value zero when
~XIX1'XO
e XIO
by Theorem 2.1.
=8
XIO '
The first term is maximized when ~xlrl.J:·o
=
•
Now take the likelihood of each
X.t.
regressed on
X. I . ,
••.
,X.t-I.
for t = 2, .... p.
that is
(211")
_~
2
I~XtXt.XO,CI ... Xt-ll
_~
2
{I
exp -2'tr ~XtXt,XOXI
.
-1
... Xt_1
nt
"'lx.
L..J
~t.
e T X·
xto to.
e T (X(t-1) _ e(t-1)T X ' )]
xt i..
XO
W.
i=l
lXit. - e- TxtoXio. -
eT ((t-1)
- xt Xi..
-
36
e(t-1)T
)]T}
- xo
Xio.
.
(2.14)
Define
e
[e(t-1)
XO
xto
]=
(X(t)TX(t»)
0
0
-1
X(t)]
X(t)T [X(t)
0
(2.15)
t
and
..
X(t)-x(t)e ]T[X(t) - x(t)e(t-1) X(t) - x(t)e ] (2 16)
Qtt = ~[X(tLxO(t)e(t-1)
nt
xo
t
0
xto
0
XO
t
0
xtO·
.
with submatrices
Q ttll = ~(X(t) _
x(t)e(t-1»)T(x(t) _
0
xo
nt
x(t)e(t-1»)
xo
0
(2.17)
,
_ 1 ((t)
(t) )T ((t)
(t) )
X t - X o E>xto X t - X o E>xto
(2.18)
Qtt22 -
nt
and
Qtt12 --
QT
_
tt21 -
~(X(t)
nt
x(t)e
_ x(t)e(t-1»)T(X(t) _
0
xo
t
0
)
(2.19)
xto·
Theorem 2.4 For the likelihood function (2.14) the maximum likelihood estimators
of E>xto,
E>xt
and ~XtXt.XOXl... Xt-l are given by
8 x t0 = e x t 0 - [e(t-1)
- 8(t-1)]8
xo
xo
xt ,
(2.20)
(2.21)
and
(2.22)
where
-
E>xto
- (t-l)
and E>xo
are given by (2.15) and
Qtt
by (2.16).
Proof Observe that
Xit. -
T
E>
- xtoXio.
E>T ((t-l)
-
- xt Xi..
o.T
= Xit. -
(
=
Xit. -
~ *T
E>
- xto
--
Xtt.
Cl xto -
. _ E>~ *T
xto
_
-
E>(t-1)T
)
- xo
Xio.
",t-1 E>T E>T )
L...u=l - xtu - xuo Xio. -
E>~ T
=
E>T (t-1)
- xtXi••
E>*T)
(E>~ T
E>T) (t-1)
+ (E>~- *T
xto - - xto Xio. + - xt - - xt Xi..
E>~ TxtXi..
(t-1) + [(E>' *T
_ E>*T) (E>~ T _ E>T)] [
xto
xto
xt
xt
x w. • ]
(t-1)
- xtXi..
(t-1)
Xi..
.
* -- E> xto
where E> xto
...
-
E>(t-1)E>
- E> xto
xo
xt -
{t [ 7
t-1
tt:· )]
1
X t••
[xfa.
{[X~t) X(t)r[X~t)
-
",t-1
E> xuo E> xtu· Take
L..u=l
X~;~l)T]
X(t)]} -1
37
}-1
{t [7
t-1
tt:· )]
1
X t••
[X~t) X(t)rx~t)
xft. }
(2.23)
where
8~to
(X~t)TX~t))-lX~t)T (X~t) - X(t)8 xt )
-
- (t
e xto - e xo-
1)
A
ext
and
ext
(X(t)TX(t) - X(t)Tx~t) (X~t)Tx~t))-l x~t)TX(t))-1
(X(t)Tx~t) _ X(t)Tx~t) (X~t)Tx~t))-l x~t)T:x~t))
(
x(t)TX(t) _ e(t-1)Tx(t)Tx(t)e(t-l))-1(x(t)Tx(t) _
xo
0
0
xo
t
[(x(t) _
x~t)e~to-1))T(x(t)
[(x(t) -
_
x~t)e~to-l)r(X~t)
e- (t-l)Tx(t)Tx(t)e)
0
0
xto
XO
x~t)e~to-1))]-1
-
x~t)exto)]
QU;l Qtt12'
So
..
8*xto + 8(t-1)8
xo
xt
e
xto
- (e(t-l)
xo
8(t-1))8
xo
xt·
Then
"
-tr ~-l
""'XtXt .XOXl ... Xt-l
-tr ~-1
""'XtXt.XOXl ... Xt-l
nt
'""[
~ Xite i=l
[(eA *T
xto
e'.. *TxtoXioe - e.. TxtXi..
(t--1)][
Xite A
A
_ e*T) (e T
e T )]
xto" xt - .. xt
38
8*T
.. xtoXioe -
8T
(t-l)]T
.. xtXiee
x(t)T]
o
[ X(t)T
[X(t)
0 X(t)]
[8* -8* ]
xto
xto
8 xt -
8
According to Theorem 2.2 the last term is maximized when
.
8
By Theorem 2.1 the first term is maximized when
xt .
.
xt
8;to = 8;to
and
E>xt =
:EXtXt.XOXI ... Xt_1 =tXtxt-xOX! ... Xt_l
where
Since
X(t)
- X(t)8*
- X(t)8 xt -t
0
xto
_ X(t)(8
_
X (t)
t
0
xto
X(t) _ X(t)8
..
t
0
xto
X(t) - x(t)(e
t
...
0
X(t) _ x(t)e
t
0
X(t) _ x(t)e
t
0
xto
xo
xt
) - X(t)8
xt
- 8(t-l)8 ) - (X(t) _ X(t)8(t-l»)8
xo
xto
xto
8(t-l)E>~
xt
xo
0
- (e(t-l) - 8(t-l»)8 ) _ (X(t) _
xo
xo
xt
_ (X(t) _ X(t)o(t-l»)8
0
0 ' xo
XO
X(t)8(t-l»)E>~
0
xo
xt
xt
_ (X(t) _ x(t)e(t-l»)Q-l Q
0
xt
ttll
tt21'
The maximum value attained by the likelihood of x••• is given by
•
(L!l!
The next algorithm explains how to obtain the maximum likelihood estimators
E>xo
and
:E xx .xo
'
(01
It is a direct application of Theorems 2.3 and 2.4 and the relationship",
given in (2.10) .
.,
Algorithm 2.1 The maximum likelihood estimators of
•
tained using the p steps described below.
Step 1 Obtain
tXIXI.XO
and 8
XIO '
Then
39
E>xo
and
:E xx .xo
can be
(1/1'
and
Step 2 Obtain ~X2X2,XOXI'
~X2X2,XOXI = ~X2X2'XO
-
e
and
X20
e
Then, SInce
X2 '
..
SX2
~X2XI.XO~~~.;;;;~XIX2'XO' we have that
~
~XIX2'XO
_
~
~
~X2X2 .XOXI
(2)
~
~
A
A
+ ~X2XI'XO
~X2X2 .XOXI
A
:-
(1) 1
~
+ ~X2XI'XO ~XX.;;;O ~Xl
~
~XX.XO
(1)
~XX.XO SX2'
-
X2·XO
SX2,
_
-
and
And so on, so that at the last step we have the following.
Step p Obtain
and
~
tXpXp.XOXI ... Xp_l'
LJXpXp.XOXI ..• Xp_1
--
~
LJxpXp.XO
-
expo
expo
~
Since
~
A
-
Xx.XO -
Sxp
~(p-l)-I~
LJXpXI ... Xp-I.XOLJXX.xo
~XpXp'XOXI",Xp-1
t
and
.
= ~~~d-l~XI ... Xp_1
LJXI ... Xp_1 Xp.xo
we have that
Xp.XQ
.
A
+ ~Xp
XI",Xp-I.XOSXP'
t(p)
XX.XO-
and
~
S
XO -
S~ (p) XO
-
[SA
XIO
sA
X20
"
•
40
Now consider the terms of the likelihood (2.9) that correspond to the distribution
of e... , that is,
..
where N e
= q l::f=l nt/2. First take the likelihood corresponding to the distribution
(2.25)
Theorem 2.5 For the likelihood junction (2.25) the maximum likelihood estimators
(2.26)
•
and
Proof Observe that
eile
Yile - BioXiO. - Bil XiI.
Yil. - :8ioXiO. - :8il XiI.
•
+ (:8io - Bio)XiO. + (:8il - Bil )Xil•
Yil. - :8ioXiO. - :8ilXi1.
..
Xw. ]
[ Xu.
41
where
[ ::: ]
{
~ [ :::]
[xi,. x;'.l
r{~ [:: :]
y;'. }
(2.28)
{[X~l) xi1)]T[X~1) xi1)]} -1 [X~l) xil)ryi 1)
Then
nl
-tr ~;l~l
L eiueft.
i=l
nl
-tr~;l~l L[Yih- B~oXio.- B~lXil.][Yi1.-E~~oXiO.- B~lXil.r
i=l
According to Theorem 2.2 the last term is maximized when B lO =
13
10
given by (2.28). By Theorem 2.1 the first term is maximized when
and B ll = Ell
~elel = ~(.:I
.
l
where
Now take the likelihood corresponding to the distribution of e.t. regressed
1111
e. l., ... , e.t-l. for each t = 2, ... ,p, that is,
..
Observe that
T
Yit. - B toXio. -
42
BT
t
(t-1)
Xi.. -
C\T
(t-l)
0etei..
t
t-1
u=l
t
u=l
L B[u Xiu• - L e~tueiu.
Yit. - BfaXiO. -
L
Yit. - BfaXiO. -
..
B[u Xiu•
u=l
t-1
[
-L
e~tu Yiu. -
..
B~oXiO. -
u=l
v=l
t-1
Yit. -
Bfa -
(
L
t-1 (
-
)
e~tvB~o Xio.
v=l
t-1
~
-
]
LU
B~vXiv.
~ e~tvB~u
B[u -
)
Xiu.
BitXit.
t-1
-L
E>~tuYiu.
u=l
t
Yit. - rfaXiO. -
t-1
L r[u Xiu• - L e~tuYiu.
u=l
Yit. -
r toXio.
'T
+ [(I'fa -
r'T
-
t
(t-1)
Xi..
-
u=l
(t-1)
0etYi..
aT
rfa) (I'[ - rn (8~t - E>~t)j
Xio.
(t)
Xi..
(t-1)
Yi..
where
t-1
r to
= B to -
L BvoE>etv,
v=l
rtu =
t-1
Btu -
L
BvuE>etv,
V=U
(2.30)
and
-1
or
(t)
Xi..
i=l
T
(t)T
[ X iO • Xl..
(t-l)Tj
Yi..
T
Yit.
i=l
(t-l)
Yi..
•
{[X~t)
for t
X(t)
y(t)t[X~t)
= 2, ... ,p and u = 1, ... ,t -
(t-1)
Yi..
X(t)
1.
-13
y(t)j} -1 [X~t) X(t)
t
y(t)tyi )
Theorem 2.6 For the likelihood function (2.29) the maximum likelihood estimators
r t,
of rto,
8
et
and ~etet.el ...et-l are given by
1
rto
nt
r t
8
L
i=l
[Xi'O]
X~~~
T
(t)T
(t-l)T
[Xio• Xi.. Y i . . ]
(t-l)
Y1"
et
{[X~t)
X(t)
y(t)r[X~t) X(t)
rl
y(t)J} - l
Xio.
nt
L
yft.
(t)
Xi••
1=1
[X~t)
(t-l)
Yi..
X(t)
...
)
y(t)ry~t)
(2.31)
and
~etet.fI ...et-l
1
= -
nt
nt
L[Yit. i=l
= ~[y~t)
_
nt
rio X iO • -
x~t)rto
rJx~~~
- x(t)r t -
-
8~ty~~~1)][Yit.
y(t)8 et r[Y?) -
-
rJx~~~
rio X iO • -
x~t)rto
x(t)r t
-
8~ty;~~1)r
-
-
y(t)8 et }(2.32)
Proof Observe that
T T t T
tr~-l
etet.el ...et-l [r to _r to rT_r
t 8 et-8 et
T
T
nt
]
T
(t)T (t-l)T]
[X io • Xi.. Yi..
'"
L...J
i=l
(t)
Xi..
(t-l)
Yi..
with rto, r t and
8 et
given by (2.31), and ~etet.el ...et-l by (2.32), Applying Theorem
2.2, the second term of the above expression is minimized when
rt
and 8
et
=
8 et ,
r to
=
r to, r t
=
According to Theorem 2.1, the first term is minimized when
~ etet·el···et-l --~ etet·XOel···et-l'
.
The maximum value attained by the likelihood of e... is
...
Therefore the maximum value attained by L or L c is given by
maxL= (21r) 'texp (- ~) l~elell-¥-fII~XtXt.XOXl...Xt-ll-¥fI
t=l
t=2
44
I~etet.el...et-ll-¥
(2.33)
where N = (q
+ r) Lf=l nt·
•
The next algorithm summarizes how to obtain the maximum likelihood estimators of B to , B t and
~etet
for t = 1, ... ,po It is a direct application of Theorems 2.1,
2.5 and 2.6, and the relationships (2.10) and (2.30).
"
Algorithm 2.2 The maxzmum likelihood estimators of B to , B t and
~etet
for t =
1, ... ,p can be obtained using the p steps described below.
Step 1 Obtain
t
e1e1 ,
13
10
and
13
1,
Then
, (1) _
'
~ee - ~elet'
and
Step t (t = 2, ... ,p) Obtain tetet.et ...et-11 tto, r t and eet.
e et -- ~(t-1)-1
~
d~
~
~
~(t-1)-1 ~
·
Sznce
~ee
~el ...et-l et an ~etet.el ...et-l = ~etet - ~et el .. ·et-l ~ee
~et· ..et-t et
we have that
t
etet·el .. ·et-l
and
t(t) =
ee
+t
et et .. ·et-l
(~""
t
ee
etet
:.
~etet
~etel
Then calculate
t(t-1)-lt
)
t-1
Btu = ttu
+L
Bvue tv ,
u = 1, ... ,t - 1,
v=u
..
el· ..et-l et
and
t-1
B to = tto
+L
v=l
45
Bvoe tv '
2.5
Hypotheses of Interest
We are interested in testing whether the inclusion of the time-varying covariates in
the model provides a reduction in the dispersion matrix of the residual error of the
.
responses when compared to the inclusion of the fixed covariates and the first visit
w
measurements of the time-varying covariates. For that purpose call
~letet = ~y t y t· x 0 x I .. · x t'
and
for t = 1, ... , p. The hypotheses being tested can be expressed as
p
Ho :
n{Hot:
t=2
~YtYt .XOXI = ~YtYt ,XOXI ...Xt}
(2.34)
~YtYt,XOXI > ~YtYt,XOXI ...Xt}·
(2.35 )
versus
P
HI: U{H lt
:
t=2
We can rewrite (2.34) and (2.35) as
n{H;t :
t=2
p
H;:
~YtYt'YI ...Yt-IXOXl = ~YtYt'Yl ...Yt-IXOXI ... Xt given H;s is true for s
= 1, ... , t -
I}
(2.36)
and
p
U {H~t: ~YtYt'YI ...Yt-IXOXI > ~YtYt'YI ...Yt-IXOXI ... Xt}·
t=2
= Hot for t = 1, ... , p if and only if H~s is true for s
H~ :
Observe that
H~t
(2.37)
= 1, ... ,t -
1.
If the covariates were non stochastic the null H o and the alternative hypothesis
HI could be written as
p
Ho : n{Hot :
t=2
Btu
= 0 for u = 2, ... ,t}
46
(2.38)
•
versus
P
HI :
U{H
lt :
Btu
0 for some u
=1=
= 2, ... , t}
(2.39)
t=2
which in turn can be rewritten as
p
H;: n{H;t:
t=2
..
r tu = 0
for u
= 2, ... , t, givenH;sis true for s = 1, ... , t -I} (2.40)
versus
p
H~ : U{H~t:
t=2
because
r tu = Btu -
Observe that
each element
r tu =1= 0 for some u = 2, ... , t},
L~::~ BvuE>tv for u
!:YtYt.YI ...Yt-IXOXI -
aYtjYtj.YI ...Yt-IXoxl
j
=
1, ... , q
and t
= 1, ... , t - 1 and r tt = B tt for t = 1, ... ,p.
in the diagonal of
= 1, ... ,p; and if
aYtjYtj'YI ...Yt-IXoxl
= 1, ... ,q, then aYt)Ytj'.YI ...Yt-IXoxl y. Y
Y
xox
x = 0 for )., = 1, ... ,q.
t)·
I .. · p-I
!:YtYt.YI ...Yt-IXOXI
in
aYtjYt) .YI ...Yp-IXOXI ... Xp
some j
a y t)'
is positive semi-definite, since
!:YtYt.YI ...Yt-IXOXI ... Xt
equal to the corresponding element
(2.41)
=
is greater than or
!:YtYt.YI ...Yt-IXOX1
Xt'
for
aYtjYtj.YI"'YP-IxoxI
xp
for
aYtjYtj"YI"'YP_IxoxI ... xp
= aYtj'Yt).YI ...Yt-IXoxl
-
I· .. P
The problem can be compared to the problem of testing independence between
..
)
two set of variables of the form
(t) _ y(t) y(t)
Z tl
t· I
...
y(t) X(t) X(t)
t-I 0
I
(2.42)
and
(2.43)
satisfying the model
(2.44 )
with
"
!:ZtZt
is (q + r(t -1) x q + r(t - 1)),
!:ZtlZtl
is (q x q) and
!:Zt2 Z t2
is (r(t -1) x r(t - 1)).
that is, to test
Hot:
!:ZtlZt2
= 0 versus H lt
: !:ZtlZt2
47
=1=
0
for t
= 2, ... ,p.
(2.45)
Let
and
where
=
X *(t)
t
[x(t) x(t) x(t)
0
1
2' . .
x(t) y(t)
y(t) ]
t
l' . .
t-l
and
C t = [Or(t-l)X(J+l+r) Ir(t-l) Or(t-l)xq(t-l)]'
The maximal invariant for each step is the set of ordered nonzero characteristic roots
of
vVt
i
= Sil Sm St2~ St2l where Stij = Z~ Ztj
('i, j =
..
1, 2).
Thus the hypotheses are now expressed in terms of canonical correlations
R t = [.6. t 0] : q x r(t - 1)
and .6. t = diag (Ptl, ... , Pts) with s = min[q, r(t - 1)]. In terms of .6. t the hypotheses
in (2.45) are written as
Hot: .6. t = 0
versus
H lt :.6. t
=I=-
0
for t = 2, ... ,p.
(2.46)
At each step t (t = 2, ... ,p) the problem is invariant under the group of transformations
{It
= O(nt)
x GL(q) x GL[r(t - 1)] acting on z~t) by
(2.47)
where
O(nt)
is the set of all
(nt
x
nt)
orthogonal matrices, GL(q) is the set of all
(q x q) nonsingular matrices and GL[r(t - 1)] is the set of all (r(t - 1) x r(t - 1))
nonsingular matrices.
48
...
Observe that the problem at each step remains invariant under the group of
transformations Qt because
,
U*
t
2.6
= U*[g (Z(t))] =
t
t
t
IG;I
IG;+H;I
Step-Down Test Procedure
Let A be the likelihood ratio statistic for testing H o above when the observations
of each individual come from a multivariate normal distribution. Observe that the
likelihood ratio statistic can be written in several different ways. The following form
is more convenient to evaluate the distribution under H o
maxLO(~)
A =
max£l(~)
p
IT At
t=2
ITP [ I:E .el..·et-l I] nt/2
...
1
:E~tet
t=2
I
etet.el ...et-11
IT [1:Ef~t'Yl"'Yt-lXOXl"'Xt I]
I
t=2
IT [
=
t=2
nt/2
YtYt.Yl ...Yt-lXOXll
IGtl
IG t
] nt/2
+ Htl
(2.-1i-l )
'
and the next one for the distribution under Hi
A =
,
,
where ~YtYt'Yl ...Yt-lXOXl ...Xt and ~YtYt'Yl ...Yt-lXOXl denote the maximum likelihood estillldtor of the indicated variance based on the normal distribution given by
~YtYt'Yl ...Yt-lXOXl =
~(y~t) _ x~t)tto
nt
-
X~t)ttl
-
y(t)et)T(y~t)
49
-
x~t)tto
-
X~t)ttl
-
yU)e t )
(2 ..191
and
~
-
YtYt·Yl···Yt-lXOXl .. ·Xt -
~(y~t) _ X~t)f'to - X(t)f't - y(t)8d T (y?) - x~t)f'to - x(t)f't - y(t)8 t ). (2.50)
nt
In the next section we show that under the assumption of multivariate normality
and H a, the statistics -2log At are independently distributed, each statistic converges
to a chi-squared distribution with qr(t - 1) degrees of freedom as nt -+
distribution of - 2 log At does not depend on
00.
The
Y?), x~t), X(t) and y(t) being held fixed
or not, given that Has is true for s = 1, ... , t.
This fact suggests the use of a step-down testing procedure (Roy et al., 1971)
to test H a. The procedure is to compare U2 with the significance point U2 for its
respective degrees of freedom. If the observed value is larger than U2, then reject H a.
If it is accepted, then compare U3 with
In sequence, the components are tested.
U3'
If one is rejected, the sequence is stopped and the hypothesis H a is rejected. If all
component null hypotheses are accepted, the composite hypothesis is accepted.
The proposed test has acceptance region A given by
..
..
p
A:
nrUt ~ Ut],
t=2
where Ut'S are to be chosen such that
P[Ut
~
UtlHas, s = 1, ... ,t] = 1 - at,
and the probability of accepting H o
p
p[n (Ut ~ UtIHat)]
t=2
p
= II (1 - ad = 1 - a,
t=2
for a prespecified a.
We can choose the significance levels based on the investigators' considerations.
In the absence of any other reason, the component significance levels can be taken
equal. If the investigator is more interested in the final measurements (like in the q
responses in the pth visit) and a p is a very small number, then it will take a relatively
large deviation from the pth null hypothesis to lead to rejection.
50
•
We may observe that each Ut region itself (for t = 2, ... ,p) is an intersection of
an infinity of t regions with the same degrees of freedom, (nt -
f - (r + 1)t), but they
are different for each t.
The step-down procedure is appropriate, since there is a meaningful basis for
considering the responses in the specified order. Considering them in different orders
may, in general, lead to different conclusions.
2.6.1
Distribution Under the Null Hypothesis
We want to show that under the null hypothesis the distribution of the criterion (2.48)
is the distribution of a product of (p - 1) independent variables, each of which has
the distribution of a product of q independent Vij variables corresponding to a test
on conditional canonical correlations.
For that purpose let
..
G t -n
t
~ YtYt'Yl ...Yt-lXOXl ... Xt
(2.51)
and
(2.52)
(2.53)
where
v;. _
tJ -
IGtjl /
IG tj
IGtj-II IG tj - 1
ViI
= gtll / (gtll
+ Htjl
+ Htj-Il'
+ h tll ),
•
for j
= 2, ... , q and t = 2, ... ,p, and G tj
and H tj are the submatrices of G t and H t ,
respectively, corresponding to the first j rows and columns.
51
Vtj is the ratio of the square of the length of the vector from Y.tj = (Yltj, ... , Yntlj) T
.
..
tOltsproJectlOnon
X(t)
0'
X(t)
1"'"
X(t) y(t)
t,
1"'"
square of the length of the vector from
y~~l' and Y.tl,···,
Y.tj
y(t)
t_l,an
d
d h
Y.tl,···,Y.tj-l an t e
to its projection on
t
X6 ),
X~t), y~t),
(see Anderson (1984), Chapter 8). The ratio
Y.tj-l
\ltj
... ,
is the
2/n t th
power of the likelihood ratio criterion for testing the hypothesis that the
. 0 f Y.tj on X(t)
X(t) .
.
h
f X(t)
X(t) y(t)
y(t)
regresslOn
2 ' ... ,
t is zero m t e presence 0
0'
l'
1, ... ,
1-1 '
and Y.tl, ... , Y.tj-l· For j = 1, 9tll is the sum of squares of the residuals of Y.tl
· regresslOn
. on X(t)
X(t).
h
f X(t)
X(t) y(t)
d
f rom itS
2,···,
t m t e presence 0
0,
l ' 1 " ' " y(t)
t-l' an
9t1l
+ h til
. t h e sum 0 f squares 0 f t h e resl'd uaIs f rom
is
ratio \It 1 =
9tll / (9tll
. 0 f Y.tl on
regresslOn
+ h tll ),
X(t)
2 ' ... ,
X(t) .
t is
and has the Beta distribution B (v;
~(
v, 2 Vtl
+
1 _ ")
~v ] =
J '2
t2
X(t)
l'
0'
y(t)
l ' ... ,
Th e
y(t)
t-l'
which is appropriate to test the hypothesis that the
.
zero m
t he presence 0 f
is distributed as X~tl / (X~tl +X~t2) where
B [
x(t)
Vtl = nt -
Vtl /2, Vt2/2).
r[4(Vtl
r[4(Vtl
y(t)
y(t)
t-l
f - l - r t - (t-1)q and Vt2
= r(t -1)
Thus
\ltj
X(t)
0'
X(t)
l'
1, ... ,
has the Beta density
+ Vt2 + 1 - j)] (IItl+1-j)/2-1(1
+ 1 _ j)]r[4//t2] v
)lIt2/ 2 -
_
1
v,
for 0 S v S 1 and 0 otherwise. Since this distribution does not depend on y~t) . ..
y~~l' and
Y.tl,·"· ,Y.tj-l
we see that the ratio
\ltj
independent of Vsl for s = 2, ... , t and l = 1, ...
•
""
is independent of them, and hencp
,j -
1. Then \ltj for t = 2, . ". , p and
j = 1, ... ,q are independent under H o.
The cumulative distribution function of A under H o can be found by integratill!..!the joint density of
{\ltj,
for
t
= 2, ... ,p and j = 2, ... , q}
p
over the range
q
II II \ltj < A
t=2
j=l
The above distribution is usually difficult to compute because it is a product of (p- I II/
Beta distributions with first and second parameters depending on t.
Observe that (2.48) is a product of p - 1 terms that can be individually trall~
formed to a function of Ut . Under Ho, the statistics Ut are independently distribut('d
with Ut having the Dirichlet distribution with degrees of freedom
t
Vt2
and
1111
t"()J
= 2, ... , p. The distribution of Ut does not depend on y~t), y(t), x~t) and X"
being held fixed or not, given that Hos is true for s = 1, ... ,t.
52
...
The usual four invariant tests for multivariate hypotheses can be used to test
(2.45). All four are based on functions of the eigenvalues of the following matrices:
(2.54)
"
..
and
where
X t*(t) =
[x(t) x(t) x(t)
0
1
2' . .
x(t) y(t)
y(t) ]
t
l' . .
t-l
and
C t = [Or(t-l)X(J+1+r) Ir(t-l)
Under H o, the matrices
t
G t = yit)T QteYi )
H t =
Or(t-l)Xq(t-l)]'
y~t)TQth y~t) (the matrix due to the hypothesis) and
(the matrix due to error) are independent central Wishart matrices
with degrees of freedom min(r(t - l),q) and nt -
..
f - 1 - rt - q(t - 1) respectively.
Standard multivariate criteria can be used as test statistics for HOt, for example:
1. Wilk's generalized likelihood ratio statistic: det (G t )/ det (H t
+ G t ) > Cit;
2. Lawley-Hotelling's trace with critical region:
trace
(HtG
t l ) > C2t;
3. Pillai's trace with critical region: trace
+ Ht)-l] > C3t;
[Ht(G t
4. Roy's largest root with critical region: chl
(HtG
t l ) > C4t.
In certain situations Ut may be transformed to an F distribution and the standard tables employed for determining the t-th critical value:
1. If q
= 1 then
(2.56)
2. If Vt2
= 1 then
1 - Ut Vtl
Ut
+1q
53
F
q
rv
q,Vtl
+l-q
(2.57)
3. If p = 2 then
(2.58)
4. If Vt2 = 2 then
va; Vtl + 1 va;
q
1-
q
'" F 2q ,2(vtl +l-q)
(2.59)
•
where Vtl
= nt -
j - 1 - rt - (t - l)q and Vt2
= r(t -
1).
In other situations an alternative procedure is to observe that (2.48) is a product
of p - 1 terms that can be individually transformed to -2log At = -nt log Ut . The
distribution of -2logA t = -ntlogUt approaches a X2 distribution with qr(t -1)
degrees of freedom as nt -+
00.
This asymptotic distribution can be improved by
means of an asymptotic expansion for the distribution of the likelihood ratio criterion
as explained in Anderson (1984).
Theorem 2.7 (Anderson, 1984) Let Up',qj,n' be a product oj p* independent distributed variables V1 , .•• ,Vp' with n*
=N
- q*, q*
= qi + q2
and density junction of
Yj, j = 1, ... ,p* being
B [ . ~( • + 1 _ .) ~ q1*J = r[Hn* + qi + 1 - j)] (n'+1- j )/2-1(1 _ .)qj/2-1
v), 2 n
J '2
r[~(n* + 1- j)]r[~qi]v)
v),
for 0 ::;
Vj ::;
1 and 0 otherwise.
The cumulative distribution function oj - 2 k log Up' ,qj ,n' is given by
',qj,n'::; w )
P ( -2plogUpN/2
P (-k log Up',qj,n' ::; w)
P (X~.qj ::; w)
+ ~~
[p (X~.qj+4 ::; w) - P (X~.qj ::; w)]
+ :4 {'Y4 [P(X~.q~+8::; W) - P(X~'q~ ::; w)]
-'Yi [P(X~.q~+4 ::; w) - P(X~'qj ::; w)]}
+
where
p=
R~,
•
(2.60)
n* - (p* - qi
N
54
+ 1)/2
'
(2.61)
..
k
= p N = n* - ~(p* - q~ + 1),
p*qi(p*2 + qi 2 - 5)
"'12
=
(2.62)
(2.63)
48
and
* *
2
•
"'14
=
~2 + ~9~~ [3p*4 + 3q~4 + 10p*23q~2 - 50(p*2 + q~2) + 159] .
(2.64)
The remainder term of (2.60) is of the order O(N-6).
The next Corollary gives a specific approximation for the distribution of the test
statistic at each step under H o.
Corollary 2.1 The cumulative distribution function of -2pt log At
= -kt log Ut
un-
der H o is given by
P (-2pt log At ::; Wt)
P (-k t log Ut ::; wd
P (X~t ::; Wt)
..
+ It;
+ ~
[p (X~t +4 ::; Wt)
[P(X~t+8
{"'It4
- P
::; Wt) -
(X~t ::; Wt)]
P(X~t
::; Wt)]
-"'1;2 [P(X~t+4 ::; Wt) - P(X~t ::; Wt)] }
+
where
Pt
=
nt -
R~,
(2.65)
f - 1 - [(r + 2q)t - r - q + 1]/2
nt
k t = Ptnt = nt - f _ 1 _ [( r
qr(t - 1)[q2
+ 2q) t ~ r -
+ r 2(t -
"'It2 =
q + 1] ,
1)2 - 5]
48
(2.66)
(2.67)
(2.68)
and
dt = qr(t - 1)
for t = 2, ... ,p. The remainder term of (2.65) is of the order O(nt 6).
55
(2.70)
Proof Observe that
where
Ut
is distributed as
Uq,r(t-l),nt-f-l-rt-q(t-l)
for t = 2, ... ,po Each
Ut
is a
product of q independent Beta distributions under H o. Note also that in our case
N
nt
p*
q,
q~
r(t - 1),
q;
f + 1 + r + q(t -
q*
f + 1 + rt + q(t -
n*
nt -
f -
1),
1),
1 - rt - q(t - 1).
..
Substituting into (2.61), (2.62), (2.63) and (2.64) we obtain
nt -
f -
1 - rt - q(t - 1) - (q - r(t - 1)
Pt
+ 1)/2
.
nt
nt -
k
t
f -
1 - [(r
+ 2q)t - r - q + 1]/2
= Pt nt = nt _ f _ 1 _
qr(t - l)[q2
"(t2
[( r
+ 2q) t ; r - q + 1] ,
+ r 2(t -
1)2 - 5]
48
=
and
'V
It4
= "(;2
2
+ qr(t -
1920
1) [3 q4
+ 3r 4 (t _
+
1)4
10 q2 r2(t - 1)2 - 50 (q2
+ r 2(t -
1)2)
+ 159].
Therefore under H o we have
1)
- 2 Pt log At -+
2
Xdt
(2.71)
where dt = qr(t - 1) for t = 2, ... ,p. Moreover,
(2.72)
56
where
2:f=2 dt
= qrp(p - 1)/2.
•
Anderson (1984) discusses the use of such approximations as well as others based
on different applications of (2.60). Another approximation for the distribution of a
function of Ut proposed by Rao (1951) that could be used is given by
Fp' Ql' ,k' S -T'
1 - Utl/s k* S
-
r*
= ----:-;-'---p*qi
-UtI Is
(2.73)
that is distributed as an F distribution with p*qi and k* s - r* degrees of freedom,
where
s=
p*2 qi 2 - 4
p*2 + qi 2 - 5'
r
p*qi
= -2- - 1 and
k*
=N
1
- qi - 2(P* - qi
+ 1).
The cases where this approximation is exact are given in (2.56), (2.57), (2.58), and
(2.59). Otherwise for small values of qi this approximation is more accurate than the
previous X2 approximation. It is worth mentioning that in the proposed step-down
procedure for the particular test of interest qi = qr(t -1) gets larger at each step due
•
to the fact that we are testing the gain in information provided by an accumulated
number of time-varying covariates at each step. Therefore when p is large the use
of approximation (2.73) for the distribution of a function of Ut under Ho is not
recommended.
2.6.2
Power Considerations
In order to understand the complexity of the power funtion, consider the simplest
situation when q
= 1 and f = o.
Recall that in that case
p
A=
II (1 -
r;)nt/2
t=2
2
where r t2 = r Ztl
is the square of the multiple correlation coefficient between
Zt2
Ztl = Yt·YI,···, Yt-l, Xl
and
57
for t = 2, ... ,p. Let
T
When q
= 1, it is well known
(T
Zt2 Ztl
~-l
Zt2 Zt2 (T Zt2Ztl
aZtlZtl
be the square of the population multiple correlation coefficient between Ztl and Zt2.
2
Pt =
(see Anderson (1984)) that each Ft
= dt2 rl![dtl (1- rn]
is distributed according to a non-central F-distribution with dtl = r(t - 1) and
d t2 = nt - tr - 1 degrees of freedom and non-centrality parameter given by
T
~-l
S ~-l
~ _ (T Zt2 Ztl Zt2 Zt2 t22 Zt2 Zt2 (T Zt2 Ztl
Ut -
a Ztl Ztl.Zt2
for t = 2, ... ,p conditioning on the observed values of Yl, Y2,"" Yt-l, Xl, ... , X t at
each step t. The unconditional density function of
r; is given by
f((d tl + d t2 )/2) (r 2)(d tl -2)/2 (1 _ r2)(dt2-2)/2
r( dtl /2) f( dt2 /2) t
t
where
00
)
1
is the generalized hypergeometric function with (a) k = a( a + 1) ... (a
The distribution function of
P(r t2 :S Vt)
.
(adk'" (a P )k Zk
L (b) k ... (b q k k'.
k=O
pFq(al' ... ,ap; bl , ... ,bq; z) =
+k-
1).
r; can be expressed in the form
~
= k=O
~CkP ( Fdt1+2k,dt2:S
t2
Vt) '
d d
tl + 2k 1 - Vt
where Ck is the negative binomial probability
c,
= (-1)' (
-(d,,: d,,)/2 )
(1 _
p1J IdH+d"IJ2 (piJ'.
The calculation of the distribution function is more complicated when q 2:: 2.
The power of the proposed step-down procedure depends on the direction of the
alternative hypothesis H l . Let
7r
7r
denote the power function, that is,
•
P(Reject H o I H l )
1 - P(Accept H o I Hd
1 - P(Ut :S
Ut,
1 - P(U2 :S U2
for t
= 2, ... ,p I Hd
I Hd··· P(Up :S up I Hl,fJs :S Us for
58
s = 2, ... ,p - 1).
Let f3t(6 t ) denote the probability of accepting Hot given that Hot is false conditioned
on the observed values of YI, Y2,"" Yt-I, Xl,"" Xt. In that case, there are 2P - 1 -1
situations for the alternative hypotheses as described below.
Case 1: If HI = {Ho2 true, ... ,HOp - I true, Hop false} then the power given that
..
Y I , Y 2 ,.··, Y p -
I,
Xl," ., Xt· are being held fixed is given by
Case 2: If HI = {H02 true, ... , H Op - 2 true, HOp - 1 false, Hop false} then
7f(6p -
l ,
6p ) =
1 - [flf::i(1 - ad] P(Up- 1 ::; Up-I, Up ::; Up I HI, Ut ::; Ut, for
t = 2, ... ,p - 2).
Case 3: If HI = {Ho2 true, ... , HOp - 2 true, HOp - 1 false, Hop true} then
•
7f(6p _ I ,6p ) =
1 - [flf::i(l - at)] P(Up- I ::; Up-I, Up ::; Up I HI, Ut ::; Ut, for
t = 2, ... ,p - 2).
Case 2 P - 1 -1: If HI = {Ho2 false, ... ,Hop false} then
The power function is given by 7f = E[7f(15 c )] for each case c. Note that the calculati(JIJ
of the power function involves the knowledge of the joint cumulative distributllJll
function of (U2 , ..• ,Up) because the test statistics involved are no longer independ('111
under HI'
Bootstrap Method for Power Calculation
The empirical power can be calculated by means of bootstrap methods. Let
~"
diag {t xxn , teen} be the maximum likelihood estimate of the covariance matrix baspd
on the original sample, B ta the maximum likelihood estimate of B ta and B t t h('
maximum likelihood estimate of B t .
59
Under the correlation model with independent errors (2.3) the covariates Xit.
= 1, ... , nt, t = 1, ... , p}
and the residuals eit. for i
are resampled independently.
Draw IvI samples from the original n = n1 covariate measurements with replacement. For each sample draw n p covariate observations with replacement from
,np ; n p -
for i = 1,
XiI.,
,Xip.
XiI.'
,Xip-l.
for i = 1,
with replacement from
.
ate 0 b servatlOns
i
= 1, ... ,np -
l -
*(p) -
Xi..
-
XiI.
[
n p covariate observations with replacement from
1-
n p ; and so on, until n l
, np -
l -
for i
= 1, ... , n l
*
XiI."'"
*
X ip •
n p ; and so on x~~~
Draw !vI samples from the n
]T J:lor
n 2 covariate observations
-
n 2 are obtained. Call these new covari-
. -
Z-
1, ... , n p ., Xi..
*(p-1)
= XiI. for i = 1, ... ,n l
=
•
-
-
*
XiI.' ... ,Xip-l.
for
n2•
-
n1 residuals with replacement independently
of the above sampling scheme for the covariate observations such that you draw n p
from
eil., ... ,eip.
with replacement for i
= 1, ... ,np ; n p -
= 1, ... ,np - l
-
with replacement for i
i
= 1, ... ,n
l
-
-
n p ; ... , n l
l -
n 2 from
n p from
eil.
eil., ... ,eip-I.
with replacement for
n2•
•
For each sample put
The joint distribution of {( x:~~) , Y:~~)) i
=
1, ... , nt, t
=
1, ... , p}, conditional on
{(x~~~, Y~~~) i = 1, ... ,nt, t = 1, ... ,p}, is the bootstrap estimate of the unconditional
· .b
'
(t) ,Yi••
(t)).Z
d 1stn
utlOn
0 f {( Xi..
--
i'
= I
, ...}
,p un d er t h
e correatlOn
mo d eI
1, ... ,nt, t
(2.3) with independent errors.
For each sample calculate :t~, the maximum likelihood estimate of the covariance matrix for each bootstrap sample.
For each §~ calculate the corresponding
(T;, ... ,T;) and count the number of times that the hypothesis is rejected, let us call
this number m. So the empirical power of the procedure is given by
m
7r e
= !vI'
Beran and Srivastava (1985) showed that the empirical distribution of the bootstrap
test statistic converges to the distribution of the true test statistic.
60
Chapter 3
Robustness
3.1
Introduction
In this chapter we investigate the robustness of each step of the step-down procedure
.
described in Chapter 2 that was derived under the assumption of normality against
deviations from this assumption. We try to understand what circumstances cause the
procedure to remain equally valid when we change the class of distributions to a larger
one that includes the normal distribution as a special case. The larger classes of distributions considered are the left-orthogonally invariant and the elliptically symmetric
distributions defined in Chapter 1.
The strategy used to study the robustness of the proposed step-down procedure
follows the lines defined by Kariya and Sinha (1989). The robustness of a test statistic
is studied from two aspects. The first one is the robustness under the null hypothesis,
that means that the critical value of a size a test computed under the normal model is
the same as the one obtained when the data comes from a certain defined broader class
of distributions. The second one is the robustness under the alternative hypothesis
that can mean that the power of the test procedure under the alternative hypothesis
remains the same under a larger class of distributions. Moreover, any optimality
property enjoyed by the test under the normal distribution is preserved when we
have a departure from normality towards a larger class of distributions.
After verifying to what extent the proposed step-down procedure is robust under
the null and the alternative hypotheses we illustrate those results from an estimation
perspective. This approach was taken by Anderson et al. (1990).
3.2
Tests
Recall that in Chapter 2 the problem can be transformed to the problem of testing
independence between two sets of variables of the form
Z (t)
tl
= y(t)
y(t)
y(t) X(t) X(t) = y(t) _ E[y(t) I y(t)
y(t) X(t) X(t)]
t . I ...
t-I
0
I t t
I · ..
t-I
0
1
(3.1 )
and
(t)
Z t2
[x~t)
X~t)] .y~t)
[X~t)
X~t)]
...
y~~l X6t ) x~t)
_ E {[X~t) ... X~t)]
I y~t) ... y~~l X6t ) x~t)}
(3.2)
satisfying the model
(3 ..3)
..
"
where
W ZtZt
is a positive definite (q
matrix and
WZt2Zt2
+ r(t -
1) x q + r(t - 1)) matrix,
W ZtlZtl
is a (q
XI/I
is a (r(t - 1) x r(t - 1)) matrix.
The null hypothesis of interest is the intersection of hypotheses of the form
Hot:
WZtlZt2
= 0
for t = 2, ... ,p given Hos is true for s = 1, ... , t,
(T I,
versus
HIt :
WZtlZt2
:f:. 0
for some t = 2, ... ,p given Hos is true for s = 2, ... , t-I. (:3,-) i
For each t (t = 2, ... ,p) the problem is invariant under the group of
9t
transformati(Jll~
= O(nt) x GL(q) x GL[r(t - 1)] acting on z~t) by
T Q Z(t)C T )
gt (Z (t))
t
= (Q t Z(t)C
tl
tl'
t t2
t2
62
lor gt = (Q t. C Itl C)
2t
r
E
9t
(3.6 )
where O(nt) is the set of all (nt x nt) orthogonal matrices, GL(q) is the set of all
(q x q) nonsigular matrices and GL[r(t - 1)] is the set of all (r(t - 1) x r(t - 1))
nonsigular matrices.
Our statistic of interest is t(Z~t)) = IGtl/lG t + Htl where G t and H t are given
..
by (2.51) and (2.52) respectively. The statistic t(Z?)) is a function of the maximal
invariant for each step defined as follows. The maximal invariant for each step of the
problem is the set of ordered nonzero characteristic roots of
Stij = Z~:)T
sili Stl2St2~St21
where
zg) (i, j = 1,2). The squared canonical correlations are defined to be the
u = min[q, r(t - 1)] largest characteristic roots of Qtl Qt2 where
Qtl. = Z(t)(Z(t)TZ(t))-lZ(t)T
tz
tl
tl
tl'
i = 1,2.
are orthogonal projections.
Using the above transformations, tests can be reduced to tests on
R t = [at 0] : q x r(t - 1)
...
and at = diag (Ptl, . .. , Ptu) with u = min[q, r(t - 1)]. In terms of at the hypotheses
in (3.4) and (3.5) are written as
HOt: at
=0
for t
= 2, ... ,p
given Hos is true for s
= 2, ... ,t -
1,
versus
H lt
:
at # 0 for some t
= 2, ... ,p
given Hos is true for s
= 2, ... , t
- 1.
When the distribution of the variables involved is not normal the test is not of independence but of being correlated or not.
In the next sections we consider the robustness of the proposed statistics for
testing (3.4) in the exact sense defined below.
Definition. A test statistic t = t(z) for a certain test problem on a parameter ()
considered under a specified model and z having distribution F is said to be null
63
robust against the departure from F towards another distribution H if for each null
point 0 0 the distribution of t(z) remains the same.
Definition. A test statistic t = t(z) for a certain test problem on a parameter 0
considered under a specified model and z having distribution F is said to be nonnull robust against the departure from F towards another distribution H if for each
point 0 1 in the parameter space of the alternative hypothesis the distribution of t(z)
remains the same.
3.3
Null robustness
Kariya and Sinha (1989) described two approaches to evaluate the null robustness
of test statistics with respect to departures from the normal distribution towards a
larger class of distributions that includes the normal as a special case. They are the
following:
•
1. Choose a class of distributions containing N mk (J-t, n) under H o, and derive a
condition for which the distribution L:[t(Z)] of t(Z) under H o remains the same
in the class, where L: denotes the distribution of a random variable or matrix.
2. Fix Po = N(IJo, no), where (IJo, no) belongs to the null hypothesis parameter
space, and obtain a condition on P under which P belongs to the class of
distributions where t(Z) has the same distribution as t(Z) if Z were normally
distributed, that is,
Po
= {P
EP
I L:[t(Z) I P] = L:[t(Z) I Po]},
where P is the class of probability measures on
the distribution of t(Z) under L:(Z) = PEP.
64
Rmxk
and L:(t(Z)
I P)
denotes
...
3.3.1
Approach 1
We need the following results to verify if the null robustness is valid in the classes of
left orthogonally invariant distributions (Pd and elliptically symmetric distributions
(PE )·
"
Theorem 3.1 (Kariya and Sinha, 1989) Let t(Z) be a test statistic. The distribution
of t(Z), .c[t(Z)], remains the same for all distributions .c(Z) E P L as that of .c(Z) =
N (JL, 1m Q9 '11) if and only if .c[t(Z)] when .c(Z) = N (JL, 1m Q9 '11) satisfies
1. .c[t(Z-JL)] = .c[t(Z)] for all JL and positive definite (kxk) matrices'll belonging
to the null hypothesis space; and
2. .cv[t(VA)] = .c[t(V)] for JL = 0, all A and'll positive definite (k x k) matrices
such that Z
= VA,
with V E U
= {V : m x
k
I VTV = Id,
and .c v denotes
the distribution of t(VA) with respect to V .
.
The next corollary is a derived from a more general theorem regarding necessary
and sufficient conditions for the distribtuion of a test statistic to remain the same in
the class of elliptically symmetric distributions under the null hypothesis. That more
general theorem can be found in Kariya and Sinha (1989).
Corollary 3.1 (Kariya and Sinha, 1989) The null distribution of t(Z) is unique in
P E if the following conditions hold
1. t[(Z - JL)W- 1/ 2 ] = t(Z), for any JL and'll in the parameter space under the null
hypothesis, and
2. t( aZ) = t(Z), for any a > O.
Application: Recall the testing problem defined in Section 3.1. Observe that at
each step we have JLt = 0, then the distribution of St does not depend on JLt implying
that condition 1 in Theorem (3.1) is satisfied. On the other hand, condition 2 is not
65
satisfied unless the test is trivial, meaning that the power at step t is equal to the
significance level at of that step. This is verified by supposing that a test statistic
t(Zi t )) based on St satisfies condition 2 in Theorem (3.1). Then by Theorem (3.1),
the distribution of t(Zi
t
))
remains the same for all positive definite \l1 ZtW Since at
step t \l1 ZtlZt2 = 0 versus \l1 ZtlZt2 1=- 0 is tested, this implies that the power function
is equal to the significance level. Therefore any test based on St, including the loglikelihood ratio test, Roy's test, Lawley-Rotelling's test, and Pillai's test, does not
have an unique distribution in P L , unless the test is trivial. On the other hand, the
conditions in Corollary (3.1) are satisfied, implying that invariant tests that are a
function of St have the same distribution as their counterpart tests in the normal
distribution case in the class of elliptically symmetric distributions P E . The fact that
conditions 1 and 2 in Corollary (3.1) are satisfied can be verified by observing that
I-t
= 0 and taking C n =
3.3.2
a\l1~~~~l and C t2
=
a\l1~~~~2 for all a > 0 in (3.6).
Approach 2
This approach enables us to obtain a larger class of distributions than P E for which
the null robustness is valid.
Theorem 3.2 (Kariya and Sinha, 1989) Let X be the set of all (m x k) matrices of
rank k and Z a random matrix such that Z EX. Assume that
X and that
9 acts transitively on
9 = K·1l (each g in 9 can be written g = kh for k
E
K and h E 1l).
where
(i) K is a normal subgroup of g,
(ii) K is a compact subgroup of g.
Suppose that the measurable map t from (X, B) to (Y, C) is invariant under 1l. Then
£[t(Z)
I P] = £[t(Z) I PI] for P, P'
66
E PK,
..
Application: Let t(Z~t») be the proposed test statistic for each step that is based
on the vector of min(q, r(t - 1)) non-zero characteristic roots of StliStl2St2~St2l. We
want to show that the distribution of t(Z~t») remains the same for all distributions
of t(Z~t») in the class J(t as that when z~t) is distributed as N (0, I nt 0
..
W ZtZt )
= Po.
For that purpose consider the group Qt = O(nd x O(nt) x GL(q) x GL(r(t - 1))
whose elements are (cI-t, Qt, At, B t) with cI- t E O(nd, fl t E O(nd, At E GL(q), and
B t E GL(r(t - 1)). The action of Qt on X t is defined by
and Qt acts transitively on X t . The group operation is
Let
and
.
Then Qt = J(t . tit, J(t is compact and J(t is normal in Qt. By Theorem 3.2 we obtain
for all P E PK/l since Po E P Kt . To describe P Kt , observe that the elements of J(t act
on X t by
Thus .c(Z~t») E Pit if and only if Z~t) = (ZW, ZW) has the same distribution as
(cI-tZW, Z~~»), cI- t E O(nd. This occurs, for example, when .c(Z~t») = N(O, I nt 0 WZtZt)
with
w"" (w,~", w,:,,,)'
=
WZtlZtl is q x q, and
WZt2Zt2
is r(t - 1) x r(t -1). Generally if z~t) has a density on X
it which can be written as
Z(t») - ! (Z(t)TZ(t) Z(t»)
!(Z (t)
tl'
t2 - 1 tl tl' t2 ,
67
(3.7)
for some fl' Then
(t) Z(t)) - £(.:F. Z(t) Z(t))
£ (Z tl'
t2 ":I.' ttl,
t2 ,
for CPt E O(nd· That means that the distribution of
zW is elliptically symmetric.
An
important special case where (3.7) occurs is when we take the distribution of e... to
be elliptically symmetric and independent of x... with finite second moments. Then
the joint distribution of
eit•. eil •... eit-l.
for i
= 1, ... , nt and
t
= 1, ... ,p is also
elliptically symmetric. So the proposed step-down procedure is also null robust in
that case.
3.4
Non-null robustness
Theorem 3.3 (Kariya and Sinha, 1989) The distribution of a statistic t(Z) is unique
in P E (J.1, I
Q9
'11) for each (J.1, '11) in the parameter space of the alternative hypothesis
if the following conditions hold:
1. t(Z - J.1) = t(Z), for all J.1 in the parameter space under the alternative hypotheszs,
2. t(aZ) = t(Z), for all a> O.
Application: The fact that conditions 1 and 2 in Corollary (3.1) are satisfied can
be verified by observing that J.1
=0
and taking C tl
a > 0 in (3.6). In the nonnull case, where
WZtlZt2
=1=
= al q
and C t2
= alr(t-l)
for all
0, the distribution of t(Z~t)) at
step t depends on '11 ZtZt' and so the nonnull robustness holds for each fixed'll ZtZt' This
implies the nonnull robustness of the canonical correlations and all invariant tests for
independence, because the set of the canonical correlations is a maximum invariant
at that step. This implies the optimality robustness of invariant tests, because the
power function of each invariant test remains the same in P E (J.1t,l nt
example, the LBI test with critical region
68
Q9
'11 ZtZt)' For
under .c(Z~t)) = N(J1-t,I nt
PE (J1-t , I
nt
@
@
WZtZt ), which is known as Pillai's test, remains LEI in
W ZtZt ) ' Similarly, the UMPI R 2 -test in the case min(q, r(t - 1)) = 1,
which is equivalent to Pillai's test, also remains UMPI in the above class. On the
other hand the null distribution of the sample correlation has been shown to remain
the same in a broader class of distributions.
3.5
Correlation Model
In this section we illustrate the fact that the distributions of the test statistics involved in the step-down procedure proposed in Chapter 2 remain the same under
the null and the alternative hypothesis as the ones obtained under the assumption
of normality when the data come from a univariate elliptical distribution with the
additional assumption that the density exists.
Consider the same data structure as in Section 2.2 and modify the correlation
model described in Section 2.3 as follows. Suppose now that x~:~ = (x~. x~ •... XTt.)T
..
and e~:~
= (e~. e~ •... eft.V
for i
=
nt+1
+ 1, ... ,nt
and t
=
1, ... , pare uncorrp-
lated random vectors distributed according to a univariate elliptical distribution wit h
density function
The mean is given by
.0.
O(p)TX(p)
0xo
o
o(p-1)T
Oxo
J1-u
=
(p-1)
x.o.
0
o(1)T
oxo
(1)
X.o.
o
and variance matrix 'l/Jg W where 'l/Jg depends on 9 and
69
o
W(t)
ee
e xo(t)
W(t)
XX.XO -
[e
--
XIO
e X20' . . e]
xto,
WXIXl.XO
WXIX2.XO
WX2Xl'XO
WX2X2.XO
WXtXl.XO
WXtX2.XO
and
w(t)
ee
=
w
w
e1e1
w
e2e1
w e2e2
w
w ete2
e1e2
ete1
and N = (q
of
e;ra)
w xx .xo '
=
+ r) Lf=l nt. Note that e~td corresponds to the first t(J + 1) columns
e xo ,
w~tl.xo corresponds to the first tr columns and rows of w~J.xo =
and w~~ corresponds to the first tq columns and rows of w~~) =
Wee'
The
sets of covariate observations and random errors associated with different individuals
are uncorrelated but not independent under the assumption of univariate elliptically
distributed variables unless 9 is normal.
In this context the appropriate correlation model that incorporates a monotone
multiple design multivariate (MMDM) linear model for the vector of responses at the
tth visit of the ith individual is given by
_ BTto Xioe + B T
(t)
t Xi.. +
Yite -
where B to is the ((J
+ 1)
(3.9)
eite,
x q) matrix of unknown regression coefficients associated
to the design variables, B t =
(B~ B~
... BitV is the (tr x q) matrix with Btu being
the (r x q) matrix of unknown regression coefficients associated with the random
covariates observed at visit u for u
= 1, ... ,t, i = 1, ... ,nt,
70
and t
= 1, ... ,p.
Then
z~~~ = (X~~~T y~~~TV i = 1, ... , nt are elliptically distributed with mean
e- xo(p)T X eoe
B o(p)T X eoe
J-tz
+ B(p)Te(p)T
xo x eoe
(3.10)
=
e (I)T X eoe
- XO
B o(I)T X eoe
+ B(I)Te(I)T
xo X eoe
and variance matrix 'l/Jg 'II zz where
(3.11)
W(t)
w(t)
ee
B(t)
+ ~~')~W(t)
B(t)
xx.xo
)
,
(3.12)
(3.13)
and
(3.14)
o
o
Note again that B~t) corresponds to the matrix formed by the first t(j
+ 1)
columns
of B~P) = B o,
B(t)
rows of
B(p) =
B, and that w~~,xox corresponds to the first tq columns and rows of
w~J,xox
= Wyy,xox
3.6
corresponds to the matrix formed by first tq columns and tr
=-w ee
+ BTwxx.xoB.
Estimation
The usual method of maximum likelihood involves the maximization of the joint
· 'b'
(t)T ei..
(t)T)T lor
~ z. -dIstn
utlOn 0 f (xi..
and'll. Since x~:~ and e~:~ for i
- ,
1
+ 1, , nt an d t = nt+ I + 1, ,nt and t = 1,
nt+1
71
. h respect to B
,p WIt
,p are uncorrelated
random vectors the likelihood under the model described in Section 3.5 is given by
_
L(x••• ,e••• ,B, W) -
P
II IW
(t) _ (nt-~t±il
ee
t=l
nt
'"
p
tr W(t)-l
9 '"
~
xx.xo
[
t=l
~
(t)
p
II jWxx.xol
I
_ (nt-~t±l)
t=l
t..
(x(t) _ e(t)T x·
xo
W.
t..
) (x(t) _ e(t)T x.
xo
W.
)T
i=nt±l+l
t .T.(t)-l
~ r
ee
+ '"
nt
'"
p
~
'K
t=l
(t)
(t)T
]
et..e t..
(3.15)
.
i=nt±l +1
Direct calculation of the likelihood equations under the correlation model often
requires iterative procedures. We use an alternative approach based on a transformation of the original variables whose parameters have a one to one relationship with
the parameters of (3.15). This method yields explicit solutions of the maximum likelihood estimators of Band W for this model. Observe that if the set of observations
on all each individuals XiI., Yih, Xi2., Yi2., ... ,
t
Xip., Yip.
for i = nt+l + 1, ... ,nt and
= 1, ... ,p are distributed according to a univariate elliptical distribution, then the
set of errors
[Xih - E (Xih)]
..
[Yih - E (Yih
I Xih)]
[Xi2. - E (Xi2.
I Yih, XiI.) ]
[Yi2. - E (Yi2.
I Yih, Xih, Xi2.)]
(3.1 G)
for i = nt+l + 1, ... ,nt and t = 1, ... , p follow a univariate elliptically symrrH'111 t
distribution with mean 0 and variance-covariance matrix y
c
that is block diag()llill
with the matrices in the main diagonal being
Yf
~g
(
WXtXt.el ...et-1Xoxl ... Xt-ol
0
Wetet .el ...et-l XOX1 ... Xt
72
)
Then the likelihood can be expressed as
P
L c (x ••• , e••• ) =
II 1'1'
XtXt.XOX1 ... Xt_l
I-~2
I'1' e1et I-~
2
t=l
•
g [tr
P
II 1'1'
etet.el ...et-l
I-~2
t=2
'1'~}Xl'XO~(Xil. -
8;lQ X io.)T
8;lQXio.)(Xil. -
P
+L
t=2
tr '1';t~t,XOX1... Xt_l
nt
"'[
. _8 T
L..- Xtt.
. _8 T (
xtoXw.
(t-1)_8(t-l)T
xt X t . .
xo
.
X w•
)][
. _8 T
Xtt.
. _8T (
xtoXw.
(t-l)_8(t-l)T
xt Xi..
xo
.
X w•
)]T
i=l
(3.17)
1, ... ,p, 8
and 8
et
for t = 2, ... ,p, where
'1'
,
xt
XtXt.XOX1 ... Xt-l
and
for t = 2, ... , p. The maximum likelihood estimators of Band
model can be obtained by transformation.
73
'1'
of the original
In order to find the maximum likelihood estimators of the parameters we are
going to use the next theorems.
Theorem 3.4 (Anderson et al., 1990) Let '11 be a symmetric positive definite matrix.
Suppose that 9 is such that g(uTu) is a density in R N and r NI2 g(r) has a positive
finite maximum r g . Suppose that on the basis of one observation u from
(3.18)
the maximum likelihood estimators of 1-£ and '11 under normality exist, are unique and
are given by
it
and ~. Suppose also that ~ is positive definite with probability one.
Then the maximum likelihood estimators under (3.18) are
it,
jJ, =
,
(3.19)
N -
'11=-'11
(3.20)
rg
and the maximum value attained by the likelihood is
(3.21)
Proof Let D =
Note that
IDI
Iwl- 1lN w and
= 1 and that the likelihood in terms of
D,
rand 1-£ is given by
(3.23)
Observe that the likelihood (3.23) attains its maximum when r NI2 g(r) is maximum
and (u - I-£)TD- 1 (u - 1-£) is minimum. Thus we have
tr D -1 (u - 1-£)( u - 1-£) T
trD- 1 (u - jJ,)(u - jJ,r
where jJ, =
it
+ (jJ, _1-£)Tjj-1(jJ, -1-£)
is equal to maximum likelihood estimator of 1-£. According to Theorem
(2.2) the second term is maximized when 1-£ = jJ,. The first term is maximized when
74
D =
D=
(u- jJ,)(u- jJ,)T by Theorem (2.1). Therefore the maximum likelihood esti-
mators of D and J1.,
.
case D and
D and jJ, respectively,
are the same as in the normally distributed
fL. Now consider the function
log(r N / 2 g(r)) = (N/2) logr
Taking the derivative of (3.24) with respect to
l'
+ logg(r).
(3.24)
and equating it to zero we obtain
(3.25 )
When 9 corresponds to the density function of the normal distribution then
r=
rg =
N. Observe that
where
and
Therefore we can write
N '1'=-'1'. •
A
rg
A sufficient condition for the function r N / 2 g(r) to have a positive finite maximum
l' 9
is given by the following lemma.
Lemma 3.1 (Anderson et al., 1990) Suppose that g(uTu) is a density for u E Ri'·;
such that g(r) is continuous (1'
~
0) and decreasing for
l'
sufficiently large. Then the
function
h(r) = r N / 2 g (1'),
has a maximum at some finite
l' 9
~
l'
~ 0,
(3.26)
O.
Corollary 3.2 The maximum likelihood estimators of the parameters of the correlation model (3.9) under the conditions conditions of Theorem 3.4 are
(3.27)
75
N= -~XtXt,XOX1 ... Xt-l'
rg
A
WXtXt,XOX1 ...Xt_l
r
tu
ttu,
=
N-
A
Wetet.el ...et_l -
-~ etet·el···et-l'
rg
= eet,
eet
(3.28)
(3.29)
(3.30)
(3.31)
(3.32)
(3.33)
and
A
Wee
N= -~ee
rg
(3.34)
where the --estimators denote the maximum likelihood estimators of the respective parameters under the normal distribution. The maximum value attained by the likelihood
zs
(3.35)
Proof As in Theorem 3.4 we can write the likelihood as
L c (x ••• , e ••• ) = T -N/2 r N/2 g (r)
(3.36)
where
nl
T = tr D;llx1 .XO
L(Xil. -
E>~IOXio.)(Xil.
-
E>~IOXiO.)T
i=l
p
+L
tr D;t~t,XOX1...Xt_l
t=2
nt
"'[
. _E>T
. _E>T ( (t-1)_E>(t-1)T . )][ . _E>T
. _E>T ( (t-1)_E>(t-1)T . )]T
xtoXw.
xt Xi..
xo
X w•
Xlt.
xtoXw.
xt Xi..
xo
X w•
~ Xlt.
i=l
nl
+ tr D;l~l L eil.e~.
i=l
p
nt
D-1
"'(
+ '"
~ tr etet.el ...et-l ~ eit. t=2
E>T (t-1))(
E>T (t-1))T
- etei..
eit. - - etei..
.
i=l
(3.37)
In Chapter 2 we showed how to obtain the maximum likelihood parameters under
the normality assumption, which is equivalent to minimizing (3.37) with respect to
76
t = 2, ... ,p. Similarly to the proof of Theorem 3.4 we can show that (3.27) to (3.34)
.
are valid.
_
Corollary 3.3 (Anderson et al., 1990) Let the conditions of Theorem 3.4 hold, and
let wEn be a set such that if (IL, \lI) E w then (IL, c\ll) E w for all c
>
O. The likelihood
ratio criterion for testing the null hypothesis (IL, \lI) E w is l~nll/2/I~wll/2, where ~n
and ~ware the maximum likelihood estimators of \lI in nand w, respectively, under
normality.
Corollary 3.4 Let the conditions of Corollary 3.4 hold, and let wEn be a set such
that if (8, \lI) E w then (8, c\ll) E w for all c
> O. The likelihood ratio criterion
for testing the null hypothesis (8, \lI) E w, 8
=
(8 xo ,8 eo ) is l~nll/2/1~wll/2
=
l~nll/2 /1~wll/2 that can be decomposed as
p
A=
II At
(3.38)
t=2
where
•
(3.39)
where ~ etet.el ... et-l n and ~ etet.el ...et-1 W are the maximum likelihood estimators of
\lIetet.el ...et-l
in nand w, respectively, under normality.
Theorem 3.5 (Anderson et al., 1990) Let the conditions of Theorem 3.4 hold. Suppose the distribution of
(3.40)
under normality does not depend on (IL, \lI) E
Wm
x W v ' Then the distribution of A(u)
for arbitrary g does not depend on g and does not depend on (IL, \lI) E
77
Wm
x
Wv '
Corollary 3.5 The distribution of At for t = 2, ... , p gwen that Has is true for
s = 1, ... , t under the correlation model (3.9) remains exactly the same under the
null and the alternative hypothesis if the distribution of x••• and e... is a univariate
elliptical distribution.
Proof That can done using (3.3), (3.4) and (3.5).
3.7
•
Discussion
The step-down procedure described in Chapter 2 remains robust in the case of the
correlation model (3.9) that assumes that the joint distribution of x ••• and e... is
an univariate elliptical distribution. This robustness property is exact, in the sense
that it is not an asymptotic result, and holds under the null and the alternative
hypotheses. We also observed that the distribution of the test statistic At of each
step t t = 2, ... ,p under the null hypothesis remained invariant under a larger class of
distributions of x ... and e... that have the form £(x... , e••• ) = £(x... , e;..e... ). In
particular, if we assume that x... and e... are independent, and that the distribution
of e... is elliptically symmetric, the test reduces to the case of testing hypotheses on
t h e mean parameters 0 f y
y(t)
~
(t). y(t)
t
1 ...
t-I lor
t = 2, ... , p. S0 t h e propose d step-down
procedure is also null robust in that case.
We also noted that the step-down procedure is not robust in the exact sense
when joint distribution of x ... and e... is a left orthogonally invariant distribution.
78
.
Chapter 4

Elliptical Distributions Case

4.1 Introduction

It has been observed in the literature that tests of hypotheses concerning the population covariance matrix based on the normal distribution theory are not robust when the observations from different subjects are independently distributed and follow a multivariate elliptical distribution. In particular, the tests are generally sensitive to the kurtosis of the variables involved. In this chapter we consider the approach taken by Tyler (1982, 1983) and Muirhead and Waternaux (1980) to modify the step-down procedure described in Chapter 2. That approach involves finding an adjustment to the log likelihood ratio statistic based on the normal distribution such that the test statistic is asymptotically robust over the class of multivariate elliptical distributions with finite fourth moments. The class of elliptical distributions is particularly suitable for the proposed step-down procedure since it retains some properties of the normal distribution, such as that conditionally linear means stay the same and that the conditional covariance matrix parameters are a scalar multiple of the dispersion parameters (see Section 1.3.1). Tyler (1982, 1983) presented general conditions under which such an adjustment is possible and can be applied to the normal likelihood test statistic when the estimator of the dispersion matrix is taken to be the same as in the normal distribution, the maximum likelihood estimator under a specified elliptical distribution, or an affine-invariant robust M-estimator of the dispersion matrix.
4.2 Correlation Model

Consider the same data structure as in Section 2.2 and modify the correlation model described in Section 2.3 as follows. Suppose now that $x_{i\cdot\cdot}^{(t)} = (x_{i1\cdot}^T\ x_{i2\cdot}^T \ldots x_{it\cdot}^T)^T$ and $e_{i\cdot\cdot}^{(t)} = (e_{i1\cdot}^T\ e_{i2\cdot}^T \ldots e_{it\cdot}^T)^T$ are uncorrelated random vectors and independent for $i = n_{t+1}+1, \ldots, n_t$ and $t = 1, \ldots, p$. We assume now that the distribution of $u_{i\cdot\cdot}^{(t)} = (x_{i\cdot\cdot}^{(t)T}, e_{i\cdot\cdot}^{(t)T})^T$ is a univariate elliptical distribution with density function
$$f(u_{i\cdot\cdot}^{(t)}) = |\Psi^{(t)}|^{-1/2}\, g\big\{(u_{i\cdot\cdot}^{(t)} - \mu_{u_i}^{(t)})^T \Psi^{(t)-1} (u_{i\cdot\cdot}^{(t)} - \mu_{u_i}^{(t)})\big\} \qquad (4.1)$$
for $i = n_{t+1}+1, \ldots, n_t$ and $t = 1, \ldots, p$. The mean is given by
$$\mu_{u_i}^{(t)} = \begin{pmatrix} \Theta_{x0}^{(t)T} x_{i0\cdot} \\ 0 \end{pmatrix} \qquad (4.2)$$
and variance matrix $\kappa_g \Psi^{(t)}$, where $\kappa_g$ depends on $g$,
$$\Psi^{(t)} = \begin{pmatrix} \Psi_{xx\cdot x_0}^{(t)} & 0 \\ 0 & \Psi_{ee}^{(t)} \end{pmatrix}, \qquad \Theta_{x0}^{(t)} = \big(\Theta_{x_10}\ \Theta_{x_20} \ldots \Theta_{x_t0}\big),$$
$$\Psi_{xx\cdot x_0}^{(t)} = \begin{pmatrix} \Psi_{x_1x_1\cdot x_0} & \Psi_{x_1x_2\cdot x_0} & \cdots & \Psi_{x_1x_t\cdot x_0} \\ \Psi_{x_2x_1\cdot x_0} & \Psi_{x_2x_2\cdot x_0} & \cdots & \Psi_{x_2x_t\cdot x_0} \\ \vdots & \vdots & & \vdots \\ \Psi_{x_tx_1\cdot x_0} & \Psi_{x_tx_2\cdot x_0} & \cdots & \Psi_{x_tx_t\cdot x_0} \end{pmatrix}$$
and
$$\Psi_{ee}^{(t)} = \begin{pmatrix} \Psi_{e_1e_1} & \Psi_{e_1e_2} & \cdots & \Psi_{e_1e_t} \\ \Psi_{e_2e_1} & \Psi_{e_2e_2} & \cdots & \Psi_{e_2e_t} \\ \vdots & \vdots & & \vdots \\ \Psi_{e_te_1} & \Psi_{e_te_2} & \cdots & \Psi_{e_te_t} \end{pmatrix}.$$
The observations on different individuals are independent, but observations on the same individual are correlated, with the covariance matrix being the $(t(r+q) \times t(r+q))$ matrix $\Psi^{(t)}$. Note that $\Theta_{x0}^{(t)}$ corresponds to the first $t(J+1)$ columns of $\Theta_{x0}^{(p)} = \Theta_{x0}$, $\Psi_{xx\cdot x_0}^{(t)}$ corresponds to the first $tr$ columns and rows of $\Psi_{xx\cdot x_0}^{(p)} = \Psi_{xx\cdot x_0}$, and $\Psi_{ee}^{(t)}$ corresponds to the first $tq$ columns and rows of $\Psi_{ee}^{(p)} = \Psi_{ee}$. The sets of covariate observations and random errors associated with different individuals are uncorrelated but not independent under the assumption of univariate elliptically distributed variables unless $g$ is normal.
In this context the appropriate correlation model that incorporates a monotone multiple design multivariate (MMDM) linear model for the vector of responses at the $t$th visit of the $i$th individual is given by
$$y_{it\cdot} = B_{t0}^T x_{i0\cdot} + B_t^T x_{i\cdot\cdot}^{(t)} + e_{it\cdot}, \qquad (4.3)$$
where $B_{t0}$ is the $((J+1) \times q)$ matrix of unknown regression coefficients associated with the design, and $B_t = (B_{t1}^T\ B_{t2}^T \ldots B_{tt}^T)^T$ is the $(tr \times q)$ matrix with $B_{tu}$ being the $(r \times q)$ matrix of unknown regression coefficients associated with the random covariates observed at visit $u$, for $u = 1, \ldots, t$, $i = 1, \ldots, n_t$, and $t = 1, \ldots, p$. Then $z_{i\cdot\cdot}^{(t)} = (x_{i\cdot\cdot}^{(t)T}\ y_{i\cdot\cdot}^{(t)T})^T$, $i = 1, \ldots, n_t$, are elliptically symmetrically distributed with mean
$$\mu_{z_i}^{(t)} = \begin{pmatrix} \Theta_{x0}^{(t)T} x_{i0\cdot} \\ B_0^{(t)T} x_{i0\cdot} + B^{(t)T}\Theta_{x0}^{(t)T} x_{i0\cdot} \end{pmatrix} \qquad (4.4)$$
and variance matrix $\kappa_g \Psi_{zz}^{(t)}$, where
$$\Psi_{zz}^{(t)} = \begin{pmatrix} \Psi_{xx\cdot x_0}^{(t)} & \Psi_{xx\cdot x_0}^{(t)} B^{(t)} \\ B^{(t)T}\Psi_{xx\cdot x_0}^{(t)} & \Psi_{yy\cdot x_0}^{(t)} \end{pmatrix} \qquad (4.5)$$
$$= \begin{pmatrix} \Psi_{xx\cdot x_0}^{(t)} & \Psi_{xx\cdot x_0}^{(t)} B^{(t)} \\ B^{(t)T}\Psi_{xx\cdot x_0}^{(t)} & \Psi_{ee}^{(t)} + B^{(t)T}\Psi_{xx\cdot x_0}^{(t)} B^{(t)} \end{pmatrix} \qquad (4.6)$$
and
$$B^{(t)} = \begin{pmatrix} B_{11} & B_{21} & \cdots & B_{t1} \\ 0 & B_{22} & \cdots & B_{t2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & B_{tt} \end{pmatrix}. \qquad (4.7)$$
Note again that $B_0^{(t)}$ corresponds to the matrix formed by the first $t(J+1)$ columns of $B_0^{(p)} = B_0$, $B^{(t)}$ corresponds to the matrix formed by the first $tq$ columns and $tr$ rows of $B^{(p)} = B$, and $\Psi_{yy\cdot x_0}^{(t)}$ corresponds to the first $tq$ columns and rows of $\Psi_{yy\cdot x_0}^{(p)} = \Psi_{yy\cdot x_0} = \Psi_{ee} + B^T \Psi_{xx\cdot x_0} B$.
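The nested structure of $B^{(t)}$ in (4.7) is easy to see in code. Below is a minimal sketch that assembles $B^{(t)}$ from its blocks; the function name, the dictionary of blocks, and the use of NumPy are illustrative assumptions rather than anything from the text.

    import numpy as np

    def build_B_t(B_blocks, t, r, q):
        """Assemble the (t*r x t*q) block upper-triangular matrix B^(t) of (4.7).

        B_blocks[(v, u)] is the (r x q) matrix B_vu of coefficients of the
        covariates observed at visit u in the regression for visit v, u <= v."""
        B = np.zeros((t * r, t * q))
        for v in range(1, t + 1):           # column block: visit being modeled
            for u in range(1, v + 1):       # row block: covariate visit u <= v
                B[(u - 1) * r:u * r, (v - 1) * q:v * q] = B_blocks[(v, u)]
        return B

Column block $v$ of the result is exactly $B_v$, so $B^{(t-1)}$ is recovered by taking the leading $(t-1)r$ rows and $(t-1)q$ columns, in agreement with the nesting noted above.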
4.3 Estimation

The usual method of maximum likelihood involves the maximization of the joint distribution of $(x_{i\cdot\cdot}^{(t)T}, e_{i\cdot\cdot}^{(t)T})^T$ for $i = n_{t+1}+1, \ldots, n_t$ and $t = 1, \ldots, p$ with respect to $B_0$, $B$, $\Theta_{x0}$ and $\Psi$. Since $x_{i\cdot\cdot}^{(t)}$ and $e_{i\cdot\cdot}^{(t)}$ for $i = n_{t+1}+1, \ldots, n_t$ and $t = 1, \ldots, p$ are uncorrelated random vectors, the likelihood under the model described in Section 4.2 is given by
$$L = \prod_{t=1}^{p} |\Psi_{ee}^{(t)}|^{-\frac{n_t-n_{t+1}}{2}} \prod_{t=1}^{p} |\Psi_{xx\cdot x_0}^{(t)}|^{-\frac{n_t-n_{t+1}}{2}} \prod_{t=1}^{p} \prod_{i=n_{t+1}+1}^{n_t} g\big[\mathrm{tr}\, \Psi_{xx\cdot x_0}^{(t)-1} (x_{i\cdot\cdot}^{(t)} - \Theta_{x0}^{(t)T} x_{i0\cdot})(x_{i\cdot\cdot}^{(t)} - \Theta_{x0}^{(t)T} x_{i0\cdot})^T + \mathrm{tr}\, \Psi_{ee}^{(t)-1}\, e_{i\cdot\cdot}^{(t)} e_{i\cdot\cdot}^{(t)T}\big]. \qquad (4.8)$$
Direct calculation of the likelihood equations under the MMDM model often requires iterative procedures. We use an alternative approach based on a transformation of $x_{i\cdot\cdot}$ and $e_{i\cdot\cdot}$ to facilitate finding the maximum likelihood estimators of $B$ and $\Psi$ for the MMDM model. For that purpose, observe that maximizing (4.8) is equivalent to maximizing the product of the likelihoods of the transformed variables, whose parameters have a one-to-one relationship with the parameters of (4.8).
If we observe for each individual $i$: $x_{i1\cdot}, y_{i1\cdot}, x_{i2\cdot}, y_{i2\cdot}, \ldots, x_{ip\cdot}, y_{ip\cdot}$ according to a multivariate elliptical distribution, then the variables
$$\begin{array}{l}
[x_{i1\cdot} - E(x_{i1\cdot})] \\
{[y_{i1\cdot} - E(y_{i1\cdot} \mid x_{i1\cdot})]} \\
{[x_{i2\cdot} - E(x_{i2\cdot} \mid y_{i1\cdot}, x_{i1\cdot})]} \\
{[y_{i2\cdot} - E(y_{i2\cdot} \mid y_{i1\cdot}, x_{i1\cdot}, x_{i2\cdot})]} \\
\qquad \vdots \\
{[x_{ip\cdot} - E(x_{ip\cdot} \mid y_{i1\cdot}, \ldots, y_{ip-1\cdot}, x_{i1\cdot}, \ldots, x_{ip-1\cdot})]} \\
{[y_{ip\cdot} - E(y_{ip\cdot} \mid y_{i1\cdot}, \ldots, y_{ip-1\cdot}, x_{i1\cdot}, \ldots, x_{ip\cdot})]}
\end{array} \qquad (4.9)$$
are uncorrelated and elliptically symmetrically distributed with mean $0$ and block diagonal variance-covariance matrix
$$\mathrm{diag}\big(\Psi_{x_1x_1\cdot x_0},\ \Psi_{e_1e_1},\ \Psi_{x_2x_2\cdot x_0x_1},\ \Psi_{e_2e_2\cdot e_1},\ \ldots,\ \Psi_{x_px_p\cdot x_0x_1\ldots x_{p-1}},\ \Psi_{e_pe_p\cdot e_1\ldots e_{p-1}}\big).$$
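Computationally, the transformation (4.9) can be mimicked by sequentially residualizing each block of variables on everything observed before it, with least-squares fits standing in for the conditional expectations (exact under the linear conditional means of the elliptical model). A minimal sketch with illustrative names, for one set of completely observed individuals:

    import numpy as np

    def sequential_residuals(blocks):
        """Residualize each block of variables on all preceding blocks.

        blocks is a list [x1, y1, x2, y2, ...] of (n x dim) arrays; each entry
        is replaced by its residual from an OLS fit on everything observed
        before it, mimicking the conditional-expectation transformation (4.9)."""
        out, past = [], None
        for b in blocks:
            b = b - b.mean(axis=0)                      # center each block
            if past is None:
                out.append(b)
            else:
                coef = np.linalg.lstsq(past, b, rcond=None)[0]
                out.append(b - past @ coef)             # residual block
            past = b if past is None else np.hstack([past, b])
        return out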
The likelihood function can be written as
$$L_c = \prod_{t=1}^{p} |\Psi_{x_tx_t\cdot x_0x_1\ldots x_{t-1}}|^{-\frac{n_t}{2}} \prod_{t=1}^{p} |\Psi_{e_te_t\cdot e_1\ldots e_{t-1}}|^{-\frac{n_t}{2}} \prod_{t=1}^{p} \prod_{i=n_{t+1}+1}^{n_t} g\big[d_i\big], \qquad (4.10)$$
where
$$d_i = \mathrm{tr}\,\Psi_{x_1x_1\cdot x_0}^{-1}\,(x_{i1\cdot} - \Theta_{x_10}^T x_{i0\cdot})(x_{i1\cdot} - \Theta_{x_10}^T x_{i0\cdot})^T + \sum_{u=2}^{t} \mathrm{tr}\,\Psi_{x_ux_u\cdot x_0x_1\ldots x_{u-1}}^{-1}\,\big[x_{iu\cdot} - \Theta_{x_u0}^T x_{i0\cdot} - \Theta_{x_u}^{(u-1)T}(x_{i\cdot\cdot}^{(u-1)} - \Theta_{x0}^{(u-1)T} x_{i0\cdot})\big]\big[x_{iu\cdot} - \Theta_{x_u0}^T x_{i0\cdot} - \Theta_{x_u}^{(u-1)T}(x_{i\cdot\cdot}^{(u-1)} - \Theta_{x0}^{(u-1)T} x_{i0\cdot})\big]^T + \mathrm{tr}\,\Psi_{e_1e_1}^{-1}\,e_{i1\cdot}e_{i1\cdot}^T + \sum_{u=2}^{t} \mathrm{tr}\,\Psi_{e_ue_u\cdot e_1\ldots e_{u-1}}^{-1}\,\big(e_{iu\cdot} - \Theta_{e_u}^T e_{i\cdot\cdot}^{(u-1)}\big)\big(e_{iu\cdot} - \Theta_{e_u}^T e_{i\cdot\cdot}^{(u-1)}\big)^T,$$
and the maximization is with respect to $B_{t0}$, $B_t$, and $\Theta_{x_t0}$ for $t = 1, \ldots, p$, and $\Theta_{x_t}$ and $\Theta_{e_t}$ for $t = 2, \ldots, p$.
The log likelihood function is given by
$$\log L_c = -\sum_{t=1}^{p} \frac{n_t}{2} \log|\Psi_{x_tx_t\cdot x_0x_1\ldots x_{t-1}}| - \sum_{t=1}^{p} \frac{n_t}{2} \log|\Psi_{e_te_t\cdot e_1\ldots e_{t-1}}| + \sum_{t=1}^{p} \sum_{i=n_{t+1}+1}^{n_t} \log g\big[d_i\big], \qquad (4.11)$$
where $d_i$ is the argument of $g$ in (4.10), and the maximization is with respect to $B_{t0}$, $B_t$, and $\Theta_{x_t0}$ for $t = 1, \ldots, p$, $\Theta_{x_t}$ and $\Theta_{e_t}$ for $t = 2, \ldots, p$, and $\Psi_{xx\cdot x_0}$ and $\Psi_{ee}$.
In order to illustrate the difficulty in finding the maximum likelihood estimators of the parameters in (4.10), take the derivatives of (4.11) with respect to $\Psi_{x_1x_1\cdot x_0}^{-1}$ and $\Theta_{x_t}$:
$$\frac{\partial \log L_c}{\partial \Psi_{x_1x_1\cdot x_0}^{-1}} = \frac{n_1}{2}\big\{2\,\Psi_{x_1x_1\cdot x_0} - \mathrm{diag}\,\Psi_{x_1x_1\cdot x_0}\big\} + \sum_{i=1}^{n_1} \frac{g'(d_i)}{g(d_i)}\big\{2\,x_{i1\cdot0}\,x_{i1\cdot0}^T - \mathrm{diag}\big[x_{i1\cdot0}\,x_{i1\cdot0}^T\big]\big\}$$
and
$$\frac{\partial \log L_c}{\partial \Theta_{x_t}} = -2 \sum_{i=1}^{n_t} \frac{g'(d_i)}{g(d_i)}\big[x_{it\cdot} - \Theta_{x_t0}^T x_{i0\cdot} - \Theta_{x_t}^{(t-1)T}(x_{i\cdot\cdot}^{(t-1)} - \Theta_{x0}^{(t-1)T} x_{i0\cdot})\big]\big[x_{i0\cdot}^T\ \ x_{i\cdot\cdot}^{(t-1)T}\big],$$
where $x_{i1\cdot0} = x_{i1\cdot} - \Theta_{x_10}^T x_{i0\cdot}$ and $d_i$ is the argument of $g$ in (4.10).

Equating the above expressions to zero, we obtain that the maximum likelihood estimates are the solution to the following set of equations:
$$(4.12)$$
$$\hat{\Psi}_{x_1x_1\cdot x_0} = -\frac{1}{n_1} \sum_{i=1}^{n_1} 2\,\frac{g'(\hat{d}_i)}{g(\hat{d}_i)}\,\hat{x}_{i1\cdot0}\,\hat{x}_{i1\cdot0}^T. \qquad (4.13)$$
Thus, the maximum likelihood estimators have the form of iteratively reweighted least squares estimators, where the weights depend on the form of the elliptical distribution being considered through $-2\,g'(d_i)/g(d_i)$.
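A minimal sketch of such an iteratively reweighted scheme for one complete-data regression, using the multivariate-$t$ choice of $g$, for which $-2g'(d_i)/g(d_i) = (\nu + q)/(\nu + d_i)$ with $q$ the dimension of the residual vector; the names and the fixed iteration count are illustrative assumptions:

    import numpy as np

    def irls_elliptical(X, Y, nu=4.0, n_iter=50):
        """Iteratively reweighted LS for a multivariate-t error model.

        For the t distribution with nu degrees of freedom the weight
        -2 g'(d_i)/g(d_i) reduces to (nu + q)/(nu + d_i), where d_i is the
        squared Mahalanobis distance of the i-th residual vector."""
        n, q = Y.shape
        B = np.linalg.lstsq(X, Y, rcond=None)[0]       # start at OLS
        Psi = np.cov((Y - X @ B).T)                    # start at sample scatter
        for _ in range(n_iter):
            R = Y - X @ B                              # residuals (n x q)
            d = np.einsum('ij,jk,ik->i', R, np.linalg.inv(Psi), R)
            w = (nu + q) / (nu + d)                    # elliptical weights
            W = np.sqrt(w)[:, None]
            B = np.linalg.lstsq(W * X, W * Y, rcond=None)[0]  # weighted LS
            R = Y - X @ B
            Psi = (w[:, None] * R).T @ R / n           # reweighted scatter
        return B, Psi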
4.3.1 Maximum Likelihood Estimation

When the underlying elliptical distribution is of the normal/independent type, the maximum likelihood estimators can be found with less difficulty. An elliptical distribution is of the normal/independent type if the distribution of $z_i$ can be written as
$$z_i = \mu_i + q_i^{-1/2}\,\varepsilon_i,$$
where $\varepsilon_i$ is normally distributed with mean zero and covariance matrix $\Psi$ and is independent of the positive random variable $q_i$.

In a more general situation of missing data, Little (1988) observed that the EM algorithm can be used to obtain estimates of the parameters involved. For that purpose, call $\mu_i = \mu_i(\phi) = X_i\phi$ and $\Psi_i = E_i'\Psi(\psi)E_i$. Given $q_i$, update the parameter estimates using
$$\hat{\phi}^{(s+1)} = \Big(\sum_{i=1}^{n} q_i^{(s)} X_i^{*\prime} \Psi_i^{(s)-1} X_i^{*}\Big)^{-1}\Big(\sum_{i=1}^{n} q_i^{(s)} X_i^{*\prime} \Psi_i^{(s)-1} u_i\Big) \qquad (4.14)$$
and
$$\Psi^{(s+1)} = \Psi^{(s)} - \frac{1}{n}\sum_{i=1}^{n}\big[\Psi^{(s)} E_i' \Psi_i^{(s)-1} E_i \Psi^{(s)}\big] + \frac{1}{n}\sum_{i=1}^{n}\big[q_i^{(s)}\,\Psi^{(s)} E_i' \Psi_i^{(s)-1}(u_i - X_i^{*}\phi^{(s)})(u_i - X_i^{*}\phi^{(s)})'\Psi_i^{(s)-1} E_i \Psi^{(s)}\big], \qquad (4.15)$$
where $\Psi_i^{(s)} = E_i'\Psi^{(s)}E_i$. Then recompute $q_i$ using
$$q_i^{(s)} = E\big[q_i \mid u_{\mathrm{obs},i},\ \phi^{(s-1)},\ \Psi^{(s-1)}\big]. \qquad (4.16)$$
In our case, as the data have a monotone decreasing pattern of observations, the algorithm simplifies. If we apply the above algorithm to the variables transformed by (4.9), then the estimation of the parameters involves the same algorithms described in Chapter 2, but weighting the observations of each individual by $q_i^{(s)}$. Then update $q_i^{(s)}$ by (4.16). Repeat the procedure until convergence is obtained. The form of expression (4.16) for some elliptical distributions is given by the weight functions described in Section 1.3.1.
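For the multivariate $t$, the simplest normal/independent member, the conditional expectation (4.16) has the closed form $E[q_i \mid u_i] = (\nu + m_i)/(\nu + d_i)$. Below is a sketch for complete data (in the monotone case the same weights would multiply each individual's contribution to the stage-wise estimators of Chapter 2); the names are illustrative:

    import numpy as np

    def em_t_weights(U, mu, Psi, nu):
        """E-step of the normal/independent (here: multivariate t) EM.

        For the t distribution, the conditional expectation (4.16) is
        E[q_i | u_i] = (nu + m)/(nu + d_i), with d_i the squared
        Mahalanobis distance of u_i."""
        n, m = U.shape
        R = U - mu
        d = np.einsum('ij,jk,ik->i', R, np.linalg.inv(Psi), R)
        return (nu + m) / (nu + d)

    def em_t(U, nu=5.0, n_iter=100):
        """EM for mean and scatter of a multivariate t sample (complete data)."""
        n, m = U.shape
        mu, Psi = U.mean(axis=0), np.cov(U.T)
        for _ in range(n_iter):
            q = em_t_weights(U, mu, Psi, nu)               # E-step: weights
            mu = (q[:, None] * U).sum(axis=0) / q.sum()    # M-step: weighted mean
            R = U - mu
            Psi = (q[:, None] * R).T @ R / n               # M-step: weighted scatter
        return mu, Psi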
4.3.2 M-Estimation

In our model, when the underlying distribution is elliptical, affine-invariant M-estimates are the solutions to a system of equations of the form
$$\frac{1}{n_t}\sum_{i=1}^{n_t} w_1(d_i)\, x_{it\cdot0} = 0, \qquad (4.17)$$
$$\frac{1}{n_t}\sum_{i=1}^{n_t} w_1(d_i)\, e_{it\cdot0} = 0, \qquad (4.18)$$
$$\frac{1}{n_t}\sum_{i=1}^{n_t} w_2(d_i^2)\, x_{it\cdot0}\, x_{it\cdot0}^T = \hat{\Psi}_{x_tx_t\cdot x_0x_1\ldots x_{t-1}} \qquad (4.19)$$
and
$$\frac{1}{n_t}\sum_{i=1}^{n_t} w_2(d_i^2)\, e_{it\cdot0}\, e_{it\cdot0}^T = \hat{\Psi}_{e_te_t\cdot e_1\ldots e_{t-1}}, \qquad (4.20)$$
where $x_{it\cdot0} = x_{it\cdot} - \Theta_{x_t0}^T x_{i0\cdot} - \Theta_{x_t}^{(t-1)T}(x_{i\cdot\cdot}^{(t-1)} - \Theta_{x0}^{(t-1)T} x_{i0\cdot})$, $e_{it\cdot0} = e_{it\cdot} - \Theta_{e_t}^T e_{i\cdot\cdot}^{(t-1)}$, and
$$d_i = \mathrm{tr}\,\Psi_{x_1x_1\cdot x_0}^{-1}\, x_{i1\cdot0}\, x_{i1\cdot0}^T + \sum_{u=2}^{t} \mathrm{tr}\,\Psi_{x_ux_u\cdot x_0x_1\ldots x_{u-1}}^{-1}\, x_{iu\cdot0}\, x_{iu\cdot0}^T + \mathrm{tr}\,\Psi_{e_1e_1}^{-1}\, e_{i1\cdot}\, e_{i1\cdot}^T + \sum_{u=2}^{t} \mathrm{tr}\,\Psi_{e_ue_u\cdot e_1\ldots e_{u-1}}^{-1}\, e_{iu\cdot0}\, e_{iu\cdot0}^T$$
for $t = 1, \ldots, p$. The estimators provided by the solution of (4.17) to (4.20) yield the maximum likelihood estimates of the parameters involved in the case of elliptically distributed variables if we take $w_1(d_i) = w_2(d_i^2) = -2g'(d_i)/g(d_i)$.

Let $m_i$ denote the dimension of $(x_{i\cdot\cdot}^{(t)T}\ e_{i\cdot\cdot}^{(t)T})^T$. Recall that $m_i = t(r+q)$ for $i = n_{t+1}+1, \ldots, n_t$ and $t = 1, \ldots, p$. We will often denote $m_i$ by $m_s$ depending on which set $s$ the index $i$ falls in, $s = 1, \ldots, p$.
Maronna (1976) also proposed M-estimators based on an approach suggested by Huber (see Huber (1981)). In that case, the weight functions for the Huber($q$) estimator are of the form
$$w_1(d_i) = \begin{cases} 1, & d_i \le k_1 \\ (k_1/d_i)^{1/2}, & d_i > k_1 \end{cases} \qquad (4.21)$$
and
$$w_2(d_i^2) = w_1(d_i)^2/\beta, \qquad (4.22)$$
where $k_1$ is chosen to be the upper $100q$ percent point of a $\chi^2$ distribution with $m_i$ degrees of freedom, and $\beta$ is chosen so as to make the solution to the estimates in (4.19) and (4.20) asymptotically unbiased for the covariance matrix in a normal distribution situation. Huber(1) is of the form
$$w_1(d_i) = (m_i/d_i)^{1/2} \qquad (4.23)$$
and
$$w_2(d_i^2) = m_i/d_i. \qquad (4.24)$$
Huber(0) results in the maximum likelihood estimators of the parameters under the normal distribution.
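A sketch of Huber-type weights of the kind just described, with the cut-off $k_1$ taken as a $\chi^2$ percent point and $\beta$ obtained by Monte Carlo rather than in closed form; the exact tuning convention used here is an assumption:

    import numpy as np
    from scipy.stats import chi2

    def huber_weights(d, m, p_cut=0.95):
        """Huber-type weights for squared Mahalanobis distances d (approx. chi2_m).

        k1 is a chi-square cut-off with m degrees of freedom; beta rescales
        w2 so that the scatter equations (4.19)-(4.20) are unbiased at the
        normal model, i.e. beta = E[w1(d)^2 d]/m when d ~ chi2_m."""
        k1 = chi2.ppf(p_cut, df=m)
        w1 = np.where(d <= k1, 1.0, np.sqrt(k1 / d))     # location weights
        dd = chi2.rvs(df=m, size=200_000, random_state=0)
        ww = np.where(dd <= k1, 1.0, np.sqrt(k1 / dd))
        beta = np.mean(ww**2 * dd) / m                   # consistency constant
        w2 = w1**2 / beta                                # scatter weights
        return w1, w2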
4.4 Null Hypothesis

Recall that the hypotheses involved have the form
$$H_{0t}:\ \Psi_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1\ldots x_t} = \Psi_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1} \quad \text{for } t = 2, \ldots, p,$$
$$H_{1t}:\ \Psi_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1\ldots x_t} \ne \Psi_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1} \quad \text{for some } t = 2, \ldots, p.$$
Observe that
$$\Psi_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1\ldots x_t} \ne \Psi_{y_ty_t \mid y_1\ldots y_{t-1}x_0x_1\ldots x_t}$$
and
$$\Psi_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1} \ne \Psi_{y_ty_t \mid y_1\ldots y_{t-1}x_0x_1}$$
unless the distribution is normal. In the conditional approach the residual variances depend upon the variables being conditioned on (see Chapter 1).

In the case of elliptically symmetric distributions the unconditional variance matrix is then, perhaps, best regarded as comprising an average relationship over all possible values of the random variables. Within the class of elliptical distributions, the variance matrix $\kappa_g \Psi$ is not well defined; however, parameters of the form $H(\Psi)$ are well defined whenever the function $H$ satisfies the condition that for any symmetric positive definite matrix $\Psi$ and any positive scalar $c$, $H(\Psi) = H(c\Psi)$.
4.5 Step-Down Test Procedure

Consider the likelihood ratio statistic for testing $H_0$ above when the observations of each individual come from a multivariate normal distribution:
$$\Lambda^* = \prod_{t=2}^{p}\Bigg[\frac{|\hat{\Psi}_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1\ldots x_t}|}{|\hat{\Psi}_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1}|}\Bigg]^{n_t/2}, \qquad (4.25)$$
where $\hat{\Psi}_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1\ldots x_t}$ and $\hat{\Psi}_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1}$ are the maximum likelihood estimators of $\Psi_{y_ty_t\cdot y_1\ldots y_{t-1}x_0x_1\ldots x_t}$ under $H_{1t}$ and $H_{0t}$ respectively.

Observe that (4.25) is a product of $p-1$ terms $\Lambda_t^*$ that can be individually transformed to
$$W_t = -2\log \Lambda_t^*. \qquad (4.26)$$
This fact suggests the use of a step-down testing procedure to test $H_0$. The procedure is to compare $W_2$ with the significance point $w_2$ for its respective degrees of freedom. If the observed value is larger than $w_2$, then reject $H_0$. If it is accepted, then compare $W_3$ with $w_3$. The components are tested in sequence. If one is rejected, the sequence is stopped and the hypothesis $H_0$ is rejected. If all component null hypotheses are accepted, the composite hypothesis is accepted.
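The decision rule itself is a short loop; a sketch, assuming the statistics $W_t$ and their degrees of freedom have already been computed:

    from scipy.stats import chi2

    def step_down_test(W, dfs, alphas):
        """Sequential step-down test.

        W[k], dfs[k], alphas[k] hold the statistic, degrees of freedom and
        level for the k-th step.  Components are tested in sequence; the
        first rejection stops the procedure and rejects H0.  Returns the
        rejecting step (1-based) or None when every component is accepted."""
        for step, (w, df, a) in enumerate(zip(W, dfs, alphas), start=1):
            if w > chi2.ppf(1.0 - a, df):   # compare W_t with its critical point
                return step                 # stop: H0 rejected here
        return None                         # H0 accepted

    # Normal-theory statistics of Table 5.1 (weeks 6-8), each at level 0.0170:
    # rejection occurs at step 2, matching the discussion in Section 5.2.
    print(step_down_test([1.7040, 21.3879, 10.6883], [2, 4, 6], [0.0170] * 3))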
The step-down procedure discussed in Chapter 2 is not robust when we consider that the observations taken on the same individual follow an elliptical distribution but the observations on different individuals are independent. Tyler (1982) and Tyler (1983) studied the robustness and efficiency properties of the normal likelihood ratio test statistics for functions of the population covariance matrix under elliptical distributions. A summary of his results is given below.
Let $\Psi$ be a fixed symmetric positive-definite matrix parameter of order $m$ and let $\hat{\Sigma}_n$ be a sequence of symmetric positive-definite random matrix estimates of order $m$. Also, let $H(\Omega)$ be a $k$-dimensional multivariate function of the $(m \times m)$ covariance matrix $\Omega$, such that the hypothesis of interest can be written as $H(\Omega) = 0$. Tyler (1983) showed that if for all $a > 0$ and all symmetric positive-definite matrices $\Omega$, $H(\Omega) = H(a\Omega)$, then $\sqrt{n}\{H(\hat{\Sigma}_n) - H(\Psi)\}$ converges to a $k$-variate normal distribution with mean zero and variance-covariance matrix $2\sigma_1\{H'(\Psi)\}(\Psi \otimes \Psi)\{H'(\Psi)\}^T$, where $H'(\Psi) = \frac{1}{2}\{dH(\Psi)/d\,\mathrm{vec}(\Psi)\}(I + J_m)$, $J_m = \sum_{i=1}^{m} J_{ii} \otimes J_{ii}$, and $J_{ij}$ is the $(m \times m)$ matrix with one in the $(i,j)$ position and zeros elsewhere.

For symmetric positive-definite $(m \times m)$ matrices $V$ and $\Psi$, define
$$(4.27)$$
Theorem 4.1 (Tyler, 1983) Let $\hat{\Sigma}_n$ be a sequence of symmetric positive-definite random matrices of order $m$ such that $n^{1/2}(\hat{\Sigma}_n - \Psi)$ converges in distribution, where $\Psi$ is a fixed symmetric positive-definite matrix with $H(a\Psi) = H(\Psi)$, $H_0: H(\Psi) = 0$ and $H_1: H(\Psi) \ne 0$. If $\mathrm{rank}\, H'(\Psi) = k$ in a neighborhood of $\Psi$, then

1. under $H_0$,

2. under the sequence of alternatives $\Psi_n = \Psi + n^{-1/2}\Delta$ with $H(\Psi) = 0$.

A test statistic $-2\log[L_{n,H}(\hat{\Sigma}_n)]$ can be made robust by dividing it by a correction factor $\sigma_1$ whenever $H(\alpha\Psi) = H(\Psi)$.
Observe that Theorem 4.1 is valid in our case when $n_1 = \cdots = n_p$ and $\Psi$ and $\hat{\Sigma}$ in Theorem 4.1 are the matrices $\Psi^c$ and $\hat{\Sigma}^c$ respectively, where $\hat{\Sigma}^c$ is an estimator of $\Psi^c$ under $H_1$ such that $n_t^{1/2}(\hat{\Sigma}^c - \Psi^c)$ converges in distribution, $H(\alpha\Psi^c) = H(\Psi^c)$, and $H_0: H(\Psi^c) = 0$.

In the step-down procedure, when $n_1 \ge \cdots \ge n_p$, define the analogous quantities for each step $t$, $t = 2, \ldots, p$. Observe that in this case
$$(4.29)$$
where
$$\hat{G}_t = \big(y_t^{(t)} - X_t^{*(t)}\hat{\Xi}_t\big)^T W_{2t}\big(y_t^{(t)} - X_t^{*(t)}\hat{\Xi}_t\big), \qquad (4.30)$$
$$\hat{H}_t = \hat{\Xi}_t^T C_t^T\Big\{C_t\big[X_t^{*(t)T} W_{2t} X_t^{*(t)}\big]^{-1} C_t^T\Big\}^{-1} C_t\hat{\Xi}_t \qquad (4.31)$$
and
$$C_t = \big[\,0_{r(t-1)\times(J+1+r)}\ \ I_{r(t-1)}\ \ 0_{r(t-1)\times q(t-1)}\,\big].$$
The $(n_t \times n_t)$ matrix $W_{2t}$ is a diagonal matrix with elements equal to $w_2(d_i)$. Then Theorem 4.1 applies to this situation, but with $\sigma_1$ depending on $t$.

Next we consider three alternatives for finding corrections for the step-down procedure when the data are a random sample from the same elliptically symmetric distribution, using the results derived by Tyler (1982, 1983) and three different estimators of the parameters involved.
4.5.1 Maximum Likelihood Estimators from Normal Distribution

Let $\hat{\Psi}_{e_te_t\cdot e_1\ldots e_{t-1}}$ be the maximum likelihood estimator of $\Psi_{e_te_t\cdot e_1\ldots e_{t-1}}$, $t = 1, \ldots, p$, based on a random sample from an unknown elliptical distribution with finite fourth moments. Take $\hat{G}_t$ and $\hat{H}_t$ as defined by (4.30) and (4.31) respectively. The following theorem is valid as a result of Theorem 4.1.

Theorem 4.2 For elliptical distributions,

1. under $H_0$,

2. under the sequence of alternatives $\Xi_{t,n_t} = \Xi_t + n_t^{-1/2} K_t$ with $C_t \Xi_t = 0$,

where $k_t = qr(t-1)$, $\hat{G}_t$ and $\hat{H}_t$ are calculated as in the normal distribution case, and the correction factor is $\sigma_1 = 1 + \kappa$.

Therefore, when sampling from a multivariate elliptical distribution with kurtosis parameter $\kappa$, the statistic (4.26) can be replaced by
$$W_t/(1+\kappa). \qquad (4.32)$$
Under $H_0$ each statistic (4.32) converges to a chi-squared distribution with $qr(t-1)$ degrees of freedom as $n_t \to \infty$. In practice $\kappa$ can be replaced by a consistent estimator, since the asymptotic chi-squared distribution still holds.
The moment estimator of $\kappa$ is suggested by Muirhead and Waternaux (1980) and is given by
$$\hat{\kappa} = m\, m^{(4)}/\big[(m+2)\,(m^{(2)})^2\big] - 1,$$
where
$$m^{(4)} = \sum_{i=1}^{n_1} (d_i)^2/n_1 \qquad \text{and} \qquad m^{(2)} = \sum_{i=1}^{n_1} d_i/n_1,$$
when $n_1 = n_2 = \cdots = n_p$. Similarly, $\kappa$ can be estimated by
$$\hat{\kappa}_s = m_s\, m_s^{(4)}/\big[(m_s+2)\,(m_s^{(2)})^2\big] - 1,$$
where
$$m_s^{(4)} = \sum_{i=n_{s+1}+1}^{n_s} (d_i)^2/(n_s - n_{s+1}) \qquad \text{and} \qquad m_s^{(2)} = \sum_{i=n_{s+1}+1}^{n_s} d_i/(n_s - n_{s+1}).$$
4.5.2 Maximum Likelihood Estimates

Let $\hat{\Psi}_{e_te_t\cdot e_1\ldots e_{t-1}}$ be the maximum likelihood estimator of $\Psi_{e_te_t\cdot e_1\ldots e_{t-1}}$, $t = 1, \ldots, p$, based on a random sample from a specified elliptical distribution with density $f_g$, where $g$ is known (see Section 1.3.1). Take $\hat{G}_t$ and $\hat{H}_t$ as defined by (4.30) and (4.31) respectively. The following theorem is valid as a result of Theorem 4.1.

Theorem 4.3 For elliptical distributions,

1. under $H_{0t}$,

2. under the sequence of alternatives $\Xi_{t,n_t} = \Xi_t + n_t^{-1/2} K_t$ with $C_t \Xi_t = 0$,

where $k_t = qr(t-1)$,
$$\sigma_{1,g,t} = \big[\tfrac{1}{4}\, m_s(m_s+2)\big]/E[h^2(T)] = a_s^{-1}, \qquad a_s = 4\, E[h^2(T)]/[m_s(m_s+2)],$$
$h(T) = T g'(T)/g(T)$ and $T$ has density (1.29).

The adjustments to the test statistics are obtained from the expected information matrix for elliptical distributions. Now, take $T = d_i$.
Proposition 4.1 (Lange et al., 1989) The expected information matrix $I$ is block diagonal, with the location parameters $\theta$ in one block and the scale parameters $\phi$ in another block. The terms for each individual $i$, $i = 1, \ldots, n_1$, involve $c_i = \mathrm{tr}\,(\Psi_i^{-1}\,\partial\Psi_i/\partial\psi_g)\,\mathrm{tr}\,(\Psi_i^{-1}\,\partial\Psi_i/\partial\psi_h)$. The expected information matrix is obtained by summing over all observations $i$, $i = 1, \ldots, n_1$.

In our case $c_i = 0$ and only $E\{T\,[g'(T)/g(T)]^2\}$ has to be calculated, because for elliptical distributions
$$E\Big\{T^2\Big[\frac{g'(T)}{g(T)}\Big]^2\Big\} = (m+2)\, E\Big\{T\Big[\frac{g'(T)}{g(T)}\Big]^2\Big\}.$$
For the multivariate $t$ distribution, it can be verified that (see Lange et al. (1989))
$$E\Big\{T\Big[\frac{g'(T)}{g(T)}\Big]^2\Big\} = \frac{m}{4}\cdot\frac{\nu+m}{\nu+m+2},$$
so that
$$\sigma_{1,g,t} = (m+\nu+2)/(m+\nu).$$
Recall that for the contaminated normal distribution
$$g(t) = (2\pi)^{-m/2}\big[(1-\epsilon)\exp(-t/2) + \epsilon\lambda^{m/2}\exp(-\lambda t/2)\big]$$
and
$$g'(t) = -\tfrac{1}{2}(2\pi)^{-m/2}\big[(1-\epsilon)\exp(-t/2) + \epsilon\lambda^{m/2+1}\exp(-\lambda t/2)\big].$$
The integrals involved in the expected information matrix when the underlying distribution is the contaminated normal are difficult to evaluate. One solution is to substitute the integrals by their empirical values, as in Singer and Sen (1985). Another solution is to observe that
$$h(\epsilon) = [g'(t)/g(t)]^2 = \frac{1}{4}\Bigg[\frac{1-\epsilon+\epsilon\lambda^{m/2+1}\exp[(1-\lambda)t/2]}{1-\epsilon+\epsilon\lambda^{m/2}\exp[(1-\lambda)t/2]}\Bigg]^2$$
can be approximated by a Taylor series expansion up to the second term when $\epsilon$ is small, that is,
$$h(\epsilon) \approx h(0) + h'(0)\,\epsilon + [h''(0)/2]\,\epsilon^2, \qquad \text{as } \epsilon \to 0,$$
where
$$h(0) = 1/4,$$
$$h'(0) = [(\lambda-1)\lambda^{m/2}/2]\exp[(1-\lambda)t/2]$$
and
$$h''(0) = \lambda^{m/2}(\lambda-1)\exp[(1-\lambda)t/2] + [\lambda^{m/2}(\lambda-1)(\lambda-3)/2]\exp[(1-\lambda)t].$$
Then
$$E\Big\{T\Big[\frac{g'(T)}{g(T)}\Big]^2\Big\} = \int_0^\infty t\Big[\frac{g'(t)}{g(t)}\Big]^2 f_T(t)\, dt \approx \int_0^\infty t\,\big\{h(0) + h'(0)\,\epsilon + [h''(0)/2]\,\epsilon^2\big\}\, f_T(t)\, dt = \frac{2^{-m/2}}{\Gamma(m/2)}\,\{I_1 + I_2 + I_3\},$$
where
$$I_1 = 2^{m/2-1}\,\Gamma(m/2+1)\,(1-\epsilon+\epsilon/\lambda),$$
$$I_2 = \big[2^{m/2}\,\Gamma(m/2+1)\,(\lambda-1)\,\epsilon/\lambda\big]\big[1-\epsilon+\epsilon\lambda^{m+1}/(2\lambda-1)^{m/2+1}\big]$$
and
$$I_3 = 2^{m/2}\,\Gamma(m/2+1)\,(\lambda-1)\,\lambda^m\,\epsilon^2\,\Big\{\big[(1-\epsilon)(\lambda-3)+\epsilon\big]/\big[2(2\lambda-1)^{m/2+1}\big] + (1-\epsilon)/2^{m/2+1} + \epsilon\lambda^{m/2}(\lambda-3)/\big[2(3\lambda-2)^{m/2+1}\big]\Big\}.$$

4.5.3 Affine-Invariant M-Estimates

Let $\hat{\Psi}_{e_te_t\cdot e_1\ldots e_{t-1}}$ be an M-estimator of $\Psi_{e_te_t\cdot e_1\ldots e_{t-1}}$, $t = 1, \ldots, p$, based on a random sample from a specified elliptical distribution with density $f_g$, where $g$ is known. Take $\hat{G}_t$ and $\hat{H}_t$ as defined by (4.30) and (4.31) respectively. The following theorem is valid as a result of Theorem 4.1.
Theorem 4.4 For elliptical distributions,

1. under $H_{0t}$,

2. under the sequence of alternatives $\Xi_{t,n_t} = \Xi_t + n_t^{-1/2} K_t$ with $C_t \Xi_t = 0$,

where $k_t = qr(t-1)$, $\sigma_{1,w,g,t} = a_s^{-1}$,
$$a_s = (2\psi_2 + m_s)^2/\big[(m_s+2)^2\,\psi_1\big], \qquad \psi_1 = E[\psi^2(a_s T)]/[m_s(m_s+2)], \qquad \psi_2 = E[a_s T\,\psi'(a_s T)]/m_s,$$
$\psi(T) = T\, w_2(T)$ and $T$ has density (1.29).

The value of $\sigma_{1,w,g,t}$ for Huber(1) is given by
$$\sigma_{1,w,g,t} = (m+2)/m.$$
This value does not depend on the specific elliptical population.
4.5.4 Discussion

As discussed by Tyler (1983), the likelihood ratio test adjusted by a kurtosis estimate has a few drawbacks. First, the test is valid only if the underlying elliptical distribution has finite fourth moments. Second, even if the fourth moments exist, the test is not very powerful whenever the kurtosis parameter is at least moderately large. This can be seen by calculating the asymptotic relative efficiency of this test with respect to the adjusted likelihood ratio test using the other types of estimators when $n_1 = n_2 = \cdots = n_t$. In that case, the ratio $\sigma_{1,g,t}/(1+\kappa)$ represents the asymptotic relative efficiency of the maximum likelihood estimators based on the normal distribution to the maximum likelihood estimators based on a specified elliptical distribution. Tyler (1983) showed that $\sigma_{1,g,t}/(1+\kappa) \le \min[1,\ (1+2/m)/(1+\kappa)]$. Thus, even for moderate values of $\kappa$ the adjusted likelihood ratio test is quite inefficient. The parameter $\sigma_{1,g,t}$ is bounded above by $1+2/m$. He also noted that the sample information for the parameter $H(\Psi)$ when sampling from a specific elliptical population is not much less than the sample information when sampling from a normal population, especially in higher dimensions.
Chapter 5

Numerical Examples

5.1 Introduction

Two applications of the proposed methodology are presented as illustrations using data extracted from randomized, multi-visit clinical trials.

5.2 Example 1

The data examined are from a randomized multicenter trial testing treatments to lower cholesterol levels by means of low-fat diets. For that purpose, 105 subjects were assigned randomly to one of three diets (A, B and C), and their plasma lipid (cholesterol and triglycerides) levels and weight were evaluated after 5, 6, 7 and 8 weeks. Other factors considered were gender combined with age (Male under 40, Male over 40, Pre-menopausal Female and Post-menopausal Female), and race (Black and White). Although the primary interest of the trial was to examine the effect of the diets on plasma lipid levels, one question raised was whether the weights observed at weeks 6, 7 and 8 provide more information than the weights recorded at week 5 alone. Previous investigations indicated that cholesterol levels should be log-transformed to reduce skewness and triglycerides levels kept as they are. A model that had different intercepts for each factor level (but without any interaction terms) and common slopes for the weight variables was fitted. The model is given by
$$\begin{pmatrix} y_{it1} \\ y_{it2} \end{pmatrix} = B_{t0}^T x_{i0\cdot} + B_t^T x_{i\cdot\cdot}^{(t)} + e_{it\cdot},$$
where $y_{it1}$ is the log-transformed cholesterol level, $y_{it2}$ is the triglyceride level, $x_{i01}$ is equal to 1, $x_{i02}$ to $x_{i07}$ have values of 0 and 1 according to the other factors considered, and $x_{it1}$ is the weight of each individual $i$ at visit $t$, for $i = 1, \ldots, 105$ and $t = 5, \ldots, 8$.
Table 5.1: Statistics for Example 1.

                                             Estimates from
  Step  Week  df  Statistic    Normal  Normal adj by κ      t16       CN  Huber(1)
    1     6    2  χ²_obs       1.7040           1.5147   0.5405   0.3377    0.5634
                  P-value      0.4266           0.4689   0.7632   0.8446    0.7543
    2     7    4  χ²_obs      21.3879          19.0117  16.0458  13.9287   14.6883
                  P-value      0.0003           0.0008   0.0030   0.0075    0.0054
    3     8    6  χ²_obs      10.6883           9.5008   7.8664   7.4770    7.4884
                  P-value      0.0985           0.1473   0.2481   0.2790    0.2780
The step-down test statistics for the hypotheses of interest were computed with different estimates of the parameters involved and are presented in Table 5.1. The estimate of the kurtosis parameter $\kappa$ was calculated using the moment estimator and resulted in $\hat{\kappa} = 0.1250$. The degrees of freedom of the $t$-distribution were obtained from the estimate of the kurtosis parameter. The parameters $\epsilon$ and $\lambda$ of the contaminated normal distribution were calculated using the method of maximum likelihood, and resulted in $\hat{\epsilon} = 0.3458$ and $\hat{\lambda} = 0.3589$. The estimated value of $\epsilon$ indicates that there is about 34.58% of contamination if the contaminated normal distribution is the correct assumption. The adjustments to the step-down test statistics were computed using the methods described in Chapter 4. The null hypothesis is that the weights at weeks 6, 7 and 8 provide no more information beyond that provided by the weight at week 5 in the prediction of the plasma lipids. For $\alpha = 0.05$, we can take $\alpha_s = 0.0170$, $s = 1, 2, 3$, and observe that the results in Table 5.1 lead to the rejection of the null hypothesis at step 2 regardless of the test statistic used. The test statistics based on the normal distribution tended to yield smaller p-values at each step, favoring the alternative hypothesis.
5.3 Example 2

The data in this section correspond to a part of a randomized multi-visit clinical trial to test a cholesterol-lowering drug against placebo. Initially there were 1835 subjects in the group on active treatment and 1843 taking placebo. The age of each subject was recorded at entrance into the trial. Two liver function measurements (alkaline phosphatase and SGOT) from two groups of men (on drug and placebo) were observed at eight regularly spaced visits, and the alcohol consumption (grams/day) was recorded for each individual. The monotone missing data pattern of the data, scheduled to be taken one year apart, is shown in Table 5.2. Only 42.88% of the subjects have data for all eight measurements. Because of progressive recruitment and a fixed termination date, most of the subjects were followed for less than the full follow-up time.

The following model was used:
$$\begin{pmatrix} y_{it1} \\ y_{it2} \end{pmatrix} = B_{t0}^T x_{i0\cdot} + B_t^T x_{i\cdot\cdot}^{(t)} + e_{it\cdot},$$
where $y_{it1}$ is the transformed alkaline phosphatase, $y_{it2}$ is the transformed SGOT, $x_{i01}$ is equal to 1, $x_{i02}$ indicates the treatment: 1 if on drug and 0 if on placebo, $x_{i03}$ is the age at randomization, and $x_{it1}$ is the transformed alcohol consumption, for each individual $i$ and visit $t$. The following transformations were used to reduce the skewness of the responses and random covariates: $y_{it1}^{*-0.75}$ for alkaline phosphatase, $y_{it2}^{*-0.50}$ for SGOT, and $x_{it1}^{*0.25}$ for alcohol consumption.
Table 5.2: Pattern of missing data for Example 2.

                       Visit                     Number of
     1     2     3     4     5     6     7     8  Subjects
     x     x     x     x     x     x     x     x      1577
     x     x     x     x     x     x     x             884
     x     x     x     x     x     x                   167
     x     x     x     x     x                         149
     x     x     x     x                               305
     x     x     x                                     323
     x     x                                           160
     x                                                 113
  3678  3565  3405  3082  2777  2628  2461  1577      3678
The step-down procedure was used to test the null hypothesis that the alcohol consumption recorded at visits 2 to 8 provides no information to explain the change in the liver functions beyond that provided by the alcohol consumption recorded at visit one alone. The step-down test statistics were computed using different estimates of the parameters involved and are presented in Table 5.3. The adjustments to the test statistics based on kurtosis were slightly larger than 2.3 at each step. The degrees of freedom of the $t$-distribution were obtained by means of a grid search, that is, by fixing the degrees of freedom at some predetermined values and choosing the one that maximizes the likelihood. It was not possible to calculate the test statistics using the contaminated normal distribution because the estimation algorithm did not converge. For $\alpha = 0.05$, we can take $\alpha_s = 0.0073$, $s = 1, \ldots, 7$, and observe that the results in Table 5.3 lead to different conclusions about the rejection of the null hypothesis according to the test statistic used. The step-down procedure based on the normal distribution rejected the null hypothesis in step 1, whereas the ones based on the $t_{10}$-distribution and Huber(1) estimates rejected the null hypothesis in step 3.
Table 5.3: Statistics for Example 2.

                                            Estimates from
  Step  Visit  df  Statistic    Normal  Normal adj by κ      t10  Huber(1)
    1      2    2  χ²_obs      14.7645           6.3051   6.3842    9.0091
                   P-value      0.0006           0.0427   0.0411    0.0111
    2      3    4  χ²_obs      10.3286           4.4105   5.1933    5.9062
                   P-value      0.0352           0.3533   0.2680    0.2063
    3      4    6  χ²_obs      34.9135          14.8732  19.9043   22.0402
                   P-value      0.0000           0.0213   0.0029    0.0012
    4      5    8  χ²_obs      12.4086           5.3522   4.9401    5.7610
                   P-value      0.1339           0.7193   0.7640    0.6740
    5      6   10  χ²_obs      25.4326          10.9864  14.1376   14.4013
                   P-value      0.0046           0.3586   0.1668    0.1555
    6      7   12  χ²_obs      22.5018           9.7100  14.6837   14.9050
                   P-value      0.0323           0.6414   0.2592    0.2467
    7      8   14  χ²_obs      28.9578          12.3260  17.5014   17.9651
                   P-value      0.0106           0.5801   0.2304    0.2084
The test statistics based on the kurtosis adjustment did not lead to the rejection of the null hypothesis at all. This reinforces the fact that when the kurtosis estimate is too large, the kurtosis adjustment to the test statistics may not be appropriate. As in Example 1, the test statistics based on the normal distribution tended to yield smaller p-values at each step, favoring the alternative hypothesis.
Chapter 6

Suggestions for Future Research

6.1 Introduction

This study of step-down procedures for the analysis of time-varying covariates does not cover all possible situations, but represents an important starting point in helping clarify the scope of the proposed correlation model under some of the possible variations in the model assumptions. Two main questions remain unsolved, and are suggested for future research:
1. Is it possible to extend the step-down procedure for the case when M-estimates
based on conditional distributions of the responses and stochastic predictors
given the past observations are used, instead of using M-estimates based on the
entire observation of the variables associated with each subject?
2. What is the impact of a general pattern of missing values, instead of a monotone
decreasing one?
Some insights into these issues and related ones are presented in the next sections.
Other topics for future research are to use the proposed methodology to determine
the optimal spacing between covariate measurements and to adapt the procedure for
use in cross-over designs.
6.2 Conditional M-Estimates
It seems natural to do some preliminary analysis on the data gathered at each visit to
study the relationship between the time-varying covariates and the responses. This is
particularly true when the observation of the variables of each visit can be completed
before the end of the multi-visit trial, as in Example 1 in Chapter 5. This poses
no problem if it is assumed that the underlying distribution is normal, as used in
Chapter 2, because the distribution of the test statistic at each step does not depend
upon an unknown parameter related to the variance-covariance structure as in the
case of other elliptically symmetric distributions, and the conditional or unconditional
approaches used to decompose the log likelihood ratio statistic into p - 1 parts to be
tested separately become equivalent. On the other hand, where the distribution of the
observations associated with each individual is assumed to arise from an elliptically
symmetric distribution, and the observations on different individuals are independent,
we considered only the case of the step-down procedure calculated with different estimates of the variance-covariance matrix parameter that are based on the entire
observation of the variables in the trial. Thus, if one decides to use the step-down
procedure with variance-covariance parameters calculated using only the variables observed up to a certain visit, we conjecture that there may be some loss of information
if the data on each individual come from an elliptically symmetric distribution other
than normal. More studies are needed to determine if this loss occurs, and how it is
influenced by the sample size. Along these lines, another natural extension to consider
is that the vector of errors and the vector of stochastic regressors within each individual are independent, besides the independence among different individuals, with
the distribution of the errors coming from "normal mixtures" of the type described
in Little (1988). This case does not require that the entire vector of observations of
the responses be elliptically symmetrically distributed, but just that the conditional
marginals of the responses given the previous responses be elliptically symmetrically
distributed. In that case the conditional variance matrix of the error of the responses
depends on the previous responses through $\kappa(e_{i1\cdot}, \ldots, e_{it-1\cdot})\,\Psi_{e_te_t\cdot e_1\ldots e_{t-1}}$.
The assumption of elliptically symmetric distributions at some point may be
harder to relax in the joint situation of using M-estimation and the proposed step-down procedure. This is due to the fact that the step-down procedure depends heavily
on the linearity of the mean parameters with respect to the previous responses at
each visit under a conditional distribution, or a transformation of the unconditional
distribution. In order to relax the linearity assumption, one has to move away from
M-estimation and consider other strategies such as the theory of R-estimation (rank estimation) and hypothesis testing. One problem with the R-estimation approach
is that it may not be possible to calculate the corresponding test statistics since a
multicollinearity problem on functions of the ranks being taken as regressors is much
more likely to occur.
One might argue that needing the data from all visits to compute some of the
step-down test statistics discussed in this work would be defeating the purpose of
this effort, and is redundant because then the entire log likelihood ratio statistic (or
some robustified version of it) can be easily computed for a simultaneous test of noncorrelation between the responses and stochastic predictors. While this argument has
an intuitive appeal, it does not address two sets of circumstances:
1. The use of a robustified log likelihood statistic method, or its Wald type of
approximation, does not provide any further information in the case when the
null-hypothesis is rejected. Thus, one appropriate action is to split the overall
statistic into p - 1 parts and use the step-down method to verify which visit
contains the test statistic that leads to rejection of the overall null hypothesis.
2. The study of some general situations was needed before one can properly define
a set of reasonable assumptions for the proposed correlation model. A topic
of future research is to develop some robust step-down procedures that do not
require the estimation of the parameters based on all variables in the trial.
This case is probably more suitable in tests of independence between sets of variables, and independence is a more restrictive assumption than being uncorrelated.
6.3 Missing Values
It would be interesting to investigate the influence of the occurrence of a general
pattern of missing values for the responses and covariates on the step-down test
statistics. It may be less of a problem when the missing data pattern is independent
of the covariates and responses and all the data can be used to split the likelihood test
statistic (or some robustified version of it). In the case of a general pattern of missing
data, the information contained for the responses and covariates at each step depends on
the past and future observations of them. However, there are many methods available
(Little, 1992) to calculate the estimates of the parameters involved, and then, the test
statistic can be split into the ones needed to apply the step-down procedure. In that
case, one needs to show that the step-down statistics are independent under the
null-hypothesis and local alternatives, otherwise the estimation process can be more
seriously compromised in terms of loss of information from the future visits.
Bibliography
Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd
edn, New York: Wiley.
Anderson, T. W., Fang, K. T. and Hsu, H. (1990). Maximum likelihood estimates
and likelihood ratio criteria for multivariate elliptically contoured distributions,
in K. T. Fang and T. W. Anderson (eds), Statistical Inference in Elliptically
Contoured and Related Distributions, New York: Allerton, pp. 217-223.
Beran, R. and Srivastava, M. S. (1985). Bootstrap tests and confidence regions for
functions of a covariance matrix, The Annals of Statistics 13: 95-115.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of
covariance structures, British Journal of Mathematical and Statistical Psychology
37: 62-83.
Chinchilli, V. M., Schwab, B. H. and Sen, P. K. (1989). Inference based on ranks for
the multiple-design multivariate linear model, Journal of the American Statistical
Association 84: 517-524.
Conniffe, D. (1982). Covariance analysis and seemingly unrelated equations, The
American Statistician 36: 169-171.
Don, F. J. H. and Magnus, J. R. (1980).
On the unbiasedness of iterated GLS
estimators, Communications in Statistics - Theory and Methods 9: 519-527.
Dwivedi, T. D. and Srivastava, V. K. (1978). Optimality of least squares in the seemingly unrelated regression equation model, Journal of Econometrics 7: 391-395.
Freedman, D. A. and Peters, S. C. (1984). Bootstrapping a regression equation: Some empirical results, Journal of the American Statistical Association 79: 97-106.
Fujisawa, H. (1995). A note on the maximum likelihood estimators for multivariate
normal distribution with monotone data, Communications in Statistics - Theory
and Methods 24: 1377-1382.
Gallant, A. R. (1975). Seemingly unrelated nonlinear regressions, Journal of Econometrics 3: 35-50.
Harville, D. A. (1974). Bayesian inference for variance components using only error
contrasts, Biometrika 61: 383-385.
Harville, D. A. (1976). Extension of the Gauss-Markov theorem to include the estimation of random effects, The Annals of Statistics 4: 384-395.
Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems, Journal of the American Statistical Association
72: 320-340.
Helms, R. W. (1992). Intentionally incomplete longitudinal designs: 1. Methodology
and comparison of some full span designs, Statistics in Medicine 11: 1889-1913.
Huber, P. J. (1981). Robust Statistics, New York: John Wiley.
Jennrich, R. I. and Schluchter, M. D. (1986). Unbalanced repeated measures models with structured covariance matrices, Biometrics 42: 805-820.
Kariya, T. and Sinha, B. K. (1989). Robustness of Statistical Tests, Boston: Academic
Press.
Kendall, M. G. and Stuart, A. (1967). Advanced Theory of Statistics, Vol. 2, 2nd edn,
New York: Hafner.
Kleinbaum, D. G. (1973). Testing linear hypotheses in generalized multivariate linear
models, Communications in Statistics 1: 433-457.
Kmenta, J. and Gilbert, R. F. (1968). Small sample properties of alternative estimators of seemingly unrelated regressions, Journal of the American Statistical Association 63: 1180-1200.
Lange, K. L., Little, R. J. A. and Taylor, J. M. G. (1989). Robust statistical modeling using the t distribution, Journal of the American Statistical Association 84: 881-896.
Lange, K. L. and Sinsheimer, J. S. (1993). Normal/independent distributions and their applications in robust regression, Journal of Computational and Graphical Statistics 2: 175-198.
Lightner, J. M. and O'Brien, R. G. (1984). The MDM model for repeated measures
designs with repeated covariates, American Statistical Association Proceedings
of the Computing Section, pp. 126-131.
Little, R. J. A. (1988). Robust estimation of the mean and covariance matrix with
missing values, Applied Statistics 37: 23-38.
Little, R. J. A. (1992). Regression with X missing: A unified approach, Journal of the American Statistical Association 87: 1227-1237.
Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter, The Annals of Statistics 4: 51-67.
McDonald, L. L. (1975). Tests for the general linear hypothesis under the multiple design multivariate linear model, The Annals of Statistics 3: 461-466.
Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory, New York: Wiley.
Muirhead, R. J. and Waternaux, C. M. (1980). Asymptotic distributions in canonical
correlation analysis and other multivariate procedures for nonnormal populations, Biometrika 67: 31-43.
Muller, K. E. and Barton, C. N. (1989). Approximate power for repeated measures ANOVA lacking sphericity, Journal of the American Statistical Association 84: 549-555.
Muller, K. E., LaVange, L. M., Ramey, S. L. and Ramey, C. T. (1992). Power
calculations for general linear multivariate models including repeated measures
applications, Journal of the American Statistical Association 87: 1209-1226.
Patel, H. I. (1986). Analysis of repeated measures designs with changing covariates in clinical trials, Biometrika 73: 707-715.
Revankar, N. S. (1974). Some finite sample results in the context of two seemingly
unrelated regression equations, Journal of the American Statistical Association
69: 187-190.
Revankar, N. S. (1976). Use of restricted residuals in SUR systems: Some finite
sample results, Journal of the American Statistical Association 71: 183-188.
Rocke, D. M. (1989). Bootstrap Bartlett adjustment in seemingly unrelated regression, Journal of the American Statistical Association 84: 598-601.
Roy, S. N., Gnanadesikan, R. and Srivastava, J. N. (1971). Analysis and Design of Certain Quantitative Multi-Response Experiments, New York: Pergamon Press.
Ruble, W. L. (1968). Improving the computation of simultaneous stochastic linear equations estimates, Agricultural Economics Report no. 116, Michigan State
University, East Lansing, MI.
Saleh, A. K. M. E. and Shiraishi, T. (1993). Robust estimation for the parameters of
multiple-design multivariate linear models under general restriction, Journal of
Nonparametric Statistics 2: 295-305.
Sen, P. K. (1983). A Fisherian detour on the step-down procedure, in P. K. Sen
(ed.), Contributions to Statistics: Essays in Honour of Norman L. Johnson.
Amsterdam: North-Holland, pp. 129-145.
Shapiro, A. and Browne, M. W. (1987). Analysis of covariance structures under elliptical distributions, Journal of the American Statistical Association 82: 1092-1097.
Singer, J. M. and Sen, P. K. (1985). M-methods in multivariate linear models, Journal
of Multivariate Analysis 17: 168-184.
Srivastava, J. N. (1966). Some generalizations of multivariate analysis of variance,
in P. R. Krishnaiah (ed.), Multivariate Analysis, New York: Academic Press,
pp. 129-145.
Srivastava, J. N. (1967). On the extension of the Gauss-Markov theorem to complex
multivariate linear models, Annals of the Institute of Statistical Mathematics
19: 417-437.
Srivastava, V. K. and Dwivedi, T. D. (1979). Estimation of seemingly unrelated
regression equations, Journal of Econometrics 10: 15-32.
Srivastava, V. K. and Giles, D. E. A. (1987). Seemingly Unrelated Regression Equations Models, New York: Marcel Dekker.
Telser, L. G. (1964). Iterative estimation of a set of linear regression equations,
Journal of the American Statistical Association 59: 845-862.
Timm, N. H. (1980). Multivariate analysis of variance of repeated measures, in P. R. Krishnaiah (ed.), Handbook of Statistics, Vol. 1, Amsterdam: North-Holland, pp. 41-87.
Tyler, D. E. (1982). Radial estimates and the test for sphericity, Biometrika 69: 429-436.
Tyler, D. E. (1983).
Robustness and efficiency properties of scatter matrices,
Biometrika 70: 411-420.
Verbyla, A. P. (1988). Analysis of repeated measures designs with changing covariates,
Biometrika 75: 172-174.
Verbyla, A. P. and Venables, W. N. (1988). An extension of the growth curve model,
Biometrika 75: 129-138.
Ware, J. H. (1985). Linear models for the analysis of longitudinal studies, The Amer-
ican Statistician 39: 95-101.
Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias, Journal of the American Statistical Association 57: 348-368. Correction (1972) 67: 255.
Zellner, A. (1963). Estimators for seemingly unrelated regression equations: Some exact finite sample results, Journal of the American Statistical Association 58: 977-992.
Zellner, A. (1972).
Corrigenda, Journal of the American Statistical Association
67: 255.