Derived Variables with Repeated Measurements

Andrew W. Roddam
University of Oxford, Wellcome Trust Centre for the Epidemiology of Infectious Disease
South Parks Road
Oxford, OX1 3PS, UK
Introduction
This paper considers the extension of a technique discussed in Cox & Wermuth (1992) and
Wermuth & Cox (1995) to the case of a multivariate response vector $Y_t = (Y_1, \ldots, Y_q)$,
which is observed at a series of time points $t = 1, \ldots, n$. The aim of the original paper
was to seek a set of linear transformations, $Y^*$, of a multivariate response vector
$Y = (Y_1, \ldots, Y_q)$, such that in the multiple regression of $Y^*_s$ on a set of explanatory
variables $X$, only the coefficient of $X_s$, $s = 1, \ldots, q$, was non-zero. In the current
setting we have repeated measurements of the multivariate response vector $Y_t$, and the aim
is to find a set of linear transformations of $Y_t$ which is constant over the observed time points.
This paper proposes a methodology which estimates a q-dimensional vector of linear
transformations $Y^*_t$ of the original response vector $Y_t$, such that at each time point
$Y^*_t = A Y_t$, $t = 1, \ldots, n$. For the purposes of exposition, we restrict attention to the
case where $n$ is either 2 or 3. We will also consider the natural extension to the case where,
in addition to the repeated measurements, we also have a set of explanatory variables $X$.
Results
In the case of two repeated measurements there is in essence only one possibility
for the evolution of the set of linear relationships: we require $Y^*_{n,s}$ to depend only on
$Y^*_{n-1,s}$, for $s = 1, \ldots, q$. This can be achieved by setting the matrix of regression
coefficients of $Y^*_n$ on $Y^*_{n-1}$ to be diagonal and equal to $D$, say. It can be shown that
this is equivalent to solving $A B_{n,n-1} = D A$ for $A$, where $B_{n,n-1}$ is the matrix of
regression coefficients of $Y_n$ on $Y_{n-1}$. Thus the required solution is that the rows of $A$
are the left eigenvectors of $B_{n,n-1}$, with eigenvalues being the elements of $D$. If there is
a set of explanatory variables $X$, on which arbitrary dependence is defined, then we simply
compute $B_{n,n-1}$ relative to the regression on $X$. It is possible that a computed eigenvalue
will be complex or zero. In the former case, this would imply that the joint dependencies could
not be represented in any simple linear relationship, whereas a zero eigenvalue would imply
that a reduced number of derived variables would be sufficient to define the joint dependence
relationships.
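The eigenvector characterisation for two repeated measurements can be sketched numerically. The following is a minimal illustration only, not the paper's estimation procedure: it assumes centred data, ordinary least squares for the regression coefficients, and no explanatory variables. The function name `left_eigen_transform`, the simulated data and the values in `B_true` are all hypothetical.

```python
import numpy as np

def left_eigen_transform(B):
    """Return (A, d): rows of A are left eigenvectors of B, so A @ B == diag(d) @ A."""
    # Left eigenvectors of B are the right eigenvectors of B transposed.
    d, vecs = np.linalg.eig(B.T)
    return vecs.T, d

# Hypothetical simulated data: q = 2 responses on m = 500 subjects,
# centred at zero so that no intercept is needed.
rng = np.random.default_rng(0)
B_true = np.array([[0.8, 0.3],
                   [0.1, 0.6]])      # illustrative values, eigenvalues 0.9 and 0.5
Y_prev = rng.normal(size=(500, 2))   # rows are subjects at time n-1
Y_curr = Y_prev @ B_true.T + 0.1 * rng.normal(size=(500, 2))

# Least-squares estimate of B in Y_n = B Y_{n-1} + error.
B_hat = np.linalg.lstsq(Y_prev, Y_curr, rcond=None)[0].T

A, d = left_eigen_transform(B_hat)

# Regress the derived variables A Y_n on A Y_{n-1}; the coefficient
# matrix should be diagonal with the eigenvalues on its diagonal.
Ystar_prev, Ystar_curr = Y_prev @ A.T, Y_curr @ A.T
D_hat = np.linalg.lstsq(Ystar_prev, Ystar_curr, rcond=None)[0].T

print("eigenvalues:", d)
print("complex eigenvalues present:", np.iscomplexobj(d))
print("A B = D A holds:", np.allclose(A @ B_hat, np.diag(d) @ A))
```

In this sketch a complex or near-zero entry of `d` would flag the two special cases described in the text; how close to zero counts as zero is a judgment the sketch does not address.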
In the case of three repeated measurements there are a number of different, interesting
possibilities for the evolution of the set of linear relationships. We could assume the Markov
property, i.e. $Y^*_{t,s}$ depends only on $Y^*_{t-1,s}$, and it can be shown that in this case
we require the matrices of regression coefficients $B_{Y_n,Y_{n-1}}$ and $B_{Y_{n-1},Y_{n-2}}$
to have the same left eigenvectors. Alternatively, if we no longer assume the Markov property,
we could require that $Y^*_{n,s}$ depends on both $Y^*_{n-1,s}$ and $Y^*_{n-2,s}$. This can be
shown to be equivalent to solving $A B_{n-1,n-2} = D_{n-1} A$ for $A$, where $D_{n-1}$ is some
diagonal matrix. To ensure that we retain the desired properties, we need to check that this
solution for $A$ satisfies $B_{Y^*_n,Z_{n-1}} = (D_1 \mid D_2)$, where $D_1, D_2$ are diagonal
matrices and $Z_{n-1} = (A Y_{n-1}, A Y_{n-2})^T$. Some techniques will be explored in order
to assess the appropriateness of these two structures for a given data set.
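One simple way to examine the Markov condition, that two matrices of regression coefficients share their left eigenvectors, is sketched below. This is an ad hoc numerical diagnostic and not one of the assessment techniques the paper refers to, which are not specified here. The function names, the matrices `A0`, `B_a`, `B_b`, `B_c` and the use of a Frobenius-norm ratio are all assumptions made for illustration.

```python
import numpy as np

def left_eigvecs(B):
    """Rows of the returned matrix are left eigenvectors of B."""
    vals, vecs = np.linalg.eig(B.T)
    return vecs.T

def offdiag_ratio(B_first, B_second):
    """Relative size of the off-diagonal part of A B_second A^{-1},
    where the rows of A are the left eigenvectors of B_first.
    The ratio is zero when B_second has the same left eigenvectors."""
    A = left_eigvecs(B_first)
    M = A @ B_second @ np.linalg.inv(A)
    off = M - np.diag(np.diag(M))
    return np.linalg.norm(off) / np.linalg.norm(M)

# Hypothetical coefficient matrices built from a common transformation A0,
# so that B_a and B_b share their left eigenvectors (the rows of A0).
A0 = np.array([[1.0, 0.5],
               [-0.4, 1.0]])
A0_inv = np.linalg.inv(A0)
B_a = A0_inv @ np.diag([0.9, 0.5]) @ A0
B_b = A0_inv @ np.diag([0.7, 0.3]) @ A0

# A matrix with different left eigenvectors, for contrast.
B_c = np.array([[0.7, 0.0],
                [0.4, 0.3]])

shared = offdiag_ratio(B_b, B_a)      # numerically zero
different = offdiag_ratio(B_c, B_a)   # clearly non-zero

print("shared eigenvectors:", shared)
print("different eigenvectors:", different)
```

With estimated rather than exact coefficient matrices the ratio will not be exactly zero even when the Markov structure holds, so in practice it would need to be judged against sampling variability.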
Application
The techniques discussed in this paper will be illustrated on a data set of childhood growth
collected between 1972 and 1977. Measurements of the children's height and weight were taken
every 6 months from birth until age 5, and in addition there were a number of background
measurements, including maternal weight gain, smoking, and maternal height, all of which are
known to affect the height and weight of children. We will consider whether it is reasonable to
assume that the set of joint relationships between height and weight is constant throughout the
first five years of life, or whether it is more plausible to consider two different sets of joint
relationships: one set until the child is approximately 1 year old and one set thereafter. We
will also illustrate the cases of two and three repeated measurements, and consider whether, in
the case of three repeated measurements, it would be appropriate to assume the Markov property.
REFERENCES
Cox, D.R. and Wermuth, N. (1992). On the calculation of derived variables in the analysis of
multivariate responses. Journal of Multivariate Analysis 42, 162-170.
Wermuth, N. and Cox, D.R. (1995). Derived variables calculated from similar joint responses:
some characteristics and examples. Computational Statistics & Data Analysis 19, 223-234.
RÉSUMÉ
This paper concerns the development of a technique presented by Cox and Wermuth
(1992). We consider the case of a multivariate response vector $Y_t = (Y_1, \ldots, Y_q)$ observed
at times $t = 1, \ldots, n$. The aim is to estimate a q-dimensional vector that represents the
initial response vector $Y_t$ through a linear combination of the $Y_t$, such that $Y^*_t = A Y_t$,
$t = 1, \ldots, n$. The example illustrated in this paper is limited to the cases where $n$ equals
2 or 3. We will also present the natural extension of this problem in which, in addition to data
repeated over time, there is a set of explanatory variables $X$. A data set on the growth of
children between birth and age 5 will allow us to illustrate this technique with a concrete
example.