
OPTIMAL DESIGN FOR MULTIPLE RESPONSES WITH
VARIANCE DEPENDING ON UNKNOWN PARAMETERS
Valerii Fedorov, Robert Gagnon, and Sergei Leonov
GSK BDS Technical Report 2001-03
August 2001
This paper was reviewed and recommended for publication by
Anthony C. Atkinson
London School of Economics
London, U.K.
John Peterson
Biomedical Data Sciences
GlaxoSmithKline Pharmaceuticals
Upper Merion, PA, U.S.A.
William F. Rosenberger
Department of Mathematics and Statistics
University of Maryland, Baltimore County
Baltimore, MD, U.S.A.
Copyright © 2001 by GlaxoSmithKline Pharmaceuticals
Biomedical Data Sciences
GlaxoSmithKline Pharmaceuticals
1250 South Collegeville Road, PO Box 5089
Collegeville, PA 19426-0989
Optimal Design for Multiple Responses with
Variance Depending on Unknown Parameters
Valerii FEDOROV, Robert GAGNON, and Sergei LEONOV
Biomedical Data Sciences
GlaxoSmithKline Pharmaceuticals
Abstract
We discuss optimal design for multiresponse models with a variance matrix that depends
on unknown parameters. The approach relies on optimization of convex functions of the
Fisher information matrix. We propose iterated estimators which are asymptotically equivalent to maximum likelihood estimators. Combining these estimators with convex design
theory leads to optimal design methods which can be used in the local optimality setting. A
model with experimental costs is introduced which is studied within the normalized design
paradigm and can be applied, for example, to the analysis of clinical trials with multiple
endpoints.
Contents

1 Introduction
2 Regression Models and Maximum Likelihood Estimation
3 Iterated Estimators and Combined Least Squares
  3.1 Multivariate linear regression with unknown but constant covariance matrix
4 Optimal Design of Experiments
  4.1 Dose response model
5 Optimal Designs Under Cost Constraints
  5.1 Two response functions with cost constraints
  5.2 Linear regression with random parameters
6 Discussion
7 Appendix
1 Introduction
In many areas of research, including biomedical studies, investigators are faced with multiresponse models in which variation of the response is dependent upon unknown model parameters.
This is a common issue, for example, in pharmacokinetics, dose response, repeated measures,
time series, and econometrics models. Many estimation methods have been proposed for these
situations, see for example Beal and Sheiner (1988), Davidian and Carroll (1987), Jennrich
(1969), and Lindstrom and Bates (1990). In these models, as in all others, optimal allocation
of resources through experimental design is essential. Optimal designs not only provide statistically efficient estimates of model parameters, but also ensure that investments of time and money are used to their fullest. In many cases, investigators must design studies subject to some type of constraint. One example is a cost constraint, in which the total budget for conducting the study is limited, and the study design must be chosen not only to stay within the budget, but also to ensure that optimal estimation of the parameters is achieved.
In this paper, we introduce an iterated estimator which is asymptotically equivalent to the
maximum likelihood estimator (MLE). This iterated estimator is a natural generalization of
the traditional iteratively reweighted least squares algorithms. It includes not only the squared
deviations of the predicted responses from the observations, but also the squared deviations of
the predicted dispersion matrix from observed residual matrices. In this way, our combined
iterated estimator allows us to construct a natural extension from least squares estimation to
the MLE. We show how to exploit classic optimal design methods and algorithms, and provide
the reader with several examples which include a popular nonlinear dose response model and a
linear random effects model. Finally, a model with experimental costs is introduced and studied
within the framework of normalized designs. Among potential applications of this model is the
analysis of clinical trials with multiple endpoints.
The paper is organized as follows. In Section 2, we introduce the model of observations and
discuss classic results of the maximum likelihood theory. In Section 3, the iterated estimator is
introduced. Section 4 concentrates on optimal design problems. In Section 5, the model with
experimental costs is studied within the normalized design paradigm. We conclude the paper
with the Discussion. The Appendix contains proofs of some technical results.
2 Regression Models and Maximum Likelihood Estimation
In this section, we introduce the multiresponse regression model, with variance matrix depending upon unknown model parameters. Models of this type include repeated measures, random
coefficients, and heteroscedastic regression, among others. We also present a brief review of maximum likelihood estimation theory, concluding with the asymptotic normality of the MLE. Note that the MLE for the regression models described herein does not yield closed-form solutions, except in the simplest of cases. It is necessary, therefore, to resort to iterative procedures, and to rely on the convergence and asymptotic properties of these procedures for estimation and inference.
Let the observed $k \times 1$ vector $y$ have a normal distribution with
$$
\mathrm{E}[y|x] = \eta(x,\theta), \qquad \mathrm{Var}[y|x] = S(x,\theta), \tag{1}
$$
where $\eta(x,\theta) = (\eta_1(x,\theta),\dots,\eta_k(x,\theta))^T$, $S(x,\theta)$ is a $k\times k$ matrix, $x$ are independent variables (predictors), and $\theta \in \Theta \subseteq \mathbb{R}^m$ are unknown parameters. In this case the score function of a single observation $y$ is given by
$$
R(y|x,\theta) = -\frac{1}{2}\,\frac{\partial}{\partial\theta}\left\{ \log|S(x,\theta)| + [y - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[y-\eta(x,\theta)] \right\},
$$
and the corresponding Fisher information matrix is (cf. Magnus and Neudecker (1988, Ch. 6), or Muirhead (1982, Ch. 1))
$$
\mu(x,\theta) = \mu(x,\theta;\eta,S) = \mathrm{Var}[R(y|x,\theta)] = [\mu_{\alpha\beta}(x,\theta)]_{\alpha,\beta=1}^{m}, \tag{2}
$$
$$
\mu_{\alpha\beta}(x,\theta) = \frac{\partial\eta^T(x,\theta)}{\partial\theta_\alpha}\, S^{-1}(x,\theta)\, \frac{\partial\eta(x,\theta)}{\partial\theta_\beta}
+ \frac{1}{2}\,\mathrm{tr}\left[ S^{-1}(x,\theta)\,\frac{\partial S(x,\theta)}{\partial\theta_\alpha}\, S^{-1}(x,\theta)\,\frac{\partial S(x,\theta)}{\partial\theta_\beta} \right].
$$
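To make the definition (2) concrete, the following sketch (not from the paper) evaluates the Fisher information matrix of a single observation by numerical differentiation of $\eta(x,\theta)$ and $S(x,\theta)$; the exponential mean and the variance model used here are hypothetical placeholders.

```python
# A minimal sketch: evaluating the Fisher information matrix (2) of one
# observation by central finite differences of eta(x, theta) and S(x, theta).
# The mean/variance functions are illustrative assumptions only.
import numpy as np

def eta(x, theta):
    return np.array([theta[0] * np.exp(-theta[1] * x)])      # k = 1 response

def S(x, theta):
    return np.array([[theta[2] * eta(x, theta)[0] ** 2]])    # parameter-dependent variance

def fisher_info(x, theta, h=1e-6):
    m = len(theta)
    d_eta, d_S = [], []
    for a in range(m):                       # derivatives with respect to theta_a
        tp, tm = np.array(theta, float), np.array(theta, float)
        tp[a] += h
        tm[a] -= h
        d_eta.append((eta(x, tp) - eta(x, tm)) / (2 * h))
        d_S.append((S(x, tp) - S(x, tm)) / (2 * h))
    S_inv = np.linalg.inv(S(x, theta))
    mu = np.zeros((m, m))
    for a in range(m):
        for b in range(m):                   # formula (2), element by element
            mu[a, b] = (d_eta[a] @ S_inv @ d_eta[b]
                        + 0.5 * np.trace(S_inv @ d_S[a] @ S_inv @ d_S[b]))
    return mu

print(fisher_info(x=1.0, theta=[2.0, 0.5, 0.1]))
```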
In general, the dimension and structure of $y$, $\eta$, and $S$ can vary for different $x$. To indicate this, we should introduce a subscript $k_i$ for every $x_i$, but we do not use it, retaining the traditional notation $y_i$, $\eta(x_i,\theta)$, and $S(x_i,\theta)$ when it does not cause confusion. The log-likelihood function $L_N$ for $N$ independent observations $y_1,\dots,y_N$ can be written as
$$
L_N(\theta) = -\frac{1}{2}\sum_{i=1}^{N}\left\{ \log|S(x_i,\theta)| + [y_i-\eta(x_i,\theta)]^T S^{-1}(x_i,\theta)\,[y_i-\eta(x_i,\theta)] \right\}, \tag{3}
$$
and the information matrix is additive in this case, i.e.
$$
\mu_N(\theta) = \sum_{i=1}^{N} \mu(x_i,\theta).
$$
Any vector $\hat\theta_N$ which maximizes the log-likelihood function $L_N(\theta)$,
$$
\hat\theta_N = \arg\max_{\theta\in\Theta} L_N(\theta), \tag{4}
$$
is called a maximum likelihood estimator (MLE). We introduce the following assumptions:

Assumption 1. The set $\Theta$ is compact; $x_i \in \mathcal{X}$ where $\mathcal{X}$ is compact, and all components of $\eta(x,\theta)$ and $S(x,\theta)$ are continuous with respect to $\theta$ uniformly in $x$, with $S(x,\theta) \ge S_0$ where $S_0$ is a positive definite matrix. The true vector of unknown parameters $\theta^*$ is an internal point of $\Theta$.
Assumption 2. The sum $\sum_i f(x_i,\theta,\theta^*)/N$ converges uniformly in $\theta$ to a continuous function $\nu(\theta,\theta^*)$,
$$
\lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} f(x_i,\theta,\theta^*) = \nu(\theta,\theta^*), \tag{5}
$$
where
$$
f(x,\theta,\theta^*) = \log|S(x,\theta)| + \mathrm{tr}\left[ S^{-1}(x,\theta)\,S(x,\theta^*) \right]
+ [\eta(x,\theta^*) - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[\eta(x,\theta^*) - \eta(x,\theta)],
$$
and the function $\nu(\theta,\theta^*)$ attains its unique minimum at $\theta = \theta^*$.
Following Jennrich (1969, Theorem 6), it can be shown that under Assumptions 1 and 2,
the MLE is a measurable function of observations and is strongly consistent; see also Fedorov
(1974), Heyde (1997, Ch. 12), Pazman (1993), Wu (1981). Condition (5) is based on the fact
that
$$
\mathrm{E}\left\{ [y-\eta(x,\theta)]^T S^{-1}(x,\theta)\,[y-\eta(x,\theta)] \right\}
= [\eta(x,\theta^*)-\eta(x,\theta)]^T S^{-1}(x,\theta)\,[\eta(x,\theta^*)-\eta(x,\theta)] + \mathrm{tr}\left[ S^{-1}(x,\theta)\,S(x,\theta^*) \right],
$$
and the Kolmogorov law of large numbers; see Rao (1973, Ch. 2c.3).
If, in addition to the above assumptions, all components of $\eta(x,\theta)$ and $S(x,\theta)$ are twice differentiable with respect to $\theta$ for all $\theta\in\Theta$, and the limit matrix
$$
\lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} \mu(x_i,\theta^*) = M(\theta^*) \tag{6}
$$
exists and is regular, then $\hat\theta_N$ is asymptotically normally distributed, i.e.
$$
\sqrt{N}\,(\hat\theta_N - \theta^*) \sim N\!\left(0,\, M^{-1}(\theta^*)\right). \tag{7}
$$
Note that the selection of the sequence $\{x_i\}_1^N$ is crucial for the consistency and precision of $\hat\theta_N$.

Remark 1. Given $N$ and $\{x_i\}_1^N$, a design measure can be defined as
$$
\xi_N(x) = \frac{1}{N}\sum_{i=1}^{N} \delta_{x_i}(x), \qquad \delta_{x}(z) = \{1 \text{ if } z=x, \text{ and } 0 \text{ otherwise}\}.
$$
If the sequence $\{\xi_N(x)\}$ weakly converges to $\xi(x)$, then the limiting function $\nu(\theta,\theta^*)$ in the "identifiability" Assumption 2 can be presented as
$$
\nu(\theta,\theta^*) = \int f(x,\theta,\theta^*)\,d\xi(x),
$$
cf. Malyutov (1988). Most often, within the optimal design paradigm, the limit design is a discrete measure, i.e. a collection of support points $\{x_j,\; j=1,\dots,n\}$ with weights $p_j$ such that $\sum_j p_j = 1$; see Section 4.
3 Iterated Estimators and Combined Least Squares
If the dispersion matrices of the observed $y$ are known, i.e. $S(x_i,\theta) = S(x_i)$, then (3) leads to what is usually called the generalized least squares estimator:
$$
\tilde\theta_N = \arg\min_{\theta\in\Theta} \sum_{i=1}^{N} [y_i-\eta(x_i,\theta)]^T S^{-1}(x_i)\,[y_i-\eta(x_i,\theta)], \tag{8}
$$
which is well studied. When the variance function $S$ depends on $\theta$, it is tempting to replace (8) by
$$
\tilde\theta_N = \arg\min_{\theta\in\Theta} \sum_{i=1}^{N} [y_i-\eta(x_i,\theta)]^T S^{-1}(x_i,\theta)\,[y_i-\eta(x_i,\theta)], \tag{9}
$$
which in general is not consistent; cf. Fedorov (1974), Muller (1998).
Resorting to iteratively reweighted least squares (IRLS), see Beal and Sheiner (1988), Vonesh and Chinchilli (1997),
$$
\tilde\theta_N = \lim_{t\to\infty}\theta_t, \qquad
\theta_t = \arg\min_{\theta\in\Theta} \sum_{i=1}^{N} [y_i-\eta(x_i,\theta)]^T S^{-1}(x_i,\theta_{t-1})\,[y_i-\eta(x_i,\theta)], \tag{10}
$$
leads to a strongly consistent estimator with asymptotic dispersion matrix
$$
\mathrm{Var}\,\tilde\theta_N \simeq \left[ \sum_{i=1}^{N} \frac{\partial\eta^T(x_i,\theta)}{\partial\theta}\, S^{-1}(x_i,\theta)\, \frac{\partial\eta(x_i,\theta)}{\partial\theta^T} \right]^{-1}_{\theta=\tilde\theta_N},
$$
which is bigger, in terms of the non-negative definite matrix ordering, than the corresponding matrix for the MLE defined in (6). For a discussion of related issues, see Jobson and Fuller (1980), Malyutov (1982).
The natural step after (10) is the introduction of the combined iterated least squares estimator, which includes the squared deviations of the predicted dispersion matrix $S(x,\theta)$ from the observed residual matrices:
$$
\hat\theta_N = \lim_{t\to\infty}\theta_t, \tag{11}
$$
where
$$
\theta_t = \arg\min_{\theta\in\Theta} v_N^2(\theta;\theta_{t-1}),
$$
$$
v_N^2(\theta;\theta^0) = \sum_{i=1}^{N} [y_i-\eta(x_i,\theta)]^T S^{-1}(x_i,\theta^0)\,[y_i-\eta(x_i,\theta)]
+ \frac{1}{2}\sum_{i=1}^{N} \mathrm{tr}\left\{ \left( [y_i-\eta(x_i,\theta^0)]\,[y_i-\eta(x_i,\theta^0)]^T - \Delta(x_i,\theta,\theta^0) - S(x_i,\theta) \right) S^{-1}(x_i,\theta^0) \right\}^2, \tag{12}
$$
where
$$
\Delta(x_i,\theta,\theta^0) = [\eta(x_i,\theta) - \eta(x_i,\theta^0)]\,[\eta(x_i,\theta) - \eta(x_i,\theta^0)]^T.
$$
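For illustration only, the recursion (11)-(12) can be sketched for a scalar response as follows. The mean function, variance function, and simulated data are hypothetical stand-ins rather than the paper's examples, and scipy's general-purpose optimizer stands in for whichever minimization routine one would actually use for $\theta_t = \arg\min_\theta v_N^2(\theta;\theta_{t-1})$.

```python
# A minimal sketch of the combined iterated least-squares recursion (11)-(12)
# for k = 1 (traces reduce to squares); all model choices below are assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.5, 4.0, 40)

def eta(x, th):                 # hypothetical mean function
    return th[0] * np.exp(-th[1] * x)

def S(x, th):                   # variance proportional to the squared mean
    return th[2] * eta(x, th) ** 2

theta_true = np.array([2.0, 0.5, 0.05])
y = eta(x, theta_true) + rng.normal(scale=np.sqrt(S(x, theta_true)))

def v2(th, th0):                # criterion (12) for a scalar response
    r, r0 = y - eta(x, th), y - eta(x, th0)
    delta = (eta(x, th) - eta(x, th0)) ** 2
    s, s0 = S(x, th), S(x, th0)
    return np.sum(r ** 2 / s0) + 0.5 * np.sum(((r0 ** 2 - delta - s) / s0) ** 2)

theta = np.array([1.0, 1.0, 0.1])               # starting point theta_0
for _ in range(20):                             # theta_t = argmin v2(., theta_{t-1})
    theta = minimize(v2, theta, args=(theta,), method="Nelder-Mead").x
print(theta)
```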
To prove the convergence of the combined iterated estimator, together with Assumption 1 we need the following:

Assumption 3. The variance function satisfies $S(x,\theta) \le S_1$ for all $x\in\mathcal{X}$ and $\theta\in\Theta$, where $S_1$ is a positive definite matrix. The design measures $\xi_N(x)$ converge weakly to $\xi(x)$, and the function
$$
\tilde\nu(\theta,\theta^*) = \int \tilde f(x,\theta,\theta^*)\,d\xi(x)
$$
is continuous with respect to $\theta$ and attains its unique minimum at $\theta = \theta^*$, where
$$
\tilde f(x,\theta,\theta^*) = [\eta(x,\theta^*) - \eta(x,\theta)]^T S_1^{-1}\,[\eta(x,\theta^*) - \eta(x,\theta)] + \mathrm{tr}\left\{ [S(x,\theta^*) - S(x,\theta)]\,S_1^{-1} \right\}^2.
$$
The following theorem establishes the asymptotic equivalence of the combined iterated estimator (11), (12) and the MLE (4). The introduction of stationary points of the log-likelihood function in the statement of the theorem is similar to Cramer's (1946) definition of the MLE.

Theorem 1. Under the regularity Assumptions 1 and 3,
$$
\lim_{N\to\infty} P\{\hat\theta_N \in \Theta_N\} = 1,
$$
where $\Theta_N$ is the set of stationary points of the log-likelihood function $L_N(\theta)$,
$$
\Theta_N = \left\{ \theta : \frac{\partial L_N(\theta)}{\partial\theta_j} = 0,\; j = 1,\dots,m \right\}.
$$
The proof of the theorem is postponed to the Appendix.
Remark 2. The introduction of the term $\Delta(x_i,\theta,\theta^0)$ in (12), together with Assumption 3, guarantees that for any $\theta^0$ the unique minimum of $\lim_{N\to\infty} \mathrm{E}\left[ v_N^2(\theta;\theta^0) \right]/N$ with respect to $\theta$ is attained at $\theta = \theta^*$; see Lemma 1 in the Appendix for details. Note that if the iterated estimator (11), (12) converges, then $\Delta(x_i,\theta_t,\theta_{t-1}) \to 0$ as $t\to\infty$. Therefore, in some situations this term can be omitted if the starting point $\theta_0$ is close enough to the true value; see Section 3.1 for an example.

Remark 3. If the function $\tilde\nu(\theta,\theta^*)$ defined in Assumption 3 attains its unique minimum at $\theta = \theta^*$, then so does the function $\nu(\theta,\theta^*) = \int f(x,\theta,\theta^*)\,d\xi(x)$; see Remark 1. To verify this, note that if $S$ and $S^*$ are positive definite matrices, then the matrix function
$$
g(S) = \log|S| + \mathrm{tr}[S^{-1} S^*]
$$
attains its unique minimum at $S = S^*$; see Seber (1984, Appendix A7).
Remark 4. If in (12) one chooses $\theta^0 = \theta$ and, similarly to (9), considers
$$
\hat\theta_N = \arg\min_{\theta\in\Theta} v_N^2(\theta;\theta),
$$
then, using the approach described in Fedorov (1974), it can be verified that this "non-reweighted" estimator $\hat\theta_N$ is, in general, inconsistent.
3.1 Multivariate linear regression with unknown but constant covariance matrix
Let $y$ be a $k\times 1$ normally distributed vector with $\mathrm{E}[y|x] = F^T(x)\beta$, $\mathrm{Var}[y|x] = S$, i.e.
$$
\theta^T = (\beta^T, S_{11}, S_{21},\dots,S_{k1}, S_{22}, S_{32},\dots,S_{k2},\dots,S_{kk}) = (\beta^T, \mathrm{vech}^T(S)),
$$
with $F(x) = [f_1(x),\dots,f_k(x)]$. We follow Harville (1997, Ch. 16.4) in using the notation vech for element-wise "vectorization" of a $k\times k$ symmetric matrix. The simplest estimator of the regression parameter $\beta$ is given by
$$
\tilde\beta_N = \left[ \sum_{i=1}^{N} F(x_i)\,F^T(x_i) \right]^{-1} \sum_{i=1}^{N} F(x_i)\,y_i,
$$
which is unbiased (though not efficient) and thus provides a choice of $\theta^0$ which allows for dropping the term $\Delta(x_i,\theta,\theta^0)$ on the right-hand side of (12); see Remark 2. In this case, the modified function $v_N^2(\theta;\theta^0)$ can be presented as
$$
v_N^2(\theta;\theta^0) = \sum_{i=1}^{N} \left[ y_i - F^T(x_i)\beta \right]^T (S^0)^{-1} \left[ y_i - F^T(x_i)\beta \right]
+ \frac{1}{2}\sum_{i=1}^{N} \mathrm{tr}\left\{ \left( [y_i - F^T(x_i)\beta^0]\,[y_i - F^T(x_i)\beta^0]^T - S \right)(S^0)^{-1} \right\}^2. \tag{13}
$$
The parameters $\beta$ and $S$ in (13) are nicely partitioned, and if $k_i \equiv k$, then each step of the iterative procedure (11) can be presented in closed form,
$$
\beta_t = M_{t-1}^{-1}\,Y_{t-1} \quad \text{and} \quad S_t = N^{-1}\sum_{i=1}^{N} \left[ y_i - F^T(x_i)\beta_{t-1} \right]\left[ y_i - F^T(x_i)\beta_{t-1} \right]^T, \tag{14}
$$
where
$$
M_t = \sum_{i=1}^{N} F(x_i)\,S_t^{-1}\,F^T(x_i), \qquad Y_t = \sum_{i=1}^{N} F(x_i)\,S_t^{-1}\,y_i,
$$
cf. Fedorov (1977). Consequently,
$$
\hat\beta_N = \lim_{t\to\infty}\beta_t \quad \text{and} \quad \hat S_N = \lim_{t\to\infty} S_t.
$$
The information matrix of $\hat\theta_N^T = (\hat\beta_N^T, \mathrm{vech}^T(\hat S_N))$ is blockwise with respect to $\beta$ and $\mathrm{vech}(S)$; therefore the asymptotic dispersion matrices can be computed separately. For instance,
$$
\mathrm{Var}(\hat\beta_N) \simeq \left[ \sum_{i=1}^{N} F(x_i)\,\hat S_N^{-1}\,F^T(x_i) \right]^{-1}.
$$
Note that the second formula on the right-hand side of (14) is valid only if all $k$ components are measured at all points $x_i$. Otherwise, $S_t$ cannot be presented in closed form in any nondiagonal case.
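As a sketch of the alternating closed-form updates (14), the following code fits the multivariate linear model of this subsection on simulated data; the basis functions, design points, and true parameter values are hypothetical choices made only for the illustration.

```python
# A sketch of the closed-form iteration (14): alternate between the residual
# covariance estimate and the generalized least-squares update of beta.
import numpy as np

rng = np.random.default_rng(0)
k, N = 2, 200                                   # responses, observations

def F(x):                                       # m x k matrix [f_1(x), f_2(x)]
    return np.array([[1.0, 1.0],
                     [x,   0.0],
                     [0.0, x]])

beta_true = np.array([1.0, 2.0, -1.5])
S_true = np.array([[1.0, 0.4],
                   [0.4, 0.8]])
xs = rng.uniform(-1, 1, N)
ys = [F(x).T @ beta_true + rng.multivariate_normal(np.zeros(k), S_true) for x in xs]

# ordinary least squares start (unbiased), then iterate until convergence
beta = np.linalg.solve(sum(F(x) @ F(x).T for x in xs),
                       sum(F(x) @ y for x, y in zip(xs, ys)))
for _ in range(50):
    S = sum(np.outer(y - F(x).T @ beta, y - F(x).T @ beta)
            for x, y in zip(xs, ys)) / N        # residual covariance, as in (14)
    Sinv = np.linalg.inv(S)
    M = sum(F(x) @ Sinv @ F(x).T for x in xs)
    Y = sum(F(x) @ Sinv @ y for x, y in zip(xs, ys))
    beta = np.linalg.solve(M, Y)                # reweighted least-squares step
print(np.round(beta, 3), "\n", np.round(S, 3))
```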
4 Optimal Design of Experiments
As soon as the information matrix of a single measurement is defined, it is a rather straightforward task to construct numerical algorithms and derive their properties in the optimal design theory setting. Let $n_i$ measurements be taken at point $x_i$, and let $\sum_{i=1}^{n} n_i = N$, where $n$ is the number of distinct $x_i$. Let the design $\xi_N$ be defined as
$$
\xi_N = \left\{ (n_i, x_i)_1^n;\; \sum_{i=1}^{n} n_i = N,\; x_i \in \mathcal{X} \right\},
$$
where $\mathcal{X}$ is a design region. Each design generates the information matrix
$$
M_N(\theta) = \sum_i n_i\,\mu(x_i,\theta) = N\sum_{i=1}^{n} p_i\,\mu(x_i,\theta) = N M(\xi,\theta), \quad \text{where} \quad M(\xi,\theta) = \sum_{i=1}^{n} p_i\,\mu(x_i,\theta),
$$
the weights $p_i = n_i/N$, $M(\xi,\theta)$ is a normalized information matrix, and $\xi = \{x_i, p_i\}$ is a normalized (continuous, or approximate) design. In this setting $N$ may be viewed as a resource available to a practitioner; see Section 5 for a different normalization. In convex design theory, it is standard to allow the weights $p_i$ to vary continuously. The goal is to minimize various functionals $\Psi$ of the normalized variance matrix $M^{-1}(\xi,\theta)$,
$$
\xi^* = \arg\min_{\xi} \Psi\left[ M^{-1}(\xi,\theta) \right].
$$
Among popular optimality criteria are:
the D-criterion, $\Psi = \log|M^{-1}(\xi,\theta)|$, which is often called a generalized variance and is related to the volume of the confidence ellipsoid;
the A-criterion (linear), $\Psi = \mathrm{tr}\left[ A\,M^{-1}(\xi,\theta) \right]$, where $A$ is an $m\times m$ non-negative definite matrix and $m$ is the number of unknown parameters in the model.
Using classic optimal design techniques, one can establish an analog of the generalized equivalence theorem. A necessary and sufficient condition for a design $\xi^*$ to be locally optimal is that, for all $x\in\mathcal{X}$, the inequality
$$
\psi(x,\xi^*,\theta) = \mathrm{tr}\left[ \mu(x,\theta)\,M^{-1}(\xi^*,\theta) \right] \le m, \qquad m = \dim\theta, \tag{15}
$$
holds in the case of the D-criterion, and
$$
\psi(x,\xi^*,\theta) = \mathrm{tr}\left[ \mu(x,\theta)\,M^{-1}(\xi^*,\theta)\,A\,M^{-1}(\xi^*,\theta) \right] \le \mathrm{tr}\left[ A\,M^{-1}(\xi^*,\theta) \right] \tag{16}
$$
in the case of the linear criterion. Equality in (15), or (16), is attained at the support points of the optimal design $\xi^*$. The function $\psi(x,\xi,\theta)$ is often called the sensitivity function of the corresponding criterion; see Fedorov and Hackl (1997, Ch. 2).

For a single response function, $k=1$, the sensitivity function of the D-criterion can be rewritten as
$$
\psi(x,\xi,\theta) = \frac{d_1(x,\xi,\theta)}{S(x,\theta)} + \frac{d_2(x,\xi,\theta)}{2S^2(x,\theta)}, \tag{17}
$$
where
$$
d_1(x,\xi,\theta) = \frac{\partial\eta(x,\theta)}{\partial\theta^T}\, M^{-1}(\xi,\theta)\, \frac{\partial\eta(x,\theta)}{\partial\theta}, \qquad
d_2(x,\xi,\theta) = \frac{\partial S(x,\theta)}{\partial\theta^T}\, M^{-1}(\xi,\theta)\, \frac{\partial S(x,\theta)}{\partial\theta}, \tag{18}
$$
see Downing et al. (2001). The scalar case was extensively discussed by Atkinson and Cook (1995) for various partitioning schemes of the parameter vector $\theta$, including separate and overlapping parameters in the variance function.
Example 3.1 (continued). As mentioned above, in this example the information matrix is blockwise with respect to $\beta$ and $S$, i.e.
$$
\mu(x,\theta) = \begin{pmatrix} \mu_\beta(x) & 0 \\ 0 & \mu_{SS} \end{pmatrix}, \qquad
M(\xi,\theta) = \begin{pmatrix} M_\beta(\xi,\theta) & 0 \\ 0 & \mu_{SS} \end{pmatrix},
$$
where $\mu_\beta(x) = F(x)\,S^{-1}\,F^T(x)$, $M_\beta(\xi,\theta) = \sum_i p_i\,\mu_\beta(x_i)$, the 0's are zero matrices of proper size, and $\mu_{SS}$ does not depend on $x$. Therefore, (15) admits the following presentation:
$$
\mathrm{tr}\left[ S^{-1}\,F^T(x)\,M_\beta^{-1}(\xi^*,\theta)\,F(x) \right] \le \dim\beta. \tag{19}
$$
The matrix $d_1 = F^T(x)\,M_\beta^{-1}(\xi^*,\theta)\,F(x)$ is the asymptotic dispersion matrix of the predicted response vector $F^T(x)\hat\beta$ at the point $x$.
Note that in general the design $\xi^*$ may depend on $S$, i.e. (15) or (16) lead to a locally optimal design. Formulas like (15) and (16) provide a basis for first-order numerical algorithms similar to those discussed in major texts on experimental design; cf. Atkinson and Donev (1992), Fedorov and Hackl (1997). While not discussing these algorithms in general, we provide some special cases in the examples.
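As one possible illustration of such a first-order algorithm, the sketch below runs a simple Fedorov-Wynn type weight-update scheme for a locally D-optimal design on a finite grid, using the sensitivity function (15) and the equivalence-theorem bound as a stopping rule. The single-response information matrix $\mu(x) = f(x)f^T(x)$ used here is a placeholder assumption, not one of the paper's models.

```python
# A minimal sketch of a first-order exchange algorithm for local D-optimality
# on a candidate grid, driven by the sensitivity function (15).
import numpy as np

def mu(x):
    f = np.array([1.0, x, x ** 2])          # hypothetical regression functions
    return np.outer(f, f)

grid = np.linspace(-1.0, 1.0, 201)          # candidate design region X
w = np.full(len(grid), 1.0 / len(grid))     # start from the uniform design

for step in range(1, 2001):
    M = sum(wi * mu(xi) for wi, xi in zip(w, grid))
    Minv = np.linalg.inv(M)
    psi = np.array([np.trace(mu(xi) @ Minv) for xi in grid])  # sensitivity (15)
    j = int(np.argmax(psi))
    alpha = 1.0 / (step + 1)                # standard diminishing step length
    w = (1 - alpha) * w                     # move mass toward the best point
    w[j] += alpha
    if psi[j] <= M.shape[0] + 1e-3:         # equivalence-theorem stopping rule
        break

support = grid[w > 1e-3]
print(support, np.round(w[w > 1e-3], 3))
```

For the quadratic basis on $[-1,1]$ the mass accumulates near the three points $\{-1, 0, 1\}$, in line with the classical D-optimal design for this model.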
4.1 Dose response model
In dose response studies, the response is often described by a logistic function
$$
\eta(x,\theta) = \eta(x,\gamma) = \gamma_1 + \frac{\gamma_2 - \gamma_1}{1 + (x/\gamma_3)^{\gamma_4}}, \tag{20}
$$
where $x$ is a given dose. The power model is a popular choice for the variance function,
$$
S(x,\theta) = \delta_1\,\eta^{\delta_2}(x,\gamma), \qquad \theta^T = (\gamma^T, \delta^T). \tag{21}
$$
To illustrate D-optimal design for this model, we use data from a study of the ability of a compound to inhibit the proliferation of bone marrow erythroleukemia cells in a cell-based assay; see Downing et al. (2001). The vector of unknown parameters was estimated by fitting the data collected from a two-fold dilution design covering a range of concentrations from 0.98 to 500 ng/ml. Thus the design region was set to $[-0.02, 6.21]$ on the log scale. Fig. 1 presents the locally optimal design and the variance of prediction for the model (20), (21), where $\hat\theta = (616, 1646, 75.2, 1.34, 0.33, 0.90)^T$.
Figure 1: Upper left: response. Upper right: normalized variance for the uniform design. Lower right: normalized variance for the optimal design. Lower left: unnormalized variance (triangles: optimal design, circles: serial dilution).

The two subplots on the right side of Fig. 1 present the normalized variance of prediction $\psi(x,\xi,\theta)$ defined in (17) for a serial dilution design $\xi_0$ (i.e. uniform on the log scale, upper right) and for the D-optimal design $\xi^*$ (lower right). The solid lines show the function $\psi(x,\xi,\theta)$, while the dashed and dotted lines display the first and second terms on the right-hand side of (17), respectively. The unnormalized variance of prediction $d_1(x,\xi,\theta)$ defined in (18) is given in the lower-left subplot. It is worth noting that the optimal design in our example is supported at just four points, which is less than the number of estimated parameters. We also remark that the weights of the support points are not equal: in our example $p = \{0.28, 0.22, 0.22, 0.28\}$.
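The decomposition (17)-(18) for the model (20)-(21) can be sketched numerically as follows; derivatives are taken by finite differences, and the uniform (serial-dilution-type) design on the log-dose scale is an assumption used only to mimic the $\xi_0$ panel, not the exact design of the study.

```python
# A sketch of the D-criterion sensitivity decomposition (17)-(18) for the
# logistic / power-variance model (20)-(21), with numerical derivatives and a
# hypothetical uniform design on the log-dose scale.
import numpy as np

theta = np.array([616.0, 1646.0, 75.2, 1.34, 0.33, 0.90])   # (gamma, delta)

def eta(logx, th):
    x = np.exp(logx)
    return th[0] + (th[1] - th[0]) / (1.0 + (x / th[2]) ** th[3])

def Svar(logx, th):
    return th[4] * eta(logx, th) ** th[5]

def grad(fun, logx, th, h=1e-5):             # relative central differences
    g = np.zeros(len(th))
    for a in range(len(th)):
        tp, tm = th.copy(), th.copy()
        tp[a] *= 1 + h
        tm[a] *= 1 - h
        g[a] = (fun(logx, tp) - fun(logx, tm)) / (tp[a] - tm[a])
    return g

def mu(logx, th):                            # scalar-response case of (2)
    ge, gs = grad(eta, logx, th), grad(Svar, logx, th)
    return (np.outer(ge, ge) / Svar(logx, th)
            + 0.5 * np.outer(gs, gs) / Svar(logx, th) ** 2)

design = np.linspace(-0.02, 6.21, 10)        # xi_0: uniform on the log scale
M = sum(mu(z, theta) for z in design) / len(design)
Minv = np.linalg.inv(M)

for z in np.linspace(-0.02, 6.21, 8):        # evaluate psi over the region
    d1 = grad(eta, z, theta) @ Minv @ grad(eta, z, theta)
    d2 = grad(Svar, z, theta) @ Minv @ grad(Svar, z, theta)
    psi = d1 / Svar(z, theta) + d2 / (2 * Svar(z, theta) ** 2)
    print(round(z, 2), round(psi, 2))
```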
5 Optimal Designs Under Cost Constraints
Traditionally, when normalized designs are discussed, the normalization factor is equal to the number of experiments $N$; see Section 4. Now let each measurement at point $x_i$ be associated with a cost $c(x_i)$, and let there be a restriction on the total cost,
$$
\sum_{i=1}^{n} n_i\,c(x_i) \le C. \tag{22}
$$
In this case it is quite natural to normalize the information matrix by the total cost $C$ and introduce
$$
M_C(\xi,\theta) = \frac{M_N(\theta)}{C} = \sum_i w_i\,\tilde\mu(x_i,\theta), \quad \text{with} \quad w_i = \frac{n_i\,c(x_i)}{C}, \quad \tilde\mu(x,\theta) = \frac{\mu(x,\theta)}{c(x)}. \tag{23}
$$
Note that the considered case should not be confused with the case when, in addition to (23), one also imposes that $\sum_i n_i \le N$. The corresponding design problem is more complicated and must be addressed as discussed in Cook and Fedorov (1995).
As soon as the cost function $c(x)$ is defined, one can use the well-elaborated techniques for constructing continuous designs for various criteria of optimality,
$$
\Psi\left[ M_C^{-1}(\xi,\theta) \right] \to \min_{\xi}, \quad \text{where} \quad \xi = \{w_i, x_i\}.
$$
As usual, to obtain the frequencies $n_i$, the values $\tilde n_i = w_i C / c(x_i)$ have to be rounded to the nearest integers $n_i$ subject to $\sum_i n_i c(x_i) \le C$; for details on rounding, see Pukelsheim (1993, Ch. 12).
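A minimal sketch of this cost-based rounding, with hypothetical weights and costs, is given below; it floors the continuous frequencies and then greedily spends the remaining budget, which is one simple possibility (for careful apportionment procedures see the Pukelsheim reference above).

```python
# A small sketch: convert cost-normalized weights w_i into frequencies
# n_i ~ w_i * C / c(x_i), rounded subject to the total-cost budget (22).
C    = 100.0                                 # total budget (hypothetical)
x    = [0.0, 1.0, 2.5]                       # support points
w    = [0.5, 0.3, 0.2]                       # cost-normalized weights, sum to 1
cost = [2.0, 1.0, 4.0]                       # per-measurement costs c(x_i)

n = [int(wi * C / ci) for wi, ci in zip(w, cost)]    # round down first
while True:                                          # greedily spend what is left
    spent = sum(ni * ci for ni, ci in zip(n, cost))
    affordable = [i for i, ci in enumerate(cost) if spent + ci <= C]
    if not affordable:
        break
    # add a measurement where the realized cost share lags the target weight most
    deficit = [w[i] - n[i] * cost[i] / C for i in affordable]
    n[affordable[deficit.index(max(deficit))]] += 1
print(n, sum(ni * ci for ni, ci in zip(n, cost)))
```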
To illustrate the potential of this approach, we construct D-optimal designs for the two-dimensional response function $\eta(x,\theta) = [\eta_1(x,\theta), \eta_2(x,\theta)]^T$, with the variance matrix
$$
S(x,\theta) = \begin{pmatrix} S_{11}(x,\theta) & S_{12}(x,\theta) \\ S_{12}(x,\theta) & S_{22}(x,\theta) \end{pmatrix}. \tag{24}
$$
Let a single measurement of the function $\eta_i(x,\theta)$ cost $c_i(x)$, $i = 1, 2$. Additionally, we impose a cost $c_v(x)$ on any single or pair of measurements. The rationale behind this model comes from considering a hypothetical visit of a patient to the clinic to participate in a clinical trial. It is assumed that each visit costs $c_v(x)$, where $x$ denotes a patient (or, more appropriately, some patient's characteristics). There are three options for each patient:

(1) Take test $t_1$, which by itself costs $c_1(x)$; the total cost of this option is $C_1(x) = c_v(x) + c_1(x)$.
(2) Take test $t_2$, which costs $c_2(x)$; the total cost is $C_2(x) = c_v(x) + c_2(x)$.
(3) Take both tests $t_1$ and $t_2$; in this case the cost is $C_3(x) = c_v(x) + c_1(x) + c_2(x)$.
Another interpretation could be measuring a pharmacokinetic profile (blood concentration) at one or two time points.

To consider this example within the traditional framework, introduce binary variables $x_1$ and $x_2$, $x_i = \{0 \text{ or } 1\}$, $i = 1, 2$. Let $X = (x, x_1, x_2)$, where $x$ belongs to a "traditional" design region $\mathcal{X}$, and the pair $(x_1, x_2)$ belongs to
$$
\mathcal{X}_{12} = \{(x_1, x_2) : x_i = 0 \text{ or } 1,\; \max(x_1, x_2) = 1\}.
$$
Define
$$
\eta(X,\theta) = I_{x_1,x_2}\,\eta(x,\theta), \qquad S(X,\theta) = I_{x_1,x_2}\,S(x,\theta)\,I_{x_1,x_2}, \quad \text{where} \quad I_{x_1,x_2} = \begin{pmatrix} x_1 & 0 \\ 0 & x_2 \end{pmatrix}. \tag{25}
$$
Now introduce the "extended" design region $\mathbb{X}$,
$$
\mathbb{X} = \mathcal{X} \times \mathcal{X}_{12} = Z_1 \cup Z_2 \cup Z_3, \tag{26}
$$
where
$$
Z_1 = \{(x, x_1, x_2) : x \in \mathcal{X},\; x_1 = 1,\; x_2 = 0\}, \qquad
Z_2 = \{(x, x_1, x_2) : x \in \mathcal{X},\; x_1 = 0,\; x_2 = 1\},
$$
$$
Z_3 = \{(x, x_1, x_2) : x \in \mathcal{X},\; x_1 = x_2 = 1\}, \qquad \text{and} \qquad Z_1 \cap Z_2 \cap Z_3 = \emptyset.
$$
The normalized information matrix $M_C(\xi,\theta)$ and the design $\xi$ are defined as
$$
M_C(\xi,\theta) = \sum_{i=1}^{n} w_i\,\tilde\mu(X_i,\theta), \qquad \sum_{i=1}^{n} w_i = 1, \qquad \xi = \{X_i, w_i\},
$$
where $\tilde\mu(X,\theta) = \mu(X,\theta)/C_i(x)$ if $X \in Z_i$, $i = 1, 2, 3$, with $\mu(X,\theta)$ defined in (2), and $\eta(X,\theta)$, $S(X,\theta)$ introduced in (25).

Note that the generalization to $k > 2$ is straightforward: one has to introduce $k$ binary variables $x_i$ and the matrix $I_{x_1,\dots,x_k} = \mathrm{diag}(x_i)$. The number of subregions $Z_i$ in this case is equal to $2^k - 1$.
The formulation (26) will be used in the example below to demonstrate the performance of various designs with respect to the sensitivity function $\psi(X,\xi,\theta)$, $X \in Z_i$.
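A small sketch of this construction for $k = 2$ is given below: for a candidate $x$, each of the three visit options $Z_1$, $Z_2$, $Z_3$ produces a cost-normalized information matrix. The linear response model, covariance matrix, and costs are placeholders, and only the block of the information matrix for the mean parameters is computed (the variance parameters are held fixed).

```python
# A sketch of the extended-design-region construction (25)-(26) for k = 2:
# cost-normalized information matrices mu(X)/C_i(x) for the three options.
import numpy as np

def F(x):                                     # 3 parameters, 2 responses (hypothetical)
    return np.array([[1.0, 1.0],
                     [x,   0.0],
                     [0.0, x]])               # columns: f_1(x), f_2(x)

S  = np.array([[1.0, 0.3],
               [0.3, 1.0]])                   # constant response covariance
cv, c1, c2 = 1.0, 0.5, 0.5                    # visit and per-test costs

def info_normalized(x, tests):                # tests = (x1, x2) binary indicators
    I = np.diag(tests).astype(float)          # I_{x1,x2} as in (25)
    # pseudo-inverse of I S I keeps only the measured components
    mu_full = F(x) @ I @ np.linalg.pinv(I @ S @ I) @ I @ F(x).T
    cost = cv + tests[0] * c1 + tests[1] * c2
    return mu_full / cost

x = 0.7
for label, tests in [("Z1: test 1 only", (1, 0)),
                     ("Z2: test 2 only", (0, 1)),
                     ("Z3: both tests",  (1, 1))]:
    print(label, "\n", np.round(info_normalized(x, tests), 3))
```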
Figure 2: Response functions $\eta_1(x,\theta) = \gamma_1 + \gamma_2 x + \gamma_3 x^2 + \gamma_4 x^3$ and $\eta_2(x,\theta) = \gamma_1 + \gamma_5 x + \gamma_6 x_+$; parameter $\gamma = (1, 2, 3, -1, 2, -1.5)^T$.
5.1 Two response functions with cost constraints
For the example, we selected the functions
$$
\eta_1(x,\theta) = \gamma_1 + \gamma_2 x + \gamma_3 x^2 + \gamma_4 x^3 = F_1^T(x)\gamma, \qquad
\eta_2(x,\theta) = \gamma_1 + \gamma_5 x + \gamma_6 x_+ = F_2^T(x)\gamma,
$$
where $F_1(x) = (1, x, x^2, x^3, 0, 0)^T$, $F_2(x) = (1, 0, 0, 0, x, x_+)^T$, $x \in \mathcal{X} = [-1, 1]$, and $x_+ = \{x \text{ if } x \ge 0, \text{ and } 0 \text{ otherwise}\}$; see Fig. 2. The cost functions are selected as constants $c_v, c_1, c_2 \in [0, 1]$ and do not depend on $x$. Similarly, the variance matrix $S(x,\theta)$ is constant,
$$
S(x,\theta) = \begin{pmatrix} S_{11} & S_{12} \\ S_{12} & S_{22} \end{pmatrix}.
$$
In our computations, we take $S_{11} = S_{22} = 1$ and $S_{12} = \rho$, $0 \le \rho \le 1$, thus changing the value of $S_{12}$ only. Note that $\theta = (\gamma_1, \gamma_2, \dots, \gamma_6, S_{11}, S_{12}, S_{22})^T$. The functions $\eta_1, \eta_2$ are linear with respect to the unknown parameters $\gamma$, thus optimal designs do not depend on their values. On the contrary, in this example optimal designs do depend on the values of the variance parameters $S_{ij}$, i.e. we construct locally optimal designs with respect to their values (compare Figures 4 and 5). We considered a rather simple example to illustrate the approach. Nevertheless, it allows us to demonstrate how changes in the cost functions and variance parameters affect the selection of design points.
Figure 3: Sensitivity function and D-optimal design for $c_v = 1$, $c_1 = c_2 = 0$, $\rho = 0$ (panels: first function only, $X \in Z_1$; second function only, $X \in Z_2$; both functions, $X \in Z_3$).
For the first run, we choose $c_v = 1$, $c_1 = c_2 = 0$, $\rho = 0$; see Fig. 3, which shows the sensitivity function $\psi(X,\xi^*,\theta) = \mathrm{tr}\left[ \tilde\mu(X,\theta)\,M_C^{-1}(\xi^*,\theta) \right]$ for $X \in Z_j$, $j = 1, 2, 3$. Not surprisingly, in this case the selected design points lie in subregion $Z_3$. Indeed, since individual measurements cost nothing, it is beneficial to take two measurements instead of a single one to gain more information and to decrease the variability of the parameter estimates. The weights of the support points are shown in the plot, which illustrates the generalized equivalence theorem: the sensitivity function hits the reference line $m = 9$ at the support points of the D-optimal design; recall that $\dim(\theta) = 9$.

If we introduce positive costs for the individual measurements, then the weights are redistributed. The case $c_v = 1$, $c_1 = c_2 = 0.5$, $\rho = 0$ is presented in Fig. 4. Compared to Fig. 3, the design weights in the middle of subregion $Z_3$ shift to subregion $Z_1$, where two new points appear: $x_{4,5} = \pm 0.45$ with weights $w_{4,5} = 0.18$. It is interesting that in this case no support points appear in subregion $Z_2$ (i.e. measuring the second function only).
Figure 4: D-optimal design for $c_v = 1$, $c_1 = c_2 = 0.5$, $\rho = 0$.
The next case deals with positive correlation $\rho = 0.3$ and $c_v = 1$, $c_1 = c_2 = 0.5$; see Fig. 5. Now there are just four support points in the design: two of them are at the boundaries of subregion $Z_3$ with weights $w_{1,2} = 0.33$, and the other two are in the middle of subregion $Z_1$, $x_{3,4} = \pm 0.45$ with weights $w_{3,4} = 0.17$.

So far, subregion $Z_2$ has not been represented in the optimal designs. Fig. 6 illustrates a case when support points appear in this subregion. For this, we take $c_v = 1$, $c_1 = 1$, $c_2 = 0.1$, $\rho = 0.2$. A design point $x_5 = 0$ appears in the center of $Z_2$ with weight $w_5 = 0.1$. Not surprisingly, subregion $Z_1$ has no support points in this example, since the cost of measuring function $\eta_1$ is much higher than that for function $\eta_2$.
Figure 5: D-optimal design for $c_v = 1$, $c_1 = c_2 = 0.5$, $\rho = 0.3$.
Figure 6: D-optimal design for $c_v = 1$, $c_1 = 1$, $c_2 = 0.1$, $\rho = 0.2$.
5.2 Linear regression with random parameters
Let
$$
\mathrm{E}(y|\gamma, x) = f^T(x)\,\gamma, \qquad \mathrm{Var}(y|\gamma, x) = \sigma^2. \tag{27}
$$
We assume that, given $\gamma$, all observations are independent. The parameters $\gamma \in \mathbb{R}^m$ are independently sampled from a normal population with
$$
\mathrm{E}(\gamma) = \gamma^0, \qquad \mathrm{Var}(\gamma) = \Lambda, \tag{28}
$$
where $\gamma^0$ and $\Lambda$ are often referred to as "population", or "global", parameters. Let
$$
f(x_{ij}) = [f_1(x_{ij}),\dots,f_m(x_{ij})]^T, \quad i = 1,\dots,k_j; \qquad
y_j = (y_{1j},\dots,y_{k_j,j}), \quad x_j = (x_{1j},\dots,x_{k_j,j}),
$$
and $F(x_j) = \left[ f(x_{1,j}),\dots,f(x_{k_j,j}) \right]$. Then model (27) can be represented as
$$
\mathrm{E}(y_j|\gamma_j, x_j) = F^T(x_j)\,\gamma_j. \tag{29}
$$
We emphasize that different numbers of measurements $k_j$ can be obtained for different $j$'s. The predictor $x_{ij}$ is a $q$-dimensional vector; for example, if a patient receives a $q$-drug treatment, $x_{uij}$ denotes the dose level of drug $u$ administered to individual $j$ in experiment $i$, $u = 1,\dots,q$. From (28) and (29) it follows that
$$
\mathrm{E}(y|x_j) = \eta(x_j,\gamma^0) = F^T(x_j)\,\gamma^0, \qquad
\mathrm{Var}(y|x_j) = S(\Lambda, \sigma^2, x_j) = F^T(x_j)\,\Lambda\,F(x_j) + \sigma^2 I_{k_j}. \tag{30}
$$
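The following sketch builds the marginal covariance (30) for one patient and the corresponding per-patient contribution $F(x_j)\,S_j^{-1}\,F^T(x_j)$ to the information about $\gamma^0$; the basis functions, $\Lambda$, and $\sigma^2$ are hypothetical placeholders.

```python
# A sketch of the marginal moments (30) for the random-parameter linear model.
import numpy as np

def f(t):                                    # m = 2 basis functions (intercept, time)
    return np.array([1.0, t])

Lambda = np.diag([0.5, 0.1])                 # between-patient covariance (diagonal)
sigma2 = 0.25                                # residual variance

def patient_blocks(times):
    Fj = np.column_stack([f(t) for t in times])             # m x k_j matrix F(x_j)
    Sj = Fj.T @ Lambda @ Fj + sigma2 * np.eye(len(times))   # marginal covariance (30)
    info_gamma = Fj @ np.linalg.inv(Sj) @ Fj.T               # block for gamma^0
    return Sj, info_gamma

Sj, info = patient_blocks([0.0, 1.0, 2.0, 4.0])
print(np.round(Sj, 3), "\n", np.round(info, 3))
```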
We first assume that $\Lambda$ is diagonal, i.e. $\Lambda = \mathrm{diag}(\lambda_\alpha)$, $\alpha = 1,\dots,m$. For a discussion of how to tackle the general case of a non-diagonal matrix $\Lambda$, see Remark 5 in the Appendix.
Straightforward exercises in matrix algebra lead from (2) to the following representation of the information matrix:
$$
\mu_N(\gamma,\lambda,\sigma^2) = \sum_{j=1}^{N} \mu(x_j,\theta) =
\begin{pmatrix}
\mu_{N,\gamma} & 0_{m,m} & 0_{m,1} \\
0_{m,m} & \mu_{N,\lambda} & \nu_{N,\lambda} \\
0_{1,m} & \nu_{N,\lambda}^T & \nu_{N,\sigma}
\end{pmatrix}
= \sum_{j=1}^{N}
\begin{pmatrix}
\mu_{\gamma,j} & 0_{m,m} & 0_{m,1} \\
0_{m,m} & \mu_{\lambda,j} & \nu_{\lambda,j} \\
0_{1,m} & \nu_{\lambda,j}^T & \nu_{\sigma,j}
\end{pmatrix}, \tag{31}
$$
where
$$
\mu_{\gamma,j} = F(x_j)\,S_j^{-1}\,F^T(x_j), \qquad S_j = S(\Lambda,\sigma^2,x_j),
$$
$$
\{\mu_{\lambda,j}\}_{\alpha\beta} = \tfrac{1}{2}\left[ F_\alpha(x_j)\,S_j^{-1}\,F_\beta^T(x_j) \right]^2, \quad \alpha,\beta = 1,\dots,m,
$$
$$
\{\nu_{\lambda,j}\}_\alpha = \tfrac{1}{2}\,F_\alpha(x_j)\,S_j^{-2}\,F_\alpha^T(x_j), \quad \alpha = 1,\dots,m, \qquad
\nu_{\sigma,j} = \tfrac{1}{2}\,\mathrm{tr}\,S_j^{-2},
$$
$$
F_\alpha(x_j) = \left[ f_\alpha(x_{1j}), f_\alpha(x_{2j}),\dots,f_\alpha(x_{k_j,j}) \right],
$$
and $0_{a,b}$ is an $(a\times b)$ matrix of zeros.
Thus the sets of parameters $\gamma$ and $\{\lambda, \sigma^2\}$ are mutually orthogonal. This makes optimal design and estimation problems computationally more affordable. Iterated estimators, similar to the example in Section 3.1, can be written as
$$
\hat\gamma_N = \lim_{t\to\infty}\gamma_t, \qquad \hat\lambda_N = \lim_{t\to\infty}\lambda_t, \qquad \hat\sigma^2_N = \lim_{t\to\infty}\sigma^2_t,
$$
where
$$
\gamma_{t+1} = M_{\gamma,t}^{-1}\,Y_t, \qquad
M_{\gamma,t} = \sum_{j=1}^{N} F(x_j)\,S_{tj}^{-1}\,F^T(x_j), \qquad
Y_t = \sum_{j=1}^{N} F(x_j)\,S_{tj}^{-1}\,y_j, \qquad
S_{tj} = S(\lambda_t,\sigma^2_t,x_j),
$$
$$
\begin{pmatrix} \lambda_{t+1} \\ \sigma^2_{t+1} \end{pmatrix}
= \begin{pmatrix} M_{t,\lambda} & M_{t,\lambda\sigma} \\ M_{t,\lambda\sigma}^T & M_{t,\sigma} \end{pmatrix}^{-1}
\begin{pmatrix} y_{t,1} \\ y_{t,2} \end{pmatrix}, \tag{32}
$$
where $\lambda_t = (\lambda_{t1},\dots,\lambda_{tm})^T$; $M_{t,\lambda}$ is an $(m\times m)$ matrix; $M_{t,\lambda\sigma}$ and $y_{t,1}$ are $(m\times 1)$ vectors; $M_{t,\lambda}$, $M_{t,\lambda\sigma}$, and $M_{t,\sigma}$ are the same as $\mu_{N,\lambda}$, $\nu_{N,\lambda}$, and $\nu_{N,\sigma}$, respectively, except that $S_{tj}$ should be substituted for $S_j$,
$$
\{y_{t,1}\}_\alpha = \frac{1}{2}\sum_{j=1}^{N}\left\{ F_\alpha(x_j)\,S_{tj}^{-1}\,[y_j - F^T(x_j)\gamma_t] \right\}^2, \quad \alpha = 1,\dots,m,
$$
$$
y_{t,2} = \frac{1}{2}\sum_{j=1}^{N} [y_j - F^T(x_j)\gamma_t]^T\, S_{tj}^{-2}\, [y_j - F^T(x_j)\gamma_t].
$$
The proof of (32) is postponed to the Appendix.
Now we show how to introduce cost constraints for this example; for other methods of generating optimal designs with constraints in random effects models, see Mentre et al. (1997). The hypothetical model is similar to that of Example 5.1, where patients visit a clinic to participate in a clinical trial. Here, we assume that the patients will undergo serial sampling over time, for example blood sampling, within a specified time interval $[0, T]$. Each patient who participates in the trial may have a different number of samples taken, up to a maximum of $q$. Following Example 5.1, we impose a cost $c_v$ for each visit, and we assign a cost $c_s$ for each of the individual samples. Therefore, if $k$ samples are taken, the total cost per patient is
$$
C_k = c_v + k\,c_s, \qquad k = 1,\dots,q,
$$
with the restriction (22) on the total cost.
Since samples are taken over time, there is a natural ordering corresponding to the timing of the samples. The design region $\mathcal{X}$ for a patient depends upon the number and timing of the samples taken from that patient. For example, if a patient visits the clinic for a single sample only, the design region $\mathcal{X}_1$ will consist of a single point $x$, $0 \le x \le T$. For a patient having two samples taken, the design region $\mathcal{X}_2$ will consist of vectors of length 2,
$$
\mathcal{X}_2 = \{X = (x_1, x_2),\; 0 \le x_1 < x_2 \le T\},
$$
etc. If $q$ samples are taken from a patient, then
$$
\mathcal{X}_q = \{X = (x_1,\dots,x_q),\; 0 \le x_1 < \dots < x_q \le T\}.
$$
Finally, the normalized information matrix can be defined as
$$
M_C(\xi,\theta) = \sum_{i=1}^{n} w_i\,\tilde\mu(X_i,\theta), \qquad \sum_{i=1}^{n} w_i = 1,
$$
where $\tilde\mu(X,\theta) = \mu(X,\theta)/C_k$ if $\dim(X) = k$ and $X \in \mathcal{X}_k$, and the information matrix $\mu(X,\theta)$ is defined in (31).
6 Discussion
Multiresponse regression models with a variance matrix depending upon unknown parameters
form a common basis for modeling in many areas of research. Well known examples exist in
biomedical research, such as in pharmacokinetics and dose response models. Further examples
are prevalent in econometrics, psychometrics, and agricultural field studies, among others. A
rich literature exists in which methods for parameter estimation are derived and compared.
Optimal design for these types of models has also been studied, but the literature is not as well
developed.
In this paper, we propose an iterated estimator, which is shown to be a least squares estimator
combining the usual generalized least squares with squared deviations of the predicted dispersion
matrix from the observed residual matrices. This estimator is shown to be asymptotically
equivalent to the MLE. We provide closed form solutions for the proposed estimator for a
linear model with random parameters. For the case of a single response, the combined iterated
estimator is similar to the iterated estimator proposed by Davidian and Carroll (1987). However,
they partition the parameter vector into two subvectors, with the second one appearing only
in the variance function, and perform iterations on each term separately. Moreover, for cases
where parameters in the expectation and variance functions coincide, the second term disappears
completely from the iterations and hence their iterated estimator does not lead to the MLE.
Optimal experimental design is a critical aspect of research, and well known algorithms
can be utilized to generate such designs. We combine the proposed iterated estimator with
convex design theory to generate locally optimal designs. In a specific example, a dose response study with the response modeled by a logistic function and the variance modeled by a two-parameter power function, a first-order numerical algorithm was utilized to generate a D-optimal
design. The optimal design was compared to a two-fold dilution design with 10 dilutions (a
standard dose response design). We underline that the optimal design for this model is supported
at four design points, which is less than the total of 6 parameters in the model. Therefore,
combined application of the estimation and design algorithms leads to an optimal design which,
if implemented, requires fewer resources than the standard design (4 points instead of 10).
Certainly, for nonlinear models one constructs locally optimal designs, which requires preliminary
estimates of unknown parameters. From our experience with the logistic model, the locally
optimal designs are quite robust to a reasonable variation of parameter estimates.
Finally, we introduce cost functions and demonstrate the application of cost constraints to
normalized designs. This provides experimenters with a basis for incorporating a total cost,
and allows them to achieve a statistically optimal design, in light of these costs. Such a design
offers several benefits, not the least of which is to enable the experimenter to conduct the study
within budget while obtaining reliable parameter estimates. In our example with two response functions, we demonstrate the impact of cost constraints. With a fixed overall cost only (no individual costs, see Fig. 3), support points are allocated within the subregion $Z_3$, which is not surprising: it is beneficial to take two measurements, rather than one, to increase information
content and reduce parameter variability. Introduction of positive costs and correlation between
the response functions shifts the design, as demonstrated in the examples; see Figs. 4-6.
In conclusion, this paper describes an iterated estimator for finding parameter estimates for multiresponse models with variance depending upon unknown parameters, combines this method
with convex design theory, and introduces designs with cost constraints. These concepts can be
a valuable tool to experimenters, enabling efficient parameter estimation and optimal allocation
of resources.
Acknowledgements
The authors are grateful to the referees for useful comments that helped to improve the presentation of the results.
7 Appendix
In this section, we use a few standard formulas from matrix differential calculus; see Harville
(1997, Ch. 15).
(1) If $S$ is a symmetric matrix which depends on a parameter $\alpha$, and if $u$ is a scalar function of $S$, then
$$
\frac{du}{d\alpha} = \mathrm{tr}\left[ \frac{\partial u}{\partial S}\,\frac{\partial S}{\partial\alpha} \right], \tag{33}
$$
$$
\frac{\partial \log|S|}{\partial S} = S^{-1}, \qquad \frac{dS^{-1}}{d\alpha} = -\,S^{-1}\,\frac{dS}{d\alpha}\,S^{-1}. \tag{34}
$$
(2) If $A$, $S$, and $B$ are symmetric matrices of proper dimension, then
$$
\frac{\partial}{\partial S}\,\mathrm{tr}\left[ (A - S)B \right]^2 = 2B(S - A)B. \tag{35}
$$
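As a quick sanity check (not part of the paper), the identities (34) and (35) can be verified numerically with finite differences along a one-parameter family of symmetric matrices; the matrices below are random and purely illustrative.

```python
# Hypothetical numerical check of (34)-(35) via central finite differences.
import numpy as np

rng = np.random.default_rng(1)
def rand_spd(k):
    A = rng.normal(size=(k, k))
    return A @ A.T + k * np.eye(k)

k, h = 3, 1e-6
S, A, B, D = rand_spd(k), rand_spd(k), rand_spd(k), rand_spd(k)

# (34), second identity: d S(a)^{-1} / da = -S^{-1} (dS/da) S^{-1},
# checked along the family S(a) = S + a*D, so dS/da = D.
lhs = (np.linalg.inv(S + h * D) - np.linalg.inv(S - h * D)) / (2 * h)
rhs = -np.linalg.inv(S) @ D @ np.linalg.inv(S)
print(np.max(np.abs(lhs - rhs)))             # small

# (35): the directional derivative of u(S) = tr[((A - S) B)^2] along D equals
# tr[2 B (S - A) B D], consistent with the stated gradient 2 B (S - A) B.
u = lambda S: np.trace(((A - S) @ B) @ ((A - S) @ B))
lhs = (u(S + h * D) - u(S - h * D)) / (2 * h)
rhs = np.trace(2 * B @ (S - A) @ B @ D)
print(abs(lhs - rhs))                        # small
```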
The following lemma is used in the proof of Theorem 1.
Lemma 1. Let
$$
S_0 \le S(x,\theta) \le S_1 \quad \text{for any } x\in\mathcal{X} \text{ and } \theta\in\Theta, \tag{36}
$$
where $S_0$ and $S_1$ are positive definite matrices. Let
$$
R_1(x,\theta,\theta^*,\theta^0) = [\eta(x,\theta^*) - \eta(x,\theta)]^T S^{-1}(x,\theta^0)\,[\eta(x,\theta^*) - \eta(x,\theta)],
$$
$$
R_2(x,\theta,\theta^*,\theta^0) = \frac{1}{2}\,\mathrm{tr}\left\{ \left[ R_{22}(x,\theta,\theta^*,\theta^0) + S(x,\theta^*) - S(x,\theta) \right] S^{-1}(x,\theta^0) \right\}^2,
$$
$$
R_{22} = [\eta(x,\theta^*) - \eta(x,\theta^0)]\,[\eta(x,\theta^*) - \eta(x,\theta^0)]^T - [\eta(x,\theta) - \eta(x,\theta^0)]\,[\eta(x,\theta) - \eta(x,\theta^0)]^T,
$$
and, for a given $\theta$,
$$
[\eta(x,\theta^*) - \eta(x,\theta)]^T S_1^{-1}\,[\eta(x,\theta^*) - \eta(x,\theta)] + \frac{1}{2}\,\mathrm{tr}\left\{ [S(x,\theta^*) - S(x,\theta)]\,S_1^{-1} \right\}^2 > 0. \tag{37}
$$
Then
$$
R_1(x,\theta,\theta^*,\theta^0) + R_2(x,\theta,\theta^*,\theta^0) > 0.
$$
Proof. It is obvious that for any $x$, $\theta$, and $\theta^0$,
$$
R_1(x,\theta,\theta^*,\theta^0) \ge 0, \qquad R_2(x,\theta,\theta^*,\theta^0) \ge 0,
$$
and
$$
R_1(x,\theta,\theta^*,\theta^0) = R_2(x,\theta,\theta^*,\theta^0) = 0 \quad \text{if } \theta = \theta^*.
$$
Next, the term $R_{22}$ can be represented as
$$
R_{22} = [\eta(x,\theta^*) - \eta(x,\theta^0)]\,[\eta(x,\theta^*) - \eta(x,\theta)]^T - [\eta(x,\theta^*) - \eta(x,\theta)]\,[\eta(x,\theta^0) - \eta(x,\theta)]^T. \tag{38}
$$
Since both terms on the left-hand side of (37) are non-negative, at least one of them is positive. If the first term is positive, then the lemma follows from (36) and the definition of $R_1$. Next, if the first term is equal to zero, then $\eta(x,\theta) = \eta(x,\theta^*)$, and (38) implies that
$$
R_2(x,\theta,\theta^*,\theta^0) = \frac{1}{2}\,\mathrm{tr}\left\{ [S(x,\theta^*) - S(x,\theta)]\,S^{-1}(x,\theta^0) \right\}^2
\ge \frac{1}{2}\,\mathrm{tr}\left\{ [S(x,\theta^*) - S(x,\theta)]\,S_1^{-1} \right\}^2 > 0,
$$
since in this case the second term on the left-hand side of (37) is positive. This proves the lemma.
Proof of Theorem 1. First, compute the partial derivatives of the log-likelihood function $L_N(\theta)$ introduced in (3). Using (33) and (34), and letting $z_i(\theta) = y_i - \eta(x_i,\theta)$, one gets
$$
-2\,\frac{\partial L_N(\theta)}{\partial\theta_j}
= \sum_{i=1}^{N}\left[ \mathrm{tr}\!\left( S^{-1}(x_i,\theta)\,\frac{\partial S(x_i,\theta)}{\partial\theta_j} \right)
- 2\,\frac{\partial\eta^T(x_i,\theta)}{\partial\theta_j}\, S^{-1}(x_i,\theta)\, z_i(\theta) \right]
- \sum_{i=1}^{N} z_i^T(\theta)\, S^{-1}(x_i,\theta)\,\frac{\partial S(x_i,\theta)}{\partial\theta_j}\, S^{-1}(x_i,\theta)\, z_i(\theta). \tag{39}
$$
Next, use (33) and (35) to compute the partial derivatives of $v_N^2(\theta;\theta^0)$ with respect to $\theta_j$, which leads, up to terms involving $\Delta(x_i,\theta,\theta^0)$ that vanish together with their derivatives at $\theta = \theta^0$, to
$$
\frac{\partial v_N^2(\theta;\theta^0)}{\partial\theta_j}
= -2\sum_{i=1}^{N} \frac{\partial\eta^T(x_i,\theta)}{\partial\theta_j}\,S^{-1}(x_i,\theta^0)\,z_i(\theta)
+ \sum_{i=1}^{N} \mathrm{tr}\left[ S^{-1}(x_i,\theta^0)\left\{ S(x_i,\theta) - z_i(\theta^0)\,z_i^T(\theta^0) \right\} S^{-1}(x_i,\theta^0)\,\frac{\partial S(x_i,\theta)}{\partial\theta_j} \right].
$$
From the identity $\mathrm{tr}[AB] = \mathrm{tr}[BA]$, it follows that
$$
\left.\frac{\partial v_N^2(\theta;\theta^0)}{\partial\theta_j}\right|_{\theta=\theta^0}
= -2\sum_{i=1}^{N} \frac{\partial\eta^T(x_i,\theta)}{\partial\theta_j}\,S^{-1}(x_i,\theta)\,z_i(\theta)
+ \sum_{i=1}^{N} \mathrm{tr}\left[ S^{-1}(x_i,\theta)\,\frac{\partial S(x_i,\theta)}{\partial\theta_j} \right]
- \sum_{i=1}^{N} z_i^T(\theta)\,S^{-1}(x_i,\theta)\,\frac{\partial S(x_i,\theta)}{\partial\theta_j}\,S^{-1}(x_i,\theta)\,z_i(\theta),
$$
with all expressions on the right-hand side evaluated at $\theta = \theta^0$, which coincides with (39).
Note that if the algorithm (11) converges, then under the introduced assumptions
$$
\left.\frac{\partial v_N^2(\theta;\hat\theta_N)}{\partial\theta}\right|_{\theta=\hat\theta_N}
= \lim_{t\to\infty} \left.\frac{\partial v_N^2(\theta;\theta_t)}{\partial\theta}\right|_{\theta=\theta_t} = 0,
$$
which implies that $\hat\theta_N \in \Theta_N$. To prove the convergence of (11), introduce
$$
A_N(\theta^0) = \arg\min_{\theta\in\Theta} v_N^2(\theta;\theta^0).
$$
Then (11) can be presented as the recursion $\theta_t = A_N(\theta_{t-1})$ for the fixed point problem $\theta = A_N(\theta)$. Convergence of the recursion is guaranteed if for any $\theta_1, \theta_2 \in \Theta$ there exists a constant $K$, $0 < K < 1$, such that
$$
[A_N(\theta_1) - A_N(\theta_2)]^T\,[A_N(\theta_1) - A_N(\theta_2)] \le K\,(\theta_1 - \theta_2)^T (\theta_1 - \theta_2),
$$
cf. Saaty and Bram (1964, Ch. 1.10).

Note that $A_N(\theta^0)$ is simply a generalized version of the least squares estimator with predetermined weights. Straightforward calculations show that the expectation of the $i$-th summand on the right-hand side of (12) is equal to
$$
R(x_i,\theta,\theta^*,\theta^0) = R_1(x_i,\theta,\theta^*,\theta^0) + R_2(x_i,\theta,\theta^*,\theta^0) + R_3(x_i,\theta^*,\theta^0),
$$
where the terms $R_1$ and $R_2$ are introduced in Lemma 1, and the term $R_3$ does not depend on $\theta$. Lemma 1 together with Assumption 3 guarantees that for any $\theta^0$, the limiting function
$$
\lim_{N\to\infty} \frac{1}{N}\sum_i R(x_i,\theta,\theta^*,\theta^0)
$$
has a unique minimum with respect to $\theta$ at $\theta = \theta^*$. Using the strong law of large numbers, and following Jennrich (1969), one can show that $A_N(\theta)$ is strongly consistent, i.e. converges almost surely to $\theta^*$; cf. Rao (1973, Ch. 2c). From this fact and the compactness of $\Theta$, it follows that the probability
$$
P\left[ \{A_N(\theta_1) - A_N(\theta_2)\}^T \{A_N(\theta_1) - A_N(\theta_2)\} \le K\,(\theta_1-\theta_2)^T(\theta_1-\theta_2) \right]
$$
tends to 1 as $N\to\infty$ uniformly over $\theta_1, \theta_2 \in \Theta$ for any fixed $0 < K < 1$. Thus, for large $N$, with probability close to 1 the limit (11) exists and, consequently, $\hat\theta_N \in \Theta_N$, which proves the theorem.
Proof of (32). Introduce $z_j = y_j - F^T(x_j)\gamma_t$, $B_j = S_{tj}^{-1}$, and recall that
$$
S_j = F^T(x_j)\,\Lambda\,F(x_j) + \sigma^2 I_k.
$$
It is straightforward to show that
$$
F^T(x_j)\,\Lambda\,F(x_j) = \sum_{\alpha=1}^{m} \lambda_\alpha\,F_\alpha^T(x_j)\,F_\alpha(x_j),
$$
and therefore
$$
\frac{\partial S_j}{\partial\lambda_\alpha} = F_\alpha^T(x_j)\,F_\alpha(x_j), \qquad \frac{\partial S_j}{\partial\sigma^2} = I_k. \tag{40}
$$
The analogue of the second term on the right-hand side of (13) can be written as
$$
v^2 = \frac{1}{2}\sum_{j=1}^{N} \mathrm{tr}\left[ (z_j z_j^T - S_j)\,B_j \right]^2.
$$
Then using (33), (35), (40), and the identity $\mathrm{tr}[AB]=\mathrm{tr}[BA]$, one gets
$$
\frac{\partial v^2}{\partial\lambda_\alpha}
= \mathrm{tr}\left[ \sum_{j=1}^{N} B_j \left\{ F^T(x_j)\,\Lambda\,F(x_j) + \sigma^2 I_k - z_j z_j^T \right\} B_j\, F_\alpha^T(x_j)\,F_\alpha(x_j) \right]
$$
$$
= \sum_{j=1}^{N} F_\alpha(x_j)\, B_j \left\{ \sum_{\beta=1}^{m} \lambda_\beta\,F_\beta^T(x_j)\,F_\beta(x_j) + \sigma^2 I_k - z_j z_j^T \right\} B_j\, F_\alpha^T(x_j)
= 2\sum_{\beta=1}^{m} \lambda_\beta\,\{M_{t,\lambda}\}_{\alpha\beta} + 2\sigma^2\,\{M_{t,\lambda\sigma}\}_\alpha - 2\,\{y_{t,1}\}_\alpha.
$$
In a similar fashion, taking the partial derivative with respect to $\sigma^2$ leads to
$$
\frac{\partial v^2}{\partial\sigma^2}
= \mathrm{tr}\left[ \sum_{j=1}^{N} B_j \left\{ F^T(x_j)\,\Lambda\,F(x_j) + \sigma^2 I_k - z_j z_j^T \right\} B_j \right]
$$
$$
= \mathrm{tr}\left[ \sum_{\beta=1}^{m} \lambda_\beta \sum_{j=1}^{N} B_j\,F_\beta^T(x_j)\,F_\beta(x_j)\,B_j + \sigma^2 \sum_{j=1}^{N} B_j^2 - \sum_{j=1}^{N} B_j\,z_j z_j^T\,B_j \right]
= 2\sum_{\beta=1}^{m} \lambda_\beta\,\{M_{t,\lambda\sigma}\}_\beta + 2\sigma^2\,M_{t,\sigma} - 2\,y_{t,2}.
$$
Finally, equating these partial derivatives to zero entails
$$
\begin{pmatrix} M_{t,\lambda} & M_{t,\lambda\sigma} \\ M_{t,\lambda\sigma}^T & M_{t,\sigma} \end{pmatrix}
\begin{pmatrix} \lambda \\ \sigma^2 \end{pmatrix}
= \begin{pmatrix} y_{t,1} \\ y_{t,2} \end{pmatrix},
$$
which proves (32).
Remark 5. To establish the analog of (32) in the case of a non-diagonal symmetric matrix $\Lambda = (\lambda_{rq})$, one has to exploit the identity
$$
F^T(x_j)\,\Lambda\,F(x_j) = \sum_{r,q=1}^{m} \lambda_{rq}\,F_r^T(x_j)\,F_q(x_j)
= \sum_{r=1}^{m} \lambda_{rr}\,F_r^T(x_j)\,F_r(x_j) + \sum_{r>q} \lambda_{rq}\left[ F_r^T(x_j)\,F_q(x_j) + F_q^T(x_j)\,F_r(x_j) \right]. \tag{41}
$$
First, the formula (31) should be modified according to (41). The information matrix $\mu_{N,\lambda}$ is now an $[m(m+1)/2] \times [m(m+1)/2]$ matrix corresponding to the parameter $\mathrm{vech}(\Lambda)$ (cf. Section 3.1 for the notation vech). Let
$$
W_{j,\alpha\beta} = F_\alpha(x_j)\,S_j^{-1}\,F_\beta^T(x_j).
$$
Then the elements $\{\mu_{N,\lambda}\}_{\alpha\beta,rq}$ of $\mu_{N,\lambda}$ are defined by
$$
\{\mu_{N,\lambda}\}_{\alpha\alpha,rr} = \frac{1}{2}\sum_{j=1}^{N} W_{j,\alpha r}^2, \qquad
\{\mu_{N,\lambda}\}_{\alpha\alpha,rq} = \sum_{j=1}^{N} W_{j,\alpha r}\,W_{j,\alpha q}, \qquad
\{\mu_{N,\lambda}\}_{\alpha\beta,rq} = \sum_{j=1}^{N} \left[ W_{j,\alpha r}\,W_{j,\beta q} + W_{j,\alpha q}\,W_{j,\beta r} \right],
$$
cf. Jennrich and Schluchter (1986). The $m(m+1)/2$ vector $\nu_{N,\lambda}$ has elements $\{\nu_{N,\lambda}\}_{\alpha\beta}$,
$$
\{\nu_{N,\lambda}\}_{\alpha\alpha} = \frac{1}{2}\sum_{j=1}^{N} F_\alpha(x_j)\,S_j^{-2}\,F_\alpha^T(x_j), \qquad
\{\nu_{N,\lambda}\}_{\alpha\beta} = \sum_{j=1}^{N} F_\alpha(x_j)\,S_j^{-2}\,F_\beta^T(x_j),
$$
and, finally, $\nu_{N,\sigma}$ does not change.
Using now formula (41), taking partial derivatives with respect to $\lambda_{\alpha\alpha}$, $\lambda_{\alpha\beta}$, and $\sigma^2$, and equating them to zero leads to a system of $[m(m+1)/2 + 1]$ linear equations, the solution of which is given by
$$
\begin{pmatrix} \mathrm{vech}(\Lambda)_{t+1} \\ \sigma^2_{t+1} \end{pmatrix}
= \begin{pmatrix} \tilde M_{t,\lambda} & \tilde M_{t,\lambda\sigma} \\ \tilde M_{t,\lambda\sigma}^T & M_{t,\sigma} \end{pmatrix}^{-1}
\begin{pmatrix} \tilde y_{t,1} \\ \tilde y_{t,2} \end{pmatrix},
$$
where $\tilde M_{t,\lambda}$ and $\tilde M_{t,\lambda\sigma}$ are introduced in complete analogy with $\mu_{N,\lambda}$ and $\nu_{N,\lambda}$, with $S_{tj}$ substituted for $S_j$,
$$
\{\tilde y_{t,1}\}_{\alpha\alpha} = \frac{1}{2}\sum_{j=1}^{N} \left[ F_\alpha(x_j)\,S_{tj}^{-1}\,z_j \right]^2, \qquad
\{\tilde y_{t,1}\}_{\alpha\beta} = \sum_{j=1}^{N} \left[ F_\alpha(x_j)\,S_{tj}^{-1}\,z_j \right]\left[ F_\beta(x_j)\,S_{tj}^{-1}\,z_j \right],
$$
$$
\tilde y_{t,2} = \frac{1}{2}\sum_{j=1}^{N} z_j^T\,S_{tj}^{-2}\,z_j.
$$
References
[1] Atkinson, A.C. and Cook, R.D. (1995), D-optimum designs for heteroscedastic linear models, JASA, 90 (429), 204-212.
[2] Atkinson, A.C., and Donev, A. (1992), Optimum Experimental Designs, Clarendon Press,
Oxford.
[3] Beal, S.L., and Sheiner, L.B. (1988), Heteroscedastic nonlinear regression. Technometrics,
30(3), 327-338.
[4] Cook, R.D., and Fedorov, V.V. (1995), Constrained optimization of experimental design,
Statistics, 26, 129-178.
[5] Cramer, H. (1946), Mathematical Methods of Statistics, Princeton University Press.
[6] Davidian, M., and Carroll, R.J. (1987), Variance function estimation, JASA, 82 (400),
1079-1091.
[7] Downing, D.J., Fedorov, V.V., Leonov, S.L. (2001), Extracting information from the variance function: optimal design. In: Atkinson, A.C., Hackl, P., Muller, W.G. (eds.) MODA6
- Advances in Model-Oriented Design and Analysis, Heidelberg, Physica-Verlag, 45-52.
[8] Fedorov, V.V. (1974), Regression problems with controllable variables subject to error,
Biometrika, 61, 49-56.
[9] Fedorov, V.V. (1977), Parameter estimation for multivariate regression, In: Nalimov, V.
(Ed.), Regression Experiments (Design and Analysis), Moscow State University, Moscow,
112-122 (in Russian).
[10] Fedorov, V.V., and Hackl, P. (1997), Model-Oriented Design of Experiments, Springer-Verlag, New York.
[11] Harville, D.A. (1997), Matrix Algebra from a Statistician's Perspective, Springer-Verlag,
New York.
[12] Heyde, C.C. (1997), Quasi-Likelihood and Its Applications, Springer-Verlag, New York.
[13] Jennrich, R.I. (1969), Asymptotic properties of nonlinear least squares estimators, Ann.
Math. Stat. 40, 633-643.
[14] Jennrich, R.I. , and Schluchter, M.D. (1986), Unbalanced repeated-measures models with
structured covariance matrices, Biometrics, 42, 805-820.
[15] Jobson, J.D., and Fuller, W.A. (1980), Least squares estimation when the covariance matrix
and parameter vector are functionally related, JASA, 75 (369), 176-181.
[16] Lindstrom, M.J., and Bates, D.M. (1990), Nonlinear mixed effects models for repeated
measures data, Biometrics, 46, 673-687.
[17] Magnus, J.R., and Neudecker, H. (1988), Matrix Differential Calculus with Applications in
Statistics and Econometrics, Wiley, New York.
[18] Malyutov, M.B. (1982), On asymptotic properties and application of IRGNA-estimates
for parameters of generalized regression models. In: Stoch. Processes and Appl., Moscow,
144-165 (in Russian).
[19] Malyutov, M.B. (1988), Design and analysis in generalized regression model F. In: Fedorov,
V.V., Lauter, H. (Eds.), Model-Oriented Data Analysis, Springer-Verlag, Berlin, 72-76.
[20] Mentre, F., Mallet, A., Baccar, D. (1997), Optimal design in random-effects regression
models, Biometrika, 84 (2), 429-442.
[21] Muirhead, R. (1982), Aspects of Multivariate Statistical Theory, Wiley, New York.
[22] Muller, W.G. (1998), Collecting Spatial Data, Springer-Verlag, New York.
[23] Pukelsheim, F. (1993), Optimal Design of Experiments, Wiley, New York.
[24] Pazman, A. (1993), Nonlinear Statistical Models, Kluwer, Dordrecht.
[25] Rao, C.R. (1973), Linear Statistical Inference and Its Applications, 2nd Ed., Wiley, New
York.
[26] Saaty, T.L., and Bram, J. (1964), Nonlinear Mathematics, McGraw-Hill, New York.
[27] Seber, G.A.E. (1984), Multivariate Observations, Wiley, New York.
[28] Vonesh, E.F., and Chinchilli, V.M. (1997), Linear and Nonlinear Models for the Analysis
of Repeated Measurements, Marcel Dekker, New York.
[29] Wu, C.F. (1981), Asymptotic theory of nonlinear least squares estimation, Ann. Stat., 9
(3), 501-513.