OPTIMAL DESIGN FOR MULTIPLE RESPONSES WITH VARIANCE DEPENDING ON UNKNOWN PARAMETERS

Valerii Fedorov, Robert Gagnon, and Sergei Leonov

GSK BDS Technical Report 2001-03, August 2001

This paper was reviewed and recommended for publication by:
Anthony C. Atkinson, London School of Economics, London, U.K.
John Peterson, Biomedical Data Sciences, GlaxoSmithKline Pharmaceuticals, Upper Merion, PA, U.S.A.
William F. Rosenberger, Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, MD, U.S.A.

Copyright 2001 by GlaxoSmithKline Pharmaceuticals
Biomedical Data Sciences, GlaxoSmithKline Pharmaceuticals
1250 South Collegeville Road, PO Box 5089, Collegeville, PA 19426-0989

Optimal Design for Multiple Responses with Variance Depending on Unknown Parameters

Valerii FEDOROV, Robert GAGNON, and Sergei LEONOV
Biomedical Data Sciences, GlaxoSmithKline Pharmaceuticals

Abstract

We discuss optimal design for multiresponse models with a variance matrix that depends on unknown parameters. The approach relies on optimization of convex functions of the Fisher information matrix. We propose iterated estimators which are asymptotically equivalent to maximum likelihood estimators. Combining these estimators with convex design theory leads to optimal design methods which can be used in the local optimality setting. A model with experimental costs is introduced which is studied within the normalized design paradigm and can be applied, for example, to the analysis of clinical trials with multiple endpoints.

Contents

1 Introduction
2 Regression Models and Maximum Likelihood Estimation
3 Iterated Estimators and Combined Least Squares
  3.1 Multivariate linear regression with unknown but constant covariance matrix
4 Optimal Design of Experiments
  4.1 Dose response model
5 Optimal Designs Under Cost Constraints
  5.1 Two response functions with cost constraints
  5.2 Linear regression with random parameters
6 Discussion
7 Appendix

1 Introduction

In many areas of research, including biomedical studies, investigators are faced with multiresponse models in which variation of the response depends on unknown model parameters. This situation is common, for example, in pharmacokinetics, dose response, repeated measures, time series, and econometric models. Many estimation methods have been proposed for these situations; see, for example, Beal and Sheiner (1988), Davidian and Carroll (1987), Jennrich (1969), and Lindstrom and Bates (1990). In these models, as in all others, optimal allocation of resources through experimental design is essential. Optimal designs provide not only statistically optimal estimates of model parameters, but also ensure that investments of time and money are utilized to their fullest. In many cases, investigators must design studies subject to some type of constraint. One example is a cost constraint, in which the total budget for conducting the study is limited, and the study design must be adjusted not only to meet the budget, but also to ensure that optimal estimation of the parameters is achieved.

In this paper, we introduce an iterated estimator which is asymptotically equivalent to the maximum likelihood estimator (MLE). This iterated estimator is a natural generalization of the traditional iteratively reweighted least squares algorithms.
It includes not only the squared deviations of the predicted responses from the observations, but also the squared deviations of the predicted dispersion matrix from the observed residual matrices. In this way, our combined iterated estimator provides a natural extension from least squares estimation to the MLE. We show how to exploit classic optimal design methods and algorithms, and provide the reader with several examples, which include a popular nonlinear dose response model and a linear random effects model. Finally, a model with experimental costs is introduced and studied within the framework of normalized designs. Among potential applications of this model is the analysis of clinical trials with multiple endpoints.

The paper is organized as follows. In Section 2, we introduce the model of observations and discuss classic results of maximum likelihood theory. In Section 3, the iterated estimator is introduced. Section 4 concentrates on optimal design problems. In Section 5, the model with experimental costs is studied within the normalized design paradigm. We conclude the paper with the Discussion. The Appendix contains proofs of some technical results.

2 Regression Models and Maximum Likelihood Estimation

In this section, we introduce the multiresponse regression model with variance matrix depending upon unknown model parameters. Models of this type include repeated measures, random coefficients, and heteroscedastic regression, among others. We also present a brief review of maximum likelihood estimation theory, concluding with the asymptotic normality of the MLE. Note that the MLE for the regression models described herein does not yield closed form solutions, except in the simplest of cases. It is necessary, therefore, to resort to iterative procedures, and to rely on the convergence and asymptotic properties of these procedures for estimation and inference.

Let the observed $k \times 1$ vector $y$ have a normal distribution with
$$ \mathrm{E}[y|x] = \eta(x,\theta), \qquad \mathrm{Var}[y|x] = S(x,\theta), \qquad (1) $$
where $\eta(x,\theta) = (\eta_1(x,\theta),\dots,\eta_k(x,\theta))^T$, $S(x,\theta)$ is a $k \times k$ matrix, $x$ are independent variables (predictors), and $\theta \in \Omega \subset R^m$ are unknown parameters. In this case the score function of a single observation $y$ is given by
$$ R(y|x,\theta) = -\frac{1}{2}\,\frac{\partial}{\partial\theta}\left\{ \log|S(x,\theta)| + [y - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[y - \eta(x,\theta)] \right\}, $$
and the corresponding Fisher information matrix is (cf. Magnus and Neudecker (1988, Ch. 6), or Muirhead (1982, Ch. 1))
$$ \mu(x,\theta) = \mu(x;\theta,\eta,S) = \mathrm{Var}[R(y|x,\theta)] = [\mu_{\alpha\beta}(x,\theta)]_{\alpha,\beta=1}^m, \qquad (2) $$
$$ \mu_{\alpha\beta}(x,\theta) = \frac{\partial\eta^T(x,\theta)}{\partial\theta_\alpha}\, S^{-1}(x,\theta)\, \frac{\partial\eta(x,\theta)}{\partial\theta_\beta} + \frac{1}{2}\,\mathrm{tr}\left[ S^{-1}(x,\theta)\,\frac{\partial S(x,\theta)}{\partial\theta_\alpha}\, S^{-1}(x,\theta)\,\frac{\partial S(x,\theta)}{\partial\theta_\beta} \right]. $$
In general, the dimension and structure of $y$, $\eta$, and $S$ can vary for different $x$. To indicate this, we should introduce a subscript $k_i$ for every $x_i$, but we do not use it, retaining the traditional notation $y_i$, $\eta(x_i,\theta)$ and $S(x_i,\theta)$ when it does not cause confusion.

The log-likelihood function $L_N$ for $N$ independent observations $y_1,\dots,y_N$ can be written, up to an additive constant, as
$$ L_N(\theta) = -\frac{1}{2}\sum_{i=1}^N \left\{ \log|S(x_i,\theta)| + [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i,\theta)\,[y_i - \eta(x_i,\theta)] \right\}, \qquad (3) $$
and the information matrix is additive in this case, i.e.
$$ M_N(\theta) = \sum_{i=1}^N \mu(x_i,\theta). $$
Any vector $\theta_N$ which maximizes the log-likelihood function $L_N(\theta)$,
$$ \theta_N = \arg\max_{\theta\in\Omega} L_N(\theta), \qquad (4) $$
is called a maximum likelihood estimator (MLE).
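For readers who prefer to see (2) and (3) in computational form, the following NumPy sketch (our own illustration, not part of the original report) evaluates the per-observation information matrix and the log-likelihood for user-supplied mean and variance functions `eta(x, theta)` and `S(x, theta)`; the derivatives are approximated by central finite differences, so it is meant for checking small examples rather than for production use.

```python
import numpy as np

def fisher_info_single(x, theta, eta, S, eps=1e-6):
    """Per-observation information matrix mu(x, theta) of equation (2),
    with the derivatives of eta and S taken by central finite differences.
    eta(x, theta) must return a k-vector, S(x, theta) a k x k matrix."""
    m = len(theta)
    Sinv = np.linalg.inv(S(x, theta))
    d_eta, d_S = [], []
    for a in range(m):
        tp, tm = np.array(theta, float), np.array(theta, float)
        tp[a] += eps; tm[a] -= eps
        d_eta.append((eta(x, tp) - eta(x, tm)) / (2 * eps))
        d_S.append((S(x, tp) - S(x, tm)) / (2 * eps))
    mu = np.zeros((m, m))
    for a in range(m):
        for b in range(m):
            mu[a, b] = (d_eta[a] @ Sinv @ d_eta[b]
                        + 0.5 * np.trace(Sinv @ d_S[a] @ Sinv @ d_S[b]))
    return mu

def log_likelihood(theta, xs, ys, eta, S):
    """Log-likelihood (3), up to an additive constant."""
    total = 0.0
    for x, y in zip(xs, ys):
        Sx = S(x, theta)
        r = y - eta(x, theta)
        total += -0.5 * (np.log(np.linalg.det(Sx)) + r @ np.linalg.solve(Sx, r))
    return total
```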
We introduce the following assumptions.

Assumption 1. The set $\Omega$ is compact; $x_i \in \mathcal{X}$, where $\mathcal{X}$ is compact, and all components of $\eta(x,\theta)$ and $S(x,\theta)$ are continuous with respect to $\theta$ uniformly in $x$, with $S(x,\theta) \ge S_0$, where $S_0$ is a positive definite matrix. The true vector of unknown parameters $\theta^*$ is an internal point of $\Omega$.

Assumption 2. The sum $\sum_i f(x_i,\theta,\theta^*)/N$ converges uniformly in $\theta$ to a continuous function $\nu(\theta,\theta^*)$,
$$ \lim_{N\to\infty} N^{-1} \sum_{i=1}^N f(x_i,\theta,\theta^*) = \nu(\theta,\theta^*), \qquad (5) $$
where
$$ f(x,\theta,\theta^*) = \log|S(x,\theta)| + \mathrm{tr}\left[S^{-1}(x,\theta)\,S(x,\theta^*)\right] + [\eta(x,\theta^*) - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[\eta(x,\theta^*) - \eta(x,\theta)], $$
and the function $\nu(\theta,\theta^*)$ attains its unique minimum at $\theta = \theta^*$.

Following Jennrich (1969, Theorem 6), it can be shown that under Assumptions 1 and 2 the MLE is a measurable function of the observations and is strongly consistent; see also Fedorov (1974), Heyde (1997, Ch. 12), Pazman (1993), Wu (1981). Condition (5) is based on the identity
$$ \mathrm{E}\left\{ [y - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[y - \eta(x,\theta)] \right\} = [\eta(x,\theta^*) - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[\eta(x,\theta^*) - \eta(x,\theta)] + \mathrm{tr}\left[S^{-1}(x,\theta)\,S(x,\theta^*)\right] $$
and the Kolmogorov law of large numbers; see Rao (1973, Ch. 2c.3).

If, in addition to the above assumptions, all components of $\eta(x,\theta)$ and $S(x,\theta)$ are twice differentiable with respect to $\theta$ for all $\theta \in \Omega$, and the limit matrix
$$ \lim_{N\to\infty} N^{-1} \sum_{i=1}^N \mu(x_i,\theta^*) = M(\theta^*) \qquad (6) $$
exists and is regular, then $\theta_N$ is asymptotically normally distributed, i.e.
$$ \sqrt{N}\,(\theta_N - \theta^*) \sim N\!\left(0,\,M^{-1}(\theta^*)\right). \qquad (7) $$
Note that the selection of the series $\{x_i\}_1^N$ is crucial for the consistency and precision of $\theta_N$.

Remark 1. Given $N$ and $\{x_i\}_1^N$, a design measure can be defined as
$$ \xi_N(x) = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}(x), \qquad \delta_x(z) = \{1 \text{ if } z = x, \text{ and } 0 \text{ otherwise}\}. $$
If the sequence $\{\xi_N(x)\}$ converges weakly to $\xi(x)$, then the limiting function $\nu(\theta,\theta^*)$ in the "identifiability" Assumption 2 can be presented as
$$ \nu(\theta,\theta^*) = \int f(x,\theta,\theta^*)\,\xi(dx); $$
cf. Malyutov (1988). Most often, within the optimal design paradigm, the limit design is a discrete measure, i.e. a collection of support points $\{x_j,\ j=1,\dots,n\}$ with weights $p_j$ such that $\sum_j p_j = 1$; see Section 4.

3 Iterated Estimators and Combined Least Squares

If the dispersion matrices of the observed $y$ are known, i.e. $S(x_i,\theta) = S(x_i)$, then (3) leads to what is usually called the generalized least squares estimator,
$$ \tilde\theta_N = \arg\min_\theta \sum_{i=1}^N [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i)\,[y_i - \eta(x_i,\theta)], \qquad (8) $$
which is well studied. When the variance function $S$ depends on $\theta$, it is tempting to replace (8) by
$$ \tilde\theta_N = \arg\min_\theta \sum_{i=1}^N [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i,\theta)\,[y_i - \eta(x_i,\theta)], \qquad (9) $$
which in general is not consistent; cf. Fedorov (1974), Muller (1998). Resorting to iteratively reweighted least squares (IRLS), see Beal and Sheiner (1988), Vonesh and Chinchilli (1997),
$$ \tilde\theta_N = \lim_{t\to\infty}\theta_t, \qquad \theta_t = \arg\min_\theta \sum_{i=1}^N [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i,\theta_{t-1})\,[y_i - \eta(x_i,\theta)], \qquad (10) $$
leads to a strongly consistent estimator with asymptotic dispersion matrix
$$ \mathrm{Var}\,\tilde\theta_N \simeq \left[ \sum_{i=1}^N \frac{\partial\eta^T(x_i,\theta)}{\partial\theta}\, S^{-1}(x_i,\theta)\, \frac{\partial\eta(x_i,\theta)}{\partial\theta^T} \right]^{-1}_{\theta=\tilde\theta_N}, $$
which is bigger, in terms of the ordering of non-negative definite matrices, than the corresponding matrix for the MLE defined in (6). For a discussion of related issues, see Jobson and Fuller (1980), Malyutov (1982).
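Below is a minimal sketch of the IRLS recursion (10), again assuming user-supplied `eta` and `S` as above; a general-purpose optimizer stands in for the inner weighted least squares minimization, which in practice would often be replaced by a Gauss-Newton step.

```python
import numpy as np
from scipy.optimize import minimize

def irls(xs, ys, eta, S, theta0, n_iter=20):
    """Iteratively reweighted least squares of equation (10):
    at step t the weights S^{-1}(x_i, theta_{t-1}) are frozen and the
    weighted sum of squares is minimized over theta."""
    theta = np.asarray(theta0, float)
    for _ in range(n_iter):
        # freeze the weight matrices at the current parameter value
        W = [np.linalg.inv(S(x, theta)) for x in xs]

        def wss(th):
            return sum((y - eta(x, th)) @ w @ (y - eta(x, th))
                       for x, y, w in zip(xs, ys, W))

        theta = minimize(wss, theta, method="Nelder-Mead").x
    return theta
```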
The natural step after (10) is the introduction of the combined iterated least squares estimator, which also includes the squared deviations of the predicted dispersion matrix $S(x,\theta)$ from the observed residual matrices:
$$ \hat\theta_N = \lim_{t\to\infty}\theta_t, \qquad \theta_t = \arg\min_{\theta\in\Omega} v_N^2(\theta;\theta_{t-1}), \qquad (11) $$
where
$$ v_N^2(\theta;\theta^0) = \sum_{i=1}^N [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i,\theta^0)\,[y_i - \eta(x_i,\theta)] $$
$$ \qquad + \frac{1}{2}\sum_{i=1}^N \mathrm{tr}\left\{ \left( [y_i - \eta(x_i,\theta^0)]\,[y_i - \eta(x_i,\theta^0)]^T - \Delta(x_i;\theta,\theta^0) - S(x_i,\theta) \right) S^{-1}(x_i,\theta^0) \right\}^2, \qquad (12) $$
$$ \Delta(x_i;\theta,\theta^0) = [\eta(x_i,\theta) - \eta(x_i,\theta^0)]\,[\eta(x_i,\theta) - \eta(x_i,\theta^0)]^T. $$
To prove the convergence of the combined iterated estimator, together with Assumption 1 we need the following:

Assumption 3. The variance function satisfies $S(x,\theta) \le S_1$ for all $x\in\mathcal{X}$ and $\theta\in\Omega$, where $S_1$ is a positive definite matrix. The design measures $\xi_N(x)$ converge weakly to $\xi(x)$, and the function
$$ \tilde\nu(\theta,\theta^*) = \int \tilde f(x,\theta,\theta^*)\,\xi(dx) $$
is continuous with respect to $\theta$ and attains its unique minimum at $\theta=\theta^*$, where
$$ \tilde f(x,\theta,\theta^*) = [\eta(x,\theta) - \eta(x,\theta^*)]^T S_1^{-1} [\eta(x,\theta) - \eta(x,\theta^*)] + \mathrm{tr}\left\{ [S(x,\theta) - S(x,\theta^*)]\,S_1^{-1} \right\}^2. $$
The following theorem establishes the asymptotic equivalence of the combined iterated estimator (11), (12) and the MLE (4). The introduction of stationary points of the log-likelihood function in the statement of the theorem is similar to Cramer's (1946) definition of the MLE.

Theorem 1. Under the regularity Assumptions 1 and 3,
$$ \lim_{N\to\infty} P\{\hat\theta_N \in \Theta_N\} = 1, $$
where $\Theta_N$ is the set of stationary points of the log-likelihood function $L_N(\theta)$,
$$ \Theta_N = \left\{\theta:\ \frac{\partial L_N(\theta)}{\partial\theta_j} = 0,\ j=1,\dots,m\right\}. $$
The proof of the theorem is postponed to the Appendix.

Remark 2. The introduction of the term $\Delta(x_i;\theta,\theta^0)$ in (12), together with Assumption 3, guarantees that for any $\theta^0$ the unique minimum of $\lim_{N\to\infty} \mathrm{E}\,v_N^2(\theta;\theta^0)/N$ with respect to $\theta$ is attained at $\theta=\theta^*$; see Lemma 1 in the Appendix for details. Note that if the iterated estimator (11), (12) converges, then $\Delta(x_i;\theta_t,\theta_{t-1}) \to 0$ as $t\to\infty$. Therefore, in some situations this term can be omitted if the starting point $\theta_0$ is close enough to the true value; see Section 3.1 for an example.

Remark 3. If the function $\tilde\nu(\theta,\theta^*)$ defined in Assumption 3 attains its unique minimum at $\theta=\theta^*$, then so does the function $\nu(\theta,\theta^*) = \int f(x,\theta,\theta^*)\,\xi(dx)$; see Remark 1. To verify this, note that if $S$ and $S^*$ are positive definite matrices, then the matrix function $g(S) = \log|S| + \mathrm{tr}[S^{-1}S^*]$ attains its unique minimum at $S = S^*$; see Seber (1984, Appendix A7).

Remark 4. If in (12) one chooses $\theta^0 = \theta$ and, similarly to (9), considers
$$ \hat\theta_N = \arg\min_{\theta\in\Omega} v_N^2(\theta;\theta), $$
then, using the approach described in Fedorov (1974), it can be verified that this "non-reweighted" estimator is, in general, inconsistent.
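The combined criterion (12) is equally easy to code. The sketch below is our illustration under the same assumed `eta` and `S` interface as before; one step of (11) then amounts to minimizing it over `theta` with `theta_prev` held fixed, for example with `scipy.optimize.minimize`.

```python
import numpy as np

def v2(theta, theta_prev, xs, ys, eta, S):
    """Combined criterion v_N^2(theta; theta_prev) of equation (12):
    the usual weighted residual sum of squares plus squared deviations of
    the model variance S(x, theta) from the observed residual matrices."""
    total = 0.0
    for x, y in zip(xs, ys):
        S0inv = np.linalg.inv(S(x, theta_prev))
        r = y - eta(x, theta)                 # residual at the candidate theta
        r0 = y - eta(x, theta_prev)           # residual at the previous theta
        d = eta(x, theta) - eta(x, theta_prev)
        Delta = np.outer(d, d)                # correction term Delta(x; theta, theta_prev)
        A = (np.outer(r0, r0) - Delta - S(x, theta)) @ S0inv
        total += r @ S0inv @ r + 0.5 * np.trace(A @ A)
    return total
```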
3.1 Multivariate linear regression with unknown but constant covariance matrix

Let $y$ be a $k\times 1$ normally distributed vector with $\mathrm{E}[y|x] = F^T(x)\gamma$ and $\mathrm{Var}[y|x] = S$, i.e.
$$ \theta^T = (\gamma^T, S_{11}, S_{21}, \dots, S_{k1}, S_{22}, S_{32}, \dots, S_{kk}) = (\gamma^T, \mathrm{vech}^T(S)), $$
with $F(x) = [f_1(x),\dots,f_k(x)]$. We follow Harville (1997, Ch. 16.4) in using the notation vech for the element-wise "vectorization" of a $k\times k$ symmetric matrix. The simplest estimator of the regression parameter $\gamma$ is given by
$$ \tilde\gamma_N = \left[\sum_{i=1}^N F(x_i)F^T(x_i)\right]^{-1} \sum_{i=1}^N F(x_i)\,y_i, $$
which is unbiased (though not efficient) and thus provides a choice of $\theta^0$ which allows for dropping the term $\Delta(x_i;\theta,\theta^0)$ on the right-hand side of (12); see Remark 2. In this case, the modified function $v_N^2(\theta;\theta^0)$ can be presented as
$$ v_N^2(\theta;\theta^0) = \sum_{i=1}^N [y_i - F^T(x_i)\gamma]^T (S^0)^{-1} [y_i - F^T(x_i)\gamma] $$
$$ \qquad + \frac{1}{2}\sum_{i=1}^N \mathrm{tr}\left\{ \left( [y_i - F^T(x_i)\gamma^0]\,[y_i - F^T(x_i)\gamma^0]^T - S \right)(S^0)^{-1} \right\}^2. \qquad (13) $$
The parameters $\gamma$ and $S$ in (13) are nicely partitioned, and if $k_i \equiv k$, then each step of the iterative procedure (11) can be presented in closed form,
$$ \gamma_t = M_{t-1}^{-1}\,Y_{t-1} \qquad \text{and} \qquad S_t = N^{-1}\sum_{i=1}^N [y_i - F^T(x_i)\gamma_{t-1}]\,[y_i - F^T(x_i)\gamma_{t-1}]^T, \qquad (14) $$
where
$$ M_t = \sum_{i=1}^N F(x_i)\,S_t^{-1}\,F^T(x_i), \qquad Y_t = \sum_{i=1}^N F(x_i)\,S_t^{-1}\,y_i; $$
cf. Fedorov (1977). Consequently,
$$ \hat\gamma_N = \lim_{t\to\infty}\gamma_t \qquad \text{and} \qquad \hat S_N = \lim_{t\to\infty} S_t. $$
The information matrix of $\hat\theta_N^T = (\hat\gamma_N^T, \mathrm{vech}^T(\hat S_N))$ is blockwise with respect to $\gamma$ and $\mathrm{vech}(S)$, therefore the asymptotic dispersion matrices can be computed separately. For instance,
$$ \mathrm{Var}(\hat\gamma_N) \simeq \left[\sum_{i=1}^N F(x_i)\,\hat S_N^{-1}\,F^T(x_i)\right]^{-1}. $$
Note that the second formula on the right-hand side of (14) is valid only if all $k$ components are measured at all points $x_i$. Otherwise, $S_t$ cannot be presented in closed form for any non-diagonal case.

4 Optimal Design of Experiments

As soon as the information matrix of a single measurement is defined, it is a rather straightforward task to construct numerical algorithms and derive their properties in the optimal design theory setting. Let $n_i$ measurements be taken at point $x_i$, and let $\sum_{i=1}^n n_i = N$, where $n$ is the number of distinct $x_i$. Let the design $\xi_N$ be defined as
$$ \xi_N = \left\{(n_i, x_i)_1^n;\ \sum_{i=1}^n n_i = N,\ x_i \in \mathcal{X}\right\}, $$
where $\mathcal{X}$ is a design region. Each design generates the information matrix
$$ M_N(\theta) = \sum_i n_i\,\mu(x_i,\theta) = N\sum_{i=1}^n p_i\,\mu(x_i,\theta) = N\,M(\xi,\theta), \qquad M(\xi,\theta) = \sum_{i=1}^n p_i\,\mu(x_i,\theta), $$
where the weights are $p_i = n_i/N$, $M(\xi,\theta)$ is a normalized information matrix, and $\xi = \{x_i, p_i\}$ is a normalized (continuous, or approximate) design. In this setting $N$ may be viewed as a resource available to a practitioner; see Section 5 for a different normalization. In convex design theory, it is standard to allow the weights $p_i$ to vary continuously. The goal is to minimize various functionals of the normalized variance matrix $M^{-1}(\xi,\theta)$:
$$ \xi^* = \arg\min_\xi \Psi[M^{-1}(\xi,\theta)]. $$
Among popular optimality criteria are the D-criterion, $\Psi = \log|M^{-1}(\xi,\theta)|$, which is often called a generalized variance and is related to the volume of the confidence ellipsoid, and the A- (linear) criterion, $\Psi = \mathrm{tr}\,AM^{-1}(\xi,\theta)$, where $A$ is an $m\times m$ non-negative definite matrix and $m$ is the number of unknown parameters in the model.

Using classic optimal design techniques, one can establish an analog of the generalized equivalence theorem. A necessary and sufficient condition for a design $\xi^*$ to be locally optimal is the inequality
$$ \psi(x,\xi^*,\theta) = \mathrm{tr}\left[\mu(x,\theta)\,M^{-1}(\xi^*,\theta)\right] \le m, \qquad m = \dim\theta, \qquad (15) $$
in the case of the D-criterion, and
$$ \psi(x,\xi^*,\theta) = \mathrm{tr}\left[\mu(x,\theta)\,M^{-1}(\xi^*,\theta)\,A\,M^{-1}(\xi^*,\theta)\right] \le \mathrm{tr}\left[A\,M^{-1}(\xi^*,\theta)\right] \qquad (16) $$
in the case of the linear criterion. Equality in (15), or (16), is attained at the support points of the optimal design $\xi^*$. The function $\psi(x,\xi,\theta)$ is often called the sensitivity function of the corresponding criterion; see Fedorov and Hackl (1997, Ch. 2). For a single response function, $k=1$, the sensitivity function of the D-criterion can be rewritten as
$$ \psi(x,\xi,\theta) = \frac{d_1(x,\xi,\theta)}{S(x,\theta)} + \frac{d_2(x,\xi,\theta)}{2S^2(x,\theta)}, \qquad (17) $$
where
$$ d_1(x,\xi,\theta) = \frac{\partial\eta^T(x,\theta)}{\partial\theta}\,M^{-1}(\xi,\theta)\,\frac{\partial\eta(x,\theta)}{\partial\theta}, \qquad d_2(x,\xi,\theta) = \frac{\partial S(x,\theta)}{\partial\theta^T}\,M^{-1}(\xi,\theta)\,\frac{\partial S(x,\theta)}{\partial\theta}; \qquad (18) $$
see Downing et al. (2001). The scalar case was extensively discussed by Atkinson and Cook (1995) for various partitioning schemes of the parameter vector $\theta$, including separate and overlapping parameters in the variance function.
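For the scalar-response case, formulas (17)-(18) can be evaluated directly. The following sketch is our illustration: `eta` and `S` are assumed scalar-valued user functions of a design point and the parameter vector, gradients are taken by finite differences, and a design is passed as a list of (weight, point) pairs.

```python
import numpy as np

def sensitivity_D(x, design, theta, eta, S, eps=1e-6):
    """Sensitivity function (17) of the D-criterion for a single response:
    psi(x, xi, theta) = d1/S + d2/(2 S^2), with d1, d2 from (18)."""
    m = len(theta)

    def grads(xp):
        ge, gs = np.zeros(m), np.zeros(m)
        for a in range(m):
            tp, tm = np.array(theta, float), np.array(theta, float)
            tp[a] += eps; tm[a] -= eps
            ge[a] = (eta(xp, tp) - eta(xp, tm)) / (2 * eps)
            gs[a] = (S(xp, tp) - S(xp, tm)) / (2 * eps)
        return ge, gs

    # normalized information matrix M(xi, theta) for the scalar-response model
    M = np.zeros((m, m))
    for w, pt in design:
        ge, gs = grads(pt)
        s = S(pt, theta)
        M += w * (np.outer(ge, ge) / s + np.outer(gs, gs) / (2 * s**2))
    Minv = np.linalg.inv(M)

    ge, gs = grads(x)
    s = S(x, theta)
    return ge @ Minv @ ge / s + gs @ Minv @ gs / (2 * s**2)
```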
Example 3.1 (continued). As mentioned above, in this example the information matrix is blockwise with respect to $\gamma$ and $S$, i.e.
$$ \mu(x,\theta) = \begin{pmatrix} \mu_\gamma(x) & 0 \\ 0 & \mu_{SS} \end{pmatrix}, \qquad M(\xi,\theta) = \begin{pmatrix} M_\gamma(\xi,\theta) & 0 \\ 0 & \mu_{SS} \end{pmatrix}, $$
where $\mu_\gamma(x) = F(x)\,S^{-1}\,F^T(x)$, $M_\gamma(\xi,\theta) = \sum_i p_i\,\mu_\gamma(x_i)$, the 0's are zero matrices of proper size, and $\mu_{SS}$ does not depend on $x$. Therefore, (15) admits the following presentation:
$$ \mathrm{tr}\left[S^{-1}\,F^T(x)\,M_\gamma^{-1}(\xi^*,\theta)\,F(x)\right] \le \dim\gamma. \qquad (19) $$
The matrix $d_1 = F^T(x)\,M_\gamma^{-1}(\xi,\theta)\,F(x)$ is the asymptotic dispersion matrix of the predicted response vector $F^T(x)\hat\gamma$ at point $x$. Note that in general the design may depend on $S$, i.e. (15) or (16) lead to a locally optimal design.

Formulas like (15) and (16) provide a basis for first order numerical algorithms similar to those discussed in major texts on experimental design; cf. Atkinson and Donev (1992), Fedorov and Hackl (1997). While not discussing these algorithms in general, we provide some special cases in the examples.

4.1 Dose response model

In dose response studies, the response is often described by a logistic function,
$$ \eta(x,\theta) = \eta(x,\gamma) = \gamma_1 + \frac{\gamma_2 - \gamma_1}{1 + (x/\gamma_3)^{\gamma_4}}, \qquad (20) $$
where $x$ is a given dose. The power model is a popular choice for the variance function,
$$ S(x,\theta) = \sigma_1\,\eta^{\sigma_2}(x,\gamma), \qquad \theta^T = (\gamma^T, \sigma^T). \qquad (21) $$
To illustrate D-optimal design for this model, we use the data from a study on the ability of a compound to inhibit the proliferation of bone marrow erythroleukemia cells in a cell-based assay; see Downing et al. (2001). The vector of unknown parameters was estimated by fitting the data collected from a two-fold dilution design covering a range of concentrations from 0.98 to 500 ng/ml. Thus the design region was set to [-0.02, 6.21] on the log scale. Fig. 1 presents the locally optimal design and the variance of prediction for the model (20), (21), where $\theta^* = (616,\ 1646,\ 75.2,\ 1.34,\ 0.33,\ 0.90)^T$.

The two subplots on the right side of Fig. 1 present the normalized variance of prediction $\psi(x,\xi,\theta)$ defined in (17) for a serial dilution design $\xi_0$ (i.e. uniform on the log scale, upper right) and for the D-optimal design $\xi^*$ (lower right). The solid lines show the function $\psi(x,\xi,\theta)$, while the dashed and dotted lines display the first and second terms on the right-hand side of (17), respectively. The unnormalized variance of prediction $d_1(x,\xi,\theta)$, defined in (18), is given in the lower-left subplot.

[Figure 1. Upper left: response $\eta(x,\theta)$. Upper right: normalized variance $\psi(x,\xi_0,\theta)$ for the uniform (serial dilution) design. Lower right: normalized variance $\psi(x,\xi^*,\theta)$ for the optimal design. Lower left: unnormalized variance $d_1(x,\xi,\theta)$ (triangles: optimal design; circles: serial dilution).]

It is worth noting that the optimal design in our example is supported at just four points, which is less than the number of estimated parameters. We also remark that the weights of the support points are not equal: in our example $p^* = \{0.28,\ 0.22,\ 0.22,\ 0.28\}$.
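A simple way to reproduce a design of this type is a first-order (Fedorov-Wynn type) algorithm that repeatedly moves a small amount of weight to the point where the sensitivity function is largest. The sketch below is our own illustration for the model (20)-(21) with the parameter values quoted above; the grid, step rule, and iteration count are arbitrary choices, and the printed support is only expected to approximate the four-point design reported in the text.

```python
import numpy as np

# Four-parameter logistic mean (20) with power-model variance (21);
# the parameter values are those quoted in the text.
theta = np.array([616.0, 1646.0, 75.2, 1.34, 0.33, 0.90])

def eta(logx, th):
    x = np.exp(logx)
    return th[0] + (th[1] - th[0]) / (1.0 + (x / th[2]) ** th[3])

def S(logx, th):
    return th[4] * eta(logx, th) ** th[5]

def mu(logx, th, eps=1e-5):
    """Per-point information matrix of a scalar response, cf. (2) and (17)-(18)."""
    m = len(th)
    ge, gs = np.zeros(m), np.zeros(m)
    for a in range(m):
        tp, tm = th.copy(), th.copy()
        h = eps * max(abs(th[a]), 1.0)
        tp[a] += h; tm[a] -= h
        ge[a] = (eta(logx, tp) - eta(logx, tm)) / (2 * h)
        gs[a] = (S(logx, tp) - S(logx, tm)) / (2 * h)
    s = S(logx, th)
    return np.outer(ge, ge) / s + np.outer(gs, gs) / (2 * s * s)

# first-order algorithm: move weight towards the maximizer of the sensitivity
grid = np.linspace(-0.02, 6.21, 200)           # design region, log-dose scale
mus = [mu(xg, theta) for xg in grid]           # per-point information, fixed
w = np.full(len(grid), 1.0 / len(grid))        # start from the uniform design
for it in range(1, 3001):
    M = sum(wi * m_i for wi, m_i in zip(w, mus))
    Minv = np.linalg.inv(M)
    psi = np.array([np.trace(m_i @ Minv) for m_i in mus])
    j = int(np.argmax(psi))
    alpha = 1.0 / (it + len(theta))
    w = (1 - alpha) * w
    w[j] += alpha

support = [(round(xg, 2), round(wi, 3)) for xg, wi in zip(grid, w) if wi > 0.02]
print(support)    # weight should concentrate on a handful of log-dose values
```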
5 Optimal Designs Under Cost Constraints

Traditionally, when normalized designs are discussed, the normalization factor is equal to the number of experiments $N$; see Section 4. Now let each measurement at point $x_i$ be associated with a cost $c(x_i)$, and let there exist a restriction on the total cost,
$$ \sum_{i=1}^n n_i\,c(x_i) \le C. \qquad (22) $$
In this case it is quite natural to normalize the information matrix by the total cost $C$ and introduce
$$ M_C(\xi,\theta) = \frac{M_N(\theta)}{C} = \sum_i w_i\,\tilde\mu(x_i,\theta), \qquad \text{with} \quad w_i = \frac{n_i\,c(x_i)}{C}, \quad \tilde\mu(x,\theta) = \frac{\mu(x,\theta)}{c(x)}. \qquad (23) $$
Note that the considered case should not be confused with the case when, in addition to (23), one also imposes $\sum_i n_i \le N$. The corresponding design problem is more complicated and must be addressed as discussed in Cook and Fedorov (1995). As soon as the cost function $c(x)$ is defined, one can use the well elaborated techniques of constructing continuous designs for various criteria of optimality:
$$ \Psi[M_C^{-1}(\xi,\theta)] \to \min_\xi, \qquad \text{where } \xi = \{w_i, x_i\}. $$
As usual, to obtain the frequencies $n_i$, the values $\tilde n_i = w_i\,C/c(x_i)$ have to be rounded to the nearest integers $n_i$ subject to $\sum_i n_i\,c(x_i) \le C$; for details on rounding, see Pukelsheim (1993, Ch. 12).

To illustrate the potential of this approach, we construct D-optimal designs for the two-dimensional response function $\eta(x,\theta) = [\eta_1(x,\theta), \eta_2(x,\theta)]^T$ with variance matrix
$$ S(x,\theta) = \begin{pmatrix} S_{11}(x,\theta) & S_{12}(x,\theta) \\ S_{12}(x,\theta) & S_{22}(x,\theta) \end{pmatrix}. \qquad (24) $$
Let a single measurement of the function $\eta_i(x,\theta)$ cost $c_i(x)$, $i=1,2$. Additionally, we impose a cost $c_v(x)$ on any single or pair of measurements. The rationale behind this model comes from considering a hypothetical visit of a patient to the clinic to participate in a clinical trial. It is assumed that each visit costs $c_v(x)$, where $x$ denotes a patient (or, more appropriately, some patient's characteristics). There are three options for each patient:

(1) Take test $t_1$, which by itself costs $c_1(x)$; the total cost of this option is $C_1(x) = c_v(x) + c_1(x)$.
(2) Take test $t_2$, which costs $c_2(x)$; the total cost is $C_2(x) = c_v(x) + c_2(x)$.
(3) Take both tests $t_1$ and $t_2$; in this case the cost is $C_3(x) = c_v(x) + c_1(x) + c_2(x)$.

Another interpretation could be measuring a pharmacokinetic profile (blood concentration) at one or two time points.

To consider this example within the traditional framework, introduce binary variables $x_1$ and $x_2$, $x_i = \{0 \text{ or } 1\}$, $i = 1,2$. Let $X = (x, x_1, x_2)$, where $x$ belongs to a "traditional" design region $\mathcal{X}$, and the pair $(x_1, x_2)$ belongs to
$$ \mathcal{X}_{12} = \{(x_1,x_2):\ x_i = 0 \text{ or } 1,\ \max(x_1,x_2) = 1\}. $$
Define
$$ \eta(X,\theta) = I_{x_1,x_2}\,\eta(x,\theta), \qquad S(X,\theta) = I_{x_1,x_2}\,S(x,\theta)\,I_{x_1,x_2}, \qquad \text{where } I_{x_1,x_2} = \begin{pmatrix} x_1 & 0 \\ 0 & x_2 \end{pmatrix}. \qquad (25) $$
Now introduce the "extended" design region $\mathbb{X}$,
$$ \mathbb{X} = \mathcal{X} \times \mathcal{X}_{12} = Z_1 \cup Z_2 \cup Z_3, \qquad (26) $$
where
$$ Z_1 = \{(x,x_1,x_2):\ x\in\mathcal{X},\ x_1=1,\ x_2=0\}, \qquad Z_2 = \{(x,x_1,x_2):\ x\in\mathcal{X},\ x_1=0,\ x_2=1\}, $$
$$ Z_3 = \{(x,x_1,x_2):\ x\in\mathcal{X},\ x_1=x_2=1\}, \qquad \text{and} \qquad Z_1\cap Z_2\cap Z_3 = \emptyset. $$
The normalized information matrix $M_C(\xi,\theta)$ and the design $\xi$ are defined as
$$ M_C(\xi,\theta) = \sum_{i=1}^n w_i\,\tilde\mu(X_i,\theta), \qquad \sum_{i=1}^n w_i = 1, \qquad \xi = \{X_i, w_i\}, $$
where $\tilde\mu(X,\theta) = \mu(X,\theta)/C_i(x)$ if $X\in Z_i$, $i=1,2,3$, with $\mu(X,\theta)$ defined in (2) and $\eta(X,\theta)$, $S(X,\theta)$ introduced in (25). Note that the generalization to $k>2$ is straightforward; one has to introduce $k$ binary variables $x_i$ and the matrix $I_{x_1,\dots,x_k} = \mathrm{diag}(x_i)$. The number of subregions $Z_i$ in this case is equal to $2^k - 1$. The formulation (26) will be used in the example below to demonstrate the performance of various designs with respect to the sensitivity function $\psi(X,\xi,\theta)$, $X\in Z_i$.
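The cost normalization in (23) and the extended design region (25)-(26) can be combined in code as follows. This sketch is our own illustration: for a point $X = (x, x_1, x_2)$ it forms the information matrix of the observed components only (which is equivalent to using $I_{x_1,x_2}$ in (25), since unobserved components carry no information) and divides it by the corresponding cost $C_i(x)$. The functions `eta2` and `S2` are assumed user-supplied and return the length-2 mean vector and the 2x2 variance matrix.

```python
import numpy as np

def info_per_cost(x, flags, theta, eta2, S2, cv, ctests, eps=1e-6):
    """Cost-normalized information mu(X, theta)/C(X) for an extended design
    point X = (x, x1, x2) as in (25)-(26).  flags = (x1, x2), each 0 or 1,
    marks which responses are observed; the cost is the visit cost cv plus
    the cost of every test actually taken."""
    obs = [i for i, f in enumerate(flags) if f]           # observed components
    cost = cv + sum(c for c, f in zip(ctests, flags) if f)
    m = len(theta)

    def eta_obs(th): return eta2(x, th)[obs]
    def S_obs(th):   return S2(x, th)[np.ix_(obs, obs)]

    Sinv = np.linalg.inv(S_obs(theta))
    d_eta, d_S = [], []
    for a in range(m):
        tp, tm = np.array(theta, float), np.array(theta, float)
        tp[a] += eps; tm[a] -= eps
        d_eta.append((eta_obs(tp) - eta_obs(tm)) / (2 * eps))
        d_S.append((S_obs(tp) - S_obs(tm)) / (2 * eps))

    mu = np.zeros((m, m))
    for a in range(m):
        for b in range(m):
            mu[a, b] = (d_eta[a] @ Sinv @ d_eta[b]
                        + 0.5 * np.trace(Sinv @ d_S[a] @ Sinv @ d_S[b]))
    return mu / cost
```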
5.1 Two response functions with cost constraints

For the example, we selected the functions
$$ \eta_1(x,\theta) = \gamma_1 + \gamma_2 x + \gamma_3 x^2 + \gamma_4 x^3 = F_1^T(x)\,\gamma, \qquad \eta_2(x,\theta) = \gamma_1 + \gamma_5 x + \gamma_6 x_+ = F_2^T(x)\,\gamma, $$
where
$$ F_1(x) = (1, x, x^2, x^3, 0, 0)^T, \qquad F_2(x) = (1, 0, 0, 0, x, x_+)^T, \qquad x\in\mathcal{X}=[-1,1], $$
and $x_+ = \{x$ if $x\ge 0$, and $0$ otherwise$\}$; see Fig. 2.

[Figure 2. Response functions $\eta_1$ and $\eta_2$ on $[-1,1]$; parameter $\theta = (1, 2, 3, -1, 2, -1.5)^T$.]

The cost functions are selected as constants $c_v, c_1, c_2 \in [0,1]$ and do not depend on $x$. Similarly, the variance matrix $S(x,\theta)$ is constant,
$$ S(x,\theta) = \begin{pmatrix} S_{11} & S_{12} \\ S_{12} & S_{22} \end{pmatrix}. $$
In our computations, we take $S_{11} = S_{22} = 1$ and $S_{12} = \lambda$, $0 \le \lambda \le 1$, thus changing the value of $S_{12}$ only. Note that $\theta = (\gamma_1, \gamma_2, \dots, \gamma_6, S_{11}, S_{12}, S_{22})^T$. The functions $\eta_1, \eta_2$ are linear with respect to the unknown parameters $\gamma$, so optimal designs do not depend on their values. On the contrary, in this example optimal designs do depend on the values of the variance parameters $S_{ij}$, i.e. we construct locally optimal designs with respect to their values (compare Figures 4 and 5). We considered a rather simple example to illustrate the approach. Nevertheless, it allows us to demonstrate how changes in the cost functions and the variance parameter affect the selection of design points.

For the first run, we choose $c_v = 1$, $c_1 = c_2 = 0$, $\lambda = 0$; see Fig. 3, which shows the sensitivity function $\psi(X,\xi^*,\theta) = \mathrm{tr}[\tilde\mu(X,\theta)\,M_C^{-1}(\xi^*,\theta)]$ for $X\in Z_j$, $j=1,2,3$. Not surprisingly, in this case the selected design points lie in subregion $Z_3$. Indeed, since individual measurements cost nothing, it is beneficial to take two measurements instead of a single one, to gain more information and to decrease the variability of the parameter estimates. The weights of the support points are shown in the plot, which illustrates the generalized equivalence theorem: the sensitivity function hits the reference line $m = 9$ at the support points of the D-optimal design; recall that $\dim(\theta) = 9$.

[Figure 3. Sensitivity function and D-optimal design for $c_v = 1$, $c_1 = c_2 = 0$, $\lambda = 0$: all support points lie in subregion $Z_3$, with weights 0.321, 0.162, 0.043, 0.164, 0.325.]

If we introduce positive costs for the individual measurements, then the weights are redistributed. The case $c_v = 1$, $c_1 = c_2 = 0.5$, $\lambda = 0$ is presented in Fig. 4. Compared to Fig. 3, the design weights in the middle of subregion $Z_3$ shift to subregion $Z_1$, where two new points appear: $x_{4,5} = \mp 0.45$ with weights $w_{4,5} \approx 0.18$. It is interesting that in this case no support points appear in subregion $Z_2$ (i.e. measuring the second function only).

[Figure 4. Sensitivity function and D-optimal design for $c_v = 1$, $c_1 = c_2 = 0.5$, $\lambda = 0$.]

The next case deals with positive correlation, $\lambda = 0.3$, and $c_v = 1$, $c_1 = c_2 = 0.5$; see Fig. 5.
Now there are just four support points in the design: two of them are at the boundaries of subregion $Z_3$ with weights $w_{1,2} \approx 0.33$, and the other two are in the middle of subregion $Z_1$, $x_{3,4} = \mp 0.45$, with weights $w_{3,4} \approx 0.17$.

[Figure 5. Sensitivity function and D-optimal design for $c_v = 1$, $c_1 = c_2 = 0.5$, $\lambda = 0.3$.]

So far, subregion $Z_2$ has not been represented in the optimal designs. Fig. 6 illustrates a case when support points appear in this subregion. For this, we take $c_v = 1$, $c_1 = 1$, $c_2 = 0.1$, $\lambda = 0.2$. A design point $x_5 = 0$ appears in the center of $Z_2$ with weight $w_5 \approx 0.1$. Not surprisingly, subregion $Z_1$ has no support points in this example, since the cost of measuring function $\eta_1$ is much higher than for function $\eta_2$.

[Figure 6. Sensitivity function and D-optimal design for $c_v = 1$, $c_1 = 1$, $c_2 = 0.1$, $\lambda = 0.2$.]

5.2 Linear regression with random parameters

Let
$$ \mathrm{E}(y|\gamma,x) = \gamma^T f(x), \qquad \mathrm{Var}(y|\gamma,x) = \sigma^2. \qquad (27) $$
We assume that, given $\gamma$, all observations are independent. The parameters $\gamma \in R^m$ are independently sampled from a normal population with
$$ \mathrm{E}(\gamma) = \gamma^0, \qquad \mathrm{Var}(\gamma) = \Lambda, \qquad (28) $$
where $\gamma^0$ and $\Lambda$ are often referred to as "population", or "global", parameters. Let
$$ f(x_{ij}) = [f_1(x_{ij}),\dots,f_m(x_{ij})]^T, \quad i = 1,\dots,k_j; \qquad y_j = (y_{1j},\dots,y_{k_j,j})^T, \qquad x_j = (x_{1j},\dots,x_{k_j,j}), $$
and $F(x_j) = \left[f(x_{1j}),\dots,f(x_{k_j,j})\right]$. Then model (27) can be represented as
$$ \mathrm{E}(y_j|\gamma_j, x_j) = F^T(x_j)\,\gamma_j. \qquad (29) $$
We emphasize that different numbers of measurements $k_j$ can be obtained for different individuals $j$. The predictor $x_{ij}$ is a $q$-dimensional vector; for example, if a patient receives a $q$-drug treatment, $x_{uij}$ denotes the dose level of drug $u$ administered to individual $j$ in experiment $i$, $u = 1,\dots,q$. From (28) and (29) it follows that
$$ \mathrm{E}(y|x_j) = \eta(x_j,\gamma^0) = F^T(x_j)\,\gamma^0, \qquad \mathrm{Var}(y|x_j) = S(\Lambda,\sigma^2,x_j) = F^T(x_j)\,\Lambda\,F(x_j) + \sigma^2 I_{k_j}. \qquad (30) $$
We first assume that $\Lambda$ is diagonal, i.e. $\Lambda = \mathrm{diag}(\lambda_\alpha)$, $\alpha = 1,\dots,m$. For a discussion of how to tackle the general case of a non-diagonal matrix $\Lambda$, see Remark 5 in the Appendix. Straightforward exercises in matrix algebra lead from (2) to the following representation of the information matrix:
$$ \mu_N(\gamma,\Lambda,\sigma^2) = \sum_{j=1}^N \mu(x_j,\theta) = \begin{pmatrix} \mu_{N,\gamma} & 0_{m,m} & 0_{m,1} \\ 0_{m,m} & \mu_{N,\Lambda} & \mu_{N,\Lambda\sigma} \\ 0_{1,m} & \mu_{N,\Lambda\sigma}^T & \mu_{N,\sigma} \end{pmatrix} = \sum_{j=1}^N \begin{pmatrix} \mu_{\gamma,j} & 0_{m,m} & 0_{m,1} \\ 0_{m,m} & \mu_{\Lambda,j} & \mu_{\Lambda\sigma,j} \\ 0_{1,m} & \mu_{\Lambda\sigma,j}^T & \mu_{\sigma,j} \end{pmatrix}, \qquad (31) $$
where
$$ \mu_{\gamma,j} = F(x_j)\,S_j^{-1}\,F^T(x_j), \qquad S_j = S(\Lambda,\sigma^2,x_j), $$
$$ \{\mu_{\Lambda,j}\}_{\alpha\beta} = \frac{1}{2}\left[F_\alpha(x_j)\,S_j^{-1}\,F_\beta^T(x_j)\right]^2, \quad \alpha,\beta = 1,\dots,m, $$
$$ \{\mu_{\Lambda\sigma,j}\}_\alpha = \frac{1}{2}\,F_\alpha(x_j)\,S_j^{-2}\,F_\alpha^T(x_j), \quad \alpha = 1,\dots,m, \qquad \mu_{\sigma,j} = \frac{1}{2}\,\mathrm{tr}\,S_j^{-2}, $$
$F_\alpha(x_j)$ denotes the $\alpha$-th row of $F(x_j) = [f(x_{1j}), f(x_{2j}), \dots, f(x_{k_j,j})]$, and $0_{a,b}$ is an $(a\times b)$ matrix of zeros. Thus the sets of parameters $\gamma$ and $\{\Lambda, \sigma\}$ are mutually orthogonal. This makes optimal design and estimation problems computationally more affordable.
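For this random-effects model the blocks in (31) are explicit, so no numerical differentiation is needed. The sketch below is our illustration, assuming a diagonal $\Lambda$ as in the text: it builds $S_j$ from (30) and returns the per-subject information blocks.

```python
import numpy as np

def info_blocks(F, Lam_diag, sigma2):
    """Blocks of the per-subject information matrix (31) for the random-effects
    model (30).  F is the m x k matrix F(x_j) of regressors for one subject,
    Lam_diag the diagonal of Lambda, sigma2 the residual variance."""
    m, k = F.shape
    Sj = F.T @ np.diag(Lam_diag) @ F + sigma2 * np.eye(k)   # Var(y | x_j), eq. (30)
    Sinv = np.linalg.inv(Sj)

    mu_gamma = F @ Sinv @ F.T                  # block for gamma; entries F_a S^{-1} F_b^T
    mu_Lam = 0.5 * mu_gamma**2                 # block for diag(Lambda): 1/2 (F_a S^{-1} F_b^T)^2
    mu_Lam_sig = 0.5 * np.einsum('ak,kl,al->a', F, Sinv @ Sinv, F)  # 1/2 F_a S^{-2} F_a^T
    mu_sig = 0.5 * np.trace(Sinv @ Sinv)       # block for sigma^2: 1/2 tr S^{-2}
    return mu_gamma, mu_Lam, mu_Lam_sig, mu_sig
```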
Iterated estimators, similar to the example in Section 3.1, can be written as
$$ \hat\gamma_N = \lim_{t\to\infty}\gamma_t, \qquad \hat\Lambda_N = \lim_{t\to\infty}\Lambda_t, \qquad \hat\sigma_N^2 = \lim_{t\to\infty}\sigma_t^2, $$
where
$$ \gamma_{t+1} = M_{\gamma,t}^{-1}\,Y_t, \qquad M_{\gamma,t} = \sum_{j=1}^N F(x_j)\,S_{tj}^{-1}\,F^T(x_j), \qquad Y_t = \sum_{j=1}^N F(x_j)\,S_{tj}^{-1}\,y_j, \qquad S_{tj} = S(\Lambda_t,\sigma_t^2,x_j), $$
$$ \begin{pmatrix} \lambda_{t+1} \\ \sigma_{t+1}^2 \end{pmatrix} = \begin{pmatrix} M_{\Lambda,t} & M_{\Lambda\sigma,t} \\ M_{\Lambda\sigma,t}^T & M_{\sigma,t} \end{pmatrix}^{-1} \begin{pmatrix} y_{t,1} \\ y_{t,2} \end{pmatrix}, \qquad (32) $$
where $\lambda_t = (\lambda_{t1},\dots,\lambda_{tm})^T$; $M_{\Lambda,t}$ is an $(m\times m)$ matrix; $M_{\Lambda\sigma,t}$ and $y_{t,1}$ are $(m\times 1)$ vectors; $M_{\Lambda,t}$, $M_{\Lambda\sigma,t}$, and $M_{\sigma,t}$ are the same as $\mu_{N,\Lambda}$, $\mu_{N,\Lambda\sigma}$, and $\mu_{N,\sigma}$, respectively, except that $S_{tj}$ is substituted for $S_j$, and
$$ \{y_{t,1}\}_\alpha = \frac{1}{2}\sum_{j=1}^N \left\{ F_\alpha(x_j)\,S_{tj}^{-1}\,[y_j - F^T(x_j)\gamma_t] \right\}^2, \quad \alpha = 1,\dots,m, \qquad y_{t,2} = \frac{1}{2}\sum_{j=1}^N [y_j - F^T(x_j)\gamma_t]^T\,S_{tj}^{-2}\,[y_j - F^T(x_j)\gamma_t]. $$
The proof of (32) is postponed to the Appendix.

Now we show how to introduce cost constraints for this example; for other methods of generating optimal designs with constraints in random effects models, see Mentre et al. (1997). The hypothetical model is similar to that of Section 5.1, where patients visit a clinic to participate in a clinical trial. Here, we assume that the patients will undergo serial sampling over time, for example blood sampling, within a specified time interval $[0, T]$. Each patient who participates in the trial may have a different number of samples taken, up to a maximum of $q$. Following Section 5.1, we impose a cost $c_v$ for each visit, and we assign a cost $c_s$ for each of the individual samples. Therefore, if $k$ samples are taken, the total cost per patient is
$$ C_k = c_v + k\,c_s, \qquad k = 1,\dots,q, $$
with the restriction (22) on the total cost. Since samples are taken over time, there is a natural ordering corresponding to the timing of the samples. The design region for a patient depends upon the number and timing of samples taken from that patient. For example, if a patient visits the clinic for a single sample only, the design region $\mathcal{X}_1$ will consist of a single point $x$, $0\le x\le T$. For a patient having two samples taken, the design region $\mathcal{X}_2$ will consist of vectors of length 2, $\mathcal{X}_2 = \{X = (x_1,x_2),\ 0\le x_1 < x_2 \le T\}$, etc. If $q$ samples are taken from a patient, then $\mathcal{X}_q = \{X = (x_1,\dots,x_q),\ 0\le x_1 < \dots < x_q \le T\}$. Finally, the normalized information matrix can be defined as
$$ M_C(\xi,\theta) = \sum_{i=1}^n w_i\,\tilde\mu(X_i,\theta), \qquad \sum_{i=1}^n w_i = 1, $$
where $\tilde\mu(X,\theta) = \mu(X,\theta)/C_k$ if $\dim(X) = k$ and $X\in\mathcal{X}_k$, and the information matrix $\mu(X,\theta)$ is defined in (31).

6 Discussion

Multiresponse regression models with a variance matrix depending upon unknown parameters form a common basis for modeling in many areas of research. Well known examples exist in biomedical research, such as in pharmacokinetics and dose response models. Further examples are prevalent in econometrics, psychometrics, and agricultural field studies, among others. A rich literature exists in which methods for parameter estimation are derived and compared. Optimal design for these types of models has also been studied, but the literature is not as well developed.

In this paper, we propose an iterated estimator, which is shown to be a least squares estimator combining the usual generalized least squares with squared deviations of the predicted dispersion matrix from the observed residual matrices. This estimator is shown to be asymptotically equivalent to the MLE. We provide closed form solutions for the proposed estimator for a linear model with random parameters. For the case of a single response, the combined iterated estimator is similar to the iterated estimator proposed by Davidian and Carroll (1987). However, they partition the parameter vector into two subvectors, with the second one appearing only in the variance function, and perform iterations on each term separately. Moreover, for cases where parameters in the expectation and variance functions coincide, the second term disappears completely from the iterations, and hence their iterated estimator does not lead to the MLE.
Optimal experimental design is a critical aspect of research, and well known algorithms can be utilized to generate such designs. We combine the proposed iterated estimator with convex design theory to generate locally optimal designs. In a specific example, a dose response study with the response modeled by a logistic function and the variance modeled by a two-parameter power function, a first order numerical algorithm was utilized to generate a D-optimal design. The optimal design was compared to a two-fold dilution design with 10 dilutions (a standard dose response design). We underline that the optimal design for this model is supported at four design points, which is less than the total of 6 parameters in the model. Therefore, combined application of the estimation and design algorithms leads to an optimal design which, if implemented, requires fewer resources than the standard design (4 points instead of 10). Certainly, for nonlinear models one constructs locally optimal designs, which require preliminary estimates of the unknown parameters. From our experience with the logistic model, the locally optimal designs are quite robust to a reasonable variation of the parameter estimates.

Finally, we introduce cost functions and demonstrate the application of cost constraints to normalized designs. This provides experimenters with a basis for incorporating a total cost, and allows them to achieve a statistically optimal design in light of these costs. Such a design offers several benefits, not the least of which is to enable the experimenter to conduct the study within budget while obtaining reliable parameter estimates. In our example with two response functions, we demonstrate the impact of cost constraints. With a fixed overall cost only (no individual costs, see Fig. 3), support points are allocated within subregion $Z_3$, which is not surprising: it is beneficial to take two measurements, rather than one, to increase the information content and reduce parameter variability. The introduction of positive costs and of correlation between the response functions shifts the design, as demonstrated in the examples; see Figs. 4-6.

In conclusion, this paper describes an iterated estimator for finding parameter estimates for multiresponse models with variance depending upon unknown parameters, combines this method with convex design theory, and introduces designs with cost constraints. These concepts can be a valuable tool for experimenters, enabling efficient parameter estimation and optimal allocation of resources.

Acknowledgements

The authors are grateful to the referees for useful comments that helped to improve the presentation of the results.

7 Appendix

In this section, we use a few standard formulas from matrix differential calculus; see Harville (1997, Ch. 15).

(1) If $S$ is a symmetric matrix which depends on a parameter $\theta$, and if $u$ is a scalar function of $S$, then
$$ \frac{du}{d\theta} = \mathrm{tr}\left[\frac{\partial u}{\partial S}\,\frac{dS}{d\theta}\right]. \qquad (33) $$
Moreover,
$$ \frac{\partial \log|S|}{\partial S} = S^{-1}, \qquad \frac{dS^{-1}}{d\theta} = -\,S^{-1}\,\frac{dS}{d\theta}\,S^{-1}. \qquad (34) $$
(2) If $A$, $S$, and $B$ are symmetric matrices of proper dimension, then
$$ \frac{\partial}{\partial S}\,\mathrm{tr}\,[(A-S)B]^2 = 2B(S-A)B. \qquad (35) $$
The following lemma is used in the proof of Theorem 1.

Lemma 1. Let $S_0 \le S(x,\theta) \le S_1$ for any $x\in\mathcal{X}$ and $\theta\in\Omega$, where $S_0$ and $S_1$ are positive definite matrices.
Let
$$ R_1(x,\theta,\theta^*,\theta^0) = [\eta(x,\theta) - \eta(x,\theta^*)]^T\,S^{-1}(x,\theta^0)\,[\eta(x,\theta) - \eta(x,\theta^*)], \qquad (36) $$
$$ R_2(x,\theta,\theta^*,\theta^0) = \frac{1}{2}\,\mathrm{tr}\left\{ \left[ R_{22}(x,\theta,\theta^*,\theta^0) + S(x,\theta^*) - S(x,\theta) \right] S^{-1}(x,\theta^0) \right\}^2, $$
$$ R_{22} = [\eta(x,\theta) - \eta(x,\theta^0)]\,[\eta(x,\theta) - \eta(x,\theta^0)]^T - [\eta(x,\theta^*) - \eta(x,\theta^0)]\,[\eta(x,\theta^*) - \eta(x,\theta^0)]^T, $$
and let, for a given $\theta$,
$$ [\eta(x,\theta) - \eta(x,\theta^*)]^T\,S_1^{-1}\,[\eta(x,\theta) - \eta(x,\theta^*)] + \frac{1}{2}\,\mathrm{tr}\left\{[S(x,\theta) - S(x,\theta^*)]\,S_1^{-1}\right\}^2 > 0. \qquad (37) $$
Then
$$ R_1(x,\theta,\theta^*,\theta^0) + R_2(x,\theta,\theta^*,\theta^0) > 0. $$

Proof. It is obvious that for any $x$, $\theta$, and $\theta^0$,
$$ R_1(x,\theta,\theta^*,\theta^0) \ge 0, \qquad R_2(x,\theta,\theta^*,\theta^0) \ge 0, \qquad \text{and} \qquad R_1 = R_2 = 0 \text{ if } \theta = \theta^*. $$
Next, the term $R_{22}$ can be represented as
$$ R_{22} = [\eta(x,\theta) - \eta(x,\theta^0)]\,[\eta(x,\theta) - \eta(x,\theta^*)]^T - [\eta(x,\theta) - \eta(x,\theta^*)]\,[\eta(x,\theta^0) - \eta(x,\theta^*)]^T. \qquad (38) $$
Since both terms on the left-hand side of (37) are non-negative, at least one of them is positive. If the first term is positive, then the lemma follows from (36) and the definition of $R_1$. Next, if the first term is equal to zero, then $\eta(x,\theta) = \eta(x,\theta^*)$, and (38) implies that
$$ R_2(x,\theta,\theta^*,\theta^0) = \frac{1}{2}\,\mathrm{tr}\left\{[S(x,\theta^*) - S(x,\theta)]\,S^{-1}(x,\theta^0)\right\}^2 \ge \frac{1}{2}\,\mathrm{tr}\left\{[S(x,\theta) - S(x,\theta^*)]\,S_1^{-1}\right\}^2 > 0, $$
since in this case the second term on the left-hand side of (37) is positive. This proves the lemma.

Proof of Theorem 1. First, compute the partial derivatives of the log-likelihood function $L_N(\theta)$ introduced in (3). Using (33) and (34), and letting $z_i(\theta) = y_i - \eta(x_i,\theta)$, one gets
$$ -2\,\frac{\partial L_N(\theta)}{\partial\theta_j} = \sum_{i=1}^N \mathrm{tr}\left[S^{-1}(x_i,\theta)\,\frac{\partial S(x_i,\theta)}{\partial\theta_j}\right] - 2\sum_{i=1}^N \frac{\partial\eta^T(x_i,\theta)}{\partial\theta_j}\,S^{-1}(x_i,\theta)\,z_i(\theta) - \sum_{i=1}^N z_i^T(\theta)\,S^{-1}(x_i,\theta)\,\frac{\partial S(x_i,\theta)}{\partial\theta_j}\,S^{-1}(x_i,\theta)\,z_i(\theta). \qquad (39) $$
Next, use (33) and (35) to compute the partial derivatives of $v_N^2(\theta;\theta^0)$ with respect to $\theta_j$ at $\theta = \theta^0$ (note that the derivative of the term $\Delta(x_i;\theta,\theta^0)$ vanishes there, since both of its factors are zero at $\theta = \theta^0$):
$$ \left.\frac{\partial v_N^2(\theta;\theta^0)}{\partial\theta_j}\right|_{\theta=\theta^0} = -2\sum_{i=1}^N \frac{\partial\eta^T(x_i,\theta^0)}{\partial\theta_j}\,S^{-1}(x_i,\theta^0)\,z_i(\theta^0) + \sum_{i=1}^N \mathrm{tr}\left[ S^{-1}(x_i,\theta^0)\left\{S(x_i,\theta^0) - z_i(\theta^0)z_i^T(\theta^0)\right\} S^{-1}(x_i,\theta^0)\,\frac{\partial S(x_i,\theta^0)}{\partial\theta_j} \right]. $$
From the identity $\mathrm{tr}[AB] = \mathrm{tr}[BA]$ it follows that
$$ \left.\frac{\partial v_N^2(\theta;\theta^0)}{\partial\theta_j}\right|_{\theta=\theta^0} = -2\sum_{i=1}^N \frac{\partial\eta^T(x_i,\theta^0)}{\partial\theta_j}\,S^{-1}(x_i,\theta^0)\,z_i(\theta^0) + \sum_{i=1}^N \mathrm{tr}\left[S^{-1}(x_i,\theta^0)\,\frac{\partial S(x_i,\theta^0)}{\partial\theta_j}\right] - \sum_{i=1}^N z_i^T(\theta^0)\,S^{-1}(x_i,\theta^0)\,\frac{\partial S(x_i,\theta^0)}{\partial\theta_j}\,S^{-1}(x_i,\theta^0)\,z_i(\theta^0), $$
which coincides with the right-hand side of (39) evaluated at $\theta = \theta^0$. Note that if the algorithm (11) converges, then under the introduced assumptions
$$ \lim_{t\to\infty} \left.\frac{\partial v_N^2(\theta;\theta_t)}{\partial\theta}\right|_{\theta=\theta_t} = \left.\frac{\partial v_N^2(\theta;\hat\theta_N)}{\partial\theta}\right|_{\theta=\hat\theta_N} = 0, $$
which implies that $\hat\theta_N \in \Theta_N$.

To prove the convergence of (11), introduce
$$ A_N(\theta^0) = \arg\min_{\theta\in\Omega} v_N^2(\theta;\theta^0). $$
Then (11) can be presented as the recursion $\theta_t = A_N(\theta_{t-1})$ for the fixed point problem $\theta = A_N(\theta)$. Convergence of the recursion is guaranteed if for any $\theta_1, \theta_2 \in \Omega$ there exists a constant $K$, $0 < K < 1$, such that
$$ [A_N(\theta_1) - A_N(\theta_2)]^T [A_N(\theta_1) - A_N(\theta_2)] \le K\,(\theta_1 - \theta_2)^T(\theta_1 - \theta_2); $$
cf. Saaty and Bram (1964, Ch. 1.10). Note that $A_N(\theta)$ is simply a generalized version of the least squares estimator with predetermined weights. Straightforward calculations show that the expectation of the $i$-th summand on the right-hand side of (12) is equal to
$$ R(x_i,\theta,\theta^*,\theta^0) = R_1(x_i,\theta,\theta^*,\theta^0) + R_2(x_i,\theta,\theta^*,\theta^0) + R_3(x_i,\theta^*,\theta^0), $$
where the terms $R_1$ and $R_2$ are introduced in Lemma 1, and the term $R_3$ does not depend on $\theta$. Lemma 1 together with Assumption 3 guarantees that for any $\theta^0$ the limiting function
$$ \lim_{N\to\infty} \frac{1}{N}\sum_i R(x_i,\theta,\theta^*,\theta^0) $$
has a unique minimum with respect to $\theta$ at $\theta = \theta^*$. Using the strong law of large numbers, and following Jennrich (1969), one can show that $A_N(\theta)$ is strongly consistent, i.e. converges almost surely to $\theta^*$; cf. Rao (1973, Ch. 2c). From this fact and the compactness of $\Omega$, it follows that the probability
$$ P\left\{ [A_N(\theta_1) - A_N(\theta_2)]^T [A_N(\theta_1) - A_N(\theta_2)] \le K\,(\theta_1 - \theta_2)^T(\theta_1 - \theta_2) \right\} $$
tends to 1 as $N\to\infty$ uniformly over $\theta_1, \theta_2 \in \Omega$ for any fixed $0 < K < 1$.
Thus, for large $N$, with probability close to 1 the limit (11) exists and, consequently, $\hat\theta_N \in \Theta_N$, which proves the theorem.

Proof of (32). Introduce $z_j = y_j - F^T(x_j)\gamma_t$ and $B_j = S_{t,j}^{-1}$, and recall that $S_j = F^T(x_j)\,\Lambda\,F(x_j) + \sigma^2 I_k$. It is straightforward to show that
$$ F^T(x_j)\,\Lambda\,F(x_j) = \sum_{\alpha=1}^m \lambda_\alpha\,F_\alpha^T(x_j)\,F_\alpha(x_j), $$
and therefore
$$ \frac{\partial S_j}{\partial\lambda_\alpha} = F_\alpha^T(x_j)\,F_\alpha(x_j), \qquad \frac{\partial S_j}{\partial\sigma^2} = I_k. \qquad (40) $$
The analogue of the second term on the right-hand side of (13) can be written as
$$ v^2 = \frac{1}{2}\sum_{j=1}^N \mathrm{tr}\left[(z_j z_j^T - S_j)\,B_j\right]^2. $$
Then, using (33), (35), (40), and the identity $\mathrm{tr}[AB] = \mathrm{tr}[BA]$, one gets
$$ \frac{\partial v^2}{\partial\lambda_\alpha} = \mathrm{tr}\left[\sum_{j=1}^N B_j\left\{F^T(x_j)\Lambda F(x_j) + \sigma^2 I_k - z_j z_j^T\right\} B_j\,F_\alpha^T(x_j)F_\alpha(x_j)\right] $$
$$ = \sum_{j=1}^N F_\alpha(x_j)\,B_j\left\{\sum_{\beta=1}^m \lambda_\beta\,F_\beta^T(x_j)F_\beta(x_j) + \sigma^2 I_k - z_j z_j^T\right\} B_j\,F_\alpha^T(x_j) = 2\left[\sum_{\beta=1}^m \lambda_\beta\,\{M_{\Lambda,t}\}_{\alpha\beta} + \sigma^2\,\{M_{\Lambda\sigma,t}\}_\alpha - \{y_{t,1}\}_\alpha\right]. $$
In a similar fashion, taking the partial derivative with respect to $\sigma^2$ leads to
$$ \frac{\partial v^2}{\partial\sigma^2} = \mathrm{tr}\left[\sum_{j=1}^N B_j\left\{F^T(x_j)\Lambda F(x_j) + \sigma^2 I_k - z_j z_j^T\right\} B_j\right] = 2\left[\sum_{\beta=1}^m \lambda_\beta\,\{M_{\Lambda\sigma,t}\}_\beta + \sigma^2\,M_{\sigma,t} - y_{t,2}\right]. $$
Finally, equating the partial derivatives to zero entails
$$ \begin{pmatrix} M_{\Lambda,t} & M_{\Lambda\sigma,t} \\ M_{\Lambda\sigma,t}^T & M_{\sigma,t} \end{pmatrix} \begin{pmatrix} \lambda \\ \sigma^2 \end{pmatrix} = \begin{pmatrix} y_{t,1} \\ y_{t,2} \end{pmatrix}, $$
which proves (32).

Remark 5. To establish the analogue of (32) in the case of a non-diagonal symmetric matrix $\Lambda = (\lambda_{\alpha\beta})$, one has to exploit the identity
$$ F^T(x_j)\,\Lambda\,F(x_j) = \sum_{\alpha,\beta=1}^m \lambda_{\alpha\beta}\,F_\alpha^T(x_j)F_\beta(x_j) = \sum_{\alpha=1}^m \lambda_{\alpha\alpha}\,F_\alpha^T(x_j)F_\alpha(x_j) + \sum_{\alpha>\beta} \lambda_{\alpha\beta}\left[F_\alpha^T(x_j)F_\beta(x_j) + F_\beta^T(x_j)F_\alpha(x_j)\right]. \qquad (41) $$
First, formula (31) should be modified according to (41). The information matrix $\mu_{N,\Lambda}$ is now an $[m(m+1)/2]\times[m(m+1)/2]$ matrix corresponding to the parameter $\mathrm{vech}(\Lambda)$ (cf. Section 3.1 for the notation vech). Let $W_{j,\alpha\beta} = F_\alpha(x_j)\,S_j^{-1}\,F_\beta^T(x_j)$. Then the elements of $\mu_{N,\Lambda}$ are defined by
$$ \mu_{N,\Lambda;\,\alpha\alpha,rr} = \frac{1}{2}\sum_{j=1}^N W_{j,\alpha r}^2, \qquad \mu_{N,\Lambda;\,\alpha\alpha,rq} = \sum_{j=1}^N W_{j,\alpha r}\,W_{j,\alpha q}, \qquad \mu_{N,\Lambda;\,\alpha\beta,rq} = \sum_{j=1}^N \left[W_{j,\alpha r}\,W_{j,\beta q} + W_{j,\alpha q}\,W_{j,\beta r}\right]; $$
cf. Jennrich and Schluchter (1986). The $m(m+1)/2$ vector $\mu_{N,\Lambda\sigma}$ has elements
$$ \mu_{N,\Lambda\sigma;\,\alpha\alpha} = \frac{1}{2}\sum_{j=1}^N F_\alpha(x_j)\,S_j^{-2}\,F_\alpha^T(x_j), \qquad \mu_{N,\Lambda\sigma;\,\alpha\beta} = \sum_{j=1}^N F_\alpha(x_j)\,S_j^{-2}\,F_\beta^T(x_j), $$
and, finally, $\mu_{N,\sigma}$ does not change. Using formula (41), taking the partial derivatives with respect to $\lambda_{\alpha\alpha}$, $\lambda_{\alpha\beta}$, and $\sigma^2$, and equating them to zero leads to a system of $[m(m+1)/2 + 1]$ linear equations, the solution of which is given by
$$ \begin{pmatrix} \mathrm{vech}(\Lambda_{t+1}) \\ \sigma_{t+1}^2 \end{pmatrix} = \begin{pmatrix} \tilde M_{\Lambda,t} & \tilde M_{\Lambda\sigma,t} \\ \tilde M_{\Lambda\sigma,t}^T & M_{\sigma,t} \end{pmatrix}^{-1} \begin{pmatrix} \tilde y_{t,1} \\ \tilde y_{t,2} \end{pmatrix}, $$
where $\tilde M_{\Lambda,t}$ and $\tilde M_{\Lambda\sigma,t}$ are introduced exactly as $\mu_{N,\Lambda}$ and $\mu_{N,\Lambda\sigma}$, with $S_{tj}$ substituted for $S_j$, and
$$ \{\tilde y_{t,1}\}_{\alpha\alpha} = \frac{1}{2}\sum_{j=1}^N \left[F_\alpha(x_j)\,S_{tj}^{-1}\,z_j\right]^2, \qquad \{\tilde y_{t,1}\}_{\alpha\beta} = \sum_{j=1}^N \left[F_\alpha(x_j)\,S_{tj}^{-1}\,z_j\right]\left[F_\beta(x_j)\,S_{tj}^{-1}\,z_j\right], \qquad \tilde y_{t,2} = \frac{1}{2}\sum_{j=1}^N z_j^T\,S_{tj}^{-2}\,z_j. $$

References

[1] Atkinson, A.C., and Cook, R.D. (1995), D-optimum designs for heteroscedastic linear models, JASA, 90 (429), 204-212.
[2] Atkinson, A.C., and Donev, A. (1992), Optimum Experimental Designs, Clarendon Press, Oxford.
[3] Beal, S.L., and Sheiner, L.B. (1988), Heteroscedastic nonlinear regression, Technometrics, 30 (3), 327-338.
[4] Cook, R.D., and Fedorov, V.V. (1995), Constrained optimization of experimental design, Statistics, 26, 129-178.
[5] Cramer, H. (1946), Mathematical Methods of Statistics, Princeton University Press, Princeton.
[6] Davidian, M., and Carroll, R.J. (1987), Variance function estimation, JASA, 82 (400), 1079-1091.
[7] Downing, D.J., Fedorov, V.V., and Leonov, S.L. (2001), Extracting information from the variance function: optimal design. In: Atkinson, A.C., Hackl, P., Muller, W.G. (eds.), MODA6 - Advances in Model-Oriented Design and Analysis, Physica-Verlag, Heidelberg, 45-52.
[8] Fedorov, V.V. (1974), Regression problems with controllable variables subject to error, Biometrika, 61, 49-56.
[9] Fedorov, V.V. (1977), Parameter estimation for multivariate regression. In: Nalimov, V. (ed.), Regression Experiments (Design and Analysis), Moscow State University, Moscow, 112-122 (in Russian).
[10] Fedorov, V.V., and Hackl, P. (1997), Model-Oriented Design of Experiments, Springer-Verlag, New York.
[11] Harville, D.A. (1997), Matrix Algebra from a Statistician's Perspective, Springer-Verlag, New York.
[12] Heyde, C.C. (1997), Quasi-Likelihood and Its Applications, Springer-Verlag, New York.
[13] Jennrich, R.I. (1969), Asymptotic properties of nonlinear least squares estimators, Ann. Math. Stat., 40, 633-643.
[14] Jennrich, R.I., and Schluchter, M.D. (1986), Unbalanced repeated-measures models with structured covariance matrices, Biometrics, 42, 805-820.
[15] Jobson, J.D., and Fuller, W.A. (1980), Least squares estimation when the covariance matrix and parameter vector are functionally related, JASA, 75 (369), 176-181.
[16] Lindstrom, M.J., and Bates, D.M. (1990), Nonlinear mixed effects models for repeated measures data, Biometrics, 46, 673-687.
[17] Magnus, J.R., and Neudecker, H. (1988), Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, New York.
[18] Malyutov, M.B. (1982), On asymptotic properties and application of IRGNA-estimates for parameters of generalized regression models. In: Stochastic Processes and Applications, Moscow, 144-165 (in Russian).
[19] Malyutov, M.B. (1988), Design and analysis in generalized regression model F. In: Fedorov, V.V., Lauter, H. (eds.), Model-Oriented Data Analysis, Springer-Verlag, Berlin, 72-76.
[20] Mentre, F., Mallet, A., and Baccar, D. (1997), Optimal design in random-effects regression models, Biometrika, 84 (2), 429-442.
[21] Muirhead, R. (1982), Aspects of Multivariate Statistical Theory, Wiley, New York.
[22] Muller, W.G. (1998), Collecting Spatial Data, Springer-Verlag, New York.
[23] Pukelsheim, F. (1993), Optimal Design of Experiments, Wiley, New York.
[24] Pazman, A. (1993), Nonlinear Statistical Models, Kluwer, Dordrecht.
[25] Rao, C.R. (1973), Linear Statistical Inference and Its Applications, 2nd ed., Wiley, New York.
[26] Saaty, T.L., and Bram, J. (1964), Nonlinear Mathematics, McGraw-Hill, New York.
[27] Seber, G.A.F. (1984), Multivariate Observations, Wiley, New York.
[28] Vonesh, E.F., and Chinchilli, V.M. (1997), Linear and Nonlinear Models for the Analysis of Repeated Measurements, Marcel Dekker, New York.
[29] Wu, C.F. (1981), Asymptotic theory of nonlinear least squares estimation, Ann. Stat., 9 (3), 501-513.