PROPERTIES OF VARIANCE COMPONENT ESTIMATORS OBTAINED BY RESTRICTED MAXIMUM LIKELIHOOD AND BY MINQUE

Songsiri Sriburi

Institute of Statistics Mimeograph Series No. 1175
Raleigh - May 1978

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
2. ASYMPTOTIC PROPERTIES OF RESTRICTED MAXIMUM LIKELIHOOD ESTIMATES IN THE MIXED MODEL
   2.1 Introduction
   2.2 The Mixed Model and Its Assumptions
   2.3 The General Asymptotic Theorem
   2.4 A Sequence of Experiments
   2.5 Consistency, Asymptotic Normality, and Asymptotic Efficiency of REML Estimates
       2.5.1 Proof of Condition a
       2.5.2 Proof of Condition b
       2.5.3 Proof of Condition c
       2.5.4 Summarizing Theorem
3. THE DISTRIBUTION OF VARIANCE COMPONENT ESTIMATORS
   3.1 Introduction
   3.2 The Method of Mixtures
   3.3 The Distribution of Variance Component Estimators (Positive Quadratic Forms)
   3.4 The Distribution of Variance Component Estimators (Indefinite Quadratic Forms)
4. COMBINING INFORMATION FROM SEVERAL EXPERIMENTS
   4.1 Introduction
   4.2 Model and Methods
   4.3 Distribution and Variances of Estimators
   4.4 A Comparison of Four Methods of Estimation
   4.5 The Effect of the Number of Levels of Each Random Factor
   4.6 The Effect of the True Variance Components
5. SUMMARY
6. LIST OF REFERENCES
7. APPENDIX

LIST OF TABLES

3.4.1 The cumulative distribution obtained from the method developed and the Monte Carlo method of T with ε = 10⁻³
4.4.1 Variances of σ̂²_r, probabilities of negative σ̂²_r, and the 95th percentiles of the distributions of σ̂²_r from four methods and eight combined experiments when σ²_r = 1.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = 1.0
4.4.2 Variances of σ̂²_c, probabilities of negative σ̂²_c, and the 95th percentiles of the distributions of σ̂²_c from four methods and eight combined experiments when σ²_r = 1.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = 1.0
4.4.3 Variances of σ̂²_rc, probabilities of negative σ̂²_rc, and the 95th percentiles of the distributions of σ̂²_rc from four methods and eight combined experiments when σ²_r = 1.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = 1.0
4.4.4 Variances of σ̂²_ε, probabilities of negative σ̂²_ε, and the 95th percentiles of the distributions of σ̂²_ε from four methods and eight combined experiments when σ²_r = 1.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = 1.0
4.5.1 Variances of σ̂²_r, probabilities of negative σ̂²_r, and the 95th percentiles of the distributions of σ̂²_r when using method d and sixteen combined experiments with σ²_r = 1.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = 1.0
4.5.2 Variances of σ̂²_c, probabilities of negative σ̂²_c, and the 95th percentiles of the distributions of σ̂²_c when using method d and sixteen combined experiments with σ²_r = 1.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = 1.0
4.5.3 Variances of σ̂²_rc, probabilities of negative σ̂²_rc, and the 95th percentiles of the distributions of σ̂²_rc when using method d and sixteen combined experiments with σ²_r = 1.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = 1.0
4.5.4 Variances of σ̂²_ε, probabilities of negative σ̂²_ε, and the 95th percentiles of the distributions of σ̂²_ε when using method d and sixteen combined experiments with σ²_r = 1.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = 1.0
4.6.1 Variances of σ̂²_r, probabilities of negative σ̂²_r, and the 95th percentiles of the distributions of σ̂²_r when using method d and three combined experiments with σ²_r = .01, 1.0, 2.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = 1.0
4.6.2 Variances of σ̂²_r, probabilities of negative σ̂²_r, and the 95th percentiles of the distributions of σ̂²_r when using method d and three combined experiments with σ²_r = 1.0, σ²_c = .01, .4, 2.0, σ²_rc = .8, and σ²_ε = 1.0
4.6.3 Variances of σ̂²_r, probabilities of negative σ̂²_r, and the 95th percentiles of the distributions of σ̂²_r when using method d and three combined experiments with σ²_r = 1.0, σ²_c = .4, σ²_rc = .01, .8, 2.0, and σ²_ε = 1.0
4.6.4 Variances of σ̂²_r, probabilities of negative σ̂²_r, and the 95th percentiles of the distributions of σ̂²_r when using method d and three combined experiments with σ²_r = 1.0, σ²_c = .4, σ²_rc = .8, and σ²_ε = .01, 1.0, 2.0
7.1 Seven terms with appropriate choices of A, B, and C

LIST OF FIGURES

3.4.1 Graphs of the p.d.f. of T, where T = Σ_{i=1}^4 λ_iU²_i − Σ_{j=1}^4 α_jU²_{4+j}, for three sets of (λ₁, λ₂, λ₃, λ₄, −α₁, −α₂, −α₃, −α₄)
4.6.1 The graphs of variances of σ̂²_r where .01 ≤ σ²_r ≤ 2.0 for three combined experiments
4.6.2 The graphs of variances of σ̂²_r where .01 ≤ σ²_c ≤ 2.0 for three combined experiments
4.6.3 The graphs of variances of σ̂²_r where .01 ≤ σ²_rc ≤ 2.0 for three combined experiments
4.6.4 The graphs of variances of σ̂²_r where .01 ≤ σ²_ε ≤ 2.0 for three combined experiments

1. INTRODUCTION

Estimation of variance components in the mixed model of the analysis of variance has been the subject of considerable discussion for several years. If the data are balanced, then estimation relies almost exclusively on the analysis of variance method. This method consists of making an analysis of variance table, equating expected values to observed mean squares, and using the solutions to the resulting equations as the estimates. It has been shown by Graybill (1954) and Graybill and Wortham (1956) that, under the assumption of normality, these estimates obtained from balanced data sets have minimum possible variances in the class of unbiased estimates. Henderson (1953) has suggested analogous techniques for unbalanced data.

Recently, several new methods have been proposed. One is the maximum likelihood method of Hartley and Rao (1967). This method yields simultaneous estimates of both fixed effects and variance components by maximizing the likelihood function with respect to each of the fixed effects and the variance components. Even though the maximum likelihood estimators of variance components have some desirable properties, their use has been limited. The major reason for this is that effective algorithms are not readily available.
Another criticism is that the maximum likelihood estimators do not take into account the loss in degrees of freedom due to estimating fixed effects. Recently, several attempts have been made to improve on the maximum likelihood method. Patterson and Thompson (1971) eliminated the second problem through their Restricted Maximum Likelihood (REML) approach by partitioning the likelihood function into two parts, one part entirely free of fixed effects. The maximization of this part yields what we call REML estimates of variance components.

Another development was MINQUE, Minimum Norm Quadratic Unbiased Estimation (or Estimator), by Rao (1970, 1971a, 1971b, 1972). In the general mixed model

  Y = Xβ + Uε = Xβ + U₀ε₀ + U₁ε₁ + ... + U_pε_p ,   (1.1)

suppose the random effect ε were observable. A natural estimator of Σ_{i=0}^p p_iσ²_i would be ε'Δε, where Δ is a suitably defined diagonal matrix. Since, in practice, Y rather than ε is observable, the MINQUE principle leads one to select an estimator of the form Y'AY, where the matrix A is selected to minimize the norm of the difference ‖U'AU − Δ‖, subject to AX = 0 and tr(AV_i) = p_i, with V_i = U_iU_i' for i = 0,1,...,p. The MINQUE of σ² = (σ²₀, σ²₁, ..., σ²_p)' is given by σ̂² = S⁻q, where S is a (p+1)×(p+1) matrix with element S_ij equal to tr(QV_iQV_j),

  Q = Σ⁻¹ − Σ⁻¹X(X'Σ⁻¹X)⁻¹X'Σ⁻¹ ,

S⁻ denotes a generalized inverse of S, and q is a (p+1)×1 vector whose ith element is Y'QV_iQY.

An inversion and numerous multiplications of n×n matrices, where n is the number of observations, are required to obtain the MINQUE. Some developments on reducing the size of the matrices that must be inverted have been made. Liu and Senturia (1977) have shown that it is possible to obtain the MINQUE of σ² by manipulating matrices of size g×g, where g is the sum of the numbers of levels of all random factors omitting the error, the number of fixed effects, and one. Giesbrecht and Burrows (1978) have shown that for the nested model one need only invert a p×p matrix, where p is the number of variance components.
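The MINQUE computation just described can be stated in a few lines of matrix code. The sketch below is illustrative only and is not a program from this thesis: the one-way design, the prior values built into Σ, and all variable names are assumptions made for the example.

```python
import numpy as np

# Minimal sketch of the MINQUE computation described above, for an assumed
# toy one-way random model y = X*beta + U1*eps1 + eps0 (two groups of three).
n = 6
X = np.ones((n, 1))                       # fixed effect: overall mean
U1 = np.kron(np.eye(2), np.ones((3, 1)))  # group incidence matrix
U = [np.eye(n), U1]                       # U0 = I_n carries the error
prior = np.array([1.0, 1.0])              # assumed prior values for sigma_i^2

V = [Ui @ Ui.T for Ui in U]               # V_i = U_i U_i'
Sigma = sum(p * Vi for p, Vi in zip(prior, V))

# Q = Sigma^{-1} - Sigma^{-1} X (X' Sigma^{-1} X)^{-1} X' Sigma^{-1}
Si = np.linalg.inv(Sigma)
Q = Si - Si @ X @ np.linalg.inv(X.T @ Si @ X) @ X.T @ Si

rng = np.random.default_rng(0)
y = X @ [2.0] + U1 @ rng.normal(size=2) + rng.normal(size=n)

# S_ij = tr(Q V_i Q V_j), q_i = y' Q V_i Q y; MINQUE solves S sigma2 = q.
S = np.array([[np.trace(Q @ Vi @ Q @ Vj) for Vj in V] for Vi in V])
q = np.array([y @ Q @ Vi @ Q @ y for Vi in V])
sigma2_hat = np.linalg.pinv(S) @ q        # pinv plays the role of S^-
print(sigma2_hat)
```

Because σ̂² = S⁻q depends on the Σ used to form Q, the answer changes with the prior values; this is the dependence on prior information discussed next.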
MINQUE also requires that one have prior values for the variance components. The properties of the estimators depend on the quality of these prior values. Since good prior values are not always available, several authors have proposed an iterative MINQUE procedure, where the prior values are replaced by the estimates from the previous cycle and the negative estimates are replaced by zeros. Harville (1977) and Giesbrecht and Burrows (1978) have pointed out that if the process converges, then iterative MINQUE is identical to REML, i.e., these estimates satisfy the equations obtained from the REML method.

In two recent papers Weiss (1971, 1973) has discussed the asymptotic properties of maximum likelihood estimates for some nonstandard cases. He has shown that a general asymptotic theorem holds for a class of cases where the observed random variables are not necessarily independent and identically distributed. This theorem also allows the possibility of different normalizing sequences for different sequences of estimates. Miller (1977) used Weiss's general asymptotic theorem to obtain asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance under some mild restrictions on the design sequences. In this paper it is shown that similar asymptotic properties hold for the REML estimates. In particular, these asymptotic results obtain when the number of levels of each random factor increases to infinity or when the experiment is repeated. The sequences of the estimates may require normalizing sequences which differ in order of magnitude in order to eliminate the problem of a degenerate limiting distribution. Truncation does not affect the asymptotic results because the true variance components are assumed to be positive and the estimates are consistent with high probability.

There exist a number of non-iterative schemes for estimating variance components. A common feature of these methods is that the resulting estimates are obtained as quadratic functions of the original observations. The selection of the actual quadratic function is often open to debate. If one is willing to use prior information about the components, then the MINQUE principle leads to specific forms that incorporate these prior values.

Even if the original observations are assumed to have a normal distribution, in general the distributions of the quadratic forms used as estimators have remained intractable. Some progress can be made by noting that quadratic functions of normal random variables are distributed as linear functions of independent, single degree of freedom chi-square random variables. In particular, Robbins and Pitman (1949) have shown that a positive quadratic form in normal variates is distributed as a mixture of chi-square distributions. Press (1966) extended this to obtain the distribution of the difference of two positive definite quadratic forms, and hence of indefinite quadratic forms in normal variates. Wang (1967) used a similar technique to study the distribution of several variance component estimators in the one-way balanced model.

In Chapter 3 Press's (1966) results are extended to examine the distribution of variance component estimators in a large class of models. The first step is to derive the distribution of a quadratic estimator Y'AY, where the eigenvalues of AΣ are all positive, in terms of an infinite series of chi-square distributions. It is shown that the coefficients in the series can be evaluated recursively. Next, the distribution of a quadratic estimator Y'AY, where the eigenvalues of AΣ are not all positive, is derived in terms of the confluent hypergeometric function. Finally, the probability density function of a quadratic estimator Y'AY, where the eigenvalues of AΣ are not all positive, is expressed as an infinite series of chi-square density functions when both the number of positive eigenvalues and the number of negative eigenvalues are even integers. This series is useful in studying the behavior of variance component estimators.

In Chapter 4 the distribution derived in Chapter 3 is used to study the effects of changes in experimental design and true variance components on MINQUE. The designs used for this study consist of pairs of independent two-way balanced experiments. Recently, Giesbrecht (1977) examined methods of estimation and showed by a simulation study that the estimates obtained from an iterative MINQUE procedure have smaller variances than the estimates obtained from the method of pooling sums of squares and the method of computing the mean of the analysis of variance estimates. It is notable that all these estimates are obtained by equating quadratic forms to their expected values and solving the system of equations.
His work is extended here to compare these three methods, together with the method of averaging mean squares, by using several combined experiments. In this work the methods are compared by using three criteria: variances, the probabilities of negative estimates, and the 95th percentiles of the distributions. The last two are obtained from the distribution derived in Chapter 3. Recall that the distribution of a variance component estimator Y'AY depends only on the eigenvalues of the matrix AΣ. The true variance components are assumed to be known, and the prior values required by MINQUE are replaced by the true variance components.

2. ASYMPTOTIC PROPERTIES OF RESTRICTED MAXIMUM LIKELIHOOD ESTIMATES IN THE MIXED MODEL

2.1 Introduction

In a recent paper, Miller (1977) has shown that in the mixed model of the analysis of variance there is a sequence of roots of the likelihood equations which is consistent, asymptotically normal, and efficient in the sense of attaining the Cramér-Rao lower bound for the covariance matrix. In this chapter it will be shown that similar results hold for the Restricted Maximum Likelihood (REML) estimators of the variance components defined by Patterson and Thompson (1971) and Corbeil and Searle (1976). One can view REML estimators as estimators obtained by factoring the likelihood into two parts, one a function of contrasts among fixed effects and the other a function of contrasts with zero expectation, and then maximizing the latter. Reasons for considering REML as opposed to conventional maximum likelihood include the following:

a) The REML estimates of variance components agree with the values obtained from the analysis of variance when the data set is balanced, in the sense that there is an adjustment for the "degrees of freedom" lost due to the fixed effects in the model.

b) The system of non-linear equations ∂L₁/∂σ²_i = 0 for i = 0,1,...,p, where (p+1) is the number of variance components in the model, can be reformulated as a system of linear equations

  Y'A_iY = Σ_{j=0}^p c_ij σ̂²_j   for i = 0,1,...,p,

where the {A_i} and {c_ij} depend on assumed prior values of the variance components. These have been discussed by Harville (1977) and Giesbrecht and Burrows (1978). The equations obtained are exactly the equations obtained by applying MINQUE theory, developed by Rao (1970, 1971a, 1971b, 1972), when one has prior information about the variance components. Consequently, if the iterative scheme suggested by the linear MINQUE equations converges (with proper allowance to replace negative estimates by zeros), then one has the REML estimates.

c) Occasionally REML estimates are easier to obtain than conventional maximum likelihood estimates.

2.2 The Mixed Model and Its Assumptions

Consider the model

  Y = Xβ + U₁ε₁ + ... + U_pε_p + U₀ε₀   (2.1)

where Y is an n×1 vector of observations; X is an n×k matrix of known constants; U_i, i = 0,1,...,p, are n×c_i matrices of known constants with U₀ = I_n; β is a k×1 vector of unknown constants; and ε_i, i = 0,1,...,p, are c_i×1 vectors of random variables such that ε_i ~ N(0, D_iσ²_i), where the D_i are c_i×c_i matrices of known constants with D₀ = I_n and E(ε_iε_j') = 0 for i ≠ j.

From the model (2.1) it follows that E(Y) = Xβ and Var(Y) = Σ = Σ_{i=0}^p σ²_i U_iD_iU_i'. Let d_i be the rank of the symmetric matrix D_i. Then there exists a c_i×d_i matrix G_i of full column rank such that V_i = U_iD_iU_i' = U*_iU*_i', where U*_i = U_iG_i. Define σ² = (σ²₀, σ²₁, ..., σ²_p)'. Note that the matrix Σ is positive definite provided σ²₀ > 0. It will be assumed that n ≥ k + p + 1.
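For concreteness, the covariance structure of (2.1) can be assembled directly. In the sketch below the design, the D_i, and the component values are assumptions made for illustration, not part of the model's definition.

```python
import numpy as np

# Sketch of the covariance in (2.1): Sigma = sum_i sigma_i^2 U_i D_i U_i'.
# The design (a 2-level one-way layout with 3 replicates) is illustrative.
n = 6
U0, D0 = np.eye(n), np.eye(n)                 # error term: U0 = D0 = I_n
U1 = np.kron(np.eye(2), np.ones((3, 1)))      # one random factor, c_1 = 2
D1 = np.eye(2)
sigma2 = [1.0, 0.5]                           # (sigma_0^2, sigma_1^2)

V = [U0 @ D0 @ U0.T, U1 @ D1 @ U1.T]          # V_i = U_i D_i U_i'
Sigma = sigma2[0] * V[0] + sigma2[1] * V[1]

# With sigma_0^2 > 0, Sigma is positive definite: all eigenvalues positive.
print(np.linalg.eigvalsh(Sigma).min() > 0)

# U_i* = U_i G_i with D_i = G_i G_i' reproduces V_i (here G_i = I trivially).
G1 = np.linalg.cholesky(D1)
print(np.allclose((U1 @ G1) @ (U1 @ G1).T, V[1]))
```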
The following three assumptions on the X and U_i matrices are required:

a) X has full column rank.
b) The ranks of the augmented matrices [X U_i], i = 0,1,...,p, are greater than the rank of X.
c) U_i, i = 0,1,...,p, have full column ranks.

The first assumption can be satisfied by a suitable reparameterization. The second and third assumptions require that the fixed effects not be confounded with any of the random effects and that the random effects not be confounded with each other.

If we assume Y ~ N(Xβ, Σ), then the log-likelihood function of Y is

  L = −(n/2) log 2π − ½ log|Σ| − ½ (Y − Xβ)'Σ⁻¹(Y − Xβ) .   (2.2)

The maximum likelihood procedure yields estimates of both fixed effects and variance components simultaneously by maximizing the likelihood of Y with respect to each element of β and each of the variance components. As stated in Chapter 1, the method of maximum likelihood is rarely used in practice because the arithmetic is difficult and because the method fails to take into account the loss in degrees of freedom resulting from estimating fixed effects. The approach of this paper will be to follow Patterson and Thompson (1971) and partition the log-likelihood into two parts, one part entirely free of fixed effects. Our attention will be restricted to the latter, whose maximization will provide REML estimates of the variance components. Partitioning is accomplished by using the nonsingular transformation

  Z = [T; X'Σ⁻¹] Y ,

where T is an (n−k)×n matrix such that TX = 0. It follows that

  Z ~ N( [0; X'Σ⁻¹Xβ], diag(TΣT', X'Σ⁻¹X) ) .

The model of the part which is entirely free of fixed effects is given by

  TY = TU₀ε₀ + TU₁ε₁ + ... + TU_pε_p   (2.3)

and the parameter space is defined as

  Θ = {σ² = (σ²₀, ..., σ²_p)' : σ²₀ > 0, σ²_i ≥ 0, i = 1,...,p} .   (2.4)

The log-likelihoods of TY and X'Σ⁻¹Y, denoted by L₁(TY; σ²) and L₂(X'Σ⁻¹Y; β, σ²), are

  L₁(TY; σ²) = −((n−k)/2) log 2π − ½ log|TΣT'| − ½ Y'T'(TΣT')⁻¹TY   (2.5)

and

  L₂(X'Σ⁻¹Y; β, σ²) = −(k/2) log 2π − ½ log|X'Σ⁻¹X| − ½ (X'Σ⁻¹Y − X'Σ⁻¹Xβ)'(X'Σ⁻¹X)⁻¹(X'Σ⁻¹Y − X'Σ⁻¹Xβ) .   (2.6)

Using the rules for matrix differentiation given by Graybill (1969), differentiation of L₁ with respect to the elements of {σ²_i} gives

  ∂L₁/∂σ²_i = −½ tr[(TΣT')⁻¹TV_iT'] + ½ Y'T'(TΣT')⁻¹TV_iT'(TΣT')⁻¹TY   (2.7)

for i = 0,1,...,p, where tr(A) is the trace of a matrix A. The second-order partial derivatives of L₁(TY; σ²) are

  ∂²L₁/∂σ²_i∂σ²_j = ½ tr[(TΣT')⁻¹TV_iT'(TΣT')⁻¹TV_jT'] − Y'T'(TΣT')⁻¹TV_iT'(TΣT')⁻¹TV_jT'(TΣT')⁻¹TY   (2.8)

for i,j = 0,1,...,p. Define Q = T'(TΣT')⁻¹T. Then (2.7) and (2.8) can be written as

  ∂L₁/∂σ²_i = −½ tr(QV_i) + ½ Y'QV_iQY   (2.9)

for i = 0,1,...,p, and

  ∂²L₁/∂σ²_i∂σ²_j = ½ tr(QV_iQV_j) − Y'QV_iQV_jQY   (2.10)

for i,j = 0,1,...,p. Also note that Khatri (1966) has shown that

  Q = T'(TΣT')⁻¹T = Σ⁻¹ − Σ⁻¹X(X'Σ⁻¹X)⁻¹X'Σ⁻¹ .

A value of σ̂² ∈ Θ that maximizes L₁(TY; σ²) is referred to as a REML estimate of σ². Frequently numerical techniques will be needed to solve the system given by (2.9). The discussion of such techniques is beyond the scope of this paper.
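One such numerical technique is suggested by reason b) of Section 2.1: iterate the linear MINQUE-type equations until the scores (2.9) vanish. The sketch below illustrates that fixed-point iteration under an assumed toy design and assumed starting values; it is an illustration of the idea, not the algorithm used in this thesis. Q is formed through Khatri's identity rather than through an explicit T.

```python
import numpy as np

# Sketch: evaluate the REML scores (2.9) for a toy one-way design and
# iterate the linearized (MINQUE-type) equations until they balance.
n = 6
X = np.ones((n, 1))
U1 = np.kron(np.eye(2), np.ones((3, 1)))
V = [np.eye(n), U1 @ U1.T]                    # V_0 = I, V_1 = U_1 U_1'

rng = np.random.default_rng(1)
y = 2.0 + U1 @ rng.normal(size=2) + rng.normal(size=n)

def Qmat(s2):
    # Khatri: Q = Sigma^{-1} - Sigma^{-1}X(X'Sigma^{-1}X)^{-1}X'Sigma^{-1}
    Sigma = s2[0] * V[0] + s2[1] * V[1]
    Si = np.linalg.inv(Sigma)
    return Si - Si @ X @ np.linalg.inv(X.T @ Si @ X) @ X.T @ Si

s2 = np.array([1.0, 1.0])                     # assumed starting values
for _ in range(50):
    Q = Qmat(s2)
    # score_i = -0.5 tr(Q V_i) + 0.5 y'Q V_i Q y, from (2.9)
    score = np.array([-0.5 * np.trace(Q @ Vi) + 0.5 * y @ Q @ Vi @ Q @ y
                      for Vi in V])
    # linearized update: solve tr(QV_iQV_j) s_j = y'QV_iQy for s
    S = np.array([[np.trace(Q @ Vi @ Q @ Vj) for Vj in V] for Vi in V])
    q = np.array([y @ Q @ Vi @ Q @ y for Vi in V])
    s2 = np.maximum(np.linalg.solve(S, q), 1e-8)  # truncate negatives
print(s2, score)                              # score is near 0 at convergence
```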
For the remainder of this chapter, the notation L₁ₙ(TY_n; σ²) is used instead of L₁(TY; σ²) to emphasize the dependence on n, the size of the sample.

2.3 The General Asymptotic Theorem

A general asymptotic theorem, proved by Weiss (1971), is presented here using notation which fits our needs. This theorem is very general and concerns the asymptotic properties of roots of the likelihood equation in some nonstandard cases, where the observed random variables are neither independent nor identically distributed. The theorem also allows normalizing sequences of different orders of magnitude for estimates of different parameters. Miller (1977) applied Weiss's theorem to prove asymptotic properties of maximum likelihood estimates in the mixed model; in his work, Weiss's theorem was stated in essentially this form. In Section 2.5, the asymptotic properties of REML estimates in the mixed model of the analysis of variance will be proved by applying this theorem. The general asymptotic theorem is stated as follows:

Theorem 2.3.1. For a sequence of random variables TY_n with log-likelihood functions L₁ₙ(TY_n; σ²), where σ² ∈ Θ, suppose the true parameter point σ²₀ is an interior point of Θ. Let there be 2(p+1) sequences {n_i(n)} and {m_i(n)}, i = 0,1,...,p, of positive constants such that lim_{n→∞} n_i(n) = ∞, lim_{n→∞} m_i(n) = ∞, and lim_{n→∞} m_i(n)/n_i(n) = 0. Denote

  B_ij(n, σ²) = −(1/(n_i(n)n_j(n))) ∂²L₁ₙ(TY_n; σ²)/∂σ²_i∂σ²_j ,  i,j = 0,1,...,p,

and

  N_n(σ²₀) = {σ² : |σ²_i − σ²_0i| ≤ m_i(n)/n_i(n), i = 0,1,...,p} .

If a) there exist I_ij(σ²₀), i,j = 0,1,...,p, such that B_ij(n, σ²) evaluated at σ² = σ²₀ converges in probability to I_ij(σ²₀) as n → ∞, I_ij(σ²₀) is a continuous function of σ²₀, and the matrix [I_ij(σ²₀)] is positive definite, and b) there exist sequences {γ(n, σ²₀)} and {δ(n, σ²₀)} of positive constants, which converge to zero as n → ∞, such that for each n

  P{ Σ_{i=0}^p Σ_{j=0}^p m_i m_j sup_{σ² ∈ N_n(σ²₀)} |B_ij(n, σ²) − I_ij(σ²₀)| > γ(n, σ²₀) } < δ(n, σ²₀) ,

then there exists a sequence of estimates σ̂²(n), which are roots of the equations ∂L₁ₙ(TY_n; σ²)/∂σ²_i = 0, i = 0,1,...,p, such that σ̂²(n) is consistent and the vector whose ith component is n_i(n)(σ̂²_i(n) − σ²_0i) converges in distribution to a normal random vector with mean vector 0 and covariance matrix [I_ij(σ²₀)]⁻¹.

In applying Theorem 2.3.1 to REML estimates in the mixed model of the analysis of variance, we need to show that under some assumptions on the mixed model the model (2.3) satisfies the requirements of Theorem 2.3.1. This will be shown in Section 2.5. The assumptions on the mixed model have been discussed in Section 2.2, and some further assumptions required on the sequence of experiments will be discussed in the following section.

2.4 A Sequence of Experiments

The asymptotic results will be established for a sequence of experiments where the size of the experiments increases, that is, the number of levels of each random effect increases. One can visualize extensions of previous experiments or an entirely different experiment at each stage. These assumptions are required to eliminate sequences of experiments in which the limiting distributions are degenerate.

Without loss of generality, for each n let the TU_i, i = 0,1,...,p, in model (2.3) be labeled so that the c_i(n) are in decreasing order of magnitude. Generate the partition of the integers {0,1,...,p} so that for any two indices i and j in a set S_s, the associated c_i(n) and c_j(n) have the same order of magnitude. Note that there are a+1 sets in the partition, S₀, S₁, ..., S_a, where S_s = {i_s, i_s+1, ..., i_{s+1}−1} for s = 0,1,...,a−1 and S_a = {i_a, i_a+1, ..., p}. Define, for i ∈ S_s,

  v_i(n) = rank[TU_{i_s} : TU_{i_s+1} : ... : TU_p] − rank[TU_{i_s} : ... : TU_{i−1} : TU_{i+1} : ... : TU_p]   (2.11)

where [TU₁ : TU₂ : ... : TU_p] denotes the augmented matrix. Thus v_i(n) is the dimension of the part of TU_i not dependent on the other TU_j, where 1 ≤ j ≤ p, i ≠ j, and i, j ∈ S_s. It is notable that the v_i(n), i = 0,1,...,p, are related to the degrees of freedom of the sums of squares in the analysis of variance. Define

  n_i(n) = [v_i(n)]^{1/2}   for i = 0,1,...,p.   (2.12)

It is assumed that lim_{n→∞} v_i(n)/c_i(n) exists for i = 0,1,...,p. This assumption implies that the ith random effect does not become confounded with any other random effect when n becomes large, that is, v_i(n) and c_i(n) have the same order of magnitude.

Recall from Theorem 2.3.1 that

  N_n(σ²₀) = {σ² : |σ²_i − σ²_0i| ≤ m_i(n)/n_i(n), i = 0,1,...,p} .   (2.13)

To facilitate the later proofs, points σ² ∈ N_n(σ²₀) will be indexed by a subscript k, k = 0,1,2. For each n and each σ²_k ∈ N_n(σ²₀),

  Σ_k = Σ_{i=0}^p V_i σ²_ki .   (2.14)
Since the covariance matrix Σ_k is positive definite, there exists a nonsingular matrix Λ_k such that Σ_k = Λ_kΛ_k'. Define

  Q_k = T'(TΣ_kT')⁻¹T   (2.15)

and

  B_ij(σ²_k) = −(1/(n_in_j)) [½ tr(Q_kV_iQ_kV_j) − Y'Q_kV_iQ_kV_jQ_kY] .   (2.16)

Using (2.16) and the expected value of a quadratic form, the expected value, for k, ℓ = 0,1,2, is

  E_{σ²_ℓ}[B_ij(σ²_k)] = −(1/(n_in_j)) [½ tr(Q_kV_iQ_kV_j) − tr(Q_kV_iQ_kV_jQ_kΣ_ℓ)] .   (2.17)

Finally, define

  I_ij(σ²₀) = lim_{n→∞} (1/(2n_i(n)n_j(n))) tr(Q₀V_iQ₀V_j) .   (2.18)

It is assumed that all elements of {I_ij(σ²₀)} exist and that the matrix [I_ij(σ²₀)] is positive definite.
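The limit (2.18) can be illustrated numerically. In the sketch below everything about the design (a one-way layout), the choice of normalizers, and the true components is an assumed example; Q₀ is again formed through Khatri's identity rather than through an explicit T.

```python
import numpy as np

# Illustration of (2.18): the normalized entries tr(Q0 V_i Q0 V_j)/(2 n_i n_j)
# for an assumed one-way design.  The normalizers n_i (square roots of the
# ANOVA degrees of freedom) are an illustrative choice of the v_i^(1/2).
c1, m = 4, 5                                   # levels, replicates per level
n = c1 * m
X = np.ones((n, 1))
U1 = np.kron(np.eye(c1), np.ones((m, 1)))
V = [np.eye(n), U1 @ U1.T]
s2_true = np.array([1.0, 0.5])

Sigma = s2_true[0] * V[0] + s2_true[1] * V[1]
Si = np.linalg.inv(Sigma)
Q0 = Si - Si @ X @ np.linalg.inv(X.T @ Si @ X) @ X.T @ Si

n_norm = np.sqrt([n - c1, c1 - 1])             # v_0^(1/2), v_1^(1/2)
I = np.array([[np.trace(Q0 @ V[i] @ Q0 @ V[j]) / (2 * n_norm[i] * n_norm[j])
               for j in range(2)] for i in range(2)])
print(I)                                       # a positive definite 2x2 matrix
print(np.linalg.eigvalsh(I))
```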
2.5 Consistency, Asymptotic Normality, and Asymptotic Efficiency of REML Estimates

In this section we will prove that the requirements of Theorem 2.3.1 are satisfied by the sequence of experiments discussed in Section 2.4. We will first establish a set of three basic requirements, listed below and referred to as conditions a, b, and c, and then prove a summarizing theorem, Theorem 2.5.1. The three basic requirements are:

Condition a) There exist 2(p+1) sequences {n_i(n)} and {m_i(n)}, i = 0,1,...,p, of positive constants such that lim_{n→∞} n_i(n) = ∞, lim_{n→∞} m_i(n) = ∞, and lim_{n→∞} m_i(n)/n_i(n) = 0.

Condition b) There exist I_ij(σ²₀), i,j = 0,1,...,p, such that for all i,j, B_ij(n, σ²₀) converges in probability to I_ij(σ²₀) as n → ∞, and I_ij(σ²₀) is a continuous function of σ²₀.

Condition c) There exist sequences {γ(n, σ²₀)} and {δ(n, σ²₀)} of positive constants, which converge to zero as n → ∞, such that for each n

  P{ Σ_{i=0}^p Σ_{j=0}^p m² sup_{σ²₁ ∈ N_n(σ²₀)} |B_ij(σ²₁) − I_ij(σ²₀)| > γ(n, σ²₀) } < δ(n, σ²₀) .

2.5.1 Proof of Condition a

By the definition (2.12), n_i(n) = [v_i(n)]^{1/2}, i = 0,1,...,p, which implies lim_{n→∞} n_i(n) = ∞. Define

  K(n) = max_{i,j} | E_{σ²₀}[B_ij(n, σ²₀)] − I_ij(σ²₀) | .   (2.19)

It is easily seen that K(n) converges to zero as n → ∞. Now define

  m(n) = min( n₀^{1/2}(n), n₁^{1/2}(n), ..., n_p^{1/2}(n), K^{−1/4}(n) ) .   (2.20)

Without loss of generality, let

  m_i(n) = m(n)   for i = 0,1,...,p.   (2.21)

It follows that lim_{n→∞} m_i(n) = ∞ and lim_{n→∞} m_i(n)/n_i(n) = 0.

2.5.2 Proof of Condition b

To prove the second and third conditions, we will assume that the true variance components σ²_0i, i = 0,1,...,p, satisfy the following:

Condition 1. σ²_0i > 2m_i(n)/n_i(n) for i = 0,1,...,p.

Since m_i(n) = m(n), i = 0,1,...,p, condition 1 becomes σ²_0i > 2m(n)/n_i(n), i = 0,1,...,p. This requires that the true variance components be positive numbers.

Define λ_i(A), i = 1,2,...,n, as the eigenvalues of an n×n matrix A. Also, X_n →p x means that for any fixed ε > 0 and δ > 0 there exists n₀ such that P{|X_n − x| > ε} < δ for all n > n₀.

The next two lemmas establish that the I_ij(σ²₀), i,j = 0,1,...,p, defined in (2.18) satisfy the second requirement of Theorem 2.3.1. We first prove that for all i,j, B_ij(n, σ²₀) converges in probability to I_ij(σ²₀) as n → ∞. From this point on, the notation of dependence on n will be suppressed; for example, n_i, m_i, and B_ij(σ²_k) are used instead of n_i(n), m_i(n), and B_ij(n, σ²_k).

Lemma 2.5.1. If B_ij(σ²₀) and I_ij(σ²₀), i,j = 0,1,...,p, are as defined in (2.16) and (2.18), then for all i,j, B_ij(σ²₀) converges in probability to I_ij(σ²₀) as n → ∞.

Proof. It is sufficient to prove that Var_{σ²₀}[B_ij(σ²₀)] converges to zero as n → ∞; the desired result then follows from Chebyshev's inequality. From the definition (2.16), we have

  Var_{σ²₀}[B_ij(σ²₀)] = Var_{σ²₀}{ −(1/(n_in_j)) [½ tr(Q₀V_iQ₀V_j) − Y'Q₀V_iQ₀V_jQ₀Y] } .

Using Lemmas 7.3 and 7.6 and V_j = U*_jU*_j', this variance reduces to a sum of traces of products of the matrices Q₀V_i and Q₀V_j divided by n²_in²_j; by Lemma 7.8 each such trace divided by n_in_j is bounded by some constant. It follows that Var_{σ²₀}[B_ij(σ²₀)] converges to zero as n → ∞, completing the first part of condition b.

The second part of condition b, that is, the continuity of I_ij(σ²₀), is established by the following lemma.

Lemma 2.5.2. If the I_ij(σ²₀), i,j = 0,1,...,p, are as defined in (2.18) and condition 1 is true, then I_ij(σ²₀) is a continuous function of σ²₀ for all i,j.

Proof. To prove that I_ij(σ²₀) is a continuous function of σ²₀, it is sufficient to prove that for any σ²₁, σ²₂ ∈ N_n(σ²₀) the corresponding normalized traces approach one another. Since (1/(n_in_j)) |tr(Q₂V_iQ₂V_j) − tr(Q₁V_iQ₁V_j)| converges to zero as n → ∞ by Lemma 7.11, I_ij(σ²₀) is a continuous function of σ²₀ for all i,j.

2.5.3 Proof of Condition c

Let L(A) be defined as the linear space formed by all linear combinations of the columns of a matrix A. From (2.11), for each n, v_i is the dimension of the part of TU_i not dependent on the other TU_j. For s = 1,2,...,a−1, let H_s be an orthonormal basis for the part of L(TU_{i_s} : TU_{i_s+1} : ... : TU_p) orthogonal to L(TU_{i_{s+1}} : ... : TU_p). Let H_a be an orthonormal basis for L(TU_{i_a} : TU_{i_a+1} : ... : TU_p), and let H₀ be an orthonormal basis for the orthogonal complement of L(TU₁ : TU₂ : ... : TU_p). Let the dimension of H_s be (n−k)×c*_s, s = 0,1,...,a; then H = (H₀ : H₁ : ... : H_a) is an orthogonal matrix.

Since TΣ₂T' is positive definite, there exists a lower triangular matrix D such that TΣ₂T' = DD'. Also, H'TΣ₂T'H is positive definite. It follows that there exists an upper triangular matrix R such that R'H'DD'HR = I. The vector Z = R'H'TY then has Cov(Z) = I under σ²₂; in particular, writing Z = (Z₀', Z₁', ..., Z_a')',

  E(Z_iZ_j') = 0   for i ≠ j,   (2.22)

and the vector TY can be written as

  TY = Σ_{s=0}^a F_s Z_s   (2.23)

for matrices F_s determined by H and R. A condition on the bound of the Z_s, s = 0,1,...,a, will now be set up to facilitate the later development. This condition will not rule out any design of interest, since it appears that the probability of the condition being true approaches one as n → ∞. The condition is as follows:

Condition 2. Z_s'Z_s / c*_s ≤ 10 for s = 0,1,...,a.

Finally, to prove condition c, since we must produce sequences {γ(n, σ²₀)} and {δ(n, σ²₀)} of positive constants converging to zero as n → ∞ such that for each n

  P{ Σ_{i=0}^p Σ_{j=0}^p m² sup_{σ²₁ ∈ N_n(σ²₀)} |B_ij(σ²₁) − I_ij(σ²₀)| > γ(n, σ²₀) } < δ(n, σ²₀) ,

it is sufficient to prove that for all i,j, m² sup_{σ²₁ ∈ N_n(σ²₀)} |B_ij(σ²₁) − I_ij(σ²₀)| converges in probability to zero as n → ∞. This is proved in Lemma 2.5.3 as follows:

Lemma 2.5.3. Let B_ij(σ²_k) and I_ij(σ²₀), where k = 0,1,2 and i,j = 0,1,...,p, be as defined in (2.16) and (2.18). If condition 1 and condition 2 are true, then

  m² sup_{σ²₁ ∈ N_n(σ²₀)} |B_ij(σ²₁) − I_ij(σ²₀)| →p 0   as n → ∞.   (2.24)

Proof. We have that

  m² sup_{σ²₁ ∈ N_n(σ²₀)} |B_ij(σ²₁) − I_ij(σ²₀)|
    ≤ m² |B_ij(σ²₁) − B_ij(σ²₂)|
    + m² |B_ij(σ²₂) − E_{σ²₂}[B_ij(σ²₂)]|
    + m² |E_{σ²₂}[B_ij(σ²₂)] − E_{σ²₂}[B_ij(σ²₀)]|
    + m² |E_{σ²₂}[B_ij(σ²₀)] − E_{σ²₀}[B_ij(σ²₀)]|
    + m² |E_{σ²₀}[B_ij(σ²₀)] − I_ij(σ²₀)| .   (2.25)

Next, we will show that each term on the right-hand side of (2.25) converges to zero as n → ∞.
First consider m²|B_ij(σ²₁) − B_ij(σ²₂)|. By the definition (2.16),

  m² |B_ij(σ²₁) − B_ij(σ²₂)| ≤ (m²/(2n_in_j)) |tr(Q₁V_iQ₁V_j) − tr(Q₂V_iQ₂V_j)| + (m²/(n_in_j)) |Y'Q₁V_iQ₁V_jQ₁Y − Y'Q₂V_iQ₂V_jQ₂Y| .

But (m²/(n_in_j))|tr(Q₁V_iQ₁V_j) − tr(Q₂V_iQ₂V_j)| and (m²/(n_in_j))|Y'Q₁V_iQ₁V_jQ₁Y − Y'Q₂V_iQ₂V_jQ₂Y| converge to zero as n → ∞ by Lemmas 7.11 and 7.16, respectively. Therefore m²|B_ij(σ²₁) − B_ij(σ²₂)| converges to zero as n → ∞.

That the second term on the right-hand side of (2.25), m²|B_ij(σ²₂) − E_{σ²₂}[B_ij(σ²₂)]|, converges in probability to zero as n → ∞ has been proved in Lemma 7.17.

Using (2.17), the third term on the right-hand side of (2.25) is

  m² |E_{σ²₂}[B_ij(σ²₂)] − E_{σ²₂}[B_ij(σ²₀)]|
   = m² | −(1/(n_in_j))[½ tr(Q₂V_iQ₂V_j) − tr(Q₂V_iQ₂V_jQ₂Σ₂)] + (1/(n_in_j))[½ tr(Q₀V_iQ₀V_j) − tr(Q₀V_iQ₀V_jQ₀Σ₂)] |
   ≤ (m²/(2n_in_j)) |tr(Q₂V_iQ₂V_j) − tr(Q₀V_iQ₀V_j)| + (m²/(n_in_j)) |tr(Q₂V_iQ₂V_j) − tr(Q₀V_iQ₀V_jQ₀Σ₂)| ,

where the identity tr(Q₂V_iQ₂V_jQ₂Σ₂) = tr(Q₂V_iQ₂V_j), which follows from Q₂Σ₂Q₂ = Q₂, has been used. By Lemma 7.11, (m²/(n_in_j))|tr(Q₂V_iQ₂V_j) − tr(Q₀V_iQ₀V_j)| converges to zero as n → ∞. Next we show that (m²/(n_in_j))|tr(Q₂V_iQ₂V_j) − tr(Q₀V_iQ₀V_jQ₀Σ₂)| converges to zero as n → ∞. Writing Δ = Q₂ − Q₀, it follows that

  (m²/(n_in_j)) |tr(Q₂V_iQ₂V_j) − tr(Q₀V_iQ₀V_jQ₀Σ₂)|
   = (m²/(n_in_j)) |tr(ΔV_iΔV_j) + tr(ΔV_iQ₀V_j) + tr(Q₀V_iΔV_j) + tr((Q₀ − Q₀Σ₂Q₀)V_iQ₀V_j)|
   ≤ (m²/(n_in_j)) |tr(ΔV_iΔV_j)| + (m²/(n_in_j)) |tr(ΔV_iQ₀V_j)| + (m²/(n_in_j)) |tr(Q₀V_iΔV_j)| + (m²/(n_in_j)) |tr((Q₀ − Q₀Σ₂Q₀)V_iQ₀V_j)| .

By the definition of m and Lemma 7.10, each of these four terms converges to zero as n → ∞. These, together with the convergence of (m²/(n_in_j))|tr(Q₂V_iQ₂V_j) − tr(Q₀V_iQ₀V_j)|, imply that m²|E_{σ²₂}[B_ij(σ²₂)] − E_{σ²₂}[B_ij(σ²₀)]| converges to zero as n → ∞.

Using (2.17), the fourth term on the right-hand side of (2.25) is

  m² |E_{σ²₂}[B_ij(σ²₀)] − E_{σ²₀}[B_ij(σ²₀)]| = (m²/(n_in_j)) |tr(Q₀V_iQ₀V_jQ₀Σ₂) − tr(Q₀V_iQ₀V_j)| ,

since tr(Q₀V_iQ₀V_jQ₀Σ₀) = tr(Q₀V_iQ₀V_j). It has been shown in Lemma 7.10 that (m²/(n_in_j))|tr(Q₀V_iQ₀V_jQ₀Σ₂) − tr(Q₀V_iQ₀V_j)| converges to zero as n → ∞, so the fourth term converges to zero as n → ∞.

Finally, the fifth term on the right-hand side of (2.25), m²|E_{σ²₀}[B_ij(σ²₀)] − I_ij(σ²₀)|, satisfies m²(n)K(n) ≤ K^{1/2}(n) by the definition (2.20) of m(n), and therefore converges to zero as n → ∞ by the definition of the limit in (2.18).

These steps are assembled as follows. Given ε > 0 and δ > 0, first choose n₁ such that for all n > n₁, P{condition 1 and condition 2 are false} < δ/2. Then choose n₂ ≥ n₁, n₃ ≥ n₂, and finally n₀ ≥ n₃ so that for all n > n₀ each of the five terms on the right-hand side of (2.25) is, with probability exceeding 1 − δ/2, less than ε/5. Then we may conclude that for n ≥ n₀,

  P{ m² sup_{σ²₁ ∈ N_n(σ²₀)} |B_ij(σ²₁) − I_ij(σ²₀)| > ε } < δ ,

which proves (2.24).

2.5.4 Summarizing Theorem

The results from Subsections 2.5.1, 2.5.2, and 2.5.3 can now be summarized by the following theorem:

Theorem 2.5.1. For a sequence of experiments, each described by the mixed model (2.1) under the assumptions discussed in Section 2.2 and satisfying the assumptions discussed in Section 2.4, consider the model (2.3), which is entirely free of fixed effects,

  TY = TU₀ε₀ + TU₁ε₁ + ... + TU_pε_p ,

with log-likelihood function L₁(TY; σ²). Let the parameter space Θ be as defined in (2.4), where the true parameter point σ²₀ is an interior point of Θ. Assume that there exist (p+1) sequences of positive constants n_i(n), i = 0,1,...,p, such that the matrix I(σ²₀) = [I_ij(σ²₀)], with I_ij(σ²₀) defined in (2.18), is positive definite. Then there exists a sequence of estimates σ̂²(n) of σ² with the following properties:

a) Given ε > 0, there exist a(ε) > 0 and n₀(ε) such that for all n > n₀(ε),

  P{ |σ̂²_i(n) − σ²_0i| ≤ a(ε)/n_i(n), i = 0,1,...,p } ≥ 1 − ε .

b) The (p+1)×1 vector whose ith component is n_i(n)(σ̂²_i(n) − σ²_0i) converges in distribution to a N_{p+1}(0, (I(σ²₀))⁻¹) random vector.

Proof. As stated above, if we can prove that the assumptions of this theorem imply the requirements of Theorem 2.3.1, the asymptotic results will follow immediately. Consider the following three steps:

First, there exist 2(p+1) sequences {n_i(n)} and {m_i(n)}, i = 0,1,...,p, of positive constants, as defined in (2.12) and (2.21), such that lim_{n→∞} n_i(n) = ∞, lim_{n→∞} m_i(n) = ∞, and lim_{n→∞} m_i(n)/n_i(n) = 0. This implies the first requirement of Theorem 2.3.1.

Secondly, there exist I_ij(σ²₀), i,j = 0,1,...,p, as defined in (2.18), such that for all i,j, B_ij(n, σ²₀) converges in probability to I_ij(σ²₀) as n → ∞; this has been proved in Lemma 2.5.1. And for all i,j, I_ij(σ²₀) is a continuous function of σ²₀; this has been proved in Lemma 2.5.2. Therefore the second requirement of Theorem 2.3.1 is satisfied.
Finally, by Lemma 2.5.3 we have that m² sup_{σ²₁ ∈ N_n(σ²₀)} |B_ij(n, σ²₁) − I_ij(σ²₀)| converges in probability to zero, which implies the last requirement of Theorem 2.3.1.

It has been shown that REML estimates of variance components in the mixed model of the analysis of variance are consistent and asymptotically normal. It is of interest to inquire whether these estimates are asymptotically efficient. In the independent, identically distributed case, the Cramér-Rao lower bound for the covariance matrix is the inverse of the Fisher information matrix for one observation. The bound, that is, the inverse of the information matrix, cannot be defined in the usual sense in the problem considered here, because the observations in the sequence are neither independent nor identically distributed and normalizing sequences of different orders of magnitude may be required by estimates of different parameters. Thus a definition of an information matrix analogous to the definition in the independent and identically distributed case must be considered. We define an information matrix in our case as the matrix I whose ijth element is

  I_ij = lim_{n→∞} (1/(n_in_j)) E_{σ²₀}{ −∂²L₁ₙ(TY; σ²)/∂σ²_i∂σ²_j evaluated at σ² = σ²₀ }   for i,j = 0,...,p.   (2.26)

If the estimates are consistent and asymptotically normal and if the asymptotic covariance matrix is the inverse of the information matrix, then a sequence of estimates is said to be asymptotically efficient. The REML estimates are consistent and asymptotically normal with asymptotic covariance matrix (I(σ²₀))⁻¹. Since the matrix I(σ²₀) is the same as the information matrix defined in (2.26), the REML estimates in the mixed model of the analysis of variance are asymptotically efficient in the sense of attaining the Cramér-Rao lower bound for the covariance matrix.

3. THE DISTRIBUTION OF VARIANCE COMPONENT ESTIMATORS

3.1 Introduction

In this chapter we develop a general method of examining the statistical properties of variance component estimators obtained by equating translation-invariant quadratic forms of normally distributed random variables to their expected values and solving the resulting equations. Clearly the estimators obtained in this manner are again translation-invariant quadratic functions of the observations. It will be shown that the distributions of these estimators can be written as linear functions of chi-square distributions, i.e., mixtures of chi-square distributions. In general the coefficients of the individual chi-square distributions will be functions of the true but unknown variance components. In a subsequent chapter the method will be used to study a number of specific variance component estimation techniques. The behavior of several methods will be compared for specific values of the components.

3.2 The Method of Mixtures

In order to derive the distribution of a quadratic form of normal random variables we require the definition of a mixture of a sequence of distributions.

Definition 3.2.1. If F₀(x), F₁(x), ... is any sequence of distribution functions and if c₀, c₁, ... is any sequence of constants such that c_j ≥ 0, j = 0,1,..., and Σ_j c_j = 1, then the function F(x) = Σ_j c_jF_j(x) is called a mixture of the sequence of distribution functions.

The following lemma provides a starting point for the derivation of the distribution of variance component estimators.

Lemma 3.2.1. If Y ~ N(Xβ, Σ), then Y'AY is distributed as Σ_{j=1}^n λ_jU²_j, where the U²_j, j = 1,2,...,n, are independent noncentral chi-square random variables, each with one degree of freedom and noncentrality parameter τ_j. Here Σ = ΛΛ', τ_j^{1/2} = P_j'Λ⁻¹Xβ for λ_j ≠ 0, τ_j is arbitrary for λ_j = 0, and P_j is the eigenvector corresponding to the nonzero eigenvalue λ_j of the matrix Λ'AΛ, which has the same nonzero eigenvalues as AΣ.
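Lemma 3.2.1 can be checked numerically. The sketch below is an illustration under assumed inputs (the design, Σ, and the choice A = (I − X(X'X)⁻¹X'), which satisfies AX = 0 so that the central case applies); it compares the first two moments of Y'AY with Σ_jλ_j and 2Σ_jλ²_j, which is what the chi-square representation implies.

```python
import numpy as np

# Monte Carlo check of Lemma 3.2.1 for a translation-invariant quadratic
# form: compare moments of y'Ay with those of sum_j lambda_j U_j^2, where
# the lambda_j are the eigenvalues of A*Sigma.  All inputs are illustrative.
rng = np.random.default_rng(2)
n = 8
X = np.ones((n, 1))
Sigma = np.eye(n) + 0.5 * np.kron(np.eye(2), np.ones((4, 4)))

A = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # AX = 0

lam = np.linalg.eigvals(A @ Sigma).real            # eigenvalues of A*Sigma
L = np.linalg.cholesky(Sigma)
y = (L @ rng.normal(size=(n, 200000))) + 3.0       # any mean: invariance
qf = np.einsum('in,ij,jn->n', y, A, y)

print(qf.mean(), lam.sum())                        # E(y'Ay) = sum lambda_j
print(qf.var(), 2 * (lam ** 2).sum())              # Var = 2 sum lambda_j^2
```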
The most commonly used methods of estimating variance components are based on equating translation-invariant quadratic forms to their expected values and solving the system of linear equations. It is clear that the estimators obtained are also translation-invariant quadratic estimators. The definition of a translation-invariant quadratic form is adopted in this work as follows:

Definition 3.2.2. A quadratic form Y'AY is called translation-invariant if and only if (Y − Xβ)'A(Y − Xβ) = Y'AY for all β. This is equivalent to AX = 0.

It follows from Lemma 3.2.1 that a translation-invariant estimator is distributed as Σ_{j=1}^n λ_jU²_j, where the U²_j, j = 1,...,n, are independent central chi-square variables, each with one degree of freedom.

3.3 The Distribution of Variance Component Estimators (Positive Quadratic Forms)

Let A be an arbitrary symmetric matrix such that Y'AY is translation invariant, and suppose the eigenvalues of AΣ are all positive and ordered so that λ₁ ≥ λ₂ ≥ ... ≥ λ_n > 0. One can then write Σ_{j=1}^n λ_jU²_j as a(U²_n + a₁U²_{n−1} + ... + a_{n−1}U²₁), where a = λ_n and a_j = λ_j/λ_n, j = 1,...,n−1. The cumulative distribution function of a(U²_n + a₁U²_{n−1} + ... + a_{n−1}U²₁) can be written as a mixture of an infinite sequence of distribution functions.

Theorem 3.3.1. Suppose Z = a(U²_n + a₁U²_{n−1} + ... + a_{n−1}U²₁), where the U²_j, j = 1,...,n, are independent chi-square variables each with one degree of freedom, and a, a_j, j = 1,...,n−1, are constants such that a_j ≥ 1 and a > 0. Then for z > 0

  P(Z ≤ z) = Σ_{k=0}^∞ c_k F_{n+2k}(z/a)   (3.1)

where

  c₀ = Π_{j=1}^{n−1} a_j^{−1/2} ,  c_k = (1/(2k)) Σ_{ℓ=0}^{k−1} h_{k−ℓ} c_ℓ  for k ≥ 1 ,

  h_m = Σ_{j=1}^{n−1} (1 − 1/a_j)^m ,

and F_p(z) is the cumulative distribution function of a chi-square random variable with p degrees of freedom.

Proof. (See Johnson and Kotz (1970).) For Z = a(U²_n + a₁U²_{n−1} + ... + a_{n−1}U²₁), the characteristic function of Z/a is

  φ_{Z/a}(t) = (1 − 2it)^{−1/2} Π_{j=1}^{n−1} (1 − 2ia_jt)^{−1/2} .

Let w = (1 − 2it)⁻¹. The characteristic function can then be written as

  φ_{Z/a}(t) = w^{n/2} ( Π_{j=1}^{n−1} a_j^{−1/2} ) Π_{j=1}^{n−1} (1 − (1 − 1/a_j) w)^{−1/2} .

It is clear that Π_{j=1}^{n−1} (1 − (1 − 1/a_j)w)^{−1/2} is the moment generating function, evaluated at w, of

  Q = ½ Σ_{j=1}^{n−1} (1 − 1/a_j) U²_j .

Define μ_k = μ_k(Q; ½(1 − 1/a₁), ..., ½(1 − 1/a_{n−1})) = E(Q^k). Then the characteristic function φ_{Z/a}(t) can be written as an infinite series expansion, that is,

  φ_{Z/a}(t) = w^{n/2} ( Π_{j=1}^{n−1} a_j^{−1/2} ) Σ_{k=0}^∞ (μ_k/k!) w^k .

Define c_k = ( Π_{j=1}^{n−1} a_j^{−1/2} ) μ_k/k!. The mth cumulant of Q is κ_m = ½(m−1)! h_m, where h_m = Σ_{j=1}^{n−1}(1 − 1/a_j)^m. Using the recursion between moments and cumulants, μ_k = Σ_{ℓ=0}^{k−1} C(k−1, ℓ) κ_{k−ℓ} μ_ℓ, we have that

  c₀ = Π_{j=1}^{n−1} a_j^{−1/2}   and   c_k = (1/(2k)) Σ_{ℓ=0}^{k−1} h_{k−ℓ} c_ℓ   for k = 1,2,... .

Since w^{(n+2k)/2} = (1 − 2it)^{−(n+2k)/2} is the characteristic function of χ²_{(n+2k)}, the characteristic function φ_{Z/a}(t) = Σ_{k=0}^∞ c_k φ_{n+2k}(t). This implies that P(Z/a ≤ z/a) = Σ_{k=0}^∞ c_k F_{n+2k}(z/a), and by the linear transformation Z → Z/a we have, for z > 0, P(Z ≤ z) = Σ_{k=0}^∞ c_k F_{n+2k}(z/a). ∎
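Theorem 3.3.1 transcribes directly into code. The sketch below is illustrative, not the program used in this thesis: the eigenvalues in the driver, the fixed truncation at 200 terms (in place of the adaptive stopping rule described in Section 3.4), and the function name positive_form_cdf are all assumptions of the example; the chi-square distribution function is taken from scipy.

```python
import numpy as np
from scipy.stats import chi2

def positive_form_cdf(lams, z, terms=200):
    """P(sum lam_j U_j^2 <= z) via the mixture series of Theorem 3.3.1.

    lams: positive eigenvalues of A*Sigma, in any order.
    """
    lams = np.sort(np.asarray(lams, float))       # smallest plays the role of a
    a = lams[0]
    aj = lams[1:] / a                             # a_j >= 1
    n = len(lams)
    # h_m = sum_j (1 - 1/a_j)^m and the recursion c_k = sum h_{k-l} c_l / (2k)
    h = [np.sum((1 - 1 / aj) ** m) for m in range(terms + 1)]
    c = [np.prod(aj ** -0.5)]
    for k in range(1, terms + 1):
        c.append(sum(h[k - l] * c[l] for l in range(k)) / (2 * k))
    return sum(c[k] * chi2.cdf(z / a, df=n + 2 * k) for k in range(terms + 1))

# quick check against Monte Carlo
lams = [0.45, 0.25, 0.25, 0.25]
rng = np.random.default_rng(3)
sample = (np.array(lams) * rng.chisquare(1, size=(100000, 4))).sum(axis=1)
print(positive_form_cdf(lams, 1.0), (sample <= 1.0).mean())
```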
3.4 The Distribution of Variance Component Estimators (Indefinite Quadratic Forms)

When the eigenvalues of AΣ are not all positive, an estimator Y'AY is distributed as the difference of two independent linear functions of single degree of freedom chi-square variables. The distribution is derived in terms of the confluent hypergeometric function, and in terms of chi-square distributions when the number of positive eigenvalues and the number of negative eigenvalues are both even.

Define

  Ψ(c, d; x) = (1/Γ(c)) ∫₀^∞ e^{−xu} u^{c−1} (1 + u)^{d−c−1} du   (3.2)

for c, x > 0, where Ψ(c, d; x) denotes the confluent hypergeometric function. This function satisfies the confluent hypergeometric differential equation of Kummer:

  x d²y/dx² + (d − x) dy/dx − c y = 0 .

The probability density function of the difference of two chi-square variables with m and n degrees of freedom can be written in terms of the confluent hypergeometric function as follows:

Theorem 3.4.1. Suppose T = aX − bY, where X ~ χ²_(m), Y ~ χ²_(n), and a, b > 0. The probability density function of T is given by

  p_{m,n}(t) = [2^{(m+n)/2} a^{m/2} b^{n/2} Γ(m/2)]⁻¹ e^{−t/(2a)} t^{(m+n)/2−1} Ψ[n/2, (m+n)/2; ((a+b)/(2ab)) t]   for t ≥ 0   (3.3)

  p_{m,n}(t) = [2^{(m+n)/2} a^{m/2} b^{n/2} Γ(n/2)]⁻¹ e^{t/(2b)} (−t)^{(m+n)/2−1} Ψ[m/2, (m+n)/2; −((a+b)/(2ab)) t]   for t < 0

with Ψ the confluent hypergeometric function as defined in (3.2).

Proof. See Press (1966).

When both m and n are even positive integers, the probability density function p_{m,n}(t) defined in Theorem 3.4.1 can be expressed as a mixture of a finite number of chi-square densities as follows:

Corollary 3.4.1. Suppose T = aX − bY, where X ~ χ²_(2k), Y ~ χ²_(2ℓ), and a, b > 0. The probability density function of T is given by

  p_{2k,2ℓ}(t) = (1/a) (a/(a+b))^ℓ Σ_{s=0}^{k−1} ((ℓ)_s/s!) (b/(a+b))^s f_{2(k−s)}(t/a)   for t ≥ 0   (3.4)

  p_{2k,2ℓ}(t) = (1/b) (b/(a+b))^k Σ_{s=0}^{ℓ−1} ((k)_s/s!) (a/(a+b))^s f_{2(ℓ−s)}(−t/b)   for t ≤ 0

where (k)_s is the ascending factorial, (k)_s = Γ(k+s)/Γ(k), and f_r(x) is the probability density function of a chi-square random variable with r degrees of freedom.

Proof. The proof is given for the case t ≥ 0 only; the case t ≤ 0 can be proved analogously. From (3.2),

  Ψ[ℓ, k+ℓ; ((a+b)/(2ab)) t] = (1/Γ(ℓ)) ∫₀^∞ e^{−((a+b)/(2ab))tu} u^{ℓ−1} (1+u)^{k−1} du .

Expanding (1+u)^{k−1} binomially and using the linear transformation z = ((a+b)/(2ab))tu together with the gamma integral Γ(α) = ∫₀^∞ e^{−z} z^{α−1} dz gives

  Ψ[ℓ, k+ℓ; ((a+b)/(2ab)) t] = (1/Γ(ℓ)) Σ_{s=0}^{k−1} [Γ(k)Γ(s+ℓ) / (Γ(k−s) s!)] ( ((a+b)/(2ab)) t )^{−(s+ℓ)} .

Substituting this expression into (3.3) and collecting the powers of t, a, b, and 2 term by term yields

  p_{2k,2ℓ}(t) = (1/a) (a/(a+b))^ℓ Σ_{s=0}^{k−1} ((ℓ)_s/s!) (b/(a+b))^s f_{2(k−s)}(t/a)   for t ≥ 0 . ∎
Corollary 3.4.2 gives an alternate form of the probability density function p_{m,n}(t) defined in Theorem 3.4.1. This form is more convenient for this study.

Corollary 3.4.2. Suppose T = aX − bY, where X ~ χ²_(m), Y ~ χ²_(n), and a, b > 0. The probability density function of T is given by

  p_{m,n}(t) = (1/(ab)) ∫₀^∞ f_m((t+v)/a) f_n(v/b) dv   for t ≥ 0   (3.5)

  p_{m,n}(t) = (1/(ab)) ∫₀^∞ f_n((−t+u)/b) f_m(u/a) du   for t ≤ 0

where f_r(v) is the probability density function of a chi-square variable with r degrees of freedom.

Proof. The proof is given for the case t ≥ 0 only; the case t ≤ 0 can be proved analogously. Using (3.2) and (3.3), we have for t ≥ 0

  p_{m,n}(t) = [2^{(m+n)/2} a^{m/2} b^{n/2} Γ(m/2)Γ(n/2)]⁻¹ e^{−t/(2a)} t^{(m+n)/2−1} ∫₀^∞ e^{−((a+b)/(2ab))ts} s^{n/2−1} (1+s)^{m/2−1} ds .

By the linear transformation s = v/t, the exponential factors combine as e^{−(t+v)/(2a)} e^{−v/(2b)}, and collecting the remaining powers gives

  p_{m,n}(t) = (1/(ab)) ∫₀^∞ f_m((t+v)/a) f_n(v/b) dv   for t ≥ 0 . ∎

As noted above, if the estimator Y'AY is an indefinite quadratic form, that is, AΣ is not positive definite, then it may be treated as the difference of two positive definite forms. This implies that a variance component estimator Y'AY is distributed as

  Σ_{i=1}^r λ_iU²_i − Σ_{j=1}^s α_jU²_{r+j} ,

where α_j = −λ_{r+j}, j = 1,...,s, r is the number of positive eigenvalues, and s is the number of negative eigenvalues. Each of the two linear functions can be represented as a mixture of chi-square variables. Write

  a = λ_r ,  a_i = λ_i/λ_r, i = 1,2,...,r−1 ,  b = α₁ ,  b_j = α_j/b, j = 2,3,...,s .

The probability density function of a(U²_r + a₁U²_{r−1} + ... + a_{r−1}U²₁) − b(b_sU²_{r+s} + b_{s−1}U²_{r+s−1} + ... + U²_{r+1}) is derived in terms of the confluent hypergeometric function.

Theorem 3.4.2. Let T = a(U²_r + a₁U²_{r−1} + ... + a_{r−1}U²₁) − b(b_sU²_{r+s} + b_{s−1}U²_{r+s−1} + ... + U²_{r+1}). Then the probability density function of T is given by

  h_T(t) = Σ_{k=0}^∞ Σ_{ℓ=0}^∞ c_k c*_ℓ p_{r+2k, s+2ℓ}(t)   for −∞ < t < ∞   (3.6)

where {c_k} and {c*_ℓ} are the constants defined as in Theorem 3.3.1 corresponding to Σ_{i=1}^r λ_iU²_i and Σ_{j=1}^s α_jU²_{r+j}, respectively, and p_{m,n}(t) is the probability density function of the difference of two independent scaled chi-square variables as defined in (3.3), with the same a and b.

Proof. Write T = U − V, where U = a(U²_r + a₁U²_{r−1} + ... + a_{r−1}U²₁) and V = b(b_sU²_{r+s} + ... + U²_{r+1}); U and V are independent random variables. From Theorem 3.3.1, the cumulative distribution function of U is

  G_U(u) = Σ_{k=0}^∞ c_k F_{r+2k}(u/a)   for u ≥ 0   (3.7)

where F_p(u) denotes the cumulative distribution function of a chi-square random variable with p degrees of freedom. It is clear that the F_{r+2k}(u/a), k = 0,1,..., are differentiable, that is, d/du F_{r+2k}(u/a) = (1/a) f_{r+2k}(u/a), where f_p(u) denotes the probability density function of a chi-square random variable with p degrees of freedom. Since the series Σ_{k=0}^∞ (c_k/a) f_{r+2k}(u/a) converges uniformly on every finite interval of u, it converges to the function g_U(u), where g_U(u) = G'_U(u). Therefore

  g_U(u) = Σ_{k=0}^∞ (c_k/a) f_{r+2k}(u/a)   for u ≥ 0 .   (3.8)

By a similar argument, Σ_{ℓ=0}^∞ (c*_ℓ/b) f_{s+2ℓ}(v/b) converges uniformly on every finite interval of v to g_V(v), where g_V(v) = G'_V(v). Therefore

  g_V(v) = Σ_{ℓ=0}^∞ (c*_ℓ/b) f_{s+2ℓ}(v/b)   for v ≥ 0 .   (3.9)
By the convolution formula, the probability density function of T is given by

  h_T(t) = ∫₀^∞ g_U(t+v) g_V(v) dv   for t ≥ 0 .   (3.10)

Substituting (3.8) and (3.9) into (3.10), we have that for t ≥ 0

  h_T(t) = ∫₀^∞ [ Σ_{k=0}^∞ (c_k/a) f_{r+2k}((t+v)/a) ] [ Σ_{ℓ=0}^∞ (c*_ℓ/b) f_{s+2ℓ}(v/b) ] dv .

Since both series converge uniformly on every finite interval of v,

  h_T(t) = Σ_{k=0}^∞ Σ_{ℓ=0}^∞ c_k c*_ℓ (1/(ab)) ∫₀^∞ f_{r+2k}((t+v)/a) f_{s+2ℓ}(v/b) dv ,

and by Corollary 3.4.2, h_T(t) = Σ_k Σ_ℓ c_k c*_ℓ p_{r+2k, s+2ℓ}(t) for t ≥ 0. For t ≤ 0, h_T(t) can be derived analogously by using the convolution formula

  h_T(t) = ∫₀^∞ g_V(−t+u) g_U(u) du   for t ≤ 0 . ∎

When both r and s are even positive integers, that is, r = 2p and s = 2q, the probability density function of T is, by Corollary 3.4.1,

  h_T(t) = Σ_{k=0}^∞ Σ_{ℓ=0}^∞ c_k c*_ℓ (1/a) (a/(a+b))^{q+ℓ} Σ_{v=0}^{p+k−1} ((q+ℓ)_v/v!) (b/(a+b))^v f_{2(p+k−v)}(t/a)   for t ≥ 0   (3.11)

  h_T(t) = Σ_{k=0}^∞ Σ_{ℓ=0}^∞ c_k c*_ℓ (1/b) (b/(a+b))^{p+k} Σ_{v=0}^{q+ℓ−1} ((p+k)_v/v!) (a/(a+b))^v f_{2(q+ℓ−v)}(−t/b)   for t ≤ 0 .

In applying the results of Theorem 3.4.2 when both r and s are even positive integers, the formula which appears well-suited for computer calculation of the cumulative distribution function of T is

  H_T(t) = 1 − Σ_{k=0}^∞ Σ_{ℓ=0}^∞ c_k c*_ℓ (a/(a+b))^{q+ℓ} Σ_{v=0}^{p+k−1} ((q+ℓ)_v/v!) (b/(a+b))^v [1 − F_{2(p+k−v)}(t/a)]   for t ≥ 0

  H_T(t) = Σ_{k=0}^∞ Σ_{ℓ=0}^∞ c_k c*_ℓ (b/(a+b))^{p+k} Σ_{v=0}^{q+ℓ−1} ((p+k)_v/v!) (a/(a+b))^v [1 − F_{2(q+ℓ−v)}(−t/b)]   for t ≤ 0

where F_p(t) denotes the cumulative distribution function of a chi-square random variable with p degrees of freedom. Define

  d_{ℓk}(t) = c_k c*_ℓ (b/(a+b))^{p+k} Σ_{v=0}^{q+ℓ−1} ((p+k)_v/v!) (a/(a+b))^v [1 − F_{2(q+ℓ−v)}(−t/b)]

for k, ℓ = 0,1,..., with the analogous definition, d̃_{ℓk}(t) say, for the terms of the t ≥ 0 series. Then the cumulative distribution function of T can be written as H_T(t) = 1 − Σ_kΣ_ℓ d̃_{ℓk}(t) for t ≥ 0 and H_T(t) = Σ_kΣ_ℓ d_{ℓk}(t) for t ≤ 0.

For programming purposes, the steps in the calculation of the cumulative distribution function of T for t ≥ 0 are set out as follows:

a) For any k, compute Σ_{ℓ=0}^N d̃_{ℓk}(t), where N is the number such that d̃_{ℓk}(t) is less than a prescribed magnitude ε, usually 10⁻⁴, for ℓ ≥ N+1.

b) Note that the {c_k} and {c*_ℓ} are computed using the formulas given in Theorem 3.3.1.

c) Compute the sum over k of the quantities in a), terminating at the index N* beyond which each contribution is less than the prescribed magnitude ε, usually 10⁻⁴, for k ≥ N*+1.

Therefore

  H_T(t) ≈ 1 − Σ_{k=0}^{N*} Σ_{ℓ=0}^{N} c_k c*_ℓ (a/(a+b))^{q+ℓ} Σ_{v=0}^{p+k−1} ((q+ℓ)_v/v!) (b/(a+b))^v [1 − F_{2(p+k−v)}(t/a)]   for t ≥ 0 .

In the case where t < 0, the steps in the calculation are similar to those for t ≥ 0. It is easily seen that the distribution depends only on a, b, the a_i, i = 1,2,...,r−1, and the b_j, j = 2,3,...,s. In the case where one of these numbers is large compared with the others, the series will converge slowly; one should then change the prescribed magnitude ε to 10⁻⁵ to include more terms.

The cumulative distribution obtained from the method developed has been compared with the cumulative distribution obtained from the Monte Carlo method. As an example, we consider the evaluation of

  H_T(t) = P(.45U²₁ + .25U²₂ + .25U²₃ + .25U²₄ − .13U²₅ − .13U²₆ − .13U²₇ − .21U²₈ ≤ t)

for different values of t. The cumulative distribution from the Monte Carlo method is obtained from the relative cumulative frequency distribution of 30,000 random variates distributed as (.45χ²₍₁₎ + .25χ²₍₁₎ + .25χ²₍₁₎ + .25χ²₍₁₎ − .13χ²₍₁₎ − .13χ²₍₁₎ − .13χ²₍₁₎ − .21χ²₍₁₎). The results are shown in Table 3.4.1.

Table 3.4.1 The cumulative distribution obtained from the method developed and the Monte Carlo method of T with ε = 10⁻³

      t      Exact      Monte Carlo
  -1.50     .00550        .00657
  -1.00     .02415        .02427
  - .50     .08662        .08990
    .00     .26421        .26660
    .50     .52368        .52187
   1.00     .72299        .72017
   1.50     .84694        .84533
   2.00     .91768        .91430
   2.50     .95663        .95327
   3.00     .97735        .97440
   3.50     .98871        .98603
   4.00     .99507        .99300

Note that the method developed depends also on where the series is terminated. In this example all series were terminated when the contribution from the remaining terms was less than 10⁻³; the cumulative distribution obtained from the method developed is less than that obtained from the Monte Carlo method when t < 0 and greater than that obtained from the Monte Carlo method when t > 0. Allowing more terms of the series will give more accurate results.

It is of interest to know what the graph of the probability density function of T, where T = Σ_{i=1}^r λ_iU²_i − Σ_{j=1}^s α_jU²_{r+j}, looks like. In Figure 3.4.1 the graphs of the probability density functions of T, where T = Σ_{i=1}^4 λ_iU²_i − Σ_{j=1}^4 α_jU²_{4+j}, are shown for three sets of λ_i, i = 1,2,3,4, and α_j, j = 1,2,3,4. It is noticeable that as the positive eigenvalues grow relative to the negative ones, the graph of the probability density function shifts to the right and becomes flatter. Also, the right tail of the graph is heavier than the left, that is, the distribution is positively skewed.

[Figure 3.4.1. Graphs of the p.d.f. of T, where T = Σ_{i=1}^4 λ_iU²_i − Σ_{j=1}^4 α_jU²_{4+j}, for three sets of (λ₁, λ₂, λ₃, λ₄, −α₁, −α₂, −α₃, −α₄): (.45, .25, .25, .25, −.13, −.13, −.13, −.21), (.61, .33, .33, .33, −.13, −.13, −.13, −.21), and (1.1625, 1.1625, 1.1625, −.16, −.16, −.16, −.16); h_T(t) plotted over −2.0 ≤ t ≤ 8.0.]
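The series computation of H_T(t) and the Monte Carlo check of Table 3.4.1 can both be sketched as follows. This is an illustration, not the original program: the tail of the even-degrees-of-freedom difference is written in the ascending-factorial form given above, the truncation is a fixed 80 terms in each index rather than the adaptive ε rule, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import gammaln

def mixture_coefs(lams, terms):
    # c_k recursion of Theorem 3.3.1 for one positive linear combination
    lams = np.sort(np.asarray(lams, float))
    a, aj = lams[0], lams[1:] / lams[0]
    h = [np.sum((1 - 1 / aj) ** m) for m in range(terms + 1)]
    c = [np.prod(aj ** -0.5)]
    for k in range(1, terms + 1):
        c.append(sum(h[k - l] * c[l] for l in range(k)) / (2 * k))
    return a, np.array(c)

def diff_tail(a, kk, b, ll, t):
    # P(a X - b Y > t) for X ~ chi2(2*kk), Y ~ chi2(2*ll), t >= 0:
    # (a/(a+b))^ll sum_{v<kk} ((ll)_v/v!) (b/(a+b))^v P(chi2_{2(kk-v)} > t/a)
    v = np.arange(kk)
    logw = gammaln(ll + v) - gammaln(ll) - gammaln(v + 1)
    w = np.exp(logw + v * np.log(b / (a + b)) + ll * np.log(a / (a + b)))
    return float(np.sum(w * chi2.sf(t / a, df=2 * (kk - v))))

def indefinite_cdf(pos, neg, t, terms=80):
    """H_T(t) for T = sum pos_i U_i^2 - sum neg_j U_j^2 (r and s even)."""
    a, c = mixture_coefs(pos, terms)
    b, cs = mixture_coefs(neg, terms)
    p, q = len(pos) // 2, len(neg) // 2
    if t >= 0:
        tot = sum(c[k] * cs[l] * diff_tail(a, p + k, b, q + l, t)
                  for k in range(terms + 1) for l in range(terms + 1))
        return 1.0 - tot
    return sum(c[k] * cs[l] * diff_tail(b, q + l, a, p + k, -t)
               for k in range(terms + 1) for l in range(terms + 1))

pos = [0.45, 0.25, 0.25, 0.25]
neg = [0.13, 0.13, 0.13, 0.21]
for t in (-1.0, 0.0, 1.0, 2.0):
    print(t, indefinite_cdf(pos, neg, t))   # compare the exact column above
```

A Monte Carlo sample as in the table, for example (np.array(pos) * rng.chisquare(1, (30000, 4))).sum(1) minus the corresponding negative part, provides the same kind of check reported in Table 3.4.1.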
4. COMBINING INFORMATION FROM SEVERAL EXPERIMENTS

4.1 Introduction

It is not unusual to encounter situations where an experimenter wishes to estimate variance components and has at his disposal data from a number of experiments. If these experiments have identical designs, then it is clear that one should combine information by averaging estimates from the individual experiments. If, however, these experiments happen to have different designs, then he may select any one of a number of possible techniques of analysis. Unfortunately, different choices may lead to different answers. The theory developed in Chapter 3 is now used to study and compare the properties of four reasonable alternative variance component estimators. The rationale is that this situation has sufficient structure to permit study and yet gives some insight into the fully general unbalanced case. Despite the limited nature of this study, that is, combining pairs of balanced experiments, certain general trends appear. We find, for example, that the straightforward technique of simply computing averages of estimates obtained from separate analyses of variance may be very inefficient.

The following four methods of obtaining estimates are considered:

a) Pool the sums of squares from the separate analyses of variance. This can be thought of as an adaptation of either Henderson's method 1 or method 3.

b) Compute the mean of the estimates obtained from the separate analyses of variance.

c) Compute the averages of the mean squares in the analyses of variance, equate them to their expected values, and solve for the estimates.

d) Apply the MINQUE theory discussed in Chapter 2 to the unbalanced data set. It can be shown that in this case the estimators are weighted functions of the sums of squares in the individual analyses of variance, with weights that depend on the true variance components. In practice, prior estimates of the components must be used when computing these weights. For purposes of this study it was assumed that the true values were known, and consequently the results must be interpreted as providing a bound for the technique.

The estimates will be compared on the basis of variance, probability of yielding negative estimates, and the 95th percentile of the distribution. By their very nature, methods a, b, and c are unbiased. Method d is also unbiased if one uses fixed prior values for the components and accepts the occasional negative estimate. Truncating the distribution by replacing negative values by zeros destroys the unbiasedness. The effects of iterating the process are not investigated in this study.

4.2 Model and Methods

We assume the conventional linear model for the r×c table with interaction and n subsamples,

  y_ijk = μ + r_i + c_j + rc_ij + ε_ijk ,  i = 1,...,r; j = 1,...,c; k = 1,...,n,

where μ is an unknown constant and the {r_i}, {c_j}, {rc_ij}, and {ε_ijk} are all independent normal random variables with zero means and variances σ²_r, σ²_c, σ²_rc, and σ²_ε, respectively. For purposes of this chapter, it is convenient to rewrite the model in the matrix form of (2.1),

  Y = μ1 + U₁r + U₂c + U₃rc + ε ,   (4.1)

so that E(Y) = μ1 and Var(Y) = σ²_rU₁U₁' + σ²_cU₂U₂' + σ²_rcU₃U₃' + σ²_εI.

When information from a series of m experiments with common variance components σ²_r, σ²_c, σ²_rc, and σ²_ε is to be combined, the model (4.1) will be indexed by the subscript k, k = 1,2,...,m.
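The model and its balanced analysis of variance can be simulated directly; the sketch below is used only as an illustration in what follows. The helper name simulate_ss, the grand mean of 10.0, and the seed are assumptions of the example.

```python
import numpy as np

def simulate_ss(r, c, n, s2, rng):
    """Simulate one balanced r x c x n experiment from model (4.1) and
    return its ANOVA sums of squares (SSR, SSC, SSRC, SSE).
    s2 = (sigma_r^2, sigma_c^2, sigma_rc^2, sigma_e^2)."""
    sr, sc, src, se = np.sqrt(s2)
    row = sr * rng.normal(size=(r, 1, 1))
    col = sc * rng.normal(size=(1, c, 1))
    inter = src * rng.normal(size=(r, c, 1))
    y = 10.0 + row + col + inter + se * rng.normal(size=(r, c, n))

    cell = y.mean(axis=2)
    ybar_r, ybar_c, ybar = cell.mean(axis=1), cell.mean(axis=0), cell.mean()
    ssr = c * n * np.sum((ybar_r - ybar) ** 2)
    ssc = r * n * np.sum((ybar_c - ybar) ** 2)
    ssrc = n * np.sum((cell - ybar_r[:, None] - ybar_c[None, :] + ybar) ** 2)
    sse = np.sum((y - cell[:, :, None]) ** 2)
    return ssr, ssc, ssrc, sse

rng = np.random.default_rng(4)
print(simulate_ss(4, 2, 2, (1.0, 0.4, 0.8, 1.0), rng))
```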
Four methods of combining information and obtaining the estimates are to be investigated: the method of pooling sums of squares (method a), the method of computing the mean of the analyses of variance estimates (method b), the method of averaging mean squares (method c), and the MINQUE method (method d). In each case, estimates are obtained by equating quadratic forms to their expected values and solving the resulting system of equations. These systems are listed as follows:

a) The system of equations obtained by pooling sums of squares (method a) is given by

  [Σ_k c_kn_k(r_k−1) / Σ_j(r_j−1)] σ̂²_r + [Σ_k n_k(r_k−1) / Σ_j(r_j−1)] σ̂²_rc + σ̂²_ε = Σ_k SSR_k / Σ_j(r_j−1)

  [Σ_k r_kn_k(c_k−1) / Σ_j(c_j−1)] σ̂²_c + [Σ_k n_k(c_k−1) / Σ_j(c_j−1)] σ̂²_rc + σ̂²_ε = Σ_k SSC_k / Σ_j(c_j−1)   (4.2)

  [Σ_k n_k(r_k−1)(c_k−1) / Σ_j(r_j−1)(c_j−1)] σ̂²_rc + σ̂²_ε = Σ_k SSRC_k / Σ_j(r_j−1)(c_j−1)

  σ̂²_ε = Σ_k SSE_k / Σ_j r_jc_j(n_j−1) .

b) For the method of computing the mean of the analysis of variance estimates (method b), we first find the analysis of variance estimates by solving the system of equations (4.3) for each experiment and then take the mean of these estimates over the m experiments. The system of equations for each experiment is given by

  cn σ̂²_r + n σ̂²_rc + σ̂²_ε = MSR
  rn σ̂²_c + n σ̂²_rc + σ̂²_ε = MSC   (4.3)
  n σ̂²_rc + σ̂²_ε = MSRC
  σ̂²_ε = MSE .

c) The system of equations obtained by averaging mean squares (method c) is given by

  Σ_k n_kc_k σ̂²_r + Σ_k n_k σ̂²_rc + m σ̂²_ε = Σ_k MSR_k
  Σ_k n_kr_k σ̂²_c + Σ_k n_k σ̂²_rc + m σ̂²_ε = Σ_k MSC_k   (4.4)
  Σ_k n_k σ̂²_rc + m σ̂²_ε = Σ_k MSRC_k
  m σ̂²_ε = Σ_k MSE_k .

d) The system of equations obtained by the MINQUE method (method d) weights each sum of squares by the reciprocal of the square of its expected mean square. It is given by

  Σ_k [(r_k−1)c_kn_k / (E(MSR_k))²] (c_kn_kσ̂²_r + n_kσ̂²_rc + σ̂²_ε) = Σ_k c_kn_k SSR_k / (E(MSR_k))²

  Σ_k [(c_k−1)r_kn_k / (E(MSC_k))²] (r_kn_kσ̂²_c + n_kσ̂²_rc + σ̂²_ε) = Σ_k r_kn_k SSC_k / (E(MSC_k))²

  Σ_k [(r_k−1)n_k / (E(MSR_k))²] (c_kn_kσ̂²_r + n_kσ̂²_rc + σ̂²_ε)
    + Σ_k [(c_k−1)n_k / (E(MSC_k))²] (r_kn_kσ̂²_c + n_kσ̂²_rc + σ̂²_ε)
    + Σ_k [(r_k−1)(c_k−1)n_k / (E(MSRC_k))²] (n_kσ̂²_rc + σ̂²_ε)
    = Σ_k n_k [ SSR_k/(E(MSR_k))² + SSC_k/(E(MSC_k))² + SSRC_k/(E(MSRC_k))² ]   (4.5)

  Σ_k [(r_k−1) / (E(MSR_k))²] (c_kn_kσ̂²_r + n_kσ̂²_rc + σ̂²_ε)
    + Σ_k [(c_k−1) / (E(MSC_k))²] (r_kn_kσ̂²_c + n_kσ̂²_rc + σ̂²_ε)
    + Σ_k [(r_k−1)(c_k−1) / (E(MSRC_k))²] (n_kσ̂²_rc + σ̂²_ε) + Σ_k [r_kc_k(n_k−1) / (E(MSE_k))²] σ̂²_ε
    = Σ_k [ SSR_k/(E(MSR_k))² + SSC_k/(E(MSC_k))² + SSRC_k/(E(MSRC_k))² + SSE_k/(E(MSE_k))² ] .

It is easily seen that the estimates obtained in each case are weighted linear functions of the sums of squares in the separate analyses of variance, i.e., of SSR_k, SSC_k, SSRC_k, and SSE_k, k = 1,2,...,m. The expected values of these sums of squares are

  E(SSR_k) = (r_k−1)σ²_ε + n_k(r_k−1)σ²_rc + c_kn_k(r_k−1)σ²_r
  E(SSC_k) = (c_k−1)σ²_ε + n_k(c_k−1)σ²_rc + r_kn_k(c_k−1)σ²_c
  E(SSRC_k) = (r_k−1)(c_k−1)σ²_ε + n_k(r_k−1)(c_k−1)σ²_rc   (4.6)
  E(SSE_k) = r_kc_k(n_k−1)σ²_ε

and the variances of these sums of squares are

  Var(SSR_k) = 2[E(SSR_k)]² / (r_k−1)
  Var(SSC_k) = 2[E(SSC_k)]² / (c_k−1)
  Var(SSRC_k) = 2[E(SSRC_k)]² / ((r_k−1)(c_k−1))   (4.7)
  Var(SSE_k) = 2[E(SSE_k)]² / (r_kc_k(n_k−1))

for k = 1,2,...,m.
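Each system above is a small linear system in (σ̂²_r, σ̂²_c, σ̂²_rc, σ̂²_ε). The sketch below illustrates two of them: method b as the per-experiment solve of (4.3), and method d written in the weighted least-squares form equivalent to (4.5), with each sum of squares weighted by its degrees of freedom divided by the squared expected mean square. The function names are hypothetical.

```python
import numpy as np

def anova_matrix(r, c, n):
    # coefficient matrix of (4.3): rows for MSR, MSC, MSRC, MSE,
    # columns for (sigma_r^2, sigma_c^2, sigma_rc^2, sigma_e^2)
    return np.array([[c * n, 0.0, n, 1.0],
                     [0.0, r * n, n, 1.0],
                     [0.0, 0.0, n, 1.0],
                     [0.0, 0.0, 0.0, 1.0]])

def method_b(designs, ms_list):
    # mean of the separate analysis-of-variance estimates, per (4.3)
    ests = [np.linalg.solve(anova_matrix(*d), ms)
            for d, ms in zip(designs, ms_list)]
    return np.mean(ests, axis=0)

def method_d(designs, ss_list, prior):
    # MINQUE form of (4.5): weight each sum of squares by df/(E MS)^2 and
    # solve the weighted least-squares normal equations.
    S, q = np.zeros((4, 4)), np.zeros(4)
    for (r, c, n), ss in zip(designs, ss_list):
        df = np.array([r - 1, c - 1, (r - 1) * (c - 1), r * c * (n - 1)])
        Xk = anova_matrix(r, c, n)          # rows give E(MS) = Xk @ sigma^2
        ems = Xk @ prior
        w = df / ems ** 2
        S += (Xk * w[:, None]).T @ Xk
        q += Xk.T @ (w * (np.asarray(ss) / df))
    return np.linalg.solve(S, q)
```

Used with the simulate_ss sketch above, designs such as [(4, 2, 2), (2, 2, 2)] give the combined-experiment estimates studied in the next sections.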
Clearly the estimates of interest can assume negative values. This may happen because the eigenvalues of $A\Sigma$ are not all positive. The method developed in Chapter 3 for finding the distribution of variance component estimators will be discussed in more detail in Section 4.3. In Sections 4.4 to 4.6 the method will be applied to find the probabilities of negative estimates and the 95th percentiles of the distributions of the estimates.

4.3 Distribution and Variances of Estimators

The four estimates will be compared, in Section 4.4, for a set of unbalanced designs by using three criteria: variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of the estimates. The last two are obtained from the method developed in Chapter 3. Also, the properties of the MINQUE method for selected unbalanced designs and selected values of the true variance components will be examined in Sections 4.5 and 4.6.

The estimates are linear functions of the independent sums of squares in the separate analyses of variance. Consequently, the variances of these estimates are simply obtained by using (4.7). The estimates are also of the form $\underline{Y}'A\underline{Y}$. Consequently, the method of finding the distribution of quadratic forms can be applied to find the probabilities of negative estimates and the 95th percentiles of the distributions.

Recall that the variance component estimator $\underline{Y}'A\underline{Y}$ is distributed as $\sum_{i=1}^{r}\lambda_i U_i^2 - \sum_{j=1}^{s}\alpha_j U_{r+j}^2$, where the $U$'s are independent standard normal variables and the $\lambda_i$ and $-\alpha_j$, with $\lambda_i$ and $\alpha_j$ positive numbers, are the eigenvalues of the matrix $A\Sigma$. As in Chapter 3, the eigenvalues are expressed in terms of two positive constants $a$ and $b$ ($\lambda_i = a + a_i$, $i = 1,2,\ldots,r-1$, and $\alpha_j = b + b_j$, $j = 2,3,\ldots,s$), which appear in the density function below. This study is restricted to the case where both the number of positive eigenvalues and the number of negative eigenvalues are even, that is, both $r$ and $s$ are even. Define

$T = \sum_{i=1}^{r}\lambda_i U_i^2 - \sum_{j=1}^{s}\alpha_j U_{r+j}^2.$

The probability density function of $T$ when $r = 2p$ and $s = 2q$ is given, for $t \le 0$, by

$h_T(t) = \sum_{k=0}^{\infty}\sum_{\ell=0}^{\infty} c^*_{k\ell}\Big(\frac{a}{a+b}\Big)^{p+k}\sum_{s=0}^{q+\ell-1}\frac{(p+k)_s}{s!}\Big(\frac{b}{a+b}\Big)^{s}\frac{1}{b}\,f_{2(q+\ell-s)}\Big(\frac{-t}{b}\Big),$   (4.8)

with the symmetric expression ($a$ and $b$, $p+k$ and $q+\ell$, and $-t/b$ and $t/a$ interchanged) for $t > 0$, where $f_\nu$ denotes the chi-square density with $\nu$ degrees of freedom, $(x)_s$ is the ascending factorial, and the $c^*_{k\ell}$ are the mixing coefficients obtained by the method of mixtures in Chapter 3.

The distribution of the estimators obtained from the four methods is the same as that of $T$; for discussion purposes, the random variable $T$ is used to represent any estimator. The probability of negative estimates and the 95th percentile of the distribution are obtained from the cumulative distribution function of $T$. The formula which appears well suited for computer calculation of the cumulative distribution function of $T$ is

$H_T(t) = 1 - \sum_{k=0}^{\infty}\sum_{\ell=0}^{\infty} c^*_{k\ell}\Big(\frac{b}{a+b}\Big)^{q+\ell}\sum_{s=0}^{p+k-1}\frac{(q+\ell)_s}{s!}\Big(\frac{a}{a+b}\Big)^{s}\Big[1 - F_{2(p+k-s)}\Big(\frac{t}{a}\Big)\Big]$ for $t > 0$,

$H_T(t) = \sum_{k=0}^{\infty}\sum_{\ell=0}^{\infty} c^*_{k\ell}\Big(\frac{a}{a+b}\Big)^{p+k}\sum_{s=0}^{q+\ell-1}\frac{(p+k)_s}{s!}\Big(\frac{b}{a+b}\Big)^{s}\Big[1 - F_{2(q+\ell-s)}\Big(\frac{-t}{b}\Big)\Big]$ for $t \le 0$,   (4.9)

where $F_\nu$ is the chi-square cumulative distribution function with $\nu$ degrees of freedom.

Since the cumulative distribution function $H_T(t)$ is of infinite series form, the results in Sections 4.4 to 4.6 are calculated from the truncated series $\tilde H_T(t)$ obtained by carrying the indices $k$ and $\ell$ in (4.9) to finite limits $N$ and $M$ for $t > 0$, and $N^*$ and $M^*$ for $t \le 0$, where these limits are chosen so that every omitted term is less than $10^{-4}$.   (4.10)

By calculating $\tilde H_T(t)$ for several values of $t$ and interpolating, the 95th percentile of the distribution of $T$ is obtained.
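The truncated evaluation is straightforward to program. The following sketch (Python with scipy; illustrative only) evaluates $H_T(t)$ from (4.9), assuming the mixing coefficients $c^*_{k\ell}$ of Chapter 3 have already been computed and are supplied as a two-dimensional array; for simplicity it drops individual terms below $10^{-4}$ rather than applying the exact index rule of (4.10).

    import math
    from scipy.stats import chi2

    def poch(x, s):
        """Ascending factorial (x)_s = x(x+1)...(x+s-1)."""
        out = 1.0
        for i in range(s):
            out *= x + i
        return out

    def H_T(t, a, b, p, q, cstar, tol=1e-4):
        """Truncated series for the c.d.f. of T, following (4.9)-(4.10).
        cstar[k, l] are the mixing coefficients (assumed precomputed)."""
        tail = 0.0
        for k in range(cstar.shape[0]):
            for l in range(cstar.shape[1]):
                if t > 0:
                    inner = sum(poch(q + l, s) / math.factorial(s)
                                * (a / (a + b)) ** s
                                * chi2.sf(t / a, 2 * (p + k - s))
                                for s in range(p + k))
                    term = cstar[k, l] * (b / (a + b)) ** (q + l) * inner
                else:
                    inner = sum(poch(p + k, s) / math.factorial(s)
                                * (b / (a + b)) ** s
                                * chi2.sf(-t / b, 2 * (q + l - s))
                                for s in range(q + l))
                    term = cstar[k, l] * (a / (a + b)) ** (p + k) * inner
                if term >= tol:        # omit terms below 10^-4, cf. (4.10)
                    tail += term
        return 1.0 - tail if t > 0 else tail

The 95th percentile is then located by evaluating this function on a grid of $t$ values and interpolating, for example with numpy.interp, exactly as described above.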
4.4 A Comparison of Four Methods of Estimation

In this section the four methods of estimation discussed in Section 4.2 will be compared for eight combined experiments. A combined experiment is defined as a pair of dissimilar but balanced experiments. The behavior, i.e., the variances, the probabilities of yielding negative estimates, and the 95th percentiles of the distributions, of $\hat\sigma_r^2$, $\hat\sigma_c^2$, $\hat\sigma_{rc}^2$, and $\hat\sigma_\epsilon^2$ obtained from all four methods and for eight combined experiments is considered under the set of true variance components $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$.

To facilitate the later discussion, the methods will be referred to simply as methods a, b, c, and d. The structure $(r_1\ c_1\ n_1;\ r_2\ c_2\ n_2)$ will be used to represent the combination of two experiments, where $r_k$ is the number of levels of the row factor in the $k$th experiment, $c_k$ is the number of levels of the column factor in the $k$th experiment, and $n_k$ is the number of subsamples in the $k$th experiment, $k = 1,2$.

Variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of $\hat\sigma_r^2$ obtained from the four methods of estimation and eight combined experiments are shown in Table 4.4.1.

Table 4.4.1  Variances of $\hat\sigma_r^2$, probabilities of negative $\hat\sigma_r^2$, and the 95th percentiles of the distributions of $\hat\sigma_r^2$ from four methods and eight combined experiments when $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$

No.  Design           Method   Var     P(neg)  95% value
1    (4 2 2; 2 2 2)   a        1.573   .194    3.417
                      b        2.097   .218    3.758
                      c        2.097   .218    3.758
                      d        1.569   .193    3.393
2    (2 4 2; 2 2 2)   a        2.259   .255    3.960
                      b        2.468   .256    4.086
                      c        2.291   .256    3.965
                      d        2.236   .255    3.963
3                     a         .999   .116    2.978
                      b        1.871   .178    3.549
                      c        1.230   .143    3.143
                      d         .992   .105    2.936
4    (4 2 2; 2 2 4)   a        1.587   .190    3.431
                      b        1.825   .200    3.593
                      c        2.545   .223    4.047
                      d        1.489   .187    3.366
5                     a         .753   .078    2.636
                      b         .823   .098    2.680
                      c         .764   .080    2.640
                      d         .745   .082    2.613

For each combined experiment, the variances of $\hat\sigma_r^2$ obtained from methods a and d are appreciably less than those from methods b and c, and the variance of $\hat\sigma_r^2$ obtained from method d is less than that from method a. But there is an almost equal distribution of incidences in which the variance of $\hat\sigma_r^2$ obtained from method b is greater than, equal to, or less than that from method c. Therefore, when considering the criterion of variance, methods a and d yield more accurate results than methods b and c, and method d yields the most accurate results on $\sigma_r^2$ of all four methods. Note that method d yields the minimum variance quadratic unbiased estimates of the variance components.

In general, methods b and c are more likely to yield negative estimates for $\sigma_r^2$ than a and d. The probability of a negative $\hat\sigma_r^2$ obtained from method d is less than that from method a, except for combined experiment number 5; in that combined experiment, the magnitude of the difference between the two methods is small. The probabilities of yielding a negative $\hat\sigma_r^2$ from methods b and c depend on the experiments which are combined. It is notable that methods b and c frequently give the same results.

When the criterion of the 95th percentile of the distribution is used, one has to give some consideration to the probability of yielding a negative $\hat\sigma_r^2$ because the shape of the density function is different. The results, in general, agree with the results when the variance criterion is used. In some cases it happens that one method gives a smaller probability of yielding a negative $\hat\sigma_r^2$ and a larger 95th percentile of the distribution, while the other gives a larger probability of yielding a negative $\hat\sigma_r^2$ and a smaller 95th percentile of the distribution (see combined experiment number 5; compare methods a and d).
Therefore, such cases sometimes make a comparison impossible, and then variance may be the best criterion.

From the above discussion, it can be seen that a conclusion cannot be made as to the desirability of methods b and c because they depend on the experiments which are combined. But it is notable that they frequently give the same results. Therefore, one may say that when all criteria are considered, method d is the most desirable, method a the second most desirable, and methods b and c the least desirable of all four methods.

Variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of $\hat\sigma_c^2$ obtained from the four methods of estimation and eight combined experiments are shown in Table 4.4.2.

Table 4.4.2  Variances of $\hat\sigma_c^2$, probabilities of negative $\hat\sigma_c^2$, and the 95th percentiles of the distributions of $\hat\sigma_c^2$ from four methods and eight combined experiments when $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$

No.  Design           Method   Var     P(neg)  95% value
1    (4 2 2; 2 2 2)   a         .806   .375    2.135
                      b        1.043   .366    2.302
                      c         .837   .364    2.144
                      d         .803   .374    2.130
2    (2 4 2; 2 2 2)   a         .763   .328    2.012
                      b        1.017   .348    2.217
                      c        1.017   .348    2.217
                      d         .761   .325    1.989
3                     a         .330   .259    1.529
                      b         .856   .309    2.042
                      c         .505   .280    1.711
                      d         .329   .250    1.490
4    (4 2 2; 2 2 4)   a         .797   .367    2.124
                      b         .846   .352    2.144
                      c         .846   .352    2.144
                      d         .739   .362    2.074
5                     a         .280   .226    1.439
                      b         .374   .257    1.592
                      c         .374   .257    1.592
                      d         .280   .229    1.426
6                     a         .642   .346    1.952
                      b         .797   .341    2.093
                      c         .662   .340    1.965
                      d         .641   .347    1.954
7                     a         .282   .223    1.439
                      b         .324   .234    1.509
                      c         .452   .274    1.717
                      d         .266   .216    1.409
8                     a         .266   .225    1.376
                      b         .282   .212    1.370
                      c         .282   .212    1.370
                      d         .246   .212    1.346

For each combined experiment, the variances of $\hat\sigma_c^2$ obtained from methods a and d are significantly less than those obtained from methods b and c, and the variance of $\hat\sigma_c^2$ obtained from method d is less than that obtained from method a. Changes in the variances of $\hat\sigma_c^2$ obtained from methods b and c are directly related to the experiments which are combined. Therefore, when considering the criterion of variance, methods a and d yield more accurate results on $\sigma_c^2$ than methods b and c, and method d yields the most accurate results on $\sigma_c^2$.

For four out of eight combined experiments, that is, combined experiment numbers 2, 3, 5, and 7, the probabilities of observing a negative $\hat\sigma_c^2$ when using methods a and d are significantly smaller than when using methods b and c. Method d, in general, gives a smaller probability of a negative $\hat\sigma_c^2$ than method a. As before, the probabilities of obtaining a negative $\hat\sigma_c^2$ when using methods b and c depend on the experiments which are combined.

When considering the 95th percentile of the distribution of $\hat\sigma_c^2$ together with the probability of yielding a negative $\hat\sigma_c^2$ as the criteria, the results are the same as when considering variance as the criterion, as discussed in the preceding paragraph. We conclude that method d yields slightly better results on $\hat\sigma_c^2$ than method a and considerably better results than methods b and c.

Variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of $\hat\sigma_{rc}^2$ obtained from the four methods of estimation and eight combined experiments are shown in Table 4.4.3.
Table 4.4.3  Variances of $\hat\sigma_{rc}^2$, probabilities of negative $\hat\sigma_{rc}^2$, and the 95th percentiles of the distributions of $\hat\sigma_{rc}^2$ from four methods and eight combined experiments when $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$

No.  Design           Method   Var     P(neg)  95% value
1                     a         .887   .181    2.644
                      b        1.174   .213    2.905
                      c        1.174   .213    2.905
                      d         .873   .181    2.604
2    (2 4 2; 2 2 2)   a         .887   .181    2.644
                      b        1.174   .213    2.905
                      c        1.174   .213    2.905
                      d         .882   .184    2.615
3                     a         .363   .062    1.914
                      b         .978   .139    2.744
                      c         .978   .139    2.744
                      d         .359   .066    1.903

For each combined experiment, the variances of $\hat\sigma_{rc}^2$ when using methods a and d are significantly smaller than when using methods b and c. Also, method d leads to a smaller variance for $\hat\sigma_{rc}^2$ than method a. It is notable that the variances of $\hat\sigma_{rc}^2$ obtained from methods b and c are often the same. Therefore, when considering variance as the criterion, methods a and d yield the more accurate estimates of all.

The probabilities of obtaining negative estimates when using methods a and d are, in general, appreciably smaller than when using methods b and c, except for combined experiment number 4. In combined experiment number 4, the probabilities of a negative $\hat\sigma_{rc}^2$ for the four methods are in the order a < b < d < c, but the magnitudes of the differences between pairs of methods are small. When methods a and d are compared, it is clear that the probabilities of a negative $\hat\sigma_{rc}^2$ from these two methods are nearly equal. For five out of eight combined experiments, methods b and c have the same probability of a negative $\hat\sigma_{rc}^2$; in the remaining three, method b leads to a smaller probability of a negative estimate.

When considering the 95th percentile of the distribution of $\hat\sigma_{rc}^2$ together with the probability of yielding a negative $\hat\sigma_{rc}^2$ as the criteria, the results are the same as when considering variance as the criterion, as discussed in the preceding paragraph. Therefore, method d proves to yield the most accurate results on $\hat\sigma_{rc}^2$ of all the methods.

Variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of $\hat\sigma_\epsilon^2$ obtained from the four methods and eight combined experiments are shown in Table 4.4.4.

Table 4.4.4  Variances of $\hat\sigma_\epsilon^2$, probabilities of negative $\hat\sigma_\epsilon^2$, and the 95th percentiles of the distributions of $\hat\sigma_\epsilon^2$ from four methods and eight combined experiments when $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$ (the probabilities of a negative $\hat\sigma_\epsilon^2$ are zero in every case and are omitted)

No.  Design           Method   Var     95% value
1    (4 2 2; 2 2 2)   a         .167   1.753
                      b         .188   1.827
                      c         .188   1.827
                      d         .167   1.753
2    (2 4 2; 2 2 2)   a         .167   1.753
                      b         .188   1.826
                      c         .188   1.826
                      d         .167   1.753
3                     a         .100   1.616
                      b         .156   1.750
                      c         .156   1.750
                      d         .100   1.616
4    (4 2 2; 2 2 4)   a         .100   1.616
                      b         .104   1.711
                      c         .104   1.711
                      d         .100   1.616
5                     a         .083   1.537
                      b         .094   1.582
                      c         .094   1.582
                      d         .083   1.537
6                     a         .056   1.451
                      b         .063   1.471
                      c         .063   1.471
                      d         .056   1.451
7                     a         .050   1.437
                      b         .052   1.446
                      c         .052   1.446
                      d         .050   1.437
8                     a         .050   1.437
                      b         .052   1.446
                      c         .052   1.446
                      d         .050   1.437

It is rather striking that methods a and d appear to give the same variance of $\hat\sigma_\epsilon^2$ for all combined experiments. This condition also occurs with methods b and c. But methods a and d yield more accurate estimates than methods b and c. When using the 95th percentile of the distribution of $\hat\sigma_\epsilon^2$ as the criterion, similar conclusions obtain, except that in combined experiment number 4 the 95th percentile of the distribution of $\hat\sigma_\epsilon^2$ for method d is larger than the corresponding value for method a.
When the results from Table 4.4.1 to Table 4.4.4 are combined, one may conclude that method d is the most desirable, method a the second most desirable, and methods b and c the least desirable of all four methods. Since method d appears to be the most desirable of the four, the next two sections will be devoted to examining the effects of the number of levels of the random factors and the effects of the sizes of the true variance components.

4.5 The Effect of the Number of Levels of Each Random Factor

In this section the effects of the number of levels of each random factor on the estimates $\hat\sigma_r^2$, $\hat\sigma_c^2$, $\hat\sigma_{rc}^2$, and $\hat\sigma_\epsilon^2$ are examined when using method d and sixteen combined experiments. The combined experiments are again defined as pairs of independent two-way balanced experiments. There are sixteen observations in the combined experiment denoted by (2 2 2; 2 2 2) and twenty-four in the combined experiments denoted by (4 2 2; 2 2 2), (2 4 2; 2 2 2), and (2 2 4; 2 2 2); each of the remaining combined experiments involves thirty-two observations. The behavior, i.e., the variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of the estimates, is examined under the set of true variance components $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$.

Variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of $\hat\sigma_r^2$ when using method d and the sixteen combined experiments under this set of true variance components are shown in Table 4.5.1. As defined above, a combined experiment is a pair of experiments, so to read Table 4.5.1 one locates the row and the column labeled with the pair of experiments; for example, the results for combined experiment (4 2 2; 2 2 2) are found at the intersection of row 1 and column 4. Notice that Table 4.5.1 is symmetric.

Table 4.5.1  Variances of $\hat\sigma_r^2$, probabilities of negative $\hat\sigma_r^2$, and the 95th percentiles of the distributions of $\hat\sigma_r^2$ when using method d and sixteen combined experiments with $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$. Each cell gives Var, P(neg), and the 95% value.

Experiment   (4 2 2)              (2 4 2)              (2 2 4)              (2 2 2)
(4 2 2)      1.048 .145 2.927     1.291 .166 3.207     1.489 .187 3.366     1.569 .193 3.393
(2 4 2)      1.291 .166 3.207     1.791 .213 3.727     2.096 .241 3.918     2.236 .255 3.963
(2 2 4)      1.489 .187 3.366     2.096 .241 3.918     2.601 .258 4.136     2.842 .272 4.306
(2 2 2)      1.569 .193 3.393     2.236 .255 3.963     2.842 .272 4.306     3.145 .283 4.500

Clearly combined experiment (4 2 2; 4 2 2) is the most desirable for estimating $\sigma_r^2$. When considering only combined experiments involving thirty-two observations, one obtains a better estimate of $\sigma_r^2$ when using eight rows than when using only six rows, and when using six rows than when using only four rows. This likewise occurs in combined experiments involving twenty-four observations.
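The method d entries of Table 4.5.1 can be checked numerically. Treating the true components as known, the estimator defined by (4.5) is weighted least squares on the strata mean squares, and (4.6) and (4.7) then give the covariance matrix as $2(X'WX)^{-1}$ with $W = \mathrm{diag}(\mathrm{df}_q/[E(\mathrm{MS}_q)]^2)$; this closed form is a derivation sketched here for illustration, not a formula quoted from the thesis. For the pair (4 2 2; 2 2 2) the sketch returns $\mathrm{Var}(\hat\sigma_r^2) \approx 1.569$, matching the table.

    import numpy as np

    def minque_cov(designs, comps):
        """comps = (var_r, var_c, var_rc, var_e), assumed known a priori."""
        rows, w = [], []
        for (r, c, n) in designs:
            X = np.array([[c * n, 0, n, 1],      # E(MSR) coefficients
                          [0, r * n, n, 1],      # E(MSC)
                          [0, 0, n, 1],          # E(MSRC)
                          [0, 0, 0, 1]], float)  # E(MSE)
            df = np.array([r - 1, c - 1,
                           (r - 1) * (c - 1), r * c * (n - 1)], float)
            ems_vals = X @ np.asarray(comps)     # expected mean squares
            rows.append(X)
            w.append(df / ems_vals ** 2)
        X = np.vstack(rows)
        W = np.diag(np.concatenate(w))
        return 2.0 * np.linalg.inv(X.T @ W @ X)

    cov = minque_cov([(4, 2, 2), (2, 2, 2)], (1.0, 0.4, 0.8, 1.0))
    print(round(cov[0, 0], 3))   # 1.569, the (4 2 2; 2 2 2) entry above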
With the same number of observations and the same number of rows, combined experiments which have larger degrees of freedom for the interaction between row and column effects provide better estimates of $\sigma_r^2$; that is, using combined experiment (4 2 2; 2 4 2) with six degrees of freedom for interaction is better than using combined experiment (4 2 2; 2 2 4) with four degrees of freedom, and using combined experiment (2 4 2; 2 4 2) with six degrees of freedom is better than combined experiment (2 4 2; 2 2 4) with four degrees of freedom or combined experiment (2 2 4; 2 2 4) with two degrees of freedom, with the last being the least desirable. However, there are large differences among combined experiments (2 4 2; 2 4 2), (2 4 2; 2 2 4), and (2 2 4; 2 2 4); in particular, there is a large difference between combined experiments (2 4 2; 2 2 4) and (2 2 4; 2 2 4).

It is notable that one can obtain a better estimate of $\sigma_r^2$ when the total number of observations is twenty-four than when it is thirty-two. This can happen when the combined experiment with the smaller number of observations has larger degrees of freedom for interaction; for example, using combined experiment (4 2 2; 2 2 2) or (2 4 2; 2 2 2), each with four degrees of freedom, is better than using combined experiment (2 2 4; 2 2 4) with two degrees of freedom. Also, when the size of $\sigma_r^2$ is not so small, a combined experiment with a smaller number of observations but a larger number of rows may provide a better estimate of $\sigma_r^2$; for example, using combined experiment (4 2 2; 2 2 2) is better than using combined experiment (2 4 2; 2 4 2).

Variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of $\hat\sigma_c^2$ when using method d and the sixteen combined experiments under the set of true variance components $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$ are shown in Table 4.5.2. The structure of Table 4.5.2 is the same as that of Table 4.5.1.

Table 4.5.2  Variances of $\hat\sigma_c^2$, probabilities of negative $\hat\sigma_c^2$, and the 95th percentiles of the distributions of $\hat\sigma_c^2$ when using method d and sixteen combined experiments with $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$. Each cell gives Var, P(neg), and the 95% value.

Experiment   (2 4 2)              (4 2 2)              (2 2 4)              (2 2 2)
(2 4 2)      .508 .290 1.689      .519 .305 1.759      .610 .322 1.951      .761 .325 1.989
(4 2 2)      .519 .305 1.759      .561 .346 1.895      .739 .362 2.014      .803 .374 2.130
(2 2 4)      .610 .322 1.951      .739 .362 2.014      1.131 .363 2.365     1.297 .374 2.529
(2 2 2)      .761 .325 1.989      .803 .374 2.130      1.297 .374 2.529     1.525 .383 2.640

Clearly combined experiment (2 4 2; 2 4 2) is the most desirable for estimating $\sigma_c^2$. When considering only combined experiments involving thirty-two observations, using eight columns is better for estimating $\sigma_c^2$ than using only four or six columns. This also occurs in combined experiments involving twenty-four observations; that is, using six columns is better for estimating $\sigma_c^2$ than using only four columns. With the same number of observations and the same number of columns, combined experiments which have larger degrees of freedom for the interaction between row and column effects provide better estimates of $\sigma_c^2$; that is,
4 ~) 2 il with 2 2 . degrees of freedom is better than using combined experJ.ment with four degrees of freedom or combined experiment degrees of freedom. Also combined experiment (i (~ 2 degrees of freedom. 4 combined experiments (4 (42 2 2 :) with four (~ ~) with 2 2 However, there are large differences among 2 2 2 2 last being the least desirable. · six :) with two 2 i) 2 degrees of freedom is better than combined ext'eriment ~o 2 with betiJeen combined experiments (42 2 2 :) , with the Also, there is a large difference 2 2 ~) and (i 2 2 i) . For combined experiments involving thirty-ewo observations, one may obtain a better estimate of a 2 when using a smaller number of columns but a larger numc ber of rows, for example, using combined experiment for estimating a 2 than using combined experiment. (22 c where this can happen is that combined experiment degrees of freedom for interaction. l: ~) l~ 2 2 4 z) 2 2 2 is better One reason ~) has larger One can obtain a better estimate of a 2 when the total number of observations is twenty-four than it is c thirty-two. This can happen when the combined experiment with a smaller number of observations has larger degrees of freedom for interaction, 78 (2. '42 '2)' 2 for example, using combined experiment ?Z than using combined experiment ~ (i or 2 r 2 2 i) is better 2 2 It may be sufficient to state that the general same as for a. (22 conc1~sions are the However now, looking at more columns has a larger effect than before. This appears to be related to the relative magni- tudes of a 2 and a 2 • r c Variances, the probabilities of negative estimates, and the 95th percentiles of the distributions 0f ... 2 arc when using method d and sixteen' combined experimem:s under a set of true variance components, a 2 .. 1. 0, r a 2 2 2 c ... 4, a rc ... 8, and a & • 1.0, are shown in Table 4.5.3. The struc- ture of Table 4.5.3 is the same as that of Table 4.5.1. Clearly combined experiment estimating a 2 rc • (~ 2 4 ~) is the most desirable for When considering only combined experiments involVing thirty-two observations, one obtains a better estimate of a 2 when rc using combined experiment with six degrees of freedom for the inte~ action between row and column effects than using combined experiment with four degrees of freedom, and using combined experiment with four degrees of freedom for interaction is better than using combined experiment with two degrees of freedom. This also occurs in combined experi- ments involVing twenty-four observations, that is, using combined experiment with four degrees of freedom for interaction is better than using combined experiment with two degrees of freedom. There are large Table 4.5.3 Variances of when using method d and (4 .. 2 G rc ,probabilities of negative si~teen .. 2 0 rc ,and the 95th percentiles of the distributions of .. 2 G rc 222 2 combined experiments with a - 1.0, G - .4, a - .8, and a - 1.0 r c rc £ 2 2) (2 4 (2 2) 2 4) (2 I 2 2) Experiment .. 2 .. 2 Var(o"2 ) p(a.. 2 <0) 95% value Var(02 ) p(a 2 <0) 95% valuQ Var(o . ) p(a <0) 95 % value rc rcrc rcrc rc- ..2 "2 Var(a ) p(a <0) 95% value rc rc- (4 2 2) .595 .123 2.272 .580 .123 2.254 .750 .155 2.515 .813 .181 2.604 (2 4 2) .580 .123 2.254 .595 .123 2.272 .758 .159 2.526 .882 .184 2.615 (2 2 4) .750 .155 2.515 .758 .159 2.526 1.108 . .220 2.947 1.350 .256 3.101 (2 2 .813 .181 2.604 .882 ".184 2.615 1.350 3.107 1.753 .311 3.450 2) .256 ...... 
differences among combined experiments with different degrees of freedom for interaction even though the total number of observations remains constant. However, there are small differences among combined experiments which involve the same number of observations and the same degrees of freedom for interaction, for example, combined experiments (4 2 2; 4 2 2) and (2 4 2; 2 4 2), combined experiments (4 2 2; 2 2 4) and (2 4 2; 2 2 4), and combined experiments (4 2 2; 2 2 2) and (2 4 2; 2 2 2). One can obtain a better estimate of $\sigma_{rc}^2$ when the total number of observations is twenty-four than when it is thirty-two. This can happen when the combined experiments with smaller numbers of observations have larger degrees of freedom for interaction; for example, using combined experiment (4 2 2; 2 2 2) or (2 4 2; 2 2 2) is better than using combined experiment (2 2 4; 2 2 4).

Variances and the 95th percentiles of the distributions of $\hat\sigma_\epsilon^2$ when using method d and the sixteen combined experiments under the set of true variance components $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$ are shown in Table 4.5.4. The structure of Table 4.5.4 is the same as that of Table 4.5.1. Note that the probabilities of a negative $\hat\sigma_\epsilon^2$ are all equal to zero.

Table 4.5.4  Variances of $\hat\sigma_\epsilon^2$ and the 95th percentiles of the distributions of $\hat\sigma_\epsilon^2$ when using method d and sixteen combined experiments with $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$. Each cell gives Var and the 95% value.

Experiment   (2 2 4)         (2 4 2)         (4 2 2)         (2 2 2)
(2 2 4)      .083 1.537      .100 1.611      .100 1.616      .125 1.685
(2 4 2)      .100 1.611      .125 1.671      .125 1.671      .167 1.753
(4 2 2)      .100 1.616      .125 1.671      .125 1.677      .167 1.753
(2 2 2)      .125 1.685      .167 1.753      .167 1.753      .250 1.960

Clearly combined experiment (2 2 4; 2 2 4) is the most desirable for estimating $\sigma_\epsilon^2$. When considering only combined experiments involving thirty-two observations, using the combined experiment where the number of
The reason is that combined experiment with larger 83 numbers of subsamples improves the results on estimating cr 2 , but the E improvement is small when comparing with a larger number of rows or a larger number of columns which improves the results on estimating 222 cr , cr , and cr • rc r c However, a larger number of rows or a larger number of columns does not yield much different results on estimating cr 2 rc and cr 2 when the total number of observations remains the sam~ but it proE 2 r and cr c • When the value 2 2 of cr is large when comparing with the value of cr , it·is better to r c duces a noticeable improvement on estimating cr have the ·number of rows increased. increased otherwise. . increas~g t he numb er 2 The number of columns should be· The reason is that when the size of cr 0f 2 r is large, rows pro vid es muc h sma11er vari ance of crA2 r than that when increasing the number of columns or increasing the numbers of subsamples when the total number of observations remaining the same. Ihis will be discussed in more details in the next section. And also, increasing the number of rows and increasing the number of columns do not yield much different results on estimating cr cr 2 c is small, cr 2 ,. .4 in the present work. 2 when the size of It is noticeable that one can c 2 2 c 2 and cr when the total number of obtain a better estimate of cr r' cr c' rc observations is twenty-four than when it is thirty-two. This happens when combined experiments with smaller numbers of observations have larger degrees of freedom for interaction, for example, using combined 84 experiment {~ 2 2 ~1 or (~ 4 2 ~ each ~N.ith four degrees of freedom 2 2 2 is better for estimating a , a , and a than using combined experir c rc \i meut 2 2 4.6 :} with two degrees of freedom. The Effect of the True Variance Components 2 In this section the effect of the true var:t.ance components, a , 1: 2 2 2 ... 2 ac' arc' and a e:' on the estimates a r combined e:tperiments are examined. when using method d and three The combined ~~eriments are de- fined as in Section 4.5 each involving t"oJenty-four observations. The behavior, i.e., variances, the probabilities of negative estimates, and the 95th percentiles of the distributions, of the estimates under several sets of true variance components are e:tamined. 2 2 the effect of a , where a r r where cr 2 c a. a ar2 That is "'2 _ 2 .01, 1.0, 2.0, on a , the effect ot cr , r c ...2 2 2 01, .4, 2. 0, on a J the effect of a , T.vhere cr • • 01, . r rc rc ...2 2? "'2 .8, 2.0, on ar' and the effect of ae:' where a~ • .01, 1.0, 2.0, on are Variances, the probabilities of negative esti~tes, and the 95th ... ? percentiles of the distributions of cr- when using method d and three r 2 combined experiments T.with a · . 01, 1. 0, and 2. a are shewn in Table r 4.6.1. 85 ... 2 .. 2 and the 95th Table 4.6.1 Variances of a , probabilities-- of negative-a, r r .. 2 eercenti1es of the d1strib~~ions of ar when using method d and thl:ee2 2 combined experiments with a • .01, 1.0, 2.0, a 2 • • 4, a c r rc • .8, and 0 2 • 1.0 e Design r; • (~ (~ 2 2 4 2 2 2 0 ~t i) i) 2 r ... 2 Var(o ) r p(.;2 < 0) r- 95% value .01 .252 .567 .• 936 1.00 1.569 .193 3.393 2.00 3.719 .099 5.753 .01 .426 .490 1.146 1.00 2.236 .255 3.963 2.00 6.217 .147 7.503 .01 .679 .493 1. 361 1.00 2.842 .272 4.306 2.00 7.017 .183 8.306 e - e e 2 2 4) ( 222 2 4 2) (222 6.0 4.0 r-. 
N <0 J.4 '-" J.4 l1S I // / (~ ~ ~) P 2.0 0.0 1.0 2.0 o Figure 4.6.1 2 r The graphs of variances 0 ~2 f 0 r where .01 < - 0 2 < 2.0 for r- three combined experiments 00 0\ 87 2 It is noticeable that when the value of or increases, the probability of observing a negative the distribution of cr 2r .. z 0 r decreases while the 95th percentile of increases. We consider the behavior of the .. 2 .. 2 2 2 estimate 0 by looking at variance of 0 as a function of 0 with 0 , r 0 2 rc' an d of 0 0 r r The graphs of variances of 2e: b e ing fixed. .. 2 0 c as functions r 2 for three combined experiments, data from Table 4.6.1, are r shown in Figure 4.6.1. For each value of 0 2 , one obtains a better estimate of r fi using combined experiment {~ ment ~} 4 2 or (~ 2 2 ~ ~) 0 2 when r than when using combined experi- and using combinid experiment is better than using combined experiment (~ 2 2 three utilize the same number of observations. il i l~ ~) even though all When 0 Z is small, r there are small differences among these three combined experiments. The rate of increase in variance of r2 for combined experiment l2 2 2 ment ~ 4 2 22 "Z as a function of 0 r 0 2 is largest r 4) somewhat smaller for combined experi2' , and much smaller for combined experiment (~ 2 2 ~) . It is noticeable that the rates of increase are nearly the same for combined experiments 2 (~ 4 2 2 2 .. 2 ~. This implies that each value of or affects variances of or for combined experiments 2 2 bined i} by nearly the same amount. ~~eriments 12 l 2 24 2\ 21 One can conclude that for com- involving twenty-four observations using combined experiment with six rows is better than using that with six columns or 88 that with numbers of subsamples being two and four when the value of 0 2 r is small and much better when the value of . 0 2 r is large. Variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of combined experiments with o~ A2 0 r when using method d and three • • 01, .4, and 2.0 are shown in Table 4.6.2. Clearly for each combined experiment the probability of a nega- A2 tive o r value of remains the same when the value of 0 2 c tile of the A2 the probability of a negative 0 . different combined experiments. distribution~ 2 increases, but for each c 0 r changes when using These also hold for the 95th percen- In order to examine the behavior of &2, we • r 2 2 2 2 consider variance of &2 as a function of 0 with 0 , 0 , and 0 r c r rc € The graphs of variances of being fixed. A2 r as functions of 0 0 2 c for three combined experiments, data from Table 4.6.2, are shown in Figure 4.6.2. For each combined experiment there are no differences among variances of A2 0 .• 2 r for different values of 2 does not affect variance of cr • not depend on the value or size of 0; than using combined experiment 2 combined experiment ( 12 2 2 4 2 ~l c , that is, the value of This implies that estimating 0 r obtains a better estimate of 0 0 2 c • For each value of 0 e~eriment (i i) , 4 ~) or (~ 2 2 2 c does 2 one c when using combined 2 2 r 0 (; 2 2 and using is better than using combined ~~eriment 89 ",2 "'2 Variances of cr , probabilities of negative cr r , and the r .. 2 95th percentiles of the distributions of cr when using method d and r Table 4.6.2 2 2 2 three combined experiments with cr r • 1.0, cr • .01, .4, 2.0, cr c rc • .8, 2 and cr e: • 1.0 Design (i (~ (~ 2 2 4 2 2 2 ~) ~) . 
Table 4.6.2  Variances of $\hat\sigma_r^2$, probabilities of negative $\hat\sigma_r^2$, and the 95th percentiles of the distributions of $\hat\sigma_r^2$ when using method d and three combined experiments with $\sigma_r^2 = 1.0$, $\sigma_c^2 = .01, .4, 2.0$, $\sigma_{rc}^2 = .8$, and $\sigma_\epsilon^2 = 1.0$

Design            sigma_c^2   Var     P(neg)  95% value
(4 2 2; 2 2 2)    .01         1.568   .193    3.386
                  .40         1.569   .193    3.393
                  2.00        1.572   .196    3.414
(2 4 2; 2 2 2)    .01         2.236   .255    3.963
                  .40         2.236   .255    3.963
                  2.00        2.236   .255    3.963
(2 2 4; 2 2 2)    .01         2.842   .272    4.300
                  .40         2.842   .272    4.300
                  2.00        2.842   .272    4.300

Clearly, for each combined experiment the probability of a negative $\hat\sigma_r^2$ remains the same when the value of $\sigma_c^2$ increases, but for each value of $\sigma_c^2$ the probability of a negative $\hat\sigma_r^2$ changes when using different combined experiments. These statements also hold for the 95th percentile of the distribution. In order to examine the behavior of $\hat\sigma_r^2$, we consider the variance of $\hat\sigma_r^2$ as a function of $\sigma_c^2$ with $\sigma_r^2$, $\sigma_{rc}^2$, and $\sigma_\epsilon^2$ being fixed. The graphs of the variances of $\hat\sigma_r^2$ as functions of $\sigma_c^2$ for the three combined experiments, with data from Table 4.6.2, are shown in Figure 4.6.2. For each combined experiment there are no differences among the variances of $\hat\sigma_r^2$ for different values of $\sigma_c^2$; that is, the value of $\sigma_c^2$ does not affect the variance of $\hat\sigma_r^2$, which implies that estimating $\sigma_r^2$ does not depend on the value or size of $\sigma_c^2$. For each value of $\sigma_c^2$, one obtains a better estimate of $\sigma_r^2$ when using combined experiment (4 2 2; 2 2 2) than when using combined experiment (2 4 2; 2 2 2) or (2 2 4; 2 2 2), and using combined experiment (2 4 2; 2 2 2) is better than using combined experiment (2 2 4; 2 2 2).

Figure 4.6.2  The graphs of variances of $\hat\sigma_r^2$, where $.01 \le \sigma_c^2 \le 2.0$, for three combined experiments.

Variances, the probabilities of negative estimates, and the 95th percentiles of the distributions of $\hat\sigma_r^2$ when using method d and three combined experiments with $\sigma_{rc}^2 = .01$, .8, and 2.0 are shown in Table 4.6.3.

Table 4.6.3  Variances of $\hat\sigma_r^2$, probabilities of negative $\hat\sigma_r^2$, and the 95th percentiles of the distributions of $\hat\sigma_r^2$ when using method d and three combined experiments with $\sigma_r^2 = 1.0$, $\sigma_c^2 = .4$, $\sigma_{rc}^2 = .01, .8, 2.0$, and $\sigma_\epsilon^2 = 1.0$

Design            sigma_rc^2  Var     P(neg)  95% value
(4 2 2; 2 2 2)    .01          .820   .078    2.786
                  .80         1.569   .193    3.393
                  2.00        3.293   .286    4.430
(2 4 2; 2 2 2)    .01         1.424   .142    3.412
                  .80         2.236   .255    3.963
                  2.00        3.819   .336    4.792
(2 2 4; 2 2 2)    .01         1.437   .116    3.452
                  .80         2.842   .272    4.306
                  2.00        6.172   .350    5.622

Figure 4.6.3  The graphs of variances of $\hat\sigma_r^2$, where $.01 \le \sigma_{rc}^2 \le 2.0$, for three combined experiments.

For each combined experiment the probability of a negative $\hat\sigma_r^2$ increases when $\sigma_{rc}^2$ increases. Also, for each value of $\sigma_{rc}^2$ the probability of a negative $\hat\sigma_r^2$ increases when using combined experiment (2 4 2; 2 2 2) rather than combined experiment (4 2 2; 2 2 2), and when using combined experiment (2 2 4; 2 2 2) rather than combined experiment (2 4 2; 2 2 2). These statements also hold for the 95th percentile of the distribution of $\hat\sigma_r^2$. We consider the behavior of $\hat\sigma_r^2$ by looking at the variance of $\hat\sigma_r^2$ as a function of $\sigma_{rc}^2$ with $\sigma_r^2$, $\sigma_c^2$, and $\sigma_\epsilon^2$ being fixed. The graphs of the variances of $\hat\sigma_r^2$ as functions of $\sigma_{rc}^2$ for the three combined experiments, with data from Table 4.6.3, are shown in Figure 4.6.3.

For each value of $\sigma_{rc}^2$, one obtains a better estimate of $\sigma_r^2$ when using combined experiment (4 2 2; 2 2 2) than when using combined experiment (2 4 2; 2 2 2) or (2 2 4; 2 2 2), and using combined experiment (2 4 2; 2 2 2) is better than using combined experiment (2 2 4; 2 2 2), even though all three combined experiments utilize the same number of observations. The rate of increase in the variance of $\hat\sigma_r^2$ as a function of $\sigma_{rc}^2$ is smallest for combined experiment (4 2 2; 2 2 2), somewhat larger for combined experiment (2 4 2; 2 2 2), and much larger for combined experiment (2 2 4; 2 2 2). However,
2 increases, and for each value of a € (i 4 2 2 € the increases when using combined ~~eriment 21~ rather than using combined experiment 2 a. r ;2r 2 ~ 4 (22 2 ~ and using rather than using combined experiment These also hold for the 95th percentile of distribution of ... 2 We also consider the behavior of the estimate a r by looking at variance of ;2 as a function of a 2 with a 2 , a 2 , and (12 being fixed. r E: r c rc ... 2 2 The graphs of variances of a as functions of a for three combined r E: experiments, data from Table 4.6.4, are shown in Figure 4.6.4. 2 r For each value of (12 one obtains a better estimate of a when € using combined experiment ment (~ 4 2 ~j or (~ 2 2 2 2 ~ rather than using combined experi- and using combined experiment l~ 4 2 ~ is 95 ,,2 "2 Variances of a , probabilities of negative a , and the r r ,,2 95th percentiles of the distributions of a when using method d and r 2 2 2 three combined experiments with a • 1.0, ac • .4, arc • .8, and r Table 4.6.4 0'2 • .01, 1.0, and 2.0 e: Design (~ (; (~ 2 2 4 2 2 2 ;) ~ i) 2 ae: ,,2 Var(a ) r .01 1.064 .128 3.0ll 1.00 1.569 .193 3.393 2.00 2.202 .238 3.844 .01 1. 706 .194 3.634 1.00 2.236 .255 3.963 2.00 2.845 .294 4.300 .01 2.128 .223 3.876 1.00 2.842 .272 4.306 2.00 3.659 .295 4.705 p(a 2 < 0) r- 95% value e e e e 2 2 4) ( 222 3.0 2 4 2) ( 222 ~ ~ N '~ ~ 2.0 I (4 2 2) ------- 2 2 2 1.0 0.0 1.0 2.0 a Figure 4.6.4 2 £ . A2 2 The graphs of variances of a where .01 < a < 2.0 for r - £- three combined experiments '0 0\ 97 2 2 better tha;!. using combined experiment (2 2 4) 2 even though all three combined experiments utilize the same number of observations. rate of increase in variance of [~ ~ ~), combined experiment 0r2 as a function of a 2 The is largest for € smaller for combined experiment ( 22 and much smaller for combined experiment (; ~ ~. 4 2 There' are large differences among these three combined experiments when the value of 2 a 2€ is small, and the differences become larger when the value of a € . In other words, using combined experiment increases. better than using combined experiment (~ 4 2 2) 2 (2 or \2 ( '+2' 2 2 2}2 JoS . 2 2 ~) when the 2 value of a 2 is small, and much better when the value of a i.s large. € € When considering the results from Table 4.6.1. to Table 4.6.4, 2 2 2 the value of ar' arc·' and a2 have a large effect on estimating a r c:: all three combined experiments • ..2 affect the results on a. r for 2 However, the value of a c does not There are small differences among three 2 . is small, but large differr combined e..'"q)eriments when the value of cr ences when the value of a 2 is large. r value of a It is noticeable that when the 2 is large, using a larger number of rows prOVides much r .. 2 better results on a than using a larger number of columns or larger r numbers of subsamples even though the total number of observations remains constant. Also, there are small differences among three combined experiments when the value of a 2 rc is small, but combined experiment with a larger number of rows or a larger number of columns .. 2 provides much better results on a than larger numbers of subsamples r 98 when the value of 0'2. rc is large. However, a larger number of rows is better than a larger number of columns, that is, combined experiment 2 2 ~ 2J 4 2 ~. changes. is better for estimating 2 than combined experiment 0' r The same conclusion can be made when the value of 0'2 E 99 5. 
5. SUMMARY

The aims of the present work are restated as follows:

a) Examine the asymptotic properties of REML estimates in the mixed model of the analysis of variance.

b) Derive the distribution of translation-invariant variance component estimators.

c) Compare four methods of variance component estimation for several combined experiments.

d) Study the effect of the number of random levels and the effect of the true variance components on MINQUE.

It is shown in Chapter 2 that the REML estimates are consistent, asymptotically normal, and asymptotically efficient in the sense of attaining the Cramer-Rao lower bound for the covariance matrix, under some assumptions on the mixed model and some assumptions on a sequence of experiments. A class of design sequences where the number of levels of each random factor increases to infinity is used. Consideration of the asymptotic properties for the mixed model differs from the usual method of proof of such properties because the observed random variables are not independent and identically distributed. Different normalizing sequences, which are related to the degrees of freedom in the analysis of variance, are used for the different sequences of estimates. We prove the asymptotic theory by showing that, under some assumptions on the mixed model and a sequence of experiments, the requirements of a theorem proved by Weiss are satisfied.

For the purpose of studying the behavior of the estimators, the exact distribution of estimators which are translation-invariant quadratic forms is derived in Chapter 3 by using the method of mixtures. The distribution is shown to depend only on the eigenvalues of $A\Sigma$ which, in general, are not all positive. When all eigenvalues are positive, the distribution can be expressed as an infinite series of chi-square distributions. When there are both positive and negative eigenvalues, the distribution can be expressed in terms of the confluent hypergeometric function. However, the distribution can be represented in terms of the chi-square distribution when both the number of positive and the number of negative eigenvalues are even. Even though the distribution derived is in terms of infinite series, an adequate approximation can be obtained by truncating the series; the accuracy depends on where the series is terminated. The method has been checked against the distribution obtained from the Monte Carlo method.

In Chapter 4, the four methods of variance component estimation for eight combined experiments are compared under a set of true variance components by using three criteria, that is, variance, the probability of a negative estimate, and the 95th percentile of the distribution, where the last two are obtained from the distribution derived. It appears that the MINQUE method and the method of pooling sums of squares, in general, yield better estimates than the method of computing the mean of the analyses of variance estimates and the method of averaging mean squares. Moreover, one obtains a better estimate when using the MINQUE method than when using the method of pooling sums of squares. For all estimates, $\hat\sigma_r^2$, $\hat\sigma_c^2$, $\hat\sigma_{rc}^2$, and $\hat\sigma_\epsilon^2$, the same conclusion as above can be
It is noticeable that they frequently give the' It appears that the probability of a negative &2 is e: equal to zero in all cases. Since the MINQUE method is the mest 2 2 2 2 desirable of all four methods for estimating or' a c ,ar e ,and e: °, it is then worthy to use this method to estimate variance components when one experiment~, has a series of dissimilar but balanced at least for a series of two-way balanced e."'q)er1ments. w~en the effects of the number of levels of each random factor A2 ~ d it l.'S on MINQUE, that is, aA2 , aA2 ,aA2 ,and ae:' are cons id e_e, r c rc 2 clear that using more rows yields a better estimate of a , using more r 2 columns yields a better estimate of a , and using larger numbers of c subsamples yields a better esticate of observations remaining constant. 0 2 e: when the total number of \ol1th the SaJ!le number of observations and the same number of rows, combined experiments which nave larger degrees of freedom for the interaction between row and column effects pro vid e b etter . est~tes ~ 2 r ot a . With the same number of observations and the same number of columns, combined experiments which have larger degrees of freedom for interaction provide better estimates of cr~. Combined e.~periments with the same number of observations but larger 2 degrees of freedom for interaction provide better estimates of arc 102 However, there are small differences among the results on 2 arc obtained from combined experiments with the same number of observations and the same degrees of freedom for interaction. For estimating 0 2 e: , the re- sults obtained from different combined experiments are not much different even though using combined experiments with larger numbers of subsamples prOVide better estimates. All these results suggest one to have rows increased or columns increased, when the total number of observations remains constant, to improve his estimation. or2 and 0 The sizes of 2 c also have an effect on deciding that rows or columns should be increased. When. the size of 0 a large improvement on estimating 2 is large, increasing rows produces r 0 2• Also, increasing columns pro- r duces a large improvement on estimating o~ when the size of o~ is large. Hence, it is better to have rows increased when the size of is large when comparing with the size of 0 0 2 r 2 and to have columns inc creased otherwise. It is not always true that one obtains better estimates w.hen using combined experiments with larger numbers of observations. In some cases, combined experiments with smaller numbers of observations but larger degrees of freedom for interaction prOVide better estimates, for example, combined experiments ( "'2' 2 2 ~) and (~ 4 2 ~ each with twenty- four observations and four degrees of freedom for interaction provide better estimates for 0;, o~, and O;c than combined experiment [~ '~ ~) 103 with thirty-two observations and two degrees of freedom for interaction, and all three combined experiments provide the same results on The variance component estimates depend not only on the number of random levels but also on the size of some true variance components. 2 2 2 2 For estimating cr , the sizes of cr ,cr and cr have a large effect on r r rc' € ... 2 r cr , but the size of cr 95th percentil e 0f 2 ... 2 has no effect on cr • c r · t r ib ut i on the d~s 0f Variance of . h he cr... 2 ~ncrease went r ... 2 increases, but the probability of a negative cr 2 ar2 and r . 
s~ze the 0f cr 2 r decreases when the size When the size of cr 2 increases, variance of cr"'2 , the rc r ... 2 probability of a negative cr , and the 95th percentile of the distribuof cr r increases. r tion of ar 2 increase. • The same concluSion on variance of a2r , probability of a negative the and the 95th percentile of the distribu- tion of a~ can be made when the size of cr~ increases. ... 2 a2r , When considering 2 variance of or as a function of or' the rates of increase in variance of cr 2 for different combined ~~periments with the same number of obserr vations are not the same. Combined experiments with larger numbers of rows provide much better estimates of 0 2 than combined ~~eriments with r larger numbers of columns when the total number of observations remains constant and the size of 0 2 r is large. However, there are small differ- ences among combined experiments with the same number of observations 104 when the size of a 2 r is small. When considering variance of &2 as a r 2 A2 function of arc' the rates of increase in variance of a for different r combined experiments with the same number of observations are not the same, but they are almost the same for combined experiment with the same number of observations and the same degrees of freedom for interaction. When the size of a 2 is small, there are small differences rc among combined experiments with the same numbers of observations. The A2 2 general conclusion when considering variance of a as a function of a r € is the same as that when considering variance of a 2 as a function of r 105 6. LIST OF REFERENCES Corbeil, R. R. and S. R~ Searle. 1976. Restricted maximum likelihood (REML) estimation of variance components in the mi:md model. Tecbnometrics 18:31-38. Giesbrecht, G. F. 1977. Combining experiIl1ents to estimate variance components. Personal communication. Giesbrecht, G. F. and P. Burrows. 1978. Estimating variance co:nponents using MINQUE or restricted maximum likelihood when the error structure has a nested model. To be published in Communications in Statistics. Graybill, F. A. 1954. On quadratic estimates of variance components. AImals of Mathematical Statistics 25:367-372. Graybill., F. A. and A. W. Wortham. 1956~ A note on unifor.nly best 1mbiased estimators for variance components. Journal of the American Statistical Association 51:266-268. Graybill, F. A. 1969. Introduction to Matrices with Application in Statistics. Wadst~\,)rth Publishing, Inc., Belmont, California. Bartley, E. O. and J. N. K. Rao: 1967. Maximum likelihood estimation for the mixed an'alysis of variance medel. Biometrika 54 :93-108. Harville, D. A. 1977. Maximum. likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association 72:320-340. Henderson, C. R. 1953. Estimation of variance and covariance ponents. Biomet=ics 9:226-252. co~ Khatri, C. G. 1966. A note on a manova model applied to proble!I1S in growth curve. Annals of the Institute of Statistical Mathematics 18:75-86 Johnson, N. L. and S. Katz. 1970. Continuous Univariate Distributions-2. Boughton Mifflin, Inc., Boston. Liu, L. and J. Senturia. 1977. Computation of MDlQUE variance component estimates. Journal of the American Statistical Association 72:867-868. Miller, J. J. 1973. Asymptotic properties and computation of maxi~um likelihood estimates in the mixed model of the analysis of variance. Technical Report No. 12, Depar:nent of Statistics. Stanford Univeristy, Stanford, California. 106 Miller, J. J. 1977. 
Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. Annals of Statistics 5:746-762. Patterson, H. D. and R. Thompson. 1971. Recovery of Interblock inf ormation when block sizes are 1.mequal. Biometrika 58: 543-554. Press, S. J. 1966.. Linear combinations of non-central chi-square variates. Annals of Mathematical Statistics 37:480-487. Rao, C. R. 1970. Estimation of heteroscedastic variances in linear models. Journal of the American Statistical Association 65: 161-172. Rao, C. R. 1971a. MIQUE theory. Estimation of variance and covariance components Journal of Multivariate Analysis 1:257-275. Rao, C. R. 1971b. Minimum variance quadratic 1.mbiased estimation of variance components. JourtIB.l of Multivariate Analysis 1:443-456. Rae, C. R. 1972. Estimation of variance and cOl7ariance components in linear models. Journal of the American Statistical Association 67:112-115. Robbins, H. E. and E. J. G. Pitman. 1949. Application of the method of mixtures to quadratic forms in normal variates. Annals of Mathematical Statistics 20:552-560. Wang, Y. Y. 1967. A comparison of several variance component estimators. Biometrika 54:301-305. Weiss, L. 1971. Asymptotic properties of maximum likelihood estimators in some nonstandard cases. Journal of the American Statistical Association 66:345-350. Weiss, L. 1973. Asymptotic properties of maximum likelihood estimators in some nonstandard cases, II. Journal of the American Statistical Association 68:428-430. 107 7. APPENDIX The following lemmas have been used in the proofs of Lemma 2.5.1 to Lemma 2.5.3 and Theorem 2.5.1. Some lemmas, TNhich were collected from reference materials, are stated in order to facilitate the proofs of the consequent lemmas. In the next four lemmas, the basic results on the eigen values of an tlXll matrix A'A (A), TNhere i-l,2, •• ,n, TNith A (A) l i ~ A (A) 2 ~ ••• > A (A) are stated. n Lemma 7.1. The eigen values of an tlXll symmetric ide!l1t'otent matrix are either O·or 1. Lemma. 7.2. If A is any nonsingular matrix and B is any n:m matrix, Lenma 7.3. If A and B' are mxn matrices, then the nonzero eigen values of A.B and the nonzero aigen values of BA are the same. Lemma 7.4. The eigen values of A and A' are the same, and the matrix AP , TNhere p is a positive intager, has the eigen values Ai(A). The notations TNoich ....ill be used in the re!!lainder of this appendi."'t are the same as those defined in Chapter 2. From the definition (2.13) 2 2 2 and condition 1, if 201 and cr 2 lcrii - cr~il ~~, e 2 2 cr Oi 2 3aOi Nn(ZQ) , thenT < cr ji < -y-, and TNhere j-l,2 and i-O,l, •• ,p. In the following l~, the results on the bounds of the Q~"'timum of the aigen values and the bounds of the maximum of the absolute values e 108 of the eigen values of some particular matrices are paraphrased from M:.U1er (1977). 2 2 2 Given.2.1 and.2.2 E Nn (2.0)' let: m, I: k , where k-0,1,2, ni' Lemma 7.5. 2 a 01 ' and Vi' where i-O,l, •• ,p, be as defined in Chapter~. If eondi- tion 1 is true, then a) A1 (1:~lv1) ~ +' a Oi -1 b) A (1: I: ) ~ 2 1 1 O c) A (1: 1 2 d) max IA t (1: o (Zl-I: O»I ~ -1 to) ~ 2 -1 m t-1,2,.,n e) -1 max IAt(!o (!2- t o»1 < t-l,2,. 'n m min 1-0,1,. ,p f) g) IAt(t -1 o (1: 1-I: Z» 2.:a1,2,.,n I max IA9. (I:~1 O:l-ZO») )'-l,Z,.,;l I< max -< 2m 2 , and (n a ) Oi 1-0,1,. ,p 1 min 2m min i-O,l,.,p P~of. 2 (n 1a 01) . (::t 2 ) a i Oi See Miller (1977). The bound of the maximum of the absolute values of the eigen values of the matrix which is of the form C'BC is given as follows: Lemma 7.6. 
then If 'c is an nxm matrix and B is an nxn symmetric ~atrL~, ~~ IAk(C'BC)1 ~ A1(C'C) max 1;\2(3)1. k-1 , 2 , . ,m 9. -1 , Z, • , n 109 P~oof. sult~ See Miller (1973). , , -1 where t , k-O,1,2, and T are as defined in Qk - T (TtkT) k Chapt er 2, we have that Recall that there e.tists a nonsingular matri:t ~ such that t k • ~Ak' where k-O,1,2. k • I - ~lx(X't;lX)-1x,~t, then Qk • t;l - Define P -1 -L -1 -t -1 t k X(X'tkX) -X'tk~· ~ ?k~ . It is clear that P k is a symmetric idempotent matrix. Applying Lemma 7.6, the bounds of the maximum of the eigen values and the bounds of the maximum of the absolute values of the eigen values of some particular matrices are as follows: Lemma 7.7. Given.z.i and a~ E N'n (~), let i-O,l,.,p, be as defined in Chapter 2. a)' Al (AOQaAO) 5.. 1 b) Al(A~Q1AO) < 2 c) Al(A~Q2AO) < 2 d) ~ and ~* be as defined If condition 1 is , 2m max [A 2. (Ao~Ao)1 ~ . 2 ~ (n.a) 2.-l,2,.,n i • 0 , 1 ,.,p ~ 01 , and tr~e, then ~ 110 P~oof. The proofs are given for (a) and (d); the other cases can be proved analogously. a) Using ~ -t -1 - AO PerO ' where Po is a symmetric idempotent matrix, and Lemma 7.1, we get ~1 d) max By the definition of ~, IA1(AbaAaAb~o)1 - max IA1(A~QO(EO-t2)Q2AaA~Q2(tO-t2)QOA~ 1-1,2,.,n 1-1,2,.,n Applying Lemmas 7. 3, 7. 4, and 7. 6, max IA9.. (A~Ml1\~MO) ;2.-1,2,. ,n I Using (a), (c), and Lemma 7.5, we can conclude that • III max lAg. (A~AAO)I ~ 1 2 Thus, ...- , < ----2m~-2~ min ,.,Il .I. ..,- 0 , 1 ,. J P (uia Oi) min(di,d ) j In the uext lemma, we will show that .;;i;,J;,....j _ -__ is bounded for Il Il i j i, j -0 , 1, . , p • Lemma 7.8. then If d i and n , where i-O,l,.,p, are defined in Chapter 2 i is bounded for all i,j. • Since c P-roof. i and n 2 i have the same order of magnitude, then min(c. ,c ) .;;i.......j __1._ j_ is bounded. n.n. 1. It is know~ that min(d.,d.) < J •. 1.,J 1. J ~in(c.,c.). -. j 1., J 1. is bounded fo-r all i,j. Therefo-re, The bound of the trace of the matri~ which is of the fore AV.BV. 1. J is given as follows: Lemma 7.9. Let A and B be symmetric nxn matrices. and V., where i-O,l,.,p, are as defined in Chapter 2, then for all i,j, 1. max I Ai (A~-\AO) I max 2,-1,2, . ,n Proof. See Miller (1973). IAi (A~BAO) I. 2,-1,2, . ,n 112 Applying Lemma 7.9, the bounds of the traces of matrices which are of the form AViBV j - for different matrices A and B are obtained as ninj follows: Lemma 7.10. 2 Given 0'12 and 0'22 E :::In (~), let A and A.* be as defined 2 above, and let m, t k , Qk' where k-0,1,2, d i , ni' a Oi ' and Vi' where 1-0,1,.,p, be as defined in Chapter 2. If condition 1 is true, then for all i,j, 2m m min(d. ,d ) 1, j 1. j . (2 ) 2 !I1~ 1=0,1, • , P , and n i 0' 01 - 16m Proof. The proofs are given only for (a) and (c); the other cases can be proved analogously. 113 a) Applying Lemmas 7. 7 and 7.9, we have that max IAi (A~MO) I 1-1,2,.,n c) max IA2. (A~MO) I 1-1,2,. ,n Applying Lemmas 7.6 and 7.9, we get n~l1j I tr«Qo-Qo"zQo)V1 QOVj) min(di,d ) j . < i.j IIiIIj 2 2 a Oi cr OJ I max IA1(A;<QO-QOt2QO)AO)! max lA2.(~QoAo)1 t-l,2,.,n 2.-1,2,.,n max IAi(A~QO(!0-t2)QOAO)1 2,-1,2,. ,n. by Lemmas 7.1 and 7.5. 114 Applying temna 7.10, the bounds of the differences of the traces of matrices which are of the form AV iBVj - for different matrices A D..n. 1. J and B are obtained as follows: 222 Given 0'1 and 0'2 e Nn <.20), LI!IIm1a 7.11. above and let til, • let A and A be as defined 2 ~, Qk' where k-O,1,2, d , D. , 0'01' and Vi' where 1 i-O,l,.,p, be as defined in Chapter 2. i If condition 1 is true, th~~ for all i,j, Proof. 
Applying Lemma 7.10, the bounds of the differences of the traces of matrices of the form $\dfrac{AV_iBV_j}{n_in_j}$ for different matrices $A$ and $B$ are obtained as follows:

Lemma 7.11. Given $\underline{\sigma}_1^2$ and $\underline{\sigma}_2^2 \in N_n(\underline{\sigma}_0^2)$, let $\Delta$ and $\Delta^*$ be as defined above, and let $m$, $\Sigma_k$, $Q_k$, where $k = 0, 1, 2$, $d_i$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2. If condition 1 is true, then for all $i, j$,

a) $\dfrac{1}{n_in_j}\,|\mathrm{tr}(Q_2V_iQ_2V_j) - \mathrm{tr}(Q_0V_iQ_0V_j)|$ converges to zero as $n \to \infty$, and

b) $\dfrac{1}{n_in_j}\,|\mathrm{tr}(Q_2V_iQ_2V_j) - \mathrm{tr}(Q_1V_iQ_1V_j)|$ converges to zero as $n \to \infty$.

Proof. a) Using $\Delta = Q_2 - Q_0$ and expanding $\mathrm{tr}(Q_2V_iQ_2V_j)$, it follows that
$$\frac{1}{n_in_j}\,|\mathrm{tr}(Q_2V_iQ_2V_j) - \mathrm{tr}(Q_0V_iQ_0V_j)| \le \frac{1}{n_in_j}\big(|\mathrm{tr}(\Delta V_iQ_0V_j)| + |\mathrm{tr}(Q_0V_i\Delta V_j)| + |\mathrm{tr}(\Delta V_i\Delta V_j)|\big). \tag{7.1}$$
The first two terms on the right-hand side of (7.1) converge to zero as $n \to \infty$ by Lemmas 7.8 and 7.10 and by the definition of $m$. The third term likewise converges to zero as $n \to \infty$.

b) Using $\Delta^* = Q_1 - Q_2$ in the same way,
$$\frac{1}{n_in_j}\,|\mathrm{tr}(Q_1V_iQ_1V_j) - \mathrm{tr}(Q_2V_iQ_2V_j)| \le \frac{1}{n_in_j}\big(|\mathrm{tr}(\Delta^* V_iQ_2V_j)| + |\mathrm{tr}(Q_2V_i\Delta^* V_j)| + |\mathrm{tr}(\Delta^* V_i\Delta^* V_j)|\big). \tag{7.2}$$
The first two terms on the right-hand side of (7.2) converge to zero as $n \to \infty$ by Lemmas 7.8 and 7.10 and the definition of $m$; the third term likewise converges to zero as $n \to \infty$. Then for all $i, j$, $\frac{1}{n_in_j}|\mathrm{tr}(Q_2V_iQ_2V_j) - \mathrm{tr}(Q_1V_iQ_1V_j)|$ converges to zero as $n \to \infty$.

Lemma 7.12. If $c_s$, where $s = 0, 1, \dots, a$, and $n_i$ and $\sigma_{0i}^2$, where $i = 0, 1, \dots, p$, are as defined in Chapter 2, then for all $s$ the ratios $\dfrac{c_s}{n_i^2}$ and $\dfrac{c_s}{n_i^2\sigma_{0i}^2}$ are bounded for $i \notin \bigcup_{t=s+1}^{a} S_t$.

Proof. The proof is given only for the first case; the second case can be proved analogously. For any $i \in S_t$, $c_t$ and $n_i^2$ have the same order of magnitude. Since $c_s = \dim\{L(TU_s : \cdots : TU_a)\}$, the constants $c_0, c_1, \dots, c_a$ are nonincreasing, so $c_s \le c_t$ whenever $t \le s$, that is, whenever $i \notin \bigcup_{t=s+1}^{a} S_t$. For such $i$, $c_s/n_i^2 \le c_t/n_i^2$, which is bounded. Hence the ratios $c_s/n_i^2$, $s = 0, 1, \dots, a$, are bounded.

The bound of a quadratic form of the form $\underline{Y}'T'ABCT\underline{Y}$ is given as follows:

Lemma 7.13. If $T\underline{Y}$ is an $(n-k) \times 1$ vector, $A$ and $C'$ are $(n-k) \times n$ matrices, and $B$ is an $n \times n$ matrix, then
$$|\underline{Y}'T'ABCT\underline{Y}| \le \big[\lambda_1(AA')\,\lambda_1(B'B)\,\lambda_1(C'C)\big]^{1/2}\;\underline{Y}'T'T\underline{Y}.$$

The next step is to find the bounds of quadratic forms of the form $\underline{Y}'T'A A_0^{-t}A_0^{-1}A'T\underline{Y}$ for different matrices $A$. Recall from (2.23) that, under $T\underline{Y} \sim N(\underline{0}, T\Sigma_2T')$, $T\underline{Y}$ can be written as $T\underline{Y} = \sum_{s=0}^{a} DF_s\underline{Z}_s$. This gives
$$\underline{Y}'T'A A_0^{-t}A_0^{-1}A'T\underline{Y} = \sum_{s=0}^{a}\sum_{t=0}^{a} \underline{Z}_s'F_s'D'A A_0^{-t}A_0^{-1}A'DF_t\underline{Z}_t$$
for any matrix $A$. By the Cauchy-Schwarz inequality, each cross-product term is bounded by the geometric mean of the corresponding diagonal terms, and by condition 2 the terms $\underline{Z}_s'\underline{Z}_s$ are controlled. To prove that $\underline{Y}'T'A A_0^{-t}A_0^{-1}A'T\underline{Y}$ is bounded for different matrices $A$, we first find the bound of $\lambda_1(F_s'D'A A_0^{-t}A_0^{-1}A'DF_s)$ and then combine this result with the fact discussed in the preceding paragraph.
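The Cauchy-Schwarz step above, and the proof of Lemma 7.13, reduce to the elementary bound $|\underline{u}'M\underline{v}| \le \|\underline{u}\|\,\|\underline{v}\|\,\lambda_1^{1/2}(M'M)$. A minimal numerical sketch, assuming numpy, with arbitrary random $\underline{u}$, $\underline{v}$, and $M$:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5

# |u' M v| <= ||u|| ||v|| sqrt(lambda_1(M'M)): Cauchy-Schwarz applied to
# u and Mv, followed by the spectral-norm bound ||Mv|| <= sigma_max(M)||v||.
u = rng.standard_normal(d)
v = rng.standard_normal(d)
M = rng.standard_normal((d, d))
lhs = abs(u @ M @ v)
rhs = (np.linalg.norm(u) * np.linalg.norm(v)
       * np.sqrt(np.linalg.eigvalsh(M.T @ M).max()))
print(lhs, rhs, lhs <= rhs)   # the bound holds
```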
In the following lemma, the bounds of the maximum of the eigenvalues of matrices of the form $F_s'D'A A_0^{-t}A_0^{-1}A'DF_s$ are given.

Lemma 7.14. Given $\underline{\sigma}_1^2$ and $\underline{\sigma}_2^2 \in N_n(\underline{\sigma}_0^2)$, let $\Sigma_k$, where $k = 0, 1, 2$, $F_s$, where $s = 0, 1, \dots, a$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2, and let $D$ be a nonsingular matrix such that $T\Sigma_2T' = DD'$. If condition 1 and condition 2 are true, then for all $s$,

a) $\lambda_1(F_s'D'A A_0^{-t}A_0^{-1}A'DF_s)$ is bounded for each matrix $A$ arising in the terms of Table 7.1, and

b) when $A$ contains the factor $V_i$, $\lambda_1(F_s'D'A A_0^{-t}A_0^{-1}A'DF_s)$ is bounded for $i \notin \bigcup_{j=s+1}^{a} S_j$ and is 0 otherwise.

Proof. Recall from Chapter 2 that $H$ is an orthogonal matrix, $D$ is a lower triangular matrix such that $T\Sigma_2T' = DD'$, and the matrix $F = (F_0 : F_1 : \cdots : F_a)$ can be written as $D^{-1}(H_0^* : H_1^* : \cdots : H_a^*)$, where $H_s^*$ depends on $H$.

a) Using $(T\Sigma_2T')^{-1} = D^{-t}D^{-1}$ and writing $a_s^* = D^{-t}F_s$, $s = 0, 1, \dots, a$, we have $V_iT'a_s^* = 0$ for $i \in \bigcup_{j=s+1}^{a} S_j$. Therefore $(\Sigma_1 - \Sigma_2)T'a_s^* = E^*T'a_s^* = E^*T'D^{-t}F_s$, where $E^*$ is the part of $\Sigma_1 - \Sigma_2$ supported on the remaining indices. This, together with the definition of the maximum of the eigenvalues and Lemmas 7.3 and 7.6, yields the required bound, which is finite by Lemmas 7.1 and 7.5.

b) By the same argument as in (a), $V_iT'D^{-t}F_s = 0$ for $i \in \bigcup_{j=s+1}^{a} S_j$. For $i \notin \bigcup_{j=s+1}^{a} S_j$, applying Lemmas 7.2, 7.3, and 7.6, we have
$$\max_{j=1,2,\dots,n} |\lambda_j(A_2'Q_2A_2)| = \max_{j=1,2,\dots,n} |\lambda_j(P_2)| \le 1$$
by Lemma 7.1, and the remaining factors are bounded by Lemma 7.5.

The bounds of the maximum of the eigenvalues of matrices of the form $A_2'BA_2$ for different matrices $B$ are given as follows:

Lemma 7.15. Given $\underline{\sigma}_1^2$ and $\underline{\sigma}_2^2 \in N_n(\underline{\sigma}_0^2)$, let $m$, $\Sigma_k$, $Q_k$, where $k = 0, 1, 2$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2. If condition 1 is true, then for all $i, j$, the quantities $\max_{\ell=1,2,\dots,n} |\lambda_\ell(A_2'BA_2)|$ are bounded for each of the matrices $B$ appearing in the terms of Table 7.1.

Proof. The proof is given only for the first case; the other cases can be proved analogously. Applying Lemma 7.6 to each factor of $B$, and then Lemmas 7.5 and 7.7, gives the stated bound for all $i$.

By assembling the results from Lemma 7.13 to Lemma 7.15, we will show in the following lemma that the difference of the two quadratic forms converges to zero as $n \to \infty$.

Lemma 7.16. Given $\underline{\sigma}_1^2$ and $\underline{\sigma}_2^2 \in N_n(\underline{\sigma}_0^2)$, let $\Delta$ and $\Delta^*$ be as defined above, and let $n_i$ and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2. Under $T\underline{Y} \sim N(\underline{0}, T\Sigma_2T')$, if condition 1 and condition 2 are true, then for all $i, j$,
$$\frac{1}{n_in_j}\,\big|\underline{Y}'Q_1V_iQ_1V_jQ_1\underline{Y} - \underline{Y}'Q_2V_iQ_2V_jQ_2\underline{Y}\big|$$
converges to zero as $n \to \infty$.

Proof. Writing $Q_1 = Q_2 + \Delta^*$ and expanding $\underline{Y}'Q_1V_iQ_1V_jQ_1\underline{Y}$, subtraction of $\underline{Y}'Q_2V_iQ_2V_jQ_2\underline{Y}$ leaves seven correction terms, each containing at least one factor $\Delta^*$ (7.3). Since $Q_k = T'(T\Sigma_kT')^{-1}T$, each of the seven terms on the right-hand side of (7.3) can be written in the form $\frac{1}{n_in_j}|\underline{Y}'T'ABCT\underline{Y}|$ for appropriate matrices $A$, $B$, and $C$, as listed in Table 7.1 (seven terms with the appropriate choices of $A$, $B$, and $C$); in each term, $A$ and $C$ are built from $(T\Sigma_2T')^{-1}$, $T(\Sigma_1 - \Sigma_2)$, and $T$, while $B$ is a product of the matrices $\frac{V_i}{n_i}$, $\frac{V_j}{n_j}$, $Q_1$, $Q_2$, and $\Delta^*$.

The proof that the first term of the form $\frac{1}{n_in_j}|\underline{Y}'T'ABCT\underline{Y}|$ converges to zero is given; the rest can be proved analogously by using Lemma 7.13 to Lemma 7.15. Recall that
$$\underline{Y}'T'ABCT\underline{Y} = \sum_{s=0}^{a}\sum_{t=0}^{a} \underline{Z}_s'F_s'D'ABCDF_t\underline{Z}_t$$
for any conformable matrices $A$, $B$, and $C$. By the Cauchy-Schwarz inequality, each of the $(a+1)^2$ summands is bounded by the geometric mean of the corresponding diagonal terms. By Lemma 7.12, there exists a constant $B_0$ such that, for $s = 0, 1, \dots, a$, $c_s/n_j^2 \le B_0$ for $j \notin \bigcup_{\ell=s+1}^{a} S_\ell$. Applying Lemma 7.14, we obtain for $s = 0, 1, \dots, a$ a bound of the form (7.4); applying Lemma 7.14 again, for $s = 0, 1, \dots, a$ and $j \notin \bigcup_{\ell=s+1}^{a} S_\ell$, a bound of the form (7.5); and by Lemma 7.15, a bound of the form (7.6). Combining (7.4), (7.5), and (7.6), the first term $\frac{1}{n_in_j}|\underline{Y}'T'ABCT\underline{Y}|$ converges to zero as $n \to \infty$. For the remaining terms of the form $\frac{1}{n_in_j}|\underline{Y}'T'ABCT\underline{Y}|$, a similar argument shows that each converges to zero as $n \to \infty$.

Finally, the variance of $B_{ij}(\underline{\sigma}_2^2)$ converges to zero as $n \to \infty$, as the following lemma shows.

Lemma 7.17. If $d_i$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, are as defined in Chapter 2, then
$$\mathrm{Var}\big[B_{ij}(\underline{\sigma}_2^2)\big] \le \frac{32\min(d_i, d_j)}{n_i^2 n_j^2\,\sigma_{0i}^4\,\sigma_{0j}^4}$$
for all $i, j$.

Proof. By using the definition of the variance of a quadratic form, $\mathrm{Var}(\underline{Y}'A\underline{Y}) = 2\,\mathrm{tr}\big((A\Sigma)^2\big)$,
$$\mathrm{Var}\big[B_{ij}(\underline{\sigma}_2^2)\big] = \mathrm{Var}\Big\{\frac{1}{n_in_j}\Big[\tfrac{1}{2}\mathrm{tr}(Q_2V_iQ_2V_j) - \underline{Y}'Q_2V_iQ_2V_jQ_2\underline{Y}\Big]\Big\} = \frac{2}{n_i^2n_j^2}\,\mathrm{tr}\big((Q_2V_iQ_2V_jQ_2\Sigma_2)^2\big).$$
Since $Q_2\Sigma_2Q_2 = Q_2$,
$$\mathrm{tr}\big((Q_2V_iQ_2V_jQ_2\Sigma_2)^2\big) = \mathrm{tr}\big((Q_2V_iQ_2V_j)^2Q_2\Sigma_2\big),$$
and there are at most $\min(d_i, d_j)$ nonzero eigenvalues of $Q_2V_iQ_2V_j$. Applying Lemmas 7.3 and 7.6, together with $\lambda_1(A_2'Q_2A_2) = \lambda_1(P_2) \le 1$, we have
$$\lambda_1(Q_2V_iQ_2V_j) \le \lambda_1(\Sigma_2^{-1}V_i)\,\lambda_1(\Sigma_2^{-1}V_j) \le \frac{2}{\sigma_{0i}^2} \cdot \frac{2}{\sigma_{0j}^2} = \frac{4}{\sigma_{0i}^2\sigma_{0j}^2}$$
by Lemma 7.5 and Lemma 7.7. Hence
$$\mathrm{Var}\big[B_{ij}(\underline{\sigma}_2^2)\big] \le \frac{2}{n_i^2n_j^2}\,\min(d_i, d_j)\Big(\frac{4}{\sigma_{0i}^2\sigma_{0j}^2}\Big)^2 = \frac{32\min(d_i, d_j)}{n_i^2n_j^2\,\sigma_{0i}^4\,\sigma_{0j}^4}.$$
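The variance identity used in the last proof, $\mathrm{Var}(\underline{Y}'A\underline{Y}) = 2\,\mathrm{tr}\big((A\Sigma)^2\big)$ for $\underline{Y} \sim N(\underline{0}, \Sigma)$ and symmetric $A$, can be checked by simulation. A minimal sketch, assuming numpy, with an arbitrary $\Sigma$ and $A$ standing in for the matrices of the lemma:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# A positive definite Sigma and a symmetric A.
G = rng.standard_normal((n, n))
Sigma = G @ G.T + n * np.eye(n)
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# Theoretical variance of Y'AY for Y ~ N(0, Sigma).
AS = A @ Sigma
theory = 2 * np.trace(AS @ AS)

# Monte Carlo estimate from 200,000 draws.
L = np.linalg.cholesky(Sigma)
Y = rng.standard_normal((200_000, n)) @ L.T     # rows are draws of Y
q = np.einsum('bi,ij,bj->b', Y, A, Y)           # Y'AY for each draw
print(theory, q.var())                          # approximately equal
```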