International Statistical Review (2006), 74, 2, 169–185. Printed in Wales by Cambrian Printers. © International Statistical Institute

Statistical Proofs of Some Matrix Theorems

C. Radhakrishna Rao

Statistics Department, Joab Thomas Building, Pennsylvania State University, University Park, PA 16802, USA

The paper is based on a keynote presentation at the 14th International Workshop on Matrices and Statistics, Auckland, New Zealand, March 29–April 1, 2005.

Summary

Books on linear models and multivariate analysis generally include a chapter on matrix algebra, quite rightly so, as matrix results are used in the discussion of statistical methods in these areas. During recent years a number of papers have appeared in which statistical results, derived without the use of matrix theorems, are used to prove matrix results which in turn generate other statistical results. This may have some pedagogical value. It is not, however, suggested that prior knowledge of matrix theory is unnecessary for studying statistics. The intention is to show that a judicious use of statistical and matrix results can provide elegant proofs of problems both in statistics and in matrix algebra, and make the study of both subjects more interesting. Some basic notions of vector spaces and matrices are, however, necessary, and these are outlined in the introduction to this paper.

Key words: Cauchy–Schwarz inequality; Fisher information; Kronecker product; Milne's inequality; Parallel sum of matrices; Schur product.

1 Introduction

Matrix algebra is extensively used in the study of linear models and multivariate analysis in statistics [see for instance Rao (1973), Rao & Toutenburg (1999) and Rao & Rao (1998)]. It is generally stated that a knowledge of matrix theory is a prerequisite for the study of statistics. Recently a number of papers have appeared where statistical results are used to prove some matrix theorems [Refs: Dey, Hande & Tiku (1994), Kagan & Landsman (1997), Rao (2000), Kagan & Smith (2001), Kagan & Rao (2003, 2005), Kagan, Landsman & Rao (2006)]. The object of this paper is to review some of these results and give some new results. It may be noted that the statistical results used to prove matrix theorems are derivable without using matrix theory. We consider only matrices and vectors with real elements.

1.1 Some Basic Notions of Vector Spaces and Matrices

A symmetric $p \times p$ matrix $A$ is said to be nnd (nonnegative definite) if $a'Aa \ge 0$ for every $p$-vector $a$, which is indicated by $A \ge 0$. $A$ is said to be pd (positive definite) if $a'Aa > 0$ for all non-null $a$, which is indicated by $A > 0$. We write $A \ge B$ if $A - B$ is nnd and $A > B$ if $A - B$ is pd.

The notation $C(p \times k)$ is used to denote a matrix $C$ with $p$ rows and $k$ columns. The transpose of $C$, obtained by interchanging columns and rows, is denoted by $C'(k \times p)$. The rank of any matrix $X$, denoted by $\rho(X)$, is the number of independent columns (equivalently, of independent rows) of $X$.

We denote by $\mathcal{S}(X)$ the vector space generated by the column vectors of $X(p \times k)$,

$\mathcal{S}(X) = \{ y : y = Xa,\ a \in R^k \},$

where $R^k$ is the vector space of all $k$-vectors. Given $X(p \times q)$ with $\rho(X) = r$, there exists a matrix $Y(p \times s)$ with $\rho(Y) = p - r$ such that, with respect to a given pd matrix $V$,

$X'VY = 0,$

where $0$ is a $q \times s$ matrix with all elements zero. The vector spaces $\mathcal{S}(X)$ and $\mathcal{S}(Y)$ are said to be mutually $V$-orthogonal. [If $V = I$, the identity matrix, $\mathcal{S}(X)$ and $\mathcal{S}(Y)$ are simply said to be orthogonal.]
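The definitions above are easy to experiment with numerically. The following sketch (a minimal illustration only, assuming the NumPy library; the particular matrices are arbitrary) checks nonnegative definiteness through eigenvalues and constructs a matrix $Y$ whose column space is $V$-orthogonal to $\mathcal{S}(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric nnd matrix: A = L L' is nnd by construction.
L = rng.standard_normal((4, 3))
A = L @ L.T
print(np.all(np.linalg.eigvalsh(A) >= -1e-10))   # True: A is nnd

# V-orthogonal complement: columns of Y span the space orthogonal
# to S(X) in the inner product <u, v> = u' V v.
X = rng.standard_normal((5, 2))
M = rng.standard_normal((5, 5))
V = M @ M.T + 5 * np.eye(5)                      # a pd matrix V

# The null space of X'V gives a Y with X' V Y = 0.
_, s, Vt = np.linalg.svd(X.T @ V)
Y = Vt[np.sum(s > 1e-10):].T                     # here 5 x 3, since rho(X) = 2
print(np.allclose(X.T @ V @ Y, 0))               # True: S(X), S(Y) are V-orthogonal
```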
The space $R^p$ of all $p$-vectors is generated by $(X : Y)$. If $z \in R^p$, then

$z = Xa + Yb, \quad a \in R^q,\ b \in R^s.$

Multiplying by $X'V$, we have

$X'Vz = X'VXa \;\Rightarrow\; a = (X'VX)^- X'Vz \;\Rightarrow\; Xa = X(X'VX)^- X'Vz,$

so that the part of $z$ lying in $\mathcal{S}(X)$ can be expressed as $P_X z$, where

$P_X = X(X'VX)^- X'V \qquad (1.1)$

is called the projection operator on the space $\mathcal{S}(X)$. [The explicit expression (1.1) for a projection operator in terms of a generalized inverse defined in (1.3) was first reported in Rao (1967). See also Rao (1973, p. 47).] Note that we can choose as $Y$, spanning the $V$-orthogonal complement of $\mathcal{S}(X)$, the matrix

$Y = I - X(X'VX)^- X'V. \qquad (1.2)$

Given a matrix $X(m \times n)$, there exists a matrix $X^-(n \times m)$, called the g-inverse (generalized inverse) of $X$, such that

$X X^- X = X. \qquad (1.3)$

$X^-$ need not be unique unless $X$ is a square matrix of full rank. If $C$ is any matrix such that $\mathcal{S}(C') \subset \mathcal{S}(X')$, then $C X^- X = C$, so that $X^- X$ behaves like an identity matrix in such situations. [See Rao (1973), Rao & Mitra (1971, Section 16.5), and Rao & Rao (1998, Chapter 8).]

2 Some Basic Results

Let $x = (x_1, \ldots, x_p)'$ be a $p$-vector rv (random variable) and $y = (y_1, \ldots, y_q)'$ a $q$-vector rv, all with zero mean values. We denote the covariance matrix of $x$ and $y$ by

$C(x, y) = E(xy') = \begin{pmatrix} E(x_1 y_1) & \cdots & E(x_1 y_q) \\ \vdots & & \vdots \\ E(x_p y_1) & \cdots & E(x_p y_q) \end{pmatrix}, \qquad C(y, x) = [C(x, y)]',$

which is a $p \times q$ matrix. The following results hold.

THEOREM 2.1. If $A(p \times p)$ is the covariance matrix of a $p$-vector rv $x$, then:

(a) $A$ is nnd. (2.1)

(b) If $\rho(A) = s < p$, then there exists a matrix $D(p \times (p - s))$ such that $D'x = 0$ a.s. (2.2)

(c) Conversely, if $A(p \times p)$ is an nnd matrix, then there exists an rv $x$ such that $C(x, x) = A$. (2.3)

To prove (2.1), consider the scalar rv $a'x$, $a \in R^p$. Its variance is

$V(a'x) = E(a'xx'a) = a'Aa \ge 0 \;\Rightarrow\; A$ is nnd.

The result (2.2) follows by choosing $D$ as the orthogonal complement of $A$. Then

$C(D'x, D'x) = D'AD = 0 \;\Rightarrow\; D'x = 0$ a.s.

Note: The result (2.3) follows if we assume the existence of an nnd matrix $A^{1/2}$ such that $A^{1/2}A^{1/2} = A$ (or a matrix $B$ such that $A = BB'$). Then we need only choose an rv $z$ with $I$ as its covariance matrix and define $x = A^{1/2}z$ (or $x = Bz$). However, we prove the result without using the matrix result on the existence of $A^{1/2}$ (or the factorization $A = BB'$). Later we show that the factorization $A = BB'$ is a consequence of a statistical result.

We assume that there exists a $(p-1)$-vector rv with any given $(p-1) \times (p-1)$ nnd matrix as its covariance matrix and show that there exists a $p$-vector rv associated with any given $p \times p$ nnd matrix. Since an rv exists when $p = 1$, there exists one when $p = 2$, and so on. Let us consider the nnd matrix $A(p \times p)$ in the partitioned form

$A = \begin{pmatrix} A_{11} & \lambda \\ \lambda' & \alpha \end{pmatrix},$

where $A_{11}$ is $(p-1) \times (p-1)$. Then, since $A$ is nnd, the quadratic form

$a'A_{11}a + 2c\,\lambda'a + c^2\alpha \ge 0 \quad \text{for all } a \text{ and } c. \qquad (2.4)$

If $\alpha = 0$, then $\lambda'a = 0$ for all $a$, which implies that $\lambda = 0$. If $x_1$ is an rv associated with the nnd matrix $A_{11}$, then $(x_1', 0)'$ is an rv associated with $A$. If $\alpha > 0$, then, choosing $c = -\lambda'a/\alpha$, (2.4) reduces to

$a'(A_{11} - \alpha^{-1}\lambda\lambda')a \ge 0 \quad \text{for all } a,$

i.e., $A_{11} - \alpha^{-1}\lambda\lambda'$ is nnd. Let $x_1$ be an rv associated with $A_{11} - \alpha^{-1}\lambda\lambda'$, of order $(p-1) \times (p-1)$, and let $z$ be a univariate rv with variance $\alpha$ and independent of $x_1$. Then it is easy to verify that the rv

$u = \begin{pmatrix} x_1 + \alpha^{-1}\lambda z \\ z \end{pmatrix} \qquad (2.5)$

has $A$ as its covariance matrix.
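The inductive construction (2.5) can be turned into a small simulation. The sketch below (an illustration only, assuming NumPy; normal draws are used for $z$ and for the base case purely for convenience) builds draws of an rv with a prescribed nnd covariance matrix by the recursion of the proof and compares the sample covariance with $A$.

```python
import numpy as np

rng = np.random.default_rng(1)

def rv_with_cov(A, n_draws):
    """Zero-mean draws with covariance A, built by the recursion behind (2.5)."""
    p = A.shape[0]
    if p == 1:
        return np.sqrt(max(A[0, 0], 0.0)) * rng.standard_normal((n_draws, 1))
    A11, lam, alpha = A[:-1, :-1], A[:-1, -1], A[-1, -1]
    if alpha <= 1e-12:                       # degenerate last coordinate: lambda = 0
        return np.hstack([rv_with_cov(A11, n_draws), np.zeros((n_draws, 1))])
    x1 = rv_with_cov(A11 - np.outer(lam, lam) / alpha, n_draws)
    z = np.sqrt(alpha) * rng.standard_normal((n_draws, 1))
    return np.hstack([x1 + z * (lam / alpha), z])

L = rng.standard_normal((4, 2))
A = L @ L.T                                  # an nnd, rank-deficient covariance matrix
X = rv_with_cov(A, 200000)
print(np.allclose(X.T @ X / len(X), A, atol=0.05))   # sample covariance is close to A
```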
THEOREM 2.2. Let $A(p \times p)$ be a symmetric matrix partitioned as

$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$

Then $A$ is nnd if and only if (a) $A_{11}$ is nnd, (b) $A_{22} - A_{21}A_{11}^- A_{12}$ (the Schur complement of $A_{11}$) is nnd, (c) $\mathcal{S}(A_{12}) \subset \mathcal{S}(A_{11})$.

The theorem was proved by Albert (1969) using purely matrix methods. Statistical proofs were given by Dey et al. (1994) and Rao (2000). First we prove sufficiency using Theorem 2.1. (a) implies that there exists a random variable $x$ with mean zero and covariance matrix $A_{11}$. (b) implies that there exists a random variable $z$ with mean zero and covariance matrix $A_{22} - A_{21}A_{11}^- A_{12}$. We can choose $z$ to be independent of $x$. Now consider the random variable $(x', y')'$, where $y = z + A_{21}A_{11}^- x$. Verify, using (c), that $C((x', y')', (x', y')') = A$, which implies that $A$ is nnd.

To prove necessity, observe that there exists a random variable $(x', y')'$ with the covariance matrix $A$. (a) follows since $C(x, x) = A_{11} \ge 0$. If $A_{11}$ is not of full rank, let $a$ be a nonnull vector such that $a'A_{11}a = 0$, which implies that $a'x = 0$ a.s. and $C(a'x, b'y) = a'A_{12}b = 0$ for all $b$, i.e., $a'A_{12} = 0$. Hence $\mathcal{S}(A_{12}) \subset \mathcal{S}(A_{11})$, which proves (c). Now consider the random variable $z = y - A_{21}A_{11}^- x$, whose covariance matrix is

$C(z, z) = A_{22} - A_{21}A_{11}^- A_{12} \ge 0,$

which implies (b).

THEOREM 2.3. Let $A$ and $B$ be nnd matrices partitioned as

$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}.$

If $\rho(A + B) = \rho(A_{11} + B_{11})$, then $\rho(A) = \rho(A_{11})$ and $\rho(B) = \rho(B_{11})$.

The statistical proof of this theorem given by Dey et al. (1994) and Mitra (1973) is reproduced here. Let $(U_1', U_2')' \sim N(0, A)$ and $(V_1', V_2')' \sim N(0, B)$ be independent. Then

$Z = \begin{pmatrix} U_1 + V_1 \\ U_2 + V_2 \end{pmatrix} = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim N(0, A + B).$

Since $\rho(A + B) = \rho(A_{11} + B_{11})$, there exists a matrix $G$ such that

$A_{21} + B_{21} = G(A_{11} + B_{11}), \qquad A_{22} + B_{22} = G(A_{12} + B_{12}),$

which implies

$\mathrm{Cov}(Z_2 - GZ_1) = \mathrm{Cov}(U_2 + V_2 - GU_1 - GV_1) = 0.$

Then $\mathrm{Cov}(U_2 - GU_1) = 0$ and $\mathrm{Cov}(V_2 - GV_1) = 0$, which implies $U_2 = GU_1$ and $V_2 = GV_1$ a.s., so that $\rho(A) = \rho(A_{11})$ and $\rho(B) = \rho(B_{11})$.

THEOREM 2.4. Let $A$ be an nnd matrix. Then it can be factorized as $A = BB'$.

Let $x = (x_1, \ldots, x_p)'$ be an rv with the nnd matrix $A(p \times p)$ as its covariance matrix. Then we can recursively construct random variables

$z_i = x_i - a_{i,i-1}x_{i-1} - \cdots - a_{i,1}x_1, \quad i = 1, \ldots, p, \qquad (2.6)$

such that $z_1, \ldots, z_p$ are mutually uncorrelated. It may be noted that $a_{i,1}x_1 + \cdots + a_{i,i-1}x_{i-1}$ is the regression of $x_i$ on $x_1, \ldots, x_{i-1}$. Denote by $b_i^2$ the variance of $z_i$. We can write the transformation (2.6) as $z = Fx$, where $F$ is a triangular matrix of rank $p$. Then

$\Delta^2 = C(z, z) = F C(x, x) F' = F A F',$

where $\Delta^2$ is a diagonal matrix with diagonal elements $b_1^2, \ldots, b_p^2$, some of which may be zeros, and

$A = F^{-1}\Delta^2 (F^{-1})' = (F^{-1}\Delta)(F^{-1}\Delta)' = BB',$

where $\Delta$ is the diagonal matrix with diagonal elements $b_1, \ldots, b_p$.

3 Cauchy–Schwarz (CS) and Related Inequalities

A simple (statistical) proof of the usual CS inequality is to use the fact that the second raw moment of an rv is nonnegative. Consider a bivariate rv $(x, y)$ such that

$b_{11} = E(x^2), \quad b_{12} = E(xy), \quad b_{22} = E(y^2).$

Then, if $b_{11} > 0$,

$E\left(y - \frac{b_{12}}{b_{11}}x\right)^2 = b_{22} - \frac{b_{12}^2}{b_{11}} \ge 0, \qquad (3.1)$

which, written in the form

$b_{12}^2 \le b_{11}b_{22}, \qquad (3.2)$

is the usual CS inequality. If $b_{11} = 0$, then $x = 0$ a.s. and $b_{12} = 0$, and the result (3.2) is again true. In particular, if $(x, y)$ has a discrete bivariate distribution with $(x_i, y_i)$ having probability $p_i$, $i = 1, \ldots, n$, then

$\left(\sum p_i x_i y_i\right)^2 \le \left(\sum p_i x_i^2\right)\left(\sum p_i y_i^2\right), \qquad (3.3)$

since $b_{12} = \sum p_i x_i y_i$, $b_{11} = \sum p_i x_i^2$ and $b_{22} = \sum p_i y_i^2$.

The matrix version of the CS inequality is obtained by considering rv's $x$ and $y$ such that

$\Sigma_{11} = C(x, x), \quad \Sigma_{12} = C(x, y), \quad \Sigma_{22} = C(y, y),$

and noting that

$C(y - \Sigma_{21}\Sigma_{11}^- x,\ y - \Sigma_{21}\Sigma_{11}^- x) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^- \Sigma_{12} \ge 0. \qquad (3.4)$

A special case of (3.4) is obtained by considering rv's $x = Az$ and $y = Bz$, where $z$ is an $s$-vector rv with uncorrelated components, in which case $\Sigma_{12} = AB'$, $\Sigma_{11} = AA'$, $\Sigma_{22} = BB'$, and (3.4) reduces to

$BB' \ge BA'(AA')^- AB'. \qquad (3.5)$

The statistical proofs of the results in the following lemmas are given in Dey et al. (1994).

LEMMA 3.1. Let $A$ and $B$ be nnd matrices. Then

$a'(A + B)^- a \le \frac{(a'A^- a)(a'B^- a)}{a'A^- a + a'B^- a},$

provided $a \in \mathcal{S}(A + B)$, with equality when either $A$ or $B$ has rank unity.

Let $A = X_1'X_1$ and $B = X_2'X_2$. Consider the linear models

$Y_1 = X_1\beta + \epsilon_1, \quad Y_2 = X_2\beta + \epsilon_2, \quad Y_3 = X_3\beta + \epsilon_3, \quad X_3 = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix},$

with uncorrelated errors of unit variance. The best linear estimates of $a'\beta$ under the three models are

$T_i = a'(X_i'X_i)^- X_i'Y_i, \quad i = 1, 2, 3,$

with

$V(T_1) = a'A^- a = \lambda_1^{-1}, \quad V(T_2) = a'B^- a = \lambda_2^{-1}, \quad V(T_3) = a'(A + B)^- a.$

Now

$V(T_3) \le V\left(\frac{\lambda_1 T_1 + \lambda_2 T_2}{\lambda_1 + \lambda_2}\right) = \frac{1}{\lambda_1 + \lambda_2} = \frac{(a'A^- a)(a'B^- a)}{a'A^- a + a'B^- a},$

since $T_3$ is the best linear estimate of $a'\beta$ under the combined model. If either $A$ or $B$ has rank unity, $T_3$ is the same as $(\lambda_1 T_1 + \lambda_2 T_2)/(\lambda_1 + \lambda_2)$. This completes the proof.
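A quick numerical check of Lemma 3.1 (a sketch assuming NumPy; $A$ and $B$ are random pd matrices, so the g-inverses are ordinary inverses):

```python
import numpy as np

rng = np.random.default_rng(2)
def random_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B = random_pd(4), random_pd(4)
a = rng.standard_normal(4)

lhs = a @ np.linalg.inv(A + B) @ a
qA = a @ np.linalg.inv(A) @ a
qB = a @ np.linalg.inv(B) @ a
rhs = qA * qB / (qA + qB)     # harmonic-type combination of the two quadratic forms
print(lhs <= rhs)             # True: a'(A+B)^{-1}a <= (a'A^{-1}a)(a'B^{-1}a)/(a'A^{-1}a + a'B^{-1}a)
```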
By using the same argument, the result of Lemma 3.1 can be generalized as follows.

LEMMA 3.2. If $a \in \mathcal{S}(A + B)$, then

$a'(A + B)^- a \le a_1'A^- a_1 + a_2'B^- a_2$

for every decomposition $a = a_1 + a_2$ with $a_1 \in \mathcal{S}(A)$ and $a_2 \in \mathcal{S}(B)$.

LEMMA 3.3. Let $A_i$, $i = 1, \ldots, k$, be nnd matrices and $a \in \mathcal{S}(A_i)$ for each $i$. Then

$a'\left(\sum_{i=1}^{k} A_i\right)^- a \le \left[\sum_{i=1}^{k} (a'A_i^- a)^{-1}\right]^{-1}.$

LEMMA 3.4. Let $A$ be an nnd matrix. Then, for any choice of the g-inverse,

$(a'a)^2 \le (a'Aa)(a'A^- a) \quad \text{if } a \in \mathcal{S}(A).$

4 Schur and Kronecker Products of Matrices

The following theorems are well known.

THEOREM 4.1. Let $A = (a_{ij})$ and $B = (b_{ij})$ be nnd matrices of order $p \times p$. Then the Schur product $A \circ B = (a_{ij}b_{ij})$ is nnd.

Proof. Let $X$ be a $p$-vector rv with zero mean such that $C(X, X) = A$, and let $Y$ be an independent $p$-vector rv with zero mean such that $C(Y, Y) = B$. Then it is seen that

$C(X \circ Y, X \circ Y) = A \circ B,$

so that $A \circ B$ is the covariance matrix of an rv, which proves the result.

THEOREM 4.2. Let $A = (a_{ij})$ of order $p \times p$ and $B$ of order $q \times q$ be nnd matrices. Then the Kronecker product $A \otimes B = (a_{ij}B)$ is nnd.

Proof. Let $X$ be a $p$-vector rv with zero mean such that $C(X, X) = A$, and let $Y$ be an independent $q$-vector rv with zero mean such that $C(Y, Y) = B$. Then

$C(X \otimes Y, X \otimes Y) = A \otimes B,$

so that $A \otimes B$ is the covariance matrix of an rv, which proves the desired result.

Note that Theorem 4.1 is a consequence of Theorem 4.2, as $A \circ B$ is a principal submatrix of $A \otimes B$.

5 Results Based on Fisher Information

5.1 Properties of the Information Matrix

Let $x$ be a $p$-vector rv with density function $f(x, \theta)$, where $\theta = (\theta_1, \ldots, \theta_q)'$ is a $q$-vector parameter. Define the Fisher score vector

$J_x = \left(\frac{\partial \log f}{\partial \theta_1}, \ldots, \frac{\partial \log f}{\partial \theta_q}\right)'$

and the information matrix of order $q \times q$

$I_x = E(J_x J_x') = \mathrm{Cov}(J_x, J_x),$

which is nnd.

Monotonicity of information: Let $T(x)$ be a function of $x$. Then, under some regularity conditions (given in Rao (1973, pp. 329–332)),

$I_T \le I_x, \qquad (5.1)$

where $I_T$ is the information matrix derived from the distribution of $T$.

Additivity of information: If $x$ and $y$ are independent random variables whose densities involve the same parameter $\theta$, then

$I_{(x,y)} = I_x + I_y. \qquad (5.2)$

Information when $x$ has a multivariate normal distribution: Let $x \sim N_p(B\theta, A)$, i.e., distributed as $p$-variate normal with mean $B\theta$ and a pd covariance matrix $A$. Then the information in $x$ on $\theta$ is

$I_x = B'A^{-1}B. \qquad (5.3)$

If $y = Gx$, then

$I_y = B'G'(GAG')^{-1}GB, \qquad (5.4)$

and, in general, $I_y \le I_x$. If $G$ is nonsingular, $I_y = I_x$, i.e., there is no loss of information in the transformation $y = Gx$.

Suppose that the parameter $\theta$ is partitioned into two subvectors, $\theta = (\theta_1', \theta_2')'$. The corresponding partition of the score vector is

$J_x = \begin{pmatrix} J_{1x} \\ J_{2x} \end{pmatrix}, \qquad (5.5)$

$I_x = \begin{pmatrix} E(J_{1x}J_{1x}') & E(J_{1x}J_{2x}') \\ E(J_{2x}J_{1x}') & E(J_{2x}J_{2x}') \end{pmatrix} = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}. \qquad (5.6)$

We define the conditional score vector for $\theta_1$, when $\theta_2$ is not of interest and is considered as a nuisance parameter, by

$J_{1 \cdot x} = J_{1x} - I_{12}I_{22}^{-1}J_{2x}, \qquad (5.7)$

which is the residual of $J_{1x}$ after subtracting its regression on $J_{2x}$, in which case $J_{1 \cdot x}$ and $J_{2x}$ are uncorrelated. The corresponding information matrix is

$I_{1 \cdot x} = E(J_{1 \cdot x}J_{1 \cdot x}') = I_{11} - I_{12}I_{22}^{-1}I_{21}. \qquad (5.8)$

As one sees from the definition, $I_{1 \cdot x} \le I_{11}$, the equality sign holding if and only if the components of $J_{1x}$ and $J_{2x}$ are uncorrelated. If

$I_x^{-1} = \begin{pmatrix} I^{11} & I^{12} \\ I^{21} & I^{22} \end{pmatrix},$

then

$I_{1 \cdot x} = I_{11} - I_{12}I_{22}^{-1}I_{21} = (I^{11})^{-1}, \qquad (5.9)$

which can be proved as follows. Consider a $(p+q)$-vector rv

$\begin{pmatrix} x \\ y \end{pmatrix} \sim N_{p+q}\left(\begin{pmatrix} \theta_1 \\ 0 \end{pmatrix}, \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}\right).$

Then, using (5.3) with $B = (I_p, 0)'$, the information on $\theta_1$ in $(x, y)$ is $I^{11}$. Further, $z = x - I_{12}I_{22}^{-1}y$ and $y$ are independent, and $(z, y)$ is obtained from $(x, y)$ by a nonsingular transformation. Hence

$I^{11} = I_{(x,y)} = I_{(z,y)} = I_z + I_y = I_z = (I_{11} - I_{12}I_{22}^{-1}I_{21})^{-1},$

since $C(z, z) = I_{11} - I_{12}I_{22}^{-1}I_{21}$ and the distribution of $y$ does not involve $\theta_1$.

The monotonicity property remains the same for the conditional information. Thus, if $T = T(x)$ is a statistic and $I_T$ is pd, then

$I_{1 \cdot T} \le I_{1 \cdot x}. \qquad (5.10)$
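The identity (5.9), $I_{11} - I_{12}I_{22}^{-1}I_{21} = (I^{11})^{-1}$, is used repeatedly in the sequel; a minimal numerical check (a sketch assuming NumPy, with an arbitrary pd matrix playing the role of $I_x$) is:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 2, 3
M = rng.standard_normal((p + q, p + q))
I_x = M @ M.T + (p + q) * np.eye(p + q)       # an arbitrary pd "information matrix"

I11, I12 = I_x[:p, :p], I_x[:p, p:]
I21, I22 = I_x[p:, :p], I_x[p:, p:]

I_eff = I11 - I12 @ np.linalg.inv(I22) @ I21  # efficient information for theta_1, as in (5.8)
I_inv_11 = np.linalg.inv(I_x)[:p, :p]         # the leading p x p block of I_x^{-1}

print(np.allclose(I_eff, np.linalg.inv(I_inv_11)))        # True: the identity (5.9)
print(np.all(np.linalg.eigvalsh(I11 - I_eff) >= -1e-10))  # True: I_eff <= I11
```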
But the additivity property may not hold. If $x$ and $y$ are independent random variables, we can only say

$I_{1 \cdot (x,y)} \ge I_{1 \cdot x} + I_{1 \cdot y}, \qquad (5.11)$

which is described as superadditivity, unless $x$ and $y$ are identically distributed, in which case equality holds. For a proof, reference may be made to Kagan & Rao (2003).

5.2 Some Applications

THEOREM 5.1. If $A$ and $B$ are pd matrices and $A \ge B$, then

$B^{-1} \ge A^{-1}. \qquad (5.12)$

Proof. Consider independent rv's $x \sim N(\theta, B)$ and $y \sim N(0, A - B + \delta^2 I)$, where $\delta^2 I$ is added to $A - B$ to make $C(y, y)$ positive definite. Note that $\delta$ can be chosen as small as we like. Then $x + y \sim N(\theta, A + \delta^2 I)$, and

$I_x = B^{-1}, \quad I_y = 0, \quad I_{x+y} = (A + \delta^2 I)^{-1}.$

Using the monotonicity (5.1) and additivity (5.2) properties of Fisher information, we have

$(A + \delta^2 I)^{-1} = I_{x+y} \le I_{(x,y)} = I_x + I_y = B^{-1}. \qquad (5.13)$

Since (5.13) is true for all $\delta > 0$, taking the limit as $\delta \to 0$ we have the result (5.12).

The equality given in the following theorem is useful in the distribution theory of quadratic forms (Rao, 1973, p. 77). We give a statistical proof of the equality.

THEOREM 5.2. Let $\Sigma(p \times p)$ be a pd matrix, and let $B(k \times p)$ of rank $k$ and $C(p \times (p-k))$ of rank $p - k$ be matrices such that $BC = 0$. Then

$C(C'\Sigma C)^{-1}C' + \Sigma^{-1}B'(B\Sigma^{-1}B')^{-1}B\Sigma^{-1} = \Sigma^{-1}. \qquad (5.14)$

Proof. Consider the random variable $X \sim N_p(\theta, \Sigma)$ and the transformation

$Y_1 = C'X \sim N_{p-k}(C'\theta, C'\Sigma C), \quad Y_2 = B\Sigma^{-1}X \sim N_k(B\Sigma^{-1}\theta, B\Sigma^{-1}B').$

Here $Y_1$ and $Y_2$ are independent, since $C(Y_1, Y_2) = C'\Sigma\Sigma^{-1}B' = (BC)' = 0$, and the transformation from $X$ to $(Y_1, Y_2)$ is nonsingular. Using the additivity (5.2) of Fisher information, we have

$\Sigma^{-1} = I_X = I_{(Y_1, Y_2)} = I_{Y_1} + I_{Y_2}.$

Note that $I_{Y_1} = C(C'\Sigma C)^{-1}C'$ is the first term and $I_{Y_2} = \Sigma^{-1}B'(B\Sigma^{-1}B')^{-1}B\Sigma^{-1}$ is the second term on the left-hand side of (5.14). See Rao (1973, p. 77) for an algebraic proof.

6 Milne's Inequality

Milne (1925) proved the following result.

THEOREM 6.1. For any constants $\lambda_i > 0$, $i = 1, \ldots, n$, such that $\lambda_1 + \cdots + \lambda_n = 1$, and given $v_i$ with $|v_i| < 1$, $i = 1, \ldots, n$,

$\sum_i \frac{\lambda_i}{1 - v_i^2} \;\le\; \left(\sum_i \frac{\lambda_i}{1 + v_i}\right)\left(\sum_i \frac{\lambda_i}{1 - v_i}\right) \;\le\; \left(\sum_i \frac{\lambda_i}{1 - v_i^2}\right)^2. \qquad (6.1)$

Proof. We give a statistical proof of Milne's inequalities. To prove the first inequality in (6.1), consider the bivariate discrete distribution of $(X, Y)$:

probability    value of $X$         value of $Y$
$\lambda_1$    $(1 + v_1)^{-1}$     $(1 - v_1)^{-1}$
$\vdots$       $\vdots$             $\vdots$
$\lambda_n$    $(1 + v_n)^{-1}$     $(1 - v_n)^{-1}$
(6.2)

We use the well known formula

$C(X, Y) = \tfrac{1}{2} E(X_1 - X_2)(Y_1 - Y_2), \qquad (6.3)$

where $(X_1, Y_1)$ and $(X_2, Y_2)$ are independent samples from the distribution (6.2). A typical term on the right-hand side of (6.3) is

$\left(\frac{1}{1 + v_i} - \frac{1}{1 + v_j}\right)\left(\frac{1}{1 - v_i} - \frac{1}{1 - v_j}\right) = -\frac{(v_i - v_j)^2}{(1 - v_i^2)(1 - v_j^2)} \le 0.$

Hence $C(X, Y) = E(XY) - E(X)E(Y) \le 0$. The first inequality in (6.1) follows by observing that

$E(XY) = \sum_i \frac{\lambda_i}{1 - v_i^2}, \quad E(X) = \sum_i \frac{\lambda_i}{1 + v_i}, \quad E(Y) = \sum_i \frac{\lambda_i}{1 - v_i}.$

COROLLARY 1. For any rv $X$ with $\Pr(|X| < A) = 1$, where $A$ is a positive constant,

$E[(A + X)^{-1}]\,E[(A - X)^{-1}] \ge E[(A^2 - X^2)^{-1}]. \qquad (6.4)$

We use the same argument as in the main theorem, exploiting (6.3) and choosing $(A + X)^{-1}$ and $(A - X)^{-1}$ as the variables $X$ and $Y$.

COROLLARY 2. (Matrix version of Milne's inequality). Let $A, V_1, \ldots, V_n$ be symmetric commuting matrices such that $A > 0$ and $-A < V_i < A$, $i = 1, \ldots, n$. Then

$\left(\sum_i \lambda_i (A + V_i)^{-1}\right)\left(\sum_i \lambda_i (A - V_i)^{-1}\right) \ge \sum_i \lambda_i (A^2 - V_i^2)^{-1}, \qquad (6.5)$

where $\lambda_i \ge 0$ and $\lambda_1 + \cdots + \lambda_n = 1$.

A purely statistical proof of (6.5) is not known. The proof given in Kagan & Rao (2003) depends on the result that an nnd matrix $A$ has an nnd square root $A = B^2$. Is there a statistical proof of this result?
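Both the scalar chain (6.1) and the matrix version (6.5) are easy to check numerically. The sketch below (assuming NumPy; the weights, the $v_i$ and the commuting matrices are arbitrary illustrations, the latter built from a common eigenbasis so that they commute) prints True for both.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
lam = rng.random(n); lam /= lam.sum()        # lambda_i > 0 summing to one
v = rng.uniform(-0.9, 0.9, n)                # |v_i| < 1

middle = np.sum(lam / (1 + v)) * np.sum(lam / (1 - v))
lower = np.sum(lam / (1 - v**2))
print(lower <= middle <= lower**2)           # True: both inequalities in (6.1)

# Matrix version (6.5): commuting A and V_i built from a common eigenbasis.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag(rng.uniform(1.0, 2.0, 4)) @ Q.T                       # A > 0
V = [Q @ np.diag(rng.uniform(-0.9, 0.9, 4)) @ Q.T for _ in range(n)]  # -A < V_i < A
lhs = sum(l * np.linalg.inv(A + Vi) for l, Vi in zip(lam, V)) @ \
      sum(l * np.linalg.inv(A - Vi) for l, Vi in zip(lam, V))
rhs = sum(l * np.linalg.inv(A @ A - Vi @ Vi) for l, Vi in zip(lam, V))
gap = (lhs - rhs + (lhs - rhs).T) / 2        # symmetrize before the eigenvalue test
print(np.all(np.linalg.eigvalsh(gap) >= -1e-10))                      # True: (6.5)
```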
A different statistical proof of Milne's inequality (6.1), given in Kagan & Rao (2003), is as follows.

Proof. Let $x_1, \ldots, x_n$ be independent bivariate normal vectors such that

$E(x_i) = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}, \qquad C(x_i, x_i) = \sigma_i^2\begin{pmatrix} 1 & v_i \\ v_i & 1 \end{pmatrix},$

in which case the elements of $I_{x_i}$ are

$I_{11 x_i} = I_{22 x_i} = \frac{1}{\sigma_i^2(1 - v_i^2)}, \qquad I_{12 x_i} = I_{21 x_i} = -\frac{v_i}{\sigma_i^2(1 - v_i^2)},$

and the information based on $x$, the whole set $(x_1, \ldots, x_n)$, has

$I_{11 x} = I_{22 x} = \sum_i \frac{1}{\sigma_i^2(1 - v_i^2)}, \qquad I_{12 x} = I_{21 x} = -\sum_i \frac{v_i}{\sigma_i^2(1 - v_i^2)},$

so that

$I_{1 \cdot x} = \sum_i \frac{1}{\sigma_i^2(1 - v_i^2)} - \left(\sum_i \frac{v_i}{\sigma_i^2(1 - v_i^2)}\right)^2 \Big/ \sum_i \frac{1}{\sigma_i^2(1 - v_i^2)}. \qquad (6.6)$

Consider now the vector $x^{(1)} = (x_{11}, \ldots, x_{1n})$ of the first components of $x_1, \ldots, x_n$. The distribution of $x^{(1)}$ depends only on $\theta_1$, and

$I_{x^{(1)}} = \sum_i \frac{1}{\sigma_i^2}, \quad \text{and trivially} \quad I_{1 \cdot x^{(1)}} = I_{x^{(1)}}. \qquad (6.7)$

Since $x^{(1)}$ is a statistic, applying the monotonicity result (5.10) to (6.6) and (6.7), we have

$\sum_i \frac{1}{\sigma_i^2} \le \sum_i \frac{1}{\sigma_i^2(1 - v_i^2)} - \left(\sum_i \frac{v_i}{\sigma_i^2(1 - v_i^2)}\right)^2 \Big/ \sum_i \frac{1}{\sigma_i^2(1 - v_i^2)}. \qquad (6.8)$

Multiplying both sides of (6.8) by $\sum_i [\sigma_i^2(1 - v_i^2)]^{-1}$, factorizing the right-hand side as a difference of two squares, and setting $\lambda_i = \sigma_i^{-2}/\sum_j \sigma_j^{-2}$ leads, after a simple calculation, to the first inequality of (6.1).

Note that the classical inequality between the geometric and arithmetic means gives

$\left(\sum_i \frac{\lambda_i}{1 + v_i}\right)\left(\sum_i \frac{\lambda_i}{1 - v_i}\right) \le \left[\frac{1}{2}\sum_i \lambda_i\left(\frac{1}{1 + v_i} + \frac{1}{1 - v_i}\right)\right]^2 = \left(\sum_i \frac{\lambda_i}{1 - v_i^2}\right)^2,$

which establishes the second inequality in (6.1).

7 Convexity of Some Matrix Functions

Let

$A_i = \begin{pmatrix} A_{11i} & A_{12i} \\ A_{21i} & A_{22i} \end{pmatrix}, \quad i = 1, \ldots, n,$

be nnd matrices. Further, let

$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \sum_{i=1}^{n} \lambda_i A_i, \quad \lambda_i \ge 0, \ \sum_{i=1}^{n} \lambda_i = 1,$

and

$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = A_1 \circ A_2 \circ \cdots \circ A_n.$

Since $A$ and $B$ are nnd matrices (the latter by Theorem 4.1), we have

$A_{22} - A_{21}A_{11}^- A_{12} \ge 0, \qquad B_{22} - B_{21}B_{11}^- B_{12} \ge 0, \qquad (7.1)$

$A_{11} - A_{12}A_{22}^- A_{21} \ge 0, \qquad B_{11} - B_{12}B_{22}^- B_{21} \ge 0. \qquad (7.2)$

As a consequence of (7.1) and (7.2), we have the following theorems.

THEOREM 7.1. Let $C_i$, $i = 1, \ldots, n$, be symmetric matrices of the same order. Then

$\sum_{i=1}^{n} \lambda_i C_i^2 \ge \left(\sum_{i=1}^{n} \lambda_i C_i\right)^2, \qquad (7.3)$

$C_1^2 \circ \cdots \circ C_n^2 \ge (C_1 \circ \cdots \circ C_n)^2. \qquad (7.4)$

Proof. Choose $A_{11i} = C_i^2$, $A_{12i} = A_{21i} = C_i$ and $A_{22i} = I$. Then $A_i$ is nnd by the sufficiency part of Theorem 2.2. Hence, using (7.2) for $A$ and for $B$, we have (7.3) and (7.4) respectively.

THEOREM 7.2. Let $C_i$, $i = 1, \ldots, n$, be pd matrices. Then

$\sum_{i=1}^{n} \lambda_i C_i^{-1} \ge \left(\sum_{i=1}^{n} \lambda_i C_i\right)^{-1}, \qquad (7.5)$

$C_1^{-1} \circ \cdots \circ C_n^{-1} \ge (C_1 \circ \cdots \circ C_n)^{-1}. \qquad (7.6)$

Proof. Choose $A_{11i} = C_i^{-1}$, $A_{12i} = A_{21i} = I$ and $A_{22i} = C_i$. Applying the same argument as in Theorem 7.1, the results (7.5) and (7.6) follow.

The inequality (7.5) was proved by Olkin & Pratt (1958). See also Marshall & Olkin (1979, Chapter 16). In particular, if $C_1 > 0$ and $C_2 > 0$, then $(C_1 \circ C_2)^{-1} \le C_1^{-1} \circ C_2^{-1}$.

THEOREM 7.3. Let $C_i(s_i \times s_i)$, $i = 1, \ldots, n$, and $B_i(s_i \times m)$, $i = 1, \ldots, n$, be matrices such that $\sum_{i=1}^{n} B_i'B_i = I_m$. Then

$\sum_{i=1}^{n} B_i'C_i'C_iB_i \ge \left(\sum_{i=1}^{n} B_i'C_iB_i\right)'\left(\sum_{i=1}^{n} B_i'C_iB_i\right). \qquad (7.7)$

In particular, when the $C_i$ are symmetric for all $i$, (7.7) reduces to

$\sum_{i=1}^{n} B_i'C_i^2B_i \ge \left(\sum_{i=1}^{n} B_i'C_iB_i\right)^2. \qquad (7.8)$

Proof. Since the matrix

$\begin{pmatrix} B_i'C_i'C_iB_i & B_i'C_i'B_i \\ B_i'C_iB_i & B_i'B_i \end{pmatrix} = (C_iB_i : B_i)'(C_iB_i : B_i)$

is nnd, summing over $i$ and taking the Schur complement of the second diagonal block (which is $I_m$ by assumption) gives (7.7) and (7.8). Some of these results are proved in Kagan & Landsman (1997) and Kagan & Smith (2001) using the information inequality (5.1).

THEOREM 7.4. Let $A$ and $B$ be pd matrices of order $n \times n$ and let $X$ and $Y$ be $m \times n$ matrices. Then

$(X \circ Y)(A \circ B)^{-1}(X \circ Y)' \le (XA^{-1}X') \circ (YB^{-1}Y'). \qquad (7.9)$

Proof. The result follows by considering the Schur product of the nnd matrices

$\begin{pmatrix} XA^{-1}X' & X \\ X' & A \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} YB^{-1}Y' & Y \\ Y' & B \end{pmatrix}$

and taking the Schur complement.

THEOREM 7.5. (Fiedler, 1961). Let $A$ be a pd matrix. Then

$A \circ A^{-1} \ge I. \qquad (7.10)$

Proof. The Schur product

$\begin{pmatrix} A & I \\ I & A^{-1} \end{pmatrix} \circ \begin{pmatrix} A^{-1} & I \\ I & A \end{pmatrix} = \begin{pmatrix} A \circ A^{-1} & I \\ I & A^{-1} \circ A \end{pmatrix} \qquad (7.11)$

is nnd, since each matrix on the left-hand side of (7.11) is nnd. Multiplying the right-hand side matrix of (7.11) by the vector $b' = (a', -a')$ on the left and by $b$ on the right, where $a$ is an arbitrary vector, we have

$2[a'(A \circ A^{-1})a - a'a] \ge 0 \;\Rightarrow\; A \circ A^{-1} \ge I.$

Purely matrix proofs of some of the results in this section can be found in Ando (1979, 1998).

THEOREM 7.6. Let $A_1, \ldots, A_n$ be $s \times s$ pd matrices partitioned as

$A_i = \begin{pmatrix} A_{11i} & A_{12i} \\ A_{21i} & A_{22i} \end{pmatrix},$

where $A_{11i}$ is of order $p \times p$, and let $\lambda_i \ge 0$ with $\lambda_1 + \cdots + \lambda_n = 1$. Then

$\left[\left(\sum_{i=1}^{n} \lambda_i A_i^{-1}\right)^{-1}\right]_{11} \le \left(\sum_{i=1}^{n} \lambda_i A_{11i}^{-1}\right)^{-1}, \qquad (7.12)$

where $[\,\cdot\,]_{11}$ denotes the leading $p \times p$ block.

Proof. Let $x_i \sim N_s(\mu_i\theta, A_i)$, $i = 1, \ldots, n$, be independent $s$-dimensional normal variables, where $\theta = (\theta_1', \theta_2')'$ is an $s$-dimensional parameter with $\theta_1$ of dimension $p$, and the $\mu_i$ are known constants such that $\mu_1^2 + \cdots + \mu_n^2 = 1$; write $\lambda_i = \mu_i^2$. Set $x = (x_1, \ldots, x_n)$. Then

$I_x = \sum_i \lambda_i A_i^{-1} \quad \text{and, by (5.9)}, \quad I_{1 \cdot x} = \left\{\left[\left(\sum_i \lambda_i A_i^{-1}\right)^{-1}\right]_{11}\right\}^{-1}. \qquad (7.13)$

From (5.9) applied to $I_{x_i} = \lambda_i A_i^{-1}$,

$I_{1 \cdot x_i} = \lambda_i A_{11i}^{-1}, \quad i = 1, \ldots, n. \qquad (7.14)$

Due to superadditivity (5.11), (7.13) and (7.14) lead to (7.12).
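A numerical check of (7.5) and (7.6) (a sketch assuming NumPy; the $C_i$ are random pd matrices and the weights $\lambda_i$ are random):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(5)
def random_pd(k):
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

n, k = 4, 3
C = [random_pd(k) for _ in range(n)]
lam = rng.random(n); lam /= lam.sum()

def is_nnd(M, tol=1e-9):
    return np.all(np.linalg.eigvalsh((M + M.T) / 2) >= -tol)

# (7.5): sum lam_i C_i^{-1}  >=  (sum lam_i C_i)^{-1}
lhs = sum(l * np.linalg.inv(Ci) for l, Ci in zip(lam, C))
rhs = np.linalg.inv(sum(l * Ci for l, Ci in zip(lam, C)))
print(is_nnd(lhs - rhs))       # True

# (7.6): C_1^{-1} o ... o C_n^{-1}  >=  (C_1 o ... o C_n)^{-1}, with o the Schur product
schur = lambda Ms: reduce(lambda X, Y: X * Y, Ms)   # elementwise product
lhs2 = schur([np.linalg.inv(Ci) for Ci in C])
rhs2 = np.linalg.inv(schur(C))
print(is_nnd(lhs2 - rhs2))     # True
```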
8 Inequalities on the Harmonic Mean and Parallel Sum of Matrices

The following theorem is a matrix version of the inequality on harmonic means given in Rao (1996).

THEOREM 8.1. Let $A_1$ and $A_2$ be random pd matrices of the same order, and let $E$ denote expectation. Then

$E[(A_1^{-1} + A_2^{-1})^{-1}] \le [(E A_1)^{-1} + (E A_2)^{-1}]^{-1}. \qquad (8.1)$

Proof. Let $C$ be any random pd matrix of the same order as $A_1$. The matrix

$\begin{pmatrix} A_1 + C & A_1 \\ A_1 & A_1(A_1 + C)^{-1}A_1 \end{pmatrix} \qquad (8.2)$

is nnd by the sufficiency part of Theorem 2.2. Taking the expectation of (8.2) and considering the Schur complement of the resulting nnd matrix, we have

$E[A_1(A_1 + C)^{-1}A_1] \ge E(A_1)[E(A_1) + E(C)]^{-1}E(A_1). \qquad (8.3)$

Now, choosing $C = A_2$ and noting that

$(A_1^{-1} + A_2^{-1})^{-1} = A_1 - A_1(A_1 + A_2)^{-1}A_1,$

the result (8.1) follows by an application of (8.3). The same result was proved by Prakasa Rao (1998) using a different method.

THEOREM 8.2. Let $A_1, \ldots, A_p$ be random pd matrices and let $M_1, \ldots, M_p$ be their expectations. Then

$E(A) \le M, \qquad (8.4)$

where

$A = (A_1^{-1} + \cdots + A_p^{-1})^{-1}, \qquad M = (M_1^{-1} + \cdots + M_p^{-1})^{-1}.$

Proof. The result (8.4) is proved by repeated applications of (8.1).

THEOREM 8.3. (Parallel Sum Inequality). Let $A_{ij}$, $i = 1, \ldots, p$, $j = 1, \ldots, q$, be pd matrices. Denote

$A_{\cdot j} = (A_{1j}^{-1} + \cdots + A_{pj}^{-1})^{-1}, \quad j = 1, \ldots, q, \qquad A = \left[\left(\sum_j A_{1j}\right)^{-1} + \cdots + \left(\sum_j A_{pj}\right)^{-1}\right]^{-1}.$

Then

$\sum_{j=1}^{q} A_{\cdot j} \le A. \qquad (8.5)$

The result follows from (8.4) by considering $(A_{1j}, \ldots, A_{pj})$, $j = 1, \ldots, q$, as the possible values of $(A_1, \ldots, A_p)$, giving uniform probabilities to the index $j$, and taking the expectation over $j$. We can get a weighted version of the inequality (8.5) by attaching weights $\lambda_j$ to the index $j$, $j = 1, \ldots, q$.

THEOREM 8.4. Let $A_1, \ldots, A_n$ be nnd matrices of the same order $p$, and let $B(p \times k)$ be of rank $k$ with $\mathcal{S}(B) \subset \mathcal{S}(A_i)$ for all $i$. Then

$B'\left(\sum_i A_i\right)^- B \le \left[\sum_i (B'A_i^- B)^{-1}\right]^{-1}, \qquad (8.6)$

where $A_i^-$ denotes any g-inverse of $A_i$.

Proof. Let $A_i = X_i'X_i$ and consider the independent linear models

$Y_i = X_i\beta + \epsilon_i, \quad E(\epsilon_i) = 0, \quad C(\epsilon_i, \epsilon_i) = I, \quad i = 1, \ldots, n. \qquad (8.7)$

The best estimate of $B'\beta$ (in the sense of having minimum dispersion error matrix) from the entire model is

$B'\hat{\beta} = B'\left(\sum X_i'X_i\right)^- \sum X_i'Y_i$

with the covariance matrix

$B'\left(\sum A_i\right)^- B. \qquad (8.8)$

The estimate of $B'\beta$ from the $i$-th part of the model, $Y_i = X_i\beta + \epsilon_i$, is

$B'\hat{\beta}_i = B'(X_i'X_i)^- X_i'Y_i \qquad (8.9)$

with the covariance matrix

$B'A_i^- B. \qquad (8.10)$

Combining the estimates (8.9) in an optimum way, we have the estimate of $B'\beta$

$\left[\sum_i (B'A_i^- B)^{-1}\right]^{-1} \sum_i (B'A_i^- B)^{-1} B'\hat{\beta}_i$

with the covariance matrix

$\left[\sum_i (B'A_i^- B)^{-1}\right]^{-1}. \qquad (8.11)$

Obviously, (8.11) is not smaller than (8.8), which yields the desired inequality. The results of Theorems 8.1 and 8.4 are similar to those of Dey et al. (1994).
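Theorem 8.1 applied to the empirical (uniform) distribution over a finite set of matrices is exactly the parallel sum inequality of Theorem 8.3 with $p = 2$; the following sketch (assuming NumPy; random pd matrices) checks it in that form.

```python
import numpy as np

rng = np.random.default_rng(6)
def random_pd(k):
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

# Theorem 8.1 with the empirical (uniform) distribution over N draws, which is
# the parallel sum inequality (8.5) with p = 2 and q = N.
N, k = 50, 3
A1 = [random_pd(k) for _ in range(N)]
A2 = [random_pd(k) for _ in range(N)]
inv = np.linalg.inv

E_harm = sum(inv(inv(a) + inv(b)) for a, b in zip(A1, A2)) / N   # E[(A1^-1 + A2^-1)^-1]
harm_E = inv(inv(sum(A1) / N) + inv(sum(A2) / N))                # [(EA1)^-1 + (EA2)^-1]^-1
gap = harm_E - E_harm
print(np.all(np.linalg.eigvalsh((gap + gap.T) / 2) >= -1e-10))   # True: (8.1)
```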
9 Inequalities on the Elements of an Inverse Matrix

The following theorem is of interest in estimation theory.

THEOREM 9.1. Let

$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad \Sigma^{-1} = \begin{pmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{pmatrix},$

where $\Sigma\,((p+q) \times (p+q))$ is a pd matrix and $\Sigma_{11}$ is of order $p \times p$. Then

$\Sigma^{11} \ge \Sigma_{11}^{-1}. \qquad (9.1)$

Proof. Consider the rv

$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N_{p+q}\left(\begin{pmatrix} \theta \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right),$

where $X$ is $p$-variate and $Y$ is $q$-variate. Using the formulae (5.3) and (5.4) on information matrices, we have

$I_{(X,Y)} = \Sigma^{11} \quad \text{and} \quad I_X = \Sigma_{11}^{-1}, \qquad (9.2)$

and the information inequality (5.1), $I_X \le I_{(X,Y)}$, yields the desired result.

10 Carlen's Superadditivity of Fisher Information

Let $f(u)$ be the probability density of a $(p+q)$-vector variable $u = (u_1, \ldots, u_{p+q})$ and define

$J_f = \begin{pmatrix} J_f^{11} & J_f^{12} \\ J_f^{21} & J_f^{22} \end{pmatrix}, \qquad j_{rs}^f = E\left(\frac{\partial \log f}{\partial u_r}\,\frac{\partial \log f}{\partial u_s}\right),$

where $J_f^{11}$ and $J_f^{22}$ are matrices of orders $p$ and $q$ respectively. Let $g(u_1, \ldots, u_p)$ and $h(u_{p+1}, \ldots, u_{p+q})$ be the marginal probability densities of $(u_1, \ldots, u_p)$ and $(u_{p+1}, \ldots, u_{p+q})$ respectively, with the corresponding $J$ matrices $J_g$ of order $p \times p$ and $J_h$ of order $q \times q$. Then Carlen (1991) proved that

$\mathrm{Trace}(J_f) \ge \mathrm{Trace}(J_g) + \mathrm{Trace}(J_h). \qquad (10.1)$

We prove the result (10.1) using Fisher information inequalities. Let us introduce a $(p+q)$-vector parameter $\theta = (\theta_1', \theta_2')' = (\theta_1, \ldots, \theta_p, \theta_{p+1}, \ldots, \theta_{p+q})'$, where $\theta_1$ is a $p$-vector and $\theta_2$ is a $q$-vector, and consider the probability densities

$f(u_1 + \theta_1, \ldots, u_{p+q} + \theta_{p+q}), \quad g(u_1 + \theta_1, \ldots, u_p + \theta_p), \quad h(u_{p+1} + \theta_{p+1}, \ldots, u_{p+q} + \theta_{p+q}).$

It is seen that, in terms of the information on the location parameters, $I_f(\theta) = J_f$, $I_g(\theta_1) = J_g$ and $I_h(\theta_2) = J_h$, so that

$I_{1 \cdot f} \le J_f^{11}, \qquad I_{2 \cdot f} \le J_f^{22},$

while, since $(u_1, \ldots, u_p)$ is a statistic whose distribution involves only $\theta_1$, the monotonicity property (5.10) shows that

$J_g = I_{1 \cdot g} \le I_{1 \cdot f} \le J_f^{11}, \quad \text{and similarly} \quad J_h \le J_f^{22}.$

Taking traces, we get the desired result (10.1). A similar proof is given by Kagan & Landsman (1997).

11 Inequalities on Principal Submatrices of an nnd Matrix

A $k \times k$ principal submatrix of an nnd matrix $A(n \times n)$ is a matrix $A_k(k \times k)$ obtained by retaining the columns and rows indexed by the sequence $i_1, \ldots, i_k$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$. It is clear that if $A \ge 0$, then $A_k \ge 0$.

THEOREM 11.1. If $A$ is an nnd matrix, then (i) $(A^2)_k \ge (A_k)^2$, and if $A$ is pd, (ii) $(A^{-1})_k \ge (A_k)^{-1}$, (iii) $X'A^{-1}X \ge X_k'A_k^{-1}X_k$, where $X$ has the same number of rows as $A$ and $X_k$ denotes the submatrix of $X$ formed by the rows $i_1, \ldots, i_k$.

Proof. All the results are derived by observing that, for pd $A$ and any $X$ with the same number of rows as $A$,

$\begin{pmatrix} A & X \\ X' & X'A^{-1}X \end{pmatrix} \ge 0, \qquad (11.1)$

that any principal submatrix of an nnd matrix is nnd, so that

$\begin{pmatrix} A_k & X_k \\ X_k' & X'A^{-1}X \end{pmatrix} \ge 0, \qquad (11.2)$

and by taking a Schur complement for special choices of $X$. The Schur complement of $A_k$ in (11.2) gives (iii) directly. For (ii), choose $X = I$ in (11.1) and retain the rows and columns indexed by $i_1, \ldots, i_k$ in both diagonal blocks; the Schur complement then yields $(A^{-1})_k \ge (A_k)^{-1}$. For (i), choose $A^2$ in place of $A$ and $X = A$ in (11.1); retaining the same rows and columns in both diagonal blocks gives

$\begin{pmatrix} (A^2)_k & A_k \\ A_k & I_k \end{pmatrix} \ge 0,$

whose Schur complement yields $(A^2)_k \ge (A_k)^2$.

Different proofs and additional results are given by Zhang (1998).
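A numerical check of Theorem 11.1 (a sketch assuming NumPy; $A$ is a random pd matrix and the retained index set is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                  # a pd matrix
idx = [0, 2, 4]                              # retained rows/columns i_1, i_2, i_3

def sub(M):                                  # principal submatrix M_k
    return M[np.ix_(idx, idx)]

def is_nnd(M, tol=1e-9):
    return np.all(np.linalg.eigvalsh((M + M.T) / 2) >= -tol)

print(is_nnd(sub(A @ A) - sub(A) @ sub(A)))                   # True: (i)
print(is_nnd(sub(np.linalg.inv(A)) - np.linalg.inv(sub(A))))  # True: (ii)
X = rng.standard_normal((5, 2))
Xk = X[idx, :]
print(is_nnd(X.T @ np.linalg.inv(A) @ X
             - Xk.T @ np.linalg.inv(sub(A)) @ Xk))            # True: (iii)
```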
References

Albert, A. (1969). Conditions for positive and nonnegative definiteness in terms of pseudo inverses. SIAM J. Appl. Math., 17, 434–440.
Ando, T. (1979). Concavity of certain maps on positive definite matrices and applications to Hadamard products. Linear Algebra Appl., 26, 203–241.
Ando, T. (1998). Operator-Theoretic Methods for Matrix Inequalities. Hokusei Gakuen University.
Carlen, E.A. (1991). Superadditivity of Fisher's information and logarithmic Sobolev inequalities. J. Funct. Anal., 101, 194–211.
Dey, A., Hande, S. & Tiku, M.L. (1994). Statistical proofs of some matrix results. Linear and Multilinear Algebra, 38, 109–116.
Fiedler, M. (1961). Über eine Ungleichung für positiv definite Matrizen. Math. Nachr., 23, 197–199.
Kagan, A. & Landsman, Z. (1997). Statistical meaning of Carlen's superadditivity of Fisher information. Statistics and Probability Letters, 32, 175–179.
Kagan, A., Landsman, Z. & Rao, C.R. (2006). Sub- and superadditivity à la Carlen of matrices related to Fisher information. To appear in J. Statistical Planning and Inference.
Kagan, A. & Rao, C.R. (2003). Some properties and applications of the efficient Fisher score. J. Statist. Planning and Inference, 116, 343–352.
Kagan, A. & Rao, C.R. (2005). On estimation of a location parameter in presence of an ancillary component. Teoriya Veroyatnostej i ee Primenenie, 50, 171–176.
Kagan, A. & Smith, P.J. (2001). Multivariate normal distributions, Fisher information and matrix inequalities. Internat. J. Math. Education in Sci. Technology, 32, 91–96.
Marshall, A.W. & Olkin, I. (1979). Inequalities: Theory of Majorization and its Applications. New York: Academic Press.
Milne, E.A. (1925). Note on Rosseland's integral for absorption coefficient. Monthly Notices of the Royal Astronomical Society, 85, 979–984.
Mitra, S.K. (1973). Statistical proofs of some propositions on nonnegative definite matrices. Bull. Inter. Statist. Inst., XLV, 206–211.
Olkin, I. & Pratt, J. (1958). A multivariate Tchebycheff's inequality. Ann. Math. Stat., 29, 226–234.
Prakasa Rao, B.L.S. (1998). An inequality for the expectation of harmonic mean of random matrices. Tech. Report, Indian Statistical Institute, Delhi.
Rao, C.R. (1967). Calculus of generalized inverse of matrices: Part 1 – General theory. Sankhyā A, 19, 317–350.
Rao, C.R. (1973). Linear Statistical Inference and its Applications, Second edition. New York: Wiley.
Rao, C.R. (1996). Seven inequalities in estimation theory. Student, 1, 149–158.
Rao, C.R. (2000). Statistical proofs of some matrix inequalities. Linear Algebra Appl., 321, 307–320.
Rao, C.R. & Mitra, S.K. (1971). Generalized Inverse of Matrices and its Applications. New York: John Wiley.
Rao, C.R. & Rao, M.B. (1998). Matrix Algebra and its Applications to Statistics and Econometrics. Singapore: World Scientific.
Rao, C.R. & Toutenburg, H. (1999). Linear Models: Least Squares and Alternatives, Second edition. Springer.
Zhang, Fuzhen (1998). Some inequalities on principal submatrices of positive semidefinite matrices. Tech. Report, Nova Southeastern University, Fort Lauderdale.

[Received June 2005, accepted November 2005]