Supplemental Material for

ASYMPTOTIC SIZE OF KLEIBERGEN'S LM AND CONDITIONAL LR TESTS FOR MOMENT CONDITION MODELS

By Donald W. K. Andrews and Patrik Guggenberger

December 2014

COWLES FOUNDATION DISCUSSION PAPER NO. 1977S
COWLES FOUNDATION FOR RESEARCH IN ECONOMICS
YALE UNIVERSITY
Box 208281
New Haven, Connecticut 06520-8281
http://cowles.econ.yale.edu/

Supplemental Material for Asymptotic Size of Kleibergen's LM and Conditional LR Tests for Moment Condition Models

Donald W. K. Andrews, Cowles Foundation for Research in Economics, Yale University

Patrik Guggenberger, Department of Economics, Pennsylvania State University

First Version: March 25, 2011. Revised: December 19, 2014

Contents

13 Outline
14 Proof of Lemma 8.2
15 Proof of Lemma 8.3
16 Proof of Theorem 8.4
17 Proofs of Sufficiency of Several Conditions for the $\lambda_{p-j}(\cdot)$ Condition in $\mathcal{F}_{0j}$
18 Asymptotic Size of Kleibergen's CLR Test with Jacobian-Variance Weighting and the Proof of Theorem 5.1
19 Proof of Theorem 7.1

13 Outline

We let AG1 abbreviate the main paper "Asymptotic Size of Kleibergen's LM and Conditional LR Tests for Moment Condition Models" and its Appendix. References to Sections with Section numbers less than 13 refer to Sections of AG1. Similarly, all theorems and lemmas with Section numbers less than 13 refer to results in AG1.

This Supplemental Material provides proofs of some of the results stated in AG1. It also provides some complementary results to those in AG1. Sections 14, 15, and 16 prove Lemma 8.2, Lemma 8.3, and Theorem 8.4, respectively, which appear in Section 8 in the Appendix to AG1. Section 17 proves that the conditions in (3.9) and (3.10) are sufficient for the second condition in $\mathcal{F}_{0j}$. Section 18 proves Theorem 5.1. Section 18 also determines the asymptotic size of Kleibergen's (2005) CLR test with Jacobian-variance weighting that employs the Robin and Smith (2000) rank statistic, defined in Section 5, for the general case of $p \ge 1$. When $p = 1$, the asymptotic size of this test is correct. But, when $p \ge 2$, we cannot show that its asymptotic size is necessarily correct (because the sample moments and the rank statistic can be asymptotically dependent under some sequences of distributions). Section 18 provides some simulation results for this test. Section 19 proves Theorem 7.1, which provides results for time series observations.

For notational simplicity, throughout the Supplemental Material, we often suppress the argument $\theta_0$ for various quantities that depend on the null value $\theta_0$. Throughout the Supplemental Material, the quantities $B_F$, $C_F$, and $(\tau_{1F}, ..., \tau_{pF})$ are defined using the general definitions given in (8.6)-(8.8), rather than the definitions given in Section 3, which are a special case of the former definitions.

For notational simplicity, the proofs in Sections 14-16 are for the sequence $\{n\}$, rather than a subsequence $\{w_n : n \ge 1\}$. The same proofs hold for any subsequence $\{w_n : n \ge 1\}$.

The proofs in these three sections use the following simplified notation. Define

$D_n := E_{F_n} G_i$, $\Omega_n := \Omega_{F_n}$, $B_n := B_{F_n}$, $C_n := C_{F_n}$, $B_n = (B_{n,q}, B_{n,p-q})$, $C_n = (C_{n,q}, C_{n,k-q})$, $W_n := W_{F_n}$, $W_{2n} := W_{2F_n}$, $U_n := U_{F_n}$, and $U_{2n} := U_{2F_n}$,  (13.1)

where $q = q_h$ is defined in (8.16), $B_{n,q} \in R^{p \times q}$, $B_{n,p-q} \in R^{p \times (p-q)}$, $C_{n,q} \in R^{k \times q}$, and $C_{n,k-q} \in R^{k \times (k-q)}$. Define

$\tau_{n,q} := \mathrm{Diag}\{\tau_{1F_n}, ..., \tau_{qF_n}\} \in R^{q \times q}$, $\tau_{n,p-q} := \mathrm{Diag}\{\tau_{(q+1)F_n}, ..., \tau_{pF_n}\} \in R^{(p-q) \times (p-q)}$, and

$\Upsilon_n := \begin{pmatrix} \tau_{n,q} & 0^{q \times (p-q)} \\ 0^{(p-q) \times q} & \tau_{n,p-q} \\ 0^{(k-p) \times q} & 0^{(k-p) \times (p-q)} \end{pmatrix} \in R^{k \times p}$.  (13.2)

Note that $\Upsilon_n$ is the $k \times p$ diagonal matrix of singular values of $W_n D_n U_n$; see (8.8).

14 Proof of Lemma 8.2

Lemma 8.2 of AG1. Under all sequences $\{\lambda_{n,h} : n \ge 1\}$,

$n^{1/2}\begin{pmatrix} \hat g_n \\ \mathrm{vec}(\hat D_n - E_{F_n} G_i) \end{pmatrix} \to_d \begin{pmatrix} \bar g_h \\ \mathrm{vec}(\bar D_h) \end{pmatrix} \sim N\left(0^{(p+1)k}, \begin{pmatrix} h_{5,g} & 0^{k \times pk} \\ 0^{pk \times k} & \Phi_h^{\mathrm{vec}(G_i)} \end{pmatrix}\right)$.

Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$, the same result holds with $n$ replaced with $w_n$.

Proof of Lemma 8.2. We have

$n^{1/2}\mathrm{vec}(\hat D_n - D_n) = n^{1/2}\begin{pmatrix} \hat D_{1n} - D_{n1} \\ \vdots \\ \hat D_{pn} - D_{np} \end{pmatrix} = n^{-1/2}\sum_{i=1}^n\left[\mathrm{vec}(G_i - D_n) - \begin{pmatrix} E_{F_n} G_{\ell 1} g_\ell' \\ \vdots \\ E_{F_n} G_{\ell p} g_\ell' \end{pmatrix}\Omega_{F_n}^{-1} g_i\right] + o_p(1)$,  (14.1)

where $\hat D_{jn}$ denotes the $j$th column of $\hat D_n$ and the second equality holds by (i) the weak law of large numbers (WLLN) applied to $n^{-1}\sum_{\ell=1}^n G_{\ell j} g_\ell'$ for $j = 1, ..., p$, $n^{-1}\sum_{\ell=1}^n \mathrm{vec}(G_\ell)$, and $n^{-1}\sum_{\ell=1}^n g_\ell g_\ell'$, (ii) $E_{F_n} g_i = 0^k$, (iii) $h_{5,g} = \lim \Omega_{F_n}$ is pd, and (iv) the CLT, which implies that $n^{1/2}\hat g_n = O_p(1)$.

Using (14.1), the convergence result of Lemma 8.2 holds (with $n$ in place of $w_n$) by the Lyapunov triangular-array multivariate CLT using the moment restrictions in $\mathcal{F}$. The limiting covariance matrix between $n^{1/2}\mathrm{vec}(\hat D_n - D_n)$ and $n^{1/2}\hat g_n$ in Lemma 8.2 is a zero matrix because

$E_{F_n}\left[G_{ij} - D_{nj} - (E_{F_n} G_{\ell j} g_\ell')\Omega_{F_n}^{-1} g_i\right]g_i' = 0^{k \times k}$ for $j = 1, ..., p$,  (14.2)

where $D_{nj}$ denotes the $j$th column of $D_n$, using $E_{F_n} g_i = 0^k$. By the CLT, the limiting variance matrix of $n^{1/2}\mathrm{vec}(\hat D_n - D_n)$ in Lemma 8.2 equals

$\lim \mathrm{Var}_{F_n}\left(\mathrm{vec}(G_i) - (E_{F_n}\mathrm{vec}(G_\ell)g_\ell')\Omega_{F_n}^{-1} g_i\right) = \Phi_h^{\mathrm{vec}(G_i)}$,  (14.3)

see (8.15), and the limit exists because (i) the components of $\Phi_{F_n}^{\mathrm{vec}(G_i)}$ are comprised of submatrices of $\lambda_{4,F_n}$ and $\lambda_{5,F_n}$ and (ii) $\lambda_{s,F_n} \to h_s$ for $s = 4, 5$. By the CLT, the limiting variance matrix of $n^{1/2}\hat g_n$ equals $\lim E_{F_n} g_i g_i' = h_{5,g}$. $\square$

15 Proof of Lemma 8.3

Lemma 8.3 of AG1. Suppose Assumption WU holds for some non-empty parameter space $\Lambda_2$. Under all sequences $\{\lambda_{n,h} : n \ge 1\}$ with $\lambda_{n,h} \in \Lambda_2$,

$n^{1/2}\left(\hat g_n,\ \hat D_n - E_{F_n} G_i,\ W_{F_n}\hat D_n U_{F_n} T_n\right) \to_d \left(\bar g_h,\ \bar D_h,\ \bar\Delta_h\right)$,

where (a) $(\bar g_h, \bar D_h)$ are defined in Lemma 8.2, (b) $\bar\Delta_h$ is the nonrandom function of $h$ and $\bar D_h$ defined in (8.17), (c) $\bar D_h$ and $\bar g_h$ are independent, (d) if Assumption WU holds with $W_F = \Omega_F^{-1/2}$ and $U_F = I_p$, then $\bar\Delta_h$ has full column rank $p$ with probability one, and (e) under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_2$, the convergence result above and the results of parts (a)-(d) hold with $n$ replaced with $w_n$.

The proof of part (d) of Lemma 8.3 uses the following two lemmas and corollary.
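Before turning to those results, note that the zero-covariance property in (14.2) — the moment-orthogonalized Jacobian column $G_{ij} - (E_{F}G_{\ell j}g_\ell')\Omega_F^{-1}g_i$ is uncorrelated with $g_i$ — is easy to check numerically. The following is a minimal Monte Carlo sketch, not from AG1: it assumes a jointly normal $(g_i, G_{ij})$ with a generic (hypothetical) covariance matrix and verifies that the sample cross-moment of the orthogonalized column with $g_i$ is approximately zero.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 400_000
# Joint normal (g_i, G_ij) in R^{2k}, mean zero, generic pd covariance
# (hypothetical numbers; any pd covariance works).
A = rng.normal(size=(2 * k, 2 * k))
S = A @ A.T + 0.1 * np.eye(2 * k)
Z = rng.normal(size=(n, 2 * k)) @ np.linalg.cholesky(S).T
g, Gj = Z[:, :k], Z[:, k:]
Omega = S[:k, :k]          # Var(g_i)
Gamma = S[k:, :k]          # population cross-moment E[G_ij g_i']
# Orthogonalized Jacobian column, as in (14.1)-(14.2): G_ij - Gamma Omega^{-1} g_i
resid = Gj - g @ np.linalg.solve(Omega, Gamma.T)
cross = resid.T @ g / n    # sample analogue of E[(G_ij - Gamma Omega^{-1} g_i) g_i']
assert np.abs(cross).max() < 0.1   # approximately 0^{k x k}, up to Monte Carlo error
```

The population identity is exact ($\Gamma - \Gamma\Omega^{-1}\Omega = 0$); the tolerance only absorbs simulation noise of order $n^{-1/2}$.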
Lemma 15.1. Suppose $\Delta \in R^{k \times p}$ has a multivariate normal distribution (with possibly singular variance matrix) and the variance matrix of $\Delta\mu \in R^k$ has rank at least $p$ for all nonrandom vectors $\mu \in R^p$ with $\|\mu\| = 1$. Then, $P(\Delta$ has full column rank $p) = 1$.

Comments: (i) Let Condition (*) denote the condition of the lemma on the variance of $\Delta\mu$. A sufficient condition for Condition (*) is that $\mathrm{vec}(\Delta)$ has a pd variance matrix (because $\Delta\mu = (\mu' \otimes I_k)\mathrm{vec}(\Delta)$). The converse is not true. This is proved in Comment (iii) below.

(ii) A weaker sufficient condition for Condition (*) is that the variance matrix of $\Delta\mu \in R^k$ has rank $k$ for all constant vectors $\mu \in R^p$ with $\|\mu\| = 1$. The latter condition holds iff $\mathrm{Var}(\nu'\Delta\mu) > 0$ for all $\varepsilon \in R^{pk}$ of the form $\varepsilon = \mu \otimes \nu$ for some $\mu \in R^p$ and $\nu \in R^k$ with $\|\mu\| = 1$ and $\|\nu\| = 1$ (because $(\mu' \otimes \nu')\mathrm{vec}(\Delta) = \nu'\Delta\mu$, so $\mathrm{Var}(\varepsilon'\mathrm{vec}(\Delta)) = \mathrm{Var}(\nu'\Delta\mu)$). In contrast, $\mathrm{vec}(\Delta)$ has a pd variance matrix iff $\mathrm{Var}(\varepsilon'\mathrm{vec}(\Delta)) > 0$ for all $\varepsilon \in R^{pk}$ with $\|\varepsilon\| = 1$.

(iii) For example, the following matrix $\Delta$ satisfies the sufficient condition given in Comment (ii) for Condition (*) (and hence Condition (*) holds), but not the sufficient condition given in Comment (i). Let $Z_j$ for $j = 1, 2, 3$ be independent standard normal random variables. Define

$\Delta = \begin{pmatrix} Z_1 & Z_2 \\ Z_3 & Z_1 \end{pmatrix}$.  (15.1)

Obviously, $\mathrm{Var}(\mathrm{vec}(\Delta))$ is not pd. On the other hand, writing $\mu = (\mu_1, \mu_2)'$ and $\nu = (\nu_1, \nu_2)'$, we have

$\mathrm{Var}(\nu'\Delta\mu) = \mathrm{Var}(\nu_1[Z_1\mu_1 + Z_2\mu_2] + \nu_2[Z_3\mu_1 + Z_1\mu_2]) = \mathrm{Var}((\nu_1\mu_1 + \nu_2\mu_2)Z_1 + \nu_1\mu_2 Z_2 + \nu_2\mu_1 Z_3) = (\nu_1\mu_1 + \nu_2\mu_2)^2 + (\nu_1\mu_2)^2 + (\nu_2\mu_1)^2$.  (15.2)

Now, $\nu_1\mu_2 = 0$ implies $\nu_1 = 0$ or $\mu_2 = 0$, $\nu_2\mu_1 = 0$ implies $\nu_2 = 0$ or $\mu_1 = 0$, etc. So, the two cases where $(\nu_1\mu_2)^2 = (\nu_2\mu_1)^2 = 0$ with $\|\mu\| = \|\nu\| = 1$ are: $(\nu_1, \mu_1) = (0, 0)$, which implies $\nu_2\mu_2 \ne 0$ and $(\nu_1\mu_1 + \nu_2\mu_2)^2 > 0$; and $(\nu_2, \mu_2) = (0, 0)$, which implies $\nu_1\mu_1 \ne 0$ and $(\nu_1\mu_1 + \nu_2\mu_2)^2 > 0$. Hence, $\mathrm{Var}(\nu'\Delta\mu) > 0$ for all $\mu, \nu \in R^2$ with $\|\mu\| = \|\nu\| = 1$, and the sufficient condition given in Comment (ii) for Condition (*) holds.

(iv) Condition (*) allows for redundant rows in $\Delta$, which correspond to redundant moment conditions in the application of Lemma 15.1. Suppose a matrix $\Delta$ satisfies Condition (*). Now suppose one adds one or more rows to $\Delta$, which consist of one or more of the existing rows of $\Delta$ or some linear combinations of them. (In fact, the added rows can be arbitrary provided the resulting matrix has a multivariate normal distribution.) Call the new matrix $\Delta^+$. The matrix $\Delta^+$ also satisfies Condition (*) (because the rank of the variance of $\Delta^+\mu$ is at least as large as the rank of the variance of $\Delta\mu$, which is $p$).

Corollary 15.2. Suppose $\Delta_{q^*} \in R^{k \times q^*}$ is a nonrandom matrix with full column rank $q^* \le p$, $\Delta_{p-q^*} \in R^{k \times (p-q^*)}$ has a multivariate normal distribution (with possibly singular variance matrix), and $k \ge p$. Let $M \in R^{k \times k}$ be a nonsingular matrix such that $M\Delta_{q^*} = (e_1, ..., e_{q^*})$, where $e_l$ denotes the $l$-th coordinate vector in $R^k$. Decompose $M = (M_1', M_2')'$ with $M_1 \in R^{q^* \times k}$ and $M_2 \in R^{(k-q^*) \times k}$. Suppose the variance matrix of $M_2\Delta_{p-q^*}\mu_2 \in R^{k-q^*}$ has rank at least $p - q^*$ for all nonrandom vectors $\mu_2 \in R^{p-q^*}$ with $\|\mu_2\| = 1$. Then, for $\Delta = (\Delta_{q^*}, \Delta_{p-q^*}) \in R^{k \times p}$, we have $P(\Delta$ has full column rank $p) = 1$.

Comment: Corollary 15.2 follows from Lemma 15.1 by the following argument. We have

$M\Delta = (M\Delta_{q^*}, M\Delta_{p-q^*}) = \begin{pmatrix} I_{q^*} & M_1\Delta_{p-q^*} \\ 0^{(k-q^*) \times q^*} & M_2\Delta_{p-q^*} \end{pmatrix}$.  (15.3)

The matrix $\Delta$ has full column rank $p$ iff $M\Delta$ has full column rank $p$ iff $M_2\Delta_{p-q^*}$ has full column rank $p - q^*$. The Corollary now follows from Lemma 15.1 applied with $M_2\Delta_{p-q^*}$, $k - q^*$, $p - q^*$, and $\mu_2$ in place of $\Delta$, $k$, $p$, and $\mu$, respectively.

The following lemma is a special case of Cauchy's interlacing eigenvalues result, e.g., see Hwang (2004). As above, for a symmetric matrix $A$, let $\lambda_1(A) \ge \lambda_2(A) \ge \cdots$ denote the eigenvalues of $A$. Let $A_{-r}$ denote a principal submatrix of $A$ of order $k - r$. That is, $A_{-r}$ denotes $A$ with some choice of $r$ rows and the same $r$ columns deleted.

Proposition 15.3. Let $A$ be a symmetric $k \times k$ matrix. Then, $\lambda_k(A) \le \lambda_{k-1}(A_{-1}) \le \lambda_{k-1}(A) \le \cdots \le \lambda_2(A) \le \lambda_1(A_{-1}) \le \lambda_1(A)$.

The following is a straightforward corollary of Proposition 15.3.

Corollary 15.4. Let $A$ be a symmetric $k \times k$ matrix and let $r \in \{1, ..., k-1\}$. Then, (a) $\lambda_m(A) \ge \lambda_m(A_{-r})$ for $m = 1, ..., k - r$ and (b) $\lambda_m(A) \le \lambda_{m-r}(A_{-r})$ for $m = r + 1, ..., k$.

Proof of Lemma 8.3. First, we prove the convergence result in Lemma 8.3. The singular value decomposition of $W_n D_n U_n$ is

$W_n D_n U_n = C_n \Upsilon_n B_n'$,  (15.4)

because $B_n$ is a matrix of eigenvectors of $U_n' D_n' W_n' W_n D_n U_n$, $C_n$ is a matrix of eigenvectors of $W_n D_n U_n U_n' D_n' W_n'$, and $\Upsilon_n$ is the $k \times p$ matrix with the singular values $\{\tau_{jF_n} : j \le p\}$ of $W_n D_n U_n$ on the diagonal (ordered so that $\tau_{jF_n}$ is nonincreasing in $j$). Using (15.4), we get

$W_n D_n U_n B_{n,q}\tau_{n,q}^{-1} = C_n\Upsilon_n B_n' B_{n,q}\tau_{n,q}^{-1} = C_n\Upsilon_n\begin{pmatrix} I_q \\ 0^{(p-q) \times q} \end{pmatrix}\tau_{n,q}^{-1} = C_n\begin{pmatrix} I_q \\ 0^{(k-q) \times q} \end{pmatrix} = C_{n,q}$,  (15.5)

where the second equality uses $B_n' B_n = I_p$. Hence, we obtain

$W_n\hat D_n U_n B_{n,q}\tau_{n,q}^{-1} = W_n D_n U_n B_{n,q}\tau_{n,q}^{-1} + W_n n^{1/2}(\hat D_n - D_n)U_n B_{n,q}(n^{1/2}\tau_{n,q})^{-1} = C_{n,q} + o_p(1) \to_p h_{3,q} = \bar\Delta_{h,q}$,  (15.6)

where the second equality uses $n^{1/2}\tau_{jF_n} \to \infty$ for all $j \le q$ (by the definition of $q$ in (8.16)), $W_n = O(1)$ (by the condition $\|W_F\| \le M_1 < \infty$ $\forall F \in \mathcal{F}_{WU}$, see (8.5)), $n^{1/2}(\hat D_n - D_n) = O_p(1)$ (by Lemma 8.2), $U_n = O(1)$ (by the condition $\|U_F\| \le M_1 < \infty$ $\forall F \in \mathcal{F}_{WU}$, see (8.5)), and $B_{n,q} \to h_{2,q}$ with $\|\mathrm{vec}(h_{2,q})\| < \infty$ (by (8.12) using the definitions in (8.17) and (13.1)). The convergence in (15.6) holds by (8.12), (8.17), and (13.1), and the last equality in (15.6) holds by the definition of $\bar\Delta_{h,q}$ in (8.17).

Using (15.4) again, we have

$n^{1/2}W_n D_n U_n B_{n,p-q} = n^{1/2}C_n\Upsilon_n B_n' B_{n,p-q} = n^{1/2}C_n\Upsilon_n\begin{pmatrix} 0^{q \times (p-q)} \\ I_{p-q} \end{pmatrix} = C_n\begin{pmatrix} 0^{q \times (p-q)} \\ n^{1/2}\tau_{n,p-q} \\ 0^{(k-p) \times (p-q)} \end{pmatrix} \to h_3\begin{pmatrix} 0^{q \times (p-q)} \\ \mathrm{Diag}\{h_{1,q+1}, ..., h_{1,p}\} \\ 0^{(k-p) \times (p-q)} \end{pmatrix} = h_3 h_{1,p-q}$,  (15.7)

where the second equality uses $B_n' B_n = I_p$, the convergence holds by (8.12) using the definitions in (8.17) and (13.2), and the last equality holds by the definition of $h_{1,p-q}$ in (8.17).
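The interlacing inequalities in Proposition 15.3 and Corollary 15.4 are generic facts about symmetric matrices and can be verified numerically. A minimal sketch (not from AG1; the matrix and the choice of deleted rows are arbitrary): deleting $r$ rows and the same $r$ columns moves each ordered eigenvalue by at most $r$ positions.

```python
import numpy as np

rng = np.random.default_rng(1)
k, r = 6, 2
A = rng.normal(size=(k, k))
A = (A + A.T) / 2                     # symmetric k x k matrix
keep = np.arange(k - r)               # delete the last r rows/columns (any choice works)
Ar = A[np.ix_(keep, keep)]            # principal submatrix A_{-r} of order k - r
lam = np.sort(np.linalg.eigvalsh(A))[::-1]    # lambda_1(A) >= ... >= lambda_k(A)
lam_r = np.sort(np.linalg.eigvalsh(Ar))[::-1]
# Corollary 15.4(a): lambda_m(A) >= lambda_m(A_{-r}) for m = 1, ..., k - r
for m in range(k - r):
    assert lam[m] >= lam_r[m] - 1e-10
# Corollary 15.4(b): lambda_m(A) <= lambda_{m-r}(A_{-r}) for m = r + 1, ..., k
for m in range(r, k):                 # 0-based index m corresponds to m + 1 above
    assert lam_r[m - r] >= lam[m] - 1e-10
```

Iterating the r = 1 case of Proposition 15.3 r times yields exactly these bounds, which is how Corollary 15.4 is used in (15.15)-(15.16) below.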
Using (15.7) and Lemma 8.2, we get

$n^{1/2}W_n\hat D_n U_n B_{n,p-q} = n^{1/2}W_n D_n U_n B_{n,p-q} + W_n n^{1/2}(\hat D_n - D_n)U_n B_{n,p-q} \to_d h_3 h_{1,p-q} + h_{71}\bar D_h h_{81}h_{2,p-q} = \bar\Delta_{h,p-q}$,  (15.8)

where $B_{n,p-q} \to h_{2,p-q}$, $W_n \to h_{71}$, and $U_n \to h_{81}$ by (8.3), (8.12), (8.17), and Assumption WU using the definitions in (13.1), and the last equality holds by the definition of $\bar\Delta_{h,p-q}$ in (8.17).

Equations (15.6) and (15.8) combine to prove

$n^{1/2}W_n\hat D_n U_n T_n = n^{1/2}W_n\hat D_n U_n B_n S_n = (W_n\hat D_n U_n B_{n,q}\tau_{n,q}^{-1},\ n^{1/2}W_n\hat D_n U_n B_{n,p-q}) \to_d (\bar\Delta_{h,q}, \bar\Delta_{h,p-q}) = \bar\Delta_h$  (15.9)

using the definition of $S_n$ in (8.19). The convergence is joint with that in Lemma 8.2 because it just relies on the convergence of $n^{1/2}(\hat D_n - D_n)$, which is part of the former. This establishes the convergence result of Lemma 8.3.

Properties (a) and (b) in Lemma 8.3 hold by definition. Property (c) in Lemma 8.3 holds by Lemma 8.2 and property (b) in Lemma 8.3.

Now, we prove property (d). We have

$h_{2,p-q}'h_{2,p-q} = \lim B_{n,p-q}'B_{n,p-q} = I_{p-q}$ and $h_{3,q}'h_{3,q} = \lim C_{n,q}'C_{n,q} = I_q$  (15.10)

because $B_n$ and $C_n$ are orthogonal matrices by (8.6) and (8.7). Hence, if $q = p$, then $\bar\Delta_h = \bar\Delta_{h,q} = h_{3,q}$, $\bar\Delta_h'\bar\Delta_h = I_p$, and $\bar\Delta_h$ has full column rank $p$, which gives the desired result.

Hence, it suffices to consider the case where $q < p$. We prove part (d) for this case by applying Corollary 15.2 with $q^* = q$, $\Delta_{q^*} = \bar\Delta_{h,q}$ $(= h_{3,q})$, $\Delta_{p-q^*} = \bar\Delta_{h,p-q}$, $M = h_3'$, $M_1 = h_{3,q}'$, and $M_2 = h_{3,k-q}'$. The condition in Corollary 15.2 that "$M\Delta_{q^*} = (e_1, ..., e_{q^*})$" holds in this case because $h_3'h_{3,q} = (e_1, ..., e_q)$. The condition in Corollary 15.2 that "the variance matrix of $M_2\Delta_{p-q^*}\mu_2 \in R^{k-q^*}$ has rank at least $p - q^*$ for all nonrandom vectors $\mu_2 \in R^{p-q^*}$ with $\|\mu_2\| = 1$" becomes in this case "the variance matrix of $h_{3,k-q}'\bar\Delta_{h,p-q}\mu_2$ has rank at least $p - q$ for all nonrandom vectors $\mu_2 \in R^{p-q}$ with $\|\mu_2\| = 1$." It remains to establish the latter property, which is equivalent to

$\lambda_{p-q}(\mathrm{Var}(h_{3,k-q}'\bar\Delta_{h,p-q}\mu_2)) > 0$ $\forall\mu_2 \in R^{p-q}$ with $\|\mu_2\| = 1$.  (15.11)

We have

$\mathrm{Var}(h_{3,k-q}'\bar\Delta_{h,p-q}\mu_2) = \mathrm{Var}(h_{3,k-q}'h_{5,g}^{-1/2}\bar D_h h_{2,p-q}\mu_2) = ((h_{2,p-q}\mu_2)' \otimes (h_{3,k-q}'h_{5,g}^{-1/2}))\mathrm{Var}(\mathrm{vec}(\bar D_h))((h_{2,p-q}\mu_2) \otimes (h_{3,k-q}'h_{5,g}^{-1/2})') = ((h_{2,p-q}\mu_2)' \otimes (h_{3,k-q}'h_{5,g}^{-1/2}))\Phi_h^{\mathrm{vec}(G_i)}((h_{2,p-q}\mu_2) \otimes (h_{3,k-q}'h_{5,g}^{-1/2})') = \Phi_h^{h_{3,k-q}'h_{5,g}^{-1/2}G_i h_{2,p-q}\mu_2}$,  (15.12)

where the first equality holds by the definition of $\bar\Delta_{h,p-q}$ in (8.17) and the fact that $h_{71} = h_{5,g}^{-1/2}$ and $h_{81} = I_p$ by the conditions in part (d) of Lemma 8.3, the second and fourth equalities use the general formula $\mathrm{vec}(ABC) = (C' \otimes A)\mathrm{vec}(B)$, the third equality holds because $\mathrm{vec}(\bar D_h) \sim N(0^{pk}, \Phi_h^{\mathrm{vec}(G_i)})$ by Lemma 8.2, and the fourth equality uses the definition of the variance matrix $\Phi_h^{a_i}$ in (8.15) for an arbitrary random vector $a_i$.

Next, we show that $\Phi_h^{h_{3,k-q}'h_{5,g}^{-1/2}G_i h_{2,p-q}\mu_2}$ equals an expected outer-product matrix:

$\Phi_h^{h_{3,k-q}'h_{5,g}^{-1/2}G_i h_{2,p-q}\mu_2} = \lim ((B_{n,p-q}\mu_2)' \otimes (C_{n,k-q}'\Omega_{F_n}^{-1/2}))\Phi_{F_n}^{\mathrm{vec}(G_i)}((B_{n,p-q}\mu_2) \otimes (C_{n,k-q}'\Omega_{F_n}^{-1/2})') = \lim E_{F_n}(C_{n,k-q}'\Omega_{F_n}^{-1/2}G_i B_{n,p-q}\mu_2)(C_{n,k-q}'\Omega_{F_n}^{-1/2}G_i B_{n,p-q}\mu_2)'$,  (15.13)

where the general formula $\mathrm{vec}(ABC) = (C' \otimes A)\mathrm{vec}(B)$ is used multiple times, the limits exist by the conditions imposed on the sequence $\{\lambda_{n,h} : n \ge 1\}$, the first equality uses $B_{n,p-q} \to h_{2,p-q}$, $C_{n,k-q} \to h_{3,k-q}$, and $\Omega_{F_n}^{-1/2} \to h_{5,g}^{-1/2}$, the second equality uses the definitions given in (3.2) and (8.15) and the fact that $E_{F_n}\mathrm{vec}(C_{n,k-q}'\Omega_{F_n}^{-1/2}G_i B_{n,p-q}) = \mathrm{vec}(C_{n,k-q}'\Omega_{F_n}^{-1/2}D_n B_{n,p-q}) = O(n^{-1/2})$ by (15.7) with $W_n = \Omega_{F_n}^{-1/2}$.

We can write $\Phi_h^{h_{3,k-q}'h_{5,g}^{-1/2}G_i h_{2,p-q}\mu_2}$ as the limit of a subsequence $\{n_m : m \ge 1\}$ of the matrices $E_{F_n}(C_{n,k-q}'\Omega_{F_n}^{-1/2}G_i B_{n,p-q}\mu_2)(\cdot)'$ for which $F_{n_m} \in \mathcal{F}_{0j}$ for all $m \ge 1$, for some $j = 0, ..., q$. It cannot be the case that $j > q$, because if $j > q$, then we obtain a contradiction: $n_m^{1/2}\tau_{jF_{n_m}} \to \infty$ as $m \to \infty$ by the first condition of $\mathcal{F}_{0j}$, whereas $n_m^{1/2}\tau_{jF_{n_m}}$ does not diverge to infinity as $m \to \infty$ by the definition of $q$ in (8.16). Now, we fix an arbitrary $j \in \{0, ..., q\}$. The continuity of the $\lambda_{p-j}(\cdot)$ function and the $\lambda_{p-j}(\cdot)$ condition in $\mathcal{F}_{0j}$ imply that, for all $\mu \in R^{p-j}$ with $\|\mu\| = 1$,

$\lambda_{p-j}\left(\lim E_{F_{n_m}}(C_{n_m,k-j}'\Omega_{F_{n_m}}^{-1/2}G_i B_{n_m,p-j}\mu)(C_{n_m,k-j}'\Omega_{F_{n_m}}^{-1/2}G_i B_{n_m,p-j}\mu)'\right) > 0$.  (15.14)

For all $\mu_2 \in R^{p-q}$ with $\|\mu_2\| = 1$, let $\mu = (0^{(q-j)\prime}, \mu_2')' \in R^{p-j}$. Then $B_{n_m,p-j}\mu = B_{n_m,p-q}\mu_2$ and, by (15.14),

$\lambda_{p-j}\left(\lim E_{F_{n_m}}(C_{n_m,k-j}'\Omega_{F_{n_m}}^{-1/2}G_i B_{n_m,p-q}\mu_2)(\cdot)'\right) > 0$ $\forall\mu_2 \in R^{p-q}$ with $\|\mu_2\| = 1$.  (15.15)

Next, we apply Corollary 15.4(b) with $A = \lim E_{F_{n_m}}(C_{n_m,k-j}'\Omega_{F_{n_m}}^{-1/2}G_i B_{n_m,p-q}\mu_2)(\cdot)'$, $m = p - j$, and $r = q - j$, where $A_{-r} = \lim E_{F_{n_m}}(C_{n_m,k-q}'\Omega_{F_{n_m}}^{-1/2}G_i B_{n_m,p-q}\mu_2)(\cdot)'$ equals $A$ with its first $q - j$ rows and columns deleted in the present case, and $p > q$ implies that $m = p - j \ge r + 1$ for all $j = 0, ..., q$. Corollary 15.4 and (15.15) give

$\lambda_{p-q}\left(\lim E_{F_{n_m}}(C_{n_m,k-q}'\Omega_{F_{n_m}}^{-1/2}G_i B_{n_m,p-q}\mu_2)(\cdot)'\right) > 0$ $\forall\mu_2 \in R^{p-q}$ with $\|\mu_2\| = 1$.  (15.16)

Equations (15.12), (15.13), and (15.16) combine to establish (15.11), and the proof of part (d) is complete.

Part (e) of the Lemma holds by replacing $n$ by the subsequence value $w_n$ throughout the arguments given above. $\square$

Proof of Lemma 15.1.
It suffices to show that $P(\Delta\mu = 0^k$ for some $\mu \in R^p$ with $\|\mu\| = 1) = 0$. For any constant $\delta > 0$, there exists a constant $K_\delta < \infty$ such that $P(\|\mathrm{vec}(\Delta)\| > K_\delta) \le \delta$. Given $\varepsilon > 0$, let $\{B(\mu_s, \varepsilon) : s = 1, ..., N_\varepsilon\}$ be a finite cover of $\{\mu \in R^p : \|\mu\| = 1\}$, where $\|\mu_s\| = 1$ and $B(\mu_s, \varepsilon)$ is a ball in $R^p$ centered at $\mu_s$ of radius $\varepsilon$. It is possible to choose $\{\mu_s : s = 1, ..., N_\varepsilon\}$ such that the number, $N_\varepsilon$, of balls in the cover is of order $\varepsilon^{-p+1}$. That is, $N_\varepsilon \le C_1\varepsilon^{-p+1}$ for some constant $C_1 < \infty$.

Let $\Delta_r$ denote the $r$th row of $\Delta$ for $r = 1, ..., k$ written as a column vector. If $\mu \in B(\mu_s, \varepsilon)$, we have

$\|\Delta(\mu - \mu_s)\| = \left(\sum_{r=1}^k(\Delta_r'(\mu - \mu_s))^2\right)^{1/2} \le \left(\sum_{r=1}^k\|\Delta_r\|^2\|\mu - \mu_s\|^2\right)^{1/2} \le \varepsilon\|\mathrm{vec}(\Delta)\|$,  (15.17)

where the inequality holds by the Cauchy-Bunyakovsky-Schwarz inequality. If $\mu \in B(\mu_s, \varepsilon)$ and $\Delta\mu = 0^k$, this gives

$\|\Delta\mu_s\| \le \varepsilon\|\mathrm{vec}(\Delta)\|$.  (15.18)

Suppose $Z \in R^p$ has a multivariate normal distribution with pd variance matrix. Then, for any $\varepsilon > 0$,

$P(\|Z\| \le \varepsilon) = \int_{\{\|z\| \le \varepsilon\}} f_Z(z)dz \le \sup_{z \in R^p} f_Z(z)\int_{\{\|z\| \le \varepsilon\}} dz \le C_2\varepsilon^p$  (15.19)

for some constant $C_2 < \infty$, where $f_Z(z)$ denotes the density of $Z$ with respect to Lebesgue measure, which exists because the variance matrix of $Z$ is pd, and the inequalities hold because the density of a multivariate normal is bounded and the volume of a sphere in $R^p$ of radius $\varepsilon$ is proportional to $\varepsilon^p$.

For any $\mu \in R^p$ with $\|\mu\| = 1$, let $B_\mu\Lambda_\mu B_\mu'$ be a spectral decomposition of $\mathrm{Var}(\Delta\mu)$, where $\Lambda_\mu$ is the diagonal $k \times k$ matrix with the eigenvalues of $\mathrm{Var}(\Delta\mu)$ on its diagonal in nonincreasing order and $B_\mu$ is an orthogonal $k \times k$ matrix whose columns are eigenvectors of $\mathrm{Var}(\Delta\mu)$ that correspond to the eigenvalues in $\Lambda_\mu$. By assumption, the rank of $\mathrm{Var}(\Delta\mu)$ is $p$ or larger. In consequence, the first $p$ diagonal elements of $\Lambda_\mu$ are positive. Let $(B_\mu'\Delta\mu)_p$ denote the $p$-vector that contains the first $p$ elements of the $k$-vector $B_\mu'\Delta\mu$, and let $\Lambda_{\mu,p}$ denote the upper left $p \times p$ submatrix of $\Lambda_\mu$. We have $\mathrm{Var}((B_\mu'\Delta\mu)_p) = \Lambda_{\mu,p}$, and $\Lambda_{\mu,p}$ is pd (because the first $p$ diagonal elements of $\Lambda_\mu$ are positive).

Now, given any $\delta > 0$ and $\varepsilon > 0$, we have

$P(\Delta\mu = 0^k$ for some $\mu \in R^p$ with $\|\mu\| = 1)$
$= P\left(\bigcup_{s=1}^{N_\varepsilon}\{\Delta\mu = 0^k$ for some $\mu \in B(\mu_s, \varepsilon)$ with $\|\mu\| = 1\}\right)$
$\le P\left(\bigcup_{s=1}^{N_\varepsilon}\{\|\Delta\mu_s\| \le \varepsilon\|\mathrm{vec}(\Delta)\|\}\right)$
$\le P\left(\bigcup_{s=1}^{N_\varepsilon}\{\|\Delta\mu_s\| \le \varepsilon\|\mathrm{vec}(\Delta)\|\} \cap \{\|\mathrm{vec}(\Delta)\| \le K_\delta\}\right) + \delta$
$\le \sum_{s=1}^{N_\varepsilon} P(\|\Delta\mu_s\| \le \varepsilon K_\delta) + \delta$
$\le \sum_{s=1}^{N_\varepsilon} P(\|(B_{\mu_s}'\Delta\mu_s)_p\| \le \varepsilon K_\delta) + \delta$
$\le N_\varepsilon C_2 K_\delta^p\varepsilon^p + \delta \le C_1\varepsilon^{-p+1}C_2 K_\delta^p\varepsilon^p + \delta \to \delta$ as $\varepsilon \to 0$,  (15.20)

where the second inequality holds by (15.18) using $\mu \in B(\mu_s, \varepsilon)$, the third inequality uses the definition of $K_\delta$, the fourth inequality holds because $\|(B_{\mu_s}'\Delta\mu_s)_p\| \le \|B_{\mu_s}'\Delta\mu_s\| = \|\Delta\mu_s\|$ using the definitions in the paragraph that follows the paragraph that contains (15.19), the second last inequality holds by (15.19) with $Z = (B_{\mu_s}'\Delta\mu_s)_p$ and the fact that the variance matrix of $(B_{\mu_s}'\Delta\mu_s)_p$ is pd by the argument given in the paragraph following (15.19), and the last inequality holds by the bound given above on $N_\varepsilon$. Because $\delta > 0$ is arbitrary, (15.20) implies that $P(\Delta\mu = 0^k$ for some $\mu \in R^p$ with $\|\mu\| = 1) = 0$, which completes the proof. $\square$

16 Proof of Theorem 8.4

Theorem 8.4 of AG1. Suppose Assumption WU holds for some non-empty parameter space $\Lambda_2$. Under all sequences $\{\lambda_{n,h} : n \ge 1\}$ with $\lambda_{n,h} \in \Lambda_2$:

(a) $\hat\kappa_{pn} \to_p \infty$ if $q = p$;
(b) $\hat\kappa_{pn} \to_d \lambda_{\min}(\bar\Delta_{h,p-q}'h_{3,k-q}h_{3,k-q}'\bar\Delta_{h,p-q})$ if $q < p$;
(c) $\hat\kappa_{jn} \to_p \infty$ for all $j \le q$;
(d) the (ordered) vector of the smallest $p - q$ eigenvalues of $n\hat U_n'\hat D_n'\hat W_n'\hat W_n\hat D_n\hat U_n$, i.e., $(\hat\kappa_{(q+1)n}, ..., \hat\kappa_{pn})'$, converges in distribution to the (ordered) $p - q$ vector of the eigenvalues of $\bar\Delta_{h,p-q}'h_{3,k-q}h_{3,k-q}'\bar\Delta_{h,p-q} \in R^{(p-q) \times (p-q)}$;
(e) the convergence in parts (a)-(d) holds jointly with the convergence in Lemma 8.3; and
(f) under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_2$, the results in parts (a)-(e) hold with $n$ replaced with $w_n$.

The proof of Theorem 8.4 uses the following rate of convergence lemma. This lemma is a key technical contribution of the paper.
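Throughout Section 16, the statistics $\hat\kappa_{jn}$ are the ordered eigenvalues of $n\hat U_n'\hat D_n'\hat W_n'\hat W_n\hat D_n\hat U_n$, equivalently the squared singular values of $n^{1/2}\hat W_n\hat D_n\hat U_n$. The following is a quick numerical check of this generic linear-algebra identity (hypothetical matrices, not specific to AG1):

```python
import numpy as np

rng = np.random.default_rng(2)
k, p, n = 5, 3, 100
W = rng.normal(size=(k, k))           # stand-ins for W_n, D_n, U_n
D = rng.normal(size=(k, p))
U = rng.normal(size=(p, p))
M = np.sqrt(n) * W @ D @ U            # n^{1/2} W D U, a k x p matrix
# eigenvalues of n U'D'W'W D U, sorted nonincreasing
kappa = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]
# singular values of n^{1/2} W D U, returned nonincreasing
sv = np.linalg.svd(M, compute_uv=False)
assert np.allclose(kappa, sv ** 2)    # kappa_j = (singular value_j)^2
```

This is why the limit theory for $\hat\kappa_{jn}$ below is driven entirely by the behavior of the singular values $\tau_{jF_n}$ of $W_nD_nU_n$.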
Lemma 16.1. Suppose Assumption WU holds for some non-empty parameter space $\Lambda_2$. Under all sequences $\{\lambda_{n,h} : n \ge 1\}$ with $\lambda_{n,h} \in \Lambda_2$ and for which $q$ defined in (8.16) satisfies $q \ge 1$, we have (a) $\hat\kappa_{jn} \to_p \infty$ for $j = 1, ..., q$ and (b) when $p > q$, $\hat\kappa_{jn} = o_p((n^{1/2}\tau_{\ell F_n})^2)$ for all $\ell \le q$ and $j = q + 1, ..., p$. Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_2$, the same result holds with $n$ replaced with $w_n$.

Proof of Lemma 16.1. By the definitions in (8.9) and (8.12), $h_{6,j} := \lim \tau_{(j+1)F_n}/\tau_{jF_n}$ for $j = 1, ..., p - 1$. By the definition of $q$ in (8.16), $h_{6,q} = 0$ if $q < p$. If $q = p$, $h_{6,q}$ is not defined by (8.9) and (8.12) and we define it here to equal zero. Because $\tau_{jF_n}$ is nonnegative and nonincreasing in $j$, $h_{6,j} \in [0, 1]$. If $h_{6,j} > 0$, then $\{\tau_{jF_n} : n \ge 1\}$ and $\{\tau_{(j+1)F_n} : n \ge 1\}$ are of the same order of magnitude, i.e., $0 < \lim \tau_{(j+1)F_n}/\tau_{jF_n} \le 1$.[50]

We group the first $q$ singular values into groups that have the same order of magnitude within each group. Let $G_h$ $(\in \{1, ..., q\})$ denote the number of groups. (We have $G_h \ge 1$ because $q \ge 1$ is assumed in the statement of the lemma.) Note that $G_h$ equals the number of values in $\{h_{6,1}, ..., h_{6,q}\}$ that equal zero. Let $r_g$ and $\bar r_g$ denote the indices of the first and last singular values, respectively, in the $g$th group for $g = 1, ..., G_h$. Thus, $r_1 = 1$, $\bar r_g = r_{g+1} - 1$, where $r_{G_h+1}$ is defined to equal $q + 1$, and $\bar r_{G_h} = q$. Note that $r_g$ and $\bar r_g$ depend on $h$. By definition, the singular values in the $g$th group, which have the $g$th largest order of magnitude, are $\{\tau_{r_gF_n} : n \ge 1\}, ..., \{\tau_{\bar r_gF_n} : n \ge 1\}$. By construction, $h_{6,j} > 0$ for all $j \in \{r_g, ..., \bar r_g - 1\}$ for $g = 1, ..., G_h$. (The reason is: if $h_{6,j}$ is equal to zero for some $j \in \{r_g, ..., \bar r_g - 1\}$, then $\{\tau_{(j+1)F_n} : n \ge 1\}$ is of smaller order of magnitude than $\{\tau_{r_gF_n} : n \ge 1\}$, which contradicts the definition of $\bar r_g$.) Also by construction, $\lim \tau_{j'F_n}/\tau_{jF_n} = 0$ for any $(j, j')$ in groups $(g, g')$, respectively, with $g < g'$. Note that when $p = 1$ we have $G_h = 1$ and $r_1 = \bar r_1 = 1$.

[50] Note that $\sup_{j \le p, F \in \mathcal{F}_{WU}}\tau_{jF} < \infty$ by the conditions $\|W_F\| \le M_1$ and $\|U_F\| \le M_1$ in $\mathcal{F}_{WU}$ and the moment conditions in $\mathcal{F}$. Thus, $\{\tau_{jF_n} : n \ge 1\}$ does not diverge to infinity, and the "order of magnitude" of $\{\tau_{jF_n} : n \ge 1\}$ refers to whether this sequence converges to zero, and how slowly or quickly it does, when it does converge to zero.

The eigenvalues $\{\hat\kappa_{jn} : j \le p\}$ of $n\hat U_n'\hat D_n'\hat W_n'\hat W_n\hat D_n\hat U_n$ are solutions to the determinantal equation $|n\hat U_n'\hat D_n'\hat W_n'\hat W_n\hat D_n\hat U_n - \kappa I_p| = 0$. Equivalently, by multiplying this equation by $\tau_{r_1F_n}^{-2}n^{-1}|B_n'U_n'\hat U_n^{-1\prime}\hat U_n^{-1}U_nB_n|$, they are solutions to

$|\tau_{r_1F_n}^{-2}B_n'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_n - (n^{1/2}\tau_{r_1F_n})^{-2}\kappa B_n'U_n'\hat U_n^{-1\prime}\hat U_n^{-1}U_nB_n| = 0$  (16.1)

wp→1, using $|A_1A_2| = |A_1|\,|A_2|$ for any conformable square matrices $A_1$ and $A_2$, $|B_n| > 0$, $|U_n| > 0$ (by the conditions in $\mathcal{F}_{WU}$ in (8.5) because $\lambda_{n,h} \in \Lambda_2$ and $\Lambda_2$ only contains distributions in $\mathcal{F}_{WU}$), $|\hat U_n^{-1}| > 0$ wp→1 (because $\hat U_n \to_p h_{81}$ by (8.2), (8.12), (8.17), and Assumption WU(b) and (c) and $h_{81}$ is pd), and $\tau_{r_1F_n} > 0$ for $n$ large (because $n^{1/2}\tau_{r_1F_n} \to \infty$ for $r_1 \le q$). (For simplicity, we omit the qualifier wp→1 from some statements below.) Thus, $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn} : j \le p\}$ solve

$|\tau_{r_1F_n}^{-2}B_n'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_n - \kappa(I_p + \hat A_n)| = 0$ or $|(I_p + \hat A_n)^{-1}\tau_{r_1F_n}^{-2}B_n'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_n - \kappa I_p| = 0$, where
$\hat A_n = \begin{pmatrix} \hat A_{1n} & \hat A_{2n} \\ \hat A_{2n}' & \hat A_{3n} \end{pmatrix} := B_n'U_n'\hat U_n^{-1\prime}\hat U_n^{-1}U_nB_n - I_p$  (16.2)

for $\hat A_{1n} \in R^{\bar r_1 \times \bar r_1}$, $\hat A_{2n} \in R^{\bar r_1 \times (p-\bar r_1)}$, and $\hat A_{3n} \in R^{(p-\bar r_1) \times (p-\bar r_1)}$, and the second line is obtained by multiplying the first line by $|(I_p + \hat A_n)^{-1}|$. We have

$\tau_{r_1F_n}^{-1}\hat W_n\hat D_nU_nB_n = \tau_{r_1F_n}^{-1}(\hat W_nW_n^{-1})W_nD_nU_nB_n + (n^{1/2}\tau_{r_1F_n})^{-1}(\hat W_nW_n^{-1})W_n n^{1/2}(\hat D_n - D_n)U_nB_n$
$= \tau_{r_1F_n}^{-1}(\hat W_nW_n^{-1})C_n\Upsilon_n + O_p((n^{1/2}\tau_{r_1F_n})^{-1})$
$= (I_k + o_p(1))C_n\begin{pmatrix} h_{6,\bar r_1} + o(1) & 0^{\bar r_1 \times (p-\bar r_1)} \\ 0^{(p-\bar r_1) \times \bar r_1} & O(\tau_{r_2F_n}/\tau_{r_1F_n})^{(p-\bar r_1) \times (p-\bar r_1)} \\ 0^{(k-p) \times \bar r_1} & 0^{(k-p) \times (p-\bar r_1)} \end{pmatrix} + O_p((n^{1/2}\tau_{r_1F_n})^{-1})$
$\to_p h_3\begin{pmatrix} h_{6,\bar r_1} & 0^{\bar r_1 \times (p-\bar r_1)} \\ 0^{(k-\bar r_1) \times \bar r_1} & 0^{(k-\bar r_1) \times (p-\bar r_1)} \end{pmatrix}$, where $h_{6,\bar r_1} := \mathrm{Diag}\{1, h_{6,1}, h_{6,1}h_{6,2}, ..., \prod_{\ell=1}^{\bar r_1 - 1}h_{6,\ell}\} \in R^{\bar r_1 \times \bar r_1}$,  (16.3)

$h_{6,\bar r_1} := 1$ when $\bar r_1 = 1$, and $O(\tau_{r_2F_n}/\tau_{r_1F_n})^{(p-\bar r_1) \times (p-\bar r_1)}$ denotes a diagonal $(p-\bar r_1) \times (p-\bar r_1)$ matrix whose diagonal elements are $O(\tau_{r_2F_n}/\tau_{r_1F_n})$. In (16.3), the second equality uses (15.4), $\hat W_n \to_p h_{71}$ (by Assumption WU(a) and (c)), $\|h_{71}\| = \|\lim W_n\| < \infty$ (by the conditions in $\mathcal{F}_{WU}$ defined in (8.5)), $n^{1/2}(\hat D_n - D_n) = O_p(1)$ (by Lemma 8.2), $U_n = O(1)$ (by the conditions in $\mathcal{F}_{WU}$), and $B_n = O(1)$ (because $B_n$ is orthogonal); the third equality uses $\hat W_nW_n^{-1} \to_p I_k$ (because $\hat W_n \to_p h_{71}$, $h_{71} := \lim W_n$, and $h_{71}$ is pd by the conditions in $\mathcal{F}_{WU}$), $\tau_{jF_n}/\tau_{r_1F_n} = \prod_{\ell=1}^{j-1}(\tau_{(\ell+1)F_n}/\tau_{\ell F_n}) = \prod_{\ell=1}^{j-1}h_{6,\ell} + o(1)$ for $j = 2, ..., \bar r_1$, and $\tau_{jF_n}/\tau_{r_1F_n} = O(\tau_{r_2F_n}/\tau_{r_1F_n})$ for $j = r_2, ..., p$ (because $\{\tau_{jF_n} : j \le p\}$ are nonincreasing in $j$); and the convergence uses $C_n \to h_3$, $\tau_{r_2F_n}/\tau_{r_1F_n} \to 0$ (by the definition of $r_2$), and $n^{1/2}\tau_{r_1F_n} \to \infty$ (by (8.16) because $r_1 \le q$).[51]

[51] For matrices that are written as $O(\cdot)$, we sometimes provide the dimensions of the matrix as superscripts for clarity, and sometimes we do not provide the dimensions for simplicity.

Equation (16.3) yields

$\tau_{r_1F_n}^{-2}B_n'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_n \to_p \begin{pmatrix} h_{6,\bar r_1} & 0 \\ 0 & 0 \end{pmatrix}'h_3'h_3\begin{pmatrix} h_{6,\bar r_1} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} h_{6,\bar r_1}^2 & 0^{\bar r_1 \times (p-\bar r_1)} \\ 0^{(p-\bar r_1) \times \bar r_1} & 0^{(p-\bar r_1) \times (p-\bar r_1)} \end{pmatrix}$,  (16.4)

where the equality holds because $h_3'h_3 = \lim C_n'C_n = I_k$ using (8.7). In addition, we have

$\hat A_n := B_n'U_n'\hat U_n^{-1\prime}\hat U_n^{-1}U_nB_n - I_p \to_p 0^{p \times p}$  (16.5)

using $\hat U_n^{-1}U_n \to_p I_p$ (because $\hat U_n \to_p h_{81}$ by Assumption WU(b) and (c), $h_{81} := \lim U_n$, and $h_{81}$ is pd by the conditions in $\mathcal{F}_{WU}$), $B_n \to h_2$, and $h_2'h_2 = I_p$ (because $B_n$ is orthogonal for all $n \ge 1$).

The ordered vector of eigenvalues of a matrix is a continuous function of the matrix by Elsner's Theorem, see Stewart (2001, Thm. 3.1, pp. 37-38). Hence, by the second line of (16.2), (16.4), (16.5), and Slutsky's Theorem, the largest $\bar r_1$ eigenvalues of $\tau_{r_1F_n}^{-2}B_n'\hat U_n'\hat D_n'\hat W_n'\hat W_n\hat D_n\hat U_nB_n$ (i.e., $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn} : j \le \bar r_1\}$ by the definition of $\hat\kappa_{jn}$) satisfy

$((n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{1n}, ..., (n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{\bar r_1 n}) \to_p (1, h_{6,1}^2, h_{6,1}^2h_{6,2}^2, ..., \prod_{\ell=1}^{\bar r_1 - 1}h_{6,\ell}^2)$ and so $\hat\kappa_{jn} \to_p \infty$ $\forall j = 1, ..., \bar r_1$  (16.6)

because $n^{1/2}\tau_{r_1F_n} \to \infty$ (by (8.16) since $r_1 \le q$) and $h_{6,\ell} > 0$ for all $\ell \in \{1, ..., \bar r_1 - 1\}$ (as noted above). By the same argument, the smallest $p - \bar r_1$ eigenvalues of $\tau_{r_1F_n}^{-2}B_n'\hat U_n'\hat D_n'\hat W_n'\hat W_n\hat D_n\hat U_nB_n$, i.e., $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn} : j = \bar r_1 + 1, ..., p\}$, satisfy

$(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn} \to_p 0$ $\forall j = \bar r_1 + 1, ..., p$.  (16.7)

If $G_h = 1$, (16.6) proves part (a) of the lemma and (16.7) proves part (b) of the lemma (because in this case $\bar r_1 = q$ and $\tau_{r_1F_n} = O(1)\tau_{\ell F_n}$ for all $\ell \le q$ by the definitions of $q$ and $G_h$). Hence, from here on, we assume that $G_h \ge 2$.

Next, define $B_{n,j_1,j_2}$ to be the $p \times (j_2 - j_1)$ matrix that consists of the $j_1 + 1, ..., j_2$ columns of $B_n$ for $0 \le j_1 < j_2 \le p$. Note that the difference between the two subscripts $j_1$ and $j_2$ equals the number of columns of $B_{n,j_1,j_2}$, which is useful for keeping track of the dimensions of the $B_{n,j_1,j_2}$ matrices that appear below. By definition, $B_n = (B_{n,0,\bar r_1}, B_{n,\bar r_1,p})$.

By (16.3) (excluding the convergence part) applied once with $B_{n,\bar r_1,p}$ in place of $B_n$ as the far-right multiplicand and applied a second time with $B_{n,0,\bar r_1}$ in place of $B_n$ as the far-right multiplicand, we have

$\varrho_n := \tau_{r_1F_n}^{-2}B_{n,0,\bar r_1}'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_{n,\bar r_1,p} = o_p(\tau_{r_2F_n}/\tau_{r_1F_n}) + O_p((n^{1/2}\tau_{r_1F_n})^{-1})$,  (16.8)

where the last equality holds because (i) $C_n'(I_k + o_p(1))C_n = I_k + o_p(1)$, (ii) when $I_k$ appears in place of $C_n'(I_k + o_p(1))C_n$, the first summand on the left-hand side (lhs) of the last equality equals $0^{\bar r_1 \times (p-\bar r_1)}$, and (iii) when $o_p(1)$ appears in place of $C_n'(I_k + o_p(1))C_n$, the first summand on the lhs of the last equality equals an $\bar r_1 \times (p-\bar r_1)$ matrix with elements that are $o_p(\tau_{r_2F_n}/\tau_{r_1F_n})$. Define

$\hat b_{1n}(\kappa) := \tau_{r_1F_n}^{-2}B_{n,0,\bar r_1}'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_{n,0,\bar r_1} - \kappa(I_{\bar r_1} + \hat A_{1n}) \in R^{\bar r_1 \times \bar r_1}$,
$\hat b_{3n}(\kappa) := \tau_{r_1F_n}^{-2}B_{n,\bar r_1,p}'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_{n,\bar r_1,p} - \kappa(I_{p-\bar r_1} + \hat A_{3n}) \in R^{(p-\bar r_1) \times (p-\bar r_1)}$, and
$\hat b_{2n}(\kappa) := \varrho_n - \kappa\hat A_{2n} \in R^{\bar r_1 \times (p-\bar r_1)}$.  (16.9)

As in the first line of (16.2), $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn} : j \le p\}$ solve

$0 = |\tau_{r_1F_n}^{-2}B_n'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_n - \kappa(I_p + \hat A_n)| = \left|\begin{pmatrix} \hat b_{1n}(\kappa) & \hat b_{2n}(\kappa) \\ \hat b_{2n}(\kappa)' & \hat b_{3n}(\kappa) \end{pmatrix}\right| = |\hat b_{1n}(\kappa)|\,|\hat b_{3n}(\kappa) - \hat b_{2n}(\kappa)'\hat b_{1n}^{-1}(\kappa)\hat b_{2n}(\kappa)|$
$= |\hat b_{1n}(\kappa)|\,\left|\tau_{r_1F_n}^{-2}B_{n,\bar r_1,p}'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_{n,\bar r_1,p} - \varrho_n'\hat b_{1n}^{-1}(\kappa)\varrho_n + \kappa\hat A_{2n}'\hat b_{1n}^{-1}(\kappa)\varrho_n + \kappa\varrho_n'\hat b_{1n}^{-1}(\kappa)\hat A_{2n} - \kappa^2\hat A_{2n}'\hat b_{1n}^{-1}(\kappa)\hat A_{2n} - \kappa(I_{p-\bar r_1} + \hat A_{3n})\right|$,  (16.10)

where the third equality uses the standard formula for the determinant of a partitioned matrix and the result given in (16.11) below, which shows that $\hat b_{1n}(\kappa)$ is nonsingular wp→1 for $\kappa$ equal to any solution $(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn}$ to the first equality in (16.10) for $j \le p$, and the last equality holds by algebra.[52]

[52] The determinant of the partitioned matrix $\begin{pmatrix} \Lambda_1 & \Lambda_2 \\ \Lambda_2' & \Lambda_3 \end{pmatrix}$ equals $|\Lambda_1|\,|\Lambda_3 - \Lambda_2'\Lambda_1^{-1}\Lambda_2|$ provided $\Lambda_1$ is nonsingular, e.g., see Rao (1973, p. 32).

Now we show that, for $j = \bar r_1 + 1, ..., p$, $(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn}$ cannot solve the determinantal equation $|\hat b_{1n}(\kappa)| = 0$ wp→1, where this determinant is the first multiplicand on the right-hand side (rhs) of (16.10). This implies that $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn} : j = \bar r_1 + 1, ..., p\}$ must solve the determinantal equation based on the second multiplicand on the rhs of (16.10) wp→1. For $j = \bar r_1 + 1, ..., p$, we have

$\tilde e_{j1n} := \hat b_{1n}((n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn}) = \tau_{r_1F_n}^{-2}B_{n,0,\bar r_1}'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_{n,0,\bar r_1} - (n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn}(I_{\bar r_1} + \hat A_{1n}) = h_{6,\bar r_1}^2 + o_p(1) - o_p(1)(I_{\bar r_1} + o_p(1)) = h_{6,\bar r_1}^2 + o_p(1)$,  (16.11)

where the second last equality holds by (16.4), (16.5), and (16.7). Equation (16.11) and $\lambda_{\min}(h_{6,\bar r_1}^2) > 0$ (which follows from the definition of $h_{6,\bar r_1}$ in (16.3) and the fact that $h_{6,\ell} > 0$ for all $\ell \in \{1, ..., \bar r_1 - 1\}$) establish the result stated in the first sentence of this paragraph.

For $j = \bar r_1 + 1, ..., p$, plugging $(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn}$ into the second multiplicand on the rhs of (16.10) gives

$0 = \left|\tau_{r_1F_n}^{-2}B_{n,\bar r_1,p}'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_{n,\bar r_1,p} + o_p((\tau_{r_2F_n}/\tau_{r_1F_n})^2) + O_p((n^{1/2}\tau_{r_1F_n})^{-2}) - (n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn}(I_{p-\bar r_1} + \hat A_{j2n})\right|$, where
$\hat A_{j2n} := \hat A_{3n} - \hat A_{2n}'\tilde e_{j1n}^{-1}\varrho_n - \varrho_n'\tilde e_{j1n}^{-1}\hat A_{2n} + (n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn}\hat A_{2n}'\tilde e_{j1n}^{-1}\hat A_{2n}$,  (16.12)

using (16.8) and (16.11). Multiplying (16.12) by $(\tau_{r_1F_n}^2/\tau_{r_2F_n}^2)^{p-\bar r_1}$ gives

$0 = \left|\tau_{r_2F_n}^{-2}B_{n,\bar r_1,p}'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_{n,\bar r_1,p} + o_p(1) + O_p((n^{1/2}\tau_{r_2F_n})^{-2}) - (n^{1/2}\tau_{r_2F_n})^{-2}\hat\kappa_{jn}(I_{p-\bar r_1} + \hat A_{j2n})\right|$  (16.13)

and $O_p((n^{1/2}\tau_{r_2F_n})^{-2}) = o_p(1)$ (because $r_2 \le q$ by the definition of $r_2$ and $n^{1/2}\tau_{jF_n} \to \infty$ for all $j \le q$ by the definition of $q$ in (8.16)). Thus, $\{(n^{1/2}\tau_{r_2F_n})^{-2}\hat\kappa_{jn} : j = \bar r_1 + 1, ..., p\}$ solve

$0 = \left|\tau_{r_2F_n}^{-2}B_{n,\bar r_1,p}'U_n'\hat D_n'\hat W_n'\hat W_n\hat D_nU_nB_{n,\bar r_1,p} + o_p(1) - \kappa(I_{p-\bar r_1} + \hat A_{j2n})\right|$.  (16.14)

For $j = \bar r_1 + 1, ..., p$, we have

$\hat A_{j2n} = o_p(1)$,  (16.15)

because $\hat A_{2n} = o_p(1)$ and $\hat A_{3n} = o_p(1)$ (by (16.5)), $\tilde e_{j1n}^{-1} = O_p(1)$ (by (16.11)), $\varrho_n = o_p(1)$ (by (16.8) since $\tau_{r_2F_n} \le \tau_{r_1F_n}$ and $n^{1/2}\tau_{r_1F_n} \to \infty$), and $(n^{1/2}\tau_{r_1F_n})^{-2}\hat\kappa_{jn} = o_p(1)$ for $j = \bar r_1 + 1, ..., p$ (by (16.7)).

Now, we repeat the argument from (16.2) to (16.15) with the expression in (16.14) replacing that in the first line of (16.2), with (16.15) replacing (16.5), and with $j = \bar r_1 + 1, ..., p$, $\hat A_{j2n}$, $B_{n,\bar r_1,p}$, $\tau_{r_2F_n}$, $\tau_{r_3F_n}$, $\bar r_2 - \bar r_1$, $p - \bar r_2$, and

$h_{6,\bar r_2} := \mathrm{Diag}\{1, h_{6,\bar r_1+1}, h_{6,\bar r_1+1}h_{6,\bar r_1+2}, ..., \prod_{\ell=\bar r_1+1}^{\bar r_2-1}h_{6,\ell}\} \in R^{(\bar r_2-\bar r_1) \times (\bar r_2-\bar r_1)}$

in place of $j = 1, ..., p$, $\hat A_n$, $B_n$, $\tau_{r_1F_n}$, $\tau_{r_2F_n}$, $\bar r_1$, $p - \bar r_1$, and $h_{6,\bar r_1}$, respectively. (The fact that $\hat A_{j2n}$ depends on $j$, whereas $\hat A_n$ does not, does not affect the argument.) In addition, $B_{n,0,\bar r_1}$ and $B_{n,\bar r_1,p}$ in (16.8)-(16.10) are replaced by the matrices $B_{n,\bar r_1,\bar r_2}$ and $B_{n,\bar r_2,p}$ (which consist of the $\bar r_1 + 1, ..., \bar r_2$ columns of $B_n$ and the last $p - \bar r_2$ columns of $B_n$, respectively).
Multiplying (16.12) by 0=j + op (( = op (1) (because r2 r1 Fn ) 2 ) (16.12) r 1 Fn ) 2 2 2 r1 F n = r2 Fn (p b0 e 1 A b bjn A 2n j1n 2n 2 R r1 ) (p r1 ) gives (n1=2 + op (1) + Op ((n1=2 2 r2 Fn ) bjn (Ip r1 bj2n )j +A q by the de…nition of r2 and n1=2 jFn (16.13) ! 1 for all q by the de…nition of q in (8.16)). Thus, f(n1=2 r 2 Fn ) 0=j 2b jn : j = r1 + 1; :::; pg solve 2 0 0 b 0 c0 c b r2 Fn Bn;r1 ;p Un Dn Wn Wn Dn Un Bn;r1 ;p + op (1) (Ip r1 For j = r1 + 1; :::; p; we have bj2n )j: +A (16.14) bj2n = op (1); A (16.15) b2n = op (1) and A b3n = op (1) (by (16.5)), e 1 = Op (1) (by (16.11)), %n = op (1) (by (16.8) because A j1n since r 2 Fn r1 Fn and n1=2 r1 Fn ! 1); and (n1=2 (16.7)). r1 Fn ) 2b jn = op (1) for j = r1 + 1; :::; p (by Now, we repeat the argument from (16.2) to (16.15) with the expression in (16.14) replacing that bj2n ; Bn;p r ; r Fn ; in the …rst line of (16.2), with (16.15) replacing (16.5), and with j = r +1; :::; p; A 2 1 2 r2 1 r3 Fn ; r2 r1 ; p r2 ; and h6;r = Diagf1; h6;r1 +1 ; h6;r1 +1 h6;r1 +2 ; :::; 2 Y `=r1 +1 h6;` g 2 R(r2 r1 ) (r2 r1 ) bn ; Bn ; r Fn ; r Fn ; r ; p r ; and h ; respectively. (The fact in place of j = r1 + 1; :::; p; A 1 2 1 1 6;r1 bj2n depends on j; whereas A bn does not, does not a¤ect the argument.) In addition, Bn;0;r that A 1 and Bn;r1 ;p in (16.8)-(16.10) are replaced by the matrices Bn;r1 ;r2 and Bn;r2 ;p (which consist of the r1 + 1; :::; r2 columns of Bn and the last p r2 columns of Bn ; respectively.) 
This argument gives the analogues of (16.6) and (16.7), which are bjn !p 1 8j = r2 ; :::; r2 and (n1=2 r2 Fn ) 2 bjn = op (1) 8j = r2 + 1; :::; p: (16.16) bj3n in place of A bj2n ; where In addition, the analogue of (16.14) is the same as (16.14) but with A bj3n is de…ned just as A bj2n is de…ned in (16.12) but with A b2j2n and A b3j2n in place of A b2n and A b3n ; A 17 respectively, where b1j2n 2 Rr2 for A r2 ; bj2n A b2j2n 2 Rr2 A 3 b2j2n b1j2n A A 5 =4 b0 b3j2n A A 2j2n 2 (p r1 r2 ) ; Repeating the argument Gh b3j2n 2 R(p and A (16.17) r1 r2 ) (p r1 r2 ) : 2 more times yields bjn !p 1 8j = 1; :::; rGh and (n1=2 r g Fn ) 2 bjn = op (1) 8j = rg + 1; :::; p; 8g = 1; :::; Gh : (16.18) A formal proof of this “repetition of the argument Gh 2 more times”is given below using induction. Because rGh = q; the …rst result in (16.18) proves part (a) of the lemma. The second result in (16.18) with g = Gh implies: for all j = q + 1; :::; p; (n1=2 rGh Fn ) 2 bjn = op (1) (16.19) because rGh = q: Either rGh = rGh = q or rGh < rGh = q: In the former case, (n1=2 qFn ) op (1) for j = q + 1; :::; p by (16.19). In the latter case, we have qFn lim rG rG Fn h = lim rGh Fn h Y = rGh Fn of the proof. Hence, in this case too, (16.20). Because qFn `Fn for all ` qFn ) 2b jn = 1 h6;j > 0; (16.20) j=rGh where the inequality holds because h6;` > 0 for all ` 2 frGh ; :::; rGh (n1=2 2b jn 1g; as noted at the beginning = op (1) for j = q + 1; :::; p by (16.19) and q; this establishes part (b) of the lemma. Now we establish by induction the results given in (16.18) that are obtained heuristically by “repeating the argument Gh 2 more times.”The induction proof shows that subtleties arise when establishing the asymptotic negligibility of certain terms. 
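The pattern just described — eigenvalues tied to singular values that drift to zero at different rates — can be mimicked numerically. The following sketch (illustrative only; the rates and dimensions are invented for the illustration) reproduces the behavior asserted in (16.18): eigenvalues associated with singular values for which $n^{1/2}\tau_{jF_n} \to \infty$ diverge, while the remaining ones, after normalization, vanish:

```python
import numpy as np

def ordered_eigs(n):
    # Singular values of D_n drifting to zero at different rates:
    # tau_1 = 1, tau_2 = n^{-1/4} (so n^{1/2} tau_2 -> inf, hence q = 2),
    # tau_3 = n^{-1} (so n^{1/2} tau_3 -> 0). Invented rates, for illustration.
    taus = np.array([1.0, n ** -0.25, 1.0 / n])
    D = np.diag(taus)
    return np.sort(np.linalg.eigvalsh(n * D.T @ D))[::-1]

small, big = ordered_eigs(10 ** 2), ordered_eigs(10 ** 6)

# The first q = 2 ordered eigenvalues diverge as n grows ...
assert big[0] > small[0] and big[1] > small[1]
# ... while the last one, normalized by (n^{1/2} tau_2)^{-2}, vanishes.
n = 10 ** 6
assert big[2] / (n ** 0.5 * n ** -0.25) ** 2 < 1e-6
```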
Let ogp denote a symmetric (p 1; :::; p rg 1 rg rg for ` 1 +` is op ( (rg 1 +`)Fn 1 (since (rg rg are (p rg 1) matrix whose (`; m) element for `; m = 2 1=2 1 rg Fn ) ): rg Fn )+Op ((n nonincreasing in j) and n1=2 1 +m)Fn jFn 1) = We now show by induction over g = 1; :::; Gh that wp!1 f(n1=2 for some (p 2 0 rg Fn Bn;rg rg 1) 1 ;p (p b0 W c0 c b Un0 D n n Wn Dn Un Bn;rg rg are ogp may depend on j): 1) 1 ;p + ogp (Ip rg rg Fn r g Fn ) solve j Note that ogp = op (1) because 1 ! 1 for g = 1; :::; Gh : 2b jn : j = rg bjgn )j = 0 +A 1 + 1; :::; pg (16.21) bjgn = op (1) and ogp (where the matrices that symmetric matrices A The initiation step of the induction proof holds because (16.21) holds with g = 1 by the …rst line 18 bjgn := A bn and ogp = 0 for g = 1 (and using the fact that, for g = 1; r of (16.2) with A g and Bn;rg 1 ;p 1 = r0 := 0 = Bn;0;p = Bn ): For the induction step of the proof, we assume that (16.21) holds for some g 2 f1; :::; Gh 1g and show that it then also holds for g + 1: By an argument analogous to that in (16.3), we have 1 c b rg Fn Wn Dn Un Bn;rg 02 B6 6 !p h 3 B @4 h6;rg 2 R(rg rg 0r g 1 1 ;p 2 6 = (Ik + op (1))Cn 6 4 Diagf (rg rg 3 1) h6;rg 0(k 1) rg ) (rg rg (rg rg 1) 1) 7 k 7;0 5 0rg 1 (p rg 1) rg Fn ; :::; pFn g= rg Fn 0(k p) (p rg 1) 1 3 7 7 + Op ((n1=2 5 C A ; where h6;rg := Diagf1; h6;rg ; :::; (p rg ) C r g Fn ) 1 ) rg 1 Y j=rg 1 +1 h6;j g; (16.22) ; and h6;rg := 1 when rg = 1: Equation (16.22) and h03 h3 = lim Cn0 Cn = Ik yield 2 0 rg Fn Bn;rg 1 ;p b0 W c0 c b Un0 D n n Wn Dn Un Bn;rg 1 ;p 2 0(rg 2 h6;r g !p 4 0(p rg ) (rg rg 1) 0(p rg 1) (p rg ) rg ) (p rg ) 3 5: (16.23) By (16.21) and ogp = op (1); we have wp!1 f(n1=2 rg Fn ) 2 bjn : j = rg 1 + 1; :::; pg solve 0 b 0 c0 c b bjgn ) 1 2 B 0 Ip rg 1 j = 0: Hence, by j(Ip rg 1 + A rg Fn n;rg 1 ;p Un Dn Wn Wn Dn Un Bn;rg 1 ;p + op (1) bjgn = op (1) (which holds by the induction assumption), and the same argument as used (16.23), A to establish (16.6) and (16.7), we obtain bjn !p 1 8j = rg Let 
ogp denote an (rg are op ( (rg +j)Fn = rg Fn ) 1 + 1; :::; rg and (n1=2 rg 1) + Op ((n1=2 (p rg Fn ) 2 bjn !p 0 8j = rg + 1; :::; p: (16.24) rg ) matrix whose elements in column j for j = 1; :::; p rg Fn ) 1 ): rg Note that ogp = op (1): By (16.22) applied once with Bn;rg ;p in place of Bn;rg 19 1 ;p as the far-right multiplicand and applied a second time with Bn;rg 1 ;rg in place of Bn;rg 1 ;p as the far-right multiplicand, we have %gn := 2 0 rg Fn Bn;rg 2 6 =6 4 Diagf 1 ;rg b0 W c0 c b Un0 D n n Wn Dn Un Bn;rg ;p 30 0r g (rg 1 +1)Fn ; :::; rg ) (rg rg 0(k +Op ((n1=2 (rg rg 1 rg Fn ) 1 2 1) rg Fn g= rg Fn 1) ) 7 0 6 7 Cn (Ik + op (1))Cn 6 Diagf 5 4 (p rg ) 0r g (rg +1)Fn ; :::; pFn g= rg Fn 0(k p) (p rg ) = ogp ; 3 7 7 5 (16.25) where %gn 2 R(rg rg 1) (p rg ) ; Diagf (rg 1 +1)Fn ; :::; rg Fn g= rg Fn = h6;rg + o(1) = O(1) and the last equality holds because (i) Cn0 (Ik + op (1))Cn = Ik + op (1); (ii) when Ik appears in place of Cn0 (Ik + op (1))Cn ; then the contribution from the …rst summand on the lhs of the last equality rg in (16.25) equals 0(rg (p rg ) 1) ; and (iii) when op (1) appears in place of Cn0 (Ik + op (1))Cn ; the contribution from the …rst summand on the lhs of the last inequality in (16.25) equals an ogp matrix. bjgn as follows: We partition the (p r ) (p r ) matrices ogp and A g 1 0 ogp = @ b1jgn where o1gp ; A 2 R(p rg ) (p rg ) b 1jgn ( b 2jgn ( b 3jgn ( 2 ) := 2 0 rg Bn;rg o02gp o3gp 1) (rg rg 1 bjgn A and A 1) 3 b1jgn A b2jgn A 5; =4 b0 b3jgn A A 2jgn b2jgn ; o2gp ; A 2 2 R(rg rg 1) 1 ;rg b0 W c0 c b Un0 D n n Wn Dn Un Bn;rg b2jgn ; and A 2 0 0 b 0 c0 c b rg Fn Bn;rg ;p Un Dn Wn Wn Dn Un Bn;rg ;p 1 ;rg + o1gp + o3gp (Irg (Ip rg (16.26) (p rg ) + 1; :::; p and g = 1; :::; Gh : De…ne 1 ) := %gn + o2gp ) := o1gp o2gp rg R(rg ; for j = rg g 1 rg b3jgn ; and o3gp ; A 1 b1jgn ); +A b3jgn ); +A (16.27) where b1jgn ( ); b2jgn ( ); and b3jgn ( ) have the same dimensions as o1gp ; o2gp ; and o3gp ; respectively. 
20 From (16.21), we have wp!1 f(n1=2 0=j 2 0 rg Fn Bn;rg 1 ;p r g Fn ) b n0 W cn0 W cn D b n Un Bn;r Un0 D g = jb1jgn ( )j jb3jgn ( ) b 2jgn ( = jb1jgn ( )j j 2b 1 ;p jn : j = rg + ogp 1 (Ip 1 )0b1jgn ( )b2jgn ( )j 2 0 0 b 0 c0 c b rg Fn Bn;rg ;p Un Dn Wn Wn Dn Un Bn;rg ;p b3jgn A b0 b 1 ( )(%gn + o2gp ) [Ip rg + A 2jgn 1jgn 1 b02jgnb1jgn b2jgn ]j; + A ( )A + 1; :::; pg solve rg + o3gp 1 bjgn )j +A 1 (%gn + o2gp )0b1jgn ( )(%gn + o2gp ) 1 b2jgn (%gn + o2gp )0b1jgn ( )A (16.28) where the second equality holds by the same argument as for (16.10) and uses the result given in (16.29) below which shows that b1jgn ( ) is nonsingular wp!1 when equals (n1=2 rg Fn ) 2 bjn for j = rg + 1; :::; p: Now we show that, for j = rg +1; :::; p; (n1=2 rg Fn ) 2b jn cannot solve the determinantal equation jb1jgn ( )j = 0 for n large, where this determinant is the …rst multiplicand on the rhs of (16.28) and, hence, it must solve the determinantal equation based on the second multiplicand on the rhs of (16.28). For j = rg + 1; :::; p; we have e 1jgn := b1jgn ((n1=2 rg Fn ) 2 2 bjn ) = h6;r + op (1); g (16.29) b1jgn = op (1) (which holds by the by the same argument as in (16.11), using o1gp = op (1) and A b1jgn following (16.21)). Equation (16.29) and min (h 2 ) > 0 establish the result de…nition of A 6;rg stated in the …rst sentence of this paragraph. 
For j = rg +1; :::; p; plugging (n1=2 rg Fn ) gives (n1=2 bj(g+1)n : = A b3jgn A +(n1=2 bj(g+1)n 2 R(p and A = o3gp jn into the second multiplicand on the rhs of (16.28) 2 0 0 b 0 c0 c b rg Fn Bn;rg ;p Un Dn Wn Wn Dn Un Bn;rg ;p 0=j o3gp 2b 2 rg Fn ) bjn (Ip rg bj(g+1)n )j; where +A b0 e 1 (%gn + o2gp ) A 2jgn 1jgn 2 rg Fn ) rg ) (p rg ) b02jgne 1 A b bjn A 1jgn 2jgn 1 (%gn + o2gp )0e1jgn (%gn + o2gp ) 1 b2jgn (%gn + o2gp )0e1jgn A (16.30) : The last two summands on the rhs of the …rst line of (16.30) satisfy 1 (%gn + o2gp )0e1jgn (%gn + o2gp ) = o3gp ogp0 ogp = ( + o3gp 2 2 rg+1 Fn = rg Fn )o(g+1)p ; 21 (ogp + o2gp )0 (h6;rg2 + op (1))(ogp + o2gp ) (16.31) where (i) the …rst equality uses (16.25) and (16.29), (ii) the second equality uses o2gp = ogp (which holds because the (j; m) element of o2gp for j = 1; :::; rg (rg +m)Fn = 2 1=2 1 rg Fn ) ) rg Fn )+Op ((n = op ( rg 1 and m = 1; :::; p rg is op ( (rg +m)Fn = rg Fn )+Op ((n1=2 rg Fn ) rg ) and (h6;rg2 + op (1))ogp = ogp (which holds because h6;rg is diagonal and 1) (rg since rg 2 min (h6;rg ) 1 +j)Fn 1 +j > 0); (iii) the 2 2 0 rg rg Fn = rg+1 Fn )ogp ogp for j; m = 1; :::; p op ( (rg +j)Fn (rg +m)Fn = 2rg Fn )( 2rg Fn = 2rg+1 Fn ) = op ( (rg +j)Fn (rg +m)Fn Op ((n1=2 rg Fn ) 2 )( 2rg Fn = 2rg+1 Fn ) = Op ((n1=2 rg+1 Fn ) 2 ) and, hence, last equality uses the fact that the (j; m) element of ( is the sum of a term that is = ( 2 rg+1 Fn ) and a term 2 2 0 rg Fn = rg+1 Fn )ogp ogp that is is o(g+1)p (using the de…nition of o(g+1)p ); and (iv) the last equality uses the 2 2 rg Fn = rg+1 Fn )o3gp for j; m = Op ((n1=2 rg Fn ) 1 )( 2rg Fn = 2rg+1 Fn ) fact that the (j; m) element of ( 2 2 2 rg Fn )( rg Fn = rg+1 Fn ) + +Op ((n1=2 rg+1 Fn ) 1 )( rg Fn = rg+1 Fn ); = (using rg Fn = rg+1 Fn 1; :::; p = rg is op ( (rg +j)Fn (rg +m)Fn 2 (rg +j)Fn (rg +m)Fn = rg+1 Fn ) op ( which again is the same order as the (j; m) element of o(g+1)p 1): The calculations in (16.31) are a key part of the induction proof. 
The definitions of the two $o$-type remainder terms (given preceding (16.21) and (16.25), respectively) are chosen so that the results in (16.31) hold.

For $j = r_g + 1, \ldots, p$, we have

$$\hat{A}_{j(g+1)n} = o_p(1), \qquad (16.32)$$

because $\hat{A}_{2jgn} = o_p(1)$ and $\hat{A}_{3jgn} = o_p(1)$ (by (16.21)), $\tilde{b}_{1jgn}^{-1} = O_p(1)$ (by (16.29)), $\varrho_{gn} + o_{2gp} = o_p(1)$ (by (16.25), since the $o$-type terms are $o_p(1)$), and $(n^{1/2}\tau_{r_g F_n})^{-2}\hat{\lambda}_{jn} = o_p(1)$ (by (16.24)).

Inserting (16.31) and (16.32) into (16.30) and multiplying by $\tau_{r_g F_n}^2 / \tau_{r_{g+1} F_n}^2$ gives

$$0 = \big| n\tau_{r_{g+1} F_n}^{-2}\, B_{n,r_g,p}' U_n' \hat{D}_n' \hat{W}_n' \hat{W}_n \hat{D}_n U_n B_{n,r_g,p} + o_{(g+1)p} - (n^{1/2}\tau_{r_{g+1} F_n})^{-2}\hat{\lambda}_{jn}\,(I_{p-r_g} + \hat{A}_{j(g+1)n}) \big|. \qquad (16.33)$$

Thus, wp→1, $\{(n^{1/2}\tau_{r_{g+1} F_n})^{-2}\hat{\lambda}_{jn} : j = r_g + 1, \ldots, p\}$ solve

$$0 = \big| n\tau_{r_{g+1} F_n}^{-2}\, B_{n,r_g,p}' U_n' \hat{D}_n' \hat{W}_n' \hat{W}_n \hat{D}_n U_n B_{n,r_g,p} + o_{(g+1)p} - \lambda\,(I_{p-r_g} + \hat{A}_{j(g+1)n}) \big|. \qquad (16.34)$$

This establishes the induction step and concludes the proof that (16.21) holds for all $g = 1, \ldots, G_h$. Finally, given that (16.21) holds for all $g = 1, \ldots, G_h$, (16.24) gives the results stated in (16.18), and (16.18) gives the results stated in the Lemma by the argument in (16.18)-(16.20).

Now we use the approach in Johansen (1991, pp. 1569-1571) and Robin and Smith (2000, pp. 172-173) to prove Theorem 8.4. In these papers, asymptotic results are established under a fixed true distribution under which certain population eigenvalues are either positive or zero. Here we need to deal with drifting sequences of distributions under which these population eigenvalues may be positive or zero for any given $n$, but the positive ones may drift to zero as $n \to \infty$, possibly at different rates. This complicates the proof. In particular, the rate of convergence result of Lemma 16.1(b) is needed in the present context, but not in the fixed distribution scenario considered in Johansen (1991) and Robin and Smith (2000).

Proof of Theorem 8.4. Theorem 8.4(a) and (c) follow immediately from Lemma 16.1(a). Next, we assume $q < p$ and we prove part (b).
The eigenvalues fbjn : j pg of nU bn b nU b n0 W cn0 W cn D bn are the ordered solutions to the determinantal equation jnU bn D b nU D Ip j = 0: Equivalently, with probability that goes to one (wp!1), they are the solutions to b n Un Bn Sn b n0 W cn0 W cn D jQn ( )j = 0; where Qn ( ) := nSn Bn0 Un0 D bn 10 U bn 1 Un Bn Sn ; Sn0 Bn0 Un0 U (16.35) bn j > 0 wp!1. Thus, because jSn j > 0; jBn j > 0; jUn j > 0; and jU b 0 b 0 c0 c b b min (nUn Dn Wn Wn Dn Un ) equals the smallest solution, bpn ; to jQn ( )j = 0 wp!1. (For simplicity, we omit the quali…er wp!1 that applies to several statements below.) We write Qn ( ) in partitioned form using Bn Sn = (Bn;q Sn;q ; Bn;p Sn;q := Diagf(n1=2 q ); 1Fn ) 1 where ; :::; (n1=2 qFn ) 1 g 2 Rq q : (16.36) b n Un Tn (= n1=2 Wn D b n Un Bn Sn ) can be written The convergence result of Lemma 8.3 for n1=2 Wn D as where b n Un Bn;q Sn;q !p n1=2 Wn D h;q and h;p q h;q b n Un Bn;p := h3;q and n1=2 Wn D q !d h;p q ; (16.37) are de…ned in (8.17). We have cn W 1 !p Ik and U bn U 1 !p Ip W n n (16.38) cn !p h71 := lim Wn (by Assumption WU(a) and (c)), U bn !p h81 := lim Un (by Assumpbecause W tion WU(b) and (c)), and h71 and h81 are pd (by the conditions in FW U ): 23 By (16.35)-(16.38), we have 2 3 b n Un Bn;p q + op (1) Iq + op (1) h03;q n1=2 Wn D 5 Qn ( ) = 4 0 0D 0 W0h 1=2 B 0 0D b 0 W 0 Wn n1=2 D b n Un Bn;p q + op (1) b n1=2 Bn;p U + o (1) n U 3;q p q n n n n;p q n n n 3 2 3 2 2 Sn;q 0q (p q) Sn;q A1n Sn;q Sn;q A2n 5 ; where 4 5 4 (16.39) A02n Sn;q A3n 0(p q) q Ip q 3 2 A A2n bn 10 U bn 1 Un Bn Ip = op (1) for A1n 2 Rq q ; A2n 2 Rq (p q) ; bn = 4 1n 5 := Bn0 Un0 U A 0 A2n A3n and A3n 2 R(p q) (p q) ; 0 bn is de…ned in (16.39) just as in (16.5), and the …rst equality uses A 0 C := h3;q and h;q h;q = h03;q h3;q = lim Cn;q n;q = Iq (by (8.7), (8.9), (8.12), and (8.17)). Note bjn (de…ned in (16.2)) are not the same in general for j = 1; 2; 3; because their that Ajn and A b1n 2 Rr1 r1 : dimensions di¤er. 
For example, A1n 2 Rq q ; whereas A h;q If q = 0 (< p); then Bn = Bn;p q and bn0 D b n0 W cn0 W cn D b nU bn Bn nBn0 U 1 bn )0 B 10 B 0 U 0 D b0 0 c = nBn0 (Un 1 U n n n n Wn Wn Wn !d 0 h;p q 0 h;p q ; cn W 1 (Wn D b n Un Bn )B 1 (U 1 U bn )Bn W n n n (16.40) where the convergence holds by (16.37) and (16.38) and h;p q is de…ned as in (8.17) with q = 0: The smallest eigenvalue of a matrix is a continuous function of the matrix (by Elsner’s Theorem, see cn D b nU bn Bn cn0 W b n0 W bn0 D Stewart (2001, Thm. 3.1, pp. 37–38)). Hence, the smallest eigenvalue of nBn0 U converges in distribution to the smallest eigenvalue of 0 0 h;p q h3;k q h3;k q h;p q (using h3;k 0 q h3;k q = h3 h03 = Ik when q = 0), which proves part (b) of Theorem 8.4 when q = 0: In the remainder of the proof of part (b), we assume 1 q < p; which is the remaining case to be considered in the proof of part (b). The formula for the determinant of a partitioned matrix and (16.39) give jQn ( )j = jQ1n ( )j jQ2n ( )j; where Q1n ( ) : = Iq + op (1) 0 Q2n ( ) : = n1=2 Bn;p 2 Sn;q Sn;q A1n Sn;q ; 0 b0 0 1=2 b Dn Un Bn;p q q Un Dn Wn Wn n 0 [n1=2 Bn;p 0 0 b0 q Un Dn Wn h3;q b n Un Bn;p [h03;q n1=2 Wn D q + op (1) Ip q + op (1) A02n Sn;q ](Iq + op (1) + op (1) Sn;q A2n ]; 24 A3n 2 Sn;q Sn;q A1n Sn;q ) 1 (16.41) none of the op (1) terms depend on ; and the equation in the …rst line holds provided Q1n ( ) is nonsingular. By Lemma 16.1(b) (which applies for 1 and bjn Sn;q A1n Sn;q = op (1): Thus, 2 = o (1) q < p); for j = q + 1; :::; p; we have bjn Sn;q p 2 bjn Sn;q Q1n (bjn ) = Iq + op (1) bjn Sn;q A1n Sn;q = Iq + op (1): (16.42) By (16.35) and (16.41), jQn (bjn )j = jQ1n (bjn )j jQ2n (bjn )j = 0 for j = 1; :::; p: By (16.42), jQ1n (bjn )j = 6 0 for j = q + 1; :::; p wp!1. Hence, wp!1, jQ2n (bjn )j = 0 for j = q + 1; :::; p: (16.43) Now we plug in bjn for j = q + 1; :::; p into Q2n ( ) in (16.41) and use (16.42). 
We have 0 b0 0 b q Un Dn Wn Wn Dn Un Bn;p q 0 Q2n (bjn ) = nBn;p 0 [n1=2 Bn;p bjn [Ip q 0 b0 0 q Un Dn Wn h3;q + A3n + op (1) b n Un Bn;p + op (1)](Iq + op (1))[h03;q n1=2 Wn D 0 (n1=2 Bn;p 0 b0 0 q Un Dn Wn h3;q b n Un Bn;p A02n Sn;q (Iq + op (1))(h03;q n1=2 Wn D q + op (1)] + op (1))(Iq + op (1))Sn;q A2n q + op (1)) +bjn A02n Sn;q (Iq + op (1))Sn;q A2n ]: (16.44) The term in square brackets on the last three lines of (16.44) that multiplies bjn equals Ip q + op (1); b n Un Bn;p because A3n = op (1) (by (16.39)), n1=2 Wn D q (16.45) = Op (1) (by (16.37)), Sn;q = o(1) (by the de…nitions of q and Sn;q in (8.16) and (16.36), respectively, and h1;j := lim n1=2 jFn ); A2n = op (1) 2 A +A0 b S (by (16.39)), and bjn A02n Sn;q (Iq +op (1))Sn;q A2n = A02n bjn Sn;q 2n 2n jn n;q op (1)Sn;q A2n = op (1) 2 = o (1) and A (using bjn Sn;q p 2n = op (1)): Equations (16.44) and (16.45) give 0 Q2n (bjn ) = n1=2 Bn;p 0 = n1=2 Bn;p := Mn;p q 0 b0 0 q Un Dn Wn [Ik b n Un Bn;p h3;q h03;q ]n1=2 Wn D 0 b0 0 0 1=2 b n Un Bn;p q Wn D q Un Dn Wn h3;k q h3;k q n bjn [Ip q q + op (1) + op (1) + op (1)]; where the second equality uses Ik = h3 h03 = h3;q h03;q + h3;k 25 0 q h3;k q bjn [Ip bjn [Ip q q + op (1)] + op (1)] (16.46) (because h3 = lim Cn is an orthogonal matrix) and the last line de…nes the (p q) (p q) matrix Mn;p Equations (16.43) and (16.46) imply that fbjn : j = q + 1; :::; pg are the p q: q eigenvalues of the matrix Mn;p q := [Ip q + op (1)] 1=2 Mn;p q [Ip q + op (1)] 1=2 by pre- and post-multiplying the quantities in (16.46) by the rhs quantity [Ip (16.47) q 1=2 + op (1)] in (16.46). By (16.37), Mn;p q !d 0 0 h;p q h3;k q h3;k q h;p q : (16.48) The vector of (ordered) eigenvalues of a matrix is a continuous function of the matrix (by Elsner’s Theorem, see Stewart (2001, Thm. 3.1, pp. 37–38)). By (16.48), the matrix Mn;p converges in distribution. 
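The continuity of eigenvalues in the underlying matrix, used here via Elsner's Theorem and the CMT, is quantitative in the symmetric case: by Weyl's inequality, each ordered eigenvalue moves by at most the spectral norm of the perturbation. A small numerical illustration (arbitrary matrices, not those of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)

A = rng.standard_normal((5, 5))
A = (A + A.T) / 2.0                       # symmetric
E = 1e-6 * rng.standard_normal((5, 5))
E = (E + E.T) / 2.0                       # small symmetric perturbation

eig_A = np.linalg.eigvalsh(A)             # eigvalsh returns ascending eigenvalues
eig_AE = np.linalg.eigvalsh(A + E)

# Weyl's inequality: |lambda_j(A + E) - lambda_j(A)| <= ||E||_2 for every j.
assert np.all(np.abs(eig_AE - eig_A) <= np.linalg.norm(E, 2) + 1e-12)
```

This is why convergence in distribution of the matrix carries over to its vector of ordered eigenvalues, and in particular to the smallest one.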
In consequence, by the CMT, the vector of eigenvalues of Mn;p q; q viz., fbjn : j = q + 1; :::; pg; converges in distribution to the vector of eigenvalues of the limit matrix 0 b 0 b 0 c0 h3;k q h0 h;p q ; which proves part (d) of Theorem 8.4. In addition, min (nU D W h;p q n 3;k q n n cn D b nU bn ); which equals the smallest eigenvalue, bpn ; converges in distribution to the smallest W eigenvalue of 0 0 h;p q h3;k q h3;k q h;p q ; which completes the proof of part (b) of Theorem 8.4. The convergence in parts (a)-(d) of Theorem 8.4 is joint with that in Lemma 8.3 because it b n Un Tn ; which is part of the former. This just relies on the convergence in distribution of n1=2 Wn D establishes part (e) of Theorem 8.4. Part (f) of Theorem 8.4 holds by the same proof as used for parts (a)-(e) with n replaced by wn : 17 Proofs of Su¢ ciency of Several Conditions for the p j( ) Condition in F0j In this section, we show that the conditions in (3.9) and (3.10) are su¢ cient for the second condition in F0j ; which is p j( 0 CF;k F 1=2 j F Gi BF;p 26 j ) 1 8 2 Rp j with jj jj = 1: Condition (i) in (3.9) is su¢ cient by the following argument: p j 0 CF;k F p j C F;p F 1=2 0 = ( min = 0 j F Gi BF;p j j) vec(C F;p F F j 1=2 Ip min 2Rp j :jj Gi BF;p j jj=1 ( jj( Ip Ip 0 vec(C F;p F min 2 2R(p j) :jj = jj=1 0 vec(C F;p F min 0 1=2 F F Gi BF;p 0 j) j) 0 j 1=2 j jj j j) 0 1=2 Gi BF;p Ip 1=2 vec(C F;p F F ( j F j) Gi BF;p j) j) ( jj( min 2Rp Gi BF;p j) j :jj jj=1 Ip Ip jj( F 0 CF;k j 0 C F;p j 1=2 F Gi BF;p 1=2 F j Gi BF;p j CF;k 0 CF;k F 0 are a collection of p j rows of CF;k eigenvalue of a (p j) (p j ); j 1=2 j F Gi BF;p replaced by j j) jj2 jj2 j) j and r = k 0 C F;p j ; p (because 1=2 0 CF;k F ; since j F Gi BF;p j = and by de…nition the rows of the …rst equality holds because the (p j)-th largest j) matrix equals its minimum eigenvalue and by the general formula vec(ABC) = (C 0 A)vec(B); and the last equality holds because jj( 0 Ip (17.1) 0 likewise with CF;k j; jj( jj ; is a 
submatrix of F j) Ip where the …rst inequality holds by Corollary 15.4(a) with m = p 0 C F;p j j) Ip j) 0 jj2 = ( 0 Ip j) = = 1 using jj jj = jj jj = 1: Condition (ii) in (3.9) is su¢ cient by su¢ cient condition (i) in (3.9) and the following: 0 min = j 2 2R(p j) :jj jj=1 C F;p j min 2R(p j)k :jj min jj=1 vec( F 0 1=2 F F (Ip jj(Ip min jj(Ip = 1=2 vec(C F;p F j) of 0 C F;p j j 2 jj Gi BF;p j j) C F;p C F;p j vec( F where the last equality uses jj(Ip Gi BF;p 1=2 F 0 j) j) Gi BF;p jj j) vec( F 1=2 F min 2 2R(p j) :jj j) Gi BF;p jj=1 j) jj(Ip (Ip jj(Ip j ; C F;p j j C F;p C F;p C F;p j) j) j) jj jj2 (17.2) j) jj2 = 0 (I p j 0 C F;p j C F;p j ) = 1 because the rows are orthonormal and jj jj = 1: Condition (iii) in (3.9) is su¢ cient by su¢ cient condition (ii) in (3.9) and a similar argument to that given in (17.2) using the fact that min of BF;p j 0 2Rpk :jj jj=1 jj(BF;p j are orthonormal. 27 Ik ) jj2 = 1 because the columns Condition (iv) in (3.9) is su¢ cient by su¢ cient condition (iii) in (3.9) and a similar argument to that given in (17.2) using min of F in place of min following calculations: 0 2R(p 1 (Ip F 1=2 2Rpk :jj jj=1 jj(Ip j)2 :jj ) = jj=1 p X j=1 p X jj(Ip F C F;p j ( j =jj j jj)0 min ( F 1 1 F =( 0 1 ; :::; 0 0 p) for 2 Rk 8j j jj2 2=(2+ ) for M as in the de…nition = 1: The latter inequality holds by the ( j =jj j jj) jj j jj2 max ( F ) max ( F ) ((EF jjgi jj2+ )1=(2+ ) )2 M 2=(2+ ) 2=(2+ ) M p; the sums are over j for which ity uses jj jj = 1; and the last inequality holds because EF jjgi jj2 = ((EF jjgi jj2 )1=2 )2 M jj j jj2 = 1= ) j=1 where j) ) jj2 j ; (17.3) 6= 0k ; the second equal- = max 2Rk :jj jj=1 EF ( 0 gi )2 by successively applying the Cauchy-Bunyakovsky-Schwarz inequality, Lyapunov’s inequality, and the moment bound EF jjgi jj2+ M in F: Conditions (v) and (vi) in (3.9) are su¢ cient by the following argument. 
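The display (17.1) above relies on the general formula $vec(ABC) = (C' \otimes A)\,vec(B)$. A quick numerical check of this Kronecker identity (illustrative only; dimensions arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))
C = rng.standard_normal((4, 5))

vec = lambda X: X.flatten(order="F")      # vec(.) stacks columns

# vec(ABC) = (C' kron A) vec(B)
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
```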
Write

$$\Omega_F^{vec(G_i)} = (M_F, I_{pk})\, \Omega_F^{f_i}\, (M_F, I_{pk})', \text{ where } M_F = -(E_F vec(G_i) g_i')(E_F g_i g_i')^{-1} \in R^{pk \times k}. \qquad (17.4)$$

We have

$$\lambda_{\min}(\Omega_F^{vec(G_i)}) = \min_{\lambda \in R^{pk}: ||\lambda||=1} \lambda'(M_F, I_{pk})\, \Omega_F^{f_i}\, (M_F, I_{pk})'\lambda = \min_{\lambda \in R^{pk}: ||\lambda||=1} \frac{\lambda'(M_F, I_{pk})\, \Omega_F^{f_i}\, (M_F, I_{pk})'\lambda}{||(M_F, I_{pk})'\lambda||^2}\, ||(M_F, I_{pk})'\lambda||^2 \geq \min_{\mu \in R^{(p+1)k}: ||\mu||=1} \mu'\, \Omega_F^{f_i}\, \mu = \lambda_{\min}(\Omega_F^{f_i}), \qquad (17.5)$$

where the inequality uses $||(M_F, I_{pk})'\lambda||^2 = \lambda'\lambda + \lambda' M_F M_F' \lambda \geq 1$ for $\lambda \in R^{pk}$ with $||\lambda|| = 1$. This shows that condition (v) is sufficient for sufficient condition (iv) in (3.9). Since $\Omega_F^{f_i} = Var_F(f_i) + E_F f_i\, E_F f_i'$, condition (vi) is sufficient for sufficient condition (v) in (3.9).

The condition in (3.10) is sufficient by the following argument:

$$\sigma_{p-j}(C_{F,k-j}'\, \Omega_F^{-1/2}\, E_F G_i\, B_{F,p-j}) \geq \sigma_p(C_F'\, \Omega_F^{-1/2}\, E_F G_i\, B_{F,p-j}) = \sigma_p(\Omega_F^{-1/2}\, E_F G_i\, B_{F,p-j}), \qquad (17.6)$$

where the first inequality holds by Corollary 15.4(b) with $m = p$ and $r = j$, and the equality holds because $C_F'\, \Omega_F^{-1/2}\, E_F G_i\, B_{F,p-j}$ and $\Omega_F^{-1/2}\, E_F G_i\, B_{F,p-j}$ have the same singular values, $C_F$ being orthogonal.

18 Asymptotic Size of Kleibergen's CLR Test with Jacobian-Variance Weighting and the Proof of Theorem 5.1

In this section, we establish the asymptotic size of Kleibergen's CLR test with Jacobian-variance weighting when the Robin and Smith (2000) rank statistic (defined in (5.5)) is employed. This rank statistic depends on a variance matrix estimator $\tilde{V}_{Dn}$. See Section 5 for the definition of the test. We provide a formula for the asymptotic size of the test that depends on the specifics of the moment conditions considered and does not necessarily equal its nominal size $\alpha$. First, in Section 18.1, we provide an example that illustrates the results in Theorem 5.1 and Comment (v) to Theorem 5.1. In Section 18.2, we establish the asymptotic size of the test based on $\tilde{V}_{Dn}$ defined as in (5.3). In Section 18.3, we report some simulation results for a linear instrumental variable (IV) model with two rhs endogenous variables.
In Section 18.4, we establish the asymptotic size of Kleibergen's CLR test with Jacobian-variance weighting under a general assumption that allows for other definitions of $\tilde{V}_{Dn}$. In Section 18.5, we show that equally-weighted versions of Kleibergen's CLR test have correct asymptotic size when the Robin and Smith (2000) rank statistic is employed and a general equal-weighting matrix $\tilde{W}_n$ is employed. This result extends the result given in Theorem 6.1 in Section 6, which applies to the specific case where $\tilde{W}_n = \hat{\Omega}_n^{-1/2}$, as in (6.2). The results of Section 18.5 are a relatively simple by-product of the results in Section 18.4. Proofs of the results stated in this section are given in Section 18.6. Theorem 5.1 follows from Lemma 18.2 and Theorem 18.3, which are stated in Section 18.4.

18.1 An Example

Here we provide a simple example that illustrates the result of Theorem 5.1. In this example, the true distribution $F$ does not depend on $n$. Suppose $p = 2$, $E_F G_i = (1_k, 0_k)$, where $c_k = (c, \ldots, c)' \in R^k$ for $c = 0, 1$, and $n^{1/2}(\hat{D}_n - E_F G_i) \to_d \overline{D}_h$ under $F$ for some random matrix $\overline{D}_h = (\overline{D}_{1h}, \overline{D}_{2h}) \in R^{k \times 2}$. Suppose for $\tilde{M}_n = \tilde{V}_{Dn}^{-1/2}$ and $M_F = I_{2k}$, we have $n^{1/2}(\tilde{M}_n - M_F) \to_d \overline{M}_h$ under $F$ for some random matrix $\overline{M}_h \in R^{2k \times 2k}$.$^{53}$ We have

$$\hat{D}_n^y = vec_{k,p}^{-1}(\tilde{V}_{Dn}^{-1/2} vec(\hat{D}_n)) = (\tilde{M}_{11n}\hat{D}_{1n} + \tilde{M}_{12n}\hat{D}_{2n},\ \tilde{M}_{21n}\hat{D}_{1n} + \tilde{M}_{22n}\hat{D}_{2n}), \qquad (18.1)$$

$^{53}$ The convergence results $n^{1/2}(\hat{D}_n - E_F G_i) \to_d \overline{D}_h$ and $n^{1/2}(\tilde{M}_n - M_F) \to_d \overline{M}_h$ are established in Lemmas 8.2 and 18.2, respectively, in Section 8 of AG1 and Section 18 in this Supplemental Material under general conditions.
where $\hat{D}_n = (\hat{D}_{1n}, \hat{D}_{2n})$, $\tilde{M}_{j\ell n}$ for $j, \ell = 1, 2$ are the four $k \times k$ submatrices of $\tilde{M}_n$, and likewise for $M_{j\ell F}$ for $j, \ell = 1, 2$. Let $\overline{M}_{j\ell h}$ for $j, \ell = 1, 2$ denote the four $k \times k$ submatrices of $\overline{M}_h$. We let $T_n^y = Diag\{n^{-1/2}, 1\}$. Then, we have

$$n^{1/2}\hat{D}_n^y T_n^y = (\tilde{M}_{11n}\hat{D}_{1n} + \tilde{M}_{12n}\hat{D}_{2n},\ n^{1/2}\tilde{M}_{21n}\hat{D}_{1n} + \tilde{M}_{22n}\, n^{1/2}\hat{D}_{2n}) \to_d (I_k 1_k + 0_k,\ \overline{M}_{21h} 1_k + I_k \overline{D}_{2h}) = (1_k,\ \overline{M}_{21h} 1_k + \overline{D}_{2h}), \qquad (18.2)$$

where the convergence uses $n^{1/2}\tilde{M}_{21n} \to_d \overline{M}_{21h}$ (because $M_{21F} = 0_{k \times k}$) and $n^{1/2}\hat{D}_{2n} \to_d \overline{D}_{2h}$ (because $E_F G_{i2} = 0_k$). Equation (18.2) shows that the asymptotic distribution of $n^{1/2}\hat{D}_n^y T_n^y$ depends on the randomness of the variance estimator $\tilde{V}_{Dn}$ through $\overline{M}_{21h}$.

It may appear that this example is quite special and the asymptotic behavior in (18.2) only arises in special circumstances, because $E_F G_i = (1_k, 0_k)$, $M_{21F} = 0_{k \times k}$, and $M_F = I_{2k}$ in this example. But this is not true. The asymptotic behavior in (18.2) arises quite generally, as shown in Theorem 5.1, whenever $p \geq 2$.$^{54}$

If one replaces $\tilde{V}_{Dn}$ by its probability limit in the definition of $\hat{D}_n^y$, then the calculations in (18.2) hold but with $n^{1/2}\tilde{M}_{21n}$ replaced by $n^{1/2}M_{21F} = 0_{k \times k}$ in the first line and, hence, $\overline{M}_{21h}$ replaced by $0_{k \times k}$ in the second line. Hence, in this case, the asymptotic distribution only depends on $\overline{D}_h$. Hence, Comment (iv) to Theorem 5.1 holds in this example.

Suppose one defines $\hat{D}_n^y$ by $\tilde{W}_n \hat{D}_n$, as in Comment (v) to Theorem 5.1. This yields equal weighting of each column of $\hat{D}_n$. This is equivalent to replacing $\tilde{V}_{Dn}^{-1/2}$ by $I_2 \otimes \tilde{W}_n$ in the definition of $\hat{D}_n^y$ in (18.1). In this case, the off-diagonal $k \times k$ blocks of $I_2 \otimes \tilde{W}_n$ are $0_{k \times k}$, which implies that $\tilde{M}_{21n}$ in the first line of (18.2) equals $0_{k \times k}$ and, hence, $\overline{M}_{21h} = 0_{k \times k}$ in the second line of (18.2).
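The block algebra behind (18.1) is that, for a weight matrix $M$ partitioned into $k \times k$ blocks $M_{j\ell}$, the matrix $vec_{k,p}^{-1}(M\, vec(D))$ has $i$-th column $\sum_j M_{ij} D_j$, where $D = (D_1, \ldots, D_p)$. A numerical check of this fact (illustrative only, with $k = 3$, $p = 2$):

```python
import numpy as np

rng = np.random.default_rng(3)
k, p = 3, 2

M = rng.standard_normal((k * p, k * p))   # arbitrary weight matrix
D = rng.standard_normal((k, p))

# vec^{-1}_{k,p}(M vec(D)) computed directly (column-major vec) ...
lhs = (M @ D.flatten(order="F")).reshape((k, p), order="F")

# ... equals the matrix whose i-th column is sum_j M_ij D_j.
blocks = [[M[i * k:(i + 1) * k, j * k:(j + 1) * k] for j in range(p)] for i in range(p)]
rhs = np.column_stack([sum(blocks[i][j] @ D[:, j] for j in range(p)) for i in range(p)])
assert np.allclose(lhs, rhs)
```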
Thus, the asymptotic distribution of $\hat{D}_n^y$ does not depend on the asymptotic distribution of the (normalized) weight matrix estimator $\tilde{W}_n$. It only depends on the probability limit of $\tilde{W}_n$, as stated in Comment (v) to Theorem 5.1.

18.2 Asymptotic Size of Kleibergen's CLR Test with Jacobian-Variance Weighting

In this subsection, we determine the asymptotic size of Kleibergen's CLR test when $\hat{D}_n$ is weighted by $\tilde{V}_{Dn}$, defined in (5.3), which yields what we call Jacobian-variance weighting, and the Robin and Smith (2000) rank statistic is employed. This rank statistic is defined in (5.5) with

$^{54}$ When the matrix $M_{21F} \neq 0_{k \times k}$, the argument in (18.2) does not go through because $n^{1/2}\tilde{M}_{21n}$ does not converge in distribution (since $n^{1/2}(\tilde{M}_{21n} - M_{21F}) \to_d \overline{M}_{21h}$ by assumption). In this case, one has to alter the definition of $T_n^y$ so that it rotates the columns of $\hat{D}_n$ before rescaling them. The rotation required depends on both $M_F$ and $E_F G_i$.
The moment condition in FKCLR is imposed because the asymptotic distribution of the rank statistic rkny depends on a triangular array CLT for vech(fi fi 0 ); which employs 4 + moments for fi ; where fi := (gi0 ; vec(Gi EFn Gi )0 )0 as in (5.6). The min ( ) condition in FKCLR ensures that VeDn is positive de…nite wp!1; which is 1=2 needed because VeDn enters the rank statistic rkny via Ve ; see (18.3). Dn For a …xed distribution F; VeDn estimates de…nition in (8.15) and the MF 2 6 6 =6 4 DFy := M11F .. . Mp1F p X j=1 .. min ( . vec(Gi ) F de…ned in (8.15), where vec(Gi ) F is pd by its ) condition in FKCLR :56 Let 3 M1pF 7 7 .. 7 := ( . 5 MppF vec(Gi ) 1=2 ) F (M1jF EF Gij ; :::; MpjF EF Gij ) 2 Rk p and ; where Gi = (Gi1 ; :::; Gip ) 2 Rk (18.6) p : As in Section 5, the function veck;p1 ( ) is the inverse of the vec( ) function for k p matrices. Thus, the domain of veck;p1 ( ) consists of kp-vectors and its range consists of k p matrices. vec(Gi ) vec(Gi ) 56 More speci…cally, is pd because by (8.15) F := V arF (vec(Gi ) (EF vec(G` )g`0 ) F 1 gi ) F 1 1 0 0 0 0 0 0 = ( (EF vec(G` )g` ) F ; Ipk )V arF ((gi ; vec(Gi ) ) )( (EF vec(G` )g` ) F ; Ipk ) ; where ( (EF vec(G` )g`0 ) F 1 ; Ipk ) 2 Rpk (p+1)k has full row rank pk and V arF ((gi0 ; vec(Gi )0 )0 ) is pd by the min ( ) condition in FKCLR : 55 31 y y 1F ; :::; pF ) Let ( denote the singular values of DFy : De…ne BFy 2 Rp p to be an orthogonal matrix of eigenvectors of DFy0 DFy and CFy 2 Rk k to be an orthogonal matrix of eigenvectors of DFy DFy0 y 1F ; :::; y 2 jF ) for j y pF ) ordered so that the corresponding eigenvalues ( y jF tively, are nonincreasing. We have =( and ( y 1F ; :::; (18.7) y pF ; 0; :::; 0) 2 Rk ; respec- = 1; :::; p: Note that (18.7) gives de…nitions of BF and CF that are similar to the de…nitions in (8.6) and (8.7), but di¤er because DFy replaces WF (EF Gi )UF in the de…nitions. 
De…ne ( 1;F ; :::; 9;F ) as in (8.9) with 1=2 = WF = 7;F F ; 8;F = Ip ; and W1 ( ) and U1 ( ) equal to identity functions. De…ne 10;F 0 = V arF @ 1 fi vech (fi fi A 2 Rd 0) y 1;F ; and CFy where d := (p+1)k+(p+1)k((p+1)k+1)=2: De…ne ( y jF are de…ned in (8.9) but with f :j pg; BFy ; y 2;F ; d y 3;F ; ; (18.8) y 6;F ) in place of f as ( jF 1;F ; :j 2;F ; 3;F ; 6;F ) pg; BF ; and CF ; respectively. De…ne = KCLR F := ( := f : =( hn ( ) : = (n1=2 Let f n;h 2 KCLR 1;F ; :::; 1;F ; :::; 1;F ; :n y 1;F ; 10;F ; 2;F ; 10;F ; 3;F ; y 2;F ; y 1;F ; 4;F ; y 3;F ; 6;F ; 1g denote a sequence f n 7;F ; 2 (18.9) y 6;F ) y 3;F ; y 2;F ; 5;F ; y 6;F ); for some F 2 FKCLR g; and 8;F ; KCLR 10;F ; n :n bn for H as in (8.1). The asymptotic variance of n1=2 vec(D KCLR :n q; and h1;p 1g for which hn ( EFn Gi ) is vec(Gi ) h n) y 6;F ): ! h 2 H; under f n;h 2 q p and hs for s = 2; :::; 8 as in (8.12), q = qh as in (8.16), h2;q ; h2;p as in (8.17), and h8 = Ip due to the de…nitions of upper left k y 3;F ; y 2;F ; 1g by Lemma 8.2. De…ne h1;j for j h3;p 1=2 y 1;F ; 7;F n; n;q ; and 8;F and n;p q as in (13.2). Note that h7 = q ; h3;q ; 1=2 h5;g and given above, where h5;g (= lim EFn gi gi0 ) denotes the k submatrix of h5 ; as in Section 8. For a sequence f 2 h10 = 4 n;h 2 h10;f h10;f 2f KCLR h10;f h10;f :n f 2 2f 2 1g; we have 0 3 5 := lim V arFn @ 32 fi vech (fi fi 0 ) 1 A 2 Rd d : (18.10) Note that h10;f 2 R(p+1)k With y jF ; BFy ; and CFy (p+1)k is pd by the de…nition of FKCLR in (18.5). in place of jF ; BF ; and CF ; respectively, de…ne hy1;j for j p and hys for s = 2; 3; 6 as in (8.12) as analogues to the quantities without the y superscript, de…ne q y = qhy as in (8.16), de…ne hy2;qy ; hy2;p y n;p q y qy ; hy3;qy ; hy3;k qy ; and hy1;p qy y n; as in (8.17), and de…ne y ; n;q y and as in (13.2). 
The quantity q y determines the asymptotic behavior of rkny : By de…nition, q y is the largest value j ( below that if qy p) for which lim n1=2 = p; then rkny y jFn = 1 under f qy !p 1; whereas if < p; then n;h 2 KCLR : n rkny converges in 1g: It is shown distribution to a nondegenerate random variable, see Lemma 18.4. By the CLT, for any sequence f n 1=2 n X i=1 0 0 @ n;h 2 KCLR fi vech (fi fi 0 0 EFn fi fi 0) 0 :n 1g; 1 A !d Lh N (0d ; h10 ); where Lh = (Lh;1 ; Lh;2 ; Lh;3 )0 for Lh;1 2 Rk ; Lh;2 2 Rkp ; and Lh;3 2 R(p+1)k((p+1)k+1)=2 (18.11) and the CLT holds using the moment conditions in FKCLR : Note that by the de…nitions of h4 := lim EFn Gi and h5 := lim EFn (gi0 ; vec(Gi )0 )0 (gi0 ; vec(Gi )0 ); we have 2 h10;f = 4 for h5;g 2 Rk k; h5;g h5;gG h5;Gg h5;G h5;Gg 2 Rkp k; vec(h4 )vec(h4 )0 and h5;G 2 Rkp 3 2 5 ; where h5 = 4 h5;g h5;gG h5;Gg h5;G 3 5 (18.12) kp : We now provide new, but distributionally equivalent, de…nitions of g h and Dh : g h := Lh;1 and vec(Dh ) := Lh;2 h5;Gg h5;g1 Lh;1 : (18.13) These de…nitions are distributionally equivalent to the previous de…nitions of g h and Dh given in Lemma 8.2, because by either set of de…nitions g h and vec(Dh ) are independent mean zero random vectors with variance matrices h5;g and respectively, where min ( vec(Gi ) ) Fn vec(Gi ) h vec(Gi ) h (= h5;G vec(h4 )vec(h4 )0 h5;Gg h5;g1 h05;Gg ); is de…ned in (8.15) and is pd (because vec(Gi ) h is bounded away from zero by its de…nition in (8.15) and the FKCLR ): 33 = lim min ( vec(Gi ) Fn and ) condition in De…ne y Dh := p X j=1 2 M 6 11h 6 . p ; where 6 .. 4 Mp1h (M1jh Djh ; :::; Mpjh Djh ) 2 Rk .. . 3 M1ph 7 .. 7 . 7 := ( 5 Mpph vec(Gi ) 1=2 ) ; h (18.14) Dh = (D1h ; :::; Dph ); and Dh is de…ned in (18.13). 
Define

Δ̄_h† = (Δ̄†_{h,q†}, Δ̄†_{h,p−q†}) ∈ R^{k×p}, Δ̄†_{h,q†} := h†_{3,q†} ∈ R^{k×q†}, and
Δ̄†_{h,p−q†} := h_3† h†_{1,p−q†} + D̄_h† h†_{2,p−q†} ∈ R^{k×(p−q†)}.    (18.15)

Let a(·) be the function from R^d to R^{kp(kp+1)/2} that maps n^{-1} Σ_{i=1}^n (f_i', vech(f_i f_i')')' into

A_n := vech( ( n^{-1} Σ_{i=1}^n vec(G_i − E_{F_n} G_i) vec(G_i − E_{F_n} G_i)' − Γ̃_n Ω̃_n^{-1} Γ̃_n' )^{-1/2} ), where
Ω̃_n := n^{-1} Σ_{i=1}^n g_i g_i' ∈ R^{k×k} and Γ̃_n := n^{-1} Σ_{i=1}^n vec(G_i − E_{F_n} G_i) g_i' ∈ R^{pk×k}.    (18.16)

Note that a(·) does not depend on the n^{-1} Σ_{i=1}^n f_i part of its argument. Also, a(·) is well defined and continuously partially differentiable at any value of its argument for which n^{-1} Σ_{i=1}^n f_i f_i' is pd.⁵⁷ We define A_h as follows: A_h denotes the (kp)(kp+1)/2 × d matrix of partial derivatives of a(·) evaluated at

(0^{(p+1)k'}, vech(h_{10,f})')',    (18.17)

where the latter vector is the limit of the mean vector of (f_i', vech(f_i f_i')')' under {λ_{n,h} ∈ Λ_KCLR : n ≥ 1}. Define

M̄_h := vech_{kp,kp}^{-1}(A_h L_h) ∈ R^{kp×kp},    (18.18)

where vech_{kp,kp}^{-1}(·) denotes the inverse of the vech(·) operator applied to symmetric kp×kp matrices.

⁵⁷ The function a(·) is well defined in this case because n^{-1} Σ_{i=1}^n vec(G_i − E_{F_n} G_i) vec(G_i − E_{F_n} G_i)' − Γ̃_n Ω̃_n^{-1} Γ̃_n' = (−Γ̃_n Ω̃_n^{-1}, I_{pk}) (n^{-1} Σ_{i=1}^n f_i f_i') (−Γ̃_n Ω̃_n^{-1}, I_{pk})' and (−Γ̃_n Ω̃_n^{-1}, I_{pk}) ∈ R^{pk×(p+1)k} has full row rank pk.

Define

M̄_h† := (M̄†_{h,q†}, M̄†_{h,p−q†}) := (0^{k×q†}, M̄†_{h,p−q†}) ∈ R^{k×p}, where
M̄†_{h,p−q†} := Σ_{j=1}^p (M̄_{1jh} h_{4,j}, ..., M̄_{pjh} h_{4,j}) h†_{2,p−q†} ∈ R^{k×(p−q†)}, h_4 = (h_{4,1}, ..., h_{4,p}) ∈ R^{k×p}, and
[M̄_{11h} ... M̄_{1ph}; ... ; M̄_{p1h} ... M̄_{pph}] := M̄_h.    (18.19)

Below (in Lemma 18.4), we show that the asymptotic distribution of rk_n† under sequences {λ_{n,h} ∈ Λ_KCLR : n ≥ 1} with q† < p is given by

r_h(D̄_h, M̄_h) := λ_min( (Δ̄†_{h,p−q†} + M̄†_{h,p−q†})' h†_{3,k−q†} h†'_{3,k−q†} (Δ̄†_{h,p−q†} + M̄†_{h,p−q†}) ),    (18.20)

where Δ̄†_{h,p−q†} is a nonrandom function of D̄_h by (18.14) and (18.15) and M̄†_{h,p−q†} is a nonrandom function of M̄_h by (18.19).
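The vech(·) operator and its inverse vech_{kp,kp}^{-1}(·) used in (18.16)-(18.18) are purely mechanical. The following sketch (our own illustration; the function names are ours) implements them for symmetric matrices of arbitrary dimension.

```python
import numpy as np

def vech(A):
    """Half-vectorization: stack the lower triangle (including the diagonal)
    of A column by column into a vector of length m(m+1)/2."""
    m = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(m)])

def vech_inv(v, m):
    """Inverse of vech for symmetric m x m matrices (vech_{m,m}^{-1})."""
    A = np.zeros((m, m))
    idx = 0
    for j in range(m):
        A[j:, j] = v[idx: idx + m - j]
        idx += m - j
    return A + np.tril(A, -1).T   # mirror the strict lower triangle upward

S = np.array([[4.0, 1.0], [1.0, 3.0]])
v = vech(S)                        # [4., 1., 3.]
print(np.allclose(vech_inv(v, 2), S))  # True
```

For a kp × kp symmetric matrix the vech image lies in R^{kp(kp+1)/2}, matching the dimension of a(·)'s range.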
For sequences {λ_{n,h} ∈ Λ_KCLR : n ≥ 1} with q† = p, we show that rk_n† →_p r_h := ∞.

We define Δ_h = (Δ_{h,q}, Δ_{h,p−q}) ∈ R^{k×p}, with h_2 = (h_{2,q}, h_{2,p−q}) and h_3 = (h_{3,q}, h_{3,k−q}) as in (8.17), as follows:

Δ_{h,q} := h_{3,q}, Δ_{h,p−q} := h_3 h_{1,p−q} + h_7 D̄_h h_8 h_{2,p−q}, and
h_{1,p−q} := [0^{q×(p−q)}; Diag{h_{1,q+1}, ..., h_{1,p}}; 0^{(k−p)×(p−q)}] ∈ R^{k×(p−q)}.    (18.21)

In the present case, h_7 = h_{5,g}^{-1/2} and h_8 = I_p because the CLR_n statistic depends on D̂_n through Ω̂_n^{-1/2} D̂_n, which appears in the LM_n statistic.⁵⁸ This means that Assumption WU for the parameter space Λ_KCLR (defined in Section 8.4) holds with Ŵ_n = Ω̂_n^{-1/2}, Û_n = I_p, h_7 = h_{5,g}^{-1/2}, and h_8 = I_p. Thus, the distribution of Δ_h depends on D̄_h, q, and h_s for s = 1, 2, 3, 5.

Below (in Lemma 18.5), we show that the asymptotic distribution of the CLR_n statistic under sequences {λ_{n,h} ∈ Λ_KCLR : n ≥ 1} with q† < p is given by⁵⁹

CLR_h := (1/2) [ LM_h + J̄_h − r_h + ( (LM_h + J̄_h − r_h)² + 4 LM_h r_h )^{1/2} ], where
LM_h := v̄_h' v̄_h, v̄_h := P_{h_{5,g}^{-1/2} Δ_h} h_{5,g}^{-1/2} ḡ_h,
J̄_h := ḡ_h' h_{5,g}^{-1/2} M_{h_{5,g}^{-1/2} Δ_h} h_{5,g}^{-1/2} ḡ_h, and r_h := r_h(D̄_h, M̄_h).    (18.22)

The quantities (ḡ_h, D̄_h, M̄_h) are specified in (18.13) and (18.18) (and (ḡ_h, D̄_h) are the same as in Lemma 8.2). Conditional on D̄_h, LM_h and J̄_h are independent and distributed as χ²_p and χ²_{k−p}, respectively (see the paragraph following (10.6)).

For sequences {λ_{n,h} ∈ Λ_KCLR : n ≥ 1} with q† = p, we show that the asymptotic distribution of the CLR_n statistic is CLR_h := LM_h := v̄_h' v̄_h, where v̄_h := P_{h_{5,g}^{-1/2} Δ_h} h_{5,g}^{-1/2} ḡ_h.

The critical value function c(1−α, r) is defined in (5.2) for 0 ≤ r < ∞. For r = ∞, we define c(1−α, ∞) to be the 1−α quantile of the χ²_p distribution.

Now we state the asymptotic size of Kleibergen's CLR test based on the Robin and Smith (2000) statistic with Ṽ_Dn defined in (5.3).

⁵⁸ The CLR_n statistic also depends on D̂_n through the rank statistic.
Theorem 18.1. Let the parameter space for F be F_KCLR. Suppose the variance matrix estimator Ṽ_Dn employed by the rank statistic rk_n† (defined in (18.3)) is defined by (5.3). Then, the asymptotic size of Kleibergen's CLR test based on the rank statistic rk_n† is

AsySz = max{α, sup_{h∈H} P(CLR_h > c(1−α, r_h))}

provided P(CLR_h = c(1−α, r_h)) = 0 for all h ∈ H.

Comments: (i) The proviso in Theorem 18.1 is a continuity condition on the distribution function of CLR_h − c(1−α, r_h) at zero. If the proviso in Theorem 18.1 does not hold, then the following weaker conclusion holds:

AsySz ∈ [ max{α, sup_{h∈H} P(CLR_h > c(1−α, r_h))}, max{α, sup_{h∈H} lim_{x↑0} P(CLR_h > c(1−α, r_h) + x)} ].    (18.23)

(ii) Conditional on (D̄_h, M̄_h), ḡ_h has a multivariate normal distribution a.s. (because (ḡ_h, D̄_h, M̄_h) has a multivariate normal distribution unconditionally).⁶⁰ The proviso in Theorem 18.1 holds whenever ḡ_h has a non-zero variance matrix conditional on (D̄_h, M̄_h) a.s. for all h ∈ H. This holds because (a) P(CLR_h = c(1−α, r_h)) = E_{(D̄_h, M̄_h)} P(CLR_h = c(1−α, r_h) | D̄_h, M̄_h) by the law of iterated expectations, (b) some calculations show that CLR_h = c(1−α, r_h) iff (r_h + c) LM_h + c J̄_h = c² + c r_h iff X_h' X_h = c² + c r_h, where c := c(1−α, r_h) and X_h := ((r_h + c)^{1/2} (P_{h_{5,g}^{-1/2} Δ_h} h_{5,g}^{-1/2} ḡ_h)', c^{1/2} (M_{h_{5,g}^{-1/2} Δ_h} h_{5,g}^{-1/2} ḡ_h)')' using (18.22), (c) P_{h_{5,g}^{-1/2} Δ_h} + M_{h_{5,g}^{-1/2} Δ_h} = I_k and P_{h_{5,g}^{-1/2} Δ_h} M_{h_{5,g}^{-1/2} Δ_h} = 0^{k×k}, and (d) conditional on (D̄_h, M̄_h), r_h, c, and Δ_h are constants.

(iii) When p = 1, the formula for AsySz in Theorem 18.1 reduces to α and the proviso holds automatically. That is, Kleibergen's CLR test has correct asymptotic size when p = 1. This holds because when p = 1 the quantity M̄_h† in (18.19) equals 0^{k×p} by Comment (ii) to Theorem 18.3 below.

⁵⁹ The definitions of v̄_h, LM_h, J̄_h, and CLR_h in (18.22) are the same as in (9.1), (9.2), (10.6), and (10.7), respectively.
⁶⁰ Note that ḡ_h is independent of D̄_h.
This implies that r_h(D̄_h, M̄_h) in (18.20) does not depend on M̄_h. Given this, the proof that P(CLR_h > c(1−α, r_h)) = α for all h ∈ H and that the proviso holds is the same as in (10.9)-(10.10) in the proof of Theorem 10.1.

(iv) Theorem 18.1 is proved by showing that it is a special case of Theorem 18.6 below, which is similar but applies not to Ṽ_Dn defined in (5.3), but to an arbitrary estimator Ṽ_Dn (of the asymptotic variance Φ_h^{vec(G_i)} of n^{1/2} vec(D̂_n − E_{F_n} G_i)) that satisfies an Assumption VD (which is stated below). Lemma 18.2 below shows that the estimator Ṽ_Dn defined in (5.3) satisfies Assumption VD.

(v) A CS version of Theorem 18.1 holds with the parameter space F_{α,KCLR} in place of F_KCLR, where F_{α,KCLR} := {(F, θ_0) : F ∈ F_KCLR(θ_0), θ_0 ∈ Θ} and F_KCLR(θ_0) is the set F_KCLR defined in (18.5) with its dependence on θ_0 made explicit. The proof of this CS result is as outlined in the Comment to Proposition 8.1. For the CS result, the h index and its parameter space H are as defined above, but h also includes θ_0 as a subvector, and H allows this subvector to range over Θ.

18.3 Simulation Results

In this section, for a particular linear IV regression model, we simulate (i) the correlations between M̄†_{h,p−q†} (defined in (18.19)) and ḡ_h and (ii) some asymptotic null rejection probabilities (NRP's) of Kleibergen's CLR test that uses Jacobian-variance weighting and employs the Robin and Smith (2000) rank statistic. The model has p = 2 rhs endogenous variables, k = 5 IV's, and an error structure that yields simplified asymptotic formulae for some key quantities. The model is

y_{1i} = Y_{2i}'θ_0 + u_i and Y_{2i} = π_n'Z_i + V_{2i},    (18.24)

where y_{1i}, u_i ∈ R, Y_{2i}, V_{2i} = (V_{21i}, V_{22i})' ∈ R², Z_i = (Z_{i1}, ..., Z_{i5})' ∈ R⁵, and π_n ∈ R^{5×2}. We take Z_{ij} ~ N(.05, (.05)²) for j = 1, ..., 5, u_i ~ N(0, 1), V_{21i} ~ N(0, 1), and V_{22i} = u_i V_{21i}. The random variables Z_{i1}, ..., Z_{i5}, u_i, and V_{21i} are taken to be mutually independent.
We take π_n = (e_1, e_2 c n^{-1/2}), where e_1 = (1, 0, ..., 0)' ∈ R⁵ and e_2 = (0, 1, 0, ..., 0)' ∈ R⁵. We consider 26 values of the constant c lying between 0 and 60.1 (viz., 0.0, 0.1, ..., 1.0, 1.1, ..., 10.1, 20.1, ..., 60.1), as well as 707.1, 1414.2, and 1,000,000. Given these definitions, h†_{1,1} = ∞, h†_{1,2} = c, and M̄_h† = (0⁵, M̄†_{h,p−q†}) ∈ R^{5×2}; see (18.19).

In this model, we have g_i = Z_i u_i and G_i = −Z_i Y_{2i}'. The specified error distribution leads to E_F G_i g_i' = 0^{k×k}. In consequence, the matrix Φ_F^{vec(G_i)} (defined in (8.15)), which is the asymptotic variance estimated by the Jacobian-variance matrix estimator Ṽ_Dn (defined in (5.3)), simplifies as follows:

Φ_h^{vec(G_i)} = lim E_{F_n} vec(D_i − E_{F_n} D_i) vec(D_i − E_{F_n} D_i)' = lim E_{F_n} vec(G_i − E_{F_n} G_i) vec(G_i − E_{F_n} G_i)',
where D_i := (G_{i1} − Γ_{1F} Ω_F^{-1} g_i, G_{i2} − Γ_{2F} Ω_F^{-1} g_i), Γ_{jF} := E_F G_{ij} g_i' for j = 1, 2, and Ω_F := E_F g_i g_i'.    (18.25)

In addition, in the present model, G_{i1} and G_{i2} are uncorrelated, where G_i = (G_{i1}, G_{i2}). In consequence, Φ_h^{vec(G_i)} is block diagonal. In turn, this implies that lim M_{F_n} := lim (Φ_{F_n}^{vec(G_i)})^{-1/2} is block diagonal with off-diagonal block lim M_{12F_n} = 0^{5×5}.

The quantities h†_{1,j} (defined just below (18.10)) are not available in closed form, so we simulate them using a very large value of n, viz., n = 2,000,000. We use 4,000,000 simulation repetitions to compute the correlations between the jth elements of M̄†_{h,p−q†} and ḡ_h for j = 1, ..., 5 and the asymptotic NRP's of the CLR test.⁶¹ The data-dependent critical values for the test are computed using a look-up table that gives the critical values for each fixed value r of the rank statistic in a grid from 0 to 100 with a step size of .005. These critical values are computed using 4,000,000 simulation repetitions. Results are obtained for each of the 29 values of c listed above.
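The data-generating process in (18.24) can be sketched as follows. This is our own illustration: the function name and the normalization θ_0 = 0 are ours, and the error structure is read as V_{22i} = u_i V_{21i}, which is what delivers E_F G_i g_i' = 0 in this design (since E[u_i V_{21i}] = 0 and E[u_i V_{22i}] = E[u_i² V_{21i}] = 0).

```python
import numpy as np

def simulate_iv_data(n, c, rng):
    """One sample from (18.24) with pi_n = (e1, e2 * c * n^{-1/2}), k = 5, p = 2."""
    k, p = 5, 2
    Z = rng.normal(0.05, 0.05, size=(n, k))       # Z_ij ~ N(.05, (.05)^2), iid
    u = rng.normal(size=n)                        # structural error, N(0, 1)
    V21 = rng.normal(size=n)                      # first reduced-form error, N(0, 1)
    V22 = u * V21                                 # assumed structure: gives E[u_i V_{2ij}] = 0
    V = np.column_stack([V21, V22])
    pi = np.zeros((k, p))
    pi[0, 0] = 1.0                                # e1: strongly identified direction
    pi[1, 1] = c / np.sqrt(n)                     # e2 * c * n^{-1/2}: weak direction
    Y2 = Z @ pi + V                               # reduced form
    theta0 = np.zeros(p)                          # normalization (theta_0 left generic in the text)
    y1 = Y2 @ theta0 + u
    return y1, Y2, Z, u, V

rng = np.random.default_rng(0)
y1, Y2, Z, u, V = simulate_iv_data(200_000, c=1.0, rng=rng)
# E_F G_i g_i' = 0 hinges on E[u_i V_{2ij}] = 0; check the sample analogues.
print(abs(np.mean(u * V[:, 0])) < 0.05, abs(np.mean(u * V[:, 1])) < 0.05)  # True True
```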
The simulated correlations between the jth elements of M̄†_{h,p−q†} and ḡ_h for j = 1, ..., 5 take the following values for all values of c ≤ 60.1:

.33, .38, .38, .38, and .38.    (18.26)

For c = 707.1, the correlations are .32, .36, .36, .36, and .36. For c = 1414.2, the correlations are .24, .27, .27, .27, and .27. For c = 1,000,000, the correlations are .01, .01, .01, .01, and .01. These results corroborate the findings given in Theorem 5.1 that M̄†_{h,p−q†} and ḡ_h are correlated asymptotically in some models under some sequences of distributions. In consequence, it is not possible to show that the Jacobian-variance weighted CLR test has correct asymptotic size via a conditioning argument that relies on the independence of Δ̄†_{h,p−q†} + M̄†_{h,p−q†} and ḡ_h.

Next, we report the asymptotic NRP results for Kleibergen's CLR test that uses Jacobian-variance weighting and the Robin and Smith (2000) rank statistic. The asymptotic NRP's are found to be between 4.95% and 5.01% for the 29 values of c considered. These values are very close to the nominal size of 5.00%. Whether the difference is due to simulation noise or not is not clear. The simulation standard error, based on the formula 100(α(1−α)/reps)^{1/2}, where reps = 4,000,000 is the number of simulation repetitions, is .01. However, this formula does not take into account simulation error from the computation of the critical values. We conclude that, for the model and error distribution considered, the asymptotic NRP's of Kleibergen's CLR test with Jacobian-variance weighting are equal to, or very close to, its nominal size. This occurs even though there are non-negligible correlations between M̄†_{h,p−q†} and ḡ_h. Whether this occurs for all parameters and distributions in the linear IV model, and whether it occurs in other moment condition models, is an open question.

⁶¹ The correlations between the jth and kth elements of these vectors for j ≠ k are zero by analytic calculation. Hence, they are not reported here.
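The simulation standard error quoted above follows from the usual binomial formula for a rejection-rate estimate; a quick check (our own code):

```python
import math

def sim_standard_error_pct(alpha, reps):
    """Simulation SE of an estimated rejection probability, in percentage points:
    100 * (alpha * (1 - alpha) / reps)^{1/2}."""
    return 100.0 * math.sqrt(alpha * (1.0 - alpha) / reps)

# alpha = .05 with reps = 4,000,000 gives about .01, as reported.
print(round(sim_standard_error_pct(0.05, 4_000_000), 2))  # 0.01
```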
It appears to be a question that can only be answered on a case by case basis.

18.4 Asymptotic Size of Kleibergen's CLR Test for General Ṽ_Dn Estimators

In this section, we determine the asymptotic size of Kleibergen's CLR test (defined in Section 5) using the Robin and Smith (2000) rank statistic based on a general "Jacobian-variance" estimator Ṽ_Dn (= Ṽ_Dn(θ_0)) that satisfies the following Assumption VD. The first two results of this section, viz., Lemma 18.2 and Theorem 18.3, combine to establish Theorem 5.1; see Comment (i) to Theorem 18.3. The first and last results of this section, viz., Lemma 18.2 and Theorem 18.6, combine to prove Theorem 18.1. The proofs of the results in this section are given in Section 18.6.

Assumption VD: For any sequence {λ_{n,h} ∈ Λ_KCLR : n ≥ 1}, the estimator Ṽ_Dn is such that n^{1/2}(M̃_n − M_{F_n}) →_d M̄_h for some random matrix M̄_h ∈ R^{kp×kp} (where M̃_n = Ṽ_Dn^{-1/2} and M_{F_n} is defined in (18.6)), the convergence is joint with

n^{1/2} ( ĝ_n ; vec(D̂_n − E_{F_n} G_i) ) →_d ( ḡ_h ; vec(D̄_h) ) ~ N( 0^{(p+1)k}, [h_{5,g}, 0^{k×pk}; 0^{pk×k}, Φ_h^{vec(G_i)}] ),    (18.27)

and (ḡ_h, D̄_h, M̄_h) has a mean zero multivariate normal distribution with pd variance matrix. The same condition holds for any subsequence {w_n} and any sequence {λ_{w_n,h} ∈ Λ_KCLR : n ≥ 1} with w_n in place of n throughout.

Note that the convergence in (18.27) holds by Lemma 8.2.

The following lemma verifies Assumption VD for the estimator Ṽ_Dn defined in (5.3).

Lemma 18.2. The estimator Ṽ_Dn defined in (5.3) satisfies Assumption VD. Specifically, n^{1/2}(ĝ_n, D̂_n − E_{F_n} G_i, M̃_n − M_{F_n}) →_d (ḡ_h, D̄_h, M̄_h), where M̃_n := Ṽ_Dn^{-1/2}, M_{F_n} := (Φ_{F_n}^{vec(G_i)})^{-1/2}, and (ḡ_h, D̄_h, M̄_h) has a mean zero multivariate normal distribution defined by (18.11) and (18.13)-(18.18) with pd variance matrix.
Comment: As stated in the paragraph containing (18.21), D̂_n is defined in Lemma 18.2 and Theorem 18.3 below with Ŵ_n = Ω̂_n^{-1/2} and Û_n = I_p.

Define

S_n† := Diag{(n^{1/2} τ†_{1F_n})^{-1}, ..., (n^{1/2} τ†_{q†F_n})^{-1}, 1, ..., 1} ∈ R^{p×p} and T_n† := B_n† S_n†,    (18.28)

where B_n† is defined in (18.7).

The asymptotic distribution of n^{1/2} D̂_n† T_n† is given in the following theorem.

Theorem 18.3. Suppose Assumption VD holds. For all sequences {λ_{n,h} ∈ Λ_KCLR : n ≥ 1}, n^{1/2}(ĝ_n, D̂_n − E_{F_n} G_i, D̂_n† T_n†) →_d (ḡ_h, D̄_h, Δ̄_h† + M̄_h†), where Δ̄_h† is a nonrandom affine function of D̄_h defined in (18.14) and (18.15), M̄_h† is a nonrandom linear (i.e., affine and homogeneous of degree one) function of M̄_h defined in (18.19), (ḡ_h, D̄_h, M̄_h) has a mean zero multivariate normal distribution, and ḡ_h and D̄_h are independent. Under all subsequences {w_n} and all sequences {λ_{w_n,h} ∈ Λ_KCLR : n ≥ 1}, the same result holds with n replaced with w_n.

Comments: (i) Note that the random variables (ḡ_h, Δ̄_h†, M̄_h†) in Theorem 5.1 have a multivariate normal distribution whose mean and variance matrix depend on lim Var_{F_n}((f_i', vech(f_i f_i')')') and on the limits of certain functions of E_{F_n} G_i by (18.11)-(18.19). This, Lemma 18.2, and Theorem 18.3 combine to prove Theorem 5.1 of AG1.

(ii) From (18.19), M̄_h† = 0^{k×p} if p = 1 (because q† = 0 implies q = 0 which, in turn, implies h_4 = 0^k, and q† = 1 implies M̄†_{h,p−q†} has no columns).⁶² For p ≥ 2, M̄_h† = 0^{k×p} if p = q† (because M̄†_{h,p−q†} has no columns) or if h_{4,j} = 0^k for all j ≤ p. The former holds if the singular values (τ†_{1F_n}, ..., τ†_{pF_n}) of D†_{F_n} satisfy n^{1/2} τ†_{jF_n} → ∞ for all j ≤ p (i.e., all parameters are strongly or semi-strongly identified). The latter occurs if E_{F_n} G_i → 0^{k×p} (i.e., all parameters are either weakly identified in the standard sense or semi-strongly identified).
These two conditions fail to hold when one or more parameters are strongly identified and one or more parameters are weakly identified or jointly weakly identified.

(iii) For example, when p = 2 the conditions in Comment (ii) (under which M̄_h† = 0^{k×p}) fail to hold if E_{F_n} G_{i1} ≠ 0^k does not depend on n and n^{1/2} E_{F_n} G_{i2} → c for some c ∈ R^k.

The following lemma establishes the asymptotic distribution of rk_n†.

Lemma 18.4. Let the parameter space for F be F_KCLR. Suppose the variance matrix estimator Ṽ_Dn employed by the rank statistic rk_n† (defined in (18.3)) satisfies Assumption VD.

⁶² Note that q† = 0 implies q = 0 when p = 1 because n^{1/2} D†_{F_n} = n^{1/2} M_{F_n} E_{F_n} G_i = O(1) when q† = 0 (by the definition of q†) and this implies that n^{1/2} E_{F_n} G_i = O(1) using the first condition in F_KCLR. In turn, the latter implies that n^{1/2} Ω_{F_n}^{-1/2} E_{F_n} G_i = O(1) using the last condition in F. That is, q = 0 (since W_F = Ω_F^{-1/2} and U_F = I_p because Ŵ_n = Ω̂_n^{-1/2} and Û_n = I_p in the present case; see the Comment to Lemma 18.2).
Then, under all sequences {λ_{n,h} ∈ Λ_KCLR : n ≥ 1}:

(a) rk_n† := κ̂†_{pn} →_p ∞ if q† = p,
(b) rk_n† := κ̂†_{pn} →_d r_h(D̄_h, M̄_h) if q† < p, where r_h(D̄_h, M̄_h) is defined in (18.20) using (18.19) with M̄_h defined in Assumption VD (rather than in (18.18)),
(c) κ̂†_{jn} →_p ∞ for all j ≤ q†,
(d) the (ordered) vector of the smallest p − q† singular values of n^{1/2} D̂_n†, i.e., ((κ̂†_{(q†+1)n})^{1/2}, ..., (κ̂†_{pn})^{1/2})', converges in distribution to the (ordered) vector of the singular values of h†'_{3,k−q†} (Δ̄†_{h,p−q†} + M̄†_{h,p−q†}) ∈ R^{(k−q†)×(p−q†)}, where M̄†_{h,p−q†} is defined in (18.19) with M̄_h defined in Assumption VD (rather than in (18.18)),
(e) the convergence in parts (a)-(d) holds jointly with the convergence in Theorem 18.3, and
(f) under all subsequences {w_n} and all sequences {λ_{w_n,h} ∈ Λ_KCLR : n ≥ 1}, parts (a)-(e) hold with n replaced with w_n.

The following lemma gives the joint asymptotic distribution of CLR_n and rk_n† and the asymptotic null rejection probabilities of Kleibergen's CLR test.

Lemma 18.5. Let the parameter space for F be F_KCLR. Suppose the variance matrix estimator Ṽ_Dn employed by the rank statistic rk_n† (defined in (18.3)) satisfies Assumption VD. Then, under all sequences {λ_{n,h} ∈ Λ_KCLR : n ≥ 1}:

(a) CLR_n = LM_n + o_p(1) →_d χ²_p and rk_n† →_p ∞ if q† = p,
(b) lim_{n→∞} P(CLR_n > c(1−α, rk_n†)) = α if q† = p,
(c) (CLR_n, rk_n†) →_d (CLR_h, r_h) if q† < p, and
(d) lim_{n→∞} P(CLR_n > c(1−α, rk_n†)) = P(CLR_h > c(1−α, r_h)) if q† < p, provided P(CLR_h = c(1−α, r_h)) = 0.

Under all subsequences {w_n} and all sequences {λ_{w_n,h} ∈ Λ_KCLR : n ≥ 1}, parts (a)-(d) hold with n replaced with w_n.

Comments: (i) The CLR critical value function c(1−α, r) is the 1−α quantile of clr(r). By definition,

clr(r) := (1/2) [ χ²_p + χ²_{k−p} − r + ( (χ²_p + χ²_{k−p} − r)² + 4 χ²_p r )^{1/2} ],    (18.29)

where the chi-square random variables χ²_p and χ²_{k−p} are independent.
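The critical value function c(1−α, r) has no closed form for 0 < r < ∞, but it is easily approximated by Monte Carlo from (18.29). The sketch below (function name and simulation settings are our own) also illustrates the two boundary cases: clr(0) reduces to χ²_p + χ²_{k−p} ~ χ²_k, and as r → ∞ the quantile approaches the χ²_p quantile, which is the value used for r = ∞.

```python
import numpy as np

def clr_quantile(r, p, k, alpha=0.05, reps=400_000, seed=0):
    """Monte Carlo approximation of c(1 - alpha, r), the 1 - alpha quantile
    of clr(r) defined in (18.29), with independent chi2_p and chi2_{k-p} draws."""
    rng = np.random.default_rng(seed)
    chi_p = rng.chisquare(p, reps)
    chi_kp = rng.chisquare(k - p, reps)
    s = chi_p + chi_kp - r
    clr = 0.5 * (s + np.sqrt(s ** 2 + 4.0 * chi_p * r))
    return np.quantile(clr, 1.0 - alpha)

# p = 2, k = 5 as in the simulation design of Section 18.3:
print(clr_quantile(0.0, p=2, k=5))   # close to 11.07, the chi2_5 .95 quantile
print(clr_quantile(1e9, p=2, k=5))   # close to 5.99, the chi2_2 .95 quantile
```

The monotone decrease of c(1−α, r) in r is what makes the look-up-table approach over a grid of r values (described in Section 18.3) practical.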
If r_h := r_h(D̄_h, M̄_h) does not depend on M̄_h, then, conditional on D̄_h, r_h is a constant and LM_h and J̄_h are independent and distributed as χ²_p and χ²_{k−p} (see the paragraph following (10.6)). In this case, even when q† < p,

P(CLR_h > c(1−α, r_h)) = E_{D̄_h} P(CLR_h > c(1−α, r_h) | D̄_h) = α,    (18.30)

as desired, where the first equality holds by the law of iterated expectations and the second equality holds because r_h is a constant conditional on D̄_h and c(1−α, r_h) is the 1−α quantile of the conditional distribution of clr(r_h) given D̄_h, which equals that of CLR_h given D̄_h.

(ii) However, when r_h := r_h(D̄_h, M̄_h) depends on M̄_h, the distribution of r_h conditional on D̄_h is not a pointmass distribution. Rather, conditional on D̄_h, r_h is a random variable that is not independent of LM_h, J̄_h, and CLR_h. In consequence, the second equality in (18.30) does not hold and the asymptotic null rejection probability of Kleibergen's CLR test may be larger or smaller than α, depending upon the sequence {λ_{n,h} ∈ Λ_KCLR : n ≥ 1} (or {λ_{w_n,h} ∈ Λ_KCLR : n ≥ 1}), when q† < p.

Next, we use Lemma 18.5 to provide an expression for the asymptotic size of Kleibergen's CLR test based on the Robin and Smith (2000) rank statistic with Jacobian-variance weighting.

Theorem 18.6. Let the parameter space for F be F_KCLR. Suppose the variance matrix estimator Ṽ_Dn employed by the rank statistic rk_n† (defined in (18.3)) satisfies Assumption VD. Then, the asymptotic size of Kleibergen's CLR test based on rk_n† is

AsySz = max{α, sup_{h∈H} P(CLR_h > c(1−α, r_h))}

provided P(CLR_h = c(1−α, r_h)) = 0 for all h ∈ H.

Comments: (i) Comment (i) to Theorem 18.1 also applies to Theorem 18.6.
(ii) Theorem 18.6 and Lemma 18.2 combine to prove Theorem 18.1.
(iii) A CS version of Theorem 18.6 holds with the parameter space F_{α,KCLR} in place of F_KCLR; see Comment (v) to Theorem 18.1 and the Comment to Proposition 8.1.
18.5 Correct Asymptotic Size of Equally-Weighted CLR Tests Based on the Robin-Smith Rank Statistic

In this subsection, we consider equally-weighted CLR tests, a special case of which is considered in Section 6. By definition, an equally-weighted CLR test is a CLR test that is based on a rk_n statistic that depends on D̂_n only through W̃_n D̂_n for some general k×k weighting matrix W̃_n. We show that such tests have correct asymptotic size when they are based on the rank statistic of Robin and Smith (2000) and employ a general weight matrix W̃_n ∈ R^{k×k} that satisfies certain conditions. In contrast, the results in Section 6 consider the specific weight matrix Ω̂_n^{-1/2} ∈ R^{k×k}. The reason for considering these tests in this section is that the asymptotic results can be obtained as a relatively simple by-product of the results in Section 18.4. All that is required is a slight change in Assumption VD.

The rank statistic that we consider here is

rk_n† := λ_min( n D̂_n' W̃_n' W̃_n D̂_n ).    (18.31)

We replace Assumption VD in Section 18.4 by the following assumption.

Assumption W: For any sequence {λ_{n,h} ∈ Λ_KCLR : n ≥ 1}, the random k×k weight matrix W̃_n is such that n^{1/2}(W̃_n − W†_{F_n}) →_d W̄_h for some non-random k×k matrices {W†_{F_n} : n ≥ 1} and some random k×k matrix W̄_h ∈ R^{k×k}, W†_{F_n} → W_h† for some nonrandom pd k×k matrix W_h†, the convergence is joint with the convergence in (18.27), and (ḡ_h, D̄_h, W̄_h) has a mean zero multivariate normal distribution with pd variance matrix. The same condition holds for any subsequence {w_n} and any sequence {λ_{w_n,h} ∈ Λ_KCLR : n ≥ 1} with w_n in place of n throughout.

If one takes M̃_n (= Ṽ_Dn^{-1/2}) = I_p ⊗ W̃_n in Assumption VD, then D̂_n† = W̃_n D̂_n and the rank statistics in (18.3) and (18.31) are the same.
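The rank statistic in (18.31) is a minimum-eigenvalue computation. The following sketch (our own code; the matrices are randomly generated for illustration) computes it and checks the equivalent characterization as the squared smallest singular value of n^{1/2} W̃_n D̂_n.

```python
import numpy as np

def rank_statistic(D_hat, W_tilde, n):
    """rk_n := lambda_min(n * D' W'W D), as in (18.31)."""
    A = W_tilde @ D_hat
    M = n * (A.T @ A)
    return np.linalg.eigvalsh(M)[0]    # eigvalsh returns eigenvalues in ascending order

rng = np.random.default_rng(1)
n, k, p = 1000, 5, 2
D = rng.normal(size=(k, p))
W = np.eye(k)                          # e.g., the identity weighting
rk = rank_statistic(D, W, n)
# Equivalently, rk is the squared smallest singular value of n^{1/2} W_tilde D_hat.
sv = np.linalg.svd(np.sqrt(n) * W @ D, compute_uv=False)
print(np.isclose(rk, sv[-1] ** 2))     # True
```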
Thus, Assumption W is analogous to Assumption VD with M̃_n = I_p ⊗ W̃_n and M_{F_n} = I_p ⊗ W†_{F_n}. Note, however, that the latter matrix does not typically satisfy the condition in Assumption VD that M_{F_n} is defined in (18.6), i.e., the condition that M_{F_n} = (Φ_{F_n}^{vec(G_i)})^{-1/2}. Nevertheless, the results in Section 18.4 hold with Assumption VD replaced by Assumption W and with M_F = I_p ⊗ W_F†, D_F† = W_F† E_F G_i, and M̄_h = I_p ⊗ W̄_h. With these changes, Δ̄_h† is defined as in (18.15) with D̄_h† = W_h† D̄_h in (18.14) (because (Φ_h^{vec(G_i)})^{-1/2} is replaced by I_p ⊗ W_h†), and M̄_h† is defined as in (18.19) with M̄†_{h,p−q†} = W̄_h h_4 h†_{2,p−q†}.

Below we show the key result that M̄†_{h,p−q†} = 0^{k×(p−q†)} for rk_n† defined in (18.31). By (18.20), this implies that

r_h(D̄_h, M̄_h) := λ_min( (Δ̄†_{h,p−q†})' h†_{3,k−q†} h†'_{3,k−q†} (Δ̄†_{h,p−q†}) )    (18.32)

when q† < p. Note that the rhs in (18.32) does not depend on M̄_h and, hence, is a function only of D̄_h. That is, r_h(D̄_h, M̄_h) = r_h(D̄_h). Given that r_h(D̄_h, M̄_h) does not depend on M̄_h, Comment (i) to Lemma 18.5 implies that P(CLR_h > c(1−α, r_h)) = α under all subsequences {w_n} and all sequences {λ_{w_n,h} ∈ Λ_KCLR : n ≥ 1}. This and Theorem 18.6 give the following result.

Corollary 18.7. Let the parameter space for F be F_KCLR. Suppose the rank statistic rk_n† (defined in (18.31)) is based on a weight matrix W̃_n that satisfies Assumption W. Then, the asymptotic size of the corresponding equally-weighted version of Kleibergen's CLR test (defined in Section 5 with rk_n(θ) = rk_n†) equals α.

Comment: A CS version of Corollary 18.7 holds with the parameter space F_{α,KCLR} in place of F_KCLR; see Comment (v) to Theorem 18.1 and the Comment to Proposition 8.1.
Now, we establish that M̄†_{h,p−q†} (= W̄_h h_4 h†_{2,p−q†}) = 0^{k×(p−q†)}. We have

W_h† h_4 = lim W†_{F_n} E_{F_n} G_i = lim C†_{F_n} Υ†_{F_n} (B†_{F_n})' = h_3† (lim Υ†_{F_n}) h_2†',    (18.33)

where C†_{F_n} Υ†_{F_n} (B†_{F_n})' is the singular value decomposition of W†_{F_n} E_{F_n} G_i, Υ†_{F_n} is the k×p matrix with the singular values of W†_{F_n} E_{F_n} G_i, denoted by {τ†_{jF_n} : n ≥ 1} for j ≤ p, on the main diagonal and zeroes elsewhere, and C†_{F_n} and B†_{F_n} are the corresponding k×k and p×p orthogonal matrices of singular vectors, as defined in (18.7). Hence, lim Υ†_{F_n} exists; call it Υ_h†. It equals h_3†' W_h† h_4 h_2†. That is, the singular value decomposition of W_h† h_4 is

W_h† h_4 = h_3† Υ_h† h_2†'.    (18.34)

The k×p matrix Υ_h† has the limits of the singular values of W†_{F_n} E_{F_n} G_i on its main diagonal and zeroes elsewhere. Let τ†_{h,j} for j ≤ p denote the limits of these singular values. By the definition of q†, τ†_{h,j} = 0 for j = q† + 1, ..., p (because n^{1/2} τ†_{jF_n} → h†_{1,j} < ∞). In consequence, Υ_h† can be written as

Υ_h† = [Υ†_{h,q†}, 0^{q†×(p−q†)}; 0^{(k−q†)×q†}, 0^{(k−q†)×(p−q†)}], where Υ†_{h,q†} := Diag{τ†_{h,1}, ..., τ†_{h,q†}}.    (18.35)

In addition,

h_2†' h†_{2,p−q†} = [0^{q†×(p−q†)}; I_{p−q†}].    (18.36)

Thus, we have

M̄†_{h,p−q†} := W̄_h (W_h†)^{-1} W_h† h_4 h†_{2,p−q†}
= W̄_h (W_h†)^{-1} h_3† Υ_h† h_2†' h†_{2,p−q†}
= W̄_h (W_h†)^{-1} h_3† Υ_h† [0^{q†×(p−q†)}; I_{p−q†}]
= 0^{k×(p−q†)},    (18.37)

where the first equality holds by the paragraph following Assumption W and uses the condition in Assumption W that W_h† is pd, and the remaining equalities hold by (18.34)-(18.36). This completes the proof of Corollary 18.7.

18.6 Proofs of Results Stated in Sections 18.2 and 18.4

For notational simplicity, the proofs in this section are for the sequence {n}, rather than a subsequence {w_n : n ≥ 1}. The same proofs hold for any subsequence {w_n : n ≥ 1}.

Proof of Theorem 18.1. Theorem 18.1 follows from Theorem 18.6, which imposes Assumption VD, and Lemma 18.2, which verifies Assumption VD when Ṽ_Dn is defined by (5.3).
Proof of Lemma 18.2. Consider any sequence f n;h 2 KCLR : n 1g: By the CLT result in b n EFn Gi ) in (14.1), and the de…nitions of g h and Dh in (18.11), the linear expansion of n1=2 (D (18.13), we have bn n1=2 (b gn ; D EFn Gi ) !d (g h ; Dh ): (18.38) Next, we apply the delta method to the CLT result in (18.11) and the function a( ) de…ned in (18.16). The mean component in the lhs quantity in (18.11) is (0(p+1)k0 ; vech(EFn fi fi 0 )0 )0 : We have 00 a @@ = vech = vech where vec(Gi ) Fn and 0(p+1)k vech(EFn fi fi 0 ) EFn vec(Gi vec(Gi ) Fn Fn 1=2 11 AA EFn Gi )vec(Gi EFn Gi )0 vec(Gi ) Fn 1 vec(Gi )0 Fn Fn 1=2 = vech(MFn ); (18.39) are de…ned in (3.2), the …rst equality uses the de…nitions of a( ) and fi vec(Gi ) Fn EFn fi fi 0 ! (given in (18.16) and (5.6), respectively), the second equality holds by the de…nition of in (8.15), and the third equality holds by the de…nition of MFn in (18.6). Also, h10;f and h10;f is pd. Hence, a( ) is well de…ned and continuously partially di¤erentiable at 45 lim(0(p+1)k0 ; vech(EFn fi fi 0 )0 )0 = (0(p+1)k0 ; vech(h10;f )0 )0 ; as required for the application of the delta method. The delta method gives n1=2 (An 0 0 vech(MFn )) = n1=2 @a @n 1 n X i=1 ! d Ah Lh ; 0 @ fi vech (fi fi 0 ) 11 AA 0 a@ 0(p+1)k vech(EFn fi fi 0 ) 11 AA (18.40) where the …rst equality holds by (18.39) and the de…nitions of a( ) and An in (18.16), the convergence holds by the delta method using the CLT result in (18.11) and the de…nition of Ah following (18.16). 1 Applying the inverse vech( ) operator, namely, vechkp;kp ( ); to both sides of (18.40) gives the recon…gured convergence result 1 n1=2 (vechkp;kp (An )) 1 MFn ) !d vechkp;kp (Ah Lh ) = M h ; (18.41) where the last equality holds by the de…nition of M h in (18.18). The convergence results in (18.38) and (18.41) hold jointly because both rely on the convergence result in (18.11). 
We show below that n1=2 (VeDn 1 (vechkp;kp (An )) 2 ) = op (1): This and the delta method applied again (using the function `(A) = A (18.42) 1=2 for a pd kp kp matrix A) give 1 because vechkp;kp (An ) = ( Qh10;f Q0 1=2 n1=2 (VeDn 1 (An )) = op (1) vechkp;kp vec(Gi ) 1=2 ) +op (1) h and vec(Gi ) h is pd (because h10;f is pd and (18.43) vec(Gi ) h = for some full row rank matrix Q). Equations (18.38), (18.41), and (18.43) establish the result of the lemma. 46 Now we prove (18.42). We have VeDn := n = 1 n n X b n )vec(Gi G vec(Gi i=1 n X 1 vec(Gi b n )0 G b n b n 1 b 0n 0 EFn Gi )vec(Gi EF n Gi ) i=1 =n 1 en n X bn vec(G vec(Gi EFn Gi )b gn0 EFn Gi )vec(Gi en ! bn vec(G 1 gbn gbn0 en en e EFn Gi )0 i=1 n bn EFn Gi )vec(G bn vec(G 1 e0 n + Op (n EFn Gi )b gn0 1 EFn Gi )0 0 ); (18.44) where the second equality holds by subtracting and adding EFn Gi and some algebra, by the de…nitions of b n and b n in (4.1), (4.3), and (5.3), and by the de…nitions of e n and e n in (18.16) and the third equality holds because (i) the second summand on the lhs of the third equality is Op (n 1 ) b n EFn Gi ) = Op (1) (by the CLT using the moment conditions in F; de…ned in because n1=2 vec(G b n EFn Gi ) = Op (1); and b n = Op (1); (3.1)) and (ii) n1=2 gbn = Op (1) (by Lemma 8.3)), n1=2 vec(G b n 1 = Op (1); e n = Op (1); and e n 1 = Op (1) (by the justi…cation given for (14.1)). Excluding the Op (n 1) 1 term, the rhs in (18.44) equals (vechkp;kp (An )) 2: Hence, (18.42) holds and the proof is complete. cn = Proof of Theorem 18.3. The proof is similar to that of Lemma 8.3 in Section 8 with W bn = Un = Ip ; and the following quantities q; D b n ; Dn (= EFn Gi ); Bn;q ; n;q ; Cn ; and Wn = Ik ; U y y y y y y y by n ; respectively. The proof employs the n replaced by q ; Dn ; Dn (= D ); B y ; y ; Cn ; and Fn n;q n;q notational simpli…cations in (13.1). 
We can write b y By y ( D n n;q y ) 1 n;q y y = Dny Bn;q y( y ) 1 n;q y by + n1=2 (D n By the singular value decomposition, Dny = Cny y Dny Bn;q y( y ) 1 n;q y = Cny y0 y y n Bn Bn;q y ( 0 = Cny @ Iqy y y 0(k q ) q y y0 n Bn : y ) 1 n;q y 1 A = Cy 47 y 1=2 Dny )Bn;q y (n (18.45) Thus, we obtain = Cny n;q y y ) 1: n;q y : 0 y@ n Iqy y y 0(p q ) q 1 A( y ) 1 n;q y (18.46) b n = (D b 1n ; :::; D b pn ) 2 Rk Let D n 1=2 by (D n Dny ) = n 1=2 p p X j=1 p X = j=1 and Dh = (D1h ; :::; Dph ) 2 Rk f1jn D b jn (M p: We have fpjn D b jn M1jFn EFn Gij ; :::; M f1jn n1=2 (D b jn [M f1jn EFn Gij ) + n1=2 (M MpjFn EFn Gij ) M1jFn )EFn Gij ; :::; fpjn n1=2 (D b jn EFn Gij ) + n1=2 (M fpjn MpjFn )EFn Gij ] M p X (M1jh Djh + M 1jh h4;j ; :::; Mpjh Djh + M pjh h4;j ); !d (18.47) j=1 where the convergence holds by Lemma 8.2 in Section 8, Assumption VD, and EFn Gij ! h4;j (by the de…nition of h4;j ): Combining (18.45)-(18.47) gives b y By y ( D n n;q where the equality uses n1=2 y jFn y ) 1 n;q y y h;q y ; y y = Cn;q y + op (1) !p h3;q y = (18.48) 0 q y by the de…nition of q y and Bn;q y Bn;q y = Iq y ; ! 1 for all j the convergence holds by the de…nition of hy3;qy ; and the last equality holds by the de…nition of y h;q y in (18.15). Using the singular value decomposition Dny = Cny y n1=2 Dny Bn;p 0 0q y qy = n1=2 Cny B 1=2 y = Cny B @ n n;p (k p) (p 0 y y0 y n Bn Bn;p q y 1 (p q y ) y y0 n Bn 0 again, we obtain = n1=2 Cny 0q y (p q y ) 0 y @ n 0q y (p q y ) Ip 1 qy C B C y C ! hy B Diagfhy C = hy hy ; :::; h g y 3 3 1;p 1;p A A @ 1;q +1 y 0(k p) (p q ) qy qy ) 1 A qy ; (18.49) where the second equality uses Bny0 Bny = Ip ; the convergence holds by the de…nitions of hy3 and hy1;j for j = 1; :::; p; and the last equality holds by the de…nition of hy1;p qy in the paragraph following (18.10), which uses (8.17). y By (18.47) and Bn;p qy ! 
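As a numerical check of the SVD identity in (18.46), D_n† B†_{n,q†} (Υ†_{n,q†})^{-1} = C†_{n,q†}, the following sketch (our own illustration; the matrix is randomly generated) verifies that post-multiplying by the first q right singular vectors and the inverse of the leading singular values recovers the first q left singular vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
k, p, q = 5, 2, 1
D = rng.normal(size=(k, p))

# Full SVD: D = C Y B' with C (k x k) and B (p x p) orthogonal, Y (k x p) diagonal.
C, tau, Bt = np.linalg.svd(D, full_matrices=True)
B = Bt.T
Y = np.zeros((k, p))
Y[:p, :p] = np.diag(tau)
assert np.allclose(C @ Y @ B.T, D)

# D B_q (Y_q)^{-1} recovers the first q left singular vectors, as in (18.46).
lhs = D @ B[:, :q] @ np.linalg.inv(np.diag(tau[:q]))
print(np.allclose(lhs, C[:, :q]))   # True
```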
hy2;p b ny n1=2 (D y qy ; we have y Dny )Bn;p y using the de…nitions of Dh and M h;p qy qy y !d Dh hy2;p qy y + M h;p qy ; in (18.14) and (18.19), respectively. 48 (18.50) Using (18.49) and (18.50), we get b ny B y n1=2 D n;p qy b ny + n1=2 (D y = n1=2 Dny Bn;p qy ! d hy3 hy1;p y Dh hy2;p qy + qy + y Dny )Bn;p y M h;p qy y h;p q y where the last equality holds by the de…nition of qy y h;p q y = y + M h;p qy ; (18.51) in (18.15). Equations (18.48) and (18.51) combine to give y b ny B y ) ) 1 ; n1=2 D n;p q y n;q y y y y M h;p qy ) = h + M h b ny Bny Sny = (D b ny B y y ( b ny Tny = n1=2 D n1=2 D n;q !d ( y h;q y ; y h;p q y + y (18.52) y using the de…nitions of Sny and Tny in (18.28), h in (18.15), and M h in (18.19). b n EFn Gi ) !d (g h ; Dh ): This convergence is joint with that in (18.52) By Lemma 8.2, n1=2 (b gn ; D b n EFn Gi ); which is part of the former, because the latter just relies on the convergence of n1=2 (D fn and of n1=2 (M MFn ) !d M h ; which holds jointly with the former by Assumption VD. This establishes the convergence result of Theorem 18.3. The independence of g h and (Dh ; by Lemma 8.2, and the fact that y h y h) follows from the independence of g h and Dh ; which holds is a nonrandom function of Dh : Proof of Lemma 18.4. The proof of Lemma 18.4 is analogous to the proof of Theorem 8.4 with cn = Wn = Ik ; U bn = Un = Ip ; and the following quantities q; D b n ; Dn (= EFn Gi ); bjn ; Bn ; Bn;q ; W b ny ; Dny (= Dy ); by ; Bny ; B y y ; Sny ; S y y ; y ; and hy y ; Sn ; Sn;q ; jFn ; and h3;q replaced by q y ; D Fn jn n;q n;q jFn 3;q respectively. Theorem 18.3, rather than Lemma 8.3, is employed to obtain the results in (16.37). In consequence, h;q and y h;q y h;p q are replaced by y y h;q y + M h;q y and y 0k q by (18.19)). y y y where + M h;qy = h;qy (because M h;qy := y y y are replaced by h;qy and h;p qy + M h;p qy in y y h;p q y + M h;p q y ; The quantities respectively, h;q and h;p q (16.37) and in the rest of the proof of Theorem y 8.4. 
Note that (16.39) holds with $h_{3,q}$ replaced by $h^y_{3,q^y}$ because $\Delta^y_{h,q^y} = h^y_{3,q^y}$ by (18.15) (just as $\Delta_{h,q} = h_{3,q}$). Because $\hat{U}_n = U_n$, the matrices $\hat{A}_n$ and $\hat{A}_{jn}$ for $j = 1, 2, 3$ (defined in (16.39)) are all zero matrices, which simplifies the expressions in (16.41)-(16.44) considerably.

The proof of Theorem 8.4 uses Lemma 16.1 to obtain (16.42). Hence, an analogue of Lemma 16.1 is needed, where the changes listed in the first paragraph of this proof are made and $h_{6,j}$ and $C_n$ are replaced by $h^y_{6,j}$ and $C^y_n$, respectively. In addition, $\mathcal{F}_{WU}$ is replaced by $\mathcal{F}_{KCLR}$ (because $\mathcal{F}_{KCLR} \subset \mathcal{F}_{WU}$ for $\delta_{WU}$ sufficiently small and $M_{WU}$ sufficiently large, using the facts that $\mathcal{F}_{KCLR}$ equals $\mathcal{F}_0$ by the argument following (18.5) and $\mathcal{F}_0 \subset \mathcal{F}_{WU}$ for $\delta_{WU}$ sufficiently small and $M_{WU}$ sufficiently large by the argument following (8.5)). Because $\hat{U}_n = U_n$, the matrices $\hat{A}_{jn}$ for $j = 1, 2, 3$ (defined in (16.2)) are all zero matrices, which simplifies the expressions in (16.9)-(16.12) considerably. For (16.3) to go through with the changes listed above (in particular, with $\widehat{W}_n$, $\hat{D}_n$, $D_n$, and $U_n$ replaced by $I_k$, $\hat{D}^y_n$, $D^y_n$, and $I_p$, respectively), we need to show that

$$n^{1/2}(\hat{D}^y_n - D^y_n) = O_p(1). \quad (18.53)$$

By (5.4) with $\theta = \theta_0$ (and with the dependence of various quantities on $\theta_0$ suppressed for notational simplicity), we have

$$\hat{D}^y_n = \sum_{j=1}^p (\widetilde{M}_{1jn}\hat{D}_{jn}, \ldots, \widetilde{M}_{pjn}\hat{D}_{jn}), \text{ where } \widetilde{M}_n = \begin{pmatrix} \widetilde{M}_{11n} & \cdots & \widetilde{M}_{1pn} \\ \vdots & \ddots & \vdots \\ \widetilde{M}_{p1n} & \cdots & \widetilde{M}_{ppn} \end{pmatrix} := \widetilde{V}_{Dn}^{-1/2} \in R^{kp \times kp}. \quad (18.54)$$
By (18.6), we have

$$D^y_n = \sum_{j=1}^p (M_{1jF_n}D_{jn}, \ldots, M_{pjF_n}D_{jn}), \quad (18.55)$$

using $D_n = (D_{1n}, \ldots, D_{pn})$ and $D_{jn} := E_{F_n}G_{ij}$ for $j = 1, \ldots, p$. For $s = 1, \ldots, p$, we have

$$n^{1/2}(\widetilde{M}_{sjn}\hat{D}_{jn} - M_{sjF_n}D_{jn}) = \widetilde{M}_{sjn}\, n^{1/2}(\hat{D}_{jn} - D_{jn}) + n^{1/2}(\widetilde{M}_{sjn} - M_{sjF_n})D_{jn} = O_p(1), \quad (18.56)$$

where $n^{1/2}(\hat{D}_{jn} - D_{jn}) = O_p(1)$ (by Lemma 8.2), $n^{1/2}(\widetilde{M}_{sjn} - M_{sjF_n}) = O_p(1)$ (because $n^{1/2}(\widetilde{M}_n - M_{F_n}) \to_d \bar{M}_h$ by Assumption VD), $M_{sjF_n} = O(1)$ (because $M_F = (\Sigma_F^{vec(G_i)})^{-1/2}$, defined in (8.15), where $\Sigma_F^{vec(G_i)} := Var_F(vec(G_i) - \Gamma_F^{vec(G_i)}g_i) = [-\Gamma_F^{vec(G_i)} : I_{pk}]\, Var_F(f_i)\, [-\Gamma_F^{vec(G_i)} : I_{pk}]'$, has minimum eigenvalue bounded away from zero by the definition of $\mathcal{F}_{KCLR}$ in (18.5)), and $D_{jn} = O(1)$ (by the moment conditions in $\mathcal{F}$, defined in (3.1)). Hence,

$$n^{1/2}(\hat{D}^y_n - D^y_n) = \sum_{j=1}^p n^{1/2}[(\widetilde{M}_{1jn}\hat{D}_{jn}, \ldots, \widetilde{M}_{pjn}\hat{D}_{jn}) - (M_{1jF_n}D_{jn}, \ldots, M_{pjF_n}D_{jn})] = O_p(1). \quad (18.57)$$

This completes the proof of the analogue of Lemma 16.1, which completes the proof of parts (a)-(d) of Lemma 18.4.

For part (e) of Lemma 18.4, the results of parts (a)-(d) hold jointly with those in Theorem 18.3, rather than those in Lemma 8.3, because Theorem 18.3 is used to obtain the results in (16.37), rather than Lemma 8.3. This completes the proof.

Proof of Lemma 18.5.
The proof of parts (a) and (b) is the same as the proof of Theorem 10.1 for the case where Assumption R(a) holds (which states that $rk_n \to_p \infty$), using Lemma 18.4(a), which shows that $rk^y_n \to_p \infty$ if $q^y = p$. The proofs of parts (c) and (d) are the same as in (10.5)-(10.9) in the proof of Theorem 10.1 for the case where Assumption R(b) holds, using Theorem 18.3 and Lemma 18.4(b) in place of Lemma 8.3, with $r_h(\bar{D}_h, \bar{M}_h)$ (defined in (18.20)) in place of $r_h(\bar{D}_h)$, and, for part (d), with the proviso that $P(CLR_h = c(1-\alpha, r_h)) = 0$. (The proof in Theorem 10.1 that $P(CLR_h = c(1-\alpha, r_h)) = 0$ does not go through in the present case because $r_h = r_h(\bar{D}_h, \bar{M}_h)$ is not necessarily a constant conditional on $\bar{D}_h$ and, alternatively, conditional on $(\bar{D}_h, \bar{M}_h)$, $LM_h$ and $J_h$ are not necessarily independent and distributed as $\chi^2_p$ and $\chi^2_{k-p}$.) Note that (10.10) does not necessarily hold in the present case, because $r_h = r_h(\bar{D}_h, \bar{M}_h)$ is not necessarily a constant conditional on $\bar{D}_h$.

The proof of Theorem 18.6 given below uses Corollary 2.1(a) of ACG, which is stated below as Proposition 18.8. It is a generic asymptotic size result. Unlike Proposition 8.1 above, Proposition 18.8 applies when the asymptotic size is not necessarily equal to the nominal size $\alpha$.

Let $\{\phi_n : n \ge 1\}$ be a sequence of tests of some null hypothesis whose null distributions are indexed by a parameter $\lambda$ with parameter space $\Lambda$. Let $RP_n(\lambda)$ denote the null rejection probability of $\phi_n$ under $\lambda$. For a finite nonnegative integer $J$, let $\{h_n(\lambda) = (h_{1n}(\lambda), \ldots, h_{Jn}(\lambda))' \in R^J : n \ge 1\}$ be a sequence of functions on $\Lambda$. Define $H$ as in (8.1). For a sequence of scalar constants $\{C_n : n \ge 1\}$, let $C_n \to [C_{1,\infty}, C_{2,\infty}]$ denote that $C_{1,\infty} \le \liminf_{n\to\infty} C_n \le \limsup_{n\to\infty} C_n \le C_{2,\infty}$.

Assumption B: For any subsequence $\{w_n\}$ of $\{n\}$ and any sequence $\{\lambda_{w_n} \in \Lambda : n \ge 1\}$ for which $h_{w_n}(\lambda_{w_n}) \to h \in H$, $RP_{w_n}(\lambda_{w_n}) \to [RP^-(h), RP^+(h)]$ for some $RP^-(h), RP^+(h) \in [0,1]$.
Proposition 18.8 (ACG, Corollary 2.1(a)) Under Assumption B, the tests $\{\phi_n : n \ge 1\}$ have $AsySz := \limsup_{n\to\infty} \sup_{\lambda\in\Lambda} RP_n(\lambda) \in [\sup_{h\in H} RP^-(h), \sup_{h\in H} RP^+(h)]$.

Comments: (i) Corollary 2.1(a) of ACG is stated for confidence sets, rather than tests. But, following Comment 4 to Theorem 2.1 of ACG, with suitable adjustments (as in Proposition 18.8 above) it applies to tests as well.

(ii) Under Assumption B, if $RP^-(h) = RP^+(h)$ for all $h \in H$, then $AsySz = \sup_{h\in H} RP^+(h)$. We use this to prove Theorem 18.6. The result of Proposition 18.8 for the case where $RP^-(h) \ne RP^+(h)$ for some $h \in H$ is used when proving Comment (i) to Theorem 18.1 and the Comment to Theorem 18.6.

Proof of Theorem 18.6. Theorem 18.6 follows from Lemma 18.5 and Proposition 18.8 because Lemma 18.5 verifies Assumption B with $RP^-(h) = RP^+(h) = \alpha$ when $q^y = p$ and with $RP^-(h) = RP^+(h) = P(CLR_h > c(1-\alpha, r_h))$ when $q^y < p$.

19 Proof of Theorem 7.1

Theorem 7.1 of AG1. Suppose the LM test, the CLR test with moment-variance weighting, and, when p = 1, the CLR test with Jacobian-variance weighting are defined as in this section, the parameter space for F is $\mathcal{F}_{TS,0}$ for the first two tests and $\mathcal{F}_{TS,JVW,p=1}$ for the third test, and Assumption V holds. Then, these tests have asymptotic sizes equal to their nominal size $\alpha \in (0,1)$ and are asymptotically similar (in a uniform sense). Analogous results hold for the corresponding CS's for the parameter spaces $\mathcal{F}_{\Theta,TS,0}$ and $\mathcal{F}_{\Theta,TS,JVW,p=1}$.

The proof of Theorem 7.1 is analogous to that of Theorems 4.1, 5.2, and 6.1. In the time series case, for tests, we define $\lambda_F = (\lambda_{1,F}, \ldots, \lambda_{9,F})$ and $\{\lambda_{n,h} : n \ge 1\}$ as in (8.9) and (8.11), respectively, but with $\lambda_{5,F}$ defined differently than in the i.i.d. case. (For CS's in the time series case, we make the adjustments outlined in the Comment to Proposition 8.1.) We define$^{63}$

$$\lambda_{5,F} := V_F = \sum_{m=-\infty}^{\infty} E_F \begin{pmatrix} g_i \\ vec(G_i - E_F G_i) \end{pmatrix} \begin{pmatrix} g_{i-m} \\ vec(G_{i-m} - E_F G_{i-m}) \end{pmatrix}'. \quad (19.1)$$

In consequence, $\lambda_{5,F_n} \to h_5$ implies that $V_{F_n} \to h_5$ and the condition in Assumption V holds with $V = h_5$.
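To fix ideas, the long-run variance in (19.1) can be approximated by a truncated sum of sample autocovariances. Below is a minimal numerical sketch, assuming a scalar AR(1) process as the moment series; the function name, the AR(1) design, and the truncation lag are illustrative choices, not objects from AG1:

```python
import numpy as np

def long_run_variance(x, max_lag):
    """Truncated version of the doubly infinite sum of autocovariances
    in (19.1), computed from demeaned sample autocovariances."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = lambda m: (xc[m:] * xc[:n - m]).sum() / n
    # gamma(0) + 2 * sum_{m=1}^{max_lag} gamma(m) approximates the full sum
    return gamma(0) + 2.0 * sum(gamma(m) for m in range(1, max_lag + 1))

rng = np.random.default_rng(0)
# AR(1): x_i = rho * x_{i-1} + e_i; its long-run variance is 1 / (1 - rho)^2
rho, n = 0.5, 200_000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for i in range(1, n):
    x[i] = rho * x[i - 1] + e[i]

lrv = long_run_variance(x, max_lag=50)
# theoretical value for rho = 0.5 is 1 / (1 - 0.5)^2 = 4
```

The untruncated estimator is for illustration only; in practice a kernel-weighted (e.g., Bartlett) version would be used.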
The proof of Theorem 7.1 uses the CLT given in the following lemma.

Lemma 19.1 Let $f_i := (g_i', vec(G_i)')'$. We have: $w_n^{-1/2}\sum_{i=1}^{w_n}(f_i - E_{F_{w_n}}f_i) \to_d N(0^{(p+1)k}, h_5)$ under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$.

Proof of Theorem 7.1. The proof is the same as the proofs of Theorems 4.1, 5.2, and 6.1 (given in Sections 9, 10, and 11, respectively, in the Appendix to AG1) and the proofs of Lemmas 8.2 and 8.3 and Theorem 8.4 (given in Sections 14, 15, and 16 in this Supplemental Material), upon which the former proofs rely, for the i.i.d. case with some modifications. The modifications affect the proofs of Lemmas 8.2 and 8.3 and the proof of Theorem 5.2. No modifications are needed elsewhere.

The first modification is the change in the definition of $\lambda_{5,F}$ described in (19.1).

$^{63}$The difference in the definitions of $\lambda_{5,F}$ in the i.i.d. and time series cases reflects the difference in the definitions of $\Gamma_F^{vec(G_i)}$ in these two cases. See the footnote at (7.1) above regarding the latter.

The second modification is that $\hat{\Omega}_n = \hat{\Omega}_n(\theta_0) \to_p h_{5,g}$ not by the WLLN but by Assumption V and the definition of $\hat{\Omega}_n(\theta)$ in (7.4). In the time series case, by definition, $\lambda_{5,F} := V_F$, so $h_5 := \lim \lambda_{5,F_n} = \lim V_{F_n}$. By definition, $h_{5,g}$ is the upper left $k \times k$ submatrix of $h_5$, and $\Omega_F$ is the upper left $k \times k$ submatrix of $V_F$ by (7.1) and (19.1). Hence, $h_{5,g} = \lim \Omega_{F_n}$. By the definition of $\mathcal{F}_{TS}$, $\lambda_{\min}(\Omega_F) \ge \delta$ $\forall F \in \mathcal{F}_{TS}$. Hence, $h_{5,g}$ is pd.

Let $h_{5,G_j g}$ be the $k \times k$ submatrix of $h_5$ that corresponds to the submatrix $\hat{\Gamma}_{jn}(\theta)$ of $\hat{V}_n(\theta)$ in (7.4) for $j = 1, \ldots, p$. The third modification is that $\hat{\Gamma}_{jn} = \hat{\Gamma}_{jn}(\theta_0) = h_{5,G_j g} + o_p(1)$ in (14.1) in the proof of Lemma 8.2 (rather than $\hat{\Gamma}_{jn} = E_{F_n}G_{ij}g_i' + o_p(1)$) for $j = 1, \ldots, p$, and this holds by Assumption V and the definition of $\hat{\Gamma}_{jn}(\theta)$ in (7.4) (rather than by the WLLN). We write

$$h_5 = \begin{pmatrix} h_{5,g} & h_{5,Gg}' \\ h_{5,Gg} & h_{5,G} \end{pmatrix} \text{ for } h_{5,g} \in R^{k\times k},\ h_{5,Gg} = \begin{pmatrix} h_{5,G_1 g} \\ \vdots \\ h_{5,G_p g} \end{pmatrix} \in R^{pk\times k}, \text{ and } h_{5,G} \in R^{pk\times pk}. \quad (19.2)$$
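The partitioned form in (19.2) admits a standard block-diagonalization: premultiplying by the block-triangular matrix with $-h_{5,Gg}h_{5,g}^{-1}$ in the lower-left corner zeroes the off-diagonal blocks and leaves the Schur complement in the lower-right block. A numerical sketch of this matrix identity, assuming an arbitrary positive definite matrix in place of $h_5$ (the dimensions and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
k, pk = 2, 3                      # dimensions of the g_i and vec(G_i) blocks
A = rng.standard_normal((k + pk, k + pk))
h5 = A @ A.T + np.eye(k + pk)     # arbitrary pd stand-in for h5

h5_g = h5[:k, :k]                 # upper-left k x k block
h5_Gg = h5[k:, :k]                # lower-left pk x k block
h5_G = h5[k:, k:]                 # lower-right pk x pk block

# block rotation L = [[I_k, 0], [-h5_Gg h5_g^{-1}, I_pk]]
L = np.block([
    [np.eye(k), np.zeros((k, pk))],
    [-h5_Gg @ np.linalg.inv(h5_g), np.eye(pk)],
])

Lh5 = L @ h5 @ L.T                # sandwich form of the rotated variance
schur = h5_G - h5_Gg @ np.linalg.inv(h5_g) @ h5_Gg.T

# the rotated matrix is block diagonal: Diag(h5_g, Schur complement)
assert np.allclose(Lh5[:k, :k], h5_g)
assert np.allclose(Lh5[:k, k:], 0) and np.allclose(Lh5[k:, :k], 0)
assert np.allclose(Lh5[k:, k:], schur)
```

This is the algebra that makes the two limit components asymptotically independent under normality: a zero covariance block.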
The fourth modification is that $\widetilde{V}_{Dn}$ in (11.1) in the proof of Theorem 5.2 is defined as described in Section 7, rather than as in (5.3). In addition, $\widetilde{V}_{Dn} \to_p h_7$ in (11.1) holds with $h_7 = h_{5,G} - h_{5,Gg}(h_{5,g})^{-1}h_{5,Gg}'$ by Assumption V, rather than by the WLLN.

The fifth modification is the use of a WLLN and CLT for triangular arrays of strong mixing random vectors, rather than i.i.d. random vectors, for the quantities in the proof of Lemma 8.2 and elsewhere. For the WLLN, we use Example 4 of Andrews (1988), which shows that for a strong mixing row-wise-stationary triangular array $\{W_i : i \le n\}$ we have $n^{-1}\sum_{i=1}^n(\xi(W_i) - E_{F_n}\xi(W_i)) \to_p 0$ for any real-valued function $\xi(\cdot)$ (that may depend on $n$) for which $\sup_n E_{F_n}||\xi(W_i)||^{1+\delta} < \infty$ for some $\delta > 0$. For the CLT, we use Lemma 19.1 as follows. The joint convergence of $n^{1/2}\hat{g}_n$ and $n^{1/2}(\hat{D}_n - E_{F_n}G_i)$ in the proof of Lemma 8.2 is obtained from (14.1), modified by the second and third modifications above, and the following result:

$$n^{-1/2}\sum_{i=1}^n (\xi(W_i) - E_{F_n}\xi(W_i)) = \begin{pmatrix} I_k & 0^{k\times pk} \\ -h_{5,Gg}h_{5,g}^{-1} & I_{pk} \end{pmatrix} n^{-1/2}\sum_{i=1}^n (f_i - E_{F_n}f_i) \to_d N(0^{(p+1)k}, L_{h_5}), \text{ where}$$

$$\xi(W_i) := \begin{pmatrix} I_k & 0^{k\times pk} \\ -h_{5,Gg}h_{5,g}^{-1} & I_{pk} \end{pmatrix}\begin{pmatrix} g_i \\ vec(G_i) \end{pmatrix} = \begin{pmatrix} g_i \\ vec(G_i) - h_{5,Gg}h_{5,g}^{-1}g_i \end{pmatrix}, \quad (19.3)$$

$f_i = (g_i', vec(G_i)')'$, and the convergence holds by Lemma 19.1. Using (19.2), the variance matrix $L_{h_5}$ in (19.3) takes the form:

$$L_{h_5} = \begin{pmatrix} I_k & 0^{k\times pk} \\ -h_{5,Gg}h_{5,g}^{-1} & I_{pk} \end{pmatrix}\begin{pmatrix} h_{5,g} & h_{5,Gg}' \\ h_{5,Gg} & h_{5,G} \end{pmatrix}\begin{pmatrix} I_k & -h_{5,g}^{-1}h_{5,Gg}' \\ 0^{pk\times k} & I_{pk} \end{pmatrix} = \begin{pmatrix} h_{5,g} & 0^{k\times pk} \\ 0^{pk\times k} & \Sigma_h^{vec(G_i)} \end{pmatrix}, \text{ where } \Sigma_h^{vec(G_i)} := h_{5,G} - h_{5,Gg}h_{5,g}^{-1}h_{5,Gg}'. \quad (19.4)$$

Equations (14.1) (modified as described above), (19.3), and (19.4) combine to give the result of Lemma 8.2 for the time series case.

The sixth modification occurs in the proof of Lemma 8.3(d) in Section 15 in this Supplemental Material.
In the time series case, the proof goes through as is, except that the calculations in (15.13) are not needed because $\Gamma_F^{a_i}$ (and, hence, $\Sigma_F^{a_i}$ as well) is defined with its underlying components re-centered at their means (which is needed to ensure that $\Sigma_F^{vec(G_i)}$ is a convergent sum). The latter implies that $\lim \Phi_{F_n}^{vec(C_{F_n,k-q}'\Omega_{F_n}^{-1/2}G_i B_{F_n,p-q})} = \Phi_h^{vec(h_{3,k-q}'h_{5,g}^{-1/2}G_i h_{2,p-q})}$ automatically holds (which, in the i.i.d. case, is proved in (15.13)). This completes the proof of Theorem 7.1.

Proof of Lemma 19.1. For notational simplicity, we prove the result for the sequence $\{n\}$ rather than a subsequence $\{w_n : n \ge 1\}$. The same proof applies for any subsequence. By the Cramér-Wold device, it suffices to prove the result with $f_i - E_{F_n}f_i$ and $h_5$ replaced by $s(W_i) := b'(f_i - E_{F_n}f_i)$ and $b'h_5b$, respectively, for arbitrary $b \in R^{(p+1)k}$.

First, we show

$$\lim Var_{F_n}\left(n^{-1/2}\sum_{i=1}^n s(W_i)\right) = b'h_5b, \quad (19.5)$$

where by assumption $\lambda_{5,F_n} \to h_5$ and $b'\lambda_{5,F_n}b = \sum_{m=-\infty}^{\infty} E_{F_n}s(W_i)s(W_{i-m})$. By change of variables, we have

$$Var_{F_n}\left(n^{-1/2}\sum_{i=1}^n s(W_i)\right) = \sum_{m=-n+1}^{n-1} Cov_{F_n}(s(W_i), s(W_{i-m})) - \sum_{m=-n+1}^{n-1} \frac{|m|}{n}\, Cov_{F_n}(s(W_i), s(W_{i-m})). \quad (19.6)$$

This gives

$$\left| Var_{F_n}\left(n^{-1/2}\sum_{i=1}^n s(W_i)\right) - b'\lambda_{5,F_n}b \right| \le 2\sum_{m=n}^{\infty}||Cov_{F_n}(s(W_i), s(W_{i-m}))|| + \sum_{m=-n+1}^{n-1}\frac{|m|}{n}||Cov_{F_n}(s(W_i), s(W_{i-m}))||. \quad (19.7)$$

By a standard strong mixing covariance inequality, e.g., see Davidson (1994, p. 212),

$$\sup_{F\in\mathcal{F}_{TS}}||Cov_F(s(W_i), s(W_{i-m}))|| \le C_1\,\alpha_F(m)^{\gamma/(2+\gamma)} \le C_1 C^{\gamma/(2+\gamma)} m^{-d\gamma/(2+\gamma)}, \text{ where } d\gamma/(2+\gamma) > 1, \quad (19.8)$$

for some $C_1 < \infty$, where the second inequality uses the definition of $\mathcal{F}_{TS}$ in (7.2). In consequence, both terms on the rhs of (19.7) converge to zero. This and $b'\lambda_{5,F_n}b \to b'h_5b$ establish (19.5).

When $b'h_5b = 0$, we have $\lim_{n\to\infty} Var_{F_n}(n^{-1/2}\sum_{i=1}^n s(W_i)) = 0$, which implies that $n^{-1/2}\sum_{i=1}^n s(W_i) \to_d N(0, b'h_5b) = 0$. When $b'h_5b > 0$, we can assume $\sigma_n^2 := Var_{F_n}(n^{-1/2}\sum_{i=1}^n s(W_i)) \ge c$ for some $c > 0$ $\forall n \ge 1$ without loss of generality.
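The bound in (19.7) goes to zero because the covariances are summable, so in particular the Cesàro-weighted term $\sum_{|m|<n}(|m|/n)\,||Cov(\cdot)||$ vanishes as $n \to \infty$. A small sketch of this fact, assuming geometrically decaying covariances as a stand-in for the polynomial mixing bound in (19.8) (the decay rate 0.9 is an illustrative choice):

```python
import numpy as np

def cesaro_term(gamma, n):
    """sum over 0 < |m| < n of (|m|/n) * gamma(|m|) for a symmetric
    covariance function gamma -- the second term on the rhs of (19.7)."""
    m = np.arange(1, n)
    return 2.0 * np.sum((m / n) * gamma(m))

# geometric covariance decay, a stand-in for the mixing bound in (19.8)
gamma = lambda m: 0.9 ** np.asarray(m, dtype=float)

vals = [cesaro_term(gamma, n) for n in (10, 100, 1000, 10000)]
# since sum of m * 0.9^m converges, the term behaves like O(1/n)
```

Because $\sum_m m\,\gamma(m) < \infty$ here, the weighted term is $O(1/n)$, mirroring the role of the summability condition $d\gamma/(2+\gamma) > 1$ in the proof.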
We apply the triangular array CLT in Corollary 1 of de Jong (1997) with (using de Jong's notation) $X_{ni} := n^{-1/2}s(W_i)\sigma_n^{-1}$ and $c_{ni} := n^{-1/2}\sigma_n^{-1}$ for $i \le n$, $n \ge 1$.

Now we verify conditions (a)-(c) of Assumption 2 of de Jong (1997). Condition (a) holds automatically. Condition (b) holds because $c_{ni} > 0$ and $E_{F_n}|X_{ni}/c_{ni}|^{2+\gamma} = E_{F_n}|s(W_i)|^{2+\gamma} \le 2||b||^{2+\gamma}M < \infty$ $\forall F_n \in \mathcal{F}_{TS}$. Condition (c) holds by taking $V_{ni} = X_{ni}$ (where $V_{ni}$ is the random variable that appears in the definition of near epoch dependence in Definition 2 of de Jong (1997)), $d_{ni} = 0$, and using $\alpha_{F_n}(m) \le Cm^{-d}$ $\forall F_n \in \mathcal{F}_{TS}$ for $d > (2+\gamma)/\gamma$ and $C < \infty$. By Corollary 1 of de Jong (1997), we have $\sum_{i=1}^n X_{ni} \to_d N(0,1)$. This and (19.5) give

$$n^{-1/2}\sum_{i=1}^n s(W_i) \to_d N(0, b'h_5b), \quad (19.9)$$

as desired.

References

Andrews, D. W. K. (1988): "Laws of Large Numbers for Dependent Non-identically Distributed Random Variables," Econometric Theory, 4, 458-467.

Davidson, J. (1994): Stochastic Limit Theory. Oxford: Oxford University Press.

de Jong, R. M. (1997): "Central Limit Theorems for Dependent Heterogeneous Random Variables," Econometric Theory, 13, 353-367.

Hwang, S.-G. (2004): "Cauchy's Interlace Theorem for Eigenvalues of Hermitian Matrices," American Mathematical Monthly, 111, 157-159.

Johansen, S. (1991): "Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models," Econometrica, 59, 1551-1580.

Kleibergen, F. (2005): "Testing Parameters in GMM Without Assuming That They Are Identified," Econometrica, 73, 1103-1123.

Rao, C. R. (1973): Linear Statistical Inference and Its Applications. 2nd edition. New York: Wiley.

Robin, J.-M., and R. J. Smith (2000): "Tests of Rank," Econometric Theory, 16, 151-175.

Stewart, G. W. (2001): Matrix Algorithms Volume II: Eigensystems. Philadelphia: SIAM.