Econometric Theory, 28, 2012, 42–86. doi:10.1017/S0266466611000120

ASYMPTOTIC DISTRIBUTION OF JIVE IN A HETEROSKEDASTIC IV REGRESSION WITH MANY INSTRUMENTS

JOHN C. CHAO, University of Maryland
NORMAN R. SWANSON, Rutgers University
JERRY A. HAUSMAN AND WHITNEY K. NEWEY, MIT
TIEMEN WOUTERSEN, Johns Hopkins University

This paper derives the limiting distributions of alternative jackknife instrumental variables (JIV) estimators and gives formulas for accompanying consistent standard errors in the presence of heteroskedasticity and many instruments. The asymptotic framework includes the many instrument sequence of Bekker (1994, Econometrica 62, 657–681) and the many weak instrument sequence of Chao and Swanson (2005, Econometrica 73, 1673–1691). We show that JIV estimators are asymptotically normal and that standard errors are consistent provided that √Kn/rn → 0 as n → ∞, where Kn and rn denote, respectively, the number of instruments and the concentration parameter. This is in contrast to the asymptotic behavior of such classical instrumental variables estimators as limited information maximum likelihood, bias-corrected two-stage least squares, and two-stage least squares, all of which are inconsistent in the presence of heteroskedasticity unless Kn/rn → 0. We also show that the rate of convergence and the form of the asymptotic covariance matrix of the JIV estimators will in general depend on the strength of the instruments as measured by the relative orders of magnitude of rn and Kn.

Earlier versions of this paper were presented at the NSF/NBER conference on weak and/or many instruments at MIT in 2003 and at the 2004 winter meetings of the Econometric Society in San Diego, where conference participants provided many useful comments and suggestions. Particular thanks are owed to D. Ackerberg, D. Andrews, J. Angrist, M. Caner, M. Carrasco, P. Guggenberger, J. Hahn, G. Imbens, R. Klein, N. Lott, M. Moriera, G.D.A. Phillips, P.C.B. Phillips, J. Stock, J. Wright, two anonymous referees, and a co-editor for helpful comments and suggestions. Address correspondence to: Whitney K. Newey, Department of Economics, MIT, E52-262D, Cambridge, MA 02142-1347, USA; e-mail: [email protected].

© Cambridge University Press 2011

1. INTRODUCTION

It has long been known that the two-stage least squares (2SLS) estimator is biased with many instruments (see, e.g., Sawa, 1968; Phillips, 1983; and the references cited therein). In large part because of this problem, various approaches have been proposed in the literature to reduce the bias of the 2SLS estimator. In recent years, there has been interest in developing procedures that use "delete-one" fitted values in lieu of the usual first-stage ordinary least squares fitted values as the instruments employed in the second stage of the estimation. A number of different versions of these estimators, referred to as jackknife instrumental variables (JIV) estimators, have been proposed and analyzed by Phillips and Hale (1977), Angrist, Imbens, and Krueger (1999), Blomquist and Dahlberg (1999), Ackerberg and Devereux (2009), Davidson and MacKinnon (2006), and Hausman, Newey, Woutersen, Chao, and Swanson (2007).
The JIV estimators are consistent with many instruments and heteroskedasticity of unknown form, whereas other estimators, including limited information maximum likelihood (LIML) and bias-corrected 2SLS (B2SLS) estimators are not (see, e.g., Bekker and van der Ploeg, 2005; Ackerberg and Devereux, 2009; Chao and Swanson, 2006; Hausman et al., 2007). The main objective of this paper is to develop asymptotic theory for the JIV estimators in a setting that includes the many instrument sequence of Kunitomo (1980), Morimune (1983), and Bekker (1994) and the many weak instrument sequence of Chao and Swanson (2005). To be precise, √ we show that JIV estimators are consistent and asymptotically normal when K n /rn → 0 as n → ∞, where K n and rn denote the number of instruments and the so-called concentration parameter, respectively. In contrast, consistency of LIML and B2SLS generally requires that Krnn → 0 as n → ∞, meaning that the number of instruments is small relative to the identification strength. We show that both the rate of convergence of the JIV estimator and the form of its asymptotic covariance matrix depend on how weak the available instruments are, as measured by the relative order of magnitude of rn vis-à-vis K n . We also show consistency of the standard errors under heteroskedasticity and many instruments. Hausman et al. (2007) also consider a jackknife form of LIML that is slightly more difficult to compute but is asymptotically efficient relative to JIV under many weak instruments and homoskedasticity. With heteroskedasticity, any of the estimators may outperform the others, as shown by Monte Carlo examples in Hausman et al. Hausman et al. also propose a jackknife version of the Fuller (1977) estimator that has fewer outliers. This paper is a substantially altered and revised version of Chao and Swanson (2004), in which we now allow for the many instrument sequence of Kunitomo (1980), Morimune (1983), and Bekker (1994). In the process of showing the asymptotic normality of JIV, this paper gives a central limit theorem for quadratic (and, more generally, bilinear) forms associated with an idempotent matrix. This theorem can be used to study estimators other than JIV. For example, it has already been used in Hausman et al. (2007) to derive the asymptotic properties of the 44 JOHN C. CHAO ET AL. jackknife versions of the LIML and Fuller (1977) estimators and in Chao, Hausman, Newey, Swanson, and Woutersen (2010) to derive a moment-based test. The rest of the paper is organized as follows. Section 2 sets up the model and describes the estimators and standard errors. Section 3 lays out the framework for the asymptotic theory and presents the main results of our paper. Section 4 comments on the implications of these results and concludes. All proofs are gathered in the Appendixes. 2. THE MODEL AND ESTIMATORS The model we consider is given by y = X δ0 + ε , n×G G×1 n×1 n×1 X = ϒ + U, where n is the number of observations, G is the number of right-hand-side variables, ϒ is the reduced form matrix, and U is the disturbance matrix. For the asymptotic approximations, the elements of ϒ will implicitly be allowed to depend on n, although we suppress the dependence of ϒ on n for notational convenience. Estimation of δ0 will be based on an n × K matrix, Z , of instrumental variable observations with rank(Z ) = K . Let Z = (ϒ, Z ) and assume that E[ε|Z] = 0 and E[U |Z] = 0. This model allows for ϒ to be a linear combination of Z (i.e., ϒ = Z π, for some K × G matrix π). 
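To fix ideas, the following is a minimal simulation sketch of this model, y = Xδ0 + ε with X = ϒ + U. Everything in the sketch — the data-generating process, the parameter values, and the function name simulate_iv_data — is an illustrative assumption rather than a design taken from the paper: a single endogenous regressor, many instruments, and heteroskedastic disturbances that are correlated across the two equations while satisfying E[ε|Z] = 0 and E[U|Z] = 0.

```python
import numpy as np

def simulate_iv_data(n=1000, K=100, delta0=1.0, seed=0):
    """Simulate y = X*delta0 + eps, X = Upsilon + U with many instruments and
    heteroskedasticity (hypothetical DGP for illustration, not from the paper)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, K))                 # instrument matrix, full column rank w.p. 1
    pi = np.full(K, 1.0 / np.sqrt(n))           # first-stage coefficients shrink with n
    upsilon = Z @ pi                            # reduced-form mean of the endogenous regressor
    scale = np.sqrt(0.5 + 0.5 * Z[:, 0] ** 2)   # heteroskedasticity driven by the first instrument
    common = rng.normal(size=n)                 # shared shock inducing endogeneity
    eps = scale * common + 0.3 * rng.normal(size=n)   # structural disturbance
    U = 0.8 * scale * common + 0.3 * rng.normal(size=n)  # reduced-form disturbance, correlated with eps
    x = upsilon + U
    y = delta0 * x + eps
    return y, x.reshape(-1, 1), Z

y, X, Z = simulate_iv_data()
print(y.shape, X.shape, Z.shape)   # (1000,) (1000, 1) (1000, 100)
```

In this sketch the first-stage coefficients are scaled by 1/√n, so the concentration parameter grows roughly like the number of instruments; this is only one convenient choice, made in the spirit of the many weak instrument sequences discussed later in the paper.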
Furthermore, some columns of X may be exogenous, with the corresponding column of U being zero. The model also allows for Z to approximate the reduced form. For example, let X i , ϒi , and Z i denote the ith row (observation) for X, ϒ, and Z , respectively. We could let ϒi = f 0 (wi ) be a vector of unknown functions of a vector wi of underlying instruments and let Z i = ( p1K (wi ), . . . , p K K (wi )) for approximating functions pk K (w), such as power series or splines. In this case, linear combinations of Z i may approximate the unknown reduced form (e.g., Newey, 1990). To describe the estimators, let P = Z (Z Z )−1 Z and Pij denote the (i, j)th ele¯ −i = (Z Z − Z i Z )−1 (Z X − Z i X ) be the reduced ment of P. Additionally, let i i form coefficients obtained by regressing X on Z using all observations except the ith. The JIV1 estimator of Phillips and Hale (1977) is obtained as δ̃ = n ∑ i=1 −1 ¯ −i Z i X i n ∑ ¯ −i Z i yi . i=1 Using standard results on recursive residuals, it follows that ¯ −i Z i = X Z (Z Z )−1 Z i − Pii X i (1 − Pii ) = ∑ Pij X j /(1 − Pii ). j=i JIVE WITH HETEROSKEDASTICITY 45 Then, we have that δ̃ = H̃ −1 ∑ X i Pij (1 − Pj j )−1 y j , i= j H̃ = ∑ X i Pij (1 − Pj j )−1 X j , i= j where i= j denotes the double sum ∑i ∑ j=i . The JIV2 estimator proposed by Angrist et al. (1999), JIVE2, has a similar form, except that −i = (Z Z )−1 ¯ −i . It is given by (Z X − Z i X i ) is used in place of δ̂ = Ĥ −1 ∑ X i Pij y j , i= j Ĥ = ∑ X i Pij X j . i= j To explain why JIV2 is a consistent estimator, it is helpful to consider JIV2 as a minimizer of an objective function. As usual, the limit of the minimizer will be the minimizer of the limit under appropriate regularity conditions. We focus on δ̂ to simplify the discussion. The estimator δ̂ satisfies δ̂ = arg minδ Q̂(δ), where Q̂(δ) = ∑ ( yi − X i δ)Pij ( yj − X j δ). i= j Note that the difference between the 2SLS objective function ( y − X δ)P( y − n X δ) and Q̂(δ) is ∑i=1 Pii ( yi − X i δ)2 . This is a weighted least squares object that is a source of bias in 2SLS because its expectation is not minimized at δ0 when X i and εi are correlated. This object does not vanish asymptotically relative to E[ Q̂(δ)] under many (or many weak) instruments, leading to inconsistency of 2SLS. When observations are mutually independent, the inconsistency is caused by this term, so removing it to form Q̂(δ) makes δ̂ consistent. To explain further, consider the JIV2 objective function Q̂(δ). Note that for Ũi (δ) = εi − Ui (δ − δ0 ) Q̂(δ) = Q̂ 1 (δ) + Q̂ 2 (δ) + Q̂ 3 (δ), Q̂ 2 (δ) = −2 ∑ i= j Ũi (δ)Pij ϒ j (δ − δ0 ), Q̂ 1 (δ) = ∑ (δ − δ0 ) ϒi Pij ϒj (δ − δ0 ), i= j Q̂ 3 (δ) = ∑ Ũi (δ)PijŨ j (δ). i= j Then by the assumptions E[Ũi (δ)] = 0 and independence of observations, we have E[ Q̂(δ)|Z] = Q 1 (δ). Under the regularity conditions in Section 3, ∑i= j ϒi Pij ϒ j is positive definite asymptotically, so Q 1 (δ) is minimized at δ0 . Thus, the expectation Q 1 (δ) of Q̂(δ) is minimized at the true parameter δ0 ; in the terminology of Han and Phillips (2006), the many instrument “noise” term in the expected objective function is identically zero. For consistency of δ̂, it is also necessary that the stochastic components of Q̂(δ) do not dominate asymptotically. The size of Q̂ 1 (δ) (for δ = δ0 ) is proportional to the concentration parameter that we denote by rn . It√turns out that Q̂ 2 (δ) has size smaller than Q̂ 1 (δ) asymptotically but Q̂ 3 (δ) is O p ( K n ) (Lemma A1 shows that the variance of Q̂ 3 (δ) is proportional to K n ). 
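The two estimators defined above are straightforward to compute directly from (y, X, Z). The sketch below is an illustrative implementation, not code from the paper: it forms the full n × n projection matrix P, which is convenient for exposition but memory-intensive for very large n, and the function and variable names are hypothetical.

```python
import numpy as np

def jive_estimators(y, X, Z):
    """Compute JIV1 (delta_tilde) and JIV2 (delta_hat) as defined above.
    y: (n,), X: (n, G), Z: (n, K). Illustrative sketch only."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # projection matrix P = Z (Z'Z)^{-1} Z'
    Pii = np.diag(P).copy()                    # leverage values P_ii
    W = P - np.diag(Pii)                       # P with its diagonal removed: implements sums over i != j
    # JIV1: the weights (1 - P_jj)^{-1} attach to the j index
    Xw = X / (1.0 - Pii)[:, None]
    yw = y / (1.0 - Pii)
    H_tilde = X.T @ W @ Xw                     # sum_{i!=j} X_i P_ij X_j' / (1 - P_jj)
    delta_tilde = np.linalg.solve(H_tilde, X.T @ W @ yw)
    # JIV2: no (1 - P_jj)^{-1} weights
    H_hat = X.T @ W @ X                        # sum_{i!=j} X_i P_ij X_j'
    delta_hat = np.linalg.solve(H_hat, X.T @ W @ y)
    return delta_tilde, delta_hat, P

# Example, using the simulated data from the previous sketch:
# delta_tilde, delta_hat, P = jive_estimators(y, X, Z)
```

Zeroing the diagonal of P implements the sums over i ≠ j in a single matrix product; this is exactly the device that removes the term ∑i Pii (yi − Xi'δ)² identified above as the source of the 2SLS bias. With these definitions in hand, we return to the behavior of the objective function Q̂(δ).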
Thus, to ensure that the expectation 46 JOHN C. CHAO ET AL. of √ Q̂(δ) dominates the stochastic part of Q̂(δ), it suffices to impose the restriction K n /rn → 0, which we do throughout the asymptotic theory. This condition was formulated in Chao and Swanson (2005). The estimators δ̃ and δ̂ are consistent and asymptotically normal √ with heteroskedasticity under the regularity conditions we impose, including K n /rn → 0. In contrast, consistency of LIML and Fuller (1977) require K n /rn → 0 when Pii is asymptotically correlated with E[X i εi |Z]/E[εi2 |Z], as discussed in Chao and Swanson (2004) and Hausman et al. (2007). This condition is also required for consistency of the bias-corrected 2SLS estimator of Donald and Newey (2001) when Pii is asymptotically correlated with E[X i εi |Z], as discussed in Ackerberg and Devereux (2009). Thus, JIV estimators are robust to heteroskedasticity and many instruments (when K n grows as fast as rn ), whereas LIML, Fuller (1977), or B2SLS estimators are not. Hausman et al. (2007) also consider a JIV form of LIML, which is obtained by minimizing Q̂(δ)/[( y − X δ) ( y − X δ)]. The sum of squared residuals in the denominator makes computation somewhat more complicated; however, like LIML, it has an explicit form in terms of the smallest eigenvalue of a matrix. This JIV form of LIML is asymptotically efficient relative to δ̂ and δ̃ under many weak instruments and homoskedasticity. With heteroskedasticity, δ̂ and δ̃ may perform better than this estimator, as shown by Monte Carlo examples in Hausman et al.; they also propose a jackknife version of the Fuller (1977) estimator that has fewer outliers than the JIV form of LIML. To motivate the form of the variance estimator for δ̂ and δ̃, note that for ξi = (1 − Pii )−1 εi , substituting yi = X i δ0 + εi in the equation for δ̃ gives δ̃ = δ0 + H̃ −1 ∑ X i Pij ξ j . (1) i= j After appropriate normalization, the matrix H̃ −1 will converge and a central limit theorem will apply to ∑i= j X i Pij ξ j ,which leads to a sandwich form for the asymptotic variance. Here H̃ −1 can be used to estimate the outside terms in the sandwich. The inside term, which is the variance of ∑i= j X i Pij ξ j , can be estimated by dropping terms that are zero from the variance, the expectation, and removing replacing ξi with an estimate, ξ̃i = (1 − Pii )−1 yi − X i δ̃ . Using the independence of the observations, E[εi |Z] = 0, and the exclusion of the i = j terms in the double sums, it follows that E ∑ X i Pij ξ j ∑ X i Pij ξ j |Z i= j =E i= j ∑ ∑ i, j k ∈{i, / j} Pik Pjk X i X j ξk2 + ∑ Pij2 X i ξi X j ξ j |Z . i= j Removing the expectation and replacing ξi with ξ̃i gives ˜ =∑ ∑ i, j k ∈{i, / j} Pik Pjk X i X j ξ̃k2 + ∑ Pij2 X i ξ̃i X j ξ̃ j . i= j JIVE WITH HETEROSKEDASTICITY 47 The estimator of the asymptotic variance of δ̃ is then given by ˜ H̃ −1 . Ṽ = H̃ −1 This estimator is robust to heteroskedasticity, as it allows Var(ξi |Z) and E[X i ξi |Z] to vary over i. A vectorized form of Ṽ is easier to compute. Note that for X̃ i = X i /(1 − Pii ), we have H̃ = X P X̃ − ∑i X i Pii X̃ i . Also, let X̄ = P X, Z̃ = Z (Z Z )−1 , and Z i and Z̃ i equal the ith row of Z and Z̃ , respectively. Then, as shown in the proof of Theorem 4, we have ˜ = n ∑ ( X̄ i X̄ i − X i Pii X̄ i − X̄ i Pii X i )ξ̂i2 i=1 K +∑ K n ∑ ∑ Z̃ ik Z̃ i X i ξ̂i k=1 =1 n ∑ Z jk Z j X j ξ̂j i=1 . j=1 This formula can be computed quickly by software with fast vector operations, even when n is large. An asymptotic variance estimator for δ̂ can be formed in an analogous way. 
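A sketch of this sandwich variance estimator follows, covering both δ̃ (JIV1) and, anticipating the analogous construction for δ̂ described next, JIV2. It computes the middle matrix through the identity based on X̄ = PX described above rather than through explicit triple sums. The function name and arguments are illustrative, and the sketch assumes the hypothetical jive_estimators function from the earlier sketch has been run, so that P and the point estimates are available.

```python
import numpy as np

def jive_variance(y, X, delta, P, jive1=True):
    """Sandwich variance estimator V = H^{-1} Sigma H^{-1} for the JIV estimators,
    following the formulas above. Illustrative sketch; forms n x n arrays."""
    Pii = np.diag(P)
    W = P - np.diag(Pii)                       # off-diagonal part of P
    resid = y - X @ delta
    if jive1:
        xi = resid / (1.0 - Pii)               # xi_tilde_i = (y_i - X_i'delta) / (1 - P_ii)
        H = X.T @ W @ (X / (1.0 - Pii)[:, None])
    else:
        xi = resid                             # eps_hat_i for JIV2
        H = X.T @ W @ X
    Xbar = P @ X                               # row i: sum_j P_ij X_j'
    B = Xbar - Pii[:, None] * X                # row i: sum_{j != i} P_ij X_j'
    Sigma = (B * (xi ** 2)[:, None]).T @ B     # sum_k xi_k^2 (sum_{i!=k} P_ik X_i)(sum_{j!=k} P_jk X_j)'
    Q = W ** 2                                 # P_ij^2 with zero diagonal
    Xxi = X * xi[:, None]
    Sigma = Sigma + Xxi.T @ Q @ Xxi            # + sum_{i != j} P_ij^2 (X_i xi_i)(X_j xi_j)'
    Hinv = np.linalg.inv(H)
    return Hinv @ Sigma @ Hinv.T

# Usage with the earlier sketches, e.g.:
# V_tilde = jive_variance(y, X, delta_tilde, P, jive1=True)
# V_hat   = jive_variance(y, X, delta_hat,   P, jive1=False)
# se = np.sqrt(np.diag(V_hat))
```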
Note that Ĥ = X P X − ∑i X i Pii X i . Also for ε̂i = yi − X i δ̂, we can estimate the middle matrix of the sandwich by ˆ = n ∑ ( X̄ i X̄ i − X i Pii X̄ i − X̄ i Pii X i )ε̂i2 i=1 K +∑ K n ∑ ∑ Z̃ ik Z̃ i X i ε̂i k=1 =1 i=1 n ∑ Z jk Z j X j ε̂j . j=1 The variance estimator for δ̂ is then given by ˆ Ĥ −1 . V̂ = Ĥ −1 Here Ĥ is symmetric because P is symmetric, so a transpose is not needed for the third matrix in V̂ . 3. MANY INSTRUMENT ASYMPTOTICS Our asymptotic theory combines the many instrument asymptotics of Kunitomo (1980), Morimune (1983), and Bekker (1994) with the many weak instrument asymptotics of Chao and Swanson (2005). All of our regularity conditions are conditional on Z = (ϒ, Z ). To state the regularity conditions, let Z i , εi ,Ui , and ϒi denote the ith row of Z , ε,U, and ϒ, respectively. Also let a.s. denote almost surely (i.e., with probability one) and a.s.n denote a.s. for n large enough (i.e., with probability one for all n large enough). 48 JOHN C. CHAO ET AL. Assumption 1. K = K n → ∞, Z includes among its columns a vector of ones, for some C < 1, rank(Z ) = K , and Pii ≤ C, (i = 1, . . . , n) a.s.n. In this paper, C is a generic notation for a positive constant that may be bigger or less than 1. Hence, although in Assumption 1 C is taken to be less than 1, in other parts of the paper it might not be. The restriction that rank(Z ) = K is a normalization that requires excluding redundant columns from Z . It can be verified in particular cases. For instance, when wi is a continuously distributed scalar, Z i = p K (wi ), and pk K (w) = wk−1 , it can be shown that Z Z is nonsingular with probability one for K < n.1 The condition Pii ≤ C < 1 implies that K /n ≤ C n because K /n = ∑i=1 Pii /n ≤ C. Now, let λmin (A) denote the √ smallest eigenvalue of a symmetric matrix A and for any matrix B, let B = tr(B B). √ Assumption 2. ϒi = Sn z i / n where Sn = S̃n diag (μ1n , . . . , μGn ), S̃n is G × G and bounded, and the smallest eigenvalue of S̃n S̃n is bounded away from zero. 2 √ √ min μ jn → Also, for each j, either μ jn = n or μ jn / n → 0, rn = 1≤ j≤G n √ /n ≤ C and ∞, and K /r → 0. Also, there is C > 0 such that z z ∑ n i i i=1 n λmin ∑i=1 z i z i /n ≥ 1/C a.s.n. This condition is similar to Assumption 2 of Hansen, Hausman, and Newey (2008). It accommodates linear models where included instruments (e.g., a constant) have fixed reduced form coefficients and excluded instruments have coefficients that can shrink as the sample size grows. A leading example of such a model is a linear structural equation with one endogenous variable of the form δ01 + δ0G X iG + εi , yi = Z i1 (2) where Z i1 is a G 1 × 1 vector of included instruments (e.g., including a constant) and X iG is an endogenous variable. Here the number of right-hand-side variables is G 1 + 1 = G. Let the reduced form be partitioned conformably with δ, as ϒi = , ϒ ) and U = (0,U ) . Here the disturbances for the reduced form for (Z i1 iG i iG Z i1 are zero because Z i1 is taken to be exogenous. Suppose that the reduced form for X iG depends linearly on the included instrumental variables Z i1 and on an excluded instrument z iG as in X iG = ϒiG + UiG , rn /n z iG . ϒiG = π1 Z i1 + Here we normalize z iG so that rn determines how strongly δG is identified, and we absorb into z iG any other terms, such as unknown coefficients. For Assump , z ) and require that the second moment matrix of z is tion 2, we let z i = (Z i1 iG i bounded and bounded away from zero. 
This normalization allows rn to determine the strength of identification of δG . For example, if rn = n, then the coefficient on z iG does not shrink, which corresponds to strong identification of δG . If rn grows √ more slowly than n, then δG will be more weakly identified. Indeed, 1/ rn will JIVE WITH HETEROSKEDASTICITY 49 be the convergence rate for estimators of δG . We require rn → ∞ to avoid the weak instrument setting of Staiger and Stock (1997), where δG is not asymptotically identified. For this model, the reduced form is I 0 I√0 Z i1 Z√ i1 = . ϒi = π1 1 0 rn /n π1 Z i1 + rn /nz iG z iG This reduced form is as specified in Assumption 2 with √ √ I 0 1 ≤ j ≤ G 1, , μ jn = n, μGn = rn . S̃n = π1 1 Note how this somewhat complicated specification is needed to accommodate fixed reduced form coefficients for included instrumental variables and excluded instruments with identifying power that depend on n. We have been unable to simplify Assumption 2 while maintaining the generality needed for such important cases. We will not require that z iG be known, only that it be approximated by a lin , Z ) . Implicitly, Z ear combination of the instrumental variables Z i = (Z i1 i1 i2 and z iG are allowed to depend on n. One important case is where the excluded instrument z iG is an unknown linear combination of the instrumental variables , Z ) . For example, the many weak instrument setting of Chao and Z i = (Z i1 i2 Swanson (2005) is one where the reduced form is given by √ ϒiG = π1 Z i1 + (π2 / n) Z i2 for a K − G 1 dimensional vector Z i2 of excluded instrumental variables. This model can be folded into our framework by specifying that K − G1, rn = K − G 1 . z iG = π2 Z i2 Assumption 2 will then require that 2 /n = (K − G 1 )−1 ∑(π2 Z i2 )2 ∑ ziG i n i is bounded and bounded away from zero. Thus, the second moment ∑i (π2 Z i2 )2 /n of the term in the reduced form that identifies δ0G must grow linearly √ in K , just as in Chao and Swanson (2005), leading to a convergence rate of 1/ K − G 1 = √ 1/ rn . In another important case, the excluded instrument z iG could be an unknown function that can be approximated by a linear combination of Z i . For instance, suppose that z iG = f 0 (wi ) for an unknown function f 0 (wi ) of variables wi . In this def case, the instrumental variables could include a vector p K (wi ) = ( p1K (wi ), . . . , p K −G 1 ,K (wi )) of approximating functions, such as polynomials or splines. Here 50 JOHN C. CHAO ET AL. , p K (w ) ) . For r = n, the vector of instrumental variables would be Z i = (Z i1 i n this example is like Newey (1990) where Z i includes approximating functions for the reduced form but the number of instruments can grow as fast as the sample size. Alternatively, if rn /n → 0, it is a modified version where δG is more weakly identified. Assumption 2 also allows for multiple endogenous variables with a different strength of identification for each one, i.e., for different convergence rates. In the preceding example, we maintained the scalar endogenous variable for simplicity. The rn can be thought of as a version of the concentration parameter; it determines the convergence rate of estimators of δ0G just as the concentration √ parameter does in other settings. For rn = n, the convergence rate will be n where Assumptions 1 and 2 permit K to grow as fast as the sample size. 
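As a numerical check on the Chao and Swanson (2005) specification above, the following short sketch (with illustrative choices of n, of the number K − G1 of excluded instruments, and of π2, none of which are taken from the paper) confirms that the second moment ∑i (π2'Zi2)²/n of the identifying part of the reduced form grows roughly linearly in the number of excluded instruments, that is, like rn = K − G1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
for K2 in (10, 50, 200):                     # K - G_1, the number of excluded instruments
    Z2 = rng.normal(size=(n, K2))
    pi2 = np.ones(K2)                        # illustrative first-stage coefficients
    signal = Z2 @ pi2 / np.sqrt(n)           # (pi_2' Z_i2) / sqrt(n), the identifying component
    print(K2, round(float(np.sum(signal ** 2)), 1))   # grows roughly linearly in K2, i.e., like r_n
```

In particular, when rn = n the concentration parameter grows at the same rate as the sample size, even though K may also grow that fast.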
This corresponds to a many instrument asymptotic approximation like Kunitomo (1980), Morimune (1983), and Bekker (1994).√For rn growing more slowly than n, the convergence rate will be slower than 1/ n, which leads to an asymptotic approximation like that of Chao and Swanson (2005). Assumption 3. There is a constant, C, such that conditional on Z = (ϒ, Z ), the observations (ε1 ,U1 ), . . . , (εn ,Un ) are independent, with E[εi |Z] = 0 for all i, E[Ui |Z] = 0 for all i, supi E[εi2 |Z] < C, and supi E[ Ui 2 |Z] ≤ C, a.s. In other words, Assumption 3 requires the second conditional moments of the disturbances to be bounded. 2 n z Assumption 4. There is a π K such that ∑i=1 i − π K Z i /n → 0 a.s. This condition allows an unknown reduced form that is approximated by a linear combination of the instrumental variables. These four assumptions give the consistency result presented in Theorem 1. −1/2 Sn (δ̃ − THEOREM 1. Suppose that Assumptions 1–4 are satisfied. Then, rn p p p p −1/2 δ0 ) → 0, δ̃ → δ0 , rn Sn (δ̂ − δ0 ) → 0, and δ̂ → δ0 . The following additional condition is useful for establishing asymptotic normality and the consistency of the asymptotic variance. n z 4 Assumption 5. There is a constant, C > 0, such that ∑i=1 /n 2 → 0, i 4 4 supi E[εi |Z] < C, and supi E[ Ui |Z] ≤ C a.s. To give asymptotic normality results, we need to describe the asymptotic variances. We will outline results that do not depend on the convergence of various moment matrices, so we writethe asymptotic variances as a function of n (rather than as a limit). Let σi2 = E εi2 |Z where, for notational simplicity, we have suppressed the possible dependence of σi2 on Z. Moreover, let H̄n = n ∑ zi zi /n, i=1 ¯n = n ∑ zi zi σi2 /n, i=1 JIVE WITH HETEROSKEDASTICITY 51 ¯ n = Sn−1 ∑ Pij2 E[Ui Ui |Z]σ j2 (1 − Pj j )−2 i= j Hn = + E[Ui εi |Z](1 − Pii )−1 E[ε j U j |Z](1 − Pj j )−1 Sn−1 , n ∑ (1 − Pii )zi zi /n, i=1 n = n ∑ (1 − Pii )2 zi zi σi2 /n, i=1 n = Sn−1 ∑ Pij2 E[Ui Ui |Z]σ j2 + E[Ui εi |Z]E[ε j U j |Z] Sn−1 . i= j When K /rn is bounded, the conditional asymptotic variance given Z of Sn (δ̃ −δ0 ) is ¯ n + ¯ n ) H̄n−1 , V̄n = H̄n−1 ( and the conditional asymptotic variance of Sn (δ̂ − δ0 ) is Vn = Hn−1 (n + n )Hn−1 . To state our asymptotic normality results, let A1/2 denote a square root matrix for a positive semidefinite matrix A, satisfying A1/2 A1/2 = A. Also, for nonsingular A, let A−1/2 = (A1/2 )−1 . THEOREM 2. Suppose that Assumptions 1–5 are satisfied, σi2 ≥ C > 0 a.s., and K /rn is bounded. Then V̄n and Vn are nonsingular a.s.n, and d V̄n−1/2 Sn (δ̃ − δ0 ) → N (0, IG ), d Vn−1/2 Sn (δ̂ − δ0 ) → N (0, IG ). The entire Sn matrix in Assumption 2 determines the convergence rate of the estimators, where Sn (δ̂ − δ0 ) = diag (μ1n , . . . , μGn ) S̃n (δ̂ − δ0 ) is asymptotically normal. The convergence rate of the linear combination ej S̃n (δ̂ − δ0 ) will be 1/μ jn , where e j is the jth unit vector. Note that yi = X i δ0 + u i = z i diag (μ1n , . . . , μGn ) S̃n δ0 + Ui δ0 + εi . The expression following the second equality is the reduced form for yi . Thus, the linear combination of structural parameters ej S̃n δ0 is the jth reduced form √ coefficient for yi that corresponds to the variable μ jn / n z ij . This reduced form coefficient is estimated at the rate 1/μ jn by the linear combination ej S̃n δ̂ of the √ instrumental variables (IV) estimator δ̂. The minimum rate is 1/ rn , which is the inverse square root of the rate of growth of the concentration parameter. 
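Before examining how these rates change, a small Monte Carlo sketch can be used to gauge the normal approximation of Theorem 2 in a design where the concentration parameter grows like K, so that K/rn stays bounded. The settings below are illustrative, and the sketch reuses the hypothetical simulate_iv_data, jive_estimators, and jive_variance functions defined in the earlier sketches; nothing here reproduces the simulations reported in Hausman et al. (2007).

```python
import numpy as np

def mc_coverage(reps=500, n=400, K=40, delta0=1.0, seed=3):
    """Empirical coverage of a nominal 95% interval for delta_0 based on JIV2 and the
    variance estimator sketched earlier (illustrative settings only)."""
    hits = 0
    for r in range(reps):
        y, X, Z = simulate_iv_data(n=n, K=K, delta0=delta0, seed=seed + r)
        _, d_hat, P = jive_estimators(y, X, Z)
        V = jive_variance(y, X, d_hat, P, jive1=False)
        se = np.sqrt(V[0, 0])
        hits += abs(d_hat[0] - delta0) <= 1.96 * se
    return hits / reps

# print(mc_coverage())  # should be in the neighborhood of 0.95 if the normal
#                       # approximation of Theorem 2 is adequate for these settings
```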
These rates will change when K grows faster than rn . 52 JOHN C. CHAO ET AL. The rate of convergence in Theorem 2 corresponds to the rate found by Stock and Yogo (2005) for LIML, Fuller’s modified LIML, and B2SLS when rn grows at the same rate as K and more slowly than n under homoskedasticity. ¯ n in the asymptotic variance of δ̃ and the term n in the asymptotic The term variance of δ̂ account for the presence of many instruments. The order of these terms is K /rn , so if K /rn → 0, dropping these terms does not affect the asymptotic variance. When K /rn is bounded but does not go to zero, these terms have the same order as the other terms, and it is important to account for their presence in the standard errors. If K /rn → ∞, then these terms dominate and slow down the convergence rate√of the estimators. In this case, the conditional asymptotic variance given Z of rn /K Sn (δ̃ − δ0 ) is ¯ n H̄n−1 , V̄n∗ = H̄n−1 (rn /K ) and the conditional asymptotic variance of √ rn /K Sn (δ̂ − δ0 ) is Vn∗ = Hn−1 (rn /K )n Hn−1 . When K /rn → ∞, the (conditional) asymptotic variance matrices, V̄n∗ and Vn∗ , may be singular, especially when some components of X i are exogenous or when different identification strengths are present. To allow for this singularity, our asymptotic normality results are stated in terms of a linear combination of the estimator. Let L n be a sequence of × G matrices. THEOREM 3. Suppose that Assumptions 1–5 are satisfied and K /rn → ∞. If L n is bounded and there is a C > 0 such that λmin L n V̄n∗ L n ≥ C a.s.n then −1/2 d L n rn /K Sn (δ̃ − δ0 ) → N (0, I ). L n V̄n∗ L n Also, if there is a C > 0 such that λmin L n Vn∗ L n ≥ C a.s.n, then L n Vn∗ L n −1/2 Ln d rn /K Sn (δ̂ − δ0 ) → N (0, I ). √ Here the convergence rate is related to the size of rn /K Sn . In the simple √ case where δ is a scalar, we can take Sn = √rn , which gives a convergence rate of √ K /rn . Then the theorem states that rn / K (δ̃ − δ0 ) is asymptotically normal. √ It is interesting that K /rn → 0 is a condition for consistency in this setting and also in the context of Theorem 1. From Theorems 2 and 3, it is clear that the rates of convergence of both JIV estimators depend in general on the strength of the available instruments relative to their number, as reflected in the relative orders of magnitude of rn vis-à-vis K . Note also that, whenever rn grows √ at a slower rate than n, the rate of convergence is slower than the conventional n rate of convergence. In this case, the available JIVE WITH HETEROSKEDASTICITY 53 instruments are weaker than assumed in the conventional strongly identified case, where the concentration parameter is taken to grow at the rate n. When Pii = Z i (Z Z )−1 Z i goes to zero uniformly in i, the asymptotic variances n of the two JIV estimators will get close in large samples. Because ∑i=1 Pii = tr(P) = K , Pii goes to zero when K grows more slowly than n, though precise conditions for this convergence depend on the nature of Z i . As a practical matter, Pii will generally be very close to zero in applications where K is very small relative to n, making the jackknife estimators very close to each other. Under homoskedasticity, we can compare the asymptotic variances of the two JIV estimators. In this case, the asymptotic variance of δ̃ is V̄n1 = σ 2 H̄n−1 , V̄n = V̄n1 + V̄n2 , V̄n2 = Sn−1 σ 2 E[Ui Ui ] ∑ Pij2 /(1− Pj j )2 Sn−1 i= j + Sn−1 E[Ui εi ]E[Ui εi ]Sn−1 ∑ Pij2 (1 − Pii )−1 (1 − Pj j )−1 . 
i= j Also, the asymptotic variance of δ̂ is Vn = Vn1 + Vn2 , Vn2 = Sn−1 Vn1 =σ 2 Hn−1 n ∑ (1 − Pii ) i=1 σ 2 E[Ui Ui ] + E[Ui εi ]E[Ui εi ] 2 z i z i /n Hn−1 , Sn−1 ∑ Pij2 . i= j By the fact that (1 − Pii )−1 > 1, we have that V̄n2 ≥ Vn2 in the positive semidefinite sense. Also, note that Vn1 is the variance of an IV estimator with instruments z i (1 − Pii ) whereas V̄n1 is the variance of the corresponding least squares estimator, so V̄n1 ≤ Vn1 . Thus, it appears that in general we cannot rank the asymptotic variances of the two estimators. Next, we turn to results pertaining to the consistency of the asymptotic variance estimators and to the use of these estimators in hypothesis testing. We impose the following additional conditions. Assumption 6. There exist πn and C > 0 such that a.s. maxi≤n z i −πn Z i → 0 and supi z i ≤ C. The next result shows that our estimators of the asymptotic variance are consistent after normalization. THEOREM 4. Suppose that Assumptions 1–6 are satisfied. If K /rn is bounded, p p then Sn Ṽ Sn − V̄n → 0 and Sn V̂ Sn − Vn → 0. Also, if K /rn → ∞, then p p rn Sn Ṽ Sn /K − V̄n∗ → 0 and rn Sn V̂ Sn /K − Vn∗ → 0. A primary use of asymptotic variance estimators is conducting approximate inference concerning coefficients. To that end, we introduce Theorem 5. 54 JOHN C. CHAO ET AL. THEOREM 5. Suppose that Assumptions 1–6 are satisfied and that a(δ) is an × 1 vector of functions such that (i) a(δ) is continuously differentiable in a neighborhood of δ0 ; (ii) there is a square matrix, Bn , such that for A = ∂a(δ0 )/∂δ , Bn ASn−1 is bounded; and p (iii) for any δ̄k → δ0 , (k = 1, . . . , ) and Ā = [∂a1 (δ̄)/∂δ, . . . , ∂a (δ̄)/∂δ] , we p have Bn ( Ā − A)Sn−1 → 0. Also suppose that there is C > 0 such that λmin (Bn ASn−1 V̄n Sn−1 A Bn ) ≥ C if K /rn is bounded or λmin (Bn ASn−1 V̄n∗ Sn−1 A Bn ) ≥ C if K /rn → ∞ a.s.n. Then for à = ∂a(δ̃)/∂δ, d ( à Ṽ à )−1/2 a(δ̃) − a(δ0 ) → N (0, I ). If there is C ≥ 0 such that λmin (Bn ASn−1 V̄n Sn−1 A Bn ) ≥ C if K /rn is bounded or λmin (Bn ASn−1 V̄n∗ Sn−1 A Bn ) ≥ C if K /rn → ∞ a.s.n, then for  = ∂a(δ̂)/∂δ, d (  V̂  )−1/2 a(δ̂) − a(δ0 ) → N (0, I ). Perhaps the most important special case of this result is a single linear combination. This case will lead to t-statistics based on the consistent variance estimator having the usual standard normal limiting distribution. The following result considers such a case. COROLLARY 1. Suppose that Assumptions 1–6 are satisfied and c and bn are such that bn c Sn−1 is bounded. If there is a C > 0 such that bn2 c Sn−1 V̄n Sn−1 c ≥ C if K /rn is bounded or bn2 c Sn−1 V̄n∗ Sn−1 c ≥ C if K /rn → ∞ a.s.n, then c (δ̃ − δ0 ) d → N (0, 1). c Ṽ c Also if there is a C ≥ 0 such that bn2 c Sn−1 Vn Sn−1 c ≥ C if K /rn is bounded or bn2 c Sn−1 Vn∗ Sn−1 c ≥ C if K /rn → ∞ a.s.n, then c (δ̂ − δ0 ) d → N (0, 1). c V̂ c To show how the conditions of this result can be checked, we return to the previous example with one right-hand-side endogenous variable. The following result gives primitive conditions in that example for the conclusion of Corollary 1, i.e., for the asymptotic normality of a t-ratio. COROLLARY 2. If equation (2) holds, Assumptions 1–6 are satisfied for z i = , z ), c = 0 is a constant vector, either (Z i1 iG JIVE WITH HETEROSKEDASTICITY 55 (i) rn = n or (ii) K /rn is bounded and (−π1 , 1)c = 0 or 2 |Z] is bounded away from zero, and the (iii) K /rn → ∞, (−π1 , 1)c = 0, E[UiG sign of E[εi UiG |Z] is constant a.s., then c (δ̂ − δ0 ) d c (δ̃ − δ0 ) d → N (0, 1), → N (0, 1). 
c Ṽ c c V̂ c The proof of this result shows how the hypotheses concerning bn in Corollary 1 can be checked. The conditions of Corollary 2 are quite primitive. We have previously described how Assumption 2 is satisfied in the model of equation (2). Assumptions 1 and 3–6 are also quite primitive. This result can be applied to show that t-ratios are asymptotically correct when the many instrument robust variance estimators are used. For the coefficient δG of the endogenous variable, note that c = eG , so (−π1 , 1)c = 1 = 0. Therefore, 2 |Z] is bounded away from zero and the sign of E[ε U |Z] is constant, it if E[UiG i iG follows from Corollary 2 that δ̂G − δ0G d → N (0, 1). V̂GG Thus, the t-ratio for the coefficient of the endogenous variable is asymptotically correct across a wide range of different growth rates for rn and K . The analogous result holds for each coefficient δ j , j ≤ G 1 , of an included instrument as long as π1 j = 0 is not zero. If π1 j = 0, then the asymptotics are more complicated. For brevity, we will not discuss this unusual case here. The analogous results also hold for δ̃G . 4. CONCLUDING REMARKS In this paper, we derived limiting distribution results for two alternative JIV estimators. These estimators are both consistent and asymptotically normal in the presence of many instruments under heteroskedasticity of unknown form. In the same setup, LIML, 2SLS, and B2SLS are inconsistent. In the process of showing the asymptotic normality of JIV, this paper gives a central limit theorem for quadratic (and, more generally, bilinear) forms associated with an idempotent matrix. This central limit theorem has already been used in Hausman et al. (2007) to derive the asymptotic properties of the jackknife versions of the LIML and Fuller (1977) estimators and in Chao et al. (2010) to derive a moment-based test that allows for heteroskedasticity and many instruments. Moreover, this new central limit theorem is potentially useful for other analyses involving many instruments. 56 JOHN C. CHAO ET AL. NOTE 1. The observations w1 , . . . , wn are distinct with probability one and therefore, by K < n, cannot all be roots of a K th degree polynomial. It follows that for any nonzero a there must be some i with a Z i = a p K (wi ) = 0, implying a Z Z a > 0. REFERENCES Abadir, K.M. & J.R. Magnus (2005) Matrix Algebra. Cambridge University Press. Ackerberg, D.A. & P. Devereux (2009) Improved JIVE estimators for overidentified models with and without heteroskedasticity. Review of Economics and Statistics 91, 351–362. Angrist, J.D., G.W. Imbens, & A. Krueger (1999) Jackknife instrumental variables estimation. Journal of Applied Econometrics 14, 57–67. Bekker, P.A (1994) Alternative approximations to the distributions of instrumental variable estimators. Econometrica 62, 657–681. Bekker, P.A. & J. van der Ploeg (2005) Instrumental variable estimation based on grouped data. Statistica Neerlandica 59, 506–508. Billingsley, P. (1986) Probability and Measure, 2nd ed. Wiley. Blomquist, S. and M. Dahlberg (1999) Small sample properties of LIML and jackknife IV estimators: Experiments with weak instruments. Journal of Applied Econometrics 14, 69–88. Chao, J.C., J.A. Hausman, W.K. Newey, N.R. Swanson, & T. Woutersen (2010) Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity. Working paper, MIT. Chao, J.C. & N.R. Swanson (2004) Estimation and Testing Using Jackknife IV in Heteroskedastic Regressions with Many Weak Instruments. Working paper, Rutgers University. Chao, J.C. 
& N.R. Swanson (2005) Consistent estimation with a large number of weak instruments. Econometrica 73, 1673–1692. Chao, J.C. & N.R. Swanson (2006) Asymptotic normality of single-equation estimators for the case with a large number of weak instruments. In D. Corbae, S.N. Durlauf, & B.E. Hansen (eds.), Frontiers of Analysis and Applied Research: Essays in Honor of Peter C. B. Phillips, pp. 82–124. Cambridge University Press. Davidson, R. & J.G. MacKinnon (2006) The case against JIVE. Journal of Applied Econometrics 21, 827–833. Donald, S.G. & W.K. Newey (2001) Choosing the number of instruments. Econometrica 69, 1161–1191. Fuller, W.A. (1977) Some properties of a modification of the limited information estimator. Econometrica 45, 939–954. Han, C. & P.C.B. Phillips (2006) GMM with many moment conditions. Econometrica 74, 147–192. Hansen, C., J.A. Hausman, & W.K. Newey (2008) Estimation with many instrumental variables. Journal of Business & Economic Statistics 26, 398–422. Hausman, J.A., W.K. Newey, T. Woutersen, J. Chao, & N.R. Swanson (2007) IV Estimation with Heteroskedasticity and Many Instruments. Working paper, MIT. Kunitomo, N. (1980) Asymptotic expansions of distributions of estimators in a linear functional relationship and simultaneous equations. Journal of the American Statistical Association 75, 693–700. Magnus, J.R. & H. Neudecker (1988) Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley. Morimune, K. (1983) Approximate distributions of k-class estimators when the degree of overidentifiability is large compared with the sample size. Econometrica 51, 821–841. Newey, W.K. (1990) Efficient instrumental variable estimation of nonlinear models. Econometrica 58, 809–837. Phillips, G.D.A. & C. Hale (1977) The bias of instrumental variable estimators of simultaneous equation systems. International Economic Review 18, 219–228. JIVE WITH HETEROSKEDASTICITY 57 Phillips, P.C.B. (1983) Small sample distribution theory in econometric models of simultaneous equations. In Z. Griliches & M.D. Intriligator (eds.), Handbook of Econometrics, vol. 1, pp. 449–516. North-Holland. Sawa, T. (1968) The exact sampling distribution of ordinary least squares and two-stage least squares estimators. Journal of the American Statistical Association 64, 923–937. Staiger, D. & J.H. Stock (1997) Instrumental variables regression with weak instruments. Econometrica 65, 557–586. Stock, J.H. and M. Yogo (2005) Asymptotic distributions of instrumental variables statistics with many weak instruments. In D.W.K. Andrews & J.H. Stock (eds.), Identification and Inference for Econometric Models: Essays in Honor of Thomas J. Rothenberg, pp. 109–120. Cambridge University Press. APPENDIX A: Proofs of Theorems We define a number of notations and abbreviations that will be used in Appendixes A and B. Let C denote a generic positive constant and let M, CS, and T denote the Markov inequality, the Cauchy–Schwarz inequality, and the triangle inequality, respectively. Also, for random variables Wi , Yi , and ηi and for Z = (ϒ, Z ), let w̄i = E[Wi |Z], W̃i = Wi − w̄i , ȳi = E[Yi |Z], Ỹi = Yi − ȳi , η̄i = E[ηi |Z], η̃i = ηi − η̄i , ȳ = ( ȳ1 , . . . ., ȳn ) , w̄ = (w̄1 , . . . , w̄n ) , μ̄Y = max | ȳi | , μ̄η = max |η̄i | , μ̄W = max |w̄i | , 1≤i≤n 1≤i≤n 1≤i≤n 2 2 σ̄Y = max Var Yi |Z , and σ̄W = max Var Wi |Z , i ≤n i ≤n σ̄η2 = max Var ηi |Z , i ≤n where, to simplify notation, we have suppressed dependence on Z for the various quanti2 , σ̄ 2 , and σ̄ 2 ) defined previously. 
Furthermore, ties (w̄i , W̃i , ȳi , Ỹi , η̄i , η̃i , μ̄W , μ̄Y , μ̄η , σ̄W η Y for random variable X , define X L 2 ,Z = E X 2 |Z . We first give four lemmas that are useful in the proofs of consistency, asymptotic normality, and consistency of the asymptotic variance estimator. We group them together here for ease of reference because they are also used in Hausman et al. (2007). LEMMA A1. If, conditional on Z = (ϒ, Z ), (Wi , Yi )(i = 1, . . . , n) are independent matrix of rank K , then a.s., Wi and Yi are scalars, and P is a symmetric, idempotent for w̄ = E (W1 , . . . , Wn ) |Z , ȳ = E (Y1 , . . . , Yn ) |Z , σ̄Wn = maxi≤n Var (Wi |Z)1/2 , 2 σ̄ 2 + σ̄ 2 ȳ ȳ + σ̄ 2 w̄ w̄, there exists a σ̄Yn = maxi≤n Var (Yi |Z)1/2 , and Dn = K σ̄W Wn Yn n Yn positive constant C such that 2 ∑ Pij Wi Y j − ∑ Pij w̄i ȳ j i= j i= j ≤ C Dn a.s. L 2 ,Z Proof. Let W̃i = Wi − w̄i and Ỹi = Yi − ȳi . Note that ∑ Pij Wi Y j − ∑ Pij w̄i ȳj = ∑ Pij W̃i Ỹ j + ∑ Pij W̃i ȳj + ∑ Pij w̄i Ỹ j . i= j i= j i= j i= j i= j 58 JOHN C. CHAO ET AL. 2 σ̄ 2 . Note that for i = j and k = , E W̃ Ỹ W̃ Ỹ |Z is zero unless i = k Let D1n = σ̄W i j k n Yn and j = or i = and j = k. Then by CS and ∑ j Pij2 = Pii , 2 E ∑i= j Pij Ỹi W̃ j |Z = ∑ ∑ Pij Pk E W̃i Ỹ j W̃k Ỹ |Z i= j k= = ∑ Pij2 i= j ≤ 2D1n E[W̃i2 |Z]E[Ỹ j2 |Z] + E[W̃i Ỹi |Z]E[W̃ j Ỹ j |Z] ∑ Pij2 ≤ 2D1n ∑ Pii = 2D1n K . i= j i Also, for W̃ = (W̃1 , . . . , W̃n ) , we have ∑i= j Pij W̃i ȳ j = W̃ P ȳ − ∑i Pii ȳi W̃i . By 2 I a.s., so independence across i conditional on Z, we have E W̃ W̃ |Z ≤ σ̄W n n 2 ȳ P ȳ ≤ σ̄ 2 ȳ ȳ, E[( ȳ P W̃ )2 |Z] = ȳ P E[W̃ W̃ |Z]P ȳ ≤ σ̄W Wn n 2 ∑i Pii ȳi W̃i |Z = ∑ Pii2 E[W̃i2 |Z] ȳi2 ≤ σ̄W2 n ȳ ȳ. E i Then by T we have 2 ∑i= j Pij W̃i ȳ j L 2 ,Z 2 ≤ ȳ P W̃ L 2 ,Z 2 + ∑i Pii ȳi W̃i L 2 ,Z 2 ȳ ȳ ≤ C σ̄W n a.s. PZ . 2 ≤ C σ̄Y2n w̄ w̄ a.s. The Interchanging the roles of Yi and Wi gives ∑i= j Pij w̄i Ỹ j L 2 ,Z n conclusion then follows by T. LEMMA A2. Suppose that, conditional on Z, the following conditions hold a.s.: (i) P = P(Z) is a symmetric, idempotent matrix with rank(P) = K and Pii ≤ C < 1; n E W W |Z (ii) (W1n ,U1 , ε1 ), . . . , (Wnn ,Un , εn ) are independent, and Dn = ∑i=1 in in satisfies Dn ≤ C a.s.n; |Z = 0, E[Ui |Z] = 0, E[εi |Z] = 0, and there exists a constant C such that (iii) E Win E[ Ui 4 |Z] ≤ C and E[εi4 |Z] ≤ C; n E W 4 |Z a.s. (iv) ∑i=1 → 0; and in (v) K → ∞ as n → ∞. Then for ¯ n def = ∑ Pij2 i= j E[Ui Ui |Z]E[ε 2j |Z] + E[Ui εi |Z]E[ε j U j |Z] /K and any sequences c1n and c2n depending on Z of conformable vectors with c1n ≤ C, D c + c ¯ c2n ≤ C, and n = c1n n 1n 2n n c2n > 1/C a.s.n, it follows that √ n d −1/2 Yn = n c1n ∑ Win + c2n ∑ Ui Pij ε j K → N (0, 1) , a.s.; i=1 a.s. i= j i.e., Pr(Yn ≤ y|Z) → ( y) for all y. JIVE WITH HETEROSKEDASTICITY 59 Proof. The proof of Lemma A2 is long and is deferred to Appendix B. The next two results are helpful in proving consistency of the variance estimator. They use the same notation as Lemma A1. LEMMA A3. If, conditional on Z , (Wi , Yi )(i = 1, . . . , n) are independent and Wi and Yi are scalars, then there exists a positive constant C such that 2 ∑i= j Pij2 Wi Y j − E ∑i= j Pij2 Wi Y j |Z L 2 ,Z ≤ C Bn a.s., 2 σ̄ 2 + σ̄ 2 μ̄2 + μ̄2 σ̄ 2 . where Bn = K σ̄W Y W Y W Y Proof. Using the notation of the proof of Lemma A1, we have ∑ Pij2 Wi Y j − ∑ Pij2 w̄i ȳj = ∑ Pij2 W̃i Ỹ j + ∑ Pij2 W̃i ȳj + ∑ Pij2 w̄i Ỹ j . i= j i= j i= j i= j i= j As before, for i = j and k = , E W̃i Ỹ j W̃k Ỹ |Z is zero unless i = k and j = or i = and j = k. Also, Pij ≤ Pii < 1 by CS and Assumption 1, so Pij4 ≤ Pij2 . 
Also, ∑ j Pij2 = Pii , so 2 2 E W̃ Ỹ W̃ Ỹ |Z E ∑i= j Pij2 W̃i Ỹ j |Z = ∑ ∑ Pij2 Pk i j k i= j k= = ∑ Pij4 i= j E W̃i2 |Z E Ỹ j2 |Z +E W̃i Ỹi |Z E W̃ j Ỹ j |Z 2 σ̄ 2 ≤ 2σ̄W Y ∑ Pij4 ≤ 2K σ̄W2 σ̄Y2 a.s. i= j Also, ∑i= j Pij2 W̃i ȳ j = W̃ P̃ ȳ − ∑i Pii2 ȳi W̃i where P̃ij = Pij2 . By independence across i 2 I , so conditional on Z, we have E[W̃ W̃ |Z] ≤ σ̄W n n 2 ȳ P̃ 2 ȳ E[( ȳ P̃ W̃ )2 |Z] = ȳ P̃E[W̃ W̃ |Z] P̃ ȳ ≤ σ̄W n 2 = σ̄W n ∑ 2 μ̄2 ȳi Pik2 Pkj2 ȳ j ≤ σ̄W Y i, j,k 2 μ̄2 = σ̄W Y∑ k ∑ Pik2 i ∑ Pkj2 j 2 2 μ̄2 E ∑i Pii2 ȳi W̃i |Z =∑ Pii4 E[W̃i2 |Z] ȳi2 ≤ K σ̄W Y ∑ Pik2 Pkj2 i, j,k 2 μ̄2 2 2 2 = σ̄W Y ∑ Pkk ≤ K σ̄W μ̄Y a.s., k a.s. i 2 Then by T, we have ∑i= j Pij2 W̃i ȳ j 2 + ∑i Pii2 ȳi W̃i L 2 ,Z L 2 ,Z L ,Z 2 2 2 2 2 ≤ C K σ̄W μ̄Y a.s. Interchanging the roles of Yi and Wi gives ∑i= j Pij w̄i Ỹ j L 2 ,Z 2 ≤ W̃ P̃ ȳ ≤ C K μ̄2W σ̄Y2 a.s. The conclusion then follows by T. As a notational convention, let ∑i= j=k denote ∑i ∑ j=i ∑k ∈{i, / j} . n 60 JOHN C. CHAO ET AL. LEMMA A4. Suppose that there is C > 0 such that, conditional on Z,(W1 ,Y1 , η1 ) , . . . , √ √ (Wn , Yn , ηn ) are independent with E[Wi |Z] = ai / n, E[Yi |Z] = bi / n, |ai | ≤ C, |bi | ≤ C, E[ηi2 |Z] ≤ C, Var(Wi |Z) ≤ C/rn , and Var(Yi |Z) ≤ C/rn and there exists πn such a.s. √ that maxi≤n ai − Z πn → 0 and K /rn → 0. Then i ∑ An = E i= j=k Wi Pik ηk Pkj Y j |Z = O p (1), ∑ i= j=k p Wi Pik ηk Pkj Y j − An → 0. n Proof. Given in Appendix B. LEMMA A5. If Assumptions 1–3 are satisfied, then (i) Sn−1 H̃ Sn−1 = ∑ z i Pij (1 − Pj j )−1 z j /n + o p (1), i= j √ (ii) Sn−1 ∑ X i Pij (1 − Pj j )−1 ε j = O p (1 + K /rn ), i= j (iii) Sn−1 Ĥ Sn−1 = ∑ z i Pij z j /n + o p (1), i= j √ (iv) Sn−1 ∑ X i Pij ε j = O p (1 + K /rn ). i= j Proof. Let ek denote the kth unit vector and apply Lemma A1 with Yi = ek Sn−1 X i = √ −1 for some k and . By Assumption 2, z ik / n + ek Sn−1 Ui and Wi =e Sn−1 X i (1 − Pii ) √ √ λmin (Sn ) ≥ C rn , implying Sn−1 ≤ C/ rn . Therefore a.s. √ E[Yi |Z] = z ik / n, Var(Yi |Z) ≤ C/rn , √ Var(Wi |Z) ≤ C/rn . E[Wi |Z] = z i / n(1 − Pii ), Note that a.s. √ √ K σ̄Wn σ̄Yn ≤ C K /rn → 0, σ̄Yn √ −1/2 w̄ w̄ ≤ Crn ∑ i σ̄Wn −1/2 ȳ ȳ ≤ Crn ∑ zik2 /n → 0, i −1/2 2 −2 z i (1− Pii ) /n ≤ Crn (1−max Pii )−2 i ∑ zi2 /n →0. i Because ek Sn−1 H̃ Sn−1 e = ek Sn−1 ∑i= j X i Pij X j Sn−1 e /(1 − Pj j ) = ∑i= j Yi Pij W j and Pij w̄i ȳ j = Pij z ik z j /n(1 − Pj j ), applying Lemma A1 and the conditional version of M, we deduce that for any υ > 0 and An = ek Sn−1 H̃ Sn−1 e − ∑i= j ek z i Pij (1 − Pj j )−1 a.s. z j e /n ≥ υ , P (An |Z) → 0. By the dominated convergence theorem, P (An ) = E [P (An |Z)] → 0. The preceding argument establishes the first conclusion for the (k, )th element. Doing this for every element completes the proof of the first conclusion. For the second conclusion, apply Lemma A1 with Yi = ek Sn−1 X i as before and Wi = εi /(1 − Pii ). Note that w̄i = 0 and σ̄Wn ≤ C. Then by Lemma A1, E[{ek Sn−1 ∑ X i Pij (1 − Pj j )−1 ε j }2 |Z] ≤ C K /rn + C. i= j The conclusion then follows from the fact that E[An |Z] ≤ C implies An = O p (1). JIVE WITH HETEROSKEDASTICITY 61 For the third conclusion, apply Lemma A1 with Yi = ek Sn−1 X i as before and Wi = e Sn−1 X i , so a.s. √ √ K σ̄Wn σ̄Yn ≤ C K /rn → 0, σ̄Wn ȳ ȳ ≤ Crn−1/2 ∑ zik2 /n → 0, √ σ̄Yn w̄ w̄ → 0. n The fourth conclusion follows similarly to the second conclusion. Let H̄n = ∑i z i z i /n and Hn = ∑i (1 − Pii )z i z i /n. LEMMA A6. If Assumptions 1–4 are satisfied, then Sn−1 H̃ Sn−1 = H̄n + o p (1), Sn−1 Ĥ Sn−1 = Hn + o p (1). Proof. 
We use Lemma A5 and approximate the right-hand-side terms in Lemma A5 by H̄n and Hn . Let z̄ i = ∑nj=1 Pij z j be the ith element of Pz and note that n ∑ zi − z̄i 2 /n = (I − P)z 2 /n = tr(z (I − P)z/n)= tr[(z− Z π K n ) (I − P)(z − Z π K n )/n] i=1 ≤ tr[(z − Z π K n ) (z − Z π K n )/n] = n ∑ zi − π K n Z i 2 /n → 0 a.s. PZ . i=1 It follows that a.s. −1 ∑(z̄ i − z i )(1 − Pii ) z i /n ≤ ∑ z̄ i − z i (1 − Pii )−1 z i /n i i 2 2 ≤ ∑ z̄ i − z i /n ∑ (1 − Pii )−1 z i /n → 0. i i Then ∑ zi Pij (1 − Pj j )−1 z j /n = ∑ zi Pij (1 − Pj j )−1 z j /n − ∑ zi Pii (1 − Pii )−1 zi /n i= j i, j =∑ i z̄ i (1 − Pii )−1 z i /n − i ∑ zi Pii (1 − Pii )−1 zi /n i = H̄n + ∑(z̄ i − z i )(1 − Pii )−1 z i /n = H̄n + oa.s. (1). i The first conclusion then follows from Lemma A5 and T. Also, as in the last equation, we have ∑ zi Pij z j /n = ∑ zi Pij z j /n − ∑ Pii zi zi /n = ∑ z̄i zi /n − ∑ Pii zi zi /n i= j i, j i i (z̄ i − z i )z i /n = Hn + oa.s. (1), i = Hn + ∑ i n so the second conclusion follows similarly to the first. Proof of Theorem 1. First, note that by λmin Sn Sn /rn ≥ λmin S̃ S̃ ≥ C, we have √ Sn (δ̃ − δ0 )/ rn ≥ λmin (Sn Sn /rn )1/2 δ̃ − δ0 ≥ C δ̃ − δ0 . 62 JOHN C. CHAO ET AL. p p √ Therefore, Sn (δ̃ − δ0 )/ rn → 0 implies δ̃ → δ0 . Note that by Assumption 2, H̄n is bounded and λmin ( H̄n ) ≥ C a.s.n. For H̃ from Section 2, it follows from Lemma A6 and Assumption 2 that with probability approaching one λmin (Sn−1 H̃ Sn−1 ) ≥ C as the sample −1 size grows. Hence Sn−1 H̃ Sn−1 = O p (1). By equation (1) and Lemma A5, −1/2 Sn (δ̃ − δ0 ) = (Sn−1 H̃ Sn−1 )−1 Sn−1 √ p rn = O p (1)o p (1) → 0. ∑ X i Pij ξ j / rn i= j All of the previous statements are conditional on Z = (ϒ, Z ) for a given sample size n, −1/2 Sn (δ̃ −δ0 ), we have shown that for any constant v > so for the random variable Rn = rn 0, a.s. Pr( Rn ≥ v|Z) → 0. Then by the dominated convergence theorem, Pr( Rn ≥ v) = E[Pr( Rn ≥ v|Z)] → 0. Therefore, because v is arbitrary, it follows that Rn = p −1/2 Sn (δ̃ − δ0 ) → 0. rn Next note that Pii ≤ C < 1, so in the positive semidefinite sense in large enough samples a.s., Hn = ∑(1 − Pii )z i z i /n ≥ (1 − C) H̄n . Thus, by Assumption 2, Hn is bounded and bounded away from singularity a.s.n. Then the n rest of the conclusion follows analogously with δ̂ replacing δ̃ and Hn replacing H̄n . We now turn to the asymptotic normality results. In what follows, let ξi = εi when considering the JIV2 estimator and let ξi = εi /(1 − Pii ) when considering JIV1. Proof of Theorem 2. Define √ Yn = ∑ z i (1 − Pii )ξi n + Sn−1 i ∑ Ui Pij ξ j . i= j By Assumptions 2–4, √ 2 n E ∑i=1 (z i − z̄ i ) ξi / n |Z a.s. n n = ∑i=1 z i − z̄ i 2 E ξi2 |Z n ≤ C ∑i=1 z i − z̄ i 2 /n → 0. Therefore, by M, Sn−1 n √ ∑ X i Pij ξ j − Yn = ∑ (zi − z̄i ) ξi / i= j p n → 0. i=1 We now apply Lemma A2 to establish asymptotic normality of Yn conditional on Z. Let n = Var (Yn |Z), so n = n ∑ zi zi (1 − Pii )2 E[ξi2 |Z]/n + Sn−1 ∑ Pij2 i=1 i= j × E[Ui Ui |Z]E[ξ j2 |Z] + E[Ui ξi |Z]E[U j ξ j |Z] Sn−1 . JIVE WITH HETEROSKEDASTICITY Note that 63 √ rn Sn−1 is bounded by Assumption 2 and that ∑i= j Pij2 /K ≤ 1, so by bounded- ness of K /rn and Assumption 3, it follows that n ≤ C a.s.n. Also, E[ξi2 |Z] ≥ C > 0, so n ≥ n n i=1 i=1 ∑ zi zi (1 − Pii )2 E[ξi2 |Z]/n ≥ C ∑ zi zi /n. Therefore, by Assumption 2,λmin ( n ) ≥ C > 0 a.s.n (for generic C that may be different from before). It follows that n−1 ≤ C a.s.n. Let α be a G × 1 nonzero vector. Let Ui be defined as in Lemma A2 and ξi be √ −1/2 α, defined as εi in Lemma A2. 
In addition, let Win = z i (1 − Pii )ξi / n, c1n = n √ −1 −1/2 α. Note that condition (i) of Lemma A2 is satisfied. Also, by and c2n = K Sn n the boundedness of ∑i z i z i /n and E[ξi2 |Z] a.s.n, condition (ii) of Lemma A2 is satisfied; condition (iii) is satisfied by Assumptions 3 and 5. Also, by (1 − Pii )−1 ≤ C n E W 4 |Z ≤ C n z 4 /n 2 a.s. → 0, so condition (iv) is and Assumption 5, ∑i=1 ∑i=1 i in −1/2 satisfied. Finally, condition (v) is satisfied by hypothesis. Note also that c1n = n α and √ √ −1/2 K /rn rn Sn−1 n α satisfy c1n ≤ C and c2n ≤ C a.s.n. This follows c2n = √ √ from the boundedness of K /rn , rn Sn−1 , and n−1 . Moreover, the n of Lemma A2 is n = Var(c1n n ∑ Win + c2n ∑ Ui Pij ξ j / i= j i=1 √ −1/2 K |Z) = Var(α n Yn |Z) = α α by construction. Then, applying Lemma A2, we have n √ −1/2 −1/2 d −1/2 αα α n Y n = n ∑ c1n Win + c2n ∑ Ui Pij ξ j / K → N (0, 1) −1/2 It follows that α n a.s. i= j i=1 d −1/2 Yn → N 0, α α a.s., so by the Cramér–Wold device, n d Yn → N (0, IG ) a.s. Consider now the JIV1 estimator where ξi = εi /(1 − Pii ). Plugging this into the ex¯ n + ¯ n for ¯ n and ¯ n defined according to Assumption pression for n , we find n = −1/2 −1 1/2 H̄n n 5. Let V̄n also be as defined following Assumption 5 and note that Bn = V̄n −1/2 −1/2 is an orthogonal matrix because Bn Bn = V̄n = I. Also, Bn is a function of V̄n V̄n −1/2 1/2 only Z, V̄n ≤ C a.s.n because λmin (V̄n ) ≥ C > 0 a.s.n, and n ≤ C a.s.n. By Lemma A6, (Sn−1 H̃ Sn−1 )−1 = H̄n−1 + o p (1). Note that if a random variable Wn sata.s. isfies Wn ≤ C a.s.n, then Wn = O p (1) (note that 1( Wn > C) → 0 implies that E[1( Wn > C)] = Pr( Wn > C) → 0). Therefore, we have −1/2 V̄n (Sn−1 H̃ Sn−1 )−1 n 1/2 −1/2 Note that because n −1/2 = V̄n ( H̄n−1 + o p (1))n 1/2 = Bn + o p (1). d Yn → N (0, IG ) a.s. and Bn is orthogonal to and a function −1/2 only of Z, we have Bn n d Yn → N (0, IG ). Then by the Slutsky lemma and δ̃ = δ0 + 64 JOHN C. CHAO ET AL. H̃ −1 ∑i= j X i Pij ξ j , for ξ j = (1 − Pj j )−1 ε j , we have −1/2 −1/2 −1 −1 −1 −1 −1 Sn (δ̃ − δ0 ) = V̄n (Sn H̃ Sn ) Sn V̄n −1/2 = V̄n ∑ X i Pij ξ j i= j (Sn−1 H̃ Sn−1 )−1 [Yn + o p (1)] −1/2 = [Bn + o p (1)][n −1/2 Yn +o p (1)] = Bn n d Yn +o p (1) → N (0, IG ), which gives the first conclusion. The conclusion for JIV2 follows by a similar argument n for ξi = εi . Proof of Theorem 3. Under the hypotheses of Theorem 3, rn /K → 0, so following √ p n z (1 − P )ξ /√n → the proof of Theorem 2, we have rn /K ∑i=1 0. Then similar to i ii i √ √ √ the proof of Theorem 2, for Yn = rn Sn−1 ∑i= j Ui Pij ξ j / K , we have rn /K Sn−1 ∑i= j X i Pij ξ j = Yn + o p (1). Here let n = Var (Yn |Z) = rn Sn−1 ∑ Pij2 E[Ui Ui |Z]E[ξ j2 |Z] + E[Ui ξi |Z]E[U j ξ j |Z] Sn−1/K . i= j Note that by Assumptions 2 and 3, n ≤ C a.s.n. Let L̄ n be any sequence of bounded −1/2 matrices with λmin ( L̄ n n L̄ n ) ≥ C > 0 a.s.n and let Ȳn = L̄ n n L̄ n L̄ n Yn . Now let α be a nonzero vector and apply Lemma A2with Win = 0, εi = ξi , c 1n = 0, and √ −1/2 √ L̄ n rn Sn−1 . We have Var c2n c2n = α L̄ n n L̄ n ∑i= j Ui Pij ξ j / K |Z = α α > 0 by construction, and the other hypotheses of Lemma A2 can be verified as in the proof of d Theorem 2. Then by the conclusion of Lemma A2, it follows that α Ȳn → N (0, α α) a.s. d By the Cramér–Wold device, a.s. Ȳn → N (0, I ). Consider now the JIV1 estimator and let L n be specified as in the statement of the result such that λmin L n V̄n∗ L n ≥ C > 0 a.s.n.Let L̄ n = L n H̄n−1 , so L n V̄n∗ L n = L̄ n n L̄ n . −1/2 1/2 Note that L̄ n n L̄ n ≤ C and n ≤ C a.s.n. 
By Lemma A6, (Sn−1 H̃ Sn−1 )−1 = H̄n−1 + o p (1). Therefore, we have L̄ n n L̄ n −1/2 Note also that −1/2 L n (Sn−1 H̃ Sn−1 )−1 = L̄ n n L̄ n L n ( H̄n−1 + o p (1)) −1/2 = L̄ n n L̄ n L̄ n + o p (1). √ rn /K Sn−1 ∑i= j X i Pij (1 − Pj j )−1 ε j = O p (1). Then we have −1/2 L n rn /K Sn (δ̃ − δ0 ) L n V̄n∗ L n −1/2 = L̄ n n L̄ n L n (Sn−1 H̃ Sn−1 )−1 rn /K Sn−1 = L̄ n n L̄ n −1/2 ∑ X i Pij (1 − Pj j )−1 ε j i= j d L̄ n + o p (1) [Yn + o p (1)] = Ȳn + o p (1) → N (0, I ) . The conclusion for JIV2 follows by a similar argument for ξi = εi . n JIVE WITH HETEROSKEDASTICITY 65 Next, we turn to the proof of Theorem 4. Let ξ̃i = ( yi − X i δ̃)/(1 − Pii )and ξi = εi / (1 − Pii ) for JIV1 and ξ̂i = yi − X i δ̂ and ξi = εi for JIV2. Also, let ˆ 1 = ∑ Ẋ i Pik ξ̂ 2 Pkj Ẋ , ˆ 2 = ∑ P 2 Ẋ i Ẋ ξ̂ 2 + Ẋ i ξ̂i ξ̂ j Ẋ , Ẋ i = Sn−1 X i , ij k j i j j i= j=k ˙1 = ∑ i= j=k Ẋ i Pik ξk2 Pkj Ẋ j , ˙2 = ∑ Pij2 i= j i= j Ẋ i Ẋ i ξ j2 + Ẋ i ξi ξ j Ẋ j . ˆ1 − ˙ 1 = o p (1) and ˆ2 − ˙2 = LEMMA A7. If Assumptions 1–6 are satisfied, then o p (K /rn ). Proof. To show the first conclusion, we use Lemma A4. Note that for δ̇ = δ̂ and X iP = p X i /(1 − Pii ) for JIV1 and δ̇ = δ̃ and X iP = X i for JIV2, we have δ̇ → δ0 and ξ̂i2 − ξi2 = 2 −2ξi X iP (δ̇ − δ0 ) + X iP (δ̇ − δ0 ) . Let ηi be any element of −2ξi X iP or X iP X iP . Note √ √ that Sn / n is bounded, so by CS, ϒi = Sn z i / n ≤ C. Then E[ηi2 |Z] ≤ C E[ξi2 |Z] + C E[ X i 2 |Z] ≤ C + C ϒi 2 + C E[ Ui 2 |Z] ≤ C. ˆ n denote a sequence of random variables converging to zero in probability. By Let Lemma A4, ˆ p ∑ i= j=k Ẋ i Pik ηk Pkj Ẋ j = o p (1)O p (1) → 0. ˆ1 − ˙ 1 is a sum of terms of the From the preceding expression for ξ̂i2 − ξi2 , we see that form p ˆ 1 − ˙1 → ˆ ∑i= j=k Ẋ i Pik ηk Pkj Ẋ , so T, 0. j Let di = C +|εi |+ Ui ,  = (1 + δ̂ ) for JIV1,  = (1 + δ̃ ) for JIV2, B̂ = δ̂ − δ0 for JIV1, and B̂ = δ̃ − δ0 for JIV2. By the conclusion of Theorem 1, we have  = O p (1) p and B̂ → 0. Also, because Pii is bounded away from 1, (1 − Pii )−1 ≤ C a.s. Hence, for both JIV1 and JIV2, X i ≤ C + Ui ≤ di , Ẋ i ≤ Crn−1/2 di , ξ̂i − ξi ≤ C X i (δ̂ − δ0 ) ≤ Cdi B̂, ξ̂i ≤ C X i (δ0 − δ̂) + |ξi | ≤ Cdi Â, 2 ξ̂i − ξi2 ≤ |ξi | + ξ̂i ξ̂i − ξi ≤ Cdi (1 + Â)di B̂ ≤ Cdi2  B̂, 2 −1/2 2 di Â, Ẋ i ξi ≤ Crn−1/2 di2 . Ẋ i ξ̂i − ξi ≤ Cμ−1 n di B̂, Ẋ i ξ̂i ≤ Crn Also note that because E[di2 |Z] ≤ C, E ∑ i= j Pij2 di2 d j2 rn−1 | Z ≤ Crn−1 ∑ Pij2 = Crn−1 ∑ Pii = C K /rn , i, j i 66 JOHN C. CHAO ET AL. so ∑i= j Pij2 di2 d j2 rn−1 = O p (K /rn ) by M. Then it follows that 2 2 2 2 ∑ Pij Ẋ i Ẋ i ξ̂ j − ξ j ≤ ∑ Pij2 Ẋ i ξ̂ j2 − ξ j2 i= j i= j ≤ Crn−1 ∑ Pij2 di2 d j2  B̂ = o p (K /rn ) . i= j We also have 2 ∑ Pij Ẋ i ξ̂i ξ̂ j Ẋ j − Ẋ i ξi ξ j Ẋ j ≤ ∑ Pij2 Ẋ i ξ̂i Ẋ j ξ̂ j − ξ j i= j i= j + Ẋ j ξ j X̆ i ξ̂i − ξi K ≤ Crn−1 ∑ Pij2 di2 d j2  B̂ = o p . r n i= j n The second conclusion then follows from T. LEMMA A8. If Assumptions 1–6 are satisfied, then ˙1 = ˙2 = ∑ i= j=k z i Pik E[ξk2 |Z]Pkj z j /n + o p (1), ∑ Pij2 zi zi E[ξ j2 |Z]/n + Sn−1 ∑ Pij2 i= j i= j E[Ui Ui |Z]E[ξ j2 |Z] + E[Ui ξi |Z]E[ξ j U j |Z] Sn−1 + o p (K /rn ). Proof. To prove the first conclusion, apply Lemma A4 with Wi equal to an element of Ẋ i , Y j equal to an element of Ẋ j , and ηk = ξk2 . Next, we use Lemma A3. Note that Var(ξi2 |Z) ≤ C and rn ≤ Cn, so for u ki = ek Sn−1 Ui , 4 + Ẋ 4 |Z] E[( Ẋ ik Ẋ i )2 |Z] ≤ C E[ Ẋ ik i 4 /n 2 + Eu 4 |Z + z 4 /n 2 + Eu 4 |Z ≤ C/r 2 , ≤ C z ik n ki i i 2 2 2 2 2 E[( Ẋ ik ξi ) |Z] ≤ C E z ik ξi /n + u ki ξi |Z ≤ C/n + C/rn ≤ C/rn . 
Also, if i = E[Ui Ui |Z], then E[ Ẋ i Ẋ i |Z] = z i z i /n + Sn−1 i Sn−1 and E[ Ẋ i ξi |Z] = Sn−1 E[Ui ξi |Z]. Next let Wi be Ẋ ik Ẋ i for some k and , so E[Wi |Z] = ek Sn−1 i Sn−1 e + z ik z i /n, |E[Wi |Z]| ≤ C/rn , Var(Wi |Z) ≤ E[( Ẋ ik Ẋ i )2 |Z] ≤ C/rn2 . Also let Yi = ξi2 and note that |E[Yi |Z]| ≤ C and Var(Wi |Z) ≤ C. Then in the notation of Lemma A3, √ √ √ K (σ̄Wn σ̄Yn + σ̄Wn μ̄Yn + μ̄Wn σ̄Yn ) ≤ K (C/rn + C/rn + C/rn ) ≤ C K /rn . By the conclusion of Lemma A3, for this Wi and Yi we have √ ∑ Pij2 Ẋ ik Ẋ i ξ j2 = ek ∑ Pij2 zi zi /n + Sn−1 i Sn−1 e E[ξ j2 |Z] + O p ( K /rn ). i= j i= j JIVE WITH HETEROSKEDASTICITY 67 Consider also Lemma A3 with Wi and Yi equal to Ẋ ik ξi and Ẋ i ξi , respectively, so σ̄Wn σ̄Yn + σ̄Wn μ̄Yn + μ̄Wn σ̄Yn ≤ C/r n . Then, applying Lemma A3, we have √ ∑ Pij2 Ẋ ik ξi ξ j Ẋ j = ek Sn−1 ∑ Pij2 E[Ui ξi |Z]E[ξ j U j |Z]Sn−1 e + O p ( K /rn ). i= j i= j √ Also, because K → ∞, we have O p ( K /rn ) = o p (K /rn ). The second conclusion then n follows by T. Proof of Theorem 4. Note that X̄ i = ∑nj=1 Pij X j , so n ∑ ( X̄ i X̄ i − X i Pii X̄ i − X̄ i Pii X i )ξ̂i2 i=1 n = = = n Pik Pkj X i X j ξ̂k2 − ∑ Pik Pkj X i X j ξ̂k2 − ∑ Pii Pij X i X j ξ̂i2 − ∑ Pij Pj j X i X j ξ̂ j2 −2 ∑ Pii2 X i X i ξ̂i2 i, j,k=1 n i, j,k=1 = n ∑ i, j=1 i, j=1 i= j ∑ i= j=k Pij Pj j X i X j ξ̂ j2 ∑ n i, j,k ∈{i, / j} ∑ Pii Pij X i X j ξ̂i2 − ∑ i= j i=1 n Pik Pkj X i X j ξ̂k2 − ∑ Pii2 X i X i ξ̂i2 i=1 Pik Pkj X i X j ξ̂k2 + n n i= j i=1 ∑ Pij2 X i X i ξ̂ j2 − ∑ Pii2 X i X i ξ̂i2 . Also, for Z i and Z̃ i equal to the ith row of Z and Z̃ = Z (Z Z )−1 , we have K K n n ∑ ∑ ∑ Z̃ ik Z̃ i X i ξ̂i k=1 =1 i=1 n ∑ = = i, j=1 n ∑ K ∑ Z jk Z j X j ξ̂ j j=1 K ∑∑ k=1 =1 Z̃ ik Z jk Z̃ i Z j n X i ξ̂i ξ̂ j X j = ∑ i, j=1 ( Z̃ i Z j )2 X i ξ̂i ξ̂ j X j = i, j=1 n ∑ 2 K ∑ Z̃ ik Z jk X i ξ̂i ξ̂ j X j k=1 Pij2 X i ξ̂i ξ̂ j X j . i, j=1 Adding this equation to the previous one gives ˆ = = ∑ Pik Pkj X i X j ξ̂k2 + ∑ Pik Pkj X i X j ξ̂k2 + i= j=k i= j=k n n i=1 i, j=1 ∑ Pij2 X i X i ξ̂ j2 − ∑ Pii2 X i X i ξ̂i2 + ∑ i= j Pij2 X i ξ̂i ξ̂ j X j ∑ Pij2 (X i X i ξ̂ j2 + X i ξ̂i ξ̂ j X j ), i= j which yields the in Section 2. equality Let σ̇i2 = E ξi2 |Z and z̄ i = ∑ j Pij z j = ei Pz. Then following the same line of argument as at the beginning of this proof, with z i replacing X i and σ̇k2 replacing ξ̂k2 , ∑ i= j=k z i Pik σ̇k2 Pkj z j /n = ∑ σ̇i2 z̄ i z̄ i − Pii z i z̄ i − Pii z̄ i z i + Pii2 z i z i n − ∑ Pij2 z i z i σ̇ j2/n. i i= j 68 JOHN C. CHAO ET AL. Also, as shown previously, Assumption 4 implies that ∑i z i − z̄ i 2 /n ≤ z (I − P)z/n → 0 a.s. Then by σ̇i2 and Pii bounded a.s. PZ , we have a.s. 2 ∑ σ̇i (z̄ i z̄ i − z i z i )/n ≤ ∑ σ̇i2 (2 z i z i − z̄ i + z i − z̄ i 2 )/n i i 1/2 1/2 ≤C ∑ zi 2 /n ∑ zi − z̄i 2 /n i + C ∑ z i − z̄ i 2 /n → 0, i i 1/2 1/2 2 4 2 2 2 → 0. ∑ σ̇i Pii (z i z̄ i − z i z i )/n ≤ ∑ σ̇i Pii z i /n ∑ zi − z̄i /n i i i It follows that ∑ i= j=k z i Pik σ̇k2 Pkj z j /n = ∑ σ̇i2 (1 − Pii )2 z i z i /n − i ∑ Pij2 zi zi σ̇ j2 /n + oa.s. (1). i= j It then follows from Lemmas A7 and A8 and T that ˆ2 = ˆ 1 + ∑ i= j=k + Sn−1 z i Pik σ̇k2 Pkj z j /n + ∑ Pij2 ∑ Pij2 zi zi σ̇ j2 /n i= j E[Ui Ui |Z]σ̇ j2 + E[Ui ξi |Z]E[ξ j U j |Z] Sn−1 i= j + o p (1) + o p (K /rn ) = ∑ σ̇i2 (1 − Pii )2 z i z i /n i + Sn−1 ∑ Pij2 i= j E[Ui Ui |Z]σ̇ j2 + E[Ui ξi |Z]E[ξ j U j |Z] Sn−1 + o p (1) + o p (K /rn ) because n → 0. Then for JIV1, where ξi = εi /(1 − Pii ) and σ̇i2 = σi2 /(1 − Pii )2 , we have ˆ2 = ¯ n + ¯ n + o p (1) + o p (K /rn ). 
ˆ 1 + For JIV2, where ξi = εi and σ̇i2 = σi2 , we have ˆ 2 = n + n + o p (1) + o p (K /rn ). ˆ 1 + Consider the case where K /rn is bounded, implying o p (K /rn ) = o p (1). Then, because ¯ n + ¯ n , Hn−1 , and n + n are all bounded a.s.n, Lemma A6 implies H̄n−1 , −1 −1 ˆ 2 Sn−1 H̃ Sn−1 ˆ 1 + Sn Ṽ Sn = Sn−1 H̃ Sn−1 ¯ n + ¯ n + o p (1) H̄n−1 + o p (1) = V̄n + o p (1), = H̄n−1 + o p (1) Sn V̂ Sn = Vn + o p (1), which gives the first conclusion. JIVE WITH HETEROSKEDASTICITY 69 For the second result, consider the case where K /rn → ∞. Then for JIV1, where ξi = ¯ n for n sufficiently εi /(1 − Pii ) and σ̇i2 = σi2 /(1 − Pii )2 , the almost sure boundedness of large implies that we have ˆ 2 =(rn /K ) ¯ n +(rn /K ) ¯ n +(rn/K )o p (1)+ o p (1)=(rn/K ) ¯ n +o p (1). ˆ 1 + (rn /K ) For JIV2, where ξi = εi and σ̇i2 = σi2 , we have ˆ 2 =(rn/K )n +(rn /K )n +(rn /K )o p (1)+ o p (1)=(rn/K )n +o p (1). ˆ 1 + (rn /K ) ¯ n , Hn−1 , and (r/K n )n are all bounded a.s.n and by Then by the fact that H̄n−1 , (r/K n ) Lemma A6, −1 −1 ˆ 1 + ˆ 2 Sn−1 H̃ Sn−1 Sn Ṽ Sn = Sn−1 H̃ Sn−1 ¯ n /K n + o p (1) H̄n−1 + o p (1) = V̄n∗ + o p (1). = H̄n−1 + o p (1) rn Similarly, Sn V̂ Sn = Vn∗ + o p (1), which gives the second conclusion. n Proof of Theorem 5. An expansion gives a(δ̂) − a(δ0 ) = Ā(δ̂ − δ0 ) for Ā = ∂a(δ̄)/∂δ where δ̄ lies on the line joining δ̂ and δ0 and actually differs element by p p element from a(δ). It follows from δ̂ → δ0 that δ̄ → δ0 , so by condition (iii), Bn ÂSn−1 = Bn ASn−1 + o p (1). Then multiplying by Bn and using Theorem 4, we have −1/2  V̂  a(δ̂) − a(δ0 ) −1/2 Bn ĀSn−1 Sn δ̂ − δ0 = Bn ÂSn−1 Sn V̂ Sn Sn−1  Bn −1/2 = Bn ASn−1 + o p (1) V̄n + o p (1) Sn−1 ABn + o p (1) × Bn ASn−1 + o p (1) Sn δ̂ − δ0 −1/2 Bn ASn−1 Sn δ̂ − δ0 + o p (1) = Bn ASn−1 V̄n Sn−1 A Bn −1/2 1/2 −1/2 = Bn ASn−1 V̄n Sn−1 A Bn Bn ASn−1 V̄n V̄n Sn δ̂ − δ0 −1/2 + o p (1) = Fn Fn Fn Ȳn + o p (1) −1/2 Sn (δ − δ0 ), where the third equality in the prefor Fn = Bn ASn−1 V̄n and Ȳn = V̄n ceding display follows from the Slutsky theorem given the continuity of the square root 1/2 d matrix. By Theorem 2, Ȳn → N (0, IG ). Also, from the proof of Theorem 2, it follows that this convergence is a.s. conditional on Z. Then because L n = (Fn Fn )−1/2 Fn satisfies L n L n = I , it follows from the Slutsky theorem and standard convergence in distribution results that −1/2 d a(δ̂) − a(δ0 ) = L n Ȳn + o p (1) → N (0, I ),  V̂  giving the conclusion. n 70 JOHN C. CHAO ET AL. Proof of Corollary 1. Let a(δ) = c δ, so Ā = A = c . Note that condition (i) of Theorem 5 is satisfied. Let Bn = bn . Then Bn ASn−1 = bn c Sn−1 is bounded by hypothesis so condition (ii) of Theorem 5 is satisfied. Also, Bn ( Ā − A)Sn−1 = 0, so condition (iii) of Theorem 5 is satisfied. If K /rn is bounded, then by hypothesis, λmin (Bn ASn−1 V̄n Sn−1 A Bn ) = bn2 c Sn−1 V̄n Sn−1 c ≥ C; or if K /rn → ∞, then λmin (Bn ASn−1 V̄n∗ Sn−1 A Bn ) = bn2 c Sn−1 V̄n∗ Sn−1 c ≥ C, which gives the first conclusion. The second conclusion follows n similarly. Proof of Corollary 2. We will show the result for δ̂; the result for δ̃ follows analogously. Let γ = limn→∞ (rn /n), so γ exists and γ ∈ {0, 1} by Assumption 2. Also, √ √ √ √ √ √ γ I −π1 rn Sn−1 = rn S̃n−1 diag 1/ n, . . . , 1/ n, 1/ rn → R = . 0 1 √ Consider first the case where rn = n so that γ = 1. Take bn = rn and note that bn c Sn−1 = √ c ( rn Sn−1 ) is bounded. Also, c R = 0 because R is nonsingular and Vn ≤ C a.s.n implying that bn2 c Sn−1 Vn Sn−1 c = c RVn R c + oa.s. (1). 
Also n = Sn−1 E[ ∑i= j Pij Ui ε j ∑i= j Pij Ui ε j |Z]Sn−1 is positive semidefinite, so Vn ≥ Hn−1 n Hn−1 . Also, by Assumptions 2 and 4, there is C > 0 with λmin (Hn−1 n Hn−1 ) ≥ C a.s.n. Therefore, a.s.n, bn2 c Sn−1 Vn Sn−1 c ≥ c R Hn−1 n Hn−1 R c + o(1) ≥ C + o(1) ≥ C. (A.1) The conclusion then follows from Corollary 1. For γ = 0, let a = (−π1 , 1)c and note that c R = (0, a) = 0. If K /rn is bounded, let bn = √ rn . Then, as before, bn c Sn−1 is bounded and equation (A.1) is satisfied, and the conclu√ √ √ sion follows. If K /rn → ∞, let bn = rn / K . Note that bn c Sn−1 = rn /K c ( rn Sn−1 ) → 0, so bn c Sn−1 is bounded. Also, note that √ rn Sn−1 eG = diag( rn /n, . . . , rn /n, 1) I 0 e = eG . −π1 1 G Furthermore, a constant sign of E[εi UiG |Z ] implies E[εi UiG |Z ]E[ε j UjG |Z ] ≥ 0, so by Pii ≤ C < 1, 2 |Z]σ 2 + E[ε U |Z]E[ε U |Z] 2 |Z]σ 2 /K K ≥ ∑ Pij2 E[UiG i iG j jG ∑ Pij2 E[UiG j j i= j ≥C ∑ Pij2 /K = C ∑ Pij2 −∑ Pii2 i= j i, j i i= j K = C 1−∑ Pii2 /K ≥ C. Therefore, we have, a.s., (rn /K )n = √ rn Sn−1 eG . ≥ CeG eG ∑ Pij2 i= j 2 |Z]σ 2 E[UiG j √ + E[εi UiG |Z]E[ε j UjG |Z] /K eG rn Sn−1 JIVE WITH HETEROSKEDASTICITY 71 Also, Hn is a.s. bounded so that λmin (Hn−1 ) = 1/λmax (Hn ) ≥ C + oa.s. (1). It then follows that from c R = aeG bn2 c Sn−1 V̄n∗ Sn−1 c = rn c Sn−1 Hn−1 (rn /K )n Hn−1 Sn−1 c ≥ Crn c Sn−1 Hn−1 eG eG Hn−1 Sn−1 c Hn−1 eG )2 + oa.s. (1) ≥ C + oa.s. (1). = a 2 C(eG n The conclusion then follows from Corollary 1. APPENDIX B: Proofs of Lemmas A2 and A4 We first give a series of lemmas that will be useful for the proofs of Lemmas A2 and A4. n LEMMA B1. Under Assumption 1 and for any subset I2 of the set (i, j)i, j=1 and n 4 2 2 any subset I3 of (i, j, k)i, j,k=1 , (i) ∑ Pij ≤ K ; (ii) ∑ Pij Pjk ≤ K ; and I I 2 3 (iii) ∑ Pij2 Pik Pjk ≤ K , a.s.n. I3 Proof. By Assumption 1, Z Z is nonsingular a.s.n. Also, because P is idempotent, n rank(P) = tr(P) = K , 0 ≤ Pii ≤ 1, and ∑ Pij2 = Pii . Therefore, a.s.n, j=1 n ∑ Pij4 ≤ ∑ I2 ∑ I3 Pij2 = i, j=1 Pij2 Pjk2 ≤ n ∑ Pii = K , i=1 n ∑ ∑ j=1 n n ∑ Pij2 i=1 Pjk2 k=1 = n n j=1 j=1 ∑ Pj2j ≤ ∑ Pj j = K , ∑ Pij2 Pik Pjk ≤ ∑ Pij2 ∑ Pik Pjk ≤ ∑ Pij2 ∑ Pik2 ∑ Pjk2 I3 i, j ≤∑ i, j k Pij2 i, j Pii Pj j ≤ ∑ For the next result, let Sn = k k Pij2 = K . i, j ∑ i<j<k<l Pik Pjk Pil Pjl + Pij Pjk Pil Pkl + Pij Pik Pjl Pkl . LEMMA B2. If Assumption 2 is satisfied, then a.s.n (i) tr (P − D)4 ≤ C K ; (ii) ∑ Pik Pjk Pil Pjl ≤ C K ; and (iii) |Sn | ≤ C K , where D = diag(P11 , . . . , Pnn ). i< j<k<l Proof. To show part (i), note that (P − D)4 = (P − PD − DP + D 2 )2 = P − PD − PDP + PD2 − PDP + PDPD + PD2 P − PD3 − DP + DPD + DPDP − DPD2 + D 2 P − D 2 PD-D 3 P + D 4 . for any square matrices A and B. Then, Note that tr(A ) = tr(A) and tr(AB) = tr(BA) 4 tr (P − D) = tr(P) − 4tr(PD) + 4tr(PD2 ) + 2tr(PDPD) − 4tr(PD3 ) + tr(D 4 ). By 0 ≤ 72 JOHN C. CHAO ET AL. Pii ≤ 1 we have D j ≤ I for any positive integer j and tr(PD j ) = tr(PD j P) ≤ tr(P) = K a.s.n. Also, a.s.n, tr(PDPD) = tr(PDPDP) ≤ tr(PD2 P) ≤ tr(P) = K and tr(D 4 ) = ∑i Pii4 ≤ K . Therefore, by T we have tr (P − D)4 ≤ 16K , giving conclusion (i). Next, let L be the lower triangular matrix with L ij = Pij 1(i > j). Then P = L + L + D, so (P − D)4 = (L + L )4 = (L 2 + LL + L L + L 2 )2 = L 4 + L 2 LL + L 2 L L + L 2 L 2 + LL L 2 + LL LL + LL L L + LL3 + L LL2 + L LLL + L LL L + L LL2 + L 2 L 2 + L 2 LL + L 2 L L + L 4 . Note that for positive integer j, [(L ) j ] = L j . Then using tr(AB) = tr(B A) and tr(A ) = tr(A), tr((P − D)4 ) = 2tr(L 4 ) + 8tr(L 3 L ) + 4tr(L 2 L 2 ) + 2tr(L LL L). 
Next, compute each of the terms. Note that tr(L 4 ) = ∑ Pij 1(i > j)Pjk 1( j > k)Pk 1(k > )Pi 1( > i) = 0, ∑ Pij 1(i > j)Pjk 1( j > k)Pk 1(k > )Pi 1(i > ) i, j,k, tr(L 3 L ) = i, j,k, = ∑ Pij Pjk Pk Pi = = ∑ Pk Pkj Pji Pi = i< j<k< tr L 2 L 2 = ∑ ∑ Pij Pjk Pk Pi ∑ Pij Pjk Pk Pi , <k< j<i i> j>k> i< j<k< Pij 1(i > j)Pjk 1( j > k)Pk 1( > k)Pi 1(i > ) i, j,k, ∑ = Pij Pjk Pk Pi i> j>k,i>>k = ∑ ∑ Pij Pjk Pk Pi + i> j=>k = Pij Pjk Pkj Pji + ∑ Pij2 Pjk2 + 2 ∑ Pij 1(i > j)Pjk (k > j)Pk 1(k > )Pi 1(i > ) i< j<k ∑ Pik Pk Pj Pji , i< j<k< and tr(L L L L ) = i, j,k, = ∑ Pij Pji Pij Pji + ∑ j<i j<k<i Pij Pjk Pk Pi Pk Pki Pij Pj + Pj Pji Pik Pk i< j<k< ∑ ∑ i>> j>k ∑ i> j>k = Pij Pjk Pk Pi + i> j>>k Pij Pjk Pkj Pji JIVE WITH HETEROSKEDASTICITY ∑ + Pij Pjk Pkj Pji + j<i<k ∑ Pij Pji Pi Pi j<<i + ∑ < j<i ∑ Pij Pji Pi Pi + ×Pij Pjk Pk Pi = 73 ∑ Pij4 + 2 ∑ i< j < j<k<i ∑ + Pij2 Pik2 + Pik2 Pjk2 + 4 i<j<k + j<<k<i ∑ ∑ < j<i<k + ∑ j<<i<k Pik Pkj Pj Pi . i< j<k< Summing up gives the result tr((P − D)4 ) = 2 ∑i< j Pi4j + 4 ∑i< j<k (P 2ij Pjk2 + Pik2 Pjk2 + Pij2 Pik2 ) + 8S n . Then by T and Lemma B1, we have |Sn | ≤ (1/4) ∑ Pij4 + 1/2 i<j ∑ (Pij2 Pjk2 + Pik2 Pjk2 + Pij2 Pik2 ) + (1/8) tr((P − D)4 ) ≤ C K , i<j<k a.s.n, thus giving part (iii). That is, Sn = O a.s. (K ). To show part (ii), take {εi } to be a sequence of independent and identically distributed random variables with mean 0 and variance 1 and where εi and Z are independent for all i and n. Define the random quantities 1 = ∑ Pij Pik ε j εk + Pij Pjk εi εk + Pik Pjk εi ε j , i<j<k 2 = ∑ Pij Pik ε j εk + Pij Pjk εi εk , 3 = i<j<k ∑ Pik Pjk εi ε j . i<j<k Note that by Lemma A1, E 23 |Z = E ∑i<j<k Pik Pjk εi ε j ∑<m<q Pq Pmq ε εm |Z ∑ = Pik Pjk Pi Pj = ∑ 2 (Pik )2 Pjk +2 i<j<k i<j<{k,} = Oa.s. (K ) + 2 ∑ ∑ Pik Pjk Pi Pj i<j<k< Pik Pjk Pi Pj . i<j<k< Also, note that E 2 3 |Z = E ∑ i<j<k Pij Pik ε j εk + Pij Pjk εi εk ∑<m<q Pq Pmq ε εm |Z = ∑ i<j<k< Pij Pik Pj Pk + ∑ Pij Pjk Pi Pk i<j<k< and E 22 |Z = E ∑i<j<k Pij Pik ε j εk + Pij Pjk εi εk × ∑ <m<q Pm Pq εm εq + Pm Pmq ε εq |Z 74 JOHN C. CHAO ET AL. = ∑ {i,}<j<k ∑ + ∑ Pij Pik Pj Pk + Pij Pik Pjm Pmk + i<j<m<k ∑ = Pij2 Pik2 + i<j<k ∑ ∑ <i<j<k Pij2 Pjk2 + 2 i<j<k ∑ +2 Pij Pjk Pim Pmk i<{j,m}<k Pij Pjk Pi Pk ∑ Pij Pik Pj Pk i<<j<k Pij Pjk Pim Pmk i<j<m<k ∑ + = ∑ i<j<k ∑ Pij Pi Pjk Pk + i<j<k< Pjk Pk Pij Pi i<j<k< Pij2 Pik2 + ∑ Pij2 Pjk2 + 2Sn = Oa.s. (K ). i<j<k Because 1 = 2 + 3 , it follows that E 21 |Z = E 22 |Z + E 23 |Z + 2E 2 3 |Z = O a.s. (K ) + 2S n = O a.s. (K ). Therefore, by T, the expression for E 23 |Z given previously, and 3 = 1 − 2 , ∑ Pik Pjk Pi Pj ≤ E 23 |Z + Oa.s. (K ) ≤ E (1 − 2 )2 |Z + Oa.s. (K ) i<j<k< ≤ 2E 21 |Z + 2E 22 |Z + Oa.s. (K ) ≤ Oa.s. (K ). LEMMA B3. Let L be the lower triangular matrix with L ij = Pij 1(i > j). Then, under √ 1/2 Assumption 2, LL ≤ C K a.s.n, where A = Tr A A . Proof. From the proof of Lemma B2 and by Lemma B1 and Lemma B2(ii), we have a.s.n 2 LL = tr(LL LL ) = ∑ P 4 + 2 ∑ P 2 P 2 + P 2 P 2 + 4 ∑ Pik Pkj Pj Pi ij ij ik ik jk ≤C i<j i<j<k K + ∑ Pik Pkj Pj Pi ≤ CK. i<j<k< Taking square roots gives the answer. i<j<k< n For Lemma B4, which follows, let φi = φi (Z) (i = 1, . . . , n) denote some sequence of measurable functions. In applications of this lemma, we will take φi (Z) to be either conditional variances or conditional covariances given Z. Also, to set some notation, let 2 (Z) = E[u 2 |Z], and γ = γ (Z) = E[u ε |Z], where σi2 = σi2 (Z) = E[εi2 |Z], ωi2 = ωin i in i i i to simplify notation we suppress the dependence of σi2 on Z and of ωi2 and γi on Z and n. 
Let the following results apply. LEMMA B4. Suppose that (a) P is a symmetric, idempotent matrix with rank (P) = K and Pii ≤ C < 1; (b) (u 1 , ε1 ) , . . . ., (u n , εn ) are conditional independent on Z; (c) 4 4 there exists a constant C such that, a.s., supi E u i |Z ≤ C, supi E εi |Z ≤ C, and supi |φi | = supi |φi (Z)| ≤ C. Then, a.s., JIVE WITH HETEROSKEDASTICITY (i) E (ii) E (iii) E (iv) E (v) E (vi) E 75 2 1 2 φ (u ε − γ ) |Z → 0 P ∑ k i i i i<k ki K |Z → 0 ; 2 2φ u 2− ω 2 |Z → 0 ∑i<k Pki k j j 2 | Z → 0; ∑i< j<k Pki Pkj φk u i ε j + u j εi 2 ∑i< j<k Pki Pkj φk εi ε j |Z → 0; 2 P P φ u u ∑i< j<k ki kj k i j |Z → 0. 1 2 2 2 K ∑i<k Pki φk ε j − σ j 1 K 1 K 1 K 1 K 2 Proof. To show part (i), note that E 1 K ∑ 2 i<k≤ n Pki φk u i εi − γi 2 |Z 1 4 φ 2 E u 2 ε 2 |Z − γ 2 P ∑ ki k i i i i<k≤ n K2 2 2 P 2 φ φ E u 2 ε 2 |Z − γ 2 + 2 ∑1≤ i<k<l≤n Pki li k l i i i K 1 4 2 2 2 4 4 ≤ 2 ∑1≤i<k≤n Pki φk E u i |Z E εi |Z + E u i |Z E εi |Z K 2 2 2 2 2 4 4 + 2 ∑1≤i<k<l≤n Pki Pli |φk | |φl | E u i |Z E εi |Z +E u i |Z E εi |Z K 1 4 + 2 2 P 2 → 0, ≤C P P ∑ ∑ ki ki li K 2 1≤i<k≤n K 2 1≤ i<k<l≤n = where the first inequality is the result of applying T and a conditional version of CS, the second inequality follows by hypothesis, and the convergence to zero a.s. follows from applying Lemma B1(i) and (ii). Parts (ii) and (iii) can be proved in essentially the same way as part (i); hence, to avoid redundancy, we do not give detailed arguments for these parts. To show part (iv), first let L be a lower triangular matrix with (i, j)th element L ij = Pij 1 (i > j) as in Lemma B3 and define Dγ = diag (γ1 , . . . , γn ), Dφ = diag (φ1 , . . . , φn ), u = (u 1 , . . . , u n ) , and ε = (ε1 , . . . , εn ) . It then follows by direct multiplication that ε L Dφ Lu − tr L Dφ L Dγ = ∑ 2 φ (u ε − γ ) Pki k i i i 1≤i<k≤n + ∑ 1≤i< j<k ≤ n Pki Pkj φk u i ε j + u j εi 76 JOHN C. CHAO ET AL. so that, by making use of Loève’s cr inequality, we have that 2 1 E P P φ ε + u ε | Z u ∑ 1≤i<j<k≤n ki k j k i j j i K2 2 1 ≤ 2 2 E u L Dφ Lε − tr L Dφ LDγ |Z K 2 1 + 2 2 E ∑ 1≤i<k≤n P 2ki φ k u i εi − γ i |Z . K (B.1) It has already been shown in the proof of part (i) that ( 1/K 2 ) E ∑ 1≤i<k≤n 2 P 2ki φk u i εi − γ i |Z → 0 a.s. PZ , so what remains to be shown is that 1/K 2 2 E u L Dφ Lε − tr L Dφ L Dγ |Z → 0 a.s. PZ . To show the latter, note first that, by straightforward calculations, we have 2 1 E u L D Lε − tr L D L D | Z γ φ φ K2 2 1 1 . = 2 tr L Dφ L ⊗ L Dφ L E εu ⊗ εu |Z − 2 tr L Dφ L Dγ K K (B.2) Next, note that, by straightforward calculation, we have E εu ⊗ εu |Z ⎛ ⎞ ⎛ σ12 ω12 e1 e1 σ12 ω22 e1 e2 · · · σ12 ωn2 e1 en γ 2 e1 e1 γ1 γ2 e2 e1 ⎜ ⎟ ⎜ 1 ⎜ 2 2 2 2 ⎟ ⎜ ⎜ σ ω e e σ ω e e · · · σ22 ωn2 e2 en ⎟ ⎜ γ2 γ1 e1 e2 γ22 e2 e2 = ⎜ 2 1. 2 1 2 2. 2 2 ⎟+⎜ . .. .. .. ⎜ ⎟ ⎜ .. .. .. . ⎝ ⎠ ⎝ . . γn γ1 e1 en γn γ2 e2 en σn2 ω12 en e1 σn2 ω22 en e2 · · · σn2 ωn2 en en ⎞ ⎛ ⎛ γ1 ⊗ Dγ ϑ1 e1 e1 0 ··· 0 0 ··· n×n n×n n×n ⎟ ⎜ ⎜ 0 ϑ e e · · · 0 ⎟ ⎜ 0 γ2 ⊗ Dγ · · · ⎜ 2 2 2 n×n ⎟ ⎜ n×n ⎜ n×n +⎜ . ⎟+⎜ . . . .. . .. ⎜ ⎜ . .. ⎟ .. .. .. . . ⎠ ⎝ ⎝ . 0 n×n 0 n×n · · · ϑn en en 0 n×n 0 n×n = (Dσ ⊗ In ) vec (In ) vec (In ) (Dω ⊗ In ) + Dγ ⊗ In K nn Dγ ⊗ In + E Dϑ E + Dγ ⊗ Dγ , · · · γ1 γn en e1 ⎞ ⎟ ⎟ · · · γ2 γn en e2 ⎟ ⎟ . .. ⎟ .. . ⎠ 2 · · · γn en en ⎞ 0 n×n ⎟ 0 ⎟ n×n ⎟ ⎟ .. ⎟ . ⎠ · · · γn ⊗ D γ (B.3) where K nn is ann 2 × n 2 commutation matrix such that, for any n × n matrix A, K nn vec (A) = vec A . (See Magnus and Neudecker, 1988, pp. 46–48, for more on commuta tion matrices.) Also, here, Dγ = diag (γ1 , . . . ., γn ), Dσ = diag σ12 , . . . 
., σn2 , Dω = diag ω12 , . . . ., ωn2 , Dϑ = diag (ϑ1 , . . . , ϑn ) with ϑi = E εi2 u i2 |Z − σi2 ωi2 − 2γi2 . . . for i = 1, . . . ., n, E = e1 ⊗ e1 ..e2 ⊗ e2 .. · · · ..en ⊗ en , and ei is the ith column of an n × n identity matrix. It follows from (B.2) and (B.3) and by straightforward calculations that JIVE WITH HETEROSKEDASTICITY 77 2 1 E u L D Lε − tr L D L D | Z γ φ φ K2 2 1 1 = 2 tr L Dφ L ⊗ L Dφ L E εu ⊗ εu |Z − 2 tr L Dφ L Dγ K K 1 = 2 vec (In ) Dω L Dφ L Dσ ⊗ L Dφ L vec (In ) K 1 + 2 tr Dγ L Dφ L Dγ ⊗ L Dφ L K nn K 1 1 + 2 tr L Dφ L ⊗ L Dφ L E Dϑ E + 2 tr L Dφ L Dγ ⊗ L Dφ L Dγ K K 2 1 − 2 tr L Dφ L Dγ K 1 1 = 2 tr L Dφ L Dω L Dφ L Dσ + 2 tr Dγ L Dφ L Dγ ⊗ L Dφ L K nn K K 1 (B.4) + 2 tr L Dφ L ⊗ L Dφ L E Dϑ E . K Focusing first on the first term of (B.4), and letting ω2 = max1≤i≤n ωi2 , σ 2 = 2 max1≤i≤n σi2 , and φ = max1≤i≤n φi2 , we get 1 2 1 tr L Dφ LDω L Dφ LDσ ≤ ω2 σ 2 φ tr L LL L 2 2 K K 2 1 C ≤ C 2 tr L LL L = 2 L L K K a.s. PZ , (B.5) where the first inequality follows by repeated application of CS and of the simple inequality tr A A ≤ max λi tr A A , (B.6) 1≤ i ≤ n which holds for n × n matrices A and = diag (λ1 , . . . , λn ) such that λi ≥ 0 for all i, and where the second inequality follows in light of the assumptions of the lemma. Turning our attention now to thesecond term of (B.4), we make use of the fact that, for n × n matrices A and B, tr ( A ⊗ B) K nn = tr { AB} (a specialization of the result given by Abadir and Magnus, 2005, p. 304) to obtain K −2 tr D γ L D φ LDγ ⊗ L D φ L K nn = K − 2 tr L D φ LDγ L D φ LD γ . As in (B.5), by repeated use of CS and the inequality (B.6), we obtain 2 1 C tr Dγ L Dφ LDγ ⊗ L Dφ L K nn ≤ 2 L L 2 K K a.s. PZ . (B.7) 78 JOHN C. CHAO ET AL. Finally, to analyze the third term of (B.4), we note that 1 tr L Dφ L ⊗ L Dφ L E Dϑ E K2 ≤ 2 1 n 1 n |ϑi | ei L Dφ Lei ≤ 2 ∑ |ϑi | ei L Dφ2 Lei ei L Lei ∑ 2 K i=1 K i=1 ≤φ n 2 2 1 |ϑi | ei L Lei 2 K i=1 ∑ ≤C 2 2 1 n 1 n 1 n 2 L Le ≤ C P Pe = C e e i i ∑ ∑ ∑P K 2 i=1 i K 2 i=1 i K 2 i=1 ii ≤C 1 n C ∑ Pii = K a.s. PZ , K 2 i=1 (B.8) where the first inequality follows from T, the second inequality follows from CS, the third inequality makes use of (B.6), the fourth inequality uses CS and T and follows in light of the assumptions of the lemma, and the last inequality holds because Pii < 1. In light of (B.4), it follows from (B.5), (B.7), and (B.8) and Lemma B3 that (1/K 2 ) 2 E[(u L Dφ Lε − tr L Dφ L D γ )2 | Z ≤ 2C 1/K 2 LL + C ( 1/ K ) ≤ C/K a.s. PZ , which shows part (iv). It is easily seen that parts (v) and (vi) can be proved in essentially the same way as part (iv) (by taking u i = εi ); hence, to avoid redundancy, we do not give detailed arguments for these parts. n Proof of Lemma A2. Let b1n = c1n n −1/2 and b2n = c2n n −1/2 and note that these W are bounded in n because n is bounded away from zero by hypothesis. Let win = b1n in and u i = b2n Ui , where we suppress the n subscript on u i for notational convenience. Then, √ n y , y = w + ȳ , ȳ = Yn = w 1n + ∑i=2 ∑ j<i (u j Pij εi + u i Pij ε j )/ K . in in in in in Also, E w1n 4 |Z ≤ ∑i E win 4 |Z ≤ C ∑i E Win 4 |Z → 0 a.s., so by a conditional version of M,we deduce that for any υ > 0, P (|w1n | ≥ υ | Z) → 0. Moreover, note that supn E |P (|w1n | ≥ υ | Z)| 2 < ∞. It follows that, by Theorem 25.12 of Billingsley (1986), P (|w1n | ≥ υ) = E P (|w1n | ≥ υ | Z) → 0 as n → ∞; i.e., w1n p n y + o (1). → 0 unconditionally. 
Hence, Yn = ∑i=2 p in d n Now, we will show that Yn → N (0, 1) by first showing that, conditional on Z, ∑i=2 d ,U , ε ) for i = 1, . . . , n. Define the yin → N (0, 1), a.s. To proceed, let Xi = (Win i i σ -fields Fi,n = σ (X1 , . . . ., Xi ) for i = 1, . . . ., n. Note that, by construction, Fi−1,n ⊆ Fi,n . Moreover, it is straightforward to verify that, conditional on Z, {yin , Fi,n , 1 ≤ i ≤ n, n ≥ 2} is a martingale difference array, and we can apply the martingale central limit 2 (Z) = E[u 2 |Z], and γ = γ (Z) = theorem. As before, let σi2 = E[εi2 |Z], ωi2 = ωin i in i E[u i εi |Z], where to simplify notation we suppress the dependence of σi2 on Z and of ωi2 and γi on Z and n. Now, note that E[win ȳ jn |Z] = 0 for all i and j and that JIVE WITH HETEROSKEDASTICITY E ( ȳin )2 |Z = ∑ ∑E j<i k<i = ∑ Pij2 79 (u j Pij εi + u i Pij ε j )(u k Pik εi + u i Pik εk )|Z /K ω2j σi2 + ωi2 σ j2 + 2γi γ j K. j<i Thus, sn2 (Z) = E ∑i=2 yin n 2 n 2 |Z + E ȳ 2 |Z |Z = ∑ E win in i=2 D b − E w 2 |Z + = b1n n 1n 1n ∑ Pij2 i= j ω2j σi2 + ωi2 σ j2 + 2γi γ j K D b + b ¯ = b1n n 1n 2n n b2n + oa.s. (1) −1/2 ¯ n c2n −1/2 + oa.s. (1) c1n Dn c1n + c2n = n n −1/2 = n −1/2 n n + oa.s. (1) = 1 + oa.s. (1) → 1 a.s., n E W W |Z and where Dn = Dn (Z) = ∑i=1 in in ¯ n (Z) = ¯n = ∑ Pij2 i= j E[Ui Ui |Z]E[ε 2j |Z] + E[Ui εi |Z]E[ε j U j |Z] K. 4 |Z ≤ C n Thus, sn2 (Z) is bounded and bounded away from zero a.s. Also, ∑i=2 E yin ∑i=2 n n ε = 4 4 4 E Win |Z +C ∑i=2 E ȳin |Z . By condition (iv), ∑i=2 E Win |Z → 0. Let ȳin √ √ u = ∑ j<i u i Pij ε j / K . By Pij < 1 and ∑ j Pij2 = Pii , we have ∑ j<i u j Pij εi / K and ȳin that a.s. n ∑E i=2 ε ȳin 4 C n |Z ≤ 2 ∑ Pij Pik Pi Pim E εi4 |Z E u j u k u u m |Z ∑ K i=2 j,k,,m<i C n 4 2 2 ≤ 2 ∑ ∑ Pij + ∑ Pij Pik ≤ C K /K 2 → 0. K i=2 j<i j,k<i n E Similarly, ∑i=2 n ∑E i=2 u ȳin 4 |Z → 0 a.s., so that n 4 4 4 |Z ≤ C ȳin ∑ E ȳinε |Z + E ȳinu |Z → 0 i=2 n E y 4 |Z → 0 a.s. Then by T we have ∑i=2 in Conditional on Z, to apply the martingale central limit theorem, it suffices to show that for any > 0 n 2 |X , . . . , X 2 (Z) ≥ | Z → 0. , Z − s P ∑i=2 E yin 1 i−1 n (B.9) 80 JOHN C. CHAO ET AL. Now note that E win ȳin |Z = 0 a.s. and thus we can write n ∑E i=2 n 2 |X , . . . , X 2 2 2 yin 1 i−1 , Z − sn (Z) = ∑ E[win |X1 , . . . , Xi−1 , Z] − E[win |Z] i=2 n 2 |X , . . . , X 2 + ∑ E win ȳin |X1 , . . . , Xi−1 , Z + ∑ E[ ȳin 1 i−1 , Z] − E[ ȳin |Z] . n i=2 i=2 (B.10) We will show that each term on the right-hand side of (B.10) converges to zero a.s. To pro2 |X , . . . , ceed, note first that by independence of W1n , . . . , Wnn conditional on Z, E[win 1 2 Xi−1 , Z] = E[win |Z] a.s. Next, note that E win ȳin |X1 , . . . , Xi−1 , Z = E[win u i |Z] √ √ ∑ j<i Pij ε j / K + E[win εi |Z] ∑ j<i Pij u j / K . Let δi = δi (Z) = E[win u i |Z] and con√ sider the first term, δi ∑ j<i Pij ε j / K . Let P̄ be the upper triangular matrix with n P̄ij = Pij for j > i and P̄ij = 0, j ≤ i, and let δ = (δ1 , . . . , δn ). Then, ∑i=2 ∑ j<i δi Pij ε j / √ √ 2 n n 2 ≤ ∑i=1 E[win |Z]E[u i2 |Z] ≤ C K = δ P̄ ε/ K . By CS, δ δ = ∑i=1 E win u i |Z √ √ a.s. By Lemma B3, P̄ P̄ ≤ C K a.s., which in turn implies that λmax P̄ P̄ ≤ C K √ a.s. It then follows given E u 2j |Z ≤ C a.s. that E[(δ P̄ ε/ K )2 |Z] ≤ Cδ P̄ P̄δ/K ≤ √ √ C δ 2 / K ≤ C/ K → 0 a.s., so that by M we have for any > 0, P δ (Z) P̄ ε/ √ √ n E w ε |Z K ≥ |Z → 0 a.s. Similarly, we have ∑i=2 ∑ j<i Pij u j / K → 0 a.s. in i n Therefore, it follows by T that, for any > 0, P ∑i=2 E win ȳin |X1 , . . . , Xi−1 , Z ≥ |Z → 0 a.s. 
To finish showing that equation (B.9) is satisfied, it only remains to show that, for any > 0, n 2 |X , . . . , X 2 P ∑i=2 E ȳin (B.11) 1 i−1 , Z − E[ ȳin |]Z ≥ |Z → 0 a.s. Now, write n ∑ E ȳin2 |X1 , . . . , Xi−1 , Z − E[ ȳin2 |Z] i=2 = ∑ ωi2 Pij2 ε 2j − σ j2 K +2 j<i ∑ + ∑ σi2 Pij2 u 2j − ω2j K +2 j<i +2 ∑ j<i γi Pij2 u j ε j − γ j ωi2 Pij Pik ε j εk /K j<k<i ∑ σi2 Pij Pik u j u k /K j<k<i K +2 ∑ γi Pij Pik (u j εk + u k ε j )/K . (B.12) j<k<i By applying parts (i)–(iii) of Lemma B4 with φi = γi , ωi2 , and σi2 , respectively, we 2 2 obtain, a.s., E ∑ j<i γi Pij2 u j ε j − γ j /K |Z → 0, E ∑ j<i ωi2 Pij2 ε 2j − σ 2j /K | Z 2 → 0, and E ∑ j<i σi2 P 2ij u 2j − ω2j /K | Z → 0. Moreover, applying part (iv) of 2 Lemma B4 with φi = γi , we obtain E ∑ j<k<i γi Pi j P ik u j εk + u k εi /K | Z → 0 a.s. PZ . Similarly, conditional on Z, all of the remaining terms in equation (B.12) converge in mean square to zero a.s. by parts (v) and (vi) of Lemma B4. JIVE WITH HETEROSKEDASTICITY 81 The preceding argument shows that as n → ∞, P (Yn ≤ y|Z) → (y) a.s. PZ , for every real number y, where (y) denotes the cumulative distributionfunction of a standard normal distribution. Moreover, it is clear that, for some > 0, sup E |P (Yn ≤ y|Z)|1+ < ∞ n (take, e.g., = 1 ). Hence, by a version of the dominated convergence theorem, as given by Theorem 25.12 of Billingsley (1986), we deduce that P (Yn ≤ y) = E [P (Yn ≤ y| Z)] → n E [ (y)] = (y) , which gives the desired conclusion. Proof of Lemma A4. Let w̄i = E[Wi |Z], W̃i = Wi − w̄i , ȳi = E[Yi |Z], Ỹi = Yi − ȳi , η̄i = E[ηi |Z], η̃i = ηi − η̄i , μ̄2W = max w̄i2 ≤ C/n, i≤n μ̄2Y = max ȳi2 ≤ C/n, i≤n 2 = max Var(W |Z) ≤ C/r , σ̄W n i i≤n σ̄η2 = max Var(ηi |Z) ≤ C. i≤n μ̄2η = max η̄i2 ≤ C, i≤n 2 σ̄Y = max Var(Yi |Z) ≤ C/rn , i ≤n Also, let y̆i = ∑ j Pij ȳ j , w̆i = ∑ j Pij w̄ j , be predicted values from projecting ȳ and w̄ on P and note that ∑ y̆i2 ≤ ∑ ȳi2 ≤ C, ∑ w̆i2 ≤ ∑ w̄i2 ≤ C. i i i i By adding and subtracting terms similar to the beginning of the proof of Theorem 4, An = ∑ ∑ i= j k ∈{i, / j} w̄i Pik η̄k Pkj ȳ j = ∑ η̄i w̆i y̆i − Pii w̄i y̆i − Pii w̆i ȳi + 2Pii2 w̄i ȳi n − ∑ w̄i ȳi Pij2 η̄ j . i i, j By T, CS, and η̄k ≤ C, w̆ η̄ y̆ ∑ k k k ≤ C ∑ w̆k2 ∑ y̆k2 ≤ C, k k k w̄ P η̄ y̆ ∑ i ii i i ≤ ∑ w̄i2 Pii2 η̄i2 ∑ y̆i2 ≤ C, i i i and it follows similarly that ∑i w̆i Pii η̄i ȳi is bounded. By Lemma B1, ∑i,k w̄i ȳi Pik2 η̄k ≤ Cn −1 ∑i,k Pik2 ≤ C K /n ≤ C. Also, ∑i w̄i ȳi Pii2 η̄i ≤ Cn/n = C. Thus, |An | ≤ C holds by T. For the remainder of this proof we let E[•] denote the conditional expectation given Z. Note that Wi Pik ηk Pkj Y j = W̃i Pik ηk Pkj Y j + w̄i Pik ηk Pkj Y j = W̃i Pik η̃k Pkj Y j + W̃i Pik η̄k Pkj Y j + w̄i Pik η̃k Pkj Y j + w̄i Pik η̄k Pkj Y j = W̃i Pik η̃k Pkj Ỹ j + W̃i Pik η̃k Pkj ȳ j + W̃i Pik η̄k Pkj Ỹ j + W̃i Pik η̄k Pkj ȳ j + w̄i Pik η̃k Pkj Ỹ j + w̄i Pik η̃k Pkj ȳ j + w̄i Pik η̄k Pkj Ỹ j + w̄i Pik η̄k Pkj ȳ j . 82 JOHN C. CHAO ET AL. Summing and subtracting the last term gives ∑ i= j=k Wi Pik ηk Pkj Y j − An = 7 ∑ ψ̂r , r =1 where ψ̂1 = ψ̂4 = ∑ W̃i Pik η̃k Pkj Ỹ j , ψ̂2 = ∑ W̃i Pik η̄k Pkj ȳ j , ψ̂5 = i= j=k i= j=k ∑ W̃i Pik η̃k Pkj ȳ j , ψ̂3 = ∑ w̄i Pik η̃k Pkj Ỹ j , ψ̂6 = i= j=k i= j=k ∑ W̃i Pik η̄k Pkj Ỹ j , ∑ w̄i Pik η̃k Pkj ȳ j , i= j=k i= j=k p and ψ̂7 = ∑i= j=k w̄i Pik η̄k Pkj Ỹ j . By T, the second conclusion will follow from ψ̂r → 0 for r = 1, . . . , 7. Also, note that ψ̂7 is the same as ψ̂4 and ψ̂5 , which is the same as ψ̂2 with the random variables W and Y interchanged. 
Because the conditions on W and Y are p symmetric, it suffices to show that ψ̂r → 0 for r ∈ {1, 2, 3, 4, 6}. Consider now ψ̂1 . Note that for i = j = k and r = s = t, we have E[W̃i Pik η̃k Pkj Ỹ j W̃r Pr s η̃s Pst Ỹt ] = 0, except for when each of the three indexes i, j, k is equal to one of the three indexes r, s, t. There are six ways this can happen, leading to six terms in E[ψ̂12 ] = ∑ ∑ i= j=k r =s=t E[W̃i Pik η̃k Pkj Ỹ j W̃r Pr s η̃s Pst Ỹt ] = 6 ∑ τ̂q . q=1 2 σ̄ 2 σ̄ 2 K ≤ Cr −2 K → 0. By Lemma B1, we have Note that by hypothesis, σ̄W η Y n τ̂1 = ∑ i= j=k E[(W̃i Pik η̃k Pkj Ỹ j )2 ] = ∑ i= j=k 2 σ̄ 2 σ̄ 2 K → 0. E[W̃i2 ]Pik2 E[η̃k2 ]Pkj2 E[Ỹ j2 ] ≤ σ̄W η Y Similarly, by CS, τ̂3 = ∑ E[(W̃i Pik η̃k Pkj Ỹ j )(W̃ j Pjk η̃k Pki Ỹi )] i= j=k 2 2 2 = ∑ E[W̃i Ỹi ]E[W̃ j Ỹ j ]E[η̃k ]Pik Pkj i= j=k 2 σ̄ 2 σ̄ 2 K → 0. ≤ σW η Y Next, by Lemma B1 and CS τ̂2 = ∑ E[(W̃i Pik η̃k Pkj Ỹ j )(W̃i Pij η̃ j Pjk Ỹk )] i= j=k 2 2 = ∑ E[W̃i ]E[η̃k Ỹk ]E[η̃ j Ỹ j ]Pik Pij Pjk i= j=k 2 σ̄ 2 σ̄ 2 K → 0. ≤ σ̄W η Y JIVE WITH HETEROSKEDASTICITY 83 Similarly, τ̂4 = ∑ E[(W̃i Pik η̃k Pkj Ỹ j )(W̃ j Pji η̃i Pik Ỹk )] i= j=k 2 = ∑ E[W̃i η̃i ]E[W̃ j Ỹ j ]E[η̃k Ỹk ]Pik Pkj Pji i= j=k 2 σ̄ 2 σ̄ 2 K → 0, ≤ σ̄W η Y τ̂5 = ∑ E[(W̃i Pik η̃k Pkj Ỹ j )(W̃k Pki η̃i Pij Ỹ j )] i= j=k 2 2 = ∑ E[W̃i η̃i ]E[Ỹ j ]E[W̃k η̃k ]Pik Pkj Pji i= j=k 2 σ̄ 2 σ̄ 2 K → 0, ≤ σ̄W η Y τ̂6 = ∑ E[(W̃i Pik η̃k Pkj Ỹ j )(W̃k Pkj η̃ j Pji Ỹi )] i= j=k 2 = ∑ E[W̃i Ỹi ]E[η̃ j Ỹ j ]E[W̃k η̃k ]Pjk Pij Pik i= j=k 2 σ̄ 2 σ̄ 2 K → 0. ≤ σ̄W η Y p T then gives E[ψ̂12 ] → 0, so ψ̂12 → 0 holds by M. Consider now ψ̂2 . Note that for i = j = k and r = s = t, we have E[W̃i Pik η̃k Pkj ȳ j W̃r Pr s η̃s Pst ȳt ] = 0, except when i = r and j = s or i = s and j = r. Then by (A + B + C)2 ≤ 3(A2 + B 2 + C 2 ) and for fixed k, ∑i=k Pik2 ≤ Pkk , ∑i=k Pik4 ≤ Pkk , it follows that ∑ i=k Pik2 2 ∑ j ∈{i,k} / Pkj ȳ j 2 ȳ 2 + P 2 ȳ 2 ≤ 3 ∑ Pik2 y̆k2 + Pki i kk k i=k 2 + 2 ȳ 2 2 +2 2 ≤ 9n μ̄2 ≤ C. P y̆ ȳ y̆ ≤ 3 ∑ kk k ∑ k ∑ k k Y ≤3 k k k 84 JOHN C. CHAO ET AL. It follows by |AB| ≤ A2 + B 2 2, CS, and Pik = Pki that E[ψ̂22 ] = ∑ i=k 2 ∑ E[W̃i2 ]Pik2 E[η̃k2 ] j ∈{i,k} / Pkj ȳ j +∑ i=k j ∈{i,k} / 2 σ̄ 2 ≤ 2σ̄W η ∑ i=k ∑ Pkj ȳ j j ∈{i,k} / Pij ȳ j 2 ∑ Pik2 ∑ E[W̃i η̃i ]Pik2 E[W̃k η̃k ] j ∈{i,k} / ≤ C/rn → 0. Pkj ȳ j p Then ψ̂2 → 0 holds by M. Consider ψ̂3 . Note that for i = j = k and r = s = t, we have E[W̃i Pik η̄k Pkj Ỹ j W̃r Pr s η̄s Pst Ỹt ] = 0, except when i = r and j = t or i = t and j = r. Thus, 2 2 2 2 E[ψ̂3 ] = ∑ E[W̃i ]E[Ỹ j ] + E[W̃i Ỹi ]E[W̃ j Ỹ j ] ∑ Pik η̄k Pkj i= j 2 σ̄ 2 ≤ 2σ̄W Y ∑ ∑ i= j k ∈{i, / j} 2 k ∈{i, / j} Pik η̄k Pkj . Note that for i = j, ∑k ∈{i, / j} Pik Pkj η̄k = ∑k Pik Pkj η̄k − Pij Pii η̄i − Pij Pj j η̄ j . Note also that 2 ∑ ∑ i Pik2 η̄k k i,k, 2 ∑ ∑ Pik η̄k Pkj i, j ∑ = ∑ = k i, j,k, =∑ k, ∑ ∑ Pik η̄k Pkj k i, j 2 = μ̄2 2 2 Pik2 Pi η ∑ Pii ≤ μ̄η K , ∑ Pik η̄k Pkj k ∑ 2 = μ̄2 K . Pk η 2 k, −∑ i i k, =∑ ∑ i,k, Pik η̄k Pjk Pi η̄ Pj = ∑ η̄k η̄ 2 ≤ μ̄2 η̄k η̄ Pk η It therefore follows that 2 i= j 2 η̄ η̄ ≤ μ̄2 Pik2 Pi k η ∑ Pik Pi i ∑ Pjk Pj j 2 ∑ Pik η̄k Pki k ≤ 2μ̄2η K . Also, by Lemma B1, ∑i= j Pij2 Pj2j η̄2j ≤ μ̄2η ∑i= j Pij2 ≤ μ̄2η K , so that ∑ i= j ⎧ ⎨ 2 ∑ k ∈{i, / j} Pik η̄k Pkj ≤3 2 ∑ ⎩ ∑ Pik η̄k Pkj i= j k + Pij2 Pii2 η̄i2 + Pij2 Pj2j η̄2j ⎫ ⎬ ⎭ ≤ 6μ̄2η K . 2 σ̄ 2 μ̄2 K ≤ Cr −2 From the previous expression for E[ψ̂32 ], we then have E[ψ̂32 ] ≤ C σ̄W n Y η p K → 0. Then ψ̂3 → 0 by M. JIVE WITH HETEROSKEDASTICITY 85 Next, consider ψ̂4 . 
Note that for i = j = k and r = s = t, we have E[W̃i Pik η̄k Pkj ȳ j W̃r Pr s η̄s Pst ȳt ] = 0, except when i = r. Thus, 2 2 E[ψ̂42 ] = ∑ E[W̃i2 ] ∑ ∑ j=i k ∈{i, / j} i Pik η̄k Pkj ȳ j 2 ≤ σ̄W ∑ i ∑ ∑ j=i k ∈{i, / j} Pik η̄k Pkj ȳ j . Note that for i = j, ∑ k ∈{i, / j} Pik η̄k Pkj ȳ j = ∑ Pik η̄k Pkj ȳ j − Pii η̄i Pij ȳ j − Pij η̄ j Pj j ȳ j . k Therefore, for fixed i, ∑ ∑ j=i k ∈{i, / j} Pik η̄k Pkj ȳ j = ∑ ∑ Pik η̄k Pkj ȳj − Pii η̄i Pij ȳj − Pij η̄ j Pj j ȳj j=i k = ∑ Pik η̄k y̆k − ∑ Pik2 η̄k ȳi − Pii η̄i y̆i − ∑ Pij η̄ j Pj j ȳ j + 2Pii2 η̄i ȳi . k k j Note that because P is idempotent, we have ∑ j ∑k Pjk η̄ j y̆ j η̄k y̆k ≤ ∑ j η̄2j y̆ j2 ≤ μ̄2η ∑ j y̆ j2 ≤ μ̄2η ∑ j ȳ j2 ≤ n μ̄2η μ̄2Y ≤ C. Then it follows that ∑{∑ Pik η̄k y̆k }2 = ∑ ∑ ∑ Pij η̄ j y̆j Pik η̄k y̆k = ∑ ∑ η̄ j y̆j η̄k y̆k ∑ Pij Pik i k i j k j k i = ∑ ∑ Pjk η̄ j y̆ j η̄k y̆k ≤ C. j k Also, using similar reasoning, ∑(Pii η̄i y̆i )2 ≤ ∑ η̄i2 y̆i2 ≤ n μ̄2η μ̄2Y ≤ C, i i 2 ∑ ∑ Pij η̄ j Pj j ȳj i ≤ ∑ η̄i2 Pii2 ȳi2 ≤ ∑ η̄i2 ȳi2 ≤ C, j ∑ i i i 2 ȳi ∑ Pik2 η̄k ≤ μ̄2Y k ∑ i,k, 2 η̄ η̄ ≤ μ̄2 μ̄2 Pik2 Pi k Y η ∑ i,k, 2 ≤ K μ̄2 μ̄2 ≤ C, Pik2 Pi η Y ∑ Pii4 η̄i2 ȳi2 ≤ n μ̄2η μ̄2Y ≤ C. i 2 C ≤ C/r Then using the fact that (∑r5=1 Ar )2 ≤ 5 ∑r5=1 Ar2 , it follows that E[ψ̂42 ] ≤ σ̄W n p → 0, so ψ̂4 → 0 by M. w̄i Pik Pkj ȳ j = w̄i Pik y̆k − w̄i Pik2 ȳi − Next, consider ψ̂6 . Note that for i = k, ∑ j ∈{i,k} / w̄i Pik Pkk ȳk . Then for fixed k, ∑ ∑ w̄i Pik Pkj ȳj = ∑ w̄i Pik y̆k −w̄i Pik2 ȳi −w̄i Pik Pkk ȳk − w̄k Pkk y̆k + 2w̄k Pkk2 ȳk i=k j ∈{i,k} / i 2 ȳ . = w̆k y̆k − ∑ w̄i Pik2 ȳi − w̆i Pkk ȳk − w̄k Pkk y̆k + 2w̄k Pkk k i 86 JOHN C. CHAO ET AL. Then using the fact that (∑r5=1 Ar )2 ≤ 5 ∑r5=1 Ar2 we have E[ψ̂62 ] = ∑ E[η̃k2 ]( ∑ ∑ i=k j ∈{i,k} / k w̄i Pik Pkj ȳ j )2 ≤ 5σ̄η2 ∑ w̆k2 y̆k2 + k ∑ 2 w̄ ȳ w̄ ȳ + w̆ 2 P 2 ȳ 2 + w̄ 2 P 2 y̆ 2 + 4w̄ 2 P 4 ȳ 2 Pkj2 Pki i i j j k kk k k kk k k kk k i, j ≤ 5σ̄η2 ∑ w̆k2 y̆k2 + μ̄2W μ̄2Y k ∑ i, j,k ≤ 5σ̄η2 2 + μ̄2 Pkj2 Pki Y ∑ w̆k2 + μ̄2W k ∑ y̆k2 + n4μ̄2W μ̄2Y k ∑ w̆k2 y̆k2 + 7n μ̄2W μ̄2Y k ≤ C ∑ w̆k2 y̆k2 + Cn/n 2 ≤ C ∑ w̆k2 y̆k2 + o(1). k k √ = maxi |ai − Z i πn | → 0, let αn = πn / n, and note that Now let πn be such that n √ maxi≤n w̄i − Z i αn = n / n. Let w̄ = (w̄1 , . . . , w̄n ) . Then |w̄i − w̆i | = w̄i − Z i (Z Z )−1 Z w̄ = w̄i − Z i αn − Z i (Z Z )−1 Z (w̄ − Z αn ) 1/2 2 1/2 √ 2 ≤ n / n + ∑ Pij ∑ w̄ j − Z j αn j j 1/2 √ 1/2 ≤ n + Pii n max w̄i − Z i αn = n + Pii n ≤ Cn . i≤n Then by T, maxi≤n |w̆i | ≤ maxi≤n |w̄i | + n → 0, so that 2 2 y̆ 2 ≤ max |w̆ | w̆ i ∑ k k ∑ y̆k2 = o(1) ∑ ȳk2 → 0. k i≤n k k p Then we have E[ψ̂62 ] → 0, so by M, ψ̂6 → 0. The conclusion then follows by T. n
© Copyright 2024 Paperzz