ASYMPTOTIC SIZE OF KLEIBERGEN'S LM AND
CONDITIONAL LR TESTS FOR MOMENT CONDITION MODELS
By
Donald W. K. Andrews and Patrik Guggenberger
December 2014
COWLES FOUNDATION DISCUSSION PAPER NO. 1977
COWLES FOUNDATION FOR RESEARCH IN ECONOMICS
YALE UNIVERSITY
Box 208281
New Haven, Connecticut 06520-8281
http://cowles.econ.yale.edu/
Asymptotic Size of Kleibergen's LM and Conditional LR Tests for Moment Condition Models

Donald W. K. Andrews
Cowles Foundation, Yale University

Patrik Guggenberger
Department of Economics, Pennsylvania State University

First Version: March 25, 2011
Revised: December 31, 2014
Andrews and Guggenberger gratefully acknowledge the research support of the National Science Foundation
via grant numbers SES-1058376 and SES-1355504, and SES-1021101 and SES-1326827, respectively. The authors
thank Isaiah Andrews, Xu Cheng, Anna Mikusheva, and Jim Stock and the participants of seminars at the following universities for helpful comments: Boston, Boston College, Brown, Chicago, Columbia, Freiburg, Hanover,
Harvard/MIT, Hebrew Jerusalem, Konstanz, Maryland, Michigan, New York, Northwestern, Ohio State, Princeton,
Queen’s, Strasbourg, and Wisconsin.
Abstract
An influential paper by Kleibergen (2005) introduces Lagrange multiplier (LM) and conditional likelihood ratio-like (CLR) tests for nonlinear moment condition models. These procedures aim to have good size performance even when the parameters are unidentified or poorly identified. However, the asymptotic size and similarity (in a uniform sense) of these procedures have not been determined in the literature. This paper does so.

This paper shows that the LM test has correct asymptotic size and is asymptotically similar for a suitably chosen parameter space of null distributions. It shows that the CLR tests also have these properties when the dimension p of the unknown parameter θ equals 1. When p ≥ 2, however, the asymptotic size properties are found to depend on how the conditioning statistic, upon which the CLR tests depend, is weighted. Two weighting methods have been suggested in the literature. The paper shows that the CLR tests are guaranteed to have correct asymptotic size when p ≥ 2 with one weighting method, combined with the Robin and Smith (2000) rank statistic. The paper also determines a formula for the asymptotic size of the CLR test with the other weighting method. However, the results of the paper do not guarantee correct asymptotic size when p ≥ 2 with the other weighting method, because two key sample quantities are not necessarily asymptotically independent under some identification scenarios.

Analogous results for confidence sets are provided. Even for the special case of a linear instrumental variable regression model with two or more right-hand side endogenous variables, the results of the paper are new to the literature.

Keywords: asymptotics, conditional likelihood ratio test, confidence set, identification, inference, Lagrange multiplier test, moment conditions, robust, test, weak identification, weak instruments.

JEL Classification Numbers: C10, C12.
1 Introduction
We consider the moment condition model

E_F g(W_i, θ) = 0^k,   (1.1)

where 0^k = (0,...,0)′ ∈ R^k, the equality holds when θ ∈ Θ ⊂ R^p is the true value, {W_i ∈ R^m : i = 1,...,n} are stationary and strong mixing observations with distribution F, g is a known (possibly nonlinear) function from R^{m+p} to R^k with k ≥ p, and E_F(·) denotes expectation under F. This paper is concerned with tests of the null hypothesis

H_0: θ = θ_0  versus  H_1: θ ≠ θ_0.   (1.2)
We consider the Lagrange Multiplier (LM) test of Kleibergen (2005) and adaptations of Moreira's (2003) conditional likelihood ratio (CLR) test to the nonlinear moment condition model (1.1), as in Kleibergen (2005, 2007), Smith (2007), Newey and Windmeijer (2009), and Guggenberger, Ramalho, and Smith (2012). The LM and CLR tests are designed to have better overall power than the Anderson and Rubin (1949)-type S-tests of Stock and Wright (2000) when k > p.¹ These tests aim to have good size even when the parameters are unidentified or weakly identified.
Weak identification and weak instruments (IV's) can occur in a wide variety of empirical applications in economics with linear and nonlinear models. Examples include: new Keynesian Phillips curve models, dynamic stochastic general equilibrium (DSGE) models, consumption capital asset pricing models (CCAPM), interest rate dynamics models, Berry, Levinsohn, and Pakes (1995) (BLP) models of demand for differentiated products, returns-to-schooling equations, nonlinear regression, autoregressive-moving average models, GARCH models, smooth transition autoregressive (STAR) models, parametric selection models estimated by Heckman's two-step method or maximum likelihood, mixture models, regime switching models, and all models where hypothesis testing problems arise in which a nuisance parameter appears under the alternative hypothesis, but not under the null. For references, see (for example) Andrews and Guggenberger (2014a) (hereafter AG2).
The contribution of the paper is to determine the asymptotic sizes of the tests listed above, and the confidence sets (CS's) that correspond to them, for suitably defined parameter spaces of distributions, and to see whether their asymptotic sizes necessarily equal their nominal sizes. We also determine whether these tests and CS's are asymptotically similar in a uniform sense.

¹ For the special case of the linear IV model, power comparisons (some theoretical and some simulation based) are given in Kleibergen (2002), Moreira (2003), Andrews, Moreira, and Stock (2006, 2008), Chernozhukov, Hansen, and Jansson (2009), Hillier (2009), Mikusheva (2010), and Ploberger (2012).

The strength of identification of θ depends on the magnitude of the singular values of the expectation of the Jacobian

G(W_i, θ) := (∂/∂θ′) g(W_i, θ) ∈ R^{k×p}   (1.3)

of g(W_i, θ). The parameter space we consider does not impose any restrictions on the magnitude of these singular values. The results hold for arbitrary fixed k and p with k ≥ p.
We show that Kleibergen's LM test (and CS) has correct asymptotic size and is uniformly asymptotically similar for a parameter space of null distributions that is fairly general. But, the parameter space does require an eigenvalue condition on the asymptotic variance of a transformation of the conditioning statistic (onto which the normalized sample moments are projected). This condition guarantees that the asymptotic version of the k × p conditioning statistic (after suitable normalization) is full rank p a.s. This condition is shown not to be redundant in Section 12 in the Appendix to this paper. The parameter space also requires that the variance matrix of the moment functions is nonsingular. This assumption is needed because the inverse of the sample variance matrix is employed to make the conditioning statistic asymptotically independent of the sample moments. This condition can be restrictive because in some models lack of identification is accompanied by singularity of the variance matrix of the moments. For example, this occurs in models in which, for some null hypothesis, a nuisance parameter appears only under the alternative hypothesis.
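To make the projection structure concrete, here is a minimal numerical sketch of an LM/K-type statistic. The quadratic form below (the normalized sample moments projected onto the columns of the conditioning statistic in the inverse-variance metric) is one standard way such statistics are written; the exact definitions of the inputs are as in Kleibergen (2005), and the function name, argument names, and this specific formula are our own illustrative assumptions, not the paper's notation.

```python
import numpy as np

def lm_stat(g_bar, D_hat, Omega_hat, n):
    """LM/K-type statistic (illustrative form):
        LM = n * g' W D (D' W D)^{-1} D' W g,  with W = Omega^{-1},
    i.e., a projection of the normalized sample moments g (length k)
    onto the k x p conditioning statistic D in the Omega^{-1} metric.
    Under the null and suitable regularity conditions, statistics of
    this form are asymptotically chi-squared with p degrees of freedom."""
    W = np.linalg.inv(Omega_hat)
    A = D_hat.T @ W @ g_bar              # D' W g  (p-vector)
    B = D_hat.T @ W @ D_hat              # D' W D  (p x p)
    return float(n * A @ np.linalg.solve(B, A))
```

When D̂ is square and nonsingular (k = p), the projection covers all of R^k and the statistic reduces to an AR-type statistic n·ĝ′Ω̂^{−1}ĝ, which is one way to see why the LM test gains power over S-tests only when k > p.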
The nonlinear CLR tests (and CS's) that we consider depend on a rank statistic, which measures the rank of the expectation of G(W_i, θ). Following Kleibergen (2005), the rank statistics that have been considered in the literature depend on a weighted orthogonalized version of the sample Jacobian, n^{−1} Σ_{i=1}^n G(W_i, θ), where the orthogonalization is designed to create a conditioning statistic that is asymptotically independent of the sample moments. Two weightings have been considered. The first, proposed by Kleibergen (2005, 2007) and Smith (2007), premultiplies the vectorized orthogonalized sample Jacobian by the negative square root of a consistent estimator of its kp × kp variance matrix. We call this the Jacobian-variance weighting. The second, proposed by Newey and Windmeijer (2009) and Guggenberger, Ramalho, and Smith (2012), multiplies the k × p orthogonalized sample Jacobian by the negative square root of a consistent estimator of the k × k variance matrix of the sample moments. We call this the moment-variance weighting.

Given the weighting of the orthogonalized sample Jacobian, several functional forms for the rank statistic have been considered in the literature, including the rank statistics of Cragg and Donald (1996, 1997), Robin and Smith (2000), and Kleibergen and Paap (2006). We provide results for a general form of the rank statistic and verify the conditions imposed on the general form for the Robin and Smith (2000) rank statistic. The latter is a popular choice because it is easy to compute. Note that when p = 1, these rank statistics all reduce to the squared Euclidean norm of the weighted orthogonalized sample Jacobian vector.
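For concreteness, a sketch of a Robin and Smith (2000)-type rank statistic with moment-variance weighting follows. We take it to be the smallest eigenvalue of n·D̂′Ω̂^{−1}D̂, an assumed illustrative form; the paper works with a general class of rank statistics, and the exact normalization used there may differ.

```python
import numpy as np

def robin_smith_rank_stat(D_hat, Omega_hat, n):
    """Smallest eigenvalue of n * D' Omega^{-1} D, where D_hat is the
    k x p orthogonalized sample Jacobian and Omega_hat is the k x k
    moment-variance estimator (moment-variance weighting)."""
    L = np.linalg.cholesky(Omega_hat)        # Omega = L L'
    WD = np.linalg.solve(L, D_hat)           # L^{-1} D, so WD'WD = D' Omega^{-1} D
    M = n * WD.T @ WD                        # p x p weighted cross-product
    return float(np.linalg.eigvalsh(M)[0])   # eigvalsh: eigenvalues in ascending order
```

For p = 1 the p × p matrix above is scalar, so the statistic is just n times the squared norm of the weighted Jacobian vector, matching the reduction noted in the text.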
For the case where p = 1, we show that the CLR tests (and CS's) based on either weighting have correct asymptotic size and are asymptotically similar in a uniform sense (for parameter spaces that are the same as those considered for the LM test and CS, or slightly smaller, depending on the method of weighting).
For the case where p ≥ 2, we show that the CLR test (and CS) based on the Robin and Smith (2000) rank statistic with the moment-variance weighting has correct asymptotic size and is uniformly asymptotically similar for the same parameter spaces of distributions as considered for the LM test (and CS). On the other hand, we cannot show that the CLR test (and CS) based on the Robin and Smith (2000) rank statistic with the Jacobian-variance weighting necessarily has correct asymptotic size. The reason is that the weighted orthogonalized sample Jacobian is not necessarily asymptotically independent of the sample moments under some sequences of null distributions. This occurs because the random variation of the kp × kp sample variance estimator turns out to affect the asymptotic distribution of the weighted orthogonalized sample Jacobian in some cases. Roughly speaking, this occurs when some parameters are weakly identified and some are strongly identified, or when some transformations of the parameters are weakly identified and some transformations are strongly identified. (Obviously, when p = 1 these scenarios cannot occur.) This phenomenon has not been demonstrated previously in the literature.
Simulations in a linear IV regression model with two right-hand side endogenous variables corroborate the existence of the asymptotic correlations discussed in the previous paragraph. However, for the particular model and error distributions considered, these correlations have a small effect on the asymptotic null rejection probabilities of the CLR test with Jacobian-variance weighting. These probabilities are very close to the nominal size of the test.
The results of the paper show that weak identification occurs (i.e., the test statistics have nonstandard asymptotic distributions due to identification deficiency) when lim n^{1/2} s_{pF_n} < ∞, where {s_{jF} : j = 1,...,p} are the singular values of the expected Jacobian E_F G(W_i, θ_0), ordered to be nonincreasing in j, F denotes a null distribution, {F_n : n ≥ 1} denotes a sequence of null distributions for which the previous limit exists, and the limit is taken as n → ∞. Strong or semi-strong identification occurs when lim n^{1/2} s_{pF_n} = ∞. Strong identification occurs when lim s_{pF_n} > 0, and semi-strong identification occurs when lim n^{1/2} s_{pF_n} = ∞ and lim s_{pF_n} = 0. When p = 1, s_{1F} = ||E_F G(W_i, θ_0)|| and weak identification occurs when lim n^{1/2} ||E_F G(W_i, θ_0)|| < ∞.

However, when p ≥ 2, weak identification can take many different forms. Weak identification in the standard sense, i.e., when all parameters are weakly identified, e.g., as in Staiger and Stock (1997), occurs when lim n^{1/2} s_{1F_n} < ∞. This is a relatively easy case to analyze asymptotically. Weak identification also occurs when lim n^{1/2} s_{pF_n} < ∞ but lim n^{1/2} s_{1F_n} = ∞, i.e., different singular values behave differently asymptotically. We refer to this as weak identification in a nonstandard sense. It includes the (some weak/some strong) identification scenario considered in Stock and Wright (2000) based on their Assumption C. The nonstandard weak identification scenario is the scenario in which the weighted orthogonalized sample Jacobian may not be independent of the sample moments when the Jacobian-variance weighting is employed. This case is much more difficult to analyze asymptotically. A subset of this case, which we refer to as joint weak identification, is a case in which the previous conditions hold (i.e., lim n^{1/2} s_{pF_n} < ∞ and lim n^{1/2} s_{1F_n} = ∞) and lim n^{1/2} ||E_{F_n} G_j(W_i, θ_0)|| = ∞ for all j ≤ p, where G_j(W_i, θ_0) denotes the jth column of G(W_i, θ_0). Under joint weak identification, each column of the Jacobian behaves as though the corresponding parameter is strongly or semi-strongly identified, but jointly, weak identification occurs (because lim n^{1/2} s_{pF_n} < ∞). As discussed in Section 2 below, no results in the literature consider all of the cases of weak identification that may occur when p ≥ 2.²
For clarity, the results of the paper are stated and derived first for i.i.d. observations. Then, they are extended to cover time series observations that are stationary and strong mixing. This way of proceeding lets us provide somewhat weaker assumptions in the i.i.d. case than if the i.i.d. case is treated as a special case of the time series results.

All limits below are taken as n → ∞. The expression A := B denotes that A is defined to equal B.
The paper is organized as follows. Section 2 discusses the related literature and the contribution of this paper to the literature. Section 3 defines the moment condition model. Section 4 defines and provides asymptotic results for Kleibergen's (2005) LM test. Section 5 does likewise for Kleibergen's (2005) CLR test with Jacobian-variance weighting. Section 6 does likewise for Kleibergen's CLR test with moment-variance weighting, as in Newey and Windmeijer (2009) and Guggenberger, Ramalho, and Smith (2012). Section 7 provides results for the tests with time series observations. An Appendix provides some of the proofs of the results given in the paper. The remaining proofs and some additional results are given in the Supplemental Material to this paper; see Andrews and Guggenberger (2014b).
² The definitions of the identification categories given here, which are based on {s_{jF_n} : j ≤ p, n ≥ 1}, where s_{jF} is the jth largest singular value of E_F G(W_i, θ_0), are suitable when λ_min(Var_F(g(W_i, θ_0))) is bounded away from zero over the parameter space of distributions F. When the latter condition does not hold, but λ_min(Var_F(g(W_i, θ_0))) > 0 for all distributions F, then s_{jF} should be defined to be the jth largest singular value of the normalized expected Jacobian Var_F(g(W_i, θ_0))^{−1/2} E_F G(W_i, θ_0) in order to obtain the appropriate definitions of the identification categories.
2 Discussion of the Literature
To date in the literature it has only been shown that Kleibergen's LM and CLR tests control the limiting null rejection probability under certain strong instrument and certain weak instrument sequences. For example, concerning the validity of the LM and CLR tests, Kleibergen (2005, proofs of Theorems 1 and 3) deals only with sequences of matrices E_{F_n} G(W_i, θ) whose limits are a full column rank matrix or a matrix of zeros.³ Kleibergen (2005) does not consider the cases where

(i) the limit of E_{F_n} G(W_i, θ) exists and is nonzero, some of its columns are equal to zero, and the remaining columns are linearly independent, and
(ii) the limit of E_{F_n} G(W_i, θ) exists and is nonzero and some subset of its columns are nonzero but less than full column rank,   (2.1)

where {F_n : n ≥ 1} is a sequence of true null distributions that generates the data. Case (ii) is an example of "joint weak identification" in which several parameters individually satisfy conditions that indicate strong identification, but jointly exhibit weak identification. This paper is the first to investigate joint weak identification. Results for cases (i) and (ii) are needed to establish the asymptotic sizes of the LM and CLR tests.
Example. Consider as a simple example the linear IV regression model

y_{1i} = Y_{2i}′θ + u_i,  Y_{2i} = π′Z_i + V_{2i},   (2.2)

where y_{1i} ∈ R and Y_{2i} ∈ R^p are endogenous variables, Z_i ∈ R^k for k ≥ p is a vector of IV's, and π (= π_F) ∈ R^{k×p} is an unknown unrestricted parameter matrix.⁴ The data {W_i = (y_{1i}, Y_{2i}′, Z_i′)′ : i = 1,...,n} are i.i.d. and E_F((u_i, V_{2i}′)′ | Z_i) = 0^{p+1} a.s. Here m = 1 + p + k and

g(W_i, θ) = Z_i(y_{1i} − Y_{2i}′θ)  and  G(W_i, θ) = −Z_iY_{2i}′.   (2.3)
³ See the first equation of the proof of Kleibergen's (2005) Theorem 1, in which the rate of convergence of his D̂_T(θ_0, Y) to its limit is stated to be T^{−τ} for τ = 0 (which is a typo and should be 1/2) or 1, and J(θ_0) (which equals lim E_{F_n} G_i(θ_0) in our notation) is assumed to exist and have full column rank when τ = 1.

⁴ For simplicity, no exogenous variables are included in the structural equation. See Andrews, Cheng, and Guggenberger (2009) and Mikusheva (2010) for asymptotic size results for the CLR test in linear IV regression models with included exogenous variables, but with only one right-hand side endogenous variable. Due to the latter feature, cases (i) and (ii) in (2.1) and case (iv) in (2.5) below do not arise in the aforementioned papers.
By assumption, E_F g(W_i, θ) = E_F Z_iu_i = 0^k when θ is the true vector. In addition, we have

E_F G(W_i, θ) = −E_F Z_iZ_i′ π.   (2.4)

The latter does not depend on θ but does depend on the reduced-form coefficient matrix π, which determines the strength of the IV's. Stock and Wright (2000), Guggenberger and Smith (2005), and Guggenberger, Ramalho, and Smith (2012) consider weak/strong IV sequences π_n = (π_{1n}, π_{2n}) ∈ R^{k×(p_1+p_2)}, where π_{1n} = n^{−1/2}h_1 for a fixed h_1 and π_{2n} = π_2 is a fixed matrix (that does not depend on n) with full column rank p_2. Specialized to the linear IV setting, the goal of this paper is to establish that the LM and CLR tests of the hypotheses in (1.2) have asymptotic sizes equal to their nominal sizes for a parameter space that does not impose any restrictions on π.
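The moment function and Jacobian in (2.3)-(2.4) are easy to check numerically. The following simulation sketch (our own illustrative construction; the variable names and parameter values are not from the paper) draws i.i.d. data from (2.2) and verifies that the sample moments vanish at the true θ and that the average Jacobian approaches −E_F[Z_iZ_i′]π, which equals −π here since E[Z_iZ_i′] = I_k:

```python
import numpy as np

# Simulation check of (2.3)-(2.4) in the linear IV model (2.2).
rng = np.random.default_rng(0)
n, k, p = 50_000, 3, 2
theta = np.array([0.5, -1.0])                        # true structural parameter
Pi = np.array([[1.0, 0.2], [0.5, 0.1], [0.0, 0.3]])  # reduced-form coefficients pi (k x p)
Z = rng.standard_normal((n, k))                      # IV's with E[Z Z'] = I_k
V2 = rng.standard_normal((n, p))
u = rng.standard_normal(n)
Y2 = Z @ Pi + V2                                     # first-stage (reduced-form) equation
y1 = Y2 @ theta + u                                  # structural equation

g_bar = (Z * (y1 - Y2 @ theta)[:, None]).mean(axis=0)  # n^{-1} sum_i Z_i (y1_i - Y2_i' theta)
G_bar = -(Z.T @ Y2) / n                                 # n^{-1} sum_i G(W_i, theta) = -n^{-1} Z' Y2

print(np.abs(g_bar).max())        # near 0: E_F g(W_i, theta) = 0 at the true theta
print(np.abs(G_bar + Pi).max())   # near 0: E_F G(W_i, theta) = -E[Z Z'] pi = -pi
```

Replacing `Pi` with a drifting sequence such as `Pi / np.sqrt(n)` produces the weak IV sequences π_{1n} = n^{−1/2}h_1 discussed above.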
Case (ii) identification failure in (2.1) occurs in model (2.2) with p ≥ 2 for sequences π_n where a subset of the columns of π_n converge to nonzero vectors that are linearly dependent. For example, this occurs when p = 2, π_n ∈ R^{k×2}, and the columns of π_n are (1,...,1)′ and (1+o(1),...,1+o(1))′. Weak identification of this type has not been dealt with in the literature on LM and CLR tests in linear IV models. We do so in this paper (for both linear and nonlinear models).
We return now to the discussion of the general moment condition model. The missing cases in Kleibergen's (2005) proofs of Theorems 1 and 3 are important because they are likely cases in practice. For example, the case where some parameters are strongly identified and others are weakly identified (likely) occurs in Stock and Wright's (2000) (SW) and Kleibergen's (2005) consumption capital asset pricing model (CCAPM) example.
Guggenberger and Smith (2005), Otsu (2006), Inoue and Rossi (2011), Guggenberger, Ramalho, and Smith (2012), and I. Andrews (2014) deal with a subset of case (i) for generalized empirical likelihood (GEL) and GMM versions of the LM and CLR tests, but rule out case (ii) by assumption.⁵,⁶ Furthermore, their results for case (i) rely on Assumption C of SW.⁷,⁸ This assumption is an innovative contribution to the literature, but it has some significant drawbacks as a general high-level condition.
⁵ Case (ii) is ruled out by Assumption ID(iii) in Guggenberger and Smith (2005) and Assumption ID (iii) in Guggenberger, Ramalho, and Smith (2012), which assume that the matrix M_2(θ) has full column rank, where M_2(θ) contains the columns of E_F G(W_i, θ) that correspond to the strongly identified parameters. Case (ii) also is ruled out by Assumption C(ii) in Stock and Wright (2000), which is used to obtain results for GMM estimators.

⁶ Guggenberger and Smith (2005), Otsu (2006), and Inoue and Rossi (2011) do not consider CLR tests.

⁷ Assumption C of SW requires that the expected moment functions can be written as n^{−1/2}m_{1n}(θ) + m_2(θ) for some functions m_{1n} and m_2 and some (α, β) such that θ = (α′, β′)′ and (∂/∂β′)m_2(θ_0) has full column rank, where θ_0 denotes the true value of θ. In addition, it requires that m_{1n}(θ) → m_1(θ) uniformly over Θ for some real-valued function m_1, m_2(θ_0) = 0^k, m_2(θ) ≠ 0^k for θ ≠ θ_0, and (∂/∂θ′)m_2(θ) is continuous.

⁸ Inoue and Rossi (2011) and I. Andrews (2014, Appendix B) use conditions that are much like Assumption C of SW, but they are not exactly the same. As discussed below, Guggenberger, Ramalho, and Smith (2012) impose high-level conditions on a rank statistic when dealing with a CLR test under Assumption C of SW.
First, while Assumption C is easy to verify or refute in linear IV models, it is hard to verify or refute in many, or most, nonlinear models. As far as we are aware, it has only been verified in the literature for one nonlinear model, and that nonlinear model is only a local approximation to the model of interest. The model of interest is the two-parameter CCAPM considered in SW and Kleibergen (2005). SW verify Assumption C for a local approximation to this model that is a polynomial in the parameters; see p. 1093 of their Appendix B.⁹ It appears to be hard to verify or refute Assumption C in the CCAPM of interest.
Another example where Assumption C is hard to verify or refute is the following simple nonlinear regression model with endogeneity, one weakly-identified parameter, and one strongly-identified parameter: y_i = f(Y_{1i}θ_1 + Y_{2i}θ_2) + u_i, Y_{1i} = Z_i′π_{1n} + V_{1i}, Y_{2i} = Z_i′π_2 + V_{2i}, and π_{1n} = Cn^{−1/2} for some constant vector C ∈ R^k, where π_2 ≠ 0^k, θ = (θ_1, θ_2)′, f(·) is a known function, and Z_i is a vector of IV's. The moment functions take the form (y_i − f(Y_{1i}θ_1 + Y_{2i}θ_2))Z_i. For an arbitrary function f it is difficult to determine whether Assumption C holds or not. If f is a quadratic function, or a polynomial, then it may be possible to verify Assumption C. But, even for such functions, doing so does not seem easy.
Second, Assumption C is restrictive. For example, it fails to hold in a nonlinear regression model with weak identification due to the coefficient on a nonlinear regressor being close to zero. Suppose the model is y_i = β·h(X_i, π) + u_i for i = 1,...,n, where y_i and X_i are observed, u_i is an unobserved mean-zero error, and θ = (β, π)′. The parameter π is weakly identified when β = Cn^{−1/2} for some constant C. It is shown in Appendix E of the Supplemental Material to Andrews and Cheng (2012) that Assumption C fails in this case.
Another example where Assumption C fails is a linear IV model with joint estimation of the right-hand side (rhs) endogenous variable parameter, which is weakly identified, and the structural equation error variance, which is strongly identified: y_{1i} = Y_{2i}θ_1 + u_i, Y_{2i} = Z_iπ_n + V_{2i}, Z_i ∈ R (for simplicity), π_n = Cn^{−1/2} for some constant C, Var(u_i) = θ_2 > 0, θ = (θ_1, θ_2)′, and Eu_i = EV_{2i} = EZ_iu_i = EZ_iV_{2i} = 0. The moment functions are (y_{1i} − Y_{2i}θ_1)Z_i and (y_{1i} − Y_{2i}θ_1)² − θ_2. Assumption C fails in this model.¹⁰
⁹ The approximate model for which SW verify Assumption C is a local approximation to the model of interest based on a Taylor series expansion about a reference parameter value, θ_0 in their notation. This approximation is necessarily accurate only for θ close to θ_0. For other values of θ, the approximate model may be different from the model of interest. Note that Assumption C is a global assumption. So, the fact that it holds for the approximate model local to θ_0 does not imply that it approximately holds for the original model.

¹⁰ Assumption C of SW fails in the present example because the expected moment functions are E(y_{1i} − Y_{2i}θ_1)Z_i = −n^{−1/2}EZ_i²C(θ_1 − θ_{10}) and E(y_{1i} − Y_{2i}θ_1)² − θ_2 = n^{−1}EZ_i²C²(θ_1 − θ_{10})² + a(θ), where a(θ) := σ_V²(θ_1 − θ_{10})² − 2σ_{uV}(θ_1 − θ_{10}) + θ_{20} − θ_2, θ_0 = (θ_{10}, θ_{20})′ denotes the true value of θ, σ_V² := Var(V_{2i}), σ_{uV} := Cov(u_i, V_{2i}), and σ_V² and σ_{uV} do not depend on n. Because a(θ) does not depend on n, but does depend on both θ_1 and θ_2, one must take β = θ and m_2(θ) = (0, a(θ))′ in Assumption C (see the footnote above, which specifies Assumption C). In this case, (∂/∂θ′)m_2(θ_0) is a 2 × 2 matrix with less than full rank, because its first row is zero, which violates Assumption C.
The results of this paper do not impose any conditions on the functional form of the expected
moment conditions and their derivatives, like Assumption C does. The conditions given are more
general than the conditions used in the papers that rely on Assumption C.
We also point out that no papers in the literature deal with cases where p ≥ 2 and the limit of E_{F_n} G(W_i, θ) is zero, but n^{1/2}||E_{F_n} G_j(W_i, θ)|| → ∞ for some j ≤ p, where, as above, G_j(W_i, θ) denotes the jth column of G(W_i, θ). In such situations, analogues of cases (i) and (ii) arise in which suitably rescaled versions of the columns j for which n^{1/2}||E_{F_n} G_j(W_i, θ)|| → ∞ have limits that are¹¹

(iii) nonzero and linearly independent and
(iv) nonzero and linearly dependent.   (2.5)

Case (iv) sequences are examples of joint weak identification. Cases (iii) and (iv) sequences need to be considered to establish the correct asymptotic sizes of the LM and CLR tests.
For CLR tests, Guggenberger, Ramalho, and Smith (2012) establish the correct asymptotic null rejection probabilities for GEL versions of the CLR test in a subset of case (i) under Assumption C and the assumption that the conditioning statistic, rk_n(θ), either diverges to infinity or converges in distribution to a random variable that is random only through its dependence on the limit of the estimated Jacobian. Verifying this condition in cases (i)-(iv) is not easy. We do so in this paper for the Robin and Smith (2000) rank statistic rk_n(θ) with moment-variance weighting. In sum, Guggenberger, Ramalho, and Smith (2012) do not establish the correct asymptotic null rejection probabilities of the CLR test under Assumption C. They do so only under an additional high-level condition on the rank statistic.
Kleibergen's (2005, Thm. 3) results for the CLR test rely on the claim that the conditioning statistic rk_n(θ) is asymptotically independent of the LM statistic if rk_n(θ) is a function of a weighting matrix, Ṽ_{Dn} say, and the orthogonalized sample Jacobian, denoted by D̂_n(θ) ∈ R^{k×p}. However, this claim does not hold in general, as shown in Theorem 5.1 below and Section 18 in the Supplemental Material.¹² Newey and Windmeijer (2009) consider the limiting null rejection probability of the CLR test under "many instrument" asymptotics. They do not analyze the effects of weak identification (such as in cases (i)-(iv)). Their Assumption 2 implies global identification of θ.

¹¹ For example, suppose p = 2. Let (G_{i1}, G_{i2}) = G(W_i, θ) ∈ R^{k×2}. An example of case (iii) occurs when G_{i1} exhibits what might be called "semi-strong identification," i.e., E_{F_n} G_{i1} = C_{1n}n^{−s} for 0 < s < 1/2 and C_{1n} → C_1 ∈ R^k, where C_1 ≠ 0^k, and G_{i2} exhibits the classic features of "weak identification," i.e., E_{F_n} G_{i2} = C_2n^{−1/2} for some C_2 ∈ R^k. Then, E_{F_n} G_{i1} → 0^k, E_{F_n} G_{i2} → 0^k, n^{1/2}||E_{F_n} G_{i1}|| → ∞, and n^s E_{F_n} G_{i1} → C_1 ≠ 0^k. An example of case (iv) occurs when E_{F_n} G_{i1} is as above and E_{F_n} G_{i2} = C_{2n}n^{−s_2} for 0 < s_2 < 1/2 and C_{2n} → C_2 ∈ R^k, where C_2 ≠ 0^k, and C_1 and C_2 are linearly dependent. If C_1 and C_2 are linearly independent, then this is another example of case (iii).

¹² Under sequences F_n such that n^{1/2}E_{F_n} G(W_i, θ) converges to a finite matrix, n^{1/2}D̂_n(θ) and n^{1/2}ĝ_n(θ) (= n^{−1/2}Σ_{i=1}^n g(W_i, θ)) are asymptotically independent (see Lemmas 8.2 and 8.3 in Section 8 in the Appendix). Therefore, if r(V̂_n, n^{1/2}D̂_n(θ)) is a continuous function of n^{1/2}D̂_n(θ) and a weighting matrix V̂_n (that converges in probability to a positive definite matrix), then by the continuous mapping theorem (CMT), n^{1/2}ĝ_n(θ) and r(V̂_n, n^{1/2}D̂_n(θ)) are also asymptotically independent. However, under sequences for which a component of n^{1/2}E_{F_n} G(W_i, θ) diverges to plus or minus infinity, the CMT cannot be applied because n^{1/2}D̂_n(θ) does not converge in distribution, but rather, some component of it diverges to plus or minus infinity in probability (see Lemma 8.3 in Section 8 in the Appendix when h_{1,j} = ∞ for some j ≤ p). In this case, r(V̂_n, n^{1/2}D̂_n(θ)) may not have an asymptotic distribution, and if it does, r(V̂_n, n^{1/2}D̂_n(θ)) and n^{1/2}ĝ_n(θ) are not necessarily asymptotically independent. The following is a simple example of the latter situation when p = 2. Let r(V̂_n, n^{1/2}D̂_n(θ)) = V̂_{12n}||n^{1/2}D̂_{1n}(θ)||, where V̂_{12n} is the (1, 2) component of V̂_n and D̂_{1n}(θ) is the first column of D̂_n(θ). Assume V̂_n − V →_p 0 for some matrix V and n^{1/2}(V̂_n − V) →_d Δ, where Δ is a mean zero normal random matrix. Assume that under F_n the first column E_{F_n} G_1(W_i, θ) of E_{F_n} G(W_i, θ) is a fixed nonzero vector, G̃_1 say. Assume that the (1, 2) element of V, denoted by V_{12}, equals zero under F_n. Then, D̂_{1n}(θ) →_p G̃_1 (see Lemma 8.2 in Section 8 in the Appendix) and V̂_{12n}||n^{1/2}D̂_{1n}(θ)|| = n^{1/2}(V̂_{12n} − V_{12})||D̂_{1n}(θ)|| →_d Δ_{12}||G̃_1||. But, in general there is no reason why Δ_{12} and the random limit of n^{1/2}ĝ_n(θ) are independent. For simplicity, the previous example is somewhat contrived, because rank statistics typically are not of the form V̂_{12n}||n^{1/2}D̂_{1n}(θ)||. But, components of rank statistics may be of this form.

As a special case of the asymptotic size results of this paper for nonlinear models, this paper provides some new results for the linear IV regression model. Specifically, the results of the present paper establish the correct asymptotic size of LM and CLR tests in the linear IV model with an arbitrary number of rhs endogenous variables, under some maintained assumptions. The results allow for heteroskedasticity of the errors and stationary strong mixing errors and observations.

In contrast, the relevant results available in the literature for the linear IV model are as follows. Kleibergen (2002) shows that his LM test has correct asymptotic null rejection probabilities under fixed full-rank reduced-form matrices, as well as under standard weak IV asymptotics, that is, under the n^{−1/2}-local to zero sequences in Staiger and Stock (1997). Also see Moreira (2009). Moreira (2003) proves that the limiting null rejection probability of the CLR test is correct under standard weak IV asymptotics (i.e., of the type considered in Staiger and Stock (1997)). None of these papers considers cases (i)-(iv) above. Mikusheva (2010) establishes the correct asymptotic size of homoskedastic LM and CLR tests and CS's when there is only one endogenous rhs variable, i.e., p = 1, and the errors are homoskedastic. Guggenberger (2012) establishes the correct asymptotic size of heteroskedasticity-robust LM and CLR tests in a heteroskedastic model with p = 1. I. Andrews (2014) establishes the correct asymptotic size of a class of conditional linear combination (CLC) tests when p = 1, which he shows are equivalent to a class of CLR tests. He provides some CLC tests that are designed to have good power under heteroskedasticity and autocorrelation. Moreira and Moreira (2013) introduce some tests that maximize weighted average power in a linear IV model with heteroskedasticity and autocorrelation for the case where p = 1. Note that when p = 1, i.e., only one rhs endogenous variable appears (and the exogenous variables are projected out), cases (i), (ii), and (iv) above do not arise (because E_F G(W_i, θ) has a single column). Phillips
(1989) and Choi and Phillips (1992) provide asymptotic and finite sample results for estimators and classical tests in simultaneous equations models with fixed π matrices that may be unidentified or partially identified when p ≥ 1. Their results do not cover weak identification (of any type). Hillier (2009) provides exact finite sample results for CLR tests in the linear IV model under the assumption of homoskedastic normal errors and known covariance matrix.
We return now to the discussion of a general moment condition model. In this paper, we show that a minimum eigenvalue condition that appears in the parameter space F_0 (defined below) for the null distributions F is necessary in some sense to obtain correct asymptotic size for the LM and CLR tests. For example, in the linear IV regression model, this eigenvalue condition rules out perfect correlation between the structural and reduced-form errors. Without the eigenvalue condition, we show that in some cases the LM statistic equals the AR statistic plus an o_p(1) term. In consequence, the LM test (which uses a χ²_p critical value) over-rejects the null hypothesis asymptotically when k > p. Furthermore, without it, we show that in other cases the LM statistic equals zero a.s. for all n ≥ 1 and, hence, the LM test rejects the null hypothesis with probability zero for all n ≥ 1. In such cases, the LM test under-rejects the null asymptotically. These properties of the LM test have not been recognized in the literature, e.g., see Kleibergen (2005, Theorem 1).
We note that the asymptotic framework and results given here should be useful for establishing the asymptotic size of tests (and CS's) for moment condition and linear IV models that differ from the LM and CLR tests (and CS's) considered here, such as the tests in Moreira and Moreira (2013) and I. Andrews (2014). For example, we provide sufficient conditions for a suitably renormalized version of the moment-variance-weighted orthogonalized sample Jacobian to have full rank almost surely asymptotically, which is needed in the latter paper when p ≥ 2.
AG2 is a sequel to this paper. It introduces two new nonlinear singularity-robust conditional quasi-LR (SR-CQLR) tests and a singularity-robust Anderson-Rubin (SR-AR) test. AG2 shows that these tests (and the corresponding CS's) have correct asymptotic size for all p ≥ 1 under very weak conditions. For example, in the i.i.d. case, one of the two SR-CQLR tests and the SR-AR test only require the expected moment functions to equal zero at the true parameter and the sample moment functions to have 2 + γ moments uniformly bounded for some γ > 0. (The other SR-CQLR test imposes somewhat stronger moment conditions.) In particular, none of the tests in AG2 impose any conditions on the expectation of the Jacobian matrix of the moments or any conditions on the variance matrices of the moment functions or the conditioning statistic, which is the meaning of "singularity-robust." The two SR-CQLR tests are shown to be asymptotically efficient in a GMM sense under strong and semi-strong identification.
The new SR-CQLR tests in AG2 have some advantages over the CLR tests considered in this paper. First, they have correct asymptotic size under noticeably weaker conditions. Because they do not require the variance matrix of the moment functions to be nonsingular, they apply to models in which for some null hypothesis a nuisance parameter appears only under the alternative hypothesis and not under the null hypothesis.13 In addition, they do not place any restrictions on the eigenvalues of the expected outer product of the vectorized orthogonalized sample Jacobian, which can be restrictive and can be difficult to verify in some models.

Second, the tests reduce, or essentially reduce, asymptotically to Moreira's (2003) CLR test in the homoskedastic linear IV model for all p ≥ 1. In consequence, (a) no arbitrary choice of rank statistic is needed when p ≥ 2, and (b) the tests have the desirable power properties of Moreira's (2003) CLR test in the homoskedastic normal linear IV model when p = 1, which have been established in Andrews, Moreira, and Stock (2006, 2008) and Chernozhukov, Hansen, and Jansson (2009).14 In contrast, the CLR tests considered here for p ≥ 1 are all based on the form of Moreira's LR statistic when p = 1 and, in consequence, require the specification of some rank statistic. The CLR tests considered here based on the Jacobian-variance weighting reduce to Moreira's CLR test when p = 1, but we cannot show that they necessarily have correct asymptotic size when p ≥ 2. On the other hand, we show that the CLR tests considered here that are based on the moment-variance weighting have correct asymptotic size when p ≥ 1, but they do not reduce to Moreira's CLR test when p = 1 (or p ≥ 2).
We also mention the recent paper by I. Andrews and Mikusheva (2014a) that introduces a new
conditional likelihood ratio test for moment condition models that is robust to weak identi…cation.
This test is asymptotically similar conditional on the entire sample mean process that is orthogonalized to be asymptotically independent of the sample moments evaluated at the null parameter
value.
The LM and CLR tests considered in this paper are for full vector inference. To obtain subvector inference, one can employ the Bonferroni method or the Scheffé projection method, see Cavanagh, Elliott, and Stock (1995), Chaudhuri, Richardson, Robins, and Zivot (2010), Chaudhuri and Zivot (2011), and McCloskey (2011) for Bonferroni's method, and Dufour (1989) and Dufour and Jasiak (2001) for the projection method. These methods are conservative, but Bonferroni's method is found to work well by Chaudhuri, Richardson, Robins, and Zivot (2010) and Chaudhuri and Zivot (2011).15
13 Nonsingularity of the variance matrix of the moments is needed for Kleibergen's CLR-type tests, because the inverse of this matrix is used to orthogonalize the sample Jacobian from the sample moments when constructing a conditioning statistic.
14 For related results, see Chamberlain (2007) and Mikusheva (2010).
15 A refinement of Bonferroni's method that is not conservative, but is much more intensive computationally, is provided by Cavanagh, Elliott, and Stock (1995). McCloskey (2011) also considers a refinement of Bonferroni's method.
Other methods for subvector inference include the following. Subvector inference in which nuisance parameters are profiled out is possible in the linear IV regression model with homoskedastic errors using the AR test, but not the LM or CLR tests, see Guggenberger, Kleibergen, Mavroeidis, and Chen (2012). Andrews and Cheng (2012, 2013a,b) provide subvector tests with correct asymptotic size based on extremum estimator objective functions. These subvector methods depend on the following: (a) one has knowledge of the source of the potential lack of identification (i.e., which subvectors play the roles of β, ζ, and π in their notation), (b) there is only one source of lack of identification, and (c) the estimator objective function does not depend on the weakly identified parameters π (in their notation) when β = 0, which rules out some weak IV models. Montiel Olea (2012) provides some subvector analysis in the extremum estimator context of Andrews and Cheng (2012). His efficient conditionally similar tests apply to the subvector (β, ζ) of (β, ζ, π) (in the notation of Andrews and Cheng (2012)), where the parameter β determines the strength of identification and is known to be strongly identified. This subvector analysis is analogous to that of Stock and Wright (2000) and Kleibergen (2004). Cheng (2014) provides subvector inference in a nonlinear regression model with multiple nonlinear regressors and, in consequence, multiple potential sources of lack of identification. I. Andrews and Mikusheva (2012) provide subvector inference methods in a minimum distance context based on Anderson-Rubin-type statistics. I. Andrews and Mikusheva (2014b) provide conditions under which subvector inference is possible in exponential family models (but the requisite conditions seem to be restrictive).
3 Moment Condition Model

3.1 Definition of the Parameter Space for the Distributions F

First we introduce some notation. For notational simplicity, we let g_i(θ) and G_i(θ) abbreviate g(W_i, θ) and G(W_i, θ), respectively. We denote the jth column of G_i(θ) by G_{ij}(θ) and G_{ij} = G_{ij}(θ_0), where θ_0 denotes the (true) null value of θ, for j = 1, ..., p. Likewise, we often leave out the argument θ_0 for other functions as well. For example, we write g_i and G_i rather than g_i(θ_0) and G_i(θ_0). We let I_r denote the r-dimensional identity matrix. For a positive semi-definite (psd) matrix A, we let λ_j(A) denote the jth largest eigenvalue of A.
For some γ, δ > 0 and M < ∞, define

    F := {F : E_F g_i = 0^k, E_F ‖(g_i', vec(G_i)')'‖^{2+γ} ≤ M, and λ_min(E_F g_i g_i') ≥ δ},  (3.1)

where λ_min(·) denotes the smallest eigenvalue of a matrix, ‖·‖ denotes the Euclidean norm, and vec(·) denotes the vector obtained by stacking the columns of a matrix. The first condition in F is the defining condition of the model. The second condition in F is a mild moment condition on the moment functions g_i and their derivatives G_i. The last condition in F rules out singularity and near singularity of the variance matrix of the moments.16 For example, in the linear IV model it rules out E_F u_i² Z_i Z_i' being singular, which usually is not restrictive. Identification issues arise when E_F G_i has, or is close to having, less than full column rank (which occurs when one or more of its singular values is zero or close to zero). The conditions in F place no restrictions on the singular values or column rank of E_F G_i.
The condition λ_min(E_F g_i g_i') ≥ δ in F can be replaced by λ_min(E_F g_i g_i') > 0 without affecting the asymptotic size and similarity results given in Theorems 4.1 and 6.1 below, provided g_i and G_i are replaced with g_i* and G_i*, respectively, in F and F_0 (defined below), where g_i* := (E_F g_i g_i')^{-1/2} g_i and G_i* := (E_F g_i g_i')^{-1/2} G_i.17,18 This allows for the variance matrix of g_i to be arbitrarily close to singular, which occurs in some cases when identification is weak, but rules out singularity.
The parameter spaces for the distribution F that we consider in this paper are subsets of F. The main parameter space that we consider is F_0, which we now define.
For an arbitrary square-integrable (under F) vector a_i, let

    Ψ_F^{a_i} := E_F a_i a_i', Σ_F^{a_i} := E_F a_i g_i', Ω_F := Ψ_F^{g_i} = E_F g_i g_i', and
    Φ_F^{a_i} := Ψ_F^{a_i} − Σ_F^{a_i} Ω_F^{-1} (Σ_F^{a_i})'.  (3.2)

The matrix Φ_F^{a_i} is the expected outer product of the vector of residuals from the L²(F) projections of the components of a_i onto the space spanned by the components of g_i.
Let

    (τ_{1F}, ..., τ_{pF}) denote the p singular values of Ω_F^{-1/2} E_F G_i,  (3.3)

ordered so that τ_{jF} is nonincreasing in j. These singular values are nonnegative and may be zero.
Let

    B_F denote a p × p orthogonal matrix of eigenvectors of (E_F G_i)' Ω_F^{-1} (E_F G_i)  (3.4)
16 Note that it is not possible to avoid the assumption λ_min(E_F g_i g_i') ≥ δ by replacing an estimator Ω̂_n of E_F g_i g_i' by an eigenvalue-adjusted version, e.g., as defined in AG2. The reason is that the eigenvalue adjustment leads to a nonzero asymptotic covariance between the sample moments ĝ_n and the conditioning matrix D̂_n, defined in (4.1) and (4.3) below, which yields a test that does not necessarily have correct asymptotic size. See Comment (ii) to Lemma 8.2 in the Appendix for more details.
17 This holds because λ_min(E_F g_i* g_i*') = λ_min(I_k) = 1 and the proofs of the results given below go through with g_i* and G_i* in place of g_i and G_i throughout.
18 The matrix (E_F g_i g_i')^{-1/2} that appears in the definition of g_i* and G_i* can be replaced by any nonsingular k × k matrix, say K_F(θ_0), that yields λ_min(E_F g_i* g_i*') > 0. For example, in somewhat related contexts, Andrews and Cheng (2013b) and I. Andrews and Mikusheva (2014) find it convenient to rescale moment conditions by diagonal matrices.
ordered so that the corresponding eigenvalues (τ²_{1F}, ..., τ²_{pF}) are nonincreasing. Let

    C_F denote a k × k orthogonal matrix of eigenvectors of Ω_F^{-1/2} (E_F G_i)(E_F G_i)' Ω_F^{-1/2}  (3.5)

ordered so that the corresponding eigenvalues are (τ²_{1F}, ..., τ²_{pF}, 0, ..., 0)' ∈ R^k. Note that the jth of these eigenvalues equals τ²_{jF} for j ≤ p. With some abuse of notation, for an integer 0 ≤ j ≤ p, let B_F = (B_{F,j}, B_{F,p−j}) denote the decomposition of B_F into its first j and last p − j columns, where by definition, when j = p, B_{F,j} = B_F and B_{F,p−j} denotes a matrix with no columns and, when j = 0, B_{F,j} denotes a matrix with no columns and B_{F,p−j} = B_F. Analogously, for an integer 0 ≤ j ≤ k, let C_F = (C_{F,j}, C_{F,k−j}) denote the decomposition of C_F into its first j and last k − j columns, where, when j = 0 or j = k, C_{F,j} and C_{F,k−j} are defined analogously to B_{F,j} and B_{F,p−j}.
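As a numerical illustration (a sketch, not from the paper), the singular values τ_{jF} and the orthogonal matrices B_F and C_F in (3.3)-(3.5) can all be obtained from a single SVD of Ω_F^{-1/2} E_F G_i; the matrices EG and Omega below are hypothetical placeholders.

```python
import numpy as np

# Hypothetical population quantities for illustration (k = 3 moments, p = 2 parameters).
EG = np.array([[1.0, 0.2],
               [0.0, 0.5],
               [0.3, 0.1]])           # stands in for E_F G_i
Omega = np.diag([1.0, 2.0, 0.5])      # stands in for Omega_F = E_F g_i g_i'

# Symmetric inverse square root of Omega.
w, V = np.linalg.eigh(Omega)
Omega_m12 = V @ np.diag(w ** -0.5) @ V.T

M = Omega_m12 @ EG                    # Omega_F^{-1/2} E_F G_i, a k x p matrix
C, tau, Bt = np.linalg.svd(M)         # M = C diag(tau) B', tau nonincreasing
B = Bt.T

# B_F collects eigenvectors of (E_F G_i)' Omega_F^{-1} (E_F G_i) = M'M,
# whose eigenvalues are the squared singular values tau_j^2.
eigvals_BtMB = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]
print("singular values:", tau)
print("tau^2 vs eigenvalues of M'M:", tau ** 2, eigvals_BtMB)
```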
For 0 ≤ j ≤ p − 1 and ν ∈ R^{p−j}, define

    Φ_{jF}(ν) := Φ_F^{a_i} with a_i := C_{F,k−j}' Ω_F^{-1/2} G_i B_{F,p−j} ν.  (3.6)

For a given δ_1 > 0, we define the parameter space of null distributions to be

    F_0 := ∪_{j=0}^p F_{0j}, where
    F_{0j} := {F ∈ F : τ_{jF} ≥ δ_1 and λ_{p−j}(Φ_{jF}(ν)) ≥ δ_1 ∀ν ∈ R^{p−j} with ‖ν‖ = 1},  (3.7)

where τ_{0F} := ∞ and λ_{p−j}(Φ_{jF}(ν)) := ∞ for j = p.19,20 We assume that F_0 ≠ ∅.

The conditions in F_0 are used to show that the estimator Ω̂_n^{-1/2} D̂_n ∈ R^{k×p}, defined below, of the normalized population Jacobian matrix Ω_F^{-1/2} E_F G_i has full column rank p asymptotically with probability one after suitable normalization (see Lemma 8.3(d) in the Appendix). This almost sure (a.s.) full column rank p property is needed to obtain the desired asymptotic χ²_p null distribution of the LM statistic (introduced below), which is used by the LM and CLR tests. The LM statistic is a quadratic form in the sample moments with weight matrix given by the projection matrix onto Ω̂_n^{-1/2} D̂_n.

We obtain the a.s. full column rank property using conditions on both the (asymptotic) mean and variance of Ω̂_n^{-1/2} D̂_n. The index j on F_{0j} denotes the contribution coming from the mean and p − j denotes the contribution coming from the variance. For j = 0 (i.e., when the parameters
19 The matrices B_F and C_F are not necessarily uniquely defined. But, this is not of consequence because the λ_{p−j}(·) condition is invariant to the choice of B_F and C_F.
20 Note that Kleibergen (2005) does not impose any rank restrictions on the variance matrix of the limiting distribution of n^{-1/2} Σ_{i=1}^n (g_i', vec(G_i)' − E vec(G_i)')'. As simple examples show, however, to derive the limiting distribution of the LM test statistic, one needs to impose some restrictions of the type in F_0. For example, the case g_i(θ) = 0 with probability one for all θ vectors is compatible with Kleibergen's (2005) assumptions but violates the nonsingularity claim in the statement of Theorem 1 in Kleibergen (2005).
are weakly identified in the standard sense), the τ_{jF} ≥ δ_1 condition disappears, no restrictions are placed on the mean Ω_F^{-1/2} E_F G_i, and the a.s. full column rank property is obtained using the λ_{p−j}(·) condition with j = 0. For j = p (i.e., when all parameters are strongly identified), the λ_{p−j}(·) condition disappears (because B_{F,p−j} is a matrix with no columns when j = p) and the a.s. full rank property is obtained using only the mean condition τ_{pF} ≥ δ_1. For 0 < j < p (i.e., when the parameters are weakly identified in the nonstandard sense), the a.s. full rank property is obtained partly via the mean condition τ_{jF} ≥ δ_1 and partly via the λ_{p−j}(·) condition.21,22

The "variance" (or variability) condition, λ_{p−j}(Φ_{jF}(ν)) ≥ δ_1, can be interpreted as follows. The (k − j) × (p − j) matrix C_{F,k−j}' Ω_F^{-1/2} G_i B_{F,p−j} is a submatrix of the k × p matrix C_F' Ω_F^{-1/2} G_i B_F, which is just Ω_F^{-1/2} G_i with its rows and columns rotated. This submatrix C_{F,k−j}' Ω_F^{-1/2} G_i B_{F,p−j} has the j linear combinations of the rows and columns of Ω_F^{-1/2} G_i removed for which the mean component of Ω̂_n^{-1/2} D̂_n, i.e., Ω_F^{-1/2} E_F G_i, provides a column rank of magnitude j. (More specifically, the mean component of the j linear combinations of the rows and columns of Ω_F^{-1/2} G_i that are removed equals C_{F,j}' Ω_F^{-1/2} E_F G_i B_{F,j} = Diag{τ_{1F}, ..., τ_{jF}} ∈ R^{j×j} and the column rank of Diag{τ_{1F}, ..., τ_{jF}} is j by the definition of F_{0j}.23) The λ_{p−j}(Φ_{jF}(ν)) ≥ δ_1 condition requires that every linear combination ν (with ‖ν‖ = 1) of the columns of the aforementioned submatrix, i.e., C_{F,k−j}' Ω_F^{-1/2} G_i B_{F,p−j} ν, has enough variability to provide the requisite additional column rank of magnitude p − j. Specifically, the (p − j)-th largest eigenvalue of Φ_{jF}(ν) is bounded away from zero. This allows for the minimal amount of variation that still delivers the incremental p − j column rank that is required. Note that the matrix Φ_{jF}(ν) is not actually a variance matrix. It is an expected outer-product matrix, which makes the condition slightly weaker.
We can write

    Φ_{jF}(ν) = (ν' B_{F,p−j}' ⊗ C_{F,k−j}' Ω_F^{-1/2}) Φ_F^{vec(G_i)} (B_{F,p−j} ν ⊗ Ω_F^{-1/2} C_{F,k−j}) and
    Φ_F^{vec(G_i)} = E_F G*_{Fi} G*_{Fi}', where G*_{Fi} := vec(G_i) − Σ_F^{vec(G_i)} Ω_F^{-1} g_i ∈ R^{pk}  (3.8)

(using the general formula vec(ABC) = (C' ⊗ A)vec(B)). The random vector G*_{Fi} consists of the residuals from the L²(F) projections of the components of G_i onto the space spanned by the components of g_i. The matrix Φ_F^{vec(G_i)} is the expected outer-product of these residuals. Analogously,
21 Sequences of distributions in the semi-strongly identified category can come from sets F_{0j} for any j < p.
22 Linking the parameter spaces F_{0j} for j = 0, ..., p with identification categories, as is done in this paragraph, provides a useful interpretation, but is somewhat heuristic. The reason is that the parameter spaces F_{0j} place conditions on individual distributions F, whereas the asymptotic identification categories (i.e., strong, semi-strong, and weak in the standard and nonstandard senses) depend on the properties of sequences of distributions {F_n : n ≥ 1}.
23 The stated equality holds because (i) by (3.3)-(3.5), Ω_F^{-1/2} E_F G_i = C_F Diag(τ_F) B_F', where Diag(τ_F) is the k × p matrix whose (m, m) element equals τ_{mF} for m = 1, ..., p and whose other elements all equal zero, (ii) C_F' Ω_F^{-1/2} E_F G_i B_F = Diag(τ_F) by the orthogonality of C_F and B_F, and, hence, (iii) C_{F,j}' Ω_F^{-1/2} E_F G_i B_{F,j} = Diag{τ_{1F}, ..., τ_{jF}}.
the matrix Φ_{jF}(ν) is the expected outer-product of the residuals from the L²(F) projections of the elements of C_{F,k−j}' Ω_F^{-1/2} G_i B_{F,p−j} ν onto the space spanned by the components of g_i.
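The Kronecker identity vec(ABC) = (C' ⊗ A)vec(B) used in (3.8) can be verified numerically; this generic check (with arbitrary placeholder matrices) also illustrates that vec(·) stacks columns, which in NumPy's row-major layout means flattening the transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
Bmat = rng.standard_normal((4, 2))
Cmat = rng.standard_normal((2, 5))

def vec(X):
    # Column-stacking vec operator (NumPy flattens row-major, so transpose first).
    return X.T.reshape(-1)

lhs = vec(A @ Bmat @ Cmat)
rhs = np.kron(Cmat.T, A) @ vec(Bmat)
print(np.allclose(lhs, rhs))  # prints True
```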
If some element of g_i does not depend on some element of θ, then the corresponding element of G_i is identically zero. For example, this occurs with simple mean-variance moment conditions of the form g_i(θ) = (Y_i − θ_1, (Y_i − θ_1)² − θ_2)', where θ_1 is a mean parameter and θ_2 is a variance parameter of the random variable Y_i. In such cases, Φ_F^{vec(G_i)} is singular. In consequence, it is important to impose the weakest conditions possible on Φ_F^{vec(G_i)} or Φ_F^{vec(Ω_F^{-1/2} G_i)}.

In the simple mean-variance model, k = p = 2, E_F G_i = −I_2, both parameters are strongly identified, and F_0 contains F_{0p} = {F ∈ F : τ_{pF} ≥ δ_1}, where τ_{pF} is the smallest singular value of Ω_F^{-1/2} (because E_F G_i = −I_2). In this model, τ_{pF} is bounded away from zero if the fourth moment of Y_i is bounded above, which is implied by the condition in F that E_F ‖g_i‖^{2+γ} ≤ M.24 Hence, the condition τ_{pF} ≥ δ_1 is redundant for δ_1 sufficiently small in this model.

If the condition λ_{p−j}(Φ_{jF}(ν)) ≥ δ_1 in F_{0j} is weakened to λ_{p−j}(Φ_{jF}(ν)) > 0 and the variance
> 0 in F0j is weakened to p j ( jF ( )) > 0 and the variance
and covariance matrix estimators b n and b n de…ned below can be any consistent estimators (under
1
suitable sequences of distributions), then the LM and CLR tests do not necessarily have correct
asymptotic size. In particular, we provide an example where the asymptotic distribution of the LM
statistic is
2
k
in this case, rather than the desired distribution
2;
p
which leads to over-rejection
under the null when k > p; see Section 12 in the Appendix.25 Hence, the restrictions on the
parameter space F0 are not redundant.
In contrast, the SR-AR, SR-CQLR1 ; and SR-CQLR2 tests introduced in AG2 are shown to
have correct asymptotic size without any conditions on
p j(
jF (
)) or
0
min (EF gi gi ):
All that is
required is the …rst two conditions in F: Hence, these tests have advantages over the LM and CLR
tests considered here in terms of the robustness of their size properties.
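As an aside, the facts stated for the simple mean-variance model above (E_F g_i = 0 and E_F G_i = −I_2 at the true parameter) can be checked by simulation; this sketch uses arbitrary hypothetical parameter values and a normal distribution for Y_i, neither of which is required by the model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
theta1, theta2 = 0.5, 2.0                      # hypothetical true mean and variance
Y = rng.normal(theta1, np.sqrt(theta2), n)

# Moment functions g_i(theta) = (Y_i - theta1, (Y_i - theta1)^2 - theta2)'.
g = np.column_stack([Y - theta1, (Y - theta1) ** 2 - theta2])

# Jacobian G_i(theta): row 1 = (-1, 0), row 2 = (-2(Y_i - theta1), -1).
G = np.empty((n, 2, 2))
G[:, 0, 0], G[:, 0, 1] = -1.0, 0.0
G[:, 1, 0], G[:, 1, 1] = -2.0 * (Y - theta1), -1.0

EG_hat = G.mean(axis=0)       # should be close to -I_2
Eg_hat = g.mean(axis=0)       # should be close to (0, 0)'
print(EG_hat)
```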
Let C̄_{F,p−j} ∈ R^{k×(p−j)} denote a matrix that contains p − j columns from the last k − j columns of C_F. Six alternative sufficient conditions for the λ_{p−j}(·) condition in F_{0j}, in increasing order of
24 This holds because E_F G_i = −I_2 and Ω_F has elements [Ω_F]_{11} = θ_{20}, [Ω_F]_{12} = [Ω_F]_{21} = E_F U_i(U_i² − θ_{20}), and [Ω_F]_{22} = E_F(U_i² − θ_{20})², where θ_{20} := Var_F(Y_i), U_i := Y_i − θ_{10}, θ_{10} := E_F Y_i, and θ_0 = (θ_{10}, θ_{20})' denotes the true null value.
25 This example consists of a standard linear IV regression model with one rhs endogenous variable, IV's that are irrelevant, i.e., π = 0^k, and a correlation between the structural and reduced-form equation errors that equals one or converges to one as n → ∞. The example also can be extended to cover weak IV cases (where π = π_n ≠ 0^k, but π_n → 0 sufficiently quickly as n → ∞), rather than the irrelevant IV case.
strength, are:

    (i) λ_min(Φ_F^{vec(C̄_{F,p−j}' Ω_F^{-1/2} G_i B_{F,p−j})}) ≥ δ_1 for some matrix C̄_{F,p−j},
    (ii) λ_min(Φ_F^{vec(Ω_F^{-1/2} G_i B_{F,p−j})}) ≥ δ_1,
    (iii) λ_min(Φ_F^{vec(Ω_F^{-1/2} G_i)}) ≥ δ_1,
    (iv) λ_min(Φ_F^{vec(G_i)}) ≥ δ_2,
    (v) λ_min(Ψ_F^{f_i}) ≥ δ_2, where Ψ_F^{f_i} := E_F f_i f_i' and f_i := (g_i', vec(G_i)')', and
    (vi) λ_min(Var_F(f_i)) ≥ δ_2,  (3.9)

where M and γ are as in (3.1), δ_2 := δ_1 M^{2/(2+γ)}, and δ_1 is as in (3.7).26 See Section 17 in the Supplemental Material for a proof of the sufficiency of these conditions. None of these conditions depend on ν. Another sufficient condition for the λ_{p−j}(·) condition in F_{0j} is

    λ_min(Φ_F^{Ω_F^{-1/2} G_i B_{F,p−j} ν}) ≥ δ_1 ∀ν ∈ R^{p−j} with ‖ν‖ = 1.  (3.10)

For the linear IV model in (2.2), we have Ω_F = E_F u_i² Z_i Z_i', Ψ_F^{vec(G_i)} = E_F vec(Z_i Y_{2i}') vec(Z_i Y_{2i}')', Σ_F^{vec(G_i)} = −E_F vec(Z_i Y_{2i}') Z_i' u_i, and E_F ‖(g_i', vec(G_i)')'‖^{2+γ} = E_F ‖(u_i Z_i', vec(Z_i Y_{2i}')')'‖^{2+γ}. Sufficient conditions for condition (vi) in (3.9) (and, hence, for the λ_{p−j}(·) condition in F_{0j}) in the linear IV regression model are as follows. We have

    Ψ_F^{f_i} = E_F((u_i, Y_{2i}')' ⊗ Z_i)((u_i, Y_{2i}')' ⊗ Z_i)'
             = E_F(ε_i ⊗ Z_i)(ε_i ⊗ Z_i)' + E_F s_i(π) s_i(π)' and
    Var_F(f_i) = E_F(ε_i ⊗ Z_i)(ε_i ⊗ Z_i)' + E_F s_i(π) s_i(π)' − E_F s_i(π) E_F s_i(π)'
             ≥ E_F(ε_i ε_i' ⊗ Z_i Z_i'), where
    ε_i := (u_i, V_{2i}')', s_i(π) := (0^{k'}, (Z_i Z_i' π_1)', ..., (Z_i Z_i' π_p)')', π = (π_1, ..., π_p),  (3.11)

π_j ∈ R^k for j = 1, ..., p, and the inequality holds in a psd sense. Hence, λ_min(Var_F(f_i)) ≥ δ_2 holds if λ_min(E_F(ε_i ε_i' ⊗ Z_i Z_i')) ≥ δ_2. When ε_i is conditionally homoskedastic, i.e., Σ_{ε,F} := Var_F(ε_i) = E_F(ε_i ε_i'|Z_i) a.s., we have E_F(ε_i ε_i' ⊗ Z_i Z_i') = Σ_{ε,F} ⊗ E_F Z_i Z_i'. Hence, for example, λ_min(Var_F(f_i)) ≥ δ_2 holds if Σ_{ε,F} and E_F Z_i Z_i' have minimum eigenvalues that are
26 Condition (i) holds if it holds for any C̄_{F,p−j} matrix corresponding to any C_F matrix that satisfies the condition in F_{0j}. Conditions (i) and (ii) are invariant to the choice of the matrix B_F in cases where B_F is not uniquely defined.
bounded away from zero by δ_2^{1/2}.
3.2 Definition of G(W_i, θ)

The k × p matrix G(W_i, θ) does not need to equal (∂/∂θ') g(W_i, θ), as defined in (1.3). Rather, the asymptotic size results given below hold for any matrix G(W_i, θ) that satisfies the conditions in F_0. For example, G(W_i, θ) can be the derivative of g(W_i, θ) almost surely, rather than for all W_i, which allows g(W_i, θ) to have kinks. Alternatively, the function G(W_i, θ) can be a numerical derivative, such as ((g(W_i, θ + εe_1) − g(W_i, θ))/ε, ..., (g(W_i, θ + εe_p) − g(W_i, θ))/ε) ∈ R^{k×p} for some ε > 0, where e_j is the jth unit vector, e.g., e_1 = (1, 0, ..., 0)' ∈ R^p. This choice of G(W_i, θ) matrix may be useful for models with quite complicated Jacobian matrices (∂/∂θ') g(W_i, θ).
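A forward-difference implementation of this numerical-derivative choice of G(W_i, θ) might look as follows (a sketch; the moment function and parameter values are hypothetical), checked here against the analytical Jacobian of the mean-variance moments from Section 3.1.

```python
import numpy as np

def numerical_jacobian(g, W_i, theta, eps=1e-6):
    """Forward-difference Jacobian ((g(W_i, theta + eps*e_1) - g(W_i, theta))/eps, ...)."""
    base = g(W_i, theta)
    k, p = base.shape[0], theta.shape[0]
    G = np.empty((k, p))
    for j in range(p):
        e_j = np.zeros(p)
        e_j[j] = 1.0
        G[:, j] = (g(W_i, theta + eps * e_j) - base) / eps
    return G

# Hypothetical example: the mean-variance moments g_i(theta).
def g_mv(y, theta):
    return np.array([y - theta[0], (y - theta[0]) ** 2 - theta[1]])

theta0 = np.array([0.5, 2.0])
G = numerical_jacobian(g_mv, 1.3, theta0)
# Analytical Jacobian: [[-1, 0], [-2(y - theta1), -1]].
print(G)
```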
3.3 Definitions of Asymptotic Size and Asymptotic Similarity

Now, we define asymptotic size and asymptotic similarity of a test of H_0: θ = θ_0 for some given parameter space F(θ_0) of null distributions F. Let RP_n(θ_0, F, α) denote the null rejection probability of a nominal size α test with sample size n when the distribution of the data is F. The asymptotic size of the test for the parameter space F(θ_0) is defined by

    AsySz := limsup_{n→∞} sup_{F∈F(θ_0)} RP_n(θ_0, F, α).  (3.12)

The test is asymptotically similar (in a uniform sense) for the parameter space F(θ_0) if

    liminf_{n→∞} inf_{F∈F(θ_0)} RP_n(θ_0, F, α) = limsup_{n→∞} sup_{F∈F(θ_0)} RP_n(θ_0, F, α).  (3.13)

Next, we consider a CS that is obtained by inverting tests of H_0: θ = θ_0 for all θ_0 ∈ Θ. The asymptotic size of the CS for the parameter space F_Θ := {(F, θ_0) : F ∈ F(θ_0), θ_0 ∈ Θ} is AsySz := liminf_{n→∞} inf_{(F,θ_0)∈F_Θ} (1 − RP_n(θ_0, F, α)). The CS is asymptotically similar (in a uniform sense) for the parameter space F_Θ if liminf_{n→∞} inf_{(F,θ_0)∈F_Θ} (1 − RP_n(θ_0, F, α)) = limsup_{n→∞} sup_{(F,θ_0)∈F_Θ} (1 − RP_n(θ_0, F, α)).

As defined, asymptotic size and similarity of a CS require uniformity over the null values θ_0 ∈ Θ, as well as uniformity over null distributions F for each null value θ_0. This additional level of uniformity does not play a significant role in this paper. The same proofs for tests give results for CS's with only minor changes.

The dependence of the parameter space F_0, defined in (3.7), on θ_0 is suppressed for notational simplicity. When dealing with CS's, rather than tests, we make the dependence explicit and write it as F_0(θ_0). The asymptotic size and similarity of CS's is considered for the parameter space F_{Θ,0} defined by

    F_{Θ,0} := {(F, θ_0) : F ∈ F_0(θ_0), θ_0 ∈ Θ}.  (3.14)

4 Kleibergen's Nonlinear LM Test
Here, we define and analyze Kleibergen's (2005) nonlinear LM test for the nonlinear moment condition model in (1.1). Let

    ĝ_n(θ) := n^{-1} Σ_{i=1}^n g_i(θ), Ĝ_n(θ) := n^{-1} Σ_{i=1}^n G_i(θ), and
    Ω̂_n(θ) := n^{-1} Σ_{i=1}^n g_i(θ) g_i(θ)' − ĝ_n(θ) ĝ_n(θ)'.27  (4.1)

For any matrix A with r rows, we define the projection matrices

    P_A := A(A'A)⁻A' and M_A := I_r − P_A,  (4.2)

where (·)⁻ denotes any g-inverse.28 If A has zero columns, we set M_A = I_r.

Define the (nonlinear) Anderson and Rubin (1949) (AR) statistic of Stock and Wright (2000) and the Lagrange Multiplier statistic of Kleibergen (2005) as follows:

    AR_n(θ) := n ĝ_n(θ)' Ω̂_n^{-1}(θ) ĝ_n(θ) and
    LM_n(θ) := n ĝ_n(θ)' Ω̂_n^{-1/2}(θ) P_{Ω̂_n^{-1/2}(θ) D̂_n(θ)} Ω̂_n^{-1/2}(θ) ĝ_n(θ), where
    D̂_n(θ) := (D̂_{1n}(θ), ..., D̂_{pn}(θ)) ∈ R^{k×p},
    D̂_{jn}(θ) := Ĝ_{jn}(θ) − Γ̂_{jn}(θ) Ω̂_n^{-1}(θ) ĝ_n(θ) ∈ R^k for j = 1, ..., p,
    Ĝ_n(θ) := (Ĝ_{1n}(θ), ..., Ĝ_{pn}(θ)) ∈ R^{k×p}, and
    Γ̂_{jn}(θ) := n^{-1} Σ_{i=1}^n (G_{ij}(θ) − Ĝ_{jn}(θ)) g_i(θ)' ∈ R^{k×k} for j = 1, ..., p.  (4.3)

We refer to D̂_n(θ) as the orthogonalized sample Jacobian because it equals the sample Jacobian Ĝ_n(θ) adjusted to be asymptotically independent of the sample moments ĝ_n(θ).

The nominal size α LM test rejects the null hypothesis in (1.2) when LM_n(θ_0) exceeds the 1 − α quantile of a χ²_p distribution, denoted by χ²_{p,1−α}. The nominal size 1 − α LM CS is defined by

    CS_{LM,n} := {θ_0 ∈ Θ : LM_n(θ_0) ≤ χ²_{p,1−α}}.  (4.4)
27 Any estimator Ω̂_n(θ) that is consistent for E g_i(θ) g_i(θ)' under the drifting subsequences of distributions considered in Section 8 in the Appendix can be used, such as n^{-1} Σ_{i=1}^n g_i(θ) g_i(θ)', without changing the asymptotic size results given below. However, we recommend the definition in (4.1).
28 Projection matrices are invariant to the choice of g-inverse.
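A minimal sketch of the statistics in (4.1)-(4.3), assuming i.i.d. data; the inputs below are simulated placeholders, not from any model in the paper. Because LM_n projects Ω̂_n^{-1/2} ĝ_n onto the column space of Ω̂_n^{-1/2} D̂_n, the inequality LM_n ≤ AR_n always holds (with equality when k = p).

```python
import numpy as np

def ar_lm_stats(g, G_arr):
    """AR_n and LM_n from (4.1)-(4.3): g is n x k moment values and
    G_arr is n x k x p Jacobian values, both evaluated at theta_0."""
    n, k = g.shape
    p = G_arr.shape[2]
    g_bar = g.mean(axis=0)                              # ghat_n(theta)
    G_bar = G_arr.mean(axis=0)                          # Ghat_n(theta), k x p
    Omega = (g.T @ g) / n - np.outer(g_bar, g_bar)      # Omegahat_n(theta)
    Omega_inv = np.linalg.inv(Omega)

    # Orthogonalized Jacobian Dhat_n: column j is Ghat_jn - Gammahat_jn Omegahat^{-1} ghat_n,
    # with Gammahat_jn = n^{-1} sum_i (G_ij - Ghat_jn) g_i'.
    D = np.empty((k, p))
    for j in range(p):
        Gamma_j = ((G_arr[:, :, j] - G_bar[:, j]).T @ g) / n
        D[:, j] = G_bar[:, j] - Gamma_j @ Omega_inv @ g_bar

    AR = n * g_bar @ Omega_inv @ g_bar

    # LM: project Omegahat^{-1/2} ghat_n onto the columns of Omegahat^{-1/2} Dhat_n.
    w, V = np.linalg.eigh(Omega)
    Om_m12 = V @ np.diag(w ** -0.5) @ V.T               # symmetric inverse square root
    WD = Om_m12 @ D
    P = WD @ np.linalg.pinv(WD.T @ WD) @ WD.T           # projection matrix P_{WD}
    v = Om_m12 @ g_bar
    LM = n * v @ P @ v
    return AR, LM

# Simulated placeholder data: k = 4 moments, p = 1 parameter.
rng = np.random.default_rng(2)
n, k, p = 500, 4, 1
g = rng.standard_normal((n, k))
G_arr = rng.standard_normal((n, k, p)) + 1.0
AR, LM = ar_lm_stats(g, G_arr)
print(AR, LM)
```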
The following result establishes the correct asymptotic size and asymptotic similarity of Kleibergen's (2005) LM test and CS for the parameter spaces F_0 and F_{Θ,0}, respectively.

Theorem 4.1 The asymptotic size of the LM test equals its nominal size α ∈ (0, 1) for the parameter space F_0 (defined in (3.7)). Furthermore, the LM test is asymptotically similar (in a uniform sense). Analogous results hold for the LM CS for the parameter space F_{Θ,0}, defined in (3.14).

Comments: (i) Theorem 4.1 provides a more complete set of asymptotic results under the null hypothesis for the LM statistic than in Kleibergen (2005). See Section 2 for a detailed discussion.

(ii) In contrast to results in Kleibergen (2005), we impose regularity conditions in the specification of F_0 in order to establish our asymptotic results for the LM test. We show in Section 12 in the Appendix that these regularity conditions are not redundant. Without the λ_{p−j}(·) condition in F_{0j}, we show that, for some models, some sequences of distributions, and some (consistent) choices of variance and covariance estimators, the LM statistic has a χ²_k asymptotic distribution. This leads to over-rejection of the null when the standard χ²_p critical value is used and the parameters are over-identified (i.e., k > p).

(iii) Kleibergen's LM test is asymptotically efficient in a GMM sense under strong IV's because it is asymptotically equivalent under n^{-1/2}-local alternatives to t and/or Wald tests based on asymptotically efficient GMM estimators, e.g., see Newey and West (1987b).
We now provide a brief description of how we obtain the asymptotic distribution of the projection matrix onto Ω̂_n^{-1/2} D̂_n, which appears in the LM statistic, using the conditions in F_0. Projection matrices are invariant to multiplication by scalars, such as n^{1/2}, and post-multiplication by nonsingular p × p matrices. We use this invariance when normalizing Ω̂_n^{-1/2} D̂_n to obtain a nondegenerate limit of the projection matrix under a sequence of distributions {F_n ∈ F_0 : n ≥ 1}. The appropriate normalization depends on the identification strength under {F_n : n ≥ 1}. For sequences of distributions where all parameters are strongly identified, such as distributions in F_{0p}, no normalization is needed and Ω̂_n^{-1/2} D̂_n converges in probability to a nonstochastic matrix that has full column rank p.

For sequences of distributions that are weakly identified in the standard sense (i.e., for which all parameters are weakly identified), such as suitable sequences of distributions in F_{00}, the expected Jacobian E_{F_n} G_i is O(n^{-1/2}), we normalize Ω̂_n^{-1/2} D̂_n by n^{1/2}, the vector vec(n^{1/2} Ω̂_n^{-1/2} D̂_n) has an asymptotic normal distribution with possibly nonzero mean, and we obtain the desired a.s. full column rank property of the asymptotic version of n^{1/2} Ω̂_n^{-1/2} D̂_n using the λ_{p−j}(·) condition in F_{00} for j = 0.
Sequences of distributions {F_n : n ≥ 1} that are weakly identified in the nonstandard sense are noticeably more complicated to analyze. For such sequences, we multiply Ω̂_n^{-1/2} D̂_n by n^{1/2} and post-multiply n^{1/2} Ω̂_n^{-1/2} D̂_n by a nonstochastic nonsingular p × p matrix that rotates its columns and then differentially downweights (by suitable functions of n) the q rotated columns that are strongly or semi-strongly identified for q ∈ {1, ..., p}, as determined by the magnitudes of the singular values {τ_{jF_n} : j ≤ p} of Ω_{F_n}^{-1/2} E_{F_n} G_i for n ≥ 1. This eliminates the otherwise explosive behavior of these columns. Such sequences of distributions come from ∪_{j=0}^q F_{0j}. For such sequences, the asymptotic version of the normalized Ω̂_n^{-1/2} D̂_n matrix has full column rank a.s. because, for all j ≤ q, (i) the first j nonstochastic (rotated) columns have full column rank by the choice of rotation and (ii) the expected outer-product matrix of every linear combination ν of the remaining p − j asymptotically normal (rotated) rows and columns, i.e., C_{F,k−j}' Ω_F^{-1/2} G_i B_{F,p−j} ν, satisfies the λ_{p−j}(·) lower bound condition in F_{0j}.
5 Kleibergen's CLR Test with Jacobian-Variance Weighting

In this section, we consider Kleibergen's (2005, Sec. 5.1) nonlinear CLR test that employs the Jacobian-variance weighting. This test utilizes a rank statistic, rk_n(θ), that is suitable for testing the hypothesis rank[E_F G_i] ≤ p − 1 against rank[E_F G_i] = p. For example, the rank statistics of Cragg and Donald (1996, 1997), Robin and Smith (2000), and Kleibergen and Paap (2006) have been suggested for this purpose. Given rk_n(θ) and any p ≥ 1, Kleibergen (2005) defines the nonlinear CLR test statistic as

    CLR_n(θ) := (1/2)[AR_n(θ) − rk_n(θ) + √((AR_n(θ) − rk_n(θ))² + 4 LM_n(θ) rk_n(θ))].  (5.1)

This definition mimics the definition of the likelihood ratio (LR) statistic in the homoskedastic normal linear IV regression model with fixed regressors when p = 1, see Moreira (2003, eqn. (3)). However, it differs from the LR statistic in the latter model when p ≥ 2. Smith (2007), Newey and Windmeijer (2009), and Guggenberger, Ramalho, and Smith (2012) consider GEL versions of the CLR statistic in (5.1).
The critical value of the CLR test is c(1 − α, rk_n(θ)), where c(1 − α, r) is the 1 − α quantile of the distribution of

clr(r) := (1/2) [ χ²_p + χ²_{k−p} − r + √( (χ²_p + χ²_{k−p} − r)² + 4 χ²_p r ) ]   (5.2)

for 0 ≤ r < ∞, and the chi-square random variables χ²_p and χ²_{k−p} in (5.2) are independent. The CLR test rejects the null hypothesis H_0: θ = θ_0 if CLR_n(θ_0) > c(1 − α, rk_n(θ_0)).
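Since c(1 − α, r) has no closed form, it is typically obtained by Monte Carlo simulation of (5.2). A minimal sketch of one such simulator (our own; the function name and simulation budget are illustrative, not from the paper):

```python
import numpy as np

def clr_critical_value(alpha, r, p, k, n_sim=200_000, seed=0):
    """Simulate the 1 - alpha quantile of clr(r) in eqn. (5.2), where
    clr(r) = 0.5*(X_p + X_{k-p} - r + sqrt((X_p + X_{k-p} - r)^2 + 4*X_p*r))
    and X_p ~ chi2(p), X_{k-p} ~ chi2(k-p) are independent."""
    rng = np.random.default_rng(seed)
    xp = rng.chisquare(p, n_sim)
    xkp = rng.chisquare(k - p, n_sim) if k > p else np.zeros(n_sim)
    s = xp + xkp
    clr = 0.5 * (s - r + np.sqrt((s - r) ** 2 + 4.0 * xp * r))
    return np.quantile(clr, 1.0 - alpha)
```

A useful sanity check on the endpoints: clr(0) = χ²_k, so c(1 − α, 0) is the χ²_k quantile, while clr(r) → χ²_p as r → ∞, consistent with the AR/LM interpolation of CLR_n.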
Kleibergen (2005, p. 1114) recommends using a rank statistic that is a function of D̂_n(θ) and a consistent estimator of the covariance matrix of the asymptotic distribution of vec(D̂_n(θ)) (after suitable normalization), denoted Ṽ_Dn(θ) ∈ R^{kp×kp}. (Also, see (37) of Kleibergen (2007).) In the i.i.d. case considered here, Ṽ_Dn(θ) is defined by

Ṽ_Dn(θ) := n^{-1} Σ_{i=1}^n vec(G_i(θ) − Ĝ_n(θ)) vec(G_i(θ) − Ĝ_n(θ))' − Γ̂_n(θ) Ω̂_n^{-1}(θ) Γ̂_n(θ)', where Γ̂_n(θ) := (Γ̂_{1n}(θ)', ..., Γ̂_{pn}(θ)')' ∈ R^{pk×k}.   (5.3)
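To fix ideas, (5.3) can be computed directly from the data. The sketch below assumes the standard sample-moment definitions Ω̂_n = n^{-1} Σ_i g_i g_i' and Γ̂_{jn} = n^{-1} Σ_i (G_{i,·j} − Ĝ_{n,·j}) g_i' from Section 4 (not reproduced in this excerpt); the function name is ours.

```python
import numpy as np

def jacobian_variance_estimator(g, G):
    """Ṽ_Dn of eqn. (5.3) in the i.i.d. case.
    g: (n, k) array of moment vectors g_i; G: (n, k, p) array of Jacobians G_i.
    vec() stacks columns, so each vec(G_i - Ĝ_n) is a kp-vector."""
    n, k = g.shape
    p = G.shape[2]
    Gc = G - G.mean(axis=0)                  # G_i - Ĝ_n, shape (n, k, p)
    # row i = vec(G_i - Ĝ_n) with column-stacking vec()
    Vc = Gc.transpose(0, 2, 1).reshape(n, k * p)
    Om = g.T @ g / n                         # Ω̂_n (k x k), assumed definition
    Gam = Vc.T @ g / n                       # Γ̂_n = (Γ̂_1n', ..., Γ̂_pn')' (pk x k)
    return Vc.T @ Vc / n - Gam @ np.linalg.solve(Om, Gam.T)
```

The result is a kp × kp symmetric matrix; as a Schur complement of a positive semi-definite Gram matrix, it is positive semi-definite in finite samples.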
The Jacobian-variance weighted version of D̂_n(θ) upon which the rank statistic depends is

D̂_n†(θ) := vec_{k,p}^{-1}( Ṽ_Dn^{-1/2}(θ) vec(D̂_n(θ)) ) = ( Σ_{j=1}^p M̃_{1jn}(θ) D̂_{jn}(θ), ..., Σ_{j=1}^p M̃_{pjn}(θ) D̂_{jn}(θ) ), where

M̃_n(θ) = [ M̃_{11n}(θ) ··· M̃_{1pn}(θ) ; ⋮ ⋱ ⋮ ; M̃_{p1n}(θ) ··· M̃_{ppn}(θ) ] := Ṽ_Dn^{-1/2}(θ) ∈ R^{kp×kp} and M̃_{jℓn}(θ) ∈ R^{k×k} for j, ℓ ≤ p.   (5.4)

The function vec_{k,p}^{-1}(·) is the inverse of the vec(·) function for k × p matrices.²⁹ Similarly, Smith's (2007) nonlinear CLR test relies on a rank statistic that is a function of D̂_n†(θ). We refer to D̂_n†(θ) as the Jacobian-variance-weighted orthogonalized sample Jacobian.

For example, Kleibergen's (2005, 2007) rank statistic based on the Robin and Smith (2000) statistic is

rk_n(θ) := λ_min( n D̂_n†(θ)' D̂_n†(θ) ).   (5.5)
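The mapping (5.4)-(5.5) from D̂_n and Ṽ_Dn to the rank statistic can be sketched as follows; a symmetric matrix square root is one natural choice for Ṽ_Dn^{-1/2} (the excerpt does not commit to a particular root), and the function name is ours.

```python
import numpy as np

def robin_smith_rank_stat(D, V_D, n):
    """Eqns. (5.4)-(5.5): rk_n = lambda_min(n * D†' D†), where
    D† = vec^{-1}_{k,p}(V_D^{-1/2} vec(D)).
    D: k x p orthogonalized sample Jacobian D̂_n (defined in Section 4);
    V_D: kp x kp symmetric pd estimate Ṽ_Dn; n: sample size."""
    k, p = D.shape
    w, Q = np.linalg.eigh(V_D)                 # eigendecomposition of V_D
    V_inv_half = (Q * w ** -0.5) @ Q.T         # symmetric inverse square root
    vecD = D.reshape(-1, order="F")            # column-stacking vec(D)
    D_dag = (V_inv_half @ vecD).reshape(k, p, order="F")
    return n * np.linalg.eigvalsh(D_dag.T @ D_dag).min()
```

With V_D = I_{kp}, the statistic reduces to n times the smallest eigenvalue of D'D, i.e., n times the squared smallest singular value of D.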
The asymptotic null distribution of n^{1/2} D̂_n† T_n† is given in the following theorem.³⁰ Here T_n† is a nonstochastic p × p matrix that rotates D̂_n† by an orthogonal matrix and then rescales the resulting columns so that n^{1/2} D̂_n† T_n† has a non-degenerate asymptotic distribution. We let {λ_{n,h} : n ≥ 1} index a sequence of distributions {F_n : n ≥ 1} that has certain properties, including convergence of

E_Fn G_i and Var_Fn( (f_i', vech(f_i f_i')')' ), where f_i := (g_i', vec(G_i − E_Fn G_i)')',   (5.6)

²⁹ Thus, the domain of vec_{k,p}^{-1}(·) consists of kp-vectors and its range consists of k × p matrices.
³⁰ As mentioned above, for notational simplicity, we often drop the dependence on θ_0 for statistics that are computed under the null hypothesis value θ = θ_0. Thus, D̂_n† and T_n† denote D̂_n†(θ_0) and T_n†(θ_0), respectively.
and convergence (possibly to infinity) of certain functions of n^{1/2} E_Fn G_i. In (5.6), vech(·) denotes the half-vectorization operator that vectorizes the elements in the columns of a symmetric matrix that are on and below the main diagonal. We define T_n† and {λ_{n,h} : n ≥ 1} precisely in Section 18 in the Supplemental Material, see (18.9) and (18.28), rather than here. The reason is that it takes several pages to define these quantities precisely, and the exact form of these quantities is not important. What is important is the general form of the asymptotic distribution of n^{1/2} D̂_n† T_n†, which can be specified without these definitions.
The following theorem is a key ingredient in determining the asymptotic size of Kleibergen's CLR test with Jacobian-variance weighting when p ≥ 2. For this CLR test based on the Robin and Smith (2000) rank statistic (defined in (5.5)), the asymptotic size is determined and a formula for it is stated in Section 18 in the Supplemental Material. The formula for asymptotic size is given by the supremum of the asymptotic null rejection probabilities over sequences of distributions with different identification strengths. For some sequences, the asymptotic versions of the sample moments and the (suitably normalized) Jacobian-variance weighted orthogonalized sample Jacobian are independent, and the asymptotic null rejection probabilities are necessarily equal to the nominal size α.

However, when p ≥ 2, for some sequences, these asymptotic quantities are not necessarily independent, and the asymptotic null rejection probabilities are not necessarily equal to the nominal size α. (The problematic sequences of distributions are of the nonstandard weak identification type, which requires p ≥ 2.) The asymptotic null rejection probabilities could be larger or smaller than α (or both) depending on the model. If they are larger (or larger and smaller), the test does not have correct asymptotic size and is not asymptotically similar. If they are smaller, the test has correct asymptotic size, but is not asymptotically similar. The outcome that obtains depends on the specific model and moment conditions. Hence, when p ≥ 2, we cannot say that, under general conditions, the Jacobian-variance weighted CLR test has correct asymptotic size.

Although the asymptotic size formula for the Jacobian-variance weighted CLR test is an important result of this paper, it is stated in the Supplemental Material because the notation and definitions needed to state it are extremely lengthy. Instead, we state the following result here, which shows why we cannot show that this CLR test necessarily has correct asymptotic size when p ≥ 2.
Theorem 5.1 Under the null hypothesis H_0: θ = θ_0 and under all sequences {λ_{n,h} : n ≥ 1} with λ_{n,h} ∈ Λ_KCLR for all n ≥ 1 (as defined in Section 18.2 in the Supplemental Material), n^{1/2}(ĝ_n, D̂_n† T_n†) →_d (g_h, Δ_h† + M_h†), where (g_h, Δ_h†, M_h†) has a multivariate normal distribution whose mean and variance matrix depend on lim Var_Fn((f_i', vech(f_i f_i')')') and on the limits of certain functions of E_Fn G_i, and g_h and Δ_h† are independent.
Comments: (i) The quantities g_h, Δ_h†, and M_h†, which appear in Theorem 5.1, are complicated nonrandom linear functions of a mean-zero multivariate normal random vector L_h whose variance matrix equals the limit of the variance that appears in (5.6). These linear functions are given explicitly in (18.13), (18.15), and (18.19) in Section 18 in the Supplemental Material.

(ii) When trying to show that Kleibergen's (2005, 2007) and Smith's (2007) CLR tests have correct asymptotic size, one needs the conditional asymptotic distributions of the LM statistic and the statistic J_n(θ_0) := AR_n(θ_0) − LM_n(θ_0), given the asymptotic rank statistic, which is a nonrandom function of Δ_h† + M_h†, to be χ²_p and χ²_{k−p} distributions, respectively.³¹ The asymptotic distributions of LM_n(θ_0) and J_n(θ_0) are quadratic forms in g_h with random idempotent weight matrices that depend on Δ_h† + M_h†. If M_h† = 0^{k×p} a.s., then conditional on Δ_h†, these asymptotic distributions are χ²_p and χ²_{k−p} distributions, as desired, because g_h and Δ_h† are independent. Alternatively, if (M_h†, Δ_h†) is independent of g_h, one obtains the desired conditional asymptotic distributions given (M_h†, Δ_h†). However, when M_h† ≠ 0^{k×p} with positive probability, one typically does not get the desired conditional asymptotic distributions, because M_h† and g_h typically are correlated in this case.
(iii) In some scenarios, M_h† = 0^{k×p} a.s. This always occurs if p = 1.³² If p ≥ 2, it occurs if E_Fn G_i → 0^{k×p}, which covers the cases where all of the parameters are weakly identified in the standard sense or semi-strongly identified. If p ≥ 2, it also occurs if the smallest singular value of n^{1/2} E_Fn G_i diverges to infinity, which covers the case where all of the parameters are strongly or semi-strongly identified.

In addition, (M_h†, Δ_h†) is independent of g_h if g_i and f_i f_i' are uncorrelated (for all F in the parameter space of interest), which holds in some special cases. For example, in a homoskedastic linear IV model with p rhs endogenous variables and fixed IV's, it holds if (i) the reduced-form equation error vector V_2i is of the form V_2i = K_1 u_i + K_2 ε_i, where u_i is the structural equation error, K_1 is some constant p vector, K_2 is some constant p × p matrix, and ε_i is some mean-zero random p vector, (ii) u_i is independent of ε_i, and (iii) u_i is symmetrically distributed about zero with its first three moments finite. These conditions hold if (u_i, V_2i')' has a multivariate normal distribution, but fail for most joint distributions of (u_i, V_2i')'.³³,³⁴

³¹ See the proof of Theorem 10.1 for details.
³² The proof of this is given in Comment (ii) to Theorem 18.3 in the Supplemental Material.
³³ The correlation between g_i and f_i f_i' is zero in this case by the following: y_1i = Y_2i'θ + u_i, Y_2i = π'Z_i + V_2i, g_i = Z_i u_i, G_i = −Z_i Y_2i', and f_i = (u_i, V_2i')' ⊗ Z_i. In consequence, the product of any element of g_i and any element of f_i f_i' is of the form of a constant times Z_is Z_it Z_iℓ times a linear combination (with constant coefficients) of u_i³, u_i ε_ij², u_i ε_ij ε_im, and u_i² ε_ij for some s, t, ℓ, j, m ≥ 1, where Z_is and ε_ij denote the sth element of Z_i and the jth element of ε_i, respectively. The expectations of these terms are all zero under conditions (i)-(iii).
³⁴ In addition, lack of correlation between g_i and f_i f_i' typically does not hold if the IV's are random and independent of (u_i, V_2i')'. This is a consequence of the definition of E_F G_i being different between the fixed and random IV cases.
Typically, M_h† is non-zero (with positive probability) and correlated with g_h whenever some parameters are strongly identified and others are weakly identified in either the standard sense or in a jointly weakly-identified sense. In consequence, in general, when p ≥ 2, one cannot verify that Kleibergen's (2005, 2007) and Smith's (2007) CLR tests have correct asymptotic size using the standard proof. Depending upon the particular sequence of distributions considered and the particular moment functions considered, the correlation between g_h and Δ_h† + M_h† could increase or decrease the asymptotic null rejection probability from the nominal probability α.

(iv) Numerical simulations of a linear IV model (with p = 2, one parameter strongly identified, one parameter weakly identified, and a particular distribution of the errors) corroborate the finding that M_h† and g_h can be correlated asymptotically; see Section 18.3 in the Supplemental Material for details. In the model considered, the simulated asymptotic null rejection probabilities are found to be in [4.95, 5.01], which are very close to the test's nominal size of 5.00. Whether this occurs for a wide range of error distributions and for other moment condition models is an open question. It appears that this question needs to be answered on a case-by-case basis.
(v) If the random weight matrix Ṽ_Dn^{-1/2}(θ) is replaced in the definition of D̂_n†(θ) by the nonrandom quantity that it is estimating, call it V_Dn^{-1/2}(θ), then the asymptotic distribution of the quantities in Theorem 5.1 is given by (g_h, Δ_h†), where g_h and Δ_h† are independent. Thus, the appearance of M_h† in Theorem 5.1 is due to the estimation of the weight matrix. If V_Dn^{-1/2}(θ) is known (which almost never occurs in practice) and is used to define D̂_n†(θ), then the Kleibergen (2005, 2007) and Smith (2007) CLR tests can be shown to have correct asymptotic size even when p ≥ 2.

(vi) The reason that the estimator Ṽ_Dn^{-1/2} affects the limit distribution of n^{1/2} D̂_n† T_n† is that it weights the columns of D̂_n differently. If one bases the rank statistic on W̃_n D̂_n, where W̃_n (= W̃_n(θ_0)) is some random k × k matrix that converges in probability to a nonsingular matrix, then the nondegenerate asymptotic distribution of W̃_n (after suitable normalization) does not affect the asymptotic distribution of W̃_n D̂_n; only the plim of W̃_n does (and the corresponding CLR test has correct asymptotic size). The proof is given in Section 18.5 in the Supplemental Material.

(vii) In Section 18.1 in the Supplemental Material, we provide an example that illustrates the results of Theorem 5.1 and Comments (iv) and (v) to Theorem 5.1.
(viii) Given the result of Theorem 5.1, we do not recommend using a rank statistic that depends on an estimator of the asymptotic variance matrix of vec(D̂_n(θ)) (after suitable normalization) when p ≥ 2.

(ix) The CLR test with Jacobian-variance weighting (in the rank statistic) is asymptotically efficient in a GMM sense under strong IV's provided rk_n(θ) →_p ∞ under strong IV's, which is the case for all of the rank tests considered in the literature.³⁵
As indicated in Comment (iii) to Theorem 5.1, when p = 1, M_h† = 0^{k×p} a.s. In consequence, Kleibergen's (2005) CLR test has correct asymptotic size when p = 1 for a suitable parameter space of distributions F and a suitable rank statistic, such as that in (5.5). We consider the parameter space

F_JVW,p=1 := {F ∈ F : λ_min( Φ_F^{G_i} − E_F G_i E_F G_i' ) ≥ δ_3}   (5.7)

for some δ_3 > 0. For the corresponding CS, we consider the parameter space F_Θ,JVW,p=1 := {(F, θ_0) : F ∈ F_JVW,p=1(θ_0), θ_0 ∈ Θ}, where F_JVW,p=1(θ_0) denotes the set F_JVW,p=1 defined in (5.7) with its dependence on θ_0 made explicit.

We have F_JVW,p=1 ⊂ F_00 (⊂ F_0) when δ_3 = δ_2 (by (3.7) and condition (iv) in (3.9)), where F_00 = F_0j with j = 0 (for F_0j defined in (3.7)) and F_0 is the parameter space for which the moment-variance weighted CLR test has correct asymptotic size; see Theorem 6.1 below. When p = 1, F_0 = F_00 ∪ F_01 and the set F_01 places no restrictions on the variance matrix or outer-product matrix of the orthogonalized sample Jacobian (i.e., Φ_1F(·)). The parameter space F_JVW,p=1 cannot be enlarged to include a set like F_01, because the condition on the variance matrix of the orthogonalized sample Jacobian, Φ_F^{G_i} − E_F G_i E_F G_i', in F_JVW,p=1 is needed to obtain the nonsingularity of the probability limit of the weight matrix Ṽ_Dn.

When p = 1, the Robin and Smith (2000) rank statistic given in (5.5) (with θ = θ_0), which is based on Kleibergen's (2005, 2007) recommended Jacobian-variance weight matrix Ṽ_Dn^{-1/2}, reduces to

rk_n := n D̂_n' Ṽ_Dn^{-1} D̂_n.   (5.8)
Theorem 5.2 Suppose p = 1. The asymptotic size of the CLR test with Jacobian-variance weighting, defined by (5.1), (5.2), and (5.8), equals its nominal size α ∈ (0, 1) for the parameter space F_JVW,p=1. Furthermore, this CLR test is asymptotically similar (in a uniform sense) for this parameter space. Analogous results hold for the CLR CS with Jacobian-variance weighting for the parameter space F_Θ,JVW,p=1.

Comment: Correct asymptotic size holds for Kleibergen's CLR test with Jacobian-variance weighting when p = 1 because D̂_n has only one column in this case, so it is impossible to have unequal column weights.
³⁵ This holds because all CLR tests of the form in (5.1) and (5.2) are asymptotically equivalent to the LM test in (4.3) under the null and n^{-1/2} local alternatives under strong IV's, by (10.3) and (10.4) in the proof of Theorem 10.1 in Section 10 in the Appendix, and, as noted above, the LM test is asymptotically efficient in a GMM sense under strong IV's. Note that, by definition in (4.3), the LM statistic uses moment-variance weighting of D̂_n(θ) in its projection matrix.

6 Kleibergen's CLR Test with Moment-Variance Weighting
Newey and Windmeijer (2009) and Guggenberger, Ramalho, and Smith (2012) consider a version of Kleibergen's (2005) CLR test that uses a rank statistic that depends on

Ω̂_n^{-1/2}(θ) D̂_n(θ),   (6.1)

rather than D̂_n†(θ). We refer to Ω̂_n^{-1/2}(θ) D̂_n(θ) as the moment-variance-weighted orthogonalized sample Jacobian. This choice gives equal weight to each of the columns of D̂_n. In this section, we show that this choice, combined with the Robin and Smith (2000) rank statistic, yields a nonlinear CLR test that has correct asymptotic size for the parameter space F_0. In this case, the rank statistic is

rk_n(θ) := λ_min( n D̂_n(θ)' Ω̂_n^{-1}(θ) D̂_n(θ) ).   (6.2)
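For comparison with the Jacobian-variance weighted version, the rank statistic (6.2) requires only Ω̂_n and D̂_n and applies the same k × k weight Ω̂_n^{-1/2} to every column of D̂_n. A sketch (the function name is ours):

```python
import numpy as np

def moment_variance_rank_stat(D, Omega, n):
    """Eqn. (6.2): rk_n = lambda_min(n * D' Omega^{-1} D).
    D: k x p orthogonalized sample Jacobian D̂_n; Omega: k x k moment-variance
    estimate Ω̂_n (symmetric pd); n: sample size."""
    M = D.T @ np.linalg.solve(Omega, D)      # p x p matrix D' Omega^{-1} D
    return n * np.linalg.eigvalsh(M).min()
```

Rescaling Ω̂_n rescales rk_n proportionally, which is one way to see why the choice of pre-multiplication weight matrix matters for the magnitude of the rank statistic.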
Theorem 6.1 The asymptotic size of the CLR test with moment-variance weighting, defined by (5.1), (5.2), and (6.2), equals its nominal size α ∈ (0, 1) for the parameter space F_0 (defined in (3.7)). Furthermore, this CLR test is asymptotically similar (in a uniform sense) for this parameter space. Analogous results hold for the CLR CS with moment-variance weighting for the parameter space F_Θ,0, defined in (3.14).
Comments: (i) Neither Newey and Windmeijer (2009) nor Guggenberger, Ramalho, and Smith (2012) provide an asymptotic size result like that in Theorem 6.1. Guggenberger, Ramalho, and Smith (2012) provide asymptotic null rejection probabilities only under Stock and Wright's (2000) Assumption C, plus a high-level condition that involves the asymptotic behavior of the rank statistic. Verifying this high-level assumption under parameter sequences that satisfy Assumption C turns out to be very challenging. We do so in this paper; also see Comment (ii). But note that the proof of Theorem 6.1, given in Section 10 in the Appendix, involves much more than this. It is complicated because it needs to consider a broad array of different types of identification, ranging from standard weak identification, to joint weak identification, to semi-strong and strong identification.
(ii) The proof of Theorem 6.1 actually allows for the use of any rank statistic that satisfies an assumption called Assumption R, which is stated in Section 10, not just the rank statistic rk_n(θ) in (6.2). Assumption R is verified using Theorem 8.4 below for the rank statistic in (6.2). With some changes, Assumption R can be verified using Theorem 8.4 when the rank statistic is of an "equally-weighted" Robin-Smith form, but with a different weight matrix than in (6.2). That is, Assumption R can be verified when rk_n(θ) is as in (6.2) but with Ω̂_n^{-1/2}(θ) D̂_n(θ) replaced by W̃_n(θ) D̂_n(θ) for some k × k weight matrix W̃_n(θ) that is positive definite (pd) asymptotically. (This is what we mean by equally-weighted.) This is done in Section 18.5 in the Supplemental Material. In contrast, by Theorem 5.1, when p ≥ 2, Assumption R typically does not hold for any rank statistic that depends on the Jacobian-variance weighted statistic D̂_n†(θ).

(iii) The CLR test considered in Theorem 6.1 is asymptotically efficient in a GMM sense under strong IV's provided rk_n(θ) →_p ∞ under strong IV's; see Comment (iii) to Theorem 4.1 for more details.

(iv) Assumption R likely holds for the Cragg and Donald (1996, 1997) and Kleibergen and Paap (2006) rank statistics when they are based on an equally-weighted function of D̂_n(θ). However, showing this is not easy. We do not do so here.
Although the rank statistic in (6.2) yields a test with correct asymptotic size, it has some drawbacks. The use of the pre-multiplication weight matrix Ω̂_n^{-1/2}(θ) and no post-multiplication weight matrix for D̂_n(θ) is arbitrary. The choice of these weight matrices is important for power purposes because it is a major determinant of the magnitude of rk_n(θ), and the latter enters both the test statistic and the data-dependent critical value function. We show in Section 14 in the Supplemental Material to AG2 that the rank statistic in (6.2) does not reduce to the rank statistic in Moreira's (2003) CLR test in the homoskedastic normal linear IV regression model with fixed regressors even when p = 1. Specifically, the rk_n(θ) statistic in (6.2) differs asymptotically from the rank statistic in Moreira's CLR test by a scale factor that can range between 0 and ∞ depending on the scenario considered. This is undesirable because Moreira's CLR test has been shown to have some approximate optimal power properties in the aforementioned model when p = 1.

In addition, the CLR test with moment-variance weighting, which is considered in this section, has correct asymptotic size for the parameter space F_0, but not necessarily for the larger parameter space F.

These disadvantages motivate interest in the SR-CQLR_1 and SR-CQLR_2 tests considered in AG2.
7 Time Series Observations

In this section, we generalize the results of Theorems 4.1, 5.2, and 6.1 from i.i.d. observations to strictly stationary strong mixing observations. In the time series case, F denotes the distribution of the stationary infinite sequence {W_i : i = ..., 0, 1, ...}.³⁶ Let a_i be a random vector that depends on W_i, such as vec(G_i) or C'_{F,k−j} Ω_F^{-1/2} G_i B_{F,p−j}. In the time series case, we define Ω_F, Γ_F^{a_i}, V_F^{a_i}, and Φ_F^{a_i} differently from their definitions in (3.2) for the i.i.d. case.

³⁶ Asymptotics under drifting sequences of true distributions {F_n : n ≥ 1} are used to establish the correct asymptotic size of the LM and CLR tests. Under such sequences, the observations form a triangular array of row-wise strictly stationary observations.
For the time series case, we define Ω_F, Γ_F^{a_i}, V_F^{a_i}, and Φ_F^{a_i} as follows:³⁷

V_F^{a_i} := Σ_{m=−∞}^{∞} E_F (a_i − E_F a_i)(a_{i−m} − E_F a_i)',  Γ_F^{a_i} := Σ_{m=−∞}^{∞} E_F a_i g_{i−m}',
Ω_F := Σ_{m=−∞}^{∞} E_F g_i g_{i−m}',  and  Φ_F^{a_i} := V_F^{a_i} − Γ_F^{a_i} Ω_F^{-1} Γ_F^{a_i}'.   (7.1)

Note that Φ_F^{a_i} = lim Var_F( n^{-1/2} Σ_{i=1}^n (a_i − Γ_F^{a_i} Ω_F^{-1} g_i) ).³⁸
The time series analogue F_TS of the space of distributions F, defined in (3.1), is

F_TS := {F : {W_i : i = ..., 0, 1, ...} are stationary and strong mixing under F with strong mixing numbers {α_F(m) : m ≥ 1} that satisfy α_F(m) ≤ C m^{−d}, E_F g_i = 0^k, E_F ||(g_i', vec(G_i)')'||^{2+γ} ≤ M, and λ_min(Ω_F) ≥ δ}   (7.2)

for some γ, δ > 0, d > (2 + γ)/γ, and C, M < ∞, where Ω_F is defined in (7.1).
We define the time series parameter spaces of distributions F_TS,0 and {F_TS,0j : 0 ≤ j ≤ p} as F_0 and {F_0j : 0 ≤ j ≤ p} are defined in (3.7), but with F_TS in place of F, with Φ_F^{a_i} defined as in (7.1), and with the definitions of (κ_1F, ..., κ_pF), B_F, and C_F in (3.3)-(3.5) employing the definition of Ω_F in (7.1). We define the time series parameter space of distributions F_TS,JVW,p=1 as F_JVW,p=1 is defined in (5.7), but with F_TS in place of F, with Φ_F^{G_i} and V_F^{G_i} defined as in (7.1), and with E_F G_i E_F G_i' deleted (because V_F^{G_i} is defined to be E_F(G_i − E_F G_i)(G_i − E_F G_i)' in the time series case, rather than E_F G_i G_i'). That is, F_TS,JVW,p=1 := {F ∈ F_TS : λ_min(Φ_F^{G_i}) ≥ δ_3} for some δ_3 > 0. For CS's, we use the parameter spaces F_Θ,TS,0 := {(F, θ_0) : F ∈ F_TS,0(θ_0), θ_0 ∈ Θ} and F_Θ,TS,JVW,p=1 := {(F, θ_0) : F ∈ F_TS,JVW,p=1(θ_0), θ_0 ∈ Θ}, where F_TS,0(θ_0) and F_TS,JVW,p=1(θ_0) denote F_TS,0 and F_TS,JVW,p=1 with their dependence on θ_0 made explicit.

The sufficient conditions for the λ_{p−j}(·) condition in F_0j provided in (3.9) and (3.10) also hold in the time series setting with Φ_F^{a_i} defined as in (7.1).

³⁷ Note that the definition of V_F^{a_i} in (7.1) differs from its definition in (3.2) in two ways. First, there are the lag m ≠ 0 terms. Second, there is the re-centering of a_i by its mean E_F a_i. Re-centering is needed in the time series context to ensure that V_F^{a_i} is a convergent sum. In the i.i.d. case, we avoid re-centering because without it the restriction in F_0, defined in (3.7), is weaker.
³⁸ This follows by calculations analogous to those in (19.3) and (19.4) in the proof of Theorem 7.1 below.
Now, we define the LM and CLR test statistics in the time series context. To do so, we let

V_F := lim Var_F( n^{-1/2} Σ_{i=1}^n (g_i', vec(G_i)')' )
     = Σ_{m=−∞}^{∞} E_F [ (g_i', vec(G_i − E_F G_i)')' (g_{i−m}', vec(G_{i−m} − E_F G_i)') ].   (7.3)

The second equality holds for all F ∈ F_TS (as shown in the proof of Lemma 19.1 in Section 19 in the Supplemental Material).
The test statistics depend on an estimator V̂_n(θ_0) of V_F. This estimator is (typically) a heteroskedasticity and autocorrelation consistent (HAC) variance estimator based on the observations {f_i − f̂_n : i ≤ n}, where f_i := (g_i', vec(G_i)')' and f̂_n(θ) := (ĝ_n(θ)', vec(Ĝ_n(θ))')'. There are a number of HAC estimators available in the literature, e.g., see Newey and West (1987a) and Andrews (1991). The asymptotic size and similarity properties of the tests are the same for any consistent HAC estimator. Hence, for generality, we do not specify a particular estimator V̂_n(θ_0). Rather, we state results that hold for any estimator V̂_n(θ_0) that satisfies the following consistency condition when the null value θ_0 is the true value.
Assumption V: V̂_n(θ_0) − V_Fn →_p 0^{(p+1)k×(p+1)k} under {F_n : n ≥ 1} for any sequence {F_n ∈ F_TS : n ≥ 1} for which V_Fn → V for some pd matrix V.
We write the (p + 1)k × (p + 1)k matrix V̂_n(θ) in terms of its k × k submatrices:

V̂_n(θ) = [ Ω̂_n(θ)      Γ̂_1n(θ)'     ···  Γ̂_pn(θ)'    ;
            Γ̂_1n(θ)     V̂_G11n(θ)    ···  V̂_Gp1n(θ)'  ;
            ⋮            ⋮            ⋱    ⋮           ;
            Γ̂_pn(θ)     V̂_Gp1n(θ)    ···  V̂_Gppn(θ)   ].   (7.4)

Under Assumption V, Ω̂_n(θ_0) →_p Ω_F under F and Γ̂_n(θ_0) := (Γ̂_1n(θ_0)', ..., Γ̂_pn(θ_0)')' →_p Γ_F^{vec(G_i)} under F.
In the time series case, for the LM test, the CLR test with moment-variance weighting, and, when p = 1, the CLR test with Jacobian-variance weighting, the definitions of the statistics ĝ_n(θ), Ĝ_n(θ), AR_n(θ), LM_n(θ), D̂_n(θ), CLR_n(θ), and rk_n(θ) are the same as in (4.1)-(5.1), but with Ω̂_n(θ) and Γ̂_jn(θ) for j = 1, ..., p defined as in Assumption V and (7.4) rather than as in Sections 4 and 5. In addition, when p = 1, for the CLR test with Jacobian-variance weighting, in the definition of Ṽ_Dn in (5.3), the matrix n^{-1} Σ_{i=1}^n vec(G_i(θ) − Ĝ_n(θ)) vec(G_i(θ) − Ĝ_n(θ))' is replaced by the lower right pk × pk submatrix of V̂_n(θ) in (7.4) (and Ω̂_n(θ) and Γ̂_jn(θ) for j = 1, ..., p are defined as in (7.4)). With these changes, the critical values for the time series case are defined in the same way as in the i.i.d. case.

For the time series case, the asymptotic size and similarity results for the tests described above are as follows.

Theorem 7.1 Suppose the LM test, the CLR test with moment-variance weighting, and, when p = 1, the CLR test with Jacobian-variance weighting are defined as in this section, the parameter space for F is F_TS,0 for the first two tests and F_TS,JVW,p=1 for the third test, and Assumption V holds. Then, these tests have asymptotic sizes equal to their nominal size α ∈ (0, 1) and are asymptotically similar (in a uniform sense). Analogous results hold for the corresponding CS's for the parameter spaces F_Θ,TS,0 and F_Θ,TS,JVW,p=1.
Appendix

This Appendix provides proofs of some of the results stated in the paper and shows that the eigenvalue condition in F_0 is not redundant. For brevity, other proofs are provided in the Supplemental Material to this paper, given in Andrews and Guggenberger (2014b). Section 8 in this Appendix states some basic results that are used in all of the proofs. For brevity, these results are proved in Sections 14-16 in the Supplemental Material. These results also are used in Andrews and Guggenberger (2014a) and should be useful for establishing the asymptotic sizes of other tests for moment condition models when strong identification is not assumed. Given the results in Section 8, Section 9 proves Theorem 4.1, Section 10 proves Theorem 6.1, and Section 11 proves Theorem 5.2. Theorem 5.1 is proved in Section 18 in the Supplemental Material. Section 12 shows that the eigenvalue condition in F_0, defined in (3.7), is not redundant in Theorems 4.1, 5.2, and 6.1.

For notational simplicity, throughout the Appendix, we often suppress the argument θ_0 for various quantities that depend on the null value θ_0.

8 Basic Framework and Results for the Proofs

8.1 Uniformity
The proofs of Theorems 4.1, 5.2, and 6.1 use Corollary 2.1(c) in Andrews, Cheng, and Guggenberger (2009) (ACG). The latter result provides general sufficient conditions for the correct asymptotic size and (uniform) asymptotic similarity of a sequence of tests.
We now state Corollary 2.1(c) of ACG. Let {φ_n : n ≥ 1} be a sequence of tests of some null hypothesis whose null distributions are indexed by a parameter λ with parameter space Λ. Let RP_n(λ) denote the null rejection probability of φ_n under λ. For a finite nonnegative integer J, let {h_n(λ) = (h_1n(λ), ..., h_Jn(λ))' ∈ R^J : n ≥ 1} be a sequence of functions on Λ. Define

H := {h ∈ (R ∪ {±∞})^J : h_wn(λ_wn) → h for some subsequence {w_n} of {n} and some sequence {λ_wn ∈ Λ : n ≥ 1}}.   (8.1)

Assumption B*: For any subsequence {w_n} of {n} and any sequence {λ_wn ∈ Λ : n ≥ 1} for which h_wn(λ_wn) → h ∈ H, RP_wn(λ_wn) → α for some α ∈ (0, 1).
Proposition 8.1 (ACG, Corollary 2.1(c)) Under Assumption B*, the tests {φ_n : n ≥ 1} have asymptotic size α and are asymptotically similar (in a uniform sense). That is,

AsySz := lim sup_{n→∞} sup_{λ∈Λ} RP_n(λ) = α and lim inf_{n→∞} inf_{λ∈Λ} RP_n(λ) = lim sup_{n→∞} sup_{λ∈Λ} RP_n(λ).
Comments: (i) By Comment 4 to Theorem 2.1 of ACG, Proposition 8.1 provides asymptotic size and similarity results for nominal 1 − α CS's, rather than tests, by defining λ as one would for a test, but having it depend also on the parameter that is restricted by the null hypothesis, by enlarging the parameter space Λ correspondingly (so it includes all possible values of the parameter that is restricted by the null hypothesis), and by replacing (i) φ_n by a CS based on a sample of size n, (ii) α by 1 − α, (iii) RP_n(λ) by CP_n(λ), where CP_n(λ) denotes the coverage probability of the CS under λ when the sample size is n, and (iv) the first lim sup_{n→∞} sup_{λ∈Λ} that appears by lim inf_{n→∞} inf_{λ∈Λ}. In the present case, where the null hypotheses are of the form H_0: θ = θ_0, θ_0 is taken to be a subvector of λ, and Λ is specified so that the value of this subvector ranges over Θ.
(ii) In the application of Proposition 8.1 to prove Theorems 4.1 and 6.1, one takes λ to be a one-to-one transformation of F_0 for tests, and one takes λ to be a one-to-one transformation of F_Θ,0 for CS's. With these changes, the proofs for tests and CS's are the same. In consequence, we provide explicit proofs for tests only and obtain the proofs for CS's by analogous applications of Proposition 8.1. In the application of Proposition 8.1 to prove Theorem 5.2, the same is done but with F_JVW,p=1 in place of F_0.

(iii) We prove the test results in Theorems 4.1, 5.2, and 6.1 using Proposition 8.1 by verifying Assumption B* for suitable choices of λ and h_n(λ).

8.2 Random Weight Matrices Ŵ_n and Û_n
We prove results for statistics that depend on random weight matrices Ŵ_n ∈ R^{k×k} and Û_n ∈ R^{p×p}. In particular, we consider statistics of the form Ŵ_n D̂_n Û_n and functions of this statistic, where D̂_n is defined in (4.3). The definitions of the random weight matrices Ŵ_n and Û_n depend upon the statistic that is of interest. They are taken to be of the form

Ŵ_n := W_1(Ŵ_2n) ∈ R^{k×k} and Û_n := U_1(Û_2n) ∈ R^{p×p},   (8.2)

where Ŵ_2n and Û_2n are random finite-dimensional quantities, such as matrices, and W_1(·) and U_1(·) are nonrandom functions that are assumed below to be continuous on certain sets. The estimators Ŵ_2n and Û_2n have corresponding population quantities W_2F and U_2F, respectively. For examples, see Examples 1-3 immediately below. Thus, the population quantities corresponding to Ŵ_n and Û_n are

W_F := W_1(W_2F) and U_F := U_1(U_2F),   (8.3)

respectively.
Example 1: With Kleibergen's (2005) LM test and the CLR test with moment-variance weighting, which are considered in Sections 4 and 6, respectively, we take

Ŵ_n = Ω̂_n^{-1/2} and Û_n = I_p.   (8.4)

In this case, the functions W_1(·) and U_1(·) are the identity functions, and the corresponding population quantities are W_F = W_2F = Ω_F^{-1/2}, where Ω_F := E_F g_i g_i', see (3.2), and U_F = U_2F = I_p.
Example 2: For a CLR test based on an equally-weighted statistic other than Ω̂_n^{-1/2} D̂_n, such as W̃_n D̂_n, as in Comment (ii) to Theorem 6.1, one defines a pd matrix W̃_n as desired and one takes Ŵ_n = W̃_n and Û_n = U_F = U_2F = I_p.

Example 3: With Kleibergen's (2005) CLR test with Jacobian-variance weighting and p = 1, which is considered in Section 5, we determine the asymptotic distribution of the rank statistic in (5.8) by taking Ŵ_n = Ṽ_Dn^{-1/2} and Û_n = I_p. In this case, the functions W_1(·) and U_1(·) are as in Example 1, and the corresponding population quantities are W_F = W_2F = (Var_F(vec(G_i)) − Γ_F^{vec(G_i)} Ω_F^{-1} Γ_F^{vec(G_i)}')^{-1/2} = (Φ_F^{vec(G_i)} − E_F G_i E_F G_i')^{-1/2}, and U_F = U_2F = I_p. For this test, we need the asymptotic distribution of the LM statistic. In consequence, for this test, we also establish some asymptotic results with Ŵ_n and Û_n defined as in Example 1.
Examples 4 & 5: The results of this section are used in AG2 when the asymptotic sizes of two new SR-CQLR tests are determined. For the SR-CQLR tests, Ŵ_n = Ω̂_n^{-1/2} and it is convenient to take W_1(·) = (·)^{-1/2} and Ŵ_2n = Ω̂_n, and the matrix Û_n is a nonlinear transformation U_1(·) of a matrix estimator, which is different for the two tests. For brevity, we do not define the nonlinear transformation or the two matrix estimators here.
We provide results for distributions $F$ in the following set of null distributions:

$\mathcal{F}_{WU} := \{F \in \mathcal{F} : \lambda_{\min}(W_F) \ge \delta_{WU},\ \lambda_{\min}(U_F) \ge \delta_{WU},\ \|W_F\| \le M_{WU},\ \text{and } \|U_F\| \le M_{WU}\}$    (8.5)

for some constants $\delta_{WU} > 0$ and $M_{WU} < \infty$, where $\mathcal{F}$ is defined in (3.1). The set $\mathcal{F}_{WU} \cap \mathcal{F}_0$ is used to establish results for Kleibergen's LM test and the CLR test with moment-variance weighting, considered in Section 6, using the fact that $\mathcal{F}_0 = \mathcal{F}_{WU} \cap \mathcal{F}_0$ for $\delta_{WU} > 0$ sufficiently small and $M_{WU} < \infty$ sufficiently large. This holds because, for all $F \in \mathcal{F}_0$, $\lambda_{\min}(W_F) = \lambda_{\min}(\Omega_F^{-1/2}) = \lambda_{\max}(\Omega_F)^{-1/2} \ge \|\Omega_F\|^{-1/2} \ge M^{-1/2}$ for some $M < \infty$ (because $\|\Omega_F\| = \|E_F g_i g_i'\| \le M < \infty$ by the moment conditions in $\mathcal{F}$), $\|W_F\| = \|\Omega_F^{-1/2}\|$ is bounded above by a finite constant (using the $\lambda_{\min}(E_F g_i g_i') \ge \delta$ condition in $\mathcal{F}$, where $\delta > 0$), $\lambda_{\min}(U_F) = \lambda_{\min}(I_p) = 1$, and $\|U_F\| = \|I_p\| = p^{1/2}$.
8.3 Reparametrization
To apply Proposition 8.1, we reparametrize the null distribution $F$ to a vector $\lambda$. The vector $\lambda$ is chosen such that convergence of a drifting subsequence of a subvector of $\lambda$ (after suitable renormalization) yields convergence in distribution of the test statistic and convergence in distribution of the critical value in the case of the CLR tests.

To be consistent with the use of general weight matrices $\widehat{W}_n$ and $\widehat{U}_n$ in this section, we provide more general definitions of $\tau_{jF}$, $B_F$, and $C_F$ here than are given in Section 3. These general definitions reduce to the definitions given in Section 3 when $W_F = \Omega_F^{-1/2}$ and $U_F = I_p$.

The vector $\lambda$ depends on the following quantities. Let

$B_F$ denote a $p \times p$ orthogonal matrix of eigenvectors of $U_F'(E_F G_i)' W_F' W_F (E_F G_i) U_F$    (8.6)

ordered so that the corresponding eigenvalues $(\kappa_{1F}, \ldots, \kappa_{pF})$ are nonincreasing. The matrix $B_F$ is such that the columns of $W_F(E_F G_i)U_F B_F$ are orthogonal. Let

$C_F$ denote a $k \times k$ orthogonal matrix of eigenvectors of $W_F (E_F G_i) U_F U_F' (E_F G_i)' W_F'$    (8.7)

ordered so that the corresponding eigenvalues are $(\kappa_{1F}, \ldots, \kappa_{pF}, 0, \ldots, 0) \in R^k$.^{39} Let

$(\tau_{1F}, \ldots, \tau_{pF})$ denote the $p$ singular values of $W_F(E_F G_i)U_F$,    (8.8)

which are nonnegative and ordered so that $\tau_{jF}$ is nonincreasing in $j$. (Some of these singular values may be zero.) As is well known, the squares of the $p$ singular values of a $k \times p$ matrix $A$ with $k \ge p$ equal the $p$ eigenvalues of $A'A$ and the largest $p$ eigenvalues of $AA'$. In consequence, $\kappa_{jF} = \tau_{jF}^2$ for $j = 1, \ldots, p$.
^{39} The matrices $B_F$ and $C_F$ are not uniquely defined. We let $B_F$ denote one choice of the matrix of eigenvectors of $U_F'(E_F G_i)' W_F' W_F (E_F G_i) U_F$ and analogously for $C_F$.
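The relation between singular values and eigenvalues just described is easy to confirm numerically. The following sketch (an illustration only; the matrix and dimensions are arbitrary choices, not objects from the paper) checks that the squared singular values of a $k \times p$ matrix $A$ with $k \ge p$ equal the eigenvalues of $A'A$ and the largest $p$ eigenvalues of $AA'$:

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 5, 3
A = rng.standard_normal((k, p))

tau = np.linalg.svd(A, compute_uv=False)                 # p singular values, nonincreasing
kappa_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # eigenvalues of A'A, nonincreasing
kappa_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # eigenvalues of AA', nonincreasing

assert np.allclose(tau**2, kappa_AtA)       # squared singular values = eigenvalues of A'A
assert np.allclose(tau**2, kappa_AAt[:p])   # ... and the largest p eigenvalues of AA'
assert np.allclose(kappa_AAt[p:], 0)        # the remaining k - p eigenvalues are zero
```

This is the fact used to identify the eigenvalues in (8.6) and (8.7) with the squared singular values in (8.8).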
Define the elements of $\lambda$ to be^{40,41,42}

$\lambda_{1,F} := (\tau_{1F}, \ldots, \tau_{pF})' \in R^p$,
$\lambda_{2,F} := B_F \in R^{p \times p}$,
$\lambda_{3,F} := C_F \in R^{k \times k}$,
$\lambda_{4,F} := (E_F G_{i1}, \ldots, E_F G_{ip}) \in R^{k \times p}$,
$\lambda_{5,F} := E_F \big( (g_i', \mathrm{vec}(G_i)')' (g_i', \mathrm{vec}(G_i)')' \big)' \in R^{(p+1)k \times (p+1)k}$,
$\lambda_{6,F} := (\lambda_{6,1F}, \ldots, \lambda_{6,(p-1)F})' := (\tau_{2F}/\tau_{1F}, \ldots, \tau_{pF}/\tau_{(p-1)F})' \in R^{p-1}$, where $0/0 := 0$,
$\lambda_{7,F} := W_{2F}$,
$\lambda_{8,F} := U_{2F}$,
$\lambda_{9,F} := F$, and
$\lambda = \lambda_F := (\lambda_{1,F}, \ldots, \lambda_{9,F})$.    (8.9)
The dimensions of $W_{2F}$ and $U_{2F}$ depend on the choices of $\widehat{W}_n = W_1(\widehat{W}_{2n})$ and $\widehat{U}_n = U_1(\widehat{U}_{2n})$. We let $\lambda_{5,gF}$ denote the upper left $k \times k$ submatrix of $\lambda_{5,F}$. Thus, $\lambda_{5,gF} = E_F g_i g_i' = \Omega_F$.

We consider the parameter space $\Lambda_0$ for $\lambda$, which corresponds to $\mathcal{F}_{WU} \cap \mathcal{F}_0$, where $\mathcal{F}_{WU}$ and $\mathcal{F}_0$ are defined in (8.5) and (3.7), respectively. The parameter space $\Lambda_0$ and the function $h_n(\lambda)$ are defined by

$\Lambda_0 := \{\lambda : \lambda = (\lambda_{1,F}, \ldots, \lambda_{9,F})$ for some $F \in \mathcal{F}_{WU} \cap \mathcal{F}_0\}$ and $h_n(\lambda) := (n^{1/2}\lambda_{1,F}, \lambda_{2,F}, \ldots, \lambda_{8,F})$.    (8.10)

By the definition of $\mathcal{F}$, $\Lambda_0$ indexes distributions that satisfy the null hypothesis $H_0$: $\theta = \theta_0$. The dimension $J$ of $h_n(\lambda)$ equals the number of elements in $(\lambda_{1,F}, \ldots, \lambda_{8,F})$. Redundant elements in $(\lambda_{1,F}, \ldots, \lambda_{8,F})$, such as the redundant off-diagonal elements of the symmetric matrix $\lambda_{5,F}$, are not needed, but do not cause any problem. Note that two parameter spaces, denoted by $\Lambda_1$ and $\Lambda_2$, which are larger than $\Lambda_0$, are considered for the two SR-CQLR tests analyzed in AG2. (We also use $\Lambda_2$ in this paper; see (8.11) below.)
We define $\lambda$ and $h_n(\lambda)$ as in (8.9) and (8.10) because, as shown below, the asymptotic distributions of the test statistics under a sequence $\{F_n : n \ge 1\}$ for which $h_n(\lambda_{F_n}) \to h \in H$ depend on the behavior of $\lim n^{1/2}\lambda_{1,F_n}$, as well as on $\lim \lambda_{m,F_n}$ for $m = 2, \ldots, 8$. For example, the LM statistic in (4.3) depends on $\widehat{\Omega}_n^{-1/2}\widehat{D}_n$ or, equivalently, on $n^{1/2}\widehat{\Omega}_n^{-1/2}\widehat{D}_n B_{F_n} S_n$ (because projections are invariant to rescaling and rhs transformations by nonsingular matrices), where $S_n$ is a pd diagonal matrix that is designed to make this quantity $O_p(1)$ and not $o_p(1)$. We show that this quantity is asymptotically equivalent to $n^{1/2}\Omega_{F_n}^{-1/2}\widehat{D}_n B_{F_n} S_n$. In turn, the latter quantity depends on $n^{1/2}\Omega_{F_n}^{-1/2}\widehat{G}_n B_{F_n} = n^{1/2}\Omega_{F_n}^{-1/2}(\widehat{G}_n B_{F_n} - E_{F_n}G_i B_{F_n}) + n^{1/2}\Omega_{F_n}^{-1/2}E_{F_n}G_i B_{F_n}$. The quantity $\mathrm{vec}(n^{1/2}\Omega_{F_n}^{-1/2}(\widehat{G}_n B_{F_n} - E_{F_n}G_i B_{F_n}))$ has a nondegenerate asymptotic normal distribution by the central limit theorem (CLT), using the behavior of $\lim \lambda_{s,F_n}$ for $s = 2, 4, 5$, the fact that $B_{F_n}$ is an orthogonal matrix, and the restrictions in $\mathcal{F}_0$. Hence, the asymptotic behavior of $\mathrm{vec}(n^{1/2}\Omega_{F_n}^{-1/2}\widehat{G}_n B_{F_n})$ depends on that of $n^{1/2}\Omega_{F_n}^{-1/2}E_{F_n}G_i B_{F_n}$. Using the SVD of $\Omega_{F_n}^{-1/2}E_{F_n}G_i$, the latter is shown below to equal $\lambda_{3,F_n}\mathrm{Diag}\{n^{1/2}\lambda_{1,F_n}\}$, where $\mathrm{Diag}\{n^{1/2}\lambda_{1,F_n}\}$ denotes the $k \times p$ matrix with $n^{1/2}\lambda_{1,F_n}$ on the main diagonal and zeros elsewhere.

^{40} For simplicity, when writing $\lambda = (\lambda_{1,F}, \ldots, \lambda_{9,F})$, we allow the elements to be scalars, vectors, matrices, and distributions, and likewise in similar expressions.
^{41} If $p = 1$, no vector $\lambda_{6,F}$ appears in $\lambda$ because $\lambda_{1,F}$ only contains a single element.
^{42} The vector $\lambda_{6,F}$ is only used in the proofs for CLR tests. It could be deleted when considering only an LM test.
In Example 1 of Section 8.2 applied to the linear model (2.2), we have $W_F = \Omega_F^{-1/2}$, where $\Omega_F = E_F u_i^2 Z_i Z_i'$, and $\tau_{jF}$ is the $j$th singular value of $\Omega_F^{-1/2}E_F Z_i Y_{2i}' = \Omega_F^{-1/2}(E_F Z_i Z_i')\pi$ for $j = 1, \ldots, p$. As is well known, if $\pi$ is close to zero, weak instrument problems occur. But, as we show, matrices $\pi$ that are close to being singular, without their columns being close to zero, also lead to weak IV problems. This is captured in the present set-up by $\tau_{pF}$ being close to zero in the sense that $\lim n^{1/2}\tau_{pF_n} < \infty$. If this occurs, then weak identification problems arise.
For notational convenience, $\{\lambda_{n,h} : n \ge 1\}$ denotes a sequence $\{\lambda_n \in \Lambda_2 : n \ge 1\}$ for which $h_n(\lambda_n) \to h \in H$, where

$\Lambda_2 := \{\lambda : \lambda = (\lambda_{1,F}, \ldots, \lambda_{9,F})$ for some $F \in \mathcal{F}_{WU}\}$    (8.11)

and $H$ is defined in (8.1) with $\Lambda$ replaced by $\Lambda_2$.^{43} By definition, $\Lambda_0 \subset \Lambda_2$. We use the parameter space $\Lambda_2$ in many places in the paper, rather than $\Lambda_0$, for two reasons. First, this makes it clear where the conditions specified in $\mathcal{F}_0$ (and $\Lambda_0$) are really needed. Second, some of the results given here are used in AG2, which does not employ the smaller set $\Lambda_0$, but does use $\Lambda_2$. By the definitions of $\Lambda_2$ and $\mathcal{F}_{WU}$, $\{\lambda_{n,h} : n \ge 1\}$ is a sequence of distributions that satisfies the null hypothesis $H_0$: $\theta = \theta_0$.
We decompose $h$ (defined by (8.1), (8.9), and (8.10)) analogously to the decomposition of the first eight components of $\lambda$: $h = (h_1, \ldots, h_8)$, where $\lambda_{m,F}$ and $h_m$ have the same dimensions for $m = 1, \ldots, 8$. We further decompose the vector $h_1$ as $h_1 = (h_{1,1}, \ldots, h_{1,p})'$, where the elements of $h_1$ could equal $\infty$. We decompose $h_6$ as $h_6 = (h_{6,1}, \ldots, h_{6,p-1})'$. In addition, we let $h_{5,g}$ denote the upper left $k \times k$ submatrix of $h_5$. In consequence, under a sequence $\{\lambda_{n,h} : n \ge 1\}$, we have

$n^{1/2}\tau_{jF_n} \to h_{1,j} \ge 0\ \forall j \le p$, $\lambda_{m,F_n} \to h_m\ \forall m = 2, \ldots, 8$, $\Omega_{F_n} = E_{F_n} g_i g_i' \to h_{5,g}$, and $\lambda_{6,jF_n} \to h_{6,j}\ \forall j = 1, \ldots, p-1$.    (8.12)

^{43} Analogously, for any subsequence $\{w_n : n \ge 1\}$, $\{\lambda_{w_n,h} : n \ge 1\}$ denotes a sequence $\{\lambda_{w_n} \in \Lambda_2 : n \ge 1\}$ for which $h_{w_n}(\lambda_{w_n}) \to h \in H$.
By the conditions in $\mathcal{F}$, defined in (3.1), $h_{5,g}$ is pd.

The smallest and largest singular values of $W_F(E_F G_i)U_F$ (i.e., $\tau_{pF}$ and $\tau_{1F}$) can be related to those of $E_F G_i$ (i.e., $s_{pF}$ and $s_{1F}$) for $F \in \mathcal{F}_{WU}$ via

$c_1 s_{jF} \le \tau_{jF} \le c_2 s_{jF}$ for $j = 1$ and $j = p$ for some constants $0 < c_1 < c_2 < \infty$    (8.13)

that do not depend on $F$. As shown below, the parameter $\theta$ is strongly or semi-strongly identified under $\{\lambda_{n,h} : n \ge 1\}$ if $\lim n^{1/2}\tau_{pF_n} = \infty$. In consequence of (8.13), this holds iff $\lim n^{1/2} s_{pF_n} = \infty$. The parameters are weakly identified in the standard sense if $\lim n^{1/2}\tau_{jF_n} < \infty\ \forall j \le p$ or, equivalently, if $\lim n^{1/2}\tau_{1F_n} < \infty$, which holds by (8.13) iff $\lim n^{1/2} s_{1F_n} < \infty$. The parameters are weakly identified in the non-standard sense if $\lim n^{1/2}\tau_{1F_n} = \infty$ and $\lim n^{1/2}\tau_{pF_n} < \infty$, which holds by (8.13) iff $\lim n^{1/2} s_{1F_n} = \infty$ and $\lim n^{1/2} s_{pF_n} < \infty$.
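The two-sided bound in (8.13) can be illustrated numerically with the constants $c_1$ and $c_2$ computed as in (8.14) and the surrounding proof. The matrices below are arbitrary illustrative stand-ins for $E_F G_i$, $W_F$, and $U_F$, not objects from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
k, p = 5, 3
EG = rng.standard_normal((k, p))    # stand-in for E_F G_i
W = rng.standard_normal((k, k))     # stand-in for W_F (nonsingular a.s.)
U = rng.standard_normal((p, p))     # stand-in for U_F (nonsingular a.s.)

s = np.linalg.svd(EG, compute_uv=False)             # s_jF: singular values of E_F G_i
tau = np.linalg.svd(W @ EG @ U, compute_uv=False)   # tau_jF: singular values of W (E G) U

sW = np.linalg.svd(W, compute_uv=False)
sU = np.linalg.svd(U, compute_uv=False)
c2 = sW[0] * sU[0]     # [lambda_max(W'W) lambda_max(U'U)]^{1/2}
c1 = sW[-1] * sU[-1]   # [lambda_min(W'W) lambda_min(U'U)]^{1/2}

for j in (0, p - 1):   # j = 1 and j = p in the paper's indexing
    assert c1 * s[j] <= tau[j] <= c2 * s[j]
```

The check confirms that weighting by nonsingular $W_F$ and $U_F$ changes the singular values of $E_F G_i$ only up to factors bounded away from zero and infinity.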
The proof of (8.13) is as follows. For notational simplicity, we drop the subscript $F$ in some of the calculations. We have

$\lambda_{\min}(U' EG_i' W' W EG_i U)$
$\ = \min_{\xi \in R^p: \|\xi\|=1} (U\xi/\|U\xi\|)' EG_i' W' W EG_i (U\xi/\|U\xi\|) \cdot \|U\xi\|^2$
$\ \le \min_{\xi: \|\xi\|=1} \xi' EG_i' W' W EG_i \xi \cdot \lambda_{\max}(U'U)$
$\ = \min_{\xi: \|\xi\|=1} (EG_i\xi/\|EG_i\xi\|)' W'W (EG_i\xi/\|EG_i\xi\|) \cdot \|EG_i\xi\|^2 \cdot \lambda_{\max}(U'U)$
$\ \le \lambda_{\max}(W'W)\,\lambda_{\max}(U'U)\,\lambda_{\min}(EG_i'EG_i) \le c_2^2\, \lambda_{\min}(EG_i'EG_i)$, where
$c_2 := \sup_{F \in \mathcal{F}_{WU}} [\lambda_{\max}(W_F'W_F)\lambda_{\max}(U_F'U_F)]^{1/2} < \infty$    (8.14)

and the last inequality holds by the conditions in $\mathcal{F}_{WU}$ (defined in (8.5)). Because the smallest eigenvalues of $U'EG_i'W'WEG_iU$ and $EG_i'EG_i$ equal the squares of the smallest singular values of $WEG_iU$ and $EG_i$, respectively, (8.14) establishes the second inequality in (8.13) for $j = p$. Analogous calculations establish the lower bound in (8.14) for $j = p$ and the bounds for $j = 1$ by replacing $\min$ and $\le$ by $\max$ and $\ge$, respectively, in the appropriate places and taking $c_1 := \inf_{F \in \mathcal{F}_{WU}} [\lambda_{\min}(W_F'W_F)\lambda_{\min}(U_F'U_F)]^{1/2} > 0$.

8.4 Assumption WU
We assume that the random weight matrices $\widehat{W}_n = W_1(\widehat{W}_{2n})$ and $\widehat{U}_n = U_1(\widehat{U}_{2n})$ defined in (8.2) satisfy the following assumption that depends on a suitably chosen parameter space $\Lambda_*$ ($\subseteq \Lambda_2$), such as $\Lambda_0$, $\Lambda_1$, or $\Lambda_2$.

Assumption WU for the parameter space $\Lambda_* \subseteq \Lambda_2$: Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_*$,
(a) $\widehat{W}_{2w_n} \to_p h_7$ $(:= \lim W_{2F_{w_n}})$,
(b) $\widehat{U}_{2w_n} \to_p h_8$ $(:= \lim U_{2F_{w_n}})$, and
(c) $W_1(\cdot)$ is a continuous function at $h_7$ on some set $\mathcal{W}_2$ that contains $\{\lambda_{7,F}\ (= W_{2F}) : \lambda = (\lambda_{1,F}, \ldots, \lambda_{9,F}) \in \Lambda_*\}$ and contains $\widehat{W}_{2w_n}$ wp$\to$1, and $U_1(\cdot)$ is a continuous function at $h_8$ on some set $\mathcal{U}_2$ that contains $\{\lambda_{8,F}\ (= U_{2F}) : \lambda = (\lambda_{1,F}, \ldots, \lambda_{9,F}) \in \Lambda_*\}$ and contains $\widehat{U}_{2w_n}$ wp$\to$1.

In Assumption WU and elsewhere below, "all sequences $\{\lambda_{w_n,h} : n \ge 1\}$" means "all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ for any $h \in H$," and likewise with $n$ in place of $w_n$. Note that, by definition, a sequence $\{\lambda_{w_n,h} : n \ge 1\}$ determines a sequence of distributions $\{F_{w_n} : n \ge 1\}$; see (8.9).

Assumption WU for the parameter space $\Lambda_0$ is verified in Comment (ii) to Theorem 10.1 given below for the CLR test with moment-variance weighting, which is considered in Section 6. It also holds for Kleibergen's LM test (for the same parameter space $\Lambda_0$) by the same argument (because $\widehat{W}_{2n}$, $\widehat{U}_{2n}$, $W_1(\cdot)$, and $U_1(\cdot)$ are the same for these two tests; see (8.4)).
8.5 Basic Results
For any square-integrable random vector $a_i$ and $F, F_n \in \mathcal{F}$, define

$\Sigma_F^{a_i} := \mathrm{Var}_F(a_i - (E_F a_\ell g_\ell')\Omega_F^{-1} g_i)$ and $\Sigma_h^{a_i} := \lim \Sigma_{F_{w_n}}^{a_i}$    (8.15)

whenever the limit exists, where the distributions $\{F_{w_n} : n \ge 1\}$ correspond to $\{\lambda_{w_n,h} : n \ge 1\}$ for any subsequence $\{w_n : n \ge 1\}$. Note that $\Sigma_F^{a_i} = E_F b_i b_i' - E_F a_i E_F a_i'$ for $b_i := a_i - (E_F a_\ell g_\ell')\Omega_F^{-1} g_i$ (because $E_F b_i = E_F a_i$, using $E_F g_i = 0^k$).
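The orthogonalized variable $b_i$ in this definition is uncorrelated with $g_i$ by construction, which is the mechanism behind the block-diagonal limiting variance matrix appearing in the results of this subsection. A small numerical sketch (illustrative simulated data; sample moments play the role of population moments):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 10_000, 3
g = rng.standard_normal((n, k))                                    # mean-zero g_i
a = g @ rng.standard_normal((k, k)) + rng.standard_normal((n, k))  # a_i correlated with g_i

Omega = g.T @ g / n    # sample analogue of Omega_F = E_F g_i g_i'
Gamma = a.T @ g / n    # sample analogue of E_F a_i g_i'
b = a - g @ np.linalg.solve(Omega, Gamma.T)   # b_i = a_i - (E a g') Omega^{-1} g_i

# the residual b_i is exactly uncorrelated with g_i in the sample
assert np.max(np.abs(b.T @ g / n)) < 1e-8
```

Algebraically, $E b_i g_i' = E a_i g_i' - (E a_\ell g_\ell')\Omega^{-1}\Omega = 0$, which is why only the variance of $b_i$, and no covariance term with $g_i$, survives in the limit.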
A basic result that is used in the proofs of results for all of the tests considered in this paper
and AG2 is the following.
Lemma 8.2 Under all sequences $\{\lambda_{n,h} : n \ge 1\}$,

$n^{1/2}\big( \widehat{g}_n,\ \mathrm{vec}(\widehat{D}_n - E_{F_n}G_i) \big) \to_d \big( g_h,\ \mathrm{vec}(D_h) \big) \sim N\Big( 0^{(p+1)k}, \mathrm{Diag}\big( h_{5,g},\ \Sigma_h^{\mathrm{vec}(G_i)} \big) \Big)$,

where the off-diagonal blocks $0^{k \times pk}$ and $0^{pk \times k}$ of the variance matrix are zero. Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$, the same result holds with $n$ replaced with $w_n$.
Comments: (i) The variance matrix $\Sigma_h^{\mathrm{vec}(G_i)}$ depends on $h$ only through $h_4$ and $h_5$. The assumptions allow $\Sigma_h^{\mathrm{vec}(G_i)}$ to be singular.
(ii) Suppose one eliminates the $\lambda_{\min}(E_F g_i g_i')$ condition in $\mathcal{F}$ and one defines $\widehat{D}_n$ in (4.3) with $\widehat{\Omega}_n$ replaced by an eigenvalue-adjusted matrix, denoted by $\widehat{\Omega}_n^\varepsilon$, which is constructed to have its smallest eigenvalue greater than or equal to $\varepsilon > 0$ multiplied by its largest eigenvalue; see AG2 for the details of such a construction. In this case, the result of Lemma 8.2 still holds, and all of the other asymptotic results following from Lemma 8.2 still hold, except the independence of $g_h$ and $D_h$. However, this independence is key because it is used in the conditioning argument that establishes the correct asymptotic size of all of the tests that are shown to have correct asymptotic size. Without it, these tests do not necessarily have correct asymptotic size. In consequence, we define $\widehat{D}_n$ in (4.3) using $\widehat{\Omega}_n$, not $\widehat{\Omega}_n^\varepsilon$.

The reason that independence does not necessarily hold when $\widehat{D}_n$ is defined using $\widehat{\Omega}_n^\varepsilon$, rather than $\widehat{\Omega}_n$, is that the covariance term $E_{F_n}[G_{ij} - E_{F_n}G_{ij} - (E_{F_n}G_{\ell j}g_\ell')(\Omega_{F_n}^\varepsilon)^{-1}g_i]g_i'$ typically does not equal $0^{k \times k}$ when $\Omega_{F_n}^\varepsilon \ne \Omega_{F_n}$, whereas $E_{F_n}[G_{ij} - E_{F_n}G_{ij} - (E_{F_n}G_{\ell j}g_\ell')\Omega_{F_n}^{-1}g_i]g_i'$ necessarily equals $0^{k \times k}$; see the proof of Lemma 8.2 in Section 14 in the Supplementary Material for more details.
(iii) The proofs of Lemma 8.2 and other results in this section are given in Sections 14-16 in
the Supplemental Appendix.
The following is a key definition. Consider a sequence $\{\lambda_{n,h} : n \ge 1\}$. Let $q = q_h$ $(\in \{0, \ldots, p\})$ be such that

$h_{1,j} = \infty$ for $1 \le j \le q_h$ and $h_{1,j} < \infty$ for $q_h + 1 \le j \le p$,    (8.16)

where $h_{1,j} := \lim n^{1/2}\tau_{jF_n} \ge 0$ for $j = 1, \ldots, p$ by (8.12) and the distributions $\{F_n : n \ge 1\}$ correspond to $\{\lambda_{n,h} : n \ge 1\}$ defined in (8.11). Such a $q$ exists because $\{h_{1,j} : j \le p\}$ are nonincreasing in $j$ (since $\{\tau_{jF} : j \le p\}$ are the ordered singular values of $W_F(E_F G_i)U_F$, as defined in (8.8)). As defined, $q$ is the number of singular values of $W_{F_n}(E_{F_n}G_i)U_{F_n}$ that diverge to infinity when multiplied by $n^{1/2}$. Roughly speaking, $q$ is the number of parameters, or one-to-one transformations of the parameters, that are strongly or semi-strongly identified.
The following quantities appear in Lemma 8.3 below, which gives the asymptotic distribution of $\widehat{D}_n$ after suitable rotations and rescaling, but without the recentering (by subtracting $E_{F_n}G_i$) that appears in Lemma 8.2. We partition $h_2$ and $h_3$ and define $\Delta_h$ as follows:

$h_2 = (h_{2,q}, h_{2,p-q})$, $h_3 = (h_{3,q}, h_{3,k-q})$,
$\bar{h}_{1,p-q} := \big( 0^{q \times (p-q)\prime},\ \mathrm{Diag}\{h_{1,q+1}, \ldots, h_{1,p}\}',\ 0^{(k-p) \times (p-q)\prime} \big)' \in R^{k \times (p-q)}$,
$\Delta_h = (\Delta_{h,q}, \Delta_{h,p-q}) \in R^{k \times p}$, $\Delta_{h,q} := h_{3,q}$, $\Delta_{h,p-q} := h_3 \bar{h}_{1,p-q} + h_{71} D_h h_{81} h_{2,p-q}$,
$h_{71} := W_1(h_7)$, and $h_{81} := U_1(h_8)$,    (8.17)

where $h_{2,q} \in R^{p \times q}$, $h_{2,p-q} \in R^{p \times (p-q)}$, $h_{3,q} \in R^{k \times q}$, $h_{3,k-q} \in R^{k \times (k-q)}$, $\Delta_{h,q} \in R^{k \times q}$, $\Delta_{h,p-q} \in R^{k \times (p-q)}$, $h_{71} \in R^{k \times k}$, $h_{81} \in R^{p \times p}$, and $D_h$ is defined in Lemma 8.2.^{44} Note that when Assumption WU holds, $h_{71} = \lim W_{F_n} = \lim W_1(W_{2F_n})$ and $h_{81} = \lim U_{F_n} = \lim U_1(U_{2F_n})$ under $\{\lambda_{n,h} : n \ge 1\}$.

The case where $q = p$ (i.e., $n^{1/2}\tau_{jF_n} \to \infty$ for all $j \le p$) is the strong or semi-strong identification case. In this case, no $h_{2,p-q}$, $\bar{h}_{1,p-q}$, and $\Delta_{h,p-q}$ matrices appear in (8.17), $\Delta_h = h_{3,q} = h_{3,p}$, and $\Delta_h$ is non-random. In consequence, the limit in distribution (or probability) of the normalized matrix $n^{1/2}W_{F_n}\widehat{D}_n U_{F_n}T_n$, where $T_n \in R^{p \times p}$ is defined below, is non-random; see Lemma 8.3 below. When $q < p$, identification is weak and the limit of this matrix is random.
Now we provide some motivation for Lemma 8.3, which is stated below. To show that the LM statistic has a $\chi_p^2$ asymptotic distribution, we need to determine the asymptotic behavior of $\widehat{D}_n$ without the recentering by $E_{F_n}G_i$ that occurs in Lemma 8.2. In addition, to determine the asymptotic distribution of the $rk_n$ statistic in (6.2), we need to determine the asymptotic distribution of $W_{F_n}\widehat{D}_n U_{F_n}$ without recentering by $E_{F_n}G_i$.^{45} To do so, we post-multiply $W_{F_n}\widehat{D}_n U_{F_n}$ first by $B_{F_n}$ and then by a nonrandom diagonal matrix $S_n \in R^{p \times p}$ (which may depend on $F_n$ and $h$). The matrix $S_n$ rescales the columns of $W_{F_n}\widehat{D}_n U_{F_n}B_{F_n}$ to ensure that $n^{1/2}W_{F_n}\widehat{D}_n U_{F_n}B_{F_n}S_n$ converges in distribution to a (possibly) random matrix that is finite a.s. and not almost surely zero. For $F \in \mathcal{F}_{WU} \cap \mathcal{F}_0$, it ensures that the (possibly) random limit matrix has full column rank with probability one. For example, in the case of the LM statistic, these transformations are applied with $W_{F_n} = \Omega_{F_n}^{-1/2}$ and $U_{F_n} = I_p$.
For the LM statistic and the CLR statistics that employ it, we need the full column rank property of the limit random matrix in order to apply the continuous mapping theorem (CMT). For the LM statistic, the full rank property ensures that the quantity $\widehat{D}_n'\widehat{\Omega}_n^{-1}\widehat{D}_n$ (whose inverse appears in the expression for $LM_n$; see (4.3)) is nonsingular asymptotically with probability one after $\widehat{D}_n$ has been transformed and rescaled to yield $n^{1/2}\Omega_{F_n}^{-1/2}\widehat{D}_n B_{F_n}S_n$. Note that $P_{\widehat{\Omega}_n^{-1/2}\widehat{D}_n}$, which appears in the definition of $LM_n$ in (4.3), can be written as

$P_{\widehat{\Omega}_n^{-1/2}\widehat{D}_n} := \widehat{\Omega}_n^{-1/2}\widehat{D}_n(\widehat{D}_n'\widehat{\Omega}_n^{-1}\widehat{D}_n)^{-1}\widehat{D}_n'\widehat{\Omega}_n^{-1/2}$
$\quad = (\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2})(n^{1/2}\Omega_n^{-1/2}\widehat{D}_nT_n)\big[(n^{1/2}\Omega_n^{-1/2}\widehat{D}_nT_n)'(\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2})'(\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2})(n^{1/2}\Omega_n^{-1/2}\widehat{D}_nT_n)\big]^{-1}$
$\qquad \times (n^{1/2}\Omega_n^{-1/2}\widehat{D}_nT_n)'(\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2})'$, where $T_n := B_{F_n}S_n \in R^{p \times p}$ and $\Omega_n := \Omega_{F_n}$ $(= E_{F_n}g_ig_i')$,    (8.18)

provided $T_n$ has full rank and $\Omega_n$ is pd. In consequence, these transformations do not affect the value or distribution of the LM statistic.

Note that the two SR-CQLR test statistics considered in AG2 do not depend on an LM statistic and do not require the asymptotic distribution of $n^{1/2}W_{F_n}\widehat{D}_nU_{F_n}B_{F_n}S_n$ to have full column rank a.s.

^{44} For simplicity, there is some abuse of notation here; e.g., $h_{2,q}$ and $h_{2,p-q}$ denote different matrices even if $p - q$ happens to equal $q$.
^{45} Furthermore, to determine the asymptotic distributions of the two SR-CQLR test statistics and conditional critical values considered in AG2, we need to determine the asymptotic distribution of $W_{F_n}\widehat{D}_nU_{F_n}$ without recentering by $E_{F_n}G_i$.
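The invariance claim underlying (8.18) — that post-multiplying the Jacobian estimator by a nonsingular matrix $T_n$ leaves the projection matrix, and hence the LM statistic, unchanged — can be checked directly. The matrices below are arbitrary illustrations, not objects from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
k, p = 6, 2
D = rng.standard_normal((k, p))                   # stand-in for Omega_n^{-1/2} D_n
T = rng.standard_normal((p, p)) + 3 * np.eye(p)   # nonsingular p x p matrix (like T_n = B S)

def proj(M):
    """Orthogonal projection onto the column space of M."""
    return M @ np.linalg.solve(M.T @ M, M.T)

# right-multiplication by a nonsingular matrix does not change the column
# space of D, so the projection matrix is identical
assert np.allclose(proj(D), proj(D @ T))
```

This is why the rotation by $B_{F_n}$ and the rescaling by $S_n$ are free normalizations: they alter neither the value nor the distribution of the LM statistic.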
Define

$S_n := \mathrm{Diag}\{(n^{1/2}\tau_{1F_n})^{-1}, \ldots, (n^{1/2}\tau_{qF_n})^{-1}, 1, \ldots, 1\} \in R^{p \times p}$,    (8.19)

where $q = q_h$ is defined in (8.16).^{46}
The proof of Theorem 9.1 for the LM test, the proofs of Theorems 8.4 and 10.1 for the CLR test with moment-variance weighting, and the proofs for the two SR-CQLR tests in AG2 use the following lemma. The $p \times p$ matrix $T_n$ is defined in (8.18).
Lemma 8.3 Suppose Assumption WU holds for some non-empty parameter space $\Lambda_* \subseteq \Lambda_2$. Under all sequences $\{\lambda_{n,h} : n \ge 1\}$ with $\lambda_{n,h} \in \Lambda_*$,

$n^{1/2}(\widehat{g}_n,\ \widehat{D}_n - E_{F_n}G_i,\ W_{F_n}\widehat{D}_nU_{F_n}T_n) \to_d (g_h, D_h, \Delta_h)$,

where (a) $(g_h, D_h)$ are defined in Lemma 8.2, (b) $\Delta_h$ is the nonrandom function of $h$ and $D_h$ defined in (8.17), (c) $(D_h, \Delta_h)$ and $g_h$ are independent, (d) if Assumption WU holds with $\Lambda_* = \Lambda_0$, $W_F = \Omega_F^{-1/2}$, and $U_F = I_p$, then $\Delta_h$ has full column rank $p$ with probability one, and (e) under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_*$, the convergence result above and the results of parts (a)-(d) hold with $n$ replaced with $w_n$.
Comments: (i) Lemma 8.3(c)-(d) are key properties of the asymptotic distribution of $n^{1/2}(\widehat{g}_n, W_{F_n}\widehat{D}_nU_{F_n}T_n)$ that lead to the LM statistic having a $\chi_p^2$ asymptotic distribution and the CLR test with moment-variance weighting having correct asymptotic size. Lemma 8.3(c) is a key property that leads to the correct asymptotic size of the two SR-CQLR tests in AG2. Lemma 8.3(d) is not needed for these tests because they do not rely on an LM statistic.

(ii) The conditions in $\mathcal{F}_0$ are used in the proofs to obtain the result of Lemma 8.3(d) and are not used elsewhere in the proofs, except where Lemma 8.3(d) is used.

^{46} Note that $\tau_{jF_n} > 0$ for $n$ large for $j \le q$ and, hence, $S_n$ is well defined for $n$ large, because $n^{1/2}\tau_{jF_n} \to \infty$ for all $j \le q$.
The following theorems are used only for the CLR tests. For the proof of Theorem 4.1 concerning
Kleibergen’s (2005) LM test, one can go from here to Section 9.
Let

$\widehat{\kappa}_{jn}$ denote the $j$th eigenvalue of $n\widehat{U}_n'\widehat{D}_n'\widehat{W}_n'\widehat{W}_n\widehat{D}_n\widehat{U}_n$, $\forall j = 1, \ldots, p$,    (8.20)

ordered to be nonincreasing in $j$. By definition, $\lambda_{\min}(n\widehat{U}_n'\widehat{D}_n'\widehat{W}_n'\widehat{W}_n\widehat{D}_n\widehat{U}_n) = \widehat{\kappa}_{pn}$. Also, the $j$th singular value of $n^{1/2}\widehat{W}_n\widehat{D}_n\widehat{U}_n$ equals $\widehat{\kappa}_{jn}^{1/2}$.
Theorem 8.4 Suppose Assumption WU holds for some non-empty parameter space $\Lambda_* \subseteq \Lambda_2$. Under all sequences $\{\lambda_{n,h} : n \ge 1\}$ with $\lambda_{n,h} \in \Lambda_*$,
(a) $\widehat{\kappa}_{pn} \to_p \infty$ if $q = p$,
(b) $\widehat{\kappa}_{pn} \to_d \lambda_{\min}(\Delta_{h,p-q}'h_{3,k-q}h_{3,k-q}'\Delta_{h,p-q})$ if $q < p$,
(c) $\widehat{\kappa}_{jn} \to_p \infty$ for all $j \le q$,
(d) the (ordered) vector of the smallest $p - q$ eigenvalues of $n\widehat{U}_n'\widehat{D}_n'\widehat{W}_n'\widehat{W}_n\widehat{D}_n\widehat{U}_n$, i.e., $(\widehat{\kappa}_{(q+1)n}, \ldots, \widehat{\kappa}_{pn})'$, converges in distribution to the (ordered) $p - q$ vector of the eigenvalues of $\Delta_{h,p-q}'h_{3,k-q}h_{3,k-q}'\Delta_{h,p-q} \in R^{(p-q)\times(p-q)}$,
(e) the convergence in parts (a)-(d) holds jointly with the convergence in Lemma 8.3, and
(f) under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_*$, the results in parts (a)-(e) hold with $n$ replaced with $w_n$.
Comments: (i) The statistic $\widehat{\kappa}_{pn} = \lambda_{\min}(n\widehat{U}_n'\widehat{D}_n'\widehat{W}_n'\widehat{W}_n\widehat{D}_n\widehat{U}_n)$ in Theorem 8.4(a) and (b) is a Robin and Smith (2000)-type rank statistic.

(ii) Theorem 8.4(a) and (b) is used to determine the asymptotic behavior of the statistic $rk_n$ defined in (6.2) (which is employed by the CLR test with moment-variance weighting that is considered in Section 6). More specifically, Theorem 8.4(a) and (b) is used to verify Assumption R in Section 10 below.

(iii) Theorem 8.4(c) and (d) is used to determine the asymptotic behavior of the critical value functions for the two SR-CQLR tests considered in AG2 (with $\widehat{W}_n$ and $\widehat{U}_n$ defined suitably). Because Theorem 8.4(c) and (d) are immediate by-products of the proofs of Theorem 8.4(a) and (b), they are stated and proved here, rather than in AG2.

(iv) The statement of Theorem 3 in Kleibergen (2005) is difficult to interpret because the expression given for the conditional asymptotic distribution of the CLR statistic involves Kleibergen's (2005) statistic $rk(\theta_0)$, which is a finite-sample object. Based on Theorem 8.4, (10.7) below provides the asymptotic distribution of a class of CLR statistics in terms of an asymptotic version of the rank statistic employed, which is necessary for a precise statement of the asymptotic distribution. The class of CLR statistics considered are those defined in (5.1) and based on the rank statistic in Theorem 8.4 for some choices of $\widehat{W}_n$ and $\widehat{U}_n$, which is a Robin and Smith (2000)-type rank statistic. In particular, taking $\widehat{W}_n = \widehat{\Omega}_n^{-1/2}$ and $\widehat{U}_n = I_p$ gives the rank statistic defined in (6.2).
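As a numerical sanity check on the definitions in (8.20), the eigenvalues of $n\widehat{U}_n'\widehat{D}_n'\widehat{W}_n'\widehat{W}_n\widehat{D}_n\widehat{U}_n$ are exactly the squared singular values of $n^{1/2}\widehat{W}_n\widehat{D}_n\widehat{U}_n$, with the smallest one being the Robin and Smith (2000)-type rank statistic. The matrices below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, p = 400, 5, 3
D = rng.standard_normal((k, p))   # stand-in for D_n
W = rng.standard_normal((k, k))   # stand-in for W_n
U = rng.standard_normal((p, p))   # stand-in for U_n

M = n * U.T @ D.T @ W.T @ W @ D @ U               # symmetric psd p x p matrix
kappa = np.sort(np.linalg.eigvalsh(M))[::-1]      # kappa_jn, ordered nonincreasing
sv = np.linalg.svd(np.sqrt(n) * W @ D @ U, compute_uv=False)

assert np.allclose(kappa, sv**2)                  # jth singular value = kappa_jn^{1/2}
assert np.isclose(kappa[-1], np.linalg.eigvalsh(M).min())  # rank statistic = lambda_min
```
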
9 Asymptotic Size of the Nonlinear LM Test
In this section, we prove Theorem 4.1 for the LM test.
We state a theorem that verifies Assumption B of ACG (stated in Section 8) for the LM test. The following theorem applies with $\widehat{W}_n = \widehat{\Omega}_n^{-1/2}$, $W_F = \Omega_F^{-1/2}$, and $\widehat{U}_n = U_F = I_p$. (These definitions affect the definition of $\lambda_{n,h}$, which appears in the theorem.)
Theorem 9.1 The asymptotic null rejection probabilities of the nominal size $\alpha \in (0,1)$ LM test equal $\alpha$ under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_0$ $\forall n \ge 1$.

Comments: (i) The requirement that $\lambda_{w_n,h} \in \Lambda_0$ (defined in (8.10)) implies that the parameter space for $F$ is $\mathcal{F}_0$ (defined in (3.7)) for the results given in Theorems 4.1 and 9.1 (because the restrictions in $\mathcal{F}_{WU}$ are not binding; see the discussion in the paragraph containing (8.5)).

(ii) Proposition 8.1 and Theorem 9.1 prove Theorem 4.1 for the LM test. The proof of Theorem 4.1 for the LM CS is analogous; see Comments (i) and (ii) to Proposition 8.1.
For notational simplicity, we prove Theorem 9.1 for the sequence $\{n\}$, rather than a subsequence $\{w_n : n \ge 1\}$. We note here that the same proof holds for any subsequence $\{w_n : n \ge 1\}$.
Proof of Theorem 9.1. Let $\Omega_n := \Omega_{F_n}$. We derive the limiting distribution of the statistic $LM_n$ using the CMT applied to $\Omega_n^{-1/2}n^{1/2}\widehat{g}_n$, $\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2}$, and $n^{1/2}\Omega_n^{-1/2}\widehat{D}_nT_n$, where the latter two quantities appear in the expression on the rhs of (8.18). Note that $\widehat{\Omega}_n \to_p h_{5,g}$ by the WLLN, $\Omega_n \to h_{5,g}$, and $h_{5,g}$ is pd. Thus, $\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2} \to_p I_k$. By Lemma 8.3 applied with $W_F = \Omega_F^{-1/2}$ and $U_F = I_p$ (which results from taking $\widehat{W}_n = \widehat{\Omega}_n^{-1/2}$ and $\widehat{U}_n = I_p$), we get $(\Omega_n^{-1/2}n^{1/2}\widehat{g}_n,\ n^{1/2}\Omega_n^{-1/2}\widehat{D}_nT_n) \to_d (h_{5,g}^{-1/2}g_h, \Delta_h)$.

For the CMT to apply, it is enough to show that the function $f : R^{k \times p} \to R^{k \times k}$ defined by $f(D) := D(D'D)^{-1}D'$ for $D \in R^{k \times p}$ is continuous on a set $C \subseteq R^{k \times p}$ with $P(\Delta_h \in C) = 1$.^{47} Note that $f$ is continuous at each $D$ that has full column rank. And, by Lemma 8.3(d), $\Delta_h$ has full column rank a.s. because $\lambda_{n,h} \in \Lambda_0$, $F_n \in \mathcal{F}_0$, $W_F = \Omega_F^{-1/2}$, and $U_F = I_p$. Hence, $f$ is continuous a.s. By $\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2} \to_p I_k$, the convergence result in Lemma 8.3, and the CMT, we have

$P_{\bar{D}_n}\widehat{\Omega}_n^{-1/2}n^{1/2}\widehat{g}_n = \bar{D}_n(\bar{D}_n'\bar{D}_n)^{-1}\bar{D}_n'\widehat{\Omega}_n^{-1/2}n^{1/2}\widehat{g}_n \to_d v_h := P_{\Delta_h}h_{5,g}^{-1/2}g_h$,    (9.1)

where $\bar{D}_n := (\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2})\,n^{1/2}\Omega_n^{-1/2}\widehat{D}_nT_n$.

Conditional on $\Delta_h$, $v_h'v_h$ is distributed as $\chi_p^2$ because (i) $\Delta_h$ and $g_h$ are independent by property (c) in Lemma 8.3, (ii) $h_{5,g}^{-1/2}g_h$ is conditionally distributed as $N(0^k, I_k)$ by $g_h \sim N(0^k, h_{5,g})$ and (i), and (iii) $P_{\Delta_h}$ is fixed given $\Delta_h$ and projects onto a space of dimension $p$ a.s. by property (d) in Lemma 8.3. Because the $\chi_p^2$ distribution does not depend on $\Delta_h$, $v_h'v_h$ is unconditionally distributed as $\chi_p^2$ as well. In consequence, using the CMT again, we have

$LM_n \to_d LM_h := v_h'v_h \sim \chi_p^2$.    (9.2)

Given this result and the use of the $\chi_{p,1-\alpha}^2$ critical value by the LM test, we obtain the conclusion of Theorem 9.1 for the LM test: $\lim P_{F_n}(LM_n > \chi_{p,1-\alpha}^2) = \alpha$.
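The conditioning step in the proof — that the squared length of the projection of a standard normal vector onto the column space of a fixed full-column-rank matrix is $\chi_p^2$, whatever that matrix is — can be illustrated by simulation (illustrative dimensions and seed; this is a sketch of the logic, not the paper's computations):

```python
import numpy as np

rng = np.random.default_rng(5)
k, p, n_sim = 5, 2, 100_000
Delta = rng.standard_normal((k, p))   # a fixed full-column-rank k x p matrix
P = Delta @ np.linalg.solve(Delta.T @ Delta, Delta.T)   # projection onto col(Delta)

z = rng.standard_normal((n_sim, k))   # plays the role of h_{5,g}^{-1/2} g_h ~ N(0, I_k)
v = z @ P                             # P is symmetric, so row i equals (P z_i)'
stat = np.sum(v * v, axis=1)          # v'v for each draw

# chi-square with p degrees of freedom has mean p and variance 2p,
# regardless of the fixed Delta, up to simulation error
assert abs(stat.mean() - p) < 0.05
assert abs(stat.var() - 2 * p) < 0.3
```

Because the conditional distribution does not depend on the conditioning matrix, the unconditional distribution is the same $\chi_p^2$, which is exactly the argument used for $v_h'v_h$ above.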
10 Asymptotic Size of the CLR Test with Moment-Variance Weighting
In this section, we prove Theorem 6.1, which concerns the CLR test (and CS) with moment-variance weighting based on the Robin-Smith rank statistic. In fact, for the CLR test defined by (5.1)-(5.2), we prove a stronger result than that given in Theorem 6.1. We establish Theorem 6.1 for a CLR test that is based on any rank statistic $rk_n$ that satisfies a high-level assumption, denoted Assumption R, not just the rank statistic $rk_n(\theta_0)$ defined in (6.2). Then, we verify Assumption R for the moment-variance-weighted Robin-Smith rank statistic $rk_n(\theta_0)$ in (6.2). Note that Assumption R does not hold for the rank statistic in (5.5) when $p \ge 2$.

Section 18.5 in the Supplemental Material provides additional asymptotic size results for equally-weighted CLR tests (and CS's), which are CLR tests that are based on $rk_n$ statistics that depend on $\widehat{D}_n$ only through $\widetilde{W}_n\widehat{D}_n$ for some $k \times k$ weighting matrix $\widetilde{W}_n$. These results show that equally-weighted CLR tests (and CS's) based on the Robin and Smith (2000) rank statistic with a general weight matrix $\widetilde{W}_n$ $(\in R^{k \times k})$ have correct asymptotic size under suitable conditions on $\widetilde{W}_n$. One can view these results as verifying Assumption R for a broad class of $rk_n$ statistics. In contrast, the results in the present section establish the correct asymptotic size of CLR tests (and CS's) under the high-level condition Assumption R and for the Robin and Smith (2000) rank statistic when $\widetilde{W}_n$ is the moment-variance weighting matrix $\widehat{\Omega}_n^{-1/2}$; see Comment (ii) to Theorem 10.1 below.

^{47} This holds because the function $f_2(D, L) := LD((LD)'(LD))^{-1}D'L'$ for a nonsingular $k \times k$ matrix $L$ is continuous at $(D, I_k)$ if $f(D)$ is continuous at $D$.
The high-level condition on the rank statistic rkn is the following.
Assumption R: For any subsequence $\{w_n\}$ and any sequence $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_0$ $\forall n \ge 1$, either (a) $rk_{w_n} \to_p r_h = \infty$ or (b) $rk_{w_n} \to_d r_h(D_h)$ for some nonrandom function $r_h : R^{k \times p} \to R$, where $D_h$ is defined in Lemma 8.2, and the convergence is joint with that in Lemma 8.2.^{48}
The following theorem applies when the LM statistic is defined as in (4.3) with projection onto $\widehat{\Omega}_n^{-1/2}\widehat{D}_n$. In consequence, the quantities in (8.2) in the present case are $\widehat{W}_n = \widehat{\Omega}_n^{-1/2}$, $W_F = \Omega_F^{-1/2}$, and $\widehat{U}_n = U_F = I_p$. (These definitions affect the definition of $\lambda_{n,h}$, which appears in the theorem.)

Theorem 10.1 For any statistic $rk_n$ that satisfies Assumption R, the asymptotic null rejection probabilities of the nominal size $\alpha \in (0,1)$ CLR test defined in (4.3)-(5.2) based on $rk_n$ equal $\alpha$ under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_0$ $\forall n \ge 1$.
CLR test based on
and is asymptotically similar.
Analogous CS results (to the test results stated in Theorem 10.1) hold for a parameter space
that is a reparametrization of F
;0
and is de…ned as
0
;0
is de…ned, but with the adjustments outlined
in Comments (i) and (ii) to Proposition 8.1.
(ii) Theorems 8.4 and 10.1 and Proposition 8.1 establish the test results of Theorem 6.1.
cn = b n 1=2 and U
bn = Ip imply that
This holds because Theorem 8.4(a), (b), (e), and (f) with W
Assumption R holds for the CLR test with moment-variance weighting, that is considered in Section
6, which uses the Robin and Smith (2000) rkn statistic de…ned in (6.2). (In the present context,
cn =
Theorem 8.4 requires that Assumption WU holds for the parameter space 0 : It holds with W
c2n ; W1 (w) = w for w 2 Rk k ; W2 = Rk k ; U
bn = U
b2n ; U1 (u) = u for u 2 Rp
W
cn = b n 1=2 !p h 1=2 under all sequences f n;h : n 1g with n;h 2
because W
5;g
all n
p;
0
and U2 = Rp p ;
bn = Ip for
and U
1:) In particular, Assumption R holds with rh = 1 if q = p and with rh (Dh ) equal to the
smallest eigenvalue of
(8.17) based on WF =
0
0
h;p q h3;k q h3;k q h;p q if q
1=2
and UF = Ip ): The CS
F
< p (where
h;p q
and h3;k
q
are de…ned in
results of Theorem 6.1 hold by Theorem 8.4,
Comment (i) to Theorem 10.1, and Comment (i) to Proposition 8.1.
^{48} By $rk_{w_n} \to_p \infty$, we mean that for every $K < \infty$ we have $P_{\theta_0,w_n}(rk_{w_n} > K) \to 1$, where $P_{\theta_0,w_n}(\cdot)$ denotes probability under $\lambda_{w_n}$ when the true parameter vector equals $\theta_0$.
(iii) Theorem 5.1 shows that Assumption R does not hold in general for rank statistics based on $\widetilde{V}_{Dn}$ and $\widehat{D}_n^\dagger$, defined in (5.3)-(5.4), when $p \ge 2$. The reason is that for some sequences of distributions the asymptotic distribution of $\widehat{D}_n^\dagger$ and, hence, the rank statistic $rk_n$ depends on $D_h^\dagger$ and $M_h \ne 0^{k \times p}$, not just on $D_h$ alone.
For notational simplicity, the following proof is for the sequence $\{n\}$, rather than a subsequence $\{w_n : n \ge 1\}$. The same proof holds for any subsequence $\{w_n : n \ge 1\}$.
Proof of Theorem 10.1. Let

$J_n := n\,\widehat{g}_n'\widehat{\Omega}_n^{-1/2}M_{\widehat{\Omega}_n^{-1/2}\widehat{D}_n}\widehat{\Omega}_n^{-1/2}\widehat{g}_n$.    (10.1)

It follows from (4.3) that

$AR_n = LM_n + J_n$.    (10.2)
We now distinguish two cases. First, suppose Assumption R(a) holds: $rk_n \to_p \infty$. By (10.2) and some algebra, we have $(AR_n - rk_n)^2 + 4LM_n\,rk_n = (LM_n - J_n + rk_n)^2 + 4LM_nJ_n$. Therefore,

$CLR_n = \frac{1}{2}\Big[LM_n + J_n - rk_n + \sqrt{(LM_n - J_n + rk_n)^2 + 4LM_nJ_n}\Big]$.    (10.3)

Using a mean-value expansion of the square-root expression in (10.3) about $(LM_n - J_n + rk_n)^2$, we have

$\sqrt{(LM_n - J_n + rk_n)^2 + 4LM_nJ_n} = LM_n - J_n + rk_n + (2\xi_n^{1/2})^{-1}4LM_nJ_n$    (10.4)

for an intermediate value $\xi_n$ between $(LM_n - J_n + rk_n)^2$ and $(LM_n - J_n + rk_n)^2 + 4LM_nJ_n$. It follows that $CLR_n = LM_n + o_p(1) \to_d \chi_p^2$, using (9.2) and $(\xi_n^{1/2})^{-1}LM_nJ_n = o_p(1)$ (which holds because $rk_n \to_p \infty$, $LM_n = O_p(1)$, and $J_n = O_p(1)$ by (10.6) below). Analogously, it can be shown that the critical value $c(1-\alpha, rk_n)$, defined above (5.2), of the CLR test converges in probability to $\chi_{p,1-\alpha}^2$. The result of Theorem 10.1 then follows by the definition of convergence in distribution.
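The algebraic identity used to pass from the definition of $CLR_n$ to (10.3), and the resulting $CLR_n = LM_n + o_p(1)$ behavior as $rk_n \to \infty$, can be checked numerically (arbitrary illustrative values for $LM_n$ and $J_n$):

```python
import numpy as np

LM, J = 2.7, 4.1   # arbitrary nonnegative values; AR_n = LM_n + J_n by (10.2)
AR = LM + J

for rk in (0.5, 10.0, 1e8):
    lhs = (AR - rk) ** 2 + 4 * LM * rk
    rhs = (LM - J + rk) ** 2 + 4 * LM * J
    assert np.isclose(lhs, rhs)   # the identity behind (10.3)

# as rk -> infinity, the CLR statistic collapses to the LM statistic
rk = 1e8
clr = 0.5 * (AR - rk + np.sqrt((AR - rk) ** 2 + 4 * LM * rk))
assert abs(clr - LM) < 1e-4
```

This is the numerical counterpart of the statement that, under Assumption R(a), the CLR test behaves asymptotically like the LM test.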
Second, suppose Assumption R(b) holds. Then, using Lemma 8.2, we have $(n^{1/2}\widehat{g}_n, n^{1/2}(\widehat{D}_n - E_{F_n}G_i), rk_n) \to_d (g_h, D_h, r_h(D_h))$. By the proof of Lemma 8.3 applied with $W_F = \Omega_F^{-1/2}$ and $U_F = I_p$ (which correspond to $\widehat{W}_n = \widehat{\Omega}_n^{-1/2}$ and $\widehat{U}_n = I_p$), using the former result in place of $(n^{1/2}\widehat{g}_n, n^{1/2}(\widehat{D}_n - E_{F_n}G_i)) \to_d (g_h, D_h)$, we obtain

$(n^{1/2}\widehat{g}_n,\ n^{1/2}(\widehat{D}_n - E_{F_n}G_i),\ n^{1/2}\Omega_n^{-1/2}\widehat{D}_nT_n,\ rk_n) \to_d (g_h, D_h, \Delta_h, r_h(D_h))$,    (10.5)

where $\Omega_n := \Omega_{F_n}$, $(D_h, \Delta_h)$ and $g_h$ are independent, and $\Delta_h$ has full column rank $p$ with probability one by Lemma 8.3(d) (because we are considering sequences $\{\lambda_{w_n,h} : n \ge 1\}$ with $\lambda_{w_n,h} \in \Lambda_0$ $\forall n \ge 1$, $W_F = \Omega_F^{-1/2}$, and $U_F = I_p$). In addition, $\widehat{\Omega}_n \to_p h_{5,g}$, $h_{5,g}$ is pd, and $M_{\widehat{\Omega}_n^{-1/2}\widehat{D}_n} = M_{(\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2})\,n^{1/2}\Omega_n^{-1/2}\widehat{D}_nT_n}$ because $T_n$ (defined in (8.18)) and $\widehat{\Omega}_n^{-1/2}\Omega_n^{1/2}$ are nonsingular. These results and the CMT imply that

$J_n \to_d J_h := g_h'h_{5,g}^{-1/2}M_{\Delta_h}h_{5,g}^{-1/2}g_h$.    (10.6)

The convergence results in (9.2) and (10.6) and $rk_n \to_d r_h(D_h)$ hold jointly by (10.5) and the definitions of $LM_n$ and $J_n$ in (4.3) and (10.1).
Note that LM_h = ḡ'_h h_{5,g}^{−1/2} P_{Δ̄_h} h_{5,g}^{−1/2} ḡ_h by (9.1) and (9.2). Conditional on Δ̄_h, P_{Δ̄_h} h_{5,g}^{−1/2} ḡ_h and M_{Δ̄_h} h_{5,g}^{−1/2} ḡ_h have a joint normal distribution with zero covariance (because Var(h_{5,g}^{−1/2} ḡ_h) = I_k and P_{Δ̄_h} M_{Δ̄_h} = 0^{k×k}) and, hence, are independent. The same holds true conditional on D̄_h, because Δ̄_h is a nonrandom function of D̄_h and D̄_h is independent of ḡ_h. In consequence, conditional on D̄_h, LM_h and J_h are independent and distributed as χ²_p and χ²_{k−p}, respectively.
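This conditional χ²_p/χ²_{k−p} split is easy to reproduce by simulation: for any fixed full-column-rank conditioning matrix, the projection and residual quadratic forms in a standard normal k-vector are independent chi-squares whose degrees of freedom sum to k. A minimal sketch (the matrix Delta and the dimensions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, n_draws = 5, 2, 100_000
Delta = rng.standard_normal((k, p))  # a fixed conditioning matrix, full column rank a.s.
P = Delta @ np.linalg.inv(Delta.T @ Delta) @ Delta.T  # projection onto col(Delta)
M = np.eye(k) - P                                     # residual projection
z = rng.standard_normal((n_draws, k))  # plays the role of h_{5,g}^{-1/2} g_h ~ N(0, I_k)
lm = np.einsum('ij,jk,ik->i', z, P, z)  # ~ chi-square(p)
jj = np.einsum('ij,jk,ik->i', z, M, z)  # ~ chi-square(k - p)
assert np.allclose(lm + jj, (z * z).sum(1))   # LM + J = AR
assert abs(lm.mean() - p) < 0.05              # mean of chi-square(p)
assert abs(jj.mean() - (k - p)) < 0.05        # mean of chi-square(k - p)
assert abs(np.corrcoef(lm, jj)[0, 1]) < 0.05  # zero covariance, hence independence here
```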
Using the convergence results in (10.5) and (10.6), the definition of CLR_n in (5.1) with AR_n = LM_n + J_n substituted in, and the CMT, we obtain

CLR_n →_d CLR_h := (1/2)[LM_h + J_h − r_h + ((LM_h + J_h − r_h)² + 4 LM_h r_h)^{1/2}],  (10.7)

where r_h := r_h(D̄_h).
The function c(1−α, r) (defined in (5.2)) is continuous in r on R₊ by the absolute continuity of the distributions of χ²_p and χ²_{k−p}, which appear in clr(r) (also defined in (5.2)), and the continuity of clr(r) in r a.s. This, rk_n →_d r_h, and (10.7) yield

CLR_n − c(1−α, rk_n) →_d CLR_h − c(1−α, r_h).  (10.8)

Therefore, by the definition of convergence in distribution, we have

P_{θ₀,n}(CLR_n > c(1−α, rk_n)) → P(CLR_h > c(1−α, r_h)),  (10.9)

provided P(CLR_h = c(1−α, r_h)) = 0, which holds because P(CLR_h = c(1−α, r_h) | D̄_h) = 0 a.s. The latter holds because, conditional on D̄_h, CLR_h is absolutely continuous (by (10.7), since LM_h and J_h are independent and distributed as χ²_p and χ²_{k−p} and r_h is a nonrandom function of D̄_h) and c(1−α, r_h) is a constant.

From above, conditional on D̄_h, LM_h and J_h are independent and distributed as χ²_p and χ²_{k−p}, respectively, and r_h is a constant. Thus, conditional on D̄_h, CLR_h and clr(r_h) have the same distribution. By definition, c(1−α, r_h) is the 1−α quantile of the absolutely continuous random variable clr(r_h) for any constant r_h. Hence,

P(CLR_h > c(1−α, r_h) | D̄_h) = α a.s.  (10.10)

Because the left-hand side conditional probability equals α a.s. and does not depend on D̄_h, the unconditional probability P(CLR_h > c(1−α, r_h)) equals α as well. Combined with (10.9), this gives the desired result.
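The conditional critical value c(1−α, r) of (5.2) has no closed form, but it is straightforward to approximate by Monte Carlo: draw independent χ²_p and χ²_{k−p} variates, form clr(r), and take the 1−α quantile. The sketch below (simulation size and the p = 1, k = 5 choice are illustrative) also exhibits the two limiting cases, c(1−α, 0) = χ²_{k,1−α} and c(1−α, r) → χ²_{p,1−α} as r → ∞, the latter being the convergence used in the proof of Theorem 10.1:

```python
import numpy as np

def clr_quantile(alpha, r, p, k, n_draws=400_000, seed=0):
    """Monte Carlo approximation of c(1 - alpha, r): the 1 - alpha quantile of
    clr(r) = 0.5*(X + Y - r + sqrt((X + Y - r)^2 + 4*X*r)),
    with X ~ chi2(p) and Y ~ chi2(k - p) independent, as in (5.2)."""
    rng = np.random.default_rng(seed)
    x = rng.chisquare(p, n_draws)
    y = rng.chisquare(k - p, n_draws)
    s = x + y - r
    clr = 0.5 * (s + np.sqrt(s * s + 4.0 * x * r))
    return np.quantile(clr, 1.0 - alpha)

cv0 = clr_quantile(0.05, 0.0, p=1, k=5)      # clr(0) = chi2(1) + chi2(4) = chi2(5)
cv_inf = clr_quantile(0.05, 1e8, p=1, k=5)   # clr(r) -> chi2(1) as r -> infinity
assert abs(cv0 - 11.0705) < 0.15             # chi2(5) 95% quantile
assert abs(cv_inf - 3.8415) < 0.08           # chi2(1) 95% quantile
```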
11  Asymptotic Size of the CLR Test with Jacobian-Variance Weighting when p = 1
In this section, we prove the test results of Theorem 5.2, which concerns Kleibergen's CLR test (and CS) with Jacobian-variance weighting when p = 1. The CS results of Theorem 5.2 hold by an analogous argument; see Comments (i) and (ii) to Proposition 8.1.
Proof of Theorem 5.2. We prove the test results of Theorem 5.2 using Proposition 8.1 and results (or variants of results) in Lemma 8.3 and Theorems 8.4, 9.1, and 10.1. The proof is made more complicated by the fact that we need to use two different definitions of Ŵ_n. To obtain the asymptotic distribution of the LM statistic (which is a component of the CLR statistic), we need to take Ŵ_n = Ω̂_n^{−1/2} and Û_n = 1, because the LM statistic (defined in (4.3)) depends on Ω̂_n^{−1/2} D̂_n. But, to obtain the asymptotic distribution of the rank statistic rk_n := n D̂'_n Ṽ_{Dn}^{−1} D̂_n (defined in (5.8)), we need to take Ŵ_n = Ṽ_{Dn}^{−1/2} and Û_n = 1, because rk_n depends on Ṽ_{Dn}^{−1/2} D̂_n.

For notational simplicity, we establish results below for sequences {n}, rather than subsequences {w_n} of {n}. Subsequence results hold by replacing n by w_n in the proofs.
We proceed as follows. First, we apply Lemma 8.3 exactly as in the proof of Theorem 9.1 with Ŵ_n = Ω̂_n^{−1/2}, Û_n = 1, W_F = Ω_F^{−1/2}, and U_F = 1. This yields n^{1/2}(ĝ_n, D̂_n − E_{F_n} G_i, W_{F_n} D̂_n U_{F_n} T_n) →_d (ḡ_h, D̄_h, Δ̄_h) for sequences {λ_{n,h} : n ≥ 1} that correspond to distributions F in F_{WU} ∩ F₀ based on these definitions of W_F and U_F. As discussed in the paragraph containing (8.5), F₀ = F_{WU} ∩ F₀ for δ_{WU} sufficiently small and M_{WU} sufficiently large. We employ constants δ_{WU} and M_{WU} for which this holds. The joint convergence result above yields the asymptotic distributions of the AR_n, LM_n, and J_n statistics via the calculations in (9.1), (9.2), (10.1), (10.2), and (10.6).
Next, we take Ŵ_n = Ṽ_{Dn}^{−1/2}, Û_n = 1, and W_F = W_{2F} = (Var_F(G_i) − Γ_F^{G_i} Ω_F^{−1} Γ_F^{G_i'})^{−1/2}, where Γ_F^{G_i} and Ω_F are defined in (3.2), W₁(·) equals the identity function on W₂ := R^{k×k}, U_F = U_{2F} = 1, and U₁(·) equals the identity function on U₂ := R. We consider distributions in F_{JVW,p=1} (which is a subset of F₀ for suitable choices of the constants, by the paragraph following (5.7)). We obtain the asymptotic distribution of rk_n under the corresponding sequences {λ_{n,h} : n ≥ 1} (which differ from the sequences {λ_{n,h} : n ≥ 1} in the previous paragraph due to the difference between the two definitions of W_F). More specifically, we verify the convergence results in Assumption R for rk_n := n D̂'_n Ṽ_{Dn}^{−1} D̂_n (defined in (5.8)) for the {λ_{n,h} : n ≥ 1} sequences of this paragraph.
The result of Theorem 8.4(a), (b), (e), and (f) verifies the convergence results in Assumption R for sequences {λ_{n,h} : n ≥ 1} for which F_n ∈ F_{JVW,p=1} ∀n ≥ 1, provided Assumption WU holds for such sequences with Ŵ_{2n} = Ŵ_n = Ṽ_{Dn}^{−1/2}, W₁(·) equal to the identity function, Û_{2n} = Û_n = 1, U₁(·) equal to the identity function, and the parameter space Λ_{WU} being equal to

Λ_{JVW,p=1} := {λ : λ = (λ_{1,F}, ..., λ_{9,F}) for some F ∈ F_{WU} ∩ F_{JVW,p=1}}.

Here F_{WU} is defined in (8.5) with W_F = (Var_F(G_i) − Γ_F^{G_i} Ω_F^{−1} Γ_F^{G_i'})^{−1/2} and U_F = 1. Note that F_{JVW,p=1} = F_{WU} ∩ F_{JVW,p=1} for δ_{WU} > 0 sufficiently small and M_{WU} < ∞ sufficiently large (and we employ constants δ_{WU} and M_{WU} that satisfy these conditions). This holds because, for all F ∈ F_{JVW,p=1}: (i) λ_min(W_F) = λ_max(Var_F(G_i) − Γ_F^{G_i} Ω_F^{−1} Γ_F^{G_i'})^{−1/2} ≥ M_+^{−1/2} for some M_+ < ∞, because Var_F(G_i) − Γ_F^{G_i} Ω_F^{−1} Γ_F^{G_i'} is psd, λ_max(Var_F(G_i)) ≤ λ_max(E_F G_i G'_i) (using Var_F(G_i) = E_F G_i G'_i − E_F G_i E_F G'_i), and ||E_F G_i G'_i|| ≤ M_+ for some M_+ < ∞ by the moment conditions in F; (ii) ||W_F|| = λ_min(Var_F(G_i) − Γ_F^{G_i} Ω_F^{−1} Γ_F^{G_i'})^{−1/2} is bounded above using the condition in F_{JVW,p=1} that λ_min(Var_F(G_i) − Γ_F^{G_i} Ω_F^{−1} Γ_F^{G_i'}) is bounded away from zero; and (iii) ||U_F|| = λ_min(U_F) = 1.

Assumption WU(b) holds automatically with h₈ = 1 because Û_{2n} := 1. The requirement of Assumption WU(c) that W₁(·) is continuous at h₇ and U₁(·) is continuous at h₈ also holds automatically because W₁(·) and U₁(·) are identity functions.
Assumption WU(a) for the parameter space Λ_{JVW,p=1} requires that Ŵ_{2n} →_p h₇ (:= lim W_{2F_n}). For sequences {λ_{n,h} : n ≥ 1}, we have

Ṽ_{Dn} := n^{−1} Σ_{i=1}^n (G_i − Ĝ_n)(G_i − Ĝ_n)' − Γ̂_n Ω̂_n^{−1} Γ̂'_n
       = E_{F_n}(G_i − E_{F_n} G_i)(G_i − E_{F_n} G_i)' − Γ_{F_n}^{G_i} Ω_{F_n}^{−1} Γ_{F_n}^{G_i'} + o_p(1)
       = W_{2F_n}^{−2} + o_p(1) →_p h₇^{−2},  (11.1)

where the first equality holds by (5.3), the second equality holds by the WLLN applied multiple times and Slutsky's Theorem using the conditions in F, the third equality holds by the definition of W_{2F}, and the convergence holds because W_{2F_n} = λ_{7,F_n} → h₇ by the definition of the sequence {λ_{n,h} : n ≥ 1} and h₇ is pd (since h₇ = lim W_{2F_n} and the eigenvalues of W_{2F}^{−2} are bounded above for F ∈ F). Equation (11.1) and Slutsky's Theorem give Ṽ_{Dn}^{−1/2} →_p h₇, because h₇^{−2} is pd using the condition in F_{JVW,p=1} that λ_min(Var_F(G_i) − Γ_F^{G_i} Ω_F^{−1} Γ_F^{G_i'}) is bounded away from zero. In consequence, Assumption WU(a) holds.
This completes the verification of Assumption WU for the parameter space Λ_{JVW,p=1} and, in consequence, the verification of the convergence results of Assumption R for rk_n for sequences {λ_{n,h} : n ≥ 1} defined in the fourth paragraph of this proof.
Now we consider sequences {λ_{n,h} : n ≥ 1} that satisfy the conditions on {λ_{n,h} : n ≥ 1} given in both the third and fourth paragraphs of this proof. These sequences correspond to distributions F in F_{JVW,p=1}. These sequences satisfy the convergence conditions in (8.11) using the definitions in (8.9) and (8.10) with τ_{jF}, B_F, and C_F defined based on W_F = Ω_F^{−1/2}, and with these quantities and W_{2F} defined based on W_F = (Var_F(G_i) − Γ_F^{G_i} Ω_F^{−1} Γ_F^{G_i'})^{−1/2}. In consequence, for these sequences of distributions {λ_{n,h} : n ≥ 1}, the results above establish the asymptotic distributions of the AR_n, LM_n, J_n, and rk_n statistics, and the convergence is joint because all of the convergence results are based on the underlying CLT result in Lemma 8.2. Given this joint convergence, by the same arguments as given in the proof of Theorem 10.1, we obtain that the CLR test with Jacobian-variance weighting has asymptotic null rejection probabilities equal to α under all such sequences {λ_{n,h} : n ≥ 1} (and all subsequences of such sequences).
Finally, we apply Proposition 8.1 with λ and h_n(λ) given by the concatenation of the λ vectors and h_n(λ) functions used in the third and fourth paragraphs above and with Λ given by the product space of the Λ spaces used in these paragraphs. (Redundant elements of λ and h_n(λ) do not cause any problems.) The result of the previous paragraph verifies Assumption B for this choice of λ, h_n(λ), and Λ. In consequence, Proposition 8.1 implies that the Jacobian-variance weighted CLR test has correct asymptotic size and is asymptotically similar when p = 1. □
12  The Eigenvalue Condition in F₀
In this section, we show that the restriction λ_{p−j}(τ_{jF}(θ)) ≥ δ₁ > 0 in F₀^j, defined in (3.7), is not redundant. If this restriction is weakened to λ_{p−j}(τ_{jF}(θ)) > 0, we show that, for some models, some sequences of distributions, and some (consistent) choices of variance and covariance estimators, the LM statistic in (4.3) has a χ²_k asymptotic distribution. This leads to over-rejection of the null when the standard χ²_p critical value is used and the parameters are over-identified (i.e., k > p). On the other hand, we show that the LM statistic equals zero a.s. for some models and some distributions F if the condition λ_{p−j}(τ_{jF}(θ)) ≥ δ₁ > 0 is removed entirely. This implies that the LM test also under-rejects the null hypothesis and is nonsimilar in both finite samples and asymptotically for some F.
All of the CLR tests considered in Sections 5 and 6, except that of Smith (2007), are functions of the LM statistic in (4.3) (and other statistics). In consequence, the aberrant behavior of the LM statistic and test demonstrated in this section, when the restriction λ_{p−j}(τ_{jF}(θ)) ≥ δ₁ > 0 in F₀ is weakened or eliminated, carries over to the CLR statistics and tests in Sections 5 and 6.⁴⁹
12.1  Eigenvalue Condition Counter-Examples

For simplicity, we consider the case p = 1 in this section. As above, the null hypothesis is H₀: θ = θ₀.
Lemma 12.1 (a) Suppose F₀ is defined with the condition λ_{p−j}(τ_{jF}(θ)) > 0 in place of λ_{p−j}(τ_{jF}(θ)) ≥ δ₁ > 0 in F₀^j for all j ∈ {0, ..., p}, where p = 1. Suppose Ω̂_n(θ) is defined in (4.1) and Γ̂_{1n}(θ) = n^{−1} Σ_{i=1}^n G_i(θ) g_i(θ)' (which differs from its definition in (4.3)). Then, there exist moment functions g(W_i, θ) and a sequence of null distributions {F_n ∈ F₀ : n ≥ 1} for which Ω̂_n = Ω̂_n(θ₀) and Γ̂_{1n} = Γ̂_{1n}(θ₀) are well-behaved (in the sense that Ω̂_n − E_{F_n} g_i g'_i →_p 0^{k×k} and Γ̂_{1n} − E_{F_n} G_i g'_i →_p 0^{k×k}) and LM_n(θ₀) = AR_n(θ₀) + o_p(1) →_d χ²_k.

(b) Suppose F₀ is defined with the condition λ_{p−j}(τ_{jF}(θ)) ≥ δ₁ > 0 deleted in F₀^j for all j ∈ {0, ..., p}, where p = 1. Suppose Ω̂_n(θ) and Γ̂_{1n}(θ) are defined in (4.1) and (4.3), respectively. Then, there exist moment functions and a null distribution F ∈ F₀ for which LM_n(θ₀) = 0 a.s. for all n ≥ 1.
Comments: (i) The model we use to prove Lemma 12.1(a) is the linear IV regression model with one endogenous rhs variable and (for simplicity) no exogenous variables. Specifically, the model is

y_{1i} = y_{2i} θ + u_i and y_{2i} = Z'_i π + v_{2i},  (12.1)

where y_{1i}, θ, y_{2i}, v_{2i} ∈ R, Z_i, π ∈ R^k, v_{2i} = ρ u_i + ρ̄ ξ_i for some random variable ξ_i, ρ̄ = (1 − ρ²)^{1/2}, and the observations are i.i.d. across i for any given n. The parameter space F* for the distribution F of the random vector W_i = (y_{1i}, y_{2i}, Z'_i)' is

F* := {F : (12.1) holds with π = π_F ∈ R^k, ρ = ρ_F ∈ (−1, 1), Z_i, u_i, and ξ_i are mutually independent, E_F u_i = E_F ξ_i = 0, E_F u²_i = E_F ξ²_i = 1, E_F ||(u_i, ξ_i, Z'_i Z_i)||^{2+γ} ≤ M, and λ_min(E_F Z_i Z'_i) ≥ δ}  (12.2)

for some γ, δ > 0 and M < ∞. As defined, ρ is the correlation between u_i and v_{2i}.

⁴⁹ Smith's (2007) CLR test is a function of the LM statistic in (4.3) but with Ω̂_n^{−1/2} D̂_n replaced by D̂_n.
The moment functions are g(W_i, θ) = Z_i(y_{1i} − y_{2i} θ). When the null value θ₀ ∈ R^p is the true value, this gives g_i = g_i(θ₀) = Z_i u_i and G_i = G_i(θ₀) = −Z_i y_{2i}. The set F* is a subset of F₀ when the latter is defined with the condition λ_{p−j}(τ_{jF}(θ)) > 0 in place of λ_{p−j}(τ_{jF}(θ)) ≥ δ₁ > 0. This holds because (i) for all F ∈ F*, λ_min(E_F g_i g'_i) = λ_min(E_F Z_i Z'_i) E_F u²_i > 0; (ii) λ_min(Ω_F^{vec(G_i)}) > 0 (by the argument in the paragraph that contains (3.11), because λ_min(E_F Z_i Z'_i) > 0, λ_min(E_F ε_i ε'_i) > 0 for ε_i = (u_i, ξ_i)', and ρ_F ∈ (−1, 1)); and (iii) the λ_{p−j}(τ_{jF}(θ₀)) > 0 condition holds for all j ∈ {0, ..., p} (by the results and arguments in the paragraphs that contain (17.1)-(17.3), which verify that condition (iv), stated in (3.9), is a sufficient condition for the λ_{p−j}(·) condition in F₀^j). The quantity λ_min(Ω_{F_n}^{vec(G_i)}) is arbitrarily close to zero for ρ_{F_n} arbitrarily close to one.
We consider a sequence of distributions {F_n ∈ F* : n ≥ 1} for which π_{F_n} = 0^k for all n ≥ 1, ρ_n := ρ_{F_n} → 1, and E_{F_n} Z_i Z'_i does not depend on n. For these distributions,

G_i = −ρ_n g_i + ρ̄_n Ḡ_i, where Ḡ_i := −Z_i ξ_i and ρ̄_n := (1 − ρ²_n)^{1/2}.  (12.3)

In this case, the IVs are irrelevant and the degree of endogeneity is close to perfect for n large.
(ii) The model we consider in Lemma 12.1(b) is the same as that in part (a) except that F* allows for ρ = ρ_F ∈ (−1, 1] and we consider a single distribution F with π_F = 0^k and ρ_F = 1, rather than a drifting sequence of distributions. For this distribution, λ_min(Ω_F^{vec(G_i)}) = 0.
(iii) The intuition for the results in Lemma 12.1(a) and (b) is as follows. As (12.3) shows, G_i is close to being proportional to g_i when π_{F_n} = 0^k and ρ_n is close to one. And, when π_F = 0^k and ρ_F = 1, they are exactly proportional. By averaging over i = 1, ..., n and by taking expectations, the same properties are seen to hold for Ĝ_n and ĝ_n and their population counterparts. In consequence, D̂_n (:= Ĝ_n − Γ̂_{1n} Ω̂_n^{−1} ĝ_n when p = 1) is close to 0^k (because it is a sample version of the residual from the L²(F) projection of G_i on g_i) and the same is true of the population counterpart of D̂_n (because it is the residual from the L²(F) projection of G_i on g_i). The latter implies that the direction of the k-vector D̂_n is primarily random. In consequence, this direction turns out to be sensitive to the specification of the sample matrices Ω̂_n and Γ̂_{1n} even within the class of consistent estimators of their population counterparts.

One consistent choice of Ω̂_n and Γ̂_{1n} (used in Lemma 12.1(a)) yields a D̂_n that is very close to being proportional to ĝ_n. In this case, the projection of Ω̂_n^{−1/2} ĝ_n onto Ω̂_n^{−1/2} D̂_n is asymptotically equivalent to Ω̂_n^{−1/2} ĝ_n itself. The LM statistic is a quadratic form in this projection k-vector (i.e., P_{Ω̂_n^{−1/2} D̂_n} Ω̂_n^{−1/2} ĝ_n) multiplied by n. Hence, it behaves asymptotically like a quadratic form in Ω̂_n^{−1/2} ĝ_n multiplied by n, which is just the AR statistic. This explains the result in Lemma 12.1(a).

On the other hand, when ρ_n = 1 (which implies that Ĝ_n = −ĝ_n by (12.3)), another consistent choice of Ω̂_n and Γ̂_{1n} (used in Lemma 12.1(b)) yields D̂_n = 0^k a.s. In this case, the projection of Ω̂_n^{−1/2} ĝ_n onto Ω̂_n^{−1/2} D̂_n equals 0^k a.s. Hence, the LM statistic (which is a quadratic form in this projection times n) equals zero a.s. This explains the result in Lemma 12.1(b).
(iv) The result of Lemma 12.1(a) also holds for the model described in Comment (ii). Hence, drifting sequences of distributions are not required to show the result of Lemma 12.1(a) if one removes the condition λ_{p−j}(τ_{jF}(θ)) ≥ δ₁ > 0 entirely from F₀^j. Furthermore, the result of Lemma 12.1(a) can be extended to cover weak IV cases (in which π_n ≠ 0^k, but π_n → 0^k sufficiently quickly as n → ∞), rather than the irrelevant IV case (in which π_n = 0^k).
(v) Finite sample simulations corroborate the asymptotic result given in Lemma 12.1(a). For the model and LM test described in Comment (i) with k = 5, π = 0^k, ρ = 1, Z_i ~ N(0⁵, I₅), (u_i, ξ_i) ~ N(0², I₂), and Z_i independent of (u_i, ξ_i), the null rejection rate of the nominal 5% LM test is 59.4% when n = 200 and 57.6% when n = 1000. However, when ρ deviates from 1 even by a small amount, the magnitude of over-rejection drops very quickly. The null rejection rate of this nominal 5% LM test is 10.1% when ρ = 0.99 and n = 200 and 12.9% when ρ = 0.998 and n = 1000. (These simulation results are based on 50,000 simulation repetitions.)
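A scaled-down version of this experiment is easy to replicate. The sketch below simulates the ρ = 1, π = 0 design with the part-(a) choice Γ̂_{1n} = n^{−1} Σ G_i g'_i, and compares it with a strongly identified design (π ≠ 0, ρ = 0); the replication count and sample size are reduced from those in the text, so the rates are noisier than the 59.4% and near-5% figures reported there:

```python
import numpy as np

def lm_stat(Z, y1, y2, theta0=0.0):
    # LM statistic for p = 1 with the centered Omega_hat of (4.1) and the
    # part-(a) choice Gamma1_hat = n^-1 sum_i G_i g_i' (uncentered)
    n = Z.shape[0]
    g = Z * (y1 - y2 * theta0)[:, None]   # g_i = Z_i (y_{1i} - y_{2i} theta0)
    G = -Z * y2[:, None]                  # G_i = -Z_i y_{2i}
    gbar, Gbar = g.mean(0), G.mean(0)
    Omega = g.T @ g / n - np.outer(gbar, gbar)
    Gamma1 = G.T @ g / n
    Oinv = np.linalg.inv(Omega)
    D = Gbar - Gamma1 @ Oinv @ gbar
    return n * (gbar @ Oinv @ D) ** 2 / (D @ Oinv @ D)

def null_rejection_rate(rho, pi_scale, n=100, k=5, reps=1000, seed=0):
    rng = np.random.default_rng(seed)
    cv = 3.8415  # chi-square(1) 95% critical value
    rej = 0
    for _ in range(reps):
        Z = rng.standard_normal((n, k))
        u, xi = rng.standard_normal(n), rng.standard_normal(n)
        y2 = Z @ (pi_scale * np.ones(k)) + rho * u + np.sqrt(1 - rho**2) * xi
        y1 = u  # theta = theta0 = 0, so y1 = u under the null
        rej += lm_stat(Z, y1, y2) > cv
    return rej / reps

bad = null_rejection_rate(rho=1.0, pi_scale=0.0)   # irrelevant IVs, perfect endogeneity
good = null_rejection_rate(rho=0.0, pi_scale=1.0)  # strong IVs, no endogeneity
assert bad > 0.35   # severe over-rejection: LM behaves like AR ~ chi-square(5)
assert good < 0.15  # near-nominal rejection in the well-identified design
```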
(vi) The conditions of Lemma 12.1(a) and (b) are consistent with those of Theorem 1 of Kleibergen (2005). This implies that the χ²_p asymptotic distribution of the LM statistic obtained in the latter only holds under additional conditions, such as those in F₀.
12.2  Proof of Lemma 12.1
Proof of Lemma 12.1. To prove part (a), we use the model defined in (12.1)-(12.3). We have

Ĝ_n = −ρ_n ĝ_n + ρ̄_n Ḡ_n, where Ḡ_n := n^{−1} Σ_{i=1}^n Ḡ_i, and
Γ̂_{1n} := n^{−1} Σ_{i=1}^n G_i g'_i = n^{−1} Σ_{i=1}^n (−ρ_n g_i + ρ̄_n Ḡ_i) g'_i = −ρ_n (Ω̂_n + ĝ_n ĝ'_n) + ρ̄_n Γ̄_{1n}, where Γ̄_{1n} := n^{−1} Σ_{i=1}^n Ḡ_i g'_i,  (12.4)

by (12.3). We choose {ρ_n : n ≥ 1} to converge to one sufficiently fast that n ρ̄_n → 0. For example, we can take ρ_n = (1 − n^{−3})^{1/2}, in which case ρ̄_n = n^{−3/2}.

Using the results above, we obtain

D̂_n = Ĝ_n − Γ̂_{1n} Ω̂_n^{−1} ĝ_n = −ρ_n ĝ_n + ρ̄_n Ḡ_n + [ρ_n (Ω̂_n + ĝ_n ĝ'_n) − ρ̄_n Γ̄_{1n}] Ω̂_n^{−1} ĝ_n = ρ_n (ĝ'_n Ω̂_n^{−1} ĝ_n) ĝ_n + ρ̄_n (Ḡ_n − Γ̄_{1n} Ω̂_n^{−1} ĝ_n).  (12.5)

This gives

g̃_n := D̂_n/(ρ_n ĝ'_n Ω̂_n^{−1} ĝ_n) = ĝ_n + ρ̄_n ε̄_n, where ε̄_n := (Ḡ_n − Γ̄_{1n} Ω̂_n^{−1} ĝ_n)/(ρ_n ĝ'_n Ω̂_n^{−1} ĝ_n) = O_p(n^{1/2}) and g̃_n = ĝ_n + o_p(n^{−1/2}),  (12.6)

where ε̄_n = O_p(n^{1/2}) because ρ_n → 1, Ḡ_n = O_p(n^{−1/2}) by the CLT since E_{F_n} Ḡ_i = −E_{F_n} Z_i E_{F_n} ξ_i = 0^k, Γ̄_{1n} Ω̂_n^{−1} = O_p(1) by the WLLN applied twice and λ_min(E_{F_n} g_i g'_i) = λ_min(E_{F_n} Z_i Z'_i) > 0, ĝ_n = O_p(n^{−1/2}) by the CLT, and (n ĝ'_n Ω̂_n^{−1} ĝ_n)^{−1} = O_p(1), which holds by the CMT because AR_n = n ĝ'_n Ω̂_n^{−1} ĝ_n →_d χ²_k (by the CLT, WLLN, and CMT) and χ²_k > 0 a.s.; and lastly the result for g̃_n in the second line of (12.6) holds because n ρ̄_n = o(1).

Projections are invariant to nonzero scalar multiplications of the matrix that defines the projection. That is, P_A = P_{cA} for any matrix A and any scalar c ≠ 0. We have ρ_n ĝ'_n Ω̂_n^{−1} ĝ_n ≠ 0 wp→1 because (n ĝ'_n Ω̂_n^{−1} ĝ_n)^{−1} = O_p(1) and ρ_n → 1. So, the LM statistic is unchanged wp→1 when D̂_n is replaced by D̂_n/(ρ_n ĝ'_n Ω̂_n^{−1} ĝ_n) = g̃_n = ĝ_n + o_p(n^{−1/2}) using (12.6). Thus, we have

LM_n := n ĝ'_n Ω̂_n^{−1/2} P_{Ω̂_n^{−1/2} D̂_n} Ω̂_n^{−1/2} ĝ_n
      = n ĝ'_n Ω̂_n^{−1/2} P_{Ω̂_n^{−1/2} g̃_n} Ω̂_n^{−1/2} ĝ_n + o_p(1)
      = n ĝ'_n Ω̂_n^{−1} g̃_n (g̃'_n Ω̂_n^{−1} g̃_n)^{−1} g̃'_n Ω̂_n^{−1} ĝ_n + o_p(1)
      = n ĝ'_n Ω̂_n^{−1} ĝ_n + o_p(1) = AR_n + o_p(1) →_d χ²_k,  (12.7)

which completes the proof of part (a).
Next, we prove part (b). In this case, we use the model in (12.1)-(12.3) with ρ_n = 1 and π_n = 0^k for all n ≥ 1. In consequence, G_i = −g_i and Ĝ_n = −ĝ_n. Given the definitions of Ω̂_n(θ) and Γ̂_{1n}(θ) in (4.1) and (4.3), this yields

Γ̂_{1n} = n^{−1} Σ_{i=1}^n G_i g'_i − Ĝ_n ĝ'_n = −n^{−1} Σ_{i=1}^n g_i g'_i + ĝ_n ĝ'_n = −Ω̂_n,
D̂_n = Ĝ_n − Γ̂_{1n} Ω̂_n^{−1} ĝ_n = −ĝ_n + ĝ_n = 0^k, and
LM_n := n ĝ'_n Ω̂_n^{−1/2} P_{Ω̂_n^{−1/2} D̂_n} Ω̂_n^{−1/2} ĝ_n = n ĝ'_n Ω̂_n^{−1/2} P_{0^k} Ω̂_n^{−1/2} ĝ_n = 0  (12.8)

for all n ≥ 1, where the projection matrix, P_{0^k}, onto 0^k equals 0^{k×k}. □
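The exact cancellation in (12.8) can be checked mechanically: with G_i = −g_i and the centered estimators of (4.1) and (4.3), Γ̂_{1n} collapses to −Ω̂_n and D̂_n vanishes identically. A minimal numerical sketch (random data standing in for any sample with G_i = −g_i):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 4
g = rng.standard_normal((n, k))  # any sample of moment vectors g_i
G = -g                           # the rho = 1 case: G_i = -g_i exactly
gbar, Gbar = g.mean(0), G.mean(0)
Omega = g.T @ g / n - np.outer(gbar, gbar)   # Omega_hat from (4.1)
Gamma1 = G.T @ g / n - np.outer(Gbar, gbar)  # centered Gamma1_hat from (4.3)
assert np.allclose(Gamma1, -Omega)           # Gamma1_hat = -Omega_hat
D = Gbar - Gamma1 @ np.linalg.inv(Omega) @ gbar
assert np.allclose(D, 0.0)                   # D_hat = 0^k, hence LM_n = 0
```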
References

Anderson, T. W., and H. Rubin (1949): "Estimation of the Parameters of a Single Equation in a Complete Set of Stochastic Equations," Annals of Mathematical Statistics, 20, 46–63.

Andrews, D. W. K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59, 817–858.

Andrews, D. W. K., and X. Cheng (2012): "Estimation and Inference with Weak, Semi-strong, and Strong Identification," Econometrica, 80, 2153–2211. Supplemental Material available at Econometrica Supplemental Material, 80, http://www.econometricsociety.org/ecta/Supmat/9456_miscellaneous.pdf.

——— (2013a): "GMM Estimation and Uniform Subvector Inference with Possible Identification Failure," Econometric Theory, 30, 1–47.

——— (2013b): "Maximum Likelihood Estimation and Uniform Inference with Sporadic Identification Failure," Journal of Econometrics, 173, 36–56. Supplemental Material available with Cowles Foundation Discussion Paper No. 1824R, 2011, Yale University.

Andrews, D. W. K., X. Cheng, and P. Guggenberger (2009): "Generic Results for Establishing the Asymptotic Size of Confidence Sets and Tests," Cowles Foundation Discussion Paper No. 1813, Yale University.

Andrews, D. W. K., and P. Guggenberger (2014a): "Identification- and Singularity-Robust Inference for Moment Condition Models," Cowles Foundation Discussion Paper No. 1978, Yale University.

——— (2014b): "Supplemental Material to 'Asymptotic Size of Kleibergen's LM and Conditional LR Tests for Moment Condition Models'," Cowles Foundation Discussion Paper No. 1977, Yale University.

Andrews, D. W. K., M. J. Moreira, and J. H. Stock (2006): "Optimal Two-Sided Invariant Similar Tests for Instrumental Variables Regression," Econometrica, 74, 715–752.

——— (2008): "Efficient Two-Sided Nonsimilar Invariant Tests in IV Regression with Weak Instruments," Journal of Econometrics, 146, 241–254.

Andrews, I. (2014): "Conditional Linear Combination Tests for Weakly Identified Models," unpublished manuscript, Department of Economics, MIT.

Andrews, I., and A. Mikusheva (2012): "A Geometric Approach to Weakly Identified Econometric Models," unpublished manuscript, Department of Economics, MIT.

——— (2014a): "Conditional Inference with a Functional Nuisance Parameter," unpublished manuscript, Department of Economics, MIT.

——— (2014b): "Maximum Likelihood Inference in Weakly Identified Models," Quantitative Economics, forthcoming.

Berry, S., J. Levinsohn, and A. Pakes (1995): "Automobile Prices in Market Equilibrium," Econometrica, 63, 841–890.

Cavanagh, C. L., G. Elliott, and J. H. Stock (1995): "Inference in Models with Nearly Integrated Regressors," Econometric Theory, 11, 1131–1147.

Chamberlain, G. (2007): "Decision Theory Applied to an Instrumental Variables Model," Econometrica, 75, 609–652.

Chaudhuri, S., T. Richardson, J. Robins, and E. Zivot (2010): "A New Projection-Type Split-Sample Score Test in Linear Instrumental Variables Regression," Econometric Theory, 26, 1820–1837.

Chaudhuri, S., and E. Zivot (2011): "A New Method of Projection-Based Inference in GMM with Weakly Identified Nuisance Parameters," Journal of Econometrics, 164, 239–251.

Cheng, X. (2014): "Uniform Inference in Nonlinear Models with Mixed Identification Strength," unpublished manuscript, Department of Economics, University of Pennsylvania.

Chernozhukov, V., C. Hansen, and M. Jansson (2009): "Admissible Invariant Similar Tests for Instrumental Variables Regression," Econometric Theory, 25, 806–818.

Choi, I., and P. C. B. Phillips (1992): "Asymptotic and Finite Sample Distribution Theory for IV Estimators and Tests in Partially Identified Structural Equations," Journal of Econometrics, 51, 113–150.

Cragg, J. C., and S. G. Donald (1996): "On the Asymptotic Properties of LDU-Based Tests of the Rank of a Matrix," Journal of the American Statistical Association, 91, 1301–1309.

——— (1997): "Inferring the Rank of a Matrix," Journal of Econometrics, 76, 223–250.

Dufour, J.-M. (1989): "Nonlinear Hypotheses, Inequality Restrictions, and Non-Nested Hypotheses: Exact Simultaneous Tests in Linear Regressions," Econometrica, 57, 335–355.

Dufour, J.-M., and J. Jasiak (2001): "Finite Sample Limited Information Inference Methods for Structural Equations and Structural Models with Generated Regressors," International Economic Review, 42, 815–843.

Guggenberger, P. (2012): "On the Asymptotic Size Distortion of Tests When Instruments Locally Violate the Exogeneity Condition," Econometric Theory, 28, 387–421.

Guggenberger, P., F. Kleibergen, S. Mavroeidis, and L. Chen (2012): "On the Asymptotic Sizes of Subset Anderson-Rubin and Lagrange Multiplier Tests in Linear Instrumental Variables Regression," Econometrica, 80, 2649–2666.

Guggenberger, P., J. J. S. Ramalho, and R. J. Smith (2012): "GEL Statistics Under Weak Identification," Journal of Econometrics, 170, 331–349.

Guggenberger, P., and R. J. Smith (2005): "Generalized Empirical Likelihood Estimators and Tests Under Partial, Weak and Strong Identification," Econometric Theory, 21, 667–709.

Hillier, G. (2009): "Exact Properties of the Conditional Likelihood Ratio Test in an IV Regression Model," Econometric Theory, 25, 915–957.

Inoue, A., and B. Rossi (2011): "Testing for Weak Identification in Possibly Nonlinear Models," Journal of Econometrics, 161, 246–261.

Kleibergen, F. (2002): "Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression," Econometrica, 70, 1781–1803.

——— (2004): "Testing Subsets of Structural Parameters in the Instrumental Variables Regression Model," Review of Economics and Statistics, 86, 418–423.

——— (2005): "Testing Parameters in GMM Without Assuming That They Are Identified," Econometrica, 73, 1103–1123.

——— (2007): "Generalizing Weak Instrument Robust IV Statistics Towards Multiple Parameters, Unrestricted Covariance Matrices and Identification Statistics," Journal of Econometrics, 139, 181–216.

Kleibergen, F., and R. Paap (2006): "Generalized Reduced Rank Tests Using the Singular Value Decomposition," Journal of Econometrics, 133, 97–126.

McCloskey, A. (2011): "Bonferroni-Based Size-Correction for Nonstandard Testing Problems," unpublished manuscript, Department of Economics, Brown University.

Mikusheva, A. (2010): "Robust Confidence Sets in the Presence of Weak Instruments," Journal of Econometrics, 157, 236–247.

Montiel Olea, J. L. (2012): "Efficient Conditionally Similar-on-the-Boundary Tests," unpublished manuscript, Department of Economics, New York University.

Moreira, M. J. (2003): "A Conditional Likelihood Ratio Test for Structural Models," Econometrica, 71, 1027–1048.

——— (2009): "Tests with Correct Size When Instruments Can Be Arbitrarily Weak," Journal of Econometrics, 152, 131–140.

Moreira, H., and M. J. Moreira (2013): "Contributions to the Theory of Similar Tests," unpublished manuscript, FGV/EPGE, Rio de Janeiro, Brazil.

Newey, W. K., and K. West (1987a): "A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55, 703–708.

——— (1987b): "Hypothesis Testing with Efficient Method of Moments Estimation," International Economic Review, 28, 777–787.

Newey, W. K., and F. Windmeijer (2009): "Generalized Method of Moments with Many Weak Moment Conditions," Econometrica, 77, 687–719.

Otsu, T. (2006): "Generalized Empirical Likelihood Inference for Nonlinear and Time Series Models Under Weak Identification," Econometric Theory, 22, 513–527.

Phillips, P. C. B. (1989): "Partially Identified Econometric Models," Econometric Theory, 5, 181–240.

Ploberger, W. (2012): "Optimal Tests for Models with Weak Instruments," unpublished manuscript, Department of Economics, Washington University in St. Louis.

Robin, J.-M., and R. J. Smith (2000): "Tests of Rank," Econometric Theory, 16, 151–175.

Smith, R. J. (2007): "Weak Instruments and Empirical Likelihood: A Discussion of the Papers by D. W. K. Andrews and J. H. Stock and Y. Kitamura," in Advances in Economics and Econometrics, Theory and Applications: Ninth World Congress of the Econometric Society, Vol. III, ed. by R. Blundell, W. K. Newey, and T. Persson. Cambridge, UK: Cambridge University Press. Also available as CEMMAP Working Paper No. 13/05, UCL.

Staiger, D., and J. H. Stock (1997): "Instrumental Variables Regression with Weak Instruments," Econometrica, 65, 557–586.

Stock, J. H., and J. H. Wright (2000): "GMM with Weak Identification," Econometrica, 68, 1055–1096.
Contents

13 Outline
14 Proof of Lemma 8.2
15 Proof of Lemma 8.3
16 Proof of Theorem 8.4
17 Proofs of Sufficiency of Several Conditions for the λ_{p−j}(·) Condition in F₀^j
18 Asymptotic Size of Kleibergen's CLR Test with Jacobian-Variance Weighting and the Proof of Theorem 5.1
19 Proof of Theorem 7.1
13  Outline
We let AG1 abbreviate the main paper "Asymptotic Size of Kleibergen's LM and Conditional LR Tests for Moment Condition Models" and its Appendix. References to Sections with Section numbers less than 13 refer to Sections of AG1. Similarly, all theorems and lemmas with Section numbers less than 13 refer to results in AG1.

This Supplemental Material provides proofs of some of the results stated in AG1. It also provides some complementary results to those in AG1.

Sections 14, 15, and 16 prove Lemma 8.2, Lemma 8.3, and Theorem 8.4, respectively, which appear in Section 8 in the Appendix to AG1. Section 17 proves that the conditions in (3.9) and (3.10) are sufficient for the second condition in F₀^j.

Section 18 proves Theorem 5.1. Section 18 also determines the asymptotic size of Kleibergen's (2005) CLR test with Jacobian-variance weighting that employs the Robin and Smith (2000) rank statistic, defined in Section 5, for the general case of p ≥ 1. When p = 1, the asymptotic size of this test is correct. But, when p ≥ 2, we cannot show that its asymptotic size is necessarily correct (because the sample moments and the rank statistic can be asymptotically dependent under some sequences of distributions). Section 18 provides some simulation results for this test.

Section 19 proves Theorem 7.1, which provides results for time series observations.
For notational simplicity, throughout the Supplemental Material, we often suppress the argument θ₀ for various quantities that depend on the null value θ₀. Throughout the Supplemental Material, the quantities B_F, C_F, and (τ_{1F}, ..., τ_{pF}) are defined using the general definitions given in (8.6)-(8.8), rather than the definitions given in Section 3, which are a special case of the former definitions.

For notational simplicity, the proofs in Sections 14-16 are for the sequence {n}, rather than a subsequence {w_n : n ≥ 1}. The same proofs hold for any subsequence {w_n : n ≥ 1}. The proofs in these three sections use the following simplified notation. Define

D_n := E_{F_n} G_i, Ω_n := Ω_{F_n}, B_n := B_{F_n}, C_n := C_{F_n}, B_n = (B_{n,q}, B_{n,p−q}), C_n = (C_{n,q}, C_{n,k−q}),
W_n := W_{F_n}, W_{2n} := W_{2F_n}, U_n := U_{F_n}, and U_{2n} := U_{2F_n},  (13.1)

where q = q_h is defined in (8.16), B_{n,q} ∈ R^{p×q}, B_{n,p−q} ∈ R^{p×(p−q)}, C_{n,q} ∈ R^{k×q}, and C_{n,k−q} ∈ R^{k×(k−q)}.
Define

τ_{n,q} := Diag{τ_{1F_n}, ..., τ_{qF_n}} ∈ R^{q×q}, τ_{n,p−q} := Diag{τ_{(q+1)F_n}, ..., τ_{pF_n}} ∈ R^{(p−q)×(p−q)}, and

τ_n := [ τ_{n,q}        0^{q×(p−q)}
         0^{(p−q)×q}    τ_{n,p−q}
         0^{(k−p)×q}    0^{(k−p)×(p−q)} ] ∈ R^{k×p}.  (13.2)

Note that τ_n is the diagonal matrix of singular values of W_n D_n U_n; see (8.8).

14  Proof of Lemma 8.2
Lemma 8.2 of AG1. Under all sequences {λ_{n,h} : n ≥ 1},

n^{1/2} (ĝ'_n, vec(D̂_n − E_{F_n} G_i)')' →_d (ḡ'_h, vec(D̄_h)')' ~ N( 0^{(p+1)k}, [ h_{5,g}  0^{k×pk} ; 0^{pk×k}  Φ_h^{vec(G_i)} ] ).

Under all subsequences {w_n} and all sequences {λ_{w_n,h} : n ≥ 1}, the same result holds with n replaced with w_n.
Proof of Lemma 8.2. We have

n^{1/2} vec(D̂_n − D_n) = n^{−1/2} Σ_{i=1}^n vec(G_i − D_n) − [Γ̂_{1n}; ...; Γ̂_{pn}] Ω̂_n^{−1} n^{1/2} ĝ_n
 = n^{−1/2} Σ_{i=1}^n ( vec(G_i − D_n) − [E_{F_n} G_{ℓ1} g'_ℓ; ...; E_{F_n} G_{ℓp} g'_ℓ] Ω_{F_n}^{−1} g_i ) + o_p(1),  (14.1)

where [A₁; ...; A_p] denotes the pk×k matrix that stacks A₁, ..., A_p vertically, and the second equality holds by (i) the weak law of large numbers (WLLN) applied to n^{−1} Σ_{ℓ=1}^n G_{ℓj} g'_ℓ for j = 1, ..., p, n^{−1} Σ_{ℓ=1}^n vec(G_ℓ), and n^{−1} Σ_{ℓ=1}^n g_ℓ g'_ℓ, (ii) E_{F_n} g_i = 0^k, (iii) h_{5,g} = lim Ω_{F_n} is pd, and (iv) the CLT, which implies that n^{1/2} ĝ_n = O_p(1).

Using (14.1), the convergence result of Lemma 8.2 holds (with n in place of w_n) by the Lyapunov triangular-array multivariate CLT using the moment restrictions in F. The limiting covariance matrix between n^{1/2} vec(D̂_n − D_n) and n^{1/2} ĝ_n in Lemma 8.2 is a zero matrix because

E_{F_n}[G_{ij} − D_{nj} − (E_{F_n} G_{ℓj} g'_ℓ) Ω_{F_n}^{−1} g_i] g'_i = 0^{k×k} for j = 1, ..., p,  (14.2)

where D_{nj} denotes the jth column of D_n, using E_{F_n} g_i = 0^k. By the CLT, the limiting variance matrix of n^{1/2} vec(D̂_n − D_n) in Lemma 8.2 equals

lim Var_{F_n}( vec(G_i) − (E_{F_n} vec(G_ℓ) g'_ℓ) Ω_{F_n}^{−1} g_i ) = lim Φ_{F_n}^{vec(G_i)} = Φ_h^{vec(G_i)},  (14.3)

see (8.15), and the limit exists because (i) the components of Φ_{F_n}^{vec(G_i)} are comprised of submatrices of λ_{4,F_n} and λ_{5,F_n} and (ii) λ_{s,F_n} → h_s for s = 4, 5. By the CLT, the limiting variance matrix of n^{1/2} ĝ_n equals lim E_{F_n} g_i g'_i = h_{5,g}. □
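The orthogonality in (14.2) is just the population least-squares orthogonality of a projection residual: subtracting (E G_i g'_i) Ω^{−1} g_i from G_i removes all correlation with g_i. A minimal sketch, with an arbitrary joint design for (g_i, G_i) standing in for the model's moment functions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 400_000, 3
g = rng.standard_normal((N, k))            # E g_i = 0, Omega = E g_i g_i' = I_k
G = 0.5 * g + rng.standard_normal((N, k))  # some G_i correlated with g_i
Gamma, Omega = 0.5 * np.eye(k), np.eye(k)  # true E G_i g_i' and E g_i g_i' for this design
resid = G - g @ (Gamma @ np.linalg.inv(Omega)).T  # G_i - (E G g') Omega^{-1} g_i
cov_hat = resid.T @ g / N                  # sample analogue of the covariance in (14.2)
assert np.abs(cov_hat).max() < 0.02        # ~ 0 up to Monte Carlo error
```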
15  Proof of Lemma 8.3
Lemma 8.3 of AG1. Suppose Assumption WU holds for some non-empty parameter space Λ* ⊆ Λ₂. Under all sequences {λ_{n,h} : n ≥ 1} with λ_{n,h} ∈ Λ*,

n^{1/2}(ĝ_n, D̂_n − E_{F_n} G_i, W_{F_n} D̂_n U_{F_n} T_n) →_d (ḡ_h, D̄_h, Δ̄_h),

where (a) (ḡ_h, D̄_h) are defined in Lemma 8.2, (b) Δ̄_h is the nonrandom function of h and D̄_h defined in (8.17), (c) (D̄_h, Δ̄_h) and ḡ_h are independent, (d) if Assumption WU holds with W_F = Ω_F^{−1/2} and U_F = I_p, then Δ̄_h has full column rank p with probability one, and (e) under all subsequences {w_n} and all sequences {λ_{w_n,h} : n ≥ 1} with λ_{w_n,h} ∈ Λ*, the convergence result above and the results of parts (a)-(d) hold with n replaced with w_n.
The proof of part (d) of Lemma 8.3 uses the following two lemmas and corollary.
Lemma 15.1. Suppose $\Psi \in R^{k\times p}$ has a multivariate normal distribution (with possibly singular variance matrix), $k \geq p$, and the variance matrix of $\Psi\eta \in R^k$ has rank at least $p$ for all nonrandom vectors $\eta \in R^p$ with $||\eta|| = 1$. Then, $P(\Psi$ has full column rank $p) = 1$.

Comments: (i) Let Condition $\star$ denote the condition of the lemma on the variance of $\Psi\eta$. A sufficient condition for Condition $\star$ is that $\mathrm{vec}(\Psi)$ has a pd variance matrix (because $\Psi\eta = (\eta' \otimes I_k)\mathrm{vec}(\Psi)$). The converse is not true. This is proved in Comment (iii) below.

(ii) A weaker sufficient condition for Condition $\star$ is that the variance matrix of $\Psi\eta \in R^k$ has rank $k$ for all constant vectors $\eta \in R^p$ with $||\eta|| = 1$. The latter condition holds iff $\mathrm{Var}(\delta'\mathrm{vec}(\Psi)) > 0$ for all $\delta \in R^{pk}$ of the form $\delta = \eta \otimes \xi$ for some $\eta \in R^p$ and $\xi \in R^k$ with $||\eta|| = 1$ and $||\xi|| = 1$ (because $(\eta' \otimes \xi')\mathrm{vec}(\Psi) = \mathrm{vec}(\xi'\Psi\eta) = \xi'\Psi\eta$). In contrast, $\mathrm{vec}(\Psi)$ has a pd variance matrix iff $\mathrm{Var}(\delta'\mathrm{vec}(\Psi)) > 0$ for all $\delta \in R^{pk}$ with $||\delta|| = 1$.
(iii) For example, the following matrix $\Psi$ satisfies the sufficient condition given in Comment (ii) for Condition $\star$ (and hence Condition $\star$ holds), but not the sufficient condition given in Comment (i). Let $Z_j$ for $j = 1, 2, 3$ be independent standard normal random variables. Define

$$\Psi = \begin{pmatrix} Z_1 & Z_2 \\ Z_3 & Z_1 \end{pmatrix}. \quad (15.1)$$

Obviously, $\mathrm{Var}(\mathrm{vec}(\Psi))$ is not pd. On the other hand, writing $\eta = (\eta_1, \eta_2)'$ and $\xi = (\xi_1, \xi_2)'$, we have

$$\mathrm{Var}(\xi'\Psi\eta) = \mathrm{Var}(\xi_1[Z_1\eta_1 + Z_2\eta_2] + \xi_2[Z_3\eta_1 + Z_1\eta_2])$$
$$= \mathrm{Var}((\xi_1\eta_1 + \xi_2\eta_2)Z_1 + \xi_1\eta_2 Z_2 + \xi_2\eta_1 Z_3)$$
$$= (\xi_1\eta_1 + \xi_2\eta_2)^2 + (\xi_1\eta_2)^2 + (\xi_2\eta_1)^2. \quad (15.2)$$

Now, $(\xi_1\eta_2)^2 = 0$ implies $\xi_1 = 0$ or $\eta_2 = 0$, and $(\xi_2\eta_1)^2 = 0$ implies $\xi_2 = 0$ or $\eta_1 = 0$. Because $||\eta|| = ||\xi|| = 1$, we cannot have $(\xi_1, \xi_2) = (0, 0)$ or $(\eta_1, \eta_2) = (0, 0)$. So, the two cases where $(\xi_1\eta_2)^2 = (\xi_2\eta_1)^2 = 0$ are: $(\eta_1, \xi_1) = (0, 0)$ and $(\eta_2, \xi_2) = (0, 0)$. But, $(\eta_1, \xi_1) = (0, 0)$ implies $(\xi_1\eta_1 + \xi_2\eta_2)^2 = (\xi_2\eta_2)^2 > 0$ and $(\eta_2, \xi_2) = (0, 0)$ implies $(\xi_1\eta_1 + \xi_2\eta_2)^2 = (\xi_1\eta_1)^2 > 0$. Hence, $\mathrm{Var}(\xi'\Psi\eta) > 0$ for all $\xi$ and $\eta \in R^2$ with $||\xi|| = ||\eta|| = 1$, and the sufficient condition given in Comment (ii) for Condition $\star$ holds.

(iv) Condition $\star$ allows for redundant rows in $\Psi$, which corresponds to redundant moment conditions in the application of Lemma 15.1. Suppose a matrix $\Psi$ satisfies Condition $\star$ and one adds one or more rows to $\Psi$, which consist of one or more of the existing rows of $\Psi$ or some linear combinations of them. (In fact, the added rows can be arbitrary provided the resulting matrix has a multivariate normal distribution.) Call the new matrix $\Psi^+$. The matrix $\Psi^+$ also satisfies Condition $\star$ (because the rank of the variance of $\Psi^+\eta$ is at least as large as the rank of the variance of $\Psi\eta$, which is $p$).
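The variance computation in (15.2) can be checked mechanically: $\xi'\Psi\eta$ is a linear combination of $(Z_1, Z_2, Z_3)$ with coefficient vector $(\xi_1\eta_1 + \xi_2\eta_2,\ \xi_1\eta_2,\ \xi_2\eta_1)$, so its variance is the squared norm of that vector. A small sketch over random unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    xi = rng.normal(size=2); xi /= np.linalg.norm(xi)
    eta = rng.normal(size=2); eta /= np.linalg.norm(eta)
    # xi' Psi eta = (xi1*eta1 + xi2*eta2) Z1 + xi1*eta2 Z2 + xi2*eta1 Z3,
    # so Var(xi' Psi eta) is the squared norm of the coefficient vector.
    coef = np.array([xi[0]*eta[0] + xi[1]*eta[1], xi[0]*eta[1], xi[1]*eta[0]])
    var = coef @ coef
    formula = (xi[0]*eta[0] + xi[1]*eta[1])**2 + (xi[0]*eta[1])**2 + (xi[1]*eta[0])**2
    assert np.isclose(var, formula) and var > 0.0
```

Positivity of `var` on every draw is the claim of Comment (iii): the variance is bounded away from zero on unit vectors even though $\mathrm{Var}(\mathrm{vec}(\Psi))$ is singular.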
Corollary 15.2. Suppose $\Delta = (\Delta_q, \Delta_{p-q}) \in R^{k\times p}$, where $\Delta_q \in R^{k\times q}$ is a nonrandom matrix with full column rank $q$ and $\Delta_{p-q} \in R^{k\times(p-q)}$ has a multivariate normal distribution (with possibly singular variance matrix) and $k \geq p$. Let $M \in R^{k\times k}$ be a nonsingular matrix such that $M\Delta_q = (e_1,...,e_q)$, where $e_\ell$ denotes the $\ell$-th coordinate vector in $R^k$. Decompose $M = (M_1', M_2')'$ with $M_1 \in R^{q\times k}$ and $M_2 \in R^{(k-q)\times k}$. Suppose the variance matrix of $M_2\Delta_{p-q}\eta_2 \in R^{k-q}$ has rank at least $p - q$ for all nonrandom vectors $\eta_2 \in R^{p-q}$ with $||\eta_2|| = 1$. Then, we have $P(\Delta$ has full column rank $p) = 1$.

Comment: Corollary 15.2 follows from Lemma 15.1 by the following argument. We have

$$M\Delta = \begin{pmatrix} M_1\Delta_q & M_1\Delta_{p-q} \\ M_2\Delta_q & M_2\Delta_{p-q} \end{pmatrix} = \begin{pmatrix} I_q & M_1\Delta_{p-q} \\ 0^{(k-q)\times q} & M_2\Delta_{p-q} \end{pmatrix}. \quad (15.3)$$

The matrix $\Delta$ has full column rank $p$ iff $M\Delta$ has full column rank $p$ iff $M_2\Delta_{p-q}$ has full column rank $p - q$. The Corollary now follows from Lemma 15.1 applied with $M_2\Delta_{p-q}$, $k - q$, $p - q$, and $\eta_2$ in place of $\Psi$, $k$, $p$, and $\eta$, respectively.
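The reduction in (15.3) rests on two generic facts: a nonsingular $M$ preserves column rank, and once $M\Delta_q = (e_1,...,e_q)$, full rank of $M\Delta$ reduces to full rank of $M_2\Delta_{p-q}$. A hedged numerical sketch (all matrices are arbitrary draws, not the paper's objects):

```python
import numpy as np

rng = np.random.default_rng(1)
k, p, q = 5, 3, 2
Delta_q = rng.normal(size=(k, q))        # nonrandom part, full column rank q (generically)
Delta_pq = rng.normal(size=(k, p - q))   # stands in for the random part
Delta = np.hstack([Delta_q, Delta_pq])

# Build a nonsingular M with M @ Delta_q = (e_1,...,e_q), as in Corollary 15.2:
# complete Delta_q to a basis of R^k and invert.
extra = np.linalg.qr(rng.normal(size=(k, k)))[0][:, : k - q]
basis = np.hstack([Delta_q, extra])
assert abs(np.linalg.det(basis)) > 1e-12   # basis must be nonsingular before inverting
M = np.linalg.inv(basis)
assert np.allclose(M @ Delta_q, np.eye(k)[:, :q])

M2 = M[q:, :]
# Delta has full column rank p iff M2 @ Delta_{p-q} has full column rank p - q
assert np.linalg.matrix_rank(Delta) == p
assert np.linalg.matrix_rank(M2 @ Delta_pq) == p - q
```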
The following lemma is a special case of Cauchy's interlacing eigenvalues result, e.g., see Hwang (2004). As above, for a symmetric matrix $A$, let $\lambda_1(A) \geq \lambda_2(A) \geq \cdots$ denote the eigenvalues of $A$. Let $A_{-r}$ denote a principal submatrix of $A$ of order $k - r$ for $r \geq 1$; that is, $A_{-r}$ denotes $A$ with some choice of $r$ rows and the same $r$ columns deleted.

Proposition 15.3. Let $A$ be a symmetric $k \times k$ matrix. Then,
$$\lambda_1(A) \geq \lambda_1(A_{-1}) \geq \lambda_2(A) \geq \lambda_2(A_{-1}) \geq \cdots \geq \lambda_{k-1}(A) \geq \lambda_{k-1}(A_{-1}) \geq \lambda_k(A).$$

The following is a straightforward corollary of Proposition 15.3.

Corollary 15.4. Let $A$ be a symmetric $k \times k$ matrix and let $r \in \{1,...,k-1\}$. Then, (a) $\lambda_m(A) \geq \lambda_m(A_{-r})$ for $m = 1,...,k-r$ and (b) $\lambda_{m-r}(A_{-r}) \geq \lambda_m(A)$ for $m = r+1,...,k$.
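Corollary 15.4 can be spot-checked numerically: deleting $r$ rows and the matching columns of a symmetric matrix shifts each ordered eigenvalue down by at most $r$ positions. A minimal sketch on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
k, r = 6, 2
X = rng.normal(size=(k, k))
A = (X + X.T) / 2                      # symmetric k x k
keep = [0, 2, 3, 5]                    # delete rows/columns 1 and 4 (so r = 2)
A_r = A[np.ix_(keep, keep)]            # principal submatrix A_{-r}

ev = np.sort(np.linalg.eigvalsh(A))[::-1]      # eigenvalues, nonincreasing
ev_r = np.sort(np.linalg.eigvalsh(A_r))[::-1]

tol = 1e-9
# (a) lambda_m(A) >= lambda_m(A_{-r}) for m = 1,...,k-r  (0-based indices below)
assert all(ev[m] >= ev_r[m] - tol for m in range(k - r))
# (b) lambda_{m-r}(A_{-r}) >= lambda_m(A) for m = r+1,...,k
assert all(ev_r[m - r] >= ev[m] - tol for m in range(r, k))
```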
Proof of Lemma 8.3. First, we prove the convergence result in Lemma 8.3. The singular value decomposition of $W_n D_n U_n$ is

$$W_n D_n U_n = C_n \Upsilon_n B_n', \quad (15.4)$$

because $B_n$ is a matrix of eigenvectors of $U_n'D_n'W_n'W_nD_nU_n$, $C_n$ is a matrix of eigenvectors of $W_nD_nU_nU_n'D_n'W_n'$, and $\Upsilon_n$ is the $k \times p$ matrix with the singular values $\{\tau_{jF_n} : j \leq p\}$ of $W_nD_nU_n$ on the diagonal (ordered so that $\tau_{jF_n} \geq 0$ is nonincreasing in $j$).

Using (15.4), we get

$$W_nD_nU_nB_{n,q}\Upsilon_{n,q}^{-1} = C_n\Upsilon_nB_n'B_{n,q}\Upsilon_{n,q}^{-1} = C_n\Upsilon_n\begin{pmatrix} I_q \\ 0^{(p-q)\times q}\end{pmatrix}\Upsilon_{n,q}^{-1} = C_n\begin{pmatrix} I_q \\ 0^{(k-q)\times q}\end{pmatrix} = C_{n,q}, \quad (15.5)$$

where the second equality uses $B_n'B_n = I_p$. Hence, we obtain

$$W_n\hat{D}_nU_nB_{n,q}\Upsilon_{n,q}^{-1} = W_nD_nU_nB_{n,q}\Upsilon_{n,q}^{-1} + W_nn^{1/2}(\hat{D}_n - D_n)U_nB_{n,q}(n^{1/2}\Upsilon_{n,q})^{-1} = C_{n,q} + o_p(1) \to_p h_{3,q} = \bar{\Delta}_{h,q}, \quad (15.6)$$

where the second equality uses $n^{1/2}\tau_{jF_n} \to \infty$ for all $j \leq q$ (by the definition of $q$ in (8.16)), $W_n = O(1)$ (by the condition $||W_F|| \leq M_1 < \infty$ $\forall F \in \mathcal{F}_{WU}$, see (8.5)), $n^{1/2}(\hat{D}_n - D_n) = O_p(1)$ (by Lemma 8.2), $U_n = O(1)$ (by the condition $||U_F|| \leq M_1 < \infty$ $\forall F \in \mathcal{F}_{WU}$, see (8.5)), and $B_{n,q} \to h_{2,q}$ with $||\mathrm{vec}(h_{2,q})|| < \infty$ (by (8.12) using the definitions in (8.17) and (13.1)). The convergence in (15.6) holds by (8.12), (8.17), and (13.1), and the last equality in (15.6) holds by the definition of $\bar{\Delta}_{h,q}$ in (8.17).
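The algebra in (15.5), namely that multiplying the matrix by the first $q$ right singular vectors and the inverse of the corresponding singular values recovers the first $q$ left singular vectors, can be checked directly with a numerical SVD. A sketch with a generic matrix standing in for $W_nD_nU_n$:

```python
import numpy as np

rng = np.random.default_rng(3)
k, p, q = 5, 3, 2
WDU = rng.normal(size=(k, p))            # stands in for W_n D_n U_n
C, tau, Bt = np.linalg.svd(WDU)          # WDU = C Upsilon B', tau nonincreasing
B = Bt.T
Upsilon_q = np.diag(tau[:q])

# (15.5): (W D U) B_q Upsilon_{n,q}^{-1} = C_q, the first q left singular vectors
lhs = WDU @ B[:, :q] @ np.linalg.inv(Upsilon_q)
assert np.allclose(lhs, C[:, :q])
```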
Using (15.4) again, we have

$$n^{1/2}W_nD_nU_nB_{n,p-q} = n^{1/2}C_n\Upsilon_nB_n'B_{n,p-q} = n^{1/2}C_n\Upsilon_n\begin{pmatrix} 0^{q\times(p-q)} \\ I_{p-q}\end{pmatrix} = C_n\begin{pmatrix} 0^{q\times(p-q)} \\ n^{1/2}\Upsilon_{n,p-q} \\ 0^{(k-p)\times(p-q)}\end{pmatrix}$$
$$\to h_3\begin{pmatrix} 0^{q\times(p-q)} \\ \mathrm{Diag}\{h_{1,q+1},...,h_{1,p}\} \\ 0^{(k-p)\times(p-q)}\end{pmatrix} = h_3h_{1,p-q}, \quad (15.7)$$

where the second equality uses $B_n'B_n = I_p$, the convergence holds by (8.12) using the definitions in (8.17) and (13.2), and the last equality holds by the definition of $h_{1,p-q}$ in (8.17).
Using (15.7) and Lemma 8.2, we get

$$n^{1/2}W_n\hat{D}_nU_nB_{n,p-q} = n^{1/2}W_nD_nU_nB_{n,p-q} + W_nn^{1/2}(\hat{D}_n - D_n)U_nB_{n,p-q} \to_d h_3h_{1,p-q} + h_{71}\bar{D}_hh_{81}h_{2,p-q} = \bar{\Delta}_{h,p-q}, \quad (15.8)$$

where $B_{n,p-q} \to h_{2,p-q}$, $W_n \to h_{71}$, and $U_n \to h_{81}$ by (8.3), (8.12), (8.17), and Assumption WU using the definitions in (13.1), and the last equality holds by the definition of $\bar{\Delta}_{h,p-q}$ in (8.17).
Equations (15.6) and (15.8) combine to prove

$$n^{1/2}W_n\hat{D}_nU_nT_n = n^{1/2}W_n\hat{D}_nU_nB_nS_n = (W_n\hat{D}_nU_nB_{n,q}\Upsilon_{n,q}^{-1},\ n^{1/2}W_n\hat{D}_nU_nB_{n,p-q}) \to_d (\bar{\Delta}_{h,q}, \bar{\Delta}_{h,p-q}) = \bar{\Delta}_h \quad (15.9)$$

using the definition of $S_n$ in (8.19). The convergence is joint with that in Lemma 8.2 because it just relies on the convergence of $n^{1/2}(\hat{D}_n - D_n)$, which is part of the former. This establishes the convergence result of Lemma 8.3.

Properties (a) and (b) in Lemma 8.3 hold by definition. Property (c) in Lemma 8.3 holds by Lemma 8.2 and property (b) in Lemma 8.3.
Now, we prove property (d). We have

$$h_{2,p-q}'h_{2,p-q} = \lim B_{n,p-q}'B_{n,p-q} = I_{p-q} \quad \text{and} \quad h_{3,q}'h_{3,q} = \lim C_{n,q}'C_{n,q} = I_q \quad (15.10)$$

because $B_n$ and $C_n$ are orthogonal matrices by (8.6) and (8.7). Hence, if $q = p$, then $\bar{\Delta}_h = h_{3,q}$, $\bar{\Delta}_h'\bar{\Delta}_h = I_p$, and $\bar{\Delta}_h$ has full column rank.

Hence, it suffices to consider the case where $q < p$ and $W_F = \Omega_F^{-1/2}$ and $U_F = I_p$, which is assumed in part (d). We prove part (d) for this case by applying Corollary 15.2 with the $q$ of Corollary 15.2 equal to the present $q$, $\Delta_q = \bar{\Delta}_{h,q}$ $(= h_{3,q})$, $\Delta_{p-q} = \bar{\Delta}_{h,p-q}$, $M = h_3'$, $M_1 = h_{3,q}'$, $M_2 = h_{3,k-q}'$, and $\Delta = \bar{\Delta}_h$. Corollary 15.2 gives the desired result that $P(\bar{\Delta}_h$ has full column rank $p) = 1$. The condition in Corollary 15.2 that "$M\Delta_q = (e_1,...,e_q)$" holds in this case because $h_3'h_{3,q} = (e_1,...,e_q)$. The condition in Corollary 15.2 that "the variance matrix of $M_2\Delta_{p-q}\eta_2 \in R^{k-q}$ has rank at least $p - q$ for all nonrandom vectors $\eta_2 \in R^{p-q}$ with $||\eta_2|| = 1$" in this case becomes "the variance matrix of $h_{3,k-q}'\bar{\Delta}_{h,p-q}\eta_2$ has rank at least $p - q$ for all nonrandom vectors $\eta_2 \in R^{p-q}$ with $||\eta_2|| = 1$."

It remains to establish the latter property, which is equivalent to

$$\lambda_{p-q}\left(\mathrm{Var}(h_{3,k-q}'\bar{\Delta}_{h,p-q}\eta_2)\right) > 0\ \forall \eta_2 \in R^{p-q} \text{ with } ||\eta_2|| = 1. \quad (15.11)$$
We have

$$\mathrm{Var}(h_{3,k-q}'\bar{\Delta}_{h,p-q}\eta_2) = \mathrm{Var}(h_{3,k-q}'h_{5,g}^{-1/2}\bar{D}_hh_{2,p-q}\eta_2)$$
$$= ((h_{2,p-q}\eta_2)' \otimes (h_{3,k-q}'h_{5,g}^{-1/2}))\,\mathrm{Var}(\mathrm{vec}(\bar{D}_h))\,((h_{2,p-q}\eta_2) \otimes (h_{3,k-q}'h_{5,g}^{-1/2})')$$
$$= ((h_{2,p-q}\eta_2)' \otimes (h_{3,k-q}'h_{5,g}^{-1/2}))\,\Phi_h^{\mathrm{vec}(G_i)}\,((h_{2,p-q}\eta_2) \otimes (h_{3,k-q}'h_{5,g}^{-1/2})')$$
$$= \Phi_h^{a_i} \text{ for } a_i := h_{3,k-q}'h_{5,g}^{-1/2}G_ih_{2,p-q}\eta_2, \quad (15.12)$$

where the first equality holds by the definition of $\bar{\Delta}_{h,p-q}$ in (8.17) and the fact that $h_{71} = h_{5,g}^{-1/2}$ and $h_{81} = I_p$ by the conditions in part (d) of Lemma 8.3, the second and fourth equalities use the general formula $\mathrm{vec}(ABC) = (C' \otimes A)\mathrm{vec}(B)$, the third equality holds because $\mathrm{vec}(\bar{D}_h) \sim N(0^{pk}, \Phi_h^{\mathrm{vec}(G_i)})$ by Lemma 8.2, and the fourth equality uses the definition of the variance matrix $\Phi_h^{a_i}$ in (8.15) for an arbitrary random vector $a_i$.
Next, we show that $\Phi_h^{a_i}$ for $a_i = h_{3,k-q}'h_{5,g}^{-1/2}G_ih_{2,p-q}\eta_2$ equals the limit $\lim \Phi_{F_n}^{b_i}$ for $b_i := C_{n,k-q}'\Omega_{F_n}^{-1/2}G_iB_{n,p-q}\eta_2$. We have

$$\Phi_h^{a_i} = ((h_{2,p-q}\eta_2)' \otimes (h_{3,k-q}'h_{5,g}^{-1/2}))\,\Phi_h^{\mathrm{vec}(G_i)}\,((h_{2,p-q}\eta_2) \otimes (h_{3,k-q}'h_{5,g}^{-1/2})')$$
$$= \lim((B_{n,p-q}\eta_2)' \otimes (C_{n,k-q}'\Omega_{F_n}^{-1/2}))\,\Phi_{F_n}^{\mathrm{vec}(G_i)}\,((B_{n,p-q}\eta_2) \otimes (C_{n,k-q}'\Omega_{F_n}^{-1/2})')$$
$$= \lim\Big[((B_{n,p-q}\eta_2)' \otimes (C_{n,k-q}'\Omega_{F_n}^{-1/2}))E_{F_n}\mathrm{vec}(G_i)\mathrm{vec}(G_i)'((B_{n,p-q}\eta_2) \otimes (C_{n,k-q}'\Omega_{F_n}^{-1/2})')$$
$$\quad - ((B_{n,p-q}\eta_2)' \otimes (C_{n,k-q}'\Omega_{F_n}^{-1/2}))E_{F_n}\mathrm{vec}(G_i)\,E_{F_n}\mathrm{vec}(G_i)'((B_{n,p-q}\eta_2) \otimes (C_{n,k-q}'\Omega_{F_n}^{-1/2})')\Big]$$
$$= \lim\Big[E_{F_n}b_ib_i' - E_{F_n}b_i\,E_{F_n}b_i'\Big] = \lim \Phi_{F_n}^{C_{n,k-q}'\Omega_{F_n}^{-1/2}G_iB_{n,p-q}\eta_2}, \quad (15.13)$$

where the general formula $\mathrm{vec}(ABC) = (C' \otimes A)\mathrm{vec}(B)$ is used multiple times, the limits exist by the conditions imposed on the sequence $\{\lambda_{n,h} : n \geq 1\}$, the second equality uses $B_{n,p-q} \to h_{2,p-q}$, $C_{n,k-q} \to h_{3,k-q}$, and $\Omega_{F_n}^{-1/2} \to h_{5,g}^{-1/2}$, the third equality uses the definitions given in (3.2) and (8.15), respectively, and the last equality uses $E_{F_n}\mathrm{vec}(C_{n,k-q}'\Omega_{F_n}^{-1/2}G_iB_{n,p-q}\eta_2) = \mathrm{vec}(C_{n,k-q}'\Omega_{F_n}^{-1/2}D_nB_{n,p-q}\eta_2) = O(n^{-1/2})$ by (15.7) with $W_n = \Omega_{F_n}^{-1/2}$.

We can write $\lim \Phi_{F_n}^{b_i}$ as the limit of a subsequence $\{n_m : m \geq 1\}$ of matrices $\Phi_{F_{n_m}}^{b_i}$ for which $F_{n_m} \in \mathcal{F}_{0j}$ for all $m \geq 1$ for some $j = 0,...,q$. It cannot be the case that $j > q$, because if $j > q$, then we obtain a contradiction: $n_m^{1/2}\tau_{jF_{n_m}} \to \infty$ as $m \to \infty$ by the first condition of $\mathcal{F}_{0j}$, whereas $n_m^{1/2}\tau_{jF_{n_m}} \nrightarrow \infty$ as $m \to \infty$ by the definition of $q$ in (8.16).

Now, we fix an arbitrary $j \in \{0,...,q\}$. The continuity of the $\lambda_{p-j}(\cdot)$ function and the condition in $\mathcal{F}_{0j}$ imply that, for all $\eta \in R^{p-j}$ with $||\eta|| = 1$,

$$\lambda_{p-j}\left(\lim \Phi_{F_{n_m}}^{C_{n_m,k-j}'\Omega_{F_{n_m}}^{-1/2}G_iB_{n_m,p-j}\eta}\right) > 0. \quad (15.14)$$

For all $\eta_2 \in R^{p-q}$ with $||\eta_2|| = 1$, let $\eta = (0^{q-j\prime}, \eta_2')' \in R^{p-j}$. Then, $B_{n_m,p-j}\eta = B_{n_m,p-q}\eta_2$ and, by (15.14),

$$\lambda_{p-j}\left(\lim \Phi_{F_{n_m}}^{C_{n_m,k-j}'\Omega_{F_{n_m}}^{-1/2}G_iB_{n_m,p-q}\eta_2}\right) > 0\ \forall \eta_2 \in R^{p-q} \text{ with } ||\eta_2|| = 1. \quad (15.15)$$

Next, we apply Corollary 15.4(b) with $A = \lim \Phi_{F_{n_m}}^{C_{n_m,k-j}'\Omega_{F_{n_m}}^{-1/2}G_iB_{n_m,p-q}\eta_2}$, $m = p - j$, and $r = q - j$, where $A_{-(q-j)} = \lim \Phi_{F_{n_m}}^{C_{n_m,k-q}'\Omega_{F_{n_m}}^{-1/2}G_iB_{n_m,p-q}\eta_2}$ equals $A$ with its first $q - j$ rows and columns deleted in the present case, and $p > q$ implies that $m = p - j \geq r + 1 = q - j + 1$ for all $j = 0,...,q$. Corollary 15.4 and (15.15) give

$$\lambda_{p-q}\left(\lim \Phi_{F_{n_m}}^{C_{n_m,k-q}'\Omega_{F_{n_m}}^{-1/2}G_iB_{n_m,p-q}\eta_2}\right) > 0\ \forall \eta_2 \in R^{p-q} \text{ with } ||\eta_2|| = 1. \quad (15.16)$$

Equations (15.12), (15.13), and (15.16) combine to establish (15.11) and the proof of part (d) is complete.
Part (e) of the Lemma holds by replacing n by the subsequence value wn throughout the
arguments given above.
Proof of Lemma 15.1. It suffices to show that $P(\Psi\eta = 0^k$ for some $\eta \in R^p$ with $||\eta|| = 1) = 0$. For any constant $\delta > 0$, there exists a constant $K_\delta < \infty$ such that $P(||\mathrm{vec}(\Psi)|| > K_\delta) \leq \delta$. Given $\varepsilon > 0$, let $\{B(\eta_s, \varepsilon) : s = 1,...,N_\varepsilon\}$ be a finite cover of $\{\eta \in R^p : ||\eta|| = 1\}$, where $||\eta_s|| = 1$ and $B(\eta_s, \varepsilon)$ is a ball in $R^p$ centered at $\eta_s$ of radius $\varepsilon$. It is possible to choose $\{\eta_s : s = 1,...,N_\varepsilon\}$ such that the number, $N_\varepsilon$, of balls in the cover is of order $\varepsilon^{-p+1}$. That is, $N_\varepsilon \leq C_1\varepsilon^{-p+1}$ for some constant $C_1 < \infty$.

Let $\Psi_r$ denote the $r$th row of $\Psi$ for $r = 1,...,k$ written as a column vector. If $\eta \in B(\eta_s, \varepsilon)$, we have

$$||\Psi\eta - \Psi\eta_s|| = \left(\sum_{r=1}^k (\Psi_r'(\eta - \eta_s))^2\right)^{1/2} \leq \left(\sum_{r=1}^k ||\Psi_r||^2\,||\eta - \eta_s||^2\right)^{1/2} \leq \varepsilon||\mathrm{vec}(\Psi)||, \quad (15.17)$$

where the inequality holds by the Cauchy-Bunyakovsky-Schwarz inequality. If $\eta \in B(\eta_s, \varepsilon)$ and $\Psi\eta = 0^k$, this gives

$$||\Psi\eta_s|| \leq \varepsilon||\mathrm{vec}(\Psi)||. \quad (15.18)$$
Suppose $Z \in R^p$ has a multivariate normal distribution with pd variance matrix. Then, for any $\varepsilon > 0$,

$$P(||Z|| \leq \varepsilon) = \int_{\{||z|| \leq \varepsilon\}} f_Z(z)dz \leq \sup_{z \in R^p} f_Z(z)\int_{\{||z|| \leq \varepsilon\}} dz \leq C_2\varepsilon^p \quad (15.19)$$

for some constant $C_2 < \infty$, where $f_Z(z)$ denotes the density of $Z$ with respect to Lebesgue measure, which exists because the variance matrix of $Z$ is pd, and the inequalities hold because the density of a multivariate normal is bounded and the volume of a sphere in $R^p$ of radius $\varepsilon$ is proportional to $\varepsilon^p$.
For any $\eta \in R^p$ with $||\eta|| = 1$, let $B\Lambda B'$ be a spectral decomposition of $\mathrm{Var}(\Psi\eta)$, where $\Lambda$ is the diagonal $k \times k$ matrix with the eigenvalues of $\mathrm{Var}(\Psi\eta)$ on its diagonal in nonincreasing order and $B$ is an orthogonal $k \times k$ matrix whose columns are eigenvectors of $\mathrm{Var}(\Psi\eta)$ that correspond to the eigenvalues in $\Lambda$. By assumption, the rank of $\mathrm{Var}(\Psi\eta)$ is $p$ or larger. In consequence, the first $p$ diagonal elements of $\Lambda$ are positive. We have $||\Psi\eta|| = ||B'\Psi\eta||$ and $\mathrm{Var}(B'\Psi\eta) = B'\mathrm{Var}(\Psi\eta)B = \Lambda$. Let $(B'\Psi\eta)_p$ denote the $p$ vector that contains the first $p$ elements of the $k$ vector $B'\Psi\eta$ and let $\Lambda_p$ denote the upper left $p \times p$ submatrix of $\Lambda$. We have $\mathrm{Var}((B'\Psi\eta)_p) = \Lambda_p$, and $\Lambda_p$ is pd (because the first $p$ diagonal elements of $\Lambda$ are positive).

Now, given any $\delta > 0$ and $\varepsilon > 0$, we have

$$P(\Psi\eta = 0^k \text{ for some } \eta \in R^p \text{ with } ||\eta|| = 1) = P[\cup_{s=1}^{N_\varepsilon}\cup_{\eta \in B(\eta_s,\varepsilon): ||\eta||=1}\{\Psi\eta = 0^k\}]$$
$$\leq P[\cup_{s=1}^{N_\varepsilon}\{||\Psi\eta_s|| \leq \varepsilon||\mathrm{vec}(\Psi)||\}]$$
$$\leq P[\cup_{s=1}^{N_\varepsilon}\{||\Psi\eta_s|| \leq \varepsilon||\mathrm{vec}(\Psi)||\} \cap \{||\mathrm{vec}(\Psi)|| \leq K_\delta\}] + P(||\mathrm{vec}(\Psi)|| > K_\delta)$$
$$\leq P[\cup_{s=1}^{N_\varepsilon}\{||\Psi\eta_s|| \leq \varepsilon K_\delta\}] + \delta \leq \sum_{s=1}^{N_\varepsilon} P(||\Psi\eta_s|| \leq \varepsilon K_\delta) + \delta$$
$$\leq \sum_{s=1}^{N_\varepsilon} P(||(B_s'\Psi\eta_s)_p|| \leq \varepsilon K_\delta) + \delta \leq N_\varepsilon C_2 K_\delta^p \varepsilon^p + \delta$$
$$\leq C_1\varepsilon^{-p+1}C_2K_\delta^p\varepsilon^p + \delta \to \delta \text{ as } \varepsilon \to 0, \quad (15.20)$$

where the first inequality holds by (15.18) using $\eta \in B(\eta_s, \varepsilon)$, the third inequality uses the definition of $K_\delta$, the third last inequality holds because $||(B_s'\Psi\eta_s)_p|| \leq ||B_s'\Psi\eta_s|| = ||\Psi\eta_s||$ using the definitions in the paragraph that follows the paragraph that contains (15.19), the second last inequality holds by (15.19) with $Z = (B_s'\Psi\eta_s)_p$ and the fact that the variance matrix of $(B_s'\Psi\eta_s)_p$ is pd by the argument given in the paragraph following (15.19), and the last inequality holds by the bound given above on $N_\varepsilon$.

Because $\delta > 0$ is arbitrary, (15.20) implies that $P(\Psi\eta = 0^k$ for some $\eta \in R^p$ with $||\eta|| = 1) = 0$, which completes the proof.
Proof of Theorem 8.4

Theorem 8.4 of AG1. Suppose Assumption WU holds for some non-empty parameter space $\Lambda_* \subset \Lambda_{WU}$. Under all sequences $\{\lambda_{n,h} : n \geq 1\}$ with $\lambda_{n,h} \in \Lambda_*$:

(a) $\hat{\kappa}_{pn} \to_p \infty$ if $q = p$;
(b) $\hat{\kappa}_{pn} \to_d \lambda_{\min}(\bar{\Delta}_{h,p-q}'h_{3,k-q}h_{3,k-q}'\bar{\Delta}_{h,p-q})$ if $q < p$;
(c) $\hat{\kappa}_{jn} \to_p \infty$ for all $j \leq q$;
(d) the (ordered) vector of the smallest $p - q$ eigenvalues of $n\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_n$, i.e., $(\hat{\kappa}_{(q+1)n},...,\hat{\kappa}_{pn})'$, converges in distribution to the (ordered) $p - q$ vector of the eigenvalues of $\bar{\Delta}_{h,p-q}'h_{3,k-q}h_{3,k-q}'\bar{\Delta}_{h,p-q} \in R^{(p-q)\times(p-q)}$;
(e) the convergence in parts (a)-(d) holds jointly with the convergence in Lemma 8.3; and
(f) under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \geq 1\}$ with $\lambda_{w_n,h} \in \Lambda_*$, the results in parts (a)-(e) hold with $n$ replaced with $w_n$.
The proof of Theorem 8.4 uses the following rate of convergence lemma. This lemma is a key technical contribution of the paper.

Lemma 16.1. Suppose Assumption WU holds for some non-empty parameter space $\Lambda_* \subset \Lambda_{WU}$. Under all sequences $\{\lambda_{n,h} : n \geq 1\}$ with $\lambda_{n,h} \in \Lambda_*$ and for which $q$ defined in (8.16) satisfies $q \geq 1$, we have (a) $\hat{\kappa}_{jn} \to_p \infty$ for $j = 1,...,q$ and (b) when $p > q$, $\hat{\kappa}_{jn} = o_p((n^{1/2}\tau_{\ell F_n})^2)$ for all $\ell \leq q$ and $j = q+1,...,p$. Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \geq 1\}$ with $\lambda_{w_n,h} \in \Lambda_*$, the same result holds with $n$ replaced with $w_n$.
Proof of Lemma 16.1. By the definitions in (8.9) and (8.12), $h_{6,j} := \lim \tau_{(j+1)F_n}/\tau_{jF_n}$ for $j = 1,...,p-1$. By the definition of $q$ in (8.16), $h_{6,q} = 0$ if $q < p$. If $q = p$, $h_{6,q}$ is not defined by (8.9) and (8.12) and we define it here to equal zero. Because $\tau_{jF_n}$ is nonnegative and nonincreasing in $j$, $h_{6,j} \in [0,1]$. If $h_{6,j} > 0$, then $\{\tau_{jF_n} : n \geq 1\}$ and $\{\tau_{(j+1)F_n} : n \geq 1\}$ are of the same order of magnitude, i.e., $0 < \lim \tau_{(j+1)F_n}/\tau_{jF_n} \leq 1$.$^{50}$ We group the first $q$ singular values into groups that have the same order of magnitude within each group. Let $G_h$ $(\in \{1,...,q\})$ denote the number of groups. (We have $G_h \geq 1$ because $q \geq 1$ is assumed in the statement of the lemma.) Note that $G_h$ equals the number of values in $\{h_{6,1},...,h_{6,q}\}$ that equal zero. Let $r_g$ and $\bar{r}_g$ denote the indices of the first and last singular values, respectively, in the $g$th group for $g = 1,...,G_h$. Thus, $r_1 = 1$, $\bar{r}_g = r_{g+1} - 1$, where $r_{G_h+1}$ is defined to equal $q + 1$, and $\bar{r}_{G_h} = q$. Note that $r_g$ and $\bar{r}_g$ depend on $h$. By definition, the singular values in the $g$th group, which have the $g$th largest order of magnitude, are $\{\tau_{r_gF_n} : n \geq 1\},...,\{\tau_{\bar{r}_gF_n} : n \geq 1\}$. By construction, $h_{6,j} > 0$ for all $j \in \{r_g,...,\bar{r}_g - 1\}$ for $g = 1,...,G_h$. (The reason is: if $h_{6,j}$ is equal to zero for some $j \in \{r_g,...,\bar{r}_g - 1\}$, then $\{\tau_{\bar{r}_gF_n} : n \geq 1\}$ is of smaller order of magnitude than $\{\tau_{r_gF_n} : n \geq 1\}$, which contradicts the definition of $\bar{r}_g$.) Also by construction, $\lim \tau_{j'F_n}/\tau_{jF_n} = 0$ for any $(j, j')$ in groups $(g, g')$, respectively, with $g < g'$. Note that when $p = 1$ we have $G_h = 1$ and $r_1 = \bar{r}_1 = 1$.

$^{50}$ Note that $\sup_{j \geq 1, F \in \mathcal{F}_{WU}} \tau_{jF} < \infty$ by the conditions $||W_F|| \leq M_1$ and $||U_F|| \leq M_1$ in $\mathcal{F}_{WU}$ and the moment conditions in $\mathcal{F}$. Thus, $\{\tau_{jF_n} : n \geq 1\}$ does not diverge to infinity, and the "order of magnitude" of $\{\tau_{jF_n} : n \geq 1\}$ refers to whether this sequence converges to zero, and how slowly or quickly it does, when it does converge to zero.
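The grouping just described can be made concrete: given the limits $h_{6,1},...,h_{6,q}$ (with the convention $h_{6,q} = 0$), every group ends exactly at a zero, so $G_h$ is the number of zeros and the index pairs $(r_g, \bar{r}_g)$ fall out directly. A small sketch with hypothetical $h_{6,j}$ values:

```python
def singular_value_groups(h6):
    """Given [h_{6,1},...,h_{6,q}] with the convention h_{6,q} = 0, return the
    1-based (r_g, rbar_g) index pairs of the first/last member of each group."""
    groups, start = [], 1
    for j, v in enumerate(h6, start=1):
        if v == 0.0:              # a zero ratio ends the current group at index j
            groups.append((start, j))
            start = j + 1
    return groups

# Hypothetical example with q = 5: zeros at j = 2 and j = 5, so G_h = 2 groups
h6 = [0.7, 0.0, 0.5, 0.9, 0.0]
gs = singular_value_groups(h6)
assert gs == [(1, 2), (3, 5)]     # r_1 = 1, rbar_1 = 2, r_2 = 3, rbar_2 = 5
assert len(gs) == h6.count(0.0)   # G_h equals the number of zeros in {h_{6,j}}
```

The degenerate case $p = q = 1$ gives a single group, matching the remark that $G_h = 1$ and $r_1 = \bar{r}_1 = 1$ when $p = 1$.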
The eigenvalues $\{\hat{\kappa}_{jn} : j \leq p\}$ of $n\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_n$ are solutions to the determinantal equation $|n\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_n - \kappa I_p| = 0$. Equivalently, by multiplying this equation by $\tau_{r_1F_n}^{-2}n^{-1}|B_n'U_n'\hat{U}_n^{-1\prime}\hat{U}_n^{-1}U_nB_n|$, they are solutions to

$$|\tau_{r_1F_n}^{-2}B_n'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_n - (n^{1/2}\tau_{r_1F_n})^{-2}\kappa\,B_n'U_n'\hat{U}_n^{-1\prime}\hat{U}_n^{-1}U_nB_n| = 0 \quad (16.1)$$

wp→1, using $|A_1A_2| = |A_1||A_2|$ for any conformable square matrices $A_1$ and $A_2$, $|B_n| > 0$, $|U_n| > 0$ (by the conditions in $\mathcal{F}_{WU}$ in (8.5) because $\lambda_{n,h} \in \Lambda_*$ and $\Lambda_*$ only contains distributions in $\mathcal{F}_{WU}$), $|\hat{U}_n^{-1}| > 0$ wp→1 (because $\hat{U}_n \to_p h_{81}$ by (8.2), (8.12), (8.17), and Assumption WU(b) and (c) and $h_{81}$ is pd), and $\tau_{r_1F_n} > 0$ for $n$ large (because $n^{1/2}\tau_{r_1F_n} \to \infty$ for $r_1 \leq q$). (For simplicity, we omit the qualifier wp→1 from some statements below.) Thus, $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn} : j \leq p\}$ solve

$$|\tau_{r_1F_n}^{-2}B_n'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_n - \kappa(I_p + \hat{A}_n)| = 0 \ \text{ or } \ |(I_p + \hat{A}_n)^{-1}\tau_{r_1F_n}^{-2}B_n'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_n - \kappa I_p| = 0, \text{ where}$$
$$\hat{A}_n = \begin{pmatrix} \hat{A}_{1n} & \hat{A}_{2n} \\ \hat{A}_{2n}' & \hat{A}_{3n} \end{pmatrix} := B_n'U_n'\hat{U}_n^{-1\prime}\hat{U}_n^{-1}U_nB_n - I_p \quad (16.2)$$

for $\hat{A}_{1n} \in R^{\bar{r}_1\times\bar{r}_1}$, $\hat{A}_{2n} \in R^{\bar{r}_1\times(p-\bar{r}_1)}$, and $\hat{A}_{3n} \in R^{(p-\bar{r}_1)\times(p-\bar{r}_1)}$, and the second line is obtained by multiplying the first line by $|(I_p + \hat{A}_n)^{-1}|$.
We have

$$\tau_{r_1F_n}^{-1}\hat{W}_n\hat{D}_nU_nB_n = \tau_{r_1F_n}^{-1}(\hat{W}_nW_n^{-1})W_nD_nU_nB_n + (n^{1/2}\tau_{r_1F_n})^{-1}(\hat{W}_nW_n^{-1})W_nn^{1/2}(\hat{D}_n - D_n)U_nB_n$$
$$= \tau_{r_1F_n}^{-1}(\hat{W}_nW_n^{-1})C_n\Upsilon_n + O_p((n^{1/2}\tau_{r_1F_n})^{-1})$$
$$= (I_k + o_p(1))C_n\begin{pmatrix} h_{6,\bar{r}_1} + o(1) & 0^{\bar{r}_1\times(p-\bar{r}_1)} \\ 0^{(p-\bar{r}_1)\times\bar{r}_1} & O(\tau_{r_2F_n}/\tau_{r_1F_n})^{(p-\bar{r}_1)\times(p-\bar{r}_1)} \\ 0^{(k-p)\times\bar{r}_1} & 0^{(k-p)\times(p-\bar{r}_1)} \end{pmatrix} + O_p((n^{1/2}\tau_{r_1F_n})^{-1})$$
$$\to_p h_3\begin{pmatrix} h_{6,\bar{r}_1} & 0^{\bar{r}_1\times(p-\bar{r}_1)} \\ 0^{(k-\bar{r}_1)\times\bar{r}_1} & 0^{(k-\bar{r}_1)\times(p-\bar{r}_1)} \end{pmatrix}, \ \text{ where } h_{6,\bar{r}_1} := \mathrm{Diag}\{1, h_{6,1}, h_{6,1}h_{6,2},...,\prod_{\ell=1}^{\bar{r}_1-1}h_{6,\ell}\} \in R^{\bar{r}_1\times\bar{r}_1}, \quad (16.3)$$

$h_{6,\bar{r}_1} := 1$ when $\bar{r}_1 = 1$, and $O(\tau_{r_2F_n}/\tau_{r_1F_n})^{(p-\bar{r}_1)\times(p-\bar{r}_1)}$ denotes a diagonal $(p-\bar{r}_1)\times(p-\bar{r}_1)$ matrix whose diagonal elements are $O(\tau_{r_2F_n}/\tau_{r_1F_n})$. The second equality uses (15.4), $\hat{W}_n \to_p h_{71}$ (by Assumption WU(a) and (c)), $||h_{71}|| = ||\lim W_n|| < \infty$ (by the conditions in $\mathcal{F}_{WU}$ defined in (8.5)), $n^{1/2}(\hat{D}_n - D_n) = O_p(1)$ (by Lemma 8.2), $U_n = O(1)$ (by the conditions in $\mathcal{F}_{WU}$), and $B_n = O(1)$ (because $B_n$ is orthogonal). The third equality uses $\hat{W}_nW_n^{-1} \to_p I_k$ (because $\hat{W}_n \to_p h_{71}$, $h_{71} := \lim W_n$, and $h_{71}$ is pd by the conditions in $\mathcal{F}_{WU}$), $\tau_{jF_n}/\tau_{r_1F_n} = \prod_{\ell=1}^{j-1}(\tau_{(\ell+1)F_n}/\tau_{\ell F_n}) = \prod_{\ell=1}^{j-1}h_{6,\ell} + o(1)$ for $j = 2,...,\bar{r}_1$, and $\tau_{jF_n}/\tau_{r_1F_n} = O(\tau_{r_2F_n}/\tau_{r_1F_n})$ for $j = r_2,...,p$ (because $\{\tau_{jF_n} : j \leq p\}$ are nonincreasing in $j$). The convergence uses $C_n \to h_3$, $\tau_{r_2F_n}/\tau_{r_1F_n} \to 0$ (by the definition of $r_2$), and $n^{1/2}\tau_{r_1F_n} \to \infty$ (by (8.16) because $r_1 \leq q$).$^{51}$
Equation (16.3) yields

$$\tau_{r_1F_n}^{-2}B_n'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_n \to_p \begin{pmatrix} h_{6,\bar{r}_1} & 0 \\ 0 & 0 \end{pmatrix}'h_3'h_3\begin{pmatrix} h_{6,\bar{r}_1} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} h_{6,\bar{r}_1}^2 & 0^{\bar{r}_1\times(p-\bar{r}_1)} \\ 0^{(p-\bar{r}_1)\times\bar{r}_1} & 0^{(p-\bar{r}_1)\times(p-\bar{r}_1)} \end{pmatrix}, \quad (16.4)$$

where the equality holds because $h_3'h_3 = \lim C_n'C_n = I_k$ using (8.7).
In addition, we have

$$\hat{A}_n := B_n'U_n'\hat{U}_n^{-1\prime}\hat{U}_n^{-1}U_nB_n - I_p \to_p 0^{p\times p} \quad (16.5)$$

using $\hat{U}_n^{-1}U_n \to_p I_p$ (because $\hat{U}_n \to_p h_{81}$ by Assumption WU(b) and (c), $h_{81} := \lim U_n$, and $h_{81}$ is pd by the conditions in $\mathcal{F}_{WU}$), $B_n \to h_2$, and $h_2'h_2 = I_p$ (because $B_n$ is orthogonal for all $n \geq 1$).
The ordered vector of eigenvalues of a matrix is a continuous function of the matrix by Elsner's Theorem, see Stewart (2001, Thm. 3.1, pp. 37-38). Hence, by the second line of (16.2), (16.4), (16.5), and Slutsky's Theorem, the largest $\bar{r}_1$ eigenvalues of $\tau_{r_1F_n}^{-2}B_n'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_n$ (i.e., $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn} : j \leq \bar{r}_1\}$ by the definition of $\hat{\kappa}_{jn}$), satisfy

$$((n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{1n},...,(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{\bar{r}_1n}) \to_p (1, h_{6,1}^2, h_{6,1}^2h_{6,2}^2,...,\prod_{\ell=1}^{\bar{r}_1-1}h_{6,\ell}^2) \ \text{ and so } \ \hat{\kappa}_{jn} \to_p \infty\ \forall j = 1,...,\bar{r}_1 \quad (16.6)$$

because $n^{1/2}\tau_{r_1F_n} \to \infty$ (by (8.16) since $r_1 \leq q$) and $h_{6,\ell} > 0$ for all $\ell \in \{1,...,\bar{r}_1-1\}$ (as noted above). By the same argument, the smallest $p - \bar{r}_1$ eigenvalues of $\tau_{r_1F_n}^{-2}B_n'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_n$, i.e., $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn} : j = \bar{r}_1+1,...,p\}$, satisfy

$$(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn} \to_p 0\ \forall j = \bar{r}_1+1,...,p. \quad (16.7)$$
If $G_h = 1$, (16.6) proves part (a) of the lemma and (16.7) proves part (b) of the lemma (because in this case $\bar{r}_1 = q$ and $\tau_{r_1F_n}/\tau_{\ell F_n} = O(1)$ for all $\ell \leq q$ by the definitions of $q$ and $G_h$). Hence, from here on, we assume that $G_h \geq 2$.

$^{51}$ For matrices that are written as $O(\cdot)$, we sometimes provide the dimensions of the matrix as superscripts for clarity, and sometimes we do not provide the dimensions for simplicity.

Next, define $B_{n,j_1,j_2}$ to be the $p \times (j_2 - j_1)$ matrix that consists of the $j_1+1,...,j_2$ columns of $B_n$ for $0 \leq j_1 < j_2 \leq p$. Note that the difference between the two subscripts $j_1$ and $j_2$ equals the number of columns of $B_{n,j_1,j_2}$, which is useful for keeping track of the dimensions of the $B_{n,j_1,j_2}$ matrices that appear below. By definition, $B_n = (B_{n,0,\bar{r}_1}, B_{n,\bar{r}_1,p})$.
By (16.3) (excluding the convergence part) applied once with $B_{n,\bar{r}_1,p}$ in place of $B_n$ as the far-right multiplicand and applied a second time with $B_{n,0,\bar{r}_1}$ in place of $B_n$ as the far-right multiplicand, we have

$$\varrho_n := \tau_{r_1F_n}^{-2}B_{n,0,\bar{r}_1}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_1,p}$$
$$= \left(\begin{pmatrix} h_{6,\bar{r}_1} + o(1) \\ 0^{(k-\bar{r}_1)\times\bar{r}_1} \end{pmatrix} + O_p((n^{1/2}\tau_{r_1F_n})^{-1})\right)'C_n'(I_k + o_p(1))C_n\left(\begin{pmatrix} 0^{\bar{r}_1\times(p-\bar{r}_1)} \\ O(\tau_{r_2F_n}/\tau_{r_1F_n})^{(k-\bar{r}_1)\times(p-\bar{r}_1)} \end{pmatrix} + O_p((n^{1/2}\tau_{r_1F_n})^{-1})\right)$$
$$= o_p(\tau_{r_2F_n}/\tau_{r_1F_n}) + O_p((n^{1/2}\tau_{r_1F_n})^{-1}), \quad (16.8)$$

where the last equality holds because (i) $C_n'(I_k + o_p(1))C_n = I_k + o_p(1)$; (ii) when $I_k$ appears in place of $C_n'(I_k + o_p(1))C_n$, the first summand on the left-hand side (lhs) of the last equality equals $0^{\bar{r}_1\times(p-\bar{r}_1)}$; and (iii) when $o_p(1)$ appears in place of $C_n'(I_k + o_p(1))C_n$, the first summand on the lhs of the last equality equals an $\bar{r}_1\times(p-\bar{r}_1)$ matrix with elements that are $o_p(\tau_{r_2F_n}/\tau_{r_1F_n})$.
Define

$$\hat{\Delta}_{1n}(\kappa) := \tau_{r_1F_n}^{-2}B_{n,0,\bar{r}_1}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,0,\bar{r}_1} - \kappa(I_{\bar{r}_1} + \hat{A}_{1n}) \in R^{\bar{r}_1\times\bar{r}_1},$$
$$\hat{\Delta}_{2n}(\kappa) := \varrho_n - \kappa\hat{A}_{2n} \in R^{\bar{r}_1\times(p-\bar{r}_1)}, \text{ and}$$
$$\hat{\Delta}_{3n}(\kappa) := \tau_{r_1F_n}^{-2}B_{n,\bar{r}_1,p}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_1,p} - \kappa(I_{p-\bar{r}_1} + \hat{A}_{3n}) \in R^{(p-\bar{r}_1)\times(p-\bar{r}_1)}. \quad (16.9)$$
As in the first line of (16.2), $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn} : j \leq p\}$ solve

$$0 = |\tau_{r_1F_n}^{-2}B_n'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_n - \kappa(I_p + \hat{A}_n)| = \left|\begin{pmatrix} \hat{\Delta}_{1n}(\kappa) & \hat{\Delta}_{2n}(\kappa) \\ \hat{\Delta}_{2n}(\kappa)' & \hat{\Delta}_{3n}(\kappa) \end{pmatrix}\right|$$
$$= |\hat{\Delta}_{1n}(\kappa)|\,|\hat{\Delta}_{3n}(\kappa) - \hat{\Delta}_{2n}(\kappa)'\hat{\Delta}_{1n}^{-1}(\kappa)\hat{\Delta}_{2n}(\kappa)|$$
$$= |\hat{\Delta}_{1n}(\kappa)|\,|\tau_{r_1F_n}^{-2}B_{n,\bar{r}_1,p}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_1,p} - \varrho_n'\hat{\Delta}_{1n}^{-1}(\kappa)\varrho_n$$
$$\quad - \kappa(I_{p-\bar{r}_1} + \hat{A}_{3n} - \hat{A}_{2n}'\hat{\Delta}_{1n}^{-1}(\kappa)\varrho_n - \varrho_n'\hat{\Delta}_{1n}^{-1}(\kappa)\hat{A}_{2n} + \kappa\hat{A}_{2n}'\hat{\Delta}_{1n}^{-1}(\kappa)\hat{A}_{2n})|, \quad (16.10)$$

where the third equality uses the standard formula for the determinant of a partitioned matrix and the result given in (16.11) below, which shows that $\hat{\Delta}_{1n}(\kappa)$ is nonsingular wp→1 for $\kappa$ equal to any solution $(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn}$ to the first equality in (16.10) for $j = \bar{r}_1+1,...,p$, and the last equality holds by algebra.$^{52}$
Now we show that, for $j = \bar{r}_1+1,...,p$, $(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn}$ cannot solve the determinantal equation $|\hat{\Delta}_{1n}(\kappa)| = 0$ wp→1, where this determinant is the first multiplicand on the right-hand side (rhs) of (16.10). This implies that $\{(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn} : j = \bar{r}_1+1,...,p\}$ must solve the determinantal equation based on the second multiplicand on the rhs of (16.10) wp→1. For $j = \bar{r}_1+1,...,p$, we have

$$\tilde{\Delta}_{j1n} := \hat{\Delta}_{1n}((n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn}) = \tau_{r_1F_n}^{-2}B_{n,0,\bar{r}_1}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,0,\bar{r}_1} - (n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn}(I_{\bar{r}_1} + \hat{A}_{1n})$$
$$= h_{6,\bar{r}_1}^2 + o_p(1) - o_p(1)(I_{\bar{r}_1} + o_p(1)) = h_{6,\bar{r}_1}^2 + o_p(1), \quad (16.11)$$

where the second last equality holds by (16.4), (16.5), and (16.7). Equation (16.11) and $\lambda_{\min}(h_{6,\bar{r}_1}^2) > 0$ (which follows from the definition of $h_{6,\bar{r}_1}$ in (16.3) and the fact that $h_{6,\ell} > 0$ for all $\ell \in \{1,...,\bar{r}_1-1\}$) establish the result stated in the first sentence of this paragraph.
For $j = \bar{r}_1+1,...,p$, plugging $\kappa = (n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn}$ into the second multiplicand on the rhs of (16.10)

$^{52}$ The determinant of the partitioned matrix $\Lambda = \begin{pmatrix} \Lambda_1 & \Lambda_2 \\ \Lambda_2' & \Lambda_3 \end{pmatrix}$ equals $|\Lambda| = |\Lambda_1|\,|\Lambda_3 - \Lambda_2'\Lambda_1^{-1}\Lambda_2|$ provided $\Lambda_1$ is nonsingular, e.g., see Rao (1973, p. 32).
gives

$$0 = |\tau_{r_1F_n}^{-2}B_{n,\bar{r}_1,p}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_1,p} - \varrho_n'\tilde{\Delta}_{j1n}^{-1}\varrho_n - (n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn}(I_{p-\bar{r}_1} + \hat{A}_{j2n})|, \text{ where}$$
$$\hat{A}_{j2n} := \hat{A}_{3n} - \hat{A}_{2n}'\tilde{\Delta}_{j1n}^{-1}\varrho_n - \varrho_n'\tilde{\Delta}_{j1n}^{-1}\hat{A}_{2n} + (n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn}\hat{A}_{2n}'\tilde{\Delta}_{j1n}^{-1}\hat{A}_{2n} \in R^{(p-\bar{r}_1)\times(p-\bar{r}_1)}, \quad (16.12)$$

using (16.8) and (16.11). Multiplying (16.12) by $\tau_{r_1F_n}^2/\tau_{r_2F_n}^2$ gives

$$0 = |\tau_{r_2F_n}^{-2}B_{n,\bar{r}_1,p}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_1,p} + o_p(1) - (n^{1/2}\tau_{r_2F_n})^{-2}\hat{\kappa}_{jn}(I_{p-\bar{r}_1} + \hat{A}_{j2n})| \quad (16.13)$$

using $\varrho_n'\tilde{\Delta}_{j1n}^{-1}\varrho_n = o_p((\tau_{r_2F_n}/\tau_{r_1F_n})^2) + O_p((n^{1/2}\tau_{r_1F_n})^{-2})$ (by (16.8) and (16.11)) and $O_p((n^{1/2}\tau_{r_2F_n})^{-2}) = o_p(1)$ (because $r_2 \leq q$ by the definition of $r_2$ and $n^{1/2}\tau_{jF_n} \to \infty$ for all $j \leq q$ by the definition of $q$ in (8.16)).
Thus, $\{(n^{1/2}\tau_{r_2F_n})^{-2}\hat{\kappa}_{jn} : j = \bar{r}_1+1,...,p\}$ solve

$$0 = |\tau_{r_2F_n}^{-2}B_{n,\bar{r}_1,p}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_1,p} + o_p(1) - \kappa(I_{p-\bar{r}_1} + \hat{A}_{j2n})|. \quad (16.14)$$

For $j = \bar{r}_1+1,...,p$, we have

$$\hat{A}_{j2n} = o_p(1) \quad (16.15)$$

because $\hat{A}_{2n} = o_p(1)$ and $\hat{A}_{3n} = o_p(1)$ (by (16.5)), $\tilde{\Delta}_{j1n}^{-1} = O_p(1)$ (by (16.11)), $\varrho_n = o_p(1)$ (by (16.8) since $\tau_{r_2F_n} \leq \tau_{r_1F_n}$ and $n^{1/2}\tau_{r_1F_n} \to \infty$), and $(n^{1/2}\tau_{r_1F_n})^{-2}\hat{\kappa}_{jn} = o_p(1)$ for $j = \bar{r}_1+1,...,p$ (by (16.7)).
Now, we repeat the argument from (16.2) to (16.15) with the expression in (16.14) replacing that in the first line of (16.2), with (16.15) replacing (16.5), and with $j = \bar{r}_2+1,...,p$; $\hat{A}_{j2n}$; $B_{n,\bar{r}_1,p}$; $\tau_{r_2F_n}$; $\tau_{r_3F_n}$; $\bar{r}_2 - \bar{r}_1$; $p - \bar{r}_2$; and $h_{6,\bar{r}_2} := \mathrm{Diag}\{1, h_{6,\bar{r}_1+1}, h_{6,\bar{r}_1+1}h_{6,\bar{r}_1+2},...,\prod_{\ell=\bar{r}_1+1}^{\bar{r}_2-1}h_{6,\ell}\} \in R^{(\bar{r}_2-\bar{r}_1)\times(\bar{r}_2-\bar{r}_1)}$ in place of $j = \bar{r}_1+1,...,p$; $\hat{A}_n$; $B_n$; $\tau_{r_1F_n}$; $\tau_{r_2F_n}$; $\bar{r}_1$; $p - \bar{r}_1$; and $h_{6,\bar{r}_1}$, respectively. (The fact that $\hat{A}_{j2n}$ depends on $j$, whereas $\hat{A}_n$ does not, does not affect the argument.) In addition, $B_{n,0,\bar{r}_1}$ and $B_{n,\bar{r}_1,p}$ in (16.8)-(16.10) are replaced by the matrices $B_{n,\bar{r}_1,\bar{r}_2}$ and $B_{n,\bar{r}_2,p}$ (which consist of the $\bar{r}_1+1,...,\bar{r}_2$ columns of $B_n$ and the last $p - \bar{r}_2$ columns of $B_n$, respectively). This argument gives the analogues of (16.6) and (16.7), which are

$$\hat{\kappa}_{jn} \to_p \infty\ \forall j = r_2,...,\bar{r}_2 \ \text{ and } \ (n^{1/2}\tau_{r_2F_n})^{-2}\hat{\kappa}_{jn} = o_p(1)\ \forall j = \bar{r}_2+1,...,p. \quad (16.16)$$
In addition, the analogue of (16.14) is the same as (16.14) but with $\hat{A}_{j3n}$ in place of $\hat{A}_{j2n}$, where $\hat{A}_{j3n}$ is defined just as $\hat{A}_{j2n}$ is defined in (16.12) but with $\hat{A}_{2j2n}$ and $\hat{A}_{3j2n}$ in place of $\hat{A}_{2n}$ and $\hat{A}_{3n}$, respectively, where

$$\hat{A}_{j2n} = \begin{pmatrix} \hat{A}_{1j2n} & \hat{A}_{2j2n} \\ \hat{A}_{2j2n}' & \hat{A}_{3j2n} \end{pmatrix} \quad (16.17)$$

for $\hat{A}_{1j2n} \in R^{(\bar{r}_2-\bar{r}_1)\times(\bar{r}_2-\bar{r}_1)}$, $\hat{A}_{2j2n} \in R^{(\bar{r}_2-\bar{r}_1)\times(p-\bar{r}_2)}$, and $\hat{A}_{3j2n} \in R^{(p-\bar{r}_2)\times(p-\bar{r}_2)}$.
2 more times yields
bjn !p 1 8j = 1; :::; rGh and (n1=2
r g Fn )
2
bjn = op (1) 8j = rg + 1; :::; p; 8g = 1; :::; Gh : (16.18)
A formal proof of this “repetition of the argument Gh 2 more times”is given below using induction.
Because rGh = q; the …rst result in (16.18) proves part (a) of the lemma.
The second result in (16.18) with g = Gh implies: for all j = q + 1; :::; p;
(n1=2
rGh Fn )
2
bjn = op (1)
(16.19)
because rGh = q: Either rGh = rGh = q or rGh < rGh = q: In the former case, (n1=2
qFn )
op (1) for j = q + 1; :::; p by (16.19). In the latter case, we have
qFn
lim
rG
rG Fn
h
= lim
rGh Fn
h
Y
=
rGh Fn
of the proof. Hence, in this case too,
(16.20). Because
qFn
`Fn
for all `
qFn )
2b
jn
=
1
h6;j > 0;
(16.20)
j=rGh
where the inequality holds because h6;` > 0 for all ` 2 frGh ; :::; rGh
(n1=2
2b
jn
1g; as noted at the beginning
= op (1) for j = q + 1; :::; p by (16.19) and
q; this establishes part (b) of the lemma.
Now we establish by induction the results given in (16.18) that are obtained heuristically by
“repeating the argument Gh
2 more times.”The induction proof shows that subtleties arise when
establishing the asymptotic negligibility of certain terms.
Let $o_{gp}$ denote a symmetric $(p-\bar{r}_{g-1})\times(p-\bar{r}_{g-1})$ matrix whose $(\ell, m)$ element for $\ell, m = 1,...,p-\bar{r}_{g-1}$ is $o_p(\tau_{(\bar{r}_{g-1}+\ell)F_n}\tau_{(\bar{r}_{g-1}+m)F_n}/\tau_{r_gF_n}^2) + O_p((n^{1/2}\tau_{r_gF_n})^{-2})$ (where $\tau_{(\bar{r}_{g-1}+\ell)F_n} \leq \tau_{r_gF_n}$ for $\ell \geq 1$ since $\tau_{jF_n}$ is nonincreasing in $j$). Note that $o_{gp} = o_p(1)$ because $\tau_{(\bar{r}_{g-1}+\ell)F_n}/\tau_{r_gF_n} \leq 1$ and $n^{1/2}\tau_{r_gF_n} \to \infty$ for $g = 1,...,G_h$.

We now show by induction over $g = 1,...,G_h$ that wp→1 $\{(n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn} : j = \bar{r}_{g-1}+1,...,p\}$ solve

$$|\tau_{r_gF_n}^{-2}B_{n,\bar{r}_{g-1},p}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_{g-1},p} + o_{gp} - \kappa(I_{p-\bar{r}_{g-1}} + \hat{A}_{jgn})| = 0 \quad (16.21)$$

for some symmetric matrices $\hat{A}_{jgn} = o_p(1)$ and $o_{gp}$ (where the matrices that are $o_{gp}$ may depend on $j$).

The initiation step of the induction proof holds because (16.21) holds with $g = 1$ by the first line of (16.2) with $\hat{A}_{j1n} := \hat{A}_n$ and $o_{1p} = 0$ for $g = 1$ (and using the fact that, for $g = 1$, $\bar{r}_{g-1} = \bar{r}_0 := 0$ and $B_{n,\bar{r}_{g-1},p} = B_{n,0,p} = B_n$).
For the induction step of the proof, we assume that (16.21) holds for some $g \in \{1,...,G_h-1\}$ and show that it then also holds for $g+1$. By an argument analogous to that in (16.3), we have

$$\tau_{r_gF_n}^{-1}\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_{g-1},p} = (I_k + o_p(1))C_n\begin{pmatrix} 0^{\bar{r}_{g-1}\times(p-\bar{r}_{g-1})} \\ \mathrm{Diag}\{\tau_{r_gF_n},...,\tau_{pF_n}\}/\tau_{r_gF_n} \\ 0^{(k-p)\times(p-\bar{r}_{g-1})} \end{pmatrix} + O_p((n^{1/2}\tau_{r_gF_n})^{-1})$$
$$\to_p h_3\left(\begin{pmatrix} 0^{\bar{r}_{g-1}\times(\bar{r}_g-\bar{r}_{g-1})} \\ h_{6,\bar{r}_g} \\ 0^{(k-\bar{r}_g)\times(\bar{r}_g-\bar{r}_{g-1})} \end{pmatrix},\ 0^{k\times(p-\bar{r}_g)}\right), \quad (16.22)$$

where $h_{6,\bar{r}_g} := \mathrm{Diag}\{1, h_{6,r_g}, h_{6,r_g}h_{6,r_g+1},...,\prod_{j=r_g}^{\bar{r}_g-1}h_{6,j}\} \in R^{(\bar{r}_g-\bar{r}_{g-1})\times(\bar{r}_g-\bar{r}_{g-1})}$ and $h_{6,\bar{r}_g} := 1$ when $r_g = \bar{r}_g$.

Equation (16.22) and $h_3'h_3 = \lim C_n'C_n = I_k$ yield

$$\tau_{r_gF_n}^{-2}B_{n,\bar{r}_{g-1},p}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_{g-1},p} \to_p \begin{pmatrix} h_{6,\bar{r}_g}^2 & 0^{(\bar{r}_g-\bar{r}_{g-1})\times(p-\bar{r}_g)} \\ 0^{(p-\bar{r}_g)\times(\bar{r}_g-\bar{r}_{g-1})} & 0^{(p-\bar{r}_g)\times(p-\bar{r}_g)} \end{pmatrix}. \quad (16.23)$$
By (16.21) and $o_{gp} = o_p(1)$, we have wp→1 that $\{(n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn} : j = \bar{r}_{g-1}+1,...,p\}$ solve $|(I_{p-\bar{r}_{g-1}} + \hat{A}_{jgn})^{-1}\tau_{r_gF_n}^{-2}B_{n,\bar{r}_{g-1},p}'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_{n,\bar{r}_{g-1},p} + o_p(1) - \kappa I_{p-\bar{r}_{g-1}}| = 0$. Hence, by (16.23), $\hat{A}_{jgn} = o_p(1)$ (which holds by the induction assumption), and the same argument as used to establish (16.6) and (16.7), we obtain

$$\hat{\kappa}_{jn} \to_p \infty\ \forall j = \bar{r}_{g-1}+1,...,\bar{r}_g \ \text{ and } \ (n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn} \to_p 0\ \forall j = \bar{r}_g+1,...,p. \quad (16.24)$$

Let $\tilde{o}_{gp}$ denote an $(\bar{r}_g-\bar{r}_{g-1})\times(p-\bar{r}_g)$ matrix whose elements in column $j$ for $j = 1,...,p-\bar{r}_g$ are $o_p(\tau_{(\bar{r}_g+j)F_n}/\tau_{r_gF_n}) + O_p((n^{1/2}\tau_{r_gF_n})^{-1})$. Note that $\tilde{o}_{gp} = o_p(1)$.
By (16.22) applied once with $B_{n,r_g,p}$ in place of $B_{n,r_{g-1},p}$ as the far-right multiplicand and applied a second time with $B_{n,r_{g-1},r_g}$ in place of $B_{n,r_{g-1},p}$ as the far-right multiplicand, we have

$$\varrho_{gn}:=\tau_{r_gF_n}^{-2}B_{n,r_{g-1},r_g}'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_{n,r_g,p}$$
$$=\begin{bmatrix}0^{r_{g-1}\times(r_g-r_{g-1})}\\ \mathrm{Diag}\{\tau_{(r_{g-1}+1)F_n},\ldots,\tau_{r_gF_n}\}/\tau_{r_gF_n}\\ 0^{(k-r_g)\times(r_g-r_{g-1})}\end{bmatrix}'C_n'(I_k+o_p(1))C_n\begin{bmatrix}0^{r_g\times(p-r_g)}\\ \mathrm{Diag}\{\tau_{(r_g+1)F_n},\ldots,\tau_{pF_n}\}/\tau_{r_gF_n}\\ 0^{(k-p)\times(p-r_g)}\end{bmatrix}+O_p((n^{1/2}\tau_{r_gF_n})^{-1})=\overline{o}_{gp},\eqno(16.25)$$

where $\varrho_{gn}\in R^{(r_g-r_{g-1})\times(p-r_g)}$, $\mathrm{Diag}\{\tau_{(r_{g-1}+1)F_n},\ldots,\tau_{r_gF_n}\}/\tau_{r_gF_n}=h_{6,r_g}^{\diamond}+o(1)=O(1)$, and the last equality holds because (i) $C_n'(I_k+o_p(1))C_n=I_k+o_p(1)$; (ii) when $I_k$ appears in place of $C_n'(I_k+o_p(1))C_n$, then the contribution from the first summand on the lhs of the last equality in (16.25) equals $0^{(r_g-r_{g-1})\times(p-r_g)}$; and (iii) when $o_p(1)$ appears in place of $C_n'(I_k+o_p(1))C_n$, the contribution from the first summand on the lhs of the last equality in (16.25) equals an $\overline{o}_{gp}$ matrix.
We partition the $(p-r_{g-1})\times(p-r_{g-1})$ matrices $o_{gp}$ and $\hat{A}_{jgn}$ as follows:

$$o_{gp}=\begin{pmatrix}o_{1gp}&o_{2gp}\\ o_{2gp}'&o_{3gp}\end{pmatrix}\ \text{ and }\ \hat{A}_{jgn}=\begin{bmatrix}\hat{A}_{1jgn}&\hat{A}_{2jgn}\\ \hat{A}_{2jgn}'&\hat{A}_{3jgn}\end{bmatrix},\eqno(16.26)$$

where $o_{1gp},\hat{A}_{1jgn}\in R^{(r_g-r_{g-1})\times(r_g-r_{g-1})}$; $o_{2gp},\hat{A}_{2jgn}\in R^{(r_g-r_{g-1})\times(p-r_g)}$; and $o_{3gp},\hat{A}_{3jgn}\in R^{(p-r_g)\times(p-r_g)}$, for $j=r_{g-1}+1,\ldots,p$ and $g=1,\ldots,G_h$. Define

$$\hat{b}_{1jgn}(\lambda):=\tau_{r_gF_n}^{-2}B_{n,r_{g-1},r_g}'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_{n,r_{g-1},r_g}+o_{1gp}-\lambda(I_{r_g-r_{g-1}}+\hat{A}_{1jgn}),$$
$$\hat{b}_{2jgn}(\lambda):=\varrho_{gn}+o_{2gp}-\lambda\hat{A}_{2jgn},\ \text{ and}$$
$$\hat{b}_{3jgn}(\lambda):=\tau_{r_gF_n}^{-2}B_{n,r_g,p}'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_{n,r_g,p}+o_{3gp}-\lambda(I_{p-r_g}+\hat{A}_{3jgn}),\eqno(16.27)$$

where $\hat{b}_{1jgn}(\lambda)$, $\hat{b}_{2jgn}(\lambda)$, and $\hat{b}_{3jgn}(\lambda)$ have the same dimensions as $o_{1gp}$, $o_{2gp}$, and $o_{3gp}$, respectively.
From (16.21), we have wp$\to$1, $\{(n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn}:j=r_{g-1}+1,\ldots,p\}$ solve

$$0=\Big|\tau_{r_gF_n}^{-2}B_{n,r_{g-1},p}'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_{n,r_{g-1},p}+o_{gp}-\lambda(I_{p-r_{g-1}}+\hat{A}_{jgn})\Big|$$
$$=|\hat{b}_{1jgn}(\lambda)|\cdot\big|\hat{b}_{3jgn}(\lambda)-\hat{b}_{2jgn}(\lambda)'\hat{b}_{1jgn}^{-1}(\lambda)\hat{b}_{2jgn}(\lambda)\big|$$
$$=|\hat{b}_{1jgn}(\lambda)|\cdot\Big|\tau_{r_gF_n}^{-2}B_{n,r_g,p}'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_{n,r_g,p}+o_{3gp}-(\varrho_{gn}+o_{2gp})'\hat{b}_{1jgn}^{-1}(\lambda)(\varrho_{gn}+o_{2gp})$$
$$\qquad-\lambda\big[I_{p-r_g}+\hat{A}_{3jgn}-\hat{A}_{2jgn}'\hat{b}_{1jgn}^{-1}(\lambda)(\varrho_{gn}+o_{2gp})-(\varrho_{gn}+o_{2gp})'\hat{b}_{1jgn}^{-1}(\lambda)\hat{A}_{2jgn}+\lambda\hat{A}_{2jgn}'\hat{b}_{1jgn}^{-1}(\lambda)\hat{A}_{2jgn}\big]\Big|,\eqno(16.28)$$

where the second equality holds by the same argument as for (16.10) and uses the result given in (16.29) below, which shows that $\hat{b}_{1jgn}(\lambda)$ is nonsingular wp$\to$1 when $\lambda$ equals $(n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn}$ for $j=r_g+1,\ldots,p$.
Now we show that, for $j=r_g+1,\ldots,p$, $(n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn}$ cannot solve the determinantal equation $|\hat{b}_{1jgn}(\lambda)|=0$ for $n$ large, where this determinant is the first multiplicand on the rhs of (16.28) and, hence, it must solve the determinantal equation based on the second multiplicand on the rhs of (16.28). For $j=r_g+1,\ldots,p$, we have

$$\tilde{b}_{1jgn}:=\hat{b}_{1jgn}\big((n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn}\big)=h_{6,r_g}^{\diamond2}+o_p(1),\eqno(16.29)$$

by the same argument as in (16.11), using $o_{1gp}=o_p(1)$ and $\hat{A}_{1jgn}=o_p(1)$ (which holds by the definition of $\hat{A}_{1jgn}$ following (16.21)). Equation (16.29) and $\lambda_{\min}(h_{6,r_g}^{\diamond2})>0$ establish the result stated in the first sentence of this paragraph.
For $j=r_g+1,\ldots,p$, plugging $(n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn}$ into the second multiplicand on the rhs of (16.28) gives

$$0=\Big|\tau_{r_gF_n}^{-2}B_{n,r_g,p}'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_{n,r_g,p}+o_{3gp}-(\varrho_{gn}+o_{2gp})'\tilde{b}_{1jgn}^{-1}(\varrho_{gn}+o_{2gp})-(n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn}(I_{p-r_g}+\hat{A}_{j(g+1)n})\Big|,\ \text{ where}$$
$$\hat{A}_{j(g+1)n}:=\hat{A}_{3jgn}-\hat{A}_{2jgn}'\tilde{b}_{1jgn}^{-1}(\varrho_{gn}+o_{2gp})-(\varrho_{gn}+o_{2gp})'\tilde{b}_{1jgn}^{-1}\hat{A}_{2jgn}+(n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn}\hat{A}_{2jgn}'\tilde{b}_{1jgn}^{-1}\hat{A}_{2jgn}\eqno(16.30)$$

and $\hat{A}_{j(g+1)n}\in R^{(p-r_g)\times(p-r_g)}$.
The last two summands on the rhs of the first line of (16.30) satisfy

$$o_{3gp}-(\varrho_{gn}+o_{2gp})'\tilde{b}_{1jgn}^{-1}(\varrho_{gn}+o_{2gp})=o_{3gp}-(\overline{o}_{gp}+o_{2gp})'(h_{6,r_g}^{\diamond-2}+o_p(1))(\overline{o}_{gp}+o_{2gp})=o_{3gp}-\overline{o}_{gp}'\overline{o}_{gp}=(\tau_{r_{g+1}F_n}^2/\tau_{r_gF_n}^2)\,o_{(g+1)p},\eqno(16.31)$$

where (i) the first equality uses (16.25) and (16.29), (ii) the second equality uses $o_{2gp}=\overline{o}_{gp}$ (which holds because the $(j,m)$ element of $o_{2gp}$ for $j=1,\ldots,r_g-r_{g-1}$ and $m=1,\ldots,p-r_g$ is $o_p(\tau_{(r_g+m)F_n}/\tau_{r_gF_n})+O_p((n^{1/2}\tau_{r_gF_n})^{-1})$, since $r_{g-1}+j\le r_g$) and $(h_{6,r_g}^{\diamond-2}+o_p(1))\overline{o}_{gp}=\overline{o}_{gp}$ (which holds because $h_{6,r_g}^{\diamond}$ is diagonal and $\lambda_{\min}(h_{6,r_g}^{\diamond2})>0$); (iii) the last equality uses the fact that the $(j,m)$ element of $(\tau_{r_gF_n}^2/\tau_{r_{g+1}F_n}^2)\overline{o}_{gp}'\overline{o}_{gp}$ for $j,m=1,\ldots,p-r_g$ is the sum of a term that is $o_p(\tau_{(r_g+j)F_n}\tau_{(r_g+m)F_n}/\tau_{r_gF_n}^2)(\tau_{r_gF_n}^2/\tau_{r_{g+1}F_n}^2)=o_p(\tau_{(r_g+j)F_n}\tau_{(r_g+m)F_n}/\tau_{r_{g+1}F_n}^2)$ and a term that is $O_p((n^{1/2}\tau_{r_gF_n})^{-2})(\tau_{r_gF_n}^2/\tau_{r_{g+1}F_n}^2)=O_p((n^{1/2}\tau_{r_{g+1}F_n})^{-2})$ and, hence, is $o_{(g+1)p}$ (using the definition of $o_{(g+1)p}$); and (iv) the last equality uses the fact that the $(j,m)$ element of $(\tau_{r_gF_n}^2/\tau_{r_{g+1}F_n}^2)o_{3gp}$ for $j,m=1,\ldots,p-r_g$ is $o_p(\tau_{(r_g+j)F_n}\tau_{(r_g+m)F_n}/\tau_{r_gF_n}^2)(\tau_{r_gF_n}^2/\tau_{r_{g+1}F_n}^2)=o_p(\tau_{(r_g+j)F_n}\tau_{(r_g+m)F_n}/\tau_{r_{g+1}F_n}^2)$ plus a term that is $O_p((n^{1/2}\tau_{r_gF_n})^{-1})(\tau_{r_gF_n}^2/\tau_{r_{g+1}F_n}^2)=O_p((n^{1/2}\tau_{r_{g+1}F_n})^{-1})(\tau_{r_gF_n}/\tau_{r_{g+1}F_n})$, which again is the same order as the $(j,m)$ element of $o_{(g+1)p}$ (using $\tau_{r_gF_n}/\tau_{r_{g+1}F_n}\ge1$).

The calculations in (16.31) are a key part of the induction proof. The definitions of the terms $o_{gp}$ and $\overline{o}_{gp}$ (given preceding (16.21) and (16.25), respectively) are chosen so that the results in (16.31) hold.
For $j=r_g+1,\ldots,p$, we have

$$\hat{A}_{j(g+1)n}=o_p(1),\eqno(16.32)$$

because $\hat{A}_{2jgn}=o_p(1)$ and $\hat{A}_{3jgn}=o_p(1)$ by (16.21), $\tilde{b}_{1jgn}^{-1}=O_p(1)$ (by (16.29)), $\varrho_{gn}+o_{2gp}=o_p(1)$ (by (16.25), since $\overline{o}_{gp}=o_p(1)$), and $(n^{1/2}\tau_{r_gF_n})^{-2}\hat{\kappa}_{jn}=o_p(1)$ (by (16.24)).
Inserting (16.31) and (16.32) into (16.30) and multiplying by $\tau_{r_gF_n}^2/\tau_{r_{g+1}F_n}^2$ gives

$$0=\Big|\tau_{r_{g+1}F_n}^{-2}B_{n,r_g,p}'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_{n,r_g,p}+o_{(g+1)p}-(n^{1/2}\tau_{r_{g+1}F_n})^{-2}\hat{\kappa}_{jn}(I_{p-r_g}+\hat{A}_{j(g+1)n})\Big|.\eqno(16.33)$$

Thus, wp$\to$1, $\{(n^{1/2}\tau_{r_{g+1}F_n})^{-2}\hat{\kappa}_{jn}:j=r_g+1,\ldots,p\}$ solve

$$0=\Big|\tau_{r_{g+1}F_n}^{-2}B_{n,r_g,p}'\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_nB_{n,r_g,p}+o_{(g+1)p}-\lambda(I_{p-r_g}+\hat{A}_{j(g+1)n})\Big|.\eqno(16.34)$$
This establishes the induction step and concludes the proof that (16.21) holds for all $g=1,\ldots,G_h$. Finally, given that (16.21) holds for all $g=1,\ldots,G_h$, (16.24) gives the results stated in (16.18), and (16.18) gives the results stated in the Lemma by the argument in (16.18)-(16.20).
Now we use the approach in Johansen (1991, pp. 1569-1571) and Robin and Smith (2000, pp. 172-173) to prove Theorem 8.4. In these papers, asymptotic results are established under a fixed true distribution under which certain population eigenvalues are either positive or zero. Here we need to deal with drifting sequences of distributions under which these population eigenvalues may be positive or zero for any given $n$, but the positive ones may drift to zero as $n\to\infty$, possibly at different rates. This complicates the proof. In particular, the rate of convergence result of Lemma 16.1(b) is needed in the present context, but not in the fixed distribution scenario considered in Johansen (1991) and Robin and Smith (2000).
Proof of Theorem 8.4. Theorem 8.4(a) and (c) follow immediately from Lemma 16.1(a).

Next, we assume $q<p$ and we prove part (b). The eigenvalues $\{\hat{\kappa}_{jn}:j\le p\}$ of $n\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_n$ are the ordered solutions to the determinantal equation $|n\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_n-\kappa I_p|=0$. Equivalently, with probability that goes to one (wp$\to$1), they are the solutions to

$$|Q_n(\kappa)|=0,\ \text{ where }\ Q_n(\kappa):=nS_nB_n'U_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_nU_nB_nS_n-\kappa S_n'B_n'U_n'\hat{U}_n^{-1\prime}\hat{U}_n^{-1}U_nB_nS_n,\eqno(16.35)$$

because $|S_n|>0$, $|B_n|>0$, $|U_n|>0$, and $|\hat{U}_n|>0$ wp$\to$1. Thus, $\lambda_{\min}(n\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_n)$ equals the smallest solution, $\hat{\kappa}_{pn}$, to $|Q_n(\kappa)|=0$ wp$\to$1. (For simplicity, we omit the qualifier wp$\to$1 that applies to several statements below.)

We write $Q_n(\kappa)$ in partitioned form using

$$B_nS_n=(B_{n,q}S_{n,q},B_{n,p-q}),\ \text{ where }\ S_{n,q}:=\mathrm{Diag}\{(n^{1/2}\tau_{1F_n})^{-1},\ldots,(n^{1/2}\tau_{qF_n})^{-1}\}\in R^{q\times q}.\eqno(16.36)$$
(16.36)
b n Un Tn (= n1=2 Wn D
b n Un Bn Sn ) can be written
The convergence result of Lemma 8.3 for n1=2 Wn D
as
where
b n Un Bn;q Sn;q !p
n1=2 Wn D
h;q
and
h;p q
h;q
b n Un Bn;p
:= h3;q and n1=2 Wn D
q
!d
h;p q ;
(16.37)
are de…ned in (8.17).
We have
cn W 1 !p Ik and U
bn U 1 !p Ip
W
n
n
(16.38)
cn !p h71 := lim Wn (by Assumption WU(a) and (c)), U
bn !p h81 := lim Un (by Assumpbecause W
tion WU(b) and (c)), and h71 and h81 are pd (by the conditions in FW U ):
By (16.35)-(16.38), we have

$$Q_n(\kappa)=\begin{bmatrix}I_q+o_p(1)&h_{3,q}'n^{1/2}W_n\hat{D}_nU_nB_{n,p-q}+o_p(1)\\ n^{1/2}B_{n,p-q}'U_n'\hat{D}_n'W_n'h_{3,q}+o_p(1)&n^{1/2}B_{n,p-q}'U_n'\hat{D}_n'W_n'W_nn^{1/2}\hat{D}_nU_nB_{n,p-q}+o_p(1)\end{bmatrix}$$
$$-\kappa\begin{bmatrix}S_{n,q}^2&0^{q\times(p-q)}\\ 0^{(p-q)\times q}&I_{p-q}\end{bmatrix}-\kappa\begin{bmatrix}S_{n,q}A_{1n}S_{n,q}&S_{n,q}A_{2n}\\ A_{2n}'S_{n,q}&A_{3n}\end{bmatrix},\ \text{ where}\eqno(16.39)$$
$$\hat{A}_n=\begin{bmatrix}A_{1n}&A_{2n}\\ A_{2n}'&A_{3n}\end{bmatrix}:=B_n'U_n'\hat{U}_n^{-1\prime}\hat{U}_n^{-1}U_nB_n-I_p=o_p(1)\ \text{ for }A_{1n}\in R^{q\times q},\ A_{2n}\in R^{q\times(p-q)},\ \text{and }A_{3n}\in R^{(p-q)\times(p-q)},$$

$\hat{A}_n$ is defined in (16.39) just as in (16.5), and the first equality uses $\overline{\Delta}_{h,q}:=h_{3,q}$ and $\overline{\Delta}_{h,q}'\overline{\Delta}_{h,q}=h_{3,q}'h_{3,q}=\lim C_{n,q}'C_{n,q}=I_q$ (by (8.7), (8.9), (8.12), and (8.17)). Note that $A_{jn}$ and $\hat{A}_{jn}$ (defined in (16.2)) are not the same in general for $j=1,2,3$, because their dimensions differ. For example, $A_{1n}\in R^{q\times q}$, whereas $\hat{A}_{1n}\in R^{r_1\times r_1}$.
h;q
If q = 0 (< p); then Bn = Bn;p
q
and
bn0 D
b n0 W
cn0 W
cn D
b nU
bn Bn
nBn0 U
1
bn )0 B 10 B 0 U 0 D
b0 0 c
= nBn0 (Un 1 U
n
n n n Wn Wn Wn
!d
0
h;p q
0
h;p q ;
cn W 1 (Wn D
b n Un Bn )B 1 (U 1 U
bn )Bn
W
n
n
n
(16.40)
where the convergence holds by (16.37) and (16.38) and
h;p q
is de…ned as in (8.17) with q = 0:
The smallest eigenvalue of a matrix is a continuous function of the matrix (by Elsner’s Theorem, see
cn D
b nU
bn Bn
cn0 W
b n0 W
bn0 D
Stewart (2001, Thm. 3.1, pp. 37–38)). Hence, the smallest eigenvalue of nBn0 U
converges in distribution to the smallest eigenvalue of
0
0
h;p q h3;k q h3;k q
h;p q
(using h3;k
0
q h3;k q
=
h3 h03 = Ik when q = 0), which proves part (b) of Theorem 8.4 when q = 0:
In the remainder of the proof of part (b), we assume $1\le q<p$, which is the remaining case to be considered in the proof of part (b). The formula for the determinant of a partitioned matrix and (16.39) give

$$|Q_n(\kappa)|=|Q_{1n}(\kappa)|\cdot|Q_{2n}(\kappa)|,\ \text{ where}$$
$$Q_{1n}(\kappa):=I_q+o_p(1)-\kappa S_{n,q}^2-\kappa S_{n,q}A_{1n}S_{n,q},$$
$$Q_{2n}(\kappa):=n^{1/2}B_{n,p-q}'U_n'\hat{D}_n'W_n'W_nn^{1/2}\hat{D}_nU_nB_{n,p-q}+o_p(1)-\kappa I_{p-q}-\kappa A_{3n}$$
$$\qquad-\big[n^{1/2}B_{n,p-q}'U_n'\hat{D}_n'W_n'h_{3,q}+o_p(1)-\kappa A_{2n}'S_{n,q}\big]\big(I_q+o_p(1)-\kappa S_{n,q}^2-\kappa S_{n,q}A_{1n}S_{n,q}\big)^{-1}$$
$$\qquad\times\big[h_{3,q}'n^{1/2}W_n\hat{D}_nU_nB_{n,p-q}+o_p(1)-\kappa S_{n,q}A_{2n}\big],\eqno(16.41)$$

none of the $o_p(1)$ terms depend on $\kappa$, and the equation in the first line holds provided $Q_{1n}(\kappa)$ is nonsingular.
By Lemma 16.1(b) (which applies for $1\le q<p$), for $j=q+1,\ldots,p$, we have $\hat{\kappa}_{jn}S_{n,q}^2=o_p(1)$ and $\hat{\kappa}_{jn}S_{n,q}A_{1n}S_{n,q}=o_p(1)$. Thus,

$$Q_{1n}(\hat{\kappa}_{jn})=I_q+o_p(1)-\hat{\kappa}_{jn}S_{n,q}^2-\hat{\kappa}_{jn}S_{n,q}A_{1n}S_{n,q}=I_q+o_p(1).\eqno(16.42)$$

By (16.35) and (16.41), $|Q_n(\hat{\kappa}_{jn})|=|Q_{1n}(\hat{\kappa}_{jn})|\cdot|Q_{2n}(\hat{\kappa}_{jn})|=0$ for $j=1,\ldots,p$. By (16.42), $|Q_{1n}(\hat{\kappa}_{jn})|\ne0$ for $j=q+1,\ldots,p$ wp$\to$1. Hence, wp$\to$1,

$$|Q_{2n}(\hat{\kappa}_{jn})|=0\ \text{ for }j=q+1,\ldots,p.\eqno(16.43)$$
Now we plug $\hat{\kappa}_{jn}$ for $j=q+1,\ldots,p$ into $Q_{2n}(\kappa)$ in (16.41) and use (16.42). We have

$$Q_{2n}(\hat{\kappa}_{jn})=nB_{n,p-q}'U_n'\hat{D}_n'W_n'W_n\hat{D}_nU_nB_{n,p-q}$$
$$\qquad-\big[n^{1/2}B_{n,p-q}'U_n'\hat{D}_n'W_n'h_{3,q}+o_p(1)\big](I_q+o_p(1))\big[h_{3,q}'n^{1/2}W_n\hat{D}_nU_nB_{n,p-q}+o_p(1)\big]$$
$$\qquad-\hat{\kappa}_{jn}\big[I_{p-q}+A_{3n}+o_p(1)-\big(n^{1/2}B_{n,p-q}'U_n'\hat{D}_n'W_n'h_{3,q}+o_p(1)\big)(I_q+o_p(1))S_{n,q}A_{2n}$$
$$\qquad\quad-A_{2n}'S_{n,q}(I_q+o_p(1))\big(h_{3,q}'n^{1/2}W_n\hat{D}_nU_nB_{n,p-q}+o_p(1)\big)+\hat{\kappa}_{jn}A_{2n}'S_{n,q}(I_q+o_p(1))S_{n,q}A_{2n}\big].\eqno(16.44)$$

The term in square brackets on the last three lines of (16.44) that multiplies $\hat{\kappa}_{jn}$ equals

$$I_{p-q}+o_p(1),\eqno(16.45)$$

because $A_{3n}=o_p(1)$ (by (16.39)), $n^{1/2}W_n\hat{D}_nU_nB_{n,p-q}=O_p(1)$ (by (16.37)), $S_{n,q}=o(1)$ (by the definitions of $q$ and $S_{n,q}$ in (8.16) and (16.36), respectively, and $h_{1,j}:=\lim n^{1/2}\tau_{jF_n}$), $A_{2n}=o_p(1)$ (by (16.39)), and $\hat{\kappa}_{jn}A_{2n}'S_{n,q}(I_q+o_p(1))S_{n,q}A_{2n}=A_{2n}'\hat{\kappa}_{jn}S_{n,q}^2A_{2n}+A_{2n}'\hat{\kappa}_{jn}S_{n,q}o_p(1)S_{n,q}A_{2n}=o_p(1)$ (using $\hat{\kappa}_{jn}S_{n,q}^2=o_p(1)$ and $A_{2n}=o_p(1)$).
Equations (16.44) and (16.45) give

$$Q_{2n}(\hat{\kappa}_{jn})=n^{1/2}B_{n,p-q}'U_n'\hat{D}_n'W_n'\big[I_k-h_{3,q}h_{3,q}'\big]n^{1/2}W_n\hat{D}_nU_nB_{n,p-q}+o_p(1)-\hat{\kappa}_{jn}[I_{p-q}+o_p(1)]$$
$$=n^{1/2}B_{n,p-q}'U_n'\hat{D}_n'W_n'h_{3,k-q}h_{3,k-q}'n^{1/2}W_n\hat{D}_nU_nB_{n,p-q}+o_p(1)-\hat{\kappa}_{jn}[I_{p-q}+o_p(1)]$$
$$:=M_{n,p-q}-\hat{\kappa}_{jn}[I_{p-q}+o_p(1)],\eqno(16.46)$$

where the second equality uses $I_k=h_3h_3'=h_{3,q}h_{3,q}'+h_{3,k-q}h_{3,k-q}'$ (because $h_3=\lim C_n$ is an orthogonal matrix) and the last line defines the $(p-q)\times(p-q)$ matrix $M_{n,p-q}$.

Equations (16.43) and (16.46) imply that $\{\hat{\kappa}_{jn}:j=q+1,\ldots,p\}$ are the $p-q$ eigenvalues of the matrix

$$\overline{M}_{n,p-q}:=[I_{p-q}+o_p(1)]^{-1/2}M_{n,p-q}[I_{p-q}+o_p(1)]^{-1/2},\eqno(16.47)$$

obtained by pre- and post-multiplying the quantities in (16.46) by the rhs quantity $[I_{p-q}+o_p(1)]^{-1/2}$ in (16.46).
By (16.37),

$$\overline{M}_{n,p-q}\to_d\overline{\Delta}_{h,p-q}'h_{3,k-q}h_{3,k-q}'\overline{\Delta}_{h,p-q}.\eqno(16.48)$$

The vector of (ordered) eigenvalues of a matrix is a continuous function of the matrix (by Elsner's Theorem, see Stewart (2001, Thm. 3.1, pp. 37-38)). By (16.48), the matrix $\overline{M}_{n,p-q}$ converges in distribution. In consequence, by the CMT, the vector of eigenvalues of $\overline{M}_{n,p-q}$, viz., $\{\hat{\kappa}_{jn}:j=q+1,\ldots,p\}$, converges in distribution to the vector of eigenvalues of the limit matrix $\overline{\Delta}_{h,p-q}'h_{3,k-q}h_{3,k-q}'\overline{\Delta}_{h,p-q}$, which proves part (d) of Theorem 8.4. In addition, $\lambda_{\min}(n\hat{U}_n'\hat{D}_n'\hat{W}_n'\hat{W}_n\hat{D}_n\hat{U}_n)$, which equals the smallest eigenvalue, $\hat{\kappa}_{pn}$, converges in distribution to the smallest eigenvalue of $\overline{\Delta}_{h,p-q}'h_{3,k-q}h_{3,k-q}'\overline{\Delta}_{h,p-q}$, which completes the proof of part (b) of Theorem 8.4.

The convergence in parts (a)-(d) of Theorem 8.4 is joint with that in Lemma 8.3 because it just relies on the convergence in distribution of $n^{1/2}W_n\hat{D}_nU_nT_n$, which is part of the former. This establishes part (e) of Theorem 8.4.

Part (f) of Theorem 8.4 holds by the same proof as used for parts (a)-(e) with $n$ replaced by $w_n$.
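The continuity argument used above — ordered eigenvalues vary continuously with the matrix — can be illustrated numerically. The sketch below uses Weyl's perturbation inequality for symmetric matrices as a concrete quantitative stand-in for the general result cited from Stewart (2001); the matrices are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 5

# Symmetric base matrix A and symmetric perturbation direction E.
X = rng.standard_normal((k, k))
A = X @ X.T
E = rng.standard_normal((k, k))
E = (E + E.T) / 2.0

lam = np.linalg.eigvalsh(A)  # ordered (ascending) eigenvalues
for eps in (1e-2, 1e-4, 1e-6):
    lam_eps = np.linalg.eigvalsh(A + eps * E)
    # Weyl's inequality: each ordered eigenvalue moves by at most ||eps*E||_2,
    # so the eigenvalue map is (Lipschitz) continuous in the matrix.
    assert np.max(np.abs(lam_eps - lam)) <= eps * np.linalg.norm(E, 2) + 1e-10
```

Continuity of the eigenvalue map is what licenses the CMT step: convergence in distribution of $\overline{M}_{n,p-q}$ carries over to its ordered eigenvalues.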
17 Proofs of Sufficiency of Several Conditions for the $\lambda_{p-j}(\cdot)$ Condition in $\mathcal{F}_{0j}$

In this section, we show that the conditions in (3.9) and (3.10) are sufficient for the second condition in $\mathcal{F}_{0j}$, which is

$$\lambda_{p-j}\big(\mathrm{Var}_F\big(C_{F,k-j}'\Omega_F^{-1/2}G_iB_{F,p-j}\xi\big)\big)\ge\delta_1\quad\forall\xi\in R^{p-j}\ \text{with }\|\xi\|=1.$$
Condition (i) in (3.9) is sufficient by the following argument:

$$\lambda_{p-j}\big(\mathrm{Var}_F\big(C_{F,k-j}'\Omega_F^{-1/2}G_iB_{F,p-j}\xi\big)\big)\ \ge\ \lambda_{p-j}\big(\mathrm{Var}_F\big(\overline{C}_{F,p-j}'\Omega_F^{-1/2}G_iB_{F,p-j}\xi\big)\big)$$
$$=\min_{\gamma\in R^{p-j}:\|\gamma\|=1}\mathrm{Var}_F\big(\gamma'(\xi'\otimes I_{p-j})\,\mathrm{vec}(\overline{C}_{F,p-j}'\Omega_F^{-1/2}G_iB_{F,p-j})\big)$$
$$\ge\ \lambda_{\min}\big(\mathrm{Var}_F\big(\mathrm{vec}(\overline{C}_{F,p-j}'\Omega_F^{-1/2}G_iB_{F,p-j})\big)\big)\ \min_{\gamma\in R^{p-j}:\|\gamma\|=1}\|(\xi\otimes I_{p-j})\gamma\|^2$$
$$=\ \lambda_{\min}\big(\mathrm{Var}_F\big(\mathrm{vec}(\overline{C}_{F,p-j}'\Omega_F^{-1/2}G_iB_{F,p-j})\big)\big),\eqno(17.1)$$

where the first inequality holds by Corollary 15.4(a) with $m=p-j$ and $r=k-p$ (because $\overline{C}_{F,p-j}'$ is a submatrix of $C_{F,k-j}'$, since by definition the rows of $\overline{C}_{F,p-j}'$ are a collection of $p-j$ rows of $C_{F,k-j}'$), the first equality holds because the $(p-j)$-th largest eigenvalue of a $(p-j)\times(p-j)$ matrix equals its minimum eigenvalue and by the general formula $\mathrm{vec}(ABC)=(C'\otimes A)\mathrm{vec}(B)$, and the last equality holds because $\|(\xi\otimes I_{p-j})\gamma\|^2=\gamma'(\xi'\xi\otimes I_{p-j})\gamma=1$ using $\|\xi\|=\|\gamma\|=1$.
Condition (ii) in (3.9) is sufficient by sufficient condition (i) in (3.9) and the following:

$$\lambda_{\min}\big(\mathrm{Var}_F\big(\mathrm{vec}(\overline{C}_{F,p-j}'\Omega_F^{-1/2}G_iB_{F,p-j})\big)\big)=\min_{\gamma\in R^{(p-j)^2}:\|\gamma\|=1}\mathrm{Var}_F\big(\gamma'(I_{p-j}\otimes\overline{C}_{F,p-j}')\,\mathrm{vec}(\Omega_F^{-1/2}G_iB_{F,p-j})\big)$$
$$\ge\lambda_{\min}\big(\mathrm{Var}_F\big(\mathrm{vec}(\Omega_F^{-1/2}G_iB_{F,p-j})\big)\big)\min_{\gamma\in R^{(p-j)^2}:\|\gamma\|=1}\|(I_{p-j}\otimes\overline{C}_{F,p-j})\gamma\|^2$$
$$=\lambda_{\min}\big(\mathrm{Var}_F\big(\mathrm{vec}(\Omega_F^{-1/2}G_iB_{F,p-j})\big)\big),\eqno(17.2)$$

where the last equality uses $\|(I_{p-j}\otimes\overline{C}_{F,p-j})\gamma\|^2=\gamma'(I_{p-j}\otimes\overline{C}_{F,p-j}'\overline{C}_{F,p-j})\gamma=1$ because the rows of $\overline{C}_{F,p-j}'$ are orthonormal and $\|\gamma\|=1$.

Condition (iii) in (3.9) is sufficient by sufficient condition (ii) in (3.9) and a similar argument to that given in (17.2), using the fact that $\|(B_{F,p-j}\otimes I_k)\gamma\|^2=1$ for $\|\gamma\|=1$ because the columns of $B_{F,p-j}$ are orthonormal.
Condition (iv) in (3.9) is sufficient by sufficient condition (iii) in (3.9) and a similar argument to that given in (17.2), using $\min_{\xi\in R^{pk}:\|\xi\|=1}\|(I_p\otimes\Omega_F^{-1/2})\xi\|^2\ge M^{-2/(2+\gamma)}$, for $M$ as in the definition of $\mathcal{F}$, in place of $\min_{\gamma}\|\cdot\|^2=1$. The latter inequality holds by the following calculations:

$$\xi'(I_p\otimes\Omega_F^{-1})\xi=\sum_{j=1}^p\xi_j'\Omega_F^{-1}\xi_j=\sum_{j=1}^p(\xi_j/\|\xi_j\|)'\Omega_F^{-1}(\xi_j/\|\xi_j\|)\|\xi_j\|^2\ge\lambda_{\min}(\Omega_F^{-1})\sum_{j=1}^p\|\xi_j\|^2=1/\lambda_{\max}(\Omega_F)\ge M^{-2/(2+\gamma)},\eqno(17.3)$$

where $\xi=(\xi_1',\ldots,\xi_p')'$ for $\xi_j\in R^k$ $\forall j\le p$, the sums are over $j$ for which $\xi_j\ne0^k$, the second equality uses $\|\xi\|=1$, and the last inequality holds because $\lambda_{\max}(\Omega_F)=\max_{\zeta\in R^k:\|\zeta\|=1}E_F(\zeta'g_i)^2\le E_F\|g_i\|^2$ and

$$E_F\|g_i\|^2=((E_F\|g_i\|^2)^{1/2})^2\le((E_F\|g_i\|^{2+\gamma})^{1/(2+\gamma)})^2\le M^{2/(2+\gamma)}$$

by successively applying the Cauchy-Bunyakovsky-Schwarz inequality, Lyapunov's inequality, and the moment bound $E_F\|g_i\|^{2+\gamma}\le M$ in $\mathcal{F}$.
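The final chain of inequalities can be checked by simulation. The sketch below (with hypothetical simulated draws standing in for $g_i$) verifies both $\lambda_{\max}(\Omega_F)\le E_F\|g_i\|^2$ and the Lyapunov step $E_F\|g_i\|^2\le(E_F\|g_i\|^{2+\gamma})^{2/(2+\gamma)}$; both hold exactly under the empirical distribution of the draws:

```python
import numpy as np

rng = np.random.default_rng(2)
g = rng.standard_normal((100_000, 3))  # hypothetical draws of g_i in R^k, k = 3
norms = np.linalg.norm(g, axis=1)
gamma = 1.0

# Lyapunov: (E||g||^2)^{1/2} <= (E||g||^{2+gamma})^{1/(2+gamma)}, hence
# E||g||^2 <= (E||g||^{2+gamma})^{2/(2+gamma)} <= M^{2/(2+gamma)} when
# E||g||^{2+gamma} <= M, as in the last step of (17.3).
m2 = np.mean(norms ** 2)
m2g = np.mean(norms ** (2 + gamma))
assert m2 <= m2g ** (2 / (2 + gamma)) + 1e-12

# Cauchy-Schwarz step: lambda_max(E g g') <= E||g||^2 (trace bound).
Omega = g.T @ g / len(g)
assert np.linalg.eigvalsh(Omega)[-1] <= m2 + 1e-9
```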
Conditions (v) and (vi) in (3.9) are sufficient by the following argument. Write

$$\Sigma_F^{vec(G_i)}=(-M_F,I_{pk})\,\Sigma_F^{f_i}\,(-M_F,I_{pk})',\ \text{ where }M_F=(E_Fvec(G_i)g_i')(E_Fg_ig_i')^{-1}\in R^{pk\times k}.\eqno(17.4)$$

We have

$$\lambda_{\min}\big(\Sigma_F^{vec(G_i)}\big)=\min_{\xi\in R^{pk}:\|\xi\|=1}\xi'(-M_F,I_{pk})\,\Sigma_F^{f_i}\,(-M_F,I_{pk})'\xi$$
$$=\min_{\xi\in R^{pk}:\|\xi\|=1}\frac{\xi'(-M_F,I_{pk})}{\|(-M_F,I_{pk})'\xi\|}\,\Sigma_F^{f_i}\,\frac{(-M_F,I_{pk})'\xi}{\|(-M_F,I_{pk})'\xi\|}\cdot\|(-M_F,I_{pk})'\xi\|^2\ \ge\ \min_{\zeta\in R^{(p+1)k}:\|\zeta\|=1}\zeta'\Sigma_F^{f_i}\zeta=\lambda_{\min}\big(\Sigma_F^{f_i}\big),\eqno(17.5)$$

where the inequality uses $\|(-M_F,I_{pk})'\xi\|^2=\xi'\xi+\xi'M_FM_F'\xi\ge1$ for $\xi\in R^{pk}$ with $\|\xi\|=1$. This shows that condition (v) is sufficient for sufficient condition (iv) in (3.9). Since $\Sigma_F^{f_i}=\mathrm{Var}_F(f_i)+E_Ff_iE_Ff_i'$, condition (vi) is sufficient for sufficient condition (v) in (3.9).
The condition in (3.10) is sufficient by the following argument:

$$\lambda_{p-j}\big(\mathrm{Var}_F\big(C_{F,k-j}'\Omega_F^{-1/2}G_iB_{F,p-j}\xi\big)\big)\ \ge\ \lambda_p\big(\mathrm{Var}_F\big(C_F'\Omega_F^{-1/2}G_iB_{F,p-j}\xi\big)\big)=\lambda_p\big(\mathrm{Var}_F\big(\Omega_F^{-1/2}G_iB_{F,p-j}\xi\big)\big),\eqno(17.6)$$

where the first inequality holds by Corollary 15.4(b) with $m=p$ and $r=j$ and the equality holds because $\mathrm{Var}_F\big(C_F'\Omega_F^{-1/2}G_iB_{F,p-j}\xi\big)=C_F'\,\mathrm{Var}_F\big(\Omega_F^{-1/2}G_iB_{F,p-j}\xi\big)\,C_F$ and $C_F$ is orthogonal.

18 Asymptotic Size of Kleibergen's CLR Test with Jacobian-Variance Weighting and the Proof of Theorem 5.1

In this section, we establish the asymptotic size of Kleibergen's CLR test with Jacobian-variance weighting when the Robin and Smith (2000) rank statistic (defined in (5.5)) is employed. This rank statistic depends on a variance matrix estimator $\widetilde{V}_{Dn}$. See Section 5 for the definition of the test. We provide a formula for the asymptotic size of the test that depends on the specifics of the moment conditions considered and does not necessarily equal its nominal size $\alpha$. First, in Section 18.1, we provide an example that illustrates the results in Theorem 5.1 and Comment (v) to Theorem 5.1. In Section 18.2, we establish the asymptotic size of the test based on $\widetilde{V}_{Dn}$ defined as in (5.3). In Section 18.3, we report some simulation results for a linear instrumental variable (IV) model with two rhs endogenous variables. In Section 18.4, we establish the asymptotic size of Kleibergen's CLR test with Jacobian-variance weighting under a general assumption that allows for other definitions of $\widetilde{V}_{Dn}$.

In Section 18.5, we show that equally-weighted versions of Kleibergen's CLR test have correct asymptotic size when the Robin and Smith (2000) rank statistic is employed and a general equal-weighting matrix $\widetilde{W}_n$ is employed. This result extends the result given in Theorem 6.1 in Section 6, which applies to the specific case where $\widetilde{W}_n=\hat{\Omega}_n^{-1/2}$, as in (6.2). The results of Section 18.5 are a relatively simple by-product of the results in Section 18.4.

Proofs of the results stated in this section are given in Section 18.6.

Theorem 5.1 follows from Lemma 18.2 and Theorem 18.3, which are stated in Section 18.4.
18.1 An Example

Here we provide a simple example that illustrates the result of Theorem 5.1. In this example, the true distribution $F$ does not depend on $n$. Suppose $p=2$, $E_FG_i=(1^k,0^k)$, where $c^k:=(c,\ldots,c)'\in R^k$ for $c=0,1$, and $n^{1/2}(\hat{D}_n-E_FG_i)\to_d\overline{D}_h$ under $F$ for some random matrix $\overline{D}_h=(\overline{D}_{1h},\overline{D}_{2h})\in R^{k\times2}$. Suppose for $\widetilde{M}_n:=\widetilde{V}_{Dn}^{-1/2}$ and $M_F=I_{2k}$, we have $n^{1/2}(\widetilde{M}_n-M_F)\to_d\overline{M}_h$ under $F$ for some random matrix $\overline{M}_h\in R^{2k\times2k}$.[Footnote 53] We have

$$\hat{D}_n^{\dagger}=vec_{k,p}^{-1}\big(\widetilde{V}_{Dn}^{-1/2}vec(\hat{D}_n)\big)=\big(\widetilde{M}_{11n}\hat{D}_{1n}+\widetilde{M}_{12n}\hat{D}_{2n},\ \widetilde{M}_{21n}\hat{D}_{1n}+\widetilde{M}_{22n}\hat{D}_{2n}\big),\eqno(18.1)$$

where $\hat{D}_n=(\hat{D}_{1n},\hat{D}_{2n})$, $\widetilde{M}_{j\ell n}$ for $j,\ell=1,2$ are the four $k\times k$ submatrices of $\widetilde{M}_n$, and likewise for $M_{j\ell F}$ for $j,\ell=1,2$. Let $\overline{M}_{j\ell h}$ for $j,\ell=1,2$ denote the four $k\times k$ submatrices of $\overline{M}_h$. We let $T_n^{\dagger}:=\mathrm{Diag}\{n^{-1/2},1\}$. Then, we have

$$n^{1/2}\hat{D}_n^{\dagger}T_n^{\dagger}=\big(\widetilde{M}_{11n}\hat{D}_{1n}+\widetilde{M}_{12n}\hat{D}_{2n},\ n^{1/2}\widetilde{M}_{21n}\hat{D}_{1n}+\widetilde{M}_{22n}n^{1/2}\hat{D}_{2n}\big)$$
$$\to_d\big(I_k1^k+0^{k\times k}0^k,\ \overline{M}_{21h}1^k+I_k\overline{D}_{2h}\big)=\big(1^k,\ \overline{M}_{21h}1^k+\overline{D}_{2h}\big),\eqno(18.2)$$

where the convergence uses $n^{1/2}\widetilde{M}_{21n}\to_d\overline{M}_{21h}$ (because $M_{21F}=0^{k\times k}$) and $n^{1/2}\hat{D}_{2n}\to_d\overline{D}_{2h}$ (because $E_FG_{i2}=0^k$). Equation (18.2) shows that the asymptotic distribution of $n^{1/2}\hat{D}_n^{\dagger}T_n^{\dagger}$ depends on the randomness of the variance estimator $\widetilde{V}_{Dn}$ through $\overline{M}_{21h}$.

It may appear that this example is quite special and the asymptotic behavior in (18.2) only arises in special circumstances, because $E_FG_i=(1^k,0^k)$, $M_{21F}=0^{k\times k}$, and $M_F=I_{2k}$ in this example. But this is not true. The asymptotic behavior in (18.2) arises quite generally, as shown in Theorem 5.1, whenever $p\ge2$.[Footnote 54]

If one replaces $\widetilde{V}_{Dn}^{-1/2}$ by its probability limit, $M_F$, in the definition of $\hat{D}_n^{\dagger}$, then the calculations in (18.2) hold but with $n^{1/2}\widetilde{M}_{21n}$ replaced by $n^{1/2}M_{21F}=0^{k\times k}$ in the first line and, hence, $\overline{M}_{21h}$ replaced by $0^{k\times k}$ in the second line. Hence, in this case, the asymptotic distribution only depends on $\overline{D}_h$. Hence, Comment (iv) to Theorem 5.1 holds in this example.

Suppose one defines $\hat{D}_n^{\dagger}$ by $\widetilde{W}_n\hat{D}_n$ as in Comment (v) to Theorem 5.1. This yields equal weighting of each column of $\hat{D}_n$. This is equivalent to replacing $\widetilde{V}_{Dn}^{-1/2}$ by $I_2\otimes\widetilde{W}_n$ in the definition of $\hat{D}_n^{\dagger}$ in (18.1). In this case, the off-diagonal $k\times k$ blocks of $I_2\otimes\widetilde{W}_n$ are $0^{k\times k}$ and, hence, $\widetilde{M}_{21n}$ in the first line of (18.2) equals $0^{k\times k}$, which implies that $\overline{M}_{21h}=0^{k\times k}$ in the second line of (18.2). Thus, the asymptotic distribution of $\hat{D}_n^{\dagger}$ does not depend on the asymptotic distribution of the (normalized) weight matrix estimator $\widetilde{W}_n$. It only depends on the probability limit of $\widetilde{W}_n$, as stated in Comment (v) to Theorem 5.1.

[Footnote 53] The convergence results $n^{1/2}(\hat{D}_n-E_FG_i)\to_d\overline{D}_h$ and $n^{1/2}(\widetilde{M}_n-M_F)\to_d\overline{M}_h$ are established in Lemmas 8.2 and 18.2, respectively, in Section 8 of AG1 and Section 18 in this Supplemental Material under general conditions.

[Footnote 54] When the matrix $M_{21F}\ne0^{k\times k}$, the argument in (18.2) does not go through because $n^{1/2}\widetilde{M}_{21n}$ does not converge in distribution (since $n^{1/2}(\widetilde{M}_{21n}-M_{21F})\to_d\overline{M}_{21h}$ by assumption). In this case, one has to alter the definition of $T_n^{\dagger}$ so that it rotates the columns of $\hat{D}_n$ before rescaling them. The rotation required depends on both $M_F$ and $E_FG_i$.

18.2 Asymptotic Size of Kleibergen's CLR Test with Jacobian-Variance Weighting

In this subsection, we determine the asymptotic size of Kleibergen's CLR test when $\hat{D}_n$ is weighted by $\widetilde{V}_{Dn}$, defined in (5.3), which yields what we call Jacobian-variance weighting, and the Robin and Smith (2000) rank statistic is employed. This rank statistic is defined in (5.5) with $\theta=\theta_0$.
For convenience, we restate the definition here:

$$rk_n=rk_n^{\dagger}:=\lambda_{\min}\big(n(\hat{D}_n^{\dagger})'\hat{D}_n^{\dagger}\big),\ \text{ where }\hat{D}_n^{\dagger}:=vec_{k,p}^{-1}\big(\widetilde{V}_{Dn}^{-1/2}vec(\hat{D}_n)\big)\eqno(18.3)$$

(so $\hat{D}_n^{\dagger}$ is as in (5.4) with $\theta=\theta_0$).[Footnote 55] Let $\hat{\kappa}_{jn}^{\dagger}$ denote the $j$th eigenvalue of $n(\hat{D}_n^{\dagger})'\hat{D}_n^{\dagger}$, for $j=1,\ldots,p$, ordered to be nonincreasing in $j$. By definition,

$$\lambda_{\min}\big(n(\hat{D}_n^{\dagger})'\hat{D}_n^{\dagger}\big)=\hat{\kappa}_{pn}^{\dagger}.\eqno(18.4)$$

Also, the $j$th singular value of $n^{1/2}\hat{D}_n^{\dagger}$ equals $(\hat{\kappa}_{jn}^{\dagger})^{1/2}$.
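A minimal numerical sketch of (18.3)-(18.4), with hypothetical stand-ins for $\hat{D}_n$ and $\widetilde{V}_{Dn}$, computes $rk_n^{\dagger}$ and confirms that it equals the squared smallest singular value of $n^{1/2}\hat{D}_n^{\dagger}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, p = 500, 4, 2

D_hat = rng.standard_normal((k, p))                  # stand-in for D_hat_n
V_D = np.eye(k * p) + 0.1 * np.ones((k * p, k * p))  # stand-in pd V~_Dn

# Symmetric inverse square root of V~_Dn via its eigendecomposition.
w, Q = np.linalg.eigh(V_D)
V_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T

# D†_n := vec^{-1}_{k,p}(V^{-1/2} vec(D_hat_n)); vec stacks columns,
# so vec^{-1} is a Fortran-order reshape.
vecD = D_hat.reshape(-1, order="F")
D_dag = (V_inv_sqrt @ vecD).reshape((k, p), order="F")

# rk_n = rk†_n := lambda_min(n (D†)'D†), which equals the squared
# smallest singular value of n^{1/2} D†_n.
rk = np.linalg.eigvalsh(n * D_dag.T @ D_dag)[0]
s = np.linalg.svd(np.sqrt(n) * D_dag, compute_uv=False)
assert np.isclose(rk, s[-1] ** 2)
```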
Define the parameter space $\mathcal{F}_{KCLR}$ for the distribution $F$ by

$$\mathcal{F}_{KCLR}:=\{F\in\mathcal{F}:\lambda_{\min}\big(\mathrm{Var}_F((g_i',vec(G_i)')')\big)\ge\delta_2,\ E_F\|(g_i',vec(G_i)')'\|^{4+\gamma}\le M\},\eqno(18.5)$$

where $\gamma>0$ and $M<\infty$ are as in the definition of $\mathcal{F}$ in (3.1) and $\delta_2>0$. Note that $\mathcal{F}_{KCLR}$ is contained in $\mathcal{F}_0$ when $\delta_1$ in $\mathcal{F}_0$ satisfies $\delta_1\le M^{-2/(2+\gamma)}\delta_2$, by condition (vi) in (3.9). Let $vech(\cdot)$ denote the half vectorization operator that vectorizes the nonredundant elements in the columns of a symmetric matrix (that is, the elements on or below the main diagonal). The moment condition in $\mathcal{F}_{KCLR}$ is imposed because the asymptotic distribution of the rank statistic $rk_n^{\dagger}$ depends on a triangular array CLT for $vech(f_if_i')$, which employs $4+\gamma$ moments for $f_i$, where $f_i:=(g_i',vec(G_i-E_{F_n}G_i)')'$ as in (5.6). The $\lambda_{\min}(\cdot)$ condition in $\mathcal{F}_{KCLR}$ ensures that $\widetilde{V}_{Dn}$ is positive definite wp$\to$1, which is needed because $\widetilde{V}_{Dn}$ enters the rank statistic $rk_n^{\dagger}$ via $\widetilde{V}_{Dn}^{-1/2}$; see (18.3).
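The half vectorization operator $vech(\cdot)$ and its inverse, used repeatedly below, can be sketched as follows (a generic implementation, not tied to the paper's code):

```python
import numpy as np

def vech(S):
    """Stack the on-and-below-diagonal elements of the columns of S."""
    m = S.shape[0]
    return np.concatenate([S[j:, j] for j in range(m)])

def vech_inv(v, m):
    """Inverse of vech for symmetric m x m matrices."""
    S = np.zeros((m, m))
    idx = 0
    for j in range(m):
        S[j:, j] = v[idx:idx + m - j]
        idx += m - j
    return S + np.tril(S, -1).T  # mirror strict lower triangle to upper

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
S = A + A.T                       # symmetric test matrix
v = vech(S)
assert v.shape == (6,)            # m(m+1)/2 = 6 for m = 3
assert np.allclose(vech_inv(v, 3), S)
```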
For a fixed distribution $F$, $\widetilde{V}_{Dn}$ estimates $\Sigma_F^{vec(G_i)}$ defined in (8.15), where $\Sigma_F^{vec(G_i)}$ is pd by its definition in (8.15) and the $\lambda_{\min}(\cdot)$ condition in $\mathcal{F}_{KCLR}$.[Footnote 56] Let

$$M_F:=\begin{bmatrix}M_{11F}&\cdots&M_{1pF}\\ \vdots&\ddots&\vdots\\ M_{p1F}&\cdots&M_{ppF}\end{bmatrix}:=\big(\Sigma_F^{vec(G_i)}\big)^{-1/2}\ \text{ and}$$
$$D_F^{\dagger}:=\sum_{j=1}^p\big(M_{1jF}E_FG_{ij},\ldots,M_{pjF}E_FG_{ij}\big)\in R^{k\times p},\ \text{ where }G_i=(G_{i1},\ldots,G_{ip})\in R^{k\times p}.\eqno(18.6)$$

[Footnote 55] As in Section 5, the function $vec_{k,p}^{-1}(\cdot)$ is the inverse of the $vec(\cdot)$ function for $k\times p$ matrices. Thus, the domain of $vec_{k,p}^{-1}(\cdot)$ consists of $kp$-vectors and its range consists of $k\times p$ matrices.

[Footnote 56] More specifically, $\Sigma_F^{vec(G_i)}$ is pd because by (8.15) $\Sigma_F^{vec(G_i)}:=\mathrm{Var}_F\big(vec(G_i)-(E_Fvec(G_\ell)g_\ell')\Omega_F^{-1}g_i\big)=\big(-(E_Fvec(G_\ell)g_\ell')\Omega_F^{-1},I_{pk}\big)\mathrm{Var}_F\big((g_i',vec(G_i)')'\big)\big(-(E_Fvec(G_\ell)g_\ell')\Omega_F^{-1},I_{pk}\big)'$, where $\big(-(E_Fvec(G_\ell)g_\ell')\Omega_F^{-1},I_{pk}\big)\in R^{pk\times(p+1)k}$ has full row rank $pk$ and $\mathrm{Var}_F\big((g_i',vec(G_i)')'\big)$ is pd by the $\lambda_{\min}(\cdot)$ condition in $\mathcal{F}_{KCLR}$.
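The block-sum formula for $D_F^{\dagger}$ in (18.6) is equivalent to the compact form $vec_{k,p}^{-1}\big(M_Fvec(E_FG_i)\big)$. The following sketch (with hypothetical stand-ins for $\Sigma_F^{vec(G_i)}$ and $E_FG_i$) checks this equivalence numerically:

```python
import numpy as np

rng = np.random.default_rng(5)
k, p = 3, 2

EG = rng.standard_normal((k, p))        # stand-in for E_F G_i
A = rng.standard_normal((k * p, k * p))
Sigma = A @ A.T + np.eye(k * p)         # stand-in pd Sigma_F^{vec(G_i)}

# M_F := Sigma^{-1/2} (symmetric square root), partitioned into k x k blocks M_{lj}.
w, Q = np.linalg.eigh(Sigma)
M = Q @ np.diag(w ** -0.5) @ Q.T
blocks = [[M[l * k:(l + 1) * k, j * k:(j + 1) * k] for j in range(p)]
          for l in range(p)]

# D_F† per (18.6): column l of the j-th summand is M_{lj} E_F G_{ij}.
D_dag = sum(np.column_stack([blocks[l][j] @ EG[:, j] for l in range(p)])
            for j in range(p))

# Equivalent compact form: D_F† = vec^{-1}(M_F vec(E_F G_i)).
D_dag2 = (M @ EG.reshape(-1, order="F")).reshape((k, p), order="F")
assert np.allclose(D_dag, D_dag2)
```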
Let $(\tau_{1F}^{\dagger},\ldots,\tau_{pF}^{\dagger})$ denote the singular values of $D_F^{\dagger}$. Define

$$B_F^{\dagger}\in R^{p\times p}\ \text{to be an orthogonal matrix of eigenvectors of }D_F^{\dagger\prime}D_F^{\dagger}\ \text{ and}$$
$$C_F^{\dagger}\in R^{k\times k}\ \text{to be an orthogonal matrix of eigenvectors of }D_F^{\dagger}D_F^{\dagger\prime},\eqno(18.7)$$

ordered so that the corresponding eigenvalues $\big((\tau_{1F}^{\dagger})^2,\ldots,(\tau_{pF}^{\dagger})^2\big)$ and $\big((\tau_{1F}^{\dagger})^2,\ldots,(\tau_{pF}^{\dagger})^2,0,\ldots,0\big)\in R^k$, respectively, are nonincreasing. We have $\tau_{jF}^{\dagger}\ge0$ for $j=1,\ldots,p$. Note that (18.7) gives definitions of $B_F^{\dagger}$ and $C_F^{\dagger}$ that are similar to the definitions in (8.6) and (8.7), but differ because $D_F^{\dagger}$ replaces $W_F(E_FG_i)U_F$ in the definitions.
Define $(\lambda_{1,F},\ldots,\lambda_{9,F})$ as in (8.9) with $\lambda_{7,F}=W_F=\Omega_F^{-1/2}$, $\lambda_{8,F}=U_F=I_p$, and $W_1(\cdot)$ and $U_1(\cdot)$ equal to identity functions. Define

$$\lambda_{10,F}:=\mathrm{Var}_F\begin{pmatrix}f_i\\vech(f_if_i')\end{pmatrix}\in R^{d\times d},\eqno(18.8)$$

where $d:=(p+1)k+(p+1)k((p+1)k+1)/2$. Define $(\lambda_{1,F}^{\dagger},\lambda_{2,F}^{\dagger},\lambda_{3,F}^{\dagger},\lambda_{6,F}^{\dagger})$ as $(\lambda_{1,F},\lambda_{2,F},\lambda_{3,F},\lambda_{6,F})$ are defined in (8.9) but with $\{\tau_{jF}^{\dagger}:j\le p\}$, $B_F^{\dagger}$, and $C_F^{\dagger}$ in place of $\{\tau_{jF}:j\le p\}$, $B_F$, and $C_F$, respectively.
Define

$$\lambda_F^{KCLR}:=\big(\lambda_{1,F},\ldots,\lambda_{8,F},\lambda_{10,F},\lambda_{1,F}^{\dagger},\lambda_{2,F}^{\dagger},\lambda_{3,F}^{\dagger},\lambda_{6,F}^{\dagger}\big),$$
$$\Lambda_{KCLR}:=\{\lambda_F^{KCLR}:F\in\mathcal{F}_{KCLR}\},\ \text{ and}$$
$$h_n(\lambda):=\big(n^{1/2}\lambda_{1,F},\lambda_{2,F},\ldots,\lambda_{8,F},\lambda_{10,F},n^{1/2}\lambda_{1,F}^{\dagger},\lambda_{2,F}^{\dagger},\lambda_{3,F}^{\dagger},\lambda_{6,F}^{\dagger}\big).\eqno(18.9)$$

Let $\{\lambda_{n,h}\in\Lambda_{KCLR}:n\ge1\}$ denote a sequence $\{\lambda_n\in\Lambda_{KCLR}:n\ge1\}$ for which $h_n(\lambda_n)\to h\in H$, for $H$ as in (8.1). The asymptotic variance of $n^{1/2}vec(\hat{D}_n-E_{F_n}G_i)$ is $\Sigma_h^{vec(G_i)}$ under $\{\lambda_{n,h}\in\Lambda_{KCLR}:n\ge1\}$ by Lemma 8.2.

Define $h_{1,j}$ for $j\le p$ and $h_s$ for $s=2,\ldots,8$ as in (8.12), $q=q_h$ as in (8.16), $h_{2,q}$, $h_{2,p-q}$, $h_{3,q}$, $h_{3,k-q}$, and $h_{1,p-q}^{\diamond}$ as in (8.17), and $\Delta_n$, $\Delta_{n,q}$, and $\Delta_{n,p-q}$ as in (13.2). Note that $h_7=h_{5,g}^{-1/2}$ and $h_8=I_p$ due to the definitions of $\lambda_{7,F}$ and $\lambda_{8,F}$ given above, where $h_{5,g}$ $(=\lim E_{F_n}g_ig_i')$ denotes the upper left $k\times k$ submatrix of $h_5$, as in Section 8.
For a sequence $\{\lambda_{n,h}\in\Lambda_{KCLR}:n\ge1\}$, we have

$$h_{10}=\begin{bmatrix}h_{10,f}&h_{10,ff^2}\\ h_{10,f^2f}&h_{10,f^2}\end{bmatrix}:=\lim\mathrm{Var}_{F_n}\begin{pmatrix}f_i\\vech(f_if_i')\end{pmatrix}\in R^{d\times d}.\eqno(18.10)$$

Note that $h_{10,f}\in R^{(p+1)k\times(p+1)k}$ is pd by the definition of $\mathcal{F}_{KCLR}$ in (18.5).
in place of
jF ;
BF ; and CF ; respectively, de…ne hy1;j for j
p and hys
for s = 2; 3; 6 as in (8.12) as analogues to the quantities without the y superscript, de…ne q y = qhy
as in (8.16), de…ne hy2;qy ; hy2;p
y
n;p q y
qy
; hy3;qy ; hy3;k
qy
; and hy1;p
qy
y
n;
as in (8.17), and de…ne
y
;
n;q y
and
as in (13.2). The quantity q y determines the asymptotic behavior of rkny : By de…nition, q y
is the largest value j (
below that if
qy
p) for which lim n1=2
= p; then
rkny
y
jFn
= 1 under f
qy
!p 1; whereas if
< p; then
n;h 2 KCLR : n
rkny converges in
1g: It is shown
distribution to a
nondegenerate random variable, see Lemma 18.4.
By the CLT, for any sequence $\{\lambda_{n,h}\in\Lambda_{KCLR}:n\ge1\}$,

$$n^{-1/2}\sum_{i=1}^n\begin{pmatrix}f_i\\vech(f_if_i'-E_{F_n}f_if_i')\end{pmatrix}\to_dL_h\sim N(0^d,h_{10}),\ \text{ where}$$
$$L_h=(L_{h,1}',L_{h,2}',L_{h,3}')'\ \text{for }L_{h,1}\in R^k,\ L_{h,2}\in R^{kp},\ \text{and }L_{h,3}\in R^{(p+1)k((p+1)k+1)/2},\eqno(18.11)$$

and the CLT holds using the moment conditions in $\mathcal{F}_{KCLR}$. Note that by the definitions of $h_4:=\lim E_{F_n}G_i$ and $h_5:=\lim E_{F_n}(g_i',vec(G_i)')'(g_i',vec(G_i)')$, we have

$$h_{10,f}=\begin{bmatrix}h_{5,g}&h_{5,gG}\\ h_{5,Gg}&h_{5,G}-vec(h_4)vec(h_4)'\end{bmatrix},\ \text{ where }h_5=\begin{bmatrix}h_{5,g}&h_{5,gG}\\ h_{5,Gg}&h_{5,G}\end{bmatrix}\eqno(18.12)$$

for $h_{5,g}\in R^{k\times k}$, $h_{5,Gg}\in R^{kp\times k}$, and $h_{5,G}\in R^{kp\times kp}$.
We now provide new, but distributionally equivalent, definitions of $\overline{g}_h$ and $\overline{D}_h$:

$$\overline{g}_h:=L_{h,1}\ \text{ and }\ vec(\overline{D}_h):=L_{h,2}-h_{5,Gg}h_{5,g}^{-1}L_{h,1}.\eqno(18.13)$$

These definitions are distributionally equivalent to the previous definitions of $\overline{g}_h$ and $\overline{D}_h$ given in Lemma 8.2, because by either set of definitions $\overline{g}_h$ and $vec(\overline{D}_h)$ are independent mean-zero random vectors with variance matrices $h_{5,g}$ and $\Sigma_h^{vec(G_i)}$ $(=h_{5,G}-vec(h_4)vec(h_4)'-h_{5,Gg}h_{5,g}^{-1}h_{5,Gg}')$, respectively, where $\Sigma_h^{vec(G_i)}=\lim\Sigma_{F_n}^{vec(G_i)}$, $\Sigma_{F_n}^{vec(G_i)}$ is defined in (8.15), and $\Sigma_h^{vec(G_i)}$ is pd (because $\lambda_{\min}(\Sigma_{F_n}^{vec(G_i)})$ is bounded away from zero by its definition in (8.15) and the $\lambda_{\min}(\cdot)$ condition in $\mathcal{F}_{KCLR}$).
Define

$$\overline{D}_h^{\dagger}:=\sum_{j=1}^p\big(M_{1jh}\overline{D}_{jh},\ldots,M_{pjh}\overline{D}_{jh}\big)\in R^{k\times p},\ \text{ where }\begin{bmatrix}M_{11h}&\cdots&M_{1ph}\\ \vdots&\ddots&\vdots\\ M_{p1h}&\cdots&M_{pph}\end{bmatrix}:=\big(\Sigma_h^{vec(G_i)}\big)^{-1/2},\eqno(18.14)$$

$\overline{D}_h=(\overline{D}_{1h},\ldots,\overline{D}_{ph})$, and $\overline{D}_h$ is defined in (18.13). Define

$$\overline{\Delta}_h^{\dagger}=\big(\overline{\Delta}_{h,q^{\dagger}}^{\dagger},\overline{\Delta}_{h,p-q^{\dagger}}^{\dagger}\big)\in R^{k\times p},\ \text{ where}$$
$$\overline{\Delta}_{h,p-q^{\dagger}}^{\dagger}:=h_3^{\dagger}h_{1,p-q^{\dagger}}^{\dagger\diamond}+\overline{D}_h^{\dagger}h_{2,p-q^{\dagger}}^{\dagger}\in R^{k\times(p-q^{\dagger})}\ \text{ and }\ \overline{\Delta}_{h,q^{\dagger}}^{\dagger}:=h_{3,q^{\dagger}}^{\dagger}\in R^{k\times q^{\dagger}}.\eqno(18.15)$$
Let $a(\cdot)$ be the function from $R^d$ to $R^{kp(kp+1)/2}$ that maps

$$n^{-1}\begin{pmatrix}\sum_{i=1}^nf_i\\vech\big(\sum_{i=1}^nf_if_i'\big)\end{pmatrix}\ \text{into}\ A_n:=vech\bigg(\Big(n^{-1}\sum_{i=1}^nvec(G_i-E_{F_n}G_i)vec(G_i-E_{F_n}G_i)'-\widetilde{\Theta}_n\widetilde{\Omega}_n^{-1}\widetilde{\Theta}_n'\Big)^{-1/2}\bigg),\ \text{ where}$$
$$\widetilde{\Omega}_n:=n^{-1}\sum_{i=1}^ng_ig_i'\in R^{k\times k}\ \text{ and }\ \widetilde{\Theta}_n:=n^{-1}\sum_{i=1}^nvec(G_i-E_{F_n}G_i)g_i'\in R^{pk\times k}.\eqno(18.16)$$

Note that $a(\cdot)$ does not depend on the $n^{-1}\sum_{i=1}^nf_i$ part of its argument. Also, $a(\cdot)$ is well defined and continuously partially differentiable at any value of its argument for which $n^{-1}\sum_{i=1}^nf_if_i'$ is pd.[Footnote 57] We define $A_h$ as follows:

$$A_h\ \text{denotes the }(kp)(kp+1)/2\times d\ \text{matrix of partial derivatives of }a(\cdot)$$
$$\text{evaluated at }\big(0^{(p+1)k\prime},vech(h_{10,f})'\big)',\eqno(18.17)$$

where the latter vector is the limit of the mean vector of $(f_i',vech(f_if_i')')'$ under $\{\lambda_{n,h}\in\Lambda_{KCLR}:n\ge1\}$. Define

$$\overline{M}_h:=vech_{kp,kp}^{-1}(A_hL_h)\in R^{kp\times kp},\eqno(18.18)$$

where $vech_{kp,kp}^{-1}(\cdot)$ denotes the inverse of the $vech(\cdot)$ operator applied to symmetric $kp\times kp$ matrices.

[Footnote 57] The function $a(\cdot)$ is well defined in this case because $n^{-1}\sum_{i=1}^nvec(G_i-E_{F_n}G_i)vec(G_i-E_{F_n}G_i)'-\widetilde{\Theta}_n\widetilde{\Omega}_n^{-1}\widetilde{\Theta}_n'=(-\widetilde{\Theta}_n\widetilde{\Omega}_n^{-1},I_{pk})\,n^{-1}\sum_{i=1}^nf_if_i'\,(-\widetilde{\Theta}_n\widetilde{\Omega}_n^{-1},I_{pk})'$ and $(-\widetilde{\Theta}_n\widetilde{\Omega}_n^{-1},I_{pk})\in R^{pk\times(p+1)k}$ has full row rank $pk$.
Define

$$\overline{M}_h^{\dagger}:=\big(\overline{M}_{h,q^{\dagger}}^{\dagger},\overline{M}_{h,p-q^{\dagger}}^{\dagger}\big):=\big(0^{k\times q^{\dagger}},\overline{M}_{h,p-q^{\dagger}}^{\dagger}\big)\in R^{k\times p},\ \text{ where}$$
$$\overline{M}_{h,p-q^{\dagger}}^{\dagger}:=\sum_{j=1}^p\big(\overline{M}_{1jh}h_{4,j},\ldots,\overline{M}_{pjh}h_{4,j}\big)h_{2,p-q^{\dagger}}^{\dagger}\in R^{k\times(p-q^{\dagger})},\quad\overline{M}_h=\begin{bmatrix}\overline{M}_{11h}&\cdots&\overline{M}_{1ph}\\ \vdots&\ddots&\vdots\\ \overline{M}_{p1h}&\cdots&\overline{M}_{pph}\end{bmatrix},\eqno(18.19)$$

and $h_4=(h_{4,1},\ldots,h_{4,p})\in R^{k\times p}$.
Below (in Lemma 18.4), we show that the asymptotic distribution of $rk_n^{\dagger}$ under sequences $\{\lambda_{n,h}\in\Lambda_{KCLR}:n\ge1\}$ with $q^{\dagger}<p$ is given by

$$r_h(\overline{D}_h,\overline{M}_h):=\lambda_{\min}\Big(\big(\overline{\Delta}_{h,p-q^{\dagger}}^{\dagger}+\overline{M}_{h,p-q^{\dagger}}^{\dagger}\big)'h_{3,k-q^{\dagger}}^{\dagger}h_{3,k-q^{\dagger}}^{\dagger\prime}\big(\overline{\Delta}_{h,p-q^{\dagger}}^{\dagger}+\overline{M}_{h,p-q^{\dagger}}^{\dagger}\big)\Big),\eqno(18.20)$$

where $\overline{\Delta}_{h,p-q^{\dagger}}^{\dagger}$ is a nonrandom function of $\overline{D}_h$ by (18.14) and (18.15) and $\overline{M}_{h,p-q^{\dagger}}^{\dagger}$ is a nonrandom function of $\overline{M}_h$ by (18.19). For sequences $\{\lambda_{n,h}\in\Lambda_{KCLR}:n\ge1\}$ with $q^{\dagger}=p$, we show that $rk_n^{\dagger}\to_pr_h:=\infty$.
h
=(
h;
h;q ;
as in (8.17), as follows:
2 Rk
h;p q )
h2 = (h2;q ; h2;p
q );
p
;
h;q
h3 = (h3;q ; h3;k
:= h3;q ; and
q );
h1;p
q
h;p q
2
:= h3 h1;p
q
+ h7 Dh h8 h2;p
3
0q (p q)
6
7
k
7
:= 6
4 Diagfh1;q+1 ; :::; h1;p g 52 R
0(k p) (p q)
q;
where
(p q)
: (18.21)
1=2
b n through
In the present case, h7 = h5;g and h8 = Ip because the CLRn statistic depends on D
b n ; which appears in the LMn statistic.58 This means that Assumption WU for the parameter
b n 1=2 D
space
KCLR
cn = b n 1=2 ; U
bn = Ip ; h7 = h 1=2 ; and h8 = Ip :
(de…ned in Section 8.4) holds with W
5;g
Thus, the distribution of
h
depends on Dh ; q; and hs for s = 1; 2; 3; 5:
Below (in Lemma 18.5), we show that the asymptotic distribution of the $CLR_n$ statistic under sequences $\{\lambda_{n,h}\in\Lambda_{KCLR}:n\ge1\}$ with $q^{\dagger}<p$ is given by[Footnote 59]

$$CLR_h:=\frac12\Big(LM_h+J_h-r_h+\sqrt{(LM_h+J_h-r_h)^2+4LM_hr_h}\Big),\ \text{ where}$$
$$LM_h:=\overline{v}_h'\overline{v}_h\sim\chi_p^2,\quad\overline{v}_h:=P_{\overline{\Delta}_h}h_{5,g}^{-1/2}\overline{g}_h,\quad J_h:=\overline{g}_h'h_{5,g}^{-1/2}M_{\overline{\Delta}_h}h_{5,g}^{-1/2}\overline{g}_h\sim\chi_{k-p}^2,\ \text{ and}$$
$$r_h:=r_h(\overline{D}_h,\overline{M}_h).\eqno(18.22)$$

The quantities $(\overline{g}_h,\overline{D}_h,\overline{M}_h)$ are specified in (18.13) and (18.18) (and $(\overline{g}_h,\overline{D}_h)$ are the same as in Lemma 8.2). Conditional on $\overline{D}_h$, $LM_h$ and $J_h$ are independent and distributed as $\chi_p^2$ and $\chi_{k-p}^2$, respectively (see the paragraph following (10.6)). For sequences $\{\lambda_{n,h}\in\Lambda_{KCLR}:n\ge1\}$ with $q^{\dagger}=p$, we show that the asymptotic distribution of the $CLR_n$ statistic is $CLR_h:=LM_h:=\overline{v}_h'\overline{v}_h$, where $\overline{v}_h:=P_{\overline{\Delta}_h}h_{5,g}^{-1/2}\overline{g}_h$.

[Footnote 58] The $CLR_n$ statistic also depends on $\hat{D}_n$ through the rank statistic.
The critical value function $c(1-\alpha, r)$ is defined in (5.2) for $0 \le r < \infty$. For $r = \infty$, we define $c(1-\alpha, r)$ to be the $1-\alpha$ quantile of the $\chi_p^2$ distribution.

Now we state the asymptotic size of Kleibergen's CLR test based on the Robin and Smith (2000) rank statistic with $\widetilde{V}_{Dn}$ defined in (5.3).
Theorem 18.1 Let the parameter space for $F$ be $\mathcal{F}_{KCLR}$. Suppose the variance matrix estimator $\widetilde{V}_{Dn}$ employed by the rank statistic $rk_n^y$ (defined in (18.3)) is defined by (5.3). Then, the asymptotic size of Kleibergen's CLR test based on the rank statistic $rk_n^y$ is

$AsySz = \max\{\alpha, \sup_{h\in H} P(CLR_h > c(1-\alpha, r_h))\}$

provided $P(CLR_h = c(1-\alpha, r_h)) = 0$ for all $h \in H$.

Comments: (i) The proviso in Theorem 18.1 is a continuity condition on the distribution function of $CLR_h - c(1-\alpha, r_h)$ at zero. If the proviso does not hold, then the following weaker conclusion holds:

$AsySz \in \big[\max\{\alpha, \sup_{h\in H} P(CLR_h > c(1-\alpha, r_h))\},\ \max\{\alpha, \sup_{h\in H}\lim_{x\uparrow 0} P(CLR_h > c(1-\alpha, r_h) + x)\}\big]. \quad (18.23)$
(ii) Conditional on $(\overline{D}_h, \overline{M}_h)$, $\overline{g}_h$ has a multivariate normal distribution a.s. (because $(\overline{g}_h, \overline{D}_h, \overline{M}_h)$ has a multivariate normal distribution unconditionally).60 The proviso in Theorem 18.1 holds whenever $\overline{g}_h$ has a non-zero variance matrix conditional on $(\overline{D}_h, \overline{M}_h)$ a.s. for all $h \in H$. This holds because (a) $P(CLR_h = c(1-\alpha, r_h)) = E_{(\overline{D}_h,\overline{M}_h)}\, P(CLR_h = c(1-\alpha, r_h)\,|\,\overline{D}_h, \overline{M}_h)$ by the law of iterated expectations, (b) some calculations show that $CLR_h = c(1-\alpha, r_h)$ iff $(r_h + c)LM_h = cJ_h + c^2 + cr_h$ iff $\overline{X}_h'\overline{X}_h = c^2 + cr_h$, where $c := c(1-\alpha, r_h)$ and $\overline{X}_h := ((r_h + c)^{1/2}(P_{\overline{\Delta}_h} h_{5,g}^{-1/2}\overline{g}_h)',\ c^{1/2}(M_{\overline{\Delta}_h} h_{5,g}^{-1/2}\overline{g}_h)')'$ using (18.22), (c) $P_{\overline{\Delta}_h} + M_{\overline{\Delta}_h} = I_k$ and $P_{\overline{\Delta}_h} M_{\overline{\Delta}_h} = 0^{k\times k}$, and (d) conditional on $(\overline{D}_h, \overline{M}_h)$, $r_h$, $c$, and $\overline{\Delta}_h$ are constants.

59 The definitions of $\overline{v}_h$, $LM_h$, $J_h$, and $CLR_h$ in (18.22) are the same as in (9.1), (9.2), (10.6), and (10.7), respectively.
60 Note that $\overline{g}_h$ is independent of $\overline{D}_h$.
(iii) When $p = 1$, the formula for $AsySz$ in Theorem 18.1 reduces to $\alpha$ and the proviso holds automatically. That is, Kleibergen's CLR test has correct asymptotic size when $p = 1$. This holds because when $p = 1$ the quantity $\overline{M}_h^y$ in (18.19) equals $0^{k\times p}$ by Comment (ii) to Theorem 18.3 below. This implies that $r_h(\overline{D}_h, \overline{M}_h)$ in (18.20) does not depend on $\overline{M}_h$. Given this, the proof that $P(CLR_h > c(1-\alpha, r_h)) = \alpha$ for all $h \in H$ and that the proviso holds is the same as in (10.9)-(10.10) in the proof of Theorem 10.1.
(iv) Theorem 18.1 is proved by showing that it is a special case of Theorem 18.6 below, which is similar but applies not to $\widetilde{V}_{Dn}$ defined in (5.3), but to an arbitrary estimator $\widetilde{V}_{Dn}$ (of the asymptotic variance $\Phi_h^{vec(G_i)}$ of $n^{1/2}vec(\widehat{D}_n - E_{F_n}G_i)$) that satisfies an Assumption VD (which is stated below). Lemma 18.2 below shows that the estimator $\widetilde{V}_{Dn}$ defined in (5.3) satisfies Assumption VD.

(v) A CS version of Theorem 18.1 holds with the parameter space $\mathcal{F}_{\Theta,KCLR}$ in place of $\mathcal{F}_{KCLR}$, where $\mathcal{F}_{\Theta,KCLR} := \{(F, \theta_0) : F \in \mathcal{F}_{KCLR}(\theta_0), \theta_0 \in \Theta\}$ and $\mathcal{F}_{KCLR}(\theta_0)$ is the set $\mathcal{F}_{KCLR}$ defined in (18.5) with its dependence on $\theta_0$ made explicit. The proof of this CS result is as outlined in the Comment to Proposition 8.1. For the CS result, the $h$ index and its parameter space $H$ are as defined above, but $h$ also includes $\theta_0$ as a subvector, and $H$ allows this subvector to range over $\Theta$.

18.3 Simulation Results
In this section, for a particular linear IV regression model, we simulate (i) the correlations between $\overline{M}_{h,p-q^y}^y$ (defined in (18.19)) and $\overline{g}_h$ and (ii) some asymptotic null rejection probabilities (NRP's) of Kleibergen's CLR test that uses Jacobian-variance weighting and employs the Robin and Smith (2000) rank statistic. The model has $p = 2$ rhs endogenous variables, $k = 5$ IV's, and an error structure that yields simplified asymptotic formulae for some key quantities. The model is

$y_{1i} = Y_{2i}'\theta_0 + u_i$ and $Y_{2i} = \pi' Z_i + V_{2i}, \quad (18.24)$

where $y_{1i}, u_i \in R$; $Y_{2i}, V_{2i} = (V_{21i}, V_{22i})' \in R^2$; $Z_i = (Z_{i1}, \ldots, Z_{i5})' \in R^5$; and $\pi \in R^{5\times 2}$. We take $Z_{ij} \sim N(.05, (.05)^2)$ for $j = 1, \ldots, 5$, $u_i \sim N(0, 1)$, $V_{21i} \sim N(0, 1)$, and $V_{22i} = u_i V_{21i}$. The random variables $Z_{i1}, \ldots, Z_{i5}$, $u_i$, and $V_{21i}$ are taken to be mutually independent. We take $\pi = \pi_n = (e_1, e_2 c n^{-1/2})$, where $e_1 = (1, 0, \ldots, 0)' \in R^5$ and $e_2 = (0, 1, 0, \ldots, 0)' \in R^5$. We consider 26 values of the constant $c$ lying between 0 and 60.1 (viz., 0.0, 0.1, ..., 1.0, 1.1, ..., 10.1, 20.1, ..., 60.1), as well as 707.1, 1414.2, and 1,000,000. Given these definitions, $h_{1,1}^y = \infty$, $h_{1,2}^y = c$, and $\overline{M}_h^y = (0^5, \overline{M}_{h,p-q^y}^y) \in R^{5\times 2}$; see (18.19).
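To make the design concrete, the data-generating process in (18.24) can be sketched as follows; this is an illustrative reimplementation, not the authors' code, and the true value $\theta_0$ (set to zero below) is a hypothetical choice, since its value is not specified above.

```python
import numpy as np

def simulate_model(n, c, seed=0):
    # One draw of size n from the linear IV model in (18.24) with p = 2, k = 5.
    rng = np.random.default_rng(seed)
    k = 5
    Z = rng.normal(0.05, 0.05, size=(n, k))       # Z_ij ~ N(.05, (.05)^2)
    u = rng.standard_normal(n)                    # u_i ~ N(0, 1)
    V21 = rng.standard_normal(n)                  # V_21i ~ N(0, 1)
    V22 = u * V21                                 # V_22i = u_i V_21i
    pi = np.zeros((k, 2))
    pi[0, 0] = 1.0                                # pi = (e_1, e_2 c n^{-1/2}):
    pi[1, 1] = c / np.sqrt(n)                     # second column drifts to zero
    Y2 = Z @ pi + np.column_stack([V21, V22])
    theta0 = np.zeros(2)                          # hypothetical true value
    y1 = Y2 @ theta0 + u
    return y1, Y2, Z
```

The $n^{-1/2}$ drift in the second column of $\pi$ is what makes the second parameter weakly identified while the first remains strongly identified.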
In this model, we have $g_i = Z_i u_i$ and $G_i = -Z_i Y_{2i}'$. The specified error distribution leads to $E_F G_{ij} g_i' = 0^{k\times k}$ for $j = 1, 2$. In consequence, the matrix $\Phi_h^{vec(G_i)}$ (defined in (8.15)), which is the asymptotic variance of the Jacobian-variance matrix estimator $\widetilde{V}_{Dn}$ (defined in (5.3)), simplifies as follows:

$\Phi_h^{vec(G_i)} = \lim E_{F_n}\, vec(D_i - E_{F_n}D_i)\,vec(D_i - E_{F_n}D_i)' = \lim E_{F_n}\, vec(G_i - E_{F_n}G_i)\,vec(G_i - E_{F_n}G_i)'$, where
$D_i := (G_{i1} - \zeta_{1F}\Omega_F^{-1}g_i,\ G_{i2} - \zeta_{2F}\Omega_F^{-1}g_i)$, $\zeta_{jF} = E_F G_{ij}g_i'$ for $j = 1, 2$, and $\Omega_F = E_F g_ig_i'. \quad (18.25)$

In addition, in the present model, $G_{i1}$ and $G_{i2}$ are uncorrelated, where $G_i = (G_{i1}, G_{i2})$. In consequence, $\Phi_h^{vec(G_i)}$ is block diagonal. In turn, this implies that $\lim M_{F_n} := \lim(\Phi_{F_n}^{vec(G_i)})^{-1/2}$ is block diagonal with off-diagonal block $\lim M_{12F_n} = 0^{5\times 5}$.
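The simplification $E_F G_{ij} g_i' = 0^{k\times k}$, which makes $D_i = G_i$ in (18.25), can be spot-checked numerically. The sketch below assumes the DGP of (18.24) with $c = 1$ fixed as an arbitrary illustrative choice:

```python
import numpy as np

def max_jacobian_score_corr(n=100_000, seed=0):
    # Sample analogue of E_F[G_ij g_i'] for j = 1, 2 in the model of (18.24),
    # where g_i = Z_i u_i and G_i = -Z_i Y_2i'.  Returns the largest absolute
    # entry over both j; it should be near zero.
    rng = np.random.default_rng(seed)
    k = 5
    Z = rng.normal(0.05, 0.05, size=(n, k))
    u = rng.standard_normal(n)
    V21 = rng.standard_normal(n)
    V22 = u * V21
    pi = np.zeros((k, 2))
    pi[0, 0] = 1.0
    pi[1, 1] = 1.0 / np.sqrt(n)     # c = 1 (illustrative)
    Y2 = Z @ pi + np.column_stack([V21, V22])
    g = Z * u[:, None]              # n x k matrix of g_i'
    worst = 0.0
    for j in range(2):
        Gj = -Z * Y2[:, [j]]        # n x k matrix of G_ij'
        worst = max(worst, np.abs(Gj.T @ g / n).max())
    return worst
```

Independence of $u_i$ from $Z_i$ and $V_{21i}$, plus the symmetry of $u_i$ in $V_{22i} = u_iV_{21i}$, drives all 50 cross-moments to zero.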
The quantities $h_{1,j}^y$ for $j = 1, \ldots, 5$ (defined just below (18.10)) are not available in closed form, so we simulate them using a very large value of $n$, viz., $n = 2{,}000{,}000$. We use 4,000,000 simulation repetitions to compute the correlations between the $j$th elements of $\overline{M}_{h,p-q^y}^y$ and $\overline{g}_h$ for $j = 1, \ldots, 5$ and the asymptotic NRP's of the CLR test.61 The data-dependent critical values for the test are computed using a look-up table that gives the critical values for each fixed value $r$ of the rank statistic in a grid from 0 to 100 with a step size of .005. These critical values are computed using 4,000,000 simulation repetitions.
Results are obtained for each of the 29 values of $c$ listed above. The simulated correlations between the $j$th elements of $\overline{M}_{h,p-q^y}^y$ and $\overline{g}_h$ for $j = 1, \ldots, 5$ take the following values for all values of $c \le 60.1$:

.33, .38, .38, .38, and .38. (18.26)

For $c = 707.1$, the correlations are .32, .36, .36, .36, and .36. For $c = 1414.2$, the correlations are .24, .27, .27, .27, and .27. For $c = 1{,}000{,}000$, the correlations are .01, .01, .01, .01, and .01. These results corroborate the findings given in Theorem 5.1 that $\overline{M}_{h,p-q^y}^y$ and $\overline{g}_h$ are correlated asymptotically in some models under some sequences of distributions. In consequence, it is not possible to show the Jacobian-variance weighted CLR test has correct asymptotic size via a conditioning argument that relies on the independence of $\overline{\Delta}_{h,p-q^y}^y + \overline{M}_{h,p-q^y}^y$ and $\overline{g}_h$.

61 The correlations between the $j$th and $k$th elements of these vectors for $j \ne k$ are zero by analytic calculation. Hence, they are not reported here.
Next, we report the asymptotic NRP results for Kleibergen's CLR test that uses Jacobian-variance weighting and the Robin and Smith (2000) rank statistic. The asymptotic NRP's are found to be between 4.95% and 5.01% for the 29 values of $c$ considered. These values are very close to the nominal size of 5.00%. Whether the difference is due to simulation noise or not is not clear. The simulation standard error based on the formula $100(\alpha(1-\alpha)/reps)^{1/2}$, where $reps = 4{,}000{,}000$ is the number of simulation repetitions, is .01. However, this formula does not take into account simulation error from the computation of the critical values.

We conclude that, for the model and error distribution considered, the asymptotic NRP's of Kleibergen's CLR test with Jacobian-variance weighting are equal to, or very close to, its nominal size. This occurs even though there are non-negligible correlations between $\overline{M}_{h,p-q^y}^y$ and $\overline{g}_h$. Whether this occurs for all parameters and distributions in the linear IV model, and whether it occurs in other moment condition models, is an open question. It appears to be a question that can only be answered on a case-by-case basis.
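The quoted simulation standard error can be reproduced directly:

```python
# Simulation standard error (in percentage points) for a rejection
# probability estimated from `reps` Monte Carlo repetitions, at the
# nominal level alpha, using the formula 100 * (alpha(1-alpha)/reps)^{1/2}.
alpha = 0.05
reps = 4_000_000
sim_se = 100 * (alpha * (1 - alpha) / reps) ** 0.5
print(round(sim_se, 4))  # about 0.0109, i.e., the .01 reported above
```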
18.4 Asymptotic Size of Kleibergen's CLR Test for General $\widetilde{V}_{Dn}$ Estimators

In this section, we determine the asymptotic size of Kleibergen's CLR test (defined in Section 5) using the Robin and Smith (2000) rank statistic based on a general "Jacobian-variance" estimator $\widetilde{V}_{Dn}\ (= \widetilde{V}_{Dn}(\theta_0))$ that satisfies the following Assumption VD.

The first two results of this section, viz., Lemma 18.2 and Theorem 18.3, combine to establish Theorem 5.1; see Comment (i) to Theorem 18.3. The first and last results of this section, viz., Lemma 18.2 and Theorem 18.6, combine to prove Theorem 18.1.

The proofs of the results in this section are given in Section 18.6.
Assumption VD: For any sequence $\{\lambda_{n,h} \in \Lambda_{KCLR} : n \ge 1\}$, the estimator $\widetilde{V}_{Dn}$ is such that $n^{1/2}(\widetilde{M}_n - M_{F_n}) \to_d \overline{M}_h$ for some random matrix $\overline{M}_h \in R^{kp\times kp}$ (where $\widetilde{M}_n = \widetilde{V}_{Dn}^{-1/2}$ and $M_{F_n}$ is defined in (18.6)), the convergence is joint with

$n^{1/2}\begin{pmatrix} \widehat{g}_n \\ vec(\widehat{D}_n - E_{F_n}G_i) \end{pmatrix} \to_d \begin{pmatrix} \overline{g}_h \\ vec(\overline{D}_h) \end{pmatrix} \sim N\left(0^{(p+1)k}, \begin{pmatrix} h_{5,g} & 0^{k\times pk} \\ 0^{pk\times k} & \Phi_h^{vec(G_i)} \end{pmatrix}\right), \quad (18.27)$

and $(\overline{g}_h, \overline{D}_h, \overline{M}_h)$ has a mean zero multivariate normal distribution with pd variance matrix. The same condition holds for any subsequence $\{w_n\}$ and any sequence $\{\lambda_{w_n,h} \in \Lambda_{KCLR} : n \ge 1\}$ with $w_n$ in place of $n$ throughout.

Note that the convergence in (18.27) holds by Lemma 8.2.
The following lemma verifies Assumption VD for the estimator $\widetilde{V}_{Dn}$ defined in (5.3).

Lemma 18.2 The estimator $\widetilde{V}_{Dn}$ defined in (5.3) satisfies Assumption VD. Specifically, $n^{1/2}(\widehat{g}_n, \widehat{D}_n - E_{F_n}G_i, \widetilde{M}_n - M_{F_n}) \to_d (\overline{g}_h, \overline{D}_h, \overline{M}_h)$, where $\widetilde{M}_n := \widetilde{V}_{Dn}^{-1/2}$, $M_{F_n} := (\Phi_{F_n}^{vec(G_i)})^{-1/2}$, and $(\overline{g}_h, \overline{D}_h, \overline{M}_h)$ has a mean zero multivariate normal distribution defined by (18.11) and (18.13)-(18.18) with pd variance matrix.

Comment: As stated in the paragraph containing (18.21), $\widehat{D}_n$ in Lemma 18.2 and Theorem 18.3 below is defined with $\widehat{W}_n = \widehat{\Omega}_n^{-1/2}$ and $\widehat{U}_n = I_p$.
Define

$S_n^y := \mathrm{Diag}\{(n^{1/2}\tau_{1F_n}^y)^{-1}, \ldots, (n^{1/2}\tau_{q^yF_n}^y)^{-1}, 1, \ldots, 1\} \in R^{p\times p}$ and $T_n^y := B_n^y S_n^y, \quad (18.28)$

where $B_n^y$ is defined in (18.7).

The asymptotic distribution of $n^{1/2}\widehat{D}_n^y T_n^y$ is given in the following theorem.
Theorem 18.3 Suppose Assumption VD holds. For all sequences $\{\lambda_{n,h} \in \Lambda_{KCLR} : n \ge 1\}$, $n^{1/2}(\widehat{g}_n, \widehat{D}_n - E_{F_n}G_i, \widehat{D}_n^y T_n^y) \to_d (\overline{g}_h, \overline{D}_h, \overline{\Delta}_h^y + \overline{M}_h^y)$, where $\overline{\Delta}_h^y$ is a nonrandom affine function of $\overline{D}_h$ defined in (18.14) and (18.15), $\overline{M}_h^y$ is a nonrandom linear (i.e., affine and homogeneous of degree one) function of $\overline{M}_h$ defined in (18.19), $(\overline{g}_h, \overline{D}_h, \overline{M}_h)$ has a mean zero multivariate normal distribution, and $\overline{g}_h$ and $\overline{D}_h$ are independent. Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} \in \Lambda_{KCLR} : n \ge 1\}$, the same result holds with $n$ replaced with $w_n$.

Comments: (i) Note that the random variables $(\overline{g}_h, \overline{\Delta}_h^y, \overline{M}_h^y)$ in Theorem 5.1 have a multivariate normal distribution whose mean and variance matrix depend on $\lim Var_{F_n}((f_i', vec(f_if_i')')')$ and on the limits of certain functions of $E_{F_n}G_i$ by (18.11)-(18.19). This, Lemma 18.2, and Theorem 18.3 combine to prove Theorem 5.1 of AG1.
y
(ii) From (18.19), M h = 0k
y
h4 = 0k and q y = 1 implies M h;p
y
M h;p
(
qy
p
if p = 1 (because q y = 0 implies q = 0 which, in turn, implies
qy
has no columns).62 For p
has no columns) or if h4;j = 0k for all j
1Fn ; :::; pFn )
of DFy n satisfy n1=2
jFn
y
2; M h = 0k
p
if p = q y (because
p: The former holds if the singular values
! 1 for all j
semi-strongly identi…ed). The latter occurs if EFn Gi ! 0k
p (i.e., all parameters are strongly or
p
(i.e., all parameters are either weakly
identi…ed in the standard sense or semi-strongly identi…ed). These two condition fail to hold when
Note that q y = 0 implies q = 0 when p = 1 because n1=2 DFy n = n1=2 MFn EFn Gi = O(1) when q y = 0 (by the
de…nition of q y ) and this implies that n1=2 EFn Gi = O(1) using the …rst condition in FKCLR : In turn, the latter
1=2
1=2
implies that n1=2 Fn EFn Gi = O(1) using the last condition in F . That is, q = 0 (since WF = F
and UF = Ip
1=2
c
b
b
because Wn = n
and Un = Ip in the present case, see the Comment to Lemma 18.2).
62
40
one or more parameters are strongly identi…ed and one or more parameters are weakly identi…ed
or jointly weakly identi…ed.
y
(iii) For example, when p = 2 the conditions in Comment (ii) (under which M h = 0k
p)
fail to
hold if EFn Gi1 6= 0k does not depend on n and n1=2 EFn Gi2 ! c for some c 2 Rk :
The following lemma establishes the asymptotic distribution of $rk_n^y$.

Lemma 18.4 Let the parameter space for $F$ be $\mathcal{F}_{KCLR}$. Suppose the variance matrix estimator $\widetilde{V}_{Dn}$ employed by the rank statistic $rk_n^y$ (defined in (18.3)) satisfies Assumption VD. Then, under all sequences $\{\lambda_{n,h} \in \Lambda_{KCLR} : n \ge 1\}$,

(a) $rk_n^y := \widehat{\kappa}_{pn}^y \to_p \infty$ if $q^y = p$,

(b) $rk_n^y := \widehat{\kappa}_{pn}^y \to_d r_h(\overline{D}_h, \overline{M}_h)$ if $q^y < p$, where $r_h(\overline{D}_h, \overline{M}_h)$ is defined in (18.20) using (18.19) with $\overline{M}_h$ defined in Assumption VD (rather than in (18.18)),

(c) $\widehat{\kappa}_{jn}^y \to_p \infty$ for all $j \le q^y$,

(d) the (ordered) vector of the smallest $p - q^y$ singular values of $n^{1/2}\widehat{D}_n^y$, i.e., $((\widehat{\kappa}_{(q^y+1)n}^y)^{1/2}, \ldots, (\widehat{\kappa}_{pn}^y)^{1/2})'$, converges in distribution to the (ordered) $p - q^y$ vector of the singular values of $h_{3,k-q^y}^{y\prime}(\overline{\Delta}_{h,p-q^y}^y + \overline{M}_{h,p-q^y}^y) \in R^{(k-q^y)\times(p-q^y)}$, where $\overline{M}_{h,p-q^y}^y$ is defined in (18.19) with $\overline{M}_h$ defined in Assumption VD (rather than in (18.18)),

(e) the convergence in parts (a)-(d) holds jointly with the convergence in Theorem 18.3, and

(f) under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} \in \Lambda_{KCLR} : n \ge 1\}$, parts (a)-(e) hold with $n$ replaced with $w_n$.
The following lemma gives the joint asymptotic distribution of $CLR_n$ and $rk_n^y$ and the asymptotic null rejection probabilities of Kleibergen's CLR test.

Lemma 18.5 Let the parameter space for $F$ be $\mathcal{F}_{KCLR}$. Suppose the variance matrix estimator $\widetilde{V}_{Dn}$ employed by the rank statistic $rk_n^y$ (defined in (18.3)) satisfies Assumption VD. Then, under all sequences $\{\lambda_{n,h} \in \Lambda_{KCLR} : n \ge 1\}$,

(a) $CLR_n = LM_n + o_p(1) \to_d \chi_p^2$ and $rk_n^y \to_p \infty$ if $q^y = p$,

(b) $\lim_{n\to\infty} P(CLR_n > c(1-\alpha, rk_n^y)) = \alpha$ if $q^y = p$,

(c) $(CLR_n, rk_n^y) \to_d (CLR_h, r_h)$ if $q^y < p$, and

(d) $\lim_{n\to\infty} P(CLR_n > c(1-\alpha, rk_n^y)) = P(CLR_h > c(1-\alpha, r_h))$ if $q^y < p$, provided $P(CLR_h = c(1-\alpha, r_h)) = 0$.

Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} \in \Lambda_{KCLR} : n \ge 1\}$, parts (a)-(d) hold with $n$ replaced with $w_n$.
Comments: (i) The CLR critical value function $c(1-\alpha, r)$ is the $1-\alpha$ quantile of $clr(r)$. By definition,

$clr(r) := \frac{1}{2}\Big(\chi_p^2 + \chi_{k-p}^2 - r + \sqrt{(\chi_p^2 + \chi_{k-p}^2 - r)^2 + 4\chi_p^2 r}\Big), \quad (18.29)$

where the chi-square random variables $\chi_p^2$ and $\chi_{k-p}^2$ are independent. If $r_h := r_h(\overline{D}_h, \overline{M}_h)$ does not depend on $\overline{M}_h$, then, conditional on $\overline{D}_h$, $r_h$ is a constant and $LM_h$ and $J_h$ are independent and distributed as $\chi_p^2$ and $\chi_{k-p}^2$ (see the paragraph following (10.6)). In this case, even when $q^y < p$,

$P(CLR_h > c(1-\alpha, r_h)) = E_{\overline{D}_h}\, P(CLR_h > c(1-\alpha, r_h)\,|\,\overline{D}_h) = \alpha, \quad (18.30)$

as desired, where the first equality holds by the law of iterated expectations and the second equality holds because $r_h$ is a constant conditional on $\overline{D}_h$ and $c(1-\alpha, r_h)$ is the $1-\alpha$ quantile of the conditional distribution of $clr(r_h)$ given $\overline{D}_h$, which equals that of $CLR_h$ given $\overline{D}_h$.

(ii) However, when $r_h := r_h(\overline{D}_h, \overline{M}_h)$ depends on $\overline{M}_h$, the distribution of $r_h$ conditional on $\overline{D}_h$ is not a pointmass distribution. Rather, conditional on $\overline{D}_h$, $r_h$ is a random variable that is not independent of $LM_h$, $J_h$, and $CLR_h$. In consequence, the second equality in (18.30) does not hold and the asymptotic null rejection probability of Kleibergen's CLR test may be larger or smaller than $\alpha$ depending upon the sequence $\{\lambda_{n,h} \in \Lambda_{KCLR} : n \ge 1\}$ (or $\{\lambda_{w_n,h} \in \Lambda_{KCLR} : n \ge 1\}$) when $q^y < p$.
Next, we use Lemma 18.5 to provide an expression for the asymptotic size of Kleibergen's CLR test based on the Robin and Smith (2000) rank statistic with Jacobian-variance weighting.

Theorem 18.6 Let the parameter space for $F$ be $\mathcal{F}_{KCLR}$. Suppose the variance matrix estimator $\widetilde{V}_{Dn}$ employed by the rank statistic $rk_n^y$ (defined in (18.3)) satisfies Assumption VD. Then, the asymptotic size of Kleibergen's CLR test based on $rk_n^y$ is

$AsySz = \max\{\alpha, \sup_{h\in H} P(CLR_h > c(1-\alpha, r_h))\}$

provided $P(CLR_h = c(1-\alpha, r_h)) = 0$ for all $h \in H$.

Comments: (i) Comment (i) to Theorem 18.1 also applies to Theorem 18.6.

(ii) Theorem 18.6 and Lemma 18.2 combine to prove Theorem 18.1.

(iii) A CS version of Theorem 18.6 holds with the parameter space $\mathcal{F}_{\Theta,KCLR}$ in place of $\mathcal{F}_{KCLR}$; see Comment (v) to Theorem 18.1 and the Comment to Proposition 8.1.
18.5 Correct Asymptotic Size of Equally-Weighted CLR Tests Based on the Robin-Smith Rank Statistic

In this subsection, we consider equally-weighted CLR tests, a special case of which is considered in Section 6. By definition, an equally-weighted CLR test is a CLR test that is based on a $rk_n$ statistic that depends on $\widehat{D}_n$ only through $\widetilde{W}_n\widehat{D}_n$ for some general $k\times k$ weighting matrix $\widetilde{W}_n$. We show that such tests have correct asymptotic size when they are based on the rank statistic of Robin and Smith (2000) and employ a general weight matrix $\widetilde{W}_n \in R^{k\times k}$ that satisfies certain conditions. In contrast, the results in Section 6 consider the specific weight matrix $\widehat{\Omega}_n^{-1/2} \in R^{k\times k}$. The reason for considering these tests in this section is that the asymptotic results can be obtained as a relatively simple by-product of the results in Section 18.4. All that is required is a slight change in Assumption VD.

The rank statistic that we consider here is

$rk_n^y := \lambda_{\min}(n\widehat{D}_n'\widetilde{W}_n'\widetilde{W}_n\widehat{D}_n). \quad (18.31)$

We replace Assumption VD in Section 18.4 by the following assumption.
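A direct way to compute (18.31) uses the singular values of $\widetilde{W}_n\widehat{D}_n$, since $\lambda_{\min}(n\widehat{D}_n'\widetilde{W}_n'\widetilde{W}_n\widehat{D}_n) = n\,\sigma_{\min}^2(\widetilde{W}_n\widehat{D}_n)$. A minimal sketch with illustrative inputs:

```python
import numpy as np

def rk_robin_smith(D_hat, W_tilde, n):
    # Robin-Smith rank statistic (18.31): n times the squared smallest
    # singular value of W_tilde @ D_hat (k x p, with k >= p).
    smallest_sv = np.linalg.svd(W_tilde @ D_hat, compute_uv=False)[-1]
    return n * smallest_sv ** 2
```

Working with singular values avoids forming the product $n\widehat{D}_n'\widetilde{W}_n'\widetilde{W}_n\widehat{D}_n$, whose condition number is the square of that of $\widetilde{W}_n\widehat{D}_n$.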
Assumption W: For any sequence $\{\lambda_{n,h} \in \Lambda_{KCLR} : n \ge 1\}$, the random $k\times k$ weight matrix $\widetilde{W}_n$ is such that $n^{1/2}(\widetilde{W}_n - W_{F_n}^y) \to_d \overline{W}_h$ for some non-random $k\times k$ matrices $\{W_{F_n}^y : n \ge 1\}$ and some random $k\times k$ matrix $\overline{W}_h \in R^{k\times k}$, $W_{F_n}^y \to W_h^y$ for some nonrandom pd $k\times k$ matrix $W_h^y$, the convergence is joint with the convergence in (18.27), and $(\overline{g}_h, \overline{D}_h, \overline{W}_h)$ has a mean zero multivariate normal distribution with pd variance matrix. The same condition holds for any subsequence $\{w_n\}$ and any sequence $\{\lambda_{w_n,h} \in \Lambda_{KCLR} : n \ge 1\}$ with $w_n$ in place of $n$ throughout.

If one takes $\widetilde{M}_n\ (= \widetilde{V}_{Dn}^{-1/2}) = I_p \otimes \widetilde{W}_n$ in Assumption VD, then $\widehat{D}_n^y = \widetilde{W}_n\widehat{D}_n$ and the rank statistics in (18.3) and (18.31) are the same. Thus, Assumption W is analogous to Assumption VD with $\widetilde{M}_n = I_p \otimes \widetilde{W}_n$ and $M_{F_n} = I_p \otimes W_{F_n}^y$. Note, however, that the latter matrix does not typically satisfy the condition in Assumption VD that $M_{F_n}$ is defined in (18.6), i.e., the condition that $M_{F_n} = (\Phi_{F_n}^{vec(G_i)})^{-1/2}$. Nevertheless, the results in Section 18.4 hold with Assumption VD replaced by Assumption W and with $M_F = I_p \otimes W_F^y$, $D_F^y = W_F^yE_FG_i$, and $\overline{M}_h = I_p \otimes \overline{W}_h$. With these changes, $\overline{D}_h^y = W_h^y\overline{D}_h$ in (18.14) (because $(\Phi_h^{vec(G_i)})^{-1/2}$ is replaced by $I_p \otimes W_h^y$), $\overline{\Delta}_h^y$ is defined as in (18.15) with $\overline{D}_h^y$ as just given, and $\overline{M}_h^y$ is defined as in (18.19) with $\overline{M}_{h,p-q^y}^y = \overline{W}_hh_4h_{2,p-q^y}^y$. Below we show the key result that $\overline{M}_{h,p-q^y}^y = 0^{k\times(p-q^y)}$ for $rk_n^y$ defined in (18.31). By (18.20), this implies that

$r_h(\overline{D}_h, \overline{M}_h) := \lambda_{\min}\big((\overline{\Delta}_{h,p-q^y}^y)'\,h_{3,k-q^y}^yh_{3,k-q^y}^{y\prime}\,(\overline{\Delta}_{h,p-q^y}^y)\big) \quad (18.32)$

when $q^y < p$. Note that the rhs in (18.32) does not depend on $\overline{M}_h$ and, hence, is a function only of $\overline{D}_h$. That is, $r_h(\overline{D}_h, \overline{M}_h) = r_h(\overline{D}_h)$. Given that $r_h(\overline{D}_h, \overline{M}_h)$ does not depend on $\overline{M}_h$, Comment (i) to Lemma 18.5 implies that $P(CLR_h > c(1-\alpha, r_h)) = \alpha$ under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} \in \Lambda_{KCLR} : n \ge 1\}$. This and Theorem 18.6 give the following result.
Corollary 18.7 Let the parameter space for $F$ be $\mathcal{F}_{KCLR}$. Suppose the rank statistic $rk_n^y$ (defined in (18.31)) is based on a weight matrix $\widetilde{W}_n$ that satisfies Assumption W. Then, the asymptotic size of the corresponding equally-weighted version of Kleibergen's CLR test (defined in Section 5 with $rk_n(\theta) = rk_n^y$) equals $\alpha$.

Comment: A CS version of Corollary 18.7 holds with the parameter space $\mathcal{F}_{\Theta,KCLR}$ in place of $\mathcal{F}_{KCLR}$; see Comment (v) to Theorem 18.1 and the Comment to Proposition 8.1.
Now, we establish that $\overline{M}_{h,p-q^y}^y\ (= \overline{W}_hh_4h_{2,p-q^y}^y) = 0^{k\times(p-q^y)}$. We have

$W_h^yh_4 := \lim W_{F_n}^yE_{F_n}G_i = \lim C_{F_n}^y\Upsilon_{F_n}^y(B_{F_n}^y)' = h_3^y\,\lim\Upsilon_{F_n}^y\,h_2^{y\prime}, \quad (18.33)$

where $C_{F_n}^y\Upsilon_{F_n}^y(B_{F_n}^y)'$ is the singular value decomposition of $W_{F_n}^yE_{F_n}G_i$, $\Upsilon_{F_n}^y$ is the $k\times p$ matrix with the singular values of $W_{F_n}^yE_{F_n}G_i$, denoted by $\{\tau_{jF_n}^y : n \ge 1\}$ for $j \le p$, on the main diagonal and zeroes elsewhere, and $C_{F_n}^y$ and $B_{F_n}^y$ are the corresponding $k\times k$ and $p\times p$ orthogonal matrices of singular vectors, as defined in (18.7). Hence, $\lim\Upsilon_{F_n}^y$ exists, call it $\Upsilon_h^y$, and equals $h_3^{y\prime}W_h^yh_4h_2^y$. That is, the singular value decomposition of $W_h^yh_4$ is

$W_h^yh_4 = h_3^y\Upsilon_h^yh_2^{y\prime}. \quad (18.34)$

The $k\times p$ matrix $\Upsilon_h^y$ has the limits of the singular values of $W_{F_n}^yE_{F_n}G_i$ on its main diagonal and zeroes elsewhere. Let $\tau_{h,j}^y$ for $j \le p$ denote the limits of these singular values. By the definition of $q^y$, $\tau_{h,j}^y = 0$ for $j = q^y + 1, \ldots, p$ (because $n^{1/2}\tau_{jF_n}^y \to h_{1,j}^y < \infty$). In consequence, $\Upsilon_h^y$ can be written as

$\Upsilon_h^y = \begin{bmatrix} \Upsilon_{h,q^y}^y & 0^{q^y\times(p-q^y)} \\ 0^{(k-q^y)\times q^y} & 0^{(k-q^y)\times(p-q^y)} \end{bmatrix}$, where $\Upsilon_{h,q^y}^y := \mathrm{Diag}\{\tau_{h,1}^y, \ldots, \tau_{h,q^y}^y\}. \quad (18.35)$

In addition,

$h_2^{y\prime}h_{2,p-q^y}^y = \begin{pmatrix} 0^{q^y\times(p-q^y)} \\ I_{p-q^y} \end{pmatrix}. \quad (18.36)$

Thus, we have

$\overline{M}_{h,p-q^y}^y := \overline{W}_h(W_h^y)^{-1}W_h^yh_4h_{2,p-q^y}^y = \overline{W}_h(W_h^y)^{-1}h_3^y\Upsilon_h^yh_2^{y\prime}h_{2,p-q^y}^y = \overline{W}_h(W_h^y)^{-1}h_3^y\begin{bmatrix} \Upsilon_{h,q^y}^y & 0^{q^y\times(p-q^y)} \\ 0^{(k-q^y)\times q^y} & 0^{(k-q^y)\times(p-q^y)} \end{bmatrix}\begin{pmatrix} 0^{q^y\times(p-q^y)} \\ I_{p-q^y} \end{pmatrix} = 0^{k\times(p-q^y)}, \quad (18.37)$

where the first equality holds by the paragraph following Assumption W and uses the condition in Assumption W that $W_h^y$ is pd and the second equality holds by (18.35) and (18.36). This completes the proof of Corollary 18.7.
18.6 Proofs of Results Stated in Sections 18.2 and 18.4

For notational simplicity, the proofs in this section are for the sequence $\{n\}$, rather than a subsequence $\{w_n : n \ge 1\}$. The same proofs hold for any subsequence $\{w_n : n \ge 1\}$.

Proof of Theorem 18.1. Theorem 18.1 follows from Theorem 18.6, which imposes Assumption VD, and Lemma 18.2, which verifies Assumption VD when $\widetilde{V}_{Dn}$ is defined by (5.3).
Proof of Lemma 18.2. Consider any sequence $\{\lambda_{n,h} \in \Lambda_{KCLR} : n \ge 1\}$. By the CLT result in (18.11), the linear expansion of $n^{1/2}(\widehat{D}_n - E_{F_n}G_i)$ in (14.1), and the definitions of $\overline{g}_h$ and $\overline{D}_h$ in (18.13), we have

$n^{1/2}(\widehat{g}_n, \widehat{D}_n - E_{F_n}G_i) \to_d (\overline{g}_h, \overline{D}_h). \quad (18.38)$

Next, we apply the delta method to the CLT result in (18.11) and the function $a(\cdot)$ defined in (18.16). The mean component in the lhs quantity in (18.11) is $(0^{(p+1)k\prime}, vech(E_{F_n}f_if_i')')'$. We have

$a\begin{pmatrix} 0^{(p+1)k} \\ vech(E_{F_n}f_if_i') \end{pmatrix} = vech\Big(\big(E_{F_n}vec(G_i - E_{F_n}G_i)vec(G_i - E_{F_n}G_i)' - \Gamma_{F_n}^{vec(G_i)}\Omega_{F_n}^{-1}\Gamma_{F_n}^{vec(G_i)\prime}\big)^{-1/2}\Big) = vech\big((\Phi_{F_n}^{vec(G_i)})^{-1/2}\big) = vech(M_{F_n}), \quad (18.39)$

where $\Gamma_{F_n}^{vec(G_i)}$ and $\Omega_{F_n}$ are defined in (3.2), the first equality uses the definitions of $a(\cdot)$ and $f_i$ (given in (18.16) and (5.6), respectively), the second equality holds by the definition of $\Phi_{F_n}^{vec(G_i)}$ in (8.15), and the third equality holds by the definition of $M_{F_n}$ in (18.6). Also, $E_{F_n}f_if_i' \to h_{10,f}$ and $h_{10,f}$ is pd. Hence, $a(\cdot)$ is well defined and continuously partially differentiable at $\lim(0^{(p+1)k\prime}, vech(E_{F_n}f_if_i')')' = (0^{(p+1)k\prime}, vech(h_{10,f})')'$, as required for the application of the delta method.
The delta method gives

$n^{1/2}(A_n - vech(M_{F_n})) = n^{1/2}\Big(a\Big(n^{-1}\sum_{i=1}^n\begin{pmatrix} f_i \\ vech(f_if_i') \end{pmatrix}\Big) - a\begin{pmatrix} 0^{(p+1)k} \\ vech(E_{F_n}f_if_i') \end{pmatrix}\Big) \to_d A_hL_h, \quad (18.40)$

where the first equality holds by (18.39) and the definitions of $a(\cdot)$ and $A_n$ in (18.16), and the convergence holds by the delta method using the CLT result in (18.11) and the definition of $A_h$ following (18.16).

Applying the inverse $vech(\cdot)$ operator, namely, $vech_{kp,kp}^{-1}(\cdot)$, to both sides of (18.40) gives the reconfigured convergence result

$n^{1/2}(vech_{kp,kp}^{-1}(A_n) - M_{F_n}) \to_d vech_{kp,kp}^{-1}(A_hL_h) = \overline{M}_h, \quad (18.41)$

where the last equality holds by the definition of $\overline{M}_h$ in (18.18).

The convergence results in (18.38) and (18.41) hold jointly because both rely on the convergence result in (18.11).

We show below that

$n^{1/2}\big(\widetilde{V}_{Dn} - (vech_{kp,kp}^{-1}(A_n))^{-2}\big) = o_p(1). \quad (18.42)$

This and the delta method applied again (using the function $\ell(A) = A^{-1/2}$ for a pd $kp\times kp$ matrix $A$) give

$n^{1/2}\big(\widetilde{V}_{Dn}^{-1/2} - vech_{kp,kp}^{-1}(A_n)\big) = o_p(1) \quad (18.43)$

because $vech_{kp,kp}^{-1}(A_n) = (\Phi_h^{vec(G_i)})^{-1/2} + o_p(1)$ and $\Phi_h^{vec(G_i)}$ is pd (because $h_{10,f}$ is pd and $\Phi_h^{vec(G_i)} = Qh_{10,f}Q'$ for some full row rank matrix $Q$). Equations (18.38), (18.41), and (18.43) establish the result of the lemma.
Now we prove (18.42). We have

$\widetilde{V}_{Dn} := n^{-1}\sum_{i=1}^n vec(G_i - \widehat{G}_n)vec(G_i - \widehat{G}_n)' - \widehat{\Gamma}_n\widehat{\Omega}_n^{-1}\widehat{\Gamma}_n'$
$= n^{-1}\sum_{i=1}^n vec(G_i - E_{F_n}G_i)vec(G_i - E_{F_n}G_i)' - vec(\widehat{G}_n - E_{F_n}G_i)vec(\widehat{G}_n - E_{F_n}G_i)'$
$\quad - \big(\widetilde{\Gamma}_n - vec(\widehat{G}_n - E_{F_n}G_i)\widehat{g}_n'\big)\big(\widetilde{\Omega}_n - \widehat{g}_n\widehat{g}_n'\big)^{-1}\big(\widetilde{\Gamma}_n - vec(\widehat{G}_n - E_{F_n}G_i)\widehat{g}_n'\big)'$
$= n^{-1}\sum_{i=1}^n vec(G_i - E_{F_n}G_i)vec(G_i - E_{F_n}G_i)' - \widetilde{\Gamma}_n\widetilde{\Omega}_n^{-1}\widetilde{\Gamma}_n' + O_p(n^{-1}), \quad (18.44)$

where the second equality holds by subtracting and adding $E_{F_n}G_i$ and some algebra, by the definitions of $\widehat{\Omega}_n$ and $\widehat{\Gamma}_n$ in (4.1), (4.3), and (5.3), and by the definitions of $\widetilde{\Gamma}_n$ and $\widetilde{\Omega}_n$ in (18.16), and the third equality holds because (i) the second summand on the rhs of the second equality is $O_p(n^{-1})$ because $n^{1/2}vec(\widehat{G}_n - E_{F_n}G_i) = O_p(1)$ (by the CLT using the moment conditions in $\mathcal{F}$, defined in (3.1)) and (ii) $n^{1/2}\widehat{g}_n = O_p(1)$ (by Lemma 8.3), $n^{1/2}vec(\widehat{G}_n - E_{F_n}G_i) = O_p(1)$, and $\widetilde{\Gamma}_n = O_p(1)$, $\widetilde{\Omega}_n = O_p(1)$, and $\widetilde{\Omega}_n^{-1} = O_p(1)$ (by the justification given for (14.1)).

Excluding the $O_p(n^{-1})$ term, the rhs in (18.44) equals $(vech_{kp,kp}^{-1}(A_n))^{-2}$. Hence, (18.42) holds and the proof is complete.
Proof of Theorem 18.3. The proof is similar to that of Lemma 8.3 in Section 8 with $\widehat{W}_n = W_n = I_k$, $\widehat{U}_n = U_n = I_p$, and the following quantities $q$, $\widehat{D}_n$, $D_n\ (= E_{F_n}G_i)$, $B_{n,q}$, $\Upsilon_{n,q}$, $C_n$, and $\Upsilon_n$ replaced by $q^y$, $\widehat{D}_n^y$, $D_n^y\ (= D_{F_n}^y)$, $B_{n,q^y}^y$, $\Upsilon_{n,q^y}^y$, $C_n^y$, and $\Upsilon_n^y$, respectively. The proof employs the notational simplifications in (13.1). We can write

$\widehat{D}_n^yB_{n,q^y}^y(\Upsilon_{n,q^y}^y)^{-1} = D_n^yB_{n,q^y}^y(\Upsilon_{n,q^y}^y)^{-1} + n^{1/2}(\widehat{D}_n^y - D_n^y)B_{n,q^y}^y(n^{1/2}\Upsilon_{n,q^y}^y)^{-1}. \quad (18.45)$

By the singular value decomposition, $D_n^y = C_n^y\Upsilon_n^yB_n^{y\prime}$. Thus, we obtain

$D_n^yB_{n,q^y}^y(\Upsilon_{n,q^y}^y)^{-1} = C_n^y\Upsilon_n^yB_n^{y\prime}B_{n,q^y}^y(\Upsilon_{n,q^y}^y)^{-1} = C_n^y\Upsilon_n^y\begin{pmatrix} I_{q^y} \\ 0^{(p-q^y)\times q^y} \end{pmatrix}(\Upsilon_{n,q^y}^y)^{-1} = C_n^y\begin{pmatrix} I_{q^y} \\ 0^{(k-q^y)\times q^y} \end{pmatrix} = C_{n,q^y}^y. \quad (18.46)$
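The algebra in (18.46) can be spot-checked numerically for an arbitrary matrix. The sketch below (with arbitrary dimensions $k = 4$, $p = 3$, $q = 2$) verifies that $D\,B_q\,\Upsilon_q^{-1}$ recovers the leading left singular vectors of $D$:

```python
import numpy as np

# For D = C Upsilon B' (singular value decomposition, Upsilon k x p with the
# singular values on its main diagonal), D B_q (Upsilon_q)^{-1} equals the
# first q columns of C, as in (18.46).
rng = np.random.default_rng(1)
k, p, q = 4, 3, 2
D = rng.standard_normal((k, p))
C, sv, Bt = np.linalg.svd(D)          # C is k x k, Bt is p x p, sv has length p
B = Bt.T
recovered = D @ B[:, :q] @ np.diag(1.0 / sv[:q])
print(np.allclose(recovered, C[:, :q]))  # True
```

This uses only $B'B = I_p$ and the diagonal structure of $\Upsilon$, which is exactly the computation carried out in (18.46).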
Let $\widehat{D}_n = (\widehat{D}_{1n}, \ldots, \widehat{D}_{pn}) \in R^{k\times p}$ and $\overline{D}_h = (\overline{D}_{1h}, \ldots, \overline{D}_{ph}) \in R^{k\times p}$. We have

$n^{1/2}(\widehat{D}_n^y - D_n^y) = n^{1/2}\sum_{j=1}^p\big(\widetilde{M}_{1jn}\widehat{D}_{jn} - M_{1jF_n}E_{F_n}G_{ij}, \ldots, \widetilde{M}_{pjn}\widehat{D}_{jn} - M_{pjF_n}E_{F_n}G_{ij}\big)$
$= \sum_{j=1}^p\big[\widetilde{M}_{1jn}n^{1/2}(\widehat{D}_{jn} - E_{F_n}G_{ij}) + n^{1/2}(\widetilde{M}_{1jn} - M_{1jF_n})E_{F_n}G_{ij}, \ldots, \widetilde{M}_{pjn}n^{1/2}(\widehat{D}_{jn} - E_{F_n}G_{ij}) + n^{1/2}(\widetilde{M}_{pjn} - M_{pjF_n})E_{F_n}G_{ij}\big]$
$\to_d \sum_{j=1}^p\big(M_{1jh}\overline{D}_{jh} + \overline{M}_{1jh}h_{4,j}, \ldots, M_{pjh}\overline{D}_{jh} + \overline{M}_{pjh}h_{4,j}\big), \quad (18.47)$

where the convergence holds by Lemma 8.2 in Section 8, Assumption VD, and $E_{F_n}G_{ij} \to h_{4,j}$ (by the definition of $h_{4,j}$).
Combining (18.45)-(18.47) gives

$\widehat{D}_n^yB_{n,q^y}^y(\Upsilon_{n,q^y}^y)^{-1} = C_{n,q^y}^y + o_p(1) \to_p h_{3,q^y}^y = \overline{\Delta}_{h,q^y}^y, \quad (18.48)$

where the equality uses $n^{1/2}\tau_{jF_n}^y \to \infty$ for all $j \le q^y$ by the definition of $q^y$ and $B_{n,q^y}^{y\prime}B_{n,q^y}^y = I_{q^y}$, the convergence holds by the definition of $h_{3,q^y}^y$, and the last equality holds by the definition of $\overline{\Delta}_{h,q^y}^y$ in (18.15).
Using the singular value decomposition $D_n^y = C_n^y\Upsilon_n^yB_n^{y\prime}$ again, we obtain

$n^{1/2}D_n^yB_{n,p-q^y}^y = n^{1/2}C_n^y\Upsilon_n^yB_n^{y\prime}B_{n,p-q^y}^y = n^{1/2}C_n^y\Upsilon_n^y\begin{pmatrix} 0^{q^y\times(p-q^y)} \\ I_{p-q^y} \end{pmatrix} = C_n^y\begin{bmatrix} 0^{q^y\times(p-q^y)} \\ \mathrm{Diag}\{n^{1/2}\tau_{(q^y+1)F_n}^y, \ldots, n^{1/2}\tau_{pF_n}^y\} \\ 0^{(k-p)\times(p-q^y)} \end{bmatrix} \to h_3^y\begin{bmatrix} 0^{q^y\times(p-q^y)} \\ \mathrm{Diag}\{h_{1,q^y+1}^y, \ldots, h_{1,p}^y\} \\ 0^{(k-p)\times(p-q^y)} \end{bmatrix} = h_3^yh_{1,p-q^y}^y, \quad (18.49)$

where the second equality uses $B_n^{y\prime}B_n^y = I_p$, the convergence holds by the definitions of $h_3^y$ and $h_{1,j}^y$ for $j = 1, \ldots, p$, and the last equality holds by the definition of $h_{1,p-q^y}^y$ in the paragraph following (18.10), which uses (8.17).
By (18.47) and $B_{n,p-q^y}^y \to h_{2,p-q^y}^y$, we have

$n^{1/2}(\widehat{D}_n^y - D_n^y)B_{n,p-q^y}^y \to_d \overline{D}_h^yh_{2,p-q^y}^y + \overline{M}_{h,p-q^y}^y, \quad (18.50)$

using the definitions of $\overline{D}_h^y$ and $\overline{M}_{h,p-q^y}^y$ in (18.14) and (18.19), respectively.
Using (18.49) and (18.50), we get

$n^{1/2}\widehat{D}_n^yB_{n,p-q^y}^y = n^{1/2}D_n^yB_{n,p-q^y}^y + n^{1/2}(\widehat{D}_n^y - D_n^y)B_{n,p-q^y}^y \to_d h_3^yh_{1,p-q^y}^y + \overline{D}_h^yh_{2,p-q^y}^y + \overline{M}_{h,p-q^y}^y = \overline{\Delta}_{h,p-q^y}^y + \overline{M}_{h,p-q^y}^y, \quad (18.51)$

where the last equality holds by the definition of $\overline{\Delta}_{h,p-q^y}^y$ in (18.15).

Equations (18.48) and (18.51) combine to give

$n^{1/2}\widehat{D}_n^yT_n^y = n^{1/2}\widehat{D}_n^yB_n^yS_n^y = \big(\widehat{D}_n^yB_{n,q^y}^y(\Upsilon_{n,q^y}^y)^{-1},\ n^{1/2}\widehat{D}_n^yB_{n,p-q^y}^y\big) \to_d \big(\overline{\Delta}_{h,q^y}^y,\ \overline{\Delta}_{h,p-q^y}^y + \overline{M}_{h,p-q^y}^y\big) = \overline{\Delta}_h^y + \overline{M}_h^y \quad (18.52)$

using the definitions of $S_n^y$ and $T_n^y$ in (18.28), $\overline{\Delta}_h^y$ in (18.15), and $\overline{M}_h^y$ in (18.19).
By Lemma 8.2, $n^{1/2}(\widehat{g}_n, \widehat{D}_n - E_{F_n}G_i) \to_d (\overline{g}_h, \overline{D}_h)$. This convergence is joint with that in (18.52) because the latter just relies on the convergence of $n^{1/2}(\widehat{D}_n - E_{F_n}G_i)$, which is part of the former, and of $n^{1/2}(\widetilde{M}_n - M_{F_n}) \to_d \overline{M}_h$, which holds jointly with the former by Assumption VD. This establishes the convergence result of Theorem 18.3.

The independence of $\overline{g}_h$ and $(\overline{D}_h, \overline{\Delta}_h^y)$ follows from the independence of $\overline{g}_h$ and $\overline{D}_h$, which holds by Lemma 8.2, and the fact that $\overline{\Delta}_h^y$ is a nonrandom function of $\overline{D}_h$.
Proof of Lemma 18.4. The proof of Lemma 18.4 is analogous to the proof of Theorem 8.4 with
cn = Wn = Ik ; U
bn = Un = Ip ; and the following quantities q; D
b n ; Dn (= EFn Gi ); bjn ; Bn ; Bn;q ;
W
b ny ; Dny (= Dy ); by ; Bny ; B y y ; Sny ; S y y ; y ; and hy y ;
Sn ; Sn;q ; jFn ; and h3;q replaced by q y ; D
Fn
jn
n;q
n;q
jFn
3;q
respectively. Theorem 18.3, rather than Lemma 8.3, is employed to obtain the results in (16.37).
In consequence,
h;q
and
y
h;q y
h;p q
are replaced by
y
y
h;q y + M h;q y and
y
0k q by (18.19)).
y
y
y
where
+ M h;qy = h;qy (because M h;qy :=
y
y
y
are replaced by h;qy and h;p qy + M h;p qy in
y
y
h;p q y + M h;p q y ;
The quantities
respectively,
h;q
and
h;p q
(16.37) and in the rest of the proof of Theorem
y
8.4. Note that (16.39) holds with h3;q replaced by hy3;qy because h;qy = hy3;qy by (18.15) (just as
b
b
h;q = h3;q ): Because Un = Un ; the matrices An and Ajn for j = 1; 2; 3 (de…ned in (16.39)) are all
zero matrices, which simpli…es the expressions in (16.41)-(16.44) considerably.
The proof of Theorem 8.4 uses Lemma 16.1 to obtain (16.42). Hence, an analogue of Lemma
16.1 is needed, where the changes listed in the …rst paragraph of this proof are made and h6;j and
Cn are replaced by hy6;j and Cny ; respectively. In addition, FW U is replaced by FKCLR (because
FKCLR
FW U for
equals F0 for
and FKCLR
WU
su¢ ciently small and MW U su¢ ciently large using the facts that F0 \FW U
su¢ ciently small and MW U su¢ ciently large by the argument following (8.5)
bn = Un ; the matrices A
bjn for
F0 by the argument following (18.5)). Because U
WU
49
j = 1; 2; 3 (de…ned in (16.2)) are all zero matrices, which simpli…es the expressions in (16.9)-(16.12)
cn ; D
b n;
considerably. For (16.3) to go through with the changes listed above (in particular, with W
b ny ; Dny ; and Ip ; respectively), we need to show that
Dn ; and Un replaced by Ik ; D
By (5.4) with
=
b ny
n1=2 (D
0
Dny ) = Op (1):
(and with the dependence of various quantities on
(18.53)
0
suppressed for
notational simplicity), we have
2
3
f11n
f1pn
M
M
p
6
7
X
.
.. 7 e 1=2
..
kp
b jn ); where M
fn = 6
b jn ; :::; M
fpjn D
b ny =
f1jn D
D
(M
6 ..
.
. 7:= VDn 2R
4
5
j=1
fp1n
fppn
M
M
kp
:
(18.54)
By (18.6), we have
Dny
=
p
X
(M1jFn Djn ; :::; MpjFn Djn )
(18.55)
j=1
using Dn = (D1n ; :::; Dpn ); and Djn := EFn Gij for j = 1; :::; p:
For s = 1; :::; p; we have
fsjn D
b jn
n1=2 (M
fsjn n1=2 (D
b jn
MsjFn Djn ) = M
b jn
where n1=2 (D
fsjn
Djn ) + n1=2 (M
fsjn
Djn ) = Op (1) (by Lemma 8.2), n1=2 (M
fn
MsjFn ) = Op (1) (because n1=2 (M
vec(Gi ) 1=2
)
;
F
0
EF vec(Gi )gi F 1
MFn ) !d M h by Assumption VD), MsjFn = O(1) (because MF = (
in (8.15) satis…es
and
min (V
vec(Gi )
F
arF (fi ))
2
:= V arF (vec(Gi )
vec(Gi )
F
1
F
MsjFn )Djn = Op (1); (18.56)
gi ) = [
vec(Gi )
F
de…ned
: Ipk ]V arF (fi );
by the de…nition of FKCLR in (18.5)), and Djn = O(1) (by the moment
conditions in F, de…ned in (3.1)).
Hence,
n
1=2
b ny
(D
Dny )
=
p
X
j=1
f1jn D
b jn ; :::; M
fpjn D
b jn )
n1=2 [(M
(M1jFn Djn ; :::; MpjFn Djn )] = Op (1): (18.57)
This completes the proof of the analogue of Lemma 16.1, which completes the proof of parts (a)-(d)
of Lemma 18.4.
For part (e) of Lemma 18.4, the results of parts (a)-(d) hold jointly with those in Theorem 18.3,
rather than those in Lemma 8.3, because Theorem 18.3 is used to obtain the results in (16.37),
rather than Lemma 8.3. This completes the proof.
Proof of Lemma 18.5. The proof of parts (a) and (b) is the same as the proof of Theorem 10.1 for the case where Assumption R(a) holds (which states that $rk_n \to_p \infty$) using Lemma 18.4(a), which shows that $rk_n^\dagger \to_d \infty$ if $q^\dagger = p$.

The proofs of parts (c) and (d) are the same as in (10.5)-(10.9) in the proof of Theorem 10.1 for the case where Assumption R(b) holds, using Theorem 18.3 and Lemma 18.4(b) in place of Lemma 8.3, with $r_h(D_h, \overline{M}_h)$ (defined in (18.20)) in place of $r_h(D_h)$, and for part (d), with the proviso that $P(CLR_h = c(1-\alpha, r_h)) = 0$. (The proof in Theorem 10.1 that $P(CLR_h = c(1-\alpha, r_h)) = 0$ does not go through in the present case because $r_h = r_h(D_h, \overline{M}_h)$ is not necessarily a constant conditional on $D_h$ and, alternatively, conditional on $(D_h, \overline{M}_h)$, $LM_h$ and $J_h$ are not necessarily independent and distributed as $\chi^2_p$ and $\chi^2_{k-p}$.) Note that (10.10) does not necessarily hold in the present case, because $r_h = r_h(D_h, \overline{M}_h)$ is not necessarily a constant conditional on $D_h$.
The proof of Theorem 18.6 given below uses Corollary 2.1(a) of ACG, which is stated below as Proposition 18.8. It is a generic asymptotic size result. Unlike Proposition 8.1 above, Proposition 18.8 applies when the asymptotic size is not necessarily equal to the nominal size $\alpha$. Let $\{\phi_n : n \geq 1\}$ be a sequence of tests of some null hypothesis whose null distributions are indexed by a parameter $\lambda$ with parameter space $\Lambda$. Let $RP_n(\lambda)$ denote the null rejection probability of $\phi_n$ under $\lambda$. For a finite nonnegative integer $J$, let $\{h_n(\lambda) = (h_{1n}(\lambda), \ldots, h_{Jn}(\lambda))' \in R^J : n \geq 1\}$ be a sequence of functions on $\Lambda$. Define $H$ as in (8.1). For a sequence of scalar constants $\{C_n : n \geq 1\}$, let $C_n \to [C_{1,\infty}, C_{2,\infty}]$ denote that $C_{1,\infty} \leq \liminf_{n\to\infty} C_n \leq \limsup_{n\to\infty} C_n \leq C_{2,\infty}$.

Assumption B: For any subsequence $\{w_n\}$ of $\{n\}$ and any sequence $\{\lambda_{w_n} \in \Lambda : n \geq 1\}$ for which $h_{w_n}(\lambda_{w_n}) \to h \in H$, $RP_{w_n}(\lambda_{w_n}) \to [RP^-(h), RP^+(h)]$ for some $RP^-(h), RP^+(h) \in [0,1]$.
Proposition 18.8 (ACG, Corollary 2.1(a)). Under Assumption B, the tests $\{\phi_n : n \geq 1\}$ have

$$AsySz := \limsup_{n\to\infty}\sup_{\lambda\in\Lambda}RP_n(\lambda) \in \left[\sup_{h\in H}RP^-(h), \sup_{h\in H}RP^+(h)\right].$$
Comments: (i) Corollary 2.1(a) of ACG is stated for confidence sets, rather than tests. But, following Comment 4 to Theorem 2.1 of ACG, with suitable adjustments (as in Proposition 18.8 above) it applies to tests as well.

(ii) Under Assumption B, if $RP^-(h) = RP^+(h)$ for all $h \in H$, then $AsySz = \sup_{h\in H}RP^+(h)$. We use this to prove Theorem 18.6. The result of Proposition 18.8 for the case where $RP^-(h) \neq RP^+(h)$ for some $h \in H$ is used when proving Comment (i) to Theorem 18.1 and the Comment to Theorem 18.6.
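To make the role of Assumption B concrete, the subsequence-extraction logic behind the upper bound in Proposition 18.8 can be sketched as follows (our summary of the standard argument, not part of the ACG statement; it uses that, by the construction of $H$, every sequence $h_n(\lambda_n)$ has a subsequence converging to some $h \in H$):

```latex
% Pick \lambda_n that (nearly) attains the supremum:
%   RP_n(\lambda_n) \ge \sup_{\lambda \in \Lambda} RP_n(\lambda) - n^{-1}.
% Pass to a subsequence \{w_n\} along which RP_{w_n}(\lambda_{w_n}) converges
% to \limsup_n \sup_\lambda RP_n(\lambda) and h_{w_n}(\lambda_{w_n}) \to h \in H.
% Assumption B then bounds the limit:
\limsup_{n\to\infty}\,\sup_{\lambda\in\Lambda} RP_n(\lambda)
  \;=\; \lim_{n\to\infty} RP_{w_n}(\lambda_{w_n})
  \;\le\; RP^+(h)
  \;\le\; \sup_{h\in H} RP^+(h).
```

The lower bound is obtained analogously by evaluating $RP_n(\cdot)$ along any sequence whose $h_n$-values converge to the $h$ that (nearly) attains $\sup_{h\in H}RP^-(h)$.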
Proof of Theorem 18.6. Theorem 18.6 follows from Lemma 18.5 and Proposition 18.8 because Lemma 18.5 verifies Assumption B with $RP^-(h) = RP^+(h) = \alpha$ when $q^\dagger = p$ and with $RP^-(h) = RP^+(h) = P(CLR_h > c(1-\alpha, r_h))$ when $q^\dagger < p$.

19 Proof of Theorem 7.1
Theorem 7.1 of AG1. Suppose the LM test, the CLR test with moment-variance weighting, and, when $p = 1$, the CLR test with Jacobian-variance weighting are defined as in this section, the parameter space for $F$ is $\mathcal{F}_{TS,0}$ for the first two tests and $\mathcal{F}_{TS,JVW,p=1}$ for the third test, and Assumption V holds. Then, these tests have asymptotic sizes equal to their nominal size $\alpha \in (0,1)$ and are asymptotically similar (in a uniform sense). Analogous results hold for the corresponding CS's for the parameter spaces $\mathcal{F}_{\Theta,TS,0}$ and $\mathcal{F}_{\Theta,TS,JVW,p=1}$.
The proof of Theorem 7.1 is analogous to that of Theorems 4.1, 5.2, and 6.1. In the time series case, for tests, we define $\lambda = (\lambda_{1,F}, \ldots, \lambda_{9,F})$ and $\{\lambda_{n,h} : n \geq 1\}$ as in (8.9) and (8.11), respectively, but with $\lambda_{5,F}$ defined differently than in the i.i.d. case. (For CS's in the time series case, we make the adjustments outlined in the Comment to Proposition 8.1.) We define^{63}

$$\lambda_{5,F} := V_F = \sum_{m=-\infty}^{\infty} E_F\begin{pmatrix} g_i \\ vec(G_i - E_FG_i) \end{pmatrix}\begin{pmatrix} g_{i-m} \\ vec(G_{i-m} - E_FG_{i-m}) \end{pmatrix}'. \qquad (19.1)$$

In consequence, $\lambda_{5,F_n} \to h_5$ implies that $V_{F_n} \to h_5$ and the condition in Assumption V holds with $V = h_5$.
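As a consistency check on (19.1) (this observation is ours, added for clarity): under i.i.d. sampling the cross-covariance terms with $m \neq 0$ are expectations of products of independent mean-zero vectors and hence vanish, so the long-run variance collapses to the single $m = 0$ term, matching the i.i.d. definition of $\lambda_{5,F}$ as an ordinary variance matrix:

```latex
% i.i.d. case: only the m = 0 term of (19.1) survives, giving
V_F \;=\; E_F \begin{pmatrix} g_i \\ vec(G_i - E_F G_i) \end{pmatrix}
          \begin{pmatrix} g_i \\ vec(G_i - E_F G_i) \end{pmatrix}'
    \;=\; Var_F\!\big( (g_i',\, vec(G_i)')' \big),
% using E_F g_i = 0 under the null and the invariance of the variance
% to the re-centering of vec(G_i).
```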
The proof of Theorem 7.1 uses the CLT given in the following lemma.

Lemma 19.1. Let $f_i := (g_i', vec(G_i)')'$. We have: $w_n^{-1/2}\sum_{i=1}^{w_n}(f_i - E_{F_{w_n}}f_i) \to_d N(0^{(p+1)k}, h_5)$ under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \geq 1\}$.
Proof of Theorem 7.1. The proof is the same as the proofs of Theorems 4.1, 5.2, and 6.1 (given in Sections 9, 10, and 11, respectively, in the Appendix to AG1) and the proofs of Lemmas 8.2 and 8.3 and Theorem 8.4 (given in Sections 14, 15, and 16 in this Supplemental Material), upon which the former proofs rely, for the i.i.d. case with some modifications. The modifications affect the proofs of Lemmas 8.2 and 8.3 and the proof of Theorem 5.2. No modifications are needed elsewhere.

The first modification is the change in the definition of $\lambda_{5,F}$ described in (19.1).

^{63} The difference in the definitions of $\lambda_{5,F}$ in the i.i.d. and time series cases reflects the difference in the definitions of $V_F^{vec(G_i)}$ in these two cases. See the footnote at (7.1) above regarding the latter.
The second modification is that $\widehat{\Omega}_n = \widehat{\Omega}_n(\theta_0) \to_p h_{5,g}$ not by the WLLN but by Assumption V and the definition of $\widehat{\Omega}_n(\theta)$ in (7.4). In the time series case, by definition, $\lambda_{5,F} := V_F$, so $h_5 := \lim \lambda_{5,F_n} = \lim V_{F_n}$. By definition, $h_{5,g}$ is the upper left $k \times k$ submatrix of $h_5$ and $\Omega_F$ is the upper left $k \times k$ submatrix of $V_F$ by (7.1) and (19.1). Hence, $h_{5,g} = \lim \Omega_{F_n}$. By the definition of $\mathcal{F}_{TS}$, $\lambda_{\min}(\Omega_F) \geq \delta$ $\forall F \in \mathcal{F}_{TS}$. Hence, $h_{5,g}$ is pd.

Let $h_{5,G_jg}$ be the $k \times k$ submatrix of $h_5$ that corresponds to the submatrix $\widehat{\Gamma}_{jn}(\theta)$ of $\widehat{V}_n(\theta)$ in (7.4) for $j = 1, \ldots, p$. The third modification is that $\widehat{\Gamma}_{jn} = \widehat{\Gamma}_{jn}(\theta_0) = h_{5,G_jg} + o_p(1)$ in (14.1) in the proof of Lemma 8.2 (rather than $\widehat{\Gamma}_{jn} = E_{F_n}G_{ij}g_i' + o_p(1)$) for $j = 1, \ldots, p$, and this holds by Assumption V and the definition of $\widehat{\Gamma}_{jn}(\theta)$ in (7.4) (rather than by the WLLN).
We write

$$h_5 = \begin{pmatrix} h_{5,g} & h_{5,Gg}' \\ h_{5,Gg} & h_{5,G} \end{pmatrix} \text{ for } h_{5,g} \in R^{k \times k}, \; h_{5,Gg} = \begin{pmatrix} h_{5,G_1g} \\ \vdots \\ h_{5,G_pg} \end{pmatrix} \in R^{pk \times k}, \text{ and } h_{5,G} \in R^{pk \times pk}. \qquad (19.2)$$
The fourth modification is that $\widetilde{V}_{Dn}$ in (11.1) in the proof of Theorem 5.2 is defined as described in Section 7, rather than as in (5.3). In addition, $\widetilde{V}_{Dn} \to_p h_7$ in (11.1) holds with $h_7 = h_{5,G} - h_{5,Gg}(h_{5,g})^{-1}h_{5,Gg}'$ by Assumption V, rather than by the WLLN.

The fifth modification is the use of a WLLN and CLT for triangular arrays of strong mixing random vectors, rather than i.i.d. random vectors, for the quantities in the proof of Lemma 8.2 and elsewhere. For the WLLN, we use Example 4 of Andrews (1988), which shows that for a strong mixing row-wise-stationary triangular array $\{W_i : i \leq n\}$ we have $n^{-1}\sum_{i=1}^{n}(\zeta(W_i) - E_{F_n}\zeta(W_i)) \to_p 0$ for any real-valued function $\zeta(\cdot)$ (that may depend on $n$) for which $\sup_{n\geq 1}E_{F_n}||\zeta(W_i)||^{1+\delta} < \infty$ for some $\delta > 0$. For the CLT, we use Lemma 19.1 as follows. The joint convergence of $n^{1/2}\widehat{g}_n$ and $n^{1/2}(\widehat{D}_n - E_{F_n}G_i)$ in the proof of Lemma 8.2 is obtained from (14.1), modified by the second and third modifications above, and the following result:
$$n^{-1/2}\sum_{i=1}^{n}(\zeta(W_i) - E_{F_n}\zeta(W_i)) = \begin{pmatrix} I_k & 0^{k \times pk} \\ -h_{5,Gg}h_{5,g}^{-1} & I_{pk} \end{pmatrix} n^{-1/2}\sum_{i=1}^{n}(f_i - E_{F_n}f_i) \to_d N(0^{(p+1)k}, L_{h_5}), \text{ where}$$

$$\zeta(W_i) := \begin{pmatrix} g_i \\ vec(G_i) - h_{5,Gg}h_{5,g}^{-1}g_i \end{pmatrix} = \begin{pmatrix} I_k & 0^{k \times pk} \\ -h_{5,Gg}h_{5,g}^{-1} & I_{pk} \end{pmatrix}\begin{pmatrix} g_i \\ vec(G_i) \end{pmatrix}, \qquad (19.3)$$
$f_i = (g_i', vec(G_i)')'$, and the convergence holds by Lemma 19.1. Using (19.2), the variance matrix $L_{h_5}$ in (19.3) takes the form:

$$L_{h_5} = \begin{pmatrix} I_k & 0^{k \times pk} \\ -h_{5,Gg}h_{5,g}^{-1} & I_{pk} \end{pmatrix}\begin{pmatrix} h_{5,g} & h_{5,Gg}' \\ h_{5,Gg} & h_{5,G} \end{pmatrix}\begin{pmatrix} I_k & -h_{5,g}^{-1}h_{5,Gg}' \\ 0^{pk \times k} & I_{pk} \end{pmatrix} = \begin{pmatrix} h_{5,g} & 0^{k \times pk} \\ 0^{pk \times k} & h^{vec(G_i)} \end{pmatrix}, \text{ where}$$

$$h^{vec(G_i)} := h_{5,G} - h_{5,Gg}h_{5,g}^{-1}h_{5,Gg}'. \qquad (19.4)$$
Equations (14.1) (modified as described above), (19.3), and (19.4) combine to give the result of Lemma 8.2 for the time series case.
The sixth modification occurs in the proof of Lemma 8.3(d) in Section 15 in this Supplemental Material. In the time series case, the proof goes through as is, except that the calculations in (15.13) are not needed because $V_F^{a_i}$ (and, hence, $M_F^{a_i}$ as well) is defined with its underlying components re-centered at their means (which is needed to ensure that $V_F^{a_i}$ is a convergent sum). The latter automatically holds, and $\lim V_{F_n}^{vec(G_i)} = h^{vec(G_i)}$ implies that

$$\lim V_{F_n}^{vec(C_{F_n,k-q}'\Omega_{F_n}^{-1/2}G_iB_{F_n,p-q_2})} = h^{vec(h_{3,k-q}'h_{5,g}^{-1/2}G_ih_{2,p-q_2})}$$

(which, in the i.i.d. case, is proved in (15.13)).
This completes the proof of Theorem 7.1.
Proof of Lemma 19.1. For notational simplicity, we prove the result for the sequence $\{n\}$ rather than a subsequence $\{w_n : n \geq 1\}$; the same proof applies for any subsequence. By the Cramér-Wold device, it suffices to prove the result with $f_i - E_{F_n}f_i$ and $h_5$ replaced by $s(W_i) = b'(f_i - E_{F_n}f_i)$ and $b'h_5b$, respectively, for arbitrary $b \in R^{(p+1)k}$. First, we show
$$\lim_{n\to\infty} Var_{F_n}\left(n^{-1/2}\sum_{i=1}^{n}s(W_i)\right) = b'h_5b, \qquad (19.5)$$

where by assumption $\lambda_{5,F_n} \to h_5$ and $b'\lambda_{5,F_n}b = \sum_{m=-\infty}^{\infty}E_{F_n}s(W_i)s(W_{i-m})$. By change of variables, we have

$$Var_{F_n}\left(n^{-1/2}\sum_{i=1}^{n}s(W_i)\right) = \sum_{m=-n+1}^{n-1}Cov_{F_n}(s(W_i), s(W_{i-m})) - \sum_{m=-n+1}^{n-1}\frac{|m|}{n}Cov_{F_n}(s(W_i), s(W_{i-m})). \qquad (19.6)$$
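The change-of-variables step in (19.6) is the standard regrouping of the variance double sum by lag (spelled out here as a reminder; it uses the within-row stationarity of $\{W_i\}$): there are exactly $n - |m|$ pairs $(i,j)$ with $i - j = m$, so

```latex
Var_{F_n}\Big(n^{-1/2}\sum_{i=1}^{n}s(W_i)\Big)
  = n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} Cov_{F_n}\big(s(W_i), s(W_j)\big)
  = \sum_{m=-n+1}^{n-1}\frac{n-|m|}{n}\,Cov_{F_n}\big(s(W_i), s(W_{i-m})\big),
% and writing (n-|m|)/n = 1 - |m|/n gives the two sums displayed in (19.6).
```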
This gives

$$\left|Var_{F_n}\left(n^{-1/2}\sum_{i=1}^{n}s(W_i)\right) - b'\lambda_{5,F_n}b\right| \leq 2\sum_{m=n}^{\infty}||Cov_{F_n}(s(W_i), s(W_{i-m}))|| + \sum_{m=-n+1}^{n-1}\frac{|m|}{n}||Cov_{F_n}(s(W_i), s(W_{i-m}))||. \qquad (19.7)$$
By a standard strong mixing covariance inequality, e.g., see Davidson (1994, p. 212),

$$\sup_{F\in\mathcal{F}_{TS}}||Cov_F(s(W_i), s(W_{i-m}))|| \leq C_1\alpha_F(m)^{\delta/(2+\delta)} \leq C_1C^{\delta/(2+\delta)}m^{-d\delta/(2+\delta)}, \text{ where } d\delta/(2+\delta) > 1, \qquad (19.8)$$

for some $C_1 < \infty$, where the second inequality uses the definition of $\mathcal{F}_{TS}$ in (7.2). In consequence, both terms on the rhs of (19.7) converge to zero. This and $b'\lambda_{5,F_n}b \to b'h_5b$ establish (19.5).
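For completeness (this step is left implicit in the text): writing $a := d\delta/(2+\delta) > 1$ and $C_2 := C_1C^{\delta/(2+\delta)}$, the polynomial bound in (19.8) makes both terms on the rhs of (19.7) vanish:

```latex
% First term: the tail of a convergent series, since a > 1:
2\sum_{m=n}^{\infty} C_2\,m^{-a}
  \;\le\; 2C_2\int_{n-1}^{\infty} x^{-a}\,dx
  \;=\; \frac{2C_2}{a-1}\,(n-1)^{1-a} \;\to\; 0.
% Second term (the m = 0 summand contributes nothing since |m|/n = 0):
\sum_{m=-n+1}^{n-1}\frac{|m|}{n}\,C_2|m|^{-a}
  \;=\; \frac{2C_2}{n}\sum_{m=1}^{n-1} m^{1-a}
  \;=\; \begin{cases} O(n^{-1}) & \text{if } a > 2,\\[2pt]
                      O\big(n^{1-a}\log n\big) & \text{if } 1 < a \le 2,
        \end{cases}
% which is o(1) in either case.
```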
When $b'h_5b = 0$, we have $\lim_{n\to\infty}Var_{F_n}(n^{-1/2}\sum_{i=1}^{n}s(W_i)) = 0$, which implies that $n^{-1/2}\sum_{i=1}^{n}s(W_i) \to_d N(0, b'h_5b) = 0$. When $b'h_5b > 0$, we can assume $\sigma_n^2 = Var_{F_n}(n^{-1/2}\sum_{i=1}^{n}s(W_i)) \geq c$ for some $c > 0$ $\forall n \geq 1$ without loss of generality. We apply the triangular array CLT in Corollary 1 of de Jong (1997) with (using de Jong's notation) $X_{ni} := n^{-1/2}s(W_i)\sigma_n^{-1}$ and $c_{ni} := n^{-1/2}\sigma_n^{-1}$. Now we verify conditions (a)-(c) of Assumption 2 of de Jong (1997). Condition (a) holds automatically. Condition (b) holds because $c_{ni} > 0$ and $E_{F_n}|X_{ni}/c_{ni}|^{2+\delta} = E_{F_n}|s(W_i)|^{2+\delta} \leq 2||b||^{2+\delta}M < \infty$ $\forall F_n \in \mathcal{F}_{TS}$. Condition (c) holds by taking $V_{ni} = X_{ni}$ (where $V_{ni}$ is the random variable that appears in the definition of near epoch dependence in Definition 2 of de Jong (1997)), $d_{ni} = 0$, and using $\alpha_{F_n}(m) \leq Cm^{-d}$ $\forall F_n \in \mathcal{F}_{TS}$ for $d > (2+\delta)/\delta$ and $C < \infty$. By Corollary 1 of de Jong (1997), we have $\sum_{i=1}^{n}X_{ni} \to_d N(0,1)$. This and (19.5) give

$$n^{-1/2}\sum_{i=1}^{n}s(W_i) \to_d N(0, b'h_5b), \qquad (19.9)$$

as desired.
References

Andrews, D. W. K. (1988): "Laws of Large Numbers for Dependent Non-identically Distributed Random Variables," Econometric Theory, 4, 458-467.

Davidson, J. (1994): Stochastic Limit Theory. Oxford: Oxford University Press.

de Jong, R. M. (1997): "Central Limit Theorems for Dependent Heterogeneous Random Variables," Econometric Theory, 13, 353-367.

Hwang, S.-G. (2004): "Cauchy's Interlace Theorem for Eigenvalues of Hermitian Matrices," American Mathematical Monthly, 111, 157-159.

Johansen, S. (1991): "Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models," Econometrica, 59, 1551-1580.

Kleibergen, F. (2005): "Testing Parameters in GMM Without Assuming That They Are Identified," Econometrica, 73, 1103-1123.

Rao, C. R. (1973): Linear Statistical Inference and its Applications. 2nd edition. New York: Wiley.

Robin, J.-M., and R. J. Smith (2000): "Tests of Rank," Econometric Theory, 16, 151-175.

Stewart, G. W. (2001): Matrix Algorithms Volume II: Eigensystems. Philadelphia: SIAM.