Testing Overidentifying Restrictions When the True Parameter Vector Is near or at the Boundary of the Parameter Space

Philipp Ketz∗

March 16, 2017
We show that the standard test of overidentifying restrictions, which compares the J-statistic (Hansen, 1982) to χ² critical values, suffers from overrejection when the true parameter vector is near or at the boundary of the parameter space. While it is possible to perform size-controlled inference by adjusting the critical values, we suggest the use of a modified J-statistic that, under the null hypothesis, is asymptotically χ² distributed. The resulting test constitutes a natural extension of the standard overidentification test, which is valid under the assumption that the true parameter is in the interior, and has the advantage of being easy to implement. A small Monte Carlo study indicates that the test also has good power properties.
Keywords: Overidentifying restrictions, testing, boundary.
∗ Paris School of Economics, 48 Boulevard Jourdan, 75014 Paris, France. Email: [email protected].
1 Introduction
Economic models often imply certain restrictions on the distribution of the data. When
the number of restrictions, H, exceeds the number of unknown parameters, p, the additional, so-called overidentifying, restrictions allow us to test the model’s validity, at least
partially. The most widely used test is based on the J-statistic, given by the Generalized Method of Moments (GMM) objective function evaluated at its minimizer, i.e., the
GMM estimator (Hansen, 1982). In standard settings, the asymptotic distribution of the
J-statistic is given by a χ2 distribution with the degrees of freedom equal to the number
of overidentifying restrictions, H − p. One of the assumptions underlying the standard
result is that the true parameter vector is in the interior of the parameter space. In that
case, the J-statistic behaves asymptotically like a quadratic function evaluated at its unconstrained minimizer. When instead the parameter vector is near or at the boundary of
the parameter space, the J-statistic behaves asymptotically like a quadratic function evaluated at a constrained minimizer. Therefore, ceteris paribus, the J-statistic is (weakly) greater under the nonstandard condition that the true parameter vector is near or at the boundary. As a result, the standard overidentification test suffers from overrejection
when standard critical values are employed. In the context of model specification testing,
this increases the likelihood of mistakenly discarding a valid model. A prominent example
in which the parameter space is restricted and estimates are often found to be close to the
boundary is the random coefficients logit model (Berry, Levinsohn, and Pakes, 1995).1
In this paper, we propose a modified J-statistic based on the results in Ketz (2017a).
The modified J-statistic is given by a quadratic approximation to the original objective
function evaluated at its minimizer. We show that, under the null hypothesis, the modified J-statistic is asymptotically χ2 distributed with H − p degrees of freedom, i.e., it has
the standard asymptotic distribution. The statistic can be computed at no additional
cost and the resulting test has the benefit of allowing inference as if under standard
conditions, given that it relies on standard critical values.
An alternative strategy would be to rely on the J-statistic and to adjust the critical
values in order to obtain a test that controls size. In this paper, we use the results in Ketz
(2017a), which are based on Andrews (1999), to derive the asymptotic null distribution
of the J-statistic when the true parameter is near or at the boundary of the parameter
space. In the special case where there is at most one parameter near or at the boundary of the parameter space, the asymptotic distribution is given by P(h)χ²(H − p) + (1 − P(h))χ²(H − p + 1), where 0.5 ≤ P(h) ≤ 1 denotes the probability with which the estimator (asymptotically) takes on a value in the interior of the parameter space, which depends on h, a localization parameter that measures how close the true parameter is to the boundary (P(0) = 0.5 and lim_{h→∞} P(h) = 1).2 As h is unknown and not consistently estimable, a natural approach to size-correct inference is based on the least favorable configuration approach. In the special case with one parameter near or at the boundary, the least favorable configuration is given by h = 0, such that the critical value is based on 0.5χ²(H − p) + 0.5χ²(H − p + 1).3 By construction, the resulting test is conservative for h > 0 and may thus sacrifice power, as illustrated in a small Monte Carlo study.

1 See Ketz (2015) for a discussion.
This paper adds to the recent literature concerned with the behavior of the J-statistic
under nonstandard conditions. Dovonon and Renault (2013) (DR) derive the asymptotic
distribution of the J-statistic when some of the parameters are not first-order identified.
This phenomenon occurs, for example, when testing for common features in asset returns. The asymptotic distribution result in DR bears a certain resemblance to ours. In
particular, for a special case of theirs, they find that the asymptotic null distribution of
the J-statistic is given by 0.5χ2 (H − 1) + 0.5χ2 (H), which corresponds to the special case
described above when p = 1 and h = 0, i.e., the true parameter is at the boundary.4 In
contrast to our setup, however, the asymptotic null distribution of the J-statistic in DR
does not depend on the true parameter value. DR tie their result to the nonstandard
asymptotic distribution of the GMM estimator, which results from the lack of first-order
identification. Recently, Lee and Liao (2014) have suggested the use of additional moment
conditions in order to overcome the lack of first-order identification encountered in DR.
In particular, they exploit the fact that the lack of first-order identification implies additional moment conditions, as the Jacobian of the original moment conditions equals zero. Lee and
Liao (2014) show that the J-statistic based on the extended set of moment conditions has
the standard asymptotic null distribution. As a by-product, their approach also restores
asymptotic normality of the GMM estimator. Irrespective of whether a similar strategy
is feasible here, it would be highly undesirable, requiring that the true parameter be at
the boundary under the null hypothesis. This being said, our solution is similar in spirit
to Lee and Liao (2014) in that both the resulting estimator and the resulting J-statistic have standard asymptotic distributions.
The outline of this paper is as follows. In Section 2, we derive the nonstandard
asymptotic distribution of Hansen’s J-statistic when the true parameter vector may be
near or at the boundary of the parameter space. In Section 3, we propose a modified
J-statistic and derive its asymptotic distribution. The results of a small Monte Carlo
study are presented in Section 4. Section 5 concludes.
2 Note that even in the just-identified case, this implies that the J-statistic has a non-degenerate distribution as long as h < ∞.
3 P(h)χ²(H − p) + (1 − P(h))χ²(H − p + 1) is stochastically dominated by 0.5χ²(H − p) + 0.5χ²(H − p + 1) for h > 0.
4 In fact, when the true parameter is at the boundary, the results in DR can be used (upon reparameterization) to derive our asymptotic distribution results, at least for some models considered here. See the last paragraph of Section 2 for details.
2 Hansen’s J-statistic
We consider the following population moment

G(θ; γ) ≡ Eγ[g(θ, wi)],  (1)

where g(θ, wi) denotes an H-dimensional vector function which depends on the p-dimensional vector θ and some data, wi. We assume that the distribution of the data is fully characterized by the vector γ = (θ, φ) ∈ Γ and let Eγ denote the corresponding expectation. Here, φ denotes an infinite-dimensional nuisance parameter.5 The parameter space, Γ, is assumed to be compact and of the following form

Γ = {γ = (θ, φ) : θ ∈ Θ, φ ∈ Φ(θ)}

with6

Θ = [−c, c]^{p1} × [0, c]^{p2},  (2)

where 0 < c < ∞ and p1, p2 ∈ N such that p = p1 + p2, with θ = (θ1, θ2) denoting a conformable partition.7 Furthermore, Φ(θ) ⊂ Φ ∀θ ∈ Θ for some compact parameter space Φ with a metric that induces weak convergence of the bivariate distributions (wi, wi+m) for all i, m ≥ 1.
In order to define the J-statistic, we first introduce some additional notation. Define the sample moment corresponding to the population moment given in equation (1) as follows

Gn(θ) ≡ (1/n) Σ_{i=1}^{n} g(θ, wi).  (3)
The GMM objective function is given by

Qn(θ) = Gn(θ)′ Wn Gn(θ)/2,  (4)
where Wn denotes an asymptotically nonsingular weighting matrix. Possible dependence
of Wn on θ is suppressed for notational convenience and can be justified by the use of
a consistent first step estimator. Define an estimator, θ̂n , as any random variable that
satisfies θ̂n ∈ Θ and
Qn(θ̂n) = min_{θ∈Θ} Qn(θ) + op(1).  (5)
Hansen’s J-statistic is given by

Jn = 2nQn(θ̂n).  (6)

5 Despite the presence of φ, the model is parametric in nature, as φ is not estimated.
6 The normalization to 0 is WLOG.
7 Here and in what follows, we let (a, b) denote the vector (a′, b′)′, where a and b are column vectors.
The J-statistic naturally lends itself to testing the null hypothesis that the economic
model is well specified, i.e.,
H0 : Eγ0[g(θ0, wi)] = 0,

where γ0 = (θ0, φ0) denotes the (fixed) true parameter. In the standard case, where θ0 ∈ int(Θ), the asymptotic null distribution of Jn is given by χ²(H − p), if Wn →p Eγ0[g(θ0, wi)g(θ0, wi)′]^{−1}. Intuitively, the estimator, θ̂n, sets exactly p out of H sample moments equal to 0. Under the above null hypothesis, the remaining H − p sample moments should be close to zero, such that their quadratic form (appropriately scaled) is asymptotically χ² distributed with H − p degrees of freedom.
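As a concrete illustration of this standard (interior) case, the following sketch computes a two-step efficient GMM estimator and the J-statistic for a toy linear moment model with H = 2 and p = 1. The design (instrument strength, true parameter equal to 2) is an illustrative choice of ours, not taken from the paper.

```python
import numpy as np
from scipy import stats

# Toy design, H = 2 instruments and p = 1 parameter: moments
# g(theta, w_i) = z_i (y_i - x_i * theta). All numbers are illustrative.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 2))                       # instruments
x = z @ np.array([1.0, 1.0]) + rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)                  # valid moments: E[z u] = 0

num = (z * y[:, None]).mean(axis=0)               # (1/n) sum z_i y_i
den = (z * x[:, None]).mean(axis=0)               # (1/n) sum z_i x_i

# First step, identity weighting: minimize ||num - den * theta||^2.
theta1 = (den @ num) / (den @ den)

# Second step, efficient weighting W_n = [(1/n) sum g_i g_i']^{-1}.
u1 = y - x * theta1
g_i = z * u1[:, None]
W = np.linalg.inv(g_i.T @ g_i / n)
theta2 = (den @ W @ num) / (den @ W @ den)

# Hansen's J-statistic: J_n = n G_n(theta_hat)' W_n G_n(theta_hat),
# compared with chi2(H - p) = chi2(1) critical values.
gbar = num - den * theta2
J = n * gbar @ W @ gbar
pval = 1.0 - stats.chi2.cdf(J, df=1)
print(theta2, J, pval)
```

Here the parameter is unrestricted, so the standard χ²(1) reference distribution applies; the boundary issue discussed below arises once Θ restricts θ.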
In this paper, we are interested in obtaining good finite sample approximations to
the behavior of the J-statistic when the true parameter is near or at the boundary of
the parameter space. Therefore, it is essential to derive asymptotic theory under certain
drifting sequences of true parameters. Let γn = (θn , φn ) = (θ1n , θ2n , φn ) ∈ Γ denote a
sequence of true parameters.8 The sequences of primary interest are
Γ(γ0, h) = {{γn ∈ Γ(γ0) : n ≥ 1} : √n θ2n → h, where ‖h‖ < ∞},
where9
Γ(γ0 ) = {{γn ∈ Γ : n ≥ 1} : γn → γ0 ∈ Γ}.
Note that γn ∈ Γ(γ0, h) is such that θ2n → 0, i.e., the sequence of true parameters
drifts towards the boundary. Under sequences of true parameters, the null hypothesis
under test becomes

H0 : Eγn[g(θn, wi)] = 0.  (7)
Next, we state several high level assumptions under which the asymptotic theory in
this paper is derived. They are stated for general population and sample moments. In
particular, the moments need not take the form given in equations (1) and (3). The
assumptions are taken from Andrews and Cheng (2014a) with slight modifications as we
do not allow for varying degrees of identification strength. The first assumption concerns
consistency.10
Assumption 1. (i) Under {γn} ∈ Γ(γ0), sup_{θ∈Θ} ‖Gn(θ) − G(θ; γ0)‖ →p 0 and sup_{θ∈Θ} ‖Wn(θ) − W(θ; γ0)‖ →p 0 for some nonrandom functions G(θ; γ0) and W(θ; γ0).
(ii) G(θ; γ0 ) = 0 if and only if θ = θ0 .
(iii) G(θ; γ0 ) has continuous left/right (l/r) partial derivatives on Θ, denoted Gθ (θ; γ0 ).
8 Then, {wi}_{i=1}^{n} forms a triangular array. For ease of notation, we omit the index n.
9 With a slight abuse of notation, we now let γ0 denote the limit of the sequence of true parameters.
10 Assumption 1 (i), (ii), (iii), (v), (vi), and (vii) correspond to Assumption GMM1 (ii), (iv), (v), (vi), (vii), and (ix) in Andrews and Cheng (2014a), respectively. Assumption 1 (iv) corresponds to (part of) Assumption GMM5 (iii) in Andrews and Cheng (2014a).
(iv) Gθ ≡ Gθ (θ0 ; γ0 ) has full column rank.
(v) W (θ; γ0 ) is continuous in θ on Θ.
(vi) W ≡ W (θ0 ; γ0 ) is nonsingular.
(vii) Θ is compact.
Note that, by Theorem 6 in Andrews (1999), Assumption 1(iii) implies that

G(θ; γ0) = G(θn; γ0) + Gθ(θn; γ0)(θ − θn) + o(‖θ − θn‖).  (8)
The next assumption ensures that the objective function is asymptotically well approximated by a quadratic function, see below.11

Assumption 2. Under {γn} ∈ Γ(γ0),

sup_{θ∈Θ: ‖θ−θn‖ ≤ δn}  √n ‖Gn(θ) − G(θ; γ0) − Gn(θn) + G(θn; γ0)‖ / (1 + ‖√n(θ − θn)‖) = op(1)

for all constants δn → 0.
The following assumption is sufficient for Assumption 2 and can often be verified using a ULLN, see e.g., Andrews (1992).12

Assumption 2∗. (i) Gn(θ) has continuous l/r partial derivatives on Θ ∀n ≥ 1.
(ii) Under {γn} ∈ Γ(γ0),

sup_{θ∈Θ: ‖θ−θn‖ ≤ δn} ‖(∂/∂θ′)Gn(θ) − Gθ‖ = op(1)

for all constants δn → 0.
The last assumption concerns the asymptotic behavior of the scaled sample moment.13

Assumption 3. Under {γn} ∈ Γ(γ0), √n Gn(θn) →d Y ∼ N(0, V) for some symmetric and positive definite matrix V.14
By Lemma 3.1 in Andrews and Cheng (2014a), Assumptions 1(i) and (v) ensure that under {γn} ∈ Γ(γ0) ∀γ0 ∈ Γ we have

Wn ≡ Wn(θ̄n) = W + op(1),  (9)

where θ̄n denotes a consistent first-step estimator, which is available given Assumption 1, e.g., with Wn(θ) and W(θ; γ0) equal to the identity matrix.

11 Assumption 2 corresponds to Assumption GMM2(ii) in Andrews and Cheng (2014a).
12 Assumption 2∗(i) corresponds to Assumption GMM2∗(i) in Andrews and Cheng (2014a), while Assumption 2∗(ii) subsumes Assumptions GMM2∗(iii) and GMM5(ii) and (iii) in Andrews and Cheng (2014a).
13 Assumption 3 corresponds to Assumption GMM5(i) in Andrews and Cheng (2014a).
14 For notational convenience, we omit the dependence of V on γ0.
Before formally stating the asymptotic distribution result for Hansen’s J-statistic, we give a heuristic argument. Under the above assumptions, the objective function admits the following quadratic approximation

Qn(θ) = Qn(θn) + Gn(θn)′ W Gθ (θ − θn) + (θ − θn)′ G′θ W Gθ (θ − θn)/2 + Rn(θ),
where Rn(θ) denotes the remainder. Let

I = G′θ W Gθ,   Zn = I^{−1} G′θ W √n Gn(θn),   and   qn(λ) = (λ + Zn)′ I (λ + Zn).
Then, the above equation can be written as

Qn(θ) = Qn(θn) − (1/(2n)) Z′n I Zn + (1/(2n)) qn(√n(θ − θn)) + Rn(θ).
It can be shown that √n(θ̂n − θn) is asymptotically equivalent to the minimizer of qn(λ) subject to λ ∈ Λ, say λ̂n. Here, Λ denotes the limit of √n(Θ − θn). Under {γn} ∈ Γ(γ0, h), Λ depends on h and we write Λh, where

Λh = R^{p1} × [−h1, ∞) × · · · × [−h_{p2}, ∞).
Note that

qn(λ̂n) = λ̂′n I(λ̂n + Zn) + Z′n I(λ̂n + Zn).

Since λ̂′n I(λ̂n + Zn) = 0 (see e.g., the proof of part (c) of Theorem 3 in Andrews (1999)), which implies that λ̂′n I λ̂n = −λ̂′n I Zn, we have that

qn(λ̂n) = −λ̂′n I λ̂n + Z′n I Zn,
such that

Qn(θ̂n) = Qn(θn) − (1/(2n)) λ̂′n I λ̂n + Rn(θ̂n).
It follows that15

Jn = 2nQn(θ̂n) = 2nQn(θn) − λ̂′n I λ̂n + op(1).  (10)

15 Here, we use Rn(θ̂n) = op(1/n), which follows from the proof of Proposition 1.

In order to formally state the asymptotic distribution of Hansen’s J-statistic we introduce
some additional notation. In particular, let
q(λ) = (λ + Z)′ I (λ + Z),

where16

Z = I^{−1} G′θ W Y ∼ N(0, I^{−1} G′θ W V W Gθ I^{−1}).  (11)
Proposition 1. Under {γn} ∈ Γ(γ0) and Assumptions 1, 2, and 3, Hansen’s J-statistic, Jn, defined in equation (6) is asymptotically distributed as Y′WY − λ̂′Iλ̂, where λ̂ denotes the minimizer of q(λ) subject to λ ∈ Λ.
The proof of Proposition 1 is outlined above. Details can be found in Appendix A.
Remark 1. When the true parameter vector is “far enough” from the boundary, i.e., √n(Θ − θn) → R^p, then λ̂ = −Z. If, in addition, W = V^{−1}, then the asymptotic distribution result reduces to the standard result, i.e., Jn →d χ²(H − p).
Remark 2. Under {γn} ∈ Γ(γ0, h), the asymptotic distribution of Jn is nonstandard and depends on h. When p2 = 1, i.e., there is only one (scalar) parameter which is restricted below by zero, there exists a simple closed-form expression for the asymptotic distribution of Jn. Let P(h) denote the probability that Zp < h. Then, if W = V^{−1}, the asymptotic distribution of Jn is given by P(h)χ²(H − p) + (1 − P(h))χ²(H − p + 1).
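Remark 2 can be checked by simulation at the least favorable configuration h = 0: draw Y, form Z, solve the one-dimensional constrained minimization for λ̂ in closed form, and compare the distribution of Y′WY − λ̂′Iλ̂ with the 0.5/0.5 mixture. The design below (H = 2, the choice of G, and W = V = I) is our own illustrative choice, not taken from the paper.

```python
import numpy as np
from scipy import stats

# Simulation check of Remark 2 at the least favorable configuration h = 0,
# with H = 2 moments and a single (p = p2 = 1) boundary-restricted parameter.
rng = np.random.default_rng(0)
R, H, p = 200_000, 2, 1
G = np.array([1.0, 0.5])              # Jacobian G_theta (illustrative)
Ival = G @ G                          # scalar I = G_theta' W G_theta, W = I_H

Y = rng.normal(size=(R, H))           # Y ~ N(0, V) with V = I_H
Z = (Y @ G) / Ival                    # Z = I^{-1} G_theta' W Y
lam = np.where(Z <= 0.0, -Z, 0.0)     # argmin of (lam + Z)' I (lam + Z), lam >= 0
J = np.einsum('ij,ij->i', Y, Y) - Ival * lam ** 2   # Y'WY - lam' I lam

# Remark 2 with h = 0: J ~ 0.5 chi2(H - p) + 0.5 chi2(H - p + 1).
c = 3.0
mix = 0.5 * stats.chi2.cdf(c, H - p) + 0.5 * stats.chi2.cdf(c, H - p + 1)
print(np.mean(J <= c), mix)           # empirical vs. theoretical CDF at c
```

When Z ≤ 0 the constraint does not bind and J reduces to the χ²(H − p) residual; when Z > 0 the boundary binds (λ̂ = 0) and the extra degree of freedom appears, producing the mixture.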
The closed-form expression for the asymptotic null distribution of the J-statistic given in Remark 2, albeit only valid for p2 = 1, allows us to illustrate several important points. First, when the true parameter is near or at the boundary (h < ∞), the test that compares the J-statistic to the standard critical value, based on χ²(H − p), suffers from overrejection since χ²(H − p) is stochastically dominated by P(h)χ²(H − p) + (1 − P(h))χ²(H − p + 1). Furthermore, it is easy to see that by comparing the J-statistic to the appropriate quantile of 0.5χ²(H − p) + 0.5χ²(H − p + 1) we obtain a test that controls size, as P(h)χ²(H − p) + (1 − P(h))χ²(H − p + 1) is stochastically dominated by 0.5χ²(H − p) + 0.5χ²(H − p + 1) for all h > 0 (including h = ∞). The resulting test is conservative by construction whenever h > 0. Note that using the appropriate quantile of P(h)χ²(H − p) + (1 − P(h))χ²(H − p + 1) is infeasible in practice, as h is unknown and not consistently estimable.
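The feasible, least favorable critical value described above is simply the 1 − α quantile of 0.5χ²(H − p) + 0.5χ²(H − p + 1) and is easy to obtain numerically. The sketch below uses H = 6 and p = 4 (our choice, matching the Monte Carlo design of Section 4, so H − p = 2); any other values work the same way.

```python
from scipy import stats, optimize

# Adjusted (least favorable configuration) critical value: solve
# 0.5 F_{chi2(H-p)}(c) + 0.5 F_{chi2(H-p+1)}(c) = 1 - alpha for c.
H, p, alpha = 6, 4, 0.05

def mix_cdf(c):
    return 0.5 * stats.chi2.cdf(c, H - p) + 0.5 * stats.chi2.cdf(c, H - p + 1)

cv_adj = optimize.brentq(lambda c: mix_cdf(c) - (1 - alpha), 0.0, 100.0)
cv_std = stats.chi2.ppf(1 - alpha, H - p)
print(cv_std, cv_adj)   # the adjusted critical value exceeds the chi2(H-p) one
```

Because the mixture stochastically dominates χ²(H − p), the adjusted critical value is strictly larger than the standard one, which is exactly the source of the conservativeness for h > 0.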
Lastly, we establish the link between our results and the results in Dovonon and
Renault (2013). In particular, we heuristically show how the theory in Dovonon and
Renault (2013) can be used to derive the convergence rate of the GMM estimator and
the asymptotic distribution of the corresponding J-statistic in the context of the random
coefficients logit model (Berry, Levinsohn, and Pakes, 1995), when the true parameter
vector is at the boundary. In Berry, Levinsohn, and Pakes (1995), the random coefficients are parameterized with respect to a vector of means and a vector of standard deviations.

16 For notational convenience, we omit the dependence of Z on γ0.
Here, we consider the special case of a single random coefficient with known mean and unknown standard deviation, σ. Then, Proposition 1 applies after a reparameterization in terms of variances, σ² (Ketz, 2015). In the notation of this paper, the true parameter is at the boundary if h = 0 and σT² = 0 ∀T, where T denotes the sample size. For the purpose of this discussion, we treat the true parameter as fixed and write σ0² = 0. Then, Remark 2 implies that Jn is asymptotically distributed as 0.5χ²(H − 1) + 0.5χ²(H), since p1 = 0 and p = p2 = 1. This result is also obtained as a special case of Corollary 3.2 in Dovonon and Renault (2013) with p = 1 under the original parameterization with respect to σ. To see this, note that the random coefficients logit model in its original parameterization is not first-order identified at σ = 0 (Ketz, 2015). Furthermore, defining σ = ±√(σ²), we can treat σ0 = 0 as an interior point (for the purpose of estimation) such that the assumptions of Corollary 3.2 in Dovonon and Renault (2013) are satisfied. Similarly, Propositions 3.1 and 3.2 in Dovonon and Renault (2013) apply. In particular, T^{1/4} σ̂T is Op(1) and asymptotically either equals zero (with probability 0.5) or has a nondegenerate distribution with zero-probability mass at zero (with probability 0.5), as illustrated in Section 6 in Ketz (2015).17
3 Modified J-statistic
Although it is possible to perform size-controlled inference based on the J-statistic using,
for example, the least favorable configuration approach, it has two disadvantages.18 First,
the construction of critical values is, if not difficult, nonstandard and requires simulation,
if p2 > 1. Second, the resulting test may unnecessarily sacrifice power, when the true
parameter is “far” from the boundary of the parameter space.
In this paper, we propose an alternative J-statistic based on the idea in Ketz (2017a), which under the null hypothesis given in equation (7) has the standard χ²(H − p) distribution. Consider the following quadratic approximation to the objective function19
Mn(θ) = G′n(θn) W Gn(θn)/2 + G′n(θn) W Gθ (θ − θn) + (θ − θn)′ G′θ W Gθ (θ − θn)/2.  (12)
The minimizer of Mn(θ), say θ̈n, has the following closed-form expression

θ̈n = θn − (G′θ W Gθ)^{−1} G′θ W Gn(θn).
17 Instead of σ̂T = +√(σ̂T²), we could also let σ̂T = −√(σ̂T²).
18 Alternatively, one could apply the methods developed in McCloskey (2016).
19 The only difference with the quadratic approximation given in Section 2 is that Qn(θn) is replaced by G′n(θn)W Gn(θn)/2, see equation (4).

Under the null, Mn(θ) (multiplied by 2n) evaluated at θ̈n has the same asymptotic distribution as the standard J-statistic, χ²(H − p). To see this, let W = AA′ and G′θ W Gθ = B′B. Then,
2nMn(θ̈) = nG′n(θn) W Gn(θn) − nG′n(θn) W Gθ (G′θ W Gθ)^{−1} G′θ W Gn(θn)  (13)

can be written as

(√n A′Gn(θn))′ [I − B(B′B)^{−1}B′] (√n A′Gn(θn)),

where √n A′Gn(θn) →d N(0, I) and I − B(B′B)^{−1}B′ is an idempotent matrix of rank H − p, and the result follows. Using 2nMn(θ̈) as an alternative J-statistic is, however, not feasible, as it depends on unknown quantities. Instead, we propose the following test statistic

JnM = 2nM̂n(θ̃n).  (14)
Here, θ̃n denotes the minimizer of M̂n(θ), which denotes the sample analogue of Mn(θ). In particular, let Ĝθ,n denote a consistent estimator of Gθ, which by Assumptions 1-3 exists.20 If, for example, Assumption 2∗ holds, we can take Ĝθ,n = (∂/∂θ′)Gn(θ̂n). Then, M̂n(θ) is given by

M̂n(θ) = G′n(θ̂n) Wn Gn(θ̂n)/2 + G′n(θ̂n) Wn Ĝθ,n (θ − θ̂n) + (θ − θ̂n)′ Ĝ′θ,n Wn Ĝθ,n (θ − θ̂n)/2.  (15)
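Given the outputs of a standard GMM routine, the modified J-statistic can be computed in a few lines from the closed form analogous to equation (13). The function below is our own sketch (argument names are ours), assuming consistent inputs as in the text.

```python
import numpy as np

def modified_J(gbar, Gtheta, W, n):
    """Modified J-statistic, equation (14), via the closed form analogous to
    equation (13): 2n * M_hat_n(theta_tilde) is a projection residual in the
    metric W. Arguments (names are ours, not the paper's):
      gbar   : G_n(theta_hat), sample moment at the GMM estimate, shape (H,)
      Gtheta : consistent Jacobian estimate G_hat_{theta,n}, shape (H, p)
      W      : weighting matrix W_n, ideally an estimate of V^{-1}, (H, H)
      n      : sample size
    """
    WG = W @ Gtheta
    proj = WG @ np.linalg.solve(Gtheta.T @ WG, WG.T)   # W G (G'WG)^{-1} G' W
    return n * gbar @ (W - proj) @ gbar
```

If gbar lies in the column space of Gtheta the statistic is zero, mirroring the intuition that p linear combinations of the H moments are fitted exactly; what remains is the H − p dimensional residual that is asymptotically χ² distributed.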
Next, we state the main result of this paper.

Theorem 1. Under {γn} ∈ Γ(γ0) and Assumptions 1, 2, and 3, √n(θ̃n − θn) →d Z, where Z is given in equation (11). If, in addition, W = V^{−1}, then the modified J-statistic, JnM, given in equation (14) is asymptotically distributed as χ²(H − p).
The proof of Theorem 1 is given in Appendix A. The first conclusion is obtained by
verifying Assumptions 1-3 in Ketz (2017a), while the second conclusion is obtained by
showing that M̂n (θ̃n ) = Mn (θ̈) + op (1/n).
The modified J-statistic incurs no additional computing cost compared to the standard
J-statistic, cf. equation (15). Given its standard asymptotic null distribution, the test
based on the modified J-statistic constitutes an attractive alternative to the test based
on the J-statistic, which requires an adjustment of the critical values in order to ensure
size-control.
Remark 3. It follows from the proof of Theorem 1 that we only require θ̂n to be √n-consistent. In particular, Mn(θ) and M̂n(θ) are not required to rely on the same (asymptotic) weighting matrix. As in the context of one-step estimators (Newey and McFadden, 1994), this allows for θ̂n to be obtained using an inefficient weighting matrix, which in some cases may entail considerable computational benefits.

20 See, for example, the estimator described on p. 1043 in Pakes and Pollard (1989), which by Assumption 2 is consistent, cf. Lemma 1 in Andrews (2002).
4 Monte Carlo
This Monte Carlo study serves two purposes. First, we show that the asymptotic theory
derived in Sections 2 and 3 provides good finite sample approximations. Second, we
conduct a power comparison of the two tests that control size, i.e., the test that compares the J-statistic to adjusted critical values and the test that compares the modified J-statistic to standard critical values.
In order to assess the quality of the approximations provided by our asymptotic theory,
we turn to the random coefficients logit model introduced by Berry, Levinsohn, and Pakes
(1995). The restrictions on the parameter space arise naturally in this model, as each
random coefficient is parameterized with respect to a mean and a variance, which is
bounded below by zero. The fact that the boundary is “hard” suits the purpose of this
Monte Carlo study, as the contribution of this paper is the availability of the modified J-statistic in models where the objective function is not defined outside the parameter space.21 The data generating process is taken from Ketz (2015) with the only difference
that instead of using the optimal instrument for the endogenous variable, xjt,1 , we employ
all three instruments, zjt . Then, we have H = 6 and p = 4 with p1 = 3 and p2 = 1. For
the sake of brevity, we refrain from reproducing the details of the data generating process
here. Table 1 below shows the null rejection frequencies of the three different tests when the true parameter is at the boundary, i.e., σ² = 0. The first test compares the J-statistic
to the standard critical value, the second test compares the J-statistic to the adjusted
critical value, and the last test compares the modified J-statistic to the standard critical
value.
Table 1: Null rejection frequencies

Test statistic         Jn       Jn                       JnM
5% Critical value      χ²(2)    0.5χ²(2) + 0.5χ²(3)      χ²(2)
Rejection frequency    0.090    0.058                    0.056
Table 1 shows that our asymptotic theory provides good finite sample approximations.
The first test overrejects at the 5% significance level with a null rejection frequency of
9%. The other two tests have null rejection frequencies close to the nominal level of 5%,
with null rejection frequencies of 5.8% and 5.6%, respectively.
21 In models where the boundary is merely user-imposed (e.g., the coefficient on price is a priori assumed to be nonpositive), it is possible to employ the “standard” J-statistic, i.e., the objective function evaluated at the unrestricted estimator, which simply ignores the restrictions on the parameter space.
In what follows, we let Jn denote the J-statistic or the test based on that statistic using adjusted critical values. Analogously, we let JnM denote the modified J-statistic or the test based on it, which uses standard critical values. In order to compare Jn and JnM
in terms of power, we make use of a simple linear Instrumental Variables (IV) model,
where we “artificially” restrict the slope parameter to be nonnegative. Compared to the
random coefficients logit model, the simple linear IV model provides more transparency
with respect to the alternatives under which we evaluate power. Note that in this context,
the modified J-statistic reduces to the “standard” J-statistic that ignores the “artificial”
restrictions on the parameter space, see also footnote 21.
The model is given by the equation of interest

yi = β0 + β1 xi + ui

and the reduced form equation for xi, the endogenous regressor,

xi = π0 + π1 z1i + π2 z2i + vi.
The data are generated according to

(ui, vi, z1i, z2i)′ ∼ N(0, Σ),  where

     ⎛  1      0.5    σuz1   σuz2 ⎞
Σ =  ⎜  0.5    1      0      0    ⎟
     ⎜  σuz1   0      1      0.2  ⎟
     ⎝  σuz2   0      0.2    1    ⎠ .
The parameter values are β0 = π0 = π1 = π2 = 1. In the tables below, we vary σuz1 and σuz2 in order to assess power under different alternatives. We also vary β1 to assess the
impact of the restrictions on the parameter space on power. We impose that β1 cannot
be negative, i.e., β1 ≥ 0. We choose n, the sample size, equal to 250 and perform 1000
Monte Carlo simulations. Let β̂0, β̂1 and β̃0, β̃1 denote the restricted and the unrestricted estimators, respectively. Furthermore, let zi = (1, z1i, z2i)′,

ui(β0, β1) = yi − β0 − β1 xi,
and

σ²(β0, β1) = (1/n) Σ_{i=1}^{n} ui(β0, β1)².

Then, the J-statistic is given by

Jn(β0, β1) = n [ (1/n) Σ_{i=1}^{n} ui(β0, β1) zi ]′ [ (1/n) Σ_{i=1}^{n} zi z′i ]^{−1} [ (1/n) Σ_{i=1}^{n} ui(β0, β1) zi ] / σ²(β0, β1),
such that Jn = Jn(β̂0, β̂1) and JnM = Jn(β̃0, β̃1). Here, H = 3 and p = 2 with p1 = p2 = 1. Therefore, the critical values for Jn and JnM are based on 0.5χ²(1) + 0.5χ²(2) and χ²(1), respectively. Table 2 shows the rejection frequencies of Jn and JnM at the 5% significance level alongside the averages of the estimates over the 1000 Monte Carlo replications for different values of σuz1 and σuz2, while β1 = 0, i.e., the true parameter is at the boundary.
Table 2: Power analysis - β1 = 0

σuz1   σuz2    β̃0        β̃1        β̂0       β̂1       RF(JnM)   RF(Jn)
-0.3    0      1.1253   -0.1258   0.9986   0.0000   0.9460    0.9990
-0.2    0      1.0837   -0.0845   0.9985   0.0002   0.6680    0.8830
-0.1    0      1.0421   -0.0431   0.9961   0.0026   0.2210    0.3440
 0      0      1.0006   -0.0017   0.9836   0.0150   0.0490    0.0550
 0.1    0      0.9590    0.0397   0.9555   0.0432   0.2600    0.1690
 0.2    0      0.9174    0.0812   0.9171   0.0815   0.7420    0.6260
 0.3    0      0.8757    0.1228   0.8757   0.1228   0.9860    0.9750
-0.3   -0.3    1.2498   -0.2495   0.9987   0.0000   0.0590    1.0000
-0.2   -0.2    1.1667   -0.1670   0.9987   0.0000   0.0620    0.9890
-0.1   -0.1    1.0837   -0.0844   0.9985   0.0002   0.0530    0.4970
 0      0      1.0006   -0.0017   0.9836   0.0150   0.0490    0.0550
 0.1    0.1    0.9174    0.0812   0.9171   0.0815   0.0460    0.0250
 0.2    0.2    0.8341    0.1644   0.8341   0.1644   0.0400    0.0210
 0.3    0.3    0.7506    0.2479   0.7506   0.2479   0.0420    0.0210
Analogously to Table 1, Table 2 shows that both tests control size. The upper half of
Table 2 reveals that for some alternatives, σuz1 < 0 and σuz2 = 0, the Jn has better power,
while for other alternatives, σuz1 > 0 and σuz2 = 0, the JnM has better power. Looking at
the (averages of the) estimates we see that the performances of the two tests are closely
linked to the direction of the bias resulting from the violation of the moment conditions.
In particular, the Jn does well when the bias of the unrestricted estimator is such that the
restricted estimator becomes unbiased, whereas when the bias is in the other direction,
the Jn underperforms compared to the JnM . This results from the estimator being biased
towards the interior of the parameter space such that estimates eventually cease to be
restricted, see e.g., σuz1 = −0.3 and σuz2 = 0, leading to identical J-statistics (Jn = JnM ).
Since the Jn uses a larger critical value it sacrifices power in comparison. The lower half of
Table 2 reveals that the JnM only has trivial power against some alternatives, which is not surprising given that it corresponds to the “standard” overidentification test when there are no restrictions on the parameter space. The lack of power of the overidentification test for certain alternatives has, for example, been pointed out in Small (2007). The Jn, on the other hand, has good power for some of these alternatives, which follows from the above reasoning.
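The power experiment can be replicated in outline as follows. This is a minimal sketch of our own, not the author's code: it uses a fixed 2SLS-type weighting and takes the restricted estimator as the exact constrained minimizer of the resulting quadratic form (a simplification of the estimators defined above).

```python
import numpy as np
from scipy import stats

# Sketch of the IV power comparison behind Tables 2 and 3: beta1 >= 0 is
# imposed; Jn uses the restricted estimator with the adjusted critical value,
# JnM the unrestricted estimator with the chi2(1) critical value.
rng = np.random.default_rng(0)
n, reps = 250, 1000
beta1 = 0.0                        # true slope at the boundary
s_uz1, s_uz2 = 0.2, 0.0            # moment violation, cf. the (0.2, 0) row

Sigma = np.array([[1.0, 0.5, s_uz1, s_uz2],
                  [0.5, 1.0, 0.0, 0.0],
                  [s_uz1, 0.0, 1.0, 0.2],
                  [s_uz2, 0.0, 0.2, 1.0]])
C = np.linalg.cholesky(Sigma)

def J_stat(b, y, x, Z):
    u = y - b[0] - b[1] * x
    g = Z.T @ u / n
    return n * g @ np.linalg.solve(Z.T @ Z / n, g) / np.mean(u ** 2)

cv_std = stats.chi2.ppf(0.95, 1)
mix = np.where(rng.random(200_000) < 0.5,
               rng.chisquare(1, 200_000), rng.chisquare(2, 200_000))
cv_adj = np.quantile(mix, 0.95)    # 95% quantile of 0.5 chi2(1) + 0.5 chi2(2)

rej_J = rej_JM = 0
for _ in range(reps):
    e = rng.normal(size=(n, 4)) @ C.T
    u, v, z1, z2 = e.T
    x = 1.0 + z1 + z2 + v                      # pi0 = pi1 = pi2 = 1
    y = 1.0 + beta1 * x + u                    # beta0 = 1
    Z = np.column_stack([np.ones(n), z1, z2])
    X = np.column_stack([np.ones(n), x])
    W = np.linalg.inv(Z.T @ Z)
    A = X.T @ Z @ W @ Z.T
    bt = np.linalg.solve(A @ X, A @ y)         # unrestricted linear GMM
    if bt[1] >= 0:
        bh = bt
    else:                                      # constrained minimum at beta1 = 0
        a = Z.T @ np.ones(n)
        bh = np.array([(a @ W @ (Z.T @ y)) / (a @ W @ a), 0.0])
    rej_JM += J_stat(bt, y, x, Z) > cv_std
    rej_J += J_stat(bh, y, x, Z) > cv_adj

print(rej_JM / reps, rej_J / reps)
```

With σuz1 = 0.2 and σuz2 = 0, the bias of the estimators points into the interior, so estimates are rarely restricted, the two J-statistics largely coincide, and JnM gains power from its smaller critical value, in line with the upper half of Table 2.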
Table 3: Power analysis - β1 = 0.1

σuz1   σuz2    β̃0        β̃1        β̂0       β̂1       RF(JnM)   RF(Jn)
-0.3    0      1.1253   -0.0258   1.0927   0.0065   0.9460    0.9370
-0.2    0      1.0837    0.0155   1.0733   0.0258   0.6680    0.5740
-0.1    0      1.0421    0.0569   1.0403   0.0587   0.2210    0.1410
 0      0      1.0006    0.0983   1.0004   0.0984   0.0490    0.0280
 0.1    0      0.9590    0.1397   0.9590   0.1397   0.2600    0.1630
 0.2    0      0.9174    0.1812   0.9174   0.1812   0.7420    0.6250
 0.3    0      0.8757    0.2228   0.8757   0.2228   0.9860    0.9750
-0.3   -0.3    1.2498   -0.1495   1.0992   0.0000   0.0590    0.9600
-0.2   -0.2    1.1667   -0.0670   1.0985   0.0007   0.0620    0.3460
-0.1   -0.1    1.0837    0.0156   1.0732   0.0259   0.0530    0.0380
 0      0      1.0006    0.0983   1.0004   0.0984   0.0490    0.0280
 0.1    0.1    0.9174    0.1812   0.9174   0.1812   0.0460    0.0250
 0.2    0.2    0.8341    0.2644   0.8341   0.2644   0.0400    0.0210
 0.3    0.3    0.7506    0.3479   0.7506   0.3479   0.0420    0.0210
Table 3 replicates Table 2 with β1 = 0.1, i.e., the true parameter is no longer at the
boundary, but close to it relative to the sample size. For all alternatives considered in the
upper half of Table 3, we see that the JnM outperforms the Jn . This is a direct consequence
of the conservativeness of the Jn when the true parameter is not at the boundary (see
σuz1 = σuz2 = 0). The lower half of Table 3 shows a similar pattern as that of Table 2,
except that the Jn only starts to display nontrivial power for a relatively large violation
of the moment conditions.
5 Conclusion
In this paper, we show that the overidentification test that compares the J-statistic, given
by the GMM objective function evaluated at its minimizer, to standard critical values
suffers from overrejection when the true parameter vector is near or at the boundary
of the parameter space. While it is possible to perform size-controlled inference based on the J-statistic using adjusted critical values, we suggest the use of a modified J-statistic,
which is available at no additional computing cost and which, under the null hypothesis,
has the standard asymptotic distribution. Therefore, testing overidentifying restrictions
using the modified J-statistic is simple and has a “flavor” of making inference under
standard conditions.
A Proofs
Proof of Proposition 1. Assumptions 1-3 correspond to Assumptions GMM1, GMM2,
and GMM5 in Andrews and Cheng (2014a). Assumptions 2 and 3 in Ketz (2017a) and
Assumption 6 in Ketz (2017b) correspond to Assumptions D1-D3 in Andrews and Cheng
(2012). By Lemma 10.1 in Andrews and Cheng (2014b) and Lemma 3.1 in Andrews and
Cheng (2012), Assumption GMM1 in Andrews and Cheng (2014a) implies Assumption 5
in Ketz (2017b). By Lemma 10.3 in Andrews and Cheng (2014b), Assumptions GMM1,
GMM2, and GMM5 in Andrews and Cheng (2014a) imply Assumptions D1-D3 in Andrews
and Cheng (2012). Therefore, Assumptions 1-3 imply Assumptions 2 and 3 in Ketz (2017a)
and Assumptions 5 and 6 in Ketz (2017b) with
DQn(θ) = G′θ W Gn(θ)

and

D²Qn(θn) = J(γ∗) = G′θ W Gθ.
The result then follows by extending the result and the proof of Proposition 1 in Ketz
(2017b) by part (c) of Theorem 3 in Andrews (1999).
Proof of Theorem 1. By Proposition 1, Assumption 1 in Ketz (2017a) is satisfied. Furthermore, from the proof of Proposition 1 it follows that Assumptions 2 and 3 in Ketz
(2017a) are also satisfied, under Assumptions 1-3. It remains to show that Assumption
4 in Ketz (2017a) holds. Note that θ̃n is given by
θ̃n = θ̂n − (Ĝ′θ,n Wn Ĝθ,n)⁻¹ Ĝ′θ,n Wn Gn(θ̂n).
Since, by Assumptions 1 and 2,
Ĝ′θ,n Wn Ĝθ,n = G′θ W Gθ + op(1),

Assumption 4(ii) in Ketz (2017a) is satisfied. Similarly, we have

(∂/∂θ′) G′n(θ̂n) Wn = G′θ W + op(1).
Therefore, it suffices to show that

sup_{θ∈Θ: ‖√n(θ − θn)‖ ≤ ε} ‖Gn(θ) − Gn(θn) + Gθ(θ − θn)‖ = op(1/√n),   (16)
which holds by Assumption 2 and equation (8), since ‖Gθ(θn; γ0) − Gθ(θ0; γ0)‖ = o(‖θn −
θ0‖). The second conclusion of the Theorem follows by showing that
M̂n(θ̂n) = Mn(θ̈) + op(1/n).   (17)
Analogously to equation (13), we can write
2nM̂n(θ̃) = nG′n(θ̂n) Wn Gn(θ̂n) − nG′n(θ̂n) Wn Ĝθ,n (Ĝ′θ,n Wn Ĝθ,n)⁻¹ Ĝ′θ,n Wn Gn(θ̂n).   (18)

From equation (16), it follows that √n Gn(θ̂n) = Op(1). Therefore, by arguments similar
to above, equation (18) satisfies

2nM̂n(θ̃) = nG′n(θ̂n) W Gn(θ̂n) − nG′n(θ̂n) W Gθ (G′θ W Gθ)⁻¹ G′θ W Gn(θ̂n) + op(1).
By equation (16), the first part of the previous equation satisfies

nG′n(θn) W Gn(θn) + 2nG′n(θn) W Gθ (θ̂n − θn) + n(θ̂n − θn)′ G′θ W Gθ (θ̂n − θn) + op(1)

and the second part satisfies

nG′n(θn) W Gθ (G′θ W Gθ)⁻¹ G′θ W Gn(θn) + 2nG′n(θn) W Gθ (θ̂n − θn)
+ n(θ̂n − θn)′ G′θ W Gθ (θ̂n − θn) + op(1).
Combining the two gives equation (17), cf. equation (13).
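The algebra behind equation (18) and the step above is the standard projection identity: for a positive definite weight matrix W and a full-column-rank Jacobian Gθ, the unconstrained minimum of the quadratic (g + Gθ h)′W(g + Gθ h) over h equals g′Wg − g′WGθ(G′θWGθ)⁻¹G′θWg. The following quick numerical check uses randomly drawn stand-ins for g, Gθ, and W; all names here are illustrative, not objects from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
H, p = 3, 2                                  # H - p = 1 overidentifying restriction
G = rng.standard_normal((H, p))              # stand-in for the Jacobian G_theta
g = rng.standard_normal(H)                   # stand-in for the sample moment vector
A = rng.standard_normal((H, H))
W = A @ A.T + np.eye(H)                      # positive definite weight matrix

# right-hand side: quadratic minus its projection onto the column space of G
rhs = g @ W @ g - g @ W @ G @ np.linalg.solve(G.T @ W @ G, G.T @ W @ g)

# left-hand side: minimize (g + G h)' W (g + G h) over h
val = lambda h: (g + G @ h) @ W @ (g + G @ h)
h_star = -np.linalg.solve(G.T @ W @ G, G.T @ W @ g)   # first-order condition
assert abs(val(h_star) - rhs) < 1e-10                 # minimum equals the rhs
for _ in range(100):                                  # nearby points do no better
    assert val(h_star + 0.1 * rng.standard_normal(p)) >= rhs - 1e-10
print("projection identity verified, minimized value:", rhs)
```

The identity also makes clear why the modified statistic is available at no additional computing cost: its value follows from quantities already computed for the GMM estimator.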
References
Andrews, D. W. (1992). Generic Uniform Convergence. Econometric Theory 8, 241–257.
Andrews, D. W. (1999). Estimation When a Parameter Is on a Boundary. Econometrica 67, 1341–1383.
Andrews, D. W. (2002). Generalized Method of Moments Estimation When a Parameter
Is on a Boundary. Journal of Business & Economic Statistics 20, 530–544.
Andrews, D. W. and X. Cheng (2012). Estimation and Inference With Weak, Semi-Strong, and Strong Identification. Econometrica 80, 2153–2211.
Andrews, D. W. and X. Cheng (2014a). GMM Estimation and Uniform Subvector Inference With Possible Identification Failure. Econometric Theory 30, 287–333.
Andrews, D. W. and X. Cheng (2014b). Supplement to ‘GMM Estimation and Uniform
Subvector Inference With Possible Identification Failure’. Econometric Theory 30.
Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile Prices in Market Equilibrium.
Econometrica 63, 841–890.
Dovonon, P. and E. Renault (2013). Testing for Common Conditionally Heteroskedastic
Factors. Econometrica 81, 2561–2586.
Hansen, L. (1982). Large Sample Properties of Generalized Method of Moments Estimators. Econometrica 50, 1029–1054.
Ketz, P. (2015). A Simple Solution to Invalid Inference in the Random Coefficients Logit
Model. Working Paper.
Ketz, P. (2017a). Subvector Inference When the True Parameter Vector is near the
Boundary. Working Paper.
Ketz, P. (2017b). Supplementary material for “Subvector Inference When the True Parameter Vector is near the Boundary”.
Lee, J. H. and Z. Liao (2014). On Standard Inference for GMM With Seeming Local
Identification Failure. Working Paper.
McCloskey, A. (2016). Bonferroni-Based Size-Correction for Nonstandard Testing Problems. Working Paper.
Newey, W. K. and D. McFadden (1994). Large Sample Estimation and Hypothesis Testing.
Handbook of Econometrics 4, 2111–2245.
Pakes, A. and D. Pollard (1989). Simulation and the Asymptotics of Optimization Estimators. Econometrica 57, 1027–1057.
Small, D. S. (2007). Sensitivity Analysis for Instrumental Variables Regression With
Overidentifying Restrictions. Journal of the American Statistical Association 102,
1049–1058.