Testing Overidentifying Restrictions When the True Parameter Vector Is near or at the Boundary of the Parameter Space

Philipp Ketz∗

March 16, 2017

We show that the standard test for testing overidentifying restrictions, which compares the J-statistic (Hansen, 1982) to χ2 critical values, suffers from overrejection when the true parameter vector is near or at the boundary of the parameter space. While it is possible to perform size-controlled inference by adjusting the critical values, we suggest the use of a modified J-statistic that, under the null hypothesis, is asymptotically χ2 distributed. The resulting test constitutes a natural extension of the standard overidentification test, which is valid under the assumption that the true parameter is in the interior, and has the advantage of being easy to implement. A small Monte Carlo study suggests that the test has good power properties.

Keywords: Overidentifying restrictions, testing, boundary.

∗ Paris School of Economics, 48 Boulevard Jourdan, 75014 Paris, France. Email: [email protected].

1 Introduction

Economic models often imply certain restrictions on the distribution of the data. When the number of restrictions, H, exceeds the number of unknown parameters, p, the additional, so-called overidentifying, restrictions allow us to test the model’s validity, at least partially. The most widely used test is based on the J-statistic, given by the Generalized Method of Moments (GMM) objective function evaluated at its minimizer, i.e., the GMM estimator (Hansen, 1982). In standard settings, the asymptotic distribution of the J-statistic is given by a χ2 distribution with the degrees of freedom equal to the number of overidentifying restrictions, H − p. One of the assumptions underlying the standard result is that the true parameter vector is in the interior of the parameter space. In that case, the J-statistic behaves asymptotically like a quadratic function evaluated at its unconstrained minimizer.
When instead the parameter vector is near or at the boundary of the parameter space, the J-statistic behaves asymptotically like a quadratic function evaluated at a constrained minimizer. Therefore, ceteris paribus the J-statistic is (weakly) greater under the nonstandard condition that the true parameter vector is near or at the boundary. As a result, the standard overidentification test suffers from overrejection when standard critical values are employed. In the context of model specification testing, this increases the likelihood of mistakenly discarding a valid model. A prominent example in which the parameter space is restricted and estimates are often found to be close to the boundary is the random coefficients logit model (Berry, Levinsohn, and Pakes, 1995).1 In this paper, we propose a modified J-statistic based on the results in Ketz (2017a). The modified J-statistic is given by a quadratic approximation to the original objective function evaluated at its minimizer. We show that, under the null hypothesis, the modified J-statistic is asymptotically χ2 distributed with H − p degrees of freedom, i.e., it has the standard asymptotic distribution. The statistic can be computed at no additional cost and the resulting test has the benefit of allowing inference as if under standard conditions, given that it relies on standard critical values. An alternative strategy would be to rely on the J-statistic and to adjust the critical values in order to obtain a test that controls size. In this paper, we use the results in Ketz (2017a), which are based on Andrews (1999), to derive the asymptotic null distribution of the J-statistic when the true parameter is near or at the boundary of the parameter space. 
In the special case where there is at most one parameter near or at the boundary of the parameter space, the asymptotic distribution is given by P(h)χ2(H − p) + (1 − P(h))χ2(H − p + 1), where 0.5 ≤ P(h) ≤ 1 denotes the probability with which the estimator (asymptotically) takes on a value in the interior of the parameter space. This probability depends on h, a localization parameter that measures how close the true parameter is to the boundary (P(0) = 0.5 and limh→∞ P(h) = 1).2 As h is unknown and not consistently estimable, a natural approach to size-corrected inference is based on the least favorable configuration. In the special case with one parameter near or at the boundary, the least favorable configuration is given by h = 0, such that the critical value is based on 0.5χ2(H − p) + 0.5χ2(H − p + 1).3 By construction, the resulting test is conservative for h > 0 and may thus sacrifice power, as illustrated in a small Monte Carlo study. This paper adds to the recent literature concerned with the behavior of the J-statistic under nonstandard conditions. Dovonon and Renault (2013) (DR) derive the asymptotic distribution of the J-statistic when some of the parameters are not first-order identified. This phenomenon occurs, for example, when testing for common features in asset returns. The asymptotic distribution result in DR bears a certain resemblance to ours. In particular, for a special case of theirs, they find that the asymptotic null distribution of the J-statistic is given by 0.5χ2(H − 1) + 0.5χ2(H), which corresponds to the special case described above when p = 1 and h = 0, i.e., the true parameter is at the boundary.4 In contrast to our setup, however, the asymptotic null distribution of the J-statistic in DR does not depend on the true parameter value. DR tie their result to the nonstandard asymptotic distribution of the GMM estimator, which results from the lack of first-order identification.

1 See Ketz (2015) for a discussion.
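To fix ideas, the least favorable critical value implied by the mixture 0.5χ2(H − p) + 0.5χ2(H − p + 1) can be computed numerically. The sketch below is illustrative only (it is not part of the paper and assumes Python with SciPy); it solves for the value c at which the mixture's tail probability equals the nominal level:

```python
# Least-favorable critical value for the J-test with one boundary parameter:
# solve  0.5*P(chi2_{H-p} > c) + 0.5*P(chi2_{H-p+1} > c) = alpha  for c.
# Illustrative sketch; H, p, and alpha below are example values.
from scipy.stats import chi2
from scipy.optimize import brentq

def lf_critical_value(H, p, alpha=0.05):
    """Critical value of the mixture 0.5*chi2(H-p) + 0.5*chi2(H-p+1)."""
    df = H - p
    tail = lambda c: 0.5 * chi2.sf(c, df) + 0.5 * chi2.sf(c, df + 1) - alpha
    return brentq(tail, 1e-8, 100.0)

# Example: H = 6 moments, p = 4 parameters (as in the Monte Carlo of Section 4).
cv = lf_critical_value(6, 4)
print(round(cv, 3))  # lies between the chi2(2) and chi2(3) 5% critical values
```

By construction this quantile exceeds the standard χ2(H − p) critical value, which is the source of the conservativeness discussed above.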
Recently, Lee and Liao (2014) have suggested the use of additional moment conditions in order to overcome the lack of first-order identification encountered in DR. In particular, they exploit the fact that the lack of first-order identification implies additional moment conditions, as the Jacobian of the original moment equals zero. Lee and Liao (2014) show that the J-statistic based on the extended set of moment conditions has the standard asymptotic null distribution. As a by-product, their approach also restores asymptotic normality of the GMM estimator. Irrespective of whether a similar strategy is feasible here, it would be highly undesirable, requiring that the true parameter be at the boundary under the null hypothesis. This being said, our solution is similar in spirit to Lee and Liao (2014) in that both the resulting estimator and the resulting J-statistic have standard asymptotic distributions. The outline of this paper is as follows. In Section 2, we derive the nonstandard asymptotic distribution of Hansen’s J-statistic when the true parameter vector may be near or at the boundary of the parameter space. In Section 3, we propose a modified J-statistic and derive its asymptotic distribution. The results of a small Monte Carlo study are presented in Section 4. Section 5 concludes.

2 Note that even in the just-identified case, this implies that the J-statistic has a non-degenerate distribution as long as h < ∞.
3 P(h)χ2(H − p) + (1 − P(h))χ2(H − p + 1) is stochastically dominated by 0.5χ2(H − p) + 0.5χ2(H − p + 1) for h > 0.
4 In fact, when the true parameter is at the boundary, the results in DR can be used (upon reparameterization) to derive our asymptotic distribution results, at least for some models considered here. See the last paragraph of Section 2 for details.
2 Hansen’s J-statistic

We consider the following population moment

G(θ; γ) ≡ Eγ[g(θ, wi)], (1)

where g(θ, wi) denotes an H-dimensional vector function which depends on the p-dimensional vector θ and some data, wi. We assume that the distribution of the data is fully characterized by the vector γ = (θ, φ) ∈ Γ and let Eγ denote the corresponding expectation. Here, φ denotes an infinite-dimensional nuisance parameter.5 The parameter space, Γ, is assumed to be compact and of the following form Γ = {γ = (θ, φ) : θ ∈ Θ, φ ∈ Φ(θ)} with6

Θ = [−c, c]p1 × [0, c]p2, (2)

where 0 < c < ∞ and p1, p2 ∈ N such that p = p1 + p2, with θ = (θ1, θ2) denoting a conformable partition.7 Furthermore, Φ(θ) ⊂ Φ ∀θ ∈ Θ for some compact parameter space Φ with a metric that induces weak convergence of the bivariate distributions (wi, wi+m) for all i, m ≥ 1. In order to define the J-statistic, we first introduce some additional notation. Define the sample moment corresponding to the population moment given in equation (1) as follows

Gn(θ) ≡ (1/n) Σni=1 g(θ, wi). (3)

The GMM objective function is given by

Qn(θ) = Gn(θ)′Wn Gn(θ)/2, (4)

where Wn denotes an asymptotically nonsingular weighting matrix. Possible dependence of Wn on θ is suppressed for notational convenience and can be justified by the use of a consistent first-step estimator. Define an estimator, θ̂n, as any random variable that satisfies θ̂n ∈ Θ and

Qn(θ̂n) = minθ∈Θ Qn(θ) + op(1). (5)

Hansen’s J-statistic is given by

Jn = 2nQn(θ̂n). (6)

The J-statistic naturally lends itself to testing the null hypothesis that the economic model is well specified, i.e., H0 : Eγ0[g(θ0, wi)] = 0, where γ0 = (θ0, φ0) denotes the (fixed) true parameter.

5 Despite the presence of φ, the model is parametric in nature, as φ is not estimated.
6 The normalization to 0 is WLOG.
7 Here and in what follows, we let (a, b) denote the vector (a′, b′)′, where a and b are column vectors.
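To illustrate the objects just defined, the following sketch (illustrative only, not the paper's code; it assumes Python with NumPy/SciPy) computes Gn, Qn, and Jn for a toy moment g(θ, wi) = wi − (θ, θ)′ with H = 2, p = 1, Wn = I, and θ restricted to lie in [0, c]:

```python
# Toy GMM example: two sample moments, one parameter theta in Theta = [0, c].
# g(theta, w_i) = w_i - (theta, theta)', W_n = I; J_n = 2 n Q_n(theta_hat).
# Minimal sketch; the data-generating values are illustrative only.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n = 500
theta_true = 0.0                      # true parameter, at the boundary of [0, c]
w = theta_true + rng.standard_normal((n, 2))

def Q_n(theta):
    G = w.mean(axis=0) - theta        # sample moment G_n(theta), H = 2
    return 0.5 * G @ G                # GMM objective, equation (4), with W_n = I

# Restricted estimator: minimize over Theta = [0, c], here with c = 10
res = minimize_scalar(Q_n, bounds=(0.0, 10.0), method="bounded")
J_n = 2 * n * res.fun                 # Hansen's J-statistic, equation (6)
print(res.x, J_n)
```

Because the two moment components here are independent with unit variance, Wn = I coincides with the efficient choice W = V−1 in this toy design.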
In the standard case, where θ0 ∈ int(Θ), the asymptotic null distribution of Jn is given by χ2(H − p), if Wn →p Eγ0[g(θ0, wi)g(θ0, wi)′]−1. Intuitively, the estimator, θ̂n, sets exactly p out of H sample moments equal to 0. Under the above null hypothesis, the remaining H − p sample moments should be close to zero, such that their quadratic form (appropriately scaled) is asymptotically χ2 distributed with H − p degrees of freedom. In this paper, we are interested in obtaining good finite-sample approximations to the behavior of the J-statistic when the true parameter is near or at the boundary of the parameter space. Therefore, it is essential to derive asymptotic theory under certain drifting sequences of true parameters. Let γn = (θn, φn) = (θ1n, θ2n, φn) ∈ Γ denote a sequence of true parameters.8 The sequences of primary interest are Γ(γ0, h) = {{γn ∈ Γ(γ0) : n ≥ 1} : √n θ2n → h, where ‖h‖ < ∞}, where9 Γ(γ0) = {{γn ∈ Γ : n ≥ 1} : γn → γ0 ∈ Γ}. Note that γn ∈ Γ(γ0, h) is such that θ2n → 0, i.e., the sequence of true parameters drifts towards the boundary. Under sequences of true parameters, the null hypothesis under test becomes

H0 : Eγn[g(θn, wi)] = 0. (7)

Next, we state several high-level assumptions under which the asymptotic theory in this paper is derived. They are stated for general population and sample moments. In particular, the moments need not take the form given in equations (1) and (3). The assumptions are taken from Andrews and Cheng (2014a) with slight modifications, as we do not allow for varying degrees of identification strength. The first assumption concerns consistency.10

Assumption 1. (i) Under {γn} ∈ Γ(γ0), supθ∈Θ ‖Gn(θ) − G(θ; γ0)‖ →p 0 and supθ∈Θ ‖Wn(θ) − W(θ; γ0)‖ →p 0 for some nonrandom functions G(θ; γ0) and W(θ; γ0). (ii) G(θ; γ0) = 0 if and only if θ = θ0. (iii) G(θ; γ0) has continuous left/right (l/r) partial derivatives on Θ, denoted Gθ(θ; γ0).

8 Then, {wi}ni=1 forms a triangular array.
9 For ease of notation, we omit the index n. With a slight abuse of notation, we now let γ0 denote the limit of the sequence of true parameters.
10 Assumption 1 (i), (ii), (iii), (v), (vi), and (vii) correspond to Assumption GMM1 (ii), (iv), (v), (vi), (vii), and (ix) in Andrews and Cheng (2014a), respectively. And Assumption 1 (iv) corresponds to (part of) Assumption GMM5 (iii) in Andrews and Cheng (2014a).

(iv) Gθ ≡ Gθ(θ0; γ0) has full column rank. (v) W(θ; γ0) is continuous in θ on Θ. (vi) W ≡ W(θ0; γ0) is nonsingular. (vii) Θ is compact.

Note that, by Theorem 6 in Andrews (1999), Assumption 1(iii) implies that

G(θ; γ0) = G(θn; γ0) + Gθ(θn; γ0)(θ − θn) + o(‖θ − θn‖). (8)

The next assumption ensures that the objective function is asymptotically well approximated by a quadratic function, see below.11

Assumption 2. Under {γn} ∈ Γ(γ0),

sup{θ∈Θ: ‖θ−θn‖≤δn} √n ‖Gn(θ) − G(θ; γ0) − Gn(θn) + G(θn; γ0)‖ / (1 + ‖√n(θ − θn)‖) = op(1)

for all constants δn → 0.

The following assumption is sufficient for Assumption 2 and can often be verified using a ULLN, see, e.g., Andrews (1992).12

Assumption 2∗. (i) Gn(θ) has continuous l/r partial derivatives on Θ ∀n ≥ 1. (ii) Under {γn} ∈ Γ(γ0),

sup{θ∈Θ: ‖θ−θn‖≤δn} ‖∂Gn(θ)/∂θ′ − Gθ‖ = op(1)

for all constants δn → 0.

The last assumption concerns the asymptotic behavior of the scaled sample moment.13

Assumption 3. Under {γn} ∈ Γ(γ0), √n Gn(θn) →d Y ∼ N(0, V) for some symmetric and positive definite matrix V.14

By Lemma 3.1 in Andrews and Cheng (2014a), Assumptions 1(i) and (v) ensure that under {γn} ∈ Γ(γ0) ∀γ0 ∈ Γ we have

Wn ≡ Wn(θ̄n) = W + op(1), (9)

11 Assumption 2 corresponds to Assumption GMM2(ii) in Andrews and Cheng (2014a).
12 Assumption 2∗(i) corresponds to Assumption GMM2∗(i) in Andrews and Cheng (2014a), while Assumption 2∗(ii) subsumes Assumptions GMM2∗(iii) and GMM5(ii) and (iii) in Andrews and Cheng (2014a).
13 Assumption 3 corresponds to Assumption GMM5(i) in Andrews and Cheng (2014a).
14 For notational convenience, we omit the dependence of V on γ0.

where θ̄n denotes a consistent first-step estimator, which is available given Assumption 1, e.g., with Wn(θ) and W(θ; γ0) equal to the identity matrix. Before formally stating the asymptotic distribution result for Hansen’s J-statistic, we give a heuristic argument. Under the above assumptions, the objective function admits the following quadratic approximation

Qn(θ) = Qn(θn) + Gn(θn)′W Gθ(θ − θn) + (θ − θn)′Gθ′W Gθ(θ − θn)/2 + Rn(θ),

where Rn(θ) denotes the remainder. Let

I = Gθ′W Gθ, Zn = I−1 Gθ′W √n Gn(θn), and qn(λ) = (λ + Zn)′I(λ + Zn).

Then, the above equation can be written as

Qn(θ) = Qn(θn) − (1/(2n)) Zn′I Zn + (1/(2n)) qn(√n(θ − θn)) + Rn(θ).

It can be shown that √n(θ̂n − θn) is asymptotically equivalent to the minimizer of qn(λ) subject to λ ∈ Λ, say λ̂n. Here, Λ denotes the limit of √n(Θ − θn). Under {γn} ∈ Γ(γ0, h), Λ depends on h and we write Λh, where Λh = Rp1 × [−h1, ∞) × · · · × [−hp2, ∞). Note that

qn(λ̂n) = λ̂n′I(λ̂n + Zn) + Zn′I(λ̂n + Zn).

Since λ̂n′I(λ̂n + Zn) = 0 (see, e.g., the proof of part (c) of Theorem 3 in Andrews (1999)), which implies that λ̂n′I λ̂n = −λ̂n′I Zn, we have that qn(λ̂n) = −λ̂n′I λ̂n + Zn′I Zn, such that

Qn(θ̂n) = Qn(θn) − (1/(2n)) λ̂n′I λ̂n + Rn(θ̂n).

It follows that15

Jn = 2nQn(θ̂n) = 2nQn(θn) − λ̂n′I λ̂n + op(1). (10)

In order to formally state the asymptotic distribution of Hansen’s J-statistic, we introduce some additional notation. In particular, let q(λ) = (λ + Z)′I(λ + Z), where16

Z = I−1 Gθ′W Y ∼ N(0, I−1 Gθ′W V W Gθ I−1). (11)

15 Here, we use Rn(θ̂n) = op(1/n), which follows from the proof of Proposition 1.

Proposition 1.
Under {γn} ∈ Γ(γ0) and Assumptions 1, 2, and 3, Hansen’s J-statistic, Jn, defined in equation (6) is asymptotically distributed as Y′W Y − λ̂′I λ̂, where λ̂ denotes the minimizer of q(λ) subject to λ ∈ Λ.

The proof of Proposition 1 is outlined above. Details can be found in Appendix A.

Remark 1. When the true parameter vector is “far enough” from the boundary, i.e., √n(Θ − θn) → Rp, then λ̂ = −Z. If, in addition, W = V−1, then the asymptotic distribution result reduces to the standard result, i.e., Jn →d χ2(H − p).

Remark 2. Under {γn} ∈ Γ(γ0, h), the asymptotic distribution of Jn is nonstandard and depends on h. When p2 = 1, i.e., there is only one (scalar) parameter which is restricted below by zero, there exists a simple closed-form expression for the asymptotic distribution of Jn. Let P(h) denote the probability that Zp < h, where Zp denotes the pth (last) element of Z. Then, if W = V−1, the asymptotic distribution of Jn is given by P(h)χ2(H − p) + (1 − P(h))χ2(H − p + 1).

The closed-form expression for the asymptotic null distribution of the J-statistic given in Remark 2, albeit only valid for p2 = 1, allows us to illustrate several important points. First, when the true parameter is near or at the boundary (h < ∞), the test that compares the J-statistic to the standard critical value, based on χ2(H − p), suffers from overrejection, since χ2(H − p) is stochastically dominated by P(h)χ2(H − p) + (1 − P(h))χ2(H − p + 1). Furthermore, it is easy to see that by comparing the J-statistic to the appropriate quantile of 0.5χ2(H − p) + 0.5χ2(H − p + 1) we obtain a test that controls size, as P(h)χ2(H − p) + (1 − P(h))χ2(H − p + 1) is stochastically dominated by 0.5χ2(H − p) + 0.5χ2(H − p + 1) for all h > 0 (including h = ∞). The resulting test is conservative by construction whenever h > 0. Note that using the appropriate quantile of P(h)χ2(H − p) + (1 − P(h))χ2(H − p + 1) is infeasible in practice, as h is unknown and not consistently estimable.
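The quadratic program defining λ̂ is straightforward to solve numerically. The sketch below (illustrative values only; it assumes Python with NumPy/SciPy and is not the paper's code) computes λ̂ for p1 = p2 = 1 and h = 0, i.e., Λ0 = R × [0, ∞), and checks the orthogonality condition λ̂′I(λ̂ + Z) = 0 used in the derivation of equation (10):

```python
# Constrained quadratic program behind Proposition 1:
# lambda_hat = argmin over Lambda_0 of (lambda + Z)' I (lambda + Z),
# with Lambda_0 = R x [0, inf) (p1 = p2 = 1, h = 0). Numbers are illustrative.
import numpy as np
from scipy.optimize import minimize

I_mat = np.array([[2.0, 0.5], [0.5, 1.0]])   # I = G_theta' W G_theta
Z = np.array([0.3, 0.8])                     # one realization of Z

q = lambda lam: (lam + Z) @ I_mat @ (lam + Z)
res = minimize(q, x0=np.zeros(2), bounds=[(None, None), (0.0, None)])
lam_hat = res.x
print(lam_hat)                               # approx [-0.5, 0.0] for these inputs

# Orthogonality (Andrews, 1999, Thm 3(c)): lambda_hat' I (lambda_hat + Z) = 0,
# so q(lambda_hat) = -lambda_hat' I lambda_hat + Z' I Z, as used for (10).
print(lam_hat @ I_mat @ (lam_hat + Z))       # approx 0
```

Here the unconstrained minimizer −Z violates the boundary constraint, so λ̂ sits on the face λ2 = 0, which is exactly the event whose probability 1 − P(h) governs the mixture in Remark 2.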
Lastly, we establish the link between our results and the results in Dovonon and Renault (2013). In particular, we heuristically show how the theory in Dovonon and Renault (2013) can be used to derive the convergence rate of the GMM estimator and the asymptotic distribution of the corresponding J-statistic in the context of the random coefficients logit model (Berry, Levinsohn, and Pakes, 1995), when the true parameter vector is at the boundary. In Berry, Levinsohn, and Pakes (1995), the random coefficients are parameterized with respect to a vector of means and a vector of standard deviations. Here, we consider the special case of a single random coefficient with known mean and unknown standard deviation, σ. Then, Proposition 1 applies after a reparameterization in terms of variances, σ2 (Ketz, 2015). In the notation of this paper, the true parameter is at the boundary if h = 0 and σT2 = 0 ∀T, where T denotes the sample size. For the purpose of this discussion, we treat the true parameter as fixed and write σ02 = 0. Then, Remark 2 implies that Jn is asymptotically distributed as 0.5χ2(H − 1) + 0.5χ2(H), since p1 = 0 and p = p2 = 1. This result is also obtained as a special case of Corollary 3.2 in Dovonon and Renault (2013) with p = 1 under the original parameterization with respect to σ. To see this, note that the random coefficients logit model in its original parameterization is not first-order identified at σ = 0 (Ketz, 2015). Furthermore, defining σ = ±√σ2 we can treat σ0 = 0 as an interior point (for the purpose of estimation) such that the assumptions of Corollary 3.2 in Dovonon and Renault (2013) are satisfied. Similarly, Propositions 3.1 and 3.2 in Dovonon and Renault (2013) apply.

16 For notational convenience, we omit the dependence of Z on γ0.
In particular, T1/4 σ̂T is Op(1) and asymptotically either equals zero (with probability 0.5) or has a nondegenerate distribution with zero-probability mass at zero (with probability 0.5), as illustrated in Section 6 in Ketz (2015).17

3 Modified J-statistic

Although it is possible to perform size-controlled inference based on the J-statistic using, for example, the least favorable configuration approach, it has two disadvantages.18 First, the construction of critical values is, if not difficult, nonstandard and requires simulation if p2 > 1. Second, the resulting test may unnecessarily sacrifice power when the true parameter is “far” from the boundary of the parameter space. In this paper, we propose an alternative J-statistic based on the idea in Ketz (2017a), which under the null hypothesis given in equation (7) has the standard χ2(H − p) distribution. Consider the following quadratic approximation to the objective function19

Mn(θ) = Gn(θn)′W Gn(θn)/2 + Gn(θn)′W Gθ(θ − θn) + (θ − θn)′Gθ′W Gθ(θ − θn)/2. (12)

The minimizer of Mn(θ), say θ̈n, has the following closed-form expression

θ̈n = θn − (Gθ′W Gθ)−1 Gθ′W Gn(θn).

Under the null, Mn(θ) (multiplied by 2n) evaluated at θ̈n has the same asymptotic distribution as the standard J-statistic, χ2(H − p). To see this, let W = AA′ and Gθ′W Gθ = B′B. Then,

2nMn(θ̈) = nGn(θn)′W Gn(θn) − nGn(θn)′W Gθ(Gθ′W Gθ)−1 Gθ′W Gn(θn) (13)

can be written as

(√n A′Gn(θn))′ (I − B(B′B)−1B′) (√n A′Gn(θn)),

where (√n A′Gn(θn)) →d N(0, I) and (I − B(B′B)−1B′) is an idempotent matrix of rank H − p, and the result follows.

17 Instead of σ̂T = +√σ̂T2, we could also let σ̂T = −√σ̂T2.
18 Alternatively, one could apply the methods developed in McCloskey (2016).
19 The only difference with the quadratic approximation given in Section 2 is that Qn(θn) is replaced by Gn(θn)′W Gn(θn)/2, see equation (4).
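In a linear-moment example these objects can be computed explicitly. The following sketch (not the paper's code; a toy design with g(θ, wi) = wi − (θ, θ)′, θn = 0, W = I, all values illustrative) evaluates the closed form for θ̈n and verifies numerically that 2nMn(θ̈) coincides with the projection form in equation (13):

```python
# Infeasible statistic 2n M_n(theta_ddot) in a toy linear-moment design:
# H = 2, p = 1, g(theta, w_i) = w_i - (theta, theta)', theta_n = 0, W = I.
# theta_ddot = theta_n - (G'WG)^{-1} G'W G_n(theta_n); equation (13) checked.
import numpy as np

rng = np.random.default_rng(1)
n = 500
theta_n = 0.0                              # true parameter, at the boundary
w = theta_n + rng.standard_normal((n, 2))

Gn = w.mean(axis=0) - theta_n              # G_n(theta_n)
G = np.array([-1.0, -1.0])                 # Jacobian G_theta (H x p, p = 1)
W = np.eye(2)

b = G @ W @ G                              # G' W G, a scalar since p = 1
theta_ddot = theta_n - (G @ W @ Gn) / b    # closed-form minimizer of M_n

# M_n evaluated at theta_ddot, times 2n (direct evaluation of equation (12))
d = theta_ddot - theta_n
J_inf = 2 * n * (Gn @ W @ Gn / 2 + (Gn @ W @ G) * d + d * b * d / 2)

# Equation (13): projection form of the same quantity
J_proj = n * (Gn @ W @ Gn - (Gn @ W @ G) ** 2 / b)
print(J_inf, J_proj)                       # the two expressions agree
```

The agreement of the two expressions is exact up to floating-point error, since the moment is linear in θ in this design.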
Using 2nMn(θ̈) as a J-statistic is, however, not feasible, as it depends on unknown quantities. Instead, we propose the following test statistic

JnM = 2nM̂n(θ̃n). (14)

Here, θ̃n denotes the minimizer of M̂n(θ), which denotes the sample analogue of Mn(θ). In particular, let Ĝθ,n denote a consistent estimator of Gθ, which by Assumptions 1-3 exists.20 If, for example, Assumption 2∗ holds, we can take Ĝθ,n = ∂Gn(θ̂n)/∂θ′. Then, M̂n(θ) is given by

M̂n(θ) = Gn(θ̂n)′Wn Gn(θ̂n)/2 + Gn(θ̂n)′Wn Ĝθ,n(θ − θ̂n) + (θ − θ̂n)′Ĝθ,n′Wn Ĝθ,n(θ − θ̂n)/2. (15)

Next, we state the main result of this paper.

Theorem 1. Under {γn} ∈ Γ(γ0) and Assumptions 1, 2, and 3, √n(θ̃n − θn) →d Z, where Z is given in equation (11). If, in addition, W = V−1, then the modified J-statistic, JnM, given in equation (14) is asymptotically distributed as χ2(H − p).

The proof of Theorem 1 is given in Appendix A. The first conclusion is obtained by verifying Assumptions 1-3 in Ketz (2017a), while the second conclusion is obtained by showing that M̂n(θ̃n) = Mn(θ̈) + op(1/n). The modified J-statistic incurs no additional computing cost compared to the standard J-statistic, cf. equation (15). Given its standard asymptotic null distribution, the test based on the modified J-statistic constitutes an attractive alternative to the test based on the J-statistic, which requires an adjustment of the critical values in order to ensure size control.

Remark 3. It follows from the proof of Theorem 1 that we only require θ̂n to be √n-consistent. In particular, Mn(θ) and M̂n(θ) are not required to rely on the same (asymptotic) weighting matrix. As in the context of one-step estimators (Newey and McFadden,

20 See, for example, the estimator described on p. 1043 in Pakes and Pollard (1989), which by Assumption 2 is consistent, cf. Lemma 1 in Andrews (2002).
1994), this allows for θ̂n to be obtained using an inefficient weighting matrix, which in some cases may entail considerable computational benefits.

4 Monte Carlo

This Monte Carlo study serves two purposes. First, we show that the asymptotic theory derived in Sections 2 and 3 provides good finite-sample approximations. Second, we conduct a power comparison of the two tests that control size, i.e., the test that compares the J-statistic to adjusted critical values and the test that compares the modified J-statistic to standard critical values. In order to assess the quality of the approximations provided by our asymptotic theory, we turn to the random coefficients logit model introduced by Berry, Levinsohn, and Pakes (1995). The restrictions on the parameter space arise naturally in this model, as each random coefficient is parameterized with respect to a mean and a variance, which is bounded below by zero. The fact that the boundary is “hard” suits the purpose of this Monte Carlo study, as the contribution of this paper is the availability of the modified J-statistic in models where the objective function is not defined outside the parameter space.21 The data generating process is taken from Ketz (2015) with the only difference that instead of using the optimal instrument for the endogenous variable, xjt,1, we employ all three instruments, zjt. Then, we have H = 6 and p = 4 with p1 = 3 and p2 = 1. For the sake of brevity, we refrain from reproducing the details of the data generating process here. Table 1 below shows the null rejection frequencies of the three different tests when the true parameter is at the boundary, i.e., σ2 = 0. The first test compares the J-statistic to the standard critical value, the second test compares the J-statistic to the adjusted critical value, and the last test compares the modified J-statistic to the standard critical value.
Table 1: Null rejection frequencies

Test statistic         Jn       Jn                     JnM
5% Critical value      χ2(2)    0.5χ2(2) + 0.5χ2(3)    χ2(2)
Rejection frequency    0.090    0.058                  0.056

Table 1 shows that our asymptotic theory provides good finite-sample approximations. The first test overrejects at the 5% significance level with a null rejection frequency of 9%. The other two tests have null rejection frequencies close to the nominal level of 5%, with null rejection frequencies of 5.8% and 5.6%, respectively.

21 In models where the boundary is merely user-imposed—e.g., the coefficient on price is a priori assumed to be nonpositive—it is possible to employ the “standard” J-statistic, i.e., the objective function evaluated at the unrestricted estimator, which simply ignores the restrictions on the parameter space.

In what follows, we let Jn denote the J-statistic or the test based on that statistic using adjusted critical values. Equivalently, we let JnM denote the modified J-statistic or the test based on it, which uses standard critical values. In order to compare Jn and JnM in terms of power, we make use of a simple linear Instrumental Variables (IV) model, where we “artificially” restrict the slope parameter to be nonnegative. Compared to the random coefficients logit model, the simple linear IV model provides more transparency with respect to the alternatives under which we evaluate power. Note that in this context, the modified J-statistic reduces to the “standard” J-statistic that ignores the “artificial” restrictions on the parameter space, see also footnote 21. The model is given by the equation of interest

yi = β0 + β1 xi + ui

and the reduced form equation for xi, the endogenous regressor,

xi = π0 + π1 z1i + π2 z2i + vi.

The data are generated according to

(ui, vi, z1i, z2i)′ ∼ N(0, Σ), where Σ =
[ 1      0.5    σuz1   σuz2 ]
[ 0.5    1      0      0    ]
[ σuz1   0      1      0.2  ]
[ σuz2   0      0.2    1    ].

The parameter values are β0 = π0 = π1 = π2 = 1.
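A single replication of this IV design can be sketched as follows (illustrative only, not the simulation code used in the paper; it assumes Python with NumPy and uses unrestricted 2SLS as a simple stand-in for the unrestricted GMM minimizer behind JnM):

```python
# One draw from the linear IV design of Section 4 under the null:
# sigma_uz1 = sigma_uz2 = 0 (valid instruments) and beta1 = 0 (at the boundary).
import numpy as np

rng = np.random.default_rng(42)
n = 250
# (u, v, z1, z2): corr(u, v) = 0.5, corr(z1, z2) = 0.2, instruments valid
S = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.2],
              [0.0, 0.0, 0.2, 1.0]])
u, v, z1, z2 = rng.multivariate_normal(np.zeros(4), S, size=n).T
x = 1.0 + z1 + z2 + v                        # reduced form, pi0 = pi1 = pi2 = 1
y = 1.0 + 0.0 * x + u                        # beta0 = 1, beta1 = 0

Z = np.column_stack([np.ones(n), z1, z2])    # z_i = (1, z1_i, z2_i)'

def J_stat(b0, b1):
    """J_n(b0, b1) as defined in Section 4."""
    uhat = y - b0 - b1 * x
    g = Z.T @ uhat / n                       # (1/n) sum_i u_i(b0, b1) z_i
    Szz = Z.T @ Z / n
    return n * g @ np.linalg.solve(Szz, g) / np.mean(uhat ** 2)

# Unrestricted 2SLS estimates (b1 may come out negative)
X = np.column_stack([np.ones(n), x])
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_unres = np.linalg.lstsq(Xhat, y, rcond=None)[0]

JM_val = J_stat(*b_unres)                    # J_n^M; compare to chi2(1)
print(b_unres, JM_val)
```

Repeating this over many draws, comparing JM_val to the χ2(1) critical value and the restricted analogue to the 0.5χ2(1) + 0.5χ2(2) critical value, reproduces exercises of the kind reported in Tables 2 and 3.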
In the tables below, we vary σuz1 and σuz2 in order to assess power under different alternatives. We also vary β1 to assess the impact of the restrictions on the parameter space on power. We impose that β1 cannot be negative, i.e., β1 ≥ 0. We choose n, the sample size, equal to 250 and perform 1000 Monte Carlo simulations. Let β̂0, β̂1 and β̃0, β̃1 denote the restricted and the unrestricted estimators, respectively. Furthermore, let zi = (1, z1i, z2i)′, ui(β0, β1) = yi − β0 − β1 xi, and

σ2(β0, β1) = (1/n) Σni=1 ui(β0, β1)2.

Then, the J-statistic is given by

Jn(β0, β1) = n ((1/n) Σni=1 ui(β0, β1)zi)′ ((1/n) Σni=1 zizi′)−1 ((1/n) Σni=1 ui(β0, β1)zi) / σ2(β0, β1),

such that Jn = Jn(β̂0, β̂1) and JnM = Jn(β̃0, β̃1). Here, H = 3 and p = 2 with p1 = p2 = 1. Therefore, the critical values for Jn and JnM are based on 0.5χ2(1) + 0.5χ2(2) and χ2(1), respectively. Table 2 shows the rejection frequencies of Jn and JnM at the 5% significance level alongside the averages of the estimates over the 1000 Monte Carlo replications for different values of σuz1 and σuz2, while β1 = 0, i.e., the true parameter is at the boundary.
Table 2: Power analysis - β1 = 0

σuz1   σuz2    β̃0       β̃1       β̂0       β̂1       RF(JnM)   RF(Jn)
-0.3    0      1.1253   -0.1258   0.9986    0.0000    0.9460    0.9990
-0.2    0      1.0837   -0.0845   0.9985    0.0002    0.6680    0.8830
-0.1    0      1.0421   -0.0431   0.9961    0.0026    0.2210    0.3440
 0      0      1.0006   -0.0017   0.9836    0.0150    0.0490    0.0550
 0.1    0      0.9590    0.0397   0.9555    0.0432    0.2600    0.1690
 0.2    0      0.9174    0.0812   0.9171    0.0815    0.7420    0.6260
 0.3    0      0.8757    0.1228   0.8757    0.1228    0.9860    0.9750
-0.3   -0.3    1.2498   -0.2495   0.9987    0.0000    0.0590    1.0000
-0.2   -0.2    1.1667   -0.1670   0.9987    0.0000    0.0620    0.9890
-0.1   -0.1    1.0837   -0.0844   0.9985    0.0002    0.0530    0.4970
 0      0      1.0006   -0.0017   0.9836    0.0150    0.0490    0.0550
 0.1    0.1    0.9174    0.0812   0.9171    0.0815    0.0460    0.0250
 0.2    0.2    0.8341    0.1644   0.8341    0.1644    0.0400    0.0210
 0.3    0.3    0.7506    0.2479   0.7506    0.2479    0.0420    0.0210

Analogously to Table 1, Table 2 shows that both tests control size. The upper half of Table 2 reveals that for some alternatives, σuz1 < 0 and σuz2 = 0, Jn has better power, while for other alternatives, σuz1 > 0 and σuz2 = 0, JnM has better power. Looking at the (averages of the) estimates, we see that the performances of the two tests are closely linked to the direction of the bias resulting from the violation of the moment conditions. In particular, Jn does well when the bias of the unrestricted estimator is such that the restricted estimator becomes unbiased, whereas when the bias is in the other direction, Jn underperforms compared to JnM. This results from the estimator being biased towards the interior of the parameter space, such that estimates eventually cease to be restricted, see, e.g., σuz1 = 0.3 and σuz2 = 0, leading to identical J-statistics (Jn = JnM). Since Jn uses a larger critical value, it sacrifices power in comparison. The lower half of Table 2 reveals that JnM only has trivial power against some alternatives, which is not surprising given that it corresponds to the “standard” overidentification test when there are no restrictions on the parameter space.
The lack of power of the overidentification test for certain alternatives has, for example, been pointed out in Small (2007). Jn, on the other hand, has good power for some of these alternatives, which follows from the above reasoning.

Table 3: Power analysis - β1 = 0.1

σuz1   σuz2    β̃0       β̃1       β̂0       β̂1       RF(JnM)   RF(Jn)
-0.3    0      1.1253   -0.0258   1.0927    0.0065    0.9460    0.9370
-0.2    0      1.0837    0.0155   1.0733    0.0258    0.6680    0.5740
-0.1    0      1.0421    0.0569   1.0403    0.0587    0.2210    0.1410
 0      0      1.0006    0.0983   1.0004    0.0984    0.0490    0.0280
 0.1    0      0.9590    0.1397   0.9590    0.1397    0.2600    0.1630
 0.2    0      0.9174    0.1812   0.9174    0.1812    0.7420    0.6250
 0.3    0      0.8757    0.2228   0.8757    0.2228    0.9860    0.9750
-0.3   -0.3    1.2498   -0.1495   1.0992    0.0000    0.0590    0.9600
-0.2   -0.2    1.1667   -0.0670   1.0985    0.0007    0.0620    0.3460
-0.1   -0.1    1.0837    0.0156   1.0732    0.0259    0.0530    0.0380
 0      0      1.0006    0.0983   1.0004    0.0984    0.0490    0.0280
 0.1    0.1    0.9174    0.1812   0.9174    0.1812    0.0460    0.0250
 0.2    0.2    0.8341    0.2644   0.8341    0.2644    0.0400    0.0210
 0.3    0.3    0.7506    0.3479   0.7506    0.3479    0.0420    0.0210

Table 3 replicates Table 2 with β1 = 0.1, i.e., the true parameter is no longer at the boundary, but close to it relative to the sample size. For all alternatives considered in the upper half of Table 3, we see that JnM outperforms Jn. This is a direct consequence of the conservativeness of Jn when the true parameter is not at the boundary (see σuz1 = σuz2 = 0). The lower half of Table 3 shows a similar pattern as that of Table 2, except that Jn only starts to display nontrivial power for a relatively large violation of the moment conditions.

5 Conclusion

In this paper, we show that the overidentification test that compares the J-statistic, given by the GMM objective function evaluated at its minimizer, to standard critical values suffers from overrejection when the true parameter vector is near or at the boundary of the parameter space.
While it is possible to perform size-controlled inference based on the J-statistic using adjusted critical values, we suggest the use of a modified J-statistic, which is available at no additional computing cost and which, under the null hypothesis, has the standard asymptotic distribution. Therefore, testing overidentifying restrictions using the modified J-statistic is simple and has a “flavor” of making inference under standard conditions.

A Proofs

Proof of Proposition 1. Assumptions 1-3 correspond to Assumptions GMM1, GMM2, and GMM5 in Andrews and Cheng (2014a). Assumptions 2 and 3 in Ketz (2017a) and Assumption 6 in Ketz (2017b) correspond to Assumptions D1-D3 in Andrews and Cheng (2012). By Lemma 10.1 in Andrews and Cheng (2014b) and Lemma 3.1 in Andrews and Cheng (2012), Assumption GMM1 in Andrews and Cheng (2014a) implies Assumption 5 in Ketz (2017b). By Lemma 10.3 in Andrews and Cheng (2014b), Assumptions GMM1, GMM2, and GMM5 in Andrews and Cheng (2014a) imply Assumptions D1-D3 in Andrews and Cheng (2012). Therefore, Assumptions 1-3 imply Assumptions 2 and 3 in Ketz (2017a) and Assumptions 5 and 6 in Ketz (2017b) with DQn(θ) = Gθ′W Gn(θ) and D2Qn(θn) = J(γ∗) = Gθ′W Gθ. The result then follows by extending the result and the proof of Proposition 1 in Ketz (2017b) by part (c) of Theorem 3 in Andrews (1999).

Proof of Theorem 1. By Proposition 1, Assumption 1 in Ketz (2017a) is satisfied. Furthermore, from the proof of Proposition 1 it follows that Assumptions 2 and 3 in Ketz (2017a) are also satisfied, under Assumptions 1-3. It remains to show that Assumption 4 in Ketz (2017a) holds. Note that θ̃n is given by

θ̃n = θ̂n − (Ĝθ,n′Wn Ĝθ,n)−1 Ĝθ,n′Wn Gn(θ̂n).

Since, by Assumptions 1 and 2, Ĝθ,n′Wn Ĝθ,n = Gθ′W Gθ + op(1), Assumption 4(ii) in Ketz (2017a) is satisfied. Similarly, we have

(∂Gn(θ̂n)/∂θ′)′ Wn = Gθ′W + op(1).
∂θ0 n Therefore, it suffices to show that sup √ θ∈Θ:k n(θ−θn )k≤ √ kGn (θ) − Gn (θn ) + Gθ (θ − θn )k = op (1/ n), (16) which holds by Assumption 2 and equation (8), since kGθ (θn ; γ0 ) − Gθ (θ0 ; γ0 )k = o(kθn − 14 θ0 k). The second conclusion of the Theorem follows by showing that M̂n (θ̂n ) = Mn (θ̈) + op (1/n). (17) Analogously to equation (13), we can write 2nM̂n (θ̃) =nG0n (θ̂n )Wn Gn (θ̂n ) −1 −nG0n (θ̂n )Wn Ĝ0θ,n Ĝ0θ,n Wn Ĝθ,n Ĝθ,n Wn Gn (θ̂n ). From equation (16), it follows that to above, equation (18) satisfies (18) √ nGn (θ̂n ) = Op (1). Therefore, by arguments similar −1 2nM̂n (θ̃) = nG0n (θ̂n )W Gn (θ̂n ) − nG0n (θ̂n )W Gθ (G0θ W Gθ ) G0θ W Gn (θ̂n ) + op (1). By equation (16), the first part of the previous equation satisfies nG0n (θn )W Gn (θn ) + 2nG0n (θn )W Gθ (θ̂n − θn ) + (θ̂n − θn )G0θ W Gθ (θ̂n − θn ) + op (1) and the second part satisfies −1 nG0n (θn )W Gθ (G0θ W Gθ ) G0θ W Gn (θn )+2nG0n (θn )W Gθ (θ̂n − θn ) +(θ̂n − θn )G0θ W Gθ (θ̂n − θn ) + op (1). Combining the two gives equation (17), cf. equation (13). References Andrews, D. W. (1992). Generic Uniform Convergence. Econometric Theory 8, 241–257. Andrews, D. W. (1999). Estimation When a Parameter Is on a Boundary. Econometrica 67, 1341–1383. Andrews, D. W. (2002). Generalized Method of Moments Estimation When a Parameter Is on a Boundary. Journal of Business & Economic Statistics 20, 530–544. Andrews, D. W. and X. Cheng (2012). Estimation and Inference With Weak, SemiStrong, and Strong Identification. Econometrica 80, 2153–2211. Andrews, D. W. and X. Cheng (2014a). GMM Estimation and Uniform Subvector Inference With Possible Identification Failure. Econometric Theory 30, 287–333. 15 Andrews, D. W. and X. Cheng (2014b). Supplement to ‘GMM Estimation and Uniform Subvector Inference With Possible Identification Failure’. Econometric Theory 30. Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica 63, 841–890. Dovonon, P. and E. Renault (2013). 
Testing for Common Conditionally Heteroskedastic Factors. Econometrica 81, 2561–2586.

Hansen, L. (1982). Large Sample Properties of Generalized Method of Moments Estimators. Econometrica 50, 1029–1054.

Ketz, P. (2015). A Simple Solution to Invalid Inference in the Random Coefficients Logit Model. Working Paper.

Ketz, P. (2017a). Subvector Inference When the True Parameter Vector is near the Boundary. Working Paper.

Ketz, P. (2017b). Supplementary Material for “Subvector Inference When the True Parameter Vector is near the Boundary”.

Lee, J. H. and Z. Liao (2014). On Standard Inference for GMM With Seeming Local Identification Failure. Working Paper.

McCloskey, A. (2016). Bonferroni-Based Size-Correction for Nonstandard Testing Problems. Working Paper.

Newey, W. K. and D. McFadden (1994). Large Sample Estimation and Hypothesis Testing. Handbook of Econometrics 4, 2111–2245.

Pakes, A. and D. Pollard (1989). Simulation and the Asymptotics of Optimization Estimators. Econometrica 57, 1027–1057.

Small, D. S. (2007). Sensitivity Analysis for Instrumental Variables Regression With Overidentifying Restrictions. Journal of the American Statistical Association 102, 1049–1058.