TESTING A PARAMETRIC MODEL AGAINST A
NONPARAMETRIC ALTERNATIVE WITH IDENTIFICATION
THROUGH INSTRUMENTAL VARIABLES
Joel Horowitz
THE INSTITUTE FOR FISCAL STUDIES
DEPARTMENT OF ECONOMICS, UCL
cemmap working paper CWP14/04
TESTING A PARAMETRIC MODEL AGAINST A NONPARAMETRIC
ALTERNATIVE WITH IDENTIFICATION THROUGH INSTRUMENTAL VARIABLES
by
Joel L. Horowitz
Department of Economics
Northwestern University
Evanston, IL 60208
USA
September 2004
ABSTRACT
This paper is concerned with inference about a function g that is identified by a
conditional moment restriction involving instrumental variables. The paper presents a test of the
hypothesis that g belongs to a finite-dimensional parametric family against a nonparametric
alternative. The test does not require nonparametric estimation of g and is not subject to the ill-posed inverse problem of nonparametric instrumental variables estimation. Under mild conditions, the test is consistent against any alternative model and has asymptotic power advantages over existing tests. Moreover, it has power arbitrarily close to 1 uniformly over a class of alternatives whose distance from the null hypothesis is O(n^{-1/2}), where n is the sample size.
Keywords: Hypothesis test, instrumental variables, specification testing, consistent testing
_____________________________________________________________________________
Part of this research was carried out while I was a visitor at the Centre for Microdata Methods and
Practice, University College London. I thank Donald Andrews and Richard Blundell for helpful
discussions and comments. Research supported in part by NSF Grants SES 991092 and SES
0352675.
1. INTRODUCTION
Let Y be a scalar random variable, X and W be continuously distributed random
scalars or vectors, and g be a function that is identified by the relation
(1.1)   E[Y − g(X) | W] = 0.
In (1.1), Y is the dependent variable, X is a possibly endogenous explanatory variable, and W
is an instrument for X . This paper presents a test of the null hypothesis that g in (1.1) belongs
to a finite-dimensional parametric family against a nonparametric alternative hypothesis.
Specifically, let Θ be a compact subset of ℝ^d for some finite integer d > 0. The null hypothesis, H0, is that

(1.2)   g(x) = G(x, θ)

for some θ ∈ Θ and almost every x, where G is a known function. The alternative hypothesis,
H1, is that there is no θ ∈ Θ such that (1.2) holds for almost every x . Under mild conditions, the
test presented here is consistent against any alternative model and has asymptotic power
advantages over existing tests. In large samples its power is arbitrarily close to 1 uniformly over
a class of alternative models whose “distance” from H0 is O(n^{-1/2}), where n is the sample size.
There has been much recent interest in nonparametric estimation of g in (1.1). See, for
example, Newey, Powell and Vella (1999); Newey and Powell (2003); Darolles, Florens, and
Renault (2002); Blundell, Chen, and Kristensen (2003); and Hall and Horowitz (2003). Methods
for testing (1.2) against a nonparametric alternative have been developed by Donald, Imbens, and
Newey (2003) and Tripathi and Kitamura (2003). In addition, the test of a conditional mean
function developed by Bierens (1990) and Bierens and Ploberger (1997) can be modified to
provide a test of (1.2). Horowitz and Spokoiny (2001, 2002) provide extensive references to other
tests for conditional mean and quantile functions. The test presented here has asymptotic power
advantages over existing tests that permit X to have endogenous components. In addition,
among existing tests of (1.2) against a nonparametric alternative, only the test presented here is
uniformly consistent at a known rate over a known set of alternative hypotheses. Uniform
consistency is important because it provides some assurance that there are not alternatives against
which a test has low power even with large samples. If a test is not uniformly consistent over a
specified set, then that set contains alternatives against which the test has low power.
Testing is particularly important in (1.1) because it provides the only currently available
form of inference about g that does not require g to be known up to a finite-dimensional
parameter. Obtaining the asymptotic distribution of a nonparametric estimator of g is very
difficult, and no existing estimator has a known asymptotic distribution. Nor is there a currently
known method for obtaining a nonparametric confidence band for g . By contrast, the test
statistic described in this paper has a relatively simple asymptotic distribution, and
implementation of the test is not difficult.
The test developed here is not affected by the ill-posed inverse problem of nonparametric
instrumental variables estimation.
Consequently, the test’s “precision” exceeds that of any
nonparametric estimator of g. The rate of convergence of a nonparametric estimator of g is always slower than O_p(n^{-1/2}) and, depending on the details of the distribution of (Y, X, W), may be slower than O_p(n^{-ε}) for any ε > 0 (Hall and Horowitz 2003). In contrast, the test described here can detect a large class of nonparametric alternative models whose distance from the null-hypothesis model is O(n^{-1/2}). Nonparametric estimation and testing of conditional
mean and median functions is another setting in which the rate of testing is faster than the rate of
estimation. See Guerre and Lavergne (2002) and Horowitz and Spokoiny (2001, 2002).
Section 2 describes the test statistic and its properties. Section 3 presents the results of a
Monte Carlo investigation of the finite-sample performance of the test, and Section 4 presents an
illustrative application to real data. The proofs of theorems are in the appendix.
2. THE TEST STATISTIC AND ITS PROPERTIES
Rewrite (1.1) as

(2.1)   Y = g(X, Z) + U;   E(U | Z, W) = 0,

where Y and U are scalar random variables, X and W are random variables whose supports are contained in [0,1]^p (p ≥ 1), and Z is a random variable whose support is contained in [0,1]^r (r ≥ 0). If r = 0, then Z is not included in (2.1). X and Z, respectively, are endogenous and exogenous explanatory variables. W is an instrument for X. The assumption that supp(X, Z, W) ⊂ [0,1]^{2p+r} can always be satisfied by carrying out a monotone transformation of (X, Z, W). The inferential problem is to test the null hypothesis, H0, that
(2.2)   g(x, z) = G(x, z, θ)

for some unknown θ ∈ Θ, known function G, and almost every (x, z) ∈ [0,1]^{p+r}. The alternative hypothesis, H1, is that there is no θ ∈ Θ such that (2.2) holds for almost every (x, z) ∈ [0,1]^{p+r}.
The data, {Yi , X i , Zi ,Wi : i = 1,..., n} , are a simple random sample of (Y , X , Z ,W ) .
2.1 The Test Statistic
To form the test statistic, let f_XZW denote the probability density function of (X, Z, W), and let f_Z denote the probability density function of Z. Let ν be any function in L²[0,1]^{p+r}. For each z ∈ [0,1]^r, define the operator T_z on L²[0,1]^p by

T_z ν(x, z) = ∫ t_z(ξ, x) ν(ξ, z) dξ,

where for each (x1, x2) ∈ [0,1]^{2p},

t_z(x1, x2) = ∫ f_XZW(x1, z, w) f_XZW(x2, z, w) dw.
Assume that T_z is nonsingular for each z ∈ [0,1]^r. Then H0 is equivalent to

(2.3)   S(x, z) ≡ T_z[g(·,·) − G(·,·,θ)](x, z) = 0

for some θ ∈ Θ and almost every (x, z) ∈ [0,1]^{p+r}. H1 is equivalent to the statement that there is no θ ∈ Θ such that (2.3) holds. A test statistic can be based on a sample analog of

∫ S(x, z)² dx dz,
but the resulting rate of testing is slower than n^{-1/2} if r > 0. A rate of n^{-1/2} can be achieved by carrying out an additional smoothing step. To this end, if r > 0, let ℓ(z1, z2) denote the kernel of a nonsingular integral operator, L, on L²[0,1]^r. That is, the operator L defined by

Lν(z) = ∫ ℓ(ζ, z) ν(ζ) dζ

is nonsingular. Let L be the identity operator if r = 0. Define the operator T on L²[0,1]^{p+r} by Tν(x, z) = L T_z ν(x, z). Then T is nonsingular. H0 is equivalent to
(2.4)   S(x, z) ≡ T[g(·,·) − G(·,·,θ)](x, z) = 0

for some θ ∈ Θ and almost every (x, z) ∈ [0,1]^{p+r}. H1 is equivalent to the statement that there is no θ ∈ Θ such that (2.4) holds. The test statistic is based on a sample analog of

∫ S(x, z)² dx dz.

To form the analog, observe that under H0,

T[g − G(·,·,θ)](x, z) = E{[Y − G(X, Z, θ)] f_XZW(x, Z, W) ℓ(Z, z)}.

Therefore, it suffices to form a sample analog of E{[Y − G(X, Z, θ)] f_XZW(x, Z, W) ℓ(Z, z)}. To do this, let f̂_XZW^{(−i)} denote a leave-observation-i-out kernel estimator of f_XZW.
That is, for V_i ≡ (X_i, Z_i, W_i) and κ a kernel function of a (2p + r)-dimensional argument,

f̂_XZW^{(−i)}(v) = [1/(n h^{2p+r})] Σ_{j=1, j≠i}^n κ((v − V_j)/h),
where h is the bandwidth. Let θ̂_n be an estimator of θ. The sample analog of S(x, z) is

S_n(x, z) = n^{−1/2} Σ_{i=1}^n [Y_i − G(X_i, Z_i, θ̂_n)] f̂_XZW^{(−i)}(x, Z_i, W_i) ℓ(Z_i, z).

The test statistic is

(2.5)   τ_n = ∫ S_n(x, z)² dx dz.
H0 is rejected if τ n is large.
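For intuition, the whole construction is easy to code in the scalar case (p = 1 and r = 0, so that the smoothing operator L is the identity). The sketch below is illustrative only, not the paper's implementation: the function names, the midpoint-rule integration grid, and the fixed bandwidth are assumptions made for the example.

```python
import numpy as np

def tau_n(y, x, w, G, theta_hat, h, n_grid=100):
    """Sketch of the test statistic (2.5) for scalar X and W (p = 1, r = 0).

    G(x, theta) is the parametric null model and theta_hat a preliminary
    (e.g. GMM) estimate of theta.  f_XW is estimated by a leave-one-out
    kernel density estimator with a product quartic kernel and bandwidth h.
    """
    n = len(y)

    def K(v):  # quartic (biweight) kernel supported on [-1, 1]
        return (15.0 / 16.0) * (1.0 - v**2) ** 2 * (np.abs(v) <= 1)

    xg = (np.arange(n_grid) + 0.5) / n_grid      # integration grid on [0, 1]
    resid = y - G(x, theta_hat)                   # Y_i - G(X_i, theta_hat)

    # S_n(x) = n^{-1/2} sum_i resid_i * fhat^{(-i)}_XW(x, W_i)
    S = np.zeros(n_grid)
    for i in range(n):
        keep = np.arange(n) != i                  # leave observation i out
        fhat = (K((xg[:, None] - x[keep]) / h)
                * K((w[i] - w[keep]) / h)).sum(axis=1) / (n * h**2)
        S += resid[i] * fhat
    S /= np.sqrt(n)

    # tau_n = integral of S_n(x)^2 over [0, 1], by the midpoint rule
    return float(np.mean(S**2))
```

Under H0 the statistic converges to the weighted chi-square limit given in Theorem 1 below; H0 is rejected when the statistic exceeds the simulated critical value described in Section 2.4.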
2.2 Regularity Conditions
This section states the assumptions that are used to obtain the asymptotic properties of τ_n under the null and alternative hypotheses. Let ‖(x1, z1, w1) − (x2, z2, w2)‖ denote the Euclidean distance between (x1, z1, w1) and (x2, z2, w2). Let D^j f_XZW denote any j'th partial or mixed partial derivative of f_XZW. Let D^0 f_XZW(x, z, w) = f_XZW(x, z, w).
1. (i) The support of (X, Z, W) is contained in [0,1]^{2p+r}. (ii) (X, Z, W) has a probability density function f_XZW with respect to Lebesgue measure. (iii) There is a constant C_f < ∞ such that |D^j f_XZW(x, z, w)| ≤ C_f for all (x, z, w) ∈ [0,1]^{2p+r} and j = 0, 1, 2. (iv) |D² f_XZW(x1, z1, w1) − D² f_XZW(x2, z2, w2)| ≤ C_f ‖(x1, z1, w1) − (x2, z2, w2)‖ for any second derivative and any (x1, z1, w1) and (x2, z2, w2) in [0,1]^{2p+r}. (v) The operator T_z is nonsingular for almost every z ∈ [0,1]^r.
2. (i) E(U | Z = z, W = w) = 0 and E(U² | Z = z, W = w) ≤ C_U for each (z, w) ∈ [0,1]^{p+r} and some constant C_U < ∞. (ii) |g(x, z)| ≤ C_g for some constant C_g < ∞ and all (x, z) ∈ [0,1]^{p+r}.
3. (i) As n → ∞, θ̂_n →_p θ_0 for some θ_0 ∈ Θ, a compact subset of ℝ^d. If H0 is true, then g(x, z) = G(x, z, θ_0), θ_0 ∈ int(Θ), and

(2.6)   n^{1/2}(θ̂_n − θ_0) = n^{−1/2} Σ_{i=1}^n γ(U_i, X_i, Z_i, W_i, θ_0) + o_p(1)

for some function γ taking values in ℝ^d such that Eγ(U, X, Z, W, θ_0) = 0 and Var[γ(U, X, Z, W, θ_0)] is a finite, non-singular matrix.
4. (i) |G(x, z, θ)| ≤ C_G for all (x, z) ∈ [0,1]^{p+r}, all θ ∈ Θ, and some constant C_G < ∞. (ii) The first and second derivatives of G(x, z, θ) with respect to θ are bounded by C_G uniformly over (x, z) ∈ [0,1]^{p+r} and θ ∈ Θ.

5. (i) The kernel function used to estimate f_XZW has the form κ(v) = Π_{j=1}^{2p+r} K(v_j), where v_j is the j'th component of v and K is a symmetrical, twice continuously differentiable probability density function on [−1, 1]. (ii) The bandwidth, h, satisfies h = c_h n^{−1/(2p+r+4)}, where c_h is a constant and 0 < c_h < ∞. (iii) The operator L is nonsingular.
The representation (2.6) of n^{1/2}(θ̂_n − θ_0) holds, for example, if θ̂_n is a generalized method of moments (GMM) estimator.
2.3 The Asymptotic Distribution of the Test Statistic under the Null Hypothesis
To obtain the asymptotic distribution of τ_n under H0, define G_θ(x, z, θ) = ∂G(x, z, θ)/∂θ, Γ(x, z) = E[G_θ(X, Z, θ_0) f_XZW(x, Z, W) ℓ(Z, z)],

B_n(x, z) = n^{−1/2} Σ_{i=1}^n [U_i f_XZW(x, Z_i, W_i) ℓ(Z_i, z) − Γ(x, z)′γ(U_i, X_i, Z_i, W_i, θ_0)],

and V(x1, z1; x2, z2) = E[B_n(x1, z1) B_n(x2, z2)]. Define the operator Ω on L²[0,1]^{p+r} by

(2.7)   (Ων)(x, z) = ∫₀¹ V(x, z; ξ, ζ) ν(ξ, ζ) dξ dζ.
Let {ω_j : j = 1, 2, ...} denote the eigenvalues of Ω sorted so that ω_1 ≥ ω_2 ≥ ... ≥ 0. Let {χ²_{1j} : j = 1, 2, ...} denote independent random variables that are distributed as chi-square with one degree of freedom. The following theorem gives the asymptotic distribution of τ_n under H0.

Theorem 1: If H0 is true and assumptions 1-5 hold, then

τ_n →_d Σ_{j=1}^∞ ω_j χ²_{1j}.
2.4 Obtaining the Critical Value
The statistic τ n is not asymptotically pivotal, so its asymptotic distribution cannot be
tabulated. This section presents a method for obtaining an approximate asymptotic critical value
for the τ n test. The method replaces the asymptotic distribution of τ n with an approximate
distribution.
The difference between the true and approximate distributions can be made
arbitrarily small under both the null hypothesis and alternatives. Moreover, the quantiles of the
approximate distribution can be estimated consistently as n → ∞ . The approximate 1 − α critical
value of the τ n test is a consistent estimator of the 1 − α quantile of the approximate distribution.
The approximate critical value is obtained under sampling from a pseudo-true model that coincides with (2.1) if H0 is true and satisfies a version of E[Y − G(X, Z, θ_0) | Z, W] = 0 if H0 is false. The critical value for the case of a false H0 is used later to establish the properties of τ_n under H1. The pseudo-true model is defined by

(2.8)   Ȳ = G(X, Z, θ_0) + Ū,

where Ȳ = Y − E[Y − G(X, Z, θ_0) | Z, W], Ū = Ȳ − G(X, Z, θ_0), and θ_0 is the probability limit of θ̂_n. This model coincides with (2.1) when H0 is true. Moreover, H0 holds for the pseudo-true model in the sense that E[Ȳ − G(X, Z, θ_0) | Z, W] = 0, regardless of whether H0 holds for (2.1).
To describe the approximation to the asymptotic distribution of τ_n, let {ω̄_j : j = 1, 2, ...} be the eigenvalues of the version of Ω (denoted Ω̄) that is obtained by replacing model (2.1) with model (2.8). Order the ω̄_j's such that ω̄_1 ≥ ω̄_2 ≥ ... ≥ 0. Then under sampling from (2.8), τ_n is asymptotically distributed as

τ̄ ≡ Σ_{j=1}^∞ ω̄_j χ²_{1j}.

Given any ε > 0, there is an integer K_ε < ∞ such that

0 < P(Σ_{j=1}^{K_ε} ω̄_j χ²_{1j} ≤ t) − P(τ̄ ≤ t) < ε

uniformly over t. Define

τ̄_ε = Σ_{j=1}^{K_ε} ω̄_j χ²_{1j}.
Let z_εα denote the 1 − α quantile of the distribution of τ̄_ε. Then 0 < P(τ̄ > z_εα) − α < ε. Thus, using z_εα to approximate the asymptotic 1 − α critical value of τ_n creates an arbitrarily small error in the probability that a correct H0 is rejected. Similarly, use of the approximation creates an arbitrarily small change in the power of the τ_n test when H0 is false. However, the eigenvalues ω̄_j are unknown. Accordingly, the approximate 1 − α critical value for the τ_n test is a consistent estimator of the 1 − α quantile of the distribution of τ̄_ε. Specifically, let ω̂_j (j = 1, 2, ..., K_ε) be a consistent estimator of ω̄_j under sampling from (2.8). Then the approximate critical value of τ_n is the 1 − α quantile of the distribution of

τ̂_n = Σ_{j=1}^{K_ε} ω̂_j χ²_{1j}.

This quantile, which will be denoted ẑ_εα, can be estimated with arbitrary accuracy by simulation.
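The simulation step is elementary once the estimated eigenvalues are in hand. A minimal sketch (the function name, simulation size, and seed handling are illustrative assumptions, not part of the paper):

```python
import numpy as np

def simulated_critical_value(omega_hat, alpha=0.05, n_sim=100_000, seed=0):
    """Estimate the 1 - alpha quantile of sum_j omega_hat[j] * chi^2_{1j}.

    omega_hat holds the K_eps largest estimated eigenvalues; each chi^2_{1j}
    is an independent chi-square variate with one degree of freedom.
    """
    rng = np.random.default_rng(seed)
    omega_hat = np.asarray(omega_hat, dtype=float)
    # a chi-square(1) variate is the square of a standard normal
    chi2_draws = rng.standard_normal((n_sim, omega_hat.size)) ** 2
    draws = chi2_draws @ omega_hat
    return float(np.quantile(draws, 1.0 - alpha))
```

As a sanity check, with a single unit eigenvalue the routine should approximately reproduce the chi-square(1) critical value 3.84.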
The remainder of this section describes how to obtain the estimated eigenvalues {ω̂_j}. Define W̄_i = [H(W_i)′, Z_i′]′, where H is a known, vector-valued function whose components are linearly independent, and c_θ ≡ dim(H) + r ≥ d. Assume that θ̂_n is the GMM estimator

(2.9)   θ̂_n = argmin_{θ∈Θ} {Σ_{i=1}^n [Y_i − G(X_i, Z_i, θ)] W̄_i}′ A_n {Σ_{i=1}^n [Y_i − G(X_i, Z_i, θ)] W̄_i},

where {A_n} is a sequence of possibly stochastic c_θ × c_θ weight matrices converging in probability to a finite, non-stochastic matrix A. Define the c_θ × d matrix D = E[W̄ G_θ(X, Z, θ_0)′] and the d × c_θ matrix γ̄ = (D′AD)^{−1} D′A. Then standard calculations for GMM estimators show that γ(U_i, X_i, Z_i, W_i, θ_0) = γ̄ W̄_i U_i.
Therefore,

(2.10)   V(x1, z1; x2, z2) = E{n^{−1} Σ_{i=1}^n [f_XZW(x1, Z_i, W_i) ℓ(Z_i, z1) − Γ(x1, z1)′γ̄W̄_i] U_i² × [f_XZW(x2, Z_i, W_i) ℓ(Z_i, z2) − Γ(x2, z2)′γ̄W̄_i]}.
A consistent estimator of V can be obtained by replacing unknown quantities on the right-hand side of (2.10) with estimators. To this end, define D̂ = n^{−1} Σ_{i=1}^n W̄_i G_θ(X_i, Z_i, θ̂_n)′, γ̂ = (D̂′A_n D̂)^{−1} D̂′A_n, and

Γ̂(x, z) = n^{−1} Σ_{i=1}^n G_θ(X_i, Z_i, θ̂_n) f̂_XZW(x, Z_i, W_i) ℓ(Z_i, z),

where f̂_XZW is a kernel estimator of f_XZW. Also define Û_i = Y_i − G(X_i, Z_i, θ̂_n) − q̂^{(−i)}(Z_i, W_i), where q̂^{(−i)}(z, w) is the leave-observation-i-out kernel regression estimator of Y − G(X, Z, θ̂_n) on (Z, W). Then V(x1, z1; x2, z2) is estimated consistently by

V̂(x1, z1; x2, z2) = n^{−1} Σ_{i=1}^n [f̂_XZW(x1, Z_i, W_i) ℓ(Z_i, z1) − Γ̂(x1, z1)′γ̂W̄_i] Û_i² × [f̂_XZW(x2, Z_i, W_i) ℓ(Z_i, z2) − Γ̂(x2, z2)′γ̂W̄_i].

Let Ω̂ be the integral operator whose kernel is V̂(x1, z1; x2, z2). The ω̂_j's are the eigenvalues of Ω̂.
Theorem 2: Let assumptions 1-5 hold. Then as n → ∞, (i) sup_{1≤j≤K_ε} |ω̂_j − ω̄_j| = O[(log n)/(nh^{2p+r})^{1/2}] almost surely and (ii) ẑ_εα →_p z_εα.
To obtain an accurate numerical approximation to the ω̂_j's, let F̂(x, z) denote the n × 1 vector whose i'th component is f̂_XZW(x, Z_i, W_i) ℓ(Z_i, z), Ĝ_θ denote the n × d matrix whose (i, j) element is the j'th component of G_θ(X_i, Z_i, θ̂_n), ϒ denote the n × n diagonal matrix whose (i, i) element is Û_i², and W̄ denote the n × c_θ matrix (W̄_1′, ..., W̄_n′)′. Finally, define the matrix M = I_n − n^{−1} Ĝ_θ γ̂ W̄′, where I_n is the n × n identity matrix. Then

V̂(x1, z1; x2, z2) = n^{−1} F̂(x1, z1)′ M ϒ M′ F̂(x2, z2).

The computation of the ω̂_j's can now be reduced to finding the eigenvalues of a finite-dimensional matrix. Let {φ_j : j = 1, 2, ...} be an orthonormal basis for L²[0,1]^{p+r}. Then

f̂_XZW(x, Z, W) ℓ(Z, z) = Σ_{j=1}^∞ Σ_{k=1}^∞ d̂_{jk} φ_j(x, z) φ_k(Z, W),

where

d̂_{jk} = ∫₀¹ dx ∫₀¹ dz1 ∫₀¹ dz2 ∫₀¹ dw f̂_XZW(x, z1, w) ℓ(z1, z2) φ_j(x, z2) φ_k(z1, w).

Approximate f̂_XZW(x, Z, W) ℓ(Z, z) by the finite sum

Π(x, z, W, Z) = Σ_{j=1}^L Σ_{k=1}^L d̂_{jk} φ_j(x, z) φ_k(Z, W)

for some integer L < ∞. Since f̂_XZW is a known function, L can be chosen to make Π approximate f̂_XZW ℓ with any desired accuracy. Let φ(x, z) denote the L × 1 vector whose j'th component is φ_j(x, z). Let Φ be the L × n matrix whose (j, k) component is φ_j(Z_k, W_k). Let D be the L × L matrix {d̂_{jk}}. Then V̂(x1, z1; x2, z2) is approximated by

V̂(x1, z1; x2, z2) ≈ n^{−1} φ(x1, z1)′ D Φ M ϒ M′ Φ′ D′ φ(x2, z2).

The eigenvalues of Ω̂ are approximated by those of the L × L matrix n^{−1} D Φ M ϒ M′ Φ′ D′.
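Once a consistent estimate V̂ of the kernel is available, its operator eigenvalues can also be approximated by a simple Nyström discretization instead of a series expansion. The sketch below (written for a symmetric kernel on [0,1] × [0,1]; the grid size is an illustrative assumption) replaces the operator by a quadrature-weighted matrix whose eigenvalues converge to the operator's as the grid is refined.

```python
import numpy as np

def operator_eigenvalues(V, m=400, k=25):
    """Nystrom approximation to the k largest eigenvalues of the integral
    operator on L2[0,1] with symmetric kernel V(s, t).

    The operator is discretized on an m-point midpoint grid: the matrix
    V(s_i, s_j)/m has eigenvalues approximating the operator's as m grows.
    """
    s = (np.arange(m) + 0.5) / m                 # midpoint quadrature nodes
    Vm = V(s[:, None], s[None, :]) / m           # quadrature-weighted kernel matrix
    eigs = np.linalg.eigvalsh(Vm)[::-1]          # sort in decreasing order
    return eigs[:k]
```

As a sanity check, the Brownian-motion covariance kernel min(s, t) has known eigenvalues 1/[(j − 1/2)²π²], which this routine recovers to a few decimal places.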
2.5 Consistency of the Test against a Fixed Alternative Model
In this section, it is assumed that H0 is false. That is, there is no θ ∈ Θ such that g(x, z) = G(x, z, θ) for almost every (x, z). Let θ_0 denote the probability limit of θ̂_n. Define q(x, z) = g(x, z) − G(x, z, θ_0). Let z_α denote the 1 − α quantile of the distribution of τ_n under sampling from the pseudo-true model (2.8). Let ẑ_εα denote the 1 − α quantile of τ̂_n. The following theorem establishes consistency of the τ_n test against a fixed alternative hypothesis.

Theorem 3: Let assumptions 1-5 hold. Suppose that H0 is false and that ∫₀¹ [(Tq)(x, z)]² dx dz > 0. Then for any α such that 0 < α < 1,

lim_{n→∞} P(τ_n > z_α) = 1

and

lim_{n→∞} P(τ_n > ẑ_εα) = 1.
Because T is nonsingular, the τ n test is consistent against any alternative that differs
from G ( x, z ,θ 0 ) on a set of ( x, z ) values whose Lebesgue measure exceeds zero. It is shown in
the proof of Theorem 3 that when H0 is false, τ n increases at the rate n as n increases. This is
faster than the rate found by Donald, Imbens, and Newey (2003) for a test of H0 against a
nonparametric alternative based on the GMM test of overidentifying restrictions.
2.6 Asymptotic Distribution under Local Alternatives
This section obtains the asymptotic distribution of τ_n under the sequence of local alternative hypotheses

(2.11)   Y = G(X, Z, θ_0) + n^{−1/2} Δ(X, Z) + U,

where Δ is a bounded function on [0,1]^{p+r} and θ_0 ∈ int(Θ). The following additional notation is used. Let θ̂_n be the GMM estimator (2.9). Let {ψ_j : j = 1, 2, ...} denote the orthonormal eigenvectors of Ω. Define μ(x, z) = T{Δ − E[W̄Δ(X, Z)]′γ̄′G_θ}(x, z) and

μ_j = ∫₀¹ μ(x, z) ψ_j(x, z) dx dz.

Let {χ²_{1j}(μ_j²/ω_j) : j = 1, 2, ...} denote independent random variables that are distributed as noncentral chi-square with one degree of freedom and noncentrality parameters {μ_j²/ω_j}. The following theorem states the result.
Theorem 4: Let assumptions 1-5 hold. Under the sequence of local alternatives (2.11),

τ_n →_d Σ_{j=1}^∞ ω_j χ²_{1j}(μ_j²/ω_j),

where the ω_j's are the eigenvalues of the operator Ω defined in (2.7).
Let z_α denote the 1 − α quantile of the distribution of Σ_{j=1}^∞ ω_j χ²_{1j}(μ_j²/ω_j). Let ẑ_εα denote the estimated approximate α-level critical value defined in Section 2.4. Then it follows from Theorems 2 and 4 that for any ε > 0,

limsup_{n→∞} |P(τ_n > ẑ_εα) − P(τ_n > z_α)| ≤ ε.

It also follows from Theorem 4 that the τ_n test has power against local alternatives whose distance from the null-hypothesis model is O(n^{−1/2}). In contrast, the test of Tripathi and Kitamura (2003) has power only against local alternatives whose distance from the null-hypothesis model decreases more slowly than n^{−1/2}. If μ(x, z) = 0 for all (x, z), then there is a non-stochastic sequence {θ_n} such that G(x, z, θ_n) = G(x, z, θ_0) + n^{−1/2} Δ(x, z) + o(n^{−1/2}). Therefore, the distance between the null and alternative hypotheses is o(n^{−1/2}).
2.7 Uniform Consistency
This section shows that for any ε > 0, the τ_n test rejects H0 with probability exceeding 1 − ε uniformly over a class of alternative models whose distance from the null hypothesis is O(n^{−1/2}). The following additional notation is used. Let θ_g be the probability limit of θ̂_n under the hypothesis (not necessarily true) that g(x, z) = G(x, z, θ) for some θ ∈ Θ and a given function G. Let Θ̄ be a compact subset of int(Θ). Define q_g(x, z) = g(x, z) − G(x, z, θ_g). Let h denote the bandwidth in f̂_XZW^{(−i)}. For each n = 1, 2, ... and C > 0, define F_nC as a set of functions g such that: (i) |g(x, z)| ≤ C_g for all (x, z) ∈ [0,1]^{p+r} and some constant C_g < ∞; (ii) θ_g ∈ Θ̄; (iii) (2.6) holds uniformly over g ∈ F_nC; (iv) ‖Tq_g‖ ≥ n^{−1/2}C, where ‖·‖ denotes the L² norm; and (v) h²‖q_g‖/‖Tq_g‖ = o(1) as n → ∞. F_nC is a set of functions whose distance from H0 shrinks to zero at the rate n^{−1/2}. That is, F_nC includes functions such that ‖q_g‖ = O(n^{−1/2}). Condition (ii) ensures the existence of the critical value defined in Section 2.4. The requirement θ_g ∈ Θ̄ is not restrictive in applications because Θ and Θ̄ can usually be made large enough to include any reasonable θ_g. Condition (v) rules out alternatives that depend on x only through sequences of eigenvectors of T whose eigenvalues converge to 0 too rapidly. For example, let p = 1, r = 0, and {λ_j, φ_j : j = 1, 2, ...} denote the eigenvalues and eigenvectors of T ordered so that λ_1 ≥ λ_2 ≥ ... > 0. Let G(x, θ) = θφ_1(x), g(x) = φ_1(x) + φ_n(x), and W̄ = φ_1(W). Then h²‖q_g‖/‖Tq_g‖ = h²/λ_n. Because h ∝ n^{−1/6}, condition (v) is violated if λ_n = o(n^{−1/3}). The practical significance of condition (v) is that the τ_n test has relatively low power against alternatives that differ from H0 only through eigenvectors of T with very small eigenvalues.
The following theorem states the result of this section.

Theorem 5: Let assumptions 1, 2, 4, and 5 hold. Assume that θ̂_n satisfies (2.9). Then given any δ > 0, any α such that 0 < α < 1, and any sufficiently large but finite constant C,

(2.12)   lim_{n→∞} inf_{g∈F_nC} P(τ_n > z_α) ≥ 1 − δ

and

(2.13)   lim_{n→∞} inf_{g∈F_nC} P(τ_n > ẑ_εα) ≥ 1 − 2δ.
2.8 Alternative Weights
This section compares τ n with a generalization of the test of Bierens (1990) and Bierens
and Ploberger (1997). To minimize the complexity of the discussion, assume that p = 1 and
r = 0, so Z is not in the model. Let H be a bounded, real-valued function on [0,1]² such that

∫₀¹ H(x, w) s(w) dw = 0

only if s(w) = 0 for almost every w ∈ [0,1]. Then a test of H0 can be based on the statistic

τ_nH = ∫₀¹ S_nH(x)² dx,

where

S_nH(x) = n^{−1/2} Σ_{i=1}^n [Y_i − G(X_i, θ̂_n)] H(x, W_i).
If H(x, w) = H̄(xw) for a suitably chosen function H̄, then τ_nH is a modification of the statistic of Bierens (1990) and Bierens and Ploberger (1997) for testing the hypothesis that a conditional mean function belongs to a specified, finite-dimensional parametric family. In this section, it is shown that the power of the τ_nH test can be low relative to that of the τ_n test. Specifically, there are combinations of density functions of (X, W), f_XW, and local alternative models (2.11) such that an α-level τ_nH test based on a fixed H has asymptotic local power arbitrarily close to α, whereas the α-level τ_n test has asymptotic local power that is bounded away from α. The opposite situation cannot occur under the assumptions of this paper. That is, it is not possible for the asymptotic power of the α-level τ_n test to approach α while the power of the α-level τ_nH test remains bounded away from α.
The conclusion that the power of τ_nH can be low relative to that of τ_n is reached by constructing an example in which the α-level τ_n test has asymptotic power that is bounded away from α but the τ_nH test has asymptotic power that is arbitrarily close to α. To minimize the complexity of the example, assume that θ is known and does not have to be estimated. Define

B_n(x) = n^{−1/2} Σ_{i=1}^n U_i f_XW(x, W_i)

and

B_nH(x) = n^{−1/2} Σ_{i=1}^n U_i H(x, W_i),
R(x1, x2) = E[B_n(x1) B_n(x2)], and R_H(x1, x2) = E[B_nH(x1) B_nH(x2)]. Also, define the operators Ω and Ω_H on L²[0,1] by

(Ωψ)(x) = ∫₀¹ R(x, ξ) ψ(ξ) dξ

and

(Ω_H ψ)(x) = ∫₀¹ R_H(x, ξ) ψ(ξ) dξ.

Let {ω_j, ψ_j : j = 1, 2, ...} and {ω_jH, ψ_jH : j = 1, 2, ...} denote the eigenvalues and eigenvectors of Ω and Ω_H, respectively, with the eigenvalues sorted in decreasing order. For Δ defined as in (2.11), define μ(x) = (TΔ)(x),

μ_H(x) = ∫₀¹ ∫₀¹ Δ(ξ) H(x, w) f_XW(ξ, w) dξ dw,

μ_j = ∫₀¹ μ(x) ψ_j(x) dx,

and

μ_jH = ∫₀¹ μ_H(x) ψ_jH(x) dx.
Then arguments like those used to prove Theorem 4 show that under the sequence of local alternatives (2.11) with a known θ,

τ_n →_d Σ_{j=1}^∞ ω_j χ²_{1j}(μ_j²/ω_j)

and

τ_nH →_d Σ_{j=1}^∞ ω_jH χ²_{1j}(μ_jH²/ω_jH)

as n → ∞. Therefore, to establish the first conclusion of this section, it suffices to show that for a fixed function H, f_XW and Δ can be chosen so that ‖μ‖²/Σ_{j=1}^∞ ω_j is bounded away from 0 and ‖μ_H‖²/Σ_{j=1}^∞ ω_jH is arbitrarily close to 0.
To this end, let φ_1(x) = 1 and φ_{j+1}(x) = 2^{1/2} cos(jπx) for j ≥ 1. Let m > 1 be a finite integer. Define

λ_j = 1 if j = 1 or j = m, and λ_j = e^{−2j} otherwise.

Let

f_XW(x, w) = 1 + Σ_{j=1}^∞ λ_{j+1}^{1/2} φ_{j+1}(x) φ_{j+1}(w).

Let E(U² | W = w) = 1 for all w ∈ [0,1]. Then R(x1, x2) = t(x1, x2), ω_j = λ_j, and Σ_{j=1}^∞ ω_j is non-zero and finite. Set Δ(x) = Dφ_m(x) for some finite D > 0. Then ‖μ‖² = D²λ_m² = D². It suffices to show that m can be chosen so that ‖μ_H‖² is arbitrarily close to 0. To do this, observe
that H(x, w) has the Fourier representation

H(x, w) = Σ_{j,k=1}^∞ h_{jk} φ_j(x) φ_k(w),

where {h_{jk} : j, k = 1, 2, ...} are constants. Moreover, ‖μ_H‖² = D² Σ_{j=1}^∞ h_{jm}². Since H is bounded, m can be chosen so that Σ_{j=1}^∞ h_{jm}² < ε/D² for any ε > 0. With this m, ‖μ_H‖² < ε, which establishes the first conclusion.
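The mechanism behind this example can be checked numerically for the weight H(x, w) = exp(xw) used in the Monte Carlo section: the squared Fourier coefficients Σ_j h_jm² shrink rapidly as m grows, so an alternative in the direction φ_m with m large is nearly invisible to the Bierens-type test. The sketch below (grid size and truncation point are illustrative assumptions) approximates the coefficients by the midpoint rule.

```python
import numpy as np

def basis(j, t):
    """Cosine basis on [0, 1]: phi_1 = 1, phi_{j+1}(t) = sqrt(2) cos(j*pi*t)."""
    if j == 1:
        return np.ones_like(t)
    return np.sqrt(2.0) * np.cos((j - 1) * np.pi * t)

def sum_sq_coeffs(m, J=40, n_grid=500):
    """Approximate sum_{j <= J} h_{jm}^2 for H(x, w) = exp(x*w), where
    h_{jm} = int int H(x, w) phi_j(x) phi_m(w) dx dw (midpoint rule)."""
    t = (np.arange(n_grid) + 0.5) / n_grid
    H = np.exp(t[:, None] * t[None, :])          # H evaluated on the grid
    phi_m = basis(m, t)
    total = 0.0
    for j in range(1, J + 1):
        h_jm = np.mean(basis(j, t)[:, None] * H * phi_m[None, :])
        total += h_jm**2
    return total
```

Comparing, say, m = 2 with m = 8 shows Σ_j h_jm² falling by orders of magnitude.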
The opposite situation (a sequence of local alternatives for which ‖μ‖² approaches 0 while ‖μ_H‖² remains bounded away from 0) cannot occur. To show this, assume without loss of generality that the marginal distributions of X and W are U[0,1], E(U² | W = w) = 1 for all w ∈ [0,1], and ∫₀¹∫₀¹ H(x, w)² dx dw = 1. Also, assume that ‖Δ‖² < C_Δ for some constant C_Δ < ∞. Then

Σ_{j=1}^∞ ω_jH = ∫₀¹∫₀¹ H(x, w)² dx dw = 1.
It follows from the Cauchy-Schwarz inequality that

‖μ_H‖² ≤ ∫₀¹∫₀¹ H(x, w)² dx dw ∫₀¹ [∫₀¹ f_XW(x, w)Δ(x) dx]² dw = ∫₀¹ [∫₀¹ f_XW(x, w)Δ(x) dx]² dw ≤ ‖Δ‖ ‖TΔ‖ ≤ C_Δ^{1/2} ‖μ‖,

where the second inequality follows because ∫₀¹ [∫₀¹ f_XW(x, w)Δ(x) dx]² dw = ⟨Δ, TΔ⟩ and μ = TΔ. Therefore, ‖μ‖² can approach 0 only if ‖μ_H‖² also approaches 0.
3. MONTE CARLO EXPERIMENTS
This section reports the results of a Monte Carlo investigation of the finite-sample performance of the τ_n test. The experiments consist of testing the null hypotheses, H0, that

(4.1)   g(x) = θ_0 + θ_1 x

and

(4.2)   g(x) = θ_0 + θ_1 x + θ_2 x².

The alternative hypotheses are (4.2) if (4.1) is H0 and

(4.3)   g(x) = θ_0 + θ_1 x + θ_2 x² + θ_3 x³

if either (4.1) or (4.2) is H0.
To provide a basis for judging whether the power of the τ n test is high or low, this
section also reports the results of two other tests. One is an asymptotic t test of the hypothesis
θ 2 = 0 if (4.1) is H 0 and of θ 3 = 0 if (4.2) is H 0 . The t test is an example of an ad hoc test that
might be used in applied research. The other test is the modified test of Bierens (1990) and
Bierens and Ploberger (1997) that is described in Section 2.8.
The weight function is
H ( x, w) = exp(xw) . The critical value was computed using the methods described in Section 2.4.
In all experiments, θ_0 = 0 and θ_1 = 0.5. When (4.2) is the correct model, θ_2 = −0.5. When (4.3) is the correct model, θ_2 = −1, θ_3 = 1 if (4.1) is H0, and θ_3 = 4 if (4.2) is H0. Realizations of (X, W) were generated by X = Φ(ξ), W = Φ(ζ), where Φ is the standard normal cumulative distribution function, ζ ~ N(0,1), ξ = ρζ + (1 − ρ²)^{1/2}ε, ε ~ N(0,1), and ρ is a constant parameter whose value varies among experiments. Realizations of Y were generated from Y = g(X) + σ_U U, where U = ηε + (1 − η²)^{1/2}ν, ν ~ N(0,1), σ_U = 0.2, and η is a constant parameter whose value varies among experiments. The instruments used to estimate (4.1), (4.2), and (4.3), respectively, are (1, W), (1, W, W²), and (1, W, W², W³). The bandwidth h used to estimate f_XW was selected by cross-validation. The kernel is K(v) = (15/16)(1 − v²)² I(|v| ≤ 1), where I is the indicator function.
The asymptotic critical value was estimated by setting K_ε = 25. The results are not sensitive to the choice of K_ε, and the estimated eigenvalues ω̂_j are very close to 0 when j > 25. The sample size is n = 500, and the nominal level is 0.05. There are 1000 Monte Carlo replications in each experiment. Computation of the critical value took approximately 4 seconds on a 900 MHz PC.
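The data-generation scheme above is straightforward to reproduce. The following sketch draws one sample under the linear model (4.1); the function and variable names are illustrative, and the normal CDF is computed with math.erf to keep the example self-contained.

```python
import math
import numpy as np

def generate_sample(n, rho, eta, theta0=0.0, theta1=0.5, sigma_u=0.2, seed=None):
    """One Monte Carlo sample under the linear null model (4.1).

    rho sets the strength of the instrument (correlation of xi and zeta);
    eta sets the degree of endogeneity (correlation of U with the
    first-stage error epsilon).
    """
    rng = np.random.default_rng(seed)
    zeta = rng.standard_normal(n)                    # drives the instrument
    eps = rng.standard_normal(n)                     # first-stage error
    nu = rng.standard_normal(n)
    xi = rho * zeta + math.sqrt(1.0 - rho**2) * eps
    u = eta * eps + math.sqrt(1.0 - eta**2) * nu     # mean 0, variance 1

    # standard normal CDF, mapping xi and zeta into [0, 1]
    Phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    x = Phi(xi)                                      # X = Phi(xi)
    w = Phi(zeta)                                    # W = Phi(zeta)
    y = theta0 + theta1 * x + sigma_u * u
    return y, x, w
```

Because ε enters both ξ and U, X is endogenous whenever η ≠ 0, while W remains a valid instrument since ζ is independent of U.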
The results are shown in Table 1. The differences between the nominal and empirical
rejection probabilities are small when H0 is true. When H0 is false, the powers of the τ n and t
tests are similar. Not surprisingly, the t tests of (4.1) against (4.2) and (4.2) against (4.3) are
somewhat more powerful than the τ n tests. The τ n test is slightly more powerful for testing
(4.1) against (4.3).
The Bierens-type test is much less powerful than the τ n and t tests,
especially for testing (4.2) against (4.3).
4. AN EMPIRICAL EXAMPLE
This section presents an empirical example in which τ n is used to test two hypotheses
about the shape of an Engel curve. One is that the curve is linear. The other is that it is quadratic.
The curve is given by (2.1) with r = 0 , so Z is not in the model. Y denotes the logarithm of the
expenditure share of food consumed off the premises where it was purchased, X denotes the
logarithm of total expenditures, and W denotes annual income from wages and salaries. The
data consist of 785 household-level observations from the 1996 U.S. Consumer Expenditure
Survey. The bandwidth for estimating f XW was selected by cross-validation. The kernel is the
same as the one used in the Monte Carlo experiments. As in the experiments, Kε = 25 .
The τ n test of the hypothesis that g is linear (quadratic) gives τ n = 13.4 (0.32) with a
0.05-level critical value of 3.07 (5.22). Thus, the test rejects the hypothesis that g is linear but
not that g is quadratic. The hypotheses were also tested using the t test described in Section 4.
This test gives t = 2.60 for the hypothesis that g is linear ( θ 2 = 0 in (4.2)) and t = 0.34 for the
hypothesis that g is quadratic ( θ 3 = 0 in (4.3)). The 0.05-level critical value is 1.96. Thus, the
t test also rejects the hypothesis that g is linear but not the hypothesis that it is quadratic.
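The paper does not spell out the estimator behind the t test here, but one natural way such a test might be carried out is two-stage least squares estimation of the cubic model with the instrument vector (1, W, W², W³), followed by a t test of θ₃ = 0. The following is only an illustrative sketch on simulated data; the data-generating process and all names are hypothetical, not the paper's:

```python
import numpy as np

def tsls(y, X, Z):
    """Two-stage least squares: beta = (X'Pz X)^{-1} X'Pz y with Pz = Z(Z'Z)^{-1}Z'.
    Returns coefficient estimates and homoskedastic standard errors."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    A = X.T @ Pz @ X
    beta = np.linalg.solve(A, X.T @ Pz @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A)))
    return beta, se

# Simplified simulated data (hypothetical): quadratic truth, so theta_3 = 0
rng = np.random.default_rng(1)
n = 500
w = rng.normal(size=n)
x = 0.7 * w + rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x, x**2, x**3])
Z = np.column_stack([np.ones(n), w, w**2, w**3])
beta, se = tsls(y, X, Z)
t3 = beta[3] / se[3]  # t statistic for theta_3 = 0 in the cubic model
```

The decision rule compares |t3| with the 1.96 critical value, as in the empirical results above.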
MATHEMATICAL APPENDIX: PROOFS OF THEOREMS
To minimize the complexity of the presentation, it is assumed here that p = 1 and r = 0 .
The proofs for p > 1 and/or r > 0 are identical after replacing quantities for p = 1, r = 0 with the
analogous quantities for the more general case. Let f XW denote the density function of ( X ,W ) .
Define
\[
S_{n1}(x) = n^{-1/2}\sum_{i=1}^{n} U_i f_{XW}(x, W_i),
\]
\[
S_{n2}(x) = n^{-1/2}\sum_{i=1}^{n} [g(X_i) - G(X_i, \theta_0)] f_{XW}(x, W_i),
\]
\[
S_{n3}(x) = n^{-1/2}\sum_{i=1}^{n} [G(X_i, \theta_0) - G(X_i, \hat\theta_n)] f_{XW}(x, W_i),
\]
\[
S_{n4}(x) = n^{-1/2}\sum_{i=1}^{n} U_i [\hat f_{XW}^{(-i)}(x, W_i) - f_{XW}(x, W_i)],
\]
\[
S_{n5}(x) = n^{-1/2}\sum_{i=1}^{n} [g(X_i) - G(X_i, \theta_0)][\hat f_{XW}^{(-i)}(x, W_i) - f_{XW}(x, W_i)],
\]
and
\[
S_{n6}(x) = n^{-1/2}\sum_{i=1}^{n} [G(X_i, \theta_0) - G(X_i, \hat\theta_n)][\hat f_{XW}^{(-i)}(x, W_i) - f_{XW}(x, W_i)].
\]
Then $S_n(x) = \sum_{j=1}^{6} S_{nj}(x)$.
Lemma 1: As $n \to \infty$,
\[
S_{n3}(x) = -\Gamma(x)' n^{1/2}(\hat\theta_n - \theta_0) + o_p(1)
= -\Gamma(x)' n^{-1/2}\sum_{i=1}^{n} \gamma(U_i, X_i, W_i, \theta_0) + o_p(1)
\]
uniformly over $x \in [0,1]$.

Proof: A Taylor series expansion gives
\[
S_{n3}(x) = -n^{-1}\sum_{i=1}^{n} G_\theta(X_i, \bar\theta_n) f_{XW}(x, W_i)\, n^{1/2}(\hat\theta_n - \theta_0),
\]
where $\bar\theta_n$ is between $\hat\theta_n$ and $\theta_0$. Application of Jennrich's (1969) uniform law of large numbers gives the first result of the lemma. The second result follows from the first by applying Assumption 3. Q.E.D.
Lemma 2: As $n \to \infty$,
\[
\big| \partial \hat f_{XW}^{(-i)}(x, w)/\partial x - \partial f_{XW}(x, w)/\partial x \big| = o[(\log n)/(n^{1/2} h^2) + h]
\]
almost surely uniformly over $(x, w) \in [0,1]^2$.

Proof: This is a modified version of Theorem 2.2(2) of Bosq (1996) and is proved the same way as that theorem. Q.E.D.
Lemma 3: As $n \to \infty$, $S_{n4}(x) = o_p(1)$ uniformly over $x \in [0,1]$.
Proof: Let $I_1, \ldots, I_m$ be a partition of $[0,1]$ into $m$ intervals of length $1/m$. For each $j = 1, \ldots, m$, choose a point $x_j \in I_j$. Define $\Delta f_{XW}^{(-i)}(x, w) = \hat f_{XW}^{(-i)}(x, w) - f_{XW}(x, w)$. Then
\[
\begin{aligned}
S_{n4}(x) &= n^{-1/2}\sum_{j=1}^{m}\sum_{i=1}^{n} U_i I(x \in I_j)\, \Delta f_{XW}^{(-i)}(x, W_i) \\
&= n^{-1/2}\sum_{j=1}^{m}\sum_{i=1}^{n} U_i I(x \in I_j)\, \Delta f_{XW}^{(-i)}(x_j, W_i) \\
&\quad + n^{-1/2}\sum_{j=1}^{m}\sum_{i=1}^{n} U_i I(x \in I_j)\, [\Delta f_{XW}^{(-i)}(x, W_i) - \Delta f_{XW}^{(-i)}(x_j, W_i)] \\
&\equiv S_{n41}(x) + S_{n42}(x).
\end{aligned}
\]
A Taylor series expansion gives
\[
S_{n42}(x) = n^{-1/2}\sum_{j=1}^{m}\sum_{i=1}^{n} U_i I(x \in I_j)\, [\partial \Delta f_{XW}^{(-i)}(\bar x_j, W_i)/\partial x]\,(x - x_j),
\]
where $\bar x_j$ is between $x_j$ and $x$. Therefore, it follows from Lemma 2 that
\[
\begin{aligned}
|S_{n42}(x)| &\le n^{-1/2} m^{-1} \sum_{j=1}^{m}\sum_{i=1}^{n} |U_i|\, I(x \in I_j)\, |\partial \Delta f_{XW}^{(-i)}(\bar x_j, W_i)/\partial x| \\
&\le n^{-1/2} m^{-1} o_p[(\log n)/(n^{1/2} h^2) + h] \sum_{j=1}^{m}\sum_{i=1}^{n} |U_i|\, I(x \in I_j) \\
&= O_p[(\log n)/(m h^2) + n^{1/2} h/m]
\end{aligned}
\]
uniformly over $x \in [0,1]$. In addition, for any $\varepsilon > 0$,
\[
P\Big( \sup_{x \in [0,1]} |S_{n41}(x)| > \varepsilon \Big)
= P\Big( \max_{1 \le j \le m} \Big| n^{-1/2}\sum_{i=1}^{n} U_i \Delta f_{XW}^{(-i)}(x_j, W_i) \Big| > \varepsilon \Big)
\le \sum_{j=1}^{m} P\Big( \Big| n^{-1/2}\sum_{i=1}^{n} U_i \Delta f_{XW}^{(-i)}(x_j, W_i) \Big| > \varepsilon \Big).
\]
But $E[U_i \Delta f_{XW}^{(-i)}(x_j, W_i)] = 0$, and standard calculations for kernel estimators show that
\[
\mathrm{Var}\Big( n^{-1/2}\sum_{i=1}^{n} U_i \Delta f_{XW}^{(-i)}(x, W_i) \Big) = O[(n h^2)^{-1} + h^4]
\]
for any $x \in [0,1]$. Therefore, it follows from Chebyshev's inequality that
\[
\sum_{j=1}^{m} P\Big( \Big| n^{-1/2}\sum_{i=1}^{n} U_i \Delta f_{XW}^{(-i)}(x_j, W_i) \Big| > \varepsilon \Big) = O[m/(n h^2 \varepsilon^2) + m h^4/\varepsilon^2],
\]
which implies that
\[
P\Big( \sup_{x \in [0,1]} |S_{n41}(x)| > \varepsilon \Big) = O[m/(n h^2 \varepsilon^2) + m h^4/\varepsilon^2].
\]
The lemma now follows by choosing $m$ so that $n^{-1/2} m \to C_3$ as $n \to \infty$, where $C_3$ is a constant such that $0 < C_3 < \infty$. Q.E.D.
Lemma 4: As $n \to \infty$, $S_{n6}(x) = o_p(1)$ uniformly over $x \in [0,1]$.

Proof: A Taylor series expansion gives
\[
S_{n6}(x) = n^{-1}\sum_{i=1}^{n} G_\theta(X_i, \bar\theta_n)[\hat f_{XW}^{(-i)}(x, W_i) - f_{XW}(x, W_i)]\, n^{1/2}(\hat\theta_n - \theta_0),
\]
where $\bar\theta_n$ is between $\hat\theta_n$ and $\theta_0$. The result follows from boundedness of $G_\theta$, $n^{1/2}(\hat\theta_n - \theta_0) = O_p(1)$, and $\hat f_{XW}^{(-i)}(x, W_i) - f_{XW}(x, W_i) = O[h^2 + (\log n)/(n h^2)^{1/2}]$ almost surely uniformly over $x \in [0,1]$. Q.E.D.
Lemma 5: Under $H_0$, $S_n(x) = B_n(x) + o_p(1)$ uniformly over $x \in [0,1]$.

Proof: Under $H_0$, $S_{n2}(x) = S_{n5}(x) = 0$ for all $x$. Now apply Lemmas 1, 3, and 4. Q.E.D.
Proof of Theorem 1: Under $H_0$, $S_n(x) = B_n(x) + o_p(1)$ uniformly over $x \in [0,1]$ by Lemma 5. Therefore,
\[
\tau_n = \int_0^1 B_n^2(x)\, dx + o_p(1).
\]
The result follows by writing $\int_0^1 [B_n(x)^2 - E B_n(x)^2]\, dx$ as a degenerate U-statistic of order two. See, for example, Serfling (1980, pp. 193-194). Q.E.D.
Proof of Theorem 2: $|\hat\omega_j - \omega_j| = O(\|\hat\Omega - \Omega\|)$ by Theorem 5.1a of Bhatia, Davis, and McIntosh (1983). Moreover, standard calculations for kernel density estimators show that $\|\hat\Omega - \Omega\| = O[(\log n)/(n h^2)^{1/2}]$. Part (i) of the theorem follows by combining these two results. Part (ii) is an immediate consequence of part (i). Q.E.D.
Proof of Theorem 3: Let $z_\alpha$ denote the $1 - \alpha$ quantile of the distribution of $\sum_{j=1}^{\infty} \omega_j \chi_{1j}^2$. Because of Theorem 2, it suffices to show that if $H_1$ holds, then under sampling from $Y = g(X) + U$,
\[
\lim_{n \to \infty} P(\tau_n > z_\alpha) = 1.
\]
This will be done by proving that
\[
\operatorname*{plim}_{n \to \infty} n^{-1}\tau_n = \int_0^1 [(Tq)(x)]^2\, dx > 0.
\]
To do this, observe that by Jennrich's (1969) uniform law of large numbers, $n^{-1/2} S_{n2}(x) = (Tq)(x) + o_p(1)$ uniformly over $x \in [0,1]$. Moreover, $S_{n5}(x) = o(h^{-1}\log n) = o(n^{1/6}\log n)$ a.s. uniformly over $x, w \in [0,1]$ because $\hat f_{XW}^{(-i)}(x, w) - f_{XW}(x, w) = o[(\log n)/(n h^2)^{1/2}]$ a.s. uniformly over $x \in [0,1]$. Combining these results with Lemma 5 yields
\[
n^{-1/2} S_n(x) = n^{-1/2} B_n(x) + (Tq)(x) + o_p(1).
\]
A further application of Jennrich's (1969) uniform law of large numbers shows that $n^{-1/2} S_n(x) \to_p (Tq)(x)$, so $n^{-1}\tau_n \to_p \int_0^1 [(Tq)(x)]^2\, dx$. Q.E.D.
Proof of Theorem 4: Arguments like those leading to Lemma 5 show that
\[
S_n(x) = B_n(x) + S_{n2}(x) + S_{n5}(x) - E(W\Delta)'\gamma'(TG_\theta)(x) + o_p(1)
\]
uniformly over $x \in [0,1]$. Moreover,
\[
S_{n5}(x) = n^{-1}\sum_{i=1}^{n} \Delta(X_i)[\hat f_{XW}^{(-i)}(x, W_i) - f_{XW}(x, W_i)] = O[(\log n)/(n h^2)^{1/2}]
\]
almost surely uniformly over $x$. In addition,
\[
S_{n2}(x) = n^{-1}\sum_{i=1}^{n} \Delta(X_i) f_{XW}(x, W_i) = (T\Delta)(x) + o(1)
\]
almost surely uniformly over $x$. Therefore, $S_n(x) = B_n(x) + \mu(x) + o_p(1)$ uniformly over $x$. But
\[
B_n(x) + \mu(x) = \sum_{j=1}^{\infty} \bar b_j \psi_j(x),
\]
where $\bar b_j = b_j + \mu_j$ and $b_j$ is defined as in the proof of Theorem 1. The $\bar b_j$'s are asymptotically distributed as independent $N(\mu_j, \omega_j)$ variates. Now proceed as in Serfling's (1980, pp. 195-199) derivation of the asymptotic distribution of a second-order degenerate U-statistic. Q.E.D.
Proof of Theorem 5: Let $z_{g\alpha}$ denote the critical value under the model $Y = g(X) + U$, $g \in F_{nc}$. Let $\hat z_{\varepsilon\alpha g}$ denote the corresponding estimated approximate critical value. Observe that $z_{g\alpha}$ is bounded and $\hat z_{\varepsilon\alpha g}$ is bounded in probability uniformly over $g \in F_{nc}$.

We prove (2.12); the proof of (2.13) is similar. Define $D_n(x) = S_{n3}(x) + S_{n6}(x) + E[S_{n2}(x) + S_{n5}(x)]$ and $\bar S_n(x) = S_n(x) - D_n(x)$. Then $\tau_n = \|\bar S_n + D_n\|^2$. Use the inequality
\[
(A1)\qquad a^2 \ge 0.5\, b^2 - (b - a)^2
\]
with $a = S_n$ and $b = D_n$ to obtain
\[
P(\tau_n > z_{g\alpha}) \ge P\big( 0.5\|D_n\|^2 - \|\bar S_n\|^2 > z_{g\alpha} \big).
\]
For any finite $M > 0$,
\[
\begin{aligned}
P\big( 0.5\|D_n\|^2 - \|\bar S_n\|^2 \le z_{g\alpha} \big)
&= P\big( 0.5\|D_n\|^2 \le z_{g\alpha} + \|\bar S_n\|^2,\ \|\bar S_n\|^2 \le M \big) \\
&\quad + P\big( 0.5\|D_n\|^2 \le z_{g\alpha} + \|\bar S_n\|^2,\ \|\bar S_n\|^2 > M \big) \\
&\le P\big( 0.5\|D_n\|^2 \le z_{g\alpha} + M \big) + P\big( \|\bar S_n\|^2 > M \big).
\end{aligned}
\]
$\|\bar S_n\|$ is bounded in probability uniformly over $g \in F_{nc}$. Therefore, for each $\varepsilon > 0$ there is $M_\varepsilon < \infty$ such that for all $M > M_\varepsilon$,
\[
P\big( 0.5\|D_n\|^2 - \|\bar S_n\|^2 \le z_{g\alpha} \big) \le P\big( 0.5\|D_n\|^2 \le z_{g\alpha} + M \big) + \varepsilon.
\]
Equivalently,
\[
P\big( 0.5\|D_n\|^2 - \|\bar S_n\|^2 > z_{g\alpha} \big) \ge P\big( 0.5\|D_n\|^2 > z_{g\alpha} + M \big) - \varepsilon,
\]
and
\[
(A2)\qquad P(\tau_n > z_{g\alpha}) \ge P\big( 0.5\|D_n\|^2 > z_{g\alpha} + M \big) - \varepsilon.
\]
Now
\[
S_{n2}(x) + S_{n5}(x) = n^{-1/2}\sum_{i=1}^{n} [g(X_i) - G(X_i, \theta_g)]\hat f_{XW}^{(-i)}(x, W_i).
\]
Therefore,
\[
E[S_{n2}(x) + S_{n5}(x)] = n^{-1/2} E \sum_{i=1}^{n} [g(X_i) - G(X_i, \theta_g)][f_{XW}(x, W_i) + h^2 R_n(x)],
\]
where $R_n(x)$ is nonstochastic, does not depend on $g$, and is bounded uniformly over $x \in [0,1]$. It follows that
\[
E[S_{n2}(x) + S_{n5}(x)] = n^{1/2}(Tq_g)(x) + O\big( n^{1/2} h^2 \|q_g\| \big)
\]
and
\[
\big| E[S_{n2}(x) + S_{n5}(x)] \big| \ge 0.5\, n^{1/2} |(Tq_g)(x)|
\]
uniformly over $g \in F_{nc}$ for all sufficiently large $n$. Now
\[
|S_{n3}(x) + S_{n6}(x)| \le \sup_{\xi \in [0,1],\, g \in F_{nc}} n^{1/2} |G(\xi, \hat\theta_n) - G(\xi, \theta_g)| \cdot n^{-1}\sum_{i=1}^{n} \hat f_{XW}^{(-i)}(x, W_i).
\]
Therefore, it follows from the definition of $F_{nc}$ and uniform convergence of $\hat f_{XW}^{(-i)}$ to $f_{XW}$ that $\|S_{n3} + S_{n6}\| = O_p(1)$ uniformly over $g \in F_{nc}$. A further application of (A1) with $a = D_n(x)$ and $b = E[S_{n2}(x) + S_{n5}(x)]$ gives
\[
(A3)\qquad \|D_n\|^2 \ge 0.125\, n\, \|Tq_g\|^2 + O_p(1)
\]
uniformly over $g \in F_{nc}$ as $n \to \infty$. Inequality (2.12) follows by substituting (A3) into (A2) and choosing $C$ to be sufficiently large. Q.E.D.
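The elementary inequality (A1), used twice in the proof above, can be verified in one line:
\[
a^2 - \big[ 0.5\, b^2 - (b - a)^2 \big] = 2a^2 - 2ab + 0.5\, b^2 = 2\big( a - 0.5\, b \big)^2 \ge 0.
\]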
REFERENCES
Bhatia, R., C. Davis, and A. McIntosh (1983). Perturbation of Spectral Subspaces and Solution
of Linear Operator Equations, Linear Algebra and Its Applications, 52/53, 45-67.
Bierens, H.J. (1990). A Consistent Conditional Moment Test of Functional Form, Econometrica,
58, 1443-1458.
Bierens, H.J. and W. Ploberger (1997). Asymptotic Theory of Integrated Conditional Moment
Tests, Econometrica, 65, 1129-1151.
Blundell, R., X. Chen, and D. Kristensen (2003). Semi-Nonparametric IV Estimation of Shape-Invariant Engel Curves, working paper CWP 15/03, Centre for Microdata Methods and Practice, University College London.
Bosq, D. (1996). Nonparametric Statistics for Stochastic Processes, New York: Springer.
Darolles, S., J.-P. Florens, and E. Renault (2002). Nonparametric Instrumental Regression,
working paper, GREMAQ, University of Social Science, Toulouse.
Donald, S.G., G.W. Imbens, and W.K. Newey (2003). Empirical Likelihood Estimation and Consistent Tests with Conditional Moment Restrictions, Journal of Econometrics, 117, 55-93.
Guerre, E. and P. Lavergne (2002). Optimal Minimax Rates for Nonparametric Specification
Testing in Regression Models, Econometric Theory, 18, 1139-1171.
Hall, P. and J.L. Horowitz (2003). Nonparametric Methods for Inference in the Presence of
Instrumental Variables, working paper, Department of Economics, Northwestern University.
Horowitz, J.L. and V.G. Spokoiny (2001). An Adaptive, Rate-Optimal Test of a Parametric
Mean Regression Model against a Nonparametric Alternative, Econometrica, 69, 599-631.
Horowitz, J.L. and V.G. Spokoiny (2002). An Adaptive, Rate-Optimal Test of Linearity for Median Regression Models, Journal of the American Statistical Association, 97, 822-835.
Jennrich, R.I. (1969). Asymptotic Properties of Non-Linear Least Squares Estimators, Annals of
Mathematical Statistics, 40, 633-643.
Newey, W.K. and J.L. Powell (2003). Instrumental Variable Estimation of Nonparametric
Models, Econometrica, 71, 1565-1578.
Newey, W.K., J.L. Powell, and F. Vella (1999). Nonparametric Estimation of Triangular
Simultaneous Equations Models, Econometrica, 67, 565-603.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics, New York: Wiley.
Tripathi, G. and Y. Kitamura (2003). Testing Conditional Moment Restrictions, Annals of
Statistics, 31, 2059-2095.
Table 1: Results of Monte Carlo Experiments

                                 Empirical Probability that H0 Is Rejected Using
Null     Alt.
Model    Model    ρ      η       τ_n test    t test    Bierens' Test
--------------------------------------------------------------------
H0 is true
(4.1)     --     0.8    0.1      0.051       0.052       0.053
(4.1)     --     0.8    0.5      0.030       0.034       0.029
(4.1)     --     0.7    0.1      0.049       0.052       0.053
(4.2)     --     0.8    0.1      0.053       0.040       0.054
(4.2)     --     0.8    0.5      0.046       0.077       0.043
(4.2)     --     0.7    0.1      0.056       0.036       0.036
H0 is false
(4.1)    (4.2)   0.8    0.1      0.658       0.714       0.470
(4.1)    (4.2)   0.8    0.5      0.721       0.827       0.466
(4.1)    (4.2)   0.7    0.1      0.421       0.444       0.280
(4.1)    (4.3)   0.8    0.1      0.684       0.671       0.479
(4.1)    (4.3)   0.8    0.5      0.663       0.580       0.464
(4.1)    (4.3)   0.7    0.1      0.424       0.412       0.274
(4.2)    (4.3)   0.8    0.1      0.510       0.566       0.038
(4.2)    (4.3)   0.8    0.5      0.972       0.987       0.030
(4.2)    (4.3)   0.7    0.1      0.527       0.590       0.059