On the Existence of √N-consistent Estimators of Linear Functionals in Non-Parametric IV Models∗
Laurent Davezies†
September 9, 2016
Abstract
This paper considers the issue of the existence of $\sqrt{N}$-consistent estimators of
linear functionals of the form $\theta_0 = E(\psi(X)m^{(j)}(X))$ in a nonparametric IV model
where $E(Y - m(X)|Z) = 0$. We derive a necessary condition for the existence of
$\sqrt{N}$-consistent estimators and we derive the semi-parametric efficiency bound under
this condition. This condition ensures that θ0 can be expressed as a moment of the
form E(q(Z)Y ) and is not nested with the completeness condition. We then explore
the implications of this necessary condition on the first stage i.e. the distribution of
X|Z.
When the distribution of X conditional on Z belongs to an exponential family, or
when the first stage has an additive structure, the condition that we exhibit turns out
to be quite restrictive for estimating averages under counterfactual distributions. In
that case, some parameters are never (respectively always) $\sqrt{N}$-estimable. But more
surprisingly in some cases, some parameters are $\sqrt{N}$-estimable if and only if some
central cross-moments of X and Z are above a critical value.
On the other hand, an additive first stage is a condition that enables the econometrician to identify average marginal effects, and ensures the finiteness of the semiparametric efficiency bound for a large class of DGP, even if the completeness condition does not hold.
Keywords: semi-parametric model, efficiency bound, instrumental variables,
counterfactual, average marginal effect.
JEL classification numbers: C01, C02 and C14.
∗ I thank my colleagues Xavier d’Haultfœuille and Anna Simoni for many helpful discussions. I also
thank Xiaohong Chen and Arthur Lewbel for their helpful comments and encouragement. This work has
also benefited from comments of participants of the 2015 Econometric Society World Congress, of the 2015
Annual Conference of the International Association for Applied Econometrics and of the CREST seminars.
Last, I am indebted to Benjamin Walter for his careful reading. The usual disclaimer applies and all errors
remain mine.
† CREST, [email protected].
1 Introduction
Models with endogeneity are central in econometrics. In a nonparametric setting with
additively separable errors, the completeness condition (Newey & Powell, 2003) is necessary
and sufficient to identify the structural function. But this condition is quite restrictive and
cannot be tested without supplementary restrictions (Canay et al., 2013). However, the
parameter of interest is often an expectation depending on the structural function (or on
its derivatives) and not the full structural function itself. For instance, the parameter of
interest could be an average marginal effect, an average effect under an exogenous change in
the distribution of regressors, a linear functional considered by Severini & Tripathi (2012b)
and Santos (2011), or a density weighted average derivative (Powell et al., 1989, Cattaneo
et al., 2014).
Moreover, we can hope there exist some regular root-N consistent estimators for such
parameters. The motivation to focus on the existence of root-N regular estimators is
related to some optimality results. Indeed, in sufficiently regular models when regular
estimators exist, a convolution Theorem and a local asymptotic minimax Theorem hold
for any subconvex loss function (cf. van der Vaart, 2000, Theorems 25.20 and 25.21). This
means that the corresponding asymptotic risk of any regular estimator would be bounded
below. For the asymptotic mean square error, this lower bound is usually called the semiparametric efficiency bound (see for instance Newey, 1990b). The efficient estimator that
achieves this bound leads to efficient tests (Choi et al., 1996). This notion generalizes
the Cramer-Rao bound originally introduced for parametric models. Not surprisingly, this
notion is closely related to the Fisher information. In an iid data generating process, it
measures the minimal number of observations needed to discriminate the true value of the
parameter of interest from an alternative one, whatever the value of nuisance parameters.
Both statisticians (Bickel et al., 1993) and econometricians (Chamberlain, 1987, Newey,
1988, Newey, 1990a, Severini & Tripathi, 2012a) have derived those bounds for various
models. However, the existence of such regular estimators is not always ensured (van der Vaart,
2000, Theorem 25.32 and van der Vaart, 1991). To the best of our knowledge, for the case of
linear functionals in a non-parametric instrumental variables model, the first paper to bring
this issue to light is Ai & Chen (2007). Non-parametric instrumental variables have only
been studied by econometricians (not by statisticians) and a large strand of the literature
focuses on the non-parametric estimation of the regression function m (for instance, Ai
& Chen, 2003, Florens, 2003, Newey & Powell, 2003, Hall & Horowitz, 2005, Chen, 2007,
Florens et al., 2011, Florens & Simoni, 2012, Florens et al., 2012, Darolles et al., 2011,
Chen & Pouzo, 2012). But linear functionals of the regression function, which are often
the parameter of interest for the economists, have received attention more recently (Ai &
Chen, 2007, Santos, 2011, Severini & Tripathi, 2012b, Ai & Chen, 2012, Carrasco et al.,
2014, Chen & Pouzo, 2015, Breunig & Johannes, 2016). Ai & Chen (2007), Carrasco et al.
(2014), Chen & Pouzo (2015) and Breunig & Johannes (2016) propose plug-in estimators
based on a non-parametric first step under the completeness condition, but the root-N
consistency of such estimators is far from ensured. For instance for the plug-in penalized
sieve minimum distance estimator, the detailed discussion in Section 3 of Chen & Pouzo
(2015) illustrates this negative result (see also the discussion of Assumption 4.1 in Ai &
Chen (2007)). Here, we do not establish the properties of a specific estimator, but examine
whether a regular root-N consistent estimator could exist.
The next section is brief and only introduces notation and the model. If Y = m(X) + ε
with E(ε|Z) = 0 and $F_{Y,X,Z}$ identified by the data, we show in Section 3 that
parameters of the form $\theta_0 = E(\psi(X)m^{(j)}(X))$ are identifiable under weaker conditions
than the completeness condition. We generalize a result of Severini & Tripathi
(2012b) to a larger class of parameters including for instance the average marginal effect.
In the fourth Section of this paper we give a necessary and sufficient condition for the existence of a finite semi-parametric efficiency bound for such parameters. This condition
is necessary for the existence of a regular root-N estimator. This condition states that there
exists q(Z) such that $\theta_0 = E(q(Z)Y)$. Moreover, when such a condition holds, the expression of the semi-parametric efficiency bound suggests an efficient estimator based on an
inverse problem because we also need to have $E(q(Z)|X)f_X(X) = (-1)^{|j|}[\psi(X)f_X(X)]^{(j)}$.
A straightforward implication of our main result concerns the exogenous case: if there is
no root-N estimator of θ0 based only on the outcome and on the covariates, supplementary instruments will not help to reach root-N consistency. Again, results obtained in
this section generalize the previous results of Severini & Tripathi (2012b). Moreover, our
expression of the semi-parametric efficiency bound is more explicit and shows that the estimator of Santos (2011) is not efficient. Our framework also generalizes those of Powell et al.
(1989) or Newey & Stoker (1993). Ai & Chen (2012) propose an alternative expression of
the semi-parametric efficiency bound for an arbitrary sequence of identifying conditional
moment restrictions, but assume the existence of a solution to a functional minimization problem¹. The existence of a finite semi-parametric efficiency bound (and hence of regular root-N
estimators) is related to the existence of a Riesz representation of the Hadamard derivative
of θ with respect to the square root of measure of any parametric submodel (cf. Sections
25.3 to 25.5 of van der Vaart (2000)). We emphasize the necessary condition under which
such a linear form exists, and we derive a simple expression for the efficiency bound in
which this condition appears explicitly.
¹ cf. Equations (5) and (6) in Ai & Chen (2012).
A first part of our condition leads to a necessary condition of regularity at the boundary
of the support of the regressors X. In the exogenous case (i.e. when X = Z), regularity at
the boundary of the support has already been shown to be a sufficient condition in many
papers such as Powell et al. (1989), and we show that we cannot relax it. A second part of
our condition is more restrictive, especially in the endogenous case (i.e. when X ≠ Z). The
sufficient conditions² given in Ai & Chen (2007), Carrasco et al. (2007), Carrasco et al.
(2014), Breunig & Johannes (2016) and Ai & Chen (2012) for the existence of a root-N
consistent estimator and/or for existence of a finite semi-parametric efficiency bound are
compatible with the necessary ones that we establish in this paper. In Section 4.3, we also
show that our expression of the semiparametric efficiency bound extends the formula given
in Ai & Chen (2012) to a larger class of DGP (reflecting the discrepancy between their
sufficient conditions and our necessary ones). However, the second part of our necessary
condition is also quite abstract and in the fifth Section, we discuss it in two special cases:
the exponential first stage and the additive first stage. We also give some examples to
show that naive intuition could be misleading.
To sum up, the main implications of our results are the following: (1) only "very regular"
(in a sense detailed in this paper) linear functionals are root-N estimable, and in this case
θ0 can be reformulated as a moment of (Y, Z), but the completeness condition is not
necessary; (2) the regularity conditions that we get are stronger in the endogenous case than in
the exogenous case, and supplementary instruments will not help reach the root-N rate when
the regressors are exogenous; (3) even for some simple data generating processes usually
considered in textbooks, the mean of Y under a simple counterfactual distribution of
X may or may not be root-N estimable, depending on the counterfactual distribution
chosen; (4) under an additive first stage, the average marginal effect admits a finite semiparametric efficiency bound under easily interpretable and reasonable conditions.
² cf. the source conditions in Carrasco et al. (2007), Assumption 4.1 in Ai & Chen (2007),
Assumptions 3.3 and 3.4 in Carrasco et al. (2014), Assumptions 1 and 2 and Remark 2.4 in Breunig &
Johannes (2016), and the existence of a solution of Equation (5) in Ai & Chen (2012).
2 DGP and parameter
Consider the following instrumental regression:
$$Y = m(X) + \varepsilon \quad \text{and} \quad E(\varepsilon|Z) = 0.$$
We assume there exists a dominant measure of (Y, X, Z) and $\mathcal{X} := \mathrm{Supp}(X) \subset \mathbb{R}^K$,
$\mathcal{Z} := \mathrm{Supp}(Z) \subset \mathbb{R}^l$, $\mathcal{Y} := \mathrm{Supp}(Y) \subset \mathbb{R}$. The function m is a function from $\mathbb{R}^K$ to
$\mathbb{R}$. For a K-dimensional multi-index $(j) = (j_1, ..., j_K)$, $f^{(j)}$ denotes the partial derivative
$\partial^{j_1 + ... + j_K} f / \partial x_1^{j_1} ... \partial x_K^{j_K}$ (which is assumed to exist and not to depend on the order in
which the differentiation is performed). Let $D^{(j)}(X)$ denote the set of functions f from $\mathbb{R}^K$ to
$\mathbb{R}$ such that with probability 1, $f^{(j)}(X)$ exists. We assume that $m \in D^{(j)}(X)$ and we are
interested in estimation of the linear functional $\theta_0 = E(\psi(X)m^{(j)}(X))$ with ψ a function
from $\mathbb{R}^K$ to $\mathbb{R}$ of the form $\psi(x) = h(f_X(x), x)$ with h a known function differentiable
with respect to the first variable and $f_X$ the identifiable density of X with respect to a
dominant measure. The average effect under a counterfactual distribution $\tilde{f}_X$ of regressors
corresponds to $\theta_0 = E\left(\frac{\tilde{f}_X(X)}{f_X(X)} m(X)\right)$. The average marginal effect of $X_k$ corresponds to
$E(\partial_{x_k} m(X))$.
To avoid functional dependence between the components of X, we assume that the product
of the dominant measure for the distributions of the components of X is a dominant
measure. Moreover, because differentiating with respect to the component $x_k$ only makes
sense for continuous variables, for k such that jk > 0, the dominant measure of Xk is
assumed to be the Lebesgue measure.
We can now state a first minimal assumption, under which the model and the parameter
considered have a meaning. In the sequel of the paper, for any random variable U, $L^1(U)$
denotes the set of real valued and measurable functions f on Supp(U) such that $E(|f(U)|) < \infty$.
Assumption 1 (Exclusion restriction and well-defined parameter)
$Y = m(X) + \varepsilon$ and $\theta_0 = E(\psi(X)m^{(j)}(X))$
with
(i) $Y \in L^1(Y)$, $m(X) \in L^1(X) \cap D^{(j)}(X)$ and $E(\varepsilon|Z) = 0$,
(ii) $m \in \mathcal{M} \subset \{l \in L^1(X) \cap D^{(j)}(X) \text{ and } \psi l^{(j)} \in L^1(X)\}$,
(iii) $F_{Y,X,Z}$ is identified.
The set of functions M in the second condition of the previous assumption reflects the
restrictions imposed by the model. For instance, Chen (2007) considers Hölder, Sobolev or
Besov spaces and Ai & Chen (2003), Blundell et al. (2007), Florens et al. (2012) impose
functional restrictions like some form of additive separability or partial linearity on m.
In many contexts, economic theory also imposes shape restrictions such as monotonicity
and/or concavity of m. Such restrictions can be embedded in the definition of M. Note
that M is not necessarily a linear space especially in the case of shape restrictions.
Our framework generalizes those of Severini & Tripathi (2012b) and Santos (2011) who
consider the case where (j) = (0). Such a class of parameters includes the average of Y
under a counterfactual distribution of X. Indeed, consider $\psi(x) = \tilde{f}_X(x)/f_X(x)$ with $\tilde{f}_X$ a
density function supported by $\mathcal{X}$. In such a case θ0 represents the average of Y under the
assumption that the distribution of unobserved ε is unchanged whereas the distribution of
X is changed into $\tilde{f}_X$.
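The reweighting interpretation of this parameter can be checked numerically. The following is only an illustrative sketch under assumptions that are not taken from the paper: X is standard normal, the counterfactual density is that of a N(0.5, 1) variable, and m(x) = x² is a hypothetical structural function. Under these choices, E(ψ(X)m(X)) computed under the observed distribution should coincide with the mean of m under the counterfactual distribution.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Hypothetical setup (not from the paper): X ~ N(0, 1), counterfactual N(0.5, 1), m(x) = x^2.
SHIFT = 0.5

def normal_pdf(x, mean=0.0):
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def m(x):
    return x ** 2

def psi(x):
    # Density ratio f_tilde_X / f_X
    return normal_pdf(x, SHIFT) / normal_pdf(x, 0.0)

# Gauss-Hermite rule for the weight exp(-x^2 / 2); rescaled so that
# sum(weights * g(nodes)) approximates E[g(X)] for X ~ N(0, 1).
nodes, weights = hermegauss(60)
weights = weights / np.sqrt(2.0 * np.pi)

theta0 = np.sum(weights * psi(nodes) * m(nodes))          # E[psi(X) m(X)] under f_X
counterfactual_mean = np.sum(weights * m(nodes + SHIFT))  # E[m(X~)] with X~ ~ N(0.5, 1)
```

Both quantities approximate the same counterfactual mean, which equals Var + mean² = 1.25 for this particular choice of m and counterfactual distribution.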
Our framework is also a generalization of the weighted average derivative studied among
others by Powell et al. (1989) and Newey & Stoker (1993). Such cases correspond to
X = Z, (j) = (0, ..., 1, ..., 0). Powell et al. (1989) consider the case where ψ(x) = fX (x),
whereas Newey & Stoker (1993) consider any differentiable ψ and include the case of
average marginal effect θ0 = E(∂xk m(X)).
Whereas Severini & Tripathi (2012b) emphasize the difficulty resulting from the endogeneity of X, they do not allow θ0 to be a linear functional of the derivative of m. On the other
hand, Powell et al. (1989) and Newey & Stoker (1993) focus on the average of the derivative
of m but under an exogeneity assumption on X.
We now turn to the identification of θ0 under Assumption 1.
3 Identification
Before we present our result on identification, let us introduce some notation that will
be used in the rest of the paper. For any nonnegative number p and for any random variable U, $L^p(U)$ denotes the set of measurable functions f defined on Supp(U) such that
$E(|f(U)|^p) < \infty$. If p = 0, $L^0(U)$ is simply the set of measurable functions defined on
the support of U. $L^\infty(U)$ is the set of bounded and measurable functions of U. For
$p \geq 1$, $L^p_0(U)$ denotes the space $\{f \in L^p(U) \text{ s.t. } E(f(U)) = 0\}$ and $L^p_0(U|V)$ the space
$\{f \in L^p(U, V) \text{ s.t. } E(f(U, V)|V) = 0\}$.
Let $A \subset L^0(X)$; $A^{\perp}$ denotes:
$$\{h \in L^0(X) : \forall g \in A, \ hg \in L^1_0(X)\}$$
Because the model relies on the nullity of the conditional expectation of ε, the condition
of identification depends on the following operator:
$$T : L^1(X) \longrightarrow L^1(Z), \qquad (x \mapsto g(x)) \longmapsto (z \mapsto E(g(X)|Z = z))$$
Ker (T ) denotes the set of functions l ∈ L1 (X) such that T (l) = 0.
Theorem 3.1 (Identification)
(i) The identification region of θ0 is $\{\theta_0 + E(\psi l^{(j)}) : l \in (\mathcal{M} - \{m\}) \cap \mathrm{Ker}(T)\}$. So if $\mathcal{M}$ is
a linear space, the identification region of θ0 is either $\{\theta_0\}$ (in which case θ0 is identified)
or $\mathbb{R}$ (in which case the data are uninformative about θ0).
(ii) θ0 is identified if and only if $\psi \in \{l^{(j)} \text{ s.t. } l \in (\mathcal{M} - \{m\}) \cap \mathrm{Ker}(T)\}^{\perp}$.
When the completeness condition holds (i.e. when Ker (T ) = {0}) then m is identified and
so is θ0 (Newey & Powell (2003)). Moreover, Canay et al. (2013) have
shown that completeness is not testable without supplementary restrictions. So achieving
identification of the parameter of interest under weaker conditions is attractive. The previous result generalizes those of Severini & Tripathi (2012b): if (j) = (0) and ψ ∈ L0 (X),
θ0 is identified as soon as for any l ∈ Ker (T ) such that E (|ψ(X)l(X)|) < ∞, one has
E (ψ(X)l(X)) = 0. The following examples illustrate the fact that without completeness
some parameters of interest could be identified.
Example: A simple scale model on the first stage. Let $X = Z \cdot \eta$, with $Z \perp\!\!\!\perp \eta$,
$\eta \sim U_{[-1;1]}$, $\mathcal{Z}$ an interval of $\mathbb{R}^+$ and $E(|Z|) < \infty$. In that case, $l \in \mathrm{Ker}(T)$ if and only
if for any $z \in \mathcal{Z}$, $\int_{-1}^{1} l(zu)du = \int_{-z}^{z} l(u)du/z = 0$. Derivation with respect to z gives
$(l(z) + l(-z))/z - \int_{-z}^{z} l(u)du/z^2 = 0$ and then implies that l is an odd function. Reciprocally, if l is an odd function, then it belongs to Ker (T ). From Theorem 3.1.(ii), it follows
that the average effect under the counterfactual distribution $\tilde{f}_X$ (i.e. $\theta_0 = E\left(\frac{\tilde{f}_X(X)}{f_X(X)} m(X)\right)$)
is identifiable if and only if $\tilde{f}_X$ is even. On the other hand, the average marginal effect is
not identifiable because ψ(x) = 1 is even whereas $\{l' \text{ s.t. } l \in \mathrm{Ker}(T)\}^{\perp}$ is the set of odd
functions on [−1; 1].
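The characterization of Ker (T ) in this scale model can be illustrated numerically. The sketch below is not part of the original argument: the functions x³ − sin(x) and x² and the grid of values of z are arbitrary illustrative choices, and the conditional expectation is approximated with a Gauss-Legendre rule. An odd function should be mapped to zero by T, whereas an even function such as x² is mapped to z²/3.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

# Gauss-Legendre nodes and weights on [-1, 1] (the weights sum to 2).
nodes, weights = leggauss(40)

def T(l, z):
    # (T l)(z) = E(l(Z * eta) | Z = z) with eta ~ U[-1, 1], i.e. (1/2) * int_{-1}^{1} l(z u) du
    return 0.5 * np.sum(weights * l(z * nodes))

z_grid = np.linspace(0.5, 2.0, 4)  # hypothetical values in the support of Z

odd_images = np.array([T(lambda x: x ** 3 - np.sin(x), z) for z in z_grid])
even_images = np.array([T(lambda x: x ** 2, z) for z in z_grid])
```

Here `odd_images` vanishes up to rounding error, so the odd function belongs to Ker (T ), while `even_images` reproduces z²/3 and the even function does not.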
Example: Average marginal effect under an additive first stage. We will see in
the last section of this paper, that an additive first stage implies regularity of the DGP
that ensures the finiteness of the semi-parametric efficiency bound for the average marginal
effect under mild conditions. Here, we discuss the identification of such parameter.
Proposition 3.2 (Identification of Average marginal effect)
Let Y = m(X) + ε with Y, ε and X real random variables such that Assumption 1 holds, and
assume that X = p(Z) + η with $Z \perp\!\!\!\perp \eta$. Then $\theta_0 = E(m'(X))$ is identified if:
$$\mathcal{M} \subset \{l \in L^1(X), \ l \in D^1(X), \ \exists \mathcal{P} \text{ s.t. } \mathbb{P}(p(Z) \in \mathcal{P}) = 1 \text{ and } \forall p \in \mathcal{P}, \ \partial_p E(l(p+\eta)) = E(l'(p+\eta))\}$$
The identification argument relies on the fact that the derivative with respect to p ∈
Supp(p(Z)) and integration with respect to $F_\eta$ commute. It holds if for instance,
$\mathcal{M} = \{l \in L^1(X), \ l \in D^1(X) \text{ and } |l'(x)| \leq \sum_{s=0}^{S} a_s x^s\}$, and $E(|\eta|^S) < \infty$. The derivative of
m is assumed to be bounded by a polynomial of fixed degree S and for this given S,
the corresponding moments of η are well defined. If the derivative of m is assumed to be
bounded by a constant, this condition always holds. It is not difficult to find some DGP
where such a condition holds but where the completeness condition is not fulfilled. Consider
for instance that $\mathbb{P}(p(Z) \in \mathrm{int}\,\mathrm{Supp}(p(Z))) = 1$, η = ν + u with ν and u two independent
variables with u uniform on an interval I not reduced to a single point. Then $E(m'(X))$
is identified whereas neither the completeness nor the bounded completeness condition
holds (d’Haultfœuille, 2011). The result holds if ν = 0 almost-surely and whatever the
distribution of Z. The proof of Proposition 3.2 is given in the Appendix, but we
present here an informal sketch of it. Under Condition (i) or (ii) of Proposition 3.2,
for any p ∈ int (Supp(p(Z))) we have the following equality:
$$\frac{\partial}{\partial p} E(l(X)|p(Z) = p) = E(l'(X)|p(Z) = p).$$
For l ∈ Ker (T ) the left hand side is null and so is the right hand side. Theorem
3.1.(ii) ensures the identification of θ0. Moreover, the strategy of identification suggests
the estimator $\frac{1}{n}\sum_{i=1}^{n} \hat{g}(Z_i)$ with $\hat{g}(z)$ an estimator of $\frac{\partial}{\partial p} E(Y|p(Z) = p)$ because $\frac{\partial}{\partial p} E(Y|p(Z) = p) = E(m'(X)|p(Z) = p)$. When p(.) admits a non vanishing derivative with respect to the component $z_l$, the identification argument
could be restated as a kind of Wald-ratio: $E(m'(X)) = E(E(m'(X)|Z = z)) = E\left[E\left(\frac{\partial_{z_l} E(Y|Z=z)}{\partial_{z_l} p(z)} \Big| Z = z\right)\right]$. Moreover when more than one instrument $z_l$ satisfies $\partial_{z_l} p(z) \neq 0$, the average marginal effect is overidentified (cf. Carrasco
et al. (2007) for a definition of overidentification in non-parametric or semi-parametric
linear inverse problems).
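The commutation of differentiation in p and integration with respect to the distribution of η, which drives this identification argument, can be checked on a simple case. The sketch below is purely illustrative and rests on assumptions not made in the paper: η is uniform on [−1, 1], m(x) = x³ + sin(x) is a hypothetical structural function, the expectation over η is computed by Gauss-Legendre quadrature, and the derivative in p is approximated by a central finite difference. Since E(ε|Z) = 0, the conditional mean of Y given p(Z) = p equals E(m(p + η)).

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

nodes, weights = leggauss(40)  # quadrature rule on [-1, 1]

def mean_over_eta(func, p):
    # E[func(p + eta)] for eta ~ U[-1, 1] (density 1/2 on [-1, 1]); hypothetical first-stage noise
    return 0.5 * np.sum(weights * func(p + nodes))

def m(x):
    # Hypothetical structural function with polynomially bounded derivative
    return x ** 3 + np.sin(x)

def m_prime(x):
    return 3.0 * x ** 2 + np.cos(x)

def d_dp_conditional_mean(p, h=1e-4):
    # Central finite difference of p -> E(Y | p(Z) = p) = E[m(p + eta)]
    return (mean_over_eta(m, p + h) - mean_over_eta(m, p - h)) / (2.0 * h)

p_grid = np.array([-1.0, 0.0, 0.7, 2.0])  # illustrative values of p(Z)
lhs = np.array([d_dp_conditional_mean(p) for p in p_grid])  # d/dp E(Y | p(Z) = p)
rhs = np.array([mean_over_eta(m_prime, p) for p in p_grid])  # E(m'(X) | p(Z) = p)
```

The arrays `lhs` and `rhs` agree up to the finite-difference error, which is the equality used in the informal sketch of the proof.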
In both previous examples, we see that the completeness condition is a sufficient but
not a necessary condition for identification of θ0 . Under the conditions of identification
given in Theorem 3.1, we will now examine the issue of the existence of a root-N consistent
estimator of θ0 .
4 Existence of root-N regular estimator
In our context of non-parametric regression and under our identification condition of θ0 , it
is natural to ask whether the parameter of interest admits a finite semi-parametric efficiency
bound. Heuristically, we know that the optimal rate of convergence of estimators of $m^{(j)}$ is
always lower than $\sqrt{N}$ and in some cases far lower. But because θ is an "average" of $m^{(j)}$,
the optimal rate of convergence of estimators of θ could be higher and sometimes reaches
the $\sqrt{N}$-rate. In that case, there is room to build an efficient estimator with respect to any
subconvex loss function (van der Vaart (2000), Theorem 25.21), uniformly most powerful
tests and asymptotically uniformly most accurate confidence sets (see Choi et al. (1996)
for a discussion about asymptotically uniformly most powerful tests in semiparametric
models).
The case j = 0, M = L2 (X) and ψ ∈ L2 (X) has been studied by Severini & Tripathi
(2012b). If there exists q ∈ L1 (Z) such that ψ(X) = E(q(Z)|X), then for any l ∈ L2 (X) ∩
Ker (T ), E(ψ(X)l(X)) = E(q(Z)l(X)) = 0 and the condition for identification of θ0 holds.
Moreover in this case, the parameter can be rewritten as θ0 = E(Y q(Z)) and estimated by
the average of the $Y_i \hat{q}(Z_i)$, with $\hat{q}$ a consistent estimator of q. Intuitively, if $\hat{q}$ is sufficiently
"regular" the estimator will behave as a sample mean, i.e. the corresponding estimator of
θ0 will be regular and root-N consistent. Severini & Tripathi (2012b) have shown that in
such a case, the existence of such a q is a necessary condition for the existence of a regular
root-N estimator of θ0 .
In this section we derive the same kind of necessary condition in a more general framework
to ensure the existence of regular $\sqrt{N}$-consistent estimators of θ0 and we derive an expression of
the semi-parametric efficiency bound of θ0. For the sake of simplicity, we will now assume
that Y, m(X), $m^{(j)}(X)$ and ψ(X) all admit finite variances. Consequently, we now
work in the Hilbert space $L^2(Y, X, Z)$. T is now considered as an operator from $L^2(X)$
into $L^2(Z)$ and $T^*$ is its adjoint operator:
$$T^* : L^2(Z) \longrightarrow L^2(X), \qquad (z \mapsto g(z)) \longmapsto (x \mapsto E(g(Z)|X = x))$$
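The roles of T, of its adjoint and of the representation θ0 = E(q(Z)Y) can be visualized in a finite-support setting, where both operators are matrices. The following sketch is only a toy illustration and not the continuous setting of the paper: the joint probabilities `P`, the function `m` and the weight `q` are hypothetical. With three support points for X and two for Z, completeness fails (Ker (T ) has dimension one), yet any ψ in the range of the adjoint is orthogonal to Ker (T ), so the corresponding θ0 is identified and equals E(q(Z)Y).

```python
import numpy as np

# Hypothetical joint pmf of (X, Z): rows index 3 values of X, columns 2 values of Z.
P = np.array([[0.20, 0.10],
              [0.10, 0.20],
              [0.15, 0.25]])
f_X, f_Z = P.sum(axis=1), P.sum(axis=0)

T = (P / f_Z).T            # (T g)[z]  = E(g(X) | Z = z), shape (2, 3)
T_star = P / f_X[:, None]  # (T* q)[x] = E(q(Z) | X = x), shape (3, 2)

m = np.array([1.0, -2.0, 0.5])  # illustrative structural function on the support of X
q = np.array([2.0, -1.0])       # illustrative function of the instrument

# Since E(eps | Z) = 0, E(Y | Z) = T m, hence E(q(Z) Y) = E(q(Z) (T m)(Z)).
moment_YZ = np.sum(f_Z * q * (T @ m))
# Adjoint identity: the same number is E(psi(X) m(X)) with psi = T* q.
psi_in = T_star @ q
theta0 = np.sum(f_X * psi_in * m)

# Ker(T) is spanned by the last right-singular vector of the 2 x 3 matrix T.
l = np.linalg.svd(T)[2][-1]
gap_in = np.sum(f_X * psi_in * l)    # E(psi(X) l(X)) for psi in the range of T*
psi_out = np.array([1.0, 0.0, 0.0])  # a weight that is not in the range of T*
gap_out = np.sum(f_X * psi_out * l)
```

In this toy design `gap_in` is zero while `gap_out` is not, so E(ψ(X)m(X)) is identified for ψ = T*q but not for the second weight, in line with Theorem 3.1.(ii).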
4.1 Regularity conditions
We will assume some supplementary regularity conditions. Let $X^+$ (respectively $X^0$) denote the subset of the variables $X_k$ such that $j_k > 0$ (respectively $j_k = 0$) and $K^+$ (respectively $K^0$) the number of such variables. We assume without loss of generality that the
variables X are ordered such that $X = (X^+, X^0)$.
Remember that $\psi(x) = h(f_X(x), x)$. Because $f_X$ is unknown but identifiable in the data,
the existence of a root-N consistent estimator is related to regularity conditions on
$u \mapsto h(u, x)$.
Intuitively, the larger M is, the stronger are the conditions needed to ensure the existence of a root-N consistent estimator. If M is a finite dimensional space (for instance, if m is assumed
to be linear) and if the identification condition for θ0 holds, GMM estimators are root-N
consistent (as soon as Y and m(X) admit finite variances). To avoid such degenerate
cases, we have to impose that M is sufficiently large, and a priori the larger M is, the
stronger are the necessary conditions for finiteness of the semiparametric efficiency bound.
However, we show that the necessary and sufficient condition holds as soon as M contains
m + tϕ for any ϕ ∈ Φ and t sufficiently close to zero, where Φ is the set of test functions of
Schwartz (Schwartz, 1997):
$$\Phi = \left\{\varphi(x) = \prod_{k=1}^{K} \varphi_k(x_k) \text{ with } \varphi_k \text{ indefinitely differentiable with compact support}\right\}.$$
The statistical model is parameterized by $(m, f_{\varepsilon,X,Z})$. Consider a subset of parameters
$(m_t, f_{\varepsilon,X,Z,t})$ indexed by $t \in \mathcal{T}$, a neighborhood of 0, such that $(m, f_{\varepsilon,X,Z}) = (m_0, f_{\varepsilon,X,Z,0})$.
This subset of parameters generates a submodel indexed by t: $f_{Y,X,Z,t}(y, x, z) = f_{\varepsilon,X,Z,t}(y - m_t(x), x, z)$. This submodel is differentiable in quadratic mean if there exists a score
$s \in L^2(Y, X, Z)$ such that:
$$\lim_{t \to 0} \int \left[\frac{1}{t}\left(f^{1/2}_{Y,X,Z,t} - f^{1/2}_{Y,X,Z}\right) - \frac{1}{2} s f^{1/2}_{Y,X,Z}\right]^2 d\nu(y, x, z) = 0.$$
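To fix ideas on differentiability in quadratic mean, the limit above can be evaluated numerically in a textbook case. The sketch below is an illustration under an assumption that is not the model of the paper: a one-dimensional Gaussian location submodel in which the density at t is that of N(t, 1), whose score at t = 0 is s(y) = y. The integral is approximated by a Riemann sum on a wide grid.

```python
import numpy as np

# Hypothetical submodel: Y ~ N(t, 1), so the score at t = 0 is s(y) = y.
y = np.linspace(-12.0, 12.0, 200001)
dy = y[1] - y[0]

def sqrt_density(points, t):
    # Square root of the N(t, 1) density
    return (2.0 * np.pi) ** (-0.25) * np.exp(-0.25 * (points - t) ** 2)

def dqm_remainder(t):
    # Integral of [ (f_t^{1/2} - f_0^{1/2}) / t - (1/2) s f_0^{1/2} ]^2, by a Riemann sum
    score = y
    diff = (sqrt_density(y, t) - sqrt_density(y, 0.0)) / t - 0.5 * score * sqrt_density(y, 0.0)
    return np.sum(diff ** 2) * dy

remainders = {t: dqm_remainder(t) for t in (0.1, 0.01, 0.001)}
```

The remainder shrinks roughly like t² as t goes to zero, which is the behaviour required by the definition for this particular submodel.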
The differentiability in quadratic mean is a regularity condition that ensures the efficiency
of maximum likelihood in a parametric framework (van der Vaart, 2000, Chapters 7
and 8). This regularity condition also allows one to extend the notion of optimality to the semiparametric framework (see for instance Bickel et al., 1993, van der Vaart, 2000 Chapter
25 or Severini & Tripathi, 2012a Section 2.3). A set of score functions s is a tangent set.
The semi-parametric efficiency bound of θ0 is always defined relative to a given tangent
set. This bound is only well defined for a parameter and a tangent set such that
$t \mapsto \theta(t) = t^{-1}\left(\int \psi(x) m_t^{(j)}(x) f_{\varepsilon,X,Z,t}(e, x, z) d\nu - \theta_0\right)$ tends to a continuous linear form on
the given tangent set when t → 0. Without any regularity condition, the largest tangent
set with respect to which θ(t) is differentiable is not easily characterizable. But under the
following regularity conditions, θ(t) is differentiable with respect to the maximal tangent
set, i.e. the set of all square integrable functions with respect to $f_{Y,X,Z}$.
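Assumption 2 (Regularity Assumption)
(i) $Y \in L^2(Y)$, $\psi(X) \in L^2(X)$ and $E(\varepsilon|Z) = 0$,
(ii) $m \in \mathcal{M} \subset \{l \in L^2(X) \cap D^{(j)}(X), \ l^{(j)} \in L^2(X) \text{ and } \psi l^{(j)} \in L^2(X)\}$,
(iii) The statistical model is dominated by $\nu(y, x, z) = \lambda(y) \otimes \mu(x, z)$ with λ the Lebesgue
measure,
(iv) h admits a derivative with respect to the first component X-almost-surely and
$\partial_{f_X}\psi(X) := \frac{\partial h(u,x)}{\partial u}\Big|_{u=f_X(X)}$ is in $L^2(X)$,
(v) There exist non negative $\underline{v}$ and $\overline{v}$ bounding the skedastic function:
$E\left((Y - m(X))^2|Z\right) \in [\underline{v}; \overline{v}]$,
(vi) When $X^+ \neq \emptyset$, $X^+|X^0$ admits a density $f_{X^+|X^0}$ with respect to the Lebesgue measure
on $\mathbb{R}^{K^+}$, $X^0$-almost-surely,
(vii) For t sufficiently close to 0, $m - t \in \mathcal{M}$ and $e \mapsto f^{1/2}_{\varepsilon,Z,X}(e, Z, X)$ is absolutely continuous
(Z, X)-almost-surely, $\frac{\partial_\varepsilon f_{\varepsilon,X,Z}(\varepsilon,X,Z)}{f_{\varepsilon,X,Z}(\varepsilon,X,Z)} \in L^2(\varepsilon, X, Z)$ (i.e. the Fisher information for location
of ε is finite) and $\varepsilon \frac{\partial_\varepsilon f_{\varepsilon,X,Z}(\varepsilon,X,Z)}{f_{\varepsilon,X,Z}(\varepsilon,X,Z)} \in L^2(\varepsilon, X, Z)$,
(viii) $\forall \varphi \in \Phi$, $m + t\varphi \in \mathcal{M}$ for t sufficiently close to 0,
(ix) For t sufficiently close to 0, $(1 + t)m \in \mathcal{M}$ and $m(X) \frac{\partial_\varepsilon f_{\varepsilon,X,Z}(\varepsilon,X,Z)}{f_{\varepsilon,X,Z}(\varepsilon,X,Z)} \in L^2(\varepsilon, X, Z)$.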
The first and the second conditions (i) and (ii) in Assumption 2 reinforce the conditions of
Assumption 1, assuming that all considered variables admit finite variances. In particular,
ψ(X)m(j) (X) ∈ L2 (X) ensures that θ(t) admits a derivative at t = 0 which is a continuous
linear form on the maximal tangent set of scores. Assumption 2.(iii) is very common and
natural in such context. Assumption 2.(iv) is made to get a tractable formula for the semiparametric efficiency bound of θ0 when ψ(X) depends on $f_X(X)$. We do not have in mind
any interesting parameter of the form $E(h(f_X(X), X)m^{(j)}(X))$ with h not differentiable
with respect to the first component. For instance, h(u, x) = u for the density-weighted
average or $h(u, x) = \tilde{f}(x)/u$ for the average under the counterfactual distribution $\tilde{f}$ are
both differentiable with respect to u. Assumption 2.(v) ensures the equivalence between
q(Z) ∈ L2 (Z) and q(Z)ε ∈ L2 (Z, ε), which considerably simplifies the proof. Such an
assumption also ensures that every score (corresponding to a parametric sub-model) can
be approximated by a bounded score (of another parametric sub-model). Indeed, in a
step of our proof we establish some results for parametric sub-models with bounded score,
and in another step we extend this result to parametric sub-models with unbounded score.
Even if such a condition may not be minimal to establish our result, we follow Severini &
Tripathi (2012b) who also assume such a quite intuitive and simple condition. Concerning
Assumption 2.(vi), remember that we have previously assumed that each component of
$X^+$ is dominated by the Lebesgue measure on $\mathbb{R}$. Assumption 2.(vi) ensures moreover that
no component of $X^+$ is functionally determined by other components of X. This is a very
usual assumption in econometrics to ensure that the data allow one to identify variation of one
component of X keeping the other components constant. In Assumption 2.(vii), absolute
continuity of $e \mapsto f^{1/2}_{\varepsilon,Z,X}(e, Z, X)$ ensures that $\partial_\varepsilon f_{\varepsilon,X,Z}$ is well defined (De La Vallée Poussin
(1915), Section 7.40.2) and next $E\left[\frac{\partial_\varepsilon f_{\varepsilon,X,Z}(\varepsilon,X,Z)}{f_{\varepsilon,X,Z}(\varepsilon,X,Z)}\right]^2$ is also well defined. Assumptions
2.(vii) and 2.(iii) ensure that the parametric sub-model of location $\{(m(.) - t, f_{\varepsilon,X,Z}), t \in \mathcal{T}\}$
is differentiable in quadratic mean. These two assumptions also ensure that any score
(corresponding to any sub-model differentiable in quadratic mean) can be decomposed in
two components: one related to the nuisance parameter m and the other one related to
$f_{\varepsilon,X,Z}$. Assumption 2.(viii) enables us to consider a tangent set sufficiently rich to provide
non-trivial restrictions on the DGP to get a finite semi-parametric efficiency bound. In
particular, this Assumption rules out the case where the support of X is not finite and
where m is known up to a finite parameter corresponding to the usual two-step GMM.
Last, if Assumption 2.(ix) also holds then the parameter θ0 needs to be a moment of (Y, Z)
to ensure that the semi-parametric efficiency bound is finite. Assumptions 2.(vii) to 2.(ix)
impose that M is not degenerate. But they are compatible with many regularity
restrictions used to define M, like those considered by Chen (2007) (Hölder, Sobolev or Besov
spaces). Concerning shape restrictions such as monotonicity and/or concavity, Assumption
2 imposes that m is strictly monotonic and/or strictly concave, meaning that m belongs
to the interior of the constraints that define M. When m is at the boundary of M, the
model becomes irregular and this could help to get a higher rate of convergence³, such as
in Chetverikov & Wilhelm (2015).
4.2 Main result
The key necessary condition for existence of a root-N regular estimator of θ0 is given by
the following Assumption.
Assumption 3
(i) With probability 1, the weak derivative of $x^+ \mapsto \psi(x^+, X^0)f_X(x^+, X^0)$ is a function on
$\mathbb{R}^{K^+}$. This function is denoted $(\psi f_X)^{(j)}$.
(ii) $Q := \{q \in L^2(Z) \text{ s.t. } T^*(q)f_X = (-1)^{|j|}(\psi f_X)^{(j)}\} \neq \emptyset$.
Assumption 3.(i) is related to the distribution theory developed by Schwartz (1997).
For $k = 1, ..., K^+$, let $\varphi_k$ be an indefinitely differentiable function with compact support
from $\mathbb{R}$ to $\mathbb{R}$. As $x^+ \mapsto \psi(x^+, X^0)f_X(x^+, X^0)$ is locally integrable, the quantity
$(-1)^{|j|} \int \psi(x^+, X^0) f_X(x^+, X^0) \prod_{k=1}^{K^+} \varphi_k^{(j_k)}(x_k^+) dx_1^+ ... dx_{K^+}^+$ is well defined for any $(\varphi_1, ..., \varphi_{K^+})$.
This mapping defines the weak derivative of order (j) of $\psi(x^+, X^0)f_X(x^+, X^0)$. This weak
derivative is a function (up to an isomorphism) denoted $(\psi f_X)^{(j)}$ if for any $(\varphi_1, ..., \varphi_{K^+})$:
$$\int (\psi f_X)^{(j)}(x^+, X^0) \prod_{k=1}^{K^+} \varphi_k(x_k^+) dx_1^+ ... dx_{K^+}^+ = (-1)^{|j|} \int \psi(x^+, X^0) f_X(x^+, X^0) \prod_{k=1}^{K^+} \varphi_k^{(j_k)}(x_k^+) dx_1^+ ... dx_{K^+}^+.$$
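The defining identity of the weak derivative can be checked numerically in one dimension for a function that is not differentiable in the usual sense. The sketch below is illustrative only: the triangular function g(x) = max(0, 1 − |x|) stands in for ψf_X and is a hypothetical choice, as is the bump test function centred at 0.2. With (j) = (1), the candidate weak derivative is −sign(x) on (−1, 1) and 0 elsewhere, and both integrals are approximated by Riemann sums.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 400001)
dx = x[1] - x[0]

# Hypothetical stand-in for psi * f_X: a triangular function with kinks at -1, 0 and 1.
g = np.maximum(0.0, 1.0 - np.abs(x))
# Candidate weak derivative of order 1: -sign(x) on (-1, 1), 0 elsewhere.
g_weak = np.where(np.abs(x) < 1.0, -np.sign(x), 0.0)

def bump(points, center=0.2, radius=1.0):
    # Infinitely differentiable test function with compact support, and its derivative
    u = (points - center) / radius
    inside = np.abs(u) < 1.0
    safe = np.where(inside, 1.0 - u ** 2, 1.0)  # avoids division by zero outside the support
    phi = np.where(inside, np.exp(-1.0 / safe), 0.0)
    dphi = np.where(inside, phi * (-2.0 * u) / (safe ** 2 * radius), 0.0)
    return phi, dphi

phi, dphi = bump(x)
lhs = np.sum(g_weak * phi) * dx            # int (psi f_X)^(1) * phi
rhs = (-1.0) ** 1 * np.sum(g * dphi) * dx  # (-1)^{|j|} int (psi f_X) * phi^(1)
```

The two sides agree up to the discretization error, even though g has kinks; only the test function needs to be smooth.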
³ but the rate of convergence becomes specific to the loss function and risk considered.
When $x^+ \mapsto \psi(x^+, X^0)f_X(x^+, X^0)$ is continuously differentiable up to order (j), successive
integrations by parts ensure that $(\psi f_X)^{(j)}$ coincides with the usual derivative of order (j) of
$\psi f_X$. Such a restriction will be discussed briefly in the next section. The second condition in
the previous assumption is the strongest one and the most abstract one. To get the intuition
about the role of this condition, we can consider the particular case where (j) = (0). In that
case Assumption 3.(i) automatically holds and the Assumption 3.(ii) is equivalent to the
existence of q ∈ L2 (Z) such that E(q(Z)|X) = ψ(X). Under the completeness condition
(i.e. Ker (T ) = {0}), m is identified and so is E(ψm) for every ψ. The completeness
condition is equivalent to cl (R(T ∗ )) = L2 (X). In the favorable case ψ is in R(T ∗ ) and so
there exists q such that θ0 = E(ψ(X)m(X)) = E(q(Z)m(X)) = E(q(Z)Y ). But in other
cases ψ is only in cl (R(T ∗ )) \R(T ∗ ). In this latter case, θ0 is only poorly identified and
cannot be expressed as a known moment of the form E(q(Z)Y ) but only approximated by
such moments. Such a situation is not specific to the nonparametric instrumental variable model and has been formalized in a wider framework by van der Vaart (see van der Vaart, 1991, for a detailed discussion or van der Vaart, 2000, Theorem 25.32, for a more synthetic one). In the general case (i.e. $(j) \neq (0)$), if Assumption 2 holds then Assumption 3 is necessary to ensure the existence of a root-N regular estimator, and in that case there exists $q \in L^2(Z)$ such that $\theta_0 = E(q(Z)Y)$. This is the main theoretical result of this paper, established in the following Theorem.
Theorem 4.1 (Semi-parametric efficiency bounds)
If Assumption 1 and Assumptions 2.(i) to 2.(viii) hold and if θ0 is identified, a regular root-N estimator of θ0 exists only if Assumption 3 holds, and in that case there exists $q^\star \in Q$ such that the semiparametric efficiency bound $V^\star$ satisfies:
$$V^\star = V[\varepsilon q^\star(Z) + R(X)] = \min_{q \in Q} V[\varepsilon q(Z) + R(X)],$$
with $R(X) = (\psi(X) + \partial_{f_X}\psi(X) f_X(X))\, m^{(j)}(X) - E\big[(\psi(X) + \partial_{f_X}\psi(X) f_X(X))\, m^{(j)}(X)\big]$.
If moreover Assumption 2.(ix) holds, a regular root-N estimator of θ0 exists only if $\theta_0 = E(q^\star(Z)Y)$.
Theorem 4.1 states that Assumption 3 is necessary to ensure the existence of a finite semiparametric efficiency bound. It also states that the bound is the minimum on Q of a function V and is achieved for a given $q^\star \in Q$.
The existence of a regular root-N consistent estimator depends on two conditions. Assumption 3.(i) concerns the regularity of $\psi f_X$ and will be briefly discussed in the next section. Assumption 3.(ii) is more abstract and restrictive and is crucial to ensure finiteness of the semiparametric efficiency bound. Moreover in that case, θ0 is a moment of (Y, Z) only (unless $(1 + t)m \notin \mathcal{M}$). So, Theorem 4.1 states that a root-N regular consistent estimator exists only if we are able to "rule out" the endogenous variable X in the definition of θ0.
And the intuition suggests that such situation is very specific. So, we will discuss more
extensively Assumption 3.(ii). In particular, we will focus on two special cases widely considered by theoretical and applied econometricians: the exponential first stage (i.e. X|Z
belongs to an exponential family of distributions) and the additive first stage (i.e. X is the
sum of a function of Z and a term η independent of Z but potentially correlated with ε).
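To make the role of Assumption 3.(ii) concrete, the following worked derivation sketches why $\theta_0 = E(q(Z)Y)$ for any $q \in Q$. It is only an illustration under simplifying assumptions that are not those of Theorem 4.1: all components of X are differentiated (so $X = X^+$), $T^*(q) = E(q(Z)|X)$ as in the case $(j) = (0)$ discussed above, m is regular enough to play the role of the test functions $\varphi_k$ in the definition of the weak derivative, and all expectations exist.

```latex
\begin{align*}
\theta_0 &= E\big(\psi(X)\, m^{(j)}(X)\big) = \int \psi(x) f_X(x)\, m^{(j)}(x)\, dx \\
&= (-1)^{|j|} \int (\psi f_X)^{(j)}(x)\, m(x)\, dx
   && \text{(definition of the weak derivative)} \\
&= \int T^*(q)(x)\, f_X(x)\, m(x)\, dx
   && \text{(Assumption 3.(ii))} \\
&= E\big(E(q(Z)|X)\, m(X)\big) = E\big(q(Z)\, m(X)\big) \\
&= E\big(q(Z)\, Y\big)
   && \text{(because } E(Y - m(X)|Z) = 0\text{)}.
\end{align*}
```

Each step only uses objects defined in the text; the first equality on the second line is where the regularity of $\psi f_X$ at the boundary of the support matters.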
Another interesting remark about Theorem 4.1 concerns the existence of supplementary instruments in the exogenous case⁴. In the exogenous case, we have Z = X and the semi-parametric efficiency bound of θ0 is not finite if and only if
$$E\left[\left(\frac{[\psi f_X]^{(j)}}{f_X}\right)^2\right] = +\infty$$
(cf. Assumption 3). In that case supplementary instruments $Z^*$ (i.e. the situation where $Z = (X, Z^*)$) will not help achieve a parametric rate of convergence (because if Assumption 3.(ii) holds then $E\big[\big([\psi f_X]^{(j)}/f_X\big)^2\big] < +\infty$).
On the other hand, in the endogenous case (i.e. when $X \not\subset Z$), supplementary instruments could ensure the validity of Assumption 3.(ii). Last, in all cases (endogenous or exogenous) if Assumption 3 holds for a given set of instruments it will still hold if supplementary instruments are used.
4.3 Alternative formulation of the semi-parametric efficiency bound
Assumption 3.(ii) and the expression of the semiparametric efficiency bound have alternative formulations that make the link with the two stage least squares estimators used in a parametric framework more transparent. Let us define the two adjoint operators:
$$K : L^2(X) \to L^2(Z), \qquad f \mapsto \frac{1}{\sqrt{E(\varepsilon^2|Z)}}\, T(f)$$
and
$$K^* : L^2(Z) \to L^2(X), \qquad g \mapsto T^*\left(\frac{g}{\sqrt{E(\varepsilon^2|Z)}}\right).$$
If Assumption 2.2 holds, the skedastic function $z \mapsto E(\varepsilon^2|Z = z)$ is assumed to be bounded and bounded away from 0 and so $R(T^*) = R(K^*)$, $R(T) = R(K)$, $R(TT^*) = R(KK^*)$ and $R(T^*T) = R(K^*K)$.
For any linear and bounded operator A from a Hilbert space $H_1$ to a Hilbert space $H_2$, we can define the generalized inverse $A^\dagger$ of A on R(A) by:
$$A^\dagger(y) = u \text{ if and only if } u = \arg\min_{v \in H_1 : A(v) = y} \|v\|^2_{H_1}.$$
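4 We are grateful to Arthur Lewbel for suggesting to look at this question.

Proposition 4.2 If Assumptions 1 to 3 hold and if θ0 is identified, then the semi-parametric efficiency bound of θ0 is equal to:
(i)
$$V^* = E\left[\left(K^{*\dagger}\left((-1)^{|j|}\frac{[\psi f_X]^{(j)}}{f_X} + K^*\left(\frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}}\right)\right)\right)^2\right] + E\left[\left(R(X) - \varepsilon\frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)}\right)^2\right],$$
(ii) For any q ∈ Q,
$$V^* = E\left[\left(K^{*\dagger}K^*\left(q(Z) + \frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}}\right)\right)^2\right] + E\left[\left(R(X) - \varepsilon\frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)}\right)^2\right].$$
If and only if
$$(-1)^{|j|}\frac{[\psi f_X]^{(j)}}{f_X} + K^*\left(\frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}}\right) \in R(K^*K),$$
these expressions reduce to:
(iii)
$$V^* = E\left[\left((-1)^{|j|}\frac{[\psi f_X]^{(j)}}{f_X} + K^*\left(\frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}}\right)\right)(K^*K)^\dagger\left((-1)^{|j|}\frac{[\psi f_X]^{(j)}}{f_X} + K^*\left(\frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}}\right)\right)\right] + E\left[\left(R(X) - \varepsilon\frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)}\right)^2\right],$$
(iv) For any q ∈ Q,
$$V^* = E\left[\left(q(Z) + \frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}}\right) K(K^*K)^\dagger K^*\left(q(Z) + \frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}}\right)\right] + E\left[\left(R(X) - \varepsilon\frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)}\right)^2\right].$$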
Formula (iii) has already been established by Ai & Chen (2012), but it is valid only if
$$(-1)^{|j|}\frac{[\psi f_X]^{(j)}}{f_X} + K^*\left(\frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}}\right) \in R(K^*K),$$
a stronger condition than Assumption 3.(ii). Under Assumption 3.(ii), this condition is equivalent to the fact that
$$K^{*\dagger}\left((-1)^{|j|}\frac{[\psi f_X]^{(j)}}{f_X}\right) + \frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}} \in R(K) + \mathrm{Ker}(K^*)$$
or equivalently
$$P_{\mathrm{cl}(R(K))}\left(K^{*\dagger}\left((-1)^{|j|}\frac{[\psi f_X]^{(j)}}{f_X}\right) + \frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}}\right) \in R(K),$$
where $P_{\mathrm{cl}(R(K))}$ is the orthogonal projection on the closure of R(K). In a non-parametric context, the assumption $R(K) = \mathrm{cl}(R(K))$ (or equivalently $R(T) = \mathrm{cl}(R(T))$) is a restrictive condition that will be discussed below in Proposition 5.2. So in general the orthogonal projection of an element on $\mathrm{cl}(R(K))$ does not belong to R(K).
5 When does Assumption 3 hold?
We will discuss separately Assumption 3.(i) on the one hand and Assumption 3.(ii) on the
other hand. Indeed, the first assumption is only a regularity condition on the functions
ψ and fX . For a given theoretical DGP and function ψ, it is not difficult to determine if
it holds or not. We will only emphasize that the condition holds only if some regularity
conditions at the boundary of the support of X hold and discuss the role of this condition
on the fact that θ0 is a moment of (Z, Y ) under Assumption 2.(ix). On the contrary,
Assumption 3.(ii) involves the adjoint operator T ∗ and is quite abstract at this stage, so
it is not easy to know if it holds for a given DGP and a given function ψ. So, the largest part of the following will concern Assumption 3.(ii).
5.1 Discussion of Assumption 3.(i)
Proposition 5.1 (Regularity condition on $\psi f_X$)
If Assumption 3.(i) holds then the function $x \mapsto \psi(x) f_X(x)$ admits a derivative of order $(j') \leq (j)$ on a set $D_{(j')}$ such that $P(X \in D_{(j')}) = 1$. For any $k = 1, ..., K$ such that $0 \leq j'_k < j_k$, with probability one, $x_k \mapsto (\psi f_X)^{(j')}(X_1, ..., X_{k-1}, x_k, X_{k+1}, ..., X_K)$ (is almost-everywhere equal to a function that) vanishes at the boundary of the support of $X_k | X_1, ..., X_{k-1}, X_{k+1}, ..., X_K$.
If $x \mapsto \psi(x) f_X(x)$ is (j) times continuously differentiable on $\mathbb{R}^K$ then Assumption 3.(i) holds.
The previous proposition ensures that the weak derivative defined in Assumption 3.(i) is equal to a usual derivative (up to a negligible set). Moreover, this derivative needs to vanish at the boundary of the support of $\psi f_X$. When (j) > (0), this rules out the function $\psi = 1_C$ where C is a set on which $f_X$ is bounded away from 0.
To understand why such regularity at the boundary of the support is necessary for the existence of a root-N and regular estimator, we can consider the simple case where there is only one variable X with a convex compact support [a; b], dominated by the Lebesgue measure. If j > 0 then $(\psi f_X)^{(k)}$ for k < j vanishes at a and b. If $f_X$ (or one of its j − 1 first derivatives) does not vanish at the boundary, then ψ needs to vanish sufficiently quickly at the boundary of the support of X. Then successive integrations by parts of $\varphi^{(j)}\psi f_X$ hold for any ϕ ∈ Φ. Under Assumption 2.(viii), the existence of a root-N consistent and regular estimator implies that "the parts between brackets" in the successive integrations by parts of $\varphi^{(j)}\psi f_X$ vanish, i.e.:
$$\lim_{(\underline{x}, \overline{x}) \to (a, b)} \sum_{l=0}^{j-1} (-1)^l \Big[(\psi(x) f_X(x))^{(l)}\, \varphi^{(j-1-l)}(x)\Big]_{x=\underline{x}}^{x=\overline{x}} = 0.$$
Because this result needs to hold for any ϕ ∈ Φ, the successive derivatives of $\psi f_X$ need to vanish at the boundaries a and b.
Such a regularity condition at the boundary of the support is commonly used as a sufficient condition to get an efficient estimator for average partial derivatives or weighted average derivatives of index models with exogenous regressors (Powell et al., 1989, Newey & Stoker, 1993, Cattaneo et al., 2014). In a wider class of models, relaxing the index structure and
the exogeneity of regressors, we show that such an assumption is in fact necessary for the
existence of regular and root-N consistent estimators.
When X is multidimensional, the previous reasoning can be generalized. However, it involves a generalization of integration by parts to a multidimensional setting that complicates the presentation of the result without adding any essential benefit.
When $\psi f_X$ is (j) times continuously differentiable, successive integrations by parts can be performed and in this case the weak derivatives are the usual derivatives. Interestingly, such a regularity condition is often assumed to show root-N consistency of specific estimators (Cattaneo et al., 2014 for density weighted average derivative with exogenous regressors or Ai & Chen, 2007 for the endogenous regressors), or to derive the semiparametric efficiency bound of a parameter (Ai & Chen, 2012). In these papers, the assumption of continuous differentiability is sufficient; in this paper we show that a slightly weaker regularity condition (namely weak differentiability) is in fact necessary.
Example: The average under counterfactual distribution corresponds to the case where (j) = (0, ..., 0) and $\psi = \tilde f_X / f_X$; in that case Assumption 3.(i) does not restrict the choice of $\tilde f_X$. However, we will show in the sequel that averages under counterfactual distributions are not always root-N estimable because Assumption 3.(ii) restricts $\tilde f_X$.
Example: The weighted average partial derivative corresponds to $X^+ = X_1$ and (j) = (1, 0, ..., 0), and the average marginal effect is a special case corresponding to ψ = 1. $\theta_0 = E(\psi(X)\partial_{x_1} m(X))$ is root-N estimable as soon as $x_1 \mapsto \psi(x_1, X^0) f_{X_1|X^0}(x_1)$ is (equal X almost-surely to) an absolutely continuous function on $\mathbb{R}$. For instance, if X ∼ N(0, 1), then θ0 is root-N estimable only if ψ admits a derivative whose integral is equal (up to an additive constant) to ψ. This rules out discontinuous ψ like 1{x ∈ [a; b]} or 1{x > 0} but allows for kinked functions such as x1{x > 0}. Similarly, if X is drawn from a B(α, β) distribution, then $\theta_0 = E(\psi m')$ is root-N estimable only if ψ is twice differentiable on ]0; 1[ and $\psi(x) = o(x^{1-\alpha})$ and $\psi(1 - x) = o(x^{1-\beta})$ as x → 0. Notice that if α ≤ 1 (respectively β ≤ 1) then ψ needs to vanish at 0 (respectively at 1) sufficiently quickly to ensure the existence of a root-N consistent estimator. In particular, root-N consistent estimators for the average marginal effect (ψ = 1) or for the density weighted average derivative ($\psi = f_X$) do not exist if α ≤ 1 or β ≤ 1. For the Beta distribution, note that Assumption 3.(ii) reinforces these conditions at the boundary of the support: the square integrability of q ensures that root-N consistent estimators for the average marginal effect (ψ = 1) do not exist if α ≤ 2 or β ≤ 2.
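The threshold α, β > 2 can be illustrated numerically in the exogenous case Z = X with ψ = 1 and j = 1, where the only candidate is $q = -f_X'/f_X$, so that Q is non-empty only if $E[(f_X'(X)/f_X(X))^2]$ is finite. The sketch below is an illustration, not part of the paper: the function names are hypothetical, and the closed form is obtained from the standard Beta moments $E[X^a(1-X)^b] = B(\alpha+a, \beta+b)/B(\alpha, \beta)$.

```python
import math


def beta_score_second_moment(alpha: float, beta: float) -> float:
    """E[(f'(X)/f(X))^2] for X ~ Beta(alpha, beta).

    Uses f'/f = (alpha-1)/x - (beta-1)/(1-x); the moment is infinite
    unless alpha > 2 and beta > 2.
    """
    if alpha <= 2 or beta <= 2:
        return math.inf
    s = alpha + beta
    return (s - 1) * (s - 2) * (1 / (alpha - 2) + 1 / (beta - 2))


def numeric_check(alpha: float, beta: float, n: int = 200_000) -> float:
    """Midpoint-rule approximation of the same moment on ]0;1[ (hypothetical helper)."""
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        score = (alpha - 1) / x - (beta - 1) / (1 - x)
        dens = math.exp((alpha - 1) * math.log(x)
                        + (beta - 1) * math.log(1 - x) - log_b)
        total += score * score * dens
    return total / n


if __name__ == "__main__":
    for a in (3.0, 2.5, 2.1, 2.01):
        # The moment grows without bound as alpha decreases towards 2.
        print(a, beta_score_second_moment(a, 3.0))
```

As α approaches 2 from above the moment diverges, which is the square-integrability failure described in the text.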
We now turn to Assumption 3.(ii), which is particularly abstract in the endogenous case (i.e. $X \neq Z$). There are at least two frameworks in which the condition $Q \neq \emptyset$ has a more comprehensible interpretation. The first one corresponds to the case where the distribution of X|Z belongs to an exponential family (we will refer to that case as "the exponential first stage"). The second one corresponds to the case where the first stage is "additive", i.e. $X = p(Z) + \eta$ with $\eta \perp\!\!\!\perp Z$.
5.2 When is $Q \neq \emptyset$?
5.2.1 On the closedness of the range of linear operators
Assumption 3.(ii) implies that $(-1)^{|j|}[\psi f_X]^{(j)}/f_X \in L^2(X)$. Conversely, if $(-1)^{|j|}[\psi f_X]^{(j)}/f_X \in L^2(X)$, then Q is never empty if $R(T^*) = L^2(X)$. The condition $R(T^*) = L^2(X)$ holds if and only if Ker(T) = {0} (completeness condition) and $R(T^*)$ is closed. The completeness condition is known to be hard to test in the data (Canay et al., 2013 and Freyberger, 2015) and imposes abstract restrictions on the joint distribution of (X, Z) (see d’Haultfœuille (2011), Hu & Schennach (2008) or ? for discussion of criteria ensuring the completeness or non-completeness). The closedness of $R(T^*)$ is even more restrictive as illustrated by the following proposition.
Proposition 5.2 (Closedness of R(T) and $R(T^*)$)
(i) The range of T is closed if and only if the range of $T^*$ is closed.
(ii) If T (or $T^*$) is a compact operator, then R(T) is closed if and only if R(T) and $R(T^*)$ are finite dimensional.
(iii) If the completeness condition holds (i.e. Ker(T) = {0}) and if X|Z admits a density with respect to the Lebesgue measure Z-almost surely, then neither R(T) nor $R(T^*)$ is closed.
In our instrumental variable framework, if the joint distribution of (X, Z) is dominated by the product of the marginal measures of X and Z, then T is a compact operator as soon as $E(f_{X,Z}(X, Z)) < \infty$ (cf. discussion of Assumption A1 in Darolles et al., 2011). Proposition 5.2.(ii) shows that in that case, $R(T) = L^2(Z)$ and $R(T^*) = L^2(X)$ only if X and Z have finite supports (and then T and $T^*$ are characterized by full rank matrices). When the support of X or Z is not finite, then even in the favorable case of completeness, Proposition 5.2.(iii) shows that R(T) (respectively $R(T^*)$) is not closed and then not equal to $L^2(Z)$ (respectively $L^2(X)$).
So because R(T) is in general not closed when $X \neq Z$, criteria on the DGP and the parameter θ0 ensuring that $Q \neq \emptyset$ are hard to formulate in the most general case. In the following, we investigate the condition $Q \neq \emptyset$ in two specific cases: when the first stage belongs to an exponential family and when the first stage is additive.
5.2.2 Specialization to the exponential first stage
Exponential families of distributions have many nice features: they embed many usual
parametric models and provide sufficient statistics. So, they have been widely studied
by econometricians such as in Gourieroux et al. (1984). Concerning the non-parametric
instrumental variables, Newey & Powell (2003) have established the completeness condition
for some (but not all) models in which the true conditional distribution X|Z belongs to an
exponential family. When the first stage is exponential, the condition $Q \neq \emptyset$ is equivalent to the condition that an identifiable function of X (depending on ψ and on the distribution of (Z, X)) is the Laplace transform of a measure. The range of Laplace transforms of measures is not easily characterizable, but some properties can be derived. On the other hand the range of Laplace transforms of linear forms has been precisely characterized by L. Schwartz and this implies interesting necessary conditions for the non-emptiness of Q.
Note that the following discussion differs from the one of Ai & Chen (2007) (pp 25-26), who show that a root-N consistent estimator exists for $\theta_0 = E(m^{(1)}(X))$ when T is Hilbert-Schmidt and the marginal density of X is proportional to exp(t(x)) with t a finite order polynomial.
Let W denote the set of exogenous regressors, i.e. the variables that are simultaneously in X and Z. Let $\tilde X$ denote the set of endogenous variables and $\tilde Z$ the set of excluded instruments.
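Assumption 4 (Exponential First Stage)
(i) $\tilde X, \tilde Z | W$ is dominated by a product measure $\pi_{\tilde X|W} \otimes \pi_{\tilde Z|W}$.
(ii) $f_{\tilde X|\tilde Z = \tilde z, W = w}(\tilde x)$ takes the form $h_w(\tilde x)\, g_w(\tau_w(\tilde z)) \exp(-\mu_w(\tilde x)'\tau_w(\tilde z))$, where $\mu_w(\tilde x)$ and $\tau_w(\tilde z)$ are elements of $\mathbb{R}^{p_w}$, $0 \in \mathrm{Supp}(\mu_w(\tilde X)|W = w)$ and $\mathrm{span}(\mathrm{Supp}(\mu_w(\tilde X)|W = w)) = \mathbb{R}^{p_w}$.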
Assumption 4.(i) is mainly needed for the sake of simplicity, and means that the joint distribution of $\tilde X, \tilde Z | W$ is not degenerate as in the exogenous case that has been previously studied. Assumption 4.(ii) is equivalent to $f_{\tilde X|\tilde Z, W}(\tilde x) = h_W(\tilde x)\, \tilde g_W(\tilde Z) \exp(-\mu_W(\tilde x)'\tau_W(\tilde Z))$, because an integration with respect to $\tilde x$ ensures that $\tilde g_w(z)$ is a function of (τ(z), w). The condition 0 ∈ Supp(µ(X)|W) can be assumed without loss of generality and is just a normalization⁵. Similarly, the condition $\mathrm{span}(\mathrm{Supp}(\mu_w(X)|W = w)) = \mathbb{R}^{p_w}$ can be assumed without loss of generality⁶, as soon as we adopt the convention $\mathbb{R}^0 = \{0\}$.
Newey & Powell (2003) have proved that the completeness condition holds for a subclass of models with an exponential first stage (for instance, for $X|Z \sim N(b(Z), \sigma^2)$ with R(b) containing an open set of $\mathbb{R}^K$). But some models with an exponential first stage do not verify the condition of completeness, for at least two reasons. The first reason is the lack of variation in Z, for instance if Z is binary and X = Z + ε with ε|Z ∼ N(0, 1). A second reason for the failure of the completeness condition is that for some models the dimension of the linear span of the range of µ and τ is "too large". For instance, if $X|Z \sim N(b(Z), \sigma^2(Z))$, then the linear span of the range of $\mu(X) = (X^2/2, X)$ and $\tau(Z) = \left(\frac{1}{\sigma^2(Z)}, -\frac{b(Z)}{\sigma^2(Z)}\right)$ (signs chosen to agree with the form $\exp(-\mu'\tau)$ of Assumption 4) is $\mathbb{R}^2$ whereas X is only real, and it follows that the completeness condition does not hold in general⁷. In both cases, Assumption 4 holds and despite the failure of the completeness condition, $\theta_0 = E[\psi(X) m^{(j)}(X)]$ will be root-N estimable for some functions ψ. On the other hand, we will also show that even if the completeness condition holds, some simple parameters will not be root-N estimable.
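The failure of completeness under a heteroscedastic normal first stage (the construction of footnote 7) can be checked numerically. The sketch below is only an illustration: the mean function `b`, the constants `ALPHA` and `BETA` and the helper names are hypothetical choices, not taken from the paper. With $\sigma^2(z) = -b^2(z) - \alpha b(z) - \beta$, the non-zero function $x \mapsto x^2 + \alpha x + \beta$ has conditional mean zero given Z, so Ker(T) is not reduced to {0}.

```python
import math

# Hypothetical bounded first-stage mean and constants with b^2 + ALPHA*b + BETA < 0.
ALPHA, BETA = 0.0, -2.0


def b(z: float) -> float:
    return math.sin(z)


def sigma2(z: float) -> float:
    # Conditional variance chosen as in footnote 7: here 2 - sin(z)^2 >= 1.
    return -b(z) ** 2 - ALPHA * b(z) - BETA


def phi(x: float) -> float:
    # A non-zero function of X whose conditional mean given Z should vanish.
    return x * x + ALPHA * x + BETA


def cond_mean_phi(z: float, half_width: float = 12.0, n: int = 24_000) -> float:
    """Riemann-sum approximation of E[phi(X) | Z = z], X | Z = z ~ N(b(z), sigma2(z))."""
    mean, var = b(z), sigma2(z)
    sd = math.sqrt(var)
    h = 2 * half_width * sd / n
    total = 0.0
    for i in range(n + 1):
        x = mean - half_width * sd + i * h
        dens = math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
        total += phi(x) * dens * h
    return total


if __name__ == "__main__":
    for z in (-1.0, 0.3, 2.0):
        print(z, cond_mean_phi(z))  # numerically zero for every z
```

The conditional mean is numerically zero for every value of z tried, although `phi` itself is not the zero function.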
For any $\gamma \in \mathbb{R}^{p_w}$, let $\nu_w(\gamma) = \int g_w(t) \exp(-\gamma' t)\, dF_{\tau_w(\tilde Z)|W=w}(t)$. The random set $\{\gamma \in \mathbb{R}^{p_W} : \nu_W(\gamma) < \infty\}$ is a convex set⁸ that contains almost-surely $\mu_W(\tilde X)$ (because $h_W(\tilde X)\nu_W(\mu_W(\tilde X)) = f_{\tilde X|W}$ is finite almost-surely and $P(h_W(\tilde X) = 0|W) = 0$ almost-surely). Then it contains with probability one $\Gamma_W = \mathrm{int}\,\mathrm{co}\,\mathrm{Supp}(\mu_W(\tilde X)|W)$, the interior of the convex hull of $\mathrm{Supp}(\mu_W(\tilde X)|W)$. The function $\nu_W$ can be naturally extended to $\Gamma_W + i\mathbb{R}^{p_W}$ (where $i^2 = -1$), by $\nu_W(u + iv) = \int g_W(t) \exp(-(u + iv)'t)\, dF_{\tau_W(\tilde Z)|W}(t)$. This is the Laplace transform of the signed measure $M_W$ defined by $dM_W(t) = g_W(t)\, dF_{\tau_W(\tilde Z)|W}(t)$. Such a Laplace transform is a function that verifies strong conditions of smoothness. Interestingly another function (denoted $\varrho_W$ in the following theorem) needs to verify similar conditions of smoothness. Let us introduce some notions of regularity of functions from $\mathbb{C}^p$ to $\mathbb{C}$. A function f from an open set $D \subset \mathbb{C}^p$ to $\mathbb{C}$ is holomorphic on D if for any $z_0 \in D$, $\lim_{z \to z_0, z \neq z_0} \frac{f(z) - f(z_0)}{z - z_0}$ exists. Roughly speaking, a holomorphic function is "very" regular and admits an analytic development $f(z) = \sum_{(n) \in \mathbb{N}^p} a_{(n)} z^{(n)}$ where $z^{(n)} = \prod_{j=1}^p z_j^{n_j}$.
L. Schwartz has defined the Laplace transform for a wider class of objects than functions or measures by duality arguments (Schwartz, 1997, and see Appendix 7.6 for details). The Laplace transforms are always defined on a set $C + i\mathbb{R}^p$, where C is a convex set. When C has a non-empty interior, the Laplace transform is a holomorphic function on $\mathrm{int}(C) + i\mathbb{R}^p$ and verifies some conditions of domination by polynomials. All the Laplace transforms considered in this paper are Laplace transforms of signed measures. In this case, the condition of domination of the Laplace transform can be refined, and last, when the measure is dominated by the Lebesgue measure, additional conditions apply. Under Assumption 4, Q is not empty if and only if a function $\varrho_W(x)$ depending on ψ, $f_X$, µ and $\nu_W$ is the Laplace transform of a signed measure. The next Theorem uses this property to derive necessary conditions on ψ and the joint distribution of (X, Z) for the existence of a regular root-N estimator. These conditions are sufficient to ensure that $\varrho_W$ is the Laplace transform of a linear form, so informally speaking, the necessary conditions given below are almost sufficient to ensure the non-emptiness of Q and can be used to reject the existence of a root-N estimator in many settings.

5 Indeed $g_w(\tau(z)) \exp(-\mu(x)'\tau(z)) = \tilde g_w(\tau(z)) \exp(-\tilde\mu(x)'\tau(z))$ with $\tilde g_w(\tau(z)) = g_w(\tau(z)) \exp(-\mu(x_0)'\tau(z))$ and $\tilde\mu(x) = \mu(x) - \mu(x_0)$ for a given $x_0 \in \mathrm{Supp}(X|W)$.
6 If $\mathrm{span}(\mathrm{Supp}(\mu_w(X)|W = w))$ is a proper subspace of dimension $r_w < d_w$, we can always reparameterize $\mu_w$ and $\tau_w$. Indeed, in such a case, up to a reordering of the components of $\mu_w$ and $\tau_w$, we can assume that $\mu_j = \sum_{i=1}^r \lambda_{ji}\mu_i$ for j > r, and next we have $\sum_{i=1}^d \mu_i\tau_i = \sum_{i=1}^r \mu_i\tilde\tau_i$ with $\tilde\tau_i = \tau_i + \sum_{j=r+1}^d \lambda_{ji}\tau_j$.
7 For instance, if b is bounded on Supp(Z), then there exist α and β such that $b^2(Z) + \alpha b(Z) + \beta < 0$ on Supp(Z) and in that case, $E[X^2 + \alpha X + \beta|Z] = 0$ for $\sigma^2(Z) = -b^2(Z) - \alpha b(Z) - \beta$. To the best of our knowledge, in such a case of heteroscedastic first stage, necessary conditions on b(z) and $\sigma^2(z)$ to ensure completeness are not well understood. See d’Haultfœuille (2011) for a discussion of the completeness assumption.
8 $\exp(-(\gamma_1 + \gamma_2)'t/2) \leq \exp(-\gamma_1't) + \exp(-\gamma_2't)$ because $ab \leq a^2 + b^2$, and then the convexity follows.
Theorem 5.3
If Assumptions 1, 2 and 4 hold then Assumption 3.(ii) holds only if:
(i) $(-1)^{|j|}\frac{[\psi(X) f_X(X)]^{(j)}}{f_X(X)} \in L^1(\mu_W(\tilde X), W)$, i.e. there exists $r_W$ such that $r_W(\mu_W(\tilde X)) = (-1)^{|j|}\frac{[\psi(X) f_X(X)]^{(j)}}{f_X(X)}$ and $r_W$ is integrable with respect to $F_{\mu_W(\tilde X)|W}$, W-almost surely,
(ii) W-almost-surely, there exists a convex set $C_W$ with $P(\mu_W(X) \in C_W|W) = 1$ and such that $\varrho_W = r_W\nu_W$ is the Laplace transform of a signed measure with finite total variation dominated by $dF_{\tau(\tilde Z)|W}$ on $C_W + i\mathbb{R}^{p_W}$,
(iii) W-almost-surely, $\varrho_W$ is a holomorphic function on $\Gamma_W + i\mathbb{R}^{p_W}$,
(iv) W-almost-surely, $\varrho_W$ is bounded on $C + i\mathbb{R}^{p_W}$ for any compact subset C of $\Gamma_W$,
(v) when $\tau_W(\tilde Z)|W$ is dominated by the Lebesgue measure on $\mathbb{R}^{p_W}$, $\varrho_W(u + iv)$ tends to 0 when v tends to infinity for any $u \in \Gamma_W$.
Moreover, when a root-N regular consistent estimator of θ0 exists, the semiparametric efficiency bound of θ0 is:
$$V^* = V(\varepsilon q^*(Z) + R(X)),$$
with
$$q^*(Z) = \frac{1}{g_w(\tau(\tilde Z))\, E(\varepsilon^2|Z)\, E\big[1/E(\varepsilon^2|Z)\,\big|\,\tau_W(\tilde Z), W\big]}\, \frac{dL^{-1}[\varrho_W]}{dF_{\tau_W(\tilde Z)|W}}(\tau_W(\tilde Z)) + \frac{E\big[E(\varepsilon R(X)|Z)/E(\varepsilon^2|Z)\,\big|\,\tau_W(\tilde Z), W\big]}{E(\varepsilon^2|Z)\, E\big[1/E(\varepsilon^2|Z)\,\big|\,\tau(\tilde Z), W\big]} - \frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)}$$
and
$$R(X) = (\psi(X) + \partial_{f_X}\psi(X) f_X(X))\, m^{(j)}(X) - E\big[(\psi(X) + \partial_{f_X}\psi(X) f_X(X))\, m^{(j)}(X)\big].$$
Condition (i) of the Theorem means that the quantity $\frac{[\psi(X) f_X(X)]^{(j)}}{f_X(X)}$ is independent of X conditional on µ(X) and W. Intuitively, this means that (µ(X), W) is a sufficient statistic for $\frac{[\psi(X) f_X(X)]^{(j)}}{f_X(X)}$, i.e. there is enough variation in (µ(X), W) to recover the variations of $[\psi f_X]^{(j)}/f_X$.
Conditions (i) and (ii) are necessary and sufficient to ensure the existence of q ∈ L1 (Z)
such that T ∗ (q)fX = [ψfX ](j) . And if such a function q is also in L2 (Z), it will be in Q.
Moreover, other conditions of the Theorem imply strong regularity conditions on ψ and on $f_X$ to ensure the existence of a regular root-N consistent estimator of θ0. In condition (iii), holomorphy is a necessary condition to ensure that $\varrho_W$ is a Laplace transform (in the sense of L. Schwartz, see Proposition 6 in Chapter VIII of Schwartz (1997)). We prove in the appendix that Condition (iv) is necessary for the Laplace transform to be the Laplace transform of a signed measure. Together, (iii) and (iv) are sufficient to ensure that $\varrho$ is a Laplace transform in the sense of L. Schwartz (see Appendix 7.7.2 and Proposition 6 in Chapter VIII of Schwartz (1997) for more details). Last, Condition (v) is a refinement of (iv) when $\tau_W(\tilde Z)|W$ is dominated by the Lebesgue measure.
In the expression of the semiparametric efficiency bound, $\frac{dL^{-1}[\varrho_W]}{dF_{\tau_W(\tilde Z)|W}}$ denotes the density of the inverse Laplace transform of $\varrho_W$ with respect to the Lebesgue-Stieltjes measure of $\tau_W(\tilde Z)|W$. Condition (ii) of the Theorem, the injectivity of the Laplace transform and the Radon-Nikodym Theorem ensure that the density is well defined. Under an exponential first stage, Theorem 5.3 gives the solution of the minimization problem given in Theorem 4.1. Not surprisingly, the solution $q^*$ depends on the skedastic function $E(\varepsilon^2|Z)$ and depends also on the dependence between ε and X through the quantity $E(\varepsilon R(X)|Z)$. When $\tau_W$ is injective, the sigma-algebra generated by Z is equal to the sigma-algebra generated by $(\tau_W(\tilde Z), W)$, and in that case $q^*(Z)$ has a simpler expression:
$$q^*(Z) = \frac{1}{g_w(\tau(\tilde Z))}\, \frac{dL^{-1}[\varrho_W]}{dF_{\tau_W(\tilde Z)|W}}(\tau_W(\tilde Z)).$$
More generally, this simpler expression holds as soon as $E(\varepsilon^2|Z) = E(\varepsilon^2|\tau_W(\tilde Z), W)$ and $E(\varepsilon R(X)|Z) = E(\varepsilon R(X)|\tau_W(\tilde Z), W)$.
Even when ψ and fX are strongly regular, θ0 is not always root-N estimable as illustrated by
simple examples detailed below. Indeed, the behavior of the holomorphic function νW (u)
when the imaginary part of u tends to infinity is related to the size of the set of parameters
θ0 that are root-N estimable: the faster νW (x + iy) tends to 0 when |y| → ∞, the stronger
is the link between X and Z and the larger is the set of functions rW such that rW νW is
a Laplace transform and then the larger is the range of T ∗ and the set of ψ such that θ0
is root-N estimable. In particular, some functions ψ correspond to parameters which are root-N estimable if the instruments are strong enough and which are not otherwise.
Example: Bivariate normal distribution
Consider the bivariate normal case:
$$\begin{pmatrix} X \\ Z \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).$$
The first stage is $X = \rho Z + \sqrt{1 - \rho^2}\,\eta$ with η|Z ∼ N(0, 1).
Consider a shift ∆m in the mean of X keeping constant the unobserved variable ε. In that case, the average of the outcome under such a counterfactual distribution corresponds to $\theta_0 = E\left[\exp\left(-\frac{1}{2}\left[(X - \Delta m)^2 - X^2\right]\right) m(X)\right]$ and this parameter is root-N estimable.
On the other hand, the average of the outcome under a shift $\sigma^2 - 1$ in the variance of X corresponds to $\theta_0 = \frac{1}{\sqrt{\sigma^2}} E\left[\exp\left(-\frac{1}{2}X^2(1/\sigma^2 - 1)\right) m(X)\right]$. This parameter is root-N estimable if and only if $\sigma^2 > 1 - \rho^2$. The intuition behind this result is that when σ tends to 0, θ0 tends to m(0), which is not a parameter that can be consistently estimated at the root-N rate even in the exogenous case. On the other hand if the counterfactual distribution has a higher variance than the original one (σ > 1) the existence of a root-N consistent estimator is always ensured as soon as $\rho \neq 0$. We can also consider a given choice of σ and remark that the semi-parametric efficiency bound is finite if and only if $\rho^2 > 1 - \sigma^2$. When the counterfactual distribution has a lower variance than the original one, we are in an intermediary case in which the existence of a root-N consistent estimator depends on the strength of the instruments.
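The role of the threshold $\sigma^2 > 1 - \rho^2$ can be illustrated with a small numerical check. The sketch below is an assumption-laden illustration rather than the paper's proof: it takes as a hypothetical candidate the density ratio $q(z)$ of $N(0, s^2)$ to $N(0, 1)$ with $s^2 = (\sigma^2 - (1 - \rho^2))/\rho^2$, which is only defined when $\sigma^2 > 1 - \rho^2$, and verifies the conditional moment equation $E(q(Z)|X) = \psi(X)$ by a Riemann sum. It does not check the square integrability of q nor the "only if" direction.

```python
import math


def psi(x: float, sigma2: float) -> float:
    """Weight in theta_0: density ratio of N(0, sigma2) to N(0, 1)."""
    return math.exp(-0.5 * x * x * (1.0 / sigma2 - 1.0)) / math.sqrt(sigma2)


def candidate_q(z: float, sigma2: float, rho: float) -> float:
    """Hypothetical candidate: density ratio of N(0, s2) to N(0, 1)."""
    s2 = (sigma2 - (1.0 - rho ** 2)) / rho ** 2
    if s2 <= 0:
        raise ValueError("candidate requires sigma2 > 1 - rho^2")
    return math.exp(-0.5 * z * z * (1.0 / s2 - 1.0)) / math.sqrt(s2)


def cond_mean_q(x: float, sigma2: float, rho: float,
                half_width: float = 12.0, n: int = 24_000) -> float:
    """Riemann-sum approximation of E[q(Z) | X = x], Z | X = x ~ N(rho*x, 1 - rho^2)."""
    mean, var = rho * x, 1.0 - rho ** 2
    sd = math.sqrt(var)
    h = 2 * half_width * sd / n
    total = 0.0
    for i in range(n + 1):
        z = mean - half_width * sd + i * h
        dens = math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
        total += candidate_q(z, sigma2, rho) * dens * h
    return total


if __name__ == "__main__":
    rho, sigma2 = 0.8, 0.5          # here 1 - rho^2 = 0.36 < sigma2
    for x in (0.0, 0.7, -1.5):
        print(x, cond_mean_q(x, sigma2, rho), psi(x, sigma2))
```

When $\sigma^2 \leq 1 - \rho^2$ the candidate is not defined, which matches the intuition that a counterfactual distribution more concentrated than the first-stage noise cannot be generated by reweighting Z.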
Similarly, the average marginal effect $E(m'(X))$ is always root-N estimable, whereas the density weighted average derivative $E(f_X(X) m'(X))$ is root-N estimable if and only if $\rho^2 \geq 1/2$. The corresponding semi-parametric efficiency bounds are derived in the Appendix and are reported in Table 1.
Example: Average Marginal Effect with normal homoscedastic first stage. Consider the case where $\theta_0 = E(m'(X))$, with $X|Z \sim N(t(Z), \sigma^2)$ and where Assumptions 1 and 2 hold. In that case Assumption 4 is fulfilled with µ(X) = X and $\tau(Z) = -t(Z)/\sigma^2$, $h(X) = \exp(-X^2/(2\sigma^2))$, $g(\tau(Z)) = (2\pi\sigma^2)^{-1/2}\exp(-\tau^2(Z)\sigma^2/2)$. In that case, there is no variable W and $\Gamma = \Gamma_W = \mathbb{R}$,
$$\nu_W(x) = \nu(x) = \int (2\pi\sigma^2)^{-1/2} \exp(-t^2\sigma^2/2) \exp(-xt)\, dF_{\tau(Z)}(t)$$
and
$$r_W(x) = r(x) = -\frac{f_X'(x)}{f_X(x)} = \frac{x}{\sigma^2} + \frac{\int (2\pi\sigma^2)^{-1/2}\, t \exp(-t^2\sigma^2/2) \exp(-xt)\, dF_{\tau(Z)}(t)}{\nu(x)}.$$
So, we conclude that
$$\varrho_W(x) = \varrho(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \int \exp(-t^2\sigma^2/2) \exp(-xt) \left(\frac{x}{\sigma^2} + t\right) dF_{\tau(Z)}(t).$$
Now, assume for the sake of simplicity that $E\left(|\tau(Z)| \exp(-\tau(Z)^2\sigma^2/2 - x\tau(Z))\right) < \infty$ for any x ∈ C. Condition (ii) of Theorem 5.3 holds only if τ(Z) admits a locally absolutely continuous density with respect to the Lebesgue measure (see Lemma 7.5 in appendix). So the density of τ(Z) is continuous and almost everywhere differentiable on $\mathbb{R}$ and its derivative $f'_{\tau(Z)}$ is locally integrable. If τ(Z) admits a mass point or if τ(Z) has a density bounded below on its compact support (such as the uniform distribution), regular root-N estimators do not exist. A bit of algebra gives:
$$\frac{1}{g(\tau(Z))} \frac{dL^{-1}(\varrho)}{dF_{\tau(Z)}}(\tau(Z)) = \frac{1}{\sigma^2} \frac{f'_{\tau(Z)}(\tau(Z))}{f_{\tau(Z)}(\tau(Z))} = -\frac{f'_{t(Z)}(t(Z))}{f_{t(Z)}(t(Z))}.$$
Then Q is not empty only if $E\left[\frac{f'^2_{t(Z)}(t(Z))}{f^2_{t(Z)}(t(Z))}\right] < \infty$, which means that the Fisher information matrix of location of t(Z) is finite. Last, if moreover t is one to one, the semiparametric efficiency bound of the average marginal effect is:
$$V^\star = V\left(m'(X) - (Y - m(X)) \frac{f'_{t(Z)}(t(Z))}{f_{t(Z)}(t(Z))}\right).$$
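As a sanity check of the weight $-f'_{t(Z)}/f_{t(Z)}$, consider the bivariate normal special case where $t(Z) = \rho Z$ and $\sigma^2 = 1 - \rho^2$: then $t(Z) \sim N(0, \rho^2)$ and the weight reduces to $q(Z) = Z/\rho$. The sketch below verifies numerically that $E(q(Z) m(X)) = E(m'(X))$, which equals $E(q(Z)Y)$ when $E(Y - m(X)|Z) = 0$. The structural function `m`, the value of `rho` and the helper `expect_2d` are hypothetical choices made for illustration only.

```python
import math


def expect_2d(fn, rho: float, half_width: float = 8.0, step: float = 0.05) -> float:
    """Riemann-sum approximation of E[fn(X, Z)] where Z and eta are independent
    N(0, 1) and X = rho * Z + sqrt(1 - rho^2) * eta."""
    n = int(round(2 * half_width / step))
    grid = [-half_width + i * step for i in range(n + 1)]
    weights = [math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi) * step for u in grid]
    c = math.sqrt(1.0 - rho ** 2)
    total = 0.0
    for z, wz in zip(grid, weights):
        for e, we in zip(grid, weights):
            total += fn(rho * z + c * e, z) * wz * we
    return total


def m(x: float) -> float:
    return x ** 3            # hypothetical structural function


def m_prime(x: float) -> float:
    return 3.0 * x ** 2


RHO = 0.6


def q(z: float) -> float:
    # -f'_t(t)/f_t(t) for t = RHO * z with t(Z) ~ N(0, RHO^2), i.e. t / RHO^2 = z / RHO.
    return z / RHO


average_marginal_effect = expect_2d(lambda x, z: m_prime(x), RHO)
moment_in_z = expect_2d(lambda x, z: q(z) * m(x), RHO)

if __name__ == "__main__":
    print(average_marginal_effect, moment_in_z)
```

Both quantities agree up to numerical error, so in this special case the average marginal effect is indeed a moment of (Y, Z) only.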
Example: Average Marginal Effect with normal heteroscedastic first stage. Consider the case where $\theta_0 = E(m'(X))$, with $X|Z \sim N(t(Z), \sigma^2(Z))$. For the sake of simplicity, we assume moreover that the support of Z is compact and that the ranges of t and $\sigma^2$ are respectively included in compacts of $\mathbb{R}$ and ]0; ∞[. To the best of our knowledge, conditions on $(t(Z), \sigma^2(Z))$ that ensure the completeness condition are not known. Here, we discuss the existence of regular and root-N consistent estimators. In that case, h(X) = 1, $\mu(X) = (\mu_1, \mu_2) = (X, X^2/2)$, $\tau(Z) = (\tau_1, \tau_2) = \left(-\frac{t(Z)}{\sigma^2(Z)}, \frac{1}{\sigma^2(Z)}\right)$ and $g(\tau) = \frac{\tau_2^{1/2}}{(2\pi)^{1/2}} \exp\left(-\frac{\tau_1^2}{2\tau_2}\right)$. Note that the restrictions assumed on t(Z) and $\sigma^2(Z)$ ensure that the ranges of $\tau_1$ and $\tau_2$ are respectively included in compacts of $\mathbb{R}$ and ]0; ∞[. It follows that any continuous function on $\mathbb{R} \times ]0; \infty[$ of $(\tau_1(Z), \tau_2(Z))$ admits a finite moment. The support of µ(X)|Z does not depend on Z and is a parabola in the plane, and the relative interior of its convex hull is $\Gamma = \{(a, b) \in \mathbb{R}^2 \text{ s.t. } b > a^2/2\}$.
For $(\mu_1, \mu_2) \in \mathbb{C}^2$, $\nu(\mu_1, \mu_2)$ is well defined and equal to:
$$\nu(\mu_1, \mu_2) = \int \frac{t_2^{1/2}}{\sqrt{2\pi}}\, e^{-\frac{t_1^2}{2t_2} - \mu_1 t_1 - \mu_2 t_2}\, dF_{\tau(Z)}(t_1, t_2),$$
and $f_X(x) = \nu(x, x^2/2)$. It follows that:
$$\varrho(\mu_1, \mu_2) = \int_{\mathbb{R}^2} (t_1 + \mu_1 t_2) \frac{t_2^{1/2}}{\sqrt{2\pi}}\, e^{-\frac{t_1^2}{2t_2} - \mu_1 t_1 - \mu_2 t_2}\, dF_{\tau(Z)}(t_1, t_2).$$
Because $(\mu_1, \mu_2) \in C \mapsto \int t_1 \frac{|t_2|^{1/2}}{\sqrt{2\pi}}\, e^{-\frac{t_1^2}{2t_2} - \mu_1 t_1 - \mu_2 t_2}\, dF_{\tau(Z)}(t_1, t_2)$ is the Laplace transform of a signed measure dominated by the push-forward measure of τ(Z), it follows that $\varrho$ is a Laplace transform if and only if $(\mu_1, \mu_2) \mapsto \mu_1 \int \frac{t_2^{3/2}}{\sqrt{2\pi}}\, e^{-\frac{t_1^2}{2t_2} - \mu_1 t_1 - \mu_2 t_2}\, dF_{\tau(Z)}(t_1, t_2)$ is itself a Laplace transform on $C + i\mathbb{R}^2$ where C is a convex set of $\mathbb{R}^2$ such that $P((X, X^2/2) \in C) = 1$. This means that there exists a function $g \in L^1(\tau_1(Z), \tau_2(Z))$ such that for any $(\mu_1, \mu_2) \in C + i\mathbb{R}^2$:
$$\mu_1 \int \frac{t_2^{3/2}}{\sqrt{2\pi}}\, e^{-\frac{t_1^2}{2t_2} - \mu_1 t_1 - \mu_2 t_2}\, dF_{\tau(Z)}(t_1, t_2) = \int g(t_1, t_2)\, e^{-\mu_1 t_1 - \mu_2 t_2}\, dF_{\tau_1(Z), \tau_2(Z)}(t_1, t_2).$$
The Fubini Theorem ensures that this previous equality is equivalent to:
$$\int \mu_1 L[M_{t_2}](\mu_1) \exp(-\mu_2 t_2)\, dF_{\tau_2(Z)}(t_2) = \int L[G_{t_2}](\mu_1) \exp(-\mu_2 t_2)\, dF_{\tau_2(Z)}(t_2),$$
with $M_{t_2}$ and $G_{t_2}$ measures defined by $dM_{t_2}(u) = \frac{t_2^{3/2}}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2t_2}\right) dF_{\tau_1(Z)|\tau_2(Z)=t_2}(u)$ and $dG_{t_2}(u) = g(u, t_2)\, dF_{\tau_1(Z)|\tau_2=t_2}(u)$.
Injectivity of the Laplace transform ensures that µ1 L[Mτ2 (Z) ](µ1 ) = L[Gτ2 (Z) ](µ1 ). It follows from Lemma 7.5 in Appendix that τ1 (Z)|τ2 (Z) needs to admit a locally absolutely
continuous density with respect to the Lebesgue measure, which is equivalent to the fact
that t(Z)|σ 2 (Z) admits a locally absolutely continuous density (σ 2 (Z)-almost surely). Informally this requirement is very strong when the econometrician only have one instrument
Z, but quite reasonable if the econometrician observe two continuous and different instruments Z1 and Z2 . We will just discuss informally9 this point.
9 We do not make precise what it means to have only one or two distinct instruments: a difficulty is that, for $(Z_1, Z_2)$ drawn from the uniform distribution on $[0; 1]^2$, there exists an invertible function $g$ such that $Z = g(Z_1, Z_2)$ is uniformly distributed on $[0; 1]$.
Imagine that $Z$ admits a density with respect to the Lebesgue measure on $\mathbb{R}$. In that case, a finite semi-parametric efficiency bound exists only if $Z|\sigma^2(Z)$ is not countable:
\[
|\{z \in \mathcal{Z} : \sigma^2(Z) = \sigma^2(z)\}| > |\mathbb{N}| \quad \sigma^2(Z)\text{-almost surely}.
\]
Informally, $\sigma^2(Z)$ must not be too informative about $t(Z)$. It follows that when $z \mapsto \sigma^2(z)$ is invertible, or more generally admits finite or countable fibers, regular root-N consistent estimators do not exist. And even when $\sigma(Z)$ takes only a few values, the existence of a locally absolutely continuous density of $t(Z)|\sigma^2(Z)$ imposes nontrivial and unusual restrictions on the distribution of $t(Z)$. To understand why, consider the case $X|Z \sim N(Z, \sigma_0^2 1\{Z \le a\} + \sigma_1^2 1\{Z > a\})$: a root-N consistent estimator of the average marginal effect exists only if $Z|Z \le a$ and $Z|Z > a$ admit locally absolutely continuous densities. Then $f_Z(z)/F_Z(a)\, 1\{z \le a\}$ and $f_Z(z)/(1 - F_Z(a))\, 1\{z > a\}$ need to vanish at the boundary of the support of $Z$ but also at $z = a$. Hence $Z$ needs to admit a locally absolutely continuous density that vanishes at $a$ and at the boundary of its support.
On the other hand, suppose that $Z$ admits a density with respect to the Lebesgue measure on $\mathbb{R}^2$ (or $\mathbb{R}^p$ for $p > 2$). Then $t(Z)|\sigma^2(Z) = u$ admits a locally absolutely continuous density under reasonable conditions, because the existence of two distinct instruments enables the econometrician to observe continuous variation of $t(Z)$ while keeping $\sigma^2(Z)$ constant. If such reasonable conditions are fulfilled, then:
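\[
V^* = V\left(\varepsilon q^*(Z) + m'(X)\right),
\]
with:
\[
q^*(Z) = \frac{1}{E(\varepsilon^2|Z)\, E\left(1/E(\varepsilon^2|Z) \,|\, t(Z), \sigma^2(Z)\right)} \times \left[ \mathrm{Cov}\left(E(\varepsilon m'(X)|Z),\, 1/E(\varepsilon^2|Z) \,|\, t(Z), \sigma^2(Z)\right) - \frac{f'_{t(Z)|\sigma^2(Z)}(t(Z))}{f_{t(Z)|\sigma^2(Z)}(t(Z))} \right].
\]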
Example: Average marginal effect for a simple parametric first stage. Consider the case where $X|Z$ is dominated by the Lebesgue measure on $\mathbb{R}$, the marginal distribution of $Z$ is such that $P(Z > 0) = 1$ and where $f_{X|Z}(x) = g(Z) \exp(-|x|^k Z)$ with $k > 0$. For $k = 1$ we get the two-sided exponential distribution for $X|Z$, and the case $k = 2$ corresponds to $X|Z \sim N(0, (2Z)^{-1})$. It follows that $g(z) = \left[\int \exp(-|x|^k z)\, dx\right]^{-1} = \frac{k z^{1/k}}{2\Gamma(1/k)}$. In such a case $\mu(X)$ is a scalar. If $Z$ is such that $E(Z^{2-1/k}) < \infty$ (this is the case if $Z$ has a compact support included in $]0; +\infty[$), then the dominated convergence Theorem ensures that $f_X$ is differentiable and $f_X'(x) = -k|x|^{k-1} \mathrm{sgn}(x) \int z g(z) \exp(-|x|^k z)\, dF_Z(z)$, where $\mathrm{sgn}(x) = 1\{x > 0\} - 1\{x < 0\}$. To satisfy Condition (i) of Theorem 5.3, $f_X'(x)/f_X(x)$ needs to be a function of $\mu(x) = |x|^k$. Because
\[
\frac{f_X'(x)}{f_X(x)} = -k \mu(x)^{1-1/k}\, \mathrm{sgn}(x)\, \frac{\int z g(z) \exp(-\mu(x) z)\, dF_Z(z)}{\int g(z) \exp(-\mu(x) z)\, dF_Z(z)},
\]
this condition holds if and only if $\mathrm{sgn}(x)$ is a function of $|x| = \mu(x)^{1/k}$, which never holds.
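The following Python sketch is an illustration of the example above, not part of the original argument. It checks numerically that $g(z) = k z^{1/k}/(2\Gamma(1/k))$ normalises $x \mapsto \exp(-|x|^k z)$, and that the score $f_X'/f_X$ changes sign with $x$ although $\mu(x) = |x|^k$ does not. The two-point distribution of $Z$ (`Z_SUPPORT`) and all function names are assumptions made only for this illustration.

```python
import math


def g(z: float, k: float) -> float:
    """Normalising constant of x -> exp(-|x|**k * z) on the real line."""
    return k * z ** (1.0 / k) / (2.0 * math.gamma(1.0 / k))


def midpoint(fn, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (i + 0.5) * h) for i in range(n))


def conditional_mass(z: float, k: float, half_width: float = 60.0) -> float:
    """Integral of f_{X|Z=z} over [-half_width, half_width]; should be close to 1."""
    return midpoint(lambda x: g(z, k) * math.exp(-abs(x) ** k * z), -half_width, half_width)


# Hypothetical two-point instrument, chosen only for illustration.
Z_SUPPORT = ((1.0, 0.4), (3.0, 0.6))  # pairs (value of Z, probability)


def score_x(x: float, k: float) -> float:
    """f_X'(x) / f_X(x) for x != 0 under the two-point distribution above."""
    mu = abs(x) ** k
    num = sum(p * z * g(z, k) * math.exp(-mu * z) for z, p in Z_SUPPORT)
    den = sum(p * g(z, k) * math.exp(-mu * z) for z, p in Z_SUPPORT)
    return -k * abs(x) ** (k - 1) * math.copysign(1.0, x) * num / den
```

Evaluating `score_x` at `x` and `-x` gives values of opposite sign for the same value of $\mu(x)$, which is the obstruction described in the example.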
5.2.3 Specialization to the additive first stage
We now consider the case where the first stage is additive and where both instruments and residuals of the first stage admit densities with respect to the Lebesgue measure. In the following, for any $p \ge 0$, $L^p(\mathbb{R}^K)$ denotes the set of functions $f$ which are measurable with respect to the Lebesgue measure on $\mathbb{R}^K$ and such that $|f|^p$ is integrable.
Assumption 5 (Additive 1st stage)
\[
X = p(Z) + \eta
\]
where $R(p) \subset \mathbb{R}^K$, $p(Z)$ and $\eta$ are random variables in $\mathbb{R}^K$ that admit densities $f_{p(Z)}$ and $f_\eta$ with respect to the Lebesgue measure on $\mathbb{R}^K$, and $Z \perp\!\!\!\perp \eta$.
The previous Assumption rules out the case where there exist non-continuous variables in $X'$. This restriction is made for the sake of simplicity but can be relaxed by reasoning conditionally on $X'$. For every $q$ integrable with respect to the Lebesgue measure on $\mathbb{R}^K$, $\mathcal{F}q$ denotes the Fourier transform of $q$, $\mathcal{F}q(x) = \int q(u) \exp(-2\pi i x'u)\, du$, and $\bar{\mathcal{F}}q(x) = \mathcal{F}q(-x)$. Note that under the previous Assumption, $X \perp\!\!\!\perp Z \,|\, p(Z)$ and then $E(q(Z)|X) = E(E(q(Z)|p(Z))|X) = E(\tilde{q}(p(Z))|X)$, where $\tilde{q}(p(Z)) = E(q(Z)|p(Z))$. The additivity of the first stage enables us to rewrite $T^*(q) f_X$ as a convolution product because
\[
T^*(q)(x) f_X(x) = \int \tilde{q}(v) f_{p(Z)}(v) f_\eta(x - v)\, dv = (\tilde{q} f_{p(Z)}) \star f_\eta(x).
\]
The convolution Theorem ensures that $Q \neq \emptyset$ if and only if $\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta$ is the Fourier transform of an integrable function (more precisely, the equality holds for any $v$ such that $\mathcal{F}f_\eta(v) \neq 0$). Let $\mathcal{S}$ denote the Schwartz space (i.e. the space of infinitely differentiable functions $f$ such that $x^\alpha f^{(\beta)}(x)$ decreases to zero when $|x| \to +\infty$ for any multi-indices $(\alpha)$ and $(\beta)$). Because $\mathcal{S} \subset L^1(\mathbb{R}^K)$, $\mathcal{F}$ is well defined on $\mathcal{S}$. By duality arguments, $\mathcal{F}$ can be extended to $\mathcal{S}'$, the topological dual space of $\mathcal{S}$ (see Schwartz (1997) for the topology considered), which contains $L^1(\mathbb{R}^K)$, $\mathcal{S}$ and $L^2(\mathbb{R}^K)$ (up to a canonical isomorphism). On $\mathcal{S}'$, we have $\bar{\mathcal{F}}\mathcal{F} = \mathcal{F}\bar{\mathcal{F}} = \mathrm{id}$, so $\mathcal{F}$ is a bijection from $\mathcal{S}'$ into itself. It is also a bijection from $\mathcal{S}$ (respectively $L^2(\mathbb{R}^K)$) to itself. Although $\mathcal{F}(\mathcal{S})$ (respectively $\mathcal{F}(L^2(\mathbb{R}^K))$ and $\mathcal{F}(\mathcal{S}')$) is easily characterized, there is no comparably simple result for $\mathcal{F}(L^1(\mathbb{R}^K))$. But we know that $\mathcal{F}(L^1(\mathbb{R}^K))$ is included in $C_0^0(\mathbb{R}^K)$, the space of continuous functions from $\mathbb{R}^K$ to $\mathbb{C}$ vanishing at infinity. And because $\bar{\mathcal{F}}\mathcal{F} = \mathrm{id}$ on $\mathcal{S}'$ and then on $L^1(\mathbb{R}^K)$, we also have
\[
\{f \in L^1(\mathbb{R}^K) \text{ s.t. } \mathcal{F}f \in L^1(\mathbb{R}^K)\} = \{f \in L^1(\mathbb{R}^K) \text{ s.t. } \bar{\mathcal{F}}f \in L^1(\mathbb{R}^K)\} \subset \mathcal{F}(L^1(\mathbb{R}^K)) \cap C_0^0(\mathbb{R}^K).
\]
Moreover, the Laplace transform defined by $\mathcal{L}f(z) = \int f(u) \exp(-z'u)\, du$ extends the Fourier transform (see footnote 10) to $\Gamma_f + i\mathbb{R}^K$, where $\Gamma_f = \{x \in \mathbb{R}^K : u \mapsto f(u) \exp(-u'x) \text{ integrable}\}$. When the support of $p(Z)$ is compact, the Laplace transform of $\tilde{q} f_{p(Z)}$ is defined on $\mathbb{C}^K$ and the Paley-Wiener Theorem gives us a sharp characterization of the range of $\mathcal{L}(\tilde{q} f_{p(Z)})$. So in the next Theorem, we consider a general case and a special case where $\mathrm{Supp}(p(Z))$ is compact.
10 More exactly, with the definitions of $\mathcal{F}$ and $\mathcal{L}$ used here, $\mathcal{L}f(x + iy)$ extends $iy \mapsto \mathcal{F}f(-iy/2\pi)$ from $i\mathbb{R}^K$ to $\Gamma_f + i\mathbb{R}^K$.
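Theorem 5.4
If Assumptions 1, 2 and 5 hold and if $\mathcal{F}f_\eta$ has isolated zeros then:
(i) $Q \neq \emptyset$ if and only if there exists a function $q \in L^2(p(Z))$ such that $\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta = (-1)^{|j|} \mathcal{F}(q f_{p(Z)})$.
(ii) $Q \neq \emptyset$ only if $\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta \in C_0^0(\mathbb{R}^K)$.
(iii) If $\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta \in C_0^0(\mathbb{R}^K) \cap L^1(\mathbb{R}^K)$ then $Q \neq \emptyset$ if and only if $\bar{\mathcal{F}}\left(\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta\right) \in L^1(\mathbb{R}^K)$. And in that case,
\[
Q = \left\{ q \in L^1(Z) : E(q(Z)|p(Z) = v) f_{p(Z)}(v) = (-1)^{|j|}\, \bar{\mathcal{F}}\left(\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta\right)(v) \right\}.
\]
Moreover, if the support of $p(Z)$ is compact and $\int f_\eta(x) e^{-t'x}\, dx < \infty$ for every $t \in \mathbb{R}^K$ then:
(iv) $Q \neq \emptyset$ only if:
(a) $\mathcal{L}[\psi f_X]^{(j)}/\mathcal{L}f_\eta$ is a holomorphic function on $\mathbb{C}^K$, bounded on $\Gamma + i\mathbb{R}^K$ for every compact $\Gamma$ included in $\mathbb{R}^K$,
(b) $\left[\mathcal{L}[\psi f_X]^{(j)}/\mathcal{L}f_\eta\right](x + iy)$ tends to zero when $|y|$ tends to infinity for any $x \in \mathbb{R}^K$,
(c) there exist $C, N$ such that:
\[
\left| \frac{\mathcal{L}[\psi f_X]^{(j)}(x + iy)}{\mathcal{L}f_\eta(x + iy)} \right| \le C (1 + \|x + iy\|)^N \exp\left( \sup_{z \in \mathrm{Supp}(Z)} \|p(z)\| \cdot \|y\| \right).
\]
When a root-N regular consistent estimator of $\theta_0$ exists, the semiparametric efficiency bound of $\theta_0$ is:
\[
V^* = V\left(\varepsilon q^*(Z) + R(X)\right),
\]
with
\[
q^*(Z) = \frac{(-1)^{|j|}}{E(\varepsilon^2|Z)\, E\left[1/E(\varepsilon^2|Z) \,|\, p(Z)\right]}\, \frac{d\mathcal{F}^{-1}\left[\mathcal{F}(\psi f_X)^{(j)}/\mathcal{F}f_\eta\right]}{dF_{p(Z)}}(p(Z)) + \frac{E\left[E(\varepsilon R(X)|Z)/E(\varepsilon^2|Z) \,|\, p(Z)\right]}{E(\varepsilon^2|Z)\, E\left[1/E(\varepsilon^2|Z) \,|\, p(Z)\right]} - \frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)}
\]
and
\[
R(X) = \left(\psi(X) + \partial_{f_X}\psi(X) f_X(X)\right) m^{(j)}(X) - E\left[\left(\psi(X) + \partial_{f_X}\psi(X) f_X(X)\right) m^{(j)}(X)\right].
\]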
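The first part of the previous Theorem enables us to distinguish three cases. First, if $\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta \notin C_0^0(\mathbb{R}^K)$, it ensures that $Q = \emptyset$. Second, if $\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta \in C_0^0(\mathbb{R}^K) \cap L^1(\mathbb{R}^K)$ and $\bar{\mathcal{F}}\left(\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta\right) \in L^1(\mathbb{R}^K)$, then $Q \neq \emptyset$. Otherwise, if $\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta \in C_0^0(\mathbb{R}^K) \setminus L^1(\mathbb{R}^K)$ or $\bar{\mathcal{F}}\left(\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta\right) \notin L^1(\mathbb{R}^K)$, we remain agnostic about the emptiness of $Q$: we only know that there is a linear form $S \in \mathcal{S}'$ with a Fourier transform (see footnote 11) equal to $\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta$ (up to a canonical isomorphism), but this linear form does not necessarily correspond to a $q \in L^1(Z)$ (in the sense that $S \in \{U \in \mathcal{S}' \text{ s.t. } \exists q \in L^1(Z) \text{ s.t. } U(\varphi) = (-1)^{|j|} \int E(q(Z)|p(Z) = v) f_{p(Z)}(v) \varphi(v)\, dv \text{ for } \varphi \in \mathcal{S}\}$). In this last case, it is always possible to consider $\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta$ as an element of $\mathcal{S}'$, and to verify (see footnote 12) that $S = \bar{\mathcal{F}}\left(\mathcal{F}[\psi f_X]^{(j)}/\mathcal{F}f_\eta\right)$ is in $L^1(\mathbb{R}^K)$ (up to the usual isomorphism).
The second part of the previous Theorem considers the case where the support of $p(Z)$ is bounded and where $\eta$ has thin tails (such as the normal distribution or a distribution with bounded support). In particular, when $X$ has a compact support, $p(Z)$ and $\eta$ also have compact supports, and Proposition (iii) of the previous theorem then gives a criterion on the distribution of $\eta$ to ensure that $Q \neq \emptyset$.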
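As a hedged illustration of condition (ii) of Theorem 5.4, the Python sketch below evaluates the modulus of the ratio $\mathcal{F}[\psi f_X]/\mathcal{F}f_\eta$ in a Gaussian design where $\psi f_X$ is the $N(\mu, \sigma^2)$ density and $f_\eta$ the $N(0, \sigma_\eta^2)$ density. With the convention $\mathcal{F}q(t) = \int q(u) e^{-2\pi i t u} du$, the transform of a $N(m, v)$ density is $\exp(-2\pi i m t - 2\pi^2 v t^2)$. The function names and numerical values are assumptions made only for this illustration.

```python
import cmath
import math


def fourier_normal(t: float, mean: float, var: float) -> complex:
    """Fourier transform at t of the N(mean, var) density, convention exp(-2*pi*i*t*u)."""
    return cmath.exp(-2j * math.pi * mean * t - 2.0 * math.pi ** 2 * var * t * t)


def ratio_modulus(t: float, mean: float, var_cf: float, var_eta: float) -> float:
    """|F[psi f_X](t) / F f_eta(t)| when psi f_X is N(mean, var_cf) and eta is N(0, var_eta)."""
    return abs(fourier_normal(t, mean, var_cf) / fourier_normal(t, 0.0, var_eta))
```

The modulus equals $\exp(-2\pi^2(\sigma^2 - \sigma_\eta^2)t^2)$, so it vanishes at infinity only when the counterfactual variance exceeds the variance of the first-stage noise.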
Example: Average under counterfactual distribution. Let $Z$ and $\eta$ be two independent centered normal variables with variances $\sigma_Z^2$ and $\sigma_\eta^2$. If $X = Z + \eta$, then $X \sim N(0, \sigma_Z^2 + \sigma_\eta^2)$. If we are interested in the average of $m(X)$ under the counterfactual distribution $N(\mu, \sigma^2)$, then
\[
\psi(x) = \frac{\sqrt{\sigma_Z^2 + \sigma_\eta^2}}{\sigma} \exp\left( -\left[ \frac{(x - \mu)^2}{\sigma^2} - \frac{x^2}{\sigma_Z^2 + \sigma_\eta^2} \right] / 2 \right).
\]
In such a case, $\mathcal{L}\psi f_X(z) = \exp(\mu z + \sigma^2 z^2/2)$ and $\mathcal{L}f_\eta(z) = \exp(\sigma_\eta^2 z^2/2)$ with $\Gamma_{\psi f_X} = \Gamma_{f_\eta} = \mathbb{R}$. The ratio $\mathcal{L}\psi f_X(z)/\mathcal{L}f_\eta(z)$ is then equal to $\exp(\mu z + \sigma^2 z^2/2 - \sigma_\eta^2 z^2/2)$ and verifies condition (i) of Theorem 5.4 if and only if $\sigma^2 > \sigma_\eta^2$. In such a case, $Q$ contains the single element
\[
q(z) = \frac{\sigma_Z}{\sqrt{\sigma^2 - \sigma_\eta^2}} \exp\left( \frac{z^2}{2\sigma_Z^2} - \frac{(z - \mu)^2}{2(\sigma^2 - \sigma_\eta^2)} \right).
\]
The condition $\sigma^2 > \sigma_\eta^2$ means that the counterfactual distribution cannot be more concentrated than the noise in the first stage. We recover a result already established with Theorem 5.3.
Interesting results can also be obtained for an additive first stage which does not verify Assumption 4. For instance, suppose that $Z$, $\eta$ and $X$ follow Gamma distributions with parameters $(k_Z, \tau)$, $(k_\eta, \tau)$ and $(k_Z + k_\eta, \tau)$. If the counterfactual distribution is a Gamma distribution with parameters $(k', \tau')$, then the ratio $\mathcal{L}\psi f_X(z)/\mathcal{L}f_\eta(z)$ is equal to $(1 + \tau z)^{k_\eta} (1 + \tau' z)^{-k'}$, and Condition (ii) of Theorem 5.4 holds if and only if $k' > k_\eta$. Because the value of $\tau'$ does not matter, it appears that in this last case the mean and the variance of the counterfactual distribution can be chosen arbitrarily small (because both depend on $\tau'$), but its skewness and its kurtosis (which depend only on $k'$ and decrease with it) have to be smaller than those of $\eta$.
11 See Schwartz (1997) for a definition of the Fourier transform of a linear form.
12 To do so, verify that for every compact $K$, if $\varphi_n$ is a sequence of infinitely differentiable functions with support included in $K$ that tends uniformly to 0, then $S(\varphi_n)$ tends to 0. This first step ensures that $S$ can be identified with a measure (cf. Theorem III, Chapter I of Schwartz (1997)). To be identified with an integrable function, such a measure must also be dominated by the Lebesgue measure and of finite total variation, by the Radon-Nikodym Theorem. This leads one to check also that $\sup_{\varphi \in \mathcal{S}, \|\varphi\|_\infty < 1} S(\varphi) < \infty$ and $\sup_{A \text{ negligible Borel set}} \inf_{\varphi \in \mathcal{S},\, K \text{ compact s.t. } \varphi \ge 1_{A \cap K}} S(\varphi) = 0$.
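The Gaussian example above can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the paper's procedure: it integrates $\int q(z) f_Z(z) f_\eta(x - z)\, dz$ with a midpoint rule and compares the result with the $N(\mu, \sigma^2)$ density, which is $\psi(x) f_X(x)$. The function names, the grid and the parameter values used in the check are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def q(z: float, mu: float, var_cf: float, var_z: float, var_eta: float) -> float:
    """Candidate element of Q in the Gaussian example; requires var_cf > var_eta."""
    s2 = var_cf - var_eta
    if s2 <= 0.0:
        raise ValueError("the counterfactual variance must exceed the noise variance")
    return math.sqrt(var_z / s2) * math.exp(z * z / (2.0 * var_z) - (z - mu) ** 2 / (2.0 * s2))


def t_star_q_times_fx(x, mu, var_cf, var_z, var_eta, half_width=12.0, n=24_000):
    """Midpoint approximation of the convolution (q f_Z) * f_eta evaluated at x."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        z = -half_width + (i + 0.5) * h
        total += q(z, mu, var_cf, var_z, var_eta) * normal_pdf(z, 0.0, var_z) * normal_pdf(x - z, 0.0, var_eta)
    return h * total
```

For instance, with $\mu = 0.3$, $\sigma^2 = 1$, $\sigma_Z^2 = 1$ and $\sigma_\eta^2 = 0.5$, the value returned at $x = 0.8$ should agree with `normal_pdf(0.8, 0.3, 1.0)` up to quadrature error, while calling `q` with $\sigma^2 \le \sigma_\eta^2$ raises an error because no such element exists.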
The case of average derivatives leads to elegant results, as illustrated in the following proposition.
Proposition 5.5 (Average derivatives under additive first stage)
If Assumptions 1, 2 and 5 hold, if $\mathcal{F}f_\eta$ has isolated zeros and if $\theta_0 = E\left(m^{(j)}(X)\right)$, then a regular and root-N consistent estimator of $\theta_0$ exists only if the weak derivative of $f_{p(Z)}$ is a locally integrable function (denoted $f_{p(Z)}^{(j)}$ in the sequel) and if $E\left[\left(\frac{f_{p(Z)}^{(j)}(p(Z))}{f_{p(Z)}(p(Z))}\right)^2\right] < \infty$. In that case, the semi-parametric efficiency bound is:
\[
V\left(\varepsilon q^*(Z) + m^{(j)}(X)\right),
\]
with
\[
q^*(Z) = \frac{(-1)^{|j|}}{E(\varepsilon^2|Z)\, E\left[1/E(\varepsilon^2|Z) \,|\, p(Z)\right]}\, \frac{f_{p(Z)}^{(j)}}{f_{p(Z)}}(p(Z)) + \frac{E\left[E(\varepsilon m^{(j)}(X)|Z)/E(\varepsilon^2|Z) \,|\, p(Z)\right]}{E(\varepsilon^2|Z)\, E\left[1/E(\varepsilon^2|Z) \,|\, p(Z)\right]} - \frac{E(\varepsilon m^{(j)}(X)|Z)}{E(\varepsilon^2|Z)},
\]
and when $p$ is one to one this expression reduces to:
\[
q^*(Z) = (-1)^{|j|}\, \frac{f_{p(Z)}^{(j)}(p(Z))}{f_{p(Z)}(p(Z))}.
\]
The proof relies on Condition (i) of Theorem 5.4 and is given in the Appendix. We briefly illustrate the implication of the previous proposition for the average marginal effect.
Example: Average marginal effect. Let $X = Z + \eta$ with $\eta$ drawn from a uniform distribution on $[0; 1]$, $Z$ drawn from a uniform distribution on $[0; \alpha]$ and $\eta \perp\!\!\!\perp Z$. The correlation between $X$ and $Z$ is $\left(\frac{\alpha^2}{\alpha^2 + 1}\right)^{1/2}$, so $\alpha$ is a measure of the strength of the instrument. In such cases, we have shown that $E(m'(X))$ is always identified (cf. Proposition 3.2). But the previous proposition ensures that $E(m'(X))$ is not root-N estimable whatever the value of $\alpha$. Indeed $f_Z$ is not absolutely continuous on $\mathbb{R}$ and its weak derivative is the signed measure $(\delta_0 - \delta_\alpha)/\alpha$ (where $\delta_a$ is the Dirac mass at $a$) and not a function. Such an example shows that if the densities of $Z$ and $\eta$ are not "sufficiently" regular, root-N estimation of the average marginal effect is lost even in case of identification and in the presence of strong instruments. A subsequent question concerns the respective minimal regularity of $f_Z$ and $f_\eta$. As soon as $\mathcal{F}f_X'(t) = (2i\pi t)\mathcal{F}f_X(t)$, $E(m'(X))$ is root-N estimable only if $Z$ admits an absolutely continuous density $f_Z$ on $\mathbb{R}$ such that $\int \frac{f_Z'^2(z)}{f_Z(z)}\, dz < \infty$. Note that $f_Z$ must at least vanish at the boundary of its support, as is often assumed, and this vanishing needs to be sufficiently smooth to ensure the integrability condition. If $Z \sim B(\alpha, \beta)$, then $\alpha > 2$ and $\beta > 2$ is a necessary condition for the existence of a consistent root-N estimator. Similarly, if $Z \sim \Gamma(k, \tau)$ or if $Z \sim \mathrm{Weibull}(k, \lambda)$, then $k > 2$ is a necessary condition for the existence of a consistent root-N estimator. With a Pareto distribution, root-N consistent estimators never exist. But normal or Student distributions for $Z$ ensure that $Q$ is not empty as soon as $f_X$ is absolutely continuous. In that case, the semiparametric efficiency bound is $V\left[m'(X) - (Y - m(X)) \frac{f_Z'(Z)}{f_Z(Z)}\right]$ and the parameter $\theta_0$ is also equal to $-E\left[Y \frac{f_Z'(Z)}{f_Z(Z)}\right]$.
More generally for multivariate $X$, if Assumptions 2 and 5 hold and if the Fourier transform of $f_\eta$ has isolated zeros, then a regular and root-N consistent estimator of $E(\partial_{x_k} m(X))$ exists only if $p_k \mapsto f_{p(Z)}(p_1, ..., p_K)$ is absolutely continuous (for almost every $p_1, ..., p_{k-1}, p_{k+1}, ..., p_K$) and if $E\left[\frac{(\partial_{p_k} f_{p(Z)})^2}{f_{p(Z)}^2}(p(Z))\right] < \infty$. In that case, the semiparametric efficiency bound is $V(\varepsilon q^*(Z) + \partial_{x_k} m(X))$ with
\[
q^*(Z) = \frac{-1}{E(\varepsilon^2|Z)\, E\left[1/E(\varepsilon^2|Z) \,|\, p(Z)\right]}\, \frac{\partial_{p_k} f_{p(Z)}}{f_{p(Z)}}(p(Z)) + \frac{E\left[E(\varepsilon \partial_{x_k} m(X)|Z)/E(\varepsilon^2|Z) \,|\, p(Z)\right]}{E(\varepsilon^2|Z)\, E\left[1/E(\varepsilon^2|Z) \,|\, p(Z)\right]} - \frac{E(\varepsilon \partial_{x_k} m(X)|Z)}{E(\varepsilon^2|Z)}.
\]
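To make the integrability condition $\int f_Z'^2/f_Z < \infty$ concrete, here is a hedged Python sketch for $Z \sim \Gamma(k, 1)$ (unit scale is an assumption made for simplicity). For this family $f_Z'(z)/f_Z(z) = (k-1)/z - 1$, and a direct calculation gives $\int f_Z'^2/f_Z = 1/(k-2)$ when $k > 2$, while the integral diverges at 0 when $k \le 2$. The sketch approximates the integral truncated to $[\epsilon, 60]$ on a logarithmic grid; the function name and the truncation points are hypothetical choices.

```python
import math


def gamma_location_information(k: float, eps: float, upper: float = 60.0, n: int = 50_000) -> float:
    """Midpoint approximation of the integral of f'(z)^2 / f(z) over [eps, upper] for Gamma(k, 1)."""
    lo, hi = math.log(eps), math.log(upper)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = math.exp(lo + (i + 0.5) * h)          # substitution z = exp(u)
        log_f = (k - 1.0) * math.log(z) - z - math.lgamma(k)
        score = (k - 1.0) / z - 1.0               # f'(z) / f(z)
        total += score * score * math.exp(log_f) * z  # extra factor z because dz = z du
    return h * total
```

With $k = 3$ the truncated integral stabilises near $1 = 1/(k-2)$ as $\epsilon$ decreases, whereas with $k = 2$ it keeps growing like $-\log \epsilon$, in line with the necessary condition $k > 2$.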
6 Conclusion
In this paper we show that the parameter $\theta_0 = E(\psi(X) m^{(j)}(X))$ with $m$ unknown and $E(Y - m(X)|Z) = 0$ is often not root-N estimable. We show that root-N estimability of $\theta_0$ is closely related to the fact that $[\psi f_X]^{(j)}/f_X$ belongs to the range of $T^*$, with $T^*(q)(X) = E(q(Z)|X)$. This condition is not nested with the completeness condition that ensures
the identification of m. If the first stage is exponential or additive, our condition is quite
restrictive when the parameter of interest is an average under a counterfactual distribution
of X. It appears that for some DGPs and some counterfactual distributions, θ0 is never
root-N estimable, and sometimes for others, θ0 is root-N estimable if and only if the strength
of the instruments is above a critical value. When the parameter of interest is an average
marginal effect, root-N estimation is achievable if the first stage is additive, even if the
completeness condition does not hold, as soon as the density of the instrument admits a
derivative that vanishes at the boundary of its support.
√
θ0
Shift in Mean
ρ
E exp − 12 [(x − m)2 − x2 ] m(X)
ρ 6= 0
yes
Shift in Variance
√
1/ σ 2 E exp − 12 [x2 (1/σ 2 − 1)] m(X)
ρ2 ≤ 1 − σ 2
no
ρ2 > 1 − σ 2
yes
Average marginal effect
ρ2 = 0
no
0
2
E(m (x))
ρ >0
Density-weighted average derivative, ...
E (2π)−k/2 exp (−kX 2 /2) m0 (X) , k ∈ N
ρ2 <
2
ρ =
N -estimable
efficiency bound
V (Y − m(X)) exp mρ Z −
ρ2
σ 2 +ρ2 −1
h
22 h
ρ z
V (Y − m(X)) exp 2(1−ρ
2) 1 −
ii
σ 2 ρ2
σ 2 +ρ2 −1
V ε Zρ + m0 (X)
yes
k
or ρ2 = 0
k+1
k
with k ≥ 1
k+1
yes
(k + 1)2 V ε Zρ +
k
k+1
yes
(k + 1)2 V
ρ2 >
m2
2ρ2
no
Table 1: Some examples when $(X, Z) \sim N(0, \Sigma)$ with $\Sigma_{11} = \Sigma_{22} = 1$ and $\Sigma_{12} = \rho$
1
2
0
k exp (−kX /2) m (X)
2π
!
1−ρ2
εZ exp Z 2 (k+1)ρ
2 −k
+ √ 1 k exp (−kX 2 /2) m0 (X)
2π
√
7 Appendix A: Proofs of theorems
7.1 Proof of Theorem 3.1
Without the completeness assumption, $T$ is not invertible, $m$ is not point identified, and the identified set is
\[
\{m + l,\ l \in \mathrm{Ker}(T) \cap (M - \{m\})\} = \{m\} + \mathrm{Ker}(T) \cap (M - \{m\}).
\]
It follows that the identified set of $m^{(j)}$ is
\[
\left\{ m^{(j)} + l^{(j)},\ l \in \mathrm{Ker}(T) \cap (M - \{m\}) \right\}.
\]
Because $\psi$ is known (up to $f_X$, which is identified), the identified set of $\theta_0$ is
\[
\{\theta_0 + E(\psi l^{(j)}),\ l \in \mathrm{Ker}(T) \cap (M - \{m\})\}.
\]
When $M$ is a vector space, $M - \{m\} = M$, and $l \mapsto E(\psi l^{(j)})$ is a linear form on the vector space $\mathrm{Ker}(T) \cap M$, so its range is $\{0\}$ or $\mathbb{R}$. This proves (i).
To prove (ii), remark that $\theta_0$ is identified if and only if $E(\psi l^{(j)}) = 0$ for every $l \in \mathrm{Ker}(T) \cap (M - \{m\})$, i.e.
\[
\psi \in \left\{ l^{(j)} \text{ s.t. } l \in \mathrm{Ker}(T) \cap (M - \{m\}) \right\}^{\perp}.
\]
7.2 Proof of Theorem 3.2
We first prove the result under the first condition. The region of identification of $\theta_0$ is $\{E(m'(X) + l'(X)),\ l \in M \cap \mathrm{Ker}(T)\}$. For all $l \in M \cap \mathrm{Ker}(T)$, we have both equalities:
\[
E(l(X)|p(Z) = u) = 0,
\]
\[
\partial_u E(l(X)|p(Z) = u) = E(l'(X)|p(Z) = u).
\]
The second equality is a simple consequence of the dominated convergence Theorem, using $\sum_{s=1}^{S} a_s x^s$ for the domination. Together, these equalities imply that $E(l'(X)) = 0$, so that the region of identification is reduced to a single point.
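Under the second condition, we have:
\[
E(l(X)|p(Z) = p) = \int_{\inf(I)}^{\sup(I)} \int_{\mathrm{Supp}(\nu)} l(p + u + v)\, dF_\nu(v)\, du = \int_{\inf(I) + p}^{\sup(I) + p} \int_{\mathrm{Supp}(\nu)} l(u + v)\, dF_\nu(v)\, du.
\]
Differentiation with respect to $p$ gives:
\[
\int_{\mathrm{Supp}(\nu)} l(\sup(I) + p + v)\, dF_\nu(v) - \int_{\mathrm{Supp}(\nu)} l(\inf(I) + p + v)\, dF_\nu(v) = \int_{\mathrm{Supp}(\nu)} \int_{\inf(I)}^{\sup(I)} l'(p + u + v)\, du\, dF_\nu(v) = E(l'(X)|p(Z) = p),
\]
and hence $\theta_0$ is identified.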
7.3 Proof of Theorem 4.1
7.3.1 Some Lemmas
Lemma 7.1 (Derivation in quadratic mean "with respect to m(X)")
If Assumptions 1 and 2 hold, then $r(e, x, z) = \partial_e f_{\varepsilon,X,Z}(e, x, z)/f_{\varepsilon,X,Z}(e, x, z)\, 1\{f_{\varepsilon,X,Z}(e, x, z) > 0\}$ is well defined and square integrable, and for any $l \in L^2(X)$ such that $rl \in L^2(\varepsilon, X, Z)$, we have:
\[
\lim_{t \to 0} \int \left[ \frac{1}{t}\left( f_{\varepsilon,X,Z}^{1/2}(e - t l(x), x, z) - f_{\varepsilon,X,Z}^{1/2}(e, x, z) \right) - \frac{l(x)}{2}\, r(e, x, z) f_{\varepsilon,X,Z}^{1/2}(e, x, z) \right]^2 d\nu(e, x, z) = 0.
\]
Moreover, $E(r|X, Z) = 0$.
Proof: Let $s_u(e, x, z) = f_{\varepsilon,X,Z}^{1/2}(e - u, x, z)$. Because $x \mapsto x^2$ is locally Lipschitz continuous on $\mathbb{R}^+$, $u \mapsto s_u^2(e, x, z)$ is absolutely continuous and the chain rule applies, i.e. $\partial_u s_u^2(e, x, z) = 2 s_u(e, x, z) \partial_u s_u(e, x, z)$, and then $r(e - u, x, z) = \partial_u s_u^2(e, x, z)/s_u^2(e, x, z)\, 1\{s_u(e, x, z) > 0\} = 2 \partial_u s_u(e, x, z)/s_u(e, x, z)\, 1\{s_u(e, x, z) > 0\}$ is well defined. The square integrability of $r$ is ensured by Assumption 2.(vii). Next, we have:
\[
s_{tl(x)}(e, x, z) - s_0(e, x, z) = \int_0^{tl(x)} \partial_u s_u(e, x, z)\, d\lambda(u)
\]
\[
= \frac{1}{2} \int_0^{tl(x)} r(e - u, x, z) f_{\varepsilon,X,Z}^{1/2}(e - u, x, z)\, d\lambda(u)
\]
\[
= \frac{t l(x)}{2} \int_0^1 r(e - u t l(x), x, z) f_{\varepsilon,X,Z}^{1/2}(e - u t l(x), x, z)\, d\lambda(u).
\]
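Next, the Cauchy-Schwarz inequality and the Fubini Theorem ensure that:
\[
\frac{1}{t^2} E\left[ \left( s_{tl(X)} - s_0 \right)^2 \right] = \frac{1}{4} E\left[ l^2(X) \left( \int_0^1 r(e - u t l(x), x, z) f_{\varepsilon,X,Z}^{1/2}(e - u t l(x), x, z)\, d\lambda(u) \right)^2 \right]
\]
\[
\le \frac{1}{4} E\left[ l^2(X) \int_0^1 r^2(\varepsilon - u t l(X), X, Z) f_{\varepsilon,X,Z}(\varepsilon - u t l(X), X, Z)\, d\lambda(u) \right]
\]
\[
= \frac{1}{4} \int_0^1 \int_{\mathrm{Supp}(X,Z)} \left( \int_{\mathbb{R}} r^2(e - u t l(x), x, z) f_{\varepsilon,X,Z}(e - u t l(x), x, z)\, d\lambda(e) \right) l^2(x)\, d\mu(x, z)\, d\lambda(u)
\]
\[
= \frac{1}{4} \int_0^1 \int_{\mathrm{Supp}(X,Z)} \left( \int_{\mathbb{R}} r^2(e, x, z) f_{\varepsilon,X,Z}(e, x, z)\, d\lambda(e) \right) l^2(x)\, d\mu(x, z)\, d\lambda(u)
\]
\[
= \frac{1}{4} E\left( r^2(\varepsilon, X, Z) l^2(X) \right) < \infty.
\]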
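$X, Z$-almost surely, $t \mapsto f_{\varepsilon,X,Z}^{1/2}(e - t, X, Z)$ is absolutely continuous and then admits a derivative at $t = 0$ for almost every $e \in \mathbb{R}$, so that:
\[
\left[ \frac{1}{t}\left( s_{tl(X)}(\varepsilon, X, Z) - s_0(\varepsilon, X, Z) \right) - \frac{l(X)}{2}\, r(\varepsilon, X, Z) s_0(\varepsilon, X, Z) \right]^2 \to 0 \quad \text{a.s.}
\]
Proposition 2.29 of van der Vaart (2000) ensures that the limit given in the Lemma is zero. Next, consider the case where $l(X) = 1$ and use the fact that $\nu(e, x, z) = \lambda(e) \otimes \mu(x, z)$ to deduce that, $X, Z$-almost surely:
\[
\lim_{t \to 0} \int \left[ \frac{1}{t}\left( f_{\varepsilon|X,Z}^{1/2}(e - t) - f_{\varepsilon|X,Z}^{1/2}(e) \right) - \frac{1}{2}\, r(e, X, Z) f_{\varepsilon|X,Z}^{1/2}(e) \right]^2 d\lambda(e) = 0.
\]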
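Next, bound $E(r|X, Z)$. Because $\int \left( f_{\varepsilon|X,Z}(e - t) - f_{\varepsilon|X,Z}(e) \right) d\lambda(e) = 0$, we have:
\[
|E(r|X, Z)| = \left| \int \left[ \frac{1}{t}\left( f_{\varepsilon|X,Z}(e - t) - f_{\varepsilon|X,Z}(e) \right) - r(e, X, Z) f_{\varepsilon|X,Z}(e) \right] d\lambda(e) \right|
\]
\[
\le \left| \int \left[ \frac{1}{t}\left( f_{\varepsilon|X,Z}^{1/2}(e - t) - f_{\varepsilon|X,Z}^{1/2}(e) \right) - \frac{r(e, X, Z)}{2} f_{\varepsilon|X,Z}^{1/2}(e) \right] \left[ f_{\varepsilon|X,Z}^{1/2}(e - t) + f_{\varepsilon|X,Z}^{1/2}(e) \right] d\lambda(e) \right|
\]
\[
+ \frac{1}{2} \left| \int r(e, X, Z) f_{\varepsilon|X,Z}^{1/2}(e) \left[ f_{\varepsilon|X,Z}^{1/2}(e - t) - f_{\varepsilon|X,Z}^{1/2}(e) \right] d\lambda(e) \right|
\]
\[
\le 2 \left( \int \left[ \frac{1}{t}\left( f_{\varepsilon|X,Z}^{1/2}(e - t) - f_{\varepsilon|X,Z}^{1/2}(e) \right) - \frac{1}{2}\, r(e, X, Z) f_{\varepsilon|X,Z}^{1/2}(e) \right]^2 d\lambda(e) \right)^{1/2}
\]
\[
+ \frac{1}{2} E\left( r^2(\varepsilon, X, Z) | X, Z \right)^{1/2} \left( \int \left[ f_{\varepsilon|X,Z}^{1/2}(e - t) - f_{\varepsilon|X,Z}^{1/2}(e) \right]^2 d\lambda(e) \right)^{1/2}.
\]
Theorem 9.5 of Rudin (1987) ensures that $\lim_{t \to 0} \int \left[ f_{\varepsilon|X,Z}^{1/2}(e - t) - f_{\varepsilon|X,Z}^{1/2}(e) \right]^2 d\lambda(e) = 0$. Letting $t$ tend to 0, both terms of the bound vanish, so $E(r|X, Z) = 0$.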
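Lemma 7.2 (DQM "with respect to the nuisance parameter $f_{\varepsilon,X,Z}$") If Assumption 2 holds, and if $f_t(e, x, z)$ is a density with respect to $\nu$ and $s \in L^2(\varepsilon, X, Z)$ are such that:
\[
\lim_{t \to 0} \frac{1}{t^2} \int f_t(e, x, z) 1\{f_{\varepsilon,X,Z}(e, x, z) = 0\}\, d\nu(e, x, z) = 0,
\]
\[
\lim_{t \to 0} \int \left[ \frac{1}{t}\left( f_t^{1/2}(e, x, z) - f_{\varepsilon,X,Z}^{1/2}(e, x, z) \right) - \frac{1}{2}\, s(e, x, z) f_{\varepsilon,X,Z}^{1/2}(e, x, z) \right]^2 d\nu(e, x, z) = 0,
\]
then for any $\varphi \in \Phi$:
\[
\lim_{t \to 0} \int \left[ \frac{1}{t}\left( f_t^{1/2}(e - t\varphi(x), x, z) - f_{\varepsilon,X,Z}^{1/2}(e, x, z) \right) - \frac{s - r\varphi}{2}\, f_{\varepsilon,X,Z}^{1/2}(e, x, z) \right]^2 d\nu(e, x, z) = 0.
\]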
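Let $W = (X, Z)$ and $w = (x, z)$.
The assumptions imply that $E(s^2(\varepsilon, W)) = \int s^2(e, w) f_{\varepsilon,W}(e, w)\, d\nu(e, w) < \infty$.
Moreover, we have:
\[
\int \left[ \frac{1}{t}\left( f_t^{1/2}(e - t\varphi(x), w) - f_{\varepsilon,W}^{1/2}(e, w) \right) - \frac{1}{2}\left( s(e, w) + r(e, w)\varphi(x) \right) f_{\varepsilon,W}^{1/2}(e, w) \right]^2 d\nu(e, w)
\]
\[
\le 2 \int \left[ \frac{1}{t}\left( f_t^{1/2}(e - t\varphi(x), w) - f_{\varepsilon,W}^{1/2}(e - t\varphi(x), w) \right) - \frac{1}{2}\, s(e - t\varphi(x), w) f_{\varepsilon,W}^{1/2}(e - t\varphi(x), w) \right]^2 d\nu(e, w)
\]
\[
+ 2 \int \left[ \frac{1}{t}\left( f_{\varepsilon,W}^{1/2}(e - t\varphi(x), w) - f_{\varepsilon,W}^{1/2}(e, w) \right) - \frac{1}{2}\, r(e, w)\varphi(x) f_{\varepsilon,W}^{1/2}(e, w) \right]^2 d\nu(e, w)
\]
\[
+ 2 \int \left[ s(e - t\varphi(x), w) f_{\varepsilon,W}^{1/2}(e - t\varphi(x), w) - s(e, w) f_{\varepsilon,W}^{1/2}(e, w) \right]^2 d\nu(e, x, z).
\]
The first and the third terms tend to zero because $\nu(e, x, z) = \lambda(e) \otimes \mu(x, z)$ and the Lebesgue measure $\lambda$ is translation invariant. The second term tends to zero by assumption.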
Lemma 7.3 (Density of bounded scores) Let
\[
S = \{s \in L^2(\varepsilon, X, Z) : E(s) = 0,\ E(\varepsilon s|Z) = 0\}.
\]
Under Assumption 2, $S^b = S \cap L^\infty(\varepsilon, X, Z)$ is dense in $S$ for the norm of $L^2(\varepsilon, X, Z)$.
Consider $s \in S$ and $s_n = s 1\{|s| < n\} - E(s 1\{|s| < n\})$. We have $s_n \in L_0^2(\varepsilon, X, Z) \cap L^\infty(\varepsilon, X, Z)$ and, by the dominated convergence Theorem,
\[
\lim_n \|s_n - s\|_2^2 = \lim_n E\left( (s_n - s)^2 \right) = 0.
\]
Let $p_n$ denote the orthogonal projection of $s_n$ on the closure of $\{q(Z)\varepsilon + c : q \in L^2(Z), c \in \mathbb{R}\}$, defined by:
\[
E\left[ (s_n - p_n)(q(Z)\varepsilon + c) \right] = 0 \quad \forall (c, q) \in \mathbb{R} \times L^2(Z).
\]
We can check that
\[
p_n = \frac{E[s_n \varepsilon|Z]}{E[\varepsilon^2|Z]}\, \varepsilon.
\]
Note that by definition $s_n - p_n$ is the projection of $s_n$ on $\bar{S} = S$, so $s_n - p_n - s \in S$ and $p_n \in S^\perp$, and by the Pythagorean Theorem:
\[
\|s_n - s\|_2^2 = \|s_n - p_n - s\|_2^2 + \|p_n\|_2^2 \ge \|s_n - p_n - s\|_2^2.
\]
Let $g_k(Z) = E\left( \varepsilon^2 1\{|\varepsilon| \ge k\} | Z \right)$. By the dominated convergence Theorem:
\[
\lim_k g_k(Z) = 0, \quad Z\text{-a.s.},
\]
so $g_k(Z)$ converges in probability to 0. It follows that there exists a strictly increasing sequence $k_l$ in $\mathbb{N}$ such that $P\left( g_{k_l}(Z) \ge \frac{1}{l} \right) \le 1/l$.
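Let $r_l(Y, X, Z) = \varepsilon 1\{|\varepsilon| < k_l\} 1\{g_{k_l}(Z) < 1/l\}$. $r_l$ converges in probability to $\varepsilon$ because
\[
P(|r_l - \varepsilon| > a) \le P(|r_l - \varepsilon| \neq 0)
\]
\[
\le P\left( \{|\varepsilon| \ge k_l\} \cap \{g_{k_l}(Z) < 1/l\} \right) + P\left( g_{k_l}(Z) \ge 1/l \right)
\]
\[
\le P(|\varepsilon| \ge k_l) + P\left( g_{k_l}(Z) \ge \tfrac{1}{l} \right)
\]
\[
\le P(|\varepsilon| \ge k_l) + 1/l.
\]
Moreover $E(r_l^2|Z)$ converges almost surely (and then in probability) to $E(\varepsilon^2|Z)$ because:
\[
|E(\varepsilon^2 - r_l^2|Z)| = E\left[ \varepsilon^2 \left( 1\{|\varepsilon| \ge k_l\} 1\{g_{k_l}(Z) < 1/l\} + 1\{g_{k_l}(Z) \ge 1/l\} \right) \right]
\]
\[
\le E\left[ g_{k_l}(Z) 1\{g_{k_l}(Z) < 1/l\} \right] + E\left[ E[\varepsilon^2|Z] 1\{g_{k_l}(Z) \ge 1/l\} \right]
\]
\[
\le \frac{1}{l} + v P\left( g_{k_l}(Z) \ge \tfrac{1}{l} \right) \le \frac{1 + v}{l}.
\]
It follows that $E(r_l^2|Z) \ge v - \frac{1 + v}{l} > 0$ for $l \ge \frac{1 + v}{v}$.
Then for a given $n$, by the continuous mapping Theorem, $E(s_n \varepsilon|Z) r_l/E(r_l^2|Z)$ converges in probability to $p_n$ when $l$ tends to infinity.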
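Moreover,
\[
\frac{E(s_n \varepsilon|Z)^2 r_l^2}{E(r_l^2|Z)^2} \le E(s_n^2|Z) E(\varepsilon^2|Z) \frac{\varepsilon^2}{\left( v + \frac{1 + v}{n} \right)^2} \le 4 n^2 v \frac{\varepsilon^2}{\left( v + \frac{1 + v}{n} \right)^2}.
\]
By Theorem 17.4 in Jacod & Protter (2000), for every $n$,
\[
\lim_l \left\| \frac{E(s_n \varepsilon|Z) r_l}{E(r_l^2|Z)} - p_n \right\|_2 = 0.
\]
Then for every $n$, let
\[
h_{n,l} = \frac{E(s_n \varepsilon|Z) r_l}{E(r_l^2|Z)} - E\left( \frac{E(s_n \varepsilon|Z) r_l}{E(r_l^2|Z)} \right).
\]
We have:
\[
\lim_l \|h_{n,l} - p_n\|_2 = 0,
\]
and because $r_l^2 = r_l \varepsilon$:
\[
E\left[ (s_n - h_{n,l}) \varepsilon | Z \right] = 0,
\]
\[
E(h_{n,l}) = E(s_n) = 0.
\]
It follows that
\[
s_n - h_{n,l} \in S \cap L^\infty(\varepsilon, X, Z)
\]
and
\[
\|s_n - h_{n,l} - s\|_2 \le \|s_n - p_n - s\|_2 + \|p_n - h_{n,l}\|_2 \le \|s_n - s\|_2 + \|p_n - h_{n,l}\|_2.
\]
And then we deduce that:
\[
\lim_n \lim_l \|s_n - h_{n,l} - s\|_2 = 0.
\]
Then $s \in \overline{S \cap L^\infty(\varepsilon, X, Z)}$.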
7.3.2 The Proof
The tangent set.
For a parametric sub-model indexed by $t$ in a neighborhood of 0 in $\mathbb{R}$, let $\theta(t)$ denote the value of the parameter for the DGP corresponding to $t$:
\[
\theta(t) = \int \psi_t(x) m_t^{(j)}(x) f_{Y,X,Z,t}(y, x, z)\, d\lambda(y, x, z) = \int h(f_{X,t}(x), x)\, m_t^{(j)}(x) f_{Y,X,Z,t}(y, x, z)\, d\lambda(y, x, z).
\]
Consider a family of parametric sub-models with scores $l \in S \subset L^2(Y, X, Z)$ such that $\theta(t)$ is a sufficiently smooth function of $t$ to ensure that there exists $f \in L^2(Y, X, Z)$ such that $\partial_t \theta(0) = E(fl)$. The set of scores $l$ is called a tangent set, $f$ is called an influence function, and in such a case we say that $\theta$ is differentiable with respect to the tangent set $S$ (see van der Vaart (2000), Chapter 25). A necessary condition for the existence of a regular root-N consistent estimator is that such an influence function exists for the parameter $\theta = E(\psi(X) m^{(j)}(X))$. The semi-parametric efficiency bound (with respect to the tangent set $S$) of $\theta_0$ is simply the smallest variance among the variances of the influence functions.
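In our case, we consider the tangent set of scores $S = \cup_{\varphi \in \Phi} S_\varphi = \cup_{\varphi \in \Phi} \cup_{c \ge 0} S_{\varphi,c}$, where $S_{\varphi,c}$ denotes the set of scores of parametric models indexed by $t$ and defined by:
\[
Y = m(X) + t\varphi(X) + \varepsilon, \quad E(\varepsilon|Z) = 0, \quad (\varepsilon, X, Z) \text{ has density } f_t \text{ with respect to } \nu,
\]
such that the "true" density $f_{\varepsilon,X,Z}$ corresponds to $f_0$ and
\[
\lim_{t \to 0} \frac{1}{t^2} \int f_t(e, x, z) 1\{f_0(e, x, z) = 0\}\, d\nu(e, x, z) = 0,
\]
\[
\lim_{t \to 0} \int \left[ \frac{1}{t}\left( f_t^{1/2}(e, x, z) - f_0^{1/2}(e, x, z) \right) - \frac{1}{2}\, s(e, x, z) f_0^{1/2}(e, x, z) \right]^2 d\nu(e, x, z) = 0,
\]
and
\[
\max\left( \int e^2 f_t(e, x, z)\, d\nu(e, x, z),\ E(\varepsilon^2 s^2) \right) < c.
\]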
Lemma 7.2 implies that the sub-models considered are differentiable in quadratic mean with a score of the form $s - r\varphi$. It follows that the tangent set considered here is $S_0 - r\Phi$ (where $S_0$ corresponds to $S_\varphi$ with $\varphi = 0$ and $r$ is defined in Lemma 7.1). The restriction $E(\varepsilon|Z) = 0$ imposes that:
\[
\int q e f_t\, d\nu = 0 \quad \text{for any } q \in L^\infty(Z) \text{ and } t \text{ close to } 0.
\]
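On the other hand,
\[
-E(q(Z)\varepsilon s(\varepsilon, X, Z)) = \int q e \left[ \frac{1}{t}(f_t - f_0) - s f_0 \right] d\nu
\]
\[
= \int q e \left[ \frac{1}{t}\left( f_t^{1/2} - f_0^{1/2} \right) - \frac{1}{2} s f_0^{1/2} \right] \left( f_t^{1/2} + f_0^{1/2} \right) d\nu + \frac{1}{2} \int q e s f_0^{1/2} \left( f_t^{1/2} - f_0^{1/2} \right) d\nu(e, x, z).
\]
The Cauchy-Schwarz inequality ensures that:
\[
|E(q\varepsilon s)| \le \left( \int q^2 e^2 \left( f_t^{1/2} + f_0^{1/2} \right)^2 d\nu \right)^{1/2} \left( \int \left[ \frac{1}{t}\left( f_t^{1/2} - f_0^{1/2} \right) - \frac{1}{2} s f_0^{1/2} \right]^2 d\nu \right)^{1/2}
\]
\[
+ \left( \int q^2 e^2 s^2 f_0\, d\nu \right)^{1/2} \left( \int \left( f_t^{1/2} - f_0^{1/2} \right)^2 d\nu \right)^{1/2}.
\]
The triangle inequality ensures that:
\[
\left( \int q^2 e^2 \left( f_t^{1/2} + f_0^{1/2} \right)^2 d\nu \right)^{1/2} \le 2 \sqrt{\max(c, v)}\, \|q\|_\infty,
\]
and moreover $\left( \int q^2 e^2 s^2 f_0\, d\nu(e, x, z) \right)^{1/2} \le \|q\|_\infty \sqrt{c}$. So differentiability in quadratic mean ensures that the right-hand side of the previous inequality tends to 0, and hence
\[
E(q\varepsilon s) = 0.
\]
Then for any $s \in S_0$ we have $E(\varepsilon s|Z) = 0$.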
Lemma 7.3 ensures that we can restrict without loss of generality the tangent set to $S_0^b - r\Phi$ (because the efficient influence function is the projection of any influence function on $S_0 - r\Phi$, and then on $S_0^b - r\Phi$ by density of $S_0^b$ in $S_0$).
We also have $\int q(z)(e - t\varphi(x)) f_0(e - t\varphi(x), x, z)\, d\nu(e, x, z) = 0$ for any $q \in L^\infty(Z)$, $\varphi \in \Phi$, and $t$ close to 0. Then,
\[
\int q(z) e\, \frac{1}{t}\left( f_0(e - t\varphi(x), x, z) - f_0(e, x, z) \right) d\nu(e, x, z) = \int q(z)\varphi(x) f_0(e - t\varphi(x), x, z)\, d\nu(e, x, z).
\]
And translation invariance of the Lebesgue measure ensures that
\[
\int q(z)\varphi(x) f_0(e - t\varphi(x), x, z)\, d\nu(e, x, z) = \int q(z)\varphi(x) f_0(e, x, z)\, d\nu(e, x, z).
\]
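It follows that:
\[
|E(q\varphi) - E(q\varepsilon r\varphi)| = \left| \int q e \left[ \frac{1}{t}\left( f(e - t\varphi(x), x, z) - f(e, x, z) \right) - \varphi(x) r(e, x, z) f(e, x, z) \right] d\nu(e, x, z) \right|
\]
\[
\le \left( \int q(z)^2 e^2 \left( f^{1/2}(e - t\varphi(x), x, z) + f^{1/2}(e, x, z) \right)^2 d\nu(e, x, z) \right)^{1/2} \left( \int \left[ \frac{1}{t}\left( f^{1/2}(e - t\varphi(x), x, z) - f^{1/2}(e, x, z) \right) - \frac{\varphi(x)}{2}\, r(e, x, z) f^{1/2}(e, x, z) \right]^2 d\nu(e, x, z) \right)^{1/2}
\]
\[
+ \left( \int q(z)^2 e^2 \varphi(x)^2 r^2(e, x, z) f(e, x, z)\, d\nu(e, x, z) \right)^{1/2} \left( \int \left( f^{1/2}(e - t\varphi(x), x, z) - f^{1/2}(e, x, z) \right)^2 d\nu(e, x, z) \right)^{1/2}.
\]
Lemma 7.1 ensures the differentiability in quadratic mean of $t \mapsto f_0^{1/2}(e - t\varphi(x), x, z)$, and we conclude that for any $\varphi \in \Phi$, $E(\varepsilon r\varphi|Z) = E(\varphi|Z)$.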
Differentiability of θ with respect to the tangent set
For any $s \in S_0^b$ and $\varphi \in \Phi$, the score $s - r\varphi$ is generated for instance by the parametric sub-model $f_t(e, x, z) = f_0(e - t\varphi(x), x, z)[1 + t s(e, x, z)]$ for $|t| \le 1/\|s\|_\infty$. This sub-model generates a parametric model for the marginal density of $X$: $f_{X,t} = \int f_t(e, x, z)\, d\lambda(e)\, d\rho(z)$, where $\rho$ is the pushforward measure of $\mu$ under $\pi_Z(x, z) = z$. The projected model $f_{X,t}$ is also differentiable in quadratic mean (cf. Proposition 5 of Appendix A.5 of Bickel et al. (1993)). A direct calculation ensures that $f_{X,t}(x) = f_X(x)(1 + t b(x))$, where $b$ is the score of $t \mapsto f_{X,t}$ defined by $b(X) = E(s - r\varphi|X) = E(s|X)$ (remember that $E(r|X) = 0$). Let $\psi_t(x) = h(f_{X,t}(x), x)$ with $f_{X,t}(x) = f_X(x)(1 + t b(x))$; its derivative at $t = 0$ is $\partial_t \psi_t(x)|_{t=0} = \partial_{f_X}\psi(x) b(x) f_X(x)$. Then for any $b \in L^\infty(X)$, we have:
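\[
\frac{\theta(t) - \theta}{t} = \frac{1}{t} \int (\psi_t(x) - \psi(x)) m^{(j)}(x) f_{X,t}(x)\, d\rho(x) + \frac{1}{t} \int \psi(x) m^{(j)}(x) (f_{X,t}(x) - f_X(x))\, d\rho(x) + \int \psi_t(x) \varphi^{(j)}(x) f_{X,t}(x)\, d\rho(x).
\]
The term $\int \psi_t(x) \varphi^{(j)}(x) f_{X,t}(x)\, d\rho(x)$ is $\int \psi_t(x) \varphi^{(j)}(x) f_X(x)\, d\rho(x) + t \int \psi_t(x) b(x) \varphi^{(j)}(x) f_X(x)\, d\rho(x)$, and then the dominated convergence Theorem ensures that
\[
\lim_{t \to 0} \int \psi_t(x) \varphi^{(j)}(x) f_X(x)\, d\rho(x) = \int \psi(x) \varphi^{(j)}(x) f_X(x)\, d\rho(x) = E(\psi \varphi^{(j)}).
\]
The fact that $\psi m^{(j)} \in L^2(X)$ ensures that
\[
\lim_{t \to 0} \frac{1}{t} \int \psi(x) m^{(j)}(x) (f_{X,t}(x) - f_X(x))\, d\rho(x) = E(\psi m^{(j)} b) = E(\psi m^{(j)} (s - r\varphi)).
\]
And last, Assumption 2.(iv) ensures that
\[
\lim_{t \to 0} \frac{1}{t} \int (\psi_t(x) - \psi(x)) m^{(j)}(x) f_{X,t}(x)\, d\rho(x) = E(\partial_{f_X}\psi f_X b m^{(j)}) = E(\partial_{f_X}\psi f_X (s - r\varphi) m^{(j)}).
\]
So:
\[
\lim_{t \to 0} \frac{\theta(t) - \theta}{t} = E\left( [\partial_{f_X}\psi f_X + \psi]\, m^{(j)} (s - r\varphi) + \psi \varphi^{(j)} \right).
\]
If there exists a regular $\sqrt{n}$-consistent estimator $\hat{\theta}$ of $\theta$, then Theorem 2.1 of van der Vaart (1991) ensures that:
1. $\theta(t)$ is differentiable with respect to the tangent set, i.e. there exists $f \in L^2(\varepsilon, X, Z)$ such that $\lim_t \frac{\theta(t) - \theta}{t} = E(f(s - r\varphi))$ for any $s \in S_0$ and $\varphi \in \Phi$,
2. and the asymptotic distribution of $\hat{\theta}$ is the convolution of a distribution with a normal $N(0, V^*)$, with $V^* = V(\Pi_{S_0 - r\Phi}(f))$, where $\Pi_{S_0 - r\Phi}$ is the orthogonal projection on $S_0 - r\Phi$.
Note that in the previous claim, $f$ is not necessarily unique, but $\Pi_{S_0 - r\Phi}(f)$ does not depend on the choice of $f$.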
Deriving the conditions and the form of the semi-parametric efficiency bound. At this stage, we have shown that a necessary condition for the existence of a regular $\sqrt{N}$-consistent estimator of $\theta$ is the existence of an influence function $f$. If we consider the parametric sub-models with scores in $S_0^b$, the previous derivation of $\theta(t)$ implies that
\[
f \in [\partial_{f_X}\psi f_X + \psi]\, m^{(j)} + S_0^\perp.
\]
Usual algebra about projections on convex sets in Hilbert spaces ensures that $S_0^\perp = \{q(Z)\varepsilon + c : q \in L^2(Z), c \in \mathbb{R}\}$.
Now if we consider parametric sub-models with scores in $S_\varphi^b = S_0^b - r\varphi$, the existence of a $\sqrt{n}$-consistent and regular estimator of $\theta$ implies that there exist $q \in L^2(Z)$ and $c \in \mathbb{R}$ such that:
\[
\forall \varphi \in \Phi : E(\psi \varphi^{(j)}) = E((q\varepsilon + c) r\varphi).
\]
But we know that $E(\varepsilon r\varphi|Z) = E(\varphi|Z)$ and $E(r|X, Z) = 0$, so we get:
\[
\forall \varphi \in \Phi : E(\psi \varphi^{(j)}) = E(T^*(q)\varphi).
\]
Then, up to $(-1)^{|j|}$, $T^*(q) f_X$ is the weak derivative of order $(j)$ of $\psi f_X$ (in the sense of the linear forms on $\Phi$, cf. Schwartz (1997), Chapters II and III). It follows that Assumption 3 is necessary for the existence of a regular and root-N consistent estimator.
So, the expression of the semi-parametric efficiency bound is:
\[
V^* = \min_{q \in Q} V\left( \varepsilon q(Z) + (\psi(X) + \partial_{f_X}\psi(X) f_X(X)) m^{(j)}(X) \right).
\]
Because $Q$ is a closed convex subset of $L^2(Z)$ and the objective function is a quadratic form of $q$ that tends to $+\infty$ when $\|q\|_{L^2(Z)} \to +\infty$, this minimisation admits a unique solution $q^*$.
Representation of θ under Assumption 2.(ix). If we have $rm \in L^2(\varepsilon, X, Z)$, then the parametric sub-model defined by $Y = (1 + t)m(X) + \varepsilon$, with $E(\varepsilon|Z) = 0$, $(\varepsilon, X, Z) \sim f_t(e, x, z)$ and $f_t^{1/2}$ differentiable in quadratic mean, is regular. Following the same reasoning as previously, we know that the existence of a regular estimator of $\theta$ implies that $\theta = E(\psi(X) m^{(j)}(X)) = E(q(Z) m(X))$ for any $q \in Q$. So, the parameter can be expressed as the expectation of $q^*(Z) Y$.
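The moment representation $\theta_0 = E(q(Z)Y)$ can be illustrated by simulation. The Python sketch below is a hedged example in a design chosen only for illustration: an additive first stage $X = Z + \eta$ with $Z \sim N(0, 1)$, so that $-f_Z'(Z)/f_Z(Z) = Z$, a hypothetical structural function $m(x) = x^3$, and an error $\varepsilon$ correlated with $\eta$ but mean independent of $Z$. In this design $E(m'(X)) = 3(1 + \sigma_\eta^2)$, and the sample mean of $ZY$ should be close to that value. The sample size, seed and coefficients are assumptions.

```python
import random


def simulate(n: int = 200_000, seed: int = 0, sd_eta: float = 0.5):
    """Return (sample mean of q(Z) Y, sample mean of m'(X)) in a Gaussian additive design."""
    rng = random.Random(seed)
    sum_qy = 0.0
    sum_deriv = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        eta = rng.gauss(0.0, sd_eta)
        x = z + eta                              # additive first stage with p(Z) = Z
        eps = 2.0 * eta + rng.gauss(0.0, 0.5)    # endogenous error with E(eps | Z) = 0
        y = x ** 3 + eps                         # hypothetical m(x) = x^3
        sum_qy += z * y                          # q(Z) = -f_Z'(Z) / f_Z(Z) = Z for Z ~ N(0, 1)
        sum_deriv += 3.0 * x * x                 # m'(X), infeasible in practice because m is unknown
    return sum_qy / n, sum_deriv / n
```

Both averages estimate $3(1 + 0.25) = 3.75$ with the default settings; the first uses only observables $(Y, Z)$, which is the point of the representation, while the second is shown only as a benchmark.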
7.4 Proof of Proposition 4.2
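In Theorem 4.1, the bound is the variance of the sum of a function $R(X)$ and an element of $\varepsilon L^2(Z)$. A projection of $R(X)$ on $\varepsilon L^2(Z)$ and the Pythagorean Theorem then ensure that:
\[
V^* = \min_{q \in Q} E\left[ (\varepsilon q(Z) + R(X))^2 \right]
\]
\[
= \min_{q \in Q} E\left[ \varepsilon^2 \left( q(Z) + \frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)} \right)^2 \right] + E\left[ \left( R(X) - \varepsilon \frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)} \right)^2 \right]
\]
\[
= \min_{q \in Q} E\left[ E(\varepsilon^2|Z) \left( q(Z) + \frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)} \right)^2 \right] + E\left[ \left( R(X) - \varepsilon \frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)} \right)^2 \right].
\]
The second term does not depend on $q$, so the minimization on $Q$ only concerns the first term. This first term can be expressed as a function of a generalized inverse of an operator $K^*$.
When $q$ varies in $Q$, $s = E(\varepsilon^2|Z)^{1/2} \left( q(Z) + \frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)} \right)$ varies in $\mathcal{T}$ defined by:
\[
\mathcal{T} = \left\{ s \in L^2(Z) : K^*(s) = (-1)^{|j|} \frac{[\psi f_X]^{(j)}}{f_X} + K^*\left( \frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}} \right) \right\}.
\]
And next:
\[
\min_{q \in Q} E\left[ E(\varepsilon^2|Z) \left( q(Z) + \frac{E(\varepsilon R(X)|Z)}{E(\varepsilon^2|Z)} \right)^2 \right] = \min_{s \in \mathcal{T}} E\left[ s^2(Z) \right].
\]
So we are led back to finding the element $s^*$ of minimum norm in $\mathcal{T}$, which is the preimage of $(-1)^{|j|} \frac{[\psi f_X]^{(j)}}{f_X} + K^*\left( \frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}} \right)$ under $K^*$. By definition, $s^*(Z) = \arg\min_{s \in \mathcal{T}} E(s^2(Z))$ is the generalized inverse $K^{*\dagger}\left( (-1)^{|j|} \frac{[\psi f_X]^{(j)}}{f_X} + K^*\left( \frac{E(\varepsilon R(X)|Z)}{\sqrt{E(\varepsilon^2|Z)}} \right) \right)$, which is well defined if and only if $(-1)^{|j|} \frac{[\psi f_X]^{(j)}}{f_X} \in R(K^*) = R(T^*)$. And then we get the formulas (i) and (ii) of Proposition 4.2.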
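Moreover, Corollary 3.2.2 in Groetsch (1977) ensures that:
\[ K^{*\dagger} = \lim_{t \downarrow 0} (tI + KK^*)^{-1} K. \]
Let
\[ h = (-1)^{|j|} \frac{[\psi f_X]^{(j)}}{f_X} + K^*\!\left( \frac{E(\varepsilon R(X) \mid Z)}{\sqrt{E(\varepsilon^2 \mid Z)}} \right). \]
Under Assumption 3 there exists $u$ such that $K^* u = h$,
and $K^\dagger u = \lim_{t \downarrow 0} (tI + K^*K)^{-1} K^* u$ is well defined if and only if $u \in R(K) + \ker K^*$ (see
Groetsch (1977), Chapter 3). Then
\begin{align*}
K^{*\dagger} h &= \lim_{t \downarrow 0} (tI + KK^*)^{-1} K K^* u \\
&= \lim_{t \downarrow 0} (tI + KK^*)^{-1} K (tI + K^*K)(tI + K^*K)^{-1} K^* u \\
&= \lim_{t \downarrow 0} K (tI + K^*K)^{-1} K^* u \\
&= K K^\dagger u \quad \text{if and only if } u \in R(K) + \ker K^*, \text{ or equivalently } h \in R(K^*K).
\end{align*}
The third line uses the identity $K(tI + K^*K) = (tI + KK^*)K$.
Last, because $K^\dagger u$ is the least-squares solution of the equation $Kx = u$, it is also the
least-squares solution of $K^*Kx = K^*u$, and then $K^\dagger u = (K^*K)^\dagger K^* u$. Formulas (iii)
and (iv) of Proposition 4.2 follow.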
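The limit formula quoted from Groetsch (1977) can be illustrated in finite dimension. The sketch below is only a numerical illustration under strong assumptions: the operator $K$ is replaced by a hypothetical rank-deficient $4 \times 3$ matrix, its adjoint by the transpose, and the limit $t \downarrow 0$ by a small fixed $t$.

```python
import numpy as np

# Hypothetical finite-dimensional stand-in for K: a 4x3 matrix of rank 2,
# so that K.T (playing the role of K^*) has no ordinary inverse.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
B = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
K = A @ B


def tikhonov_pinv_of_adjoint(K, t):
    """Return (t I + K K^T)^{-1} K, the regularised approximation of pinv(K^T)."""
    m = K.shape[0]
    return np.linalg.solve(t * np.eye(m) + K @ K.T, K)


approx = tikhonov_pinv_of_adjoint(K, t=1e-6)
exact = np.linalg.pinv(K.T)  # Moore-Penrose inverse of the adjoint
max_gap = float(np.max(np.abs(approx - exact)))
print(max_gap)
```

As $t$ decreases, the regularised matrix approaches the Moore-Penrose inverse of the transpose, which is the finite-dimensional counterpart of $K^{*\dagger}$.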
7.5 Proof of Proposition 5.2
The first part of the proposition is given by Theorem 1.2.4 of Groetsch (1977). The
second part of the proposition is given by Theorem 3.1.3 of Groetsch (1977) and the fact
that $T$ is normal if and only if $T^*$ is normal. For the third part of the proposition, assume
that $\mathrm{Ker}(T) = \{0\}$ and that $E(g(X) \mid Z) = \int f_{X|Z}(x) g(x)\, d\lambda(x)$, where $\lambda$ is the Lebesgue
measure on $\mathbb{R}^K$. For $g_n(x) = \cos(n(x_1 + x_2 + \dots + x_K))$, we have $\int f_{X|Z}(x) g_n(x)\, d\lambda(x) \to 0$
when $n$ tends to $+\infty$, because the Fourier transform of the distribution of $\sum_{i=1}^K X_i \mid Z$ tends to zero at
infinity. Moreover, the Cauchy-Schwarz inequality ensures that
\[ \left( \int f_{X|Z}(x) g_n(x)\, d\lambda(x) \right)^2 \le \int f_{X|Z}(x)\, d\lambda(x) \int f_{X|Z}(x) g_n^2(x)\, d\lambda(x) \le 1. \]
Then the dominated convergence theorem
ensures that $\|T(g_n)\|_{L^2(Z)}$ tends to zero when $n$ tends to $\infty$. On the other hand,
\[ \|g_n\|^2_{L^2(X)} = \int \cos^2(n(x_1 + \dots + x_K)) f_X(x)\, d\lambda(x) = \frac{1}{2} + \frac{1}{2} \int \cos(2n(x_1 + \dots + x_K)) f_X(x)\, d\lambda(x), \]
which tends to $1/2$ when $n$ tends to infinity. Then Theorem 1.2.3 in Groetsch (1977) ensures that $R(T)$
is not closed and next, by the first part of the proposition, $R(T^*)$ is not closed.
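The mechanism of the third part can be illustrated numerically in a simple case that is assumed here only for illustration: $K = 1$ and $(X, Z)$ standard bivariate normal with correlation `rho`, so that $X \mid Z = z \sim N(\rho z, 1 - \rho^2)$. The sketch compares a Riemann-sum evaluation of $T(g_n)(z) = E[\cos(nX) \mid Z = z]$ with its Gaussian closed form and reports $\|g_n\|^2$.

```python
import numpy as np


def cond_mean_cos(n, z, rho):
    """E[cos(n X) | Z = z] by a Riemann sum, with X | Z = z ~ N(rho z, 1 - rho^2)."""
    s2 = 1.0 - rho**2
    x = np.linspace(-12.0, 12.0, 240001)
    dx = x[1] - x[0]
    dens = np.exp(-((x - rho * z) ** 2) / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return float(np.sum(np.cos(n * x) * dens) * dx)


def cond_mean_cos_closed(n, z, rho):
    """Closed form: real part of the characteristic function of N(rho z, 1 - rho^2)."""
    return float(np.cos(n * rho * z) * np.exp(-(n**2) * (1.0 - rho**2) / 2.0))


def sq_norm_g(n):
    """||g_n||^2 = E[cos^2(n X)] for X ~ N(0, 1), i.e. 1/2 + exp(-2 n^2) / 2."""
    return float(0.5 + 0.5 * np.exp(-2.0 * n**2))


rho = 0.5  # illustrative value
for n in (1, 2, 4):
    print(n, cond_mean_cos_closed(n, 0.7, rho), sq_norm_g(n))
```

In this example $T(g_n)$ shrinks to zero pointwise while $\|g_n\|^2$ stays close to $1/2$, which is the behaviour used in the proof to rule out a closed range.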
7.6 Definition and properties of Fourier and Laplace transforms of linear forms
As explained in Section 5.2.2, the conditions given in Theorem 5.3 are sufficient to
ensure that $\varrho$ is the Laplace transform of a linear form. We briefly present the definition
of the Laplace transform of a linear form.
Let $\mathcal{S}$ denote the Schwartz space on $\mathbb{R}^p$: this is the set of complex-valued functions
that are infinitely differentiable and such that they and all their derivatives tend
to 0 faster than any inverse power of the norm (see Schwartz (1997), Chapter VII):
$\lim_{|x| \to \infty} \prod_{i=1}^p x_i^{k_i} \varphi^{(j)}(x) = 0$. The usual Fourier transform that maps $\varphi$ to
$\mathcal{F}_\varphi(t) = \int \exp(-2\pi i v' t) \varphi(v)\, dv$ is a bijection on $\mathcal{S}$. The topology of $\mathcal{S}$ is the topology of the following
convergence:
\[ \varphi_j \to 0 \text{ if and only if } \forall ((k), (l)) \in (\mathbb{N}^p)^2, \quad \sup_{x \in \mathbb{R}^p} \left| \prod_{i=1}^p x_i^{k_i} \varphi_j^{(l)}(x) \right| \to 0. \]
For any continuous linear form $T \in \mathcal{S}'$, $\varphi \in \mathcal{S} \mapsto T(\mathcal{F}_\varphi)$ is well defined because $\mathcal{F}_\varphi \in \mathcal{S}$.
This is a continuous linear form on $\mathcal{S}$, denoted $\mathcal{F}[T]$ and called the Fourier transform of $T$.
The Fourier transform of $T \in \mathcal{S}'$ is related to the usual Fourier transform on $\mathcal{S}$ through
the formula:
\[ \mathcal{F}[T](\varphi) = T(\mathcal{F}_\varphi). \]
For the definition of the Laplace transform of $T$, let $\check{\mathcal{F}}_\varphi$ be defined by $\check{\mathcal{F}}_\varphi(t) = \mathcal{F}_\varphi\big(\tfrac{t}{2\pi}\big)$; this also
defines a bijection on $\mathcal{S}$.
For any continuous linear form $T \in \mathcal{S}'$, $\varphi \in \mathcal{S} \mapsto T(\check{\mathcal{F}}_\varphi)$ is a well-defined linear form on $\mathcal{S}$,
denoted $\check{\mathcal{F}}[T]$. Moreover, this linear form is continuous: $\check{\mathcal{F}}[T] \in \mathcal{S}'$.
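For any $T \in \mathcal{S}'$, let $\Gamma_T$ be the convex set containing 0 defined by:
\[ u \in \Gamma_T \iff [\varphi \mapsto T(\exp(-u'\cdot)\varphi(\cdot))] \text{ is a well-defined element of } \mathcal{S}'. \]
For instance, consider the case where $T(\varphi) = \int f(x)\varphi(x)\, dx$. If $f$ is integrable with
compact support on $\mathbb{R}^p$ then $\Gamma_T = \mathbb{R}^p$; if $f(x) = \prod_{i=1}^p 1\{x_i > 0\}$ then $\Gamma_T = \mathbb{R}^p_+$; and if
$f(x) = 1$ then $\Gamma_T = \{0\}$.
For $u \in \Gamma_T$, $\exp(-u'\cdot)T$ denotes the element of $\mathcal{S}'$ defined by $[\exp(-u'\cdot)T](\varphi) = T(\tilde\varphi)$ with
$\tilde\varphi(x) = \exp(-u'x)\varphi(x)$. And $u \in \Gamma_T$ if and only if $\exp(-u'\cdot)T \in \mathcal{S}'$.
The Laplace transform of $T$ is defined from $\Gamma_T$ to $\mathcal{S}'$ by:
\[ \mathcal{L}[T] : u \in \Gamma_T \mapsto \Big( \varphi \mapsto \check{\mathcal{F}}[\exp(-u'\cdot)T](\varphi) \Big). \]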
For instance, consider the case where $T(\varphi) = \int f(x)\varphi(x)\, dx$ for $f$ a bounded function; then
$\mathcal{L}[T](u)(\varphi) = \int f(x) e^{-u'x} \left( \int e^{-iv'x} \varphi(v)\, dv \right) dx$. Then $\mathcal{L}[T]$ is related to the usual Laplace
transform $\mathcal{L}_f : (u + iv) \mapsto \int f(x) e^{-(u+iv)'x}\, dx$ via the following canonical mapping:
\[ \mathcal{L}[T](u)(\varphi) = \int \mathcal{L}_f(u + iv) \varphi(v)\, dv. \]
Similarly, if $T(\varphi) = \int \varphi(x)\, d\mu(x)$ for $\mu$ a signed measure, then $\mathcal{L}[T](u)(\varphi) = \int \mathcal{L}_\mu(u + iv)\varphi(v)\, dv$
with $\mathcal{L}_\mu(u + iv) = \int e^{-(u+iv)'x}\, d\mu(x)$. More generally, for any $T \in \mathcal{S}'$ there exists a
function $\mathcal{L}_T$ defined on $\Gamma_T + i\mathbb{R}^p$ such that $\mathcal{L}[T](u)(\varphi) = \int \mathcal{L}_T(u + iv)\varphi(v)\, dv$ (Schwartz,
1997, Chapter VIII), and such a function is strongly regular: for instance, if $\Gamma_T$ is an open
set, then $\mathcal{L}_T$ is holomorphic on $\Gamma_T + i\mathbb{R}^p$. Because there exists a canonical isomorphism between
$\mathcal{L}[T] \in (\mathcal{S}')^{\Gamma_T}$ and $\mathcal{L}_T \in \mathbb{C}^{\Gamma_T + i\mathbb{R}^p}$, we do not distinguish between these
two objects.
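As a concrete instance of the function $\mathcal{L}_f$, one can take $f$ to be the standard normal density (a choice made here purely for illustration), for which the classical Gaussian identity gives $\mathcal{L}_f(s) = \exp(s^2/2)$ at every complex $s = u + iv$. The sketch below checks this at one point with a Riemann sum; the grid bounds and step are arbitrary numerical choices.

```python
import numpy as np


def laplace_of_std_normal(s):
    """Riemann-sum approximation of L_f(s) = int f(x) exp(-s x) dx, f the N(0,1) density."""
    x = np.linspace(-15.0, 15.0, 300001)
    dx = x[1] - x[0]
    f = np.exp(-(x**2) / 2.0) / np.sqrt(2.0 * np.pi)
    return complex(np.sum(f * np.exp(-s * x)) * dx)


s = 0.3 + 1.2j                       # a point u + i v with u = 0.3 and v = 1.2
numeric = laplace_of_std_normal(s)
closed_form = complex(np.exp(s**2 / 2.0))
print(numeric, closed_form)
```

In this example the transform is finite for every real part $u$, so $\Gamma_T = \mathbb{R}$ and $\mathcal{L}_f$ is an entire function, in line with the holomorphy statement above.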
7.7 Proof of Theorem 5.3
Before proving the theorem, we first prove a lemma that is useful to establish (iii).
7.7.1 Lemma
Lemma 7.4 (Approximation of compact sets by polytopes)
Let $S$ be a convex set in $\mathbb{R}^p$ such that $\mathrm{int}(S) \neq \emptyset$ and $K$ a compact set included in $\mathrm{int}(S)$. Then there
exists a polytope $P$ such that $K \subset P \subset \mathrm{int}(S)$.
The case $p = 1$ is obvious because in that case $\mathrm{int}(S)$ is an open interval of $\mathbb{R}$, and we can
choose $P = [\inf(K); \sup(K)]$. Because $K$ is bounded, there exists a closed ball of radius $r$ and
center $x_0$ containing $K$, so we can assume without loss of generality that $S$ is bounded (if $S$ is
unbounded, replace $S$ by $S \cap B(x_0, r)$ in the following). Let $K' = \mathrm{co}(K)$ be the convex hull of
$K$; $K'$ is compact because it is the range of the continuous mapping $(\lambda, x) \in \Lambda \times K \mapsto \lambda \cdot x$,
where $\Lambda$ is the compact set $\{\lambda \in [0; 1]^p : \sum_i^p \lambda_i = 1\}$ (the Tychonoff theorem ensures that
$\Lambda \times K$ is compact in $\mathbb{R}^{2p}$). For $(x, y) \in (\mathbb{R}^p)^2$, $d_2(x, y) = \|x - y\|_2$ denotes the Euclidean distance
between $x$ and $y$, and for non-empty $A \subset \mathbb{R}^p$, $d_2(A, y) = \inf_{x \in A} d_2(x, y)$. For any $A \subset \mathbb{R}^p$,
$A^c$ denotes the complement of $A$ in $\mathbb{R}^p$. Let $d = \inf_{y \in S^c} d_2(y, K')$; compactness of
$K'$ ensures that $d > 0$ (if $d = 0$, there exist a sequence $x_n \in K'$ such that $d_2(x_n, S^c) < 1/n$
and a subsequence that converges to $x_\infty \in K' \cap \mathrm{cl}(S^c) = K' \cap \mathrm{int}(S)^c = \emptyset$, a contradiction). For any non-empty
subsets $A$ and $B$ of $\mathbb{R}^p$, let $d_H(A, B) = \sup\{\sup_{x \in A} d_2(B, x), \sup_{y \in B} d_2(A, y)\}$ be the Hausdorff
metric between $A$ and $B$. There exists a polytope $P$ such that $d_H(P, K') < d/2$ and $K' \subset P$
(Bronstein (2008)). For any non-empty convex $C \subset \mathbb{R}^p$ and any $x \in \mathbb{R}^p$, there exists a unique
$\pi_C(x) \in \mathrm{cl}(C)$ such that $d_2(x, \pi_C(x)) = d_2(x, C)$ (Hiriart-Urruty & Lemaréchal (2001),
Section A.3). For any $y \in S^c$:
\begin{align*}
d \le d_2(y, \pi_{K'}(y)) &\le d_2(y, \pi_{K'}(\pi_P(y))) \\
&\le d_2(y, \pi_P(y)) + d_2(\pi_P(y), \pi_{K'}(\pi_P(y))) \\
&\le d_2(y, \pi_P(y)) + \inf_{u \in K'} d_2(\pi_P(y), u) \\
&\le d_2(y, \pi_P(y)) + \sup_{v \in P} \inf_{u \in K'} d_2(v, u) \\
&\le d_2(y, \pi_P(y)) + d_H(P, K') \\
&\le d_2(y, \pi_P(y)) + d/2.
\end{align*}
It follows that $d_2(y, \pi_P(y)) \ge d/2$ for any $y \in S^c$ and then $\inf_{y \in S^c} \inf_{x \in P} d_2(x, y) > 0$. This
ensures that $P \cap \mathrm{cl}(S^c) = \emptyset$ and then $P \subset \mathrm{int}(S)$.
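A two-dimensional toy case makes the statement of Lemma 7.4 concrete. The setting below is an assumption made for illustration only: $K$ is the closed unit disc, $\mathrm{int}(S)$ is the open disc of radius `radius_S` $> 1$, and the candidate polytopes are the regular $n$-gons circumscribed about $K$, whose vertices lie at distance $1/\cos(\pi/n)$ from the origin.

```python
import math


def circumradius_of_circumscribed_ngon(n):
    """Vertex radius of the regular n-gon whose inscribed circle is the unit circle."""
    return 1.0 / math.cos(math.pi / n)


def smallest_ngon_inside(radius_S):
    """Smallest n such that the n-gon circumscribed about the unit disc K
    lies in the open disc int(S) of radius radius_S > 1."""
    n = 3
    while circumradius_of_circumscribed_ngon(n) >= radius_S:
        n += 1
    return n


def hausdorff_gap(n):
    """Hausdorff distance between that n-gon and the unit disc: farthest vertex minus 1."""
    return circumradius_of_circumscribed_ngon(n) - 1.0


print(smallest_ngon_inside(1.5), smallest_ngon_inside(1.1), hausdorff_gap(4))
```

The closer the boundary of $S$ is to $K$, the more faces the polytope needs, mirroring the role of the margin $d/2$ in the proof.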
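7.7.2 Theorem 5.3
For any $q \in Q$, let $\tilde q(\tau_W(\tilde Z), W) = E(q(\tilde Z, W) \mid \tau_W(\tilde Z), W)$. Under Assumption 4,
$\tilde X \perp\!\!\!\perp \tilde Z \mid \tau_W(\tilde Z), W$, so for any $q \in Q$, $T^*(q) = T^*(\tilde q)$.
Moreover, under Assumption 4 the Bayes formula implies:
\[ E\big(\tilde q(\tau_W(\tilde Z), W) \mid \tilde X = \tilde x, W = w\big) = \frac{\int \tilde q(t, w)\, g_w(t) \exp(-\mu_w(\tilde x)' t)\, dF_{\tau_W(\tilde Z)|W=w}(t)}{\nu_w(\mu_w(\tilde x))}. \]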
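Assumption 3 ensures that $\forall \varphi \in \Phi$:
\begin{align*}
\int \psi(x) \varphi^{(j)}(x)\, dF_X(x) &= \int E(q(Z) \mid X = x)\, \varphi(x)\, dF_X(x) \\
&= \int \frac{\int \tilde q(t, w)\, g_w(t) \exp(-\mu_w(\tilde x)' t)\, dF_{\tau_W(\tilde Z)|W=w}(t)}{\nu_w(\mu_w(\tilde x))}\, \varphi(\tilde x, w)\, dF_X(\tilde x, w) \\
&= \int r_w(\mu_w(\tilde x))\, \varphi(\tilde x, w)\, dF_X(\tilde x, w).
\end{align*}
Then if $Q \neq \emptyset$:
\[ (-1)^{|j|} \frac{[\psi(X) f_X(X)]^{(j)}}{f_X(X)} = r_W(\mu_W(\tilde X)) = \frac{\int E(q(Z) \mid \tau_W(\tilde Z) = t, W)\, g_W(t) \exp(-\mu_W(\tilde X)' t)\, dF_{\tau_W(\tilde Z)|W}(t)}{\nu_W(\mu_W(\tilde X))}. \]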
And (i) holds because $r_W(\mu_W(\tilde X)) = E(q(Z) \mid X)$, which is integrable with respect to $F_{\tilde X|W}$.
Remark that for any $v \in \mathbb{R}^{p_W}$, $\varrho_W(\mu_W(\tilde X) + iv)$ is (almost surely) well defined and finite because
\[ \int \big| \tilde q(t, W)\, g_W(t) \exp(-(\mu_W(\tilde X) + iv)' t) \big|\, dF_{\tau_W(\tilde Z)|W}(t) \le \nu_W(\mu_W(\tilde X))\, E\big(|q(Z)| \mid \mu_W(\tilde X), W\big). \]
By definition, $\varrho_W$ is the Laplace transform of the measure $M_W$ defined by $dM_W(t) = \tilde q(t, W)\, g_W(t)\, dF_{\tau_W(\tilde Z)|W}(t)$.
Domains of Laplace transforms are always of the form $\Gamma + i\mathbb{R}^p$
where $\Gamma$ is convex (see for instance Schwartz (1997), Chapter VIII). It follows that $\varrho_W$
is defined on $C_W + i\mathbb{R}^{p_W}$, where $C_W$ is a convex set such that $P(\mu_W(\tilde X) \in C_W \mid W) = 1$;
then $\mathrm{Supp}(\mu_W(\tilde X) \mid W) \subset \mathrm{cl}(C_W)$ and $\mathrm{int}\, \mathrm{co}\, \mathrm{Supp}(\mu_W(\tilde X) \mid W) \subset \mathrm{int}\, \mathrm{cl}(C_W)$. It follows that
$\Gamma_W \subset \mathrm{int}(C_W)$ (a convex set in $\mathbb{R}^d$ has the same interior as its closure, see for instance
Hiriart-Urruty & Lemaréchal, 2001, pages 36 and 37). Moreover, Proposition 6 in Chapter
VIII of Schwartz (1997) ensures that $\varrho_W$ is holomorphic on $\mathrm{int}\, C_W + i\mathbb{R}^{p_W}$ and then on
$\Gamma_W + i\mathbb{R}^{p_W}$, and Condition (ii) holds.
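For any compact $K_W$ included in $\Gamma_W$, Lemma 7.4 ensures that there exist $(u_1, \dots, u_l)$ (depending on $W$) such that
$K_W \subset \mathrm{co}(u_1, \dots, u_l) \subset \Gamma_W$. Then for every $u + iv \in K_W + i\mathbb{R}^p$, writing $u = \sum_{i=1}^l \lambda_i u_i$
with $\lambda_i \ge 0$ and $\sum_{i=1}^l \lambda_i = 1$, the convexity of the exponential gives:
\begin{align*}
|\varrho_W(u + iv)| &= \left| \int \tilde q(t, W)\, g_W(t) \exp(-(u + iv)' t)\, dF_{\tau_W(\tilde Z)|W}(t) \right| \\
&\le \int |\tilde q(t, W)|\, g_W(t) \exp(-u' t)\, dF_{\tau_W(\tilde Z)|W}(t) \\
&\le \int |\tilde q(t, W)|\, g_W(t) \sum_{i=1}^l \lambda_i \exp(-u_i' t)\, dF_{\tau_W(\tilde Z)|W}(t) \\
&\le \sum_{i=1}^l \int |\tilde q(t, W)|\, g_W(t) \exp(-u_i' t)\, dF_{\tau_W(\tilde Z)|W}(t) \\
&\le \sum_{i=1}^l E\big(|q(Z)| \mid \mu_W(\tilde X) = u_i, W\big)\, \nu_W(u_i) < \infty.
\end{align*}
Then (iii) holds.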
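The step from $\exp(-u't)$ to the weighted sum over the polytope vertices is Jensen's inequality for the convex function $\exp$. A quick numerical check is sketched below; the vertices, the weights and the evaluation points are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vertices u_1, ..., u_l of a polytope in R^2 and convex weights.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.5]])
weights = np.array([0.1, 0.4, 0.3, 0.2])       # lambda_i >= 0, summing to 1
u = weights @ vertices                          # u = sum_i lambda_i u_i

t = rng.normal(size=(1000, 2))                  # arbitrary evaluation points t
lhs = np.exp(-(t @ u))                          # exp(-u't)
rhs = np.exp(-(t @ vertices.T)) @ weights       # sum_i lambda_i exp(-u_i't)
jensen_holds = bool(np.all(lhs <= rhs * (1.0 + 1e-12)))
print(jensen_holds)
```

Since each $\lambda_i \le 1$, the weighted sum is in turn bounded by the unweighted sum over the vertices, which gives the fourth line of the chain of inequalities.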
Last, if $\tau_W(\tilde Z) \mid W$ is dominated by the Lebesgue measure, then there exists $f(t, w) \ge 0$ such
that $E(h(\tau_W(\tilde Z)) \mid W = w) = \int h(t) f(t, w)\, dt$. Let $\xi_{u,w}(t) = \tilde q(t, w)\, g_w(t)\, f(t, w) \exp(-u' t)$. For any
$v \in \mathbb{R}^{p_W}$, $\tilde v$ denotes the vector of $\mathbb{R}^p$ with components $\tilde v_k = 0$ if $|v_k| < \max_{k'=1,\dots,p_W} |v_{k'}|$
and $\tilde v_k = \pi / v_k$ if $|v_k| = \max_{k'=1,\dots,p_W} |v_{k'}|$. Then
\[ \varrho_W(u + iv) = \int_{\mathbb{R}^p} \xi_{u,W}(t) \exp(-iv' t)\, dt = -\int_{\mathbb{R}^p} \xi_{u,W}(t) \exp(-iv'(t - \tilde v))\, dt = -\int_{\mathbb{R}^p} \xi_{u,W}(t + \tilde v) \exp(-iv' t)\, dt. \]
It follows that:
\[ 2\, |\varrho_W(u + iv)| \le \int \big| \xi_{u,W}(t) - \xi_{u,W}(t + \tilde v) \big|\, dt. \]
Continuity of translations in $L^1(\mathbb{R}^p)$ (cf. Rudin (1987), Theorem 9.5) ensures that $\varrho_W(u + iv)$
tends to 0 when $\max_{k'=1,\dots,p_W} |v_{k'}|$ tends to $\infty$, or equivalently when $v$ tends to $\infty$. And
Condition (iv) holds.
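The translation bound can be checked numerically in dimension one. In the sketch below the integrable function is taken to be the standard normal density, an assumption made only for illustration (it is not the paper's $\xi_{u,W}$), so that the transform has modulus $\exp(-v^2/2)$ and the shift is $\pi / v$.

```python
import numpy as np

t = np.linspace(-20.0, 20.0, 400001)
dt = t[1] - t[0]


def xi(s):
    """Illustrative integrable function: the N(0,1) density."""
    return np.exp(-(s**2) / 2.0) / np.sqrt(2.0 * np.pi)


def transform_modulus(v):
    """|int xi(t) exp(-i v t) dt| by a Riemann sum."""
    return float(abs(np.sum(xi(t) * np.exp(-1j * v * t)) * dt))


def translation_bound(v):
    """(1/2) int |xi(t) - xi(t + pi / v)| dt, the bound obtained from the shift by pi / v."""
    return 0.5 * float(np.sum(np.abs(xi(t) - xi(t + np.pi / v))) * dt)


checks = [(v, transform_modulus(v), translation_bound(v)) for v in (1.0, 5.0, 50.0)]
for row in checks:
    print(row)
```

The bound shrinks as $v$ grows because the shift $\pi / v$ goes to zero, which is exactly the continuity-of-translations argument.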
To derive the form of the semi-parametric efficiency bound, note that the Laplace transform
is injective. It follows that
\[ \tilde q(t, W) = \frac{1}{g_W(t)}\, \frac{d\mathcal{L}^{-1}[\varrho_W]}{dF_{\tau_W(\tilde Z)|W}}(t) \]
does not depend on the choice of $q \in Q$. Then if $Q$ is not empty, we have:
\[ Q = \tilde q(\tau_W(\tilde Z), W) + \big\{ \eta \in L^2(Z) \text{ such that } E(\eta(Z) \mid \tau_W(\tilde Z), W) = 0 \big\}. \]
Let $q^*(Z) = \tilde q(\tau_W(\tilde Z), W) + \eta^*(Z)$ be the element of $Q$ that achieves the bound $V^*$. For any
$\eta \in L^2(Z)$ such that $E(\eta(Z) \mid \tau_W(\tilde Z), W) = 0$ and any $\delta \in \mathbb{R}$:
\[ V(\varepsilon q^*(Z) + R(X)) \le V(\varepsilon (q^*(Z) + \delta \eta(Z)) + R(X)), \]
or equivalently, after an expansion:
\[ \delta^2 V(\varepsilon \eta(Z)) + 2\delta\, \mathrm{Cov}(\varepsilon \eta(Z), \varepsilon q^*(Z) + R(X)) \ge 0. \]
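It follows that for any $\eta \in L^2(Z)$ such that $E(\eta(Z) \mid \tau_W(\tilde Z), W) = 0$:
\[ \mathrm{Cov}(\varepsilon \eta(Z), \varepsilon q^*(Z) + R(X)) = 0. \]
Because $E(\varepsilon \eta(Z)) = E(E(\varepsilon \mid Z) \eta(Z)) = 0$, for any $\eta \in L^2(Z)$ such that $E(\eta(Z) \mid \tau_W(\tilde Z), W) = 0$
we have:
\[ E\Big[ \big( E(\varepsilon^2 \mid Z)\, q^*(Z) + E(\varepsilon R(X) \mid Z) \big)\, \eta(Z) \Big] = 0. \]
Consider
\[ \eta(Z) = E(\varepsilon^2 \mid Z)\, q^*(Z) + E(\varepsilon R(X) \mid Z) - E\big( E(\varepsilon^2 \mid Z)\, q^*(Z) \mid \tau_W(\tilde Z), W \big) - E\big( \varepsilon R(X) \mid \tau_W(\tilde Z), W \big) \]
to obtain that:
\[ E\Big[ V\big( E(\varepsilon^2 \mid Z)\, q^*(Z) + E(\varepsilon R(X) \mid Z) \mid \tau_W(\tilde Z), W \big) \Big] = 0. \]
It follows that:
\[ E(\varepsilon^2 \mid Z)\, q^*(Z) + E(\varepsilon R(X) \mid Z) = E\big( \varepsilon^2 q^*(Z) \mid \tau_W(\tilde Z), W \big) + E\big( \varepsilon R(X) \mid \tau_W(\tilde Z), W \big). \]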
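Because $q^*(Z) = \tilde q(\tau_W(\tilde Z), W) + \eta^*(Z)$, there exists a function $h$ such that:
\[ \eta^*(Z) = -\tilde q(\tau_W(\tilde Z), W) - \frac{E(\varepsilon R(X) \mid Z)}{E(\varepsilon^2 \mid Z)} + \frac{h(\tau_W(\tilde Z), W)}{E(\varepsilon^2 \mid Z)}, \]
and then the condition $E(\eta^*(Z) \mid \tau_W(\tilde Z), W) = 0$ ensures:
\[ h(\tau_W(\tilde Z), W) = \frac{\tilde q(\tau_W(\tilde Z), W) + E\big[ E(\varepsilon R(X) \mid Z) / E(\varepsilon^2 \mid Z) \mid \tau_W(\tilde Z), W \big]}{E\big[ 1 / E(\varepsilon^2 \mid Z) \mid \tau_W(\tilde Z), W \big]}, \]
and next:
\[ q^*(Z) = \frac{\tilde q(\tau_W(\tilde Z), W)}{E(\varepsilon^2 \mid Z)\, E\big[ 1 / E(\varepsilon^2 \mid Z) \mid \tau_W(\tilde Z), W \big]} + \frac{E\big[ E(\varepsilon R(X) \mid Z) / E(\varepsilon^2 \mid Z) \mid \tau_W(\tilde Z), W \big]}{E(\varepsilon^2 \mid Z)\, E\big[ 1 / E(\varepsilon^2 \mid Z) \mid \tau_W(\tilde Z), W \big]} - \frac{E(\varepsilon R(X) \mid Z)}{E(\varepsilon^2 \mid Z)}, \]
and the form of the semiparametric efficiency bound follows.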
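The closed form of $q^*$ can be sanity-checked on a toy discrete example. The sketch below is only an illustration under assumptions that are not in the paper: $Z$ takes three values inside a single cell of $(\tau_W(\tilde Z), W)$ with probabilities `p`, `sigma2` stands for $E(\varepsilon^2 \mid Z)$, `b` for $E(\varepsilon R(X) \mid Z)$ and `q_tilde` for the value of $\tilde q$ on the cell; all numbers are hypothetical.

```python
import numpy as np


def q_star_in_cell(p, sigma2, b, q_tilde):
    """q* = h / sigma2 - b / sigma2 with h as in the closed form above, within one cell."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    a = b / sigma2                                   # E(eps R | Z) / E(eps^2 | Z)
    inv_mean = np.sum(p / sigma2)                    # E[1 / E(eps^2 | Z) | cell]
    h = (q_tilde + np.sum(p * a)) / inv_mean
    return h / sigma2 - a


def objective(q, p, sigma2, b):
    """E[ E(eps^2|Z) (q + E(eps R|Z) / E(eps^2|Z))^2 | cell ], the q-dependent part of the bound."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    return float(np.sum(p * sigma2 * (q + b / sigma2) ** 2))


p = np.array([0.2, 0.5, 0.3])
sigma2 = np.array([1.0, 2.0, 0.5])
b = np.array([0.3, -0.1, 0.2])
q_tilde = 1.5

q_opt = q_star_in_cell(p, sigma2, b, q_tilde)
eta = np.array([1.0, -1.0, 1.0])
eta = eta - np.sum(p * eta)                          # enforce E(eta | cell) = 0
print(q_opt, objective(q_opt, p, sigma2, b))
```

Any admissible perturbation `q_opt + delta * eta` keeps the conditional mean equal to `q_tilde` and increases the objective, which matches the first-order condition derived above.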
7.8 Proof of Theorem 5.4
If there exists a regular and root-N consistent estimator, then $\psi f_X$ admits a weak derivative of
order $(j)$ and there exists $q \in L^2(Z)$ such that $(-1)^{|j|} E(q(Z) \mid X = x) f_X(x) = [\psi f_X]^{(j)}(x)$.
Assumption 5 ensures that $Z \perp\!\!\!\perp X \mid p(Z)$, then
\[ T^*(q)(x) = E(q(Z) \mid X = x) = E(E(q(Z) \mid p(Z)) \mid X = x). \]
Let $\tilde q(v) = E(q(Z) \mid p(Z) = v)$; we have $\tilde q \in L^2(p(Z))$ and:
\begin{align*}
\int \psi(x) \varphi^{(j)}(x)\, dF_X(x) &= \int \varphi(x)\, T^*(q)(x)\, f_X(x)\, dx \\
&= \int \varphi(x)\, E(\tilde q(p(Z)) \mid X = x)\, f_X(x)\, dx \\
&= \int \varphi(x)\, \big[ (\tilde q f_{p(Z)}) \star f_\eta \big](x)\, dx.
\end{align*}
Then if there exists a regular and root-N consistent estimator, we have:
\[ [\psi f_X]^{(j)} = (-1)^{|j|}\, (\tilde q f_{p(Z)}) \star f_\eta. \]
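The Fourier transforms are well defined and next:
\[ \mathcal{F}_{[\psi f_X]^{(j)}} = (-1)^{|j|}\, \mathcal{F}_{\tilde q f_{p(Z)}}\, \mathcal{F}_{f_\eta}. \]
Because $\mathcal{F}_{f_\eta}$ has isolated zeros, we get the "only if" part of Proposition (i) of Theorem
5.4. The "if" part follows from the fact that if such a $\tilde q$ exists then $\tilde q \in Q$.
Because $q \in L^1(Z)$, $\tilde q f_{p(Z)}$ is integrable with respect to the Lebesgue measure and next
$\mathcal{F}_{\tilde q f_{p(Z)}} \in C_0^0(\mathbb{R}^K)$ by the Riemann-Lebesgue lemma, and (ii) holds.
Moreover, when $\mathcal{F}_{\tilde q f_{p(Z)}} \in L^1(\mathbb{R}^K) \cap C_0^0(\mathbb{R}^K)$, the usual inversion formula applies, i.e. $\mathcal{F}^{-1} = \overline{\mathcal{F}}$.
Then $Q$ is not empty if and only if $(-1)^{|j|}\, \overline{\mathcal{F}}\big( \mathcal{F}_{[\psi f_X]^{(j)}} / \mathcal{F}_{f_\eta} \big) \in L^1(\mathbb{R}^K)$, which
is equivalent to $\mathcal{F}\big( \mathcal{F}_{[\psi f_X]^{(j)}} / \mathcal{F}_{f_\eta} \big) \in L^1(\mathbb{R}^K)$. And next, the full characterization of $Q$
follows. Then (iii) holds.
If $p(Z)$ has compact support then $\varphi \mapsto \int \tilde q(v) f_{p(Z)}(v) \varphi(v)\, dv$ is a compactly
supported linear form and then its Laplace transform verifies conditions (iv).(a), (iv).(b) and (iv).(c)
(cf. Schwartz (1997)). Moreover, the identity
\[ \mathcal{L}_{[\psi f_X]^{(j)}} = (-1)^{|j|}\, \mathcal{L}_{\tilde q f_{p(Z)}}\, \mathcal{L}_{f_\eta} \]
holds on $\Gamma_{f_\eta} + i\mathbb{R}^K = \mathbb{C}^K$ and next
\[ \frac{\mathcal{L}_{[\psi f_X]^{(j)}}}{\mathcal{L}_{f_\eta}} = (-1)^{|j|}\, \mathcal{L}_{\tilde q f_{p(Z)}} \quad \text{on } \mathbb{C}^K. \]
Last, the expression of $q^*$ is obtained by the same reasoning as in the proof
of Theorem 5.3.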
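The convolution identity at the heart of this proof, $E(\tilde q(p(Z)) \mid X = x) f_X(x) = [(\tilde q f_{p(Z)}) \star f_\eta](x)$, can be checked numerically in a Gaussian special case. Everything specific in the sketch is an assumption chosen for illustration: $p(Z) \sim N(0, 1)$, $\eta \sim N(0, s^2)$ independent, $X = p(Z) + \eta$ and $\tilde q(v) = v$, for which $E(p(Z) \mid X = x) = x / (1 + s^2)$.

```python
import numpy as np


def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return np.exp(-(x**2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def convolution_side(x, s2):
    """int q(v) f_p(v) f_eta(x - v) dv with q(v) = v, p(Z) ~ N(0, 1), eta ~ N(0, s2)."""
    v = np.linspace(-15.0, 15.0, 300001)
    dv = v[1] - v[0]
    return float(np.sum(v * normal_pdf(v, 1.0) * normal_pdf(x - v, s2)) * dv)


def conditional_side(x, s2):
    """E(p(Z) | X = x) f_X(x), using E(p(Z) | X = x) = x / (1 + s2) and X ~ N(0, 1 + s2)."""
    return float(x / (1.0 + s2) * normal_pdf(x, 1.0 + s2))


print(convolution_side(0.8, 0.5), conditional_side(0.8, 0.5))
```

The two numbers agree up to quadrature error, which is the additive-first-stage structure that turns the moment condition into a deconvolution problem.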
7.9 Proof of Proposition 5.5
It follows from Condition (i) of Theorem 5.4 that a regular and root-N consistent estimator exists only
if the weak derivative of order $(j)$ of $f_X$ is a (locally integrable) function $f_X^{(j)}$, and we have (cf. Schwartz
(1997), Equation VII.7.3, page 253): $\mathcal{F}_{f_X^{(j)}}(t) = (2\pi i)^{|j|}\, t^{(j)}\, \mathcal{F}_{f_X}(t)$ with $t^{(j)} = \prod_{k=1}^K t_k^{j_k}$. The
additive first stage ensures that $\mathcal{F}_{f_X^{(j)}}(t) = (2\pi i)^{|j|}\, t^{(j)}\, \mathcal{F}_{f_{p(Z)}}(t)\, \mathcal{F}_{f_\eta}(t)$. The first condition of
Theorem 5.4 ensures that there exists $q \in L^2(p(Z))$ such that:
\[ (2\pi i)^{|j|}\, t^{(j)}\, \mathcal{F}_{f_{p(Z)}}(t) = (-1)^{|j|}\, \mathcal{F}_{q f_{p(Z)}}(t). \]
Because $(2\pi i)^{|j|}\, t^{(j)}$ is the Fourier transform of the weak derivative of order $(j)$ of the Dirac at 0, it follows (cf.
Schwartz, 1997, Chapter VII, Theorem XV) that $(-1)^{|j|} q f_{p(Z)}$ is the weak derivative of
order $(j)$ of $f_{p(Z)}$. Then the weak derivative of $f_{p(Z)}$ is a function of the form $(-1)^{|j|} q f_{p(Z)}$
with $q \in L^2(p(Z))$, and next
\[ E\left[ \left( \frac{f_{p(Z)}^{(j)}(p(Z))}{f_{p(Z)}(p(Z))} \right)^2 \right] < \infty. \]
The expression of $q^*$ follows from
Theorem 5.4.
7.10 A Lemma on Fourier transform
Lemma 7.5 (Fourier transforms and derivatives)
Let $P$ be a probability measure on $\mathbb{R}$, and $H$ and $G$ two signed measures dominated by $P$ such
that:
$P$ is dominated by $H$,
$\frac{dH}{dP}$ is locally absolutely continuous, and
for any $x \in \mathbb{R}$: $2\pi i x\, \mathcal{F}[H](x) = \mathcal{F}[G](x)$,
then $P$ admits a locally absolutely continuous density with respect to the Lebesgue measure.
Proof:
As a signed measure, $H \in \mathcal{S}'$, and because the weak derivative of the Dirac at 0 has
compact support, $2\pi i x\, \mathcal{F}[H] = \mathcal{F}[D\delta \star H] = \mathcal{F}[DH]$, where $DH$ is the weak derivative
of $H$ and $D\delta$ is the weak derivative of the Dirac at 0. The injectivity of the Fourier
transform on $\mathcal{S}'$ ensures that the weak derivative of the measure $H$ is the measure
$G$. Theorem II, Chapter II in Schwartz (1997) ensures that there exists a function $\tilde h$ (with
bounded variation on every finite interval) such that $dH(t) = \tilde h(t)\, d\lambda(t)$, where $\lambda$ is the
Lebesgue measure on $\mathbb{R}$. Then $dP(t) = \tilde h(t) \frac{dP}{dH}(t)\, d\lambda(t)$ and next $dG(t) = \tilde g(t)\, d\lambda(t)$ with
$\tilde g = \tilde h(t) \frac{dP}{dH}(t) \frac{dG}{dP}(t)$ and $\tilde g \in L^1(\lambda)$. So $\tilde g$ is the weak derivative of $\tilde h$. Theorem
III in Chapter II of Schwartz (1997) ensures that $\tilde h$ is absolutely continuous. The product of
locally absolutely continuous functions is a locally absolutely continuous function. Hence
$\tilde h(t) \frac{dP}{dH}(t)$ is locally absolutely continuous; this is the density of $P$ with respect to the
Lebesgue measure.
8 Appendix B: Details on Table 1
8.1 Proof of the results about the normal bivariate case
We consider the case where:
\[ (X, Z) \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right). \]
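Following the notations of Assumption 4 and Theorem 4 (and dropping $W$, which does not
appear here), we have:
\[ \mu(x) = x, \qquad \tau(z) = -\frac{\rho z}{1 - \rho^2}, \]
\[ g(t) = \frac{1}{\sqrt{1 - \rho^2}} \exp\left( -\frac{t^2 (1 - \rho^2)}{2} \right), \qquad h(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2(1 - \rho^2)} \right). \]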
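Remember that
\[ \nu(x) = E\big( g(\tau(Z)) \exp(-x \tau(Z)) \big) = \int \frac{1}{\sqrt{2\pi (1 - \rho^2)}} \exp\left( \frac{x \rho z}{1 - \rho^2} - \frac{z^2}{2(1 - \rho^2)} \right) dz, \]
and next a bit of algebra gives:
\[ \nu(x) = \exp\left( \frac{x^2 \rho^2}{2(1 - \rho^2)} \right). \]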
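The "bit of algebra" behind the closed form of $\nu$ is a Gaussian moment generating function, and it can be confirmed by direct numerical integration. The sketch below evaluates the integral with a Riemann sum; the grid and the test values of `x` and `rho` are arbitrary choices made for illustration.

```python
import numpy as np


def nu_numeric(x, rho):
    """Riemann-sum value of the integral defining nu(x) in the bivariate normal case."""
    s2 = 1.0 - rho**2
    z = np.linspace(-30.0, 30.0, 600001)
    dz = z[1] - z[0]
    integrand = np.exp(x * rho * z / s2 - z**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return float(np.sum(integrand) * dz)


def nu_closed(x, rho):
    """Closed form exp(x^2 rho^2 / (2 (1 - rho^2)))."""
    return float(np.exp(x**2 * rho**2 / (2.0 * (1.0 - rho**2))))


print(nu_numeric(1.3, 0.6), nu_closed(1.3, 0.6))
```

One can also verify by hand that $h(x)\,\nu(x)$ is the standard normal density, as it must be for the marginal of $X$.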
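Then for a parameter $\theta_0 = E(\psi(X) m^{(j)}(X))$, we have to check that the following function
$\varrho(x)$ is a Laplace transform:
\[ \varrho(x) = (-1)^j \left[ \psi(x) \exp\left( -\frac{x^2}{2} \right) \right]^{(j)} \exp\left( \frac{x^2}{2(1 - \rho^2)} \right). \]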
1. Shift in mean: If $\theta_0 = E(\psi(X) m(X))$ with $\psi(x) = h\big( \tfrac{1}{\sqrt{2\pi}} \exp(-x^2/2), x \big)$ and
\[ h(y, x) = \frac{1}{y}\, \frac{1}{\sqrt{2\pi}}\, e^{-(x - m)^2 / 2}. \]
Note that choosing $h$ like that means that we are interested in the
average effect under a counterfactual distribution $N(m, 1)$ in a case where $f_X$ is
unknown and needs to be estimated. In that case, $R(X) = (\psi(X) + \partial_{f_X}\psi(X) \cdot
f_X(X)) m(X) - E\big( (\psi(X) + \partial_{f_X}\psi(X) \cdot f_X(X)) m(X) \big) = 0$. A less realistic problem
could also be considered, assuming that $f_X$ is known by the econometrician; in that case
$h$ would take the form $h(y, x) = \exp\big( -\tfrac{(x - m)^2 - x^2}{2} \big)$ and, even if the parameter is
the same, the expression of the semi-parametric efficiency bound would be different
because in that case $R(X) = \psi(X) m(X) - E(\psi(X) m(X))$.
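In that case,
\[ \varrho(x) = \exp\left( -\frac{(x - m)^2}{2} + \frac{x^2}{2(1 - \rho^2)} \right) = \exp\left( -\frac{m^2}{2} \right) \exp\left( \frac{x^2 \rho^2}{2(1 - \rho^2)} + xm \right). \]
For $\rho \neq 0$, the usual formula for the Laplace transform of a Gaussian function ensures that:
\[ \varrho(x) = \frac{\sqrt{1 - \rho^2}}{\sqrt{2\pi}\, |\rho|} \exp\left( -\frac{m^2}{2} \right) \mathcal{L}\left[ \exp\left( -\frac{(z + m)^2 (1 - \rho^2)}{2\rho^2} \right) \right](x), \]
and next, if $\mathcal{L}^{-1}[\varrho]$ denotes the measure that is the inverse Laplace transform of $\varrho$:
\[ d\mathcal{L}^{-1}[\varrho](t) = \frac{\sqrt{1 - \rho^2}}{\sqrt{2\pi}\, |\rho|} \exp\left( -\frac{m^2}{2} \right) \exp\left( -\frac{(t + m)^2 (1 - \rho^2)}{2\rho^2} \right) dt. \]
On the other side:
\[ g(t)\, dF_{\tau(Z)}(t) = \frac{\sqrt{1 - \rho^2}}{\sqrt{2\pi}\, |\rho|} \exp\left( -\frac{t^2 (1 - \rho^2)}{2\rho^2} \right) dt, \]
so, we obtain:
\[ \frac{1}{g(t)}\, \frac{d\mathcal{L}^{-1}[\varrho]}{dF_{\tau(Z)}}(t) = \exp\left( -\frac{m^2}{2\rho^2} \right) \exp\left( -\frac{t m (1 - \rho^2)}{\rho^2} \right). \]
Because $\tau$ is one to one, $E(\varepsilon^2 \mid Z) = E(\varepsilon^2 \mid \tau(Z))$ and $E(\varepsilon R(X) \mid Z) = E(\varepsilon R(X) \mid \tau(Z))$.
Evaluating the previous display at $t = \tau(Z) = -\rho Z / (1 - \rho^2)$, it follows that the semi-parametric efficiency bound of $\theta_0$ is:
\[ V(\varepsilon q^\star(Z)), \quad \text{with } q^\star(Z) = \exp\left( \frac{mZ}{\rho} - \frac{m^2}{2\rho^2} \right). \]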
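2. Shift in variance: If $\theta_0 = E(\psi(X) m(X))$ with $\psi(x) = h\big( \tfrac{1}{\sqrt{2\pi}} e^{-x^2/2}, x \big)$ and
\[ h(y, x) = \frac{1}{y}\, \frac{1}{\sqrt{2\pi \sigma^2}}\, e^{-x^2 / (2\sigma^2)}. \]
Note that choosing $h$ like that means that we are interested in
the average effect under a counterfactual distribution $N(0, \sigma^2)$ in a case where $f_X$
is unknown and needs to be estimated, and next $R(X) = 0$. We have
$\psi(x) = (\sigma^2)^{-1/2} \exp\big( -\tfrac{x^2}{2} \big( \tfrac{1}{\sigma^2} - 1 \big) \big)$, then
\[ \varrho(x) = (\sigma^2)^{-1/2} \exp\left( -\frac{x^2}{2} \left( \frac{1}{\sigma^2} - \frac{1}{1 - \rho^2} \right) \right). \]
Because $\Gamma = \mathbb{R}$, it follows that conditions (iii) and (iv) hold only if $\frac{1}{\sigma^2} - \frac{1}{1 - \rho^2} < 0$, i.e.
only if $\sigma^2 > 1 - \rho^2$. Reciprocally, if $\sigma^2 > 1 - \rho^2$, we have:
\[ \varrho(x) = \frac{\sqrt{1 - \rho^2}}{\sqrt{2\pi (\sigma^2 + \rho^2 - 1)}}\, \mathcal{L}\left[ \exp\left( -\frac{z^2 \sigma^2 (1 - \rho^2)}{2(\sigma^2 + \rho^2 - 1)} \right) \right](x), \]
and in this case, if $\mathcal{L}^{-1}[\varrho]$ denotes the measure that is the inverse Laplace transform of $\varrho$:
\[ d\mathcal{L}^{-1}[\varrho](t) = \frac{\sqrt{1 - \rho^2}}{\sqrt{2\pi (\sigma^2 + \rho^2 - 1)}} \exp\left( -\frac{t^2 \sigma^2 (1 - \rho^2)}{2(\sigma^2 + \rho^2 - 1)} \right) dt. \]
On the other side:
\[ g(t)\, dF_{\tau(Z)}(t) = \frac{\sqrt{1 - \rho^2}}{\sqrt{2\pi}\, |\rho|} \exp\left( -\frac{t^2 (1 - \rho^2)}{2\rho^2} \right) dt, \]
then:
\[ \frac{1}{g(t)}\, \frac{d\mathcal{L}^{-1}[\varrho]}{dF_{\tau(Z)}}(t) = \frac{|\rho|}{\sqrt{\sigma^2 + \rho^2 - 1}} \exp\left( \frac{t^2 (1 - \rho^2)^2 (\sigma^2 - 1)}{2\rho^2 (\sigma^2 + \rho^2 - 1)} \right). \]
It follows that the semi-parametric efficiency bound of $\theta_0$ is:
\[ V\left( \varepsilon \sqrt{\frac{\rho^2}{\sigma^2 + \rho^2 - 1}} \exp\left( \frac{Z^2 (\sigma^2 - 1)}{2(\sigma^2 + \rho^2 - 1)} \right) \right). \]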
3. Average marginal effect: If $\theta_0 = E(m'(X))$, then if $q \in Q$, it follows that $E[q(Z) \mid X] =
X$ and then $q(Z) = Z / \rho$, and the semi-parametric efficiency bound is:
\[ V\left( (Y - m(X)) \frac{Z}{\rho} + m'(X) \right). \]
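4. Average derivative weighted by power of density: Let $\theta_0 = E(\psi(X) m'(X))$ with
$\psi(x) = h\big( (2\pi)^{-1/2} \exp(-x^2/2), x \big)$ and $h(y, x) = y^k$. For $k = 0$ we get the average
marginal effect and for $k = 1$ we get the density-weighted average derivative. Here
\[ R(X) = \frac{k + 1}{(2\pi)^{k/2}} \exp\left( -\frac{k X^2}{2} \right) m'(X) - E\left( \frac{k + 1}{(2\pi)^{k/2}} \exp\left( -\frac{k X^2}{2} \right) m'(X) \right). \]
The function $\varrho$ is:
\[ \varrho(x) = (k + 1)\, x \exp\left( \frac{x^2}{2} \left( \frac{1}{1 - \rho^2} - (k + 1) \right) \right), \]
which is not the Laplace transform of a function if $\frac{1}{1 - \rho^2} - (k + 1) < 0$. Then the
parameter $\theta_0$ is root-N estimable only if $\rho^2 \ge \frac{k}{k + 1}$. For $\rho^2 > \frac{k}{k + 1}$, a bit of algebra
ensures that the semi-parametric efficiency bound is:
\[ (k + 1)^2\, V\left( \varepsilon Z \exp\left( Z^2\, \frac{1 - \rho^2}{(k + 1)\rho^2 - k} \right) + \frac{1}{\sqrt{2\pi}^{\,k}} \exp\left( -k X^2 / 2 \right) m'(X) \right). \]
And for $\rho^2 = \frac{k}{k + 1}$ and $k \ge 1$, we get:
\[ (k + 1)^2\, V\left( \varepsilon \frac{Z}{\rho} + \frac{1}{\sqrt{2\pi}^{\,k}} \exp\left( -k X^2 / 2 \right) m'(X) \right). \]
Last, if $k = 0$ and $\rho^2 = 0$, $Q$ is empty.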
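For the first three cases, the weights found above must reproduce the target function through the conditional expectation, that is $E[q(Z) \mid X = x] = (-1)^{|j|} [\psi f_X]^{(j)}(x) / f_X(x)$. The sketch below checks this numerically in the bivariate normal model, where $Z \mid X = x \sim N(\rho x, 1 - \rho^2)$. The numerical values of `rho`, `x`, `m` and `sigma2` are illustrative assumptions, and the variance-shift weight uses the factor $|\rho| / \sqrt{\sigma^2 + \rho^2 - 1}$ from the Radon-Nikodym ratio computed in case 2.

```python
import numpy as np


def cond_expect(q, x, rho):
    """E[q(Z) | X = x] by a Riemann sum, with Z | X = x ~ N(rho x, 1 - rho^2)."""
    s2 = 1.0 - rho**2
    z = np.linspace(-25.0, 25.0, 500001)
    dz = z[1] - z[0]
    dens = np.exp(-((z - rho * x) ** 2) / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return float(np.sum(q(z) * dens) * dz)


rho, x = 0.6, 0.8          # illustrative correlation and evaluation point
m, sigma2 = 0.5, 2.0       # counterfactual mean and variance (sigma2 > 1 - rho^2)

# 1. Shift in mean: the target is the density ratio exp(x m - m^2 / 2).
gap_mean = cond_expect(lambda z: np.exp(m * z / rho - m**2 / (2.0 * rho**2)), x, rho) \
    - np.exp(x * m - m**2 / 2.0)

# 2. Shift in variance: the target is sigma^{-1} exp(-(x^2 / 2)(1 / sigma2 - 1)).
c = abs(rho) / np.sqrt(sigma2 + rho**2 - 1.0)
a = (sigma2 - 1.0) / (2.0 * (sigma2 + rho**2 - 1.0))
gap_var = cond_expect(lambda z: c * np.exp(a * z**2), x, rho) \
    - sigma2**-0.5 * np.exp(-(x**2 / 2.0) * (1.0 / sigma2 - 1.0))

# 3. Average marginal effect: the target is x.
gap_ame = cond_expect(lambda z: z / rho, x, rho) - x

print(gap_mean, gap_var, gap_ame)
```

All three gaps are zero up to quadrature error, which confirms that each weight solves its moment equation at the chosen point.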
References
Ai, C. & Chen, X. (2003), ‘Efficient estimation of models with conditional moment restrictions containing unknown functions’, Econometrica 71, 1795–1843.
Ai, C. & Chen, X. (2007), ‘Estimation of possibly misspecified semiparametric conditional
moment restriction models with different conditioning variables’, Journal of Econometrics 141, 5–43.
Ai, C. & Chen, X. (2012), ‘The semiparametric efficiency bound for models of sequential
moment restrictions containing unknown functions’, Journal of Econometrics 170, 442–457.
Bickel, P. J., Klaassen, C. A., Ritov, Y. & Wellner, J. A. (1993), Efficient and Adaptive
Estimation for Semiparametric Models, Springer-Verlag New York. ISBN 0-387-98473-9.
Blundell, R., Chen, X. & Kristensen, D. (2007), ‘Semi-nonparametric iv estimation of
shape-invariant engel curves’, Econometrica 75, 1613–1669.
Breunig, C. & Johannes, J. (2016), ‘Adaptive estimation of functionals in nonparametric
instrumental regression’, Econometric Theory 32.
Bronstein, E. M. (2008), ‘Approximation of convex sets by polytopes’, Journal of Mathematical Sciences 153(6), 727–762.
Canay, I. A., Santos, A. & Shaikh, A. M. (2013), ‘On the testability of identification in
some nonparametric models with endogeneity’, Econometrica 81(6), 2535–2559.
Carrasco, M., Florens, J. & Renault, E. (2007), Linear inverse problems and structural econometrics: Estimation based on spectral decomposition and regularization, in
J. Heckman & E. Leamer, eds, ‘Handbook of Econometrics’, Vol. 6B, North-Holland,
pp. 5633–5751.
Carrasco, M., Florens, J. & Renault, E. (2014), Asymptotic normal inference in linear
inverse problems, in J. Racine, L. Su & A. Ullah, eds, ‘The Oxford Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics’, Oxford University Press, pp. 65–96.
Cattaneo, M. D., Crump, R. K. & Jansson, M. (2014), ‘Small bandwidth asymptotics for
density-weighted average derivatives’, Econometric Theory 30, 176–200.
Chamberlain, G. (1987), ‘Asymptotic efficiency in estimation with conditional moment
restrictions’, Journal of Econometrics 34, 305–334.
Chen, X. (2007), Large sample sieve estimation of semi-nonparametric models, in J. Heckman & E. Leamer, eds, ‘Handbook of Econometrics’, Vol. 6B, North-Holland, pp. 5549–5632.
Chen, X. & Pouzo, D. (2012), ‘Estimation of nonparametric conditional moment models
with possibly nonsmooth generalized residuals’, Econometrica 80(1), 277–321.
Chen, X. & Pouzo, D. (2015), ‘Sieve wald and qlr inferences on semi/nonparametric conditional moment models’, Econometrica 83(3), 1013–1079.
Chetverikov, D. & Wilhelm, D. (2015), ‘Nonparametric instrumental variable estimation
under monotonicity’, CEMMAP working paper (39).
Choi, S., Hall, W. J. & Schick, A. (1996), ‘Asymptotically uniformly most powerful tests
in parametric and semiparametric models’, The Annals of Statistics 24(2), 841–861.
Darolles, S., Fan, Y., Florens, J. & Renault, E. (2011), ‘Nonparametric instrumental regression’, Econometrica 79(5), 1541–1565.
De La Vallée Poussin, C.-J. (1915), ‘Sur l’intégrale de lebesgue’, Transactions of the American Mathematical Society 16(4), 435–501.
d’Haultfœuille, X. (2011), ‘On the completeness condition in nonparametric instrumental
problems’, Econometric Theory 27(03), 460–471.
Florens, J. (2003), Inverse problems and structural econometrics: The example of instrumental variables, in R. Dewatripont, L. Hansen & S. Turnovsky, eds, ‘Advances in
Economics and Econometrics: Theory and Applications’, Cambridge University Press,
pp. 284–311.
Florens, J., Johannes, J. & Bellegem, S. V. (2011), ‘Identification and estimation by penalization in nonparametric instrumental regression’, Econometric Theory 27, 472–496.
Florens, J., Johannes, J. & Bellegem, S. V. (2012), ‘Instrumental regression in partially
linear models’, Econometric Journal 15, 304–324.
Florens, J. & Simoni, A. (2012), ‘Nonparametric estimation of an instrumental variables
regression: A quasi bayesian approach based on a regularized posterior’, Journal of
Econometrics 170, 458–475.
Freyberger, J. (2015), On completeness and consistency in nonparametric instrumental
variable models.
webpage: http://www.ssc.wisc.edu/˜jfreyberger/Completeness_Freyberger.
Gourieroux, C., Monfort, A. & Trognon, A. (1984), ‘Pseudo maximum likelihood methods:
Theory’, Econometrica 52(03), 681–700.
Groetsch, C. W. (1977), Generalized Inverse of Linear Operators, Representation and Approximation, first edn, Marcel Dekker. ISBN 0-8247-6615-6.
Hall, P. & Horowitz, J. (2005), ‘Nonparametric methods for inference in the presence of
instrumental variables’, The Annals of Statistics 33, 2904–2929.
Hiriart-Urruty, J.-B. & Lemaréchal, C. (2001), Fundamentals of Convex Analysis, first
edn, Springer. ISBN 3-540-42205-6.
Hu, Y. & Schennach, S. M. (2008), ‘Instrumental variable treatment of nonclassical measurement error models’, Econometrica 76(1), pp. 195–216.
Jacod, J. & Protter, P. (2000), Probability Essentials, Springer Berlin Heidelberg. ISBN
3-540-66419-X.
Newey, W. K. (1988), ‘Adaptive estimation of regression models via moment restrictions’,
Journal of Econometrics 38, 301–339.
Newey, W. K. (1990a), ‘Efficient instrumental variables estimation of nonlinear models’,
Econometrica 58(4), 809–837.
Newey, W. K. (1990b), ‘Semiparametric efficiency bound’, Journal of Applied Econometrics
5(2), 99–135.
Newey, W. K. & Powell, J. L. (2003), ‘Instrumental variable estimation of nonparametric models’, Econometrica 71(5), 1565–1578.
Newey, W. K. & Stoker, T. M. (1993), ‘Efficiency of weighted average derivative estimators
and index models’, Econometrica 61(5), 1199–1223.
Powell, J. L., Stock, J. H. & Stoker, T. M. (1989), ‘Semiparametric estimation of index
coefficients’, Econometrica 57(6), 1403–1430.
Rudin, W. (1987), Real and Complex Analysis, third edn, McGraw-Hill. ISBN 0-07-054234-1.
Santos, A. (2011), ‘Instrumental variables methods for recovering continuous linear functionals’, Journal of Econometrics 161, 129–146.
Schwartz, L. (1997), Théorie des distributions, third edn, Hermann. ISBN 2-7056-5551-4.
Severini, T. A. & Tripathi, G. (2012a), ‘Semiparametric efficiency bounds for microeconometric models: A survey’, Foundations and Trends in Econometrics 6(3-4), 163–397.
Severini, T. & Tripathi, G. (2012b), ‘Efficiency bounds for estimating linear functionals of
nonparametric regression models with endogenous regressors’, Journal of Econometrics
170, 491–498.
van der Vaart, A. W. (1991), ‘On differentiable functionals’, Annals of Statistics 19(1), 178–204.
van der Vaart, A. W. (2000), Asymptotic Statistics, Cambridge University Press. ISBN
0-521-78450-6.