ESTIMATION OF NONPARAMETRIC
MODELS WITH SIMULTANEITY
Rosa L. Matzkin∗
Department of Economics
University of California, Los Angeles
First version: May 2010
This version: August 2014
Abstract
We introduce methods for estimating nonparametric, nonadditive models with simultaneity. The methods are developed by directly connecting the elements of the structural
system to be estimated with features of the density of the observable variables, such as ratios of derivatives or averages of products of derivatives of this density. The estimators
are therefore easily computed functionals of a nonparametric estimator of the density of
the observable variables. We consider in detail a model where to each structural equation
there corresponds an exclusive regressor and a model with one equation of interest and one
instrument that is included in a second equation. For both models, we provide new characterizations of observational equivalence on a set, in terms of the density of the observable
variables and derivatives of the structural functions. Based on those characterizations, we
develop two estimation methods. In the first method, the estimators of the structural derivatives are calculated by a simple matrix inversion and matrix multiplication, analogous to
a standard Least Squares estimator, but with the elements of the matrices being averages
of products of derivatives of nonparametric density estimators. In the second method, the
estimators of the structural derivatives are calculated in two steps. In a first step, values
of the instrument are found at which the density of the observable variables satisfies some
properties. In the second step, the estimators are calculated directly from the values of
derivatives of the density of the observable variables evaluated at the found values of the
instrument. We show that both pointwise estimators are consistent and asymptotically
normal.
____________________
∗
The support of NSF through grants SES-0833058 and SES-1062090 is gratefully acknowledged.
I have greatly benefitted from the insightful comments of a co-editor, a guest co-editor, three
referees, discussions with Richard Blundell, Jim Heckman, and Whitney Newey, comments by
participants at the 2008 All UC Econometrics Conference at Berkeley, 2009 Winter Meeting of the
Econometric Society, 2009 BIRS Conference on Semiparametric and Nonparametric Methods in
Econometrics, 2009 Conference on Consumer Demand at Brown University, 2009 SETA Conference
at Kyoto University, Invited Session on Advances in Econometrics at the 2009 Latin American
Meeting of the Econometric Society, 2009 Annual Minnesota Lecture, 2010 World Congress of
the Econometric Society, 2010 Cowles Summer Conference, many econometric seminars including
Collegio Carlo Alberto, EIFEF, Harvard/MIT, Maryland, Montreal Econometrics, Princeton, UCL,
UCLA, UCSD, USC, Stanford, and the excellent research assistance of Matthew Baird, Anastasia
Burkovskaya, David Kang, Ruoyao Shi, Yulong Wang, Kyle Woodward, Yong Hyeon Yang, and
Federico Zincenko.
1. Introduction
This paper presents estimators for two nonparametric models with simultaneity. The
estimators are shown to be consistent and asymptotically normally distributed. They are
derived from new constructive identification results that are also presented in the paper.
The nonparametric models possess nonadditive unobservable random terms. We consider a
model where to each equation there corresponds an exclusive regressor and a model with
two equations and one instrument. For both models, we develop closed form estimators of
structural derivatives, by averaging over excluded instruments. For the second model, we
also develop two-step indirect estimators.
Estimation of structural models has been one of the main objectives of econometrics
since its early days. The analyses of counterfactuals, the evaluation of welfare, and the
prediction of the evolution of markets, among others, require knowledge of primitive functions
and distributions in the economy, such as technologies and distributions of preferences, which
often can only be estimated using structural models. Estimation of parametric structural
models dates back to the early works of Haavelmo (1943, 1944), Hurwicz (1950a), Koopmans
(1949), Koopmans and Reiersol (1950), Koopmans, Rubin, and Leipnik (1950), Wald (1950),
Fisher (1959, 1961, 1966), Wegge (1965), Rothenberg (1971), and Bowden (1973). (See
Hausman (1983) and Hsiao (1983) for early review articles.) Hurwicz (1950b) considered
nonseparable econometric models, where the random terms are nonadditive.
Nonparametric structural models avoid specifying the functions and distributions as
known up to a finite dimensional parameter vector. Several nonparametric estimators
have been developed for models with simultaneity, based on conditional moment restrictions. These include Newey and Powell (1989, 2003), Darolles, Florens, and Renault (2002),
Ai and Chen (2003), Hall and Horowitz (2003), and for models with nonadditive random
terms, Chernozhukov and Hansen (2005), Chernozhukov, Imbens, and Newey (2007), Chen
and Pouzo (2012), and Chen, Chernozhukov, Lee, and Newey (2014). Identification in these
models has been studied in terms of conditions on the reduced form for the endogenous
regressors. The estimators are defined as solutions to integral equations, which may suffer
from ill-posed inverse problems.
In this paper, we make assumptions and construct nonparametric estimators in ways that
are significantly different from those nonparametric methods for models with simultaneity.
In particular, our estimators are closely tied to pointwise identification conditions on the
structural model. Our conditions allow us to directly read off the density of the observable
variables the particular elements of the structural model that we are interested in estimating. In other words, the goal of this paper is to develop estimators that can be expressed in
closed form. In this vein, estimators for conditional expectations can be easily constructed
by integrating nonparametric estimators for conditional probability densities, such as the
kernel estimators of Nadaraya (1964) and Watson (1964). Conditional quantile estimators
can be easily constructed by inverting nonparametric estimators for conditional distribution
functions, such as in Bhattacharya (1963) and Stone (1977).1 For structural functions with
nonadditive unobservable random terms, several methods exist to estimate the nonparametric function directly from estimators for the distribution of the observable variables. These
include Matzkin (1999, 2003), Altonji and Matzkin (2001, 2005), Chesher (2003), and Imbens and Newey (2003, 2009). The set of simultaneous equations that satisfy the conditions
required to employ these methods is very restrictive. (See Blundell and Matzkin (2014) for
a characterization of simultaneous equation models that can be estimated using a control
function approach.) The goal of this paper is to fill this important gap.
Our simultaneous equations models are nonparametric and nonseparable, with nonadditive unobservable random terms. Unlike in linear models with additive errors, each reduced form function in the nonadditive model depends separately on the value of each of the unobservable variables in the system.
We present two estimation approaches and focus on two models. Both approaches
are developed from two new characterizations of observational equivalence for simultaneous
equations, which we introduce in this paper. The new characterizations, expressed in terms
of the density of the observable variables and ratios of derivatives of the structural functions,
immediately provide constructive methods for identifying one or more structural elements.
Our first model is a system where to each equation there corresponds an exclusive regressor. Consider for example a model where the vector of observable endogenous variables
consists of the Nash equilibrium actions of a set of players. Each player chooses his or
her action as a function of his or her individual observable and unobservable costs, taking
the other players’ actions as given. In this model, each individual player’s observable cost
would be the exclusive observable variable corresponding to the reaction function of that
player. Our method allows one to estimate nonparametrically the reaction functions of each
of the players, at each value of the unobservable costs, from the distribution of observable
equilibrium actions and players’ costs. The estimator that we present for this model is an
average derivative type of estimator. The calculation of the estimator for the derivatives of
the reaction functions of the players, at any given value of the observable and unobservable
arguments, requires only a simple matrix inversion and a matrix multiplication, analogous
to the solution of Linear Least Squares estimators. The difference is that the elements in
our matrices are calculated using nonparametric averages of products of derivatives. In
this sense, our estimators can be seen as the extension to models with simultaneity of the
average derivative methods of Stoker (1986) and Powell, Stock and Stoker (1989). As in
1 See Koenker (2005) for other quantile methods.
those papers, we extract the structural parameters using weighted averages of functions of
nonparametrically estimated derivatives of the densities of the observable variables.2
Our second model is a two equation model with one instrument. Consider for example
a demand function, where the object of interest is the derivative of the demand function
with respect to price. Price is determined by another function, the supply function, which
depends on quantity produced, an unobservable shock, and at least one observable cost. We
develop for this model estimators based on our two approaches, the average over instruments
derivative estimator, similar to the estimator developed for our first model, and indirect
estimators. Our indirect estimators are based on a two-step procedure. In the first step,
either one or two values of the observable cost are found where the density of the observable
variables satisfies some conditions. In the second step, the derivative of the demand with
respect to price is read off the joint density of the price, quantity, and cost evaluated at the
found values of cost. The estimators are developed by substituting the joint density of price,
quantity, and cost, by a nonparametric estimator for it. The estimators for the derivative of
the demand, at a given price and quantity, that we develop are consistent and asymptotically
normal.
The new observational equivalence and constructive identification results that we present
do not require large support conditions on the observable exogenous regressors. They are
based on the identification results in Matzkin (2008), which start out from the transformation
of variables equation for densities. Employing this equation, Matzkin (2007b, Section 2.1.4)
presented a two-step constructive identification result for an exclusive regressors model, under the assumption that the density of the unobservable variables has a unique known mode.
(See Matzkin (2013) for a detailed development of that result.) Other identification results
that were developed using similar equations are Berry and Haile (2009, 2011) and Chiappori
and Komunjer (2009). Berry and Haile (2009, 2011) developed alternative constructive identification results starting out also from the transformation of variables equation for densities.
Their results apply to the class of exclusive regressors models where each unobservable variable enters in the structural functions through an index. Chiappori and Komunjer (2009)
derived generic, non-constructive identification results in a multinomial model, by creating a
mapping between the second order derivatives of the log density of the observable variables
and the second order derivatives of the log density of the unobservable variables.
We focus in this paper on the simplest models that exhibit simultaneity. However, our proposed techniques can be used in models where simultaneity is only one of several possible features of the model. For example, our results can be used
2 Existing extensions of the average derivative methods of Stoker (1986) and Powell, Stock, and Stoker (1989) to models with endogeneity, such as Altonji and Matzkin (2001, 2005), Blundell and Powell (2003a), Imbens and Newey (2003, 2009), and Altonji, Ichimura, and Otsu (2012), require conditions that are generally not satisfied by models with simultaneity.
in models with simultaneity in latent dependent variables, models with large dimensional
unobserved heterogeneity, and models where the unobservable variables are only conditionally independent of the explanatory variables. (See Matzkin (2012) for identification results
based on Matzkin (2008) in such extended models.)
Alternative estimators for nonparametric simultaneous equations can be formulated using
a nonparametric version of Manski's (1983) minimum distance from independence, as in
Brown and Matzkin (1998). Those estimators are defined as the minimizers of a distance
between the joint distribution and the product of the marginal distributions of the unobservable and exogenous
variables, and typically do not have a closed form.
The structure of the paper is as follows. In the next section we present the exclusive
regressors model. We develop new observational equivalence results for this model, and
use those results to develop a closed form estimator for either the derivatives or the ratios of
derivatives of the structural functions in this model. We show that the estimator is consistent
and asymptotically normal. In Section 3, we consider a model with two equations and one
instrument. We develop new observational equivalence results for this model, and use those
results to develop an estimator for the derivative of the structural function that excludes the
instrument, by averaging over the instrument. We show that the estimator, which is given in
closed form, is also consistent and asymptotically normal. In Section 4 we present indirect,
two-step estimators for the two equation, one instrument model, and show that they are also
consistent and asymptotically normal. Section 5 presents results of simulations performed
using some of the estimators. Section 6 concludes.
2. The Model with Exclusive Regressors
2.1. The Model
We consider in this section the model

(2.1)
        y_1 = m^1(y_2, y_3, ..., y_G, x_1, ε_1)
        y_2 = m^2(y_1, y_3, ..., y_G, x_2, ε_2)
        ···
        y_G = m^G(y_1, y_2, ..., y_{G-1}, x_G, ε_G)

where (y_1, ..., y_G) is a vector of observable endogenous variables, (x_1, ..., x_G) is a vector of observable exogenous variables, and (ε_1, ..., ε_G) is a vector of unobservable variables. A vector of additional observable exogenous variables, common to all equations, would have the effect of decreasing the rates of convergence of nonparametric estimators of model (2.1) but would not add complications for identification, as all our assumptions and identification conclusions can be interpreted as holding conditionally on such a vector. Hence, for simplicity of exposition, we omit it from the model.
Since for each g the function m^g is unknown and the nonadditive ε_g is unobservable, we will at most be able to identify the values of ε_g up to an invertible transformation.3 Hence, for each g, we may normalize m^g to be either strictly increasing in ε_g, as it is assumed in models additive in ε_g, or strictly decreasing in ε_g. The invertibility of m^g in ε_g implies that, for any fixed values of the other arguments of the function m^g, there is a unique value of ε_g for each value of y_g. We will denote the function that assigns such value of ε_g by r^g(y_1, ..., y_G, x_g). Our system of indirect structural equations, denoting the mapping from the vectors of observable variables to the vector of unobservable variables, is expressed as

(2.2)
        ε_1 = r^1(y_1, ..., y_G, x_1)
        ε_2 = r^2(y_1, ..., y_G, x_2)
        ···
        ε_G = r^G(y_1, ..., y_G, x_G)
The derivatives of the functions m^g can be calculated by substituting (2.2) into (2.1) and differentiating with respect to the various arguments. The derivative of m^g with respect to y_j, when j ≠ g and ε_g is a specified value, is

        ∂m^g(y_{-g}, x_g, ε_g)/∂y_j |_{ε_g = r^g(y, x_g)} = − [ ∂r^g(y, x_g)/∂y_g ]^{-1} [ ∂r^g(y, x_g)/∂y_j ]

The derivative of m^g with respect to ε_g is given by the same expression as the derivative with respect to y_j, except that ∂r^g(y, x_g)/∂y_j is substituted by −1. The derivative of m^g with respect to x_g, when ε_g is a specified value, is

        ∂m^g(y_{-g}, x_g, ε_g)/∂x_g |_{ε_g = r^g(y, x_g)} = − [ ∂r^g(y, x_g)/∂y_g ]^{-1} [ ∂r^g(y, x_g)/∂x_g ]
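These implicit-function formulas can be checked numerically. The sketch below uses a hypothetical one-equation inverse function r^1(y_1, y_2, x_1) = y_1^3 + y_1 − 0.5 y_2 − x_1 (invented purely for illustration; it is strictly increasing in y_1, so m^1 is well defined) and compares a finite-difference derivative of m^1 with the formula −[∂r^1/∂y_1]^{-1} ∂r^1/∂y_2:

```python
# Hypothetical inverse function (invented for illustration): r1 is strictly
# increasing in y1, so the structural function m1 is well defined.
def r1(y1, y2, x1):
    return y1**3 + y1 - 0.5 * y2 - x1

def m1(y2, x1, eps1, lo=-10.0, hi=10.0):
    """Invert r1 in y1 by bisection: find the y1 with r1(y1, y2, x1) = eps1."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if r1(mid, y2, x1) < eps1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y2, x1, eps1 = 1.2, 0.7, 0.3
y1 = m1(y2, x1, eps1)

# Finite-difference derivative of m1 with respect to y2, holding eps1 fixed
h = 1e-5
fd = (m1(y2 + h, x1, eps1) - m1(y2 - h, x1, eps1)) / (2 * h)

# Implicit-function formula: dm1/dy2 = -(dr1/dy2)/(dr1/dy1) = 0.5 / (3*y1**2 + 1)
formula = 0.5 / (3 * y1**2 + 1)
print(fd, formula)
```

The two numbers agree up to finite-difference error, illustrating that the derivative of the structural function is recoverable from derivatives of the inverse function alone.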
The estimation methods we introduce are based on the assumptions that (i) ε and (Y, X) have, respectively, differentiable densities, f_ε and f_{Y,X}, (ii) the functions m^g (g = 1, ..., G) are twice continuously differentiable, and (iii) for all (y, x) in a set in the support of (Y_1, ..., Y_G, X_1, ..., X_G), the conditional density of Y given X = x, evaluated at y, is given by the transformation of variables equation

(2.4)        f_{Y|X=x}(y) = f_ε(r(y, x)) | ∂r(y, x)/∂y |

where | ∂r(y, x)/∂y | denotes the Jacobian determinant of r(y, x) with respect to y. In addition, we assume that (iv) for each g the function r^g has a nonvanishing derivative with respect to its exclusive regressor, x_g. A set of sufficient conditions for (i)-(iv) is given by Assumptions 2.1-2.3 below.

3 See Matzkin (1999, 2003, 2007) for discussion of this nonidentification result in the one equation model, y = m(x, ε), with ε and x independently distributed.
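Equation (2.4) can be verified exactly in a linear Gaussian sketch (a hypothetical system chosen for illustration, not a model considered in the paper): take ε_g = r^g(y, x_g) = B[g,:]·y + x_g with ε ~ N(0, I) independent of X, so the Jacobian determinant is |det B|:

```python
import numpy as np

# Hypothetical linear-in-epsilon system (illustration only):
#   eps_g = r^g(y, x_g) = B[g, :] @ y + x_g,  eps ~ N(0, I) independent of X.
# Equation (2.4) then reads f_{Y|X=x}(y) = f_eps(B @ y + x) * |det B|.
B = np.array([[1.0, -0.5],
              [-0.3, 1.0]])

def f_eps(e):                      # standard bivariate normal density
    return np.exp(-0.5 * e @ e) / (2.0 * np.pi)

def mvn_pdf(v, mean, cov):         # bivariate normal density, direct formula
    d = v - mean
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / (
        2.0 * np.pi * np.sqrt(np.linalg.det(cov)))

x = np.array([0.7, -0.2])
y = np.array([1.1, 0.4])

# Right-hand side of (2.4): density of eps at r(y, x), times the Jacobian determinant
rhs = f_eps(B @ y + x) * abs(np.linalg.det(B))

# Left-hand side computed directly: Y = B^{-1}(eps - x), so
# Y | X = x ~ N(-B^{-1} x, B^{-1} B^{-1}')
Binv = np.linalg.inv(B)
lhs = mvn_pdf(y, -Binv @ x, Binv @ Binv.T)
print(lhs, rhs)
```

The two sides coincide to machine precision, which is exactly the identity that the identification analysis differentiates.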
Assumption 2.1: The function r = (r^1, ..., r^G) is twice continuously differentiable. For each g, m^g is invertible in ε_g and the derivative of r^g with respect to x_g is bounded away from zero. Conditional on x = (x_1, ..., x_G), the function r is 1-1, onto R^G and, as a function of y, its Jacobian determinant is positive and bounded away from zero.

Assumption 2.2: (ε_1, ..., ε_G) is distributed independently of (X_1, ..., X_G), with an everywhere positive and twice continuously differentiable density, f_ε.

Assumption 2.3: (X_1, ..., X_G) possesses a differentiable density.
For the analysis of identification, the left-hand side of (2.4) can be assumed known. In practice, it can be estimated nonparametrically. The right-hand side involves the structural functions, r and f_ε, whose features are the objects of interest. The differentiability assumptions on r and f_ε imply that both sides of (2.4) can be differentiated with respect to y and x. This allows us to transform (2.4) into a system of linear equations with derivatives of known functions on one side and derivatives of unknown functions on the other side. We show in the next subsection how the derivatives of the known function can be used to identify ratios of derivatives of the unknown functions r^1, ..., r^G.

We will develop estimators for the identified features of r. Theorem 2.1 below characterizes the features of r that can be identified. Roughly, the theorem states that, under appropriate conditions on the density, f_ε, of ε and on a vector of composite derivatives of log | ∂r(y, x)/∂y |, the ratios of derivatives, [ ∂r^g(y, x_g)/∂y_j ] / [ ∂r^g(y, x_g)/∂x_g ], of each of the functions, r^g, with respect to its coordinates, are identified. The statement that these ratios of derivatives are identified is equivalent to the statement that, for each g, r^g is identified up to an invertible transformation. This suggests considering restrictions on the set of functions r which guarantee that no two different functions satisfying those restrictions are invertible transformations of each other.4 One such class can be defined by requiring that for each function r^g in the class there exists a function v^g: R^G → R such that for all (y, x_g),

(2.5)        r^g(y, x_g) = v^g(y) + x_g

and such that v^g(ȳ) = ε̄_g, where ȳ and ε̄_g are specified and constant over all the functions in the class.

4 Examples of classes of nonparametric functions satisfying that no two functions in the set are invertible
2.2. Observational Equivalence
To motivate the additional restrictions that we will impose, we first present an observational equivalence result for the exclusive regressors model. The result is obtained by specializing the observational equivalence results in Matzkin (2008) to the exclusive regressors model, and by expressing those results in terms of the density, f_{Y|X}, of the observable variables instead of in terms of the density, f_ε, of the vector of unobservable variables, ε. We also modify Matzkin (2008)'s results further by restricting the definition of observational equivalence to a subset of the support of the vector of observable variables.
We first introduce some notation, which will be used throughout the paper. Let f_{Y|X=x}(y) denote the conditional density of the vector of observable variables. We will denote the derivative with respect to y of the log of f_{Y|X=x}(y), ∂ log f_{Y|X=x}(y)/∂y, by g_y(y, x) = (g_{y_1}(y, x), ..., g_{y_G}(y, x))'. The derivative, ∂ log f_{Y|X=x}(y)/∂x, of the log of f_{Y|X=x}(y) with respect to x will be denoted by g_x(y, x) = (g_{x_1}(y, x), ..., g_{x_G}(y, x))'. The derivative of the log of the density, f_ε, of ε with respect to ε, ∂ log f_ε(ε)/∂ε, will be denoted by q(ε) = (q_1(ε), ..., q_G(ε))'. When ε = r(y, x), ∂ log f_ε(ε)/∂ε will be denoted by q(r(y, x)), or simply by q(y, x). For each g and each j, the ratio of derivatives of r^g with respect to y_j and x_g, [ ∂r^g(y, x_g)/∂y_j ] / [ ∂r^g(y, x_g)/∂x_g ], will be denoted by s_{gj}(y, x_g). For an alternative function r̃^g, these ratios of derivatives will be denoted by s̃_{gj}(y, x_g). The Jacobian determinants, | ∂r(y, x)/∂y | and | ∂r̃(y, x)/∂y |, will be denoted respectively by |r_y(y, x)| and |r̃_y(y, x)|. The derivatives of |r_y(y, x)| and |r̃_y(y, x)| with respect to any of their arguments, w ∈ {y_1, ..., y_G, x_1, ..., x_G}, will be denoted by |r_y(y, x)|_w and |r̃_y(y, x)|_w. The statement of our main observational equivalence result in this section involves functions, a_j, defined for each j by

(2.6)        a_j(y, x) = |r_y(y, x)|_{y_j} / |r_y(y, x)| − Σ_{g=1}^{G} [ |r_y(y, x)|_{x_g} / |r_y(y, x)| ] s_{gj}(y, x_g)

The term a_j(y, x) can be interpreted as the effect on log |r_y(y, x)| of a simultaneous change in y_j and in (x_1, ..., x_G).
transformations of each other were studied in Matzkin (1992, 1994) in the context of threshold crossing, binary, and multinomial choice models, and in Matzkin (1999, 2003) in the context of a one equation model with a nonadditive random term.
Let Γ denote the set of functions r that satisfy Assumption 2.1 and let Φ denote the set of densities f_ε that satisfy Assumption 2.2. We define observational equivalence within Γ over a subset, V, in the interior of the support of the vector of observable variables.
Definition 2.1: Let V denote a subset of the support of (Y, X) such that for all (y, x) ∈ V, f_{Y,X}(y, x) ≥ δ, where δ is any positive constant. A function r̃ ∈ Γ is observationally equivalent to r ∈ Γ on V if there exist densities f_ε and f̃_ε satisfying Assumption 2.2 and such that for all (y, x) ∈ V,

(2.7)        f_ε(r(y, x)) | ∂r(y, x)/∂y | = f̃_ε(r̃(y, x)) | ∂r̃(y, x)/∂y |

When (r, f_ε) is the pair of inverse structural function and density generating f_{Y|X}, the definition states that r̃ is observationally equivalent to r if there is a density in Φ that together with r̃ generates f_{Y|X}. The following theorem provides a characterization of observational equivalence on V.
Theorem 2.1: Suppose that (r, f_ε) generates f_{Y|X} on V and that Assumptions 2.1-2.3 are satisfied. A function r̃ ∈ Γ is observationally equivalent to r ∈ Γ on V if and only if for all (y, x) ∈ V,

(2.8)
        0 = (s_{11} − s̃_{11}) g_{x_1} + (s_{21} − s̃_{21}) g_{x_2} + ··· + (s_{G1} − s̃_{G1}) g_{x_G} + (a_1 − ã_1)
        ···
        0 = (s_{12} − s̃_{12}) g_{x_1} + (s_{22} − s̃_{22}) g_{x_2} + ··· + (s_{G2} − s̃_{G2}) g_{x_G} + (a_2 − ã_2)
        ···
        0 = (s_{1G} − s̃_{1G}) g_{x_1} + (s_{2G} − s̃_{2G}) g_{x_2} + ··· + (s_{GG} − s̃_{GG}) g_{x_G} + (a_G − ã_G)

where for each g and j, s_{gj} = s_{gj}(y, x_g) = [ ∂r^g(y, x_g)/∂y_j ] / [ ∂r^g(y, x_g)/∂x_g ], s̃_{gj} = s̃_{gj}(y, x_g) = [ ∂r̃^g(y, x_g)/∂y_j ] / [ ∂r̃^g(y, x_g)/∂x_g ], a_j = a_j(y, x), ã_j = ã_j(y, x), and g_{x_g} = ∂ log f_{Y|X=x}(y)/∂x_g.
The proof, presented in the Appendix, uses (2.4) to obtain an expression for the unobservable ∂ log f_ε(r(y, x))/∂ε in terms of the observable derivatives, g_y(y, x) and g_x(y, x), of log f_{Y|X=x}(y). That expression is used to substitute ∂ log f_ε(r(y, x))/∂ε in the observational equivalence result in Matzkin (2008, Theorem 3.2). Equation (2.8) is obtained after manipulating the equations resulting from such substitution.
Theorem 2.1 can be used with (2.4) to develop constructive identification results for features of r, and estimators for such features. Taking logs and differentiating both sides of (2.4) with respect to y_j gives

(2.9)        ∂ log f_{Y|X=x}(y)/∂y_j = Σ_{g=1}^{G} [ ∂ log f_ε(r(y, x))/∂ε_g ] [ ∂r^g(y, x_g)/∂y_j ] + |r_y(y, x)|_{y_j} / |r_y(y, x)|

and taking logs and differentiating both sides of (2.4) with respect to x_g gives

(2.10)        ∂ log f_{Y|X=x}(y)/∂x_g = [ ∂ log f_ε(r(y, x))/∂ε_g ] [ ∂r^g(y, x_g)/∂x_g ] + |r_y(y, x)|_{x_g} / |r_y(y, x)|

Solving for ∂ log f_ε(r(y, x))/∂ε_g in (2.10) and substituting the result into each of the ∂ log f_ε(r(y, x))/∂ε_g terms in (2.9), we obtain

(2.11)        ∂ log f_{Y|X=x}(y)/∂y_j = Σ_{g=1}^{G} [ ∂ log f_{Y|X=x}(y)/∂x_g ] s_{gj}(y, x_g) + a_j(y, x)

where a_j(y, x) is as in (2.6). Equation (2.11) implies that (s, a) satisfies the following system of equations

(2.12)
        g_{y_1} = s_{11} g_{x_1} + s_{21} g_{x_2} + ··· + s_{G1} g_{x_G} + a_1
        ···
        g_{y_2} = s_{12} g_{x_1} + s_{22} g_{x_2} + ··· + s_{G2} g_{x_G} + a_2
        ···
        g_{y_G} = s_{1G} g_{x_1} + s_{2G} g_{x_2} + ··· + s_{GG} g_{x_G} + a_G
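For concreteness, the system (2.12) for the two-equation case (G = 2) can be written out explicitly, using the notation defined above (purely illustrative):

```latex
\begin{aligned}
g_{y_1}(y,x) &= s_{11}(y,x_1)\,g_{x_1}(y,x) + s_{21}(y,x_2)\,g_{x_2}(y,x) + a_1(y,x),\\
g_{y_2}(y,x) &= s_{12}(y,x_1)\,g_{x_1}(y,x) + s_{22}(y,x_2)\,g_{x_2}(y,x) + a_2(y,x).
\end{aligned}
```

At a single point (y, x) this is a system of 2 equations in the 6 unknowns (s_11, s_21, s_12, s_22, a_1, a_2); evaluating it at additional points where the unknowns remain constant yields further equations, which is the route taken below.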
Several features of this system of equations deserve mention. First, note that this is a system of G equations where only the ratios of derivatives s_{gj} (g, j = 1, ..., G) and the terms a_1, ..., a_G are unknown. Second, note that the unknown elements in this system are elements of only the inverse function r = (r^1, ..., r^G). They do not depend on the unknown density of (ε_1, ..., ε_G). The density f_ε enters the system only through the known terms, g_{y_1}, ..., g_{y_G} and g_{x_1}, ..., g_{x_G}. Moreover, the values of q(r(y, x)) depend on the values of (r^1, ..., r^G) rather than on the ratios of derivatives of (r^1, ..., r^G). Hence, the density f_ε has the potential to generate variation in the values of g_{y_1}, ..., g_{y_G} and g_{x_1}, ..., g_{x_G} independently of the unknown ratios of derivatives s_{gj} (g, j = 1, ..., G) and of a_1, ..., a_G. Third, each of the ratios of derivatives s_{gj} depends on only one x_g, while g_{y_1}, ..., g_{y_G} and g_{x_1}, ..., g_{x_G} depend on all of the vector (x_1, ..., x_G). Hence, variation in the coordinates of x other than x_g has the potential to generate variation in the other elements of the system, while the ratios s_{gj} stay fixed. This leads to the analysis of conditions on f_ε(r(y, x)) guaranteeing that values of (g_{y_1}, ..., g_{y_G}, g_{x_1}, ..., g_{x_G}), which are observable, can be found so that (2.12) can be solved for either the whole vector (s, a) = (s_{11}, ..., s_{G1}; ...; s_{1G}, ..., s_{GG}; a_1, ..., a_G) or for some elements of it.
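The logic of solving (2.12) pointwise can be sketched numerically. The block below uses a hypothetical linear Gaussian system ε_g = B[g,:]·y + x_g with ε ~ N(0, I) (satisfying Assumption 2.4' below; here s_{gj} = B[g, j] and a_j = 0, and the log-density derivatives have closed forms), evaluates (2.12) at G + 1 = 3 values of x, and solves the stacked linear system for (s_{1j}, s_{2j}, a_j):

```python
import numpy as np

# Hypothetical system (illustration only): eps_g = B[g, :] @ y + x_g, eps ~ N(0, I).
# Then log f_{Y|X=x}(y) = -||B @ y + x||^2 / 2 + constant, so
#   g_x(y, x) = -(B @ y + x),   g_y(y, x) = -B.T @ (B @ y + x),
# and (2.12) holds with s_gj = B[g, j] and a_j = 0.
B = np.array([[1.0, -0.5],
              [-0.3, 1.0]])
y = np.array([1.1, 0.4])               # point at which the ratios are evaluated

def g_x(x):
    return -(B @ y + x)

def g_y(x):
    return -B.T @ (B @ y + x)

# G + 1 = 3 values of x; the ratios s_gj and the a_j stay fixed as x varies
xs = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]

M = np.array([[*g_x(x), 1.0] for x in xs])    # rows: (g_x1, g_x2, 1) at each point
R = np.array([g_y(x) for x in xs])            # rows: (g_y1, g_y2) at each point

sol = np.linalg.solve(M, R)                   # column j gives (s_1j, s_2j, a_j)
print(sol)
```

The first two rows of `sol` recover the matrix of ratios (here equal to B) and the last row recovers the a_j's (here zero), illustrating how G + 1 suitable evaluation points pin down the unknowns.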
2.3. Average Derivatives Estimators for the Model with Exclusive Regressors
We next develop an estimator for the ratios of derivatives, based on a characterization of (s, a) of the least-squares form: a matrix of averaged cross products of density derivatives is inverted and multiplied by another such matrix. The characterization employs the fact that (2.12) holds for all values of (g_{y_1}, ..., g_{y_G}, g_{x_1}, ..., g_{x_G}) over any subset of V where (s, a) is constant. The elements of the matrices are obtained as averages of products of derivatives of log f_{Y|X=x}(y) over the set where the values of (s, a) are constant. Estimation of (s, a) follows by substituting, in these matrices, f_{Y|X} by a nonparametric estimator for f_{Y|X}.
We derive our expression for the estimator by characterizing (s, a) as the unique solution to the minimization of an integrated square distance between the left-hand side and the right-hand side of (2.12). The integration set must be a subset of the support of (Y, X) over which (s, a) is constant. Hence, this set will depend on the restrictions that one assumes on the function r. We will provide two sets of restrictions on r, each leading to different integration sets.

Another restriction on the integration set is that it must contain in its interior G + 1 values of the observable variables such that, when g is evaluated at those values, the only solution to (2.8) is the vector of 0's. This identification condition guarantees that (s, a) is the unique minimizer of the distance function. For each of the two sets of restrictions on r, we will provide conditions on f_ε guaranteeing that such G + 1 values exist. The two sets of restrictions on r that we will consider are stated in Assumptions 2.4 and 2.4'.

Assumption 2.4: The inverse function r^G is such that for some function v: R^G → R and all (y, x_G), r^G(y, x_G) = v(y) + x_G.

Assumption 2.4': For each g = 1, ..., G, the inverse function r^g is such that for some function v^g: R^G → R and all (y, x_g), r^g(y, x_g) = v^g(y) + x_g.

Assumption 2.4 is equivalent to requiring that the units of measurement of ε_G are tied to those of x_G, by5

        ∂m^G(y_1, ..., y_{G-1}, x_G, ε_G)/∂ε_G = − ∂m^G(y_1, ..., y_{G-1}, x_G, ε_G)/∂x_G

5 I thank a referee for showing that Assumption 2.4 is equivalent to this restriction.
The sets on which (s, a) is constant when Assumptions 2.4 and 2.4' are satisfied are stated in the following propositions.

Proposition 2.1: Let (ȳ, x̄_{-G}) = (ȳ, x̄_1, ..., x̄_{G-1}) be fixed and given. When Assumptions 2.1 and 2.4 are satisfied, (s, a) is constant over the set T̄ = {(ȳ, x̄_{-G}, x_G) | x_G ∈ R}.

Proposition 2.2: Let ȳ be fixed and given. When Assumptions 2.1 and 2.4' are satisfied, (s, a) is constant over the set T̄ = {(ȳ, x_1, ..., x_G) | (x_1, ..., x_G) ∈ R^G}.
To state our assumption on the density f_ε, we will denote by Q(ε^{(1)}, ..., ε^{(G+1)}) the (G+1) × (G+1) matrix of derivatives of log f_ε(ε) at G + 1 values, ε^{(1)}, ..., ε^{(G+1)}, of ε; that is, the matrix whose k-th row (k = 1, ..., G+1) is

        ( ∂ log f_ε(ε^{(k)})/∂ε_1 , ∂ log f_ε(ε^{(k)})/∂ε_2 , ··· , ∂ log f_ε(ε^{(k)})/∂ε_G , 1 )

Assumption 2.5: There exist G + 1, not necessarily known, values ε^{(1)}, ..., ε^{(G+1)} of ε such that Q(ε^{(1)}, ..., ε^{(G+1)}) is invertible.6

Assumption 2.6: There exist G + 1, not necessarily known, values t^{(1)}, ..., t^{(G+1)} in the set where (s, a) is constant, such that for each k = 1, ..., G + 1, r(t^{(k)}) = ε^{(k)}, where ε^{(k)} is as in Assumption 2.5.
Assumptions 2.5 and 2.6 require that there exist G + 1 values of ε, each corresponding to the value of r at one point in the set where (s, a) is constant, such that the matrix Q(ε^{(1)}, ..., ε^{(G+1)}) is invertible. When the set is as in Proposition 2.1, the points in the set can differ only by their value of x_G. Since x_G enters only r^G, the G + 1 values of ε that can be used to satisfy the invertibility of Q(ε^{(1)}, ..., ε^{(G+1)}) must possess the same values of ε_1, ..., ε_{G-1}. Hence, only changes in the value of ε_G must generate the G + 1 linearly independent rows in Q(ε^{(1)}, ..., ε^{(G+1)}). When the set is as in Proposition 2.2, the points in the set may differ in their values of (x_1, ..., x_G). In this case, the G + 1 values of ε that must satisfy the invertibility of Q(ε^{(1)}, ..., ε^{(G+1)}) may differ in their values of any coordinate, not just in the value of their last coordinate. Normal distributions can satisfy Assumption 2.5 in the latter case, where the values of ε^{(1)}, ..., ε^{(G+1)} are allowed to differ in all coordinates, but not in the former case, where only the last coordinates of ε^{(1)}, ..., ε^{(G+1)} are different.

6 This assumption is a generalization of the assumptions in Matzkin (2008, Example 4.2) and Matzkin (2010), which imposed zero values on some of the elements of this matrix, guaranteeing invertibility. Invertibility conditions in an exclusive regressor model were imposed, in previous works, on the matrix of second order derivatives of log f_ε. (See Brown, Deb, and Wegkamp (2007) for identification in a semiparametric version of the model in Matzkin (2007b, Section 2.1.4) and Berry and Haile (2011).)
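The claim about normal distributions can be checked directly for G = 2: with standard normal ε, ∂ log f_ε(ε)/∂ε_g = −ε_g, so a row of the matrix in Assumption 2.5 at ε^{(k)} is (−ε_1^{(k)}, −ε_2^{(k)}, 1). When only ε_2 varies across the three points, the first column is constant and proportional to the column of ones, so the matrix is singular; for generic points differing in all coordinates it is invertible. A minimal numerical check:

```python
import numpy as np

# For standard normal eps, d log f_eps(e) / d e_g = -e_g, so for G = 2 a row of
# the matrix in Assumption 2.5 evaluated at eps^{(k)} is (-eps1, -eps2, 1).
def Q(points):
    return np.array([[-e[0], -e[1], 1.0] for e in points])

# Proposition 2.2 case: the three values of eps may differ in every coordinate
generic = [np.array([0.0, 0.0]), np.array([1.0, 0.3]), np.array([-0.5, 1.0])]

# Proposition 2.1 case: only the last coordinate varies; the first column is
# constant and proportional to the column of ones, so the matrix is singular
fixed_first = [np.array([0.4, t]) for t in (0.0, 1.0, -1.0)]

print(np.linalg.det(Q(generic)), np.linalg.det(Q(fixed_first)))
```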
The following propositions relate Assumptions 2.5-2.6 to a testable condition. Let t^{(1)}, ..., t^{(G+1)} denote G + 1 points in a set T on which (s, a) is constant. For each k and g, we will denote the value of g_{x_g} at t^{(k)} by g_{x_g}^{(k)}.

Condition I.1: There exist t^{(1)}, ..., t^{(G+1)} in T such that the matrix G(t^{(1)}, ..., t^{(G+1)}), whose k-th row (k = 1, ..., G+1) is

        ( g_{x_1}^{(k)} , g_{x_2}^{(k)} , ··· , g_{x_G}^{(k)} , 1 )

is invertible.
The existence of G + 1 points, t^{(1)}, ..., t^{(G+1)}, in T such that G(t^{(1)}, ..., t^{(G+1)}) is invertible implies, by Theorem 2.1, that when Assumptions 2.1-2.3 are satisfied, (s, a) is identified. This is because, for each j, evaluating the equation corresponding to j in (2.8) at t^{(1)}, ..., t^{(G+1)} generates G + 1 linearly independent equations in G + 1 unknowns, whose unique solution is the vector of 0's. Propositions 2.3 and 2.4 below show that Assumptions 2.5 and 2.6 imply that Condition I.1 is satisfied in models satisfying Assumptions 2.1-2.4 or Assumptions 2.1-2.3 and 2.4', when T is appropriately chosen. They also show that Condition I.1 can be employed to test Assumption 2.5. Any G + 1 points t^{(1)}, ..., t^{(G+1)} in T for which G(t^{(1)}, ..., t^{(G+1)}) is invertible are mapped to G + 1 vectors, ε^{(1)}, ..., ε^{(G+1)}, such that Q(ε^{(1)}, ..., ε^{(G+1)}) is invertible.
Proposition 2.3: Suppose that Assumptions 2.1-2.3 and 2.4 are satisfied on T. Let the set T be included in the set {(ȳ, x̄_{-G}, x_G) | x_G ∈ R}. Then, for all t^{(1)}, ..., t^{(G+1)} in T and ε^{(1)}, ..., ε^{(G+1)} such that ε^{(k)} = r(t^{(k)}), G(t^{(1)}, ..., t^{(G+1)}) is invertible if and only if Q(ε^{(1)}, ..., ε^{(G+1)}) is invertible.

Proposition 2.4: Suppose that Assumptions 2.1-2.3 and 2.4' are satisfied on T. Let the set T be included in the set {(ȳ, x_1, ..., x_G) | (x_1, ..., x_G) ∈ R^G}. Then, for all t^{(1)}, ..., t^{(G+1)} in T and ε^{(1)}, ..., ε^{(G+1)} such that ε^{(k)} = r(t^{(k)}), G(t^{(1)}, ..., t^{(G+1)}) is invertible if and only if Q(ε^{(1)}, ..., ε^{(G+1)}) is invertible.
We next define a distance function such that (s, a) is the unique solution to its minimization. Let T denote a compact set on which, as earlier, (s, a) is constant. Write T = {w̄} × T_x, where {w̄} is the, possibly empty, singleton corresponding to the coordinates of (y, x) that remain fixed on T. Then, when Assumption 2.4 is satisfied, {w̄} = {(ȳ, x̄_1, ..., x̄_{G-1})} and any element of T_x is a scalar, x_G; when Assumption 2.4' is satisfied, {w̄} = {ȳ} and any element of T_x is G-dimensional. Let ω(x − x̄) denote a specified differentiable nonnegative function such that ∫ ω(x − x̄) dx = 1 and ω(x − x̄) = 0 on the complement of T_x. For any vector (s̃, ã) generated from an alternative function r̃ satisfying the same assumptions as r but not necessarily observationally equivalent to r, we define the distance function D(s̃, ã) by

        D(s̃, ã) = ∫ Σ_{j=1}^{G} [ g_{y_j} − s̃_{1j} g_{x_1} − s̃_{2j} g_{x_2} − ··· − s̃_{Gj} g_{x_G} − ã_j ]² ω(x − x̄) dx

The value of D(s̃, ã) is the integrated square distance between the left-hand side and the right-hand side of (2.12) when (s, a) is replaced by (s̃, ã). Since D(s̃, ã) ≥ 0 and D(s, a) = 0, (s, a) is a minimizer of D(·). When ω(·) is strictly positive at least at points t^{(1)}, ..., t^{(G+1)} that satisfy Condition I.1, (s, a) is the unique minimizer of D(·). To express the first G² coordinates, S(y), of the vector (s, a) that solves the First Order Conditions for the minimization of D(·), we introduce some additional notation.
of (·) we introduce some additional notation. The average of g and g over will be
denoted, respectively, by
Z
g =
Z
Z
g =
Z
g ( − ) ( − ) =
g ( − ) ( − ) =
Z
Z
log |=(− ) ()
( − )
log |=(− ) ()
( − )
The averaged centered cross products between $g_{x_k}$ and $g_{x_l}$, and between $g_{x_k}$ and $g_{y_j}$, will be denoted, respectively, by

$\sigma_{x_k x_l} = \displaystyle\int \left(g_{x_k}(y, x) - \bar g_{x_k}\right)\left(g_{x_l}(y, x) - \bar g_{x_l}\right)\omega(x - \bar x)\, dx$

and

$\sigma_{x_k y_j} = \displaystyle\int \left(g_{x_k}(y, x) - \bar g_{x_k}\right)\left(g_{y_j}(y, x) - \bar g_{y_j}\right)\omega(x - \bar x)\, dx$
The matrices of centered cross products, $D$ and $C$, will be defined by

$D = \begin{pmatrix} \sigma_{x_1 x_1} & \sigma_{x_1 x_2} & \cdots & \sigma_{x_1 x_G} \\ \vdots & & & \vdots \\ \sigma_{x_G x_1} & \sigma_{x_G x_2} & \cdots & \sigma_{x_G x_G} \end{pmatrix} \qquad \text{and} \qquad C = \begin{pmatrix} \sigma_{x_1 y_1} & \sigma_{x_1 y_2} & \cdots & \sigma_{x_1 y_G} \\ \vdots & & & \vdots \\ \sigma_{x_G y_1} & \sigma_{x_G y_2} & \cdots & \sigma_{x_G y_G} \end{pmatrix}$
The matrix of ratios of derivatives, $A(y)$, will be defined by

$A(y) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1G} \\ \vdots & & & \vdots \\ a_{G1} & a_{G2} & \cdots & a_{GG} \end{pmatrix}$

The solution of the First Order Conditions for $A(y)$ results in the expression $D\, A(y) = C$. Since $(a, b)$ is the unique minimizer, the matrix $D$ must be invertible. It follows that $A(y)$ will be given by

(2.13) $\qquad A(y) = D^{-1}\, C$

The following theorems establish the conditions under which $(a, b)$ is the unique minimizer of $S(\cdot)$, which imply that $A(y)$ is given by (2.13) for the definitions of $D$ and $C$ that correspond in each case to the definition of the set $T_x$.
Theorem 2.2: Let $\left(\bar y, \bar x_{-g}\right)$ be given and let the compact set $T_x$ be included in the set $\left\{\left(\bar x_{-g}, t\right) \mid t \in T\right\}$. Suppose that Assumptions 2.1-2.3 and 2.4-2.6 are satisfied, and that the nonnegative function $\omega(x - \bar x)$ is strictly positive at least at one set of points $x^{(1)},\dots,x^{(G+1)}$ satisfying Condition I.1. Then, $(a, b)$ is the unique minimizer of

$S\left(\tilde a, \tilde b\right) = \displaystyle\int \left[\sum_{j=1}^{G}\left(g_{y_j} - \tilde a_{1j}\, g_{x_1} - \tilde a_{2j}\, g_{x_2} - \cdots - \tilde a_{Gj}\, g_{x_G} - \tilde b_j\right)^2\right]\omega(x - \bar x)\, dx$

and $A(y)$ is given by (2.13).
Theorem 2.3: Let $\bar y$ be given and let the compact set $T_x$ be included in the set $\left\{\left(x_1,\dots,x_G\right) \mid \left(x_1,\dots,x_G\right) \in T\right\}$. Suppose that Assumptions 2.1-2.3, 2.4', and 2.5-2.6 are satisfied, and that $\omega\left(x_1,\dots,x_G\right)$ is strictly positive at least at one set of points $x^{(1)},\dots,x^{(G+1)}$ satisfying Condition I.1. Then, $(a, b)$ is the unique minimizer of

$S\left(\tilde a, \tilde b\right) = \displaystyle\int \left[\sum_{j=1}^{G}\left(g_{y_j} - \tilde a_{1j}\, g_{x_1} - \tilde a_{2j}\, g_{x_2} - \cdots - \tilde a_{Gj}\, g_{x_G} - \tilde b_j\right)^2\right]\omega\left(x_1,\dots,x_G\right)\, dx$

and $A(y)$ is given by (2.13).
To obtain the estimators for $A(y)$, we note that each of the elements $\sigma_{x_k x_l}$ and $\sigma_{x_k y_j}$ in the matrices $D$ and $C$ can be estimated from the distribution of the observable variables, by substituting $f_{Y|X=x}(y)$ in all the expressions by a nonparametric estimator $\hat f_{Y|X=x}(y)$ for $f_{Y|X=x}(y)$. Denote such estimators by $\hat\sigma_{x_k x_l}$ and $\hat\sigma_{x_k y_j}$, and let $\hat D$ and $\hat C$ denote the matrices whose elements are, respectively, $\hat\sigma_{x_k x_l}$ and $\hat\sigma_{x_k y_j}$. Then, the estimator for the matrix of ratios of derivatives $A(y)$ is defined as

$\hat A(y) = \hat D^{-1}\, \hat C$
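In matrix form, the estimator parallels a standard Least Squares calculation: build $\hat D$ and $\hat C$ as weighted centered cross-products of estimated log-density derivatives, and solve a linear system. A minimal sketch under simplifying assumptions (the arrays `g_x` and `g_y` over a grid of $x$ values, and the weights `w`, are hypothetical stand-ins for kernel-based estimates and for $\omega(x - \bar x)$):

```python
import numpy as np

def average_derivative_A(g_x, g_y, w, dx):
    """A-hat = D-hat^{-1} C-hat from gridded log-density derivatives.

    g_x : (M, G) array, g_x[m, k] ~ d log f_{Y|X=x_m}(y) / d x_k
    g_y : (M, G) array, g_y[m, j] ~ d log f_{Y|X=x_m}(y) / d y_j
    w   : (M,) nonnegative weights (the omega function on the grid)
    dx  : grid cell volume used to approximate the integrals
    """
    wq = w * dx                     # quadrature weights, summing to ~1
    gx_bar = wq @ g_x               # weighted averages of g_x
    gy_bar = wq @ g_y
    cx = g_x - gx_bar               # centered derivatives
    cy = g_y - gy_bar
    D = cx.T @ (cx * wq[:, None])   # (G, G) matrix of sigma_{x_k x_l}
    C = cx.T @ (cy * wq[:, None])   # (G, G) matrix of sigma_{x_k y_j}
    return np.linalg.solve(D, C)    # A-hat
```

On synthetic derivatives that satisfy (2.12) exactly, the routine recovers the coefficient matrix exactly, since the weighted centered regression has no residual.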
2.4. Asymptotic Properties of the Estimator
In this section, we develop asymptotic properties for the estimator presented in Subsection 2.3, for the case when the estimator $\hat f_{Y|X=x}(y)$ for the conditional density of $Y$ given $X = x$ is obtained by kernel methods. We assume that for any $x \in T_x$, $f_X(x) > 0$. Let $\left\{\left(y^i, x^i\right)\right\}_{i=1}^{N}$ denote i.i.d. observations generated from $f_{Y,X}$. The kernel estimator is

$\hat f_{Y|X=x}(y) = \dfrac{\sum_{i=1}^{N} K\left(\dfrac{y - y^i}{h_N}, \dfrac{x - x^i}{h_N}\right)}{h_N^{G}\,\sum_{i=1}^{N} K\left(\dfrac{x - x^i}{h_N}\right)}$

where $K$ is a kernel function and $h_N$ is a bandwidth. The element in the $k$-th row, $l$-th column of our estimator for $D$ is

$\hat\sigma_{x_k x_l} = \displaystyle\int \left(\hat g_{x_k}(y, x) - \bar{\hat g}_{x_k}\right)\left(\hat g_{x_l}(y, x) - \bar{\hat g}_{x_l}\right)\omega(x - \bar x)\, dx$

where for $k = 1,\dots,G$,

$\hat g_{x_k}(y, x) = \dfrac{\partial \log \hat f_{Y|X=x}(y)}{\partial x_k}$

and

$\bar{\hat g}_{x_k} = \displaystyle\int \dfrac{\partial \log \hat f_{Y|X=x}(y)}{\partial x_k}\,\omega(x - \bar x)\, dx$
Similarly, the element in the $k$-th row, $j$-th column of our estimator for $C$ is

$\hat\sigma_{x_k y_j} = \displaystyle\int \left(\hat g_{x_k}(y, x) - \bar{\hat g}_{x_k}\right)\left(\hat g_{y_j}(y, x) - \bar{\hat g}_{y_j}\right)\omega(x - \bar x)\, dx$

where for $j = 1,\dots,G$,

$\hat g_{y_j}(y, x) = \dfrac{\partial \log \hat f_{Y|X=x}(y)}{\partial y_j}$

and

$\bar{\hat g}_{y_j} = \displaystyle\int \dfrac{\partial \log \hat f_{Y|X=x}(y)}{\partial y_j}\,\omega(x - \bar x)\, dx$
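The building block in these expressions is the derivative of a kernel log-density estimate. With a product-Gaussian kernel the derivative is available in closed form; a minimal sketch (using a second-order Gaussian kernel rather than the higher-order kernel the assumptions require; `data` is an $N \times d$ array of observations):

```python
import numpy as np

def log_density_and_grad(point, data, h):
    """Kernel estimate of log f at `point` and its gradient.

    Gaussian product kernel:
      f-hat(u) = (1/(N c)) sum_i exp(-0.5 * ||(u - u_i)/h||^2),
    with c = (h sqrt(2 pi))^d; the gradient follows in closed form.
    """
    N, d = data.shape
    z = (point - data) / h                        # (N, d) standardized
    kvals = np.exp(-0.5 * (z ** 2).sum(axis=1))   # kernel weights
    c = N * (h * np.sqrt(2 * np.pi)) ** d
    f = kvals.sum() / c
    # d f-hat / d u_j carries a factor -z_ij / h inside the sum
    grad_f = (kvals[:, None] * (-z / h)).sum(axis=0) / c
    return np.log(f), grad_f / f                  # log f-hat, grad log f-hat
```

For standard normal data the smoothed density approaches $N(0, 1 + h^2)$, so the estimated gradient of the log density at $u$ should be close to $-u/(1+h^2)$.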
We will let $\mathcal{Y}$ denote a convex and compact set such that the value $y$ at which we estimate $A(y)$ is an interior point of $\mathcal{Y}$, and we will let $\mathcal{X}$ be a convex and compact set such that $T_x$ is strictly in the interior of $\mathcal{X}$. Our result uses the following assumptions.
Assumption 2.7: The density $f_{Y,X}$ generated by $f_\varepsilon$ and $r$ is bounded and continuously differentiable of order $s \geq \kappa + 2$, where $\kappa$ denotes the order of the kernel function. Moreover, there exists $\delta > 0$ such that for all $(y, x) \in \mathcal{Y} \times \mathcal{X}$, $f_X(x) \geq \delta$ and $f_{Y,X}(y, x) \geq \delta$.
Assumption 2.8: The set $T_x$ is compact. The function $\omega(\cdot\,,\cdot)$ is bounded and continuously differentiable, with strictly positive values at all $(y, x)$ such that $x$ belongs to the interior of $T_x$, with values and derivatives equal to zero at all $(y, x)$ such that the value of any coordinate of $x$ lies on the boundary of $T_x$, and equal to zero at all $(\tilde y, x)$ that belong to the complement of $\{y\} \times T_x$. The set $\{y\} \times T_x$ contains at least one set of points $x^{(1)},\dots,x^{(G+1)}$ satisfying Condition I.1 and such that $\omega$ is strictly positive at each of those points.
Assumption 2.9: The kernel function $K$ is of order $\kappa$, where $\kappa + 2 \leq s$. It attains the value zero outside a compact set, integrates to 1, is differentiable of order $\Delta$, and its derivatives of order $\Delta$ are Lipschitz, where $\Delta \geq 2$.
Assumption 2.10: The sequence of bandwidths, $h_N$, is such that $h_N \to 0$, $N h_N^{G+2} \to \infty$, $\sqrt{N h_N^{G+2}}\; h_N^{\kappa} \to 0$, $N h_N^{(2G)+1}/\ln(N) \to \infty$, and $\sqrt{N h_N^{G+2}}\left[\sqrt{\ln(N)/\left(N h_N^{(2G)+1}\right)} + h_N^{\kappa}\right]^2 \to 0$.
To describe the asymptotic behavior of our estimator, we will denote by $\beta$ the vector in $R^{G^2}$ formed by stacking the columns of $A(y)$, so that $\beta = \mathrm{vec}\left(A(y)\right) = \left(a_{11},\dots,a_{G1};\ a_{12},\dots,a_{G2};\ \dots;\ a_{1G},\dots,a_{GG}\right)'$. Let $\hat\beta$ denote the estimator for $\beta$. Accordingly, we will denote the matrix $I_G \otimes D$ by $\bar D$ and its estimator $I_G \otimes \hat D$ by $\hat{\bar D}$. The vector $\bar C$ will be the vector formed by stacking the columns of $C$: $\bar C = \left(\sigma_{x_1 y_1},\dots,\sigma_{x_G y_1};\ \sigma_{x_1 y_2},\dots,\sigma_{x_G y_2};\ \dots;\ \sigma_{x_1 y_G},\dots,\sigma_{x_G y_G}\right)'$, with its estimator $\hat{\bar C}$ defined by substituting each coordinate by its estimator. For each $k$, denote

$\Delta_{x_k} \log f_{Y|X=x}(y) = \dfrac{\partial \log f_{Y|X=x}(y)}{\partial x_k} - \displaystyle\int \dfrac{\partial \log f_{Y|X=x}(y)}{\partial x_k}\,\omega(x - \bar x)\, dx$

and for each pair $j, m$ denote

$q_{jm} = \displaystyle\int \left[\int \left(\dfrac{\partial K(\tilde y, \tilde x)}{\partial \tilde y_j}\right) d\tilde x\right]\left[\int \left(\dfrac{\partial K(\tilde y, \tilde x)}{\partial \tilde y_m}\right) d\tilde x\right] d\tilde y$
In the proof of Theorem 2.4, which we present in the Appendix, we show that under our assumptions

$\sqrt{N h_N^{G+2}}\left(\hat{\bar C} - \bar D\,\beta\right) \to N\left(0, V\right)$

where the element of $V$ corresponding to the covariance between $\hat\sigma_{x_k y_j}$ and $\hat\sigma_{x_l y_m}$ is

$\left\{\displaystyle\int \left(\Delta_{x_k}\log f_{Y|X=x}(y)\right)\left(\Delta_{x_l}\log f_{Y|X=x}(y)\right)\left(\dfrac{\omega(x - \bar x)^2}{f_{Y,X}(y, x)}\right) dx\right\}\, q_{jm}$

Denote by $\hat V$ the matrix whose elements are

$\left\{\displaystyle\int \left(\Delta_{x_k}\log \hat f_{Y|X=x}(y)\right)\left(\Delta_{x_l}\log \hat f_{Y|X=x}(y)\right)\left(\dfrac{\omega(x - \bar x)^2}{\hat f_{Y,X}(y, x)}\right) dx\right\}\, q_{jm}$
The following theorem is proved in the Appendix.

Theorem 2.4: Suppose that the model satisfies Assumptions 2.1-2.3, 2.4', and 2.5-2.10. Then,

$\sqrt{N h_N^{G+2}}\left(\hat\beta - \beta\right) \to N\left(0,\ \left(\bar D\right)^{-1} V \left(\bar D\right)^{-1}\right)$

and $\left(\hat{\bar D}\right)^{-1} \hat V \left(\hat{\bar D}\right)^{-1}$ is a consistent estimator for $\left(\bar D\right)^{-1} V \left(\bar D\right)^{-1}$.

If Assumption 2.4' is substituted by Assumption 2.4, it can be shown, by adapting the assumptions and proofs of Theorems 2.4 and 3.2, that the rate of convergence of the estimator for $\beta$ is $\sqrt{N h_N^{(2G)+1}}$.
3. The Model with Two Equations and One Instrument
3.1. The Model
The model considered in this and the following section is

(3.1)
$y_1 = m^1\left(y_2, \varepsilon_1\right)$
$y_2 = m^2\left(y_1, x, \varepsilon_2\right)$

where $(y_1, y_2, x)$ is observable and $(\varepsilon_1, \varepsilon_2)$ is unobservable. In this section, we will develop an estimator for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$, analogous to the estimator in Section 2. In the next section, we will develop a two-step procedure to estimate $\partial m^1(y_2, \varepsilon_1)/\partial y_2$. We assume that $m^1$ is either strictly increasing or strictly decreasing in $\varepsilon_1$, to guarantee the existence of a function $r^1$ such that for all $(y_1, y_2)$, $y_1 = m^1\left(y_2, r^1(y_1, y_2)\right)$. Differentiating this expression with respect to $y_1$ and $y_2$, it follows that for any given $(y_1, y_2)$

$\dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2} = -\,\dfrac{r^1_{y_2}(y_1, y_2)}{r^1_{y_1}(y_1, y_2)}$

where $\varepsilon_1$ is the unknown but unique value satisfying $y_1 = m^1(y_2, \varepsilon_1)$, and $r^1_{y_1}$ and $r^1_{y_2}$ denote the partial derivatives of $r^1$ with respect to $y_1$ and $y_2$. Similarly, we assume that $m^2$ is either strictly increasing or strictly decreasing in $\varepsilon_2$, to guarantee the existence of $r^2$ such that for all $(y_1, y_2, x)$, $y_2 = m^2\left(y_1, x, r^2(y_1, y_2, x)\right)$. The additional assumptions guarantee that, for all $(y_1, y_2, x)$ on a set $V_0$ in the support of $(Y, X)$, $Y = (Y_1, Y_2)$,

(3.2) $\qquad f_{Y|X=x}(y_1, y_2) = f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2),\ r^2(y_1, y_2, x)\right)\left|\dfrac{\partial r(y_1, y_2, x)}{\partial (y_1, y_2)}\right|$

Assumption 3.1: The function $m^1$ is invertible in $\varepsilon_1$ and the function $m^2$ is invertible in $\varepsilon_2$. The vector function $r = \left(r^1, r^2\right)$ is twice continuously differentiable. The derivative of $r^2$ with respect to $x$ is bounded away from zero. Conditional on $x$, the function $r$ is 1-1, onto $R^2$, and, as a function of $(y_1, y_2)$, the Jacobian determinant is positive and bounded away from zero.

Assumption 3.2: $(\varepsilon_1, \varepsilon_2)$ is distributed independently of $X$, with an everywhere positive and twice continuously differentiable density $f_{\varepsilon_1\varepsilon_2}$.

Assumption 3.3: $X$ possesses a differentiable density $f_X$.
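The identity $\partial m^1/\partial y_2 = -\,r^1_{y_2}/r^1_{y_1}$ can be verified numerically for any invertible example. A sketch with a hypothetical nonlinear inverse function $r^1(y_1, y_2) = y_1^3 + y_1 - 0.75\, y_2$, strictly increasing in $y_1$:

```python
from scipy.optimize import brentq

def r1(y1, y2):
    return y1 ** 3 + y1 - 0.75 * y2      # strictly increasing in y1

def m1(y2, eps1):
    # y1 = m1(y2, eps1) solves r1(y1, y2) = eps1
    return brentq(lambda y1: r1(y1, y2) - eps1, -10.0, 10.0)

y2, eps1, d = 1.0, 0.3, 1e-6
fd = (m1(y2 + d, eps1) - m1(y2 - d, eps1)) / (2 * d)   # finite difference
y1 = m1(y2, eps1)
analytic = 0.75 / (3 * y1 ** 2 + 1)                    # -r1_y2 / r1_y1
assert abs(fd - analytic) < 1e-5
```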
3.2. Observational Equivalence
Our observational equivalence result for model (3.1) involves functions $\mu(y, x)$ and $\tilde\mu(y, x)$, analogous to the functions $b$ and $\tilde b$ in Section 2. Writing $|J(y,x)| = r^1_{y_1}(y)\, r^2_{y_2}(y, x) - r^1_{y_2}(y)\, r^2_{y_1}(y, x)$ for the Jacobian determinant, and denoting its partial derivatives by subscripts, these are defined by

(3.3) $\qquad \mu(y, x) = -\,\dfrac{r^1_{y_2}(y)}{r^1_{y_1}(y)}\,\dfrac{|J(y, x)|_{y_1}}{|J(y, x)|} \;-\; \dfrac{|J(y, x)|_{x}}{r^1_{y_1}(y)\, r^2_{x}(y, x)} \;+\; \dfrac{|J(y, x)|_{y_2}}{|J(y, x)|}$

and

$\tilde\mu(y, x) = -\,\dfrac{\tilde r^1_{y_2}(y)}{\tilde r^1_{y_1}(y)}\,\dfrac{|\tilde J(y, x)|_{y_1}}{|\tilde J(y, x)|} \;-\; \dfrac{|\tilde J(y, x)|_{x}}{\tilde r^1_{y_1}(y)\, \tilde r^2_{x}(y, x)} \;+\; \dfrac{|\tilde J(y, x)|_{y_2}}{|\tilde J(y, x)|}$
Let $\Gamma_0$ denote the set of functions $r$ that satisfy Assumption 3.1. We define observational equivalence within $\Gamma_0$ over a subset, $V_0$, of the support of the vector of observable variables.

Definition 3.1: Let $V_0$ denote a subset of the support of $(Y, X)$ such that for all $(y, x) \in V_0$, $f_{Y,X}(y, x) \geq \delta_2$, where $\delta_2$ is any positive constant. A function $\tilde r \in \Gamma_0$ is observationally equivalent to $r \in \Gamma_0$ on $V_0$ if there exist densities $f_{\varepsilon_1\varepsilon_2}$ and $\tilde f_{\varepsilon_1\varepsilon_2}$ satisfying Assumption 3.2 and such that for all $(y, x) \in V_0$

(3.4) $\qquad f_{\varepsilon_1\varepsilon_2}\left(r(y, x)\right)\left|\dfrac{\partial r(y, x)}{\partial y}\right| = \tilde f_{\varepsilon_1\varepsilon_2}\left(\tilde r(y, x)\right)\left|\dfrac{\partial \tilde r(y, x)}{\partial y}\right|$
Our observational equivalence result for the model (3.1) is given in the following theorem.

Theorem 3.1: Suppose that $\left(r, f_{\varepsilon_1\varepsilon_2}, f_X\right)$ generates $f_{Y|X}$ and that Assumptions 3.1-3.3 are satisfied. A function $\tilde r \in \Gamma_0$ is observationally equivalent to $r \in \Gamma_0$ on $V_0$ if and only if for all $(y, x) \in V_0$

(3.5) $\qquad 0 = \left(\dfrac{r^1_{y_2}}{r^1_{y_1}} - \dfrac{\tilde r^1_{y_2}}{\tilde r^1_{y_1}}\right) g_{y_1} \;-\; \left(\dfrac{|\tilde J|}{\tilde r^1_{y_1}\,\tilde r^2_{x}} - \dfrac{|J|}{r^1_{y_1}\, r^2_{x}}\right) g_{x} \;+\; \left(\mu - \tilde\mu\right)$

where for each $(y, x)$, $r = r(y, x)$, $\tilde r = \tilde r(y, x)$, $r^2_x = r^2_x(y, x)$, $\tilde r^2_x = \tilde r^2_x(y, x)$, $|J| = |J(y, x)|$, $|\tilde J| = |\tilde J(y, x)|$, $g_{y_1} = g_{y_1}(y, x) = \partial \log f_{Y|X=x}(y)/\partial y_1$, $g_x = g_x(y, x) = \partial \log f_{Y|X=x}(y)/\partial x$, and where $\mu = \mu(y, x)$ and $\tilde\mu = \tilde\mu(y, x)$ are as defined in (3.3).

When comparing Theorem 3.1 with Theorem 2.1, note that the lack of one exclusive regressor in the first equation has reduced the number of equations by one. The derivative of $\log f_{Y|X=x}(y)$ with respect to $y_1$ has taken up the place that the derivative of $\log f_{Y|X=x}(y)$ with respect to $x_1$ would have taken. The ratio of derivatives of $r^1$ appears as a coefficient of $\partial \log f_{Y|X=x}(y)/\partial y_1$. The two ratios of derivatives of $r^2$, which in Theorem 2.1 appeared separately, each as a coefficient of a different derivative of $\log f_{Y|X=x}$, appear in (3.5) in one coefficient, of the form $\left[\left(r^2_{y_2}/r^2_{x}\right) - \left(r^2_{y_1}/r^2_{x}\right)\left(r^1_{y_2}/r^1_{y_1}\right)\right]$.
The proof of Theorem 3.1, presented in the Appendix, proceeds in a way similar to the one used to prove Theorem 2.1. Equation (3.2) is used to obtain an expression for the unobservable $\log f_{\varepsilon_1\varepsilon_2}\left(r(y, x)\right)$ in terms of the derivatives of the observable $\log f_{Y|X=x}(y)$. The resulting expression is used to substitute $\log f_{\varepsilon_1\varepsilon_2}\left(r(y, x)\right)$ in the observational equivalence result in Matzkin (2008, Theorem 3.2). After manipulation of the equations, the resulting expression is (3.5). The main difference between the two proofs is that, in the two equations, one instrument model, the expression for $\log f_{\varepsilon_1\varepsilon_2}\left(r(y, x)\right)$ involves not only the derivative of $\log f_{Y|X=x}(y)$ with respect to the exogenous variable $x$, but also the derivative of $\log f_{Y|X=x}(y)$ with respect to the endogenous variable $y_1$.
Theorem 3.1 together with (3.2) can be used to develop constructive identification results for features of $r$. Differentiating both sides of (3.2) with respect to $y_1$, $y_2$, and $x$ gives

(3.6) $\qquad \dfrac{\partial \log f_{Y|X=x}(y)}{\partial y_1} = \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1, r^2\right)}{\partial \varepsilon_1}\, r^1_{y_1} + \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1, r^2\right)}{\partial \varepsilon_2}\, r^2_{y_1} + \dfrac{|J|_{y_1}}{|J|}$

(3.7) $\qquad \dfrac{\partial \log f_{Y|X=x}(y)}{\partial y_2} = \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1, r^2\right)}{\partial \varepsilon_1}\, r^1_{y_2} + \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1, r^2\right)}{\partial \varepsilon_2}\, r^2_{y_2} + \dfrac{|J|_{y_2}}{|J|}$

(3.8) $\qquad \dfrac{\partial \log f_{Y|X=x}(y)}{\partial x} = \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1, r^2\right)}{\partial \varepsilon_2}\, r^2_{x} + \dfrac{|J|_{x}}{|J|}$

Equation (3.8) reflects the fact that $x$ is exclusive to $r^2$. Solving for $\partial \log f_{\varepsilon_1\varepsilon_2}\left(r(y,x)\right)/\partial \varepsilon_2$ from (3.8) and substituting into (3.6) and (3.7) gives

(3.9) $\qquad \dfrac{\partial \log f_{Y|X=x}(y)}{\partial y_1} = \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1, r^2\right)}{\partial \varepsilon_1}\, r^1_{y_1} + \dfrac{r^2_{y_1}}{r^2_{x}}\,\dfrac{\partial \log f_{Y|X=x}(y)}{\partial x} + \dfrac{|J|_{y_1}}{|J|} - \dfrac{r^2_{y_1}}{r^2_{x}}\,\dfrac{|J|_{x}}{|J|}$

(3.10) $\qquad \dfrac{\partial \log f_{Y|X=x}(y)}{\partial y_2} = \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1, r^2\right)}{\partial \varepsilon_1}\, r^1_{y_2} + \dfrac{r^2_{y_2}}{r^2_{x}}\,\dfrac{\partial \log f_{Y|X=x}(y)}{\partial x} + \dfrac{|J|_{y_2}}{|J|} - \dfrac{r^2_{y_2}}{r^2_{x}}\,\dfrac{|J|_{x}}{|J|}$

Solving for $\partial \log f_{\varepsilon_1\varepsilon_2}\left(r(y,x)\right)/\partial \varepsilon_1$ from (3.9) and substituting into (3.10) gives

(3.11) $\qquad \dfrac{\partial \log f_{Y|X=x}(y)}{\partial y_2} = \dfrac{r^1_{y_2}}{r^1_{y_1}}\,\dfrac{\partial \log f_{Y|X=x}(y)}{\partial y_1} + \left[\dfrac{r^2_{y_2}}{r^2_{x}} - \dfrac{r^1_{y_2}}{r^1_{y_1}}\,\dfrac{r^2_{y_1}}{r^2_{x}}\right]\dfrac{\partial \log f_{Y|X=x}(y)}{\partial x} + \mu(y, x)$

Rearranging terms, noting that the coefficient in brackets equals $|J(y,x)|/\left(r^1_{y_1}\, r^2_{x}\right)$, and substituting $\partial \log f_{Y|X=x}(y)/\partial y_j$ by $g_{y_j}$ and $\partial \log f_{Y|X=x}(y)/\partial x$ by $g_x$, we get that

(3.12) $\qquad g_{y_2}(y, x) = \dfrac{r^1_{y_2}}{r^1_{y_1}}\, g_{y_1}(y, x) + \left[\dfrac{|J(y, x)|}{r^1_{y_1}\, r^2_{x}}\right] g_{x}(y, x) + \mu(y, x)$

Equation (3.12), together with (3.5), will be referred to in later sections to build upon them estimators for $\partial m^1(y_2, \varepsilon_1)/\partial y_2 = -\,r^1_{y_2}/r^1_{y_1}$.
3.3. Average Derivative Estimator for the Two Equations, One Instrument Model
The result of the previous subsection can be used to obtain an estimator for the unknown values of the coefficients in equation (3.12), in analogy to the estimator developed for the exclusive regressors model. Let $y$ be a given value of the endogenous variables. Suppose it is known that Assumption 2.4 is satisfied or, more generally, that on a subset $T$ in the support of $X$ it is the case that $|J(y, x)|/\left(r^1_{y_1}\, r^2_{x}\right)$ and $\mu(y, x)$ are constant over $T$. Suppose also that the following condition is satisfied.

Condition I.2: There exist $x^{(1)}$, $x^{(2)}$, and $x^{(3)}$ in $T$ such that the rank of the matrix

$\begin{pmatrix} g^{(1)}_{y_1} & g^{(1)}_{x} & 1 \\ g^{(2)}_{y_1} & g^{(2)}_{x} & 1 \\ g^{(3)}_{y_1} & g^{(3)}_{x} & 1 \end{pmatrix}$

is 3, where for $k = 1, 2, 3$, $g^{(k)}_{y_1} = \partial \log f_{Y|X=x^{(k)}}(y)/\partial y_1$ and $g^{(k)}_{x} = \partial \log f_{Y|X=x^{(k)}}(y)/\partial x$.

Let $\omega(y_1, y_2, \cdot)$ be a continuous, nonnegative function with positive values at the points $x^{(1)}$, $x^{(2)}$, and $x^{(3)}$ and such that $\int \omega(y_1, y_2, x)\, dx = 1$, where the integral is over the set $\{(y_1, y_2)\} \times T$. Denote $s(y) = r^1_{y_2}(y)/r^1_{y_1}(y)$, $B(y) = |J(y, x)|/\left(r^1_{y_1}(y)\, r^2_{x}(y, x)\right)$, and $\mu(y) = \mu(y, x)$. In analogy to the development in the previous section, and to the proof of Theorem 2.2, the rank condition and the continuity of $\omega$ imply that the vector $\left(s(y), B(y), \mu(y)\right)$ is the unique minimizer of the function $S\left(\tilde s, \tilde B, \tilde\mu\right)$ defined by

$S\left(\tilde s, \tilde B, \tilde\mu\right) = \displaystyle\int \left(g_{y_2}(y, x) - \tilde s\, g_{y_1}(y, x) - \tilde B\, g_{x}(y, x) - \tilde\mu\right)^2 \omega(y, x)\, dx$
The first order conditions of this minimization are given by

$\begin{pmatrix} \int g_{y_1} g_{y_1}\,\omega & \int g_{y_1} g_{x}\,\omega & \int g_{y_1}\,\omega \\ \int g_{y_1} g_{x}\,\omega & \int g_{x} g_{x}\,\omega & \int g_{x}\,\omega \\ \int g_{y_1}\,\omega & \int g_{x}\,\omega & 1 \end{pmatrix}\begin{pmatrix} s(y) \\ B(y) \\ \mu(y) \end{pmatrix} = \begin{pmatrix} \int g_{y_2} g_{y_1}\,\omega \\ \int g_{y_2} g_{x}\,\omega \\ \int g_{y_2}\,\omega \end{pmatrix}$

where, for $j, k \in \{y_1, y_2, x\}$, $\int g_j g_k\,\omega = \int g_j(y, x)\, g_k(y, x)\,\omega(y, x)\, dx$ and $\int g_j\,\omega = \int g_j(y, x)\,\omega(y, x)\, dx$. The $3 \times 3$ matrix is the Hessian of the function $S$, which is constant over $\left(\tilde s, \tilde B, \tilde\mu\right)$. By the convexity of $S$ and the uniqueness of the minimizer, this matrix is positive definite and therefore invertible. Denote the centered cross-moments by $\sigma_{jk} = \int g_j g_k\,\omega - \left(\int g_j\,\omega\right)\left(\int g_k\,\omega\right)$, writing the subscripts $1, 2, x$ for $y_1, y_2, x$. Solving for $\mu(y)$ and substituting into the first two equations, we get

$\begin{pmatrix} \sigma_{11} & \sigma_{1x} \\ \sigma_{1x} & \sigma_{xx} \end{pmatrix}\begin{pmatrix} s(y) \\ B(y) \end{pmatrix} = \begin{pmatrix} \sigma_{12} \\ \sigma_{2x} \end{pmatrix}$

where the $2 \times 2$ matrix is positive definite. Solving for $s(y)$ we get

$s(y) = \dfrac{\sigma_{12}\,\sigma_{xx} - \sigma_{1x}\,\sigma_{2x}}{\sigma_{11}\,\sigma_{xx} - \left(\sigma_{1x}\right)^2}$

where the denominator is strictly positive. Replacing the terms in the expression for $s(y)$ by nonparametric estimators, we obtain the following nonparametric estimator for $s(y)$:

$\hat s(y) = \dfrac{\hat\sigma_{12}\,\hat\sigma_{xx} - \hat\sigma_{1x}\,\hat\sigma_{2x}}{\hat\sigma_{11}\,\hat\sigma_{xx} - \left(\hat\sigma_{1x}\right)^2}$
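The closed form for $\hat s(y)$ is Cramer's rule applied to the $2\times 2$ system after the constant $\tilde\mu$ has been concentrated out; equivalently, one can solve the full $3\times 3$ first-order-condition system directly. A sketch with hypothetical gridded derivative estimates `g1`, `g2`, `gx` and weights `w`:

```python
import numpy as np

def s_hat(g1, g2, gx, w):
    """Weighted-moment estimator of s(y) = r1_y2 / r1_y1.

    g1, g2, gx : (M,) arrays of d log f / d y1, d y2, d x over a grid of x
    w          : (M,) weights summing to 1 (the omega integral on the grid)
    """
    def cov(a, b):                      # centered weighted cross-product
        return w @ (a * b) - (w @ a) * (w @ b)
    s11, s1x, sxx = cov(g1, g1), cov(g1, gx), cov(gx, gx)
    s12, s2x = cov(g1, g2), cov(g2, gx)
    return (s12 * sxx - s1x * s2x) / (s11 * sxx - s1x ** 2)

# Consistency check against the full 3x3 first-order-condition system:
rng = np.random.default_rng(1)
g1, gx = rng.normal(size=30), rng.normal(size=30)
g2 = 0.4 * g1 + 1.3 * gx + 0.2          # (3.12) with s=0.4, B=1.3, mu=0.2
w = np.full(30, 1 / 30)
H = np.array([[w @ (g1 * g1), w @ (g1 * gx), w @ g1],
              [w @ (g1 * gx), w @ (gx * gx), w @ gx],
              [w @ g1,        w @ gx,        1.0   ]])
rhs = np.array([w @ (g2 * g1), w @ (g2 * gx), w @ g2])
assert np.allclose(np.linalg.solve(H, rhs)[0], s_hat(g1, g2, gx, w))
```

Because the synthetic derivatives satisfy (3.12) exactly, both routes return $s = 0.4$ exactly.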
In the next subsection we develop the asymptotic properties of $\hat s(y)$ when the estimators $\hat\sigma_{12}$, $\hat\sigma_{xx}$, $\hat\sigma_{1x}$, $\hat\sigma_{2x}$, and $\hat\sigma_{11}$ are obtained by replacing $f_{Y|X=x}(y)$ by a kernel estimator for $f_{Y|X=x}(y)$. By the relationship between the derivatives of $m^1$ and of $r^1$ shown in Section 2.1, it follows that, since $s(y) = r^1_{y_2}(y)/r^1_{y_1}(y)$,

$\widehat{\dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2}} = -\,\hat s(y) = -\,\dfrac{\hat\sigma_{12}\,\hat\sigma_{xx} - \hat\sigma_{1x}\,\hat\sigma_{2x}}{\hat\sigma_{11}\,\hat\sigma_{xx} - \left(\hat\sigma_{1x}\right)^2}$

Moreover, one can consider restrictions on $r^2$ guaranteeing that the coefficients $B(y)$ and $\mu(y)$ are constant over $T$. If the derivative $r^2_{x}(y, x)$ equals 1 for all $x$, $r^2(y, x)$ is of the form $r^2(y, x) = v(y) + x$, and then the coefficients $B(y)$ and $\mu(y)$ are constant over $T$. When $r^2(y, x) = v(y) + x$, $m^2$ is of the form $y_2 = m^2\left(y_1, \varepsilon_2 - x\right)$. In analogy to Proposition 2.3, Condition I.2 is satisfied in this case if and only if, for $\varepsilon_1 = r^1(y_1, y_2)$ and for $\varepsilon_2^{(1)} = v(y) + x^{(1)}$, $\varepsilon_2^{(2)} = v(y) + x^{(2)}$, and $\varepsilon_2^{(3)} = v(y) + x^{(3)}$, the following matrix has rank 3:
$\begin{pmatrix} \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^{(1)}\right)}{\partial \varepsilon_1} & \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^{(1)}\right)}{\partial \varepsilon_2} & 1 \\[8pt] \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^{(2)}\right)}{\partial \varepsilon_1} & \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^{(2)}\right)}{\partial \varepsilon_2} & 1 \\[8pt] \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^{(3)}\right)}{\partial \varepsilon_1} & \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^{(3)}\right)}{\partial \varepsilon_2} & 1 \end{pmatrix}$
3.4. Asymptotic Properties of the Estimator
Suppose that when calculating $\hat s(y)$, the estimators $\hat\sigma_{12}$, $\hat\sigma_{xx}$, $\hat\sigma_{1x}$, $\hat\sigma_{2x}$, and $\hat\sigma_{11}$ are obtained by replacing $f_{Y|X=x}(y)$ by a kernel estimator for $f_{Y|X=x}(y)$. As in Section 2.4, define

$\Delta_{x} \log f_{Y|X=x}(y) = \dfrac{\partial \log f_{Y|X=x}(y)}{\partial x} - \displaystyle\int \dfrac{\partial \log f_{Y|X=x}(y)}{\partial x}\,\omega(y, x)\, dx$

and for $j = 1, 2$ define

$\Delta_{y_j} \log f_{Y|X=x}(y) = \dfrac{\partial \log f_{Y|X=x}(y)}{\partial y_j} - \displaystyle\int \dfrac{\partial \log f_{Y|X=x}(y)}{\partial y_j}\,\omega(y, x)\, dx$

Let $N(y) = \sigma_{12}\,\sigma_{xx} - \sigma_{1x}\,\sigma_{2x}$ and $D(y) = \sigma_{11}\,\sigma_{xx} - \sigma_{1x}^2$, and let

$\tilde e_1(y, x) = \left(\Delta_{y_2}\log f_{Y|X=x}(y)\right)\left[\dfrac{\sigma_{xx}\,\omega(y, x)}{D(y)}\right]$

$\tilde e_2(y, x) = -\left(\Delta_{x}\log f_{Y|X=x}(y)\right)\left[\dfrac{\sigma_{2x}\,\omega(y, x)}{D(y)}\right]$

$\tilde e_3(y, x) = -2\left(\Delta_{y_1}\log f_{Y|X=x}(y)\right)\left[\dfrac{\sigma_{xx}\,\omega(y, x)\, N(y)}{D(y)^2}\right]$

$\tilde e_4(y, x) = 2\left(\Delta_{x}\log f_{Y|X=x}(y)\right)\left[\dfrac{\sigma_{1x}\,\omega(y, x)\, N(y)}{D(y)^2}\right]$

$\tilde e_5(y, x) = \left(\Delta_{y_1}\log f_{Y|X=x}(y)\right)\left[\dfrac{\sigma_{xx}\,\omega(y, x)}{D(y)}\right]$

$\tilde e_6(y, x) = -\left(\Delta_{x}\log f_{Y|X=x}(y)\right)\left[\dfrac{\sigma_{1x}\,\omega(y, x)}{D(y)}\right]$

Define

$\psi_1(y, x) = \left[\tilde e_1(y, x) + \tilde e_2(y, x) + \tilde e_3(y, x) + \tilde e_4(y, x)\right]$
$\psi_2(y, x) = \left[\tilde e_5(y, x) + \tilde e_6(y, x)\right]$
$\psi(y, x) = \left(\psi_1(y, x),\ \psi_2(y, x)\right)'$

Define the estimator $\hat\psi(y, x) = \left(\hat\psi_1(y, x), \hat\psi_2(y, x)\right)'$ for $\psi(y, x)$ by substituting $f_{Y|X=x}$ by $\hat f_{Y|X=x}$ and each $\sigma$ by $\hat\sigma$ in the definitions of $\tilde e_1,\dots,\tilde e_6$. Let $Q_K$ denote the $2 \times 2$ matrix whose $(j, k)$ element is

$\displaystyle\int \left[\int \left(\dfrac{\partial K(\tilde y, \tilde x)}{\partial \tilde y_j}\right) d\tilde x\right]\left[\int \left(\dfrac{\partial K(\tilde y, \tilde x)}{\partial \tilde y_k}\right) d\tilde x\right] d\tilde y$

and denote

$\Omega(y) = \displaystyle\int \dfrac{\psi(y, x)'\, Q_K\,\psi(y, x)}{f_{Y,X}(y, x)}\, dx \qquad \text{and} \qquad \hat\Omega(y) = \displaystyle\int \dfrac{\hat\psi(y, x)'\, Q_K\,\hat\psi(y, x)}{\hat f_{Y,X}(y, x)}\, dx$
We will make the following assumptions:

Assumption 3.4: There exists a known convex and compact set $T$ in the interior of the support of $X$ on which the values of $|J(y, x)|/\left(r^1_{y_1}(y)\, r^2_{x}(y, x)\right)$ and $\mu(y, x)$ are constant.

Assumption 3.5: There exists at least one set of points $x^{(1)}, x^{(2)}, x^{(3)}$ in $T$ satisfying Condition I.2. The function $\omega(y, x) = \omega(y_1, y_2, x)$ is bounded and continuously differentiable, with values and derivatives equal to zero when $x$ is on the boundary and on the complement of $T$, and with strictly positive values at $x^{(1)}, x^{(2)}, x^{(3)}$.

The asymptotic behavior of $\hat s(y)$ defined in the previous section is established in the following theorem.

Theorem 3.2: Suppose that model (3.1) satisfies Assumptions 3.1-3.5, 2.7, 2.9, and 2.10. Then

$\sqrt{N h_N^{4}}\left(\hat s(y) - s(y)\right) \to N\left(0, \Omega(y)\right)$

and $\hat\Omega(y) \to \Omega(y)$ in probability.
4. Indirect Estimators
The estimators developed in Sections 2 and 3 averaged the information provided by the conditional density over a set of values of the exogenous observable variables. In contrast, the Indirect Estimators developed in this section focus on only one or two values of the exogenous variables. Those values are such that, when the conditional density of the endogenous variables is evaluated at them, one can directly read off the value of the elements of interest. Since the conditions for identification are not necessarily the same, indirect estimators may in some cases provide a useful alternative to the minimum distance estimators developed in Sections 2 and 3. In addition, by focusing on the particular values of the observable exogenous variables at which one can read off a structural element of interest, the estimators developed here allow one to obtain more insight into the relation between the exogenous and endogenous variables. Although we develop Indirect Estimators for the two equations, one instrument model, analogous two-step indirect estimators can be obtained for the exclusive regressors model of Section 2, employing (2.12).
4.1. Indirect Estimators for the Two Equations, One Instrument Model
We develop in this section two indirect estimators, one based on first derivatives and a second based on second derivatives, for the two equations, one instrument model

$y_1 = m^1\left(y_2, \varepsilon_1\right)$
$y_2 = m^2\left(y_1, x, \varepsilon_2\right)$

where interest lies in the derivative $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ for the value $\varepsilon_1 = r^1(y_1, y_2)$. Both estimators are derived from equation (3.12), and both are based on two steps. In the first step, the value or values of the exogenous variable are found at which the density of the observable variables satisfies some conditions. In the second step, the objects of interest are read off the density of the observable variables at the found values of the exogenous variable. We will assume, as in Section 3, that Assumption 2.4 is satisfied. This implies that the value of the vector $\theta = \left(\theta_1, \theta_2, \theta_3\right)'$, with $\theta_1 = r^1_{y_2}(y)/r^1_{y_1}(y)$, $\theta_2 = |J(y, x)|/\left(r^1_{y_1}(y)\, r^2_{x}(y, x)\right)$, and $\theta_3 = \mu(y, x)$, is constant over the set $T' \subset \{(y, x) \mid x \in T\}$, where $\mu(y, x)$ is as defined in (3.3). (The proof is analogous to that of Proposition 2.1.) We will consider two sets of assumptions, which substitute for Assumption 2.5. Assumption 4.5 implies invertibility of a $2 \times 2$ matrix whose two rows are the gradients of $\log f_{\varepsilon_1\varepsilon_2}$ at two points $\left(\varepsilon_1, \varepsilon_2^*\right)$ and $\left(\varepsilon_1, \varepsilon_2^{**}\right)$. Assumption 4.5' imposes a condition on the second order derivatives of $\log f_{\varepsilon_1\varepsilon_2}$ at one point $\left(\varepsilon_1, \varepsilon_2^*\right)$. Assumptions 4.6 and 4.6' guarantee
that there exist points $x^*$ and $x^{**}$ with $f_{Y|X=x^*}(y) > 0$ and $f_{Y|X=x^{**}}(y) > 0$ and such that the values of $\varepsilon_2$ at those values of $x$ are mapped into the values satisfying Assumptions 4.5 or 4.5'. In Propositions 4.1 and 4.2, we provide characterizations of these assumptions in terms of conditions on the observable density $f_{Y|X=x}$. We next employ these characterizations together with Theorem 3.1 to obtain expressions for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ in terms of the values of the derivatives or second derivatives of $f_{Y|X=x}$ at particular values of $x$.
Assumption 4.5: Let $y$ be given and fixed, and let $\varepsilon_1 = r^1(y_1, y_2)$. There exist two distinct values $\varepsilon_2^*(\varepsilon_1)$ and $\varepsilon_2^{**}(\varepsilon_1)$ of $\varepsilon_2$ such that

(4.1) $\qquad \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^*(\varepsilon_1)\right)}{\partial \varepsilon_2} = \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^{**}(\varepsilon_1)\right)}{\partial \varepsilon_2}$

and

(4.2) $\qquad \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^*(\varepsilon_1)\right)}{\partial \varepsilon_1} \neq \dfrac{\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^{**}(\varepsilon_1)\right)}{\partial \varepsilon_1}$
Assumption 4.5': Let $y$ be given and fixed, and let $\varepsilon_1 = r^1(y_1, y_2)$. There exists a value $\varepsilon_2^*(\varepsilon_1)$ of $\varepsilon_2$ such that

(4.3) $\qquad \dfrac{\partial^2 \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^*(\varepsilon_1)\right)}{\partial \varepsilon_2^2} = 0 \qquad \text{and} \qquad \dfrac{\partial^2 \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^*(\varepsilon_1)\right)}{\partial \varepsilon_2\,\partial \varepsilon_1} \neq 0$
Assumption 4.6: Let $y$ be given and fixed, and let $\varepsilon_1 = r^1(y_1, y_2)$. There exist distinct values $x^*$ and $x^{**}$ such that $(y, x^*), (y, x^{**}) \in T'$ and such that, for $\varepsilon_2^*(\varepsilon_1)$ and $\varepsilon_2^{**}(\varepsilon_1)$ as in Assumption 4.5, $\varepsilon_2^* = r^2(y_1, y_2, x^*)$ and $\varepsilon_2^{**} = r^2(y_1, y_2, x^{**})$.

Assumption 4.6': Let $y$ be given and fixed, and let $\varepsilon_1 = r^1(y_1, y_2)$. There exists a value $x^*$ such that $(y, x^*) \in T'$ and such that, for $\varepsilon_2^*(\varepsilon_1)$ as in Assumption 4.5', $\varepsilon_2^* = r^2(y_1, y_2, x^*)$.
The following propositions provide characterizations of Assumptions 4.5-4.6 and 4.5'-4.6' in terms of conditions on $f_{Y|X=x}$.

Proposition 4.1: Let $y$ be given and fixed, and let $\varepsilon_1 = r^1(y_1, y_2)$. Suppose that Assumptions 3.1-3.3 and 2.4 are satisfied. Assumptions 4.5-4.6 are satisfied if and only if there exist $x^*$ and $x^{**}$ such that $(y, x^*), (y, x^{**}) \in T'$ and

(4.4) $\qquad \dfrac{\partial \log f_{Y|X=x^*}(y)}{\partial x} = \dfrac{\partial \log f_{Y|X=x^{**}}(y)}{\partial x} \qquad \text{and} \qquad \dfrac{\partial \log f_{Y|X=x^*}(y)}{\partial y_1} \neq \dfrac{\partial \log f_{Y|X=x^{**}}(y)}{\partial y_1}$
Proposition 4.2: Let $y$ be given and fixed, and let $\varepsilon_1 = r^1(y_1, y_2)$. Suppose that Assumptions 3.1-3.3 and 2.4 are satisfied. Assumptions 4.5'-4.6' are satisfied if and only if there exists $x^*$ such that $(y, x^*) \in T'$ and

(4.5) $\qquad \dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x^2} = 0 \qquad \text{and} \qquad \dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x\,\partial y_1} \neq 0$
We next employ the implications of Propositions 4.1 and 4.2, together with Theorem 3.1 and equation (3.12), to obtain expressions for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ in terms of ratios of differences of derivatives of $\log f_{Y|X=x}$, or in terms of ratios of second derivatives of $\log f_{Y|X=x}$. We state these expressions in Theorems 4.1 and 4.2.
Theorem 4.1: Let $y$ be given and fixed, and let $\varepsilon_1 = r^1(y_1, y_2)$. Suppose that Assumptions 3.1-3.3 and 2.4 are satisfied. Let $x^*$ and $x^{**}$ be any distinct values of $x$ such that $(y, x^*), (y, x^{**}) \in T'$ and

(4.6) $\qquad \dfrac{\partial \log f_{Y|X=x^*}(y)}{\partial x} = \dfrac{\partial \log f_{Y|X=x^{**}}(y)}{\partial x} \qquad \text{and} \qquad \dfrac{\partial \log f_{Y|X=x^*}(y)}{\partial y_1} \neq \dfrac{\partial \log f_{Y|X=x^{**}}(y)}{\partial y_1}$

holds. Then,

(4.7) $\qquad \dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2} = -\,\dfrac{r^1_{y_2}(y)}{r^1_{y_1}(y)} = -\,\dfrac{\dfrac{\partial \log f_{Y|X=x^*}(y)}{\partial y_2} - \dfrac{\partial \log f_{Y|X=x^{**}}(y)}{\partial y_2}}{\dfrac{\partial \log f_{Y|X=x^*}(y)}{\partial y_1} - \dfrac{\partial \log f_{Y|X=x^{**}}(y)}{\partial y_1}}$
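Theorem 4.1 can be checked on an analytic toy density (hypothetical, chosen so that Assumption 4.5 holds: $\varepsilon_2$ is non-Gaussian and dependent on $\varepsilon_1$). Two values of $x$ with equal $\partial \log f/\partial x$ are located numerically, and the ratio of differences in (4.7) recovers $\partial m^1/\partial y_2 = a$:

```python
from scipy.optimize import brentq

a, b, c = 0.75, 0.5, 0.5     # m1 slope, m2 slope, epsilon cross-dependence

def logf(y1, y2, x):
    # log f_{Y|X=x}(y) up to a constant, for eps1 = y1 - a*y2,
    # eps2 = y2 - b*y1 - x, log f_eps = -e1^2/2 - c*e1*e2 - e2^4/4 - e2^3/3
    e1, e2 = y1 - a * y2, y2 - b * y1 - x
    return -e1 ** 2 / 2 - c * e1 * e2 - e2 ** 4 / 4 - e2 ** 3 / 3

y1, y2, d = 0.2, 0.1, 1e-5
gx = lambda x: (logf(y1, y2, x + d) - logf(y1, y2, x - d)) / (2 * d)
x_star = 0.4
# Find a second point with the same x-derivative of the log density (4.6):
x_sstar = brentq(lambda x: gx(x) - gx(x_star), -0.35, -0.2)
gy1 = lambda x: (logf(y1 + d, y2, x) - logf(y1 - d, y2, x)) / (2 * d)
gy2 = lambda x: (logf(y1, y2 + d, x) - logf(y1, y2 - d, x)) / (2 * d)
est = -(gy2(x_star) - gy2(x_sstar)) / (gy1(x_star) - gy1(x_sstar))
assert abs(est - a) < 1e-4    # matches dm1/dy2 = 0.75
```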
Proof: Let $\theta = \left(\theta_1, \theta_2, \theta_3\right)'$, where $\theta_1 = r^1_{y_2}(y)/r^1_{y_1}(y)$, $\theta_2 = |J(y, x)|/\left(r^1_{y_1}(y)\, r^2_{x}(y, x)\right)$, and where $\theta_3 = \mu(y, x)$ is as defined in (3.3). By Assumptions 3.1-3.3, $\theta$ satisfies (3.12). By Assumption 2.4, $\theta$ is constant over the set $\{(y, x) \mid x \in T\}$, since Assumption 2.4 implies that for all $x$, $r^2_{x}(y, x) = 1$ and $|J(y, x)|$ is not a function of $x$. For any two values $x^{(1)}$ and $x^{(2)}$ of $x$ such that $\left(y, x^{(1)}\right), \left(y, x^{(2)}\right) \in T'$, let $g^{(k)}_{y_2} = \partial \log f_{Y|X=x^{(k)}}(y)/\partial y_2$, $g^{(k)}_{y_1} = \partial \log f_{Y|X=x^{(k)}}(y)/\partial y_1$, and $g^{(k)}_{x} = \partial \log f_{Y|X=x^{(k)}}(y)/\partial x$, for $k = 1, 2$. By (3.12),

$g^{(1)}_{y_2} = \theta_1\, g^{(1)}_{y_1} + \theta_2\, g^{(1)}_{x} + \theta_3 \qquad \text{and} \qquad g^{(2)}_{y_2} = \theta_1\, g^{(2)}_{y_1} + \theta_2\, g^{(2)}_{x} + \theta_3$

Subtracting one from the other, we get that

$g^{(1)}_{y_2} - g^{(2)}_{y_2} = \theta_1\left(g^{(1)}_{y_1} - g^{(2)}_{y_1}\right) + \theta_2\left(g^{(1)}_{x} - g^{(2)}_{x}\right)$

If $g^{(1)}_{x} - g^{(2)}_{x} = 0$, the equation becomes

$g^{(1)}_{y_2} - g^{(2)}_{y_2} = \theta_1\left(g^{(1)}_{y_1} - g^{(2)}_{y_1}\right)$

If, in addition, $g^{(1)}_{y_1} - g^{(2)}_{y_1} \neq 0$, we have that

$\theta_1 = \left(g^{(1)}_{y_1} - g^{(2)}_{y_1}\right)^{-1}\left(g^{(1)}_{y_2} - g^{(2)}_{y_2}\right)$

Note that $\theta_1 = r^1_{y_2}(y)/r^1_{y_1}(y) = -\,\partial m^1(y_2, \varepsilon_1)/\partial y_2$. Hence,

$\dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2} = -\left(\dfrac{r^1_{y_2}}{r^1_{y_1}}\right) = -\,\dfrac{g^{(1)}_{y_2} - g^{(2)}_{y_2}}{g^{(1)}_{y_1} - g^{(2)}_{y_1}}$

Replacing $x^{(1)}$ by $x^*$ and $x^{(2)}$ by $x^{**}$, equation (4.7) follows. This completes the proof of Theorem 4.1.
Theorem 4.2: Let $y$ be given and fixed, and let $\varepsilon_1 = r^1(y_1, y_2)$. Suppose that Assumptions 3.1-3.3 and 2.4 are satisfied. Let $x^*$ be a value of $x$ such that $(y, x^*) \in T'$ and

(4.8) $\qquad \dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x^2} = 0 \qquad \text{and} \qquad \dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x\,\partial y_1} \neq 0$

Then,

(4.9) $\qquad \dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2} = -\,\dfrac{\dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x\,\partial y_2}}{\dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x\,\partial y_1}}$
Proof of Theorem 4.2: As in the proof of Theorem 4.1, we let $\theta = \left(\theta_1, \theta_2, \theta_3\right)'$, where $\theta_1 = r^1_{y_2}(y)/r^1_{y_1}(y)$, $\theta_2 = |J(y, x)|/\left(r^1_{y_1}(y)\, r^2_{x}(y, x)\right)$, and where $\theta_3 = \mu(y, x)$ is as defined in (3.3). We note that Assumption 2.4 implies that $\theta$ is constant over the set $\{(y, x) \mid x \in T\}$. Hence, by Assumptions 3.1-3.3 and 2.4, it follows by (3.12) that for any value $x^{(1)}$ such that $\left(y, x^{(1)}\right) \in T'$,

$g^{(1)}_{y_2} = \theta_1\, g^{(1)}_{y_1} + \theta_2\, g^{(1)}_{x} + \theta_3$

where $g^{(1)}_{y_2} = \partial \log f_{Y|X=x^{(1)}}(y)/\partial y_2$, $g^{(1)}_{y_1} = \partial \log f_{Y|X=x^{(1)}}(y)/\partial y_1$, and $g^{(1)}_{x} = \partial \log f_{Y|X=x^{(1)}}(y)/\partial x$. Since $\theta$ is constant over $T$, taking derivatives of this equation with respect to $x$ gives

$g^{(1)}_{y_2 x} = \theta_1\, g^{(1)}_{y_1 x} + \theta_2\, g^{(1)}_{x x}$

where $g^{(1)}_{y_2 x} = \partial^2 \log f_{Y|X=x^{(1)}}(y)/\partial x\,\partial y_2$, $g^{(1)}_{y_1 x} = \partial^2 \log f_{Y|X=x^{(1)}}(y)/\partial x\,\partial y_1$, and $g^{(1)}_{x x} = \partial^2 \log f_{Y|X=x^{(1)}}(y)/\partial x^2$. If $g^{(1)}_{x x} = 0$ and $g^{(1)}_{y_1 x} \neq 0$, we have that

$\theta_1 = \left(g^{(1)}_{y_1 x}\right)^{-1} g^{(1)}_{y_2 x}$

Since $\theta_1 = r^1_{y_2}(y)/r^1_{y_1}(y) = -\,\partial m^1(y_2, \varepsilon_1)/\partial y_2$, it follows that

$\dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2} = -\,\theta_1 = -\,\dfrac{g^{(1)}_{y_2 x}}{g^{(1)}_{y_1 x}}$

Replacing $x^{(1)}$ by $x^*$, equation (4.9) follows. This completes the proof of Theorem 4.2.
Our estimation methods for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$, under either Assumption 4.5 or 4.5', are closely related to our proofs of identification. When Assumptions 3.1-3.3, 2.4, and 4.5'-4.6' are satisfied, the estimator for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ is obtained by first estimating nonparametrically the derivatives $\partial^2 \log f_{Y|X=x}(y)/\partial x\,\partial y_1$, $\partial^2 \log f_{Y|X=x}(y)/\partial x\,\partial y_2$, and $\partial^2 \log f_{Y|X=x}(y)/\partial x^2$ at the particular value of $(y_1, y_2)$ at which we want to estimate $\partial m^1(y_2, \varepsilon_1)/\partial y_2$. The next step consists of finding a value $\hat x^*$ of $x$ satisfying

$\dfrac{\partial^2 \log \hat f_{Y|X=\hat x^*}(y)}{\partial x^2} = 0$

The estimator for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ is then defined by

(4.10) $\qquad \widehat{\dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2}} = -\,\dfrac{\dfrac{\partial^2 \log \hat f_{Y|X=\hat x^*}(y)}{\partial x\,\partial y_2}}{\dfrac{\partial^2 \log \hat f_{Y|X=\hat x^*}(y)}{\partial x\,\partial y_1}}$

We show below that when $\partial^2 \log f_{Y|X=x}(y)/\partial x\,\partial y_1$, $\partial^2 \log f_{Y|X=x}(y)/\partial x\,\partial y_2$, and $\partial^2 \log f_{Y|X=x}(y)/\partial x^2$ are estimated using kernel methods, the estimator for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ defined in this way is consistent and asymptotically normal.

When, instead of Assumptions 4.5'-4.6', we make Assumptions 4.5-4.6, our estimator for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ is obtained by first estimating nonparametrically $\partial \log f_{Y|X=x}(y)/\partial x$, $\partial \log f_{Y|X=x}(y)/\partial y_1$, and $\partial \log f_{Y|X=x}(y)/\partial y_2$ at the particular value of $(y_1, y_2)$ for which we want to estimate $\partial m^1(y_2, \varepsilon_1)/\partial y_2$. The next step consists of finding values $\hat x^*$ and $\hat x^{**}$ of $x$ satisfying

$\dfrac{\partial \log \hat f_{Y|X=\hat x^*}(y)}{\partial x} = \dfrac{\partial \log \hat f_{Y|X=\hat x^{**}}(y)}{\partial x}$

Our estimator for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ is then defined by

(4.11) $\qquad \widehat{\dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2}} = -\,\dfrac{\dfrac{\partial \log \hat f_{Y|X=\hat x^*}(y)}{\partial y_2} - \dfrac{\partial \log \hat f_{Y|X=\hat x^{**}}(y)}{\partial y_2}}{\dfrac{\partial \log \hat f_{Y|X=\hat x^*}(y)}{\partial y_1} - \dfrac{\partial \log \hat f_{Y|X=\hat x^{**}}(y)}{\partial y_1}}$

We develop below the asymptotic behavior of this estimator when the values of $x^*$ and $x^{**}$ are known to be such that $\partial \log f_{Y|X=x^*}(y)/\partial x = \partial \log f_{Y|X=x^{**}}(y)/\partial x = 0$. These correspond to values $\varepsilon_2^*$ and $\varepsilon_2^{**}$ at which $\partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^*\right)/\partial \varepsilon_2 = \partial \log f_{\varepsilon_1\varepsilon_2}\left(\varepsilon_1, \varepsilon_2^{**}\right)/\partial \varepsilon_2 = 0$. We show that the indirect estimator defined in this way is consistent and asymptotically normal.
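The two-step logic of (4.10) — find the root $\hat x^*$ of the second $x$-derivative of the log density, then take a ratio of mixed derivatives there — can be sketched directly on any smooth estimate of $\log f_{Y|X=x}(y)$. Below, `logf` is a hypothetical analytic stand-in for such an estimate (not the kernel estimate the paper uses), and derivatives are taken by central finite differences:

```python
from scipy.optimize import brentq

def indirect_estimate(logf, y1, y2, x_lo, x_hi, d=1e-4):
    """Two-step indirect estimator of dm1/dy2, following (4.10).

    logf(y1, y2, x) : smooth estimate of log f_{Y|X=x}(y1, y2)
    [x_lo, x_hi]    : bracket containing the root of d2 log f / dx2
    """
    gx = lambda y1_, y2_, x_: (logf(y1_, y2_, x_ + d) - logf(y1_, y2_, x_ - d)) / (2 * d)
    # Step 1: x* solves d^2 log f / dx^2 = 0.
    x_star = brentq(lambda x_: (gx(y1, y2, x_ + d) - gx(y1, y2, x_ - d)) / (2 * d), x_lo, x_hi)
    # Step 2: ratio of mixed second derivatives at x*.
    dxy2 = (gx(y1, y2 + d, x_star) - gx(y1, y2 - d, x_star)) / (2 * d)
    dxy1 = (gx(y1 + d, y2, x_star) - gx(y1 - d, y2, x_star)) / (2 * d)
    return -dxy2 / dxy1

# Hypothetical toy density satisfying Assumption 4.5' (non-Gaussian,
# dependent epsilons), for which dm1/dy2 = a = 0.75:
a, b, c = 0.75, 0.5, 0.5
def logf(y1, y2, x):
    e1, e2 = y1 - a * y2, y2 - b * y1 - x
    return -e1 ** 2 / 2 - c * e1 * e2 - e2 ** 4 / 4 - e2 ** 3 / 3

print(indirect_estimate(logf, 0.2, 0.1, -0.5, 0.5))   # close to 0.75
```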
4.2. Asymptotic Properties of the Indirect Estimators for the Two Equations, One Instrument Model
To derive the asymptotic properties of the estimator defined in (4.10), we make the following assumptions.

Assumption 4.7: The density $f_X$ and the density $f_{Y,X}$ generated by $\left(r, f_{\varepsilon_1\varepsilon_2}, f_X\right)$ are bounded and continuously differentiable of order $s$, where $s \geq 5 + \kappa$ and $\kappa$ denotes the order of the kernel function $K(\cdot)$ specified below in Assumption 4.10.

Assumption 4.8: For any $x'$ such that $\partial^2 \log f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x')\right)/\partial \varepsilon_2^2 = 0$, there exist neighborhoods $B'$ of $(y_1, y_2, x')$ and $B'_x$ of $x'$ such that the density $f_X(x)$ and the density $f_{Y,X}(y, x) = f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x)\right)\left|J(y_1, y_2, x)\right| f_X(x)$ are uniformly bounded away from zero on, respectively, $B'_x$ and $B'$, and $\partial^3 \log f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x)\right)/\partial \varepsilon_2^3$ is bounded away from zero on those neighborhoods.

Assumption 4.9: For any $x'$ such that $\partial^2 \log f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x')\right)/\partial \varepsilon_2^2 = 0$, $\partial^2 \log f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x')\right)/\partial \varepsilon_1\,\partial \varepsilon_2$ is uniformly bounded away from 0 on the neighborhood $B'$ defined in Assumption 4.8.

Assumption 4.10: The kernel function $K$ attains the value zero outside a compact set, integrates to 1, is of order $\kappa$, where $\kappa + 5 \leq s$, is differentiable of order $\Delta$, and its derivatives of order $\Delta$ are Lipschitz, where $\Delta \geq 5$.

Assumption 4.11: The sequence of bandwidths, $h_N$, is such that $h_N \to 0$, $\sqrt{N h_N^{7+2\kappa}} \to 0$, $N h_N^{7} \to \infty$, $N h_N^{11}/\ln(N) \to \infty$, and $\sqrt{N h_N^{7}}\left[\sqrt{\ln(N)/\left(N h_N^{11}\right)} + h_N^{\kappa}\right]^2 \to 0$.
Assumptions 4.7, 4.10, and 4.11 are standard for derivations of asymptotic results for kernel estimators. Assumptions 4.8 and 4.9 are made to guarantee appropriate asymptotic behavior of the estimator for the value $x^*$ at which $\widehat{\partial m^1(y_2, \varepsilon_1)/\partial y_2}$ is calculated. Define

$\varphi_1(\tilde y_1, \tilde y_2, \tilde x) = \dfrac{\partial^2 K(\tilde y_1, \tilde y_2, \tilde x)}{\partial \tilde y_1\,\partial \tilde x}, \qquad \varphi_2(\tilde y_1, \tilde y_2, \tilde x) = \dfrac{\partial^2 K(\tilde y_1, \tilde y_2, \tilde x)}{\partial \tilde y_2\,\partial \tilde x}, \qquad \varphi_x(\tilde y_1, \tilde y_2, \tilde x) = \dfrac{\partial^2 K(\tilde y_1, \tilde y_2, \tilde x)}{\partial \tilde x^2}$

and let $\varphi(\tilde y_1, \tilde y_2, \tilde x)$ denote the $3 \times 1$ vector $\left(\varphi_1, \varphi_2, \varphi_x\right)'$. Define the vector $c(x^*) = \left(c_1, c_2, c_3\right)'$, where

$c_1 = \dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x\,\partial y_2}\left[\dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x\,\partial y_1}\right]^{-2}; \qquad c_2 = -\left[\dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x\,\partial y_1}\right]^{-1};$

$c_3 = \left(\dfrac{\partial}{\partial x}\left[-\,\dfrac{\partial^2 \log f_{Y|X=x}(y)/\partial x\,\partial y_2}{\partial^2 \log f_{Y|X=x}(y)/\partial x\,\partial y_1}\right]\Bigg|_{x=x^*}\right)\left[-\,\dfrac{\partial^3 \log f_{Y|X=x^*}(y)}{\partial x^3}\right]^{-1}$

The first two coordinates account for the estimation of the two mixed second derivatives at $x^*$; the third accounts for the estimation of $x^*$ itself, whose first order behavior is governed, by the Implicit Function Theorem, by the third derivative $\partial^3 \log f_{Y|X=x^*}(y)/\partial x^3$. Let

$V_\theta = c(x^*)'\left[\displaystyle\int \varphi(\tilde y_1, \tilde y_2, \tilde x)\,\varphi(\tilde y_1, \tilde y_2, \tilde x)'\, d\tilde y_1\, d\tilde y_2\, d\tilde x\right] c(x^*)\ \dfrac{1}{f_{Y,X}(y, x^*)}$

and let

$\hat V_\theta = \hat c(\hat x^*)'\left[\displaystyle\int \varphi\,\varphi'\right] \hat c(\hat x^*)\ \dfrac{1}{\hat f_{Y,X}(y, \hat x^*)}$

be an estimator for $V_\theta$, obtained by substituting $f_{Y|X=x}$ by $\hat f_{Y|X=x}$ in the definitions of $c_1$, $c_2$, and $c_3$ and substituting $x^*$ by $\hat x^*$. In the Appendix we prove
Theorem 4.3: Suppose that the model satisfies Assumptions 3.1-3.3, 2.4, 4.5'-4.6', and 4.7-4.11. Let the estimator for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ be as defined in (4.10). Then,

$\sqrt{N h_N^{7}}\left(\widehat{\dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2}} - \dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2}\right) \to N\left(0, V_\theta\right)$

and $\hat V_\theta$ is a consistent estimator for $V_\theta$.
To derive the asymptotic properties of the estimator defined in (4.11), under the additional assumption that $\partial \log f_{Y|X=x^*}(y)/\partial x = \partial \log f_{Y|X=x^{**}}(y)/\partial x = 0$, we make the following assumptions.

Assumption 4.7': The density $f_X$ and the density $f_{Y,X}$ generated by $\left(r, f_{\varepsilon_1\varepsilon_2}, f_X\right)$ are bounded and continuously differentiable of order $s$, where $s \geq 4 + \kappa$ and $\kappa$ is the order of the kernel function $K(\cdot)$ in Assumption 4.10'.

Assumption 4.8': For any $x'$ such that $\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x')\right)/\partial \varepsilon_2 = 0$ and any $x''$ such that $\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x'')\right)/\partial \varepsilon_2 = 0$, there exist neighborhoods $B'$ of $(y_1, y_2, x')$, $B''$ of $(y_1, y_2, x'')$, $B'_x$ of $x'$, and $B''_x$ of $x''$ such that the density $f_X(x)$ and the density $f_{Y,X}(y, x) = f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x)\right)\left|J(y_1, y_2, x)\right| f_X(x)$ are uniformly bounded away from zero on, respectively, $B'_x$ and $B''_x$, and on $B'$ and $B''$. Moreover, $\partial^2 \log f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x)\right)/\partial \varepsilon_2^2$ is bounded away from zero on those neighborhoods.

Assumption 4.9': For any two values $x', x''$ such that $\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x')\right)/\partial \varepsilon_2 = 0$ and $\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1(y_1, y_2), r^2(y_1, y_2, x'')\right)/\partial \varepsilon_2 = 0$, the difference $\partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1, r^2(y_1, y_2, x')\right)/\partial \varepsilon_1 - \partial \log f_{\varepsilon_1\varepsilon_2}\left(r^1, r^2(y_1, y_2, x'')\right)/\partial \varepsilon_1$ is uniformly bounded away from 0 on the neighborhoods $B'$ and $B''$ defined in Assumption 4.8'.

Assumption 4.10': The kernel function $K$ attains the value zero outside a compact set, integrates to 1, is of order $\kappa$, where $\kappa + 4 \leq s$, is differentiable of order $\Delta$, and its derivatives of order $\Delta$ are Lipschitz, where $\Delta \geq 4$.

Assumption 4.11': The sequence of bandwidths, $h_N$, is such that $h_N \to 0$, $N h_N^{5} \to \infty$, $\sqrt{N h_N^{5+2\kappa}} \to 0$, and $\sqrt{N h_N^{5}}\left[\sqrt{\ln(N)/\left(N h_N^{9}\right)} + h_N^{\kappa}\right]^2 \to 0$.
Define $\bar\varphi_1(\tilde y_1, \tilde y_2, \tilde x) = \dfrac{\partial K(\tilde y_1, \tilde y_2, \tilde x)}{\partial \tilde y_1}$, $\bar\varphi_2(\tilde y_1, \tilde y_2, \tilde x) = \dfrac{\partial K(\tilde y_1, \tilde y_2, \tilde x)}{\partial \tilde y_2}$, and $\bar\varphi_x(\tilde y_1, \tilde y_2, \tilde x) = \dfrac{\partial K(\tilde y_1, \tilde y_2, \tilde x)}{\partial \tilde x}$, and let $\bar\varphi(\tilde y_1, \tilde y_2, \tilde x)$ denote the $3 \times 1$ vector $\left(\bar\varphi_1, \bar\varphi_2, \bar\varphi_x\right)'$. For any value $\bar x$ of $x$, write $g_{y_1}(\bar x) = \partial \log f_{Y|X=\bar x}(y)/\partial y_1$ and $g_{y_2}(\bar x) = \partial \log f_{Y|X=\bar x}(y)/\partial y_2$. Define the vectors $d^{(1)} = \left(d^{(1)}_1, d^{(1)}_2, d^{(1)}_3\right)'$ and $d^{(2)} = \left(d^{(2)}_1, d^{(2)}_2, d^{(2)}_3\right)'$ by

$d^{(1)}_1 = \dfrac{g_{y_2}(x^*) - g_{y_2}(x^{**})}{\left[g_{y_1}(x^*) - g_{y_1}(x^{**})\right]^2}; \qquad d^{(1)}_2 = \dfrac{-1}{g_{y_1}(x^*) - g_{y_1}(x^{**})};$

$d^{(1)}_3 = \left(\dfrac{\partial}{\partial x}\left[-\,\dfrac{g_{y_2}(x) - g_{y_2}(x^{**})}{g_{y_1}(x) - g_{y_1}(x^{**})}\right]\Bigg|_{x = x^*}\right)\left[-\,\dfrac{\partial^2 \log f_{Y|X=x^*}(y)}{\partial x^2}\right]^{-1};$

$d^{(2)}_1 = -\,d^{(1)}_1; \qquad d^{(2)}_2 = -\,d^{(1)}_2; \qquad d^{(2)}_3 = \left(\dfrac{\partial}{\partial x}\left[-\,\dfrac{g_{y_2}(x^*) - g_{y_2}(x)}{g_{y_1}(x^*) - g_{y_1}(x)}\right]\Bigg|_{x = x^{**}}\right)\left[-\,\dfrac{\partial^2 \log f_{Y|X=x^{**}}(y)}{\partial x^2}\right]^{-1}$

The first two coordinates of each vector account for the estimation of the first derivatives of $\log f_{Y|X=x}$ at $x^*$ and $x^{**}$; the third coordinates account for the estimation of $x^*$ and $x^{**}$ themselves, whose first order behavior is governed by the second derivative $\partial^2 \log f_{Y|X=x}(y)/\partial x^2$ at each point. Define

$V_s = d^{(1)\prime}\left[\displaystyle\int \bar\varphi\,\bar\varphi'\, d\tilde y_1\, d\tilde y_2\, d\tilde x\right] d^{(1)}\ \dfrac{1}{f_{Y,X}(y, x^*)} \;+\; d^{(2)\prime}\left[\displaystyle\int \bar\varphi\,\bar\varphi'\, d\tilde y_1\, d\tilde y_2\, d\tilde x\right] d^{(2)}\ \dfrac{1}{f_{Y,X}(y, x^{**})}$

and define an estimator $\hat V_s$ for $V_s$, obtained by substituting $f_{Y|X=x}$ by $\hat f_{Y|X=x}$, $f_{Y,X}$ by $\hat f_{Y,X}$, $x^*$ by $\hat x^*$, and $x^{**}$ by $\hat x^{**}$ in the definitions of $d^{(1)}$ and $d^{(2)}$. In the Appendix we prove
Theorem 4.4: Suppose that Assumptions 3.1-3.3, 2.4, 4.5-4.6, and 4.7'-4.11' are satisfied. Define $\widehat{\partial m^1(y_2, \varepsilon_1)/\partial y_2}$ as in (4.11), for $\hat x^*$ and $\hat x^{**}$ satisfying $\partial \log \hat f_{Y|X=\hat x^*}(y)/\partial x = \partial \log \hat f_{Y|X=\hat x^{**}}(y)/\partial x = 0$. Then,

$\sqrt{N h_N^{5}}\left(\widehat{\dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2}} - \dfrac{\partial m^1(y_2, \varepsilon_1)}{\partial y_2}\right) \to N\left(0, V_s\right)$

and $\hat V_s$ is a consistent estimator for $V_s$.
5. Simulations
In this section, we report results of limited experiments with the estimators developed in the previous sections. The estimators involve choosing many parameters, such as the bandwidths, the kernel functions, and the weight functions in Sections 2 and 3. In addition, trimming is often desirable to avoid random denominators whose values are close to zero. The results are sensitive to the values of all these parameters; we chose only rudimentary ones. Further studies are needed to determine how to select these optimally.
5.1. Simulations for the Average Derivative Estimator for the Exclusive Regressors Model
To obtain some indication of the performance of the estimator for the exclusive regressors
model in Section 2, we generated simulated data from a linear and from a nonlinear model.
The linear model was specified as
$$y_1 = 0.75\, y_2 - z_1 + \varepsilon_1$$
$$y_2 = -0.5\, y_1 - z_2 + \varepsilon_2.$$
The inverse functions $r^1$ and $r^2$ were then
$$\varepsilon_1 = r^1(y_1, y_2, z_1) = y_1 - 0.75\, y_2 + z_1$$
$$\varepsilon_2 = r^2(y_1, y_2, z_2) = 0.5\, y_1 + y_2 + z_2.$$
For each of 500 simulations, we obtained samples for $(\varepsilon_1, \varepsilon_2)$ and for $(z_1, z_2)$ from a Normal distribution with mean (0,0) and with variance matrix equal to the identity matrix. We used the estimator developed in Section 2 to estimate the derivatives of the functions $r^1$ and $r^2$ with respect to $y_1$ and $y_2$ at $(y_1, y_2) = (0, 0)$. The derivatives of the inverse functions at $(y_1, y_2) = (0, 0)$ are, for all values of $(z_1, z_2)$,
$$\beta_{11} = 1; \quad \beta_{12} = -0.75; \quad \beta_{21} = 0.5; \quad \beta_{22} = 1.$$
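As a sanity check on this design, the linear simultaneous system can be solved for $(y_1, y_2)$ and the inverse functions verified to recover the disturbances exactly; a minimal sketch (NumPy, with illustrative variable names):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
eps = rng.standard_normal((n, 2))
z = rng.standard_normal((n, 2))

# Solve the simultaneous system
#   y1 = 0.75*y2 - z1 + eps1,   y2 = -0.5*y1 - z2 + eps2
A = np.array([[1.0, -0.75], [0.5, 1.0]])
b = np.column_stack([-z[:, 0] + eps[:, 0], -z[:, 1] + eps[:, 1]])
y = np.linalg.solve(A, b.T).T

# The inverse (residual) functions recover the disturbances
r1 = y[:, 0] - 0.75 * y[:, 1] + z[:, 0]
r2 = 0.5 * y[:, 0] + y[:, 1] + z[:, 1]
assert np.allclose(r1, eps[:, 0]) and np.allclose(r2, eps[:, 1])

# The derivative matrix with respect to (y1, y2) is the constant matrix A:
# [[1, -0.75], [0.5, 1]], i.e. the beta values stated in the text.
```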
The weight function $w(\cdot)$ employed was the Normal density with mean (0,0), diagonal variance matrix, and standard deviation for each coordinate equal to 1/3 of the standard deviation of, respectively, $y_1$ and $y_2$. We used a Gaussian kernel of order 8 and bandwidths
$$h_k = \sigma_k\, N^{-1/(s + 2m + 2d)},$$
where $h_k$ denotes the bandwidth for coordinate $k$, $\sigma_k$ denotes the standard deviation of the sample for that coordinate, $s$ denotes the order of the kernel, $d$ denotes the number of coordinates of the density, $m = 0$ if the bandwidth is used in the estimation of a density, and $m = 1$ if the bandwidth is used in the estimation of a derivative of a density.
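Under this reading of the bandwidth rule (the exponent is reconstructed from a garbled display, so treat it as an assumption), the rule can be computed as:

```python
import numpy as np

def bandwidth(sample, s=8, m=0):
    """Rule-of-thumb bandwidths h_k = sigma_k * N**(-1/(s + 2m + 2d)).

    sample : (N, d) array, one column per coordinate of the density
    s      : order of the kernel (8 in the simulations reported here)
    m      : 0 when estimating a density, 1 when estimating a derivative
    """
    n, d = sample.shape
    sigma = sample.std(axis=0, ddof=1)
    return sigma * n ** (-1.0 / (s + 2 * m + 2 * d))

rng = np.random.default_rng(0)
x = rng.standard_normal((5000, 2))
h_density = bandwidth(x, s=8, m=0)
# m = 1 shrinks the exponent, so derivative estimation uses a larger bandwidth
h_deriv = bandwidth(x, s=8, m=1)
```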
The results are presented in the following four tables. In the table presenting the results for $\hat\beta_{11}$, the column $\hat\beta_{11}$ denotes the average over the simulations of the value of $\hat\beta_{11}$, $\widehat{std}$ denotes the average over the simulations of the estimated standard deviation of $\hat\beta_{11}$, $std$ denotes the empirical standard deviation of $\hat\beta_{11}$, and the 0.95 column denotes the percentage of the simulations where the true value of $\beta_{11}$ was within the estimated 95% confidence interval. The other tables report the analogous results for the estimators of the other derivatives. In general, the average of the estimated standard errors, calculated with the equation for the variance stated in Theorem 2.4, was close to the empirical standard deviation of the estimates. As expected, both the bias and the estimated standard errors decreased with the sample size.
[Four tables: simulation results for $\hat\beta_{11}$ ($\beta_{11} = 1$), $\hat\beta_{12}$ ($\beta_{12} = -0.75$), $\hat\beta_{21}$ ($\beta_{21} = 0.5$), and $\hat\beta_{22}$ ($\beta_{22} = 1$) in the linear model, for sample sizes $N$ = 500; 1,000; 2,000; 5,000; 10,000, each reporting the average estimate, $\widehat{std}$, $std$, and 0.95 coverage.]
The nonlinear model was specified by
$$y_1 = 10\,\big[1 + \exp\big(2\,(y_2 - z_1 - 5 + \varepsilon_1)\big)\big]^{-1}$$
$$y_2 = 4.5 + 0.1\, y_1 - z_2 + \varepsilon_2.$$
The corresponding inverse functions are
$$\varepsilon_1 = r^1(y_1, y_2, z_1) = \frac{1}{2}\log\Big(\frac{10}{y_1} - 1\Big) - y_2 + z_1 + 5$$
$$\varepsilon_2 = r^2(y_1, y_2, z_2) = -0.1\, y_1 + y_2 - 4.5 + z_2.$$
The derivatives of $(r^1, r^2)$ were estimated at $(y_1, y_2) = (5, 5)$. At this point,
$$\beta_{11} = -0.2; \quad \beta_{12} = -1.0; \quad \beta_{21} = -0.1; \quad \beta_{22} = 1.0.$$
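The stated derivative values at (5, 5) can be checked numerically with central finite differences (a sketch; nothing here is specific to the estimator itself):

```python
import numpy as np

def r1(y1, y2, z1):
    # eps1 = (1/2) log(10/y1 - 1) - y2 + z1 + 5
    return 0.5 * np.log(10.0 / y1 - 1.0) - y2 + z1 + 5.0

def r2(y1, y2, z2):
    # eps2 = -0.1 y1 + y2 - 4.5 + z2
    return -0.1 * y1 + y2 - 4.5 + z2

h = 1e-6
y1, y2 = 5.0, 5.0
b11 = (r1(y1 + h, y2, 0.0) - r1(y1 - h, y2, 0.0)) / (2 * h)  # -0.2
b12 = (r1(y1, y2 + h, 0.0) - r1(y1, y2 - h, 0.0)) / (2 * h)  # -1.0
b21 = (r2(y1 + h, y2, 0.0) - r2(y1 - h, y2, 0.0)) / (2 * h)  # -0.1
b22 = (r2(y1, y2 + h, 0.0) - r2(y1, y2 - h, 0.0)) / (2 * h)  #  1.0
```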
As with the linear model, for each of 500 simulations we obtained samples for $(\varepsilon_1, \varepsilon_2)$ and for $(z_1, z_2)$ from a Normal distribution with mean (0,0) and with variance matrix equal to the identity matrix. The results are presented in the following four tables. The average values of the estimated standard errors were again, in this model, close to the standard deviation of the estimates. It is worth noticing the difference between our nonparametric estimator for $-\beta_{12}/\beta_{11}$ and the corresponding Least Squares estimator, $-\tilde\beta_{12}/\tilde\beta_{11} = -2.07$, calculated assuming that the model is linear. The value $-2.07$ corresponds to the 0.987 quantile of the empirical distribution of $-\hat\beta_{12}/\hat\beta_{11}$ when $N = 10{,}000$.
[Four tables: simulation results for $\hat\beta_{11}$ ($\beta_{11} = -0.2$), $\hat\beta_{12}$ ($\beta_{12} = -1$), $\hat\beta_{21}$ ($\beta_{21} = -0.1$), and $\hat\beta_{22}$ ($\beta_{22} = 1$) in the nonlinear model, for sample sizes $N$ = 500; 1,000; 2,000; 5,000; 10,000, each reporting the average estimate, $\widehat{std}$, $std$, and 0.95 coverage.]
5.2. Simulations for the Average Derivative Estimator for the Two
Equations One Instrument Model
We considered the same linear and nonlinear models as in the previous subsection, except that here the first equation has no exogenous regressor. So, the linear and nonlinear models were, respectively,
$$y_1 = 0.75\, y_2 + \varepsilon_1$$
$$y_2 = -0.5\, y_1 - z_2 + \varepsilon_2$$
and
$$y_1 = 10\,\big[1 + \exp\big(2\,(y_2 - 5 + \varepsilon_1)\big)\big]^{-1}$$
$$y_2 = 4.5 + 0.1\, y_1 - z_2 + \varepsilon_2.$$
For both the linear and nonlinear models, the distribution of $(\varepsilon_1, \varepsilon_2)$ was such that the marginal density of $\varepsilon_1$ was Normal with mean 0 and standard deviation 0.5, and conditional on $\varepsilon_1$ the density of $\varepsilon_2$ was a mixture of two normal distributions. The first of these normal distributions had mean $\mu_1 = -(1 + 0.1\,\varepsilon_1)^2$ and standard deviation $\sigma_1 = 0.1\,(1 + 0.5\,\varepsilon_1)^2$, while the second had mean $\mu_2 = (1 + 0.1\,\varepsilon_1)^2$ and standard deviation $\sigma_2 = 0.1\,(1 + 2\,\varepsilon_1)^2$. The variance of $\varepsilon_2$ generated this way was 1.09. $Z$ was sampled from a Uniform(-1.5, 1.5) distribution. The derivative with respect to $y_2$ in the first structural equation was estimated using the same high order kernel and the same bandwidth choices as reported in the previous subsection. The derivative was estimated at $y_1 = y_2 = 0$ for the linear model and at $y_1 = y_2 = 5$ for the nonlinear model. The nonparametric densities and derivatives were again estimated using a Gaussian kernel of order 8 and, for each coordinate, bandwidths $h_k = \sigma_k\, N^{-1/(s + 2m + 2d)}$. For both models and all samples, the weight assigned to any value of $z$ was zero whenever, at $(y_1, y_2, z)$, the estimated value of either the joint density, the marginal density of $Z$, or the conditional density was below 0.001. Lowering this trimming value increased coverage. The weight function $w(\cdot, \cdot)$ was otherwise uniform.
For the data generated by the nonlinear model, the derivative with respect to $y_2$ has true value -5, while the Least Squares estimate for this derivative is -2.75. This value corresponds to the 0.93 quantile of the empirical distribution of the estimator when $N$ = 10,000, to the 0.992 quantile when $N$ = 25,000, and it is above the 1.0 quantile when $N$ = 50,000.
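The mixture design for $(\varepsilon_1, \varepsilon_2)$ can be simulated directly; the sketch below assumes equal mixing weights and the component means and standard deviations as read above (both readings of a garbled display, so treat them as assumptions), and checks that the resulting variance of $\varepsilon_2$ is in the neighborhood of the 1.09 reported in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# eps1 ~ N(0, 0.5^2); eps2 | eps1 is a 50/50 mixture of two normals
eps1 = 0.5 * rng.standard_normal(n)
mu1, sd1 = -(1 + 0.1 * eps1) ** 2, 0.1 * (1 + 0.5 * eps1) ** 2
mu2, sd2 = (1 + 0.1 * eps1) ** 2, 0.1 * (1 + 2.0 * eps1) ** 2
pick = rng.random(n) < 0.5
eps2 = np.where(pick, mu1 + sd1 * rng.standard_normal(n),
                      mu2 + sd2 * rng.standard_normal(n))

print(eps2.var())  # close to 1, near the 1.09 reported in the text
```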
Linear Model: $\partial m^1(y_2, \varepsilon_1)/\partial y_2 = 0.75$

N        $\hat b$   $\widehat{std}$   $std$   0.95
10,000   0.76       0.30              0.32    0.93
25,000   0.74       0.17              0.21    0.98
50,000   0.73       0.12              0.16    0.98

Nonlinear Model: $\partial m^1(y_2, \varepsilon_1)/\partial y_2 = -5$

N        $\hat b$   $\widehat{std}$   $std$   0.95
10,000   -5.05      1.89              2.17    0.94
25,000   -4.95      1.06              1.48    0.96
50,000   -4.90      0.77              1.09    0.98
5.3. Simulations for an Indirect Estimator
We performed simulations for the estimator that is based on first order derivatives. For each of 500 simulations, we generated $\varepsilon_1$ from a uniform distribution with support (-1, 1), and for each $\varepsilon_1$ we generated $\varepsilon_2$ from a conditional density satisfying
$$\log f_{\varepsilon_2|\varepsilon_1}(\varepsilon_2) = \tfrac{1}{4}\,(\varepsilon_2)^2 - \tfrac{1}{2}\,(\varepsilon_2)^4 + \varepsilon_1 + \varepsilon_1\,\varepsilon_2 + \varepsilon_1\,(\varepsilon_2)^2.$$
When $\varepsilon_1 = 0$, the derivative of $\log f_{\varepsilon_2|\varepsilon_1}(\varepsilon_2)$ with respect to $\varepsilon_2$ equals zero at $\varepsilon_2 = -0.5$, $\varepsilon_2 = 0$, and $\varepsilon_2 = 0.5$. $Z$ was generated from a uniform(-1, 1) distribution. The object of interest, $\partial m^1(y_2, \varepsilon_1)/\partial y_2$, was estimated at $y_1 = y_2 = 0$ for the linear model, and at $y_1 = y_2 = 5$ for the nonlinear model. The results reported in the tables below were obtained with kernels of order 4 and bandwidths of size 3/4 of those determined as in the previous two subsections. Larger bandwidths generated a larger bias for both the linear and the nonlinear estimators. A higher order kernel made it difficult to find values of $z$ at which $\partial \log \hat f_{Y|Z=z}(y)/\partial y = 0$. Even with a kernel of order 4, such values were not found in some simulations. Moreover, even when such values were found, it was also possible that the estimator had a value of a different sign than the correct one, or that the random denominators in the calculation of the estimates or their variance were too low. The numbers in the tables below are calculated using only the simulations where such situations did not arise. Out of 500 simulations, $\tilde N$ denotes the number of simulations so obtained. The lower bound allowed on the denominators was $10^{-6}$. For the simulations where more than two values of $z$ at which $\partial \log \hat f_{Y|Z=z}(y)/\partial y = 0$ were found, the estimate was calculated using the smallest and the largest of such values. For the data generated from the nonlinear model, the Least Squares estimate for $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ is $-3.1$, which corresponds to the 0.93 quantile of the
empirical distribution of the Indirect Estimator when $N = 25{,}000$.
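At $\varepsilon_1 = 0$, the derivative of the log conditional density with respect to $\varepsilon_2$ is $\tfrac{1}{2}\varepsilon_2 - 2\varepsilon_2^3$, and its three zeros can be confirmed directly (a small check, not part of the estimation procedure):

```python
import numpy as np

# d/de2 of log f(e2 | e1 = 0) = (1/2) e2 - 2 e2**3
def score(e2):
    return 0.5 * e2 - 2.0 * e2 ** 3

# roots of the cubic -2 x^3 + 0.5 x: at -0.5, 0.0, 0.5
zeros = np.sort(np.roots([-2.0, 0.0, 0.5, 0.0]).real)
```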
[Two tables: results for the Indirect Estimator of $\partial m^1(y_2, \varepsilon_1)/\partial y_2$ in the linear model (true value 0.75) and in the nonlinear model (true value -5), for $N$ = 5,000; 10,000; 25,000, each reporting the average estimate, $\widehat{std}$, $std$, 0.95 coverage, and $\tilde N$, the number of retained simulations.]
6. Conclusions
In this paper, we have developed two new approaches for the estimation of nonparametric simultaneous equations models. Both approaches are based on new identification results that were developed in this paper for two models: the exclusive regressors model and the two equations, one instrument model. In the exclusive regressors model, to each structural equation there corresponds an exclusive observable exogenous variable. In the two equations, one instrument model, the object of interest is the equation from which the instrument is excluded. For each model, we developed estimators that are calculated by a simple matrix inversion and a matrix multiplication, where the elements in each matrix are averages of products of nonparametrically estimated derivatives of the conditional density of the observable variables. For the two equations, one instrument model, we also developed two-step indirect estimators. In the first step, values of the observable exogenous variable are found at which the nonparametric estimator of the conditional density of the observable variables satisfies some conditions. In the second step, the value of the object of interest is read off the estimated conditional density evaluated at those values. We developed two such two-step indirect estimators, one using first order derivatives and a second using second order derivatives. We have shown that all the new estimators are asymptotically normal. Although we developed only two estimation approaches, the new identification results can be used to develop many other approaches.
7. Appendix
Proof of Theorem 2.1: We first introduce some notation and a Lemma. [Notation: the matrices of derivatives of the structural functions evaluated at the observations, the diagonal matrix of density values, the vectors whose elements are the values of $\log f_\varepsilon$ and of the log Jacobian determinants $\log|r_y|$ at the observations, and the analogous matrices and vectors for any alternative function $\tilde r$ and density $\tilde f_\varepsilon$.] We will make use of the following Lemma.
Lemma M (Matzkin (2008)): Let $M$ denote a convex set in the interior of the support of $(Y, X)$. Suppose that $(r, f_\varepsilon) \in (\Gamma \times \Phi)$ generate $f_{Y|X}$ on $M$ and $\tilde r \in \Gamma$. Then, there exists $\tilde f_\varepsilon \in \Phi$ such that $(\tilde r, \tilde f_\varepsilon)$ generate $f_{Y|X}$ on $M$ if and only if for all $(y, x) \in M$

(1)  [display: the equality between the derivative expressions for $(r, f_\varepsilon)$ and $(\tilde r, \tilde f_\varepsilon)$.]

Proof: First, restrict the definition of observational equivalence in Matzkin (2008) to observational equivalence on $M$. Then, the lemma follows by the same arguments used in Matzkin (2008) to prove her Theorems 3.1 and 3.2. The expression in (1) is the transpose of (2.8) in the statement of Theorem 3.1 in Matzkin (2008).
We will next show that in the exclusive regressors model, when $r$ and $f_\varepsilon$ generate $f_{Y|X}$ and when $r_x$ and $\tilde r_x$ are invertible, diagonal, $G \times G$ matrices on $M$, (2.8) is equivalent to (1). Hence, by Lemma M, it will follow that $\tilde r$ is observationally equivalent to $r$ on $M$ if and only if (2.8) is satisfied. For this, we first note that since $(r, f_\varepsilon)$ generate $f_{Y|X}$ on $M$, for all $(y, x) \in M$,
$$f_{Y|X=x}(y) = f_\varepsilon\big(r(y, x)\big)\,\big|r_y(y, x)\big|.$$
Taking logs and differentiating both sides with respect to $x$, without writing the arguments of the functions explicitly, we get an expression for the vector $g_x$ of derivatives of $\log f_{Y|X=x}(y)$ with respect to $x$ in terms of $r_x$, the derivatives of $\log f_\varepsilon$, and the derivatives of $\log|r_y|$. Since by assumption $r_x$ is invertible, we can solve uniquely for the derivatives of $\log f_\varepsilon$, getting

(2.1)  [display: the derivatives of $\log f_\varepsilon$ equal to $(r_x')^{-1}$ applied to $g_x$ minus the derivatives of $\log|r_y|$ with respect to $x$.]
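The change-of-variables identity $f_{Y|X=x}(y) = f_\varepsilon(r(y,x))\,|r_y(y,x)|$ used in this proof can be verified numerically for the linear simulation design of Section 5.1 (a sketch; variable names are illustrative):

```python
import numpy as np

# Linear model from Section 5.1: eps = R y + z, with
#   r1 = y1 - 0.75 y2 + z1,  r2 = 0.5 y1 + y2 + z2,  eps ~ N(0, I)
R = np.array([[1.0, -0.75], [0.5, 1.0]])

def std_normal_pdf(v):
    return np.exp(-0.5 * v @ v) / (2 * np.pi)

def f_y_given_x(y, z):
    # Change of variables: f_{Y|X=x}(y) = f_eps(r(y, x)) * |det dr/dy|
    eps = R @ y + z
    return std_normal_pdf(eps) * abs(np.linalg.det(R))

def f_y_direct(y, z):
    # Directly: given z, Y = R^{-1}(eps - z) is Gaussian with
    # mean -R^{-1} z and covariance R^{-1} R^{-T}
    mean = -np.linalg.solve(R, z)
    cov = np.linalg.inv(R) @ np.linalg.inv(R).T
    d = y - mean
    quad = d @ np.linalg.solve(cov, d)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

y, z = np.array([0.3, -0.2]), np.array([0.1, 0.4])
assert np.isclose(f_y_given_x(y, z), f_y_direct(y, z))
```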
Replacing the derivatives of $\log f_\varepsilon$ in (1) by the expression in (2.1) gives an equation, (2.2), in which both sides are written in terms of $g_x$, $r_x$, $\tilde r_x$, and the derivatives of the log Jacobian determinants. Since $r_x'(r_x')^{-1}$ is the identity, the right-hand side of (2.2) simplifies; premultiplying both sides of (2.2) by $-\tilde r_x'(\tilde r_x')^{-1}$, subtracting the common term from both sides of the equality sign, and rearranging terms, the resulting expression states that a linear combination of $g_x$, of the difference between the Jacobian-correction vectors for $r$ and for $\tilde r$, and of the difference between the corresponding remaining terms, equals zero. Hence, we have obtained that in the exclusive regressors model, (1) is equivalent to

[display: the equation just described, written compactly.]

This is exactly (2.8) in Section 2.
Proof of Propositions 2.1 and 2.2: By the definition of the model in (2.2), for each equation the ratios of derivatives of the structural function depend only on $y$ and $\varepsilon$. When Assumption 2.4 is satisfied, the ratios of derivatives are given by the derivatives with respect to $y$, since the derivative with respect to the exclusive regressor is 1. Hence, each ratio is constant over the set $\{(y, x - \delta) \mid \delta \in \Delta\}$. Moreover, the Jacobian determinant $|r_y|$, which is a function of the derivatives of the structural functions with respect to $y$, does not depend on $x$ either, since $x$ only affects $\varepsilon$ and the derivative with respect to the exclusive regressor is not a function of $x$. Hence, all the terms defined in (2.6) are constant over this set, and the conclusion of Proposition 2.1 follows. A similar reasoning shows that when Assumption 2.4' is satisfied, the corresponding terms are constant over the analogous set.
Proof of Proposition 2.3: Let $y^{(1)}, \ldots, y^{(G+1)}$ be points as in the statement, and for each $j = 1, \ldots, G+1$ let $\varepsilon^{(j)}$ denote the corresponding value of $\varepsilon$. We will show that the matrix built from the points $y^{(1)}, \ldots, y^{(G+1)}$ is invertible if and only if the matrix built from $\varepsilon^{(1)}, \ldots, \varepsilon^{(G+1)}$ is invertible. The relationship between corresponding entries of the two matrices is

[display: each entry of the first matrix equals the relevant structural derivative times the corresponding entry of the second matrix, plus the derivative of $\log|r_y|$.]

Suppose the first matrix is invertible. The value of the log-Jacobian term is constant across each of its first $G$ columns, because $|r_y|$ is constant over each $y^{(j)}$ ($j = 1, \ldots, G+1$). Multiplying the $(G+1)$-th column, which is a column of 1's, by that term and subtracting it from a given column results in a matrix of the same rank. Since the structural derivative is also constant across the column, dividing the resulting column by it does not affect the rank either. Repeating the analogous operations in each of the first $G$ columns results in the second matrix. Hence, the two matrices must have the same rank, and since the first is invertible, the second must be invertible. Conversely, starting from the second matrix and performing the same operations in reverse order, we end up with the first matrix, which is invertible if the second is.
Proof of Proposition 2.4: By Assumption 2.4', for each $g = 1, \ldots, G$ the derivative of $r^g$ with respect to its exclusive regressor is 1 and the derivative of the Jacobian determinant $|r_y(y, x)|$ with respect to that regressor is zero. Then, for each $g$ and all $(y, x)$,

[display: the derivative of $\log f_{Y|X=x}(y)$ with respect to the exclusive regressor equals the corresponding derivative of $\log f_\varepsilon$ evaluated at $r(y, x)$.]

Evaluating at the points in the statement, the matrix built from the derivatives of the conditional density coincides with the matrix built from the derivatives of $\log f_\varepsilon$. Hence, the first matrix is invertible if and only if the second matrix is invertible.
Proof of Theorem 2.2: Let $\varepsilon^{(1)}, \ldots, \varepsilon^{(G+1)}$ denote the values of $\varepsilon$ corresponding to the points $y^{(1)}, \ldots, y^{(G+1)}$ satisfying Condition I.1. Since $f_\varepsilon$ is strictly positive and continuous at these points, there exist neighborhoods of them on which $f_\varepsilon$ is strictly positive. Let $V$ denote the union of those neighborhoods. For any candidate coefficients, the criterion function can be written as the sum of two integrals, one over the intersection with $V$ and one over its complement:

[display: the criterion written as the sum of two integrals of the squared linear combination of $g_x$, $g_{y_1}, \ldots, g_{y_G}$, and a constant, weighted by $f_\varepsilon$.]

When the candidate coefficients equal the true values, both terms are zero. When they differ, Condition I.1 implies that at one of the points the squared linear combination is nonzero, and the continuity of $g_x$ and $g_y$ on $V$, which is implied by Assumptions 2.1-2.3, then implies that the integral over $V$ is strictly positive. Hence, the criterion is uniquely minimized at the true values. Since the criterion is a convex and differentiable function, its matrix of second order derivatives must be positive semidefinite at the minimizer. The first order conditions for any subvector form a linear system:

[display: the $(G+1) \times (G+1)$ system whose coefficient matrix has entries equal to weighted integrals of products of the derivative functions $g_{y_j}$, $g_{y_k}$, and $g_x$.]

The $(G+1) \times (G+1)$ matrix in this expression is the matrix of second order derivatives of the criterion with respect to the coefficients, which is constant over the coefficients. If this matrix were not invertible, the solution would not be unique. Hence, since the matrix is positive semidefinite and invertible, it is positive definite. Solving for the last coefficient using the last equation and substituting the resulting expression into the other equations, we obtain a reduced $G \times G$ system:

[display: the reduced system,]

where the $G \times G$ matrix is also positive definite, by properties of partitioned matrices, and therefore invertible.
Proof of Theorem 2.3: The arguments are almost identical to those in the proof of
Theorem 2.2.
Proof of Theorem 2.4: Let F denote the set of bounded and continuously differentiable functions $f$ on the extension of the support of $(Y, X)$ to the whole space and such that the marginal function, defined by integrating $f$ over $y$, is bounded and continuously differentiable. For all functions in F, denote by $\|f\|$ the sum of the sup norms of the values and derivatives of $f$ and of its marginal. For any $(y, x)$, define on F the functionals that map $f$ to the derivatives of $\log f_{Y|X=x}(y)$ with respect to $y$ and with respect to $x$. By Assumption 2.7, the true density $f^*$ belongs to F and is bounded away from zero on the relevant compact set. For simplicity, we leave the argument $(y, x)$ implicit. For each pair of derivative functionals, define the functional $\Phi$ by

[display: $\Phi(f)$ equal to the weighted integral of the product of the two derivative functionals minus the product of their weighted integrals.]

Then the sample moment is $\Phi(\hat f)$ and the population moment is $\Phi(f^*)$. By Lemma A.2 in the Supplement and Assumptions 2.7 and 2.9, there exist a finite constant, a linear functional $D\Phi$, and a remainder functional $R\Phi$ such that when $\|\hat f - f^*\|$ is small enough,
$$\Phi(\hat f) - \Phi(f^*) = D\Phi\big(f^*; \hat f - f^*\big) + R\Phi\big(f^*; \hat f - f^*\big),$$
with $|D\Phi(f^*; \hat f - f^*)|$ bounded by a constant times $\|\hat f - f^*\|$ and $|R\Phi(f^*; \hat f - f^*)|$ bounded by a constant times $\|\hat f - f^*\|^2$. By Lemma A.5 in the Supplement and Assumptions 2.7, 2.9, and 2.10, $\sqrt{N\sigma_N^{d+2}}\,\|\hat f - f^*\|^2 = o_p(1)$. Hence,
$$\sqrt{N\sigma_N^{d+2}}\,\big(\Phi(\hat f) - \Phi(f^*)\big) = \sqrt{N\sigma_N^{d+2}}\, D\Phi\big(f^*; \hat f - f^*\big) + o_p(1).$$
Also by Lemma A.5 in the Supplement, the linear term admits an asymptotically linear representation in $\hat f - f^*$. Let $D(y)$ denote the matrix whose rows correspond, in order, to the component functionals and whose columns correspond, in order, to the coordinates $y_1, \ldots, y_G$, with the off-matching entries equal to 0 and the matching entries equal to the corresponding influence terms. Assumptions 2.7 and 2.8 imply that each entry is bounded and continuous and equals zero on the complement of the relevant compact set. Hence, Assumption 5.1 in Newey (1994) is satisfied. Our Assumptions 2.7 and 2.9 imply that Assumptions K, H, and Y in Newey (1994) are also satisfied. Hence, it follows by Assumption 2.10 and Lemma 5.3 in Newey (1994) that

(3)  [display: $\sqrt{N\sigma_N^{d+2}}$ times the difference between the sample and population moment vectors converges in distribution to a normal law with mean zero and the variance defined in Section 2.]

An analogous argument applies to the moments defining the coefficient matrix: by Lemma A.3 in the Supplementary Material, for a finite constant, all $(y, x)$, and all $\hat f$ sufficiently close to $f^*$, the difference between the sample and population moments is bounded by a constant times $\|\hat f - f^*\|$ plus a constant times $\|\hat f - f^*\|^2$. Since by Assumption 2.10 and Lemma A.5 $\|\hat f - f^*\| \to 0$ in probability, the sample moment matrix converges in probability to the population matrix, which is positive definite by Theorem 2.3. This together with (3) implies by Slutsky's Theorem that

[display: $\sqrt{N\sigma_N^{d+2}}$ times the difference between the estimator and the true coefficients converges in distribution to a normal law with the sandwich variance stated in the theorem.]

To show that the proposed variance estimator is consistent, it remains to show that for all components,
$$\int \Big(\Delta \log \hat f_{Y|X=x}(y)\Big)\Big(\Delta \log \hat f_{Y|X=x}(y)\Big)\,\frac{w(y, x)^2}{\hat f(y, x)}\; \to\; \int \Big(\Delta \log f_{Y|X=x}(y)\Big)\Big(\Delta \log f_{Y|X=x}(y)\Big)\,\frac{w(y, x)^2}{f(y, x)}.$$
By Lemma A.6 in the Supplement, the term inside the first integral converges in probability, uniformly over the compact set, to the term inside the second integral. Since by Assumption 2.7 this expression is bounded from above and below, and since the set is compact, it follows that the first integral converges in probability to the second integral.
Proof of Theorem 3.1: We will show that in the model with two equations and one instrument, (3.5) is equivalent to (1). As in the proof of Theorem 2.1, we note that since $(r, f_\varepsilon)$ generate $f_{Y|Z}$ on $M$, for all $(y, z) \in M$,
$$f_{Y|Z=z}(y) = f_\varepsilon\big(r(y, z)\big)\,\big|r_y(y, z)\big|.$$
Specifically, for the two equations, one instrument model,

(3.1)  $f_{Y_1 Y_2|Z=z}(y_1, y_2) = f_{\varepsilon_1 \varepsilon_2}\big(r^1(y_1, y_2),\, r^2(y_1, y_2, z)\big)\,\big|r_y(y_1, y_2, z)\big|.$

By Lemma B.1 in the Supplement, Assumptions 3.1-3.2 together with (3.1) imply expressions for the two sides of (1) in terms of the derivatives of $r$ and $\tilde r$ with respect to $z$ and $y_1$, and the Jacobian determinants $|r_y|$ and $|\tilde r_y|$. Hence, (1) can be expressed as

(3.2)  [display: the resulting equality.]

Multiplying both sides of (3.2) by the factor involving $|\tilde r_y|$ and the derivatives of $\tilde r$ yields (3.3), and subtracting the common terms from both sides of (3.3) yields (3.4). By (3.3) in Section 3.2, the remaining terms on each side are exactly the quantities defined there for $(r, f_\varepsilon)$ and for $(\tilde r, \tilde f_\varepsilon)$, respectively. Hence, (3.4) can be expressed, after rearranging it, as

(3.5)  [display: zero equal to the difference of the ratios of $y$-derivatives of $r^1$ and $\tilde r^1$ times $g_{y_1}$, plus the difference of the Jacobian ratios times $g_z$, plus the difference of the quantities defined in Section 3.2.]
Proof of Theorem 3.2: Let F be defined as in the proof of Theorem 2.4, let the functionals $\Phi_{1,2}$ and $\Phi$ on F be as defined also in the proof of Theorem 2.4, and let $\Phi_1$, $\Phi_2$, and $\Phi_{1,1}$ be defined analogously. Define the functional $\Xi$ on F by
$$\Xi(f) = \frac{\Phi_{1,2}(f)\,\Phi(f) - \Phi_{1}(f)\,\Phi_{2}(f)}{\Phi_{1,1}(f)\,\Phi(f) - \Phi_{1}^2(f)}.$$
By Lemma A.4 in the Supplement, there exist finite constants such that for all $h \in$ F with $\|h\|$ small enough,
$$\Xi(f^* + h) - \Xi(f^*) = D\Xi\big(f^*; h\big) + R\Xi\big(f^*; h\big),$$
where $D\Xi(f^*; h)$ is the linear term obtained by the quotient rule from the linear terms of the component functionals, and
$$\big|D\Xi(f^*; h)\big| \le C\,\|h\|, \qquad \big|R\Xi(f^*; h)\big| \le C\,\|h\|^2.$$
Letting $h = \hat f - f^*$, the scaled linear term $\sqrt{N\sigma_N^{d+2}}\, D\Xi(f^*; \hat f - f^*)$ can be written, by Lemma A.5 in the Supplement, as a sum of terms each of which is the scaled linear term of one component functional multiplied by a ratio of values of the component functionals at $f^*$, plus an $o_p(1)$ term. By Lemma A.5 in the Supplement, each of these scaled linear terms has an asymptotically linear representation in $\hat f - f^*$, and $\sqrt{N\sigma_N^{d+2}}\,\|\hat f - f^*\|^2 \to 0$ in probability. Collecting terms, with the tilde-weight functions as defined in Section 3, we get

[display: $\sqrt{N\sigma_N^{d+2}}\, D\Xi(f^*; \hat f - f^*)$ equal to the integral of $(\hat f_1 - f_1)$ times the sum of the first group of tilde-weights plus the integral of $(\hat f_2 - f_2)$ times the sum of the second group of tilde-weights, plus $o_p(1)$.]

Let $d^1(y)$ and $d^2(y)$ denote these two weight sums and let $d(y) = (d^1(y), d^2(y))$. By Lemma 5.3 in Newey (1994), it follows that
$$\sqrt{N\sigma_N^{d+2}}\; D\Xi\big(f^*; \hat f - f^*\big) \to N(0, V),$$
where $V = \big[\int d(y)\,\Sigma(y)\,d(y)'\big]$, with $\Sigma(y)$ the conditional second-moment matrix of the influence terms given $y$. Since by Lemma A.5
$$\Big|R\Xi\big(f^*; \hat f - f^*\big)\Big| \le C\,\sqrt{N\sigma_N^{d+2}}\;\big\|\hat f - f^*\big\|^2 \to 0,$$
it follows that
$$\sqrt{N\sigma_N^{d+2}}\,\big(\Xi(\hat f) - \Xi(f^*)\big) = \sqrt{N\sigma_N^{d+2}}\, D\Xi\big(f^*; \hat f - f^*\big) + o_p(1) \to N(0, V).$$
To show that $\hat V$ is a consistent estimator for $V$, we note that by Lemma A.6 in the Supplement, for all components and for $\|\hat f - f^*\|$ sufficiently small, the difference between the integrands defining $\hat V$ and those defining $V$ is uniformly bounded by a constant times $\|\hat f - f^*\|$. By Lemma A.5, each $\Phi(\hat f)$ converges to $\Phi(f^*)$. It then follows that for some finite constant, each estimated tilde-weight converges to its population counterpart uniformly on the compact set. Since the set is compact and $\|\hat f - f^*\| \to 0$ in probability,
$$\Big[\int \hat d(y)\,\hat\Sigma(y)\,\hat d(y)'\Big] \to \Big[\int d(y)\,\Sigma(y)\,d(y)'\Big].$$
Hence, $\hat V \to V$ in probability.
Proof of Theorem 4.3: Let z denote the set of densities that satisfy Assumption 4.7. Let $\|f\|$ denote the maximum of the supremum of the values and derivatives up to the fourth order of $f$ over the compact set defined as the closure of the neighborhood defined in Assumption 4.8. We first analyze the functional that for any $f$ assigns the value of $v$ at which $\partial^2 \log f_{Y|Z=v}(y)/\partial y^2 = 0$. Define the functional $\Upsilon(f, v)$ by $\Upsilon(f, v) = \partial^2 \log f_{Y|Z=v}(y)/\partial y^2$. We will show that there exists a functional $v(f)$ on a neighborhood of $f^*$ which is defined implicitly by $\Upsilon(f, v(f)) = 0$ and satisfies a Taylor expansion of the form $v(f + h) = v(f) + Dv(f; h) + Rv(f; h)$ with $|Rv(f; h)|$ of the order $\|h\|^2$. We then use this to analyze the functional defining our estimator. By Assumption 4.8, the density functions are uniformly bounded away from zero on the closure of the neighborhood defined in Assumption 4.8. It then follows by the definition of $\|\cdot\|$ that there exist small enough positive constants such that the relevant denominators are bounded away from zero for all densities in z close enough to $f^*$ and all $v$ in a neighborhood of $v^*$. For any such $f$ and $h$, the difference $\Upsilon(f + h, v) - \Upsilon(f, v)$ can be decomposed into a term linear in $h$ and a remainder of order $\|h\|^2$:

[displays: the decomposition of $\Upsilon(f + h, v) - \Upsilon(f, v)$; the definition of the linear term $D_f\Upsilon(f, v; h)$; the derivative with respect to $v$, $D_v\Upsilon(f, v; \delta) = \big(\partial^3 \log f_{Y|Z=v}(y)/\partial y^3\big)\,\delta$; and the corresponding remainders $R_f\Upsilon$ and $R_v\Upsilon$.]

The definition of z and Assumptions 4.7 and 4.8 imply that there exists a finite constant such that for all $(f, v)$ in a neighborhood of $(f^*, v^*)$, the linear terms are bounded by the constant times $|\delta|$ (respectively $\|h\|$) and the remainders by the constant times $|\delta|^2$ (respectively $\|h\|^2$). Moreover, it can be verified that on a neighborhood of $(f^*, v^*)$ both derivative maps are themselves Fréchet differentiable with continuous derivatives. Assumption 4.8 and the definition of z imply that $D_v\Upsilon(f, v; \cdot)$ is invertible on that neighborhood. It then follows by the Implicit Function Theorem on Banach spaces that there exists a unique functional $v(\cdot)$ such that for all $f$ in a neighborhood of $f^*$,
$$\Upsilon\big(f, v(f)\big) = 0.$$
The Fréchet derivative at $f^*$ is given by
$$Dv\big(f^*; h\big) = \Big(\frac{\partial^3 \log f_{Y|Z=v^*}(y)}{\partial y^3}\Big)^{-1}\big[-\,D_f\Upsilon(f^*, v^*; h)\big].$$
Since $\Upsilon$ is a $C^2$ map on a neighborhood of $(f^*, v^*)$ and its second order derivatives are uniformly bounded on such a neighborhood, $v(\cdot)$ is a $C^2$ map with uniformly bounded second derivatives on a neighborhood of $f^*$. Hence, by Taylor's Theorem on Banach spaces, there exists a finite constant such that for sufficiently small $\|h\|$, $|v(f + h) - v(f) - Dv(f; h)| \le C\,\|h\|^2$.

We now analyze the functional that defines our estimator, which uses $v(\cdot)$ as an input. Define the functional $\Psi(f, v(f))$ as the ratio of the second- and cross-derivative terms of $f$ evaluated at $v(f)$:

[display: the definition of $\Psi(f, v(f))$.]

Then $\Psi(f^*, v(f^*)) = \partial m^1(y_2, \varepsilon_1)/\partial y_2$ and $\Psi(\hat f, v(\hat f))$ equals its estimator. For $f$ and $\delta$ such that $\|h\|$ and $|\delta|$ are small enough, define the linear terms $D_f\Psi$ and $D_v\Psi$ and the remainder $R\Psi$ analogously:

[displays: the definitions of $D_f\Psi(f^*, v^*; h)$, $D_v\Psi(f^*, v^*; \delta)$, and $R\Psi$.]

The properties derived for $Dv$ and $Rv$ and Assumptions 4.7-4.9 imply that for some finite constant, $|D\Psi(f^*, v(f^*); h)| \le C\,\|h\|$ and $|R\Psi(f^*, v(f^*); h)| \le C\,\|h\|^2$. By applying Lemma 5.3 in Newey (1994) repeatedly, as we did in Lemma A.5, it follows by Assumptions 4.10 and 4.11 that when $h = \hat f - f^*$, the scaled linear term admits an asymptotically linear representation:

[display: $\sqrt{N\sigma_N^{7}}\, D\Psi(f^*, v(f^*); \hat f - f^*)$ written as a sum of weighted integrals of $\hat f - f^*$, with weights built from the second and third derivatives of $\log f_{Y|Z=v^*}(y)$, plus $o_p(1)$.]

By Lemma 5.3 in Newey (1994), it then follows that
$$\sqrt{N\sigma_N^{7}}\; D\Psi\big(f^*, v(f^*); \hat f - f^*\big) \to N\big(0, \tilde V\big),$$
where $\tilde V$ is as defined prior to the statement of Theorem 4.3. Assumptions 4.7, 4.10, and 4.11 together with Lemma B.3 in Newey (1994) imply that $\|\hat f - f^*\| = O_p\big(\sqrt{\ln(N)/(N\sigma_N^{11})} + \sigma_N^{s}\big)$. By Assumption 4.11, $\sqrt{N\sigma_N^{7}}\,\big(\sqrt{\ln(N)/(N\sigma_N^{11})} + \sigma_N^{s}\big)^2 \to 0$. Hence, since $|R\Psi(f^*, v(f^*); \hat f - f^*)| \le C\,\|\hat f - f^*\|^2$, it follows that $\sqrt{N\sigma_N^{7}}\, R\Psi(f^*, v(f^*); \hat f - f^*) = o_p(1)$. Hence,
$$\sqrt{N\sigma_N^{7}}\,\Big[\widehat{\partial m^1(y_2, \varepsilon_1)/\partial y_2} - \partial m^1(y_2, \varepsilon_1)/\partial y_2\Big] = \sqrt{N\sigma_N^{7}}\,\big(\Psi(\hat f, v(\hat f)) - \Psi(f^*, v(f^*))\big) \to N\big(0, \tilde V\big)$$
in distribution. To show that $\hat{\tilde V}$ converges to $\tilde V$ in probability, we first note that, as shown above,
$$|\hat v^* - v^*| = \big|v(\hat f) - v(f^*)\big| \le C\,\big\|\hat f - f^*\big\| + C\,\big\|\hat f - f^*\big\|^2 \to 0 \text{ in probability.}$$
Define the functionals $\Omega_1$, $\Omega_2$, and $\Omega_3$ as the three components of $\tilde V$ evaluated at $(f, v(f))$:

[displays: the definitions of $\Omega_1(f)$, $\Omega_2(f)$, and $\Omega_3(f)$.]

Following steps analogous to those in the proof of Lemma A.1 in the Supplement, it can be shown that there exist a finite constant and linear and remainder functionals $D\Omega_i$ and $R\Omega_i$ such that for all $\hat f$ in a neighborhood of $f^*$ and for $i = 1, 2, 3$,
$$\big|\Omega_i(\hat f) - \Omega_i(f^*)\big| \le \big|D\Omega_i\big(f^*; \hat f - f^*\big)\big| + \big|R\Omega_i\big(f^*; \hat f - f^*\big)\big| \le C\,\big\|\hat f - f^*\big\| + C\,\big\|\hat f - f^*\big\|^2.$$
Hence, since $\|\hat f - f^*\| \to 0$ in probability, it follows that $|\Omega_i(\hat f) - \Omega_i(f^*)| \to 0$. By the definition of $\hat{\tilde V}$, this implies that $\hat{\tilde V} \to \tilde V$ in probability.
Proof of Theorem 4.4: The proof is similar to the proof of Theorem 4.3. Let $\mathcal F_z$ denote the set of densities that satisfy Assumption 4.7'. Let $\|f\|$ denote the maximum of the supremum of the values and derivatives, up to the third order, of $f$ over a compact set defined by the union of the closures of the neighborhoods defined in Assumption 4.8'. We first analyze the functionals that, for any $f$, assign the distinct values $x_1$ and $x_2$ at which $\partial\log f_{Y|X=x_1}(y)/\partial x = 0$ and $\partial\log f_{Y|X=x_2}(y)/\partial x = 0$. When $f = f_{Y,X}$, $x_1 = x^{*}$ and $x_2 = x^{**}$ in the definition of the estimator for $\partial m_1(y_2, y_1)/\partial y_2$. As in the proof of Theorem 4.3, we will denote $f_{Y,X}(y, x)$ by $f(x)$ and $\tilde f_{Y,X}(x)$ by $\tilde f(x)$, with similar notation for other functions in $\mathcal F_z$. Since $x_1 \neq x_2$, the asymptotic covariance of our kernel estimators for the values of $x_1$ and $x_2$ is zero. Define the functional
\[
\Upsilon(f, x_1, x_2) = \Big( \partial\log f_{Y|X=x_1}(y)/\partial x,\;\; \partial\log f_{Y|X=x_2}(y)/\partial x \Big)'.
\]
We first show that there exists a functional $x(f) = \big(x_1(f), x_2(f)\big)$ satisfying $x(f_{Y,X}) = (x^{*}, x^{**})$, which is defined implicitly in a neighborhood of $f_{Y,X}$ by
\[
\Upsilon\big(f, x_1(f), x_2(f)\big) = 0.
\]
By Assumption 4.8', the density functions $f_{Y,X}$ and $\tilde f_{Y,X}$ are uniformly bounded away from zero on the closure of the neighborhood defined in Assumption 4.8'. It then follows by the definition of $\|\cdot\|$ that there exist $\delta_1, \delta_2, \delta_3 > 0$ small enough such that, for all $f$ in $\mathcal F_z$ with $\|f - f_{Y,X}\| \le \delta_1$, all $h$ in $\mathcal F_z$ with $\|h\| \le \delta_2$, all $x_1$ in a $\delta_1$-neighborhood of $x^{*}$, and all $x_2$ in a $\delta_2$-neighborhood of $x^{**}$, $f(x_j) + h(x_j) > \delta_3$ and $\tilde f(x_j) + \tilde h(x_j) > \delta_3$ ($j = 1, 2$). Hence, by
Assumptions 4.7' and 4.8',
\[
\Upsilon(f + h, x_1, x_2) - \Upsilon(f, x_1, x_2) =
\begin{pmatrix}
\dfrac{h_x(x_1)\, f(x_1) - f_x(x_1)\, h(x_1)}{f(x_1)\,\big[f(x_1) + h(x_1)\big]} - \dfrac{\tilde h_x(x_1)\,\tilde f(x_1) - \tilde f_x(x_1)\,\tilde h(x_1)}{\tilde f(x_1)\,\big[\tilde f(x_1) + \tilde h(x_1)\big]} \\[2.5ex]
\dfrac{h_x(x_2)\, f(x_2) - f_x(x_2)\, h(x_2)}{f(x_2)\,\big[f(x_2) + h(x_2)\big]} - \dfrac{\tilde h_x(x_2)\,\tilde f(x_2) - \tilde f_x(x_2)\,\tilde h(x_2)}{\tilde f(x_2)\,\big[\tilde f(x_2) + \tilde h(x_2)\big]}
\end{pmatrix},
\]
\[
\Upsilon(f, x_1 + \delta_1, x_2) - \Upsilon(f, x_1, x_2) =
\left( \frac{f_x(x_1 + \delta_1)}{f(x_1 + \delta_1)} - \frac{\tilde f_x(x_1 + \delta_1)}{\tilde f(x_1 + \delta_1)} - \frac{f_x(x_1)}{f(x_1)} + \frac{\tilde f_x(x_1)}{\tilde f(x_1)},\;\; 0 \right)', \text{ and}
\]
\[
\Upsilon(f, x_1, x_2 + \delta_2) - \Upsilon(f, x_1, x_2) =
\left( 0,\;\; \frac{f_x(x_2 + \delta_2)}{f(x_2 + \delta_2)} - \frac{\tilde f_x(x_2 + \delta_2)}{\tilde f(x_2 + \delta_2)} - \frac{f_x(x_2)}{f(x_2)} + \frac{\tilde f_x(x_2)}{\tilde f(x_2)} \right)'.
\]
Define
\[
D\Upsilon(f, x_1, x_2; h) =
\begin{pmatrix}
\dfrac{h_x(x_1)}{f(x_1)} - \dfrac{f_x(x_1)\, h(x_1)}{[f(x_1)]^{2}} - \dfrac{\tilde h_x(x_1)}{\tilde f(x_1)} + \dfrac{\tilde f_x(x_1)\,\tilde h(x_1)}{[\tilde f(x_1)]^{2}} \\[2.5ex]
\dfrac{h_x(x_2)}{f(x_2)} - \dfrac{f_x(x_2)\, h(x_2)}{[f(x_2)]^{2}} - \dfrac{\tilde h_x(x_2)}{\tilde f(x_2)} + \dfrac{\tilde f_x(x_2)\,\tilde h(x_2)}{[\tilde f(x_2)]^{2}}
\end{pmatrix},
\]
\[
D_1\Upsilon(f, x_1, x_2; \delta_1) = \left( \frac{\partial^{2}\log f_{Y|X=x_1}(y)}{\partial x^{2}}\,\delta_1,\;\; 0 \right)',
\qquad
D_2\Upsilon(f, x_1, x_2; \delta_2) = \left( 0,\;\; \frac{\partial^{2}\log f_{Y|X=x_2}(y)}{\partial x^{2}}\,\delta_2 \right)',
\]
\[
R\Upsilon(f, x_1, x_2; h) = \Upsilon(f + h, x_1, x_2) - \Upsilon(f, x_1, x_2) - D\Upsilon(f, x_1, x_2; h),
\]
\[
R_1\Upsilon(f, x_1, x_2; \delta_1) = \Upsilon(f, x_1 + \delta_1, x_2) - \Upsilon(f, x_1, x_2) - D_1\Upsilon(f, x_1, x_2; \delta_1), \text{ and}
\]
\[
R_2\Upsilon(f, x_1, x_2; \delta_2) = \Upsilon(f, x_1, x_2 + \delta_2) - \Upsilon(f, x_1, x_2) - D_2\Upsilon(f, x_1, x_2; \delta_2).
\]
The definition of $\mathcal F_z$ and Assumptions 4.7' and 4.8' imply that for some $a < \infty$,
\[
\big\| D_j\Upsilon(f, x_1, x_2; \delta_j) \big\| \le a\,|\delta_j| \quad\text{and}\quad \big\| R_j\Upsilon(f, x_1, x_2; \delta_j) \big\| \le a\,|\delta_j|^{2} \qquad (\text{for } j = 1, 2),
\]
\[
\big\| D\Upsilon(f, x_1, x_2; h) \big\| \le a\,\|h\|; \qquad \big\| R\Upsilon(f, x_1, x_2; h) \big\| \le a\,\|h\|^{2}.
\]
Hence, $D\Upsilon(f, x_1, x_2; h)$ is the Fréchet derivative of $\Upsilon$ with respect to $f$, and $D_j\Upsilon(f, x_1, x_2; \delta_j)$ is the Fréchet derivative of $\Upsilon$ with respect to $x_j$. By their definitions and Assumptions 4.7' and 4.8', it follows that both Fréchet derivatives are themselves Fréchet differentiable, and their derivatives are continuous and uniformly bounded on a neighborhood of $(f_{Y,X}, x^{*}, x^{**})$. Moreover, each $D_j\Upsilon(f, x_1, x_2; \cdot)$ ($j = 1, 2$) has a continuous inverse on a neighborhood of $(f_{Y,X}, x^{*}, x^{**})$. It then follows by the Implicit Function Theorem on Banach spaces that there exist unique functionals $x_1$ and $x_2$ such that $x_1(f_{Y,X}) = x^{*}$, $x_2(f_{Y,X}) = x^{**}$, and, for all $f$ in a neighborhood of $f_{Y,X}$,
\[
\Upsilon\big(f, x_1(f), x_2(f)\big) = 0.
\]
The functionals $x_1$ and $x_2$ are differentiable on a neighborhood of $f_{Y,X}$, and their Fréchet derivatives are given, for $j = 1, 2$, by
\[
Dx_j(f; h) = \left( \frac{\partial^{2}\log f_{Y|X=x_j(f)}(y)}{\partial x^{2}} \right)^{-1} \big[ -\,D\Upsilon_j(f, x_1, x_2; h) \big],
\]
where $D\Upsilon_j$ denotes the $j$-th component of $D\Upsilon$.
Moreover, $x_1$ and $x_2$ satisfy a first-order Taylor expansion around $f_{Y,X}$, with the remainder term for $\big(x_j(f + h) - x_j(f)\big)$ bounded by $a\,\|h\|^{2}$ for a constant $a < \infty$. Define the functional $\Psi(f, x_1, x_2)$ by
\[
\Psi(f, x_1, x_2) = \left[ \frac{f_{y_2}(x_2)}{f(x_2)} - \frac{f_{y_2}(x_1)}{f(x_1)} \right] \left[ \frac{f_{y_1}(x_1)}{f(x_1)} - \frac{f_{y_1}(x_2)}{f(x_2)} \right]^{-1}.
\]
Then, $\Psi\big(\hat f, x_1(\hat f), x_2(\hat f)\big) = \widehat{\partial m_1(y_2, y_1)/\partial y_2}$ and $\Psi\big(f_{Y,X}, x_1(f_{Y,X}), x_2(f_{Y,X})\big) = \partial m_1(y_2, y_1)/\partial y_2$. Denote $f(y, x_j)$ by $f(x_j)$ and $f_{y_k}(y, x_j)$ by $f_{y_k}(x_j)$ ($j = 1, 2$). Then, for $\|h\|$, $|\delta_1|$, and $|\delta_2|$ in a neighborhood of $0$,
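The two-step construction analyzed in this proof can be illustrated numerically. The sketch below is a minimal stand-in, not the estimator studied in the paper: it uses a Gaussian product-kernel density estimator, crude central finite differences for the log-density derivatives, and a grid search for the two instrument values at which the derivative of the conditional log density with respect to $x$ vanishes. The function names, bandwidth, tolerance, and grid are illustrative assumptions.

```python
import numpy as np

def kde(pts, data, h):
    """Gaussian product-kernel density estimate at each row of pts."""
    d = data.shape[1]
    u = (pts[:, None, :] - data[None, :, :]) / h
    return np.exp(-0.5 * (u**2).sum(axis=2)).mean(axis=1) / ((2 * np.pi)**(d / 2) * h**d)

def dlog_cond_x(y, x, data, h, eps=1e-4):
    """Numerical d/dx of log f_{Y|X=x}(y) = log f(y1,y2,x) - log ftilde(x)."""
    joint = lambda xv: kde(np.array([[y[0], y[1], xv]]), data, h)[0]
    marg = lambda xv: kde(np.array([[xv]]), data[:, [2]], h)[0]
    d_joint = (np.log(joint(x + eps)) - np.log(joint(x - eps))) / (2 * eps)
    d_marg = (np.log(marg(x + eps)) - np.log(marg(x - eps))) / (2 * eps)
    return d_joint - d_marg

def dlog_y(y, x, data, h, j, eps=1e-4):
    """Numerical d/dy_j of log f(y1,y2,x); the ftilde term drops out in y."""
    def at(yy):
        return np.log(kde(np.array([[yy[0], yy[1], x]]), data, h)[0])
    lo, hi = list(y), list(y)
    lo[j] -= eps
    hi[j] += eps
    return (at(hi) - at(lo)) / (2 * eps)

def psi_hat(y, data, h, x_grid):
    """Step 1: locate two zero crossings of d log f_{Y|X=x}(y)/dx on the grid
    (assumed to exist).  Step 2: plug them into the ratio Psi."""
    g = np.array([dlog_cond_x(y, x, data, h) for x in x_grid])
    idx = np.where(np.sign(g[:-1]) != np.sign(g[1:]))[0]
    x1, x2 = x_grid[idx[0]], x_grid[idx[1]]   # crude stand-ins for x*, x**
    num = dlog_y(y, x2, data, h, 1) - dlog_y(y, x1, data, h, 1)
    den = dlog_y(y, x1, data, h, 0) - dlog_y(y, x2, data, h, 0)
    return num / den
```

In practice the zero crossings would be refined (for example by bisection) and the bandwidth chosen to satisfy the rate conditions in Assumptions 4.10' and 4.11'.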
\[
\Psi(f + h, x^{*}, x^{**}) - \Psi(f, x^{*}, x^{**})
= \frac{\big[f_{y_2}(x^{**}) + h_{y_2}(x^{**})\big]\big[f(x^{*}) + h(x^{*})\big] - \big[f_{y_2}(x^{*}) + h_{y_2}(x^{*})\big]\big[f(x^{**}) + h(x^{**})\big]}
{\big[f_{y_1}(x^{*}) + h_{y_1}(x^{*})\big]\big[f(x^{**}) + h(x^{**})\big] - \big[f_{y_1}(x^{**}) + h_{y_1}(x^{**})\big]\big[f(x^{*}) + h(x^{*})\big]}
- \frac{f_{y_2}(x^{**})\, f(x^{*}) - f_{y_2}(x^{*})\, f(x^{**})}{f_{y_1}(x^{*})\, f(x^{**}) - f_{y_1}(x^{**})\, f(x^{*})},
\]
\[
\Psi(f, x^{*} + \delta_1, x^{**}) - \Psi(f, x^{*}, x^{**})
= \frac{f_{y_2}(x^{**})\, f(x^{*} + \delta_1) - f_{y_2}(x^{*} + \delta_1)\, f(x^{**})}{f_{y_1}(x^{*} + \delta_1)\, f(x^{**}) - f_{y_1}(x^{**})\, f(x^{*} + \delta_1)}
- \frac{f_{y_2}(x^{**})\, f(x^{*}) - f_{y_2}(x^{*})\, f(x^{**})}{f_{y_1}(x^{*})\, f(x^{**}) - f_{y_1}(x^{**})\, f(x^{*})},
\]
and
\[
\Psi(f, x^{*}, x^{**} + \delta_2) - \Psi(f, x^{*}, x^{**})
= \frac{f_{y_2}(x^{**} + \delta_2)\, f(x^{*}) - f_{y_2}(x^{*})\, f(x^{**} + \delta_2)}{f_{y_1}(x^{*})\, f(x^{**} + \delta_2) - f_{y_1}(x^{**} + \delta_2)\, f(x^{*})}
- \frac{f_{y_2}(x^{**})\, f(x^{*}) - f_{y_2}(x^{*})\, f(x^{**})}{f_{y_1}(x^{*})\, f(x^{**}) - f_{y_1}(x^{**})\, f(x^{*})}.
\]
Define
\[
D\Psi(f, x^{*}, x^{**}; h)
= \frac{h_{y_2}(x^{**})\, f(x^{*}) + f_{y_2}(x^{**})\, h(x^{*}) - h_{y_2}(x^{*})\, f(x^{**}) - f_{y_2}(x^{*})\, h(x^{**})}{f_{y_1}(x^{*})\, f(x^{**}) - f_{y_1}(x^{**})\, f(x^{*})}
\]
\[
\quad - \frac{\big[f_{y_2}(x^{**})\, f(x^{*}) - f_{y_2}(x^{*})\, f(x^{**})\big]\big[h_{y_1}(x^{*})\, f(x^{**}) + f_{y_1}(x^{*})\, h(x^{**}) - h_{y_1}(x^{**})\, f(x^{*}) - f_{y_1}(x^{**})\, h(x^{*})\big]}{\big[f_{y_1}(x^{*})\, f(x^{**}) - f_{y_1}(x^{**})\, f(x^{*})\big]^{2}},
\]
\[
R\Psi(f, x^{*}, x^{**}; h) = \Psi(f + h, x^{*}, x^{**}) - \Psi(f, x^{*}, x^{**}) - D\Psi(f, x^{*}, x^{**}; h),
\]
\[
D_1\Psi(f, x^{*}, x^{**}; \delta_1) = \left. \frac{\partial}{\partial x_1}\!\left( \frac{\partial\log f_{Y|X=x^{**}}(y)/\partial y_2 - \partial\log f_{Y|X=x_1}(y)/\partial y_2}{\partial\log f_{Y|X=x_1}(y)/\partial y_1 - \partial\log f_{Y|X=x^{**}}(y)/\partial y_1} \right) \right|_{x_1 = x^{*}} \delta_1,
\]
\[
D_2\Psi(f, x^{*}, x^{**}; \delta_2) = \left. \frac{\partial}{\partial x_2}\!\left( \frac{\partial\log f_{Y|X=x_2}(y)/\partial y_2 - \partial\log f_{Y|X=x^{*}}(y)/\partial y_2}{\partial\log f_{Y|X=x^{*}}(y)/\partial y_1 - \partial\log f_{Y|X=x_2}(y)/\partial y_1} \right) \right|_{x_2 = x^{**}} \delta_2,
\]
\[
R_1\Psi(f, x^{*}, x^{**}; \delta_1) = \Psi(f, x^{*} + \delta_1, x^{**}) - \Psi(f, x^{*}, x^{**}) - D_1\Psi(f, x^{*}, x^{**}; \delta_1), \text{ and}
\]
\[
R_2\Psi(f, x^{*}, x^{**}; \delta_2) = \Psi(f, x^{*}, x^{**} + \delta_2) - \Psi(f, x^{*}, x^{**}) - D_2\Psi(f, x^{*}, x^{**}; \delta_2).
\]
Let
\[
D\Psi^{*}(f; h) = D\Psi(f, x^{*}, x^{**}; h) + D_1\Psi\big(f, x^{*}, x^{**}; Dx_1(f; h)\big) + D_2\Psi\big(f, x^{*}, x^{**}; Dx_2(f; h)\big)
\]
and $R\Psi^{*}(f; h) = \Psi\big(f + h, x_1(f + h), x_2(f + h)\big) - \Psi(f, x^{*}, x^{**}) - D\Psi^{*}(f; h)$.
Assumptions 4.7' and 4.8' imply that
\[
\big| D\Psi^{*}(f; h) \big| \le a\,\|h\| \quad\text{and}\quad \big| R\Psi^{*}(f; h) \big| \le a\,\|h\|^{2}.
\]
Assumptions 4.10' and 4.11' imply, following steps as in the proof of Lemma A.5 in the Supplement, that for $h = \hat f - f_{Y,X}$,
\[
\sqrt{N\sigma_N^{5}}\; D\Psi^{*}(f, x^{*}, x^{**}; h)
= \sqrt{N\sigma_N^{5}}\, \frac{\big[h_{y_2}(x^{**})\, f(x^{*}) + f_{y_2}(x^{**})\, h(x^{*})\big] - \big[h_{y_2}(x^{*})\, f(x^{**}) + f_{y_2}(x^{*})\, h(x^{**})\big]}{f_{y_1}(x^{*})\, f(x^{**}) - f_{y_1}(x^{**})\, f(x^{*})}
\]
\[
- \sqrt{N\sigma_N^{5}}\, \frac{\big[f_{y_2}(x^{**})\, f(x^{*}) - f_{y_2}(x^{*})\, f(x^{**})\big]\big[h_{y_1}(x^{*})\, f(x^{**}) + f_{y_1}(x^{*})\, h(x^{**}) - h_{y_1}(x^{**})\, f(x^{*}) - f_{y_1}(x^{**})\, h(x^{*})\big]}{\big[f_{y_1}(x^{*})\, f(x^{**}) - f_{y_1}(x^{**})\, f(x^{*})\big]^{2}}
\]
\[
+ \left. \frac{\partial}{\partial x_1}\!\left( \frac{\partial\log f_{Y|X=x^{**}}(y)/\partial y_2 - \partial\log f_{Y|X=x_1}(y)/\partial y_2}{\partial\log f_{Y|X=x_1}(y)/\partial y_1 - \partial\log f_{Y|X=x^{**}}(y)/\partial y_1} \right) \right|_{x_1 = x^{*}}
\frac{\sqrt{N\sigma_N^{5}}\,\big[ -\,D\Upsilon_1(f_{Y,X}; h) \big]}{\partial^{2}\log f_{Y|X=x^{*}}(y)/\partial x^{2}}
\]
\[
+ \left. \frac{\partial}{\partial x_2}\!\left( \frac{\partial\log f_{Y|X=x_2}(y)/\partial y_2 - \partial\log f_{Y|X=x^{*}}(y)/\partial y_2}{\partial\log f_{Y|X=x^{*}}(y)/\partial y_1 - \partial\log f_{Y|X=x_2}(y)/\partial y_1} \right) \right|_{x_2 = x^{**}}
\frac{\sqrt{N\sigma_N^{5}}\,\big[ -\,D\Upsilon_2(f_{Y,X}; h) \big]}{\partial^{2}\log f_{Y|X=x^{**}}(y)/\partial x^{2}}
+ o_p(1).
\]
By Lemma 5.2 in Newey (1994), it follows by Assumptions 4.7', 4.8', 4.10', and 4.11' that $\sqrt{N\sigma_N^{5}}\, D\Psi^{*}\big(f_{Y,X}; \hat f - f_{Y,X}\big) \to N(0, V)$, where $V$ is as defined prior to the statement of Theorem 4.4. Since $\big| R\Psi^{*}\big(f_{Y,X}; \hat f - f_{Y,X}\big) \big| \le a\,\big\| \hat f - f_{Y,X} \big\|^{2}$ and $\sqrt{N\sigma_N^{5}}\,\big\| \hat f - f_{Y,X} \big\|^{2} = o_p(1)$ by Assumption 4.10' and Lemma B.2 in Newey (1994), we can conclude that
\[
\sqrt{N\sigma_N^{5}} \left( \widehat{\frac{\partial m_1(y_2, y_1)}{\partial y_2}} - \frac{\partial m_1(y_2, y_1)}{\partial y_2} \right)
= \sqrt{N\sigma_N^{5}} \left( \Psi\big(\hat f, x_1(\hat f), x_2(\hat f)\big) - \Psi\big(f_{Y,X}, x^{*}, x^{**}\big) \right) \to N(0, V).
\]
Following steps analogous to those used in the proof of Theorem 4.3, as in the proof of Lemma A.1, it can be shown that the estimators of each of the terms defined before the statement of Theorem 4.4 converge in probability to those terms. Similarly, $\hat f\big(y, \hat x^{*}\big) \to f_{Y,X}\big(y, x^{*}\big)$ and $\hat f\big(y, \hat x^{**}\big) \to f_{Y,X}\big(y, x^{**}\big)$. Hence, $\hat V \to V$.
8. References
AI, C. (1997) "A Semiparametric Maximum Likelihood Estimator," Econometrica, 65, 933-963.
AI, C. and X. CHEN (2003), "Efficient Estimation of Models with Conditional Moments
Restrictions Containing Unknown Functions," Econometrica, 71, 1795-1843.
ALTONJI, J.G., H. ICHIMURA AND T. OTSU (2012): “Estimating Derivatives in Nonseparable Models with Limited Dependent Variables," Econometrica, 80, 4, 1701-1719.
ALTONJI, J.G. AND R.L. MATZKIN (2001): “Panel Data Estimators for Nonseparable
Models with Endogenous Regressors", NBER Working Paper T0267.
ALTONJI, J.G. AND R.L. MATZKIN (2005): “Cross Section and Panel Data Estimators
for Nonseparable Models with Endogenous Regressors,” Econometrica, 73, 3, 1053-1102.
AMEMIYA, T. (1985), Advanced Econometrics, Cambridge, MA: Harvard University Press.
BHATTACHARYA, P.K. (1963), "On an Analog of Regression Analysis," The Annals of
Mathematical Statistics, Vol. 34, No. 4, pp. 1459-1473.
BENKARD, C.L. and S. BERRY (2006) "On the Nonparametric Identification of Nonlinear Simultaneous Equations Models: Comment on B. Brown (1983) and Roehrig (1988)," Econometrica, 74, 5.
BERRY, S.T. and P.A. HAILE (2009) "Identification in Differentiated Products Markets
Using Market Level Data," working paper, Yale University.
BLUNDELL, R. and R.L. MATZKIN (2014) "Control Functions in Nonseparable Simultaneous Equations Models," Quantitative Economics, Vol. 5, No. 2, 271-295.
BLUNDELL, R. and J. L. POWELL (2003a) "Endogeneity in Nonparametric and Semiparametric Regression Models," in Advances in Economics and Econometrics, Theory and
Applications, Eighth World Congress, Volume II, edited by M. Dewatripont, L.P. Hansen,
and S.J. Turnovsky, Cambridge University Press, Cambridge, U.K.
BLUNDELL, R. AND J.L. POWELL (2003b) "Endogeneity in Semiparametric Binary Response Models," Review of Economic Studies.
BOWDEN, R. (1973) "The Theory of Parametric Identification," Econometrica, 41, 1069-1074.
BROWN B.W. (1983) "The Identification Problem in Systems Nonlinear in the Variables,"
Econometrica, 51, 175-196.
BROWN, D.J. and R.L. MATZKIN (1998) "Estimation of Nonparametric Functions in Simultaneous Equations Models, with an Application to Consumer Demand," Cowles Foundation Discussion Paper #1175.
CHEN, X., V. CHERNOZHUKOV, S. LEE and W. NEWEY (2014): "Local Identification
of Nonparametric and Semiparametric Models," Econometrica, Vol. 82, 2.
CHEN, X. and D. POUZO (2012) "Estimation of Nonparametric Conditional Moment Models with Possibly Nonsmooth Generalized Residuals," Econometrica, Vol. 80, No. 1, 277-321.
CHERNOZHUKOV, V. AND C. HANSEN (2005): “An IV Model of Quantile Treatment
Effects,” Econometrica, 73, 245-261.
CHERNOZHUKOV, V., G. IMBENS, AND W. NEWEY (2007): “Instrumental Variable
Identification and Estimation of Nonseparable Models via Quantile Conditions," Journal of
Econometrics.
CHESHER, A. (2003): “Identification in Nonseparable Models,” Econometrica, Vol. 71, No.
5.
CHESHER, A. (2005): “Identification of Non-additive Structural Functions," mimeo, presented at the Symposium on Nonparametric Structural Models, World Congress of the Econometric Society, London.
CHIAPPORI, P-A, and I. KOMUNJER (2009) "On the Nonparametric Identification of
Multiple Choice Models," discussion paper, UCSD.
DAROLLES, S., J.P. FLORENS, and E. RENAULT (2002), “Nonparametric Instrumental
Regression,” mimeo, IDEI, Toulouse.
FISHER, F.M. (1959). “Generalization of the rank and order conditions for identifiability”.
Econometrica 27, (3), 431—447.
FISHER, F.M. (1961) "Identifiability Criteria in Nonlinear Systems," Econometrica, 29,
574-590.
FISHER, F.M. (1965) "Identifiability Criteria in Nonlinear Systems: A Further Note," Econometrica, 33, 197-205.
FISHER, F.M. (1966) The Identification Problem in Econometrics. New York: McGraw-Hill.
FLORENS, J.P. (2003) "Inverse Problems and Structural Econometrics: The Example of
Instrumental Variables," in Advances in Economics and Econometrics, Theory and Applications, Vol. II, edited by M. Dewatripont, L.P. Hansen, and S. Turnovsky, Cambridge:
Cambridge University Press.
FRISCH, R. A.K. (1934) Statistical Confluence Analysis by Means of Complete Regression
Systems, Publication No. 5. Oslo: Universitets Okonomiske Institutt.
FRISCH, R. A.K. (1938) "Statistical versus Theoretical Relations in Economic Macrodynamics," memorandum prepared for a conference in Cambridge, England, July 18-20, 1938, to discuss drafts of Tinbergen's League of Nations publications.
GALE, D. and H. NIKAIDO (1965), "The Jacobian Matrix and Global Univalence of Mappings," Math. Annalen 159, 81-93.
GALLANT, A. R. and D. W. NYCHKA (1987), "Seminonparametric Maximum Likelihood
Estimation," Econometrica, 55, 363-390.
HALL, P. and J.L. HOROWITZ (2003), “Nonparametric Methods for Inference in the Presence of Instrumental Variables,” Center for Microdata Methods and Practice, WP 102/03.
HAUSMAN, J.A. (1983) “Specification and Estimation of Simultaneous Equation Models,”
in Handbook of Econometrics, Vol. I, edited by Z. Griliches and M.D. Intriligator.
HAAVELMO, T.M. (1943). “The statistical implications of a system of simultaneous equations”. Econometrica, 11, (1), 1—12.
HAAVELMO, T.M. (1944) "The Probability Approach in Econometrics," Econometrica, 12, Supplement, July.
HSIAO, C. (1983), "Identification," in Handbook of Econometrics, Vol. 1, edited by Z.
Griliches and M.D. Intriligator, North-Holland Publishing Company.
HURWICZ, L. (1950) "Generalization of the Concept of Identification," in Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10. New York: John
Wiley.
IMBENS, G.W. AND W.K. NEWEY (2003) “Identification and Estimation of Triangular
Simultaneous Equations Models Without Additivity,” mimeo, M.I.T.
IMBENS, G.W. AND W.K. NEWEY (2009) “Identification and Estimation of Triangular
Simultaneous Equations Models Without Additivity,” Econometrica, 77, 5.
KOOPMANS, T.C. and O. REIERSOL (1950) "The Identification of Structural Characteristics," Annals of Mathematical Statistics, 21, 165-181.
KOOPMANS, T.C., A. RUBIN, and R.B. LEIPNIK (1950) "Measuring the Equation System of Dynamic Economics," in Statistical Inference in Dynamic Economic Models, Cowles
Commission Monograph 10, New York: John Wiley.
LEWBEL, A. (2000), "Semiparametric Qualitative Response Model Estimation with Unknown Heteroskedasticity and Instrumental Variables," Journal of Econometrics, 97, 145-177.
MANSKI, C.F. (1983). “Closest empirical distribution estimation”. Econometrica 51 (2),
305—320.
MATZKIN, R.L. (1992) "Nonparametric and Distribution-Free Estimation of the Binary
Threshold Crossing and the Binary Choice Models," Econometrica, 60, 239-270.
MATZKIN, R.L. (1994) “Restrictions of Economic Theory in Nonparametric Methods,” in
Handbook of Econometrics, Vol. IV, R.F. Engel and D.L. McFadden, eds, Amsterdam:
Elsevier, Ch. 42.
MATZKIN, R.L. (1999) "Nonparametric Estimation of Nonadditive Random Functions,"
mimeo.
MATZKIN, R.L. (2003) "Nonparametric Estimation of Nonadditive Random Functions,"
Econometrica, 71, 1339-1375.
MATZKIN, R.L. (2004) "Unobservable Instruments," mimeo, Northwestern University.
MATZKIN, R.L. (2005a) "Identification in Nonparametric Simultaneous Equations Models,"
mimeo, Northwestern University.
MATZKIN, R.L. (2005b) "Heterogeneous Choice," invited lecture, World Congress of the Econometric Society, London, U.K.
MATZKIN, R.L. (2007a) “Nonparametric Identification,” Chapter 73 in Handbook of Econometrics, Vol. 6b, edited by J.J. Heckman and E.E. Leamer, Elsevier B.V., 5307-5368.
MATZKIN, R.L. (2007b) “Heterogeneous Choice,” in Advances in Economics and Econometrics, Theory and Applications, Ninth World Congress of the Econometric Society, edited
by R. Blundell, W. Newey, and T. Persson, Cambridge University Press.
MATZKIN, R.L. (2008) “Identification in Nonparametric Simultaneous Equations,” Econometrica, Vol. 76, No. 5, 945-978.
MATZKIN, R.L. (2012) “Identification in Nonparametric Limited Dependent Variable Models with Simultaneity and Unobserved Heterogeneity,” Journal of Econometrics.
MATZKIN, R.L. (2013) "Nonparametric Identification in Structural Economic Models," Annual Review of Economics.
NADARAYA, E. (1964) "On Estimating Regression," Theory Probab. Appl., 9, 141-142.
NEWEY, W.K. (1994) "Kernel Estimation of Partial Means and a General Variance Estimator," Econometric Theory, 10, 233-253.
NEWEY, W. and J. POWELL (1989) “Instrumental Variables Estimation of Nonparametric
Models,” mimeo.
NEWEY, W. and J. POWELL (2003) “Instrumental Variables Estimation of Nonparametric
Models,” Econometrica.
PAGAN A. and A. ULLAH (1999) Nonparametric Econometrics, Cambridge University
Press, Cambridge, U.K.
POWELL, J.L., J.H. STOCK, and T.M. STOKER (1989) "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403-1430.
ROEHRIG, C.S. (1988) "Conditions for Identification in Nonparametric and Parametric
Models," Econometrica, 56:2, 433-447.
ROTHENBERG, T.J. (1971) "Identification in Parametric Models," Econometrica, 39, 577-592.
STOKER, T.M. (1986) "Consistent Estimation of Scaled Coefficients," Econometrica, 54, 1461-1481.
STONE, C.J. (1977) "Consistent Nonparametric Regression," The Annals of Statistics, Vol.
5, No. 4, pp. 595-620.
TINBERGEN, J. (1930) "Bestimmung und Deutung von Angebotskurven: Ein Beispiel," Zeitschrift für Nationalökonomie, 70, 331-342.
WALD, A. (1950) "Note on Identification of Economic Relations," in Statistical Inference in
Dynamic Economic Models, Cowles Commission Monograph 10. New York: John Wiley.
WEGGE, L.L. (1965) "Identifiability Criteria for a System of Equations as a Whole," The
Australian Journal of Statistics, 7, 67-77.
WATSON, G.S. (1964) "Smooth Regression Analysis," Sankhyā: The Indian Journal of Statistics, Vol. 26, No. 4, 359-372.
WORKING, E.J. (1927) "What Do Statistical 'Demand Curves' Show?" Quarterly Journal of Economics, 41, 212-235.
WORKING, H. (1925) "The Statistical Determination of Demand Curves," Quarterly Journal of Economics, 39, 503-543.
ESTIMATION OF NONPARAMETRIC MODELS
WITH SIMULTANEITY
Supplementary Material
Rosa L. Matzkin
Department of Economics
UCLA
Abstract
This supplement presents the statements and proofs of the lemmas that are used in the
Appendix of the paper. Lemmas A.1-A.6 are used in the proofs of Theorems 2.4 and 3.2.
Lemma B.1 is used in the proof of Theorem 3.1.
Lemmas

Lemma A.1: Let $Y^{*}$ and $X^{*}$ be compact and convex sets such that $T = \{y\} \times X^{*}$ is strictly included in the interior of the support of $(Y, X)$. Let $\mathcal F$ denote the set of bounded and continuously differentiable functions $f$ on the extension of the support of $(Y, X)$ to the whole Euclidean space, such that the function $\tilde f$ defined by $\tilde f(x) = \int f(y, x)\, dy$ is bounded and continuously differentiable. For all functions $f$ in $\mathcal F$, denote by $\|f\|$ the sum of the sup norms, over $Y^{*} \times X^{*}$, of the values and derivatives of $f$ and of $\tilde f$. For any $(y, x) \in Y^{*} \times X^{*}$, define the functionals $L_y(\cdot)$ and $L_x(\cdot)$ on $\mathcal F$ by $L_y(f) = \partial\log f_{Y|X=x}(y)/\partial y$ and $L_x(f) = \partial\log f_{Y|X=x}(y)/\partial x$. For simplicity we leave the argument $(y, x)$ implicit. Assume that the density $f_{Y,X}$ belongs to $\mathcal F$ and is such that, for some $c > 0$ and all $(y, x) \in Y^{*} \times X^{*}$, $f_{Y,X}(y, x) > c$ and $\tilde f_{Y,X}(x) > c$. Then, there exist a finite $a > 0$ and $\tilde\delta_0 > 0$ such that, for all $h \in \mathcal F$ with $\|h\| \le \tilde\delta_0$ and all $(y, x) \in Y^{*} \times X^{*}$,
\[
L_y(f + h) - L_y(f) = DL_y(f; h) + RL_y(f; h)
\]
and
\[
L_x(f + h) - L_x(f) = DL_x(f; h) + RL_x(f; h),
\]
where
\[
DL_y(f; h) = \frac{h_y\, f - f_y\, h}{f^{2}}; \qquad
RL_y(f; h) = -\,\frac{\big[ h_y\, f - f_y\, h \big]\, h}{f^{2}\,(f + h)};
\]
\[
DL_x(f; h) = \frac{h_x\, f - f_x\, h}{f^{2}} - \frac{\tilde h_x\,\tilde f - \tilde f_x\,\tilde h}{\tilde f^{2}};
\]
\[
RL_x(f; h) = -\left[ \frac{\big[ h_x\, f - f_x\, h \big]\, h}{f^{2}\,(f + h)} - \frac{\big[ \tilde h_x\,\tilde f - \tilde f_x\,\tilde h \big]\,\tilde h}{\tilde f^{2}\,(\tilde f + \tilde h)} \right].
\]
Moreover,
\[
|DL_y(f; h)| \le a\,\|h\|; \quad |DL_x(f; h)| \le a\,\|h\|; \quad |RL_y(f; h)| \le a\,\|h\|^{2}; \quad |RL_x(f; h)| \le a\,\|h\|^{2}.
\]
Proof: To simplify notation, we will denote $f_{Y,X}(y, x)$ by $f$, $\tilde f_{Y,X}(x)$ by $\tilde f$, $\partial f_{Y,X}(y, x)/\partial y$ by $f_y$, $\partial f_{Y,X}(y, x)/\partial x$ by $f_x$, and $\partial\tilde f_{Y,X}(x)/\partial x$ by $\tilde f_x$, with similar shorthands for the function $h$. By the definitions of $L_y(f)$ and $L_x(f)$,
\[
L_y(f) = \frac{f_y}{f} \quad\text{and}\quad L_x(f) = \frac{f_x}{f} - \frac{\tilde f_x}{\tilde f}.
\]
Let $\tilde\delta_0 = \min\{c/2,\, 1\}$. Then, for all $h \in \mathcal F$ such that $\|h\| \le \tilde\delta_0$, one has that $|h| < c/2$ and $|\tilde h| < c/2$. Since $f > c$ and $\tilde f > c$, it follows that $(f + h) > c/2$ and $(\tilde f + \tilde h) > c/2$. By rearranging terms, it follows that
\[
L_y(f + h) - L_y(f) = \frac{f_y + h_y}{f + h} - \frac{f_y}{f}
= \frac{h_y\, f - f_y\, h}{f^{2}} - \frac{\big[ h_y\, f - f_y\, h \big]\, h}{f^{2}\,(f + h)}
= DL_y(f; h) + RL_y(f; h), \text{ and}
\]
\[
L_x(f + h) - L_x(f) = \left[ \frac{f_x + h_x}{f + h} - \frac{f_x}{f} \right] - \left[ \frac{\tilde f_x + \tilde h_x}{\tilde f + \tilde h} - \frac{\tilde f_x}{\tilde f} \right]
\]
\[
= \left[ \frac{h_x\, f - f_x\, h}{f^{2}} - \frac{\tilde h_x\,\tilde f - \tilde f_x\,\tilde h}{\tilde f^{2}} \right]
- \left[ \frac{\big[ h_x\, f - f_x\, h \big]\, h}{f^{2}\,(f + h)} - \frac{\big[ \tilde h_x\,\tilde f - \tilde f_x\,\tilde h \big]\,\tilde h}{\tilde f^{2}\,(\tilde f + \tilde h)} \right]
= DL_x(f; h) + RL_x(f; h),
\]
where the last equalities in each of the two expressions above follow by the definitions of $DL_y(f; h)$, $RL_y(f; h)$, $DL_x(f; h)$, and $RL_x(f; h)$ in the statement of the Lemma.

It remains to show that, for some finite $a > 0$ which does not depend on $(y, x)$,
\[
|DL_y(f; h)| \le a\,\|h\|; \quad |DL_x(f; h)| \le a\,\|h\|; \quad |RL_y(f; h)| \le a\,\|h\|^{2}; \quad\text{and}\quad |RL_x(f; h)| \le a\,\|h\|^{2}.
\]
Let $a = \max\big\{ 4\,\|f_{Y,X}\|/c^{2},\; 8\,\|f_{Y,X}\|/c^{3} \big\}$. By the definitions of $\|f_{Y,X}\|$ and $\|h\|$, it follows that $|f|$, $|f_y|$, $|f_x|$, $|\tilde f|$, and $|\tilde f_x|$ are not larger than $\|f_{Y,X}\|$, and that $|h|$, $|h_y|$, $|h_x|$, $|\tilde h|$, and $|\tilde h_x|$ are not larger than $\|h\|$. Moreover, as was shown above, for all $h$ such that $\|h\| \le c/2$, $(f + h)$ and $(\tilde f + \tilde h)$ are larger than $c/2$. Since $f > c$ and $\tilde f > c$, it then follows that
\[
|DL_y(f; h)| = \left| \frac{h_y\, f - f_y\, h}{f^{2}} \right| \le \frac{2\,\|f_{Y,X}\|}{c^{2}}\,\|h\| \le a\,\|h\|,
\]
\[
|RL_y(f; h)| = \left| \frac{\big[ h_y\, f - f_y\, h \big]\, h}{f^{2}\,(f + h)} \right| \le \frac{4\,\|f_{Y,X}\|}{c^{3}}\,\|h\|^{2} \le a\,\|h\|^{2},
\]
\[
|DL_x(f; h)| = \left| \frac{h_x\, f - f_x\, h}{f^{2}} - \frac{\tilde h_x\,\tilde f - \tilde f_x\,\tilde h}{\tilde f^{2}} \right| \le \frac{4\,\|f_{Y,X}\|}{c^{2}}\,\|h\| \le a\,\|h\|, \text{ and}
\]
\[
|RL_x(f; h)| = \left| \frac{\big[ h_x\, f - f_x\, h \big]\, h}{f^{2}\,(f + h)} - \frac{\big[ \tilde h_x\,\tilde f - \tilde f_x\,\tilde h \big]\,\tilde h}{\tilde f^{2}\,(\tilde f + \tilde h)} \right| \le \frac{8\,\|f_{Y,X}\|}{c^{3}}\,\|h\|^{2} \le a\,\|h\|^{2}.
\]
This completes the proof of the Lemma.
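The algebraic identity behind the decomposition in Lemma A.1 can be checked numerically. The snippet below uses scalar stand-ins for the values $f(y,x)$, $f_y(y,x)$, $h(y,x)$, and $h_y(y,x)$ at a fixed point (any values with $f > 0$ and $f + h > 0$ work; the particular numbers are illustrative):

```python
# Scalar stand-ins for f(y,x), f_y(y,x), h(y,x), h_y(y,x) at one point.
f, fy, h, hy = 1.3, 0.4, 0.2, -0.1

lhs = (fy + hy) / (f + h) - fy / f                 # L_y(f+h) - L_y(f)
DL = (hy * f - fy * h) / f**2                      # Frechet-derivative term DL_y(f;h)
RL = -(hy * f - fy * h) * h / (f**2 * (f + h))     # remainder term RL_y(f;h)

# The decomposition holds exactly, not just to first order.
assert abs(lhs - (DL + RL)) < 1e-12
```

The same check applies to the $L_x$ decomposition, with the additional pair of terms in $\tilde f$ and $\tilde h$.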
Lemma A.2: Let $\mathcal F$, $\|\cdot\|$, $L_y(\cdot)$, $L_x(\cdot)$, $DL_y$, $RL_y$, $DL_x$, and $RL_x$ be as in the statement of Lemma A.1. Assume that the density $f_{Y,X}$ belongs to $\mathcal F$ and is such that, for some $c > 0$ and all $(y, x) \in Y^{*} \times X^{*}$, $f_{Y,X}(y, x) > c$ and $\tilde f_{Y,X}(x) > c$. Let $w$ denote a bounded, nonnegative, continuously differentiable function, with values equal to zero when any coordinate of $(y, x) \in Y^{*} \times X^{*}$ is on the boundary of $Y^{*} \times X^{*}$, positive values when $(y, x)$ is in the interior of $Y^{*} \times X^{*}$, and such that $\int w(y, x)\, d(y, x) = 1$. For simplicity we will leave the argument $(y, x)$ implicit. Define the functional $\Phi_{yx}$ on $\mathcal F$ by
\[
\Phi_{yx}(f) = \int L_y(f)\, L_x(f)\, w\, d(y, x) - \left( \int L_y(f)\, w\, d(y, x) \right)\left( \int L_x(f)\, w\, d(y, x) \right).
\]
Then, there exist finite $a_1 > 0$ and $\tilde\delta_1 > 0$ such that for all $h \in \mathcal F$ with $\|h\| \le \tilde\delta_1$,
\[
\Phi_{yx}(f + h) - \Phi_{yx}(f) = D\Phi_{yx}(f; h) + R\Phi_{yx}(f; h),
\]
where, employing the notation of Lemma A.1 for the definitions of $DL_y(f; h)$, $RL_y(f; h)$, $DL_x(f; h)$, and $RL_x(f; h)$,
\[
D\Phi_{yx}(f; h) = \int DL_y(f; h)\left( L_x(f) - \int L_x(f)\, w \right) w
+ \int DL_x(f; h)\left( L_y(f) - \int L_y(f)\, w \right) w,
\]
\[
R\Phi_{yx}(f; h) = \int RL_y(f; h)\left( L_x(f) - \int L_x(f)\, w \right) w
+ \int RL_x(f; h)\left( L_y(f) - \int L_y(f)\, w \right) w
\]
\[
+ \int \big( DL_y(f; h) + RL_y(f; h) \big)\big( DL_x(f; h) + RL_x(f; h) \big)\, w
- \left( \int \big( DL_y(f; h) + RL_y(f; h) \big)\, w \right)\left( \int \big( DL_x(f; h) + RL_x(f; h) \big)\, w \right).
\]
Moreover,
\[
\big| D\Phi_{yx}(f; h) \big| \le a_1\,\|h\|; \qquad \big| R\Phi_{yx}(f; h) \big| \le a_1\,\|h\|^{2}.
\]
Proof: Let $\tilde\delta_1 = \min\{\tilde\delta_0,\, c/2,\, 1\}$. To show that for all $h$ such that $\|h\| \le \tilde\delta_1$,
\[
\Phi_{yx}(f + h) - \Phi_{yx}(f) = D\Phi_{yx}(f; h) + R\Phi_{yx}(f; h),
\]
we note that, by the definitions of $\Phi_{yx}$, $L_y$, and $L_x$, it can be shown by adding and subtracting terms that
\[
\Phi_{yx}(f + h) - \Phi_{yx}(f)
= \int \big( L_y(f + h) - L_y(f) \big)\left( L_x(f) - \int L_x(f)\, w \right) w
+ \int \big( L_x(f + h) - L_x(f) \big)\left( L_y(f) - \int L_y(f)\, w \right) w
\]
\[
+ \int \big( L_y(f + h) - L_y(f) \big)\big( L_x(f + h) - L_x(f) \big)\, w
- \left( \int \big( L_y(f + h) - L_y(f) \big)\, w \right)\left( \int \big( L_x(f + h) - L_x(f) \big)\, w \right).
\]
Employing the expressions for $L_y(f + h) - L_y(f)$ and $L_x(f + h) - L_x(f)$ in terms of $DL_y(f; h)$, $RL_y(f; h)$, $DL_x(f; h)$, and $RL_x(f; h)$ guaranteed by Lemma A.1, the sum of the first two terms is $D\Phi_{yx}(f; h)$ and the sum of the remaining terms is $R\Phi_{yx}(f; h)$. Hence,
\[
\Phi_{yx}(f + h) - \Phi_{yx}(f) = D\Phi_{yx}(f; h) + R\Phi_{yx}(f; h).
\]
It remains to show that for some finite $a_1 > 0$,
\[
\big| D\Phi_{yx}(f; h) \big| \le a_1\,\|h\| \quad\text{and}\quad \big| R\Phi_{yx}(f; h) \big| \le a_1\,\|h\|^{2}.
\]
Let $a = \max\big\{ 4\,\|f_{Y,X}\|/c^{2},\; 8\,\|f_{Y,X}\|/c^{3} \big\}$, as defined in the proof of Lemma A.1, and let $a_1 = \max\big\{ 16\, a\,\|f_{Y,X}\|/c,\; 16\, a^{2} \big\}$. Since by assumption $f > c$ and $\tilde f > c$, it follows by the definitions of $L_y(f)$ and $L_x(f)$, the assumption that $\int w\, d(y, x) = 1$, and the conclusion of Lemma A.1 that
\[
\left| \int DL_y(f; h)\left( L_x(f) - \int L_x(f)\, w \right) w \right|
\le a\,\|h\| \cdot \frac{4\,\|f_{Y,X}\|}{c} \le \frac{a_1}{2}\,\|h\|,
\]
and that
\[
\left| \int DL_x(f; h)\left( L_y(f) - \int L_y(f)\, w \right) w \right|
\le a\,\|h\| \cdot \frac{4\,\|f_{Y,X}\|}{c} \le \frac{a_1}{2}\,\|h\|.
\]
Hence, $\big| D\Phi_{yx}(f; h) \big| \le a_1\,\|h\|$. Analogously, for the four terms of $R\Phi_{yx}(f; h)$,
\[
\left| \int RL_y(f; h)\left( L_x(f) - \int L_x(f)\, w \right) w \right| \le \frac{a_1}{4}\,\|h\|^{2},
\qquad
\left| \int RL_x(f; h)\left( L_y(f) - \int L_y(f)\, w \right) w \right| \le \frac{a_1}{4}\,\|h\|^{2},
\]
\[
\left| \int \big( DL_y + RL_y \big)\big( DL_x + RL_x \big)\, w \right|
\le \big( a\,\|h\| + a\,\|h\|^{2} \big)^{2} \le 4\, a^{2}\,\|h\|^{2} \le \frac{a_1}{4}\,\|h\|^{2},
\]
and
\[
\left| \left( \int \big( DL_y + RL_y \big)\, w \right)\left( \int \big( DL_x + RL_x \big)\, w \right) \right|
\le \big( a\,\|h\| + a\,\|h\|^{2} \big)^{2} \le \frac{a_1}{4}\,\|h\|^{2}.
\]
Hence, $\big| R\Phi_{yx}(f; h) \big| \le a_1\,\|h\|^{2}$. This completes the proof.
Lemma A.3: Let $\mathcal F$, $\|\cdot\|$, $L_y(\cdot)$, $L_x(\cdot)$, and $w$ be as in the statement of Lemma A.2. Assume that the density $f_{Y,X}$ belongs to $\mathcal F$ and is such that, for some $c > 0$ and all $(y, x) \in Y^{*} \times X^{*}$, $f_{Y,X}(y, x) > c$ and $\tilde f_{Y,X}(x) > c$. For simplicity we will leave the argument $(y, x)$ implicit. Define the functionals $\Phi_{yy}$ and $\Phi_{xx}$ on $\mathcal F$ by
\[
\Phi_{yy}(f) = \int L_y(f)\, L_y(f)\, w\, d(y, x) - \left( \int L_y(f)\, w \right)\left( \int L_y(f)\, w \right) \quad\text{and}
\]
\[
\Phi_{xx}(f) = \int L_x(f)\, L_x(f)\, w\, d(y, x) - \left( \int L_x(f)\, w \right)\left( \int L_x(f)\, w \right).
\]
Then, there exist finite $a_2, a_3 > 0$ and $\tilde\delta_2, \tilde\delta_3 > 0$ such that for all $h \in \mathcal F$ with $\|h\| \le \min\{\tilde\delta_2, \tilde\delta_3\}$,
\[
\Phi_{yy}(f + h) - \Phi_{yy}(f) = D\Phi_{yy}(f; h) + R\Phi_{yy}(f; h) \quad\text{and}
\]
\[
\Phi_{xx}(f + h) - \Phi_{xx}(f) = D\Phi_{xx}(f; h) + R\Phi_{xx}(f; h),
\]
where, employing the notation of Lemma A.1,
\[
D\Phi_{yy}(f; h) = 2\int DL_y(f; h)\left( L_y(f) - \int L_y(f)\, w \right) w,
\]
\[
R\Phi_{yy}(f; h) = 2\int RL_y(f; h)\left( L_y(f) - \int L_y(f)\, w \right) w
+ \int \big( DL_y + RL_y \big)^{2}\, w - \left( \int \big( DL_y + RL_y \big)\, w \right)^{2},
\]
and $D\Phi_{xx}(f; h)$ and $R\Phi_{xx}(f; h)$ are defined analogously, with $L_x$, $DL_x$, and $RL_x$ in place of $L_y$, $DL_y$, and $RL_y$. Moreover,
\[
\big| D\Phi_{yy}(f; h) \big| \le a_2\,\|h\|; \quad \big| R\Phi_{yy}(f; h) \big| \le a_2\,\|h\|^{2}; \quad
\big| D\Phi_{xx}(f; h) \big| \le a_3\,\|h\|; \quad \big| R\Phi_{xx}(f; h) \big| \le a_3\,\|h\|^{2}.
\]
Proof: The proof follows similarly to the proof of Lemma A.2, after replacing $L_x$ with $L_y$ in the results related to $\Phi_{yy}(f)$ and replacing $L_y$ with $L_x$ in the results related to $\Phi_{xx}(f)$, with the obvious modification of the upper bounds.
Lemma A.4: Let $\mathcal F$, $\|\cdot\|$, $w$, and the functionals $\Phi_{jk}(\cdot)$, for $j, k \in \{y_1, y_2, x\}$, be as in the statements of Lemmas A.2 and A.3, with $L_j$ and $L_k$ in place of $L_y$ and $L_x$. Assume that the density $f_{Y,X}$ belongs to $\mathcal F$ and is such that, for some $c > 0$ and all $(y, x) \in Y^{*} \times X^{*}$, $f_{Y,X}(y, x) > c$ and $\tilde f_{Y,X}(x) > c$. For simplicity we will leave the argument $(y, x)$ implicit. Define the functional $\Xi$ on $\mathcal F$ by
\[
\Xi(f) = \frac{\Phi_{y_1 y_2}(f)\,\Phi_{xx}(f) - \Phi_{y_1 x}(f)\,\Phi_{y_2 x}(f)}{\Phi_{y_1 y_1}(f)\,\Phi_{xx}(f) - \Phi_{y_1 x}^{2}(f)},
\]
and assume that for some $c^{*} > 0$, $\big[ \Phi_{y_1 y_1}(f_{Y,X})\,\Phi_{xx}(f_{Y,X}) - \Phi_{y_1 x}^{2}(f_{Y,X}) \big] > c^{*}$. Then, there exist finite $\bar a > 0$ and $\tilde\delta > 0$ such that for all $h \in \mathcal F$ with $\|h\| \le \tilde\delta$,
\[
\Xi(f + h) - \Xi(f) = D\Xi(f; h) + R\Xi(f; h),
\]
where
\[
D\Xi(f; h)
= \frac{D\Phi_{y_1 y_2}(f; h)\,\Phi_{xx}(f) - D\Phi_{y_1 x}(f; h)\,\Phi_{y_2 x}(f)}{\Phi_{y_1 y_1}(f)\,\Phi_{xx}(f) - \Phi_{y_1 x}^{2}(f)}
+ \frac{\Phi_{y_1 y_2}(f)\, D\Phi_{xx}(f; h) - \Phi_{y_1 x}(f)\, D\Phi_{y_2 x}(f; h)}{\Phi_{y_1 y_1}(f)\,\Phi_{xx}(f) - \Phi_{y_1 x}^{2}(f)}
\]
\[
- \frac{\big[ \Phi_{y_1 y_2}(f)\,\Phi_{xx}(f) - \Phi_{y_1 x}(f)\,\Phi_{y_2 x}(f) \big]\big[ D\Phi_{y_1 y_1}(f; h)\,\Phi_{xx}(f) + \Phi_{y_1 y_1}(f)\, D\Phi_{xx}(f; h) \big]}{\big[ \Phi_{y_1 y_1}(f)\,\Phi_{xx}(f) - \Phi_{y_1 x}^{2}(f) \big]^{2}}
\]
\[
+ \frac{\big[ 2\,\Phi_{y_1 x}(f)\, D\Phi_{y_1 x}(f; h) \big]\big[ \Phi_{y_1 y_2}(f)\,\Phi_{xx}(f) - \Phi_{y_1 x}(f)\,\Phi_{y_2 x}(f) \big]}{\big[ \Phi_{y_1 y_1}(f)\,\Phi_{xx}(f) - \Phi_{y_1 x}^{2}(f) \big]^{2}},
\]
and
\[
\big| D\Xi(f; h) \big| \le \bar a\,\|h\|; \qquad \big| R\Xi(f; h) \big| \le \bar a\,\|h\|^{2}.
\]
Proof: Let $b = \max\big\{ a_1,\, a_2,\, a_3,\; 8\,(\|f_{Y,X}\|/c)^{2} \big\}$ and $\tilde\delta = \min\big\{ \tilde\delta_1,\, \tilde\delta_2,\, \tilde\delta_3,\; c^{*}/(32\, b^{2}) \big\}$, where $\tilde\delta_1$ and $a_1$ are defined in Lemma A.2 and $\tilde\delta_2, \tilde\delta_3, a_2, a_3$ are defined in Lemma A.3. Since the assumptions of Lemmas A.2 and A.3 are satisfied, it follows from those lemmas that, for all $h$ in $\mathcal F$ with $\|h\| \le \tilde\delta$ and all $j, k \in \{y_1, y_2, x\}$,
\[
\Phi_{jk}(f + h) - \Phi_{jk}(f) = D\Phi_{jk}(f; h) + R\Phi_{jk}(f; h),
\qquad
\big| D\Phi_{jk}(f; h) \big| \le b\,\|h\|, \quad \big| R\Phi_{jk}(f; h) \big| \le b\,\|h\|^{2}.
\]
We seek a first-order expansion
\[
\Xi(f + h) - \Xi(f) = D\Xi(f; h) + R\Xi(f; h)
\quad\text{with}\quad \big| D\Xi(f; h) \big| \le \bar a\,\|h\| \text{ and } \big| R\Xi(f; h) \big| \le \bar a\,\|h\|^{2}.
\]
Denote
\[
n' = \Phi_{y_1 y_2}(f + h)\,\Phi_{xx}(f + h) - \Phi_{y_1 x}(f + h)\,\Phi_{y_2 x}(f + h), \qquad
n = \Phi_{y_1 y_2}(f)\,\Phi_{xx}(f) - \Phi_{y_1 x}(f)\,\Phi_{y_2 x}(f),
\]
\[
d' = \Phi_{y_1 y_1}(f + h)\,\Phi_{xx}(f + h) - \big[ \Phi_{y_1 x}(f + h) \big]^{2}, \qquad
d = \Phi_{y_1 y_1}(f)\,\Phi_{xx}(f) - \big[ \Phi_{y_1 x}(f) \big]^{2},
\]
so that $\Xi(f + h) - \Xi(f) = n'/d' - n/d$. We will make use of the equality
\[
\frac{n'}{d'} - \frac{n}{d} = \frac{n' d - n\, d'}{d^{2}} - \frac{\big( n' d - n\, d' \big)\big( d' - d \big)}{d^{2}\, d'}.
\]
Expanding $\Phi_{jk}(f + h) = \Phi_{jk}(f) + D\Phi_{jk}(f; h) + R\Phi_{jk}(f; h)$ in $n'$ and $d'$ gives
\[
n' = n + D\Phi_{y_1 y_2}\,\Phi_{xx} + \Phi_{y_1 y_2}\, D\Phi_{xx} - D\Phi_{y_1 x}\,\Phi_{y_2 x} - \Phi_{y_1 x}\, D\Phi_{y_2 x} + e_n \quad\text{and}
\]
\[
d' = d + D\Phi_{y_1 y_1}\,\Phi_{xx} + \Phi_{y_1 y_1}\, D\Phi_{xx} - 2\,\Phi_{y_1 x}\, D\Phi_{y_1 x} + e_d,
\]
where $e_n$ and $e_d$ collect all the terms containing at least one remainder $R\Phi_{jk}(f; h)$ or a product of two increments $D\Phi_{jk}(f; h) + R\Phi_{jk}(f; h)$. Define $D\Xi(f; h)$ as in the statement of the Lemma and $R\Xi(f; h) = \Xi(f + h) - \Xi(f) - D\Xi(f; h)$. Then, by construction,
\[
\Xi(f + h) - \Xi(f) = D\Xi(f; h) + R\Xi(f; h).
\]
It remains to show that, for some $\bar a > 0$, $\big| D\Xi(f; h) \big| \le \bar a\,\|h\|$ and $\big| R\Xi(f; h) \big| \le \bar a\,\|h\|^{2}$. Since $\big| \Phi_{jk}(f) \big| \le 8\,(\|f_{Y,X}\|/c)^{2} \le b$ for all $j, k$, we have $|n| \le 2\, b^{2}$, $|d| \le 2\, b^{2}$, $|e_n| \le 12\, b^{2}\,\|h\|^{2}$, $|e_d| \le 12\, b^{2}\,\|h\|^{2}$, $|n' - n| \le 16\, b^{2}\,\|h\|$, $|d' - d| \le 16\, b^{2}\,\|h\|$, $|n'| \le 18\, b^{2}$, and $\big| n' d - n\, d' \big| \le 64\, b^{4}\,\|h\|$. In particular, $|d' - d| \le 16\, b^{2}\,\tilde\delta \le c^{*}/2$, so that, since $d > c^{*}$,
\[
d' > c^{*}/2 \quad\text{and}\quad d^{2}\, d' > (c^{*})^{3}/2.
\]
Let $\bar a = \max\big\{ 16\, b^{4}/(c^{*})^{2},\; 3776\, b^{6}/(c^{*})^{3} \big\}$. Then, for all $h$ such that $\|h\| \le \tilde\delta$,
\[
\big| D\Xi(f; h) \big| \le \frac{16\, b^{4}}{(c^{*})^{2}}\,\|h\| \le \bar a\,\|h\|,
\]
and
\[
\big| R\Xi(f; h) \big|
\le \frac{18\, b^{2}\,\big( 24\, b^{4}\,\|h\|^{2} + 24\, b^{4}\,\|h\|^{2} \big) + \big( 16\, b^{2}\,\|h\| \big)\big( 64\, b^{4}\,\|h\| \big)}{(c^{*})^{3}/2}
\le \bar a\,\|h\|^{2}.
\]
This completes the proof of Lemma A.4.
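Under the notation used here, the functional $\Xi$ is a ratio built from weighted covariances $\Phi_{jk}$ of log-density derivatives, analogous to a least-squares coefficient. The sketch below is an illustration, not the paper's estimator: it evaluates $\Xi$ from arrays of derivative values supplied on a grid, with the weight $w$ and the grid quadrature weights as hypothetical inputs.

```python
import numpy as np

def xi_functional(Ly1, Ly2, Lx, w, grid_weight):
    """Evaluate Xi = (Phi_{y1y2} Phi_{xx} - Phi_{y1x} Phi_{y2x})
                   / (Phi_{y1y1} Phi_{xx} - Phi_{y1x}^2),
    where each Phi_{jk} = int Lj Lk w - (int Lj w)(int Lk w) is a
    weighted covariance, approximated by a quadrature sum on a grid."""
    def phi(a, b):
        mean_a = np.sum(a * w * grid_weight)
        mean_b = np.sum(b * w * grid_weight)
        return np.sum(a * b * w * grid_weight) - mean_a * mean_b
    num = phi(Ly1, Ly2) * phi(Lx, Lx) - phi(Ly1, Lx) * phi(Ly2, Lx)
    den = phi(Ly1, Ly1) * phi(Lx, Lx) - phi(Ly1, Lx) ** 2
    return num / den
```

When the $y_2$-derivative array is an exact multiple of the $y_1$-derivative array, the ratio recovers that multiple, which is the least-squares interpretation of $\Xi$.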
Lemma A.5: Let $\mathcal F$, $\|\cdot\|$, $L_y(\cdot)$, $L_x(\cdot)$, $w$, and the functionals $\Phi_{yx}(\cdot)$, $\Phi_{yy}(\cdot)$, and $\Phi_{xx}(\cdot)$ be as in the statements of Lemmas A.2 and A.3. Suppose that either Assumptions 2.7-2.10 or Assumptions 2.7, 3.4, 3.5, 2.9, and 2.10 are satisfied. Let $G$ denote the dimension of the vector of observable endogenous variables and let $T = \{y\} \times X^{*}$. Then, for bounded continuous functions $\xi(y, x)$ and $\tilde\xi(x)$ defined in the proof,
\[
\sqrt{N\sigma_N^{G+2}}\; D\Phi_{yx}\big(f_{Y,X};\, \hat f - f_{Y,X}\big)
= \sqrt{N\sigma_N^{G+2}} \left[ \int \big( \hat f(y, x) - f_{Y,X}(y, x) \big)\,\xi(y, x)\, d(y, x)
+ \int \big( \hat{\tilde f}(x) - \tilde f_{Y,X}(x) \big)\,\tilde\xi(x)\, dx \right] + o_p(1),
\]
\[
\sqrt{N\sigma_N^{G+2}} \Big[ \Phi_{yx}(\hat f) - \Phi_{yx}(f_{Y,X}) \Big]
= \sqrt{N\sigma_N^{G+2}}\; D\Phi_{yx}\big(f_{Y,X};\, \hat f - f_{Y,X}\big)
+ \sqrt{N\sigma_N^{G+2}}\; R\Phi_{yx}\big(f_{Y,X};\, \hat f - f_{Y,X}\big),
\]
with $\sqrt{N\sigma_N^{G+2}}\; R\Phi_{yx}\big(f_{Y,X};\, \hat f - f_{Y,X}\big) = o_p(1)$; the analogous statements hold for $\Phi_{yy}$ and $\Phi_{xx}$; and
\[
\sqrt{N\sigma_N^{G+2}}\,\big\| \hat f - f_{Y,X} \big\|^{2} \to 0 \text{ in probability.}
\]
Proof: First note that, by Lemma A.1,
\[
\int DL_x(f; h)\left( L_y(f) - \int L_y(f)\, w \right) w
= \int \left[ \frac{h_x\, f - f_x\, h}{f^{2}} - \frac{\tilde h_x\,\tilde f - \tilde f_x\,\tilde h}{\tilde f^{2}} \right]
\left( L_y(f) - \int L_y(f)\, w \right) w. \tag{2}
\]
Since, by Assumption 2.8, the values and derivatives of the continuously differentiable function $w$ are equal to zero when any coordinate of $(y, x)$ is on the boundary of $Y^{*} \times X^{*}$, and $f$ and $\tilde f$ are larger than $c$ everywhere on $Y^{*} \times X^{*}$, the continuously differentiable functions multiplying $h_x$ and $\tilde h_x$ in (2) vanish when any coordinate of $(y, x)$ is on the boundary of $Y^{*} \times X^{*}$. Hence, integration by parts of the terms in (2) containing $h_x$ or $\tilde h_x$ yields expressions in which $h$ and $\tilde h$ appear without derivatives. Let $h = \hat f - f_{Y,X}$, write $g = L_y(f) - \int L_y(f)\, w$, and define
\[
\xi(y, x) = -\,\frac{\partial}{\partial x}\!\left[ \frac{g\, w}{f} \right] - \frac{f_x\, g\, w}{f^{2}},
\qquad
\tilde\xi(x) = \frac{\partial}{\partial x}\!\left[ \frac{\int g\, w\, dy}{\tilde f} \right] + \frac{\tilde f_x \int g\, w\, dy}{\tilde f^{2}},
\]
augmented by the analogous integration-by-parts terms generated by $DL_y(f; h)$ applied against $\big( L_x(f) - \int L_x(f)\, w \big)\, w$. Then, by Lemmas A.2 and A.3 and by (2),
\[
D\Phi_{yx}\big(f_{Y,X};\, \hat f - f_{Y,X}\big)
= \int \big( \hat f(y, x) - f_{Y,X}(y, x) \big)\,\xi(y, x)\, d(y, x)
+ \int \big( \hat{\tilde f}(x) - \tilde f_{Y,X}(x) \big)\,\tilde\xi(x)\, dx.
\]
By Assumptions 2.7, 2.9, and 2.10, it follows by Lemma 5.3 in Newey (1994), applied to the linear functional $\int \hat f(y, x)\,\xi(y, x)\, d(y, x)$, that for a finite constant matrix $V_1$,
\[
\sqrt{N} \int \big( \hat f(y, x) - f_{Y,X}(y, x) \big)\,\xi(y, x)\, d(y, x) \to N(0, V_1).
\]
Similarly, letting in that same lemma the functional be $\int \hat{\tilde f}(x)\,\tilde\xi(x)\, dx$, it follows that for a finite constant matrix $V_2$,
\[
\sqrt{N} \int \big( \hat{\tilde f}(x) - \tilde f_{Y,X}(x) \big)\,\tilde\xi(x)\, dx \to N(0, V_2).
\]
Hence,
\[
\sqrt{N\sigma_N^{G+2}}\; D\Phi_{yx}\big(f_{Y,X};\, \hat f - f_{Y,X}\big)
= \sqrt{N\sigma_N^{G+2}} \left[ \int \big( \hat f - f_{Y,X} \big)\,\xi + \int \big( \hat{\tilde f} - \tilde f_{Y,X} \big)\,\tilde\xi \right] + o_p(1).
\]
Using analogous arguments for $D\Phi_{yy}\big(f_{Y,X};\, \hat f - f_{Y,X}\big)$ and for $D\Phi_{xx}\big(f_{Y,X};\, \hat f - f_{Y,X}\big)$, the corresponding representations follow. To show that
\[
\sqrt{N\sigma_N^{G+2}}\,\big\| \hat f - f_{Y,X} \big\|^{2} \to 0 \text{ in probability,}
\]
we note that, from Appendix B.3 in Newey (1994),
\[
\big\| \hat f - f_{Y,X} \big\| = O_p\!\left( \sqrt{\frac{\log(N)}{N\,\sigma_N^{G+4}}} + \sigma_N^{s} \right).
\]
Since, by Assumption 2.10, $\sqrt{N\sigma_N^{G+2}} \left( \sqrt{\log(N)/(N\,\sigma_N^{G+4})} + \sigma_N^{s} \right)^{2}$ converges to zero, it follows that
\[
\sqrt{N\sigma_N^{G+2}}\,\big\| \hat f - f_{Y,X} \big\|^{2} \to 0 \text{ in probability.}
\]
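The key structural fact used above, via Newey (1994), is that a linear functional of a kernel density estimator is exactly an average of i.i.d. terms, to which a central limit theorem applies. The sketch below illustrates this with a one-dimensional Gaussian-kernel example; the grid, bandwidth, and function names are illustrative, and the integral is approximated by a Riemann sum.

```python
import numpy as np

def linear_functional_of_kde(data, s, z_grid, h):
    """Compute T(fhat) = integral of fhat(z) * s(z) dz in two ways:
    (i) as a Riemann sum over the grid, and
    (ii) as the sample average (1/n) sum_i [integral K_h(z - Z_i) s(z) dz].
    The two coincide by Fubini's theorem; form (ii) exhibits T(fhat)
    as an average of i.i.d. terms."""
    dz = z_grid[1] - z_grid[0]
    u = (z_grid[None, :] - data[:, None]) / h
    k = np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h)   # (n, m) kernel matrix
    fhat = k.mean(axis=0)                                 # KDE on the grid
    t_integral = np.sum(fhat * s(z_grid)) * dz            # form (i)
    t_average = np.mean(np.sum(k * s(z_grid)[None, :], axis=1) * dz)  # form (ii)
    return t_integral, t_average
```

Because the two forms are the same finite sum in a different order, they agree to floating-point precision for any sample.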
Lemma A.6: Under the same definitions, notation, and assumptions as in Lemma A.5, for all $j, k \in \{y_1, y_2, x\}$, some finite $a > 0$, and $\big\| \hat f - f_{Y,X} \big\|$ sufficiently small,
\[
\sup_{(y, x) \in T} \Big| \big( \Delta_j \log \hat f_{Y|X=x}(y) \big)\big( \Delta_k \log \hat f_{Y|X=x}(y) \big)\, w(y, x)^{2}
- \big( \Delta_j \log f_{Y|X=x}(y) \big)\big( \Delta_k \log f_{Y|X=x}(y) \big)\, w(y, x)^{2} \Big|
\le a\,\big\| \hat f - f_{Y,X} \big\|,
\]
where $\Delta_j \log f_{Y|X=x}(y)$ denotes the demeaned derivative $L_j(f) - \int L_j(f)\, w$.
Proof: By Lemma A.1, it follows that, for all $\hat f$ such that $\big\| \hat f - f_{Y,X} \big\| \le \tilde\delta_0$ and all $(y, x) \in T$,
\[
\left| \frac{\hat f_y(y, x)}{\hat f(y, x)} - \frac{f_y(y, x)}{f(y, x)} \right|
= \big| L_y(\hat f) - L_y(f) \big|
\le \big| DL_y\big(f;\, \hat f - f\big) \big| + \big| RL_y\big(f;\, \hat f - f\big) \big|
\le a\,\big\| \hat f - f \big\| + a\,\big\| \hat f - f \big\|^{2}.
\]
Hence,
\[
\sup_{(y, x) \in T} \big| L_y(\hat f) - L_y(f) \big| \le a\,\big\| \hat f - f \big\| + a\,\big\| \hat f - f \big\|^{2}.
\]
Since $w$ is bounded, $\int w\, d(y, x) = 1$, and $T$ is compact, this implies that
\[
\left| \int \big( L_y(\hat f) - L_y(f) \big)\, w\, d(y, x) \right|
\le \int \big| L_y(\hat f) - L_y(f) \big|\, w
\le a\,\big\| \hat f - f \big\| + a\,\big\| \hat f - f \big\|^{2},
\]
and therefore, for all $(y, x) \in T$,
\[
\left| \left( L_y(\hat f) - \int L_y(\hat f)\, w \right) - \left( L_y(f) - \int L_y(f)\, w \right) \right|
\le \big| L_y(\hat f) - L_y(f) \big| + \left| \int \big( L_y(\hat f) - L_y(f) \big)\, w \right|
\le 2a\,\big\| \hat f - f \big\| + 2a\,\big\| \hat f - f \big\|^{2}.
\]
By an analogous argument for $L_x$, it can be shown, employing Lemma A.1, that for all $(y, x) \in T$,
\[
\left| \left( L_x(\hat f) - \int L_x(\hat f)\, w \right) - \left( L_x(f) - \int L_x(f)\, w \right) \right|
\le 2a\,\big\| \hat f - f \big\| + 2a\,\big\| \hat f - f \big\|^{2}.
\]
Since for all $(y, x) \in T$, $f(y, x) > c$ and $\tilde f(x) > c$, it also follows, since by the definition of $\tilde\delta_0$ in the proof of Lemma A.1, $\tilde\delta_0 = \min\{c/2,\, 1\}$, that for all $\hat f$ such that $\big\| \hat f - f \big\| \le \tilde\delta_0$,
\[
\left| \frac{1}{\hat f(y, x)} - \frac{1}{f(y, x)} \right|
\le \frac{\big| \hat f(y, x) - f(y, x) \big|}{\big| \hat f(y, x) \big|\, f(y, x)}
\le \frac{2\,\big\| \hat f - f \big\|}{c^{2}}.
\]
The bounds above show that, for some finite $\bar c$ and $\big\| \hat f - f \big\|$ sufficiently small,
\[
\sup_{(y, x) \in T} \big| \Delta_j \log \hat f_{Y|X=x}(y) - \Delta_j \log f_{Y|X=x}(y) \big| \le \bar c\,\big\| \hat f - f \big\|,
\]
while $\Delta_j \log f_{Y|X=x}(y)$ and $w$ are uniformly bounded on $T$ by Assumption 2.7. Hence, writing
\[
\hat A\,\hat B - A\, B = \big( \hat A - A \big)\, B + \hat A\,\big( \hat B - B \big),
\]
with $\hat A = \big( \Delta_j \log \hat f_{Y|X=x}(y) \big)\, w$, $A = \big( \Delta_j \log f_{Y|X=x}(y) \big)\, w$, $\hat B = \big( \Delta_k \log \hat f_{Y|X=x}(y) \big)\, w$, and $B = \big( \Delta_k \log f_{Y|X=x}(y) \big)\, w$, it follows that, for a finite $a > 0$ and all $(y, x) \in T$,
\[
\Big| \big( \Delta_j \log \hat f_{Y|X=x}(y) \big)\big( \Delta_k \log \hat f_{Y|X=x}(y) \big)\, w(y, x)^{2}
- \big( \Delta_j \log f_{Y|X=x}(y) \big)\big( \Delta_k \log f_{Y|X=x}(y) \big)\, w(y, x)^{2} \Big|
\le a\,\big\| \hat f - f_{Y,X} \big\|,
\]
where the argument was given for $j = y$ and $k = x$; a similar argument holds for the other pairs $j, k \in \{y_1, y_2, x\}$.
Lemma B.1: Suppose that (32) and Assumptions 3.1-3.3 are satisfied. Then,
$$\big(b^0-\tilde b^0\,(\tilde e^{0\prime}\tilde e^0)^{-1}\tilde e^{0\prime}\,e^0\big)\,\phi=\Big[1-\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big(\frac{|e|}{e_{11}\,e_2}\Big)\Big](g-\mu)-\Big[\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big(\frac{e_{12}}{e_{11}}\Big)-\tilde e_{12}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big](g_1-\mu_1)$$
and
$$-(\mu-\tilde\mu)+\tilde b^0\,(\tilde e^{0\prime}\tilde e^0)^{-1}\tilde e^{0\prime}\begin{pmatrix}\mu_1-\tilde\mu_1\\ \mu_2-\tilde\mu_2\end{pmatrix}=-(\mu-\tilde\mu)-\tilde e_{12}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)(\mu_1-\tilde\mu_1)+\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)(\mu_2-\tilde\mu_2),$$
where the notation corresponds to that used in the proof of Theorem 3.1.
Proof: Taking logs and differentiating both sides of (31) in the proof of Theorem 3.1 with respect to $x$, leaving the arguments implicit, we get

(11)  $g=e_2\,\phi_2+\mu,$

and taking logs and differentiating both sides of (31) with respect to $y_1$ we get

(12)  $g_1=e_{11}\,\phi_1+e_{21}\,\phi_2+\mu_1.$

Since the matrix
$$\begin{pmatrix}0&e_2\\ e_{11}&e_{21}\end{pmatrix}$$
is invertible, because by assumption $e_{11}$ and $e_2$ are different from zero, the unique value of the vector $(\phi_1,\phi_2)$ that solves (11)-(12) is
(13)  $\begin{pmatrix}\phi_1\\ \phi_2\end{pmatrix}=\begin{pmatrix}-\dfrac{e_{21}}{e_{11}e_2}&\dfrac{1}{e_{11}}\\[4pt] \dfrac{1}{e_2}&0\end{pmatrix}\begin{pmatrix}g-\mu\\ g_1-\mu_1\end{pmatrix}.$

Since $b^0=(0,\ e_2)$, it follows by (13) that

(14)  $b^0\phi=e_2\,\phi_2=g-\mu.$
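The inversion behind (13) can be sanity-checked numerically. The sketch below is not part of the original proof; the values of $e_2$, $e_{11}$, $e_{21}$, $g$, $g_1$, $\mu$, $\mu_1$ are arbitrary nonzero stand-ins:

```python
import numpy as np

# Arbitrary nonzero values standing in for the structural quantities
e2, e11, e21 = 1.7, -0.8, 0.5
g, g1, mu, mu1 = 0.9, -1.2, 0.3, 0.4

# System (11)-(12):  g - mu = e2*phi2,  g1 - mu1 = e11*phi1 + e21*phi2
A = np.array([[0.0, e2], [e11, e21]])
phi = np.linalg.solve(A, np.array([g - mu, g1 - mu1]))

# Closed-form inverse displayed in (13)
Ainv = np.array([[-e21 / (e11 * e2), 1.0 / e11], [1.0 / e2, 0.0]])
phi13 = Ainv @ np.array([g - mu, g1 - mu1])

assert np.allclose(phi, phi13)
print("inverse in (13) verified")
```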
By the definition of $e^0$ and (13),

(15)  $e^0\phi=\begin{pmatrix}e_{11}&e_{21}\\ e_{12}&e_{22}\end{pmatrix}\begin{pmatrix}\phi_1\\ \phi_2\end{pmatrix}=\begin{pmatrix}g_1-\mu_1\\[4pt] \Big(\dfrac{|e|}{e_{11}e_2}\Big)(g-\mu)+\Big(\dfrac{e_{12}}{e_{11}}\Big)(g_1-\mu_1)\end{pmatrix},$

where $|e|=e_{11}e_{22}-e_{12}e_{21}$. Note that, since $\tilde e^0$ is invertible,

(16)  $\tilde b^0\,(\tilde e^{0\prime}\tilde e^0)^{-1}\tilde e^{0\prime}=\tilde b^0\,(\tilde e^0)^{-1}=(0,\ \tilde e_2)\,\frac{1}{|\tilde e|}\begin{pmatrix}\tilde e_{22}&-\tilde e_{21}\\ -\tilde e_{12}&\tilde e_{11}\end{pmatrix}=\Big(-\tilde e_{12}\Big(\frac{\tilde e_2}{|\tilde e|}\Big),\ \ \tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big).$
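The row vector in (16) can likewise be checked numerically (a sketch, not part of the original proof; the values of $\tilde e_2,\tilde e_{11},\tilde e_{21},\tilde e_{12},\tilde e_{22}$ below are arbitrary):

```python
import numpy as np

te2, te11, te21, te12, te22 = 0.9, 1.4, -0.6, 0.8, -1.1
E = np.array([[te11, te21], [te12, te22]])   # stand-in for e^0 (tilde)
b0 = np.array([0.0, te2])                    # stand-in for b^0 (tilde)
detE = te11 * te22 - te12 * te21             # |e| (tilde), the determinant

# (E'E)^{-1} E' reduces to E^{-1} for an invertible square E
lhs = b0 @ np.linalg.inv(E.T @ E) @ E.T
rhs = np.array([-te12 * te2 / detE, te11 * te2 / detE])
assert np.allclose(lhs, rhs)
print("row vector in (16) verified")
```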
By (15) and (16) it then follows that

(17)  $\big[\tilde b^0(\tilde e^{0\prime}\tilde e^0)^{-1}\tilde e^{0\prime}\big]\big[e^0\phi\big]=\Big(-\tilde e_{12}\Big(\dfrac{\tilde e_2}{|\tilde e|}\Big),\ \tilde e_{11}\Big(\dfrac{\tilde e_2}{|\tilde e|}\Big)\Big)\begin{pmatrix}g_1-\mu_1\\[4pt] \Big(\dfrac{|e|}{e_{11}e_2}\Big)(g-\mu)+\Big(\dfrac{e_{12}}{e_{11}}\Big)(g_1-\mu_1)\end{pmatrix}$

$$=\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big(\frac{|e|}{e_{11}e_2}\Big)(g-\mu)+\Big[\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big(\frac{e_{12}}{e_{11}}\Big)-\tilde e_{12}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big](g_1-\mu_1).$$
By (14) and (17),
$$\big(b^0-\tilde b^0(\tilde e^{0\prime}\tilde e^0)^{-1}\tilde e^{0\prime}e^0\big)\phi=\big[b^0\phi\big]-\big[\tilde b^0(\tilde e^{0\prime}\tilde e^0)^{-1}\tilde e^{0\prime}\big]\big[e^0\phi\big]$$
$$=(g-\mu)-\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big(\frac{|e|}{e_{11}e_2}\Big)(g-\mu)-\Big[\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big(\frac{e_{12}}{e_{11}}\Big)-\tilde e_{12}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big](g_1-\mu_1)$$
$$=\Big[1-\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big(\frac{|e|}{e_{11}e_2}\Big)\Big](g-\mu)-\Big[\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big(\frac{e_{12}}{e_{11}}\Big)-\tilde e_{12}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big](g_1-\mu_1),$$
which establishes the first statement.
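The first identity of Lemma B.1 is pure linear algebra and can be verified numerically for arbitrary values. The sketch below is not part of the original proof; all numerical values are arbitrary nonzero stand-ins for the $e$-terms and their tilde counterparts:

```python
import numpy as np

# Arbitrary nonzero stand-ins for the true (e) and tilde (te) quantities
e2, e11, e21, e12, e22 = 1.7, -0.8, 0.5, 0.4, 1.2
te2, te11, te21, te12, te22 = 0.9, 1.4, -0.6, 0.8, -1.1
g, g1, mu, mu1 = 0.9, -1.2, 0.3, 0.4

# phi solves (11)-(12)
phi = np.linalg.solve(np.array([[0.0, e2], [e11, e21]]),
                      np.array([g - mu, g1 - mu1]))
e0 = np.array([[e11, e21], [e12, e22]])
b0 = np.array([0.0, e2])
tE = np.array([[te11, te21], [te12, te22]])
tb0 = np.array([0.0, te2])
dete = e11 * e22 - e12 * e21
dtE = te11 * te22 - te12 * te21

# Left-hand side: (b0 - tb0 (tE' tE)^{-1} tE' e0) phi
lhs = b0 @ phi - tb0 @ np.linalg.inv(tE.T @ tE) @ tE.T @ (e0 @ phi)
# Right-hand side: the displayed closed form
rhs = ((1 - te11 * (te2 / dtE) * (dete / (e11 * e2))) * (g - mu)
       - (te11 * (te2 / dtE) * (e12 / e11) - te12 * (te2 / dtE)) * (g1 - mu1))
assert np.isclose(lhs, rhs)
print("Lemma B.1 first identity verified")
```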
By (16),
$$-(\mu-\tilde\mu)+\tilde b^0(\tilde e^{0\prime}\tilde e^0)^{-1}\tilde e^{0\prime}\begin{pmatrix}\mu_1-\tilde\mu_1\\ \mu_2-\tilde\mu_2\end{pmatrix}=-(\mu-\tilde\mu)+\Big(-\tilde e_{12}\Big(\frac{\tilde e_2}{|\tilde e|}\Big),\ \tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)\Big)\begin{pmatrix}\mu_1-\tilde\mu_1\\ \mu_2-\tilde\mu_2\end{pmatrix}$$
$$=-(\mu-\tilde\mu)-\tilde e_{12}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)(\mu_1-\tilde\mu_1)+\tilde e_{11}\Big(\frac{\tilde e_2}{|\tilde e|}\Big)(\mu_2-\tilde\mu_2),$$
which establishes the second statement.