Submitted to the Annals of Statistics
NONPARAMETRIC IDENTIFICATION AND ESTIMATION
OF TRANSFORMATION MODELS
By Pierre-André Chiappori
Columbia University
and
By Ivana Komunjer
University of California San Diego
and
By Dennis Kristensen∗
University College London, CeMMAP and CREATES
This paper derives sufficient conditions for nonparametric transformation models to be identified and develops estimators of the identified components. Our nonparametric identification result is global,
and allows for endogenous regressors. In particular, we show that
a completeness assumption combined with conditional independence
with respect to one of the regressors suffices for the model to be nonparametrically identified. The identification result is also constructive
in the sense that it yields explicit expressions of the functions of interest. We show how natural estimators can be developed from these
expressions, and analyze their theoretical properties. Importantly, it
is demonstrated that the proposed estimator of the unknown transformation function converges at a parametric rate. A test for whether
a candidate regressor indeed satisfies the conditional independence
assumption required for identification is developed. A Monte Carlo
experiment illustrates the performance of our method in the context
of a duration model with endogenous regressors.
1. Introduction. A variety of structural econometric models comes in the form of a transformation model,

(1)    Y = T(g(X) + ε),
∗Kristensen gratefully acknowledges research support by the Economic and Social Research Council through the ESRC Centre for Microdata Methods and Practice grant RES589-28-0001, the Danish National Research Foundation (through a grant to CREATES), and the European Research Council (grant no. ERC-2012-StG 312474).
AMS 2000 subject classifications: Primary 62G05; secondary 62G99
Keywords and phrases: nonparametric identification, transformation models, duration models, endogeneity, special regressor, kernel estimation, √n-consistency
for some unknown monotonic transformation T(y), regression function g(x), and conditional cumulative distribution function of ε given X, Fε|X. One important class is duration models, which have been widely applied to study duration data in labor economics (Kiefer, 1988), IO (Mata and Portugal, 1994), insurance (Abbring, Chiappori, and Zavadil, 2007), and finance (Engle, 2000; Lo, MacKinlay, and Zhang, 2002), among others. Another class is hedonic models, studied by Ekeland, Heckman, and Nesheim (2004) and Heckman, Matzkin, and Nesheim (2005). Yet another example is binary choice models with underlying random utilities à la Hausman and Wise (1978). Further examples of models that fall within the framework of Equation (1) can be found in Matzkin (2007). In many empirical applications of the above model, endogeneity of X is a concern; see, for example, Bijwaard and Ridder (2005).
We develop nonparametric identification and estimation results for the triplet (T, g, Fε|X), allowing for X to be endogenous. First, we show that the model is identified under conditions that are significantly weaker than full independence between X and ε. Our identification strategy is constructive in the sense that we obtain explicit expressions of the relevant components of the model in terms of primitives such as the joint distribution of the observables. This in turn allows us to develop simple nonparametric estimators of the identified components, which we analyze. This analysis shows that our nonparametric estimator of T(y) attains a parametric convergence rate. The parametric rate implies that, for the estimation of g(x), we can treat T as known. To the best of our knowledge, our paper is the first to establish identification and to develop a rate-optimal estimation procedure for fully nonparametric transformation models under endogeneity.
The identification argument proceeds in two steps: We first show that T(y) is identified under the assumption that the regressors can be decomposed as X = (X1, X−1), where the subvector X1 is conditionally exogenous, ε ⊥ X1 | X−1. The regressor X1 plays a role similar to the "special regressor" of Lewbel (1998), though the restrictions imposed here are somewhat weaker than in Lewbel (1998): g(X) need not be linear in X1, and no large support conditions are imposed on X1. Once T has been identified, we rewrite the model as Θ(Y) = g(X) + ε, where Θ(Y) ≡ T⁻¹(Y) can be treated as observed. This is a nonparametric regression problem with endogenous regressors. The identification of g(x) and Fε|X can now be established using existing results on either nonparametric instrumental variables (see Darolles, Fan, Florens, and Renault, 2011; Newey and Powell, 2003; Hall and Horowitz, 2005, amongst others) or nonparametric control functions (see Newey, Powell, and Vella, 1999; Imbens and Newey, 2009).
Note that identification of T does not require the availability of instruments or controls; these are only used to identify g and Fε|X.
Our estimation strategy builds upon our identification result, where we demonstrate that Θ can be expressed as a functional of FY|X. This leads to a natural pointwise estimator of Θ in which we replace FY|X with a nonparametric kernel estimator. The resulting estimator is shown to converge at a parametric rate. Once Θ has been estimated, g can be estimated using either nonparametric IV or nonparametric control functions with Θ̂(Y) replacing the true but unknown dependent variable, Θ(Y). In principle, any existing nonparametric estimator can be employed. For the theoretical results, we focus on the sieve estimator of Blundell, Chen, and Kristensen (2007), which we adjust to allow for multiple regressors and pre-estimated dependent variables, thereby yielding a feasible estimator of g. Given the parametric convergence rate of Θ̂, our nonparametric IV estimator of g converges with the same rate as the oracle estimator that assumes Θ known. Once Θ and g have been recovered, we can compute residuals and use these to estimate Fε|X.
The identification and estimation schemes critically rely on the availability of a regressor X1 satisfying the conditional independence assumption ε ⊥ X1 | X−1. If, for a given choice of X1, this assumption is violated, the proposed estimators are inconsistent. It is therefore important to be able to check the validity of a candidate regressor. As part of the identification argument, we derive a set of over-identifying restrictions implied by the conditional independence assumption. We develop a statistical test that examines whether these restrictions hold in a given sample.
We investigate the finite-sample performance of our estimator of Θ in
a Monte Carlo simulation study. The study is designed around a popular
duration model. We find that the estimator Θ̂ performs well with small biases
and variances even for moderate sample sizes. Moreover, the performance
does not seem to be affected by whether regressors are endogenous or not. As
such, the proposed estimator appears to be both robust and quite precise.
Our identification results are close in spirit to those obtained by Ridder (1990), Ekeland, Heckman, and Nesheim (2004), and Jacho-Chávez, Lewbel, and Linton (2010), who focus on exogenous regressors. Fève and Florens (2010) and Florens and Sokullu (2011) allow for endogenous regressors but assume that g(x) = β′x or is partially linear. Finally, there is a growing literature on general identification results under endogeneity. For example, Chernozhukov, Imbens, and Newey (2007) and Chen, Chernozhukov, Lee, and Newey (2011) provide identification conditions for a general class of models, including ours. However, these papers only provide local identification results, and their conditions are quite abstract. We complement them by providing primitive conditions for global identification in the above transformation model with endogenous regressors.
A number of existing papers have proposed alternative nonparametric estimators of transformation functions. However, as discussed below, most existing strategies require either that g is (partially) linear or that a preliminary estimator of g is available. The only exception is Jacho-Chávez, Lewbel, and Linton (2010), who develop fully nonparametric estimators of transformation models. But their estimator of T does not attain a parametric rate, and it does not allow for endogenous regressors. As such, our paper appears to be the first to develop a fully nonparametric, rate-optimal estimation procedure.
To our knowledge, the first nonparametric estimator of Θ under exogeneity was developed in Horowitz (1996), which requires as input an initial estimator of g. If g is parametrically specified, a number of different estimators exist (Duan, 1990; Han, 1987), while in the nonparametric case we are only aware of the sieve rank estimator of Matzkin (1991), whose asymptotic properties are still not fully understood. Alternative rank-based estimators of T were subsequently developed; see, e.g., Ye and Duan (1997) and Chen (2002). But these also rely on a first-step estimator of g and exogenous regressors. An extension to partially linear g and endogenous regressors can be found in Jochmans (2011). None of these existing estimators extends straightforwardly to the fully nonparametric case. Finally, the general sieve estimators developed in Chen and Pouzo (2012) and Chernozhukov, Imbens, and Newey (2007) should in principle be applicable to our model, but their properties in this setting are unknown.
The remainder of the paper is organized as follows. Section 2 contains the identification result, while estimators of T, g and Fε|X are proposed and analyzed in Section 3. The test for conditional independence is developed and analyzed in Section 4. Section 5 discusses how our results can be applied in the context of duration models, and illustrates the performance of the proposed estimators through a Monte Carlo experiment. The last section concludes. Additional technical assumptions are relegated to an Appendix. All proofs are given in detail in the supplemental article (Chiappori, Komunjer, and Kristensen, 2013).
2. Identification.
2.1. Model and Assumptions. We consider the nonparametric transformation model given in Equation (1), where Y has support Y ⊆ R, X = (X1, . . . , Xdx) has support X ⊆ R^dx, and ε belongs to E ⊆ R. The variables Y and X are observed, while ε remains latent. Hereafter we maintain the following assumptions.

Assumption A1. For a.e. x ∈ X, the conditional distribution Fε|X(·, x) of ε given X = x is absolutely continuous with a continuous density fε|X(·, x) with support Ex ⊆ R.

Assumption A2. The support D of g(X) + ε is connected in R.

Assumption A3. T is continuously differentiable on D with derivative T′ > 0 on D, and 0 ∈ T(D).
Assumptions A1, A2 and A3 guarantee that the conditional distribution
FY |X (·, x) of Y given X = x is absolutely continuous with a continuous density fY |X (·, x). Moreover, since Y = T (D) the support Y of Y is a connected
subset of R. The requirement of T being strictly increasing could be replaced
by strict monotonicity; the strictly decreasing case is treated by considering
−T . Without loss of generality, we assume that 0 ∈ T (D), i.e. 0 belongs to
the support of Y .
We will assume that at least one of the regressors in X is conditionally
exogenous. We may, with no loss of generality, assume that this restriction is
satisfied by its first component, X1 . The case of several exclusion restrictions
will be discussed in detail in the following section. Whenever dx > 1, we
denote by X−1 the remaining subvector of X, i.e. X−1 ≡ (X2 , . . . , Xdx ).
The supports of X1 and X−1 are denoted by X1 and X−1 , respectively.
Assumption A4. ε ⊥ X1 | X−1 and X1 is continuously distributed on X1 ⊆ R.
Note that except for the continuity of the random variable X1 , A4 does
not restrict its support X1 . In particular, X1 need not be equal to R, and
X1 may well have bounded support. Perhaps more importantly, Assumption
A4 allows all the other components X2 , . . . , Xdx to be either continuous or
discrete with bounded or unbounded supports. We now further restrict the
regression function g : X → R in (1).
Assumption A5. For a.e. x ∈ X, the partial derivative ∂g(x)/∂x1 exists.
Similar to A4, Assumption A5 only restricts the smoothness of g with respect to x1. Nothing is said about the behavior of g with respect to the remaining components x−1. When we analyze the nonparametric estimators of T and g, we will, however, impose additional smoothness conditions on g as a function of x−1.
Under Assumptions A4 and A5, the exogenous variable X1 plays a role similar to the "special regressor" of Lewbel (1998), however under considerably weaker restrictions. In particular, no large support conditions are required on the distribution of X1 given X−1; this is unlike Lewbel (1998), who requires the latter support to be either the entire real line, or else large enough if the supports of X−1 and ε are bounded. Moreover, while Lewbel (1998) assumes the function g to be of the form g(x) = x1 g1(x−1) + g2(x−1) with g1(x−1) ≠ 0 for a.e. x−1, we only require that ∂g(x)/∂x1 ≠ 0 for some x.
The above restrictions suffice for identification of T. To show identification of g and Fε|X, we need additional restrictions to be able to generate exogenous variation. We here follow the literature on nonparametric instrumental variables (IVs) and assume the existence of a set of IVs Z ∈ Z ⊆ R^dz that satisfy the standard conditions of this literature:

Assumption A6. For a.e. z ∈ Z, E(ε|Z = z) = 0 and the conditional distribution of X−1 given Z = z is complete: for every function m : X−1 → R such that E[m(X−1)] exists and is finite, E[m(X−1) | Z = z] = 0 implies m(x−1) = 0 for a.e. x−1 ∈ X−1.
If one is willing to impose the additional restriction that g is bounded,
the completeness condition A6 can be replaced by the weaker assumption of bounded completeness: for every bounded function m : X−1 → R,
E[m(X−1 )|Z] = 0 w.p.1 implies m(X−1 ) = 0 w.p.1 (see, e.g., Blundell,
Chen, and Kristensen, 2007, for a discussion). Finally, note that, instead
of the above IV approach, one could take a control function approach as
pursued in Newey, Powell, and Vella (1999) and Imbens and Newey (2009)
to identify g.
Following the usual terminology, we hereafter refer to a triplet (T, g, Fε|X) as a structure. The model then corresponds to the set of all structures (T, g, Fε|X) that satisfy the restrictions given by Assumptions A1 through A6. Each structure in the model induces a conditional distribution FY|X of the observables, and two structures (T̃, g̃, F̃ε|X) and (T, g, Fε|X) are observationally equivalent if they generate the same FY|X. The structure (T, g, Fε|X) is globally identified if any observationally equivalent structure (T̃, g̃, F̃ε|X) satisfies T̃(y) = T(y), g̃(x) = g(x) and F̃ε|X(t, x) = Fε|X(t, x). We first provide identification results in the case of one conditionally exogenous variable X1; we then extend our results to the case where there are possibly several variables X1, . . . , XI (I > 1) that are conditionally independent of ε given the remaining components X−I of X = (X1, . . . , XI, X−I).
2.2. Single Exogenous Regressor. The conditional independence property in Assumption A4 has strong implications, which we now derive. In what follows, let Φ(y, x) denote the conditional cdf of Y given X, Φ(y, x) ≡ FY|X(y, x) = P(Y ≤ y|X = x). Under Assumption A3, Θ is continuously differentiable and strictly increasing on Y. Note that in addition Θ(Y) = D. Equation (1) is equivalent to ε = Θ(Y) − g(X), so by Θ′ > 0 and the conditional independence of ε and X1 given X−1,

(2)    Φ(y, x) = P(ε ≤ Θ(y) − g(x)|X = x) = Fε|X(Θ(y) − g(x), x−1),

for all (y, x) ∈ Y × X.
It is clear from Equation (1) that some normalization of the model is needed; indeed, for any λ > 0 and μ ∈ R, the transformation model (1) is equivalent to Y = T̃(λg(X) + μ + λε), where T̃ is defined by T̃(t) ≡ T((t − μ)/λ). We therefore impose the following normalization condition on any structure (T, g, Fε|X) in (1):

(3)    T(0) = 0,    E[ε] = 0,    and    E[g(X)] = 1.

More generally, the following normalizations can be used: T(y0) = 0 and E[g(X)] = ḡ ≠ 0 for known y0 and ḡ. These translate into the normalizations in (3) for the following equivalent representation: Θ̃(Ỹ) = g̃(X) + ε̃, where Ỹ = Y − y0, Θ̃(y) = Θ(y + y0)/ḡ, g̃(x) = g(x)/ḡ and ε̃ = ε/ḡ. So in the following, we will focus on (3).
The main identification result is as follows:

Theorem 1. Let Assumptions A1-A5 and the normalization condition (3) hold. Assume in addition that the set A ≡ {x ∈ X : ∂Φ(y, x)/∂x1 ≠ 0 for every y ∈ Y} is nonempty. Then:
(i) T is globally identified;
(ii) (g, Fε|X) are globally identified if, in addition, Assumption A6 holds.
The requirement that A is nonempty can be thought of as a generalized rank condition saying that X1 has a causal impact on Y. The intuition behind this condition appears by taking derivatives w.r.t. x1 in Equation (2),

    ∂Φ(y, x)/∂x1 = −fε|X(Θ(y) − g(x), x−1) ∂g(x)/∂x1.

Thus, the requirement has two parts: First, we need that for some x ∈ X, ∂g(x)/∂x1 ≠ 0; this requirement excludes the situations in which g is a constant function. Second, we need that for the same value x, {t ∈ R : t = Θ(y) − g(x), y ∈ Y} ⊆ Ex; this assumption ensures that fε|X(Θ(y) − g(x), x−1) > 0 for every y ∈ Y, and is akin to Assumption 5a in Horowitz (1996). A simple primitive condition for the second requirement is that Ex = R, for example. Rather than imposing specific sufficient conditions on fε|X and g, we maintain the high-level condition that A is nonempty.
While used to identify (g, Fε|X), the completeness Assumption A6 is not used to identify the transformation T. In fact, the proof of Theorem 1 shows that under A1-A5 and the normalization condition (3):

(4)    Θ(y) = θ(y, x),    θ(y, x) ≡ S(y, x)/EY[S(Y, x)],    x ∈ A,

where

(5)    S(y, x) ≡ ∫_0^y [Φy(u, x)/Φ1(u, x)] du    and    EY[S(Y, x)] = ∫_R S(y, x) fY(y) dy.

Here, fY(y) denotes the unconditional density of Y, and we use subscripts to denote partial derivatives.¹ Moreover, for y < 0, the integral ∫_0^y f(t) dt should be read as −∫_y^0 f(t) dt. Key to the identification of T is the conditional independence assumption A4, which in particular guarantees that θ(y, x) is constant with respect to x. Hence, evaluating this quantity at any x for which Φ1(y, x) never vanishes allows us to recover Θ. The expression in (4) also makes clear why Theorem 1 needs to assume the set A of such x's to be nonempty.

¹Specifically, g1(x) ≡ ∂g(x)/∂x1, Φy(y, x) ≡ ∂Φ(y, x)/∂y and Φ1(y, x) ≡ ∂Φ(y, x)/∂x1.
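To make the constructive formula concrete, the following minimal Python sketch numerically verifies (4)-(5) in a simple simulated design: θ(y, x) should reproduce Θ(y) and be (approximately) constant across x. The particular choices of Θ, g and the error distribution below are hypothetical illustrations, not part of the paper's setup.

```python
# Minimal numerical check of (4)-(5): theta(y, x) ~ Theta(y), constant in x.
# Design (hypothetical): Theta the identity, g linear with E[g(X)] = 1, eps ~ N(0,1)
# independent of X, so that X1 is trivially conditionally exogenous.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

Theta = lambda y: y                               # true transformation, Theta(0) = 0
g = lambda x1, x2: 1.0 + 0.25 * x1 + 0.25 * x2    # E[g(X)] = 1 under standard normal X

def Phi(y, x1, x2):
    # conditional cdf of Y given X = (x1, x2) implied by this design
    return norm.cdf(Theta(y) - g(x1, x2))

def S(y, x1, x2, h=1e-4):
    # S(y, x) = int_0^y Phi_y(u, x) / Phi_1(u, x) du, derivatives by central differences
    def ratio(u):
        Phi_y = (Phi(u + h, x1, x2) - Phi(u - h, x1, x2)) / (2 * h)
        Phi_1 = (Phi(u, x1 + h, x2) - Phi(u, x1 - h, x2)) / (2 * h)
        return Phi_y / Phi_1
    return quad(ratio, 0.0, y)[0]

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=500), rng.normal(size=500)
Y = g(X1, X2) + rng.normal(size=500)              # since Theta is the identity here

for x in [(0.3, -0.2), (-1.0, 0.8)]:              # two evaluation points that lie in A
    EYS = np.mean([S(y, *x) for y in Y[:200]])    # Monte Carlo proxy for E_Y[S(Y, x)]
    print("theta(1.0, x) =", round(S(1.0, *x) / EYS, 3))   # both close to Theta(1) = 1
```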
As Equation (4) shows, our identification strategy is constructive in the sense that it leads to a closed-form expression of Θ as a function of the observables. In the next section, we shall develop nonparametric estimators of Θ and g that build on this expression, and examine their properties. But beforehand, let us examine the case of several conditionally exogenous variables.
2.3. Multiple Exogenous Regressors. If several of the X variables are conditionally independent of ε, the question arises as to whether this additional information can be used for identification (and estimation) of T, g and Fε|X. Formally, suppose we can decompose the regressors as X = (XI, X−I), where the subvector XI ∈ R^|I| contains the |I| exogenous components of X, while X−I contains the potentially endogenous regressors. The support of XI is denoted by XI. We now assume the following:

Assumption A4'. ε ⊥ XI | X−I and XI is continuously distributed on XI ⊆ R^|I|.

Assumption A5'. For a.e. x ∈ X and every i ∈ I, the partial derivatives ∂g(x)/∂xi exist.
Assumptions A4' and A5' strengthen our earlier Assumptions A4 and A5, respectively. It is worth pointing out, though, that the case of several conditionally exogenous variables is also a particular version of the previous setting. Indeed, assume that A4' holds so that the disturbance ε in the model (1) is known to be conditionally independent of XI given the remaining components of X. Since E[ε] = 0, it then holds w.p.1 that E[ε|XI] = 0. Hence, it suffices to include XI in the vector of instruments Z.
The availability of several conditionally exogenous variables XI can potentially enlarge the region in which T(y) is identified in cases where the set A defined in Theorem 1 is empty. To be specific, define

    Yi(x) ≡ {y ∈ Y : ∂Φ(y, x)/∂xi ≠ 0},    i ∈ I,

and suppose that each of these sets is connected and contains 0. Then, by the same arguments as in the proof of Theorem 1, Θ(y) = θi(y, x) for y ∈ Yi(x), where

(6)    θi(y, x) ≡ Si(y, x)/EY[Si(Y, x)],

(7)    Si(y, x) ≡ ∫_0^y [Φy(u, x)/Φi(u, x)] du,    EY[Si(Y, x)] = ∫_R Si(y, x) fY(y) dy.

Using only one of the exogenous regressors, Xi for some i ∈ I, we can identify Θ(y) for y ∈ Yi ≡ ∪_{x∈X} Yi(x), which may not be the whole domain of Y. Pooling the information contained in each of the exogenous regressors, we have identification on the larger set Y∗ ≡ ∪_{i∈I} Yi.
Moreover, additional exogenous regressors generate more over-identifying restrictions: By the conditional independence of ε and XI, we have Φ(y, x) = Fε|X(Θ(y) − g(x), x−I) for every (y, x) ∈ Y × X. Differentiating w.r.t. xi, i ∈ I, gives

(8)    ∂Φ(y, x)/∂xi = −(∂g(x)/∂xi) ∂Fε|X(Θ(y) − g(x), x−I)/∂t.

Now take any y ∈ Yj(x): Equation (8) then implies that

    [∂Φ(y, x)/∂xi] / [∂Φ(y, x)/∂xj] = [∂g(x)/∂xi] / [∂g(x)/∂xj],
which is a function of x alone. Thus, the existence of multiple exogenous regressors leads to over-identifying restrictions which can be used, for example, to test for the joint exogeneity of XI. This could be done by examining whether the ratios [∂Φ(y, x)/∂xj] / [∂Φ(y, x)/∂xi], i, j ∈ I, are indeed invariant with respect to y.
Finally, the availability of multiple exogenous regressors can also be used to construct more efficient estimators of Θ: By a straightforward extension of the arguments in the next section, for each exogenous regressor Xi we can construct a corresponding √n-consistent estimator of Θ(y), say Θ̂i(y), for y ∈ Yi. A fully efficient estimator utilizing all information contained in the |I| exogenous regressors can then be obtained through a linear combination of these |I| estimators, say Θ̂(y) = Σ_{i∈I} wi(y) Θ̂i(y), for a set of weighting functions {wi(y)}_{i∈I}, each with support Yi. As with GMM-type estimators, given the (asymptotic) covariance structure of the estimators {Θ̂i(y)}_{i∈I}, the weights {wi(y)}_{i∈I} can be chosen to obtain (pointwise) efficiency.
3. Estimation. We use the identification results of the previous section to derive explicit estimators of (T, g, Fε|X). We focus on the case of a single exclusion restriction; the extension to multiple exclusion restrictions was discussed above. Moreover, we will assume that X has a continuous distribution, which we then estimate using kernel smoothing techniques. The case where some of the regressors have a discrete distribution is easily handled by simply replacing the relevant kernels by indicator functions.

Suppose we have a random sample (Yi, Xi, Zi), i = 1, . . . , n, drawn from the transformation model in Equation (1). Our estimator of Θ(y) takes as its starting point Equation (4), which implies that for any weighting function w(x) with support X0 ⊆ A satisfying ∫_{X0} w(x) dx = 1, we have Θ(y) = ∫_X w(x) θ(y, x) dx. The estimation method that we propose is then straightforward in principle: We first obtain a nonparametric estimator of the conditional cdf Φ(y, x). We then plug the derivatives with respect to y and x1 of this estimator into Equation (5) to obtain an estimator of S(y, x) and its moment, which in turn are substituted into Equation (4). This yields a nonparametric estimator of θ(y, x) and thereby of Θ(y).
To be more specific, observe that the conditional cdf can be written as

    Φ(y, x) = p(y, x)/f(x),    p(y, x) ≡ ∫_{−∞}^y fY,X(u, x) du,    f(x) ≡ ∫_Y fY,X(u, x) du,

where fY,X(y, x) is the joint pdf of (Y, X). Thus, a natural kernel-based estimator of Φ(y, x) is

(9)    Φ̂(y, x) = p̂(y, x)/f̂(x),    p̂(y, x) = (1/n) Σ_{i=1}^n K̄hy(Yi − y) Khx(Xi − x),    f̂(x) = (1/n) Σ_{i=1}^n Khx(Xi − x),

with K̄hy(y) = K̄(y/hy), Khx(x) = K(x/hx)/hx^{dx}, and hx, hy > 0 being univariate bandwidths. The functions K̄(y) and K(x) are given as K̄(y) = ∫_{−∞}^y K(u) du and K(x) = Π_{i=1}^{dx} K(xi), with K : R → R being a univariate kernel. Note that we could allow for individual bandwidths for each variable in Xi, but to keep the notation simple we here use a common bandwidth across all regressors. Also note that we could replace K̄hy(Yi − y) by the indicator function I{Yi ≤ y}; however, for technical reasons we keep K̄hy(Yi − y). We pay a small price in terms of bias for this choice. We then propose the following least-squares (LS) type estimator of Θ(y), given the weighting function w introduced earlier:

(10)    Θ̂LS(y) ≡ ∫_X w(x) θ̂(y, x) dx,    θ̂(y, x) = Ŝ(y, x)/ÊY[Ŝ(Y, x)],

    Ŝ(y, x) = ∫_0^y [Φ̂y(u, x)/Φ̂1(u, x)] du    and    ÊY[Ŝ(Y, x)] = (1/n) Σ_{i=1}^n Ŝ(Yi, x).
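As an illustration of (9), a minimal kernel-cdf sketch might look as follows. The Gaussian kernel and the smoothed-cdf convention in the y direction are our own illustrative choices (the paper only requires a differentiable kernel satisfying A7); all names are hypothetical.

```python
# Kernel estimator of the conditional cdf Phi(y, x) in the spirit of (9): a smoothed
# empirical cdf in y combined with a Gaussian product kernel in x.
import numpy as np
from scipy.stats import norm

def Phi_hat(y, x, Y, X, h_y, h_x):
    """Y: (n,) outcomes; X: (n, dx) regressors; x: (dx,) evaluation point."""
    Kbar_y = norm.cdf((y - Y) / h_y)                          # integrated (cdf) kernel in y
    K_x = np.prod(norm.pdf((X - x) / h_x), axis=1) / h_x ** X.shape[1]
    p_hat = np.mean(Kbar_y * K_x)                             # estimate of p(y, x)
    f_hat = np.mean(K_x)                                      # estimate of f(x)
    return p_hat / f_hat
```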
The weighting function w serves two purposes: First, it is used to control the usual denominator problem present in many semiparametric estimators in which one divides by a nonparametric density estimate. Specifically, we will require that inf_{x∈X0} f(x) > 0 and inf_{y∈Y, x∈X0} Φ1(y, x) > 0, where X0 is the support of w(x); these assumptions in particular imply that X0 is compact. Second, it allows us to improve the efficiency of the estimator by reweighting θ̂(y, x) as a function of x. There is a tension between these two purposes since, in order to obtain full efficiency, we expect that w needs to have full support, which is ruled out by the compact support assumption. It should be possible to weaken this restriction and allow the support X0 to grow with the sample size. This will, however, lead to more complicated conditions and proofs, and so we maintain the compact support assumption for simplicity.
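Building on the Φ̂ sketch above, the LS estimator in (10) can be approximated schematically with finite-difference derivatives of Φ̂, trapezoid integration for Ŝ, and uniform weights over draws from a compact X0. The grid sizes and step lengths below are arbitrary illustrative choices, not recommendations from the paper.

```python
# Schematic Theta_hat^LS from (10), re-using Phi_hat from the sketch above.
import numpy as np

def S_hat(y, x, Y, X, h_y, h_x, n_grid=50, d=1e-3):
    x = np.asarray(x, dtype=float)
    u = np.linspace(0.0, y, n_grid)                            # integration grid on [0, y]
    Phi_y = np.array([(Phi_hat(ui + d, x, Y, X, h_y, h_x)
                       - Phi_hat(ui - d, x, Y, X, h_y, h_x)) / (2 * d) for ui in u])
    e1 = np.zeros_like(x); e1[0] = d                           # perturb the special regressor x1
    Phi_1 = np.array([(Phi_hat(ui, x + e1, Y, X, h_y, h_x)
                       - Phi_hat(ui, x - e1, Y, X, h_y, h_x)) / (2 * d) for ui in u])
    return np.trapz(Phi_y / Phi_1, u)                          # S_hat(y, x)

def Theta_hat_LS(y, Y, X, h_y, h_x, x_draws):
    # x_draws: points drawn uniformly from X0, playing the role of the uniform weight w
    thetas = []
    for x in x_draws:
        EYS = np.mean([S_hat(yi, x, Y, X, h_y, h_x) for yi in Y])
        thetas.append(S_hat(y, x, Y, X, h_y, h_x) / EYS)       # theta_hat(y, x)
    return np.mean(thetas)
```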
We note that the proposed estimator is similar to the estimator of Horowitz (1996), who considers the semiparametric model where the regression function is restricted to g(x) = β′x. Assuming for simplicity that β is known, so that V ≡ β′X is observed, the estimator of Horowitz (1996) can be written as:

    Θ̃(y) = −∫_V ω(v) ∫_0^y [Ĝy(u, v)/Ĝ1(u, v)] du dv,

where Ĝ(y, v) is a kernel estimator of G(y, v) = P(Y ≤ y|V = v) = FY|V(y, v), V is the support of V, and ω(v) is a weighting function with compact support. As shown in Horowitz (1996), due to the double integration, Θ̃(y) is √n-consistent despite the fact that it relies on first-step nonparametric estimators that converge at a slower rate. Similarly, in our case we also integrate over both y and x, and as such Θ̂LS(y) will be √n-consistent.
Horowitz’s estimator is known to perform poorly in finite samples due to denominator issues. We expect that similar issues may arise for the least-squares type estimator in Equation (10) and therefore develop an alternative median-type estimator which should be more robust in finite samples. For a given distribution as induced by w(x), instead of computing the average of θ̂(y, x) with respect to x, we compute its median with respect to x,

    Θ̂LAD(y) ≡ arg min_{q∈R} ∫_X w(x) |θ̂(y, x) − q| dx.

As is well known, LAD estimators tend to be more robust to outliers than least-squares estimators. We expect that in our setting, this will imply that Θ̂LAD(y) deals with denominator issues better than Θ̂LS(y). This is confirmed in our simulation study. To simplify the theoretical analysis, we follow Horowitz (1998) and introduce a smoothed version of the above least-absolute deviation (LAD) estimator,

    Θ̂LAD_b(y) ≡ arg min_{q∈R} Qb(q|θ̂(y, ·)),

where, with Fb(·) ≡ F(·/b) for some cdf F with median at zero and some bandwidth b > 0,

    Qb(q|θ(y, ·)) ≡ ∫_X w(x) {θ(y, x) − q} {2Fb(θ(y, x) − q) − 1} dx.

It is easily seen that Θ̂LAD_b(y) → Θ̂LAD(y) as b → 0.
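The smoothed LAD aggregation is equally short to sketch: given the values of θ̂(y, x) at the weighted draws of x, Θ̂LAD_b(y) minimizes the smoothed absolute-deviation criterion Qb. Below, the standard normal cdf plays the role of F, which is one admissible choice rather than the paper's prescription.

```python
# Smoothed LAD estimator: minimize Q_b(q) = E_w[(theta - q){2F_b(theta - q) - 1}] over q.
# theta_vals holds theta_hat(y, x) evaluated at draws of x from X0 (uniform weights).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def Theta_hat_LAD(theta_vals, b=0.01):
    theta_vals = np.asarray(theta_vals)
    def Q_b(q):
        t = theta_vals - q
        return np.mean(t * (2.0 * norm.cdf(t / b) - 1.0))      # smooth version of E_w|t|
    return minimize_scalar(Q_b, bounds=(theta_vals.min(), theta_vals.max()),
                           method="bounded").x
```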
Once either Θ̂LS(y) or Θ̂LAD_b(y) has been computed, the regression function and the conditional cdf of the error term can be estimated using nonparametric IV techniques: First, suppose that Θ is known. Then, Θ(Y) = g(X) + ε with E[ε|X1, Z] = 0, so the estimation of g is a standard nonparametric IV regression problem. In this case, we can in principle employ any existing nonparametric IV estimator proposed in the literature, such as the kernel estimator of Hall and Horowitz (2005) or the sieve estimator of Blundell, Chen, and Kristensen (2007). For the theoretical analysis, we shall focus on the latter; we expect that similar results will hold for alternative nonparametric IV estimators. The sieve estimator takes the following form when Θ is known:

(11)    g̃ ≡ arg min_{gn∈Gn} Σ_{i=1}^n {h̃(X1,i, Zi) − M̂(X1,i, Zi|gn)}²,

where h̃(x1, z) and M̂(x1, z|gn) are first-step nonparametric estimators (such as kernel regression or series estimators) of

(12)    h(x1, z) ≡ E[Θ(Y)|X1 = x1, Z = z],    M(x1, z|gn) ≡ E[gn(X)|X1 = x1, Z = z],

and Gn is a sieve space. We have here left out the weighting function used in Blundell, Chen, and Kristensen (2007), since it is only used to obtain efficiency of the parametric component of their model.
With Θ unknown, we need to modify the chosen nonparametric sieve IV estimator, with the true but unknown dependent variable Θ(Y) being replaced by the generated one, Θ̂(Y) = Θ̂LS(Y) or Θ̂LAD_b(Y). For the above sieve estimator, this leads to the following feasible version:

(13)    ĝ ≡ arg min_{gn∈Gn} Σ_{i=1}^n {ĥ(X1,i, Zi) − M̂(X1,i, Zi|gn)}²,

where ĥ(x1, z) is an estimator of E[Θ̂(Y)|X1 = x1, Z = z].
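A schematic implementation of the feasible sieve IV step (13) with polynomial bases and series first-step projections is sketched below. The basis choices, the dimensions kn and Jn, and the omission of any Hölder-ball constraint on the sieve space are simplifying assumptions relative to the theory; all names are illustrative.

```python
# Feasible sieve IV step (13) with series first stages: both h_hat and M_hat are
# projections on the instrument basis p^{Jn}(x1, z), so (13) reduces to a 2SLS-type
# problem for the sieve coefficients of g_n(x) = psi(x)'beta.
import numpy as np

def sieve_iv(Theta_hat_Y, X, X1, Z, kn=4, Jn=6):
    n = len(Theta_hat_Y)
    # Sieve basis psi(x) for g: polynomials in each component of X (hypothetical choice).
    Psi = np.column_stack([np.ones(n)] +
                          [X[:, j] ** d for j in range(X.shape[1]) for d in range(1, kn + 1)])
    # Instrument basis p^{Jn}(x1, z) used for the first-step conditional mean projections.
    P = np.column_stack([X1 ** d for d in range(Jn)] + [Z ** d for d in range(1, Jn)])
    Pi = P @ np.linalg.pinv(P.T @ P) @ P.T                    # projection onto instrument space
    beta = np.linalg.pinv(Psi.T @ Pi @ Psi) @ (Psi.T @ Pi @ Theta_hat_Y)
    return beta, Psi @ beta                                   # coefficients and fitted g_hat(X_i)
```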
Finally, given Θ̂(y) and ĝ(x), we can compute the corresponding residuals, ε̂i = Θ̂(Yi) − ĝ(Xi), i = 1, . . . , n. Standard nonparametric estimators of conditional cdf's, such as the kernel one presented above, can now be employed with the residuals replacing the actual unobserved errors,

    F̂ε|X(t, x−1) ≡ [Σ_{i=1}^n K̄hy(ε̂i − t) Khx−1(X−1,i − x−1)] / [Σ_{i=1}^n Khx−1(X−1,i − x−1)].
We now proceed to analyze the asymptotic properties of the estimators. In
order to do so, we introduce additional assumptions on the model and the
kernel function used in the estimation.
Assumption A7. The univariate kernel K is differentiable, and there exist constants C, η > 0 such that

    |K^(i)(z)| ≤ C|z|^{−η},    |K^(i)(z) − K^(i)(z′)| ≤ C|z − z′|,    i = 0, 1,

where K^(i)(z) denotes the ith derivative. Furthermore, ∫_R K(z) dz = 1, ∫_R z^j K(z) dz = 0 for 1 ≤ j ≤ m − 1, and ∫_R |z|^m K(z) dz < ∞.
The above class is fairly general and accommodates kernels with both bounded and unbounded support. We do, however, require the kernel K to be differentiable, which rules out uniform and Epanechnikov kernels. This is only used for technical reasons, and we expect the following results to also hold for non-differentiable kernels. We allow for both standard second-order kernels (m = 2), such as the Gaussian one, and higher-order kernels (m > 2). The use of higher-order kernels in conjunction with smoothness conditions on the densities in the model allows us to control the smoothing bias induced by the use of kernels. In general, the kernel has to be of higher order in order for Θ̂(y) to be √n-consistent.
Assumption A8. The joint density fY,X(y, x) is bounded and m times differentiable with bounded derivatives; its mth order partial derivatives are uniformly continuous. Furthermore, sup_{x∈X, y∈Y} ‖(x, y)‖^q fY,X(y, x) < ∞ for some constant q > 0.

Note that the number of derivatives, m ≥ 2, is assumed to match the order of the kernel K. The requirement that sup_{x,y} ‖(x, y)‖^q fY,X(y, x) < ∞ is implied by E[|Y|^q] < ∞ and E[‖X‖^q] < ∞.
As noted earlier, the weighting function is used to control the denominator problem of our estimator. More specifically, with X0 ⊆ X denoting the support of the weighting function w(x), we require that:

Assumption A9. inf_{y∈Y, x∈X0} |Φ1(y, x)| > 0, inf_{x∈X0} f(x) > 0, and sup_{y∈Y} |Θ(y)| < ∞.

The lower bound condition on |Φ1(y, x)| is related to the set A introduced in Theorem 1 and further restricts the behavior of Φ1(y, x). In particular, Assumption A9 implies that X0 ⊆ A, and so A has non-empty interior. If in fact A has empty interior, we cannot obtain √n-consistency. The lower bounds imposed on |Φ1(y, x)| and f(x) allow us to control the estimation error, Ŝ(y, x) − S(y, x), uniformly over (y, x) ∈ Y × X0. The above condition implicitly restricts the support of the weighting function to be compact, and Y to have bounded support. Sufficient conditions for the compact support assumption to hold are that g is bounded and ε has compact support. We conjecture that inf_{y∈Y, x∈X0} |Φ1(y, x)| > 0 could be replaced by weaker conditions allowing for unbounded support of Y. However, we expect that those would come at the cost of having to introduce trimming in the definition of our estimator ÊY[Ŝ(Y, x)], and would complicate our proofs considerably; we therefore maintain the above stronger assumption.
Assumption A10. √n hx^{2m} → 0, √n hy^{2m} → 0, √n hx^{dx+2}/log n → ∞, and √n hy hx^{dx+1}/log n → ∞.

Assumption A10 restricts the set of feasible bandwidths to ensure that the squared estimation error of the kernel estimators p̂(y, x) and f̂(x) and their relevant derivatives are all of order oP(1/√n) uniformly over y ∈ Y and x ∈ X0. As is standard for kernel estimators, there is a curse of dimensionality, which appears in the last two restrictions on hx: When dx = dim(X) is large, we in general need to use higher-order kernels in order for all four conditions to hold simultaneously. For example, if hx ∝ n^{−rx} and hy ∝ n^{−ry}, then Assumption A10 holds whenever m > (dx + 2)/2 and 1/(4m) < rx, ry < 1/[2(dx + 2)].
To state the asymptotic distribution of the estimator, we collect the observables in U ≡ (Y, X) and introduce the function δw(U|y) given by

(14)    δw(U|y) ≡ δ1^{w̄1}(U|y) − δ2^{w̄2}(U|y),

with w̄1(x) ≡ w(x)/EY[S(Y, x)], w̄2(y, x) ≡ w(x) S(y, x)/EY[S(Y, x)]², and

(15)    δ1^{w̄1}(U|y) ≡ w̄1(X) ∫_Y^y D̄0(u, X) du + I{Y ≤ y} w̄1(X) Dp,y(Y, X) + ∫_Y^y ∂[w̄1(X) D̄1(u, X)]/∂x1 du,

(16)    δ2^{w̄2}(U|y) ≡ E[a(U, U′|y) | U] + ∫_X w̄2(y, x) {S(Y, x) − EY[S(Y, x)]} dx,

    a(U, U′|y) ≡ w̄2(y, X) ∫_Y^{Y′} D̄0(u, X) du + I{Y ≤ Y′} w̄1(X) Dp,y(Y, X) + ∫_Y^{Y′} ∂[w̄2(y, X) D̄1(u, X)]/∂x1 du,

where U′ ≡ (Y′, X′) is an independent copy of U. Here, D̄k(y, x) = Dp,k(y, x) + Df,k(y, x), and Dp,k(y, x) and Df,k(y, x) are defined in Equation (S.17) in the supplemental article (Chiappori, Komunjer, and Kristensen, 2013), k ∈ {0, 1, y}. Under the above conditions, we then have the following weak convergence result for the proposed estimator:
Theorem 2. Let Assumptions A1 through A10 and the normalization condition (3) hold. Then, for any b > 0, the following functional weak convergence results hold for y ∈ Y:

    √n(Θ̂LS(y) − Θ(y)) ⇒ W(y),    √n(Θ̂LAD_b(y) − Θ(y)) ⇒ W(y),

where y ↦ W(y) is a zero-mean Gaussian process with covariance kernel Ω(y1, y2) = E[δw(Ui|y1) δw(Ui|y2)].
As can be seen from the above expression, the function δw(U|y) = δ(U|y, w, Φ) is a known functional of the weighting function w(x) and the conditional cdf Φ(y, x) of Y given X, and so the asymptotic covariance kernel can be consistently estimated by

    Ω̂(y1, y2) = (1/n) Σ_{i=1}^n δ(Ui|y1, w, Φ̂) δ(Ui|y2, w, Φ̂),

where Φ̂ is the kernel estimator given in Equation (9).²

²In principle, efficiency of the estimator can be obtained by minimizing the asymptotic variance E[δ_i^w(y)²] as a functional of w. Given the complex expression of the influence function δ_i^w(y), this is quite a complicated problem and so is left for future work.
An interesting feature of the smoothed LAD estimator is that its first-order asymptotic properties are invariant to the choice of bandwidth. This is different from the analysis in Horowitz (1998), who has to restrict b to shrink at a suitable rate to kill smoothing biases. The reason for this discrepancy is that in the limit θ̂(y, x) is constant with respect to x, and so the effect of smoothing is asymptotically negligible. In practice, the LAD estimator will be affected by the bandwidth choice, but the impact should be small.
As a first step in the analysis of ĝ, we extend the conditions of Blundell, Chen, and Kristensen (2007) to a multivariate setting to ensure that the infeasible estimator g̃ in Equation (11) is consistent; these are straightforward but rather technical extensions, which we relegate to the Appendix. We note that the most substantive of these additional assumptions is the requirement of compact support of (X1, Z). The compactness of Y implied by Assumption A9, together with Theorem 2, yields that Θ̂LS(y) and Θ̂LAD_b(y) both converge uniformly at rate OP(1/√n). This in turn allows us to show that the feasible estimator ĝ is asymptotically equivalent to g̃, thereby yielding the following result:
Theorem 3. Let Assumptions A1 through A10 and the normalization condition (3) hold. Assume in addition that Assumptions A12 through A16 in the Appendix hold. Then, the feasible sieve IV estimator ĝ satisfies

    ‖ĝ − g‖_X = √(∫_X [ĝ(x) − g(x)]² fX(x) dx) = Op(kn^{−r/dx} + τn √(kn/n)),

where dx = dim(X), kn = dim(Gn), r ≥ 1 is the degree of smoothness of g, and τn is the sieve measure of ill-posedness:

(17)    τn ≡ sup_{gn∈Gn : gn≠0} √(E{gn(X)}²) / √(E{E[gn(X)|X1, Z]}²).
The convergence rate depends on the sieve measure of ill-posedness τn, which in turn depends on the decay rate of the singular values, denoted {μk}, of the conditional mean operator g ↦ M(x1, z|g) defined in Equation (12); see Section 4 in Blundell, Chen, and Kristensen (2007) for further discussion. If, for example, the singular values satisfy μk ≍ k^{−s/dx} for some s > 0, then τn ≤ const × kn^{s/dx} and we obtain ‖ĝ − g‖_X = Op(n^{−r/[2(r+s)+dx]}).

The convergence rate stated in Theorem 3 is identical to the one for the oracle estimator g̃ that assumes knowledge of T; thus, there is no (asymptotic) loss from not knowing T in the estimation of g. This is due to the fact that T̂ converges at a faster rate than g̃, and so it does not influence the feasible estimator ĝ. The above result only gives the rate of convergence of the estimator. We conjecture that the general results of Belloni, Chen, Chernozhukov, and Liao (2010) could be applied to our problem to develop distributional results. As shown there, the rate of convergence towards an asymptotic distribution is slower than √n, and so the asymptotic distribution is unaffected by the first-step estimation of Θ.
We conjecture that Theorem 3 remains true without restricting the support of Y to be bounded. In particular, by inspection of the proof of Theorem 3, we see that all that is needed for the result to hold is that ‖Θ̂(·) − Θ(·)‖_Y = oP(n^{−r/[2(r+s)+1]}), where ‖·‖_Y denotes the L2-norm, ‖Θ‖²_Y = ∫_Y Θ(y)² fY(y) dy. We expect this to hold in great generality.

Finally, we note that with ĝ and Θ̂ converging uniformly, the estimator F̂ε|X(t, x−1) is clearly also consistent. A full analysis of the asymptotic properties of F̂ε|X(t, x−1) is outside the scope of this paper. We expect that the techniques developed in Mammen, Rothe, and Schienle (2012) could be adapted to our setting and thereby allow for a more complete analysis of F̂ε|X(t, x−1). This is left for future research.
4. Testing for Conditional Independence. The identification and estimation results developed in the two previous sections rest on two fundamental assumptions regarding the chosen "special" regressor X1: First, that it is relevant in the sense that ∂g(x)/∂x1 ≠ 0 and, second, that it is exogenous in the sense that

    H0 : ε ⊥ X1 | X−1.

If either of these two restrictions is violated, the proposed estimators will in general be inconsistent. It is therefore of interest to develop tools to examine whether a candidate regressor indeed satisfies these assumptions. Regarding the first hypothesis, ∂g(x)/∂x1 ≠ 0, this can be tested by examining whether ∂Φ(y, x)/∂x1 ≠ 0 or not. Given the nonparametric estimator Φ̂(y, x) provided in the previous section, this can be done using standard tools. In the following, we therefore focus on H0.
In general, testing exogeneity of a regressor requires the availability of an instrument to generate over-identifying restrictions; this is, for example, the case in the Hausman test. In our case, taking as maintained hypothesis that data are generated by the model in Equation (1), over-identifying restrictions arrive in the form of θ(y, x) as defined in Equation (4): Under H0, θ(y, x) = Θ(y) for all x ∈ A, while under the alternative, θ(y, x) ≠ Θ(y). These additional restrictions arise from the fact that we require strict independence instead of conditional mean independence. More specifically, by following the same arguments as in the proof of Theorem 1, if H0 is not imposed, then the following result holds:
Theorem 4. Let Assumptions A1-A3 and A5 hold, and assume that only the second part of Assumption A4 holds, i.e. X1 is continuous. Then,

    θ(y, x) = Θ(y) + ∆(y, x),    (y, x) ∈ Y × A,

where, with g1(x) = ∂g(x)/∂x1,

    ∆(y, x) ≡ Θ(y) − g1(x) ∫_0^y Θ′(t) { [∂Fε|X(Θ(t) − g(x), x)/∂x1] / fε|X(Θ(t) − g(x), x) − g1(x) }⁻¹ dt.

Under H0, ∆(y, x) = 0 since ∂Fε|X(Θ(t) − g(x), x)/∂x1 = 0.
So a natural way to test H0 is to examine whether the proposed estimators θ̂(y, x) and Θ̂(y) are significantly different from each other across (y, x) ∈ Y × A. To allow for added flexibility, we will use two different sets of bandwidths, (hx, hy) and (h0,x, h0,y), for the computation of Θ̂(y) and θ̂(y, x), respectively, in the testing procedure. To emphasize that different bandwidths are used in the computation of θ̂(y, x), we use θ̂0(y, x) to denote the estimator based on (h0,x, h0,y). We then propose to compare the two nonparametric estimators through the following L2-statistic,

(18)    I ≡ ∫_Y ∫_X W(y, x) [θ̂0(y, x) − Θ̂(y)]² dy dx,

where W(y, x) is a weighting function with compact support satisfying ∫_Y ∫_X W(y, x) dy dx = 1. We will reject H0 if I is "large."
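In practice the double integral in (18) can be approximated by Riemann sums on a grid, as in the following sketch; theta0_hat, Theta_hat and the weight W are assumed to be callables produced by the earlier estimation steps, and the scalar x grid is a simplification of the multivariate integral.

```python
# L2 test statistic (18) by Riemann summation over a (y, x) grid; W is a compactly
# supported weight integrating to one over the grid region.
import numpy as np

def test_statistic(theta0_hat, Theta_hat, W, y_grid, x_grid):
    dy = y_grid[1] - y_grid[0]
    dx = x_grid[1] - x_grid[0]
    total = 0.0
    for y in y_grid:
        for x in x_grid:
            total += W(y, x) * (theta0_hat(y, x) - Theta_hat(y)) ** 2 * dy * dx
    return total
```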
The test based on I is related to standard nonparametric misspecification tests where a "parametric" estimator, Θ̂(y), is compared with a nonparametric one, θ̂0(y, x); see, e.g., Härdle and Mammen (1993) and Kristensen (2011). However, in comparison to these papers, the asymptotic analysis of I is complicated by the fact that the nonparametric estimator θ̂0(y, x) is more involved than the kernel regression and density estimators considered there. The test is also similar to the specification test proposed in Lewbel, Lu, and Liangjun (2013). However, Lewbel, Lu, and Liangjun (2013) maintain the assumption of exogenous regressors and then wish to test the functional form restrictions implied by a transformation model. In our case, we maintain the functional form and wish to test for exogeneity of X1.
For the asymptotic analysis, we restrict the set of feasible bandwidths used in the computation of θ̂0(y, x) to satisfy:

Assumption A11. n h0,x^{dx/2+2} h0,y^{4m} → 0, n h0,x^{dx/2+2+m} → 0, n h0,x^{dx/2+2+4m} → 0, n h0,x^{dx+2} → ∞, n h0,x^{3dx/2}/(log n)² → ∞, and n h0,y² h0,x^{3dx/2−2}/(log n)² → ∞.
With Df,k(y, x) and Dp,k(y, x) defined in Equation (S.17) in the supplemental article (Chiappori, Komunjer, and Kristensen, 2013), k ∈ {0, 1, y}, the asymptotic distribution of the test statistic in (18) is then as follows:

Theorem 5. Let Assumptions A1 through A9 hold, and let the bandwidths for Θ̂(y) and θ̂0(y, x) satisfy A10 and A11, respectively. Then,

    [n h0,x^{dx/2+2} I − mI] / vI →d N(0, 1),

where, with K1(x) = ∂K(x)/∂x1 and σ²(y, x) ≡ Var(D̄0(y, Y, X) | X = x),

    mI ≡ [1/(n h0,x^{dx})] ∫_X K²(x) dx × ∫_Y ∫_X σ²(y, x) W(y, x) f(x) dy dx
         + [1/(n h0,x^{dx+2})] ∫_X K1²(x) dx × ∫_Y ∫_X W(y, x) D̄1²(y, x) f(x) dy dx,
    vI² ≡ 2 ∫_X [K1 ∗ K1]²(x) dx × ∫_Y ∫_X W(y, x) D̄1⁴(y, x) f²(x) dy dx,

    D̄0(y, Y, x) ≡ [∫_Y^y Dp,0(u, x) du + ∫_Y^y Dp,1(u, x) du + I{Y ≤ y} Dp,y(Y, x)] / EY[S(Y, x)] − S(y, x) EY′[∫_Y^{Y′} Df,0(u, x) du] / EY[S(Y, x)]²,

    D̄1(y, x) ≡ − S(y, x) EY′[∫_Y^{Y′} Df,0(u, x) du] / EY[S(Y, x)]².

If ε ⊥ X1 | X−1 does not hold, so that ∆(y, x) ≠ 0, then n h0,x^{dx/2+2} |I − mI|/vI → +∞.
The above result is similar to the ones in Härdle and Mammen (1993) and Kristensen (2011), except that the expressions of the location and scale parameters, mI and vI, are somewhat more involved. As in these two papers, we propose to use the bootstrap to implement the test. To resample data under the null, we draw X∗ ∼ F̂X, where F̂X is the empirical distribution of X, then draw ε∗ ∼ F̂ε|X−1(·|X∗−1), and finally compute Y∗ = Θ̂⁻¹(ĝ(X∗) + ε∗). Since we draw ε∗ from F̂ε|X−1(·|X∗−1), ε∗ ⊥ X∗1 | X∗−1, and so the test statistic based on the bootstrap sample will follow the distribution stated in Theorem 5 in large samples.
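A schematic version of one bootstrap resample under the null is sketched below. The nearest-neighbour draw of ε∗ given X∗−1 is a crude stand-in for the smoothed conditional distribution F̂ε|X−1 and is our own simplification; all function names are illustrative.

```python
# One bootstrap resample under H0: X* from the empirical distribution of X, eps*
# resampled from residuals of observations whose X_{-1} is close to X_{-1}*, and
# Y* generated by inverting the estimated transformation (so that H0 holds in the
# bootstrap world by construction).
import numpy as np

def bootstrap_sample(X, eps_hat, Theta_hat_inv, g_hat, rng, k=25):
    n = len(eps_hat)
    idx = rng.integers(0, n, size=n)
    X_star = X[idx]                                           # X* ~ empirical distribution of X
    eps_star = np.empty(n)
    for i in range(n):
        d = np.abs(X[:, 1:] - X_star[i, 1:]).sum(axis=1)      # distance in the X_{-1} directions
        eps_star[i] = rng.choice(eps_hat[np.argsort(d)[:k]])  # draw eps* given X_{-1}*
    Y_star = Theta_hat_inv(g_hat(X_star) + eps_star)          # eps* independent of X1* given X_{-1}*
    return Y_star, X_star
```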
5. Monte Carlo Application to Duration Models. We now illustrate how the proposed identification and estimation strategy can be used in the study of duration models. Nonparametric identification and estimation of duration models under exogeneity have been studied by Elbers and Ridder (1982), Heckman and Singer (1984b,a), Heckman and Honore (1989), Honore (1990), Ridder (1990), Heckman (1991), Honore (1993), Hahn (1994), Horowitz (1999), Ridder and Woutersen (2003), and Abbring and Ridder (2011), among others; van den Berg (2001) gives a survey of this literature. However, allowing for endogeneity is important in many applications (see, e.g., Bijwaard and Ridder, 2005), but very few results exist in this context.
5.1. Identification of Duration Models under Endogeneity. First, we recall some basic facts about duration models. Let τ ∈ (0, +∞) denote the duration, X ∈ X be a vector of observed covariates, U ∈ (0, +∞) an unobserved individual heterogeneity term, and H(t, x, u) denote the conditional hazard function:

    H(t, x, u) ≡ lim_{dt→0} P(t ≤ τ < t + dt | τ ≥ t, X = x, U = u) / dt.

We shall assume that both X and U are time-invariant.³ Under this assumption, the integrated conditional hazard is distributed as a unit exponential random variable, i.e. for a.e. (x, u) ∈ X × (0, +∞) we have ξ ≡ ∫_0^τ H(t, x, u) dt ∼ Exp(1). In the mixed proportional hazard model, H(t, x, u) = H0(t) exp[−φ(x)]u, where H0(t) > 0 is the baseline hazard. The corresponding log-integrated conditional hazard transform, λ(t) ≡ ln ∫_0^t H0(s) ds, can be expressed as

(19)    λ(τ) = φ(X) + ln ξ − ln U.

Note that λ satisfies λ′(t) > 0 and the limit conditions lim_{t→0} λ(t) = −∞ and lim_{t→+∞} λ(t) = +∞. Thus, the inverse mapping ϕ = λ⁻¹ is such that ϕ : R ↦ R+, ϕ′(t) > 0 on R, lim_{t→−∞} ϕ(t) = 0 and lim_{t→+∞} ϕ(t) = +∞. Defining σ ≡ E[φ(X)], the model can be written in the form of (1), where Y = τ − 1, Θ(y) ≡ λ(y + 1)/σ, g(x) ≡ φ(x)/σ and ε = (ln ξ − ln U)/σ.

³Alternatively, one could assume that X and U are external (or exogenous for τ), in the sense that P(X(t, t + dt), U(t, t + dt) | τ > t + dt, X(t), U(t)) = P(X(t, t + dt), U(t, t + dt) | X(t), U(t)); see, e.g., Lancaster (1992).
The above model falls within our framework. By construction, E[g(X)] = E[Θ(τ)] = 1, so this part of our normalization (3) holds automatically. The second normalization, Θ(0) = 0, amounts to setting λ(1) = 0 (which normalizes the baseline hazard to ∫_0^1 H0(s) ds = 1) and E[ln U] = −e, where e ≈ 0.577 denotes Euler's constant.⁴ Thus, up to the scale parameter σ, we can nonparametrically estimate the hazard rate model using the techniques developed in the previous section. The estimators of the normalized hazard rate function and regression function, Θ(τ) and g(x), are entirely new and not yet seen in the literature.

⁴Note that when ξ ∼ Exp(1), then V ≡ ln ξ follows a (negative) Gumbel distribution whose cdf is given by FV(v) = 1 − exp(−exp v). Then, E(V) = −e.
Once Θ, g and Fε|X have been estimated, we can estimate σ along the same lines as in Horowitz (1999): If X is exogenous, we can follow Horowitz (1999) and obtain that

    σ = lim_{t→0} σ(t),    σ(t) = − ∫ Gv(t|v) p²(v) dv / ∫ G(t|v) p²(v) dv,

where G(t|v) = P(τ ≤ t|V = v) and p(v) is the density of V ≡ g(X). If X−1 is endogenous, the above identification result is no longer valid. Instead, we can use that

    Φ(t, x) = 1 − ∫ exp[−Λ(t) e^{−σg(x)−u}] dF_{ln U|X−1}(u|x−1);
    Φ1(t, x) = −σΛ(t) g1(x) ∫ exp[−Λ(t) e^{−σg(x)−u}] dF_{ln U|X−1}(u|x−1).

In particular, we can then express the scale as

    σ = lim_{t→0} σ(t),    σ(t) = − ∫ [Φ1(t, x)/g1(x)] f²(x) dx / ∫ Φ(t, x) f²(x) dx.

This appears to be a new identification result, which should be of independent interest.
5.2. Monte Carlo Design. We here focus on the performance of the estimator of Θ, since the estimators of g and Fε|X have been examined in other studies. For the Monte Carlo study, we generate data from (19) with X = (X1, X2) being bivariate and generated as X1 = ν1, X2 = α1 Z + α2 Z² + ν2 + ρε, where (ν1, ν2, ε, Z) are mutually independent standard normal random variables. Thus, X1 is exogenous (X1 ⊥ ε), while X2 remains endogenous whenever ρ ≠ 0. We consider both the case of exogenous regressors (ρ = 0) and endogenous ones (ρ = 0.5). Finally, the regression function is specified as

    φ(X) = β1 Φ(X1) + β2 X1 + β3 X2²,

with (β1, β2, β3) = (2.0, 0.1, −0.1), while λ(t) is chosen as

    log-model:    λ(t) = ln t^α,    α = 1,

corresponding to a proportional hazard duration model with a Weibull baseline hazard.
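A sketch of this data-generating process is given below. The composite error is drawn directly as a standard normal and the values α1 = α2 = 0.5 are placeholders; both are simplifying assumptions about details not reported in the text above.

```python
# Sketch of the Monte Carlo design: data from (19) with lambda(t) = ln t (alpha = 1).
import numpy as np
from scipy.stats import norm

def simulate(n, rho, rng):
    nu1, nu2, eps, Z = rng.normal(size=(4, n))
    X1 = nu1                                             # exogenous "special" regressor
    X2 = 0.5 * Z + 0.5 * Z ** 2 + nu2 + rho * eps        # endogenous whenever rho != 0
    phi = 2.0 * norm.cdf(X1) + 0.1 * X1 - 0.1 * X2 ** 2  # (beta1, beta2, beta3) = (2.0, 0.1, -0.1)
    tau = np.exp(phi + eps)                              # lambda(tau) = ln(tau) = phi(X) + error
    return tau, np.column_stack([X1, X2]), Z
```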
For the implementation of the estimators, we have to choose the bandwidths used to estimate Φ(y, x) and its derivatives, together with a weighting function w. In addition, for the computation of Θ̂(y), we have to numerically evaluate the integrals that enter the expression of our estimator. The bandwidths are chosen by first implementing Silverman's rule of thumb and then scaling the resulting values down, since our theoretical results state that we should undersmooth in order to obtain √n-consistency. We also use different bandwidths depending on whether we are estimating Φ̂y(y, x) or Φ̂x(y, x). To be more specific, our bandwidths for the two estimators are chosen as follows:

    Φ̂y(y, x):  hy = (4/3)^{1/5} σ̂Y n^{−(1+δ)/5},    hxi = σ̂Y n^{−(1+δ)/6},
    Φ̂x(y, x):  hy = 4^{1/3} σ̂Y n^{−1/3},    hxi = (4/6)^{1/8} σ̂Y n^{−(1+δ)/8},

where σ̂²Y is the sample variance of Y and δ controls the degree of undersmoothing, which we set to δ = 1. Next, the weighting function was chosen as the uniform density with support X0, where X0 is chosen in a data-driven way to avoid the aforementioned denominator issues:

    X0 = {(x1, x2) : Φ̂x(Ȳ, x) > c, q̂Xk(2.5) ≤ xk ≤ q̂Xk(97.5), k = 1, 2},

where Ȳ is the sample mean of Y and q̂Xk(·) is the sample quantile function of Xk, k = 1, 2.
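The bandwidth rules above translate directly into code; a minimal sketch (with σ̂Y the sample standard deviation of Y):

```python
# Bandwidths used in the simulations: Silverman-type rule-of-thumb factors, scaled
# down (delta = 1) to undersmooth as required for root-n consistency of Theta_hat.
import numpy as np

def bandwidths(Y, delta=1.0):
    n, s_Y = len(Y), np.std(Y, ddof=1)
    h_y_Phi_y = (4 / 3) ** (1 / 5) * s_Y * n ** (-(1 + delta) / 5)
    h_x_Phi_y = s_Y * n ** (-(1 + delta) / 6)
    h_y_Phi_x = 4 ** (1 / 3) * s_Y * n ** (-1 / 3)
    h_x_Phi_x = (4 / 6) ** (1 / 8) * s_Y * n ** (-(1 + delta) / 8)
    return {"Phi_y": (h_y_Phi_y, h_x_Phi_y), "Phi_x": (h_y_Phi_x, h_x_Phi_x)}
```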
The estimators were then implemented as follows: First, simulate N ≥ 1 uniform bivariate draws on X0, say x∗i = (x∗i,1, x∗i,2) for i = 1, . . . , N, and compute θ̂∗i(y) = Ŝ(y, x∗i)/ÊY[Ŝ(Y, x∗i)] for each draw. Given these N alternative estimators evaluated at randomly chosen values of x across X0, we then computed the "empirical" mean, Θ̂LS(y) = Σ_{i=1}^N θ̂∗i(y)/N, and the smoothed empirical median, Θ̂LAD_b(y); for the latter, we chose a bandwidth of b = 0.01.
The performance of the LAD version of our estimator for the exogenous and endogenous cases is reported in Figures 1 and 2, respectively. We see that the estimator performs very well, with little bias and variance over most of the domain of Y, despite the fact that the estimator is based on only n = 250 observations. As such, the attractive asymptotic properties of the estimator appear to also hold in finite samples. Moreover, there are only small differences in the performance of the estimator when comparing the exogenous and endogenous cases.
Finally, Figure 3 shows the performance of the least-squares version of our estimator, Θ̂LS(y) = Σ_{i=1}^N θ̂∗i(y)/N, for the case of endogenous regressors. The performance of Θ̂LS(y) is clearly inferior to that of Θ̂LAD_b(y) shown in Figure 2. The poor performance is due to the fact that the θ̂∗i(y), i = 1, . . . , N, contain a relatively large number of "outliers" which are here given equal weight. In contrast, the LAD estimator discards these outliers and so is not affected.
Fig 1. Median estimator of Θ(y) with exogenous regressors (true Θ(y), mean of the estimated Θ(y), and 95% confidence bands).

Fig 2. Median estimator of Θ(y) with endogenous regressors (true Θ(y), mean of the estimated Θ(y), and 95% confidence bands).

Fig 3. Mean estimator of Θ(y) with endogenous regressors (true Θ(y), mean of the estimated Θ(y), and 95% confidence bands).

6. Conclusion. We conclude by discussing possible extensions of our identification result. Instead of assuming conditional independence between ε and X1 given X−1, we could assume that some instrument W is available such that ε and X1 are conditionally independent given (X−1, W),
i.e., ε ⊥ X1 | (X−1, W). This would amount to considering the conditional distribution FY|X,W of Y given (X, W), which now satisfies:

    FY|X,W(y, x, w) ≡ Φ(y, x, w) = Fε|X,W(Θ(y) − g(x), x−1, w).

Redefining X to be (X, W), the above expression falls exactly into the framework obtained in (2), with an additional restriction on the function g, which now no longer depends on the components of X corresponding to W. When the conditional distribution of the redefined vector X−1 given Z is complete, we know that g is identifiable. This identification result holds even without restricting the way that g depends on W; a fortiori, the identification result remains valid in this case.
APPENDIX A: SIEVE IV ASSUMPTIONS
We here state the additional regularity conditions used to establish Theorem 3. First, we need some additional notation. The first-step conditional mean estimators h̃(x1, z) and M̂(x1, z|gn) are assumed to take the form

    h̃(x1, z) = p^{Jn}(x1, z)′(P′P)⁻ Σ_{i=1}^n p^{Jn}(X1,i, Zi) Θ(Yi),
    M̂(x1, z|gn) = p^{Jn}(x1, z)′(P′P)⁻ Σ_{i=1}^n p^{Jn}(X1,i, Zi) gn(Xi),

where p^{Jn}(x1, z) = (p1(x1, z), . . . , pJn(x1, z))′ is a sieve basis of dimension Jn ≥ 1, and P = (p^{Jn}(X1,1, Z1), . . . , p^{Jn}(X1,n, Zn))′. Also let Λ^r_c(X) ≡ {g ∈ Λ^r(X) : ‖g‖_{Λ^r} ≤ c} be a Hölder ball (of radius c) of functions with smoothness r, as introduced in Blundell, Chen, and Kristensen (2007). We are now ready to state the regularity conditions.

Assumption A12. (i) g ∈ G ≡ Λ^r_c(X) for some r > 1/2; (ii) E[‖X‖^{2a}] < ∞ for some a > r.

Assumption A13. The functions h(x1, z) ≡ E[Θ(Y)|X1 = x1, Z = z] and M(x1, z|gn) ≡ E[gn(X)|X1 = x1, Z = z] belong to H ≡ Λ^{rm}_c(X1 × Z), rm > 1/2, for any gn ∈ Gn.

Assumption A14. (i) The smallest and largest eigenvalues of E[p^{Jn}(X1, Z) p^{Jn}(X1, Z)′] are bounded and bounded away from zero for each Jn; (ii) p^{Jn}(x1, z) is either a cosine series or a B-spline basis of order γb, with γb > rm > 1/2; (iii) the density of (X1, Z) is continuous, bounded and bounded away from zero over its support X1 × Z, which is a compact set with non-empty interior.

Assumption A15. There is a gn ∈ Gn such that τn² × E[E[g(X) − gn(X)|X1, Z]²] ≤ const × ‖g − gn‖²_X.

Assumption A16. (i) kn → ∞, Jn/n → 0; (ii) n Jn^{−2rm/(1+dz)−1} → 0 and lim_{n→∞}(Jn/kn) = c0 > 1.
SUPPLEMENTARY MATERIAL
Supplement: Proofs of the results stated in the text (.pdf). The supplementary material contains the proofs of the results stated in the main text.
REFERENCES
Abbring, J. H., P. Chiappori, and T. Zavadil (2007): “Better Safe than Sorry? Ex
Ante and Ex Post Moral Hazard in Dynamic Insurance Data,” VU University Amsterdam and Columbia University.
Abbring, J. H., and G. Ridder (2011): “Regular Variation and the Identification of
Generalized Accelerated Failure-Time Models,” CentER and University of Southern
California.
Belloni, A., X. Chen, V. Chernozhukov, and Z. Liao (2010): “On Limiting Distributions of Possibly Unbounded Functionals of Linear Sieve M-Estimators,” Yale University.
Bijwaard, G., and G. Ridder (2005): “Correcting for selective compliance in a reemployment bonus experiment,” Journal of Econometrics, 125, 77—111.
Blundell, R., X. Chen, and D. Kristensen (2007): “Semi-Nonparametric IV Estimation of Shape-Invariant Engel Curves,” Econometrica, 75, 1613—1669.
Chen, S. (2002): “Rank Estimation of Transformation Models,” Econometrica, 70, 1683—
1697.
Chen, X., V. Chernozhukov, S. Lee, and W. Newey (2011): “Local Identification
of Nonparametric and Semiparametric Models,” Discussion paper, Cowles Foundation
Discussion Paper No. 1795.
Chen, X., and D. Pouzo (2012): “Estimation of Nonparametric Conditional Moment Models With Possibly Nonsmooth Generalized Residuals,” Econometrica, 80, 277—321.
Chernozhukov, V., G. W. Imbens, and W. K. Newey (2007): “Instrumental Variable
Estimation of Nonseparable Models,” Journal of Econometrics, 139, 4—14.
Chiappori, P., I. Komunjer, and D. Kristensen (2013): “Supplement to: “Nonparametric Identification and Estimation of Transformation Models”.”
Darolles, S., Y. Fan, J. Florens, and E. Renault (2011): “Nonparametric
Instrumental Regression,” Econometrica, 79, 1541—1565, Centre de Recherche et
Développement Économique, 05-2002.
Duan, N. (1990): “The adjoint projection pursuit regression,” Journal of the American
Statistical Association, 85, 1029—1038.
Ekeland, I., J. J. Heckman, and L. Nesheim (2004): “Identification and Estimation
of Hedonic Models,” The Journal of Political Economy, 112, S60—S109.
Elbers, C., and G. Ridder (1982): “True and Spurious Duration Dependence: The
Identifiability of the Proportional Hazards Model,” The Review of Economic Studies,
49, 402—411.
Engle, R. F. (2000): “The Econometrics of Ultra-High-Frequency Data,” Econometrica,
68, 1—22.
Fève, F., and J.-P. Florens (2010): “The practice of non-parametric estimation by
solving inverse problems: the example of transformation models,” The Econometrics
Journal, 13, S1—S27.
Florens, J., and S. Sokullu (2011): “Nonparametric Estimation of Semiparametric
Transformation Models,” Toulouse School of Economics.
Hahn, J. (1994): “The Efficiency Bound for the Mixed Proportional Hazard Model,” The
Review of Economic Studies, 61, 607—629.
Hall, P., and J. L. Horowitz (2005): “Nonparametric Methods for Inference in the
Presence of Instrumental Variables,” The Annals of Statistics, 33, 2904—2929.
Han, A. K. (1987): “A Non-Parametric Analysis of Transformations,” Journal of Econometrics, 35, 191—209.
Härdle, W., and E. Mammen (1993): “Comparing Nonparametric versus Parametric
Regression Fits,” Annals of Statistics, 21, 1926—1947.
Hausman, J. A., and D. A. Wise (1978): “A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences,” Econometrica, 46, 403—426.
Heckman, J. J. (1991): “Identifying the Hand of Past: Distinguishing State Dependence
from Heterogeneity,” The American Economic Review: Papers and Proceedings of the
Hundred and Third Annual Meeting of the American Economic Association, 81, 75—79.
Heckman, J. J., and B. E. Honore (1989): “The Identifiability of the Competing Risks
Models,” Biometrika, 76, 325—330.
Heckman, J. J., R. L. Matzkin, and L. Nesheim (2005): “Estimation and Simulation
of Hedonic Models,” in Frontiers in Applied General Equilibrium, ed. by T. Kehoe,
T. Srinivasan, and J. Whalley. Cambridge University Press, Cambridge.
Heckman, J. J., and B. Singer (1984a): “The Identifiability of the Proportional Hazards
Model,” The Review of Economic Studies, 60, 231—243.
(1984b): “A Method for Minimizing the Impact of Distributional Assumptions
in Econometric Models for Duration Data,” Econometrica, 52, 271—320.
Honore, B. E. (1990): “Simple Estimation of a Duration Model with Unobserved Heterogeneity,” Econometrica, 58, 453—473.
(1993): “Identification Results for Duration Models with Multiple Spells,” The
Review of Economic Studies, 60, 241—246.
Horowitz, J. L. (1996): “Semiparametric Estimation of a Regression Model with an
Unknown Transformation of the Dependent Variable,” Econometrica, 64, 103—137.
(1998): “Bootstrap Methods for Median Regression Models,” Econometrica, 66,
1327—1351.
(1999): “Semiparametric Estimation of a Proportional Hazard Model with Unobserved Heterogeneity,” Econometrica, 67, 1001—1028.
Imbens, G. W., and W. K. Newey (2009): “Identification and Estimation of Triangular
Simultaneous Equations Models without Additivity,” Econometrica, 77, 1481—1512.
Jacho-Chávez, D., A. Lewbel, and O. Linton (2010): “Identification and Nonparametric Estimation of a Transformed Additively Separable Model,” Journal of Econometrics, 156, 392—407.
Jochmans, K. (2011): “Pairwise-comparison Estimation with Nonparametric Controls,” Manuscript, Sciences Po Département d’économie.
Keifer, N. M. (1988): “Economic Duration Data and Hazard Functions,” Journal of
Economic Literature, 26, 646—679.
Kristensen, D. (2011): “Semi-Nonparametric Estimation and Misspecification Testing of Diffusion Models,” Journal of Econometrics, 164, 382—403.
Lancaster, T. (1992): The Econometric Analysis Of Transition Data. Cambridge University Press.
Lewbel, A. (1998): “Semiparametric Latent Variable Model Estimation with Endogenous
or Mismeasured Regressors,” Econometrica, 66, 105—121.
Lewbel, A., X. Lu, and S. Liangjun (2013): “Specification Testing for Transformation Models with Applications to Generalized Accelerated Failure-Time Models.”
Lo, A. W., A. C. MacKinlay, and J. Zhang (2002): “Econometric models of limitorder executions,” Journal of Financial Economics, 65, 31—71.
Mammen, E., C. Rothe, and M. Schienle (2012): “Nonparametric Regression with
Nonparametrically Generated Covariates,” Annals of Statistics, 40, 1132—1170.
Mata, J., and P. Portugal (1994): “Life Duration of New Firms,” The Journal of
Industrial Economics, 42, 227—245.
Matzkin, R. L. (1991): “A Nonparametric Maximum Rank Correlation Estimator,” in
Nonparametric and Semiparametric Methods in Econometrics and Statistics, ed. by
W. Barnett, J. Powell, and G. Tauchen, chap. 11. Cambridge University Press.
(2007): “Nonparametric Identification,” in Handbook of Econometrics, Vol. 6B,
ed. by J. J. Heckman, and E. Leamer, pp. 5307—5368. Elsevier.
Newey, W. K., and J. L. Powell (2003): “Instrumental Variable Estimation of Nonparametric Models,” Econometrica, 71, 1565—1578.
Newey, W. K., J. L. Powell, and F. Vella (1999): “Nonparametric Estimation of
Triangular Simultaneous Equations Models,” Econometrica, 67, 565—603.
Ridder, G. (1990): “The Non-Parametric Identification of Generalized Accelerated
Failure-Time Models,” The Review of Economic Studies, 57, 167—181.
Ridder, G., and T. Woutersen (2003): “The Singularity of the Information Matrix of
the Mixed Proportional Hazard Model,” Econometrica, 71, 1579—1589.
van den Berg, G. J. (2001): “Duration Models: Specification, Identification and Multiple
Durations,” in Handbook of Econometrics, Vol. 5, ed. by J. J. Heckman, and E. Leamer,
pp. 3381—3460. Elsevier.
Ye, J., and N. Duan (1997): “Nonparametric n^{−1/2}-consistent estimation for the general transformation models,” Annals of Statistics, 25, 2682—2717.
Department of Economics
Columbia University
E-mail: [email protected]
Department of Economics
University of California San Diego
E-mail: [email protected]
Department of Economics
University College London, CeMMAP and CREATES
E-mail: [email protected]