Identification of Limited Dependent Variable Models with

Identification in Nonparametric
Limited Dependent Variable Models with
Simultaneity and Unobserved Heterogeneity
Rosa L. Matzkin1
Department of Economics
University of California, Los Angeles
First version: January 2010
This version: March 2011
Abstract
We extend the identification results for nonparametric simultaneous equations models in Matzkin (2008) to situations where the observations on the
vector of dependent variables might be limited, and where the number of
exogenous unobservable variables is larger than the number of dependent
variables.
1
The support of NSF through grants SES-0833058 and BCS-0852261 and of NIA
through grant F014613//AG012846-1251 is gratefully acknowledged. I am also thankful
for their insightful comments to Elie Tamer, two referees, Gary Chamberlain, Andrew
Chesher, Yong Hyeon Yang, and the participants at the May 2009 Conference on Identification and Decisions, in honor of Charles Manski, where this paper was presented.
1
1
Introduction
This paper develops new identification results for nonparametric limited dependent variable models. The models can be subject to simultaneity in latent or observable continuous variables, and may depend on a large number
of unobservable variables. We provide conditions under which the distribution of the vector of unobservable variables as well as the functions of the
unobservable and observable variables are nonparametrically identified.
The results can be applied to a wide range of models. In particular,
they can be applied to binary threshold crossing models, ordered dependent
variable models, and censored dependent variable models with continuous
endogenous explanatory variables and no dummy or other limited explanatory variables. Models with "no structural shifts" considered in Heckman
(1978) belong to this type. Examples of empirical situations that can be analyzed using such models include variations of the labor participation decision
of one member of the family as a function of other household income, as in
Smith and Blundell (1986), Blundell and Smith (1989), and Blundell and
Powell (2003b), and the decision model for post-secondary vocational school
training in Nelson and Olson (1978).
Our main method proceeds by first transforming the model with limited
dependent variables and a large number of unobservable variables into one
with ‘observable’ continuous dependent variables and with the same number
of unobservable variables as the number of dependent variables. The transformed model will satisfy the conditions of the models analyzed in Matzkin
(2008). To determine the identification of this transformed model, we use the
identification results for simultaneous equations models developed in Matzkin
(2008). We then use the identified elements in the transformed model together with the particular structures of the original model to determine the
identification of the elements of the original model. Such elements are the
particular functions and distributions generating the distribution of the unobservable variables in the transformed models and the structural equations
2
that describe the interaction among the latent and observable dependent variables, the observable exogenous variables, and the unobservable exogenous
variables.
To transform the nonparametric model with limited dependent variables
into another nonparametric model with ‘observable’ continuous dependent
variables, we extend some of the original idea in Manski (1975). In this pioneering paper, Manski showed, among other things, that one can identify and
consistently estimate the coefficients of linear subutility functions in discrete
choice models without specifying parametrically the distribution of the unobservable random subutilities. Two of the key elements in Manski’s (1975)
proof of identification were a scale normalization on the vector of coefficients
and assumptions on a regressor, requiring it to possess a strictly increasing
distribution conditional on the other regressors and a nonzero coefficient.
Those same assumptions were used in a large number of distribution-free
methods for binary and multinomial models that were later developed. In
particular, Cosslett (1983) showed how these same elements deliver as well
identification of the distribution of the unobservable random term, when this
is independent of the regressors. Matzkin (1992, examples 3 and 5) also
used an additive regressor with strictly increasing conditional distribution
and nonzero coefficient. Instead of the scale normalization used by Manski
(1975) and Cosslett (1983), she specified the coefficient of this regressor to
have the value one. Matzkin (1992) showed that when the latent variable in
binary threshold crossing or binary choice models is the sum of this regressor
plus a nonparametric function of the other regressors, plus the unobservable
random term, the nonparametric function and the distribution of the unobservable random term are identified. Matzkin (1992) developed the results
assuming that the additive unobservable random term was distributed independently of the vector of all observable explanatory variables. Lewbel
(2000) showed that the continuous, large support regressor need only be independent of the unobservable variables, conditional on the other explanatory
3
variables. This allowed the regressor to be used in the identification of a
variety of models with endogeneous regressors. In this paper, we will require
statistical independence between the vector of all exogenously determined,
observed explanatory variables and the vector of unobservable variables, as
in Matzkin (1992), instead of only a conditional independence assumption,
as in Lewbel (2000).2
To review the literature related to our models, consider first simultaneous equations models in latent or observable continuous dependent variables, where each structural equation contains only one unobservable exogenous variable. The parametric identification and estimation of such models
can be analyzed using the results in Amemiya (1974, 1977, 1979), Heckman (1974, 1976, 1978), Nelson and Olson (1978), and Blundell and Smith
(1989) by restricting attention to the cases where the structural system of simultaneous equations involve only latent or observable dependent variables.
See, in particular, Heckman (1974), Nelson and Olson (1978), Heckman
(1978, Cases 1, 3, and 5), Amemiya (1979), and Blundell and Smith (1989).
(Amemiya (1981), Lee (1981), and Maddala (1985) compare various such
models.) Newey (1985) developed semiparametric estimation methods for
some of these models. (Blundell and Powell (2003a) provide a lucid survey.)
Berry and Haile (2009a, 2009b) and Chiappori and Komunjer (2009) analyzed
nonparametric identification in multinomial choice models with simultaneity
and unobserved heterogeneity with special emphasis on the model developed
by Berry, Levinsohn, and Pakes (1995)). These papers deal with simultaneity by making a completeness assumption, as in Newey and Powell (1989,
2003) and Chernozhukov and Hansen (2005). Berry and Haile (2009b) show
identification also using a transformation of variables approach, related to
2
Papers that used large support additive regressors to establish identification, either
using full independence or conditional independence conditions, in models with limited
dependent variables include Matzkin (1993, 1994, 2005, 2007a, 2007b), Briesch, Chintagunta, and Matzkin (1997, 2002, 2009), Lewbel (1998), Honore and Lewbel (2002), Kahn
and Lewbel (2007), Abbring and Heckman (2007), Heckman and Navarro (2007), Berry
and Haile (2009a), and Lewbel, Linton and McFadden (2010).
4
Matzkin (2005, 2008) and to earlier work by Brown (1983), Roehrig (1988),
and Benkard and Berry (2006).
The main method we develop in this paper to deal with simultaneity in
latent and observable variables is an extension of the unpublished Section 5,
in Matzkin (2005), which considered a simultaneous equation model in latent
variables with discrete observability on the latent variables. The method
imposes specified scales on the measurement of the unobserved values of the
limited dependent variables. Intuitively, fixing the scale allows us to identify
a nonparametric function of the unobserved values of the limited dependent
variables. Although this known scale assumption can be relaxed, this would
typically be at the expense of adding functional restrictions.
The importance of allowing for unobserved heterogeneity in models generated by individual behavior has been recognized for a long time (See, e.g.,
Lancaster (1979)). Heckman and Singer (1984a, 1984b) pioneered the development of nonparametric methods to deal with unobserved heterogeneity in duration models. Beran and Hall (1992) and Hoderlein, Klemela,
and Mammen (2007) considered identification in random coefficient models.
Heckman and Willis (1977) and Hausman and Wise (1978) provided identification and estimation results for parametric binary response and multinomial models with unobserved heterogeneity. Nonparametric identification
and estimation of either binary response or multinomial models with unobserved heterogeneity has been studied by Briesch, Chintagunta and Matzkin
(1997, 2007), Ichimura and Thompson (1998), Altonji and Matzkin (2001,
2005), Chesher and Silva Santos (2002), Hess, Bolduc, and Polak (2005),
Harding and Hausman (2006), Bajari, Fox, and Ryan (2007), Gautier and
Kitamura (2007), Hoderlein (2008), Bajari, Fox, Kim, and Ryan (2009), and
Fox and Gandhi (2009), among others. Fox and Gandhi (2009) developed
as well results for identification of nonparametric regressions, selection models, and mixed-continuous discrete choice models using mixtures over sets of
functions. Matzkin (2007a) presents a partial survey of the literature on
5
unobserved heterogeneity. (See also Matzkin (2007b).) The explanatory
variables in the models mentioned above are either independent or conditionally independent of the unobservable random terms. Identification of
the distribution of unobserved heterogeneity in models with simultaneity
was studied in the previously mentioned papers by Berry and Haile (2009a,
2009b) and Chiappori and Komunjer (2009).
The method that we develop to deal with unobserved heterogeneity is
based on specifying a relationship among nonadditive functions of unobservable random terms, and identifying the nonparametric distributions of these
unobservable random terms as well as the functions of which they are arguments of, using the identification results in Matzkin (2003) and Matzkin
(2008). Specifically, our techniques allow to decompose the value of an ‘aggregate’ unobservable random vector into functions of multiple other unobservable random terms. One can use these techniques any time the distribution of an unobservable random vector conditional on a vector of observable
variables is determined to be identified. In particular, the techniques we develop, to decompose the conditional distribution of a vector of unobservable
variables conditional on a vector of observable variables, can be used instead
of or together with some of the other techniques that have been mentioned
above to deal with unobserved heterogeneity in various models.
The structure of the paper is as follows. In the next section, we describe
the model and specify the assumptions. In Section 3, we present our identification result. Examples are provided in Section 4. Section 5 concludes.
2
The Model
We consider a model, in which an unknown function,  represents a relationship between a vector of potentially latent, endogenous variables,  ∗ ∈  
a vector of observable exogenous variables, (   ) ∈  + +  and
6
a vector of unobservable exogenous variables, ( ) ∈ +  The relationship among these vectors is described by the unknown function  :
+ + + ++ →  , by
(1)
 ( ∗      ) = 0
The function  will be assumed to satisfy certain invertibility and weak separability restrictions, specified below. The vector  will represent structural
errors, generating dependence across the coordinates of  ∗  The vector  will
represent unobserved heterogeneity. The potentially unobservable endogenous vector  ∗ is assumed to be at least partially observed. Specifically, we
assume that there exists a known transformation  :  →  such that for
any value  ∗ of  ∗ ,  ( ∗ ) is observed. We denote  ( ∗ ) by  Hence, the
model is (1) and
(2)
 =  ( ∗ )
where  is the vector of observable endogenous variables and (   ) is
the vector of observable exogenous variables. Note that  ( ∗ ) is not an
argument in (1) The simultaneity is in terms of only the latent variables.
Particular cases of such models are the ones in Heckman’s (1978) with no
structural shifts. In such models the coherency conditions are satisfied because they impose zero-valued coefficients for the dummy variables in the
structural equations.
In simple cases, the transformation,  determines whether the model
corresponds to a multinomial model, a Tobit model, or any other limited
dependent variables models. To provide some simple examples, suppose that
the model contains one latent variable, 1∗ and one observable endogenous
variable, 2 = 2∗  Then, the case where the econometrician observes only if
7
1∗ is above or below a threshold corresponds to the transformation
 (1∗  2∗ ) = (1 2∗ )
= (0 2∗ )
1∗  0
if
otherwise
The case where the econometrician observes the value of 1∗ only when this
value is above 0 corresponds to the transformation
 (1∗  2∗ ) = (1∗  2∗ )
= (0 2∗ )
if
1∗  0
otherwise
Instead of specifying a known coefficient for the limited dependent variable
and additivity in 1 and 2 , as other models do, our assumptions impose a
piecewise linear specification the functional 1∗ the median of  is 0
If the value of 1∗ is always observed, the transformation is
 (1∗  2∗ ) = (1∗  2∗ )
Some examples with two latent variables are the Tobit type models
 (1∗  2∗ ) = (1∗  2∗ )
if (1∗  2∗ )  (0 0)
otherwise
= 0
 (1∗  2∗ ) = (1∗  0)
if 1∗  0 2∗ ≤ 0
= (0 2∗ )
= (1∗  2∗ )
if 1∗ ≤ 0 2∗  0
if 1∗  0 2∗  0
otherwise
= (0 0)
8
and the multinomial model
 (1∗  2∗ ) = (1 0)
= (0 1)
2.1
if 1∗ ≥ 0 ≥ 2∗
if 1∗ ≤ 0  2∗
Assumptions on distributions
We will first specify assumptions on the distributions of the observable
and unobservable exogenous variables. Assumption D1 requires independence between the  dimensional vector  the large dimensional vector 
and the vector of observable exogenous (   )  The vector  will represent structural errors, generating dependence across the coordinates of  ∗ 
The coordinates of  may enter one or all of the structural functions. The
vector  will be linked to a subvector, 0  of  in relatively unspecified ways.
On the other hand, each coordinate of the vector  representing an element
of unobserved heterogeneity, will enter only one of the structural equations
and it will do so linked to a subvector of  in restricted ways. Assumption
D2 requires independence across the coordinates of  This assumption is
only needed to identify the distribution of the whole vector  from the distribution of the marginal distributions of the coordinates of  Assumption
D3 requires full support for the unobservables  and  and for the vector of
exogenous variables  Note that while the coordinates of the vector  will
be assumed to be mutually independent, no such requirement is imposed on

Assumption D1:   and (   ) are mutually independent.
9
Assumption D2: The coordinates of  are independently distributed.
Assumption D3: The distributions of   and  are absolutely continuous and possess everywhere positive densities.
2.2
Assumptions on 
We next specify our assumptions on the function  These concern the
observable as well as the unobservable exogenous variables. In the specification of the model in (1) and (2) we have denoted the vector of observable
explanatory variables, (   ) ∈  + +  in terms of three subvectors,
  and  because we wanted to emphasize the different roles that each
of these subvectors has in identification. The vector of observable exogenous
variables  ∈  will have the role of allowing us to identify the conditional
distribution, given (   )  of the vector of latent variables,  ∗  from the
conditional distribution, given (   )  of the vector of observable endogenous variables,  The vector of observable exogenous variables  ∈  will
have the role of allowing us to identify the distribution of ( ) and nonparametric functions linking elements of ( ) to elements of  These latter non
parametric functions are 0 (0  )  whose values belong
³ to  ´and  realvalued nonparametric functions, 1 (1  1 )         each nonadditive in the unobservable random terms. The vectors 0  1    are
subvectors of  The following assumptions impose separability restrictions
on  that will be instrumental into making the roles of  and  effective.
Assumption B (separability in ( ∗   )) : There exists a known function
 : 2 →  such that (i) the function  depends on ( ∗  ) only through
10
the values of  ( ∗  )  (ii) for any value  of   (· ) is continuous,
1−1 and onto   (iii) for any  ∈   there exists a value  in the support
of  and a value  in the support of  such that for any ( )
Pr ( ( ∗   ) ≤ | =   =   = ) = Pr ( ≤ | =   =   = )
This assumption turns the distribution of the potentially discrete  into
the distribution of a continuous random vector  ( ∗   )  It is a generalization of the condition used in Matzkin (2005, Section 5) to show identification
in models with simultaneity in latent variables. To provide some examples,
disregard the existence of ( ). Consider the transformation where
 (1∗  2∗ ) = (1 2∗ )
= (0 2∗ )
if
1∗  0
otherwise
Let  (1∗  2∗  1  2 ) = (1∗ + 1  2∗ )  Assume that the support of 1 is 
and that 0 belongs to the support of 2  For any given  = (1  2 )  let
1 = 1  2 = 0 1 = 0 2 = 2  Then
Pr ( ( ∗   ) ≤ (1  2 ) |1 = 1 )
= Pr (1∗ + 1 ≤ 1  2∗ ≤ 2 |1 = 1 )
= Pr (1∗ ≤ 0 2∗ ≤ 2 |1 = 1 )
= Pr (1 ≤ 0 2 ≤ 2 |1 = 1 )
11
Consider next the transformation defined by
 (1∗  2∗ ) = (1∗  2∗ )
= (0 2∗ )
if
1∗  0
otherwise
Let again  (1∗  2∗  1  2 ) = (1∗ + 1  2∗ )  Assume that the support of
1 is the set of all nonpositive numbers and that 0 belongs to the support
of 2 . If 1  0 let 1 = 0 and 2 = 0 Then, since conditional on
1 = 0  ( ∗   ) = (1∗  2∗ )  the condition is satisfied letting 1 = 1 and
2 = 2  If 1  0 let 1 = 1  2 = 0 1 = 0 and 2 = 2  Then, as in the
previous example,
Pr ( ( ∗   ) ≤ (1  2 ) |1 = 1 )
= Pr (1∗ + 1 ≤ 1  2∗ ≤ 2 |1 = 1 )
= Pr (1∗ ≤ 0 2∗ ≤ 2 |1 = 1 )
= Pr (1 ≤ 0 2 ≤ 2 |1 = 1 )
Hence, the condition is again satisfied.
Note that in the Binary Threshold Crossing model, the support of  is
required to be the whole line, while in the Tobit Model the support of 
need only be the set of nonpositive numbers. Intuitively, the large support
of  in the first case is needed to compensate for the fact that the only
observed property of  ∗ is whether it is positive or negative. In the Tobit
model, the values of  ∗ are observed when they are positive. Hence, there
is no need to use values of  in order to recover the distribution of  ∗ for
those values. Lewbel and Matzkin (2009) analyze support requirements on
12
‘special regressors’ across different Limited Dependent Variable models.
Under additional restrictions, one could apply directly the results in
Matzkin (2008) without use of the vector  One such case is where identification of functions in the model can be established using only the region
where the distribution of  given  coincides with the distribution of  ∗
given  Along this line, it might be useful contrasting our assumptions with
those used in some nonparametric versions of the tobit model when 2∗ is
not endogenous. These models have been studied recently by, among others,
Lewbel and Linton (2002) and Chen, Dahl, and Kahn (2005).
Denote
 = (1  2 ) where 1 is a scalar. Lewbel and Linton (2002) consider the
model
1∗ = (1  2 ) + 
 (1∗ ) = 1∗
if 1∗  0
otherwise,
= 0
where  and  are independently distributed. Chen, Dahl, and Kahn (2005)
consider the model
1∗ = (1  2 ) + (1  2 )
 (1∗ ) = 1∗
if 1∗  0
otherwise,
= 0
where  and  are independently distributed, and where two quantiles of 
are known. In contrast, our model would be specified as
1∗ = 1 + (2  )
 (1∗ ) = 1∗
if 1∗  0
otherwise,
= 0
13
where  and (1  2 ) are independently distributed, 1 is continuously distributed conditional on 2 , and  is strictly increasing in  Our additivity in 1
and specified coefficient for 1 compensates for the additivity in  the separability between 1∗ and 2  and the specified coefficient for 1∗  imposed in the
models of Lewbel and Linton (2002) and Chen, Dahl, and Kahn (2005). One
could use the assumptions in those models, together with additional ones, to
obtain identification results when simultaneity is present. This would avoid
relaying on the observable vector  at the expense of additional structure.
Consider for example, the model
1∗ =  2∗ + 1 (1 ) + 1
2∗ =  1∗ + 2 (2 ) + 2
 (1∗  2∗ ) = (1∗  2∗ )
if 1∗  0 2∗  0
otherwise,
= (0 0)
where 1 and 2 be both multivariate continuous random vectors,  and  are
parameters of unknown values, and 1 and 2 are nonparametric, continuous
functions. If the joint distribution of (1∗  2∗  1  2 ) were observed, one could
use Matzkin (2008) to easily derive identification results for   1 and
2  However, even when this is not the case, one can show that, because of
the particular structure,under some pointwise conditions on the value of the
derivative of the density of (1  2 )  the distribution of (1∗  2∗  1  2 ) only
over the region where (1∗  2∗ )  (0 0) is enough to identify   1  and 2 
This could not be possible, however, in more general structures.
The next assumption assumes the existence of nonparametric functions
 (   ) for each coordinate,  of  Each of these functions is assumed to
satisfy any one of the restrictions that were used in Matzkin (2003) to show
identification of nonadditive random functions. In addition, it is assumed
14
that for each of them, there exists a value,   of the subvector   at which
the value of the function  is known. This is used to ‘shut down’ the effect
of each of the functions at the specified values of   Instead of using this
trick, one could use, under certain structures, deconvolution techniques as in
some of the papers mentioned in the Introduction.
Assumption M (unknown monotone functions of
³ ): There
´ exist 
nonparametric, unknown functions 1 (1  1 )        where  =
´
³
1     and where each 1    denote the values of subvectors of
¢
¡
 = 1    satisfying (i) each  is continuously differentiable and
strictly increasing in  a.e. in  (ii) for each , there exists a known
scalar  and a known value e in the support of  such that for all values
    ) =   (iii) either (iii.a)  is homogenous of degree one
of    (e
in (    ) and for some value (     ) in the support of (   ) and some
scalar     (    ) =    or (iii.b) for some value   in the support of
 and all values of     (     ) =    or (iii.c) the subvector  can
be partitioned into  = (1  2 ) where 2 ∈  and there exists a function
 such that for all (1  2 )   (1  2    ) =  (1  2 +  ) and for some
¡
¢
¢
¡
known value   and known vector value  1      1   =   
Assumption M(ii) is the one made to be able to ‘shut down’ the effect of
¡
¢
 by letting  equal a specified value. Identification of 1    within
¢
¡
e  would require all functions in the set to sata set of functions 
e 1   
¢
¡
isfy Assumption M in exactly the same way that 1    satisfies it.
In particular, all functions in the set would be required to possess the same
¡
¢
values of      e  and   ( = 1   ) as 1     In some cases,
establishing identification within the set will not require knowing the values
of e  as long as these values are common to all functions in the set. Estimation however, would typically require specifying known values for e  An
example of a function  satisfying M(ii) is  (   ) =   2  for  ∈ 
15
In this example e = 0 When the effect of all  ’s except for one, ∗  is
’shut down’, we will be able to identify the distribution of the values of ∗
¡
¢
conditional e1   e∗ −1  ∗  e∗ +1   e  Since by Assumption D.1, ∗ is
distributed independently of  the distribution of  ∗ will be identified from
¡
¢
this distribution of ∗ conditional on e1   e∗ −1  ∗  e∗ +1   e  Assumptions M(i) and M(iii) allow then to identify the distribution of ∗ from
the distribution of ∗ conditional on ∗  Assumptions M(i) and M(iii)
were used in Matzkin (2003) to show identification of functions  (    ) 
nonadditive in   when   was distributed independently of  with an everywhere positive density. Let  | = (·) denote the cumulative distribution
of  =  (    ) conditional on  =   Matzkin (2003) showed that if
the function  satisfies  (     ) =  for some specified value   in the
support of   then for all  

 () =  | = ()
¢
¡
 (  ) = −1
()


|
=


|
=




If the function  satisfies  (     ) =   and is homogeneous of degree
one, then for all  
 () = 

 | =

 (  ) =


−1
 | =


µµ
µ



¶

¶
 


 | = 

µµ


¶

¶¶

If the function satisfies  (1  2    ) =  (1  2 +  ) and for some speci¡
¢
¢
¡
fied   and vector value  1      1   =    then

 () =  | =(1 −) (  )

´
³
−1
 (  ) =  | =  | =(1 −) (  ) 

16
We note that Assumption M does not require all the  functions to
satisfy the same restrictions. The assumption only requires that for each 
 satisfies either () or () or () We also note that even though
we have argued that Assumption M allows identification of the distribution
of  for each  this with Assumption D.3 imply that the distribution of the
whole vector  is identified.
The following assumption will be made to guarantee identification of the
distribution of  and of nonparametric functions of  We will assume that
the - dimensional vector  is an argument, together with a subvector 0 of
the observable vector  of an unknown nonparametric function 0  which
has range   Denote the value of 0 at (0  ) by 0 : 0 = 0 (0  )  We
want to impose restrictions on 0 guaranteeing that 0 and the distribution
of  are identified from the distribution of (0  0 )  Matzkin (2008) provides
a method to find such restrictions. To use her result, we need to introduce
additional notation. We will let 0 and  satisfy the assumptions made on
the reduced form system of the simultaneous equations models analyzed in
e 0 (0  )  both invertMatzkin (2008). For any two functions 0 (0  ) and 
ible in  denote the inverses with respect to  by, respectively, −1
0 (0  0 )
−1
e −1
and 
e 0 (0  0 )  Denote the matrix of partial derivatives of 
0 (0  0 )
−1
e 0 (0  0 ) 0 , and the matrix of partial derivwith respect to 0 by  
−1
e −1
atives of 
e 0 (0  0 ) with respect to 0 by  
0 (0  0 ) 0  Denote the
−1
matrices corresponding to 0 (0  0 ) in the same way. The Jacobian de−1
terminants of 
e −1
0 (0  0 ) and 0 (0  0 ) will be denoted respectively by
¯
¯ −1
¯
¯ −1
¯ 
e 0 (0  0 ) 0 ¯ and ¯0 (0  0 ) 0 ¯ and be both assumed to be
17
positive. We define
¯
¯
¯
¯
¯
¯
¯ −1
¯ 


e −1
0 (0  0 ) ¯
0 (0  0 ) ¯
¯
¯
−
log ¯
log
∆0 (0  0 ) =
¯ 
¯
¯
0
0
0
0

¯
¯
¯
¯
−1
¯
¯
¯ −1
¯ 


e
(


)
(


)
0
0
0
0
0
0
¯−
¯
¯
∆0 (0  0 ) =
log ¯¯
log
¯ 0
¯
¯
0
0
0
For any possible density  of  and any value of (0  0 )  we then define the
e  ) by
matrix  (0  0 ;  
e  )
 (0  0 ;  
⎛ ³

 −1
0 (0 0 )
0
´0
⎜
⎜
⎜
´0
= ⎜ ³ −1

 0 (0 0 )
⎜
⎝
0
∆0 (0  0 ) +
³
−1
0 (0 0 )
0
´0
 log( (−1
0 (0 0 )))

⎞
⎟
⎟
⎟
´0  log  −1 (  )
³ −1
(  ( 0 0 0 )) ⎟
0 (0 0 )
⎟
∆0 (0  0 ) +
⎠
0

The following assumption will be used, together with the results on
nonparametric identification of simultaneous equations models in Matzkin
(2008), to guarantee that the distribution of  and the nonparametric function 0 are identified from the distribution of (0  0 ) 
Assumption I (unknown invertible function of ): There exist a nonparametric, unknown function, 0 (0  )  such that (i) for any value of 0 
the range of 0 (0  ·) is  (ii) a.e. in 0  0 (0  ·) is invertible and 0
and its inverse are continuously differentiable, (iii) for any value of 0 
|0 (0  ) |  0(iv) for a known value e0 of 0 and a known value
0  ) = 0 for all  and () 0 satisfies
0 ∈  of a vector  ∈   0 (e
18
shape restrictions guaranteeing that for any 
e 0 6= 0 satisfying the same
restrictions that 0 is assumed to satisfy there exists (0  0 ) such that for
any  satisfying the restrictions that  is assumed to satisfy, the rank of
e  ) is  + 1
the matrix  (0  0 ;  
Note that, as in Assumption M, identification of 0 within a set of funce 0 within the set to satisfy I(iv) with the
tions 
e 0 will require all functions 
same e0 and same 0 as 0 
Assumption L (known links): There exist a known continuously differthat (i) the function  depends
on ´´
entiable function  : + →  such
³
³
(  ) only through the values of  0 (0  )  1 (1  1 )        
(ii) for each  ≥ 1  is an argument in only one coordinate,   of 
and (iii) for each  ≥ 1 and  if  is an argument of  then when
for all  ≥ 0 all the subvectors  of  other than  equal the value e
¡
¢
at which  e    =    is strictly increasing in the value of   (v)
¢
¡
 0 (0  )  1    is invertible in 0 
Assumption L specifies the structure generating the heterogeneity in the
model. Define the function  :  ++ →  by
³
³
´´
 (  ) =  0 (0  )  1 (1   1 )       
Let  denote the value of  (  )  In our identification theorem, we will
show that under the assumptions of our model, the joint distribution of ( )
is identified. Then, we will show that from the joint distribution of ( )
we will be able to identify the distributions of  and  and the functions
0  1     This latter step will use Assumptions M, I, and L together
with D1-D3. To provide an intuition for the way in which this result will be
19
obtained, note that when  = e for  = 1   
¡
¢
 (  ) =  =  0 (0  )  1   
Since the function  is known and invertible in 0  from the distribution
¢
¡
of  given  = 0  e1   e we will be able to calculate the distribution
of 0 = 0 (0  ) given  Let 0 = −1 () denote the inverse of  with
¢
¡
respect to 0 when  = 0  e1   e  We then have that the distribution of
¢
¡
0 = 0 (0  ) conditional on  = 0  e1   e is identified. Using then
the results in Matzkin (2008), we can show that under Assumption I, 0
¡
¢
and the distribution of  are identified. Next, letting  = e0  1  e2  e 
we get that
¢
¡
 (  ) =  =  0  1 (1   1 )  2   
¢
¡
Since the distribution of  given  = e0  1  e2  e is identified, and
the known function  is strictly increasing in 1  we can calculate the distri¡
¢
bution of 1 = 1 (1   1 ) given  = e0  1  e2  e  Since by Assumption
M 1 satisfies one of the assumptions in Matzkin (2003), 1 and the distribution of  1 are identified. From this it follows that the distribution of  is
identified, since by assumption, its coordinates are independent.
We next make an assumption on  that allows us to transform the model
with limited dependent variables and a large number of unobservables into a
model with observable continuous dependent variables and the same number
of unobservable random terms as the number of endogenous variables.
Assumption S (separability of V): For some function   : + + →
 and all values ( ∗      ) of ( ∗      ) in the support of this
vector
20
V ( ∗      ) = 0
  ( ( ∗  )    (  )) = 0
=
The next assumption relates to conditions on the function   satisfying
(3)
  (  ) = 0
It requires that for any values ( ) there exists a unique value of  satisfying
(3) and for any values ( )  there exists a unique value of  satisfying (3)
Moreover, it also requires that the functions mapping   into  and   into
 be twice continuously differentiable, onto   and have positive Jacobian
determinants, given  These assumptions were made in Matzkin (2008) to
show identification of nonparametric simultaneous equations models where
the number of unobservable variables equal the number of observable endogenous variables, and where all variables are continuous and with everywhere
positive densities. In many simultaneous equation models, the uniqueness of
 may be too strong. It implies uniqueness of equilibrium, which in most
cases is known to be satisfied under very strong assumptions. Our separability assumptions on  have the effect of transforming the model with limited
dependent variables and large number of unobservables into a model of the
type analyzed in Matzkin (2008), where there were no latent endogenous
variables and the number of unobservable exogenous variables was the same
as that of the observable endogenous variables.
Assumption U (uniqueness of b and ): There exist unique twice
continuously differentiable functions  :  + →  and  : + → 
such that  and  are onto   for each ( ) ∈  +
  ( ( )   ) = 0
21
and for each ( ) ∈ +
  (   ( )) = 0
In addition, the Jacobian determinants, | ( ) | and | ( ) |,
are assumed to be positive for all (  ) 
3
Identification
The assumptions imply that model (1) can be expressed as
 =  ( )
where
³
³
´´
 =  (  ) =  0 (0  )  1 (1  1 )       
and
 =  ( ∗  )
This is the structural equations model, involving in each of the  equations interactions among the endogenous variables in the vector  ∗  Since the
functions  and  are known, this structural model will be identified if the
functions 0  1     and , and the distribution of ( ) are identified.
The reduced form system of the transformed model is
 =  ( )
for a function  which is the inverse of  with respect to  If  is identified,
 is also identified.
The analysis of identification will proceed by applying Matzkin (2008) to
22
the model  = ( ) to show that under our assumptions the function  is
identified. The identification of  will then be used to show identification of
0  1    and the distributions of  and  Instead of imposing enough
assumptions, as we do, to guarantee the identification of  one could consider
imposing weaker assumptions to achieve identification of only a functional,
() of  Given an alternative function, e Matzkin (2008) can be used to
determine whether (e
) is observationally equivalent to ()
Some parts of our identification analysis will be constructive, while those
that invoke Matzkin (2008) will be less so. Constructive identification is
appealing because it immediately suggests an estimation method. While the
identification results in Matzkin (2008) are not constructive, a recent paper
(Matzkin (2010)) develops an easily computable estimator for models that are
shown to be identified using Matzkin (2008). The estimation method is based
on the same transformation of variables approach from which the results in
Matzkin (2008) were developed. A control function approach (e.g., Blundell
and Powell (2003b), Chesher (2003), Imbens and Newey (2003, 2009) could
also be used to estimate some simultaneous equation models. (See Blundell
and Matzkin (2010).)
Our identification theorem gives conditions on the set of functions e and
on distributions |= , satisfying all the assumptions that were made on
the function  and all the assumptions that were made on the distribution,
|= of  given  =  When these conditions are satisfied, and all the
above assumptions are also satisfied, the theorem states that the distribution
of ( ), the functions   and the function  are identified. To state the
conditions of the theorem we will first define a set to which (  ) is assumed
to belong, and then define a matrix that depends on any pair of functions
and any distribution within that set.
We will let (Γ ×  ) denote the set of pairs (e
  ) such that e satisfies all
the assumptions made about  and  is such that for all  in the support
of   satisfies all the assumptions that  is assumed to satisfy. In
23
particular, both  and e posses the same values of      e  and   that
satisfy assumptions M and I. We will also let (Γ0 × 0 ) denote the set of
¢
¡
e 0 satisfies all the assumptions made about 0 and
pairs 
e 0   such that 
 satisfies all the assumptions made about   Similarly to the notation we
introduced before Assumption I, for any function e in Γ we will denote the
matrix of partial derivatives of e with respect to  by (e
 ( ) ) and the
matrix of partial derivatives of e with respect to  by (e
 ( ) )  The
Jacobian determinant of e will be denoted by |e
 ( ) |  Note that by
assumption this determinant is positive. The value of a conditional density
 ( ))  For any two functions
|= at e ( ) will be denoted by |= (e
e and  we will denote by ∆ the difference in the rate of change of their
Jacobian determinants with respect to  and by ∆ the difference in the rate
of change of their Jacobian determinants with respect to  That is
∆
∆
¯
¯
¯
¯
¯ ( ) ¯
¯ e
¯


(
)

¯−
¯
log ¯¯
log ¯¯
=
¯



 ¯
¯
¯
¯
¯
¯ ( ) ¯
¯ e

( ) ¯¯

¯
¯
¯
−
log ¯
log ¯
=

 ¯ 
 ¯
For any two possible functions, e and  in Γ and any possible density  in
 , we define the matrix  ( ; e   ) on (  ) ∈ + + by
⎛ ³

()

´0
∆ +
³
()

´0
 log(|= (()))

⎞
⎟
⎟
⎟
³
´0  log 
( |= (())) ⎟
()
⎟
∆ +
⎠


⎜
⎜
⎜
´0
 (  ; e   ) = ⎜ ³

()
⎜
⎝

24
Theorem: Consider the model described by (1)-(2) and satisfying Assumptions B,M,I,L,S,U,D1,D2, and D3. Suppose that for any e and  in Γ
such that e 6=  and any density  in  there exists (  ) such that the
rank of the matrix  (  ; e   ) is larger than  Then,  is identified
in Γ 0 is identified in Γ0  the distribution of  is identified in 0  and the
functions 1     and the distribution of  and therefore of ( ) are
identified in their respective sets.
To prove the theorem, we will first show that we can identify the distribution of  =  ( ∗   ) conditional on ( ) and that the distribution of
 =  (  ) conditional on  is independent of  We then show that,
conditional on  the model can be transformed into one of the form
 =  ( )
where all the assumptions in the basic simultaneous equations model considered in Matzkin (2008) are satisfied. The rank condition then implies by
Matzkin (2008) that the function  ( ) is identified. This implies that its
inverse with respect to ,  ( )  is also identified
To identify the distribution of  given any  in the support of  we note
that for any  ∈ 
¯
¯
¯  ( ) ¯
¯
¯
|= () = |== ( ( )) ¯
 ¯
Finally, by choosing the appropriate values,  ()  for  we can identify the
distribution of  the distribution of any coordinate  of  and the distribution of any  (    )  From the latter, we can identify the function  
The detailed proof follows.
In the proof, instead of showing identification of the functions  when
25
 satisfies any of the cases in M(iii), we will show identification only for
the case where  (     ) =   The proof for the case where the function
 satisfies M(iii.a) or M(iii.c) is very similar. The only difference is that
we would be showing identification of  and the distribution of  using a
different functional on the distribution of observable variables.
Proof of the Theorem: Let the distribution of  given (   ) be
given. By Assumption  for any  ∈   there exists a value  in the
support of  and  in the support of  such that for any ( )
Pr ( ( ∗   ) ≤ | =   =   = ) = Pr ( ≤ | =   =   = )
By Assumption D1, ( ) is distributed independently of (   )  This
implies that conditional on ( )  ( ) is independent of  By Assumption
U,  ( ∗   ) =  (  (  ))  Hence, by Assumption  conditional on
( )   ( ∗   ) is a function of ( )  Since ( ) is independent of 
conditional on ( )  this implies that  ( ∗   ) is independent of  ,
conditional on ( )  Hence, for any 
Pr ( ( ∗   ) ≤ | =   =   = ) = Pr ( ( ∗   ) ≤ | =   = )
Since
Pr ( ( ∗   ) ≤ | =   =   = ) = Pr ( ≤ | =   =   = )
we have that the distribution of  conditional on ( ) can be recovered
from the distribution of  conditional on (  ) by
Pr ( ( ∗   ) ≤ | =   = ) = Pr ( ≤ | =   =   = )
26
where  and  are such that  =  ( ∗ ) for  ∗ such that  ( ∗  ) = 
The independence between ( ) and (   ) which follows by Assumption D1 also implies that ( ) is independent of ( )  This implies
that conditional on  ( ) is independent of  Hence, conditional on 
 =  (  ) is independent of  Consider the model
 =  ( )
where the probability density  and the marginal pdf of ( ) are also
known. Under our assumptions, a pair (e
  ) in (Γ ×  ) is observationally
¢
¡
equivalent to another pair  | in (Γ ×  ) iff for all  in the support of
 and all ( )
¯
¯
¯
¯
¯ e
¯  ( ) ¯
 ( ) ¯¯
¯
¯
¯
= |= ( ( )) ¯
 ( )) ¯
|= (e
 ¯
 ¯
Following Matzkin (2008), one can show that under our assumptions, this
equality is equivalent to requiring that for all ( ) the rank of the matrix
 (  ; e   ) is . Note that the rank of this matrix is at least  and
can’t be strictly larger than +1 Hence, since we have that whenever e 6= 
for any  in  there exists    such that the rank of  (  ; e   )
is  + 1 then  is identified. Since  is identified, its inverse, , conditional
on  is also identified.
From  and | we can identify | since for any  and any  ∈ 
¯
¯
¯  ( ) ¯
¯
¯
|= () = |== ( ( )) ¯
 ¯
Next, we use |= () to identify the functions  for  = 0 1   and
the distribution of ( )  Under our assumptions, by appropriately choosing
the value of  we can ‘shut down’ the effect of all 0  ( = 1  ) on |
¡
¢
¡
¢
. In particular, when  = 0  e1   e   =  0 (0  )  1     By
27
¡
¢
Assumptions I and L, 0 (·) ≡  · 1    is 1-1, continuously differentiable, known, and has support   Then, 0 = 0 (0  ) = 0−1 ()  It
¡
¢
follows that the density of 0 conditional on  = 0  e1   e is identified,
since
¯
¯
¯ 0 (0 ) ¯
¯
0 |=(0 1  ) (0 ) = |=(0 1  ) (0 (0 )) ¯¯


0 ¯
By Matzkin (2008), the rank condition in Assumption I implies that 0 and
 are identified.
To show identification of 1 and of the distribution of 1  we employ a
similar argument. First we show identification of the joint distribution of
(1  ) and, next, we use this distribution to identify the function 1 (1   1 )
and the distribution of 1  In this case, we use Matzkin (2003). For this,
¢
¡
we let  = e0  1  e2   e  Assume without loss of generality that 1 =
1 (1  1 ) is an argument of the first coordinate of  Denote the value
¢
¡
of this first coordinate of  when  = e0  1  e2   e by the real-valued
function 1 (1 )  By Assumption L and M, 1 (·) is strictly increasing and
¡
¢
known. The distribution of 1 conditional on  = e0  1  e2   e is
identified by
1 |=(0 1 2  ) (1 ) = |=(0 1 2  ) (1 (1 ))


We will assume, for simplicity, that 1 satisfies M(iii.b). Following the arguments in Matzkin (2003), when 1 =  1  we have that for any value 1 
1 |=(0 1 2  ) (1 ) = 1 |=(0 1 2  ) (1 )


= 1 (1 ) 
Hence, 1 is identified. The function 1 (1   1 ) can then be identified using
28
the expression
¡
¢
1 (1   1 ) = −1|=    
1 ( 1 ) 
( 0 1 2  )
1
Similar arguments can be used for all  ≥ 1 Hence,
 and
´  are identified.
³
The identification of the distribution of  = 1     then follows by the
independence across  Similarly the identification of the distribution of ( )
follows by the independence between  and 
We have then shown that from the distribution of  given  we can
identify the functions 0 and 1    as well as the distribution of ( ) 
This together with the identification of  shown above concludes the proof.
4
Examples
We next provide some models motivated by empirical examples that have
been analyzed in previous studies, and describe variations of them where our
results can be applied. We first consider a version of the empirical example
studied in Nelson and Olson (1978). In this version of the model, each individual simultaneously allocates his time in each period to vocational school,
college, and work, taking prior accumulated time as given, and considering
the choice consequences for current wages. The model has four endogenous
variables. The variable ∗ denotes the allocation of time during period  to
vocational school training, while  denotes observed accumulated vocational
school training by the beginning of the period. The variable ∗ denotes allocation of time to college study during period  measured in years, while 
denotes accumulated formal schooling by the beginning of the period. The
variable  denotes work experience during period  while  denotes ac-
29
cumulated work experience by the beginning of the period. The variable
 denotes the average hourly rate in period  The model has additional
conditioning variables, which we will denote by a vector   The variable ∗
is observed only when it is positive and only an ordered indicator for ∗ is
observed. The equations of the model are
∗ = 0 + 1  + 2  + 3  + 4  + 05  + 
∗ =  0 +  1  +  2  +  3  +  4  +  05  + 
 =  0 +  1  +  2  +  3  +  4  +  05  + 
ln( ) =  0 +  1  +  2  +  3  +  4 ∗ +  5 ∗ +  6  +  07  + 
with
 =
and
(
∗
0
⎧
⎪
⎨ 3
 =
2
⎪
⎩
1
if ∗  0
otherwise
if ∗  1
if 0  ∗  1
otherwise
After accommodating support conditions, this model can satisfy our assumption B when 1 = −1  2 = −1  1 =  4  and  2 =  5 . This would
be reasonable if wages were functions of ∗ +  and ∗ +  and the individual would make his decisions about time allocation accordingly. Assume
that (       ) is distributed independently of (         )  Then,
again after accommodating if necessary support conditions, one can use our
results to relax the linearity in the model.
Another example is a variation on a labor participation decision, which
depends on other household income (see, e.g. Blundell and Powell (2003b)).
Suppose that the decision is on whether to temporarily increase or decrease
work hours, 1∗  over a fixed exogenously given quantity of hours,  and
30
suppose that other household income is a function of the total quantity of
hours, 1∗ +  Then, the model could be
1∗ = 1 (2  1 + 1 ) − 
2 = 2 (1∗ +  2 + 2 )
1∗  0
1 = 1

1 = 0
otherwise
Incorporating unobserved heterogeneity would be possible by letting, for example, 1 = 10 + 1 and 2 = 20 + 2 , where
¡
¢
10 = 10 01 +  1  02 +  2
¡
¢
20 = 20 01 +  1  02 +  2
1 = 1 (1   1 )
2 = 2 (2   2 )
The model would then be
¡
¢
1∗ = 1 2  10 + 1 + 1 − 
¡
¢
2 = 2 1∗ +  20 + 2 + 2
1 = 1

1∗  0
1 = 0
otherwise
We can analyze identification of this model making the assumptions we
made in the previous sections, and applying the results in those sections.
31
Note that the unobservables  1 and  2 enter both equations while each, 1
and 2  enters into only one of the equations. The observable variables
are (1  2     ) where 1 ∈ {0 1}  2 ∈   = (1  2 )  and  =
(01  02  1  2 )  In this model, the function  is given by
Ã
1 ( ∗   )
 ( ∗   ) =
2 ( ∗   )
Ã
!
1∗ + 
=
2∗
!
Ã
1∗ + 
=
2
!
As we have shown in Section 2, this function satisfies Assumption B.
The vector of unobservable random terms,  is
 =
Ã
Ã
1
2
!
!
1 (  )
2 (  )
Ã
!
10 + 1
=
20 + 2
!
Ã
10 (01 +  1  02 +  2 ) + 1 (1  1 )
=
20 (01 +  1  02 +  2 ) + 2 (2  2 )
=
Let (1  2 ) =  (1∗  2∗  )  The model obtained by transforming the
model with continuous latent endogenous variables into a model with con-
32
tinuous ‘observable’ variables is
1 = 1 (2  1 + 1 )
2 = 2 (1  2 + 2 )
Under conditions specified in the previous sections, the joint distribution
of (  ) is identified. Assume that 1 and 2 are strictly increasing with
respect to their respective last coordinates. Denote the inverse of 1 with
respect to its last coordinate by 1  and the inverse of 2 with respect to its
last coordinate by 2  We then get that
1 = 1 (2  1 ) − 1
2 = 2 (1  2 ) − 2
Under our assumptions, one can use the distribution of (1  2 ) conditional
on ( ) to identify the function  = (1  2 ) and, for each value of  =
(01  02  1  2 )  the distribution of  given  = (01  02  1  2 ) 3
The next step identifies a particular structure generating the distribution
of  given  = (01  02  1  2 )  In the example, the particular structure is
¡
¢
1 = 10 01 +  1  02 + 2 + 1 (1   1 )
¡
¢
2 = 20 01 +  1  02 + 2 + 2 (2   2 )
Assume that 0 = (0 0) 1 = 0 and 2 = 0 When 1 = e1 and 2 = e2 this
heterogeneity model becomes
3
Identification of this model when  is distributed independently of  is, in fact, already
well known using a variety of normalizations, assumptions, and arguments (Section 5 in
Matzkin (2005), Example 4 in Matzkin (2007a), Section 4.2 in Matzkin (2008), Section 6
in Berry and Haile (2009b), and Section 6 in Matzkin (2010).) Estimation for this model
has been developed in Matzkin (2010).
33
¡
¢
1 = 10 01 +  1  02 + 2
¡
¢
2 = 20 01 +  1  02 + 2
Under our assumptions, (10  20 ) is invertible. Let ( 1  2 ) denote its inverse.
We then have the model
1 =  1 (1  2 ) − 01
2 =  2 (1  2 ) − 01
By assumption, ( 1   2 ) is independent of  The steps under which the distribution of (1  2 ) conditional on  can be identified have been already
described above. Hence, we can use this distribution to identify ( 1   2 ) and
the distribution of ( 1   2 ). Since (10  20 ) is the inverse of ( 1  2 )  it is
identified once ( 1   2 ) is.
When 01 = e01 and 02 = e02  the heterogeneity model becomes
1 = 1 (1  1 )
2 = 2 (2  2 )
The function 1 and the distribution of  1 are identified from the distribution
01  e02  1  e2 )  and the function 2 and the distribution of
of 1 given  = (e
01  e02  e1  2 ) 
2 are identified from the distribution of 2 given  = (e
We have then described the steps under which the functions 10  20  1  2
and the distribution of ( 1   2   1  2 ) are identified. The identification of
(1  2 ) imply the identification of (1  2 )  Hence, all the elements in the
original structural model can be identified under our assumptions.
.
34
5
Conclusions
We have presented identification results for limited dependent variable models with simultaneity in, latent or observable, continuous endogenous variables and with unobserved heterogeneity. The identification proceeds by first
transforming the model with latent endogenous variables,  ∗  into one with
continuous endogenous variables,  and with a vector of unobservable variables of the same dimension, . We have provided conditions under which
the joint distribution of the vector of continuous endogenous variables,  and
of observable exogenous variables, ( ) is identified, and conditional on 
 is independent of  This have been used together with the identification
result for simultaneous equation models in Matzkin (2008) to show that the
distribution of  given  is identified and the mapping from ( ) to  is also
identified. The distribution of  given  has then used to identify nonadditive
functions of unobservable random terms. This and the mapping from ( )
to  has then been used to identify the original structural model.
35
6
References
ABBRING, J.H. and J.J. HECKMAN (2007): "Econometric Evaluation
of Social Programs, Part III: Distributional Treatment Effects, Dynamic
Treatment Effects, Dynamic Discrete Choice, and General Equilibrium Policy
Evaluation," in Handbook of Econometrics, Vol. 6B, edited by J.J. Heckman
and E.E. Leamer.
ALTONJI, J. and R.L. MATZKIN (2005): "Cross-section and Panel Data
Estimators for Nonseparable Models with Endogenous Regressors," Econometrica, 73, 1053-1102.
AMEMIYA, T. (1978):"The Estimation of a Simultaneous Equation Generalized Probit Model," Econometrica, 46, pp. 1193-1205.
AMEMIYA, T (1979): "The Estimation of a Simultaneous Equation Tobit Model," International Economic Review, Vol. 20, No. 1, pp. 169-181.
AMEMIYA, T. (1985), Advanced Econometrics, Cambridge, MA: Harvard University Press.
ATHEY, S. and G.W. IMBENS (2007): Discrete Choice Models with
Multiple Unobserved Choice Characteristics," International Economic Review, No. 48, 1159-1192.
BAJARI, P., J. FOX, K.I. KIM, and S. RYAN (2009): "The Random
Coefficients Logit Model is Identified," Discussion paper, University of Minnesota.
36
BAJARI, P., J. FOX, and S. RYAN (2007): "Linear Regression Estimation of Discrete Choice Models with Nonparametric Distribution of Random
Coefficients," American Economic Review, Papers and Proceedings, 97, 459463.
BENKARD, C.L. and S. BERRY (2006) "On the Nonparametric Identification of Nonlinear Simultaneous Equations Models: Comment on B. Brown
(1983) and Roehrig (1988)," Econometrica, 74, 1429-1440.
BERAN, R. and P HALL (1992): "Estimating Coefficient Distributions
in Random Coefficients Regression," Annals of Statistics, 20, 1970-1984.
BERRY, S. (1994): "Estimating Discrete Choice Models of Product Differentiation," RAND Journal of Economics, 23(2), pp. 242-262.
BERRY, S. J. LEVINSOHN, and A. PAKES (1995): "Automobile Prices
in Market Equilibrium," Econometrica, 60(4), pp. 889-917.
BERRY, S.T. and P.A. HAILE (2009a): "Nonparametric Identification
of Multinomial Choice Demand Models with Heterogeneous Consumers,"
Discussion Paper, Yale University.
BERRY, S.T. and P.A. HAILE (2009b): "Identification in Differentiated
Products Markets Using Market Level data," Working Paper, Yale University.
BLUNDELL, R. and R.L. MATZKIN (2010) "Conditions for the Existence of Control Functions in Nonadditive Simultaneous Equation Models,"
mimeo, UCL and UCLA.
37
BLUNDELL, R.W. and J. L. POWELL (2003a) "Endogeneity in Nonparametric and Semiparametric Regression Models," in Advances in Economics and Econometrics, Theory and Applications, Eighth World Congress,
Volume II, edited by M. Dewatripont, L.P. Hansen, and S.J.Turnovsky, Cambridge University Press, Cambridge, U.K.
BLUNDELL, R.W. AND J.L. POWELL (2003b) "Endogeneity in Semiparametric Binary Response Models," Review of Economic Studies.
BLUNDELL, R.W. AND R.J.SMITH (1989) "Estimation in a Class of
Simultaneous Equation Limited Dependent Variable Models," The Review
of Economic Studies, Vol. 56, No. 1, pp. 37-57.
BRIESCH, R., P. CHINTAGUNTA, and R. MATZKIN (1997) "Nonparametric Discrete Choice Models with Unobserved Heterogeneity," mimeo,
Northwestern University.
BRIESCH, R., P. CHINTAGUNTA, and R. MATZKIN (2007) "Nonparametric Discrete Choice Models with Unobserved Heterogeneity," Journal of
Business and Economic Statistics, forthcoming.
BRIESCH, R., P. CHINTAGUNTA, and R.L. MATZKIN (2002) “Semiparametric Estimation of Choice Brand Behavior,” Journal of the American
Statistical Association, Vol. 97, No. 460, Applications and Case Studies,
973-982.
BROWN B.W. (1983) "The Identification Problem in Systems Nonlinear
in the Variables," Econometrica, 51, 175-196.
CHEN, S., G.B. DAHL, and S. KAHN (2005): "Nonparametric Iden-
38
tification and Estimation of a Censored Location-Scale Regression Model,"
Journal of the American Statistical Association, Vol. 100, No. 469, Theory
and Methods.
CHERNOZHUKOV, V. AND C. HANSEN (2005): “An IV Model of
Quantile Treatment Effects,” Econometrica, 73, 245-261.
CHESHER, A. (2003): “Identification in Nonseparable Models,” Econometrica, Vol. 71, No. 5.
CHESHER, A. and J.M.C. SILVA SANTOS (2002): "Taste Variation in
Discrete Choice Models," Review of Economic Studies, 69, 147-168.
CHIAPPORI, P.-A. and I. KOMUNJER (2009): "On the Nonparametric
Identification of Multiple Choice Models," Discussion Paper, University of
California San Diego.
COSSLETT, S. R. (1983), "Distribution-free Maximum Likelihood Estimator of the Binary Choice Model," Econometrica, Vol. 51, No. 3, pp.
765-782.
FOX, J. and A. GANDHI (2009): "Identifying Heterogeneity in Economic
Choice Models," Discussion Paper, University of Chicago.
GAUTIER, E. and Y. KITAMURA (2007): "Nonparametric Estimation
in Random Coefficients Binary Choice Models," Discussion Paper, Yale.
HARDING, M.C. and J. HAUSMAN (2006): "Using a Laplace Approximation to Estimate the Random Coefficients Logit Model by Non-linear
Least Squares," Working Paper.
39
HAUSMAN, J.A. (1983) “Specification and Estimation of Simultaneous Equation Models,” in Handbook of Econometrics, Vol. I, edited by Z.
Griliches and M.D. Intriligator.
HAUSMAN, J. and D. WISE (1978): "A Conditional Probit Model for
Quality Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences," Econometrica, 46, pp. 403-426.
HECKMAN, J.J. (1974): "Shadow Prices, Market Wages, and Labor
Supply," Econometrica, Vol. 42, No. 4, pp. 679-694.
HECKMAN, J.J. and S. NAVARRO (2007): "Dynamic Discrete Choice
and Dynamic Treatment Effects," Journal of Econometrics, 136, 341-396.
HECKMAN, J.J. (1978): "Dummy Endogenous Variables in a Simultaneous Equation System," Econometrica, Vol. 46, No. 6, 931-958.
HECKMAN, J.J. and B.. SINGER (1982) "The Identification Problem in
Econometric Models for Duration Data," in W. Hildenbrand, ed. Advances
in Econometrics, pp. 39077. Cambridge: Cambridge University Press.
HECKMAN, J.J. and B.. SINGER (1984a) "The Identifiability of the
Proportional Hazard Model," Review of Economic Studies, 51, 2312-243.
HECKMAN, J.J. and B.. SINGER (1984b) "A Method of Minimizing the
Impact of Distributional Assumptions in Econometric Models for Duration
Data," Econometrica, The Identifiability of the Proportional Hazard Model,"
Econometrica, 52, 271-320.
40
HECKMAN J.J. and R.J. WILLIS (1977): "A Beta Logistic Model for
the Analysis of Sequential Labor Force Participation by Married Women,"
Journal of Political Economy, 85, 27-58.
HESS, S., D. BOLDUC, and J. POLAK (2005): "Random Covariance
Heterogeneity in Discrete Choice Models," Working Paper.
HODERLEIN, S. (2008): "Endogeneity in Semiparametric Binary Random Coefficient Models," Discussion Paper, Brown University.
HODERLEIN, S., J. KLEMELA, and E. MAMMEN (2008): "Analyzing
the Random Coefficient Model Nonparametrically," Working paper.
HONORE, B.E. (1990) "Simple Estimation of a Duration Model with
Unobserved Heterogeneity," Econometrica, 58, 453-473.
HONORE, B.E. and A. LEWBEL (2002) "Semiparametric Binary Choice
Panel Data Models Without Strictly Exogenou Regressors," Econometrica,
70, 2053-2063.
ICHIMURA, H. and T.S. THOMPSON (1998): "Maximum Likelihood
Estimation of a Binary Choice Model with Random Coefficients of Unknown
Distribution," Journal of Economics, 86(2), 269-95.
IMBENS, G.W. AND W.K. NEWEY (2003) “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity,”
mimeo, M.I.T.
IMBENS, G.W. AND W.K. NEWEY (2009) “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity,”
41
Econometrica, 77, 5.
KAHN, S. and A. LEWBEL (2007): "Weighted and Two Stage Least
Squares Estimation of Semiparametric Truncated Regression Models," Econometric Theory, 23, 309-347.
LANCASTER, T. (1979) “Econometric Methods for the Analysis of Unemployment” Econometrica, 47, 939-956.
LEE, L.-F. (1981): "Simultaneous Equations Models with Discrete and
Censored Variables," in C.F. Manski and D. McFadden, eds. Structural
Analysis of Discrete Data with Econometric Applications, pp. 346-364.
Cambridge, Mass: MIT Press.
LEWBEL, A. (1998): "Semiparametric Latent Variable Model Estimation with Endogenous or Mismeasured Regressors," Econometrica, 66, 105121.
LEWBEL, A. (2000), “Semiparametric Qualitative Response Model Estimation with Unknown Heteroskedasticity and Instrumental Variables," Journal of Econometrics, 97, 145-177.
LEWBEL, A. and O. LINTON (2002): "Nonparametric Censored and
Truncated Regression," Econometrica, Vol. 70, No. 2, 765-779.
LEWBEL, A., O. LINTON, and D. McFADDEN (2010): "Estimating
Features of a Distribution From Binomial Data," Journal of Econometrics,
LEWBEL, A. and R.L. MATZKIN (2009), “Identification of Limited Dependent Variable Models Using a Special Regressor with Limited Support,"
42
mimeo.
MADDALA, G. S. (1983): Limited Dependent and Qualitative Variables
in Econometrics, Cambridge: Cambridge University Press.
MANSKI, C.F. (1975), "Maximum Score Estimation of the Stochastic
Utility Model of Choice," Journal of Econometrics, 3, 205-228.
MANSKI, C.F. (1985) "Semiparametric Analysis of Discrete Response:
Asymptotic Properties of the Maximum Score Estimator," JoE, 27, 313-334.
MANSKI, C.F. (1987) "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," , 55, 357-362.
MATZKIN, R.L. (1992) "Nonparametric and Distribution-Free Estimation of the Binary Threshold Crossing and the Binary Choice Models,"
, 60, 239-270.
MATZKIN, R.L. (1993), "Nonparametric and Estimation of Polychotomous Choice Models," Journal of Econometrics, 58, 137-168.
MATZKIN, R.L. (1994) “Restrictions of Economic Theory in Nonparametric Methods,” in Handbook of Econometrics, Vol. IV, R.F. Engel and
D.L. McFadden, eds, Amsterdam: Elsevier, Ch. 42.
MATZKIN, R.L. (2003) "Nonparametric Estimation of Nonadditive Random Functions," Econometrica, 71, 1339-1375.
MATZKIN, R.L. (2004) "Unobservable Instruments," mimeo, Northwestern University.
43
MATZKIN, R.L. (2005) "Identification in Nonparametric Simultaneous
Equations Models," mimeo, Northwestern University.
MATZKIN, R.L. (2007a): "Heterogenous Choice," in Advances in Economics and Econometrics, Theory and Applications, Ninth World Congress
of the Econometric Society, ed. by R. Blundell, W. Newey, and T, Persson.
Cambridge University Press.
MATZKIN, R.L. (2007b): "Nonparametric Identification," in Handbook
of Econometrics, ed. by J.J. Heckman and E. Leamer, Vol. 6B, Elsevier.
MATZKIN, R.L. (2008) "Identification in Nonparametric Simultaneous
Equations Models," Econometrica, 76, 945-978.
MATZKIN, R.L. (2010) "Estimation of Nonparametric Models with Simultaneity," mimeo, UCLA.
NELSON, F. and L. OLSON (1978): "Specification and Estimation of a
Simultaneous Equation Model with Limited Dependent Variables, " International Economic Review, Vol. 19, No. 3, pp. 695-709.
NEWEY, W.K. (1985): "Semiparametric Estimation of Limited Dependent Variable Models with Endogenous Explanatory Variables," Annales de
l’INSEE, No. 59/60, pp. 219-237.
NEWEY, W. and J. POWELL (1989) “Instrumental Variables Estimation of Nonparametric Models,” mimeo.
NEWEY, W. and J. POWELL (2003) “Instrumental Variables Estima-
44
tion of Nonparametric Models,” Econometrica, 71(5), 1565-1578.
ROEHRIG, C.S. (1988) "Conditions for Identification in Nonparametric
and Parametric Models," Econometrica, 56:2, 433-447.
SMITH, R.J. and R.W. BLUNDELL (1989) "An Exogeneity Test for the
Simultaneous Equation Tobit Model with an Application to Labour Supply,"
Econometrica, 54, 679-685.
45