Three-Step CCP Estimation of Dynamic Programming Discrete Choice Models with Large State Space
Cheng Chou (University of Leicester)
Geert Ridder (University of Southern California)
March 30, 2017
Abstract
The existing estimation methods for structural dynamic discrete choice models mostly require knowing or estimating the state transition distributions. When some or all state variables are continuous and/or the dimension of the vector of state variables is moderately large, the estimation of the state transition distributions becomes difficult and has to rely on tight distributional assumptions. We show that the state transition distributions are in fact not necessary for identifying and estimating the flow utility functions when panel data and excluded variables are available. A state variable is called an excluded variable if it does not affect the flow utility but affects the choice probability through the future payoff of the current choice. We propose a new estimator of the flow utility functions that requires neither estimating or specifying the state transition law nor solving the agents' dynamic programming problems. The estimator can be applied both to infinite horizon stationary models and to general dynamic discrete choice models with time-varying flow utility functions and state transition laws.
JEL Classification: C14, C23, C35, J24
Keywords: Semiparametric estimation, dynamic discrete choice, continuous state variables,
large state space
1 Introduction
We extend the two-step Conditional Choice Probability (CCP) estimator of Hotz and Miller (1993, HM93 hereafter) to a three-step CCP estimator. We first explain HM93's two-step method to make our motivation clear. The CCP is a function of the difference between the expected payoffs of the different alternatives. In a dynamic programming discrete choice model, the expected payoff of an alternative equals the sum of the expected flow utility and the conditional valuation function of that alternative. Both are functions of the current observed state $s_t$ and choice $d_t$. The expected flow utility functions are usually parameterized. The difficulty lies in the conditional valuation function, which is the expected remaining lifetime utility from period $t$ onwards given the current state $s_t$ and choice $d_t$. The key observation in HM93 is that under certain conditions the conditional valuation functions are explicit functions of all future CCP, future expected flow utility functions, and future state transition distributions.¹ So the first point is that the CCP in period $t$ can be expressed as a function of all CCP, flow utility functions, and state transition distributions beyond period $t$. Moreover, the CCP in period $t$ of choosing alternative $j$ is also the conditional mean, given the observed state variables, of the dummy variable that equals 1 when alternative $j$ was chosen in period $t$. So the second point is that after parameterizing the flow utility functions and knowing all CCP and state transition distributions, we can set up moment conditions for the unknown parameters of the flow utility functions using the conditional mean interpretation of the CCP. Based on these two points, the first step of HM93's two-step estimator is to estimate the CCP and the state transition distributions nonparametrically. Plugging these estimates into the moment conditions defined by the CCP formula, the second step estimates the unknown parameters of the flow utility functions by the generalized method of moments (GMM).
The motivation for this paper is that when the dimension of the vector of observable state variables $s_t$ is even moderately large, the nonparametric estimation of the state transition distributions $F(s_{t+1} \mid s_t)$ in the first step of HM93 becomes difficult. The presence of continuous state variables only makes the task harder. The same difficulty appears in the other estimators following HM93, including Hotz, Miller, Sanders, and Smith (1994); Aguirregabiria and Mira (2002); Pesendorfer and Schmidt-Dengler (2008); Arcidiacono and Miller (2011); Srisuma and Linton (2012); and Arcidiacono and Miller (2016), whose implementation requires the state transition distributions. For a large state space, we have to grid the
¹ Two main conditions for this conclusion are that the flow utility functions are additive in the utility shocks, and that the utility shocks are serially uncorrelated. Arcidiacono and Miller (2011) address serially correlated unobserved heterogeneity by including finite types in the model.
state space coarsely, or impose various conditional independence restrictions and parametric specifications of the state transition distributions. The aim is to obtain the state transition distributions somehow.
The main point of this paper is that it is not necessary to estimate the state transition distributions in order to estimate the expected flow utility functions: HM93's original result can be used directly with panel data and excluded variables. A state variable is called an excluded variable if it does not affect the flow utility but affects future payoffs. Let $P_t(s_t)$ be the vector of CCP in period $t$ and $u(d_t, s_t; \theta_t)$ be the expected flow utility functions, known up to a finite-dimensional parameter $\theta_t$. HM93 show that given the observed state $s_t$, the expected optimal flow utility in period $t$, denoted by $u^o_t$, is an explicit function of the CCP and the expected flow utility functions at time $t$. Denote this function by
$$u^o_t\big(P_t(s_t), u(d_t, s_t; \theta_t)\big).$$
For example, if the utility shocks are independent and follow the type-1 extreme value distribution, it follows from HM93 that
$$u^o_t = \sum_{j \in A} \Pr(d_t = j \mid s_t)\big[u(d_t = j, s_t; \theta_t) - \ln \Pr(d_t = j \mid s_t)\big] + \gamma, \quad (1)$$
where $A$ is the choice set and $\gamma \approx 0.5772$ is Euler's constant. Under the assumptions of HM93, the conditional valuation function of choosing $d_t$ at time $t$ equals
$$\sum_{\tau = t+1}^{T^*} \beta^{\tau - t}\, \mathrm{E}\big(u^o_\tau(P_\tau(s_\tau), u(d_\tau, s_\tau; \theta_\tau)) \mid s_t, d_t\big), \quad (2)$$
where $\beta$ is the discount factor, and $T^* \le \infty$ is the last decision period. HM93 need the state transition distributions to evaluate the conditional expectations from period $t+1$ to $T^*$.
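For the type-1 extreme value case, formula (1) can be checked by simulation. The sketch below is our own illustration, not part of the paper's estimator; the flow utilities are made-up numbers. It draws Gumbel shocks, lets the agent pick the best alternative, and compares the simulated expected optimal flow utility with the closed form in eq. (1):

```python
import numpy as np

# Hypothetical flow utilities u(d = j, s; theta) at a fixed state s.
u = np.array([0.0, 0.7, -0.3])
gamma = 0.5772156649  # Euler's constant

# Closed form of eq. (1): the CCP are multinomial logit probabilities.
P = np.exp(u) / np.exp(u).sum()
u_opt = np.sum(P * (u - np.log(P))) + gamma

# Monte Carlo: draw type-1 EV shocks, choose the best alternative,
# and average the realized flow utility u(d*) + eps(d*).
rng = np.random.default_rng(0)
eps = rng.gumbel(size=(2_000_000, u.size))
d_star = np.argmax(u + eps, axis=1)
u_mc = np.mean(u[d_star] + eps[np.arange(len(d_star)), d_star])

print(u_opt, u_mc)  # the two numbers agree up to simulation error
```

The closed form also equals the familiar log-sum expression $\ln \sum_j e^{u_j} + \gamma$, which is what the simulation converges to.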
Hotz, Miller, Sanders, and Smith (1994) need the state transition distributions to simulate future states and choices in order to evaluate these conditional expectations. Our idea is that since $u^o_\tau$ can be written explicitly as a function of the CCP and flow utility functions, e.g. eq. (1), we can directly estimate the conditional expectation terms in eq. (2) by nonparametric regressions, after estimating the CCP and parameterizing the flow utility functions, provided we have panel data. More concretely, take eq. (1) as an example and assume $s_t$ is a scalar. Letting $u(d_t = j, s_t; \theta_t) = s_t \theta_t(j)$, to estimate the conditional valuation function of eq. (2) it is only necessary to estimate the terms
$$\mathrm{E}\big(\Pr(d_\tau = j \mid s_\tau)\, s_\tau \mid s_t, d_t\big) \quad \text{and} \quad \mathrm{E}\big(\Pr(d_\tau = j \mid s_\tau) \ln \Pr(d_\tau = j \mid s_\tau) \mid s_t, d_t\big) \quad (3)$$
for each period $\tau = t+1, \ldots, T^*$. Why would this help? It helps because estimating $F(s_{t+1} \mid s_t)$ involves a problem of dimension $2 \cdot \dim s_t$, while the nonparametric regressions, like eq. (3), involve
only dimension $\dim s_t + 1$. In addition, we have abundant resources for dealing with the curse of dimensionality in nonparametric regressions.
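To make the idea concrete, here is a minimal sketch of a nonparametric regression with a generated dependent variable. It is our own construction; the data generating process, the Gaussian kernel, and the bandwidths are all illustrative choices, not the paper's. The first regression estimates the CCP $\hat P(\cdot)$ from $(d_{t+1}, s_{t+1})$; the second regresses the generated variable $\hat P(s_{t+1})\, s_{t+1}$ on $(s_t, d_t)$, which is a one-dimensional problem here no matter how complicated $F(s_{t+1} \mid s_t)$ is:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Illustrative two-period panel with scalar state s_t and binary choice d_t.
s_t = rng.normal(size=n)
d_t = (0.8 * s_t + rng.logistic(size=n) > 0).astype(float)
s_t1 = 0.6 * s_t + 0.4 * d_t + rng.normal(scale=0.5, size=n)  # s_{t+1}
d_t1 = (0.8 * s_t1 + rng.logistic(size=n) > 0).astype(float)  # d_{t+1}

def nw(x_grid, x, y, h):
    """Nadaraya-Watson kernel regression of y on x, evaluated at x_grid."""
    w = np.exp(-0.5 * ((x_grid[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

# Step 1: estimate the CCP P(s) = Pr(d = 1 | s) from (d_{t+1}, s_{t+1}).
P_hat = nw(s_t1, s_t1, d_t1, h=0.2)

# Step 2: regress the *generated* dependent variable P_hat(s_{t+1}) * s_{t+1}
# on (s_t, d_t); the binary d_t simply splits the sample, so each regression
# is one-dimensional.
y_gen = P_hat * s_t1
m1 = nw(np.array([0.0]), s_t[d_t == 1], y_gen[d_t == 1], h=0.3)
m0 = nw(np.array([0.0]), s_t[d_t == 0], y_gen[d_t == 0], h=0.3)
print(m1[0], m0[0])  # estimates of E(P(s_{t+1}) s_{t+1} | s_t = 0, d_t = 1 or 0)
```

The same pattern handles the second, entropy-like term in eq. (3) by replacing the generated variable with $\hat P(s_{t+1}) \ln \hat P(s_{t+1})$.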
Careful readers may now raise two questions. First, for the infinite horizon stationary dynamic programming problem, in which $T^* = \infty$, do we have to estimate an infinite number of nonparametric regressions, which is infeasible given that any practical panel data have finitely many sampling periods? Second, for the finite horizon problem ($T^* < \infty$), do we need panel data covering the agents' entire decision horizon in order to estimate eq. (3) for each period $\tau = t+1, \ldots, T^*$? The answer is "no" to both questions. For the infinite horizon stationary problem, we require the panel data to cover two consecutive decision periods. For the finite horizon problem, we require the panel data to cover four consecutive decision periods. The two-period requirement for the infinite horizon stationary problem is easier to understand. For an infinite horizon stationary problem, neither the CCP nor the state transition distributions are time varying. The time invariant CCP can be estimated from single-period data, and two periods of data are enough for estimating the state transition distributions. Once the time invariant CCP and state transition distributions are known, in principle it is possible to calculate conditional expectations like eq. (3). The four-period requirement for the finite horizon problem is more subtle. In this paper, the identification of the model is achieved by using the Exclusion Restriction proposed by Chou (2016). The Exclusion Restriction says that there are excluded variables that do not affect the immediate payoff of the discrete choice but affect future payoffs. One useful conclusion of Chou (2016) is that the value function in the last sampling period of the panel data is nonparametrically identified when the panel data cover at least four consecutive decision periods. It is useful because we can write the conditional valuation function in eq. (2) for the finite horizon problem as follows,
$$\sum_{\tau = t+1}^{T-1} \beta^{\tau - t}\, \mathrm{E}\big(u^o_\tau(P_\tau(s_\tau), u(d_\tau, s_\tau; \theta_\tau)) \mid s_t, d_t\big) + \beta^{T - t}\, \mathrm{E}\big(\tilde{V}_T(s_T) \mid s_t, d_t\big),$$
where $T$ is the last decision period in the sample, and $\tilde{V}_T(s_T)$ is the value function in period $T$ after integrating out the unobserved utility shocks. The nonparametric regressions involved in the first summation term pose no problem. The second term is not a problem either, because we can express $\tilde{V}_T(s_T)$ as a series expansion whose coefficients are identifiable.
Our three-step extension of HM93 proceeds as follows. The first step is to estimate the CCP nonparametrically. The second step is to estimate the conditional valuation function by nonparametric regressions of generated dependent variables, which are the terms in the expression of $u^o_t(\hat{P}_t(s_t), u(d_t, s_t; \theta_t))$, on the current state and choice $(s_t, d_t)$. The third step is GMM estimation of the parameters $\theta_t$ of the flow utility functions. Although the exact moment conditions that we use in the third step differ from HM93, the essential difference between our three-step estimator and the original two-step procedure of HM93 is that we replace the nonparametric estimation of the state transition distributions in HM93 with nonparametric regressions with generated dependent variables. This can be complementary to the literature when the dimension of the state variables is large. The additional cost is that for the finite horizon problem we need excluded variables in order to identify the value function in the last sampling period. The excluded variables are also needed to identify the infinite horizon stationary dynamic programming discrete choice model.²
Since HM93, a series of CCP estimation papers (Altuğ and Miller, 1998; Arcidiacono and Miller, 2011, 2016) have used the concept of "finite dependence" to simplify the CCP estimator. Given the "finite dependence" property, such as a terminal or renewal action, the current choice does not alter the distribution of future states after a certain number of periods. Therefore, the conditional valuation function depends only on the CCP and flow utility functions a small number of periods ahead. This can substantially simplify the CCP estimation. The "finite dependence" property can also be used to simplify our three-step estimation, because we are essentially also estimating the conditional valuation functions. We do not pursue this direction in the present paper.
To derive the asymptotic variance of the proposed estimator, we derive general formulas for the asymptotic variance of three-step semiparametric M-estimators with generated dependent variables in the nonparametric regressions of the second step. It turns out that, unlike three-step semiparametric estimators with generated regressors (Hahn and Ridder, 2013), the sampling error of the first-step estimation always affects the influence function of three-step semiparametric estimators with generated dependent variables.

The rest of the paper is organized as follows. The dynamic discrete choice model is described in section 2. Section 3 and section 4 show the three-step estimation of the infinite horizon stationary model and the general nonstationary model, respectively. Section 5 is a numerical study of the proposed three-step CCP estimator. Section 6 concludes the paper. Some technical details are included in Appendices A, B, C, and D.
² One way to identify the infinite horizon stationary model without using the excluded variables is to impose the normalization that the flow utility function of a reference alternative is zero for all state values. This normalization is problematic for counterfactual analysis, as shown in our discussion after assumption 4. A parametric specification of the flow utility function might identify the model in some cases. For example, in their numerical example, Srisuma and Linton (2012) estimate an infinite horizon stationary model without using excluded variables or the normalization. They did not address the identification issue, though their Monte Carlo experiments suggest the parameters are identified.
2 Dynamic Programming Discrete Choice Model
First, we set up the model. Second, we describe the data required for our estimator. Third, we briefly discuss the identification and conclude with the moment conditions that are the basis of our three-step estimator.
2.1 The Model
We focus on the binary choice case. For period $t$, let $\Omega_t$ be the vector of state variables that could be relevant to the current and future choices or utilities apart from time itself. In each period $t$, an agent makes a binary choice $d_t \in A \equiv \{0, 1\}$ based on $\Omega_t$. The choice $d_t$ affects both the agent's flow utility in period $t$ and the distribution of the next period state variables $\Omega_{t+1}$. The vector $\Omega_t$ is completely observable to the agent in period $t$ but only partly observable to econometricians. Let $\Omega_t \equiv (x_t, z_t, \epsilon_t)$. Econometricians only observe $x_t$ and $z_t$; the vector $\epsilon_t$ denotes the unobservable utility shocks. Also denote by $s_t \equiv (x_t, z_t)$ the observable state variables. Assumption 1 states that $\Omega_t$ is a controlled first-order Markov process, which is standard in the literature. Assumption 2 makes two points: (i) the flow utility is additive in $\epsilon_t$; (ii) given $x_t$, $z_t$ does not affect the flow utility (the Exclusion Restriction). Additivity in the utility shocks is usually maintained in the literature, with the notable exception of Kristensen, Nesheim, and de Paula (2014). We will use the Exclusion Restriction to identify the model. Exclusion restrictions exist in the applied literature, e.g. Fang and Wang (2015) and Blundell, Costa Dias, Meghir, and Shaw (2016). We use the notation of Aguirregabiria and Mira's (2010) survey paper as much as possible. The first difference is that, due to the presence of the excluded state variables $z_t$, we use $\Omega_t \equiv (s_t, \epsilon_t)$ and $s_t \equiv (x_t, z_t)$ to denote the vector of all state variables and the vector of observable state variables, respectively.³
Assumption 1. $\Pr(\Omega_{t+1} \mid \Omega_t, d_t, \Omega_{t-1}, d_{t-1}, \ldots) = \Pr(\Omega_{t+1} \mid \Omega_t, d_t)$.
Assumption 2. The agent receives flow utility $U_t(d_t, \Omega_t)$ in period $t$. Letting $\epsilon_t \equiv (\epsilon_t(0), \epsilon_t(1))'$, assume
$$U_t(d_t, \Omega_t) = u_t(d_t, x_t) + \epsilon_t(d_t).$$
The flow utility $u_t(d_t, x_t)$ in assumption 2 has a subscript "$t$". This is somewhat uncommon in the literature. By adding the subscript "$t$", we allow the flow utility to depend on the decision period, which is "age" in many applications in labor economics.
³ Aguirregabiria and Mira (2010) used $s_t$ and $x_t$ to denote the vector of all state variables and the vector of observable state variables, respectively.
Let $T^* \le \infty$ be the last decision period. In each period $t$, the agent chooses a sequence of choices $\{d_t, \ldots, d_{T^*}\}$ to maximize the expected discounted remaining lifetime utility,
$$\sum_{\tau = t}^{T^*} \beta^{\tau - t}\, \mathrm{E}\big(U_\tau(d_\tau, \Omega_\tau) \mid d_t, \Omega_t\big),$$
where $0 \le \beta < 1$ is the discount factor.⁴ The agent's problem is a Markov decision process, which can be solved by dynamic programming. Let $V_t(\Omega_t)$ be the optimal value function, which solves Bellman's equation,
$$V_t(\Omega_t) = \max_{j \in A}\; u_t(d_t = j, x_t) + \epsilon_t(j) + \beta\, \mathrm{E}\big(V_{t+1}(\Omega_{t+1}) \mid s_t, \epsilon_t, d_t = j\big). \quad (4)$$
Accordingly, if the agent follows the optimal decision rule, the observed choice $d_t$ satisfies
$$d_t = \arg\max_{j \in A}\; u_t(d_t = j, x_t) + \beta\, \mathrm{E}\big(V_{t+1}(\Omega_{t+1}) \mid s_t, \epsilon_t, d_t = j\big) + \epsilon_t(j). \quad (5)$$
Equation (5) is similar to the data generating process of a static binary choice model, except for the additional term $\beta\, \mathrm{E}(V_{t+1}(\Omega_{t+1}) \mid s_t, \epsilon_t, d_t = j)$ (the conditional valuation function associated with choosing $j$ in period $t$, in HM93's terminology) in the "expected payoff" of alternative $j$. Without further restrictions, the conditional valuation function is non-separable from the unobserved $\epsilon_t$. To avoid dealing with non-separable models, we make the following assumption, which ensures
$$\mathrm{E}\big(V_{t+1}(\Omega_{t+1}) \mid s_t, \epsilon_t, d_t\big) = \mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t\big), \quad (6)$$
where
$$\tilde{V}_{t+1}(s_{t+1}) \equiv \mathrm{E}\big(V_{t+1}(s_{t+1}, \epsilon_{t+1}) \mid s_{t+1}\big). \quad (7)$$
Assumption 3. (i) $(s_{t+1}, \epsilon_{t+1}) \perp \epsilon_t \mid s_t, d_t$; (ii) $s_{t+1} \perp \epsilon_{t+1} \mid s_t, d_t$; (iii) $\epsilon_{t+1} \perp (s_t, d_t)$.
This assumption is also common in the literature. Equation (6) implies that the agent's expectation of her optimal future value depends only on the observable state variables $s_t$ and choice $d_t$. This could be restrictive in some applications; e.g., it excludes fixed effects that affect the flow utility or the state transition or both. In Magnac and Thesmar's (2002) research on the identification of the dynamic programming discrete choice model, they show the identification of the flow utility allowing for a fixed effect that could affect both the flow utility and the state transition. One of their conditions is that the flow utility and the conditional valuation function of one reference alternative equal zero. In our notation, taking "0" as
⁴ In general, we can allow for a time varying discount factor, $\beta_t$, but for simplicity of exposition we do not consider such an extension here.
the reference alternative, they assume $u_t(d_t = 0, x_t) = 0$ and $\mathrm{E}(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t = 0) = 0$.
Such a condition turns out to be too restrictive in some applications, as we discuss further after assumption 4.
Using the simplification of eq. (6), the observed discrete choice $d_t$ satisfies
$$d_t = 1\big(\epsilon_t(0) - \epsilon_t(1) < v_t(d_t = 1, s_t) - v_t(d_t = 0, s_t)\big),$$
where
$$v_t(d_t, s_t) = u_t(d_t, x_t) + \beta\, \mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t\big) \quad (8)$$
is the choice-specific value function minus the flow utility shock, in Aguirregabiria and Mira's (2010) terminology. Let $\tilde{\epsilon}_t \equiv \epsilon_t(0) - \epsilon_t(1)$ and $G(\cdot \mid s_t)$ be the cumulative distribution function (CDF) of $\tilde{\epsilon}_t$ given $s_t$. In terms of $G(\cdot \mid s_t)$, the CCP $P_t(s_t) = \Pr(d_t = 1 \mid s_t)$ is
$$\begin{aligned} P_t(s_t) &= G\big(v_t(d_t = 1, s_t) - v_t(d_t = 0, s_t) \mid s_t\big) \\ &= G\big(u_t(d_t = 1, x_t) - u_t(d_t = 0, x_t) + \beta\, \mathrm{E}(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t = 1) - \beta\, \mathrm{E}(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t = 0) \mid s_t\big). \quad (9) \end{aligned}$$
Our analysis heavily uses the difference of the flow utilities and the difference of the conditional expectations. Without compact notation, the discussion would be cumbersome, as the displayed equation above shows. So we define the following notation:
$$\tilde{u}_t(x_t) \equiv u_t(d_t = 1, x_t) - u_t(d_t = 0, x_t),$$
$$\tilde{\mathrm{E}}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t\big) \equiv \mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t = 1\big) - \mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t = 0\big).$$
The estimation of $u_t(d_t, x_t)$ is equivalent to the estimation of $\tilde{u}_t(x_t)$ and $u_t(d_t = 0, x_t)$. When the CDF $G(\cdot \mid s_t)$ is unknown, even the difference $v_t(d_t = 1, s_t) - v_t(d_t = 0, s_t)$ cannot be identified in general, let alone the flow utility functions $u_t(d_t, x_t)$.⁵ Even when the CDF $G(\cdot \mid s_t)$ is known, the absolute level of $u_t(d_t, x_t)$ cannot be identified. Taking $\beta = 0$ for example, for any constant $c \in \mathbb{R}$,
$$P_t(s_t) = G\big(u_t(d_t = 1, x_t) - u_t(d_t = 0, x_t) \mid s_t\big) = G\big([u_t(d_t = 1, x_t) + c] - [u_t(d_t = 0, x_t) + c] \mid s_t\big).$$
The following assumption addresses these concerns.
⁵ With additional assumptions, the CDF $G(\cdot \mid s_t)$ is identifiable, as shown by Aguirregabiria (2010, page 205). His argument is based on the following observation. Apart from the presence of the conditional valuation function $\beta\, \mathrm{E}(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t)$, the CCP formula eq. (9) is similar to the CCP in the binary static discrete choice model studied by Matzkin (1992), in which the CDF $G(\cdot \mid s_t)$ can be nonparametrically identified with "special regressors" and a zero-median assumption. In this paper, we focus on the estimation when $G(\cdot \mid s_t)$ is known.
Assumption 4. (i) Assume $s_t \perp \tilde{\epsilon}_t$, and $\tilde{\epsilon}_t$ is a continuous random variable with support the real line. Letting $G(\cdot)$ be the CDF of $\tilde{\epsilon}_t$, assume $G(\cdot)$ is a known strictly increasing function, and $\mathrm{E}(\epsilon_t(0)) = 0$. (ii) For every period $t$, let $u_t(d_t = 0, x_t = x^*) = 0$ for some $x^*$.
The independence assumption $s_t \perp \tilde{\epsilon}_t$ is not particularly strong and is commonly maintained in the literature, given that we have to assume the conditional CDF of $\tilde{\epsilon}_t$ given $s_t$ is known anyway. The normalization in assumption 4.(ii) differs from the commonly used normalization that sets one flow utility function to zero,
$$u_t(d_t = 0, x_t = x) = 0, \quad \text{for all } t \text{ and all } x. \quad (10)$$
The normalization of eq. (10) implies that the flow utility of alternative 0 in each period does not vary with the values of the state variable $x_t$, and, more importantly, with the distribution of $x_t$. This has serious consequences. Because the future value of a current choice is the discounted expected sum of all future optimal flow utilities obtained by following the optimal strategy, prohibiting $u_t(d_t = 0, x_t)$ from varying with $x_t$ in each period $t$ greatly restricts how the future value of the current choice changes with the distribution of $(x_{t+1}, x_{t+2}, \ldots)$ given $x_t$ and $d_t$. When the counterfactual policy changes the state transition distributions, it is known that the normalization of eq. (10) is not innocuous, whereas the normalization of assumption 4.(ii) is mostly harmless for making counterfactual policy predictions (Norets and Tang, 2014; Kalouptsidi, Scott, and Souza-Rodrigues, 2015; Chou, 2016). With the Exclusion Restriction, Chou (2016) shows that the flow utility functions are nonparametrically identifiable without imposing the "strong" normalization of eq. (10).
2.2 Data and Structural Parameters
There are $N$ agents in the data. For each agent $i = 1, \ldots, N$, we observe her decisions $d_{it}$ and observable state variables $s_{it}$ from decision period 1 to $T \le T^*$. Taking the female labor force participation model as an example, the decision period is "age", and $d_{it}$ indicates whether woman $i$ is in the labor market at age $t$. In female labor supply studies (e.g. Eckstein and Wolpin, 1989; Altuğ and Miller, 1998; Blundell, Costa Dias, Meghir, and Shaw, 2016), some common state variables in $s_{it}$ include woman $i$'s education, accumulated assets, working experience, her family background, her partner's wage (which would be zero if she is single), education, and working experience, and the number of dependent children in the household. The point is that $s_{it}$ usually has a large dimension and includes both continuous (partner's wage) and discrete (education) variables. These features make the direct estimation of the state transition distribution hard. It should be remarked that the first decision period 1 in the sample need not be the initial period of her dynamic programming problem, nor does the last period $T$ in the sample need to correspond to the terminal decision period $T^*$.
The structural parameters of this model include the flow utility functions $u_t(d_t, x_t)$, the discount factor $\beta$, and the state transition distributions $F_{t+1}(s_{t+1} \mid s_t, d_t)$ in each period $t$. The state transition distributions are identified from the data. By adding a time subscript to the state transition distributions, we allow the distribution to vary across decision periods. This can be useful in applications because, for example, the data show that mean earnings are hump shaped over the working lifetime (see e.g. Heckman, Lochner, and Todd, 2003; Huggett, Ventura, and Yaron, 2011; Blundell, Costa Dias, Meghir, and Shaw, 2016).
2.3 Identification from Linear Moment Conditions
The identification of the structural parameters (except for the state transition distributions, which are directly identified from the data) is based on the following equations,
$$q(P_t(s_t)) = v_t(d_t = 1, s_t) - v_t(d_t = 0, s_t) = \tilde{u}_t(x_t) + \beta\, \tilde{\mathrm{E}}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t\big), \quad (11a)$$
$$e(P_t(s_t)) = \tilde{V}_t(s_t) - u_t(d_t = 0, x_t) - \beta\, \mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t = 0\big), \quad (11b)$$
for every period $t$ in the data. Our estimator will be based on an alternative expression of the above equations, which is eq. (14) at the end of this section. Here $q(P)$ and $e(P)$ are constructed from $G(\cdot)$, the known strictly increasing CDF of $\tilde{\epsilon}_t = \epsilon_t(0) - \epsilon_t(1)$:
$$q(P) \equiv G^{-1}(P), \qquad e(P) \equiv \int \max\big[0, G^{-1}(P) - t\big]\, \mathrm{d}G(t), \quad (12)$$
for $P \in [0, 1]$. Clearly, $q(P)$ is the quantile function. If we substitute $P$ in the definition of $e(P)$ with the CCP $P_t(s_t)$, then $e(P_t(s_t))$, viewed as a function of $s_t$, is the so-called McFadden social surplus function, that is, the expected lifetime utility following the optimal policy minus the expected lifetime utility of choosing alternative zero in period $t$ and following the optimal policy thereafter. The fact that the social surplus function depends only on the CCP is known in the literature (see e.g. pages 501–502 of HM93 and Proposition 2 of Aguirregabiria, 2010).

Equation (11a) follows from inverting the CDF $G(\cdot)$ in eq. (9) and the definition of $q(P)$ in eq. (12). It is a special case of the Hotz and Miller inversion result (HM93, Proposition
1), which establishes the invertibility for multinomial discrete choice. Though not explicitly presented in HM93, eq. (11b) can be derived using arguments similar to those on page 501 of HM93. It follows from integrating out $\epsilon_t$ on both sides of Bellman's eq. (4). First, in terms of $v_t(d_t, s_t)$, eq. (4) is rewritten as follows,
$$V_t(s_t, \epsilon_t) = \max_{j \in A}\; v_t(d_t = j, s_t) + \epsilon_t(j).$$
Integrating out $\epsilon_t$ on both sides of the above display gives
$$\begin{aligned} \tilde{V}_t(s_t) &= \int \max\big(v_t(d_t = 0, s_t) + \epsilon_t(0),\, v_t(d_t = 1, s_t) + \epsilon_t(1)\big)\, \mathrm{d}F(\epsilon_t(0), \epsilon_t(1)) \\ &= v_t(d_t = 0, s_t) + \int \max\big(0,\, v_t(d_t = 1, s_t) - v_t(d_t = 0, s_t) - \tilde{\epsilon}_t\big)\, \mathrm{d}G(\tilde{\epsilon}_t). \end{aligned}$$
Here we used eq. (7) and $\mathrm{E}(\epsilon_t(0)) = 0$ from assumption 4. It then follows from
$$v_t(d_t = 1, s_t) - v_t(d_t = 0, s_t) = G^{-1}(P_t(s_t))$$
that
$$\tilde{V}_t(s_t) = v_t(d_t = 0, s_t) + \int \max\big(0,\, G^{-1}(P_t(s_t)) - \tilde{\epsilon}\big)\, \mathrm{d}G(\tilde{\epsilon}) = v_t(d_t = 0, s_t) + e(P_t(s_t)).$$
The above display gives eq. (11b) after substituting $v_t(d_t = 0, s_t)$ by its definition in eq. (8).
In addition, the two functionals of the CCP, $q(P_t(s_t))$ and $e(P_t(s_t))$, are related. Letting $d^o_t$ be the optimal choice at time $t$ given $s_t$, we have
$$\mathrm{E}\big(\epsilon_t(d^o_t) \mid s_t\big) = e(P_t(s_t)) - P_t(s_t)\, q(P_t(s_t)). \quad (13)$$
This can be verified by calculating $\tilde{V}_t(s_t)$ in the following way,
$$\begin{aligned} \tilde{V}_t(s_t) &= \mathrm{E}\big(v_t(d^o_t, s_t) + \epsilon_t(d^o_t) \mid s_t\big) \\ &= P_t(s_t)\, v_t(d_t = 1, s_t) + (1 - P_t(s_t))\, v_t(d_t = 0, s_t) + \mathrm{E}(\epsilon_t(d^o_t) \mid s_t) \\ &= v_t(d_t = 0, s_t) + P_t(s_t)\big[v_t(d_t = 1, s_t) - v_t(d_t = 0, s_t)\big] + \mathrm{E}(\epsilon_t(d^o_t) \mid s_t) \\ &= v_t(d_t = 0, s_t) + P_t(s_t)\, q(P_t(s_t)) + \mathrm{E}(\epsilon_t(d^o_t) \mid s_t). \end{aligned}$$
Comparing the above display with eq. (11b), we obtain eq. (13). In the rest of the paper, we work with the term $e(P_t(s_t)) - P_t(s_t)\, q(P_t(s_t))$ as a whole. So we define
$$\psi(P(s)) \equiv e(P(s)) - P(s)\, q(P(s)).$$
Equation (11) can be viewed as linear moment conditions for the structural parameters. We observe the CCP, hence $q(P_t(s_t))$ and $e(P_t(s_t))$, from the data. Given the discount factor $\beta$, the observable $q(P_t(s_t))$ and $e(P_t(s_t))$ are linear in the unknown flow utility and value functions. Given the flow utility and value functions, $q(P_t(s_t))$ and $e(P_t(s_t))$ are linear in $\beta$. Chou (2016) shows that with the Exclusion Restriction and certain rank conditions, we can nonparametrically identify (i) the value function $\tilde{V}_t(s_t)$; (ii) the difference between the flow utility functions $\tilde{u}_t(x_t)$; (iii) the flow utility function of alternative zero $u_t(d_t = 0, x_t)$. Below, we assume that the model is identified and focus on the estimation, though the presentation of our estimator contains at least hints about the identification, in remark 1 and remark 2.

In the rest of this section, we transform eq. (11) into eq. (14), which turns out to be more useful for estimation. Lemma 1 removes the event $d_t = 0$ from the conditional expectation $\mathrm{E}(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t = 0)$ in eq. (11b), so that law of iterated expectations arguments, like $\mathrm{E}\big(\mathrm{E}(v_{t+\tau} \mid s_{t+\tau-1}) \mid s_t\big) = \mathrm{E}(v_{t+\tau} \mid s_t)$, can be used.
Lemma 1. We have
$$\mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t, d_t = 0\big) = \mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t\big) - \beta^{-1} P_t(s_t)\big[q(P_t(s_t)) - \tilde{u}_t(x_t)\big].$$

Proof. It follows from the law of total probability that (we suppress $s_{t+1}$ in $\tilde{V}_{t+1}$ and $s_t$ in $P_t(s_t)$)
$$\mathrm{E}(\tilde{V}_{t+1} \mid s_t) = (1 - P_t)\, \mathrm{E}(\tilde{V}_{t+1} \mid s_t, d_t = 0) + P_t\, \mathrm{E}(\tilde{V}_{t+1} \mid s_t, d_t = 1) = \mathrm{E}(\tilde{V}_{t+1} \mid s_t, d_t = 0) + P_t\, \tilde{\mathrm{E}}(\tilde{V}_{t+1} \mid s_t).$$
So we have
$$\mathrm{E}(\tilde{V}_{t+1} \mid s_t, d_t = 0) = \mathrm{E}(\tilde{V}_{t+1} \mid s_t) - P_t\, \tilde{\mathrm{E}}(\tilde{V}_{t+1} \mid s_t) = \mathrm{E}(\tilde{V}_{t+1} \mid s_t) - \beta^{-1} P_t\big[q(P_t) - \tilde{u}_t(x_t)\big],$$
where the second equality follows from eq. (11a). ∎
Using lemma 1, we can rewrite eq. (11b) as follows,
$$e(P_t(s_t)) = \tilde{V}_t(s_t) - u_t(d_t = 0, x_t) - \beta\, \mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t\big) + P_t(s_t)\big[q(P_t(s_t)) - \tilde{u}_t(x_t)\big],$$
hence
$$\tilde{V}_t(s_t) = \big[u_t(d_t = 0, x_t) + P_t(s_t)\, \tilde{u}_t(x_t) + \psi(P_t(s_t))\big] + \beta\, \mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t\big).$$
The term within the brackets is indeed the expected optimal flow utility in period $t$,
$$u^o_t(s_t) = \mathrm{E}\big(u_t(d^o_t, x_t) + \epsilon_t(d^o_t) \mid s_t\big),$$
because
$$\mathrm{E}\big(u_t(d^o_t, x_t) \mid s_t\big) = P_t(s_t)\, u_t(d_t = 1, x_t) + (1 - P_t(s_t))\, u_t(d_t = 0, x_t) = u_t(d_t = 0, x_t) + P_t(s_t)\, \tilde{u}_t(x_t),$$
and eq. (13).
In summary, we have
$$q(P_t(s_t)) = \tilde{u}_t(x_t) + \beta\, \tilde{\mathrm{E}}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t\big), \quad (14a)$$
$$\tilde{V}_t(s_t) = u^o_t(s_t) + \beta\, \mathrm{E}\big(\tilde{V}_{t+1}(s_{t+1}) \mid s_t\big), \quad (14b)$$
$$u^o_t(s_t) \equiv u_t(d_t = 0, x_t) + P_t(s_t)\, \tilde{u}_t(x_t) + \psi(P_t(s_t)), \quad (14c)$$
for every decision period $t$. Below, we show how to estimate $u_t(d_t, x_t)$ using these equations.
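Equations (14a)-(14c) can be verified numerically on a toy model. The sketch below is entirely our own illustration: a two-state, binary-choice stationary model with logit shocks and made-up utilities and transition matrices (so the $t$ subscripts drop out). It solves the dynamic program by value function iteration and then checks that the implied CCP satisfy eq. (14a) and (14b), using the logistic closed form $\psi(P) = \gamma - P\ln P - (1-P)\ln(1-P)$:

```python
import numpy as np

gamma = 0.5772156649
beta = 0.9

# Hypothetical primitives: u[d, s] and transitions F[d, s, s'] = Pr(s' | s, d).
u = np.array([[0.0, 0.2],
              [1.0, -0.5]])
F = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.2, 0.8], [0.5, 0.5]]])

# Solve the stationary model by value function iteration on V_tilde(s).
V = np.zeros(2)
for _ in range(5000):
    v = u + beta * F @ V                           # choice-specific values v(d, s)
    V_new = np.log(np.exp(v).sum(axis=0)) + gamma  # E max over logit shocks
    if np.max(np.abs(V_new - V)) < 1e-13:
        V = V_new
        break
    V = V_new

v = u + beta * F @ V
P = np.exp(v[1]) / np.exp(v).sum(axis=0)  # CCP P(s) = Pr(d = 1 | s)

u_tilde = u[1] - u[0]
q = np.log(P / (1 - P))
psi = gamma - P * np.log(P) - (1 - P) * np.log(1 - P)
EV = F @ V                                # EV[d, s] = E(V_tilde(s') | s, d)

lhs_14a = q
rhs_14a = u_tilde + beta * (EV[1] - EV[0])               # eq. (14a)

u_opt = u[0] + P * u_tilde + psi                         # eq. (14c)
rhs_14b = u_opt + beta * ((1 - P) * EV[0] + P * EV[1])   # eq. (14b)

print(np.max(np.abs(lhs_14a - rhs_14a)), np.max(np.abs(V - rhs_14b)))
```

Both residuals are at machine precision; note that eq. (14b) holds with the unconditional expectation $\mathrm{E}(\tilde V(s') \mid s)$, which averages the choice-conditional expectations with the CCP weights, as in lemma 1.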
3 Estimation of Infinite Horizon Stationary Markov Decision Processes
We first present our three-step estimator for the infinite horizon stationary dynamic programming discrete choice model, because of its simple structure and its importance in the literature. Since the structural parameters of an infinite horizon stationary model are all time invariant, we omit $t$ from the CCP, flow utility, and value functions below.

The plan for this section is the following. First, we derive a simpler moment condition for the flow utility functions from eq. (14). This simpler moment condition, eq. (23), resembles a linear regression, but it has an infinite number of unknown conditional expectations in both the dependent and independent variables. Second, we describe our three-step semiparametric estimation method, assuming that we can estimate the infinite number of unknown conditional expectations with finite-period panel data. Third, we show how to estimate the infinite number of conditional expectations with two-period panel data by providing an estimable approximation formula for all conditional expectations and the order of the approximation error. Fourth, we derive the influence function of our three-step semiparametric estimator.
3.1 Linear moment equation

Starting with eq. (14b), we have

V̂(s_t) − β E(V̂(s_{t+1}) ∣ s_t) = σ^o(s_t).   (15)
By the recursive structure of eq. (15), we expect to express V̂(s_t) in terms of E(σ^o_j(s_j) ∣ s_t) with j = t, t + 1, …. Lemma 2 verifies this expectation.

Lemma 2. If assumptions 1–4 hold, we have V̂(s_t) = Σ_{j=0}^∞ β^j E(σ^o_{t+j}(s_{t+j}) ∣ s_t).
Proof. Let F(s′ ∣ s) be the time invariant conditional CDF of s_{t+1} given s_t. In the proof, we consider the case that the conditional CDF F(s′ ∣ s) of s_{t+1} ∣ s_t is absolutely continuous for every s, and let f(s′ ∣ s) be the conditional probability density function (PDF). This entails no loss of generality, since one can redefine the PDF with respect to other measures, depending on the type of s_t.

Let 𝒱 be the Banach space of functions V̂(s) with norm ‖V̂‖ = sup_{s∈𝒮} |V̂(s)|, where 𝒮 is the domain of the value function. We can define a linear operator L such that

[L(V̂)](s) = E(V̂(s′) ∣ s) = ∫ V̂(s′) F(ds′ ∣ s).
The left-hand side of eq. (15) equals [(I − βL)(V̂)](s_t), where I denotes the identity operator.

First, we show I − βL is invertible. Because L is a linear integral operator whose kernel is the conditional PDF f(s′ ∣ s), we know that

‖L‖ = sup_{V̂∈𝒱, ‖V̂‖=1} ‖[L(V̂)](s)‖ = sup_{V̂∈𝒱, ‖V̂‖=1} ‖∫ V̂(s′) f(s′ ∣ s) ds′‖ = sup_{s∈𝒮} ∫ |f(s′ ∣ s)| ds′ = sup_{s∈𝒮} ∫ f(s′ ∣ s) ds′ = 1.

We then know that βL, as a linear operator with [βL(V̂)](s_t) = β E(V̂(s_{t+1}) ∣ s_t), has norm β < 1. It then follows from the geometric series theorem for linear operators (e.g. see Helmberg, 2008, Theorem 4.23.3) that I − βL is invertible. Hence, we conclude that V̂(s_t) = [(I − βL)^{−1}(σ^o)](s_t).

Second, we have (I − βL)^{−1} = Σ_{j=0}^∞ (βL)^j from the geometric series theorem, hence

V̂(s_t) = Σ_{j=0}^∞ β^j [L^j(σ^o(s))](s_t).
Third, we show that

[L^j(σ^o(s))](s_t) = E(σ^o_{t+j}(s_{t+j}) ∣ s_t),  for j = 0, 1, 2, …,   (16)

by induction. For j = 0, because L^0 = I and E(σ^o_t(s_t) ∣ s_t) = σ^o_t(s_t), the above display holds. Suppose the above display holds for j = J; we need to show that it holds for j = J + 1.
To see this, we have

[L^{J+1}(σ^o(s))](s_t) = [L(L^J(σ^o(s)))](s_t)
= ∫ [L^J(σ^o(s))](s_{t+1}) f(s_{t+1} ∣ s_t) d s_{t+1}
= ∫∫ σ^o_{t+1+J}(s_{t+1+J}) f(s_{t+1+J} ∣ s_{t+1}) d s_{t+1+J} f(s_{t+1} ∣ s_t) d s_{t+1}.   (17)

The last line used [L^J(σ^o(s))](s_{t+1}) = E(σ^o_{t+1+J}(s_{t+1+J}) ∣ s_{t+1}). By the Markov property, we have f(s_{t+1+J} ∣ s_{t+1}) = f(s_{t+1+J} ∣ s_{t+1}, s_t), hence

f(s_{t+1+J} ∣ s_{t+1}) f(s_{t+1} ∣ s_t) = f(s_{t+1+J}, s_{t+1} ∣ s_t).

As a result,

eq. (17) = ∫∫ σ^o_{t+1+J}(s_{t+1+J}) f(s_{t+1+J}, s_{t+1} ∣ s_t) d s_{t+1+J} d s_{t+1}
= E(σ^o_{t+1+J}(s_{t+1+J}) ∣ s_t).
By induction, we conclude that eq. (16) is true.

Last, given eq. (16) and V̂(s_t) = Σ_{j=0}^∞ β^j [L^j(σ^o(s))](s_t), the statement is proved. □
Expressing the integrated value function V̂(s_t) as a sum of discounted expected future optimal flow utilities given the current state s_t is not an innovation relative to the literature. For discrete state space, HM93, Hotz, Miller, Sanders, and Smith (1994) and Miller (1997) all write the discounted conditional valuation function β E(V̂(s_{t+1}) ∣ s_t, d_t = d) as

Σ_{j=1}^∞ β^j E(σ^o(s_{t+j}) ∣ s_t, d_t = d),   (18)
which follows from the expression for V̂(s_{t+1}) in our lemma 2. The existing CCP estimators in discrete state space proceed by expressing the conditional expectations in eq. (18) as the product of the state transition probability matrices and the optimal flow utilities in each state. More concretely, for discrete state space, let F denote the state transition probability matrix from s_t to s_{t+1}, which is constant across time for the stationary problem. Let F_d be the conditional state transition probability matrix from s_t to s_{t+1} given d_t = d. Finally, let u⃗^o denote the vector of the optimal flow utility in each state of the state space. HM93 and other CCP estimators estimate eq. (18) by

β F_d Σ_{j=0}^∞ β^j F^j u⃗^o = β F_d (I − βF)^{−1} u⃗^o.
For a small state space, the estimation of the transition probability matrices is straightforward. For a large state space, the estimation requires various restrictions, which are not always easy to formulate and justify. Of course, the above formula breaks down when s_t contains continuous state variables. Apart from gridding and parametric specification, another remedy for continuous state variables is the approach of Srisuma and Linton (2012). Their estimator essentially views eq. (15) as a Fredholm integral equation of the second kind with unknown kernel F(s′ ∣ s), which is nonparametrically estimated in their paper. Still, their method has to estimate the state transition probabilities.

Our departure from the literature, and our innovation, is that we estimate the conditional expectation terms in eq. (18) by nonparametric regressions. Consequently, we do not need to restrict the state space to be discrete. To see why and how we estimate eq. (18) by nonparametric regressions, we first derive a linear moment condition for the flow utility functions.
We now derive the linear moment equation, eq. (23). Equation (14a) for the infinite horizon stationary problem reads

ψ(p(s_t)) = ũ(x_t) + β Ẽ(V̂(s_{t+1}) ∣ s_t).

Replacing V̂(s_{t+1}) in the above display with its series expression in lemma 2 and applying

Ẽ(E(σ^o_{t+j}(s_{t+j}) ∣ s_{t+1}) ∣ s_t) = Ẽ(σ^o_{t+j}(s_{t+j}) ∣ s_t),

which follows from the Markov property of s_t, we have the moment condition

ψ(p(s_t)) = ũ(x_t) + Σ_{j=1}^∞ β^j Ẽ(σ^o_{t+j}(s_{t+j}) ∣ s_t),   (19)

where

σ^o_t(s_t) = u(d_t = 0, x_t) + p(s_t) ũ(x_t) + ψ(p(s_t)).
For notational simplicity, let

ũ(x_t) = x′_t δ and u(d_t = 0, x_t) = x′_t α.   (20)

The flow utility functions could also be estimated nonparametrically if we wrote ũ(x_t) and u(d_t = 0, x_t) as series expansions. We are interested in estimating θ ≡ (δ′, α′)′.

It follows from eq. (20) that eq. (19) becomes

ψ(p(s_t)) = x′_t δ − Σ_{j=1}^∞ β^j h_{1j}(s_t) + Σ_{j=1}^∞ β^j h_{2j}(s_t)′ α + Σ_{j=1}^∞ β^j h_{3j}(s_t)′ δ,   (21)

where

h_{1j}(s_t) ≡ Ẽ(ψ(p(s_{t+j})) ∣ s_t),
h_{2j}(s_t) ≡ Ẽ(x_{t+j} ∣ s_t),
h_{3j}(s_t) ≡ Ẽ(p(s_{t+j}) x_{t+j} ∣ s_t).
โฒ
Suppose we have data (๐๐๐ก , ๐ฅโฒ๐๐ก , ๐ง๐๐ก
) for ๐ = 1, โฆ , ๐, ๐ก = 1, โฆ , ๐ โฅ 2 and a known discount
factor ๐ฝ. For ๐ = 1, โฆ , ๐, letting
โ
๐ฆ๐๐ก โก ๐(๐(๐ ๐๐ก )) + โ ๐ฝ ๐ โ1๐ (๐ ๐๐ก ),
(22a)
๐=1
โ
โ
โฒ
๐๐๐ก
โก (๐ฅโฒ๐๐ก + โ ๐ฝ ๐ โ3๐ (๐ ๐๐ก )โฒ , โ ๐ฝ ๐ โ2๐ (๐ ๐๐ก )โฒ ),
๐=1
(22b)
๐=1
eq. (21) is written as the following linear equation of ๐,
โฒ
๐ฆ๐๐ก = ๐๐๐ก
๐.
(23)
Letting Y_t = (y_{1t}, …, y_{nt})′, Y = (Y′_1, …, Y′_T)′, W_t = (w_{1t}, …, w_{nt})′, and W = (W′_1, …, W′_T)′, we have Y = Wθ, or

θ = (W′W)^{−1}(W′Y).
3.2 Three-step estimator
To estimate θ, we form the sample analogs of W and Y. First, we approximate y_it and w_it by truncating the infinite series in eq. (21). We will justify the truncation later, but given the presence of the discount factor β < 1, it should not be a surprise that we can truncate the series for estimation. Define

y_{it,J} ≡ ψ(p(s_it)) + Σ_{j=1}^J β^j h_{1j}(s_it),   (24a)
w′_{it,J} ≡ (x′_it + Σ_{j=1}^J β^j h_{3j}(s_it)′, Σ_{j=1}^J β^j h_{2j}(s_it)′).   (24b)

The notation Y_{t,J}, Y_J, W_{t,J}, and W_J is then defined similarly from y_{it,J} and w_{it,J}. It will be helpful later, for calculating the influence functions, to denote h_{Jm}(s_t) ≡ (h_{m1}(s_t), …, h_{mJ}(s_t))′ for m = 1, 2, 3 and h_J(s_t)′ ≡ (h_{J1}(s_t)′, h_{J2}(s_t)′, h_{J3}(s_t)′). Our estimator proceeds in three steps.
Step 1: Estimate the CCP E(d_it ∣ s_it). Let p̂(s) be the CCP estimator.

Step 2: Estimate ψ(p(s)) and h_{mj}(s) for m = 1, 2, 3 and j = 1, …, J. Let ψ(p̂(s)) be the estimator of ψ(p(s)). Taking the estimation of h_{3j}(s) = Ẽ(p(s_{t+j}) x_{t+j} ∣ s_t) as an example, we need to estimate

E(p(s_{t+j}) x_{t+j} ∣ s_t, d_t = d),

for d = 0, 1. When t + j ≤ T, these conditional expectations can be estimated by nonparametric regression of p̂(s_{i,t+j}) x_{i,t+j} on s_it and d_it. If t + j > T, we can still estimate these conditional means by using the stationarity property. We will explain the estimation of h_{mj} when t + j > T in detail after completing the procedure. For m = 1, 2, 3, let ĥ_{mj}(s) be the estimator of h_{mj}(s).

Step 3: Then ŷ_{it,J} and ŵ_{it,J}, hence Ŷ_J and Ŵ_J, are constructed by replacing the unknown p and h_{mj} with their respective estimates. We have the estimator

θ̂_J = (Ŵ′_J Ŵ_J)^{−1}(Ŵ′_J Ŷ_J).
We want to emphasize that the implementation of the entire estimation procedure involves only a certain number of nonparametric regressions and one ordinary linear regression. The number of nonparametric regressions depends linearly on the dimension of x_t. So we claim that the numerical implementation is not difficult, and the computational burden grows linearly in the dimension of the state variables.
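The three steps above can be sketched in a few lines. Everything in this sketch is invented for illustration: the data-generating process, the Nadaraya-Watson helper `nw`, the log-odds stand-in `psi_fn` for the paper's ψ(·), and the truncation at J = 1 (with T = 2 only the j = 1 terms are directly estimable from the panel; longer horizons use the section 3.3 approximation).

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 2000, 0.8
x1 = rng.uniform(0, 1, n)
z = rng.uniform(0.1, 0.9, n)
p_true = 1 / (1 + np.exp(-(x1 + z - 1)))            # invented CCP
d1 = (rng.uniform(size=n) < p_true).astype(float)
x2 = np.clip(0.5 * x1 + 0.3 * d1 + 0.2 * rng.uniform(size=n), 0, 1)

def nw(y, s, s0, h=0.1):
    """Nadaraya-Watson regression of y on s, evaluated at rows of s0."""
    dist2 = ((s0[:, None, :] - s[None, :, :]) ** 2).sum(-1)
    k = np.exp(-dist2 / (2 * h**2))
    return (k @ y) / k.sum(1)

def psi_fn(p):
    """Log-odds transform; a stand-in for the paper's psi(.)."""
    p = np.clip(p, 1e-3, 1 - 1e-3)
    return np.log(p / (1 - p))

s1 = np.column_stack([x1, z])
# Step 1: CCP p-hat(s) = E(d | s).
p_hat = nw(d1, s1, s1)
# Step 2: h_{m1}(s) as differences of the two choice-conditional regressions.
def h_tilde(target):
    on1 = nw(target[d1 == 1], s1[d1 == 1], s1)
    on0 = nw(target[d1 == 0], s1[d1 == 0], s1)
    return on1 - on0
p2 = nw(d1, s1, np.column_stack([x2, z]))           # CCP at period-2 states
h1 = h_tilde(psi_fn(p2))
h2 = h_tilde(x2)
h3 = h_tilde(p2 * x2)
# Step 3: one ordinary linear regression y = w' theta.
y = psi_fn(p_hat) + beta * h1
w = np.column_stack([x1 + beta * h3, beta * h2])
theta_hat = np.linalg.lstsq(w, y, rcond=None)[0]
print(theta_hat.shape)
```

With scalar x_t, θ here has one δ and one α component; no transition law is ever specified or estimated, which is the point of the procedure.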
Remark 1 (Role of the excluded variable z_t with linear flow utility functions). We claimed that the excluded variable z_t is required to identify the model. This point can now be better understood from the linear regression perspective using eq. (23). To identify θ, the "regressors" in w_t cannot be perfectly collinear. For simplicity, let x_t be a scalar. So there are two regressors in w_t:

x_t + Σ_{j=1}^∞ β^j Ẽ(p(s_{t+j}) x_{t+j} ∣ x_t, z_t) and Σ_{j=1}^∞ β^j Ẽ(x_{t+j} ∣ x_t, z_t).

Without the excluded variable z_t, these two regressors become

x_t + Σ_{j=1}^∞ β^j Ẽ(p(s_{t+j}) x_{t+j} ∣ x_t) and Σ_{j=1}^∞ β^j Ẽ(x_{t+j} ∣ x_t).

The term x_t is likely to be collinear with Ẽ(x_{t+j} ∣ x_t), depending on the conditional distribution of x_{t+j} given x_t and d_t. Due to the presence of the CCP, we cannot tell how strong the correlation between Ẽ(p(s_{t+j}) x_{t+j} ∣ x_t) and Ẽ(x_{t+j} ∣ x_t) is. However, because the CCP is smaller than one and

Ẽ(p(s_{t+j}) x_{t+j} ∣ x_t) = E(p(s_{t+j}) x_{t+j} ∣ x_t, d_t = 1) − E(p(s_{t+j}) x_{t+j} ∣ x_t, d_t = 0),

we expect that Σ_{j=1}^∞ β^j Ẽ(p(s_{t+j}) x_{t+j} ∣ x_t) is dominated by x_t. With the linear flow utility specification and without the excluded variable, the conclusion is that the two regressors are going to be highly correlated in some cases, making the variance of θ̂_J large.
โ
Remark 2 (Role of the excluded variable z_t with nonparametric flow utility functions). If we leave the flow utility functions nonparametrically unknown, we can write the original moment equation, eq. (19), as follows,

ψ(p(s_t)) + Σ_{j=1}^∞ β^j Ẽ(ψ(p(s_{t+j})) ∣ x_t) = [ũ(x_t) + Σ_{j=1}^∞ β^j Ẽ(p(s_{t+j}) ũ(x_{t+j}) ∣ x_t)] + Σ_{j=1}^∞ β^j Ẽ(u(d_{t+j} = 0, x_{t+j}) ∣ x_t).

The left-hand side of the above display is known from the data. However, the two terms on the right-hand side are both unknown functions of x_t, hence neither ũ(x_t) nor u(d_t = 0, x_t) is identifiable. □
3.3 Estimation of conditional expectations with two-period data

We now explain the estimation of h_{mj}(s) when t + j > T in more detail. In particular, we consider the most "difficult" case, in which T = 2. Let q(s) denote a generic function of s. We only need to show the approximation of E(q(s_{t+J}) ∣ s_t, d_t = d) for all J ≥ 1 and d ∈ D. As a summary of what follows: first, we prove that

E(q(s_{t+J}) ∣ s_t) = π^K(s_t)′ Γ^J λ + error term;

second, we derive the order of the error term; third, we show

E(q(s_{t+J}) ∣ s_t, d_t) = E(E(q(s_{t+J}) ∣ s_{t+1}) ∣ s_t, d_t) = π^K(s_t)′ Γ(d) Γ^{J−1} λ + error term,

and derive the order of the approximation error. The notation used in the approximation formulas will become clear soon. The bottom line is that the formula can be estimated from two-period panel data. The proofs of the results are not interesting by themselves, so we leave them to the appendix.
For simplicity, let s ∈ [0, 1] and let q ∈ L²(0, 1). Here L²(0, 1) denotes the set of functions q : [0, 1] ↦ ℝ such that ∫₀¹ q²(s) ds < ∞. In this paper, we let π₁(s), π₂(s), … denote a sequence of generic approximating functions. Here, we let π₁, π₂, … form an orthonormal basis of L²(0, 1). We can always write

q(s) = Σ_{k=1}^∞ λ_k π_k(s),

where λ_k = ∫₀¹ q(s) π_k(s) ds. Denote

π̄_k(s) ≡ E(π_k(s_{t+1}) ∣ s_t = s) and π̄_k(s, d) ≡ E(π_k(s_{t+1}) ∣ s_t = s, d_t = d),   (25)

for k = 1, 2, …. Provided that π̄_k(s), π̄_k(s, d) ∈ L²(0, 1) (e.g. assumptions 5.(i) and 5.(iii) give two sufficient conditions), we can also write

π̄_k(s) = Σ_{l=1}^∞ γ_{k,l} π_l(s) and π̄_k(s, d) = Σ_{l=1}^∞ γ_{k,l}(d) π_l(s),

where γ_{k,l} = ∫₀¹ π̄_k(s) π_l(s) ds, and γ_{k,l}(d) is defined similarly. Let us truncate all series at K, and write

q(s) = π^K(s)′ λ + r(s),
π̄_k(s) = π^K(s)′ γ_k + R_k(s),
π̄_k(s, d) = π^K(s)′ γ_k(d) + R_k(s, d).

Here, π^K = (π₁, …, π_K)′, λ = (λ₁, …, λ_K)′, r(s) = Σ_{k=K+1}^∞ λ_k π_k(s); γ_k, R_k(s), γ_k(d) and R_k(s, d) are defined similarly. The following notation will be handy later: let R_K = (R₁, …, R_K)′, R_K(s, d) = (R₁(s, d), …, R_K(s, d))′, Γ = (γ₁, …, γ_K), and Γ(d) = (γ₁(d), …, γ_K(d)). Note that both Γ and Γ(d) are estimable from two-period panel data using nonparametric regression. Depending on how smooth q(s), π̄_k(s) and π̄_k(s, d) (for each d) are, we can bound the norms of r(s), R_k(s) and R_k(s, d).
Assumption 5. (i) Assume π̄_k(s) defined in eq. (25) is monotone in s, or the state transition density f(s_{t+1} ∣ s_t) is continuous in s_t. Either condition ensures π̄_k(s) ∈ L²(0, 1).

(ii) A Sobolev ellipsoid is a set B(m, c) = {λ ∣ Σ_{k=1}^∞ a_k λ_k² ≤ c²}, where a_k ∼ k^{2m} as k → ∞. Assume that the coefficients (λ₁, λ₂, …) and (γ_{k,1}, γ_{k,2}, …), for each k = 1, 2, …, belong to the Sobolev ellipsoid B(m, c). So we have a bound for ‖r(s)‖₂ and ‖R_k(s)‖₂:

‖r(s)‖₂ = O(K^{−m}) and ‖R_k(s)‖₂ = O(K^{−m}).

The bound follows from the definition of r(s) and R_k(s) and the Sobolev ellipsoid. Lemma 8.4 of Wasserman (2006) establishes this bound. Loosely speaking, the value of m depends on the order of differentiability of the function.

(iii) For each d ∈ D, assume π̄_k(s, d) is monotone in s, or the state transition density f(s_{t+1} ∣ s_t, d_t = d) is continuous in s_t.

(iv) Assume that the coefficients (γ_{k,1}(d), γ_{k,2}(d), …), for each k = 1, 2, …, belong to the Sobolev ellipsoid B(m, c). So that ‖R_k(s, d)‖₂ = O(K^{−m}) for each d ∈ D.
Lemma 3. Suppose s_t is a first-order Markov process, and assumption 5.(i) holds. For any J ≥ 1, we have

E(q(s_{t+J}) ∣ s_t) = π^K(s_t)′ Γ^J λ + Σ_{j=1}^J E(R_K(s_{t+J−j})′ Γ^{j−1} λ ∣ s_t) + E(r(s_{t+J}) ∣ s_t),

for any t.

Proposition 1. Suppose s_t is a first-order Markov process, and assumptions 5.(i)–(ii) hold. For any J ≥ 1 (including J → ∞),

‖E(q(s_{t+J}) ∣ s_t) − π^K(s_t)′ Γ^J λ‖₂ = O(K^{1/2−m}).

Proposition 2. Suppose s_t is a first-order Markov process, and assumption 5 holds. For any J ≥ 1 (including J → ∞),

‖E(q(s_{t+J}) ∣ s_t, d_t = d) − π^K(s_t)′ Γ(d) Γ^{J−1} λ‖₂ = O(K^{1/2−m}).

Given proposition 2, a reasonable estimator of E(q(s_{t+J}) ∣ s_t, d_t = d) is π^K(s_t)′ Γ̂(d) Γ̂^{J−1} λ̂. The two matrices Γ̂(d) and Γ̂ are easily obtained from the nonparametric regressions of π_k(s_{i,t+1}) on (s_{i,t}, d_{i,t}) and on s_{i,t}, respectively, for all k = 1, …, K. The vector λ̂ is obtained from its definition λ_k = ∫₀¹ q(s) π_k(s) ds.
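The two-period trick can be sketched numerically. The transition law below is invented, the basis is a cosine orthonormal basis on [0, 1], and only the unconditional Γ̂ is formed (Γ̂(d) would come from the same regression using the choice as an extra conditioning variable); the point is that a J-step conditional expectation is approximated by π^K(s)′ Γ̂^J λ̂ using pairs (s_t, s_{t+1}) alone.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n = 8, 20000

def basis(s):
    """Orthonormal cosine basis on [0, 1]: pi_1 = 1, pi_k = sqrt(2) cos((k-1) pi s)."""
    k = np.arange(K)
    out = np.sqrt(2) * np.cos(np.pi * k * np.asarray(s)[..., None])
    out[..., 0] = 1.0
    return out

def step(s):
    """Invented Markov transition on [0, 1]."""
    return np.clip(0.3 + 0.4 * s + 0.15 * rng.standard_normal(s.shape), 0, 1)

s0 = rng.uniform(0, 1, n)
s1 = step(s0)

# Gamma-hat: regress each pi_k(s_{t+1}) on pi^K(s_t); two-period data suffice.
P0, P1 = basis(s0), basis(s1)
Gamma = np.linalg.lstsq(P0, P1, rcond=None)[0]    # K x K

q = lambda s: s**2                                 # a generic target function
grid = np.linspace(0, 1, 2001)
lam = basis(grid).T @ q(grid) / len(grid)          # lambda_k = int q(s) pi_k(s) ds

s_eval = 0.5
approx = basis(np.array([s_eval]))[0] @ np.linalg.matrix_power(Gamma, 2) @ lam

# Monte Carlo benchmark for E(q(s_{t+2}) | s_t = 0.5).
paths = np.full(100000, s_eval)
truth = q(step(step(paths))).mean()
print(float(abs(approx - truth)))
```

No multi-period data and no explicit transition density enter the approximation; iterating Γ̂ substitutes for iterating the unknown transition law.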
3.4 Influence function for stationary decision process
Letting N = nT, our goal is to characterize the first-order asymptotic properties of θ̂_J = (Ŵ′_J Ŵ_J)^{−1}(Ŵ′_J Ŷ_J) when n → ∞ and T is fixed, by deriving its influence function. The derivation will mostly use the pathwise derivative approach of Newey (1994a) and Hahn and Ridder (2013).

Letting p*(s_t) be the true CCP, denote by y_{it*} and w_{it*} the y_it and w_it of eq. (22) evaluated at the true CCP. The other objects with subscript "*" are defined in the same fashion. Define

θ* = (W′_* W_*)^{−1} W′_* Y_* = [E(w_{it*} w′_{it*})]^{−1} E(w_{it*} y_{it*}),   (26)
θ_J = (W′_{J*} W_{J*})^{−1} W′_{J*} Y_{J*},
θ_{J*} = [E(w_{it,J*} w′_{it,J*})]^{−1} E(w_{it,J*} y_{it,J*}).

The identity in eq. (26) follows from y_{it*} = w′_{it*} θ*. We decompose

√N(θ̂_J − θ*) = √N(θ̂_J − θ_{J*}) + √N(θ_{J*} − θ*),

and make the following assumption.
Assumption 6. (i) There exists a constant C, such that sup_s |h_{mj}(s)| < C for all m = 1, 2, 3 and j = 1, 2, ….

(ii) The smallest eigenvalue of E(w_{it,J} w′_{it,J}) is bounded away from zero uniformly in J = 1, 2, ….
โ
๐ (๐๐ฝโ โ๐โ ) = ๐๐ (๐ฝ ๐ฝ+1 ) (proposition B.2),
โ
so we can focus on the analysis of the variance term ๐ (๐ ฬ โ ๐ ), which can be understood
In the appendix, we show that the bias term
๐ฝ
๐ฝโ
as a three-step M-estimator with generated dependent variables. Proposition B.2 justifies
the truncation we made earlier. The moment equation is
โฒ
๐(๐ฅ๐๐ก , ๐, โ๐ฝ (๐ ๐๐ก ; ๐), ๐) = ๐๐๐ก,๐ฝ (๐ฆ๐๐ก,๐ฝ โ ๐๐๐ก,๐ฝ
๐).
โฒ
= (๐ฅโฒ๐๐ก + โ๐ฝ๐=1 ๐ฝ ๐ โ3๐ (๐ ๐๐ก )โฒ , โ๐ฝ๐=1 ๐ฝ ๐ โ2๐ (๐ ๐๐ก )โฒ ) is a function of
Equation (24) says that ๐๐๐ก,๐ฝ
๐ฅ๐๐ก , โ๐ฝ2 (๐ ๐๐ก ) and โ๐ฝ3 (๐ ๐๐ก ), and ๐ฆ๐๐ก,๐ฝ = ๐(๐(๐ ๐๐ก )) + โ๐ฝ๐=1 ๐ฝ ๐ โ1๐ (๐ ๐๐ก ) is a function of ๐(๐ ๐๐ก ) and
โ๐ฝ1 (๐ ๐๐ก ). The vector โ๐ฝ (๐ ๐ก )โฒ โก (โ๐ฝ1 (๐ ๐ก )โฒ , โ๐ฝ2 (๐ ๐ก )โฒ , โ๐ฝ3 (๐ ๐ก )โฒ ) itself is a function of ๐. So we write
โ๐ฝ (๐ ; ๐). The estimator ๐ ฬ solves the moment equation,
๐๐ก
๐ฝ
๐,๐
โ ๐(๐ฅ๐๐ก , ๐, โฬ ๐ฝ (๐ ๐๐ก ; ๐),
ฬ ๐)/๐
ฬ
= 0,
๐,๐ก=1
and we have E(๐(๐ฅ๐๐ก , ๐๐ฝโ , โ๐ฝโ (๐ ๐๐ก ; ๐โ ), ๐โ )) = 0.
Similarly to Hahn and Ridder (2013), using the pathwise derivative approach in Newey (1994a), the influence function associated with √N(θ̂_J − θ_{J*}) can be expressed as a sum of the following terms: (i) the leading term √N(θ_J − θ_{J*}); (ii) a term that adjusts for the sampling variation in the nonparametric regressions for estimating h_{mj}(s_t) when the CCP is known,

√N [( Σ_{i,t=1}^{n,T} w_{it,J}(x_it, ĥ_J(s_it; p*)) w_{it,J}(x_it, ĥ_J(s_it; p*))′ )^{−1} Σ_{i,t=1}^{n,T} w_{it,J}(x_it, ĥ_J(s_it; p*)) y_{it,J}(p*, ĥ_J(s_it; p*))] − √N θ_J;

and (iii) an adjustment for the CCP estimation p̂ in the first step, i.e.,

√N [( Σ_{i,t=1}^{n,T} w_{it,J}(x_it, h_{J*}(s_it; p̂)) w_{it,J}(x_it, h_{J*}(s_it; p̂))′ )^{−1} Σ_{i,t=1}^{n,T} w_{it,J}(x_it, h_{J*}(s_it; p̂)) y_{it,J}(p̂, h_{J*}(s_it; p̂))] − √N θ_J.

Proposition B.1 of the Appendix shows that the leading term √N(θ_J − θ_{J*}) is O_p(β^{J+1}). We then just derive the other two terms.
In appendix D, we present general results on the asymptotic variance of semiparametric M-estimators with generated dependent variables, which cover the present problem as a special case. The results here are derived using the general formulas in appendix D.

We consider the second adjustment term first, which can be analyzed by the pathwise derivative approach in Newey (1994a, pages 1360–1361) or Hahn and Ridder (2013, page 327). The unknown functions in h_J(s_t; p̂) in the second-step nonparametric regressions are differences of partial means in Newey (1994b)'s terminology. Letting
๐
๐๐ก โก
1(๐๐๐ก = 1) 1 โ 1(๐๐๐ก = 1)
โ
,
๐โ (๐ ๐๐ก )
1 โ ๐โ (๐ ๐๐ก )
it can be shown that the second adjustment term equals
๐,๐ ,๐ฝ
1
๐๐โ ๐
โ
โ
๐ฝ [๐
๐๐ก ๐(๐โ (๐ ๐,๐ก+๐ )) โ โ1๐โ (๐ ๐๐ก )]
๐ ๐,๐ก,๐=1 ๐๐ฆ๐๐ก
(27)
๐,๐ ,๐ฝ
๐๐โ ๐ 0
1
โ
+โ
๐ฝ ( ) โ [๐
๐๐ก ๐ฅ๐,๐ก+๐ โ โ2๐โ (๐ ๐๐ก )]
๐ ๐,๐ก,๐=1 ๐๐๐๐ก
1
(28)
๐,๐ ,๐ฝ
1
๐๐โ ๐ 1
+โ
โ
๐ฝ ( ) โ [๐
๐๐ก ๐โ (๐ ๐,๐ก+๐ )๐ฅ๐,๐ก+๐ โ โ3๐โ (๐ ๐๐ก )],
๐ ๐,๐ก,๐=1 ๐๐๐๐ก
0
(29)
+ ๐๐ (1)
where ๐๐โ /๐๐ฆ๐๐ก and ๐๐โ /๐๐๐๐ก denote the derivatives of the moment equation ๐(๐ฅ๐๐ก , ๐๐ฝโ , โ๐ฝโ (๐ ๐๐ก ; ๐โ ), ๐โ )
with respect to ๐ฆ๐๐ก,๐ฝโ and ๐๐๐ก,๐ฝโ , respectively. We have
๐๐โ
= ๐๐๐ก,๐ฝโ
๐๐ฆ๐๐ก
and
๐๐โ
โฒ
๐๐ฝโ )๐ผ โ ๐๐๐ก,๐ฝโ ๐๐ฝโ .
= (๐ฆ๐๐ก,๐ฝโ โ ๐๐๐ก,๐ฝโ
๐๐๐๐ก
Here ๐ผ is a 2๐๐ฅ ×2๐๐ฅ identity matrix, and ๐๐ฅ is the dimension of ๐ฅ๐ก . Equation (27), (28) and
(29) are the adjustment terms for sampling error in โฬ ๐ฝ (๐ ๐๐ก ; ๐โ ), โฬ ๐ฝ (๐ ๐๐ก ; ๐โ ) and โฬ ๐ฝ (๐ ๐๐ก ; ๐โ ),
1
2
3
respectively.
The third adjustment term accounts for the sampling variation in estimating the CCP. Unlike the adjustment for the generated regressors studied by Hahn and Ridder (2013), the adjustment for the generated dependent variables can be obtained by simply linearizing the moment equation with respect to the first-step estimator. It can be shown using the formulas in appendix D that the third adjustment term equals

(1/√N) Σ_{i,t=1}^{n,T} (∂m*/∂y_it) ψ′(p*(s_it))(d_it − p*(s_it))   (30)
+ (1/√N) Σ_{i,t,j=1}^{n,T,J} (∂m*/∂y_it) β^j D_it (∂ξ(p*(s_{i,t+j}))/∂p)(d_it − p*(s_it))   (31)
+ (1/√N) Σ_{i,t,j=1}^{n,T,J} (∂m*/∂w_it) β^j (1, 0)′ D_it x_{i,t+j} (d_it − p*(s_it))   (32)
+ o_p(1),   (33)

where

∂ξ(p*(s_{i,t+j}))/∂p = ψ(p*(s_{i,t+j})) + p*(s_{i,t+j}) ψ′(p*(s_{i,t+j})) − ψ′(p*(s_{i,t+j})).

Equations (30), (31) and (32) are the adjustment terms for sampling error in ψ(p̂(s_it)), h_{J1}(s_t; p̂) and h_{J3}(s_t; p̂), respectively.
Remark 3. From the influence functions, it is easy to see that the discount factor affects the asymptotic variance of the three-step estimator. The closer the discount factor β is to 1, the larger the variance of the three-step estimator.
4 Estimation of General Markov Decision Processes

By "general" Markov decision processes, we mean that we allow for time-varying flow utility functions and state transition distributions, and a finite decision horizon.
The plan for this section is the following. First, we derive a simpler moment condition for the flow utility functions from eq. (14). This simpler moment condition, eq. (35), resembles a partially linear model, in which the unknown function is the integrated value function in the last sampling period, V̂_T(s_T). Using the series approximation of V̂_T(s_T) and partitioned linear regression arguments, we derive an estimable formula for the vector of parameters specifying the flow utility functions. Such a formula is then estimated by a three-step semiparametric estimator, which is the second part of this section. Third, we derive the influence function for our three-step semiparametric estimator. The argument there has two parts. We first show that the series approximation error of V̂_T(s_T) does not affect the influence function of our three-step estimator under certain conditions. The remaining arguments are then similar to the ones we used in deriving the influence function for the infinite horizon stationary model.
4.1 Linear moment condition

Starting from eq. (14b), we have

V̂_t(s_t) = σ^o_t(s_t) + β E(V̂_{t+1}(s_{t+1}) ∣ s_t).

Iteratively substituting V̂_{t+1}(s_{t+1}) with the above expression and using the Markov property, we have

V̂_t(s_t) = Σ_{j=0}^{T−t−1} β^j E(σ^o_{t+j}(s_{t+j}) ∣ s_t) + β^{T−t} E(V̂_T(s_T) ∣ s_t).

Applying this formula to V̂_{t+1}(s_{t+1}), eq. (14a) becomes

ψ(p_t(s_t)) = ũ_t(x_t) + Σ_{j=1}^{T−t−1} β^j Ẽ(σ^o_{t+j}(s_{t+j}) ∣ s_t) + β^{T−t} Ẽ(V̂_T(s_T) ∣ s_t),

where

σ^o_t(s_t) = u_t(d_t = 0, x_t) + p_t(s_t) ũ_t(x_t) + ψ(p_t(s_t)).
Similarly to the infinite horizon stationary model, let

u_t(d_t = 0, x_t) = x′_t α_t and ũ_t(x_t) = x′_t δ_t.

We have the following linear equations,

y_{T−1} = x′_{T−1} δ_{T−1} + β Ẽ(V̂_T(s_T) ∣ s_{T−1}),   (34a)
y_t = x′_t δ_t + Σ_{j=1}^{T−t−1} β^j h_{2,t,j}(s_t)′ α_{t+j} + Σ_{j=1}^{T−t−1} β^j h_{3,t,j}(s_t)′ δ_{t+j} + β^{T−t} Ẽ(V̂_T(s_T) ∣ s_t),   (34b)

for t = 1, …, T − 2, where

y_t ≡ ψ(p_t(s_t)) + Σ_{j=1}^{T−t−1} β^j h_{1,t,j}(s_t) and y_{T−1} ≡ ψ(p_{T−1}(s_{T−1})),

with

h_{1,t,j}(s_t) ≡ Ẽ(ψ(p_{t+j}(s_{t+j})) ∣ s_t),
h_{2,t,j}(s_t) ≡ Ẽ(x_{t+j} ∣ s_t),
h_{3,t,j}(s_t) ≡ Ẽ(p_{t+j}(s_{t+j}) x_{t+j} ∣ s_t).
Equation (34) provides a set of linear moment conditions for θ = (δ′_1, …, δ′_{T−1}, α′_2, …, α′_{T−1})′. With some definitions of matrices, we have a concise version of eq. (34). Suppose we have panel data (d_it, s′_it) with i = 1, …, n and t = 1, …, T. For each i, define a (T − 1) × dim(θ) matrix

W_i = (W_{i,δ}, W_{i,α}),

where

W_{i,δ} ≡
⎡ x′_{i,1}   βh_{3,1,1}(s_{i,1})′   β²h_{3,1,2}(s_{i,1})′   ⋯   β^{T−2} h_{3,1,T−2}(s_{i,1})′ ⎤
⎢ 0          x′_{i,2}               βh_{3,2,1}(s_{i,2})′    ⋯   β^{T−3} h_{3,2,T−3}(s_{i,2})′ ⎥
⎢ 0          0                      x′_{i,3}                ⋯   β^{T−4} h_{3,3,T−4}(s_{i,3})′ ⎥
⎢ ⋮                                                         ⋱   ⋮                             ⎥
⎣ 0          0                      0                       ⋯   x′_{i,T−1}                    ⎦

W_{i,α} ≡
⎡ βh_{2,1,1}(s_{i,1})′   β²h_{2,1,2}(s_{i,1})′   ⋯   β^{T−2} h_{2,1,T−2}(s_{i,1})′ ⎤
⎢ 0                      βh_{2,2,1}(s_{i,2})′    ⋯   β^{T−3} h_{2,2,T−3}(s_{i,2})′ ⎥
⎢ 0                      0                       ⋯   β^{T−4} h_{2,3,T−4}(s_{i,3})′ ⎥
⎢ ⋮                                              ⋱   ⋮                             ⎥
⎢ 0                      0                       ⋯   βh_{2,T−2,1}(s_{i,T−2})′      ⎥
⎣ 0                      0                       ⋯   0                             ⎦
Let

H_i = (β^{T−1} Ẽ(V̂_T(s_T) ∣ s_{i1}), …, β Ẽ(V̂_T(s_T) ∣ s_{i,T−1}))′.

For each i, let Y_i = (y_{i1}, …, y_{i,T−1})′ and eq. (34) becomes

Y_i = W_i θ + H_i.

Stacking Y_1, …, Y_n, we have Y = (Y′_1, …, Y′_n)′, which is an N = n · (T − 1) dimensional vector. Define W and H similarly. We have

Y = Wθ + H.   (35)

Because H involves the unknown V̂_T(s_T), eq. (35) is similar to the partially linear model (Donald and Newey, 1994). It should be remarked that for discrete state space, Chou (2016) shows that, given the exclusion restriction, the integrated value function V̂_T(s_T) is identifiable up to a constant when T ≥ 4.
4.2 Three-step estimator

Before listing our three-step recipe, we need to deal with the unknown function V̂_T(s_T) in H by series approximation. Let

π^K(s) = (π_1(s), …, π_K(s))′

be a vector of approximating functions, such as power series or splines. Let

Q_{i,K} = ⎡ β^{T−1} Ẽ(π^K(s_T)′ ∣ s_{i,1}) ⎤
          ⎢ ⋮                              ⎥
          ⎣ β Ẽ(π^K(s_T)′ ∣ s_{i,T−1})    ⎦

and Q′_K = (Q′_{1K}, …, Q′_{nK}).

The estimator of θ is the vector of coefficients of W_i in the linear regression of Y_i on W_i and Q_{i,K}. By the usual partitioned linear regression arguments, the estimator may be written as follows,

θ_K = (W′ M_K W)^{−1} W′ M_K Y,

with M_K = I − Q_K(Q′_K Q_K)^{−1} Q′_K. The invertibility of W′ M_K W will be discussed later. In the next subsection, we show that the difference between θ_K and the true value of θ is asymptotically negligible. So our estimator of θ is the sample analog of θ_K, formed in the following three steps.

Step 1: Estimate the CCP E(d_it ∣ s_it) and β^{T−t} Ẽ(π^K(s_T) ∣ s_t) for t = 1, …, T − 1. Let p̂_t(s_t) be the CCP estimator. Let Q̂_K be the estimate of Q_K, and let M̂_K be the estimate of M_K from replacing Q_K with Q̂_K.

Step 2: Let ψ(p̂_t(s_t)) be the estimator of ψ(p_t(s_t)). Let ĥ be the estimator of h from nonparametric regressions with generated dependent variables.

Step 3: Then Ŷ and Ŵ are constructed by replacing the unknown p and h with their respective estimates. We have the estimator

θ̂_K = (Ŵ′ M̂_K Ŵ)^{−1} Ŵ′ M̂_K Ŷ.
4.3 Influence function for general decision process

Let N = n · (T − 1). Letting θ* be the true vector of parameters specifying the flow utility functions, we can decompose

√n(θ̂_K − θ*) = √n(θ̂_K − θ_K) + √n(θ_K − θ*).

We can focus on √n(θ̂_K − θ_K) (the variance term) if ‖√n(θ_K − θ*)‖ (the bias term) is o_p(1). We start by showing the conditions under which ‖√n(θ_K − θ*)‖ = o_p(1). Then the properties of √n(θ̂_K − θ_K) can be analyzed, as before, using the influence function formula for three-step semiparametric estimators with generated dependent variables, so some details will be omitted.
Proposition 3. Suppose (i) the smallest eigenvalue of E(N^{−1} W′ M_K W) is bounded away from zero uniformly in K; (ii) the series approximation error of V̂_T(s_T) is of the order O(K^{−m}); more explicitly, there exists m > 0, such that sup_{s∈𝒮} |V̂_T(s_T) − Σ_{k=1}^K λ_k π_k(s)| ≤ O(K^{−m}). Then we have ‖θ_K − θ*‖ = O_p(K^{−m}). So if we impose undersmoothing conditions such that K^{−m} = o(1/√n), we can make the bias term ‖√n(θ_K − θ*)‖ = o_p(1).

Proof. See appendix C. □

We derive the influence function of θ̂_K here. The derivation is similar to the stationary Markov decision process case. First, the three-step estimator is viewed as an M-estimator with the following moment equation,

m(x_i, p, h(s_i; p), θ) = W̃′_i (Ỹ_i − W̃_i θ),

where x_i = (x′_{i1}, …, x′_{i,T−1})′, s_i = (s′_{i1}, …, s′_{iT})′, and p(s_i) = (p_1(s_1), …, p_{T−1}(s_{T−1}))′ is the vector of CCPs from period 1 to T − 1. Here W̃_i = M_{i,K} W and Ỹ_i = M_{i,K} Y, with M′_K = (M′_{1,K}, …, M′_{n,K}). Each M_{i,K} is a (T − 1) × N matrix. We write h(s_i; p) to emphasize the dependence of h on the CCP. The three-step estimator solves

Σ_{i=1}^n m(x_i, p̂, ĥ(s_i; p̂), θ)/n = 0.

We have E(m(x_i, p*, h*(s_i; p*), θ_K)) = 0.
Using the pathwise derivative approach in Newey (1994a), we again can write √n(θ̂_K − θ_K) as the sum of three components:

√n [( Σ_{i=1}^n W̃_i(x_i, h*(s_i; p*))′ W̃_i(x_i, h*(s_i; p*)) )^{−1} Σ_{i=1}^n W̃_i(x_i, h*(s_i; p*))′ Ỹ_i(h*(s_i; p*))] − √n θ_K,   (i)

√n [( Σ_{i=1}^n W̃_i(x_i, ĥ(s_i; p*))′ W̃_i(x_i, ĥ(s_i; p*)) )^{−1} Σ_{i=1}^n W̃_i(x_i, ĥ(s_i; p*))′ Ỹ_i(ĥ(s_i; p*))] − √n θ_K,   (ii)

√n [( Σ_{i=1}^n W̃_i(x_i, h*(s_i; p̂))′ W̃_i(x_i, h*(s_i; p̂)) )^{−1} Σ_{i=1}^n W̃_i(x_i, h*(s_i; p̂))′ Ỹ_i(h*(s_i; p̂))] − √n θ_K.   (iii)

By the definition of θ_K, term (i) equals zero. We just need to calculate the other two terms.
Using the general results of the asymptotic variance of the three-step estimators with
generated dependent variables in appendix D, we have the following results. Let
๐
๐๐ก =
1(๐๐๐ก = 1) 1 โ 1(๐๐๐ก = 1)
โ
,
๐๐กโ (๐ ๐๐ก )
1 โ ๐๐กโ (๐ ๐๐ก )
๐๐,๐ก,๐ = ๐ฝ ๐ {๐
๐๐ก ๐(๐๐ก+๐,โ (๐ ๐,๐ก+๐ )) โ โ1,๐ก,๐โ (๐ ๐๐ก )} ,
๐๐,๐ก,๐ = ๐ฝ ๐ [๐
๐๐ก ๐ฅ๐,๐ก+๐ โ โ2,๐ก,๐โ (๐ ๐,๐ก )] ,
๐๐,๐ก,๐ = ๐ฝ ๐ [๐
๐๐ก ๐๐ก+๐,โ (๐ ๐,๐ก+๐ )๐ฅ๐,๐ก+๐ โ โ3,๐ก,๐โ (๐ ๐๐ก )] ,
28
and define two block upper triangular matrices,
๐โฒ
โก ๐,1,1
โข
โข
โข
โฃ 0
๐โฒ
โก ๐,1,1
โข
โข
โข
โฃ 0
โฒ
๐๐,1,2
โฒ
๐๐,1,3
โฒ
๐๐,2,1
โฒ
๐๐,2,2
โฑ
โฒ
๐๐,1,2
โฒ
๐๐,1,3
โฒ
๐๐,2,1
โฒ
๐๐,2,2
โฑ
โฒ
โฏ ๐๐,1,๐
โ2
โค
โฒ
โฏ ๐๐,2,๐
โ3 โฅ
โฅ
โฑ
โฎ โฅ
โฒ
โฑ ๐๐,๐
โ2,1 โฆ
โฒ
โฏ ๐๐,1,๐
โ2
โค
โฒ
โฏ ๐๐,2,๐
โ3 โฅ
โฅ.
โฑ
โฎ โฅ
โฒ
โฑ ๐๐,๐
โ2,1 โฆ
Let
๐ โ2
(36)
(37)
โฒ
๐ โ3
๐ด๐
=
(โ๐=1 ๐๐,1,๐ , โ๐=1 ๐๐,2,๐ , โฏ , โ1๐=1 ๐๐,๐ โ2,๐ , 0) ,
๐ต๐
=
[ 0(๐ โ1)×๐๐ฟ
๐ถ๐
=
[
eq. (36)
01×๐๐ผ
0(๐ โ2)×1
eq. (37)
0
01×(๐ โ2)
],
0(๐ โ1)×๐๐ผ ] .
Denote
๐
ฬ ๐โ = ๐
ฬ ๐ (๐ฅ๐ , โโ (๐ ๐ ; ๐โ ))
Term (ii) equals the following,
โ
1 ๐ โฒ
โ โ ๐
ฬ ๐โ
[๐ด๐ โ (๐ต๐ + ๐ถ๐ )๐๐พ ] โ ๐๐๐พ + ๐๐ (1).
๐ ๐=1
Let

E_i = ( ψ′(p_{1*}(s_{i1}))(d_{i1} − p_{1*}(s_{i1})), ⋯, ψ′(p_{T−1,*}(s_{i,T−1}))(d_{i,T−1} − p_{T−1,*}(s_{i,T−1})) )′,

F_i = ( Σ_{j=1}^{T−2} β^j (∂ξ(p_{1+j}(s_{i,1+j}))/∂p) D_{i1} (d_{i,1+j} − p_{1+j,*}(s_{i,1+j})),
⋮,
β (∂ξ(p_{T−1}(s_{i,T−1}))/∂p) D_{i,T−2} (d_{i,T−1} − p_{T−1,*}(s_{i,T−1})),
0 )′.

Letting

P_{i,t,j} = β^j [x_{i,t+j} D_it (d_{i,t+j} − p_{t+j,*}(s_{i,t+j}))],

and defining a block triangular matrix,

⎡ P′_{i,1,1}   P′_{i,1,2}   P′_{i,1,3}   ⋯   P′_{i,1,T−2} ⎤
⎢ 0            P′_{i,2,1}   P′_{i,2,2}   ⋯   P′_{i,2,T−3} ⎥   (38)
⎢                           ⋱            ⋱   ⋮            ⎥
⎣ 0                                      ⋱   P′_{i,T−2,1} ⎦

we let

G_i = [ 0_{(T−1)×d_x}   ( eq. (38) stacked over 0_{1×(T−2)d_x} )   0_{(T−1)×d_α} ].
Term (iii) equals the following,

(1/√n) Σ_{i=1}^n W̃′_{i*} M_{i,K} ⎡ E_1 + F_1 − G_1 θ_K ⎤
                                 ⎢ ⋮                   ⎥ + o_p(1).
                                 ⎣ E_n + F_n − G_n θ_K ⎦
5 Monte Carlo Studies

5.1 Infinite horizon stationary decision process
In this numerical example, both x_t and z_t are scalars. Let the flow utility functions have the quadratic form,

ũ(x_t) = δ_1 + δ_2 x_t + δ_3 x_t² and u(d_t = 0, x_t) = α_1 + α_2 x_t + α_3 x_t²,

and u_t(d_t = 1, x_t) = ũ(x_t) + u(d_t = 0, x_t). In the simulation, we let

u(d_t = 0, x_t) = ũ(x_t) = .5 + x_t − .5x_t².

The support of x_t is [0, 3]. The excluded variable z_t is time invariant (hence we drop the time subscript t of z_t in the sequel). Conditional on (x_t, z), x_{t+1} is generated from

x_{t+1} = min(3, d_t · (1 + η_{t+1}) x_t + (1 − d_t) · 2η_{t+1}) with η_{t+1} ∣ z ∼ Beta(shape1(z), shape2(z)).
The shape parameters the beta distribution are determined in the following way so that
E(๐๐ก+1 | ๐ง) = ๐ง and Var(๐๐ก+1 ) = 1/20:
shape1 (๐ง) = 20๐ง 2 (1 โ ๐ง) โ ๐ง
and
shape2 (๐ง) =
shape1 (๐ง) โ
(1 โ ๐ง)
.
๐ง
To ensure shape1 (๐ง), shape2 (๐ง) > 0, we let ๐ง follow a uniform distribution with support
[.053, .947]. We set the discount factor ๐ฝ = .8. The utility shocks ๐๐ก (0), ๐๐ก (1) follow in30
Table 1: Estimation of Infinite Horizon Stationary Markov Decision Process

                     Flow Utility Functions                        Panel Data
  δ1 = 0.5      δ2 = 1       δ3 = −0.5     α2 = 1       α3 = −0.5     N     T
  .525 (.065)   .940 (.112)  −.501 (.032)  .930 (.104)  −.456 (.027)  2e4   2
  .544 (.101)   .903 (.169)  −.504 (.047)  .916 (.169)  −.436 (.045)  1e4   2
  .621 (.182)   .771 (.294)  −.500 (.079)  .905 (.296)  −.389 (.084)  5e3   2

Standard deviations in parentheses.
dependent type-I extreme value distributions. Given these structural parameters, we can determine the true CCP.6
In the numerical examples, we simulate two periods ($T = 2$) of dynamic discrete choices by $N$ agents. The first-period states $\{x_{i1}, z_i : i = 1,\ldots,N\}$ were drawn uniformly from their support. We estimate the model for $N = 5{,}000$, $10{,}000$ and $20{,}000$. The discount factor is assumed known in the estimation. Table 1 reports the performance of our estimator based on 1,000 replications.7 The performance is satisfactory. There is bias due to the smoothing in the nonparametric ingredient of our three-step semiparametric estimator; as the cross-section sample size increases, the bias decreases. There is no information about $\alpha_1$, because the intercept of $u(y_t = 0, x_t)$ is not identifiable.
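The beta calibration and the state transition above are easy to check directly. The following sketch (all function names are ours, not the paper's) verifies, via the standard Beta(a, b) moment formulas, that the stated shape parameters deliver E(λ | z) = z and Var(λ) = 1/20, and draws a short state path with the standard-library beta sampler. The choice variable is drawn as a placeholder coin flip; in the paper the choices come from the model's true CCP.

```python
import random

def shape1(z):
    # shape1(z) = 20 z^2 (1 - z) - z  (the paper's calibration)
    return 20 * z**2 * (1 - z) - z

def shape2(z):
    # shape2(z) = shape1(z) * (1 - z) / z
    return shape1(z) * (1 - z) / z

def beta_mean_var(a, b):
    # Standard Beta(a, b) moments: mean a/(a+b), variance ab/((a+b)^2 (a+b+1)).
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Check E(lambda | z) = z and Var(lambda) = 1/20 on a grid inside [.053, .947].
for zz in (0.1, 0.25, 0.5, 0.75, 0.9):
    a, b = shape1(zz), shape2(zz)
    assert a > 0 and b > 0
    mean, var = beta_mean_var(a, b)
    assert abs(mean - zz) < 1e-12 and abs(var - 1 / 20) < 1e-12

def next_x(x, y, z, rng):
    # x_{t+1} = min(3, y (1 + lambda) x + (1 - y) * 2 * lambda)
    lam = rng.betavariate(shape1(z), shape2(z))
    return min(3.0, y * (1 + lam) * x + (1 - y) * 2 * lam)

rng = random.Random(0)
z = rng.uniform(0.053, 0.947)
x = rng.uniform(0.0, 3.0)
for t in range(2):                 # T = 2 periods, as in this experiment
    y = rng.randint(0, 1)          # placeholder choice, not the model's CCP
    x = next_x(x, y, z, rng)
    assert 0.0 <= x <= 3.0         # the transition keeps x in its support
```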
5.2 General Markov decision process
Here, the data are still generated from the stationary model above; in the estimation, however, we do not assume stationarity. Instead, we let
$$\tilde u_t(x_t) = \delta_{1,t} + \delta_{2,t}x_t + \delta_{3,t}x_t^2 \quad\text{and}\quad u_t(y_t = 0, x_t) = \alpha_{1,t} + \alpha_{2,t}x_t + \alpha_{3,t}x_t^2,$$
and estimate $\delta_t$ and $\alpha_t$ for each sampling period.
In the numerical examples, we simulate six periods ($T = 6$) of dynamic discrete choices by $N$ agents. As we did for the stationary model, we draw the first-period states $\{x_{i1}, z_i : i = 1,\ldots,N\}$ uniformly from their support and estimate the model for $N = 5{,}000$, $10{,}000$ and $20{,}000$. The discount factor is assumed known in the estimation. Table 2 reports the performance of our estimator based on 1,000 replications. The performance
6 Readers who are interested in the algorithm for determining the CCP from the structural parameters can find the details in the documentation of our code.
7 The Monte Carlo simulation studies used the ALICE High Performance Computing Facility at the University of Leicester.
Table 2: Estimation of General Markov Decision Process

(Estimates and standard deviations of the period-specific flow utility parameters $\delta_{1,t} = 0.5$, $\delta_{2,t} = 1$, $\delta_{3,t} = -0.5$, $\alpha_{2,t} = 1$ and $\alpha_{3,t} = -0.5$ for $t = 1,\ldots,5$, at sample sizes $N$ = 2e4, 1e4 and 5e3.)
is satisfactory. There is bias due to the smoothing in the nonparametric ingredient of our three-step semiparametric estimator. As the cross-section sample size increases, the bias decreases. Again, $\alpha_{1,t}$ is not identifiable.
6 Conclusion
We propose a three-step semiparametric estimator of the expected flow utility functions of structural dynamic programming discrete choice models. Our estimator is an extension of the CCP estimators in the literature. Its main advantage is that it does not require the estimation of the state transition distributions, which can be difficult when the dimension of the observable state vector is even moderately large. Our estimator can be applied both to infinite horizon stationary discrete choice models and to finite horizon nonstationary models with time-varying expected flow utility functions and/or state transition distributions.
Appendix

A Proofs for the approximation formulas in the infinite horizon stationary problem
Proof of Lemma 3. We prove the lemma by induction. For $J = 1$, we have
$$\begin{aligned}
\mathrm E(\phi(s_{t+1})\mid s_t) &= \mathrm E(\varphi^K(s_{t+1})'\theta + \rho(s_{t+1})\mid s_t)\\
&= \mathrm E(\varphi^K(s_{t+1})'\theta\mid s_t) + \mathrm E(\rho(s_{t+1})\mid s_t)\\
&= \sum_{k=1}^{K}\theta_k\,\tilde\varphi_k(s_t) + \mathrm E(\rho(s_{t+1})\mid s_t)\\
&= \sum_{k=1}^{K}\theta_k\,[\varphi^K(s_t)'\gamma_k + r_k(s_t)] + \mathrm E(\rho(s_{t+1})\mid s_t)\\
&= \sum_{k=1}^{K}\theta_k\,\varphi^K(s_t)'\gamma_k + \sum_{k=1}^{K}\theta_k\,r_k(s_t) + \mathrm E(\rho(s_{t+1})\mid s_t)\\
&= \varphi^K(s_t)'\,\Gamma\theta + r^K(s_t)'\theta + \mathrm E(\rho(s_{t+1})\mid s_t),
\end{aligned}$$
which equals the formula in the lemma.
Suppose the lemma holds for $J = J_\star$; we need to show that it holds for $J = J_\star + 1$. It follows from the Markov property that
$$\begin{aligned}
\mathrm E(\phi(s_{t+J_\star+1})\mid s_t) &= \mathrm E\big(\mathrm E(\phi(s_{t+J_\star+1})\mid s_{t+1})\mid s_t\big)\\
&= \mathrm E\Big(\varphi^K(s_{t+1})'\Gamma^{J_\star}\theta + \sum_{j=1}^{J_\star}\mathrm E(r^K(s_{t+J_\star+1-j})'\Gamma^{j-1}\theta\mid s_{t+1}) + \mathrm E(\rho(s_{t+J_\star+1})\mid s_{t+1})\ \Big|\ s_t\Big)\\
&= \mathrm E(\varphi^K(s_{t+1})'\Gamma^{J_\star}\theta\mid s_t) + \sum_{j=1}^{J_\star}\mathrm E(r^K(s_{t+J_\star+1-j})'\Gamma^{j-1}\theta\mid s_t) + \mathrm E(\rho(s_{t+J_\star+1})\mid s_t).
\end{aligned}$$
We can organize $\mathrm E(\varphi^K(s_{t+1})'\Gamma^{J_\star}\theta\mid s_t)$ as follows,
$$\begin{aligned}
\mathrm E(\varphi^K(s_{t+1})'\Gamma^{J_\star}\theta\mid s_t) &= \mathrm E(\varphi^K(s_{t+1})'\mid s_t)\,\Gamma^{J_\star}\theta\\
&= (\tilde\varphi_1(s_t),\ldots,\tilde\varphi_K(s_t))\,\Gamma^{J_\star}\theta\\
&= \big[(\varphi^K(s_t)'\gamma_1,\ldots,\varphi^K(s_t)'\gamma_K) + r^K(s_t)'\big]\,\Gamma^{J_\star}\theta\\
&= \varphi^K(s_t)'\,\Gamma^{J_\star+1}\theta + r^K(s_t)'\,\Gamma^{J_\star}\theta.
\end{aligned}$$
Together, we have
$$\mathrm E(\phi(s_{t+J_\star+1})\mid s_t) - \varphi^K(s_t)'\Gamma^{J_\star+1}\theta = \sum_{j=1}^{J_\star+1}\mathrm E(r^K(s_{t+J_\star+1-j})'\Gamma^{j-1}\theta\mid s_t) + \mathrm E(\rho(s_{t+J_\star+1})\mid s_t).$$
By induction, we conclude that the lemma is correct. ∎
Proof of Proposition 1. We prove the proposition by deriving the order of
$$\sum_{j=1}^{J}\mathrm E(r^K(s_{t+J-j})'\Gamma^{j-1}\theta\mid s_t) + \mathrm E(\rho(s_{t+J})\mid s_t).$$
First, viewing $\mathrm E(\cdot\mid s_t)$ as a linear operator (whose norm is 1), we know
$$\|\mathrm E(\rho(s_{t+J})\mid s_t)\|_2 \le \|\rho(s)\|_2 = O(K^{-a}).$$
Second, by the same token,
$$\|\mathrm E(r^K(s_{t+J-j})'\Gamma^{j-1}\theta\mid s_t)\|_2 \le \|r^K(s_{t+J-j})'\Gamma^{j-1}\theta\|_2.$$
Let $\tilde\theta_j = \Gamma^j\theta$. It then follows from the triangle inequality that
$$\|r^K(s_{t+J-j})'\Gamma^{j-1}\theta\|_2 = \|r^K(s_{t+J-j})'\tilde\theta_{j-1}\|_2 \le \sum_{k=1}^{K}|\tilde\theta_{j-1,k}|\,\|r_k(s_t)\|_2. \tag{39}$$
It then follows from Cauchy's inequality that
$$\text{eq. (39)} \le \sqrt{\sum_{k=1}^{K}|\tilde\theta_{j-1,k}|^2}\,\sqrt{\sum_{k=1}^{K}\|r_k(s_t)\|^2} = \|\tilde\theta_{j-1}\|\,\sqrt{\sum_{k=1}^{K}\|r_k(s_t)\|^2} \sim \|\tilde\theta_{j-1}\|\cdot O(K^{1/2-a}).$$
For $\tilde\theta_j = \Gamma^j\theta$, we have $\|\tilde\theta_j\| = \|\Gamma^j\theta\| \le \|\Gamma^j\|\,\|\theta\|$. Because $\|\theta\| \le \|\phi\|_2 < \infty$, we have $\|\tilde\theta_j\| \sim \|\Gamma^j\|$. As a result,
$$\text{eq. (39)} \le \|\Gamma^j\|\cdot O(K^{1/2-a}).$$
The order of the error term is $O(K^{1/2-a})\cdot\sum_{j=1}^{J}\|\Gamma^j\|$. For any finite $J$, the order is of course $O(K^{1/2-a})$.
We are also interested in the case where $J\to\infty$. Lemma A.1 shows that $\|\Gamma\|\le 1$, with equality only when $r^K(s_t) = 0$ or $\Gamma = 0$. For the two trivial cases: if $\Gamma = 0$, the bound is simply zero; if $r^K(s_t) = 0$, the bound is simply $\|\mathrm E(\rho(s_{t+J})\mid s_t)\|_2 \le \|\rho(s)\|_2 = O(K^{-a})$. If $\|\Gamma\| < 1$, the sum above is simply
$$O(K^{1/2-a})\cdot\sum_{j=1}^{\infty}\|\Gamma^j\| \le O(K^{1/2-a})\cdot\sum_{j=1}^{\infty}\|\Gamma\|^j = O(K^{1/2-a})\,\frac{1}{1-\|\Gamma\|} = O(K^{1/2-a}).$$
So we conclude that the approximation bound is $O(K^{1/2-a})$ regardless of the value of $J$. ∎
Lemma A.1. We have $\|\Gamma\|\le 1$, with equality only when $r^K(s_t) = 0$ or $\Gamma = 0$.

Proof. For any $\theta_0\in\mathbb R^K$, we have
$$\mathrm E(\varphi^K(s_{t+1})'\Gamma\theta_0\mid s_t) = \mathrm E(\varphi^K(s_{t+1})'\mid s_t)\,\Gamma\theta_0 = \varphi^K(s_t)'\Gamma^2\theta_0 + r^K(s_t)'\Gamma\theta_0.$$
Note that $\varphi^K(s) = (\varphi_1(s),\ldots,\varphi_K(s))'$ is orthogonal to $r^K(s) = (r_1(s),\ldots,r_K(s))'$, because $r_k(s) = \sum_{j=K+1}^{\infty}\gamma_{k,j}\varphi_j(s)$. So we have
$$\|\varphi^K(s_t)'\Gamma^2\theta_0 + r^K(s_t)'\Gamma\theta_0\|_2^2 = \|\varphi^K(s_t)'\Gamma^2\theta_0\|_2^2 + \|r^K(s_t)'\Gamma\theta_0\|_2^2,$$
and
$$\|\varphi^K(s_t)'\Gamma^2\theta_0\|_2 \le \|\mathrm E(\varphi^K(s_{t+1})'\Gamma\theta_0\mid s_t)\|_2,$$
with equality only when $r^K(s_t)'\Gamma\theta_0 = 0$. Since $\varphi_1(s),\ldots,\varphi_K(s)$ are orthonormal, we have
$$\|\varphi^K(s_t)'\Gamma^2\theta_0\|_2 = \|\Gamma^2\theta_0\|.$$
We also know
$$\|\mathrm E(\varphi^K(s_{t+1})'\Gamma\theta_0\mid s_t)\|_2 \le \|\varphi^K(s_{t+1})'\Gamma\theta_0\|_2 = \|\Gamma\theta_0\|.$$
We conclude that
$$\|\Gamma^2\theta_0\| \le \|\Gamma\theta_0\|$$
for all $\theta_0\in\mathbb R^K$. Thus $\|\Gamma\|\le 1$, with equality only when $r^K(s_t) = 0$ or $\Gamma = 0$. ∎
Proof of Proposition 2. We write
$$\begin{aligned}
\mathrm E(\phi(s_{t+J})\mid s_t, y_t = y) &= \mathrm E\big(\mathrm E(\phi(s_{t+J})\mid s_{t+1})\mid s_t, y_t = y\big)\\
&= \mathrm E\big(\varphi^K(s_{t+1})'\Gamma^{J-1}\theta + O(K^{1/2-a})\mid s_t, y_t = y\big)\\
&= \mathrm E\big(\varphi^K(s_{t+1})'\Gamma^{J-1}\theta\mid s_t, y_t = y\big) + O(K^{1/2-a}).
\end{aligned}$$
We then write
$$\begin{aligned}
\mathrm E(\varphi^K(s_{t+1})'\Gamma^{J-1}\theta\mid s_t, y_t = y) &= \mathrm E(\varphi^K(s_{t+1})'\mid s_t, y_t)\,\Gamma^{J-1}\theta\\
&= \varphi^K(s_t)'\,\Gamma(y)\,\Gamma^{J-1}\theta + r^K(s_t, y)'\,\Gamma^{J-1}\theta\\
&= \varphi^K(s_t)'\,\Gamma(y)\,\Gamma^{J-1}\theta + O(K^{1/2-a}).
\end{aligned}$$
The last line follows from the proof of Proposition 1. The two displays above together show the result. ∎
B Order of the bias term in infinite horizon stationary Markov decision process
We first have that
$$y_{it} - y_{it,J} = \sum_{\tau=J+1}^{\infty}\beta^{\tau}\,h_{1\tau}(s_{it}),$$
$$v_{it}' - v_{it,J}' = \Big(\sum_{\tau=J+1}^{\infty}\beta^{\tau}h_{3\tau}(s_{it})',\ \sum_{\tau=J+1}^{\infty}\beta^{\tau}h_{2\tau}(s_{it})'\Big).$$
It follows from Assumption 6.(i) that
$$|y_{it} - y_{it,J}| \le \beta^{J+1}\,2c/(1-\beta), \tag{40a}$$
$$\|v_{it} - v_{it,J}\| \le \beta^{J+1}\,\sqrt{d_x}\,c/(1-\beta). \tag{40b}$$
Proposition B.1. Given Assumption 6, we have $\sqrt N(\theta_J - \theta_{J\star}) = O_p(\beta^{J+1})$.

Proof. As an M-estimator, the influence function of $\theta_J$ is
$$v_{it,J\star}\,(y_{it,J\star} - v_{it,J\star}'\theta_{J\star}).$$
For simplicity of exposition, we omit the subscript "$\star$" in $v_{it,J\star}$ and $y_{it,J\star}$ throughout the proof. To prove the proposition, it suffices to show that
$$v_{it,J}\,(y_{it,J} - v_{it,J}'\theta_{J\star}) = O_p(\beta^{J+1}).$$
Because $y_{it} - v_{it}'\theta_\star = 0$, it can be shown that
$$\begin{aligned}
v_{it,J}\,(y_{it,J} - v_{it,J}'\theta_{J\star}) &= v_{it,J}\,(y_{it,J} - v_{it,J}'\theta_{J\star}) - v_{it,J}\,(y_{it} - v_{it}'\theta_\star)\\
&= v_{it,J}\,(y_{it,J} - y_{it}) & (41)\\
&\quad + v_{it,J}\,(v_{it} - v_{it,J})'\theta_{J\star} & (42)\\
&\quad + v_{it,J}\,v_{it}'\,(\theta_\star - \theta_{J\star}). & (43)
\end{aligned}$$
It follows from eq. (40) that eq. (41) and (42) are both $O_p(\beta^{J+1})$. To complete the proof, we only need to show that $\theta_\star - \theta_{J\star} = O_p(\beta^{J+1})$.
We have $\mathrm E(v_{it,J}(y_{it,J} - v_{it,J}'\theta_{J\star})) = 0$, and
$$\mathrm E\big(v_{it,J}\,(y_{it,J} - v_{it,J}'\theta_{J\star} - y_{it} + v_{it}'\theta_\star)\big) = \mathrm E\big(v_{it,J}\,(y_{it,J} - v_{it,J}'\theta_{J\star})\big) = 0.$$
It then can be shown that
$$\mathrm E(v_{it,J}v_{it,J}')\,(\theta_{J\star} - \theta_\star) = \mathrm E\big(v_{it,J}(y_{it,J} - y_{it})\big) - \mathrm E\big(v_{it,J}(v_{it,J} - v_{it})'\big)\,\theta_\star.$$
Applying eq. (40), we have $\theta_\star - \theta_{J\star} = O_p(\beta^{J+1})$. So we conclude that $v_{it,J}(y_{it,J} - v_{it,J}'\theta_{J\star})$ is $O_p(\beta^{J+1})$. ∎
Proposition B.2. Given Assumption 6, we have $\sqrt N(\theta_{J\star} - \theta_\star) = O_p(\beta^{J+1})$.

Proof. We write $\sqrt N(\theta_{J\star} - \theta_\star) = \sqrt N(\theta_{J\star} - \theta_J) + \sqrt N(\theta_J - \theta_\star)$. Proposition B.1 has shown that $\sqrt N(\theta_{J\star} - \theta_J) = O_p(\beta^{J+1})$. Lemma B.1 below shows that $\sqrt N(\theta_J - \theta_\star) = O_p(\beta^{J+1})$. ∎

Lemma B.1. Given Assumption 6, we have $\sqrt N(\theta_J - \theta_\star) = O_p(\beta^{J+1})$.
Proof. For simplicity of exposition, we omit the subscript "$\star$". We decompose $\theta_J - \theta$ into the following sum,
$$(V_J'V_J/N)^{-1}(V_J'Y_J)/N - (V_J'V_J/N)^{-1}V'Y/N \tag{44}$$
$$+\ (V_J'V_J/N)^{-1}V'Y/N - (V'V/N)^{-1}V'Y/N. \tag{45}$$
Below, we derive the orders of eq. (44) and (45).
We first derive the order of eq. (44), which equals
$$\mathrm E(v_{it,J}v_{it,J}')^{-1}\,\big[(V_J'Y_J)/N - V'Y/N\big] + o_p(1),$$
if $(V_J'Y_J)/N - V'Y/N = o_p(1)$, which will be shown. Because Assumption 6.(ii) assumes that the smallest eigenvalue of $\mathrm E(v_{it,J}v_{it,J}')$ is bounded away from zero, it suffices to derive the order of $(V_J'Y_J)/N - V'Y/N$ to find the order of eq. (44). Decompose $(V_J'Y_J)/N - V'Y/N$ into the following sum,
$$(V_J'Y_J - V_J'Y)/N + (V_J'Y - V'Y)/N.$$
To derive the order of $(V_J'Y_J - V_J'Y)/N$, we consider its squared mean and then use the Markov inequality. Recall that $v_{it,J}$ is a $d_v$-dimensional vector. We have
$$\mathrm E\big(\|(V_J'Y_J - V_J'Y)/N\|^2\big) = \sum_{\ell=1}^{d_v}\mathrm E\Big(\Big[N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}v_{\ell,it,J}\,(y_{it,J} - y_{it})\Big]^2\Big).$$
Since $d_v$ is finite, it suffices to consider the order of an individual term of the above sum. Since $T$ is finite, we consider, without loss of generality, the case where $T = 2$. The term $\mathrm E([N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T=2}v_{\ell,it,J}(y_{it,J} - y_{it})]^2)$ equals
$$(2N)^{-1}\,\mathrm E\big([v_{\ell,it,J}(y_{it,J} - y_{it})]^2\big) + (2N)^{-1}\,\mathrm E\big(v_{\ell,i1,J}(y_{i1,J} - y_{i1})\,v_{\ell,i2,J}(y_{i2,J} - y_{i2})\big).$$
Because the regressors $v_{\ell,it,J}$ are bounded and the difference $|y_{it} - y_{it,J}|$ is of order $\beta^{J+1}$, we conclude
$$\mathrm E\Big(\Big[N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T=2}v_{\ell,it,J}(y_{it,J} - y_{it})\Big]^2\Big) = O(N^{-1}\beta^{2(J+1)}).$$
By the Markov inequality, $(V_J'Y_J - V_J'Y)/N = O_p(N^{-1/2}\beta^{J+1})$. Similar arguments show that $(V_J'Y - V'Y)/N = O_p(N^{-1/2}\beta^{J+1})$. Hence we conclude that $(V_J'Y_J - V'Y)/N$, and therefore eq. (44), is $O_p(N^{-1/2}\beta^{J+1})$.
We next derive the order of eq. (45), which can be rewritten as follows,
$$\big[(V_J'V_J/N)^{-1} - (V'V/N)^{-1}\big]\,V'Y/N.$$
The term $V'Y/N = O_p(N^{-1/2})$. Letting
$$D_J \equiv 2V_J'(V - V_J)/N + (V - V_J)'(V - V_J)/N,$$
the term $(V_J'V_J/N)^{-1} - (V'V/N)^{-1}$ equals
$$(V_J'V_J/N)^{-1}\,D_J\,(V_J'V_J/N)^{-1}\,\big[I + D_J(V_J'V_J/N)^{-1}\big]^{-1}.$$
This can be shown by the Woodbury formula of linear algebra. It is easy to see that $D_J = O_p(\beta^{J+1})$; hence the above display is also $O_p(\beta^{J+1})$. We then conclude that eq. (45) is $O_p(N^{-1/2}\beta^{J+1})$. ∎
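The matrix identity invoked at the end of the proof, $S^{-1} - (S+D)^{-1} = S^{-1}DS^{-1}\,[I + DS^{-1}]^{-1}$, can be checked numerically. In the sketch below (our own notation and toy numbers), $S$ plays the role of $V_J'V_J/N$ and $D$ of $D_J$:

```python
def mat_mul(A, B):
    # 2x2 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

def mat_inv(M):
    # Closed-form inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

I = [[1.0, 0.0], [0.0, 1.0]]
S = [[2.0, 0.3], [0.3, 1.5]]    # stands in for V_J' V_J / N
D = [[0.1, 0.05], [0.05, 0.2]]  # stands in for D_J

# Left side: S^{-1} - (S + D)^{-1}.
lhs = mat_sub(mat_inv(S), mat_inv(mat_add(S, D)))

# Right side: S^{-1} D S^{-1} [I + D S^{-1}]^{-1}.
Sinv = mat_inv(S)
rhs = mat_mul(mat_mul(Sinv, mat_mul(D, Sinv)),
              mat_inv(mat_add(I, mat_mul(D, Sinv))))

assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```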
C Order of the bias term in general Markov decision process
Proof of Proposition 3. Let $\rho(s_j) = \tilde m_j(s_j) - \sum_{k=1}^{K}\theta_k\varphi_k(s_j)$ be the residual of the series approximation. Let $R = H - \Phi_K\theta$ denote the vector consisting of $\beta^{j-t}\,\mathrm E(\rho(s_j)\mid s_{it})$ for $j$ and $t$. Here $\Phi_K = (\varphi_1,\ldots,\varphi_K)'$.
We will show that $\|\sqrt N(\theta_K - \theta_\star)\| \le \sqrt{\lambda_{max}}\,\|R\|$, where $\lambda_{max}$ is the largest eigenvalue of $\bar Q^{-1} \equiv (N^{-1}\Phi'W_K\Phi)^{-1}$. Replacing $Y$ in $\theta_K = (\Phi'W_K\Phi)^{-1}\Phi'W_K Y$ with $Y = \Phi\theta_\star + H$, we have $\theta_K - \theta_\star = (\Phi'W_K\Phi)^{-1}\Phi'W_K R$. Then
$$\begin{aligned}
\|\sqrt N(\theta_K - \theta_\star)\|^2 &= \|N^{-1/2}\,\bar Q^{-1}\,\Phi'W_K R\|^2\\
&= N^{-1}\,R'W_K\Phi\,\bar Q^{-1}\bar Q^{-1}\,\Phi'W_K R\\
&= N^{-1}\,R'W_K\Phi\,\bar Q^{-1/2}\,\bar Q^{-1}\,\bar Q^{-1/2}\,\Phi'W_K R\\
&\le \lambda_{max}\cdot N^{-1}\,R'W_K\Phi\,\bar Q^{-1/2}\bar Q^{-1/2}\,\Phi'W_K R\\
&= \lambda_{max}\cdot N^{-1}\,R'W_K\Phi\,\bar Q^{-1}\,\Phi'W_K R\\
&= \lambda_{max}\cdot R'W_K\Phi\,(\Phi'W_K\Phi)^{-1}\,\Phi'W_K R\\
&\le \lambda_{max}\,\|R\|^2.
\end{aligned}$$
We have
$$\|\sqrt N(\theta_K - \theta_\star)\| \le \sqrt{\lambda_{max}}\,\|R\|. \tag{46}$$
By the condition that $\sup_{s\in\mathcal S}\big|\tilde m_j(s_j) - \sum_{k=1}^{K}\theta_k\varphi_k(s)\big| = \sup_{s\in\mathcal S}|\rho(s)| \le O(K^{-a})$, we have $\|R\| = \sqrt N\,O(K^{-a})$. Also, $\sqrt{\lambda_{max}} < \infty$ with probability 1. So we have $\|\theta_K - \theta_\star\| = O_p(K^{-a})$. ∎
D Asymptotic variance of three-step semiparametric M-estimators with generated dependent variables
This appendix derives the asymptotic variance of three-step semiparametric M-estimators with generated dependent variables in the second-step nonparametric regressions. First, we describe the three-step estimator to which our proposed formula applies. Second, we derive the influence function and the asymptotic variance. Third, we derive the asymptotic variance formula when the second-step nonparametric regressions are "partial means" in Newey's (1994b) terminology. The general formula can be applied to our three-step CCP estimator.
Assume that we have a random sample $\{y_i, x_i, z_i : i = 1,\ldots,n\}$ of the random variables $y$, $x$ and $z$. In the first step, we obtain an estimator $\hat m(x,z)$ of $m_\star(x,z) \equiv \mathrm E(y\mid x,z)$ by nonparametric regression of $y$ on $x$ and $z$. The variable $w_\star = g(x, z, m_\star(x,z))$ is the dependent variable in the second-step nonparametric regression. The function $g$ is known.
In the second step, the goal is to estimate
$$h_\star(x; w_\star) \equiv \mathrm E\big(w_\star = g(x, z, m_\star(x,z))\ \big|\ x\big).$$
Obviously, when $g(x,z,m)$ is linear in $m$, the first and second steps can be merged by the law of iterated expectations, so we are interested in the case where $g$ is nonlinear in $m$. Because $m_\star(x,z)$, and hence $w_\star$, is not observable, we use $\hat w_i = g(x_i, z_i, \hat m(x_i, z_i))$ in the nonparametric regression. The nonparametric regression estimator of $\hat w$ on $x$ is denoted by $\hat h(x; \hat w)$.
The third step defines a semiparametric M-estimator $\hat\theta$ that solves a moment equation of the form
$$n^{-1}\sum_{i=1}^{n} q\big(x_i, \theta, \hat h(x_i; \hat w)\big) = 0.$$
Suppose that $\mathrm E(q(x, \theta_\star, h_\star(x; w_\star))) = 0$. Moreover, assume that $q$ depends on $h$ only through its value $h(x)$. Our interest is to characterize the first-order asymptotic properties of $\hat\theta$.
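The three-step structure just described can be sketched in a few lines. The example below is our own illustration, not the paper's estimator: both nonparametric steps use a Nadaraya–Watson regression with a Gaussian kernel, $g(x,z,m) = m^2$ is a nonlinear transform of the first-step fit, and the third-step moment is $q(x,\theta,h) = h(x) - \theta$, so $\hat\theta$ estimates $\mathrm E[(x+z)^2] = 7/6$ when $m_\star(x,z) = x + z$ and $x, z \sim U(0,1)$.

```python
import math
import random

def nw1(q, xs, ys, h):
    """Nadaraya-Watson estimate of E(y | x = q) with a Gaussian kernel."""
    w = [math.exp(-0.5 * ((q - x) / h) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def nw2(qx, qz, xs, zs, ys, h):
    """Bivariate Nadaraya-Watson estimate of E(y | x = qx, z = qz)."""
    w = [math.exp(-0.5 * (((qx - x) ** 2 + (qz - z) ** 2) / h ** 2))
         for x, z in zip(xs, zs)]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

random.seed(0)
n = 500
xs = [random.random() for _ in range(n)]
zs = [random.random() for _ in range(n)]
ys = [x + z + random.gauss(0, 0.1) for x, z in zip(xs, zs)]  # m*(x,z) = x + z

# Step 1: nonparametric regression of y on (x, z).
m_hat = [nw2(x, z, xs, zs, ys, 0.1) for x, z in zip(xs, zs)]

# Generated dependent variable w = g(x, z, m), with g nonlinear in m.
w_hat = [m ** 2 for m in m_hat]                 # g(x, z, m) = m^2

# Step 2: nonparametric regression of the generated w_hat on x alone.
h_hat = [nw1(x, xs, w_hat, 0.1) for x in xs]

# Step 3: M-estimation with q(x, theta, h) = h(x) - theta, so the moment
# equation gives theta_hat = mean of h_hat, an estimate of E[(x + z)^2] = 7/6.
theta_hat = sum(h_hat) / n
assert 0.8 < theta_hat < 1.5   # loose check; smoothing bias is expected
```

Because the first-step fit enters through the square, the smoothing error in $\hat m$ propagates into $\hat h$ and $\hat\theta$ — which is exactly why term (iii) below does not vanish for generated dependent variables.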
We derive the influence function of $\hat\theta$ using the pathwise derivative approach in Newey (1994a, pages 1360–1361). For a path $\{F_\tau\}$ that equals the true distribution of $(y,x,z)$ when $\tau = \tau_\star$, let $\theta(F)$ be the probability limit of $\hat\theta$ if the data were generated from $F$. We have
$$\mathrm E_\tau\big(q(x, \theta(F_\tau), h_\tau(x; w_\tau))\big) = 0,$$
where $w_\tau = g(x, z, m_\tau(x,z))$, $m_\tau = \mathrm E_\tau(y\mid x,z)$, and $h_\tau(x; w_\tau) = \mathrm E_\tau(w_\tau\mid x)$. Let $M \equiv \partial\,\mathrm E(q(x,\theta,h_\star(x;w_\star)))/\partial\theta\,\big|_{\theta=\theta_\star}$, and let $S(y,x,z) = \partial\ln(\mathrm d F_\tau)/\partial\tau$ be the score function (the derivatives with respect to $\tau$ are always evaluated at $\tau = \tau_\star$). For expositional simplicity, we write $\partial f(x_\star)/\partial x = \partial f(x)/\partial x|_{x=x_\star}$ for a generic function $f(x)$.
It can be shown by Newey's pathwise derivative approach that $-M\,\partial\theta(F_\tau)/\partial\tau$ equals the sum of the following three terms,
$$\mathrm E\big(q(x, \theta_\star, h_\star(x; w_\star))\,S(y,x,z)\big), \tag{i}$$
$$\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\,(w_\star - h_\star(x))\,S(y,x,z)\Big), \tag{ii}$$
$$\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\,\frac{\partial g(x, z, m_\star(x,z))}{\partial m}\,(y - m_\star(x,z))\,S(y,x,z)\Big). \tag{iii}$$
Here term (i) corresponds to the leading term that captures the uncertainty that remains if we know $h_\star$ and $m_\star$, and hence $w_\star$. Term (ii) accounts for the sampling variation in estimating $\mathrm E(w_\star\mid x)$ when $w_\star$ is known. Term (iii) reflects the sampling variation in $\hat m$. It is easy to derive term (ii) from Newey (1994a, pages 1360–1361). Term (iii) needs more explanation.
Term (iii) is derived from the pathwise derivative
$$\partial\,\mathrm E\big(q(x, \theta_\star, h_\star(x; w_\tau))\big)/\partial\tau. \tag{47}$$
We have
$$\begin{aligned}
\text{eq. (47)} &= \partial\,\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\Big[\frac{\partial h_\star(x; w)}{\partial w}\,\frac{\partial g(x, z, m_\star(x,z))}{\partial m}\,m_\tau\Big]\Big)\Big/\partial\tau\\
&= \partial\,\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\,\mathrm E\Big(\frac{\partial g(x, z, m_\star(x,z))}{\partial m}\,m_\tau\ \Big|\ x\Big)\Big)\Big/\partial\tau\\
&= \partial\,\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\,\frac{\partial g(x, z, m_\star(x,z))}{\partial m}\,m_\tau\Big)\Big/\partial\tau. \tag{48}
\end{aligned}$$
The second line follows from the fact that $h_\star(x; w) = \mathrm E(w\mid x)$ is a linear functional, so its Fréchet derivative is itself. The last line follows from the law of iterated expectations. Since $m_\tau = \mathrm E_\tau(y\mid x,z)$ is a projection, term (iii) can be derived from equation (4.5) of Newey (1994a).
From terms (i), (ii) and (iii), we conclude that the influence function of the three-step estimator $\hat\theta$ is
$$-M^{-1}\Big[\,q(x, \theta_\star, h_\star(x; w_\star)) + \frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\,(w_\star - h_\star(x)) + \frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\,\frac{\partial g(x, z, m_\star(x,z))}{\partial m}\,(y - m_\star(x,z))\,\Big].$$
Remark 4 (Difference between generated regressors and generated dependent variables). Hahn and Ridder (2013) derive the asymptotic variance of semiparametric estimators with generated regressors. The estimation there also proceeds in three steps. The estimator in the first step is used to generate regressors for the nonparametric regression in the second step. An interesting conclusion of their paper is that when the third-step estimator is linear in the conditional mean estimated in the second step, the sampling variation in the first-step estimation does not affect the asymptotic variance of the three-step estimator. However, it is easy to see from term (iii) here that if the first step is used to generate dependent variables rather than regressors, the sampling variation in the first step will always affect the asymptotic variance of the three-step estimator. ∎
We next consider the case where the second-step nonparametric regression involves partial means instead of full means. In particular, suppose there is a discrete variable $d$ taking values $1,\ldots,K$. In our dynamic discrete choice model, $d$ is $y$ itself. The second step is to estimate
$$h_{k\star}(x; w_\star) = \mathrm E\big(w_\star = g(x, z, m_\star(x,z))\ \big|\ x, d = k\big).$$
The leading term does not change. Using the following identity,
$$h_{k\star}(x; w_\star) = \mathrm E\big(w_\star = g(x, z, m_\star(x,z))\ \big|\ x, d = k\big) = \mathrm E\Big(\frac{1(d = k)\,w_\star}{\Pr(d = k\mid x)}\ \Big|\ x\Big),$$
we can show that the second term now becomes
$$\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\Big(\frac{1(d = k)\,w_\star}{\Pr(d = k\mid x)} - h_{k\star}(x)\Big)\,S(y,x,z)\Big).$$
For the third term, we have
$$\begin{aligned}
\text{eq. (47)} &= \partial\,\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\Big[\frac{\partial h_\star(x; w)}{\partial w}\,\frac{\partial g(x, z, m_\star(x,z))}{\partial m}\,m_\tau\Big]\Big)\Big/\partial\tau\\
&= \partial\,\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\,\mathrm E\Big(\frac{\partial g(x, z, m_\star(x,z))}{\partial m}\,m_\tau\ \Big|\ x, d = k\Big)\Big)\Big/\partial\tau\\
&= \partial\,\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\,\frac{1(d = k)}{\Pr(d = k\mid x)}\,\frac{\partial g(x, z, m_\star(x,z))}{\partial m}\,m_\tau\Big)\Big/\partial\tau.
\end{aligned}$$
Hence the third adjustment term of the influence function is
$$\mathrm E\Big(\frac{\partial q(x, \theta_\star, h_\star(x; w_\star))}{\partial h}\,\frac{1(d = k)}{\Pr(d = k\mid x)}\,\frac{\partial g(x, z, m_\star(x,z))}{\partial m}\,(y - m_\star(x,z))\,S(y,x,z)\Big).$$
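The partial-mean identity used above can be checked on a small discrete example. The joint distribution below is our own toy construction, chosen only for illustration:

```python
# A toy joint distribution over (x, z, d): x, z independent uniform on {0, 1},
# d drawn given x alone. We verify the partial-mean identity
#   E(w | x, d = k) = E( 1(d = k) * w / P(d = k | x) | x ).
def p_d_given_x(k, x):
    p1 = 0.3 if x == 0 else 0.7            # P(d = 1 | x); our own toy numbers
    return p1 if k == 1 else 1.0 - p1

def w(x, z):
    return (x + z) ** 2                    # an arbitrary function of (x, z)

for x in (0, 1):
    for k in (1, 2):
        # Left side: since d is independent of z given x in this example,
        # E(w | x, d = k) is just the average of w(x, z) over z.
        lhs = sum(0.5 * w(x, z) for z in (0, 1))
        # Right side: average of 1(d = k) w / P(d = k | x) over (z, d) given x.
        rhs = sum(0.5 * p_d_given_x(d, x) * (d == k) * w(x, z) / p_d_given_x(k, x)
                  for z in (0, 1) for d in (1, 2))
        assert abs(lhs - rhs) < 1e-12
```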
References
Aguirregabiria, V. (2010). Another look at the identification of dynamic discrete decision processes: An application to retirement behavior. Journal of Business & Economic Statistics 28(2), 201–218.

Aguirregabiria, V. and P. Mira (2002). Swapping the nested fixed point algorithm: A class of estimators for discrete Markov decision models. Econometrica 70(4), 1519–1543.

Aguirregabiria, V. and P. Mira (2010). Dynamic discrete choice structural models: A survey. Journal of Econometrics 156(1), 38–67.

Altuğ, S. and R. A. Miller (1998). The effect of work experience on female wages and labour supply. The Review of Economic Studies 65(1), 45–85.

Arcidiacono, P. and R. A. Miller (2011). Conditional choice probability estimation of dynamic discrete choice models with unobserved heterogeneity. Econometrica 79(6), 1823–1867.

Arcidiacono, P. and R. A. Miller (2016). Nonstationary dynamic models with finite dependence. Technical report, Mimeo.

Blundell, R., M. Costa Dias, C. Meghir, and J. Shaw (2016). Female labor supply, human capital, and welfare reform. Econometrica 84(5), 1705–1753.

Chou, C. (2016). Identification and linear estimation of general dynamic programming discrete choice models. Mimeo, University of Leicester, Leicester.

Donald, S. G. and W. K. Newey (1994). Series estimation of semilinear models. Journal of Multivariate Analysis 50(1), 30–40.

Eckstein, Z. and K. I. Wolpin (1989). Dynamic labour force participation of married women and endogenous work experience. The Review of Economic Studies 56(3), 375–390.

Fang, H. and Y. Wang (2015). Estimating dynamic discrete choice models with hyperbolic discounting, with an application to mammography decisions. International Economic Review 56(2), 565–596.

Hahn, J. and G. Ridder (2013). Asymptotic variance of semiparametric estimators with generated regressors. Econometrica 81(1), 315–340.

Heckman, J. J., L. J. Lochner, and P. E. Todd (2003). Fifty years of Mincer earnings regressions. Working Paper 9732, National Bureau of Economic Research.

Helmberg, G. (2008). Introduction to Spectral Theory in Hilbert Space. Mineola, N.Y.: Dover Publications, Inc.

Hotz, V. J. and R. A. Miller (1993). Conditional choice probabilities and the estimation of dynamic models. The Review of Economic Studies 60(3), 497–529.

Hotz, V. J., R. A. Miller, S. Sanders, and J. Smith (1994). A simulation estimator for dynamic models of discrete choice. The Review of Economic Studies 61(2), 265–289.

Huggett, M., G. Ventura, and A. Yaron (2011). Sources of lifetime inequality. The American Economic Review 101(7), 2923–2954.

Kalouptsidi, M., P. T. Scott, and E. Souza-Rodrigues (2015). Identification of counterfactuals and payoffs in dynamic discrete choice with an application to land use. Working paper, NBER.

Kristensen, D., L. Nesheim, and A. de Paula (2014). CCP and the estimation of nonseparable dynamic models. Mimeo, University College London.

Magnac, T. and D. Thesmar (2002). Identifying dynamic discrete decision processes. Econometrica 70(2), 801–816.

Matzkin, R. L. (1992). Nonparametric and distribution-free estimation of the binary threshold crossing and the binary choice models. Econometrica, 239–270.

Miller, R. A. (1997). Estimating models of dynamic optimization with microeconomic data. In M. H. Pesaran and P. Schmidt (Eds.), Handbook of Applied Econometrics, Volume 2, Chapter 6, pp. 246–299. Blackwell, Oxford.

Newey, W. K. (1994a). The asymptotic variance of semiparametric estimators. Econometrica, 1349–1382.

Newey, W. K. (1994b). Kernel estimation of partial means and a general variance estimator. Econometric Theory 10(02), 1–21.

Norets, A. and X. Tang (2014). Semiparametric inference in dynamic binary choice models. The Review of Economic Studies 81, 1229–1262.

Pesendorfer, M. and P. Schmidt-Dengler (2008). Asymptotic least squares estimators for dynamic games. The Review of Economic Studies 75(3), 901–928.

Srisuma, S. and O. Linton (2012). Semiparametric estimation of Markov decision processes with continuous state space. Journal of Econometrics 166(2), 320–341.

Wasserman, L. (2006). All of Nonparametric Statistics. New York: Springer.