Three-Step CCP Estimation of Dynamic Programming Discrete Choice Models with Large State Space

Cheng Chou (University of Leicester)
Geert Ridder (University of Southern California)

March 30, 2017
Abstract
The existing estimation methods for structural dynamic discrete choice models mostly require knowing or estimating the state transition distributions. When some or all state variables are continuous and/or the dimension of the vector of state variables is moderately large, the estimation of the state transition distributions becomes difficult and has to rely on tight distributional assumptions. We show that the state transition distributions are in fact not necessary for identifying and estimating the flow utility functions in the presence of panel data and excluded variables. A state variable is called an excluded variable if it does not affect the flow utility but affects the choice probability through the future payoff of the current choice. We propose a new estimator of the flow utility functions that does not require estimating or specifying the state transition law or solving agents' dynamic programming problems. The estimator can be applied both to infinite horizon stationary models and to general dynamic discrete choice models with time varying flow utility functions and state transition laws.
JEL Classification: C14, C23, C35, J24
Keywords: Semiparametric estimation, dynamic discrete choice, continuous state variables,
large state space
1 Introduction
We extend the two-step Conditional Choice Probability (CCP) estimator of Hotz and Miller (1993, HM93 hereafter) to a three-step CCP estimator. We first explain HM93's two-step method to make our motivation clear. The CCP is a function of the difference between the expected payoffs of the alternatives. In a dynamic programming discrete choice model, the expected payoff of an alternative equals the sum of the expected flow utility and the conditional valuation function of that alternative. Both are functions of the current observed state $s_t$ and choice $a_t$. The expected flow utility functions are usually parameterized. The difficulty is the conditional valuation function, which is the expected remaining lifetime utility from period $t$ onwards given the current state $s_t$ and choice $a_t$. The key observation in HM93 is that under certain conditions the conditional valuation functions are explicit functions of all future CCP, future expected flow utility functions, and future state transition distributions.^1 So the first point is that the CCP in period $t$ can be expressed as a function of all CCP, flow utility functions, and state transition distributions beyond period $t$. Moreover, the CCP of choosing alternative $a$ in period $t$ is also the conditional mean, given the observed state variables, of the dummy variable that equals 1 when alternative $a$ was chosen in period $t$. So the second point is that after parameterizing the flow utility functions and knowing all CCP and state transition distributions, we can set up moment conditions for the unknown parameters of the flow utility functions using the conditional mean interpretation of the CCP. Based on these two points, the first step of HM93's two-step estimator is to estimate the CCP and state transition distributions nonparametrically. Plugging these estimates into the moment conditions defined by the CCP formula, the second step estimates the unknown parameters of the flow utility functions by the generalized method of moments (GMM) estimator.
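To illustrate the inversion from CCP to payoff differences that underlies this logic, consider a minimal static toy with an assumed logistic shock distribution (this is a hedged sketch of the mapping, not the dynamic estimator itself; the payoff difference 0.5 and the sample size are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Static binary logit toy: with additive type-1 extreme value shocks the
# CCP is p = G(v1 - v0), with G the logistic CDF, so a payoff difference
# can be recovered by inverting a frequency estimate of the CCP.
v_diff = 0.5                                  # assumed true payoff difference
p_true = 1.0 / (1.0 + np.exp(-v_diff))        # implied CCP
a = rng.random(200_000) < p_true              # simulated choices
p_hat = a.mean()                              # nonparametric CCP estimate
v_diff_hat = np.log(p_hat / (1.0 - p_hat))    # Hotz-Miller style inversion
print(p_hat, v_diff_hat)
```

In the dynamic model the inverted quantity is the difference of choice-specific value functions rather than of flow utilities alone, which is why the conditional valuation function must be handled.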
The motivation for this paper is that when the dimension of the vector of observable state variables $s_t$ is even moderately large, the nonparametric estimation of the state transition distributions $F(s_{t+1} \mid s_t)$ in the first step of HM93 becomes difficult. The presence of continuous state variables only makes the task harder. The same difficulty appears in the other estimators following HM93, including Hotz, Miller, Sanders, and Smith (1994); Aguirregabiria and Mira (2002); Pesendorfer and Schmidt-Dengler (2008); Arcidiacono and Miller (2011); Srisuma and Linton (2012); Arcidiacono and Miller (2016), whose implementation requires the state transition distributions. For a large state space, we have to grid the state space coarsely, or impose various conditional independence restrictions and parametric specifications of the state transition distributions. The aim is to obtain the state transition distributions somehow.

^1 Two main conditions for this conclusion are that the flow utility functions are additive in the utility shocks, and that the utility shocks are serially uncorrelated. Arcidiacono and Miller (2011) address serially correlated unobserved heterogeneity by including finite types in the model.
The main point of this paper is that, with panel data and excluded variables, it is not necessary to estimate the state transition distributions in order to estimate the expected flow utility functions, by using HM93's original result. A state variable is called an excluded variable if it does not affect the flow utility but affects future payoffs. Let $p_t(s_t)$ be the vector of CCP in period $t$ and $u(a_t, s_t; \theta_t)$ be the expected flow utility functions, known up to a finite-dimensional $\theta_t$. HM93 shows that given the observed state $s_t$, the expected optimal flow utility in period $t$, denoted by $U_t^o$, is an explicit function of the CCP and the expected flow utility functions at time $t$. Denote this function by
$$U_t^o(p_t(s_t), u(a_t, s_t; \theta_t)).$$
For example, if the utility shocks are independent and follow the type-1 extreme value distribution, it follows from HM93 that
$$U_t^o = \sum_{a \in A} \Pr(a_t = a \mid s_t)\big[u(a_t = a, s_t; \theta_t) - \ln \Pr(a_t = a \mid s_t)\big] + 0.5772, \qquad (1)$$
where $A$ is the choice set. Under the assumptions of HM93, the conditional valuation function of choosing $a_t$ at time $t$ equals
$$\sum_{r=t+1}^{T_*} \beta^{r-t}\, \mathrm{E}\big(U_r^o(p_r(s_r), u(a_r, s_r; \theta_r)) \mid s_t, a_t\big), \qquad (2)$$
where ๐›ฝ is the discount factor, and ๐‘‡โˆ— โ‰ค โˆž is the last decision period. HM93 needs the
state transition distributions to evaluate the conditional expectations from period ๐‘ก+1 to ๐‘‡โˆ— .
Hotz, Miller, Sanders, and Smith (1994) needs the state transition distributions to simulate
future states and choices to evaluate these conditional expectations. Our idea is that since
๐‘ˆ๐‘ก๐‘œ can be explicitly written a function of CCP and flow utility functions, e.g. eq. (1),
we might directly estimate the conditional expectation terms in eq. (2) by nonparametric
regressions after estimating CCP and parameterizing flow utility functions if we have panel
data. More concretely, take eq. (1) as an example and assume ๐‘ ๐‘ก is a scalar. Letting
๐‘ข(๐‘Ž๐‘ก = ๐‘Ž, ๐‘ ๐‘ก ; ๐œƒ๐‘ก ) = ๐‘ ๐‘ก ๐œƒ๐‘ก (๐‘Ž), to estimate the conditional valuation function of eq. (2), it is
only necessary to estimate the following terms
E(Pr(๐‘Ž๐‘Ÿ = ๐‘Ž | ๐‘ ๐‘Ÿ )๐‘ ๐‘Ÿ โˆฃ ๐‘ ๐‘ก , ๐‘Ž๐‘ก )
and
E(Pr(๐‘Ž๐‘Ÿ = ๐‘Ž | ๐‘ ๐‘Ÿ ) ln Pr(๐‘Ž๐‘Ÿ = ๐‘Ž | ๐‘ ๐‘Ÿ ) โˆฃ ๐‘ ๐‘ก , ๐‘Ž๐‘ก ) (3)
for each period ๐‘Ÿ = ๐‘ก + 1, โ€ฆ , ๐‘‡โˆ— . Why would it help? It helps because estimating ๐น (๐‘ ๐‘ก+1 |๐‘ ๐‘ก )
involves the dimension of 2โ‹…dim ๐‘ ๐‘ก , while the nonparametric regressions, like eq. (3), involves
a problem of dimension $\dim s_t + 1$. In addition, we have abundant resources for dealing with the curse of dimensionality in nonparametric regressions.
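To fix ideas, the following sketch simulates a toy panel, estimates the period-$r$ CCP nonparametrically, and then regresses the generated dependent variables of eq. (3) on the current state within each choice cell. Everything here is an assumption of the illustration: the transition coefficients, the bandwidths, and the helper `nw` (a plain Nadaraya-Watson smoother written for the example, not a library function):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

def nw(x_eval, x, y, h):
    """Nadaraya-Watson (Gaussian kernel) regression of y on scalar x."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

# Toy panel: scalar state s_t, binary choice a_t, next-period state s_r.
s_t = rng.normal(size=n)
a_t = (rng.random(n) < 1.0 / (1.0 + np.exp(-s_t))).astype(float)
s_r = 0.6 * s_t + 0.4 * a_t + 0.5 * rng.normal(size=n)
a_r = (rng.random(n) < 1.0 / (1.0 + np.exp(-s_r))).astype(float)

# First estimate the period-r CCP nonparametrically ...
p_hat = np.clip(nw(s_r, s_r, a_r, h=0.3), 1e-3, 1 - 1e-3)

# ... then regress the generated dependent variables of eq. (3) on
# (s_t, a_t): one regression per choice cell a_t = a, so each
# regression has dimension dim(s_t), not 2*dim(s_t).
y1 = p_hat * s_r                # Pr(a_r = 1 | s_r) * s_r
y2 = p_hat * np.log(p_hat)      # Pr(a_r = 1 | s_r) * ln Pr(a_r = 1 | s_r)
grid = np.array([0.0])          # evaluate the regressions at s_t = 0
out = {}
for a in (0.0, 1.0):
    cell = a_t == a
    out[a] = (float(nw(grid, s_t[cell], y1[cell], h=0.3)[0]),
              float(nw(grid, s_t[cell], y2[cell], h=0.3)[0]))
print(out)
```

The dependent variables are *generated* because they involve the first-step estimate $\hat p_r$; the asymptotic theory later in the paper accounts for this.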
Careful readers may now raise two questions. First, for the infinite horizon stationary dynamic programming problem, in which $T_* = \infty$, do we have to estimate an infinite number of nonparametric regressions, which is infeasible given that any practical panel data set has finitely many sampling periods? Second, for the finite horizon problem ($T_* < \infty$), do we need panel data covering agents' entire decision horizon in order to estimate eq. (3) for each period $r = t+1, \dots, T_*$? The answer is "no" to both questions. For the infinite horizon stationary problem, we require the panel data to cover two consecutive decision periods. For the finite horizon problem, we require the panel data to cover four consecutive decision periods. The two-period requirement for the infinite horizon stationary problem is easier to understand. In an infinite horizon stationary problem, neither the CCP nor the state transition distributions are time varying. The time-invariant CCP can be estimated from a single period of data, and two periods of data are enough for estimating the state transition distributions. Once the time-invariant CCP and state transition distributions are known, in principle it is possible to calculate conditional expectations like eq. (3). The four-period requirement for the finite horizon problem is more subtle. In this paper, the identification of the model is achieved by using the Exclusion Restriction proposed by Chou (2016). The Exclusion Restriction says that there are excluded variables that do not affect the immediate payoff of the discrete choice but affect future payoffs. One useful conclusion of Chou (2016) is that the value function in the last sampling period of the panel data is nonparametrically identified when the panel data cover at least four consecutive decision periods. This is useful because we can write the conditional valuation function in eq. (2) for the finite horizon problem as follows,
$$\sum_{r=t+1}^{T-1} \beta^{r-t}\, \mathrm{E}\big(U_r^o(p_r(s_r), u(a_r, s_r; \theta_r)) \mid s_t, a_t\big) + \beta^{T-t}\, \mathrm{E}\big(\bar{V}_T(s_T) \mid s_t, a_t\big),$$
where $T$ is the last decision period in the sample, and $\bar{V}_T(s_T)$ is the value function in period $T$ after integrating out the unobserved utility shocks. The nonparametric regressions involved in the first summation term pose no problem. The second term is not a problem either, because we can express $\bar{V}_T(s_T)$ as a series expansion whose coefficients are identifiable.
Our three-step extension of HM93 proceeds as follows. The first step is to estimate the CCP nonparametrically. The second step is to estimate the conditional valuation function by nonparametric regressions of generated dependent variables, which are the terms in the expression of $U_t^o(\hat{p}_t(s_t), u(a_t, s_t; \theta_t))$, on the current state and choice $(s_t, a_t)$. The third step is GMM estimation of the parameters $\theta_t$ of the flow utility functions. Although the exact moment conditions that we use in the third step differ from HM93, the essential difference between our three-step estimator and the original two-step procedure of HM93 is that we replace the nonparametric estimation of the state transition distributions in HM93 with nonparametric regressions with generated dependent variables. This is complementary to the literature when the dimension of the state variables is large. The additional cost is that for the finite horizon problem, we need excluded variables in order to identify the value function in the last sampling period. Excluded variables are also needed to identify the infinite horizon stationary dynamic programming discrete choice model.^2
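The three steps can be sketched on a deliberately small two-period logit toy ($t = 1, 2$, with period 2 terminal). This is a hedged schematic under strong simplifying assumptions, not the paper's implementation: it imposes $u_t(a_t = 0, x_t) = 0$ everywhere (the strong normalization the paper's actual estimator avoids via the Exclusion Restriction), fixes all transition coefficients, bandwidths, and the inner Monte Carlo, and uses a hand-rolled smoother `nw`. The variable $z_1$ plays the role of the excluded variable: it shifts the state transition but not the flow utility.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta, theta = 2000, 0.9, 1.0                 # sample size, discount factor, true slope
rho, delta, kappa, sig = 0.6, 0.5, 0.8, 0.5     # assumed transition coefficients

def nw(Xe, X, y, h):
    """Nadaraya-Watson regression of y on X (rows = observations)."""
    d2 = (((Xe[:, None, :] - X[None, :, :]) / h) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

# --- Simulate the two-period model; z1 is excluded from flow utility ---
x1, z1 = rng.normal(size=n), rng.normal(size=n)
u_draws = rng.normal(size=(1, 200))             # inner MC for the agent's expectation
def EV2(x, z, a):                               # E[Vbar_2(x_2) | s_1, a_1 = a], logit surplus
    x2 = rho * x[:, None] + delta * a + kappa * z[:, None] + sig * u_draws
    return np.log1p(np.exp(theta * x2)).mean(axis=1)
vdiff1 = theta * x1 + beta * (EV2(x1, z1, 1) - EV2(x1, z1, 0))
a1 = (rng.logistic(size=n) < vdiff1).astype(float)
x2 = rho * x1 + delta * a1 + kappa * z1 + sig * rng.normal(size=n)
a2 = (rng.logistic(size=n) < theta * x2).astype(float)

# --- Step 1: nonparametric CCP estimates ---
S1 = np.column_stack([x1, z1])
p1 = np.clip(nw(S1, S1, a1, 0.35), 1e-3, 1 - 1e-3)
p2 = np.clip(nw(x2[:, None], x2[:, None], a2, 0.25), 1e-3, 1 - 1e-3)

# --- Step 2: regress generated dependent variables on (s_1, a_1) ---
g_x = p2 * x2                                         # multiplies theta in U_2^o
g_e = -(p2 * np.log(p2) + (1 - p2) * np.log1p(-p2))   # entropy part of U_2^o
mx, me = {}, {}
for a in (0.0, 1.0):
    c = a1 == a
    mx[a] = nw(S1, S1[c], g_x[c], 0.35)
    me[a] = nw(S1, S1[c], g_e[c], 0.35)

# --- Step 3: phi(p_1) = theta*x_1 + beta*(diff of step-2 fits) is linear
# in theta here, so GMM reduces to least squares through the origin.
y = np.log(p1 / (1 - p1)) - beta * (me[1.0] - me[0.0])
X = x1 + beta * (mx[1.0] - mx[0.0])
theta_hat = float(X @ y / (X @ X))
print(theta_hat)
```

No transition distribution is estimated anywhere: the conditional valuation terms enter only through the step-2 regressions on $(s_1, a_1)$.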
Since HM93, a series of CCP estimation papers (Altuğ and Miller, 1998; Arcidiacono and Miller, 2011, 2016) have used the concept of "finite dependence" to simplify the CCP estimator. Given the "finite dependence" property, such as a terminal or renewal action, the current choice will not alter the distribution of future states after a certain number of periods. Therefore, the conditional valuation function depends only on the CCP and flow utility functions a small number of periods ahead. This can substantially simplify the CCP estimation. The "finite dependence" property can also be used to simplify our three-step estimation, because we are essentially also estimating the conditional valuation functions. We do not pursue this direction in the present paper.
To derive the asymptotic variance of the proposed estimator, we derive general formulas for the asymptotic variance of three-step semiparametric M-estimators whose second step consists of nonparametric regressions with generated dependent variables. It turns out that, unlike for three-step semiparametric estimators with generated regressors (Hahn and Ridder, 2013), the sampling error of the first-step estimation always affects the influence function of three-step semiparametric estimators with generated dependent variables.
The rest of the paper is organized as follows. The dynamic discrete choice model is described in section 2. Sections 3 and 4 present the three-step estimation of the infinite horizon stationary model and of the general nonstationary model, respectively. Section 5 is a numerical study of the proposed three-step CCP estimator. Section 6 concludes the paper. Some technical details are included in Appendices A, B, C, and D.
^2 One way to identify the infinite horizon stationary model without using excluded variables is to impose the normalization that sets the flow utility function of a reference alternative to zero for all state values. This normalization is problematic for counterfactual analysis, as shown in our discussion after assumption 4 of our model. A parametric specification of the flow utility function might identify the model in some cases. For example, in the numerical example of Srisuma and Linton (2012), an infinite horizon stationary model is estimated without using excluded variables or the normalization. They did not address the identification issue, though their Monte Carlo experiments suggest the parameters are identified.
2 Dynamic Programming Discrete Choice Model
First, we set up the model. Second, we describe the data required for our estimator. Third, we briefly discuss the identification and conclude with the moment conditions that are the basis of our three-step estimator.
2.1 The Model
We focus on the binary choice case. For period $t$, let $\Omega_t$ be the vector of state variables that could be relevant to the current and future choices or utilities, apart from time itself. In each period $t$, an agent makes a binary choice $a_t \in A \equiv \{0, 1\}$ based on $\Omega_t$. The choice $a_t$ affects both the agent's flow utility in period $t$ and the distribution of the next period's state variables $\Omega_{t+1}$. The vector $\Omega_t$ is completely observable to the agent in period $t$ but only partly observable to econometricians. Let $\Omega_t \equiv (x_t, z_t, \varepsilon_t)$. Econometricians observe only $x_t$ and $z_t$; the vector $\varepsilon_t$ denotes the unobservable utility shocks. Also denote by $s_t \equiv (x_t, z_t)$ the observable state variables. Assumption 1 assumes that $\Omega_t$ is a controlled first-order Markov process, which is standard in the literature. Assumption 2 has two parts: (i) the flow utility is additive in $\varepsilon_t$; (ii) given $x_t$, $z_t$ does not affect the flow utility (the Exclusion Restriction). Additivity in the utility shocks is usually maintained in the literature, with the notable exception of Kristensen, Nesheim, and de Paula (2014). We will use the Exclusion Restriction to identify the model. Exclusion restrictions exist in the applied literature, e.g. Fang and Wang (2015) and Blundell, Costa Dias, Meghir, and Shaw (2016). We follow the notation of the survey paper by Aguirregabiria and Mira (2010) as much as possible. The first difference is that, due to the presence of the excluded state variables $z_t$, we use $\Omega_t \equiv (s_t, \varepsilon_t)$ and $s_t \equiv (x_t, z_t)$ to denote the vector of all state variables and the vector of observable state variables, respectively.^3
Assumption 1. $\Pr(\Omega_{t+1} \mid \Omega_t, a_t, \Omega_{t-1}, a_{t-1}, \dots) = \Pr(\Omega_{t+1} \mid \Omega_t, a_t)$.
Assumption 2. The agent receives flow utility $U_t(a_t, \Omega_t)$ in period $t$. Letting $\varepsilon_t \equiv (\varepsilon_t(0), \varepsilon_t(1))'$, assume
$$U_t(a_t, \Omega_t) = u_t(a_t, x_t) + \varepsilon_t(a_t).$$
The flow utility $u_t(a_t, x_t)$ in assumption 2 has a subscript "$t$". This is somewhat uncommon in the literature. By adding the subscript "$t$", we allow the flow utility to depend on the decision period, which is "age" in many applications in labor economics.
^3 Aguirregabiria and Mira (2010) used $s_t$ and $x_t$ to denote the vector of all state variables and the vector of observable state variables, respectively.
Let $T_* \leq \infty$ be the last decision period. In each period $t$, the agent makes a sequence of choices $\{a_t, \dots, a_{T_*}\}$ to maximize the expected discounted remaining lifetime utility,
$$\sum_{r=t}^{T_*} \beta^{r-t}\, \mathrm{E}\big(U_r(a_r, \Omega_r) \mid a_t, \Omega_t\big),$$
where $0 \leq \beta < 1$ is the discount factor.^4 The agent's problem is a Markov decision process, which can be solved by dynamic programming. Let $V_t(\Omega_t)$ be the optimal value function, which solves Bellman's equation,
$$V_t(\Omega_t) = \max_{a \in A}\; u_t(a_t = a, x_t) + \varepsilon_t(a) + \beta\, \mathrm{E}\big(V_{t+1}(\Omega_{t+1}) \mid s_t, \varepsilon_t, a_t = a\big). \qquad (4)$$
Accordingly, if the agent follows the optimal decision rule, the observed choice $a_t$ satisfies
$$a_t = \arg\max_{a \in A}\; u_t(a_t = a, x_t) + \beta\, \mathrm{E}\big(V_{t+1}(\Omega_{t+1}) \mid s_t, \varepsilon_t, a_t = a\big) + \varepsilon_t(a). \qquad (5)$$
Equation (5) is similar to the data generating process of a static binary choice model, except for the additional term $\beta\, \mathrm{E}(V_{t+1}(\Omega_{t+1}) \mid s_t, \varepsilon_t, a_t = a)$ (the conditional valuation function associated with choosing $a$ in period $t$, in HM93's terminology) in the "expected payoff" of alternative $a$. Without further restrictions, the conditional valuation function is not separable from the unobserved $\varepsilon_t$. To avoid dealing with non-separable models, we make the following assumption, which ensures
$$\mathrm{E}\big(V_{t+1}(\Omega_{t+1}) \mid s_t, \varepsilon_t, a_t\big) = \mathrm{E}\big(\bar{V}_{t+1}(s_{t+1}) \mid s_t, a_t\big), \qquad (6)$$
where
$$\bar{V}_{t+1}(s_{t+1}) \equiv \mathrm{E}\big(V_{t+1}(s_{t+1}, \varepsilon_{t+1}) \mid s_{t+1}\big). \qquad (7)$$
Assumption 3. (i) $(s_{t+1}, \varepsilon_{t+1}) \perp \varepsilon_t \mid s_t, a_t$; (ii) $s_{t+1} \perp \varepsilon_{t+1} \mid s_t, a_t$; and (iii) $\varepsilon_{t+1} \perp (s_t, a_t)$.
This assumption is also common in the literature. Equation (6) implies that the agent's expectation of her optimal future value depends only on the observable state variables $s_t$ and the choice $a_t$. This could be restrictive in some applications; e.g. it excludes a fixed effect that affects the flow utility, the state transition, or both. In their research on the identification of the dynamic programming discrete choice model, Magnac and Thesmar (2002) show the identification of the flow utility allowing for a fixed effect that could affect both the flow utility and the state transition. One of their conditions is that the flow utility and the conditional valuation function of one reference alternative equal zero. In our notation, taking "0" as the reference alternative, they assume $u_t(a_t = 0, x_t) = 0$ and $\mathrm{E}(\bar{V}_{t+1}(s_{t+1}) \mid s_t, a_t = 0) = 0$. Such a condition turns out to be too restrictive in some applications, as we discuss further after assumption 4.

^4 In general, we can allow for a time varying discount factor $\beta_t$, but for simplicity of exposition we do not consider such an extension here.
Using the simplification of eq. (6), the observed discrete choice $a_t$ satisfies
$$a_t = \mathbf{1}\big(\varepsilon_t(0) - \varepsilon_t(1) < v_t(a_t = 1, s_t) - v_t(a_t = 0, s_t)\big),$$
where
$$v_t(a_t, s_t) = u_t(a_t, x_t) + \beta\, \mathrm{E}\big(\bar{V}_{t+1}(s_{t+1}) \mid s_t, a_t\big) \qquad (8)$$
is the choice-specific value function minus the flow utility shock, in Aguirregabiria and Mira's (2010) terminology. Let $\tilde{\varepsilon}_t \equiv \varepsilon_t(0) - \varepsilon_t(1)$ and let $G(\cdot \mid s_t)$ be the cumulative distribution function (CDF) of $\tilde{\varepsilon}_t$ given $s_t$. In terms of $G(\cdot \mid s_t)$, the CCP $p_t(s_t) = \Pr(a_t = 1 \mid s_t)$ is
$$p_t(s_t) = G\big(v_t(a_t = 1, s_t) - v_t(a_t = 0, s_t) \mid s_t\big) = G\big(u_t(a_t = 1, x_t) - u_t(a_t = 0, x_t) + \beta\, \mathrm{E}(\bar{V}_{t+1}(s_{t+1}) \mid s_t, a_t = 1) - \beta\, \mathrm{E}(\bar{V}_{t+1}(s_{t+1}) \mid s_t, a_t = 0) \mid s_t\big). \qquad (9)$$
Our analysis makes heavy use of the flow utility difference and the conditional expectation difference. Without compact notation, the discussion would be cumbersome, as the displayed equation above shows. So we define the following notation:
$$\tilde{u}_t(x_t) \equiv u_t(a_t = 1, x_t) - u_t(a_t = 0, x_t),$$
$$\tilde{\mathrm{E}}\big(\bar{V}_{t+1}(s_{t+1}) \mid s_t\big) \equiv \mathrm{E}\big(\bar{V}_{t+1}(s_{t+1}) \mid s_t, a_t = 1\big) - \mathrm{E}\big(\bar{V}_{t+1}(s_{t+1}) \mid s_t, a_t = 0\big).$$
The estimation of $u_t(a_t, x_t)$ is equivalent to the estimation of $\tilde{u}_t(x_t)$ and $u_t(a_t = 0, x_t)$. When the CDF $G(\cdot \mid s_t)$ is unknown, even the difference $v_t(a_t = 1, s_t) - v_t(a_t = 0, s_t)$ cannot be identified in general, let alone the flow utility functions $u_t(a_t, x_t)$.^5 Even when the CDF $G(\cdot \mid s_t)$ is known, the absolute level of $u_t(a_t, x_t)$ cannot be identified. Take $\beta = 0$ for example: for any constant $c \in \mathbb{R}$,
$$p_t(s_t) = G\big(u_t(a_t = 1, x_t) - u_t(a_t = 0, x_t) \mid s_t\big) = G\big([u_t(a_t = 1, x_t) + c] - [u_t(a_t = 0, x_t) + c] \mid s_t\big).$$
The following assumption addresses these concerns.

^5 With additional assumptions, the CDF $G(\cdot \mid s_t)$ is identifiable, as shown by Aguirregabiria (2010, page 205). His arguments are based on the following observation. Apart from the presence of the conditional valuation function $\beta\, \mathrm{E}(\bar{V}_{t+1}(s_{t+1}) \mid s_t, a_t)$, the CCP formula eq. (9) is similar to the CCP in the binary static discrete choice model studied by Matzkin (1992), in which the CDF $G(\cdot \mid s_t)$ can be nonparametrically identified with "special regressors" and a zero-median assumption. In this paper, we focus on estimation when $G(\cdot \mid s_t)$ is known.
Assumption 4. (i) Assume $s_t \perp \tilde{\varepsilon}_t$, and that $\tilde{\varepsilon}_t$ is a continuous random variable with support on the real line. Letting $G(\cdot)$ be the CDF of $\tilde{\varepsilon}_t$, assume $G(\cdot)$ is a known, strictly increasing function, and that $\mathrm{E}(\varepsilon_t(0)) = 0$. (ii) For every period $t$, let $u_t(a_t = 0, x_t = x^*) = 0$ for some $x^*$.
The independence assumption $s_t \perp \tilde{\varepsilon}_t$ is not particularly strong and is commonly maintained in the literature, given that we have to assume the conditional CDF of $\tilde{\varepsilon}_t$ given $s_t$ is known anyway. The normalization in assumption 4.(ii) differs from the commonly used normalization that sets one flow utility function to zero,
$$u_t(a_t = 0, x_t = x) = 0, \quad \text{for all } t \text{ and all } x. \qquad (10)$$
The normalization in eq. (10) implies that the flow utility of alternative 0 in each period does not vary with the values of the state variable $x_t$, and, more importantly, with the distribution of $x_t$. This has serious consequences. Because the future value of a current choice is the discounted expected sum of all future optimal flow utilities obtained by following the optimal strategy, prohibiting $u_t(a_t = 0, x_t)$ from changing with $x_t$ in each period $t$ greatly restricts how the future value of the current choice changes with the distribution of $(x_{t+1}, x_{t+2}, \dots)$ given $x_t$ and $a_t$. When the counterfactual policy changes the state transition distributions, it is known that the normalization of eq. (10) is not innocuous, but the normalization of assumption 4.(ii) is mostly harmless for making counterfactual policy predictions (Norets and Tang, 2014; Kalouptsidi, Scott, and Souza-Rodrigues, 2015; Chou, 2016). With the Exclusion Restriction, Chou (2016) shows that the flow utility functions are nonparametrically identifiable without imposing the "strong" normalization of eq. (10).
2.2 Data and Structural Parameters
There are $n$ agents in the data. For each agent $i = 1, \dots, n$, we observe her decisions $a_{it}$ and observable state variables $s_{it}$ from decision period 1 to $T \leq T_*$. Taking the female labor force participation model as an example, the decision period is "age", and $a_{it}$ indicates whether or not woman $i$ is in the labor market at age $t$. In female labor supply studies (e.g. Eckstein and Wolpin, 1989; Altuğ and Miller, 1998; Blundell, Costa Dias, Meghir, and Shaw, 2016), common state variables in $s_{it}$ include woman $i$'s education, accumulated assets, working experience, her family background, her partner's wage (which is zero if she is single), her partner's education and working experience, and the number of dependent children in the household. The point is that $s_{it}$ usually has a large dimension and includes both continuous (partner's wage) and discrete variables (education). These features make the direct estimation of the state transition distributions hard. It should be remarked that the first decision period 1 in the sample need not be the initial period of her dynamic programming problem, nor does the last period $T$ in the sample need to correspond to the terminal decision period $T_*$.
The structural parameters of this model include the flow utility functions $u_t(a_t, x_t)$, the discount factor $\beta$, and the state transition distributions $F_{t+1}(s_{t+1} \mid s_t, a_t)$ in each period $t$. The state transition distributions are identified from the data. By adding a time subscript to the state transition distributions, we allow the distribution to vary across decision periods. This can be useful in applications because, for example, data show that mean earnings are hump shaped over the working lifetime (see e.g. Heckman, Lochner, and Todd, 2003; Huggett, Ventura, and Yaron, 2011; Blundell, Costa Dias, Meghir, and Shaw, 2016).
2.3 Identification from Linear Moment Conditions
The identification of the structural parameters (except for the state transition distributions, which are directly identified from the data) is based on the following equations,
$$\varphi(p_t(s_t)) = v_t(a_t = 1, s_t) - v_t(a_t = 0, s_t) = \tilde{u}_t(x_t) + \beta\,\tilde{\mathrm{E}}\big(\bar{V}_{t+1}(s_{t+1}) \mid s_t\big), \qquad (11a)$$
$$\psi(p_t(s_t)) = \bar{V}_t(s_t) - u_t(a_t = 0, x_t) - \beta\,\mathrm{E}\big(\bar{V}_{t+1}(s_{t+1}) \mid s_t, a_t = 0\big), \qquad (11b)$$
for every period $t$ in the data. Our estimator will be based on an alternative expression of the above equations, which is eq. (14) at the end of this section. Here $\varphi(p)$ and $\psi(p)$ are constructed from $G(\cdot)$, the known strictly increasing CDF of $\tilde{\varepsilon}_t = \varepsilon_t(0) - \varepsilon_t(1)$:
$$\varphi(p) \equiv G^{-1}(p), \qquad \psi(p) \equiv \int \max\big[0,\, G^{-1}(p) - t\big]\, \mathrm{d}G(t), \qquad (12)$$
for $p \in [0, 1]$. Clearly, $\varphi(p)$ is the quantile function. If we substitute $p$ in the definition of $\psi(p)$ with the CCP $p_t(s_t)$, then $\psi(p_t(s_t))$, viewed as a function of $s_t$, is the so-called McFadden social surplus function, that is, the expected lifetime utility following the optimal policy minus the expected lifetime utility of choosing alternative zero in period $t$ and following the optimal policy thereafter. The fact that the social surplus function depends only on the CCP is known in the literature (see e.g. pages 501–502 of HM93 and Proposition 2 of Aguirregabiria, 2010).
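As a concrete example, if $\tilde{\varepsilon}_t$ is standard logistic (the binary logit case), $\varphi$ and $\psi$ have closed forms. The sketch below verifies $\psi$ against its integral definition in eq. (12) by simple quadrature; the grid, the evaluation points, and the logit specialization are assumptions of this check, not part of the paper's general setup:

```python
import numpy as np

# Binary logit case: G is the standard logistic CDF, so
#   phi(p) = G^{-1}(p) = log(p / (1 - p)),   psi(p) = -log(1 - p).
phi = lambda p: np.log(p / (1.0 - p))
psi = lambda p: -np.log(1.0 - p)

# Check psi against its integral definition in eq. (12) by quadrature.
t = np.linspace(-30.0, 30.0, 400_001)
dt = t[1] - t[0]
g = np.exp(-t) / (1.0 + np.exp(-t)) ** 2      # logistic density
results = {}
for p in (0.2, 0.5, 0.9):
    results[p] = float((np.maximum(0.0, phi(p) - t) * g).sum() * dt)
    print(p, results[p], float(psi(p)))
```

The closed form for $\psi$ follows from the familiar logit surplus: $\int_{-\infty}^{G^{-1}(p)} G(t)\,\mathrm{d}t = \ln(1 + e^{G^{-1}(p)}) = -\ln(1-p)$.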
Equation (11a) follows from inverting the CDF $G(\cdot)$ in eq. (9) and the definition of $\varphi(p)$ in eq. (12). It is a special case of the Hotz and Miller inversion result (HM93, Proposition 1), which establishes invertibility for multinomial discrete choice. Though not explicitly presented in HM93, eq. (11b) can be derived using arguments similar to those on page 501 of HM93. It follows from integrating out $\varepsilon_t$ from both sides of Bellman's eq. (4). First, in terms of $v_t(a_t, s_t)$, eq. (4) can be rewritten as
$$V_t(s_t, \varepsilon_t) = \max_{a \in A}\; v_t(a_t = a, s_t) + \varepsilon_t(a).$$
Integrating out $\varepsilon_t$ on both sides of the above display gives
$$\bar{V}_t(s_t) = \int \max\big(v_t(a_t = 0, s_t) + \varepsilon_t(0),\; v_t(a_t = 1, s_t) + \varepsilon_t(1)\big)\,\mathrm{d}F(\varepsilon_t(0), \varepsilon_t(1)) = v_t(a_t = 0, s_t) + \int \max\big(0,\; v_t(a_t = 1, s_t) - v_t(a_t = 0, s_t) - \tilde{\varepsilon}_t\big)\,\mathrm{d}G(\tilde{\varepsilon}_t).$$
Here we used eq. (7) and $\mathrm{E}(\varepsilon_t(0)) = 0$ from assumption 4. It then follows from
$$v_t(a_t = 1, s_t) - v_t(a_t = 0, s_t) = G^{-1}(p_t(s_t))$$
that
$$\bar{V}_t(s_t) = v_t(a_t = 0, s_t) + \int \max\big(0,\; G^{-1}(p_t(s_t)) - \tilde{\varepsilon}\big)\,\mathrm{d}G(\tilde{\varepsilon}) = v_t(a_t = 0, s_t) + \psi(p_t(s_t)).$$
The above display gives eq. (11b) after substituting $v_t(a_t = 0, s_t)$ with its definition in eq. (8).
In addition, the two functionals of the CCP, $\varphi(p_t(s_t))$ and $\psi(p_t(s_t))$, are related. Letting $a_t^o$ be the optimal choice at time $t$ given $s_t$, we have
$$\mathrm{E}\big(\varepsilon_t(a_t^o) \mid s_t\big) = \psi(p_t(s_t)) - p_t(s_t)\,\varphi(p_t(s_t)). \qquad (13)$$
This can be verified by calculating $\bar{V}_t(s_t)$ in the following way,
$$\bar{V}_t(s_t) = \mathrm{E}\big(v_t(a_t^o, s_t) + \varepsilon_t(a_t^o) \mid s_t\big) = p_t(s_t)\,v_t(a_t = 1, s_t) + (1 - p_t(s_t))\,v_t(a_t = 0, s_t) + \mathrm{E}(\varepsilon_t(a_t^o) \mid s_t) = v_t(a_t = 0, s_t) + p_t(s_t)\big[v_t(a_t = 1, s_t) - v_t(a_t = 0, s_t)\big] + \mathrm{E}(\varepsilon_t(a_t^o) \mid s_t) = v_t(a_t = 0, s_t) + p_t(s_t)\,\varphi(p_t(s_t)) + \mathrm{E}(\varepsilon_t(a_t^o) \mid s_t).$$
Comparing the above display with eq. (11b), we obtain eq. (13). In the rest of the paper, we work with the term $\psi(p_t(s_t)) - p_t(s_t)\varphi(p_t(s_t))$ as a whole. So we define
$$\eta(p(s)) \equiv \psi(p(s)) - p(s)\,\varphi(p(s)).$$
Equation (11) can be viewed as linear moment conditions on the structural parameters. We observe the CCP, hence $\varphi(p_t(s_t))$ and $\psi(p_t(s_t))$, from the data. Given the discount factor $\beta$, the observable $\varphi(p_t(s_t))$ and $\psi(p_t(s_t))$ are linear in the unknown flow utility and value functions. Given the flow utility and value functions, $\varphi(p_t(s_t))$ and $\psi(p_t(s_t))$ are linear in $\beta$. Chou (2016) shows that with the Exclusion Restriction and certain rank conditions, we can nonparametrically identify (i) the value function $\bar{V}_t(s_t)$; (ii) the difference between the flow utility functions, $\tilde{u}_t(x_t)$; and (iii) the flow utility function of alternative zero, $u_t(a_t = 0, x_t)$. Below, we assume that the model is identified and focus on estimation, though the presentation of our estimator, in particular remark 1 and remark 2, at least hints at the identification argument.
In the rest, we transform eq. (11) to eq. (14), which turns out to be more useful for the
estimation job. Lemma 1 is to get rid of the event ๐‘Ž๐‘ก = 0 in the conditional expectation
ฬ„ (๐‘ ๐‘ก+1 ) | ๐‘ ๐‘ก , ๐‘Ž๐‘ก = 0) in eq. (11b), so that the law of iterated expectation arguments, like
E(๐‘‰๐‘ก+1
E(E(๐‘ฃ๐‘ก+๐‘˜ | ๐‘ ๐‘ก+๐‘˜โˆ’1 ) โˆฃ ๐‘ ๐‘ก ) = E(๐‘ฃ๐‘ก+๐‘˜ | ๐‘ ๐‘ก ), can be used.
Lemma 1. We have
\[
\mathrm{E}(\bar V_{t+1}(s_{t+1}) \mid s_t, a_t = 0) = \mathrm{E}(\bar V_{t+1}(s_{t+1}) \mid s_t) - \beta^{-1} p_t(s_t)\bigl[\varphi(p_t(s_t)) - \tilde u_t(x_t)\bigr].
\]
ฬ„ (๐‘ ๐‘ก+1 ) and
Proof. It follows from the law of total probability that (we suppress ๐‘ ๐‘ก+1 in ๐‘‰๐‘ก+1
๐‘ ๐‘ก in ๐‘[ (๐‘ [ )๐‘ก])
ฬ„ | ๐‘ ๐‘ก ) = (1 โˆ’ ๐‘๐‘ก ) E(๐‘‰๐‘ก+1
ฬ„ | ๐‘ ๐‘ก , ๐‘Ž๐‘ก = 0) + ๐‘๐‘ก E(๐‘‰๐‘ก+1
ฬ„ | ๐‘ ๐‘ก , ๐‘Ž๐‘ก = 1)
E(๐‘‰๐‘ก+1
ฬ„ | ๐‘ ๐‘ก , ๐‘Ž๐‘ก = 0) + ๐‘๐‘ก E(ฬƒ ๐‘‰๐‘ก+1
ฬ„ | ๐‘ ๐‘ก ).
= E(๐‘‰๐‘ก+1
So we have
ฬ„ | ๐‘ ๐‘ก )
ฬ„ | ๐‘ ๐‘ก , ๐‘Ž๐‘ก = 0) = E(๐‘‰๐‘ก+1
ฬ„ | ๐‘ ๐‘ก ) โˆ’ ๐‘๐‘ก E(ฬƒ ๐‘‰๐‘ก+1
E(๐‘‰๐‘ก+1
ฬ„ | ๐‘ ๐‘ก ) โˆ’ ๐›ฝ โˆ’1 ๐‘๐‘ก [๐œ‘(๐‘๐‘ก ) โˆ’ ๐‘ขฬƒ๐‘ก (๐‘ฅ๐‘ก )],
= E(๐‘‰๐‘ก+1
where the second line follows from eq. (11b).
Using lemma 1, we can rewrite eq. (11b) as follows,
\[
\psi(p_t(s_t)) = \bar V_t(s_t) - u_t(a_t = 0, x_t) - \beta\,\mathrm{E}(\bar V_{t+1}(s_{t+1}) \mid s_t) + p_t(s_t)\bigl[\varphi(p_t(s_t)) - \tilde u_t(x_t)\bigr],
\]
hence
\[
\bar V_t(s_t) = \bigl[ u_t(a_t = 0, x_t) + p_t(s_t)\tilde u_t(x_t) + \eta(p_t(s_t)) \bigr] + \beta\,\mathrm{E}(\bar V_{t+1}(s_{t+1}) \mid s_t).
\]
The term within the bracket is indeed the expected optimal flow utility in period \(t\),
\[
U_t^o(s_t) = \mathrm{E}(u_t(a_t^o, x_t) + \varepsilon_t(a_t^o) \mid s_t),
\]
because
\[
\begin{aligned}
\mathrm{E}(u_t(a_t^o, x_t) \mid s_t) &= p_t(s_t)u_t(a_t = 1, x_t) + (1 - p_t(s_t))u_t(a_t = 0, x_t) \\
&= u_t(a_t = 0, x_t) + p_t(s_t)\tilde u_t(x_t),
\end{aligned}
\]
and eq. (13).
In summary, we have
\[
\varphi(p_t(s_t)) = \tilde u_t(x_t) + \beta\,\tilde{\mathrm{E}}(\bar V_{t+1}(s_{t+1}) \mid s_t), \tag{14a}
\]
\[
\bar V_t(s_t) = U_t^o(s_t) + \beta\,\mathrm{E}(\bar V_{t+1}(s_{t+1}) \mid s_t), \tag{14b}
\]
\[
U_t^o(s_t) \equiv u_t(a_t = 0, x_t) + p_t(s_t)\tilde u_t(x_t) + \eta(p_t(s_t)), \tag{14c}
\]
for every decision period \(t\). Below, we show how to estimate \(u_t(a_t, x_t)\) using these equations.
3 Estimation of Infinite Horizon Stationary Markov Decision Processes
We first present our three-step estimator for the infinite horizon stationary dynamic programming discrete choice model, because of its simple structure and its importance in the literature. Since the structural parameters of an infinite horizon stationary model are all time invariant, we omit \(t\) from the CCP, flow utility, and value functions below.
The plan for this section is the following. First, we derive a simpler moment condition on the flow utility functions from eq. (14). This simpler moment condition, eq. (23), resembles a linear regression, but it has an infinite number of unknown conditional expectations in both the dependent and independent variables. Second, we describe our three-step semiparametric estimation method, assuming that we can estimate the infinite number of unknown conditional expectations with finite-period panel data. Third, we show how to estimate the infinite number of conditional expectations with two-period panel data by providing an estimable approximation formula for all conditional expectations and the order of the approximation error. Fourth, we derive the influence function for our three-step semiparametric estimator.
3.1 Linear moment equation
Starting with eq. (14b), we have
\[
\bar V(s_t) - \beta\,\mathrm{E}(\bar V(s_{t+1}) \mid s_t) = U^o(s_t). \tag{15}
\]
By the recursive structure of eq. (15), we expect to express \(\bar V(s_t)\) in terms of \(\mathrm{E}(U_r^o(s_r) \mid s_t)\) with \(r = t, t+1, \dots\). Lemma 2 verifies this expectation.
Lemma 2. If assumptions 1-4 hold, we have \(\bar V(s_t) = \sum_{j=0}^{\infty} \beta^j\, \mathrm{E}(U_{t+j}^o(s_{t+j}) \mid s_t)\).
Proof. Let \(F(s' \mid s)\) be the time-invariant conditional CDF of \(s_{t+1}\) given \(s_t\). In the proof, we consider the case that \(F(s' \mid s)\) is absolutely continuous for every \(s\), and let \(f(s' \mid s)\) be the conditional probability density function (PDF). This does not lose generality, since one can redefine the PDF with respect to other measures depending on the type of \(s_t\).
Let \(\mathcal{V}\) be a Banach space of \(\bar V(s)\) with norm \(\|\bar V(s)\| = \sup_{s \in \mathcal{S}} |\bar V(s)|\), where \(\mathcal{S}\) is the domain of the value function. We can define a linear operator \(L\) such that
\[
[L(\bar V)](s) = \mathrm{E}(\bar V(s') \mid s) = \int \bar V(s')\,F(\mathrm{d}s' \mid s).
\]
The left-hand side of eq. (15) equals \([(I - \beta L)(\bar V)](s_t)\), where \(I\) denotes the identity operator.
First, we show \(I - \beta L\) is invertible. Because \(L\) is a linear integral operator whose kernel is the conditional PDF \(f(s' \mid s)\), we know that
\[
\|L\| = \sup_{\bar V \in \mathcal{V},\, \|\bar V\| = 1} \bigl\| [L(\bar V)](s) \bigr\| = \sup_{\bar V \in \mathcal{V},\, \|\bar V\| = 1} \Bigl\| \int \bar V(s') f(s' \mid s)\,\mathrm{d}s' \Bigr\| = \sup_{s \in \mathcal{S}} \int |f(s' \mid s)|\,\mathrm{d}s' = \sup_{s \in \mathcal{S}} \int f(s' \mid s)\,\mathrm{d}s' = 1.
\]
We then know that \([\beta L(\bar V)](s_t) = \beta\,\mathrm{E}(\bar V(s_{t+1}) \mid s_t)\), as a linear operator, has norm \(\beta < 1\). It then follows from the geometric series theorem for linear operators (e.g., see Helmberg, 2008, Theorem 4.23.3) that \(I - \beta L\) is invertible. Hence, we conclude that \(\bar V(s_t) = [(I - \beta L)^{-1}(U^o)](s_t)\).
Second, we have \((I - \beta L)^{-1} = \sum_{j=0}^{\infty} (\beta L)^j\) from the geometric series theorem, hence \(\bar V(s_t) = \sum_{j=0}^{\infty} \beta^j [L^j(U^o(s))](s_t)\).
Third, we show that
\[
[L^j(U^o(s))](s_t) = \mathrm{E}(U_{t+j}^o(s_{t+j}) \mid s_t), \quad \text{for } j = 0, 1, 2, \dots \tag{16}
\]
by induction. For \(j = 0\), because \(L^0 = I\) and \(\mathrm{E}(U_t^o(s_t) \mid s_t) = U_t^o(s_t)\), the above display holds. Suppose the above display holds for \(j = J\); we need to show that it holds for \(j = J + 1\). To see this, we have
\[
[L^{J+1}(U^o(s))](s_t) = [L(L^J(U^o(s)))](s_t) = \int [L^J(U^o(s))](s_{t+1})\, f(s_{t+1} \mid s_t)\,\mathrm{d}s_{t+1} = \int\!\!\int U_{t+1+J}^o(s_{t+1+J})\, f(s_{t+1+J} \mid s_{t+1})\,\mathrm{d}s_{t+1+J}\; f(s_{t+1} \mid s_t)\,\mathrm{d}s_{t+1}. \tag{17}
\]
The last equality used \([L^J(U^o(s))](s_{t+1}) = \mathrm{E}(U_{t+1+J}^o(s_{t+1+J}) \mid s_{t+1})\). By the Markov property, we have \(f(s_{t+1+J} \mid s_{t+1}) = f(s_{t+1+J} \mid s_{t+1}, s_t)\), hence
\[
f(s_{t+1+J} \mid s_{t+1})\, f(s_{t+1} \mid s_t) = f(s_{t+1+J}, s_{t+1} \mid s_t).
\]
As a result,
\[
\text{eq. (17)} = \int\!\!\int U_{t+1+J}^o(s_{t+1+J})\, f(s_{t+1+J}, s_{t+1} \mid s_t)\,\mathrm{d}s_{t+1+J}\,\mathrm{d}s_{t+1} = \mathrm{E}(U_{t+1+J}^o(s_{t+1+J}) \mid s_t).
\]
By induction, we conclude that eq. (16) is true.
Last, given eq. (16) and \(\bar V(s_t) = \sum_{j=0}^{\infty} \beta^j [L^j(U^o(s))](s_t)\), the statement is proved. ∎
Expressing the integrated value function \(\bar V(s_t)\) as a sum of discounted expected future optimal flow utilities given the current state \(s_t\) is not an innovation relative to the literature. For discrete state spaces, HM93, Hotz, Miller, Sanders, and Smith (1994), and Miller (1997) all write the un-discounted conditional valuation function \(\mathrm{E}(\bar V(s_{t+1}) \mid s_t, a_t = a)\) as
\[
\sum_{j=1}^{\infty} \beta^j\, \mathrm{E}(U^o(s_{t+j}) \mid s_t, a_t = a), \tag{18}
\]
which follows from the expression for \(\bar V(s_{t+1})\) in our lemma 2. The existing CCP estimators for discrete state spaces proceed by expressing the conditional expectations in eq. (18) as products of the state transition probability matrices and the optimal flow utilities in each state. More concretely, for a discrete state space, let \(F\) denote the state transition probability matrix from \(s_t\) to \(s_{t+1}\), which is constant across time for a stationary problem. Let \(F_a\) be the conditional state transition probability matrix from \(s_t\) to \(s_{t+1}\) given \(a_t = a\). And finally, let \(\vec U^o\) denote the vector of the optimal flow utility in each state of the state space. HM93 and other CCP estimators estimate eq. (18) by
\[
\beta F_a \sum_{j=0}^{\infty} \beta^j F^j \vec U^o = \beta F_a (I - \beta F)^{-1} \vec U^o.
\]
For a small state space, the estimation of the transition probability matrices is straightforward. For a large state space, the estimation requires various restrictions, which are not always easy to formulate and justify. Of course, the above formula breaks down when \(s_t\) contains continuous state variables. Aside from gridding and parametric specification, another remedy for continuous state variables is the estimator of Srisuma and Linton (2012). Their estimator essentially views eq. (15) as a Fredholm integral equation of the second kind with unknown kernel \(F(s' \mid s)\), which they estimate nonparametrically. Still, their method has to estimate the state transition probabilities.
Our departure from the literature, and our innovation, is that we estimate the conditional expectation terms in eq. (18) by nonparametric regressions, and we do not need to restrict the state space to be discrete. To see why and how we estimate eq. (18) by nonparametric regressions, we first derive a linear moment condition on the flow utility functions.
We now derive the linear moment equation, eq. (23). Equation (14a) for the infinite horizon stationary problem reads
\[
\varphi(p(s_t)) = \tilde u(x_t) + \beta\,\tilde{\mathrm{E}}(\bar V(s_{t+1}) \mid s_t).
\]
Replacing \(\bar V(s_{t+1})\) in the above display with its series expression in lemma 2 and applying
\[
\mathrm{E}(\mathrm{E}(U_{t+j}^o(s_{t+j}) \mid s_{t+1}) \mid s_t) = \mathrm{E}(U_{t+j}^o(s_{t+j}) \mid s_t),
\]
which follows from the Markov property of \(s_t\), we have the moment condition
\[
\varphi(p(s_t)) = \tilde u(x_t) + \sum_{j=1}^{\infty} \beta^j\, \tilde{\mathrm{E}}(U_{t+j}^o(s_{t+j}) \mid s_t), \tag{19}
\]
where
\[
U_t^o(s_t) = u(a_t = 0, x_t) + p(s_t)\tilde u(x_t) + \eta(p(s_t)).
\]
For notational simplicity, let
\[
\tilde u(x_t) = x_t'\delta \quad \text{and} \quad u(a_t = 0, x_t) = x_t'\alpha. \tag{20}
\]
The flow utility functions could also be estimated nonparametrically if we wrote \(\tilde u(x_t)\) and \(u(a_t = 0, x_t)\) as series expansions. We are interested in estimating \(\theta \equiv (\delta', \alpha')'\).
It follows from eq. (20) that eq. (19) becomes
\[
\varphi(p(s_t)) = x_t'\delta - \sum_{j=1}^{\infty} \beta^j h_{1j}(s_t) + \sum_{j=1}^{\infty} \beta^j h_{2j}(s_t)'\alpha + \sum_{j=1}^{\infty} \beta^j h_{3j}(s_t)'\delta, \tag{21}
\]
where
\[
h_{1j}(s_t) \equiv \tilde{\mathrm{E}}(\eta(p(s_{t+j})) \mid s_t), \quad h_{2j}(s_t) \equiv \tilde{\mathrm{E}}(x_{t+j} \mid s_t), \quad h_{3j}(s_t) \equiv \tilde{\mathrm{E}}(p(s_{t+j}) x_{t+j} \mid s_t).
\]
Suppose we have data \((a_{it}, x_{it}', z_{it}')\) for \(i = 1, \dots, n\), \(t = 1, \dots, T \geq 2\), and a known discount factor \(\beta\). For \(i = 1, \dots, n\), letting
\[
y_{it} \equiv \varphi(p(s_{it})) + \sum_{j=1}^{\infty} \beta^j h_{1j}(s_{it}), \tag{22a}
\]
\[
r_{it}' \equiv \Bigl( x_{it}' + \sum_{j=1}^{\infty} \beta^j h_{3j}(s_{it})',\ \sum_{j=1}^{\infty} \beta^j h_{2j}(s_{it})' \Bigr), \tag{22b}
\]
eq. (21) is written as the following linear equation in \(\theta\),
\[
y_{it} = r_{it}'\theta. \tag{23}
\]
Letting \(Y_t = (y_{1t}, \dots, y_{nt})'\), \(Y = (Y_1', \dots, Y_T')'\), \(R_t = (r_{1t}, \dots, r_{nt})'\), and \(R = (R_1', \dots, R_T')'\), we have \(Y = R\theta\), or
\[
\theta = (R'R)^{-1}(R'Y).
\]
3.2 Three-step estimator
The estimation of \(\theta\) amounts to obtaining the sample analogs of \(R\) and \(Y\). First, we approximate \(y_{it}\) and \(r_{it}\) by truncating the infinite series in eq. (21). We will justify the truncation later, but given the presence of the discount factor \(\beta < 1\), it should not be a surprise that we can truncate the series for estimation. Define
\[
y_{it,J} \equiv \varphi(p(s_{it})) + \sum_{j=1}^{J} \beta^j h_{1j}(s_{it}), \tag{24a}
\]
\[
r_{it,J}' \equiv \Bigl( x_{it}' + \sum_{j=1}^{J} \beta^j h_{3j}(s_{it})',\ \sum_{j=1}^{J} \beta^j h_{2j}(s_{it})' \Bigr). \tag{24b}
\]
The notation \(Y_{t,J}\), \(Y_J\), \(R_{t,J}\), and \(R_J\) is then defined similarly from \(y_{it,J}\) and \(r_{it,J}\). It will be helpful later, when calculating the influence functions, to denote \(h_k^J(s_t) \equiv (h_{k1}(s_t), \dots, h_{kJ}(s_t))'\) for \(k = 1, 2, 3\) and \(h^J(s_t)' \equiv (h_1^J(s_t)', h_2^J(s_t)', h_3^J(s_t)')\). Our estimator proceeds in three steps.
Step 1: Estimate the CCP \(\mathrm{E}(a_{it} \mid s_{it})\). Let \(\hat p(s)\) be the CCP estimator.
Step 2: Estimate \(\varphi(p(s))\) and \(h_{kj}(s)\) for \(k = 1, 2, 3\) and \(j = 1, \dots, J\). Let \(\hat\varphi(p(s))\) be the estimator of \(\varphi(p(s))\). Taking the estimation of \(h_{3j}(s) = \tilde{\mathrm{E}}(p(s_{t+j}) x_{t+j} \mid s_t)\) as an example, we need to estimate
\[
\mathrm{E}(p(s_{t+j}) x_{t+j} \mid s_t, a_t = a),
\]
for \(a = 0, 1\). When \(t + j \leq T\), these conditional expectations can be estimated by nonparametric regression of \(\hat p(s_{i,t+j}) x_{i,t+j}\) on \(s_{it}\) and \(a_{it}\). If \(t + j > T\), we can still estimate these conditional means by using the stationarity property. We will explain the estimation of \(h_{kj}\) when \(t + j > T\) in detail after completing the procedure. For \(k = 1, 2, 3\), let \(\hat h_{kj}(s)\) be the estimator of \(h_{kj}(s)\).
Step 3: Then \(\hat y_{it,J}\) and \(\hat r_{it,J}\), hence \(\hat Y_J\) and \(\hat R_J\), are constructed by replacing the unknown \(\varphi\) and \(h_{kj}\) with their respective estimates. We have the estimator
\[
\hat\theta_J = (\hat R_J' \hat R_J)^{-1} (\hat R_J' \hat Y_J).
\]
We want to emphasize that implementing the entire estimation procedure involves only a certain number of nonparametric regressions and one ordinary linear regression, and the number of nonparametric regressions grows linearly in the dimension of \(x_t\). So we claim that the numerical implementation is not difficult, and the computational burden grows linearly in the dimension of the state variables.
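The three steps can be sketched in code. The following is a minimal illustration of the mechanics only: the panel is drawn from an arbitrary hypothetical process (not the structural model, so \(\hat\theta_J\) here has no structural interpretation), the CCP and the \(h_{kj}\) are estimated with a quadratic polynomial sieve, and \(\varphi\), \(\eta\) use the logit forms as a stand-in for the paper's general \(G\):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma_e = 0.5772156649               # Euler-Mascheroni constant
beta, n, T, J = 0.9, 2000, 6, 2

# hypothetical panel of states (x, z) and binary choices a -- illustrative DGP only
x = np.empty((n, T)); z = np.empty((n, T))
x[:, 0] = rng.normal(size=n); z[:, 0] = rng.normal(size=n)
for t in range(1, T):                # mildly persistent state processes
    x[:, t] = 0.5*x[:, t-1] + rng.normal(scale=0.8, size=n)
    z[:, t] = 0.5*z[:, t-1] + rng.normal(scale=0.8, size=n)
a = (rng.random((n, T)) < 1/(1 + np.exp(-(0.8*x + 0.5*z)))).astype(float)

def basis(xv, zv):                   # quadratic polynomial sieve in s = (x, z)
    return np.column_stack([np.ones_like(xv), xv, zv, xv*zv, xv**2, zv**2])

def fit(B, w):                       # least-squares coefficients of w on columns of B
    return np.linalg.lstsq(B, w, rcond=None)[0]

# Step 1: CCP by series regression of a_it on basis(s_it), clipped away from 0 and 1
B_all = basis(x.ravel(), z.ravel())
p = np.clip(B_all @ fit(B_all, a.ravel()), 0.01, 0.99).reshape(n, T)

phi = lambda q: np.log(q/(1 - q))                                # logit stand-in
eta = lambda q: gamma_e - q*np.log(q) - (1 - q)*np.log(1 - q)    # logit stand-in

# Step 2: h_kj(s_t) = E(target_{t+j} | s_t, a_t=1) - E(target_{t+j} | s_t, a_t=0)
def h_hat(target_tpj, t):
    Bt = basis(x[:, t], z[:, t]); out = 0.0
    for av, sign in ((1.0, 1.0), (0.0, -1.0)):
        m = a[:, t] == av
        out = out + sign * (Bt @ fit(Bt[m], target_tpj[m]))
    return out

# Step 3: assemble y_{it,J}, r_{it,J} (periods with t+j <= T) and run one OLS
rows_y, rows_r = [], []
for t in range(T - J):
    h1 = sum(beta**j * h_hat(eta(p[:, t+j]), t) for j in range(1, J+1))
    h2 = sum(beta**j * h_hat(x[:, t+j], t) for j in range(1, J+1))
    h3 = sum(beta**j * h_hat(p[:, t+j]*x[:, t+j], t) for j in range(1, J+1))
    rows_y.append(phi(p[:, t]) + h1)
    rows_r.append(np.column_stack([x[:, t] + h3, h2]))
Y = np.concatenate(rows_y); R = np.vstack(rows_r)
theta_hat = np.linalg.lstsq(R, Y, rcond=None)[0]   # (delta, alpha)
print(theta_hat)
```

Note that the sieve regressions and the final OLS are the only numerical operations, matching the claim that the computational burden is light.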
Remark 1 (Role of the excluded variable \(z_t\) with linear flow utility functions). We claimed that the excluded variable \(z_t\) is required to identify the model. This point can now be better understood from the linear regression perspective using eq. (23). To identify \(\theta\), the "regressors" in \(r_t\) cannot be perfectly collinear. For simplicity, let \(x_t\) be a scalar. So there are two regressors in \(r_t\):
\[
x_t + \sum_{j=1}^{\infty} \beta^j\, \tilde{\mathrm{E}}(p(s_{t+j}) x_{t+j} \mid x_t, z_t) \quad \text{and} \quad \sum_{j=1}^{\infty} \beta^j\, \tilde{\mathrm{E}}(x_{t+j} \mid x_t, z_t).
\]
Without the excluded variable \(z_t\), these two regressors become
\[
x_t + \sum_{j=1}^{\infty} \beta^j\, \tilde{\mathrm{E}}(p(s_{t+j}) x_{t+j} \mid x_t) \quad \text{and} \quad \sum_{j=1}^{\infty} \beta^j\, \tilde{\mathrm{E}}(x_{t+j} \mid x_t).
\]
The term \(x_t\) is likely to be collinear with \(\tilde{\mathrm{E}}(x_{t+j} \mid x_t)\), depending on the conditional distribution of \(x_{t+j}\) given \(x_t\) and \(a_t\). Due to the presence of the CCP, we cannot quantify the correlation between \(\tilde{\mathrm{E}}(p(s_{t+j}) x_{t+j} \mid x_t)\) and \(\tilde{\mathrm{E}}(x_{t+j} \mid x_t)\). However, because the CCP is smaller than one and
\[
\tilde{\mathrm{E}}(p(s_{t+j}) x_{t+j} \mid x_t) = \mathrm{E}(p(s_{t+j}) x_{t+j} \mid x_t, a_t = 1) - \mathrm{E}(p(s_{t+j}) x_{t+j} \mid x_t, a_t = 0),
\]
we expect that \(\sum_{j=1}^{\infty} \beta^j\, \tilde{\mathrm{E}}(p(s_{t+j}) x_{t+j} \mid x_t)\) is dominated by \(x_t\). The conclusion is that, with the linear flow utility specification and without the excluded variable, the two regressors will in some cases be highly correlated, making the variance of \(\hat\theta_J\) large. ∎
Remark 2 (Role of the excluded variable \(z_t\) with nonparametric flow utility functions). If we leave the flow utility functions nonparametrically unknown, we can write the original moment equation, eq. (19), as follows,
\[
\varphi(p(s_t)) + \sum_{j=1}^{\infty} \beta^j\, \tilde{\mathrm{E}}(\eta(p(s_{t+j})) \mid x_t) = \Bigl[ \tilde u(x_t) + \sum_{j=1}^{\infty} \beta^j\, \tilde{\mathrm{E}}(p(s_{t+j}) \tilde u(x_{t+j}) \mid x_t) \Bigr] + \sum_{j=1}^{\infty} \beta^j\, \tilde{\mathrm{E}}(u(a_{t+j} = 0, x_{t+j}) \mid x_t).
\]
The left-hand side of the above display is known from data. However, the two terms on the right-hand side are both unknown functions of \(x_t\), hence neither \(\tilde u(x_t)\) nor \(u(a_t = 0, x_t)\) is identifiable. ∎
3.3 Estimation of conditional expectations with two-period data
We now explain the estimation of \(h_{kj}(s)\) when \(t + j > T\) in more detail. In particular, we consider the most "difficult" case, in which \(T = 2\). Let \(g(s)\) denote a generic function of \(s\). We only need to show the approximation of \(\mathrm{E}(g(s_{t+J}) \mid s_t, a_t = a)\) for all \(J \geq 1\) and \(a \in A\). As a summary of the rest of this subsection: first, we prove that
\[
\mathrm{E}(g(s_{t+J}) \mid s_t) = q^K(s_t)'\Gamma^J \rho + \text{error term};
\]
second, we derive the order of the error term; third, we show
\[
\mathrm{E}(g(s_{t+J}) \mid s_t, a_t) = \mathrm{E}\bigl(\mathrm{E}(g(s_{t+J}) \mid s_{t+1}) \mid s_t, a_t\bigr) = q^K(s_t)'\Gamma(a)\Gamma^{J-1}\rho + \text{error term},
\]
and derive the order of the approximation error. The notation used in the approximation formulas will become clear shortly. The bottom line is that the formula can be estimated from two-period panel data. The proofs of the results are not interesting by themselves, so we leave them to the appendix.
For simplicity, let \(s \in [0, 1]\) and let \(g \in L^2(0, 1)\). Here \(L^2(0, 1)\) denotes the set of functions \(g : [0, 1] \mapsto \mathbb{R}\) such that \(\int_0^1 g^2(s)\,\mathrm{d}s < \infty\). In this paper, we let \(q_1(s), q_2(s), \dots\) denote a sequence of generic approximating functions. Here, we let \(q_1, q_2, \dots\) form an orthonormal basis of \(L^2(0, 1)\). We can always write
\[
g(s) = \sum_{j=1}^{\infty} \rho_j q_j(s),
\]
where \(\rho_j = \int_0^1 g(s) q_j(s)\,\mathrm{d}s\). Denote
\[
\bar q_k(s) \equiv \mathrm{E}(q_k(s_{t+1}) \mid s_t = s) \quad \text{and} \quad \bar q_k(s, a) \equiv \mathrm{E}(q_k(s_{t+1}) \mid s_t = s, a_t = a), \tag{25}
\]
for \(k = 1, 2, \dots\). Provided that \(\bar q_k(s), \bar q_k(s, a) \in L^2(0, 1)\) (e.g., assumptions 5.(i) and 5.(iii) give two sufficient conditions), we can also write
\[
\bar q_k(s) = \sum_{j=1}^{\infty} \gamma_{k,j} q_j(s) \quad \text{and} \quad \bar q_k(s, a) = \sum_{j=1}^{\infty} \gamma_{k,j}(a) q_j(s),
\]
where \(\gamma_{k,j} = \int_0^1 \bar q_k(s) q_j(s)\,\mathrm{d}s\), and \(\gamma_{k,j}(a)\) is defined similarly. Let us truncate all series at \(K\), and write
\[
g(s) = q^K(s)'\rho + \nu(s), \quad \bar q_k(s) = q^K(s)'\gamma_k + \omega_k(s), \quad \bar q_k(s, a) = q^K(s)'\gamma_k(a) + \omega_k(s, a).
\]
Here, \(q^K = (q_1, \dots, q_K)'\), \(\rho = (\rho_1, \dots, \rho_K)'\), \(\nu(s) = \sum_{j=K+1}^{\infty} \rho_j q_j(s)\), and \(\gamma_k\), \(\omega_k(s)\), \(\gamma_k(a)\), and \(\omega_k(s, a)\) are defined similarly. The following notation will be handy later: let \(\omega^K = (\omega_1, \dots, \omega_K)'\), \(\omega^K(s, a) = (\omega_1(s, a), \dots, \omega_K(s, a))'\), \(\Gamma = (\gamma_1, \dots, \gamma_K)\), and \(\Gamma(a) = (\gamma_1(a), \dots, \gamma_K(a))\). Note that both \(\Gamma\) and \(\Gamma(a)\) are estimable from two-period panel data using nonparametric regression. Depending on how smooth \(g(s)\), \(\bar q_k(s)\), and \(\bar q_k(s, a)\) (for each \(a\)) are, we can bound the norms of \(\nu(s)\), \(\omega_k(s)\), and \(\omega_k(s, a)\).
Assumption 5. (i) Assume \(\bar q_k(s)\) defined in eq. (25) is monotone in \(s\), or the state transition density \(f(s_{t+1} \mid s_t)\) is continuous in \(s_t\). Either condition ensures \(\bar q_k(s) \in L^2(0, 1)\).
(ii) A Sobolev ellipsoid is a set \(B(m, c) = \{ b \mid \sum_{j=1}^{\infty} a_j b_j^2 \leq c^2 \}\), where \(a_j \sim (\pi j)^{2m}\) as \(j \to \infty\). Assume that the coefficients \((\rho_1, \rho_2, \dots)\) and \((\gamma_{k,1}, \gamma_{k,2}, \dots)\), for each \(k = 1, 2, \dots\), belong to the Sobolev ellipsoid \(B(m, c)\). So we have a bound for \(\|\nu(s)\|_2\) and \(\|\omega_k(s)\|_2\):
\[
\|\nu(s)\|_2 = O(K^{-m}) \quad \text{and} \quad \|\omega_k(s)\|_2 = O(K^{-m}).
\]
The bound follows from the definitions of \(\nu(s)\) and \(\omega_k(s)\) and of the Sobolev ellipsoid; Lemma 8.4 of Wasserman (2006) establishes it. Loosely speaking, the value of \(m\) depends on the order of differentiability of the function.
(iii) For each \(a \in A\), assume \(\bar q_k(s, a)\) is monotone in \(s\), or the state transition density \(f(s_{t+1} \mid s_t, a_t = a)\) is continuous in \(s_t\).
(iv) Assume that the coefficients \((\gamma_{k,1}(a), \gamma_{k,2}(a), \dots)\), for each \(k = 1, 2, \dots\), belong to the Sobolev ellipsoid \(B(m, c)\), so that \(\|\omega_k(s, a)\|_2 = O(K^{-m})\) for each \(a \in A\).
Lemma 3. Suppose \(s_t\) is a first-order Markov process, and assumption 5.(i) holds. For any \(J \geq 1\), we have
\[
\mathrm{E}(g(s_{t+J}) \mid s_t) = q^K(s_t)'\Gamma^J \rho + \sum_{j=1}^{J} \mathrm{E}\bigl(\omega^K(s_{t+J-j})'\Gamma^{j-1}\rho \,\big|\, s_t\bigr) + \mathrm{E}(\nu(s_{t+J}) \mid s_t),
\]
for any \(t\).
Proposition 1. Suppose \(s_t\) is a first-order Markov process, and assumptions 5.(i)-(ii) hold. For any \(J \geq 1\) (including \(J \to \infty\)),
\[
\bigl\| \mathrm{E}(g(s_{t+J}) \mid s_t) - q^K(s_t)'\Gamma^J \rho \bigr\|_2 = O(K^{1/2 - m}).
\]
Proposition 2. Suppose \(s_t\) is a first-order Markov process, and assumption 5 holds. For any \(J \geq 1\) (including \(J \to \infty\)),
\[
\bigl\| \mathrm{E}(g(s_{t+J}) \mid s_t, a_t = a) - q^K(s_t)'\Gamma(a)\Gamma^{J-1}\rho \bigr\|_2 = O(K^{1/2 - m}).
\]
Given proposition 2, a reasonable estimator of \(\mathrm{E}(g(s_{t+J}) \mid s_t, a_t = a)\) is \(q^K(s_t)'\hat\Gamma(a)\hat\Gamma^{J-1}\hat\rho\). The two matrices \(\hat\Gamma(a)\) and \(\hat\Gamma\) are easily obtained from the nonparametric regressions of \(q_k(s_{i,t+1})\) on \((s_{i,t}, a_{i,t})\) and on \(s_{i,t}\), respectively, for all \(k = 1, \dots, K\). The vector \(\hat\rho\) is obtained from its definition, \(\rho_j = \int_0^1 g(s) q_j(s)\,\mathrm{d}s\).
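The approximation \(q^K(s_t)'\hat\Gamma(a)\hat\Gamma^{J-1}\hat\rho\) is easy to assemble. Below is a sketch with a cosine basis on \([0, 1]\), a hypothetical controlled chain, and \(g(s) = s^2\) (all illustrative choices), which estimates a \(J\)-step-ahead conditional expectation from a single transition:

```python
import numpy as np

rng = np.random.default_rng(2)
K, n, J = 12, 50_000, 3

def qK(s):                       # orthonormal cosine basis of L^2(0, 1)
    cols = [np.ones_like(s)] + [np.sqrt(2.0)*np.cos(k*np.pi*s) for k in range(1, K)]
    return np.column_stack(cols)

g = lambda s: s**2               # the known function whose expectations we approximate

# rho_j = integral of g(s) q_j(s) ds, computed on a fine grid
grid = (np.arange(20_000) + 0.5) / 20_000
rho = qK(grid).T @ g(grid) / grid.size

# two-period data from a hypothetical controlled chain on [0, 1] (illustrative only)
s0 = rng.random(n)
a0 = (rng.random(n) < 0.5).astype(float)
s1 = np.clip(0.5*s0 + 0.2*a0 + 0.15*rng.normal(size=n), 0.0, 1.0)

# Gamma-hat and Gamma-hat(a): regress q_k(s_{t+1}) on q^K(s_t); column k holds gamma_k
Q0, Q1 = qK(s0), qK(s1)
Gamma = np.linalg.lstsq(Q0, Q1, rcond=None)[0]
Gamma_a = {v: np.linalg.lstsq(Q0[a0 == v], Q1[a0 == v], rcond=None)[0]
           for v in (0.0, 1.0)}

def predict(s, a):               # q^K(s)' Gamma(a) Gamma^{J-1} rho
    return qK(np.atleast_1d(s)) @ Gamma_a[a] @ np.linalg.matrix_power(Gamma, J - 1) @ rho

print(predict(0.3, 1.0))
```

The key design point is that only one-step-ahead regressions are ever fitted; multi-step expectations come from powers of \(\hat\Gamma\), which is what makes two-period panel data sufficient.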
3.4 Influence function for the stationary decision process
Letting \(N = nT\), our goal is to characterize the first-order asymptotic properties of \(\hat\theta_J = (\hat R_J' \hat R_J)^{-1} (\hat R_J' \hat Y_J)\) when \(n \to \infty\) and \(T\) is fixed, by deriving its influence function. The derivation mostly uses the pathwise derivative approach of Newey (1994a) and Hahn and Ridder (2013).
Letting \(p_*(s_t)\) be the true CCP, denote by \(y_{it*}\) and \(r_{it*}\) the \(y_{it}\) and \(r_{it}\) of eq. (22) evaluated at the true CCP. The other objects with subscript "\(*\)" are defined in the same fashion. Define
\[
\theta_* = (R_*'R_*)^{-1} R_*'Y_* = \bigl[\mathrm{E}(r_{it*} r_{it*}')\bigr]^{-1} \mathrm{E}(r_{it*} y_{it*}),
\]
\[
\theta_J = (R_{J*}'R_{J*})^{-1} R_{J*}'Y_{J*},
\]
\[
\theta_{J*} = \bigl[\mathrm{E}(r_{it,J*} r_{it,J*}')\bigr]^{-1} \mathrm{E}(r_{it,J*} y_{it,J*}).
\]
The identity in eq. (26) follows from \(y_{it*} = r_{it*}'\theta_*\). We decompose
\[
\sqrt{N}(\hat\theta_J - \theta_*) = \sqrt{N}(\hat\theta_J - \theta_{J*}) + \sqrt{N}(\theta_{J*} - \theta_*), \tag{26}
\]
and make the following assumption.
Assumption 6. (i) There exists a constant \(\tau\) such that \(\sup_s |h_{kj}(s)| < \tau\) for all \(k = 1, 2, 3\) and \(j = 1, 2, \dots\).
(ii) The smallest eigenvalue of \(\mathrm{E}(r_{it,J} r_{it,J}')\) is bounded away from zero uniformly in \(J = 1, 2, \dots\).
In the appendix, we show that the bias term \(\sqrt{N}(\theta_{J*} - \theta_*) = O_p(\beta^{J+1})\) (proposition B.2), so we can focus on the analysis of the variance term \(\sqrt{N}(\hat\theta_J - \theta_{J*})\), which can be understood as a three-step M-estimator with generated dependent variables. Proposition B.2 justifies the truncation we made earlier. The moment equation is
\[
m(x_{it}, \theta, h^J(s_{it}; p), p) = r_{it,J}\,(y_{it,J} - r_{it,J}'\theta).
\]
Equation (24) says that \(r_{it,J}' = (x_{it}' + \sum_{j=1}^{J} \beta^j h_{3j}(s_{it})',\ \sum_{j=1}^{J} \beta^j h_{2j}(s_{it})')\) is a function of \(x_{it}\), \(h_2^J(s_{it})\), and \(h_3^J(s_{it})\), and \(y_{it,J} = \varphi(p(s_{it})) + \sum_{j=1}^{J} \beta^j h_{1j}(s_{it})\) is a function of \(p(s_{it})\) and \(h_1^J(s_{it})\). The vector \(h^J(s_t)' \equiv (h_1^J(s_t)', h_2^J(s_t)', h_3^J(s_t)')\) is itself a function of \(p\), so we write \(h^J(s_{it}; p)\). The estimator \(\hat\theta_J\) solves the moment equation,
\[
\frac{1}{N} \sum_{i,t=1}^{n,T} m(x_{it}, \theta, \hat h^J(s_{it}; \hat p), \hat p) = 0,
\]
and we have \(\mathrm{E}\bigl(m(x_{it}, \theta_{J*}, h_*^J(s_{it}; p_*), p_*)\bigr) = 0\).
Similarly to Hahn and Ridder (2013), using the pathwise derivative approach in Newey (1994a), the influence function associated with \(\sqrt{N}(\hat\theta_J - \theta_{J*})\) can be expressed as a sum of the following terms: (i) the leading term \(\sqrt{N}(\theta_J - \theta_{J*})\); (ii) a term that adjusts for the sampling variation in the nonparametric regressions for estimating \(h_{kj}(s_t)\) when the CCP is known,
\[
\sqrt{N} \Biggl[ \Bigl( \sum_{i,t=1}^{n,T} r_{it,J}(x_{it}, \hat h^J(s_{it}; p_*))\, r_{it,J}(x_{it}, \hat h^J(s_{it}; p_*))' \Bigr)^{-1} \sum_{i,t=1}^{n,T} r_{it,J}(x_{it}, \hat h^J(s_{it}; p_*))\, y_{it,J}(p_*, \hat h^J(s_{it}; p_*)) - \theta_J \Biggr];
\]
and (iii) an adjustment for the CCP estimation \(\hat p\) in the first step, i.e.,
\[
\sqrt{N} \Biggl[ \Bigl( \sum_{i,t=1}^{n,T} r_{it,J}(x_{it}, h_*^J(s_{it}; \hat p))\, r_{it,J}(x_{it}, h_*^J(s_{it}; \hat p))' \Bigr)^{-1} \sum_{i,t=1}^{n,T} r_{it,J}(x_{it}, h_*^J(s_{it}; \hat p))\, y_{it,J}(\hat p, h_*^J(s_{it}; \hat p)) - \theta_J \Biggr].
\]
Proposition B.1 of the Appendix shows that the leading term \(\sqrt{N}(\theta_J - \theta_{J*})\) is \(O_p(\beta^{J+1})\). We then just derive the other two terms.
In appendix D, we present general results on the asymptotic variance of semiparametric M-estimators with generated dependent variables, which cover the present problem as a special case. The results here are derived using the general formulas in appendix D.
We consider the second adjustment term first, which can be analyzed by the pathwise derivative approach in Newey (1994a, pages 1360-1361) or Hahn and Ridder (2013, page 327). The unknown functions in \(h^J(s_t; \hat p)\) in the second-step nonparametric regressions are differences between partial means in Newey (1994b)'s terminology. Letting
\[
\kappa_{it} \equiv \frac{\mathbb{1}(a_{it} = 1)}{p_*(s_{it})} - \frac{1 - \mathbb{1}(a_{it} = 1)}{1 - p_*(s_{it})},
\]
it can be shown that the second adjustment term equals
\[
\frac{1}{\sqrt{N}} \sum_{i,t,j=1}^{n,T,J} \frac{\partial m_*}{\partial y_{it}} \beta^j \bigl[ \kappa_{it}\, \eta(p_*(s_{i,t+j})) - h_{1j*}(s_{it}) \bigr] \tag{27}
\]
\[
+ \frac{1}{\sqrt{N}} \sum_{i,t,j=1}^{n,T,J} \frac{\partial m_*}{\partial r_{it}} \beta^j \binom{0}{1} \otimes \bigl[ \kappa_{it}\, x_{i,t+j} - h_{2j*}(s_{it}) \bigr] \tag{28}
\]
\[
+ \frac{1}{\sqrt{N}} \sum_{i,t,j=1}^{n,T,J} \frac{\partial m_*}{\partial r_{it}} \beta^j \binom{1}{0} \otimes \bigl[ \kappa_{it}\, p_*(s_{i,t+j}) x_{i,t+j} - h_{3j*}(s_{it}) \bigr] \tag{29}
\]
\[
+ o_p(1),
\]
where \(\partial m_*/\partial y_{it}\) and \(\partial m_*/\partial r_{it}\) denote the derivatives of the moment equation \(m(x_{it}, \theta_{J*}, h_*^J(s_{it}; p_*), p_*)\) with respect to \(y_{it,J*}\) and \(r_{it,J*}\), respectively. We have
\[
\frac{\partial m_*}{\partial y_{it}} = r_{it,J*} \quad \text{and} \quad \frac{\partial m_*}{\partial r_{it}} = (y_{it,J*} - r_{it,J*}'\theta_{J*}) I - r_{it,J*}\theta_{J*}'.
\]
Here \(I\) is a \(2d_x \times 2d_x\) identity matrix, and \(d_x\) is the dimension of \(x_t\). Equations (27), (28), and (29) are the adjustment terms for sampling error in \(\hat h_1^J(s_{it}; p_*)\), \(\hat h_2^J(s_{it}; p_*)\), and \(\hat h_3^J(s_{it}; p_*)\), respectively.
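The weight \(\kappa_{it}\) is an inverse-propensity weight: \(\mathrm{E}(\kappa_{it}\, g(s_{i,t+j}) \mid s_{it}) = \mathrm{E}(g \mid s_{it}, a_{it} = 1) - \mathrm{E}(g \mid s_{it}, a_{it} = 0) = \tilde{\mathrm{E}}(g(s_{i,t+j}) \mid s_{it})\), which is why terms like \(\kappa_{it}\eta(p_*(s_{i,t+j})) - h_{1j*}(s_{it})\) are conditionally mean zero. A quick simulated check with hypothetical distributions (the conditional contrast is 1.5 by construction):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
s = rng.random(n)                       # a scalar state
p = 0.3 + 0.4*s                         # true CCP p_*(s)
a = (rng.random(n) < p).astype(float)
y = 1.0 + 2.0*s + 1.5*a + rng.normal(size=n)   # E(y|s,a=1) - E(y|s,a=0) = 1.5

kappa = a/p - (1 - a)/(1 - p)
print(np.mean(kappa * y))               # recovers the conditional contrast, 1.5
```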
The third adjustment term accounts for the sampling variation in estimating the CCP. Unlike the adjustment for the generated regressors studied by Hahn and Ridder (2013), the adjustment for the generated dependent variables can be obtained by simply linearizing the moment equation with respect to the first-step estimator. It can be shown using the formula in appendix D that the third adjustment term equals
\[
\frac{1}{\sqrt{N}} \sum_{i,t=1}^{n,T} \frac{\partial m_*}{\partial y_{it}} \varphi'(p_*(s_{it}))\,(a_{it} - p_*(s_{it})) \tag{30}
\]
\[
+ \frac{1}{\sqrt{N}} \sum_{i,t,j=1}^{n,T,J} \frac{\partial m_*}{\partial y_{it}} \beta^j \kappa_{it} \frac{\partial \eta(p_*(s_{i,t+j}))}{\partial p}\,(a_{it} - p_*(s_{it})) \tag{31}
\]
\[
+ \frac{1}{\sqrt{N}} \sum_{i,t,j=1}^{n,T,J} \frac{\partial m_*}{\partial r_{it}} \beta^j \binom{1}{0} \otimes \kappa_{it}\, x_{i,t+j}\,(a_{it} - p_*(s_{it})) \tag{32}
\]
\[
+ o_p(1), \tag{33}
\]
where
\[
\frac{\partial \eta(p_*(s_{i,t+j}))}{\partial p} = \varphi(p_*(s_{i,t+j})) + p_*(s_{i,t+j})\varphi'(p_*(s_{i,t+j})) - \psi'(p_*(s_{i,t+j})).
\]
Equations (30), (31), and (32) are the adjustment terms for sampling error in \(\varphi(\hat p(s_{it}))\), \(h_1^J(s_t; \hat p)\), and \(h_3^J(s_t; \hat p)\), respectively.
Remark 3. From the influence functions, it is easy to see that the discount factor affects the asymptotic variance of the three-step estimator. The closer the discount factor β is to 1, the larger the asymptotic variance of the three-step estimator.
4 Estimation of General Markov Decision Processes

By "general" Markov decision processes, we mean processes that allow for time varying flow utility functions and state transition distributions, and a finite decision horizon.
The plan for this section is the following. First, we derive a simpler moment condition for the flow utility functions from eq. (14). This simpler moment condition, eq. (35), resembles a partially linear model, in which the unknown function is the integrated value function in the last sampling period, V̄_T(s_T). Using a series approximation of V̄_T(s_T) and partitioned linear regression arguments, we derive an estimable formula for the vector of parameters specifying the flow utility functions. Second, we estimate this formula by a three-step semiparametric estimator. Third, we derive the influence function of the three-step estimator. The argument there has two parts. We first show that the series approximation error of V̄_T(s_T) does not affect the influence function of our three-step estimator under certain conditions. The remaining arguments are similar to those used in deriving the influence function for the infinite horizon stationary model.
4.1 Linear moment condition
Starting from eq. (14b), we have
\[
\bar V_t(s_t) = U_t^o(s_t) + \beta\, E(\bar V_{t+1}(s_{t+1}) \mid s_t).
\]
Iteratively substituting V̄_{t+1}(s_{t+1}) with the above expression and using the Markov property, we have
\[
\bar V_t(s_t) = \sum_{j=0}^{T-t-1} \beta^j E(U_{t+j}^o(s_{t+j}) \mid s_t) + \beta^{T-t} E(\bar V_T(s_T) \mid s_t).
\]
Applying this formula to V̄_{t+1}(s_{t+1}), eq. (14a) becomes
\[
\varphi(p_t(s_t)) = \tilde u_t(x_t) + \sum_{j=1}^{T-t-1} \beta^j \tilde E(U_{t+j}^o(s_{t+j}) \mid s_t) + \beta^{T-t} \tilde E(\bar V_T(s_T) \mid s_t),
\]
where
\[
U_t^o(s_t) = u_t(a_t = 0, x_t) + p_t(s_t)\, \tilde u_t(x_t) + \eta(p_t(s_t)).
\]
Similar to the infinite horizon stationary model, let
\[
u_t(a_t = 0, x_t) = x_t' \alpha_t \quad\text{and}\quad \tilde u_t(x_t) = x_t' \delta_t.
\]
We have the following linear equations,
\[
y_{T-1} = x_{T-1}' \delta_{T-1} + \beta\, \tilde E(\bar V_T(s_T) \mid s_{T-1}), \tag{34a}
\]
\[
y_t = x_t' \delta_t + \sum_{j=1}^{T-t-1} \beta^j h_{2,t,j}(s_t)' \alpha_{t+j} + \sum_{j=1}^{T-t-1} \beta^j h_{3,t,j}(s_t)' \delta_{t+j} + \beta^{T-t} \tilde E(\bar V_T(s_T) \mid s_t), \tag{34b}
\]
for t = 1, …, T − 2, where
\[
y_t \equiv \varphi(p_t(s_t)) + \sum_{j=1}^{T-t-1} \beta^j h_{1,t,j}(s_t)
\quad\text{and}\quad
y_{T-1} \equiv \varphi(p_{T-1}(s_{T-1})),
\]
with
\[
h_{1,t,j}(s_t) \equiv \tilde E(\eta(p_{t+j}(s_{t+j})) \mid s_t),\qquad
h_{2,t,j}(s_t) \equiv \tilde E(x_{t+j} \mid s_t),\qquad
h_{3,t,j}(s_t) \equiv \tilde E(p_{t+j}(s_{t+j})\, x_{t+j} \mid s_t).
\]
Equation (34) provides a set of linear moment conditions about θ = (δ_1', …, δ_{T−1}', α_2', …, α_{T−1}')'.
With some matrix definitions, we have a concise version of eq. (34). Suppose we have panel data (a_it, s_it') with i = 1, …, n and t = 1, …, T. For each i, define a (T − 1) × dim(θ) matrix
\[
R_i = (R_{i,\delta}, R_{i,\alpha}),
\]
where
\[
R_{i,\delta} \equiv
\begin{bmatrix}
x_{i,1}' & \beta h_{3,1,1}(s_{i,1})' & \beta^2 h_{3,1,2}(s_{i,1})' & \cdots & \beta^{T-2} h_{3,1,T-2}(s_{i,1})' \\
 & x_{i,2}' & \beta h_{3,2,1}(s_{i,2})' & \cdots & \beta^{T-3} h_{3,2,T-3}(s_{i,2})' \\
 & & x_{i,3}' & \cdots & \beta^{T-4} h_{3,3,T-4}(s_{i,3})' \\
 & & & \ddots & \vdots \\
0 & & & & x_{i,T-1}'
\end{bmatrix},
\]
\[
R_{i,\alpha} \equiv
\begin{bmatrix}
\beta h_{2,1,1}(s_{i,1})' & \beta^2 h_{2,1,2}(s_{i,1})' & \beta^3 h_{2,1,3}(s_{i,1})' & \cdots & \beta^{T-2} h_{2,1,T-2}(s_{i,1})' \\
 & \beta h_{2,2,1}(s_{i,2})' & \beta^2 h_{2,2,2}(s_{i,2})' & \cdots & \beta^{T-3} h_{2,2,T-3}(s_{i,2})' \\
 & & \beta h_{2,3,1}(s_{i,3})' & \cdots & \beta^{T-4} h_{2,3,T-4}(s_{i,3})' \\
 & & & \ddots & \vdots \\
 & & & & \beta h_{2,T-2,1}(s_{i,T-2})' \\
0 & & & & 0
\end{bmatrix}.
\]
Let
\[
H_i = \bigl(\beta^{T-1}\, \tilde E(\bar V_T(s_T) \mid s_{i1}),\ \ldots,\ \beta\, \tilde E(\bar V_T(s_T) \mid s_{i,T-1})\bigr)'.
\]
For each i, let Y_i = (y_{i1}, …, y_{i,T−1})', and eq. (34) becomes
\[
Y_i = R_i \theta + H_i.
\]
Stacking Y_1, …, Y_n, we have Y = (Y_1', …, Y_n')', which is an N = n · (T − 1) dimensional vector. Define R and H similarly. We have
\[
Y = R\theta + H. \tag{35}
\]
Because H involves the unknown V̄_T(s_T), eq. (35) is similar to the partially linear model (Donald and Newey, 1994). It should be remarked that in the discrete state space case, Chou (2016) shows that, given the exclusion restriction, the integrated value function V̄_T(s_T) is identifiable up to a constant when T ≥ 4.
4.2 Three-step estimator

Before listing our three-step recipe, we need to deal with the unknown function V̄_T(s_T) in H by series approximation. Let
\[
q^K(s) = (q_1(s), \ldots, q_K(s))'
\]
be a vector of approximating functions, such as power series or splines. Let
\[
Q_{i,K} =
\begin{bmatrix}
\beta^{T-1}\, \tilde E(q^K(s_T)' \mid s_{i,1}) \\
\vdots \\
\beta\, \tilde E(q^K(s_T)' \mid s_{i,T-1})
\end{bmatrix}
\quad\text{and}\quad
Q_K' = (Q_{1K}', \ldots, Q_{nK}').
\]
The estimator of θ is the coefficient vector of R_i in the linear regression of Y_i on R_i and Q_{i,K}. By the usual partitioned linear regression arguments, the estimator may be written as
\[
\theta_K = (R' M_K R)^{-1} R' M_K Y,
\]
with M_K = I − Q_K(Q_K' Q_K)^{−1} Q_K'. The invertibility of R'M_K R will be discussed later. In the next subsection, we show that the difference between θ_K and the true value of θ is asymptotically negligible. So our estimator of θ is the sample analog of θ_K, formed in the following three steps.
ฬƒ ๐พ (๐‘ ๐‘‡ ) | ๐‘ ๐‘ก ) for ๐‘ก = 1, โ€ฆ, ๐‘‡ โˆ’ 1. Let ๐‘๐‘กฬ‚ (๐‘ ๐‘ก )
Step 1: Estimate the CCP E(๐‘Ž๐‘–๐‘ก | ๐‘ ๐‘–๐‘ก ) and ๐›ฝ ๐‘‡ โˆ’๐‘ก E(๐‘ž
be the CCP estimator. Let ๐‘„ฬ‚ ๐พ be the estimate of ๐‘„, and let ๐‘€ฬ‚ ๐พ be the estimate
of ๐‘€ from replacing ๐‘„ with ๐‘„ฬ‚ .
๐พ
๐พ
๐พ
Step 2: Let ๐œ‘(๐‘๐‘กฬ‚ (๐‘ ๐‘ก )) be the estimator of ๐œ‘(๐‘๐‘ก (๐‘ ๐‘ก )). Let โ„Žฬ‚ be the estimator of โ„Ž from
nonparametric regressions with generated dependent variables.
Step 3: Then ๐‘Œ ฬ‚ and ๐‘…ฬ‚ are constructed by replacing the unknown ๐œ‘ and โ„Ž with their respective estimates. We have the estimator
ฬ‚ = (๐‘…ฬ‚ โ€ฒ ๐‘€ฬ‚ ๐‘…)
ฬ‚ โˆ’1 ๐‘…ฬ‚ โ€ฒ ๐‘€ฬ‚ ๐พ ๐‘Œ ฬ‚ .
๐œƒ๐พ
๐พ
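As a sketch of the step-3 algebra, the snippet below simulates inputs in which the nuisance term H is spanned exactly by the series basis (all dimensions and values are hypothetical), so the partialled-out estimator recovers θ exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: N stacked observations, dim(theta) regressors in R,
# K series terms in Q; H is taken to equal Q @ rho exactly for the check.
N, d_theta, K = 500, 3, 6
R = rng.normal(size=(N, d_theta))
Q = rng.normal(size=(N, K))
theta_true = np.array([0.5, 1.0, -0.5])
rho = rng.normal(size=K)
Y = R @ theta_true + Q @ rho                 # eq. (35) with H = Q @ rho

# Partitioned (partialling-out) regression: M_K = I - Q (Q'Q)^{-1} Q',
# theta_K = (R' M_K R)^{-1} R' M_K Y.
M_K = np.eye(N) - Q @ np.linalg.solve(Q.T @ Q, Q.T)
theta_K = np.linalg.solve(R.T @ M_K @ R, R.T @ M_K @ Y)

# With H spanned exactly by Q, the estimator recovers theta without bias.
assert np.allclose(theta_K, theta_true)
```

In the paper's estimator the same algebra is applied with the estimated Ŷ, R̂ and M̂_K; the series approximation error of V̄_T(s_T) is what Proposition 3 below controls.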
4.3 Influence function for general decision process

Let N = n · (T − 1). Letting θ_* be the true vector of parameters specifying the flow utility functions, we can decompose
\[
\sqrt{N}(\hat\theta_K - \theta_*) = \sqrt{N}(\hat\theta_K - \theta_K) + \sqrt{N}(\theta_K - \theta_*).
\]
We can focus on √N(θ̂_K − θ_K) (variance term) if ‖√N(θ_K − θ_*)‖ (bias term) is o_p(1). We start by showing the conditions under which ‖√N(θ_K − θ_*)‖ = o_p(1). Then the properties of √N(θ̂_K − θ_K) can be analyzed similarly using the influence function formula for the three-step semiparametric estimators with generated dependent variables, so the details will be omitted.
Proposition 3. Suppose (i) the smallest eigenvalue of E(N^{−1} R' M_K R) is bounded away from zero uniformly in K; (ii) the series approximation error of V̄_T(s_T) is of the order O(K^{−ξ}); more explicitly, there exists ξ > 0 such that
\[
\sup_{s \in \mathcal{S}} \Bigl| \bar V_T(s_T) - \sum_{j=1}^{K} \rho_j q_j(s) \Bigr| \le O(K^{-\xi}).
\]
Then we have ‖θ_K − θ_*‖ = O_p(K^{−ξ}). So if we impose undersmoothing conditions such that K^{−ξ} = o(1/√n), the bias term ‖√N(θ_K − θ_*)‖ = o_p(1).

Proof. See appendix C. ∎
We derive the influence function of θ̂_K here. The derivation is similar to the stationary Markov decision process case. First, the three-step estimator is viewed as an M-estimator with the following moment equation,
\[
m(x_i, \theta, h(s_i; p), p) = \tilde R_i' (\tilde Y_i - \tilde R_i \theta),
\]
where x_i = (x_{i1}', …, x_{i,T−1}')', s_i = (s_{i1}', …, s_{iT}')', and p(s_i) = (p_1(s_1), …, p_{T−1}(s_{T−1}))' is the vector of CCPs from period 1 to T − 1. Here R̃_i = M_{i,K} R and Ỹ_i = M_{i,K} Y, with M_K' = (M_{1,K}', …, M_{n,K}'). Each M_{i,K} is a (T − 1) × N matrix. We write h(s_i; p) to emphasize the dependence of h on the CCP. The three-step estimator solves
\[
\sum_{i=1}^{n} m(x_i, \theta, \hat h(s_i; \hat p), \hat p)/n = 0.
\]
We have E(m(x_i, θ_K, h_*(s_i; p_*), p_*)) = 0.
Using the pathwise derivative approach in Newey (1994a), we again can write √n(θ̂_K − θ_K) as the sum of the three components:
\[
\sqrt{n}\Bigl[\Bigl(\sum_{i=1}^{n} \tilde R_i(x_i, h_*(s_i; p_*))'\, \tilde R_i(x_i, h_*(s_i; p_*))\Bigr)^{-1} \sum_{i=1}^{n} \tilde R_i(x_i, h_*(s_i; p_*))'\, \tilde Y_i(h_*(s_i; p_*)) - \theta_K\Bigr] \tag{i}
\]
\[
\sqrt{n}\Bigl[\Bigl(\sum_{i=1}^{n} \tilde R_i(x_i, \hat h(s_i; p_*))'\, \tilde R_i(x_i, \hat h(s_i; p_*))\Bigr)^{-1} \sum_{i=1}^{n} \tilde R_i(x_i, \hat h(s_i; p_*))'\, \tilde Y_i(\hat h(s_i; p_*)) - \theta_K\Bigr] \tag{ii}
\]
\[
\sqrt{n}\Bigl[\Bigl(\sum_{i=1}^{n} \tilde R_i(x_i, h_*(s_i; \hat p))'\, \tilde R_i(x_i, h_*(s_i; \hat p))\Bigr)^{-1} \sum_{i=1}^{n} \tilde R_i(x_i, h_*(s_i; \hat p))'\, \tilde Y_i(h_*(s_i; \hat p)) - \theta_K\Bigr] \tag{iii}
\]
By the definition of θ_K, term (i) equals zero. We just need to calculate the other two terms. Using the general results on the asymptotic variance of the three-step estimators with generated dependent variables in appendix D, we have the following results. Let
\[
\kappa_{it} = \frac{1(a_{it} = 1)}{p_{t*}(s_{it})} - \frac{1 - 1(a_{it} = 1)}{1 - p_{t*}(s_{it})},
\]
\[
a_{i,t,j} = \beta^j \bigl\{\kappa_{it}\, \eta(p_{t+j,*}(s_{i,t+j})) - h_{1,t,j*}(s_{it})\bigr\},
\]
\[
b_{i,t,j} = \beta^j \bigl[\kappa_{it}\, x_{i,t+j} - h_{2,t,j*}(s_{i,t})\bigr],
\]
\[
c_{i,t,j} = \beta^j \bigl[\kappa_{it}\, p_{t+j,*}(s_{i,t+j})\, x_{i,t+j} - h_{3,t,j*}(s_{it})\bigr],
\]
and define two block upper triangular matrices,
\[
\begin{bmatrix}
b_{i,1,1}' & b_{i,1,2}' & b_{i,1,3}' & \cdots & b_{i,1,T-2}' \\
 & b_{i,2,1}' & b_{i,2,2}' & \cdots & b_{i,2,T-3}' \\
 & & \ddots & \ddots & \vdots \\
0 & & & & b_{i,T-2,1}'
\end{bmatrix} \tag{36}
\]
\[
\begin{bmatrix}
c_{i,1,1}' & c_{i,1,2}' & c_{i,1,3}' & \cdots & c_{i,1,T-2}' \\
 & c_{i,2,1}' & c_{i,2,2}' & \cdots & c_{i,2,T-3}' \\
 & & \ddots & \ddots & \vdots \\
0 & & & & c_{i,T-2,1}'
\end{bmatrix}. \tag{37}
\]
Let
\[
A_i = \Bigl(\sum_{j=1}^{T-2} a_{i,1,j},\ \sum_{j=1}^{T-3} a_{i,2,j},\ \cdots,\ \sum_{j=1}^{1} a_{i,T-2,j},\ 0\Bigr)',
\]
\[
B_i = \Bigl[\, 0_{(T-1)\times d_\delta} \quad \begin{pmatrix} \text{eq. (36)} \\ 0_{1\times d_\alpha} \end{pmatrix} \Bigr],
\qquad
C_i = \Bigl[\, \begin{pmatrix} 0_{(T-2)\times 1} & \text{eq. (37)} \\ 0 & 0_{1\times(T-2)} \end{pmatrix} \quad 0_{(T-1)\times d_\alpha} \Bigr].
\]
Denote
\[
\tilde R_{i*} = \tilde R_i(x_i, h_*(s_i; p_*)).
\]
Term (ii) equals the following,
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \tilde R_{i*}' \bigl[A_i - (B_i + C_i)\theta_K\bigr] - \sqrt{n}\,\theta_K + o_p(1).
\]
Let
\[
E_i = \begin{bmatrix}
\varphi'(p_{1*}(s_{i1}))\,(a_{i1} - p_{1*}(s_{i1})) \\
\vdots \\
\varphi'(p_{T-1*}(s_{i,T-1}))\,(a_{i,T-1} - p_{T-1*}(s_{i,T-1}))
\end{bmatrix},
\]
\[
F_i = \begin{bmatrix}
\sum_{j=1}^{T-2} \beta^j\, \frac{\partial \eta(p_{1+j,*}(s_{i,1+j}))}{\partial p}\, \kappa_{i1}\,(a_{i,1+j} - p_{1+j,*}(s_{i,1+j})) \\
\vdots \\
\beta\, \frac{\partial \eta(p_{T-1,*}(s_{i,T-1}))}{\partial p}\, \kappa_{i,T-2}\,(a_{i,T-1} - p_{T-1,*}(s_{i,T-1})) \\
0
\end{bmatrix}.
\]
Letting
\[
g_{i,t,j} = \beta^j \bigl[x_{i,t+j}\, \kappa_{it}\,(a_{i,t+j} - p_{t+j,*}(s_{i,t+j}))\bigr],
\]
define a block triangular matrix,
\[
\begin{bmatrix}
g_{i,1,1}' & g_{i,1,2}' & g_{i,1,3}' & \cdots & g_{i,1,T-2}' \\
 & g_{i,2,1}' & g_{i,2,2}' & \cdots & g_{i,2,T-3}' \\
 & & \ddots & \ddots & \vdots \\
0 & & & & g_{i,T-2,1}'
\end{bmatrix} \tag{38}
\]
and let
\[
G_i = \Bigl[\, 0_{(T-1)\times 1} \quad \begin{pmatrix} \text{eq. (38)} \\ 0_{1\times(T-2)} \end{pmatrix} \quad 0_{(T-1)\times d_\alpha} \Bigr].
\]
Term (iii) equals the following,
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \tilde R_{i*}'\, M_{i,K}
\begin{bmatrix}
E_1 + F_1 - G_1 \theta_K \\
\vdots \\
E_n + F_n - G_n \theta_K
\end{bmatrix}
- \sqrt{n}\,\theta_K + o_p(1).
\]
5 Monte Carlo Studies

5.1 Infinite horizon stationary decision process
In this numerical example, both x_t and z_t are scalars. The flow utility functions have the quadratic form,
\[
\tilde u(x_t) = \delta_1 + \delta_2 x_t + \delta_3 x_t^2
\quad\text{and}\quad
u(a_t = 0, x_t) = \alpha_1 + \alpha_2 x_t + \alpha_3 x_t^2,
\]
and u_t(a_t = 1, x_t) = ũ(x_t) + u(a_t = 0, x_t). In the simulation, we let
\[
u(a_t = 0, x_t) = \tilde u(x_t) = .5 + x_t - .5 x_t^2.
\]
The support of x_t is [0, 3]. The excluded variable z_t is time invariant (hence we drop the time subscript t of z_t in the sequel). Conditional on (x_t, z), x_{t+1} is generated from
\[
x_{t+1} = \min\bigl(3,\ d_t\,(1 + \xi_{t+1})\, x_t + (1 - d_t) \cdot 2\, \xi_{t+1}\bigr)
\quad\text{with}\quad
\xi_{t+1} \mid z \sim \mathrm{Beta}(\mathrm{shape}_1(z), \mathrm{shape}_2(z)).
\]
The shape parameters of the beta distribution are determined so that E(ξ_{t+1} | z) = z and Var(ξ_{t+1}) = 1/20:
\[
\mathrm{shape}_1(z) = 20 z^2 (1 - z) - z
\quad\text{and}\quad
\mathrm{shape}_2(z) = \frac{\mathrm{shape}_1(z) \cdot (1 - z)}{z}.
\]
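A quick check that these shape functions deliver the stated mean and variance, using the standard Beta(a, b) moment formulas:

```python
import numpy as np

# Shape functions from the text: xi | z ~ Beta(shape1(z), shape2(z)).
def shape1(z):
    return 20 * z**2 * (1 - z) - z

def shape2(z):
    return shape1(z) * (1 - z) / z

# Beta(a, b) has mean a/(a+b) and variance a*b/((a+b)^2 (a+b+1)).
for z in np.linspace(0.06, 0.94, 9):
    a, b = shape1(z), shape2(z)
    assert a > 0 and b > 0                   # positive on (.053, .947)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    assert np.isclose(mean, z)
    assert np.isclose(var, 1 / 20)
```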
To ensure shape_1(z), shape_2(z) > 0, we let z follow a uniform distribution with support [.053, .947]. We set the discount factor β = .8. The utility shocks ε_t(0), ε_t(1) follow independent type-I extreme value distributions. Given these structural parameters, we can determine the true CCP.⁶

Table 1: Estimation of Infinite Horizon Stationary Markov Decision Process

              Flow Utility Functions                       Panel Data
  δ1 = 0.5    δ2 = 1     δ3 = −0.5    α2 = 1     α3 = −0.5     n     T
  .525        .940       −.501        .930       −.456        2e4    2
  (.065)      (.112)     (.032)       (.104)     (.027)
  .544        .903       −.504        .916       −.436        1e4    2
  (.101)      (.169)     (.047)       (.169)     (.045)
  .621        .771       −.500        .905       −.389        5e3    2
  (.182)      (.294)     (.079)       (.296)     (.084)
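For concreteness, a minimal sketch of the state-transition draw in this design (the choice rule below is a hypothetical placeholder; the true CCP comes from solving the dynamic program, as documented with the paper's codes):

```python
import numpy as np

rng = np.random.default_rng(3)

def shape1(z):
    return 20 * z**2 * (1 - z) - z

def simulate_transition(x, z, d):
    """One draw of x_{t+1} given (x_t, z) and choice d in {0, 1}."""
    a = shape1(z)
    b = a * (1 - z) / z
    xi = rng.beta(a, b)
    return min(3.0, d * (1 + xi) * x + (1 - d) * 2 * xi)

# Draw first-period states uniformly from their supports and step forward
# under an arbitrary placeholder choice rule (not the model's CCP).
n = 1000
x1 = rng.uniform(0, 3, n)
z = rng.uniform(0.053, 0.947, n)
d1 = (rng.random(n) < 0.5).astype(int)       # placeholder choices
x2 = np.array([simulate_transition(*args) for args in zip(x1, z, d1)])

assert x2.min() >= 0 and x2.max() <= 3       # transition respects [0, 3]
```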
In the numerical examples, we simulate two periods (T = 2) of dynamic discrete choices by n agents. The first period states {x_i1, z_i : i = 1, …, n} were drawn uniformly from their support. We estimate the model for n = 5,000, 10,000 and 20,000. The discount factor is assumed known in the estimation. Table 1 reports the performance of our estimator based on 1,000 replications.⁷ The performance is satisfactory. There is bias due to the smoothing in the nonparametric ingredient of our three-step semiparametric estimator. As the cross-section sample size increases, the bias decreases. There is no information about α1, because the intercept of u(a_t = 0, x_t) is not identifiable.
5.2 General Markov decision process

Here, the data are still generated from the above stationary model; in the estimation, however, we do not assume stationarity. Instead, we let
\[
\tilde u_t(x_t) = \delta_{1,t} + \delta_{2,t} x_t + \delta_{3,t} x_t^2
\quad\text{and}\quad
u_t(a_t = 0, x_t) = \alpha_{1,t} + \alpha_{2,t} x_t + \alpha_{3,t} x_t^2,
\]
and estimate δ_t and α_t for each sampling period.
In the numerical examples, we simulate six periods (T = 6) of dynamic discrete choices by n agents. As we did for the stationary model, we draw the first period states {x_i1, z_i : i = 1, …, n} uniformly from their support and estimate the model for n = 5,000, 10,000 and 20,000. The discount factor is assumed known in the estimation. Table 2 reports the performance of our estimator based on 1,000 replications. The performance
⁶Readers who are interested in the algorithm for determining the CCP from the structural parameters can find the details in the documentation of our codes.
7 The Monte Carlo simulation studies used the ALICE High Performance Computing Facility at the
University of Leicester.
Table 2: Estimation of General Markov Decision Process

[Table 2 reports, for each sample size n = 2e4, 1e4 and 5e3 (with T = 6), the mean point estimates and standard errors (in parentheses) of the period-specific flow utility coefficients δ_{1,t} = 0.5, δ_{2,t} = 1, δ_{3,t} = −0.5, α_{2,t} = 1 and α_{3,t} = −0.5 for t = 1, …, 5, over 1,000 replications; the cell layout of the table was lost in extraction.]
is satisfactory. There is bias due to the smoothing in the nonparametric ingredient of our three-step semiparametric estimator. As the cross-section sample size increases, the bias decreases. Again, α_{1,t} is not identifiable.
6 Conclusion
We propose a three-step semiparametric estimator of the expected flow utility functions of
structural dynamic programming discrete choice models. Our estimator is an extension of
the CCP estimators in the literature. The main advantage of our estimator is that it does not
require the estimation of the state transition distributions, which could be difficult when
the dimension of the observable state variables is even moderately large. Our estimator can be applied both to infinite horizon stationary discrete choice models and to finite horizon nonstationary models with time varying expected flow utility functions and/or state transition distributions.
Appendix

A Proofs for the approximation formulas in the infinite horizon stationary problem
Proof of Lemma 3. We prove by induction. For J = 1, we have
\[
\begin{aligned}
E(g(s_{t+1}) \mid s_t) &= E(q^K(s_{t+1})'\rho + \nu(s_{t+1}) \mid s_t) \\
&= E(q^K(s_{t+1})'\rho \mid s_t) + E(\nu(s_{t+1}) \mid s_t) \\
&= \sum_{k=1}^{K} \rho_k\, \bar q_k(s_t) + E(\nu(s_{t+1}) \mid s_t) \\
&= \sum_{k=1}^{K} \rho_k \bigl[q^K(s_t)'\gamma_k + \omega_k(s_t)\bigr] + E(\nu(s_{t+1}) \mid s_t) \\
&= q^K(s_t)'\, \Gamma \rho + \omega^K(s_t)'\, \rho + E(\nu(s_{t+1}) \mid s_t),
\end{aligned}
\]
which equals the formula in this lemma.
Suppose the lemma holds for J = J_*; we need to show that it holds for J = J_* + 1. It follows from the Markov property that
\[
\begin{aligned}
E(g(s_{t+J_*+1}) \mid s_t) &= E\bigl(E(g(s_{t+J_*+1}) \mid s_{t+1}) \mid s_t\bigr) \\
&= E\Bigl(q^K(s_{t+1})'\, \Gamma^{J_*} \rho + \sum_{j=1}^{J_*} E(\omega^K(s_{t+J_*+1-j})'\, \Gamma^{j-1} \rho \mid s_{t+1}) + E(\nu(s_{t+J_*+1}) \mid s_{t+1}) \,\Big|\, s_t\Bigr) \\
&= E(q^K(s_{t+1})'\, \Gamma^{J_*} \rho \mid s_t) + \sum_{j=1}^{J_*} E(\omega^K(s_{t+J_*+1-j})'\, \Gamma^{j-1} \rho \mid s_t) + E(\nu(s_{t+J_*+1}) \mid s_t).
\end{aligned}
\]
We can organize E(q^K(s_{t+1})' Γ^{J_*} ρ | s_t) as follows,
\[
\begin{aligned}
E(q^K(s_{t+1})'\, \Gamma^{J_*} \rho \mid s_t) &= E(q^K(s_{t+1})' \mid s_t)\, \Gamma^{J_*} \rho \\
&= (\bar q_1(s_t), \ldots, \bar q_K(s_t))\, \Gamma^{J_*} \rho \\
&= \bigl[(q^K(s_t)'\gamma_1, \ldots, q^K(s_t)'\gamma_K) + \omega^K(s_t)'\bigr] \Gamma^{J_*} \rho \\
&= q^K(s_t)'\, \Gamma^{J_*+1} \rho + \omega^K(s_t)'\, \Gamma^{J_*} \rho.
\end{aligned}
\]
Together, we have
\[
E(g(s_{t+J_*+1}) \mid s_t) - q^K(s_t)'\, \Gamma^{J_*+1} \rho = \sum_{j=1}^{J_*+1} E(\omega^K(s_{t+J_*+1-j})'\, \Gamma^{j-1} \rho \mid s_t) + E(\nu(s_{t+J_*+1}) \mid s_t).
\]
By induction, we conclude that the lemma is correct. ∎
Proof of Proposition 1. We prove by deriving the order of
\[
\sum_{j=1}^{J} E(\omega^K(s_{t+J-j})'\, \Gamma^{j-1} \rho \mid s_t) + E(\nu(s_{t+J}) \mid s_t).
\]
First, viewing E(· | s_t) as a linear operator (whose norm is 1), we know
\[
\|E(\nu(s_{t+J}) \mid s_t)\|_2 \le \|\nu(s)\|_2 = O(K^{-m}).
\]
Second, by the same token,
\[
\|E(\omega^K(s_{t+J-j})'\, \Gamma^{j-1} \rho \mid s_t)\|_2 \le \|\omega^K(s_{t+J-j})'\, \Gamma^{j-1} \rho\|_2.
\]
Let ρ̃_j = Γ^j ρ. By the triangle inequality, we have
\[
\|\omega^K(s_{t+J-j})'\, \Gamma^{j-1} \rho\|_2 = \|\omega^K(s_{t+J-j})'\, \tilde\rho_{j-1}\|_2 \le \sum_{k=1}^{K} |\tilde\rho_{j-1,k}|\, \|\omega_k(s_t)\|_2. \tag{39}
\]
It then follows from Cauchy's inequality that
\[
\text{eq. (39)} \le \sqrt{\sum_{k=1}^{K} |\tilde\rho_{j-1,k}|^2} \cdot \sqrt{\sum_{k=1}^{K} \|\omega_k(s_t)\|^2} = \|\tilde\rho_{j-1}\| \sqrt{\sum_{k=1}^{K} \|\omega_k(s_t)\|^2} \sim \|\tilde\rho_{j-1}\| \cdot O(K^{1/2-m}).
\]
For ρ̃_j = Γ^j ρ, we have ‖ρ̃_j‖ = ‖Γ^j ρ‖ ≤ ‖Γ^j‖‖ρ‖. Because ‖ρ‖ ≤ ‖g‖_2 < ∞, we have ‖ρ̃_j‖ ∼ ‖Γ^j‖. As a result,
\[
\text{eq. (39)} \le \|\Gamma^j\| \cdot O(K^{1/2-m}).
\]
The order of the error term is O(K^{1/2−m}) · Σ_{j=1}^J ‖Γ^j‖. For any finite J, the order is of course O(K^{1/2−m}).

We are also interested in the case when J → ∞. Lemma A.1 shows that ‖Γ‖ ≤ 1, with equality only when ω^K(s_t) = 0 or Γ = 0. For the two trivial cases: if Γ = 0, then the bound is simply zero; if ω^K(s_t) = 0, the bound is simply ‖E(ν(s_{t+J}) | s_t)‖_2 ≤ ‖ν(s)‖_2 = O(K^{−m}). If ‖Γ‖ < 1, we know that the sum above is simply
\[
O(K^{1/2-m}) \cdot \sum_{j=1}^{\infty} \|\Gamma^j\| \le O(K^{1/2-m}) \cdot \sum_{j=1}^{\infty} \|\Gamma\|^j = O(K^{1/2-m})\, \frac{1}{1 - \|\Gamma\|} = O(K^{1/2-m}).
\]
So we conclude that the approximation bound is O(K^{1/2−m}) regardless of the value of J. ∎
Lemma A.1. We have โ€–ฮ“ โ€– โ‰ค 1 with equality only when ๐œ”๐พ (๐‘ ๐‘ก ) = 0 or ฮ“ = 0.
Proof. For any ρ_0 ∈ ℝ^K, we have
\[
E(q^K(s_{t+1})'\, \Gamma \rho_0 \mid s_t) = E(q^K(s_{t+1})' \mid s_t)\, \Gamma \rho_0 = q^K(s_t)'\, \Gamma^2 \rho_0 + \omega^K(s_t)'\, \Gamma \rho_0.
\]
Note that q^K(s) = (q_1(s), …, q_K(s))' is orthogonal to ω^K(s) = (ω_1(s), …, ω_K(s))', because ω_k(s) = Σ_{j=K+1}^∞ γ_{k,j} q_j(s). So we have
\[
\|q^K(s_t)'\, \Gamma^2 \rho_0 + \omega^K(s_t)'\, \Gamma \rho_0\|^2 = \|q^K(s_t)'\, \Gamma^2 \rho_0\|^2 + \|\omega^K(s_t)'\, \Gamma \rho_0\|^2,
\]
and
\[
\|q^K(s_t)'\, \Gamma^2 \rho_0\|_2 \le \|E(q^K(s_{t+1})'\, \Gamma \rho_0 \mid s_t)\|_2,
\]
with equality only when ω^K(s_t)'Γρ_0 = 0. Since q_1(s), …, q_K(s) are orthonormal, we have ‖q^K(s_t)'Γ²ρ_0‖_2 = ‖Γ²ρ_0‖. We also know
\[
\|E(q^K(s_{t+1})'\, \Gamma \rho_0 \mid s_t)\|_2 \le \|q^K(s_{t+1})'\, \Gamma \rho_0\| = \|\Gamma \rho_0\|.
\]
We conclude that ‖Γ²ρ_0‖ ≤ ‖Γρ_0‖ for all ρ_0 ∈ ℝ^K. Thus, ‖Γ‖ ≤ 1 with equality only when ω^K(s_t) = 0 or Γ = 0. ∎
Proof of Proposition 2. We write
\[
\begin{aligned}
E(g(s_{t+J}) \mid s_t, a_t = a) &= E\bigl(E(g(s_{t+J}) \mid s_{t+1}) \mid s_t, a_t = a\bigr) \\
&= E(q^K(s_{t+1})'\, \Gamma^{J-1} \rho + O(K^{1/2-m}) \mid s_t, a_t = a) \\
&= E(q^K(s_{t+1})'\, \Gamma^{J-1} \rho \mid s_t, a_t = a) + O(K^{1/2-m}).
\end{aligned}
\]
We then write
\[
\begin{aligned}
E(q^K(s_{t+1})'\, \Gamma^{J-1} \rho \mid s_t, a_t = a) &= E(q^K(s_{t+1})' \mid s_t, a_t)\, \Gamma^{J-1} \rho \\
&= q^K(s_t)'\, \Gamma(a)\, \Gamma^{J-1} \rho + \omega^K(s_t, a)'\, \Gamma^{J-1} \rho \\
&= q^K(s_t)'\, \Gamma(a)\, \Gamma^{J-1} \rho + O(K^{1/2-m}).
\end{aligned}
\]
The last line follows from the proof of proposition 1. The two displays above together show the result. ∎
B Order of the bias term in the infinite horizon stationary Markov decision process

We first have that
\[
y_{it} - y_{it,J} = \sum_{j=J+1}^{\infty} \beta^j h_{1j}(s_{it}),
\]
\[
r_{it}' - r_{it,J}' = \Bigl(\sum_{j=J+1}^{\infty} \beta^j h_{3j}(s_{it})',\ \sum_{j=J+1}^{\infty} \beta^j h_{2j}(s_{it})'\Bigr).
\]
It follows from Assumption 6.(i) that
\[
|y_{it} - y_{it,J}| \le \beta^{J+1}\, 2\tau/(1 - \beta), \tag{40a}
\]
\[
\|r_{it} - r_{it,J}\| \le \beta^{J+1} \sqrt{d_x}\, \tau/(1 - \beta). \tag{40b}
\]
Proposition B.1. Given Assumption 6, we have
\[
\sqrt{N}(\theta_J - \theta_{J*}) = O_p(\beta^{J+1}).
\]
Proof. As an M-estimator, the influence function of θ_J is
\[
r_{it,J*}\,(y_{it,J*} - r_{it,J*}'\, \theta_{J*}).
\]
For the simplicity of exposition, we omit the subscript "*" in r_{it,J*} and y_{it,J*} throughout the proof. To prove the proposition, it suffices to show that
\[
r_{it,J}\,(y_{it,J} - r_{it,J}'\, \theta_{J*}) = O_p(\beta^{J+1}).
\]
Because y_it − r_it'θ_* = 0, it can be shown that
\[
\begin{aligned}
r_{it,J}(y_{it,J} - r_{it,J}'\theta_{J*}) &= r_{it,J}(y_{it,J} - r_{it,J}'\theta_{J*}) - r_{it,J}(y_{it} - r_{it}'\theta_*) \\
&= r_{it,J}(y_{it,J} - y_{it}) && \text{(41)} \\
&\quad + r_{it,J}(r_{it} - r_{it,J})'\theta_{J*} && \text{(42)} \\
&\quad + r_{it,J}\, r_{it}'(\theta_* - \theta_{J*}). && \text{(43)}
\end{aligned}
\]
It follows from eq. (40) that eqs. (41) and (42) are both O_p(β^{J+1}). To complete the proof, we only need to show that θ_* − θ_{J*} = O_p(β^{J+1}). We have E(r_{it,J}(y_{it,J} − r_{it,J}'θ_{J*})) = 0, and
\[
E\bigl(r_{it,J}(y_{it,J} - r_{it,J}'\theta_{J*} - y_{it} + r_{it}'\theta_*)\bigr) = E\bigl(r_{it,J}(y_{it,J} - r_{it,J}'\theta_{J*})\bigr) = 0.
\]
It then can be shown that
\[
E(r_{it,J}\, r_{it,J}')(\theta_{J*} - \theta_*) = E(r_{it,J}(y_{it,J} - y_{it})) - E(r_{it,J}(r_{it,J} - r_{it})')\theta_*.
\]
Applying eq. (40), we have θ_* − θ_{J*} = O_p(β^{J+1}). So we conclude that r_{it,J}(y_{it,J} − r_{it,J}'θ_{J*}) is O_p(β^{J+1}). ∎
Proposition B.2. Given Assumption 6, we have √N(θ_{J*} − θ_*) = O_p(β^{J+1}).

Proof. We write √N(θ_{J*} − θ_*) = √N(θ_{J*} − θ_J) + √N(θ_J − θ_*). Proposition B.1 has shown that √N(θ_{J*} − θ_J) = O_p(β^{J+1}). Lemma B.1 below shows that √N(θ_J − θ_*) = O_p(β^{J+1}). ∎

Lemma B.1. Given Assumption 6, we have √N(θ_J − θ_*) = O_p(β^{J+1}).
Proof. For the simplicity of exposition, we omit the subscript โ€œ*โ€. We decompose ๐œƒ๐ฝ โˆ’ ๐œƒ as
the following sum,
โˆ’1
โˆ’1
(๐‘…๐ฝ โ€ฒ ๐‘…๐ฝ /๐‘ ) (๐‘…๐ฝ โ€ฒ ๐‘Œ๐ฝ )/๐‘ โˆ’ (๐‘…๐ฝ โ€ฒ ๐‘…๐ฝ /๐‘ ) ๐‘…โ€ฒ ๐‘Œ /๐‘
โ€ฒ
โˆ’1
(44)
โˆ’1
+(๐‘…๐ฝ ๐‘…๐ฝ /๐‘ ) ๐‘…โ€ฒ ๐‘Œ /๐‘ โˆ’ (๐‘…โ€ฒ ๐‘…/๐‘ ) ๐‘…โ€ฒ ๐‘Œ /๐‘ .
(45)
Below, we derive the orders of eq. (44) and (45).
We first derive the order of eq. (44), which equals
โ€ฒ
E(๐‘Ÿ๐‘–๐‘ก,๐ฝ ๐‘Ÿ๐‘–๐‘ก,๐ฝ
)โˆ’1 [(๐‘…๐ฝ โ€ฒ ๐‘Œ๐ฝ )/๐‘ โˆ’ ๐‘…โ€ฒ ๐‘Œ /๐‘ ] + ๐‘œ๐‘ (1),
if (๐‘…๐ฝ โ€ฒ ๐‘Œ๐ฝ )/๐‘ โˆ’ ๐‘…โ€ฒ ๐‘Œ /๐‘ = ๐‘‚๐‘ (1), which will be shown. Because Assumption 6.(ii) has
โ€ฒ
assumed the smallest eigenvalue of E(๐‘Ÿ๐‘–๐‘ก,๐ฝ ๐‘Ÿ๐‘–๐‘ก,๐ฝ
) is bounded from zero, it suffices to derive the
order of (๐‘…๐ฝ โ€ฒ ๐‘Œ๐ฝ )/๐‘ โˆ’๐‘…โ€ฒ ๐‘Œ /๐‘ to find the order of eq. (44). Decompose (๐‘…๐ฝ โ€ฒ ๐‘Œ๐ฝ )/๐‘ โˆ’๐‘…โ€ฒ ๐‘Œ /๐‘
as the following sum
(๐‘…๐ฝโ€ฒ ๐‘Œ๐ฝ โˆ’ ๐‘…๐ฝโ€ฒ ๐‘Œ )/๐‘ + (๐‘…๐ฝโ€ฒ ๐‘Œ โˆ’ ๐‘…โ€ฒ ๐‘Œ )/๐‘ .
To derive the order of (R_J′Y_J − R_J′Y)/N, we consider its squared mean and then use the Markov inequality. Recall that r_{it,J} is a d_θ-dimensional vector. We have

E(‖(R_J′Y_J − R_J′Y)/N‖²) = Σ_{ℓ=1}^{d_θ} E([N^{−1} Σ_{i=1}^{n} Σ_{t=1}^{T} r_{ℓ,it,J}(y_{it,J} − y_{it})]²).

Since d_θ is finite, it suffices to consider the order of an individual term of the above sum. Since T is finite, without loss of generality, we consider the case where T = 2. The term E([N^{−1} Σ_{i=1}^{n} Σ_{t=1}^{T=2} r_{ℓ,it,J}(y_{it,J} − y_{it})]²) equals

(2n)^{−1} E([r_{ℓ,it,J}(y_{it,J} − y_{it})]²) + (2n)^{−1} E(r_{ℓ,i1,J}(y_{i1,J} − y_{i1}) r_{ℓ,i2,J}(y_{i2,J} − y_{i2})).

Because the regressors r_{ℓ,it,J} are bounded, and the difference |y_{it} − y_{it,J}| is of order β^{J+1}, we conclude

E([N^{−1} Σ_{i=1}^{n} Σ_{t=1}^{T=2} r_{ℓ,it,J}(y_{it,J} − y_{it})]²) = O(n^{−1} β^{2(J+1)}).
By the Markov inequality, (R_J′Y_J − R_J′Y)/N = O_p(N^{−1/2} β^{J+1}). Similar arguments show that (R_J′Y − R′Y)/N = O_p(N^{−1/2} β^{J+1}). Hence, we conclude that (R_J′Y_J − R′Y)/N, and therefore eq. (44), is O_p(N^{−1/2} β^{J+1}).
We next derive the order of eq. (45), which can be rewritten as

[(R_J′R_J/N)^{−1} − (R′R/N)^{−1}]R′Y/N.

The term R′Y/N = O_p(N^{−1/2}). Letting

U_J ≡ 2R_J′(R − R_J)/N + (R − R_J)′(R − R_J)/N,

the term (R_J′R_J/N)^{−1} − (R′R/N)^{−1} equals

(R_J′R_J/N)^{−1} U_J (R_J′R_J/N)^{−1}[I + U_J(R_J′R_J/N)^{−1}]^{−1}.

This can be shown by the Woodbury formula from linear algebra. It is easy to see that U_J = O_p(β^{J+1}); hence the above display is also O_p(β^{J+1}). We then conclude that eq. (45) is O_p(N^{−1/2} β^{J+1}). ∎
C Order of the bias term in general Markov decision process
Proof of Proposition 3. Let ν(s_T) = V̄_T(s_T) − Σ_{j=1}^{K} ρ_j q_j(s_T) be the residual of the series approximation. Let ν̃ = H − Q_K ρ^K denote the vector with entries β^{T−t} E(ν(s_T) | s_{it}) for each i and t, where ρ^K = (ρ_1, …, ρ_K)′.

We will show that ‖√N(θ_K − θ_*)‖ ≤ √λ_max ‖ν̃‖, where λ_max is the largest eigenvalue of R^* ≡ (N^{−1}R′M_K R)^{−1}. Replacing Y in θ_K = (R′M_K R)^{−1}R′M_K Y with Y = Rθ_* + H, and using that M_K annihilates Q_K, we have θ_K − θ_* = (R′M_K R)^{−1}R′M_K ν̃. Then

‖N(θ_K − θ_*)‖² = ‖R^* R′M_K ν̃‖²
= ν̃′M_K R R^* R^* R′M_K ν̃
= ν̃′M_K R (R^*)^{1/2} R^* (R^*)^{1/2} R′M_K ν̃
≤ λ_max · ν̃′M_K R (R^*)^{1/2} (R^*)^{1/2} R′M_K ν̃
= λ_max · ν̃′M_K R R^* R′M_K ν̃
= N λ_max · ν̃′M_K R (R′M_K R)^{−1} R′M_K ν̃
≤ N λ_max ‖ν̃‖²,

because M_K R(R′M_K R)^{−1}R′M_K is a projection matrix. We therefore have

‖√N(θ_K − θ_*)‖ ≤ √λ_max ‖ν̃‖.        (46)
By the condition that sup_{s∈𝒮} |V̄_T(s) − Σ_{j=1}^{K} ρ_j q_j(s)| = sup_{s∈𝒮} |ν(s)| = O(K^{−ξ}), we have ‖ν̃‖ = √N O(K^{−ξ}). Also, √λ_max < ∞ with probability 1. So we have ‖θ_K − θ_*‖ = O_p(K^{−ξ}). ∎
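The key inequality in this proof (eq. (46)) only uses the fact that M_K R(R′M_K R)^{−1}R′M_K is a projection matrix, so it can be verified numerically with arbitrary matrices. A minimal sketch with made-up R, series terms Q, and residual vector ν:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, K = 500, 3, 10
R = rng.normal(size=(N, d))                          # regressor matrix
Q = rng.normal(size=(N, K))                          # series terms q_j(s)
M = np.eye(N) - Q @ np.linalg.solve(Q.T @ Q, Q.T)    # projection off the series
nu = rng.normal(size=N)                              # residual vector of the approximation

# theta_K - theta_* = (R'M R)^{-1} R'M nu
diff = np.linalg.solve(R.T @ M @ R, R.T @ M @ nu)
lam_max = np.linalg.eigvalsh(np.linalg.inv(R.T @ M @ R / N)).max()

lhs = np.sqrt(N) * np.linalg.norm(diff)
rhs = np.sqrt(lam_max) * np.linalg.norm(nu)
assert lhs <= rhs + 1e-8                             # the bound in eq. (46) holds
```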
D Asymptotic variance of three-step semiparametric M-estimators with generated dependent variables

This appendix derives the asymptotic variance of three-step semiparametric M-estimators with generated dependent variables in the second step nonparametric regressions. First, we describe the three-step estimator to which our proposed formula applies. Second, we derive the influence function and the asymptotic variance. Third, we derive the asymptotic variance formula when the second step nonparametric regressions are "partial means" in Newey's (1994b) terminology. The general formula can be applied to our three-step CCP estimator.
Assume that we have a random sample {y_i, x_i, z_i : i = 1, …, n} of the random variables y, x and z. In the first step, we obtain an estimator p̂(x, z) of p_*(x, z) ≡ E(y | x, z) by a nonparametric regression of y on x and z. The variable w_* = φ(x, z, p_*(x, z)) is the dependent variable in the second step nonparametric regression. The function φ is known.

In the second step, the goal is to estimate

h_*(x; w_*) ≡ E(w_* | x),    w_* = φ(x, z, p_*(x, z)).

Obviously, when φ(x, z, p) is linear in p, the first and second steps can be merged by the law of iterated expectations, so we are interested in the case where φ is nonlinear in p. Because p_*(x, z), and hence w_*, is not observable, we use ŵ_i = φ(x_i, z_i, p̂(x_i, z_i)) in the nonparametric regression. The nonparametric regression estimator of ŵ on x is denoted by ĥ(x; ŵ).
The third step defines a semiparametric M-estimator θ̂ that solves a moment equation of the form

n^{−1} Σ_{i=1}^{n} m(x_i, θ̂, ĥ(x_i; ŵ)) = 0.

Suppose that E(m(x, θ_*, h_*(x; w_*))) = 0. Moreover, assume that m depends on h only through its value h(x). Our interest is to characterize the first-order asymptotic properties of θ̂.
We derive the influence function of θ̂ using the pathwise derivative approach in Newey (1994a, pages 1360–1361). For a path {F_τ} that equals the true distribution of (y, x, z) when τ = τ_*, let μ(F_τ) be the probability limit of θ̂ if the data were generated from F_τ. We have

E_τ(m(x, μ(F_τ), h_τ(x; w_τ))) = 0,

where w_τ = φ(x, z, p_τ(x, z)), p_τ = E_τ(y | x, z), and h_τ(x; w_τ) = E_τ(w_τ | x). Let M ≡ ∂E(m(x, θ, h_*(x; w_*)))/∂θ|_{θ=θ_*}, and let S(y, x, z) = ∂ln(dF_τ)/∂τ be the score function (the derivatives with respect to τ are always evaluated at τ = τ_*). For expositional simplicity, we write ∂f(x_*)/∂x = ∂f(x)/∂x|_{x=x_*} for a generic function f(x).
It can be shown by Newey's pathwise derivative approach that −M ∂μ(F_τ)/∂τ equals the sum of the following three terms:

E(m(x, θ_*, h_*(x; w_*)) S(y, x, z)),        (i)

E((∂m(x, θ_*, h_*(x; w_*))/∂h)(w_* − h_*(x)) S(y, x, z)),        (ii)

E((∂m(x, θ_*, h_*(x; w_*))/∂h)(∂φ(x, z, p_*(x, z))/∂p)(y − p_*(x, z)) S(y, x, z)).        (iii)

Here term (i) is the leading term that captures the uncertainty that would remain if we knew h_* and p_*, and hence w_*. Term (ii) accounts for the sampling variation in estimating E(w_* | x) when w_* is known. Term (iii) reflects the sampling variation in p̂. Term (ii) is easy to derive from Newey (1994a, pages 1360–1361); term (iii) needs more explanation.
Term (iii) is derived from the pathwise derivative

∂E(m(x, θ_*, h_*(x; w_τ)))/∂τ.        (47)

We have

eq. (47) = ∂E((∂m(x, θ_*, h_*(x; w_*))/∂h)[∂h_*(x; w)/∂w]((∂φ(x, z, p_*(x, z))/∂p) p_τ))/∂τ
= ∂E((∂m(x, θ_*, h_*(x; w_*))/∂h) E((∂φ(x, z, p_*(x, z))/∂p) p_τ | x))/∂τ
= ∂E((∂m(x, θ_*, h_*(x; w_*))/∂h)(∂φ(x, z, p_*(x, z))/∂p) p_τ)/∂τ.        (48)

The second line follows from the fact that h_*(x; w) = E(w | x) is a linear functional of w, so its Fréchet derivative is itself. The last line follows from the law of iterated expectations. Since p_τ = E_τ(y | x, z) is a projection, term (iii) can be derived from equation (4.5) of Newey (1994a).
From terms (i), (ii) and (iii), we conclude that the influence function of the three-step estimator θ̂ is

−M^{−1}[m(x, θ_*, h_*(x; w_*)) + (∂m(x, θ_*, h_*(x; w_*))/∂h)(w_* − h_*(x))
+ (∂m(x, θ_*, h_*(x; w_*))/∂h)(∂φ(x, z, p_*(x, z))/∂p)(y − p_*(x, z))].
Remark 4 (Difference between generated regressors and generated dependent variables). Hahn and Ridder (2013) derive the asymptotic variance of semiparametric estimators with generated regressors. The estimation there also proceeds in three steps: the estimator from the first step is used to generate regressors for the nonparametric regression in the second step. An interesting conclusion of their paper is that when the third step estimator is linear in the conditional mean estimated in the second step, the sampling variation of the first step estimator does not affect the asymptotic variance of the three-step estimator. However, it is easy to see from term (iii) here that if the first step is instead used to generate dependent variables, the sampling variation of the first step always affects the asymptotic variance of the three-step estimator. ∎
We next consider the case where the second step nonparametric regression involves partial means instead of full means. In particular, suppose there is a discrete variable a taking values in 1, …, K. In our dynamic discrete choice model, a is y itself. The second step is to estimate

h_{k*}(x; w_*) = E(w_* | x, a = k),    w_* = φ(x, z, p_*(x, z)).

The leading term does not change. Using the identity

h_{k*}(x; w_*) = E(w_* | x, a = k) = E(1(a = k) w_* / Pr(a = k | x) | x),

we can show that the second term now becomes

E((∂m(x, θ_*, h_*(x; w_*))/∂h)(1(a = k) w_* / Pr(a = k | x) − h_{k*}(x)) S(y, x, z)).
For the third term, we have

eq. (47) = ∂E((∂m(x, θ_*, h_*(x; w_*))/∂h)[∂h_*(x; w)/∂w]((∂φ(x, z, p_*(x, z))/∂p) p_τ))/∂τ
= ∂E((∂m(x, θ_*, h_*(x; w_*))/∂h) E((∂φ(x, z, p_*(x, z))/∂p) p_τ | x, a = k))/∂τ
= ∂E((∂m(x, θ_*, h_*(x; w_*))/∂h)(1(a = k)/Pr(a = k | x))(∂φ(x, z, p_*(x, z))/∂p) p_τ)/∂τ.
Hence the third adjustment term of the influence function is

E((∂m(x, θ_*, h_*(x; w_*))/∂h)(1(a = k)/Pr(a = k | x))(∂φ(x, z, p_*(x, z))/∂p)(y − p_*(x, z)) S(y, x, z)).
References

Aguirregabiria, V. (2010). Another look at the identification of dynamic discrete decision processes: An application to retirement behavior. Journal of Business & Economic Statistics 28(2), 201–218.

Aguirregabiria, V. and P. Mira (2002). Swapping the nested fixed point algorithm: A class of estimators for discrete Markov decision models. Econometrica 70(4), 1519–1543.

Aguirregabiria, V. and P. Mira (2010). Dynamic discrete choice structural models: A survey. Journal of Econometrics 156(1), 38–67.

Altuğ, S. and R. A. Miller (1998). The effect of work experience on female wages and labour supply. The Review of Economic Studies 65(1), 45–85.

Arcidiacono, P. and R. A. Miller (2011). Conditional choice probability estimation of dynamic discrete choice models with unobserved heterogeneity. Econometrica 79(6), 1823–1867.

Arcidiacono, P. and R. A. Miller (2016). Nonstationary dynamic models with finite dependence. Technical report, Mimeo.

Blundell, R., M. Costa Dias, C. Meghir, and J. Shaw (2016). Female labor supply, human capital, and welfare reform. Econometrica 84(5), 1705–1753.

Chou, C. (2016). Identification and linear estimation of general dynamic programming discrete choice models. Mimeo, University of Leicester, Leicester.

Donald, S. G. and W. K. Newey (1994). Series estimation of semilinear models. Journal of Multivariate Analysis 50(1), 30–40.

Eckstein, Z. and K. I. Wolpin (1989). Dynamic labour force participation of married women and endogenous work experience. The Review of Economic Studies 56(3), 375–390.

Fang, H. and Y. Wang (2015). Estimating dynamic discrete choice models with hyperbolic discounting, with an application to mammography decisions. International Economic Review 56(2), 565–596.

Hahn, J. and G. Ridder (2013). Asymptotic variance of semiparametric estimators with generated regressors. Econometrica 81(1), 315–340.

Heckman, J. J., L. J. Lochner, and P. E. Todd (2003). Fifty years of Mincer earnings regressions. Working Paper 9732, National Bureau of Economic Research.

Helmberg, G. (2008). Introduction to Spectral Theory in Hilbert Space. Mineola, N.Y.: Dover Publications, Inc.

Hotz, V. J. and R. A. Miller (1993). Conditional choice probabilities and the estimation of dynamic models. The Review of Economic Studies 60(3), 497–529.

Hotz, V. J., R. A. Miller, S. Sanders, and J. Smith (1994). A simulation estimator for dynamic models of discrete choice. The Review of Economic Studies 61(2), 265–289.

Huggett, M., G. Ventura, and A. Yaron (2011). Sources of lifetime inequality. The American Economic Review 101(7), 2923–2954.

Kalouptsidi, M., P. T. Scott, and E. Souza-Rodrigues (2015). Identification of counterfactuals and payoffs in dynamic discrete choice with an application to land use. Working paper, NBER.

Kristensen, D., L. Nesheim, and A. de Paula (2014). CCP and the estimation of nonseparable dynamic models. Mimeo, University College London.

Magnac, T. and D. Thesmar (2002). Identifying dynamic discrete decision processes. Econometrica 70(2), 801–816.

Matzkin, R. L. (1992). Nonparametric and distribution-free estimation of the binary threshold crossing and the binary choice models. Econometrica, 239–270.

Miller, R. A. (1997). Estimating models of dynamic optimization with microeconomic data. In M. H. Pesaran and P. Schmidt (Eds.), Handbook of Applied Econometrics, Volume 2, Chapter 6, pp. 246–299. Blackwell, Oxford.

Newey, W. K. (1994a). The asymptotic variance of semiparametric estimators. Econometrica, 1349–1382.

Newey, W. K. (1994b). Kernel estimation of partial means and a general variance estimator. Econometric Theory 10(2), 1–21.

Norets, A. and X. Tang (2014). Semiparametric inference in dynamic binary choice models. The Review of Economic Studies 81, 1229–1262.

Pesendorfer, M. and P. Schmidt-Dengler (2008). Asymptotic least squares estimators for dynamic games. The Review of Economic Studies 75(3), 901–928.

Srisuma, S. and O. Linton (2012). Semiparametric estimation of Markov decision processes with continuous state space. Journal of Econometrics 166(2), 320–341.

Wasserman, L. (2006). All of Nonparametric Statistics. New York: Springer.