Estimating Dynamic Economic Models with Fixed Effects∗
PRELIMINARY AND INCOMPLETE
Jeppe Druedahl†
Thomas H. Jørgensen‡
Dennis Kristensen§
January 30, 2017
Abstract
We propose a novel approach to estimate dynamic economic models with heterogeneous agents from observed behavior. The estimator is non-parametric in the
sense that it does not impose any restrictions on the distribution of heterogeneous
parameters. We develop the asymptotic behavior of the estimator and Monte Carlo
results show that the proposed estimator works well even in relatively short panels.
We apply our method to estimate a model of intertemporal consumption allocation
allowing for heterogeneity in time preferences using high quality Danish longitudinal
register data. We find substantial heterogeneity in preferences within educational
strata and the distributions of estimated preferences suggest more mass at high
values of discount factors for high skilled. Finally, we use the estimated household-specific preferences to show that households who have never had unemployment
insurance are also less patient and less risk averse than other households. (JEL:
C14, C51, D91)
Keywords: Heterogeneity, Dynamic Economic Models, Structural Estimation, Intertemporal asset allocation.
∗ We thank Bo E. Honoré, Elena Manresa, Christopher Carroll, Mette Ejrnæs, Lutz Hendricks, Rasmus
Søndergaard Pedersen, Søren Leth-Petersen, Claus Thustrup Kreiner and Anders Munk-Nielsen for fruitful discussions and suggestions. The project also benefited from seminar participants at Princeton and
Copenhagen. Financial support from the Danish Council for Independent Research in Social Sciences is
gratefully acknowledged (FSE, grant no. 4091-00040 and 5052-00086B). Part of this research was carried
out while Jørgensen was visiting Princeton University in the fall 2015. Jørgensen thanks Bo E. Honoré
for his exceptional hospitality and effort in organizing the stay. An earlier draft was circulated under the
title “Heterogeneous Preferences and Wealth Inequality”.
† Department of Economics, University of Copenhagen, Øster Farimagsgade 5, Building 26, DK-1353 Copenhagen K, Denmark. E-mail: [email protected]. Website: http://econ.ku.dk/druedahl.
‡ Department of Economics, University College London, Gower Street, London, United Kingdom. Email: [email protected]. Webpage: www.tjeconomics.com.
§ Department of Economics, University College London, Gower Street, London, United Kingdom. Email: [email protected]. Website: https://sites.google.com/site/econkristensen.
1 Introduction
Economic agents are recurrently found to be heterogeneous in terms of ex ante characteristics such as abilities and preferences. Experiments and surveys have, for example,
repeatedly provided evidence of substantial preference heterogeneity.1 This heterogeneity,
furthermore, often has important positive and normative implications. Heterogeneity in
patience and risk aversion is, for example, important in explaining wealth inequality in
excess of income inequality2 and for understanding asset price puzzles.3 Furthermore, it
might have large effects on the form and level of optimal taxation.4
We propose a novel non-parametric approach to estimate dynamic economic models
with heterogeneous agents from panel data on observed choices. We exploit the fact that systematic
variation in observed choices, beyond what a given economic model and measurement
error can explain, is evidence of heterogeneity. Our estimator of both homogeneous and
heterogeneous parameters is simple to implement without imposing any distributional
assumptions on heterogeneous parameters. It can furthermore be used to estimate models
with both discrete and continuous choices, though we focus on the latter.
For concreteness, imagine that the goal is to estimate a consumption-saving model in
the spirit of Deaton (1991), allowing for heterogeneity in a single preference parameter,
and that we have access to panel data of N households, indexed by i, where we for $T_i$
periods, indexed by t, observe their market resources $m_{it}$ and level of consumption $c_{it}$.
Further denote the model-implied optimal level of consumption by $c^\star(m_{it}; \theta, \gamma_i)$, where
θ is a vector of homogeneous parameters, and $\gamma_i$ is a household-specific heterogeneous
preference parameter.
In order to estimate θ and the household-specific values of $\gamma_i$, our main assumption
is that the distribution of $\gamma_i$ can be well-approximated by a discrete distribution, Γ. Our
estimator can therefore be thought of as a grouped fixed effects estimator, where the
distribution of $\gamma_i$ is uncovered as the histogram of the household-specific values of $\gamma_i$, as
originally suggested by Kamakura (1991). Postponing the discussion of how to choose Γ
in empirical applications and assuming for now that it is known, the homogeneous
parameters in θ and the group membership of each household, $j_i \in \{1, \ldots, J\}$, can be
estimated using e.g. nonlinear least squares as
1 See e.g. Barsky, Juster, Kimball and Shapiro (1997), Beetsma and Schotman (2001), Holt and Laury
(2005), Andersen, Harrison, Lau and Rutström (2008, 2010), Guiso and Paiella (2008), Kimball, Sahm
and Shapiro (2008, 2009), Dohmen, Falk, Huffman, Sunde, Schupp and Wagner (2011), Andreoni and
Sprenger (2012) and Finke and Huston (2013).
2 See e.g. Krusell and Smith (1997, 1998), Hendricks (2007), Cagetti and De Nardi (2008), Cozzi (2014),
Carroll, Slacalek and Tokuoka (2014) and De Nardi (2015).
3 See e.g. Guvenen (2006, 2009) and Gârleanu and Panageas (2015).
4 See e.g. Kocherlakota (2010) and Farhi and Werning (2012).
$$(\hat\theta, \hat j_1, \hat j_2, \ldots, \hat j_N) = \arg\min_{\theta, j_1, j_2, \ldots, j_N} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \left(c_{it} - c^\star(m_{it}; \theta, \gamma^{j_i})\right)^2$$
Estimating the distribution of γi then boils down to finding the weights on each element
in Γ, $\omega = \{\omega_1, \ldots, \omega_J\}$. These weights can be found by simple population averages,
$$\hat\omega_k = \frac{1}{N} \sum_{i=1}^{N} 1\{\hat j_i = k\}.$$
Traditional fixed effect (FE) estimators, which do not assume finite support of $\gamma_i$, suffer from
an incidental parameter problem known to cause a substantial bias in nonlinear panel
models (see e.g. Hahn and Newey, 2004). Hahn and Moon (2010) argue that because the
classification parameters are super-consistent when assuming finite support, the incidental
parameter problem of our estimator should be less pronounced. Furthermore, unlike random
coefficient models, our estimator allows for arbitrary correlation between heterogeneous
parameters and other model elements. Finally, a major computational advantage of our
estimator is that, conditional on θ, the J solutions to the economic model, $c^\star(\cdot)$, can be
pre-computed. The model thus does not need to be re-solved when estimating the N group
memberships. This substantially reduces the computational time required to evaluate the
criterion function, and makes many and potentially dense points in Γ feasible.
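As a rough illustration of the pre-computation idea, the sketch below solves a stand-in model once per node in Γ for a given θ, assigns each household to the node with the smallest sum of squared residuals, and computes the weights as group shares. The function `solve_model`, the grid values and the simulated panel are hypothetical placeholders chosen for illustration; they are not the buffer-stock model or the data used in the paper.

```python
import random

def solve_model(theta, gamma):
    """Hypothetical stand-in for solving the economic model.
    Returns a policy function m -> c for one (theta, gamma) pair."""
    return lambda m: gamma * m ** theta

def assign_groups(panel, solutions):
    """Assign each household to the node whose pre-computed policy
    gives the smallest sum of squared consumption residuals."""
    memberships = []
    for obs in panel:  # obs is a list of (m_it, c_it) pairs
        ssr = [sum((c - policy(m)) ** 2 for m, c in obs) for policy in solutions]
        memberships.append(min(range(len(solutions)), key=ssr.__getitem__))
    return memberships

def group_weights(memberships, n_groups):
    """Weights on each node: the share of households assigned to it."""
    n = len(memberships)
    return [sum(j == k for j in memberships) / n for k in range(n_groups)]

def simulate_panel(theta, grid, true_groups, periods, noise_sd, rng):
    """Simulate (m, c) observations with additive measurement error."""
    panel = []
    for j in true_groups:
        policy = solve_model(theta, grid[j])
        obs = []
        for _ in range(periods):
            m = rng.uniform(1.0, 5.0)
            obs.append((m, policy(m) + rng.gauss(0.0, noise_sd)))
        panel.append(obs)
    return panel

rng = random.Random(0)
theta = 0.5                      # illustrative value of the homogeneous parameter
grid = [0.2, 0.4, 0.6, 0.8]      # illustrative fixed nodes in Gamma
true_groups = [rng.randrange(len(grid)) for _ in range(200)]
panel = simulate_panel(theta, grid, true_groups, periods=10, noise_sd=0.01, rng=rng)

# The J model solutions are computed once and reused for all N households.
solutions = [solve_model(theta, g) for g in grid]
j_hat = assign_groups(panel, solutions)
omega_hat = group_weights(j_hat, len(grid))
```

In an actual application, `solve_model` would be the expensive step (for example dynamic programming), which is why solving it J times rather than N times matters.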
Monte Carlo estimation results show that the proposed estimator has good finite
sample properties. Specifically, we test its ability to uncover heterogeneous time preferences in the canonical buffer-stock consumption model pioneered by Deaton (1991, 1992)
and Carroll (1992, 1997). Assuming that observed consumption is contaminated with
multiplicative log-normal measurement error, we formulate a maximum likelihood (ML)
version of our estimator. We find that the estimator performs well even with relatively
few time periods and substantial measurement error in consumption. We also find that
while misspecifying the number of and placement of the fixed nodes in γ naturally affects
the performance of the estimator, the estimated distribution of heterogeneous parameters
is very close to the truth. We find a substantial bias in homogeneous parameters when
the number of nodes is incorrect, however. We show how the panel jackknife approach
of Hahn and Newey (2004) and Dhaene and Jochmans (forthcoming) can substantially
reduce the bias in the homogeneous parameters.
To illustrate the empirical applicability of our proposed estimator, we also estimate
the buffer-stock consumption model on Danish administrative register data allowing for
heterogeneous time preferences and/or heterogeneous CRRA coefficients. This model was
first structurally estimated in Gourinchas and Parker (2002) and Cagetti (2003), assuming homogeneous preferences within occupational and educational strata respectively. We
are the first to estimate the model with a non-parametric distribution of preference heterogeneity. Our results suggest that there is substantial heterogeneity within educational
strata. Across educational strata we find that the estimated distributions of discount factors
and CRRA coefficients are shifted towards higher values for high skilled households.
Alan and Browning (2010), which is the only comparable study estimating heterogeneous
preference parameters using observational data, finds similar results using the Panel Study
of Income Dynamics (PSID).
Our explicit estimation of group memberships allows us to perform post-estimation
analyses of the different preference groups. Specifically, we use the estimated household-specific preferences to show that households who have never had unemployment insurance are less patient and less risk averse than other households. This suggests that the
estimated preferences align with economic intuition.
After discussing the related literature below, the paper proceeds as follows. Section
2 formulates and presents the proposed estimator in general notation. Section 3 outlines
the asymptotic theory. Section 4 presents the Monte Carlo estimation results. In Section 5,
we report the estimation results from our empirical application. Finally, we conclude in
Section 6.
1.1 Relation to Existing Estimators
Our proposed estimator is closely related to two recent strands of literature. Firstly,
Bajari, Fox and Ryan (2007) and Fox, Kim, Ryan and Bajari (2011) suggest a similar histogram strategy of fixing a discrete grid of the heterogeneous parameters when estimating
discrete choice models with random coefficients.5 Fox, Kim and Yang (2015) provide formal justification for their non-parametric approach. The assumption in this
existing literature is that the coefficients are random, and they thus seek to estimate the
population weights on each fixed node, ω. In the case of discrete (or discretized) choice
models, this estimator can be formulated as a constrained least squares problem, which
is easy to implement, and is ensured to have a unique global optimum. Unfortunately,
the estimator is much more complex for continuous choice models because it generally
requires solving a highly non-linear optimization problem with all the population weights
as variables.6 All dynamic models with random coefficients, furthermore, face the initial
conditions problem (see e.g. Heckman, 1981).
The second strand of literature, closely related to our proposed estimator, is the
grouped fixed effect (GFE) estimator proposed by Hahn and Moon (2010), Bonhomme
and Manresa (2015) and Bester and Hansen (forthcoming).7 In these papers, both the
placement of groups (i.e. the values in Γ) and the group membership of each observational
unit are explicitly estimated. However, as discussed in Bonhomme and Manresa
(2015), estimating the group placements can in practice imply problems with multiple
local optima and non-convergence.8 This implies that the GFE estimator is mostly useful
when heterogeneity is suspected to be in the form of a small number of “sufficiently”
distinct groups across an unknown domain. Our estimator, by contrast, focuses on the case
of pervasive heterogeneity on a well-known domain.
5 Like our estimator, this facilitates pre-computation of the model solution for all J types (for given θ).
Ackerberg (2009) likewise proposed a combination of importance sampling and change of variables to
reduce simulation based estimation time by “pre-computing” the solution only over relevant objects.
6 The constrained least squares formulation of the estimator can, as shown by Nevo, Turner and Williams
(forthcoming), be recovered for continuous choices in a method of moment version where all the moments
are restricted to be linear in the population weights.
7 See also the related studies by Lin and Ng (2012) and Ando and Bai (2016).
In a broader context, our estimator is also related to a large literature on estimation
of mixture models. A particularly popular estimator in this class is the non-parametric
maximum likelihood estimator (NPMLE) proposed by Heckman and Singer (1984), among
others. These types of estimators often formulate an expected likelihood function where
both the group placements and weights are to be estimated. As for the GFE estimator,
the simultaneous estimation of weights and nodes can result in multiple local optima
and problems of convergence. The common approach to numerically maximize the log
expected likelihood function is to apply the expectation-maximization (EM) algorithm
(Dempster, Laird and Rubin, 1977). Unfortunately, the EM-algorithm has a slow convergence rate and thus requires many evaluations of the likelihood function which can be
very time consuming if the estimator nests a numerical solution of a dynamic economic
model (Pilla and Lindsay, 2001). Empirical applications have therefore been restricted to
cases with a few distinct groups.
In the specific context of estimation of heterogeneous time and risk preferences, our
paper is also closely related to Alan and Browning (2010) and Alan, Browning and Ejrnæs
(2014). They propose a synthetic residual estimation (SRE) approach, where the distance
between observed and simulated consumption data is minimized conditional on fully parametric distributions of preference heterogeneity and the assumption that all households,
irrespective of their individual preferences, draw Euler residuals from a mixture of two
log-normal distributions.9 The main benefit of the SRE estimator is that it does not require a full specification of the income process, or ever solving the model, but on the other
hand it relies on very restrictive parametric assumptions, which our estimator avoids.
2 A Fixed Grouped Fixed Effects Estimator
In this section, we state the proposed estimator in general notation, while we later turn
to a concrete example in our Monte Carlo study. We consider a structural model, which
for unit i (individual, household, firm etc.) at time t has state variables $s_{it}$ and choice
variables $c_{it}$, and implies optimal choices $c^\star_{it} \equiv c^\star(s_{it}; \theta, \gamma_i)$, where θ ∈ Θ is a set of
homogeneous parameters, and $\gamma_i$ is a vector of unit-specific parameters. This could be a
vector of optimal discrete and continuous choice variables in a dynamic economic model.
We wish to estimate θ and $\gamma_i$ using an (unbalanced) panel of N units observed for $T_i$
(potentially non-consecutive) periods, where we in each period observe all the states, $s^{obs}_{it}$,
and a non-empty subset of the choices, $c^{obs}_{it}$, potentially contaminated with measurement
error.
8 In a certain sense the results in the GFE papers can be seen as providing formal justification for cluster
analysis.
9 Note that while the mean Euler-residual (in the absence of borrowing constraints) is independent of
preferences, higher order moments are generally not. This is the case even if the distribution of pooled
Euler-residuals across heterogeneous households is well approximated by a mixture of two log-normals
(as found in Alan and Browning, 2010).
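The fixed effects (FE) estimator is given by
$$\hat\theta^{FE} = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \hat\gamma^{FE}_i(\theta)); \theta\right) \tag{2.1}$$
$$\hat\gamma^{FE}_i(\theta) = \arg\min_{\gamma_i \in \mathbb{R}} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \gamma_i); \theta\right) \tag{2.2}$$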
where g(·) is some criterion function. The FE problem has N + dim(θ) parameters to
be solved for. Especially when it is time consuming to evaluate $c^\star(\cdot)$ (e.g. by stochastic
dynamic programming), this estimator might seem infeasible. We propose an alternative
approximate estimator that aims at limiting the computational burden of FE estimation
of structural dynamic economic models.
Our approach is to formulate a discrete approximation of the continuous FE estimator
in (2.1) in which $\gamma_i$ is restricted to take on only a finite number of values,
$\gamma_i \in \Gamma = \{\gamma^1, \ldots, \gamma^J\}$. We think of the number of nodes, J, as a function of the data but suppress
the dependence throughout. Below we supply an approach to estimate the number of
nodes in applications. Our proposed fixed group fixed effects estimator (FGFE) is then
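$$\hat\theta = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \hat\gamma_i(\theta)); \theta\right) \tag{2.3}$$
$$\hat\gamma_i(\theta) = \arg\min_{\gamma_i \in \Gamma} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \gamma_i); \theta\right) \tag{2.4}$$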
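Let $j = (j_1, \ldots, j_N)$ denote the vector of group memberships and $\mathcal{J} \equiv \{1, 2, \ldots, J\}$
the set of potential group memberships. The group membership is then estimated as
$$\hat j_i = \sum_{k=1}^{J} k \, 1\{\hat\gamma_i(\hat\theta) = \gamma^k\}, \quad \text{where} \quad \sum_{k=1}^{J} 1\{\hat\gamma_i(\hat\theta) = \gamma^k\} = 1.$$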
A key advantage of our proposed estimator is that for a given guess of θ, we can pre-compute the J solutions to the economic model for the various values in Γ, and estimate
the N group membership parameters independently across units from equation (2.4). The
population weights on each element in Γ, $\omega = \{\omega_j\}_{j=1}^{J}$, can subsequently be estimated by
$$\hat\omega_k = \frac{1}{N} \sum_{i=1}^{N} 1\{\hat j_i = k\}, \quad \forall k \in \mathcal{J}. \tag{2.5}$$
The estimator easily handles situations where $\omega_k = 0$ for some k. This is not the case
for estimators where $\gamma^k$ is also estimated. Even if all weights are strictly positive
at the true optimum, a trial value of $\gamma^k$ with $\omega_k = 0$ implies that the objective function
does not change with $\gamma^k$, which severely complicates the optimizer’s decision on how to proceed.
Indeed, Bonhomme and Manresa (2015) report significant problems with finding the global
maximum, which might be due to a high-dimensional problem with many flat regions.10
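The nested structure of (2.3) and (2.4) can be sketched as follows: an outer search over θ, and for each trial θ an inner assignment of every unit to its best node in Γ. This is a minimal sketch under stated assumptions, not the authors' implementation: the outer step is a plain grid search over a scalar θ, and the linear `linear_model`, the squared loss and all numerical values are hypothetical choices made for illustration.

```python
import random

def fgfe(panel, model, loss, theta_grid, gamma_grid):
    """Profiled grouped estimator: for each trial theta, assign every unit
    to its best node in gamma_grid, then keep the theta with the lowest
    average criterion. Returns (theta_hat, group memberships)."""
    best = None
    for theta in theta_grid:
        # Solve the model once per node; the solutions are shared by all units.
        solutions = [model(theta, gamma) for gamma in gamma_grid]
        total, groups = 0.0, []
        for states, choices in panel:
            fits = [sum(loss(c, policy(s)) for s, c in zip(states, choices))
                    for policy in solutions]
            k = min(range(len(fits)), key=fits.__getitem__)  # inner step
            groups.append(k)
            total += fits[k]
        value = total / len(panel)
        if best is None or value < best[0]:                   # outer step
            best = (value, theta, groups)
    return best[1], best[2]

def linear_model(theta, gamma):
    """Hypothetical stand-in for the model solution c*(s; theta, gamma)."""
    return lambda s: theta * s + gamma

def squared_loss(c_obs, c_model):
    return (c_obs - c_model) ** 2

# Simulated data with true theta = 0.7 and unit effects drawn from the grid.
rng = random.Random(1)
gamma_grid = [-0.2, -0.1, 0.0, 0.1, 0.2]
true_groups = [rng.randrange(len(gamma_grid)) for _ in range(100)]
panel = []
for k in true_groups:
    states = [rng.uniform(0.0, 2.0) for _ in range(8)]
    choices = [0.7 * s + gamma_grid[k] + rng.gauss(0.0, 0.02) for s in states]
    panel.append((states, choices))

theta_grid = [0.5 + 0.01 * k for k in range(41)]
theta_hat, groups_hat = fgfe(panel, linear_model, squared_loss, theta_grid, gamma_grid)
```

With a vector θ, the grid search would typically be replaced by a numerical optimizer; the inner assignment step is unchanged.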
2.1 Estimating the Number of Nodes, J
We propose a split-panel cross-validation approach to choose the number of nodes, J, in
applications. For a given guess of J, imagine splitting the panel into I non-overlapping
partitions along the time dimension. For each partition, ι, we can estimate $\hat\theta^J_\iota$ and
$\hat j^J_\iota = (\hat j^J_{1,\iota}, \ldots, \hat j^J_{N,\iota})$ and use these estimated parameters to calculate the sum of squared
prediction errors for the time periods not used in estimation (denoted with subscript −ι),
$$E_\iota(J) \equiv N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \varepsilon_{it,-\iota}\left(\hat\theta^J_\iota, \gamma^{\hat j^J_{i,\iota}}\right)^2.$$
Choosing the J that minimizes the mean squared error,
$$\hat J = \arg\min_{J \in \mathbb{N}} \frac{1}{I} \sum_{\iota=1}^{I} E_\iota(J),$$
provides an estimate of the number of nodes and domain that trades off the bias and
variance of the estimator.
Another way to estimate the number of nodes is a successive approximation approach,
similar to that suggested by Fernández-Villaverde, Rubio-Ramírez and Santos (2006) to
determine the degree of accuracy of a numerical solution method required for the approximate likelihood function to be a good approximation of the exact likelihood. In particular,
the decrease in the estimated objective function can be used as a metric to determine when to
stop adding nodes. While this is a simple metric to compute, choosing when to stop is
somewhat arbitrary.
Other alternatives have been proposed in different strands of literature. A popular approach to determine the number of latent factors in factor analysis or the number
of clusters in cluster analysis is to use information criteria, such as BIC or AIC (see e.g.
Milligan and Cooper, 1985; Bai and Ng, 2002; and Bonhomme and Manresa, 2015).
10 This identification problem is not unique to the GFE estimator. The same identification problem is
also inherent in the random coefficient estimator proposed by Heckman and Singer (1984) and heavily
used in empirical applications.
3 Asymptotic Theory [To come]
Under the assumption that the number of nodes and the placement of these nodes are
known (i.e. known Γ), the estimator studied in Hahn and Moon (2010) is similar to the
type of estimator we consider. Specifically, their estimator focuses on the estimation of
dynamic discrete games of firm behavior with potentially multiple, but a finite number
of, equilibria. In their setup, each market is observed over several time periods and they
assume that the equilibrium played in a market is time-invariant. Translating their setup
into our framework, the equilibrium played in a market is the unobserved heterogeneity
(γi ) in our framework and the assumption of finitely many equilibria is equivalent to our
finite support assumption on Γ.
Hahn and Moon (2010) show that the estimator is consistent as N and T both go
to infinity and that the probability of correct classification converges to one even when the number of time
periods observed, T, grows significantly slower than the number of units, N. Specifically,
they show that for many typical settings, the incidental parameter problem of standard FE
estimators (unrestricted support of $\gamma_i$) vanishes as long as T grows as some log function of
N. Finally, they show that the estimator of the homogeneous parameters, θ, is asymptotically
normal and inference is not affected by the classification parameters due to their fast rate
of convergence.
• Consistency
• Normality (as. var as FE)
• Convergence rates ($N^{-1}$, $T^{-1}$, $J^{-1}$)
– Bias reduction works when T /J → 0 when T, J → ∞
– Asymptotic distribution of the bias reduced estimator
4 Monte Carlo Experiments
We here illustrate the finite sample properties of our proposed FGFE estimator. We study
two examples based on the data generating processes (DGPs):
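$$\text{DGP1:} \quad y_{it} = \rho y_{it-1} + \alpha_i + \varepsilon_{it} \tag{4.1}$$
$$\text{DGP2:} \quad y_{it} = \frac{\exp(\rho y_{it-1} + \alpha_i + \varepsilon_{it})}{1 + \exp(\rho y_{it-1} + \alpha_i + \varepsilon_{it})} \tag{4.2}$$
where $\varepsilon_{it} \sim N(0, 0.01)$ across all simulations.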
For each of the 200 Monte Carlo replications we simulate N = 2,000 individuals for
T ∈ {10, 20, 30} periods and apply our estimator using J ∈ {5, 10, 50} equally spaced
nodes. We simulate data letting $y_{i0} = 0$ and ρ = 0.95, and draw $\alpha_i$ from a normal distribution with
mean zero and variance 0.1, truncated to the interval [−0.3, 0.3]. We assume that researchers
know the domain of the unit-specific coefficients, $\alpha_i$, but not the values. In turn, the
researcher wishes to uncover one homogeneous parameter, ρ, and a vector of heterogeneous
parameters, $\alpha = (\alpha_1, \ldots, \alpha_N)$.
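The simulation design can be sketched in a few lines. The code below is an illustrative sketch, not the authors' code: it assumes that 0.01 and 0.1 denote variances (so the standard deviations are 0.1 and √0.1), and it implements the truncation by rejection sampling, which is one of several possible ways to draw from a truncated normal. The function names are hypothetical.

```python
import math
import random

def draw_alpha(rng, sd=math.sqrt(0.1), lo=-0.3, hi=0.3):
    """Draw from N(0, sd^2) truncated to [lo, hi] by rejection sampling."""
    while True:
        a = rng.gauss(0.0, sd)
        if lo <= a <= hi:
            return a

def simulate(n_units, n_periods, rho=0.95, eps_sd=0.1, nonlinear=False, seed=0):
    """Simulate DGP1 (linear) or DGP2 (logistic transform of the same index).
    Returns (alphas, panel) with panel[i] = [y_i0, y_i1, ..., y_iT], y_i0 = 0."""
    rng = random.Random(seed)
    alphas, panel = [], []
    for _ in range(n_units):
        alpha = draw_alpha(rng)
        path = [0.0]
        for _ in range(n_periods):
            index = rho * path[-1] + alpha + rng.gauss(0.0, eps_sd)
            # exp(x) / (1 + exp(x)) written in its equivalent logistic form
            y = 1.0 / (1.0 + math.exp(-index)) if nonlinear else index
            path.append(y)
        alphas.append(alpha)
        panel.append(path)
    return alphas, panel

# A small example; the paper uses N = 2,000 and T in {10, 20, 30}.
alphas, linear_panel = simulate(n_units=100, n_periods=10)
_, nonlinear_panel = simulate(n_units=100, n_periods=10, nonlinear=True)
```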
Table 4.1: Monte Carlo Results: ρ, Linear Model.

                  Avg. Abs. Bias               MC Std.
               baseline  bias reduced    baseline  bias reduced
T = 10
  FE            0.0550     0.0054         0.0035     0.0053
  FGFE
    J = 5       0.0587     0.0102         0.0061     0.0118
    J = 10      0.0561     0.0061         0.0037     0.0067
    J = 50      0.0550     0.0056         0.0036     0.0054
T = 20
  FE            0.0194     0.0029         0.0017     0.0022
  FGFE
    J = 5       0.0212     0.0062         0.0041     0.0074
    J = 10      0.0199     0.0039         0.0022     0.0040
    J = 50      0.0195     0.0029         0.0017     0.0022
T = 30
  FE            0.0116     0.0018         0.0010     0.0012
  FGFE
    J = 5       0.0132     0.0203         0.0035     0.0247
    J = 10      0.0119     0.0027         0.0016     0.0031
    J = 50      0.0116     0.0018         0.0010     0.0013

Notes: Columns 1 and 2 report the average absolute bias of the baseline and the bias-reduced estimates of ρ across the Monte Carlo replications. Columns 3 and 4 report the standard deviation across the
replications. All results are for the linear DGP1.
Table 4.2 shows that increasing the number of nodes reduces the average mean squared
error and the average MC standard error significantly.
Table 4.2: Monte Carlo Results: αi , Linear Model.

                     Avg. MSE                Avg. MC Std.
               baseline  bias reduced    baseline  bias reduced
T = 10
  FE            0.0260     0.0228         0.1240     0.1097
  FGFE
    J = 5       0.0283     0.0247         0.1332     0.1181
    J = 10      0.0265     0.0232         0.1261     0.1116
    J = 50      0.0260     0.0228         0.1241     0.1097
T = 20
  FE            0.0231     0.0205         0.1152     0.1030
  FGFE
    J = 5       0.0254     0.0224         0.1245     0.1121
    J = 10      0.0236     0.0208         0.1171     0.1048
    J = 50      0.0231     0.0205         0.1152     0.1031
T = 30
  FE            0.0222     0.0201         0.1116     0.1014
  FGFE
    J = 5       0.0245     0.0256         0.1215     0.1255
    J = 10      0.0227     0.0205         0.1136     0.1032
    J = 50      0.0222     0.0201         0.1117     0.1014

Notes: Columns 1 and 2 report the average mean square error of the
baseline and the bias-reduced estimates of $\{\alpha_i\}_{i=1}^{N}$ across the Monte
Carlo replications. Columns 3 and 4 report the average standard
deviation across the replications. All results are for the linear DGP1.
Tables 4.1 and 4.2 report the estimation results related to the homogeneous and
heterogeneous parameters, respectively. The first column in Table 4.1 reports the average
absolute bias and the second column reports the split-panel jackknife bias reduced average
absolute bias. The third and fourth columns report the standard deviation across MC runs
for the baseline and bias reduced estimates. We also report the standard FE estimates
for reference. The FE estimator is feasible in this setup because the computational time
associated with evaluating the models in both DGPs is rather low. In situations where
evaluating the model at a set of parameter values is time consuming, as in many
structural dynamic economic models, implementing the standard FE estimator may easily
be infeasible.
As our theory suggests, the FGFE estimator converges towards the FE estimator and
using 50 nodes delivers almost identical results. The split-panel jackknife bias reduction
reduces the incidental parameter bias significantly (while increasing the variance slightly
in our finite samples).
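For readers unfamiliar with the half-panel jackknife, the sketch below illustrates the correction as we understand it from Dhaene and Jochmans: the corrected estimate is twice the full-panel estimate minus the average of the two half-panel estimates. The example applies it to the standard within (FE) estimator of ρ in a linear AR(1) panel. The design is a hypothetical, deliberately benign one (ρ = 0.5, a burn-in so that the process starts near its stationary distribution, uniform unit effects) and differs from the paper's Monte Carlo design.

```python
import random

def simulate_ar1(n_units, n_periods, rho, rng, burn_in=50):
    """Linear AR(1) panel with unit effects; paths have n_periods + 1 entries."""
    panel = []
    for _ in range(n_units):
        alpha = rng.uniform(-0.3, 0.3)
        y = alpha / (1.0 - rho)          # start at the unit-specific long-run mean
        for _ in range(burn_in):
            y = rho * y + alpha + rng.gauss(0.0, 0.1)
        path = [y]
        for _ in range(n_periods):
            path.append(rho * path[-1] + alpha + rng.gauss(0.0, 0.1))
        panel.append(path)
    return panel

def within_rho(pairs_by_unit):
    """FE (within) estimator of rho from (lagged y, current y) pairs."""
    num = den = 0.0
    for pairs in pairs_by_unit:
        lag_mean = sum(l for l, _ in pairs) / len(pairs)
        cur_mean = sum(c for _, c in pairs) / len(pairs)
        num += sum((l - lag_mean) * (c - cur_mean) for l, c in pairs)
        den += sum((l - lag_mean) ** 2 for l, _ in pairs)
    return num / den

def half_panel_jackknife(panel):
    """Return (full-panel estimate, half-panel jackknife estimate)."""
    pairs = [list(zip(path[:-1], path[1:])) for path in panel]
    half = len(pairs[0]) // 2
    full = within_rho(pairs)
    first = within_rho([p[:half] for p in pairs])
    second = within_rho([p[half:] for p in pairs])
    return full, 2.0 * full - 0.5 * (first + second)

rng = random.Random(0)
panel = simulate_ar1(n_units=500, n_periods=10, rho=0.5, rng=rng)
rho_fe, rho_jk = half_panel_jackknife(panel)
```

The correction removes the leading term of the bias in 1/T at the cost of some additional variance, which is consistent with the pattern in the tables above.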
Table 4.3: Monte Carlo Results: ρ, Nonlinear Model.

                  Avg. Abs. Bias               MC Std.
               baseline  bias reduced    baseline  bias reduced
T = 10
  FE            0.0569     0.0369         0.0110     0.0147
  FGFE
    J = 5       0.0529     0.0910         0.0573     0.1123
    J = 10      0.0544     0.0476         0.0253     0.0500
    J = 50      0.0570     0.0372         0.0117     0.0163
T = 20
  FE            0.0447     0.0221         0.0101     0.0124
  FGFE
    J = 5       0.0458     0.0846         0.0560     0.1036
    J = 10      0.0383     0.0549         0.0333     0.0687
    J = 50      0.0452     0.0244         0.0112     0.0158
T = 30
  FE            0.0374     0.0165         0.0096     0.0114
  FGFE
    J = 5       0.0506     0.0838         0.0597     0.1060
    J = 10      0.0314     0.0546         0.0328     0.0669
    J = 50      0.0372     0.0178         0.0105     0.0149

Notes: Columns 1 and 2 report the average absolute bias of the baseline and the bias-reduced estimates of ρ across the Monte Carlo replications. Columns 3 and 4 report the standard deviation across the
replications. All results are for the nonlinear DGP2.
Table 4.4: Monte Carlo Results: αi , Nonlinear Model.

                     Avg. MSE                Avg. MC Std.
               baseline  bias reduced    baseline  bias reduced
T = 10
  FE            0.0233     0.0239         0.1063     0.1122
  FGFE
    J = 5       0.0252     0.0295         0.1194     0.1371
    J = 10      0.0237     0.0249         0.1089     0.1174
    J = 50      0.0233     0.0240         0.1064     0.1124
T = 20
  FE            0.0213     0.0212         0.1034     0.1056
  FGFE
    J = 5       0.0234     0.0270         0.1163     0.1299
    J = 10      0.0216     0.0231         0.1068     0.1149
    J = 50      0.0213     0.0213         0.1035     0.1059
T = 30
  FE            0.0208     0.0207         0.1022     0.1036
  FGFE
    J = 5       0.0233     0.0266         0.1160     0.1292
    J = 10      0.0211     0.0225         0.1055     0.1125
    J = 50      0.0209     0.0207         0.1023     0.1038

Notes: Columns 1 and 2 report the average mean square error of the
baseline and the bias-reduced estimates of $\{\alpha_i\}_{i=1}^{N}$ across the Monte
Carlo replications. Columns 3 and 4 report the average standard
deviation across the replications. All results are for the nonlinear
DGP2.
4.1 Choosing the Number of Nodes
We here implement a simple half-panel cross-validation approach. This follows closely the
bias-correction approach above and aims at preserving any time-series properties of
the actual data. In particular, we split the simulated data into two sub-samples where
the first contains the first T/2 time period observations for all individuals and the second
contains the remaining T/2 time periods for all individuals. We then estimate the
model parameters for each sub-sample to get $(\hat\rho^J_\iota, \hat\alpha^J_{i,\iota})$, ι = 1, 2. We then calculate the
squared prediction error for each sample using the estimates from the other, left-out sample
(illustrated here for the linear DGP1),
$$E_1(J) = N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T/2} \left(y_{it} - \hat\rho^J_2 y_{it-1} - \hat\alpha^J_{i,2}\right)^2$$
$$E_2(J) = N^{-1} \sum_{i=1}^{N} \sum_{t=T/2+1}^{T} \left(y_{it} - \hat\rho^J_1 y_{it-1} - \hat\alpha^J_{i,1}\right)^2$$
and estimate J as
$$\hat J = \arg\min_{J \in \mathcal{J}} \frac{1}{2} \left(E_1(J) + E_2(J)\right) \tag{4.3}$$
where we restrict the number of nodes to be in a subset of the natural numbers, namely
$\mathcal{J} = \{10, 11, \ldots, 100\}$.
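The half-panel cross-validation criterion (4.3) can be sketched for the linear DGP1 as follows. This is an illustrative sketch under simplifying assumptions, not the authors' code: ρ is found by grid search, the unit effects are drawn uniformly rather than from a truncated normal, and the candidate set for J is much smaller than {10, ..., 100}. For squared loss and a given ρ, the best node for a unit is the one closest to its mean residual, which the inner step exploits. All function names are hypothetical.

```python
import random

def nodes(J, lo=-0.3, hi=0.3):
    """J equally spaced nodes on the known domain [lo, hi] (J >= 2)."""
    return [lo + (hi - lo) * k / (J - 1) for k in range(J)]

def fgfe_linear(pairs_by_unit, grid, rho_grid):
    """Grouped estimator for y_it = rho * y_it-1 + alpha_i + eps_it."""
    best = None
    for rho in rho_grid:
        total, alphas = 0.0, []
        for pairs in pairs_by_unit:
            mean_resid = sum(c - rho * l for l, c in pairs) / len(pairs)
            a = min(grid, key=lambda g: abs(g - mean_resid))  # nearest node
            alphas.append(a)
            total += sum((c - rho * l - a) ** 2 for l, c in pairs)
        if best is None or total < best[0]:
            best = (total, rho, alphas)
    return best[1], best[2]

def prediction_error(pairs_by_unit, rho, alphas):
    """Sum of squared prediction errors, divided by the number of units."""
    return sum((c - rho * l - a) ** 2
               for pairs, a in zip(pairs_by_unit, alphas)
               for l, c in pairs) / len(pairs_by_unit)

def choose_J(panel, candidates, rho_grid):
    """Estimate on one half of the periods, predict the other half, and
    pick the J with the smallest average prediction error."""
    pairs = [list(zip(p[:-1], p[1:])) for p in panel]
    half = len(pairs[0]) // 2
    first = [p[:half] for p in pairs]
    second = [p[half:] for p in pairs]
    scores = {}
    for J in candidates:
        grid = nodes(J)
        rho1, a1 = fgfe_linear(first, grid, rho_grid)
        rho2, a2 = fgfe_linear(second, grid, rho_grid)
        e1 = prediction_error(first, rho2, a2)    # E_1 uses sample-2 estimates
        e2 = prediction_error(second, rho1, a1)   # E_2 uses sample-1 estimates
        scores[J] = 0.5 * (e1 + e2)
    return min(scores, key=scores.get), scores

def simulate_linear(n_units, n_periods, rho, rng):
    panel = []
    for _ in range(n_units):
        alpha = rng.uniform(-0.3, 0.3)   # simplification of the truncated normal
        path = [0.0]
        for _ in range(n_periods):
            path.append(rho * path[-1] + alpha + rng.gauss(0.0, 0.1))
        panel.append(path)
    return panel

rng = random.Random(0)
panel = simulate_linear(n_units=100, n_periods=10, rho=0.95, rng=rng)
rho_grid = [0.75 + 0.01 * k for k in range(41)]
candidates = [2, 5, 10, 20]
J_hat, scores = choose_J(panel, candidates, rho_grid)
```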
Figure 4.1 plots the histogram of estimated number of groups across all 200 Monte
Carlo runs together with the average number of estimated groups for T ∈ {10, 20, 30}.
Figure 4.1 reports results for the linear DGP1 and Figure 4.2 reports the results from the
nonlinear DGP2.
Figure 4.1: Estimated Number of Groups, J. Linear model, DGP1.
[Figure: three histograms, panels (a) T = 10, (b) T = 20 and (c) T = 30. Horizontal axis: number of groups, J (10 to 100); vertical axis: share. The legends report an average estimated J of 43, 54 and 60, respectively.]
Notes: The figure reports the distribution (and average) of the estimated number of groups using the
cross-validation criteria (4.3) for varying sample sizes. All results are based on the linear DGP1.
Figure 4.2: Estimated Number of Groups, J. Nonlinear model, DGP2.
[Figure: three histograms, panels (a) T = 10, (b) T = 20 and (c) T = 30. Horizontal axis: number of groups, J (10 to 100); vertical axis: share. The legends report averages of 73, 82 and 85.]
Notes: The figure reports the distribution (and average) of the estimated number of groups using the
cross-validation criteria (4.3) for varying sample sizes. All results are based on the nonlinear DGP2.
As our theory suggests [REF to BIAS RATE], when the number of time periods
increases, the average optimal number of groups increases. In particular, the average estimated number of groups is 43, 54 and 60 when using 10, 20 and 30 time periods,
respectively. While there is significant dispersion across MC runs, the average estimated
number of groups is reassuringly close to the 50 groups that delivered almost identical
results to the standard FE estimator in Tables 4.1–4.4 above. For the non-linear model,
the number of optimal groups is slightly higher in our setting.
5 An Empirical Application to Danish Data
In this section, we apply our proposed estimator to Danish administrative register data.
We estimate by maximum likelihood the canonical buffer-stock consumption model
of Deaton (1991, 1992) and Carroll (1992, 1997) assuming that (imputed) consumption
is contaminated with mean-one multiplicative log-normal error with variance $\sigma^2_\eta$. Qualitatively similar results from a non-linear least squares (NLLS) estimator and a robust
Huber-type estimator, not relying on the distributional assumption on the measurement
error, are reported in Appendix D in the online supplemental material. Appendix C of the
supplemental material also contains a Monte Carlo study of the ability of the proposed
estimator to estimate this type of model calibrated to the Danish data.
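To make the measurement error assumption concrete, the sketch below writes out the log-likelihood contribution of one observation when observed consumption equals model consumption times a mean-one log-normal error. It assumes that $\sigma^2_\eta$ is the variance of the log of the error, so that log η is normal with mean −σ²/2 and variance σ²; the text does not settle this, and the alternative reading (variance of η itself) would change the parameterization. The function names and the `policy` argument are hypothetical.

```python
import math

def loglik_obs(c_obs, c_model, sigma):
    """Log-density of c_obs = c_model * eta, where
    log(eta) ~ N(-sigma^2 / 2, sigma^2) so that E[eta] = 1."""
    z = (math.log(c_obs / c_model) + 0.5 * sigma ** 2) / sigma
    return (-math.log(sigma) - math.log(c_obs)
            - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z)

def household_loglik(obs, policy, sigma):
    """Sum of log-likelihood contributions for one household.
    obs is a list of (m_it, c_it) pairs; policy maps m to model consumption."""
    return sum(loglik_obs(c, policy(m), sigma) for m, c in obs)

def assign_group(obs, policies, sigma):
    """Grouped ML assignment: pick the node with the highest log-likelihood."""
    values = [household_loglik(obs, policy, sigma) for policy in policies]
    return max(range(len(policies)), key=values.__getitem__)
```

Replacing the sum of squared residuals by the negative of `household_loglik` in the group assignment step gives a likelihood-based version of the grouped estimator.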
We first estimate the model under the assumption of fully homogeneous preferences,
and then in turn allow for heterogeneity in the discount factor, β. We include in the
supplemental material alternative results from letting ρ be heterogeneous. In both cases,
we find substantial preference heterogeneity within educational strata, and a clear improvement in the model’s predictive power when allowing for either type of heterogeneity.
The results also align well with economic intuition with e.g. high skilled households being
more patient. In a post-estimation analysis, we additionally show that households with
no unemployment insurance are estimated to have more mass at lower discount factors
(and lower relative risk aversion coefficients). The NPGFE estimation routine converged
in less than five minutes, illustrating that the proposed estimator is also applicable to
more complex and computationally demanding models.
5.1 Data
We use high quality Danish administrative registers covering the entire population in the
period 1987–1996.11 All information is based on third party reports with little additional
self-reporting. All self-reporting is moreover subject to possible auditing, giving reliable
longitudinal information on household characteristics, assets, liabilities and income.
Household income includes all monetary income net of all taxes, except any income
related to ownership of financial assets. Transfers, such as child benefits and unemployment benefits, are also included to ensure that disposable income accurately measures
the flow of resources available for consumption. Net wealth consists of stocks, bonds,
bank deposits, cars, boats, house value for home owners and mortgage deeds net of total
liabilities. The house value is assessed by the tax authorities for tax purposes. Pension
wealth is not observed in the registers and thus not included in the wealth measure.
Household consumption is not observed in the registers and is, therefore, imputed
using a simple budget approach, Ct = Ỹt − ∆At , where Ỹt = Yt + r · At is disposable
income, At is end-of-period net wealth, r is the real rate of return, and ∆At thus proxies
savings. A very similar imputation method is evaluated on Danish data in Browning and
Leth-Petersen (2003) and found to produce a reasonable approximation. The resulting
consumption measure will, however, include some durables, such as home appliances.
All variables are deflated with the official consumer price index.
We restrict attention to stable married or cohabiting couples in which the husband is
between age 25 and 59. This is to mitigate issues regarding educational and retirement
choices. To increase homogeneity of households, we restrict the spousal age difference
to be no more than five years, and require that no one in the household ever becomes
self-employed, is out of the labor market, or retires before age 59. To limit the effect
of errors in the imputation procedure on our estimates of preference heterogeneity, we
trim extreme observations from our sample and require that we have data for at least 5
years.12 In total this leaves us with an unbalanced panel of 317,793 households observed in
at most 9 time periods with a total of 2,994,679 household-time observations. Households
are classified as high skilled if either member holds at least a bachelor's degree (86,713
households are denoted as high-skilled).
11 We begin in 1987 to be able to consistently match individuals into couples, and we end with 1996
because the Danish wealth tax was abolished in this year. Information on, e.g., cars and boats was
not collected in subsequent years, leading to a break in the wealth measure from 1996 to 1997.
12 Further details on the data are provided in appendix B.
5.2 Model
Our application builds on the incomplete markets model pioneered by Deaton (1991, 1992)
and Carroll (1992, 1997). For completeness we give a short description of the model here.
We consider unitary households indexed by i with heterogeneous preferences who work
for Tr periods, then retire and eventually die at the end of period T . The recursive form
of the household’s problem is
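$$V_{it}(P_{it}, M_{it}) = \max_{C_{it} \geq 0} \left\{ \frac{C_{it}^{1-\rho}}{1-\rho} + \beta_i E_t\left[V_{it+1}(P_{it+1}, M_{it+1})\right] \right\} \tag{5.1}$$
subject to the inter-temporal budget constraint
$$M_{it+1} = R A_{it} + Y_{it} \tag{5.2}$$
$$A_{it} = M_{it} - C_{it} \tag{5.3}$$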
where Ait is end-of-period assets, Mit is beginning-of-period market resources, Yit is income, and R is the gross rate of return. Consumers are allowed to be net-borrowers up
to a fraction of their permanent income Pit . End-of-period wealth thus has to satisfy
$$A_{it} \geq -\lambda_t P_{it}, \qquad \lambda_t = \begin{cases} 0 & t \geq T_r \\ \lambda & \text{else} \end{cases} \tag{5.4}$$
where we restrict retirees not to be net borrowers (λt = 0, t ≥ Tr ).
In the beginning of each period, households receive a stochastic income
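$$Y_{it} = P_{it}\,\xi_{it}, \qquad \xi_{it} \sim \log\mathcal{N}(-0.5\sigma_\xi^2, \sigma_\xi^2) \tag{5.5}$$
$$P_{it} = G_t P_{it-1}\,\psi_{it}, \qquad \psi_{it} \sim \log\mathcal{N}(-0.5\sigma_\psi^2, \sigma_\psi^2) \tag{5.6}$$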
where Gt is an age-dependent gross growth rate of permanent income, ψit is a mean-one
permanent shock to income, and ξit is a mean-one transitory shock to income. We assume
that income is constant post retirement, Yit = κPiT , t ≥ T , where κ is the replacement
rate in retirement.
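The income process in (5.5)-(5.6) can be illustrated with a short simulation. The sketch below is only an illustration under assumed values: the function name, the shock standard deviations (0.06 and 0.07) and the constant growth rate of 1.02 are hypothetical and are not the estimates used in the paper. The key detail is that a mean-one log-normal shock has log-mean −0.5σ².

```python
import numpy as np

def simulate_income(n_households, n_periods, growth, sigma_psi, sigma_xi, p0=1.0, seed=0):
    """Simulate Y_it = P_it * xi_it with P_it = G_t * P_{i,t-1} * psi_it."""
    rng = np.random.default_rng(seed)
    growth = np.broadcast_to(growth, (n_periods,))
    # Mean-one log-normal shocks: log(shock) ~ N(-0.5 * sigma^2, sigma^2).
    psi = rng.lognormal(-0.5 * sigma_psi**2, sigma_psi, (n_households, n_periods))
    xi = rng.lognormal(-0.5 * sigma_xi**2, sigma_xi, (n_households, n_periods))
    # Permanent income cumulates growth and permanent shocks from the initial level p0.
    perm = p0 * np.cumprod(growth * psi, axis=1)
    return perm, perm * xi

# Illustrative (assumed) values, not the paper's estimates.
perm, income = simulate_income(200_000, 5, 1.02, sigma_psi=0.06, sigma_xi=0.07)
```

Because both shocks have mean one, average income in period t grows deterministically with the cumulated growth factors, which is a convenient check on any implementation.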
We denote the model-implied optimal level of consumption for a household aged t
with resources Mit and permanent income Pit by $C_{it}^\star = C_t^\star(M_{it}, P_{it}; \theta, \beta_i)$, where
$\theta = (\rho, R, \lambda, \sigma_\xi^2, \sigma_\psi^2)$. The model is solved using the endogenous grid method (EGM) proposed
by Carroll (2006).13
13 We use 300 discrete points to approximate the consumption function and 82 Gauss-Hermite quadrature
points to approximate expectations with respect to future transitory and permanent income shocks.
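As a rough illustration of the solution step, the following minimal sketch applies the endogenous grid method to a simplified version of the model, normalised by permanent income. It is not the code used in the paper: it assumes no borrowing (λ = 0), a constant growth rate G, no retirement phase, eight Gauss-Hermite nodes per shock, and the timing used in Appendix A, where next-period income is added to R times end-of-period assets. All function names and parameter values are hypothetical.

```python
import numpy as np

def mean_one_lognormal_nodes(sigma, n_nodes=8):
    """Gauss-Hermite nodes and weights for x with log(x) ~ N(-0.5*sigma^2, sigma^2)."""
    z, w = np.polynomial.hermite.hermgauss(n_nodes)   # weight function exp(-z^2)
    return np.exp(-0.5 * sigma**2 + np.sqrt(2.0) * sigma * z), w / np.sqrt(np.pi)

def interp_linear(x, xp, fp):
    """Linear interpolation that extrapolates with the slope of the last segment."""
    y = np.interp(x, xp, fp)
    slope = (fp[-1] - fp[-2]) / (xp[-1] - xp[-2])
    return np.where(x > xp[-1], fp[-1] + slope * (x - xp[-1]), y)

def solve_egm(beta, rho, R=1.03, G=1.02, sigma_psi=0.06, sigma_xi=0.06,
              n_periods=20, a_max=20.0, n_grid=300):
    """Normalised consumption policies c_t(m), m = M/P, with no borrowing (lambda = 0)."""
    psi, w_psi = mean_one_lognormal_nodes(sigma_psi)
    xi, w_xi = mean_one_lognormal_nodes(sigma_xi)
    psi, xi = [v.ravel() for v in np.meshgrid(psi, xi, indexing="ij")]
    weights = np.outer(w_psi, w_xi).ravel()
    a_grid = np.linspace(0.0, a_max, n_grid)          # end-of-period assets, a >= 0
    m_pts, c_pts = np.array([0.0, a_max]), np.array([0.0, a_max])   # last period: c = m
    policies = [(m_pts, c_pts)]
    for _ in range(n_periods - 1):
        # Next-period resources for every asset node (rows) and shock pair (columns).
        m_next = R * a_grid[:, None] / (G * psi[None, :]) + xi[None, :]
        c_next = interp_linear(m_next, m_pts, c_pts)
        # Expected discounted marginal utility, then invert the Euler equation.
        emu = beta * R * ((G * psi[None, :] * c_next) ** (-rho)) @ weights
        c_now = emu ** (-1.0 / rho)
        # Endogenous grid m = a + c; the point (0, 0) adds the constrained region c = m.
        m_pts = np.concatenate(([0.0], a_grid + c_now))
        c_pts = np.concatenate(([0.0], c_now))
        policies.append((m_pts, c_pts))
    return policies[::-1]    # policies[0] is the earliest period

patient = solve_egm(beta=0.98, rho=2.0)
impatient = solve_egm(beta=0.90, rho=2.0)
m = np.linspace(0.5, 5.0, 10)
c_patient = interp_linear(m, *patient[0])
c_impatient = interp_linear(m, *impatient[0])
```

The comparison at the end reflects the mechanism exploited in estimation: for given resources, a household with a lower discount factor consumes at least as much, so observed consumption is informative about βi.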
5.3 Calibrations
In addition to the parameters of the income process estimated below, we fix several other
parameters of the model before turning to estimation. In particular, we choose an interest
rate of R = 1.03, similar to the long run real return on 10 year Danish government bonds,
which over the period 1987-2007 was 3.8 percent. The same interest rate is used in
e.g. Gourinchas and Parker (2002). Based on an informal inspection of the observed consumption
behavior of households in debt, we furthermore set the borrowing constraint to bind
at 30 percent of permanent income (λ = 0.30). Kaplan (2012) estimates an almost
identical placement of the credit constraint using the PSID. Finally, we set the replacement
rate in retirement to 90 percent, κ = 0.9, based on The Danish Ministry of Finance (2003),
and assume that households retire at age 60 (Tr = 60) and die at age 75 (T = 75).
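Following the approach in Meghir and Pistaferri (2004), we estimate the transitory
and permanent income shock variances for each education group separately using
$$\sigma_\psi^2 = \operatorname{cov}\Big(\Delta\epsilon_{it}, \sum_{k=0}^{2} \Delta\epsilon_{i,t-1+k}\Big) \tag{5.7}$$
$$\sigma_\xi^2 = -\operatorname{cov}(\Delta\epsilon_{it}, \Delta\epsilon_{i,t+1}) \tag{5.8}$$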
(5.8)
where $\epsilon_{it}$ is the residual for household i in period t from a regression of log household
income on a full set of age dummies, i.e.
$$\log(Y_{it}) = \text{cons} + \sum_{j=25}^{59} \alpha_j^{age}\,\mathbb{1}_{age_{it}=j} + \epsilon_{it} \tag{5.9}$$
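The moment conditions (5.7)-(5.8) can be checked on simulated data. The sketch below is a minimal illustration, assuming a balanced panel in which each column corresponds to one age, so that the residual from the age-dummy regression (5.9) is simply log income minus the age-specific mean. The function name and the simulated variances (0.003 and 0.004) are hypothetical.

```python
import numpy as np

def shock_variances(log_income):
    """Moment estimators (5.7)-(5.8) on a balanced N x T panel of log income."""
    # Regressing on a full set of age dummies equals demeaning within each age cell;
    # here every column is one age, so we subtract column means.
    resid = log_income - log_income.mean(axis=0, keepdims=True)
    d = np.diff(resid, axis=1)                    # first differences of the residual
    mid, lag, lead = d[:, 1:-1], d[:, :-2], d[:, 2:]
    var_perm = np.mean(mid * (lag + mid + lead))  # cov(d_t, d_{t-1} + d_t + d_{t+1})
    var_tran = -np.mean(mid * lead)               # -cov(d_t, d_{t+1})
    return var_perm, var_tran

rng = np.random.default_rng(1)
n, t = 50_000, 10
perm = np.cumsum(rng.normal(0.0, np.sqrt(0.003), (n, t)), axis=1)   # random-walk component
log_y = perm + rng.normal(0.0, np.sqrt(0.004), (n, t))              # plus transitory noise
var_perm, var_tran = shock_variances(log_y)
```

The identification argument is that the first difference of the residual equals the permanent shock plus the change in the transitory shock, so the three-period sum isolates the permanent variance and the first-order autocovariance isolates the transitory variance.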
The results are reported in table 5.1. The income variances of Danish households are
smaller than those typically estimated for the US. As argued in Jørgensen (forthcoming),
this is most likely due to i) a generous social welfare system, ii) progressive taxation, iii)
a relatively high “minimum wage”, and iv) register data typically being less noisy than
the survey data typically used. We find that high skilled households are subject to both larger
transitory shocks and larger permanent shocks.
Table 5.1: Income Shock Variances

                 Low skilled          High skilled
                 Est      (s.e.)      Est      (s.e.)
σψ² · 10³        2.86     (0.05)      3.56     (0.09)
σξ² · 10³        3.20     (0.10)      5.19     (0.24)

Notes: The income shock variances are estimated based on the approach proposed in
Meghir and Pistaferri (2004).
The growth in income is estimated by re-arranging the income process such that
$$G_t = \exp\left(\frac{1}{N}\sum_{i=1}^{N} \Delta \log Y_{it} + \frac{1}{2}\sigma_\psi^2\right) \tag{5.10}$$
A smoothed growth rate G̃t is obtained using a third degree polynomial in age. The results
are reported in figure 5.1. Permanent income, Pit , is found by applying the Kalman filter
on the time series of log income for each household (the resulting life cycle profile is shown
in appendix B in the online supplemental material).
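A minimal sketch of (5.10) and the cubic smoothing step is given below. It is an illustration on simulated data, not the paper's code: the function names, the linear "true" growth profile and the variance values are assumptions. The term 0.5σψ² enters because the mean of the log of a mean-one log-normal shock is −0.5σψ².

```python
import numpy as np

def growth_rates(log_income, sigma2_psi):
    """Equation (5.10): G_t = exp(mean_i(dlog Y_it) + 0.5 * sigma_psi^2)."""
    return np.exp(np.diff(log_income, axis=1).mean(axis=0) + 0.5 * sigma2_psi)

def smooth_cubic(ages, g):
    """Fitted values from a third-degree polynomial in age."""
    x = ages - ages.mean()          # centre age to keep the fit well conditioned
    return np.polyval(np.polyfit(x, g, 3), x)

rng = np.random.default_rng(2)
ages = np.arange(26, 60)
g_true = 1.06 - 0.0015 * (ages - 26)                   # assumed declining growth profile
s2_psi, s2_xi, n = 0.003, 0.004, 40_000
log_psi = rng.normal(-0.5 * s2_psi, np.sqrt(s2_psi), (n, ages.size))
log_p = np.cumsum(np.log(g_true) + log_psi, axis=1)
log_p = np.hstack([np.zeros((n, 1)), log_p])           # age 25 normalised to P = 1
log_y = log_p + rng.normal(-0.5 * s2_xi, np.sqrt(s2_xi), log_p.shape)
g_hat = growth_rates(log_y, s2_psi)
g_smooth = smooth_cubic(ages, g_hat)
```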
Figure 5.1: Gross Income Growth Rates, Gt. Panels: (a) Low skilled, (b) High skilled. Each panel plots the point estimates and the smoothed growth rate against age (25 to 60).

5.4 A Maximum Likelihood Estimator
We follow the typical assumption that consumption is observed with multiplicative iid
log-normal measurement error (G&P?) with mean one and variance $\sigma_\eta^2$, i.e.
$$C_{it}^{obs} = C_{it}^\star\,\eta_{it}, \qquad \log \eta_{it} \sim \mathcal{N}(-0.5\sigma_\eta^2, \sigma_\eta^2) \tag{5.11}$$
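The mean-corrected log-differences between observed and predicted consumption thus follow a
Gaussian distribution, i.e.
$$\varepsilon_{it}(\rho, \beta_i) \equiv c_{it}^{obs} - c_{it}^\star + 0.5\sigma_\eta^2, \qquad \varepsilon_{it}(\rho, \beta_i) \sim \mathcal{N}(0, \sigma_\eta^2) \tag{5.12}$$
where lowercase letters denote log-transformed variables, e.g., $c_{it}^{obs} = \log C_{it}^{obs}$ and $c_{it}^\star = \log C_t^\star(M_{it}, P_{it}; \theta, \beta_i)$.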
Note that, for simplicity, we ignore here that permanent income is not observed in the data and that we instead use an estimated measure stemming from the Kalman filter. Alternatively, the approach proposed in Jørgensen and Kristensen (2017) could be adopted
here as well. We have not pursued that strategy further here.
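The mean log likelihood function is then
$$L(\rho, \sigma_\eta, \{\beta_i\}_{i=1}^N) = \frac{1}{N}\sum_{i=1}^{N} \ell_i(\rho, \sigma_\eta, \beta_i)$$
where $\ell_i(\rho, \sigma_\eta, \beta_i)$ is the log-likelihood contribution associated with household i,
$$\ell_i(\rho, \sigma_\eta, \beta_i) = -\frac{1}{2}\left(T_i \log(2\pi\sigma_\eta^2) + \sum_{t=1}^{T_i} \varepsilon_{it}(\rho, \beta_i)^2\,\sigma_\eta^{-2}\right)$$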
We discretize the discount factors, $\{\beta^j\}_{j=1}^J$, into J equally spaced nodes and our FGFE
MLE solves
$$(\hat\rho, \hat\sigma_\eta) = \arg\max_{\rho, \sigma_\eta \in \mathbb{R}_+} L(\rho, \sigma_\eta, \{\beta^{j_i}\}_{i=1}^N) \tag{5.13}$$
$$\hat{j}_i(\rho, \sigma_\eta) = \arg\max_{j_i \in \{1, \dots, J\}} \ell_i(\rho, \sigma_\eta, \beta^{j_i}), \quad \forall i = 1, \dots, N \tag{5.14}$$
where the group memberships are estimated through classification maximum likelihood
(Bryant and Williamson, 1978).
The MLE of the measurement error variance, $\sigma_\eta^2$, is biased because the estimator
does not recognize the reduced degrees of freedom from the estimation of βi and the
homogeneous parameters in θ. We, therefore, estimate a re-parameterized parameter,
$\sigma_\eta^2 = \tilde\sigma_\eta^2 (NT - N - (\dim(\theta) - 1))/NT$, where $\tilde\sigma_\eta^2$ is the “standard” biased measurement
error variance and $NT \equiv \sum_{i=1}^{N} T_i$. This parametrization corrects for the dimensionality
of the homogeneous parameters and the estimation of the N classification parameters in
equation (5.14).
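The classification step in (5.14) and the criterion in (5.13) can be sketched on a toy problem. The example below is a hedged illustration only. It replaces the solved life-cycle policy function with a hypothetical stand-in (a perfect-foresight rule in which consumption is a constant share of resources), holds ρ fixed rather than estimating it, uses a balanced panel, and searches over σ_η on a coarse grid. In an actual application the stand-in would be replaced by the model solution and the outer step by a numerical optimiser over (ρ, σ_η).

```python
import numpy as np

R = 1.03  # gross return, as calibrated in Section 5.3

def log_c_star(log_m, beta, rho):
    """Hypothetical stand-in for log C*_t: perfect-foresight consumption rule."""
    mpc = 1.0 - (beta * R) ** (1.0 / rho) / R
    return np.log(mpc) + log_m

def classify(log_c_obs, log_m, nodes, rho, sigma):
    """Inner step (5.14): pick the node maximising each household's likelihood."""
    # eps has shape (N, T, J): one residual per household, period and beta node.
    eps = (log_c_obs[:, :, None]
           - log_c_star(log_m[:, :, None], nodes[None, None, :], rho)
           + 0.5 * sigma**2)
    sse = (eps**2).sum(axis=1)                                   # shape (N, J)
    t_i = log_c_obs.shape[1]
    loglik = -0.5 * (t_i * np.log(2 * np.pi * sigma**2) + sse / sigma**2)
    j_hat = loglik.argmax(axis=1)
    mean_loglik = loglik[np.arange(len(j_hat)), j_hat].mean()    # criterion in (5.13)
    return j_hat, mean_loglik

# Simulated data with assumed values (not from the paper).
rng = np.random.default_rng(3)
n, t, rho, sigma_true = 2_000, 8, 2.0, 0.10
nodes = np.linspace(0.80, 0.99, 100)
beta_true = rng.choice(nodes, size=n)
log_m = rng.normal(1.0, 0.3, (n, t))
log_eta = rng.normal(-0.5 * sigma_true**2, sigma_true, (n, t))
log_c_obs = log_c_star(log_m, beta_true[:, None], rho) + log_eta

# Outer step: profile the mean log-likelihood over a grid for sigma_eta.
sigma_grid = np.linspace(0.05, 0.20, 31)
fits = [classify(log_c_obs, log_m, nodes, rho, s) for s in sigma_grid]
best = int(np.argmax([f[1] for f in fits]))
sigma_hat, beta_hat = sigma_grid[best], nodes[fits[best][0]]
```

For a given σ_η, maximising a household's likelihood over the nodes is equivalent to minimising its sum of squared residuals, which is why the inner step is cheap and can be vectorised across households.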
5.5 Estimation Results
The estimation results are presented in table 5.2 for both educational groups.
Columns (1) and (3) report the homogeneous estimates for low and high skilled households. Surprisingly, we here find that low-skilled households with a discount factor of 0.966
are much more patient than high skilled households with a discount factor of 0.933. On
the other hand, high skilled households have a CRRA coefficient of 7.3, far exceeding the
CRRA coefficient of 2.1 for low skilled. As discussed above in relation to the problems
in identifying β and ρ separately, the total saving motive might thus still be stronger for
the high skilled than for the low skilled households, both due to the higher risk aversion
and the lower inter-temporal elasticity of substitution.
Columns (2) and (4) in table 5.2 report estimation results when β is allowed to be
heterogeneous, and figure 5.2 shows the estimated distributions of β. We allow for a large
domain with β ∈ [0.75, 1.05] and discretize it into J = 100 bins. We find very little mass
on the boundary, indicating that our chosen domain is large enough. In contrast to the
results for the homogeneous specification, we now find that the high skilled are somewhat
more patient than the low skilled; the mean of the distribution for the high skilled is
0.968, while it is 0.961 for the low skilled. In figure 5.2 we see that the distribution of
discount factors is shifted to the right for the high skilled compared to low skilled, while
the shapes of the distributions are quite similar across the two educational groups. Both
distributions are relatively symmetric, but with a somewhat fat left tail.

Table 5.2: Estimated Preferences.

            Low skilled                 High skilled
            Hom.       FGFE             Hom.       FGFE
            (1)        (2)              (3)        (4)
β           0.966      0.961†           0.938      0.968†
            (0.000)    [0.033]          (0.000)    [0.030]
ρ           2.119      1.490            7.256      1.393
            (0.080)    (0.001)          (0.065)    (0.001)
                       {1.969}                     {1.772}
ση          0.333      0.335            0.352      0.352
            (0.000)    (0.002)          (0.000)    (0.001)
                       {0.340}                     {0.361}
L           −2.502     −1.984           −2.916     −2.355
N           168315     168315           62057      62057
Obs.        1341582    1341582          490694     490694
J           −          100              −          100
β ∈         R+         [0.75, 1.05]     R+         [0.75, 1.05]

Notes: Robust asymptotic standard errors in parentheses, clustered on the individual level. Split-panel jackknife bias
reduced estimates are reported in curly brackets (Dhaene and Jochmans, forthcoming).
† Reported are the estimated mean of the respective heterogeneous distribution with the standard deviation of the distribution in square brackets.
‡ The number of nodes refers here to the number of points in
the interpolation object. The household-specific estimates are
allowed to be continuous in the domain.
The CRRA coefficients are found to be relatively low and similar across the educational
groups; ρ = 1.49 for low skilled, and ρ = 1.39 for high skilled. The estimates increase to
around 1.97 and 1.77 for low and high skilled, respectively, when applying the split-panel
jackknife bias reduction approach proposed in Dhaene and Jochmans (forthcoming). The
estimated homogeneous parameters and the means of heterogeneous distributions are thus
in line with existing estimates. See e.g. Cagetti (2003); Gourinchas and Parker (2002)
and Alan, Attanasio and Browning (2009).
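The split-panel jackknife correction can be illustrated with a short sketch. The half-panel version combines the full-sample estimate with the average of the estimates from the first and second half of the time dimension. The example below is an assumption-laden illustration, not the paper's implementation: it applies the correction to a simple variance estimator with household fixed effects, where the incidental parameter bias is known to equal the factor (T − 1)/T, rather than to ρ and σ_η in the structural model. Function names are hypothetical.

```python
import numpy as np

def split_panel_jackknife(estimator, panel):
    """Half-panel jackknife: 2 * theta_full - mean(theta_first_half, theta_second_half)."""
    half = panel.shape[1] // 2
    full = estimator(panel)
    halves = 0.5 * (estimator(panel[:, :half]) + estimator(panel[:, half:]))
    return 2.0 * full - halves

def fe_variance_mle(panel):
    """Error-variance MLE with household fixed effects; biased by the factor (T - 1) / T."""
    return np.mean((panel - panel.mean(axis=1, keepdims=True)) ** 2)

rng = np.random.default_rng(4)
# True error variance is 1; household effects have standard deviation 2; T = 8.
y = rng.normal(0.0, 1.0, (20_000, 8)) + rng.normal(0.0, 2.0, (20_000, 1))
naive = fe_variance_mle(y)
corrected = split_panel_jackknife(fe_variance_mle, y)
```

In this toy case the naive estimate is centred on 7/8 while the jackknifed estimate is centred on the true value of one, which mirrors the direction of the adjustment in σ_η reported in table 5.2.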
Looking at the improvement in the log-likelihood function when allowing β to be
heterogeneous compared to the homogeneous case, we can calculate likelihood-ratio statistics
of 174,374 and 69,628 for low and high skilled, respectively. Assuming that these LR-statistics are $\chi^2_{N-1}$ distributed with N − 1 degrees of freedom (N classification parameters versus 1
homogeneous parameter), we get p-values of zero, suggesting that the estimated heterogeneity is significant both economically and statistically.14
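The reported statistics follow directly from table 5.2, since L is the mean log-likelihood over N households. The short calculation below reproduces them. The z-score uses the normal approximation to a chi-square distribution with many degrees of freedom (mean equal to the degrees of freedom, variance equal to twice that number); it is added here only as an illustration of why the p-values are effectively zero, and the function names are hypothetical.

```python
import math

def lr_statistic(mean_loglik_het, mean_loglik_hom, n_households):
    """LR = 2 * N * (L_het - L_hom) when L is the mean log-likelihood over N households."""
    return 2.0 * n_households * (mean_loglik_het - mean_loglik_hom)

def chi2_z_score(stat, dof):
    """Normal approximation to a chi-square with many degrees of freedom."""
    return (stat - dof) / math.sqrt(2.0 * dof)

lr_low = lr_statistic(-1.984, -2.502, 168_315)
lr_high = lr_statistic(-2.355, -2.916, 62_057)
z_low = chi2_z_score(lr_low, 168_315 - 1)
z_high = chi2_z_score(lr_high, 62_057 - 1)
```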
Figure 5.2: Estimated β Distributions, FGFE. Panels: (a) Probability distribution, (b) Cumulative distribution. Both panels plot the low skilled and high skilled groups over β from 0.75 to 1.05.
Notes: The figure reports the estimated share of the Danish estimation sample in each of the J fixed
groups as the empirical PDF in the left panel and the CDF in the right panel.
Although we find evidence of substantial preference heterogeneity, our estimated distributions, especially of discount factors, are narrower than those found in Alan and
Browning (2010), which is the only comparable study using observational data. In the
case of β, they find a spread between the 90th and 10th percentile of 0.143 for the low
skilled and 0.134 for the high skilled. We estimate spreads of 0.050 and 0.046, respectively.
Our results are broadly in line with experimental evidence from Denmark: the distribution of β seems consistent with the evidence reported in
Andersen, Harrison, Lau and Rutström (2010). In general it should be noted that the
dispersion of estimates found in the experimental and survey-based literature is large,15
but that our results are probably in the lower end in terms of the estimated degree of
heterogeneity.
14 This is not a formal test. The χ² distribution and the degrees of freedom are at best an approximation.
Furthermore, for the “test” to be nested, we should include the estimated homogeneous β̂ as a node in
the discrete domain when estimating the distribution of discount factors. Since we view this “test” as
an informal assessment of the importance of heterogeneity, we have not pursued this any further.
15 See footnote 1 for references.
5.6 Preferences and Unemployment Insurance
An interesting feature of the proposed estimator is that it explicitly groups households
according to their (estimated) preferences. We can thus easily compare the distributions
of preferences across groups of households divided according to various characteristics.16
In particular, we observe whether anyone in the household has unemployment insurance. In order to construct a time invariant grouping, we denote households as having
no unemployment insurance if they have never been observed as having unemployment
insurance, and as having unemployment insurance otherwise.
Figure 5.3 shows the estimated cumulative distributions of discount factors for low and
high skilled households divided by unemployment insurance take-up. The group without
any unemployment insurance is relatively small, and the resulting plots are noisy, but
we see a noticeable difference in the discount factor distributions across the two groups,
especially for high skilled households. Specifically, as economic intuition would suggest,
we find that households who have never had unemployment insurance are associated with
a relatively lower valuation of the future compared to the group of households in which at
least one member has had unemployment insurance at some point.
Figure 5.3: Estimated Preference Distributions for Sub-Groups. Panels: (a) Low skilled, (b) High skilled. Horizontal axis: β from 0.75 to 1.05; vertical axis: cumulative share from 0 to 1.
Notes: The figure reports the estimated share of the Danish estimation sample in each of the J fixed
groups as the empirical PDF in the left panel and the CDF in the right panel. Results are split based on
whether at least one household member has ever been observed to have unemployment insurance or
not.
While the reported correlations do not have a causal interpretation and we have ignored
estimation uncertainty related to the group membership of each household for convenience, this analysis shows how the proposed estimator can be used to investigate ex ante
preference heterogeneity across household characteristics and produce meaningful results.
16 While the estimated preferences are associated with estimation uncertainty, we abstract from that
important point here.
• Update the figure to include legend!
6 Discussion
Our non-parametric grouped fixed-effects estimator was shown to offer a computationally
simple and efficient approach to estimating dynamic economic models with unrestricted heterogeneity from panel data on observed choices. In a Monte Carlo study we showed that
it has good finite sample properties, and specifically that it can uncover the distribution
of preference heterogeneity from consumption choices using a standard life-cycle consumption model. The estimator’s empirical applicability was demonstrated by estimating a similar
life-cycle consumption-saving model on Danish administrative data allowing for heterogeneous time preferences and/or heterogeneous CRRA coefficients. These results
indicated a large degree of preference heterogeneity, where differences across education
groups, for example, aligned well with economic intuition.
Interesting avenues for future work include both an investigation of the asymptotic
properties of the estimator and applications of the estimator to more complex dynamic
economic models with multiple continuous and discrete choices. Building on the current
application, it would, for example, be interesting to estimate a more general life-cycle
saving model with non-parametric heterogeneity affecting not just consumption choices,
but also portfolio and retirement choices.
A Model Details
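Proposition 1. The optimal end-of-period asset choice satisfies
$$A_t \geq \underline{A}_t = \begin{cases} 0 & \text{if } t = T \\ -\min\{\Lambda_t, \lambda_t\} \cdot \Gamma_t & \text{if } t < T \end{cases}$$
where
$$\Lambda_t \equiv \begin{cases} R^{-1} \cdot \Gamma_t \cdot \underline{\xi} & \text{if } t = T - 1 \\ R^{-1} \cdot \left[\min\{\Lambda_{T-1}, \lambda\} + \underline{\xi}\right] \cdot \Gamma_t & \text{if } t < T - 1 \end{cases}$$
$$\Gamma_t \equiv G_t \cdot \underline{\psi}$$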
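Proof. Let $E_t[\bullet]$ denote the worst-case expectation operator given information at time t. Note
that any $M_T \leq 0$ implies that the household cannot choose a $C_T > 0$ such that $A_T \geq 0$.
Consequently
$$\lim_{M_T \searrow 0} V_T(\bullet, M_T) = \lim_{C_T \searrow 0} \frac{C_T^{1-\rho}}{1-\rho} = -\infty$$
which the household wants to avoid at any cost. Therefore we have
$$\begin{aligned}
E_{T-1}[M_T - \underline{A}_T] > 0 &\Leftrightarrow \\
E_{T-1}[R \cdot A_{T-1} + Y_T] > 0 &\Leftrightarrow \\
R \cdot A_{T-1} + \Gamma_T \cdot \underline{\xi} \cdot P_{T-1} > 0 &\Leftrightarrow \\
A_{T-1} > -R^{-1} \cdot \Gamma_T \cdot \underline{\xi} \cdot P_{T-1}
\end{aligned}$$
Combining this with the exogenous borrowing constraint we get
$$A_{T-1} > -\min\{\Lambda_{T-1}, \lambda\} \cdot P_{T-1}$$
Similar arguments further imply
$$\begin{aligned}
E_{T-2}[M_{T-1} + \min\{\Lambda_{T-1}, \lambda\} \cdot P_{T-1}] > 0 &\Leftrightarrow \\
E_{T-2}[R \cdot A_{T-2} + Y_{T-1}] > -\min\{\Lambda_{T-1}, \lambda\} \cdot E_{T-2}[P_{T-1}] &\Leftrightarrow \\
R \cdot A_{T-2} + G_{T-1} \cdot \underline{\psi} \cdot \underline{\xi} \cdot P_{T-2} > -\min\{\Lambda_{T-1}, \lambda\} \cdot G_{T-1} \cdot \underline{\psi} \cdot P_{T-2} &\Leftrightarrow \\
A_{T-2} > -\underbrace{R^{-1} \cdot \left[\min\{\Lambda_{T-1}, \lambda\} + \underline{\xi}\right] \cdot \Gamma_{T-1}}_{=\Lambda_{T-2}} \cdot P_{T-2}.
\end{aligned}$$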
B Data
B.1 Income Definitions
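In the Danish income registers, we have the following income variables:
$$\underbrace{\mathtt{DISPON\_NY}}_{\text{disposable income}} = \mathtt{SAMLINK\_NY} - \underbrace{\mathtt{SKATMVIALT\_NY}}_{\text{taxes}} - \underbrace{\mathtt{UNDERHOL} + \mathtt{TBKONTHJ}}_{\text{alimony + returned benefits}} - \underbrace{\mathtt{QRENTUD2}}_{\text{interest payments}}$$
$$\underbrace{\mathtt{SAMLINK\_NY}}_{\text{total income}} = \mathtt{PERINDKIALT} + \underbrace{\mathtt{OVSKEJD02\_NY} + \mathtt{OVERSKEJD07}}_{\text{imputed rental value}}$$
$$\underbrace{\mathtt{PERINDKIALT}}_{\text{total monetary income}} = \underbrace{\mathtt{RENTEINDK}}_{\text{interest income}} + \underbrace{\mathtt{PEROEVRIGFORMUE}}_{\text{other property income}} + \underbrace{\mathtt{ERHVERVSINDK(\_GL)}}_{\text{wages and profits}} + \underbrace{\mathtt{OVERFORSINDK}}_{\text{public transfers}} + \underbrace{\mathtt{RESUINK(\_GL)}}_{\text{other income}}$$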
We define
$$Y_{it}^{gross} \equiv \mathtt{PERINDKIALT}$$
$$Y_{it}^{asset} \equiv \mathtt{RENTEINDK} + \mathtt{PEROEVRIGFORMUE}$$
$$Y_{it}^{nonasset} \equiv \mathtt{PERINDKIALT} - Y_{it}^{asset}$$
$$Y_{it}^{transfers} \equiv \mathtt{OVERFORSINDK}$$
$$\varsigma_{it} \equiv \mathtt{SKATMVIALT\_NY}$$
$$Y_{it}^{nom} \equiv \begin{cases} Y_{it}^{gross} - \varsigma_{it} & \text{if } Y_{it}^{asset}/Y_{it}^{gross} < 0.1 \\ (1 - \tau_{it}) \cdot Y_{it}^{nonasset} & \text{else} \end{cases}$$
where i is for a couple, t is for observation year, and $Y_{it}^{nom}$ is after-tax monetary income
from all sources, except financial assets. To approximate the after tax earnings of households with substantial income from financial assets, we use the tax rate $\tau_{it} \equiv \varsigma_{it}/Y_{it}^{gross}$ of
households without substantial income from financial assets, but with a similar level of
non-asset income (specifically we use twenty bins of $Y_{it}^{nonasset}$).
B.2 Data Construction
We construct our variables as follows:
1. Couples are constructed using EFALLE (from BEF) (before 1987 we only have
C_FAELLE_ID from FAIN).
2. Birthyear and gender are based on FOED_DAG and KOEN (from BEF) or, if not
available, ALDER and KOEN (from FAIN). Couple age is the age of the male.
3. Wealth $A_{it}^{nom}$ is the total net wealth excluding pensions (FORM and FORMREST_NY05
(after 1996) from INDH) adjusted upwards with 10 percent of the
value of any owned properties $H_{it}^{nom}$ (KOEJD or if missing EJENDOMSVURDERING
from INDH).
4. Self-Employment is coded as PSTILL≤ 20 (from IDAP).
5. Not in the labor market is coded as PSTILL= 90 (from IDAP).
6. Retirement is coded as PSTILL in {50, 55, 92, 93, 94} (from IDAP).
7. Student is coded as PSTILL = 91 (from IDAP).
8. A couple is coded as high-skilled if at least one of them has ≥ 180 months of
education (using HFPRIA from UDDA); otherwise it is coded as low-skilled.
We additionally calculate nominal cash-on-hand and imputed consumption as
$$M_{it}^{nom} \equiv R \cdot A_{i,t-1}^{nom} + Y_{it}^{nom} \tag{B.1}$$
$$C_{it}^{nom} \equiv M_{it}^{nom} - A_{it}^{nom} \tag{B.2}$$
All variables are subsequently deflated with the consumer price index.
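The imputation in (B.1)-(B.2) amounts to two lines of arithmetic per household. The sketch below illustrates it on made-up numbers; the function name and the input values are hypothetical, and the deflation step is omitted.

```python
import numpy as np

def impute_consumption(assets, income, gross_return):
    """Apply (B.1)-(B.2) to one household's time series.

    assets[t] is end-of-period net wealth and income[t] is after-tax non-asset income.
    The first period is lost because lagged assets are needed.
    """
    assets = np.asarray(assets, dtype=float)
    income = np.asarray(income, dtype=float)
    cash_on_hand = gross_return * assets[:-1] + income[1:]   # (B.1)
    consumption = cash_on_hand - assets[1:]                  # (B.2)
    return cash_on_hand, consumption

# Made-up example: three years of wealth and income, R = 1.03.
m, c = impute_consumption([100.0, 110.0, 105.0], [50.0, 52.0, 54.0], 1.03)
```

In the made-up example, the household runs down wealth in the last year, so imputed consumption exceeds income plus the return on lagged wealth in that year.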
B.3 Sample Selection
We use the following iterative selection criteria:
1. Our baseline sample is all unique couples, where the male is older than 18 and is in
the income registers at some point between 1987 and 1996 (both included).
2. Both partners are between age 25 and 59 (both included).
3. The age difference is not larger than 5 years.
4. Neither of them is ever self-employed or not in the labor market (see definition in
sub-section B.2).
5. No information is used when or before any of them are students (see definition in
sub-section B.2).
6. Neither of them retires before age 59 (see definition in sub-section B.2).
7. Education information is not missing for either partner.
8. We remove all households with fewer than 5 observations satisfying:
(a) $M_{it}/Y_{it}$, $C_{it}/Y_{it}$, $A_{it}^{raw}/Y_{it}$ and $Y_{it}$ are not below the 1st percentile or above the 99th percentile by age-year bins.
(b) $m_{it} \equiv M_{it}/P_{it} \geq -\lambda$
(c) $a_{it} \equiv A_{it}/P_{it} \geq -\lambda$
(d) $c_{it} \equiv C_{it}/P_{it} < 0.3$
Additionally we do not use information for any of the periods where the above
requirements are not satisfied.
Table B.1 shows how the sample size is affected by these choices.

Table B.1: Sample Selection

                                          Unique Couples    Observations
1. Baseline                               1,935,069         12,869,391
2. Age between 25 and 59                  1,142,433         8,542,785
3. Age difference ≤ 5 years               1,040,074         6,862,197
4. Never self-employed                    657,926           4,207,377
5. Not students                           626,302           4,117,788
6. Not retired before age 59              624,944           3,990,007
7. Education information not missing      617,334           3,951,504
8. ≥ 5 “non-extreme” observations         230,372           1,832,276
   hereof high-skilled                    62,057            490,694
B.4 Life Cycle Profiles
In order to calculate life-cycle profiles, we need to detrend across cohorts. We do so in
two steps. First we run the following regression separately for each education group
$$\log(Y_{it}) = \text{cons} + \sum_{j=25}^{59} \alpha_j^{age}\,\mathbb{1}_{age_{it}=j} + \sum_{k=1987}^{1996} \alpha_k^{year}\,\mathbb{1}_{year_{it}=k} + \epsilon_{it} \tag{B.3}$$
Secondly, the education specific trend growth rate of income is derived as the constant
from a regression of the first-differenced year dummy coefficients on no covariates, i.e.
$$\Delta\alpha_t^{year} = (G - 1) + \epsilon_t \tag{B.4}$$
Finally, all monetary variables are detrended relative to a 25 year old in 1996 by dividing
through by the factor $G^{birthyear_{it} - 1996 - 25}$, and normalized by subsequently dividing through
by the mean income of an unskilled household of age 25.
Figures B.1-B.4 show the resulting life-cycle profiles.
Figure B.1: Life Cycle Profiles - Yt. Panels: (a) Low Skilled - Percentiles, (b) High Skilled - Percentiles, (c) Low Skilled - Mean by Birthyear, (d) High Skilled - Mean by Birthyear. The percentile panels show the 10th, 25th, 50th, 75th and 90th percentiles; horizontal axis: age (25 to 60).
Figure B.2: Life Cycle Profiles - At. Panels: (a) Low Skilled - Percentiles, (b) High Skilled - Percentiles, (c) Low Skilled - Mean by Birthyear, (d) High Skilled - Mean by Birthyear. The percentile panels show the 10th, 25th, 50th, 75th and 90th percentiles; horizontal axis: age (25 to 60).
Figure B.3: Life Cycle Profiles - mt. Panels: (a) Low Skilled - Percentiles, (b) High Skilled - Percentiles. Both panels show the 10th, 25th, 50th, 75th and 90th percentiles; horizontal axis: age (25 to 60).
Figure B.4: Life Cycle Profiles - ct. Panels: (a) Low Skilled - Percentiles, (b) High Skilled - Percentiles. Both panels show the 10th, 25th, 50th, 75th and 90th percentiles; horizontal axis: age (25 to 60).
C A Monte Carlo Study: Buffer Stock Model
In this section, we investigate the finite sample properties of our proposed estimator
applied to estimate the model of interest in the application in Section 5.
In each of the 50 Monte Carlo runs conducted here, we first simulate N households for
35 periods (from age 25 through 59) with household-specific discount factors βi and initial
draws of wealth and permanent income from log-normal distributions with, respectively,
means of 0.1 and 1 and variances of 0.2 and 0.4. We draw individual discount factors
from a normal distribution with mean 0.98 and standard deviation 0.02 and truncate the
distribution such that βi ∈ [0.8, 1.1] for all i. In our simulations, no observations were
truncated. When estimating, we fix the domain to be β ∈ [0.8, 1.1] and define the finite
support as $B_J = \{0.8 + j(1.1 - 0.8)/J\}_{j=0}^{J-1}$.
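The finite support and the truncated draws described above can be written down directly. The sketch below is illustrative only: the function names are hypothetical, truncation is implemented by redrawing (one of several possible ways to truncate), and the sample size is kept small. Note that, as defined, the support starts at the lower bound and stops one step short of the upper bound.

```python
import numpy as np

def support(j_nodes, lower=0.8, upper=1.1):
    """B_J = {lower + j * (upper - lower) / J : j = 0, ..., J - 1}."""
    return lower + np.arange(j_nodes) * (upper - lower) / j_nodes

def draw_discount_factors(n, rng, mean=0.98, sd=0.02, lower=0.8, upper=1.1):
    """Normal draws, redrawing any value outside [lower, upper]."""
    beta = rng.normal(mean, sd, n)
    outside = (beta < lower) | (beta > upper)
    while outside.any():
        beta[outside] = rng.normal(mean, sd, outside.sum())
        outside = (beta < lower) | (beta > upper)
    return beta

grid = support(100)
beta = draw_discount_factors(10_000, np.random.default_rng(5))
# Distance from each true discount factor to its closest node on the support.
nearest = grid[np.abs(beta[:, None] - grid[None, :]).argmin(axis=1)]
approx_error = np.max(np.abs(nearest - beta))
```

The last lines show the approximation error that discretisation alone introduces, which shrinks as J grows and is one reason the tables below vary J.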
We calibrate the model using the fairly standard values reported in table C.1. The
estimation sample is then constructed by randomly picking T adjacent periods for each
household to be used. Consumption is finally multiplied with random draws of log-normal
measurement error as in equation (5.11). We use this simulated data to estimate (ρ̂, σ̂η , ĵ)
solving (5.13)–(5.14) for each MC run.
Table C.1: Calibrated Parameters for Monte Carlo Results.

ρ      κ      σξ     σψ     Gt      R       λ
2.5    0.5    0.1    0.1    1.02    1.04    0.3
Tables C.2 and C.3 report MC results when using J ∈ {50, 100, 200} nodes to approximate the true continuous distribution of preferences. Results for N = 100,000,
T ∈ {10, 30}, and ση = 0.10 are reported. We estimate, besides the heterogeneous discount factors, the CRRA coefficient, ρ, using a nonlinear least squares criterion,
$$\hat\rho = \arg\min_{\rho > 0} \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} \left(C_{it}/C_t^\star(M_{it}, P_{it}; \hat\beta_i(\rho), \rho) - 1\right)^2$$
$$\hat\beta_i(\rho) = \arg\min_{\beta_i \in B_J} \sum_{t=1}^{T} \left(C_{it}/C_t^\star(M_{it}, P_{it}; \beta_i, \rho) - 1\right)^2$$
using the fact that measurement error has mean one.
Table C.2: Monte Carlo Results: ρ, Buffer Stock Model.

               Avg. Abs. Bias               MC Std.
               baseline    bias reduced     baseline    bias reduced
T = 10
  J = 50       1.0578      0.6503           0.0428      0.0923
  J = 100      0.5044      0.2598           0.0416      0.0880
  J = 200      0.2544      0.2648           0.0495      0.0758
T = 30
  J = 50       0.4059      0.0759           0.0136      0.0347
  J = 100      0.2385      0.1130           0.0111      0.0224
  J = 200      0.1873      0.1580           0.0101      0.0215

Notes: Columns 1 and 2 report the average absolute bias of the
baseline and the bias-reduced estimates of ρ across the Monte Carlo
replications. Columns 3 and 4 report the standard deviation across
the replications. All results are for the Buffer Stock model.
Table C.3: Monte Carlo Results: βi, Buffer Stock Model.

               Avg. MSE                     Avg. MC Std.
               baseline    bias reduced     baseline    bias reduced
T = 10
  J = 50       0.0003      0.0002           0.0113      0.0099
  J = 100      0.0001      0.0001           0.0092      0.0085
  J = 200      0.0001      0.0001           0.0084      0.0085
T = 30
  J = 50       0.0000      0.0000           0.0048      0.0043
  J = 100      0.0000      0.0000           0.0042      0.0040
  J = 200      0.0000      0.0000           0.0041      0.0040

Notes: Columns 1 and 2 report the average mean square error of the
baseline and the bias-reduced estimates of $\{\beta_i\}_{i=1}^N$ across the Monte
Carlo replications. Columns 3 and 4 report the average standard
deviation across the replications. All results are for the Buffer Stock
model.
D Results from Alternative Estimators [UPDATE]
We here report in Table [REF] the estimation results from alternative estimators. In particular, we here show estimated parameters from a non-linear least squares (NLLS)
estimator,
$$L(\theta, \{\gamma_i\}_{i=1}^N) = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T_i} \varepsilon_{it}(\theta, \gamma_i)^2$$
where
$$\varepsilon_{it}(\theta, \gamma_i) = C_{it}^{obs}/C_{it}^\star - 1$$
and a pseudo Huber loss function,
$$L(\theta, \{\gamma_i\}_{i=1}^N) = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T_i} \delta^2\left(\sqrt{1 + (\varepsilon_{it}(\theta, \gamma_i)/\delta)^2} - 1\right)$$
with δ = 0.1 being a dampening parameter such that the estimates are more robust to
outliers.
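To see why the pseudo Huber criterion is less sensitive to outliers, the two per-observation losses can be compared directly. The sketch below uses hypothetical function names; only the functional forms and δ = 0.1 are taken from the text.

```python
import numpy as np

def nlls_loss(eps):
    """Squared relative prediction error."""
    return eps**2

def pseudo_huber_loss(eps, delta=0.1):
    """delta^2 * (sqrt(1 + (eps / delta)^2) - 1): quadratic near zero, linear in the tails."""
    return delta**2 * (np.sqrt(1.0 + (eps / delta) ** 2) - 1.0)

small, large = 0.001, 10.0
ratio_small = pseudo_huber_loss(small) / nlls_loss(small)   # close to one half
ratio_large = pseudo_huber_loss(large) / nlls_loss(large)   # close to delta / |eps|
```

For small errors the pseudo Huber loss behaves like half the squared error, while for large errors it grows approximately like δ times the absolute error, so a single extreme observation contributes far less than under NLLS.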
References
Ackerberg, D. A. (2009): “A new use of importance sampling to reduce computational
burden in simulation estimation,” Quantitative Marketing and Economics, 7(4), 343–
376.
Alan, S., O. Attanasio and M. Browning (2009): “Estimating Euler equations
with noisy data: two exact GMM estimators,” Journal of Applied Econometrics, 24(2),
309–324.
Alan, S. and M. Browning (2010): “Estimating Intertemporal Allocation Parameters
using Synthetic Residual Estimation,” The Review of Economic Studies, 77(4), 1231–
1261.
Alan, S., M. Browning and M. Ejrnæs (2014): “Income and Consumption: a Micro
Semi-structural Analysis with Pervasive Heterogeneity,” .
Andersen, S., G. W. Harrison, M. I. Lau and E. E. Rutström (2008): “Eliciting
Risk and Time Preferences,” Econometrica, 76(3), 583–618.
Andersen, S., G. W. Harrison, M. I. Lau and E. E. Rutström (2010): “Preference heterogeneity in experiments: Comparing the field and laboratory,” Journal of
Economic Behavior & Organization, 73(2), 209–224.
Ando, T. and J. Bai (2016): “Panel Data Models with Grouped Factor Structure
Under Unknown Group Membership,” Journal of Applied Econometrics, 31(1), 163–191.
Andreoni, J. and C. Sprenger (2012): “Risk Preferences Are Not Time Preferences,”
The American Economic Review, 102(7), 3357–3376.
Bai, J. and S. Ng (2002): “Determining the Number of Factors in Approximate Factor
Models,” Econometrica, 70(1), 191–221.
Bajari, P., J. T. Fox and S. P. Ryan (2007): “Linear Regression Estimation of
Discrete Choice Models with Nonparametric Distributions of Random Coefficients,”
The American Economic Review, 97(2), 459–463.
Barsky, R. B., F. T. Juster, M. S. Kimball and M. D. Shapiro (1997): “Preference Parameters and Behavioral Heterogeneity: An Experimental Approach in the
Health and Retirement Study,” The Quarterly Journal of Economics, 112(2), 537–579.
Beetsma, R. M. W. J. and P. C. Schotman (2001): “Measuring Risk Attitudes in
a Natural Experiment: Data from the Television Game Show Lingo,” The Economic
Journal, 111(474), 821–848.
Bester, C. A. and C. B. Hansen (forthcoming): “Grouped effects estimators in fixed
effects models,” Journal of Econometrics.
Bonhomme, S. and E. Manresa (2015): “Grouped Patterns of Heterogeneity in Panel
Data,” Econometrica, 83(3), 1147–1184.
Browning, M. and S. Leth-Petersen (2003): “Imputing consumption from income
and wealth information,” The Economic Journal, 113(488), F282–F301.
Bryant, P. and J. A. Williamson (1978): “Asymptotic Behaviour of Classification
Maximum Likelihood Estimates,” Biometrika, 65(2), 273–281.
Cagetti, M. (2003): “Wealth Accumulation Over the Life Cycle and Precautionary
Savings,” Journal of Business & Economic Statistics, 21(3), 339–353.
Cagetti, M. and M. De Nardi (2008): “Wealth Inequality: Data and Models,”
Macroeconomic Dynamics, 12(S2), 285–313.
Carroll, C. D. (1992): “The buffer-stock theory of saving: Some macroeconomic evidence,” Brookings Papers on Economic Activity, 2, 61–156.
Carroll, C. D. (1997): “Buffer-Stock Saving and the Life Cycle/Permanent Income Hypothesis,”
The Quarterly Journal of Economics, 112(1), 1–55.
Carroll, C. D. (2006): “The method of endogenous gridpoints for solving dynamic stochastic
optimization problems,” Economics Letters, 91(3), 312–320.
Carroll, C. D., J. Slacalek and K. Tokuoka (2014): “The Distribution of Wealth
and the Marginal Propensity to Consume,” .
Cozzi, M. (2014): “Risk aversion heterogeneity, risky jobs and wealth inequality,”
Queen’s Economics Department Working Paper, No. 1286.
De Nardi, M. (2015): “Quantitative Models of Wealth Inequality: A Survey,” NBER
Working Paper 21106.
Deaton, A. (1991): “Saving and liquidity constraints,” Econometrica, 59(5), 1221–1248.
Deaton, A. (1992): Understanding Consumption. Oxford University Press.
Dempster, A. P., N. M. Laird and D. B. Rubin (1977): “Maximum Likelihood
from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society.
Series B (Methodological), 39(1), 1–38.
Dhaene, G. and K. Jochmans (forthcoming): “Split-panel jackknife estimation of
fixed-effect models,” Review of Economic Studies.
Dohmen, T., A. Falk, D. Huffman, U. Sunde, J. Schupp and G. G. Wagner (2011): “Individual Risk Attitudes: Measurement, Determinants, and Behavioral
Consequences,” Journal of the European Economic Association, 9(3), 522–550.
Farhi, E. and I. Werning (2012): “Capital taxation: Quantitative explorations of the
inverse Euler equation,” Journal of Political Economy, 120(3), 398–445.
Fernández-Villaverde, J., J. F. Rubio-Ramírez and M. S. Santos (2006):
“Convergence properties of the likelihood of computed dynamic models,” Econometrica, 74(1), 93–119.
Finke, M. S. and S. J. Huston (2013): “Time preference and the importance of saving
for retirement,” Journal of Economic Behavior & Organization, 89, 23–34.
Fox, J. T., K. i. Kim, S. P. Ryan and P. Bajari (2011): “A simple estimator for
the distribution of random coefficients,” Quantitative Economics, 2(3), 381–418.
Fox, J. T., K. i. Kim and C. Yang (2015): “A simple nonparametric approach to
estimating the distribution of random coefficients in structural models,” Discussion
paper.
Gârleanu, N. and S. Panageas (2015): “Young, Old, Conservative, and Bold: The
Implications of Heterogeneity and Finite Lives for Asset Pricing,” Journal of Political
Economy, 123(3), 670–685.
Gourinchas, P.-O. and J. A. Parker (2002): “Consumption over the life cycle,”
Econometrica, 70(1), 47–89.
Guiso, L. and M. Paiella (2008): “Risk Aversion, Wealth, and Background Risk,”
Journal of the European Economic Association, 6(6), 1109–1150.
Guvenen, F. (2006): “Reconciling conflicting evidence on the elasticity of intertemporal
substitution: A macroeconomic perspective,” Journal of Monetary Economics, 53(7),
1451–1472.
Guvenen, F. (2009): “A Parsimonious Macroeconomic Model for Asset Pricing,” Econometrica, 77(6), 1711–1750.
Hahn, J. and H. R. Moon (2010): “Panel Data Models with Finite Number of Multiple
Equilibria,” Econometric Theory, 26(3), 863–881.
Hahn, J. and W. K. Newey (2004): “Jackknife and analytical bias reduction for
nonlinear panel models,” Econometrica, 72(4), 1295–1319.
Heckman, J. (1981): “The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time–Discrete Data Stochastic Process,” in
Structural Analysis of Discrete Panel Data with Econometric Applications, ed. by C. F.
Manski and D. McFadden, pp. 179–195. Cambridge, MA: MIT Press.
Heckman, J. and B. Singer (1984): “A method for minimizing the impact of distributional assumptions in econometric models for duration data,” Econometrica,
52(2), 271–320.
Hendricks, L. (2007): “How important is discount rate heterogeneity for wealth inequality?,” Journal of Economic Dynamics and Control, 31(9), 3042–3068.
Holt, C. A. and S. K. Laury (2005): “Risk Aversion and Incentive Effects: New Data
without Order Effects,” The American Economic Review, 95(3), 902–904.
Jørgensen, T. H. (forthcoming): “Life-Cycle Consumption and Children: Evidence
from a Structural Estimation,” Oxford Bulletin of Economics and Statistics.
Jørgensen, T. H. and D. Kristensen (2017): “Simple Estimation of Microeconometric Models with Latent Dynamic Variables,” unpublished working paper, University
College London.
Kamakura, W. A. (1991): “Estimating flexible distributions of ideal-points with external analysis of preferences,” Psychometrika, 56(3), 419–431.
Kaplan, G. (2012): “Inequality and the life cycle,” Quantitative Economics, 3.
Kimball, M. S., C. R. Sahm and M. D. Shapiro (2008): “Imputing risk tolerance
from survey responses,” Journal of the American Statistical Association, 103(483), 1028–1038.
Kimball, M. S., C. R. Sahm and M. D. Shapiro (2009): “Risk Preferences in the PSID: Individual Imputations and Family
Covariation,” American Economic Review, 99(2), 363–68.
Kocherlakota, N. R. (2010): The New Dynamic Public Finance. Princeton University
Press.
Krusell, P. and A. A. Smith (1997): “Income and wealth heterogeneity, portfolio
choice, and equilibrium asset returns,” Macroeconomic Dynamics, 1(2), 387–422.
Krusell, P. and A. A. Smith (1998): “Income and wealth heterogeneity in the macroeconomy,” Journal of
Political Economy, 106(5), 867–896.
Lin, C.-C. and S. Ng (2012): “Estimation of Panel Data Models with Parameter Heterogeneity when Group Membership is Unknown,” Journal of Econometric Methods,
1(1), 42–55.
Meghir, C. and L. Pistaferri (2004): “Income variance dynamics and heterogeneity,”
Econometrica, 72(1), 1–32.
Milligan, G. W. and M. C. Cooper (1985): “An examination of procedures for
determining the number of clusters in a data set,” Psychometrika, 50(2), 159–179.
Nevo, A., J. L. Turner and J. W. Williams (forthcoming): “Usage-Based Pricing
and Demand for Residential Broadband,” Econometrica.
Pilla, R. S. and B. G. Lindsay (2001): “Alternative EM methods for nonparametric
finite mixture models,” Biometrika, 88(2), 535–550.
The Danish Ministry of Finance (2003): Ældres sociale vilkår. (In Danish).