Estimating Dynamic Economic Models with Fixed Effects∗

PRELIMINARY AND INCOMPLETE

Jeppe Druedahl†   Thomas H. Jørgensen‡   Dennis Kristensen§

January 30, 2017

Abstract

We propose a novel approach to estimate dynamic economic models with heterogeneous agents from observed behavior. The estimator is non-parametric in the sense that it does not impose any restrictions on the distribution of heterogeneous parameters. We characterize the asymptotic behavior of the estimator, and Monte Carlo results show that the proposed estimator works well even in relatively short panels. We apply our method to estimate a model of intertemporal consumption allocation allowing for heterogeneity in time preferences using high-quality Danish longitudinal register data. We find substantial heterogeneity in preferences within educational strata, and the distributions of estimated preferences suggest more mass at high values of the discount factor for high-skilled households. Finally, we use the estimated household-specific preferences to show that households who have never had unemployment insurance are also less patient and less risk averse than other households. (JEL: C14, C51, D91)

Keywords: Heterogeneity, Dynamic Economic Models, Structural Estimation, Intertemporal asset allocation.

∗ We thank Bo E. Honoré, Elena Manresa, Christopher Carroll, Mette Ejrnæs, Lutz Hendricks, Rasmus Søndergaard Pedersen, Søren Leth-Petersen, Claus Thustrup Kreiner and Anders Munk-Nielsen for fruitful discussions and suggestions. The project also benefited from comments by seminar participants at Princeton and Copenhagen. Financial support from the Danish Council for Independent Research in Social Sciences is gratefully acknowledged (FSE, grant no. 4091-00040 and 5052-00086B). Part of this research was carried out while Jørgensen was visiting Princeton University in the fall of 2015. Jørgensen thanks Bo E. Honoré for his exceptional hospitality and effort in organizing the stay.
An earlier draft was circulated under the title “Heterogeneous Preferences and Wealth Inequality”.

† Department of Economics, University of Copenhagen, Øster Farimagsgade 5, Building 26, DK-1353 Copenhagen K, Denmark. E-mail: [email protected]. Website: http://econ.ku.dk/druedahl.
‡ Department of Economics, University College London, Gower Street, London, United Kingdom. Email: [email protected]. Webpage: www.tjeconomics.com.
§ Department of Economics, University College London, Gower Street, London, United Kingdom. Email: [email protected]. Website: https://sites.google.com/site/econkristensen.

1 Introduction

Economic agents are recurrently found to be heterogeneous in terms of ex ante characteristics such as abilities and preferences. Experiments and surveys have, for example, repeatedly provided evidence of substantial preference heterogeneity.¹ This heterogeneity, furthermore, often has important positive and normative implications. Heterogeneity in patience and risk aversion is, for example, important in explaining wealth inequality in excess of income inequality² and for understanding asset price puzzles.³ Furthermore, it might have large effects on the form and level of optimal taxation.⁴

We propose a novel non-parametric approach to estimate dynamic economic models with heterogeneous agents from panel data on observed choices. We exploit the fact that systematic variation in observed choices, beyond what a given economic model and measurement error can explain, is evidence of heterogeneity. Our estimator of both homogeneous and heterogeneous parameters is simple to implement and does not impose any distributional assumptions on the heterogeneous parameters. It can furthermore be used to estimate models with both discrete and continuous choices, though we focus on the latter.
For concreteness, imagine that the goal is to estimate a consumption-saving model in the spirit of Deaton (1991), allowing for heterogeneity in a single preference parameter, and that we have access to panel data on $N$ households, indexed by $i$, for which we observe market resources $m_{it}$ and consumption $c_{it}$ in $T_i$ periods, indexed by $t$. Further denote the model-implied optimal level of consumption by $c^\star(m_{it}; \theta, \gamma_i)$, where $\theta$ is a vector of homogeneous parameters and $\gamma_i$ is a household-specific heterogeneous preference parameter. In order to estimate $\theta$ and the household-specific values of $\gamma_i$, our main assumption is that the distribution of $\gamma_i$ can be well approximated by a discrete distribution on a finite set of nodes, $\Gamma$. Our estimator can therefore be thought of as a grouped fixed effects estimator, where the distribution of $\gamma_i$ is uncovered as the histogram of the household-specific values of $\gamma_i$, as originally suggested by Kamakura (1991). Postponing the discussion of how to choose $\Gamma$ in empirical applications, and assuming for now that it is known, the homogeneous parameters in $\theta$ and the group membership of each household, $j_i \in \{1, \dots, J\}$, can be estimated using e.g. nonlinear least squares as

$$(\hat\theta, \hat j_1, \hat j_2, \dots, \hat j_N) = \arg\min_{\theta, j_1, j_2, \dots, j_N} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \left( c_{it} - c^\star(m_{it}; \theta, \gamma^{j_i}) \right)^2$$

¹ See e.g. Barsky, Juster, Kimball and Shapiro (1997), Beetsma and Schotman (2001), Holt and Laury (2005), Andersen, Harrison, Lau and Rutström (2008, 2010), Guiso and Paiella (2008), Kimball, Sahm and Shapiro (2008, 2009), Dohmen, Falk, Huffman, Sunde, Schupp and Wagner (2011), Andreoni and Sprenger (2012) and Finke and Huston (2013).
² See e.g. Krusell and Smith (1997, 1998), Hendricks (2007), Cagetti and De Nardi (2008), Cozzi (2014), Carroll, Slacalek and Tokuoka (2014) and De Nardi (2015).
³ See e.g. Guvenen (2006, 2009) and Gârleanu and Panageas (2015).
⁴ See e.g. Kocherlakota (2010) and Farhi and Werning (2012).

Estimating the distribution of $\gamma_i$ then boils down to finding the weights on each element in $\Gamma$, $\omega = \{\omega_1, \dots, \omega_J\}$. These weights can be found by simple population averages, $\hat\omega_k = \frac{1}{N} \sum_{i=1}^{N} 1\{\hat j_i = k\}$.

Traditional fixed effects (FE) estimators, which do not assume finite support of $\gamma_i$, suffer from an incidental parameter problem known to cause a substantial bias in nonlinear panel models (see e.g. Hahn and Newey, 2004). Hahn and Moon (2010) argue that because the classification parameters are super-consistent when assuming finite support, the incidental parameter problem of our estimator should be less pronounced. Furthermore, unlike random coefficient models, our estimator allows for arbitrary correlation between heterogeneous parameters and other model elements. Finally, a major computational advantage of our estimator is that, conditional on $\theta$, the $J$ solutions to the economic model, $c^\star(\cdot)$, can be pre-computed. The model thus does not need to be re-solved when estimating the $N$ group memberships. This substantially reduces the computational time required to evaluate the criterion function, and makes many and potentially dense points in $\Gamma$ feasible.

Monte Carlo estimation results show that the proposed estimator has good finite sample properties. Specifically, we test its ability to uncover heterogeneous time preferences in the canonical buffer-stock consumption model pioneered by Deaton (1991, 1992) and Carroll (1992, 1997). Assuming that observed consumption is contaminated with multiplicative log-normal measurement error, we formulate a maximum likelihood (ML) version of our estimator. We find that the estimator performs well even with relatively few time periods and substantial measurement error in consumption. We also find that while misspecifying the number and placement of the fixed nodes in $\Gamma$ naturally affects the performance of the estimator, the estimated distribution of heterogeneous parameters is very close to the truth.
We find a substantial bias in homogeneous parameters when the number of nodes is incorrect, however. We show how the panel jackknife approach of Hahn and Newey (2004) and Dhaene and Jochmans (forthcoming) can substantially reduce the bias in the homogeneous parameters.

To illustrate the empirical applicability of our proposed estimator, we also estimate the buffer-stock consumption model on Danish administrative register data, allowing for heterogeneous time preferences and/or heterogeneous CRRA coefficients. This model was first structurally estimated in Gourinchas and Parker (2002) and Cagetti (2003), assuming homogeneous preferences within occupational and educational strata, respectively. We are the first to estimate the model with a non-parametric distribution of preference heterogeneity. Our results suggest that there is substantial heterogeneity within educational strata. Across educational strata, we find that the estimated distributions of discount factors and CRRA coefficients are shifted towards higher values for high-skilled households. Alan and Browning (2010), which is the only comparable study estimating heterogeneous preference parameters using observational data, find similar results using the Panel Study of Income Dynamics (PSID).

Our explicit estimation of group memberships allows us to perform post-estimation analyses of the different preference groups. Specifically, we use the estimated household-specific preferences to show that households who have never had unemployment insurance are less patient and less risk averse than other households. This suggests that the estimated preferences align with economic intuition.

After discussing the related literature below, the paper proceeds as follows. Section 2 formulates and presents the proposed estimator in general notation. Section 3 outlines the asymptotic theory. Section 4 presents the Monte Carlo estimation results. In section 5, we report the estimation results from our empirical application. Finally, we conclude in section 6.
1.1 Relation to Existing Estimators

Our proposed estimator is closely related to two recent strands of literature. First, Bajari, Fox and Ryan (2007) and Fox, Kim, Ryan and Bajari (2011) suggest a similar histogram strategy of fixing a discrete grid of the heterogeneous parameters when estimating discrete choice models with random coefficients.⁵ Fox, Kim and Yang (2015) provide formal justification for their non-parametric approach. The assumption in this literature is that the coefficients are random, and the aim is thus to estimate the population weights on each fixed node, $\omega$. In the case of discrete (or discretized) choice models, this estimator can be formulated as a constrained least squares problem, which is easy to implement and is ensured to have a unique global optimum. Unfortunately, the estimator is much more complex for continuous choice models because it generally requires solving a highly non-linear optimization problem with all the population weights as variables.⁶ All dynamic models with random coefficients, furthermore, face the initial conditions problem (see e.g. Heckman, 1981).

The second strand of literature closely related to our proposed estimator is the grouped fixed effect (GFE) estimator proposed by Hahn and Moon (2010), Bonhomme and Manresa (2015) and Bester and Hansen (forthcoming).⁷ In these papers, both the placement of groups (i.e. the values in $\Gamma$) and the group membership of each observational unit are explicitly estimated. However, as discussed in Bonhomme and Manresa (2015), estimating the group placements can in practice imply problems with multiple local optima and non-convergence.⁸ This implies that the GFE estimator is mostly useful when heterogeneity is suspected to be in the form of a small number of “sufficiently” distinct groups across an unknown domain. Our estimator, by contrast, focuses on the case of pervasive heterogeneity on a well-known domain.

⁵ Like our estimator, this facilitates pre-computation of the model solution for all $J$ types (for given $\theta$). Ackerberg (2009) likewise proposed a combination of importance sampling and change of variables to reduce simulation-based estimation time by “pre-computing” the solution only over relevant objects.
⁶ The constrained least squares formulation of the estimator can, as shown by Nevo, Turner and Williams (forthcoming), be recovered for continuous choices in a method of moments version where all the moments are restricted to be linear in the population weights.
⁷ See also the related studies by Lin and Ng (2012) and Ando and Bai (2016).

In a broader context, our estimator is also related to a large literature on estimation of mixture models. A particularly popular estimator in this class is the non-parametric maximum likelihood estimator (NPMLE) proposed by Heckman and Singer (1984), among others. These types of estimators often formulate an expected likelihood function where both the placement and the weights of the groups are to be estimated. As for the GFE estimator, the simultaneous estimation of weights and nodes can result in multiple local optima and problems of convergence. The common approach to numerically maximizing the log expected likelihood function is to apply the expectation-maximization (EM) algorithm (Dempster, Laird and Rubin, 1977). Unfortunately, the EM algorithm has a slow convergence rate and thus requires many evaluations of the likelihood function, which can be very time consuming if the estimator nests a numerical solution of a dynamic economic model (Pilla and Lindsay, 2001). Empirical applications have therefore been restricted to cases with a few distinct groups.
In the specific context of estimation of heterogeneous time and risk preferences, our paper is also closely related to Alan and Browning (2010) and Alan, Browning and Ejrnæs (2014). They propose a synthetic residual estimation (SRE) approach, where the distance between observed and simulated consumption data is minimized conditional on fully parametric distributions of preference heterogeneity and the assumption that all households, irrespective of their individual preferences, draw Euler residuals from a mixture of two log-normal distributions.⁹ The main benefit of the SRE estimator is that it does not require a full specification of the income process, or ever solving the model. On the other hand, it relies on very restrictive parametric assumptions, which our estimator avoids.

⁸ In a certain sense the results in the GFE papers can be seen as providing formal justification for cluster analysis.
⁹ Note that while the mean Euler residual (in the absence of borrowing constraints) is independent of preferences, higher order moments are generally not. This is the case even if the distribution of pooled Euler residuals across heterogeneous households is well approximated by a mixture of two log-normals (as found in Alan and Browning, 2010).

2 A Fixed Grouped Fixed Effects Estimator

In this section, we state the proposed estimator in general notation, while we later turn to a concrete example in our Monte Carlo study. We consider a structural model, which for unit $i$ (individual, household, firm etc.) at time $t$ has state variables $s_{it}$ and choice variables $c_{it}$, and implies optimal choices $c^\star_{it} \equiv c^\star(s_{it}; \theta, \gamma_i)$, where $\theta \in \Theta$ is a set of homogeneous parameters and $\gamma_i$ is a vector of unit-specific parameters. The optimal choices could be a vector of discrete and continuous choice variables in a dynamic economic model.
We wish to estimate $\theta$ and $\gamma_i$ using an (unbalanced) panel of $N$ units observed for $T_i$ (potentially non-consecutive) periods, where we in each period observe all the states, $s^{obs}_{it}$, and a non-empty subset of the choices, $c^{obs}_{it}$, potentially contaminated with measurement error. The fixed effects (FE) estimator is given by

$$\hat\theta^{FE} = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \hat\gamma^{FE}_i(\theta)); \theta\right) \tag{2.1}$$

$$\hat\gamma^{FE}_i(\theta) = \arg\min_{\gamma_i \in \mathbb{R}} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \gamma_i); \theta\right) \tag{2.2}$$

where $g(\cdot)$ is some criterion function. The FE problem has $N + \dim(\theta)$ parameters to be solved for. Especially when it is time consuming to evaluate $c^\star(\cdot)$ (by e.g. stochastic dynamic programming), this estimator might seem infeasible.

We propose an alternative approximate estimator that aims at limiting the computational burden of FE estimation of structural dynamic economic models. Our approach is to formulate a discrete approximation of the continuous FE estimator in (2.1) in which $\gamma_i$ is restricted to take on only a finite number of values, $\gamma_i \in \Gamma = \{\gamma^1, \dots, \gamma^J\}$. We think of the number of nodes, $J$, as a function of the data but suppress the dependence throughout. Below we supply an approach to estimate the number of nodes in applications. Our proposed fixed group fixed effects estimator (FGFE) is then

$$\hat\theta = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \hat\gamma_i(\theta)); \theta\right) \tag{2.3}$$

$$\hat\gamma_i(\theta) = \arg\min_{\gamma_i \in \Gamma} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \gamma_i); \theta\right) \tag{2.4}$$

Let $j = (j_1, \dots, j_N)$ denote the vector of group memberships and $\mathcal{J} \equiv \{1, 2, \dots, J\}$ the set of potential group memberships. The group membership is then estimated as $\hat j_i = \sum_{k=1}^{J} k \, 1\{\hat\gamma_i(\hat\theta) = \gamma^k\}$, where $\sum_{k=1}^{J} 1\{\hat\gamma_i(\hat\theta) = \gamma^k\} = 1$.

A key advantage of our proposed estimator is that for a given guess of $\theta$, we can pre-compute the $J$ solutions to the economic model for the various values in $\Gamma$, and estimate the $N$ group membership parameters independently across units from equation (2.4).
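The two-step structure of equations (2.3) and (2.4) can be sketched in a few lines. This is only an illustration under strong assumptions that are not part of the paper: a balanced panel, squared error as the criterion $g$, a scalar $\theta$ searched over a grid, and a hypothetical closed-form `toy_model` standing in for the numerically solved $c^\star(\cdot)$. All function names are made up for the example.

```python
import numpy as np

def toy_model(s, theta, gamma):
    """Hypothetical stand-in for c*(s; theta, gamma). A real application would
    look up a pre-computed numerical model solution here."""
    return theta * s + gamma

def assign_groups(c_obs, s_obs, theta, nodes, model=toy_model):
    """Inner step: for a given theta, pick the best node for each unit."""
    # One model evaluation per node, shared by all N units: shape (J, N, T).
    pred = np.stack([model(s_obs, theta, g) for g in nodes])
    loss = ((c_obs[None, :, :] - pred) ** 2).sum(axis=2)   # shape (J, N)
    j_hat = loss.argmin(axis=0)                             # shape (N,)
    objective = loss.min(axis=0).sum() / c_obs.shape[0]     # outer criterion
    return j_hat, objective

def fgfe(c_obs, s_obs, nodes, theta_grid, model=toy_model):
    """Outer step by grid search over a scalar theta (a simplification)."""
    results = [assign_groups(c_obs, s_obs, th, nodes, model) for th in theta_grid]
    best = int(np.argmin([obj for _, obj in results]))
    j_hat = results[best][0]
    # Share of units assigned to each node.
    weights = np.bincount(j_hat, minlength=len(nodes)) / len(j_hat)
    return theta_grid[best], j_hat, weights
```

The point of the design is that the expensive object, the array of model predictions per node, is computed once per trial value of $\theta$ and reused for every unit, so the cost of the inner step grows with $J$ model solutions rather than with $N$.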
The population weights on each element in $\Gamma$, $\omega = \{\omega_j\}_{j=1}^{J}$, can subsequently be estimated by

$$\hat\omega_k = \frac{1}{N} \sum_{i=1}^{N} 1\{\hat j_i = k\}, \quad \forall k \in \mathcal{J}. \tag{2.5}$$

The estimator easily handles situations where $\omega_k = 0$ for some $k$. This is not the case for estimators where $\gamma^k$ is also estimated. Even if all weights are strictly positive in the true optimum, a trial value of $\gamma^k$ with $\omega_k = 0$ implies that the objective function does not change with $\gamma^k$, severely complicating the optimizer's decision on how to proceed. Indeed, Bonhomme and Manresa (2015) report significant problems with finding the global maximum, which might be due to a high-dimensional problem with many flat regions.¹⁰

2.1 Estimating the Number of Nodes, J

We propose a split-panel cross-validation approach to choose the number of nodes, $J$, in applications. For a given guess of $J$, imagine splitting the panel into $I$ non-overlapping partitions along the time dimension. For each partition, $\iota$, we can estimate $\hat\theta^J_\iota$ and $\hat j^J_\iota = (\hat j^J_{1,\iota}, \dots, \hat j^J_{N,\iota})$ and use these estimated parameters to calculate the sum of squared prediction errors for the time periods not used in estimation (denoted with subscript $-\iota$),

$$E_\iota(J) \equiv N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \varepsilon_{it,-\iota}\left(\hat\theta^J_\iota, \gamma^{\hat j^J_{i,\iota}}\right)^2.$$

Choosing the $J$ that minimizes the mean squared error,

$$\hat J = \arg\min_{J \in \mathbb{N}} \frac{1}{I} \sum_{\iota=1}^{I} E_\iota(J),$$

provides an estimate of the number of nodes and domain that trades off the bias and variance of the estimator.

Another way to estimate the number of nodes is a successive approximation approach, similar to that suggested by Fernández-Villaverde, Rubio-Ramírez and Santos (2006) to determine the degree of accuracy of a numerical solution method required for the approximate likelihood function to be a good approximation of the exact likelihood. In particular, the decrease in the estimated objective function can be used as a metric to determine when to stop adding nodes. While this is a simple metric to compute, choosing when to stop is somewhat arbitrary.
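The split-panel cross-validation criterion of this subsection can be written generically. The sketch below is an illustration rather than the authors' implementation: it assumes a balanced panel stored as an `(N, T)` array, and `fit` and `sq_pred_error` are hypothetical user-supplied callables (the first estimates the model on a block of periods for a given $J$, the second returns the sum of squared prediction errors on held-out periods).

```python
import numpy as np

def split_panel_cv(y, fit, sq_pred_error, candidates, n_splits=2):
    """Choose the number of nodes J by split-panel cross-validation.

    y             : (N, T) balanced panel (an assumption of this sketch)
    fit           : callable (y_train, J) -> fitted parameters   [hypothetical]
    sq_pred_error : callable (params, y_test) -> float           [hypothetical]
    candidates    : iterable of candidate values of J
    """
    n_units, n_periods = y.shape
    periods = np.arange(n_periods)
    # Non-overlapping partitions along the time dimension.
    blocks = np.array_split(periods, n_splits)
    scores = {}
    for J in candidates:
        errors = []
        for block in blocks:
            held_out = np.setdiff1d(periods, block)
            params = fit(y[:, block], J)            # estimate on partition
            err = sq_pred_error(params, y[:, held_out])  # evaluate off-partition
            errors.append(err / n_units)            # E_iota(J)
        scores[J] = float(np.mean(errors))          # average over partitions
    best_J = min(scores, key=scores.get)
    return best_J, scores
```

With `n_splits=2` this reduces to a half-panel scheme in which each half is predicted using estimates from the other half.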
Other alternatives have been proposed in different strands of literature. A popular approach to determine the number of latent factors in factor analysis or the number of clusters in cluster analysis is to use information criteria, such as BIC or AIC (see e.g. Milligan and Cooper, 1985; Bai and Ng, 2002; and Bonhomme and Manresa, 2015).

¹⁰ This identification problem is not unique to the GFE estimator. The same identification problem is also inherent in the random coefficient estimator proposed by Heckman and Singer (1984) and heavily used in empirical applications.

3 Asymptotic Theory

[To come]

Under the assumption that the number of nodes and the placement of these nodes are known (i.e. known $\Gamma$), the estimator studied in Hahn and Moon (2010) is similar to the type of estimator we consider. Specifically, their estimator focuses on the estimation of dynamic discrete games of firm behavior with potentially multiple, but a finite number of, equilibria. In their setup, each market is observed over several time periods and they assume that the equilibrium played in a market is time-invariant. Translated into our framework, the equilibrium played in a market is the unobserved heterogeneity ($\gamma_i$) and the assumption of finitely many equilibria is equivalent to our finite support assumption on $\Gamma$.

Hahn and Moon (2010) show that the estimator is consistent as $N$ and $T$ both go to infinity and that the probability of correct classification converges to one even when the number of time periods observed, $T$, grows significantly slower than the number of units, $N$. Specifically, they show that for many typical settings, the incidental parameter problem of standard FE estimators (unrestricted support of $\gamma_i$) vanishes as long as $T$ grows as some log function of $N$. Finally, they show that the estimator of the homogeneous parameters, $\theta$, is asymptotically normal and that inference is not affected by the classification parameters due to their fast rate of convergence.
• Consistency
• Normality (asymptotic variance as for FE)
• Convergence rates ($N^{-1}$, $T^{-1}$, $J^{-1}$)
  – Bias reduction works when $T/J \to 0$ when $T, J \to \infty$
  – Asymptotic distribution of the bias reduced estimator

4 Monte Carlo Experiments

We here illustrate the finite sample properties of our proposed FGFE estimator. We study two examples based on the data generating processes (DGPs):

$$\text{DGP1:} \quad y_{it} = \rho y_{it-1} + \alpha_i + \varepsilon_{it} \tag{4.1}$$

$$\text{DGP2:} \quad y_{it} = \frac{\exp(\rho y_{it-1} + \alpha_i + \varepsilon_{it})}{1 + \exp(\rho y_{it-1} + \alpha_i + \varepsilon_{it})} \tag{4.2}$$

where $\varepsilon \sim N(0, 0.01)$ across all simulations.

For each of the 200 Monte Carlo replications we simulate $N = 2{,}000$ individuals for $T \in \{10, 20, 30\}$ periods and apply our estimator using $J \in \{5, 10, 50\}$ equally spaced nodes. We simulate data letting $y_{i0} = 0$ and $\rho = 0.95$, and drawing $\alpha_i$ from a normal distribution with mean zero and variance 0.1, truncated to the interval $[-0.3, 0.3]$. We assume that the researcher knows the domain of the unit-specific coefficients, $\alpha_i$, but not their values. The researcher wishes to uncover one homogeneous parameter, $\rho$, and a vector of heterogeneous parameters, $\alpha = (\alpha_1, \dots, \alpha_N)$.

Table 4.1: Monte Carlo Results: ρ, Linear Model.

                     Avg. Abs. Bias               MC Std.
                  baseline  bias reduced    baseline  bias reduced
T = 10
  FE               0.0550      0.0054        0.0035      0.0053
  FGFE, J = 5      0.0587      0.0102        0.0061      0.0118
  FGFE, J = 10     0.0561      0.0061        0.0037      0.0067
  FGFE, J = 50     0.0550      0.0056        0.0036      0.0054
T = 20
  FE               0.0194      0.0029        0.0017      0.0022
  FGFE, J = 5      0.0212      0.0062        0.0041      0.0074
  FGFE, J = 10     0.0199      0.0039        0.0022      0.0040
  FGFE, J = 50     0.0195      0.0029        0.0017      0.0022
T = 30
  FE               0.0116      0.0018        0.0010      0.0012
  FGFE, J = 5      0.0132      0.0203        0.0035      0.0247
  FGFE, J = 10     0.0119      0.0027        0.0016      0.0031
  FGFE, J = 50     0.0116      0.0018        0.0010      0.0013

Notes: Columns 1 and 2 report the average absolute bias of the baseline and the bias-reduced estimates of ρ across the Monte Carlo replications. Columns 3 and 4 report the standard deviation across the replications. All results are for the linear DGP1.
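The simulation design of this section can be sketched as follows. The sketch reads $N(0, 0.01)$ as mean zero and variance 0.01 (so a standard deviation of 0.1), which is an interpretation rather than something the text states, and it draws the truncated normal by simple rejection sampling. The function names are hypothetical.

```python
import numpy as np

def draw_alpha(rng, n, var=0.1, bound=0.3):
    """Draw n values from N(0, var) truncated to [-bound, bound] by rejection."""
    out = np.empty(0)
    while out.size < n:
        draw = rng.normal(0.0, np.sqrt(var), size=2 * n)
        out = np.concatenate([out, draw[np.abs(draw) <= bound]])
    return out[:n]

def simulate(rng, n, t, rho=0.95, nonlinear=False, sd_eps=0.1):
    """Simulate DGP1 (linear) or DGP2 (logistic transform) with y_i0 = 0.

    Returns y of shape (n, t + 1), where column 0 is the initial condition,
    together with the true unit-specific alpha_i.
    """
    alpha = draw_alpha(rng, n)
    y = np.zeros((n, t + 1))
    for s in range(1, t + 1):
        index = rho * y[:, s - 1] + alpha + rng.normal(0.0, sd_eps, size=n)
        if nonlinear:
            y[:, s] = np.exp(index) / (1.0 + np.exp(index))   # DGP2
        else:
            y[:, s] = index                                    # DGP1
    return y, alpha
```

A single replication of the baseline design would then be, for example, `simulate(np.random.default_rng(0), 2000, 10)`.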
Table 4.2 shows that increasing the number of nodes reduces the average mean squared error and the average MC standard error significantly.

Table 4.2: Monte Carlo Results: αi, Linear Model.

                        Avg. MSE                Avg. MC Std.
                  baseline  bias reduced    baseline  bias reduced
T = 10
  FE               0.0260      0.0228        0.1240      0.1097
  FGFE, J = 5      0.0283      0.0247        0.1332      0.1181
  FGFE, J = 10     0.0265      0.0232        0.1261      0.1116
  FGFE, J = 50     0.0260      0.0228        0.1241      0.1097
T = 20
  FE               0.0231      0.0205        0.1152      0.1030
  FGFE, J = 5      0.0254      0.0224        0.1245      0.1121
  FGFE, J = 10     0.0236      0.0208        0.1171      0.1048
  FGFE, J = 50     0.0231      0.0205        0.1152      0.1031
T = 30
  FE               0.0222      0.0201        0.1116      0.1014
  FGFE, J = 5      0.0245      0.0256        0.1215      0.1255
  FGFE, J = 10     0.0227      0.0205        0.1136      0.1032
  FGFE, J = 50     0.0222      0.0201        0.1117      0.1014

Notes: Columns 1 and 2 report the average mean squared error of the baseline and the bias-reduced estimates of $\{\alpha_i\}_{i=1}^{N}$ across the Monte Carlo replications. Columns 3 and 4 report the average standard deviation across the replications. All results are for the linear DGP1.

Tables 4.1 and 4.2 report the estimation results related to the homogeneous and heterogeneous parameters, respectively. The first column in Table 4.1 reports the average absolute bias and the second column reports the split-panel jackknife bias-reduced average absolute bias. The third and fourth columns report the standard deviation across MC runs for the baseline and bias-reduced estimates.

We also report the standard FE estimates for reference. The FE estimator is feasible in this setup because the computational time associated with evaluating the models in both DGPs is rather low. In situations where evaluating the model at a set of parameter values is time consuming, as in many structural dynamic economic models, implementing the standard FE estimator may easily be infeasible. As our theory suggests, the FGFE estimator converges towards the FE estimator, and using 50 nodes delivers almost identical results.
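The bias-reduced columns rely on a split-panel jackknife. A minimal sketch, assuming the half-panel variant in which the full-sample estimate is combined with the two half-sample estimates as $2\hat\theta - \tfrac{1}{2}(\hat\theta_1 + \hat\theta_2)$, is given below. The `estimator` argument is a hypothetical callable, and the sketch ignores how lags are carried across the split in a dynamic panel.

```python
import numpy as np

def half_panel_jackknife(y, estimator):
    """Half-panel jackknife bias correction.

    y         : (N, T) balanced panel with T even (an assumption of this sketch)
    estimator : callable mapping a panel to a parameter estimate  [hypothetical]

    If the estimator has a leading bias of order 1/T, each half-panel estimate
    carries twice that bias, so the combination below removes the leading term.
    """
    n_periods = y.shape[1]
    half = n_periods // 2
    full = estimator(y)
    first = estimator(y[:, :half])
    second = estimator(y[:, half:])
    return 2.0 * full - 0.5 * (first + second)
```

The correction leaves an estimator without $1/T$ bias unchanged on average, but it typically increases the variance somewhat, which is the trade-off visible in the MC standard deviation columns.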
The split-panel jackknife bias reduction reduces the incidental parameter bias significantly (while increasing the variance slightly in our finite samples).

Table 4.3: Monte Carlo Results: ρ, Nonlinear Model.

                     Avg. Abs. Bias               MC Std.
                  baseline  bias reduced    baseline  bias reduced
T = 10
  FE               0.0569      0.0369        0.0110      0.0147
  FGFE, J = 5      0.0529      0.0910        0.0573      0.1123
  FGFE, J = 10     0.0544      0.0476        0.0253      0.0500
  FGFE, J = 50     0.0570      0.0372        0.0117      0.0163
T = 20
  FE               0.0447      0.0221        0.0101      0.0124
  FGFE, J = 5      0.0458      0.0846        0.0560      0.1036
  FGFE, J = 10     0.0383      0.0549        0.0333      0.0687
  FGFE, J = 50     0.0452      0.0244        0.0112      0.0158
T = 30
  FE               0.0374      0.0165        0.0096      0.0114
  FGFE, J = 5      0.0506      0.0838        0.0597      0.1060
  FGFE, J = 10     0.0314      0.0546        0.0328      0.0669
  FGFE, J = 50     0.0372      0.0178        0.0105      0.0149

Notes: Columns 1 and 2 report the average absolute bias of the baseline and the bias-reduced estimates of ρ across the Monte Carlo replications. Columns 3 and 4 report the standard deviation across the replications. All results are for the nonlinear DGP2.

Table 4.4: Monte Carlo Results: αi, Nonlinear Model.

                        Avg. MSE                Avg. MC Std.
                  baseline  bias reduced    baseline  bias reduced
T = 10
  FE               0.0233      0.0239        0.1063      0.1122
  FGFE, J = 5      0.0252      0.0295        0.1194      0.1371
  FGFE, J = 10     0.0237      0.0249        0.1089      0.1174
  FGFE, J = 50     0.0233      0.0240        0.1064      0.1124
T = 20
  FE               0.0213      0.0212        0.1034      0.1056
  FGFE, J = 5      0.0234      0.0270        0.1163      0.1299
  FGFE, J = 10     0.0216      0.0231        0.1068      0.1149
  FGFE, J = 50     0.0213      0.0213        0.1035      0.1059
T = 30
  FE               0.0208      0.0207        0.1022      0.1036
  FGFE, J = 5      0.0233      0.0266        0.1160      0.1292
  FGFE, J = 10     0.0211      0.0225        0.1055      0.1125
  FGFE, J = 50     0.0209      0.0207        0.1023      0.1038

Notes: Columns 1 and 2 report the average mean squared error of the baseline and the bias-reduced estimates of $\{\alpha_i\}_{i=1}^{N}$ across the Monte Carlo replications. Columns 3 and 4 report the average standard deviation across the replications. All results are for the nonlinear DGP2.

4.1 Choosing the Number of Nodes

We here implement a simple half-panel cross-validation approach.
This follows closely the bias-correction approach above and aims at preserving any time-series properties of the actual data. In particular, we split the simulated data into two sub-samples, where the first contains the first $T/2$ time period observations for all individuals and the second contains the remaining $T/2$ time periods for all individuals. For a given $J$, we then estimate the model parameters on each sub-sample to get $(\hat\rho^J_\iota, \hat\alpha^J_{i,\iota})$, $\iota = 1, 2$. We then calculate the squared prediction error for each sub-sample using the estimates from the other, left-out sub-sample (illustrated here for the linear DGP1),

$$E_1(J) = N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T/2} \left( y_{it} - \hat\rho^J_2 y_{it-1} - \hat\alpha^J_{i,2} \right)^2$$

$$E_2(J) = N^{-1} \sum_{i=1}^{N} \sum_{t=T/2+1}^{T} \left( y_{it} - \hat\rho^J_1 y_{it-1} - \hat\alpha^J_{i,1} \right)^2$$

and estimate $J$ as

$$\hat J = \arg\min_{J \in \mathcal{J}} \frac{1}{2} \left( E_1(J) + E_2(J) \right) \tag{4.3}$$

where we restrict the number of nodes to be in a subset of the natural numbers, namely $\mathcal{J} = \{10, 11, \dots, 100\}$.

Figures 4.1 and 4.2 plot the histogram of the estimated number of groups across all 200 Monte Carlo runs together with the average number of estimated groups for $T \in \{10, 20, 30\}$. Figure 4.1 reports results for the linear DGP1 and Figure 4.2 reports the results for the nonlinear DGP2.

Figure 4.1: Estimated Number of Groups, J. Linear model, DGP1.
[Three histogram panels: (a) T = 10, average 43; (b) T = 20, average 54; (c) T = 30, average 60. Horizontal axis: number of groups, J (10 to 100). Vertical axis: share.]
Notes: The figure reports the distribution (and average) of the estimated number of groups using the cross-validation criterion (4.3) for varying sample sizes. All results are based on the linear DGP1.

Figure 4.2: Estimated Number of Groups, J. Nonlinear model, DGP2.
[Three histogram panels: (a) T = 10, average 73; (b) T = 20, average 82; (c) T = 30, average 85. Horizontal axis: number of groups, J (10 to 100). Vertical axis: share.]
Notes: The figure reports the distribution (and average) of the estimated number of groups using the cross-validation criterion (4.3) for varying sample sizes. All results are based on the nonlinear DGP2.

As our theory suggests [REF to BIAS RATE], when the number of time periods increases, the average optimal number of groups increases. In particular, for the linear model the average estimated number of groups is 43, 54 and 60 when using 10, 20 and 30 time periods, respectively. While there is significant dispersion across MC runs, the average estimated number of groups is reassuringly close to the 50 groups that delivered almost identical results to the standard FE estimator in tables 4.1–4.4 above. For the nonlinear model, the optimal number of groups is somewhat higher in our setting.

5 An Empirical Application to Danish Data

In this section, we apply our proposed estimator to Danish administrative register data. We estimate by maximum likelihood the canonical buffer-stock consumption model of Deaton (1991, 1992) and Carroll (1992, 1997), assuming that (imputed) consumption is contaminated with mean-one multiplicative log-normal error with variance $\sigma_\eta^2$. Qualitatively similar results from a non-linear least squares (NLLS) estimator and a robust Huber-type estimator, neither of which relies on the distributional assumption on the measurement error, are reported in Appendix D in the online supplemental material. Appendix C in the supplemental material also contains a Monte Carlo study of the ability of the proposed estimator to estimate this type of model calibrated to the Danish data.

We first estimate the model under the assumption of fully homogeneous preferences, and then in turn allow for heterogeneity in the discount factor, $\beta$.
We include in the supplemental material alternative results from letting $\rho$ be heterogeneous. In both cases, we find substantial preference heterogeneity within educational strata, and a clear improvement in the model's predictive power when allowing for either type of heterogeneity. The results also align well with economic intuition, with e.g. high-skilled households being more patient. In a post-estimation analysis, we additionally show that households with no unemployment insurance are estimated to have more mass at lower discount factors (and lower relative risk aversion coefficients).

The FGFE estimation routine converged in less than five minutes, illustrating that the proposed estimator is also applicable to more complex and computationally demanding models.

5.1 Data

We use high-quality Danish administrative registers covering the entire population in the period 1987-1996.¹¹ All information is based on third-party reports with little additional self-reporting. All self-reporting is moreover subject to possible auditing, giving reliable longitudinal information on household characteristics, assets, liabilities and income.

Household income includes all monetary income net of all taxes, except any income related to ownership of financial assets. Transfers, such as child benefits and unemployment benefits, are also included to ensure that disposable income accurately measures the flow of resources available for consumption. Net wealth consists of stocks, bonds, bank deposits, cars, boats, house value for home owners and mortgage deeds, net of total liabilities. The house value is assessed by the tax authorities for tax purposes. Pension wealth is not observed in the registers and is thus not included in the wealth measure.
Household consumption is not observed in the registers and is, therefore, imputed using a simple budget approach, $C_t = \tilde{Y}_t - \Delta A_t$, where $\tilde{Y}_t = Y_t + r \cdot A_t$ is disposable income, $A_t$ is end-of-period net wealth, $r$ is the real rate of return, and $\Delta A_t$ thus proxies savings. A very similar imputation method is evaluated on Danish data in Browning and Leth-Petersen (2003) and found to produce a reasonable approximation. The resulting consumption measure will, however, include some durables such as home appliances. All variables are deflated with the official consumer price index.

We restrict attention to stable married or cohabiting couples in which the husband is between age 25 and 59. This is to mitigate issues regarding educational and retirement choices. To increase homogeneity of households, we restrict the spousal age difference to be no more than five years, and require that no one in the household ever becomes self-employed, is out of the labor market, or retires before age 59. To limit the effect of errors in the imputation procedure on our estimates of preference heterogeneity, we trim extreme observations from our sample and require that we have data for at least 5 years.¹² In total this leaves us with an unbalanced panel of 317,793 households observed in at most 9 time periods with a total of 2,994,679 household-time observations. Households are classified as high skilled if either member holds at least a bachelor's degree (86,713 households are denoted as high skilled).

¹¹ We begin in 1987 to be able to consistently match individuals into couples, and we end with 1996 because the Danish wealth tax was abolished in this year. Information on, e.g., cars and boats was not collected in subsequent years, leading to a break in the wealth measure from 1996 to 1997.
¹² Further details on the data are provided in appendix B.

5.2 Model

Our application builds on the incomplete markets model pioneered by Deaton (1991, 1992) and Carroll (1992, 1997).
For completeness we give a short description of the model here. We consider unitary households indexed by $i$ with heterogeneous preferences who work for $T_r$ periods, then retire and eventually die at the end of period $T$. The recursive form of the household’s problem is

\[ V_{it}(P_{it}, M_{it}) = \max_{C_{it} \geq 0} \frac{C_{it}^{1-\rho}}{1-\rho} + \beta_i E_t\left[ V_{it+1}(P_{it+1}, M_{it+1}) \right] \tag{5.1} \]

subject to the inter-temporal budget constraint

\[ M_{it+1} = R A_{it} + Y_{it+1} \tag{5.2} \]
\[ A_{it} = M_{it} - C_{it} \tag{5.3} \]

where $A_{it}$ is end-of-period assets, $M_{it}$ is beginning-of-period market resources, $Y_{it}$ is income, and $R$ is the gross rate of return. Consumers are allowed to be net borrowers up to a fraction of their permanent income $P_{it}$. End-of-period wealth thus has to satisfy

\[ A_{it} \geq -\lambda_t P_{it}, \qquad \lambda_t = \begin{cases} 0 & \text{if } t \geq T_r \\ \lambda & \text{else} \end{cases} \tag{5.4} \]

where we restrict retirees not to be net borrowers ($\lambda_t = 0$, $t \geq T_r$). In the beginning of each period, households receive a stochastic income

\[ Y_{it} = P_{it} \xi_{it}, \qquad \xi_{it} \sim \log\mathcal{N}(-0.5\sigma_\xi^2, \sigma_\xi^2) \tag{5.5} \]
\[ P_{it} = G_t P_{it-1} \psi_{it}, \qquad \psi_{it} \sim \log\mathcal{N}(-0.5\sigma_\psi^2, \sigma_\psi^2) \tag{5.6} \]

where $G_t$ is an age-dependent gross growth rate of permanent income, $\psi_{it}$ is a mean-one permanent shock to income, and $\xi_{it}$ is a mean-one transitory shock to income. We assume that income is constant post retirement, $Y_{it} = \kappa P_{iT_r}$, $t \geq T_r$, where $\kappa$ is the replacement rate in retirement.

We denote the model-implied optimal level of consumption for a household aged $t$ with resources $M_{it}$ and permanent income $P_{it}$ by $C_{it}^\star = C_t^\star(M_{it}, P_{it}; \theta, \beta_i)$, where $\theta = (\rho, R, \lambda, \sigma_\xi^2, \sigma_\psi^2)$. The model is solved using the endogenous grid method (EGM) proposed by Carroll (2006).¹³

¹³ We use 300 discrete points to approximate the consumption function and 82 Gauss-Hermite quadrature points to approximate expectations with respect to future transitory and permanent income shocks.

5.3 Calibrations

In addition to the parameters of the income process estimated below, we fix several other parameters of the model before turning to estimation.
In particular, we choose an interest rate of R = 1.03, similar to the long run real return on 10 year Danish government bonds, which over the period 1987-2007 was 3.8 percent. The same interest rate is used in, e.g., Gourinchas and Parker (2002). Based on an informal inspection of the observed consumption behavior of households in debt, we furthermore set the borrowing constraint to be binding at 30 percent of permanent income (λ = 0.30). Kaplan (2012) estimates an almost identical placement of the credit constraint using the PSID. Finally, we set the replacement rate in retirement to 90 percent, κ = 0.9, based on The Danish Ministry of Finance (2003), and assume that households retire at age 60 ($T_r = 60$) and die at age 75 ($T = 75$).

Following the approach in Meghir and Pistaferri (2004), we estimate the transitory and permanent income shock variances for each education group separately using

\[ \sigma_\psi^2 = \mathrm{cov}\Big( \Delta\varepsilon_{it}, \sum_{k=0}^{2} \Delta\varepsilon_{i,t-1+k} \Big) \tag{5.7} \]
\[ \sigma_\xi^2 = -\mathrm{cov}( \Delta\varepsilon_{it}, \Delta\varepsilon_{i,t+1} ) \tag{5.8} \]

where $\varepsilon_{it}$ is the residual for household $i$ in period $t$ from a regression of log household income on a full set of age dummies, i.e.

\[ \log(Y_{it}) = \text{cons} + \sum_{j=25}^{59} \alpha_j^{age} 1_{age_{it}=j} + \varepsilon_{it} \tag{5.9} \]

The results are reported in table 5.1. The income variances of Danish households are smaller than those typically estimated for the US. As argued in Jørgensen (forthcoming), this is most likely due to i) a generous social welfare system, ii) progressive taxation, iii) a relatively high “minimum wage”, and iv) register data being typically less noisy than the survey data typically used. We find that high skilled households are subject to both larger transitory shocks and larger permanent shocks.

Table 5.1: Income Shock Variances

                              Low skilled         High skilled
                              Est     (s.e.)      Est     (s.e.)
$\sigma_\psi^2 \cdot 10^3$    2.86    (0.05)      3.56    (0.09)
$\sigma_\xi^2 \cdot 10^3$     3.20    (0.10)      5.19    (0.24)

Notes: The income shock variances are estimated based on the approach proposed in Meghir and Pistaferri (2004).
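The covariance restrictions in (5.7) and (5.8) are straightforward to compute once the residuals are available. The following is a minimal sketch rather than the authors' implementation: it assumes a balanced residual panel stored as an array of shape (N, T) and pools the covariances over all households and periods, and the function names `pooled_cov` and `income_shock_variances` are hypothetical.

```python
import numpy as np


def pooled_cov(x, y):
    """Sample covariance of two equally shaped arrays, pooled over all entries."""
    x = np.ravel(x)
    y = np.ravel(y)
    return float(np.mean((x - x.mean()) * (y - y.mean())))


def income_shock_variances(eps):
    """Return (permanent, transitory) shock variances from an (N, T) residual panel.

    eps[i, t] is the log-income residual of household i in period t.
    Assumption: balanced panel without missing values and T >= 4.
    """
    d = np.diff(eps, axis=1)                      # first differences, shape (N, T-1)
    window = d[:, :-2] + d[:, 1:-1] + d[:, 2:]    # sum over k = 0, 1, 2 of d[t-1+k]
    var_perm = pooled_cov(d[:, 1:-1], window)     # moment in equation (5.7)
    var_trans = -pooled_cov(d[:, :-1], d[:, 1:])  # moment in equation (5.8)
    return var_perm, var_trans


if __name__ == "__main__":
    # Simulated check: random-walk permanent component plus i.i.d. transitory noise.
    rng = np.random.default_rng(0)
    n, t = 20_000, 8
    perm = np.cumsum(rng.normal(0.0, np.sqrt(0.01), (n, t)), axis=1)
    trans = rng.normal(0.0, np.sqrt(0.02), (n, t))
    print(income_shock_variances(perm + trans))   # roughly (0.01, 0.02)
```

With an unbalanced panel, as in the Danish data, the same moments would have to be computed over the available adjacent observations only.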
The growth in income is estimated by re-arranging the income process such that

\[ G_t = \exp\Big( \frac{1}{N} \sum_{i=1}^{N} \Delta \log Y_{it} + \frac{1}{2} \sigma_\psi^2 \Big) \tag{5.10} \]

A smoothed growth rate $\tilde{G}_t$ is obtained using a third degree polynomial in age. The results are reported in figure 5.1. Permanent income, $P_{it}$, is found by applying the Kalman filter on the time series of log income for each household (the resulting life cycle profile is shown in appendix B in the online supplemental material).

Figure 5.1: Gross Income Growth Rates, $G_t$. Panels: (a) Low skilled, (b) High skilled. Horizontal axis: age. Legend: Point Estimation, Smoothed.

5.4 A Maximum Likelihood Estimator

We follow the typical assumption that consumption is observed with multiplicative iid log-normal measurement error (G&P?) with mean one and variance $\sigma_\eta^2$, i.e.

\[ C_{it}^{obs} = C_{it}^\star \eta_{it}, \qquad \log \eta_{it} \sim \mathcal{N}(-0.5\sigma_\eta^2, \sigma_\eta^2) \tag{5.11} \]

The mean-corrected log-differences in observed and predicted consumption thus follow a Gaussian distribution, i.e.

\[ \varepsilon_{it}(\rho, \beta_i) \equiv c_{it}^{obs} - c_{it}^\star + 0.5\sigma_\eta^2, \qquad \varepsilon_{it}(\rho, \beta_i) \sim \mathcal{N}(0, \sigma_\eta^2) \tag{5.12} \]

where lowercase letters denote log-transformed variables, e.g., $c_{it}^{obs} = \log C_{it}^{obs}$ and $c_{it}^\star = \log C_t^\star(M_{it}, P_{it}; \theta, \beta_i)$. Note that, for simplicity, we ignore here that we use an estimated permanent income measure stemming from the Kalman filter because permanent income is not observed in the data. Alternatively, the approach proposed in Jørgensen and Kristensen (2017) could be adopted here as well. We have not pursued that strategy further here.

The mean log likelihood function is then

\[ L(\rho, \sigma_\eta, \{\beta_i\}_{i=1}^N) = \frac{1}{N} \sum_{i=1}^{N} \ell_i(\rho, \sigma_\eta, \beta_i) \]

where $\ell_i(\rho, \sigma_\eta, \beta_i)$ is the log-likelihood contribution associated with household $i$,

\[ \ell_i(\rho, \sigma_\eta, \beta_i) = -\frac{1}{2} \Big( T_i \log(2\pi\sigma_\eta^2) + \sum_{t=1}^{T_i} \varepsilon_{it}(\rho, \beta_i)^2 \sigma_\eta^{-2} \Big) \]
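To make the household contribution concrete, here is a minimal sketch of how $\ell_i$ could be evaluated, and how it could be maximized over a finite set of candidate discount factors. It assumes that log predicted consumption has already been computed from the model solution for each candidate; the function names and the array layout are hypothetical and not taken from the authors' code.

```python
import numpy as np


def loglik_contribution(c_obs, c_star, sigma_eta):
    """Log-likelihood contribution of one household.

    c_obs, c_star: log observed and log predicted consumption, length T_i.
    """
    eps = c_obs - c_star + 0.5 * sigma_eta**2          # mean-corrected residual (5.12)
    t_i = eps.size
    return -0.5 * (t_i * np.log(2.0 * np.pi * sigma_eta**2)
                   + np.sum(eps**2) / sigma_eta**2)


def best_node(c_obs, c_star_by_node, sigma_eta):
    """Pick the candidate discount factor with the highest contribution.

    c_star_by_node: array of shape (J, T_i); row j holds log predicted
    consumption under the j-th candidate discount factor (assumed precomputed).
    Returns the index of the best candidate and its log-likelihood value.
    """
    ll = np.array([loglik_contribution(c_obs, row, sigma_eta)
                   for row in c_star_by_node])
    j = int(np.argmax(ll))
    return j, float(ll[j])
```

Because each household's contribution depends only on its own candidate, the search can be run household by household, which is what keeps this kind of classification step cheap relative to solving the model.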
We discretize the discount factors, $\{\beta^j\}_{j=1}^J$, into $J$ equally spaced nodes, and our FGFE MLE solves

\[ (\hat\rho, \hat\sigma_\eta) = \arg\max_{\rho, \sigma_\eta \in \mathbb{R}_+} L(\rho, \sigma_\eta, \{\beta^{\hat{j}_i(\rho, \sigma_\eta)}\}_{i=1}^N) \tag{5.13} \]
\[ \hat{j}_i(\rho, \sigma_\eta) = \arg\max_{j_i \in \{1, \dots, J\}} \ell_i(\rho, \sigma_\eta, \beta^{j_i}), \quad \forall i = 1, \dots, N \tag{5.14} \]

where the group memberships are estimated through classification maximum likelihood (Bryant and Williamson, 1978).

The MLE of the measurement error variance, $\sigma_\eta^2$, is biased because the estimator does not recognize the reduced degrees of freedom from the estimation of $\beta_i$ and the homogeneous parameters in $\theta$. We, therefore, estimate a re-parameterized parameter, $\sigma_\eta^2 = \tilde\sigma_\eta^2 (NT - N - (\dim(\theta) - 1))/NT$, where $\tilde\sigma_\eta^2$ is the “standard” biased measurement error variance and $NT \equiv \sum_{i=1}^N T_i$. This parametrization corrects for the dimensionality of the homogeneous parameters and the estimation of the $N$ classification parameters in equation (5.14).

5.5 Estimation Results

The estimation results are presented in table 5.2 for both educational groups. Columns (1) and (3) report the homogeneous estimates for low and high skilled households. Surprisingly, we here find that low skilled households, with a discount factor of 0.966, are much more patient than high skilled households, with a discount factor of 0.938. On the other hand, high skilled households have a CRRA coefficient of 7.3, far exceeding the CRRA coefficient of 2.1 for low skilled. As discussed above in relation to the problems in identifying β and ρ separately, the total saving motive might thus still be stronger for the high skilled than for the low skilled households, due both to the higher risk aversion and to the lower inter-temporal elasticity of substitution.

Columns (2) and (4) in table 5.2 report estimation results when β is allowed to be heterogeneous, and figure 5.2 shows the estimated distributions of β. We allow for a large domain with β ∈ [0.75, 1.05] and discretize it into J = 100 bins. We find very little mass on the boundary, indicating that our chosen domain is large enough.

Table 5.2: Estimated Preferences.

            Low skilled                   High skilled
            Hom. (1)    FGFE (2)          Hom. (3)    FGFE (4)
β           0.966       0.961†            0.938       0.968†
ρ           2.119       1.490 {1.969}     7.256       1.393 {1.772}
σ_η         0.333       0.335 {0.340}     0.352       0.352 {0.361}
L           −2.502      −1.984            −2.916      −2.355
N           168315      168315            62057       62057
Obs.        1341582     1341582           490694      490694
J           −           100               −           100
β ∈         R+          [0.75, 1.05]      R+          [0.75, 1.05]

Standard errors (round brackets) and standard deviations of the heterogeneous distributions (square brackets), in source order. β and ρ rows: (0.000) (0.080) [0.033] (0.001) (0.065) (0.001). σ_η row: (0.000) [0.030] (0.002) (0.000) (0.000) (0.001).

Notes: Robust asymptotic standard errors in round brackets, clustered at the individual level. Split-panel jackknife bias reduced estimates are reported in curly brackets (Dhaene and Jochmans, forthcoming).
† Reported is the estimated mean of the respective heterogeneous distribution, with the standard deviation of the distribution in square brackets.
‡ The number of nodes refers here to the number of points in the interpolation object. The household-specific estimates are allowed to be continuous in the domain.

In contrast to the results for the homogeneous specification, we now find that the high skilled are somewhat more patient than the low skilled; the mean of the distribution for the high skilled is 0.968, while it is 0.961 for the low skilled. In figure 5.2 we see that the distribution of discount factors is shifted to the right for the high skilled compared to the low skilled, while the shapes of the distributions are quite similar across the two educational groups. Both distributions are relatively symmetric, but with a somewhat fat left tail.

The CRRA coefficients are found to be relatively low and similar across the educational groups; ρ = 1.49 for low skilled, and ρ = 1.39 for high skilled. The estimates increase to around 1.97 and 1.77 for low and high skilled, respectively, when applying the split-panel jackknife bias reduction approach proposed in Dhaene and Jochmans (forthcoming).
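As an illustration of the bias reduction idea, the sketch below implements the half-panel variant of the split-panel jackknife, $2\hat\theta - \tfrac{1}{2}(\hat\theta_1 + \hat\theta_2)$, where $\hat\theta_1$ and $\hat\theta_2$ are estimates from the first and second half of the time periods. It assumes a balanced panel with an even number of periods, which is simpler than the unbalanced Danish panel, and it uses a textbook fixed-effects variance example instead of the consumption model; all function names are hypothetical.

```python
import numpy as np


def half_panel_jackknife(estimator, y):
    """Half-panel jackknife for a balanced (N, T) panel with even T.

    estimator: callable mapping an (N, T') panel to a scalar estimate.
    """
    t = y.shape[1]
    if t % 2 != 0:
        raise ValueError("this sketch requires an even number of periods")
    half = t // 2
    full = estimator(y)
    first = estimator(y[:, :half])      # estimate from the first half-panel
    second = estimator(y[:, half:])     # estimate from the second half-panel
    return 2.0 * full - 0.5 * (first + second)


def fe_variance_mle(y):
    """Error-variance MLE with household fixed effects (biased by a factor 1 - 1/T)."""
    return float(np.mean((y - y.mean(axis=1, keepdims=True)) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = rng.normal(0.0, 2.0, (50_000, 1))          # household fixed effects
    y = alpha + rng.normal(0.0, 1.0, (50_000, 6))      # true error variance is 1
    print(fe_variance_mle(y))                          # close to 5/6
    print(half_panel_jackknife(fe_variance_mle, y))    # close to 1
```

In this toy example the leading bias term is removed exactly; in a nonlinear model such as the one estimated here the correction only removes the leading term of the bias.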
The estimated homogeneous parameters and the means of the heterogeneous distributions are thus in line with existing estimates. See, e.g., Cagetti (2003), Gourinchas and Parker (2002) and Alan, Attanasio and Browning (2009).

Looking at the improvement in the log-likelihood function when allowing β to be heterogeneous compared to the homogeneous case, we can calculate log-likelihood ratios of 174,374 and 69,628 for low and high skilled, respectively. Assuming that these LR-statistics are $\chi^2_{N-1}$ with $N - 1$ degrees of freedom ($N$ classification parameters versus 1 homogeneous parameter), we get p-values of zero, suggesting that the estimated heterogeneity is significant both economically and statistically.¹⁴

Figure 5.2: Estimated β Distributions, FGFE. Panels: (a) Probability distribution, (b) Cumulative distribution. Horizontal axis: discount factor, from 0.75 to 1.05. Legend: Low skilled, High skilled.

Notes: The figure reports the estimated share of the Danish estimation sample in each of the J fixed groups as the empirical PDF in the left panel and the CDF in the right panel.

Although we find evidence of substantial preference heterogeneity, our estimated distributions, especially of discount factors, are narrower than those found in Alan and Browning (2010), which is the only comparable study using observational data. In the case of β, they find a spread between the 90th and 10th percentile of 0.143 for the low skilled and 0.134 for the high skilled. We estimate spreads of 0.050 and 0.046, respectively. Our results, in particular the distribution of β, are broadly in line with the experimental evidence from Denmark reported in Andersen, Harrison, Lau and Rutström (2010).
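As a quick arithmetic check on the likelihood-ratio statistics quoted in this subsection, the sketch below recomputes them from table 5.2. It assumes that the row L in the table is the mean log-likelihood per household, as defined in section 5.4, so that the statistic is $2N(L_{FGFE} - L_{Hom})$; the function name is hypothetical.

```python
def lr_statistic(mean_ll_restricted, mean_ll_unrestricted, n_households):
    """Likelihood-ratio statistic from mean log-likelihoods per household."""
    return 2.0 * n_households * (mean_ll_unrestricted - mean_ll_restricted)


# Values taken from table 5.2 (homogeneous versus FGFE columns).
lr_low = lr_statistic(-2.502, -1.984, 168315)
lr_high = lr_statistic(-2.916, -2.355, 62057)
print(round(lr_low), round(lr_high))   # prints 174374 69628
```

Both numbers reproduce the values reported in the text, which supports reading L as a per-household mean.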
In general it should be noted that the dispersion of estimates found in the experimental and survey-based literature is large,¹⁵ but our results are probably at the lower end in terms of the estimated degree of heterogeneity.

¹⁴ This is not a formal test. The χ² distribution and the degrees of freedom are at best an approximation. Furthermore, for the “test” to be nested, we should include the estimated homogeneous β̂ as a node in the discrete domain when estimating the distribution of discount factors. Since we view this “test” as an informal assessment of the importance of heterogeneity, we have not pursued this any further.
¹⁵ See footnote 1 for references.

5.6 Preferences and Unemployment Insurance

An interesting feature of the proposed estimator is that it explicitly groups households according to their (estimated) preferences. We can thus easily compare the distributions of preferences across groups of households divided according to various characteristics.¹⁶ In particular, we observe whether anyone in the household has unemployment insurance. In order to construct a time invariant grouping, we denote households as having no unemployment insurance if they have never been observed as having unemployment insurance, and as having unemployment insurance otherwise.

Figure 5.3 shows the estimated cumulative distributions of discount factors for low and high skilled households divided by unemployment insurance take-up. The group without any unemployment insurance is relatively small, and the resulting density plots are noisy, but we see a noticeable difference in the discount factor distributions across the two groups, especially for high skilled households. Specifically, as economic intuition would suggest, we find that households who have never had unemployment insurance are associated with a relatively lower valuation of the future compared to the group of households in which at least one member has had unemployment insurance at some point.
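The comparison described above only needs the estimated household-specific discount factors and a time-invariant insurance flag. A minimal sketch, assuming the estimates are stored in a one-dimensional array and insurance status in a household-by-year boolean array (both layouts, and the function names, are hypothetical):

```python
import numpy as np


def empirical_cdf(values, nodes):
    """Share of values at or below each node."""
    values = np.asarray(values, dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    return (values[:, None] <= nodes[None, :]).mean(axis=0)


def cdf_by_insurance(beta_hat, insured_by_year, nodes):
    """Empirical CDFs of estimated discount factors, split by insurance history.

    insured_by_year: (N, T) boolean array; a household counts as insured
    if it is observed with unemployment insurance in at least one year.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    ever_insured = np.asarray(insured_by_year, dtype=bool).any(axis=1)
    return {
        "ever_insured": empirical_cdf(beta_hat[ever_insured], nodes),
        "never_insured": empirical_cdf(beta_hat[~ever_insured], nodes),
    }
```

This treats the estimated group memberships as known, so it inherits the same caveat about ignored estimation uncertainty as the analysis in the text.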
Figure 5.3: Estimated Preference Distributions for Sub-Groups. Panels: (a) Low skilled, (b) High skilled. Horizontal axis: discount factor, from 0.75 to 1.05.

Notes: The figure reports the estimated share of the Danish estimation sample in each of the J fixed groups as the empirical PDF in the left panel and the CDF in the right panel. Results are split based on whether at least one household member has ever been observed to have unemployment insurance or not.

[Update the figure to include legend!]

While the reported correlations do not have a causal interpretation and we have ignored estimation uncertainty related to the group membership of each household for convenience, this analysis shows how the proposed estimator can be used to investigate ex ante preference heterogeneity across household characteristics and produce meaningful results.

¹⁶ While the estimated preferences are associated with estimation uncertainty, we abstract from that important point here.

6 Discussion

Our non-parametric grouped fixed-effects estimator was shown to offer a computationally simple and efficient approach to estimate dynamic economic models with unrestricted heterogeneity from panel data on observed choices. In a Monte Carlo study we showed that it has good finite sample properties, and specifically that it can uncover the distribution of preference heterogeneity from consumption choices using a standard life-cycle consumption model.

The estimator’s empirical applicability was shown by estimating a similar life-cycle consumption-saving model on Danish administrative data, allowing for heterogeneous time preferences and/or heterogeneous CRRA coefficients. These results indicated a large degree of preference heterogeneity, where differences across education groups, for example, aligned well with economic intuition.
Interesting avenues for future work include both an investigation of the asymptotic properties of the estimator and applications of the estimator to more complex dynamic economic models with multiple continuous and discrete choices. Building on the current application, it would, for example, be interesting to estimate a more general life-cycle saving model with non-parametric heterogeneity affecting not just consumption choices, but also portfolio and retirement choices.

A Model Details

Proposition 1. The optimal end-of-period asset choice satisfies

\[ A_t \geq \underline{A}_t = \begin{cases} 0 & \text{if } t = T \\ -\min\{\Lambda_t, \lambda_t\} \cdot P_t & \text{if } t < T \end{cases} \]

where

\[ \Lambda_t \equiv \begin{cases} R^{-1} \cdot \Gamma_{t+1} \cdot \underline{\xi} & \text{if } t = T - 1 \\ R^{-1} \cdot \left[ \min\{\Lambda_{t+1}, \lambda\} + \underline{\xi} \right] \cdot \Gamma_{t+1} & \text{if } t < T - 1 \end{cases} \qquad \Gamma_t \equiv G_t \cdot \underline{\psi} \]

Proof. Let $E_t[\bullet]$ denote the worst-case expectation operator given information at time $t$, so that $\underline{\xi}$ and $\underline{\psi}$ stand for the worst-case realizations of the transitory and permanent shocks. Note that any $M_T \leq 0$ implies that the household cannot choose a $C_T > 0$ such that $A_T \geq 0$. Consequently

\[ \lim_{M_T \searrow 0} V_T(\bullet, M_T) = \lim_{C_T \searrow 0} \frac{C_T^{1-\rho}}{1-\rho} = -\infty \]

which the household wants to avoid at any cost. Therefore we have

\[ E_{T-1}[M_T - \underline{A}_T] > 0 \]
\[ \leftrightarrow\; E_{T-1}[R \cdot A_{T-1} + Y_T] > 0 \]
\[ \leftrightarrow\; R \cdot A_{T-1} + \Gamma_T \cdot \underline{\xi} \cdot P_{T-1} > 0 \]
\[ \leftrightarrow\; A_{T-1} > -R^{-1} \cdot \Gamma_T \cdot \underline{\xi} \cdot P_{T-1} \]

Combining this with the exogenous borrowing constraint we get

\[ A_{T-1} > -\min\{\Lambda_{T-1}, \lambda\} \cdot P_{T-1} \]

Similar arguments further imply

\[ E_{T-2}[M_{T-1} + \min\{\Lambda_{T-1}, \lambda\} \cdot P_{T-1}] > 0 \]
\[ \leftrightarrow\; E_{T-2}[R \cdot A_{T-2} + Y_{T-1}] > -\min\{\Lambda_{T-1}, \lambda\} \cdot E_{T-2}[P_{T-1}] \]
\[ \leftrightarrow\; R \cdot A_{T-2} + G_{T-1} \cdot \underline{\psi} \cdot \underline{\xi} \cdot P_{T-2} > -\min\{\Lambda_{T-1}, \lambda\} \cdot G_{T-1} \cdot \underline{\psi} \cdot P_{T-2} \]
\[ \leftrightarrow\; A_{T-2} > -\underbrace{R^{-1} \cdot \left[ \min\{\Lambda_{T-1}, \lambda\} + \underline{\xi} \right] \cdot \Gamma_{T-1}}_{=\Lambda_{T-2}} \cdot P_{T-2} \]

B Data

B.1 Income Definitions

In the Danish income registers, we have the following income variables:

\[ \underbrace{\text{DISPON\_NY}}_{\text{disposable income}} = \text{SAMLINK\_NY} - \underbrace{\text{SKATMVIALT\_NY}}_{\text{taxes}} - \underbrace{\text{UNDERHOL} + \text{TBKONTHJ}}_{\text{alimony + returned benefits}} - \underbrace{\text{QRENTUD2}}_{\text{interest payments}} \]

\[ \underbrace{\text{SAMLINK\_NY}}_{\text{total income}} = \text{PERINDKIALT} + \underbrace{\text{OVSKEJD02\_NY} + \text{OVERSKEJD07}}_{\text{imputed rental value}} \]

\[ \underbrace{\text{PERINDKIALT}}_{\text{total monetary income}} = \underbrace{\text{RENTEINDK}}_{\text{interest income}} + \underbrace{\text{PEROEVRIGFORMUE}}_{\text{other property income}} + \underbrace{\text{ERHVERVSINDK(\_GL)}}_{\text{wages and profits}} + \underbrace{\text{OVERFORSINDK}}_{\text{public transfers}} + \underbrace{\text{RESUINK(\_GL)}}_{\text{other income}} \]

We define

\[ Y_{it}^{gross} \equiv \text{PERINDKIALT} \]
\[ Y_{it}^{asset} \equiv \text{RENTEINDK} + \text{PEROEVRIGFORMUE} \]
\[ Y_{it}^{nonasset} \equiv \text{PERINDKIALT} - Y_{it}^{asset} \]
\[ Y_{it}^{transfers} \equiv \text{OVERFORSINDK} \]
\[ \varsigma_{it} \equiv \text{SKATMVIALT\_NY} \]
\[ Y_{it}^{nom} \equiv \begin{cases} Y_{it}^{gross} - \varsigma_{it} & \text{if } Y_{it}^{asset} / Y_{it}^{gross} < 0.1 \\ (1 - \tau_{it}) \cdot Y_{it}^{nonasset} & \text{else} \end{cases} \]

where $i$ is for a couple, $t$ is for observation year, and $Y_{it}^{nom}$ is after-tax monetary income from all sources, except financial assets. To approximate the after tax earnings of households with substantial income from financial assets, we use the tax rate $\tau_{it} \equiv \varsigma_{it} / Y_{it}^{gross}$ of households without substantial income from financial assets, but with a similar level of non-asset income (specifically we use twenty bins of $Y_{it}^{nonasset}$).

B.2 Data Construction

We construct our variables as follows:

1. Couples are constructed using EFALLE (from BEF) (before 1987 we only have C_FAELLE_ID from FAIN).
2. Birthyear and gender are based on FOED_DAG and KOEN (from BEF) or, if not available, ALDER and KOEN (from FAIN). Couple age is the age of the male.
3. Wealth $A_{it}^{nom}$ is the total net wealth excluding pensions (FORM and FORMREST_NY05 (after 1996) from INDH) adjusted upwards by 10 percent of the value of any owned properties $H_{ikt}^{nom}$ (KOEJD or, if missing, EJENDOMSVURDERING from INDH).
4. Self-employment is coded as PSTILL ≤ 20 (from IDAP).
5. Not in the labor market is coded as PSTILL = 90 (from IDAP).
6. Retirement is coded as PSTILL in {50, 55, 92, 93, 94} (from IDAP).
7. Student is coded as PSTILL = 91 (from IDAP).
8. A couple is coded as high-skilled if at least one of them has ≥ 180 months of education (using HFPRIA from UDDA); otherwise it is coded as low-skilled.

We additionally calculate nominal cash-on-hand and imputed consumption as

\[ M_{it}^{nom} \equiv R \cdot A_{i,t-1}^{nom} + Y_{it}^{nom} \tag{B.1} \]
\[ C_{it}^{nom} \equiv M_{it}^{nom} - A_{it}^{nom} \tag{B.2} \]

All variables are subsequently deflated with the consumer price index.

B.3 Sample Selection

We use the following iterative selection criteria:

1. Our baseline sample is all unique couples where the male is older than 18 and is in the income registers at some point between 1987 and 1996 (both included).
2. Both partners are between age 25 and 59 (both included).
3. The age difference is not larger than 5 years.
4. Neither of them is ever self-employed or not in the labor market (see definition in sub-section B.2).
5. No information is used when or before any of them are students (see definition in sub-section B.2).
6. Neither of them retires before age 59 (see definition in sub-section B.2).
7. Education information is not missing for both partners.
8. We remove all households with fewer than 5 observations satisfying:
   (a) $M_{it}/Y_{it}$, $C_{it}/Y_{it}$, $A_{it}/Y_{it}$ and $Y_{it}^{raw}$ are not below the 1st percentile or above the 99th percentile by age-year bins.
   (b) $m_{it} \equiv M_{it}/P_{it} \geq -\lambda$
   (c) $a_{it} \equiv A_{it}/P_{it} \geq -\lambda$
   (d) $c_{it} \equiv C_{it}/P_{it} < 0.3$

Additionally we do not use information for any of the periods where the above requirements are not satisfied.

Table B.1: Sample Selection

                                              Unique Couples    Observations
1. Baseline                                   1,935,069         12,869,391
2. Age between 25 and 59                      1,142,433         8,542,785
3. Age difference ≤ 5 years                   1,040,074         6,862,197
4. Never self-employed                        657,926           4,207,377
5. Not students                               626,302           4,117,788
6. Not retired before age 59                  624,944           3,990,007
7. Education information not missing          617,334           3,951,504
8. ≥ 5 “non-extreme” observations             230,372           1,832,276
   hereof high-skilled                        62,057            490,694
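Returning to the construction in (B.1) and (B.2), the imputation only requires the wealth and income panels and the gross return. A minimal sketch, assuming balanced (N, T) arrays of nominal end-of-period wealth and after-tax income; the function name and array layout are hypothetical, and the first period is left missing because lagged wealth is not observed:

```python
import numpy as np


def impute_consumption(assets, income, gross_return):
    """Cash-on-hand (B.1) and imputed consumption (B.2) from (N, T) panels.

    assets[i, t] is end-of-period net wealth, income[i, t] is after-tax income.
    The first period is NaN because it has no lagged wealth observation.
    """
    assets = np.asarray(assets, dtype=float)
    income = np.asarray(income, dtype=float)
    cash_on_hand = np.full_like(assets, np.nan)
    cash_on_hand[:, 1:] = gross_return * assets[:, :-1] + income[:, 1:]
    consumption = cash_on_hand - assets
    return cash_on_hand, consumption


if __name__ == "__main__":
    m, c = impute_consumption([[100.0, 120.0, 90.0]], [[50.0, 60.0, 70.0]], 1.03)
    print(m)   # first entry missing, then approximately 163.0 and 193.6
    print(c)   # first entry missing, then approximately 43.0 and 103.6
```

Deflating with the consumer price index and the trimming rules of sub-section B.3 would be applied after this step.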
Table B.1 shows how the sample size is affected by these choices.

B.4 Life Cycle Profiles

In order to calculate life-cycle profiles, we need to detrend across cohorts. We do so in two steps. First, we run the following regression separately for each education group

\[ \log(Y_{it}) = \text{cons} + \sum_{j=25}^{59} \alpha_j^{age} 1_{age_{it}=j} + \sum_{k=1987}^{1996} \alpha_k^{year} 1_{year_{it}=k} + \varepsilon_{it} \tag{B.3} \]

Second, the education-specific trend growth rate of income is derived as the constant from a regression of the first differences of the year dummy coefficients on no covariates, i.e.

\[ \Delta\alpha_t^{year} = (G - 1) + \epsilon_t \tag{B.4} \]

Finally, all monetary variables are detrended relative to a 25 year old in 1996 by dividing through by the factor $G^{birthyear_{it} - 1996 - 25}$, and normalized by subsequently dividing through by the mean income of an unskilled household of age 25. Figures B.1-B.4 show the resulting life-cycle profiles.

Figure B.1: Life Cycle Profiles - $Y_t$. Panels: (a) Low Skilled - Percentiles, (b) High Skilled - Percentiles, (c) Low Skilled - Mean by Birthyear, (d) High Skilled - Mean by Birthyear. Horizontal axis: age. Legend (percentile panels): 10th, 25th, 50th, 75th, 90th.

Figure B.2: Life Cycle Profiles - $A_t$. Panels: (a) Low Skilled - Percentiles, (b) High Skilled - Percentiles, (c) Low Skilled - Mean by Birthyear, (d) High Skilled - Mean by Birthyear. Horizontal axis: age. Legend (percentile panels): 10th, 25th, 50th, 75th, 90th.

Figure B.3: Life Cycle Profiles - $m_t$. Panels: (a) Low Skilled - Percentiles, (b) High Skilled - Percentiles. Horizontal axis: age. Legend: 10th, 25th, 50th, 75th, 90th.

Figure B.4: Life Cycle Profiles - $c_t$. Panels: (a) Low Skilled - Percentiles, (b) High Skilled - Percentiles. Horizontal axis: age. Legend: 10th, 25th, 50th, 75th, 90th.
C A Monte Carlo Study: Buffer Stock Model

In this section, we investigate the finite sample properties of our proposed estimator applied to estimate the model of interest in the application in Section 5. In each of the 50 Monte Carlo runs conducted here, we first simulate N households for 35 periods (from age 25 through 59) with household-specific discount factors $\beta_i$ and initial draws of wealth and permanent income from log-normal distributions with, respectively, means of 0.1 and 1 and variances of 0.2 and 0.4. We draw individual discount factors from a normal distribution with mean 0.98 and standard deviation 0.02 and truncate the distribution such that $\beta_i \in [0.8, 1.1]$ for all $i$. In our simulations, no observations were truncated. When estimating, we fix the domain to be $\beta \in [0.8, 1.1]$ and define the finite support as $B_J = \{0.8 + j(1.1 - 0.8)/J\}_{j=0}^{J-1}$. We calibrate the model using the fairly standard values reported in table C.1.

The estimation sample is then constructed by randomly picking T adjacent periods for each household to be used. Consumption is finally multiplied with random draws of log-normal measurement error as in equation (5.11). We use this simulated data to estimate $(\hat\rho, \hat\sigma_\eta, \hat{j})$ solving (5.13)–(5.14) for each MC run.

Table C.1: Calibrated Parameters for Monte Carlo Results.

ρ      κ      σ_ξ    σ_ψ    G_t     R       λ
2.5    0.5    0.1    0.1    1.02    1.04    0.3

Tables C.2 and C.3 report MC results when using J ∈ {50, 100, 200} nodes to approximate the true continuous distribution of preferences. Results for N = 100,000, T ∈ {10, 30}, and $\sigma_\eta$ = 0.10 are reported. We estimate, besides the heterogeneous discount factors, the CRRA coefficient, ρ, using a nonlinear least squares criterion,

\[ \hat\rho = \arg\min_{\rho > 0} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( C_{it} / C_t^\star(M_{it}, P_{it}; \hat\beta_i(\rho), \rho) - 1 \right)^2 \]
\[ \hat\beta_i(\rho) = \arg\min_{\beta_i \in B_J} \sum_{t=1}^{T} \left( C_{it} / C_t^\star(M_{it}, P_{it}; \beta_i, \rho) - 1 \right)^2 \]

using the fact that measurement error has mean one.

Table C.2: Monte Carlo Results: ρ, Buffer Stock Model.

                     Avg. Abs. Bias               MC Std.
                     baseline    bias reduced     baseline    bias reduced
T = 10   J = 50      1.0578      0.6503           0.0428      0.0923
         J = 100     0.5044      0.2598           0.0416      0.0880
         J = 200     0.2544      0.2648           0.0495      0.0758
T = 30   J = 50      0.4059      0.0759           0.0136      0.0347
         J = 100     0.2385      0.1130           0.0111      0.0224
         J = 200     0.1873      0.1580           0.0101      0.0215

Notes: Columns 1 and 2 report the average absolute bias of the baseline and the bias-reduced estimates of ρ across the Monte Carlo replications. Columns 3 and 4 report the standard deviation across the replications. All results are for the Buffer Stock model.

Table C.3: Monte Carlo Results: $\beta_i$, Buffer Stock Model.

                     Avg. MSE                     Avg. MC Std.
                     baseline    bias reduced     baseline    bias reduced
T = 10   J = 50      0.0003      0.0002           0.0113      0.0099
         J = 100     0.0001      0.0001           0.0092      0.0085
         J = 200     0.0001      0.0001           0.0084      0.0085
T = 30   J = 50      0.0000      0.0000           0.0048      0.0043
         J = 100     0.0000      0.0000           0.0042      0.0040
         J = 200     0.0000      0.0000           0.0041      0.0040

Notes: Columns 1 and 2 report the average mean square error of the baseline and the bias-reduced estimates of $\{\beta_i\}_{i=1}^N$ across the Monte Carlo replications. Columns 3 and 4 report the average standard deviation across the replications. All results are for the Buffer Stock model.

D Results from Alternative Estimators [UPDATE]

We here report, in Table [REF], the estimation results from alternative estimators. In particular, we show estimated parameters from a non-linear least squares (NLLS) estimator,

\[ L(\theta, \{\gamma_i\}_{i=1}^N) = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \varepsilon_{it}(\theta, \gamma_i)^2 \]

where $\varepsilon_{it}(\theta, \gamma_i) = C_{it}^{obs} / C_{it}^\star - 1$, and a pseudo Huber loss function,

\[ L(\theta, \{\gamma_i\}_{i=1}^N) = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \delta^2 \left( \sqrt{1 + (\varepsilon_{it}(\theta, \gamma_i)/\delta)^2} - 1 \right) \]

with δ = 0.1 being a dampening parameter such that the estimates are more robust to outliers.

References

Ackerberg, D. A. (2009): “A new use of importance sampling to reduce computational burden in simulation estimation,” Quantitative Marketing and Economics, 7(4), 343–376.
Alan, S., O. Attanasio and M.
Browning (2009): “Estimating Euler equations with noisy data: two exact GMM estimators,” Journal of Applied Econometrics, 24(2), 309–324.
Alan, S. and M. Browning (2010): “Estimating Intertemporal Allocation Parameters using Synthetic Residual Estimation,” The Review of Economic Studies, 77(4), 1231–1261.
Alan, S., M. Browning and M. Ejrnæs (2014): “Income and Consumption: a Micro Semi-structural Analysis with Pervasive Heterogeneity.”
Andersen, S., G. W. Harrison, M. I. Lau and E. E. Rutström (2008): “Eliciting Risk and Time Preferences,” Econometrica, 76(3), 583–618.
Andersen, S., G. W. Harrison, M. I. Lau and E. E. Rutström (2010): “Preference heterogeneity in experiments: Comparing the field and laboratory,” Journal of Economic Behavior & Organization, 73(2), 209–224.
Ando, T. and J. Bai (2016): “Panel Data Models with Grouped Factor Structure Under Unknown Group Membership,” Journal of Applied Econometrics, 31(1), 163–191.
Andreoni, J. and C. Sprenger (2012): “Risk Preferences Are Not Time Preferences,” The American Economic Review, 102(7), 3357–3376.
Bai, J. and S. Ng (2002): “Determining the Number of Factors in Approximate Factor Models,” Econometrica, 70(1), 191–221.
Bajari, P., J. T. Fox and S. P. Ryan (2007): “Linear Regression Estimation of Discrete Choice Models with Nonparametric Distributions of Random Coefficients,” The American Economic Review, 97(2), 459–463.
Barsky, R. B., F. T. Juster, M. S. Kimball and M. D. Shapiro (1997): “Preference Parameters and Behavioral Heterogeneity: An Experimental Approach in the Health and Retirement Study,” The Quarterly Journal of Economics, 112(2), 537–579.
Beetsma, R. M. W. J. and P. C. Schotman (2001): “Measuring Risk Attitudes in a Natural Experiment: Data from the Television Game Show Lingo,” The Economic Journal, 111(474), 821–848.
Bester, C. A. and C. B. Hansen (forthcoming): “Grouped effects estimators in fixed effects models,” Journal of Econometrics.
Bonhomme, S. and E.
Manresa (2015): “Grouped Patterns of Heterogeneity in Panel Data,” Econometrica, 83(3), 1147–1184.
Browning, M. and S. Leth-Petersen (2003): “Imputing consumption from income and wealth information,” The Economic Journal, 113(488), F282–F301.
Bryant, P. and J. A. Williamson (1978): “Asymptotic Behaviour of Classification Maximum Likelihood Estimates,” Biometrika, 65(2), 273–281.
Cagetti, M. (2003): “Wealth Accumulation Over the Life Cycle and Precautionary Savings,” Journal of Business & Economic Statistics, 21(3), 339–353.
Cagetti, M. and M. De Nardi (2008): “Wealth Inequality: Data and Models,” Macroeconomic Dynamics, 12(S2), 285–313.
Carroll, C. D. (1992): “The buffer-stock theory of saving: Some macroeconomic evidence,” Brookings Papers on Economic Activity, 2, 61–156.
Carroll, C. D. (1997): “Buffer-Stock Saving and the Life Cycle/Permanent Income Hypothesis,” The Quarterly Journal of Economics, 112(1), 1–55.
Carroll, C. D. (2006): “The method of endogenous gridpoints for solving dynamic stochastic optimization problems,” Economics Letters, 91(3), 312–320.
Carroll, C. D., J. Slacalek and K. Tokuoka (2014): “The Distribution of Wealth and the Marginal Propensity to Consume.”
Cozzi, M. (2014): “Risk aversion heterogeneity, risky jobs and wealth inequality,” Queen’s Economics Department Working Paper, No. 1286.
De Nardi, M. (2015): “Quantitative Models of Wealth Inequality: A Survey,” NBER Working Paper 21106.
Deaton, A. (1991): “Saving and liquidity constraints,” Econometrica, 59(5), 1221–1248.
Deaton, A. (1992): Understanding Consumption. Oxford University Press.
Dempster, A. P., N. M. Laird and D. B. Rubin (1977): “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.
Dhaene, G. and K. Jochmans (forthcoming): “Split-panel jackknife estimation of fixed-effect models,” Review of Economic Studies.
Dohmen, T., A. Falk, D. Huffman, U. Sunde, J. Schupp and G. G.
Wagner (2011): “Individual Risk Attitudes: Measurement, Determinants, and Behavioral Consequences,” Journal of the European Economic Association, 9(3), 522–550.
Farhi, E. and I. Werning (2012): “Capital taxation: Quantitative explorations of the inverse Euler equation,” Journal of Political Economy, 120(3), 398–445.
Fernández-Villaverde, J., J. F. Rubio-Ramírez and M. S. Santos (2006): “Convergence properties of the likelihood of computed dynamic models,” Econometrica, 74(1), 93–119.
Finke, M. S. and S. J. Huston (2013): “Time preference and the importance of saving for retirement,” Journal of Economic Behavior & Organization, 89, 23–34.
Fox, J. T., K. i. Kim, S. P. Ryan and P. Bajari (2011): “A simple estimator for the distribution of random coefficients,” Quantitative Economics, 2(3), 381–418.
Fox, J. T., K. i. Kim and C. Yang (2015): “A simple nonparametric approach to estimating the distribution of random coefficients in structural models,” Discussion paper.
Gârleanu, N. and S. Panageas (2015): “Young, Old, Conservative, and Bold: The Implications of Heterogeneity and Finite Lives for Asset Pricing,” Journal of Political Economy, 123(3), 670–685.
Gourinchas, P.-O. and J. A. Parker (2002): “Consumption over the life cycle,” Econometrica, 70(1), 47–89.
Guiso, L. and M. Paiella (2008): “Risk Aversion, Wealth, and Background Risk,” Journal of the European Economic Association, 6(6), 1109–1150.
Guvenen, F. (2006): “Reconciling conflicting evidence on the elasticity of intertemporal substitution: A macroeconomic perspective,” Journal of Monetary Economics, 53(7), 1451–1472.
Guvenen, F. (2009): “A Parsimonious Macroeconomic Model for Asset Pricing,” Econometrica, 77(6), 1711–1750.
Hahn, J. and H. R. Moon (2010): “Panel Data Models with Finite Number of Multiple Equilibria,” Econometric Theory, 26(3), 863–881.
Hahn, J. and W. K. Newey (2004): “Jackknife and analytical bias reduction for nonlinear panel models,” Econometrica, 72, 1295–1319.
Heckman, J.
(1981): “The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time–Discrete Data Stochastic Process,” in Structural Analysis of Discrete Panel Data with Econometric Applications, ed. by C. F. Manski and D. McFadden, pp. 179–195. Cambridge, MA: MIT Press.
Heckman, J. and B. Singer (1984): “A method for minimizing the impact of distributional assumptions in econometric models for duration data,” Econometrica: Journal of the Econometric Society, pp. 271–320.
Hendricks, L. (2007): “How important is discount rate heterogeneity for wealth inequality?,” Journal of Economic Dynamics and Control, 31(9), 3042–3068.
Holt, C. A. and S. K. Laury (2005): “Risk Aversion and Incentive Effects: New Data without Order Effects,” The American Economic Review, 95(3), 902–904.
Jørgensen, T. H. (forthcoming): “Life-Cycle Consumption and Children: Evidence from a Structural Estimation,” Oxford Bulletin of Economics and Statistics.
Jørgensen, T. H. and D. Kristensen (2017): “Simple Estimation of Microeconometric Models with Latent Dynamic Variables,” unpublished working paper, University College London.
Kamakura, W. A. (1991): “Estimating flexible distributions of ideal-points with external analysis of preferences,” Psychometrika, 56(3), 419–431.
Kaplan, G. (2012): “Inequality and the life cycle,” Quantitative Economics, 3.
Kimball, M. S., C. R. Sahm and M. D. Shapiro (2008): “Imputing risk tolerance from survey responses,” Journal of the American Statistical Association, 103(483), 1028–1038.
Kimball, M. S., C. R. Sahm and M. D. Shapiro (2009): “Risk Preferences in the PSID: Individual Imputations and Family Covariation,” American Economic Review, 99(2), 363–68.
Kocherlakota, N. R. (2010): The New Dynamic Public Finance. Princeton University Press.
Krusell, P. and A. A. Smith (1997): “Income and wealth heterogeneity, portfolio choice, and equilibrium asset returns,” Macroeconomic Dynamics, 1(02), 387–422.
Krusell, P. and A. A. Smith (1998): “Income and wealth heterogeneity in the macroeconomy,” Journal of Political Economy, 106(5), 867–896.
Lin, C.-C. and S. Ng (2012): “Estimation of Panel Data Models with Parameter Heterogeneity when Group Membership is Unknown,” Journal of Econometric Methods, 1(1), 42–55.
Meghir, C. and L. Pistaferri (2004): “Income variance dynamics and heterogeneity,” Econometrica, 72(1), 1–32.
Milligan, G. W. and M. C. Cooper (1985): “An examination of procedures for determining the number of clusters in a data set,” Psychometrika, 50(2), 159–179.
Nevo, A., J. L. Turner and J. W. Williams (forthcoming): “Usage-Based Pricing and Demand for Residential Broadband,” Econometrica.
Pilla, R. S. and B. G. Lindsay (2001): “Alternative EM methods for nonparametric finite mixture models,” Biometrika, 88(2), 535–550.
The Danish Ministry of Finance (2003): Ældres sociale vilkår. (In Danish).