INTERNATIONALECONOMICREVIEW Vol.45, No. 2, May2004 PRIORS FROM GENERAL EQUILIBRIUM MODELS FOR VARS* BY MARCO DEL NEGRO AND FRANK SCHORFHEIDE1 Research Department, Federal Reserve Bank of Atlanta, US.A.; Department of Economics, University of Pennsylvania, U.S.A. Thisarticleuses a simpleNew Keynesiandynamicstochasticgeneralequilibriummodel as a priorfor a vectorautoregression,and showsthat the resulting model is competitivewithstandardbenchmarksin termsof forecasting,andcan be used for policyanalysis. 1. INTRODUCTION Dynamic stochastic general equilibrium (DSGE) models are popular nowadays in macroeconomics. They are taught in virtually every economics Ph.D. program, and represent a predominant share of publications in the field. Yet, when it comes to policy making, these models are scarcely used-at least from a quantitative point of view. The main quantitative workhorse for policy making at the Federal Reserve System is FRB-US, a macroeconometric model built in the Cowles foundation tradition-a style of macroeconomics that is no longer taught in many Ph.D. programs.2In their decision process, Fed policymakers rely heavily on forecasting. They want to know the expected path of inflation in the next few quarters, and by how much a 50 basis point increase in the federal funds rate would affect that path. FRB-US offers answers to these questions-answers that many macroeconomists would regard with suspicion given both the Lucas' (1976) critique and the fact that in general the restrictions imposed by Cowles foundation models are at odds with * ManuscriptreceivedAugust2002;revisedMay2003. 1Partof thisresearchwasconductedwhilethe secondauthorwas visitingtheResearchDepartment of the FederalReserveBankof Atlanta,for whosehospitalityhe is grateful.Thesecondauthorgratefully acknowledgesfinancialsupportfrom the UniversityResearchFoundationof the Universityof Pennsylvania.Wethanktwoanonymousreferees,SeanCampbell,FabioCanova,FrankDiebold,John Geweke,Dave Gordon,ThomasLubik,JimNason, ChrisSims,Dan Waggoner,CharlesWhiteman, Tao Zha, and seminarparticipantsat the 2003ASSA Meetings,Bank of Canada,BrownUniversity, 2002 CIRANO ForecastingConference,Duke University,EuropeanCentralBank,2002 European EconometricSociety Meetings,FederalReserve Bank of Atlanta,2002 LACEA Meetings,Spring 2002MacroFederalReserveSystemMeetings,IndianaUniversity,2002MidwestMacroConference, 2002 SED Meetings,Universityof Pennsylvania,and UCSD for helpfulcommentsand discussions. We would like to thankIskanderKaribzhanovfor translatingsome of our Matlabcode in C. The viewsexpressedin this articledo not necessarilyreflectthose of the FederalReserveBankof Atlanta or the FederalReserve System.Please addresscorrespondenceto: FrankSchorfheide,Department of Economics,Universityof Pennsylvania,3718 LocustWalk,Philadelphia,PA 19104-6297.E-mail: [email protected]. 2 Some luckyfew, amongthem the authorsof this article,have had the privilegeof encountering proponentsof this approachduringtheirgraduatestudies. 643 644 DEL NEGRO AND SCHORFHEIDE dynamic general equilibrium macroeconomics (Sims, 1980). General equilibrium models on the other hand have a hard time providing alternative answers. The fact that these models are perceived to do badly in terms of forecasting, as they are scarcely parameterized, is perhaps one of the reasons why they are not at the forefront of policy making. Although progress is being made in the development of DSGE models that deliver acceptable forecasts, e.g., Smets and Wouters (2003), this article proposes an approach that combines a stylized general equilibrium model with a vector autoregression (VAR) to obtain a specification that both forecasts well and is usable for policy analysis. Specifically, the approach involves using prior information coming from a DSGE model in the estimation of a VAR. Loosely speaking, this prior can be thought of as the result of the following exercise: (i) simulate time-series data from the DSGE model, (ii) fit a VAR to these data. In practice we replace the sample moments of the simulated data by population moments computed from the DSGE model solution. Since the DSGE model depends on unknown structuralparameters, we use a hierarchical prior in our analysis by placing a distribution on the DSGE model parameters. A tightness parameter controls the weight of the DSGE model prior relative to the weight of the actual sample. Markov Chain Monte Carlo methods are used to generate draws from the joint posterior distribution of the VAR and DSGE model parameters. The article shows that the approach makes even a fairly stylized New Keynesian DSGE model competitive with standard benchmarks in terms of forecasting real output growth, inflation, and the nominal interest rate-the three variables that are of most interest to policymakers.3 Up to this point our procedure borrows from the work of DeJong et al. (1993) and Ingram and Whiteman (1994), who were the first to use priors from DSGE models for VARs. Ingram and Whiteman showed that prior information from the bare-bones stochastic growth model of King et al. (1988) is helpful in forecasting real economic activity, such as output, consumption, investment, and hours worked. In addition to documenting the forecasting performance of a trivariate VAR with a prior derived from a monetary DSGE model, this article makes two contributions that significantly extend the earlier work. First, we show formally how posterior inference for the VAR parameters can be translated into posterior inference for DSGE model parameters. Second, we propose procedures to conduct two types of policy experiments within our framework. The first policy analysis is based on identified VAR impulse responses to monetary policy shocks. To obtain identification we construct an orthonormal matrix from the VAR approximation 3 Ireland (2004) pursues a similar goal by augmenting a linearized DSGE model with unobservable errors, following Sargent (1989) and Altug (1989). Unlike in these earlier articles, Ireland does not restrict the errors to be uncorrelated across variables ("measurement errors"). Instead he allows them to follow a stationary VAR. Although the resulting hybrid model is suitable for forecasting, Ireland provides no general discussion about the extent to which the latent errors are distinguishable from the structural shocks (identification) and no framework for policy analysis. Robertson et al. (2002) also provide an alternative approach, based on relative entropy, to incorporate moment restrictions from economic theory into a model's forecasts. PRIORSFROMDSGE MODELSFOR VARS 645 of the DSGE model to map the reduced-forminnovationsinto structuralshocks. Thisprocedureinducesa DSGE model-basedpriordistributionfor the VAR impulse responses,whichis updatedwith the sampleinformation. The second policy experimentis more ambitious.We want to forecastthe effects of a change in the policy rule. Suppose a policymakerhas a DSGE model that forecastspoorly and a VAR model that is silent about the effects of a policy regime change.In order to assess the effects of a policy interventionaccurately, one has to correct the DSGE model predictionsto bring its baseline forecasts underthe existingpolicyregimeon track.Based on the VAR with DSGE model prior (DSGE-VAR) we propose to implementsuch a correction.Since we lack knowledgeabout the relationshipbetween policyparametersand DSGE model misspecification,we assumethat the pre-interventioncorrectionremainsappropriateunderthe new policy regime.To illustrateour approachwe try to forecast the impactof the changefrom the Martin-Burns-Millerregimeto the VolckerGreenspanregime on the volatilityof inflationand compareit to the variability of actualinflationin the Volcker-Greenspanperiod. The articleis organizedas follows.Section2 containsa briefdescriptionof the DSGE model that we are usingto constructthe priordistribution.Section3 discussesthe specificationof the DSGE model priorandexploresthe joint posterior distributionof VAR andDSGE modelparametersfroma theoreticalperspective. Empiricalresults for a VAR in output growth,inflation,and interest rates are presentedin Section4. Section5 concludes.Proofsand computationaldetailsare providedin the Appendix. 2. A SIMPLE MONETARY DSGE MODEL Oureconometricprocedureis appliedto a trivariateVAR for output,inflation, and interest rates. The prior distributionfor the VAR is derived from a New KeynesianDSGE model. To make this articleself-contained,we brieflyreview the model specification,which is adopted from Lubik and Schorfheide(2004). Detailed derivationscanbe found,for instance,in KingandWolman(1999),King (2000),and Woodford(2003). The model economy consists of a representativehousehold, a continuumof monopolisticallycompetitivefirms,and a monetarypolicy authoritythat adjusts the nominalinterestrate in response to deviationsof inflationand output from theirtargets.TherepresentativehouseholdderivesutilityfromrealbalancesM/ P and consumptionsC relativeto a habit stock. We assumethat the habit stock is given by the level of technologyA. This assumptionassuresthat the economy evolvesalonga balancedgrowthpath.Thehouseholdderivesdisutilityfromhours worked,h, and maximizes (i) s *E^t sE,+tX i -((CAs - 1 log -hs Ms ~s 646 DEL NEGRO AND SCHORFHEIDE where / is the discountfactor, r is the risk aversionparameter,and X is a scale factor. P is the economy-widenominal price level that the household takes as given. The (gross) inflation rate is defined as Tt = Pt/Pt-i. The householdsuppliesperfectlyelastic labor servicesto the firmsperiod by period and receives the real wage W. The household has access to a domestic capitalmarketwhere nominalgovernmentbonds B are tradedthat pay (gross) interestR. Furthermore,it receives aggregateresidualprofitsD from the firms and has to pay lump-sumtaxes T. Thus,the households'budget constraintis of the form (2) C, + Bt Pt + Mt Pt __t W Pt + Mt - + Rt Pt -_ Pt + Dt The usualtransversalityconditionon assetaccumulationrulesout Ponzischemes. The productionsector is describedby a continuumof monopolisticallycomdemandcurvefor its differentiated petitivefirmseach facinga downward-sloping product (3) ) (Ptt=0 = -1/ Pt ( Xt(i) xtu}X This demandfunctioncan be derivedin the usual way from Dixit-Stiglitzpreferences,wherebyPt(j) is the profit-maximizing price consistentwith production level Xt(j). The parameter v is the elasticity of substitution between two differ- entiatedgoods. The aggregateprice level and aggregatedemandXt are beyond the controlof the individualfirm.Nominalrigidityis introducedby assumingthat firmsface quadraticadjustmentcosts in nominal prices.When a firm wants to changeits price beyond the economy-wideinflationrate Ir*, it incursmenu costs in the form of lost output T(hPt(()t(j) -vr t The parameter (p> 0 governs the degree of stickiness in the economy. Productionis assumedto be linear in labor ht(j), which each firmhires from the household (5) Xt(j) = Atht(j) TotalfactorproductivityAt is an exogenouslygiven unit root processof the form (6) In At = In y + InAt1 + where (7) Zt = PzZt-1 + Cz,t Zt 647 PRIORSFROMDSGE MODELSFOR VARS and E(,t can be broadlyinterpretedas a technologyshock that affects all firms in the same way.The specificationof the technologyprocessinducesa stochastic trendinto the model.4 Firmj chooses its labor input ht(j) and price Pt(j) to maximizethe present value of futureprofits 00 Et EQsDs(i) (8) s=t subjectto (5) and (6), where the time s profitis given by - Whs(j)) ) P (J)s(j PS Ds(j) = 2Ps-l(j) Xs)(j--) Here Q is the time-dependentdiscountfactor that firmsuse to evaluate future profit streams.Although firmsare heterogenousex ante, we only consider the symmetricequilibriumin which all firmsbehave identicallyand can be aggregated into a single representativemonopolisticallycompetitivefirm.Under the assumptionthat households have access to a complete set of state-contingent claims Qt+l/Qt in equilibrium. Since the household is the recipi- = P(Ct/Ct+l)T ent of the firms'residualpaymentsit directsfirmsto make decisionsbased on the household'sintertemporalrate of substitution. The centralbankfollowsa nominalinterestrateruleby adjustingits instrument in responseto deviationsof inflationandoutputfromtheirrespectivetargetlevels (9) (9) gt (gt-1)P R R- R R* \f)R-( ) 't)~l(m_~t,2-pR) (7) *) ( Xt) Xt* e(Rt e where R* is the steady-statenominalrate and Xt is potentialoutput,whichwe define as X* = At after normalizingthe steady-statehours worked to one. The parameter0 < PR < 1 determinesthe degreeof interestratesmoothing.Thecentral bank supplies the money demandedby the household to support the desired nominalinterest rate. The monetarypolicy shock ER,t can be interpretedas an unanticipateddeviationfromthe policyrule. The governmentconsumesa fraction(t of each individualgood j. We define gt = 1/(1 - (t) and assumethatgt = ln(gt/g*) followsa stationaryAR(1) process (10) gt = Pggt-1 + Eg,t where (g,t can be broadlyinterpretedas governmentspendingshock. The governmentlevies a lump-sumtax (or subsidy) Tt/Pt to financeany shortfallin governmentrevenues(or to rebate any surplus) 4 Since our simple DSGE model lacks an internal propagation mechanism that can generate serially correlated output growth rates we assume that technology growth follows a stationary AR(1) process. 648 ("1) DEL NEGRO AND SCHORFHEIDE ' tXt + Rt-l - = + + Bt Pt Pt Pt Pt t + Pt The fiscal authority accommodates the monetary policy of the central bank and endogenously adjusts the primary surplus to changes in the government's outstanding liabilities. To solve the model, optimality conditions are derived for the maximization problems. Consumption, output, wages, and the marginal utility of consumption are detrended by the total factor productivity At. The model has a deterministic steady state in terms of the detrended variables. Define the percentage deviation of a variable yt from its trend Yt*as 5t = In Y - In t*. The log-linearized system can be reduced to three equations in output, inflation, and nominal interest rates - Et[t+1i]) + (1 - pg)gt + Pz-Zt (12) Xt = Et[Xt+1]- (13) fit = -1Et [tt+l] + K[Xt- gt] (14) Rt = pRtt-1 + (1 - PR) (4lft Tr-1(R + 22Xt) + ER,t where r* = y/B is the steady-state real interest rate and K is a function of the price adjustment costs and the demand elasticity. The parameter measures the overall degree of distortion in the economy. Equation (12) is an intertemporal consumption Euler equation, whereas Equation (13) is derived from the firms' optimal price-setting problem and governs inflation dynamics. Equation (14) is the log-linearized monetary policy rule.5 The linear rational expectations system given by Equations (7), (10), and (12)-(14) can be solved, for instance, with the algorithm described in Sims (2002). The relationship between the steady-state deviations and observable output growth, inflation, and interest rates is given by the following measurement equations (15) A lnxt = In y + Ait + 5t A In P = Inr* + it In R' =4[(lnr* + ln *) + Rt] In the subsequent empirical analysis a period t corresponds to one quarter. Output growth and inflation are quarter-to-quarterchanges, whereas the interest rate, Rt, is annualized. The evolution of Xt, ft, , ,and Ztis given by the solution to the linear rational expectations system given above. The DSGE model has three structural 5 In this article we restrict the parameter space to values that lead to a unique stable solution of the linear rational expectations system. Lubik and Schorfheide (2004) discuss the econometric analysis of linear rational expectations models when the parameter space is not restricted to the determinacy region. 649 PRIORSFROMDSGE MODELSFOR VARS shocks that we collect in the vector Et = [ER,t,Eg,t, E,t]'. We assume that the shocks are normallydistributedand independentof each other and over time. Their standard deviations are denoted by a R, gg,and a z, respectively. The DSGE model parametersare stackedinto the vector 0 = [ n y, In r*, Inr*, K, T,Og, 1l, *2, PR, Pg, (16) Pz, R, Oz]/ The DSGE model imposes tight restrictions across the parameters of the movingaverage (MA) representation for output growth, inflation, and interest rates. 3. USING THE DSGE MODEL TO OBTAIN A PRIOR FOR A VAR Let Yt = [Aln xt, Aln Pt, In Rt]. A less restrictive MA representation for the n x 1 vector Ytthan the one implied by the DSGE model of the previous section can be obtainedfrom a VAR model (17) Yt = Oo + lYt-i + * ( pYt-p + Ut where ut is a vector of one-step-ahead forecast errors. VARs have a long tradition in applied macroeconomics as a tool for forecasting, policy analysis, and business cycle analysis. One drawback of VARs is that they are not very parsimonious: In many applications, data availability poses a serious constraint on the number of endogenous variables and the number of lags that can effectively be included in a VAR without overfitting the data. A solution to this problem of too many parameters is to constrain the parameter estimates by "shrinking"them toward a specific point in the parameter space.6 For instance, Doan et al. (1984) proposed to shrink the estimates of a VAR that included the level of output, and prices, toward univariaterandomwalk representations.The justificationfor this shrinkagewas statistical rather than economic: No-change forecasts implied by the random walk model were known to be fairly accurate. Our method shrinks the VAR estimates toward restrictions that are derived from a DSGE model. The spirit of our ap- proachis to take the DSGE model restrictionsseriouslywithoutimposingthem dogmatically. Shrinkage estimators can be interpreted as Bayes estimators that are derived from prior distributions that concentrate much of their probability mass near the desired parameter restrictions. Priors are a way of systematically adding observations. In fact, many prior distributions can be incorporated into the estimation by augmenting the actual data set with dummy observations.7 Loosely speaking, we implement the DSGE model prior by generating dummy observations from the DSGE model, and adding them to the actual data. The ratio of dummy over actual observations-which we call X in the remainder of the article-measures the weight of the prior relative to the sample. 6 Shrinkage estimators have a long tradition in statistics, see, e.g., Lehmann (1983). 7 This so-called mixed estimation is due to Theil and originally instance, by Sims and Zha (1998). Goldberger (1961), and used, for 650 DEL NEGRO AND SCHORFHEIDE The DSGE modelis indexedby a vector0 of so-calleddeep parameters.Instead of conditioningour analysison a specificvalue of 0 we place a priordistribution on the DSGE model parameters.In the econometricparlance,we constructa hierarchicalprior consistingof a marginaldistributionfor 0 and a conditional distributionfor the VAR parametersgiven 0, which is generated throughthe dummyobservations.Bayes theoremthen leadsus to a jointposteriordistribution for the DSGE model and VAR parameters.Our implementationof the DSGE modelpriorhasanimportantadvantageoverpreviousapproaches.Welearnabout the DSGE model parametersby implicitlysearchingfor valuesof 0 for whichthe distancebetweenthe VAR estimateandthe vectorautoregressiverepresentation of the DSGE model is small.This informationabout 0 will be exploited for the policy analysisin Section4. We have just given an informal descriptionof our procedure.Much of the remainderof this section-for the econometricallyorientedreader-provides an accurateanddetailedexposition.In Section3.1we discussthe likelihoodfunction. Section3.2 describesthe priordistribution.The posteriordistributionis analyzed in Section3.3. 3.1. LikelihoodFunction. In order to constructa likelihoodfunctionwe assume that the innovationsut in Equation(17) have a multivariatenormaldistributionJA(0,Eu)conditionalon past observationsof Yt.Let Y be the T x n matrix with rows y'. Let k = 1 + np, Xbe the T x k matrix with rows xt = [1, y-,..., p]'. The VAR y_tp], U be the T x n matrix with rows u', and E = [Vo, 1i,..., can be expressedas Y = X >+ U with likelihoodfunction (18) p(Y y),Eu) o I- T/2exp l -tr[Eul1(Y'Y- Y'X' Y- Y 'X + 'X'X)] conditionalon observationsy1-p,..., yo. Although the DSGE model presented in Section 2 does not have a finite-ordervector autoregressiverepresentationin terms of Yt,the VAR can be interpretedas an approximationof the MA representation of the DSGE model. The magnitudeof the discrepancybecomes smalleras more lags are includedin the VAR. Since0 is of muchlowerdimension than the VAR parametervector, the DSGE model imposes restrictionson the (approximate)vector autoregressiverepresentationof yt. 3.2. PriorDistribution. Supposethe actualobservationsare augmentedwith T* = T artificialobservations(Y*, X*) generatedfromthe DSGE model based on the parametervector 0. The likelihoodfunctionfor the combinedsample of artificialand actualobservationsis obtainedby premultiplying(18) with (19) p(Y*(O) I D, E) IEu-T/2 exp j-tr[u ('* (Y*Y 2~ u ~ -~ Y Y * ~X- ~ * + ^>XX' ~ ~X*X~] )]} 651 PRIORSFROMDSGE MODELSFOR VARS The factorization p(Y*(o), Yj ,C,) (20) = p(Y*(0) jI , E,)p(YI *, U,) suggeststhat the term p(Y*(O)1 p, Su) can be interpretedas a priordensityfor 'P and Eu. It summarizesthe informationabout the VAR parameterscontained in the sampleof artificialobservations. If we in fact were to constructour priorby generatingrandomdrawsfromthe DSGE model, a repeated applicationof our procedurewould lead to stochastic variationin the prior distribution,which is undesirable.In order to remove the stochasticvariationfrom p(Y*(O)Ia, Su) we replacethe nonstandardizedsample moments *'*, Y*' X*, and X"X* by their expected values. According to our DSGE model, the vector Ytis covariancestationaryand the expectedvalues of the samplemomentsare given by the (scaled)populationmoments Tr*y(0), ATrr(0), andATr f(0), where,for instance,rfyy() = Ee[ytyj].Sincethese populationmomentscanbe computedanalytically,ourprocedureis veryefficientfrom the computationalstandpoint.Formally,the use of populationmomentsimplies that we replaceEquation(19) with (21) p(P, u ) = c-1 (0)lE TE-(ry() exp-tr[ -r; - ' (0) (0)0 + cr*x(0)4)] where we also addedan initialimproperpriorp(QI,Su) x I I-(n+1)/2. T > k + n and rxx(0)is invertible,the priordensityis properand Providedthat T In nondegenerate. this case the normalizationfactorc(0) can be chosento ensure that the densityintegratesto one (the formulafor c(0) is givenin the Appendix). Define the functions (22) *(0) = rFx (23) *(0)= r*(f) - r* y()r*-(o)r() (0)r() Conditionalon 0 the prior distributionof the VAR parameters(21) is of the Inverted-Wishart (ZW) - Normal(.A) form8 (24) (25) 8 If - k, , 10 - ZW(AT.TE*(), ,T n) |I u 0 , 7(*(0), Eu? (ATrx(0)) 1) r'x(0) is of reduced rank, then the prior concentrates its mass in a lower dimensional subspace of the domain of a. In our application the number of structural shocks equals the number of endogenous variables n to which the model is fitted and Frx() is full rank. For DSGE models with less than n structural shocks we recommend to ensure that rFx(0) is full rank by introducing additional shocks or measurement errors. 652 DEL NEGRO AND SCHORFHEIDE The specification of the prior is completed with a distribution of the DSGE model parameters, details of which we discuss in Section 4. Overall our prior has the hierarchical structure p(1, E,, 0) = p(<, E, IO)p(0) (26) The functions V*(0) and 12(0) trace out a subspace of the VAR parameter space and can be interpreted as follows. Suppose that data are generated from a DSGE model with parameters 0. Among the pth-order VARs the one with the coefficient matrix 1I*(0) minimizes the one-step-ahead quadratic forecast error loss. The corresponding forecast error covariance matrix is given by XS(0). Our prior is designed to assign probability mass outside of the subspace traced out by (*(0) and S*(0).9 We use the covariance matrix ES(0) 0 (XTr7x(0))-1 to distribute probability mass around I*(0) and average over 0 with respect to a prior p(O). The orientation of the prior contours is such that the prior is fairly diffuse in the directions of the VAR parameter space that we expect to estimate imprecisely according to the DSGE model. Our prior differs from the one used by DeJong et al. (1993), who used a simulation procedure to approximate (in our notation) the marginal prior for the VAR coefficients p(I, E) = fp((, ,, IO)p(O)dOby a conjugate 1W-1A/ prior. 3.3. Posterior Distribution. In order to study the posterior distribution we factorize it into the posterior density of the VAR parameters given the DSGE model parameters and the marginal posterior density of the DSGE model parameters p(<, E,, 0 1Y) = p((, E, IY, )p(O l Y) (27) Let (>(0) and E,(0) be the maximum-likelihood estimates of ? and Eu, respectively, based on artificial sample and actual sample (28) t(0) l() (29) = (xTr;x(0) + X'X)- (xTry + X'Y) = (X + )T[( Ty(0) + Y'Y) - (iTr (O) + Y'X)(XTF* (e) + '1( TVr(0) + X'Y)] 'x) Since conditional on 0 the DSGE model prior and the likelihood function are conjugate, it is straightforward to show, e.g., like Zellner (1971), that the posterior distribution of 1 and E is also of the Inverted Wishart-Normal form 9 Ingram and Whiteman (1994) used a Gaussian prior for the DSGE model parameters 0 - A(0, V ) and approximated the function I*(0) equation by equation with a first-order Taylor series around the prior mean 0 to induce a prior distribution for the VAR parameters. PRIORSFROMDSGE MODELSFOR VARS (30) (31) 653 E IY,0 - ZW((X+ 1)TX((0), (1 + X)T- k, n) i IY, xu, 0 - A;((0), SuX (xTrxx(0) + X'x)-1) The formula for the marginal posterior density of 0 and the description of a Markov Chain Monte Carlo algorithm that generates draws from the joint posterior of 0, S,, and 0 are provided in the Appendix. The ability to compute the population moments r* (0), Fry(0), and rlx(0) analytically from the log-linearized solution to the DSGE model and the use of conjugate priors for the VAR parameters makes the approach very efficient from a computational point of view: 25,000 draws from the posterior distribution of all the items of interest-including forecast paths and impulse responses-can be obtained in less than 10 minutes using a 1.2 GHz PC. In the remainder of this section we discuss the choice of X and the source of information about the DSGE model parameters. 3.3.1. Choice of L. The hyperparameter X determines the effective sample size for the artificial observations, which is AT. If A.is small the prior is diffuse, and the actual observations dominate the artificial observations in the posterior. According to Equation (28), the posterior mean of ? conditional on 0 equals the OLS estimate of I if . = 0. For large values of X the prior concentrates along the restriction functions 0* (0) and XS(0). We will show that as the length of the artificial sample increases the VAR parameter estimates will stay closer to the restrictions implied by the DSGE model. In the limit our procedure is equivalent to estimating the VAR subject to the DSGE model restrictions. Formally,as A -> oo, conditional on 0 the posterior mean 4(0) approaches c*(0) and the variance Su(0) goes to zero. However, this result does not imply that the actual observations have no influence on the overall posterior distribution. An examination of the marginal posterior of the DSGE model parameters can clarify this point. The posterior p(0 I Y) can be obtained by combining the marginal-likelihood function (32) p(YI 0)= f p(YI c, Eu)p(), ,uI0)d(), Eu) with the prior p(O) (see Appendix). If T is fixed and A tends to infinity the marginal-likelihoodfunctionp(YI 0) approachesthe (quasi)-likelihoodfunction of the DSGE model10 (33) p*(y0) Eu*(0) l /exp -2tr[ Eu*(0)(Y- X*())'(Y- X*(O))J 10 Since the DSGE model typically does not have a finite-order VAR specification p*(YI 0) is a quasi-likelihood function from the perspective of the structural model. 654 DEL NEGRO AND SCHORFHEIDE The function p*(YI 0) is obtained by replacing the unrestricted parameters 0 and Eu in Equation (18) with the restriction functions 1*(0) and S,(0). The result is summarized in the following proposition that is proved in the Appendix. 1. PROPOSITION Let 0 be the mode of p(Y 0). For a fixed set of observations Y, ln P(Y 0) p(YI ) > ln P*(Y I0) p*(YI0) uniformlyfor in compact subsets of as oo X for which S (0) and r* (0) are nonsingular. Not surprisingly,the empirical performance of a VAR with DSGE model prior will crucially depend on the choice of X.We use a data-driven procedure to determine an appropriate value X of the hyperparameter. We maximize the marginal data density (34) px(Y)= J P(YI 0)p(0) dO with respect to X over some grid A = {l1,...,lq}. Rather than averaging our conclusions about all possible values of ., we condition on the value X. with the highest posterior probability.This marginal data density can also be used to choose an appropriate lag length for the VAR. 3.3.2. Learning about the DSGE model parameters. A major improvement of our procedure over earlier approaches is that it enables posterior inference with respect to the DSGE model parameters, whereas previous approaches only delivered a posterior for the VAR parameters. Although the likelihood function (18) itself does not directly depend on the DSGE model parameters, the marginal distribution of 0 will nevertheless be updated through the sample information. Similar to Smith (1993), an estimate of 0 is implicitly obtained by projecting the VAR estimates onto the restrictions implied by the DSGE model. The joint posterior density can be written as (35) p(0, Eu, 0 I Y) = p(DP,Eu I Y)p(0 I D, Su) Learning about 0 takes place indirectly through learning about the VAR parameters. We expect that the best fit of the VAR model is achieved for values of X that allow moderate deviations from the DSGE model restrictions. Define the function (36) q(IY)= expf- ln ~u (0)l -tr[E^ -1 tr[umle 2 m[Eem:*(O) -- ()] -r e) Xme ()()(*(O) - 4)mle)]} 655 PRIORSFROMDSGE MODELSFOR VARS where "mle and Su,mlemaximizethe likelihood function (18). Using a second-order Taylorexpansion,it can be shownthat the logarithmof q(OIY) is approximately a quadraticfunctionof the discrepancybetween the VAR estimatesand the restrictionfunctionsgeneratedfromthe DSGE model.Forlong samplesand under moderatelytightpriorsthe marginallog-likelihoodfunctioncan be approximated as follows: 2. Let 0 be the mode of p(YI 0). Suppose T PROPOSITION T --- oo. Then 1 - XT In q(YJO) p(YI O) ) + Op(max[( T)-', p =l In q(y p(Y\0) > oo, X -. 0, and ]) q(Y(0) The approximation holds uniformly for 0 in compact subsets of ( for which S2 (0) and rFx(0) are nonsingular. The intuitionfor this resultis the following.The weightof the priorrelativeto the likelihoodfunctionis small (X -> 0), so that for all values of 0 the posterior distributionof the VAR parametersconcentratesaround mle. The conditional densityof 0 given (c and Euprojects >mleonto the subspace >*(0).The amountof informationaccumulatedin the marginallikelihoodp(YI 0) relativeto the prior depends on the rate at which XT diverges.The more weight is placed on the artificialobservationsfrom the DSGE model (X convergesto zero slowly), the more curvatureand informationthere is in p(YI 0). Accordingto Proposition2 the posteriorestimate of 0 can be interpretedas a minimumdistanceestimate (e.g., Chamberlain,1984 and Moon and Schorfheide,2002) that is obtained by minimizingthe weighted discrepancybetween the unrestrictedVAR estimates 4)mle and the restrictionfunction I*(0). 4. EMPIRICAL APPLICATION Thissectiondescribesthe resultsobtainedwhenwe applythe priorfromthe New Keynesianmodeldescribedin Section2 on a trivariateVAR in realoutputgrowth, inflation,andinterestrates.Weuse quarterlydata,andthe lag lengthin the VAR is four quarters.11 Section4 consistsof fourparts.Section4.1 discussesthe priorand DSGE model parameters.Section 4.2 describesthe forecasting for the posterior results.We show that the DSGE model priorleads to a substantialimprovement relativeto an unrestrictedVAR andin severaldimensionsdominatesa VAR with Minnesotaprior(Minn-VAR). 11The data for real output growth come from the Bureau of Economic Analysis (Gross Domestic Product-SAAR, Billions Chained 1996$). The data for inflation come from the Bureau of Labor Statistics (CPI-U: All Items, seasonally adjusted, 1982-1984 = 100). The interest rate series are constructed as in Clarida et al. (2000): For each quarter the interest rate is computed as the average federal funds rate ( source: Haver Analytics) during the first month of the quarter, including business days only. Our sample ranges from 1955:111to 2001:111. 656 DEL NEGRO AND SCHORFHEIDE Policy analysis with a DSGE-VAR is discussed in Sections 4.3 and 4.4. First, we construct impulse response functions to study the effects of modest interventions (Leeper and Zha, 2003) in terms of unanticipated deviations from the monetary policy reaction function. The DSGE model is used to obtain an identification scheme for the VAR. Second, we use the DSGE-VAR to predict the effects of a policy rule change. Whereas in the context of VARs the analysis of regime changes is generally subject to the Lucas' critique, our approach can be seen as a weighted average of two extremes: (i) using the DSGE model to forecast the effects of the policy change (X = oo), and (ii) using the VAR to make forecasts (X = 0), thereby ignoring the effects of the policy intervention. In our framework, the choice of the prior weight X reflects the degree of misspecification of the structural model. We try to predict the impact of the change from the Martin-Burns-Miller regime to the Volcker-Greenspan regime using the DSGE-VAR. The results suggest that the approach is promising, at least in some dimensions. 4.1. Prior and Posterior of 0. All empirical results are generated with the prior distribution reported in Table 1. The model parameters In y, In r*, In r*, o R, ag, and Oa are scaled by 100 to convert their units into percentages. The priors for the quarterly steady-state growth rate, inflation rate, and real interest rate are fairly diffuse and have means of 0.5%, 1.0%, and 0.5%, respectively. With 90% prior probability the risk aversion parameter r is between 1.2 and 2.8, whereas the slope of the Phillips curve K is between 0.06 and 0.51. The latter interval TABLE1 PRIOR DISTRIBUTIONS FOR DSGE MODEL PARAMETERS Name Range In y In 7r* In r* JR JR IR+ /R+ JR+ IR+ /R+ [0,1) [0.1) [0,1) JR+ R+ R+ K r lrJ f2 PR Pg Pz CrR ag aoz Density Normal Normal Gamma Gamma Gamma Gamma Gamma Beta Beta Beta Inv. Gamma Inv. Gamma Inv. Gamma Mean SD 0.500 1.000 0.500 0.300 2.000 1.500 0.125 0.500 0.800 0.300 0.251 0.630 0.875 0.250 0.500 0.250 0.150 0.500 0.250 0.100 0.200 0.100 0.100 0.139 0.323 0.430 NOTES: The model parameters In y, In 7r*,In r*, rR, og, and orzare scaled by 100 to convert them into percentages. The Inverse Gamma priors are of the form p(a Iv, s) oc c-v-le-s /2a,2 where v = 4 and s equals 0.2, 0.5, and 0.7, respectively. Approximately 1.5% of the prior mass lies in the indeterminacy region of the parameter space. The prior is truncated in order to restrict it to the determinacy region of the DSGE model (SD is standard deviation). 657 PRIORSFROMDSGE MODELSFOR VARS TABLE 2 POSTERIOR OF DSGEMODELPARAMETERS: 1959:III-1979:II Posterior, . = 1 Prior Posterior, Name CI (Low) CI (High) CI (Low) CI (High) CI (Low) In y In r* In r* K 0.101 0.219 0.132 0.063 1.197 1.121 0.001 0.157 0.922 1.863 0.880 0.513 2.788 1.910 0.260 0.812 0.473 0.433 0.113 0.101 1.336 1.011 0.120 0.530 1.021 1.613 0.463 0.516 2.816 1.559 0.497 0.756 0.616 0.553 0.126 0.081 1.684 1.009 0.150 0.550 T Vf '2 PR =- 10 CI (High) 1.045 1.678 0.384 0.416 3.225 1.512 0.545 0.747 NOTES:We report 90% confidence intervals (CI) based on the output of the Metropolis-Hastings Algorithm. The model parameters In y, In zr*, and In r* are scaled by 100 to convert them into percentages. is consistent with the values that have been used in calibrationexercises,e.g., Claridaet al. (2000).The priorsfor the policyparametersV1and 1V2are centered at Taylor's(1999) values.12The prioris truncatedat the indeterminacyregion of the parameterspace. As stressedin Section 3, our procedurealso generatesposteriorestimatesfor the DSGE model parameters.Such estimates are presented in Table 2 for the To illustratethat the extent of learningabout 0 sample period 1959:III11-1979:II11. X of on the the DSGE modelprior,Table2 reports90%posterior weight depends confidencesets for X = 1 and X = 10. A comparisonof priorand posteriorintervals indicatesthat for X = 1 the data lead to a modest updating.The confidence intervalsfor most parametersshrink.For instance,the priorconfidenceinterval for the interestrate smoothingparameterPR rangesfrom 0.15 to 0.81, whereas the posteriorintervalnarrowsto [0.21, 0.54]. The parameterthat characterizes the responsivenessof the centralbank to inflationis reviseddownward,whereas the confidenceintervalfor the output coefficientshifted upwards.The posterior meansfor Vf1and1V2are1.3and0.3,respectively.Theupdatingis slightlymorepronouncedfor X= 10,when the artificialsamplesize is 10 timesas long as the actual sample. 4.2. ForecastingResults. Theobjectiveof thissubsectionis to showthatVARs with DSGE model priorsproduceforecaststhatimproveon those obtainedusing unrestrictedVARs, and are competitivewith those obtained using the popular Minnesotaprior.The Minnesotaprior shrinksthe VAR coefficientsto univariate unit root representations.Although it has been empiricallysuccessful,e.g., 12Since the inflation rate and the interest rate in the DSGE model are quarter to quarter, the value of Vr2corresponds to one-fourth of the value obtained in univariate Taylor rule regressions that use annualized interest rate and inflation data. 658 DEL NEGRO AND SCHORFHEIDE Litterman (1986), Todd (1984), it lacks economic justification and ignores information with respect to co-movements of the endogenous variables.13 In a particular instance, this point has already been made by Ingram and Whiteman (1994). However, we provide two extensions of their results. First, we show that DSGE model priors can be helpful in forecasting not only real but also nominal variables. Second, whereas Ingram and Whiteman document the ex post forecasting performance of their DSGE-VAR as a function of the relative weight of the prior, we also study the forecasting performance under the requirement that a hyperparameter X is chosen ex ante for each forecast based on the marginal posterior data density p (Y). This is an important extension because the forecasting performance of the VAR is sensitive to X and it has to be guaranteed that a good X can be chosen before the actual forecast errors become available. Most of the remainder of the section will present results from a forecasting exercise using a rolling sample from 1975:1 to 1997:111(90 periods). For each date in the forecasting interval we used 80 observations in order to estimate the VAR, that is, a ratio of data to parameters of about 6 to 1 for each equation. This choice is motivated by the fact that the data-parameter ratio in larger models that are being used for actual forecasting, such as the Atlanta Fed VAR, is of the same magnitude. It is important to remark that the results presented in this section have no pretense of being general: They are specific to the particular DSGE model, and the particular VAR being estimated. We begin by investigating how the forecasting performance of the VAR changes as a function of X and the forecast horizon. Figure 1 plots the percentage improvement (or loss, if negative) in root mean square forecast errors (rmse) of the DSGE-VAR relative to the unrestricted VAR for cumulative real output growth, cumulative inflation, and the federal funds rate.14 The fourth panel of Figure 1 depicts the improvements in the multivariate forecasting performance statistic proposed by Doan et al. (1984).15The grid of values of Xranges from 0 to 13In this article the Minnesota prior is implemented as = (In ? (X,XT) + tHml)) (vec(XYrT) + tHm1m) where the parameter t denotes the weight of the Minnesota prior, ?m is the prior mean, and Hm is the prior tightness. The values of )mand Hm are the same as in Doan et al. (1984), with the exception of the prior mean for the first lag of output growth and inflation. Since these two variables enter the VAR in growth rates, as opposed to log levels, to be consistent with the random walk hypothesis the prior mean for the first lag of the "own" regressor in the output growth and inflation equations is 0 and not 1. The Minnesota prior is augmented by a proper ZW prior for Su. The weight of the Minnesota prior is controlled by the hyperparameter l. The hyperparameter is selected ex ante using a modification of Equation (34). This value hovers around 0.5, depending on the sample. The value used in Doan et al. (1984) is t = 1. 14Neither the output growth rates nor the inflation rates are annualized. 15 The "ln-det" statistic is defined as the converse of the natural logarithm of the determinant of the error covariance matrix of the forecasts, divided by two (to convert from variance to standard error) times the number of variables that are forecasted (to obtain an average figure). The improvement in the multivariate forecasting performance statistics is computed by taking the difference between the multivariate statistics multiplied by 100 to obtain percentage figures. This number can be seen as the average in the improvements for the individual variables, adjusted to take into account the joint forecasting performance, i.e., the correlation in forecast errors. 659 PRIORSFROM DSGE MODELSFOR VARS inflation real GDP growth 20 .. o 10 .- o0 0 1qua quarters multivariatestatistic Fed Fund 20 - 10 o 0o 10- 5 0 quartersahead . 4qua2 quartersahead lnflO1 lnf 1.81 nf10 NOTES: The plot shows the percentage gain (loss) in RMSEs relative to an unrestricted VAR. The rolling sample is 1975:III-1997:III (90 periods). At each date in the sample, 80 observations are used in order to estimate the VAR. FIGURE1 FORECASTING PERFORMANCE AS A FUNCTION OF THE WEIGHT OF THE PRIOR oo, where X = oo means forecasting with the VAR approximation of the DSGE model. By definition the gain for X= 0 is zero. As the weight of the prior is increased we observe a substantial gain in the forecast performance. The surface for the multivariate statistic is fairly flat for values of Xbetween 0.5 and 5, but then deteriorates sharply as . approaches infinity. Overall our performance measures have an inverse U-shape as a function of the hyperparameter X. This indicates that there is a benefit from shrinking the VAR estimates toward the DSGE model restrictions without dogmatically imposing them. The ex post optimal X for long-run forecasts tends to be larger than for short-run forecasts. In order to obtain accurate forecasts over long horizons one has to estimate powers of the autoregressive coefficients (. The large sampling variance of these estimates can be reduced by increasing the weight of the prior. However, once the length of the artificial saniple relative to the actual sample exceeds 2, the variance reduction is dominated by an increased bias and the forecasting accuracy generally deteriorates. Interestingly, the deterioration is not sharp at all: In particular, for inflation and the interest rate 660 DEL NEGRO AND SCHORFHEIDE TABLE3 PERCENTAGE GAIN (LOSS) IN RMSES: DSGE PRIOR VERSUS UNRESTRICTED VAR (V-UNR) AND MINNESOTA PRIOR (V-MINN) rGDP Growth Inflation Fed Funds Multivariate Horizon V-unr V-Minn V-unr V-Minn V-unr V-Minn V-unr V-Minn 1 2 4 6 8 10 12 14 16 17.335 16.977 15.057 14.116 12.387 14.418 15.078 16.236 19.122 1.072 6.965 5.803 3.452 4.230 7.986 12.512 17.233 21.575 8.389 7.247 8.761 10.460 11.481 12.261 12.626 12.995 13.238 1.653 1.339 4.767 7.240 7.794 8.351 9.011 9.634 10.116 7.250 5.024 5.008 6.648 8.420 8.242 6.404 6.059 5.823 -7.593 -4.895 -1.878 -0.713 -0.204 -0.639 0.726 1.146 2.389 12.842 10.993 9.630 10.388 11.023 12.864 12.419 12.611 13.428 0.943 2.884 3.959 4.290 5.187 6.463 7.537 8.481 9.512 NOTES:The rolling sample is 1975:111-1997:111 (90 periods). At each date in the sample, 80 observations are used in order to estimate the VAR. The forecasts are computed based on the values X and l that have the highest posterior probability based on the estimation sample. the long-run forecasts from DSGE-VAR are still accurate even when the prior weight is infinity. Although Figure 1 documents that the DSGE model prior leads to an improved forecasting performance, two questions remain. First, can an appropriate value of A be selected ex ante? Second, how does the DSGE-VAR compare to a more competitive forecasting model such as a VAR with Minnesota prior? We argued in Section 3.3.1 that X can be chosen over a grid A to maximize the marginal data density px(Y), given in Equation (34). This leads to the hyperparameter estimate (37) = arg max p (Y) XeA Depending on the sample, this value generally hovers around 0.6, which corresponds to 48 artificial observations from the DSGE model. However, the shape of the marginal data density as a function of X is flat for values of A between 0.5 and 2, suggesting that the fit of the model is approximately the same within that range. These estimates are broadly consistent with the value that leads to the best ex post one-step-ahead forecast performance.16 Table 3 documents the rmse improvements of the DSGE-VAR based on A relative to the unrestricted VAR and the Minn-VAR. The DSGE-VAR clearly outperforms the unrestricted VAR, even if the weight of the prior is chosen ex ante. The DSGE-VAR also dominates the Minn-VAR in terms of output growth and inflation forecasts and according to the multivariate performance measure. 16A full Bayesian procedure would average over . rather than condition on the highest posterior probability X.However, in our experience the values of X that have nonnegligible posterior probability produce very similar predictions so that the gain from averaging instead of conditioning is minimal. PRIORSFROMDSGE MODELSFOR VARS 661 Although the improvement is small, around 1-1.5%, for short horizons, it is substantial for forecast horizons beyond two years. The DSGE-VAR forecasts of the federal funds rate are unfortunately slightly worse than the Minn-VAR forecasts for most of the horizons that we considered. Overall, these results suggest that the DSGE-VAR is competitive with, and to some extent improves upon, the MinnVAR.17 When interpreting these results one has to bear in mind a key difference between Minnesota and DSGE model prior. The Minnesota prior has only a statistical but not an economic justification. We will show subsequently that the DSGE model prior can be exploited in the analysis of policy interventions. 4.3. Impulse Response Functions. In order to compute dynamic responses of output, inflation, and interest rates to unanticipated changes in monetary policy and to other structural shocks it is necessary to determine the mapping between the structural shocks et and the one-step-ahead forecast errors ut. Let Str be the Cholesky decomposition of Eu. It is well known that in any exactly identified structural VAR the relationship between ut and Et can be characterized as follows: (38) Ut = EtrQEt where Q is an orthonormal matrix and the structural shocks are from now on standardized to have unit variance, that is E[te/]J = Z. According to Equation (17) the initial impact of Et on the endogenous variables Ytin the VAR is given by (39) ayt) =V \cEt/ VAR The identification problem arises from the fact that the data are silent about the choice of the rotation matrix 2. More prosaically, since Etr2Q2Q'tr = EtrEtr = Eu the likelihood function is invariant to 2. Macroeconomists generally require Q to have some ex ante justification and to produce ex post impulse response functions that are "reasonable," i.e., conform in one or more dimensions with the predictions of theoretical models. Since there is no agreement on what these dimensions should be, a multitude of identification strategies have been proposed (see Christiano et al., 1999, for a recent survey). In this article we have used the DSGE model to derive a prior distribution for the reduced-form VAR parameters. Hence, it is quite natural in our framework to use the structural model also to identify the VAR. Thus, we will now construct a rotation matrix Q2based on the dynamic equilibrium model. The DSGE model itself is identified in the sense that for each value of 0 there is a unique matrix Ao(O),obtained from the state space representation (15), that deter17We do not report formal significance tests for superior forecast performance, such as the Diebold and Mariano (1995) test, since the assumptions underlying those tests do not match the setup in our article. Thus, the results should be interpreted as ex post accuracy comparisons, not as hypothesis tests. Although not pursued here, Bayesian posterior odds could be used to choose among DSGE-VAR and Minn-VAR ex ante. 662 DEL NEGRO AND SCHORFHEIDE mines the contemporaneous effect of Et on Yt.Using a QR factorization of Ao(O), the initial response of Ytto the structural shocks can be uniquely decomposed into (40) ay) t DSGE= DSGE Ao(O) = (0)Q2*(O) where ZEr(0) is lower triangular and Q*(0) is orthonormal. To identify the VAR, we maintain the triangularization of its covariance matrix E" and replace the rotation Q2in Equation (39) with the function Q*(0) that appears in (40).18 The implementation of this identification procedure is straightforward in our framework. Since we are able to generate draws from the joint posterior distribution of D, EU, and 0, we can for each draw (i) use 1 to construct a MA representation of yt in terms of the reduced-form shocks ut, (ii) compute a Cholesky decomposition of Eu, and (iii) calculate Q2= Q*(0) to obtain a MA representation in terms of the structural shocks et. A few remarks about the procedure are due. First, since the likelihood of the reduced-form VAR is invariant with respect to Q, the rotation matrix that is used to achieve identification is the same a posteriori as it is a priori for any fixed 0. However, as explained in Section 3.3.2, the distribution of 0 is updated with the sample information. Hence, we learn from the data which rotation to choose, indirectly, via learning about the DSGE model parameters. Second, the extent to which the posterior impulse responses are forced to look like the DSGE model's responses will depend on the tightness of the prior. The larger i, the more similar the responses will be. If the researcher chooses the tightness of the prior endogenously based on the marginal data density (34) the data-and not the researcher-will determine in which dimensions the posterior impulse responses will conform to the model's responses, and in which dimensions they will not. Figure 2 depicts the impulse response functions with respect to monetary policy shocks of cumulative real output growth, inflation, and the interest rate, normalized so that the initial impact of a monetary shock on the interest rate is 25 basis points. Each plot shows the VAR impulse responses (dashed-and-dotted line), the corresponding 90% error bands (dotted lines), and the DSGE model impulse responses (solid lines). The estimates are based on a sample of 80 observations ending in 2001:111.The impulse responses are computed for different values of the tightness parameter A, namely A E {0.5, 1, 5}. As expected, the VAR impulse responses become closer to the model's as the weight of the prior increases. Specifically, the distance between the posterior means of the VAR and the model's impulse responses decreases. In addition, the bands for the VAR impulse responses narrow considerably. It is interesting to observe that in some dimensions the VAR impulse responses conform to the model's even for small values of the tightness parameter (X = 0.5). 18Since the VAR representation of the DSGE model, as characterized by I*(0) and EZ(0), is only an approximation, the Cholesky decomposition of E*(0) is not exactly equal to ETr(0).However, in our experience the difference is for practical purposes negligible. 663 PRIORSFROMDSGE MODELSFOR VARS real GDP growth-cumulative 40 Lo inflation Fed Fund 10 30 5 20 - 20 10 \ II . 0 0 . . .. 4 8 12 16 4 8 12 16 -15 * o -10 : 4 0 8 12 16 -10 0 .. 4 .. . ... 8 12 16 30 20 T1 11 f<1 - 10 0' 0 4 8 12 -10 16 0 ., 4 8 12 16 Lr 10 0 4 8 12 16 15 ' - -5 .. .. 0 . 4 8 12 16 -._,_ . 0..0- -10 0 4 8 ,_ .. - - 12 16 The dashed-dotted lines represent the posterior means of the VAR impulse response functions. NOTES: The dotted lines are 90% confidence bands. The solid lines represent the mean impulse responses from the DSGE model. The impulse responses are based on the sample 1981:IV-2001:III. 2 FIGURE IDENTIFIED IMPULSE RESPONSE FUNCTIONS The sign and the magnitude of the VAR impulse responses on impact agree with the model and are very precisely estimated. This suggests that Etr and ZE*(0) in Equations (39) and (40) are very similar. The response of inflation to a money shock is short-lived both in the DSGE model as well as in the VAR. In other dimensions there is less agreement: Where the model predicts long-run money neutrality,the VAR impulse responses indicate that there is substantial uncertainty about the long-run effects of money shocks on output. Although these findings are specific to this DSGE model, they seem to favor identification strategies based on impulse responses on impact (as in Faust, 1998;Uhlig, 2001; Canova and DeNicol6, 2002) relative to strategies that rely on long-run neutrality. Identification schemes based on zero-restrictions on the contemporaneous impact of the structural shocks often produce a price-puzzle in three- or four-variable VARs. Although the price-puzzle can be avoided by including producer prices in addition to consumer prices, it can also be avoided by using our identification scheme that is not based on zero-restrictions. 664 DEL NEGRO AND SCHORFHEIDE 4.4. Regime Shifts. The analysis of welfare implications of different monetary policy rules has become an active area of research (see, for instance, the articles in Taylor, 1999). It is important for policymakers to have a set of tools that allows them to predict the effects of switching from one policy rule to another. Suppose a policymaker only has a DSGE model and an unrestricted VAR available. Moreover, suppose the forecast performance of the DSGE model is worse than that of the VAR. However, because of the Lucas critique, the policymaker does not fully trust the VAR to correctly predict the impact of a permanent change in the policy rule. Based on the DSGE-VAR, we propose an approach by which the policymaker uses the DGSE model to forecast the effect of the policy change, but corrects the DSGE model predictions. The correction is the one that brings the DSGEs forecasts under the existing policy regime on track. The tacit assumption underlying this procedure is that the correction is itself policy invariant. We begin by decomposing the joint posterior of VAR and DSGE model parameters into p(C>,Eu, 0 I Y) = p(t, (41) I0, Y)p(0 I Y) To the extent that p(1>, Eu I0, Y) assigns mass away from the restriction functions I*(0) and ES(0) it can be interpreted as a correction of the VAR representation of the DSGE model given 0. This correction has been constructed from past observations to optimize forecasting performance. Partition 0 = [0[s), 0'p)]', where 0(p) corresponds to the parameters that the central bank can affect with its conduct of monetary policy. Suppose that our beliefs about the post-intervention policy parameters can be characterized by the density p(O(p)),which possibly concentrates all its mass on a single value of 0(p). We combine p((p)) with the information that we have about the nonpolicy parameters based on the available data. This leads to a modified posterior of 0 P(0 I Y) = p(0s) I Y)p(P)) (42) where p(0(s) I Y) is the marginal posterior of the nonpolicy parameters. Our inference with respect to the effect of the regime shift will be drawn from px(c,>,S, (43) Y)= p Px(D, E, I Y, 0)x( Y)d0 We use the subscript Xto indicate that the conclusions depend on the weight given to the DSGE model. It is instructive to take another look at the posterior mean of the VAR parameters 4 conditional on the DSGE model parameters 0 (O) = ( (0) +x1XX)T-1X' (1 )+F* T'X'Y) The parameters 0 are now drawn from the modified distribution p(0 I Y) that reflects the change in the distribution of the policy parameters due to the intervention. The sample moments X'X, X'Y, and Y'Y, on the other hand, are functions of PRIORS FROM DSGE MODELS FOR VARS 665 pre-intervention observations only. They remain part of our analysis because they incorporate information about how to "correct" the DSGE model predictions to achieve a good forecast performance. If X = 0, the VAR posterior does not depend on 0 at all: Hence the researcher is ignoring the DSGE model (and the regime shift itself) in computing her forecasts. If A = oo, then the procedure is equivalent to analyzing the policy directly with the (VAR approximation of the) DSGE model. It is clear that the Lucas critique is fully observed only in the X = oo extreme. Most of the current literature on monetary policy rules focuses on the effect of these rules on the magnitude of economic fluctuations and the households' utility over the business cycle. A popular measure of welfare besides agents' utility is the volatility of the output gap and inflation. To illustrate our prediction approach we are considering the effect of a change in the response of the federal funds rate to deviations of inflation from its target rate (the parameter Ifl in the Taylor rule (4)) on the standard deviation of real output growth and inflation. A widely shared belief, e.g., Clarida et al. (2000), is that under the chairmanship of Paul Volcker and Alan Greenspan the U.S. central bank responded more aggressively to rising inflation than under their predecessors William Martin, Arthur Burns, and William Miller. Based on the empirical results in the Taylor rule literature we compare two policies. Under the first policy scenario f1 = 1.1, whereas under the second policy scenario *1 = 1.8. The former can be loosely interpreted as a continuation of the inactive Martin, Burns, and Miller policy,19whereas the latter corresponds to a switch to a more active Volcker, Greenspan policy. To assess the two policies we generated draws from the modified posterior (42) and simulate trajectories of 80 observations conditional on the parameter draws. For each trajectory we discard the first eight quarters (hence we consider only the paths post-1982:III) and then compute the standard deviation of output growth, inflation, and the federal funds rate. The results for various choices of X are summarized in the density plots of Figure 3. The dashed densities corresponds to f1 = 1.1 and the solid densities to f1 = 1.8. The vertical lines in the plots show the standard deviation of the actual sample (post-1982:III) for the variables of interest. The 1982:1 threshold is taken from Clarida et al. (2000). The transition period from high inflation to low inflation between 1979 and 1982 implies that the actual standard deviation of inflation for the whole sample is high, and in our view does not reflect the "steady-state" variability of inflation under the Volcker-Greenspan policy. Hence we choose to discard the first eight quarters.20 The DSGE model predicts that an increase in the Taylor rule parameter fr1 induces a lower equilibrium variability of inflation and therefore a lower variability of the federal funds rate. This is indeed what can be observed in Figure 3 for X > 0. 19 Although some authors report estimates of if 1 < 1 we restrict ourselves to the determinacy region of the DSGE model. 20 A possible explanation is that the linear model fails to capture the transition period from high inflation to low inflation. In the model, agents change their expectations instantaneously when the policy change is announced whereas in reality there may be a learning process in which agents slowly realize that the policy change is permanent (regime shift) rather than temporary (deviation from policy rule). 666 DEL NEGRO AND SCHORFHEIDE Fed Fund inflation realGDPgrowth 0.5 2 0 0 Lo 1 2 0 0 1 2 2 0 5 10 5 10 0 5 10 0 5 0 10 5 10 0.5 0 1 2 0 1 0 2 0 1 2 10 1 0 2 0 0 0.5 'f:A2 0 0 0.5 m2 0 0 0 1 2 2 0 a 0 2 2 0 1 2 0 - 0, 1 2 0.5 0 to the samplestandard linescorrespond NOTES:Thedottedhorizontal deviationof the actualdata and the solid lines are posterior predictie distributions of sample from 1982:V to 1999:11.The dashed SHIFT OFA POLICY REGIME EFFECTS Thedashedline for the sametimeperiod,obtainedusingdataup to 1979:11. standard deviations toii = l1.1,thesolidlinecorresponds to hI =t1.8. corresponds FIGURE 3 EFFECTS OF A POLICY REGIME SHIFT from 1.1 to increases the forecasted variability of inflation and Whenever 1.8 tend to.8, rigt th distribution e is sk t interest rate decreases. It roeassuringts = of 1.8 (solid line) concentrates around of the predicted standard deviation for under e = 1.1 the actual standard deviation. The s predictns ti (dashed line) tend interest In of overestimate rate, the predictions to inflation variability. terms the under a change 1.8 tend to underestimate Fed Fund variability, whereas the predicthebecomes largerdashed tions under that= 1.1predicted the effect ofof policy change distribution appears to be roughly on spot, the distribution is skewed to the right). The predictions for the standard deviation of output do not change significantly as *1 increases. The DSGE-VAR does not predict the reduction of the variability in real output growth that took place after 1979. This reduction may be the outcome of a change in the exogenous technology process rather than the effect of monetary policy. Note that the predicted effect of the policy change becomes larger as the weight of the prior increases. As more weight is given to the DSGE model, the differences in the predictions becomes sharper, as one would expect. Although the experiment just described is not pure out-of-sample prediction, since the policy experiment /1 = 1.8 was motivated by an analysis of the PRIORSFROMDSGE MODELSFOR VARS 667 Volcker-Greenspansample,it illustratesthe potentialof our approach.We view the procedureas a tool that lets the policymakerassess the effects of the policy change as a functionof the confidenceplaced in the structuralmodel measured by S. 5. CONCLUSIONS The articletakes the idea of Ingramand Whiteman(1994)-imposing priors from general equilibriummodels on VARs- and develops it into a full-blown, computationallyefficientstrategythatis usablefor policyanalysis.Ourapproach involves the followingsteps: (i) Choose a DSGE model and a priordistribution for its parameters.(ii) Solve the DGSE model and map the prior distribution of its parametersinto a prior distributionfor the VAR parameters.Although a log-linearapproximationof the DSGE model simplifiesthe computationof the VAR approximationgivenby c*(0) and E,(O) considerably,it is not crucialto our approach.(iii) Obtainvia Monte Carlomethods the joint posteriordistribution of DSGE and VAR parameters,which can then be used to computepredictive densities. We applythis procedureto a VAR in real outputgrowth,inflation,andinterest rates,and show that it is broadlysuccessfulin termsof forecastingperformance. TheDSGE-VARclearlyoutperformsan unrestrictedVAR andthe vectorautoregressiveapproximationof the DSGE model itself at all horizons.Its forecasting performanceis comparableto and in some dimensionsbetter than a VAR with Minnesotaprior. Unlike the Minnesotaprior,our DSGE model prior also offers help in terms of policy analysis.We constructa VAR identificationscheme for the structural shocksbased on a comparisonof the contemporaneousVAR responseswith the DSGE model responses.The identificationof policyshocksenablesan analysisof modestinterventions.Our approachprovidesan attractivealternativeto existing identificationschemesbecauseit closelyties VAR identificationto a fullyspecified generalequilibriummodel. Beyond the specificationof a DSGE model and the possibly data-drivenselection of the prior weight, our identificationprocedure does not requirethe researcherto make any additionalchoices. We alsoillustratehow a VAR withDSGE modelpriorcanbe usedto predictthe effectsof a permanentchangein the policyrule-a taskthatis generallyconsidered infeasiblefor identifiedVARs. We use the approachto predictthe impactof the changefrom the Martin-Burns-Millerregimeto the Volcker-Greenspanregime on the volatilityof the variablesof interest.Wefindthatat leastin somedimensions the approachfaresreasonablywell in termsof predictingthe effect of the change, althoughfurtherresearchis needed to investigatethis issue more deeply. More researchlies down the road. First,if the VAR is specifiedin terms of output and prices ratherthan output growthand inflation,then the calculation of expected samplemomentsof data generatedfrom the DSGE model becomes more complicateddue to the nonstationarity.Second,our methodcould be used to generatepriorsfor state-spacemodelsthatformallynest the DSGE model.This extensionis conceptuallystraightforwardbut challengingfrom a computational perspective.Third,it is worthwhileto apply the method to largerscale DSGE 668 DEL NEGRO AND SCHORFHEIDE models, e.g., the Smets-Wouters (2003) model, and to make comparisons among priors that are derived from different models, such as a New Keynesian model versus a flexible price cash-in-advance model. As envisioned in Diebold (1998), the combination of DSGE models and VARs shows promise for macroeconomic forecasting and policy analysis. APPENDIX A.1. Derivations and Proofs. The normalization factor c(O) for the conditional prior density of the VAR parameters p(I, E~ I0) in Equation (21) is given by nT-k (A.1) c(O) = (2jr)'k 1TF(O) XTE(0) 2| )| T- k + -i)/2] 2n2 k)r[(r i=l where F[.] denotes the gamma function. The marginal likelihood function of 0 in Equation (32) is given by (A.2) P(YI 0) = p(YI d), E)p(I', E I0)/p(, TX X Trx (0) + X'Al E I Y) (i+ 1) Tru( (0) 2 )~T-k X7Tr (0)1 I XT* (0)1 X (2r)- /22 2 n2 + 1)T- k + 1 - i)/2] rF[(XT- k +1 - i)/2] Ii=ln r[(( n(XT-k) 2 2 i= The second equality can be obtained from the normalization constants of the Inverted Wishart-Normal distributions. Define the sample moments Fxx = X'X/ T, fry = = and T. Let Y'Y/ X'Y/ T, 0 = 1/k and 0 be the mode of the marginal logPyy likelihood function given in Equation (A.2). Consider the log-likelihood ratio PROOF OF PROPOSITION 1. (A.3) (A.3) p(Y O),) =ln p(Y T ln n (IO)+ FXx - -lI,AF*l10\f In /'/ 11 (2lnlc:(e)l (1/0 + 1)T- k n 1/ + 2 l/q5 T ) + 1/ In 2 -+ n n ln rxx *- 1 + (px'x (5)F(xx 1/? 669 PRIORSFROMDSGE MODELSFOR VARS We derive an approximation of the log-likelihood ratio that is valid as ( -> first-order Taylor approximation of the second term around ( = 0 yields 0. A r =In IIZl + ptr[rxx1 o(r2)+ InI?+ Pr1xx r'x (A.4) Notice that (A.5) 1/4 + 1 1/? =_ [r* - r* x- r- ]-1 yy y u + x[rF;y + rfyy (r;x qryx)(rx + crpx)- (rF* + orxy)] The log-determinant of this term has the following first-order Taylor expansion around 0 = 0: In l/+ (A.6) 1/?0 ^- = In IIZ+ btr[E- (yy - r - *- *' + *'rx,,*)] + 0(02 Combining these results yields (A.7) ln p(YI 0) - T In + 2T Iu*(0) T ln |E*(0)I - 2-tr[-1(0o)(y - ry T + 2tr[E*- (0)(fr - rfyxl*(0) - c4*'(0)rfX + = n P( *() - *'(0)rxy + c$*'(0)fxx* (0))] *'(0)r,,xxD*(0))]+ 0(0) 0) + O(Q) p*(YI0) Thus, as 0 -> 0 the log-likelihood ratio converges to the log-likelihood ratio of the quasi-likelihood functions. The convergence is uniform on compact subsets of I 0 for which ES(0) and 1XX\VIU~IV11UU rX(0) are nonsingular. \ IIU 670 DEL NEGRO AND SCHORFHEIDE 2. We rewrite the marginal log-likelihood ratio given PROOFOFPROPOSITION in Equation (A.3) in terms of X T u -jl2 IIn |E:,(0)|- I Ay P(YO)0) ln^^^ P (A.8) n 2 In |Af:+ ?xx (0 XX 2 \(0)r^l + 1)T- k ln I(X+ 1)*- -( T+n s^ + 2 5:(0)1 2 (0)fu() ,, ++ rx (0)rx jn^ n In 2 ( + 1)T - k In (X + + 1)*-)u( + 2 A Taylor series expansion of the second term around X = 0 yields ln 1XI+ rF' rx (A.9) - In Irx rxx + Xtr[rXl rX] + O(X2) Notice that (A.10) (X + 1)E*-' Eu = [ryy - ryxrxx rry]- x [xr;y+ yy - (xr'; + fryx)(xr*x+ rxx)-1(xr + rfX)] The log-determinant of this term has the following first-order Taylor expansion around X = 0 (A.11) In |(X + 1) = = ' S I--tr[ieu,mle(y Xtr[u,mleeu] u- In | Eu,mle - rmle ^*m r+ yxmie + Xtr[ ,mle(* * leDi lerxmle)] x )]D ney - (mle) 'rx((I q 0(X2) O(X2) + 4(mle)] + 0(x2) Combining the three terms leads to the following approximation of the loglikelihood ratio (A.12) p(Y.I) X- T In I M 2 2U p(YI ) In ] +2 (0 U - XT r-1 ] +T q----tr[Efmj,E:*' U,mle 2u 2 tr[Eu'mleE(O)] 20)] * 671 PRIORSFROMDSGE MODELSFOR VARS XT - ? -tr 2 utr[u'mle ( VO:e) )mle) xx((0 ]m e)] )- + Op(max[2 T, 1]) =nq(YI 0) +Op(max[X2T, 1]) q(yIl) A.2. PracticalImplementation. We assumethat the parameterspace of X is finite A = {/1,... ,lq}. In orderto select X,and to generatedrawsfrom the joint posteriordistributionof DSGE model parametersand VAR parameterswe use the followingscheme: 1. For each X E A use the Metropolisalgorithmdescribedin Schorfheide (2000) to generate draws from px(OIY) oc px(Y 0) p(0). The steps needed to evaluate px(YI 0) based on Equation (A.2) are as follows. For each 0: (i) Solve the DSGE model given by Equations(7), (10), and (12)-(14), for instance,with the algorithmdescribedin Sims (2002). This leads to a transitionequationof the form St = T(O)st_l + R(0)Et (A.13) The measurementequation(15) can be writtenin stackedform as (A.14) Yt = Z(O)st + D(0) + vt (In our implementationwe choose st such that vt = 0). Define the variance-covariancematricesof the shocksas E[Vt Vt] = Xvv(0) IE[EtEt] = XE(0) E[t Vt] = 5UV(0) (ii) Compute the population moments ry(0), Frx(0), and F* (0) from the state-space representation (A.13, A.14). Notice that (A.15) E[ytyt] = Zss Z' + ZREV + (ZREv)' + E, + DD' E[YtYt-_h]= ZTh(QZssZ' + RZev) + DD' where Qss= E[sts'] and can be obtained by solving the following Lyapunov equation: Qss = T ss T' + R2E, R'. 672 DEL NEGRO AND SCHORFHEIDE 2. Based on these draws apply Geweke's (1999) modified harmonic mean estimator to obtain numerical approximations of the data densities px (Y). 3. Find the presample size Xthat has the highest data density. 4. Select the draws of {0(s)} that correspond to Xand use standard methods to generate draws from p((, Su I Y, 0(s)) (see Equations (30) and (31)) for each 0(s). Notice that this scheme can also be used to select among competing DSGE models. Moreover, the whole procedure can be easily generalized to the case in which we have a prior distribution over the hyperparameter X. REFERENCES ALTUG, S., "Time-To-Buildand AggregateFluctuations:Some New Evidence,"International Economic Review 30(4) (1989), 889-921. F, AND G. DE NICOL6,"MonetaryDisturbancesMatterfor BusinessCycleFluctuations in the G-7," Journal of Monetary Economics 49(6) (2002), 1131-59. CHAMBERLAIN, G., "Panel Data," in Z. Griliches and M. Intriligator, eds., Handbook of CANOVA, Econometrics,Vol. 3 (New York:ElsevierScience,1984),1465-1514. AND C. EVANS, "Monetary Policy Shocks: What Have CHRISTIANO,L., M. EICHENBAUM, We Learnedand to What End?"in J. Taylorand M. Woodford,eds., Handbookof Macroeconomics(New York:ElsevierScience,1999). CLARIDA,R., J. GALi,ANDM. GERTLER,"Monetary Policy Rules and Macroeconomic Stabil- ity:Evidenceand Some Theory,"QuarterlyJournalof Economics115 (2000),147-80. "AnalyzingVARswithMonetaryBusiness DEJONG,D., B. F INGRAM,ANDC.H. WHITEMAN, Cycle Model Priors," Proceedings of the American Statistical Association, Bayesian Statistics Section (1993), 160-69. Journalof DIEBOLD,F.X., "ThePast,Present,andFutureof MacroeconomicForecasting," Economic Perspectives 12 (1998), 175-92. ANDR. S. MARIANO, "Comparing Predictive Accuracy," Journal of Business and -, Economic Statistics 13 (1995), 253-63. DOAN, T., R. LITTERMAN,AND C. SIMS,"Forecasting and Conditional Projections Using Realistic Prior Distributions," Econometric Reviews 3 (1984), 1-100. FAUST, J., "The Robustness of Identified VAR Conclusionsabout Money," Carnegie Rochester Conference Series 49 (1998), 207-44. J., "Using SimulationMethods for Bayesian EconometricModels:Inference, GEWEKE, EconometricsReviews18 (1999),1-127. Development,and Communications," INGRAM,B. F, AND C. H. WHITEMAN,"Supplanting the Minnesota Prior-Forecasting MacroeconomicTime Series Using Real Business Cycle Model Priors,"Journalof Monetary Economics 34 (1994), 497-510. IRELAND,P.N., "A MethodforTakingModelsto the Data,"Journal of Economic Dynamics and Control28 (6) (March2004), 1205-26. KING,R. G.,"TheNew IS-LMModel:Language,Logic,andLimits,"FederalReserveBank of Richmond, Economic Quarterly 86 (2000), 45-103. "What Should the Monetary Authority Do If Prices Are Sticky?" - , ANDA. L. WOLMAN, in JohnB. Taylor,ed., MonetaryPolicyRules(Chicago:Universityof ChicagoPress, 1999),349-404. -, C. I. PLOSSER,ANDS. T. REBELO,"Production, Growth and Business Cycles," Journal of Monetary Economics 21 (1988), 195-232. LEEPER, E. M., ANDT. ZHA, "Modest Policy Interventions," Journal of Monetary Economics 50 (8) (November 2003), 1673-1700. LEHMANN, E., Theory of Point Estimation (New York: John Wiley & Sons, 1983). 673 PRIORSFROMDSGE MODELSFOR VARS R. B., "Forecasting with Bayesian Vector Autoregressions-Five LITTERMAN, Years of Experience," Journal of Business and Economic Statistics 4 (1986), 25-37. LUBIK, T., ANDF. SCHORFHEIDE, "Testing for Indeterminacy: An Application to U.S. Monetary Policy,"American Economic Review 94 (1) (March 2004), forthcoming. LucAs, R. E., "Econometric Policy Evaluation: A Critique," in K. Brunner and A. Meltzer, eds., The Phillips Curve and the Labor Market, Carnegie-Rochester Conference Series 1 (Amsterdam: North-Holland, 1976). MOON,H. R., AND F. SCHORFHEIDE, "Minimum Distance Estimation of Nonstationary Time Series Models," Econometric Theory 18 (2002), 1385-1407. ROBERTSON, J. C., E. W. TALLMAN, AND C. H. WHITEMAN, "Forecasting Using Relative Entropy," Federal Reserve Bank of Atlanta Working Paper 22, 2002. T. J., "Two Models of Measurements and the Investment Accelerator," Journal SARGENT, of Political Economy 97(2) (1989), 251-87. SCHORFHEIDE, F, "Loss Function-Based Evaluation of DSGE Models," Journal of Applied Econometrics 15 (2000), 645-70. SIMS,C. A., "Macroeconomics and Reality," Econometrica 48 (1980), 1-48. -, "Solving Rational Expectations Models," Computational Economics 20(1-2) (2002), 1-20. -, ANDT. ZHA, "Bayesian Methods for Dynamic Multivariate Models," International Economic Review 39 (1998), 949-968. SMETS,F, AND R. WOUTERS,"An Estimated Stochastic Dynamic General Equilibrium Model of the Euro Area," Journal of the European Economic Association 1 (5) (September 2003), 1123-75. A. A., "Estimating Nonlinear Time-Series Models Using Simulated Vector AutoreSMITH, gressions," Journal of Applied Econometrics 8 (1993), S63-S84. TAYLOR,J. B., Monetary Policy Rules (Chicago: University of Chicago Press, 1999). THEIL, H., AND A. S. GOLDBERGER, "On Pure and Mixed Estimation in Economics," Inter- national Economic Review 2 (1961), 65-78. TODD,R. M., "Improving Economic Forecasting with Bayesian Vector Autoregression," Federal Reserve Bank of Minneapolis Quarterly Review (1984), 18-29. UHLIG,H., "What Are the Effects of Monetary Policy? Evidence from an Agnostic Identification Procedure," Mimeo, Humboldt University, 2001. WOODFORD,M., Interestand Prices: Foundations of a Theory of Monetary Policy (Princeton: Princeton University Press, 2003). ZELLNER,A., Introduction to Bayesian Inference in Econometrics (New York: John Wiley & Sons, 1971).
© Copyright 2026 Paperzz