Learning from Structural Vector Autoregression Models

February, 2001

Stephen Gordon
Dorothee Boccanfuso

Departement d'economique and CREFA
Pavillon J-A de Seve
Universite Laval
Quebec City, Quebec G1K 7P4
Canada

Please send comments to [email protected]

Abstract

This study suggests a Bayesian approach for estimating Structural Vector Autoregression (SVAR) models based on direct analysis of the unidentified model. It is well known that if the prior distribution is proper, then the posterior will also be proper, regardless of whether or not the model is identified. Of course, if the model is not identified, then the posterior distribution for some - but by no means all - aspects of the model will be entirely determined by the prior. The focus of attention in this study is the extent to which the data revise prior beliefs about the main features of interest of SVAR models. We place particular emphasis on the need to specify priors that actually reflect prior opinion. Instead of trying to specify priors for the SVAR parameters directly, we make use of standard macroeconomic intuition to determine which impulse-response functions are plausible a priori; these beliefs are used as the basis for a proper prior for the SVAR model. These priors are then updated by the data; the extent to which the priors are modified determines what we can learn. In an empirical application, we analyse a bivariate model of aggregate demand and supply (output and price level). Typically, this model is estimated under the assumption that demand shocks have no long-run effect on the level of output; since this restriction is a maintained hypothesis a priori, it is impossible to revise one's beliefs about the validity of this claim a posteriori. In our framework, it is a straightforward matter to characterise non-degenerate prior beliefs about the validity of this proposition, and to evaluate the direction and extent to which the data revise our priors.
We find that even if a prior is centred over the case in which demand shocks have no long-run effect on output, the posterior suggests that a one-standard-deviation demand shock increases the long-run level of output by about one per cent. Details of the Markov Chain Monte Carlo algorithm used in the analysis are outlined in a technical appendix.

An earlier version of this paper was circulated under the title "What can we learn from structural vector autoregression models?". We would like to thank Alain Guay, Mark Dwyer, Dale Poirier, Ellis Tallman, Tao Zha and seminar participants at the University of Toronto and the Federal Reserve Bank of Atlanta for their useful comments. Remaining errors are the responsibility of the authors.

1 Introduction

Structural Vector Autoregression (SVAR) models and their associated impulse-response functions have proven to be useful tools for macroeconomic policy analysis, and they have been the subject of many studies since Sims (1980). Since the structure of these models is quite simple (consistent estimates for the reduced-form parameters can be obtained by estimating each equation of the system by ordinary least squares), estimation is usually straightforward; the emphasis of much of this literature has been on the search for appropriate identification rules. Since conclusions about the effect of structural shocks are not robust to the choice of the identification rule, this question can have profound implications for policy analysis. If a decision-maker is presented with two conflicting scenarios describing the effect of a given policy, a natural criterion for choosing between the two would be: which is more consistent with the data? In the current context, this criterion is unavailable - two differing impulse-response functions could be based on identical estimates for the reduced-form model. Since the data cannot choose between identification rules, other criteria must be introduced.
A common theme in the literature on identifying SVARs is the importance of non-data-based information, often based on economic theory or on beliefs about the speed at which shocks in one sector are transmitted to the rest of the economy. Recent work that emphasises the role of information of this sort includes Leeper, Sims and Zha (1996), Uhlig (1997), Faust (1998) and Dwyer (1998). Given the importance that this literature accords to the role of prior beliefs in identifying SVARs, it is not surprising that much of this literature has a Bayesian flavour. Bayesian methods provide a clear and coherent way for decision makers to make best use of the information contained in the data to update prior information; see Berger (1985), Bernardo and Smith (1994) or Poirier (1995) and the references therein for expositions and justifications of Bayesian methods.

Although Bayesian methods are more prevalent in this field than in many others in applied economics, the existing literature does not fully exploit an important feature of Bayesian methods, namely that exact identification is not required. It is well known (see, for example, Poirier (1998)) that if a proper prior is specified, then the posterior will always be well-behaved, regardless of whether or not the model is identified. Despite the acknowledged importance of prior information and the usefulness of Bayesian methods, most existing studies do not attempt to exploit this aspect of Bayesian analysis. Typically, improper or very diffuse priors are used for the reduced-form parameters, and conditional on these reduced-form parameters, deterministic (or dogmatic) priors are used to identify the structural parameters; see Uhlig (1997) or Faust (1998) for recent examples. An exception is provided by Dwyer (1998), in which proper - albeit extremely diffuse - priors are specified.
However, since he focusses on the problem of choosing among dogmatic identification strategies, the advantages of not limiting attention to dogmatic priors are not fully explored.

We find the reluctance to make full use of the advantages offered by specifying a proper prior to be a puzzling state of affairs. It is unlikely that "ignorance" priors characterise the beliefs of researchers who work with SVAR models; for example, in discussions on the effectiveness of monetary policy - the standard application of this class of models - a lack of prior opinion can hardly be said to be the greatest obstacle facing the analyst. A more likely explanation is the perceived computational burden of a "fully Bayesian" approach: see, for example, the quote from Leeper, Sims and Zha (1996) cited in Dwyer (1998). As does Dwyer (1998), this study argues that the computational costs of a fully Bayesian approach are overstated, and that they are outweighed by the conceptual gains of a coherent approach to inference.

We illustrate our approach by focussing on a question that cannot be addressed using usual techniques based on exact identification: what are the long-run effects of a demand shock on output? The bivariate SVAR model of output and the price level is usually interpreted in terms of the standard macroeconomic model of aggregate supply and aggregate demand. Invariably, the structural form of this model is estimated under the identification rule suggested by Blanchard and Quah (1989), namely that the long-run effect of a demand shock on the level of output should be zero. Although this proposition may seem plausible as a first-order approximation, its empirical validity has been by no means established beyond a doubt, and even the theoretical foundation upon which it rests has credible competitors, namely, models that display path-dependence.
A simple example is presented by Aghion and Howitt (1998, pp 235-43), who discuss the case of an endogenous growth model in which investment in research and development is accelerated in periods of expansion. In this case, a temporary increase in output will increase output in all future periods, since the technical progress that was produced by the temporary demand shock will result in a permanent increase in output. By assuming a priori that it is literally impossible for a temporary demand shock to have any long-run effect on output, it becomes equally impossible to test this claim against a plausible alternative. In our application, we are able to incorporate standard theory in our priors (our prior mean for the long-run effect is centred on zero), but we admit the possibility that this proposition may not be literally correct (the prior standard deviation is not zero). We find that in each of the four cases we estimate, the estimated effect on output of a positive demand shock of one standard deviation is revised from a mean of zero to a mean of about one per cent.

The paper has five sections. Section 2 describes the SVAR model and discusses identification issues. Our suggested approach is applied to a well-known SVAR model in Section 3, and Section 4 discusses issues of robustness. Section 5 concludes. A technical appendix outlines the details of the MCMC algorithm used in the estimation.

2 Identification Issues

There is a vast literature on identification rules for SVAR models. This section summarises some of the existing results, and adapts some Bayesian results to this framework.

2.1 The model

The structural model can be written as follows:

A(L)X_t = μ + ε_t    (1)

where X_t is a vector of n endogenous variables, μ is a vector of n constant parameters, ε_t ~ iid N(0, I_n) is a random vector of uncorrelated structural disturbances, and A(L) is a matrix polynomial in the lag operator of order p: A(L)X_t = A_0 X_t + A_1 X_{t-1} + A_2 X_{t-2} + ... + A_p X_{t-p}.
If A_0 is nonsingular, the reduced form of the model - upon which the likelihood function is based - is:

X_t = A_0^{-1} μ - A_0^{-1} [A_1 X_{t-1} + A_2 X_{t-2} + ... + A_p X_{t-p}] + A_0^{-1} ε_t
    ≡ b + B_1 X_{t-1} + B_2 X_{t-2} + ... + B_p X_{t-p} + e_t    (2)
    ≡ b + B(L) X_{t-1} + e_t

where B(L) = B_1 + B_2 L + ... + B_p L^{p-1} and where e_t ~ iid N(0, A_0^{-1} A_0^{-1}') is the vector of disturbances for the reduced form. Given estimates of the reduced-form parameters, forecasting exercises can be carried out. However, since the reduced-form disturbances are linear combinations of the structural shocks, the model in (2) is not a suitable framework for policy analysis. The effect of structural shocks is more easily seen in the moving-average representation:

X_t = A(L)^{-1} (μ + ε_t) ≡ c + C(L) ε_t    (3)

The terms in the matrix polynomial C(L) have a natural interpretation. If the j-th element of the structural disturbance vector ε_t is subjected to a shock equal to one standard deviation, the effect on X_{i,t+s} is given by the (i, j) element of the C_s matrix. Treated as a function of s, the [C_s]_{i,j} terms form the impulse-response function of X_i with respect to the j-th structural shock. If the j-th element in X_t is a policy variable, then the impulse-response function is a convenient way to express the effect of a change in policy on a given variable of interest.

The structure of the SVAR model is identical to that of a standard linear simultaneous equations model (LSEM), and like the LSEM, it has an identification issue that needs to be addressed before the model can be applied to data. One way of viewing the problem (see, for example, Leeper, Sims and Zha (1996)) is to consider a transformation in which (1) is premultiplied by an orthonormal matrix Q (QQ' = I). It can be shown that such a transformation leaves the reduced form of the model unchanged, but it does modify the impulse-response functions.
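As a concrete illustration of the mapping from the structural coefficients to the impulse-response matrices in (3), the sketch below (the function and variable names are ours, not the paper's) forms the reduced-form coefficients and then accumulates the moving-average terms:

```python
import numpy as np

def impulse_responses(A_list, horizon):
    """Map structural coefficients [A_0, A_1, ..., A_p] from (1) into the
    moving-average matrices C_0, ..., C_horizon of (3).  [C_s]_{i,j} is the
    response of X_i, s periods after a one-standard-deviation shock to the
    j-th structural disturbance (the shocks have identity covariance)."""
    A0inv = np.linalg.inv(A_list[0])
    B = [-A0inv @ Ai for Ai in A_list[1:]]   # reduced-form coefficients B_i
    n, p = A_list[0].shape[0], len(B)
    Psi = [np.eye(n)]                        # MA matrices of the reduced form
    for s in range(1, horizon + 1):
        Psi.append(sum(B[i - 1] @ Psi[s - i] for i in range(1, min(s, p) + 1)))
    # e_t = A_0^{-1} eps_t, so C_s = Psi_s A_0^{-1}
    return [P @ A0inv for P in Psi]
```

For example, with A_0 = I and a single lag A_1 = -0.5 I, the responses decay geometrically: C_s = 0.5^s I.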
Since it is only the reduced-form parameters in (2) that are identified in the likelihood function, more information must be used to obtain the impulse-response functions.

2.2 Existing approaches to the identification problem

Another way of looking at the identification problem in SVAR models is to note that if A_0 were known, then it would be possible to recover the other structural parameters A_1, A_2, ..., A_p from the reduced-form parameters, since B_i = -A_0^{-1} A_i, i = 1, ..., p. However, the only available independent information for determining A_0 is in the estimate for the covariance matrix of the reduced-form disturbances: V(e_t) = A_0^{-1} A_0^{-1}'. Since the covariance matrix is symmetric, there are only n(n+1)/2 free parameters in V(e_t), while there are n^2 elements in A_0. If the analyst is willing to impose n(n-1)/2 exact restrictions on A_0, it may be possible to retrieve A_0 from estimates for V(e_t). For example, if all n(n-1)/2 elements of A_0 above the main diagonal are set equal to zero, A_0 can be recovered from the Cholesky decomposition of the inverse of V(e_t) (Sims 1986). Leeper, Sims and Zha (1996) make use of prior beliefs about the speed at which certain shocks affect certain variables to impose enough restrictions on A_0 to allow them to retrieve A_0 from V(e_t).

Typically, exclusion restrictions are based on intuitive plausibility a priori, but several recent studies have explored the use of identification rules that yield plausible results a posteriori. Leeper, Sims and Zha's (1996) comparison of identified SVAR models is largely based on whether or not they produced `reasonable' results. Uhlig (1997) proposes a method in which the criterion for the choice of an identification rule is a loss function that penalizes identification rules that yield implausible impulse-response functions. Faust (1998) develops a method for checking for robustness across identification rules, by limiting attention to rules that yield reasonable results.
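To make the counting argument concrete, the following sketch (with illustrative values of our own choosing) verifies both halves of the discussion for n = 2: a triangular A_0 can be recovered from the Cholesky factorisation implied by V(e_t) = A_0^{-1} A_0^{-1}', while an orthonormal rotation of A_0 leaves V(e_t), and hence the likelihood, unchanged:

```python
import numpy as np

# A "true" A_0 with its n(n-1)/2 above-diagonal elements restricted to zero
A0_true = np.array([[2.0, 0.0],
                    [0.5, 1.0]])

# Reduced-form error covariance implied by the model: V(e_t) = A_0^{-1} A_0^{-1}'
A0inv = np.linalg.inv(A0_true)
V = A0inv @ A0inv.T

# Under the triangular restriction, A_0^{-1} is the lower-triangular Cholesky
# factor of V, so A_0 is recovered by inverting that factor.
A0_recovered = np.linalg.inv(np.linalg.cholesky(V))

# By contrast, any orthonormal rotation Q A_0 yields exactly the same V(e_t):
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A0_rot = Q @ A0_true
V_rot = np.linalg.inv(A0_rot) @ np.linalg.inv(A0_rot).T
```

The rotation step is the identification problem in miniature: A0_true and A0_rot imply identical likelihoods but different impulse responses.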
Much of the literature on SVARs - and all of the references in this section - make use of Bayesian techniques to some extent. Given the importance of such notions as "plausible" identification rules and "reasonable" results, it would be surprising if this were not the case. On the other hand, the actual implementations of the Bayesian paradigm make use of priors that are often an uneasy mix of ignorance and dogmatism: reference or very diffuse priors are placed on the parameters of the reduced form (2), and conditional on the reduced-form parameters, deterministic rules are used to identify the structural parameters. Identification rules that give plausible posteriors are favoured.

The emphasis on reasonable posteriors is perhaps worrisome, since it may lead to the sort of circular reasoning described by Uhlig (1997). If the identification rule is chosen so that estimates for impulse-response functions accord with the analyst's a priori intuition, then the posterior will simply take the form of the prior; the analyst would learn nothing from such an exercise. It will be possible to learn something from the data only if they can force a revision in one's prior beliefs. Although it is proper for an analyst to announce and defend any prior beliefs, using these beliefs to impose certain features on the posteriors contradicts what Poirier (1995) refers to as the `Cromwell Rule'[1], namely, to "think it possible that you may be mistaken".

2.3 Bayesian analysis of unidentified SVAR models

As Poirier (1998) points out, it is well known that "Bayesian analysis of a nonidentified model is always possible if a proper prior is used". In many applications of Bayesian methods, results using `non-informative' priors are reported, so that it might be possible to see what information there is in the data when it is not filtered through subjective beliefs that may not be shared by all readers.
In the current context, the identification problems inherent in SVAR models make it clear that this sort of objectivity is an unobtainable goal[2]. Although the Bayesian paradigm is applicable in the case of nonidentified models, there will be features of the model for which the posterior distribution will be entirely determined by the prior. We have noted that premultiplying the model (1) by an orthonormal matrix will not affect the reduced form of the model. In order to make explicit the identified and nonidentified aspects of the model, write the structural model as

D(L)X_t = η + ε̃_t    (4)

where ε̃_t = Qε_t, and where Q is orthonormal. By defining A_i ≡ Q'D_i, i = 0, ..., p and μ ≡ Q'η, we obtain the specification in (1), where the transformed shocks ε̃_t retain the properties of ε_t and can also be interpreted as structural shocks for the transformed model. Taking into account the previous discussion, D ≡ (D_0, D_1, ..., D_p) can be said to be identified, but Q is not. If prior beliefs about D and Q are independent, Proposition 1 in Poirier (1998) demonstrates that the marginal posterior distribution for Q will be exactly identical to its marginal prior distribution. Even if D and Q are not a priori independent, the conditional posterior distribution p(Q|D, data) will always be identical to the conditional prior distribution p(Q|D) (Poirier, 1998, Proposition 2). If it were reasonable to impose that D and Q should be independent, and/or if the focus of attention were on the conditional distribution p(Q|D, data), these results would be unsettling, since they imply that the data cannot be informative about the features of the model that are of most interest. Fortunately, this is not the case.

[1] Poirier's (1995) rule is applied in a different context, but in the same spirit.
[2] This problem is in fact endemic to all models, regardless of whether or not they are identified. As Bernardo and Smith (1994, p 298) observe, "[t]here is no `objective' prior that represents ignorance".
The parameters in (4) are difficult to interpret, and are of interest only insofar as they determine the impulse-response functions or the reduced-form parameters. The features of the model for which the data are not informative are not the features of interest.

2.4 Priors that actually reflect prior opinion

If the SVAR model were identified, and if we had access to an infinitely informative data set, then the form of the prior would be almost without consequence, since the data would shrink all priors to a degenerate posterior located at the 'true' parameter vector. Since we are working with an unidentified model and with finite samples, the posterior will always reflect both sample and prior information. If the posterior is not entirely determined by the data, then the form of the prior becomes an important feature of the analysis. The dangers of using an arbitrary default prior are obvious: nonsensical priors may generate nonsensical posteriors.

It is perhaps worth commenting on the random walk prior at this point. Litterman's (1986) suggestion of using proper but weak priors for the reduced form of the model, centered over the random walk case, has been widely used, and has been adapted to cases in which priors are formed over the structural form, such as Sims and Zha (1996) and Dwyer (1998). The random walk prior is simple to implement, and since many macroeconomic time series appear to have unit roots, this seems to be a plausible place to centre the location hyperparameters. However, no prior can be considered "neutral", and it might be instructive to consider its implications, especially when it is noted that the implied impulse-response functions are flat lines. Few economists would assert that this actually reflects what one would expect to occur after a structural shock.

One of the main arguments we advance in this study is that analysts should pay close attention to the form of the prior, and that they should be prepared to argue that their priors are plausible.
This statement should be uncontroversial: prior distributions for the parameters should reflect prior opinion about how the economy works. The remaining problem is a technical one: how can priors for impulse-response functions be translated into priors for the SVAR parameters? This question is the focus of the next section.

3 Specifying Proper Priors for the SVAR

This section demonstrates how the analyst can make use of his or her prior beliefs about various aspects of the SVAR model in order to specify a prior that reflects prior opinion. Our discussion is based on the aggregate supply-aggregate demand model of output and the price level. This model has been the subject of numerous studies, from Blanchard and Quah (1989) to Cooley and Dwyer (1998). There are two variables in the system, so n = 2, and we set X_t ≡ (Y_t, P_t)'. The data are taken from Faust and Leeper (1997): the output measure Y_t is the log of real US GDP, and the price variable P_t is the log of the US GDP deflator. The data are quarterly, and we follow Faust and Leeper (1997) in setting p = 8.

3.1 Conjugate, conditionally conjugate and non-conjugate priors

We noted earlier that the reduced-form model determines the form of the likelihood, so much of the earlier work on Bayesian analysis of SVAR models (for example, Litterman 1986) has focussed attention on forming sensible priors for the parameters in (2). The difficulties encountered here are not computational: if the analyst makes use of a conjugate normal-Wishart prior, then the posterior is also a normal-Wishart density whose properties are well known. If the focus of attention is on the forecasting properties of the model, analysis is fairly straightforward. Similar results are available for the structural form of the model, as Sims and Zha (1996) demonstrate. If the conditional prior p(A_1, A_2, ..., A_p | A_0) is normal, so is the conditional posterior p(A_1, A_2, ..., A_p | A_0, data).
This result is less strong, but its usefulness will be seen in the Gibbs sampling algorithm developed below. In principle, there is no reason to limit attention to priors of the form used here; the development of Markov chain Monte Carlo (MCMC) techniques has made it possible to estimate features of posteriors resulting from any prior density. The reason for using conditionally conjugate priors is pedagogical: many researchers in this field are not familiar with MCMC techniques[3]. Our aim here is to develop techniques that can be implemented by analysts who may be relative neophytes to MCMC methods.

3.2 Eliciting a prior

One reason why analysts have favoured some sort of default prior in empirical applications is simply that the parameters in the structural model (1) are extremely difficult to interpret: it is hard to imagine what the parameters in A should look like. Our way around this difficulty is to characterise priors about the features of the model that are easily interpreted, and to see what sort of prior for A is consistent with these beliefs.

3.2.1 Priors for the impulse-response functions

Clearly, the features of the model for which it is easiest to form priors that are consistent with economic intuition are the impulse-response functions themselves. If a proper prior is specified for C̄ ≡ (C_0, C_1, ..., C_p), then this induces another proper prior for A, and the posterior distribution p(A | data) will be well-behaved. Since A determines the impulse-response functions for all horizons, so does C̄. In the aggregate demand-aggregate supply model of output and prices, there are four impulse-response functions: the effect of supply shocks on output (YRS) and on prices (PRS), as well as the effect of demand shocks on output (YRD) and prices (PRD).

[3] To our knowledge, the only other empirical analysis of SVAR models that makes use of MCMC techniques is Waggoner and Zha (1999).
Since the lag length p = 8, the prior for C̄ and A will be proper if the prior for {YRS_i, PRS_i, YRD_i, PRD_i}_{i=0}^{8} is integrable. This can be assured if we only specify priors for values of i up to 8, but it became apparent early on in the estimation that reasonable priors over a two-year horizon did not rule out unreasonably explosive behaviour shortly afterwards. After a certain amount of introspection, we decided that we were prepared to specify priors for a 5-year horizon, and these priors are graphed in Figure 1; the means and standard deviations are reported in Table 3 in the Appendix.

The interpretation of these priors is standard. Demand shocks are expected to have a positive effect on output in the short term, but not at longer horizons; the standard deviations decrease slightly at longer horizons, reflecting our increasing confidence that YRD will eventually return to zero. Demand shocks are expected to increase prices in both the short and medium term, but since we are less certain about the long-run properties of P after a demand shock, our prior standard deviations increase at longer horizons. Supply shocks are expected to increase output and reduce prices, but the increasing standard deviations reflect our uncertainty about the long-run adjustment process after supply shocks.

Another feature of our priors that is implicit in Figure 1 is that we believe that the impulse-response functions are fairly smooth. For example, if we were told that the true value of an impulse-response function at a given point was greater than what we expected, we would revise upwards our priors for the impulse-response function in this region. We incorporate this notion by setting the coefficient of prior correlation between two points that are s quarters apart equal to 0.7^s.
The priors for the different impulse-response functions are independent, although other analysts may wish to incorporate beliefs that a smaller-than-expected effect of (say) a supply shock on output should be associated with a smaller-than-expected effect on prices. We refer to the priors graphed in Figure 1 as "proto-priors" for C* ≡ (C̄, C_9, C_10, ..., C_20). The proto-prior p(C*) takes the form of the product of four multivariate normal distributions whose means, standard deviations and coefficients of correlation are described above. Note that for a given value of C̄, we can recover A and compute the associated C*; denote this mapping by C*(C̄). Since the dimension of C* is greater than that of C̄, f(C̄) ≡ p(C*(C̄)) is an integrable function.

In the estimation, we found it convenient to provide upper and lower bounds for the impulse-response functions in order to rule out explosive paths. Using the proto-prior graphed in Figure 1, the upper and lower bounds were both set five standard deviations from the means of {YRS_i, PRS_i, YRD_i, PRD_i}_{i=0}^{20}. The additional restrictions YRS_0 > 0 and PRD_0 > 0 mean that positive supply (demand) shocks do not generate contemporaneous decreases in output (prices). Let F represent the region of the parameter space consistent with these restrictions.

3.2.2 Priors for the long-run effect of demand shocks on output

In this model, the identification problem is typically addressed by using Blanchard and Quah's (1989) rule of imposing long-run neutrality of output with respect to transient demand shocks, i.e., lim_{i→∞} YRD_i = 0. This identification rule is consistent with the stories based on the standard textbook treatment of demand shocks. But if this rule is imposed a priori, it becomes impossible to see whether or not the data are in fact more consistent with a plausible alternative, such as a model that displays hysteresis.
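The smoothness device just described is easy to operationalise. The sketch below (the function name is ours, and the standard deviations are illustrative rather than those of Table 3) builds the covariance matrix of one proto-prior from the prior standard deviations at each horizon and the 0.7^s correlation rule:

```python
import numpy as np

def proto_prior_cov(sd, rho=0.7):
    """Covariance matrix of a multivariate normal proto-prior for one
    impulse-response function: sd[i] is the prior standard deviation at
    horizon i, and the correlation between horizons i and j is rho**|i-j|."""
    sd = np.asarray(sd, dtype=float)
    h = np.arange(len(sd))
    corr = rho ** np.abs(h[:, None] - h[None, :])   # rho**s for points s quarters apart
    return corr * np.outer(sd, sd)
```

Smooth draws of an impulse-response path can then be obtained by sampling a multivariate normal with this covariance and a vector of prior means.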
Since the cumulative effect of demand shocks on output is a feature of special interest in this model, our priors take this into account. Instead of attempting to use a finite data set to make inferences about the infinite long run[4], we choose to concentrate interest on LRYRD ≡ (1/20) Σ_{i=81}^{100} YRD_i, the average effect of a demand shock on output during the 5-year interval between the 20th and the 25th year after the shock. Focussing on LRYRD amounts to an implicit prior belief that the short-run dynamics of a demand shock will have worked themselves out after 20 years; taking the average over the next five years will further attenuate any high-frequency fluctuations that may persist after 80 quarters. Other analysts may wish to use a longer or a shorter horizon, or indeed any other measure of the long-run effect of demand shocks on output.

Our prior for LRYRD is centered around 0, but we do not rule out cumulative changes of 1 percentage point. We let p(LRYRD) denote the prior for LRYRD, and it takes the form of a N(0, 1) distribution. Again, since C̄ determines LRYRD, let LRYRD(C̄) represent this mapping, and define g(C̄) ≡ p(LRYRD(C̄)). Note that g is not an integrable function. In addition, we rule out values for LRYRD with an absolute value greater than 25%; let G denote the region of the parameter space that satisfies this restriction.

3.2.3 Priors for the reduced form

Since the reduced form of the model describes the intertemporal dynamics of observed prices and output, it may be the case that analysts will want to incorporate priors for B as well. The popularity of the Litterman (1986) random walk prior suggests that many analysts believe it to be at least a rough first approximation to their prior.

[4] This point is discussed by Faust and Leeper (1997).
We also would be surprised to see reduced-form parameters that were significantly far from the random walk case, and this prior is incorporated into a multivariate normal prior p(B) with mean set equal to the random walk case, and where the identity matrix serves as the covariance matrix (i.e., the prior standard deviations are 1). If B(C̄) represents the mapping from C̄ to B (via A), let h(C̄) ≡ p(B(C̄)). We also rule out all parameter combinations that yield reduced-form parameter vectors that are `too far' from the random walk case. A candidate C̄ is acceptable only if the maximum distance between the elements of the corresponding B and the random walk case is less than 1; this region is denoted by H.

3.2.4 Priors for variance decompositions

Although we do not do so in this study (our priors are already quite informative and our focus is on LRYRD), researchers who are interested in seeing how much of the variance of a given variable of interest can be attributed to a given structural shock can use similar reasoning to set out priors for variance decompositions.

3.2.5 The prior for C̄

There are many ways to combine the densities f(C̄), g(C̄) and h(C̄); we choose to multiply them:

p(C̄) ∝ f(C̄) g(C̄) h(C̄),  C̄ ∈ 𝒞    (5)

where 𝒞 ≡ F ∩ G ∩ H is the intersection of the acceptable regions. Clearly, this distribution is non-standard, and analytic expressions for such features as moments and quantiles are unavailable. On the other hand, MCMC techniques can be applied here to obtain draws from (5), so that simulation-consistent estimates for various features of interest can be obtained. Our particular application takes the form of a Gibbs sampler in which each iteration consists of a draw generated by the Metropolis-Hastings algorithm ("Metropolis-within-Gibbs"); details of the algorithm are described in the Appendix.
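The Appendix describes the actual Metropolis-within-Gibbs implementation; the toy sketch below (all names are ours) shows the basic random-walk Metropolis ingredient for drawing from a target known only up to a constant, such as (5), with a support restriction handled by rejecting candidates outside the acceptable region:

```python
import numpy as np

def rw_metropolis(log_target, in_support, x0, n_draws, scale=0.5, seed=0):
    """Random-walk Metropolis draws from an unnormalised density.  Here
    log_target plays the role of log f + log g + log h, and in_support()
    enforces a restriction such as membership of F ∩ G ∩ H."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_target(x)
    draws = []
    for _ in range(n_draws):
        cand = x + scale * rng.standard_normal(x.shape)
        if in_support(cand):                             # reject outside the region
            lp_cand = log_target(cand)
            if np.log(rng.uniform()) < lp_cand - lp:     # symmetric proposal
                x, lp = cand, lp_cand
        draws.append(x.copy())
    return np.array(draws)
```

In the application, each Gibbs iteration would apply a step of this kind to one block of parameters, conditional on the others.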
3.2.6 The prior for A

Given the one-to-one and onto mapping between A and C̄, the prior p(C̄) is also a prior for A; a suitably calibrated MCMC algorithm can be used to produce draws from the resulting posterior; an earlier draft of this study took this approach. However, given the large number of parameters and the highly irregular structure of the posterior, treating p(C̄) as the prior is likely to be a frustrating experience for those who do not have extensive experience with MCMC techniques. Instead, we suppose that the prior for A can be approximated by a multivariate normal distribution. For each value of C̄ drawn from p(C̄), we can retrieve the corresponding value of A; the mean and variance of p(A) are then set equal to the sample mean and variance of this sample of draws for A. Table 4 in the Appendix lists the means and standard deviations used to construct the multivariate normal prior p(A). The support of A - denoted by 𝒜 - is restricted to the region consistent with 𝒞[5].

Before going to the data, it is useful to see if the approximated p(A) does in fact correspond to reasonable prior beliefs about the features of interest of the model. We simulated 5000 iid draws from p(A), and computed the impulse-response functions and the value of LRYRD for each draw. Figure 2 plots the estimates for the prior means of the impulse-response functions and their associated one-standard-deviation error bands. The estimate for YRD has the familiar humpbacked shape, and it reflects the belief that a positive demand shock will have only a transitory effect on output. The same demand shock is expected to lead to a permanent increase in the price level. A positive supply shock is expected to permanently increase output and decrease prices. The actual size of the response is of course less clear, and fairly wide error bands have been used in order to capture our prior uncertainty.
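Concretely, given a simulated impulse-response path for output after a demand shock, the long-run measure defined in Section 3.2.2 is just an average over quarters 81 to 100. A minimal sketch (the function name is ours):

```python
import numpy as np

def lryrd(yrd_path):
    """LRYRD = (1/20) * sum_{i=81}^{100} YRD_i: the average response of
    output between the 20th and 25th year after a demand shock.
    yrd_path[i] holds YRD_i, so the path must cover at least 101 quarters."""
    yrd_path = np.asarray(yrd_path, dtype=float)
    return yrd_path[81:101].mean()
```

Applying this function to each of the simulated draws yields the prior mean and standard deviation of LRYRD reported in the text.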
As we mentioned earlier, we are particularly interested in the long-run effect of a demand shock on output. The prior mean for LRYRD implied by p(A) is -0.05%, and its prior standard deviation is 3.1%. Although the prior reflects the belief that demand shocks will have no effect on output in the long run (the prior mean of LRYRD is essentially zero), it also reflects the belief that the long-run neutrality proposition might not be consistent with the data (the prior standard deviation is strictly positive).

3.3 Prior robustness

Suppose that implementation of a certain policy innovation is being contemplated. If a decision maker has well-defined beliefs about how the economy works, then the econometrician need only formalise an appropriate prior that is consistent with these beliefs and compute the posterior moments of interest; the policy maker can then decide whether or not the policy should be carried out. In practice, policy analysis is rarely carried out in this way. The econometrician is asked to inform the policy makers about what the data "say"; this information is then used as an input (along with the subjective opinion of the decision maker) in the policy-making process. Although the decision maker may be justified in not confiding in an econometrician, refusing to do so complicates matters. Since it is not possible to make use of a neutral prior, the econometrician is obliged to undertake several rounds of estimation using different priors, in the hope that one of the priors will correspond to those of the decision maker. In a similar vein, studies written for a broader audience are expected to report results for a range of priors that reflects the divergence of prior opinion among readers. Ideally, conclusions about a feature of interest should be robust to variations in the prior,6 but there is little reason to expect that this will be the case in every empirical study.

5. The probability that a value of A drawn from an unrestricted p(A) will be in 𝒜 is 0.864.
If two analysts use different priors and come up with different results, then a decision maker may feel obliged to discard the study that uses the `wrong' prior, and the policy debate may degenerate into a struggle to demonstrate that one subjective opinion is `better' than another. This prospect may help to explain the reluctance of analysts to make use of proper priors.

This need not be the case. Instead of attempting to demonstrate that a given modelling decision - i.e., the choice of the combination of likelihood and prior - should be given higher weight a priori, the adoption of proper priors means that a standard Bayesian model comparison exercise can be implemented to determine which models should be assigned higher weight a posteriori. This point is demonstrated by Dwyer (1998), who develops techniques for distinguishing between various exact identification schemes. In our robustness exercise below, we adopt MCMC techniques developed by Gelfand and Dey (1994) to compare the priors used here. We consider three simple variations on the base model. Brief descriptions of the four priors are listed in Table 1, and Figure 3 plots the priors for the YRD impulse-response function for the various priors (the prior and posterior densities for LRYRD are graphed in Figure 6 and are discussed in the presentation of the empirical results in Section 4.2 below).

Table 1: Priors
Prior   Description
(a)     Base prior
(b)     Error bands in f(C) doubled
(c)     Standard deviation for LRYRD in g(C) halved
(d)     Both (b) and (c)

6. If the prior includes a dogmatic identification rule, this amounts to saying that conclusions should be robust to the way in which the model is identified - a recurring theme in the SVAR literature.

3.4 Prior for μ

We have almost no guidance in determining priors for μ, the intercept term in (1), so we make use of a small subset of the data - a "training sample" - to generate proper priors.
For a given value of A, we can manipulate the data to produce a random vector that is distributed iid N(μ, I_n). Combining a flat proto-prior and the first 16 data points (about 10% of the main sample) yields a normal prior p(μ|A) whose mean and variances are obtained using the usual conjugate formulae (Poirier, 1995, p 300).

3.5 Summary

From a decision-theoretic point of view, the need for an informative prior is clear: learning is literally impossible if the agent is completely ignorant of the phenomenon being studied. A given article taken from (for example) the SVAR literature is most meaningful to trained economists who are familiar with the theoretical and econometric frameworks upon which the analysis is based. These readers bring a great deal of a priori information to the exercise, typically have strong beliefs about issues such as (for example) the response of prices and output to supply and demand shocks, and can potentially learn a great deal. On the other hand, the same article would be literally meaningless in the hands of someone who satisfied none of these conditions.

Even if the analyst is untroubled by such considerations, there are more practical reasons for using an informative prior. Since the model is unidentified, the data will never be able to speak for themselves; the posterior will always reflect at least some features of the prior, regardless of the sample size. If the prior does not actually reflect prior opinion, interpreting the resulting posterior may be problematic.

The aim of this section has been to demonstrate that it is possible to make use of economic intuition to develop informative priors for the structural parameters of the SVAR model. We make no claim that our priors are `correct', nor do we expect analysts who disagree with these priors to adopt them as a form of default prior. Our goal here is to provide a practical example of the sort of prior elicitation process that we feel should be an integral part of any empirical study.
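The training-sample prior of Section 3.4 amounts to a standard conjugate normal update; a minimal sketch (the data below are synthetic stand-ins for the transformed observations, and the training size of 16 follows the text):

```python
import numpy as np

# A minimal sketch of the training-sample prior for the intercept: with a
# flat proto-prior and tau observations distributed iid N(mu, I), the usual
# conjugate formulae give a normal prior for mu with mean equal to the
# training-sample average and variance I/tau. The data here are synthetic.

rng = np.random.default_rng(1)
tau = 16                                # training sample size used in the text
train = rng.standard_normal((tau, 2))   # stand-in for the transformed data

prior_mean = train.mean(axis=0)         # mean of the resulting normal prior
prior_var = np.eye(2) / tau             # flat proto-prior implies variance I/tau
```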
4 Analysing the Posterior

Given the multivariate normal structure of the priors p(A) and p(μ|A), it is a fairly straightforward matter to make use of well-known results from conjugate Bayesian analysis (such as those developed by Sims and Zha, 1996) to construct a Gibbs sampling algorithm to simulate an artificial sample drawn from the posterior p(A, μ|data). These draws can then be used to provide simulation-consistent estimates for the features of interest of the model.

4.1 A Gibbs sampling algorithm

Partition the vector of structural parameters so that the SVAR model (1) can be written as:

A0,Y' [Yt Pt]' = μ1 + AY' [Y(t-1) P(t-1) ... Y(t-p) P(t-p)]' + εS,t
A0,P' [Yt Pt]' = μ2 + AP' [Y(t-1) P(t-1) ... Y(t-p) P(t-p)]' + εD,t   (6)

where A0,Y and A0,P are 2x1 vectors containing the rows of A0, and where AY and AP are 2px1 vectors containing the top and bottom rows, respectively, of {A1, A2, ..., Ap}. Given this partition, it is possible to manipulate the model so that it can be written as a simple linear regression model with unit variance. For example, suppose that A0,Y is the only unknown feature of the model, and define:

y_t ≡ [Y(t-1) P(t-1) ... Y(t-p) P(t-p)] AY + μ1,   X_t ≡ [Yt Pt]

The model can then be written as

y = X A0,Y + ε   (7)

where ε is a vector of iid N(0, 1) error terms. Since the prior for A is a multivariate normal, so is the conditional prior p(A0,Y | A0,P, AY, AP); let Â0,Y and V̂0,Y represent the mean and variance of this conditional distribution. The model (7) can now be interpreted as a linear regression model with unit variance and with a normal conjugate prior.
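The construction of y and X described above can be sketched in a few lines; the data and parameter values below are synthetic stand-ins, and the names (AY, mu1) simply follow the text:

```python
import numpy as np

# A sketch of casting one row of the SVAR into the unit-variance regression
# y = X a + eps used in the Gibbs sampler, conditional on the lag
# coefficients and the intercept. Data and parameters are synthetic.

rng = np.random.default_rng(2)
T, p = 120, 2
Y = rng.standard_normal(T)           # 'output' series
P = rng.standard_normal(T)           # 'price' series
AY = rng.standard_normal(2 * p)      # lag coefficients of the output equation
mu1 = 0.1                            # intercept of the output equation

y_rows, X_rows = [], []
for t in range(p, T):
    lags = np.concatenate([[Y[t - j], P[t - j]] for j in range(1, p + 1)])
    y_rows.append(lags @ AY + mu1)   # 'dependent variable' y_t
    X_rows.append([Y[t], P[t]])      # regressors X_t = [Y_t  P_t]

y = np.array(y_rows)
X = np.array(X_rows)
```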
It is well known that the conditional posterior p(A0,Y | A0,P, AY, AP, μ, data) is also a normal distribution with mean Ā0,Y and variance V̄0,Y, where:

V̄0,Y = (X'X + V̂0,Y⁻¹)⁻¹
Ā0,Y = V̄0,Y (V̂0,Y⁻¹ Â0,Y + X'y)

Draws from p(A0,Y | A0,P, AY, AP, μ, data) can therefore be simulated using standard algorithms for simulating a pseudo-random draw from a normal distribution; draws that do not satisfy the condition A ∈ 𝒜 are discarded. The exercise can be repeated to produce models of the form (7) for A0,P, AY and AP, and in each case, the full conditional posterior will be a multivariate normal. We have already noted that if A is known, the model can be manipulated to generate data that are N(μ, I); given the conjugate normal prior p(μ|A), the posterior p(μ|A, data) is also normal. We are now able to implement a Gibbs sampling algorithm in which the sequence {A^i, μ^i} for i = 1, ..., N is generated according to:

A0,Y^i ~ p(A0,Y | A0,P^(i-1), AY^(i-1), AP^(i-1), μ^(i-1), data)
A0,P^i ~ p(A0,P | A0,Y^i, AY^(i-1), AP^(i-1), μ^(i-1), data)
AY^i ~ p(AY | A0,Y^i, A0,P^i, AP^(i-1), μ^(i-1), data)
AP^i ~ p(AP | A0,Y^i, A0,P^i, AY^i, μ^(i-1), data)
μ^i ~ p(μ | A0,Y^i, A0,P^i, AY^i, AP^i, data)   (8)

The sequence of draws generated by (8) converges in distribution to the stationary density p(A, μ|data), and these draws can be used to generate simulation-consistent estimates for the posterior moments of interest. In our application, 500 iterations were used to initialise the Gibbs sampler; the results below are based on the subsequent 5000 draws generated by (8).

4.2 Posteriors for impulse-response functions

For each draw A^i simulated from the posterior p(A|data), we can retrieve the associated impulse-response functions; the average across these N impulse-response functions is our estimate for the posterior means. The estimates for the impulse-response functions generated by our base prior (a) are plotted in Figure 4.
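The regression-with-conjugate-prior step at the heart of the sampler in Section 4.1 can be sketched as follows (synthetic data; the prior moments are placeholders rather than the paper's calibration):

```python
import numpy as np

# A minimal sketch of one Gibbs step: drawing the coefficient block of a
# unit-variance linear regression under a conjugate normal prior
# N(A_hat, V_hat). Data and prior moments are synthetic placeholders.

rng = np.random.default_rng(3)
n, k = 200, 2
X = rng.standard_normal((n, k))
a_true = np.array([1.0, -0.5])
y = X @ a_true + rng.standard_normal(n)   # unit-variance errors

A_hat = np.zeros(k)                       # prior mean
V_hat = np.eye(k)                         # prior variance

# Posterior moments: V_bar = (X'X + V_hat^{-1})^{-1},
#                    A_bar = V_bar (V_hat^{-1} A_hat + X'y)
V_hat_inv = np.linalg.inv(V_hat)
V_bar = np.linalg.inv(X.T @ X + V_hat_inv)
A_bar = V_bar @ (V_hat_inv @ A_hat + X.T @ y)

draw = rng.multivariate_normal(A_bar, V_bar)  # one draw of the block
```

In the paper's algorithm a draw like this would be discarded whenever the implied A falls outside the restricted support 𝒜.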
Since the form of the posteriors is similar to that of the priors (with the possible exception of the effect of a supply shock on output), the data do not appear to greatly challenge the economic intuition used in forming the priors. On the other hand, we appear to have greatly underestimated the effect of structural shocks on the price level: the absolute values for PRS and PRD are both revised upwards. Similarly, our prior also underestimated the effect of a demand shock on output.

Minor variations in the prior do not appear to significantly affect our results. Figure 5 plots the posterior estimates for YRD for each of the priors described in Table 1. Using wider error bands a priori as in (b) and (d) generates wider error bands a posteriori, and it appears that the tighter prior on LRYRD plays a role in smoothing out estimates in (d), but has little effect in (c).

In choosing our priors, we paid particular attention to the long-run effect of demand shocks on output. Since our prior for LRYRD is not degenerate, it will be possible to learn something from the data. For each prior in Table 1, we have artificial samples from both the prior and posterior distributions for LRYRD, and their estimated densities are graphed in Figure 6. In Figure 5, the data revise estimates for the short-run effect of demand shocks upward for each of the priors, and Figure 6 suggests that this upward revision persists into the longer term as well. The extent of the revision depends almost entirely on the amount of prior precision for the error bands; tighter priors generate smaller revisions.

4.3 Comparing priors

Suppose there are M models under consideration; they could differ in the form of the prior p(θ|M^j), the form of the data density p(X|θ, M^j), or both.7
Bayesian model comparison exercises are based on the marginal likelihoods p(X|M^j) for the various models j = 1, 2, ..., M:

p(X|M^j) = ∫ p(X|θ, M^j) p(θ|M^j) dθ   (9)

The posterior probability assigned to a given model is calculated using Bayes' rule:

p(M^j|X) = p(X|M^j) p(M^j) / Σ_{i=1}^{M} p(X|M^i) p(M^i)   (10)

where p(M^j) is the prior probability associated with M^j. The obvious difficulty in implementing a Bayesian model comparison exercise is evaluating the integral in (9); closed-form solutions are available only for simple models with conjugate priors. However, if draws from the posterior p(θ|X, M^j) are available, Gelfand and Dey (1994) demonstrate that estimates for p(X|M^j) can be obtained with very little additional effort. Suppose that we have draws from p(θ|X, M^j), and that it is possible to evaluate p(X|θ^i, M^j) and p(θ^i|M^j), including the normalising constant. Let f(θ) be any known density for θ, and let {θ^i} for i = 1, ..., N be a sample of draws taken from p(θ|X, M^j). It is then a simple matter to demonstrate that

N⁻¹ Σ_{i=1}^{N} f(θ^i) / [p(X|θ^i) p(θ^i)]  →(a.s.)  1 / ∫ p(X|θ) p(θ) dθ

In our application, p(θ) is the prior for A, and f is a multivariate normal distribution whose moments are set equal to the estimated posterior means and covariances.8

Table 2: Estimated log predictive likelihoods
Prior   Log Predictive Likelihood
(a)     -396
(b)     -549
(c)     -405
(d)     -378

Table 2 reports the estimated log predictive densities for each of the priors used above; the numerical standard errors are about 0.3 for each estimate. It is clear that prior (d) receives the most support; the log Bayes factor in favour of prior (d) against the base prior is 18, which implies overwhelming evidence in favour of (d) over (a).

7. It will usually be the case that changing the form of the data density will change the interpretation of the parameter vector, thus requiring a new prior as well.

4.4 Have We Elicited a Posterior?
The data that are typically used in SVAR analysis are so well known that many analysts might not be able to distinguish between opinions that are not data-based and those formed while working with the data; the analyst may in fact be eliciting a posterior instead of a prior. Even if this is the case, it is still possible to use Bayes' rule to characterise the analyst's implicit prior, and to see to what extent the data revise prior opinion. Let p(A, μ|X) represent the joint posterior distribution for A and μ. Integrating out μ and manipulating Bayes' rule yields:

p(A) ∝ p(A|X) / p(X|A)   (11)

where p(X|A) ≡ ∫ p(X|A, μ) p(μ|A) dμ is the predictive density of the data given A. If p(A|X) and p(μ|A) are normal, a Gibbs sampling algorithm analogous to that in (8) can be implemented to obtain draws from p(A); see the Appendix for details.

It should be noted that (11) does not imply that it is always possible to retrieve a prior from an arbitrary posterior; the data impose limits on the class of admissible priors. For example, recall that for the linear regression model in (7), the posterior precision matrix V̄0,Y⁻¹ is the sum of the prior precision matrix V̂0,Y⁻¹ and the data precision matrix X'X. If the posterior is known, the prior precision matrix can be recovered according to

V̂0,Y⁻¹ = V̄0,Y⁻¹ - X'X   (12)

If the prior covariance matrix is to be positive definite, the difference between the posterior precision and the data precision must be positive definite; the data provide a lower bound for the level of posterior precision. Equivalently, (12) provides an upper bound for a posterior covariance matrix consistent with coherent inference. So have we, in fact, elicited a posterior instead of a prior?

8. Both densities include corrections for the fact that their domains are restricted to 𝒜.
Although it may well be that our prior has been `contaminated' by exposure to the data, it is too diffuse to be characterised as a posterior that resulted from a coherent learning exercise; it proved to be impossible to implement a Gibbs sampling algorithm for generating draws from the prior without violating the constraint imposed by (12).

5 Conclusion

The identification problems inherent in SVAR models illustrate the importance of subjective opinion in empirical work, and Bayesian methods provide a coherent framework for combining this information with the data. Even so, few empirical studies attempt to fully exploit the advantages of this approach. The aim of this study is to demonstrate that the technical difficulties involved in specifying priors and in computing posteriors are no longer insurmountable. The "fully Bayesian" approach is both desirable and feasible.

Although the computational burdens of non-conjugate analysis are real enough, it is not a difficult matter to implement an MCMC estimation algorithm. The greater challenge facing analysts is to abandon the notion that there is a non-informative prior that will allow the data to speak for themselves. Even if all the parameters of a model are identified, the use of non-informative priors can generate posteriors with paradoxical features.9 If the parameters of interest are not identified, attempts to make use of non-informative priors are even more problematic. Although many analysts may be sympathetic with the intuition behind setting certain parameters (or combinations of parameters) to zero, few would be comfortable with the assertion that a given identification rule should be taken as literally true a priori. Indeed, the extensive literature on the robustness of results to variations in the identifying rules used to generate them suggests that few analysts are willing to make the leaps of faith demanded of them by advocates of a given exact identification rule.
If the prior does not accurately reflect the analyst's beliefs, it is difficult to imagine how the resulting posterior should be interpreted.

We believe that there are no insurmountable difficulties in forming proper priors. For simple models such as that studied here, analysts with minimal experience in this field can elicit priors with only a modest amount of introspection. This approach has its limits, but analysts who have more expertise in macroeconomic policy analysis will be able to interpret and form priors about impulse-response functions for higher-dimension models. Even if an analyst claims to have no opinion about a certain impulse-response function, it will be possible at the very least to put upper and lower bounds over the short run, and the analyst will often be willing to rule out explosive paths in the long run. Another potential source of prior opinion is the vast literature on simulated dynamic general equilibrium (DGE) models of the sort popularised in the analysis of real business cycles. The SVAR and DGE strategies are typically viewed as competitors in the field of macroeconomic policy analysis, and Cooley and Dwyer (1998) note that standard SVAR and DGE modelling strategies can generate quite different conclusions. However, policy makers are not obliged to choose between theory and data: Bayesian methods offer a coherent mechanism to make optimal use of both sources of information.

9. See, for example, Poirier (1995), pp 321-329 and the references therein.

References

Berger, James O., (1985), Statistical Decision Theory and Bayesian Analysis, (2nd edition), New York: Springer
Bernardo, J.M. and A.F.M. Smith, (1994), Bayesian Theory, New York: Wiley
Blanchard, Olivier and Danny Quah, (1989), "The dynamic effects of aggregate demand and supply disturbances", American Economic Review 79 (4), pp 655-673
Chib, Siddhartha and Ed Greenberg, (1996), "Markov chain Monte Carlo simulation methods in econometrics", Econometric Theory 12, pp 409-431
Cooley, Thomas F. and Mark Dwyer, (1998), "Business cycle analysis without much theory", Journal of Econometrics 83, pp 57-88
Dwyer, Mark, (1998), "Impulse-response priors for discriminating structural vector autoregressions", mimeo, UCLA
Faust, Jon, (1998), "The robustness of identified VAR conclusions about money", mimeo, Federal Reserve Board
Faust, Jon and Eric M. Leeper, (1997), "When do long-run identifying restrictions give reliable results?", Journal of Business and Economic Statistics 15 (3), pp 345-354
Gelfand, A.E. and D.K. Dey, (1994), "Bayesian model choice: asymptotics and exact calculations", Journal of the Royal Statistical Society Series B 56, pp 501-514
Geweke, John, (1998), "Using simulation methods for Bayesian econometric models: inference, development and communication", mimeo, University of Minnesota
Leeper, Eric M., Christopher A. Sims and Tao Zha, (1996), "What does monetary policy do?", Brookings Papers on Economic Activity 2, pp 1-78
Litterman, R.B., (1986), "Forecasting with Bayesian vector autoregressions - five years of experience", Journal of Business and Economic Statistics 4 (1), pp 25-38
Poirier, Dale J., (1995), Intermediate Statistics and Econometrics: A Comparative Approach, Cambridge: MIT Press
Poirier, Dale J., (1998), "Revising beliefs in nonidentified models", Econometric Theory 14, pp 483-509
Ritter, Christian and Martin A. Tanner, (1992), "Facilitating the Gibbs sampler: the Gibbs stopper and the griddy-Gibbs sampler", Journal of the American Statistical Association 87, pp 861-868
Sims, Christopher A., (1980), "Macroeconomics and reality", Econometrica 48 (1), pp 1-48
Sims, Christopher A. and Tao Zha, (1996), "Bayesian methods for dynamic multivariate models", mimeo, Yale University
Uhlig, Harald, (1997), "What are the effects of monetary policy? Results from an agnostic identification procedure", mimeo, Tilburg University
Waggoner, Daniel F.
and Tao Zha, (1999), "Does Normalization Matter for Inference?", working paper, Federal Reserve Bank of Atlanta

Technical Appendix

The proto-prior

Table 3: Means and standard deviations for the proto-prior
Horizon   YRS mean (sd)     YRD mean (sd)     PRS mean (sd)      PRD mean (sd)
0         0.6000 (0.2500)   0.5051 (0.4000)   -0.5000 (0.2500)   0.5000 (0.2500)
1         0.8278 (0.2687)   0.8081 (0.3913)   -0.7847 (0.2687)   0.6484 (0.2687)
2         0.8873 (0.2875)   0.9091 (0.3825)   -0.8646 (0.2875)   0.7549 (0.2875)
3         0.9227 (0.3063)   0.8081 (0.3738)   -0.9134 (0.3063)   0.8409 (0.3063)
4         0.9500 (0.3250)   0.5051 (0.3650)   -0.9514 (0.3250)   0.9143 (0.3250)
5         0.9708 (0.3438)   0.2991 (0.3563)   -0.9807 (0.3438)   0.9790 (0.3438)
6         0.9883 (0.3625)   0.2608 (0.3475)   -1.0055 (0.3625)   1.0372 (0.3625)
7         1.0039 (0.3812)   0.2324 (0.3388)   -1.0279 (0.3812)   1.0905 (0.3812)
8         1.0171 (0.4000)   0.2102 (0.3300)   -1.0469 (0.4000)   1.1398 (0.4000)
9         1.0294 (0.4187)   0.1925 (0.3213)   -1.0646 (0.4187)   1.1857 (0.4187)
10        1.0401 (0.4375)   0.1778 (0.3125)   -1.0801 (0.4375)   1.2288 (0.4375)
11        1.0498 (0.4562)   0.1656 (0.3038)   -1.0943 (0.4562)   1.2696 (0.4562)
12        1.0592 (0.4750)   0.1551 (0.2950)   -1.1080 (0.4750)   1.3083 (0.4750)
13        1.0676 (0.4938)   0.1461 (0.2863)   -1.1202 (0.4938)   1.3451 (0.4938)
14        1.0754 (0.5125)   0.1382 (0.2775)   -1.1317 (0.5125)   1.3804 (0.5125)
15        1.0830 (0.5313)   0.1312 (0.2688)   -1.1429 (0.5313)   1.4142 (0.5313)
16        1.0899 (0.5500)   0.1250 (0.2600)   -1.1531 (0.5500)   1.4467 (0.5500)
17        1.0966 (0.5687)   0.1194 (0.2513)   -1.1632 (0.5687)   1.4781 (0.5687)
18        1.1028 (0.5875)   0.1144 (0.2425)   -1.1723 (0.5875)   1.5084 (0.5875)
19        1.1087 (0.6062)   0.1099 (0.2338)   -1.1811 (0.6062)   1.5376 (0.6062)
20        1.1145 (0.6250)   0.1057 (0.2250)   -1.1898 (0.6250)   1.5660 (0.6250)

The prior

Table 4: Means and standard deviations for p(A): Base prior
p    Ap(1,1)            Ap(1,2)            Ap(2,1)            Ap(2,2)
0    60.4537 (19.03)    -60.5789 (21.24)   57.0065 (18.04)    65.4363 (20.39)
1    76.5960 (28.81)    -66.2477 (32.91)   61.3234 (28.89)    76.1820 (29.42)
2    -16.2154 (20.72)   9.0170 (21.67)     -0.1040 (20.95)    -5.1211 (22.23)
3    -8.0004 (21.95)    -13.4474 (21.92)   -10.8014 (20.76)   -10.9238 (24.87)
4    -0.1411 (23.15)    1.1270 (23.77)     -2.3854 (21.02)    -2.2439 (23.00)
5    4.3625 (22.65)     3.2892 (24.12)     5.8315 (21.97)     6.0133 (24.49)
6    0.8907 (23.91)     3.7315 (26.52)     1.0286 (22.72)     2.7684 (25.73)
7    0.3811 (25.30)     -5.4103 (26.99)    -1.8359 (22.86)    -2.5776 (24.69)
8    0.0177 (15.28)     7.1779 (16.47)     2.2314 (14.34)     0.3500 (16.38)

The Gibbs sampler

Let θ denote a random vector, and let π(θ) represent the "target" distribution that we wish to analyse (in this study, the target densities are the prior p(θ) and the posterior p(θ|data)). Without loss of generality, partition θ according to θ = (θ1, θ2, ..., θm). Suppose that it is possible to simulate from the "full conditional" distributions π(θj|θ-j), the distribution of θj conditional on the data and the other parameters. We can then simulate a sequence {θ^i} for i = 1, ..., N according to

θ1^i ~ π(θ1 | θ2^(i-1), θ3^(i-1), ..., θm^(i-1))
θ2^i ~ π(θ2 | θ1^i, θ3^(i-1), ..., θm^(i-1))
...
θm^i ~ π(θm | θ1^i, θ2^i, ..., θ(m-1)^i)

Under suitable regularity conditions, it can be shown that the draws generated by this sequence converge in distribution to the joint distribution π(θ). If N is large enough, these draws can be used to produce simulation-consistent estimates for the moments of interest.

The Metropolis-Hastings algorithm

Suppose that we wish to simulate draws from π(θ), and define q(θ, θ') to be a known density from which it is easy to simulate a candidate θ' given θ, the value generated by the previous iteration of the algorithm. Define

α(θ, θ') ≡ min{ [π(θ') q(θ', θ)] / [π(θ) q(θ, θ')], 1 }   (13)

The MHA returns the candidate θ' with probability α; if θ' is rejected, then the MHA returns θ. The MHA can be implemented for the entire vector θ, or for the conditional distribution π(θj|θ-j) in an iteration within a Gibbs sampler. For a more detailed description, see Chib and Greenberg (1996).

Simulating draws from p(C)

In our application, we use the "Metropolis-within-Gibbs" version of the Gibbs sampler to simulate draws from p(C), where each of the draws from the full conditional distributions p(θj|θ-j) is simulated using the MHA.
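The Metropolis-Hastings step described above can be sketched as follows for an independence-chain proposal; the target density, dimensions, and scales below are toy stand-ins, with the proposal correlation structure mimicking the candidate generator used for the blocks of C:

```python
import numpy as np

# A toy independence-chain Metropolis-Hastings step of the kind used within
# the Gibbs sampler. The target is a stand-in (a standard normal); the
# proposal is a multivariate normal whose correlation between horizons s
# apart decays as 0.7**s. Dimensions and scales are illustrative assumptions.

rng = np.random.default_rng(4)
m = 9                                        # horizons 0..8
idx = np.arange(m)
corr = 0.7 ** np.abs(idx[:, None] - idx[None, :])
sd = np.full(m, 0.3)
cov = corr * np.outer(sd, sd)                # proposal covariance
cov_inv = np.linalg.inv(cov)

def log_target(c):
    return -0.5 * np.sum(c * c)              # toy target, constants dropped

def log_proposal(c):
    return -0.5 * c @ cov_inv @ c            # N(0, cov) density, constants dropped

def mh_step(current):
    cand = rng.multivariate_normal(np.zeros(m), cov)
    # Independence-chain acceptance ratio: w(cand)/w(current), with w = pi/q
    log_alpha = (log_target(cand) - log_proposal(cand)) \
                - (log_target(current) - log_proposal(current))
    return cand if np.log(rng.uniform()) < log_alpha else current

chain = [np.zeros(m)]
for _ in range(200):
    chain.append(mh_step(chain[-1]))
```

Because the proposal does not depend on the current state, only the importance weights π/q enter the acceptance probability, which is why the unnormalised target suffices.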
We divide the parameter vector C into four subvectors: Ci ≡ {YRS} for horizons 0 to 8, Cii ≡ {YRD}, Ciii ≡ {PRS} and Civ ≡ {PRD}, and the candidate-generating distribution for each Cj takes the form of a multivariate normal distribution that is independent of the previous realisation of the chain, so q(Cj, Cj') = q(Cj'). The mean and variance for q(Cj') are taken from Table 3. For example, the means and standard deviations of q(Ci') are set equal to the means and standard deviations for YRS for horizons 0 to 8 listed in Table 3, and the coefficient of correlation between horizons s quarters apart is set to 0.7^s. 500 draws were used to initialise the chain; the subsequent 2000 draws were used to characterise the moments for p(A).

Retrieving a prior from an elicited posterior

Suppose that the prior distribution for μ is N(μ̂, Σ), and suppose also that Σ is a diagonal matrix; this will be the case in our example. The predictive density of the data conditional on A after marginalising out μ has the form

A(L) Xt = μ̂ + ut

where ut ~ iid N(0, I + Σ) is an error term that incorporates both the structural shocks and the prior uncertainty about μ. As in Section 4.1, if A0,Y is the only unknown element of A, and if we define

y_t ≡ [Y(t-1) P(t-1) ... Y(t-p) P(t-p)] AY + μ̂1,   X_t ≡ [Yt Pt]   (14)

then the model can again be written as y = X A0,Y + u, where u is now a vector of N(0, 1 + Σ11) error terms. Premultiply the system by (1 + Σ11)^(-1/2), so that if ỹ ≡ (1 + Σ11)^(-1/2) y and X̃ ≡ (1 + Σ11)^(-1/2) X, we obtain a linear regression model of the form in (7):

ỹ = X̃ A0,Y + ε   (15)

where ε is again a vector of N(0, 1) error terms. If the posterior p(A|data) is a known normal distribution, then the conditional posterior p(A0,Y | A0,P, AY, AP, data) is also a normal distribution with mean Ā0,Y and variance V̄0,Y.
The conditional prior p(A0,Y | A0,P, AY, AP) has a N(Â0,Y, V̂0,Y) distribution, where:

V̂0,Y⁻¹ = V̄0,Y⁻¹ - X̃'X̃
Â0,Y = V̂0,Y (V̄0,Y⁻¹ Ā0,Y - X̃'ỹ)

In our example, the prior distribution for μ is based on a `training sample' and is conditional on A, so the above analysis must be modified slightly. Define

z_t ≡ [Y(t-1) P(t-1) ... Y(t-p) P(t-p)] AY

and let X_t retain its interpretation in (14). If z̄ and X̄ are the sample averages of z_t and X_t from a training sample of τ observations, standard conjugate Bayesian analysis results in the prior

μ1 ~ N(X̄ A0,Y - z̄, τ⁻¹)

The model can now be written as

[z_t - z̄] = [X_t - X̄] A0,Y + u_t

where u_t ~ N(0, 1 + τ⁻¹). Premultiplying by (1 + τ⁻¹)^(-1/2) yields the linear regression model

z*_t = X*_t A0,Y + v_t

where z*_t ≡ (1 + τ⁻¹)^(-1/2) [z_t - z̄], X*_t ≡ (1 + τ⁻¹)^(-1/2) [X_t - X̄], and where v_t ~ N(0, 1). This model has the same form as (15), and if the posterior for A0,Y is multivariate normal, then the techniques described above can be used to characterise the prior. This approach can be adapted to recover the conditional priors for the other elements of A, so a Gibbs sampling algorithm similar to (8) can be implemented10 to generate a sample of draws from the prior consistent with the posited posterior for A.

10. Note that the step generating draws for μ is not required here.

Figures

Figure 1: Proto-priors for impulse-response functions
[Four panels: Y response to demand, P response to demand, Y response to supply, P response to supply.]
NOTE: Solid lines graph the prior expected per cent change in output and prices subsequent to supply and demand shocks with a magnitude of one standard deviation. The dashed lines indicate the one-standard-deviation error bands.
Figure 2: Priors for impulse-response functions: Base model
[Four panels: Y response to demand, P response to demand, Y response to supply, P response to supply.]
NOTE: Solid lines graph the prior expected per cent change in output and prices subsequent to supply and demand shocks with a magnitude of one standard deviation. The dashed lines indicate the one-standard-deviation error bands.

Figure 3: Priors for YRD
[Four panels: Prior (a), Prior (b), Prior (c), Prior (d).]
NOTE: Solid lines graph the prior expected per cent change in output subsequent to a one-standard-deviation demand shock. The dashed lines indicate the one-standard-deviation error bands.

Figure 4: Posteriors for impulse-response functions: Base prior
[Four panels: Y response to demand, P response to demand, Y response to supply, P response to supply.]
NOTE: Solid lines graph the posterior expected per cent change in output and prices subsequent to supply and demand shocks with a magnitude of one standard deviation. The dashed lines indicate the one-standard-deviation error bands. The dotted lines are the prior means.

Figure 5: Posteriors for YRD
[Four panels: Posterior (a), Posterior (b), Posterior (c), Posterior (d).]
NOTE: Solid lines graph the posterior expected per cent change in output subsequent to a one-standard-deviation demand shock. The dashed lines indicate the one-standard-deviation error bands. The dotted lines are the prior means.
Figure 6: Posteriors for LRYRD
[Four panels: Posterior (a), Posterior (b), Posterior (c), Posterior (d).]
NOTE: Solid lines graph the estimated posterior densities for the long-run change in output associated with a one-standard-deviation demand shock. The dashed lines graph the estimated prior densities.