The Delta Prior and Small Sample Distribution in Time Series*

Marek Jarociński
European Central Bank

Albert Marcet
Institut d'Anàlisi Econòmica CSIC and CEPR

Preliminary, this version: August 22, 2008

Abstract

This paper serves several purposes: i) to reconcile the Bayesian and classical views about the need for correcting upwards the OLS estimator in autoregressive models; ii) to propose the use of the delta prior, based on prior knowledge about growth rates of certain variables; iii) to develop techniques for translating prior knowledge about observables into priors about coefficients. All these purposes are closely related: the delta prior is based on the prior knowledge that most economic time series will not grow at arbitrarily high or low rates, which leads us to propose it as a widely acceptable Bayesian prior. We also show that the Bayesian and classical views are easily reconciled once we observe that the posterior from the delta prior adjusts the OLS estimate in the same direction as classical bias corrections, that frequentist small sample studies were implicitly assuming a delta prior, and that the so-called "exact" likelihood approach has the same effect. We find that the delta prior generates estimates that have lower frequentist risk than frequentist bias-corrected estimators, so they are attractive also from the classical perspective.

* We thank Manolo Arellano, Frank Schorfheide and Chris Sims for useful conversations and Harald Uhlig for his comments on an earlier version of this paper. All errors are our own. Albert Marcet acknowledges financial support from CREI, DGES (Ministerio de Educación y Ciencia), CIRIT (Generalitat de Catalunya) and the Barcelona Economics program of CREA. All computations were performed using Ox (Doornik, 2002). The opinions expressed herein are those of the authors and do not necessarily represent those of the European Central Bank. Contacts: [email protected] and [email protected].
A prior about the likely distribution of growth rates is a statement about features of the marginal distribution of the data. This cannot be used directly to build a posterior, so we need to convert this statement into a prior about the coefficients of the VAR. For this purpose, we propose a procedure for translating priors about observables into priors about coefficients that has not been used before and that has potential applications beyond the construction of the delta prior. We illustrate the use of the delta prior on a macroeconomic VAR of Christiano, Eichenbaum and Evans (1999) and compare its effects to those of the dummy observation priors of Sims and Zha (1998) and of the classical bootstrap approach of Kilian (1998).

Keywords: Time Series Models, Small Sample Distribution, Bias Correction, Bayesian Methods, Vector Autoregression, Monetary Policy Shocks
JEL codes: C11, C22, C32

Contents

1 Introduction
2 Why a Bayesian should agree that ρOLS is too small
  2.1 Frequentist bias corrections
  2.2 Views in the Bayesian literature
  2.3 Reconciling the two views under the delta prior
  2.4 Frequentist evaluation of the delta estimator
3 Translating Priors for Observables
  3.1 Defining the prior for coefficients
  3.2 Fixed point formulation
  3.3 Gaussian approximate fixed point
  3.4 Specifying priors on observables and accuracy of the obtained prior
4 Comparison with Other Priors
  4.1 Priors which adjust the OLS estimator for other reasons
  4.2 Priors similar to the delta prior
  4.3 Dummy observations priors
5 Empirical Example: the effect of monetary shocks in the US
6 Conclusions
Appendix A Initial conditions
  A.1 Initial conditions in the studies of bias
  A.2 The Bayesian approach: S-priors for the AR(1)
  A.3 The S-prior for VARs
Appendix B Marginal distribution of the observables implied by the prior on growth rates
Appendix C Our prior is not a reparameterization
Appendix D Analytical iteration on the mapping F for AR(1)
  D.1 Guess of the solution
  D.2 Approaching the prior by fixed point iteration

1 Introduction

Econometricians' opinions on how to model growing time series differ widely. Some recommend pre-testing for unit roots and, in multivariate contexts, cointegration, while others recommend strongly against it; some estimate in levels, others estimate in differences; some use classical inference based on asymptotics, others classical small sample methods, and others Bayesian estimation. Within the Bayesian camp there are several standard, automatic prior distributions whose virtues are hard to compare in general terms. Often these views imply very different approaches to the data and different empirical implications. We think, however, that most economists would find the following statement acceptable: most economic time series are unlikely to grow by a huge amount in a given period. Economists would agree that the growth rate of output is very unlikely to exceed, say, 50%. In this paper we propose to use this prior knowledge in the estimation of time series. We call the prior distribution that is derived from this prior the "delta prior".

Studying the delta prior has many side products. First, it reconciles Bayesian and classical small sample approaches to time series estimation.
Second, to the extent that it is a relatively uncontroversial and widely acceptable prior, inferences from the corresponding posterior should be widely acceptable. Third, translating the knowledge about growth rates into a standard Bayesian prior for model parameters is a non-trivial task, and it forces us to develop techniques that, to our knowledge, have not been used before. These techniques are likely to be useful for the implementation of other prior knowledge on observables. Fourth, we argue that the delta prior is in fact desirable from a classical point of view as well, since it delivers a smaller bias than OLS while having a lower mean squared error (MSE) than classical small sample approximate bias corrections, whose important selling point is precisely their improved MSE performance. Fifth, the delta prior provides a unified framework that allows the interpretation of several Bayesian priors proposed in the literature.

Section 2 reconciles Bayesian and classical small sample approaches. A growing literature worries about the inadequacy of standard inference in time series based on asymptotic theory. Various methods that perform better in small samples have been designed from a classical perspective. This literature often focuses on the well-known bias of OLS. But as emphasized by Sims and Uhlig (1991), a Bayesian with a flat prior sees no need to correct the bias. It seems that all an applied economist who would like to ignore small sample problems needs to do is state his or her wholehearted belief in the Bayesian approach and a flat prior. The choice between classical bias corrected estimators and Bayesian estimators is then purely a matter of faith in the classical or the Bayesian-flat-prior approach. Since most researchers (like us) do not feel strongly one way or the other and are willing to use whatever approach is likely to work, we think it is productive to show how to reconcile these two approaches.
This is easily done with the delta prior: we show that the Bayesian posterior obtained with this prior is no longer symmetric around the OLS estimate and that its mean can be considered "bias corrected". Furthermore, we argue that this is not surprising, because classical small sample studies were already incorporating a sort of delta prior in their assumptions. We also show that the Bayesian point estimate with the delta prior has a lower MSE than classical estimators designed to correct for the small sample bias. Therefore, delta prior estimation has all the virtues that a classical econometrician would ask for: a lower bias and a lower mean squared error than its competitors. Of course, it also has all the standard virtues of the Bayesian procedure, such as the ease of performing inference about functions of parameters (such as impulse responses), while constructing confidence intervals for functions of classical bias correcting estimators is tricky (see Sims and Zha (1999) for more discussion of this issue).

A substantial technical difficulty arises because the a priori statement that growth rates of the series have a certain distribution is a statement about observable series. Instead, the standard procedure in Bayesian econometrics is to formulate a prior for unobservable coefficients. In section 3 we show how to translate a prior for observable series into a standard prior distribution for parameters.1 This involves solving approximately a functional equation or, alternatively, a fixed point problem in the space of distributions. We show how to address this issue in a practical way and we propose an algorithm that works very well in the empirical applications we consider. The techniques discussed in this section are applicable to other cases where researchers naturally formulate prior views on the distribution of observable series.
In section 4 we compare the delta prior with other priors that have been used in the literature, including the Minnesota prior, Jeffreys' prior, "exact likelihoods" (which can also be interpreted as using a particular prior), and Sims' dummy observation prior. We think that the delta prior provides a common ground to unify views on these various priors. We would claim that other approaches drive up the estimated AR root in a more or less arbitrary way, while the delta prior achieves this in a well justified way, as a result of a prior which is easy to elicit and assess intuitively.

1 An excellent discussion of obtaining priors from statements about observables can be found in Berger (1985, Ch. 3.5). The techniques discussed in Berger's book are not applicable to the problem at hand, so we propose a technique that, to our knowledge, has not been used before. The dummy observation priors of Sims and Zha (1998) and Sims (2006) and the priors from DSGE models of Del Negro and Schorfheide (2004) are also related work in this spirit; in section 4.3 we discuss in detail the relationship of our technique with these approaches.

Section 5 discusses a practical application which illustrates the effects of the delta prior and compares it with the results obtained with other priors and classical procedures. We conclude in section 6. We discuss the relevant literature in various places along the paper.

2 Why a Bayesian should agree that ρOLS is too small

Throughout this section we use as an example the Gaussian AR(1) model with an intercept:

    yt = α + ρ yt−1 + εt    (1)

with εt i.i.d. N(0, σ²), which holds for t = 1, . . . , T. In order to simplify the setup further, we will always assume that ρ > 0, which is the most relevant situation in modeling macroeconomic time series. Ordinary least squares estimates of the coefficients of this model will be denoted by αOLS and ρOLS. In considering Bayesian point estimates we will always assume a square loss function in ρ.
This section discusses, first from the classical and then from the Bayesian perspective, the properties of OLS in time series. In the context of our example, the question boils down to whether ρOLS is likely to be too small compared with ρ, and thus needs to be adjusted upwards. We will argue that, contrary to a common belief, the controversy regarding the upward adjustment of ρOLS is unrelated to whether one takes a Bayesian (post-sample) or a frequentist (pre-sample) perspective. Instead, what is crucial is the prior information used implicitly in the classical analysis or explicitly in the Bayesian analysis. We will argue that this prior information is best elicited as a prior on the initial growth rates of the series y, and that an estimator using this prior is attractive for both Bayesians and frequentists.

2.1 Frequentist bias corrections

It is well known that the asymptotic distribution is not a good approximation to the small sample distribution in most time series applications. In particular, it has been known for a very long time that OLS is biased and that it is likely to underestimate the highest root of the process in the relevant case of a root close to 1.2 Therefore classical econometricians should find the use of OLS problematic in many applied time series studies. Related to the bias is the well known fact that the small sample distribution of OLS is asymmetric. These two facts have been largely ignored in much of the applied work based on classical inference, which often uses the asymptotic distribution.

The small sample bias of OLS may be important in VAR applications. First, as shown in Abadir et al. (1999), the bias increases with the dimension of the VAR. Second, the size of the impulse-response coefficients at lag τ is proportional to the root of the AR polynomial raised to the power τ. Therefore, a small deviation in the VAR coefficients is magnified in the impulse-response coefficients at medium lags.
A researcher who ignores this bias will often underestimate the persistence of the impulse response and, as a result, may be understating the economic relevance of his results.

We consider the frequentist small sample distribution of the OLS estimator (αOLS, ρOLS) when T observations are available. For the present example, we take: α = y0 = 0, ρ = 0.95 and T = 25. We also take σ = 1, but the distribution of ρOLS is unaffected by the choice of σ (see e.g. Result 2 in the Appendix). A Monte Carlo approximation of the distribution of ρOLS is shown in Figure 1.

[Figure 1: Distribution of ρOLS when ρ = 0.95, sample size is 25 and α = y0 = 0. Approximation on the basis of 100,000 simulated series.]

This is a familiar picture: the distribution of ρOLS is skewed to the left and this estimator is biased downwards. The reasons for this bias have been described at length in the literature, so we do not repeat them here.2 In the above example E(ρOLS) = 0.85. Furthermore, the effect of the bias is magnified when considering moving average representations: according to the mean OLS estimate the impulse response coefficient at lag 20 is 0.04, instead of 0.36 with the true coefficient. This suggests that ρOLS should be adjusted upwards to compensate for the bias.

2 Some of the earliest references are Quenouille (1949), Hurwicz (1950), Marriott and Pope (1954) and Kendall (1954). A general characterization of the effects of the bias on the highest root is in Stine and Shaman (1989).

A wealth of procedures for correcting the bias have been designed. Among others: Quenouille (1949) proposed a jackknife estimator. Orcutt and Winokur (1969) and Roy and Fuller (2001) use analytical expressions of the bias to construct bias corrected estimates.
Andrews (1993) numerically obtains a median-unbiased estimator, which is extended to approximately median-unbiased estimation in higher order autoregressions in Andrews and Chen (1994). MacKinnon and Smith (1998) consider various strategies of correcting the bias with the bootstrap. A bootstrap bias correction is also used for VARs in Kilian (1998). We will use the generic name "bias-correcting" (BC) estimators for the estimators derived in this literature.

The proposed bias corrections are quite successful in improving the properties of OLS. First, they eliminate most of the bias.3 Second, numerous Monte Carlo studies demonstrate that BC estimators have lower mean squared error (MSE) than OLS in a relevant range of parameter values (which, in the case of the AR(1), is approximately 0.5 < ρ ≤ 1). For examples of such Monte Carlo studies see MacKinnon and Smith (1998), Roy and Fuller (2001) or section 2.4 later in this paper.

However, there are several problems that limit the appeal of this literature. The first issue is the lack of a clear optimality property of any of the BC estimators. We do not know if they are most efficient within any class (and, with the exception of Andrews (1993), they are not even unbiased). (Approximate) unbiasedness gives an impression of impartiality, and thus sounds like an attractive feature for an estimator, but it has no decision theoretic justification and needs to be specified as a goal in itself. Therefore, we do not know how to trade it off against efficiency. This is a key question because, while BC estimators are both less biased and more efficient than OLS, as we will show in this paper even larger gains in MSE can be obtained by sacrificing some of the bias correction. Another problem is that often in time series models we are interested in nonlinear functions of parameters, such as impulse responses, whose bias and MSE need to be addressed separately.
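As an illustration of how such corrections work, here is a minimal sketch of a constant-bias bootstrap correction in the spirit of MacKinnon and Smith (1998) and Kilian (1998). It omits the refinements of the published procedures (for example, Kilian's adjustment that keeps the corrected estimate in the stationary region) and treats σ as known for simplicity; all names and settings are illustrative.

```python
import numpy as np

def simulate_ar1(alpha, rho, sigma, T, y0, rng):
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(1, T + 1):
        y[t] = alpha + rho * y[t - 1] + rng.normal(0.0, sigma)
    return y

def ols_ar1(y):
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    return np.linalg.lstsq(X, y[1:], rcond=None)[0]

def bootstrap_bc(y, sigma, n_boot, rng):
    """Constant-bias bootstrap correction of rho_OLS (sketch)."""
    a_hat, r_hat = ols_ar1(y)
    boot = [ols_ar1(simulate_ar1(a_hat, r_hat, sigma, len(y) - 1, y[0], rng))[1]
            for _ in range(n_boot)]
    bias = np.mean(boot) - r_hat   # estimated bias of rho_OLS at (a_hat, r_hat)
    return r_hat - bias            # subtract it from the original estimate

rng = np.random.default_rng(1)
y = simulate_ar1(0.0, 0.95, 1.0, 25, 0.0, rng)
a_hat, r_hat = ols_ar1(y)
r_bc = bootstrap_bc(y, 1.0, 2000, rng)
print(f"rho_OLS = {r_hat:.3f}, bias-corrected = {r_bc:.3f}")
```

Because the bootstrap bias of ρOLS at the estimated parameters is negative, subtracting it moves the corrected estimate upwards, which is exactly the direction of adjustment discussed in this section.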
Yet another problem lies in the complications of conducting inference (confidence intervals, hypothesis tests) with BC estimates. These are standard Bayesian arguments raised against frequentist practices.

3 Exact mean-unbiasedness cannot be achieved, because the bias is a non-linear function of the true parameter, which is unknown. The only exact bias correction exists for the median bias, and only in the AR(1) case (Andrews, 1993).

An issue which we would like to stress in particular is the dependence of the frequentist bias and the derived bias corrections on the assumed initial conditions. To illustrate this dependence, consider changing the previous example. Let us say we change only α while keeping y0 = 0 (we could equivalently do the reverse and change y0 while keeping α constant). The distribution of ρOLS for the case α = 2 (and α = 0 for comparison) is shown in Figure 2.

[Figure 2: Density of ρOLS when ρ = 0.95, sample size is 25, y0 = 0, α = 2 (and the density for α = 0 for comparison). Approximation on the basis of 100,000 simulated series.]

As can be seen, now OLS looks like a much better estimator and its distribution is almost symmetric. The required bias correction would now be much smaller than in the initial example. The reason is that since we kept y0 = 0, the long run mean of yt is now given by α/(1 − ρ), which is much higher than the initial condition. As a result, y grows fast, especially in the first periods, as it converges towards this mean. We now provide an intuitive argument to show that it is these large growth rates of y that cause OLS to be such a good estimator.4

Figure 3 displays two realizations of the process y, picked from the simulations underlying Figures 1 and 2. The first column of graphs plots yt against time. The second column plots Δyt against time.
The third column shows a scatterplot of the right-hand-side variable (yt−1) against the left-hand-side variable (yt) in the regression in equation (1) for this realization. The two rows of graphs differ only in the constant term α, while all other parameters (including y0 = 0) and random errors are the same in both rows. The first row assumes α = 0, while the second row presents a process with α = 2. The solid line in the scatterplots of the last column is the true regression line implied by the true parameters in (1), while the dashed line is the fitted regression estimated by OLS for this realization.

4 A similar "improvement" of the OLS estimator would occur if we kept α = 0 but lowered y0.

[Figure 3: Two cases of the AR(1) process and the performance of the OLS estimator of the coefficients. The first column plots yt against time. The second column plots the first difference of yt against time. The third column shows scatter plots of yt against yt−1, along with true (α + ρyt−1) and fitted (αOLS + ρOLS yt−1) regression lines.]

The series in the upper row starts at its long run mean and fluctuates around it. The slope of the dashed line in the last graph of the first row is lower than that of the actual regression line, reflecting the OLS bias which results in ρOLS < ρ. Notice that the second graph in the first row indicates that the realization contains small growth rates. In the second row, by contrast, the assumed initial condition implies that the series starts 40 standard errors away from the long run mean. The transition from the remote starting value to the steady state dominates the behavior of the series. It results in much higher dispersion of the values of the explanatory variable yt−1.
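This point can be checked numerically. The sketch below (an illustration with 10,000 replications, not the paper's code) compares the sampling distribution of ρOLS under α = 0 and α = 2, both starting from y0 = 0, as in Figures 1 and 2:

```python
import numpy as np

def rho_ols_draws(alpha, rho, sigma, T, y0, n_sim, seed):
    """Monte Carlo draws of rho_OLS for the AR(1) with intercept."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_sim)
    for i in range(n_sim):
        y = np.empty(T + 1)
        y[0] = y0
        for t in range(1, T + 1):
            y[t] = alpha + rho * y[t - 1] + rng.normal(0.0, sigma)
        X = np.column_stack([np.ones(T), y[:-1]])
        out[i] = np.linalg.lstsq(X, y[1:], rcond=None)[0][1]
    return out

d0 = rho_ols_draws(0.0, 0.95, 1.0, 25, 0.0, 10_000, 0)
d2 = rho_ols_draws(2.0, 0.95, 1.0, 25, 0.0, 10_000, 0)
print(f"alpha=0: mean {d0.mean():.3f}, std {d0.std():.3f}")
print(f"alpha=2: mean {d2.mean():.3f}, std {d2.std():.3f}")
```

With α = 2 the series starts far below its long run mean of α/(1 − ρ) = 40, the regressor is highly dispersed, and the draws of ρOLS come out both far less biased and far less dispersed than with α = 0.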
It is well known that when the values of the explanatory variable are highly dispersed, OLS is a very good estimator.5 This is why the fitted regression in the last graph of the second row is much closer to the truth. Notice from the plots of the first difference of the series that y grows unusually fast, especially in the first periods, compared to the long-run growth rate. These large initial growth rates help OLS perform well. Therefore, one should not be surprised to learn that the small sample distribution of ρOLS is much more symmetric and narrow in the case of the second row, as was displayed in Figure 2.

A majority of the cited frequentist studies assume that the first observation is drawn from the stationary distribution of the process (which in our example is N(α/(1 − ρ), σ²/(1 − ρ²))). This is complemented with some arbitrary assumption for the case of |ρ| ≥ 1, when the stationary distribution does not exist. That is, in the context of our Figure 3, these studies explicitly discard situations like the one in the lower panel. Obviously, this has dramatic effects on the conclusions regarding the size of the bias. The importance of the initial conditions extends also to unit root testing, as has been discussed in Müller and Elliott (2003). It is hard to assess whether these assumptions are plausible in the applied problem at hand, and probably for this reason the issue is rarely discussed. However, we consider it a strength of the frequentist approach that it forces researchers to consider the initial conditions explicitly. A Bayesian is free to condition on the initial observation and neglect any reflection about the initial conditions.

2.2 Views in the Bayesian literature

As is well known, using a flat prior for the regression coefficients results in a posterior distribution which is symmetric and centered at the OLS estimate. Therefore OLS gives the optimal point estimate under a standard loss function.
This result holds regardless of whether all regressors are exogenous or some of them happen to be lagged dependent variables. Therefore, for a Bayesian who has a flat prior, OLS needs no adjustment upwards in autoregressive models, despite the presence of a small sample bias.

Sims and Uhlig (1991) use the AR(1) model without intercept as an illustration of the fact that the presence of a bias is not a compelling reason to adjust the estimator.6 They study the joint distribution of the true coefficient ρ and its OLS estimate ρOLS, implied by the model and a uniform prior (they display this bivariate distribution from different angles in their helicopter tour plots). This joint distribution embeds both the skewed, biased distribution of ρOLS conditional on ρ, and the Bayesian quasi-posterior distribution of ρ conditional on the OLS estimate (which should be distinguished from the usual posterior, i.e. the distribution conditional on the data). They point out that the distribution of the autoregressive coefficient ρ conditional on its OLS estimate ρOLS is symmetric around ρOLS, in spite of the bias of ρOLS. This may be counterintuitive at first,7 and it highlights the importance of carefully distinguishing the pre-sample and post-sample perspectives.

5 In other words, if the variance of the explanatory variable is high, the matrix (X′X)−1 is small, causing the variance of the OLS estimator to be small.

6 Sims and Uhlig (1991) also formulate other criticisms of the classical approach. In particular, they point to the inconsistency that arises from the fact that the asymptotic distribution of OLS is discontinuous at exactly a unit root. This criticism pertains to the use of asymptotic theory in the classical approach. The small sample distribution of OLS is nicely continuous at a unit root, so it is not subject to the "discontinuity" criticism. We do not discuss this issue further because our objective is to compare the small sample classical approach with Bayesian approaches.

7 The intuition they provide is that the bias towards zero is compensated by the increasing precision of OLS as the autoregressive parameter increases. Since these effects - the bias and the increasing precision - exactly cancel, the distribution of the autoregressive parameter conditional on the OLS estimate is symmetric around the OLS estimate. However, in the model with an intercept the small sample bias of OLS is much larger, while the precision of OLS increases less with the size of the autoregressive parameter (as has been pointed out e.g. in Andrews, 1993, p.158). Therefore, the above intuition about canceling effects does not extend to the specification with the constant term.

There are well known philosophical objections to ever using the pre-sample perspective (i.e. failing to condition on the observed sample). The Likelihood Principle implies that samples which have not occurred should not affect the inference with the actual sample, and violating this rule can lead to very unreasonable inferences (for an excellent discussion of these issues and examples see e.g. Berger and Wolpert (1988)). Therefore, seeing ρOLS as too low, or as appropriate, which seemed a very mundane practical issue, is actually a matter of foundational principles. A researcher who takes a pre-sample (frequentist) perspective thinks that the actual ρ is likely to be higher than ρOLS, while a researcher who takes a post-sample (Bayesian) perspective (and is willing to specify his marginal prior beliefs about ρ as uniform) believes that ρ is equally likely to be higher or lower. Note that, like Sims and Uhlig (1991), we are abstracting for now from the situation where a Bayesian has some prior ideas about ρ which are not well characterized by a uniform prior, because such ideas would obviously shift the posterior away from the OLS estimate. Such non-uniform priors could arise either from prior information or from an alternative specification of a noninformative prior, namely Jeffreys' prior (see Phillips, 1991). We comment on Jeffreys' prior later in the paper.

We think that this part of Sims and Uhlig's paper is frequently misinterpreted. The misinterpretation is that, in general, in autoregressions the decision whether to adjust the OLS estimate upwards or not is a matter of taking a pre-sample or a post-sample perspective (as long as one's prior for ρ is flat). This suggests a deep dichotomy between the frequentist and the Bayesian treatment of time series models. But in fact, it is only a peculiarity of the AR(1) model without the constant term that flat-prior Bayesians never choose to adjust ρOLS upwards, independently of the initial condition. In a more general autoregressive model, there is no such dichotomy between the two schools. The decision on adjusting ρOLS upwards or not depends on the initial conditions one assumes. Both frequentists and flat-prior Bayesians will agree that ρOLS is too small once the initial observation is assumed to be close to the deterministic component of the process. Both frequentist and flat-prior Bayesian point estimates will approach the OLS estimate as the initial deviation from the deterministic component increases. The claim that the frequentist bias corrections of the OLS estimate depend on the initial condition was discussed in the previous section. The claim that the same is true for Bayesians (although their corrections of the OLS estimate are, in general, different from bias corrections) will be substantiated in the next section.
In fact, in practical applications Bayesians have not ignored the frequentist evidence that OLS underestimates the root of the process. The most popular priors used routinely in VARs, the Minnesota prior of Doan et al. (1984) and the 'dummy observations' priors of Sims, are designed to push the posterior towards the unit root and thus reduce the bias of the posterior mean compared with the OLS estimate. These priors are discussed in detail in section 4 of this paper. The motivations for these priors sometimes sound quite close to worries about the bias of OLS. For example, Sims and Zha (1998, p.959) justify their priors by the need to correct some undesirable features of the posterior which are the "other side of the well-known bias toward stationarity of least-squares estimates of dynamic autoregressions." However, these priors are experience-based devices for improving forecasting performance, and it is difficult to elicit them or assess whether they are reasonable in themselves. Instead, this paper proposes an intuitive prior which addresses the problems of the OLS estimator.

2.3 Reconciling the two views under the delta prior

We have illustrated in section 2.1 that the key issue in determining the size of the problems with the OLS estimator is the initial condition, i.e. the relation of the first observation in the sample to the parameters of the model. We find it difficult to specify such initial conditions, and we find the solutions in the literature arbitrary. How can we distinguish reasonable initial conditions from unreasonable ones? Our proposal is to think in terms of the implied dynamics of the series initialized at such initial conditions. Therefore, instead of specifying the distribution of the first observation, we specify a prior about the initial growth rates of the process, conditional on the first observation. Most economists would agree that economic time series are unlikely to grow by a huge amount in a given period.
If asked to deliver a precise statement about their beliefs, economists might produce a distribution for the growth rate in, say, period 1. For example, the answer might be8,9

    Δy1 ∼ N(µΔ, σΔ²)    (2)

for some µΔ, σΔ². Recall that y0 is a given constant.

A Bayesian would like to use this prior information and combine it with model (1) to form a posterior distribution on the parameters α, ρ. The prior statement (2) is not a prior distribution on the unobserved parameters α, ρ, as is usually required in the application of Bayes' rule. For this purpose we have to translate (2) into a prior distribution on α, ρ. Deriving a prior for coefficients from a prior for observable series is in general not a trivial task; much effort will be devoted to this issue in the next section. For the AR(1) case, however, this amounts to a reparameterization of the model. Given our model we have:

    Δy1 = α + (ρ − 1)y0 + ε1    (3)

First we guess that α + (ρ − 1)y0 has a normal distribution, independent of ε1. Second, taking the expectation and the variance on both sides of (3) and using the standard assumption that the distribution of the coefficients α, ρ is independent of the ε's, we can write

    µΔ = E(α + (ρ − 1)y0)    (4)
    Var(Δy1) = Var(α + (ρ − 1)y0) + σ²    (5)

Putting all this together we find that our prior for the coefficients has to satisfy:

    α + (ρ − 1)y0 ∼ N(µΔ, σg²)    (6)

for σg² = σΔ² − σ². Under our guess that α + (ρ − 1)y0 has a normal distribution independent of ε1, we see that indeed any distribution for α, ρ that satisfies (6) is consistent with (2).

8 As usual, we consider logs of growing variables, so that Δy is the growth rate.

9 The assumption of normality is convenient in this section. The techniques discussed in section 3 and applied in section 5 do not need this assumption at all.
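The consistency of (6) with the prior statement (2) can be checked by simulation. The sketch below uses hypothetical values µΔ = 0.02, σΔ = 0.10 and σ = 0.05 (chosen only for illustration, and satisfying σΔ ≥ σ), draws the coefficients from a prior satisfying (6) with y0 = 0, and verifies that the implied Δy1 has the distribution stated in (2):

```python
import numpy as np

rng = np.random.default_rng(0)

mu_d, sigma_d, sigma = 0.02, 0.10, 0.05   # hypothetical prior and error s.d.
sigma_g = np.sqrt(sigma_d**2 - sigma**2)  # from sigma_g^2 = sigma_d^2 - sigma^2

y0 = 0.0
n = 200_000
alpha = rng.normal(mu_d, sigma_g, n)   # prior satisfying (6) when y0 = 0
rho = rng.uniform(0.0, 1.2, n)         # flat prior on rho over an arbitrary range
eps1 = rng.normal(0.0, sigma, n)

dy1 = alpha + (rho - 1.0) * y0 + eps1  # equation (3)
print(f"mean of dy1: {dy1.mean():.4f} (target {mu_d})")
print(f"s.d. of dy1: {dy1.std():.4f} (target {sigma_d})")
```

Note that with y0 = 0 the draws of ρ drop out of (3), which is exactly why the prior for growth rates pins down only the constant term in this case.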
Condition (6) can be viewed either as a prior distribution of ρ conditional on α or as a prior distribution of α conditional on ρ. In order to have a joint prior distribution for α, ρ we need to complement it with the respective marginal distribution. If no additional prior knowledge is available, it is natural to complement this condition with a flat prior for either ρ or α. The convenient feature of this simple example is that the posterior will be the same in either case, because the kernel of the prior is the same. A word of caution is in order: since we take the variance of ε as known throughout the paper, for a prior for growth rates to be compatible with the model we need σ²_∆ ≥ σ²_ε. If this is not the case then (2) is incompatible with the model: there is no prior for the coefficients that is consistent with the model. This possibility of non-existence of the underlying prior for coefficients will be an issue throughout the paper. We call (6) the 'delta' prior. To insist on our semantics: throughout the paper we call a prior statement on observables such as (2) a "prior for growth rates". We reserve the name "delta prior" for the distribution of the unobservable coefficients that is consistent with the prior for growth rates. In the case y0 = 0 (which we will use in this section) the prior for growth rates simply translates into a prior for the constant term, which can be completed e.g. with a flat prior for ρ:

α ∼ N(µ_∆, σ²_g)    (7)

Let us illustrate the effect of the delta prior in a "helicopter tour" graph used by Sims and Uhlig (1991), which shows the joint distribution of the OLS estimate ρ_OLS and the actual parameter ρ. Sims and Uhlig use this graph to illustrate that, in spite of the small sample bias, the posterior conditional on ρ_OLS is symmetric and centered around ρ_OLS. The first row of Figure 4 illustrates the point of Sims and Uhlig (1991).
We reproduce (up to sampling error) two of their graphs, which are generated for the case of an AR(1) without the constant term. The cross-sections along the fixed-ρ lines represent the small sample distribution of ρ_OLS given ρ. The cross-sections along the fixed-ρ_OLS lines represent the distribution of ρ given a value of the estimate ρ_OLS; these cross-sections are a summary of the posterior distribution of ρ given ρ_OLS. These cuts reflect two well known facts: i) the small sample distribution of OLS under the classical approach is biased downwards and skewed; ii) the posterior under a flat prior is symmetric and centered at the OLS estimator. In the subsequent two rows we present analogous graphs for the case with the constant term and with the delta prior given by (7).¹⁰ We take µ_∆ = 0 and we consider two values for σ_∆. The plots in the second row of Figure 4 are generated with a very loose delta prior, with σ_∆ = 1, in order to approximate the effect of the flat prior (we cannot draw from the flat prior, which is not a proper distribution). The third row of Figure 4 illustrates that in the model with an intercept, when reasonable initial conditions are assumed, both frequentists and Bayesians agree that the OLS estimate is too low, and wish to correct it upwards. These plots are generated assuming σ_∆ = 0.1, which rules out with 95% probability initial 'systematic' (not caused by ε_1) growth rates outside the (−20%, 20%) range. We think that this is a very reasonable requirement for most growing macroeconomic time series. The frequentist small sample distribution on the left plot shows a downward bias, which is much stronger than in the model without an intercept. A frequentist would like to correct the OLS estimate upwards to compensate for this bias. The posterior generated with this prior is also far from being symmetric around the OLS estimate: it is shifted towards higher values.¹¹ In this case, given the OLS estimate of 0.95, a Bayesian would believe that the true parameter ρ is around 0.975, adjusting the OLS estimate upwards as a frequentist would. The plots in the second row illustrate that, as noted in the previous section, as the initial condition becomes less binding, both the frequentist small sample distribution and the Bayesian posterior distribution concentrate around the OLS estimate. In fact, they converge to a spike at 0.95 for the exactly flat prior for the initial growth rate. Therefore, the posterior mean of ρ conditional on ρ_OLS is indeed centered at the OLS estimate when the prior is flat. But this case should not be confronted with the classical small sample analysis under reasonable initial conditions; it should be confronted with classical small sample analysis under no restriction on initial conditions. A flat prior says that the probability that |∆y_1| is larger than, say, 20% is much bigger than the probability that |∆y_1| is lower than 20%. Alternatively, one could say that a flat prior implies σ²_∆ = ∞. This is not an "uninformative" prior; on the contrary, it informs the researcher that OLS is quite a good estimator. Only an analyst who really thinks that very high growth rates are very likely should use OLS. An analyst thinking that very high growth rates are impossible or very unlikely should not use OLS; instead he should incorporate this information into the estimation.

Figure 4: Views of joint frequency distributions of (ρ, ρ_OLS) generated with alternative assumptions. [Three rows of surface plots with cuts along ρ_OLS|ρ = 0.95 and ρ|ρ_OLS = 0.95: no constant with a flat prior; with constant and a looser delta prior; with constant and a tighter delta prior.]

¹⁰ Adapting the procedure of Sims and Uhlig (1991) to the model with the constant term, in the prior for growth rates we take the values of ρ on a grid and draw values of α from the prior (7). For each draw of (ρ, α) we simulate the process given by (1) starting from y0 = 0. We generate 10000 data vectors of length T = 100. For each data vector we compute α_OLS, ρ_OLS. Lining up the histograms of the ρ_OLS's we obtain a surface which is the joint p.d.f. of ρ and ρ_OLS under the delta prior for α and a flat prior for ρ. The images seen from our helicopter show a slightly different concept from those seen from Sims and Uhlig's helicopter: in our case both α and ρ are underlying parameters, so when we condition on ρ we are in fact integrating over the distribution of α.
¹¹ Note that ρ increases from right to left on the horizontal axis of this plot. We have followed Sims and Uhlig in this convention because it allows us to switch smoothly between the two cross-sections of the plot in the 'helicopter tour' fashion, without having to flip one dimension.

2.4 Frequentist evaluation of the delta estimator

Our discussion of the connection between the Bayesian and classical views may have seemed purely academic and theoretical, but our purpose is very practical: having put both procedures on the same ground, we can now compare their advantages. We now study the performance of the delta estimator (the point estimate obtained with the delta prior under a loss function quadratic in the error in ρ) from the classical point of view.
We will see that the delta estimator, although inspired by Bayesian principles, can also be justified from a classical perspective. Classical bias corrections (as discussed earlier) have an element of arbitrariness in that a full bias correction is never achieved. For that reason, in the end, much of the literature on bias correction (BC) ends up checking the virtues of an estimator in terms of the MSE reduction it achieves for certain 'relevant' parameter values. We show that the delta estimators, in fact, achieve a substantially lower mean square error for a wide range of parameter values that includes the ranges considered of interest in many BC papers. The Bayesian posterior mean is guaranteed to have the lowest MSE on average over the parameter space, where the average is taken with the weights given by the prior. Classical evaluations, however, focus on estimators' performance for specific parameter values within relevant ranges. We show the bias and MSE properties of the delta estimator compared to alternative estimators from this point of view. To study the frequentist properties of the delta estimator in model (1) we repeat the Monte Carlo study of MacKinnon and Smith (1998, section 5), adding the delta estimator. In order to highlight the small sample problems, we take the case of T = 25. As in that paper, the initial condition is generated as y0 = α/(1 − ρ) + ε with ε ∼ N(0, σ²_ε). This ensures invariance of the results with respect to α and σ_ε (see Appendix A). To compute the values in the following figures we took 100,000 realizations of the process for each value of ρ = 0, 0.05, . . . , 1.2. We consider the delta estimators following from the prior for growth rates stating that the initial growth rate is normally distributed with mean zero (as in equation (2)), with standard deviations 0.2 (the estimator called delta1) and 0.05 (the estimator called delta2).
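A scaled-down version of this Monte Carlo design can be sketched as follows. The delta estimator is implemented here as the posterior mean of (α, ρ) under the prior (6) with µ_∆ = 0, i.e. a gaussian restriction on the combination α + (ρ − 1)y0, flat in the orthogonal direction; the sample size, parameter values and number of replications are illustrative and much smaller than in the paper's experiment.

```python
# Sketch of the section 2.4 Monte Carlo: frequentist bias of OLS versus the
# delta posterior mean in an AR(1) with intercept, T = 25, initial condition
# y0 = alpha/(1 - rho) + eps as in MacKinnon and Smith (1998).
# Illustrative values only; the paper uses 100,000 replications and a grid of rho.
import numpy as np

rng = np.random.default_rng(1)
T, rho, alpha, s_e = 25, 0.9, 0.0, 0.05
s_d = 0.2                      # prior s.d. of the initial growth rate (as in delta1)
s2_g = s_d**2 - s_e**2         # variance of the systematic part, equation (6)

def simulate():
    y = np.empty(T + 1)
    y[0] = alpha / (1 - rho) + rng.normal(0, s_e)   # the paper's initial condition
    for t in range(1, T + 1):
        y[t] = alpha + rho * y[t - 1] + rng.normal(0, s_e)
    return y

ols_draws, delta_draws = [], []
for _ in range(2000):
    y = simulate()
    X = np.column_stack([np.ones(T), y[:-1]])       # regressors: constant, lagged y
    b_ols = np.linalg.solve(X.T @ X, X.T @ y[1:])
    # prior (6) with mu_d = 0: alpha + rho*y0 is centered at y0 with variance s2_g
    R = np.array([[1.0, y[0]]])
    prec = X.T @ X / s_e**2 + R.T @ R / s2_g        # posterior precision
    b_delta = np.linalg.solve(prec, X.T @ y[1:] / s_e**2 + R.flatten() * y[0] / s2_g)
    ols_draws.append(b_ols[1])
    delta_draws.append(b_delta[1])

# compare the two biases (cf. Figure 5); OLS is clearly biased downwards
print(np.mean(ols_draws) - rho, np.mean(delta_draws) - rho)
```

The OLS bias at these settings is roughly −(1 + 3ρ)/T ≈ −0.15, visible already with a few thousand replications.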
Figure 5: Bias of the OLS, CBC and two delta estimators in a Monte Carlo experiment, sample size T = 25. [Plot of E(ρ̂) − ρ against ρ ∈ (0.4, 1.2) for OLS, CBC, delta1 and delta2.]

Figure 6: RMSE of the OLS, CBC and two delta estimators in a Monte Carlo experiment, sample size T = 25. [Plot of RMSE(ρ̂) against ρ ∈ (0.4, 1.2) for OLS, CBC, delta1 and delta2.]

The performance of the delta estimator is compared with that of OLS and the constant-bias-correcting (CBC) estimator, in the nomenclature of MacKinnon and Smith (1998).¹² Figure 5 shows the bias of each considered estimator at each true value of ρ.¹³ As expected, the largest bias is exhibited by the OLS estimator. The CBC estimator is only approximately unbiased, so the picture reflects that its bias is not zero, but CBC reduces the bias considerably relative to OLS. We can see that for high values of ρ the bias of the delta estimator is in between that of OLS and CBC. Therefore, the Bayesian estimator achieves an approximate bias correction in this parameter region, although the correction is not as precise as with the CBC estimator. However, less biased estimators can have larger MSE. Figure 6 reports the root MSE of the three estimators under consideration at various values of ρ. For positive ρ, the CBC has lower MSE than the OLS estimator when ρ is above 0.5.¹⁴ The figure shows that the delta estimators beat OLS and the CBC for sufficiently high values of ρ; this range surely contains the relevant roots in practice for possibly non-stationary series. Actually, the delta estimator can be substantially better: for example, at ρ = 1 the RMSE under CBC is around 30% larger than with the delta2 estimator. This loss in efficiency is very large, close to the one that would result from throwing away half of the sample.
Notice that, in this case, all the cards seem to be stacked in favor of the CBC: the bias correction is obtained taking into account the actually used initial condition. In the real world this (or any other) assumption on the initial condition is quite unwarranted, so we would expect CBC to be at an even larger disadvantage in practice. We have repeated the Monte Carlo study using other initial conditions used in the literature (discussed in Appendix A.1) and we obtain similar results. Our conclusion is that, as long as a classical researcher is willing to say that the true ρ stays between, say, 0.7 and 1.1, the delta estimator is a better alternative than available bias corrections, even from a classical point of view. Unfortunately, outside the case of a simple AR(1) with the prior on only the first period growth rate, the delta posterior cannot be computed easily with currently available techniques. In the next section we take up a number of technical problems that arise when trying to elicit the delta prior in large scale VARs.

¹² Kilian (1998) uses the same estimator to improve the small sample properties of inference with VARs.
¹³ We only report positive values of ρ. Since we are concerned with possibly non-stationary series, we ignore negative values of ρ.
¹⁴ This threshold is approximately 0.5 also for other sample sizes, for more sophisticated bootstrap bias-correcting estimators (see MacKinnon and Smith, 1998, Figures 4 and 6), and for other bias-correcting estimators (see Roy and Fuller, 2001, Tables 1 and 3).

3 Translating Priors for Observables

Standard priors used in Bayesian analysis are distributions of unobserved parameters. A prior about the likely behavior of the growth rates of a series, such as (2), is instead a prior about observables. The practical question, which we study in this section, is how to translate a prior about observables into a (standard) prior distribution of unobservable parameters.
The discussion and the methods we propose in this section are not limited to the case of priors about growth rates; they are quite general and can be useful whenever either experience or formal economic models carry information about the likely behavior of observable series. The special case where the prior for the observables takes the form of the one-step-ahead predictive distribution, conditional on the known right-hand-side variables, is known as the 'predictive approach' to elicitation. This approach to obtaining priors (as opposed to the 'structural approach', which involves specifying directly the prior for the model coefficients) has been advocated, in a general context, e.g. in Kadane (1980). A widely quoted practical procedure for obtaining priors for parameters in this way is proposed in Kadane et al. (1980), and has been applied in the time series context in Kadane et al. (1996). Our prior from the previous section, which specifies the first period growth rate conditional on the initial observation, can also be handled with these methods. However, the tools of the 'predictive approach' are not applicable in the case of priors about growth rates over several periods, which will be studied and motivated in this section. A discussion of more general techniques for translating priors on observables into priors for coefficients can be found in Berger (1985, Chapter 3.5). These techniques, however, are hard to apply to the large scale time series models currently used in practice, so we develop a novel approach. We note at this point that translating the marginal distribution of observables into a prior for parameters is not simply a reparameterization; this is discussed in more detail in Appendix C.

3.1 Defining the prior for coefficients

Let us consider a general N-dimensional stochastic process {y_t}.
We define Y ≡ [y_1, ..., y_T]′ as a T × N matrix gathering the random variables from which the sample of T observations is drawn, while Y also denotes the actual observed realization of these variables. The joint probabilities for the stochastic process are given by a model that determines the likelihood function L_{Y|B}(Y; B), which gives the density of observing a realization Y for a parameter value B. We assume the researcher knows L_{Y|B}. We now assume that the researcher is willing to state a prior distribution on Y,

ϕ_Y    (8)

representing the likely behavior of the series that the researcher has in mind before analyzing the sample. It is, therefore, a marginal distribution of the observable data. The researcher's knowledge of the likelihood L_{Y|B} implies that the uncertainty represented in ϕ_Y is a combination of the analyst's uncertainty about the actual values of the coefficients B and the error terms of the model. Therefore, knowledge of L_{Y|B} and ϕ_Y implicitly defines a marginal distribution of the parameters f_B which is consistent with these functions. The task of obtaining the prior for coefficients means starting from the known distribution ϕ_Y and the likelihood L_{Y|B}, and finding the prior distribution of parameters f_B that is consistent with them. A standard formula gives that the desired prior f_B satisfies

∫ L_{Y|B}(Y; B) f_B(B) dB = ϕ_Y(Y)   for all Y    (9)

Obviously, this just says that the integral over B of the joint distribution of (Y, B) has to equal the marginal distribution of Y specified by the prior for observables (8). Thus (9) gives a functional equation that defines f_B in terms of the functions ϕ_Y and L_{Y|B}. Equations of this type are known in calculus as inhomogeneous Fredholm equations of the first kind. Obviously, the above problem need not have a solution for arbitrary ϕ_Y and L_{Y|B}.
For example, we already pointed out in the AR(1) case analyzed in subsection 2.3 that for (2) to be consistent with model (1) we needed σ²_∆ ≥ σ²_ε; if this fails, the researcher's belief in ϕ_Y is simply incompatible with the model L_{Y|B}, and the researcher is asking the model to do something it cannot do. Other cases where no solution for f_B exists, or where the solution is uninteresting, will be discussed as we go along. As we will see later, in practice this is rarely a problem, because even if an exact solution does not exist one can often find a prior for parameters which approximates the desired distribution ϕ_Y to a satisfactory degree. Finding an f_B that solves the above functional equation is not trivial. Only in special cases can a prior be obtained analytically, as in the AR(1) case discussed in Appendix D. As discussed in Appendix C, translating the prior for observables into the prior for coefficients is not a matter of reparameterization using the usual change-of-variable formula. The reason is that the presence of the random errors (ε), in addition to the random parameters, complicates things. In principle, the random errors could be integrated out, but for that a joint distribution of the errors and observables would have to be specified that is consistent with the assumption of independence of the parameters and the errors. Specifying such a distribution may not be easier than solving the integral equation (9). It is easier to find numerically an f_B that satisfies (9) approximately. One may be tempted to obtain such a numerical approximation by discretizing the distributions L_{Y|B} and ϕ_Y and viewing (9) as a linear system determining a discretized version of f_B.
However, this approach is difficult to apply in practice: the resulting discretized f_B often has negative entries, and the linear system turns out to be often ill-conditioned, so that a very large mass of the distribution can be shifted between values of B very far away from each other depending on how exactly the discretization is performed.¹⁵ We do not exploit this route in this paper; instead, we propose the following.

3.2 Fixed point formulation

We now reformulate the problem of defining f_B as a fixed point problem. This serves several purposes. First, it suggests an algorithm to find an approximate prior by successive iterations that are quite easy to perform and that work well in the examples we have tried; these approximations deliver a normal distribution which is easy to use in practice. Second, this formulation will be useful in section 4 to compare the delta prior with other priors proposed in the literature. Let g be any density defined on the parameter space of B. Define the functional F as

F(g)(B) ≡ ∫ [L_{Y|B}(Y; B) g(B) / (L_{Y|B} ∗ g)(Y)] ϕ_Y(Y) dY   for all B    (10)

where

(L_{Y|B} ∗ g)(Y) ≡ ∫ L_{Y|B}(Y; B) g(B) dB

This functional can be interpreted in the following way. The term L_{Y|B}(Y; B) g(B) / (L_{Y|B} ∗ g)(Y) is the posterior on the parameters derived from the prior g, conditional on having observed the realization Y. Therefore, the expression inside the integral in (10) is the joint distribution of (Y, B) that emerges when the marginal density of Y is the specified ϕ_Y, and F(g), being the integral of this joint distribution over Y, is the corresponding marginal distribution of B. The mapping thus sends a prior g to a density F(g) which is a mixture of the posteriors given each Y, weighted by ϕ_Y.

¹⁵ A similar problem in the economics literature is encountered by Sims (2007b), who also discusses some of the technical difficulties in solving this kind of equation.
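On a discretized toy problem the mapping F can be written out explicitly, which also lets one verify numerically the two properties established in Proposition 1 below. In this sketch B and Y each take finitely many values, the likelihood is a column-stochastic matrix, and the integrals in (9) and (10) become sums; all numbers are illustrative.

```python
# Discretized toy version of the mapping F in (10): L[y, b] = P(Y = y | B = b),
# phi is the marginal of Y implied by a prior f_true via (9), and F mixes the
# posteriors under g across the Y values, weighted by phi.
import numpy as np

rng = np.random.default_rng(2)
nB, nY = 4, 6
L = rng.random((nY, nB))
L /= L.sum(axis=0)                          # each column is a density over Y

f_true = np.array([0.1, 0.2, 0.3, 0.4])     # a prior for B, consistent by construction
phi = L @ f_true                            # implied marginal of Y, equation (9)

def F(g):
    post = L * g                            # joint L(Y; B) g(B), shape (nY, nB)
    post /= post.sum(axis=1, keepdims=True) # posterior of B given each value of Y
    return phi @ post                       # mixture of posteriors weighted by phi

g0 = np.full(nB, 0.25)                      # an arbitrary starting density
print(F(g0).sum())                          # F maps densities to densities: sums to 1
print(np.max(np.abs(F(f_true) - f_true)))   # the consistent prior is a fixed point
```

The second property is immediate from the algebra: the posterior at Y has denominator (L ∗ f_true)(Y) = ϕ(Y), which cancels against the weight ϕ(Y) in the mixture.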
Obviously, F(g) and g have to be consistent. This is formalized in the following proposition.

Proposition 1
1. F maps the space of densities of B into itself.
2. If f_B satisfies (9) then f_B is a fixed point of F.

Proof: We first show that if g is a density of B then F(g) is also a density of B. Since both L_{Y|B} and ϕ_Y are densities we have L_{Y|B}, ϕ_Y ≥ 0, and since we only consider g's that are densities we have g ≥ 0. Since the integral of a non-negative function is non-negative, F(g)(B) ≥ 0 for all B. Furthermore, F(g) integrates to one for any density g:

∫ F(g)(B) dB = ∫∫ [L_{Y|B}(Y; B) g(B) / (L_{Y|B} ∗ g)(Y)] ϕ_Y(Y) dY dB
= ∫ [ϕ_Y(Y) / (L_{Y|B} ∗ g)(Y)] [∫ L_{Y|B}(Y; B) g(B) dB] dY
= ∫ ϕ_Y(Y) dY = 1

where the first equality uses the definition of F and Fubini's theorem, the second pulls out of the inner integral the terms that do not depend on B, and the last line uses the definition of L_{Y|B} ∗ g and the fact that ϕ_Y is a density. This proves part 1. To prove part 2, if f_B solves (9) then, for all B,

F(f_B)(B) = ∫ [L_{Y|B}(Y; B) f_B(B) / (L_{Y|B} ∗ f_B)(Y)] ϕ_Y(Y) dY
= ∫ L_{Y|B}(Y; B) f_B(B) dY = f_B(B) ∫ L_{Y|B}(Y; B) dY = f_B(B)

where the first equality holds by the definition of F, the second follows from (9) (which implies (L_{Y|B} ∗ f_B)(Y) = ϕ_Y(Y)), the third takes f_B(B) out of the integral since it does not depend on Y, and the last holds because L_{Y|B} is a density in Y, so it integrates to 1 over its domain. Therefore F(f_B) = f_B.

It would be nice to have the converse of the second part of this proposition, assuring that any fixed point of F is indeed a prior consistent with ϕ_Y. Unfortunately, at this writing we do not have a proof of such sufficiency. Absent sufficiency, the proposition can be used in the following way: if we find a fixed point, we have to go back and check whether (9) is satisfied by the fixed point that has been found.
In that case we know that the fixed point of F has the desired property. Indeed, since we can only aspire to finding approximate solutions, we will have to check (9) anyway, to see whether it is sufficiently close to being satisfied.

3.3 Gaussian approximate fixed point

Iterating on F in order to find a fixed point means iterating on distributions. Only in very special cases can the integrations in the successive iterations be performed analytically; we discuss one such special case in Appendix D. In general, this is a difficult numerical task. There are various strategies to perform it, some based on discretizing and imposing that the result is a distribution, others based on using the mapping F to generate draws from successive distributions, in a way similar to MCMC techniques. We propose here an alternative that has worked well for us in practice. The idea is to start from a normal distribution for B and to iterate on F only approximately, staying within the realm of normal distributions along the iterations. First, this guarantees that along the way we always have a well-defined distribution for B. Second, a normal distribution is fully described by its mean and variance, which greatly reduces the dimensionality of the problem. Third, normal distributions produce closed-form formulas, which speeds up the calculations. We will discuss later that the accuracy of this approximation can be readily checked by examining how closely (9) holds and that, in a certain sense, the fact that this is an approximation is nearly irrelevant. We specify the likelihood L_{Y|B} to come from a gaussian VAR, but it should not be hard to generalize the procedure to other likelihoods: what is needed is that the likelihood and the parameterized prior are such that the posterior is known in closed form. Assume {y_t} to be a VAR(P) process:

y_t = Σ_{i=1}^{P} Φ_i y_{t−i} + γ + ε_t,   t ≥ 0,   ε_t ∼ N(0, Σ_ε) i.i.d.
where γ is a vector of constant terms.¹⁶ We have T + P observations, so that our sample is given by Y ≡ [y_1, ..., y_T]′, a T × N matrix gathering the T observations determined by the model, taking the first P observations as given. The VAR(P) process can also be written as

Y = XB + E    (16)

where X collects, in the usual way, the lagged values of Y and a column of ones which multiplies the constant terms, B ≡ [Φ_1, . . . , Φ_P, γ]′ and E ≡ [ε_1, ..., ε_T]′. The initial conditions Y_0 ≡ [y_{−P+1}, ..., y_0]′ are fixed. This implies the standard likelihood function for gaussian VARs, conditional on the initial P observations Y_0 (which enter the first rows of X):

vec Y ∼ N((I_N ⊗ X) vec B, Σ_ε ⊗ I_T)    (17)

We assume for simplicity that the error variance Σ_ε is known. We now look for successive iterations on the mapping F within the space of normal distributions. We know from standard Bayesian econometrics that if we start with a gaussian prior for the coefficients g,

g ∼ N(µ_g, Σ_g)    (18)

for some mean µ_g and variance Σ_g, then, conditional on observing a value Y, the posterior, defined as

p_g(B|Y) ≡ L_{Y|B}(Y; B) g(B) / (L_{Y|B} ∗ g)(Y)

is also gaussian, with variance and mean

Var_{p_g(·|Y)}(B) = (Σ_g^{−1} + Σ_ε^{−1} ⊗ X′X)^{−1}    (19)
E_{p_g(·|Y)}(B) = Var_{p_g(·|Y)}(B) (Σ_g^{−1} µ_g + vec(X′Y Σ_ε^{−1}))    (20)

In approximately iterating on the mapping F we exploit this closed-form solution for the posterior. We would now like to perform successive iterations on F. In the above notation,

F(g)(B) = ∫ p_g(B|Y) ϕ_Y(Y) dY

so that F sends a prior g to a density F(g) which is a mixture of posteriors, averaged over many samples Y, each weighted by the probability of this sample according to the prior for observables ϕ_Y. This mixture need not be a normal distribution.

¹⁶ It is straightforward to generalize this to the case with other exogenous variables.
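The closed-form moments (19)-(20) can be sketched directly in code. As a sanity check, under a very diffuse prior the posterior mean should be numerically indistinguishable from OLS; the data below are simulated, with generic regressors standing in for the lagged values in X, and all values are illustrative.

```python
# Posterior moments (19)-(20) for vec B in Y = X B + E, rows of E ~ N(0, Sigma_e),
# with a gaussian prior N(mu_g, Sigma_g) on vec B (column-major vec, equation by
# equation, matching the likelihood (17)).
import numpy as np

def posterior_moments(Y, X, Sigma_e, mu_g, Sigma_g):
    Se_inv = np.linalg.inv(Sigma_e)
    prec = np.linalg.inv(Sigma_g) + np.kron(Se_inv, X.T @ X)   # posterior precision
    var = np.linalg.inv(prec)                                   # equation (19)
    mean = var @ (np.linalg.solve(Sigma_g, mu_g)
                  + (X.T @ Y @ Se_inv).flatten(order="F"))      # equation (20)
    return mean, var

rng = np.random.default_rng(3)
T, K, N = 200, 3, 2
X = rng.normal(size=(T, K))                     # stands in for [lags, constant]
B_true = np.array([[0.5, 0.1], [0.0, 0.8], [0.1, 0.2]])
Sigma_e = 0.04 * np.eye(N)
Y = X @ B_true + rng.normal(0, 0.2, size=(T, N))

mu_g, Sigma_g = np.zeros(K * N), 1e8 * np.eye(K * N)   # very diffuse prior
mean, var = posterior_moments(Y, X, Sigma_e, mu_g, Sigma_g)
vec_ols = np.linalg.solve(X.T @ X, X.T @ Y).flatten(order="F")
print(np.max(np.abs(mean - vec_ols)))           # essentially zero
```

Column-major vectorization is used throughout so that the Kronecker ordering Σ_ε^{−1} ⊗ X′X matches vec B stacking the coefficients equation by equation.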
However, we now approximate F(g) itself by a gaussian distribution with the same mean and variance. To evaluate the mean and the variance of B under the distribution F(g) we use a simple Monte Carlo procedure based on the following result.

Result 1

E_{F(g)}(B) = E_{ϕ_Y} [E_{p_g(·|Y)}(B)]    (21)
Var_{F(g)}(B) = E_{ϕ_Y} [Var_{p_g(·|Y)}(B)] + Var_{ϕ_Y} [E_{p_g(·|Y)}(B)]    (22)

Proof: Given g, for any function h we have

E_{F(g)}(h(B)) = ∫ h(B) F(g)(B) dB = ∫ h(B) ∫ p_g(B|Y) ϕ_Y(Y) dY dB
= ∫ [∫ h(B) p_g(B|Y) dB] ϕ_Y(Y) dY = E_{ϕ_Y} [E_{p_g(·|Y)}(h(B))]    (23)

where the first equality follows by the definition of E_{F(g)}, the second by the definition of F(g), the third by Fubini's theorem, and the fourth by the definition of E_{ϕ_Y}. Clearly, (21) follows when we take h(B) = B. To prove (22) we note that

Var_{F(g)}(B) = E_{F(g)}(B²) − (E_{F(g)}(B))²
= E_{ϕ_Y} [E_{p_g(·|Y)}(B²)] − (E_{ϕ_Y} [E_{p_g(·|Y)}(B)])²
= E_{ϕ_Y} [E_{p_g(·|Y)}(B²) − (E_{p_g(·|Y)}(B))²] + E_{ϕ_Y} [(E_{p_g(·|Y)}(B))²] − (E_{ϕ_Y} [E_{p_g(·|Y)}(B)])²
= E_{ϕ_Y} [Var_{p_g(·|Y)}(B)] + Var_{ϕ_Y} [E_{p_g(·|Y)}(B)]

where the first equality is the basic fact that Var(X) = E(X²) − (E(X))², the second uses (23) for h(B) = B² together with (21), the third adds and subtracts E_{ϕ_Y}[(E_{p_g(·|Y)}(B))²] using the linearity of E_{ϕ_Y}, and the fourth applies the basic fact about variances inside the operator E_{ϕ_Y} and to the last two terms.

This result immediately suggests the following Monte Carlo approximation to the mean and variance under the distribution F(g): draw M realizations of Y from ϕ_Y; for each draw Y compute E_{p_g(·|Y)}(B) and Var_{p_g(·|Y)}(B) using the closed-form expressions above; finally, approximate the expectations E_{ϕ_Y} by averaging over the M draws the expressions inside the expectations on the right-hand sides of (21) and (22). The normal distribution with mean E_{F(g)}(B) and variance Var_{F(g)}(B) found in this way is the approximation to F(g) that we propose.
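Iterating this moment-matching step gives the approximate fixed-point scheme. A sketch on the AR(1) case of section 2 with T_0 = 1, where the answer is known from equation (6): at the fixed point, the implied prior variance of α + (ρ − 1)y0 should equal σ²_∆ − σ²_ε. All parameter values below are illustrative.

```python
# Approximate gaussian fixed-point iteration on F for the AR(1) of section 2.
# Each iteration draws growth rates from phi_Y, computes the gaussian posterior
# of b = (alpha, rho) under the current gaussian prior, and recombines the
# moments via Result 1 (the law of total variance).
import numpy as np

rng = np.random.default_rng(4)
y0, s_e, s_d, mu_d = 1.0, 0.1, 0.2, 0.0
M = 2000
c = np.array([1.0, y0])                 # the combination alpha + rho*y0

mu_g = np.zeros(2)                      # start from a relatively flat prior
Sig_g = 100.0 * np.eye(2)

for it in range(20):
    means, variances = [], []
    for _ in range(M):
        dy = rng.normal(mu_d, s_d)      # a draw from the prior for growth rates
        y1 = y0 + dy
        X = np.array([[1.0, y0]])       # one observation: y1 = alpha + rho*y0 + eps
        prec = np.linalg.inv(Sig_g) + X.T @ X / s_e**2
        var = np.linalg.inv(prec)
        mean = var @ (np.linalg.solve(Sig_g, mu_g) + X.flatten() * y1 / s_e**2)
        means.append(mean)
        variances.append(var)
    means = np.asarray(means)
    mu_g = means.mean(axis=0)                                        # (21)
    Sig_g = np.mean(variances, axis=0) + np.cov(means.T, bias=True)  # (22)

v_c = c @ Sig_g @ c                     # implied prior variance of alpha + (rho-1)*y0
print(v_c, s_d**2 - s_e**2)             # should be close: about 0.03
```

As the text notes for the analytically tractable case, the mean settles after a few iterations while the variances keep shrinking toward the consistent value; the direction orthogonal to c, about which the prior for observables is silent, stays essentially at the initial loose variance.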
In the applications below we found fixed points of F by successive iterations of the above scheme: starting from a relatively flat distribution as an initial guess, we find successive means and variances using the approximate iteration described above, until the scheme delivers a satisfactory approximation to the desired marginal distribution of the data. We have no theorem that such an algorithm will work, but in all the practical applications we have tried, it delivered priors whose implied marginal data densities were quite close to the desired one. In all cases it worked similarly to the analytically tractable case in Appendix D: after the first few iterations the means of the coefficients stabilized, and subsequent iterations were only shrinking the prior variances. Obviously, if such iteration failed, a number of search algorithms could be used to find a fixed point of the mean and variance. Note that ϕ_Y can be completely general for this scheme to work; all we need is to be able to generate random draws from it. It is also not necessary to restrict attention to normal distributions: all that is needed is that, given the likelihood, the posterior has a closed-form expression. Therefore, this idea should be applicable to a wide range of priors on observables and to a wide range of approximations.

3.4 Specifying priors on observables and accuracy of the obtained prior

The key to this approach is to specify a prior distribution on observables ϕ_Y. This prior should allow itself to be translated into a prior density for the coefficients: the translation problem should have a solution and be numerically manageable. We discuss some generic considerations regarding how such priors should be specified. Closely related is the issue of checking whether the obtained prior for the coefficients indeed has the desired property of being consistent with ϕ_Y.
Since numerical approximations will most likely have to be used, this is a necessary final step for any numerical approach. Additionally, if the fixed point approach of the last subsection is applied, this step is necessary because we lack a sufficiency result. To make the discussion concrete, we now specify a prior for growth rates for an N-dimensional VAR(P). We state that the analyst has a prior notion that growth rates are distributed as

∆y_t ∼ N(µ_∆, Σ_∆)   for all t = 1, ..., T_0    (24)

for given µ_∆, Σ_∆, T_0. Here, µ_∆ is the vector of prior means and Σ_∆ is the total prior variance, which incorporates both the analyst's uncertainty about the parameters and the uncertainty due to random shocks (in terms of our discussion of equation (6), this equals σ²_g + σ²_ε). Therefore, the prior for coefficients satisfying (24) will not exist if we specify a Σ_∆ which is smaller, in the matrix sense, than Σ_ε.¹⁷ We assume here that this mean and variance are constant for all t = 1, ..., T_0, but the researcher may be willing to leave open the possibility that the initial growth rates are not similar to the long run growth rates. For example, the researcher may want to allow for a transition period due to an initial condition much lower than the steady state. We also need to specify the prior covariance structure of growth rates across time. Having a priori positively serially correlated growth rates is appealing for two reasons: first, because empirically observed growth rates of macroeconomic variables tend to show positive serial correlation, but, more importantly, because if, say, α is higher than expected, then all growth rates will be higher than expected. In the empirical example we will specify, for simplicity, a single common correlation of growth rates across time. One key issue is how to choose the parameter T_0 involved in the above prior distribution.
One may be tempted to impose the prior on all the observations in the sample and to set T0 = T. But this is not reasonable, because it implies that the weight of the prior increases as the sample size grows; the tightness of the prior would then grow with the sample size, and this would prevent the posterior from being asymptotically valid. So, generically, we need to specify a fixed T0 < T. In principle, abstracting from singularities and other complications, it seems reasonable to expect that the prior (24) should define a distribution of the same dimension as the number of parameter values in the VAR in order for the implied prior for B to be unique. For example, in the model with only lagged dependent variables and a constant we need T0·N = P·N² + N. If T0 is too low, the prior for growth rates gives rise to an improper prior for coefficients and to rather loose prior restrictions on the parameters, while if T0 is too large, the implied prior is overdetermined and does not exist. As we have seen (and will see in other examples), even if T0 = P·N + 1 we may have an overdetermined or improper prior. It would seem at first glance that these issues of non-uniqueness or non-existence could plague any application of priors on observables. In practice, however, most of these difficulties can be turned into blessings. In section 2 we chose T0 = 1 precisely because we wanted to use a very loose prior and show how even such a loose prior matters for the posterior distribution. To the extent that many researchers may want to approach the data imposing minimal prior information on the estimation, it is desirable to use a low T0, even if this implies an improper prior, and let the data determine the posterior.

17 That is, we need the matrix Σ_Δ − Σ to be positive definite.
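The dimension counting above reduces to a one-liner. For a seven-variable VAR with four lags, exact determination would require T0 = 29 (the empirical application deliberately uses a much smaller T0):

```python
# Number of growth-rate restrictions (T0 * N) needed to match the number
# of VAR coefficients (P * N**2 + N) exactly: T0 = P*N + 1.
def t0_exact(N, P):
    params = P * N**2 + N        # lag coefficients plus intercepts
    return params // N           # equals P*N + 1

assert t0_exact(7, 4) == 29      # 203 coefficients / 7 variables
```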
In particular, one approach we follow in the applied examples is to choose T0 so that the number of variables determined by the prior equals the number of initial observations that are fixed (i.e., those for which we have no terms in the likelihood), that is, T0·N = N·P, i.e., T0 = P. One reason for this is that in this way we include as many restrictions as if we had stated a distribution of the initial conditions in terms of the parameters of the model, as in the so-called "exact" likelihood approach (see our discussion in Appendix A). One final issue: instead of imposing a prior on the initial T0 observations, as we did, we could have imposed a prior on the observations t = s, s+1, ..., s+T0 for some value s > 1. There would be no technical problem with that, as long as we keep in mind that it would be inconsistent to always choose the last T0 observations, i.e., to set s = T − T0 + 1, as this would amount to saying that the prior ideas about the coefficients change with the sample size. But the choice of a fixed s is indeed, in a way, arbitrary. We always choose s = 1 in the applications below because a prior on the initial dates seems to influence the posterior more strongly and because, as shown in section 2, it relates to a large literature on small sample bias. It is clear that (24), plus the assumption about covariances across time, implies a (prior) distribution for the first T0 periods of the series and, therefore, a prior distribution on observables φ_Y. The algebraic details are shown in Appendix B for completeness. A numerical procedure, such as the one proposed in the previous section, can then be applied to find a candidate prior for coefficients. Since the exact solution may not exist, or because of the numerical approximation, the prior needs to be checked.
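One way to carry out such a check, sketched here for a univariate AR(1) with purely illustrative candidate prior moments: draw coefficients from the candidate prior, simulate T0 periods from the likelihood starting at the actual initial condition, and compare the implied growth rates with the desired ones.

```python
import numpy as np

# Hypothetical prior check for y_t = a + rho*y_{t-1} + e_t, e_t ~ N(0, sigma^2).
rng = np.random.default_rng(2)
y0, sigma, T0 = 100.0, 1.0, 4
m = np.array([2.0, 0.99])            # illustrative candidate prior mean (a, rho)
V = np.diag([0.5, 0.001]) ** 2       # illustrative candidate prior variance

B = rng.multivariate_normal(m, V, size=5000)
growth = np.empty((len(B), T0))
for i, (a, rho) in enumerate(B):
    y = y0
    for t in range(T0):
        y_next = a + rho * y + rng.normal(0.0, sigma)
        growth[i, t] = y_next - y
        y = y_next

# a few moments of the implied marginal distribution of growth rates,
# to be compared (by moments, quantiles, or density plots) with phi_Y
implied_mean = growth.mean(axis=0)
implied_std = growth.std(axis=0)
```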
This is done in a straightforward way by Monte Carlo simulation: coefficients are drawn from the candidate prior, and then data are drawn from the likelihood function for T0 periods. In the case of the standard VAR, we draw Gaussian errors and simulate the series starting from the same initial conditions as the actual data at hand. Such data y will follow the marginal distribution of the data implied by the candidate prior. This distribution then needs to be compared with the desired marginal distribution of the data, φ_Y. The comparison could involve a few moments or quantiles of the distribution, or plots of the densities. Formal comparison criteria are not strictly necessary, because the comparison is in the end inherently subjective: the researcher should verify whether the implied distribution of observables reflects his prior beliefs.

4 Comparison with Other Priors

The preceding discussion of priors for growth rates and their translation into priors for coefficients is also helpful for interpreting some of the other standard priors for VARs. We have discussed at length the relationship with the flat prior in section 2, so we will not comment on it any further. But most common informative priors for VARs result in adjusting the posterior in the direction of frequentist bias corrections, much as the delta prior does. We start by commenting on the Jeffreys prior and the Minnesota prior, which adjust the posterior similarly to the delta prior, but for reasons other than using prior information about plausible growth rates. More closely related to our work are the prior of Villani (2008) and Uhlig's (1994a) 'exact likelihood' analysis, which can also be interpreted as using a particular prior. Finally, we turn to priors introduced through dummy observations, as proposed by Sims (1996), Sims and Zha (1998), and Del Negro and Schorfheide (2004).
These priors are also similar to the delta prior in technical details, and it turns out that the fixed point formulation discussed above helps in interpreting them.

4.1 Priors which adjust the OLS estimator for other reasons

Phillips (1991) criticized the flat prior on the grounds that it was not truly noninformative. He suggested using the Jeffreys prior, which has the attractive property of invariance to reparameterizations, and which turns out to put increasing weight on higher values of ρ. By favoring higher roots, the Jeffreys prior adjusts the posterior in the direction of frequentist bias corrections. Phillips' work sparked an intensive debate, to which we have nothing to add, but we will note the following. First, the Jeffreys prior is not commonly found in practical applications, in part because of computational difficulties that make VAR applications hard.18 Second, instead of discussing what is a "truly" uninformative prior, we prefer to think of priors that are indeed informative but that are commonly acceptable to many economists.

Applied work often uses the so-called Minnesota prior, a Normal-Wishart prior centered at the random-walk model (see Doan et al., 1984). This estimator has an effect resembling a bias correction, since in practice it usually increases the root of the process relative to the OLS estimate. In any case, the correction is determined by the precision of the prior, which is set in a mechanical and essentially arbitrary way. The Minnesota prior has been shown to be effective for forecasting purposes. However, using a VAR with this prior to summarize properties of the data may be more contentious: if the prior matters for the results, those results can also be challenged by criticizing the prior. Part of the problem is that this prior is stated directly as a prior for coefficients.

18 Cointegrating VARs with the Jeffreys prior are studied in Kleibergen and van Dijk (1994).
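For concreteness, here is a hedged sketch of the Minnesota prior's first two moments in one common parameterization (the function name and the tightness values lam, cross, decay are illustrative choices, not the paper's):

```python
import numpy as np

# First two moments of a Minnesota-style prior for an N-variable VAR(P):
# prior mean centered on univariate random walks, prior variances shrinking
# with the lag and scaled by ratios of residual standard deviations.
def minnesota_moments(sigmas, P, lam=0.2, cross=0.5, decay=1.0):
    N = len(sigmas)
    mean = np.zeros((N, N * P))
    mean[:, :N] = np.eye(N)              # own first lag centered at one
    var = np.empty((N, N * P))
    for lag in range(1, P + 1):
        for i in range(N):               # equation i
            for j in range(N):           # coefficient on variable j
                v = (lam / lag**decay) ** 2
                if i != j:
                    v *= cross**2 * (sigmas[i] / sigmas[j]) ** 2
                var[i, (lag - 1) * N + j] = v
    return mean, var

mean, var = minnesota_moments(sigmas=[1.0, 2.0], P=2)
```

Note that the coefficients are a priori independent here and the constant term (not shown) typically gets a flat prior, which is exactly the contrast with the delta prior drawn in the text.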
Therefore it is difficult to have a meaningful discussion about reasonable values for the parameters of the Minnesota prior, since the mapping between coefficients and features of the observables, about which one may have prior information, is not very intuitive. In the standard specification the Minnesota prior is essentially different from the delta prior because it does not incorporate any knowledge about growth rates: the standard specification uses a flat prior for the constant term of the VAR (Doan, 2000, p. 310), which implies a flat prior for the growth rate of the series. The shape of the Minnesota prior is also quite different, because it assumes that the coefficients are independent a priori. In contrast, the delta prior usually implies a certain prior correlation among the coefficients of the VAR, precisely so as to keep the growth rate within its assumed distribution. For example, in the simple case of subsection 2.3 it is clear that condition (6), assuming e.g. y0 > 0, implies a negative prior correlation between α and ρ.

The above comments relate to the way the Minnesota prior has been applied so far. But the ideas of the Minnesota prior and the ideas discussed in the current paper can be combined in a productive way: one could specify a prior on growth rates and then find the parameters of the Minnesota prior that most closely reproduce these pre-specified growth rates. That is, one could impose on fB the parameterization of the Minnesota prior and iterate on its parameters until they produce a prior distribution that agrees satisfactorily with a prior for growth rates. In this case, given our observation in the previous paragraph, it would be important to extend the parameterization to allow for correlation between the constant terms and the autoregressive coefficients. This would still reduce enormously the number of parameters that determine the prior for the VAR coefficients. We have not pursued this idea in this paper.
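The negative prior correlation between α and ρ can be demonstrated with a small simulation; the bound on the first-period growth rate used below is an illustrative stand-in for condition (6):

```python
import numpy as np

# With y0 > 0, requiring the first growth rate a + (rho - 1)*y0 to be
# moderate forces a and rho to move in opposite directions a priori.
rng = np.random.default_rng(3)
y0 = 100.0
a = rng.uniform(-5.0, 5.0, size=200_000)
rho = rng.uniform(0.8, 1.1, size=200_000)
keep = np.abs(a + (rho - 1.0) * y0) < 2.0      # plausible first-period growth
corr = np.corrcoef(a[keep], rho[keep])[0, 1]   # strongly negative
```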
4.2 Priors similar to the delta prior

The work of Villani (2008) can be interpreted as imposing a delta prior at dates in the very distant future; formally, it would involve specifying a fixed T0 and taking s → ∞. This is in the same spirit as the delta prior, since it incorporates prior information on growth rates. The key difference is that Villani specifies VARs in differences, i.e., imposes the unit root with certainty. In contrast, the delta prior is designed for the VAR in levels, and it pushes the posterior towards the unit root only as much as is appropriate given the specified prior.19

Many papers have implicitly introduced priors about initial growth rates when exploiting the information about parameters contained in the initial observations of the sample. The standard likelihood, conditional on the initial observations, ignores this information. There are two approaches to correcting this deficiency of the standard likelihood (which may be equivalent in simple cases): one involves specifying the distribution of the initial observation conditional on parameters, and the other involves specifying the distribution of parameters conditional on the initial observation. Works such as Thornber (1967), Zellner (1971, ch. 7.1), Lubrano (1995) and many others write the density of the initial observation conditional on parameters. The likelihood including the term for the initial observation is often called the "exact" likelihood. In fact, using the word "exact" for this exercise is slightly presumptuous, since one can only specify the distribution of the initial observation by making very strong assumptions about what happened before any sample observation was available. An alternative interpretation is used, e.g., in Schotman and van Dijk (1991a,b), who specify a prior about the mean of the process conditional on the initial observation. Linking the parameters of the model and the initial observation is least controversial in stationary models.
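For a stationary AR(1), the "exact" likelihood simply adds to the conditional likelihood the density of the initial observation under the stationary distribution of the process. A minimal sketch (standard textbook formulas, not the authors' code):

```python
import numpy as np

# Log-likelihood of y_0, ..., y_T for y_t = alpha + rho*y_{t-1} + e_t,
# e_t ~ N(0, sigma^2). With exact=True the initial observation contributes
# y_0 ~ N(mu, sigma^2 / (1 - rho^2)), mu = alpha / (1 - rho); this term is
# only defined for |rho| < 1, which is why the approach needs separate
# assumptions in the nonstationary case.
def ar1_loglik(y, alpha, rho, sigma, exact=True):
    e = y[1:] - alpha - rho * y[:-1]
    ll = -0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (e / sigma) ** 2)
    if exact:
        mu = alpha / (1.0 - rho)
        v0 = sigma**2 / (1.0 - rho**2)
        ll += -0.5 * (np.log(2 * np.pi * v0) + (y[0] - mu) ** 2 / v0)
    return ll
```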
Therefore, most of the above papers have either excluded nonstationary models or specified separate assumptions for that case. An approach which allows a unified treatment of the stationary and nonstationary cases is proposed in Uhlig (1994a). (The main focus of his paper is on studying the Jeffreys prior, but we concentrate only on his treatment of initial observations.) Uhlig (1994a) links parameters and initial conditions by specifying for how many periods the model has been operating before the sample starts (Uhlig's parameter S) and by stating the value of the process at the beginning of those periods. We call his approach an S-prior, though he stated it as writing the exact likelihood. The common feature of the S-prior and the previously discussed priors is that they link the parameters to the initial observation in a way which rules out crazy growth rates in the first periods. In terms of our Figure 3, they make situations like that in the lower panel very unlikely. In fact, the S-prior (and the previous approaches) is analogous to the standard practice in classical small sample econometrics of using the distribution of estimators where the initial conditions are generated by the model. In terms of our discussion in section 2, the relation of the delta prior to Uhlig's S-prior (and the previous approaches) is quite close. In Appendix A.2 we report the frequentist properties of the posterior mean under the S-prior in the univariate case. We show that the S-prior has properties similar to those of the delta prior: it delivers a similar correction of the OLS estimator and it has very good mean squared error properties in a relevant range of parameters.

19 Note that Villani's statement about long run series behavior is in fact a reparameterization. This is why he does not encounter the difficulties of obtaining a prior that we discussed in section 3.
Furthermore, in Appendix A.3 we propose an extension of Uhlig's methodology to the multivariate case, and in an empirical application we find results close to those of the delta prior. These results reinforce the theme of section 2, namely, that if huge growth rates are discarded in the prior, an asymmetric posterior is obtained and the Bayesian estimator can be seen as correcting the bias of OLS and improving over classical estimators. However, we feel that the delta prior is easier to use in applied work than the S-prior. To our knowledge, the application in Appendix A.3 is the first extension to multivariate VARs, and it relies on certain approximations. But most importantly, it is not clear how to pick S and the initial condition. One alternative is to pick S = ∞, but this is as arbitrary as any other assumption: why not 10 periods? why not since the end of WWII? Furthermore, for roots equal to or larger than 1 we have to assume a finite number of periods and an arbitrary initial condition at that time. The natural choice, followed by Uhlig (1994a), is to take y_{−S} = µ, but this choice is as hard to defend as any other. We presume that it will be difficult for economists to agree about the "right" value of S and the "right" value of the initial condition. Our proposed delta prior avoids these issues by taking the initial value of the process as given, which amounts to saying that we have a completely noninformative prior for S, the initial condition y_{−S}, and anything else that happened before our sample starts. To our thinking, it should be much easier for economists to agree on reasonable values of the growth rates.

4.3 Dummy observations priors

An alternative is to specify the prior through dummy observations, as in Sims and Zha (1998), Sims (2006) and other works of Sims. A similar situation arises, e.g., in Del Negro and Schorfheide (2004), where the dummy observations are generated by a DSGE model and have a distribution implied by the prior distribution of the structural model parameters. The idea of introducing dummy observations is motivated by the need to push the posterior towards a model that is "reasonable" and that is represented by these dummy observations. This is related, but not equivalent, to priors for observables. The key difference is that, as stated in (Sims, 2007a, p. 7), "These dummy observations are interpreted as 'mental observations' on A, c (JM: i.e. VAR parameters), not as observations on data satisfying the model..."

Let us give a detailed account. Sims (1996, 2000) argued that, even though a flat prior justified using OLS from a Bayesian perspective, the results were unsatisfactory because the OLS estimate tends to attribute too much of the data dynamics to the deterministic component of the process. He referred to this phenomenon as the 'other side of the well-known bias towards stationarity' (Sims and Zha, 1998, p. 959). Sims proposed to address this issue by using several types of dummy initial observations. The most relevant for this paper is the 'one-unit-root' dummy observation prior, which reflects the 'belief that no-change forecasts should be "good" at the beginning of the sample' (Sims, 1996, p. 5). This is a special case of the delta prior, with a prior growth rate of zero in the first period. The dummy observation consists of

y_1 = y_0 = y_{−1} = ... = y_{−P+1} = ȳλ,

where ȳ = (1/P) Σ_{i=0}^{P−1} y_{−i} and λ is a specified scalar. The vector of 1's that corresponds to the constant term in the data matrix is set to λ in this observation. In terms of our previous discussion this prior can be written as

p(B) = ∫ [ L_{Y|B}(Y; B) / ∫ L_{Y|B}(Y; B) dB ] δ_Y^{SD}(Y) dY,

where δ_Y^{SD} is a degenerate density of Y which puts unit mass on the dummy observation defined above.
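In regression form the dummy observation can be sketched as follows; the function and data below are illustrative, following the description above:

```python
import numpy as np

# One-unit-root dummy observation for an N-variable VAR(P) written as
# y_t' = [y_{t-1}', ..., y_{t-P}', 1] B + e_t': left-hand side lam*ybar,
# regressors lam*ybar repeated P times, constant-term entry set to lam.
def one_unit_root_dummy(y_init, lam):
    P, N = y_init.shape                 # P initial observations on N variables
    ybar = y_init.mean(axis=0)
    y_row = lam * ybar
    x_row = np.concatenate([np.tile(lam * ybar, P), [lam]])
    return y_row, x_row

y_init = np.array([[1.0, 2.0],          # illustrative initial observations
                   [1.2, 2.1],
                   [0.9, 1.8]])
y_row, x_row = one_unit_root_dummy(y_init, lam=1.0)

# a pure random-walk VAR (first own lag = 1, everything else 0) fits the
# dummy observation exactly, i.e. the no-change forecast is 'good'
B_rw = np.zeros((x_row.size, 2))
B_rw[:2, :2] = np.eye(2)
```

Scaling lam up tightens the implied prior, consistent with the role of λ discussed in the text.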
This prior is the result of applying the mapping F once to a flat prior, i.e., the first iteration on our fixed point problem. While similar in spirit, Sims' 'one-unit-root' dummy observation prior differs from the delta prior in the following ways: i) the initial conditions are taken to be ȳ instead of the actual y_{−P+1}, ..., y_0; ii) all the prior mass is on a growth rate of 0; and iii) the variance of the error in the dummy observation is different from the error variance of the regular observations, and is determined by the arbitrary scalar λ. Point iii) is a consequence of the dummy observations being 'mental observations on parameters' and not on the data; and if they are not observations on the data, then further iteration on the mapping F is not necessary. The effect of ii), i.e., of putting all the prior mass on just one value of the growth rate, makes the prior tight, but it can be loosened by inflating the corresponding error variance. The 'one-unit-root' dummy observation prior is computationally much more convenient than the delta prior. However, this prior, while making a reference to the behavior of the series, is in fact a prior for coefficients. This makes the scaling parameter λ difficult to interpret and elicit, in contrast to the variance of the growth rates of observables in the delta prior approach.

5 Empirical Example: the effect of monetary shocks in the US

To illustrate the effect of alternative priors on macroeconomic VARs, we first replicate the estimation of the effects of the monetary shock in the US from Christiano et al. (1999).20 The authors estimate a VAR with output (Y, measured by the log of real GDP), prices (P, the log of the implicit GDP deflator), commodity prices (PCOM, the smoothed change in an index of sensitive commodity prices), the federal funds rate (FF), total reserves (TR, in logs), nonborrowed reserves (NBR, in logs) and money (M1 or M2, in logs).
All data are quarterly and the sample runs from 1965:3 to 1995:2. Residuals are orthogonalized with the Choleski decomposition of the variance, with the above variable ordering, and the monetary shock is the one corresponding to the federal funds rate. Figure 7 displays the responses of output to a monetary shock, estimated with the alternative approaches. We focus on the responses of output first, since they happen to be the most affected both by the frequentist small sample bias and by alternative prior assumptions. Responses of the remaining variables are reported and briefly discussed further below. Our benchmark is the posterior distribution of the impulse responses obtained with the flat prior for the coefficients. It is displayed in all plots, as the shaded region, to facilitate comparisons. Precisely, this region shows a 95% band, constructed as quantiles 0.025 and 0.975 of the posterior distribution of the impulse responses, calculated separately for each horizon. This distribution is simulated by Monte Carlo following the well known algorithm from the RATS package manual (Doan, 2000, example 13.4).21 The flat prior band is almost symmetric around the OLS point estimate, which is also displayed in the first plot as the dashed line. Continuous lines on each plot present 95% bands constructed with the alternative approaches.

The first plot reproduces the results of Christiano et al. (1999), who use a bootstrap procedure intended to convey the uncertainty around the point estimates of the impulse responses while disregarding the small sample bias. In this bootstrap procedure, the OLS point estimate of the coefficients is taken to be the data generating process, and the series are repeatedly generated with these coefficients from resampled errors. The band is constructed from the percentiles of the distribution of the impulse responses estimated by OLS from the generated series (the 'other-percentile' bootstrap, see Sims and Zha (1999)). Although this was not the intention of the authors, this experiment is essentially a study of the frequentist small sample bias in this context. It shows that when the data generating process exhibits as much persistence as the dashed line, the OLS estimates on data coming from this DGP tend to be less persistent, and the output response to the monetary policy shock is weaker and dies out sooner. The sample is quite large (120 observations), so the bias may not be striking at first glance, but in smaller samples even 95% bootstrap bands often exclude the point estimates. (Also, it is possible that this application has so many parameters that the sampling error is very large and the bias is, by comparison, small potatoes.)

20 We choose this example because the data are available on the Internet, and because the authors use bootstrap error bounds, which highlight the frequentist small sample bias.
21 The precise form of the prior is p(B, Σ) ∝ |Σ|^{−(N+1)/2}.

The next step is to impose the prior on the initial growth rates of the variables. The parameters of the basic specification of the prior are given in Table 1. We have chosen these values mechanically, by setting the prior means and variances of growth rates simply equal to those observed on average in the whole sample. A more opinionated reader can plug in his or her own preferred values. We set all autocorrelations of growth rates to 0.4. We take T0 = 4, the number of lags in the VAR, so that the number of assumed distributions of growth rates is equal to the number of initial observations on which the likelihood conditions. In this way we intend to complete the conditional likelihood with terms conveying the assumption that the initial observations behave, in terms of growth rates, similarly to the rest of the sample.
Table 1: Average growth rates and standard deviations of the endogenous variables in the full sample (1965Q3:1996Q2)

variable   mean annualized growth rate   annualized standard deviation
Y          2.7                           3.6
P          5.0                           2.5
PCOM       3.2                           206
FF         0.1                           4.8
NBR        5.4                           9.1
TR         5.2                           6.6
M1         6.5                           4.0

Note: The quarterly growth rates and their standard deviations are multiplied by 4. The original quarterly values were used in the delta prior.

[Figure 7: Impulse responses of output to monetary shocks. Six panels: bootstrap; delta prior - empirical; delta prior - zero growth; Sims initial dummy; Kilian; Kilian nonstationary. Each panel shows the flat prior 95% band (shaded), the OLS estimate, and the 95% band of the respective approach, over horizons 0 to 14.]

Note that with this choice of T0, the dimension of the prior is only 4 × 7 = 28, compared with the 4 × 7² + 7 = 203 coefficients, so the precise prior would be improper. To keep the prior proper, we complete it with a very weak shrinkage prior centered at zero. We accomplish this by initializing the fixed point iteration, by which we find the approximate Gaussian prior for the coefficients, with a Gaussian prior with zero mean and a large diagonal variance (with diagonal terms equal to 100², several times larger than, e.g., the posterior variances of the coefficients estimated with the flat prior). After 160 iterations most variances have shrunk by many orders of magnitude, but some of them remain barely different from the starting point, consistent with the pure delta prior being improper. The match between the 28 assumed growth rates and the growth rates implied by the Gaussian prior actually used is illustrated in Figure 8, and we deem it quite reasonable.
In contrast to the noninformative prior, which is centered around the OLS estimate, and in even starker contrast with the bootstrap bands, the delta prior results in impulse responses which are more persistent than the OLS estimates imply. They are also stronger in the medium run. The effect on the economic interpretation of the results is striking. The cumulative effect of the shock after 4 years is −6.6% of quarterly GDP (at the median) when estimated with the delta prior, but only −4.6% when estimated with OLS, and only −3.1% according to the bootstrap bands. Therefore, using the flat prior we underestimate the cumulative effects of the monetary shock by about one third, and using bootstrap bands by more than a half, compared with the effects obtained with what we think is a reasonable prior. That the delta prior impulse responses are more persistent is intuitive since, as discussed earlier, assumptions about initial growth rates are also implicit in the frequentist analyses of the small sample bias. Incorporating these assumptions leads to the optimal Bayesian posterior which, from the frequentist point of view, implements a bias correction.22

In the second row of Figure 7 we display two more plots which illustrate the relationship between the delta prior and the 'initial dummy observation' prior proposed by Sims. Sims' initial dummy observation prior reflects the idea that 'no growth forecast is a good forecast in the beginning of the sample' and is very convenient to introduce, but its variance has no intuitive interpretation. We compare it with the prior for growth rates with a similar variance as in the baseline case, but centered at zero growth rates. We can see that Sims' prior increases the persistence of the response compared with the flat prior, in a similar manner as the delta prior. Its effect is much weaker than that of the prior for initial growth rates centered at zero with the empirically observed variances. Our interpretation is that Sims' prior is centered at zero growth rates, but ends up having a larger variance than our baseline specification.

22 The bands are narrower, which partly reflects the fact that, in the current version of the procedure, we do not take into account the uncertainty about the error variance, and instead fix it at the OLS estimate. A flat prior band which disregards uncertainty about the error variance has a similar width as the presented delta prior band.

[Figure 8: Distributions of the growth rates of the VAR variables in the first 4 periods of the sample, obtained by Monte Carlo simulation; one panel per variable (Y, P, PCOM, FF, NBR, TR, M) and period (t = 1, ..., 4). 'actual' - the marginal density of the data (growth rates) implied by the baseline delta prior; 'desired' - the assumed growth rates which are to be matched by the delta prior.]

In the third row of Figure 7 we display, for comparison, the effect of applying frequentist bias correction procedures in the present context. Of course, the resulting bands correspond to the distribution of the impulse response across alternative datasets, which is a different statistical object than the posterior distribution. However, such bands are often casually interpreted as delivering Bayesian post-sample uncertainty about impulse responses. We apply the procedure for constructing error bands for impulse responses proposed in Kilian (1998), called bootstrap-after-bootstrap. It is intended to improve the coverage of bootstrap intervals by approximately removing the small-sample bias at each step of the simulation. The bootstrap is used to approximately remove the bias, and the underlying assumption is that the bias is constant in the neighborhood of the OLS estimate. As described in Kilian (1998), the first step consists of estimating and correcting the bias of the OLS point estimate.
In the second step the error bands are generated: series are repeatedly simulated from the data generating process implied by the corrected OLS estimate; a VAR is estimated by OLS on each simulated data set; the OLS estimate is corrected for bias analogously as in the first step; and finally the impulse responses are computed and stored. In a simplified, but computationally much cheaper, version of this algorithm, the bias estimate obtained in the first step is reused in the second step to correct all the OLS estimates on the generated data (instead of performing a separate bootstrap for each of them). We use this simplified bootstrap-after-bootstrap here. Bias correction pushes the roots of the process towards the nonstationary range, and explosive responses may often result. This may, however, be a spurious effect caused by the assumption of constant bias. Kilian suggests shrinking the bias estimate whenever the bias corrected estimate would become explosive, thus guaranteeing stationarity throughout the simulated distribution of impulse responses. Sims and Zha (1999), when applying Kilian's method, deviate here and allow for explosiveness. We report results from both approaches: the plot labeled 'Kilian' shows the results with shrinking of the explosive roots, and the plot labeled 'Kilian nonstationary' shows the responses allowing for explosiveness. In the present example the VAR has roots close to unity and the results differ quite a lot depending on the handling of the nonstationary roots. The bands obtained with shrinking of explosive roots actually exhibit marginally faster mean reversion even than the flat prior bands. The bands obtained without shrinking are much more spread out at farther lags, and put even more weight on explosive behavior than the posterior results with the delta or Sims' priors.
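The first-step bias correction, including the stationarity shrinkage, can be sketched for a univariate AR(1); this is an illustrative reduction of the procedure, not the paper's multivariate implementation:

```python
import numpy as np

def ols_ar1(y):
    # OLS of y_t on (1, y_{t-1})
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    return np.linalg.lstsq(X, y[1:], rcond=None)[0]

def bias_corrected_rho(y, reps=1000, seed=0):
    rng = np.random.default_rng(seed)
    a, r = ols_ar1(y)
    resid = y[1:] - a - r * y[:-1]
    # bootstrap the small-sample bias of OLS, treating (a, r) as the DGP
    r_boot = np.empty(reps)
    for m in range(reps):
        e = rng.choice(resid, size=len(y) - 1, replace=True)
        ys = np.empty_like(y)
        ys[0] = y[0]
        for t in range(1, len(y)):
            ys[t] = a + r * ys[t - 1] + e[t - 1]
        r_boot[m] = ols_ar1(ys)[1]
    bias = r_boot.mean() - r
    r_corr = r - bias
    # Kilian's shrinkage: if the corrected estimate turns explosive,
    # shrink the bias estimate until it is not
    delta = 1.0
    while abs(r_corr) >= 1.0 and delta > 0.0:
        delta -= 0.01
        r_corr = r - delta * bias
    return r, r_corr

# illustrative data: a persistent AR(1), T = 120 as in the application
rng = np.random.default_rng(42)
y = np.empty(120)
y[0] = 0.0
for t in range(1, 120):
    y[t] = 0.95 * y[t - 1] + rng.normal()
rho_ols, rho_corrected = bias_corrected_rho(y)
```

The correction moves the root upwards, reflecting the familiar downward small-sample bias of OLS near the unit root; in the full procedure the same logic is applied to the whole VAR coefficient matrix.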
Overall, the figure illustrates the dilemma involved in applying bootstrap-after-bootstrap when the root of the system is close to unity: imposing stationarity discards much of the bias correction, while allowing nonstationarity exposes the results to the inaccuracy of the assumption of constant bias. We conjecture that it is the failure of this assumption in practice that makes the bands contain too much nonstationarity and become so wide at farther lags. It is possible that applying the full, rather than the simplified, version of Kilian's algorithm gives better results.

In Figures 9 and 10 we report impulse responses for all variables. The OLS point estimate is plotted with the dashed line, the same on all plots, and the 95% bands are delimited with continuous lines. The first column of Figure 9 shows that the frequentist bias towards stationarity affects most clearly the response of output, and to a much smaller extent the federal funds rate (FF), total reserves (TOTR) and money (M1). We are identifying only one shock, so we do not check what happens to the remaining 7 × 6 = 42 impulse responses of this system. Interestingly, the effect of the delta prior is strongest exactly where the bias, indicated by the deviation of the bootstrapped median from OLS, was strongest.

6 Conclusions

We have argued in this paper that, beyond the AR(1) model without intercept, the decision of whether or not to adjust the OLS estimate in time series is not a matter of philosophical principles (the Bayesian vs. frequentist perspective). Instead, it depends on whether we believe that the initial condition is compatible with reasonable growth rates, or that any initial condition is possible. While there are a number of standard priors for VARs which have the same qualitative features, our preferred alternative is the delta prior. Unlike its contenders, this prior has a clear interpretation and it is genuinely possible to elicit it.
It should be uncontroversial that most economic variables cannot have huge growth rates in any given period; introducing this knowledge about the economy reconciles the Bayesian posterior with the certainty of classical econometricians (at least those who correctly worry about small sample issues) that OLS should be adjusted upwards close to a unit root.
Figure 9: Impulse responses to monetary shocks: OLS point estimate (dashed line) and the 95% uncertainty bands (continuous lines) generated by alternative methods. Rows: Y, P, PCOM, FF, NBR, TR, M1; columns: bootstrap, flat, delta - empirical, delta - zero, Sims dummy.
Figure 10: Impulse responses to monetary shocks: OLS point estimate (dashed line) and the 95% uncertainty bands (continuous lines) generated by alternative methods. Rows: Y, P, PCOM, FF, NBR, TR, M1; columns include the bootstrap, Kilian and Kilian nonst. methods.

Even from the classical perspective, our Bayesian posterior estimates are attractive. Correcting the bias does not actually deliver (mean-)unbiasedness, while our estimator, which also reduces bias, can have an edge in terms of mean squared error. We have illustrated the effect of the delta prior in a macroeconomic VAR for the US economy. The posterior corrects the excessive stationarity of the OLS estimate, and implies much more persistent responses of output to monetary policy shocks.

It will be interesting to apply the philosophy of specifying a prior for the VAR through features of the data to other cases. Priors for VARs implied by DSGE models are a promising application on which we plan to work.

Appendices

Appendix A Initial conditions

A.1 Initial conditions in the studies of bias

This appendix discusses how the initial condition for the AR(1) process has been specified in this literature. To facilitate the discussion of the initial conditions, it is convenient to reparameterize the model as:

y_t − µ = ρ (y_{t−1} − µ) + ε_t,  for t = 1, ..., T  (A.1)

This parameterization is a special case of (1) for:23

α = µ(1 − ρ)  (A.2)

The specified distribution for y_0 will be related to µ. Under (A.1) the process reverts to µ if |ρ| < 1 and moves away from µ if |ρ| > 1. The constant term drops out when ρ = 1.24

23 One cannot go the other way around: if ρ = 1 in (1), there is no µ that satisfies the above equation and generates the same y's.
This will have implications for the practical application of this parameterization when the observed variable has a positive growth rate and is close to a unit root, because a trend will then have to be introduced to obtain a growing variable.

24 Zivot (1994) explains that the motivation for suppressing the constant term α when ρ = 1 is that its interpretation changes in the unit root case, from determining the level of the process to determining its drift. When no trend is included for general ρ, it is not reasonable to allow for a drift at just one point of the parameter space: ρ = 1.

Classical estimators need to specify fully the distribution of the process in order to derive the small sample properties of the estimator.25 A necessary step for that is specifying the distribution of the initial observation y_0. Since the focus is on estimating ρ, it is convenient to specify a distribution that makes the OLS estimator of ρ independent of the 'nuisance' parameters of the model, µ and σ. In that way, the results on estimators of ρ are valid for a wide range of nuisance parameters. A general condition for this independence is:

Result 2 Assume the initial condition in model (A.1) is given by:

y_0 = µ + σψ  (A.3)

where ψ is a random variable. Then, if ψ is independent of the shocks and its distribution is independent of µ and σ, the distribution of the OLS estimator of ρ in (1) is independent of µ and σ.26

All classical papers that we know of use a version of (A.3). Of course, each choice of ψ will result in a different small sample distribution of ρ̂ conditional on ρ. The most popular approach is to assume that the first observation is drawn from the stationary distribution of the process (as if the process had been running for a long time before the beginning of the sample). This means assuming y_0 ∼ N(µ, σ²/(1 − ρ²)), or ψ ∼ N(0, 1/(1 − ρ²)).
The stationary distribution exists only for stationary processes, and therefore this assumption must be completed by another assumption for the nonstationary values of ρ. Bhargava (1986, p. 372), who constructs tests for unit roots invariant to µ and σ, takes

ψ ∼ N(0, 1/(1 − ρ²)) when |ρ| < 1,  ψ ∼ N(0, 1) otherwise  (A.4)

Andrews (1993) considers median unbiased estimators of ρ by taking a similar starting point for the stationary case and an arbitrary starting point (i.e. ψ is an arbitrary constant) for the unit root case (he does not consider explosive values of ρ). An alternative approach is taken by MacKinnon and Smith (1998), who make the same assumption for both stationary and nonstationary parameter values (see their pp. 206-207):

y_0 ∼ N(µ, σ²)  (A.5)

so that they take ψ ∼ N(0, 1) for all ρ.

25 We note in this paper that this forces classical econometricians to consider carefully the joint distribution of the data and the parameters, which, in our Bayesian interpretation, helps them to elicit reasonable priors.

26 Similar results have been used in the literature. As can be seen, the proof is very simple, but we could not find a formal proof, so we offer it here for completeness. The proposition is very similar to the property of ρ̂ discussed in Andrews (1993, Appendix A), which contains a verbal proof for |ρ| ≤ 1 and a particular distribution of ψ, but allowing for a trend. Proof: Define the normalized errors u ≡ ε/σ. (A.3) allows us to write:

y_t = µ + σ ( Σ_{i=1}^t ρ^{t−i} u_i + ρ^t ψ ) = µ + σ ỹ_t

where ỹ is the zero-mean process that would obtain from the same realization of errors, rescaled to have unit variance. Then it is a matter of simple algebra to show that:

ρ̂ ≡ [ Σ y_t y_{t−1} − (Σ y_{t−1})(Σ y_t)/T ] / [ Σ y²_{t−1} − (Σ y_{t−1})²/T ] = [ Σ ỹ_t ỹ_{t−1} − (Σ ỹ_{t−1})(Σ ỹ_t)/T ] / [ Σ ỹ²_{t−1} − (Σ ỹ_{t−1})²/T ]
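Result 2 can also be checked numerically: since y_t = µ + σ ỹ_t and the OLS slope with an intercept is invariant to affine transformations of the data, two datasets built from the same (ψ, u) draws but different (µ, σ) give exactly the same ρ̂. A quick sketch (our own illustration, not from the paper):

```python
import random

def rho_ols(y):
    # OLS slope of y_t on y_{t-1} with an intercept
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    return sum((a - mx) * (b - mz) for a, b in zip(x, z)) / \
           sum((a - mx) ** 2 for a in x)

def path(rho, mu, sigma, psi, shocks):
    # y_0 = mu + sigma * psi, then (y_t - mu) = rho (y_{t-1} - mu) + sigma * u_t
    y = [mu + sigma * psi]
    for u in shocks:
        y.append(mu + rho * (y[-1] - mu) + sigma * u)
    return y

rng = random.Random(1)
psi = rng.gauss(0.0, 1.0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(25)]
r1 = rho_ols(path(0.8, 0.0, 1.0, psi, shocks))   # mu = 0,  sigma = 1
r2 = rho_ols(path(0.8, 50.0, 7.0, psi, shocks))  # mu = 50, sigma = 7
# r1 and r2 agree up to floating-point rounding
```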
All these choices guarantee that, in the case ρ < 1, the process starts not too far from its steady state, and therefore the convergence towards the steady state will not generate excessive growth rates. In the explosive cases the speed of divergence from µ increases with the distance from µ, and the assumptions used limit growth rates in this case as well. Note, however, that these alternative initial conditions have different implications for the bias, so that MacKinnon and Smith (1998) and Andrews (1993) are correcting different biases.

A.2 The Bayesian approach: S-priors for the AR(1)

In this section we explore the implications of initial conditions formulated following Uhlig (1994a), who considers processes which started at the mean S periods before the first observation in the sample.27 Uhlig (1994a) proposes a formulation which encompasses many cases: he assumes y_{−S} = µ, where S is a given constant that the researcher has to specify. This provides an alternative ψ.

27 Strictly speaking, Uhlig (1994a) does not derive a 'prior' from the above specification of the initial condition, but uses it to write the 'exact likelihood'. This is, of course, a semantic issue, since the resulting posterior is the same as when the above specification of the initial condition is considered a prior given the initial observations, which is our preferred interpretation.

To provide a common nomenclature for all these cases, we call Uhlig's proposal the 'S' model. Then the case y_0 = µ becomes the 'S = 0' model, the assumption of MacKinnon and Smith (1998) in (A.5) becomes the 'S = 1' model, and we call the assumption of Bhargava (1986) in (A.4) the 'S = ∞' model.
The general 'S-prior' (Uhlig (1994a) and Zivot (1994, pp. 569-570)) is:

p(µ | σ, y_0, S) = σ^{−1} ν(ρ, S)^{−1/2} exp( − (1/(2σ²)) ν(ρ, S)^{−1} (y_0 − µ)² )  (A.6)

with

ν(ρ, S) = (1 − ρ^{2S}) / (1 − ρ²)  for |ρ| ≠ 1,  ν(ρ, S) = S  for |ρ| = 1  (A.7)

These priors are then complemented with the noninformative prior for ρ and σ:

p(ρ, σ) ∝ σ^{−1}  (A.8)

The S = 1 prior is conditionally conjugate, and the full posterior distribution can be conveniently simulated with the Gibbs sampler. For general S the conjugacy is lost, but the marginal posterior of ρ can be obtained analytically by a straightforward integration of the posterior with respect to µ and σ. The resulting formula can be found in Zivot (1994, equation (43)).

The bias and mean squared error of the Bayesian 'S-prior' estimators are examined in a Monte Carlo experiment similar to the one in section 2.4. We denote the Bayesian posterior means obtained with the S = 0, 1 and 100 priors respectively as BS0, BS1 and BS100. We simulate the AR(1) processes with initial conditions corresponding to S = 0, 1 and ∞, and in each case try all 3 Bayesian estimators, in addition to OLS and the CBC estimator of MacKinnon and Smith (1998). The results are presented in Figures 11, 12 and 13. The BS1 and BS100 estimators perform somewhat better, but are broadly similar to the delta prior: in RMSE terms they beat OLS for positive ρ, and the CBC on an even larger interval (except that the BS100 has a slightly worse RMSE than the CBC for ρ around 1). In the range where they perform well, their bias (unreported here) is stronger than that of the CBC, although reduced compared with OLS. They are not very sensitive to the accuracy of the prior, e.g. the BS1 estimator performs similarly even when the prior assumption is false: for the processes starting at the mean (Figure 11), and in the stationary distribution (Figure 13).
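The variance factor ν(ρ, S) in (A.7) has a removable singularity at |ρ| = 1; a small helper (our own sketch, not from the paper) makes the limit explicit:

```python
def nu(rho, S):
    # nu(rho, S) = (1 - rho^(2S)) / (1 - rho^2), the variance factor of
    # y_0 given y_{-S} = mu:  Var(y_0 | mu, sigma) = sigma^2 * nu(rho, S)
    if abs(rho) == 1.0:
        return float(S)          # limit of the ratio as |rho| -> 1
    return (1.0 - rho ** (2 * S)) / (1.0 - rho ** 2)
```

For |ρ| < 1 and large S, ν approaches 1/(1 − ρ²), the stationary variance used by Bhargava (1986); for S = 1 it equals 1, the choice of MacKinnon and Smith (1998).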
That last observation is not true for the BS0 estimator, which gives a particularly strong advantage over the alternatives when the prior is correct (for the processes starting exactly at the mean, Figure 11), but for some parameter values is very misleading when the prior is wrong. The graphs are truncated, so it cannot be seen that the RMSE of BS0 has a peak of 0.65 (more than 3 times that of the alternatives) around ρ = −0.6 for S = 1 (Figure 12) and of 0.9 (more than 6 times that of the alternatives) around ρ = −0.9 for S = ∞ (Figure 13). Nevertheless, in the range which is most relevant in practice, i.e. for high positive values of ρ, BS0 is always more precise than the alternatives.

Figure 11: RMSE of the OLS, CBC, BS0, BS1 and BS100 estimators in a Monte Carlo experiment with S = 0, sample size: T = 25.

Figure 12: RMSE of the OLS, CBC, BS0, BS1 and BS100 estimators in a Monte Carlo experiment with S = 1, sample size: T = 25.

Figure 13: RMSE of the OLS, CBC, BS0, BS1 and BS100 estimators in a Monte Carlo experiment with S = ∞, sample size: T = 25.

A.3 The S-prior for VARs

In this section we generalize the 'S = 1' prior to the VAR(P) model, evolving around an arbitrary exogenous process (for example a constant mean, or a linear trend). Let Y be a T × J matrix gathering T observations on J jointly endogenous variables. Let L denote the lag operator and define a P × 1 vector l ≡ [L, L², ..., L^P]′. Define X ≡ l ⊗ Y = [Y_{−1}, Y_{−2}, ..., Y_{−P}], a T × JP matrix with the P lagged matrices Y.
W is a T × K matrix with K exogenous variables (in the case of a time-invariant mean, W = ι_T, a T × 1 vector of ones, while in the case of a linear trend W = [ι_T, τ], where τ = [1, 2, ..., T]′). Z ≡ l ⊗ W = [W_{−1}, W_{−2}, ..., W_{−P}] is a T × KP matrix with the P lagged matrices W. Finally, U is a T × J matrix with T independent normal vectors of length J, each with variance Σ. Deviations of Y from the exogenous component evolve according to a VAR(P) process, perturbed by the shocks U:

Y − WΓ = (X − Z(I_P ⊗ Γ)) Π + U  (A.9)

where Π is a JP × J matrix of VAR coefficients and Γ is a K × J matrix with the means and the slopes of the time trend for each endogenous variable. The reduced form of the above model is:

Y = W Γ̃ + X Π + U  (A.10)

with

Γ̃ = Γ − (l′ ⊗ Γ) Π = Γ (I − Φ_1 L − ... − Φ_P L^P)  (A.11)

where Φ_i is the matrix of VAR coefficients of lag i, so that Π = (Φ_1, Φ_2, ..., Φ_P)′. Equation (A.11) is a generalization of the restriction α = µ(1 − ρ) in (A.2) for the AR(1) case, and analogously to that case it has implications when the lag polynomial in the Φ's contains some unit roots. When Γ consists only of constant terms, (A.11) guarantees that those corresponding to series that have unit roots are suppressed, and the random walks have no drifts. When Γ contains constant terms and time trends, (A.11) ensures that the time trends corresponding to unit root series are suppressed, consistently with the postulate of Uhlig (1994b).

The likelihood conditional on the initial P observations is proportional to:

L(Y; Π, Γ, Σ) ∝ |Σ|^{−T/2} exp( − (1/2) tr(Σ^{−1} U′U) )  (A.12)

A common noninformative prior for the error variance and the autoregressive coefficients is:

p(Σ, Π) ∝ |Σ|^{−(J+1)/2}  (A.13)

The prior for Γ must be proper, to compensate for the indeterminacy at the unit root implied by restriction (A.11).
A 'bias-correcting' prior assumption, in the spirit of the previous sections, relates the coefficients of the deterministic component of the process to the initial observations:

Y_0 − W_0 Γ = U_0  (A.14)

where Y_0 is a T_0 × J matrix with T_0 initial observations of the endogenous variables, W_0 is a T_0 × K matrix with the corresponding values of the exogenous variables, and U_0 is a T_0 × J matrix with T_0 independent normal vectors of length J with variance Σ_0. We take Σ_0 to be the OLS estimate of the variance of the VAR errors. The implied prior for Γ is normal, centered on the OLS estimate of Γ from the T_0 initial observations:

p(vec Γ | Y_0, W_0, Σ_0) = N( vec Γ̂_0, Σ_0 ⊗ (W_0′W_0)^{−1} )  (A.15)

where Γ̂_0 ≡ (W_0′W_0)^{−1} W_0′ Y_0.

The simplifying assumption in (A.14), which delivers conjugacy of the prior, is that the first T_0 deviations from the deterministic component are independent, and the time dependence implied by Π kicks in only from period 0 onwards. An exact multivariate and multilag counterpart of Uhlig's S-prior would assume that the process had been equal to its deterministic component up to period −S and then started evolving according to (A.9). But then the variance of U_0 would depend in a complicated way on Π, and the conjugacy of the prior would be lost. The prior in (A.15) implies that the initial T_0 observations are used only to infer the coefficients of the deterministic process Γ, but not Π. This may be a simplification, but it is still better than the case of the flat prior, where the information in the initial observations is ignored altogether.
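For a single series with W_0 = [ι, τ] (a constant and a linear trend), the prior mean Γ̂_0 in (A.15) is just the OLS fit of a line to the pre-sample observations. A minimal scalar sketch (our own illustration; the paper's setting is the full multivariate case):

```python
def gamma0_hat(y0_obs):
    # solve the normal equations (W0'W0) gamma = W0'Y0 for W0 = [iota, tau],
    # returning the fitted (level, slope) of the deterministic component
    T0 = len(y0_obs)
    tau = range(1, T0 + 1)
    st = sum(tau)
    stt = sum(t * t for t in tau)
    sy = sum(y0_obs)
    sty = sum(t * y for t, y in zip(tau, y0_obs))
    det = T0 * stt - st * st
    level = (sy * stt - st * sty) / det
    slope = (T0 * sty - st * sy) / det
    return level, slope
```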
The joint posterior implied by (A.12), (A.13) and (A.15) is:

p(Π, Γ, Σ | Y, Y_0) ∝ |Σ|^{−(T+J+1)/2} exp( − (1/2) tr(Σ^{−1} U′U + Σ_0^{−1} U_0′U_0) )  (A.16)

The conditional posteriors are:

p(Σ | Π, Γ, Y, Y_0) = IW(U′U, T)  (A.17)

p(vec Π | Γ, Σ, Y, Y_0) = N( vec Π̃, Σ ⊗ (X̃′X̃)^{−1} )  (A.18)

p(vec Γ | Π, Σ, Y, Y_0) = N( G^{−1} g, G^{−1} )  (A.19)

where IW is an Inverted Wishart distribution and

X̃ ≡ X − Z(I_P ⊗ Γ)
Π̃ = (X̃′X̃)^{−1} X̃′ (Y − WΓ)
G = A′(Σ^{−1} ⊗ I_T) A + Σ_0^{−1} ⊗ W_0′W_0
g = A′(Σ^{−1} ⊗ I_T) vec(Y − XΠ) + (Σ_0^{−1} ⊗ W_0′W_0) vec Γ̂_0
A = I_J ⊗ W − (Π′ ⊗ Z)((I_P ⊗ K_{JP})(vec I_P ⊗ I_J) ⊗ I_K)

and K_{JP} is the commutation matrix (Magnus and Neudecker, 1988, p. 46). A sample from the posterior (A.16) can be generated by the Gibbs sampler, i.e. by drawing in turn from (A.17), (A.18) and (A.19).

Appendix B Marginal distribution of the observables implied by the prior on growth rates

For completeness, we write here explicitly the marginal distribution of the level of the observables Y implied by the prior for the growth rates ∆Y, such as the one in (24). For simplicity we assume that the autocorrelations of the growth rates of all variables are the same and equal to r, which results in the full variance of the growth rates having the Kronecker structure Var(vec ∆Y) = Σ_∆ ⊗ R, where R is a T_0 × T_0 matrix with ones along the diagonal and r everywhere else. To write the distribution of Y, note that defining the T_0 × 1 vector e ≡ (1, 0, ..., 0)′ and the T_0 × T_0 first-difference matrix A, with ones on the main diagonal, −1 on the first subdiagonal and zeros elsewhere, we can write:

∆Y = (y_1 − y_0, y_2 − y_1, ..., y_{T_0} − y_{T_0−1})′ = AY − e y_0  (B.1)

Defining the T_0 × 1 vectors ι ≡ (1, ..., 1)′ and τ ≡ (1, 2, ..., T_0)′, for a given initial condition y_0 this together with the prior for the growth rates implies a prior for the observables:

vec Y ∼ N( vec µ_ϕ, Σ_ϕ )  (B.2)

with

µ_ϕ = A^{−1}(e y_0 + ι µ_∆) = ι y_0 + τ µ_∆  (B.3)

Σ_ϕ = (I_N ⊗ A^{−1})(Σ_∆ ⊗ R)(I_N ⊗ A^{−1})′ = Σ_∆ ⊗ A^{−1} R (A^{−1})′  (B.4)

Appendix C Our prior is not a reparameterization

This section illustrates the problem involved in obtaining the delta prior by a reparameterization. The problem is that such attempts generate distributions of coefficients which are not independent of the error terms, which we find quite unappealing. The complication in the reparameterization is that there is no one-to-one mapping between observables and parameters, because the mapping involves another random term: the errors ε. Obtaining the delta prior through a reparameterization would therefore involve specifying the joint distribution of (y, ε), and then obtaining the distribution of (B, ε) through the change-of-variable technique. For the sake of example, consider the AR(1) model with a constant term and T_0 = 2. In this case, the mapping from (B, ε) to (y, ε) is as follows:

y_1 = α + ρ y_0 + ε_1  (C.1)
y_2 = α + αρ + ρ² y_0 + ρ ε_1 + ε_2  (C.2)
ε_1 = ε_1  (C.3)
ε_2 = ε_2  (C.4)

It is easy to verify that the Jacobian matrix of this transformation (rows: y_1, y_2, ε_1, ε_2; columns: α, ρ, ε_1, ε_2) is:

[ 1      y_0               1   0 ]
[ 1+ρ    α + 2ρy_0 + ε_1   ρ   1 ]
[ 0      0                 1   0 ]
[ 0      0                 0   1 ]

The determinant of this matrix is α + (ρ − 1)y_0 + ε_1, and the absolute value of this term multiplies the distribution in the new parameter space (α, ρ, ε_1, ε_2). This term cannot be factorized into terms involving only the ε's and terms involving only the parameters, and therefore the obtained density will not, in general, be consistent with independence of the model parameters and the errors. In principle, we could specify a joint distribution of observables and errors in which this term would cancel, and which therefore would deliver independence of parameters and errors.
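As a quick sanity check on the Jacobian determinant of the mapping (C.1)-(C.4), one can evaluate it numerically (our own sketch; `det` is a plain Laplace expansion, not a library call):

```python
def det(m):
    # determinant by Laplace expansion along the first row
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def jacobian_det(alpha, rho, y0, e1):
    # Jacobian of (alpha, rho, eps1, eps2) -> (y1, y2, eps1, eps2) in (C.1)-(C.4)
    J = [[1.0,        y0,                        1.0, 0.0],
         [1.0 + rho,  alpha + 2 * rho * y0 + e1, rho, 1.0],
         [0.0,        0.0,                       1.0, 0.0],
         [0.0,        0.0,                       0.0, 1.0]]
    return det(J)
```

For any parameter values this reproduces the claimed expression α + (ρ − 1) y_0 + ε_1.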
However, specifying such a distribution is a challenge that may not be easier than solving the functional equation (9).

Appendix D Analytical iteration on the mapping F for the AR(1)

Consider the AR(1) model without the constant term, with y_0 ≠ 0 given. The first observation must be nonzero because, if we are unlucky enough to start exactly at the mean, the growth rate in the first period does not depend on the coefficients and the prior for the growth rate will not carry any information about ρ. But we only require (9) to hold on a probability-one set of y's. For simplicity, σ² is given. Everywhere we implicitly condition on y_0 and σ². The model:

y_t = ρ y_{t−1} + ε_t,  ε_t i.i.d. N(0, σ²)  (D.1)

that is, the likelihood of a hypothetical observation in period 1:

L_{y_1|ρ}(ȳ_1; ρ) = N(ρ y_0, σ²)  (D.2)

Introduce the prior assumption about a zero (without loss of generality) growth rate in the first period:

f_{∆y_1}(∆ȳ_1) = N(0, σ_∆²)  (D.3)

which implies:

ϕ(ȳ_1) = N(y_0, σ_∆²)  (D.4)

Let us find the marginal prior f_ρ(ρ) which will be consistent with the above L and ϕ, i.e. which will satisfy:

∫ L(ȳ_1; ρ) f_ρ(ρ) dρ = ϕ(ȳ_1)  (D.5)

D.1 Guess of the solution

Analyzing the structure of the problem, we can easily guess that the solution is:

f_ρ^guess(ρ) = N( 1, (σ_∆² − σ²)/y_0² )  (D.6)

Verifying (we skip the tedious algebraic details; the integral can be performed by completing the square):

∫ L_{y_1|ρ}(ȳ_1; ρ) f_ρ^guess(ρ) dρ = ... = (2π)^{−1/2} σ_∆^{−1} exp( − (ȳ_1 − y_0)² / (2σ_∆²) ) = ϕ(ȳ_1)  (D.7)

so the guess was right: f_ρ^guess(ρ) satisfies condition (D.5).

D.2 Approaching the prior by fixed point iteration

In the context of the AR(1) model, one iteration with the mapping F starting from a flat prior produces f_ρ^{F(1)}(ρ) (as before, the integral is tedious but easy to compute by the 'completing the square' approach):

f_ρ^{F(1)}(ρ) = ∫ [ L(y_1; ρ) × 1 / ∫ L(y_1; ρ) × 1 dρ ] ϕ(y_1) dy_1 = ... = N( 1, (σ_∆² + σ²)/y_0² )  (D.8)

Verifying whether it satisfies (D.5), i.e. whether it is consistent with the desired marginal distribution of y_1, yields:

∫ L_{y_1|ρ}(ȳ_1; ρ) f_ρ^{F(1)}(ρ) dρ = ... = N( y_0, σ_∆² + 2σ² ) ≠ ϕ(ȳ_1)  (D.9)

The marginal distribution of y_1 implied by f_ρ^{F(1)}(ρ) is not what we wanted. It has the correct mean, but the variance is too high.

In the second iteration, we first compute the prior f_ρ^{F(F(1))}(ρ) by applying the mapping F to the prior obtained in the first step:

f_ρ^{F(F(1))}(ρ) = ∫ [ L(y_1; ρ) f_ρ^{F(1)}(ρ) / ∫ L(y_1; ρ) f_ρ^{F(1)}(ρ) dρ ] ϕ(y_1) dy_1 = ... = N( 1, (σ_∆² + σ²)/y_0² × (σ_∆⁴ + 2σ_∆²σ² + 2σ⁴)/(σ_∆² + 2σ²)² )  (D.10)

Conveniently, we already computed the integral in the denominator while verifying F(1) (equation (D.9) above). This prior has a smaller variance than the prior from the first step, because the second quotient in the variance is less than 1, which can be seen after expanding the denominator:

(σ_∆⁴ + 2σ_∆²σ² + 2σ⁴) / (σ_∆⁴ + 4σ_∆²σ² + 4σ⁴) < 1  (D.11)

So the prior F(F(1)) has a smaller variance than the prior F(1). However, it still does not satisfy (D.5):

∫ L_{y_1|ρ}(ȳ_1; ρ) f_ρ^{F(F(1))}(ρ) dρ = ... = N( y_0, (σ_∆⁶ + 4σ_∆⁴σ² + 8σ_∆²σ⁴ + 6σ⁶)/(σ_∆² + 2σ²)² ) ≠ ϕ(ȳ_1)  (D.12)

The marginal distribution of y_1 implied by f_ρ^{F(F(1))}(ρ) is still not right: the mean remains correct, but the variance is still too big, although smaller than in the first step, because it can be transformed to:

σ_∆² < [ (σ_∆² + 2σ²)³ − (2σ_∆⁴σ² + 4σ_∆²σ⁴ + 2σ⁶) ] / (σ_∆² + 2σ²)² < σ_∆² + 2σ²  (D.13)

(The transformation is intended to facilitate seeing the second inequality. The first inequality is easy to prove too.) So in the second iteration we got closer to the right prior. Further iterations can be seen as observing more and more samples of size T_0, and learning better and better about the marginal distribution of the parameters.
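The closed-form guess (D.6) can also be verified by simulation: drawing ρ from f_ρ^guess and then y_1 from the likelihood should reproduce the desired marginal N(y_0, σ_∆²). A quick Monte Carlo sketch (our own; it requires σ_∆ > σ for the prior variance to be positive):

```python
import random
import statistics

def draw_y1(y0, sigma, sigma_d, rng):
    # rho ~ N(1, (sigma_d^2 - sigma^2) / y0^2), the guessed prior (D.6) ...
    sd_rho = (sigma_d ** 2 - sigma ** 2) ** 0.5 / abs(y0)
    rho = rng.gauss(1.0, sd_rho)
    # ... then y1 | rho ~ N(rho * y0, sigma^2), the likelihood (D.2)
    return rho * y0 + rng.gauss(0.0, sigma)

rng = random.Random(42)
y0, sigma, sigma_d = 10.0, 1.0, 2.0
ys = [draw_y1(y0, sigma, sigma_d, rng) for _ in range(200_000)]
# the simulated marginal of y1 should match N(y0, sigma_d^2) = N(10, 4)
```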
Unfortunately, for T_0 > 1 the integrals involved in the fixed point iterations no longer have closed forms, and need to be evaluated by Monte Carlo.

References

Abadir, K. M., Hadri, K., and Tzavalis, E. (1999). The influence of VAR dimensions on estimator biases. Econometrica, 67(1):163-181.

Andrews, D. W. K. (1993). Exactly median-unbiased estimation of first order autoregressive / unit root models. Econometrica, 61(1):139-165.

Andrews, D. W. K. and Chen, H.-Y. (1994). Approximately median-unbiased estimation of autoregressive models. Journal of Business and Economic Statistics, 12(2):187-204.

Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer Series in Statistics. Springer, New York, second edition.

Berger, J. O. and Wolpert, R. L. (1988). The Likelihood Principle, volume 6 of Lecture Notes - Monograph Series. Institute of Mathematical Statistics, Hayward, California, second edition.

Bhargava, A. (1986). On the theory of testing for unit roots in observed time-series. Review of Economic Studies, 53(3):369-384.

Christiano, L. J., Eichenbaum, M., and Evans, C. L. (1999). Monetary policy shocks: What have we learned and to what end? In Taylor, J. B. and Woodford, M., editors, Handbook of Macroeconomics, volume 1A, chapter 2, pages 65-148. Amsterdam: North-Holland.

Del Negro, M. and Schorfheide, F. (2004). Priors from general equilibrium models for VARs. International Economic Review, 45(2):643-673.

Doan, T., Litterman, R., and Sims, C. (1984). Forecasting and conditional projections using realistic prior distributions. Econometric Reviews, 3(1):1-100.

Doan, T. A. (2000). RATS version 5 User's Guide. Estima, Suite 301, 1800 Sherman Ave., Evanston, IL 60201.

Doornik, J. A. (2002). Object-Oriented Matrix Programming Using Ox. London: Timberlake Consultants Press and Oxford: www.nuff.ox.ac.uk/Users/Doornik, third edition.

Hurwicz, L. (1950). Least-squares bias in time series. In Koopmans, T. C., editor, Statistical Inference in Dynamic Economic Models. Wiley, New York.

Kadane, J. B. (1980). Predictive and structural methods for eliciting prior distributions. In Zellner, A., editor, Bayesian Analysis in Econometrics and Statistics, chapter 8, pages 89-93. North-Holland, Amsterdam.

Kadane, J. B., Chan, N. H., and Wolfson, L. J. (1996). Priors for unit root models. Journal of Econometrics, 75(1):99-111.

Kadane, J. B., Dickey, J. M., Winkler, R. L., Smith, W. S., and Peters, S. C. (1980). Interactive elicitation of opinion for a normal linear model. Journal of the American Statistical Association, 75(372):845-854.

Kendall, M. G. (1954). Note on the bias in the estimation of autocorrelation. Biometrika, 41:403-404.

Kilian, L. (1998). Small-sample confidence intervals for impulse response functions. The Review of Economics and Statistics, 80(2):218-230.

Kleibergen, F. and van Dijk, H. K. (1994). On the shape of the likelihood/posterior in cointegrating models. Econometric Theory, 10(3-4):514-551.

Lubrano, M. (1995). Testing for unit roots in a Bayesian framework. Journal of Econometrics, 69:81-109.

MacKinnon, J. G. and Smith, A. A. (1998). Approximate bias correction in econometrics. Journal of Econometrics, 85:205-230.

Magnus, J. R. and Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley and Sons, New York.

Marriott, F. H. C. and Pope, J. A. (1954). Bias in the estimation of autocorrelations. Biometrika, 41:393-403.

Müller, U. K. and Elliott, G. (2003). Tests for unit roots and the initial condition. Econometrica, 71(4):1269-1286.

Orcutt, G. H. and Winokur, H. S. (1969). First order autoregression: Inference, estimation, and prediction. Econometrica, 37(1):1-14.

Phillips, P. C. B. (1991). To criticize the critics: An objective Bayesian analysis of stochastic trends. Journal of Applied Econometrics, 6(4):333-364.

Quenouille, M. H. (1949). Approximate tests of correlation in time-series. Journal of the Royal Statistical Society Series B, 11:68-84.

Roy, A. and Fuller, W. A. (2001). Estimation for autoregressive time series with a root near 1. Journal of Business and Economic Statistics, 19(4):482-493.

Schotman, P. C. and van Dijk, H. K. (1991a). A Bayesian analysis of the unit root in real exchange rates. Journal of Econometrics, 49(1-2):195-238.

Schotman, P. C. and van Dijk, H. K. (1991b). On Bayesian routes to unit roots. Journal of Applied Econometrics, 6(4):387-401.

Sims, C. A. (2000). Using a likelihood perspective to sharpen econometric discourse: Three examples. Journal of Econometrics, 95:443-462.

Sims, C. A. (2006). Conjugate dummy observation priors for VARs. Technical report, Princeton University.

Sims, C. A. (2007a). Making macro models behave reasonably. Technical report, Princeton University.

Sims, C. A. (2007b). Rational inattention: A research agenda. Technical report, Princeton University.

Sims, C. A. (revised 1996). Inference for multivariate time series models with trend. Discussion paper, presented at the 1992 American Statistical Association Meetings.

Sims, C. A. and Uhlig, H. (1991). Understanding unit rooters: A helicopter tour. Econometrica, 59(6):1591-1599.

Sims, C. A. and Zha, T. (1998). Bayesian methods for dynamic multivariate models. International Economic Review, 39(4):949-968.

Sims, C. A. and Zha, T. (1999). Error bands for impulse responses. Econometrica, 67(5):1113-1155.

Stine, R. A. and Shaman, P. (1989). A fixed point characterization for bias of autoregressive estimators. The Annals of Statistics, 17(3):1275-1284.

Thornber, H. (1967). Finite sample Monte Carlo studies: An autoregressive illustration. Journal of the American Statistical Association, 62(319):801-818.

Uhlig, H. (1994a). On Jeffreys prior when using the exact likelihood function. Econometric Theory, 10(3-4):633-644.

Uhlig, H. (1994b). What macroeconomists should know about unit roots: a Bayesian perspective. Econometric Theory, 10(3-4):645-671.

Villani, M. (2008). Steady state priors for vector autoregressions. Journal of Applied Econometrics. Forthcoming.

Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. Wiley, New York.

Zivot, E. (1994). A Bayesian analysis of the unit-root hypothesis within an unobserved components model. Econometric Theory, 10(3-4):552-578.
