Bayesian Inference in Econometric Models Using Monte Carlo Integration
Author(s): John Geweke
Source: Econometrica, Vol. 57, No. 6 (Nov., 1989), pp. 1317-1339
Published by: The Econometric Society
Stable URL: http://www.jstor.org/stable/1913710

BAYESIAN INFERENCE IN ECONOMETRIC MODELS USING MONTE CARLO INTEGRATION

BY JOHN GEWEKE^1

Methods for the systematic application of Monte Carlo integration with importance sampling to Bayesian inference in econometric models are developed.
Conditions under which the numerical approximation of a posterior moment converges almost surely to the true value as the number of Monte Carlo replications increases, and under which the numerical accuracy of this approximation may be assessed reliably, are set forth. Methods for the analytical verification of these conditions are discussed. Importance sampling densities are derived from multivariate normal or Student t approximations to the local behavior of the posterior density at its mode. These densities are modified by automatic rescaling along each axis. The concept of relative numerical efficiency is introduced to evaluate the adequacy of a chosen importance sampling density. The practical procedures based on these innovations are illustrated in two different models.

KEYWORDS: Importance sampling, numerical integration, Markov chain model, ARCH linear model.

1. INTRODUCTION

ECONOMETRIC MODELS are usually expressed in terms of an unknown vector of parameters θ ∈ Θ ⊆ R^k, which fully specifies the joint probability distribution of the observations X = {x_1, ..., x_T}. In most cases there exists a probability density function f(X|θ), and classical inference then often proceeds from the likelihood function L(θ) = f(X|θ). The asymptotic behavior of the likelihood function is well understood, and as a consequence there is a well developed set of tools with which problems of computation and inference can be approached; Quandt (1983) and Engle (1984) provide useful surveys. The analytical problems in a new model are often far from trivial, but there are typically several approaches that can be explored systematically with the realistic anticipation that one or more will lead to classical inference procedures with an asymptotic justification. Bayesian inference proceeds from the likelihood function and prior information which is usually expressed as a probability density function over the parameters, π(θ), it being implicit that π(θ) depends on the conditioning set of prior information.
The posterior density is proportional to p(θ) = π(θ)L(θ). This formulation restricts the prior probability measure for θ to be absolutely continuous with respect to Lebesgue measure. Extensions that allow concentrations of prior probability mass are generally straightforward but notationally cumbersome, and are postponed to the concluding section. Most Bayesian inference problems can be expressed as the evaluation of the expectation of a function of interest g(θ) under the posterior,

(1)  E[g(θ)] = ∫_Θ g(θ)p(θ) dθ / ∫_Θ p(θ) dθ.

Approaches to this problem are nowhere near as systematic, methodical, or general as are those to classical inference problems: classical inference is carried out routinely using likelihood functions for which the evaluation of (1) is at best a major and dubious undertaking and for most practical purposes impossible. If the integration in (1) is undertaken analytically then the range of likelihood functions that can be considered is small, and the class of priors and functions of interest that can be considered is severely restricted. Many numerical approaches, like quadrature methods, require special adaptation for each g, π, or L, and become unworkable if k exceeds, say, three. (Good surveys of these methods may be found in Davis and Rabinowitz, 1975; Hammersley and Handscomb, 1979; and Rubinstein, 1981.) With the advent of powerful and cheap computing, numerically intensive methods for the computation of (1) have become more attractive. Monte Carlo integration with importance sampling provides a systematic approach that, in principle, can be applied in any situation in which E[g(θ)] exists, and is practical for large values of k.

^1 Financial support for this work was provided by NSF Grant SES-8605867, and by a grant from the Pew Foundation to Duke University. This work benefited substantially from suggestions by an editor, Jean-Francois Richard, Herman van Dijk, Robert Wolpert, and two anonymous referees, who bear no responsibility for any errors of commission or omission.
It was discussed by Hammersley and Handscomb (1964, Section 5.4) and brought to the attention of econometricians by Kloek and van Dijk (1978). The main idea is simple. Let {θ_i} be a sequence of k-dimensional random vectors, i.i.d. in theory and in practice generated synthetically and therefore a very good approximation to an i.i.d. sequence. If the probability density function of the θ_i is proportional to p(θ), then n^{-1} Σ_{i=1}^n g(θ_i) → E[g(θ)]; if it is proportional to L(θ), then Σ_{i=1}^n g(θ_i)π(θ_i) / Σ_{i=1}^n π(θ_i) → E[g(θ)]. (Except in Section 4, all convergence is in n, the number of Monte Carlo replications. Almost sure convergence is indicated "→", convergence in distribution "⇒".) Only in a very limited set of simple cases is it feasible to generate synthetic variates whose p.d.f. is in proportion to the posterior or the likelihood function (but the set does include common and interesting cases in which classical and analytical Bayesian inference fail; see Geweke, 1986). More generally, suppose that the probability density function of the θ_i is I(θ), termed the importance sampling density. Then, under very weak assumptions given in Section 2,

(2)  ḡ_n = Σ_{i=1}^n g(θ_i)[p(θ_i)/I(θ_i)] / Σ_{i=1}^n [p(θ_i)/I(θ_i)] → E[g(θ)].

The rate of almost sure convergence in (2) depends critically on the choice of the importance sampling density. Under stronger assumptions, developed in Section 3,

(3)  n^{1/2}(ḡ_n − E[g(θ)]) ⇒ N(0, σ²),

and σ² may be estimated consistently. This result was indicated by Kloek and van Dijk (1978), but does not follow from the result of Cramer (1946) they cite, and is repeated by Bauwens (1984) but without explicit discussion of the moments whose existence is assumed. Loosely speaking, the importance sampling density should mimic the posterior density, and it is especially important that the tails of I(θ) not decay more quickly than the tails of p(θ).
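The self-normalized estimator in (2) takes only a few lines of code. The following sketch is ours, not the paper's software; the posterior kernel and importance density are arbitrary illustrative choices, and because only the ratio of kernels enters, normalizing constants may be dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_mean(g_vals, log_w):
    """Self-normalized importance sampling estimator of eq. (2):
    g_n = sum_i g(theta_i) w(theta_i) / sum_i w(theta_i),
    where log_w[i] = log p(theta_i) - log I(theta_i) up to a constant;
    g_n is invariant to arbitrary scaling of p and I."""
    w = np.exp(log_w - log_w.max())   # stabilize before exponentiating
    return np.sum(g_vals * w) / np.sum(w)

# Illustration (assumed setup): posterior kernel p(theta) = exp(-theta^2/2),
# i.e. standard normal; importance density I = N(0, 2^2); target E[theta^2] = 1.
theta = rng.normal(0.0, 2.0, size=100_000)
log_w = -theta**2 / 2 - (-theta**2 / 8)   # log p - log I, constants dropped
estimate = is_mean(theta**2, log_w)
```

With 100,000 replications the estimate should be within a fraction of a percent of the true posterior moment E[θ²] = 1.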
The importance of this tail behavior is keenly appreciated by those who have done empirical work with importance sampling, including Zellner and Rossi (1984), Gallant and Monahan (1985), Bauwens and Richard (1985), and van Dijk, Kloek, and Boender (1985). These and other investigators have experienced substantial difficulties in tailoring importance sampling densities to the problem at hand. This is an important failing, for the approach is neither attractive nor methodical if special arcane problems in numerical analysis have to be resolved in each application.

This paper approaches these problems analytically, and presents new results which should make the application of Monte Carlo integration by importance sampling much more routine. Section 3 provides sufficient and easily verified conditions for (3). It also introduces a measure of relative numerical efficiency which is natural to use in evaluating the effectiveness of any given importance sampling density. Based on these considerations, Section 4 outlines a systematic approach to the choice of I(θ). It utilizes the local behavior of the posterior density at its mode, and a new class of importance sampling densities, the multivariate split-normal and multivariate split-Student. These developments are illustrated in Section 5, with the homogeneous Markov chain model, and in Section 6, with the ARCH linear model. Some directions for future research are taken up in the last section.

2. BAYESIAN INFERENCE WITH IMPORTANCE SAMPLING

We begin with some additional notation, and a set of assumptions basic to what follows. Since expectations are taken with respect to a variety of distributions, it is useful to denote

E_f[h(θ)] = [∫_Θ f(θ) dθ]^{-1} ∫_Θ f(θ)h(θ) dθ,

where f(θ) is proportional to a probability density and h(θ) is integrable with respect to the probability measure that is induced by f(θ). Similarly denote by var_f[h(θ)] and md_f[h(θ)] the variance and mean deviation of h(θ) under this probability measure, when the respective moments exist.
ASSUMPTION 1: The product of the prior density, π(θ), and the likelihood function, L(θ), is proportional to a proper probability density function defined on Θ.

ASSUMPTION 2: {θ_i}_{i=1}^∞ is a sequence of i.i.d. random vectors, the common distribution having a probability density function I(θ).

ASSUMPTION 3: The support of I(θ) includes Θ.

ASSUMPTION 4: E[g(θ)] exists and is finite.

Assumption 1 is generally satisfied and usually not difficult to verify. It explicitly allows the prior density to be improper, so long as ∫_Θ p(θ) dθ = c < ∞. For h(θ) integrable with respect to the posterior probability measure, the simpler notation h̄ = E[h(θ)] (and similarly var[h(θ)] and md[h(θ)]) will be used. Assumptions 2 and 3 are specific to the method of Monte Carlo integration with importance sampling; their role is obvious from the discussion in the introduction. Most inference problems can be cast in the form of determining E[g(θ)], through the function of interest, g(θ).

The value of ḡ_n in (2) is invariant with respect to arbitrary scaling of the posterior and importance sampling densities. In certain situations it is convenient not to work with normalizing constants for I(θ), so we shall re-express ḡ_n with I*(θ) = d^{-1}I(θ) in place of I(θ). Denote the weight function w(θ) = p(θ)/I*(θ); then ḡ_n = Σ_{i=1}^n g(θ_i)w(θ_i) / Σ_{i=1}^n w(θ_i). Consistency of ḡ_n for ḡ is a direct consequence of the strong law of large numbers.

THEOREM 1: Under Assumptions 1-4, ḡ_n → E[g(θ)].

This simple result provides a foundation for the numerical treatment of a very wide array of practical problems in Bayesian inference. A few classes of examples are worth mention.

(i) If the value of a Borel measurable function of the parameters, h(θ), whose expectation exists and is finite, is to be estimated with a quadratic loss function, then g(θ) = h(θ).

(ii) If a decision is to be made between actions a_1 and a_2, according to the expectation of loss functions l(θ; a_1) and l(θ; a_2), then g(θ) = l(θ; a_1) − l(θ; a_2).
(iii) Posterior probabilities may be assessed through indicator functions of the form χ(θ; S) = 1 if θ ∈ S, 0 if θ ∉ S. To evaluate the posterior probability P[h(θ) ∈ A], let g(θ) = χ(θ; {θ: h(θ) ∈ A}). Assumption 4 is satisfied trivially, so given the other assumptions ḡ_n → P[h(θ) ∈ A]. Construction of posterior c.d.f.'s is a special case that is particularly relevant if h(θ) itself has no posterior moments. (An example is provided subsequently in Section 5; see especially Table IV.)

(iv) The application to c.d.f.'s suggests how to construct quantiles. Suppose P[h(θ) ≤ x] = α, and for a given ε > 0, P[h(θ) ≤ x − ε] < α and P[h(θ) ≤ x + ε] > α. If q_n is defined to be any number such that Σ_{i: h(θ_i) ≤ q_n} w(θ_i)/Σ_{i=1}^n w(θ_i) ≥ α and Σ_{i: h(θ_i) > q_n} w(θ_i)/Σ_{i=1}^n w(θ_i) ≥ 1 − α, then by Theorem 1, lim_{n→∞} P[q_n ∈ (x − ε, x + ε)] = 1. Hence if the c.d.f. of h(θ) is strictly increasing at x, the αth quantile can be computed routinely as a byproduct of Monte Carlo integration with importance sampling.

(v) In forecasting problems functions of interest are of the form g(θ) = P(x_{T+s} ∈ A | θ, X), and in this case g(θ) itself is an integral. Monte Carlo integration with an importance sampling density for the parameters may be combined with Monte Carlo integration for future realizations of the random vector x_{T+s} to form predictive densities (Geweke (1988b)).

3. EVALUATING THE IMPORTANCE SAMPLING DENSITY

Standing alone these results are of little practical value, because nothing can be said about rates of convergence and there is no formal guidance for choosing I(θ) beyond the obvious requirement that subsets of the support of the posterior should not be systematically excluded. Moreover, as noted in the introduction, for seemingly sensible choices of I(θ), ḡ_n can behave badly.
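A minimal numerical illustration of how this bad behavior arises (an assumed setup of ours, not an example from the paper): if the posterior kernel has Student t(3) tails while I(θ) is standard normal, the weight w(θ) = p(θ)/I(θ) is unbounded, whereas a Student importance density with weakly fewer degrees of freedom keeps it bounded.

```python
import numpy as np

def log_w(theta, log_p, log_I):
    # log of the importance weight, up to an additive constant
    return log_p(theta) - log_I(theta)

# Target kernel: Student t with nu = 3 degrees of freedom.
nu = 3.0
log_p = lambda th: -(nu + 1) / 2 * np.log(1 + th**2 / nu)

log_I_normal = lambda th: -th**2 / 2                             # thin-tailed I
log_I_student = lambda th: -(2 + 1) / 2 * np.log(1 + th**2 / 2)  # t(2), fatter tails

# The normal-importance log weight explodes in the tail ...
blowup = log_w(20.0, log_p, log_I_normal) - log_w(0.0, log_p, log_I_normal)

# ... while the t(2)-importance log weight stays bounded over the whole axis.
grid = np.linspace(-50, 50, 100_001)
bound = log_w(grid, log_p, log_I_student).max()
```

Occasional draws near θ = 20 would receive weights about e^190 times those near the mode, producing exactly the erratic fluctuations in ḡ_n described next.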
Poor behavior is usually manifested by values of ḡ_n that exhibit substantial fluctuations after thousands of replications, which in turn can be traced to extremely large values of w(θ_i) that turn up occasionally. The key problem is the distribution of the ratio of two means in ḡ_n, and this may be characterized by a central limit theorem.

THEOREM 2: In addition to Assumptions 1-4, suppose

E[w(θ)] = c^{-1} ∫_Θ [p(θ)²/I*(θ)] dθ  and  E[g(θ)²w(θ)] = c^{-1} ∫_Θ [g(θ)²p(θ)²/I*(θ)] dθ

are finite. Let

σ² = c^{-1}d^{-1} E{[g(θ) − ḡ]²w(θ)},
σ̂_n² = n Σ_{i=1}^n [g(θ_i) − ḡ_n]²w(θ_i)² / [Σ_{i=1}^n w(θ_i)]².

Then n^{1/2}(ḡ_n − ḡ) ⇒ N(0, σ²) and σ̂_n² → σ².

PROOF: Define A(θ) = g(θ)w(θ) and observe that ḡ_n = [n^{-1} Σ_{i=1}^n A(θ_i)] / [n^{-1} Σ_{i=1}^n w(θ_i)]. From Theorem 1, n^{-1} Σ_{i=1}^n A(θ_i) → cdḡ and n^{-1} Σ_{i=1}^n w(θ_i) → cd. By a Central Limit Theorem (e.g. Billingsley (1979, Theorem 29.5)),

n^{1/2}(ḡ_n − ḡ) ⇒ N(0, σ²),
σ² = c^{-2}d^{-2}{var[A(θ)] + ḡ² var[w(θ)] − 2ḡ cov[A(θ), w(θ)]}
   = c^{-2}d^{-1} ∫_Θ [g(θ) − ḡ]²w(θ)p(θ) dθ.    Q.E.D.

We shall refer to σ̂_n = (σ̂_n²/n)^{1/2} as the numerical standard error of ḡ_n. The conditions of Theorem 2 must be verified, analytically, if that result is to be used to assess the accuracy of ḡ_n as an approximation of E[g(θ)]. Failure of these conditions does not vitiate the use of ḡ_n: there is no compelling reason why n^{1/2}(ḡ_n − ḡ) needs to have a limiting distribution. But it is essential to have some indication of |ḡ_n − ḡ|, and this task becomes substantially more difficult if a central limit theorem cannot be applied. Practicality suggests that I(θ) be chosen so that Theorem 2 applies. The additional assumptions in Theorem 2 typically can be verified by showing either

(4)  w(θ) ≤ w̄ < ∞ for all θ ∈ Θ, and var[g(θ)] < ∞; or
(5)  Θ is compact, p(θ) ≤ p̄ < ∞ and I(θ) ≥ ε > 0, for all θ ∈ Θ.

Demonstration of (4) involves comparison of the tail behaviors of p(θ) and I(θ).
Demonstration of (5) is generally simple, although Θ may be compact only after a nontrivial reparameterization of the prior and the likelihood. Meeting these conditions does not establish the reliability of ḡ_n and σ̂_n² in any practical sense: examples in Section 5 demonstrate uncontrived cases in which (5) applies but σ̂_n² would be unreliable even for n = 10^20. The strength of these conditions is, rather, that they provide a starting point to use numerical methods in constructing I(θ). We shall return to this endeavor in Section 4.

The expression for σ² in Theorem 2 indicates that the numerical standard error is adversely affected by large var[g(θ)], and by large relative values of the weight function. The former is inherent in the function of interest and in the posterior density, but the latter can in principle be controlled through the choice of the importance sampling density. A simple benchmark for comparing the adequacy of importance sampling densities is the numerical standard error that would result if the importance sampling density were the posterior density itself, i.e., I*(θ) ∝ p(θ). In this case σ² = var[g(θ)] and the number of replications controls the numerical standard error relative to the posterior standard deviation of the function of interest; e.g., n = 10,000 implies the former will be one percent of the latter. This is a very appealing metric for numerical accuracy.

Only in special cases is it possible or practical to construct synthetic variates whose distribution is the same as the posterior. But since var[g(θ)] can be computed as a routine byproduct of Monte Carlo integration, it is possible to see what the numerical variance would have been, had it been possible to generate synthetic random vectors directly from the posterior distribution. Define the relative numerical efficiency of the importance sampling density for the function of interest g(θ),

RNE ≡ var[g(θ)] / (c^{-1}d^{-1} E{[g(θ) − ḡ]²w(θ)}) = var[g(θ)]/σ².
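Both σ̂_n² and the RNE are simple statistics of the same weighted sample used to form ḡ_n. A sketch (the function name and packaging are ours):

```python
import numpy as np

def is_diagnostics(g_vals, w):
    """Return (g_n, numerical standard error, RNE) from an importance sample.
    sigma2_hat = n * sum_i (g_i - g_n)^2 w_i^2 / (sum_i w_i)^2 estimates
    sigma^2 of Theorem 2; the numerical standard error is (sigma2_hat/n)^{1/2};
    RNE = var[g]/sigma2_hat, with var[g] estimated from the weighted sample."""
    n = len(w)
    sw = np.sum(w)
    g_n = np.sum(g_vals * w) / sw
    sigma2_hat = n * np.sum((g_vals - g_n) ** 2 * w**2) / sw**2
    var_g = np.sum((g_vals - g_n) ** 2 * w) / sw
    return g_n, np.sqrt(sigma2_hat / n), var_g / sigma2_hat

# Sanity check: when I is the posterior itself all weights are equal, the
# numerical standard error is sd[g]/sqrt(n), and RNE = 1 exactly.
rng = np.random.default_rng(0)
g_vals = rng.standard_normal(10_000)
g_n, nse, rne = is_diagnostics(g_vals, np.ones(10_000))
```

With equal weights and n = 10,000 the numerical standard error is one percent of the sample standard deviation of g, matching the benchmark described above.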
The RNE is the ratio of the number of replications required to achieve any specified numerical standard error using the posterior density as the importance sampling density, to the number required using the importance sampling density I(θ). The numerical standard error for ḡ_n is the fraction (RNE·n)^{-1/2} of the posterior standard deviation.

Low values of RNE indicate that there exists an importance sampling density (namely, the posterior density itself) that does not have to be tailored specifically to the function of interest, and provides substantially greater numerical efficiency. Thus they alert the investigator to the possibility that more efficient, yet practical, importance sampling densities might be found. If one is willing to consider importance sampling densities that depend on the function of interest (which will be impractical if expected values of many functions of interest are to be computed), then even greater efficiencies can generally be achieved. A lower bound on σ² for all such functions can be expressed.

THEOREM 3: In addition to Assumptions 1-4 suppose that md[g(θ)] is finite. Then the importance sampling density that minimizes σ² has kernel |g(θ) − ḡ|p(θ), and for this choice σ² = {md[g(θ)]}².

PROOF: From Theorem 2, σ² = c^{-2} ∫_Θ K(θ)²/I(θ) dθ, when we impose the constraint d = ∫_Θ I(θ) dθ = 1 and define K(θ) = |g(θ) − ḡ|p(θ). Following Rubinstein (1981, Theorem 4.3.1; the restriction to bounded Θ is inessential), σ² is minimized by I(θ) ∝ K(θ). Q.E.D.

Theorem 3 has some interesting direct applications. An important class of cases consists of model evaluation problems: θ ∈ Θ* or θ ∉ Θ*, g(θ) is an indicator function for Θ*, and so E[g(θ)] = P[θ ∈ Θ*] = p*. Then σ² is minimized if I(θ) ∝ (1 − p*)p(θ) for θ ∈ Θ* and I(θ) ∝ p*p(θ) for θ ∉ Θ*: half the drawings should be made in Θ* and half in its complement, and in proportion to the posterior in each case.
It appears impractical to construct the densities with kernel |g(θ) − ḡ|p(θ) in general: a different function would be required for each g(θ); a preliminary evaluation of ḡ with a less efficient method would be a prerequisite; and methods for sampling from those importance sampling densities would need to be devised. However, Theorem 3 suggests that importance sampling densities with tails thicker than the posterior density (a characteristic of densities with kernel |g(θ) − ḡ|p(θ)) might be more efficient than the posterior density itself (and hence have RNE > 1) although still suboptimal. This is found in practice with some frequency, and some examples are given in Section 5.

4. CHOOSING THE IMPORTANCE SAMPLING DENSITY

There are two logical steps in choosing the importance sampling density. The first is to determine a class of densities that satisfy the conditions of Theorem 2, usually by using (4) or (5). The second step is to find a density within that class that attains a satisfactory RNE for the functions of interest. The first objective can only be achieved analytically. The second is a well-defined problem in its own right, that can be solved numerically.

When Θ is compact the first step will usually be trivial. For example, if the classical regularity conditions for the asymptotic normal distribution of the maximum likelihood estimator θ̂ obtain, with θ̂ asymptotically N(θ*, V), then the N(θ̂, V) importance sampling density satisfies (5). If Θ is not compact, it will generally be necessary to investigate the tail behavior of the likelihood function to verify (4). There are many instances in which L(θ) behaves like k|θ_i|^{-q} for large values of θ_i, and it is not difficult to determine q.
Given a multivariate normal prior on θ, the tail of the marginal posterior density in θ_i will behave like exp(−cθ_i²) for large values of θ_i, and a multivariate normal importance sampling density may be appropriate. On the other hand, if the prior is improper, and flat for one or more parameters, then in these instances no multivariate normal importance sampling density will satisfy Theorem 2. An instructive leading example is the computation of the expected value of a function of the mean of a univariate normal population with unknown mean and variance, using the N(x̄, s²/n) importance sampling density. It is not hard to see that (a) Theorem 2 does not apply; (b) if one computes σ̂_n² and the weighted-sample estimate v̂ar[g(θ)] = Σ_{i=1}^n [g(θ_i) − ḡ_n]²w(θ_i)/Σ_{i=1}^n w(θ_i), then the computed RNE approaches 0 almost surely as n → ∞; and (c) if n is held fixed, the computed RNE → 1 as sample size approaches infinity. These results are typical of cases in which the classical asymptotic expansion of the log-likelihood function is valid, and Θ is not compact. They underscore the importance of investigating the tail behavior of the likelihood function analytically rather than numerically.

Once the tail behavior of the likelihood function is characterized, a class of importance sampling densities is usually suggested. For example, if the tail is multivariate Student with variance matrix Σ and degrees of freedom ν, a multivariate t importance sampling density with weakly larger variance and weakly smaller degrees of freedom will satisfy Theorem 2. In general let I(θ; a) be a family of importance sampling densities indexed by the g × 1 vector a, and let w(θ; a) = p(θ)/I(θ; a) be the corresponding family of weight functions. It will not usually be practical to choose a distinct importance sampling density for each function of interest.
A reasonable objective is the minimization of E[w(θ; a)] = c^{-1} ∫_Θ [p(θ)²/I(θ; a)] dθ. This function appears in the first condition of Theorem 2 and in the denominator of each E[g(θ)]; it is precisely the function one would minimize if g(θ) were an indicator function for a set Θ* with P(Θ*) = 1/2.

Two related classes of importance sampling densities have turned out to be quite useful in our experience with this approach. They involve densities that have not, to our knowledge, been described in the literature. The heuristic idea is to begin with minus the inverse of the Hessian of the log posterior density evaluated at its mode as the variance matrix of a multivariate normal importance sampling density, and shift to the multivariate Student t, if warranted, with degrees of freedom indicated by the tail behavior of the posterior density. If the prior is diffuse, the asymptotic variance of the maximum likelihood estimator may be substituted for minus the inverse of the Hessian of the log posterior density evaluated at the mode. In practice either importance sampling density can be poor, because the Hessian poorly predicts (especially if it underpredicts) the value of the posterior density away from the mode, because the posterior density is substantially asymmetric, or both. Illustrations of these problems are provided in the next two sections. To cope with them we modify the multivariate normal or Student t, comparing the posterior density and importance sampling density in each direction along each axis, in each case adjusting the importance sampling density to reduce E[w(θ; a)]. With k parameters, there are 2k adjustments in the last step, so a is a 2k × 1 vector.

A little more formally, let sgn+(·) be the indicator function for nonnegative real numbers, and let sgn-(·) = 1 − sgn+(·). The k-variate split normal density N*(μ, T, q, r) is readily described by construction of a member x of its population:

ε ~ N(0, I_k);  η_i = [q_i sgn+(ε_i) + r_i sgn-(ε_i)]ε_i  (i = 1, ..., k);  x = μ + Tη.
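The three-step construction just given translates directly into code. This sketch is ours (the one-dimensional illustration at the end is an arbitrary choice): draws with ε_i ≥ 0 are rescaled by q_i, draws with ε_i < 0 by r_i, before the linear map x = μ + Tη.

```python
import numpy as np

rng = np.random.default_rng(1)

def split_normal_draws(mu, T, q, r, n):
    """Draw n variates from the k-variate split normal N*(mu, T, q, r):
    eps ~ N(0, I_k); eta_i = q_i*eps_i if eps_i >= 0 else r_i*eps_i;
    x = mu + T @ eta."""
    k = len(mu)
    eps = rng.standard_normal((n, k))
    eta = np.where(eps >= 0.0, q, r) * eps   # axis-by-axis, direction-by-direction
    return mu + eta @ np.asarray(T).T

# Illustration: k = 1, T = 1, right scale q = 2, left scale r = 1, so the draws
# are right-skewed; for mu = 0 the mean is (q - r)/sqrt(2*pi), with median 0.
x = split_normal_draws(np.zeros(1), np.eye(1), np.array([2.0]), np.array([1.0]),
                      400_000)
```

Note that the split leaves the probability of each side at one half; only the scale of each half-normal piece changes.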
The log-p.d.f. at x is (up to an additive constant)

−Σ_{i=1}^k [log(q_i)sgn+(ε_i) + log(r_i)sgn-(ε_i)] − (1/2)ε'ε.

In the application here, μ is the posterior mode, and T is a factorization such that the inverse of TT' is the negative of the Hessian of the log posterior density evaluated at the mode. A variate from the multivariate split Student density t*(μ, T, q, r, ν) is constructed the same way, except that η_i = [q_i sgn+(ε_i) + r_i sgn-(ε_i)]ε_i(ζ/ν)^{-1/2}, with ζ ~ χ²(ν). The log-p.d.f. is (up to an additive constant)

−Σ_{i=1}^k [log(q_i)sgn+(ε_i) + log(r_i)sgn-(ε_i)] − [(ν + k)/2] log(1 + ν^{-1}ε'ε).

Choosing a' = (q', r') to minimize E[w(θ; a)] requires numerical integration at each step of the usual algorithms for minimization of a nonlinear function. Solution of this optimization problem with any accuracy would, in most applications, likely be more time consuming than the computation of E[g(θ)] itself. A much more efficient procedure can be used to select values for q and r that, in many applications, produces high RNE's. The intuitive idea is to explore each axis in each direction, to find the slowest rate of decline in the posterior density relative to a univariate normal (or Student) density, and then choose the variance of the normal (or Student) density to match that slowest rate of decline.

A little more formally, let e^(i) be a k × 1 indicator vector, e_i^(i) = 1 and e_j^(i) = 0 for j ≠ i. For the split normal define f_i(δ) according to

(6)  p(μ + δTe^(i))/p(μ) = exp(−δ²/2f_i(δ)²), i.e., f_i(δ) = |δ|{2[log p(μ) − log p(μ + δTe^(i))]}^{-1/2}.

Then take q_i = sup_{δ>0} f_i(δ) and r_i = sup_{δ<0} f_i(δ). For the split Student the procedure is the same except that

f_i(δ) = ν^{-1/2}|δ|{[p(μ)/p(μ + δTe^(i))]^{2/(ν+k)} − 1}^{-1/2}.

In practice, carrying out the evaluation for δ = 1/2, 1, ..., 6 (in each direction) seems to be satisfactory. The two examples that follow illustrate how these methods are applied.

5. EXAMPLES: THE BINOMIAL AND MARKOV CHAINS

In this section we take up a set of successively complex examples in which the parameter space Θ is compact.
We begin with situations which can be studied analytically in some detail, and consequently there would be no need for a numerical approach at all. This is strictly for illustrative purposes. As will be seen, not much elaboration is required to generate realistic models in which there are no analytical solutions.

5.1. A Simple Binomial Model

Let a sample of size N be drawn from a population in which P[x = 1] = θ, P[x = 0] = 1 − θ, and suppose there are M occurrences of "1". If the prior is π(θ) = 1, 0 < θ < 1, then the posterior is

(7)  p(θ) = [B(M + 1, N − M + 1)]^{-1} θ^M (1 − θ)^{N−M}  (0 ≤ θ ≤ 1).

Since N^{1/2}(θ̂ − θ) ⇒ N(0, θ(1 − θ)), a normal approximation based on the asymptotic distribution of the m.l.e. θ̂ is

I(θ) = [2πN^{-1}θ̂(1 − θ̂)]^{-1/2} exp{−N(θ − θ̂)²/2θ̂(1 − θ̂)}.

Since 0 < θ < 1, condition (5) is satisfied. Consider two cases. In Case A, N = 69, M = 6, θ̂ = .087, the asymptotic standard error for θ̂ is .034 and the asymptotic variance is .00115. In Case B, N = 71, M = 54, θ̂ = .761, the asymptotic standard error for θ̂ is .051, and the asymptotic variance is .00256. For the corresponding normal importance sampling densities Figure 1 shows the weight functions w(θ) = p(θ)/I(θ), indicated by the heavy lines. In each case w(θ) appears badly behaved, attaining a value of over 10^70 for θ around .9 in Case A, and attaining a value of over 10^4 for θ between .2 and .3 in Case B. For our purposes it is E[w(θ)] = ∫ w(θ)p(θ) dθ and not w(θ) which matters. The light lines in Figure 1 show w(θ)p(θ), which is much less than w(θ) when w(θ) is large. In Case A E[w(θ)] = 1.65 × 10^24, and in Case B E[w(θ)] = 1.05. Clearly the importance sampling density I(θ) is adequate in the latter case but not the former. The difficulty in Case A is that the likelihood function declines more slowly than its asymptotic normal approximation, for θ > θ̂. A split normal importance sampling density with variance .00115 for θ < θ̂ and variance greater than .00115 for θ > θ̂ performs better, as shown in Figure 2.
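The Case A comparison, and the axis-scanning rule (6) of Section 4, can be reproduced by direct quadrature on a grid. This sketch is ours (grid size, the truncated-series quadrature, and the tolerances are our choices); it computes E[w(θ)] = ∫ p(θ)²/I(θ) dθ under the asymptotic normal density and under a split normal whose right-hand scale comes from the scan rule.

```python
import numpy as np
from math import lgamma, log, pi, sqrt

# Case A: N = 69, M = 6; posterior is Beta(M+1, N-M+1) on (0, 1).
N, M = 69, 6
theta_hat = M / N                                 # posterior mode and m.l.e.
s = sqrt(theta_hat * (1 - theta_hat) / N)         # asymptotic s.d., about .034

grid = np.linspace(1e-6, 1 - 1e-6, 400_001)
log_B = lgamma(M + 1) + lgamma(N - M + 1) - lgamma(N + 2)
log_post = M * np.log(grid) + (N - M) * np.log(1 - grid) - log_B

def log_post_at(theta):
    return M * log(theta) + (N - M) * log(1 - theta) - log_B

def log_split_normal(x, mu, s_left, s_right):
    # split normal density: common height at mu, separate scale on each side
    sd = np.where(x < mu, s_left, s_right)
    return log(2 / (sqrt(2 * pi) * (s_left + s_right))) - (x - mu)**2 / (2 * sd**2)

def expected_weight(log_I):
    # E[w] = integral of p(theta)^2 / I(theta) over (0, 1), by Riemann sum
    return np.exp(2 * log_post - log_I).sum() * (grid[1] - grid[0])

# Right-hand scale from the scan rule (6): f(delta) = |delta| *
# {2[log p(mode) - log p(mode + delta*s)]}^(-1/2), sup over delta = 1/2, ..., 6.
deltas = np.arange(1, 13) / 2
q = max(d / sqrt(2 * (log_post_at(theta_hat) - log_post_at(theta_hat + d * s)))
        for d in deltas)

Ew_normal = expected_weight(log_split_normal(grid, theta_hat, s, s))
Ew_split = expected_weight(log_split_normal(grid, theta_hat, s, q * s))
```

The scan yields q around 1.4, i.e., a right-hand variance of roughly twice the asymptotic variance; E[w(θ)] for the symmetric normal density comes out astronomically large (the text reports 1.65 × 10^24), while for the rescaled split normal it is of order one.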
As the latter variance increases, E[w(θ)] decreases exponentially until the variance reaches .0016; E[w(θ)] attains its minimum at .0019, and then increases slowly.

FIGURE 1.-Values of log(w(θ)) (heavy lines) and log(w(θ)p(θ)) (light lines) for the two binomial models discussed in Section 5.1.

FIGURE 2.-Values of E(w(θ)) as a function of the variance of a split normal importance sampling density, centered at θ̂, for Case A discussed in Section 5.1.

5.2. The Two-State Homogeneous Markov Chain Model^2

Suppose an agent can occupy one of two states, with P[In state i at time t | In state j at time t − 1] = p_j, i ≠ j. Let N agents be observed at times t − 1 and t, with m_ij agents in state i at time t − 1 and state j at time t. With a prior density π(p_1, p_2) = 1, 0 < p_i < 1 (i = 1, 2), the posterior p.d.f. is proportional to

(8)  p(p_1, p_2) ∝ p_1^{m_12}(1 − p_1)^{m_11} p_2^{m_21}(1 − p_2)^{m_22},  0 < p_i < 1 (i = 1, 2).

^2 For elaboration on the Markov chain model and the problem of embeddability, see Singer and Spilerman (1976) or Geweke, Marshall, and Zarkin (1986a, 1986b).
TABLE I
THREE EXAMPLES FOR THE TWO-STATE HOMOGENEOUS MARKOV CHAIN MODEL^a

          m_11   m_12   m_21   m_22   p̂_1          p̂_2          p̂_1 + p̂_2
Case I     63      6     17     54    .087 (.034)  .239 (.051)   .326 (.061)
Case II    21     66      6     24    .759 (.046)  .200 (.073)   .959 (.086)
Case III   68     28     17      4    .292 (.046)  .810 (.086)  1.102 (.097)

^a Carats denote m.l.e.'s; asymptotic standard errors are shown parenthetically. Data are taken from Singer and Cohen (1980).

If the underlying process of transition takes place continuously and homogeneously through time, it may be described in terms of lim_{Δt→0} (Δt)^{-1} P[In state i at time t + Δt | In state j at time t] = r_j, i ≠ j. If p_1 + p_2 < 1, then r_j = −p_j log(1 − p_1 − p_2)/(p_1 + p_2); otherwise the process of transition cannot be continuous and homogeneous through time. Whenever there exists a continuous time model corresponding to a discrete time Markov chain model, the discrete time model is said to be embeddable.

To provide specific numerical examples, we use part of a data set reported by Singer and Cohen (1980). The data pertain to the incidence of malaria in Nigeria, state 1 being no detectable parasitemia and state 2 being a positive test for parasitemia. Data are reported separately for different demographic groups. The process of infection and recovery is modeled as a homogeneous Markov chain, and Singer and Cohen (1980) focus on whether this model is embeddable. Data for three cases, and the corresponding maximum likelihood estimates and their asymptotic standard errors, are provided in Table I. Notice that the Case I data are those used to illustrate the binomial model; a two-state Markov chain model is simply a combination of two binomial models, as reflected in comparison of (7) and (8). In Case I p̂_1 + p̂_2 < 1, and the model is embeddable; in Case II p̂_1 + p̂_2 < 1 and in Case III p̂_1 + p̂_2 > 1, and the question of embeddability is open. We conduct inference in this model with two priors and six functions of interest.
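The mapping from (p_1, p_2) to the instantaneous rates, and the embeddability condition, can be checked mechanically. In this sketch (ours; the truncated-series matrix exponential and the example probabilities are our choices) the generator implied by r_1, r_2 is verified to reproduce the one-period transition matrix.

```python
import numpy as np
from math import log

def continuous_time_rates(p1, p2):
    """r_j = -p_j * log(1 - p1 - p2) / (p1 + p2) when p1 + p2 < 1;
    otherwise the discrete chain is not embeddable and None is returned."""
    if p1 + p2 >= 1:
        return None
    lam = -log(1 - p1 - p2) / (p1 + p2)
    return p1 * lam, p2 * lam

def expm(A, terms=40):
    """Matrix exponential by truncated power series (adequate for small A)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for n in range(1, terms):
        term = term @ A / n
        out = out + term
    return out

# Check with p1 = .3, p2 = .4: exp(Q) must equal the one-period transition
# matrix [[1-p1, p1], [p2, 1-p2]] (rows indexed by the origin state).
r1, r2 = continuous_time_rates(0.3, 0.4)
Q = np.array([[-r1, r1], [r2, -r2]])
P = expm(Q)
```

When p_1 + p_2 ≥ 1 the logarithm of 1 − p_1 − p_2 does not exist as a real number, which is exactly the failure of embeddability noted above.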
The first prior is uninformative: π(p_1, p_2) = 1 for all p_j ∈ (0, 1). The second prior imposes embeddability but otherwise is the same as the first: π(p_1, p_2) = 2 for all (p_1, p_2) with p_1 > 0, p_2 > 0, p_1 + p_2 < 1. The functions of interest are p_1, p_2, p_1^{-1}, p_2^{-1}, r_1^{-1}, and r_2^{-1}. The function p_j^{-1} is the mean duration in state j in the discrete time model, while r_j^{-1} is the mean duration in state j assuming a continuous time model. Since r_j^{-1} is undefined if p_1 + p_2 > 1, E(r_j^{-1}) is defined only under the second prior; E[r_j] is infinite for either prior, although the posterior distribution of r_j may be obtained readily as suggested in examples (iii) and (iv) of Section 2.

Results obtained using the methods proposed in earlier sections are presented in Table II. Asterisks in column 1 denote the use of the second prior; otherwise the first is used.

TABLE II
INFERENCE IN THE TWO-STATE HOMOGENEOUS MARKOV CHAIN MODEL^a

[For each of Cases I, II, and III and each function of interest g (p_1, p_2, p_1^{-1}, p_2^{-1}, r_1^{-1}, r_2^{-1}, with and without the embeddability prior), the table reports E[g], sd[g], 1000 σ̂_n, and RNE^c under four importance sampling densities: normal, split normal, the likelihood function, and (where available) the actual values. The individual entries are not recoverable from this copy.]

^a Expected values of functions of interest g(·) were computed by Monte Carlo integration with importance sampling, using 10,000 replications.
^b Prior is π(p_1, p_2) = 1 if no asterisk; with asterisk, prior is π(p_1, p_2) = 2 if p_1 + p_2 < 1. In Case I, essentially all the mass of the posterior satisfies p_1 + p_2 < 1.
^c Relative numerical efficiency.

The first block of columns provides results with the normal importance sampling density motivated by the asymptotic distribution: p_1 and p_2 are independent, p_j ~ N(p̂_j, p̂_j(1 − p̂_j)/(m_j1 + m_j2)). The second block of columns provides results with the split normal importance sampling density constructed as described in Section 4. Since the posterior density is the product of a binomial in p_1 and a binomial in p_2, each p_j may be expressed as a ratio involving two independent chi-square random variables (Johnson and Kotz (1970, Section 24.2)). Hence it is straightforward to use the likelihood function itself as the importance sampling density; the third block of columns provides results obtained this way.
Finally, the posterior moments of p_j and p_j^{-1} have simple analytical expressions under the first prior, and these are provided in the last block of columns. (Under the second prior the posterior moments of p_j and p_j^{-1} could probably be computed more efficiently by numerical evaluation of the incomplete beta, but this has not been pursued because it does not extend to multi-state Markov chain models.)

In Case I, P[p1 + p2 < 1] ≈ 1 under the first prior; none of the 10,000 replications with any importance sampling density turned up p1 + p2 > 1. In Case II, P[p1 + p2 < 1] ≈ .65, and in Case III P[p1 + p2 < 1] ≈ .21, under the first prior. The computed expected values of all functions of interest with each importance sampling density are quite similar to each other and (where available) to the actual values. There are no surprises given the computed numerical standard errors. There are greater relative differences among the computed standard deviations of the functions of interest, and it may be verified analytically that this must be the case. One could, of course, compute numerical standard errors for the posterior standard deviations as well. The relative numerical efficiencies indicate that, with a few small exceptions in Case II, the split normal is a better importance sampling density than the normal. The superiority of the split normal increases systematically as E(p_j) approaches zero and the asymptotic normal becomes a poorer approximation to the likelihood function. Many examples of relative numerical efficiencies in excess of 1.0 turn up with the split normal importance sampling density, precisely where we would expect: E(p_j) closer to zero, and moments of the p_j showing greater dispersion, like p_j^{-1}. By construction RNE is 1.0 for importance sampling from the posterior density, and for importance sampling from the likelihood function it is the posterior probability of an imposed prior, evaluated assuming a prior that is uniform in the parameter space of the likelihood function.
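The relative numerical efficiency and the order-statistic weight diagnostics reported in Tables II and III are simple to compute from the importance weights w(θ_i). The sketch below is an illustrative implementation, not the paper's code; the RNE shown is the effective-sample-size form for the weights themselves (equal to 1.0 when all weights are equal, i.e., when sampling from the posterior), and w_m is the thin-tail diagnostic based on the m largest weights.

```python
import numpy as np

def weight_diagnostics(w, m=10):
    """Diagnostics for an importance sampling run with weights w(theta_i).

    Returns (rne, w_m): rne is the effective-sample-size form of relative
    numerical efficiency (1.0 when all weights are equal); w_m is
    (n/m) * (sum of the m largest squared weights) / (sum of all squared
    weights), with large values flagging thin importance sampling tails.
    """
    w = np.asarray(w, dtype=float)
    n = len(w)
    rne = w.sum() ** 2 / (n * (w ** 2).sum())
    top = np.sort(w)[-m:]                  # the m largest weights
    w_m = (n / m) * (top ** 2).sum() / (w ** 2).sum()
    return rne, w_m
```

Sampling from the posterior gives rne = w_m = 1; a single dominant weight drives w_m toward n/m, the pattern seen for the normal density in Case I.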
TABLE III
SOME DIAGNOSTICS FOR COMPUTATIONAL ACCURACY IN THE TWO-STATE HOMOGENEOUS MARKOV CHAIN MODEL

                 Normal Sampling Density      Split Normal Sampling Density
                 n = 10,000    n = 50,000     n = 10,000    n = 50,000
Case I
  RNE, p1           .441          .269           1.137         1.139
  RNE, p2           .769          .657           1.014         1.012
  w1               186.2       1,774.7             2.5           2.5
  w10               72.9         497.0             2.5           2.5
Case II
  RNE, p1           .808          .774           1.054         1.047
  RNE, p2           .720          .564           1.054         1.053
  w1                31.2         277.9             1.9           1.9
  w10               17.5         106.3             1.9           1.9
Case III
  RNE, p1           .810          .790           1.011         1.009
  RNE, p2           .507          .462           1.032         1.030
  w1               101.0         348.7             1.8           1.8
  w10               52.8         161.3             1.8           1.8

From the example of Section 5.1, we know that the population RNE for the normal importance sampling density in Case I should be on the order of 10^{-24}. The normal is such a bad importance sampling density that, even with 10^4 replications, it will almost certainly not produce the values of the p_j that lead to difficulties, and computed RNE will bear no resemblance to population RNE. As the number of replications increases computed RNE's tend to decline in situations like this, as is clear from consideration of Figure 1. Table III contrasts some computed RNE's for n = 10,000 and n = 50,000 and bears out the difficulties with Case I. In these situations the largest weights w(θ_i) increase rapidly as the number of replications, n, increases, and order statistics pertaining to the w(θ_i) are often useful in identifying very poorly conditioned importance sampling densities. Since the largest w(θ_i)'s may be retained systematically as n increases and Σ_{i=1}^n w(θ_i) and Σ_{i=1}^n w(θ_i)^2 are computed in any event, a diagnostic of the form

    w_m = (n/m) Σ_{i=1}^m w^{(n+1-i)}(θ)^2 / Σ_{i=1}^n w(θ_i)^2

(where superscripts denote order statistics and m is small) can be quite useful as an indication of thin tails in the importance sampling density relative to the posterior. The values of these diagnostics for our examples are provided in Table III, again for n = 10,000 and n = 50,000; they indicate, correctly, that the split normal is a much better importance sampling density than is the normal.

5.3.
The Quartile Homogeneous Markov Chain Model^3

Consider last a four-state homogeneous Markov chain model, state j denoting the jth quartile of income. Let p_ij = P[In state j at time t | In state i at time t - 1], and arrange the p_ij in a 4 x 4 matrix P. Since Σ_{i=1}^4 p_ij = Σ_{i=1}^4 p_ji = 1 for j = 1, ..., 4, the likelihood function has 9 free parameters and is not proportional to a product of multinomial densities. We suppose that the process of transition between income quartiles takes place in continuous time; define

    r_ij = lim_{δ→0} δ^{-1} P[In state j at time t + δ | In state i at time t]

for all i ≠ j, and let r_ii = -Σ_{j≠i} r_ij. If the discrete time model is embeddable, then R = Q log[Λ] Q^{-1}, where Q is the matrix whose columns are right eigenvectors of P and Λ is the diagonal matrix of corresponding eigenvalues. A given discrete time model is embeddable if and only if the off-diagonal elements of Q log[Λ] Q^{-1} are all real and nonnegative and the diagonal elements are all real and nonpositive for some combination of the branches of the logarithms of the eigenvalues. The problem of inference subject to the constraint of embeddability is ill suited to classical treatment, and a purely analytical Bayesian approach is precluded by the complexity of the indicator function for embeddability.

^3 Details on the construction of the normal importance sampling density for this example may be found in Geweke, Marshall, and Zarkin (1986a). Interest in this problem in general and in measures of mobility in particular is motivated by Geweke, Marshall, and Zarkin (1986b).

FIGURE 3.-Values of f_i(δ) along nine orthogonal axes of the likelihood function, constructed as described in Section 5.3.
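The embeddability check can be sketched as follows. This is a simplified illustration that examines only the principal branch of the logarithm; the paper's criterion also searches over other combinations of branches of the logarithms of the eigenvalues, which this sketch omits.

```python
import numpy as np

def embeddable_principal(P, tol=1e-9):
    """Check embeddability of a discrete-time transition matrix P on the
    principal branch: R = Q log(Lambda) Q^{-1} must be real, with
    nonnegative off-diagonal and nonpositive diagonal elements.
    Returns (ok, R) where R is the real generator when ok is True."""
    lam, Q = np.linalg.eig(P)
    R = Q @ np.diag(np.log(lam.astype(complex))) @ np.linalg.inv(Q)
    if np.abs(R.imag).max() > tol:
        return False, None   # complex generator: not embeddable on this branch
    R = R.real
    off = R - np.diag(np.diag(R))
    ok = bool((off >= -tol).all() and (np.diag(R) <= tol).all())
    return ok, (R if ok else None)
```

For example, P = [[.9, .1], [.2, .8]] is embeddable (its generator has nonnegative off-diagonal rates), while P = [[0, 1], [1, 0]] is not, since its negative eigenvalue forces a complex logarithm.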
The 9 x 1 vector of parameters for the likelihood function consists of those p_ij for which j ≠ i and j ≠ i + 1, ordered so that the p_ij appear in ascending order.^4 The split normal importance sampling density was constructed as described in Section 4, with T being the lower triangular Choleski factorization of the asymptotic variance matrix of θ̂. Some idea of the behavior of the likelihood surface is conveyed in Figure 3, which shows values of the f_i(δ) defined in (6). For the lower-numbered axes, the boundary of the parameter space is only a few standard deviations in the negative direction, and so f_i(δ) is truncated on the left. For these axes f_i(δ) is greater for δ > 0 than it is for the higher-numbered axes, indicating that the likelihood function tapers most slowly relative to the asymptotic normal approximation in these directions. This is not surprising in view of the example in Section 5.1.

The functions of interest are -r_jj^{-1}, the mean duration in each state; -r_jj, the rate of transition to and from each state; and three summary measures of mobility that have been proposed in the literature: M*(R) = -.25 tr(R), M_B*(R) = .25 Σ_{i=1}^4 Σ_{j=1}^4 |i - j| r_ij, and M_2*(R) = -log(λ_2), where λ_2 is the second largest eigenvalue of P. We impose the prior π(P) ∝ 1 if P is embeddable and π(P) = 0 otherwise. Under this prior posterior means of the -r_jj^{-1}, but not the r_jj or the mobility measures, exist. The data set was extracted from the National Longitudinal Survey data file, consisting of those 460 white males who were not enrolled in school, employed full time, and married with spouse present during 1968, 1969, and 1970; and whose family incomes were coded for 1969, 1970, and 1971. Monte Carlo integration with 10,000 replications, using the normal and split normal importance sampling densities, was carried out. The normal importance sampling density performed very poorly: w_1 = 2,829.6. The corresponding value for the split normal was 22.1.

^4 For details, see the Appendix of Geweke, Marshall, and Zarkin (1986a).
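The three mobility measures are direct computations from P and its generator R. The sketch below assumes a 4-state chain and takes the second largest eigenvalue of P in modulus, an assumption needed when eigenvalues may be complex.

```python
import numpy as np

def mobility_measures(P, R):
    """M*(R) = -.25 tr(R);
    M_B*(R) = .25 * sum_{i,j} |i - j| r_ij;
    M_2*(R) = -log(lambda_2), lambda_2 the second largest eigenvalue of P
    (taken in modulus here)."""
    m_star = -0.25 * np.trace(R)
    idx = np.arange(P.shape[0])
    m_b = 0.25 * np.sum(np.abs(idx[:, None] - idx[None, :]) * R)
    lam = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    m_2 = -np.log(lam[1])
    return m_star, m_b, m_2
```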
The posterior probability of embeddability is .246 (with a numerical standard error of .0009, computed using the split normal importance sampling density). This probability, rather than 1.0, provides the norm against which RNE's should be compared, for any importance sampling density based on the likelihood function rather than the posterior.

The computed values in Table IV are essentially the same, for expected values and medians of functions of interest, with the two importance sampling densities employed. Higher moments, and quantile points farther from the median, show greater differences. The computed RNE's suggest that the split normal is substantially more efficient than the normal. However, for the normal importance sampling density computed RNE tends to vary widely with each set of 10,000 replications, and deteriorates as the number of replications increases. In this application, and (we conjecture) in most applications with more than a few dimensions, careful modification of asymptotic normal and other symmetric distributions is required if computed numerical standard errors are to be interpreted reliably using Theorem 2.
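The posterior moments and numerical standard errors reported in these tables come from ratio estimates over the importance sampling draws. The sketch below is an illustrative implementation using one common estimator of the numerical variance of the ratio estimate, not a transcription of the paper's code.

```python
import numpy as np

def is_moment(g, w):
    """Importance sampling estimate of a posterior moment E[g] and a
    common estimator of its numerical standard error:
        gbar = sum(w_i g_i) / sum(w_i),
        se^2 = sum(w_i^2 (g_i - gbar)^2) / (sum(w_i))^2."""
    g = np.asarray(g, dtype=float)
    w = np.asarray(w, dtype=float)
    gbar = np.sum(w * g) / np.sum(w)
    se = np.sqrt(np.sum(w ** 2 * (g - gbar) ** 2)) / np.sum(w)
    return gbar, se
```

With equal weights this reduces to the sample mean and its usual (population-variance-based) standard error, consistent with RNE = 1 when sampling from the posterior.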
TABLE IV
INFERENCE IN THE QUARTILE HOMOGENEOUS MARKOV CHAIN MODEL^a

                       Split Normal                        Normal
g(·)          E[g]   sd[g]   100σn   RNE^b      E[g]   sd[g]   100σn   RNE^b
-r11^{-1}     2.35   .338    .786    .184       2.36   .341    .919    .138
-r22^{-1}     1.33   .170    .393    .187       1.34   .170    .470    .131
-r33^{-1}     1.33   .162    .380    .182       1.33   .169    .546    .095
-r44^{-1}     2.28   .324    .786    .169       2.28   .323   1.056    .094

                   Split Normal Quantile                Normal Quantile
g(·)          .01    .25    .50    .75    .99      .01    .25    .50    .75    .99
-r11^{-1}    1.68   2.10   2.33   2.56   3.28     1.68   2.12   2.32   2.56   3.21
-r22^{-1}    .987   1.22   1.32   1.45   1.76     .965   1.21   1.32   1.44   1.76
-r33^{-1}    .979   1.21   1.32   1.43   1.74     .973   1.21   1.32   1.44   1.75
-r44^{-1}    1.60   2.05   2.26   2.48   3.13     1.65   2.05   2.25   2.48   3.14
-r11         .304   .391   .428   .477   .594     .311   .390   .430   .472   .589
-r22         .566   .692   .759   .821   1.01     .566   .691   .755   .825   1.03
-r33         .572   .698   .760   .823   1.02     .570   .694   .757   .823   1.02
-r44         .319   .403   .443   .488   .621     .318   .402   .444   .488   .598
M*(R)        .485   .565   .598   .637   .741     .487   .563   .597   .638   .745
M_B*(R)      .586   .675   .718   .760   .881     .586   .676   .716   .759   .864
M_2*(R)      .317   .435   .810   1.03   1.48     .316   .418   .795   1.00   1.49

^a Expected values, standard deviations, and fractiles of functions of interest g(·) were computed by Monte Carlo integration with importance sampling from a split normal density, using 10,000 replications.
^b Relative numerical efficiency.

6. EXAMPLES: ARCH LINEAR MODELS

In this section we take up an example in which the parameter space Θ is not compact. It involves constraints which cannot readily be imposed in classical inference, as discussed in detail elsewhere (Geweke (1988a)). For this, among other reasons, exact Bayesian inference is appealing in this model.

The ARCH linear model was proposed by Engle (1982) to allow persistence in conditional variance in economic time series. Let x_t: k x 1 and y_t be time series for which the distribution of y_t conditional on ψ_{t-1} = {x_{t-s}, y_{t-s}, s ≥ 1} is

(9)    y_t | ψ_{t-1} ~ N(x_t'β, h_t).
Defining ε_t = y_t - x_t'β, take

    h_t = α_0 + Σ_{j=1}^p α_j ε_{t-j}^2 = h(ε_{t-1}, ..., ε_{t-p}; γ).

The parameterization in terms of γ allows restrictions like α_0 = γ_0, α_j = γ_1(p + 1 - j), the linearly declining weights employed by Engle (1982, 1983). For (9) to be plausible it is necessary that α_0 > 0 and α_j ≥ 0 (j = 1, ..., p). For {ε_t} to be stationary it is necessary and sufficient that the roots of 1 - Σ_{j=1}^p α_j z^j all be outside the unit circle. Given the sample (x_t, y_t; t = 1, ..., T) the log-likelihood function is (up to an additive constant)

(10)    l = -(1/2) Σ_{t=p+1}^T log h_t - (1/2) Σ_{t=p+1}^T h_t^{-1} ε_t^2.

Engle (1982) has shown that for large samples ∂²l/∂β∂γ' → 0, and the maximum likelihood estimates β̂ of β and γ̂ of γ are asymptotically independent. A scoring algorithm may be used to maximize the likelihood function, and revision of the estimate γ^(i) of γ and β^(i) of β may proceed separately at each step. We programmed the procedure described in Engle (1982) and then replicated other results (Engle (1983)) using data furnished by the Journal of Money, Credit, and Banking. Since β̂ and γ̂ are asymptotically independent we construct the importance sampling density as the product of a density for β and a density for γ.
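The log likelihood (10) is straightforward to evaluate. A minimal sketch follows, with the ARCH coefficients passed directly as alpha = (α_0, α_1, ..., α_p) rather than through the γ parameterization.

```python
import numpy as np

def arch_loglik(beta, alpha, y, X, p):
    """Log likelihood (10), up to an additive constant.
    y: (T,) observations; X: (T, k) regressors;
    alpha: (p+1,) with alpha[0] = a_0 and alpha[j] = a_j, j = 1..p."""
    eps = y - X @ beta
    ll = 0.0
    for t in range(p, len(y)):
        # h_t = a_0 + sum_{j=1}^p a_j * eps_{t-j}^2
        h = alpha[0] + np.dot(alpha[1:], eps[t - p:t][::-1] ** 2)
        ll += -0.5 * np.log(h) - 0.5 * eps[t] ** 2 / h
    return ll
```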
To employ condition (4) and the procedures of Section 4, first compare the tail behavior of the multivariate t density in m dimensions,

    f(x; μ, Σ, ν) ∝ [1 + ν^{-1}(x - μ)'Σ^{-1}(x - μ)]^{-(ν+m)/2},

with that of the likelihood function. Allowing |x_j| → ∞ while fixing all other x_i, the log of f(x; μ, Σ, ν) behaves like -(ν + m) log|x_j|. The log likelihood function (10) behaves like -[(T - p)/2] log(γ_j) as γ_j → ∞ with all other parameters fixed, and like -(T - p) log|β_j| as |β_j| → ∞ with all other parameters fixed. Hence for β we shall use a split Student importance sampling density with T - p - k degrees of freedom, and for γ we shall use a split Student importance sampling density with (T - p)/2 - q degrees of freedom. The variance matrices are given by expressions (28) and (32), respectively, in Engle (1982).

An artificial sample of size 200 provides a concrete example:

    y_t = Σ_{j=1}^2 β_j x_{tj} + ε_t,    ε_t ~ N(0, h_t),    β_1 = β_2 = 1,
    x_{t2} = cos(2πt/200);    h_t = 1 + .5 ε_{t-1}^2 + .25 ε_{t-2}^2.

The parameterization of the conditional variance process in the model is h_t = γ_0 + γ_1(2ε_{t-1}^2 + ε_{t-2}^2); α_0 = γ_0, α_1 = 2γ_1, α_2 = γ_1. We assume a flat prior on β and γ, estimate these four parameters, and assess the posterior probability of stability P[γ_1 < 1/3]. Results using the split Student importance sampling density constructed as described in Section 4 are presented in Table V. The local behavior of the log-likelihood function is portrayed in Figure 4.

TABLE V
INFERENCE IN THE ARCH LINEAR MODEL^a

g(·)         E[g]    sd[g]   100σn   RNE^b
β1           1.096   .092    .110    .700
β2            .954   .135    .155    .757
γ0           1.001   .291    .373    .609
γ1            .281   .066    .079    .693
P[Stable]     .795    -      .475    .722

^a Expected values and standard deviations of functions of interest g(·) were computed by Monte Carlo integration with importance sampling from a split Student density, using 10,000 replications.
^b Relative numerical efficiency.
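The artificial sample can be generated as follows. This is a sketch in which the first regressor x_{t1} is assumed to be a constant equal to 1 (it is not clearly specified in the text) and pre-sample disturbances are set to zero.

```python
import numpy as np

def simulate_arch(T=200, seed=0):
    """Simulate y_t = x_{t1} + x_{t2} + eps_t (beta_1 = beta_2 = 1),
    eps_t ~ N(0, h_t), h_t = 1 + .5 eps_{t-1}^2 + .25 eps_{t-2}^2."""
    rng = np.random.default_rng(seed)
    x1 = np.ones(T)                                  # assumed constant regressor
    x2 = np.cos(2 * np.pi * np.arange(1, T + 1) / 200)
    eps = np.zeros(T)
    y = np.zeros(T)
    for t in range(T):
        h = 1.0
        if t >= 1:
            h += 0.5 * eps[t - 1] ** 2
        if t >= 2:
            h += 0.25 * eps[t - 2] ** 2
        eps[t] = np.sqrt(h) * rng.standard_normal()
        y[t] = x1[t] + x2[t] + eps[t]
    return y, np.column_stack([x1, x2])
```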
For each of the four axes this figure indicates the actual log-likelihood function (solid thick line), the multivariate "t" approximation to the likelihood function using the information matrix (Engle (1982, (28) and (32))) (dotted line), and the split Student approximation to the likelihood function constructed as described in Section 4 (thin line). (In the case of γ, the curves terminate at γ_0 = 0 or γ_1 = 0.) For β the classical asymptotic theory provides a good approximation to the likelihood function, but for γ the approximation is poor.

FIGURE 4.-All functions are normalized to have the value 1 at 0 standard deviations, which corresponds to the asymptotic maximum likelihood estimator. The solid thick line is the log likelihood function; the dotted line is the log multivariate t importance sampling density; and the solid thin line is the log split Student importance sampling density. [Panels: First Beta Axis, Second Beta Axis, First Gamma Axis, Second Gamma Axis; horizontal axes in standard deviations.]

In Table VI we compare diagnostics for computational accuracy using the unmodified and split normal densities (which are unjustified theoretically since E[w(θ)] is infinite in these cases) and the unmodified and split Student densities. The unmodified densities perform quite poorly, for reasons made clear in Figure 4. The split normal and split Student sampling densities lead to acceptable performance, in the sense that RNE is apparently of the same order of magnitude as would be achieved if one could sample directly from the likelihood function. In the case of the split normal we know this is an illusion, but from Figure 4 astronomical values of n are required before RNE is mainly determined by the relative tail behavior of the likelihood function and importance sampling density. Diagnostics for the split Student and split normal importance sampling densities are essentially the same.

TABLE VI
SOME DIAGNOSTICS FOR COMPUTATIONAL ACCURACY, ARCH LINEAR MODEL

               Multivariate Normal          Split Normal
               Sampling Density             Sampling Density
               n = 10,000   n = 50,000      n = 10,000   n = 50,000
RNE, β1           .006         .016            .686         .666
RNE, β2           .005         .014            .682         .708
RNE, γ0           .001         .002            .564         .529
RNE, γ1           .006         .012            .748         .714
RNE, p            .018         .044            .744         .738
w1             9,872.8      39,893.6           51.0        225.4
w10              994.4       4,857.2           22.3         66.0

               Multivariate t (99)          Split Student
               Sampling Density             Sampling Density
               n = 10,000   n = 50,000      n = 10,000   n = 50,000
RNE, β1           .036         .068            .700         .707
RNE, β2           .027         .065            .757         .742
RNE, γ0           .002         .006            .609         .501
RNE, γ1           .017         .041            .692         .690
RNE, p            .049         .118            .722         .741
w1             9,726.9      37,067.8           19.9        232.3
w10              979.6       4,476.4           16.0         70.9

TABLE VII
SOME DIAGNOSTICS FOR COMPUTATIONAL ACCURACY, ARCH LINEAR MODEL
Multivariate t Sampling Densities with Indicated Degrees of Freedom, n = 10,000

            ν=0.5   ν=1.0   ν=1.5   ν=2.0   ν=3.0   ν=4.0   ν=6.0   ν=12.0
RNE, β1     .144    .302    .402    .451    .536    .529    .487     .356
RNE, β2     .150    .307    .410    .455    .540    .525    .537     .430
RNE, γ0     .127    .243    .303    .313    .336    .317    .271     .197
RNE, γ1     .129    .262    .339    .365    .410    .399    .366     .292
RNE, p      .127    .282    .368    .417    .493    .504    .522     .495
w1          52.2    41.2    42.2    40.2    53.4    69.0   136.3    321.4
w10         47.2    29.8    32.3    34.2    34.2    42.7    71.2    130.0

The alternative approach to Monte Carlo integration with importance sampling for posterior densities with tails thick relative to the normal has been to employ multivariate t distributions with small degrees of freedom (Zellner and Rossi (1984), van Dijk, Hop, and Louter (1986)). We conclude this example with an examination of diagnostics for computational accuracy for this procedure.
Beginning with the asymptotic variance matrices for β and γ, multivariate t importance sampling densities with ν = .5, 1, 1.5, 2, 3, 4, 6, and 12 degrees of freedom were used. The resulting diagnostics for numerical accuracy are given in Table VII. RNE for the four parameters attains a maximum at ν = 3, that for p at ν = 6. For ν < 3 the sampling density is too diffuse, and as ν becomes larger it approaches the multivariate normal, which is much too compact. That w-diagnostics are sensitive to sampling densities with tails that are too thin rather than too thick is clearly indicated in Table VII. The RNE of the multivariate t with low degrees of freedom is decidedly less than that of the split Student t, but of the same order of magnitude for proper choice of ν. In the absence of any systematic theoretical basis for choosing ν, however, costly numerical experimentation is required to find the appropriate value.

7. CONCLUSION

Integration by Monte Carlo with importance sampling provides a paradigm for Bayesian inference in econometric models. Thanks to increasingly cheap computing it is practical to obtain the exact posterior distributions of any function of interest of the parameters, subject to any diffuse prior. This is done through systematic exploration of the likelihood function, and the asymptotic sampling distribution theory for maximum likelihood estimators provides the organization for this exploration. Diffuse priors may incorporate inequality restrictions which arise frequently in applied work but are impractical if not impossible to handle in a classical setting. Concentrations of prior probability mass may be treated by applying the methods in this paper to both the original and lower dimensional problems, and combining results through the appropriate posterior odds ratios. By choosing functions of interest appropriately, formal and direct answers to the questions that motivate empirical work can be obtained.
The investigator can routinely handle problems in inference that would be analytical nightmares if attacked from a sampling theoretic point of view, and is left free to craft his models and functions more according to substantive questions and less subject to the restrictions of what asymptotic distribution theory can provide.

Integration by Monte Carlo is an attractive research tool because it makes numerical problems much more routine than do other numerical integration methods. The principal analytical task left to the econometrician is the characterization of the tail behavior of the likelihood function. This characterization leads to a family of importance sampling densities from which a good choice can be made automatically. Prior distributions and functions of interest can then be specified at will, taking care that computed moments actually exist. There is no guarantee that these methods will produce computed moments of reasonable accuracy in a reasonable number (say, 10,000) of Monte Carlo replications. Pathological likelihood functions will demand more analytical work: for example, analytical marginalization may be possible, and transformation of parameters to obtain more regular posterior densities can be quite helpful, but these tasks must be approached on a case-by-case basis. It is in precisely such cases that sampling theoretic asymptotic theory and normal or other approximations to the posterior are also likely to be inadequate: the difficulty is endemic to the model and not the approach. Clearly there is much to be learned about more complicated likelihood functions. In approaching these problems, the emphasis should be, as it has been in this paper, on generic rather than specific solutions.

Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27706, U.S.A.

Final manuscript received March, 1989.

REFERENCES

BAUWENS, L. (1984): Bayesian Full Information Analysis of Simultaneous Equation Models Using Integration by Monte Carlo.
Berlin: Springer-Verlag.
BAUWENS, L., AND J. F. RICHARD (1985): "A 1-1 Poly-t Random Variable Generator with Application to Monte Carlo Integration," Journal of Econometrics, 29, 19-46.
BILLINGSLEY, P. (1979): Probability and Measure. New York: Wiley.
CRAMÉR, H. (1946): Mathematical Methods of Statistics. Princeton: Princeton University Press.
DAVIS, P. J., AND P. RABINOWITZ (1975): Methods of Numerical Integration. New York: Academic Press.
ENGLE, R. F. (1982): "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica, 50, 987-1008.
(1983): "Estimates of the Variance of U.S. Inflation Based on the ARCH Model," Journal of Money, Credit, and Banking, 15, 286-301.
(1984): "Wald, Likelihood Ratio and Lagrange Multiplier Tests in Econometrics," Chapter 13 of Handbook of Econometrics, Vol. II, ed. by Z. Griliches and M. D. Intriligator. Amsterdam: North-Holland.
GALLANT, A. R., AND J. F. MONAHAN (1985): "Explicitly Infinite-Dimensional Bayesian Analysis of Production Technologies," Journal of Econometrics, 30, 171-202.
GEWEKE, J. (1986): "Exact Inference in the Inequality Constrained Normal Linear Regression Model," Journal of Applied Econometrics, 1, 127-141.
(1988a): "Exact Inference in Models with Autoregressive Conditional Heteroscedasticity," in Dynamic Econometric Modeling, ed. by E. Berndt, H. White, and W. Barnett. Cambridge: Cambridge University Press, 73-104.
(1988b): "Antithetic Acceleration of Monte Carlo Integration in Bayesian Inference," Journal of Econometrics, 38, 73-90.
GEWEKE, J., R. C. MARSHALL, AND G. ZARKIN (1986a): "Exact Inference for Continuous Time Markov Chain Models," Review of Economic Studies, 53, 653-699.
(1986b): "Mobility Indices in Continuous Time Markov Chains," Econometrica, 54, 1407-1423.
HAMMERSLEY, J. M., AND D. C. HANDSCOMB (1964): Monte Carlo Methods. London: Methuen (First Edition).
(1979): Monte Carlo Methods. London: Chapman and Hall (Second Edition).
JOHNSON, N. L., AND S. KOTZ (1970): Continuous Univariate Distributions-2. New York: Wiley.
KLOEK, T., AND H. K. VAN DIJK (1978): "Bayesian Estimates of Equation System Parameters: An Application of Integration by Monte Carlo," Econometrica, 46, 1-20.
QUANDT, R. E. (1983): "Computational Problems and Methods," Chapter 12 of Handbook of Econometrics, Vol. I, ed. by Z. Griliches and M. D. Intriligator. Amsterdam: North-Holland.
RUBINSTEIN, R. Y. (1981): Simulation and the Monte Carlo Method. New York: Wiley.
SINGER, B., AND J. COHEN (1980): "Estimating Malaria Incidence and Recovery Rates from Panel Surveys," Mathematical Biosciences, 49, 273-305.
SINGER, B., AND S. SPILERMAN (1976): "The Representation of Social Processes by Markov Models," American Journal of Sociology, 82, 1-54.
VAN DIJK, H. K., J. P. HOP, AND A. S. LOUTER (1986): "An Algorithm for the Computation of Posterior Moments and Densities Using Simple Importance Sampling," Erasmus University Econometric Institute Report 8625/E.
VAN DIJK, H. K., T. KLOEK, AND C. G. E. BOENDER (1985): "Posterior Moments Computed by Mixed Integration," Journal of Econometrics, 29, 3-18.
ZELLNER, A., AND P. ROSSI (1984): "Bayesian Analysis of Dichotomous Quantal Response Models," Journal of Econometrics, 25, 365-394.