Discretization of Continuous Markov Chains and Markov Chain Monte Carlo Convergence Assessment
Author(s): Chantal Guihenneuc-Jouyaux and Christian P. Robert
Source: Journal of the American Statistical Association, Vol. 93, No. 443 (Sep., 1998), pp. 1055-1067
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2669849
Accessed: 14/10/2013 12:55

Discretization of Continuous Markov Chains and Markov Chain Monte Carlo Convergence Assessment

Chantal GUIHENNEUC-JOUYAUX and Christian P. ROBERT

We show that continuous state-space Markov chains can be rigorously discretized into finite Markov chains. The idea is to subsample the continuous chain at renewal times related to small sets that control the discretization. Once a finite Markov chain
is derived from the Markov chain Monte Carlo output, general convergence properties on finite state spaces can be exploited for convergence assessment in several directions. Our choice is based on a divergence criterion derived from Kemeny and Snell, which is first evaluated on parallel chains with a stopping time and then implemented, more efficiently, on two parallel chains only, using Birkhoff's pointwise ergodic theorem for stopping rules. The performance of this criterion is illustrated on three standard examples.

KEY WORDS: Divergence; Ergodic theorem; Finite state space; Markov chain Monte Carlo algorithm; Multiple chains; Renewal theory; Renewal time; Stopping time.

1. INTRODUCTION

Raftery and Lewis (1992, 1996) have developed a convergence control method based on a two-state Markov chain, by creating a sequence (η(t)) of indicator variables from an arbitrary continuous state-space Markov chain (θ(t)), η(t) = I_{θ(t) ≤ θ_0}. Assuming a Markovian structure on the sequence, they proposed an evaluation of the "burn-in" time and of the number of simulations required for a given precision, based on the transition matrix of the η(t)'s. The advantages of an approach based on a preliminary discretization are numerous. Both the model and the underlying Markovian theory are much simpler, convergence of the discretized version occurs faster, and the assessment can be strengthened by refining the discretization, although it always applies only to the discretized chain. A drawback of the Raftery and Lewis (1992) approach is that (η(t)) is not a Markov chain, unless restrictive conditions hold (see Kemeny and Snell 1960). We propose a general and theoretically valid discretization method based on subsampling of a discrete sequence derived from (θ(t)), depending on a sequence of renewal times; that is, epochs that separate the chain into iid blocks (Meyn and Tweedie 1993).

Once a true finite state-space Markov chain is constructed, several convergence assessments can apply for that chain. Besides the normal approximation of Raftery and Lewis (1992, 1996), which is devised for binary Markov chains, and techniques based on the central limit theorem as developed by Chauveau and Diebolt (1997), many convergence results from Kemeny and Snell (1960) lead to convergence diagnoses. We choose to use the divergence criterion, inspired from the convergence of the difference of the numbers of visits to a given state for two initial states. The motivation for this choice is twofold. First, stabilization of the difference is indicative of the stationarity of the chains, because the initial states cease to matter. This particularly fits the purpose of convergence control. Second, the theoretical limit of the criterion can be computed from the transition matrix of the finite chain, and the criterion then appears as a particular control variate method. In practice, our convergence control is based on simultaneous stabilizations of empirical divergences for different groups of starting and visiting states, and on a subsequent comparison with the estimated theoretical limits. This is only a first possible exploitation of the discretized chain; other evaluations can be simultaneously proposed for a stronger convergence diagnosis. For instance, Propp and Wilson's (1995) exact simulation method can be used to generate the starting values of the chains.

The divergence criterion and its performances are first established for a parallel scheme, which necessitates many restarts of the continuous chain. Although this allows for an easy implementation, there are many drawbacks to using parallel chains. First, the restarts prevent any control of stationarity, and thus the empirical divergences clearly lack validity as evaluations of their theoretical counterpart. Moreover, the convergence assessment also imposes a different implementation of the Markov chain Monte Carlo (MCMC) algorithm, because this parallel evaluation cannot be operated on-line. We then reduce the number of parallel chains necessary for implementation of the control method to two chains, by virtue of Birkhoff's pointwise ergodic theorem (Battacharya and Waymire 1990), with the additional incentive that these two chains do not need to be restarted at all. We thus get as close as possible to a genuine on-line evaluation.

The article is organized as follows. Section 2 recalls useful facts on renewal theory and establishes that finite Markov chains can be constructed by discretization in continuous state spaces. Section 3 describes the divergence criterion, including an improved implementation through stopping rules. Section 4 elaborates on the estimation of the divergence for continuous state spaces, eliminating a seemingly intuitive stopping rule, and gives a first evaluation on a benchmark example from the literature. Section 5 derives the final version of the criterion from Birkhoff's ergodic theorem, with illustrations on two examples. Section 6 concludes the article.

[Footnote: Chantal Guihenneuc-Jouyaux is Associate Professor at Unité Associée au Centre National de la Recherche Scientifique 1323, Laboratoire de Statistique Médicale, Université Paris 5, 75006 Paris, France (E-mail: [email protected]). Christian P. Robert is Professor and Head of the Statistics Laboratory, CREST, INSEE, 75675 Paris, France (E-mail: [email protected]). This work was discussed at the Methods for Control of Monte Carlo Markov Chains workshop, held at CREST and involving G. Celeux, D. Cellier, D. Chauveau, J. Diebolt, M. A. Gruet, V. Lasserre, M. Lavielle, F. Muri, A. Philippe, and S. Richardson, to whom we are grateful for numerous comments. Comments from participants of the HSSS Conference in Rebild were helpful to improve the focus and organization of the paper. Criticisms and suggestions from G. Casella, the associate editor, and both referees were equally helpful in improving the article's readability.]

© 1998 American Statistical Association, Journal of the American Statistical Association, September 1998, Vol. 93, No. 443, Theory and Methods
2. DISCRETIZATION OF CONTINUOUS MARKOV CHAINS

The major problem with a naive discretization of θ(t), like η(t) = I_A(θ(t)), where A is a measurable subset, is that the sequence (η(t)) is not usually a Markov chain, because of the dependence on the previous values θ(k). There is, however, a case of particular interest where a Markov subchain can be constructed, namely when both A and its complement are atoms of the chain (θ(t)). [We recall that a set B is an atom if the transition kernel P of the chain satisfies P(θ(t+1) ∈ C | θ(t)) = ν(C) for every θ(t) ∈ B, where ν is a fixed probability measure.] We thus consider a general and theoretically well-grounded discretization of continuous Markov chains based on an extension of atoms to small sets. We first recall some necessary notions on these sets and their connection with renewal theory.

2.1 Small Sets and Renewal Times

A small set A (see Meyn and Tweedie 1993, p. 106) satisfies the following property: there exist m ∈ N*, ε > 0, and a probability measure ν_m such that, when θ(t) ∈ A,

  Pr(θ(t+m) ∈ B | θ(t)) ≥ ε ν_m(B)   (1)

for every measurable set B. It can be shown that small sets exist for the chains with positive invariant measure involved in MCMC algorithms, because it follows from work of Asmussen (1979) that every irreducible Markov chain allows for renewal. Meyn and Tweedie (1993, p. 109) also showed that the space Θ can be covered with a countable number of small sets. When the chain is uniformly ergodic, as in the benchmark example of Section 4.2 (see Robert 1996), Θ is a small set by itself. The practical determination of small sets and of the corresponding (ε, ν) is more delicate, but Mykland, Tierney, and Yu (1995) and Robert (1996) have shown that this can be done in realistic setups, sometimes through a modification of the transition kernel.

For simplicity's sake, we will assume in the sequel that m = 1, which is ensured when the chain is strongly aperiodic. Strong aperiodicity can always be achieved by random subsampling, as shown by Meyn and Tweedie (1993, p. 118), or when the density of the transition kernel K(θ' | θ) is bounded in a neighborhood of θ for all θ ∈ Θ, as shown by Roberts and Tweedie (1996). This case thus is fairly common in MCMC setups.

Once a triplet (A, ε, ν) is known, the transition kernel K(θ(t) | θ(t-1)) can be modified by "splitting" to induce renewal. Indeed, when θ(t-1) ∈ A, we can write

  K(θ(t) | θ(t-1)) = ε ν(θ(t)) + (1 - ε) K~(θ(t) | θ(t-1))

and thus represent K as a mixture of the two distributions ν(θ(t)) and

  K~(θ(t) | θ(t-1)) = [K(θ(t) | θ(t-1)) - ε ν(θ(t))] / (1 - ε).

The transition from θ(t-1) to θ(t) can then be modified into

  θ(t) ~ ν(θ)   with probability ε,
  θ(t) ~ K~(θ | θ(t-1))   with probability 1 - ε,   (2)

when θ(t-1) ∈ A. Although the overall transition is not modified (marginally in θ(t)), there are epochs when θ(t) is generated from ν(θ) and is thus independent of the previous value θ(t-1). These occurrences are called renewal events.

The modification of the kernel in (2) requires simulations from K~(θ | θ(t-1)) when θ(t-1) ∈ A. Even though only the ratio ε ν(θ(t))/K(θ(t) | θ(t-1)) is needed for this simulation, both K and ν are usually implicit. This ratio must then be approximated. We propose using sums of conditional densities to remove the integral expressions of both kernels. This technique is more clearly developed through the examples of Sections 4 and 5.

2.2 Discretization

Consider a chain with several disjoint small sets A_i (i = 1, ..., k) and corresponding parameters (ε_i, ν_i). The A_i's do not necessarily constitute a partition of the space. We can then define renewal times τ_n by (n ≥ 1)

  τ_n = inf{t > τ_{n-1}; there exists 1 ≤ i ≤ k such that θ(t-1) ∈ A_i and θ(t) ~ ν_i}.   (3)

As shown by Meyn and Tweedie (1993, p. 207), the τ_n's are finite for every starting value when the chain (θ(t)) is Harris recurrent; that is, such that the number of returns to each small set is infinite with probability 1. [See Chan and Geyer (1994) and Tierney (1994) for a discussion of the occurrence of Harris recurrence in MCMC setups.]
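As a concrete illustration of the splitting mechanism (2) and of the renewal times (3), the following sketch discretizes a toy autoregressive chain. The chain, the two small sets, and the retrospective implementation of the splitting (the renewal flag is drawn after generating θ(t), with probability ε ν(θ(t))/K(θ(t) | θ(t-1)), which avoids sampling from K~ explicitly) are our own illustrative choices, not the paper's examples:

```python
import math
import random

random.seed(1)

RHO = 0.5                            # toy chain: theta(t) = RHO*theta(t-1) + N(0, 1)
SETS = [(-1.2, -0.8), (0.8, 1.2)]    # two disjoint small sets A_1, A_2 (arbitrary)

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def minorizing_density(y, low, high):
    """h(y) = min over theta in [low, high] of N(y; RHO*theta, 1), i.e. eps*nu(y).
    The minimum sits at an endpoint because the kernel is unimodal in theta."""
    return min(phi(y - RHO * low), phi(y - RHO * high))

def discretize(n_steps):
    """Run the continuous chain and subsample it at renewal times, as in (3)."""
    theta, xi = 0.0, []
    for _ in range(n_steps):
        prev, theta = theta, RHO * theta + random.gauss(0.0, 1.0)
        for i, (low, high) in enumerate(SETS):
            if low <= prev <= high:
                # retrospective splitting: given theta(t-1) in A_i, the move is a
                # renewal with probability eps*nu(theta(t)) / K(theta(t)|theta(t-1))
                if random.random() < minorizing_density(theta, low, high) / phi(theta - RHO * prev):
                    xi.append(i + 1)   # record the index of the visited small set
    return xi

xi = discretize(200000)
print(len(xi), sorted(set(xi)))
```

The recorded sequence xi is a realization of the finite-state subchain of Proposition 1; for a Gaussian kernel, ε has the closed form 2Φ(-ρδ) for a small set of half-width δ, although the code only needs the ratio above.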
The main motivation for this definition is that, although the finite-valued sequence deduced from θ(t) by

  η(t) = Σ_{i=1}^k i I_{A_i}(θ(t))

is not a Markov chain, the subchain (ξ(n)) = (η(τ_n - 1)) enjoys this property.

Proposition 1. For a Harris-recurrent Markov chain (θ(t)), if the subsampling times τ_n are defined as in (3) by the visiting times to one of the k disjoint small sets A_1, ..., A_k, the sequence (ξ(n)) = (η(τ_n - 1)), which represents the successive indices of the small sets visited by the chain (θ(t)), is a homogeneous Markov chain on the finite state space {1, ..., k}.

Proof. To establish that (ξ(n)) is a Markov chain, we need to show that ξ(n) depends on the past only through the last term, ξ(n-1). We have

  Pr(ξ(n) = i | ξ(n-1) = j, ξ(n-2) = l, ...)
   = Pr(θ(τ_n - 1) ∈ A_i | θ(τ_{n-1} - 1) ∈ A_j, θ(τ_{n-2} - 1) ∈ A_l, ...)
   = E_{θ(0)}[ I_{A_i}(θ(τ_{n-1} - 1 + Δ_n)) | θ(τ_{n-1} - 1) ∈ A_j, θ(τ_{n-2} - 1) ∈ A_l, ... ],

where Δ_n = τ_n - τ_{n-1}. Because θ(τ_{n-1}) ~ ν_j on the event {ξ(n-1) = j}, the strong Markov property implies that

  Pr(ξ(n) = i | ξ(n-1) = j, ξ(n-2) = l, ...) = E_{ν_j}[ I_{A_i}(θ(Δ - 1)) ] = Pr(ξ(n) = i | ξ(n-1) = j),

where Δ denotes the corresponding renewal gap, because (θ(t), t ≥ τ_{n-1}), given ξ(n-1), is distributed as (θ(t), t ≥ τ_n), given ξ(n). The homogeneity of the chain can then be derived from the invariance (in n) of P_{ji}, given that θ(τ_{n-1}) ~ ν_j for every n.

Figure 1 illustrates discretization on a chain with three small sets, A_1 = [-8.5, -7.5], A_2 = [7.5, 8.5], and A_3 = [17.5, 18.5], which is constructed in Section 5.2. Although the chain visits the three sets quite often, renewal occurs with a much smaller frequency, as shown by the symbols.

[Figure 1. Discretization of a Continuous Markov Chain Based on Three Small Sets. The renewal events are represented by triangles for A_1, circles for A_2, and squares for A_3.]

This result has important bearings on convergence assessment, because it shows that finite Markov chains can be rigorously derived from a continuous Markov chain in renewal setups. The drawback is obviously that the small sets need to be exhibited, but Mykland et al. (1995) have proposed quasi-automatic schemes to this effect. In fact, a regeneration condition extending (1) often occurs for Hastings-Metropolis algorithms, namely that there exists a nonnegative function s such that

  Pr(θ(t+1) ∈ B | θ(t)) ≥ s(θ(t)) ν(B).   (4)

Therefore, the subsequent notions of splitting and renewal can be extended to this setup, leading to a wider range of applications for Proposition 1. For instance, Mykland et al. (1995) have shown that a hybrid MCMC algorithm can always be constructed to ensure that (4) holds.

3. CONVERGENCE ASSESSMENT FOR FINITE STATE SPACES

3.1 The Divergence Criterion

Once a finite state-space chain is obtained, the entire range of finite Markov chain theory is available, providing a variety of different convergence results whose conjunction can strengthen the convergence diagnosis. We choose to use an exact evaluation of the mixing rate of the chain based on the comparison between the numbers of visits to a given state from two different starting points. This so-called divergence evaluation is derived from a convergence result of Kemeny and Snell (1960). Besides its simplicity, it meets the main requirement of convergence control, because it compares the behavior of chains with different starting points until independence from the starting positions. Obviously, alternative criteria can be similarly devised based on other convergence results (e.g., Chauveau and Diebolt 1997).

In the study of regular Markov chains with transition matrix P, Kemeny and Snell (1960) pointed out the importance of the so-called fundamental matrix

  Z = [I - (P - A)]^{-1},

where A is the limiting matrix of P^n, with all rows equal to π, the stationary distribution associated with P. A particular property that is of interest is that, if T_j(τ) denotes the number of times that the Markov chain (ξ(t)) is in state j (1 ≤ j ≤ k) in the first τ stages, that is,

  T_j(τ) = Σ_{t=1}^τ I_j(ξ(t)),

then, for any initial distribution λ_0, the divergence of the chain can be derived from

  div_j(λ_0, π) = lim_{τ → ∞} E_{λ_0}[T_j(τ)] - τ π_j,

which satisfies

  div_j(λ_0, π) = Σ_{l=1}^k λ_{0,l} z_{lj} - π_j,   (5)

where the z_{lj}'s are the elements of Z. A consequence of (5) is that, if two chains start from states u and v, with corresponding numbers of passages in j, T_{ju}(τ) and T_{jv}(τ), then the corresponding limit is

  div_j(u, v) = lim_{τ → ∞} { E_u[ Σ_{t=1}^τ I_j(ξ(t)) ] - E_v[ Σ_{t=1}^τ I_j(ξ(t)) ] } = z_{uj} - z_{vj}.   (6)

The relevance of this notion for convergence control purposes is multiple. First, it assesses the effect of initial values on the chain by exhibiting the right scale for the convergence of (5) to a finite limit. Indeed, each term E_u[Σ_t I_j(ξ(t))] is infinite, because the chains are recurrent. In that sense, the convergence result (5) is stronger than the ergodic theorem (i.e., the convergence of the empirical average to the theoretical expectation), because the latter only indicates independence from initial values on the scale 1/T. Note the similarity of

  Σ_{t=1}^T I_j(θ(t)) - T π^_j

to Yu and Mykland's (1995) cusum criterion, the difference being that π_j is estimated from the same chain in their case. Moreover, the criterion is immediately valid in the sense that it does not require stationarity, but rather takes into account the initial values. A third incentive is that the property that the limiting difference in the numbers of visits is equal to (z_{uj} - z_{vj}) provides a quantitative control, because the matrix Z can be estimated directly from the transitions of the Markov chain. We thus obtain a control variate technique for general Markov chains, because the estimates of div_j(u, v) and of (z_{uj} - z_{vj}) must converge to the same quantity.

3.2 Divergence Estimation

Given a finite (or a discretized continuous) state-space Markov chain, the implementation of the divergence control method relies on an estimate of div_j(u, v). Simple estimators based on parallel chains starting from u and v do not stabilize in T, even in stationary setups, for reasons related to the ruin phenomenon exhibited in a coin-tossing experiment by Feller (1970, chap. 3). A superior alternative is to use stopping rules, which accelerate convergence.

In fact, consider two chains (θ(t)) and (θ~(t)) with initial values u and v. When these chains meet in an arbitrary state, their paths follow the same distribution from this meeting time, and the additional terms I_j(θ(t)) - I_j(θ~(t)) are unbiased estimators of 0; that is, useless for the estimation of div_j(u, v). It thus makes sense to stop the estimation at this meeting time. Therefore, computation of

  lim_{T → ∞} { E_u[T_j(T)] - E_v[T_j(T)] }

is greatly simplified, because the limit is not relevant anymore. More formally, if N(u, v) denotes the stopping time when the chains (θ(t)) and (θ~(t)) starting from states u and v are identical for the first time, we have the following result, whose proof is given in Appendix A.

Lemma 1. For every couple of Markov chains (θ(t), θ~(t)) such that E[N(u, v)^2] < +∞, the divergence div_j(u, v) is given by

  div_j(u, v) = E[ Σ_{t=1}^{N(u,v)} { I_j(θ(t)) - I_j(θ~(t)) } ].   (7)

The condition E[N(u, v)^2] < +∞, which holds when both chains are independent (because the state space is finite; see Meyn and Tweedie 1993, p. 316), is not necessarily verified by coupled chains; that is, in cases when (θ(t)) and (θ~(t)) are dependent. But in the case of a strong coupling, namely when θ~(t) is a deterministic function of (θ(t), θ(t-1), θ~(t-1)), the stopping time satisfies E[N(u, v)^2] < +∞. (This setup can indeed be rewritten in terms of a single Markov chain.) It is thus rarely necessary to verify that E[N(u, v)^2] < +∞ holds in practice.

In practice, we can use M replications (θ(t)_m) and (θ~(t)_m) with initial values u and v (1 ≤ m ≤ M). If N_m(u, v) denotes the first epoch t when θ(t)_m and θ~(t)_m are equal, then the divergence div_j(u, v) can be approximated by

  (1/M) Σ_{m=1}^M Σ_{t=1}^{N_m(u,v)} [ I_j(θ(t)_{u,m}) - I_j(θ(t)_{v,m}) ],   (8)

which is a convergent estimator when M goes to infinity.

4. DIVERGENCE ESTIMATION FOR CONTINUOUS CHAINS

4.1 Implementation

The estimation of divergence values in the continuous case is quite similar to the proposal in Section 3. For a given replication m of the M parallel runs used to evaluate the expectation (7), k chains (θ(t)_{j,m}) are initialized from the k bounding measures ν_j (1 ≤ j ≤ k). The generation of θ(t) is modified according to (2) when θ(t-1) enters one of the small sets A_i, and this modification provides the subchains (ξ(n)_{j,m}). The contribution of the mth replication to the approximation of div_j(i_1, i_2),

  lim_{N → ∞} E[ Σ_{n=1}^N I_j(ξ(n)_{i_1}) - I_j(ξ(n)_{i_2}) ],

is actually given by

  Σ_{n=1}^{N_m(i_1,i_2)} { I_j(ξ(n)_{i_1,m}) - I_j(ξ(n)_{i_2,m}) },   (9)

where N_m(i_1, i_2) is the stopping time corresponding to the first occurrence of ξ(n)_{i_1,m} = ξ(n)_{i_2,m}, because (9) is an unbiased estimator of div_j(i_1, i_2) according to Lemma 1.

An interesting property follows from the alternative expression

  Σ_{n=1}^{N_m(i_1|i_2)} I_j(ξ(n)_{i_1,m}) - Σ_{n=1}^{N_m(i_2|i_1)} I_j(ξ(n)_{i_2,m}),   (10)

where N_m(i_1|i_2) and N_m(i_2|i_1) are the "meeting times" for both chains, which correspond to the absolute stopping time T_m(i_1, i_2) when the two (continuous state-space) chains (θ(t)_{i_1,m}) and (θ(t)_{i_2,m}) meet for the first time in the same small set A_l (i.e., both chains belong to A_l at time T_m(i_1, i_2)) and when renewal occurs for both. Indeed, it would seem that this stopping rule improves the estimation of div_j(i_1, i_2), because it is usually smaller than the rule in (9). However, this approach induces a bias in the evaluation of div_j(i_1, i_2), as shown by the following result.

Lemma 2. If the couple of chains (θ(t), θ~(t)) is such that the stopping time T(u, v) verifies E[T(u, v)^2] < ∞, the evaluation (10) of div_j(i_1, i_2) is biased:

  E[ Σ_{n=1}^{N(u|v)} I_j(ξ(n)) - Σ_{n=1}^{N(v|u)} I_j(ξ~(n)) ] - div_j(u, v) = E[N(u|v) - N(v|u)] π_j,

where π_j denotes the stationary probability of being in state j.

Contrary to the finite case, the condition E[T(u, v)^2] < ∞ is usually difficult to assess, but the argument against using the absolute stopping time remains obviously valid if this condition does not hold. Note also that coupling between the chains (θ(t)_u) (1 ≤ u ≤ k) can be used to reduce N_m(i_1, i_2) and thus to accelerate the estimation of div_j(u, v), although coupling is difficult to implement in continuous state-space chains (see Johnson 1996). In fact, two departures from independence between the parallel chains are of interest. First, the same uniform random variable can be used at each (absolute) time t to decide whether this is a renewal time for every chain entering an arbitrary small set A_j. Second, traditional antithetic arguments can be transferred to this setting, to accelerate a meeting in the same small set.

4.2 Benchmark Example

The nuclear pump failure dataset of Gaver and O'Muircheartaigh (1987) has been repeatedly used in the MCMC literature to illustrate the implementation of the Gibbs sampler and of hybrid modifications (e.g., Gelfand and Smith 1990; Mykland et al. 1995; Tanner 1993). It is invoked here to illustrate the fact that small sets and renewal times can easily be derived in standard setups. We recall that the associated Gibbs sampler is to simulate

  λ_i | β, t_i, p_i ~ Ga(p_i + α, t_i + β)   (1 ≤ i ≤ 10),
  β | λ_1, ..., λ_10 ~ Ga(γ + 10α, δ + Σ_i λ_i),

where the observations are (p_i, t_i) and the hyperparameters are chosen as α = 1.8, γ = .01, and δ = 1. If we introduce small sets of the form A_j = [β_j, β̄_j] (j = 1, ..., J), then the lower bound on the transition kernel is

  K(β, β') = ∫ ... ∫ [(δ + Σ_i λ_i)^{γ+10α} β'^{γ+10α-1} e^{-β'(δ+Σ_i λ_i)} / Γ(γ+10α)]
       × Π_{i=1}^{10} [(t_i + β)^{p_i+α} λ_i^{p_i+α-1} e^{-(t_i+β)λ_i} / Γ(p_i+α)] dλ_1 ... dλ_10
    ≥ Π_{i=1}^{10} [(t_i + β_j)/(t_i + β̄_j)]^{p_i+α}
       × ∫ ... ∫ [(δ + Σ_i λ_i)^{γ+10α} β'^{γ+10α-1} e^{-β'(δ+Σ_i λ_i)} / Γ(γ+10α)]
       × Π_{i=1}^{10} [(t_i + β̄_j)^{p_i+α} λ_i^{p_i+α-1} e^{-(t_i+β̄_j)λ_i} / Γ(p_i+α)] dλ_1 ... dλ_10

for every β ∈ A_j. Thus the probability of renewal within a small set A_j is

  ε_j = Π_{i=1}^{10} [(t_i + β_j)/(t_i + β̄_j)]^{p_i+α},

whereas the bounding probability ν_j is the marginal distribution (in β') of the joint distribution

  λ_i ~ Ga(p_i + α, t_i + β̄_j)   (i = 1, ..., 10),
  β' | λ_1, ..., λ_10 ~ Ga(γ + 10α, δ + Σ_i λ_i).

A preliminary run of the Gibbs sampler on 5,000 iterations provides the small sets given in Table 1 as those maximizing the probability of renewal ρ_j = ε_j Pr(β(t) ∈ A_j). Following the foregoing developments, the convergence assessment associated with these small sets A_j can be based on parallel runs of eight chains (β(t)_{j,m}) (j = 1, ..., 8; m = 1, ..., M) starting from the eight small sets with the corresponding initial distributions ν_j:

1. Generate λ_i ~ Ga(p_i + α, t_i + β̄_j) (i = 1, ..., 10);
2. Generate β ~ Ga(γ + 10α, δ + Σ_i λ_i).

The chains (β(t)_{j,m}) induce corresponding finite state-space chains (ξ(n)_{j,m}), with ξ(1)_{j,m} = j, and contribute to the approximation of the divergences div_j(i_1, i_2) via the sums (9), depending on coupling times N_m(i_1, i_2).
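Lemma 1 and the estimator (8) can be checked on a finite chain directly. For a hypothetical two-state chain with a = Pr(1 → 2) and b = Pr(2 → 1), the limit (6) is available in closed form, z_11 - z_21 = 1/(a + b), so the coupling-time estimator can be compared with it; the chain below, with a = .3 and b = .2, is our own toy example:

```python
import random

random.seed(2)

P = {1: [(1, 0.7), (2, 0.3)],
     2: [(1, 0.2), (2, 0.8)]}   # toy two-state chain: a = .3, b = .2

def step(state):
    u, acc = random.random(), 0.0
    for s, p in P[state]:
        acc += p
        if u < acc:
            return s
    return s

def div_estimate(u, v, j, M):
    """Estimate div_j(u, v) as in (8): run two independent chains from u and v
    and stop summing indicator differences once the chains meet."""
    total = 0.0
    for _ in range(M):
        x, y = u, v
        total += (x == j) - (y == j)     # initial states are counted
        while x != y:                    # N(u, v): first epoch the chains agree
            x, y = step(x), step(y)
            total += (x == j) - (y == j)
    return total / M

est = div_estimate(1, 2, 1, 40000)
print(round(est, 3))   # theoretical limit: z_11 - z_21 = 1/(a + b) = 2.0
```

With a + b = .5 the estimate stabilizes around 2, matching (6), while the naive difference of visit counts without a stopping rule would keep fluctuating, as the ruin-phenomenon discussion above explains.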
Figure 2 describes the convergence of four selected estimated divergences as the number of parallel runs increases. The averages stabilize rather quickly and, moreover, the overall number of iterations required by the method is moderate, because the mean coupling time is only 14.0; this implies that each sum (9) involves on average 14 steps of the Gibbs sampler. The standard deviation is derived from the empirical variance of the sums (9).

To approximate the ratio ε ν(β')/K(β, β') mentioned in Section 2.1, the integrals in both ν(β') and K(β, β') are replaced by sums (see Robert 1996), leading to the approximation

  ε ν(β') / K(β, β') ≈ ε [ Σ_{s=1}^S (δ + Σ_i λ̄_i^s)^{γ+10α} exp{-β' Σ_i λ̄_i^s} ] / [ Σ_{s=1}^S (δ + Σ_i λ_i^s)^{γ+10α} exp{-β' Σ_i λ_i^s} ],

where the λ̄_i^s are generated from Ga(p_i + α, t_i + β̄_j) and the λ_i^s from Ga(p_i + α, t_i + β). This approximation device is theoretically justified only for S large enough. An accelerating (and stabilizing) technique is to use repeatedly the same sample of S Exp(1) random variables for the generation of the (t_i + β)λ_i^s's; that is, to take advantage of the scale structure of the gamma distribution. In the simulations, we took S = 500, although smaller values also ensure stability of the approximation.

Table 1. Small Sets Associated With the Transition Kernel K(β, β') and Associated Parameters

  A_j:  [1.6, 1.78]  [1.8, 1.94]  [1.95, 2.13]  [2.15, 2.37]  [2.4, 2.54]  [2.55, 2.69]  [2.7, 2.86]  [2.9, 3.04]
  ε_j:  .268         .372         .291          .234          .409         .417          .377         .435
  ρ_j:  .0202        .0276        .0287         .0314         .0342        .0299         .0258        .0212

[Figure 2. Convergence of the Divergence Criterion Based on (2) for Four Chains Started From Four Small Sets A_j. The triplets (i_1, i_2, l) index the difference in the numbers of visits to l by the chains (ξ(t)_{i_1}) and (ξ(t)_{i_2}). The envelope is located two standard deviations from the average. For each replication, the chains are restarted from the corresponding small sets. The theoretical limits derived from the estimation of P are -.00498, -.0403, .00332, and -.00198 (based on 50,000 iterations).]

5. CONVERGENCE ASSESSMENT FOR TWO PARALLEL CHAINS

5.1 Convergence Assessment Rather Than Divergence Estimation

For evaluating the convergence of an MCMC algorithm, there is now a well-documented literature (e.g., Geyer 1992; Raftery and Lewis 1996; Robert 1996, sec. 6.5) about the problems associated with using parallel chains, including dependence on starting values, an ambiguous distribution for the final values, and waste of simulations. The call for parallel chains in the present setup has rather different negative features. On the one hand, the estimates of the transition matrix P and of the divergences div_j(u, v) thus produced are quite valid, because the dependence on starting values is inherent to the criterion. On the other hand, the sample produced by the final values of the parallel chains cannot be exploited as a stationary sample of the distribution of interest, because of the short runs created by the stopping rule. Moreover, starting an equal number of chains from each small set does not necessarily reflect the weights of these sets in the stationary distribution. In that sense, the method is the opposite of an "on-line" control technique, even though it provides useful information on the mixing rate of the chain.

We show in this last section how the divergence criterion can be implemented with only two parallel chains, for an arbitrary number of small sets A_i. This alternative implementation is based on Birkhoff's pointwise ergodic theorem, which we now recall (see Battacharya and Waymire 1990, pp. 223-227, for a proof). We denote by X = (X(1), ...) a Markov chain and by T^r X = (X(r+1), X(r+2), ...) the shifted version of the chain.

Theorem 1. For an ergodic Markov chain (X(n)) with stationary distribution π and a functional g of X, the average

  (1/M) Σ_{m=1}^M g(T^m X)

converges almost surely to the expectation E_π[g(X)].

This result thus extends the standard ergodic theorem (see Meyn and Tweedie 1993) to functionals of the whole chain and thus allows for repeated use of the same chain. In particular, if R is a stopping time, that is, a functional such that the event R(X) = n is determined by (X(1), ..., X(n)) (see Meyn and Tweedie 1993, p. 71), and if g satisfies

  g(X(1), ..., X(n), ...) = g(X(1), ..., X(R(X))),

then the foregoing result applies. In the setup of this article, X can be chosen as being made of two replications x(n) = (ξ(n), ξ~(n)) of the discretized subchain of Proposition 1. The stopping time is then the first epoch N when ξ(n) and ξ~(n) are equal (that is, N(u, v) for ξ(1) = u and ξ~(1) = v), and the functional g is vector valued with component (j, u, v) equal to

  Σ_{n=1}^{N(u,v)} [ I_j(ξ(n)) - I_j(ξ~(n)) ]   (11)

if ξ(1) = u and ξ~(1) = v, and 0 otherwise.

The gain brought by this result is far from negligible because, instead of using a couple of (independent or not) chains (ξ(n), ξ~(n)) only once between the starting point and their stopping time N, the same sequence is used N - 1 times in the sum (11) and contributes to the estimation of the divergences for the values (ξ(n), ξ~(n)) = (u, v) (n = 1, ..., N).
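A minimal sketch of this shift-based scheme, on a hypothetical three-state chain rather than a discretized continuous one: two chains are run once and never restarted; every window opened at time n closes at the next meeting and is credited to the pair of states that opened it, which is the batch version of averaging g over shifts.

```python
import random

random.seed(3)

STATES = (1, 2, 3)
P = {1: ((1, .6), (2, .3), (3, .1)),
     2: ((1, .2), (2, .5), (3, .3)),
     3: ((1, .3), (2, .3), (3, .4))}   # illustrative three-state chain

def step(s):
    u, acc = random.random(), 0.0
    for t, p in P[s]:
        acc += p
        if u < acc:
            return t
    return t

sums, counts = {}, {}

def close_excursion(path):
    """Credit every shifted window of one excursion, as the ergodic average does."""
    suffix = dict.fromkeys(STATES, 0)
    for x, y in reversed(path):          # the window opened at n runs to the meeting
        for j in STATES:
            suffix[j] += (x == j) - (y == j)
        counts[(x, y)] = counts.get((x, y), 0) + 1
        for j in STATES:
            sums[(x, y), j] = sums.get(((x, y), j), 0.0) + suffix[j]

x, y, path = 1, 3, [(1, 3)]
for _ in range(200000):                  # the two chains are never restarted
    x, y = step(x), step(y)
    path.append((x, y))
    if x == y:                           # meeting time: close all open windows
        close_excursion(path)
        path = []

def div(u, v, j):
    return sums[(u, v), j] / counts[(u, v)]

print(round(div(1, 3, 1), 3), round(div(3, 1, 1), 3))   # roughly antisymmetric
```

By the strong Markov property, each window opened at pair (u, v) has expectation div_j(u, v), so the conditional time-averages converge without any restart, mirroring the on-line behavior claimed above.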
Moreover, the method no longer necessitates restarting the chains (ξ(n)) and (ξ~(n)) once they have met. This feature allows for on-line control of the MCMC algorithm, a better mixing of the chain, and direct use for estimation purposes. In fact, the continuous chains (θ(n)_i) behind the discretized subchains (ξ(n)_i) (i = 1, 2) are generated without any constraint, and the resulting (ξ(n)_1, ξ(n)_2) are used to update the divergence estimations by batches; that is, every time ξ(n)_1 = ξ(n)_2.

As before, a first convergence assessment can be based on the graphical stabilization of the estimated divergences. Note that we can also estimate the variance of (9), because the batches between renewal times are independent (see Robert 1995 for the variance estimation). A more quantitative evaluation of the convergence of the divergence estimators follows from comparison of the estimated divergences with the estimated limits z_{i_1 j} - z_{i_2 j}, when the transition matrix of (ξ(n)) is derived from the various chains and the fundamental matrix Z is computed with this approximation.

We now consider two standard examples to show how our technique applies and performs. Note also that although both examples involve Gibbs sampling techniques, where the derivation of the small set is usually straightforward, minorizing techniques can be easily extended to general Metropolis-Hastings algorithms.
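The comparison limits z_{i_1 j} - z_{i_2 j} only require the fundamental matrix, which can be computed from an estimated transition matrix in a few lines; the matrix below is an arbitrary stand-in for an estimate of P, not a value from the paper:

```python
P = [[.6, .3, .1],
     [.2, .5, .3],
     [.3, .3, .4]]                     # illustrative estimated transition matrix

def stationary(P, iters=1000):
    """Power iteration pi <- pi P for the stationary distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def inverse(M):
    """Gauss-Jordan inverse of a small matrix."""
    n = len(M)
    A = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [a / piv for a in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]

n = len(P)
pi = stationary(P)
# fundamental matrix Z = [I - (P - A)]^{-1}, where A has all rows equal to pi
Z = inverse([[(i == j) - P[i][j] + pi[j] for j in range(n)] for i in range(n)])
limits = {(u, v, j): Z[u][j] - Z[v][j] for u in range(n) for v in range(n) for j in range(n)}
print([round(z, 3) for z in Z[0]])
```

The classical identities Z1 = 1 and πZ = π (Kemeny and Snell) provide immediate sanity checks on the computation, and the dictionary of limits is what the estimated divergences are compared against.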
As shownby Robert X3 are small sets (see Robert 1995) forthe Markov chain (B,C,D) ~~Ill II ,,1 O I \\ 1\ A' s 1L, I o | i ': ; 15000 10000 5000 0 (0DB) (CD,BC) 6 I 0 0 00002 1000 2000 40 00 0 00 06000 (500000iterations) StartedFromB and D. The triplets(I, li, i2) indexthe difference in Figure3. Convergenceof the DivergenceCriterionforTwoChains Initially the numberof visitsof I by chains startingfromil and fromi2. The envelope is located twostandarddeviationsfromthe average. The theoretical limitsderivedfromthe estimationof P are .094, 1.202 and -.688. The scales of the threegraphs representthe numberof times div1(i1,i2)has been updated along the 500,000 iterations.The finalvalues of the threeestimateddivergencesare .164, 1.076 and -.825. This content downloaded from 152.3.22.59 on Mon, 14 Oct 2013 12:55:45 PM All use subject to JSTOR Terms and Conditions 1063 and Robert:DiscretizedMCMC and Convergence Assessment Guihenneuc-Jouyaux (1,3,2) (2,3,1) C>~~~~~~~~ 0 I-~~~~~~~~~~~~~~~~~~~~~~P 1000 500 15F00 200 0 2000 600 400 1000 1200 800 MOMO diagnoses C> 6~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ? | ' i t ~~~~~~~~~~" ~~14 ;"' \; 16 ' 18 20 00 22 02 04 06 08 10o mean=Q0052 mean170281 (i1,i2,I) indexthedifference A1 and A2. Thetriplets Started.From forTwoChainsInitially Criterion oftheDivergence Figure4. Convergence theaverage.Thetheoretical from deviations 'i and from from i2. Theenvelopeis locatedtwostandard ofvisitsofI bychainsstarting inthenumber limitsderived fromthe estimationof P are-.110, .0288, and -0225 (based on 100,000 iterations),whereas the finalvalues of the estimated i2) havebeen updated. oftimesdiv1(i1, tothenumber graphscorrespond are .00747,.0161,and-.0162. Thescales on thethreefirst divergences iterations. oftheMCMCsamples,based on the10,000first as wellas thehistograms graphsproducetheaverageconvergence Theconvergence run of the Gibbs Markov chain. 
The kernel of the chain associated with (13) satisfies the minorization
$$K(\theta, \theta') \ge \prod_{i=1}^{3} \frac{1 + \rho_{i1}^2}{1 + \rho_{i2}^2}\; \nu(\theta'),$$
where $\nu$ is the marginal density (in $\theta$) of the $\theta$ simulated from (13) with $\eta_i \sim \mathcal{E}xp(1 + \rho_{i2}^2)$, and the $\rho_{ij}$ bound the distances from the observations to the small set, $\rho_{i1} \le |\theta - x_i| \le \rho_{i2}$ for $\theta \in C$; that is, $\rho_{11} = r_1 - x_1 < \rho_{12} = r_2 - x_1$, $0 = \rho_{21} < \rho_{22} = \max(r_2 - x_2, x_2 - r_1)$, and $\rho_{31} = x_3 - r_2 < \rho_{32} = x_3 - r_1$. Similar derivations can be obtained for the sets $B = [s_1, s_2]$ with $s_1 < x_1 < s_2 < x_2$ and $D = [v_1, v_2]$ with $x_2 < v_1 < x_3 < v_2$. If we choose in addition $s_2 < r_1$ and $r_2 < v_1$, the three small sets are disjoint and we can thus create a three-state chain.

A preliminary run of the Gibbs sampler on 5,000 iterations leads to the choice of the three small sets $B = [-8.5, -7.5]$, $C = [7.5, 8.5]$, and $D = [17.5, 18.5]$ as optimizing the probabilities of renewal, $\varrho_B = .02$, $\varrho_C = .10$, and $\varrho_D = .05$. Figure 1 gives the first 200 values of $\theta^{(t)}$ generated from (13) and indicates the corresponding occurrences of $\xi^{(n)}$. Note that the choice of the small sets is by no means restricted to neighborhoods of the modes, although this increases the probability of renewal.

Figure 3 illustrates the implementation of the divergence control device on two parallel chains starting from the small sets B and C. Convergence of the criterion is rather slow, because the number of simulations of the continuous chain $(\theta^{(t)})$ is 500,000. The scale of the three graphs is much smaller, however, and indicates the number of times each divergence has been updated.
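Operationally, the discretization amounts to watching the continuous trajectory and, each time it enters a small set, declaring a renewal with the corresponding probability $\varrho_j$; the finite chain records which small set each renewal occurred in. The sketch below makes two simplifications worth flagging: the Bernoulli coin flip stands in for the full split-chain construction (a genuine renewal also regenerates the chain from the minorizing measure $\nu$), and the trajectory is synthetic rather than Gibbs output. The fundamental matrix $Z = (I - P + A)^{-1}$, whose entries give the theoretical limits $z_{uj} - z_{vj}$ of the divergences, follows Kemeny and Snell (1960).

```python
import numpy as np

rng = np.random.default_rng(1)

# Small sets B, C, D and their renewal probabilities from Section 5.2
SETS = {0: (-8.5, -7.5), 1: (7.5, 8.5), 2: (17.5, 18.5)}
EPS = {0: 0.02, 1: 0.10, 2: 0.05}

def discretize(theta):
    """Record the small-set index j at each Bernoulli(eps_j) renewal
    inside set j; the result is the finite subchain (xi_n)."""
    xi = [j for val in theta for j, (lo, hi) in SETS.items()
          if lo <= val <= hi and rng.random() < EPS[j]]
    return np.array(xi, dtype=int)

def transition_matrix(xi, k=3):
    """Empirical transition matrix P of the discretized chain."""
    P = np.zeros((k, k))
    for a, b in zip(xi[:-1], xi[1:]):
        P[a, b] += 1.0
    return P / P.sum(axis=1, keepdims=True)

def fundamental_matrix(P):
    """Kemeny-Snell fundamental matrix Z = (I - P + A)^{-1}, where every
    row of A is the stationary distribution of P."""
    w, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(w))])
    pi /= pi.sum()
    return np.linalg.inv(np.eye(len(P)) - P + np.tile(pi, (len(P), 1)))

# synthetic trajectory hopping among the three modes (stand-in for theta^(t))
theta = rng.choice([-8.0, 8.0, 18.0], size=6000) + rng.normal(0.0, 0.2, 6000)
xi = discretize(theta)
P = transition_matrix(xi)
Z = fundamental_matrix(P)
```

With a transition matrix $P$ estimated this way, the limit of $\mathrm{div}_j(u, v)$ is read off as `Z[u, j] - Z[v, j]`, which is how the theoretical limits quoted in the figure captions are obtained.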
5.3 AR(1) Model With a Changepoint

Consider the AR(1) model
$$x_{t+1} = \begin{cases} \rho_1 x_t + \epsilon_{t+1} & \text{if } t \le \kappa, \\ \rho_2 x_t + \epsilon_{t+1} & \text{if } t > \kappa, \end{cases} \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2),$$
where the parameters $\rho_1$, $\rho_2$, $\sigma$, and $\kappa \in \{1, \ldots, T-2\}$ are unknown. (See Ó Ruanaidh and Fitzgerald 1996 for similar models in signal processing setups.) The posterior distribution associated with the prior
$$\pi(\rho, \sigma, \kappa) \propto \mathbb{I}_{[-1,1]}(\rho_1)\, \mathbb{I}_{[-1,1]}(\rho_2)\, \mathbb{I}_{\{1,\ldots,T-2\}}(\kappa)\, \frac{1}{\sigma}$$
is
$$\pi(\rho, \sigma, \kappa \mid x) \propto \sigma^{-T} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{t=1}^{\kappa} x_{t+1}^2 + \rho_1^2 \sum_{t=1}^{\kappa} x_t^2 - 2\rho_1 \sum_{t=1}^{\kappa} x_t x_{t+1} + \sum_{t=\kappa+1}^{T-1} x_{t+1}^2 + \rho_2^2 \sum_{t=\kappa+1}^{T-1} x_t^2 - 2\rho_2 \sum_{t=\kappa+1}^{T-1} x_t x_{t+1}\right]\right\}, \qquad (14)$$
which can be simulated from the following Gibbs sampler:
$$\Pr(\kappa = u \mid \rho, \sigma, x) \propto \exp\left\{-\frac{1}{2\sigma^2}\left[\tau_1(u)\rho_1(\rho_1 - 2\mu_1(u)) + \tau_2(u)\rho_2(\rho_2 - 2\mu_2(u))\right]\right\},$$
$$\rho_1 \mid \kappa, \sigma, x \sim \mathcal{N}_T\left(\mu_1(\kappa), \frac{\sigma^2}{\tau_1(\kappa)}\right), \qquad \rho_2 \mid \kappa, \sigma, x \sim \mathcal{N}_T\left(\mu_2(\kappa), \frac{\sigma^2}{\tau_2(\kappa)}\right),$$
$$\sigma^2 \mid \rho, \kappa, x \sim \mathcal{IG}\left(\frac{T-1}{2},\; \frac{1}{2}\left[\sum_{t=1}^{\kappa}(x_{t+1} - \rho_1 x_t)^2 + \sum_{t=\kappa+1}^{T-1}(x_{t+1} - \rho_2 x_t)^2\right]\right), \qquad (15)$$
where $\mathcal{N}_T(\mu, \omega^2)$ denotes the normal distribution restricted to $[-1, 1]$, and the $\tau_i(u)$ and $\mu_i(u)$ $(i = 1, 2)$ are given by (14); that is, $\tau_1(u) = \sum_{t=1}^{u} x_t^2$, $\tau_2(u) = \sum_{t=u+1}^{T-1} x_t^2$, and $\mu_i(u)$ is the corresponding ratio $\sum_t x_t x_{t+1} / \tau_i(u)$.

For a simulated sample of 50 observations with parameters $\rho_1 = -.8$, $\rho_2 = .2$, $\sigma = 1.0$, and $\kappa = 18$, a preliminary exploration of the parameter space over 5,000 iterations leads to the small sets $A_1 = [-1., -.77] \times [.33, .74] \times [.785, .835]$, $A_2 = [-1., -.76] \times [.35, .73] \times [.835, .865]$, and $A_3 = [-1., -.76] \times [.35, .74] \times [.865, .92]$, with corresponding probabilities of renewal .00979, .0153, and .00965.

Figure 4 provides the evolution of the divergence estimation for two parallel independent chains started from the small sets $A_1$ and $A_2$. The 1,267 points in the graphs correspond to the updates of the divergence estimations at the meeting times of the discretized chains. The total number of iterations in the continuous chains is actually 100,000, for an average excursion time of 78 iterations.
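The three-block Gibbs sampler (15) can be sketched as follows: $\kappa$ is drawn from its discrete full conditional, the $\rho_i$ from normals truncated to $[-1, 1]$ (here by simple rejection), and $\sigma^2$ from its inverse gamma conditional. The data are simulated with the parameter values of this example ($\rho_1 = -.8$, $\rho_2 = .2$, $\sigma = 1$, $\kappa = 18$), but the seed, chain length, and rejection step are implementation choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)

def trunc_normal(mu, sd):
    """Rejection sampler for N(mu, sd^2) restricted to [-1, 1]."""
    while True:
        z = rng.normal(mu, sd)
        if -1.0 <= z <= 1.0:
            return z

def stats(x, u):
    """tau_i(u) and mu_i(u): sufficient statistics of the two regimes
    when the first u transitions follow rho_1."""
    t1, t2 = (x[:u] ** 2).sum(), (x[u:-1] ** 2).sum()
    m1 = (x[:u] * x[1:u + 1]).sum() / t1
    m2 = (x[u:-1] * x[u + 1:]).sum() / t2
    return t1, m1, t2, m2

def gibbs_ar1(x, n_iter=500):
    T = len(x)
    rho1, rho2, sig2 = 0.0, 0.0, 1.0
    draws = np.empty((n_iter, 4))
    for it in range(n_iter):
        # kappa | rho, sigma: discrete full conditional from (15)
        logp = np.array([-(t1 * rho1 * (rho1 - 2 * m1)
                           + t2 * rho2 * (rho2 - 2 * m2)) / (2 * sig2)
                         for t1, m1, t2, m2 in
                         (stats(x, u) for u in range(1, T - 1))])
        p = np.exp(logp - logp.max())
        kappa = rng.choice(np.arange(1, T - 1), p=p / p.sum())
        # rho_i | kappa, sigma ~ N_T(mu_i(kappa), sig2 / tau_i(kappa))
        t1, m1, t2, m2 = stats(x, kappa)
        rho1 = trunc_normal(m1, np.sqrt(sig2 / t1))
        rho2 = trunc_normal(m2, np.sqrt(sig2 / t2))
        # sigma^2 | rho, kappa ~ IG((T - 1) / 2, rss / 2)
        rss = ((x[1:kappa + 1] - rho1 * x[:kappa]) ** 2).sum() \
            + ((x[kappa + 1:] - rho2 * x[kappa:-1]) ** 2).sum()
        sig2 = rss / (2.0 * rng.gamma((T - 1) / 2.0))
        draws[it] = rho1, rho2, np.sqrt(sig2), kappa
    return draws

# data simulated with the parameters used in this example
T, kap = 50, 18
x = np.empty(T)
x[0] = rng.normal()
for t in range(T - 1):
    x[t + 1] = (-0.8 if t < kap else 0.2) * x[t] + rng.normal()

out = gibbs_ar1(x)
```

The rejection step for $\mathcal{N}_T$ is adequate here because the conditional means $\mu_i(\kappa)$ stay well inside $[-1, 1]$; a more careful implementation would use inverse-cdf sampling for the truncated normals.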
Note that we are in a setup where the duality principle of Diebolt and Robert (1994) applies, because $(\kappa^{(n)})$ is a finite state-space Markov chain. But the discretization method exposed in this article can also be used for the chain $(\rho^{(n)}, \sigma^{(n)})$, because a minorization condition holds for small sets of the form $A = [\underline{\rho}_1, \bar{\rho}_1] \times [\underline{\rho}_2, \bar{\rho}_2] \times [\underline{\sigma}, \bar{\sigma}]$. Indeed, $K((\rho, \sigma), (\rho', \sigma'))$ can be bounded from below by $c\,\nu(\rho', \sigma')$ on such small sets, where $\nu$ is derived from the Gibbs sampler (15) by simulating $\kappa$ from $\inf_A \Pr(\kappa = u \mid \rho, \sigma)$ and replacing $\sigma$ by $\bar{\sigma}$ in the simulation of the $\rho_i$'s, as shown in Appendix C, along with the exact value of $c$.

6. CONCLUSION

In this article we have established a theoretically valid and implementable approach to Markov chain discretization and illustrated the performance of a convergence assessment technique for finite state spaces based on the notion of divergence. Potential difficulties with our method are (a) that it requires the determination of small sets $A_j$ with manageable associated parameters $(\epsilon_j, \nu_j)$, and (b) that it provides conservative convergence assessments for the discretized Markov chain.

The latter point is a consequence of our rigorous requirements for the Markov structure of the discrete chain and exact convergence of the divergence approximations. It is thus rather comforting to exhibit safe bounds on the number of simulations necessary to give a good evaluation of the distribution of interest. We indeed prefer our elaborate and slow but theoretically well-grounded convergence criterion to handy and quick alternatives with limited justifications, because the latter are tested in very special setups but usually encounter difficulties and inaccuracies outside these setups. Note, however, that the convergence assessment does not totally extend to the original continuous Markov chain. Moreover, the few examples treated here show that using the estimate of $P$ as a control variate technique leads to long delays in the convergence diagnostic.

The difficulty (a) is obviously more of a concern, but there are theoretical assurances that small sets exist in most MCMC setups and, moreover, Mykland et al. (1995) have proposed some quasi-automated schemes to construct such small sets by hybrid modifications of the original MCMC algorithm. Note also that the techniques we used in the examples of Section 5, namely bounding from below the conditional distributions depending on $\theta$, can be reproduced in many Gibbs setups and in particular for data augmentation, whereas Mykland et al. (1995) have shown that independent Hastings-Metropolis settings are quite manageable in this respect. At last, the choice of the space where the small sets are constructed is open, and, at least for Gibbs samplers, there are often obvious choices, as in the duality principle of Diebolt and Robert (1993, 1994). It must be pointed out, however, that missing-data structures like mixtures of distributions are notorious for leading to very small bounds in the minorization condition (1), and that other convergence diagnostics based on the natural finite Markov chains generated in these setups would be preferable. Moreover, as also pointed out by Gilks, Roberts, and Sahu (1997) for an acceleration method using regeneration, the applicability of the method in high-dimensional problems is limited by the difficulty of obtaining efficient minorizing conditions, even though new developments are bound to occur in this area, given the current interest.
APPENDIX A: PROOF OF LEMMA 1

Because $N(u, v)$ is a stopping time, the strong Markov property implies that, for $n > m$,
$$E\left[h(\xi_u^{(n)}) \mid \xi_u^{(m)} = j,\, N(u, v) = m\right] = E\left[h(\xi_v^{(n)}) \mid \xi_v^{(m)} = j,\, N(u, v) = m\right]$$
for every function $h$, and, by conditioning, we derive that
$$E\left[\sum_{n=1}^{N} \left\{\mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)})\right\}\right] = E\left[\sum_{n=1}^{N \wedge N(u,v)} \left\{\mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)})\right\}\right].$$
Now
$$E\left[\sum_{n=1}^{N \wedge N(u,v)} \left\{\mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)})\right\} \mathbb{I}_{N(u,v) > N}\right] \le N \Pr(N(u, v) > N) \le E[N(u, v)^2]/N,$$
which implies that the left side goes to 0 when $N$ goes to infinity, and
$$E\left[\sum_{n=1}^{N(u,v)} \left\{\mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)})\right\} \mathbb{I}_{N(u,v) > N}\right] \le 2\,E\left[N(u, v)\,\mathbb{I}_{N(u,v) > N}\right]$$
goes to 0 when $N$ goes to infinity by the dominated convergence theorem. Therefore,
$$\lim_{N \to \infty} E\left[\sum_{n=1}^{N \wedge N(u,v)} \left\{\mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)})\right\}\right] = E\left[\sum_{n=1}^{N(u,v)} \left\{\mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)})\right\}\right].$$

APPENDIX B: PROOF OF LEMMA 2

For ease of notation, we define $N_u = N(u \mid v)$ and $N_v = N(v \mid u)$. Conditionally on $(N_u, N_v)$, we get, for $N \ge N_u \vee N_v$,
$$E\left[\sum_{n=1}^{N} \left\{\mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)})\right\}\right] = E\left[\sum_{n=1}^{N_u} \mathbb{I}_j(\xi_u^{(n)}) - \sum_{n=1}^{N_v} \mathbb{I}_j(\xi_v^{(n)})\right] + E\left[\sum_{n=1}^{N-N_u} \mathbb{I}_j(\xi_u^{(n+N_u)}) - \sum_{n=1}^{N-N_v} \mathbb{I}_j(\xi_v^{(n+N_v)})\right], \qquad (B.1)$$
and, because the chains $(\xi_u^{(N_u+n)})$ and $(\xi_v^{(N_v+n)})$ $(n \ge 0)$ have the same distribution conditionally on $(N_u, N_v)$,
$$E\left[\sum_{n=1}^{(N-N_u) \wedge (N-N_v)} \left\{\mathbb{I}_j(\xi_u^{(n+N_u)}) - \mathbb{I}_j(\xi_v^{(n+N_v)})\right\}\right] = 0.$$
Because $E[(N_u \vee N_v)^2] \le E[T(u, v)^2]$, an argument similar to the one in the proof of Lemma 1 validates the restriction to the case $N \ge N_u \vee N_v$ when $N$ goes to $\infty$, and the difference between $\mathrm{div}_j(u, v)$ and (10) is given by
$$E[N_u - N_v]\, E[\mathbb{I}_j(\xi)] = E[N_u - N_v]\, \pi_j,$$
because the Markov chain $(\xi^{(n)})$ is ergodic with limiting distribution $\pi$.

APPENDIX C: MINORIZING CONDITIONS FOR THE EXAMPLE IN SECTION 5.3

When $(\rho, \sigma) \in A$,
$$K((\rho, \sigma), (\rho', \sigma')) = \sum_{u=1}^{T-2} \Pr(\kappa = u \mid \rho, \sigma)\, \pi(\rho'_1 \mid u, \sigma)\, \pi(\rho'_2 \mid u, \sigma)\, \pi(\sigma' \mid \rho', u),$$
where $\pi(\rho'_i \mid u, \sigma)$ is the $\mathcal{N}_T(\mu_i(u), \sigma^2/\tau_i(u))$ density
$$\pi(\rho'_i \mid u, \sigma) = \frac{\sqrt{\tau_i(u)}}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{\tau_i(u)(\rho'_i - \mu_i(u))^2}{2\sigma^2}\right\} \left[\Phi\left\{\frac{1 - \mu_i(u)}{\sigma/\sqrt{\tau_i(u)}}\right\} - \Phi\left\{\frac{-1 - \mu_i(u)}{\sigma/\sqrt{\tau_i(u)}}\right\}\right]^{-1}.$$
On $A$, the discrete conditional satisfies $\Pr(\kappa = u \mid \rho, \sigma) \ge \inf_A \Pr(\kappa = u \mid \rho, \sigma)$, with
$$\inf_A \Pr(\kappa = u \mid \rho, \sigma) \ge \frac{\inf_A \exp\left\{-(1/2\sigma^2)\left[\tau_1(u)\rho_1(\rho_1 - 2\mu_1(u)) + \tau_2(u)\rho_2(\rho_2 - 2\mu_2(u))\right]\right\}}{\sum_{u'=1}^{T-2} \sup_A \exp\left\{-(1/2\sigma^2)\left[\tau_1(u')\rho_1(\rho_1 - 2\mu_1(u')) + \tau_2(u')\rho_2(\rho_2 - 2\mu_2(u'))\right]\right\}},$$
and, for $\rho'_i \in [-1, 1]$ and $\sigma \in [\underline{\sigma}, \bar{\sigma}]$, the density $\pi(\rho'_i \mid u, \sigma)$ is bounded from below by a constant multiple of the $\mathcal{N}_T(\mu_i(u), \bar{\sigma}^2/\tau_i(u))$ density. Hence the kernel is bounded from below by $c\,\nu(\rho', \sigma')$, with minorizing measure
$$\nu(\rho', \sigma') = \sum_{u=1}^{T-2} \frac{\inf_A \Pr(\kappa = u \mid \rho, \sigma)}{\sum_{u'=1}^{T-2} \inf_A \Pr(\kappa = u' \mid \rho, \sigma)}\; \varphi_T\!\left(\rho'_1;\, \mu_1(u), \frac{\bar{\sigma}^2}{\tau_1(u)}\right) \varphi_T\!\left(\rho'_2;\, \mu_2(u), \frac{\bar{\sigma}^2}{\tau_2(u)}\right) \pi(\sigma' \mid \rho', u),$$
where $\varphi_T(\cdot\,;\, \mu, \omega^2)$ denotes the $\mathcal{N}_T(\mu, \omega^2)$ density; that is, $\nu$ is derived from the Gibbs sampler (15) by simulating $\kappa$ from the normalized $\inf_A \Pr(\kappa = u \mid \rho, \sigma)$ and by replacing $\sigma$ with $\bar{\sigma}$ in the simulation of the $\rho_i$'s, while $c$ collects the normalizing constant $\sum_u \inf_A \Pr(\kappa = u \mid \rho, \sigma)$ and the worst-case ratios of the two truncated normal densities over $[-1, 1]$.

[Received June 1996. Revised March 1998.]

REFERENCES

Asmussen, S. (1979), Applied Probability and Queues, New York: Wiley.
Bhattacharya, R. N., and Waymire, E. C. (1990), Stochastic Processes With Applications, New York: Wiley.
Chauveau, D., and Diebolt, J. (1997), "MCMC Convergence Diagnostic via the Central Limit Theorem," technical report, Université de Marne-la-Vallée.
Diebolt, J., and Robert, C. P. (1993), "The Duality Principle," Journal of the Royal Statistical Society, Ser. B, 55, 71-72.
Diebolt, J., and Robert, C. P. (1994), "Estimation of Finite Mixture Distributions Through Bayesian Sampling," Journal of the Royal Statistical Society, Ser. B, 56, 363-375.
Feller, W. (1970), An Introduction to Probability Theory and Its Applications, Vol. 1, New York: Wiley.
Gaver, D. P., and O'Muircheartaigh, I. G. (1987), "Robust Empirical Bayes Analysis of Event Rates," Technometrics, 29, 1-15.
Gelfand, A. E., and Smith, A. F. M. (1990), "Sampling-Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 398-409.
Geyer, C. J. (1992), "Practical Markov Chain Monte Carlo" (with discussion), Statistical Science, 7, 473-511.
Gilks, W. R., Roberts, G. O., and Sahu, S. K. (1997), "Adaptive Markov Chain Monte Carlo Through Regeneration," technical report, Cambridge University, MRC Biostatistics Unit.
Johnson, V. E. (1996), "Studying Convergence of Markov Chain Monte Carlo Algorithms Using Coupled Sample Paths," Journal of the American Statistical Association, 91, 154-166.
Kemeny, J. G., and Snell, J. L. (1960), Finite Markov Chains, Princeton, NJ: Van Nostrand.
Meyn, S. P., and Tweedie, R. L. (1993), Markov Chains and Stochastic Stability, London: Springer-Verlag.
Mykland, P., Tierney, L., and Yu, B. (1995), "Regeneration in Markov Chain Samplers," Journal of the American Statistical Association, 90, 233-241.
Ó Ruanaidh, J. J. K., and Fitzgerald, W. J. (1996), Numerical Bayesian Methods Applied to Signal Processing, New York: Springer-Verlag.
Propp, J. G., and Wilson, D. B. (1995), "Exact Sampling With Coupled Markov Chains and Applications to Statistical Mechanics," technical report, Massachusetts Institute of Technology, Dept. of Mathematics.
Raftery, A., and Lewis, S. (1992), "How Many Iterations in the Gibbs Sampler?" in Bayesian Statistics 4, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford, UK: Oxford University Press, pp. 765-776.
Raftery, A. E., and Lewis, S. (1996), "Implementing MCMC," in Markov Chain Monte Carlo in Practice, eds. W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, London: Chapman and Hall, pp. 115-130.
Robert, C. P. (1995), "Convergence Control Techniques for Markov Chain Monte Carlo Algorithms," Statistical Science, 10, 231-253.
Robert, C. P. (1996), Méthodes de Monte Carlo par Chaînes de Markov, Paris: Economica.
Roberts, G. O., and Tweedie, R. L. (1996), "Geometric Convergence and Central Limit Theorems for Multidimensional Hastings and Metropolis Algorithms," Biometrika, 83, 95-110.
Tanner, M. (1991), Tools for Statistical Inference: Observed Data and Data Augmentation Methods, Lecture Notes in Statistics 67, New York: Springer-Verlag.
Yu, B., and Mykland, P. (1994), "Looking at Markov Samplers Through Cusum Path Plots: A Simple Diagnostic Idea," Technical Report 413, University of California-Berkeley, Dept. of Statistics.