Discretization of Continuous Markov Chains and Markov Chain Monte Carlo Convergence Assessment

Chantal GUIHENNEUC-JOUYAUX and Christian P. ROBERT
We show that continuous state-space Markov chains can be rigorously discretized into finite Markov chains. The idea is to subsample the continuous chain at renewal times related to small sets that control the discretization. Once a finite Markov chain is derived from the Markov chain Monte Carlo output, general convergence properties on finite state spaces can be exploited for convergence assessment in several directions. Our choice is based on a divergence criterion derived from Kemeny and Snell, which is first evaluated on parallel chains with a stopping time and then implemented, more efficiently, on two parallel chains only, using Birkhoff's pointwise ergodic theorem for stopping rules. The performance of this criterion is illustrated on three standard examples.

KEY WORDS: Divergence; Ergodic theorem; Finite state space; Markov chain Monte Carlo algorithm; Multiple chains; Renewal theory; Renewal time; Stopping time.
Chantal Guihenneuc-Jouyaux is Associate Professor at Unité Associée au Centre National de la Recherche Scientifique 1323, Laboratoire de Statistique Médicale, Université Paris 5, 75006 Paris, France (E-mail: [email protected]). Christian P. Robert is Professor and Head of the Statistics Laboratory, CREST, INSEE, 75675 Paris, France (E-mail: [email protected]). This work was discussed at the Methods for Control of Monte Carlo Markov Chains workshop, held at CREST and involving G. Celeux, D. Cellier, D. Chauveau, J. Diebolt, M. A. Gruet, V. Lasserre, M. Lavielle, F. Muri, A. Philippe, and S. Richardson, to whom we are grateful for numerous comments. Comments from participants of the HSSS Conference in Rebild were helpful to improve the focus and organization of the paper. Criticisms and suggestions from G. Casella, the associate editor, and both referees were equally helpful in improving the article's readability.

© 1998 American Statistical Association, Journal of the American Statistical Association, September 1998, Vol. 93, No. 443, Theory and Methods.

1. INTRODUCTION

Raftery and Lewis (1992, 1996) have developed a convergence control method based on a two-state Markov chain, by creating a sequence $(\zeta^{(t)})$ of indicator variables, $\zeta^{(t)} = \mathbb{I}_{\theta^{(t)} \leq \theta_0}$, from an arbitrary continuous state-space Markov chain $(\theta^{(t)})$. Assuming a Markovian structure on the sequence, they proposed an evaluation of the "burn-in" time and of the number of simulations required for a given precision, based on the pseudo-transition matrix of the $\zeta^{(t)}$'s. The advantages of an approach based on a preliminary discretization are numerous. Both the model and the underlying Markovian theory are much simpler, convergence of the discretized version occurs faster, and the assessment can be strengthened by refining the discretization, although it always applies only to the discretized chain. A drawback of the Raftery and Lewis (1992) approach is that $(\zeta^{(t)})$ is not a Markov chain, unless restrictive conditions hold (see Kemeny and Snell 1960). We propose a general and theoretically valid discretization method based on subsampling of a discrete sequence derived from $(\theta^{(t)})$ depending on a sequence of renewal times; that is, epochs that separate the chain into iid blocks (Meyn and Tweedie 1993).

Once a true finite state-space Markov chain is constructed, several convergence assessments can apply for that chain. Besides the normal approximation of Raftery and Lewis (1992, 1996), which is devised for binary Markov chains, and techniques based on the central limit theorem as developed by Chauveau and Diebolt (1997), many convergence results from Kemeny and Snell (1960) lead to convergence diagnoses. We choose to use the divergence criterion, inspired from the convergence of the difference in the numbers of visits to a given state for two initial states. The motivation for this choice is twofold. First, stabilization of the difference is indicative of the stationarity of the chains, because the initial states cease to matter. This particularly fits the purpose of convergence control. Second, the theoretical limit of the criterion can be computed from the transition matrix of the finite chain, and the criterion then appears as a particular control variate method. In practice, our convergence control is based on simultaneous stabilizations of empirical divergences for different groups of starting and visiting states, and on a subsequent comparison with the estimated theoretical limits. This is only a first possible exploitation of the discretized chain; other evaluations can be simultaneously proposed for a stronger convergence diagnosis. For instance, Propp and Wilson's (1995) exact simulation method can be used to generate the starting values of the chains.

The divergence criterion and its performances are first established for a parallel scheme, which necessitates many restarts of the continuous chain. Although this allows for an easy implementation, there are many drawbacks to using parallel chains. First, the restarts prevent any control of stationarity, and thus the empirical divergences clearly lack validity as evaluations of their theoretical counterpart. Moreover, the convergence assessment also imposes a different implementation of the Markov chain Monte Carlo (MCMC) algorithm, because this parallel evaluation cannot be operated on-line. We then reduce the number of parallel chains necessary for implementation of the control method to two chains, by virtue of Birkhoff's pointwise ergodic theorem (Battacharya and Waymire 1990), with the additional
incentive that these two chains do not need to be restarted at all. We thus get as close as possible to a genuine on-line evaluation.

The article is organized as follows. Section 2 recalls useful facts on renewal theory and establishes that finite Markov chains can be constructed by discretization in continuous state spaces. Section 3 describes the divergence criterion, including an improved implementation through stopping rules. Section 4 elaborates on the estimation of the divergence for continuous state spaces, eliminating a seemingly intuitive stopping rule, and gives a first evaluation on a benchmark example from the literature. Section 5 derives the final version of the criterion from Birkhoff's ergodic theorem, with illustrations on two examples. Section 6 concludes the article.
2. DISCRETIZATION OF CONTINUOUS MARKOV CHAINS

The major problem with a naive discretization of $\theta^{(t)} \in \Theta$ like $\eta^{(t)} = \mathbb{I}_A(\theta^{(t)})$, where $A$ is a measurable subset, is that the sequence $(\eta^{(t)})$ is not usually a Markov chain, because of the dependence on the previous values $\theta^{(k)}$. There is, however, a case of particular interest where a Markov subchain can be constructed, namely when both $A$ and $A^c$ are atoms of the chain $(\theta^{(t)})$. [We recall that a set $B$ is an atom if the transition kernel $P$ of the chain satisfies $P(\theta^{(t+1)} \in C \mid \theta^{(t)}) = \nu(C)$ for every $\theta^{(t)} \in B$, where $\nu$ is a fixed probability measure.] We thus consider a general and theoretically well-grounded discretization of continuous Markov chains based on an extension of atoms to small sets. We first recall some necessary notions on these sets and their connection with renewal theory.

2.1 Small Sets and Renewal Times

A small set $A$ (see Meyn and Tweedie 1993, p. 106) satisfies the following property: There exist $m \in \mathbb{N}^*$, $\epsilon > 0$, and a probability measure $\nu_m$ such that, when $\theta^{(t)} \in A$,
\[
\Pr(\theta^{(t+m)} \in B \mid \theta^{(t)}) \geq \epsilon \, \nu_m(B) \qquad (1)
\]
for every measurable set $B$. It can be shown that small sets with positive invariant measure exist for the chains involved in MCMC algorithms, because it follows from work of Asmussen (1979) that every irreducible Markov chain allows for renewal. Meyn and Tweedie (1993, p. 109) also showed that the space $\Theta$ can be covered with a countable number of small sets. When the chain is uniformly ergodic, as in the benchmark example of Section 4.2 (see Robert 1996), $\Theta$ is a small set by itself. The practical determinations of small sets and of the corresponding $(\epsilon, \nu)$ are more delicate, but Mykland, Tierney, and Yu (1995) and Robert (1996) have shown that this can be done in realistic setups, sometimes through a modification of the transition kernel.

For simplicity's sake, we will assume in the sequel that $m = 1$, which is ensured when the chain is strongly aperiodic. Strong aperiodicity can always be achieved by random subsampling, as shown by Meyn and Tweedie (1993, p. 118), or when the density of the transition kernel $K(\theta' \mid \theta)$ is bounded in a neighborhood of $\theta$ for all $\theta \in \Theta$, as shown by Roberts and Tweedie (1996). This case thus is fairly common in MCMC setups.

Once a triplet $(A, \epsilon, \nu)$ is known, the transition kernel $K(\theta^{(t)} \mid \theta^{(t-1)})$ can be modified by "splitting" to induce renewal. Indeed, when $\theta^{(t-1)} \in A$, we can write
\[
K(\theta^{(t)} \mid \theta^{(t-1)}) = \epsilon \, \nu(\theta^{(t)}) + (1 - \epsilon) \, \frac{K(\theta^{(t)} \mid \theta^{(t-1)}) - \epsilon \, \nu(\theta^{(t)})}{1 - \epsilon}
\]
and thus represent $K$ as a mixture of two distributions, $\nu(\theta^{(t)})$ and
\[
\tilde K(\theta^{(t)} \mid \theta^{(t-1)}) = \frac{K(\theta^{(t)} \mid \theta^{(t-1)}) - \epsilon \, \nu(\theta^{(t)})}{1 - \epsilon}.
\]
The transition from $\theta^{(t-1)}$ to $\theta^{(t)}$ can then be modified into
\[
\theta^{(t)} \sim \nu(\theta) \quad \text{with probability } \epsilon, \qquad
\theta^{(t)} \sim \tilde K(\theta \mid \theta^{(t-1)}) \quad \text{with probability } 1 - \epsilon, \qquad (2)
\]
when $\theta^{(t-1)} \in A$. Although the overall transition is not modified (marginally in $\theta^{(t)}$), there are epochs when $\theta^{(t)}$ is generated from $\nu(\theta)$ and is thus independent of the previous value $\theta^{(t-1)}$. These occurrences are called renewal events.

The modification of the kernel in (2) requires simulations from $\tilde K(\theta \mid \theta^{(t-1)})$ when $\theta^{(t-1)} \in A$. Even though only the ratio $\epsilon \nu(\theta^{(t)}) / K(\theta^{(t)} \mid \theta^{(t-1)})$ is needed for this simulation, both $K$ and $\nu$ are usually implicit. This ratio must then be approximated. We propose using sums of conditional densities to remove the integral expressions of both kernels. This technique is more clearly developed through the examples of Sections 4 and 5.
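In practice, the split transition (2) need not be simulated through $\tilde K$ itself: drawing $\theta^{(t)}$ from the original kernel $K$ and then flagging a renewal event with probability $\epsilon\nu(\theta^{(t)})/K(\theta^{(t)} \mid \theta^{(t-1)})$ whenever $\theta^{(t-1)} \in A$ reproduces the same joint distribution of state and renewal indicator. The minimal sketch below is our illustration of this bookkeeping, not code from the paper; the callables sample_K, ratio_eps_nu_over_K, and in_small_set are hypothetical stand-ins for problem-specific ingredients.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_chain_step(theta, sample_K, ratio_eps_nu_over_K, in_small_set):
    """One transition of the split chain.

    Draw theta' from the original kernel K; if the current state lies in the
    small set A, flag a renewal event with probability
    eps * nu(theta') / K(theta' | theta), which is equivalent to the mixture
    representation (2).
    """
    theta_new = sample_K(theta)
    renewal = False
    if in_small_set(theta):
        renewal = rng.uniform() < ratio_eps_nu_over_K(theta_new, theta)
    return theta_new, renewal
```

The renewal flags produced this way correspond to the epochs used to define the subsampling times of Section 2.2.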
2.2 Discretization

Consider a chain with several disjoint small sets $A_i$ ($i = 1, \ldots, k$) and corresponding parameters $(\epsilon_i, \nu_i)$. The $A_i$'s ($i = 1, \ldots, k$) do not necessarily constitute a partition of the space. We can then define renewal times $\tau_n$ by ($n \geq 1$)
\[
\tau_n = \inf\{ t > \tau_{n-1};\ \exists\, 1 \leq i \leq k,\ \theta^{(t-1)} \in A_i \text{ and } \theta^{(t)} \sim \nu_i \}. \qquad (3)
\]
As shown by Meyn and Tweedie (1993, p. 207), the $\tau_n$'s are finite for every starting value when the chain $(\theta^{(t)})$ is Harris recurrent; that is, such that the expected number of returns is infinite with probability 1. [See Chan and Geyer (1994) and Tierney (1994) for a discussion on the occurrence of Harris recurrence in MCMC setups.]

The main motivation for this definition is that, although the finite-valued sequence deduced from $\theta^{(t)}$ by
\[
\eta^{(t)} = \sum_{i=1}^{k} i \, \mathbb{I}_{A_i}(\theta^{(t)})
\]
is not a Markov chain, the subchain
\[
(\xi^{(n)}) \equiv (\eta^{(\tau_n)})
\]
enjoys this property.

Proposition 1. For a Harris-recurrent Markov chain $(\theta^{(t)})$, if the subsampling times $\tau_n$ are defined as in (3) by the visiting times to one of the $k$ disjoint small sets $A_1, \ldots, A_k$, the sequence $(\xi^{(n)}) = (\eta^{(\tau_n)})$, which represents the successive indices of the small sets visited by the chain $(\theta^{(t)})$, is a homogeneous Markov chain on the finite state space $\{1, \ldots, k\}$.

Proof. To establish that $(\xi^{(n)})$ is a Markov chain, we need to show that $\xi^{(n)}$ depends on the past only through the last term, $\xi^{(n-1)}$. We have
\[
\begin{aligned}
\Pr(\xi^{(n)} = i \mid \xi^{(n-1)} = j, \xi^{(n-2)} = l, \ldots)
&= \Pr(\eta^{(\tau_n)} = i \mid \eta^{(\tau_{n-1})} = j, \ldots) \\
&= \Pr\big(\theta^{(\tau_n - 1)} \in A_i \mid \theta^{(\tau_{n-1}-1)} \in A_j, \theta^{(\tau_{n-2}-1)} \in A_l, \ldots\big) \\
&= \mathbb{E}_{\theta^{(0)}}\big[\mathbb{I}_{A_i}(\theta^{(\tau_n - 1)}) \mid \theta^{(\tau_{n-1}-1)} \in A_j, \theta^{(\tau_{n-2}-1)} \in A_l, \ldots\big] \\
&= \mathbb{E}_{\theta^{(0)}}\big[\mathbb{I}_{A_i}(\theta^{(\tau_{n-1}-1+\Delta_n)}) \mid \theta^{(\tau_{n-1}-1)} \in A_j, \theta^{(\tau_{n-2}-1)} \in A_l, \ldots\big],
\end{aligned}
\]
where $\Delta_n = \tau_n - \tau_{n-1}$ is independent of $\theta^{(\tau_{n-2}-1)}, \ldots$, given that $\theta^{(\tau_{n-1})} \sim \nu_j$. Therefore, the strong Markov property implies that
\[
\Pr(\xi^{(n)} = i \mid \xi^{(n-1)} = j, \xi^{(n-2)} = l, \ldots)
= \mathbb{E}_{\theta^{(0)}}\big[\mathbb{I}_{A_i}(\theta^{(\tau_{n-1}-1+\Delta_n)}) \mid \theta^{(\tau_{n-1}-1)} \in A_j\big]
= \mathbb{E}\big[\mathbb{I}_{A_i}(\theta^{(\Delta_1)})\big]
= \Pr(\xi^{(n)} = i \mid \xi^{(n-1)} = j),
\]
because $(\theta^{(t)}, t \geq \tau_{n-1} \mid \xi^{(n-1)})$ is distributed as $(\theta^{(t)}, t \geq \tau_1 \mid \xi^{(1)})$. The homogeneity of the chain follows from the invariance (in $n$) of $\Pr(\xi^{(n)} = i \mid \xi^{(n-1)} = j)$, given that $\theta^{(\tau_{n-1})} \sim \nu_j$ for every $n$.

Figure 1 illustrates discretization on a chain with three small sets, $A_1 = [-8.5, -7.5]$, $A_2 = [7.5, 8.5]$, and $A_3 = [17.5, 18.5]$, which is constructed in Section 5.2. Although the chain visits the three sets quite often, renewal occurs with a much smaller frequency, as shown by the symbols.

This result has important bearings on convergence assessment, because it shows that finite Markov chains can be rigorously derived from a continuous Markov chain in renewal setups. The drawback is obviously that the small sets need to be exhibited, but Mykland et al. (1995) have proposed quasi-automatic schemes to this effect. In fact, a regeneration condition extending (1) often occurs for Hastings-Metropolis algorithms, namely that there exists a non-negative function $s$ such that
\[
\Pr(\theta^{(t+1)} \in B \mid \theta^{(t)}) \geq s(\theta^{(t)}) \, \nu(B). \qquad (4)
\]
Therefore, the subsequent notions of splitting and renewal can be extended to this setup, leading to a wider range of applications for Proposition 1. For instance, Mykland et al. (1995) have shown that a hybrid MCMC algorithm can always be constructed to ensure that (4) holds.

3. CONVERGENCE ASSESSMENT FOR FINITE STATE SPACES

3.1 The Divergence Criterion

Once a finite state space chain is obtained, the entire range of finite Markov chain theory is available, providing a variety of different convergence results whose conjunction can strengthen the convergence diagnosis. We choose to use an exact evaluation of the mixing rate of the chain based on the comparison between the numbers of visits to a given state from two different starting points. This so-called divergence evaluation is derived from a convergence result of Kemeny and Snell (1960). Besides its simplicity, it meets the main requirement of convergence control, because it compares the behavior of chains with different starting points until independence from the starting positions. Obviously, alternative criteria can be similarly devised based on other convergence results (e.g., Chauveau and Diebolt 1997).

In the study of regular Markov chains with transition matrix $\mathbb{P}$, Kemeny and Snell (1960) pointed out the importance of the so-called fundamental matrix
\[
Z = [I - (\mathbb{P} - A)]^{-1},
\]
where $A$ is the limiting matrix of $\mathbb{P}$, with all rows equal to $\pi$, the stationary distribution associated with $\mathbb{P}$. A particular property of interest is that, if $T_j(T)$ denotes the number of times that the Markov chain $(\theta^{(t)})$ is in state $j$ ($1 \leq j \leq k$) in the first $T$ stages, that is,
\[
T_j(T) = \sum_{t=1}^{T} \mathbb{I}_j(\theta^{(t)}),
\]
then for any initial distribution $\lambda_0$, the divergence of the chain, defined as
\[
\mathrm{div}_j(\lambda_0, \pi) = \lim_{T \to \infty} \big\{ \mathbb{E}_{\lambda_0}[T_j(T)] - T\pi_j \big\},
\]
satisfies
\[
\mathrm{div}_j(\lambda_0, \pi) = \sum_{l=1}^{k} \lambda_{0l} \, z_{lj} - \pi_j, \qquad (5)
\]
where the $z_{lj}$'s are the elements of $Z$. A consequence of (5) is that, if two chains start from states $u$ and $v$ with corresponding numbers of passages in $j$, $T_j^u(T)$ and $T_j^v(T)$, then the corresponding limit is
\[
\mathrm{div}_j(u, v) = \lim_{T \to \infty} \mathbb{E}\left[ \sum_{t=1}^{T} \{\mathbb{I}_j(\theta_u^{(t)}) - \mathbb{I}_j(\theta_v^{(t)})\} \right] = z_{uj} - z_{vj}. \qquad (6)
\]
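The theoretical limits in (5) and (6) are directly computable once a transition matrix for the discretized chain is available. The short numerical sketch below is our illustration, with an arbitrary 3 x 3 transition matrix standing in for an estimated one; it computes the fundamental matrix Z and the limit z_{uj} - z_{vj} of div_j(u, v).

```python
import numpy as np

def fundamental_matrix(P):
    """Kemeny-Snell fundamental matrix Z = (I - (P - A))^{-1},
    where A has all rows equal to the stationary distribution of P."""
    k = P.shape[0]
    # stationary distribution: left eigenvector of P for eigenvalue 1
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi = pi / pi.sum()
    A = np.tile(pi, (k, 1))
    Z = np.linalg.inv(np.eye(k) - (P - A))
    return Z, pi

# Theoretical limit z_{uj} - z_{vj} of div_j(u, v) for an illustrative matrix
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
Z, pi = fundamental_matrix(P)
u, v, j = 0, 1, 2
print("limit of div_j(u, v):", Z[u, j] - Z[v, j])
```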
[Figure 1 appears here: the first 200 values of the chain, with horizontal axis running from 0 to 200.]
Figure 1. Discretization of a Continuous Markov Chain Based on Three Small Sets. The renewal events are represented by triangles for A1, circles for A2, and squares for A3.
The relevance of this notion for convergence control purposes is multiple. First, it assesses the effect of initial values on the chain by exhibiting the right scale for the convergence (5) to a finite limit. Indeed, each term
\[
\mathbb{E}_u\left[ \sum_{t=1}^{T} \mathbb{I}_j(\theta^{(t)}) \right]
\]
is infinite in the limit, because the chains are recurrent. In that sense, the convergence result (5) is stronger than the ergodic theorem (i.e., the convergence of the empirical average to the theoretical expectation), because the latter only indicates independence from initial values in the scale $1/T$. Note the similarity of
\[
\sum_{t=1}^{T} \mathbb{I}_j(\theta^{(t)}) - T \hat\pi_j
\]
to Yu and Mykland's (1995) cusum criterion, the difference being that $\pi_j$ is estimated from the same chain in their case. Moreover, the criterion is immediately valid in the sense that it does not require stationarity, but rather takes into account the initial values. A third incentive is that the property that the limiting difference in the numbers of visits is equal to $(z_{uj} - z_{vj})$ provides a quantitative control, because the matrix $Z$ can be estimated directly from the transitions of the Markov chain. We thus obtain a control variate technique for general Markov chains, because the estimates of $\mathrm{div}_j(u, v)$ and of $(z_{uj} - z_{vj})$ must converge to the same quantity.

3.2 Divergence Estimation

Given a finite (or a discretized continuous) state-space Markov chain, the implementation of the divergence control method relies on an estimate of $\mathrm{div}_j(u, v)$. Simple estimators based on parallel chains starting from $u$ and $v$ do not stabilize in $T$ even in stationary setups, for reasons related to the ruin phenomenon exhibited in a coin-tossing experiment by Feller (1970, chap. 3). A superior alternative is to use stopping rules, which accelerate convergence.

In fact, consider two chains $(\theta^{(t)})$ and $(\tilde\theta^{(t)})$ with initial values $u$ and $v$. When these chains meet in an arbitrary state, their paths follow the same distribution from this meeting time, and the additional terms $\mathbb{I}_j(\theta^{(t)}) - \mathbb{I}_j(\tilde\theta^{(t)})$ are unbiased estimators of 0; that is, useless for the estimation of $\mathrm{div}_j(u, v)$. It thus makes sense to stop the estimation at this meeting time. Therefore, computation of
\[
\lim_{T \to \infty} \{ \mathbb{E}_u[T_j(T)] - \mathbb{E}_v[T_j(T)] \}
\]
is greatly simplified, because the limit is not relevant anymore. More formally, if $N(u, v)$ denotes the stopping time when the chains $(\theta^{(t)})$ and $(\tilde\theta^{(t)})$ starting from states $u$ and $v$ are identical for the first time, we have the following result, whose proof is given in Appendix A.

Lemma 1. For every couple of Markov chains $(\theta^{(t)}, \tilde\theta^{(t)})$ such that $\mathbb{E}[N(u, v)^2] < +\infty$, the divergence $\mathrm{div}_j(u, v)$ is given by
\[
\mathbb{E}\left[ \sum_{t=1}^{N(u,v)} \{ \mathbb{I}_j(\theta^{(t)}) - \mathbb{I}_j(\tilde\theta^{(t)}) \} \right]. \qquad (7)
\]

The condition $\mathbb{E}[N(u, v)^2] < +\infty$, which holds in the case when both chains are independent (because the state space is finite; see Meyn and Tweedie 1993, p. 316), is not necessarily verified by coupled chains; that is, in cases when $(\theta^{(t)})$ and $(\tilde\theta^{(t)})$ are dependent. But in the case of a strong coupling, namely when $\tilde\theta^{(t)}$ is a deterministic function of $(\theta^{(t)}, \theta^{(t-1)}, \tilde\theta^{(t-1)})$, the stopping time satisfies $\mathbb{E}[N(u, v)^2] < +\infty$. (This setup can indeed be rewritten in terms of a single Markov chain.) It is thus rarely necessary to verify that $\mathbb{E}[N(u, v)^2] < +\infty$ holds in practice.

In practice, we can use $M$ replications $(\theta_u^{(t,m)})$ and $(\theta_v^{(t,m)})$ with initial values $u$ and $v$ ($1 \leq m \leq M$). If $N_m(u, v)$ denotes the first epoch $t$ when $\theta_u^{(t,m)}$ and $\theta_v^{(t,m)}$ are equal, then the divergence $\mathrm{div}_j(u, v)$ can be approximated by
\[
\frac{1}{M} \sum_{m=1}^{M} \sum_{t=1}^{N_m(u,v)} \left[ \mathbb{I}_j(\theta_u^{(t,m)}) - \mathbb{I}_j(\theta_v^{(t,m)}) \right], \qquad (8)
\]
which is a convergent estimator when $M$ goes to infinity.
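For a finite chain, the parallel-chain estimator (8) is straightforward to implement. The sketch below is our illustration under an assumed transition matrix P; it runs M independent pairs of chains from states u and v, stops each pair at its meeting time, and averages the summed indicator differences, which can then be compared with the theoretical limit z_{uj} - z_{vj} of Section 3.1. Indexing conventions (whether the initial states contribute) are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def divergence_estimate(P, u, v, j, M=5000, max_iter=10_000):
    """Estimator (8): average over M replications of the summed indicator
    differences I_j(theta_u) - I_j(theta_v), each replication being stopped
    at the first meeting time of the two chains."""
    k = P.shape[0]
    total = 0.0
    for _ in range(M):
        xu, xv = u, v
        for _ in range(max_iter):
            if xu == xv:                    # meeting time N_m(u, v)
                break
            total += (xu == j) - (xv == j)
            xu = rng.choice(k, p=P[xu])     # one step of each independent chain
            xv = rng.choice(k, p=P[xv])
    return total / M

# Toy illustration on an arbitrary 3-state chain
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
print(divergence_estimate(P, u=0, v=1, j=2))
```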
4. DIVERGENCE ESTIMATION FOR CONTINUOUS CHAINS

4.1 Implementation

The estimation of divergence values in the continuous
case is quite similar to the proposal in Section 3. For a given replication $m$ of the $M$ parallel runs used to evaluate the expectation (7), $k$ chains $(\theta_j^{(t,m)})$ are initialized from the $k$ bounding measures $\nu_j$ ($1 \leq j \leq k$). The generation of $\theta_j^{(t,m)}$ is modified according to (2) when $\theta_j^{(t-1,m)}$ enters one of the small sets $A_i$, and this modification provides the subchains $(\xi_j^{(n,m)})$. The contribution of the $m$th replication to the approximation of $\mathrm{div}_j(i_1, i_2)$,
\[
\lim_{N \to \infty} \mathbb{E}\left[ \sum_{n=1}^{N} \{ \mathbb{I}_j(\xi_{i_1}^{(n)}) - \mathbb{I}_j(\xi_{i_2}^{(n)}) \} \right],
\]
is actually given by
\[
\sum_{n=1}^{N_m(i_1, i_2)} \{ \mathbb{I}_j(\xi_{i_1}^{(n,m)}) - \mathbb{I}_j(\xi_{i_2}^{(n,m)}) \}, \qquad (9)
\]
where $N_m(i_1, i_2)$ is the stopping time corresponding to the first occurrence of $\xi_{i_1}^{(n,m)} = \xi_{i_2}^{(n,m)}$, because (9) is an unbiased estimator of $\mathrm{div}_j(i_1, i_2)$ according to Lemma 1.

An interesting property follows from the alternative expression
\[
\sum_{n=1}^{N_m(i_1 \mid i_2)} \mathbb{I}_j(\xi_{i_1}^{(n,m)}) - \sum_{n=1}^{N_m(i_2 \mid i_1)} \mathbb{I}_j(\xi_{i_2}^{(n,m)}), \qquad (10)
\]
where $N_m(i_1 \mid i_2)$ and $N_m(i_2 \mid i_1)$ are the "meeting times" for both chains, which correspond to the absolute stopping time $T_m(i_1, i_2)$ when the two (continuous state-space) chains $(\theta_{i_1}^{(t,m)})$ and $(\theta_{i_2}^{(t,m)})$ meet for the first time in the same small set $A_l$ (i.e., $\theta_{i_1}^{(T_m(i_1,i_2))} \in A_l$ and $\theta_{i_2}^{(T_m(i_1,i_2))} \in A_l$), and when renewal occurs for both. Indeed, it would seem that this stopping rule improves the estimation of $\mathrm{div}_j(i_1, i_2)$, because it is usually smaller than the stopping time in (9). However, this approach induces a bias in the evaluation of $\mathrm{div}_j(i_1, i_2)$, as shown by the following result.

Lemma 2. If the couple of chains $(\theta_u^{(t)}, \theta_v^{(t)})$ is such that the stopping time $T(u, v)$ verifies $\mathbb{E}[T(u, v)^2] < \infty$, the evaluation (10) of $\mathrm{div}_j(i_1, i_2)$ is biased:
\[
\mathrm{div}_j(u, v) - \mathbb{E}\left[ \sum_{n=1}^{N(u \mid v)} \mathbb{I}_j(\xi_u^{(n)}) - \sum_{n=1}^{N(v \mid u)} \mathbb{I}_j(\xi_v^{(n)}) \right] = \mathbb{E}[N(u \mid v) - N(v \mid u)] \, \pi_j,
\]
where $\pi_j$ denotes the stationary probability to be in state $j$.

Contrary to the finite case, the condition $\mathbb{E}[T(u, v)^2] < \infty$ is usually difficult to assess, but the argument against using the absolute stopping time remains obviously valid if this condition does not hold. Note also that coupling between the chains $(\theta_u^{(t)})$ ($1 \leq u \leq k$) can be used to reduce $N_m(i_1, i_2)$ and thus to accelerate the estimation of $\mathrm{div}_j(u, v)$, although coupling is difficult to implement in continuous state-space chains (see Johnson 1996). In fact, two departures from independence between the parallel chains are of interest. First, the same uniform random variable can be used at each (absolute) time $t$ to decide whether this is a renewal time for every chain entering an arbitrary small set $A_j$. Second, traditional antithetic arguments can be transferred to this setting, to accelerate a meeting in the same small set.

4.2 Benchmark Example

The nuclear pump failure dataset of Gaver and O'Muircheartaigh (1987) has been repeatedly used in the MCMC literature to illustrate the implementation of the Gibbs sampler and of hybrid modifications (e.g., Gelfand and Smith 1990; Mykland et al. 1995; Tanner 1993). It is invoked here to illustrate the fact that small sets and renewal times can easily be derived in standard setups. We recall that the associated Gibbs sampler is to simulate
\[
\lambda_i \mid \beta, t_i, p_i \sim \mathcal{G}a(p_i + \alpha, t_i + \beta) \quad (1 \leq i \leq 10), \qquad
\beta \mid \lambda_1, \ldots, \lambda_{10} \sim \mathcal{G}a\Big(\gamma + 10\alpha,\ \delta + \textstyle\sum_i \lambda_i\Big),
\]
where the observations are $(p_i, t_i)$ and the hyperparameters are chosen as $\alpha = 1.8$, $\gamma = .01$, and $\delta = 1$. If we introduce small sets of the form $A_j = [\underline\beta_j, \bar\beta_j]$ ($j = 1, \ldots, J$), then the lower bound on the transition kernel is
\[
K(\beta, \beta') \geq \prod_{i=1}^{10} \left( \frac{t_i + \underline\beta_j}{t_i + \bar\beta_j} \right)^{p_i + \alpha}
\int_{\mathbb{R}_+^{10}} \frac{\big(\delta + \sum_i \lambda_i\big)^{\gamma + 10\alpha}}{\Gamma(\gamma + 10\alpha)} \, (\beta')^{\gamma + 10\alpha - 1} e^{-\beta'(\delta + \sum_i \lambda_i)}
\prod_{i=1}^{10} \frac{(t_i + \bar\beta_j)^{p_i + \alpha} \lambda_i^{p_i + \alpha - 1} e^{-(t_i + \bar\beta_j)\lambda_i}}{\Gamma(p_i + \alpha)} \, d\lambda_1 \cdots d\lambda_{10}.
\]
Thus the probability of renewal within a small set $A_j$ is
\[
\epsilon_j = \prod_{i=1}^{10} \left( \frac{t_i + \underline\beta_j}{t_i + \bar\beta_j} \right)^{p_i + \alpha},
\]
whereas the bounding probability $\nu_j$ is the marginal distribution (in $\beta$) of the joint distribution
\[
\lambda_i \sim \mathcal{G}a(p_i + \alpha, t_i + \bar\beta_j) \quad (i = 1, \ldots, 10), \qquad
\beta \mid \lambda_1, \ldots, \lambda_{10} \sim \mathcal{G}a\Big(\gamma + 10\alpha,\ \delta + \textstyle\sum_i \lambda_i\Big).
\]
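The Gibbs sampler and the renewal parameter above are easy to code. The sketch below is our illustration; the data arrays are placeholders standing in for the observed (p_i, t_i) of the pump dataset, the small-set bounds are arbitrary, and the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hyperparameters of Section 4.2
alpha, gamma, delta = 1.8, 0.01, 1.0

# Placeholder observations (p_i, t_i); substitute the actual pump-failure data
p = np.array([3.0, 1.0, 5.0, 2.0, 7.0, 1.0, 4.0, 2.0, 6.0, 9.0])
t = np.array([90.0, 16.0, 63.0, 126.0, 5.0, 31.0, 1.0, 1.0, 2.0, 10.0])

def gibbs_pump(n_iter, beta0=2.0):
    """Gibbs sampler: lambda_i | beta ~ Ga(p_i + alpha, t_i + beta),
    beta | lambda ~ Ga(gamma + 10*alpha, delta + sum(lambda))."""
    beta, betas = beta0, np.empty(n_iter)
    for it in range(n_iter):
        lam = rng.gamma(shape=p + alpha, scale=1.0 / (t + beta))
        beta = rng.gamma(shape=gamma + 10 * alpha, scale=1.0 / (delta + lam.sum()))
        betas[it] = beta
    return betas

def epsilon_j(beta_lo, beta_hi):
    """Renewal parameter of the small set A_j = [beta_lo, beta_hi]:
    eps_j = prod_i ((t_i + beta_lo) / (t_i + beta_hi))**(p_i + alpha)."""
    return np.prod(((t + beta_lo) / (t + beta_hi)) ** (p + alpha))

betas = gibbs_pump(5000)
eps = epsilon_j(1.8, 1.94)
rho = eps * np.mean((betas >= 1.8) & (betas <= 1.94))   # rho_j = eps_j * Pr(beta in A_j)
print(eps, rho)
```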
A preliminary run of the Gibbs sampler on 5,000 iterations provides the small sets given in Table 1 as those maximizing the probability of renewal $\rho_j = \epsilon_j \Pr(\theta^{(t)} \in A_j)$.

Table 1. Small Sets Associated With the Transition Kernel K(β, β') and Associated Parameters

    A_j             ε_j      ρ_j
    [1.6, 1.78]     .268     .0202
    [1.8, 1.94]     .372     .0276
    [1.95, 2.13]    .291     .0287
    [2.15, 2.37]    .234     .0314
    [2.4, 2.54]     .409     .0342
    [2.55, 2.69]    .417     .0299
    [2.7, 2.86]     .377     .0258
    [2.9, 3.04]     .435     .0212

Following the foregoing developments, the convergence assessment associated with these small sets $A_j$ can be based on parallel runs of eight chains $(\beta_j^{(t,m)})$ ($j = 1, \ldots, 8$, $m = 1, \ldots, M$) starting from the eight small sets with initial distributions the corresponding $\nu_j$:

1. Generate ($i = 1, \ldots, 10$) $\lambda_i \sim \mathcal{G}a(p_i + \alpha, t_i + \bar\beta_j)$;
2. Generate $\beta \sim \mathcal{G}a(\gamma + 10\alpha, \delta + \sum_i \lambda_i)$.

The chains $(\beta_j^{(t,m)})$ induce corresponding finite state-space chains $(\xi_j^{(n,m)})$ and contribute to the approximation of the divergences $\mathrm{div}_j(i_1, i_2)$ via the sums (9), depending on coupling times $N(i_1, i_2)$. Figure 2 describes the convergence of four selected estimated divergences as the number of parallel runs increases. The averages stabilize rather quickly; moreover, the overall number of iterations required by the method is moderate, because the mean coupling time is only 14.0, which implies that each sum (9) involves on average 14 steps of the Gibbs sampler. The standard deviation is derived from the empirical variance of the sums (9).

To approximate the ratio $\epsilon\nu(\beta')/K(\beta, \beta')$ mentioned in Section 2.1, the integrals in both $\nu(\beta')$ and $K(\beta, \beta')$ are replaced by sums (see Robert 1996), leading to the approximation
\[
\frac{\epsilon\nu(\beta')}{K(\beta, \beta')} \approx
\frac{\sum_{s=1}^{S} \big(\delta + \sum_{i} \bar\lambda_{is}\big)^{\gamma + 10\alpha} \exp\{-\beta' \sum_{i} \bar\lambda_{is}\}}
     {\sum_{s=1}^{S} \big(\delta + \sum_{i} \lambda_{is}\big)^{\gamma + 10\alpha} \exp\{-\beta' \sum_{i} \lambda_{is}\}},
\]
where the $\bar\lambda_{is}$ are generated from $\mathcal{E}xp(t_i + \bar\beta_j)$ and the $\lambda_{is}$ from $\mathcal{E}xp(t_i + \beta)$. This approximation device is theoretically justified for $S$ large enough. An accelerating (and stabilizing) technique is to use repeatedly the same sample of $S$ $\mathcal{E}xp(1)$ random variables for the generation of the $\lambda_{is}$'s and $\bar\lambda_{is}$'s; that is, to take advantage of the scale structure of the gamma distribution. In the simulations, we took $S = 500$, although smaller values also ensure stability of the approximation.

5. CONVERGENCE ASSESSMENT FOR TWO PARALLEL CHAINS

5.1 Convergence Assessment Rather Than Divergence Estimation

For evaluating the convergence of an MCMC algorithm, there is now a well-documented literature (e.g., Geyer 1992; Raftery and Lewis 1996; Robert 1996, sec. 6.5) about the problems associated with using parallel chains, including dependence on starting values, ambiguous distribution for the final values, and waste of simulations. The call for parallel chains in the present setup has rather different negative features. On the one hand, the estimates of the transition matrix $\mathbb{P}$ and of the divergences $\mathrm{div}_j(u, v)$ thus produced are quite valid, because the dependence on starting values is inherent to the criterion. On the other hand, the sample produced by the final values of the parallel chains cannot be exploited as a stationary sample of the distribution of interest, because of the short runs created by the stopping rule. Moreover, starting an equal number of chains from each small set does not necessarily reflect the weights of these sets in the stationary distribution. In that sense, the method is the opposite of an "on-line" control technique, even though it provides useful information on the mixing rate of the chain.

We show in this last section how the divergence criterion can be implemented with only two parallel chains, for an arbitrary number of small sets $A_i$. This alternative implementation is based on Birkhoff's pointwise ergodic theorem, which we now recall (see Battacharya and Waymire 1990, pp. 223-227, for a proof). We denote $X = (X^{(1)}, \ldots)$ a Markov chain and $T^r X = (X^{(r+1)}, X^{(r+2)}, \ldots)$ the shifted version of the chain.

Theorem 1. For an ergodic Markov chain $(X^{(n)})$ with stationary distribution $\pi$ and a functional $g$ of $X$, the average
\[
\frac{1}{M} \sum_{m=1}^{M} g(T^m X)
\]
converges almost surely to the expectation $\mathbb{E}_\pi[g(X)]$.

This result thus extends the standard ergodic theorem (see Meyn and Tweedie 1993) to functionals of the whole chain and thus allows for repeated use of the same chain. In particular, if $R$ is a stopping time, that is, a functional such that the event $R(X) = n$ is determined by $(X^{(1)}, \ldots, X^{(n)})$ (see Meyn and Tweedie 1993, p. 71), and if $g$ only satisfies
\[
g(X^{(1)}, \ldots, X^{(R(X))}, \ldots) = g(X^{(1)}, \ldots, X^{(R(X))}),
\]
then the foregoing result applies.
[Figure 2 appears here: four panels labeled (8,1,2), (7,8,1), (1,2,3), and (2,3,4), with horizontal axes running from 0 to 50,000.]
Figure 2. Convergence of the Divergence Criterion Based on (2) for Four Chains Started From Four Small Sets A_j. The triplets (i1, i2, l) index the difference in the number of visits of l by the chains (ξ_{i1}^{(t)}) and (ξ_{i2}^{(t)}). The envelope is located two standard deviations from the average. For each replication, the chains are restarted from the corresponding small sets. The theoretical limits derived from the estimation of P are -.00498, -.0403, .00332, and -.00198 (based on 50,000 iterations).
In the setup of this article, $X$ can be chosen as being made of two replications, $x^{(n)} = (\xi^{(n)}, \tilde\xi^{(n)})$, of the discretized subchain of Proposition 1. The stopping time is then the first epoch $N$ when $\xi^{(n)}$ and $\tilde\xi^{(n)}$ are equal (that is, $N(u, v)$ for $\xi^{(1)} = u$ and $\tilde\xi^{(1)} = v$), and the functional $g$ is vector valued, with component $(j, u, v)$ equal to
\[
\sum_{n=1}^{N(u,v)} \left[ \mathbb{I}_j(\xi^{(n)}) - \mathbb{I}_j(\tilde\xi^{(n)}) \right] \qquad (11)
\]
if $\xi^{(1)} = u$ and $\tilde\xi^{(1)} = v$, and 0 otherwise.

The gain brought by this result is far from negligible, because instead of using a couple of (independent or not) chains $(\xi^{(n)}, \tilde\xi^{(n)})$ only once between the starting point and their stopping time $N$, the same sequence is used $N - 1$ times in the sum (11) and contributes to the estimation of the divergences for the values $(\xi^{(n)}, \tilde\xi^{(n)}) = (u, v)$ ($n = 1, \ldots, N$). Moreover, the method no longer necessitates restarting the chains $\xi^{(n)}$ and $\tilde\xi^{(n)}$ once they have met. This feature allows for on-line control of the MCMC algorithm, a better mixing of the chain, and direct use for estimation purposes. In fact, the continuous chains $(\theta_i^{(n)})$ behind the discretized subchains $(\xi_i^{(n)})$ ($i = 1, 2$) are generated without any constraint, and the resulting $(\xi_1^{(n)}, \xi_2^{(n)})$'s are used to update the divergence estimations by batches; that is, every time $\xi_1^{(n)} = \xi_2^{(n)}$.
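The shift-averaging of Theorem 1 translates into simple bookkeeping on the two discretized index sequences. The sketch below is our reading of this scheme: for each shift, the summed indicator differences up to the next coincidence of the two subchains are credited to the pair of states observed at that shift, and each accumulated sum is normalized by the number of shifts at which that pair occurred. The function name and the normalization convention are ours.

```python
import numpy as np
from collections import defaultdict

def online_divergences(xi1, xi2, n_states):
    """Shift-averaged divergence estimates div_j(u, v) built from two
    discretized subchains (xi1, xi2), following the functional (11)."""
    xi1, xi2 = np.asarray(xi1), np.asarray(xi2)
    N = len(xi1)
    meets = np.flatnonzero(xi1 == xi2)   # coincidence epochs of the two subchains
    sums = defaultdict(float)            # keyed by (j, u, v)
    counts = defaultdict(int)            # shifts contributing to the pair (u, v)
    for start in range(N):
        idx = np.searchsorted(meets, start)
        if idx == len(meets):
            break                        # unfinished batch: discard the tail
        stop = meets[idx]
        u, v = int(xi1[start]), int(xi2[start])
        counts[(u, v)] += 1
        for j in range(n_states):
            sums[(j, u, v)] += float(np.sum((xi1[start:stop + 1] == j).astype(int)
                                            - (xi2[start:stop + 1] == j).astype(int)))
    return {key: s / counts[(key[1], key[2])] for key, s in sums.items()}
```

The resulting estimates can then be compared with the limits z_{i1 j} - z_{i2 j} obtained from the estimated transition matrix, as described below.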
As before, a first convergence assessment can be based on the graphical stabilization of the estimated divergences. Note that we also can estimate the variance of (9), because the batches between renewal times are independent (see Robert 1995 for the variance estimation). A more quantitative evaluation of the convergence of the divergence estimators follows from comparison of the estimated divergences with the estimated limits $z_{i_1 j} - z_{i_2 j}$, when the transition matrix of $(\xi^{(n)})$ is derived from the various chains and the fundamental matrix $Z$ is computed with this approximation.

We now consider two standard examples to show how our technique applies and performs. Note also that although both examples involve Gibbs sampling techniques, where the derivation of the small set is usually straightforward, minorizing techniques can be easily extended to general Metropolis-Hastings algorithms.
5.2 Cauchy Posterior Distribution

Consider the posterior distribution
\[
\pi(\theta \mid x_1, x_2, x_3) \propto e^{-\theta^2/2\sigma^2} \left[ (1 + (\theta - x_1)^2)(1 + (\theta - x_2)^2)(1 + (\theta - x_3)^2) \right]^{-1}, \qquad (12)
\]
which corresponds to a normal prior associated with three Cauchy observations $x_1$, $x_2$, and $x_3$. As shown by Robert (1995), a standard Gibbs sampler for this model is
\[
\eta_i \mid \theta, x_i \sim \mathcal{E}xp\big(1 + (\theta - x_i)^2\big) \quad (i = 1, 2, 3),
\]
\[
\theta \mid x_1, x_2, x_3, \eta_1, \eta_2, \eta_3 \sim \mathcal{N}\left( \frac{\eta_1 x_1 + \eta_2 x_2 + \eta_3 x_3}{\eta_1 + \eta_2 + \eta_3 + \sigma^{-2}/2},\ \frac{1}{2(\eta_1 + \eta_2 + \eta_3 + \sigma^{-2}/2)} \right). \qquad (13)
\]
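A minimal implementation of this two-step sampler is sketched below. It is our illustration: the conditional forms follow the reconstruction of (13) above, and the observation values and prior scale sigma are illustrative placeholders rather than values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_cauchy(x, sigma, n_iter, theta0=0.0):
    """Gibbs sampler for the posterior (12) via the completion (13):
    eta_i | theta ~ Exp(1 + (theta - x_i)^2), then theta | eta is normal."""
    x = np.asarray(x, dtype=float)
    theta, out = theta0, np.empty(n_iter)
    for it in range(n_iter):
        eta = rng.exponential(scale=1.0 / (1.0 + (theta - x) ** 2))
        s = eta.sum() + 0.5 / sigma ** 2          # eta_1 + eta_2 + eta_3 + sigma^{-2}/2
        theta = rng.normal(np.dot(eta, x) / s, np.sqrt(1.0 / (2.0 * s)))
        out[it] = theta
    return out

# Illustrative three-modal configuration; C = [7.5, 8.5] is one candidate small set
samples = gibbs_cauchy(x=[-8.0, 8.0, 17.0], sigma=20.0, n_iter=5000)
print(np.mean((samples >= 7.5) & (samples <= 8.5)))
```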
[Figure 3 appears here: three panels of divergence trajectories over 500,000 iterations.]
Figure 3. Convergence of the Divergence Criterion for Two Chains Initially Started From B and D (500,000 iterations). The triplets (l, i1, i2) index the difference in the number of visits of l by chains starting from i1 and from i2. The envelope is located two standard deviations from the average. The theoretical limits derived from the estimation of P are .094, 1.202, and -.688. The scales of the three graphs represent the number of times div_l(i1, i2) has been updated along the 500,000 iterations. The final values of the three estimated divergences are .164, 1.076, and -.825.
[Figure 4 appears here: panels labeled (1,3,2) and (2,3,1), among others, together with histograms of the MCMC samples.]
Figure 4. Convergence of the Divergence Criterion for Two Chains Initially Started From A1 and A2. The triplets (i1, i2, l) index the difference in the number of visits of l by chains starting from i1 and from i2. The envelope is located two standard deviations from the average. The theoretical limits derived from the estimation of P are -.110, .0288, and -.0225 (based on 100,000 iterations), whereas the final values of the estimated divergences are .00747, .0161, and -.0162. The scales on the three first graphs correspond to the number of times div_l(i1, i2) has been updated. The convergence graphs produce the average convergence as well as the histograms of the MCMC samples, based on the 10,000 first iterations.
The intervals $C = [r_1, r_2]$ with $x_1 < r_1 < x_2 < r_2 < x_3$ are small sets (see Robert 1995) for the Markov chain associated with (13), and the corresponding kernel satisfies
\[
K(\theta, \theta') \geq \frac{(1 + \rho_{11}^2)(1 + \rho_{31}^2)}{(1 + \rho_{12}^2)(1 + \rho_{22}^2)(1 + \rho_{32}^2)} \, \nu(\theta') = \epsilon \, \nu(\theta'),
\]
where $\nu$ is the marginal density (in $\theta$) of
\[
\theta \mid \eta_1, \eta_2, \eta_3 \sim \mathcal{N}\left( \frac{\eta_1 x_1 + \eta_2 x_2 + \eta_3 x_3}{\eta_1 + \eta_2 + \eta_3 + \sigma^{-2}/2},\ \frac{1}{2(\eta_1 + \eta_2 + \eta_3 + \sigma^{-2}/2)} \right), \qquad
\eta_1 \sim \mathcal{E}xp(1 + \rho_{12}^2), \quad \eta_2 \sim \mathcal{E}xp(1 + \rho_{22}^2), \quad \eta_3 \sim \mathcal{E}xp(1 + \rho_{32}^2),
\]
and, for $\theta \in C$,
\[
0 \leq \rho_{11} = r_1 - x_1 \leq |\theta - x_1| \leq \rho_{12} = r_2 - x_1, \qquad
0 \leq |\theta - x_2| \leq \rho_{22} = \max(r_2 - x_2, x_2 - r_1), \qquad
\rho_{31} = x_3 - r_2 \leq |\theta - x_3| \leq \rho_{32} = x_3 - r_1.
\]
Similar derivations can be obtained for the sets $B = [s_1, s_2]$ with $s_1 < x_1 < s_2 < x_2$ and $D = [v_1, v_2]$ with $x_2 < v_1 < x_3 < v_2$. If we choose in addition $s_2 < r_1$ and $r_2 < v_1$, the three small sets are disjoint, and we can thus create a three-state Markov chain. A preliminary run of the Gibbs sampler associated with (13) on 5,000 iterations leads to the choice of the three small sets
\[
B = [-8.5, -7.5], \qquad C = [7.5, 8.5], \qquad \text{and} \qquad D = [17.5, 18.5]
\]
as optimizing the probabilities of renewal,
\[
\rho_B = .02, \qquad \rho_C = .10, \qquad \text{and} \qquad \rho_D = .05.
\]
Figure 1 gives the 200 first values of $\theta^{(t)}$ generated from (13) and indicates the corresponding occurrences of $\xi^{(n)}$. Note that the choice of the small sets is by no means restricted to neighborhoods of the modes, although this increases the probability of renewal.

Figure 3 illustrates the implementation of the divergence control device on two parallel chains starting from the small sets B and C. Convergence of the criterion is rather slow, because the number of simulations of the continuous chain $(\theta^{(t)})$ is 500,000. The scale of the three graphs is much smaller, however, and indicates the number of times each divergence has been updated.
5.3 AR(1) Model With a Changepoint

Consider the AR(1) model
\[
x_{t+1} = \rho_1 x_t + \epsilon_{t+1} \quad \text{if } t \leq \kappa, \qquad
x_{t+1} = \rho_2 x_t + \epsilon_{t+1} \quad \text{if } t > \kappa, \qquad
\epsilon_t \sim \mathcal{N}(0, \sigma^2),
\]
where the parameters $\rho_1$, $\rho_2$, $\sigma$, and $\kappa \in \{1, \ldots, T-2\}$ are unknown. (See Ó Ruanaidh and Fitzgerald 1996 for similar models in signal processing setups.) The posterior distribution associated with the prior
\[
\pi(\rho, \sigma, \kappa) \propto \mathbb{I}_{[-1,1]}(\rho_1) \, \mathbb{I}_{[-1,1]}(\rho_2) \, \mathbb{I}_{\{1, \ldots, T-2\}}(\kappa) \, \frac{1}{\sigma}
\]
is
\[
\pi(\rho, \sigma, \kappa \mid x) \propto \sigma^{-T} \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sum_{t=1}^{\kappa} (x_{t+1} - \rho_1 x_t)^2 + \sum_{t=\kappa+1}^{T-1} (x_{t+1} - \rho_2 x_t)^2 \right] \right\},
\]
where the exponent expands as
\[
\sum_{t=1}^{T-1} x_{t+1}^2 + \rho_1^2 \sum_{t=1}^{\kappa} x_t^2 - 2\rho_1 \sum_{t=1}^{\kappa} x_t x_{t+1}
+ \rho_2^2 \sum_{t=\kappa+1}^{T-1} x_t^2 - 2\rho_2 \sum_{t=\kappa+1}^{T-1} x_t x_{t+1}, \qquad (14)
\]
which defines the sufficient statistics $\tau_1(u) = \sum_{t=1}^{u} x_t^2$, $\tau_1(u)\mu_1(u) = \sum_{t=1}^{u} x_t x_{t+1}$, $\tau_2(u) = \sum_{t=u+1}^{T-1} x_t^2$, and $\tau_2(u)\mu_2(u) = \sum_{t=u+1}^{T-1} x_t x_{t+1}$. The posterior can be simulated from the following Gibbs sampler:
\[
\Pr(\kappa = u \mid \rho, \sigma, x) \propto \exp\left\{ -\frac{1}{2\sigma^2} \left[ \tau_1(u)\rho_1(\rho_1 - 2\mu_1(u)) + \tau_2(u)\rho_2(\rho_2 - 2\mu_2(u)) \right] \right\},
\]
\[
\rho_1 \mid \sigma, \kappa, x \sim \mathcal{N}_T\big( \mu_1(\kappa), \sigma^2/\tau_1(\kappa) \big), \qquad
\rho_2 \mid \sigma, \kappa, x \sim \mathcal{N}_T\big( \mu_2(\kappa), \sigma^2/\tau_2(\kappa) \big),
\]
\[
\sigma^{-2} \mid \rho, \kappa, x \sim \mathcal{G}a\left( \frac{T-1}{2},\ \frac{1}{2}\left[ \sum_{t=1}^{\kappa} (x_{t+1} - \rho_1 x_t)^2 + \sum_{t=\kappa+1}^{T-1} (x_{t+1} - \rho_2 x_t)^2 \right] \right), \qquad (15)
\]
where $\mathcal{N}_T(\mu, \tau^2)$ denotes the normal distribution restricted to $[-1, 1]$, and the $\tau_i(u)$ and $\mu_i(u)$ ($i = 1, 2$) are given by (14).
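A direct implementation of this sampler is sketched below. It is our illustration: the conditionals follow the reconstructed (14) and (15), the truncated normals are drawn by naive rejection, the function names are ours, and the simulated data use the parameter values reported later in this section.

```python
import numpy as np

rng = np.random.default_rng(4)

def trunc_normal(mean, sd):
    """Naive rejection sampler for N(mean, sd^2) restricted to [-1, 1]."""
    while True:
        x = rng.normal(mean, sd)
        if -1.0 <= x <= 1.0:
            return x

def gibbs_ar1_changepoint(x, n_iter, rho=(0.0, 0.0), sigma=1.0):
    """Gibbs sampler (15) for the changepoint AR(1) model, using the
    sufficient statistics tau_i(u), mu_i(u) of (14)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    rho1, rho2 = rho
    draws = []
    for _ in range(n_iter):
        # 1. kappa | rho, sigma: discrete distribution on {1, ..., T-2}
        logw = np.empty(T - 2)
        for u in range(1, T - 1):
            tau1, tau2 = np.sum(x[:u] ** 2), np.sum(x[u:T - 1] ** 2)
            c1 = np.sum(x[:u] * x[1:u + 1])         # tau1 * mu1(u)
            c2 = np.sum(x[u:T - 1] * x[u + 1:T])    # tau2 * mu2(u)
            logw[u - 1] = -(tau1 * rho1 ** 2 - 2 * rho1 * c1
                            + tau2 * rho2 ** 2 - 2 * rho2 * c2) / (2 * sigma ** 2)
        w = np.exp(logw - logw.max())
        kappa = 1 + rng.choice(T - 2, p=w / w.sum())
        # 2. rho_1, rho_2 | kappa, sigma: truncated normals
        tau1, tau2 = np.sum(x[:kappa] ** 2), np.sum(x[kappa:T - 1] ** 2)
        mu1 = np.sum(x[:kappa] * x[1:kappa + 1]) / tau1
        mu2 = np.sum(x[kappa:T - 1] * x[kappa + 1:T]) / tau2
        rho1 = trunc_normal(mu1, sigma / np.sqrt(tau1))
        rho2 = trunc_normal(mu2, sigma / np.sqrt(tau2))
        # 3. sigma^{-2} | rho, kappa: gamma with shape (T-1)/2 and rate rss/2
        rss = (np.sum((x[1:kappa + 1] - rho1 * x[:kappa]) ** 2)
               + np.sum((x[kappa + 1:T] - rho2 * x[kappa:T - 1]) ** 2))
        sigma = 1.0 / np.sqrt(rng.gamma(shape=(T - 1) / 2.0, scale=2.0 / rss))
        draws.append((rho1, rho2, sigma, kappa))
    return draws

# Simulate 50 observations with rho1 = -.8, rho2 = .2, sigma = 1, kappa = 18
T = 50
x = np.empty(T); x[0] = 0.0
for t in range(T - 1):
    rho_true = -0.8 if (t + 1) <= 18 else 0.2
    x[t + 1] = rho_true * x[t] + rng.normal()
draws = gibbs_ar1_changepoint(x, n_iter=200)
```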
Note that we are in a setup where the duality principle of Diebolt and Robert (1994) applies, because $\kappa$ is a finite state-space Markov chain. But the discretization method exposed in this article can also be used for the chain $(\rho^{(n)}, \sigma^{(n)})$, because a minorization condition holds for sets of the form $A = [\underline\rho_1, \bar\rho_1] \times [\underline\rho_2, \bar\rho_2] \times [\underline\sigma, \bar\sigma]$. Indeed, $K((\rho, \sigma), (\rho', \sigma'))$ can be bounded from below by $c\,\nu(\rho', \sigma')$ on such small sets, where $\nu$ is derived from the Gibbs sampler (15) by simulating $\kappa$ from
\[
\frac{\inf_A \Pr(\kappa = u \mid \rho, \sigma)}{\sum_{v} \inf_A \Pr(\kappa = v \mid \rho, \sigma)}
\]
and replacing $\sigma$ by $\bar\sigma$ in the simulation of the $\rho_i$'s, as shown in Appendix C, along with the exact value of $c$.

For a simulated sample of 50 observations with parameters $\rho_1 = -.8$, $\rho_2 = .2$, $\sigma = 1.0$, and $\kappa = 18$, a preliminary exploration of the parameter space over 5,000 iterations leads to the small sets $A_1 = [-1., -.77] \times [.33, .74] \times [.785, .835]$, $A_2 = [-1., -.76] \times [.35, .73] \times [.835, .865]$, and $A_3 = [-1., -.76] \times [.35, .74] \times [.865, .92]$, with corresponding probabilities of renewal .00979, .0153, and .00965.

Figure 4 provides the evolution of the divergence estimation for two parallel independent chains started from the small sets $A_1$ and $A_2$. The 1,267 points in the graphs correspond to the updates of the divergence estimations at the meeting times of the discretized chains. The total number of iterations in the continuous chains is actually 100,000, for an average excursion time of 78 iterations.

6. CONCLUSION

In this article we have established a theoretically valid and implementable approach to Markov chain discretization and illustrated the performance of a convergence assessment technique for finite state spaces based on the notion of divergence. Potential difficulties with our method are that (a) it requires the determination of small sets $A_j$ with manageable associated parameters $(\epsilon_j, \nu_j)$, and (b) it provides conservative convergence assessments for the discretized Markov chain.

The latter point is a consequence of our rigorous requirements for the Markov structure of the discrete chain and exact convergence of the divergence approximations. It is thus rather comforting to exhibit safe bounds on the number of simulations necessary to give a good evaluation of the distribution of interest. We indeed prefer our elaborate and slow but theoretically well-grounded convergence criterion to handy and quick alternatives with limited justifications, because the latter are tested in very special setups but usually encounter difficulties and inaccuracies outside these setups. Note, however, that the convergence assessment does not totally extend to the original continuous Markov chain. Moreover, the few examples treated here show that using the estimate of $\mathbb{P}$ as a control variate technique leads to long delays in the convergence diagnostic.

The difficulty (a) is obviously more of a concern, but there are theoretical assurances that small sets exist in most MCMC setups; moreover, Mykland et al. (1995) have proposed some quasi-automated schemes to construct such small sets by hybrid modifications of the original MCMC algorithm. Note also that the techniques we used in the examples of Section 5, namely to bound from below the conditional distributions depending on $\theta$, can be reproduced in
many Gibbs setups and in particular for data augmentation, whereas Mykland et al. (1995) have shown that independent Hastings-Metropolis settings are quite manageable in this respect. At last, the choice of the space where the small sets are constructed is open, and, at least for Gibbs samplers, there are often obvious choices, as in the duality principle of Diebolt and Robert (1993, 1994). It must be pointed out, however, that missing-data structures like mixtures of distributions are notorious for leading to very small bounds in the minorization condition (1), and that other convergence diagnostics based on the natural finite Markov chains generated in these setups would be preferable. Moreover, as also pointed out by Gilks, Roberts, and Sahu (1997) for an acceleration method using regeneration, the applicability of the method in high-dimensional problems is limited by the difficulty of obtaining efficient minorizing conditions, even though new developments are bound to occur in this area, given the current interest.
APPENDIX A: PROOF OF LEMMA 1

Because $N(u, v)$ is a stopping time, the strong Markov property implies that, for $n > m$,
\[
\mathbb{E}\big[ h(\xi_u^{(n)}) \mid \xi_u^{(N(u,v))} = j, N(u,v) = m \big]
= \mathbb{E}\big[ h(\xi_v^{(n)}) \mid \xi_v^{(N(u,v))} = j, N(u,v) = m \big]
\]
for every function $h$, and, by conditioning, we derive that
\[
\mathbb{E}\left[ \sum_{n=1}^{N \wedge N(u,v)} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \right]
= \mathbb{E}\left[ \sum_{n=1}^{N(u,v)} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \, \mathbb{I}_{N(u,v) \leq N} \right]
+ \mathbb{E}\left[ \sum_{n=1}^{N} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \, \mathbb{I}_{N(u,v) > N} \right].
\]
Now
\[
\mathbb{E}\left[ \sum_{n=1}^{N} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \, \mathbb{I}_{N(u,v) > N} \right]
\leq N \Pr(N(u,v) > N) \leq \mathbb{E}[N(u,v)^2]/N,
\]
which implies that this term goes to 0 when $N$ goes to infinity, and
\[
\mathbb{E}\left[ \sum_{n=1}^{N(u,v)} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \right]
- \mathbb{E}\left[ \sum_{n=1}^{N(u,v)} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \, \mathbb{I}_{N(u,v) \leq N} \right]
= \mathbb{E}\left[ \sum_{n=1}^{N(u,v)} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \, \mathbb{I}_{N(u,v) > N} \right]
\leq 2\, \mathbb{E}\big[ N(u,v) \, \mathbb{I}_{N(u,v) > N} \big]
\]
goes to 0 when $N$ goes to infinity by the dominated convergence theorem. Therefore,
\[
\lim_{N \to \infty} \mathbb{E}\left[ \sum_{n=1}^{N \wedge N(u,v)} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \right]
= \mathbb{E}\left[ \sum_{n=1}^{N(u,v)} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \right].
\]
Because the first display shows that the terms with $n > N(u,v)$ have expectation 0, the left side is also the limit of $\mathbb{E}[\sum_{n=1}^{N} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \}]$, that is, $\mathrm{div}_j(u,v)$, which proves the lemma.
APPENDIX B: PROOF OF LEMMA 2

For ease of notation, we define $N_u = N(u \mid v)$ and $N_v = N(v \mid u)$. Conditionally on $(N_u, N_v)$, we get, for $N \geq N_u \vee N_v$,
\[
\begin{aligned}
\mathbb{E}\left[ \sum_{n=1}^{N} \{ \mathbb{I}_j(\xi_u^{(n)}) - \mathbb{I}_j(\xi_v^{(n)}) \} \right]
&= \mathbb{E}\left[ \sum_{n=1}^{N_u} \mathbb{I}_j(\xi_u^{(n)}) - \sum_{n=1}^{N_v} \mathbb{I}_j(\xi_v^{(n)})
+ \sum_{n=N_u+1}^{N} \mathbb{I}_j(\xi_u^{(n)}) - \sum_{n=N_v+1}^{N} \mathbb{I}_j(\xi_v^{(n)}) \right] \\
&= \mathbb{E}\left[ \sum_{n=1}^{N_u} \mathbb{I}_j(\xi_u^{(n)}) - \sum_{n=1}^{N_v} \mathbb{I}_j(\xi_v^{(n)}) \right]
+ \mathbb{E}\left[ \sum_{n=1}^{N-N_u} \mathbb{I}_j(\xi_u^{(n+N_u)}) - \sum_{n=1}^{N-N_v} \mathbb{I}_j(\xi_v^{(n+N_v)}) \right], \qquad \text{(B.1)}
\end{aligned}
\]
because the chains $(\xi_u^{(N_u+n)})$ and $(\xi_v^{(N_v+n)})$ ($n \geq 0$) have the same distribution conditionally on $(N_u, N_v)$, and thus
\[
\mathbb{E}\left[ \sum_{n=1}^{(N-N_u) \wedge (N-N_v)} \mathbb{I}_j(\xi_u^{(n+N_u)}) - \sum_{n=1}^{(N-N_u) \wedge (N-N_v)} \mathbb{I}_j(\xi_v^{(n+N_v)}) \right] = 0.
\]
Because $\mathbb{E}[(N_u \vee N_v)^2] \leq \mathbb{E}[T(u,v)^2]$, an argument similar to the one in the proof of Lemma 1 validates the restriction to the case $N \geq N_u \vee N_v$ when $N$ goes to $\infty$, and the difference between $\mathrm{div}_j(u,v)$ and (10) is given by
\[
\mathbb{E}[N_u - N_v] \, \mathbb{E}\big[ \mathbb{I}_j(\xi^{(n)}) \big] = \mathbb{E}[N_u - N_v] \, \pi_j,
\]
because the Markov chain $(\xi^{(n)})$ is ergodic with limiting distribution $\pi$.
APPENDIX C: MINORIZING CONDITIONS FOR EXAMPLE IN SECTION 5.3

When $(\rho, \sigma) \in A = [\underline\rho_1, \bar\rho_1] \times [\underline\rho_2, \bar\rho_2] \times [\underline\sigma, \bar\sigma]$,
\[
K((\rho, \sigma), (\rho', \sigma')) = \sum_{u=1}^{T-2} \Pr(\kappa = u \mid \rho, \sigma) \, \pi_1(\rho_1' \mid \sigma, u) \, \pi_2(\rho_2' \mid \sigma, u)
\geq \sum_{u=1}^{T-2} \inf_A \Pr(\kappa = u \mid \rho, \sigma) \, \inf_A \big\{ \pi_1(\rho_1' \mid \sigma, u) \, \pi_2(\rho_2' \mid \sigma, u) \big\},
\]
where $\pi_i(\cdot \mid \sigma, u)$ denotes the density of $\mathcal{N}_T(\mu_i(u), \sigma^2/\tau_i(u))$. The constant $c$ collects the normalized weights $\inf_A \Pr(\kappa = u \mid \rho, \sigma) / \sup_A \Pr(\kappa = u \mid \rho, \sigma)$, the ratios over $A$ of the exponential terms $\exp\{-(1/2\sigma^2)[\tau_1(u)\rho_1(\rho_1 - 2\mu_1(u)) + \tau_2(u)\rho_2(\rho_2 - 2\mu_2(u))]\}$, and the truncation probabilities $\Phi\{(1 - \mu_i(u))\sqrt{\tau_i(u)}/\bar\sigma\} - \Phi\{(-1 - \mu_i(u))\sqrt{\tau_i(u)}/\bar\sigma\}$, and the minorizing measure $\nu$ is
\[
\nu(\rho_1', \rho_2') = \sum_{u=1}^{T-2} \frac{\inf_A \Pr(\kappa = u \mid \rho, \sigma)}{\sum_{v=1}^{T-2} \inf_A \Pr(\kappa = v \mid \rho, \sigma)}
\, \frac{\sqrt{\tau_1(u)\tau_2(u)}}{2\pi\bar\sigma^2}
\exp\left\{ -\frac{(\rho_1' - \mu_1(u))^2 \tau_1(u) + (\rho_2' - \mu_2(u))^2 \tau_2(u)}{2\bar\sigma^2} \right\}
\prod_{i=1,2} \Big[ \Phi\big\{(1 - \mu_i(u))\sqrt{\tau_i(u)}/\bar\sigma\big\} - \Phi\big\{(-1 - \mu_i(u))\sqrt{\tau_i(u)}/\bar\sigma\big\} \Big]^{-1};
\]
that is, the $\kappa$ component of $\nu$ is simulated from the normalized infima and the $\rho_i$ components from $\mathcal{N}_T(\mu_i(u), \bar\sigma^2/\tau_i(u))$, as stated in Section 5.3.
[Received June 1996. Revised March 1998.]
REFERENCES

Asmussen, S. (1979), Applied Probability and Queues, New York: Wiley.
Battacharya, R. N., and Waymire, E. C. (1990), Stochastic Processes With Applications, New York: Wiley.
Chauveau, D., and Diebolt, J. (1997), "MCMC Convergence Diagnostic via the Central Limit Theorem," technical report, Université de Marne-la-Vallée.
Diebolt, J., and Robert, C. P. (1993), "The Duality Principle," Journal of the Royal Statistical Society, Ser. B, 55, 71-72.
Diebolt, J., and Robert, C. P. (1994), "Estimation of Finite Mixture Distributions Through Bayesian Sampling," Journal of the Royal Statistical Society, Ser. B, 56, 163-175.
Feller, W. (1970), An Introduction to Probability Theory and Its Applications, Vol. 1, New York: Wiley.
Gaver, D. P., and O'Muircheartaigh, I. G. (1987), "Robust Empirical Bayes Analysis of Event Rates," Technometrics, 29, 1-15.
Gelfand, A. E., and Smith, A. F. M. (1990), "Sampling-Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 398-409.
Geyer, C. J. (1992), "Practical Markov Chain Monte Carlo" (with discussion), Statistical Science, 7, 473-511.
Gilks, W. R., Roberts, G. O., and Sahu, S. K. (1997), "Adaptive Markov Chain Monte Carlo Through Regeneration," technical report, Cambridge University, MRC Biostatistics Unit.
Johnson, V. E. (1996), "Studying Convergence of Markov Chain Monte Carlo Algorithms Using Coupled Sample Paths," Journal of the American Statistical Association, 91, 154-166.
Kemeny, J. G., and Snell, J. L. (1960), Finite Markov Chains, Princeton, NJ: Van Nostrand.
Meyn, S. P., and Tweedie, R. L. (1993), Markov Chains and Stochastic Stability, London: Springer-Verlag.
Mykland, P., Tierney, L., and Yu, B. (1995), "Regeneration in Markov Chain Samplers," Journal of the American Statistical Association, 90, 233-241.
Ó Ruanaidh, J. J. K., and Fitzgerald, W. J. (1996), Numerical Bayesian Methods Applied to Signal Processing, New York: Springer-Verlag.
Propp, J. G., and Wilson, D. B. (1995), "Exact Sampling With Coupled Markov Chains and Applications to Statistical Mechanics," technical report, Massachusetts Institute of Technology, Dept. of Mathematics.
Raftery, A., and Lewis, S. (1992), "How Many Iterations in the Gibbs Sampler?" in Bayesian Statistics 4, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford, UK: Oxford University Press, pp. 765-776.
Raftery, A. E., and Lewis, S. (1996), "Implementing MCMC," in Markov Chain Monte Carlo in Practice, eds. W. R. Gilks, S. T. Richardson, and D. J. Spiegelhalter, London: Chapman and Hall, pp. 115-130.
Robert, C. P. (1995), "Convergence Control Techniques for Monte Carlo Markov Chain Algorithms," Statistical Science, 10, 231-253.
Robert, C. P. (1996), Méthodes de Monte Carlo par Chaînes de Markov, Paris: Economica.
Roberts, G. O., and Tweedie, R. L. (1996), "Geometric Convergence and Central Limit Theorems for Multidimensional Hastings and Metropolis Algorithms," Biometrika, 83, 95-110.
Tanner, M. (1991), Tools for Statistical Inference: Observed Data and Data Augmentation Methods, Lecture Notes in Statistics 67, New York: Springer-Verlag.
Yu, B., and Mykland, P. (1994), "Looking at Markov Samplers Through Cusum Path Plots: A Simple Diagnostic Idea," Technical Report 413, University of California-Berkeley, Dept. of Statistics.