
J. R. Statist. Soc. B (2001)
63, Part 1, pp. 127-146
Following a moving target: Monte Carlo inference
for dynamic Bayesian models
Walter R. Gilks
Medical Research Council Biostatistics Unit, Cambridge, UK
and Carlo Berzuini
University of Pavia, Italy
[Received September 1998. Final revision August 2000]
Summary. Markov chain Monte Carlo (MCMC) sampling is a numerically intensive simulation technique which has greatly improved the practicality of Bayesian inference and prediction. However, MCMC sampling is too slow to be of practical use in problems involving a large number of posterior (target) distributions, as in dynamic modelling and predictive model selection. Alternative simulation techniques for tracking moving target distributions, known as particle filters, which combine importance sampling, importance resampling and MCMC sampling, tend to suffer from a progressive degeneration as the target sequence evolves. We propose a new technique, based on these same simulation methodologies, which does not suffer from this progressive degeneration.
Keywords: Bayesian inference; Dynamic model; Hidden Markov model; Importance resampling; Importance sampling; Markov chain Monte Carlo methods; Particle filter; Predictive model selection; Sequential imputation; Simulation; Tracking
1. Introduction
Bayesian applications of Markov chain Monte Carlo (MCMC) methods involve generating
many samples from the posterior distribution of the model parameters by using a Markov
chain, and then approximating posterior expectations of interest with sample averages.
Although MCMC sampling is computationally intensive, it has been remarkably successful in expanding the repertoire of feasible Bayesian problems; see, for example, the many applications described in Gilks et al. (1996). However, for dynamic problems where the posterior (or target) distribution evolves over time through the accumulation of data, and in other situations where a large collection of target distributions is involved, MCMC methods are too computationally intensive to be useful, especially where real-time sequential forecasting is required. Examples of dynamic problems include financial and medical time series prediction, sequential system identification in control engineering, speech recognition, military tracking, on-line updating of classification systems and machine learning. Multiple-target distributions also arise with model selection techniques based on k-step-ahead prediction.
To reduce the computational burden of dynamic Bayesian analysis, several techniques
involving some or all of importance sampling, importance resampling and MCMC sampling
have been proposed (Handschin and Mayne, 1969; Zaritskii et al., 1975; Kong et al., 1994;
Address for correspondence: Walter R. Gilks, Medical Research Council Biostatistics Unit, Institute of Public
Health, University Forvie Site, Robinson Way, Cambridge, CB2 2SR, UK.
E-mail: [email protected]
© 2001 Royal Statistical Society 1369-7412/01/63127
Rubin, 1988; Gordon et al., 1993; Kanazawa et al., 1995; Isard and Blake, 1996; Berzuini et al., 1997; Liu and Chen, 1998). These are collectively known as particle filters and are reviewed briefly in Section 4.2. These techniques require the generation of an initial set of particles, which is then progressively resampled and augmented to take account of incoming data and parameters. The state of the art is reviewed by Liu and Chen (1998) and Doucet (1998). However, most of the techniques proposed to date can suffer from progressive impoverishment of the representativeness of the particles as the dynamic process evolves, especially when unknown hyperparameters are involved, or inference about past unobserved variables is required; see Section 4.2. Here we propose a new method for dynamic Bayesian analysis, which we call the resample-move algorithm. Although this technique draws conceptually on the same base technologies of importance sampling-resampling and MCMC sampling, it avoids the degeneration of existing methods.
We use $p(dx)$ to denote generically a probability measure on a random variable $x$, and $p(x)$ a probability density function. We use $\pi(\cdot)$ to denote a target distribution and all its conditional and marginal distributions and densities, e.g. $\pi(dx_1 | x_2)$ and $\pi(x_1)$. When $\pi(\cdot)$ is a posterior distribution, conditioning on the data is implicit and will be suppressed notationally.
2. A motivating example
As a concrete motivating example, we consider a version of a classical problem in non-linear filtering, known as bearings-only tracking (Gordon et al., 1993; Carpenter et al., 1998, 1999; Bergman, 1999).
The problem, which we analyse in Section 5, is described in Fig. 1. A ship is moving in the north-east plane through random smooth accelerations and decelerations. A stationary observer at the origin of the plane takes a noisy measurement $z_t$ of the ship's angular position at each time $t = 1, 2, \ldots$. Let $x_t$ and $y_t$ denote the east and north co-ordinates of the ship at integer time $t$ respectively, with corresponding velocities $\dot{x}_t$ and $\dot{y}_t$. A simple model for the trajectory evolution of the ship is, for $t > 1$,
Fig. 1. Our illustrative application considers a ship moving along a smooth trajectory in the x-y plane, where x and y represent east and north respectively: *, true position of the ship at each time $t = 1, 2, 3, \ldots$; - - - - -, corresponding observed angular position
$$\left.\begin{aligned}
\dot{x}_t &= N(\dot{x}_{t-1},\, \sigma_1^2),\\
\dot{y}_t &= N(\dot{y}_{t-1},\, \sigma_1^2),\\
x_t &= x_{t-1} + \dot{x}_{t-1},\\
y_t &= y_{t-1} + \dot{y}_{t-1},
\end{aligned}\right\} \qquad (1)$$
where $N(c, d)$ denotes a normal distribution with mean $c$ and variance $d$. These equations contain the assumption that the $x$- and $y$-components of the ship's velocity have the same, constant but unknown, volatility $\sigma_1^2$.
The equation for the observed bearings, $z$, is
$$z_t = \tan^{-1}(y_t / x_t) + N(0, \sigma_2^2). \qquad (2)$$
As we observe the ship in its motion, new data $z_t$ accrue, along with new parameters $(\dot{x}_t, \dot{y}_t)$. The vector of unknowns at time $t$ is
$$\theta_t = (\sigma_1, x_1, y_1, \dot{x}_1, \dot{y}_1, \ldots, \dot{x}_t, \dot{y}_t), \qquad (3)$$
and the data are $z_{1:t} = (z_1, \ldots, z_t)$. From a Bayesian perspective, therefore, the target distribution
$$\pi_t(d\theta_t) = p(d\theta_t | z_{1:t})$$
evolves in an expanding space, $\Theta_t$. As $t$ increases, our aim is to maintain a set of sampled points, or particles, in $\Theta_t$ which can be used to estimate aspects of $\pi_t$ of interest. In particular, these particles can be used at any given time point $t$ to approximate the conditional posterior distribution for the current state of the ship, given the data $z_{1:t}$ accumulated up to that point, or to predict the ship's position at a future point in time.
To reflect the accumulating information on the unknown hyperparameter $\sigma_1$, as $t$ increases there is an increasing need to update the values of $\sigma_1$ represented in the particles. This causes particular problems for existing simulation strategies, as noted above. See Section 4.2 for further details. We now turn to a description of the resample-move algorithm.
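The state and observation equations above are straightforward to simulate. The following sketch (our illustration, not the authors' code) draws a trajectory from model (1) and noisy bearings from model (2); the initial conditions, the variance values and the use of `atan2` (which resolves the quadrant of $\tan^{-1}(y_t/x_t)$) are all illustrative assumptions.

```python
import math
import random

def simulate_ship(T, sigma1=0.01, sigma2=0.005, seed=0):
    """Simulate the bearings-only tracking model of equations (1)-(2).

    sigma1: standard deviation of the velocity random walk (illustrative).
    sigma2: standard deviation of the bearing noise (illustrative).
    Returns positions (x, y) and noisy bearings z for t = 1, ..., T.
    """
    rng = random.Random(seed)
    # Illustrative initial conditions; the paper's prior p(theta_1) on the
    # initial state is not reproduced here.
    x, y = -0.2, 0.2
    xdot, ydot = 0.01, 0.005
    xs, ys, zs = [], [], []
    for _ in range(T):
        # Equation (1): velocities follow a Gaussian random walk;
        # positions integrate the previous velocity.
        xdot = rng.gauss(xdot, sigma1)
        ydot = rng.gauss(ydot, sigma1)
        x, y = x + xdot, y + ydot
        # Equation (2): observed bearing = true angle + Gaussian noise.
        z = math.atan2(y, x) + rng.gauss(0.0, sigma2)
        xs.append(x); ys.append(y); zs.append(z)
    return xs, ys, zs
```

Such simulated trajectories are what we use in Section 5 to compare the resample-move algorithm with a standard particle filter.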
3. Method
We suppose that we have an evolving sequence of target distributions $\pi_t(d\theta)$, where $t$ denotes discrete time. Denote the support of $\pi_t$ by $\Theta$. For ease of exposition, in this section we assume that $\Theta$ does not depend on $t$. However, as noted above, our main concern is with problems where $\Theta$ expands with $t$. In Section 4.1 we show that the development in this section applies directly to scenarios where $\Theta$ depends on $t$.
In general, $\theta$ will be a vector of model parameters, and $\pi_t(d\theta) = p(d\theta | X_t)$ will be the posterior distribution of $\theta$ after observing data $X_t$ accumulated up to time $t$. However, our methodology makes no assumptions about the process generating the $\pi_t$-sequence. Indeed, $\pi_t$ need not even be a posterior distribution. For example, $\pi_t$ could be an importance sampling distribution; see Section 6. Moreover, $\theta$ need not be a parameter vector. We merely regard the $\pi_t$ as a sequence of distributions of interest.
Our aim is to estimate
$$E_t[g] = \int g(\theta)\, \pi_t(d\theta),$$
for any function of interest $g$ at any time $t$, by using Monte Carlo methods. Our resample-move algorithm involves maintaining a set $S_t$ of particles, each representing a value of $\theta$. This set of particles is adapted to the evolution in $\pi_t$ by using a combination of importance sampling, importance resampling and MCMC sampling. In outline, the procedure is as follows. An initial set $S_{t_1}$ of particles is sampled at time $t_1$ and is used until time $t_2$, when it is resampled and then moved in $\Theta$-space to form set $S_{t_2}$. This process continues so that, at time $t_{k+1}$, for $k = 1, 2, \ldots$, the current particle set $S_{t_k}$ is resampled and moved to form $S_{t_{k+1}}$. Each resampling is an importance-weighted resampling, and each resampled particle is moved according to a Markov chain transition kernel. At any time $t$ during this process, $E_t[g]$ is estimated by importance sampling based on the most recently generated set of particles. We now describe this algorithm in detail.
3.1. Preliminaries
To simplify the notation, without loss of generality, we assume that sampling or resampling is done at integer-valued times $t = 1, 2, \ldots$. Thus we set $t_k = k$, for all positive integers $k$.
Let $\theta_k$ denote a generic particle sampled at time $k$, and in particular let $\theta_k^{(j)}$ denote the $j$th particle sampled at time $k$. Let $n_k$ denote the number of particles generated at time $k$, so $S_k = \{\theta_k^{(j)};\ j = 1, \ldots, n_k\}$. Let
$$w_{kt}(\theta) = \pi_t(d\theta) / \pi_k(d\theta). \qquad (4)$$
(For readers who are unfamiliar with measure-theoretic notation, think of $\pi_t(d\theta)/\pi_k(d\theta)$ as a density ratio $\pi_t(\theta)/\pi_k(\theta)$.) We shall refer to $w_{kt}(\cdot)$ as an incremental weight function. When $t = k + 1$ we suppress the second subscript on $w_{kt}$, so $w_k(\theta)$ denotes $w_{k,k+1}(\theta)$.
For $k = 1, 2, \ldots$, let $q_k(\theta_k, d\theta_{k+1})$ denote a Markov chain transition kernel with invariant (stationary) distribution $\pi_{k+1}$, i.e. $q_k(\theta_k, d\theta_{k+1})$ denotes a conditional probability of moving to a measurable set $d\theta_{k+1} \subseteq \Theta$ at time $k + 1$, given position $\theta_k$ at time $k$, with the property that
$$\int_{\theta_k \in \Theta} \pi_{k+1}(d\theta_k)\, q_k(\theta_k, d\theta_{k+1}) = \pi_{k+1}(d\theta_{k+1}). \qquad (5)$$
See, for example, Tierney (1994).
We adopt the convention that the first subscript $k$ on a variable indicates functional dependence on the particles in $S_k$, and when combined with superscript $(j)$ this dependence is restricted to just the $j$th element of $S_k$, $\theta_k^{(j)}$. Thus $g_k^{(j)}$ denotes $g(\theta_k^{(j)})$, $w_k^{(j)}$ denotes $w_k(\theta_k^{(j)})$, $w_{kt}^{(j)}$ denotes $w_{kt}(\theta_k^{(j)})$ and $q_k^{(j)}(d\theta_{k+1})$ denotes $q_k(\theta_k^{(j)}, d\theta_{k+1})$.
3.2. Resample-move algorithm
After initialization at time $t = 1$, our algorithm proceeds with a rejuvenation at each subsequent integer time. The rejuvenation comprises two steps: a resample step and a move step, as we now describe.
(a) Initialization: at time $t = 1$, generate the initial set of particles $S_1$ by sampling, independently for $j = 1, \ldots, n_1$,
$$\theta_1^{(j)} \sim \pi_1, \qquad (6)$$
where `$\sim$' denotes `is sampled from'. Then, for $k = 1, 2, \ldots$, we have the rejuvenation step.
(b) Rejuvenation: at each time $t = k + 1$ calculate weights $w_k^{(i)}$, for $i = 1, \ldots, n_k$. Generate $S_{k+1}$ by performing the following two steps, independently for $j = 1, \ldots, n_{k+1}$:
(i) resample step: randomly select a particle from $S_k$, such that $\theta_k^{(i)}$ is selected with probability proportional to $w_k^{(i)}$, for $i = 1, \ldots, n_k$, and denote the selected particle by $\theta_k^{(i_j)}$;
(ii) move step: move $\theta_k^{(i_j)}$ to a new position $\theta_{k+1}^{(j)}$ by sampling
$$\theta_{k+1}^{(j)} \sim q_k^{(i_j)}. \qquad (7)$$
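One rejuvenation step can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: `weight_fn` and `move_fn` are user-supplied stand-ins for the incremental weight $w_k$ and a draw from the transition kernel $q_k$.

```python
import random

def resample_move(particles, weight_fn, move_fn, n_new=None, rng=None):
    """One rejuvenation step of the resample-move algorithm (Section 3.2).

    particles: the current set S_k.
    weight_fn: incremental weight w_k(theta), known up to a constant.
    move_fn:   draws from an MCMC kernel q_k whose invariant distribution
               is pi_{k+1} (equation (5)); any correct kernel will do.
    """
    rng = rng or random.Random()
    n_new = n_new or len(particles)
    weights = [weight_fn(p) for p in particles]
    # Resample step: importance resampling with replacement, selecting
    # particle i with probability proportional to w_k^(i).
    selected = rng.choices(particles, weights=weights, k=n_new)
    # Move step: each selected particle is moved by the kernel q_k, which
    # leaves pi_{k+1} invariant, so no burn-in is needed.
    return [move_fn(p) for p in selected]
```

Iterating this function over $k$, with weights and kernels updated as new data arrive, yields the full algorithm.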
3.3. Notes on the algorithm
We assume that, at time $t = 1$, it is possible to sample independently directly from the target distribution $\pi_1$. In most Bayesian applications, $\pi_1$ would be the prior distribution of the parameters, from which sampling would be easy for the large class of directed acyclic graphical (DAG) models. For non-DAG models, such as typically arise in applications with a spatial component, one strategy would be to set $\pi_1$ to a convenient approximation of the prior, which nevertheless admits independent sampling, and then to set $\pi_2$ equal to the prior.
The resample step at time $k + 1$ selects a particle $\theta_k^{(i_j)}$ at random from the current particle set $S_k$, for each $j = 1, \ldots, n_{k+1}$. This is a weighted resampling with replacement and is known as importance resampling (Rubin, 1988) or the weighted bootstrap (Smith and Gelfand, 1992). Our notation allows the particle set size, $n_{k+1}$, to change over time, although in practice a fixed set size might be used. In Section 3.4 we consider why we might want to vary the set size.
Regions of $\Theta$ which are under-represented in $S_k$ with respect to $\pi_{k+1}$ have weights that are greater than 1 according to equation (4), and similarly particles which are over-represented have weights that are less than 1. It can be shown that the set of resampled particles $\{\theta_k^{(i_j)};\ j = 1, \ldots, n_{k+1}\}$, before being moved, is approximately a sample from the current target distribution $\pi_{k+1}$, provided that particle set sizes are large. Note that this is a dependent sample. Although the resampling at time $k + 1$ is performed independently for each $j$, it is conditional on the values in $S_k$. Previous stages of resampling therefore induce marginal dependence between the resampled particles.
The incremental weights $w_k^{(i)}$ used in the resample step involve $\pi_k$ and $\pi_{k+1}$, but in most Bayesian applications we can calculate only $\tilde{\pi}_k = c_k \pi_k$ and $\tilde{\pi}_{k+1} = c_{k+1} \pi_{k+1}$, where the normalization constants $c_k$ and $c_{k+1}$ are unavailable as their evaluation would involve intractable high dimensional integration. Instead of using $w_k^{(i)}$ in the resample step, we can use the unnormalized weight
$$\tilde{w}_k^{(i)} = \tilde{\pi}_{k+1}(d\theta_k^{(i)}) / \tilde{\pi}_k(d\theta_k^{(i)}), \qquad (8)$$
since this can be directly calculated and is proportional to $w_k^{(i)}$.
The move step at time $k + 1$ in effect performs one or more iterations of an MCMC algorithm on each of the particles selected at the resample step, where the invariant distribution of the MCMC algorithm is $\pi_{k+1}$. For an introduction to MCMC methods see, for example, Tierney (1994), Gilks et al. (1996) or Robert and Casella (1999). For example, $q_k$ might be an iteration of the Gibbs sampler, or perhaps a Gibbs or Metropolis-Hastings move which updates just one element of vector $\theta$. The kernel $q_k$ need be neither irreducible nor reversible. We discuss the choice of $q_k$ further, later. As noted above, each selected particle $\theta_k^{(i_j)}$, before moving, is approximately distributed according to $\pi_{k+1}$. Therefore, by equation (5), the moved particle $\theta_{k+1}^{(j)}$ is also approximately distributed according to $\pi_{k+1}$. Thus we do not require the usual burn-in of MCMC algorithms.
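As an example of such a kernel (our illustration, not the paper's), a single random-walk Metropolis iteration leaves a given target invariant and needs only the unnormalized density, so the unknown constants $c_k$ cancel just as in equation (8).

```python
import math
import random

def rw_metropolis_kernel(log_target, step=0.5, rng=None):
    """Return a one-iteration random-walk Metropolis kernel whose invariant
    distribution has (unnormalized) density exp(log_target).

    As noted in the text, the kernel need not be irreducible or reversible;
    this particular choice happens to be both. `step` is an illustrative
    proposal scale.
    """
    rng = rng or random.Random()

    def move(theta):
        proposal = theta + rng.gauss(0.0, step)
        # Metropolis acceptance: only the ratio of unnormalized densities
        # is needed, so normalization constants cancel.
        log_alpha = log_target(proposal) - log_target(theta)
        if math.log(rng.random()) < log_alpha:
            return proposal
        return theta

    return move
```

A kernel built this way can serve directly as the `move_fn` of the move step; because each resampled particle is already approximately distributed according to the invariant distribution, a single iteration suffices.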
The purpose of the resample and move steps is examined in Section 3.4. In Section 4.2 we review connections between the resample-move algorithm and other techniques for dynamic simulation.
3.4. Estimating $E_t[g]$
Recall that our objective is to estimate $E_t[g]$, where $g(\theta)$ represents a generic function of interest. Having run the resample-move algorithm up to time $t$, we can estimate $E_t[g]$ by
$$\tilde{g}_t = \frac{1}{n_t} \sum_{j=1}^{n_t} g(\theta_t^{(j)}). \qquad (9)$$
Thus $\tilde{g}_t$ is the mean of $g$ over the particle set $S_t$.
Below we present a central limit theorem (CLT) for $\tilde{g}_t$. This involves an integral operator $I_{kt}(\cdot)$, defined for any integers $k$ and $t$ such that $1 \leq k \leq t$:
$$I_{kt}(h) = \begin{cases} \displaystyle\int h(\theta_t) \prod_{l=k+1}^{t} w_{l-1}(\theta_{l-1})\, q_{l-1}(\theta_{l-1}, d\theta_l), & k < t,\\[4pt] h(\theta_k), & k = t, \end{cases} \qquad (10)$$
where integration is over the space $\{\theta_{k+1} \in \Theta\} \ldots \{\theta_t \in \Theta\}$. This operator takes $h(\theta_t)$ as its operand and produces a result which depends on $\theta_k$. We suppress both $\theta_k$ and $\theta_t$ in the notation for $I_{kt}$, these being implied by the subscripts.
We also need to define a variance-like quantity:
$$V^*_{k-1,t}(g) = E_k[I_{kt}^2(g - E_t[g])]. \qquad (11)$$
Here, $g$ is a function of $\theta_t$, and the outer expectation in equation (11) integrates over $\theta_k \in \Theta$. Thus $V^*_{k-1,t}(g)$ is a function of the particles in $S_{k-1}$, as indicated by its first subscript. We can now state the CLT.
Theorem 1. Under the integrability conditions of theorem 2 in Appendix A,
$$\frac{\tilde{g}_t - E_t[g]}{\surd V_t(g)} \Rightarrow N(0, 1)$$
as $n_1 \to \infty$, then $n_2 \to \infty$, and so on, where
$$V_t(g) = \sum_{k=1}^{t} \frac{1}{n_k}\, V^*_{k-1,t}(g). \qquad (12)$$
Proof. See corollary 2 of Appendix A.
The limit in theorem 1 is for large particle set sizes $\{n_k\}$, not for large $t$. This is the appropriate mode of analysis, as we wish to make valid inference and prediction for each $t$. However, we thank a referee for pointing out that the mode of convergence of the theorem is not directly relevant to the practical context in which we might wish to consider behaviour for large $n$, where $n = n_1 = n_2 = \ldots$. Such convergence results would require the use of more complex interacting particle theory; see, for example, Shiga and Tanaka (1985) and Del Moral and Guionnet (1999).
We see from theorem 1 that $\tilde{g}_t$ is a consistent estimator of $E_t[g]$, whose variance $V_t(g)$ is a somewhat complex function of the incremental weights and transition kernels encountered up to time $t$. We now discuss the structure of $V_t(g)$ in detail.
Many particles in $S_k$ will remain unselected after completing the rejuvenation at time $k + 1$. Assuming that $n_k = n_{k+1} = n$, the expected proportion of particles in $S_k$ which will remain unselected after the rejuvenation is
$$\frac{1}{n} \sum_{j=1}^{n} \left(1 - \frac{w_k^{(j)}}{\sum_{j'=1}^{n} w_k^{(j')}}\right)^{\!n} \;\geq\; \left(1 - \frac{1}{n}\right)^{\!n} \to \exp(-1) \qquad \text{as } n \to \infty,$$
by Jensen's inequality. Thus more than a third of the current particles are lost with each rejuvenation. This progressive impoverishment is reflected in the contribution to $V_t(g)$ of one variance component $V^*_{k-1,t}(g)/n_k$ from each rejuvenation. Clearly, large set sizes $n_k$ will reduce these components, but the accumulation of variance components with each rejuvenation could be highly undesirable.
In particular, suppose that $q_k(\theta_k, d\theta_{k+1})$ puts all its probability mass on $\theta_{k+1} = \theta_k$, so that the move step almost surely does not move the particle. In this situation, $V^*_{k-1,t}(g)$ reduces to
$$E_k[\{(g - E_t[g])\, w_{kt}\}^2] = E_t[(g - E_t[g])^2\, w_{kt}],$$
leading to
$$V_t(g) = E_t\!\left[(g - E_t[g])^2 \sum_{k=1}^{t} \frac{w_{kt}}{n_k}\right], \qquad (13)$$
suppressing notationally the functional dependence of $g$ and $w_{kt}$ on $\theta$. From equation (13) we see that $V_t(g)$ is strictly increasing with each rejuvenation. Several existing resampling algorithms for dynamic simulation share this unfortunate property (see Section 4.2).
A special case of theorem 1 is ordinary importance sampling (see, for example, Geweke (1989)), which is obtained when $1 \leq t < 2$, i.e. before the first rejuvenation. Now, $V_t(g)$ reduces to equation (13) with $t = 1$, the importance sampling variance formula. Thus, by theorem 1 of Geweke (1989), $E_t[g]$ can be consistently estimated by $\tilde{g}_t$ without recourse to rejuvenation. (This holds for any $t > 0$, since it is only for notational convenience that we set rejuvenation times at $t = 2, 3, \ldots$.)
We have seen that the resample step impoverishes the particle base, and is not required for consistent estimation. What, then, is the purpose of rejuvenation? We now show that the move step enriches the particle base and provides an opportunity to improve estimation considerably. Consider a particle $\theta_k^{(i)}$ with a heavy weight $w_k^{(i)}$. This particle may be selected at the resample step at time $k + 1$ for several different $j$, and each copy of this particle will tend to be moved to a distinct point in a region of strong support under $\pi_{k+1}$. Thus successive rejuvenation steps will help the current particle base $S_t$ to track the moving target $\pi_t$. An illustration of this is given in Section 5. Without rejuvenation, tracking cannot occur, resulting eventually in heavy weights accumulating on just a few particles, impacting adversely on $V_t(g)$, as can be seen from the importance sampling formula for $V_t(g)$, given by equation (13) with $t = 1$.
More precisely, if, for any integer $l \leq t$, transition kernel $q_l$ is rapidly mixing (allowing fluid movement around $\Theta$), then $V^*_{k-1,t}(g)$ will be reduced for all $k < l$. This is most easily seen by considering an extreme case where $q_l$ is perfectly mixing, i.e. $q_l \equiv \pi_{l+1}$. In this case, $S_l$ does not depend on earlier stages, and $V^*_{k-1,t}(g)$ is identically 0 for all $k < l$ (see lemma 2 in Appendix A).
Thus we see that there is a cost, in terms of the precision of estimation, with each rejuvenation but also an important potential benefit if the move step provides good mixing. In practice, the optimal timing of rejuvenations will depend critically on the speed with which the target $\pi_t$ is moving, and the mixing rate of the transition kernels employed. Estimates of $V_t(g)$ will be invaluable in deciding when to rejuvenate, and in determining an adequate particle set size $n_t$.
3.5. Variance estimation
We propose an estimator of $V_t(g)$ based on only the particles from stage $t$. For this we require some additional terminology. We say that particle $\theta_k^{(i_j)}$ is the parent of particle $\theta_{k+1}^{(j)}$, where $i_j$ is as defined in Section 3.2. Further, for $k < l$, we say that $\theta_k^{(i)}$ is an ancestor of a particle $\theta_l^{(j)}$ if the line of parentage from $\theta_l^{(j)}$ passes through $\theta_k^{(i)}$. For $k \leq l$, define
$$C_{kl}^{(jm)} = \begin{cases} 1 & \text{if } \theta_k^{(j)} \text{ is an ancestor of } \theta_l^{(m)},\\ 0 & \text{otherwise.} \end{cases} \qquad (14)$$
For $k \leq t$, define
$$\hat{I}_{kt}^{(j)}(g) = \frac{n_k}{n_t} \sum_{m=1}^{n_t} C_{kt}^{(jm)}\, g_t^{(m)}. \qquad (15)$$
In Appendix A (lemma 8), we show that $\hat{I}_{kt}^{(j)}(g)$ is a consistent estimator of $I_{kt}^{(j)}(g)$. For $l \leq t$, define
$$\hat{V}^*_{l-1,t}(g) = \frac{1}{n_l} \sum_{j=1}^{n_l} \{\hat{I}_{lt}^{(j)}(g - \tilde{g}_t)\}^2. \qquad (16)$$
In Appendix A (lemma 9), we show that $\hat{V}^*_{l-1,t}(g)$ is a consistent estimator of $V^*_{l-1,t}(g)$. Define
$$\hat{V}_t(g) = \sum_{l=1}^{t} n_l^{-1}\, \hat{V}^*_{l-1,t}(g). \qquad (17)$$
In Appendix A (theorem 3), we show that $\hat{V}_t(g)$ is a consistent estimator of $\mathrm{var}(\tilde{g}_t)$. An equivalent but computationally more convenient formula is
$$\hat{V}_t(g) = \frac{1}{n_t^2} \sum_{m_1=1}^{n_t} \sum_{m_2=1}^{n_t} N_t^{(m_1 m_2)}\, (g_t^{(m_1)} - \tilde{g}_t)(g_t^{(m_2)} - \tilde{g}_t), \qquad (18)$$
where $N_t^{(m_1 m_2)}$ is the number of common ancestors of $\theta_t^{(m_1)}$ and $\theta_t^{(m_2)}$, i.e.
$$N_t^{(m_1 m_2)} = \sum_{l=1}^{t} \sum_{j=1}^{n_l} C_{lt}^{(j, m_1)}\, C_{lt}^{(j, m_2)}.$$
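If the parent index $i_j$ is recorded at each resample step, the ancestry counts $N_t^{(m_1 m_2)}$, and hence $\hat{V}_t(g)$ via equation (18), can be computed directly from the final particle set. The sketch below is our illustration; the bookkeeping conventions (0-based parent-index lists, one per rejuvenation) are assumptions, and a particle is counted as its own ancestor at the final generation, consistent with $C_{tt}^{(jm)} = 1$ if and only if $j = m$.

```python
def vhat_t(parents, g_values, g_tilde=None):
    """Estimate var(g_tilde_t) via equation (18), given recorded parentage.

    parents:  list over rejuvenations; parents[l][j] is the index of the
              parent (in the previous set) of particle j in the next set.
    g_values: g evaluated at the particles of the final set S_t.
    """
    n_t = len(g_values)
    if g_tilde is None:
        g_tilde = sum(g_values) / n_t
    # Trace each final particle's ancestor index back through every
    # generation, starting from the identity (each particle is its own
    # ancestor at the final generation).
    anc = list(range(n_t))
    ancestors = [anc]
    for gen in reversed(parents):
        anc = [gen[a] for a in anc]
        ancestors.append(anc)
    # N^(m1,m2) = number of generations in which m1 and m2 share an
    # ancestor; accumulate the double sum of equation (18).
    vhat = 0.0
    for m1 in range(n_t):
        for m2 in range(n_t):
            n_common = sum(1 for a in ancestors if a[m1] == a[m2])
            vhat += n_common * (g_values[m1] - g_tilde) * (g_values[m2] - g_tilde)
    return vhat / n_t ** 2
```

With no rejuvenations the counts reduce to $N_t^{(m_1 m_2)} = \delta_{m_1 m_2}$ and the estimator collapses to the usual Monte Carlo variance estimate, sample variance over $n_t$.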
4. Dynamic simulation
4.1. Augmenting the space
So far we have assumed that the target distribution $\pi_t$ has the same support $\Theta$ for all $t$, i.e. $\Theta_t = \Theta$. This assumption must be relaxed to embrace those numerous situations where the $\theta$-space grows in size during the evolution of $\pi_t$. For example, in clinical monitoring, new patient-specific random effects or frailties may be introduced into the Bayesian model with each new patient (Berzuini et al., 1997). Similarly, in hidden Markov model processing of sequential data, each new data observation enters the model accompanied by a new hidden state variable. The development in this section follows Liu and Chen (1998).
At time $k + 1$, suppose that new variables $\eta_{k+1}$ are introduced into $\theta$. To deal with this, just before rejuvenation at $k + 1$, we perform the following augmentation step at time $k + 1$ for $i = 1, \ldots, n_k$: draw an independent sample $\eta_{k+1}^{(i)}$ from an augmentation distribution $f_{k+1}^{(i)}(d\eta_{k+1})$ and construct a new particle
$$\tilde{\theta}_k^{(i)} = (\theta_k^{(i)}, \eta_{k+1}^{(i)})', \qquad (19)$$
where the joint distribution $\pi_k f_{k+1}^{(i)}$ is absolutely continuous with respect to $\pi_{k+1}$. Rejuvenation then proceeds as described in Section 3.2, with $\theta_k^{(i)}$ replaced by $\tilde{\theta}_k^{(i)}$, $S_k$ replaced by $\tilde{S}_k = \{\tilde{\theta}_k^{(i)}\}$ and $w_k^{(i)}$ replaced by
$$\tilde{w}_k^{(i)} = \frac{\pi_{k+1}(d\tilde{\theta}_k^{(i)})}{\pi_k(d\theta_k^{(i)})\, f_{k+1}^{(i)}(d\eta_{k+1}^{(i)})}, \qquad (20)$$
where we adopt the convention that $f_{k+1}^{(i)}(d\eta_{k+1}) \equiv 1$ if $\Theta_{k+1} = \Theta_k$, i.e. if no augmentation is required.
In a Bayesian context, let $\pi_{k+1}(d\eta_{k+1} | \theta_k^{(i)})$ denote the conditional posterior distribution (under the model) of $\eta_{k+1}$, i.e. the distribution of $\eta_{k+1}$ conditional on $\theta_k^{(i)}$ and on all the data up to time $k + 1$, $X_{k+1}$. Let $\pi_k(d\eta_{k+1} | \theta_k^{(i)})$ denote the conditional prior distribution, i.e. the distribution of $\eta_{k+1}$ conditioning on only $\theta_k^{(i)}$ and $X_k$. Both the conditional prior and the conditional posterior are natural choices for $f_{k+1}^{(i)}$. Choosing the conditional prior for $f_{k+1}^{(i)}$ leads to
$$\tilde{w}_k^{(i)} = p(dX_{k+1} | X_k, \tilde{\theta}_k^{(i)}),$$
i.e. the conditional likelihood. See the example in Section 5.2.
If the data arriving at time $k + 1$ carry little information about $\eta_{k+1}$, then the conditional prior may be a good choice, but if they carry substantial information about $\eta_{k+1}$ then most of the particles generated by this distribution would fall outside the region of high support for $\pi_{k+1}$ and would consequently receive a very low weight, with an adverse effect on the variance of the resulting estimates. A better choice in this case would be the conditional posterior, but this might not be easy to sample from. Moreover, the conditional posterior will typically be known only up to a $\theta_k^{(i)}$-dependent normalizing `constant'. In principle, any choice for $f_{k+1}^{(i)}$ where the normalizing constant is known, and which is not too dissimilar to the conditional posterior distribution, will suffice (Berzuini et al., 1997; Liu and Chen, 1998).
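The augmentation step with the conditional-prior choice can be sketched as follows (our illustration, not the authors' code; `sample_new` and `log_lik` are hypothetical user-supplied functions standing in for a draw from the conditional prior and the log conditional likelihood of the new datum).

```python
def augment_and_weight(particles, sample_new, log_lik):
    """Augmentation step of Section 4.1 with the augmentation distribution
    f_{k+1} taken to be the conditional prior, so that the incremental
    weight (20) reduces to the conditional likelihood of the new datum.

    particles:  tuples representing theta_k^(i).
    sample_new: draws the new variables eta_{k+1} given a particle.
    log_lik:    log-likelihood of the datum at time k+1 under the
                augmented particle.
    Returns augmented particles and unnormalized log weights.
    """
    augmented, log_w = [], []
    for theta in particles:
        eta = sample_new(theta)           # eta_{k+1} drawn given the particle
        theta_aug = theta + (eta,)        # the particle grows with the space
        augmented.append(theta_aug)
        log_w.append(log_lik(theta_aug))  # log conditional likelihood
    return augmented, log_w
```

The returned log weights then feed the resample step exactly as in Section 3.2.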
4.2. Other dynamic simulation methods
Several other methods have been proposed for dynamic posterior simulation. The simplest of these is sequential importance sampling (Handschin and Mayne, 1969; Zaritskii et al., 1975; Kong et al., 1994), which corresponds to the resample-move algorithm without rejuvenation, and where the augmentation distribution $f_{k+1}^{(i)}$ is either the conditional prior or conditional posterior distribution. Sequential importance sampling may perform poorly if the arriving information moves the posterior distribution $\pi_t$ away from $\pi_1$, causing the importance weights $w_{1t}$ to become concentrated on just a few particles of $S_1$.
To help to overcome this, Kong et al. (1994) introduced rejuvenation through an importance resampling step (Rubin, 1988). This corresponds to the resample-move algorithm without the move step. The advantages and disadvantages of importance resampling in a dynamic context are discussed by Liu and Chen (1995). Similar resampling algorithms have been proposed. The Bayesian bootstrap filter (Gordon et al., 1993), also known as the condensation algorithm (Isard and Blake, 1996) or likelihood weighting algorithm (Kanazawa et al., 1995), uses the conditional prior for the augmentation distribution and performs the resampling step before the augmentation step, to introduce more diversity to the particle base. Berzuini et al. (1997), Liu and Chen (1998) and Pitt and Shephard (1999) have introduced MCMC sampling at the augmentation stage, to facilitate sampling from the conditional posterior.
These resampling methods have been shown to work well for hidden Markov models in which there are no unknown hyperparameters (e.g. a volatility parameter), and where inference about past states of the hidden process is not required. However, in other situations, these methods may perform poorly, as they provide no opportunity to generate new values for unknown quantities after their initial generation, even though information about them may continue to accrue. Consequently, as the target distribution drifts away from these initial values, the particle base may degenerate to contain few distinct values of these variables. See Section 5 for an illustration. To introduce more diversity to the current set, Sutherland and Titterington (1994) proposed replacing the resampling step with resampling from a kernel density estimate constructed on the current set $S_k$. Similarly, West (1993) proposed resampling through adaptive importance sampling, where the bandwidth of the kernel density estimate is reduced on successive iterations of the adaptation to reduce the bias in the variance of the target distribution. Liu and Chen (1998) proposed sampling from a Rao-Blackwellized reconstruction of the target distribution, which is essentially a Gibbs sampling form of the `move' step of the resample-move algorithm. Carpenter et al. (1998) have pointed out that MCMC methods may be used to move particles.
5. An illustration
Recall the bearings-only example of Section 2. In this section we shall describe an implementation of the resample-move algorithm on this example and then compare this algorithm with a standard sequential importance resampling (SIR) filter in terms of their performance in tracking a simulated ship's trajectory.
5.1. Dynamic modelling of ship tracking
At time $t$, on the arrival of a new observed bearing $z_t$, the parameter space expands to incorporate the unobserved ship velocity $(\dot{x}_t, \dot{y}_t)$, and we want to generate an updated set of particles, the $i$th particle $\theta_t^{(i)}$ containing a realization of the ship's history up to time $t$, as described by equation (3).
Notationally, for array $v$, we let $v_t$ denote restriction to elements corresponding to time $t$, and $v_{h:k}$ denote restriction to elements associated with time interval $[h, k]$. Thus, for example, $x_{h:k} := \{x_h, x_{h+1}, \ldots, x_k\}$.
At time $t$, the target is the posterior distribution of $\theta_t$, which has the form
$$\pi_t(\theta_t) = p(\theta_t | z_{1:t}) \propto p(\theta_t) \prod_{i=1}^{t} p(z_i | \theta_i), \qquad (21)$$
where $\pi_t(\theta_t)$ denotes a density with respect to Lebesgue measure of $\theta_t$, the conditional likelihood $p(z_i | \theta_i)$ is fully determined by the model specification and where the prior distribution $p(\theta_t)$ has structure
$$p(\theta_t) = p(\theta_1)\, \exp\!\left\{-0.5\, \frac{\dot{y}_{1:t}'\, C\, \dot{y}_{1:t} + \dot{x}_{1:t}'\, C\, \dot{x}_{1:t}}{\sigma_1^2}\right\}. \qquad (22)$$
Of the two terms of the above product, the first, $p(\theta_1)$, represents our prior knowledge about the ship's initial conditions. The second term represents prior belief in smoothness of the ship's accelerations. $C$ is the $t \times t$ tridiagonal symmetric matrix whose elements are all 0 except for those on the diagonals adjacent to the main diagonal, which are equal to $-1$, and those on the main diagonal: $C[i, i] = 1$ if $i = 1$ or $i = t$; otherwise $C[i, i] = 2$. This matrix is rank deficient, reflecting the fact that the exponent term in equation (22) is invariant with respect to global level changes of the co-ordinate and velocity vectors.
5.2. Applying the resample-move algorithm
We now discuss our implementation of the resample-move algorithm in the example, with particular attention on the augmentation and rejuvenation stages of the algorithm.
In the augmentation stage at time $t$, each particle $\theta_{t-1}^{(i)}$ was augmented to $\tilde{\theta}_{t-1}^{(i)}$ by incorporating a realization of the pair $(\dot{x}_t, \dot{y}_t)$, sampled from the conditional prior $N(\dot{x}_{t-1}, \sigma_1^2) \times N(\dot{y}_{t-1}, \sigma_1^2)$. The augmented particle $\tilde{\theta}_{t-1}^{(i)}$ then entered the resampling step with weight proportional to $p(z_t | \tilde{\theta}_{t-1}^{(i)})$.
In the rejuvenation stage at time $t$ we used the following three types of move:
(a) rescaling: all co-ordinate and velocity values representing the ship's trajectory, along both axes, are reduced or amplified by the same factor;
(b) local perturbation: the particle trajectory is perturbed locally, by moving a block of consecutive points of the trajectory to a new position, while leaving the remaining part of the trajectory unchanged;
(c) $\sigma_1$-move: update the value of $\sigma_1$.
The role of the rescaling move in the bearings-only problem is discussed in Carpenter et al. (1999). We implemented this move according to a Metropolis-Hastings scheme, using a symmetric proposal distribution. Details are given in Berzuini and Gilks (2000). The rescaling move at time $t$ involves the entire length of the ship's trajectory up to time $t$ and therefore its burden of computation is of the order $O(t)$. However, we can perform the rescaling move progressively less frequently, with probability $O(t^{-\alpha})$ say, where $\alpha > 0$. Thus the rescaling move represents a burden of computation of the order $O(t^{1-\alpha})$.
The local perturbation move is applied to the $\dot{x}$-vector first, and then to the $\dot{y}$-vector. For any given block $\dot{x}_{a:b}$ currently at value $\dot{x}_{a:b}'$, a new candidate $\dot{x}_{a:b}^{*}$ can be drawn from the proposal distribution $N(\dot{x}_{a:b}',\, r\,(C_{a:b,a:b})^{-1})$, where $C$ is described in Section 5.1, and parameter $r$ controls the spread of the proposal distribution. To complete the move, we then generate a candidate $\dot{y}_{a:b}^{*}$ for the block $\dot{y}_{a:b}$ from a complex dependence proposal distribution that incorporates information about $z_t$ and $\dot{x}_{a:b}$. Further details about this distribution are given in Berzuini and Gilks (2000). The candidate $(\dot{x}_{a:b}^{*}, \dot{y}_{a:b}^{*})$ is then either globally accepted or globally rejected. At stage $t$ of the algorithm, it will often be sensible to update a block containing the $t$th time point of the trajectory, and subsequently to update less recent blocks, visited in a random order to avoid drift. If the updated blocks cover the entire length of the trajectory up to $t$ without overlapping, the computational burden of the local perturbation move at the updating stage $t$ will be $O(t)$. However, because new data $z_t$ will provide progressively less information on $(\dot{x}_k, \dot{y}_k)$, for each fixed $k < t$, we can arrange to update $(\dot{x}_k, \dot{y}_k)$ with diminishing frequency as $t$ increases, such that the overall burden of computation for the local perturbation move at time $t$ is of order $O(t^{\beta})$, for some $\beta < 1$.
The parameter $\tau$ is updated by a standard Gibbs move, which involves at each updating stage $t$ the entire sequence of length $t$ of points in the trajectory. Because the full conditional distribution involves all elements of the trajectory up to time $t$, the computational complexity of updating $\tau$ is $O(t)$. However, because the marginal posterior distribution for $\tau$ will stabilize over time, we can exploit information on $\tau$ contained in the particles and update $\tau$ progressively less frequently, with probability $O(t^{-\varepsilon})$ say, where $\varepsilon > 0$. Thus the burden of computation for the $\tau$-update is of the order $O(t^{1-\varepsilon})$. However, this will impoverish the diversity of $\tau$-values in $S_t$ and will tend to reduce the precision of estimates of $\tau$.
The overall burden of computation of each rejuvenation step at time $t$, summing the separate contributions of the three types of move, is $O(t^{1-\varepsilon}) + O(t^{\beta}) + O(t^{1-\varepsilon}) < O(t)$. By comparison, in a typical MCMC algorithm, all the model parameters would need to be updated at every iteration, with a burden of computation for one iteration of MCMC sampling at time $t$ equal to $O(t) + O(t) + O(t) = O(t)$, summing the separate contributions for the rescaling move, the local perturbation move and the $\tau$-updating.
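The accounting above can be checked with a toy simulation: a move whose single attempt costs $t$ units, attempted with probability $t^{-\varepsilon}$, has expected per-step cost $t^{1-\varepsilon}$. Everything in this sketch ($\varepsilon = 0.5$, the cost units) is an arbitrary illustration.

```python
import random

random.seed(2)

def expected_move_cost(t, eps):
    """Expected per-step cost of an O(t) move attempted with probability t^(-eps)."""
    return t * t ** (-eps)

def attempt_move(t, eps):
    """Schedule the expensive move at time t with probability t^(-eps)."""
    return random.random() < t ** (-eps)

# closed form t^(1-eps): approximately [10.0, 100.0] for these inputs
costs = [expected_move_cost(t, 0.5) for t in (100, 10000)]

# Monte Carlo check at t = 100: realized average cost over many steps
trials = 200_000
avg = sum(100 for _ in range(trials) if attempt_move(100, 0.5)) / trials
```

The realized average cost concentrates around the closed-form value, which grows strictly slower than the $O(t)$ cost of attempting the move at every step.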
5.3. Experiment
A series of 150 observations was generated from model (1)–(2), with initial conditions $x_1 = 0.01$, $y_1 = 20$, $\dot x_1 = 0.002$ and $\dot y_1 = 0.06$, and by fixing $\tau = 0.000\,001$ and $\sigma = 0.005$. These data are shown in Fig. 1. The simulated data were sequentially incorporated into the resample–move filter. In running the algorithm, we chose $p(\theta_1)$ to represent a proper informative prior on the ship's initial condition, slightly displaced with respect to the true value of $\tau$ and with respect to the initial co-ordinates $x_1$ and $y_1$, but not with respect to the true initial velocities $\dot x_1$ and $\dot y_1$. Details on this prior, as well as on other aspects of this experiment, can be found in Berzuini and Gilks (2000). We set $\sigma$ equal to 0.005. The number of particles was kept equal to 100 over the entire filtering process.
On the same data, with the same number of particles, we ran a standard SIR filter. This filter, developed by Gordon et al. (1993), is based on principles contained in Rubin (1988), as an alternative to MCMC sampling in dynamic systems. The SIR filter can be described as a special case of our resample–move algorithm, in which there is no move step, so that no parameter value is changed once it has been sampled.
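The SIR recursion just described (sample from the transition prior, weight by the likelihood, resample with replacement, no move step) can be written generically in a few lines. The one-dimensional model at the bottom is a toy stand-in, not the bearings-only model.

```python
import math
import random

random.seed(3)

def sir_step(particles, z_t, transition, likelihood):
    """One iteration of the SIR filter of Gordon et al. (1993): propagate,
    weight by the likelihood of the new observation, then resample with
    replacement.  There is no move step, so a value never changes once
    sampled -- the source of the impoverishment seen in the experiment."""
    propagated = [transition(p) for p in particles]
    weights = [likelihood(z_t, p) for p in propagated]
    total = sum(weights)
    return random.choices(propagated,
                          weights=[w / total for w in weights],
                          k=len(particles))

# toy one-dimensional state-space model, purely illustrative
transition = lambda x: x + random.gauss(0.0, 0.1)
likelihood = lambda z, x: math.exp(-0.5 * ((z - x) / 0.2) ** 2)

particles = [random.gauss(0.0, 1.0) for _ in range(500)]
for z in (0.1, 0.2, 0.3):
    particles = sir_step(particles, z, transition, likelihood)
```

Inserting a Markov chain move between resampling steps turns this loop into the resample–move scheme of the present paper.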
5.4. Results
Fig. 2 compares the performance of the resample±move algorithm and SIR on the same data,
using the same number of particles. In Fig. 2, x and y represent east and north respectively.
Fig. 2(a) shows results from the resample±move algorithm and Fig. 2(b) the results from SIR.
In both plots the ship's true trajectory is represented by a chain of circles, a ®lled square
indicating the ®nal position. In both plots the output of the ®lter is represented by a piecewise
linear curve whose break points represent the estimated means of the ®ltering distribution of
…xt , yt †, for t ˆ 1, 2, . . ., with arrows indicating time ordering. At time t ˆ 1, the estimated
position of the ship di€ers markedly from the truth as a consequence of the prior. Then the
resample±move method uses the information contained in the incoming data to provide a fair
particle approximation of the evolving path, whereas problems of particle impoverishment
attend the SIR ®lter. Note that the piecewise linear curve in each plot does not represent a
realization of the ship's trajectory, but rather the sequence of ®ltered means.
Fig. 2. Comparing the performances of (a) the resample–move method and (b) a standard SIR filter: ○, the ship's true trajectory; ■, final position; ——, the filter's output (see the text for further explanation)
Fig. 3 compares the performance of the resample–move method and SIR in estimating the static parameter $\tau$. Fig. 3(a) suggests that the resample–move algorithm satisfactorily tracks the evolution of the posterior mean for $\tau$, until, in the end, this mean comes very close to the true value. By contrast, according to Fig. 3(b), SIR appears unable to track the posterior mean satisfactorily. The problem with SIR is that the set of particles propagated is left without low values of $\tau$ after a few resampling stages. Such a loss prevents the method from adapting to the gradual decline of the posterior mean of $\tau$. Of course, this also means that forecasts of the future path of the ship would correspondingly underestimate the uncertainty involved. The superiority of the resample–move algorithm from this point of view lies in the fact that the sampled values of $\tau$ are occasionally moved. Most methods proposed in the literature, including the algorithms proposed by Liu and Chen (1998), Berzuini et al. (1997) and Gordon et al. (1993), lack the ability to update the sampled values of the static parameters, once they have been sampled, unless ad hoc modifications are employed. Therefore we expect that the performance of these methods in estimating $\tau$ in our bearings-only example would not be much dissimilar to that of SIR.
6. Discussion
The resample–move algorithm provides a framework for dynamic Monte Carlo Bayesian inference and prediction. In applying this framework, there are many implementational details which must be decided, in particular the particle set sizes $n_k$, when to rejuvenate, which parameters to move and which Markov chain kernels to use to obtain rapid mixing. We cannot at this stage make general recommendations regarding these choices, and in most applications some experimentation will be required. Indeed, the implementation of the resample–move algorithm may be more demanding than for MCMC sampling, as an effective strategy must be devised for a wide collection of target distributions.
Fig. 3. Comparing the performances of (a) the resample–move method and (b) SIR in estimating the static parameter $\tau$

The theory of Section 3.4 provides a CLT for large particle set sizes $n_t$. It does not comment on limiting behaviour at large times $t$. This would depend on the rate and type of information arriving, and whether this information is accompanied by new unobserved variables. This area deserves further research.
As noted in Section 1, the target distribution $\pi_t$ need not be a posterior distribution. For example, if we are concerned with events in the tails of the process, then the resample–move algorithm could be run with $\pi_t$ replaced with some heavier-tailed sequence of distributions, with an appropriate adjustment to the importance weights.
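The weight adjustment alluded to is the standard importance-sampling identity: drawing from a heavier-tailed $\eta$ instead of $\pi$ multiplies each weight by $\pi/\eta$. A self-contained one-dimensional illustration follows; the target, the proposal and the tail event are all arbitrary choices, unrelated to the tracking model.

```python
import math
import random

random.seed(4)

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

# Estimate P(X > 4) for X ~ N(0, 1) by sampling from the heavier-tailed
# standard Cauchy and correcting each draw by the density ratio
# normal/cauchy -- the "appropriate adjustment to the importance weights".
n = 200_000
total = 0.0
for _ in range(n):
    x = math.tan(math.pi * (random.random() - 0.5))   # standard Cauchy draw
    if x > 4.0:
        total += normal_pdf(x) / cauchy_pdf(x)        # importance weight
estimate = total / n   # the true value, 1 - Phi(4), is about 3.17e-5
```

A plain Monte Carlo estimate with the same sample size would see only a handful of exceedances; the heavier-tailed proposal visits the tail often and lets the weights do the correction.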
Acknowledgements
We thank Andrew Gelman for insightful comments on our methodology and the Consiglio
Nazionale delle Ricerche Institute of Numerical Analysis of Pavia for computing support.
We are also indebted to three referees, each of whom provided exceptionally incisive and constructive reports on our original manuscript.
Appendix A
Throughout we use $k$, $l$ and $t$ to denote non-negative integers. Define $q_0(\cdot, \cdot) \equiv \pi_1(\cdot)$. Let $E_{q_k}$ denote expectation with respect to $q_k(\theta_k, \cdot)$. Thus $E_{q_k}$ is a function of $\theta_k$, although we suppress this dependence in the notation. For any $k$ and $t$ such that $0 \le k \le t$ and $t > 0$, define
$$
\tilde g_{kt} = \begin{cases}
\displaystyle \sum_{j=1}^{n_k} I^{(j)}_{kt}(g) \Big/ \sum_{j=1}^{n_k} I^{(j)}_{kt}(1), & 1 \le k \le t, \\[1ex]
E_t[g], & k = 0,\ t > 0.
\end{cases} \tag{23}
$$
Note that this is consistent with the definition of $\tilde g_t$ given by equation (9), i.e. $\tilde g_t = \tilde g_{tt}$. Let $E_k$ denote expectation over sampling stage $k + 1$, conditional on the particles in $S_k$. Then
$$
E_k[g] = \begin{cases}
\displaystyle \sum_{j=1}^{n_k} w^{(j)}_k \int_{\theta_{k+1} \in \Theta} g(\theta_{k+1})\, q_k(\theta^{(j)}_k, d\theta_{k+1}) \Big/ \sum_{j=1}^{n_k} w^{(j)}_k, & k \ge 1, \\[1ex]
E_1[g], & k = 0.
\end{cases} \tag{24}
$$
Finally, let `$\Rightarrow$' denote convergence in distribution and `$\to_p$' denote convergence in probability. Let $n_1, \ldots, n_t \to \infty$ denote taking first $n_1 \to \infty$, then $n_2 \to \infty$, and so on.
Remark 1. We state, without proof, the algebraic identities
$$ I_{kl}\{I_{lt}(g)\} = I_{kt}(g), \qquad k \le l \le t, \tag{25} $$
$$ E_k[g] = \tilde g_{k, k+1}, \qquad k \ge 0. \tag{26} $$
Lemma 1. Let
$$ q^R_k(\theta_{k+1}, d\theta_k) = \frac{\pi_{k+1}(d\theta_k)\, q_k(\theta_k, d\theta_{k+1})}{\pi_{k+1}(d\theta_{k+1})}. \tag{27} $$
Then $q^R_k$ is the reverse Markov chain transition kernel corresponding to $q_k$ and has stationary measure $\pi_{k+1}$.

Proof. $\int_{\theta_k \in \Theta} q^R_k(\theta_{k+1}, d\theta_k) = 1$, using equation (5). Therefore $q^R_k(\theta_{k+1}, \cdot)$ is a probability measure for all $\theta_{k+1} \in \Theta$. Moreover, from equation (27),
$$ \int_{\theta_{k+1} \in \Theta} \pi_{k+1}(d\theta_{k+1})\, q^R_k(\theta_{k+1}, d\theta_k) = \int_{\theta_{k+1} \in \Theta} \pi_{k+1}(d\theta_k)\, q_k(\theta_k, d\theta_{k+1}) = \pi_{k+1}(d\theta_k). $$
Therefore $\pi_{k+1}$ is a stationary measure of $q^R_k$.
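On a finite state space the lemma is easy to verify numerically. In the sketch below the three-state kernel `q` is an arbitrary example; `pi` plays the role of $\pi_{k+1}$, and `qR` implements equation (27) as `qR(j, i) = pi(i) q(i, j) / pi(j)`.

```python
# A 3-state transition matrix q (rows sum to 1), chosen arbitrarily.
q = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
n = len(q)

# Stationary distribution of q by power iteration (pi = pi q at the fixed point).
pi = [1.0 / n] * n
for _ in range(500):
    pi = [sum(pi[i] * q[i][j] for i in range(n)) for j in range(n)]

# Reverse kernel: the discrete analogue of equation (27).
qR = [[pi[i] * q[i][j] / pi[j] for i in range(n)] for j in range(n)]

# Each row of qR sums to 1 (it is a transition kernel) ...
row_sums = [sum(row) for row in qR]
# ... and pi is stationary for qR as well.
pi_after = [sum(pi[j] * qR[j][i] for j in range(n)) for i in range(n)]
```

Both checks hold to machine precision, mirroring the two steps of the proof.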
Corollary 1. For $1 \le k \le t$,
$$
I_{kt}(g) = \begin{cases}
\displaystyle \int_{\theta_{k+1}, \ldots, \theta_t \in \Theta} g(\theta_t)\, \frac{\pi_t(d\theta_t)}{\pi_k(d\theta_k)} \prod_{l=k+1}^{t} q^R_{l-1}(\theta_l, d\theta_{l-1}), & k < t, \\[1ex]
g(\theta_t), & k = t.
\end{cases}
$$
Proof. Substitute equation (27) into expression (10).
Remark 2. The following two identities follow directly from corollary 1:
$$ E_k[I_{kt}(g)] = E_t[g], \qquad 0 < k \le t, \tag{28} $$
$$ \frac{E_k[I_{k+1, t}(g)]}{E_k[I_{k+1, t}(1)]} = \tilde g_{kt}, \qquad 0 \le k < t. \tag{29} $$
Lemma 2. Suppose that $q_k$ is perfectly mixing, i.e. $q_k(\theta_k, d\theta_{k+1}) = \pi_{k+1}(d\theta_{k+1})$. Then $V^*_{l-1, t}(g) = 0$ for all $l \le k \le t$.

Proof. Substituting $\pi_{k+1}$ for $q_k$ in expression (10) leads to
$$ I_{kt}(h) = w_k(\theta_k)\, E_{k+1}[I_{k+1, t}(h)] = w_k(\theta_k)\, E_t[h], \tag{30} $$
by equation (28). Now from equations (11) and (25), for $l \le k$,
$$ V^*_{l-1, t}(g) = E_l[I^2_{lt}(g - E_t[g])] = E_l[[I_{lk}\{I_{kt}(g - E_t[g])\}]^2] = E_l[[I_{lk}\{w_k(\theta_k)\, E_t[g - E_t[g]]\}]^2], $$
from equation (30). But $E_t[g - E_t[g]] = 0$. Hence $V^*_{l-1, t}(g) = 0$ for all $l \le k$.
Lemma 3. For $1 \le k \le t$, suppose that

(a) $\sup_{\theta_{k-1} \in \Theta} \{E_{q_{k-1}}[I^2_{kt}(g)]\} < \infty$ and
(b) $\sup_{\theta_{k-1} \in \Theta} \{E_{q_{k-1}}[I^2_{kt}(1)]\} < \infty$,

where the suprema should be ignored for $k = 1$. Then
$$ \frac{\tilde g_{kt} - \tilde g_{k-1, t}}{\{(1/n_k)\, V_{k-1, t}(g)\}^{1/2}} \Rightarrow N(0, 1) \qquad \text{as } n_k \to \infty, $$
where
$$ V_{k-1, t}(g) = \frac{1}{E^2_{k-1}[I_{kt}(1)]} \left\{ E_{k-1}[I^2_{kt}(g)] - \frac{2\, E_{k-1}[I_{kt}(g)]\, E_{k-1}[I_{kt}(g)\, I_{kt}(1)]}{E_{k-1}[I_{kt}(1)]} + \frac{E^2_{k-1}[I_{kt}(g)]\, E_{k-1}[I^2_{kt}(1)]}{E^2_{k-1}[I_{kt}(1)]} \right\}. \tag{31} $$

Proof. For $0 < k \le t$, $\tilde g_{kt}$ has the form of a ratio of sums of conditionally independent, identically distributed, random variables. Lemma 3 then follows from the delta method, as in the lemma of Berzuini et al. (1997), noting that $\tilde g_{k-1, t}$ is the ratio of expected values of these sums, from equation (29). The integrability conditions of the lemma of Berzuini et al. (1997) hold since $E_{k-1}[I^2_{kt}(g)]$ has the form of a weighted average, from expression (24), and is therefore less than or equal to $\sup_{\theta_{k-1} \in \Theta} \{E_{q_{k-1}}[I^2_{kt}(g)]\}$, which is finite by assumption; similarly for $E_{k-1}[I^2_{kt}(1)]$.
Lemma 4. Suppose that conditions (a) and (b) of lemma 3 hold for all $k = 1, \ldots, t$. Then $E_{t-1}[g] \Rightarrow E_t[g]$ as $n_1, \ldots, n_{t-1} \to \infty$.

Proof. From lemma 3, $\tilde g_{kt} \Rightarrow \tilde g_{k-1, t}$ as $n_k \to \infty$, for all $k \le t$. Therefore, by induction, $\tilde g_{kt} \Rightarrow \tilde g_{0t}$ as $n_1, \ldots, n_k \to \infty$, and in particular $\tilde g_{t-1, t} \Rightarrow \tilde g_{0t}$ as $n_1, \ldots, n_{t-1} \to \infty$. Now, from equation (26), $\tilde g_{t-1, t} = E_{t-1}[g]$, and $\tilde g_{0t} = E_t[g]$, by definition (23): hence the result.
Lemma 5. Suppose that, for given $l$ and $t$ satisfying $1 < l \le t$, and for all $k < l$, the following conditions hold:

(a) $\sup_{\theta_{k-1} \in \Theta} (E_{q_{k-1}}[I^2_{kl}\{I^2_{lt}(g)\}]) < \infty$,
(b) $\sup_{\theta_{k-1} \in \Theta} (E_{q_{k-1}}[I^2_{kl}\{I^2_{lt}(1)\}]) < \infty$,
(c) $\sup_{\theta_{k-1} \in \Theta} (E_{q_{k-1}}[I^2_{kl}\{I_{lt}(g)\, I_{lt}(1)\}]) < \infty$,
(d) $\sup_{\theta_{k-1} \in \Theta} (E_{q_{k-1}}[I^2_{kl}\{I_{lt}(g)\}]) < \infty$,
(e) $\sup_{\theta_{k-1} \in \Theta} (E_{q_{k-1}}[I^2_{kl}\{I_{lt}(1)\}]) < \infty$.

Then $V_{l-1, t}(g) \Rightarrow V^*_{l-1, t}(g)$ as $n_1, \ldots, n_{l-1} \to \infty$, where $V_{l-1, t}(g)$ is given by equation (31) and $V^*_{l-1, t}(g)$ is given by equation (11).

Proof. The conditions ensure that $E_{l-1}[I^2_{lt}(g)] \Rightarrow E_l[I^2_{lt}(g)]$ as $n_1, \ldots, n_{l-1} \to \infty$, from lemma 4. Similar results follow for $E_{l-1}[I^2_{lt}(1)]$, etc. Substituting these results into equation (31) gives
$$ V^*_{l-1, t}(g) = E_l[I^2_{lt}(g)] - 2\, E_l[I_{lt}(g)\, I_{lt}(1)]\, E_t[g] + E_l[I^2_{lt}(1)]\, E^2_t[g], $$
using equation (28). Rearranging this expression gives $V^*_{l-1, t}(g) = E_l[I^2_{lt}(g - E_t[g])]$, as in equation (11).
Theorem 2. Assume that conditions (a)–(e) of lemma 5 hold for each $l = 1, \ldots, t$. Then for any $k = 1, \ldots, t$:
$$ z_{kt} = \frac{\tilde g_{kt} - E_t[g]}{\sqrt{\sum_{l=1}^{k} (1/n_l)\, V^*_{l-1, t}(g)}} \Rightarrow N(0, 1) \qquad \text{as } n_1, \ldots, n_k \to \infty. $$
Proof. Let
$$ z'_{kt} = \frac{\tilde g_{kt} - \tilde g_{k-1, t}}{\{(1/n_k)\, V_{k-1, t}(g)\}^{1/2}}, \qquad \kappa^2_{k-1, t} = \frac{V_{k-1, t}(g)}{V^*_{k-1, t}(g)}, \qquad r^2_{k-1, t} = \frac{(1/n_k)\, V^*_{k-1, t}(g)}{\sum_{l=1}^{k} (1/n_l)\, V^*_{l-1, t}(g)}. $$
Then
$$ z_{kt} = z'_{kt}\, r_{k-1, t}\, \kappa_{k-1, t} + z_{k-1, t} \sqrt{1 - r^2_{k-1, t}}. \tag{32} $$
Now $r_{k-1, t}$ is a constant and $\kappa_{k-1, t}$ does not depend on $S_k$. So, for any constant $a$, letting $i$ denote the imaginary number,
$$ E_{k-1}[\exp(i a z_{kt})] = \exp\{i a z_{k-1, t} \sqrt{1 - r^2_{k-1, t}}\}\, E_{k-1}[\exp(i a z'_{kt}\, r_{k-1, t}\, \kappa_{k-1, t})] \to \exp\{i a z_{k-1, t} \sqrt{1 - r^2_{k-1, t}} - \tfrac{1}{2} (a\, r_{k-1, t}\, \kappa_{k-1, t})^2\} \tag{33} $$
as $n_k \to \infty$, by the CLT for $z'_{kt}$ in lemma 3. Note that conditions (a) and (b) of lemma 3 are a special case of the conditions for theorem 2, obtained with $k = l$. Applying lemma 5 to expression (33) we obtain
$$ E_{k-1}[\exp(i a z_{kt})] \to \exp\{i a z_{k-1, t} \sqrt{1 - r^2_{k-1, t}} - \tfrac{1}{2} (a\, r_{k-1, t})^2\} \tag{34} $$
as $n_1, \ldots, n_k \to \infty$.

Now suppose that the assertion of this theorem is true for $k = l - 1$. Since
$$ |E_{l-1}[\exp(i a z_{lt})]| \le E_{l-1}[|\exp(i a z_{lt})|] = 1, $$
we have, by the bounded convergence theorem (e.g. Billingsley (1986), theorem 16.5), as $n_1, \ldots, n_l \to \infty$,
$$ E[E_{l-1}[\exp(i a z_{lt})]] \to E[\exp\{i a z_{l-1, t} \sqrt{1 - r^2_{l-1, t}} - \tfrac{1}{2} (a\, r_{l-1, t})^2\}] = \exp(-\tfrac{1}{2} a^2), $$
by the assertion. Therefore, by the continuity theorem (e.g. Billingsley (1986), theorem 26.3), $z_{lt} \Rightarrow N(0, 1)$ as $n_1, \ldots, n_l \to \infty$. Therefore, if the assertion is true for $k = l - 1$, it is also true for $k = l \le t$. Finally, for $k = 1$, $z_{kt} = z'_{kt} \Rightarrow N(0, 1)$ as $n_1 \to \infty$, by lemma 3. Hence, by induction, the assertion is true for all $k = 1, \ldots, t$.
Corollary 2. With the conditions of theorem 2,
$$ \frac{\tilde g_t - E_t[g]}{\sqrt{\sum_{l=1}^{t} (1/n_l)\, V^*_{l-1, t}(g)}} \Rightarrow N(0, 1) \qquad \text{as } n_1, \ldots, n_t \to \infty, $$
where $\tilde g_t$ is defined in equation (9).

Proof. Put $k = t$ in theorem 2, noting that $\tilde g_t \equiv \tilde g_{tt}$.
Corollary 3. For any $k \le t$, $n_k^{-1} \sum_j w^{(j)}_{kt} \Rightarrow 1$ as $n_1, \ldots, n_k \to \infty$.

Proof. Replace $g$ by $w_{kt}$ in corollary 2.
Lemma 6. For $k \le l$, define
$$ Y^{(j)}_{kl}(g) = \frac{1}{n_l} \sum_{m=1}^{n_l} C^{(jm)}_{kl}\, g(\theta^{(m)}_l), \tag{35} $$
where $C^{(jm)}_{kl}$ is defined in expression (14). Then $Y^{(j)}_{k, k+1}(g) \Rightarrow n_k^{-1} I^{(j)}_{k, k+1}(g)$ as $n_1, \ldots, n_{k+1} \to \infty$.
Proof. By the law of large numbers, as $n_{k+1} \to \infty$,
$$ Y^{(j)}_{k, k+1}(g) \Rightarrow E[C^{(j, m)}_{k, k+1}\, g(\theta^{(m)}_{k+1})] = \frac{w^{(j)}_k}{\sum_{l=1}^{n_k} w^{(l)}_k} \int g(\theta_{k+1})\, q_k(\theta^{(j)}_k, d\theta_{k+1}) = \frac{n_k^{-1}\, I^{(j)}_{k, k+1}(g)}{n_k^{-1} \sum_{l=1}^{n_k} w^{(l)}_k}. $$
Now $n_k^{-1} \sum_l w^{(l)}_k \Rightarrow 1$ as $n_1, \ldots, n_k \to \infty$, by corollary 3: hence the result.
Lemma 7. For any $k \le l$, $Y^{(j)}_{kl}(g) \Rightarrow n_k^{-1} I^{(j)}_{kl}(g)$ as $n_1, \ldots, n_l \to \infty$, where $Y^{(j)}_{kl}(g)$ is defined in equation (35).
Proof. The assertion is true trivially for $k = l$. Suppose that the assertion is true for a given $k$, $2 \le k \le l$. Noting that
$$ C^{(jm)}_{k-1, l} = \sum_{i=1}^{n_k} C^{(ji)}_{k-1, k}\, C^{(im)}_{kl}, $$
it follows that
$$ Y^{(j)}_{k-1, l}(g) = \sum_{i=1}^{n_k} C^{(ji)}_{k-1, k}\, Y^{(i)}_{kl}(g) \Rightarrow \sum_{i=1}^{n_k} C^{(ji)}_{k-1, k}\, \frac{1}{n_k} I^{(i)}_{kl}(g) \qquad \text{as } n_1, \ldots, n_l \to \infty, $$
by assertion,
$$ \Rightarrow \frac{1}{n_{k-1}} I^{(j)}_{k-1, k}\{I_{kl}(g)\} \qquad \text{as } n_1, \ldots, n_l \to \infty, $$
by lemma 6 (with $g$ replaced by $I_{kl}(g)$),
$$ = \frac{1}{n_{k-1}} I^{(j)}_{k-1, l}(g), $$
by equation (25). Thus, if the assertion is true for a given $k$, it is also true for $k - 1$. Hence, by induction, the assertion is true for all positive $k \le l$.
Lemma 8. For any $k \le t$, $\hat I^{(j)}_{kt}(g) \Rightarrow I^{(j)}_{kt}(g)$ as $n_1, \ldots, n_t \to \infty$, where $\hat I^{(j)}_{kt}(g)$ is defined in equation (15).

Proof. The result is obtained by applying lemma 7 to equation (15).
Lemma 9. For $l \le t$, $\hat V^*_{l-1, t}(g) \Rightarrow V^*_{l-1, t}(g)$ as $n_1, \ldots, n_t \to \infty$, where $\hat V^*_{l-1, t}(g)$ is defined in equation (16) and $V^*_{l-1, t}(g)$ is defined in equation (11).

Proof.
$$ \hat V^*_{l-1, t}(g) \Rightarrow \frac{1}{n_l} \sum_{j=1}^{n_l} I^{(j)}_{lt}(g - E_t[g])^2 \qquad \text{as } n_1, \ldots, n_t \to \infty, $$
by corollary 2 and lemma 8,
$$ \Rightarrow E_l[I^2_{lt}(g - E_t[g])] \qquad \text{as } n_1, \ldots, n_t \to \infty, $$
by corollary 2. The result then follows with equation (11).
Theorem 3.
$$ \frac{\hat V_t(g)}{\operatorname{var}(\tilde g_t)} \Rightarrow 1 \qquad \text{as } n_1, \ldots, n_t \to \infty, $$
where $\hat V_t(g)$ is defined in equation (17).

Proof. From equation (17),
$$ \frac{\hat V_t(g)}{\operatorname{var}(\tilde g_t)} = \frac{\sum_{l=1}^{t} n_l^{-1}\, \hat V^*_{l-1, t}(g)}{\sum_{l=1}^{t} n_l^{-1}\, V^*_{l-1, t}(g)}\; \frac{\sum_{l=1}^{t} n_l^{-1}\, V^*_{l-1, t}(g)}{\operatorname{var}(\tilde g_t)} \Rightarrow 1 \qquad \text{as } n_1, \ldots, n_t \to \infty, $$
by lemma 9 and corollary 2.
References
Bergman, N. (1999) Recursive Bayesian estimation, navigation and tracking applications. Dissertation 579, Department of Electrical Engineering, Linköping University, Linköping.
Berzuini, C., Best, N., Gilks, W. R. and Larizza, C. (1997) Dynamic conditional independence models and Markov chain Monte Carlo methods. J. Am. Statist. Ass., 92, 1403–1412.
Berzuini, C. and Gilks, W. R. (2000) Resample–move filtering with cross-model jumps. In Sequential Monte Carlo in Practice (eds A. Doucet, J. F. G. de Freitas and N. J. Gordon). New York: Springer. To be published.
Billingsley, P. (1986) Probability and Measure. New York: Wiley.
Carpenter, J. R., Clifford, P. and Fearnhead, P. (1998) Building robust simulation-based filters for evolving data sets. Technical Report. Department of Statistics, University of Oxford, Oxford.
— (1999) Improved particle filter for nonlinear problems. IEE Proc. F, 146, 2–7.
Del Moral, P. and Guionnet, A. (1999) A central limit theorem for non-linear filtering using interacting particle systems. Ann. Appl. Probab., 9, 275–297.
Doucet, A. (1998) On sequential simulation-based methods for Bayesian filtering. Technical Report. University of Cambridge, Cambridge.
Geweke, J. (1989) Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57, 1317–1339.
Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. (eds) (1996) Markov Chain Monte Carlo in Practice. London: Chapman and Hall.
Gordon, N. J., Salmond, D. J. and Smith, A. F. M. (1993) Novel approach to non-linear non-Gaussian Bayesian state estimation. IEE Proc. F, 140, 107–113.
Handschin, J. E. and Mayne, D. Q. (1969) Monte Carlo techniques to estimate the conditional expectation in multi-stage non-linear filtering. Int. J. Control, 9, 547–559.
Isard, M. and Blake, A. (1996) Contour tracking by stochastic propagation of conditional density. In Computer Vision: Proc. ECCV '96 (eds B. Buxton and R. Cipolla). New York: Springer.
Kanazawa, K., Koller, D. and Russell, S. (eds) (1995) Proc. Conf. Uncertainty in Artificial Intelligence. San Mateo: Morgan Kaufmann.
Kong, A., Liu, J. S. and Wong, W. H. (1994) Sequential imputations and Bayesian missing data problems. J. Am. Statist. Ass., 89, 278–288.
Liu, J. S. and Chen, R. (1995) Blind deconvolution via sequential imputation. J. Am. Statist. Ass., 90, 567–576.
— (1998) Sequential Monte Carlo methods for dynamic systems. J. Am. Statist. Ass., 93, 1032–1044.
Pitt, M. K. and Shephard, N. (1999) Filtering via simulation: auxiliary particle filters. J. Am. Statist. Ass., to be published.
Robert, C. and Casella, G. (1999) Monte Carlo Statistical Methods. New York: Springer.
Rubin, D. B. (1988) Using the SIR algorithm to simulate posterior distributions. In Bayesian Statistics 3 (eds J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith), pp. 395–402. Oxford: Oxford University Press.
Shiga, T. and Tanaka, H. (1985) Central limit theorem for a system of Markovian particles with mean field interaction. Z. Wahrsch. Ver. Geb., 69, 439–459.
Smith, A. F. M. and Gelfand, A. E. (1992) Bayesian statistics without tears: a sampling–resampling perspective. Am. Statistn, 46, 84–88.
Sutherland, A. I. and Titterington, D. M. (1994) Bayesian analysis of image sequences. Technical Report 94-3. Department of Statistics, University of Glasgow, Glasgow.
Tierney, L. (1994) Markov chains for exploring posterior distributions (with discussion). Ann. Statist., 22, 1701–1786.
West, M. (1993) Mixture models, Monte Carlo, Bayesian updating and dynamic models. In Proc. 24th Symp. Computing Science and Statistics, pp. 325–333. Fairfax Station: Interface Foundation of North America.
Zaritskii, V. S., Shimelevich, L. I. and Svetnik, V. B. (1975) Monte Carlo technique in problems of optimal data processing. Autom. Remote Control, 12, 95–103.