The Alive Particle Filter and its use in Particle Markov chain Monte Carlo

BY PIERRE DEL MORAL1, AJAY JASRA2, ANTHONY LEE3, CHRISTOPHER YAU4,5 & XIAOLE ZHANG4

1 School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, 2052, AUS. E-Mail: [email protected]
2 Department of Statistics & Applied Probability, National University of Singapore, Singapore, 117546, SG. E-Mail: [email protected]
3 Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK. E-Mail: [email protected]
4 Department of Mathematics, Imperial College London, London, SW7 2AZ, UK. E-Mail: [email protected], [email protected]
Abstract
In the following article we investigate a particle filter for approximating Feynman-Kac models with indicator potentials, and we use this algorithm within Markov chain Monte Carlo (MCMC) to learn the static parameters of the model. Examples of such models include approximate Bayesian computation (ABC) posteriors associated with hidden Markov models (HMMs) and rare-event problems. Such models require advanced particle filter or MCMC algorithms, e.g. [13], to perform estimation. One of the drawbacks of existing particle filters is that they may 'collapse', in that the algorithm may terminate early due to the indicator potentials. In this article, using a newly developed special case of the locally adaptive particle filter in [14], which is closely related to [16], we employ an algorithm which deals with this latter problem, at the price of a random cost per time step. In particular, we show how this algorithm can be used within MCMC, via particle MCMC [2]. It is established that, when computational time is not taken into account, the new MCMC algorithm applied to a simplified model has a lower asymptotic variance in comparison to a standard particle MCMC algorithm. Numerical examples are presented for ABC approximations of HMMs.
Key Words: Particle Filters, Markov Chain Monte Carlo, Feynman-Kac Formulae.
1 Introduction
Let $\{(E_n,\mathcal{E}_n)\}_{n\geq 1}$ be a sequence of measurable spaces, $\{G_n(x)=\mathbb{I}_{B_n}(x)\}_{n\geq 1}$, $(x,B_n)\in E_n\times\mathcal{E}_n$, $B_n\subset E_n$, be a sequence of indicator potentials and $\{M_n: E_{n-1}\times\mathcal{E}_n\to[0,1]\}_{n\geq 1}$, with $x_0\in E_0$ a fixed point, be a sequence of Markov kernels. Then for the collection of bounded and measurable functions $\varphi\in\mathcal{B}_b(E_n)$ the $n$-time Feynman-Kac marginal is:
$$
\eta_n(\varphi) := \frac{\gamma_n(\varphi)}{\gamma_n(1)}, \quad n\geq 1,
$$
assuming that $\gamma_n(\varphi)=\mathbb{E}_{x_0}\big[\varphi(X_n)\prod_{p=1}^{n-1}G_p(X_p)\big]$ is well-defined, where $\mathbb{E}_{x_0}[\cdot]$ is the expectation w.r.t. the law of an inhomogeneous Markov chain with transition kernels $\{M_n\}_{n\geq 1}$. Such models appear routinely in the statistics and applied probability literature including:
• ABC approximations (as in, e.g., [11])
• ABC approximations of HMMs (e.g. [8, 13, 17])
• Rare-Events problems (as in, e.g. [6])
In some scenarios we will be interested in a static parameter $\theta\in\Theta$ and Bayesian estimation: allowing $M_n$ to depend on $\theta$, we are interested in the density proportional to $\gamma_{n,\theta}(1)\pi(\theta)$, where $\pi(\theta)$ is a prior on the static parameter. In order to perform estimation for such models, one often has to resort to numerical methods such as particle filters or MCMC; see the aforementioned references.
Supposing $\theta$ is fixed, the basic particle filter at time $n$, given a collection of $N\geq 1$ samples with nonzero potential on $E_{n-1}$, will generate samples on $E_n$ using the Markov kernels $\{M_n\}_{n\geq 1}$ and then sample with replacement amongst $\{x_n^i\}_{1\leq i\leq N}$ according to the normalized weights $G_n(x_n^i)/\sum_{j=1}^N G_n(x_n^j)$. The key issue with this basic particle filter is that, at any given time, there is no guarantee that any sample $x_n^i$ lies in $B_n$, and in some challenging scenarios the algorithm can 'die out' (or collapse); that is, all of the samples have zero potentials (see [10] for examples of such algorithms, which can work well).

5. Yau is currently affiliated with the Wellcome Trust, University of Oxford.

From an inference perspective, this is clearly an
undesirable property and can lead to some poor performance. For some classes of examples, e.g. [6, 11], there are
some adaptive techniques which can reduce the possibility of the algorithm collapsing, but these are not always
guaranteed to work in practice. In this article we consider the alive particle filter, developed in [15] and [1]. This
algorithm uses the same sampling mechanism, but the samples are generated until there is a prespecified number
that are alive. This removes the possibility that the algorithm can collapse, but introduces a random cost per
time-step.
When θ is fixed, under assumptions, we establish the following results for the alive particle filter:
1. The estimate of γn (ϕ) computed using the proposed filter is unbiased
2. The relative variance of the particle filter estimates of γn (ϕ), assuming N = O(n), is shown to grow linearly
in n.
Note that a proof of point 1 is given in [1], but our proof is original and particularly useful for developing further results such as central limit theorems (CLTs), which cannot easily be obtained using the approach in [1]. The results are of particular
interest when using the alive particle filter within MCMC methodology (a particle MCMC (PMCMC) algorithm,
[2]), for inferring θ as we now explain. PMCMC algorithms enable one to sample from the density proportional
to γn,θ (1)π(θ), using a particle filter. This is because, for any fixed θ, the standard particle filter provides an
unbiased estimate of γn,θ (1) ([9, Theorem 7.4.2]), so denoting all the variables generated by a particle filter as u
with probability density Ψn,θ (u), one can produce a computable and unbiased estimate γn,θ (1, u) of γn,θ (1). The
essence of the PMCMC method is to proceed by defining an auxiliary target proportional to $\gamma_{n,\theta}(1,u)\Psi_{n,\theta}(u)\pi(\theta)$ and using MCMC to sample from it; as $\int \gamma_{n,\theta}(1,u)\Psi_{n,\theta}(u)\,du = \gamma_{n,\theta}(1)$, this produces samples from the target of
interest. As the alive particle filter also produces an unbiased estimate of γn,θ (1) it can be used instead of the
standard particle filter and as noted above, one expects superior performance from an empirical perspective. This
is a new MCMC algorithm, to the best of our knowledge. There is a variety of applications of such PMCMC
algorithms, for example, when performing static parameter estimation for ABC approximations of HMMs. The
results in points 1 and 2 not only allow one to construct new PMCMC algorithms, but also provide theoretical guidelines for their implementation. We also give a result that, when computational time is not taken into account, the new MCMC algorithm applied to a simplified model has a lower asymptotic variance in comparison to a standard particle MCMC algorithm (i.e. one using the standard particle filter).
The structure of this article is as follows: In Section 2 we provide a motivating example, ABC approximations of
HMMs, for the construction of the particle filter, as well as the alive particle filter itself. In Section 3 our theoretical
results are provided along with some interpretation of their meaning. In Section 4 we develop a PMCMC algorithm
using the guidelines in Section 3 for static parameter estimation associated to ABC approximations of HMMs. In
Section 5 we implement the alive particle filter for the motivating example as well as PMCMC. In Section 6 the
article is concluded, with some discussion of future work. The appendix contains technical results for the theory in
Section 3.
2 Motivating Example and Algorithm
Throughout, $\theta$ is fixed unless otherwise stated; where stated, we place a prior on $\theta$ and it is then random.
2.1 Motivating Example
We are given an HMM with observations $\{Y_n\}_{n\geq 1}$, $Y_n\in\mathsf{Y}\subseteq\mathbb{R}^{d_y}$, and hidden states $\{Z_n\}_{n\geq 0}$, $Z_n\in\mathsf{Z}\subseteq\mathbb{R}^{d_z}$, with $Z_0$ given. We assume:
$$
\mathbb{P}(Y_n\in A\,|\,\{Z_n\}_{n\geq 0}) = \int_A g_\theta(y|z_n)\,dy, \quad n\geq 1,
$$
and
$$
\mathbb{P}(Z_n\in A\,|\,\{Z_k\}_{0\leq k\leq n-1}) = \int_A f_\theta(z|z_{n-1})\,dz, \quad n\geq 1,
$$
with $\theta\in\Theta$ a static parameter and $dy$, $dz$ denoting Lebesgue measure.
We assume $g_\theta(y|z_n)$ is unknown (up to a positive, unbiased estimate), but that one can sample from the associated distribution. In this scenario, one cannot apply a standard particle filter (or many other numerical approximation schemes). [8, 13] develop the following ABC approximation of the joint smoothing density, for $\epsilon>0$:
$$
\pi_\theta^\epsilon(z_{1:n}|y_{1:n}) = \frac{\prod_{k=1}^n g_\theta^\epsilon(y_k|z_k)\, f_\theta(z_k|z_{k-1})}{\int_{\mathsf{X}^n} \prod_{k=1}^n g_\theta^\epsilon(y_k|z_k)\, f_\theta(z_k|z_{k-1})\, dz_{1:n}} \qquad (1)
$$
where
$$
g_\theta^\epsilon(y_k|z_k) = \frac{\int_{B_\epsilon(y_k)} g_\theta(u|z_k)\,du}{\int_{B_\epsilon(y_k)} du}
$$
and $B_\epsilon(y_k)$ is the open ball centered at $y_k$ with radius $\epsilon$.
We let $\theta$ be fixed and omit it from our notations; it is reintroduced later on. We introduce a Feynman-Kac representation of the ABC approximation described above. Let $E_n = E = \mathsf{Z}\times\mathsf{Y}$ and define, for $n\geq 1$, $G_n: E\to\{0,1\}$:
$$
G_n(x) = \mathbb{I}_{\mathsf{Z}\times B_\epsilon(y_n)}(x)
$$
with $x=(z,u)$. Now introduce Markov kernels $\{M_n\}_{n\geq 1}$, $M_n: E\times\mathcal{B}(\mathsf{Z}\times\mathsf{Y})\to[0,1]$ ($\mathcal{B}(\cdot)$ are the Borel sets), with
$$
M_n(x, dx') = f(z'|z)\,g(u'|z')\,du'\,dz'.
$$
Then the ABC predictor is, for $n\geq 1$:
$$
\eta_n(\varphi) := \frac{\gamma_n(\varphi)}{\gamma_n(1)}, \qquad (2)
$$
where $\varphi\in\mathcal{B}_b(E)$ and
$$
\gamma_n(\varphi) = \mathbb{E}_{x_0}\Big[\prod_{p=1}^{n-1} G_p(X_p)\,\varphi(X_n)\Big] = \int_{E^n} \prod_{p=1}^{n-1} G_p(x_p)\,\varphi(x_n)\prod_{p=1}^{n} M_p(x_{p-1}, dx_p). \qquad (3)
$$
This provides a concrete example of the Feynman-Kac model in Section 1. In light of (2), we henceforth refer
to γn (1) as the normalizing constant. This quantity is of fundamental importance in a wide variety of statistical
applications, notably in static parameter estimation, as it is equivalent to the marginal likelihood of the observed
data Y1 , . . . , Yn−1 in contexts such as the ABC approximation presented above, as can be determined from (3).
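To make the Feynman-Kac ingredients above concrete, the kernel $M_n$ and potential $G_n$ of the ABC-HMM can be sketched in a few lines. This is an illustrative construction, not code from the paper; `make_abc_model`, `f_sample` and `g_sample` are hypothetical names, and a one-dimensional observation is assumed.

```python
import numpy as np

def make_abc_model(f_sample, g_sample, y, eps):
    """Illustrative Feynman-Kac ingredients for the ABC-HMM.

    A state is x = (z, u): the hidden state z and a pseudo-observation u.
    M_n((z, u), .) draws z' ~ f(.|z) and u' ~ g(.|z');
    G_n((z, u)) = 1 if u lies in the ball B_eps(y_n), else 0.
    """
    def M_sample(n, x, rng):
        z, _ = x
        z_new = f_sample(z, rng)       # z' ~ f(.|z)
        u_new = g_sample(z_new, rng)   # u' ~ g(.|z')
        return (z_new, u_new)

    def G(n, x):
        _, u = x
        return 1.0 if abs(u - y[n - 1]) < eps else 0.0  # indicator of B_eps(y_n)

    return M_sample, G
```

These two callables are exactly the inputs a particle filter for this model needs: propagate with `M_sample`, weight with `G`; no evaluation of $g_\theta$ is ever required.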
2.2 Standard Particle Filter

Now define, for $n\geq 2$:
$$
\Phi_n(\eta_{n-1})(\varphi) = \frac{\eta_{n-1}(G_{n-1} M_n(\varphi))}{\eta_{n-1}(G_{n-1})},
$$
where $M_n(\varphi)(x) = \int \varphi(y) M_n(x,dy)$. The standard particle filter works by sampling $x_1^1,\dots,x_1^N$ i.i.d. from $M_1(x_0,\cdot)$, setting
$$
\eta_n^N(\varphi) = \frac{1}{N}\sum_{i=1}^N \varphi(x_n^i), \quad n\geq 1,
$$
and at times $n\geq 2$ sampling $x_n^1,\dots,x_n^N$ from $\Phi_n(\eta_{n-1}^N)(\cdot)$, assuming that the system has not died out.
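A minimal sketch of this standard filter for indicator potentials, including the check for die-out, might read as follows (illustrative code, not from the paper; `M_sample` and `G` are assumed callables representing the kernels $M_p$ and potentials $G_p$):

```python
import numpy as np

def standard_particle_filter(M_sample, G, n, N, x0, rng):
    """Standard particle filter for indicator potentials G_p(x) in {0, 1}.

    Returns the estimate of gamma_n(1) = prod_{p=1}^{n-1} eta_p^N(G_p),
    or None if the system dies out (all potentials zero at some step).
    """
    x = [M_sample(1, x0, rng) for _ in range(N)]   # time 1: i.i.d. from M_1(x0, .)
    log_gamma = 0.0
    for p in range(1, n):
        g = np.array([G(p, xi) for xi in x])
        if g.sum() == 0.0:       # every particle has zero potential:
            return None          # the algorithm has collapsed
        log_gamma += np.log(g.mean())              # accumulate eta_p^N(G_p)
        w = g / g.sum()
        idx = rng.choice(N, size=N, p=w)           # multinomial resampling
        x = [M_sample(p + 1, x[i], rng) for i in idx]  # propagate to time p+1
    return float(np.exp(log_gamma))
```

The early `return None` is exactly the failure mode the alive particle filter of the next subsection is designed to remove.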
2.3 Alive Particle Filter

We now discuss an idea which will prevent the particle filter from dying out; see also [14] and [16]. Throughout we assume that $M_n(x, B_n)$ is not known for each $x, n$; if it is known, then one can develop alternative algorithms. At time 1, we sample $x_1^1,\dots,x_1^{T_1}$ i.i.d. from $M_1(x_0,\cdot)$, where
$$
T_1 = \inf\Big\{n\geq N : \sum_{i=1}^n G_1(x_1^i) \geq N\Big\}.
$$
Then, define
$$
\eta_1^{T_1}(\varphi) = \frac{1}{T_1-1}\sum_{i=1}^{T_1-1} \varphi(x_1^i).
$$
Now, at time 2, sample $x_2^1,\dots,x_2^{T_2}$ conditionally i.i.d. from $\Phi_2(\eta_1^{T_1})(\cdot)$, where
$$
T_2 = \inf\Big\{n\geq N : \sum_{i=1}^n G_2(x_2^i) \geq N\Big\}.
$$
Algorithm 1 Alive Particle Filter
1. At time 1. For $j = 1, 2, \dots$ until $j =: T_1$ is reached such that $G_1(x_1^j) = 1$ and $\sum_{i=1}^j G_1(x_1^i) = N$:
   • Sample $x_1^j$ from $M_1(x_0,\cdot)$.
2. At time $1 < p \leq n$. For $j = 1, 2, \dots$ until $j =: T_p$ is reached such that $G_p(x_p^j) = 1$ and $\sum_{i=1}^j G_p(x_p^i) = N$:
   (a) Sample $a_{p-1}^j$ uniformly from $\{k\in\{1,\dots,T_{p-1}-1\} : G_{p-1}(x_{p-1}^k) = 1\}$.
   (b) Sample $x_p^j$ from $M_p(x_{p-1}^{a_{p-1}^j},\cdot)$.
This is continued until needed (i.e. with an obvious definition of T3 , T4 etc). The idea here is that, at every time
step, we retain N − 1 particles with non-zero weight, so that the algorithm never dies out, but with the additional
issue that the computational cost per time-step is a random variable. The procedure is described in Algorithm 1.
We note that the approach in [15] retains N alive particles, i.e. it differs only in step 2(a) of Algorithm 1 by sampling
instead ajp−1 uniformly on {k ∈ {1, . . . , Tp−1 } : Gp−1 (xkp−1 ) = 1}. This seemingly innocuous difference is, however,
crucial to the unbiasedness results we develop in the sequel.
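Algorithm 1 can be transcribed almost line by line (an illustrative sketch, not the authors' implementation; `M_sample` and `G` are assumed callables as before, and the sketch also accumulates the normalizing-constant estimate $\prod_p (N-1)/(T_p-1)$ discussed in Section 2.3.1):

```python
import numpy as np

def alive_particle_filter(M_sample, G, n, N, x0, rng):
    """Alive particle filter (Algorithm 1): sample at each step until the
    N-th particle with non-zero potential arrives; the final (T_p-th) sample
    is never resampled, leaving N - 1 usable alive particles."""
    # Time 1: sample from M_1(x0, .) until the N-th alive particle arrives.
    x, n_alive = [], 0
    while n_alive < N:
        x.append(M_sample(1, x0, rng))
        n_alive += int(G(1, x[-1]) == 1.0)
    log_gamma = 0.0
    for p in range(2, n + 1):
        T_prev = len(x)                               # T_{p-1}
        log_gamma += np.log((N - 1) / (T_prev - 1))   # eta_{p-1}^{T_{p-1}}(G_{p-1})
        # Alive particles among the first T_{p-1} - 1 (the last one is dropped).
        alive = [k for k in range(T_prev - 1) if G(p - 1, x[k]) == 1.0]
        x_new, n_alive = [], 0
        while n_alive < N:
            a = alive[rng.integers(len(alive))]       # step 2(a): uniform alive index
            x_new.append(M_sample(p, x[a], rng))      # step 2(b): propagate
            n_alive += int(G(p, x_new[-1]) == 1.0)
        x = x_new
    return x, float(np.exp(log_gamma))
```

The inner `while` loops never terminate early with zero alive particles, which is precisely how collapse is avoided, at the price of the random per-step cost $T_p$.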
2.3.1 Some Remarks

We remark that one can show [9] that, for $n\geq 2$, the normalizing constant is given by
$$
\gamma_n(1) = \prod_{p=1}^{n-1} \eta_p(G_p).
$$
Thus, a natural estimate of the normalizing constant is
$$
\gamma_n^{T_n}(1) = \prod_{p=1}^{n-1} \eta_p^{T_p}(G_p) = \prod_{p=1}^{n-1} \frac{N-1}{T_p-1}.
$$
We note that the estimates of ηn and γn are different from those considered in [16] (which uses all the samples, not
removing the last one). This is a critical point: in Proposition 3.1 we show that this estimate of the normalizing constant is unbiased, which is crucial for using this idea inside MCMC algorithms. In this direction, one uses the particle filter to help propose values and there is an accept/reject step; we discuss this approach in Section 4.
Beyond the fact that this filter will not die out, there is also a natural use of this idea in the context of our motivating example. One can envisage the arrival of an outlier or unusual datum; in such scenarios, the alive particle filter will (most likely) assign more computational effort to dealing with this issue, which is not something the standard filter is designed to do.
A final remark is as follows. In our example $B_n = \mathsf{Z}\times B_\epsilon(y_n)$ and so, as assumed throughout this article, $M_n(x, B_n)$ is not known for each $x, n$. This removes the possibility of changing measure to $\mathbb{Q}$ (in the formula for $\gamma_n(\cdot)$), with finite-dimensional marginal
$$
\mathbb{Q}_n(d(x_1,\dots,x_n)) = \prod_{p=1}^{n} \frac{M_p(x_{p-1}, dx_p)\,\mathbb{I}_{B_p}(x_p)}{M_p(x_{p-1}, B_p)};
$$
call the Markov kernels in the product $\widehat{M}_p$. This is because the new potential at time $n$ is exactly $M_n(x, B_n)$. However, one can simulate from $\widehat{M}_n$ and use an unbiased estimate of $M_n(x, B_n)$ for each particle. That is, we obtain samples $z^{(1)}, z^{(2)}$ from $\widehat{M}_p(x_{p-1}^i,\cdot)$ using $R$ samples (in total) from $M_p(x_{p-1}^i,\cdot)$ and then set $x_p^i = z^{(1)}$ (say) with associated weight $1/(R-1)$. This particular procedure would have a fixed number of particles with no possibility of collapse. Other than the algorithm being convoluted, some particles $x_{p-1}^i$ could be such that $\mathbb{E}[R]$ is prohibitively large, even though $\mathbb{E}[T_p]$ is not, which provides a reasonable argument against such a scheme.
3 Theoretical Results

We will now present some theoretical results for the particle filter in Section 2.3 (so that $\theta$ is fixed).
3.1 Assumptions and Notations

Define the following sequence of Markov kernels, for $n\geq 1$:
$$
\widehat{M}_n(x, dy) = \frac{M_n(x, dy)\, G_n(y)}{M_n(G_n)(x)}.
$$
We will make use of the following assumptions:

• ($\widehat{G}$): For each $n\geq 0$,
$$
\sup_{(x,y)\in B_n^2} \frac{M_{n+1}(G_{n+1})(x)}{M_{n+1}(G_{n+1})(y)} = \widehat{\delta}_n < \infty.
$$

• ($\widehat{M}_m$): There exists $m\geq 1$ such that for any $p\geq 1$ there is a $\widehat{\beta}_p^{(m)}\in[1,\infty)$ such that
$$
\widehat{M}_{p,p+m}(x, dz) \leq \widehat{\beta}_p^{(m)}\, \widehat{M}_{p,p+m}(y, dz) \quad \forall\, (x,y)\in B_p^2,
$$
where $\widehat{M}_{p,p+m} = \widehat{M}_{p+1}\widehat{M}_{p+2}\cdots\widehat{M}_{p+m}$.

The two conditions together are ($\widehat{H}_m$) in [7]; we also use the notation $\widehat{\delta}_p^{(m)} = \prod_{q=p}^{p+m-1}\widehat{\delta}_q$. These assumptions are exceptionally strong, but we remark that, for the scenario of interest, weaker conditions have not been used in the literature. Note in addition that, in the context of ABC, the assumptions are essentially qualitative, as verifying them is very difficult (even on compact state-spaces) since the likelihood density is typically intractable. However, we still expect the phenomena reported in the results below to hold in some practical situations. We again remark that our results are relevant for scenarios other than ABC.
Some notations are now given. For a probability measure $\mu\in\mathcal{P}(E)$ (the probability measures on $E$) and a bounded, measurable, real-valued function $\varphi\in\mathcal{B}_b(E)$, we write $\mu(\varphi) := \int_E \varphi(x)\mu(dx)$. For a non-negative operator $R(x,\cdot)$ on $\mathcal{B}_b(E)$ and $\varphi\in\mathcal{B}_b(E)$, $R(\varphi)(x) = \int_E \varphi(y)R(x,dy)$. Let $n\geq 2$; we will use the semigroup $Q_n(x,dy) = G_{n-1}(x)M_n(x,dy)$ and, for $1\leq p< n$,
$$
Q_{p,n}(\varphi)(x_p) = \int_{E^{n-p}} \varphi(x_n)\, Q_{p+1}(x_p, dx_{p+1})\times\cdots\times Q_n(x_{n-1}, dx_n),
$$
where $\varphi\in\mathcal{B}_b(E)$; when $p=n$, $Q_{n,n}$ is the identity operator. $\mathbb{E}$ denotes expectation w.r.t. the stochastic process which generates the algorithm, with corresponding probability $\mathbb{P}$. $\mathcal{F}_n$ is the filtration generated by the alive particle system up to time $n$. It is assumed that $\prod_{\emptyset} = 1$. Note the important formula $\gamma_n(\varphi) = \big[\prod_{q=1}^{n-1}\eta_q(G_q)\big]\eta_n(\varphi)$, $\varphi\in\mathcal{B}_b(E)$.
3.2 Unbiasedness

Define:
$$
\gamma_n^{T_n}(\varphi) := \Big[\prod_{p=1}^{n-1}\frac{N-1}{T_p-1}\Big]\,\eta_n^{T_n}(\varphi).
$$
The technical results used in this Section can be found in Appendix A.

Proposition 3.1. We have, for any $n\geq 1$, $N\geq 2$ and $\varphi\in\mathcal{B}_b(E_n)$, that
$$
\mathbb{E}[\gamma_n^{T_n}(\varphi)] = \gamma_n(\varphi).
$$

Proof. The proof uses the standard martingale difference decomposition in [9], with some additional expectation properties that need to be proved. The case $n=1$ follows from Lemma A.1, so we assume $n\geq 2$. We remark that for $p\in\{2,\dots,n\}$:
$$
\gamma_p^{T_p}(1)\,\Phi_p(\eta_{p-1}^{T_{p-1}})(Q_{p,n}(\varphi)) = \gamma_{p-1}^{T_{p-1}}(1)\,\eta_{p-1}^{T_{p-1}}(Q_{p-1,n}(\varphi))
$$
and hence that
$$
\gamma_n^{T_n}(\varphi) - \gamma_n(\varphi) = \sum_{p=1}^{n} \gamma_p^{T_p}(1)\big[\eta_p^{T_p} - \Phi_p(\eta_{p-1}^{T_{p-1}})\big](Q_{p,n}(\varphi)).
$$
Then by Lemma A.1 it follows that
$$
\mathbb{E}\big[\gamma_p^{T_p}(1)\big[\eta_p^{T_p} - \Phi_p(\eta_{p-1}^{T_{p-1}})\big](Q_{p,n}(\varphi))\,\big|\,\mathcal{F}_{p-1}\big] = 0
$$
and hence that
$$
\mathbb{E}[\gamma_n^{T_n}(\varphi) - \gamma_n(\varphi)] = 0,
$$
from which we easily conclude the result.
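Proposition 3.1 can be checked numerically in a simplified i.i.d. setting (the scenario also used in Section 4.3), where each potential $G_p(X)$ is an independent Bernoulli$(\nu(G))$ so that $\gamma_n(1) = \nu(G)^{n-1}$ exactly, and $T_p$ is $N$ plus a negative binomial number of failures. The sketch below uses assumed toy values and compares the Monte Carlo mean of $\prod_p (N-1)/(T_p-1)$ with the exact value:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, p_succ, reps = 4, 10, 0.3, 200_000   # toy values (assumed)

# T_p = number of i.i.d. Bernoulli(p_succ) trials needed for N successes:
# N successes plus a NegativeBinomial(N, p_succ) number of failures.
T = N + rng.negative_binomial(N, p_succ, size=(reps, n - 1))

# Independent draws of gamma_n^{T_n}(1) = prod_{p=1}^{n-1} (N-1)/(T_p-1).
draws = np.prod((N - 1) / (T - 1), axis=1)

exact = p_succ ** (n - 1)   # gamma_n(1) = nu(G)^{n-1} in this setting
print(draws.mean(), exact)  # the two values should agree closely
```

The agreement reflects the classical fact that $(N-1)/(T-1)$ is unbiased for a Bernoulli success probability under inverse binomial sampling; discarding the final (alive) sample, as in Algorithm 1, is exactly what delivers this unbiasedness.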
3.3 Non-Asymptotic Variance Theorem

Below, the term $\sum_{s=1}^n \widehat{\delta}_s^{(m)}\widehat{\beta}_s^{(m)}/\eta_s(G_s)$ is as in [7]. The expressions and interpretations for $\widehat{\delta}_s^{(m)}$, $\widehat{\beta}_s^{(m)}$ can be found in Section 3.1, as is the assumption ($\widehat{H}_m$). In addition, $(\eta_n^{T_n})^2$ is the $U$-statistic formed from our empirical measure $\eta_n^{T_n}$ and $(\eta_n^{T_n})^{\otimes 2}$ is the corresponding $V$-statistic. In addition, $(\gamma_n^{T_n})^{\otimes 2}(F) = \gamma_n^{T_n}(1)^2\, (\eta_n^{T_n})^{\otimes 2}(F)$ for $F\in\mathcal{B}_b(E^2)$.

Proposition 3.2. Assume ($\widehat{H}_m$). Then for any $n\geq 2$, $N\geq 3$:
$$
N > \sum_{s=1}^n \frac{\widehat{\delta}_s^{(m)}\widehat{\beta}_s^{(m)}}{\eta_s(G_s)} \;\Rightarrow\; \mathbb{E}\Big[\Big(\frac{\gamma_n^{T_n}(1)}{\gamma_n(1)} - 1\Big)^2\Big] \leq \frac{4}{N}\sum_{s=1}^n \frac{\widetilde{\delta}_s\,\widehat{\delta}_s^{(m)}\widehat{\beta}_s^{(m)}}{\eta_s(G_s)}.
$$
Proof. The result follows essentially from [7]. To modify the proof to our set-up, we will prove that, for $F: E^2\to\mathbb{R}^+$ (where the expectation on the L.H.S. is w.r.t. the stochastic process that generates the SMC algorithm),
$$
\mathbb{E}[(\gamma_n^{T_n})^{\otimes 2}(F)] \leq \Big(\frac{N-1}{N-2}\Big)^n\, \mathbb{E}_\xi\big[\eta_1^{\otimes 2} C_{\xi_1} Q_2^{\otimes 2} C_{\xi_2}\cdots Q_n^{\otimes 2} C_{\xi_n}(F)\big], \qquad (4)
$$
where for each $n\geq 1$, independently,
$$
\mathbb{P}_\xi(\xi_n = 1) = 1 - \mathbb{P}_\xi(\xi_n = 0) = \frac{1}{N-1},
$$
with corresponding joint expectation $\mathbb{E}_\xi$, and $C_1(F)(x,y) = F(x,x)$, $C_0(F)(x,y) = F(x,y)$. Once (4) is proved, this gives a verification of Lemma 3.2, eq. (3.3) of [7]; given this, the rest of the argument follows Proposition 3.4 of [7] and Theorem 5.1 and Corollary 5.2 of [7] (note that the fact that we have an upper bound with $\alpha = 0$ (as in [7]) does not modify the result). We will write expectations w.r.t. the probability space associated to the particle system, enlarged with the (independent) $\{\xi_n\}_{n\geq 1}$, as $\mathbb{E}_\xi$.

Thus, we consider the proof of (4). We have
$$
\mathbb{E}[(\gamma_n^{T_n})^{\otimes 2}(F)\,|\,\mathcal{F}_{n-1}] = \gamma_n^{T_n}(1)^2\, \mathbb{E}[(\eta_n^{T_n})^{\otimes 2}(F)\,|\,\mathcal{F}_{n-1}].
$$
Now
$$
\begin{aligned}
\mathbb{E}[(\eta_n^{T_n})^{\otimes 2}(F)\,|\,\mathcal{F}_{n-1}] &= \mathbb{E}\Big[\frac{T_n-2}{T_n-1}(\eta_n^{T_n})^2(F) + \frac{1}{T_n-1}\eta_n^{T_n}(C(F))\,\Big|\,\mathcal{F}_{n-1}\Big] \\
&\leq \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(F) + \frac{1}{N-1}\,\Phi_n(\eta_{n-1}^{T_{n-1}})(C(F)) \\
&\leq \frac{N-1}{N-2}\,\mathbb{E}_\xi\big[\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(C_{\xi_n}(F))\,\big|\,\mathcal{F}_{n-1}\big],
\end{aligned}
$$
where we have used $(T_n-2)/(T_n-1)\leq 1$, $1/(T_n-1)\leq 1/(N-1)$ and Lemmas A.2 and A.1 to obtain the second line. Thus we have that
$$
\begin{aligned}
\mathbb{E}[(\gamma_n^{T_n})^{\otimes 2}(F)\,|\,\mathcal{F}_{n-1}] &\leq \gamma_n^{T_n}(1)^2\,\frac{N-1}{N-2}\,\mathbb{E}_\xi\big[\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(C_{\xi_n}(F))\,\big|\,\mathcal{F}_{n-1}\big] \\
&\leq \gamma_{n-1}^{T_{n-1}}(1)^2\,\frac{N-1}{N-2}\,\mathbb{E}_\xi\big[(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(Q_n^{\otimes 2} C_{\xi_n}(F))\,\big|\,\mathcal{F}_{n-1}\big].
\end{aligned}
$$
Using the above inequality, one can repeat the argument inductively to deduce (4). This completes the proof of the Proposition.
Remark 3.1. The significance of the result is simply that if
$$
\sup_{s} \frac{\widehat{\delta}_s^{(m)}\widehat{\beta}_s^{(m)}}{\eta_s(G_s)} < c
$$
then, if $N > cn$, the relative variance will be bounded by a constant that is independent of $n$. This will be useful for the PMCMC algorithm in Section 4; see for instance the discussion in [2] as to the significance of a relative variance result.
4 Particle MCMC

4.1 Motivation
We now utilize the results in Propositions 3.1-3.2. In particular, Proposition 3.1 allows us to construct an MCMC
method for performing static parameter inference in the context of ABC approximations of HMMs.
Recall Section 2.1. Our objective is to sample from the posterior density:
$$
\pi(\theta|y_{1:n}) = \frac{\int_{\mathsf{X}^n}\prod_{k=1}^n g_\theta^\epsilon(y_k|z_k)\, f_\theta(z_k|z_{k-1})\,dz_{1:n}\,\pi(\theta)}{\int_{\mathsf{X}^n\times\Theta}\prod_{k=1}^n g_\theta^\epsilon(y_k|z_k)\, f_\theta(z_k|z_{k-1})\,dz_{1:n}\,\pi(\theta)\,d\theta} \qquad (5)
$$
where $g_\theta^\epsilon$, $f_\theta$ are as in (1) and $\pi(\theta)$ is a prior probability density on $\Theta$. Throughout the Section, we set $N\geq 2$, $\epsilon>0$, but in general omit dependencies on these quantities. In practice, one often seeks to sample from an associated probability on the extended state-space $E^n\times\Theta$:
$$
\widetilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n}) \propto \prod_{k=1}^n \mathbb{I}_{B_\epsilon(y_k)}(u_k)\, g_\theta(u_k|z_k)\, f_\theta(z_k|z_{k-1})\,\pi(\theta).
$$
It is then easily verified that, for any fixed $\theta\in\Theta$,
$$
\pi(\theta|y_{1:n}) = \int_{E^n} \widetilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n})\,dz_{1:n}\,du_{1:n}.
$$
A typical way to sample from $\widetilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n})$ is via the Metropolis-Hastings method, proposing to move from $(\theta, z_{1:n}, u_{1:n})$ to $(\theta', z_{1:n}', u_{1:n}')$ via the probability density:
$$
q(\theta'|\theta)\prod_{k=1}^n g_{\theta'}(u_k'|z_k')\, f_{\theta'}(z_k'|z_{k-1}');
$$
such a proposal removes the need to evaluate $g_\theta$, which is not available in this context. As is well known, e.g. [2], such procedures typically do not work very well and lead to slow mixing on the parameter space $\Theta$. This proposal can be greatly improved by running a particle filter (the particle marginal Metropolis-Hastings (PMMH) algorithm) as in [2]; that is, a Metropolis-Hastings move that first moves $\theta$, via $q(\theta'|\theta)$, and then runs the algorithm in Section 2.2, picking a whole path $x_{1:n}^l\in E^n$ (the sample used) with probability proportional to $G_n(x_n^l)$. Remarkably, this procedure yields samples from (5) via an auxiliary probability density; the details can be found in [2], but the apparently fundamental property is that the estimate of the normalizing constant is unbiased. Note also that the sample from the Markov chain $(\theta, x_{1:n}^l)$ also provides a sample from $\widetilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n})$.
Intuitively, one expects that the alive filter in Section 2.3 out-performs the standard one for a given computational complexity. In addition, as seen in Proposition 3.1, the estimate of the normalizing constant is unbiased. It is therefore a reasonable conjecture that one can construct a new PMMH algorithm using the alive particle filter investigated previously in this article, and that this might perform better (in some sense) than the standard PMMH just described. We remark that the justification of this new PMMH, to be given below, follows from the statements in [4] (see also [3]) and Proposition 3.1; we provide details for completeness.
4.2 New PMMH Kernel

We will define an appropriate target probability to produce samples from (5), but we first give the algorithm:

1. Sample $\theta(0)$ from any absolutely continuous distribution. Then run the alive particle filter (with parameter value $\theta(0)$) in Section 2.3 up to time $n$, storing $\gamma_{n+1}^{T_n(0)}(1)$ (now denoted $\gamma_{n+1,\theta(0)}^{T_n(0)}(1)$). Pick a trajectory $x_{1:n}^l(0)$, $l\in\{1,\dots,T_n(0)-1\}$, with probability
$$
\frac{G_n(x_n^l(0))}{\sum_{i=1}^{T_n(0)-1} G_n(x_n^i(0))}.
$$
Set $i=1$.

2. Propose $\theta'|\theta(i-1)$ from a proposal with positive density on $\Theta$ (write it as $q(\theta'|\theta)$). Then run the alive particle filter (with parameter value $\theta'$) in Section 2.3 up to time $n$, storing $\gamma_{n+1,\theta'}^{T_n'}(1)$. Pick a trajectory $(x_{1:n}^l)'$ with probability
$$
\frac{G_n((x_n^l)')}{\sum_{i=1}^{T_n'-1} G_n((x_n^i)')}.
$$
Set $\theta(i) = \theta'$, $\gamma_{n+1,\theta(i)}^{T_n(i)}(1) = \gamma_{n+1,\theta'}^{T_n'}(1)$ with probability:
$$
1 \wedge \frac{\gamma_{n+1,\theta'}^{T_n'}(1)}{\gamma_{n+1,\theta(i-1)}^{T_n(i-1)}(1)}\,\frac{\pi(\theta')\,q(\theta(i-1)|\theta')}{\pi(\theta(i-1))\,q(\theta'|\theta(i-1))}.
$$
Otherwise set $\theta(i) = \theta(i-1)$, $\gamma_{n+1,\theta(i)}^{T_n(i)}(1) = \gamma_{n+1,\theta(i-1)}^{T_n(i-1)}(1)$. Set $i = i+1$ and return to the start of 2.
Readers interested in the numerical implementation can skip to Section 5, noting that the $\theta$ samples will come from the posterior (5); this is justified in the rest of the section. It should be noted that our PMMH algorithm is precisely that of [2], except that we have replaced the standard particle filter with the alive particle filter.
We construct the following auxiliary target probability on the state-space:
$$
\bar{E} = \Theta \times \bigcup_{T_1=N}^{\infty}\Big(E^{T_1}\times\{T_1\}\Big) \times \bigcup_{T_2=N}^{\infty}\Big(E^{T_2}\times\{1,\dots,T_1-1\}^{T_2}\times\{T_2\}\Big)\times\cdots\times \bigcup_{T_n=N}^{\infty}\Big(E^{T_n}\times\{1,\dots,T_{n-1}-1\}^{T_n}\times\{T_n\}\Big)\times\{1,\dots,T_n-1\}.
$$
Whilst the state-space looks complicated, it corresponds to the static parameter and all the variables (the states and the resampled indices) sampled by the alive particle filter up to time-step $n$, together with the index picking one of the final paths.
For $n\geq 2$ (we omit $\theta$ from our notation) define
$$
\Psi_n\big(d(x_n^1,\dots,x_n^{T_n}), a_{n-1}^1,\dots,a_{n-1}^{T_n}, T_n \,\big|\, x_{n-1}^{1:T_{n-1}}, T_{n-1}\big) :=
\frac{\mathbb{I}_{S_n}(x_n^1,\dots,x_n^{T_n}, T_n)\,\binom{T_n-1}{N-1}^{-1}\prod_{i=1}^{T_n}\frac{G_{n-1}(x_{n-1}^{a_{n-1}^i})}{\sum_{j=1}^{T_{n-1}-1} G_{n-1}(x_{n-1}^j)}\,M_n(x_{n-1}^{a_{n-1}^i}, dx_n^i)}{\displaystyle\sum_{T_n=N}^{\infty}\;\sum_{a_{n-1}^{1:T_n}\in\{1,\dots,T_{n-1}-1\}^{T_n}}\binom{T_n-1}{N-1}^{-1}\int_{E^{T_n}}\mathbb{I}_{S_n}(x_n^1,\dots,x_n^{T_n}, T_n)\prod_{i=1}^{T_n}\frac{G_{n-1}(x_{n-1}^{a_{n-1}^i})}{\sum_{j=1}^{T_{n-1}-1} G_{n-1}(x_{n-1}^j)}\,M_n(x_{n-1}^{a_{n-1}^i}, dx_n^i)}
$$
where, for $n\geq 1$,
$$
S_n = \Big\{(u_n^1,\dots,u_n^{T_n}, T_n)\in\mathsf{Y}^{T_n}\times\{N, N+1,\dots\} : \Big\{\sum_{i=1}^{T_n-1}\mathbb{I}_{B_\epsilon(y_n)}(u_n^i) = N-1\Big\}\cap\big\{u_n^{T_n}\in B_\epsilon(y_n)\big\}\Big\}.
$$
In addition, set
$$
\Psi_1\big(d(x_1^1,\dots,x_1^{T_1}), T_1\big) := \frac{\mathbb{I}_{S_1}(x_1^1,\dots,x_1^{T_1}, T_1)\,\binom{T_1-1}{N-1}^{-1}\prod_{i=1}^{T_1} M_1(x_0, dx_1^i)}{\displaystyle\sum_{T_1=N}^{\infty}\binom{T_1-1}{N-1}^{-1}\int_{E^{T_1}}\mathbb{I}_{S_1}(x_1^1,\dots,x_1^{T_1}, T_1)\prod_{i=1}^{T_1} M_1(x_0, dx_1^i)}.
$$
Then the PMMH algorithm just defined samples from the target (c.f. [2, pp. 298])
$$
\bar{\pi}(\theta, d(\mathbf{x}_1,\dots,\mathbf{x}_n), \mathbf{a}_{1:n-1}, l, T_{1:n}|y_{1:n}) \propto \frac{G_n(x_n^l)}{\sum_{j=1}^{T_n-1} G_n(x_n^j)}\,\gamma_{n+1,\theta}^{T_n}(1)\prod_{k=2}^n \Psi_k\big(d(x_k^1,\dots,x_k^{T_k}), a_{k-1}^1,\dots,a_{k-1}^{T_k}, T_k\,\big|\,x_{k-1}^{1:T_{k-1}}, T_{k-1}\big)\,\Psi_1\big(d(x_1^1,\dots,x_1^{T_1}), T_1\big)\,\pi(\theta),
$$
where $\mathbf{a}_k = (a_k^1,\dots,a_k^{T_k})$, $\mathbf{x}_k = (x_k^1,\dots,x_k^{T_k})$ and $l\in\{1,\dots,T_n-1\}$. Marginalizing over $l$:
$$
\bar{\pi}(\theta, d(\mathbf{x}_1,\dots,\mathbf{x}_n), \mathbf{a}_{1:n-1}, T_{1:n}|y_{1:n}) \propto \gamma_{n+1,\theta}^{T_n}(1)\prod_{k=2}^n \Psi_k\big(d(x_k^1,\dots,x_k^{T_k}), a_{k-1}^1,\dots,a_{k-1}^{T_k}, T_k\,\big|\,x_{k-1}^{1:T_{k-1}}, T_{k-1}\big)\,\Psi_1\big(d(x_1^1,\dots,x_1^{T_1}), T_1\big)\,\pi(\theta).
$$
Then, integrating over the remaining variables and using Proposition 3.1, one has that
$$
\bar{\pi}(\theta|y_{1:n}) \propto \gamma_{n+1,\theta}(1)\,\pi(\theta).
$$
That is, for any fixed $\theta\in\Theta$,
$$
\pi(\theta|y_{1:n}) = \int_{\bar{E}\setminus\Theta}\bar{\pi}(\theta, d(\mathbf{x}_1,\dots,\mathbf{x}_n), \mathbf{a}_{1:n-1}, l, T_{1:n}|y_{1:n}).
$$
Note also that the samples $(\theta, x_{1:n}^l)$ from $\bar{\pi}$ are marginally distributed according to $\widetilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n})$. The associated ergodicity of the new PMMH algorithm follows the construction in [2] and we omit details for brevity.
Note also that the samples (θ, xl1:n ) from π̄ are marginally distributed according to π̃(θ, z1:n , u1:n |y1:n ). The associated ergodicity of the new PMMH algorithm follows the construction in [2] and we omit details for brevity. Finally,
we remark that Proposition 3.2 suggests a rule of thumb to set N ; one should choose N = O(n). More detailed
analysis for choosing N can be found in [12].
4.3 Some Remarks on PMCMC

If one follows the proof of Proposition 3.2, we conjecture the following result. Suppose that the relative variance of the standard particle filter (with $N-1$ particles) estimate of $\gamma_n(1)$ is written $\check{\mathbb{E}}\big[\big(\frac{\gamma_n^{N-1}(1)}{\gamma_n(1)} - 1\big)^2\big]$; then one has
$$
\mathbb{E}\Big[\Big(\frac{\gamma_n^{T_n}(1)}{\gamma_n(1)} - 1\Big)^2\Big] \leq \check{\mathbb{E}}\Big[\Big(\frac{\gamma_n^{N-1}(1)}{\gamma_n(1)} - 1\Big)^2\Big].
$$
This means that, in the PMMH context, the (relative) variance of the estimate using the alive filter is always less than that of the standard particle filter, for every $\theta$ (not taking into account that the alive filter will cost more in terms of computation). One might expect that this is sufficient to deduce that the PMMH with the alive filter is better (e.g. with regard to its convergence properties) than the PMMH with the standard particle filter. As noted in [5], this is not typically the case, and one often needs a stronger ordering than variance to deduce the superiority of one particle MCMC algorithm versus another: that of convex ordering (see e.g. [20, Definition 3.A.1]; this is denoted $\leq_{cx}$). In this latter work the authors show that if the estimates $\gamma_n^{T_n}(1)$ and $\gamma_n^{N-1}(1)$ are convex ordered, then one PMMH will be better than another with regard to the asymptotic variance and spectral gap (see [5, Theorem 3] for precise details).
In order to utilize these results in our context (we will suppress $\theta$), we consider a simplified scenario, where $M_n(x,\cdot)$ is simply a probability measure $\nu$ on a time-homogeneous space $E$, so one seeks to estimate
$$
\prod_{p=1}^{n-1}\nu(G_p).
$$
We will generate, independently at each time $p$, $T_p$ samples $x_p^1,\dots,x_p^{T_p}$ from $\nu$, where
$$
T_p = \inf\Big\{n\geq N : \sum_{i=1}^n G_p(x_p^i) \geq N\Big\},
$$
and compare the estimates
$$
\prod_{p=1}^{n-1}\frac{N-1}{T_p-1} \qquad\text{and}\qquad \prod_{p=1}^{n-1}\frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i).
$$
Clearly, the computational effort to produce the second estimate is significantly less than the first. These two estimates will be proxies for the alive particle filter and the standard particle filter respectively. We have the following result.
Proposition 4.1. For any $n\geq 2$ and $N\geq 2$,
$$
\prod_{p=1}^{n-1}\frac{N-1}{T_p-1} \;\leq_{cx}\; \prod_{p=1}^{n-1}\frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i).
$$

Proof. Let $p$ be fixed. Conditionally on $T_p$, the joint probability of $x_p^1,\dots,x_p^{T_p}$ is
$$
\Psi_p\big(d(x_p^1,\dots,x_p^{T_p})\,\big|\,T_p\big) \propto \mathbb{I}_{S_p}(x_p^1,\dots,x_p^{T_p}, T_p)\prod_{i=1}^{T_p}\nu(dx_p^i)
$$
and hence, conditionally on $T_p$, $x_p^1,\dots,x_p^{T_p-1}$ are exchangeable. Clearly we have
$$
\frac{N-1}{T_p-1} = \frac{1}{T_p-1}\sum_{i=1}^{T_p-1}\mathbb{I}_{B_p}(x_p^i).
$$
Also, it follows that for any fixed $T_p$, $(1/(T_p-1),\dots,1/(T_p-1))$ ($T_p-1$ entries) is smaller in the majorization order (e.g. [20, page 2]) than $(1/(N-1),\dots,1/(N-1),0,\dots,0)$ ($T_p-1$ entries, of which only the first $N-1$ are non-zero). Then, conditionally on $T_p$, by [20, Theorem 3.A.35],
$$
\frac{N-1}{T_p-1}\,\Big|\,T_p \;\leq_{cx}\; \frac{1}{N-1}\sum_{i=1}^{N-1}\mathbb{I}_{B_p}(x_p^i)\,\Big|\,T_p.
$$
Hence by [20, Theorem 3.A.12(b)] it follows that
$$
\frac{N-1}{T_p-1} \;\leq_{cx}\; \frac{1}{N-1}\sum_{i=1}^{N-1}\mathbb{I}_{B_p}(x_p^i).
$$
The proof is concluded by the independence over each term in the product and repeated application of [20, Corollary 3.A.22].
The implication of this result is as follows. If one considers two PMMH algorithms which move $\theta$ with the same proposal density and then sample the $X_p^i$ as above (assuming that $\nu$ depends on $\theta$), then by [5, Theorem 3], using the estimate
$$
\prod_{p=1}^{n-1}\frac{N-1}{T_p-1} \qquad (6)
$$
will be better than using $\prod_{p=1}^{n-1}\frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i)$: for instance, for appropriate functionals of interest, the asymptotic variance in the CLT for the MCMC algorithm with the estimate (6) will be lower than if one used $\prod_{p=1}^{n-1}\frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i)$ (other good comparison properties also occur, but we do not discuss them). This provides some theoretical evidence to prefer the alive particle filter over the standard one in PMMH algorithms. We should again remind the reader that the result should be understood with some caution, as the alive filter uses much more computational effort and this analysis is for a simplified version of the model. However, we will investigate the implications of this result from an empirical perspective in the next section.
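The variance implication of Proposition 4.1 (convex order implies, in particular, an ordering of the relative variances, since the two estimates share the same mean) is easy to check by simulation in the simplified scenario. The sketch below uses assumed toy values and the negative binomial representation of $T_p$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N, p, reps = 5, 20, 0.25, 100_000   # toy values (assumed)

# Alive-filter proxy: (N-1)/(T_p - 1), with T_p the trial count for N successes.
T = N + rng.negative_binomial(N, p, size=(reps, n - 1))
alive_est = np.prod((N - 1) / (T - 1), axis=1)

# Standard-filter proxy: empirical mean of N-1 i.i.d. Bernoulli(p) potentials.
G = (rng.random(size=(reps, n - 1, N - 1)) < p).astype(float)
std_est = np.prod(G.mean(axis=2), axis=1)

truth = p ** (n - 1)
rel_var = lambda e: float(np.mean((e / truth - 1.0) ** 2))
print(rel_var(alive_est), rel_var(std_est))  # alive-filter proxy: smaller
```

Both estimators are unbiased for $p^{n-1}$, but the alive-filter proxy concentrates much more tightly, which is the behaviour [5, Theorem 3] translates into a better PMMH asymptotic variance.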
5 Numerical Implementation

5.1 ABC Filtering

We will compare the alive filter to the standard particle filter ($\theta$ fixed). Whilst the alive particle filter has been considered elsewhere, a comparison on ABC models is not in the literature; we expect this simulation study to be informative for PMCMC algorithms.
5.1.1 Linear Gaussian model

To investigate the alive particle filter, we consider the following linear Gaussian state-space model (with all quantities one-dimensional):
$$
Z_n = Z_{n-1} + V_n, \qquad Y_n = 2 Z_n + W_n, \qquad n\geq 1,
$$
where $V_n\sim\mathcal{N}(0,\sigma_v^2)$ and, independently, $W_n\sim\mathcal{N}(0,\sigma_w^2)$. Our objective is to fit an ABC approximation of this HMM; this is simply to investigate the algorithm constructed in this article.
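Data from this model can be generated in a few lines (an illustrative sketch; `simulate_lg_hmm` is a hypothetical helper, and the outlier contamination described in the next subsection is not included):

```python
import numpy as np

def simulate_lg_hmm(T, sigma_v, sigma_w, z0=0.0, rng=None):
    """Simulate Z_n = Z_{n-1} + V_n, Y_n = 2 Z_n + W_n for n = 1, ..., T."""
    if rng is None:
        rng = np.random.default_rng()
    z = np.empty(T)
    y = np.empty(T)
    prev = z0
    for t in range(T):
        prev = prev + sigma_v * rng.normal()        # Z_n = Z_{n-1} + V_n
        z[t] = prev
        y[t] = 2.0 * prev + sigma_w * rng.normal()  # Y_n = 2 Z_n + W_n
    return z, y
```

Because the model is linear Gaussian, the Kalman filter provides exact filtering moments and the exact marginal likelihood, which is why it serves as the $\epsilon = 0$ ground truth below.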
5.1.2 Set up

Data are simulated from the (true) model for $T = 5000$ time steps, with $\sigma_v^2\in\{0.1, 1, 5\}$ and $\sigma_w^2\in\{0.1, 1, 5\}$. For $n\in\{1,\dots,T\}$, if $p_n \geq 1/500$, where $p_n \overset{\text{i.i.d.}}{\sim} \mathcal{U}_{[0,1]}$ (the uniform distribution on $[0,1]$), we set $Y_n = c$, where $c\in\{80, 90, \dots, 140, 150\}$. Recall $B_\epsilon(y) = \{u : |u-y|<\epsilon\}$; we consider a fixed sequence of $\epsilon$ whose values belong to the set $\{5, 10, 15\}$. We compare the alive particle filter to the approach in [13].

The proposal dynamics are as described in Section 2.3. For the approach in [13], $N = 2000$ and we resample at every time step. For the alive particle filter, we used $N = 1500$ particles; this is to keep the computation time approximately equal. We also estimate the normalizing constant via the alive filter at each time step and compare it with 'exact' values obtained via the Kalman filter in the limiting case $\epsilon = 0$. To assess the performance in normalizing constant estimation, the relative variance is estimated via independent runs of the procedure.

Our results are presented in two parts. In the first part, we compare the performance of the two particle filters under different scenarios. In the second part, we focus on examples where the approach in [13] collapses. All results were averaged over 50 runs. We note that, with regards to the results in this Section, generally similar conclusions can be drawn when comparing the approach in [15] to the approach in [13].
5.1.3
Part I
In this part, the analyses of the alive particle filter completed in approximately 115 seconds, and approximately
103 seconds were taken by the approach in [13] (which we simply term the particle filter). Our results are shown in
Figures 1-6.
Figure 1 displays the log relative error of the alive filter to the particle filter. We present the time evolution of
the $L_1$ log relative error between the 'exact' and estimated first moment. From our results, the mean log relative
error for each panel is $\{0.06, 0.04, 0.07\}$. Figure 2 plots the absolute $L_1$ error of the alive particle filter across
time. These results indicate that, in the scenarios under study, both filters perform about equally well
with regards to estimating the filter. This is unsurprising as both methods use essentially the same information,
and the outlying values do not lead to a collapse of the particle filter.
In Figure 3, we show the time evolution of the log of the normalizing constant estimate for three approaches,
i.e. the Kalman filter (black '–' line), the new ABC filter (red '-·-' line) and the SMC method (blue '··' line). Figure 4 displays
the (log) relative variance of the estimate of the normalizing constant via the alive particle filter, when using the
Kalman filter as the ground truth. In Figure 3, there is unsurprisingly a bias in the estimation of the normalizing
constant, as the ABC approximation is not exact, i.e. $\epsilon \neq 0$. In Figure 4 the linear decay in variance proven in
Proposition 3.2 is demonstrated (although under a log transformation).
In Figures 5 and 6, we show the number of particles used at each time step (that is, to achieve $N$ alive particles) by
the alive filter (Figure 5) and the number of alive particles for the standard particle filter (Figure 6). Both figures
illustrate the effect of outlying data: the alive filter has to work 'harder' (i.e. assigns more computational
effort), whereas the standard filter simply loses particles.
Figure 1: Estimation error of the first moment for the linear state space model. Each panel displays the (log) ratio
of the $L_1$ error of the alive filter to the old filter.
Figure 2: Estimation error of the first moment for the linear state space model using the alive particle filter (red
'⋆' indicates the x-axis position of an outlier).
Figure 3: Estimated normalizing constant for the linear state space model: Kalman filter (black '–'), alive filter
(red '-·-') and old filter (blue '··'). Each panel displays the estimated normalizing constant across time.
5.1.4 Part II
In this part, we keep the initial conditions the same as in the previous section, but change the value of $\epsilon$. Instead
of using $\epsilon \in \{5, 10, 15\}$, we set smaller values of $\epsilon$, i.e. $\epsilon \in \{3, 6, 12\}$ (recall that the smaller $\epsilon$ is, the closer the ABC
approximation is to the true HMM [13, Theorem 1]). This change makes the standard particle filter collapse, whereas
the alive filter does not have this problem. All results were averaged over 50 runs and are shown in
Figures 7-8.
In Figure 7, we present the true simulated hidden trajectory along with a plot of the estimated $Z_t$ given by
the two particle filters across time, when $(\sigma_v, \sigma_w) = (\sqrt{5}, \sqrt{5})$. As shown in Figure 7, the alive filter provides
better estimates than the old particle filter. Figure 8 displays the log relative error of the alive filter to the standard
particle filter, which supports the previous point with regards to estimation of the hidden state. Based upon
the results displayed, the alive filter can provide good estimation results under the same conditions for which the old
particle filter collapses.
5.2 Particle MCMC: Implementation on Real Data
We consider the following state-space model, for $n \geq 1$:
$$Y_n = \epsilon_n \beta \exp(Z_n), \qquad Z_n = \phi Z_{n-1} + V_n,$$
where $\epsilon_n \sim \mathcal{St}(0, \xi_1, \xi_2, \xi_3)$ (a stable distribution with location parameter 0, scale $\xi_1$, skewness parameter $\xi_2$ and
stability parameter $\xi_3$) and $V_n \sim \mathcal{N}(0, c)$. We set $\theta = (\beta, c, \phi)$, with priors $c \sim \mathcal{IG}(2, 1/100)$, $\phi \sim \mathcal{IG}(2, 1/50)$
($\mathcal{IG}(a, b)$ is an inverse gamma distribution with mode $b/(a+1)$) and $\beta \sim \mathcal{N}(0, 10)$. Note that the inverse gamma
distributions have infinite variance.
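Although the stable density is intractable, sampling from it is straightforward, which is what makes the ABC approximation feasible. The following sketch uses the standard Chambers-Mallows-Stuck method (not taken from the paper; valid for $\xi_3 \neq 1$) together with illustrative parameter values to simulate one path of the model.

```python
import numpy as np

def rstable(alpha, beta, loc=0.0, scale=1.0, size=1, rng=None):
    """Chambers-Mallows-Stuck sampler for a stable law, alpha != 1.
    (alpha = stability xi_3, beta = skewness xi_2, scale = xi_1.)"""
    rng = np.random.default_rng() if rng is None else rng
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    t = beta * np.tan(np.pi * alpha / 2)
    B = np.arctan(t) / alpha
    S = (1.0 + t ** 2) ** (1.0 / (2 * alpha))
    X = S * (np.sin(alpha * (V + B)) / np.cos(V) ** (1 / alpha)
             * (np.cos(V - alpha * (V + B)) / W) ** ((1 - alpha) / alpha))
    return loc + scale * X

# one path of the state-space model (illustrative parameter values, ours)
rng = np.random.default_rng(1)
phi, beta_, c = 0.9, 1.0, 0.1
z, ys = 0.0, []
for _ in range(533):
    z = phi * z + rng.normal(0.0, np.sqrt(c))   # Z_n = phi Z_{n-1} + V_n
    eps_n = rstable(1.75, 1.0, size=1, rng=rng)[0]
    ys.append(eps_n * beta_ * np.exp(z))        # Y_n = eps_n beta exp(Z_n)
```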
We consider the daily (adjusted closing) S&P 500 index between 03/01/2011 and 14/02/2013 (533
data points). Our data are the log-returns, that is, if $I_n$ is the index value at time $n$, $Y_n = \log(I_n/I_{n-1})$. The
data are displayed in Figure 9. The basic form of the state-space model has been used in many articles for the
analysis of financial data (see for instance [19]), but generally $\epsilon_n$ has a tractable probability density function. The
stable distribution may help us to capture the heavy tails prevalent in financial data more realistically than, perhaps, a
standard Gaussian. In most scenarios, the probability density function of a stable distribution is intractable, which
suggests that an ABC approximation might be a sensible way to approximate the true model; this is what is fitted
below.
Figure 4: (Log) Relative variance of the normalizing constant of the alive filter to the Kalman filter for the linear state space
model. Each panel displays the relative variance across time.
Figure 5: Number of particles used by the alive filter for the linear state space model. Each panel displays the number
of particles across time.
5.2.1 Algorithm setup
We consider two scenarios to compare the standard PMMH algorithm and the new one developed above. In the
first scenario we set $\xi_3 = 1.75$ and in the second $\xi_3 = 1.2$, with $\xi_1 = \xi_2 = 1$ in both. In the first case, we
set $\epsilon$ to a suitable value, as the data are not expected to jump off the same scale as the initial data. In the second, $\epsilon$
is significantly reduced; this is to illustrate a point about the algorithm we introduce. Both algorithms are run for
about the same computational time, such that the new PMMH algorithm has 20000 iterations. The parameters are
initialized with draws from the priors. The proposal on $\beta$ is a normal random walk, and for $(c, \phi)$ a gamma proposal
centered at the current point, with proposal variances scaled to obtain reasonable acceptance rates. We consider
$N \in \{10, 100, 1000\}$ and for the new PMMH algorithm this value is lower, to allow the same computational time.
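The accept/reject step of the (pseudo-marginal) PMMH algorithm can be sketched as follows. This is our own illustration, not the paper's implementation; `propose`, `log_prior` and `run_alive_pf` are hypothetical user-supplied callables, the last returning the log of the unbiased normalizing-constant estimate produced by the alive filter.

```python
import numpy as np

def pmmh_step(theta, log_Z_cur, propose, log_prior, run_alive_pf, rng):
    """One PMMH iteration: propose theta', run the alive filter to get an
    unbiased estimate of the normalizing constant, then accept/reject.
    `propose` returns (theta', log q(theta|theta') - log q(theta'|theta))."""
    theta_prop, log_q_ratio = propose(theta, rng)
    log_Z_prop = run_alive_pf(theta_prop)  # log of the unbiased estimate
    log_alpha = (log_Z_prop - log_Z_cur
                 + log_prior(theta_prop) - log_prior(theta)
                 + log_q_ratio)
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop, log_Z_prop, True
    return theta, log_Z_cur, False
```

The key pseudo-marginal detail is that the estimate `log_Z_cur` is stored alongside the current parameter and reused in subsequent acceptance ratios, rather than being recomputed.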
5.2.2 Results
Our results are presented in Figures 10-13. In Figures 10-11 we can see the output in the case $\xi_3 = 1.75$.
In all cases, it appears that both algorithms perform very well; the acceptance rates were around 0.25 in each
case. For the new PMMH algorithm, the average number of simulations of the data, per iteration and data point,
was $(1636, 745, 365)$ for $N \in \{1000, 100, 10\}$ respectively (recall we have modified $N$ to make the computational
time similar to the standard PMMH). For this scenario one would prefer the standard PMMH, as the algorithmic
performance is very good and the random computational cost per iteration is removed.
In Figures 12-13 the output when $\xi_3 = 1.2$ is displayed. In Figure 12 we can see that the standard PMMH
algorithm performs very badly, barely moving across the parameter space, whereas the new PMMH algorithm has
very reasonable performance (Figure 13). In this case, $\epsilon$ is very small and the standard SMC collapses very often,
which leads to the undesirable performance displayed. We note that considerable effort was expended in trying to
get the standard PMMH algorithm to work in this case, but we did not manage to do so (we do not claim that
the algorithm cannot be made to work). Note also that, whilst these are just single runs of the algorithms, we have
seen this behaviour in many other cases and it is typical of these examples. The results here suggest that the new
PMMH kernel might be preferred in difficult sampling scenarios, but in simple cases it does not seem to be required.
Figure 6: Number of particles used by the particle filter of [13] for the linear state space model. Each panel displays
the number of particles across time.
Figure 7: (a) 'True' $Z_t$ and (b) estimated $Z_t$ across time for the linear state space model, where red ('–') indicates
the alive particle filter and black ('··') indicates the particle filter.
6 Summary
In this article we have investigated the alive particle filter; we developed and analyzed new particle estimates
and derived new and principled MCMC algorithms. There are several extensions of this work.
Firstly, we have presented and analyzed the most standard particle filter; one can investigate more intricate filters
commensurate with the current state of the art. Secondly, we have presented the most basic PMCMC algorithm;
one can extend to particle Gibbs methods and beyond; see for instance [22] for a particle Gibbs algorithm. Finally,
one can combine the SMC theory in this article with MCMC theory to investigate the performance
of our PMCMC procedures.
Acknowledgements
The first author was supported by an MOE Singapore grant R-155-000-119-133. We thank Gareth Peters for useful
conversations on this work.
A Technical Results for the Normalizing Constant
Below $\mathcal{F}_n$ is the filtration generated by the alive particle system up to time $n$.

Lemma A.1. We have, for any $n \geq 1$, $N \geq 2$ and $\varphi \in \mathcal{B}_b(E_n)$, that
$$\mathbb{E}[\eta_n^{T_n}(\varphi) \mid \mathcal{F}_{n-1}] = \Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi),$$
where $\Phi_1(\eta_{-1}^{T_{-1}})(\varphi) = M_1(\varphi)$.

Figure 8: Estimation error of the first moment for the linear state space model. Each panel displays the (log) ratio
of the $L_1$ error of the alive filter to the particle filter.
Figure 9: S&P 500 (a) index data and (b) (log) daily return.
Proof. We have, for any $n \geq 1$, $N \geq 2$, that $T_n \mid \mathcal{F}_{n-1}$ is a negative binomial random variable with parameters $N - 1$
and success probability $\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n) = \Phi_n(\eta_{n-1}^{T_{n-1}})(G_n)$, and note that from [18, 21]
$$\mathbb{E}\Big[\frac{N-1}{T_n - 1} \,\Big|\, \mathcal{F}_{n-1}\Big] = \Phi_n(\eta_{n-1}^{T_{n-1}})(B_n). \tag{7}$$
Now,
$$\begin{aligned}
\mathbb{E}[\eta_n^{T_n}(\varphi) \mid \mathcal{F}_{n-1}]
&= \mathbb{E}\Big[\frac{1}{T_n - 1}\sum_{i=1}^{T_n - 1}\varphi(X_n^i) \,\Big|\, \mathcal{F}_{n-1}\Big] \\
&= \mathbb{E}\Big[\frac{N-1}{T_n - 1}\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi \mathbb{I}_{B_n})}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n)} + \Big(1 - \frac{N-1}{T_n - 1}\Big)\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi \mathbb{I}_{B_n^c})}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n^c)} \,\Big|\, \mathcal{F}_{n-1}\Big] \\
&= \mathbb{E}\Big[\frac{1}{T_n - 1}\Big\{(N-1)\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi \mathbb{I}_{B_n})}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n)} + (T_n - N)\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi \mathbb{I}_{B_n^c})}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n^c)}\Big\} \,\Big|\, \mathcal{F}_{n-1}\Big]
\end{aligned}$$
where we have used the fact that there are $N - 1$ particles that are 'alive' and $T_n - N$ that will die, and used the
conditional distribution of the samples given $T_n$. Now, by (7), it follows that
$$\mathbb{E}[\eta_n^{T_n}(\varphi) \mid \mathcal{F}_{n-1}] = \Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi \mathbb{I}_{B_n}) + \Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi \mathbb{I}_{B_n^c}) = \Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi)$$
which concludes the proof.
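The inverse-moment identities for the negative binomial distribution borrowed from [18, 21] are easy to check by Monte Carlo. In this sketch (ours, with illustrative values), $T$ counts the total number of trials needed to obtain $N$ successes of probability $p$, matching the role of $T_n$ in the filter.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, M = 10, 0.3, 200_000
# numpy returns the number of failures before the N-th success,
# so the total number of trials is failures + N
T = rng.negative_binomial(N, p, size=M) + N
est1 = np.mean((N - 1) / (T - 1))                        # identity (7): equals p
est2 = np.mean((N - 1) * (N - 2) / ((T - 1) * (T - 2)))  # equals p**2
print(est1, est2)  # close to p = 0.3 and p**2 = 0.09
```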
Lemma A.2. We have, for any $n \geq 2$, $N \geq 3$ and $\varphi \in \mathcal{B}_b(E_n^2)$:
$$\mathbb{E}[(\eta_n^{T_n})^{\otimes 2}(\varphi) \mid \mathcal{F}_{n-1}] = \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\varphi).$$
Figure 10: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in
Section 2.2. Each row displays the samples for a different $N$. Here $\xi_3 = 1.75$.
Figure 11: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in
Section 2.3. Each row displays the samples for a different $N$. Here $\xi_3 = 1.75$.
Figure 12: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in
Section 2.2. Each row displays the samples for a different $N$. Here $\xi_3 = 1.2$.
Figure 13: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in
Section 2.3. Each row displays the samples for a different $N$. Here $\xi_3 = 1.2$.
Proof. We have:
$$\begin{aligned}
\mathbb{E}[(\eta_n^{T_n})^{\otimes 2}(\varphi) \mid \mathcal{F}_{n-1}] =
&\ \mathbb{E}\Big[\frac{(N-1)(N-2)}{(T_n-1)(T_n-2)} \,\Big|\, \mathcal{F}_{n-1}\Big]\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n^2}\varphi)}{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(B_n^2)} &(8)\\
&+ \mathbb{E}\Big[2(N-1)\Big(\frac{1}{T_n-1} - \frac{N-2}{(T_n-1)(T_n-2)}\Big) \,\Big|\, \mathcal{F}_{n-1}\Big]\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n \times B_n^c}\varphi)}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n)\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n^c)} &(9)\\
&+ \mathbb{E}\Big[1 - \frac{N-1}{T_n-1} - \frac{N-1}{T_n-2} + \frac{(N-1)^2}{(T_n-1)(T_n-2)} \,\Big|\, \mathcal{F}_{n-1}\Big]\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{(B_n^c)^2}\varphi)}{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}((B_n^c)^2)}. &(10)
\end{aligned}$$
The three terms on the R.H.S. arise from the $(N-1)(N-2)$ distinct pairs of particles which land in $B_n^2$ (8), the
$2(N-1)(T_n-N)$ distinct pairs of particles which land in $B_n \times B_n^c$ (9) and the $(T_n-N)(T_n-N-1)$ distinct
pairs of particles which land in $(B_n^c)^2$ (10); the factors of $\Phi_n(\eta_{n-1}^{T_{n-1}})$ arise from the conditional distributions of the
particles given $T_n$ (recalling that, conditional on $\mathcal{F}_{n-1}$, $T_n$ is a negative binomial random variable with parameters $N$
and $\Phi_n(\eta_{n-1}^{T_{n-1}})(B_\epsilon(y_n))$).
Now, for (8), we have from [18, 21] that
$$\mathbb{E}\Big[\frac{(N-1)(N-2)}{(T_n-1)(T_n-2)} \,\Big|\, \mathcal{F}_{n-1}\Big] = \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(B_n^2),$$
so that (8) becomes
$$\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n^2}\varphi).$$
Recalling (7) and using the above result, (9) becomes
$$2\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n \times B_n^c}\varphi).$$
Finally, noting that for any $t \neq 1, 2$, $1/(t-2) = 1/(t-1) + 1/[(t-1)(t-2)]$, and thus, using the above results, that
$$\mathbb{E}\Big[\frac{N-1}{T_n-2} \,\Big|\, \mathcal{F}_{n-1}\Big] = \Phi_n(\eta_{n-1}^{T_{n-1}})(B_n) + \frac{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(B_n^2)}{N-2},$$
it follows that (10) is equal to
$$\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{(B_n^c)^2}\varphi).$$
Hence we have shown
$$\mathbb{E}[(\eta_n^{T_n})^{\otimes 2}(\varphi) \mid \mathcal{F}_{n-1}] = \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n^2}\varphi) + 2\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n \times B_n^c}\varphi) + \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{(B_n^c)^2}\varphi) = \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\varphi).$$
References
[1] Amrein, M. & Künsch, H. (2011). A variant of importance splitting for rare event estimation: Fixed number
of successes. ACM TOMACS, 21, article 13.
[2] Andrieu, C., Doucet, A. & Holenstein, R. (2010). Particle Markov chain Monte Carlo methods (with
discussion). J. R. Statist. Soc. Ser. B, 72, 269–342.
[3] Andrieu, C. & Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations.
Ann. Statist., 37, 697–725.
[4] Andrieu, C. & Vihola, M. (2015). Convergence properties of pseudo-marginal Markov chain Monte Carlo
algorithms. Ann. Appl. Probab., 25, 1030–1077.
[5] Andrieu, C. & Vihola, M. (2014). Establishing some order amongst exact approximation of MCMCs.
arXiv:1404.6909 [stat.CO].
[6] Cérou, F., Del Moral, P., Furon, T. & Guyader, A. (2012). Sequential Monte Carlo for rare event
estimation. Statist. Comp., 22, 795–808.
[7] Cérou, F., Del Moral, P. & Guyader, A. (2011). A non-asymptotic variance theorem for un-normalized
Feynman-Kac particle models. Ann. Inst. Henri Poincare, 47, 629–649.
[8] Dean, T. A., Singh, S. S., Jasra, A. & Peters G. W. (2014). Parameter estimation for Hidden Markov
models with intractable likelihoods. Scand. J. Statist., 41, 970–987.
[9] Del Moral, P. (2004). Feynman-Kac Formulae. Springer, New York.
[10] Del Moral, P., & Doucet, A. (2004). Particle motions in absorbing medium with hard and soft obstacles.
Stoch. Anal. Appl., 22, 1175–1207.
[11] Del Moral, P., Doucet, A. & Jasra, A. (2012). An adaptive sequential Monte Carlo method for approximate
Bayesian computation. Statist. Comp., 22, 1009–1020.
[12] Doucet, A., Pitt, M. K., Deligiannidis, G. & Kohn, R. (2015). Efficient implementation of Markov chain
Monte Carlo when using an unbiased likelihood estimator. Biometrika, 102, 295-313.
[13] Jasra, A., Singh, S. S., Martin, J. S. & McCoy, E. (2012). Filtering via approximate Bayesian computation.
Statist. Comp., 22, 1223–1237.
[14] Lee, A., Andrieu, C. & Doucet, A. (2015). An active particle perspective of MCMC and its application to
locally adaptive MCMC algorithms. Work in progress.
[15] Le Gland, F. & Oudjane, N. (2004). Stability and uniform approximation of nonlinear filters using the
Hilbert metric, and application to particle filters. Ann. Appl. Probab., 14, 144–187.
[16] Le Gland, F. & Oudjane, N. (2006). A sequential particle algorithm that keeps the particle system alive. In
Stochastic Hybrid Systems : Theory and Safety Critical Applications, (H. Blom & J. Lygeros, Eds), Lecture
Notes in Control and Information Sciences 337, 351–389, Springer: Berlin.
[17] Martin, J. S., Jasra, A., Singh, S. S., Whiteley, N., Del Moral, P. & McCoy, E. (2014). Approximate
Bayesian computation for smoothing. Stoch. Anal. Appl., 32, 397–422.
[18] Neuts, M. F. & Zacks, S. (1967). On mixtures of χ² and F distributions which yield distributions of the
same family. Ann. Inst. Stat. Math., 19, 527–536.
[19] Pitt, M. K. & Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters. J. Amer. Statist.
Assoc., 94, 590–599.
[20] Shaked, M. & Shanthikumar, J. G. (2007). Stochastic Orders. Springer: New York.
[21] Zacks, S. (1980). On some inverse moments of negative-binomial distributions and their application in estimation. J. Stat. Comp. & Sim., 10, 163–165.
[22] Zhang, X. (2014). Some Contributions to Approximate Inference in Bayesian Statistics. PhD Thesis, Imperial
College London.