The Alive Particle Filter and its use in Particle Markov chain Monte Carlo

BY PIERRE DEL MORAL¹, AJAY JASRA², ANTHONY LEE³, CHRISTOPHER YAU⁴ & XIAOLE ZHANG⁴

¹ School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, 2052, AUS. E-Mail: [email protected]
² Department of Statistics & Applied Probability, National University of Singapore, Singapore, 117546, SG. E-Mail: [email protected]
³ Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK. E-Mail: [email protected]
⁴ Department of Mathematics, Imperial College London, London, SW7 2AZ, UK. E-Mail: [email protected], [email protected]

Abstract

In the following article we investigate a particle filter for approximating Feynman-Kac models with indicator potentials, and we use this algorithm within Markov chain Monte Carlo (MCMC) to learn the static parameters of the model. Examples of such models include approximate Bayesian computation (ABC) posteriors associated with hidden Markov models (HMMs) and rare-event problems. Such models require the use of advanced particle filter or MCMC algorithms (e.g. [13]) to perform estimation. One drawback of existing particle filters is that they may 'collapse', in that the algorithm may terminate early due to the indicator potentials. In this article, using a newly developed special case of the locally adaptive particle filter in [14], which is closely related to [16], we use an algorithm which can deal with this latter problem, whilst introducing a random cost per time step. In particular, we show how this algorithm can be used within MCMC, using particle MCMC [2]. It is established that, when not taking into account computational time, the new MCMC algorithm applied to a simplified model has a lower asymptotic variance in comparison to a standard particle MCMC algorithm. Numerical examples are presented for ABC approximations of HMMs.

Key Words: Particle Filters, Markov Chain Monte Carlo, Feynman-Kac Formulae.
1 Introduction

Let $\{(E_n, \mathcal{E}_n)\}_{n\geq 1}$ be a sequence of measurable spaces, $\{G_n(x) = I_{B_n}(x)\}_{n\geq 1}$, $(x, B_n) \in E_n \times \mathcal{E}_n$, $B_n \subset E_n$, be a sequence of indicator potentials and $\{M_n : E_{n-1} \times \mathcal{E}_n \to [0,1]\}_{n\geq 1}$, with $x_0 \in E_0$ a fixed point, be a sequence of Markov kernels. Then for the collection of bounded and measurable functions $\varphi \in \mathcal{B}_b(E_n)$ the $n$-time Feynman-Kac marginal is:
$$\eta_n(\varphi) := \frac{\gamma_n(\varphi)}{\gamma_n(1)}, \qquad n \geq 1,$$
assuming that $\gamma_n(\varphi) = \mathbb{E}_{x_0}\big[\varphi(X_n)\prod_{p=1}^{n-1} G_p(X_p)\big]$ is well-defined, where $\mathbb{E}_{x_0}[\cdot]$ is the expectation w.r.t. the law of an inhomogeneous Markov chain with transition kernels $\{M_n\}_{n\geq 1}$. Such models appear routinely in the statistics and applied probability literature, including:

• ABC approximations (as in, e.g., [11])
• ABC approximations of HMMs (e.g. [8, 13, 17])
• Rare-event problems (as in, e.g., [6])

In some scenarios we will be interested in a static parameter $\theta \in \Theta$ and Bayesian estimation; i.e., allowing $M_n$ to depend on $\theta$, we are interested in the density proportional to $\gamma_{n,\theta}(1)\pi(\theta)$, where $\pi(\theta)$ is a prior on the static parameter. In order to perform estimation for such models, one often has to resort to numerical methods such as particle filters or MCMC; see the aforementioned references. Supposing $\theta$ is fixed, the basic particle filter, at time $n$ and given a collection of $N \geq 1$ samples with non-zero potential on $E_{n-1}^N$, will generate samples on $E_n$ using the Markov kernels $\{M_n\}_{n\geq 1}$ and then sample with replacement amongst $\{x_n^i\}_{1\leq i \leq N}$ according to the normalized weights $G_n(x_n^i)/\sum_{j=1}^N G_n(x_n^j)$. The key issue with this basic particle filter is that, at any given time, there is no guarantee that any sample $x_n^i$ lies in $B_n$, and in some challenging scenarios the algorithm can 'die out' (or collapse), that is, all of the samples have zero potentials (see [10] for examples of such algorithms, which can work well).⁵

⁵ Yau is currently affiliated with the Wellcome Trust, University of Oxford.
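To make the collapse phenomenon concrete, the following is a minimal Python sketch of the basic particle filter just described. The model choices (Gaussian random-walk kernels $M_n$ and potentials $G_n(x) = I\{|x| < 2\}$) are purely illustrative assumptions, not taken from the paper; if at some step every particle lands outside $B_n$, the algorithm dies out and returns no estimate.

```python
import random

def basic_particle_filter(M_sample, G, n_steps, N, x0, seed=1):
    """Basic particle filter for a Feynman-Kac model with indicator
    potentials G_p.  Returns an estimate of gamma_n(1) = prod_p eta_p(G_p),
    or None if the system dies out (all potentials zero)."""
    rng = random.Random(seed)
    xs = [M_sample(x0, rng) for _ in range(N)]
    Z = 1.0
    for p in range(1, n_steps):
        g = [G(p, x) for x in xs]
        if sum(g) == 0:          # every particle fell outside B_p: collapse
            return None
        Z *= sum(g) / N          # eta_p^N(G_p)
        alive = [x for x, gi in zip(xs, g) if gi == 1]
        # resample with replacement among the alive particles, then mutate
        xs = [M_sample(rng.choice(alive), rng) for _ in range(N)]
    return Z

# Toy illustrative model: random-walk mutations, G_p(x) = 1{|x| < 2}.
def M_sample(x, rng):
    return x + rng.gauss(0.0, 1.0)

def G(p, x):
    return 1 if abs(x) < 2.0 else 0

est = basic_particle_filter(M_sample, G, n_steps=10, N=200, x0=0.0)
```

With large $N$ the toy run returns a positive estimate; with small $N$ or tighter indicator sets, `None` (collapse) becomes likely.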
From an inference perspective, this is clearly an undesirable property and can lead to some poor performance. For some classes of examples, e.g. [6, 11], there are some adaptive techniques which can reduce the possibility of the algorithm collapsing, but these are not always guaranteed to work in practice. In this article we consider the alive particle filter, developed in [15] and [1]. This algorithm uses the same sampling mechanism, but the samples are generated until there is a prespecified number that are alive. This removes the possibility that the algorithm can collapse, but introduces a random cost per time-step. When θ is fixed, under assumptions, we establish the following results for the alive particle filter: 1. The estimate of γn (ϕ) computed using the proposed filter is unbiased 2. The relative variance of the particle filter estimates of γn (ϕ), assuming N = O(n), is shown to grow linearly in n. Note that a proof of point 1. is in [1], but our proof is original and particularly useful to develop further results such as central limit theorems (CLTs), which cannot easily be done using the approach in [1]. The results are of particular interest when using the alive particle filter within MCMC methodology (a particle MCMC (PMCMC) algorithm, [2]), for inferring θ as we now explain. PMCMC algorithms enable one to sample from the density proportional to γn,θ (1)π(θ), using a particle filter. This is because, for any fixed θ, the standard particle filter provides an unbiased estimate of γn,θ (1) ([9, Theorem 7.4.2]), so denoting all the variables generated by a particle filter as u with probability density Ψn,θ (u), one can produce a computable and unbiased estimate γn,θ (1, u) of γn,θ (1). The essence of the PMCMC method is to proceed by defining an auxiliary target proportional to γn,θ (1, u)Ψn,θ (u)π(θ) R and using MCMC to sample from it; as γn,θ (1, u)Ψn,θ (u)du = γn,θ (1), this produces samples from the target of interest. 
As the alive particle filter also produces an unbiased estimate of $\gamma_{n,\theta}(1)$, it can be used instead of the standard particle filter and, as noted above, one expects superior performance from an empirical perspective. This is a new MCMC algorithm, to the best of our knowledge. There is a variety of applications of such PMCMC algorithms, for example when performing static parameter estimation for ABC approximations of HMMs. The results in 1 & 2 not only allow one to construct new PMCMC algorithms, but also provide theoretical guidelines for their implementation. We also give a result, when not taking into account computational time, that when the new MCMC algorithm is applied to a simplified model it has a lower asymptotic variance in comparison to a standard particle MCMC algorithm (i.e. one using the standard particle filter). The structure of this article is as follows. In Section 2 we provide a motivating example, ABC approximations of HMMs, for the construction of the particle filter, as well as the alive particle filter itself. In Section 3 our theoretical results are provided, along with some interpretation of their meaning. In Section 4 we develop a PMCMC algorithm, using the guidelines in Section 3, for static parameter estimation associated to ABC approximations of HMMs. In Section 5 we implement the alive particle filter for the motivating example as well as PMCMC. In Section 6 the article is concluded, with some discussion of future work. The appendix contains technical results for the theory in Section 3.

2 Motivating Example and Algorithm

Throughout, $\theta$ is fixed unless otherwise stated; when stated, we place a prior on $\theta$ and it is random.

2.1 Motivating Example

We are given an HMM with observations $\{Y_n\}_{n\geq 1}$, $Y_n \in \mathsf{Y} \subseteq \mathbb{R}^{d_y}$, and hidden states $\{Z_n\}_{n\geq 0}$, $Z_n \in \mathsf{Z} \subseteq \mathbb{R}^{d_x}$, with $Z_0$ given. We assume:
$$\mathbb{P}(Y_n \in A \mid \{Z_n\}_{n \geq 0}) = \int_A g_\theta(y|z_n)\,dy, \qquad n \geq 1,$$
and
$$\mathbb{P}(Z_n \in A \mid \{Z_k\}_{0 \leq k \leq n-1}) = \int_A f_\theta(z|z_{n-1})\,dz, \qquad n \geq 1,$$
with $\theta \in \Theta$ a static parameter and $dy$ (resp. $dz$) Lebesgue measure.
We assume that $g_\theta(y|z_n)$ is unknown (up to a positive, unbiased estimate), but that one can sample from the associated distribution. In this scenario, one cannot apply a standard particle filter (or many other numerical approximation schemes). [8, 13] develop the following ABC approximation of the joint smoothing density, for $\epsilon > 0$:
$$\pi_\theta^\epsilon(z_{1:n}|y_{1:n}) = \frac{\prod_{k=1}^n g_\theta^\epsilon(y_k|z_k)\, f_\theta(z_k|z_{k-1})}{\int_{\mathsf{X}^n} \prod_{k=1}^n g_\theta^\epsilon(y_k|z_k)\, f_\theta(z_k|z_{k-1})\, dz_{1:n}} \qquad (1)$$
where
$$g_\theta^\epsilon(y_k|z_k) = \frac{\int_{B_\epsilon(y_k)} g_\theta(u|z_k)\,du}{\int_{B_\epsilon(y_k)} du}$$
and $B_\epsilon(y_k)$ is the open ball centred at $y_k$ with radius $\epsilon$. We let $\theta$ be fixed and omit it from our notation; it is reintroduced later on. We introduce a Feynman-Kac representation of the ABC approximation described above. Let $E_n = E = \mathsf{X} \times \mathsf{Y}$ and define, for $n \geq 1$, $G_n : E \to \{0,1\}$:
$$G_n(x) = I_{\mathsf{X} \times B_\epsilon(y_n)}(x)$$
with $x = (z, u)$. Now introduce Markov kernels $\{M_n\}_{n\geq 1}$, $M_n : E \times \mathcal{B}(\mathsf{X} \times \mathsf{Y}) \to [0,1]$ ($\mathcal{B}(\cdot)$ are the Borel sets), with
$$M_n(x, dx') = f(z'|z)\, g(u'|z')\, du'\, dz'.$$
Then the ABC predictor is, for $n \geq 1$:
$$\eta_n(\varphi) := \frac{\gamma_n(\varphi)}{\gamma_n(1)}, \qquad (2)$$
where $\varphi \in \mathcal{B}_b(E)$ and
$$\gamma_n(\varphi) = \mathbb{E}_{x_0}\Big[\Big(\prod_{p=1}^{n-1} G_p(X_p)\Big)\varphi(X_n)\Big] = \int_{E^n} \Big(\prod_{p=1}^{n-1} G_p(x_p)\Big)\varphi(x_n) \prod_{p=1}^n M_p(x_{p-1}, dx_p). \qquad (3)$$
This provides a concrete example of the Feynman-Kac model in Section 1. In light of (2), we henceforth refer to $\gamma_n(1)$ as the normalizing constant. This quantity is of fundamental importance in a wide variety of statistical applications, notably in static parameter estimation, as it is equivalent to the marginal likelihood of the observed data $Y_1, \ldots, Y_{n-1}$ in contexts such as the ABC approximation presented above, as can be determined from (3).

2.2 Standard Particle Filter

Now define, for $n \geq 2$:
$$\Phi_n(\eta_{n-1})(\varphi) = \frac{\eta_{n-1}(G_{n-1} M_n(\varphi))}{\eta_{n-1}(G_{n-1})}$$
where $M_n(\varphi)(x) = \int \varphi(y)\, M_n(x, dy)$. The standard particle filter works by sampling $x_1^1, \ldots, x_1^N$ i.i.d. from $M_1(x_0, \cdot)$, setting
$$\eta_n^N(\varphi) = \frac{1}{N}\sum_{i=1}^N \varphi(x_n^i), \qquad n \geq 1,$$
and, at times $n \geq 2$, sampling $x_n^1, \ldots, x_n^N$ conditionally i.i.d. from $\Phi_n(\eta_{n-1}^N)(\cdot)$, assuming that the system has not died out.
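The kernel $M_n$ and the potential $G_n$ above are straightforward to simulate. As a hedged illustration, the sketch below assumes, purely for concreteness, the linear Gaussian dynamics used later in Section 5.1 ($f(z'|z) = \mathcal{N}(z, \sigma_v^2)$, $g(u'|z') = \mathcal{N}(2z', \sigma_w^2)$); these densities are not required by the algorithm, only the ability to sample them.

```python
import random

def M_n_sample(x, sigma_v, sigma_w, rng):
    """One draw from M_n(x, dx') = f(z'|z) g(u'|z') du' dz': propagate the
    hidden state, then generate a pseudo-observation u' from the likelihood."""
    z, _ = x
    z_new = z + rng.gauss(0.0, sigma_v)            # z' ~ f(.|z)
    u_new = 2.0 * z_new + rng.gauss(0.0, sigma_w)  # u' ~ g(.|z')
    return (z_new, u_new)

def G_n(x, y_n, eps):
    """Indicator potential G_n(x) = 1{u in B_eps(y_n)}, x = (z, u)."""
    _, u = x
    return 1 if abs(u - y_n) < eps else 0

rng = random.Random(0)
x = M_n_sample((0.0, 0.0), sigma_v=1.0, sigma_w=1.0, rng=rng)
w = G_n(x, y_n=0.0, eps=5.0)
```

Note that $g$ is never evaluated, only sampled, which is the point of the ABC construction.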
2.3 Alive Particle Filter

We now discuss an idea which will prevent the particle filter from dying out; see also [14] and [16]. Throughout we assume that $M_n(x, B_n)$ is not known for each $x, n$; if this is known, then one can develop alternative algorithms. At time 1, we sample $x_1^1, \ldots, x_1^{T_1}$ i.i.d. from $M_1(x_0, \cdot)$, where
$$T_1 = \inf\Big\{n \geq N : \sum_{i=1}^n G_1(x_1^i) \geq N\Big\}.$$
Then, define
$$\eta_1^{T_1}(\varphi) = \frac{1}{T_1 - 1}\sum_{i=1}^{T_1 - 1} \varphi(x_1^i).$$
Now, at time 2, sample $x_2^1, \ldots, x_2^{T_2}$ conditionally i.i.d. from $\Phi_2(\eta_1^{T_1})(\cdot)$, where
$$T_2 = \inf\Big\{n \geq N : \sum_{i=1}^n G_2(x_2^i) \geq N\Big\}.$$
This is continued as needed (i.e. with an obvious definition of $T_3$, $T_4$, etc.). The idea here is that, at every time step, we retain $N - 1$ particles with non-zero weight, so that the algorithm never dies out, but with the additional issue that the computational cost per time step is a random variable. The procedure is described in Algorithm 1.

Algorithm 1: Alive Particle Filter
1. At time 1. For $j = 1, 2, \ldots$ until $j =: T_1$ is reached such that $G_1(x_1^j) = 1$ and $\sum_{i=1}^j G_1(x_1^i) = N$:
   • Sample $x_1^j$ from $M_1(x_0, \cdot)$.
2. At time $1 < p \leq n$. For $j = 1, 2, \ldots$ until $j =: T_p$ is reached such that $G_p(x_p^j) = 1$ and $\sum_{i=1}^j G_p(x_p^i) = N$:
   (a) Sample $a_{p-1}^j$ uniformly from $\{k \in \{1, \ldots, T_{p-1} - 1\} : G_{p-1}(x_{p-1}^k) = 1\}$.
   (b) Sample $x_p^j$ from $M_p(x_{p-1}^{a_{p-1}^j}, \cdot)$.

We note that the approach in [15] retains $N$ alive particles, i.e. it differs only in step 2(a) of Algorithm 1 by sampling instead $a_{p-1}^j$ uniformly on $\{k \in \{1, \ldots, T_{p-1}\} : G_{p-1}(x_{p-1}^k) = 1\}$. This seemingly innocuous difference is, however, crucial to the unbiasedness results we develop in the sequel.

2.3.1 Some Remarks

We remark that one can show [9] that for $n \geq 2$ the normalizing constant is given by
$$\gamma_n(1) = \prod_{p=1}^{n-1} \eta_p(G_p).$$
Thus, a natural estimate of the normalizing constant is
$$\gamma_n^{T_n}(1) = \prod_{p=1}^{n-1} \eta_p^{T_p}(G_p) = \prod_{p=1}^{n-1} \frac{N-1}{T_p - 1}.$$
We note that the estimates of $\eta_n$ and $\gamma_n$ are different from those considered in [16] (which uses all the samples, not removing the last one). This is a critical point, as in Proposition 3.1 we show that this estimate of the normalizing constant is unbiased, which is crucial for using this idea inside MCMC algorithms. In this direction, one uses the particle filter to help propose values and there is an accept/reject step; we discuss this approach in Section 4. Other than the fact that this filter will not die out, in the context of our motivating example there is also a natural use of this idea. This is because one can envisage the arrival of an outlier or unusual data; in such scenarios, the alive particle filter will (most likely) assign more computational effort to dealing with this issue, which is not something that the standard filter is designed to do. A final remark is as follows: in our example $B_n = \mathsf{X} \times B_\epsilon(y_n)$ and so, as assumed in this article in general, $M_n(x, B_n)$ is not known for each $x, n$. This removes the possibility of changing measure to $\mathbb{Q}$ (in the formula for $\gamma_n(\cdot)$), with finite-dimensional marginal
$$\mathbb{Q}_n(d(x_1, \ldots, x_n)) = \prod_{p=1}^n \frac{M_p(x_{p-1}, dx_p)\, I_{B_p}(x_p)}{M_p(x_{p-1}, B_p)};$$
call the Markov kernels in the product $\hat{M}_p$. This is because the new potential at time $n$ is exactly $M_n(x, B_n)$. However, one can simulate from $\hat{M}_n$ and use an unbiased estimate of $M_n(x, B_n)$ for each particle. That is, we obtain samples $z^{(1)}, z^{(2)}$ from $\hat{M}_p(x_{p-1}^i, \cdot)$ using $R$ samples (in total) from $M_p(x_{p-1}^i, \cdot)$ and then we set $x_p^i = z^{(1)}$ (say) with associated weight $1/(R-1)$. This particular procedure would then have a fixed number of particles with no possibility of collapsing. Other than the algorithm being convoluted, some particles $x_{p-1}^i$ could be such that $\mathbb{E}[R]$ is prohibitively large, even though $\mathbb{E}[T_p]$ is not very large, which provides a reasonable argument against such a scheme.
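Algorithm 1 and the normalizing-constant estimate $\prod_p (N-1)/(T_p-1)$ can be sketched compactly in Python. The toy model below (random-walk mutations, $G_p(x) = I\{|x| < 2\}$) is an illustrative assumption, not from the paper; the stopping-time logic and the discarding of the $N$-th alive arrival follow Algorithm 1.

```python
import random

def alive_particle_filter(M_sample, G, n_steps, N, x0, seed=1):
    """Sketch of Algorithm 1.  At each time step, particles are generated
    until N of them are alive (potential one); the N-th alive arrival is then
    discarded so that N-1 alive particles are retained, matching step 2(a).
    Returns the estimate prod_p (N-1)/(T_p - 1) over the n_steps stopping
    times; the algorithm never dies out."""
    rng = random.Random(seed)
    Z, alive = 1.0, None
    for p in range(1, n_steps + 1):
        new_alive, T = [], 0
        while len(new_alive) < N:                 # sample until N alive
            T += 1
            parent = x0 if alive is None else rng.choice(alive)
            x = M_sample(parent, rng)
            if G(p, x) == 1:
                new_alive.append(x)
        Z *= (N - 1) / (T - 1)                    # factor estimating eta_p(G_p)
        alive = new_alive[:-1]                    # retain the first N-1 alive
    return Z

# Toy illustrative model: random-walk mutations, G_p(x) = 1{|x| < 2}.
Z_hat = alive_particle_filter(
    lambda x, rng: x + rng.gauss(0.0, 1.0),
    lambda p, x: 1 if abs(x) < 2.0 else 0,
    n_steps=10, N=50, x0=0.0)
```

Note that the per-step cost $T_p$ is random: steps where the indicator set is hard to hit automatically receive more sampling effort.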
3 Theoretical Results

We will now present some theoretical results for the particle filter in Section 2.3 (so that $\theta$ is fixed).

3.1 Assumptions and Notations

Define the following sequence of Markov kernels, for $n \geq 1$:
$$\hat{M}_n(x, dy) = \frac{M_n(x, dy)\, G_n(y)}{M_n(G_n)(x)}.$$
We will make use of the following assumptions:

• $(\hat{G})$: For each $n \geq 0$,
$$\sup_{(x,y) \in B_n^2} \frac{M_{n+1}(G_{n+1})(x)}{M_{n+1}(G_{n+1})(y)} = \hat{\delta}_n < \infty.$$
• $(\hat{M}_m)$: There exists $m \geq 1$ such that for any $p \geq 1$ there is a $\hat{\beta}_p^{(m)} \in [1, \infty)$ such that
$$\hat{M}_{p,p+m}(x, dz) \leq \hat{\beta}_p^{(m)}\, \hat{M}_{p,p+m}(y, dz) \qquad \forall\, (x, y) \in B_p^2,$$
where $\hat{M}_{p,p+m} = \hat{M}_{p+1}\hat{M}_{p+2}\cdots\hat{M}_{p+m}$.

The two conditions constitute $(\hat{H}_m)$ in [7]; we also use the notation $\hat{\delta}_p^{(m)} = \prod_{q=p}^{p+m-1} \hat{\delta}_q$. These assumptions are exceptionally strong, but we remark that, for the scenario of interest, weaker conditions have not been used in the literature. Note in addition that, in the context of ABC, the assumptions are essentially qualitative, as verifying them is very difficult (even on compact state-spaces) since the likelihood density is typically intractable. However, we still expect the phenomena reported in the results below to hold in some practical situations. We again remark that our results are relevant for scenarios other than ABC.

Some notations are now given. For a probability measure $\mu \in \mathcal{P}(E)$ (the probability measures on $E$) and a bounded measurable real-valued function $\varphi \in \mathcal{B}_b(E)$, we write $\mu(\varphi) := \int_E \varphi(x)\mu(dx)$. For a non-negative operator on $\mathcal{B}_b(E)$, $R(x, \cdot)$, and $\varphi \in \mathcal{B}_b(E)$, $R(\varphi)(x) = \int_E \varphi(y)\, R(x, dy)$. Let $n \geq 2$; we will use the semi-group $Q_n(x, dy) = G_{n-1}(x)\, M_n(x, dy)$ and, for $1 \leq p < n$,
$$Q_{p,n}(\varphi)(x_p) = \int_{E^{n-p}} \varphi(x_n)\, Q_{p+1}(x_p, dx_{p+1}) \times \cdots \times Q_n(x_{n-1}, dx_n),$$
where $\varphi \in \mathcal{B}_b(E)$; when $p = n$, $Q_{n,n}$ is the identity operator. $\mathbb{E}$ denotes expectation w.r.t. the stochastic process which generates the algorithm, with corresponding probability $\mathbb{P}$. $\mathcal{F}_n$ is the filtration generated by the alive particle system up to time $n$. It is assumed that $\prod_{\emptyset} = 1$.
Note the important formula $\gamma_n(\varphi) = \big[\prod_{q=1}^{n-1} \eta_q(G_q)\big]\eta_n(\varphi)$, $\varphi \in \mathcal{B}_b(E)$.

3.2 Unbiasedness

Define:
$$\gamma_n^{T_n}(\varphi) := \Big(\prod_{p=1}^{n-1} \frac{N-1}{T_p - 1}\Big)\,\eta_n^{T_n}(\varphi).$$
The technical results used in this section can be found in Appendix A.

Proposition 3.1. We have, for any $n \geq 1$, $N \geq 2$ and $\varphi \in \mathcal{B}_b(E_n)$, that $\mathbb{E}[\gamma_n^{T_n}(\varphi)] = \gamma_n(\varphi)$.

Proof. The proof uses the standard martingale difference decomposition in [9], with some additional expectation properties that need to be proved. The case $n = 1$ follows from Lemma A.1, so we assume $n \geq 2$. We remark that for $p \in \{2, \ldots, n\}$:
$$\gamma_p^{T_p}(1)\,\Phi_p(\eta_{p-1}^{T_{p-1}})(Q_{p,n}(\varphi)) = \gamma_{p-1}^{T_{p-1}}(1)\,\eta_{p-1}^{T_{p-1}}(Q_{p-1,n}(\varphi))$$
and hence that
$$\gamma_n^{T_n}(\varphi) - \gamma_n(\varphi) = \sum_{p=1}^n \gamma_p^{T_p}(1)\big[\eta_p^{T_p} - \Phi_p(\eta_{p-1}^{T_{p-1}})\big](Q_{p,n}(\varphi)).$$
Then, by Lemma A.1, it follows that
$$\mathbb{E}\big[\gamma_p^{T_p}(1)\big[\eta_p^{T_p} - \Phi_p(\eta_{p-1}^{T_{p-1}})\big](Q_{p,n}(\varphi)) \,\big|\, \mathcal{F}_{p-1}\big] = 0$$
and hence that $\mathbb{E}[\gamma_n^{T_n}(\varphi) - \gamma_n(\varphi)] = 0$, from which we easily conclude the result.

3.3 Non-Asymptotic Variance Theorem

Below, the term $\hat{\beta}_s^{(m)}$ is as in [7]; the expressions and interpretations for $\hat{\delta}_s^{(m)}$ can be found in Section 3.1, as is the assumption $(\hat{H}_m)$. In addition, $(\eta_n^{T_n})^2$ is the U-statistic formed from our empirical measure $\eta_n^{T_n}$ and $(\eta_n^{T_n})^{\otimes 2}$ is the corresponding V-statistic. In addition, $(\gamma_n^{T_n})^{\otimes 2}(F) = \gamma_n^{T_n}(1)^2\, (\eta_n^{T_n})^{\otimes 2}(F)$ for $F \in \mathcal{B}_b(E^2)$.

Proposition 3.2. Assume $(\hat{H}_m)$. Then for any $n \geq 2$, $N \geq 3$,
$$N > \sum_{s=1}^n \frac{\hat{\delta}_s^{(m)}\hat{\beta}_s^{(m)}}{\eta_s(G_s)} \;\Rightarrow\; \mathbb{E}\Big[\Big(\frac{\gamma_n^{T_n}(1)}{\gamma_n(1)} - 1\Big)^2\Big] \leq \frac{4}{N}\sum_{s=1}^n \frac{\hat{\delta}_s^{(m)}\hat{\beta}_s^{(m)}}{\eta_s(G_s)}.$$
Proof. The result follows essentially from [7]. To modify the proof to our set-up, we will prove that, for $F : E^2 \to \mathbb{R}_+$ (where the expectation on the L.H.S. is w.r.t. the stochastic process that generates the SMC algorithm),
$$\mathbb{E}[(\gamma_n^{T_n})^{\otimes 2}(F)] \leq \Big(\frac{N-1}{N-2}\Big)^n\, \mathbb{E}_\xi\big[\eta_1^{\otimes 2} C_{\xi_1} Q_2^{\otimes 2} C_{\xi_2} \cdots Q_n^{\otimes 2} C_{\xi_n}(F)\big] \qquad (4)$$
where, for each $n \geq 1$, independently,
$$\mathbb{P}_\xi(\xi_n = 1) = 1 - \mathbb{P}_\xi(\xi_n = 0) = \frac{1}{N-1}$$
with corresponding joint expectation $\mathbb{E}_\xi$, and
$$C_1(F)(x, y) = F(x, x), \qquad C_0(F)(x, y) = F(x, y).$$
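Proposition 3.1 can be sanity-checked in the simplest one-step, i.i.d. setting: if $T$ is the number of Bernoulli($p$) trials needed to obtain $N$ successes, then $(N-1)/(T-1)$ is exactly unbiased for $p$ (whereas the naive $N/T$ is not). A small Monte Carlo illustration follows; the specific values $p = 0.3$, $N = 5$ are arbitrary choices for the demonstration.

```python
import random

def alive_estimate(p, N, rng):
    """Sample Bernoulli(p) trials until N successes; T is the total number of
    trials.  Return the estimator (N - 1)/(T - 1)."""
    successes, T = 0, 0
    while successes < N:
        T += 1
        if rng.random() < p:
            successes += 1
    return (N - 1) / (T - 1)

rng = random.Random(0)
p, N, M = 0.3, 5, 100_000
mean = sum(alive_estimate(p, N, rng) for _ in range(M)) / M
# the Monte Carlo mean should sit very close to p = 0.3
```

This is the one-step analogue of the identity $\mathbb{E}[\gamma_n^{T_n}(1)] = \gamma_n(1)$; dropping the "minus one" corrections destroys the property.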
Once (4) is proved, this gives a verification of Lemma 3.2, eq. (3.3), of [7]; given this, the rest of the argument then follows Proposition 3.4 of [7] and Theorem 5.1 and Corollary 5.2 in [7] (note that the fact that we have an upper bound with $\alpha = 0$ (as in [7]) does not modify the result). We will write expectations w.r.t. the probability space associated to the particle system, enlarged with the (independent) $\{\xi_n\}_{n\geq 1}$, as $\mathbb{E}_\xi$. Thus, we consider the proof of (4). We have
$$\mathbb{E}[(\gamma_n^{T_n})^{\otimes 2}(F) \mid \mathcal{F}_{n-1}] = \gamma_n^{T_n}(1)^2\, \mathbb{E}[(\eta_n^{T_n})^{\otimes 2}(F) \mid \mathcal{F}_{n-1}].$$
Now
$$\begin{aligned}
\mathbb{E}[(\eta_n^{T_n})^{\otimes 2}(F) \mid \mathcal{F}_{n-1}] &= \mathbb{E}\Big[\frac{T_n - 2}{T_n - 1}\,(\eta_n^{T_n})^2(F) + \frac{1}{T_n - 1}\,\eta_n^{T_n}(C(F)) \,\Big|\, \mathcal{F}_{n-1}\Big] \\
&\leq \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(F) + \frac{1}{N-1}\,\Phi_n(\eta_{n-1}^{T_{n-1}})(C(F)) \\
&\leq \frac{N-1}{N-2}\,\mathbb{E}_\xi\big[\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(C_{\xi_n}(F)) \mid \mathcal{F}_{n-1}\big]
\end{aligned}$$
where we have used $(T_n - 2)/(T_n - 1) \leq 1$, $1/(T_n - 1) \leq 1/(N - 1)$ and Lemmas A.2 and A.1 to obtain the second line. Thus we have that
$$\begin{aligned}
\mathbb{E}[(\gamma_n^{T_n})^{\otimes 2}(F) \mid \mathcal{F}_{n-1}] &\leq \gamma_n^{T_n}(1)^2\,\frac{N-1}{N-2}\,\mathbb{E}_\xi\big[\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(C_{\xi_n}(F)) \mid \mathcal{F}_{n-1}\big] \\
&\leq \frac{N-1}{N-2}\,\gamma_{n-1}^{T_{n-1}}(1)^2\,\mathbb{E}_\xi\big[(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(Q_n^{\otimes 2} C_{\xi_n}(F)) \mid \mathcal{F}_{n-1}\big].
\end{aligned}$$
Using the above inequality, one can repeat the argument inductively to deduce (4). This completes the proof of the Proposition.

Remark 3.1. The significance of the result is simply that if
$$\sup_s \frac{\hat{\delta}_s^{(m)}\hat{\beta}_s^{(m)}}{\eta_s(B_\epsilon(y_s))} < c,$$
then, if $N > cn$, the relative variance will be bounded by a constant that is independent of $n$. This will be useful for the PMCMC algorithm in Section 4; see, for instance, the discussion in [2] as to the significance of a relative variance result.

4 Particle MCMC

4.1 Motivation

We now utilize the results in Propositions 3.1-3.2. In particular, Proposition 3.1 allows us to construct an MCMC method for performing static parameter inference in the context of ABC approximations of HMMs. Recall Section 2.1.
Our objective is to sample from the posterior density:
$$\pi(\theta|y_{1:n}) = \frac{\big(\int_{\mathsf{X}^n} \prod_{k=1}^n g_\theta^\epsilon(y_k|z_k)\, f_\theta(z_k|z_{k-1})\, dz_{1:n}\big)\,\pi(\theta)}{\int_{\mathsf{X}^n \times \Theta} \prod_{k=1}^n g_\theta^\epsilon(y_k|z_k)\, f_\theta(z_k|z_{k-1})\, dz_{1:n}\,\pi(\theta)\,d\theta} \qquad (5)$$
where $g_\theta^\epsilon$, $f_\theta$ are as in (1) and $\pi(\theta)$ is a prior probability density on $\Theta$. Throughout the section we set $N \geq 2$, $\epsilon > 0$, but in general omit dependencies on these quantities. In practice, one often seeks to sample from an associated probability on the extended state-space $E^n \times \Theta$:
$$\tilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n}) \propto \Big(\prod_{k=1}^n I_{B_\epsilon(y_k)}(u_k)\, g_\theta(u_k|z_k)\, f_\theta(z_k|z_{k-1})\Big)\,\pi(\theta).$$
It is then easily verified that for any fixed $\theta \in \Theta$
$$\pi(\theta|y_{1:n}) = \int_{E^n} \tilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n})\, dz_{1:n}\, du_{1:n}.$$
A typical way to sample from $\tilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n})$ is via the Metropolis-Hastings method, proposing to move from $(\theta, z_{1:n}, u_{1:n})$ to $(\theta', z'_{1:n}, u'_{1:n})$ via the probability density:
$$q(\theta'|\theta) \prod_{k=1}^n g_{\theta'}(u'_k|z'_k)\, f_{\theta'}(z'_k|z'_{k-1});$$
such a proposal removes the need to evaluate $g_\theta$, which is not available in this context. As is well known (e.g. [2]), such procedures typically do not work very well and lead to slow mixing on the parameter space $\Theta$. This proposal can be greatly improved by running a particle filter (the particle marginal Metropolis-Hastings (PMMH) algorithm) as in [2]; that is, a Metropolis-Hastings move that will first move $\theta$, via $q(\theta'|\theta)$, and then run the algorithm in Section 2.2, picking a whole path $x_{1:n}^l \in E^n$ as the sample used, with probability proportional to $G_n(x_n^l)$. Remarkably, this procedure yields samples from (5) via an auxiliary probability density; the details can be found in [2], but the apparently fundamental property is that the estimate of the normalizing constant is unbiased. Note also that the sample from the Markov chain $(\theta, x_{1:n}^l)$ also provides a sample from $\tilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n})$. Intuitively, one expects that the alive filter in Section 2.3 out-performs the standard one, for a given computational complexity.
In addition, as seen in Proposition 3.1, the estimate of the normalizing constant is unbiased. It is therefore a reasonable conjecture that one can construct a new PMMH algorithm with the alive particle filter investigated previously in this article, and that this might perform better (in some sense) than the standard PMMH just described. We remark that the justification of this new PMMH, to be given below, follows from the statements in [4] (see also [3]) and Proposition 3.1; we provide details for completeness.

4.2 New PMMH Kernel

We will define an appropriate target probability to produce samples from (5), but we first give the algorithm:

1. Sample $\theta(0)$ from any absolutely continuous distribution. Then run the alive particle filter (with parameter value $\theta(0)$) in Section 2.3 up to time $n$, storing $\gamma_{n+1,\theta(0)}^{T_n(0)}(1)$. Pick a trajectory $x_{1:n}^l(0)$, $l \in \{1, \ldots, T_n(0) - 1\}$, with probability
$$\frac{G_n(x_n^l(0))}{\sum_{i=1}^{T_n(0)-1} G_n(x_n^i(0))}.$$
Set $i = 1$.
2. Propose $\theta'|\theta(i-1)$ from a proposal with positive density on $\Theta$ (write it as $q(\theta'|\theta)$). Then run the alive particle filter (with parameter value $\theta'$) in Section 2.3 up to time $n$, storing $\gamma_{n+1,\theta'}^{T'_n}(1)$. Pick a trajectory $(x_{1:n}^l)'$ with probability
$$\frac{G_n((x_n^l)')}{\sum_{i=1}^{T'_n - 1} G_n((x_n^i)')}.$$
Set $\theta(i) = \theta'$, $\gamma_{n+1,\theta(i)}^{T_n(i)}(1) = \gamma_{n+1,\theta'}^{T'_n}(1)$ with probability:
$$1 \wedge \frac{\gamma_{n+1,\theta'}^{T'_n}(1)}{\gamma_{n+1,\theta(i-1)}^{T_n(i-1)}(1)}\cdot\frac{\pi(\theta')\,q(\theta(i-1)|\theta')}{\pi(\theta(i-1))\,q(\theta'|\theta(i-1))}.$$
Otherwise set $\theta(i) = \theta(i-1)$, $\gamma_{n+1,\theta(i)}^{T_n(i)}(1) = \gamma_{n+1,\theta(i-1)}^{T_n(i-1)}(1)$. Set $i = i + 1$ and return to the start of 2.

Readers interested in the numerical implementation can skip to Section 5, noting that the $\theta$ samples will come from the posterior (5); this is justified in the rest of the section. It should be noted that our PMMH algorithm is just that in [2], except that we have replaced the standard particle filter with the alive particle filter.
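One iteration of step 2 above can be sketched as follows. Every argument here is a placeholder the user must supply; in particular `run_alive_filter` is assumed to return the log of the unbiased estimate $\gamma_{n+1,\theta}^{T_n}(1)$, i.e. $\sum_p \log\{(N-1)/(T_p-1)\}$, and the dummy model in the usage example is purely illustrative.

```python
import math
import random

def pmmh_step(theta, log_gamma, run_alive_filter, log_prior, propose, log_q, rng):
    """One iteration of the PMMH kernel of Section 4.2 (sketch).  theta is the
    current parameter, log_gamma the log of its stored normalizing-constant
    estimate; accept/reject uses the ratio of the unbiased estimates."""
    theta_p = propose(theta, rng)                 # theta' ~ q(.|theta)
    log_gamma_p = run_alive_filter(theta_p, rng)  # log gamma-hat at theta'
    log_alpha = ((log_gamma_p + log_prior(theta_p) + log_q(theta, theta_p))
                 - (log_gamma + log_prior(theta) + log_q(theta_p, theta)))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return theta_p, log_gamma_p               # accept
    return theta, log_gamma                       # reject

# Illustrative use with a dummy model: Gaussian random-walk proposal and a
# noisy log-"likelihood" standing in for the alive filter's output.
rng = random.Random(3)
theta, lg = 0.0, -1.0
for _ in range(100):
    theta, lg = pmmh_step(
        theta, lg,
        run_alive_filter=lambda t, r: -0.5 * t * t + r.gauss(0.0, 0.5),
        log_prior=lambda t: -0.5 * t * t / 100.0,
        propose=lambda t, r: t + r.gauss(0.0, 1.0),
        log_q=lambda a, b: 0.0,                   # symmetric proposal
        rng=rng)
```

Working on the log scale avoids underflow, since the product $\prod_p (N-1)/(T_p-1)$ can be extremely small for large $n$.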
We construct the following auxiliary target probability on the state-space:
$$\bar{E} = \Theta \times \bigcup_{T_1=N}^{\infty}\big(E^{T_1} \times \{T_1\}\big) \times \bigcup_{T_2=N}^{\infty}\big(\{1, \ldots, T_1 - 1\}^{T_2} \times E^{T_2} \times \{T_2\}\big) \times \cdots \times \bigcup_{T_n=N}^{\infty}\big(\{1, \ldots, T_{n-1} - 1\}^{T_n} \times E^{T_n} \times \{T_n\}\big) \times \{1, \ldots, T_n - 1\}.$$
Whilst the state-space looks complicated, it corresponds to the static parameter and all the variables (the states and the resampled indices) sampled by the alive particle filter up to time-step $n$, and then the picking of one of the final paths. For $n \geq 2$ (we omit $\theta$ from our notation) define
$$\Psi_n\big(d(x_n^1, \ldots, x_n^{T_n}), a_{n-1}^1, \ldots, a_{n-1}^{T_n}, T_n \,\big|\, x_{n-1}^{1:T_{n-1}}, T_{n-1}\big) := \frac{I_{S_n}(x_n^1, \ldots, x_n^{T_n}, T_n)\,\binom{T_n-1}{N-1}\prod_{i=1}^{T_n} \dfrac{G_{n-1}(x_{n-1}^{a_{n-1}^i})}{\sum_{j=1}^{T_{n-1}-1} G_{n-1}(x_{n-1}^j)}\, M_n(x_{n-1}^{a_{n-1}^i}, dx_n^i)}{\displaystyle\sum_{T_n=N}^{\infty}\; \sum_{a_{n-1}^{1:T_n} \in \{1, \ldots, T_{n-1}-1\}^{T_n}} \binom{T_n-1}{N-1}\int_{E^{T_n}} I_{S_n}(x_n^1, \ldots, x_n^{T_n}, T_n) \prod_{i=1}^{T_n} \dfrac{G_{n-1}(x_{n-1}^{a_{n-1}^i})}{\sum_{j=1}^{T_{n-1}-1} G_{n-1}(x_{n-1}^j)}\, M_n(x_{n-1}^{a_{n-1}^i}, dx_n^i)}$$
where, for $n \geq 1$,
$$S_n = \Big\{(u_n^1, \ldots, u_n^{T_n}, T_n) \in \mathsf{Y}^{T_n} \times \{N, N+1, \ldots\} : \sum_{i=1}^{T_n - 1} I_{B_\epsilon(y_n)}(u_n^i) = N - 1,\; u_n^{T_n} \in B_\epsilon(y_n)\Big\}.$$
In addition, set
$$\Psi_1\big(d(x_1^1, \ldots, x_1^{T_1}), T_1\big) := \frac{I_{S_1}(x_1^1, \ldots, x_1^{T_1}, T_1)\,\binom{T_1-1}{N-1}\prod_{i=1}^{T_1} M_1(x_0, dx_1^i)}{\sum_{T_1=N}^{\infty}\binom{T_1-1}{N-1}\int_{E^{T_1}} I_{S_1}(x_1^1, \ldots, x_1^{T_1}, T_1)\prod_{i=1}^{T_1} M_1(x_0, dx_1^i)}.$$
Then the PMMH algorithm just defined samples from the target (cf. [2, pp. 298])
$$\bar{\pi}(\theta, d(x_1, \ldots, x_n), a_{1:n-1}, l, T_{1:n}|y_{1:n}) \propto \frac{G_n(x_n^l)}{\sum_{j=1}^{T_n-1} G_n(x_n^j)}\,\gamma_{n+1,\theta}^{T_n}(1)\prod_{k=2}^n \Psi_k\big(d(x_k^1, \ldots, x_k^{T_k}), a_{k-1}^1, \ldots, a_{k-1}^{T_k}, T_k \,\big|\, x_{k-1}^{1:T_{k-1}}, T_{k-1}\big)\,\Psi_1\big(d(x_1^1, \ldots, x_1^{T_1}), T_1\big)\,\pi(\theta),$$
where $a_k = (a_k^1, \ldots, a_k^{T_k})$, $x_k = (x_k^1, \ldots, x_k^{T_k})$ and $l \in \{1, \ldots, T_n - 1\}$. Marginalizing over $l$:
$$\bar{\pi}(\theta, d(x_1, \ldots, x_n), a_{1:n-1}, T_{1:n}|y_{1:n}) \propto \gamma_{n+1,\theta}^{T_n}(1)\prod_{k=2}^n \Psi_k\big(d(x_k^1, \ldots, x_k^{T_k}), a_{k-1}^1, \ldots, a_{k-1}^{T_k}, T_k \,\big|\, x_{k-1}^{1:T_{k-1}}, T_{k-1}\big)\,\Psi_1\big(d(x_1^1, \ldots, x_1^{T_1}), T_1\big)\,\pi(\theta).$$
Then, using Proposition 3.1, one has that the $\theta$-marginal satisfies $\bar{\pi}(\theta|y_{1:n}) \propto \gamma_{n+1,\theta}(1)\,\pi(\theta)$.
That is, for any fixed $\theta \in \Theta$,
$$\pi(\theta|y_{1:n}) = \int_{\bar{E}\setminus\Theta} \bar{\pi}(\theta, d(x_1, \ldots, x_n), a_{1:n-1}, l, T_{1:n}|y_{1:n}).$$
Note also that the samples $(\theta, x_{1:n}^l)$ from $\bar{\pi}$ are marginally distributed according to $\tilde{\pi}(\theta, z_{1:n}, u_{1:n}|y_{1:n})$. The associated ergodicity of the new PMMH algorithm follows the construction in [2] and we omit details for brevity. Finally, we remark that Proposition 3.2 suggests a rule of thumb for setting $N$: one should choose $N = O(n)$. More detailed analysis for choosing $N$ can be found in [12].

4.3 Some Remarks on PMCMC

If one follows the proof of Proposition 3.2, we conjecture the following result. Suppose that the relative variance for the standard particle filter (with $N - 1$ particles) estimate of $\gamma_n(1)$ is written $\check{\mathbb{E}}\big[\big(\frac{\gamma_n^{N-1}(1)}{\gamma_n(1)} - 1\big)^2\big]$; then one has
$$\mathbb{E}\Big[\Big(\frac{\gamma_n^{T_n}(1)}{\gamma_n(1)} - 1\Big)^2\Big] \leq \check{\mathbb{E}}\Big[\Big(\frac{\gamma_n^{N-1}(1)}{\gamma_n(1)} - 1\Big)^2\Big].$$
This means that, in the PMMH context, the (relative) variance of the estimate using the alive filter is always less than that of the standard particle filter, for every $\theta$ (not taking into account that the alive filter will cost more in terms of computation). One might expect that this is sufficient to deduce that the PMMH with the alive filter is better (e.g. with regards to its convergence properties) than the PMMH with the standard particle filter. As noted in [5], this is not typically the case, and one often needs a stronger ordering than variance to deduce the superiority of one particle MCMC algorithm versus another: that of convex ordering (see e.g. [20, Definition 3.A.1]; this is denoted $\leq_{cx}$). In this latter work the authors show that, if the estimates $\gamma_n^{T_n}(1)$ and $\gamma_n^{N-1}(1)$ are convex ordered, then one PMMH will be better than the other with regards to the asymptotic variance and spectral gap (see [5, Theorem 3] for precise details).
In order to utilize the results (we will suppress $\theta$) for our context, we will consider a simplified scenario, where $M_n(x, \cdot)$ is simply a probability measure $\nu$ on a time-homogeneous space $E$, so one seeks to estimate
$$\prod_{p=1}^{n-1} \nu(G_p).$$
We will generate, independently at each time $p$, $T_p$ samples $x_p^1, \ldots, x_p^{T_p}$ from $\nu$, where
$$T_p = \inf\Big\{n \geq N : \sum_{i=1}^n G_p(x_p^i) \geq N\Big\},$$
and compare the estimates
$$\prod_{p=1}^{n-1} \frac{N-1}{T_p - 1} \qquad \text{and} \qquad \prod_{p=1}^{n-1} \frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i).$$
Clearly, the computational effort to produce the second estimate is significantly less than the first. These two estimates will be proxies representing the alive particle filter and the standard particle filter, respectively. We have the following result.

Proposition 4.1. For any $n \geq 2$ and $N \geq 2$,
$$\prod_{p=1}^{n-1} \frac{N-1}{T_p - 1} \;\leq_{cx}\; \prod_{p=1}^{n-1} \frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i).$$
Proof. Let $p$ be fixed. Conditionally on $T_p$, the joint probability of $x_p^1, \ldots, x_p^{T_p}$ is
$$\Psi_p\big(d(x_p^1, \ldots, x_p^{T_p})\,\big|\,T_p\big) \propto I_{S_p}(x_p^1, \ldots, x_p^{T_p}, T_p)\prod_{i=1}^{T_p} \nu(dx_p^i)$$
and hence, conditionally on $T_p$, $x_p^1, \ldots, x_p^{T_p - 1}$ are exchangeable. Clearly we have
$$\frac{N-1}{T_p - 1} = \frac{1}{T_p - 1}\sum_{i=1}^{T_p - 1} G_p(x_p^i).$$
Also, it follows that, for any fixed $T_p$, the vector $(1/(T_p - 1), \ldots, 1/(T_p - 1))$ ($T_p - 1$ entries) is smaller in the majorization order (e.g. [20, page 2]) than $(1/(N-1), \ldots, 1/(N-1), 0, \ldots, 0)$ ($T_p - 1$ entries, of which only the first $N - 1$ are non-zero). Then, conditionally on $T_p$, by [20, Theorem 3.A.35],
$$\frac{N-1}{T_p - 1}\,\Big|\,T_p \;\leq_{cx}\; \frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i)\,\Big|\,T_p.$$
Hence, by [20, Theorem 3.A.12(b)], it follows that
$$\frac{N-1}{T_p - 1} \;\leq_{cx}\; \frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i).$$
The proof is concluded by the independence over each term in the product and repeated application of [20, Corollary 3.A.22]. The implication of this result is as follows.
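The variance ordering implied by Proposition 4.1 (convex order implies ordered variances, since both estimators have the same mean) is easy to check numerically for a single factor of the product. The values $\nu(G_p) = p = 0.25$ and $N = 10$ below are arbitrary illustrative choices.

```python
import random

rng = random.Random(42)
p, N, M = 0.25, 10, 20_000

def alive(rng):
    """Proxy for one alive-filter factor: (N-1)/(T-1)."""
    s, T = 0, 0
    while s < N:
        T += 1
        s += rng.random() < p
    return (N - 1) / (T - 1)

def standard(rng):
    """Proxy for one standard-filter factor: mean of N-1 indicators."""
    return sum(rng.random() < p for _ in range(N - 1)) / (N - 1)

a = [alive(rng) for _ in range(M)]
s = [standard(rng) for _ in range(M)]
mean_a, mean_s = sum(a) / M, sum(s) / M
var_a = sum((x - mean_a) ** 2 for x in a) / M
var_s = sum((x - mean_s) ** 2 for x in s) / M
# both sample means sit near p, while var_a is markedly below var_s
```

Note that the comparison ignores cost: the alive factor consumes roughly $N/p$ draws per time step versus $N-1$ for the standard one, which is exactly the caveat raised in the text.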
If one considers two PMMH algorithms which move $\theta$ with the same proposal density and then sample the $x_p^i$ as above (assuming that $\nu$ depends on $\theta$), then by [5, Theorem 3] using the estimate
$$\prod_{p=1}^{n-1} \frac{N-1}{T_p - 1} \qquad (6)$$
will be better than using $\prod_{p=1}^{n-1} \frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i)$: for instance, for appropriate functionals of interest, the asymptotic variance in the CLT for the MCMC algorithm with the estimate (6) will be lower than if one used $\prod_{p=1}^{n-1} \frac{1}{N-1}\sum_{i=1}^{N-1} G_p(x_p^i)$ (other good comparison properties also occur, but we do not discuss them). This provides some theoretical evidence for preferring the alive particle filter over the standard one in PMMH algorithms. We should again remind the reader that the result should be understood with some caution, as the alive filter uses much more computational effort and this analysis is for a simplified version of the model. However, we will investigate the implications of this result from an empirical perspective in the next section.

5 Numerical Implementation

5.1 ABC Filtering

We will compare the alive filter to the standard particle filter ($\theta$ fixed). Whilst the alive particle filter has been considered elsewhere, a comparison on ABC models is not in the literature; we expect this simulation study to be informative for PMCMC algorithms.

5.1.1 Linear Gaussian Model

To investigate the alive particle filter, we consider the following linear Gaussian state-space model (with all quantities one-dimensional):
$$Z_n = Z_{n-1} + V_n, \qquad Y_n = 2 Z_n + W_n, \qquad n \geq 1,$$
where $V_n \sim \mathcal{N}(0, \sigma_v^2)$ and, independently, $W_n \sim \mathcal{N}(0, \sigma_w^2)$. Our objective is to fit an ABC approximation of this HMM; this is simply to investigate the algorithm constructed in this article.

5.1.2 Set-up

Data are simulated from the (true) model for $T = 5000$ time steps, with $\sigma_v^2 \in \{0.1, 1, 5\}$ and $\sigma_w^2 \in \{0.1, 1, 5\}$. For $n \in \{1, \ldots, T\}$, with probability $1/500$ (i.e. if $p_n \leq 1/500$, where $p_n \stackrel{\text{i.i.d.}}{\sim} \mathcal{U}_{[0,1]}$, the uniform distribution on $[0,1]$) we set $Y_n = c$, where $c \in \{80, 90, \ldots, 140, 150\}$, so that the data contain occasional outliers.
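For this linear Gaussian model the exact filter and marginal likelihood (used below as the $\epsilon = 0$ ground truth) are available via the Kalman filter. A minimal sketch follows; the initial mean `m0` and variance `P0` are our assumptions, as they are not stated in the text.

```python
import math

def kalman_filter(ys, sigma_v2, sigma_w2, m0=0.0, P0=1.0):
    """Kalman filter for Z_n = Z_{n-1} + V_n, Y_n = 2 Z_n + W_n.  Returns the
    filtered means and the log marginal likelihood (the epsilon -> 0 limit of
    the ABC normalizing constant)."""
    m, P, log_lik, means = m0, P0, 0.0, []
    for y in ys:
        # predict step
        m_pred, P_pred = m, P + sigma_v2
        # predictive law of Y_n: mean 2*m_pred, variance 4*P_pred + sigma_w2
        S = 4.0 * P_pred + sigma_w2
        log_lik += -0.5 * (math.log(2 * math.pi * S) + (y - 2 * m_pred) ** 2 / S)
        # update step (observation matrix H = 2)
        K = 2.0 * P_pred / S
        m = m_pred + K * (y - 2 * m_pred)
        P = (1.0 - 2.0 * K) * P_pred
        means.append(m)
    return means, log_lik

means, ll = kalman_filter([2.0, 2.0], sigma_v2=1.0, sigma_w2=1.0)
```

With `m0 = 0`, `P0 = 1` and the data above, the first filtered mean is $8/9$, which follows from $K = 4/9$ at the first update.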
Recall $B_\epsilon(y) = \{u : |u - y| < \epsilon\}$; we consider a fixed sequence of $\epsilon$ whose values belong to the set $\{5, 10, 15\}$. We compare the alive particle filter to the approach in [13]. The proposal dynamics are as described in Section 2.3. For the approach in [13], $N = 2000$ and we resample at every time step. For the alive particle filter, we used $N = 1500$ particles; this is to keep the computation time approximately equal. We also estimate the normalizing constant via the alive filter at each time step and compare it with 'exact' values obtained via the Kalman filter in the limiting case $\epsilon = 0$. To assess the performance in normalizing-constant estimation, the relative variance is estimated via independent runs of the procedure. Our results are presented in two parts. In the first part, we compare the performance of the two particle filters under different scenarios. In the second part, we focus on examples where the approach in [13] collapses. All results were averaged over 50 runs. We note that, with regards to the results in this section and the approach in [15], generally similar conclusions can be drawn with regards to the comparison with the approach in [13].

5.1.3 Part I

In this part, the analyses of the alive particle filter were completed in approximately 115 seconds, and approximately 103 seconds were taken by the approach in [13] (which we simply term the particle filter). Our results are shown in Figures 1-6. Figure 1 displays the log relative error of the alive filter to the particle filter: we present the time evolution of the $L_1$ log relative error between the 'exact' and estimated first moment. From our results, the mean log relative error for each panel is $\{0.06, 0.04, 0.07\}$. Figure 2 plots the absolute $L_1$ error of the alive particle filter across time. These results indicate, in the scenarios under study, that both filters perform about the same with regards to estimating the filter.
This is unsurprising, as both methods use essentially the same information, and the outlying values do not lead to a collapse of the particle filter. In Figure 3, we show the time evolution of the log of the normalizing constant estimate for three approaches: the Kalman filter (black '–' line), the new ABC filter (red '-·-' line) and the SMC method (blue '··' line). Figure 4 displays the (log) relative variance of the estimate of the normalizing constant via the alive particle filter, using the Kalman filter as the ground truth. In Figure 3, there is unsurprisingly a bias in the estimation of the normalizing constant, as the ABC approximation is not exact, i.e. $\epsilon \neq 0$. In Figure 4, the linear decay in variance proven in Proposition 3.2 is demonstrated (although under a log transformation). In Figures 5 and 6, we show the number of particles used at each time step (that is, to achieve $N$ alive particles) for the alive filter (Figure 5) and the number of alive particles for the standard particle filter (Figure 6). Both figures illustrate the effect of outlying data: the alive filter has to work 'harder' (i.e. assigns more computational effort), whereas the standard filter simply loses particles.

Figure 1: Estimation error of the first moment for the linear state space model. Each panel displays the (log) ratio of $L_1$ error of the alive filter to the old filter.

Figure 2: Estimation error of the first moment for the linear state space model using the alive particle filter (red '?' indicates the x-axis position of an outlier).

Figure 3: Estimated normalizing constant for the linear state space model: Kalman filter (black '–'), alive filter (red '-·-') and old filter (blue '··'). Each panel displays the estimated normalizing constant across time.

5.1.4 Part II

In this part, we keep the initial conditions the same as in the previous section but change the value of $\epsilon$. Instead of using $\epsilon \in \{5, 10, 15\}$, we set smaller values, i.e.
$\epsilon \in \{3, 6, 12\}$ (recall that the smaller $\epsilon$ is, the closer the ABC approximation is to the true HMM [13, Theorem 1]). This change makes the standard particle filter collapse, whereas the alive filter does not have this problem. All results were averaged over 50 runs and are shown in Figures 7-8. In Figure 7, we present the true simulated hidden trajectory along with a plot of the estimated $Z_t$ given by the two particle filters across time when $(\sigma_v, \sigma_w) = (\sqrt{5}, \sqrt{5})$. As shown in Figure 7, the alive filter provides better estimation than the old particle filter. Figure 8 displays the log relative error of the alive filter to the standard particle filter, which supports the previous point with regards to estimation of the hidden state. Based upon the results displayed, the alive filter can provide good estimation results under the same conditions in which the old particle filter collapses.

5.2 Particle MCMC: Implementation on Real Data

We consider the following state-space model, for $n \geq 1$:
$$Y_n = \varepsilon_n\,\beta\exp(Z_n), \qquad Z_n = \phi Z_{n-1} + V_n,$$
where $\varepsilon_n \sim \mathcal{St}(0, \xi_1, \xi_2, \xi_3)$ (a stable distribution with location parameter 0, scale $\xi_1$, skewness parameter $\xi_2$ and stability parameter $\xi_3$) and $V_n \sim \mathcal{N}(0, c)$. We set $\theta = (\beta, c, \phi)$, with priors $c \sim \mathcal{IG}(2, 1/100)$, $\phi \sim \mathcal{IG}(2, 1/50)$ ($\mathcal{IG}(a, b)$ is an inverse Gamma distribution with mode $b/(a+1)$) and $\beta \sim \mathcal{N}(0, 10)$. Note that the inverse Gamma distributions have infinite variance. We consider the daily (adjusted closing) index of the S&P 500 between 03/01/2011 and 14/02/2013 (533 data points). Our data are the log-returns: if $I_n$ is the index value at time $n$, then $Y_n = \log(I_n/I_{n-1})$. The data are displayed in Figure 9. The basic form of this state-space model has been used in many articles for the analysis of financial data (see for instance [19]), but generally $\varepsilon_n$ has a tractable probability density function.
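Although the stable density is intractable, the law is easy to simulate forward, which is the key requirement for an ABC approach. One standard way to do this is the Chambers-Mallows-Stuck method; the sketch below is our own (the parameter values, the seed, and all names are illustrative assumptions, not the values fitted in this paper).

```python
import numpy as np

def rstable(rng, xi1, xi2, xi3, size):
    """Chambers-Mallows-Stuck draw from St(0, xi1, xi2, xi3):
    location 0, scale xi1, skewness xi2 in [-1, 1], stability xi3 != 1."""
    alpha, beta = xi3, xi2
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit exponential
    b = np.arctan(beta * np.tan(np.pi * alpha / 2)) / alpha
    s = (1 + beta**2 * np.tan(np.pi * alpha / 2) ** 2) ** (1 / (2 * alpha))
    x = (s * np.sin(alpha * (v + b)) / np.cos(v) ** (1 / alpha)
         * (np.cos(v - alpha * (v + b)) / w) ** ((1 - alpha) / alpha))
    return xi1 * x

# Simulate one path of the stochastic volatility model above
# (illustrative parameter values; the paper estimates these by PMMH).
rng = np.random.default_rng(1)
n_obs, beta_par, c, phi = 533, 1.0, 1.0 / 100, 0.9

z = np.zeros(n_obs + 1)
for n in range(1, n_obs + 1):
    z[n] = phi * z[n - 1] + rng.normal(0.0, np.sqrt(c))

eps = rstable(rng, 1.0, 1.0, 1.75, n_obs)          # xi1 = xi2 = 1, xi3 = 1.75
y = eps * beta_par * np.exp(z[1:])
```

For the ABC approximation, one simulates pseudo-observations in exactly this way and checks whether they fall within $\epsilon$ of the data, which is where the indicator potentials (and hence the possibility of collapse) come from.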
The stable distribution may help us capture the heavy tails prevalent in financial data more realistically than, perhaps, a standard Gaussian. In most scenarios, the probability density function of a stable distribution is intractable, which suggests that an ABC approximation might be a sensible way to approximate the true model; this is what is fitted below.

Figure 4: (Log) relative variance of the normalizing constant of the alive filter to the Kalman filter for the linear state space model. Each panel displays the relative variance across time.

Figure 5: Number of particles used by the alive filter for the linear state space model. Each panel displays the number of particles across time.

5.2.1 Algorithm set-up

We consider two scenarios to compare the standard PMMH algorithm and the new one developed above. In the first scenario we set $\xi_3 = 1.75$ and in the second $\xi_3 = 1.2$, with $\xi_1 = \xi_2 = 1$ in both. In the first case, we set $\epsilon$ to a suitable value, as the data are not expected to jump off the same scale as the initial data. In the second, $\epsilon$ is significantly reduced; this is to illustrate a point about the algorithm we introduce. Both algorithms are run for about the same computational time, such that the new PMMH algorithm has 20000 iterations. The parameters are initialized with draws from the priors. The proposal on $\beta$ is a normal random walk and, for $(c, \phi)$, a gamma proposal centered at the current point, with proposal variance scaled to obtain reasonable acceptance rates. We consider $N \in \{10, 100, 1000\}$; for the new PMMH algorithm this value is lower, to allow the same computational time.

5.2.2 Results

Our results are presented in Figures 10-13. In Figures 10-11 we can see the output in the case $\xi_3 = 1.75$. For all cases, it appears that both algorithms perform very well; the acceptance rates were around 0.25 in each case.
For the new PMMH algorithm, the average numbers of simulations of the data, per iteration and data point, were (1636, 745, 365) for $N \in \{1000, 100, 10\}$ respectively (recall we have modified $N$ to make the computational time similar to the standard PMMH). For this scenario one would prefer the standard PMMH, as the algorithmic performance is very good, with the removal of a random computational cost per iteration. In Figures 12-13, the output when $\xi_3 = 1.2$ is displayed. In Figure 12 we can see that the standard PMMH algorithm performs very badly, barely moving across the parameter space, whereas the new PMMH algorithm has very reasonable performance (Figure 13). In this case, $\epsilon$ is very small and the standard SMC collapses very often, which leads to the undesirable performance displayed. We note that considerable effort was expended in trying to get the standard PMMH algorithm to work in this case, but we did not manage to do so (so we do not claim that the algorithm cannot be made to work). Note also that, whilst these are just single runs of the algorithms, we have seen this behaviour in many other cases and it is typical in these examples. The results here suggest that the new PMMH kernel might be preferred in difficult sampling scenarios, but in simple cases it does not seem to be required.

Figure 6: Number of particles used by the particle filter of [13] for the linear state space model. Each panel displays the number of particles across time.

Figure 7: (a) 'True' $Z_t$ and (b) estimated $Z_t$ across time for the linear state space model, where red ('–') indicates the alive particle filter and black ('··') indicates the particle filter.

6 Summary

In this article we have investigated the alive particle filter; we developed and analyzed new particle estimates and derived new and principled MCMC algorithms. There are several extensions to the work in this article.
Firstly, we have presented and analyzed the most standard particle filter; one can investigate more intricate filters commensurate with the current state of the art. Secondly, we have presented the most basic PMCMC algorithm; one can extend to particle Gibbs methods and beyond; see for instance [22] for a particle Gibbs algorithm. Finally, one can also combine the SMC theory in this article with MCMC theory to investigate the performance of our PMCMC procedures.

Acknowledgements

The first author was supported by an MOE Singapore grant R-155-000-119-133. We thank Gareth Peters for useful conversations on this work.

Figure 8: Estimation error of the first moment for the linear state space model. Each panel displays the (log) ratio of $L_1$ error of the alive filter to the particle filter.

Figure 9: S&P 500 (a) index data and (b) (log) daily return.

A Technical Results for the Normalizing Constant

Below, $\mathcal{F}_n$ is the filtration generated by the alive particle system up to time $n$.

Lemma A.1. We have for any $n \geq 1$, $N \geq 2$ and $\varphi \in \mathcal{B}_b(E_n)$, that
$$\mathbb{E}[\eta_n^{T_n}(\varphi)\,|\,\mathcal{F}_{n-1}] = \Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi),$$
where $\Phi_1(\eta_0^{T_0})(\varphi) = M_1(\varphi)$.

Proof. We have, for any $n \geq 1$, $N \geq 2$, that $T_n \,|\, \mathcal{F}_{n-1}$ is a negative binomial random variable with parameters $N$ and success probability $\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n) = \Phi_n(\eta_{n-1}^{T_{n-1}})(G_n)$, and note from [18, 21] that
$$\mathbb{E}\Big[\frac{N-1}{T_n-1}\,\Big|\,\mathcal{F}_{n-1}\Big] = \Phi_n(\eta_{n-1}^{T_{n-1}})(B_n). \qquad (7)$$
Now,
$$\mathbb{E}[\eta_n^{T_n}(\varphi)\,|\,\mathcal{F}_{n-1}] = \mathbb{E}\Big[\frac{1}{T_n-1}\sum_{i=1}^{T_n-1}\varphi(X_n^i)\,\Big|\,\mathcal{F}_{n-1}\Big]$$
$$= \mathbb{E}\Big[\frac{N-1}{T_n-1}\,\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi\mathbb{I}_{B_n})}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n)} + \Big(1-\frac{N-1}{T_n-1}\Big)\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi\mathbb{I}_{B_n^c})}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n^c)}\,\Big|\,\mathcal{F}_{n-1}\Big]$$
$$= \mathbb{E}\Big[\frac{1}{T_n-1}\Big\{(N-1)\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi\mathbb{I}_{B_n})}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n)} + (T_n-N)\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi\mathbb{I}_{B_n^c})}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n^c)}\Big\}\,\Big|\,\mathcal{F}_{n-1}\Big],$$
where we have used the fact that there are $N-1$ particles that are 'alive' and $T_n - N$ that will die, together with the conditional distribution of the samples given $T_n$.
Now by (7), it follows that
$$\mathbb{E}[\eta_n^{T_n}(\varphi)\,|\,\mathcal{F}_{n-1}] = \Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi\mathbb{I}_{B_n}) + \Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi\mathbb{I}_{B_n^c}) = \Phi_n(\eta_{n-1}^{T_{n-1}})(\varphi),$$
which concludes the proof.

Lemma A.2. We have for any $n \geq 2$, $N \geq 3$ and $\varphi \in \mathcal{B}_b(E_n^2)$:
$$\mathbb{E}[(\eta_n^{T_n})^2(\varphi)\,|\,\mathcal{F}_{n-1}] = \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\varphi).$$

Figure 10: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in Section 2.2. Each row displays the samples with a different $N$. Here $\xi_3 = 1.75$.

Figure 11: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in Section 2.3. Each row displays the samples with a different $N$. Here $\xi_3 = 1.75$.

Figure 12: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in Section 2.2. Each row displays the samples with a different $N$. Here $\xi_3 = 1.2$.

Figure 13: Trace plot of each parameter across iterations for a PMMH algorithm using the SMC algorithm in Section 2.3. Each row displays the samples with a different $N$. Here $\xi_3 = 1.2$.

Proof. We have:
$$\mathbb{E}[(\eta_n^{T_n})^2(\varphi)\,|\,\mathcal{F}_{n-1}] = \mathbb{E}\Big[\frac{(N-1)(N-2)}{(T_n-1)(T_n-2)}\,\Big|\,\mathcal{F}_{n-1}\Big]\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n^2}\varphi)}{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(B_n^2)} \qquad (8)$$
$$+\; \mathbb{E}\Big[2(N-1)\Big(\frac{1}{T_n-1}-\frac{N-2}{(T_n-1)(T_n-2)}\Big)\,\Big|\,\mathcal{F}_{n-1}\Big]\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n\times B_n^c}\varphi)}{\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n)\,\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n^c)} \qquad (9)$$
$$+\; \mathbb{E}\Big[1-\frac{N-1}{T_n-1}-\frac{N-1}{T_n-2}+\frac{(N-1)^2}{(T_n-1)(T_n-2)}\,\Big|\,\mathcal{F}_{n-1}\Big]\frac{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{(B_n^c)^2}\varphi)}{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}((B_n^c)^2)}. \qquad (10)$$
The three terms on the R.H.S. arise from the $(N-1)(N-2)$ different pairs of particles which land in $B_n^2$ (8), the $2(N-1)(T_n-N)$ pairs of different particles which land in $B_n\times B_n^c$ (9), and the $(T_n-N)(T_n-N-1)$ different pairs of particles which land in $(B_n^c)^2$ (10); the factors of $\Phi_n(\eta_{n-1}^{T_{n-1}})$ arise from the conditional distributions of the particles given $T_n$ (recalling that, conditional on $\mathcal{F}_{n-1}$, $T_n$ is a negative binomial random variable with parameters $N$ and $\Phi_n(\eta_{n-1}^{T_{n-1}})(B_n)$).
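The inverse moments of such negative binomial variables, taken from [18, 21], drive the rest of the argument; as a sanity check (ours, not part of the proof), they can be verified by simple Monte Carlo, treating $T$ as the number of trials until $N$ successes with success probability $p$:

```python
import numpy as np

# Check E[(N-1)/(T-1)] = p and E[(N-1)(N-2)/((T-1)(T-2))] = p^2
# for T = number of trials to obtain N successes, success probability p.
rng = np.random.default_rng(2)
N, p, m = 5, 0.3, 400_000

# numpy's negative_binomial returns the number of FAILURES before the
# N-th success, so the trial count is T = N + failures (hence T >= N).
T = N + rng.negative_binomial(N, p, size=m)

first = np.mean((N - 1) / (T - 1))                      # should be close to p
second = np.mean((N - 1) * (N - 2) / ((T - 1) * (T - 2)))  # close to p^2
```

With these parameter choices the Monte Carlo averages agree with $p$ and $p^2$ to well within sampling error, consistent with the identities used in (7) and in the proof of Lemma A.2.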
Now for (8), we have from [18, 21] that
$$\mathbb{E}\Big[\frac{(N-1)(N-2)}{(T_n-1)(T_n-2)}\,\Big|\,\mathcal{F}_{n-1}\Big] = \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(B_n^2),$$
so that (8) becomes
$$\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n^2}\varphi).$$
Recalling (7) and using the above result, (9) becomes
$$2\,\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n\times B_n^c}\varphi).$$
Finally, noting that for any $t \neq 1, 2$,
$$\frac{1}{t-2} = \frac{1}{t-1} + \frac{1}{(t-1)(t-2)},$$
and thus, using the above results, that
$$\mathbb{E}\Big[\frac{N-1}{T_n-2}\,\Big|\,\mathcal{F}_{n-1}\Big] = \Phi_n(\eta_{n-1}^{T_{n-1}})(B_n) + \frac{\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(B_n^2)}{N-2},$$
it follows that (10) is equal to
$$\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{(B_n^c)^2}\varphi).$$
Hence we have shown
$$\mathbb{E}[(\eta_n^{T_n})^2(\varphi)\,|\,\mathcal{F}_{n-1}] = \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n^2}\varphi) + 2\,\Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{B_n\times B_n^c}\varphi) + \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\mathbb{I}_{(B_n^c)^2}\varphi) = \Phi_n(\eta_{n-1}^{T_{n-1}})^{\otimes 2}(\varphi).$$

References

[1] Amrein, M. & Künsch, H. (2011). A variant of importance splitting for rare event estimation: Fixed number of successes. ACM TOMACS, 21, article 13.

[2] Andrieu, C., Doucet, A. & Holenstein, R. (2010). Particle Markov chain Monte Carlo methods (with discussion). J. R. Statist. Soc. Ser. B, 72, 269–342.

[3] Andrieu, C. & Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist., 37, 697–725.

[4] Andrieu, C. & Vihola, M. (2015). Convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms. Ann. Appl. Probab., 25, 1030–1077.

[5] Andrieu, C. & Vihola, M. (2014). Establishing some order amongst exact approximations of MCMCs. arXiv:1404.6909 [stat.CO].

[6] Cérou, F., Del Moral, P., Furon, T. & Guyader, A. (2012). Sequential Monte Carlo for rare event estimation. Statist. Comp., 22, 795–808.

[7] Cérou, F., Del Moral, P. & Guyader, A. (2011). A non-asymptotic variance theorem for un-normalized Feynman-Kac particle models. Ann. Inst. Henri Poincaré, 47, 629–649.

[8] Dean, T. A., Singh, S. S., Jasra, A. & Peters, G. W. (2014). Parameter estimation for hidden Markov models with intractable likelihoods. Scand. J. Statist., 41, 970–987.

[9] Del Moral, P. (2004). Feynman-Kac Formulae. Springer: New York.
[10] Del Moral, P. & Doucet, A. (2004). Particle motions in absorbing medium with hard and soft obstacles. Stoch. Anal. Appl., 22, 1175–1207.

[11] Del Moral, P., Doucet, A. & Jasra, A. (2012). An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statist. Comp., 22, 1009–1020.

[12] Doucet, A., Pitt, M. K., Deligiannidis, G. & Kohn, R. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika, 102, 295–313.

[13] Jasra, A., Singh, S. S., Martin, J. S. & McCoy, E. (2012). Filtering via approximate Bayesian computation. Statist. Comp., 22, 1223–1237.

[14] Lee, A., Andrieu, C. & Doucet, A. (2015). An active particle perspective of MCMC and its application to locally adaptive MCMC algorithms. Work in progress.

[15] Le Gland, F. & Oudjane, N. (2004). Stability and uniform approximation of nonlinear filters using the Hilbert metric, and application to particle filters. Ann. Appl. Probab., 14, 144–187.

[16] Le Gland, F. & Oudjane, N. (2006). A sequential particle algorithm that keeps the particle system alive. In Stochastic Hybrid Systems: Theory and Safety Critical Applications (H. Blom & J. Lygeros, Eds), Lecture Notes in Control and Information Sciences 337, 351–389. Springer: Berlin.

[17] Martin, J. S., Jasra, A., Singh, S. S., Whiteley, N., Del Moral, P. & McCoy, E. (2014). Approximate Bayesian computation for smoothing. Stoch. Anal. Appl., 32, 397–422.

[18] Neuts, M. F. & Zacks, S. (1967). On mixtures of χ² and F distributions which yield distributions of the same family. Ann. Inst. Stat. Math., 19, 527–536.

[19] Pitt, M. K. & Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters. J. Amer. Statist. Assoc., 94, 590–599.

[20] Shaked, M. & Shanthikumar, J. G. (2007). Stochastic Orders. Springer: New York.

[21] Zacks, S. (1980). On some inverse moments of negative-binomial distributions and their application in estimation. J. Stat. Comp. & Sim., 10, 163–165.
[22] Zhang, X. (2014). Some Contributions to Approximate Inference in Bayesian Statistics. PhD Thesis, Imperial College London.