Perfect Simulation of Hawkes Processes
Jesper Møller and Jakob G. Rasmussen
Advances in Applied Probability, Vol. 37, No. 3 (Sep. 2005), pp. 629-646. Published by the Applied Probability Trust. Stable URL: http://www.jstor.org/stable/30037347

PERFECT SIMULATION OF HAWKES PROCESSES

JESPER MØLLER* ** and JAKOB G. RASMUSSEN,* *** Aalborg University

Abstract

Our objective is to construct a perfect simulation algorithm for unmarked and marked Hawkes processes. The usual straightforward simulation algorithm suffers from edge effects, whereas our perfect simulation algorithm does not. By viewing Hawkes processes as Poisson cluster processes and using their branching and conditional independence structures, useful approximations of the distribution function for the length of a cluster are derived. This is used to construct upper and lower processes for the perfect simulation algorithm. A tail-lightness condition turns out to be of importance for the applicability of the perfect simulation algorithm. Examples of applications and empirical results are presented.
Keywords: Approximate simulation; coupling from the past; dominated coupling from the past; edge effect; exact simulation; marked Hawkes process; marked point process; perfect simulation; point process; Poisson cluster process; thinning; upper process; lower process

2000 Mathematics Subject Classification: Primary 60G55; Secondary 68U20

1. Introduction

Unmarked and marked Hawkes processes [10], [11], [12], [14] play a fundamental role in point process theory and its applications (see, e.g. [8]), and they have applications in seismology [13], [22], [23], [27] and neurophysiology [7]. There are many ways to define a marked Hawkes process, but for our purposes it is most convenient to define it as a marked Poisson cluster process X = {(t_i, Z_i)} with events (or times) t_i ∈ R and marks Z_i defined on an arbitrary (mark) space M equipped with a probability distribution Q. The cluster centres of X are given by certain events called immigrants, while the other events are called offspring.

Definition 1. (Hawkes process with unpredictable marks.)

(a) The immigrants follow a Poisson process with a locally integrable intensity function μ(t), t ∈ R.

(b) The marks associated to the immigrants are independent and identically distributed (i.i.d.) with distribution Q, and are independent of the immigrants.

(c) Each immigrant t_i generates a cluster C_i, which consists of marked events of generations of order n = 0, 1, ... with the following branching structure (see Figure 1). We first have (t_i, Z_i), which is said to be of generation 0. Given the generations 0, ..., n in C_i, each (t_j, Z_j) ∈ C_i of generation n recursively generates a Poisson process Φ_j of offspring of generation n + 1 with intensity function γ_j(t) = γ(t − t_j, Z_j), t > t_j. Here, γ is a nonnegative measurable function defined on (0, ∞) × M. We refer to Φ_j as an offspring process, and to γ_j and γ as fertility rates. Furthermore, the mark Z_k associated to any offspring t_k ∈ Φ_j has distribution Q, and Z_k is independent of t_k and all (t_l, Z_l) with t_l < t_k. As in [8], we refer to this as the case of unpredictable marks.

(d) Given the immigrants, the clusters are independent.

(e) X consists of the union of all clusters.

Received 3 December 2004; revision received 13 April 2005.
* Postal address: Department of Mathematical Sciences, Aalborg University, Fredrik Bajers Vej 7G, DK-9220 Aalborg, Denmark.
** Email address: [email protected]
*** Email address: [email protected]

Figure 1: The branching structure of the various generations of events in a cluster (ignoring the marks) (top), and the events on the time axis (bottom).

The independence assumptions in (c) and (d) imply that we have i.i.d. marks. In the special case where γ(t, z) = γ(t) does not depend on its second argument (or if P(γ(t, Z) = γ(t) for Lebesgue almost all t > 0) = 1, where Z denotes a generic mark), the events follow an unmarked Hawkes process. Apart from in that case, the events and the marks are dependent processes. Another way of defining the process is as follows (see, e.g. [8]): the marks are i.i.d. and the conditional intensity function λ(t) at time t ∈ R, for the events given the previous history {(t_k, Z_k) : t_k < t}, is given by

λ(t) = μ(t) + Σ_{t_i < t} γ(t − t_i, Z_i).   (1)

Simulation procedures for Hawkes processes are needed for various reasons: analytical results are rather limited due to the complex stochastic structure; statistical inference, especially model checking and prediction, requires simulations; and displaying simulated realizations of specific model constructions provides a better understanding of the model. The general approach to simulating a (marked or unmarked) point process is to use a thinning algorithm such as the Shedler-Lewis thinning algorithm or Ogata's modified thinning algorithm (see, e.g. [8]). However, Definition 1 immediately leads to the following approximate simulation algorithm, where t_− ∈ [−∞, 0] and t_+ ∈ (0, ∞] are user-specified parameters and the output consists of all marked points (t_i, Z_i) with t_i ∈ [0, t_+).
Algorithm 1. The following steps (i)-(ii) generate an approximate simulation of those marked events (t_i, Z_i) ∈ X with 0 ≤ t_i < t_+.

(i) Simulate the immigrants on [t_−, t_+).

(ii) For each such immigrant t_i, simulate Z_i and those (t_j, Z_j) ∈ C_i with t_i < t_j < t_+.

In general, Algorithm 1 suffers from edge effects, since clusters generated by immigrants before time t_− may contain offspring in [0, t_+). Brémaud et al. [5] studied the 'rate of installation', i.e. they considered a coupling of X, after time 0, to the output from Algorithm 1 when t_+ = ∞. Under a tail-lightness assumption (see the paragraph after Proposition 3, below) and other conditions, they established an exponentially decreasing bound for the probability P(t_−, ∞), say, that X, after time 0, coincides with the output of the algorithm. Algorithm 1 was also investigated in the authors' own work [18], where various measures for edge effects, including refined results for P(t_−, ∞), were introduced.

Our objective in this paper is to construct a perfect (or exact) simulation algorithm. Perfect simulation has been a hot research topic since the seminal Propp-Wilson algorithm [24] appeared, but the areas of application have so far been rather limited and many perfect simulation algorithms proposed in the literature are too slow for real applications. As demonstrated in [18], our perfect simulation algorithm can be practical and efficient. Moreover, apart from the advantage of not suffering from edge effects, our perfect simulation algorithm can also be useful in quantifying the edge effects suffered by Algorithm 1 (see [18]).

The perfect simulation algorithm is derived using similar principles as in Brix and Kendall [6], but our algorithm is a nontrivial extension, since the Brix-Kendall algorithm requires knowledge of the cumulative distribution function (CDF) F for the length of a cluster, and F is unknown even for the simplest examples of Hawkes processes.
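As a concrete illustration, steps (i)-(ii) of Algorithm 1 can be sketched in code for the simplest unmarked case with constant immigration rate μ and exponentially decaying fertility rate γ(t) = αβe^{−βt} (Example 1, below). This is a minimal sketch of ours, not the authors' implementation, and all function names are ours.

```python
import math
import random

def poisson(lam):
    """Sample from a Poisson(lam) distribution (Knuth's method;
    adequate for the small means used here)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def simulate_cluster(t0, alpha, beta, t_plus):
    """Cluster of an event at time t0 in the unmarked case: every event
    spawns Poisson(alpha) offspring at i.i.d. Exp(beta) lags after it.
    Events at or beyond t_plus are discarded; their descendants would also
    lie beyond t_plus, so discarding them early is harmless."""
    cluster, queue = [t0], [t0]
    while queue:
        parent = queue.pop()
        for _ in range(poisson(alpha)):
            t = parent + random.expovariate(beta)
            if t < t_plus:
                cluster.append(t)
                queue.append(t)
    return cluster

def algorithm_1(mu, alpha, beta, t_minus, t_plus):
    """Approximate simulation (Algorithm 1): immigrants on [t_minus, t_plus)
    and their clusters, keeping events in [0, t_plus). Clusters of immigrants
    before t_minus are missed -- the edge effect discussed in the text."""
    n_imm = poisson(mu * (t_plus - t_minus))
    immigrants = [random.uniform(t_minus, t_plus) for _ in range(n_imm)]
    events = []
    for ti in immigrants:
        events.extend(simulate_cluster(ti, alpha, beta, t_plus))
    return sorted(t for t in events if 0.0 <= t < t_plus)
```

Since α < 1, each cluster is almost surely finite, so the inner loop terminates.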
By establishing certain monotonicity and convergence results, we are able to approximate F to any required precision and, more importantly, to construct a dominating process and upper and lower processes in a similar fashion as in the dominated-coupling-from-the-past algorithm of [16]. Under a tail-lightness condition, our perfect simulation algorithm turns out to be feasible in applications, while, in the heavy-tailed case, we can at least say something about the approximate form of F (see Example 7).

The paper is organized as follows. Section 2 contains some preliminaries, including illuminating examples of Hawkes process models used throughout the paper to illustrate our results. In Section 3, we describe the perfect simulation algorithm, assuming that F is known, while the above-mentioned convergence and monotonicity results are established in Section 4. In Section 5, we complete the perfect simulation algorithm, using dominated coupling from the past. Finally, Section 6 contains a discussion of our algorithm and results, and suggestions on how to extend these to more general settings.

2. Preliminaries and examples

2.1. The branching structure and self-similarity property of clusters

By Definition 1, we can view the marked Hawkes process X = {(t_i, Z_i)} as a Poisson cluster process with cluster centres given by the immigrants, where the clusters, given the immigrants, are independent. In this section, we describe a self-similarity property resulting from the specific branching structure within a cluster.

For events t_i < t_j, we say that (t_j, Z_j) has ancestor t_i of order n ≥ 1 if there is a sequence s_1, ..., s_n of offspring such that s_n = t_j and each s_k, k = 1, ..., n, is one of the offspring of s_{k−1}, with s_0 = t_i. We then say that t_j is an offspring of nth generation with respect to t_i; for convenience, we say that t_i is of zeroth generation with respect to itself.
Now we define the total offspring process C_i as all those (t_j, Z_j) such that t_j is an event of generation n ∈ N_0 with respect to t_i (note that (t_i, Z_i) ∈ C_i). The clusters are defined as those C_i for which t_i is an immigrant (see Definition 1).

The total offspring processes have the same branching structure relative to their generating events. More precisely, since γ_i(t) = γ(t − t_i, Z_i) for any event t_i, we see by Definition 1 that, conditional on the events t_i < t_j, the translated total offspring processes

C_i − t_i := {(t_l − t_i, Z_l) : (t_l, Z_l) ∈ C_i},  C_j − t_j := {(t_l − t_j, Z_l) : (t_l, Z_l) ∈ C_j}

are identically distributed. In particular, conditional on the immigrants, the clusters relative to their cluster centres (the immigrants) are i.i.d. with distribution P, say. Furthermore, conditional on a cluster's nth generation events G_n, say, the translated total offspring processes C_j − t_j with t_j ∈ G_n are i.i.d. with distribution P. We refer to this last property as the i.i.d. self-similarity property of offspring processes or, for short, the self-similarity property. Note that the assumption of unpredictable marks is essential for these properties to hold.

2.2. A basic assumption and some terminology and notation

Let F denote the CDF for the length L of a cluster, i.e. the time between the immigrant and the last event of the cluster. Consider the mean number of events in any offspring process Φ_i,

ν := E ν̄,  where ν̄ := ∫_0^∞ γ(t, Z) dt

is the total fertility rate of an offspring process and Z denotes a generic mark with distribution Q. Henceforth, we assume that

0 < ν < 1.   (2)

The condition ν < 1 appears commonly in the literature on Hawkes processes (see, e.g. [5], [8], and [14]), and is essential to our convergence results in Section 4.2. It implies that F(0) = E e^{−ν̄} > 0, where F(0) is the probability that a cluster has no offspring.
It is equivalent to assuming that E S < ∞, where S denotes the number of events in a cluster: by induction on n = 0, 1, ..., because of the branching and conditional independence structure of each cluster, ν^n is the mean number of generation-n events in a cluster, meaning that

E S = 1 + ν + ν² + · · · = 1/(1 − ν)   (3)

if ν < 1, while E S = ∞ otherwise. The other condition, ν > 0, excludes the trivial case in which there are almost surely no offspring. It is readily seen to be equivalent to

F(0) < 1.   (4)

Furthermore,

h(t) = E[γ(t, Z)/ν̄],  h̃(t) = E[γ(t, Z)]/ν,  t > 0,   (5)

are well-defined densities (with respect to the Lebesgue measure). The density h̃ will play a key role later in this paper; it can be interpreted as the normalized intensity function for the first generation of offspring in a cluster started at time 0. Note that h̃ specifies the density of the distance R from an arbitrary offspring to its nearest ancestor. In the sequel, since the clusters, relative to their cluster centres, are i.i.d. (see Section 2.1), we assume without loss of generality that L, R, and S are defined with respect to the same immigrant t_0 = 0, with mark Z_0 = Z. Clearly, if L > 0 then R > t implies that L > t, meaning that the distribution of L has a thicker tail than that of R.

The probability function for S is given by P(S = k) = P(S_{n+1} = k − 1 | S_n = k)/k, k ∈ N, where S_n denotes the number of events of nth generation and n ∈ N is arbitrary (see [9] or Theorem 2.11.2 of [15]). Thus,

P(S = k) = E[e^{−kν̄}(kν̄)^{k−1}/k!],  k ∈ N.   (6)

2.3. Examples

Throughout the paper, we illustrate the results with the following cases.

Example 1. (Unmarked process.) An unmarked Hawkes process with exponentially decaying fertility rate is given by

ν = ν̄ = α,  h(t) = h̃(t) = βe^{−βt},

where α, 0 < α < 1, and β > 0 are parameters. Here, 1/β is a scale parameter for both the distribution of R and the distribution of L.
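The cluster-size formulas (3) and (6) are easy to check by Monte Carlo in this unmarked case. The following sketch (our own, with ad hoc names) simulates the total progeny of the Poisson branching process directly; offspring locations are irrelevant for S.

```python
import math
import random

def poisson(lam, rng):
    # Knuth's method; adequate for small means
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def cluster_size(alpha, rng):
    """Total number of events S in a cluster for the unmarked case:
    total progeny of a branching process in which every event has a
    Poisson(alpha) number of offspring."""
    size = pending = 1          # the immigrant itself
    while pending:
        pending -= 1
        k = poisson(alpha, rng)
        size += k
        pending += k
    return size

rng = random.Random(0)
alpha = 0.5
sizes = [cluster_size(alpha, rng) for _ in range(20000)]
mean_s = sum(sizes) / len(sizes)      # compare with E S = 1/(1 - alpha), cf. (3)
p_one = sizes.count(1) / len(sizes)   # compare with P(S = 1) = exp(-alpha), cf. (6)
```

With α = 0.5 the estimates should be close to E S = 2 and P(S = 1) = e^{−0.5} ≈ 0.61.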
The left-hand panel of Figure 2 shows perfect simulations of this process on [0, 10] when μ(t) = 1 is constant, α = 0.9, and β = 10, 5, 2, 1. By (3), we expect to see about 10 clusters (in total) and 100 events. The clusters of course become more visible as β increases.

The left-hand panel of Figure 3 shows six simulations of clusters with α = 0.9. Here, β is an inverse scale parameter; its value is irrelevant since, to obtain comparable results for this example and the following two examples, we have omitted showing the scale. All the clusters have been simulated conditional on S > 1 to avoid the frequent and rather uninteresting case containing only the immigrant. These few simulations indicate the general tendency of L to vary widely.

Example 2. (Birth-death process.) Consider a marked Hawkes process with

γ(t, Z) = (α/E Z) 1(t < Z),

where α, 0 < α < 1, is a parameter, Z is a positive random variable, and 1(·) denotes the indicator function. Then X can be viewed as a birth-death process with birth at time t_i and survival time Z_i for the ith individual. The birth rate is

λ(t) = μ(t) + (α/E Z) card{i : t_i ≤ t < t_i + Z_i},  t ∈ R

(cf. (1)). Moreover,

ν̄ = αZ/E Z,  ν = α,  h(t) = E[1(t < Z)/Z],  h̃(t) = P(Z > t)/E Z.

Since ν̄ is random, the distribution of S is more dispersed than in the unmarked case (cf. (6)).

Figure 2: On the left, we display four perfect simulations on [0, 10] of the unmarked Hawkes process (Example 1) with parameters α = 0.9, μ = 1, and β = 10, 5, 2, 1 (top to bottom). Random jitter has been added in the vertical direction to help distinguish events located close together. On the right, we display three perfect simulations on [0, 10] of the birth-death Hawkes process (Example 2) with parameters α = 0.9, μ = 1, and β = 5, 2, 1 (top to bottom), where the projections of the lines onto the horizontal axis show the size of the marks.

Figure 3: On the left, we display six simulations of clusters started at 0 and conditioned on S > 1 in the unmarked case with α = 0.9. In the centre, we display the same simulations in the birth-death case, and, on the right, in the heavy-tailed case. Different scalings are used in the three cases.

The special case in which μ(t) = μ is constant and Z is exponentially distributed with mean 1/β was considered in [5, p. 136]. In this case, X is a time-homogeneous Markov birth-death process with birth rate μ + αβn and death rate βn, where n is the number of living individuals. Furthermore,

h̃(t) = βe^{−βt} and h(t) = βE_1(βt), where E_1(s) = ∫_s^∞ (e^{−t}/t) dt

is the exponential integral function. As in Example 1, 1/β is a scale parameter for the distribution of L. As discussed in Example 8, below, the stationary distribution (i.e. the distribution of X at any fixed time) is known up to a constant of proportionality, and it is possible to simulate from this by rejection sampling.

The right-hand panel of Figure 2 shows three perfect simulations on [0, 10] in the Markov case with μ = 1, α = 0.9, and β = 5, 2, 1, where the marks are indicated by line segments of different lengths. The centre panel of Figure 3 shows six simulations of clusters (with marks excluded) with α = 0.9, simulated conditional on S > 1. These simulations indicate that L is slightly more dispersed than in Example 1, since the marks introduce additional variation in the cluster lengths. In fact, the coefficient of variation estimated from 10 000 perfect simulations is 1.92 in Example 1 and 2.85 in the present case.

Example 3. (Heavy-tailed distribution for L.) Suppose that γ(t, Z) = αZe^{−tZ}, where α ∈ (0, 1) is a parameter and Z is exponentially distributed with mean 1/β. Then ν̄ = ν = α is constant, meaning that the distribution of S is the same as in the unmarked case (cf. (6)). Furthermore,

h(t) = h̃(t) = β/(t + β)²

specifies a Pareto density. This is a heavy-tailed distribution, as it has infinite Laplace transform (ℒ(θ) = E e^{θR} = ∞ for all θ > 0). Moreover, it has infinite moments (E[R^p] = ∞ for all p ≥ 1).
Consequently, L also has a heavy-tailed distribution with infinite moments and infinite Laplace transform. Note that β is a scale parameter for the distribution of L. The right-hand panel of Figure 3 shows six simulations of clusters with α = 0.9 and β = 1. These indicate that L is much more dispersed than in Examples 1 and 2 (in fact, the dispersion is infinite in the present case).

3. Perfect simulation

Assuming, for the moment, that F (the CDF for the length of a cluster) is known, the following algorithm for perfect simulation of the marked Hawkes process is similar to the algorithm for simulation of Poisson cluster processes without edge effects given in [6] (see also [17] and [21]).

Algorithm 2. Let I_1 be the point process of immigrants on [0, t_+), and let I_2 be the point process of immigrants t_i < 0 such that {(t_j, Z_j) ∈ C_i : t_j ∈ [0, ∞)} ≠ ∅.

1. Simulate I_1 as a Poisson process with intensity function λ_1(t) = μ(t) on [0, t_+).

2. For each t_i ∈ I_1, simulate Z_i and those (t_j, Z_j) ∈ C_i with t_i < t_j < t_+.

3. Simulate I_2 as a Poisson process with intensity function λ_2(t) = (1 − F(−t))μ(t) on (−∞, 0).

4. For each t_i ∈ I_2, simulate Z_i and {(t_j, Z_j) ∈ C_i : t_j ∈ [0, t_+)} conditional on the event that {(t_j, Z_j) ∈ C_i : t_j ∈ [0, ∞)} ≠ ∅.

5. The output is all marked points from steps 1, 2, and 4.

Remark 1. In steps 1 and 2 of Algorithm 2, we use Algorithm 1 (with t_− = 0). In step 4, it is not obvious how to construct an elegant approach ensuring that at least one point will fall after 0. Instead, we use a simple rejection sampler: we repeatedly simulate Z_i from Q and the successive generations of offspring t_j to t_i (together with their marks Z_j), until there is at least one event of C_i after time 0. The key point is how to simulate I_2 in step 3, since this requires knowledge of F, which is unknown in closed form (see Remark 3, below). In Section 4, we address this problem and, in Section 5, we construct an algorithm for simulating I_2.
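Step 3 is the only step of Algorithm 2 that needs F. Under the tail-lightness condition of Proposition 3, below, one way to realize it exactly is to thin a dominating Poisson process built from a known exponential envelope 1 − F(s) ≤ e^{−θs}. The sketch below is our own construction (names and helper are ours, not the paper's code), assuming a constant immigration rate μ and such a bound.

```python
import math
import random

def poisson(lam, rng):
    # Knuth's method; adequate for small means
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_I2_times(mu, theta, survival, rng):
    """Step 3 of Algorithm 2 by thinning, assuming mu(t) = mu is constant
    and 1 - F(s) <= exp(-theta * s) for s >= 0 (tail-lightness).
    `survival(s)` returns 1 - F(s).
    Dominating process: Poisson on (-inf, 0) with intensity mu*exp(theta*t),
    total mass mu/theta; each of its points is t = log(U)/theta, U uniform."""
    n = poisson(mu / theta, rng)
    points = [math.log(rng.random()) / theta for _ in range(n)]
    # Independent thinning: retain t with probability
    # lambda_2(t) / lambda_dom(t) = survival(-t) * exp(-theta * t) <= 1.
    return sorted(t for t in points
                  if rng.random() < survival(-t) * math.exp(-theta * t))
```

Each retained time t_i would then be handed to the rejection sampler of step 4 (Remark 1).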
In practice, we must require that I_2 is (almost surely) finite or, equivalently, that

∫_{−∞}^0 (1 − F(−t))μ(t) dt < ∞.   (7)

In the case that μ(t) is bounded, (7) is satisfied if sup_{t≤0} μ(t) E L < ∞. A condition for the finiteness of E L is established in Lemma 1 and Remark 2, below.

Proposition 1. The output of Algorithm 2 follows the distribution of the marked Hawkes process.

Proof. The immigrant process minus I_1 ∪ I_2 generates clusters with no events in [0, t_+). Since I_1 consists of the immigrants on [0, t_+), it follows directly that I_1 is a Poisson process with intensity λ_1(t) = μ(t) on [0, t_+). Since I_2 consists of those immigrants on (−∞, 0) with offspring after 0, I_2 can be viewed as an independent thinning of the immigrant process with retention probability p(t) = 1 − F(−t) and, thus, I_2 is a Poisson process with intensity λ_2(t) = (1 − F(−t))μ(t). Since I_1 and I_2 are independent, it follows from Section 2.1 that {C_i : t_i ∈ I_1} and {C_i : t_i ∈ I_2} are independent. Viewing the marked Hawkes process as a Poisson cluster process, it follows from Remark 1 that the clusters are generated in the right way in steps 2 and 4 of Algorithm 2 when we only want to sample those marked points (t_j, Z_j) with t_j ∈ [0, t_+). Thus, Algorithm 2 produces realizations from the distribution of the marked Hawkes process.

Using the notation of Section 2.2, the following lemma generalizes and sharpens a result of [14] about the mean length of a cluster.

Lemma 1. We have

(1/E e^{−ν̄}) E[(1 − e^{−ν̄}) E[R | Z]] ≤ E L ≤ (ν/(1 − ν)) E R.   (8)

Proof. Consider a cluster starting with an immigrant at time t_0 = 0, with mark Z_0 = Z (cf. Section 2.1). For t_j ∈ Φ_0, let R_j denote the distance from t_j to 0, and L_j the length of the total offspring process C_j started by t_j. Then L = max{R_j + L_j : t_j ∈ Φ_0}, so, if we condition on Z and let R_{j,z} be distributed as R_j, conditional on the event Z = z, then

E L = E[E[L | Z]] = E[ Σ_{i=1}^∞ (e^{−ν̄} ν̄^i / i!) E[max{R_{j,Z} + L_j : j = 1, ..., i}] ].   (9)

To obtain the upper inequality, observe that

E L ≤ E[ Σ_{i=1}^∞ (e^{−ν̄} ν̄^i / i!) Σ_{j=1}^i E[R_{j,Z} + L_j] ] = E[ν̄ E[R | Z]] + ν E L,

where we have used the fact that the L_j are identically distributed and have the same distribution as L, because of the self-similarity property (see Section 2.1), and the fact that the R_j are identically distributed when conditioned on Z. Hence,

E L ≤ (1/(1 − ν)) E[ν̄ E[R | Z]] = (1/(1 − ν)) E ∫_0^∞ s γ(s, Z) ds = ν E R/(1 − ν),

which verifies the upper inequality. Finally, by (9),

E L ≥ E[ Σ_{i=1}^∞ (e^{−ν̄} ν̄^i / i!) (E[R | Z] + E L) ] = E[(1 − e^{−ν̄}) E[R | Z]] + E[1 − e^{−ν̄}] E L,

which reduces to the lower inequality.

Remark 2. If either ν̄ or γ/ν̄ is independent of Z (in other words, either the number or the locations of offspring in an offspring process are independent of the mark associated to the generic event), then it is easily proven that h = h̃ and, thus, (8) reduces to

(1/E e^{−ν̄} − 1) E R ≤ E L ≤ (ν/(1 − ν)) E R.

Consequently, E L < ∞ if and only if E R < ∞. This immediately shows that E L < ∞ in Example 1 and E L = ∞ in Example 3. In Example 2, when Z is exponentially distributed with mean 1/β, (8) becomes

α(α + 2)/(2β(α + 1)) ≤ E L ≤ α/(β(1 − α)),

so in this case E L < ∞. Not surprisingly, apart from for small values of α ∈ (0, 1), the bounds are rather poor and of little use except in establishing the finiteness of E L.

4. The distribution for the length of a cluster

In this section, we derive various distributional results concerning the length L of a cluster. The results are needed in Section 5 to complete step 3 of Algorithm 2; however, many of the results are also of independent interest.

4.1. An integral equation for F

Below, in Proposition 2, an integral equation for F is derived, and we discuss how to approximate F by numerical methods, using a certain recursion. Proposition 2 is a generalization of Theorem 5 of [14], which was proved using void probabilities obtained from a general result for the probability-generating functional for an unmarked Hawkes process.
However, as was pointed out in [8], the probability-generating functional for the marked Hawkes process is difficult to obtain. We give a direct proof based on void probabilities.

For n ∈ N_0, let l_n denote the CDF for the length of a cluster when all events of generations n + 1, n + 2, ... are removed (it will become clear in Section 4.2 why we use the notation 'l_n'). Clearly, l_n is decreasing in n, l_n → F pointwise as n → ∞, and l_0(t) = 1, t ≥ 0.

Let C denote the class of Borel functions f : [0, ∞) → [0, 1]. For f ∈ C, define φ(f) ∈ C by

φ(f)(t) = E exp(−ν̄ + ∫_0^t f(t − s) γ(s, Z) ds),  t ≥ 0.   (10)

Proposition 2. We have

l_n = φ(l_{n−1}),  n ∈ N,   (11)

and

F = φ(F).   (12)

Proof. As in the proof of Lemma 1, we can consider a cluster started at time t_0 = 0, with associated mark Z_0 = Z. For a fixed t ≥ 0 and n ∈ N, split Φ_0 into three point processes Φ_1, Φ_2, and Φ_3: the process Φ_1 consists of those first generation offspring t_i ∈ Φ_0 ∩ [0, t) that do not generate events of generation n − 1 or lower with respect to t_i on [t, ∞); Φ_2 = (Φ_0 ∩ [0, t)) \ Φ_1 consists of the remaining first generation offspring on [0, t); and Φ_3 = Φ_0 ∩ [t, ∞) consists of the first generation offspring on [t, ∞). Conditional on Z, the processes Φ_1, Φ_2, and Φ_3 are independent Poisson processes with intensity functions λ_1(s) = γ(s, Z) l_{n−1}(t − s) on [0, t), λ_2(s) = γ(s, Z)(1 − l_{n−1}(t − s)) on [0, t), and λ_3(s) = γ(s, Z) on [t, ∞), respectively. This follows by an independent thinning argument since, conditional on G_n (the nth generation of offspring in C_0), the processes C_j − t_j with t_j ∈ G_n are i.i.d. and distributed as C_0 (this is the self-similarity property from Section 2.1). Consequently,

l_n(t) = E[P(Φ_2 = ∅ | Z) P(Φ_3 = ∅ | Z)] = E exp(−∫_0^t λ_2(s) ds − ∫_t^∞ λ_3(s) ds),

which reduces to (11). Taking the limit as n → ∞ on both sides of (11), we obtain (12) by monotone convergence, since l_n(t) ≤ l_{n−1}(t) for all t ≥ 0 and n ∈ N.

Remark 3.
As illustrated in the following example, we have been unsuccessful in using (12) to obtain a closed-form expression for F even for simple choices of γ. Fortunately, the recursion (11) provides a useful numerical approximation to F. As the integral in (10), with f = l_{n−1}, quickly becomes difficult to evaluate analytically as n increases, we compute the integral numerically, using a quadrature rule.

Example 4. (Unmarked process.) Consider Example 1 with β = 1. Then (12) is equivalent to

∫_0^t F(s) e^s ds = (e^t/α) ln(e^α F(t)),

which is not analytically solvable.

4.2. Monotonicity properties and convergence results

As established in Theorem 1 below, many approximations of F other than l_n exist, and their rates of convergence may be geometric with respect to different norms. Notice that certain monotonicity properties are fulfilled by φ, where, for functions f : [0, ∞) → [0, 1] and n = 1, 2, ..., we recursively define φ^[0](f) = f and φ^[n](f) = φ(φ^[n−1](f)), and set

f_n = φ^[n](f),  n = 0, 1, ....

Note that F_n = F for all n ∈ N_0. As l_n = φ^[n](1) is decreasing towards the CDF F, cases in which G is a CDF and G_n increases to F are of particular interest.

Lemma 2. For any f, g ∈ C, we have

f ≤ g ⟹ f_n ≤ g_n, n ∈ N,   (13)

f ≤ φ(f) ⟹ f_n is nondecreasing in n,   (14)

f ≥ φ(f) ⟹ f_n is nonincreasing in n.   (15)

Proof. Equation (13) follows immediately from (10) when n = 1, and then by induction in the remaining cases. Equations (14) and (15) follow from (13).

Theorem 1. With respect to the supremum norm ||f||_∞ = sup_{t≥0} |f(t)|, φ is a contraction on C, that is, for all f, g ∈ C and n ∈ N, we have f_n, g_n ∈ C and

||φ(f) − φ(g)||_∞ ≤ ν ||f − g||_∞.   (16)

Furthermore, F is the unique fixpoint, i.e.

||F − f_n||_∞ → 0 as n → ∞,   (17)

and

||F − f_n||_∞ ≤ (ν^n/(1 − ν)) ||φ(f) − f||_∞,   (18)

where ||φ(f) − f||_∞ ≤ 1. Furthermore, if f ≤ φ(f) or f ≥ φ(f), then f_n converges to F from below or, respectively, above.

Proof. Let f, g ∈ C. Recall that, by the mean value theorem (e.g.
Theorem 5.11 of [1]), for any real numbers x and y, we have e^x − e^y = (x − y)e^{z(x,y)}, where z(x, y) is a real number between x and y. Thus, by (10),

||φ(f) − φ(g)||_∞ = sup_{t≥0} | E[ e^{−ν̄} e^{c(t,f,g)} ∫_0^t (f(t − s) − g(t − s)) γ(s, Z) ds ] |,

where c(t, f, g) is a random variable between ∫_0^t f(t − s)γ(s, Z) ds and ∫_0^t g(t − s)γ(s, Z) ds. Since f, g ≤ 1, we obtain e^{c(t,f,g)} ≤ e^{ν̄} (see (2)). Consequently,

||φ(f) − φ(g)||_∞ ≤ sup_{t≥0} E ∫_0^t |f(t − s) − g(t − s)| γ(s, Z) ds ≤ E ∫_0^∞ ||f − g||_∞ γ(s, Z) ds = ν ||f − g||_∞.

Thereby, (16) is verified. Since C is complete (see, e.g. Theorem 3.11 of [26]), it follows from the fixpoint theorem for contractions (see, e.g. Theorem 4.48 of [1]) that the contraction has a unique fixpoint: by (12), this is F. Since f ∈ C implies that φ(f) ∈ C, we find that f_n ∈ C by induction. Hence, using (12), (16), and induction, we have

||f_n − F||_∞ = ||φ(f_{n−1}) − φ(F)||_∞ ≤ ν ||f_{n−1} − F||_∞ ≤ ν^n ||f − F||_∞   (19)

for n ∈ N. Since ν < 1, we recover (17). Similarly to (19), we have

||f_n − f_{n−1}||_∞ ≤ ν^{n−1} ||f_1 − f||_∞,  n ∈ N.   (20)

Furthermore, by (17), we have ||F − f||_∞ = lim_{m→∞} ||f_m − f||_∞. So, by the triangle inequality and (20), we have

||F − f||_∞ ≤ lim_{m→∞} (||f_1 − f||_∞ + ||f_2 − f_1||_∞ + · · · + ||f_m − f_{m−1}||_∞) ≤ lim_{m→∞} ||f_1 − f||_∞ (1 + ν + · · · + ν^{m−1}) = ||f_1 − f||_∞/(1 − ν)

(see (2)). Combining this with (19), we obtain (18). Finally, if f ≤ φ(f) or f ≥ φ(f) then, by (14) or, respectively, (15) and (17), f_n converges from below or, respectively, above.

Similar results to those of Theorem 1, but for the L¹-norm, were established in [18]. The following remark and proposition show how to find upper and lower bounds on F in many cases.

Remark 4. Consider a function f ∈ C. The conditions f ≤ φ(f) and f ≥ φ(f) are satisfied in the extreme cases f = 0 and f = 1, respectively. The upper bound f = 1 is useful in the following sections, but the lower bound f = 0 is too small a function for our purposes; if we require that E L < ∞ (cf. Remark 1) then f = 0 cannot be used (in fact, we use f = 0 only when producing the right-hand plot in Figure 4, below). To obtain a more useful lower bound, observe that f ≤ φ(f) implies f ≤ F ≤ 1 (cf. (4) and Theorem 1). If f ≤ 1 then a sufficient condition for f ≤ φ(f) is

∫_0^t (1 − f(t − s)) h̃(s) ds + ∫_t^∞ h̃(s) ds ≤ (1 − f(t))/ν,  t ≥ 0.   (21)

This follows readily from (5) and (10), using the fact that e^x ≥ 1 + x. The function f in (21) is closest to F when f is a CDF G and we have equality in (21). Equivalently, G satisfies the renewal equation

G(t) = 1 − ν + ν ∫_0^t G(t − s) h̃(s) ds,  t ≥ 0,

which has the unique solution

G(t) = 1 − ν + Σ_{n=1}^∞ (1 − ν) ν^n ∫_0^t h̃^{*n}(s) ds,  t ≥ 0,   (22)

where h̃^{*n} denotes the n-times convolution of h̃ (see Theorem IV.2.4 of [2]). In other words, G is the CDF of R_1 + · · · + R_K (setting R_1 + · · · + R_K = 0 if K = 0), where K, R_1, R_2, ... are independent random variables, each R_i has density h̃, and K has the geometric density (1 − ν)ν^n. Interestingly, this geometric density is equal to E S_n/E S (see (3)).

The next proposition shows that, in many situations, G ≤ φ(G) when G is an exponential CDF with a sufficiently large mean. In such cases, F has no heavier tail than such an exponential distribution. Denote by ℒ(θ) = ∫_0^∞ e^{θt} h̃(t) dt, θ ∈ R, the Laplace transform of h̃.

Proposition 3. If G(t) = 1 − e^{−θt} for t ≥ 0, where θ > 0 and ℒ(θ) ≤ 1/ν, then G ≤ φ(G).

Proof. Upon inserting f = G into the left-hand side of (21) and multiplying both sides by e^{θt}, we obtain

∫_0^t e^{θs} h̃(s) ds + e^{θt} ∫_t^∞ h̃(s) ds ≤ 1/ν.

Since the left-hand side is an increasing function of t ≥ 0 with limit ℒ(θ), (21) is satisfied if and only if ℒ(θ) ≤ 1/ν.

Note that Proposition 3 always applies for sufficiently small θ > 0, except in the case where h̃ is heavy tailed in the sense that ℒ(θ) = ∞ for all θ > 0. The condition ℒ(θ) ≤ 1/ν is equivalent to the tail-lightness condition [5, Equation (2.1)].
Figure 4: On the left, we display plots of l_n and G_n for n = 0, 5, ..., 50 in the unmarked case with α = 0.9 and β = 1 (see Example 5); l_50 and G_50 are drawn solid to illustrate the approximate form of F, whereas the other curves are dashed. In the centre, we display plots of the density (1/2)[l_n′/(1 − l_n(0)) + G_n′/(1 − G_n(0))] when n = 50 (solid) and the exponential density with the same mean (dashed). On the right, we display the same plots as on the left, for Example 7 with α = 0.9 and β = 1, using l_n and G_n as approximations of F.

4.3. Examples

In Examples 5 and 6 below, we let

G(t) = 1 − e^{−θt},  t ≥ 0,   (23)

be the exponential CDF with parameter θ > 0.

Example 5. (Unmarked process.) For the case in Example 1, ℒ(θ) = β/(β − θ) if θ < β, and ℒ(θ) = ∞ otherwise. Interestingly, for the 'best choice' θ = β(1 − α) (solving ℒ(θ) = 1/ν), (23) becomes the CDF of R times E S, which is easily seen to be the same as the CDF in (22).

The left-hand panel of Figure 4 shows l_n and G_n when θ = β(1 − α) and (α, β) = (0.9, 1). The convergence of l_n and G_n (with respect to || · ||_∞) and the approximate form of F are clearly visible. Since G is a CDF and G_{n+1} ≥ G_n, we find that G_n is also a CDF. The centre panel of Figure 4 shows the density F′(t)/(1 − F(0)), t > 0, approximated by

(1/2)[ l_n′(t)/(1 − l_n(0)) + G_n′(t)/(1 − G_n(0)) ]

when n = 50 (in which case l_n′(t)/(1 − l_n(0)) and G_n′(t)/(1 − G_n(0)) are effectively equal). As shown in the plot, the density is close to the exponential density with the same mean, but the tail is slightly thicker.

Example 6. (Birth-death process.) For the case in Example 2,

ℒ(θ) = E[ ∫_0^Z e^{θs} ds ]/E Z = (ℒ_Z(θ) − 1)/(θ E Z),

where ℒ_Z(θ) = E e^{θZ} is the Laplace transform of Z. In the special case where Z is exponentially distributed with mean 1/β, ℒ(θ) = ℒ_Z(θ) = β/(β − θ) is of the same form as in Example 5.
Plots of I_n, G_n, and

(1/2)[I_n′/(1 − I_n(0)) + G_n′/(1 − G_n(0))]

for n = 0, 5, ..., 50 and (α, β) = (0.9, 1) are similar to those in the centre and right-hand panels of Figure 4, and are therefore omitted.

Example 7. (Heavy-tailed distribution for L.) For the case in Example 3, Proposition 3 does not apply, as L(θ) = ∞ for all θ > 0. The CDF in (22) is not known in closed form, since the convolutions are not tractable (in fact, this is the case when h specifies any known heavy-tailed distribution, including the Pareto, Weibull, log-normal, and log-gamma distributions). Nonetheless, it is still possible to get an idea of what F looks like: the right-hand panel of Figure 4 shows I_n and O_n := φ^[n](0) for n = 0, 5, ..., 50 in the case (α, β) = (0.9, 1). As in Examples 5 and 6, the convergence of I_n and O_n (where, now, G = 0) and the approximate form of F are clearly visible. However, as indicated by the plots and verified in [18], lim_{t→∞} O_n(t) < 1 when G = 0, meaning that O_n is not a CDF.

5. Simulation of X_2

To complete the perfect simulation algorithm (Algorithm 2), we need a useful way of simulating X_2. Our procedure is based on a dominating process and the use of coupled upper and lower processes, in the spirit of the dominated coupling-from-the-past algorithm [16].

Suppose that f is in closed form, with f ≤ φ(f), and that (7) is satisfied when we replace F by f (situations in which these requirements are fulfilled are considered in Sections 3, 4.2, and 4.3). In particular, if μ is constant and f is a CDF, (7) implies that f has a finite mean. Now, for n ∈ ℕ_0, let U_n and L_n denote Poisson processes on (−∞, 0) with intensity functions λ_n^u(t) = (1 − f_n(−t))μ(t) and λ_n^l(t) = (1 − I_n(−t))μ(t), respectively. By Theorem 1, λ_n^u is nonincreasing and λ_n^l is nondecreasing in n, and they both converge to λ_2 (geometrically fast with respect to the supremum norm). Consequently, we can use independent thinning to obtain the following sandwiching/funnelling property (see [16]):

∅ = L_0 ⊆ L_1 ⊆ L_2 ⊆ ⋯ ⊆ X_2 ⊆ ⋯
⊆ U_2 ⊆ U_1 ⊆ U_0.  (24)

The details are given by the following algorithm.

Algorithm 3. (Simulation of X_2.)

1. Generate a realization {(t_1, Z_1), ..., (t_k, Z_k)} of U_0, where t_1 < ⋯ < t_k.

2. If U_0 = ∅ then return X_2 = ∅ and stop; otherwise, generate independent uniform numbers W_1, ..., W_k on [0, 1] (independently of U_0) and set n = 1.

3. For j = 1, ..., k, assign (t_j, Z_j) to L_n or U_n if W_j λ_0^u(t_j) ≤ λ_n^l(t_j) or, respectively, W_j λ_0^u(t_j) ≤ λ_n^u(t_j).

4. If U_n = L_n then return X_2 = L_n and stop; otherwise, increase n by 1 and repeat steps 3–4.

Proposition 4. Algorithm 3 works correctly and terminates almost surely within finite time.

Proof. To see this, imagine that, regardless of whether U_0 = ∅ in step 2 or U_n = L_n in step 4, we continue to generate (U_1, L_1), (U_2, L_2), etc. Furthermore, add an extra step: for j = 1, ..., k, assign (t_j, Z_j) to X_2 if and only if W_j λ_0^u(t_j) ≤ λ_2(t_j). Then clearly, because of the convergence properties of λ_n^u and λ_n^l (see the discussion above), (24) is satisfied and, conditional on t_1, ..., t_k,

P(L_n ≠ U_n for all n ∈ ℕ_0) ≤ Σ_{j=1}^k lim_{n→∞} P(λ_n^l(t_j) < W_j λ_0^u(t_j) ≤ λ_n^u(t_j)) = Σ_{j=1}^k P(W_j λ_0^u(t_j) = λ_2(t_j)) = 0.

Thus, Algorithm 3 almost surely terminates within finite time, and the output equals X_2.

Remark 5. We compute I_n and f_n numerically, using a quadrature rule (see Remark 3). After step 1 in Algorithm 3, we let the last quadrature point be given by −t_1 (since we do not need to calculate I_n(t) and f_n(t) for t > −t_1). Since we have to calculate I_n and f_n recursively for all n = 0, 1, 2, ... until Algorithm 3 terminates, there is no advantage in using a doubling scheme for n, as in the Propp–Wilson algorithm [24].

Example 8. (Birth–death process.) We have checked our computer code for Algorithms 2 and 3 by comparing with results produced by another perfect simulation algorithm. Consider the case in Example 2 when μ(t) = μ is constant and Z is exponentially distributed with mean 1/β.
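The sandwiching steps of Algorithm 3 can be sketched generically as follows. This is our own illustration, not the authors' code: the bounding intensities are passed in as callables `lam_u(n, t)` and `lam_l(n, t)`, with `lam_u(0, t)` the intensity of the dominating process U_0, and the `max_n` safety cap is ours (the paper's algorithm iterates until coalescence with no cap).

```python
import random

def algorithm3(points, lam_u, lam_l, max_n=10_000, rng=random):
    """Sandwich each dominating point t_j into L_n or U_n by thinning
    with one uniform W_j per point, and stop once L_n = U_n (step 4).
    `points` is a realization of U_0 (event times); `lam_u(n, t)` and
    `lam_l(n, t)` play the roles of lambda_n^u and lambda_n^l."""
    if not points:
        return []  # step 2: U_0 empty, so X_2 is empty
    w = [rng.random() for _ in points]
    for n in range(1, max_n + 1):
        upper = [t for t, wj in zip(points, w) if wj * lam_u(0, t) <= lam_u(n, t)]
        lower = [t for t, wj in zip(points, w) if wj * lam_u(0, t) <= lam_l(n, t)]
        if upper == lower:  # coalescence: L_n = U_n = X_2
            return lower
    raise RuntimeError("no coalescence within max_n iterations")
```

In a toy run one can pass bounding intensities that shrink toward a common limit (here the constant 1), which is enough to see coalescence; the real algorithm uses the f_n- and I_n-based intensities above.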
If N denotes the number of events alive at time 0, we have the following detailed balance condition for its equilibrium density π_n:

π_n (μ + αβn) = π_{n+1} β(n + 1),  n ∈ ℕ_0.

This density is well defined, since lim_{n→∞} (π_{n+1}/π_n) = α < 1. Now, choose m ∈ ℕ_0 and ε > 0 such that ᾱ = α + ε < 1 and π_{n+1}/π_n ≤ ᾱ whenever n ≥ m. If μ ≤ αβ, we can take ε = m = 0; otherwise, we can use m ≥ (μ − αβ)/(βε) for some ε > 0. Define an unnormalized density π̃_n, n ∈ ℕ_0, by π̃_n = π_n/π_0 if n ≤ m, and by π̃_n = ᾱ^{n−m} π_m/π_0 otherwise. We can easily sample from π̃ by inversion (see [25]), since we can calculate

Σ_{n=0}^∞ π̃_n = Σ_{n=0}^m π_n/π_0 + (ᾱ/(1 − ᾱ)) π_m/π_0.

Then, since π̃_n ≥ π_n/π_0, we can sample N from π by rejection sampling (see [25]). Furthermore, conditional on N = n, we generate n independent marks Z_1, ..., Z_n that are exponentially distributed with mean 1/β (here, we exploit the memoryless property of the exponential distribution). Finally, we simulate the marked Hawkes process with events in (0, t_+], using the conditional intensity

λ*(t) = μ + αβ ( Σ_{i=1}^n 1(t < Z_i) + Σ_{0<t_i<t} 1(t < t_i + Z_i) ).

We have implemented this algorithm for comparison with our algorithm. Not surprisingly, it is a lot faster than our perfect simulation algorithm (roughly 1200 times as fast in the case α = 0.9, β = μ = 1, and t_+ = 10), since it exploits the fact that we know the stationary distribution in this special case.

6. Extensions and open problems

Except in the heavy-tailed case, our perfect simulation algorithm is feasible in the examples we have considered. However, simulation of the heavy-tailed cases is an unsolved problem. In these cases, we can only say something about the approximate form of F (see Example 7). For applications such as those in seismology [23], extensions of our results and algorithms to the heavy-tailed cases are important. The epidemic-type aftershock sequence (ETAS) model [22], used for modelling times and magnitudes of earthquakes, is a heavy-tailed marked Hawkes process.
Its spatio-temporal extension, which also includes the locations of the earthquakes (see [23]), gives rise to the problem of predictable marks (the location of an aftershock depends on the location of the earthquake that causes it). This problem is easily solved, though, since the times and magnitudes are independent of the locations and can be simulated without worrying about these. This, of course, still leaves the unsolved problem of the heavy tails.

Extensions to nonlinear Hawkes processes [3], [8] would also be interesting. However, things again become complicated, since a nonlinear Hawkes process is not even a Poisson cluster process.

Simulations of Hawkes processes with predictable marks can, in some cases, be obtained by using a thinning algorithm, if it is possible to dominate the Hawkes process with predictable marks by a Hawkes process with unpredictable marks. We illustrate the procedure with a simple birth–death example.

Example 9. (Birth–death process.) Consider two birth–death Hawkes processes as defined in Example 2. Let Ψ_1 have unpredictable marks, with Z_i^1 ~ Exp(β), and let Ψ_2 have predictable marks, with Z_i^2 ~ Exp(β + 1/Z̃_i^2), where Z̃_i^2 is the mark of the first-order ancestor of t_i. Both models have γ(t, Z) = αβ 1(t < Z), with the same α and β, and they also have the same μ(t). The model Ψ_2 has the intuitive appeal that long-lived individuals have long-lived offspring. Note that the intensity of Ψ_1 dominates the intensity of Ψ_2 if the marks are simulated such that Z_i^1 ≥ Z_i^2. To simulate Ψ_2, we first simulate Ψ_1 using Algorithm 3, with the modifications that we associate both marks Z_j^1 and Z_j^2 to the event t_j, and we keep all events from the algorithm whether they fall in or before [0, t_+). Each marked event (t_j, Z_j^2) is then included in Ψ_2 with retention probability

[μ(t_j) + αβ Σ_{t_i ∈ Ψ_2, t_i < t_j} 1(t_j − t_i < Z_i^2)] / [μ(t_j) + αβ Σ_{t_i < t_j} 1(t_j − t_i < Z_i^1)],

and the final output is all marked events from Ψ_2 falling in [0, t_+). It is easily proven that these retention probabilities result in the correct process Ψ_2.
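The thinning of Example 9 can be sketched as follows. This is our own illustration; the tuple layout of `events` (holding the coupled marks Z^1 ≥ Z^2) and the function name are assumptions.

```python
import random

def thin_to_predictable(events, mu, alpha, beta, rng=random):
    """Thin a realization of Psi_1 (unpredictable marks) into one of
    Psi_2 (predictable marks), as in Example 9. `events` is a list of
    (t_j, z1_j, z2_j) in increasing time order with z1_j >= z2_j, so
    that the numerator never exceeds the denominator."""
    seen = []   # all Psi_1 events processed so far: (t_i, z1_i)
    psi2 = []   # retained events: (t_i, z2_i)
    for t, z1, z2 in events:
        # intensity of Psi_2 at t, from events already retained
        num = mu + alpha * beta * sum(1 for ti, zi in psi2 if t - ti < zi)
        # dominating intensity of Psi_1 at t, from all earlier events
        den = mu + alpha * beta * sum(1 for ti, zi in seen if t - ti < zi)
        if rng.random() * den <= num:  # retain with probability num/den
            psi2.append((t, z2))
        seen.append((t, z1))
    return psi2
```

Visiting events in time order ensures that membership of earlier events in Ψ_2 is already known when each retention probability is evaluated.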
Another process that would be interesting to obtain by thinning is the Hawkes process without immigrants considered in [4]; this process has μ(t) = 0 for all t. However, for this to be nontrivial (i.e. not almost surely empty), it is necessary that ν = 1, which means that any dominating Hawkes process has ν ≥ 1 and, thus, cannot be simulated by Algorithm 3.

Many of our results and algorithms can be modified if we slightly extend the definition (in Section 1) of a marked Hawkes process, as follows. For any event t_i with associated mark Z_i, let n_i denote the number of (first-generation) offspring generated by (t_i, Z_i), and suppose that n_i, conditional on Z_i, is not necessarily Poisson distributed, but that n_i is still conditionally independent of t_i and the previous history. A particularly simple case occurs when n_i is either 1 or 0, and p = E[P(n_i = 1 | Z_i)] is assumed to be strictly between 0 and 1 (here p plays a role similar to that of ν, introduced in Section 4). We then redefine φ by

φ(f)(t) = 1 − p + p ∫_0^t f(t − s) h(s) ds,

where, now, h(s) = E[p(Z)h(s, Z)]/p. Since φ is now linear, the situation is much simpler. For example, F is given by G in (22) (with ν replaced by p).

Another extension of practical relevance is to consider a non-Poisson immigrant process, e.g. a Markov or Cox process. The results in Section 4 do not depend on the choice of immigrant process, and the straightforward simulation algorithm (Algorithm 1) applies provided that it is feasible to simulate the immigrants on [t_−, t_+). However, the perfect simulation algorithm relies heavily on the assumption that the immigrant process is Poisson.

Finally, we notice that it would be interesting to extend our ideas to spatial Hawkes processes (see [19] and [20]).

Acknowledgements

We are grateful to Søren Asmussen, Svend Berntsen, Horia Cornean, Martin Jacobsen, Arne Jensen, and Giovanni Luca Torrisi for helpful discussions.
The research of Jesper Møller was supported by the Danish Natural Science Research Council and by the Network in Mathematical Physics and Stochastics (MaPhySto), funded by grants from the Danish National Research Foundation.

References

[1] Apostol, T. M. (1974). Mathematical Analysis. Addison-Wesley, Reading, MA.
[2] Asmussen, S. (1987). Applied Probability and Queues. John Wiley, Chichester.
[3] Brémaud, P. and Massoulié, L. (1996). Stability of nonlinear Hawkes processes. Ann. Prob. 24, 1563–1588.
[4] Brémaud, P. and Massoulié, L. (2001). Hawkes branching point processes without ancestors. J. Appl. Prob. 38, 122–135.
[5] Brémaud, P., Nappo, G. and Torrisi, G. L. (2002). Rate of convergence to equilibrium of marked Hawkes processes. J. Appl. Prob. 39, 123–136.
[6] Brix, A. and Kendall, W. S. (2002). Simulation of cluster point processes without edge effects. Adv. Appl. Prob. 34, 267–280.
[7] Chornoboy, E. S., Schramm, L. P. and Karr, A. F. (1988). Maximum likelihood identification of neural point process systems. Biol. Cybernet. 59, 265–275.
[8] Daley, D. J. and Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes, Vol. 1, Elementary Theory and Methods, 2nd edn. Springer, New York.
[9] Dwass, M. (1969). The total progeny in a branching process and a related random walk. J. Appl. Prob. 6, 682–686.
[10] Hawkes, A. G. (1971). Point spectra of some mutually exciting point processes. J. R. Statist. Soc. B 33, 438–443.
[11] Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, 83–90.
[12] Hawkes, A. G. (1972). Spectra of some mutually exciting point processes with associated variables. In Stochastic Point Processes, ed. P. A. W. Lewis, John Wiley, New York, pp. 261–271.
[13] Hawkes, A. G. and Adamopoulos, L. (1973). Cluster models for earthquakes: regional comparisons. Bull. Internat. Statist. Inst. 45, 454–461.
[14] Hawkes, A. G. and Oakes, D. (1974). A cluster representation of a self-exciting process. J. Appl. Prob. 11, 493–503.
[15] Jagers, P. (1975). Branching Processes with Biological Applications. John Wiley, London.
[16] Kendall, W. S. and Møller, J. (2000). Perfect simulation using dominating processes on ordered spaces, with application to locally stable point processes. Adv. Appl. Prob. 32, 844–865.
[17] Møller, J. (2003). Shot noise Cox processes. Adv. Appl. Prob. 35, 614–640.
[18] Møller, J. and Rasmussen, J. G. (2005). Approximate simulation of Hawkes processes. Submitted.
[19] Møller, J. and Torrisi, G. L. (2005). Generalised shot noise Cox processes. Adv. Appl. Prob. 37, 48–74.
[20] Møller, J. and Torrisi, G. L. (2005). Perfect and approximate simulation of spatial Hawkes processes. In preparation.
[21] Møller, J. and Waagepetersen, R. P. (2004). Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall, Boca Raton, FL.
[22] Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc. 83, 9–27.
[23] Ogata, Y. (1998). Space-time point-process models for earthquake occurrences. Ann. Inst. Statist. Math. 50, 379–402.
[24] Propp, J. G. and Wilson, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9, 223–252.
[25] Ripley, B. D. (1987). Stochastic Simulation. John Wiley, New York.
[26] Rudin, W. (1987). Real and Complex Analysis. McGraw-Hill, New York.
[27] Vere-Jones, D. and Ozaki, T. (1982). Some examples of statistical inference applied to earthquake data. Ann. Inst. Statist. Math. 34, 189–207.