Heavy tails:
Sample Path Large Deviations
and Importance Sampling
Bert Zwart
Centrum Wiskunde & Informatica
Amsterdam
Jose Blanchet, Bohan Chen & Chang-Han Rhee
http://arxiv.org/abs/1606.02795
B. Zwart (CWI)
Heavy tails
1 / 43
What this talk is about
Let $X(t)$, $t \ge 0$, be a Lévy process such that
\[ P(X(1) > x) = L_+(x)\,x^{-\alpha}, \qquad P(X(1) < -x) = L_-(x)\,x^{-\beta}. \]
Set $\bar X_n(t) = X(nt)/n$ and $\bar X_n = \{\bar X_n(t),\ t \in [0,1]\}$.
Can we obtain sample-path large deviations for $\bar X_n$? That is,
\[ P(\bar X_n \in A) \sim\ ???, \qquad n \to \infty. \]
Can we use these insights to develop fast sampling algorithms?
Overview
Motivation and introduction
Main result: sample path large deviations principles
Examples
Simulation
More details
  Random walks
  Connections to the standard LD framework
  Proofs
Heavy tails are everywhere!
Computer systems: delays, files, …
Finance: losses
Social networks: popularity, contagion
Energy systems: blackouts
\[ P(\text{size} > n) \approx \frac{1}{n^{\alpha}} \]
Books on Heavy tails
In progress: JK Nair, Adam Wierman, BZ The fundamentals of heavy tails
Large deviations for light tails
Let $U_i$, $i \ge 1$, be an i.i.d. sequence with $E[U_1] = 0$.
Let $S_0 = 0$ and, for $n \ge 1$, let $S_n = U_1 + \dots + U_n$.
The weak law of large numbers (WLLN) states that
\[ \lim_{n\to\infty} P(|S_n/n| > \epsilon) = 0 \]
for every $\epsilon > 0$.
How fast is the convergence to 0 in the WLLN?
Cramér's theorem
Cumulant generating function of $U_1$: $\Lambda(s) = \log E[e^{sU_1}]$.
Convex conjugate of $\Lambda$: $\Lambda^*(a) = \sup_{s \ge 0} [as - \Lambda(s)]$.
$\Lambda^*$ is convex, and strictly convex if $U_1$ is non-deterministic.
Theorem (Cramér (1938))
For $a > 0$:
\[ \lim_{n\to\infty} \frac{1}{n} \log P(S_n/n > a) = -\Lambda^*(a). \]
What about P(Sn /n ∈ A)?
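As a rough numerical illustration of the convex conjugate in Cramér's theorem, the Python sketch below evaluates $\sup_{s \ge 0}[as - \Lambda(s)]$ by ternary search. The function name `rate`, the search bounds, and the two example distributions (standard normal, and Exp(1) minus its mean) are assumptions chosen for illustration; they do not come from the talk.

```python
import math

def rate(cgf, a, s_max, iters=200):
    """Approximate sup over s in [0, s_max] of a*s - cgf(s) by ternary search.

    This works because s -> a*s - cgf(s) is concave when cgf is convex.
    """
    lo, hi = 0.0, s_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if a * m1 - cgf(m1) < a * m2 - cgf(m2):
            lo = m1
        else:
            hi = m2
    s = 0.5 * (lo + hi)
    return a * s - cgf(s)

# Standard normal increments: Lambda(s) = s^2 / 2, so Lambda*(a) = a^2 / 2.
normal_cgf = lambda s: 0.5 * s * s

# Centered exponential increments U = E - 1 with E ~ Exp(1):
# Lambda(s) = -s - log(1 - s) for s < 1, so Lambda*(a) = a - log(1 + a).
centered_exp_cgf = lambda s: -s - math.log(1.0 - s)

normal_rate = rate(normal_cgf, 1.5, s_max=50.0)
exp_rate = rate(centered_exp_cgf, 1.0, s_max=1.0 - 1e-12)
print(normal_rate, exp_rate)
```

Both examples have a finite cumulant generating function near the origin, which is what makes the search meaningful. When $\Lambda(s) = \infty$ for every $s > 0$, the supremum is attained at $s = 0$ and the rate is 0.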
The large-deviations principle (Varadhan, 1966)
A sequence of r.v.'s $Z_n$ satisfies an LDP with rate function $I$ if $I$ is lower
semi-continuous and
\[ -\inf_{\xi \in A^\circ} I(\xi) \le \liminf_{n\to\infty} \frac{\log P(Z_n \in A)}{n} \le \limsup_{n\to\infty} \frac{\log P(Z_n \in A)}{n} \le -\inf_{\xi \in A^-} I(\xi). \]
Example: $Z_n = S_n/n$ and $I = \Lambda^*$.
Some cornerstones:
Contraction principle (analogue of continuous mapping principle)
Connections with convex & variational analysis, statistical physics, ....
Often crucial: the property that $I$ is good (i.e., has compact level sets).
Heavy tails
Cramér's theorem does not give a precise answer (the rate is 0) if
\[ E[e^{\epsilon U}] = \infty \]
for all $\epsilon > 0$. In this case, we say that $U$ has a heavy (right) tail.
Examples of heavy tails:
Pareto: $P(U > x) \sim x^{-\alpha}$
Lognormal: $P(U > x) \sim e^{-(\log x)^2}$
Weibull: $P(U > x) \sim e^{-x^{\alpha}}$, $\alpha \in (0, 1)$.
Any df with a hazard rate decreasing to 0.
Classes of HT distributions
$U$ is called long-tailed (we write $U \in \mathcal{L}$) if
\[ \frac{P(U > x + y)}{P(U > x)} \to 1 \]
as $x \to \infty$ for every $y > 0$.
An equivalent way to write this is
\[ P(U > x + y \mid U > x) \to 1. \]
Subexponential distributions
A non-negative random variable $U$ is subexponential if, for two independent
copies $U_1$, $U_2$ of $U$,
\[ \frac{P(U_1 + U_2 > x)}{P(U_1 > x)} \to 2 \]
as $x \to \infty$. We write $U \in \mathcal{S}$. Note that
\[ P(\max\{U_1, U_2\} > x) \sim 2P(U_1 > x), \]
and that $U_1 + U_2 \ge \max\{U_1, U_2\}$.
A large sum is most likely due to a large maximum.
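The defining ratio can be checked numerically for a concrete case. The sketch below assumes a Pareto variable on $[1, \infty)$ with $P(U > x) = x^{-\alpha}$ and $\alpha = 2$ (an illustrative choice, not taken from the talk) and computes $P(U_1 + U_2 > x)$ by conditioning on $U_1$ with a midpoint rule. The function names are hypothetical.

```python
def pareto_tail(x, alpha):
    """P(U > x) for a Pareto variable on [1, inf) with index alpha."""
    return 1.0 if x <= 1.0 else x ** (-alpha)

def two_sum_tail(x, alpha, steps=200_000):
    """P(U1 + U2 > x) for x >= 2, by conditioning on U1 (midpoint rule)."""
    # If U1 > x - 1 the sum exceeds x automatically, because U2 >= 1.
    total = pareto_tail(x - 1.0, alpha)
    h = (x - 2.0) / steps
    for i in range(steps):
        u = 1.0 + (i + 0.5) * h                 # midpoint in [1, x - 1]
        density = alpha * u ** (-alpha - 1.0)   # density of U1 at u
        total += density * pareto_tail(x - u, alpha) * h
    return total

def ratio(x, alpha=2.0):
    """P(U1 + U2 > x) / P(U1 > x), which should approach 2."""
    return two_sum_tail(x, alpha) / pareto_tail(x, alpha)

r10, r1000 = ratio(10.0), ratio(1000.0)
print(r10, r1000)
```

The ratio at $x = 1000$ is already close to 2, while at $x = 10$ it is noticeably larger, which gives a feel for how slowly the asymptotic regime sets in.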
The principle of a single big jump
Theorem (A. Nagaev, 1969). Let $U_i$ be i.i.d. with mean 0 and
$P(U_1 > x) = L(x)x^{-\alpha}$. Let $S_n = U_1 + \dots + U_n$ and $a > 0$. Then
\[ P(S_n/n > a) \sim nP(U_1 > an). \]
Heavy-tailed analogue of Cramér's theorem.
Most general formulation of this result in Dieker, Denisov, Shneer (2008).
Necessary (due to CLT): $P(U_1 > n) \sim P(U_1 > n + O(\sqrt{n}))$.
In heavy-tailed world (’extremistan’) rare events happen by catastrophes.
In light-tailed world (’mediocristan’) by conspiracies
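A small Monte Carlo experiment can make the single-big-jump heuristic concrete. The sketch below is only an illustration under assumed parameters: centered Pareto increments with $\alpha = 1.5$, $n = 50$, $a = 2$, and a fixed seed. It compares the empirical frequency of $\{S_n/n > a\}$ with $nP(U_1 > an)$ and records how often a single increment covers at least half of the level $an$. At such a small $n$ the two probabilities agree only roughly.

```python
import random

ALPHA = 1.5                      # tail index (illustrative assumption)
MEAN = ALPHA / (ALPHA - 1.0)     # mean of a Pareto(ALPHA) variable on [1, inf)

def simulate(n=50, a=2.0, reps=20_000, seed=1):
    """Return (Monte Carlo estimate, n*P(U_1 > a n), share of big-jump paths)."""
    rng = random.Random(seed)
    hits = hits_with_big_jump = 0
    for _ in range(reps):
        steps = [rng.paretovariate(ALPHA) - MEAN for _ in range(n)]
        if sum(steps) > a * n:
            hits += 1
            # Did one increment alone cover at least half of the level a*n?
            if max(steps) > 0.5 * a * n:
                hits_with_big_jump += 1
    estimate = hits / reps
    # P(U_1 > y) = (y + MEAN)^(-ALPHA) for the centered Pareto variable
    asymptotic = n * (a * n + MEAN) ** (-ALPHA)
    return estimate, asymptotic, hits_with_big_jump / max(hits, 1)

estimate, asymptotic, big_jump_share = simulate()
print(estimate, asymptotic, big_jump_share)
```

The third number is the empirical counterpart of the "catastrophe" picture: among the paths that reach the rare event, most contain one dominant increment.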
Another classic example: GI/GI/1 FIFO Waiting time
Let $B^*$ be a r.v. with density $P(B > x)/E[B]$, where $B$ is a generic service time.
Theorem (Korshunov (1997)). The following are equivalent:
1. $W \in \mathcal{S}$,
2. $B^* \in \mathcal{S}$,
3. $P(W > x) \sim \frac{\rho}{1-\rho} P(B^* > x)$.
Earlier versions: Borovkov (1971), Cohen (1973), Pakes (1975),
Veraverbeke (1977).
Proofs are based on Wiener-Hopf factorization and estimates for random sums.
Recent martingale proof of 2 ⇒ 3 by Denisov & Wachtel (2012).
Main motivation of this work
In many (network) applications, we can write our object of interest as a
mapping $\Psi$ of the sample path of a random walk or Lévy process.
Recall
\[ \bar X_n(t) = X(nt)/n, \qquad \bar X_n = \{\bar X_n(t),\ t \in [0,1]\}. \]
Let $\Psi$ be a continuous mapping on $\mathbb{D}$. What can we say about
\[ P(\Psi(\bar X_n) \in B) = P(\bar X_n \in \Psi^{-1}(B))? \]
Need:
An estimate of $P(\bar X_n \in A)$ for sufficiently many sets $A$.
A version of a contraction principle/continuous mapping theorem.
Additional motivation: multiple big jumps
[Figure: contractual graph of an insurance-reinsurance network]
Insurance-reinsurance networks (Blanchet & Shi 2015)
Fluid queues with On-Off sources (Z, Borst, Mandjes 2004)
Many-server queues (Foss & Korshunov 2005, 2014)
My vision: we need a more structural approach to such problems, in line
with large-deviations theory for light-tailed systems and weak convergence
Spectrally positive Lévy processes
\[ X(s) = sa + B(s) + \int_{0<x\le 1} x\,[N([0,s]\times dx) - s\,\nu(dx)] + \int_{x>1} x\,N([0,s]\times dx), \]
$E[X(s)] = 0$,
$a$ is a drift parameter,
$B$ is a Brownian motion,
$N$ is a Poisson random measure with mean measure Leb$\times\nu$ on $[0,1]\times(0,\infty)$;
$\nu$ is a measure on $(0,\infty)$ satisfying $\int_0^1 x^2\,\nu(dx) < \infty$ and
$\nu(x,\infty) = L(x)x^{-\alpha}$, $\alpha > 1$.
Spectrally positive Lévy processes (2)
Some notation:
$\mathbb{D}_j$: subspaces of the Skorokhod space $\mathbb{D}$ consisting of nondecreasing
step functions, vanishing at the origin, with exactly $j$ jumps.
$\mathbb{D}_{\le j} \triangleq \bigcup_{0 \le i \le j} \mathbb{D}_i$.
$\mathcal{D}_+(\xi)$: the number of upward jumps of an element $\xi$ in $\mathbb{D}$.
Finally, set
\[ \mathcal{J}(A) \triangleq \inf_{\xi \in \mathbb{D}_{<\infty} \cap A} \mathcal{D}_+(\xi). \]
Main result for one-sided case
Theorem 1
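Suppose that $A$ is a measurable set. If $\mathcal{J}(A) < \infty$, and if $A$ is bounded
away from $\mathbb{D}_{\le \mathcal{J}(A)-1}$, then
\[ C_{\mathcal{J}(A)}(A^\circ) \le \liminf_{n\to\infty} \frac{P(\bar X_n \in A)}{(n\nu[n,\infty))^{\mathcal{J}(A)}} \le \limsup_{n\to\infty} \frac{P(\bar X_n \in A)}{(n\nu[n,\infty))^{\mathcal{J}(A)}} \le C_{\mathcal{J}(A)}(A^-). \]
Recall
\[ \mathcal{J}(A) \triangleq \inf_{\xi \in \mathbb{D}_{<\infty} \cap A} \mathcal{D}_+(\xi). \]
$\mathcal{J}(A)$ is the number of jumps needed to achieve the event $\bar X_n \in A$.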
The pre-factor
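$\nu_\alpha^j$: restriction to $\mathbb{R}_+^{j\downarrow}$ of the $j$-fold product measure of $\nu_\alpha$, where
$\nu_\alpha(x,\infty) \triangleq x^{-\alpha}$.
For $j \ge 1$,
\[ C_j(\cdot) \triangleq E\Big[ \nu_\alpha^j \Big\{ y \in (0,\infty)^j : \sum_{i=1}^{j} y_i \mathbf{1}_{[U_i,1]} \in \cdot \Big\} \Big], \]
where $U_i$, $i \ge 1$, are i.i.d. uniform on $[0,1]$.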
Can be written as j-dimensional integral
Interpretation is important for simulation, and leads to an extension
of a conditional limit theorem of Durrett (1980).
Conditional limit theorem
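Suppose that, in addition, $B \subseteq \mathbb{D}[0,1]$ satisfies $C_{\mathcal{J}(B)}(B^\circ) = C_{\mathcal{J}(B)}(B^-)$.
Let $\bar X_n(t) \triangleq X(nt)/n$. Then
\[ \bar X_n \mid \bar X_n \in B \ \xrightarrow{d}\ \bar X_\infty^{|B}, \]
where
\[ P(\bar X_\infty^{|B} \in \cdot) = \frac{C_{\mathcal{J}(B)}(\cdot \cap B)}{C_{\mathcal{J}(B)}(B)}. \]
Note that
\[ \bar X_\infty^{|B}(t) \overset{D}{=} \sum_{n=1}^{\mathcal{J}(B)} \chi_n \mathbf{1}_{[U_n,1]}(t) \]
for some i.i.d. Pareto random variables $\chi_n$.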
The Principle of J (B) Big Jumps!
Two-sided Lévy processes
\[ X(s) = sa + B(s) + \int_{0<|x|\le 1} x\,[N([0,s]\times dx) - s\,\nu(dx)] + \int_{|x|>1} x\,N([0,s]\times dx), \]
$E[X(s)] = 0$,
$a$ is a drift parameter,
$B$ is a Brownian motion,
$N$ is a Poisson random measure with mean measure Leb$\times\nu$ on $[0,1]\times(-\infty,\infty)$;
$\nu$ is a measure on $(-\infty,\infty)$ satisfying $\int_{-1}^{1} x^2\,\nu(dx) < \infty$,
\[ \nu(x,\infty) = L_+(x)x^{-\alpha}, \qquad \nu(-\infty,-x) = L_-(x)x^{-\beta}, \qquad \alpha, \beta > 1. \]
Main result for two-sided case
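Theorem 2
Suppose that $A$ is a measurable set. Let
$\mathcal{I}(j,k) \triangleq (\alpha-1)j + (\beta-1)k$, and consider
\[ (\mathcal{J}(A), \mathcal{K}(A)) \in \arg\min_{(j,k)\in\mathbb{Z}_+^2;\ \mathbb{D}_{j,k}\cap A \neq \emptyset} \mathcal{I}(j,k). \]
If the argument minimum $(\mathcal{J}(A), \mathcal{K}(A))$ is unique and $A$ is bounded away
from $\mathbb{D}_{<\mathcal{J}(A),\mathcal{K}(A)}$, then
\[ \liminf_{n\to\infty} \frac{P(\bar X_n \in A)}{(n\nu[n,\infty))^{\mathcal{J}(A)} (n\nu(-\infty,-n])^{\mathcal{K}(A)}} \ge C_{\mathcal{J}(A),\mathcal{K}(A)}(A^\circ), \]
\[ \limsup_{n\to\infty} \frac{P(\bar X_n \in A)}{(n\nu[n,\infty))^{\mathcal{J}(A)} (n\nu(-\infty,-n])^{\mathcal{K}(A)}} \le C_{\mathcal{J}(A),\mathcal{K}(A)}(A^-). \]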
Two-sided Lévy processes (2)
Notation:
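$\mathbb{D}_{j,k}$: step functions vanishing at the origin with exactly $j$ upward
jumps and $k$ downward jumps.
$\mathbb{D}_{<j,k} = \bigcup_{(l,m)\in\mathbb{I}_{<j,k}} \mathbb{D}_{l,m}$ and
\[ \mathbb{I}_{<j,k} = \{(l,m) \in \mathbb{Z}_+^2 \setminus (j,k) : (\alpha-1)l + (\beta-1)m \le (\alpha-1)j + (\beta-1)k\}. \]
Constant:
\[ C_{j,k}(\cdot) \triangleq E\Big[ \nu_\alpha^j \times \nu_\beta^k \Big\{ (x,y) \in (0,\infty)^{j+k} : \sum_{i=1}^{j} x_i \mathbf{1}_{[U_i,1]} - \sum_{i=1}^{k} y_i \mathbf{1}_{[V_i,1]} \in \cdot \Big\} \Big]. \]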
The rate function
The minimization problem
\[ \arg\min_{(j,k)\in\mathbb{Z}_+^2;\ \mathbb{D}_{j,k}\cap A \neq \emptyset} (\alpha-1)j + (\beta-1)k \]
can be cast as a deterministic impulse control problem.
Related framework: Barles (1985).
[Stochastic IC: Bensoussan & Lions (1984), Dai & Yao (2014)]
Optimality of such problems can be described by quasi-variational
inequalities.
In examples we solve the minimization problem directly.
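When the set of feasible jump counts is known, the minimization can also be done by brute force. The sketch below assumes the user can supply a predicate `feasible(j, k)` answering whether some step function with $j$ upward and $k$ downward jumps lies in $A$. That predicate, the function name, and the search cap `max_jumps` are hypothetical and problem-specific. The function returns every minimizer, so the uniqueness condition of Theorem 2 can be inspected.

```python
def minimal_jump_patterns(alpha, beta, feasible, max_jumps=10, tol=1e-12):
    """Brute-force the arg min of (alpha-1)*j + (beta-1)*k over feasible (j, k).

    `feasible(j, k)` is a user-supplied (hypothetical) predicate that should
    return True when some path with j upward and k downward jumps lies in A.
    Returns (minimal cost, list of all minimizers).
    """
    best_cost, best = float("inf"), []
    for j in range(max_jumps + 1):
        for k in range(max_jumps + 1):
            if not feasible(j, k):
                continue
            cost = (alpha - 1.0) * j + (beta - 1.0) * k
            if cost < best_cost - tol:
                best_cost, best = cost, [(j, k)]
            elif abs(cost - best_cost) <= tol:
                best.append((j, k))
    return best_cost, best

# Event needing at least one upward and one downward jump.
cost, argmin = minimal_jump_patterns(2.0, 3.0, lambda j, k: j >= 1 and k >= 1)
print(cost, argmin)      # 3.0 [(1, 1)]

# Event reachable by two upward jumps or by one downward jump: a tie.
tie_cost, tie = minimal_jump_patterns(2.0, 3.0, lambda j, k: j >= 2 or k >= 1)
print(tie_cost, tie)     # 2.0 [(0, 1), (2, 0)]
```

The second call shows a case where the minimizer is not unique, which is exactly the situation excluded by the hypothesis of Theorem 2.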
Examples
We consider several functionals of $\bar X_n$, leading to specific $A$.
Need to check that $A$ is bounded away from $\mathbb{D}_{\le \mathcal{J}(A)-1}$.
It suffices to check that $A^\delta \cap \mathbb{D}_{<\infty}$ is bounded away from $\mathbb{D}_{\le \mathcal{J}(A)-1}$.
Moderate jumps
Motivated by a reinsurance problem we consider
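\[ Q(n) \triangleq P\Big( \sup_{t\in[0,1]} [\bar X_n(t) - ct] \ge a;\ \sup_{t\in[0,1]} [\bar X_n(t) - \bar X_n(t-)] \le b \Big), \]
\[ A \triangleq \Big\{ \xi \in \mathbb{D} : \sup_{t\in[0,1]} [\xi(t) - ct] \ge a;\ \sup_{t\in[0,1]} [\xi(t) - \xi(t-)] \le b \Big\}, \]
\[ \mathcal{J}(A) = \lceil a/b \rceil. \]
Condition on $A$ holds iff $a/b$ is not an integer, in which case
\[ Q(n) \sim C_{\lceil a/b \rceil}(A)\,(n\nu[n,\infty))^{\lceil a/b \rceil}. \]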
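The arithmetic behind this example is short enough to script. The sketch below (function name and example values are assumptions for illustration) computes $\lceil a/b \rceil$, checks whether $a/b$ is a non-integer, and reports the polynomial decay index $(\alpha-1)\lceil a/b \rceil$, which follows from $n\nu[n,\infty) = L(n)n^{-(\alpha-1)}$ up to the slowly varying factor.

```python
from fractions import Fraction
import math

def moderate_jump_summary(a, b, alpha):
    """Return (J, condition_holds, decay_index) for the moderate-jumps event."""
    ratio = Fraction(a) / Fraction(b)          # exact arithmetic avoids float issues
    jumps = math.ceil(ratio)                   # J(A) = ceil(a / b)
    condition_holds = ratio.denominator != 1   # a / b must not be an integer
    decay_index = (alpha - 1) * jumps          # Q(n) decays like n^(-decay_index)
    return jumps, condition_holds, decay_index

print(moderate_jump_summary(5, 2, 2))   # (3, True, 3)
print(moderate_jump_summary(4, 2, 2))   # (2, False, 2)
```

In the second call $a/b = 2$ is an integer, so the boundedness condition fails and the asymptotic equivalence above is not claimed.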
A barrier digital option
Consider a Lévy-driven Ornstein-Uhlenbeck process of the form
\[ d\bar Y_n(t) = -\kappa\,\bar Y_n(t)\,dt + d\bar X_n(t), \qquad \bar Y_n(0) = 0. \]
We apply our results to estimate
\[ b(n) = P\Big( \inf_{0\le t\le 1} \bar Y_n(t) \le -a_-,\ \bar Y_n(1) \ge a_+ \Big). \]
Theorem 2 applies, and we obtain
\[ b(n) \sim C_{1,1}(A)\; n\nu[n,\infty)\; n\nu(-\infty,-n] \]
as $n \to \infty$.
A = {ξ : l ≤ ξ ≤ u}: only jump when you must
A Stochastic Fluid Network
[Figure: network of three fluid queues, with Queue 1 and Queue 2 feeding each other and Queue 3]
Parameters shown: ρ1 = 0.8, ρ2 = 0.8, ρ3 = 1; µ1 = 1, µ2 = 1, µ3
Routing: Q12 = 0.1, Q13 = 0.8, Q21 = 0.1, Q23 = 0.8
Job size distribution: Pareto with α1 = 2, α2 = 2, α3 = 4
Queue 3 experiences large workload because of
  a single big job to either Queue 1 or 2 in case µ3 < 2.52
  two big jobs to Queue 1 and 2 in case µ3 ∈ (2.52, 2.6)
  one big job in Queue 3 in case µ3 > 2.6.
Simulation: the barrier option example
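Let
\[ A_n \triangleq \Big\{ S'_n \ge bn,\ \min_{0\le k\le n} S'_k \le -an \Big\}. \]
Assume $0 < \mu < b$ and rewrite $A_n$ as
\[ A_n = \Big\{ S'_n - \mu n \ge (b-\mu)n,\ \min_{0\le k\le n} [(S'_k - \mu k) + \mu k] \le -an \Big\} = \Big\{ S_n \ge (b-\mu)n,\ \min_{0\le k\le n} [S_k + \mu k] \le -an \Big\}, \]
where $\{S_n\}_{n\ge 0}$, with $S_k = S'_k - \mu k$, denotes the centered random walk.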
The theoretical estimator
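Let $b' = a + b - \mu$ and let
\[ Q_\gamma(\cdot) \triangleq P(\cdot \mid B_n^\gamma) \]
denote the conditional distribution given $B_n^\gamma$, where
\[ B_n^\gamma \triangleq \{\exists\, i < j : X_i < -\gamma_1 a n,\ X_j > \gamma_2 b' n\}, \]
and $\gamma = (\gamma_1, \gamma_2) \in (0,1)^2$ is a tuning parameter. Set
\[ \frac{dQ_\gamma}{dP} = \frac{1}{P(B_n^\gamma)} \mathbf{1}_{B_n^\gamma}. \]
Let $w \in (0,1)$ and set
\[ Q_{\gamma,w}(\cdot) \triangleq wP(\cdot) + (1-w)Q_\gamma(\cdot). \]
Our estimator $Z$ takes the form
\[ Z = \mathbf{1}_{A_n} \frac{dP}{dQ_{\gamma,w}} = \frac{\mathbf{1}_{A_n}}{w + \frac{1-w}{P(B_n^\gamma)} \mathbf{1}_{B_n^\gamma}}. \]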
Algorithm to sample from Qγ
1. Draw randomly a pair $(I, J)$ from $\{1, \dots, n\}$ without replacement.
2. Sample $X_I$ conditional on $\{X_I < -\gamma_1 a n\}$ and $X_J$ conditional on
   $\{X_J > \gamma_2 b' n\}$.
3. Sample the other $X_i$'s from their original distribution.
4. Accept it with probability
\[ a(x_1, \dots, x_n) = \frac{\mathbf{1}_{B_n^\gamma}}{\sum_{i,j} \mathbf{1}\{x_i < -\gamma_1 a n,\ x_j > \gamma_2 b' n\}} \]
   (and you are done; in case of rejection, go to step 1).
The algorithm has bounded relative error if $\gamma$ is chosen wisely (depending on the
lower and upper tail indices of the jumps).
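The four steps above can be sketched in Python as follows. Several ingredients are assumptions made only so the example runs. The increment law is hypothetical: a centered two-sided mixture of shifted Pareto (Lomax) variables with tail indices `ALPHA` and `BETA`. The pair $(I, J)$ is ordered so that $I < J$, and the sum in the acceptance probability is read as running over pairs $i < j$, matching the definition of $B_n^\gamma$. The helper `weight` evaluates the estimator $Z$ and expects $P(B_n^\gamma)$ as an input; computing that probability is not shown.

```python
import random

ALPHA, BETA = 2.0, 3.0                     # right / left tail indices (illustrative)
P_UP = (ALPHA - 1) / (ALPHA + BETA - 2)    # chosen so that E[X] = 0

def sample_x(rng):
    """One increment: +Lomax(ALPHA) with prob. P_UP, else -Lomax(BETA)."""
    if rng.random() < P_UP:
        return rng.paretovariate(ALPHA) - 1.0
    return -(rng.paretovariate(BETA) - 1.0)

def sample_x_above(c, rng):
    """X conditional on X > c, for c >= 0 (Pareto scaling property)."""
    return (1.0 + c) * rng.paretovariate(ALPHA) - 1.0

def sample_x_below(c, rng):
    """X conditional on X < -c, for c >= 0."""
    return -((1.0 + c) * rng.paretovariate(BETA) - 1.0)

def count_pairs(x, lo, hi):
    """Number of pairs i < j with x[i] < -lo and x[j] > hi."""
    lows_so_far = pairs = 0
    for v in x:
        if v > hi:
            pairs += lows_so_far
        elif v < -lo:
            lows_so_far += 1
    return pairs

def sample_from_q(n, lo, hi, rng):
    """Acceptance-rejection sampler for (X_1..X_n) given count_pairs >= 1."""
    while True:
        i, j = sorted(rng.sample(range(n), 2))    # step 1, ordered so i < j
        x = [sample_x(rng) for _ in range(n)]     # step 3
        x[i] = sample_x_below(lo, rng)            # step 2
        x[j] = sample_x_above(hi, rng)
        k = count_pairs(x, lo, hi)
        if k and rng.random() < 1.0 / k:          # step 4
            return x

def weight(in_a, in_b, p_b, w):
    """Z = 1_A / (w + (1 - w) * 1_B / P(B)); p_b = P(B) must be supplied."""
    return float(in_a) / (w + (1.0 - w) * float(in_b) / p_b)

rng = random.Random(0)
n, a, b_prime, g1, g2 = 100, 1.0, 2.0, 0.5, 0.5
path = sample_from_q(n, g1 * a * n, g2 * b_prime * n, rng)
print(count_pairs(path, g1 * a * n, g2 * b_prime * n) >= 1)   # True
```

To sample from the mixture $Q_{\gamma,w}$, one would draw from the original law with probability $w$ and call `sample_from_q` otherwise, then average `weight` over the samples.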
Random walks
$S(k)$, $k \ge 0$: centered random walk.
$\bar S_n(t) = S([nt])/n$, $t \ge 0$, and $\bar S_n = \{\bar S_n(t),\ t \in [0,1]\}$.
$N(t)$, $t \ge 0$: unit rate Poisson process.
$X(t) \triangleq S_{N(t)}$, $t \ge 0$; $\bar X_n(t)$ and $\bar X_n$ are as defined before.
$\bar S_n$ satisfies the same limit theorem as $\bar X_n$.
Proof idea: the Skorokhod distance between $\bar S_n$ and $\bar X_n$ is bounded by
\[ \sup_{t\in[0,1]} [N(nt)/n - t]. \]
Connection with large deviations framework
Let $\bar X_n$ be a centered and scaled two-sided Lévy process with Lévy measure
$\nu$ satisfying
\[ \nu(x,\infty) = L_+(x)x^{-\alpha}, \qquad \nu(-\infty,-x) = L_-(x)x^{-\beta}. \]
Define
\[ I(\xi) \triangleq \begin{cases} (\alpha-1)\mathcal{D}_+(\xi) + (\beta-1)\mathcal{D}_-(\xi), & \text{if } \xi \text{ is a step function, } \xi(0) = 0, \\ \infty, & \text{otherwise.} \end{cases} \]
$I$ is lower semi-continuous on $\mathbb{D}$ but does not have compact level sets
⇒ $I$ is a rate function, but not a good rate function.
Weak large deviations principle
Theorem 3
The scaled process $\bar X_n$ satisfies the weak large deviations principle
with rate function $I$ and speed $\log n$, i.e.,
\[ \liminf_{n\to\infty} \frac{\log P(\bar X_n \in G)}{\log n} \ge -\inf_{x\in G} I(x) \]
for every open set $G$, and
\[ \limsup_{n\to\infty} \frac{\log P(\bar X_n \in K)}{\log n} \le -\inf_{x\in K} I(x) \]
for every compact set $K$.
Proof idea: reduce everything to balls, and use Theorem 2.
Nonexistence of strong large deviations principle
The upper bound in Theorem 3 cannot be extended to all closed sets.
Take $\alpha = \beta = 2$ for ease of exposition.
Set
\[ \pi(\xi) \triangleq \Big( \sup_{t\in(0,1]} [\xi(t) - \xi(t-)],\ \sup_{t\in(0,1]} [\xi(t-) - \xi(t)] \Big). \]
If $\bar X_n$ satisfies a strong LDP, the contraction principle implies that
$\pi(\bar X_n)$ satisfies an LDP with rate function $I'$ given by
\[ I'(y_1, y_2) = \mathbf{1}(y_1 > 0) + \mathbf{1}(y_2 > 0). \]
Nonexistence of strong large deviations principle (ctd)
Consider the closed set
\[ A \triangleq \bigcup_{k=2}^{\infty} [\log k, \infty) \times [k^{-1/2}, \infty). \]
\[ P(\pi(\bar X_n) \in A) \ge P(\pi(\bar X_n) \in [\log n, \infty) \times [n^{-1/2}, \infty)). \]
The log of the RHS behaves like $-\log n$ (one big jump needed).
This contradicts $-\inf_{(y_1,y_2)\in A} I'(y_1, y_2) = -2$
(suggesting two big jumps are needed).
We need the condition that $A$ is bounded away from the set of step
functions with at most one big jump.
Proof: M-convergence (Lindskog, Resnick, Roy 2014)
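Let $(\mathbb{S}, d)$ be a complete separable metric space, and $\mathscr{S}$ be the Borel
σ-algebra on $\mathbb{S}$.
Given a closed subset $\mathbb{C}$ of $\mathbb{S}$, define $\mathbb{C}^r \triangleq \{x \in \mathbb{S} : d(x, \mathbb{C}) < r\}$ for
$r \ge 0$, and let $\mathbb{M}(\mathbb{S} \setminus \mathbb{C})$ be the class of measures defined on $\mathscr{S}_{\mathbb{S}\setminus\mathbb{C}}$
whose restrictions to $\mathbb{S} \setminus \mathbb{C}^r$ are finite for all $r > 0$.
$\mathcal{C}_{\mathbb{S}\setminus\mathbb{C}}$ is the set of real-valued, non-negative, bounded, continuous
functions whose support is bounded away from $\mathbb{C}$ (i.e., $f(\mathbb{C}^r) = \{0\}$
for some $r > 0$).
A sequence of measures $\mu_n \in \mathbb{M}(\mathbb{S} \setminus \mathbb{C})$ converges to $\mu \in \mathbb{M}(\mathbb{S} \setminus \mathbb{C})$ if
$\mu_n(f) \to \mu(f)$ for each $f \in \mathcal{C}_{\mathbb{S}\setminus\mathbb{C}}$.
For Theorem 1, we take $\mathbb{S} = \mathbb{D}$ and $\mathbb{C} = \mathbb{D}_{\le j-1}$.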
Characterization of M convergence (LRR2014)
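Let $\mu, \mu_n \in \mathbb{M}(\mathbb{S} \setminus \mathbb{C})$. Then $\mu_n \to \mu$ in $\mathbb{M}(\mathbb{S} \setminus \mathbb{C})$ as $n \to \infty$ if and only if
\[ \limsup_{n\to\infty} \mu_n(F) \le \mu(F) \tag{1} \]
for all closed $F \in \mathscr{S}_{\mathbb{S}\setminus\mathbb{C}}$ bounded away from $\mathbb{C}$ and
\[ \liminf_{n\to\infty} \mu_n(G) \ge \mu(G) \tag{2} \]
for all open $G \in \mathscr{S}_{\mathbb{S}\setminus\mathbb{C}}$ bounded away from $\mathbb{C}$.
For Theorem 1, we take $\mathbb{S} = \mathbb{D}$ and $\mathbb{C} = \mathbb{D}_{\le j-1}$.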
Asymptotic equivalence (RBZ2016)
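Suppose that $X_n$ and $Y_n$ are random elements taking values in a complete
separable metric space $(\mathbb{S}, d)$. $Y_n$ is said to be asymptotically equivalent to
$X_n$ with respect to $\epsilon_n$ and $\mathbb{C}$ if, for each $\delta > 0$ and $\gamma > 0$,
\[ \limsup_{n\to\infty} \epsilon_n^{-1} P(X_n \in (\mathbb{S} \setminus \mathbb{C})^{-\gamma},\ d(X_n, Y_n) \ge \delta) = 0, \]
\[ \limsup_{n\to\infty} \epsilon_n^{-1} P(Y_n \in (\mathbb{S} \setminus \mathbb{C})^{-\gamma},\ d(X_n, Y_n) \ge \delta) = 0. \]
For Theorem 1 it suffices to take $\mathbb{C} = \emptyset$.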
Proof: M-convergence: the one-sided case
Theorem 1 follows from
Theorem 1’
For each $j \ge 1$,
\[ (n\nu[n,\infty))^{-j} P(\bar X_n \in \cdot) \to C_j(\cdot) \tag{3} \]
in $\mathbb{M}(\mathbb{D} \setminus \mathbb{D}_{\le j-1})$, as $n \to \infty$.
Proof:
$\bar X_n$ is asymptotically equivalent to $J_n^j$, the process obtained
from $\bar X_n$ by keeping its $j$ biggest jumps.
Show both are asymptotically equivalent.
Use representation for $J_n^j$ and many detailed technical estimates.
Final Comments
Currently under investigation: Weibull tails
Future plans: Markov additive processes, random graphs, connections
with impulse control
Good general definition of heavy-tailed LDP is still lacking (in this
talk: Skorokhod space)