On the estimation of the heavy–tail exponent in time
series using the max–spectrum
Stilian A. Stoev
University of Michigan, Ann Arbor, U.S.A.
JSM, Salt Lake City, 2007
joint work with: George Michailidis and Murad Taqqu
Outline
• Heavy tails are ubiquitous
• An old problem
• Max–spectrum
• The estimator
• Asymptotic properties
• Data examples
Heavy tails
• A random variable X is said to be heavy–tailed if
$P\{|X| \ge x\} \sim L(x)\,x^{-\alpha}$, as $x \to \infty$,
for some α > 0 and a slowly varying function L.
◦ Here we focus on the simpler but important context:
$X \ge 0$ a.s. and $P\{X > x\} \sim C x^{-\alpha}$, as $x \to \infty$.
◦ Infinite moments: for p > 0,
$E X^p < \infty$ if and only if $p < \alpha$.
In particular, $0 < \alpha \le 2 \Rightarrow \mathrm{Var}(X) = \infty$ and $0 < \alpha \le 1 \Rightarrow E|X| = \infty$.
• The estimation of the heavy–tail exponent α is an important problem with a rich history.
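To make the setting concrete, here is a minimal sketch (assuming NumPy) that draws from the exact Pareto law $P\{X > x\} = x^{-\alpha}$, $x \ge 1$, i.e. the case C = 1 above, by inverse-transform sampling, and compares the empirical tail with $x^{-\alpha}$. The helper names `pareto_sample` and `empirical_tail` are ours, chosen for illustration.

```python
import numpy as np

def pareto_sample(alpha, n, rng):
    """Draw n iid values from the exact Pareto law P{X > x} = x**(-alpha), x >= 1.

    Inverse transform: if U is uniform on (0, 1], then
    P{U**(-1/alpha) > x} = P{U < x**(-alpha)} = x**(-alpha).
    """
    u = 1.0 - rng.random(n)          # rng.random is in [0, 1), so u is in (0, 1]
    return u ** (-1.0 / alpha)

def empirical_tail(x, t):
    """Fraction of the sample strictly above the threshold t."""
    return float(np.mean(np.asarray(x) > t))

rng = np.random.default_rng(0)
alpha = 1.5                          # 1 < alpha <= 2: finite mean, infinite variance
x = pareto_sample(alpha, 200_000, rng)
for t in (2.0, 10.0, 50.0):
    # empirical P{X > t} next to the exact value t**(-alpha)
    print(t, empirical_tail(x, t), t ** -alpha)
```

With α = 1.5 the mean is finite but Var(X) = ∞, so the sample variance of `x` keeps jumping as the sample grows instead of settling.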
Heavy tails everywhere: Traded volumes
[Figure: traded volumes (number of shares), INTC, Nov 1, 2005.]
Heavy tails everywhere: TCP durations
[Figure: TCP flow sizes (packets), UNC link 2001 (~36 min), against time; second panel: the first minute.]
Heavy tails everywhere: Insurance claims
[Figure: Danish fire loss data, 1980–1990. Panels: the data; Hill plot against the order statistics, $\hat\alpha_H(k) = 1.394$; max–spectrum against the scales j, H = 0.60422 (0.020897), $\hat\alpha$ = 1.655.]
Tail exponent estimation: an old problem
• Hill (1975) derived the MLE in the Pareto model $P\{X > x\} = x^{-\alpha}$, $x \ge 1$, and introduced the Hill plot:
$\hat\alpha_H(k) := \Big(\frac{1}{k}\sum_{i=1}^{k}\big(\log X_{i,n} - \log X_{k+1,n}\big)\Big)^{-1}$,
where $X_{1,n} \ge X_{2,n} \ge \cdots \ge X_{k,n}$ are the top–k order statistics of the sample.
• A lot of work for iid data – less for dependent:
◦ Resnick and Stărică (1995) – consistency of Hill–type estimators.
◦ J. Hill (2006) – asymptotic normality of Hill–type estimators under NED
(near epoch dependence) conditions.
◦ ...
• Even for iid data, Hill plots are volatile and hard to interpret (the “Hill horror plot”).
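The Hill estimator and Hill plot above can be computed directly from the sorted logarithms. A minimal NumPy sketch; `hill_estimator` and `hill_plot` are our illustrative names, not an existing library API.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate alpha_H(k) from the top-k order statistics of a positive sample.

    With X_{1,n} >= X_{2,n} >= ... >= X_{n,n} the decreasing order statistics,
    alpha_H(k) = ( (1/k) * sum_{i=1}^{k} (log X_{i,n} - log X_{k+1,n}) )**(-1).
    """
    x = np.asarray(x, dtype=float)
    if not 1 <= k < x.size:
        raise ValueError("need 1 <= k < n")
    logs = np.sort(np.log(x))[::-1]            # log X_{1,n} >= log X_{2,n} >= ...
    return 1.0 / np.mean(logs[:k] - logs[k])    # logs[k] is log X_{k+1,n}

def hill_plot(x, ks):
    """alpha_H(k) for each k in ks; plotting these against k gives the Hill plot."""
    return np.array([hill_estimator(x, k) for k in ks])

# Toy check: the logs of the data are 3, 2, 1, 0, so with k = 2
# alpha_H(2) = 1 / mean(3 - 1, 2 - 1) = 1 / 1.5.
print(hill_estimator(np.exp([3.0, 2.0, 1.0, 0.0]), 2))
```

Sorting once and using cumulative sums of the sorted logs would give the whole Hill plot in O(n log n); the per-k loop only keeps the sketch short.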
Another approach: max self–similarity
• For iid $(X_k)$ with tail exponent α (here $\bigvee$ denotes the maximum),
$\frac{1}{n^{1/\alpha}} \bigvee_{i=1}^{n} X_i \xrightarrow{d} Z$, as $n \to \infty$,
where $P\{Z \le x\} = \exp\{-C x^{-\alpha}\}$, x > 0.
◦ The above continues to hold for many dependent stationary $(X_k)$!
• Given $X_1, \ldots, X_n$, set
$D(j,k) := \bigvee_{i=1}^{2^j} X_{2^j(k-1)+i}$, $1 \le k \le n_j := [n/2^j]$, $1 \le j \le \log_2 n$,
to be the block–maxima of dyadic sizes.
◦ Observe that
$Y_j := \frac{1}{n_j}\sum_{k=1}^{n_j} \log_2 D(j,k) \simeq E\log_2\big(2^{j/\alpha} Z\big) = j/\alpha + E\log_2 Z$, as $j \to \infty$.
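The construction of $D(j,k)$ and $Y_j$ translates directly into array operations: reshape the first $n_j 2^j$ observations into $n_j$ blocks of length $2^j$ and take row maxima. A minimal NumPy sketch; the function names are ours, not from the talk.

```python
import numpy as np

def block_maxima(x, j):
    """D(j, k), k = 1, ..., n_j: maxima over consecutive blocks of size 2**j.

    n_j = floor(n / 2**j); the last n - n_j * 2**j observations, which do not
    fill a whole block, are dropped.
    """
    x = np.asarray(x, dtype=float)
    m = 2 ** j
    n_j = x.size // m
    return x[: n_j * m].reshape(n_j, m).max(axis=1)

def max_spectrum(x):
    """Scales j = 1, ..., floor(log2 n) and Y_j = (1/n_j) * sum_k log2 D(j, k).

    Assumes strictly positive data, so that log2 is finite.
    """
    n = len(x)
    J = n.bit_length() - 1                   # floor(log2 n) without rounding issues
    scales = np.arange(1, J + 1)
    Y = np.array([np.log2(block_maxima(x, j)).mean() for j in scales])
    return scales, Y

# Toy data 1, 2, 4, ..., 128 (n = 8): the block maxima at scale j = 1 are
# 2, 8, 32, 128, so Y_1 = (1 + 3 + 5 + 7) / 4 = 4.
scales, Y = max_spectrum(2.0 ** np.arange(8))
print(scales, Y)
```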
The max–spectrum: iid asymptotics
The $Y_j$'s, $1 \le j \le \log_2 n$, form the max–spectrum of the data set $(X_k, 1 \le k \le n)$.
• An estimator of α is then derived from the $Y_j$ via regression:
$\hat\alpha = \hat\alpha[j_1, j_2] := \Big(\sum_{j=j_1}^{j_2} w_j Y_j\Big)^{-1}$, with $\sum_j w_j = 0$, $\sum_j j\,w_j = 1$,
since then $\sum_j w_j Y_j \simeq 1/\alpha$.
• For iid data: the estimator $\hat\alpha[j_1, j_2]$ is consistent and asymptotically normal, as $j_1, j_2 \to \infty$ but so that $n/2^{j_1}, n/2^{j_2} \to \infty$.
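Any weights satisfying the two constraints give an estimator of this form; one natural choice, assumed here and not necessarily the one used in the talk, is the ordinary least-squares slope weights. A minimal NumPy sketch with illustrative names:

```python
import numpy as np

def ols_weights(j1, j2):
    """Weights w_j, j = j1, ..., j2 (j1 < j2), with sum_j w_j = 0 and sum_j j*w_j = 1.

    These are the ordinary least-squares slope weights
    w_j = (j - jbar) / sum_i (i - jbar)**2.
    """
    j = np.arange(j1, j2 + 1, dtype=float)
    d = j - j.mean()
    return d / np.dot(d, d)

def alpha_hat(scales, Y, j1, j2, w=None):
    """alpha_hat[j1, j2] = (sum_{j=j1}^{j2} w_j Y_j)**(-1).

    scales must be increasing and contain every j in j1..j2. Because
    Y_j ~ j/alpha + C, the weighted sum estimates the slope 1/alpha.
    """
    scales = np.asarray(scales)
    Y = np.asarray(Y, dtype=float)
    Ysel = Y[(scales >= j1) & (scales <= j2)]
    w = ols_weights(j1, j2) if w is None else np.asarray(w, dtype=float)
    return 1.0 / np.dot(w, Ysel)

# Noise-free check: if Y_j = j/alpha + C exactly, the estimator returns alpha.
scales = np.arange(1, 11)
print(alpha_hat(scales, scales / 1.5 + 0.7, 3, 8))
```

The optional `w` argument lets other weight vectors (for example GLS weights) be plugged in without changing the estimator.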
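Thm [S., Michailidis & Taqqu (2006)] For iid data under second order tail regularity conditions, let $1 \le r(n) \le \log_2 n$ be such that
$\sqrt{n}/2^{r(n)(1/2+\beta/\alpha)} + r(n)\,2^{r(n)/2}/\sqrt{n} \longrightarrow 0$, as $n \to \infty$.
Then
$\sup_{x \in \mathbb{R}} \Big| P\Big\{\sqrt{n_{j_2+r(n)}}\,\big(\langle\vec\theta, \vec Y\rangle - \langle\vec\theta, \vec\mu_r\rangle\big) \le x\Big\} - \Phi(x/\sigma_{\vec\theta})\Big| \longrightarrow 0$, as $n \to \infty$.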
The max–spectrum: iid asymptotics (cont’d)
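Here $\vec Y = (Y_{j+r(n)})_{j=j_1}^{j_2}$, $\vec\theta = (\theta_j)_{j=j_1}^{j_2}$,
$\vec\mu_r = ((j + r(n))/\alpha + C,\ j_1 \le j \le j_2)$, and $\sigma_{\vec\theta}^2 = \alpha^{-2}\,\vec\theta^{\,t}\Sigma_1\vec\theta$.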
Remarks:
• The parameter β > 0 governs the “second order” tail behavior. Roughly:
$P\{X > x\} \sim C x^{-\alpha}(1 + D x^{-\beta})$, as $x \to \infty$.
• The asymptotic covariance matrix $\Sigma_1$ is the same as for 1–Fréchet data.
◦ $\Sigma_1$ does not depend on α; the constant in $\vec\mu_r$ is $C = E\log_2 Z$.
• Consistency and asymptotic normality of $\hat\alpha[r(n) + j_1, r(n) + j_2]$ follow.
◦ The rates are the same as for the Hill estimator (Hall, 1982).
• The explicit asymptotic covariance $\alpha^{-2}\Sigma_1$ of the max–spectrum $\vec Y$ yields the optimal linear (GLS) estimators, which is important in practice.
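Given the covariance of the max–spectrum, the optimal (GLS) weights solve a small constrained least-squares problem: minimize $w^t\Sigma w$ subject to $\sum_j w_j = 0$, $\sum_j j\,w_j = 1$. The slides do not list the entries of $\Sigma_1$, so this sketch (NumPy; `gls_weights` is a hypothetical helper) takes the covariance matrix as an input. Since $\alpha^{-2}$ is a common scalar factor, the weights depend only on $\Sigma_1$. The matrix used in the example is an arbitrary placeholder, not $\Sigma_1$.

```python
import numpy as np

def gls_weights(j1, j2, Sigma):
    """Minimum-variance weights with sum_j w_j = 0 and sum_j j*w_j = 1.

    Sigma: covariance of (Y_{j1}, ..., Y_{j2}), known up to a scalar factor,
    symmetric positive definite of size (j2 - j1 + 1). With design X = [1, j],
    w = Sigma^{-1} X (X^T Sigma^{-1} X)^{-1} e_2 minimizes w^T Sigma w
    subject to X^T w = e_2 = (0, 1).
    """
    j = np.arange(j1, j2 + 1, dtype=float)
    X = np.column_stack([np.ones_like(j), j])
    SinvX = np.linalg.solve(Sigma, X)           # Sigma^{-1} X
    A = X.T @ SinvX                             # 2 x 2 matrix X^T Sigma^{-1} X
    return SinvX @ np.linalg.solve(A, np.array([0.0, 1.0]))

# Placeholder covariance (NOT Sigma_1): correlations 0.5**|i - k| between scales.
idx = np.arange(5)
S = 0.5 ** np.abs(np.subtract.outer(idx, idx))
w = gls_weights(3, 7, S)
print(w, w.sum(), np.arange(3, 8) @ w)   # constraints: sum 0, sum j*w = 1
```

With the identity matrix the GLS weights reduce to the OLS slope weights. Because $\Sigma_1$ is the same as for 1–Fréchet data, one could approximate it by simulating 1–Fréchet samples and taking the empirical covariance of their max–spectra; that is our suggestion, not a procedure stated in the talk.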
The max–spectrum: dependent data
Let $(X_k)_{k\in\mathbb{Z}}$ be stationary, with tail exponent α and extremal index θ > 0.
• Then
$\frac{1}{n^{1/\alpha}} \bigvee_{1\le k\le n} X_k \xrightarrow{d} \theta^{1/\alpha} Z$, where $\frac{1}{n^{1/\alpha}} \bigvee_{1\le k\le n} X_k^* \xrightarrow{d} Z$ $(n \to \infty)$,
and $(X_k^*)$ are iid copies of $X_1$.
◦ Since θ > 0, the max–spectrum $(Y_j)$ of a time series scales as for iid data:
$Y_j \simeq j/\alpha + C$, as $j \to \infty$ and $n_j = n/2^j \to \infty$.
• The same regression–based estimators $\hat\alpha = \big(\sum_{j=j_1}^{j_2} w_j Y_j\big)^{-1}$ work!
• The asymptotics for $\hat\alpha$ are harder than for iid data!
◦ Intuition: the block–maxima $D(j,k)$, $1 \le k \le n_j$, are asymptotically iid as $j \to \infty$.
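For a concrete dependent example (our choice, not one from the talk), take the moving maxima $X_k = \max(\xi_k, \xi_{k-1})$ of iid Pareto $\xi_k$: the series is stationary with tail exponent α and extremal index θ = 1/2. The sketch below (NumPy; `max_spectrum` is a self-contained helper of ours) simulates it and fits the slope of $Y_j$ over scales 4 to 12 by ordinary least squares via `np.polyfit`, which amounts to using the OLS weights $w_j$.

```python
import numpy as np

def max_spectrum(x, scales):
    """Y_j = mean_k log2 D(j, k) for each scale j, with D(j, k) the maxima of
    consecutive blocks of size 2**j (an incomplete last block is dropped)."""
    x = np.asarray(x, dtype=float)
    Y = []
    for j in scales:
        m = 2 ** j
        n_j = x.size // m
        Y.append(np.log2(x[: n_j * m].reshape(n_j, m).max(axis=1)).mean())
    return np.array(Y)

rng = np.random.default_rng(2)
alpha, n = 1.5, 2 ** 18
xi = (1.0 - rng.random(n + 1)) ** (-1.0 / alpha)  # iid Pareto: P{xi > x} = x**(-alpha), x >= 1
x = np.maximum(xi[1:], xi[:-1])                  # moving maxima X_k = max(xi_k, xi_{k-1}); theta = 1/2

scales = np.arange(4, 13)                         # j1 = 4, j2 = 12
slope = np.polyfit(scales, max_spectrum(x, scales), 1)[0]  # OLS slope, estimates 1/alpha
alpha_est = 1.0 / slope
print(alpha_est)
```

The sample size and scale range are illustrative. Scales that are too small or too large trade bias against variance, which is what the choice of $j_1, j_2$ (or r(n)) controls.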
Max–spectrum illustration: TCP durations
[Figure: TCP flow sizes (bytes), max self–similarity fit H = 0.924 (0.044637), $\hat\alpha$ = 1.0822; max–spectrum against the scales j.]
Two asymptotic regimes
• Intermediate scales: Fix integers $j_1 < j_2$ and let
$\hat\alpha_n = \hat\alpha[r(n) + j_1, r(n) + j_2]$, where $r(n) \to \infty$ and $2^{r(n)}/n \to 0$, as $n \to \infty$.
◦ We expect to get consistency and asymptotic normality for $\hat\alpha_n$.
• Large scales: Fix $\ell \in \mathbb{N}$ and focus on the largest ℓ + 1 scales:
$\hat\alpha_n = \hat\alpha[\log_2 n - \ell, \log_2 n]$.
◦ We can only get “distributional consistency”:
$\hat\alpha_n \xrightarrow{d} \hat\alpha_Z$, as $n \to \infty$, with $\hat\alpha_Z$ a random variable.
• Both regimes are useful/interesting in practice.
• More details ...
Intermediate scales asymptotics
The regularity conditions: for $M_n := \max_{1\le k\le n} X_k$,
$P\{n^{-1/\alpha} M_n \le x\} = \exp\{-c(n,x)\,x^{-\alpha}\}$, x > 0,
where
$|c(n,x) - c_X| \le c_1(x)\,n^{-\beta}$, $\forall x > 0$, with $c_1(x) = O(x^{-R})$, $x \downarrow 0$.   (1)
(Plus a technicality at x ≈ 0.)
◦ Intuition: β controls the second order tail behavior of $M_n$.
◦ Caveat: Relation (1) may be hard to verify! We have it for moving maxima.
• We get rates on moments of $f(M_n/n^{1/\alpha})$, in particular:
Thm [S. & Michailidis (2006)] Under the above conditions, for all k ∈ ℕ,
$\big|E\log^k(M_n/n^{1/\alpha}) - E\log^k(Z)\big| = O(n^{-\beta})$, as $n \to \infty$,
provided $\int_1^\infty c_1(x)\,x^{-\alpha-1+\delta}\,dx < \infty$, for δ > 0.
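Condition (1) can be checked by hand in the simplest moving-maximum case, iid exact Pareto data with $P\{X > x\} = x^{-\alpha}$, $x \ge 1$. This illustration is ours: here $P\{n^{-1/\alpha}M_n \le x\} = (1 - x^{-\alpha}/n)^n$, so $c(n,x)$ has a closed form, and one can verify $0 \le c(n,x) - c_X \le c_1(x)\,n^{-\beta}$ with $c_X = 1$, $c_1(x) = x^{-\alpha}$ and β = 1 whenever $x^{-\alpha}/n \le 1/2$. The name `c_pareto` is hypothetical.

```python
import numpy as np

def c_pareto(n, x, alpha):
    """c(n, x) for iid exact Pareto data, P{X > x} = x**(-alpha), x >= 1.

    Here P{n**(-1/alpha) M_n <= x} = (1 - u)**n with u = x**(-alpha) / n,
    so matching exp{-c(n, x) x**(-alpha)} gives c(n, x) = -n x**alpha log(1 - u).
    Valid when 0 < u < 1; the limit is c_X = 1.
    """
    u = x ** (-alpha) / n
    return -n * x ** alpha * np.log1p(-u)   # log1p keeps accuracy for small u

alpha = 1.5
for n in (10, 100, 1000, 10_000):
    for x in (0.5, 1.0, 2.0, 5.0):
        err = c_pareto(n, x, alpha) - 1.0
        # c - 1 = u/2 + u**2/3 + ... lies in [0, x**(-alpha)/n] whenever u <= 1/2
        print(n, x, err, x ** (-alpha) / n)
```

As n grows, $n\,(c(n,x) - 1) \to x^{-\alpha}/2$, so the rate $n^{-1}$ is attained, not just bounded.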
Intermediate scales: asymptotic normality
Let (Xk ) be stationary with tail exponent α > 0.
Thm [S. & Michailidis (2006)] Under the above conditions, and if $(X_k)$ is m–dependent, we have
$\sqrt{n_{r(n)}}\,(\hat\alpha_n - \alpha) \xrightarrow{d} N(0, \alpha^2 c_w)$,
where $c_w = \vec w^{\,t}\Sigma_1\vec w$ and $\hat\alpha_n = \hat\alpha[r(n) + j_1, r(n) + j_2]$, provided
$2^{r(n)}/n + n/2^{r(n)(1+2\min\{1,\beta\})} \longrightarrow 0$, as $n \to \infty$.
Remarks:
• The same asymptotic variance as in the iid case.
◦ Intuition: the block–maxima $D(j,k)$, $1 \le k \le n_j$, are asymptotically iid!
• β captures second order tail behavior PLUS dependence.
• Asymptotic confidence intervals available!
• Optimal linear GLS estimators available!
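One way to turn the asymptotic normality above into a confidence interval is the plug-in interval $\hat\alpha_n \pm z_{1-p/2}\,\hat\alpha_n\sqrt{c_w/n_{r(n)}}$, replacing the unknown α in the standard deviation by $\hat\alpha_n$. A minimal sketch under that assumption; `asymptotic_ci` is a hypothetical helper, and $\Sigma_1$ must be supplied by the user because the slides do not give its entries (the identity below is only a placeholder).

```python
import numpy as np
from statistics import NormalDist

def asymptotic_ci(alpha_hat, w, Sigma1, n, r, level=0.95):
    """Plug-in interval from sqrt(n_r) (alpha_hat - alpha) -> N(0, alpha**2 c_w).

    c_w = w^T Sigma1 w and n_r = floor(n / 2**r). The unknown alpha in the
    asymptotic standard deviation is replaced by alpha_hat.
    """
    w = np.asarray(w, dtype=float)
    c_w = float(w @ np.asarray(Sigma1, dtype=float) @ w)
    n_r = n // 2 ** r
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for level = 0.95
    half = z * alpha_hat * np.sqrt(c_w / n_r)
    return alpha_hat - half, alpha_hat + half

# Illustration only: w = (-1, 1) on two scales and an identity placeholder for Sigma1.
print(asymptotic_ci(1.5, [-1.0, 1.0], np.eye(2), n=1024, r=2))
```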
Large scales: distributional consistency
The regularity conditions and m–dependence are restrictive.
• As in Davis & Resnick (1985), let
$X_k = \sum_{i=0}^{\infty} c_i\,\xi_{k-i}$, where $\sum_i |c_i|^\delta < \infty$, $0 < \delta < \min\{1, \alpha\}$.
◦ Here $(\xi_k)$ are iid and $P\{|\xi_1| > x\} \sim C x^{-\alpha}$, $x \to \infty$, with $P\{\xi_1 > x\}/P\{|\xi_1| > x\} \to p \in [0,1]$, as $x \to \infty$.
Lemma For $X_k(m) := \max_{1\le i\le m} X_{m(k-1)+i}$, k = 1, 2, ..., we get
$\{m^{-1/\alpha} X_k(m)\}_{k\in\mathbb{N}} \xrightarrow{fdd} \{Z_k\}_{k\in\mathbb{N}}$, as $m \to \infty$,
where $(Z_k)$ are iid α–Fréchet, provided $p \max_i c_i > 0$ or $(1-p)\max_i(-c_i) > 0$.
• This justifies the “asymptotic independence phenomenon” for the block–maxima $(D(j,k))_k$ as $j \to \infty$!
Thm [S. & Michailidis (2006)] Under the above conditions, with fixed ℓ,
$\hat\alpha_n \xrightarrow{d} \hat\alpha_{Z,\ell}$, as $n \to \infty$,
where $\hat\alpha_n = \hat\alpha[\text{top–}\ell\text{ scales}]$ and $\hat\alpha_{Z,\ell}$ is based on iid α–Fréchet data $Z_1, \ldots, Z_{2^{\ell+1}}$.
Distributional consistency: implications
• No consistency but confidence intervals!
• Covers more processes!
• The approximation is often valid for “small” n.
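The limit $\hat\alpha_{Z,\ell}$ is the same estimator applied to iid α–Fréchet data, so its distribution can be approximated by simulation. A minimal sketch under our assumptions: OLS weights (the talk's weights may differ) and a sample size of $2^{\ell+1}$, read from the theorem above, whose largest ℓ + 1 scales are j = 1, ..., ℓ + 1. The function names are hypothetical.

```python
import numpy as np

def alpha_hat_top_scales(z, ell):
    """OLS max-spectrum estimate over scales j = 1, ..., ell + 1 of a sample of
    size 2**(ell + 1), i.e. the largest ell + 1 scales of that sample."""
    scales = np.arange(1, ell + 2)
    Y = [np.log2(z.reshape(-1, 2 ** j).max(axis=1)).mean() for j in scales]
    return 1.0 / np.polyfit(scales, Y, 1)[0]        # slope estimates 1/alpha

def simulate_alpha_Z(alpha, ell, reps=2000, seed=0):
    """Monte Carlo draws of alpha_hat_{Z, ell} from iid alpha-Frechet samples
    Z_1, ..., Z_{2**(ell + 1)}, with P{Z <= x} = exp(-x**(-alpha))."""
    rng = np.random.default_rng(seed)
    m = 2 ** (ell + 1)
    draws = np.empty(reps)
    for r in range(reps):
        # If E is standard exponential, P{E**(-1/alpha) <= x} = P{E >= x**(-alpha)} = exp(-x**(-alpha)).
        z = rng.standard_exponential(m) ** (-1.0 / alpha)
        draws[r] = alpha_hat_top_scales(z, ell)
    return draws

draws = simulate_alpha_Z(alpha=1.5, ell=5)
print(np.quantile(draws, [0.025, 0.5, 0.975]))
```

Since $\log_2 Z$ for α–Fréchet Z equals $(1/\alpha)\log_2 W$ for a 1–Fréchet W, the draws scale exactly with α (the same seed gives $\hat\alpha_{Z,\ell}$ proportional to α). Hence $\hat\alpha_{Z,\ell}/\alpha$ has a distribution free of α, which is one way to turn simulated quantiles into confidence intervals.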
AR(1) with Pareto (α = 1.5) innovations
[Figure: AR(1) with Pareto innovations, φ = 0.9, α = 1.5: the time series, and two Hill plots of $\hat\alpha$ against the order statistics k.]
The max–spectrum ...
[Figure: max–spectrum against the scales j; max self–similarity estimate $\hat\alpha$ = 1.4844.]
Data examples: the advantage of time scales
Google: traded volume
[Figure: transaction volumes (number of shares) for GOOG in November 2005 by day of the month; confidence intervals for α per day.]
Google: traded volume – the time series
[Figure: transaction volumes (number of shares) for GOOG, Nov 7, 2005; Hill plot against the order statistics k; max–spectrum against the scales j, $\hat\alpha$ = 1.0729.]
Intel: traded volume
[Figure: transaction volumes (number of shares) for INTC in November 2005 by day of the month; confidence intervals for α per day.]
Intel: strange time series
[Figure: transaction volumes (number of shares) for INTC, Nov 23, 2005; Hill plot against the order statistics k; max–spectrum against the scales j, $\hat\alpha[7,11]$ = 1.0578, $\hat\alpha[12,16]$ = 5.2128.]
Intel: typical time series
[Figure: transaction volumes (number of shares) for INTC, Nov 21, 2005; Hill plot against the order statistics k; max–spectrum against the scales j, $\hat\alpha$ = 1.5564.]
References:
Davis, R. A. and Resnick, S. I. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. The Annals of Probability 13 (1), 179–195.
Hall, P. (1982) On some simple estimates of an exponent of regular variation. J. Roy. Statist. Soc. (Ser. B) 44, 37–42.
Hill, B. M. (1975) A simple general approach to inference about the tail of a distribution. The Annals of Statistics 3, 1163–1174.
Resnick, S. and Stărică, C. (1995) Consistency of Hill’s estimator for dependent data. Journal of Applied Probability 32, 139–167.
Stoev, S. and Michailidis, G. (2006) On the estimation of the heavy–tail exponent in time series using the max–spectrum. Technical Report, University of Michigan.
Stoev, S., Michailidis, G., and Taqqu, M. S. (2006) Estimating heavy–tail exponents through max self–similarity. Technical Report, University of Michigan.
WRDS, Wharton School of Management, University of Pennsylvania. https://wrds.wharton.upenn.edu/.