Two applications of max–stable distributions:
Random sketches and heavy–tail exponent estimation

Stilian Stoev ([email protected])
Department of Statistics, University of Michigan

Boston University Probability Seminar
March 1, 2007

Based on joint works with Marios Hadjieleftheriou,
George Kollios, George Michailidis and Murad Taqqu
Outline:
• Max–stable distributions: background
• Max–stable sketches: norm, distance and dominance norms
• Max–spectrum: heavy tail exponent estimation
Max–stable distributions
Th (Fisher–Tippett–Gnedenko) Let X1, . . . , Xn, . . . be iid. If

   (1/an)(X1 ∨ · · · ∨ Xn) − bn →d Z,   (n → ∞)   (1)

then the non–degenerate Z has one of the three types
of distributions:
(i) Fréchet: Gα(x) := exp{−x^{−α}}, x > 0, with α > 0.
(ii) Negative Fréchet: Ψα(x) := exp{−(−x)^α}, x < 0,
with α > 0.
(iii) Gumbel: Λ(x) := exp{−e^{−x}}, x ∈ R.
• Analog of the CLT when ∨ replaces +.
• Roughly: the tails of Xi determine the type of limit.
Def A rv X is max–stable if, ∀a, b > 0, ∃c > 0, d:

   aX′ ∨ bX′′ =d cX + d,

where X, X′, X′′ are iid.
Th The limit Z in (1) is max–stable. Conversely, any
max–stable Z arises as a limit of iid maxima as in (1).
More on Fréchet distributions
Def Z is α−Fréchet (α > 0) if
   P{Z ≤ x} = exp{−cx^{−α}}, x > 0, (c > 0).
Properties
• scale family: For all a > 0,
   P{aZ ≤ x} = exp{−(a^α c)x^{−α}}, x > 0.
• Notation: For the scale c^{1/α}, write
   ‖Z‖α := c^{1/α},
so that
   ‖aZ‖α = a‖Z‖α, a > 0.
• heavy tails: As x → ∞,
   P{Z > x} = 1 − e^{−cx^{−α}} ∼ cx^{−α}.
• moments: EZ^p < ∞ iff p < α, and
   EZ^p = ∫₀^∞ x^p d e^{−cx^{−α}} = c^{p/α} Γ(1 − p/α).
Max–stability: For iid Z, Z1, . . . , Zn:
   Z1 ∨ · · · ∨ Zn =d n^{1/α} Z.
Indeed:
   P{Zi ≤ x, i = 1, . . . , n} = P{Z ≤ x}^n = e^{−ncx^{−α}}, x > 0.
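The max–stability identity above is easy to check numerically. A minimal sketch (not from the talk; numpy and the specific parameter values are assumptions) simulates standard α−Fréchet variables via the inverse CDF and compares the median of a rescaled maximum to the theoretical median ln(2)^{−1/α}:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.5

def frechet(size, alpha, rng):
    # Standard alpha-Frechet via inverse CDF: if U ~ Uniform(0,1),
    # then (-log U)^(-1/alpha) has CDF exp{-x^(-alpha)}.
    return (-np.log(rng.uniform(size=size))) ** (-1.0 / alpha)

n, reps = 16, 200_000
Z = frechet((reps, n), alpha, rng)

# Max-stability: (Z1 v ... v Zn) / n^{1/alpha} is again standard alpha-Frechet.
scaled_max = Z.max(axis=1) / n ** (1.0 / alpha)
single = frechet(reps, alpha, rng)

# Both medians should be close to the theoretical median ln(2)^{-1/alpha}.
med_theory = np.log(2) ** (-1.0 / alpha)
print(np.median(scaled_max), np.median(single), med_theory)
```

The same inverse-CDF recipe is reused below whenever Fréchet variables are needed.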
Application I: Random Sketches
First, mathematical concepts, then explanation & use.
Max–stable sketches
’Signal’: f (i) ≥ 0, i = 1, . . . , n, with large n.
Def For α > 0 and k ∈ N, let
   Ej(f) := ⋁_{i=1}^n f(i) Zj(i),   1 ≤ j ≤ k,
for iid standard α−Fréchet Zj(i)’s:
   P{Zj(i) ≤ x} = exp{−x^{−α}}, x > 0, i.e. ‖Zj(i)‖α = 1.
• Max–stable sketch of f : Ej (f ), 1 ≤ j ≤ k.
• Idea: use {Ej(f)} with k << n as a proxy for f !
• More on this later...
Properties
◦ Ej(f) are α−Fréchet with
   ‖Ej(f)‖α = (Σ_{i=1}^n f(i)^α)^{1/α} =: ‖f‖ℓα.
Indeed, for x > 0:
   P{Ej(f) ≤ x} = ∏_i P{Z ≤ x/f(i)} = exp{−(Σ_i f(i)^α) x^{−α}}.
◦ max–linearity: For a, b > 0 and another ’signal’ g(i),
   Ej(af ∨ bg) = aEj(f) ∨ bEj(g),   1 ≤ j ≤ k,
where (af ∨ bg)(i) = af(i) ∨ bg(i).
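In code, building a max–stable sketch is one vectorized maximum. The sketch below (illustrative, not from the talk; numpy is assumed, and k is taken unrealistically large here only so the scale can be checked numerically) verifies that ‖Ej(f)‖α = ‖f‖ℓα by reading the scale off the Fréchet median:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n, k = 1.0, 200, 20_000   # k large only to verify the scale numerically

f = rng.uniform(0.0, 2.0, size=n)                          # nonnegative 'signal'
Z = (-np.log(rng.uniform(size=(k, n)))) ** (-1.0 / alpha)  # standard alpha-Frechet Z_j(i)
E = (f[None, :] * Z).max(axis=1)                           # E_j(f) = max_i f(i) Z_j(i)

# ||E_j(f)||_alpha should equal ||f||_{l_alpha} = (sum_i f(i)^alpha)^{1/alpha};
# the median of an alpha-Frechet with scale s is s * ln(2)^{-1/alpha}.
f_norm = (f ** alpha).sum() ** (1.0 / alpha)
scale_est = np.median(E) * np.log(2) ** (1.0 / alpha)
print(scale_est, f_norm)
```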
Max–stable sketches: estimation
Observations
• The Ej (f )’s are iid in j = 1, . . . , k
• The scale kEj (f )kα equals the norm kf kℓα .
Estimators:
• (moment) For 0 < p < α, define
   Lp(f) := ((c_{p,α}/k) Σ_{j=1}^k Ej(f)^p)^{1/p},
with c_{p,α} := Γ(1 − p/α)^{−1}.
Motivation:
   (1/k) Σ_j Ej(f)^p ≈ E Ej(f)^p = Γ(1 − p/α) ‖Ej(f)‖α^p.
• (median) Define
   M(f) := ln(2)^{1/α} med{Ej(f), 1 ≤ j ≤ k}.
Motivation:
   e^{−x^{−α}} = 1/2 ⇔ x = ln(2)^{−1/α}.
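Both estimators can be sketched in a few lines (again illustrative, not from the talk; numpy assumed, with hypothetical parameter choices and a large k used only for verification):

```python
import numpy as np
from math import gamma, log

rng = np.random.default_rng(2)
alpha, p = 1.5, 0.5            # need 0 < p < alpha for the moment estimator
n, k = 300, 20_000

f = rng.uniform(0.0, 1.0, size=n)
Z = (-np.log(rng.uniform(size=(k, n)))) ** (-1.0 / alpha)
E = (f[None, :] * Z).max(axis=1)              # max-stable sketch of f

# moment estimator: L_p(f) = (c_{p,alpha}/k * sum_j E_j(f)^p)^{1/p}
c_pa = 1.0 / gamma(1.0 - p / alpha)
L = (c_pa * np.mean(E ** p)) ** (1.0 / p)

# median estimator: M(f) = ln(2)^{1/alpha} * med{E_j(f)}
M = log(2) ** (1.0 / alpha) * np.median(E)

f_norm = (f ** alpha).sum() ** (1.0 / alpha)
print(L, M, f_norm)
```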
Norms and distances
Given the max–stable sketch {Ej (f )} of a ’signal’ f .
• For ’large’ k’s, we can estimate the norm:
   ‖f‖ℓα ≈ Lp(f)   or   ‖f‖ℓα ≈ M(f).
• Performance guarantees later...
• Given another ’signal’ g, define the distance:
   ρα(f, g) := Σ_{i=1}^n |f(i)^α − g(i)^α|.
Note that |a − b| = 2(a ∨ b) − a − b, for a, b ≥ 0, so:
   ρα(f, g) = 2 Σ_i f(i)^α ∨ g(i)^α − Σ_i f(i)^α − Σ_i g(i)^α,
and hence:
   ρα(f, g) = 2‖f ∨ g‖ℓα^α − ‖f‖ℓα^α − ‖g‖ℓα^α.
• Thus, given the sketches Ej(f) and Ej(g), we can
estimate ρα(f, g) by
   ρα(f, g) ≈ 2Lp(f ∨ g)^α − Lp(f)^α − Lp(g)^α
or by
   ρα(f, g) ≈ 2M(f ∨ g)^α − M(f)^α − M(g)^α.
• The key is: Ej(f ∨ g) = Ej(f) ∨ Ej(g), so Ej(f ∨ g) is
available from Ej(f) and Ej(g).
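The distance recipe can be sketched as follows (illustrative, not from the talk; numpy assumed). The crucial point from the last bullet is that the two sketches must be built from the same Zj(i)’s, so that the sketch of f ∨ g is just the coordinate-wise maximum:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, n, k = 1.0, 400, 40_000

f = rng.uniform(0.0, 1.0, n)
g = rng.uniform(0.0, 1.0, n)
# Both sketches share the SAME Z_j(i)'s (shared seeds), so that
# E_j(f v g) = E_j(f) v E_j(g) holds coordinate-wise.
Z = (-np.log(rng.uniform(size=(k, n)))) ** (-1.0 / alpha)
Ef = (f * Z).max(axis=1)
Eg = (g * Z).max(axis=1)
Efg = np.maximum(Ef, Eg)                      # sketch of f v g, without touching f, g

M = lambda E: np.log(2) ** (1.0 / alpha) * np.median(E)
dist_est = 2 * M(Efg) ** alpha - M(Ef) ** alpha - M(Eg) ** alpha
dist_true = np.abs(f ** alpha - g ** alpha).sum()
print(dist_est, dist_true)
```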
Dominance norms
Given are several ’signals’ ft (i), 1 ≤ i ≤ n, indexed by
t = 1, . . . , T .
Def The dominant signal is:
   f*(i) := max_{1≤t≤T} ft(i),   1 ≤ i ≤ n.
The dominance norm of the ft’s is ‖f*‖ℓα.
Problem: Estimate ‖f*‖ℓα from the sketches {Ej(ft)},
t = 1, . . . , T .
Without accessing the signals ft directly!
Solution:
• Max–linearity implies
   Ej(f*) = Ej(f1) ∨ · · · ∨ Ej(fT),   1 ≤ j ≤ k.
• The sketch Ej (f ∗ ) is readily available!
Proceed as above and estimate ‖f*‖ℓα by:
   ‖f*‖ℓα ≈ Lp(f*)   or by   ‖f*‖ℓα ≈ M(f*).
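A dominance-norm sketch in code (illustrative, not from the talk; numpy assumed, small hypothetical T, n, k): per-signal sketches built from common seeds are combined by a single max, and the median estimator is applied to the result.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, n, k, T = 1.0, 300, 20_000, 5

F = rng.uniform(0.0, 1.0, size=(T, n))                       # T 'signals' f_t
Z = (-np.log(rng.uniform(size=(k, n)))) ** (-1.0 / alpha)

# Per-signal sketches, all from the SAME Z_j(i)'s; combine by max:
E = np.stack([(F[t] * Z).max(axis=1) for t in range(T)])     # shape (T, k)
E_star = E.max(axis=0)                                       # E_j(f*) = E_j(f_1) v ... v E_j(f_T)

M_star = np.log(2) ** (1.0 / alpha) * np.median(E_star)      # median estimator
dom_norm = (F.max(axis=0) ** alpha).sum() ** (1.0 / alpha)   # true ||f*||_{l_alpha}
print(M_star, dom_norm)
```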
Motivation: Data streams
Signals which cannot be stored and processed with conventional methods are emerging in:
• Telephone call monitoring (unmanageably large data
bases).
• Internet traffic monitoring (vast, rapidly growing
amounts of data).
• Sensing applications (restrictions on power, bandwidth, etc.).
• Online transactions (Banking, Stock market data,
E-commerce, etc.).
The streaming model – a general framework. Let
   f(i),   1 ≤ i ≤ n,
be a signal with large domain size n.
Starting with f(i) = 0, update f sequentially in time:
• cash register: Observe (i, a(i)), and update f :
   f(i) := f(i) + a(i).
• aggregated: Observe (i, f(i)) directly.
Examples: Call detail records & IP monitoring
Phone calls: Each time a phone call ends, a CDR
packet is sent to a main station. Think of:
CDR = (i, a(i), · · ·), with i = (phone no.), a(i) = time used.
• Given a CDR (i, a(i), · · ·), update
   f(i) := f(i) + a(i).
So the ’signal’ f (i) contains info on all phone numbers!
• Multiple signals: ft(i), t = 1, . . . , 7 capture the calling
patterns for different days of the week.
IP monitoring: Monitor a fast Internet link during a
given day (or hour). How many distinct IPs used it?
How many bytes were transmitted from a given range
of IPs? Can you give estimates in real time?
• Signal: f (i) = Bytes transmitted by IP i.
• Stream items: A packet from IP i carries the info
   (i, b(i)),   i = (IP number),   b(i) = (Bytes).
On observing (i, b(i)), update:
   f(i) := f(i) + b(i).
• Impossible to store and process all data in real time.
“Classical” (additive) random sketches
Instead of storing the signal f (i), 1 ≤ i ≤ n, keep:
   Sj(f) = Σ_{i=1}^n f(i) Xj(i),   1 ≤ j ≤ k,
where k << n and where Xj(i) are iid random variables.
The Sj(f)’s, 1 ≤ j ≤ k, are called the random sketch of f.
• Important: Can generate the Xj(i)’s ’on the fly’ from a
small O(k log2(n)) set of ’seeds’ in main memory.
• Can update the sketch sequentially in the most general, cash register model.
• Can approximate norms and inner products!
If the Xj(i)’s are zero–mean with Var(Xj(i)) = σ², then
   E Sj(f)² = Var(Sj(f)) = ‖f‖ℓ2² σ²,   where ‖f‖ℓ2² = Σ_i f(i)².
• So, for σ² = 1:
   ‖f‖ℓ2² ≈ (1/k) Σ_j Sj(f)².
• For two signals f and g, by linearity,
   ‖f − g‖ℓ2² ≈ (1/k) Σ_{j=1}^k |Sj(f) − Sj(g)|².
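An additive sketch along these lines can be simulated with zero-mean, unit-variance ±1 entries (a minimal illustration, not from the talk; in a real streaming system the Xj(i)’s would be generated pseudo-randomly from seeds rather than stored):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 1000, 8000

f = rng.uniform(-1.0, 1.0, n)              # additive sketches allow signed signals
g = rng.uniform(-1.0, 1.0, n)
X = rng.choice([-1.0, 1.0], size=(k, n))   # zero-mean, unit-variance entries

Sf = X @ f                                 # S_j(f) = sum_i f(i) X_j(i)
Sg = X @ g

l2_sq_est = np.mean(Sf ** 2)               # ~ ||f||_{l2}^2
dist_sq_est = np.mean((Sf - Sg) ** 2)      # ~ ||f - g||_{l2}^2, by linearity
print(l2_sq_est, (f ** 2).sum(), dist_sq_est, ((f - g) ** 2).sum())
```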
Back to max–stable sketches
Strengths
1. Max–stable sketches are max–linear. Natural uses:
sensor data, Internet traffic monitoring, transaction
record tracking, max–sequential updates.
2. Work for any α > 0 and not only for 0 < α ≤ 2
(unlike the sum–stable sketches).
3. Tailor made for: dominance norm estimation.
Illustration: Signals: ft(i) = Bytes of IP number i during
the t−th hour of monitoring. Large domain: 1 ≤ i ≤ n,
where n = 2^32.
Problem: Estimate the norm of the ’heaviest load scenario’ during the day, i.e. for
   f*(i) = max_{1≤t≤24} ft(i),
estimate the dominance norm ‖f*‖ℓα.
Problem: Compare several network traffic scenarios
f , g, h,... without accessing the enormous data directly. Compute pair–wise distances, norms and dominance norms.
• Max–stable sketches provide efficient solutions.
Limitation: Max–stable sketches are not linear – they cannot be updated in the additive cash register model!
Max–stable sketches: Performance guarantees
Norms
Given a max–stable sketch Ej (f ), j = 1, . . . , k of a signal
f (i) ≥ 0.
Recall that M (f ) is the median–based estimator of kf kℓα .
Th 1 (SHKT’07) Let ǫ, δ ∈ (0, 1). Then
   P{|M(f)/‖f‖ℓα − 1| ≤ ǫ} ≥ 1 − δ,
if k ≥ C log(1/δ)/ǫ², for some C > 0.
For the optimal value of the constant, as ǫ, δ → 0, we have
   k = k(ǫ, δ) ∼ 2 ln(1/δ) / (α² ln²(2) ǫ²).
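The asymptotic k(ǫ, δ) rate can be checked by simulation (a sketch under stated assumptions, not from the paper; numpy assumed). For ‖f‖ℓα = 1 the sketch coordinates are iid standard α−Fréchet, so the coverage of the median estimator at the suggested k can be estimated directly:

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, eps, delta = 1.0, 0.1, 0.05

# k from the asymptotic formula k ~ 2 ln(1/delta) / (alpha^2 ln^2(2) eps^2)
k = int(np.ceil(2 * np.log(1 / delta) / (alpha ** 2 * np.log(2) ** 2 * eps ** 2)))

# For ||f||_{l_alpha} = 1 the sketch coordinates are iid standard alpha-Frechet,
# so M(f) = ln(2)^{1/alpha} * median of k Frechet draws.
reps = 2000
E = (-np.log(rng.uniform(size=(reps, k)))) ** (-1.0 / alpha)
M = np.log(2) ** (1.0 / alpha) * np.median(E, axis=1)
coverage = np.mean(np.abs(M - 1.0) <= eps)   # fraction of runs within eps
print(k, coverage)
```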
Distances
Let now g be another signal and let
   D(f, g) := 2M(f ∨ g)^α − M(f)^α − M(g)^α ≈ ρα(f, g) = ‖f^α − g^α‖ℓ1.
Th 4 (SHKT’07) Let ǫ, δ, η ∈ (0, 1). If ρα(f, g) ≥ η‖f ∨ g‖ℓα^α,
then
   P{|D(f, g)/ρα(f, g) − 1| ≤ ǫ} ≥ 1 − δ,
for k ≥ C log(1/δ)/ǫ², with some C > 0.
Sketches: summary & credits
Some seminal papers & algorithmic applications
Alon, Matias & Szegedy (1996)
Feigenbaum, Kannan, Strauss & Viswanathan (1999)
Fong & Strauss (2000)
Indyk (2000)
Gilbert, Kotidis, Muthukrishnan & Strauss (2001, 2002)
Cormode & Muthukrishnan (2003)
Datar, Immorlica, Indyk & Mirrokni (2004)
Review paper: Muthukrishnan (2003)
More on max–stable sketches: Stoev, Hadjieleftheriou, Kollios & Taqqu (2007)
Code: By Marios Hadjieleftheriou (AT&T Research)
http://sourceforge.net/projects/sketches
A lesson:
(For probabilists and computer scientists)
Study each other’s fields!
Application II: Heavy tail exponents
Heavy tailed data
• A random variable X is said to be heavy–tailed if
   P{|X| ≥ x} ∼ L(x) x^{−α},   as x → ∞,
for some α > 0 and a slowly varying function L.
◦ Here we focus on the simpler but important context:
   X ≥ 0 a.s.   and   P{X > x} ∼ Cx^{−α},   as x → ∞.
◦ (infinite moments) For p > 0,
   EX^p < ∞   if and only if   p < α.
In particular,
   0 < α ≤ 2 ⇒ Var(X) = ∞   and   0 < α ≤ 1 ⇒ E|X| = ∞.
• The estimation of the heavy–tail exponent α is an
important problem with a rich history.
• Why do we need heavy–tail models?
Every finite sample X1, . . . , Xn has a finite sample mean,
variance, and indeed all sample moments!
Why, then, consider heavy tailed models in practice?
Why use heavy–tailed models?
“All models are wrong, but some are useful.”
George Box
Let F and G be any two distributions with positive densities on (0, ∞).
Let ǫ > 0 and x1, . . . , xn ∈ (0, ∞) be arbitrary. Then both
   PF{Xi ∈ (xi − ǫ, xi + ǫ), i = 1, . . . , n}
and
   PG{Xi ∈ (xi − ǫ, xi + ǫ), i = 1, . . . , n}
are positive!
• For a given sample, very many models apply.
• The ones that continue to work as the sample grows
are most suitable.
We next present real financial, insurance and Internet
data sets. They can be very heavy tailed.
Traded volumes on the Intel stock
[Figure: Traded volumes (no. of stocks), INTC, Nov 1, 2005.]
Insurance claims due to fire loss
[Figure: Danish fire loss data, 1980–1990; Hill plot with αH(k) = 1.394; max–spectrum with H = 0.60422 (0.020897), i.e. α = 1.655.]
TCP flow sizes (in number of packets)
[Figure: TCP flow sizes (packets) on a UNC link, 2001 (~36 min); lower panel: the first minute.]
History
• Hill (1975) worked out the MLE in the Pareto model
P{X > x} = x^{−α}, x ≥ 1, and introduced the Hill plot:
   α̂H(k) := ((1/k) Σ_{i=1}^k log(Xi,n) − log(Xk+1,n))^{−1},
where X1,n ≥ X2,n ≥ · · · ≥ Xk,n are the top–k order
statistics of the sample.
• How to choose k?
◦ pick k where the plot of α̂H(k) vs. k stabilizes.
◦ serious problems in practice:
   volatile & hard to interpret: the “Hill horror plot”
   confidence intervals
   robustness
• Consistency and asymptotic normality resolved: Weissman (1978), Hall (1982), in a semi–parametric setting.
• Many other estimators: kernel–based (Csörgő, Deheuvels and Mason, 1985), moment (Dekkers, Einmahl
and de Haan, 1989), among many others.
• Most estimators exploit the top order statistics.
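The Hill estimator is a one-liner on sorted data; a minimal sketch (not from the talk; numpy and a pure-Pareto test sample are assumptions) shows it recovering α on data where the Pareto model holds exactly:

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, n = 1.5, 100_000

# Pareto sample: P{X > x} = x^{-alpha}, x >= 1, via inverse CDF U^{-1/alpha}
X = rng.uniform(size=n) ** (-1.0 / alpha)
Xs = np.sort(X)[::-1]                 # X_{1,n} >= X_{2,n} >= ... (descending)

def hill(Xs, k):
    # Hill estimator from the top-k order statistics
    return 1.0 / (np.mean(np.log(Xs[:k])) - np.log(Xs[k]))

print(hill(Xs, 500), hill(Xs, 5000))
```

On real data the estimate typically drifts with k, which is exactly the "horror plot" phenomenon on the next slide.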
Hill horror plots: TCP flow sizes
[Figure: Hill plots of the TCP flow–size data; the estimate depends strongly on k, e.g. αH(300) = 1.4114 vs. αH(12000) = 0.9296.]
Fréchet max–stable laws
Consider i.i.d. Xi’s with P{X1 > x} ∼ Cx^{−α}, x → ∞. By
the Fisher–Tippett–Gnedenko theorem:
   (1/n^{1/α}) max_{1≤i≤n} Xi ≡ (1/n^{1/α}) ⋁_{i=1}^n Xi →d Z,   as n → ∞,
where Z has the α−Fréchet extreme value distribution:
   P{Z ≤ x} = exp{−Cx^{−α}}, x > 0.
• The extreme value distributions are max–stable. In
particular, for i.i.d. α−Fréchet Z & Zi’s:
   Z1 ∨ · · · ∨ Zn =d n^{1/α} Z.
• A time series of i.i.d. α−Fréchet {Zk} is 1/α−max-self-similar:
   ∀m ∈ N : {⋁_{1≤i≤m} Z_{m(k−1)+i}}_{k∈N} =d m^{1/α} {Zk}_{k∈N}.
◦ Block–maxima of size m have the same distribution
as the original data, rescaled by the factor m1/α .
• Any heavy–tailed {Xk } (i.i.d.) data set is asymptotically max self–similar.
Max–spectrum
Given a positive sample X1, . . . , Xn, consider the dyadic
block–maxima:
   D(j, k) := max_{1≤i≤2^j} X_{2^j(k−1)+i} ≡ ⋁_{i=1}^{2^j} X_{2^j(k−1)+i},
with k = 1, . . . , nj = [n/2^j].
• In view of the asymptotic scaling:
   (1/2^{j/α}) D(j, k) →d Z,   (j → ∞),
observe that
   Yj := (1/nj) Σ_{k=1}^{nj} log2 D(j, k) ≃ j/α + c,   (j, nj → ∞)
where c := E log2 Z.
◦ The last asymptotics “follow” from the LLN, since the
D(j, k)’s are independent in k.
• The max–spectrum of the data is defined as the statistic:
   Yj,   j = 1, . . . , [log2(n)].
◦ One can identify α from the slope of the Yj’s vs. j, at
large j’s.
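The max–spectrum takes only a reshape and a log; the sketch below (illustrative, not the paper's implementation; numpy assumed, with a plain weighted least-squares fit standing in for the GLS procedure described later) estimates α on a simulated Pareto sample:

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, n = 1.5, 2 ** 17

X = rng.uniform(size=n) ** (-1.0 / alpha)   # Pareto(alpha): a heavy-tailed sample
J = int(np.log2(n))

Y = []
for j in range(1, J + 1):
    nj = n // 2 ** j
    D = X[: nj * 2 ** j].reshape(nj, 2 ** j).max(axis=1)  # dyadic block maxima D(j, k)
    Y.append(np.log2(D).mean())                           # Y_j
Y = np.array(Y)

# Slope of Y_j vs. j over a range of large scales estimates 1/alpha.
# Weighted least squares, since Var(Y_j) grows like 2^j / n.
j1, j2 = 6, 12
js = np.arange(j1, j2 + 1)
w = np.sqrt(n / 2.0 ** js)                                # ~ 1 / std(Y_j)
slope = np.polyfit(js, Y[j1 - 1 : j2], 1, w=w)[0]
print(1.0 / slope)
```

The hypothetical cutoffs j1 = 6, j2 = 12 are hand-picked here; their automatic selection is the subject of a later slide.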
Max–spectrum estimators of α
Given the max–spectrum Yj, j = 1, . . . , [log2(n)], define
   α̂(j1, j2) := (Σ_{j=j1}^{j2} wj Yj)^{−1},
where Σ_j j wj = 1 and Σ_j wj = 0.
• That is, use linear regression to estimate the slope
1/α.
• Issues:
◦ weighted or generalized least squares must be used,
since Var(Yj) ∝ 1/nj ∝ 2^j;
◦ the choice of the scales j1 and j2 is critical.
• These issues and confidence intervals, are addressed
in Stoev, Michailidis and Taqqu (2006).
Examples of max–spectra
[Figure: Max–spectra of four data sets: Intel volume (α̂ = 1.3745); simulated Fréchet(α = 1.5) (α̂ = 1.506); 1.3–stable (β = 1) (α̂ = 1.2692); t–distribution, df = 3 (α̂ = 2.7538).]
Robustness of the max–spectrum
[Figure: Max–spectrum of the Internet flow–size data; α̂(5,10) = 0.9486 and α̂(10,12) = 1.4124.]
Compare with the Hill horror plot of the Internet flow–
size data (next slide).
Hill horror plots: TCP flow sizes
[Figure: Hill plots of the TCP flow–size data, as before; αH(300) = 1.4114 vs. αH(12000) = 0.9296.]
Compare with the max–spectrum plot of the Internet
flow–size data (previous slide).
Asymptotic normality of the max–spectrum
Let P{Xk ≤ x} =: F(x), where
   1 − F(x) = Cx^{−α}(1 + O(x^{−β})),   (x → ∞),   (2)
with some β > 0.
• Consider the max–spectrum {Yj } for the range of
scales:
r(n) + j1 ≤ j ≤ r(n) + j2 ,
with fixed j1 < j2 and r(n) → ∞, n → ∞.
Theorem Under (2) and another technical condition,
   sup_{x∈R} |P{√(n_{j2+r}) ((θ⃗, Y⃗r) − (θ⃗, μ⃗r)) ≤ x} − Φ(x/σ_θ⃗)|
      ≤ C_θ⃗ (1/2^{rβ/α} + r 2^{r/2}/√n),   (3)
where Φ is the standard Normal c.d.f.
Here Y⃗r = (Y_{r+j})_{j=j1}^{j2} and
   (θ⃗, Y⃗r) = Σ_{j=j1}^{j2} θj Y_{r+j},   μr(j) = (r + j)/α + c,
and
   σ_θ⃗² = (θ⃗, Σα θ⃗) := Σ_{i,j=j1}^{j2} θi Σα(i, j) θj > 0.
The covariance structure of the max–spectrum
• The asymptotic covariance matrix Σ is the same as if
the data were i.i.d. α−Fréchet!
◦ The intuition is that the block–maxima
   {D(r + j, k), k = 1, . . . , n_{r+j}}
behave like i.i.d. α−Fréchet variables, as r = r(n) → ∞.
• The covariance entries are given by:
   Σα(i, j) = α^{−2} 2^{i∨j−j2} ψ(|i − j|),
with
   ψ(a) := Cov(log2(Z1), log2(Z1 ∨ (2^a − 1)Z2)),   a ≥ 0,
for i.i.d. 1−Fréchet Z1 & Z2.
◦ Note that α appears only as a factor in Σα:
   Σα = α^{−2} Σ1.
It does not affect the correlation structure of the max–
spectrum.
• GLS estimators for α use the matrix Σα .
• Asymptotic normality for α̂(r + j1, r + j2) follows from
the last theorem.
• Stoev, Michailidis and Taqqu (2006) gives details on
confidence intervals for α and on the automatic selection
of j1 & j2.
Automatic selection of scales
’Null hypothesis’: Y⃗ = {Yj}_{j=1}^J is multivariate Normal
with covariance matrix α^{−2}Σ1 and means
   EYj = j/α + const,   1 ≤ j ≤ J.
Given a range of scales j1 ≤ j ≤ j2, set
   Ĥ(j1, j2) ≡ 1/α̂(j1, j2) := w⃗′_{j1,j2} Y⃗,
for the GLS estimator of the slope H = 1/α of Yj vs. j.
Algorithm:
1. Set j2 ≈ J = [log2(n)] and j1 := j2 − 1.
2. Compute Ĥ+1 := Ĥ(j1 − 1, j2) and Ĥ0 := Ĥ(j1, j2).
3. Compute a conf. int. for Ĥ+1 − Ĥ0 (based on the null
hypothesis).
4. If the conf. int. does not contain 0, then stop. Else,
set j1 := j1 − 1 and repeat from Step 2.
Important details:
• Plug–in for α used in the covariance matrix
• Level of conf int to be set by the user
• Multiple testing issue not addressed
The method works very well in practice!
Automatic selection of scales: an illustration
Simulated: a mixture of 1−Fréchet (10%) and Exponential with mean 5 (90%).
Sample size: n = 2^17 = 131,072.
[Figure: histograms of the automatic choices of j1 (p = 0.01) and of the resulting α–estimates; square–root MSE vs. j1; α–estimates for the MSE–best j1 = 10.]
Rates for moment functionals of maxima
Let X1, . . . , Xn be i.i.d. from F and recall that
   Mn := (1/n^{1/α}) ⋁_{1≤i≤n} Xi →d Z,   (n → ∞).
• Are there results on the rate of convergence
   Ef(Mn) −→ Ef(Z),   (n → ∞),
for “reasonable” f ’s?
◦ Pickands (1975) shows only the convergence of moments (no rates).
• Our approach: consider
   F(x) = exp{−σ^α(x) x^{−α}},   x ∈ R,   (4)
with
   σ^α(x) −→ C,   (x → ∞).
◦ Note: 1 − F(x) ∼ Cx^{−α}, x → ∞, is equivalent to (4).
• Extra assumption:
   |σ^α(x) − C| ≤ Dx^{−β},   for large x.
Rates (cont’d)
But (4) is very convenient for handling rates!
• Note that
   P{Mn ≤ x} = P{X1 ≤ n^{1/α}x}^n = F(n^{1/α}x)^n
             = exp{−σ^α(n^{1/α}x) x^{−α}} =: Fn(x).   (5)
◦ Thus,
   Ef(Mn) = ∫₀^∞ f(x) dFn(x) = ∫₀^∞ f(x) d exp{−σ^α(n^{1/α}x) x^{−α}},
and also
   Ef(Z) = ∫₀^∞ f(x) dG(x) = ∫₀^∞ f(x) d exp{−Cx^{−α}}.
• Now, integration by parts yields:
   E(f(Mn) − f(Z)) = ∫₀^∞ (G(x) − Fn(x)) f′(x) dx.
0
◦ However G(x) and Fn (x) are of the same “exponential”
form!
35
Rates (cont’d)
By the mean value theorem:
   |Fn(x) − G(x)| = |exp{−σ^α(n^{1/α}x)x^{−α}} − exp{−Cx^{−α}}|
      ≤ |σ^α(n^{1/α}x) − C| x^{−α} exp{−cx^{−α}}
      ≤ D n^{−β/α} x^{−(α+β)} exp{−cx^{−α}},   for large x.
• Thus, for any ǫ > 0, as n → ∞,
   ∫_ǫ^∞ |Fn(x) − G(x)| |f′(x)| dx
      ≤ D n^{−β/α} ∫₀^∞ |f′(x)| x^{−(α+β)} e^{−cx^{−α}} dx = O(n^{−β/α}).
• By taking ǫ = ǫn → 0, and using a mild technical
condition, we can also bound the integral near zero:
   ∫₀^{ǫn} |Fn(x) − G(x)| |f′(x)| dx.
Th If σ^α(x) − C ∼ Dx^{−β}, as x → ∞, then as n → ∞,
   n^{β/α} (Ef(Mn) − Ef(Z)) −→ D ∫₀^∞ x^{−(α+β)} f′(x) e^{−Cx^{−α}} dx,
provided mild technical conditions on σ(x) at 0 and on
f at 0 and ∞ hold.
Rates (cont’d)
• We have thus obtained exact rates for moment functionals
   Ef((1/n^{1/α}) max_{1≤i≤n} Xi).
◦ They are valid for a large class of absolutely continuous
f, including, for example:
   f(x) = log2(x)   and   f(x) = x^p,   p ∈ (0, α).
◦ More details can be found in Stoev, Michailidis and
Taqqu (2006).
• As a corollary, we also get rates of convergence for
Cov(log2 D(r+j1, k), log2 D(r+j2, k)),
as r = r(n) → ∞.
• These tools and the Berry–Esseen theorem yield the
uniform asymptotic normality of the max–spectrum.
Thank you!
References
Alon, N., Matias, Y. & Szegedy, M. (1996), The space complexity of
approximating the frequency moments, in ‘Proceedings of the 28th
ACM Symposium on Theory of Computing’, pp. 20–29.
Cormode, G. & Muthukrishnan, S. (2003), Estimating dominance norms of
multiple data streams, in ‘Algorithms – ESA 2003’, Vol. 2832 of Lecture
Notes in Computer Science, Springer, pp. 148–160.
Csörgő, S., Deheuvels, P. & Mason, D. (1985), ‘Kernel estimates of the
tail index of a distribution’, Annals of Statistics 13(3), 1050–1077.
Datar, M., Immorlica, N., Indyk, P. & Mirrokni, V. S. (2004), Locality–
sensitive hashing scheme based on p–stable distributions, in ‘SCG
’04: Proceedings of the Twentieth Annual Symposium on Computational Geometry’, ACM Press, New York, NY, USA, pp. 253–262.
Dekkers, A., Einmahl, J. & de Haan, L. (1989), ‘A moment estimator for the index of an extreme–value distribution’, Ann. Statist.
17(4), 1833–1855.
Feigenbaum, J., Kannan, S., Strauss, M. & Viswanathan, M. (1999), An
approximate L1–difference algorithm for massive data streams, in
‘FOCS ’99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science’, IEEE Computer Society, Washington, DC, USA, p. 501.
Fong, J. H. & Strauss, M. (2000), An approximate lp-difference algorithm
for massive data streams, in ‘STACS ’00: Proceedings of the 17th
Annual Symposium on Theoretical Aspects of Computer Science’,
Springer-Verlag, London, UK, pp. 193–204.
Gilbert, A., Kotidis, Y., Muthukrishnan, S. & Strauss, M. (2001), Surfing
wavelets on streams: one-pass summaries for approximate aggregate queries, in ‘Proceedings of VLDB, Rome, Italy’.
Gilbert, A., Kotidis, Y., Muthukrishnan, S. & Strauss, M. (2002), ‘How
to summarize the universe: Dynamic maintenance of quantiles’.
Hall, P. (1982), ‘On some simple estimates of an exponent of regular
variation’, J. Roy. Statist. Soc. Ser. B 44, 37–42.
Hill, B. M. (1975), ‘A simple general approach to inference about the tail
of a distribution’, The Annals of Statistics 3, 1163–1174.
Indyk, P. (2000), Stable distributions, pseudorandom generators, embeddings and data stream computation, in ‘IEEE Symposium on Foundations of Computer Science’, pp. 189–197.
Muthukrishnan, S. (2003), Data streams: algorithms and applications,
Preprint. http://www.cs.rutgers.edu/∼muthu/.
Pickands, J. (1975), ‘Statistical inference using extreme order statistics’,
Ann. Statist. 3, 119–131.
Stoev, S., Hadjieleftheriou, M., Kollios, G. & Taqqu, M. (2007), Norm,
point and distance estimation over multiple signals using max–stable
distributions, in ‘23rd International Conference on Data Engineering
(ICDE)’, IEEE. To appear.
Stoev, S., Michailidis, G. & Taqqu, M. (2006), Estimating heavy–tail
exponents through max self–similarity, Preprint.
Weissman, I. (1978), ‘Estimation of parameters and large quantiles based
on the k largest observations’, Journal of the American Statistical
Association 73, 812–815.