Two applications of max–stable distributions: Random sketches and heavy–tail exponent estimation

Stilian Stoev ([email protected])
Department of Statistics, University of Michigan

Boston University Probability Seminar, March 1, 2007

Based on joint works with Marios Hadjieleftheriou, George Kollios, George Michailidis and Murad Taqqu

Outline:
• Max–stable distributions: background
• Max–stable sketches: norm, distance and dominance-norm estimation
• Max–spectrum: heavy-tail exponent estimation

Max–stable distributions

Th (Fisher–Tippett–Gnedenko) Let X1, ..., Xn, ... be iid. If
  (1/a_n)(X1 ∨ ··· ∨ Xn − b_n) →d Z, (n → ∞),   (1)
then the non-degenerate Z has one of three types of distributions:
(i) Fréchet: G_α(x) := exp{−x^{−α}}, x > 0, with α > 0.
(ii) Negative Fréchet: Ψ_α(x) := exp{−(−x)^α}, x < 0, with α > 0.
(iii) Gumbel: Λ(x) := exp{−e^{−x}}, x ∈ R.

• Analog of the CLT when ∨ replaces +.
• Roughly: the tails of the Xi determine the type of the limit.

Def A rv X is max-stable if, ∀a, b > 0, ∃c > 0, d:
  aX′ ∨ bX″ =d cX + d, where X, X′, X″ are iid.

Th The limit Z in (1) is max-stable. Conversely, any max-stable Z appears as the limit of iid maxima as in (1).

More on Fréchet distributions

Def Z is α-Fréchet (α > 0) if
  P{Z ≤ x} = exp{−c x^{−α}}, x > 0, (c > 0).

Properties
• scale family: for all a > 0, P{aZ ≤ x} = exp{−(a^α c) x^{−α}}, x > 0.
• notation: for the scale c^{1/α}, write ‖Z‖_α := c^{1/α}, so that ‖aZ‖_α = a‖Z‖_α, a > 0.
• heavy tails: as x → ∞,
  P{Z > x} = 1 − e^{−c x^{−α}} ∼ c x^{−α}.
• moments: EZ^p < ∞ iff p < α, and
  EZ^p = ∫_0^∞ x^p d e^{−c x^{−α}} = c^{p/α} Γ(1 − p/α).
• max-stability: for iid Z, Z1, ..., Zn:
  Z1 ∨ ··· ∨ Zn =d n^{1/α} Z.
Indeed:
  P{Zi ≤ x, i = 1, ..., n} = P{Z ≤ x}^n = e^{−nc x^{−α}}, x > 0.

Application I: Random Sketches

First, the mathematical concepts; then explanation & use.

Max–stable sketches

'Signal': f(i) ≥ 0, i = 1, ..., n, with large n.

Def For α > 0 and k ∈ N, let
  E_j(f) := ∨_{i=1}^n f(i) Z_j(i), 1 ≤ j ≤ k,
for iid standard α-Fréchet Z_j(i)'s, i.e. P{Z_j(i) ≤ x} = exp{−x^{−α}}, x > 0, i.e.
‖Z_j(i)‖_α = 1.

• Max–stable sketch of f: E_j(f), 1 ≤ j ≤ k.
• Idea: use {E_j(f)} with k << n as a proxy for f!
• More on this later...

Properties
◦ The E_j(f) are α-Fréchet with
  ‖E_j(f)‖_α = (Σ_{i=1}^n f(i)^α)^{1/α} =: ‖f‖_{ℓα}.
Indeed, for x > 0:
  P{E_j(f) ≤ x} = Π_i P{Z_j(i) ≤ x/f(i)} = exp{−(Σ_i f(i)^α) x^{−α}}.
◦ max-linearity: for a, b > 0 and another 'signal' g(i),
  E_j(af ∨ bg) = aE_j(f) ∨ bE_j(g), 1 ≤ j ≤ k,
where (af ∨ bg)(i) = af(i) ∨ bg(i).

Max–stable sketches: estimation

Observations
• The E_j(f)'s are iid in j = 1, ..., k.
• The scale ‖E_j(f)‖_α equals the norm ‖f‖_{ℓα}.

Estimators:
• (moment) For 0 < p < α, define
  L_p(f) := ((c_{p,α}/k) Σ_{j=1}^k E_j(f)^p)^{1/p}, with c_{p,α} := Γ(1 − p/α)^{−1}.
Motivation:
  (1/k) Σ_j E_j(f)^p ≈ E E_j(f)^p = Γ(1 − p/α) ‖E_j(f)‖_α^p.
• (median) Define
  M(f) := ln(2)^{1/α} med{E_j(f), 1 ≤ j ≤ k}.
Motivation:
  e^{−x^{−α}} = 1/2 ⇔ x = ln(2)^{−1/α}.

Norms and distances

Given the max–stable sketch {E_j(f)} of a 'signal' f.
• For 'large' k, we can estimate the norm:
  ‖f‖_{ℓα} ≈ L_p(f) or ‖f‖_{ℓα} ≈ M(f).
• Performance guarantees later...
• Given another 'signal' g, define the distance:
  ρ_α(f, g) := Σ_{i=1}^n |f(i)^α − g(i)^α|.
Note that |a − b| = 2(a ∨ b) − a − b, for a, b ≥ 0, so:
  ρ_α(f, g) = 2 Σ_i (f(i)^α ∨ g(i)^α) − Σ_i f(i)^α − Σ_i g(i)^α,
and hence:
  ρ_α(f, g) = 2‖f ∨ g‖_{ℓα}^α − ‖f‖_{ℓα}^α − ‖g‖_{ℓα}^α.
• Thus, given the sketches E_j(f) and E_j(g), we can estimate ρ_α(f, g) by
  ρ_α(f, g) ≈ 2L_p(f ∨ g)^α − L_p(f)^α − L_p(g)^α
or by
  ρ_α(f, g) ≈ 2M(f ∨ g)^α − M(f)^α − M(g)^α.
• The key is: E_j(f ∨ g) = E_j(f) ∨ E_j(g), so E_j(f ∨ g) is available from E_j(f) and E_j(g).

Dominance norms

Given are several 'signals' f_t(i), 1 ≤ i ≤ n, indexed by t = 1, ..., T.

Def The dominant signal is:
  f*(i) := max_{1≤t≤T} f_t(i), 1 ≤ i ≤ n.
The dominance norm of the f_t's is ‖f*‖_{ℓα}.

Problem: Estimate ‖f*‖_{ℓα} from the sketches {E_j(f_t)}, t = 1, ..., T. Without accessing the signals f_t directly!
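The sketch construction, the median estimator M, and the distance estimate above can be put together in a few lines of code. This is a toy illustration, not the authors' implementation (their code, by Marios Hadjieleftheriou, is linked later); the parameter choices, signal sizes, and random seed are mine. It uses the fact that if E ~ Exp(1), then E^{−1/α} is standard α-Fréchet:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, k, n = 1.0, 20_000, 40

# Shared iid standard alpha-Frechet noise Z_j(i):
# if E ~ Exp(1), then E**(-1/alpha) has c.d.f. exp{-x**(-alpha)}.
Z = rng.exponential(size=(k, n)) ** (-1.0 / alpha)

f = rng.uniform(0.5, 1.5, size=n)
g = rng.uniform(0.5, 1.5, size=n)

Ef = (Z * f).max(axis=1)          # E_j(f) = max_i f(i) Z_j(i)
Eg = (Z * g).max(axis=1)          # E_j(g), built from the SAME noise Z
Efg = np.maximum(Ef, Eg)          # = E_j(f v g), by max-linearity

def M(E):
    # Median estimator: M = ln(2)**(1/alpha) * med{E_j}
    return np.log(2.0) ** (1.0 / alpha) * np.median(E)

norm_est = M(Ef)                                  # ~ ||f||_{l alpha}
norm_true = (f ** alpha).sum() ** (1.0 / alpha)

rho_est = 2 * M(Efg) ** alpha - M(Ef) ** alpha - M(Eg) ** alpha
rho_true = np.abs(f ** alpha - g ** alpha).sum()
```

Note that Efg is computed from the two sketches alone; the signals f and g are never consulted at that step, which is exactly the property exploited next for dominance norms.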
Solution:
• Max-linearity implies E_j(f*) = E_j(f_1) ∨ ··· ∨ E_j(f_T), 1 ≤ j ≤ k.
• The sketch E_j(f*) is readily available! Proceed as above and estimate ‖f*‖_{ℓα} by:
  ‖f*‖_{ℓα} ≈ L_p(f*) or by ‖f*‖_{ℓα} ≈ M(f*).

Motivation: Data streams

Signals which cannot be stored and processed with conventional methods are emerging in:
• Telephone call monitoring (unmanageably large databases).
• Internet traffic monitoring (vast, rapidly growing amounts of data).
• Sensing applications (restrictions on power, bandwidth, etc.).
• Online transactions (banking, stock market data, e-commerce, etc.).

The streaming model – a general framework. Let f(i), 1 ≤ i ≤ n, be a signal with large domain size n. Starting with f(i) = 0, update f sequentially in time:
• cash register: observe (i, a(i)), and update f: f(i) := f(i) + a(i).
• aggregated: observe (i, f(i)) directly.

Examples: Call detail records & IP monitoring

Phone calls: Each time a phone call ends, a CDR packet is sent to a main station. Think of:
  CDR = (i, a(i), ···), with i = (phone no.), a(i) = time used.
• Given a CDR (i, a(i)), update f(i) := f(i) + a(i). So the 'signal' f(i) contains info on all phone numbers!
• Multiple signals: f_t(i), t = 1, ..., 7, capture the calling patterns for different days of the week.

IP monitoring: Monitor a fast Internet link during a given day (or hour). How many distinct IPs used it? How many bytes were transmitted from a given range of IPs? Can you give estimates in real time???
• Signal: f(i) = bytes transmitted by IP i.
• Stream items: a packet from IP i contains the info (i, b(i)), with i = (IP number), b(i) = (bytes). On observing (i, b(i)), update f(i) := f(i) + b(i).
• Impossible to store and process all the data in real time.

"Classical" (additive) random sketches

Instead of storing the signal f(i), 1 ≤ i ≤ n, keep:
  S_j(f) = Σ_{i=1}^n f(i) X_j(i), 1 ≤ j ≤ k,
where k << n and where the X_j(i) are iid random variables.
The S_j(f)'s, 1 ≤ j ≤ k, are called the random sketch of f.
• Important: can generate the X_j(i)'s 'on the fly' from a small, O(k log2(n))-size set of 'seeds' in main memory.
• Can update the sketch sequentially in the most general, cash register model.
• Can approximate norms and inner products! If Var(X_j(i)) = σ², then
  Var(S_j(f)) = ‖f‖²_{ℓ2} σ², where ‖f‖²_{ℓ2} = Σ_i f(i)².
• So, for zero-mean X_j(i) with σ = 1:
  ‖f‖²_{ℓ2} ≈ (1/k) Σ_j S_j(f)².
• For two signals f and g, by linearity:
  ‖f − g‖²_{ℓ2} ≈ (1/k) Σ_{j=1}^k |S_j(f) − S_j(g)|².

Back to max–stable sketches

Strengths
1. Max–stable sketches are max-linear. Natural use in: sensor data, Internet traffic monitoring, transaction record tracking, max-sequential updates.
2. Work for any α > 0, and not only for 0 < α ≤ 2 (unlike the sum-stable sketches).
3. Tailor-made for dominance norm estimation.

Illustration:
Signals: f_t(i) = bytes of IP number i during the t-th hour of monitoring. Large domain: 1 ≤ i ≤ n, where n = 2^32.
Problem: Estimate the norm of the 'heaviest load scenario' during the day, i.e. for
  f*(i) = max_{1≤t≤24} f_t(i),
estimate the dominance norm ‖f*‖_{ℓα} ≈ ?
Problem: Compare several network traffic scenarios f, g, h, ... without accessing the enormous data directly. Compute pair-wise distances, norms and dominance norms.
• Max–stable sketches provide efficient solutions.

Limitation: Max–stable sketches are not linear – they cannot be updated in the linear cash register format!

Max–stable sketches: Performance guarantees

Norms Given a max–stable sketch E_j(f), j = 1, ..., k, of a signal f(i) ≥ 0. Recall that M(f) is the median-based estimator of ‖f‖_{ℓα}.

Th 1 (SHKT'07) Let ε, δ ∈ (0, 1). Then,
  P{|M(f)/‖f‖_{ℓα} − 1| ≤ ε} ≥ 1 − δ, if k ≥ C log(1/δ)/ε²,
for some C > 0. For the optimal value of the constant, as ε, δ → 0, we have
  k = k(ε, δ) ∼ ln(1/δ) / (α² ln²(2) ε²).

Distances Let now g be another signal and let
  D(f, g) := 2M(f ∨ g)^α − M(f)^α − M(g)^α ≈ ρ_α(f, g) = ‖f^α − g^α‖_{ℓ1}.

Th 4 (SHKT'07) Let ε, δ, η ∈ (0, 1).
If ρ_α(f, g) ≥ η‖f ∨ g‖_{ℓα}^α, then
  P{|D(f, g)/ρ_α(f, g) − 1| ≤ ε} ≥ 1 − δ, for k ≥ C log(1/δ)/ε²,
with some C > 0.

Sketches: summary & credits

Some seminal papers & algorithmic applications:
Alon, Matias & Szegedy (1996); Feigenbaum, Kannan, Strauss & Viswanathan (1999); Fong & Strauss (2000); Indyk (2000); Gilbert, Kotidis, Muthukrishnan & Strauss (2001, 2002); Cormode & Muthukrishnan (2003); Datar, Immorlica, Indyk & Mirrokni (2004).
Review paper: Muthukrishnan (2003).
More on max–stable sketches: Stoev, Hadjieleftheriou, Kollios & Taqqu (2007).
Code: by Marios Hadjieleftheriou (AT&T Research), http://sourceforge.net/projects/sketches

A lesson (for probabilists and computer scientists): Study each other's fields!

Application II: Heavy tail exponents

Heavy tailed data

• A random variable X is said to be heavy-tailed if
  P{|X| ≥ x} ∼ L(x) x^{−α}, as x → ∞,
for some α > 0 and a slowly varying function L.
◦ Here we focus on the simpler but important context:
  X ≥ 0 a.s. and P{X > x} ∼ C x^{−α}, as x → ∞.
◦ (infinite moments) For p > 0, EX^p < ∞ if and only if p < α. In particular,
  0 < α ≤ 2 ⇒ Var(X) = ∞ and 0 < α ≤ 1 ⇒ E|X| = ∞.
• The estimation of the heavy-tail exponent α is an important problem with a rich history.
• Why do we need heavy-tail models? Every finite sample X1, ..., Xn has finite sample mean, variance and all sample moments! Why consider heavy-tailed models in practice?!

Why use heavy-tailed models?

"All models are wrong, but some are useful." – George Box

Let F and G be any two distributions with positive densities on (0, ∞). Let ε > 0 and x1, ..., xn ∈ (0, ∞) be arbitrary; then both
  P_F{X_i ∈ (x_i − ε, x_i + ε), i = 1, ..., n} and P_G{X_i ∈ (x_i − ε, x_i + ε), i = 1, ..., n}
are positive!
• For a given sample, very many models apply.
• The ones that continue to work as the sample grows are most suitable.

We next present real data sets of financial, insurance and Internet data. They can be very heavy tailed.
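One quick numerical way to see what "continues to work as the sample grows" means (a small illustration of my own, not from the talk): when α ≤ 1 so that E|X| = ∞, a single observation carries a non-vanishing fraction of the whole sum, whereas for a light-tailed sample that fraction vanishes as n grows. The distributions and parameter choices below are mine:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 100_000, 0.8  # alpha <= 1: infinite mean

# Exact Pareto: if U ~ Uniform(0,1), then U**(-1/alpha) satisfies P{X > x} = x**(-alpha), x >= 1.
heavy = rng.uniform(size=n) ** (-1.0 / alpha)
light = rng.exponential(scale=5.0, size=n)  # all moments finite

# Fraction of the total mass carried by the single largest observation:
frac_heavy = heavy.max() / heavy.sum()  # stays bounded away from 0
frac_light = light.max() / light.sum()  # ~ log(n)/n, negligible
```

Both samples have perfectly finite sample moments, as the slide notes; only the behavior under growing n distinguishes them.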
Traded volumes on the Intel stock

[Figure: number of stocks traded, INTC, Nov 1, 2005.]

Insurance claims due to fire loss

[Figure: Danish fire loss data, 1980–1990. Hill plot: α̂_H(k) = 1.394; max-spectrum: Ĥ = 0.60422 (0.020897), α̂ = 1.655.]

TCP flow sizes (in number of packets)

[Figure: TCP flow sizes (packets), UNC link 2001 (~36 min), with a zoom on the first minute.]

History

• Hill (1975) worked out the MLE in the Pareto model P{X > x} = x^{−α}, x ≥ 1, and introduced the Hill plot:
  α̂_H(k) := ((1/k) Σ_{i=1}^k (log(X_{i,n}) − log(X_{k+1,n})))^{−1},
where X_{1,n} ≥ X_{2,n} ≥ ··· ≥ X_{k,n} are the top-k order statistics of the sample.
• How to choose k?
◦ Pick k where the plot of α̂_H(k) vs. k stabilizes.
◦ Serious problems in practice: volatile & hard to interpret ("Hill horror plot"), confidence intervals, robustness.
• Consistency and asymptotic normality resolved: Weissman (1978), Hall (1982), in a semi-parametric setting.
• Many other estimators: kernel-based Csörgő, Deheuvels and Mason (1985), moment Dekkers, Einmahl and de Haan (1989), among many others.
• Most estimators exploit the top order statistics.

Hill horror plots: TCP flow sizes

[Figure: Hill plots for the TCP flow-size data; α̂_H(300) = 1.4114 while α̂_H(12000) = 0.9296.]

Fréchet max–stable laws

Consider i.i.d. Xi's with P{X1 > x} ∼ C x^{−α}, x → ∞. By the Fisher–Tippett–Gnedenko theorem:
  (1/n^{1/α}) max_{1≤i≤n} X_i ≡ (1/n^{1/α}) ∨_{i=1}^n X_i →d Z, as n → ∞,
where Z has the α-Fréchet extreme value distribution:
  P{Z ≤ x} = exp{−C x^{−α}}, x > 0.
• The extreme value distributions are max-stable. In particular, for i.i.d.
α-Fréchet Z and Z_i's:
  Z1 ∨ ··· ∨ Zn =d n^{1/α} Z.
• A time series of i.i.d. α-Fréchet {Z_k} is 1/α-max-selfsimilar:
  ∀m ∈ N: {∨_{1≤i≤m} Z_{m(k−1)+i}}_{k∈N} =d m^{1/α} {Z_k}_{k∈N}.
◦ Block-maxima of size m have the same distribution as the original data, rescaled by the factor m^{1/α}.
• Any heavy-tailed (i.i.d.) data set {X_k} is asymptotically max self-similar.

Max–spectrum

Given a positive sample X1, ..., Xn, consider the dyadic block-maxima:
  D(j, k) := max_{1≤i≤2^j} X_{2^j(k−1)+i} ≡ ∨_{i=1}^{2^j} X_{2^j(k−1)+i},
with k = 1, ..., n_j = [n/2^j].
• In view of the asymptotic scaling
  (1/2^{j/α}) D(j, k) →d Z, (j → ∞),
observe that
  Y_j := (1/n_j) Σ_{k=1}^{n_j} log2 D(j, k) ≃ j/α + c, (j, n_j → ∞),
where c := E log2 Z.
◦ The last asymptotics "follow" from the LLN, since the D(j, k)'s are independent in k.
• The max-spectrum of the data is defined as the statistics:
  Y_j, j = 1, ..., [log2(n)].
◦ One can identify α from the slope of the Y_j's vs. j, for large j's.

Max–spectrum estimators of α

Given the max-spectrum Y_j, j = 1, ..., [log2(n)], define
  α̂(j1, j2) := (Σ_{j=j1}^{j2} w_j Y_j)^{−1},
where Σ_j j w_j = 1 and Σ_j w_j = 0.
• That is, use linear regression to estimate the slope 1/α.
• Issues:
◦ Weighted or generalized least squares must be used, since Var(Y_j) ∝ 1/n_j ∝ 2^j.
◦ The choice of the scales j1 and j2 is critical.
• These issues, and confidence intervals, are addressed in Stoev, Michailidis and Taqqu (2006).

Examples of max-spectra

[Figure: max-spectra of four data sets — Fréchet(α = 1.5): estimated α̂ = 1.506; Intel volume: α̂ = 1.3745; 1.3-stable (β = 1): α̂ = 1.2692; t-distribution (df = 3): α̂ = 2.7538.]

Robustness of the max-spectrum

[Figure: max self-similarity estimates for the TCP flow-size data — α̂(5, 10) = 0.9486, α̂(10, 12) = 1.4124.]

Compare with the Hill horror plot of the Internet flow-size data (next slide).
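The comparison can be made concrete in code. Below is a toy side-by-side run of the Hill estimator and a max-spectrum slope estimate on a simulated α-Fréchet sample, for which E Y_j = j/α + c holds exactly at every scale. This is my own sketch, not the authors' implementation: the choices of k, the scale range (j1, j2), and plain OLS in place of the GLS recommended above are simplifications:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha_true, n = 1.5, 2 ** 16

# alpha-Frechet sample: E**(-1/alpha) with E ~ Exp(1) has c.d.f. exp{-x**(-alpha)}.
x = rng.exponential(size=n) ** (-1.0 / alpha_true)

def hill(sample, k):
    # Hill estimator from the top order statistics.
    s = np.sort(sample)[::-1]
    return 1.0 / (np.mean(np.log(s[:k])) - np.log(s[k]))

def max_spectrum(sample):
    # Y_j = average of log2 of dyadic block-maxima of size 2^j.
    J = int(np.log2(len(sample)))
    Y = []
    for j in range(1, J + 1):
        m = 2 ** j
        nb = len(sample) // m
        Y.append(np.log2(sample[: nb * m].reshape(nb, m).max(axis=1)).mean())
    return np.arange(1, J + 1), np.array(Y)

alpha_hill = hill(x, 2000)

j, Y = max_spectrum(x)
j1, j2 = 3, 10                                            # a convenience choice of scales
slope = np.polyfit(j[j1 - 1 : j2], Y[j1 - 1 : j2], 1)[0]  # OLS slope ~ 1/alpha
alpha_max = 1.0 / slope
```

On heavy-tailed but non-Fréchet data, the small scales j are biased, which is exactly why the choice of j1 matters and motivates the automatic selection discussed below.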
Hill horror plots: TCP flow sizes

[Figure: the same Hill plots for the TCP flow-size data; α̂_H(300) = 1.4114 while α̂_H(12000) = 0.9296.]

Compare with the max-spectrum plot of the Internet flow-size data (previous slide).

Asymptotic normality of the max-spectrum

Let P{X_k ≤ x} =: F(x), where
  1 − F(x) = C x^{−α} (1 + O(x^{−β})), (x → ∞),   (2)
with some β > 0.
• Consider the max-spectrum {Y_j} for the range of scales
  r(n) + j1 ≤ j ≤ r(n) + j2,
with fixed j1 < j2 and r(n) → ∞, as n → ∞.

Theorem Under (2) and another technical condition,
  sup_{x∈R} |P{√(n_{j2+r}) ((θ⃗, Y⃗_r) − (θ⃗, μ⃗_r)) ≤ x} − Φ(x/σ_θ⃗)| ≤ C_θ⃗ (1/2^{rβ/α} + r 2^{r/2}/√n),   (3)
where Φ is the standard Normal c.d.f. Here Y⃗_r = (Y_{r+j})_{j=j1}^{j2} and
  (θ⃗, Y⃗_r) = Σ_{j=j1}^{j2} θ_j Y_{r+j}, μ_r(j) = (r + j)/α + c,
and
  σ_θ⃗² = (θ⃗, Σ_α θ⃗) := Σ_{i,j=j1}^{j2} θ_i Σ_α(i, j) θ_j > 0.

The covariance structure of the max-spectrum

• The asymptotic covariance matrix Σ is the same as if the data were i.i.d. α-Fréchet!
◦ The intuition is that the block-maxima {D(r + j, k), k = 1, ..., n_{r+j}} behave like i.i.d. α-Fréchet variables, as r = r(n) → ∞.
• The covariance entries are given by:
  Σ_α(i, j) = α^{−2} 2^{i∨j − j2} ψ(|i − j|),
with
  ψ(a) := Cov(log2(Z1), log2(Z1 ∨ (2^a − 1) Z2)), a ≥ 0,
for i.i.d. 1-Fréchet Z1 & Z2.
◦ Note that α appears only as a factor in Σ_α: Σ_α = α^{−2} Σ_1. It does not affect the correlation structure of the max-spectrum.
• GLS estimators for α use the matrix Σ_α.
• Asymptotic normality for α̂(r + j1, r + j2) follows from the last theorem.
• Stoev, Michailidis and Taqqu (2006) has details on confidence intervals for α and automatic selection of j1 & j2.

Automatic selection of scales

'Null hypothesis': Y⃗ = {Y_j}_{j=1}^J is multivariate Normal with covariance matrix α^{−2} Σ_1 and means EY_j = j/α + const, 1 ≤ j ≤ J.
Given a range of scales j1 ≤ j ≤ j2, set
  Ĥ(j1, j2) ≡ 1/α̂(j1, j2) := w⃗′_{j1,j2} Y⃗,
for the GLS estimator of the slope H = 1/α of Y_j vs. j.

Algorithm:
1. Set j2 ≈ J = [log2(n)] and j1 := j2 − 1.
2. Compute Ĥ_{+1} := Ĥ(j1 − 1, j2) and Ĥ_0 := Ĥ(j1, j2).
3. Compute a confidence interval for Ĥ_{+1} − Ĥ_0 (based on the null hypothesis).
4. If the confidence interval does not contain 0, then stop. Else, set j1 := j1 − 1 and repeat Step 2.

Important details:
• A plug-in estimate of α is used in the covariance matrix.
• The level of the confidence interval is to be set by the user.
• The multiple testing issue is not addressed.
The method works very well in practice!

Automatic selection of scales: an illustration

Simulated: a mixture of 1-Fréchet (10%) and Exponential with mean 5 (90%). Sample size: n = 2^17 = 131,072.

[Figure: histograms of the automatic choices of j1 (p = 0.01) and the resulting α-estimates, the α-estimates for the MSE-best j1 = 10, and the square-root MSE as a function of j1.]

Rates for moment functionals of maxima

Let X1, ..., Xn be i.i.d. from F and recall that
  M_n := (1/n^{1/α}) ∨_{1≤i≤n} X_i →d Z, (n → ∞).
• Are there results on the rate of convergence
  Ef(M_n) → Ef(Z), (n → ∞),
for "reasonable" f's?
◦ Pickands (1975) shows only the convergence of moments (no rates).
• Our approach: consider
  F(x) = exp{−σ^α(x) x^{−α}}, x ∈ R, with σ^α(x) → C, (x → ∞).   (4)
◦ Note: 1 − F(x) ∼ C x^{−α}, x → ∞, is equivalent to (4).
• Extra assumption: |σ^α(x) − C| ≤ D x^{−β}, for large x.

Rates (cont'd)

But (4) is very convenient for handling rates!
• Note that
  P{M_n ≤ x} = P{X1 ≤ n^{1/α} x}^n = F(n^{1/α} x)^n = exp{−σ^α(n^{1/α} x) x^{−α}}.   (5)
◦ Thus,
  Ef(M_n) = ∫_0^∞ f(x) dF_n(x) = ∫_0^∞ f(x) d exp{−σ^α(n^{1/α} x) x^{−α}},
and also
  Ef(Z) = ∫_0^∞ f(x) dG(x) = ∫_0^∞ f(x) d exp{−C x^{−α}}.
• Now, integration by parts yields:
  E(f(M_n) − f(Z)) = ∫_0^∞ (G(x) − F_n(x)) f′(x) dx.
◦ However, G(x) and F_n(x) are of the same "exponential" form!
Rates (cont'd)

By the mean value theorem:
  |F_n(x) − G(x)| = |exp{−σ^α(n^{1/α} x) x^{−α}} − exp{−C x^{−α}}|
    ≤ |σ^α(n^{1/α} x) − C| x^{−α} exp{−c x^{−α}}
    ≤ D n^{−β/α} x^{−(α+β)} exp{−c x^{−α}}, as x → ∞.
• Thus, for any ε > 0, as n → ∞,
  ∫_ε^∞ |F_n(x) − G(x)| |f′(x)| dx ≤ D n^{−β/α} ∫_ε^∞ |f′(x)| x^{−(α+β)} e^{−c x^{−α}} dx = O(n^{−β/α}).
• By taking ε = ε_n → 0, and using a mild technical condition, we can also bound the integral near zero:
  ∫_0^{ε_n} |F_n(x) − G(x)| |f′(x)| dx.

Th If σ^α(x) − C ∼ D x^{−β}, as x → ∞, then as n → ∞,
  n^{β/α} (Ef(M_n) − Ef(Z)) → D ∫_0^∞ x^{−(α+β)} f′(x) e^{−C x^{−α}} dx,
provided mild technical conditions on σ(x) at 0 and on f at 0 and ∞ hold.

Rates (cont'd)

• We have thus obtained exact rates for moment functionals
  Ef((1/n^{1/α}) max_{1≤i≤n} X_i).
◦ They are valid for a large class of absolutely continuous f, including, for example:
  f(x) = log2(x) and f(x) = x^p, p ∈ (0, α).
◦ More details can be found in Stoev, Michailidis and Taqqu (2006).
• As a corollary, we also get rates of convergence for
  Cov(log2 D(r + j1, k), log2 D(r + j2, k)), as r = r(n) → ∞.
• These tools and the Berry–Esseen theorem yield the uniform asymptotic normality of the max-spectrum.

Thank you!

References

Alon, N., Matias, Y. & Szegedy, M. (1996), The space complexity of approximating the frequency moments, in 'Proceedings of the 28th ACM Symposium on Theory of Computing', pp. 20–29.

Cormode, G. & Muthukrishnan, S. (2003), Estimating dominance norms of multiple data streams, in 'Algorithms – ESA 2003', Vol. 2832 of Lecture Notes in Computer Science, Springer, pp. 148–160.

Csörgő, S., Deheuvels, P. & Mason, D. (1985), 'Kernel estimates of the tail index of a distribution', Annals of Statistics 13(3), 1050–1077.

Datar, M., Immorlica, N., Indyk, P. & Mirrokni, V. S. (2004), Locality-sensitive hashing scheme based on p-stable distributions, in 'SCG '04: Proceedings of the Twentieth Annual Symposium on Computational Geometry', ACM Press, New York, NY, USA, pp. 253–262.

Dekkers, A., Einmahl, J. & de Haan, L.
(1989), 'A moment estimator for the index of an extreme-value distribution', Ann. Statist. 17(4), 1833–1855.

Feigenbaum, J., Kannan, S., Strauss, M. & Viswanathan, M. (1999), An approximate L1-difference algorithm for massive data streams, in 'FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science', IEEE Computer Society, Washington, DC, USA, p. 501.

Fong, J. H. & Strauss, M. (2000), An approximate Lp-difference algorithm for massive data streams, in 'STACS '00: Proceedings of the 17th Annual Symposium on Theoretical Aspects of Computer Science', Springer-Verlag, London, UK, pp. 193–204.

Gilbert, A., Kotidis, Y., Muthukrishnan, S. & Strauss, M. (2001), Surfing wavelets on streams: one-pass summaries for approximate aggregate queries, in 'Proceedings of VLDB, Rome, Italy'.

Gilbert, A., Kotidis, Y., Muthukrishnan, S. & Strauss, M. (2002), 'How to summarize the universe: dynamic maintenance of quantiles'.

Hall, P. (1982), 'On some simple estimates of an exponent of regular variation', J. Roy. Statist. Soc. Ser. B 44, 37–42.

Hill, B. M. (1975), 'A simple general approach to inference about the tail of a distribution', The Annals of Statistics 3, 1163–1174.

Indyk, P. (2000), Stable distributions, pseudorandom generators, embeddings and data stream computation, in 'IEEE Symposium on Foundations of Computer Science', pp. 189–197.

Muthukrishnan, S. (2003), Data streams: algorithms and applications, Preprint. http://www.cs.rutgers.edu/~muthu/.

Pickands, J. (1975), 'Statistical inference using extreme order statistics', Ann. Statist. 3, 119–131.

Stoev, S., Hadjieleftheriou, M., Kollios, G. & Taqqu, M. (2007), Norm, point and distance estimation over multiple signals using max-stable distributions, in '23rd International Conference on Data Engineering', ICDE, IEEE. To appear.

Stoev, S., Michailidis, G. & Taqqu, M. (2006), Estimating heavy-tail exponents through max self-similarity, Preprint.

Weissman, I.
(1978), ‘Estimation of parameters and large quantiles based on the k largest observations’, Journal of the American Statistical Association 73, 812–815.