extremefit : package to estimate the conditional
probabilities and extreme quantiles
Kévin Jaunâtre
based on a joint work with
Gilles Durrieu, Ion Grama, Khoai-Quang Pham and Jean-Marie Tricot
Université de Bretagne Sud
Rencontres R, Toulouse, 2016
extremefit : package to estimate the conditional probabilities and extreme quantiles
Rencontres R, Toulouse, 2016
1. Formulation of the problem
2. Estimation of the parameters
3. Other extreme value theory packages
4. Example of utilisation of extremefit
Formulation of the problem
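Let $F_t(x) = P(X \le x \mid T = t)$ be a family of conditional distributions with $t \in [0, T_{\max}]$, $T_{\max} > 0$.
Assume that $F_t$ belongs to the domain of attraction (DA) of the Fréchet law with parameter $1/\gamma_t$:
$$\Phi_{1/\gamma_t}(x) = \exp\left(-x^{-1/\gamma_t}\right), \quad x \ge 0.$$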
We observe independent variables $X_{t_1}, \dots, X_{t_n}$, where $0 \le t_1 < \dots < t_n \le T_{\max}$ and each $X_{t_i}$ has the distribution function $F_{t_i}$.
The goal:
1. For a given large $x$, provide a pointwise estimate of the (extreme) tail probability process $S_t(x) = 1 - F_t(x)$, $t \in [0, T_{\max}]$.
2. For a given $p \in (0, 1)$, provide a pointwise estimate of the (extreme) $p$-quantile process $F_t^{-1}(p)$, $t \in [0, T_{\max}]$.
Theoretical background
The tail of $F_t$ is represented by its excess d.f. over the threshold $\tau > 0$:
$$F_{t,\tau}(x) = P(X_t \le x\tau \mid X_t > \tau) = 1 - \frac{1 - F_t(x\tau)}{1 - F_t(\tau)}, \quad x \ge 1.$$
Consider the Pareto d.f.:
$$P_\theta(x) = 1 - x^{-1/\theta}, \quad x \ge 1, \ \theta > 0.$$
Theorem (Fisher-Tippett-Gnedenko theorem)
1. $F_t$ belongs to the DA of the Fréchet law with parameter $1/\gamma_t$ iff, for any $x \ge 1$, $F_{t,\tau}(x) \to P_{\gamma_t}(x)$ as $\tau \to \infty$.
2. Equivalent condition: $1 - F_t(x) = x^{-1/\gamma_t} L_t(x)$, where $L_t(x)$ is a slowly varying perturbation.
The F-T-G theorem suggests approximating $F_{t,\tau}(x) \approx P_{\gamma_t}(x)$, with $\gamma_t$ estimated, for large $\tau$.
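The convergence in the theorem can be checked numerically. The sketch below is only an illustration: it takes $F$ to be the Fréchet d.f. itself with a hypothetical value $\gamma = 0.5$ (not from the talk) and compares its excess d.f. with the Pareto d.f. $P_\gamma$ at $x = 2$ for growing thresholds. All function names are ours.

```python
import math

def frechet_cdf(x, gamma):
    """Frechet d.f. with parameter 1/gamma: exp(-x**(-1/gamma)), x > 0."""
    return math.exp(-x ** (-1.0 / gamma))

def excess_cdf(x, tau, gamma):
    """Excess d.f. over tau: 1 - (1 - F(x*tau)) / (1 - F(tau)), x >= 1."""
    return 1.0 - (1.0 - frechet_cdf(x * tau, gamma)) / (1.0 - frechet_cdf(tau, gamma))

def pareto_cdf(x, theta):
    """Pareto d.f. P_theta(x) = 1 - x**(-1/theta), x >= 1."""
    return 1.0 - x ** (-1.0 / theta)

gamma = 0.5   # hypothetical tail index, chosen for illustration
x = 2.0
# Absolute gap between the excess d.f. and the Pareto d.f. as tau grows.
gaps = [abs(excess_cdf(x, tau, gamma) - pareto_cdf(x, gamma))
        for tau in (1.0, 5.0, 25.0)]
print(gaps)
```

The gap shrinks as the threshold increases, which is what motivates fitting a Pareto model only above a sufficiently large $\tau$.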
Our approach
We approximate $F_t$ by a family of semi-parametric models:
$$F_{t,\tau,\theta}(x) = \begin{cases} F_t(x), & x \in [0, \tau], \\ 1 - (1 - F_t(\tau))\left(1 - P_\theta\left(\frac{x}{\tau}\right)\right), & x > \tau. \end{cases}$$
We fit the tail of $F_t(x)$ by a Pareto model where:
- $\tau$ is the unknown threshold
- $F_t$ and $\theta$ are unknown
Pointwise estimation in $t$
For each time $t$ we use the observations in the window $[t - h, t + h]$ with bandwidth $h$:
1. $F_t(\cdot)$ is estimated by the weighted empirical distribution function, for $x \in [0, \tau]$.
2. $\theta$ is estimated by maximizing the weighted likelihood.
3. $\tau$ is replaced by the adaptive threshold $\hat\tau$ explained later.
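As an illustration of how the semi-parametric model yields tail probabilities and extreme quantiles, here is a minimal sketch. It makes simplifying assumptions that are ours, not the package's: the empirical d.f. is unweighted, and `tau` and `theta` are supplied by the caller instead of being estimated. Above the threshold the model gives $S(x) = (1 - F(\tau))\,(x/\tau)^{-1/\theta}$, which inverts to the quantile $\tau\,\left((1 - F(\tau))/(1 - p)\right)^{\theta}$.

```python
import math

def empirical_cdf(sample, x):
    """Unweighted empirical d.f. (the talk uses a weighted version)."""
    return sum(1 for v in sample if v <= x) / len(sample)

def tail_probability(sample, x, tau, theta):
    """1 - F_{tau,theta}(x): empirical below tau, Pareto tail above tau."""
    if x <= tau:
        return 1.0 - empirical_cdf(sample, x)
    s_tau = 1.0 - empirical_cdf(sample, tau)
    return s_tau * (x / tau) ** (-1.0 / theta)

def extreme_quantile(sample, p, tau, theta):
    """Invert the Pareto tail; only valid for levels p above F(tau)."""
    s_tau = 1.0 - empirical_cdf(sample, tau)
    if 1.0 - p >= s_tau:
        raise ValueError("p must exceed the empirical level of tau")
    return tau * (s_tau / (1.0 - p)) ** theta

# Toy data and hypothetical parameter values, for illustration only.
sample = [float(v) for v in range(1, 11)]
q99 = extreme_quantile(sample, 0.99, tau=8.0, theta=0.5)
print(q99, tail_probability(sample, q99, tau=8.0, theta=0.5))
```

The round trip (quantile, then tail probability at that quantile) returns approximately $1 - p$, which is a quick consistency check of the two formulas.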
Estimation of the parameters
Estimation of θ
Given the time $t$ and the bandwidth $h$, we define the weights
$$0 \le W_{t,h}(t_i) = K\left(\frac{t_i - t}{h}\right) \le 1,$$
where $K(\cdot)$ is a kernel function (for example, $K(t) = \frac{1}{\sqrt{2\pi}} \exp(-t^2/2)$).
The weighted quasi-log-likelihood function is:
$$L_{t,h}(\tau, \theta) = \sum_{i=1}^{n} W_{t,h}(t_i) \log \frac{dF_{t,\tau,\theta}}{dx}(X_{t_i}).$$
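Maximum quasi-likelihood estimator with fixed threshold $\tau$:
$$\hat\theta_{t,h,\tau} = \frac{1}{\hat n_{t,h,\tau}} \sum_{i=1}^{n} W_{t,h}(t_i)\, 1\{X_{t_i} > \tau\} \log \frac{X_{t_i}}{\tau},$$
where $\hat n_{t,h,\tau} = \sum_{i=1}^{n} W_{t,h}(t_i)\, 1\{X_{t_i} > \tau\}$ is the weighted number of observations beyond $\tau$.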
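The estimator above is a kernel-weighted Hill estimator and is short to write down. The following is a minimal sketch in plain Python, not the extremefit implementation; the function names and the toy data are ours, and the Gaussian kernel is the example kernel given earlier.

```python
import math

def gaussian_kernel(u):
    """Example kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

def weighted_hill(times, values, t, h, tau, kernel=gaussian_kernel):
    """Return (theta_hat, n_hat) at time t, bandwidth h and threshold tau."""
    weighted_log_sum = 0.0
    n_hat = 0.0  # weighted number of observations beyond tau
    for t_i, x_i in zip(times, values):
        if x_i > tau:
            w = kernel((t_i - t) / h)
            weighted_log_sum += w * math.log(x_i / tau)
            n_hat += w
    if n_hat == 0.0:
        raise ValueError("no weighted observation beyond tau")
    return weighted_log_sum / n_hat, n_hat

# Toy check: equal weights, log-excesses 1 and 2, so theta_hat is 1.5.
theta_hat, n_hat = weighted_hill(
    times=[0.0, 0.0, 0.0],
    values=[math.e, math.e ** 2, 0.5],
    t=0.0, h=1.0, tau=1.0,
)
print(theta_hat, n_hat)
```

With all weights equal, the estimator reduces to the classical Hill estimator, i.e. the mean log-excess over the threshold.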
Choice of the threshold τ
Consider the case of i.i.d. observations, $F_{t_i} = F$ (so no bandwidth $h$ has to be chosen).
We propose a procedure to choose the threshold $\tau = \tau_n$ in two steps:
- Propagation step: given a sequence of embedded Pareto models, determine the largest one that fits the data.
- Selection step: determine the fitted model by penalized maximum likelihood.
Propagation step
We choose as thresholds the order statistics: $\tau_1 \equiv X_{(1)} \ge \dots \ge \tau_n \equiv X_{(n)}$.
1. Fix a starting threshold $\tau_{m_0}$ ($m_0 \ge 3$).
2. For $m = m_0, \dots, n$, test sequentially the null hypothesis $H_{\tau_m}$: $F_{\tau_m}$ has a Pareto distribution $P_\theta$, against the alternative $\tilde H_{\tau_m}$: $F_{\tau_m}$ has a Pareto change-point (CP) distribution $P_{\theta_1, \theta_2, \nu}$ with some change-point $\nu$.
3. Stop when the null hypothesis is first rejected (say at $\hat m$).
The output is the break point $\hat s = \tau_{\hat m} \equiv X_{(\hat m)}$.
To test the null hypothesis $H_{\tau_m}$ we use the likelihood ratio test statistic (precisely, its maximum over all change points $\nu_l = \tau_l / \tau_m$):
$$TS(\tau_m) = \max_{\delta' k \le l \le (1 - \delta'') m} \left\{ \max_{\theta_1, \theta_2} L_{\tau_m}(P_{\theta_1, \theta_2, \nu_l}) - \max_{\theta} L_{\tau_m}(P_\theta) \right\}.$$
$H_{\tau_m}$ is rejected if $TS(\tau_m) > D$, where $D$ is a critical value.
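The propagation step can be sketched as follows, under assumptions that are ours and should not be read as the package's implementation. On the log scale a Pareto excess with parameter $\theta$ is exponential with mean $\theta$, so we take the Pareto change-point model to be a piecewise exponential model for the log-excesses with a rate change at $\log \nu$. The critical value `crit`, the trimming fractions `delta1` and `delta2`, and the candidate range $\delta' m \le l \le (1 - \delta'') m$ are hypothetical choices; observations are assumed positive and distinct.

```python
import math

def max_exp_loglik(n_events, exposure):
    """Sup over rate of n*log(rate) - rate*exposure (attained at n/exposure)."""
    if n_events == 0:
        return 0.0
    return n_events * math.log(n_events / exposure) - n_events

def lr_stat(z, c):
    """Likelihood ratio statistic for a rate change at c in the log-excesses z."""
    n1 = sum(1 for v in z if v <= c)
    e1 = sum(min(v, c) for v in z)        # exposure below the change point
    e2 = sum(v - c for v in z if v > c)   # exposure above the change point
    alt = max_exp_loglik(n1, e1) + max_exp_loglik(len(z) - n1, e2)
    return alt - max_exp_loglik(len(z), sum(z))

def max_lr_stat(xs_desc, m, delta1=0.1, delta2=0.1):
    """TS(tau_m): maximum of lr_stat over the candidate change points."""
    tau = xs_desc[m - 1]
    z = [math.log(v / tau) for v in xs_desc[:m - 1]]
    lo = max(1, math.ceil(delta1 * m))
    hi = min(m - 1, math.floor((1.0 - delta2) * m))
    return max((lr_stat(z, math.log(xs_desc[l - 1] / tau))
                for l in range(lo, hi + 1)), default=0.0)

def propagation_break(x, m0=3, crit=3.0, delta1=0.1, delta2=0.1):
    """Return (m_hat, break point) at the first rejection, or None."""
    xs = sorted(x, reverse=True)
    for m in range(m0, len(xs) + 1):
        if max_lr_stat(xs, m, delta1, delta2) > crit:
            return m, xs[m - 1]
    return None

data = [5.0, 1.2, 3.3, 2.1, 8.7, 1.9, 4.4, 2.8]  # toy data
print(propagation_break(data, crit=3.0))
```

Because the single-rate model is nested in the change-point model, the statistic is never negative; the choice of the critical value then controls how early the sequence of tests stops.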
Selection step
Usually one chooses the last accepted parametric model in the propagation step. This can
introduce a huge bias, since the last accepted model is close to the rejected one.
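The idea: since at the break point $\hat s$ the null hypothesis is rejected, penalize for getting close to $\hat s$.
Introduce a penalty term
$$\mathrm{Pen}(\tau) = L_\tau\left(P_{\hat\theta_{\hat s}}\right), \quad \tau \le s.$$
Choose the adaptive threshold $\hat\tau$ by maximizing in $\tau$ the penalized likelihood:
$$\max_{\theta} L_\tau(P_\theta) - \mathrm{Pen}(\tau) = L_\tau\left(P_{\hat\theta_\tau}\right) - L_\tau\left(P_{\hat\theta_{\hat s}}\right) \ge 0.$$
The output:
1. the adaptive threshold $\hat\tau$
2. the adaptive estimator $\hat\theta_{\hat\tau} = \hat\theta_{n, \hat\tau}$
3. (the break point $\hat s$)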
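For the Pareto model the penalized likelihood has a closed form. With $k$ observations beyond $\tau$, Hill estimate $\hat\theta_\tau$ and $r = \hat\theta_\tau / \hat\theta_{\hat s}$, substituting the Pareto density $\theta^{-1} y^{-1/\theta - 1}$ into $L_\tau$ gives $L_\tau(P_{\hat\theta_\tau}) - L_\tau(P_{\hat\theta_{\hat s}}) = k\,(r - 1 - \log r) \ge 0$. The sketch below uses this identity; the search range (order statistics of index $m_0 \le m \le \hat m$) and all names are our assumptions, and the data are assumed positive and distinct.

```python
import math

def hill(xs_desc, m):
    """Hill estimate with threshold xs_desc[m-1] and the m-1 larger values."""
    tau = xs_desc[m - 1]
    return sum(math.log(v / tau) for v in xs_desc[:m - 1]) / (m - 1)

def penalized_lik(k, theta_tau, theta_s):
    """L_tau(P_theta_tau) - L_tau(P_theta_s) = k * (r - 1 - log r)."""
    r = theta_tau / theta_s
    return k * (r - 1.0 - math.log(r))

def select_threshold(x, m_hat, m0=3):
    """Return (m, adaptive threshold, adaptive estimate) for break index m_hat."""
    xs = sorted(x, reverse=True)
    theta_s = hill(xs, m_hat)  # estimate at the break point
    best_m = max(range(m0, m_hat + 1),
                 key=lambda m: penalized_lik(m - 1, hill(xs, m), theta_s))
    return best_m, xs[best_m - 1], hill(xs, best_m)

# Toy data with log-values 0, 1, 2, 3, 4 and a hypothetical break index 5.
x = [math.exp(j) for j in range(5)]
m_sel, tau_sel, theta_sel = select_threshold(x, m_hat=5)
print(m_sel, tau_sel, theta_sel)
```

The penalized likelihood vanishes at the break point itself, so the maximizer is pushed away from the rejected model, which is the purpose of the penalty.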
Choice of the bandwidth h
Cross-validation (globally for all $t_i$).
1. For each bandwidth $h_m = a q^m$, $m = 1, \dots, M_h$, choose the threshold $\hat\tau_m$ by the adaptive procedure presented before.
2. We choose an adaptive $\hat h$ among the values $h_m$ by minimizing the cross-validation function
$$CV(h_m, p) = \frac{1}{M_h \, \mathrm{card}(T_{grid})} \sum_{h_l} \sum_{t_i \in T_{grid}} \left| \log \frac{\hat q_p^{(-i)}(t_i, h_m)}{\hat F^{-1}_{t_i, h_l}(p)} \right|,$$
where:
- $\hat q_p^{(-i)}(\cdot)$ is the adaptive quantile estimator with $X_{t_i}$ removed;
- $\hat F^{-1}_{t_i, h_l}(p)$ is the empirical quantile in the window $[t_i - h_l, t_i + h_l]$;
- $p$ should not be too high: $p = p_{cv} = 0.99$.
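The bandwidth search can be sketched generically. In the code below the two quantile estimators are passed in as callables, because their implementation is outside the scope of this sketch; `loo_quantile` and `window_quantile` are hypothetical names, the absolute value of the log-ratio is our reading of the criterion, and the toy callables at the end exist only to make the example runnable.

```python
import math

def geometric_bandwidths(a, q, n_h):
    """Bandwidth grid h_m = a * q**m, m = 1, ..., n_h."""
    return [a * q ** m for m in range(1, n_h + 1)]

def cv_score(h_m, bandwidths, t_grid, p, loo_quantile, window_quantile):
    """Average absolute log-ratio of the leave-one-out adaptive quantile
    (at bandwidth h_m) to the empirical window quantile (at bandwidth h_l)."""
    total = 0.0
    for h_l in bandwidths:
        for i, t_i in enumerate(t_grid):
            ratio = loo_quantile(i, t_i, h_m, p) / window_quantile(t_i, h_l, p)
            total += abs(math.log(ratio))
    return total / (len(bandwidths) * len(t_grid))

def choose_bandwidth(bandwidths, t_grid, p, loo_quantile, window_quantile):
    """Return the bandwidth of the grid with the smallest CV score."""
    return min(bandwidths,
               key=lambda h: cv_score(h, bandwidths, t_grid, p,
                                      loo_quantile, window_quantile))

# Toy stand-ins for the two estimators (assumptions, for illustration only).
toy_loo = lambda i, t, h, p: math.exp(h)
toy_window = lambda t, h, p: 1.0

grid_h = geometric_bandwidths(a=0.5, q=2.0, n_h=3)
best_h = choose_bandwidth(grid_h, [0.0, 0.5, 1.0], 0.99, toy_loo, toy_window)
print(grid_h, best_h)
```

Passing the estimators as arguments keeps the selection logic independent of how the adaptive and empirical quantiles are computed.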
Other extreme value theory packages
R packages for extreme value theory
Several R packages implement extreme value theory, with different application domains and methods. The following packages are the most widely used and the closest to our method:
- evd package: univariate and bivariate extreme value modelling; DA of Gumbel and Weibull computed.
- evir package: univariate modelling; DA of Gumbel computed.
Both provide functions that help with the choice of the threshold.
The evir package allows easy prediction of the extreme quantiles.
extremefit package
What new features does the extremefit package bring?
- Adaptive choice of the threshold $\tau$ in the POT method for computation of the extreme $p$-quantile and of the extreme survival probabilities.
- Computation of the extreme $p$-quantile and of the extreme survival probabilities varying with the time $t \in [0, T_{\max}]$.
- Choice of the bandwidth parameter by cross-validation.
- Goodness-of-fit test for testing the adjustment of the tail by a Pareto-type d.f.
Example of utilisation of extremefit
Adaptive choice of the threshold τ
Figure: four panels.
- Test statistic versus k (start pt = 10, break pt = 84).
- Hill estimator versus k (Hill = 1.2461, kadapt = 52).
- Penalized maximum likelihood versus k (kadapt = 52).
- Survival function versus log(X) (threshold = 1.1468).
Example of a data set (ERDF)
The data are the average consumption of a sample of clients with 3 or 6 kVA contracts. The data are collected every 30 minutes over 3 years.
Figure: X versus tYear (0 to 3 years).
Computation of quantiles
Figure: Extreme quantiles. Electric consumption (kVA) versus t (Year), with the 0.95-quantile, 0.99-quantile and 0.999-quantile curves.
An example of a data set (Wind from Brest)
The data are from a study of the wind speed in Brest (France) from 1976
to 2005. The data are collected daily.
Figure: Wind speed for the period 1990 to 2001 (Wind Speed (km/h) versus Date).
Computation of the 0.99-quantile
The bandwidth is chosen to be two months.
Figure: Wind Speed (km/h) versus Date, 1990 to 2000.
Computation of the 0.99-quantile
The bandwidth is chosen by cross-validation (more than 4 months).
Figure: Wind Speed (km/h) versus Date, 1990 to 2000.
Thank you for your attention !