Estimation of the third order parameter in extreme value statistics

Yuri Goegebeur∗        Tertius de Wet†

April 9, 2011
Abstract
We introduce a class of estimators for the third order parameter in extreme value statistics when
the distribution function underlying the data is heavy tailed. For appropriately chosen intermediate
sequences of upper order statistics, consistency is established under the third order tail condition and
asymptotic normality under the fourth order tail condition. Numerical and simulation experiments
illustrate the asymptotic and finite sample behavior, respectively, of some selected estimators.
Keywords: Pareto-type distribution, third order parameter, fourth order condition.
1 Introduction
A central topic in the area of extreme value statistics is drawing inferences about the far tail of a distribution function. In this process of drawing inferences, it is necessary to extend the empirical distribution
function beyond the available data, which is typically done by imposing some conditions on the tail behavior of the underlying distribution function. In this paper we deal with an estimation problem within
the class of heavy-tailed or Pareto-type distributions, which constitute the max-domain of attraction of
the Fréchet distribution.
∗ Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark (email: [email protected]). This author’s research was supported by a grant from the Danish Natural Science Research Council.
† Department of Statistics and Actuarial Science, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa (email: [email protected]). This author’s research was partially supported by the National Research Foundation.
A distribution is said to be of Pareto-type if for some γ > 0 its survival function is of the form
$$1 - F(x) = x^{-1/\gamma}\,\ell_F(x), \quad x > 0, \tag{1}$$
where ℓF denotes a slowly varying function at infinity, i.e.
$$\frac{\ell_F(\lambda x)}{\ell_F(x)} \to 1 \ \text{ as } x \to \infty \ \text{ for all } \lambda > 0. \tag{2}$$
The parameter γ is called the extreme value index, and clearly governs the tail behavior, with larger
values indicating heavier tails. The Pareto-type model can also be stated in an equivalent way in terms
of the tail quantile function U , where U (x) := inf{y : F (y) ≥ 1 − 1/x}, x > 1, as follows
$$U(x) = x^{\gamma}\,\ell_U(x), \tag{3}$$
with ℓU again a slowly varying function at infinity (Gnedenko, 1943).
The estimation of the parameter γ has received a lot of attention in the extreme value literature; we
refer to Beirlant et al. (2004) and de Haan and Ferreira (2006) for good accounts of such estimation
procedures. Consistency of estimators for γ can typically be achieved under (1), which is a first order
condition, whereas establishing asymptotic normality requires more structure on the tail of F . This extra
structure is typically formulated in terms of a second order condition on the tail behavior, the so-called
slow variation with remainder condition; see Bingham et al. (1987) and de Haan and Ferreira (2006).
Let $D_\rho(x) := \int_1^x u^{\rho-1}\,du = (x^\rho - 1)/\rho$, $x > 0$, $\rho < 0$.
Second order condition (R2) There exist a positive real parameter γ, a negative real parameter ρ and a function b with b(t) → 0 for t → ∞, of constant sign for large values of t, such that
$$\lim_{t\to\infty}\frac{\ln U(tx) - \ln U(t) - \gamma\ln x}{b(t)} = D_\rho(x), \quad \forall x > 0.$$
Let us remark here that condition (R2 ) implies that |b| is regularly varying of index ρ (Geluk and de
Haan, 1987), and hence the parameter ρ determines the rate of convergence of ln U (tx) − ln U (t) to its
limit, being γ ln x, as t → ∞. It should be mentioned that the second order condition as presented in
de Haan and Ferreira (2006) also allows the case ρ = 0. However, in this paper the case ρ = 0 is excluded.
Although the estimation of the second order parameter ρ is challenging, both from a theoretical and an
applied perspective, several estimators for ρ that work well in practice have recently been introduced,
and their asymptotic properties studied. We refer here to Gomes et al. (2002), Fraga Alves et al. (2003),
Ciuperca and Mercadier (2010), and Goegebeur et al. (2010), among others. In these papers, consistency
is established under (R2 ), whereas asymptotic normality requires again a further condition on the tail
behavior, the third order condition. Denote
$$H_{\rho,\beta}(x) := \int_1^x y^{\rho-1}\int_1^y u^{\beta-1}\,du\,dy = \frac{1}{\beta}\left[\frac{x^{\rho+\beta}-1}{\rho+\beta} - \frac{x^{\rho}-1}{\rho}\right], \quad x > 0,\ \rho,\beta < 0.$$
Third order condition (R3) There exist a positive real parameter γ, negative real parameters ρ and β, and functions b and b̃ with b(t) → 0 and b̃(t) → 0 for t → ∞, both of constant sign for large values of t, such that
$$\lim_{t\to\infty}\frac{\dfrac{\ln U(tx) - \ln U(t) - \gamma\ln x}{b(t)} - D_\rho(x)}{\tilde b(t)} = H_{\rho,\beta}(x), \quad \forall x > 0.$$
Note that under condition (R3 ) it can be proven that |b̃| is regularly varying of index β. We refer to de
Haan and Stadtmüller (1996) for further details. It is clear that (R3 ) specifies the rate of convergence in
(R2 ), in a way similar to (R2 ) specifying the rate of convergence in the first order framework.
In this paper we deal with the estimation of the third order parameter β in (R3 ) with ρ < 0 and β < 0.
Having estimators for this parameter is practically relevant for the following reasons. Firstly, successful
application of extreme value methods often depends critically on the appropriate selection of the number
of extreme values used in the estimation. Although this problem has been well studied in the framework
of the estimation of γ (Hall and Welsh (1985), Beirlant et al. (1996), Drees and Kaufmann (1998), Guillou
and Hall (2001), among others), there are, apart from some heuristic guidelines, no formal methods to
deal with this problem when estimating ρ. We refer to Caeiro and Gomes (2008) for a discussion of this
issue. Secondly, recently improved estimators for γ were introduced, which were obtained by explicitly
estimating the dominant term of the asymptotic bias; see Beirlant et al. (1999), Feuerverger and Hall
(1999), Beirlant et al. (2002), Gomes and Martins (2002), and more recently Gomes et al. (2008), for
examples of such bias-corrected estimators for γ. No such estimators are available for the second order
parameter. The absence of research on the above mentioned topics for the estimation of ρ can be attributed to the fact that to date, to the best of our knowledge, no estimators for β are available. With this paper we make a first attempt in this direction, with the objective of providing an impetus for new research on improved estimation procedures for the second order parameter.
The remainder of this paper is organized as follows. In the next section we introduce the class of
estimators for the third order parameter β. The asymptotic properties of the proposed class, consistency
and asymptotic normality, are established in Section 3. In Section 4 we illustrate the asymptotic and
finite sample behavior of some specific estimators for β numerically. The proofs of all results are deferred
to the Appendix.
2 The class of estimators for β
In this section we introduce our class of estimators for the third order parameter. Consider X1 , . . . , Xn
independent and identically distributed (i.i.d.) random variables according to a distribution function F
satisfying (1), with associated order statistics X1,n ≤ · · · ≤ Xn,n . Let Tn,k (K) denote a kernel statistic
with kernel function K:
$$T_{n,k}(K) := \frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)Z_j, \tag{4}$$
where Zj := j(ln Xn−j+1,n − ln Xn−j,n) are the so-called scaled log-spacings of successive order statistics introduced in Beirlant et al. (1999) and Feuerverger and Hall (1999).
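To make the construction concrete, here is a minimal Python sketch of how a statistic of type (4) can be computed from a sample. This is our own illustration, not the authors’ Fortran implementation; the function names are hypothetical.

```python
import numpy as np
from math import gamma

def t_nk(x, k, kernel):
    # Kernel statistic (4): a kernel-weighted average of the scaled
    # log-spacings Z_j = j (ln X_{n-j+1,n} - ln X_{n-j,n}).
    x = np.sort(x)
    n = len(x)
    j = np.arange(1, k + 1)
    z = j * (np.log(x[n - j]) - np.log(x[n - j - 1]))
    return np.mean(kernel(j / (k + 1)) * z)

# With the log-weight kernel L_alpha introduced below, mu(L_alpha) = 1,
# so on strict Pareto data (gamma = 1) the statistic should be close to 1.
alpha = 0.7
L = lambda u: (-np.log(u)) ** alpha / gamma(alpha + 1)
rng = np.random.default_rng(1)
sample = 1.0 / rng.uniform(size=10000)  # unit Pareto sample
print(t_nk(sample, 1000, L))
```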
Further, for β, ρ < 0, let
$$\mu(K) := \int_0^1 K(u)\,du, \qquad \sigma^2(K) := \int_0^1 K^2(u)\,du,$$
$$I_1(K,\rho) := \int_0^1 K(u)u^{-\rho}\,du, \qquad I_2(K,\rho,\beta) := \int_0^1 K(u)u^{-\rho}\,\frac{u^{-\beta}-1}{\beta}\,du.$$
In this paper we will consider statistics of type (4) with the log-weight function
$$L_\alpha(u) := \frac{(-\ln u)^\alpha}{\Gamma(\alpha+1)}, \quad u \in (0,1),$$
with α > 0 and where Γ denotes the Gamma function, i.e. $\Gamma(\omega) = \int_0^\infty e^{-x}x^{\omega-1}\,dx$ (ω > 0), as basic building blocks for the class of estimators for β. Therefore, we start by studying their asymptotic behavior. The following proposition gives the asymptotic distributional representation of Tn,k(Lα) under (R3). Let Y1,n ≤ · · · ≤ Yn,n denote the order statistics of a random sample of size n from the unit Pareto distribution, with distribution function F(y) = 1 − 1/y, y > 1.
Proposition 1 Let X1 , . . . , Xn be i.i.d. random variables according to a distribution satisfying (R3 ). If
k → ∞ such that k/n → 0 we have that
$$T_{n,k}(L_\alpha) \overset{D}{=} \gamma + \gamma\sigma(L_\alpha)\frac{N_k(L_\alpha)}{\sqrt{k}} + b(Y_{n-k,n})I_1(L_\alpha,\rho) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})I_2(L_\alpha,\rho,\beta)(1+o_P(1)) + b(Y_{n-k,n})\,O_P\!\left(\frac{1}{\sqrt{k}}\right),$$
where Nk(Lα) is an asymptotically standard normally distributed random variable.
This proposition follows immediately from the fact that Tn,k (Lα ) is an element of the class of kernel
statistics studied in Theorem 2 of Goegebeur et al. (2010), and hence for brevity we omit its proof from
this paper. Note that
$$I_1(L_\alpha,\rho) = \frac{1}{(1-\rho)^{\alpha+1}}, \qquad I_2(L_\alpha,\rho,\beta) = \frac{1}{\beta}\left[\frac{1}{(1-\rho-\beta)^{\alpha+1}} - \frac{1}{(1-\rho)^{\alpha+1}}\right], \qquad \sigma^2(L_\alpha) = \frac{\Gamma(2\alpha+1)}{\Gamma^2(\alpha+1)}.$$
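These closed forms are convenient for implementation. As a quick sanity check, the following hypothetical Python snippet verifies them against numerical quadrature (our own illustration; scipy is assumed to be available).

```python
import numpy as np
from math import gamma
from scipy.integrate import quad

alpha, rho, beta = 0.7, -1.0, -0.5
L = lambda u: (-np.log(u)) ** alpha / gamma(alpha + 1)

# I1(L_alpha, rho) = (1 - rho)^{-(alpha + 1)}
i1, _ = quad(lambda u: L(u) * u ** (-rho), 0, 1)
print(i1, (1 - rho) ** (-(alpha + 1)))

# I2(L_alpha, rho, beta) = ((1-rho-beta)^{-(alpha+1)} - (1-rho)^{-(alpha+1)}) / beta
i2, _ = quad(lambda u: L(u) * u ** (-rho) * (u ** (-beta) - 1) / beta, 0, 1)
print(i2, ((1 - rho - beta) ** (-(alpha + 1)) - (1 - rho) ** (-(alpha + 1))) / beta)

# sigma^2(L_alpha) = Gamma(2 alpha + 1) / Gamma(alpha + 1)^2
s2, _ = quad(lambda u: L(u) ** 2, 0, 1)
print(s2, gamma(2 * alpha + 1) / gamma(alpha + 1) ** 2)
```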
The basic idea is now to use Proposition 1 to construct differences and ratios of statistics Tn,k (Lα ), chosen
in such a way that the term involving I2 (Lα , ρ, β) becomes the dominant one.
For arbitrary positive real parameters α and α̃, we denote
$$I_1(\alpha,\tilde\alpha,\rho) := I_1(L_\alpha,\rho) - I_1(L_{\tilde\alpha},\rho), \qquad I_2(\alpha,\tilde\alpha,\rho,\beta) := I_2(L_\alpha,\rho,\beta) - I_2(L_{\tilde\alpha},\rho,\beta).$$
From Proposition 1 we have that if $\sqrt{k}\,b(n/k)\to\infty$
$$\Psi_{n,k}(\alpha,\tilde\alpha,\breve\alpha) := \frac{T_{n,k}(L_\alpha) - T_{n,k}(L_{\tilde\alpha})}{T_{n,k}(L_{\breve\alpha}) - T_{n,k}(L_{\tilde\alpha})} \overset{P}{\to} \psi(\alpha,\tilde\alpha,\breve\alpha,\rho) := \frac{I_1(\alpha,\tilde\alpha,\rho)}{I_1(\breve\alpha,\tilde\alpha,\rho)} = \frac{(1-\rho)^{\tilde\alpha-\alpha}-1}{(1-\rho)^{\tilde\alpha-\breve\alpha}-1},$$
where ᾰ is again an arbitrary positive real parameter. We refer to the proof of Theorem 1 for further
details about this result. Now, in order to make the term involving I2(α, α̃, ρ, β) the dominant one, we
construct a ratio of differences of statistics Ψn,k (α, α̃, ᾰ), with appropriately chosen parameters α, α̃ and
ᾰ. Setting 0 < τ1 < τ2 < τ3 and 0 < δ1, δ2 < τ1, consider the statistic
$$\Lambda_{n,k}(\tau,\delta) := \frac{\Psi_{n,k}(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2) - \Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2)}{\Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2) - \Psi_{n,k}(\tau_3-\delta_1,\tau_3,\tau_3-\delta_2)},$$
where τ′ := (τ1, τ2, τ3) and δ′ := (δ1, δ2), and the function
$$\Lambda(\tau,\rho,\beta) := \frac{c_1(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta)}{c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta) - c_1(\tau_3-\delta_1,\tau_3,\tau_3-\delta_2,\rho,\beta)} = (1-\rho-\beta)^{\tau_3-\tau_1}\,\frac{(1-\rho-\beta)^{\tau_2}(1-\rho)^{\tau_1} - (1-\rho-\beta)^{\tau_1}(1-\rho)^{\tau_2}}{(1-\rho-\beta)^{\tau_3}(1-\rho)^{\tau_2} - (1-\rho-\beta)^{\tau_2}(1-\rho)^{\tau_3}},$$
with
$$c_1(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta) := \frac{I_2(\alpha,\tilde\alpha,\rho,\beta) - \psi(\alpha,\tilde\alpha,\breve\alpha,\rho)I_2(\breve\alpha,\tilde\alpha,\rho,\beta)}{I_1(\breve\alpha,\tilde\alpha,\rho)}. \tag{5}$$
In Theorem 1 we establish the consistency of Λn,k(τ, δ) for Λ(τ, ρ, β) as √k b(n/k)b̃(n/k) → ∞. Further, we have that the function β ↦ Λ(τ, ρ, β) is monotone, as stated in the following lemma.
Lemma 1 The function β ↦ Λ(τ, ρ, β) is decreasing for β ∈ (−∞, 0), with
$$\lim_{\beta\to 0^-}\Lambda(\tau,\rho,\beta) = -\frac{\tau_2-\tau_1}{\tau_2-\tau_3} \qquad\text{and}\qquad \lim_{\beta\to-\infty}\Lambda(\tau,\rho,\beta) = +\infty.$$
The consistency of Λn,k(τ, δ) for Λ(τ, ρ, β) together with the result from Lemma 1 motivates the estimating equation Λn,k(τ, δ) = Λ(τ, ρ, β), from which the estimator for β can be obtained by inversion:
$$\hat\beta_{n,k}(\tau,\delta,\rho) := \Lambda^{\leftarrow}(\tau,\rho,\Lambda_{n,k}(\tau,\delta)),$$
provided Λn,k(τ, δ) > −(τ2 − τ1)/(τ2 − τ3).
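Since β ↦ Λ(τ, ρ, β) is strictly decreasing (Lemma 1), the inversion can be carried out with any scalar root finder; the authors use the NAG routine C05ADF, and the following hypothetical Python sketch with scipy.optimize.brentq illustrates the same idea (the function names and the bracketing interval are our own choices).

```python
import numpy as np
from scipy.optimize import brentq

def Lambda(tau, rho, beta):
    # The function Lambda(tau, rho, beta) defined above.
    t1, t2, t3 = tau
    a, c = 1.0 - rho, 1.0 - rho - beta
    num = c ** t2 * a ** t1 - c ** t1 * a ** t2
    den = c ** t3 * a ** t2 - c ** t2 * a ** t3
    return c ** (t3 - t1) * num / den

def beta_hat(tau, rho, lambda_nk, lower=-50.0):
    # Lambda(tau, rho, .) decreases from +infinity (beta -> -infinity)
    # to -(tau2 - tau1)/(tau2 - tau3) (beta -> 0-), so a sign change is
    # bracketed whenever Lambda_{n,k} lies in the admissible range.
    t1, t2, t3 = tau
    if lambda_nk <= -(t2 - t1) / (t2 - t3):
        return np.nan  # estimator undefined for this value of Lambda_{n,k}
    return brentq(lambda b: Lambda(tau, rho, b) - lambda_nk, lower, -1e-8)

tau = (0.75, 1.0, 1.25)
print(beta_hat(tau, -1.0, Lambda(tau, -1.0, -0.5)))  # recovers beta = -0.5
```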
3 Main asymptotic results

3.1 Consistency
First, we establish consistency of the proposed class of estimators for β under condition (R3 ), and for
appropriately chosen sequences k and n.
Theorem 1 Let X1, . . . , Xn be i.i.d. random variables according to a distribution satisfying (R3). Then, if k → ∞ such that k/n → 0 and √k b(n/k)b̃(n/k) → ∞, we have that Λn,k(τ, δ) →P Λ(τ, ρ, β) and that β̂n,k(τ, δ, ρ) →P β. Further, if ρ̂n,k̆ is a consistent estimator for ρ, then also β̂n,k(τ, δ, ρ̂n,k̆) →P β.
The recent extreme value literature contains several consistent estimators for the second order parameter ρ that work very well in practice; see Gomes et al. (2002), Fraga Alves et al. (2003), Ciuperca and Mercadier (2010), and Goegebeur et al. (2010). In these papers it is shown that consistency of the estimators for ρ can be obtained when the sequence k̆ satisfies k̆ → ∞ with k̆/n → 0 and √k̆ b(n/k̆) → ∞. Moreover, if additionally √k̆ b(n/k̆)b̃(n/k̆) → λ1 and √k̆ b²(n/k̆) → λ2, we can obtain asymptotic normality for ρ̂n,k̆, when appropriately normalized. In the framework of Theorem 1, k̆ can be taken equal to k, the number of extreme values used in the estimation of β. Indeed, √k b(n/k)b̃(n/k) → ∞ implies that √k b(n/k) → ∞, and hence ρ̂n,k will be consistent for ρ, though for such a sequence we may no longer guarantee asymptotic normality of the normalized ρ̂n,k.
3.2 Asymptotic normality
In this section we will show that our class of estimators for the third order parameter, when appropriately
normalized, converges in distribution to a normal random variable. Similarly to the estimation of first
and second order tail parameters, where the derivation of the asymptotic normality required a second
and a third order condition on the tail behavior, respectively, we impose here the so-called fourth order
condition, introduced in Fraga Alves et al. (2006).
Denote
$$R_{\rho,\beta,\eta}(x) := \int_1^x y^{\rho-1}\int_1^y u^{\beta-1}\int_1^u s^{\eta-1}\,ds\,du\,dy = \frac{1}{\eta}\left[\frac{1}{\beta+\eta}\left(\frac{x^{\rho+\beta+\eta}-1}{\rho+\beta+\eta} - \frac{x^{\rho}-1}{\rho}\right) - \frac{1}{\beta}\left(\frac{x^{\rho+\beta}-1}{\rho+\beta} - \frac{x^{\rho}-1}{\rho}\right)\right], \quad x > 0,\ \rho,\beta,\eta < 0.$$
Fourth order condition (R4) There exist a positive real parameter γ, negative real parameters ρ, β and η, and functions b, b̃ and b̆ with b(t) → 0, b̃(t) → 0 and b̆(t) → 0 for t → ∞, all of constant sign for large values of t, such that
$$\lim_{t\to\infty}\frac{1}{\breve b(t)}\left[\frac{1}{\tilde b(t)}\left(\frac{\ln U(tx) - \ln U(t) - \gamma\ln x}{b(t)} - D_\rho(x)\right) - H_{\rho,\beta}(x)\right] = R_{\rho,\beta,\eta}(x), \quad \forall x > 0.$$
It is shown in Fraga Alves et al. (2006) that |b̆| is regularly varying with index of regular variation η. See
also Wang and Cheng (2005).
We start with the expansion of the statistic Tn,k (K) under (R4 ). This result is more general than what
is needed in the present paper, but is of interest on its own as such an expansion for a general kernel
statistic is to date not available.
We impose the following conditions on the kernel function K.
Condition (K) Let K be a function defined on (0, 1) such that

(i) $K(t) = \frac{1}{t}\int_0^t u(v)\,dv$ for some function u satisfying $(k+1)\int_{(j-1)/(k+1)}^{j/(k+1)}u(t)\,dt \le f\!\left(\frac{j}{k+1}\right)$ for some positive continuous and integrable function f defined on (0, 1),

(ii) $\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right) = \mu(K) + o(1/\sqrt{k})$ for k → ∞,

(iii) $\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} = I_1(K,\rho) + o(1/\sqrt{k})$ for k → ∞,

(iv) $\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho-\beta} = I_1(K,\rho+\beta) + o(1/\sqrt{k})$ for k → ∞,

(v) σ²(K) < ∞,

(vi) $\max_{i\in\{1,\ldots,k\}}K\!\left(\frac{i}{k+1}\right) = o(\sqrt{k})$ for k → ∞,

(vii) $\int_0^1 |K(u)|\,u^{|\rho|-1-\varepsilon}\,du < \infty$ for some ε > 0.
Denote
$$I_3(K,\rho,\beta,\eta) := \frac{1}{\eta}\int_0^1 K(u)u^{-\rho}\left[\frac{u^{-\beta-\eta}-1}{\beta+\eta} - \frac{u^{-\beta}-1}{\beta}\right]du.$$
Theorem 2 Let X1 , . . . , Xn be i.i.d. random variables according to a distribution satisfying (R4 ). If
further (K) holds, then for k → ∞ such that k/n → 0 we have
$$\begin{aligned}T_{n,k}(K) \overset{D}{=}\ & \gamma\mu(K) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}}(1+o_P(1)) + b(Y_{n-k,n})I_1(K,\rho) + b(Y_{n-k,n})O_P(1/\sqrt{k})\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})I_2(K,\rho,\beta) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})O_P(1/\sqrt{k})\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})I_3(K,\rho,\beta,\eta)(1+o_P(1)),\end{aligned}$$
where Nk(K) is an asymptotically standard normally distributed random variable.
For the log-weight kernel Lα we have
$$I_3(L_\alpha,\rho,\beta,\eta) = \frac{1}{\eta}\left\{\frac{1}{\beta+\eta}\left[\frac{1}{(1-\rho-\beta-\eta)^{\alpha+1}} - \frac{1}{(1-\rho)^{\alpha+1}}\right] - \frac{1}{\beta}\left[\frac{1}{(1-\rho-\beta)^{\alpha+1}} - \frac{1}{(1-\rho)^{\alpha+1}}\right]\right\}.$$
We now establish the asymptotic normality of our estimator. In this we will use the following notation:
$$\begin{aligned}
I_3(\alpha,\tilde\alpha,\rho,\beta,\eta) &:= I_3(L_\alpha,\rho,\beta,\eta) - I_3(L_{\tilde\alpha},\rho,\beta,\eta),\\
c_2(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta,\eta) &:= \frac{I_3(\alpha,\tilde\alpha,\rho,\beta,\eta) - \psi(\alpha,\tilde\alpha,\breve\alpha,\rho)I_3(\breve\alpha,\tilde\alpha,\rho,\beta,\eta)}{I_1(\breve\alpha,\tilde\alpha,\rho)},\\
c_3(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta) &:= \frac{I_2(\alpha,\tilde\alpha,\rho,\beta)I_2(\breve\alpha,\tilde\alpha,\rho,\beta) - \psi(\alpha,\tilde\alpha,\breve\alpha,\rho)I_2^2(\breve\alpha,\tilde\alpha,\rho,\beta)}{I_1^2(\breve\alpha,\tilde\alpha,\rho)},\\
V_k(\tau_1,\tau_2,\delta,\gamma,\rho) &:= N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho) - N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho),\\
d_1(\tau_1,\tau_2,\delta,\rho,\beta) &:= c_1(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta),\\
d_2(\tau_1,\tau_2,\delta,\rho,\beta,\eta) &:= c_2(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta,\eta) - c_2(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta,\eta),\\
d_3(\tau_1,\tau_2,\delta,\rho,\beta) &:= c_3(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_3(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta),\\
V_k(\tau,\delta,\gamma,\rho,\beta) &:= \frac{V_k(\tau_1,\tau_2,\delta,\gamma,\rho) - \Lambda(\tau,\rho,\beta)V_k(\tau_2,\tau_3,\delta,\gamma,\rho)}{d_1(\tau_2,\tau_3,\delta,\rho,\beta)},\\
d_2(\tau,\delta,\rho,\beta,\eta) &:= \frac{d_2(\tau_1,\tau_2,\delta,\rho,\beta,\eta) - \Lambda(\tau,\rho,\beta)d_2(\tau_2,\tau_3,\delta,\rho,\beta,\eta)}{d_1(\tau_2,\tau_3,\delta,\rho,\beta)},\\
d_3(\tau,\delta,\rho,\beta) &:= \frac{d_3(\tau_1,\tau_2,\delta,\rho,\beta) - \Lambda(\tau,\rho,\beta)d_3(\tau_2,\tau_3,\delta,\rho,\beta)}{d_1(\tau_2,\tau_3,\delta,\rho,\beta)},\\
v^2(\tau,\delta,\gamma,\rho,\beta) &:= \mathrm{Avar}(V_k(\tau,\delta,\gamma,\rho,\beta));
\end{aligned}$$
for Nk (α, α̃, ᾰ, γ, ρ) we refer to (8).
Theorem 3 Let X1 , . . . , Xn be i.i.d. random variables according to a distribution satisfying (R4 ). Then,
if k → ∞ such that k/n → 0, √k b(n/k)b̃(n/k) → ∞, √k b(n/k)b̃(n/k)b̆(n/k) → λ1 and √k b(n/k)b̃²(n/k) → λ2, we have that
$$\sqrt{k}\,b(n/k)\tilde b(n/k)\left[\Lambda_{n,k}(\tau,\delta) - \Lambda(\tau,\rho,\beta)\right] \overset{D}{\to} N\!\left(\lambda_1 d_2(\tau,\delta,\rho,\beta,\eta) - \lambda_2 d_3(\tau,\delta,\rho,\beta),\ v^2(\tau,\delta,\gamma,\rho,\beta)\right).$$
The proof of Theorem 3 is given in Section 6.4, and the derivation of the asymptotic variance can be
found in Section 6.6.
Let Λ′(τ, ρ, β) := dΛ(τ, ρ, β)/dβ and set
$$\tilde d_2(\tau,\delta,\rho,\beta,\eta) := \frac{d_2(\tau,\delta,\rho,\beta,\eta)}{\Lambda'(\tau,\rho,\beta)}, \qquad \tilde d_3(\tau,\delta,\rho,\beta) := \frac{d_3(\tau,\delta,\rho,\beta)}{\Lambda'(\tau,\rho,\beta)}, \qquad \tilde v^2(\tau,\delta,\gamma,\rho,\beta) := \frac{v^2(\tau,\delta,\gamma,\rho,\beta)}{[\Lambda'(\tau,\rho,\beta)]^2}.$$
A straightforward application of the delta method establishes the asymptotic normality of the estimators
β̂n,k (τ , δ, ρ) and β̂n,k (τ , δ, ρ̂n,k̆ ) for β. The result is stated formally in the next theorem.
Theorem 4 Let X1 , . . . , Xn be i.i.d. random variables according to a distribution satisfying (R4 ). Then,
if k → ∞ such that k/n → 0, √k b(n/k)b̃(n/k) → ∞, √k b(n/k)b̃(n/k)b̆(n/k) → λ1 and √k b(n/k)b̃²(n/k) → λ2, we have that
$$\sqrt{k}\,b(n/k)\tilde b(n/k)\left[\hat\beta_{n,k}(\tau,\delta,\rho) - \beta\right] \overset{D}{\to} N\!\left(\lambda_1\tilde d_2(\tau,\delta,\rho,\beta,\eta) - \lambda_2\tilde d_3(\tau,\delta,\rho,\beta),\ \tilde v^2(\tau,\delta,\gamma,\rho,\beta)\right).$$
The result continues to hold when ρ in β̂n,k(τ, δ, ρ) is replaced by an external estimator ρ̂n,k̆ which is such that ρ̂n,k̆ − ρ = OP(1/(√k̆ b(n/k̆))) when √k̆ b(n/k̆) → ∞, provided
$$\frac{\sqrt{k}\,b(n/k)\tilde b(n/k)}{\sqrt{\breve k}\,b(n/\breve k)} \to 0. \tag{6}$$
Some remarks:
• Many important members of the class of Pareto-type models, like the Fréchet, Burr and Student t distributions, satisfy condition (R4) with b(x) ∼ a1x^ρ, b̃(x) ∼ a2x^ρ and b̆(x) = a3x^ρ as x → ∞, for some constants a1, a2 and a3. For such models one easily obtains from Theorem 4 that the asymptotic mean squared error (AMSE) optimal k is of the order n^{−6ρ/(1−6ρ)}, which is larger than the orders n^{−2ρ/(1−2ρ)} and n^{−4ρ/(1−4ρ)} one typically achieves for the estimation of first and second order tail parameters, respectively. When using this AMSE optimal k sequence for the estimation of β, the rate of convergence in Theorem 4 is of the order n^{ρ/(1−6ρ)}. Also, in this regard, condition (6) can be easily verified to be satisfied when using the AMSE optimal sequences for k and k̆.
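For the reader’s convenience, here is a short sketch of the calculation behind these orders, under the stated assumption that b, b̃ and b̆ are all of order x^ρ (our own reconstruction, not taken verbatim from the paper). From Theorem 4,
$$\mathrm{sd}(\hat\beta_{n,k}) \asymp \frac{1}{\sqrt{k}\,b(n/k)\tilde b(n/k)} \asymp k^{-1/2}(n/k)^{-2\rho}, \qquad \mathrm{bias}(\hat\beta_{n,k}) \asymp \breve b(n/k) \asymp \tilde b(n/k) \asymp (n/k)^{\rho}.$$
Balancing squared bias against variance gives
$$(n/k)^{2\rho} \asymp k^{-1}(n/k)^{-4\rho} \iff k^{1-6\rho} \asymp n^{-6\rho} \iff k \asymp n^{-6\rho/(1-6\rho)},$$
and substituting this k back yields $\sqrt{k}\,b(n/k)\tilde b(n/k) \asymp n^{-\rho/(1-6\rho)}$, i.e. an estimation error of order $n^{\rho/(1-6\rho)}$.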
• The construction of confidence intervals for the third order parameter β, based on the normal limiting
distribution of Theorem 4, requires that one has an estimator for the normalizing function b(n/k)b̃(n/k)
available. The function b(n/k) can be estimated by using the exponential regression model for scaled
log-spacings of successive order statistics presented in Beirlant et al. (1999), or, when one considers the
special case where b(x) = axρ , by the estimator for the second order scale parameter a of Caeiro and
Gomes (2006) combined with an estimator for the second order rate parameter ρ (see e.g. Gomes et al.,
2002, Goegebeur et al., 2010). However, so far no estimator for the function b̃ is available; in fact this is a
topic of current research. In case one is interested in a confidence interval for β one could use a bootstrap
procedure as an alternative to the use of Theorem 4.
4 Numerical and simulation results
In this section we illustrate the asymptotic and the finite sample behavior of some specific examples
of β̂(τ , δ, ρ). We consider the parameter settings δ ′ = (0.05, 0.1) combined with τ ′ = (0.5, 0.6, 0.7),
τ ′ = (0.6, 0.7, 0.8) and τ ′ = (0.7, 0.8, 0.9), and δ ′ = (0.25, 0.5) combined with τ ′ = (0.75, 1, 1.25),
τ ′ = (1, 1.25, 1.5) and τ ′ = (1.25, 1.5, 1.75). These values for the tuning parameters gave a good performance for a wide range of Pareto-type models in a preliminary evaluation of the asymptotic mean
squared errors. It should however be mentioned that these values are in no way optimally selected, so
improvements over the results presented in this paper are still possible.
In Figure 1 we show the asymptotic bias components d̃2(τ, δ, ρ, β, η) and d̃3(τ, δ, ρ, β), and the asymptotic standard deviation ṽ(τ, δ, γ, ρ, β), as functions of β for Pareto-type models with ρ = β = η. The bias components are increasing in β, a result that is in line with the estimation of the second order parameter. The parameter settings δ′ = (0.05, 0.1) with τ′ = (0.7, 0.8, 0.9) and δ′ = (0.25, 0.5) with τ′ = (1.25, 1.5, 1.75) have the smallest bias components of the parameter configurations under consideration, although it should be mentioned that the actual performance with respect to asymptotic bias will depend on the way in which the two asymptotic bias components combine in the overall bias.
As for the asymptotic standard deviation we see that none of the parameter settings performs uniformly
best. As expected, the asymptotic standard deviations are considerably larger than those reported in
Fraga Alves et al. (2003) and Goegebeur et al. (2010) in the framework of the estimation of the second
order parameter.
In order to get a better idea about the actual large sample behavior of the estimators for β we consider
two specific Pareto-type models:
1. The Burr distribution. As illustrated in Rodriguez (1977), the Burr model is very flexible, and hence has good potential for practical modeling purposes; see Beirlant et al. (1998) for an application in the actuarial context. For the Burr distribution, denoted Burr(ζ, λ, δ),
$$1 - F(x) = \left(\frac{\zeta}{\zeta + x^{\delta}}\right)^{\lambda}, \quad x > 0;\ \zeta,\lambda,\delta > 0,$$
from which one easily derives that γ = 1/(λδ). Further, the model satisfies condition (R4) with ρ = −1/λ, ρ = β = η, and
$$b(x) = \gamma\frac{x^{\rho}}{1-x^{\rho}}, \qquad \tilde b(x) = \rho\frac{x^{\rho}}{1-x^{\rho}} \qquad\text{and}\qquad \breve b(x) = 2\rho\frac{x^{\rho}}{1-x^{\rho}}.$$
We consider the cases (ζ, λ, δ) = (1, 2, 0.5), (1, 1, 1), and (1, 0.5, 2), corresponding to γ = 1 and β = −0.5, −1, and −2, respectively.
2. The ‘modified’ Pareto distribution, with a tail quantile function given by
$$U(x) = Cx^{\gamma}e^{D_0 + D_1x^{\rho} + D_2x^{\rho+\beta} + D_3x^{\rho+\beta+\eta}},$$
with C > 0 and D1, D2, D3 ≠ 0. This model satisfies (R4) provided η > β, with
$$b(x) = \frac{\rho D_1x^{\rho}}{1 - \tilde C_1x^{\beta} - \tilde C_2x^{\beta+\eta}}, \qquad \tilde b(x) = \frac{\beta(\rho+\beta)D_2x^{\beta}}{\rho D_1(1 - \tilde C_3x^{\eta})} \qquad\text{and}\qquad \breve b(x) = \frac{\eta(\beta+\eta)(\rho+\beta+\eta)D_3x^{\eta}}{\beta(\rho+\beta)D_2},$$
in which
$$\tilde C_1 := \frac{(\rho+\beta)D_2}{\rho D_1}, \qquad \tilde C_2 := \frac{(\rho+\beta+\eta)D_3}{\rho D_1}, \qquad \tilde C_3 := \frac{(\beta+\eta)(\rho+\beta+\eta)D_3}{\beta(\rho+\beta)D_2}.$$
Note that this model allows the rate parameters ρ, β and η to be different from each other. We consider the parameter setting γ = 0.1, ρ = −1, β = −0.5, η = −0.25, C = 1, D0 = 0, D1 = 1, D2 = −1 and D3 = −0.75; a small sampling sketch for both models is given below.
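The following minimal Python sketch (our own illustration; function names hypothetical) shows how samples from these two models can be generated: by inverting the Burr survival function, and, for the modified Pareto model, by applying the tail quantile function U to unit Pareto variates (assuming U is increasing over the relevant range for the chosen parameters).

```python
import numpy as np

def rburr(size, zeta, lam, delta, rng):
    # Invert 1 - F(x) = (zeta / (zeta + x^delta))^lam: with V ~ U(0, 1),
    # X = (zeta (V^{-1/lam} - 1))^{1/delta} has distribution function F.
    v = rng.uniform(size=size)
    return (zeta * (v ** (-1.0 / lam) - 1.0)) ** (1.0 / delta)

def rmodpareto(size, g, rho, beta, eta, C, D0, D1, D2, D3, rng):
    # X = U(Y) with Y unit Pareto has tail quantile function U.
    y = 1.0 / rng.uniform(size=size)
    return C * y ** g * np.exp(D0 + D1 * y ** rho + D2 * y ** (rho + beta)
                               + D3 * y ** (rho + beta + eta))

rng = np.random.default_rng(2)
xb = rburr(10000, 1.0, 2.0, 0.5, rng)  # Burr(1, 2, 0.5): gamma = 1, beta = -0.5
xm = rmodpareto(10000, 0.1, -1.0, -0.5, -0.25, 1.0, 0.0, 1.0, -1.0, -0.75, rng)
```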
In Figures 2-4 we show the first order approximation to the asymptotic bias, standard deviation and
mean squared error, respectively, as a function of k when n = 10, 000, for all parameter settings and Burr
distributions under consideration. The asymptotic bias shown in Figure 2 is, as expected, in absolute
value increasing in k. For the parameter combinations with δ ′ = (0.05, 0.1), the setting τ ′ = (0.6, 0.7, 0.8)
provides overall a good performance, whereas for δ ′ = (0.25, 0.5) the best performance is obtained with
τ ′ = (0.75, 1, 1.25). Concerning the asymptotic standard deviation shown in Figure 3 we observe that
none of the considered parameter combinations works uniformly best, a result that is in line with Figure
1, though the cases δ ′ = (0.05, 0.1), τ ′ = (0.6, 0.7, 0.8) and δ ′ = (0.25, 0.5), τ ′ = (0.75, 1, 1.25) provide
a good overall performance. Figure 4 shows the AMSE of β̂n,k(τ, δ, ρ) as a function of k. As expected,
the parameter combinations δ ′ = (0.05, 0.1), τ ′ = (0.6, 0.7, 0.8) and δ ′ = (0.25, 0.5), τ ′ = (0.75, 1, 1.25)
work best, with δ ′ = (0.05, 0.1), τ ′ = (0.6, 0.7, 0.8) performing slightly better than δ ′ = (0.25, 0.5),
τ ′ = (0.75, 1, 1.25). In Figure 5 we show the corresponding results for the modified Pareto distribution.
Similar to the Burr distribution, also here the parameter settings δ ′ = (0.05, 0.1), τ ′ = (0.6, 0.7, 0.8) and
δ ′ = (0.25, 0.5), τ ′ = (0.75, 1, 1.25) give a good performance among the combinations under consideration.
It is a well known fact that when estimating higher order parameters, the differences between the asymptotic and the finite sample results may become considerable. We refer to Draisma (2000) for a discussion
of the difficulties of estimating asymptotic behavior correctly in finite samples. Also, Gomes et al. (2002)
commented on the discrepancies between the asymptotic and finite sample results when estimating the
second order rate parameter ρ. Therefore, we also illustrate the properties of the proposed estimator
with a small simulation study. In this simulation experiment we include the four distributions considered
above, as well as the generalized Pareto distribution. For the generalized Pareto distribution, denoted
GPD(σ, γ),
$$1 - F(x) = \left(1 + \frac{\gamma x}{\sigma}\right)^{-1/\gamma}, \quad x > 0;\ \sigma,\gamma > 0.$$
This distribution can be easily verified to satisfy condition (R4 ) with ρ = β = η = −γ. We set
(σ, γ) = (1, 0.5) and (1, 1). For all the distributions under consideration we generated 1,000 samples
of size n = 10, 000, and computed the estimators for β with δ ′ = (0.05, 0.1), τ ′ = (0.6, 0.7, 0.8) and
δ ′ = (0.25, 0.5), τ ′ = (0.75, 1, 1.25), for k = 10 to k = 9, 990, in steps of 10. The authors developed a
Fortran program, built around the NAG library subroutine C05ADF, to obtain β̂n,k (τ , δ, ρ) by numerical inversion of the estimating equation Λn,k (τ , δ) = Λ(τ , ρ, β). The program can be obtained upon
request from the authors. In the simulation experiment, the parameter ρ is estimated with the power
kernel estimator introduced in Goegebeur et al. (2010), computed at (i) k̆ = k (Figures 6 and 7) and
(ii) k̆ = n^0.95 in case ρ ≥ −1 and k̆ = n^0.975 otherwise (Figures 8 and 9). In case (ii) we only report
the results for the Burr and Generalized Pareto distributions, which allow an explicit derivation of the
order of the AMSE optimal sequence for k, and the proposed sequences for k̆ can be easily verified to
satisfy (6) in Theorem 4. In Figures 6 to 9 we show the median (left column) and the empirical mean squared error (right column) as a function of k. We motivate our choice for the median by the fact that the sampling distribution of β̂n,k(τ, δ, ρ̂n,k) is extremely skewed and also has a large variance; the latter being in line with the results from the above asymptotic evaluation. The empirical mean squared error is
computed as the sample mean (over the simulation runs) of the squared deviations between the estimate
and the true parameter value. The sampling distribution of both estimators seems to be reasonably well
centered at the true value, though, as expected, within the Burr(ζ, λ, δ) and GPD(σ, γ) families the case
β = −0.5 is more difficult than the cases with a smaller β value. The estimators for β with δ ′ = (0.05, 0.1),
τ ′ = (0.6, 0.7, 0.8) and δ ′ = (0.25, 0.5), τ ′ = (0.75, 1, 1.25) perform about equally well in terms of location
and empirical MSE, with no clear preference for one of the settings. By comparing Figures 6 and 7 with
Figures 8 and 9, we see that taking k̆ = n^0.95 in case ρ ≥ −1 and k̆ = n^0.975 otherwise works slightly
better than using k̆ = k, both in terms of bias and empirical MSE.
The proposed settings for the tuning parameters δ and τ gave overall a good performance on the simulated
data sets. As mentioned above, the proposed values of these parameters are in no way optimally selected,
and further work could be done on the development of automatic, data driven, procedures for setting
these parameters. In practical applications of extreme value methodology, the tuning parameters are
typically set at values that provide reasonably stable sample paths of the estimators under consideration,
as it is common practice to draw inferences about the value of the parameter of interest on the basis of the stable
part of the sample path. This can be done e.g. by using a stability criterion, such as the one discussed
in Caeiro and Gomes (2006) in the framework of the estimation of second order parameters. This is a
topic of future research.
5 Conclusion
In this paper we introduced a class of estimators for the third order parameter in extreme value statistics
and established its consistency and asymptotic normality under some regularity conditions. The presented
theoretical results are of interest on their own, but will also form the basis for research on the AMSE
optimal selection of the number of upper order statistics when estimating the second order parameter and
the development of bias-corrected estimators for the second order parameter. Work on these extensions
is in progress.
6 Appendix

6.1 Proof of Lemma 1
The two limits of Λ(τ , ρ, β) are trivial to obtain. In order to show that β 7→ Λ(τ , ρ, β) is a decreasing
function for β ∈ (−∞, 0) we study its derivative
$$\frac{d\Lambda(\tau,\rho,\beta)}{d\beta} = \frac{(1-\rho-\beta)^{\tau_2+\tau_3-1}(1-\rho)^{\tau_2+\tau_3}f(|\beta|)}{\left[(1-\rho-\beta)^{\tau_3}(1-\rho)^{\tau_2} - (1-\rho-\beta)^{\tau_2}(1-\rho)^{\tau_3}\right]^2},$$
with
$$f(x) := (\tau_3-\tau_1)(1+ax)^{\tau_2-\tau_1} - (\tau_2-\tau_1)(1+ax)^{\tau_3-\tau_1} - (\tau_3-\tau_2), \quad x > 0,$$
and a := 1/(1 − ρ). It now remains to verify that f(x) < 0 for x > 0. We have
$$\lim_{x\to 0^+}f(x) = 0, \qquad \lim_{x\to+\infty}f(x) = -\infty,$$
and
$$f'(x) = a(\tau_2-\tau_1)(\tau_3-\tau_1)(1+ax)^{-\tau_1-1}\left[(1+ax)^{\tau_2} - (1+ax)^{\tau_3}\right],$$
which is negative for x > 0, from which the result follows.
6.2 Proof of Theorem 1
For some arbitrary positive real numbers α, α̃ and ᾰ, we denote
$$N_k(\alpha,\tilde\alpha,\gamma) := \gamma[\sigma(L_\alpha)N_k(L_\alpha) - \sigma(L_{\tilde\alpha})N_k(L_{\tilde\alpha})], \tag{7}$$
$$N_k(\alpha,\tilde\alpha,\breve\alpha,\gamma,\rho) := \frac{N_k(\alpha,\tilde\alpha,\gamma) - \psi(\alpha,\tilde\alpha,\breve\alpha,\rho)N_k(\breve\alpha,\tilde\alpha,\gamma)}{I_1(\breve\alpha,\tilde\alpha,\rho)}. \tag{8}$$
From Proposition 1 we have the following asymptotic distributional representation:
$$\frac{T_{n,k}(L_\alpha) - T_{n,k}(L_{\tilde\alpha})}{b(Y_{n-k,n})} \overset{D}{=} I_1(\alpha,\tilde\alpha,\rho) + \frac{N_k(\alpha,\tilde\alpha,\gamma)}{\sqrt{k}\,b(Y_{n-k,n})} + \tilde b(Y_{n-k,n})I_2(\alpha,\tilde\alpha,\rho,\beta)(1+o_P(1)) + O_P\!\left(\frac{1}{\sqrt{k}}\right).$$
Note that √k b(n/k)b̃(n/k) → ∞ implies that √k b(n/k) → ∞, and hence, using the property Yn−k,n = (n/k)(1 + oP(1)) and the regular variation of the function b, we obtain √k b(Yn−k,n) = √k b(n/k)(1 + oP(1)) → ∞. Also, from the above property concerning Yn−k,n and the regular variation of b̃ we obtain b̃(Yn−k,n) = b̃(n/k)(1 + oP(1)) → 0. A straightforward application of Taylor’s theorem then gives
$$\Psi_{n,k}(\alpha,\tilde\alpha,\breve\alpha) \overset{D}{=} \psi(\alpha,\tilde\alpha,\breve\alpha,\rho) + \frac{N_k(\alpha,\tilde\alpha,\breve\alpha,\gamma,\rho)(1+o_P(1))}{\sqrt{k}\,b(Y_{n-k,n})} + \tilde b(Y_{n-k,n})c_1(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta)(1+o_P(1)),$$
where c1 (α, α̃, ᾰ, ρ, β) is defined in (5). For the specific construction considered in Theorem 1 we then
have
$$\begin{aligned}\frac{\Psi_{n,k}(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2) - \Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2)}{\tilde b(Y_{n-k,n})} \overset{D}{=}\ & [c_1(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta)](1+o_P(1))\\ & + \frac{[N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho) - N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho)](1+o_P(1))}{\sqrt{k}\,b(Y_{n-k,n})\tilde b(Y_{n-k,n})},\end{aligned}$$
and hence, if √k b(n/k)b̃(n/k) → ∞,
$$\frac{\Psi_{n,k}(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2) - \Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2)}{\tilde b(Y_{n-k,n})} \overset{P}{\to} c_1(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta).$$
A similar result holds for the denominator of Λn,k(τ, δ), and thus Λn,k(τ, δ) →P Λ(τ, ρ, β). The function Λ← is continuous in its last argument and hence the consistency of β̂n,k(τ, δ, ρ) for β is obtained in a straightforward way from the continuous mapping theorem. In a similar way we obtain the consistency of β̂n,k(τ, δ, ρ̂n,k̆) for β from the continuous mapping theorem.
6.3 Proof of Theorem 2
We state the following lemma, taken from Goegebeur et al. (2010), which provides a result that is useful
in the derivation of the asymptotic expansion of Tn,k (K) given in Theorem 2.
Lemma 2 (Goegebeur et al., 2010) Denote by E1, . . . , Ek standard exponential random variables and by U1,k ≤ · · · ≤ Uk,k the order statistics of a random sample of size k from U(0, 1). Assume that $\int_0^1|K(u)|\,du < \infty$ in case |ρ| ≥ 1 and that $\int_0^1|K(u)|u^{|\rho|-1-\varepsilon}\,du < \infty$ for some ε > 0 in case |ρ| < 1. Then, for k → ∞,
$$S_k = \frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)\left(U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right)E_j = O_P(1/\sqrt{k}).$$
Proof of Theorem 2
From the inverse probability integral transform we have that Xj,n =D U(Yj,n), j = 1, . . . , n. Further, it is well known that Yn−j+1,n/Yn−k,n =D Yk−j+1,k, j = 1, . . . , k, independently of Yn−k,n. Using these properties and condition (R4) we can write, for j = 1, . . . , k,
$$\begin{aligned}\ln X_{n-j+1,n} - \ln X_{n-k,n} \overset{D}{=}\ & \ln U(Y_{k-j+1,k}Y_{n-k,n}) - \ln U(Y_{n-k,n})\\ =\ & \gamma\ln Y_{k-j+1,k} + b(Y_{n-k,n})D_\rho(Y_{k-j+1,k}) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})H_{\rho,\beta}(Y_{k-j+1,k})\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})R_{\rho,\beta,\eta}(Y_{k-j+1,k}) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})B_{n,k}(j),\end{aligned}$$
where
$$B_{n,k}(j) := \frac{1}{\breve b(Y_{n-k,n})}\left[\frac{1}{\tilde b(Y_{n-k,n})}\left(\frac{\ln U(Y_{k-j+1,k}Y_{n-k,n}) - \ln U(Y_{n-k,n}) - \gamma\ln Y_{k-j+1,k}}{b(Y_{n-k,n})} - D_\rho(Y_{k-j+1,k})\right) - H_{\rho,\beta}(Y_{k-j+1,k})\right] - R_{\rho,\beta,\eta}(Y_{k-j+1,k}),$$
and hence
$$\begin{aligned}Z_j \overset{D}{=}\ & \gamma j(\ln Y_{k-j+1,k} - \ln Y_{k-j,k}) + b(Y_{n-k,n})\,j(D_\rho(Y_{k-j+1,k}) - D_\rho(Y_{k-j,k}))\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,j(H_{\rho,\beta}(Y_{k-j+1,k}) - H_{\rho,\beta}(Y_{k-j,k}))\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,j(R_{\rho,\beta,\eta}(Y_{k-j+1,k}) - R_{\rho,\beta,\eta}(Y_{k-j,k})) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\zeta_j,\end{aligned}$$
where ζj := j(Bn,k(j) − Bn,k(j + 1)), with the convention that Bn,k(k + 1) = 0.
This then gives
$$\begin{aligned}T_{n,k}(K) \overset{D}{=}\ & \gamma\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)j(\ln Y_{k-j+1,k} - \ln Y_{k-j,k})\\ & + b(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)j(D_\rho(Y_{k-j+1,k}) - D_\rho(Y_{k-j,k}))\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)j(H_{\rho,\beta}(Y_{k-j+1,k}) - H_{\rho,\beta}(Y_{k-j,k}))\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)j(R_{\rho,\beta,\eta}(Y_{k-j+1,k}) - R_{\rho,\beta,\eta}(Y_{k-j,k}))\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)\zeta_j\\ =:\ & T_{n,k}^{(1)}(K) + b(Y_{n-k,n})T_{n,k}^{(2)}(K) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})T_{n,k}^{(3)}(K)\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})T_{n,k}^{(4)}(K) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})T_{n,k}^{(5)}(K).\end{aligned}$$
We now analyze each term in turn. The derivations for the terms $T_{n,k}^{(1)}(K)$ and $T_{n,k}^{(2)}(K)$ are identical to those in Goegebeur et al. (2010). We repeat the argumentation here briefly as it is also useful for the development of the subsequent terms.

For the first term we have the following. Let E1, . . . , Ek denote a random sample of size k from the unit exponential distribution, and denote by E1,k ≤ · · · ≤ Ek,k the corresponding order statistics. From the Rényi representation (Rényi, 1953) we obtain
$$\ln Y_{k-j+1,k} - \ln Y_{k-j,k} \overset{D}{=} E_{k-j+1,k} - E_{k-j,k} \overset{D}{=} \frac{E_j}{j}.$$
Thus
$$T_{n,k}^{(1)}(K) = \gamma\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right) + \gamma\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)(E_j-1) \overset{D}{=} \gamma\mu(K) + o(1/\sqrt{k}) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}} = \gamma\mu(K) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}}(1+o_P(1)),$$
where
$$N_k(K) := \sqrt{k}\,\frac{\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)(E_j-1)}{\sigma(K)}, \tag{9}$$
which is, under condition (K)(vi), an asymptotically standard normal random variable by the Lindeberg–Feller central limit theorem; see also Chernoff et al. (1967).
As for $T_{n,k}^{(2)}(K)$ we obtain, by the mean value theorem and the Rényi representation, the following expression:
$$T_{n,k}^{(2)}(K) \overset{D}{=} \frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)j\,\frac{e^{\rho E_{k-j+1,k}} - e^{\rho E_{k-j,k}}}{\rho} \overset{D}{=} \frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)E_j e^{\rho Q_{j,k}},$$
where Qj,k is a random value between Ek−j,k and Ek−j+1,k, and hence
$$T_{n,k}^{(2)}(K) \overset{D}{=} \frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho}E_j + \frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)\left[e^{\rho Q_{j,k}} - \left(\frac{j}{k+1}\right)^{-\rho}\right]E_j =: T_{n,k}^{(2,1)}(K) + T_{n,k}^{(2,2)}(K).$$
The term $T_{n,k}^{(2,1)}(K)$ can be analyzed in a similar way as $T_{n,k}^{(1)}(K)$:
$$T_{n,k}^{(2,1)}(K) = I_1(K,\rho) + o(1/\sqrt{k}) + \frac{\sigma(K,\rho)}{\sqrt{k}}P_k(K,\rho) = I_1(K,\rho) + \frac{\sigma(K,\rho)}{\sqrt{k}}P_k(K,\rho)(1+o_P(1)),$$
where $\sigma^2(K,\rho) := \int_0^1 K^2(u)u^{-2\rho}\,du$ and
$$P_k(K,\rho) := \sqrt{k}\,\frac{\frac{1}{k}\sum_{j=1}^{k}K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho}(E_j-1)}{\sigma(K,\rho)}.$$
The statistic Pk(K, ρ) is a standardized kernel statistic with a kernel function satisfying $\max_{j\in\{1,\ldots,k\}}K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} = o(\sqrt{k})$, and hence we have that Pk(K, ρ) →D N(0, 1) for k → ∞ (Chernoff et al., 1967). The term $T_{n,k}^{(2,2)}(K)$ can be analyzed as follows. Let U1,k ≤ · · · ≤ Uk,k denote the order statistics of a random sample of size k from the uniform (0, 1) distribution. From the inverse probability integral transform we obtain that exp(Ek−j+1,k) = exp(−ln(1 − Uk−j+1,k)) =D exp(−ln Uj,k) = 1/Uj,k, j = 1, . . . , k. Also we have
$$\begin{aligned}\left|e^{\rho Q_{j,k}} - \left(\frac{j}{k+1}\right)^{-\rho}\right| &\le \max\left\{\left|e^{\rho E_{k-j+1,k}} - \left(\frac{j}{k+1}\right)^{-\rho}\right|,\ \left|e^{\rho E_{k-j,k}} - \left(\frac{j}{k+1}\right)^{-\rho}\right|\right\}\\ &\overset{D}{=} \max\left\{\left|U_{j+1,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|,\ \left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|\right\}\\ &\le \max\left\{\left|U_{j+1,k}^{-\rho} - \left(\frac{j+1}{k+1}\right)^{-\rho}\right| + c_{j,k},\ \left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|\right\},\end{aligned}$$
where cj,k := ((j + 1)/(k + 1))^{−ρ} − (j/(k + 1))^{−ρ}. Hence
$$\begin{aligned}|T_{n,k}^{(2,2)}(K)| \le\ & \frac{1}{k}\sum_{j=1}^{k}\left|K\!\left(\frac{j}{k+1}\right)\right|E_j\left|U_{j+1,k}^{-\rho} - \left(\frac{j+1}{k+1}\right)^{-\rho}\right| + \frac{1}{k}\sum_{j=1}^{k}\left|K\!\left(\frac{j}{k+1}\right)\right|c_{j,k}E_j\\ & + \frac{1}{k}\sum_{j=1}^{k}\left|K\!\left(\frac{j}{k+1}\right)\right|E_j\left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|\\ =:\ & T_{n,k}^{(2,2,1)}(K) + T_{n,k}^{(2,2,2)}(K) + T_{n,k}^{(2,2,3)}(K).\end{aligned}$$
By using Lemma 2, the terms $T_{n,k}^{(2,2,1)}(K)$ and $T_{n,k}^{(2,2,3)}(K)$ can both be shown to be OP(1/√k). Concerning $T_{n,k}^{(2,2,2)}(K)$ we have that
$$T_{n,k}^{(2,2,2)}(K) = \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^{k}\left|K\!\left(\frac{j}{k+1}\right)\right|z_{j,k}^{|\rho|-1}E_j,$$
where zj,k is a value between j/(k + 1) and (j + 1)/(k + 1). When |ρ| ≥ 1 we have
$$T_{n,k}^{(2,2,2)}(K) \le \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^{k}\left|K\!\left(\frac{j}{k+1}\right)\right|E_j,$$
from which, by the strong law of large numbers, we obtain that $T_{n,k}^{(2,2,2)}(K) = O_P(1/k)$. In case |ρ| < 1 we have
$$T_{n,k}^{(2,2,2)}(K) \le \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^{k}\left|K\!\left(\frac{j}{k+1}\right)\right|\left(\frac{j}{k+1}\right)^{|\rho|-1}E_j, \tag{10}$$
which, by invoking condition (K)(vii) and the strong law of large numbers, yields that $T_{n,k}^{(2,2,2)}(K) = O_P(1/k)$.

Combining the above results gives
$$T_{n,k}^{(2)}(K) = I_1(K,\rho) + O_P(1/\sqrt{k}).$$
The terms $T_{n,k}^{(3)}(K)$ and $T_{n,k}^{(4)}(K)$ can be analyzed using an analogous line of argumentation, and yield
$$T_{n,k}^{(3)}(K) = I_2(K,\rho,\beta) + O_P(1/\sqrt{k}), \qquad T_{n,k}^{(4)}(K) = I_3(K,\rho,\beta,\eta)(1+o_P(1)).$$
Finally, concerning $T_{n,k}^{(5)}(K)$, we apply condition (K)(i) and obtain
$$T_{n,k}^{(5)}(K) = \frac{1}{k}\sum_{j=1}^{k}\frac{\zeta_j}{j/(k+1)}\int_0^{j/(k+1)}u(v)\,dv = \frac{k+1}{k}\sum_{j=1}^{k}\frac{\zeta_j}{j}\sum_{i=1}^{j}\int_{(i-1)/(k+1)}^{i/(k+1)}u(v)\,dv = \frac{k+1}{k}\sum_{i=1}^{k}\int_{(i-1)/(k+1)}^{i/(k+1)}u(v)\,dv\;B_{n,k}(i),$$
and
$$|T_{n,k}^{(5)}(K)| \le \frac{1}{k}\sum_{i=1}^{k}f\!\left(\frac{i}{k+1}\right)|B_{n,k}(i)|.$$
From Theorem 2.1 in Fraga Alves et al. (2006) we have that for every ε > 0 there exists n0 such that for any n ≥ n0, with arbitrary probability, for i = 1, . . . , k,
$$|B_{n,k}(i)| \le \varepsilon Y_{k-i+1,k}^{\varepsilon+\rho+\beta+\eta} \overset{D}{=} \varepsilon U_{i,k}^{-\rho-\beta-\eta-\varepsilon} \le \varepsilon U_{i,k}^{-\rho-\beta-\eta-\delta},$$
where ε < δ. Thus we have that
$$\sup_{i\in\{1,\ldots,k\}}\frac{|B_{n,k}(i)|}{U_{i,k}^{-\rho-\beta-\eta-\delta}} = o_P(1),$$
whence
$$|T_{n,k}^{(5)}(K)| \le o_P(1)\,\frac{1}{k}\sum_{i=1}^{k}f\!\left(\frac{i}{k+1}\right)U_{i,k}^{-\rho-\beta-\eta-\delta},$$
which, provided δ < −ρ − β − η, is oP(1).
6.4 Proof of Theorem 3
It is easy to verify that Lα satisfies (K), and hence from Theorem 2 we obtain the following asymptotic representation for our statistic under (R4):
$$\begin{aligned}T_{n,k}(L_\alpha) \overset{D}{=}\ & \gamma + \gamma\sigma(L_\alpha)\frac{N_k(L_\alpha)}{\sqrt{k}}(1+o_P(1)) + b(Y_{n-k,n})I_1(L_\alpha,\rho) + b(Y_{n-k,n})O_P(1/\sqrt{k})\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})I_2(L_\alpha,\rho,\beta) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})O_P(1/\sqrt{k})\\ & + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})I_3(L_\alpha,\rho,\beta,\eta)(1+o_P(1)),\end{aligned}$$
and thus
$$\frac{T_{n,k}(L_\alpha) - T_{n,k}(L_{\tilde\alpha})}{b(Y_{n-k,n})} \overset{D}{=} I_1(\alpha,\tilde\alpha,\rho) + \frac{N_k(\alpha,\tilde\alpha,\gamma)(1+o_P(1))}{\sqrt{k}\,b(Y_{n-k,n})} + \tilde b(Y_{n-k,n})I_2(\alpha,\tilde\alpha,\rho,\beta) + \tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})I_3(\alpha,\tilde\alpha,\rho,\beta,\eta)(1+o_P(1)). \tag{11}$$
A straightforward application of Taylor’s theorem provides the following expansion for a ratio of statistics of the form (11):
$$\begin{aligned}\Psi_{n,k}(\alpha,\tilde\alpha,\breve\alpha) \overset{D}{=}\ & \psi(\alpha,\tilde\alpha,\breve\alpha,\rho) + \frac{N_k(\alpha,\tilde\alpha,\breve\alpha,\gamma,\rho)(1+o_P(1))}{\sqrt{k}\,b(Y_{n-k,n})} + \tilde b(Y_{n-k,n})c_1(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta)\\ & + \tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})c_2(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta,\eta)(1+o_P(1)) - \tilde b^2(Y_{n-k,n})c_3(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta)(1+o_P(1)).\end{aligned}$$
Taking a difference of two statistics Ψn,k with appropriately chosen parameters α, α̃ and ᾰ gives
$$\begin{aligned}\frac{\Psi_{n,k}(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2) - \Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2)}{\tilde b(Y_{n-k,n})} \overset{D}{=}\ & d_1(\tau_1,\tau_2,\delta,\rho,\beta) + \frac{V_k(\tau_1,\tau_2,\delta,\gamma,\rho)(1+o_P(1))}{\sqrt{k}\,b(Y_{n-k,n})\tilde b(Y_{n-k,n})}\\ & + \breve b(Y_{n-k,n})d_2(\tau_1,\tau_2,\delta,\rho,\beta,\eta)(1+o_P(1)) - \tilde b(Y_{n-k,n})d_3(\tau_1,\tau_2,\delta,\rho,\beta)(1+o_P(1)),\end{aligned}$$
and, by another application of Taylor’s theorem,
$$\Lambda_{n,k}(\tau,\delta) \overset{D}{=} \Lambda(\tau,\rho,\beta) + \frac{V_k(\tau,\delta,\gamma,\rho,\beta)(1+o_P(1))}{\sqrt{k}\,b(Y_{n-k,n})\tilde b(Y_{n-k,n})} + \breve b(Y_{n-k,n})d_2(\tau,\delta,\rho,\beta,\eta)(1+o_P(1)) - \tilde b(Y_{n-k,n})d_3(\tau,\delta,\rho,\beta)(1+o_P(1)),$$
from which the result of Theorem 3 follows.
6.5 Proof of Theorem 4
The asymptotic normality of the normalized β̂n,k(τ, δ, ρ) can be obtained by a straightforward application of the delta method.

Concerning the asymptotic normality of the normalized β̂n,k(τ, δ, ρ̂n,k̆), first note, by long and tedious calculations, that dβ/dρ exists for all β, ρ < 0, and is continuous in ρ. From Taylor’s theorem we then obtain
$$\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k}) = \hat\beta_{n,k}(\tau,\delta,\rho) + \frac{d\beta}{d\rho}(\hat\rho_{n,\breve k} - \rho)(1+o_P(1)).$$
Thus
$$\sqrt{k}\,b(n/k)\tilde b(n/k)(\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k}) - \beta) = \sqrt{k}\,b(n/k)\tilde b(n/k)(\hat\beta_{n,k}(\tau,\delta,\rho) - \beta) + \frac{d\beta}{d\rho}\sqrt{k}\,b(n/k)\tilde b(n/k)(\hat\rho_{n,\breve k} - \rho)(1+o_P(1)),$$
from which the result of Theorem 4 follows.
6.6 Derivation of the asymptotic variance in Theorem 3
Straightforward calculations lead to
$$\mathrm{Var}(V_k(\tau,\delta,\gamma,\rho,\beta)) = \frac{1}{d_1^2(\tau_2,\tau_3,\delta,\rho,\beta)}\left[\mathrm{Var}(V_k(\tau_1,\tau_2,\delta,\gamma,\rho)) + \Lambda^2(\tau,\rho,\beta)\mathrm{Var}(V_k(\tau_2,\tau_3,\delta,\gamma,\rho)) - 2\Lambda(\tau,\rho,\beta)\mathrm{Cov}(V_k(\tau_1,\tau_2,\delta,\gamma,\rho),V_k(\tau_2,\tau_3,\delta,\gamma,\rho))\right],$$
in which
$$\mathrm{Var}(V_k(\tau_1,\tau_2,\delta,\gamma,\rho)) = \mathrm{Var}(N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho)) + \mathrm{Var}(N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho)) - 2\,\mathrm{Cov}(N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho),N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho))$$
and
$$\begin{aligned}\mathrm{Cov}(V_k(\tau_1,\tau_2,\delta,\gamma,\rho),V_k(\tau_2,\tau_3,\delta,\gamma,\rho)) =\ & \mathrm{Cov}(N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho),N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho))\\ & - \mathrm{Cov}(N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho),N_k(\tau_3-\delta_1,\tau_3,\tau_3-\delta_2,\gamma,\rho))\\ & - \mathrm{Var}(N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho))\\ & + \mathrm{Cov}(N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho),N_k(\tau_3-\delta_1,\tau_3,\tau_3-\delta_2,\gamma,\rho)).\end{aligned}$$
For arbitrary but positive αi, α̃i, ᾰi, i = 1, 2, we have, using (8), that
$$\begin{aligned}\mathrm{Cov}(N_k(\alpha_1,\tilde\alpha_1,\breve\alpha_1,\gamma,\rho),N_k(\alpha_2,\tilde\alpha_2,\breve\alpha_2,\gamma,\rho)) = \frac{1}{I_1(\breve\alpha_1,\tilde\alpha_1,\rho)I_1(\breve\alpha_2,\tilde\alpha_2,\rho)}[&\mathrm{Cov}(N_k(\alpha_1,\tilde\alpha_1,\gamma),N_k(\alpha_2,\tilde\alpha_2,\gamma))\\ & - \psi(\alpha_2,\tilde\alpha_2,\breve\alpha_2,\rho)\mathrm{Cov}(N_k(\alpha_1,\tilde\alpha_1,\gamma),N_k(\breve\alpha_2,\tilde\alpha_2,\gamma))\\ & - \psi(\alpha_1,\tilde\alpha_1,\breve\alpha_1,\rho)\mathrm{Cov}(N_k(\breve\alpha_1,\tilde\alpha_1,\gamma),N_k(\alpha_2,\tilde\alpha_2,\gamma))\\ & + \psi(\alpha_1,\tilde\alpha_1,\breve\alpha_1,\rho)\psi(\alpha_2,\tilde\alpha_2,\breve\alpha_2,\rho)\mathrm{Cov}(N_k(\breve\alpha_1,\tilde\alpha_1,\gamma),N_k(\breve\alpha_2,\tilde\alpha_2,\gamma))], \tag{12}\end{aligned}$$
where, from (7), the covariances are of the form
$$\mathrm{Cov}(N_k(\alpha_1,\tilde\alpha_1,\gamma),N_k(\alpha_2,\tilde\alpha_2,\gamma)) \sim \gamma^2[\sigma(\alpha_1,\alpha_2) - \sigma(\alpha_1,\tilde\alpha_2) - \sigma(\tilde\alpha_1,\alpha_2) + \sigma(\tilde\alpha_1,\tilde\alpha_2)],$$
for k → ∞, with
$$\sigma(\alpha_1,\alpha_2) := \int_0^1 L_{\alpha_1}(u)L_{\alpha_2}(u)\,du = \frac{\Gamma(\alpha_1+\alpha_2+1)}{\Gamma(\alpha_1+1)\Gamma(\alpha_2+1)}.$$
The required variance terms in (12) follow in a straightforward way from the above covariance expressions.
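As with the constants I1, I2 and σ² above, this closed form is easy to check numerically; a minimal hypothetical Python snippet (our own illustration, with scipy assumed):

```python
import numpy as np
from math import gamma
from scipy.integrate import quad

def sigma12(a1, a2):
    # sigma(alpha1, alpha2) = int_0^1 L_{alpha1}(u) L_{alpha2}(u) du
    f = lambda u: ((-np.log(u)) ** a1 / gamma(a1 + 1)) * \
                  ((-np.log(u)) ** a2 / gamma(a2 + 1))
    val, _ = quad(f, 0, 1)
    return val

a1, a2 = 0.7, 1.25
print(sigma12(a1, a2), gamma(a1 + a2 + 1) / (gamma(a1 + 1) * gamma(a2 + 1)))
```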
Acknowledgement
The authors are very grateful to the referees for their careful reading of the paper and their comments, which led to significant improvements of the initial draft.
References
[1] Beirlant, J., Dierckx, G., Goegebeur, Y., Matthys, G., 1999. Tail index estimation and an exponential
regression model. Extremes 2, 177–200.
[2] Beirlant, J., Dierckx, G., Guillou, A., Stărică, C., 2002. On exponential representations of log-spacings of extreme order statistics. Extremes 5, 157–180.
[3] Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J., 2004. Statistics of Extremes – Theory and
Applications. Wiley Series in Probability and Statistics.
[4] Beirlant, J., Goegebeur, Y., Verlaak, R., Vynckier, P., 1998. Burr regression and portfolio segmentation. Insurance: Mathematics and Economics 23, 231–250.
[5] Beirlant, J., Vynckier, P., Teugels, J., 1996. Tail index estimation, Pareto quantile plots, and regression diagnostics. Journal of the American Statistical Association 91, 1659–1667.
[6] Bingham, N.H., Goldie, C.M., Teugels, J.L., 1987. Regular Variation. Cambridge: Cambridge University Press.
[7] Caeiro, F., Gomes, M.I., 2006. A new class of estimators of a “scale” second order parameter.
Extremes 9, 193–211.
[8] Caeiro, F., Gomes, M.I., 2008. Minimum-variance reduced-bias tail index and high quantile estimation. REVSTAT – Statistical Journal 6, 1–20.
[9] Chernoff, H., Gastwirth, J.L., Johns, M.V., 1967. Asymptotic distribution of linear combinations of
functions of order statistics with applications to estimation. Annals of Mathematical Statistics 38,
52–72.
[10] Ciuperca, G., Mercadier, C., 2010. Semi-parametric estimation for heavy tailed distributions. Extremes 13, 55–87.
[11] de Haan, L., Ferreira, A., 2006. Extreme Value Theory: An Introduction. Springer.
[12] de Haan, L., Stadtmüller, U., 1996. Generalized regular variation of second order. Journal of the
Australian Mathematical Society (Series A) 61, 381–395.
[13] Draisma, G., 2000. Parametric and Semi-Parametric Methods in Extreme Value Theory. Tinbergen
Institute Research Series 239, Erasmus Universiteit Rotterdam.
[14] Drees, H., Kaufmann, E., 1998. Selecting the optimal sample fraction in univariate extreme value
estimation. Stochastic Processes and their Applications 75, 149–172.
[15] Feuerverger, A., Hall, P., 1999. Estimating a tail exponent by modelling departure from a Pareto
distribution. The Annals of Statistics 27, 760–781.
[16] Fraga Alves, I., de Haan, L., Lin, T., 2006. Third order extended regular variation. Publications de
l’Institut Mathématique 80, 109–120.
[17] Fraga Alves, M.I., Gomes, M.I., de Haan, L., 2003. A new class of semi-parametric estimators of the
second order parameter. Portugaliae Mathematica 60, 193–213.
[18] Geluk, J.L., de Haan, L., 1987. Regular Variation, Extensions and Tauberian Theorems. CWI Tract
40, Center for Mathematics and Computer Science, Amsterdam.
[19] Gnedenko, B.V., 1943. Sur la distribution limite du terme maximum d’une série aléatoire. Annals of
Mathematics 44, 423–453.
[20] Goegebeur, Y., Beirlant, J., de Wet, T., 2010. Kernel estimators for the second order parameter in
extreme value statistics. Journal of Statistical Planning and Inference 140, 2632–2652.
[21] Gomes, M.I., de Haan, L., Peng, L., 2002. Semi-parametric estimation of the second order parameter
in statistics of extremes. Extremes 5, 387–414.
[22] Gomes, M.I., de Haan, L., Rodrigues, L.H., 2008. Tail index estimation for heavy-tailed models:
accommodation of bias in weighted log-excesses. Journal of the Royal Statistical Society Series B 70,
31–52.
[23] Gomes, M.I., Martins, M.J., 2002. ‘Asymptotically unbiased’ estimators of the tail index based on
external estimation of the second order parameter. Extremes 5, 5–31.
[24] Guillou, A., Hall, P., 2001. A diagnostic for selecting the threshold in extreme value analysis. Journal
of the Royal Statistical Society Series B 63, 293–305.
[25] Hall, P., Welsh, A.H., 1985. Adaptive estimates of parameters of regular variation. The Annals of
Statistics 13, 331–341.
[26] Rényi, A., 1953. On the theory of order statistics. Acta Mathematica Academiae Scientiarum Hungaricae 4, 191–231.
[27] Rodriguez, R.N., 1977. A guide to the Burr type XII distributions. Biometrika 64, 129–134.
[28] Wang, X.Q., Cheng, S.H., 2005. General regular variation of n-th order and the 2nd order Edgeworth
expansion of the extreme value distribution (I). Acta Mathematica Sinica 21, 1121–1130.
Figure 1: Asymptotic behavior of β̂n,k(τ, δ, ρ) for Pareto-type models with γ = 1 and ρ = β = η. Left: parameter setting δ′ = (0.05, 0.1) with τ′ = (0.5, 0.6, 0.7) (solid line), τ′ = (0.6, 0.7, 0.8) (dashed line) and τ′ = (0.7, 0.8, 0.9) (dotted line); right: parameter setting δ′ = (0.25, 0.5) with τ′ = (0.75, 1, 1.25) (solid line), τ′ = (1, 1.25, 1.5) (dashed line) and τ′ = (1.25, 1.5, 1.75) (dotted line). Top row: d̃2(τ, δ, ρ, β, η); middle row: d̃3(τ, δ, ρ, β); bottom row: ṽ(τ, δ, γ, ρ, β) as a function of β.
Figure 2: Asymptotic bias of β̂n,k(τ, δ, ρ) as a function of k when n = 10,000 for the Burr(1,2,0.5) (top), Burr(1,1,1) (middle) and Burr(1,0.5,2) (bottom) distributions. Left: δ′ = (0.05, 0.1) with τ′ = (0.5, 0.6, 0.7) (solid line), τ′ = (0.6, 0.7, 0.8) (dashed line) and τ′ = (0.7, 0.8, 0.9) (dotted line). Right: δ′ = (0.25, 0.5) with τ′ = (0.75, 1, 1.25) (solid line), τ′ = (1, 1.25, 1.5) (dashed line) and τ′ = (1.25, 1.5, 1.75) (dotted line).
Figure 3: Asymptotic standard deviation of β̂n,k(τ, δ, ρ) as a function of k when n = 10,000 for the Burr(1,2,0.5) (top), Burr(1,1,1) (middle) and Burr(1,0.5,2) (bottom) distributions. Left: δ′ = (0.05, 0.1) with τ′ = (0.5, 0.6, 0.7) (solid line), τ′ = (0.6, 0.7, 0.8) (dashed line) and τ′ = (0.7, 0.8, 0.9) (dotted line). Right: δ′ = (0.25, 0.5) with τ′ = (0.75, 1, 1.25) (solid line), τ′ = (1, 1.25, 1.5) (dashed line) and τ′ = (1.25, 1.5, 1.75) (dotted line).
Figure 4: Asymptotic mean squared error of β̂n,k(τ, δ, ρ) as a function of k when n = 10,000 for the Burr(1,2,0.5) (top), Burr(1,1,1) (middle) and Burr(1,0.5,2) (bottom) distributions. Left: δ′ = (0.05, 0.1) with τ′ = (0.5, 0.6, 0.7) (solid line), τ′ = (0.6, 0.7, 0.8) (dashed line) and τ′ = (0.7, 0.8, 0.9) (dotted line). Right: δ′ = (0.25, 0.5) with τ′ = (0.75, 1, 1.25) (solid line), τ′ = (1, 1.25, 1.5) (dashed line) and τ′ = (1.25, 1.5, 1.75) (dotted line).
Figure 5: Modified Pareto distribution. Asymptotic bias (top), asymptotic standard deviation (middle) and asymptotic mean squared error (bottom) of β̂n,k(τ, δ, ρ) as a function of k when n = 10,000. Left: δ′ = (0.05, 0.1) with τ′ = (0.5, 0.6, 0.7) (solid line), τ′ = (0.6, 0.7, 0.8) (dashed line) and τ′ = (0.7, 0.8, 0.9) (dotted line). Right: δ′ = (0.25, 0.5) with τ′ = (0.75, 1, 1.25) (solid line), τ′ = (1, 1.25, 1.5) (dashed line) and τ′ = (1.25, 1.5, 1.75) (dotted line).
Figure 6: Burr simulation, 1,000 runs of size n = 10,000. Left: Median of β̂n,k(τ, δ, ρ̂n,k) with τ′ = (0.75, 1, 1.25), δ′ = (0.25, 0.5) (solid line), and τ′ = (0.6, 0.7, 0.8), δ′ = (0.05, 0.1) (dashed line) for Burr(1, 2, 0.5) (top), Burr(1, 1, 1) (middle), Burr(1, 0.5, 2) (bottom). Right: corresponding empirical MSE.
Figure 7: GPD(1,0.5) (top), GPD(1,1) (middle) and modified Pareto (bottom) simulation, 1,000 runs of size n = 10,000. Left: Median of β̂n,k(τ, δ, ρ̂n,k) with τ′ = (0.75, 1, 1.25), δ′ = (0.25, 0.5) (solid line), and τ′ = (0.6, 0.7, 0.8), δ′ = (0.05, 0.1) (dashed line). Right: corresponding empirical MSE.
Figure 8: Burr simulation, 1,000 runs of size n = 10,000. Left: Median of β̂n,k(τ, δ, ρ̂n,k̆) with τ′ = (0.75, 1, 1.25), δ′ = (0.25, 0.5) (solid line), and τ′ = (0.6, 0.7, 0.8), δ′ = (0.05, 0.1) (dashed line) for Burr(1, 2, 0.5) (top), Burr(1, 1, 1) (middle), Burr(1, 0.5, 2) (bottom). Right: corresponding empirical MSE.
Figure 9: GPD(1,0.5) (top) and GPD(1,1) (bottom), 1,000 runs of size n = 10,000. Left: Median of β̂n,k(τ, δ, ρ̂n,k̆) with τ′ = (0.75, 1, 1.25), δ′ = (0.25, 0.5) (solid line), and τ′ = (0.6, 0.7, 0.8), δ′ = (0.05, 0.1) (dashed line). Right: corresponding empirical MSE.