Estimation of the third order parameter in extreme value statistics

Yuri Goegebeur* and Tertius de Wet†

April 9, 2011

Abstract

We introduce a class of estimators for the third order parameter in extreme value statistics when the distribution function underlying the data is heavy tailed. For appropriately chosen intermediate sequences of upper order statistics, consistency is established under the third order tail condition and asymptotic normality under the fourth order tail condition. Numerical and simulation experiments illustrate the asymptotic and finite sample behavior, respectively, of some selected estimators.

Keywords: Pareto-type distribution, third order parameter, fourth order condition.

* Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark (email: [email protected]). This author's research was supported by a grant from the Danish Natural Science Research Council.
† Department of Statistics and Actuarial Science, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa (email: [email protected]). This author's research was partially supported by the National Research Foundation.

1 Introduction

A central topic in the area of extreme value statistics is drawing inferences about the far tail of a distribution function. In this process of drawing inferences, it is necessary to extend the empirical distribution function beyond the available data, which is typically done by imposing some conditions on the tail behavior of the underlying distribution function. In this paper we deal with an estimation problem within the class of heavy-tailed or Pareto-type distributions, which constitute the max-domain of attraction of the Fréchet distribution.

A distribution is said to be of Pareto-type if for some $\gamma > 0$ its survival function is of the form

$$1 - F(x) = x^{-1/\gamma}\,\ell_F(x), \quad x > 0, \qquad (1)$$

where $\ell_F$ denotes a slowly varying function at infinity, i.e.

$$\frac{\ell_F(\lambda x)}{\ell_F(x)} \to 1 \text{ as } x \to \infty \text{ for all } \lambda > 0. \qquad (2)$$

The parameter $\gamma$ is called the extreme value index, and clearly governs the tail behavior, with larger values indicating heavier tails. The Pareto-type model can also be stated in an equivalent way in terms of the tail quantile function $U$, where $U(x) := \inf\{y : F(y) \ge 1 - 1/x\}$, $x > 1$, as follows:

$$U(x) = x^{\gamma}\,\ell_U(x), \qquad (3)$$

with $\ell_U$ again a slowly varying function at infinity (Gnedenko, 1943).

The estimation of the parameter $\gamma$ has received a lot of attention in the extreme value literature; we refer to Beirlant et al. (2004) and de Haan and Ferreira (2006) for good accounts of such estimation procedures. Consistency of estimators for $\gamma$ can typically be achieved under (1), which is a first order condition, whereas establishing asymptotic normality requires more structure on the tail of $F$. This extra structure is typically formulated in terms of a second order condition on the tail behavior, the so-called slow variation with remainder condition; see Bingham et al. (1987) and de Haan and Ferreira (2006). Let $D_\rho(x) := \int_1^x u^{\rho-1}\,du = (x^\rho - 1)/\rho$, $x > 0$, $\rho < 0$.

Second order condition (R2). There exist a positive real parameter $\gamma$, a negative real parameter $\rho$ and a function $b$ with $b(t) \to 0$ for $t \to \infty$, of constant sign for large values of $t$, such that

$$\lim_{t \to \infty} \frac{\ln U(tx) - \ln U(t) - \gamma \ln x}{b(t)} = D_\rho(x), \quad \forall x > 0.$$
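To make (R2) concrete, the following small numerical check (our illustration, not part of the paper; it assumes NumPy) uses the Burr distribution of Section 4, for which $U(x) = (\zeta(x^{1/\lambda}-1))^{1/\delta}$ and $b(t) = \gamma t^{\rho}/(1-t^{\rho})$ are available in closed form, and verifies that the normalized log-quantile differences approach $D_\rho(x)$ as $t$ grows.

```python
import numpy as np

# Burr(zeta, lam, delta): 1 - F(x) = (zeta/(zeta + x**delta))**lam, so that
# U(x) = (zeta*(x**(1/lam) - 1))**(1/delta), gamma = 1/(lam*delta), rho = -1/lam.
zeta, lam, delta = 1.0, 1.0, 1.0
gamma, rho = 1.0 / (lam * delta), -1.0 / lam

def U(x):
    return (zeta * (x**(1.0 / lam) - 1.0))**(1.0 / delta)

def b(t):
    # auxiliary function of (R2) for this Burr model (see Section 4)
    return gamma * t**rho / (1.0 - t**rho)

def D(x):
    return (x**rho - 1.0) / rho

x = 2.0
for t in (1e2, 1e4, 1e6):
    lhs = (np.log(U(t * x)) - np.log(U(t)) - gamma * np.log(x)) / b(t)
    print(t, lhs, D(x))   # the left-hand side approaches D_rho(x) as t grows
```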
Let us remark here that condition (R2) implies that $|b|$ is regularly varying of index $\rho$ (Geluk and de Haan, 1987), and hence the parameter $\rho$ determines the rate of convergence of $\ln U(tx) - \ln U(t)$ to its limit $\gamma \ln x$ as $t \to \infty$. It should be mentioned that the second order condition as presented in de Haan and Ferreira (2006) also allows the case $\rho = 0$; in this paper, however, the case $\rho = 0$ is excluded. Although the estimation of the second order parameter $\rho$ is challenging, both from a theoretical and an applied perspective, several estimators for $\rho$ that work well in practice have recently been introduced, and their asymptotic properties studied. We refer here to Gomes et al. (2002), Fraga Alves et al. (2003), Ciuperca and Mercadier (2010), and Goegebeur et al. (2010), among others. In these papers, consistency is established under (R2), whereas asymptotic normality again requires a further condition on the tail behavior, the third order condition. Denote

$$H_{\rho,\beta}(x) := \int_1^x y^{\rho-1} \int_1^y u^{\beta-1}\,du\,dy = \frac{1}{\beta}\left[\frac{x^{\rho+\beta}-1}{\rho+\beta} - \frac{x^{\rho}-1}{\rho}\right], \quad x > 0,\ \rho, \beta < 0.$$

Third order condition (R3). There exist a positive real parameter $\gamma$, negative real parameters $\rho$ and $\beta$, and functions $b$ and $\tilde b$ with $b(t) \to 0$ and $\tilde b(t) \to 0$ for $t \to \infty$, both of constant sign for large values of $t$, such that

$$\lim_{t\to\infty} \frac{\frac{\ln U(tx)-\ln U(t)-\gamma\ln x}{b(t)} - D_\rho(x)}{\tilde b(t)} = H_{\rho,\beta}(x), \quad \forall x > 0.$$

Note that under condition (R3) it can be proven that $|\tilde b|$ is regularly varying of index $\beta$; we refer to de Haan and Stadtmüller (1996) for further details. It is clear that (R3) specifies the rate of convergence in (R2), in a way similar to (R2) specifying the rate of convergence in the first order framework.

In this paper we deal with the estimation of the third order parameter $\beta$ in (R3) with $\rho < 0$ and $\beta < 0$. Having estimators for this parameter is practically relevant for the following reasons. Firstly, successful application of extreme value methods often depends critically on the appropriate selection of the number of extreme values used in the estimation. Although this problem has been well studied in the framework of the estimation of $\gamma$ (Hall and Welsh (1985), Beirlant et al. (1996), Drees and Kaufmann (1998), Guillou and Hall (2001), among others), there are, apart from some heuristic guidelines, no formal methods to deal with this problem when estimating $\rho$. We refer to Caeiro and Gomes (2008) for a discussion of this issue. Secondly, improved estimators for $\gamma$ have recently been introduced, obtained by explicitly estimating the dominant term of the asymptotic bias; see Beirlant et al. (1999), Feuerverger and Hall (1999), Beirlant et al. (2002), Gomes and Martins (2002), and more recently Gomes et al. (2008), for examples of such bias-corrected estimators for $\gamma$. No such estimators are available for the second order parameter. The absence of research on the above-mentioned topics for the estimation of $\rho$ can be attributed to the fact that to date, to the best of our knowledge, no estimators for $\beta$ are available. With this paper we make a first attempt in this direction, with the objective to provide an impetus to new research on improved estimation procedures for the second order parameter.

The remainder of this paper is organized as follows. In the next section we introduce the class of estimators for the third order parameter $\beta$. The asymptotic properties of the proposed class, consistency and asymptotic normality, are established in Section 3. In Section 4 we illustrate the asymptotic and finite sample behavior of some specific estimators for $\beta$ numerically. The proofs of all results are deferred to the Appendix.
2 The class of estimators for β

In this section we introduce our class of estimators for the third order parameter. Consider $X_1, \ldots, X_n$ independent and identically distributed (i.i.d.) random variables according to a distribution function $F$ satisfying (1), with associated order statistics $X_{1,n} \le \cdots \le X_{n,n}$. Let $T_{n,k}(K)$ denote a kernel statistic with kernel function $K$:

$$T_{n,k}(K) := \frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right) Z_j, \qquad (4)$$

where $Z_j := j(\ln X_{n-j+1,n} - \ln X_{n-j,n})$ are the so-called scaled log-spacings of successive order statistics introduced in Beirlant et al. (1999) and Feuerverger and Hall (1999). Further, for $\beta, \rho < 0$, let

$$\mu(K) := \int_0^1 K(u)\,du, \qquad \sigma^2(K) := \int_0^1 K^2(u)\,du,$$
$$I_1(K,\rho) := \int_0^1 K(u)\,u^{-\rho}\,du, \qquad I_2(K,\rho,\beta) := \int_0^1 K(u)\,u^{-\rho}\,\frac{u^{-\beta}-1}{\beta}\,du.$$

In this paper we will consider statistics of type (4) with a log-weight function

$$L_\alpha(u) := \frac{(-\ln u)^\alpha}{\Gamma(\alpha+1)}, \quad u \in (0,1),$$

with $\alpha > 0$ and where $\Gamma$ denotes the Gamma function, i.e. $\Gamma(\omega) = \int_0^\infty e^{-x} x^{\omega-1}\,dx$ ($\omega > 0$), as basic building blocks for the class of estimators for $\beta$. Therefore, we start by studying their asymptotic behavior. The following proposition gives the asymptotic distributional representation of $T_{n,k}(L_\alpha)$ under (R3). Let $Y_{1,n} \le \cdots \le Y_{n,n}$ denote the order statistics of a random sample of size $n$ from the unit Pareto distribution, with distribution function $F(y) = 1 - 1/y$, $y > 1$.

Proposition 1. Let $X_1, \ldots, X_n$ be i.i.d. random variables according to a distribution satisfying (R3). If $k \to \infty$ such that $k/n \to 0$, we have that

$$T_{n,k}(L_\alpha) \stackrel{D}{=} \gamma + \gamma\sigma(L_\alpha)\frac{N_k(L_\alpha)}{\sqrt{k}} + b(Y_{n-k,n})\,I_1(L_\alpha,\rho) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,I_2(L_\alpha,\rho,\beta)(1+o_P(1)) + b(Y_{n-k,n})\,O_P\!\left(\frac{1}{\sqrt{k}}\right),$$

where $N_k(L_\alpha)$ is an asymptotically standard normally distributed random variable.

This proposition follows immediately from the fact that $T_{n,k}(L_\alpha)$ is an element of the class of kernel statistics studied in Theorem 2 of Goegebeur et al. (2010), and hence for brevity we omit its proof from this paper. Note that

$$I_1(L_\alpha,\rho) = \frac{1}{(1-\rho)^{\alpha+1}}, \qquad I_2(L_\alpha,\rho,\beta) = \frac{1}{\beta}\left[\frac{1}{(1-\rho-\beta)^{\alpha+1}} - \frac{1}{(1-\rho)^{\alpha+1}}\right], \qquad \sigma^2(L_\alpha) = \frac{\Gamma(2\alpha+1)}{\Gamma^2(\alpha+1)}.$$

The basic idea is now to use Proposition 1 to construct differences and ratios of statistics $T_{n,k}(L_\alpha)$, chosen in such a way that the term involving $I_2(L_\alpha,\rho,\beta)$ becomes the dominant one. For arbitrary positive real parameters $\alpha$ and $\tilde\alpha$, we denote

$$I_1(\alpha,\tilde\alpha,\rho) := I_1(L_\alpha,\rho) - I_1(L_{\tilde\alpha},\rho), \qquad I_2(\alpha,\tilde\alpha,\rho,\beta) := I_2(L_\alpha,\rho,\beta) - I_2(L_{\tilde\alpha},\rho,\beta).$$

From Proposition 1 we have that if $\sqrt{k}\,b(n/k) \to \infty$,

$$\Psi_{n,k}(\alpha,\tilde\alpha,\breve\alpha) := \frac{T_{n,k}(L_\alpha) - T_{n,k}(L_{\tilde\alpha})}{T_{n,k}(L_{\breve\alpha}) - T_{n,k}(L_{\tilde\alpha})} \stackrel{P}{\to} \psi(\alpha,\tilde\alpha,\breve\alpha,\rho) := \frac{I_1(\alpha,\tilde\alpha,\rho)}{I_1(\breve\alpha,\tilde\alpha,\rho)} = \frac{(1-\rho)^{\tilde\alpha-\alpha} - 1}{(1-\rho)^{\tilde\alpha-\breve\alpha} - 1},$$

where $\breve\alpha$ is again an arbitrary positive real parameter. We refer to the proof of Theorem 1 for further details about this result. Now, in order to make the term involving $I_2(\alpha,\tilde\alpha,\rho,\beta)$ the dominant one, we construct a ratio of differences of statistics $\Psi_{n,k}(\alpha,\tilde\alpha,\breve\alpha)$, with appropriately chosen parameters $\alpha$, $\tilde\alpha$ and $\breve\alpha$.
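For concreteness, here is a minimal, self-contained computational sketch (ours, not the authors'; it assumes NumPy, and the function names are of our choosing) of these building blocks: the statistic $T_{n,k}(L_\alpha)$ computed from the scaled log-spacings, and the ratio $\Psi_{n,k}$, compared with its probability limit $\psi$.

```python
import numpy as np
from math import gamma as Gamma

def L(alpha):
    """Log-weight kernel L_alpha(u) = (-ln u)**alpha / Gamma(alpha+1)."""
    return lambda u: (-np.log(u))**alpha / Gamma(alpha + 1.0)

def T(x, k, kernel):
    """Kernel statistic T_{n,k}(K) based on the k largest observations."""
    xs = np.sort(x)                       # ascending order statistics
    n = xs.size
    j = np.arange(1, k + 1)
    # scaled log-spacings Z_j = j (ln X_{n-j+1,n} - ln X_{n-j,n})
    Z = j * (np.log(xs[n - j]) - np.log(xs[n - j - 1]))
    return np.mean(kernel(j / (k + 1.0)) * Z)

def Psi(x, k, a, at, ab):
    """Ratio Psi_{n,k} of differences of T-statistics."""
    return ((T(x, k, L(a)) - T(x, k, L(at))) /
            (T(x, k, L(ab)) - T(x, k, L(at))))

# example: unit Frechet sample (gamma = 1, rho = -1) generated by inversion
rng = np.random.default_rng(1)
x = 1.0 / np.log(1.0 / rng.uniform(size=100_000))

rho = -1.0
a, at, ab = 0.55, 0.6, 0.5
psi = ((1 - rho)**(at - a) - 1) / ((1 - rho)**(at - ab) - 1)
print(Psi(x, 5_000, a, at, ab), psi)      # statistic vs. its probability limit
```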
Setting $0 < \tau_1 < \tau_2 < \tau_3$ and $0 < \delta_1, \delta_2 < \tau_1$, consider the statistic

$$\Lambda_{n,k}(\tau,\delta) := \frac{\Psi_{n,k}(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2) - \Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2)}{\Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2) - \Psi_{n,k}(\tau_3-\delta_1,\tau_3,\tau_3-\delta_2)},$$

where $\tau' := (\tau_1,\tau_2,\tau_3)$ and $\delta' := (\delta_1,\delta_2)$, and the function

$$\Lambda(\tau,\rho,\beta) := \frac{c_1(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta)}{c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta) - c_1(\tau_3-\delta_1,\tau_3,\tau_3-\delta_2,\rho,\beta)} = (1-\rho-\beta)^{\tau_3-\tau_1}\,\frac{(1-\rho-\beta)^{\tau_2}(1-\rho)^{\tau_1} - (1-\rho-\beta)^{\tau_1}(1-\rho)^{\tau_2}}{(1-\rho-\beta)^{\tau_3}(1-\rho)^{\tau_2} - (1-\rho-\beta)^{\tau_2}(1-\rho)^{\tau_3}},$$

with

$$c_1(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta) := \frac{I_2(\alpha,\tilde\alpha,\rho,\beta) - \psi(\alpha,\tilde\alpha,\breve\alpha,\rho)\,I_2(\breve\alpha,\tilde\alpha,\rho,\beta)}{I_1(\breve\alpha,\tilde\alpha,\rho)}. \qquad (5)$$

In Theorem 1 we establish the consistency of $\Lambda_{n,k}(\tau,\delta)$ for $\Lambda(\tau,\rho,\beta)$ as $\sqrt{k}\,b(n/k)\tilde b(n/k) \to \infty$. Further, we have that the function $\beta \mapsto \Lambda(\tau,\rho,\beta)$ is monotone, as stated in the following lemma.

Lemma 1. The function $\beta \mapsto \Lambda(\tau,\rho,\beta)$ is decreasing for $\beta \in (-\infty,0)$, with

$$\lim_{\beta\to 0^-} \Lambda(\tau,\rho,\beta) = -\frac{\tau_2-\tau_1}{\tau_2-\tau_3} \qquad \text{and} \qquad \lim_{\beta\to-\infty} \Lambda(\tau,\rho,\beta) = +\infty.$$

The consistency of $\Lambda_{n,k}(\tau,\delta)$ for $\Lambda(\tau,\rho,\beta)$, together with the result from Lemma 1, motivates the estimating equation $\Lambda_{n,k}(\tau,\delta) = \Lambda(\tau,\rho,\beta)$, from which the estimator for $\beta$ can be obtained by inversion:

$$\hat\beta_{n,k}(\tau,\delta,\rho) := \Lambda^{\leftarrow}(\tau,\rho,\Lambda_{n,k}(\tau,\delta)),$$

provided $\Lambda_{n,k}(\tau,\delta) > -(\tau_2-\tau_1)/(\tau_2-\tau_3)$.
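The paper later mentions that the authors carry out this inversion numerically with the NAG routine C05ADF; the sketch below (ours, assuming SciPy; brentq is used as a stand-in root finder) implements the closed form of $\Lambda(\tau,\rho,\beta)$ and inverts the estimating equation, relying on the monotonicity guaranteed by Lemma 1.

```python
import numpy as np
from scipy.optimize import brentq  # stand-in for the NAG root finder used by the authors

def Lambda(tau, rho, beta):
    """Limit function Lambda(tau, rho, beta) from Section 2."""
    t1, t2, t3 = tau
    a, c = 1.0 - rho, 1.0 - rho - beta
    num = c**t2 * a**t1 - c**t1 * a**t2
    den = c**t3 * a**t2 - c**t2 * a**t3
    return c**(t3 - t1) * num / den

def beta_hat(lambda_nk, tau, rho, lo=-50.0, hi=-1e-6):
    """Invert Lambda(tau, rho, .) at the observed statistic Lambda_{n,k}.

    By Lemma 1 the map beta -> Lambda is decreasing on (-inf, 0), so the
    root is unique whenever lambda_nk > -(t2 - t1)/(t2 - t3)."""
    return brentq(lambda b: Lambda(tau, rho, b) - lambda_nk, lo, hi)

# example: recover beta from its own limit value
tau = (0.75, 1.0, 1.25)
print(beta_hat(Lambda(tau, rho=-1.0, beta=-0.5), tau, rho=-1.0))  # ~ -0.5
```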
3 Main asymptotic results

3.1 Consistency

First, we establish consistency of the proposed class of estimators for $\beta$ under condition (R3), and for appropriately chosen sequences $k$ and $n$.

Theorem 1. Let $X_1, \ldots, X_n$ be i.i.d. random variables according to a distribution satisfying (R3). Then, if $k \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b(n/k)\tilde b(n/k) \to \infty$, we have that $\Lambda_{n,k}(\tau,\delta) \stackrel{P}{\to} \Lambda(\tau,\rho,\beta)$ and that $\hat\beta_{n,k}(\tau,\delta,\rho) \stackrel{P}{\to} \beta$. Further, if $\hat\rho_{n,\breve k}$ is a consistent estimator for $\rho$, then also $\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k}) \stackrel{P}{\to} \beta$.

The recent extreme value literature contains several consistent estimators for the second order parameter $\rho$ that work very well in practice; see Gomes et al. (2002), Fraga Alves et al. (2003), Ciuperca and Mercadier (2010), and Goegebeur et al. (2010). In these papers it is shown that consistency of the estimators for $\rho$ can be obtained when the sequence $\breve k$ satisfies $\breve k \to \infty$ with $\breve k/n \to 0$ and $\sqrt{\breve k}\,b(n/\breve k) \to \infty$. Moreover, if additionally $\sqrt{\breve k}\,b(n/\breve k)\tilde b(n/\breve k) \to \lambda_1$ and $\sqrt{\breve k}\,b^2(n/\breve k) \to \lambda_2$, we can obtain asymptotic normality for $\hat\rho_{n,\breve k}$, when appropriately normalized. In the framework of Theorem 1, $\breve k$ can be taken equal to $k$, the number of extreme values used in the estimation of $\beta$. Indeed, $\sqrt{k}\,b(n/k)\tilde b(n/k) \to \infty$ implies that $\sqrt{k}\,b(n/k) \to \infty$, and hence $\hat\rho_{n,k}$ will be consistent for $\rho$, though for such a sequence we may no longer guarantee asymptotic normality of the normalized $\hat\rho_{n,k}$.

3.2 Asymptotic normality

In this section we will show that our class of estimators for the third order parameter, when appropriately normalized, converges in distribution to a normal random variable. Similarly to the estimation of first and second order tail parameters, where the derivation of the asymptotic normality required a second and a third order condition on the tail behavior, respectively, we impose here the so-called fourth order condition, introduced in Fraga Alves et al. (2006). Denote

$$R_{\rho,\beta,\eta}(x) := \int_1^x y^{\rho-1}\int_1^y u^{\beta-1}\int_1^u s^{\eta-1}\,ds\,du\,dy = \frac{1}{\eta}\left[\frac{1}{\beta+\eta}\left(\frac{x^{\rho+\beta+\eta}-1}{\rho+\beta+\eta} - \frac{x^{\rho}-1}{\rho}\right) - \frac{1}{\beta}\left(\frac{x^{\rho+\beta}-1}{\rho+\beta} - \frac{x^{\rho}-1}{\rho}\right)\right],$$

for $x > 0$, $\rho, \beta, \eta < 0$.

Fourth order condition (R4). There exist a positive real parameter $\gamma$, negative real parameters $\rho$, $\beta$ and $\eta$, and functions $b$, $\tilde b$ and $\breve b$ with $b(t) \to 0$, $\tilde b(t) \to 0$ and $\breve b(t) \to 0$ for $t \to \infty$, all of constant sign for large values of $t$, such that

$$\lim_{t\to\infty} \frac{1}{\breve b(t)}\left[\frac{1}{\tilde b(t)}\left(\frac{\ln U(tx) - \ln U(t) - \gamma\ln x}{b(t)} - D_\rho(x)\right) - H_{\rho,\beta}(x)\right] = R_{\rho,\beta,\eta}(x), \quad \forall x > 0.$$

It is shown in Fraga Alves et al. (2006) that $|\breve b|$ is regularly varying with index of regular variation $\eta$; see also Wang and Cheng (2005).

We start with the expansion of the statistic $T_{n,k}(K)$ under (R4). This result is more general than what is needed in the present paper, but is of interest in its own right, as such an expansion for a general kernel statistic is to date not available. We impose the following conditions on the kernel function $K$.

Condition (K). Let $K$ be a function defined on $(0,1)$ such that

(i) $K(t) = \frac{1}{t}\int_0^t u(v)\,dv$ for some function $u$ satisfying $(k+1)\int_{(j-1)/(k+1)}^{j/(k+1)} u(t)\,dt \le f\!\left(\frac{j}{k+1}\right)$ for some positive continuous and integrable function $f$ defined on $(0,1)$,

(ii) $\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right) = \mu(K) + o(1/\sqrt{k})$ for $k \to \infty$,

(iii) $\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} = I_1(K,\rho) + o(1/\sqrt{k})$ for $k \to \infty$,

(iv) $\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho-\beta} = I_1(K,\rho+\beta) + o(1/\sqrt{k})$ for $k \to \infty$,

(v) $\sigma^2(K) < \infty$,

(vi) $\max_{i\in\{1,\ldots,k\}} K\!\left(\frac{i}{k+1}\right) = o(\sqrt{k})$ for $k \to \infty$,

(vii) $\int_0^1 |K(u)|\,u^{|\rho|-1-\varepsilon}\,du < \infty$ for some $\varepsilon > 0$.

Denote

$$I_3(K,\rho,\beta,\eta) := \frac{1}{\eta}\int_0^1 K(u)\,u^{-\rho}\left[\frac{u^{-\beta-\eta}-1}{\beta+\eta} - \frac{u^{-\beta}-1}{\beta}\right]du.$$

Theorem 2. Let $X_1, \ldots, X_n$ be i.i.d. random variables according to a distribution satisfying (R4). If further (K) holds, then for $k \to \infty$ such that $k/n \to 0$ we have

$$T_{n,k}(K) \stackrel{D}{=} \gamma\mu(K) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}}(1+o_P(1)) + b(Y_{n-k,n})\,I_1(K,\rho) + b(Y_{n-k,n})\,O_P(1/\sqrt{k})$$
$$\quad + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,I_2(K,\rho,\beta) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,O_P(1/\sqrt{k}) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,I_3(K,\rho,\beta,\eta)(1+o_P(1)),$$

where $N_k(K)$ is an asymptotically standard normally distributed random variable.

For the log kernel function $L_\alpha$ we have

$$I_3(L_\alpha,\rho,\beta,\eta) = \frac{1}{\eta}\left[\frac{1}{\beta+\eta}\left(\frac{1}{(1-\rho-\beta-\eta)^{\alpha+1}} - \frac{1}{(1-\rho)^{\alpha+1}}\right) - \frac{1}{\beta}\left(\frac{1}{(1-\rho-\beta)^{\alpha+1}} - \frac{1}{(1-\rho)^{\alpha+1}}\right)\right].$$
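As a quick sanity check on the closed-form expressions for $I_1(L_\alpha,\rho)$, $I_2(L_\alpha,\rho,\beta)$ and $I_3(L_\alpha,\rho,\beta,\eta)$, the following sketch (ours; it assumes NumPy and SciPy) compares them with numerical quadrature.

```python
import numpy as np
from math import gamma as Gamma
from scipy.integrate import quad

alpha, rho, beta, eta = 0.8, -1.0, -0.5, -0.25

def L_alpha(u):
    return (-np.log(u))**alpha / Gamma(alpha + 1.0)

# closed forms for the log-weight kernel (Section 2 and Theorem 2)
I1 = (1.0 - rho)**(-(alpha + 1.0))
I2 = ((1.0 - rho - beta)**(-(alpha + 1.0)) - (1.0 - rho)**(-(alpha + 1.0))) / beta
I3 = (((1.0 - rho - beta - eta)**(-(alpha + 1.0)) - (1.0 - rho)**(-(alpha + 1.0))) / (beta + eta)
      - ((1.0 - rho - beta)**(-(alpha + 1.0)) - (1.0 - rho)**(-(alpha + 1.0))) / beta) / eta

# the same quantities by numerical quadrature
I1_q = quad(lambda u: L_alpha(u) * u**(-rho), 0, 1)[0]
I2_q = quad(lambda u: L_alpha(u) * u**(-rho) * (u**(-beta) - 1.0) / beta, 0, 1)[0]
I3_q = quad(lambda u: L_alpha(u) * u**(-rho) *
            ((u**(-beta - eta) - 1.0) / (beta + eta) - (u**(-beta) - 1.0) / beta) / eta, 0, 1)[0]

print(I1, I1_q)   # each pair should agree to quadrature accuracy
print(I2, I2_q)
print(I3, I3_q)
```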
We now establish the asymptotic normality of our estimator. In this we will use the following notation:

$$I_3(\alpha,\tilde\alpha,\rho,\beta,\eta) := I_3(L_\alpha,\rho,\beta,\eta) - I_3(L_{\tilde\alpha},\rho,\beta,\eta),$$
$$c_2(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta,\eta) := \frac{I_3(\alpha,\tilde\alpha,\rho,\beta,\eta) - \psi(\alpha,\tilde\alpha,\breve\alpha,\rho)\,I_3(\breve\alpha,\tilde\alpha,\rho,\beta,\eta)}{I_1(\breve\alpha,\tilde\alpha,\rho)},$$
$$c_3(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta) := \frac{I_2(\alpha,\tilde\alpha,\rho,\beta)\,I_2(\breve\alpha,\tilde\alpha,\rho,\beta) - \psi(\alpha,\tilde\alpha,\breve\alpha,\rho)\,I_2^2(\breve\alpha,\tilde\alpha,\rho,\beta)}{I_1^2(\breve\alpha,\tilde\alpha,\rho)},$$
$$V_k(\tau_1,\tau_2,\delta,\gamma,\rho) := N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho) - N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho),$$
$$d_1(\tau_1,\tau_2,\delta,\rho,\beta) := c_1(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta),$$
$$d_2(\tau_1,\tau_2,\delta,\rho,\beta,\eta) := c_2(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta,\eta) - c_2(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta,\eta),$$
$$d_3(\tau_1,\tau_2,\delta,\rho,\beta) := c_3(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_3(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta),$$
$$V_k(\tau,\delta,\gamma,\rho,\beta) := \frac{V_k(\tau_1,\tau_2,\delta,\gamma,\rho) - \Lambda(\tau,\rho,\beta)\,V_k(\tau_2,\tau_3,\delta,\gamma,\rho)}{d_1(\tau_2,\tau_3,\delta,\rho,\beta)},$$
$$d_2(\tau,\delta,\rho,\beta,\eta) := \frac{d_2(\tau_1,\tau_2,\delta,\rho,\beta,\eta) - \Lambda(\tau,\rho,\beta)\,d_2(\tau_2,\tau_3,\delta,\rho,\beta,\eta)}{d_1(\tau_2,\tau_3,\delta,\rho,\beta)},$$
$$d_3(\tau,\delta,\rho,\beta) := \frac{d_3(\tau_1,\tau_2,\delta,\rho,\beta) - \Lambda(\tau,\rho,\beta)\,d_3(\tau_2,\tau_3,\delta,\rho,\beta)}{d_1(\tau_2,\tau_3,\delta,\rho,\beta)},$$
$$v^2(\tau,\delta,\gamma,\rho,\beta) := \mathrm{Avar}(V_k(\tau,\delta,\gamma,\rho,\beta));$$

for $N_k(\alpha,\tilde\alpha,\breve\alpha,\gamma,\rho)$ we refer to (8).

Theorem 3. Let $X_1, \ldots, X_n$ be i.i.d. random variables according to a distribution satisfying (R4). Then, if $k \to \infty$ such that $k/n \to 0$, $\sqrt{k}\,b(n/k)\tilde b(n/k) \to \infty$, $\sqrt{k}\,b(n/k)\tilde b(n/k)\breve b(n/k) \to \lambda_1$ and $\sqrt{k}\,b(n/k)\tilde b^2(n/k) \to \lambda_2$, we have that

$$\sqrt{k}\,b(n/k)\tilde b(n/k)\,[\Lambda_{n,k}(\tau,\delta) - \Lambda(\tau,\rho,\beta)] \stackrel{D}{\to} N\!\left(\lambda_1 d_2(\tau,\delta,\rho,\beta,\eta) - \lambda_2 d_3(\tau,\delta,\rho,\beta),\; v^2(\tau,\delta,\gamma,\rho,\beta)\right).$$

The proof of Theorem 3 is given in Section 6.4, and the derivation of the asymptotic variance can be found in Section 6.6. Let $\Lambda'(\tau,\rho,\beta) := d\Lambda(\tau,\rho,\beta)/d\beta$ and set

$$\tilde d_2(\tau,\delta,\rho,\beta,\eta) := \frac{d_2(\tau,\delta,\rho,\beta,\eta)}{\Lambda'(\tau,\rho,\beta)}, \qquad \tilde d_3(\tau,\delta,\rho,\beta) := \frac{d_3(\tau,\delta,\rho,\beta)}{\Lambda'(\tau,\rho,\beta)}, \qquad \tilde v^2(\tau,\delta,\gamma,\rho,\beta) := \frac{v^2(\tau,\delta,\gamma,\rho,\beta)}{[\Lambda'(\tau,\rho,\beta)]^2}.$$

A straightforward application of the delta method establishes the asymptotic normality of the estimators $\hat\beta_{n,k}(\tau,\delta,\rho)$ and $\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k})$ for $\beta$. The result is stated formally in the next theorem.

Theorem 4. Let $X_1, \ldots, X_n$ be i.i.d. random variables according to a distribution satisfying (R4). Then, if $k \to \infty$ such that $k/n \to 0$, $\sqrt{k}\,b(n/k)\tilde b(n/k) \to \infty$, $\sqrt{k}\,b(n/k)\tilde b(n/k)\breve b(n/k) \to \lambda_1$ and $\sqrt{k}\,b(n/k)\tilde b^2(n/k) \to \lambda_2$, we have that

$$\sqrt{k}\,b(n/k)\tilde b(n/k)\,[\hat\beta_{n,k}(\tau,\delta,\rho) - \beta] \stackrel{D}{\to} N\!\left(\lambda_1\tilde d_2(\tau,\delta,\rho,\beta,\eta) - \lambda_2\tilde d_3(\tau,\delta,\rho,\beta),\; \tilde v^2(\tau,\delta,\gamma,\rho,\beta)\right).$$

The result continues to hold when $\rho$ in $\hat\beta_{n,k}(\tau,\delta,\rho)$ is replaced by an external estimator $\hat\rho_{n,\breve k}$ such that $\hat\rho_{n,\breve k} - \rho = O_P(1/(\sqrt{\breve k}\,b(n/\breve k)))$, with $\sqrt{\breve k}\,b(n/\breve k) \to \infty$, provided

$$\frac{\sqrt{k}\,b(n/k)\tilde b(n/k)}{\sqrt{\breve k}\,b(n/\breve k)} \to 0. \qquad (6)$$

Some remarks:

• Many important members of the class of Pareto-type models, like the Fréchet, Burr and Student t distributions, satisfy condition (R4) with $b(x) \sim a_1 x^{\rho}$, $\tilde b(x) \sim a_2 x^{\rho}$ and $\breve b(x) \sim a_3 x^{\rho}$ as $x \to \infty$, for some constants $a_1$, $a_2$ and $a_3$. For such models one easily obtains from Theorem 4 that the asymptotic mean squared error (AMSE) optimal $k$ is of the order $n^{-6\rho/(1-6\rho)}$, which is larger than the orders $n^{-2\rho/(1-2\rho)}$ and $n^{-4\rho/(1-4\rho)}$ one typically achieves for the estimation of first and second order tail parameters, respectively. When using this AMSE optimal $k$ sequence for the estimation of $\beta$, the rate of convergence in Theorem 4 is of the order $n^{\rho/(1-6\rho)}$. Also, in this regard, condition (6) can easily be verified to be satisfied when using the AMSE optimal sequences for $k$ and $\breve k$; a small arithmetic sketch of these orders is given after these remarks.

• The construction of confidence intervals for the third order parameter $\beta$, based on the normal limiting distribution of Theorem 4, requires an estimator for the normalizing function $b(n/k)\tilde b(n/k)$. The function $b(n/k)$ can be estimated by using the exponential regression model for scaled log-spacings of successive order statistics presented in Beirlant et al. (1999), or, when one considers the special case where $b(x) = a x^{\rho}$, by the estimator for the second order scale parameter $a$ of Caeiro and Gomes (2006) combined with an estimator for the second order rate parameter $\rho$ (see e.g. Gomes et al., 2002; Goegebeur et al., 2010). However, so far no estimator for the function $\tilde b$ is available; in fact, this is a topic of current research. In case one is interested in a confidence interval for $\beta$, one could use a bootstrap procedure as an alternative to the use of Theorem 4.
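To give a feel for the orders quoted in the first remark, here is a small arithmetic sketch (ours) evaluating them for $n = 10{,}000$ and a few values of $\rho$.

```python
# Arithmetic sketch (ours, not the authors') of the optimal-k orders quoted in
# the first remark, for a model with b, b-tilde and b-breve all of index rho.
n = 10_000

for rho in (-0.5, -1.0, -2.0):
    k_gamma = n**(-2 * rho / (1 - 2 * rho))   # first order (estimation of gamma)
    k_rho   = n**(-4 * rho / (1 - 4 * rho))   # second order (estimation of rho)
    k_beta  = n**(-6 * rho / (1 - 6 * rho))   # third order (estimation of beta)
    rate    = n**(rho / (1 - 6 * rho))        # rate of convergence in Theorem 4
    print(f"rho={rho}: k_gamma~{k_gamma:.0f}, k_rho~{k_rho:.0f}, "
          f"k_beta~{k_beta:.0f}, rate~{rate:.3f}")
```

For $\rho = -1$, for instance, this gives AMSE-optimal $k$ of the order of a few hundred for $\gamma$ but a few thousand for $\beta$, reflecting that higher order parameters require a larger fraction of the sample.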
4 Numerical and simulation results

In this section we illustrate the asymptotic and the finite sample behavior of some specific examples of $\hat\beta_{n,k}(\tau,\delta,\rho)$. We consider the parameter settings $\delta' = (0.05, 0.1)$ combined with $\tau' = (0.5, 0.6, 0.7)$, $\tau' = (0.6, 0.7, 0.8)$ and $\tau' = (0.7, 0.8, 0.9)$, and $\delta' = (0.25, 0.5)$ combined with $\tau' = (0.75, 1, 1.25)$, $\tau' = (1, 1.25, 1.5)$ and $\tau' = (1.25, 1.5, 1.75)$. These values for the tuning parameters gave a good performance for a wide range of Pareto-type models in a preliminary evaluation of the asymptotic mean squared errors. It should however be mentioned that these values are in no way optimally selected, so improvements over the results presented in this paper are still possible.

In Figure 1 we show the asymptotic bias components $\tilde d_2(\tau,\delta,\rho,\beta,\eta)$ and $\tilde d_3(\tau,\delta,\rho,\beta)$, and the asymptotic standard deviation $\tilde v(\tau,\delta,\gamma,\rho,\beta)$, as functions of $\beta$ for Pareto-type models with $\rho = \beta = \eta$. The bias components are increasing in $\beta$, a result that is in line with the estimation of the second order parameter. The parameter settings $\delta' = (0.05, 0.1)$ with $\tau' = (0.7, 0.8, 0.9)$ and $\delta' = (0.25, 0.5)$ with $\tau' = (1.25, 1.5, 1.75)$ have the smallest bias components of the parameter configurations under consideration, although it should be mentioned that the actual performance with respect to asymptotic bias will depend on the way in which the two asymptotic bias components are combined in the overall bias. As for the asymptotic standard deviation, we see that none of the parameter settings performs uniformly best. As expected, the asymptotic standard deviations are considerably larger than those reported in Fraga Alves et al. (2003) and Goegebeur et al. (2010) in the framework of the estimation of the second order parameter.

In order to get a better idea about the actual large sample behavior of the estimators for $\beta$ we consider two specific Pareto-type models:

1. The Burr distribution. As illustrated in Rodriguez (1977), the Burr model is very flexible, and hence has good potential for practical modeling purposes; see Beirlant et al. (1998) for an application in the actuarial context. For the Burr distribution, denoted Burr($\zeta,\lambda,\delta$),

$$1 - F(x) = \left(\frac{\zeta}{\zeta + x^{\delta}}\right)^{\lambda}, \quad x > 0;\ \zeta, \lambda, \delta > 0,$$

from which one easily derives that $\gamma = 1/(\lambda\delta)$. Further, the model satisfies condition (R4) with $\rho = \beta = \eta = -1/\lambda$ and

$$b(x) = \gamma\,\frac{x^{\rho}}{1-x^{\rho}}, \qquad \tilde b(x) = \rho\,\frac{x^{\rho}}{1-x^{\rho}}, \qquad \breve b(x) = 2\rho\,\frac{x^{\rho}}{1-x^{\rho}}.$$

We consider the cases $(\zeta,\lambda,\delta) = (1, 2, 0.5)$, $(1, 1, 1)$ and $(1, 0.5, 2)$, corresponding to $\gamma = 1$ and $\beta = -0.5$, $-1$ and $-2$, respectively.

2. The 'modified' Pareto distribution, with a tail quantile function given by

$$U(x) = C x^{\gamma} e^{D_0 + D_1 x^{\rho} + D_2 x^{\rho+\beta} + D_3 x^{\rho+\beta+\eta}},$$

with $C > 0$ and $D_1, D_2, D_3 \ne 0$. This model satisfies (R4) provided $\eta > \beta$, with

$$b(x) = \rho D_1 x^{\rho}\left(1 - \tilde C_1 x^{\beta} - \tilde C_2 x^{\beta+\eta}\right), \qquad \tilde b(x) = \frac{\beta(\rho+\beta) D_2 x^{\beta}}{\rho D_1 (1 - \tilde C_3 x^{\eta})}, \qquad \breve b(x) = \frac{\eta(\beta+\eta)(\rho+\beta+\eta) D_3 x^{\eta}}{\beta(\rho+\beta) D_2},$$

in which

$$\tilde C_1 := \frac{(\rho+\beta) D_2}{\rho D_1}, \qquad \tilde C_2 := \frac{(\rho+\beta+\eta) D_3}{\rho D_1}, \qquad \tilde C_3 := \frac{(\beta+\eta)(\rho+\beta+\eta) D_3}{\beta(\rho+\beta) D_2}.$$

Note that this model allows the rate parameters $\rho$, $\beta$ and $\eta$ to differ from each other. We consider the parameter setting $\gamma = 0.1$, $\rho = -1$, $\beta = -0.5$, $\eta = -0.25$, $C = 1$, $D_0 = 0$, $D_1 = 1$, $D_2 = -1$ and $D_3 = -0.75$.
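Before turning to the figures, here is an end-to-end sketch (ours; it assumes NumPy and SciPy, uses the true $\rho$ rather than an estimate, and makes no claim to match the authors' Fortran implementation) that simulates one Burr sample by quantile inversion and evaluates the estimator at a single $k$.

```python
import numpy as np
from math import gamma as Gamma
from scipy.optimize import brentq

rng = np.random.default_rng(7)

# one Burr(1,1,1) sample via quantile inversion: X = (zeta*(V**(-1/lam)-1))**(1/delta);
# for this model gamma = 1 and rho = beta = eta = -1
zeta, lam, delta = 1.0, 1.0, 1.0
x = (zeta * (rng.uniform(size=10_000)**(-1.0 / lam) - 1.0))**(1.0 / delta)

def T(xs, k, alpha):
    xs = np.sort(xs)
    n = xs.size
    j = np.arange(1, k + 1)
    Z = j * (np.log(xs[n - j]) - np.log(xs[n - j - 1]))         # scaled log-spacings
    w = (-np.log(j / (k + 1.0)))**alpha / Gamma(alpha + 1.0)    # log-weight kernel L_alpha
    return np.mean(w * Z)

def Psi(xs, k, a, at, ab):
    return (T(xs, k, a) - T(xs, k, at)) / (T(xs, k, ab) - T(xs, k, at))

def Lambda(tau, rho, beta):
    t1, t2, t3 = tau
    a, c = 1.0 - rho, 1.0 - rho - beta
    return c**(t3 - t1) * (c**t2 * a**t1 - c**t1 * a**t2) / (c**t3 * a**t2 - c**t2 * a**t3)

tau, (d1, d2), rho = (0.75, 1.0, 1.25), (0.25, 0.5), -1.0        # true rho used here
k = 8_000
p = [Psi(x, k, t - d1, t, t - d2) for t in tau]
lam_nk = (p[0] - p[1]) / (p[1] - p[2])

if lam_nk > -(tau[1] - tau[0]) / (tau[1] - tau[2]):              # invertibility condition
    # wide bracket: Lambda increases only slowly to +infinity as beta -> -infinity;
    # the resulting estimate is noisy at finite n
    print(brentq(lambda b: Lambda(tau, rho, b) - lam_nk, -1e6, -1e-6))
else:
    print("Lambda_nk below the invertibility bound at this k")
```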
In Figures 2-4 we show the first order approximation to the asymptotic bias, standard deviation and mean squared error, respectively, as functions of $k$ when $n = 10{,}000$, for all parameter settings and Burr distributions under consideration. The asymptotic bias shown in Figure 2 is, as expected, increasing in absolute value in $k$. For the parameter combinations with $\delta' = (0.05, 0.1)$, the setting $\tau' = (0.6, 0.7, 0.8)$ provides overall a good performance, whereas for $\delta' = (0.25, 0.5)$ the best performance is obtained with $\tau' = (0.75, 1, 1.25)$. Concerning the asymptotic standard deviation shown in Figure 3, we observe that none of the considered parameter combinations works uniformly best, a result that is in line with Figure 1, though the cases $\delta' = (0.05, 0.1)$, $\tau' = (0.6, 0.7, 0.8)$ and $\delta' = (0.25, 0.5)$, $\tau' = (0.75, 1, 1.25)$ provide a good overall performance. Figure 4 shows the AMSE of $\hat\beta_{n,k}(\tau,\delta,\rho)$ as a function of $k$. As expected, the parameter combinations $\delta' = (0.05, 0.1)$, $\tau' = (0.6, 0.7, 0.8)$ and $\delta' = (0.25, 0.5)$, $\tau' = (0.75, 1, 1.25)$ work best, with the former performing slightly better than the latter. In Figure 5 we show the corresponding results for the modified Pareto distribution. As for the Burr distribution, the parameter settings $\delta' = (0.05, 0.1)$, $\tau' = (0.6, 0.7, 0.8)$ and $\delta' = (0.25, 0.5)$, $\tau' = (0.75, 1, 1.25)$ give a good performance among the combinations under consideration.

It is a well known fact that when estimating higher order parameters, the differences between the asymptotic and the finite sample results may become considerable. We refer to Draisma (2000) for a discussion of the difficulties of estimating asymptotic behavior correctly in finite samples. Also, Gomes et al. (2002) commented on the discrepancies between the asymptotic and finite sample results when estimating the second order rate parameter $\rho$. Therefore, we also illustrate the properties of the proposed estimator with a small simulation study. In this simulation experiment we include the four distributions considered above, as well as the generalized Pareto distribution. For the generalized Pareto distribution, denoted GPD($\sigma,\gamma$),

$$1 - F(x) = \left(1 + \frac{\gamma x}{\sigma}\right)^{-1/\gamma}, \quad x > 0;\ \sigma, \gamma > 0.$$

This distribution can easily be verified to satisfy condition (R4) with $\rho = \beta = \eta = -\gamma$. We set $(\sigma,\gamma) = (1, 0.5)$ and $(1, 1)$.

For all the distributions under consideration we generated 1,000 samples of size $n = 10{,}000$, and computed the estimators for $\beta$ with $\delta' = (0.05, 0.1)$, $\tau' = (0.6, 0.7, 0.8)$ and $\delta' = (0.25, 0.5)$, $\tau' = (0.75, 1, 1.25)$, for $k = 10$ to $k = 9{,}990$, in steps of 10. The authors developed a Fortran program, built around the NAG library subroutine C05ADF, to obtain $\hat\beta_{n,k}(\tau,\delta,\rho)$ by numerical inversion of the estimating equation $\Lambda_{n,k}(\tau,\delta) = \Lambda(\tau,\rho,\beta)$; the program can be obtained upon request from the authors. In the simulation experiment, the parameter $\rho$ is estimated with the power kernel estimator introduced in Goegebeur et al. (2010), computed at (i) $\breve k = k$ (Figures 6 and 7) and (ii) $\breve k = n^{0.95}$ in case $\rho \ge -1$ and $\breve k = n^{0.975}$ otherwise (Figures 8 and 9). In case (ii) we only report the results for the Burr and generalized Pareto distributions, which allow an explicit derivation of the order of the AMSE optimal sequence for $k$; the proposed sequences for $\breve k$ can easily be verified to satisfy (6) in Theorem 4. In Figures 6 to 9 we show the median (left column) and the empirical mean squared error (right column) as functions of $k$. We motivate our choice for the median by the fact that the sampling distribution of $\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,k})$ is extremely skewed and also has a large variance, the latter being in line with the results from the above asymptotic evaluation. The empirical mean squared error is computed as the sample mean (over the simulation runs) of the squared deviations between the estimate and the true parameter value.
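For completeness, these summaries can be computed as in the sketch below (ours; the array of estimates is a placeholder standing in for the output of a simulation driver such as the one sketched earlier in this section).

```python
import numpy as np

beta_true = -0.5
rng = np.random.default_rng(0)
# placeholder for beta_hats[r, i]: the estimate in simulation run r at the i-th k value;
# in practice this array would be filled by looping the estimator over runs and k
beta_hats = beta_true + rng.standard_t(df=3, size=(1_000, 50))

med = np.median(beta_hats, axis=0)                        # median per k: robust to skewness
emp_mse = np.mean((beta_hats - beta_true)**2, axis=0)     # empirical MSE per k
print(med[:3], emp_mse[:3])
```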
The sampling distribution of both estimators seems to be reasonably well centered at the true value, though, as expected, within the Burr($\zeta,\lambda,\delta$) and GPD($\sigma,\gamma$) families the case $\beta = -0.5$ is more difficult than the cases with a smaller $\beta$ value. The estimators for $\beta$ with $\delta' = (0.05, 0.1)$, $\tau' = (0.6, 0.7, 0.8)$ and $\delta' = (0.25, 0.5)$, $\tau' = (0.75, 1, 1.25)$ perform about equally well in terms of location and empirical MSE, with no clear preference for one of the settings. By comparing Figures 6 and 7 with Figures 8 and 9, we see that taking $\breve k = n^{0.95}$ in case $\rho \ge -1$ and $\breve k = n^{0.975}$ otherwise works slightly better than using $\breve k = k$, both in terms of bias and empirical MSE.

The proposed settings for the tuning parameters $\delta$ and $\tau$ gave overall a good performance on the simulated data sets. As mentioned above, the proposed values of these parameters are in no way optimally selected, and further work could be done on the development of automatic, data driven procedures for setting these parameters. In practical applications of extreme value methodology, the tuning parameters are typically set at values that provide reasonably stable sample paths of the estimators under consideration, as it is common practice to infer the value of the parameter of interest from the stable part of the sample path. This can be done, e.g., by using a stability criterion such as the one discussed in Caeiro and Gomes (2006) in the framework of the estimation of second order parameters. This is a topic of future research.

5 Conclusion

In this paper we introduced a class of estimators for the third order parameter in extreme value statistics and established its consistency and asymptotic normality under some regularity conditions. The presented theoretical results are of interest in their own right, but will also form the basis for research on the AMSE optimal selection of the number of upper order statistics when estimating the second order parameter, and on the development of bias-corrected estimators for the second order parameter. Work on these extensions is in progress.

6 Appendix

6.1 Proof of Lemma 1

The two limits of $\Lambda(\tau,\rho,\beta)$ are trivial to obtain. In order to show that $\beta \mapsto \Lambda(\tau,\rho,\beta)$ is a decreasing function for $\beta \in (-\infty, 0)$ we study its derivative

$$\frac{d\Lambda(\tau,\rho,\beta)}{d\beta} = \frac{(1-\rho-\beta)^{\tau_2+\tau_3-1}(1-\rho)^{\tau_2+\tau_3}\,f(|\beta|)}{\left[(1-\rho-\beta)^{\tau_3}(1-\rho)^{\tau_2} - (1-\rho-\beta)^{\tau_2}(1-\rho)^{\tau_3}\right]^2},$$

with

$$f(x) := (\tau_3-\tau_1)(1+ax)^{\tau_2-\tau_1} - (\tau_2-\tau_1)(1+ax)^{\tau_3-\tau_1} - (\tau_3-\tau_2), \quad x > 0,$$

and $a := 1/(1-\rho)$. It now remains to verify that $f(x) < 0$ for $x > 0$. We have

$$\lim_{x\to 0^+} f(x) = 0, \qquad \lim_{x\to+\infty} f(x) = -\infty,$$

and

$$f'(x) = a(\tau_2-\tau_1)(\tau_3-\tau_1)(1+ax)^{-\tau_1-1}\left[(1+ax)^{\tau_2} - (1+ax)^{\tau_3}\right],$$

which is negative for $x > 0$, from which the result follows.
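The monotonicity asserted by Lemma 1 can also be checked numerically; a small sketch (ours, assuming NumPy) evaluates $\Lambda$ on a grid of $\beta$ values and compares the value near zero with the stated limit.

```python
import numpy as np

def Lambda(tau, rho, beta):
    t1, t2, t3 = tau
    a, c = 1.0 - rho, 1.0 - rho - beta
    return c**(t3 - t1) * (c**t2 * a**t1 - c**t1 * a**t2) / (c**t3 * a**t2 - c**t2 * a**t3)

tau, rho = (0.75, 1.0, 1.25), -1.0
betas = -np.logspace(-4, 2, 200)              # beta running from about -1e-4 down to -100
vals = Lambda(tau, rho, betas)

print(np.all(np.diff(vals) > 0))              # Lambda increases as beta decreases
print(vals[0], -(tau[1] - tau[0]) / (tau[1] - tau[2]))  # compare with the beta -> 0- limit
```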
6.2 Proof of Theorem 1

For some arbitrary positive real numbers $\alpha$, $\tilde\alpha$ and $\breve\alpha$, we denote

$$N_k(\alpha,\tilde\alpha,\gamma) := \gamma[\sigma(L_\alpha) N_k(L_\alpha) - \sigma(L_{\tilde\alpha}) N_k(L_{\tilde\alpha})], \qquad (7)$$
$$N_k(\alpha,\tilde\alpha,\breve\alpha,\gamma,\rho) := \frac{N_k(\alpha,\tilde\alpha,\gamma) - \psi(\alpha,\tilde\alpha,\breve\alpha,\rho)\,N_k(\breve\alpha,\tilde\alpha,\gamma)}{I_1(\breve\alpha,\tilde\alpha,\rho)}. \qquad (8)$$

From Proposition 1 we have the following asymptotic distributional representation:

$$\frac{T_{n,k}(L_\alpha) - T_{n,k}(L_{\tilde\alpha})}{b(Y_{n-k,n})} \stackrel{D}{=} I_1(\alpha,\tilde\alpha,\rho) + \frac{1}{\sqrt{k}\,b(Y_{n-k,n})}\,N_k(\alpha,\tilde\alpha,\gamma) + \tilde b(Y_{n-k,n})\,I_2(\alpha,\tilde\alpha,\rho,\beta)(1+o_P(1)) + O_P\!\left(\frac{1}{\sqrt{k}}\right).$$

Note that $\sqrt{k}\,b(n/k)\tilde b(n/k) \to \infty$ implies that $\sqrt{k}\,b(n/k) \to \infty$, and hence, using the property $Y_{n-k,n} = (n/k)(1+o_P(1))$ and the regular variation of the function $b$, we obtain $\sqrt{k}\,b(Y_{n-k,n}) = \sqrt{k}\,b(n/k)(1+o_P(1)) \to \infty$. Also, from the above property concerning $Y_{n-k,n}$ and the regular variation of $\tilde b$ we obtain $\tilde b(Y_{n-k,n}) = \tilde b(n/k)(1+o_P(1)) \to 0$. A straightforward application of Taylor's theorem then gives

$$\Psi_{n,k}(\alpha,\tilde\alpha,\breve\alpha) \stackrel{D}{=} \psi(\alpha,\tilde\alpha,\breve\alpha,\rho) + \frac{1}{\sqrt{k}\,b(Y_{n-k,n})}\,N_k(\alpha,\tilde\alpha,\breve\alpha,\gamma,\rho)(1+o_P(1)) + \tilde b(Y_{n-k,n})\,c_1(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta)(1+o_P(1)),$$

where $c_1(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta)$ is defined in (5). For the specific construction considered in Theorem 1 we then have

$$\frac{\Psi_{n,k}(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2) - \Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2)}{\tilde b(Y_{n-k,n})} \stackrel{D}{=} [c_1(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta)](1+o_P(1))$$
$$\quad + \frac{1}{\sqrt{k}\,b(Y_{n-k,n})\tilde b(Y_{n-k,n})}\left[N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho) - N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho)\right](1+o_P(1)),$$

and hence, if $\sqrt{k}\,b(n/k)\tilde b(n/k) \to \infty$,

$$\frac{\Psi_{n,k}(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2) - \Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2)}{\tilde b(Y_{n-k,n})} \stackrel{P}{\to} c_1(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\rho,\beta) - c_1(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\rho,\beta).$$

A similar result holds for the denominator of $\Lambda_{n,k}(\tau,\delta)$, and thus $\Lambda_{n,k}(\tau,\delta) \stackrel{P}{\to} \Lambda(\tau,\rho,\beta)$. The function $\Lambda^{\leftarrow}$ is continuous in its last argument, and hence the consistency of $\hat\beta_{n,k}(\tau,\delta,\rho)$ for $\beta$ is obtained in a straightforward way from the continuous mapping theorem. In a similar way we obtain the consistency of $\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k})$ for $\beta$ from the continuous mapping theorem.

6.3 Proof of Theorem 2

We state the following lemma, taken from Goegebeur et al. (2010), which provides a result that is useful in the derivation of the asymptotic expansion of $T_{n,k}(K)$ given in Theorem 2.

Lemma 2 (Goegebeur et al., 2010). Denote by $E_1, \ldots, E_k$ standard exponential random variables and by $U_{1,k} \le \cdots \le U_{k,k}$ the order statistics of a random sample of size $k$ from $U(0,1)$. Assume that $\int_0^1 |K(u)|\,du < \infty$ in case $|\rho| \ge 1$ and that $\int_0^1 |K(u)|\,u^{|\rho|-1-\varepsilon}\,du < \infty$ for some $\varepsilon > 0$ in case $|\rho| < 1$. Then

$$S_k := \frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right)\left(U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right) E_j = O_P(1/\sqrt{k}), \quad \text{for } k \to \infty.$$

Proof of Theorem 2. From the inverse probability integral transform we have that $X_{j,n} \stackrel{D}{=} U(Y_{j,n})$, $j = 1, \ldots, n$. Further, it is well known that $Y_{n-j+1,n}/Y_{n-k,n} \stackrel{D}{=} Y_{k-j+1,k}$, $j = 1, \ldots, k$, independently of $Y_{n-k,n}$. Using these properties and condition (R4) we can write, for $j = 1, \ldots, k$,

$$\ln X_{n-j+1,n} - \ln X_{n-k,n} \stackrel{D}{=} \ln U(Y_{k-j+1,k} Y_{n-k,n}) - \ln U(Y_{n-k,n})$$
$$= \gamma\ln Y_{k-j+1,k} + b(Y_{n-k,n})\,D_\rho(Y_{k-j+1,k}) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,H_{\rho,\beta}(Y_{k-j+1,k})$$
$$\quad + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,R_{\rho,\beta,\eta}(Y_{k-j+1,k}) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,B_{n,k}(j),$$

where

$$B_{n,k}(j) := \frac{1}{\breve b(Y_{n-k,n})}\left[\frac{1}{\tilde b(Y_{n-k,n})}\left(\frac{\ln U(Y_{k-j+1,k} Y_{n-k,n}) - \ln U(Y_{n-k,n}) - \gamma\ln Y_{k-j+1,k}}{b(Y_{n-k,n})} - D_\rho(Y_{k-j+1,k})\right) - H_{\rho,\beta}(Y_{k-j+1,k})\right] - R_{\rho,\beta,\eta}(Y_{k-j+1,k}),$$

and hence

$$Z_j \stackrel{D}{=} \gamma j(\ln Y_{k-j+1,k} - \ln Y_{k-j,k}) + b(Y_{n-k,n})\,j(D_\rho(Y_{k-j+1,k}) - D_\rho(Y_{k-j,k})) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,j(H_{\rho,\beta}(Y_{k-j+1,k}) - H_{\rho,\beta}(Y_{k-j,k}))$$
$$\quad + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,j(R_{\rho,\beta,\eta}(Y_{k-j+1,k}) - R_{\rho,\beta,\eta}(Y_{k-j,k})) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,\zeta_j,$$

where $\zeta_j := j(B_{n,k}(j) - B_{n,k}(j+1))$, with the convention that $B_{n,k}(k+1) = 0$. This then gives

$$T_{n,k}(K) \stackrel{D}{=} \gamma\,\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right) j(\ln Y_{k-j+1,k} - \ln Y_{k-j,k}) + b(Y_{n-k,n})\,\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right) j(D_\rho(Y_{k-j+1,k}) - D_\rho(Y_{k-j,k}))$$
$$\quad + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right) j(H_{\rho,\beta}(Y_{k-j+1,k}) - H_{\rho,\beta}(Y_{k-j,k}))$$
$$\quad + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right) j(R_{\rho,\beta,\eta}(Y_{k-j+1,k}) - R_{\rho,\beta,\eta}(Y_{k-j,k})) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right)\zeta_j$$
$$=: T^{(1)}_{n,k}(K) + b(Y_{n-k,n})\,T^{(2)}_{n,k}(K) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,T^{(3)}_{n,k}(K) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,T^{(4)}_{n,k}(K) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,T^{(5)}_{n,k}(K).$$

We now analyze each term in turn. The derivations for the terms $T^{(1)}_{n,k}(K)$ and $T^{(2)}_{n,k}(K)$ are identical to those in Goegebeur et al. (2010).
We repeat the argumentation here briefly, as it is also useful for the development of the subsequent terms. For the first term we have the following. Let $E_1, \ldots, E_k$ denote a random sample of size $k$ from the unit exponential distribution, and denote by $E_{1,k} \le \cdots \le E_{k,k}$ the corresponding order statistics. From the Rényi representation (Rényi, 1953) we obtain

$$\ln Y_{k-j+1,k} - \ln Y_{k-j,k} \stackrel{D}{=} E_{k-j+1,k} - E_{k-j,k} \stackrel{D}{=} \frac{E_j}{j}.$$

Thus

$$T^{(1)}_{n,k}(K) \stackrel{D}{=} \gamma\,\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right) + \gamma\,\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right)(E_j - 1) = \gamma\mu(K) + o(1/\sqrt{k}) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}} = \gamma\mu(K) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}}(1+o_P(1)),$$

where

$$N_k(K) := \sqrt{k}\;\frac{\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right)(E_j-1)}{\sigma(K)}, \qquad (9)$$

which is, under condition (K)(vi), an asymptotically standard normal random variable by the Lindeberg-Feller central limit theorem; see also Chernoff et al. (1967).

As for $T^{(2)}_{n,k}(K)$ we obtain, by the mean value theorem and the Rényi representation, the following expression:

$$T^{(2)}_{n,k}(K) \stackrel{D}{=} \frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right) j\,\frac{e^{\rho E_{k-j+1,k}} - e^{\rho E_{k-j,k}}}{\rho} \stackrel{D}{=} \frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right) E_j e^{\rho Q_{j,k}},$$

where $Q_{j,k}$ is a random value between $E_{k-j,k}$ and $E_{k-j+1,k}$, and hence

$$T^{(2)}_{n,k}(K) \stackrel{D}{=} \frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} E_j + \frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right)\left[e^{\rho Q_{j,k}} - \left(\frac{j}{k+1}\right)^{-\rho}\right] E_j =: T^{(2,1)}_{n,k}(K) + T^{(2,2)}_{n,k}(K).$$

The term $T^{(2,1)}_{n,k}(K)$ can be analyzed in a similar way as $T^{(1)}_{n,k}(K)$:

$$T^{(2,1)}_{n,k}(K) = I_1(K,\rho) + o(1/\sqrt{k}) + \frac{\sigma(K,\rho)}{\sqrt{k}}\,P_k(K,\rho) = I_1(K,\rho) + \frac{\sigma(K,\rho)}{\sqrt{k}}\,P_k(K,\rho)(1+o_P(1)),$$

where $\sigma^2(K,\rho) := \int_0^1 K^2(u)\,u^{-2\rho}\,du$ and

$$P_k(K,\rho) := \sqrt{k}\;\frac{\frac{1}{k}\sum_{j=1}^k K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho}(E_j-1)}{\sigma(K,\rho)}.$$

The statistic $P_k(K,\rho)$ is a standardized kernel statistic with a kernel function satisfying $\max_{j\in\{1,\ldots,k\}} K\!\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} = o(\sqrt{k})$, and hence we have that $P_k(K,\rho) \stackrel{D}{\to} N(0,1)$ for $k \to \infty$ (Chernoff et al., 1967). The term $T^{(2,2)}_{n,k}(K)$ can be analyzed as follows. Let $U_{1,k} \le \cdots \le U_{k,k}$ denote the order statistics of a random sample of size $k$ from the uniform $(0,1)$ distribution. From the inverse probability integral transform we obtain that $\exp(E_{k-j+1,k}) \stackrel{D}{=} \exp(-\ln(1-U_{k-j+1,k})) \stackrel{D}{=} \exp(-\ln U_{j,k}) = 1/U_{j,k}$, $j = 1, \ldots, k$. Also we have

$$\left|e^{\rho Q_{j,k}} - \left(\frac{j}{k+1}\right)^{-\rho}\right| \le \max\left\{\left|e^{\rho E_{k-j+1,k}} - \left(\frac{j}{k+1}\right)^{-\rho}\right|,\ \left|e^{\rho E_{k-j,k}} - \left(\frac{j}{k+1}\right)^{-\rho}\right|\right\}$$
$$\stackrel{D}{=} \max\left\{\left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|,\ \left|U_{j+1,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|\right\} \le \max\left\{\left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|,\ \left|U_{j+1,k}^{-\rho} - \left(\frac{j+1}{k+1}\right)^{-\rho}\right| + c_{j,k}\right\},$$

where $c_{j,k} := \left(\frac{j+1}{k+1}\right)^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}$. Hence

$$|T^{(2,2)}_{n,k}(K)| \le \frac{1}{k}\sum_{j=1}^k \left|K\!\left(\frac{j}{k+1}\right)\right| E_j \left|U_{j+1,k}^{-\rho} - \left(\frac{j+1}{k+1}\right)^{-\rho}\right| + \frac{1}{k}\sum_{j=1}^k \left|K\!\left(\frac{j}{k+1}\right)\right| c_{j,k} E_j + \frac{1}{k}\sum_{j=1}^k \left|K\!\left(\frac{j}{k+1}\right)\right| E_j \left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|$$
$$=: T^{(2,2,1)}_{n,k}(K) + T^{(2,2,2)}_{n,k}(K) + T^{(2,2,3)}_{n,k}(K).$$

By using Lemma 2, the terms $T^{(2,2,1)}_{n,k}(K)$ and $T^{(2,2,3)}_{n,k}(K)$ can be shown to be both $O_P(1/\sqrt{k})$. Concerning $T^{(2,2,2)}_{n,k}(K)$ we have that

$$T^{(2,2,2)}_{n,k}(K) = \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^k \left|K\!\left(\frac{j}{k+1}\right)\right| z_{j,k}^{|\rho|-1} E_j,$$

where $z_{j,k}$ is a value between $j/(k+1)$ and $(j+1)/(k+1)$. When $|\rho| \ge 1$ we have

$$T^{(2,2,2)}_{n,k}(K) \le \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^k \left|K\!\left(\frac{j}{k+1}\right)\right| E_j,$$

from which, by the strong law of large numbers, we obtain that $T^{(2,2,2)}_{n,k}(K) = O_P(1/k)$. In case $|\rho| < 1$ we have

$$T^{(2,2,2)}_{n,k}(K) \le \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^k \left|K\!\left(\frac{j}{k+1}\right)\right| \left(\frac{j}{k+1}\right)^{|\rho|-1} E_j, \qquad (10)$$

which, by invoking condition (K)(vii) and the strong law of large numbers, yields that $T^{(2,2,2)}_{n,k}(K) = O_P(1/k)$. Combining the above results gives

$$T^{(2)}_{n,k}(K) = I_1(K,\rho) + O_P(1/\sqrt{k}).$$

The terms $T^{(3)}_{n,k}(K)$ and $T^{(4)}_{n,k}(K)$ can be analyzed using an analogous line of argumentation, and yield

$$T^{(3)}_{n,k}(K) = I_2(K,\rho,\beta) + O_P(1/\sqrt{k}), \qquad T^{(4)}_{n,k}(K) = I_3(K,\rho,\beta,\eta)(1+o_P(1)).$$
Finally, concerning $T^{(5)}_{n,k}(K)$, we apply condition (K)(i) and obtain

$$T^{(5)}_{n,k}(K) = \frac{1}{k}\sum_{j=1}^k \frac{\zeta_j}{j/(k+1)}\int_0^{j/(k+1)} u(v)\,dv = \frac{k+1}{k}\sum_{j=1}^k \frac{\zeta_j}{j}\sum_{i=1}^j \int_{(i-1)/(k+1)}^{i/(k+1)} u(v)\,dv = \frac{k+1}{k}\sum_{i=1}^k \left(\int_{(i-1)/(k+1)}^{i/(k+1)} u(v)\,dv\right) B_{n,k}(i),$$

and

$$|T^{(5)}_{n,k}(K)| \le \frac{1}{k}\sum_{i=1}^k f\!\left(\frac{i}{k+1}\right)|B_{n,k}(i)|.$$

From Theorem 2.1 in Fraga Alves et al. (2006) we have that for every $\varepsilon > 0$ there exists $n_0$ such that for any $n \ge n_0$, with arbitrary probability, for $i = 1, \ldots, k$,

$$|B_{n,k}(i)| \le \varepsilon Y_{k-i+1,k}^{\rho+\beta+\eta+\varepsilon} \stackrel{D}{=} \varepsilon U_{i,k}^{-\rho-\beta-\eta-\varepsilon} \le \varepsilon U_{i,k}^{-\rho-\beta-\eta-\delta},$$

where $\varepsilon < \delta$. Thus we have that

$$\sup_{i\in\{1,\ldots,k\}} \frac{|B_{n,k}(i)|}{U_{i,k}^{-\rho-\beta-\eta-\delta}} = o_P(1),$$

whence

$$|T^{(5)}_{n,k}(K)| \le o_P(1)\,\frac{1}{k}\sum_{i=1}^k f\!\left(\frac{i}{k+1}\right) U_{i,k}^{-\rho-\beta-\eta-\delta},$$

which, provided $\delta < -\rho-\beta-\eta$, is $o_P(1)$.

6.4 Proof of Theorem 3

It is easy to verify that $L_\alpha$ satisfies (K), and hence from Theorem 2 we obtain the following asymptotic representation for our statistic under (R4):

$$T_{n,k}(L_\alpha) \stackrel{D}{=} \gamma + \gamma\sigma(L_\alpha)\frac{N_k(L_\alpha)}{\sqrt{k}}(1+o_P(1)) + b(Y_{n-k,n})\,I_1(L_\alpha,\rho) + b(Y_{n-k,n})\,O_P(1/\sqrt{k}) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,I_2(L_\alpha,\rho,\beta)$$
$$\quad + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\,O_P(1/\sqrt{k}) + b(Y_{n-k,n})\tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,I_3(L_\alpha,\rho,\beta,\eta)(1+o_P(1)),$$

and thus

$$\frac{T_{n,k}(L_\alpha) - T_{n,k}(L_{\tilde\alpha})}{b(Y_{n-k,n})} \stackrel{D}{=} I_1(\alpha,\tilde\alpha,\rho) + \frac{1}{\sqrt{k}\,b(Y_{n-k,n})}\,N_k(\alpha,\tilde\alpha,\gamma)(1+o_P(1)) + \tilde b(Y_{n-k,n})\,I_2(\alpha,\tilde\alpha,\rho,\beta) + \tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,I_3(\alpha,\tilde\alpha,\rho,\beta,\eta)(1+o_P(1)). \qquad (11)$$

A straightforward application of Taylor's theorem provides the following expansion for a ratio of statistics of the form (11):

$$\Psi_{n,k}(\alpha,\tilde\alpha,\breve\alpha) \stackrel{D}{=} \psi(\alpha,\tilde\alpha,\breve\alpha,\rho) + \frac{1}{\sqrt{k}\,b(Y_{n-k,n})}\,N_k(\alpha,\tilde\alpha,\breve\alpha,\gamma,\rho)(1+o_P(1)) + \tilde b(Y_{n-k,n})\,c_1(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta)$$
$$\quad + \tilde b(Y_{n-k,n})\breve b(Y_{n-k,n})\,c_2(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta,\eta)(1+o_P(1)) - \tilde b^2(Y_{n-k,n})\,c_3(\alpha,\tilde\alpha,\breve\alpha,\rho,\beta)(1+o_P(1)).$$

Taking a difference of two statistics $\Psi_{n,k}$ with appropriately chosen parameters $\alpha$, $\tilde\alpha$ and $\breve\alpha$ gives

$$\frac{\Psi_{n,k}(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2) - \Psi_{n,k}(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2)}{\tilde b(Y_{n-k,n})} \stackrel{D}{=} d_1(\tau_1,\tau_2,\delta,\rho,\beta) + \frac{1}{\sqrt{k}\,b(Y_{n-k,n})\tilde b(Y_{n-k,n})}\,V_k(\tau_1,\tau_2,\delta,\gamma,\rho)(1+o_P(1))$$
$$\quad + \breve b(Y_{n-k,n})\,d_2(\tau_1,\tau_2,\delta,\rho,\beta,\eta)(1+o_P(1)) - \tilde b(Y_{n-k,n})\,d_3(\tau_1,\tau_2,\delta,\rho,\beta)(1+o_P(1)),$$

and, by another application of Taylor's theorem,

$$\Lambda_{n,k}(\tau,\delta) \stackrel{D}{=} \Lambda(\tau,\rho,\beta) + \frac{V_k(\tau,\delta,\gamma,\rho,\beta)}{\sqrt{k}\,b(Y_{n-k,n})\tilde b(Y_{n-k,n})}(1+o_P(1)) + \breve b(Y_{n-k,n})\,d_2(\tau,\delta,\rho,\beta,\eta)(1+o_P(1)) - \tilde b(Y_{n-k,n})\,d_3(\tau,\delta,\rho,\beta)(1+o_P(1)),$$

from which the result of Theorem 3 follows.

6.5 Proof of Theorem 4

The asymptotic normality of the normalized $\hat\beta_{n,k}(\tau,\delta,\rho)$ can be obtained by a straightforward application of the delta method. For what concerns the asymptotic normality of the normalized $\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k})$, first note, by long and tedious calculations, that $d\beta/d\rho$ exists for all $\beta, \rho < 0$, and is continuous in $\rho$. From Taylor's theorem we then obtain

$$\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k}) = \hat\beta_{n,k}(\tau,\delta,\rho) + \frac{d\beta}{d\rho}(\hat\rho_{n,\breve k} - \rho)(1+o_P(1)).$$

Thus

$$\sqrt{k}\,b(n/k)\tilde b(n/k)\,(\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k}) - \beta) = \sqrt{k}\,b(n/k)\tilde b(n/k)\,(\hat\beta_{n,k}(\tau,\delta,\rho) - \beta) + \frac{d\beta}{d\rho}\,\sqrt{k}\,b(n/k)\tilde b(n/k)\,(\hat\rho_{n,\breve k} - \rho)(1+o_P(1)),$$

from which the result of Theorem 4 follows.
6.6 Derivation of the asymptotic variance in Theorem 3

Straightforward calculations lead to

$$\mathrm{Var}(V_k(\tau,\delta,\gamma,\rho,\beta)) = \frac{1}{d_1^2(\tau_2,\tau_3,\delta,\rho,\beta)}\left[\mathrm{Var}(V_k(\tau_1,\tau_2,\delta,\gamma,\rho)) + \Lambda^2(\tau,\rho,\beta)\,\mathrm{Var}(V_k(\tau_2,\tau_3,\delta,\gamma,\rho)) - 2\Lambda(\tau,\rho,\beta)\,\mathrm{Cov}(V_k(\tau_1,\tau_2,\delta,\gamma,\rho), V_k(\tau_2,\tau_3,\delta,\gamma,\rho))\right],$$

in which

$$\mathrm{Var}(V_k(\tau_1,\tau_2,\delta,\gamma,\rho)) = \mathrm{Var}(N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho)) + \mathrm{Var}(N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho)) - 2\,\mathrm{Cov}(N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho), N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho))$$

and

$$\mathrm{Cov}(V_k(\tau_1,\tau_2,\delta,\gamma,\rho), V_k(\tau_2,\tau_3,\delta,\gamma,\rho)) = \mathrm{Cov}(N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho), N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho))$$
$$\quad - \mathrm{Cov}(N_k(\tau_1-\delta_1,\tau_1,\tau_1-\delta_2,\gamma,\rho), N_k(\tau_3-\delta_1,\tau_3,\tau_3-\delta_2,\gamma,\rho)) - \mathrm{Var}(N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho))$$
$$\quad + \mathrm{Cov}(N_k(\tau_2-\delta_1,\tau_2,\tau_2-\delta_2,\gamma,\rho), N_k(\tau_3-\delta_1,\tau_3,\tau_3-\delta_2,\gamma,\rho)).$$

For arbitrary but positive $\alpha_i$, $\tilde\alpha_i$, $\breve\alpha_i$, $i = 1, 2$, we have, using (8), that

$$\mathrm{Cov}(N_k(\alpha_1,\tilde\alpha_1,\breve\alpha_1,\gamma,\rho), N_k(\alpha_2,\tilde\alpha_2,\breve\alpha_2,\gamma,\rho)) = \frac{1}{I_1(\breve\alpha_1,\tilde\alpha_1,\rho)\,I_1(\breve\alpha_2,\tilde\alpha_2,\rho)}\left[\mathrm{Cov}(N_k(\alpha_1,\tilde\alpha_1,\gamma), N_k(\alpha_2,\tilde\alpha_2,\gamma))\right.$$
$$\quad - \psi(\alpha_2,\tilde\alpha_2,\breve\alpha_2,\rho)\,\mathrm{Cov}(N_k(\alpha_1,\tilde\alpha_1,\gamma), N_k(\breve\alpha_2,\tilde\alpha_2,\gamma)) - \psi(\alpha_1,\tilde\alpha_1,\breve\alpha_1,\rho)\,\mathrm{Cov}(N_k(\breve\alpha_1,\tilde\alpha_1,\gamma), N_k(\alpha_2,\tilde\alpha_2,\gamma))$$
$$\quad \left. + \psi(\alpha_1,\tilde\alpha_1,\breve\alpha_1,\rho)\,\psi(\alpha_2,\tilde\alpha_2,\breve\alpha_2,\rho)\,\mathrm{Cov}(N_k(\breve\alpha_1,\tilde\alpha_1,\gamma), N_k(\breve\alpha_2,\tilde\alpha_2,\gamma))\right], \qquad (12)$$

where, from (7), the covariances are of the form

$$\mathrm{Cov}(N_k(\alpha_1,\tilde\alpha_1,\gamma), N_k(\alpha_2,\tilde\alpha_2,\gamma)) \sim \gamma^2[\sigma(\alpha_1,\alpha_2) - \sigma(\alpha_1,\tilde\alpha_2) - \sigma(\tilde\alpha_1,\alpha_2) + \sigma(\tilde\alpha_1,\tilde\alpha_2)], \quad k \to \infty,$$

with

$$\sigma(\alpha_1,\alpha_2) := \int_0^1 L_{\alpha_1}(u)\,L_{\alpha_2}(u)\,du = \frac{\Gamma(\alpha_1+\alpha_2+1)}{\Gamma(\alpha_1+1)\Gamma(\alpha_2+1)}.$$

The required variance terms in (12) follow in a straightforward way from the above covariance expressions.

Acknowledgement

The authors are very grateful to the referees for their careful reading of the paper and their comments, which led to significant improvements of the initial draft.

References

[1] Beirlant, J., Dierckx, G., Goegebeur, Y., Matthys, G., 1999. Tail index estimation and an exponential regression model. Extremes 2, 177–200.
[2] Beirlant, J., Dierckx, G., Guillou, A., Stărică, C., 2002. On exponential representations of log-spacings of extreme order statistics. Extremes 5, 157–180.
[3] Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J., 2004. Statistics of Extremes – Theory and Applications. Wiley Series in Probability and Statistics.
[4] Beirlant, J., Goegebeur, Y., Verlaak, R., Vynckier, P., 1998. Burr regression and portfolio segmentation. Insurance: Mathematics and Economics 23, 231–250.
[5] Beirlant, J., Vynckier, P., Teugels, J., 1996. Tail index estimation, Pareto quantile plots, and regression diagnostics. Journal of the American Statistical Association 91, 1659–1667.
[6] Bingham, N.H., Goldie, C.M., Teugels, J.L., 1987. Regular Variation. Cambridge: Cambridge University Press.
[7] Caeiro, F., Gomes, M.I., 2006. A new class of estimators of a "scale" second order parameter. Extremes 9, 193–211.
[8] Caeiro, F., Gomes, M.I., 2008. Minimum-variance reduced-bias tail index and high quantile estimation. REVSTAT – Statistical Journal 6, 1–20.
[9] Chernoff, H., Gastwirth, J.L., Johns, M.V., 1967. Asymptotic distribution of linear combinations of functions of order statistics with applications to estimation. Annals of Mathematical Statistics 38, 52–72.
[10] Ciuperca, G., Mercadier, C., 2010. Semi-parametric estimation for heavy tailed distributions. Extremes 13, 55–87.
[11] de Haan, L., Ferreira, A., 2006. Extreme Value Theory: An Introduction. Springer.
[12] de Haan, L., Stadtmüller, U., 1996. Generalized regular variation of second order. Journal of the Australian Mathematical Society (Series A) 61, 381–395.
[13] Draisma, G., 2000. Parametric and Semi-Parametric Methods in Extreme Value Theory. Tinbergen Institute Research Series 239, Erasmus Universiteit Rotterdam.
[14] Drees, H., Kaufmann, E., 1998. Selecting the optimal sample fraction in univariate extreme value estimation. Stochastic Processes and their Applications 75, 149–172.
[15] Feuerverger, A., Hall, P., 1999. Estimating a tail exponent by modelling departure from a Pareto distribution. The Annals of Statistics 27, 760–781.
[16] Fraga Alves, I., de Haan, L., Lin, T., 2006. Third order extended regular variation. Publications de l'Institut Mathématique 80, 109–120.
[17] Fraga Alves, M.I., Gomes, M.I., de Haan, L., 2003. A new class of semi-parametric estimators of the second order parameter. Portugaliae Mathematica 60, 193–213.
[18] Geluk, J.L., de Haan, L., 1987. Regular Variation, Extensions and Tauberian Theorems. CWI Tract 40, Center for Mathematics and Computer Science, Amsterdam.
[19] Gnedenko, B.V., 1943. Sur la distribution limite du terme maximum d'une série aléatoire. Annals of Mathematics 44, 423–453.
[20] Goegebeur, Y., Beirlant, J., de Wet, T., 2010. Kernel estimators for the second order parameter in extreme value statistics. Journal of Statistical Planning and Inference 140, 2632–2652.
[21] Gomes, M.I., de Haan, L., Peng, L., 2002. Semi-parametric estimation of the second order parameter in statistics of extremes. Extremes 5, 387–414.
[22] Gomes, M.I., de Haan, L., Rodrigues, L.H., 2008. Tail index estimation for heavy-tailed models: accommodation of bias in weighted log-excesses. Journal of the Royal Statistical Society Series B 70, 31–52.
[23] Gomes, M.I., Martins, M.J., 2002. 'Asymptotically unbiased' estimators of the tail index based on external estimation of the second order parameter. Extremes 5, 5–31.
[24] Guillou, A., Hall, P., 2001. A diagnostic for selecting the threshold in extreme value analysis. Journal of the Royal Statistical Society Series B 63, 293–305.
[25] Hall, P., Welsh, A.H., 1985. Adaptive estimates of parameters of regular variation. The Annals of Statistics 13, 331–341.
[26] Rényi, A., 1953. On the theory of order statistics. Acta Mathematica Academiae Scientiarum Hungaricae 4, 191–231.
[27] Rodriguez, R.N., 1977. A guide to the Burr type XII distributions. Biometrika 64, 129–134.
[28] Wang, X.Q., Cheng, S.H., 2005. General regular variation of n-th order and the 2nd order Edgeworth expansion of the extreme value distribution (I). Acta Mathematica Sinica 21, 1121–1130.

Figure 1: Asymptotic behavior of $\hat\beta_{n,k}(\tau,\delta,\rho)$ for Pareto-type models with $\gamma = 1$ and $\rho = \beta = \eta$. Left: parameter setting $\delta' = (0.05, 0.1)$ with $\tau' = (0.5, 0.6, 0.7)$ (solid line), $\tau' = (0.6, 0.7, 0.8)$ (dashed line) and $\tau' = (0.7, 0.8, 0.9)$ (dotted line); right: parameter setting $\delta' = (0.25, 0.5)$ with $\tau' = (0.75, 1, 1.25)$ (solid line), $\tau' = (1, 1.25, 1.5)$ (dashed line) and $\tau' = (1.25, 1.5, 1.75)$ (dotted line). Top row: $\tilde d_2(\tau,\delta,\rho,\beta,\eta)$; middle row: $\tilde d_3(\tau,\delta,\rho,\beta)$; bottom row: $\tilde v(\tau,\delta,\gamma,\rho,\beta)$ as a function of $\beta$.
Figure 2: Asymptotic bias of $\hat\beta_{n,k}(\tau,\delta,\rho)$ as a function of $k$ when $n = 10{,}000$ for the Burr(1,2,0.5) (top), Burr(1,1,1) (middle) and Burr(1,0.5,2) (bottom) distributions. Left: $\delta' = (0.05, 0.1)$ with $\tau' = (0.5, 0.6, 0.7)$ (solid line), $\tau' = (0.6, 0.7, 0.8)$ (dashed line) and $\tau' = (0.7, 0.8, 0.9)$ (dotted line). Right: $\delta' = (0.25, 0.5)$ with $\tau' = (0.75, 1, 1.25)$ (solid line), $\tau' = (1, 1.25, 1.5)$ (dashed line) and $\tau' = (1.25, 1.5, 1.75)$ (dotted line).

Figure 3: Asymptotic standard deviation of $\hat\beta_{n,k}(\tau,\delta,\rho)$ as a function of $k$ when $n = 10{,}000$ for the Burr(1,2,0.5) (top), Burr(1,1,1) (middle) and Burr(1,0.5,2) (bottom) distributions. Left: $\delta' = (0.05, 0.1)$ with $\tau' = (0.5, 0.6, 0.7)$ (solid line), $\tau' = (0.6, 0.7, 0.8)$ (dashed line) and $\tau' = (0.7, 0.8, 0.9)$ (dotted line). Right: $\delta' = (0.25, 0.5)$ with $\tau' = (0.75, 1, 1.25)$ (solid line), $\tau' = (1, 1.25, 1.5)$ (dashed line) and $\tau' = (1.25, 1.5, 1.75)$ (dotted line).

Figure 4: Asymptotic mean squared error of $\hat\beta_{n,k}(\tau,\delta,\rho)$ as a function of $k$ when $n = 10{,}000$ for the Burr(1,2,0.5) (top), Burr(1,1,1) (middle) and Burr(1,0.5,2) (bottom) distributions. Left: $\delta' = (0.05, 0.1)$ with $\tau' = (0.5, 0.6, 0.7)$ (solid line), $\tau' = (0.6, 0.7, 0.8)$ (dashed line) and $\tau' = (0.7, 0.8, 0.9)$ (dotted line). Right: $\delta' = (0.25, 0.5)$ with $\tau' = (0.75, 1, 1.25)$ (solid line), $\tau' = (1, 1.25, 1.5)$ (dashed line) and $\tau' = (1.25, 1.5, 1.75)$ (dotted line).

Figure 5: Modified Pareto distribution. Asymptotic bias (top), asymptotic standard deviation (middle) and asymptotic mean squared error (bottom) of $\hat\beta_{n,k}(\tau,\delta,\rho)$ as a function of $k$ when $n = 10{,}000$. Left: $\delta' = (0.05, 0.1)$ with $\tau' = (0.5, 0.6, 0.7)$ (solid line), $\tau' = (0.6, 0.7, 0.8)$ (dashed line) and $\tau' = (0.7, 0.8, 0.9)$ (dotted line). Right: $\delta' = (0.25, 0.5)$ with $\tau' = (0.75, 1, 1.25)$ (solid line), $\tau' = (1, 1.25, 1.5)$ (dashed line) and $\tau' = (1.25, 1.5, 1.75)$ (dotted line).
Figure 6: Burr simulation, 1,000 runs of size $n = 10{,}000$. Left: median of $\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,k})$ with $\tau' = (0.75, 1, 1.25)$, $\delta' = (0.25, 0.5)$ (solid line), and $\tau' = (0.6, 0.7, 0.8)$, $\delta' = (0.05, 0.1)$ (dashed line) for Burr(1,2,0.5) (top), Burr(1,1,1) (middle) and Burr(1,0.5,2) (bottom). Right: corresponding empirical MSE.

Figure 7: GPD(1,0.5) (top), GPD(1,1) (middle) and modified Pareto (bottom) simulation, 1,000 runs of size $n = 10{,}000$. Left: median of $\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,k})$ with $\tau' = (0.75, 1, 1.25)$, $\delta' = (0.25, 0.5)$ (solid line), and $\tau' = (0.6, 0.7, 0.8)$, $\delta' = (0.05, 0.1)$ (dashed line). Right: corresponding empirical MSE.

Figure 8: Burr simulation, 1,000 runs of size $n = 10{,}000$. Left: median of $\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k})$ with $\tau' = (0.75, 1, 1.25)$, $\delta' = (0.25, 0.5)$ (solid line), and $\tau' = (0.6, 0.7, 0.8)$, $\delta' = (0.05, 0.1)$ (dashed line) for Burr(1,2,0.5) (top), Burr(1,1,1) (middle) and Burr(1,0.5,2) (bottom). Right: corresponding empirical MSE.

Figure 9: GPD(1,0.5) (top) and GPD(1,1) (bottom), 1,000 runs of size $n = 10{,}000$. Left: median of $\hat\beta_{n,k}(\tau,\delta,\hat\rho_{n,\breve k})$ with $\tau' = (0.75, 1, 1.25)$, $\delta' = (0.25, 0.5)$ (solid line), and $\tau' = (0.6, 0.7, 0.8)$, $\delta' = (0.05, 0.1)$ (dashed line). Right: corresponding empirical MSE.