Smith, Richard L. and Weissman, Ishay (1991). "Estimating the Extremal Index."

ESTIMATING THE EXTREMAL INDEX
by
RICHARD L. SMITH
University of North Carolina, USA
and
ISHAY WEISSMAN
Technion, Haifa, Israel
Contact Address: Richard L. Smith, Department of Statistics, University of North
Carolina, Chapel Hill, N.C. 27599-3260, U.S.A.
November 4, 1991
Abstract
The extremal index is an important parameter measuring the degree of clustering of
extremes in a stationary process. If we consider the point process of exceedance times over
a high threshold, then this can be shown to converge asymptotically to a clustered Poisson
process. The extremal index, a parameter between 0 and 1, is the reciprocal of the mean
cluster size. Apart from being of interest in its own right, it is a crucial parameter for
determining the limiting distribution of extreme values from the process. In this paper
we review current work on statistical estimation of the extremal index, and consider an
optimality criterion based on a bias-variance trade-off. Theoretical results are presented
for some simple stochastic processes, and the practical implications are examined through
simulations and some real data analysis.
1. BACKGROUND
Suppose we have n observations from a stationary process X_i, i ≥ 1, with marginal
distribution function F. For large n and u_n, it is typically the case that

    F_n(u_n) = Pr{max(X_1, ..., X_n) ≤ u_n} ≈ F^{nθ}(u_n),    (1.1)
where θ ∈ [0, 1] is a constant for the process known as the extremal index. This concept
originated in papers by Newell (1964), Loynes (1965) and O'Brien (1974), and was placed
on a firm footing by Leadbetter (1983). It is explained further in Section 3.7 of the book
by Leadbetter et al. (1983), and the review paper Leadbetter and Rootzen (1988) contains much more information. The classical theory of extremes in independent, identically
distributed sequences (Galambos 1987, Leadbetter et al. 1983) provides conditions for the
existence of normalising constants a_n > 0 and b_n such that

    Pr{(max(X_1, ..., X_n) − b_n)/a_n ≤ x} = F^n(a_n x + b_n) → H(x),    (1.2)

where H(x) is one of the three classical extreme value types

    Φ_α(x) = exp(−x^{−α}), x > 0,
    Ψ_α(x) = exp{−(−x)^α}, x < 0,
    Λ(x)   = exp(−e^{−x}), −∞ < x < ∞.
Combining (1.1) and (1.2), it is seen that for a stationary sequence {X_n} we have

    Pr{(max(X_1, ..., X_n) − b_n)/a_n ≤ x} → H^θ(x).    (1.3)
Thus it appears that θ is the key parameter for extending extreme value theory for
i.i.d. random variables to dependent stochastic processes. From a statistical point of view,
since there is already an extensive literature on estimating extreme value distributions
from i.i.d. data, it can be seen that estimating the extremal index is a key problem.
An alternative characterisation of the extremal index (Hsing et al. 1988) is that 1/θ is
the mean cluster size in the point process of exceedance times over a high threshold. This
suggests that a suitable way to estimate the extremal index is to identify clusters of high-level exceedances, and to calculate the mean size of those clusters. This idea underlies the
statistical proposals of Smith (1989) and Davison and Smith (1990), though in an informal
way the concept has existed in the hydrology literature for at least 20 years. However, until
recently there has been no discussion of any optimality criteria for selecting the clusters.
This is the subject of the present paper.
Let us assume that a high threshold Un has already been chosen. The choice of Un
for the estimation of the tail of F has been extensively studied from both a theoretical
and practical point of view; references include Smith (1987, 1989), Davison and Smith
(1990), Dekkers and de Haan (1989), Dekkers, Einmahl and de Haan (1989). There is no
theoretical reason why the threshold for estimating θ has to be the same as the threshold
for estimating the tail of F, and it is possible to consider procedures in which they are
different. Initially, however, we treat the threshold as fixed and concentrate on the problem
of identifying clusters. Later discussion will include optimal choice of the threshold as well.
The discussion throughout is confined to stationary sequences. Although nonstationary sequences are of practical interest, an exact analogue of the extremal index may not
exist in this case (Hüsler 1986). In practice, in cases where the nonstationarity arises from
such sources as seasonal variation in the data, a pragmatic approach is to break up the
data into seasonal portions within which they may be assumed stationary. In the following
discussion we shall assume that such preliminary analysis has already been carried out.
Suppose, then, we have n observations from a stationary series, and let N_n denote
the number of observations which exceed a predetermined high threshold u_n. We consider
two methods of defining clusters. The first, called the blocks method, divides the data
into approximately k_n blocks of length r_n, where n ≈ k_n r_n. Each block is treated as one
cluster; thus if Z_n denotes the number of blocks in which there is at least one exceedance
of the threshold u_n, we consider N_n/Z_n to estimate the mean size of a cluster and hence
estimate θ by

    θ̂_n = Z_n/N_n.    (1.4)

The following argument leads to a refinement of this. We can consider Z_n/k_n to be
an estimator of 1 − F_{r_n}(u_n) where, as in (1.1), F_r denotes the distribution function of
max(X_1, ..., X_r). Also, N_n/n is an estimator of 1 − F(u_n). If we relate these by the
approximation F_r ≈ F^{rθ}, then we get the estimator

    θ̃_n = log(1 − Z_n/k_n) / {r_n log(1 − N_n/n)}.    (1.5)
Recalling that n ≈ k_n r_n, it can be seen that (1.4) and (1.5) are asymptotically equivalent
when the ratios Z_n/k_n and N_n/n are small, but it is conceivable that (1.5) has some
advantages in terms of second-order asymptotic properties. It can easily be checked that,
for numbers x and y between 0 and 1,

    x/y < log(1 − x)/log(1 − y)  if and only if  y < x,

and from this it follows that θ̂_n < θ̃_n provided N_n < r_n Z_n, which will always be the case
except in the circumstance that every block with at least one exceedance consists entirely
of exceedances.
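In computational terms, the estimators (1.4) and (1.5) can be sketched in a few lines of code; the following is an illustrative sketch of our own (the function name and conventions are ours, not part of the paper):

```python
import numpy as np

def blocks_estimators(x, u, r):
    """Blocks estimator (1.4) and refined estimator (1.5) of the extremal index.

    x: 1-d array of observations; u: threshold u_n; r: block length r_n.
    Returns (theta_hat, theta_tilde).  Illustrative sketch only.
    """
    x = np.asarray(x)
    k = len(x) // r                       # number of complete blocks, n ~ k * r
    n = k * r
    w = (x[:n] > u).astype(int)           # exceedance indicators W_{n,i}
    N = int(w.sum())                      # total number of exceedances
    Z = int((w.reshape(k, r).sum(axis=1) > 0).sum())  # blocks with an exceedance
    theta_hat = Z / N                                          # (1.4)
    theta_tilde = np.log(1 - Z / k) / (r * np.log(1 - N / n))  # (1.5)
    return theta_hat, theta_tilde
```

As shown above, θ̂_n ≤ θ̃_n always holds, with equality only when every block containing an exceedance consists entirely of exceedances.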
Our second method is based on the idea of runs of observations below or above the
threshold defining clusters. More precisely, suppose we take any sequence of r_n consecutive
observations below the threshold as separating two clusters. An equivalent and in some
ways more attractive characterisation is the following. Define W_{n,i} to be 1 if the i'th
observation is above the threshold (i.e. if X_i > u_n) and 0 otherwise. Let

    N_n = Σ_{i=1}^n W_{n,i},    Z_n* = Σ_{i=1}^n W_{n,i}(1 − W_{n,i+1}) ⋯ (1 − W_{n,i+r_n}).    (1.6)

Then

    θ̂_n* = Z_n*/N_n.    (1.7)

We call this the runs estimator. The definition of Z_n* in (1.6) ensures that an exceedance
in position i is counted if and only if the following r_n observations are all below the
threshold u_n, in other words, if it is the rightmost member of a cluster according to the
runs definition.
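A corresponding sketch of the runs estimator (1.7), again with our own naming, and with the series padded by r_n below-threshold values so that the final cluster is always closed:

```python
import numpy as np

def runs_estimator(x, u, r):
    """Runs estimator (1.7): an exceedance is the rightmost member of a
    cluster iff the next r observations are all below the threshold.
    Illustrative sketch; the padding convention at the end is ours."""
    w = (np.asarray(x) > u).astype(int)
    n = len(w)
    N = int(w.sum())
    wp = np.concatenate([w, np.zeros(r, dtype=int)])  # close the final cluster
    Z = 0
    for i in range(n):                    # Z* of (1.6)
        if wp[i] == 1 and not wp[i + 1:i + r + 1].any():
            Z += 1
    return Z / N
```

For example, with x = (0, 5, 5, 0, 0, 5, 0, 0, 0, 5), u = 1 and r = 2, the exceedances at positions 3, 6 and 10 are rightmost members of clusters, so Z* = 3, N = 4 and the estimate is 0.75.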
The theoretical properties of the blocks estimator (1.4) have been studied by Hsing
(1990, 1991). It is a natural starting point for any statistical study because probabilistic
analyses of the extremal index (Leadbetter 1983, Hsing et al. 1988) used the same definition. However, the runs estimator seems more natural from a statistical point of view.
For example, Smith (1989) used precisely this method of separating clusters, calling r_n the
cluster interval. Nandagopalan (1990) and Leadbetter et al. (1989) have used a version of
the runs estimator with r_n = 1, and Nandagopalan (1990) derived theoretical properties for
it, but in general there seems no reason to confine ourselves to r_n = 1, and a more general
estimator is desirable. Indeed a major theme of the present paper is the optimal choice of
r_n from bias and variance considerations.
2. BIAS AND VARIANCE CALCULATIONS
The main results of this paper depend on the asymptotic properties of the point
process of exceedance times of a sequence of high thresholds u_n. We begin by quoting
one of the main results of Hsing et al. (1988).
Suppose {u_n, n ≥ 1} is a sequence of thresholds such that F(u_n) → 1 as n → ∞, and
let N_n denote the exceedance point process for the level u_n by X_1, ..., X_n, i.e. for any
0 ≤ a < b ≤ 1, N_n[a, b] denotes the number of exceedances of the level u_n by X_i, na ≤ i ≤ nb.
Suppose {X_n} is stationary and for i < j let B_i^j denote the σ-field generated by the events
{X_t ≤ u_n}, i ≤ t ≤ j. For n ≥ 1 and 1 ≤ ℓ ≤ n − 1 let

    α_{n,ℓ} = max{ |Pr(A ∩ B) − Pr(A) Pr(B)| : A ∈ B_1^k, B ∈ B_{k+ℓ}^n, 1 ≤ k ≤ n − ℓ }.

The process is said to satisfy the condition Δ(u_n) if α_{n,ℓ_n} → 0 as n → ∞ for some
sequence {ℓ_n} with ℓ_n/n → 0. This is a condition of "mixing in the tails", similar to, but
somewhat stronger than, the condition D(u_n) used extensively in the book by Leadbetter
et al. (1983). Suppose also we define sequences {k_n} and {r_n} where k_n → ∞, k_n ℓ_n/n → 0
and k_n α_{n,ℓ_n} → 0, while r_n is the integer part of n/k_n. Define the cluster size distribution
π_n by
    π_n(j) = Pr{ Σ_{i=1}^{r_n} W_{n,i} = j | Σ_{i=1}^{r_n} W_{n,i} > 0 },    j ≥ 1,

where W_{n,i} is the indicator of the event {X_i > u_n}. Thus π_n is the distribution of cluster
size when the definition of a cluster is based on a block of length r_n.
Under these conditions, a combination of Corollary 3.3, Theorem 4.1 and Theorem 4.2
of Hsing et al. (1988) leads to the following conclusion: if F_n(u_n) → e^{−λ} and π_n(j) → π(j)
as n → ∞, where 0 < λ < ∞ and π is a probability distribution on the positive integers,
then the exceedance point process N_n converges to a compound Poisson process consisting
of "cluster centres" forming a Poisson process of intensity λ on [0, 1], and each cluster
consisting of a random number of points which follows the distribution π, independently
for each cluster. Moreover, under additional summability conditions on the π_n(j)'s, we
also have 1/θ = Σ_j j π(j), confirming the notion that the extremal index is the reciprocal of
mean cluster size.
Let us now assume that these asymptotic results are an adequate description of the exceedance process, and consider the effect on estimation of θ. Our arguments are somewhat
heuristic, but precise formulation of the asymptotic results along the lines of Hsing (1991)
or Nandagopalan (1990) involves considerable technicalities which we prefer to avoid. Let
Z_n denote the number of clusters and N_n the total number of exceedances, and suppose μ
and σ² are the mean and variance of the cluster size distribution. For the purpose of this
calculation, we do not yet distinguish between the "blocks" and "runs" definitions of clusters. Also, θ = 1/μ. From the obvious relations E{N_n|Z_n} = Z_n μ, var{N_n|Z_n} = Z_n σ²,
it follows at once that EN_n = λμ and the asymptotic covariance matrix of (Z_n, N_n) is

    λ ( 1    μ
        μ    μ² + σ² ).    (2.1)

In practice, we may replace λ by nθF̄(u_n), where F̄ = 1 − F. Let θ̂_n = Z_n/N_n.
It follows by the delta method (Taylor expansion of f(Z_n, N_n) about f(EZ_n, EN_n), with
f(x, y) = x/y) that asymptotically we have
    var(θ̂_n) ≈ σ²/(λμ⁴) = θ⁴σ²/λ.    (2.2)

This result is essentially due to Hsing (1991), who established precise sufficient conditions for (2.1), (2.2) and the corresponding asymptotic normality results to hold. The
asymptotics are a little different from Hsing et al. (1988), since in that paper it was sufficient to assume λ was a constant, whereas here we need to allow λ → ∞ in order to obtain
consistent estimates. Moreover, if we replace θ̂_n by θ̃_n defined by (1.5), it is clear that the
same result will hold, to the first order of approximation, provided k_n and n are both large
compared with λ.
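The approximation (2.2) is easy to check by direct Monte Carlo simulation from the limiting compound Poisson description. The sketch below is our own (the choice of geometric cluster sizes anticipates the model of Section 3); it compares the empirical variance of Z_n/N_n with σ²/(λμ⁴):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, lam, reps = 0.5, 200.0, 20000
mu = 1 / theta                          # mean cluster size
sigma2 = (1 - theta) / theta ** 2       # variance of a geometric(theta) cluster size

Z = np.maximum(rng.poisson(lam, size=reps), 1)    # number of clusters per replicate
N = np.array([rng.geometric(theta, size=z).sum() for z in Z])  # total exceedances
est = Z / N                                       # replicates of Z_n / N_n

predicted_var = sigma2 / (lam * mu ** 4)          # (2.2): sigma^2 theta^4 / lambda
```

The sample variance of est should agree with predicted_var to within a few per cent, and its mean should exceed θ by roughly θσ²/(λμ²), matching the bias expansion (2.3) below.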
Now, however, let us use the same method to obtain a first-order approximation to
the bias of θ̂_n and θ̃_n. By (2.1) and the delta method we obtain

    E{Z_n/N_n} ≈ (E(Z_n)/E(N_n)) {1 + Var(N_n)/(EN_n)² − Cov(Z_n, N_n)/((EN_n)(EZ_n))}
               ≈ (E(Z_n)/E(N_n)) (1 + σ²/(λμ²)),    (2.3)
correct to O(1/λ), where we have not replaced EZ_n and EN_n by their asymptotic values
because we will also want to use (2.3) in the situation where there are alternative definitions
of Z_n and N_n which may not have the same means.
An extension of the same argument shows that

    (2.4)

Again, we will replace λ by nθF̄(u_n) in subsequent discussion. The relative sizes of
the correction terms in (2.4) depend on the relative magnitudes of λ, k_n and n; since these
are unknown at present, no further simplification of (2.4) is attempted.
So far, we have not distinguished between the blocks and runs estimators. Now we do
so, writing Z_n* in place of Z_n in the latter case.
For integer i ≥ 2 and threshold u, define

    θ(i, u) = Pr{X_2 ≤ u, X_3 ≤ u, ..., X_i ≤ u | X_1 > u}.    (2.5)
O'Brien (1987) characterised the extremal index in the form

    θ = lim_{n→∞} θ(i_n, u_n)    (2.6)

for suitable sequences i_n → ∞ and u_n such that F(u_n) → 1. Essentially, the restriction
on i_n and u_n is that i_n must not grow too rapidly relative to u_n, and in most practical
cases (O'Brien 1974, Smith 1992) it is possible to replace (2.6) by

    θ = lim_{i→∞} lim_{u: F(u)→1} θ(i, u).    (2.7)

Note, however, that the order of limits in (2.7) cannot be interchanged, the limit as i → ∞
being 0 for each fixed u for which F(u) < 1.
We always have E(N_n) = nF̄(u_n). For the blocks method, we have

    E(Z_n) = k_n Σ_{i=1}^{r_n} Pr{X_i > u_n, X_{i+1} ≤ u_n, ..., X_{r_n} ≤ u_n}
           = k_n F̄(u_n) Σ_{i=1}^{r_n} θ(i, u_n).

Hence from (2.3) and (2.4), replacing λ by nθF̄(u_n), we have

    Eθ̂_n − θ ≈ (1/r_n) Σ_{i=1}^{r_n} {θ(i, u_n) − θ} + θ²σ²/(nF̄(u_n))    (2.8)

and

    (2.9)
For the runs method, it follows immediately from (1.6) that

    E(Z_n*) = nF̄(u_n) θ(r_n + 1, u_n)

and hence, using (2.3) with Z_n* replacing Z_n, that

    Eθ̂_n* − θ ≈ θ(r_n + 1, u_n) − θ + θ²σ²/(nF̄(u_n)).    (2.10)

The comparison of (2.8) and (2.10) already gives us a concrete reason to prefer the runs
estimator: provided r_n does not grow too fast, it is reasonable to suppose that θ(r_n + 1, u_n)
will be closer to θ than the average of θ(i, u_n), 1 ≤ i ≤ r_n, so that the bias in θ̂_n* will be
smaller than that in θ̂_n. More detailed calculations in the following sections bear this out.
3. EXAMPLE: A DOUBLY STOCHASTIC MODEL
We now consider a specific model, a form of doubly stochastic process, for which
it is possible to make some explicit computations and comparisons. These will serve to
motivate some conjectures about the general case.
Let {ξ_i, i ≥ 1} be i.i.d. with distribution function G, and suppose Y_1 = ξ_1, and for
i > 1,

    Y_i = { Y_{i−1}   with probability ψ,
            ξ_i       with probability 1 − ψ,

the choice being made independently for each i. Furthermore, let

    X_i = { Y_i   with probability η,
            0     with probability 1 − η,

again independently of everything else. In words, the {Y_i} process consists of runs of
observations which remain constant for a geometrically distributed length of time, while
the {X_i} process is a distorted version of {Y_i} in which runs of high-level values are
interrupted by randomly and independently distributed 0's. We assume G(0) < 1.
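The model is elementary to simulate. The following sketch is our own (an exponential G is chosen purely for illustration; any G with G(0) < 1 would do):

```python
import numpy as np

def simulate_doubly_stochastic(n, psi, eta, rng):
    """Simulate the Section 3 model: Y_i repeats Y_{i-1} with probability psi,
    else is a fresh draw from G (here exponential); X_i equals Y_i with
    probability eta and 0 otherwise.  Illustrative sketch only."""
    xi = rng.exponential(size=n)          # i.i.d. draws from G
    y = np.empty(n)
    y[0] = xi[0]
    repeat = rng.random(n) < psi
    for i in range(1, n):
        y[i] = y[i - 1] if repeat[i] else xi[i]
    x = np.where(rng.random(n) < eta, y, 0.0)
    return x
```

By construction the proportion of zeroed observations is close to 1 − η, and runs of identical high values of geometrically distributed length are clearly visible in the output.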
Let M_n = max{X_1, ..., X_n} and define

    P_n = Pr{M_n ≤ u | Y_1 ≤ u},    Q_n = Pr{M_n ≤ u | Y_1 > u},    (3.1)

assuming u > 0. Then P_0 = Q_0 = 1, P_1 = 1, Q_1 = 1 − η. If ε = 1 − G(u) > 0 we have

    Pr{Y_2 ≤ u | Y_1 ≤ u} = 1 − ε + ψε,    Pr{Y_2 ≤ u | Y_1 > u} = (1 − ψ)(1 − ε),

and hence for n ≥ 1,

    P_n = (1 − ε + ψε)P_{n−1} + (1 − ψ)ε Q_{n−1},
    Q_n = (1 − η){(1 − ψ)(1 − ε)P_{n−1} + (ψ + ε − ψε)Q_{n−1}}.
Defining R_n = (P_n, Q_n)^T, we have

    R_0 = ( 1
            1 ),    R_n = A R_{n−1}  where  A = ( 1 − ε + ψε               (1 − ψ)ε
                                                  (1 − η)(1 − ψ)(1 − ε)    (1 − η)(ψ + ε − ψε) ).    (3.2)
These equations have the solution

    P_n = C_1 λ_1^n + C_2 λ_2^n,    Q_n = D_1 λ_1^n + D_2 λ_2^n,    (3.3)

where C_1, C_2, D_1, D_2 are determined by the initial conditions to be

    C_1 = (1 − λ_2)/(λ_1 − λ_2),      C_2 = (λ_1 − 1)/(λ_1 − λ_2),
    D_1 = (1 − λ_2 − η)/(λ_1 − λ_2),  D_2 = (λ_1 + η − 1)/(λ_1 − λ_2),

and λ_1, λ_2 are the eigenvalues of A, obtained by solving the quadratic equation

    (1 − ε + ψε − λ){(1 − η)(ψ + ε − ψε) − λ} − (1 − η)(1 − ψ)²(1 − ε)ε = 0,

which reduces to

    λ² − λ(1 + ψ − ψη − εη + ψεη) + (1 − η)ψ = 0    (3.4)

with roots

    λ = [1 + ψ − ψη − εη(1 − ψ) ± [{1 + ψ − ψη − εη(1 − ψ)}² − 4ψ(1 − η)]^{1/2}] / 2.    (3.5)
As ε → 0 we have

    λ_1 = 1 − εη(1 − ψ)/(1 − ψ + ψη) + O(ε²),
    λ_2 = ψ(1 − η){1 + εη(1 − ψ)/(1 − ψ + ψη)} + O(ε²).    (3.6)
We apply this first to the distribution of M_n. Since Pr{Y_1 ≤ u} = 1 − ε we have from
(3.1)

    F_n(u) = Pr{M_n ≤ u} = (1 − ε)P_n + εQ_n,

which can be determined exactly from (3.3)–(3.5). Now suppose n → ∞, ε → 0 so that
nPr{X_1 > u} = nεη → τ, where τ ∈ (0, ∞) is fixed. If we expand based on (3.6), ignoring
the terms involving λ_2 because these are geometrically small compared with those involving
λ_1, we find that

    Pr{M_n < u} = exp{−τ(1 − ψ)/(1 − ψ + ψη)}
                  · {1 − τ²(1 − ψ)²(1 + ψ − ψη)/(2n(1 − ψ + ψη)³) − τψη/(n(1 − ψ + ψη)²) + O(1/n²)}.    (3.7)

In particular, Pr{M_n < u} → e^{−θτ} where

    θ = (1 − ψ)/(1 − ψ + ψη)    (3.8)

is the extremal index for this problem.
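As a quick numerical sanity check of (3.4)–(3.8) (our own, with ψ = 0.9 and η = 0.7, the combination used in the simulations of Section 4): for small ε the larger root λ_1 of (3.4) satisfies 1 − λ_1 ≈ εηθ.

```python
import numpy as np

psi, eta, eps = 0.9, 0.7, 1e-5
b = 1 + psi - psi * eta - eps * eta * (1 - psi)   # middle coefficient in (3.4)
c = (1 - eta) * psi                               # constant term in (3.4)
lam1 = (b + np.sqrt(b * b - 4 * c)) / 2           # larger root, as in (3.5)
theta = (1 - psi) / (1 - psi + psi * eta)         # (3.8)
print((1 - lam1) / (eps * eta), theta)            # both close to 0.137
```

This confirms numerically that the leading eigenvalue behaves as λ_1 ≈ 1 − εηθ, which is what drives the limit Pr{M_n < u} → e^{−θτ}.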
Now let us turn our attention to the functions θ(i, u). In terms of (3.1) we have

    θ(i, u) = (1 − ψ)(1 − ε)P_{i−1} + (ψ + ε − ψε)Q_{i−1}.

Recall that u and ε are related by ε = 1 − G(u). Taking a Taylor expansion in ε for
fixed i we find that

    θ(i, u) = { (1 − ψ)/(1 − ψ + ψη) + εη(1 − ψ)(1 − ψ − ψη)/(1 − ψ + ψη)³ }
              · { 1 − iεη(1 − ψ)/(1 − ψ + ψη) }
            + { η/((1 − η)(1 − ψ + ψη)) } { 1 − ε(1 − ψ)(1 − η)(1 − ψ − ψη)/(1 − ψ + ψη)² }
              · { 1 + iεη(1 − ψ)/(1 − ψ + ψη) } {ψ(1 − η)}^i + O(ε²).    (3.9)

It follows at once from this that θ as defined by (2.7) is the same as that from (3.8).
Let

    β = ψ(1 − η).

Our main interest is in cases in which ε is small, possibly very small, while i is moderately
large. Retaining the terms of O(iε) and O(β^i) while discarding everything else leads to
the approximation

    θ(i, u) ≈ θ − iεηθ² + (1 − θ)β^{i−1}.    (3.10)
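The accuracy of (3.10) can be checked directly against the exact recursion (3.1)–(3.2); the following sketch (our own, with the parameter values used later) evaluates θ(i, u) exactly and compares:

```python
import numpy as np

psi, eta, eps = 0.9, 0.7, 1e-4
theta = (1 - psi) / (1 - psi + psi * eta)     # (3.8)
beta = psi * (1 - eta)
A = np.array([[1 - eps + psi * eps, (1 - psi) * eps],
              [(1 - eta) * (1 - psi) * (1 - eps),
               (1 - eta) * (psi + eps - psi * eps)]])   # transition matrix (3.2)

P, Q = 1.0, 1.0                               # (P_0, Q_0)
errors = []
for i in range(2, 31):
    P, Q = (A[0, 0] * P + A[0, 1] * Q,
            A[1, 0] * P + A[1, 1] * Q)        # now (P_{i-1}, Q_{i-1})
    exact = (1 - psi) * (1 - eps) * P + (psi + eps - psi * eps) * Q  # theta(i, u)
    approx = theta - i * eps * eta * theta ** 2 + (1 - theta) * beta ** (i - 1)
    errors.append(abs(exact - approx))
```

For ε = 10⁻⁴ the maximum discrepancy over 2 ≤ i ≤ 30 is of order ε, consistent with the terms discarded in passing from (3.9) to (3.10).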
Now let us consider some consequences of these calculations for the bias and variance
approximations of Section 2. In practice u will not be a constant but a sequence of
thresholds u_n, so ε = ε_n also depends on n. The cluster sizes have a geometric distribution:
π(j) = (1 − θ)^{j−1}θ for j ≥ 1. Hence μ = 1/θ and σ² = (1 − θ)/θ². By (2.2),

    var(θ̂_n) ≈ θ(1 − θ)/(nε_n η),    (3.11)

which does not depend directly on r_n, so the initial choice of r_n can be made solely to
minimise bias. Moreover, to the first order of accuracy, (3.11) holds equally well for the
blocks and runs estimators.
Now consider the bias, initially for the runs estimator θ̂_n*. We use the approximation
(2.10), ignoring the second term for the moment. Using the approximation (3.10), we want
to choose r_n to minimise |−r_n ε_n θ²η + (1 − θ)β^{r_n − 1}|. If we set
r_n = (log(1/ε_n) + log log(1/ε_n) + c_n)/log(1/β) for bounded c_n, then both terms are of
O(ε_n log(1/ε_n)). Note that, because the two terms are of opposite sign, it might be possible
to make the bias vanish completely by choosing c_n appropriately, but in general the
requirement that r_n be an integer defeats this. This also makes it impossible to determine
the exact constant of proportionality.
Thus, based on the approximation (3.10), we conclude that the best possible rate for
θ(r_n, u) − θ is of O(ε_n log(1/ε_n)), which depends on n only through ε_n; i.e. the optimal r_n
depends only on the threshold and not the sample size.
Let us now combine the bias and variance terms, choosing first u_n and then r_n to
minimise mean squared error. The squared bias is of O(ε_n² log²(1/ε_n)) and the variance
is of O(1/(nε_n)). For these to be the same order of magnitude we require ε_n =
O(n^{−1/3}(log n)^{−2/3}), leading to a mean squared error of O(n^{−2/3}(log n)^{2/3}). The second
term in (2.10) is of O(1/(nε_n)) and hence negligible in comparison with the first, so at this
point we have justified neglecting this term in the preceding calculations.
It is a somewhat different matter, however, if we use the blocks estimator θ̂_n. In this
case, assuming (3.10), we have for r large that

    (1/r) Σ_{i=1}^r {θ(i, u) − θ} ≈ −ε_n θ²η r/2 + (1 − θ)/{r(1 − β)},

so, considering the first term only in (2.8), we need to choose r_n so that r_n = O(ε_n^{−1/2}).
That leads to a bias of O(ε_n^{1/2}). Then choosing ε_n to minimise the mean squared error
leads to an optimal ε_n of O(n^{−1/2}) and a resulting mean squared error of O(n^{−1/2}). From
this it can be seen that the blocks estimator leads to a worse optimal rate of convergence
than the runs estimator.
Finally, let us consider the refinement that takes us from θ̂_n to θ̃_n, which results
in two additional terms in (2.9) as compared with (2.8). The larger additional term is
of O(r_n ε_n), which is of the same order as the first of the two terms resulting from (3.10).
Thus, the correction leads to a final mean squared error of the same order of magnitude,
but a different constant of proportionality. This suggests that the correction is worth
considering, though we do not have any definitive results as to when it is better.
4. IMPLICATIONS FOR MORE GENERAL PROCESSES
In Section 2 we argued that the key feature in characterising the bias, of both the
blocks and runs estimators, was the discrepancy between θ(i, u) and θ for finite i and u.
In Section 3, we derived the formula (3.10) for a specific but highly artificial model. This
is the only case that we have calculated in detail. Nevertheless, we want to argue, by a
mixture of heuristic reasoning and computer simulation, followed in the next section by a
real data example, that this structure is considerably more general than that, and provides
the basis for a practical method of selecting r n and hence estimating the extremal index.
The conjectured general result is given by rewriting equation (3.10) in the form

    θ(i, u) ≈ θ − iθ²F̄(u) + Cβ^i    (4.1)

for some constants C and β ∈ (0, 1).
For a fixed threshold u it is clear that θ(i, u) is decreasing to 0 as i increases. Let i_u
be the index i for which θ(i, u) is closest to θ. Ideally, for given u we would like to set i
equal to i_u. Suppose i is much bigger than i_u. Then
    θ(i_u, u) − θ(i, u)
      = Pr{X_2 ≤ u, ..., X_{i_u} ≤ u, X_j > u for some j, i_u < j ≤ i | X_1 > u}
      = Pr{X_2 ≤ u, ..., X_{i_u} ≤ u | X_1 > u}
        · Pr{X_j > u for some j, i_u < j ≤ i | X_2 ≤ u, ..., X_{i_u} ≤ u, X_1 > u}    (4.2)
      ≈ θ · {1 − F^{iθ}(u)},

the reasoning for the second factor being that, for very large i, the event whose probability
is being computed is approximately independent of the conditioning event, by whatever
form of mixing hypothesis is being assumed, and is therefore approximated by (1.1) applied
with n = i.
For only moderately large i, the correct form of approximation is not so clear-cut,
but for many processes satisfying a geometric ergodicity property, one would expect that
probabilities of the form

    Pr{X_2 ≤ u, ..., X_{i−1} ≤ u, X_i > u | X_1 > u}

should decay exponentially fast as i grows, implying an error of form Cβ^i. Putting the
two together implies a general expression of form

    θ(i, u) ≈ θF^{iθ}(u) + Cβ^i.    (4.3)

On expanding F^{iθ}(u) in powers of F̄(u), (4.3) reduces to (4.1). However, if the
reasoning leading to (4.3) is correct, then (4.3) should be a better approximation than
(4.1), so we use (4.3) rather than (4.1) in subsequent calculation.
The above reasoning is very general and highly heuristic. Unfortunately, it seems too
complicated at the moment to perform the calculations rigorously for specific classes of
processes. One example where they might be possible is for Markov chains with continuous
state space. Smith (1992) has shown that the calculation of the extremal index for such
chains, under some conditions about the transition function, may be reduced to an equivalent calculation about random walks. The geometric ergodicity of random walks may
be used to justify the second term in (4.3), while the first term follows from the general
argument given above. In this case, what we would expect is that the constants C and β
will be independent of the threshold. This will be confirmed numerically in one example,
defined by equation (4.4) below.
The only other case for which we are aware of detailed calculations being carried out
is the paper of Hsing (1990) for m-dependent processes. However, the result in this case
appears to be of a different form, arising from the fact that although functions of the
process are independent over lags greater than m, up to lag m they may be arbitrarily
complicated, and this appears to preclude any direct approximation of the form we have
been considering.
From this it can be seen that the proposed approximation is not universally applicable.
The spirit in which we are proposing it, therefore, is as a first approach to be tried in a
statistical procedure.
The remainder of this section gives some simulations. Each value of θ(i, u) is based
on approximately 10⁵ replications. First, the doubly stochastic model of Section 3 was
simulated. Numerous combinations of η and ψ were simulated, but we show only one of
these, η = 0.7 and ψ = 0.9, which was chosen because in this case the two bias terms
roughly balance out, leading to a range of positive and negative biases. Three thresholds
were chosen corresponding to F̄(u) = 0.03, 0.018 and 0.006, and the bias θ(i, u) − θ plotted
for different i in Figure 1. Here and in all subsequent plots, the three thresholds are
represented by crosses, triangles and diamonds in that order, the diamonds being for the
highest threshold. It is natural to examine how well the approximation (4.3) fits, and this
was calculated from the data by a nonlinear least squares fit of the parameters θ, C and
β, fitted separately for each threshold. The results were

    F̄(u) = 0.030 :  θ̂ = 0.128, Ĉ = 3.22, β̂ = 0.270,
    F̄(u) = 0.018 :  θ̂ = 0.133, Ĉ = 3.34, β̂ = 0.264,
    F̄(u) = 0.006 :  θ̂ = 0.141, Ĉ = 3.35, β̂ = 0.263.

In this case the exact values computed by Section 3 are θ = 0.1370, C = 3.196,
β = 0.27. It can be seen that the estimated values agree well with these. In Figure 1 the
three fitted curves are superimposed on the plot, which seems to confirm a good fit.
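The fit itself is a routine three-parameter nonlinear least squares problem. In the sketch below (our own code; a coarse grid over (θ, β) with C profiled out linearly stands in for whichever optimiser is actually used), the parameters are recovered from synthetic values of θ(i, u) generated from (4.3) with the exact values θ = 0.137, C = 3.196, β = 0.27, plus a little noise:

```python
import numpy as np

Fbar = 0.03                                    # tail probability 1 - F(u)
i = np.arange(2, 41, dtype=float)
rng = np.random.default_rng(4)
# synthetic "estimated" theta(i,u): model (4.3) with the exact Section 3 values
y = (0.137 * (1 - Fbar) ** (i * 0.137) + 3.196 * 0.27 ** i
     + rng.normal(0.0, 0.002, size=i.size))

best = (np.inf, None, None, None)
for th in np.linspace(0.05, 0.45, 81):         # grid over theta
    g = th * (1 - Fbar) ** (i * th)
    for be in np.linspace(0.05, 0.95, 91):     # grid over beta
        b = be ** i
        C = b @ (y - g) / (b @ b)              # least-squares C given (th, be)
        rss = ((y - g - C * b) ** 2).sum()
        if rss < best[0]:
            best = (rss, th, C, be)
_, th_hat, C_hat, be_hat = best
```

With noise at the level of the simulation error quoted above, the recovered th_hat and be_hat land close to the generating values, mirroring the agreement seen in Figure 1.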
Now let us consider an example of a Markov chain. We use for this the logistic Gumbel
distribution

    Pr{X_1 ≤ x_1, X_2 ≤ x_2} = exp{−(e^{−r x_1} + e^{−r x_2})^{1/r}},    r ≥ 1,    (4.4)

which has been shown (in particular by Tawn, 1988b) to be a realistic model for many
applications of bivariate extreme value theory. The proposal here is that we should use
the conditional distribution of X_2 given X_1, derived from (4.4), as a transition distribution
in a Markov chain. We fixed r = 2 for illustration. In this case θ ≈ 0.328 is known from
Smith (1992), but we know nothing about C and β, including whether they exist at all.
Figure 2 shows a plot of the simulated θ(i, u) together with fitted curves of the form
(4.3) fitted by nonlinear least squares to each threshold. The three thresholds correspond
to F̄(u) = 0.01, 0.006 and 0.002. The fitted curves are:

    F̄(u) = 0.010 :  θ̂ = 0.332, Ĉ = 0.319, β̂ = 0.501,
    F̄(u) = 0.006 :  θ̂ = 0.332, Ĉ = 0.307, β̂ = 0.514,
    F̄(u) = 0.002 :  θ̂ = 0.330, Ĉ = 0.270, β̂ = 0.550.

In this case all three estimated values of θ were very close to the true value, while
the estimated C and β are reasonably close to each other over different thresholds. Again,
the plot seems to confirm that (4.3) is the right functional shape.
The foregoing simulations refer solely to the function θ(i, u) and not to the actual
behaviour of the estimators. As an example of the latter, Figure 3 shows simulations of
the root mean squared error, for different values of r_n, of each of the three estimators θ̂_n,
θ̃_n and θ̂_n* when the true model is (4.4) with r = 2. In each case the runs estimator θ̂_n*
achieves its minimum mean squared error at the smallest value of r_n, but after that the
conclusions are not so clear-cut. It appears, however, that the blocks estimator θ̂_n performs
as well as θ̂_n* in terms of minimum mean squared error over all r_n. This partially contradicts
the results of Section 3, but it may be that even larger sample sizes than n = 10000 are
needed to demonstrate the effect, or it may be that we have not chosen the threshold here
in an optimal way.
5. DATA ANALYSIS EXAMPLE
The example which follows is based on part of a large study that concerned heights
of sea waves sampled at three-hourly intervals. It turns out that tidal effects are not
important, but both common sense and previous studies such as Tawn (1988a) suggest
that a "storm length" in the region of 1-2 days (r n = 8-16) should be appropriate. This
series was picked out from a large number of similar series as one which caused particular
difficulty in interpreting the estimates of extremal index.
The full series was of length 29220, and three thresholds u_1, u_2 and u_3 were chosen, over
which there were respectively 2816, 1170 and 463 exceedances. Thus we estimate F̄(u_1) =
2816/29220 = 0.096 and similarly F̄(u_2) = 0.040, F̄(u_3) = 0.016. To give an example of the
calculation, for threshold u_1 there were 2816 exceedances and hence 2815 intervals between
exceedances, of which 2468 were of length 0 (i.e. consecutive exceedances), 35 of length
1, 21 of length 2, and so on. Counting the number of intervals of length ≥ r is equivalent
to computing the runs estimator with the same r = r_n. For instance, with r = 1 there
are 2815 − 2468 + 1 = 348 clusters and hence estimated extremal index 348/2816 = 0.124.
With r = 2 the numbers are 2815 − 2468 − 35 + 1 = 313 clusters, extremal index 0.111.
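The interval-counting calculation is easily mechanised. In this sketch (our own code), only the interval counts quoted in the text are used, the remaining intervals being lumped together at length 3:

```python
def runs_theta_from_intervals(counts, r):
    """Runs estimator from a frequency table of intervals between exceedances.

    counts maps interval length -> frequency (length 0 = consecutive
    exceedances).  Clusters = (number of intervals of length >= r) + 1.
    """
    n_intervals = sum(counts.values())
    N = n_intervals + 1                   # exceedances = intervals + 1
    Z = sum(c for length, c in counts.items() if length >= r) + 1
    return Z / N

# Threshold u1: 2815 intervals, of which 2468 of length 0, 35 of length 1,
# 21 of length 2; the remaining 291 (of length >= 3) are lumped at "3".
counts = {0: 2468, 1: 35, 2: 21, 3: 2815 - 2468 - 35 - 21}
print(round(runs_theta_from_intervals(counts, 1), 3))   # 0.124
print(round(runs_theta_from_intervals(counts, 2), 3))   # 0.111
```

The two printed values reproduce the worked figures above for r = 1 and r = 2.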
The full data set, expressed as a frequency table for intervals between exceedances over
three thresholds, is given in Table 1, and the extremal index estimates are calculated and
displayed in Figure 4. It can be seen that the extremal index is hard to specify, estimates
from 0.044 to 0.194 being obtained. The standard errors of the estimates, obtained from
(2.2), are respectively about 0.006, 0.01 and 0.02 for the three thresholds, and more or less
independent of r. This implies that the differences among the calculated values are not
merely the result of sampling variation.
For this example the procedure described in Section 4 was again applied, the model
(4.3) (i replaced by r) being fitted by nonlinear least squares to the estimated θ's for each
threshold. The results in this case were:

    F̄(u) = 0.096 :  θ̂ = 0.048, Ĉ = 0.084, β̂ = 0.875,
    F̄(u) = 0.040 :  θ̂ = 0.091, Ĉ = 0.087, β̂ = 0.884,
    F̄(u) = 0.016 :  θ̂ = 0.135, Ĉ = 0.065, β̂ = 0.881.
We can now see much better what is going on. The estimates of C and β are quite
consistent across different thresholds, but the values of θ are radically different. The plots
in Figure 4 confirm this picture very strongly. We seem to have a case where the extremal
index varies across different thresholds, though the model (4.3) fits very well across each
of the three thresholds we have examined.
Strictly speaking, the case where the appropriate extremal index varies over different
thresholds falls outside the scope of this paper, but it has been observed as a practical
phenomenon in a number of different studies. See for example J. Tawn's remarks in the
discussion of Davison and Smith (1990). It seems that a more detailed theory encompassing
such cases is needed. In the context of the present paper, the message seems to be that
direct application of (4.3) fails, but if we allow the extremal index to be dependent on the
threshold, the rest of the analysis fits very well, even to the extent of the other parameters
C and β being nearly constant across thresholds.
Another question to ask is the following: suppose, as in the original recipe, we did use
the runs estimator with a single r to estimate the extremal index. What value of r would
give the closest answer to our final estimate? The answer, surprisingly, is the same for all
three thresholds: r = 20. Since this corresponds to a time period of 2.5 days, it seems that
we need to take a slightly longer storm length than originally assumed. Nevertheless, it is
pleasing that a consistent and readily interpretable storm length is obtained.
We may summarise the main results of the paper as follows:
1. The theoretical analysis provides a firm basis for preferring the runs estimator
over the blocks estimator, and gives some asymptotic results about the optimum rate of
convergence.
2. Detailed calculations for the doubly stochastic model, supplemented by heuristic
reasoning and simulation for other cases, give support to the model (4.3), which is the
main component of the bias terms derived in (2.8)-(2.10).
3. It is possible to estimate the three parameters in (4.3), and thus obtain an estimate
of θ which does not depend too sensitively on a particular value of r, by a least-squares fit
based on a range of values of r or i. In the one data example we considered, the conclusion
was that it was necessary to assume different values of θ for different thresholds, and since
such a phenomenon has been suggested on a number of previous occasions, it would appear
to be a genuine and practically important phenomenon.
ACKNOWLEDGEMENTS
The bulk of the work was done while RLS was at Surrey University. Acknowledgement
is made to the Wolfson Foundation for supporting RLS's research at Surrey, to Technion
for supporting a visit by RLS to Technion, and to the SERC for a Visiting Fellowship
which supported IW's visit to Surrey. The authors would like to thank Uri Cohen for
assistance with the programming and graphics.
REFERENCES
Davison, A.C. and Smith, R.L. (1990), Models for exceedances over high thresholds
(with discussion). J.R. Statist. Soc. B 52, 393-442.
Dekkers, A.L.M. and de Haan, L. (1989), On the estimation of the extreme-value
index and large quantile estimation. Ann. Statist. 17, 1795-1832.
Dekkers, A.L.M., Einmahl, J.H.J. and de Haan, L. (1989), A moment estimator for
the index of an extreme-value distribution. Ann. Statist. 17, 1833-1855.
Galambos, J. (1987), The Asymptotic Theory of Extreme Order Statistics (2nd edn.).
Krieger, Melbourne, FL. (First edn. published 1978 by John Wiley, New York.)
Hsing, T., Hüsler, J. and Leadbetter, M.R. (1988), On the exceedance point process
for a stationary sequence. Probability Theory and Related Fields 78, 97-112.
Hsing, T. (1990), Estimating the extremal index under M-dependence and tail balancing. Preprint, Department of Statistics, Texas A&M University.
Hsing, T. (1991), Estimating the parameters of rare events. Stoch. Proc. Appl. 37,
117-139.
Hüsler, J. (1986), Extreme values of non-stationary random sequences. J. Appl. Prob.
23, 937-950.
Leadbetter, M.R. (1983), Extremes and local dependence in stationary sequences. Z.
Wahrsch. verw. Geb. 65, 291-306.
Leadbetter, M.R., Lindgren, G. and Rootzen, H. (1983), Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York.
Leadbetter, M.R. and Rootzen, H. (1988), Extremal theory for stochastic processes.
Ann. Probab. 16, 431-478.
Leadbetter, M.R., Weissman, I., de Haan, L. and Rootzen, H. (1989), On clustering of
high levels in statistically stationary series. Proceedings of the Fourth International Meeting
on Statistical Climatology, ed. John Sansom. New Zealand Meteorological Service, P.O.
Box 722, Wellington, New Zealand.
Loynes, R.M. (1965), Extreme values in uniformly mixing stationary stochastic processes. Ann. Math. Statist. 36, 993-999.
Nandagopalan, S. (1990), Multivariate extremes and the estimation of the extremal
index. Ph.D. dissertation, available as Technical Report No. 315, Center for Stochastic
Processes, Department of Statistics, University of North Carolina.
Newell, G.F. (1964), Asymptotic extremes for m-dependent random variables. Ann.
Math. Statist. 35, 1322-1325.
O'Brien, G.L. (1974), The maximum term of uniformly mixing stationary processes.
Z. Wahrsch. verw. Geb. 30, 57-63.
O'Brien, G.L. (1987), Extreme values for stationary and Markov sequences. Ann.
Probab. 15, 281-291.
Smith, R.L. (1987), Estimating tails of probability distributions. Ann. Statist. 15,
1174-1207.
Smith, R.L. (1989), Extreme value analysis of environmental time series: an example
based on ozone data. Statistical Science 4, 367-393.
Smith, R.L. (1992), The extremal index for a Markov chain. To appear, J. Appl.
Prob.
Tawn, J.A. (1988a), An extreme value theory model for dependent observations. J.
Hydrology 101, 227-250.
Tawn, J.A. (1988b), Bivariate extreme value theory: models and estimation.
Biometrika 75, 397-415.
TABLE 1: EXCEEDANCE DATA
The data represent exceedances over three thresholds, denoted u1, u2 and u3, based
on a total of 29220 observations. There were respectively 2816, 1170 and 463
exceedances over the three thresholds. In the table, the lengths of intervals between consecutive exceedances are given in the following form:
Col 1: Interval length n.
Col 2: Number of intervals of length n between exceedances over u1.
Col 3: Number of intervals of length n between exceedances over u2.
Col 4: Number of intervals of length n between exceedances over u3.
[Table body appears here: the number of intervals of each length n = 0, 1, ..., 30 and
n > 30 for the three thresholds; in particular, there were 2468, 971 and 372 intervals of
length 0 over u1, u2 and u3 respectively.]
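The counts in Table 1 can be reproduced from a series by tabulating the gaps between successive exceedances. A minimal sketch, using a simulated series in place of the wave-height data; the convention assumed here is that an interval of length n means n non-exceedances separate two consecutive exceedances, so adjacent exceedances give length 0:

```python
import numpy as np
from collections import Counter

def interval_counts(x, u, max_len=30):
    """Tabulate intervals between consecutive exceedances of u.

    Returns a dict mapping each interval length n in 0..max_len to its
    count, plus the count of intervals longer than max_len under the
    key '>30'. The length of an interval is taken to be the number of
    non-exceedances between two consecutive exceedances.
    """
    exc = np.flatnonzero(x > u)
    lengths = np.diff(exc) - 1           # non-exceedances between neighbours
    counts = Counter(lengths.tolist())
    table = {n: counts.get(n, 0) for n in range(max_len + 1)}
    table['>%d' % max_len] = int(np.sum(lengths > max_len))
    return table

# Illustrative use on an iid stand-in of the same length as the data.
rng = np.random.default_rng(1)
x = rng.standard_normal(29220)
tab = interval_counts(x, np.quantile(x, 0.99))
```

By construction the counts sum to one less than the number of exceedances, which matches the table: for example, 2816 exceedances of u1 yield 2815 intervals.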
FIGURE CAPTIONS
Figure 1. Simulated values of θ(i, u) − θ for the doubly stochastic model of Section
3, with η = 0.7, ψ = 0.9. The crosses, triangles and diamonds represent simulated values
for three thresholds for which 1 − F(u) is respectively 0.03, 0.018 and 0.006. The curves
are based on equation (4.3), fitted to the data separately for each threshold. It can be seen
that the curves fit well through the data points.
Figure 2. Same as Figure 1, but the simulated process is now a first-order Markov chain
with bivariate distributions given by (4.6) with r = 2. The three thresholds correspond to
1 − F(u) = 0.01 (crosses), 1 − F(u) = 0.006 (triangles) and 1 − F(u) = 0.002 (diamonds), and
the curves are again based on fitting (4.3).
Figure 3. Simulations of the square root of mean squared error for the three estimators
θ̂n (dot-dashed curve), θ̃n (dashed curve) and θ̄n (solid curve), for n = 10000 and rn up
to 300. All the curves are based on 100 simulations of the model (4.6) with r = 2, with
1 − F(u) = 0.07 (Figure 3(a)), 1 − F(u) = 0.04 (Figure 3(b)) and 1 − F(u) = 0.01 (Figure 3(c)).
Figure 4. A real data example based on wave heights. The three thresholds correspond
to empirical 1 − F(u) = 0.096 (crosses), 1 − F(u) = 0.04 (triangles) and 1 − F(u) = 0.016
(diamonds), and the curves are again based on fitting (4.3). Again the curves fit well
individually, but there is a very clear separation among the three thresholds, which unlike
Figures 1 and 2 cannot be explained away as the effect of bias. Thus we conclude that
the difference between the curves is real and represents different extremal indices being
effective at different levels of the process.
[Figures 1-4 appear here. The vertical axes are "Bias in Theta(u,i)" (Figures 1 and 2),
"square root of mse" (Figures 3(a)-(c)) and "Estimated Theta" (Figure 4).]