Extreme-value distributions: max-domain of
attraction, estimation and goodness-of-fit
methods
Lecture 3
February 24, 2014

Outline: a little bit of history, max-domain of attraction (sketch of the proof), ML estimation, profile likelihood, goodness-of-fit, multivariate models

A little bit of history
Original theorem about the max-stable distributions: Fisher–Tippett (1928)

$$G(x) = \exp\left\{-\left(1 + \gamma\,\frac{x-a}{b}\right)^{-1/\gamma}\right\}, \qquad 1 + \gamma\,\frac{x-a}{b} > 0.$$
Fréchet($\alpha$) corresponds to the GEV with $\gamma = 1/\alpha$.
If $1 - F(x) = x^{-\alpha} L(x)$ with $L$ slowly varying, then $F$ belongs to the max-domain of attraction of the GEV with $\gamma = 1/\alpha$.
Generalised extreme-value distribution: Jenkinson (1953).
Characterisation of the max-domain of attraction: Gnedenko (1943).
Remark: if we are interested in the distribution of minima, then we
should consider the maxima of their negatives
Domain of attraction – sketch of the proof
The simpler direction: if the survival function $\bar F = 1 - F$ is regularly varying with index $-\alpha$, then $F$ belongs to the max-domain of attraction of the Fréchet($\alpha$) distribution.
Let $a_n = F^{-1}(1 - 1/n)$ (a $1-1/n$ quantile). Then
$$n\,\bar F(a_n x) \approx \frac{\bar F(a_n x)}{\bar F(a_n)} \to x^{-\alpha} \quad (n \to \infty), \text{ because } a_n \to \infty.$$
Thus, for $x > 0$:
$$P(M_n < a_n x) = F^n(a_n x) = \exp\{n \ln(1 - \bar F(a_n x))\} \to \exp\{-x^{-\alpha}\} \quad (n \to \infty).$$
If $x < 0$, then $F(a_n x) \le F(0) < 1$, thus $F^n(a_n x) \to 0$.
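The convergence in this sketch can be checked numerically. Below is a minimal sketch for a Pareto parent distribution (all function names are my own, chosen for illustration): with $F(x) = 1 - x^{-\alpha}$ and $a_n = n^{1/\alpha}$, the exact distribution $F^n(a_n x)$ of the normed maximum approaches the Fréchet limit $\exp(-x^{-\alpha})$.

```python
import math

def pareto_cdf(x, alpha):
    """Pareto distribution function F(x) = 1 - x**(-alpha) for x >= 1."""
    return 1.0 - x ** (-alpha) if x >= 1.0 else 0.0

def normed_max_cdf(x, alpha, n):
    """Exact distribution of M_n / a_n at x, i.e. F(a_n * x)**n,
    with the norming constant a_n = n**(1/alpha)."""
    a_n = n ** (1.0 / alpha)
    return pareto_cdf(a_n * x, alpha) ** n

alpha, x = 2.0, 1.5
frechet_limit = math.exp(-x ** (-alpha))
# the approximation error shrinks as n grows
errors = [abs(normed_max_cdf(x, alpha, n) - frechet_limit) for n in (10, 100, 1000)]
```

For the Pareto parent the survival function is exactly $x^{-\alpha}$, so no slowly varying correction appears and the convergence is fast.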
ML estimation
The density function of the GEV (in the region $1 + \gamma(x-b)/a > 0$):
$$g(x) = \frac{1}{a} \left(1 + \gamma\,\frac{x-b}{a}\right)^{-\frac{1}{\gamma}-1} \exp\left\{-\left(1 + \gamma\,\frac{x-b}{a}\right)^{-1/\gamma}\right\}$$
Thus the log-likelihood function (if $1 + \gamma(x_i - b)/a > 0$ for every $i$):
$$\ell(a, b, \gamma) = -n \log a - \left(\frac{1}{\gamma} + 1\right) \sum_{i=1}^{n} \log\left(1 + \gamma\,\frac{x_i - b}{a}\right) - \sum_{i=1}^{n} \left(1 + \gamma\,\frac{x_i - b}{a}\right)^{-1/\gamma}$$
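The log-likelihood can be checked against a direct sum of log-densities. A minimal sketch (function names are my own):

```python
import math

def gev_logpdf(x, a, b, gamma):
    """Log-density of the GEV at x (scale a, location b, shape gamma),
    valid where 1 + gamma*(x-b)/a > 0."""
    u = 1.0 + gamma * (x - b) / a
    if u <= 0.0:
        return -math.inf
    return -math.log(a) - (1.0 / gamma + 1.0) * math.log(u) - u ** (-1.0 / gamma)

def gev_loglik(xs, a, b, gamma):
    """l(a,b,gamma) = -n log a - (1/gamma + 1) sum_i log(1 + gamma*(x_i-b)/a)
                      - sum_i (1 + gamma*(x_i-b)/a)**(-1/gamma)."""
    n = len(xs)
    t1 = sum(math.log(1.0 + gamma * (x - b) / a) for x in xs)
    t2 = sum((1.0 + gamma * (x - b) / a) ** (-1.0 / gamma) for x in xs)
    return -n * math.log(a) - (1.0 / gamma + 1.0) * t1 - t2
```

The two routes (closed-form log-likelihood versus summing `gev_logpdf`) must agree, which is a useful sanity check before handing the function to a numerical optimiser.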
Profile likelihood
For the construction of confidence intervals for the estimators of the high quantiles it is worth using the profile likelihood, because of the skewness of the log-likelihood function.
In order to do so, the log-likelihood function has to be reparametrised in such a way that the quantile becomes one of the parameters. We have seen that the $1-p$ quantile is
$$z_p = b - \frac{a}{\gamma}\bigl(1 - y_p^{-\gamma}\bigr), \qquad \text{where } y_p = -\log(1 - p).$$
Thus $b = z_p + \frac{a}{\gamma}\bigl(1 - p^{-\gamma}\bigr)$, because $-\log(1-p) \approx p$ if $p \approx 0$.
So we may maximise as a function of $z_p$, $a$ and $\gamma$.
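The reparametrisation $z_p = b - (a/\gamma)(1 - y_p^{-\gamma})$, $y_p = -\log(1-p)$, is a simple round trip between $b$ and $z_p$; a sketch with hypothetical helper names:

```python
import math

def gev_quantile(p, a, b, gamma):
    """The 1-p quantile: z_p = b - (a/gamma) * (1 - y_p**(-gamma)),
    where y_p = -log(1 - p)."""
    y_p = -math.log(1.0 - p)
    return b - (a / gamma) * (1.0 - y_p ** (-gamma))

def b_from_quantile(z_p, p, a, gamma):
    """Invert the reparametrisation: recover the location b from z_p,
    so that the profile likelihood can treat z_p as a parameter."""
    y_p = -math.log(1.0 - p)
    return z_p + (a / gamma) * (1.0 - y_p ** (-gamma))
```

With `b_from_quantile` substituted into the log-likelihood, one can profile out $a$ and $\gamma$ for each fixed $z_p$.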
The location of the maximum may be found using numerical methods (one should take care with the initial values, and the domain condition has to be fulfilled all the time).
The ML estimator possesses the usual asymptotic properties (optimality, normality) if $\gamma > -0.5$.
Goodness-of-fit
Classic tests:
Chi-squared
Kolmogorov–Smirnov
but these are not too strong.
Cramér–von Mises type tests: the (possibly weighted) integral of the squared difference between the empirical and the theoretical distribution function is used:
$$C_n = n \int_{-\infty}^{\infty} w(t)\,\bigl(F_n(t) - F(t)\bigr)^2\, dt$$
Further alternatives
Anderson–Darling test:
$$A^2 = n \int_{-\infty}^{\infty} \frac{(F_n(x) - F(x))^2}{F(x)(1 - F(x))}\, dF(x)$$
Its computation:
$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1)\bigl(\log z_i + \log(1 - z_{n+1-i})\bigr)$$
where $z_i = F(X_{(i)})$ for the ordered sample. Sensitive in both tails.
Modification:
$$B^2 = n \int_{-\infty}^{\infty} \frac{(F_n(x) - F(x))^2}{1 - F(x)}\, dF(x)$$
(for maxima; upper tails). Its computation:
$$B^2 = \frac{n}{2} - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \log(1 - z_{n+1-i}) - 2 \sum_{i=1}^{n} z_i$$
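Both statistics are easy to compute from the probability-transformed, ordered sample. A sketch of the two computational formulas (function names are illustrative):

```python
import math

def anderson_darling(z):
    """A^2 = -n - (1/n) * sum_i (2i-1) * (log z_i + log(1 - z_{n+1-i})),
    where z_i = F(X_(i)) are the ordered probability-scale values."""
    z = sorted(z)
    n = len(z)
    s = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

def upper_tail_stat(z):
    """B^2 = n/2 - (1/n) * sum_i (2i-1) * log(1 - z_{n+1-i}) - 2 * sum_i z_i,
    the upper-tail-weighted modification of Anderson-Darling."""
    z = sorted(z)
    n = len(z)
    s = sum((2 * i - 1) * math.log(1.0 - z[n - i]) for i in range(1, n + 1))
    return n / 2.0 - s / n - 2.0 * sum(z)
```

For $n = 1$ and $z_1 = 1/2$ both formulas can be verified by direct integration of the defining integrals: $A^2 = 2\log 2 - 1$ and $B^2 = \log 2 - 1/2$.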
Power studies: the probability of correct decision for various alternatives is tabulated below ($p = 0.05$).

Limit distributions
Distribution-free for the case of known parameters. For example, for $m = 2$:
$$\sup_x \sqrt{n}\,\bigl|F_n(x) - F_n^2(a_2 x + b_2)\bigr| \xrightarrow{\ \mathrm{Distr.}\ } \sup_y \bigl|B(y) - 2\sqrt{y}\,B(\sqrt{y})\bigr| \qquad (n \to \infty)$$
Another test can be based on the stability property of the GEV distributions: for any $m \in \mathbb{N}$ there exist $a_m, b_m$ such that $F(x) = F^m(a_m x + b_m)$ $(x \in \mathbb{R})$.
The test statistic:
$$h(a, b) = \sup_x \sqrt{n}\,\bigl|F_n(x) - F_n^m(a x + b)\bigr|$$
Alternatives for estimating $a, b$:
To find the $a, b$ which minimise $h(a, b)$ (a computer-intensive algorithm is needed).
To estimate the GEV parameters by maximum likelihood and plug these into the stability property.
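One way to compute $h(a,b)$ in practice is sketched below (names are my own). Since both empirical curves are step functions, the supremum is attained at or just before the jump points, so it suffices to evaluate the difference on that finite grid:

```python
import math
from bisect import bisect_right

def make_ecdf(sample):
    """Return the empirical distribution function of the sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n

def h_statistic(sample, a, b, m=2):
    """h(a,b) = sup_x sqrt(n) * |F_n(x) - F_n(a*x + b)**m|,
    with the sup taken over the jump points of both step functions."""
    n = len(sample)
    F = make_ecdf(sample)
    eps = 1e-9
    # jumps of F_n(x) occur at the sample points; jumps of F_n(a*x + b)
    # occur at their preimages (s - b)/a; check just before each jump too
    candidates = [t for s in sample
                  for t in (s, s - eps, (s - b) / a, (s - b) / a - eps)]
    return math.sqrt(n) * max(abs(F(t) - F(a * t + b) ** m) for t in candidates)
```

For $m = 1$, $a = 1$, $b = 0$ the two curves coincide and $h$ is exactly zero, which gives a quick sanity check.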
where $B$ denotes the Brownian bridge over $[0,1]$.
As the limits are functionals of the normal distribution, the effect of parameter estimation by maximum likelihood can be taken into account by transforming the covariance structure.
In practice simulated critical values should be used (advantage: they cover the small-sample cases as well).
For specific cases where the upper tail plays the important role (e.g. modified maximal values of real flood data), $B$ is the most sensitive.
When applying the above tests to the flood data (annual maxima; windows of size 50), there were a couple of cases when the GEV hypothesis had to be rejected at the 95% level.
Possible reasons: nonstationarity, i.e.
changes in river-bed properties (shape, vegetation etc.),
climate change?
periodicity?
Power of the tests (probability of correct decision, $p = 0.05$):

Test   NB                exp          Normal
n      100   200   400   100   200    100   200
K-S    0.27  0.49  0.88  0.36  0.61   0.19  0.23
B      0.02  0.27  0.49  0.17  0.58   0.05  0.08
A-D    0.31  0.62  0.96  0.72  0.97   0.21  0.34
h      0.67  0.87  0.99  0.75  0.91   0.10  0.14
For typical alternatives, the A-D test seems to
outperform the other tests. The power of h very much
depends on the shape of the underlying distribution.
Multivariate extreme-value
distributions
Applications
Typically coordinate-wise maxima are considered.
Remark: the coordinate-wise maxima need not occur at the same observation!
Marginal distributions may be chosen arbitrarily; the traditional choice is Fréchet(1). It can be achieved for any (known) marginal $F_j$ by the transformation $Y_j = -1/\log(F_j(X_j))$.
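The marginal transformation is a one-liner; a minimal sketch (the helper name is hypothetical). If $U = F_j(X_j)$ is uniform, then $Y = -1/\log U$ satisfies $P(Y \le y) = e^{-1/y}$, i.e. $Y$ is standard Fréchet(1):

```python
import math

def to_frechet1(u):
    """Map a probability u = F_j(x) in (0, 1) to the standard
    Frechet(1) scale: Y = -1 / log(u)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return -1.0 / math.log(u)
```

For example, $u = e^{-1}$ maps to $y = 1$ and $u = e^{-1/2}$ maps to $y = 2$, consistent with $P(Y \le y) = e^{-1/y}$.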
d-dimensional extreme-value distributions (MGEV)
Let $X_1, X_2, \dots, X_n$ be independent, identically distributed $d$-dimensional random vectors, and let $a_n, b_n$ be norming vectors such that
$$[\max(X_1, X_2, \dots, X_n) - a_n]/b_n$$
(with coordinate-wise maxima and operations) tends to a nondegenerate limit. Then this limit is necessarily a $d$-dimensional max-stable, or so-called extreme-value, distribution.
Max-stability: for every $n$ there are $a, b$ such that $F^n(x) = F(ax + b)$.
Representation (de Haan, 1985)
Let $\|x\| = |x_1| + \dots + |x_d|$, and let $S_d$ be the unit simplex $\{x \ge 0 : \|x\| = 1\}$.
There is a finite measure $H$ on $S_d$ such that
$$\int_{S_d} \omega_j \, dH(\omega) = 1 \qquad \text{for every } j = 1, \dots, d.$$
$H$ is called the spectral measure.
With this notation $G(x) = \exp\{-V(x)\}$, where
$$V(x) = \int_{S_d} \max_{1 \le j \le d} \frac{\omega_j}{x_j}\, dH(\omega).$$
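For a discrete spectral measure the integral defining $V$ becomes a finite sum, which makes the two extreme cases (independence and complete dependence, discussed below) easy to reproduce. A small sketch, with illustrative names:

```python
def exponent_V(x, atoms, masses):
    """Exponent measure V(x) for a discrete spectral measure H that puts
    mass m_k on the simplex point w_k:
        V(x) = sum_k m_k * max_j (w_k[j] / x[j])."""
    return sum(m * max(w_j / x_j for w_j, x_j in zip(w, x))
               for w, m in zip(atoms, masses))
```

Unit masses at the vertices give $V(x) = 1/x_1 + \dots + 1/x_d$ (independence), while mass $d$ at the centre $(1/d, \dots, 1/d)$ gives $V(x) = 1/\min_j x_j = \max_j 1/x_j$ (complete dependence); both choices satisfy the moment condition $\int \omega_j\, dH = 1$.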
The bivariate case (Pickands)
$G$ is a bivariate extreme-value distribution with Fréchet(1) margins if and only if there is a function $A : [0,1] \to [0,1]$ such that
$\max(t, 1-t) \le A(t) \le 1$ for all $0 \le t \le 1$,
$A(t)$ is convex, and
$$G(x) = \exp\left\{-\left(\frac{1}{x_1} + \frac{1}{x_2}\right) A\!\left(\frac{x_2}{x_1 + x_2}\right)\right\}$$
$A$ is called the (Pickands-type) dependence function.
Remark: the condition is just necessary, but not sufficient, in $d$ dimensions.
Examples for asymptotically independent distributions
Theorem (Sibuya): the bivariate vector $X$ with distribution $F$ is asymptotically independent if and only if
$$P(X_1 > q_1(u) \mid X_2 > q_2(u)) \to 0 \quad \text{as } u \to 1,$$
where $q_i(u)$ is the $u$-quantile of the $i$-th marginal.
Corollary: the multivariate normal distribution is asymptotically independent if $\rho < 1$ for all pairwise correlations.
Properties
The MGEV distributions are positive quadrant-dependent:
$$G(x) \ge G_1(x_1) G_2(x_2) \cdots G_d(x_d)$$
The case of independence: $G(x) = G_1(x_1) G_2(x_2) \cdots G_d(x_d)$. The distributions falling into the max-domain of attraction of this $G$ are called asymptotically independent.
The spectral measure of this $G$ puts unit masses at the vertices of the simplex.
Complete dependence
$G$ is completely dependent if $G(x) = \min(G_1(x_1), G_2(x_2), \dots, G_d(x_d))$.
In this case $H$ is concentrated at the central point $(1/d, \dots, 1/d)$ of $S_d$ (with total mass $d$);
$V(x) = \max(1/x_1, \dots, 1/x_d)$ (for all $x > 0$);
$A(t) = \max(t, 1-t)$.
This is interesting from a theoretical point of view only.
Parametric families
Gumbel (logistic) model:
$$G(x) = \exp\left\{-\left(\sum_{j=1}^{d} x_j^{-1/\alpha}\right)^{\alpha}\right\}$$
$0 < \alpha \le 1$ measures the strength of the dependence:
$\alpha = 1$: independence
$\alpha \to 0$: complete dependence
Asymmetric logistic
Negative asymmetric logistic
…
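The logistic model is simple enough to evaluate directly; a sketch (the function name is my own) that also exhibits the two limiting regimes:

```python
import math

def logistic_mevd_cdf(x, alpha):
    """CDF of the d-dimensional logistic (Gumbel) max-stable model with
    Frechet(1) margins: exp{ -(sum_j x_j**(-1/alpha))**alpha }, 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    s = sum(xj ** (-1.0 / alpha) for xj in x)
    return math.exp(-(s ** alpha))
```

At $\alpha = 1$ the exponent is $\sum_j 1/x_j$, so the CDF factorises into the product of the Fréchet(1) margins (independence); as $\alpha \to 0$ the sum is dominated by its largest term and the CDF approaches $\exp(-1/\min_j x_j)$ (complete dependence).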
Nonparametric estimator, d = 2
With Fréchet(1) margins $Y_{i,1}, Y_{i,2}$,
$$E\bigl[\log \max\bigl(t\, Y_{i,1}, (1-t)\, Y_{i,2}\bigr)\bigr] = \log A(t) - \int_0^{\infty} e^{-u} \log(u)\, du,$$
which yields a raw estimator $\hat A(t)$. An estimator fulfilling also the condition $A(0) = A(1) = 1$:
$$\log \hat A_{CFG}(t) = \log \hat A(t) - t \log \hat A(1) - (1-t) \log \hat A(0)$$
(Capéraà, Fougères, Genest, 1997)
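A minimal sketch of this CFG-type estimator, assuming the margins have already been transformed to the Fréchet(1) scale (function names are hypothetical; the integral above equals $-\gamma_E$, the Euler-Mascheroni constant):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def log_A_raw(t, y1, y2):
    """Raw estimator of log A(t) from Frechet(1)-scale pairs (y1[i], y2[i]),
    based on E[log max(t*Y1, (1-t)*Y2)] = log A(t) + gamma_E."""
    n = len(y1)
    mean_log = sum(math.log(max(t * a, (1.0 - t) * b))
                   for a, b in zip(y1, y2)) / n
    return mean_log - EULER_GAMMA

def A_cfg(t, y1, y2):
    """Endpoint-corrected estimator:
    log A_CFG(t) = log A(t) - t*log A(1) - (1-t)*log A(0),
    which forces A_CFG(0) = A_CFG(1) = 1."""
    la = log_A_raw(t, y1, y2)
    la0 = log_A_raw(0.0, y1, y2)
    la1 = log_A_raw(1.0, y1, y2)
    return math.exp(la - t * la1 - (1.0 - t) * la0)
```

A pleasant consequence of the endpoint correction: for a completely dependent sample ($Y_{i,1} = Y_{i,2}$) the estimator returns $\max(t, 1-t)$ exactly, whatever the observed values are.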
Example
Data: annual maxima from two stations at the upper part of the river Tisza in north-eastern Hungary.
(On the parametric approach: estimation is straightforward and confidence intervals may be constructed, but it is reliable only if the model fit is good.)
Another method
Let $(X_{i,1}, X_{i,2})$, $i = 1, \dots, n$, be a sample from the bivariate GEV distribution $G$ with margins $G_1, G_2$.
Let $Y_{i,j} = -\log G_j(X_{i,j})$ ($j = 1, 2$), so that the margins become standard exponential.
Then $Z_i = \min(Y_{i,1}/t,\ Y_{i,2}/(1-t))$ is exponentially distributed with expectation $1/A(t)$.
The estimator of $A(t)$: $\hat A(t) = n/(Z_1 + \dots + Z_n)$.
But this is not always convex. Convexity can be achieved e.g. by taking the greatest convex minorant (Hall–Tajvidi, 2000).
The differentiability is still a question; the estimator has to be smoothed further.
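The Pickands-type estimator itself is a one-liner; a sketch assuming exponential-scale margins (the helper name is my own):

```python
def pickands_A(t, y1, y2):
    """Pickands-type estimator A(t) = n / sum_i min(y1[i]/t, y2[i]/(1-t)),
    where (y1[i], y2[i]) are on the standard exponential scale
    y = -log G_j(x)."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between 0 and 1")
    z = [min(a / t, b / (1.0 - t)) for a, b in zip(y1, y2)]
    return len(z) / sum(z)
```

For a completely dependent sample with $Y_{i,1} = Y_{i,2}$ averaging to 1, the estimator returns $\max(t, 1-t)$, the lower boundary of the dependence-function region, as expected.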
Estimation
Parametric: fitting parametric models, typically by the maximum likelihood method.
Nonparametric: estimating the spectral measure or the dependence function from the empirical distribution. No condition on the model is needed, but the methods are more complicated.
Results
The fit of the GEV distributions was good; this was used when transforming the marginals.
ML estimation for the parameters of the symmetric logistic model.
Quantile estimation.