Examining applicability of a new technique for threshold selection in extreme value modelling
Samantha Hinsley
Jenny Wadsworth
Maths & Stats, Lancaster University
Why look at a new method of threshold selection?
Background information
Investigating extreme values involves looking at the tails of probability distributions.
A key problem is deciding where the tail begins, that is, the threshold above which the tail should be modelled.
This project examined a new method for choosing such a threshold, using data sets of ‘ocean energy’ values recorded over time at locations such as the Gulf of Mexico.
The Generalized Pareto distribution
One way to study extreme values is to choose a threshold, u, and look only at the data above it.
The excesses over u (the amounts by which observations exceed u) can then be modelled with the Generalized Pareto (GP) distribution.
For a sufficiently high threshold, the GP distribution approximates the tail of most probability distributions, so most data sets can be analysed in this way.
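A minimal sketch of fitting the GP distribution to the excesses over a threshold, assuming Python with NumPy and SciPy (`scipy.stats.genpareto`, whose shape parameter `c` plays the role of ξ). The data are simulated stand-ins rather than the ocean energy series, and the function name `fit_gp_above` is hypothetical.

```python
import numpy as np
from scipy.stats import genpareto


def fit_gp_above(data, u):
    """Fit a GP distribution to the excesses of `data` over threshold `u`.

    Returns (shape xi, scale sigma_u, number of exceedances).
    """
    data = np.asarray(data, dtype=float)
    excesses = data[data > u] - u
    # Location fixed at 0 because excesses start at 0; SciPy's `c` is xi.
    xi, _, sigma_u = genpareto.fit(excesses, floc=0)
    return xi, sigma_u, excesses.size


rng = np.random.default_rng(1)
# Hypothetical stand-in data: 5000 draws from a GP with xi = 0.1 and scale 1.
sample = genpareto.rvs(c=0.1, scale=1.0, size=5000, random_state=rng)
xi, sigma_u, n_exc = fit_gp_above(sample, u=2.0)
print(f"xi = {xi:.3f}, sigma_u = {sigma_u:.3f}, exceedances = {n_exc}")
```

Because the simulated data are exactly GP, the excesses over u = 2 are again GP with the same shape and scale 1 + 0.1 × 2 = 1.2, so the estimates should land near those values.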
Threshold selection
Figure 1: Plots to aid threshold selection. (a) Parameter estimates against threshold u. (b) Mean residual life plot.
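Figure 1 uses two standard diagnostics. The sketch below, assuming NumPy and SciPy and hypothetical gamma-distributed data, computes what they plot for a grid of candidate thresholds: the mean excess with an approximate 95% interval (mean residual life plot), and the GP shape and modified scale σ_u - ξu (parameter estimates against threshold). Above a threshold where the GP model holds, the mean excess is roughly linear in u and both parameter estimates are roughly constant. The function name `threshold_diagnostics` is hypothetical.

```python
import numpy as np
from scipy.stats import genpareto


def threshold_diagnostics(data, thresholds):
    """Return (u, n, mean excess, 95% half-width, xi, modified scale) per threshold."""
    data = np.asarray(data, dtype=float)
    rows = []
    for u in thresholds:
        exc = data[data > u] - u
        n = exc.size
        # Mean residual life plot: mean excess and a normal-approximation interval.
        mean_exc = exc.mean()
        half_width = 1.96 * exc.std(ddof=1) / np.sqrt(n)
        # Parameter stability: GP shape and modified scale sigma_u - xi * u.
        xi, _, sigma_u = genpareto.fit(exc, floc=0)
        rows.append((u, n, mean_exc, half_width, xi, sigma_u - xi * u))
    return rows


rng = np.random.default_rng(3)
# Hypothetical data standing in for the ocean energy series.
sample = rng.gamma(shape=2.0, scale=1.0, size=3000)
for u, n, m, hw, xi, mod_scale in threshold_diagnostics(sample, np.linspace(0.5, 4.0, 8)):
    print(f"u={u:.2f} n={n:4d} mean excess={m:.3f} +/- {hw:.3f} "
          f"xi={xi:.3f} modified scale={mod_scale:.3f}")
```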
After choosing a threshold, we produce post-calculation diagnostic plots (Figure 2) to check its suitability; among other criteria, we look for linearity in the probability and quantile plots.
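The probability and quantile plots compare the fitted GP model with the data. A minimal sketch, assuming NumPy and SciPy, simulated excesses and the empirical plotting positions i/(n + 1); the function name `gp_diagnostic_points` is hypothetical. Points close to the line y = x indicate a good fit, and the correlations printed at the end are only a rough numerical summary of that linearity.

```python
import numpy as np
from scipy.stats import genpareto


def gp_diagnostic_points(excesses, xi, sigma):
    """Coordinates for the probability and quantile plots of a fitted GP model."""
    y = np.sort(np.asarray(excesses, dtype=float))
    n = y.size
    emp_p = np.arange(1, n + 1) / (n + 1)               # empirical probabilities
    model_p = genpareto.cdf(y, c=xi, scale=sigma)       # probability plot: (emp_p, model_p)
    model_q = genpareto.ppf(emp_p, c=xi, scale=sigma)   # quantile plot: (model_q, y)
    return emp_p, model_p, model_q, y


rng = np.random.default_rng(2)
# Hypothetical excesses over some threshold, simulated from a GP.
exc = genpareto.rvs(c=0.1, scale=1.2, size=800, random_state=rng)
xi, _, sigma = genpareto.fit(exc, floc=0)
emp_p, model_p, model_q, y = gp_diagnostic_points(exc, xi, sigma)
print("probability plot correlation:", np.corrcoef(emp_p, model_p)[0, 1])
print("quantile plot correlation:", np.corrcoef(model_q, y)[0, 1])
```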
Figure 3: Triangle plot
Figures 1 and 2 (old method) and Figure 3 (new method) help in choosing where to start looking for a threshold.
Figure 2: Post-threshold-calculation diagnostic plots. Panels: probability plot, quantile plot, return level plot (return level against return period in years) and density plot.
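The return level plot shows, for each return period N, the level exceeded on average once every N years. Under the GP model this is z_N = u + (σ/ξ)[(N n_y ζ_u)^ξ - 1], or u + σ log(N n_y ζ_u) when ξ = 0, where ζ_u is the probability that an observation exceeds u and n_y is the number of observations per year. A sketch, assuming NumPy; the function name and all parameter values below are hypothetical.

```python
import numpy as np


def gp_return_level(years, u, sigma, xi, zeta_u, n_per_year):
    """N-year return level(s) for a GP model above threshold u.

    zeta_u: probability that a single observation exceeds u.
    n_per_year: number of observations per year.
    """
    m = np.asarray(years, dtype=float) * n_per_year * zeta_u
    if abs(xi) < 1e-9:
        # Exponential limit of the GP as xi -> 0.
        return u + sigma * np.log(m)
    return u + sigma / xi * (m ** xi - 1.0)


# Hypothetical values, e.g. 3-hourly observations (2920 per year).
periods = np.array([10, 100, 1000])
levels = gp_return_level(periods, u=3.0, sigma=0.5, xi=0.1, zeta_u=0.05, n_per_year=2920)
for n_years, level in zip(periods, levels):
    print(f"{n_years:5d}-year return level: {level:.2f}")
```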
STOR-i internship 2010
To choose the threshold above which the GP model is fitted, two types of graph are first plotted (Figure 1).
In Figure 1(a), we look for a threshold above which a horizontal line can be drawn through all the confidence bars of the parameter estimates.
In Figure 1(b), we look for approximate linearity that stays within the confidence bounds.
In both plots, we choose the lowest threshold that satisfies these criteria.
Relying only on graphs leaves room for error and misinterpretation, and it is difficult to decide on the ‘best’ threshold with this method.
Likelihood Ratio tests and p-values
Likelihood ratio (LR) tests are a type of hypothesis test.
They compare the maximised likelihoods of two nested models to test a null hypothesis, H0; the test statistic is twice the difference between the maximised log-likelihoods.
From an LR test we obtain a p-value: the probability, if H0 were true, of obtaining a test statistic at least as large as the one observed.
The stronger the evidence against H0, the smaller the p-value.
New method - Hypotheses
Hypotheses to test with the LR test:
H0: a single tail model is adequate above the chosen threshold, v.
H1: a single tail model is not adequate above v.
We test these hypotheses by comparing a single tail model with a tail model that has a change point, u, above which the tail behaviour may change.
Triangle plot
Figure 3 shows the test statistics for models tested with different lowest thresholds (v) and different change points (u).
The larger the test statistic, the larger the circle on the plot.
Large test statistics indicate where the lowest threshold may be too low.
Finding the best threshold
Simulate 200 data sets under H0 from the chosen threshold to approximate the distribution of the LR test statistic.
Use this distribution to find the p-value.
When p < 0.05, there is significant evidence to reject H0: increase the threshold and repeat.
When p ≥ 0.05, there is not enough evidence to reject H0, so proceed with the single tail model.
Near where the threshold appears to be, run another set of 200 simulations to check the p-value.
Results
The Gulf of Mexico results are shown in Table 1.
Threshold (v)   p-value
3               0.015
3.1             0.135
3.1             0.12
3.2             0.43
Table 1: Thresholds and p-values.
At v = 3, p = 0.015 < 0.05, so H0 is rejected; at v = 3.1, both p-values (0.135 and 0.12) exceed 0.05, so H0 is not rejected.
Table 1 therefore suggests that 3.1 is a suitable threshold.
This is lower than the threshold that would have been chosen with the old method.
The results give no evidence that a threshold above 3.1 is needed.
Analysis of the new method
Advantages:
More accurate
Easier to interpret
Less room for human error
Limitations:
Other variables are not taken into account (e.g. wave direction).
If H0 is always rejected, it could be that no single model fits the tail, but this cannot be confirmed.
Seasonality in the data affects the results: the p-values can be misleading if the data follow different trends at different times of year. (This can be overcome by looking only at winter data.)
The triangle plot can be misleading when there are few data points: with only a small amount of data above the threshold, the test statistics can be small even if the threshold is too low.
Conclusion
The old method is useful for seeing the region in which the threshold should lie.
The new technique is better for locating precisely where the threshold lies.
Overall, the new method of threshold selection removes many of the problems found with the old method.
Although the new technique has limitations, many of them also affect the old method, where they may even go undetected.
Best solution: use the old method to indicate where the threshold might lie, then use the new method to quantify the credibility of a chosen threshold.
Doing both shows whether the result from the new technique is likely to be correct, or whether it has been affected by an outside factor such as seasonality.
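The steps under ‘Finding the best threshold’ can be sketched in Python, assuming NumPy and SciPy. The alternative model is a hypothetical choice, not necessarily the one used in this project: below the change point u the excesses over v follow a GP (with observations above u censored at u), and above u a separate GP is fitted, so the single tail model is nested inside it. Taking u as the median of the data above v is also an assumption, as are all function names. The p-value is the proportion of simulated LR statistics at least as large as the observed one.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genpareto


def _maximise(nll, start):
    """Minimise a negative log-likelihood over theta = (log sigma, xi)."""
    res = minimize(nll, np.asarray(start, dtype=float), method="Nelder-Mead")
    return res.x, -res.fun


def _gp_nll(theta, y):
    """Negative GP log-likelihood of excesses y >= 0."""
    ll = genpareto.logpdf(y, c=theta[1], scale=np.exp(theta[0])).sum()
    return -ll if np.isfinite(ll) else np.inf


def _censored_gp_nll(theta, y_low, n_high, cut):
    """GP for excesses over v, with the n_high points above u censored at cut = u - v."""
    xi, sigma = theta[1], np.exp(theta[0])
    ll = genpareto.logpdf(y_low, c=xi, scale=sigma).sum()
    if n_high:
        ll += n_high * genpareto.logsf(cut, c=xi, scale=sigma)
    return -ll if np.isfinite(ll) else np.inf


def lr_statistic(x, v, u):
    """LR statistic: one GP tail above v (H0) versus a tail that may change at u (H1)."""
    y = np.asarray(x, dtype=float)
    y = y[y > v] - v
    cut = u - v
    theta0, l0 = _maximise(lambda t: _gp_nll(t, y), [np.log(y.mean()), 0.1])
    sigma0, xi0 = np.exp(theta0[0]), theta0[1]
    y_low, y_high = y[y <= cut], y[y > cut] - cut
    # Start H1 at the H0 fit: by threshold stability the excesses over u are then GP
    # with scale sigma0 + xi0 * cut, so H1's log-likelihood is at least l0 (up to rounding).
    _, l_low = _maximise(lambda t: _censored_gp_nll(t, y_low, y_high.size, cut), theta0)
    _, l_high = _maximise(lambda t: _gp_nll(t, y_high), [np.log(sigma0 + xi0 * cut), xi0])
    return max(0.0, 2.0 * (l_low + l_high - l0)), sigma0, xi0


def bootstrap_p_value(x, v, u, n_sim=200, seed=0):
    """Simulate n_sim data sets from the fitted H0 model; return (observed LR, p-value)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    lr_obs, sigma0, xi0 = lr_statistic(x, v, u)
    n_exc = int(np.sum(x > v))
    lr_sim = np.array([
        lr_statistic(v + genpareto.rvs(c=xi0, scale=sigma0, size=n_exc, random_state=rng), v, u)[0]
        for _ in range(n_sim)
    ])
    return lr_obs, float(np.mean(lr_sim >= lr_obs))


def find_threshold(x, candidates, n_sim=200, alpha=0.05, seed=0):
    """Raise v through `candidates` until H0 is not rejected (p >= alpha)."""
    x = np.asarray(x, dtype=float)
    for v in sorted(candidates):
        u = float(np.median(x[x > v]))  # hypothetical choice of change point
        lr_obs, p = bootstrap_p_value(x, v, u, n_sim=n_sim, seed=seed)
        print(f"v = {v:.2f}, u = {u:.2f}: LR = {lr_obs:.2f}, p = {p:.3f}")
        if p >= alpha:
            return v
    return None


rng = np.random.default_rng(4)
# Hypothetical data; n_sim is reduced from 200 to keep the demo quick.
data = rng.gamma(shape=2.0, scale=1.0, size=1000)
chosen = find_threshold(data, candidates=[0.5, 1.5, 2.5], n_sim=50)
print("chosen threshold:", chosen)
```

Starting each H1 fit at the H0 solution is a design choice that keeps the numerically computed LR statistic non-negative, as it must be for nested models.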