Selecting an optimum subset of climate models - for climate

Selecting an optimum subset of climate models
for climate impact studies
Renate A. I. Wilcke1
1
Rossby Centre,
Swedish Meteorological and Hydrological
Institute, Sweden
2
Lars Bärring1,2
Thomas Mendlik3
Centre for Environment and Climate
Research,
Lund University, Sweden
EMS2014-365 @ EMS, Prague, 10.10.2014
3
Wegener Center for Climate and Global
Change,
University of Graz, Austria
Introduction
Method
Experiments
Summary
Motivation
Motivation
Climate model output data are applied within climate impact studies
2 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
Motivation
Motivation
Climate model output data are applied within climate impact studies
2 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
Motivation
Motivation
Climate model output data are applied within climate impact studies
Common issue: Climate model ensemble is bigger than what can/want/will
be used by impact modellers
2 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
Motivation
Idea
Question: Output from how many climate models should be
used?
Question: Which climate models should be used?
We know that different climate models perform differently in
different seasons and regions for different variables.
Answers should be applicable for different specific impact studies
Selection method should be flexible and objective
3 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection process – sketch
singular vector
decomposition
new variables
PCs
final selection:
cluster means
4 / 16
[email protected]
Selecting an optimum subset
silhouette
values
hierarchical
clustering
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection process – sketch
4 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection process – sketch
singular vector
decomposition
new variables
PCs
4 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
SVD
To gain uncorrelated variables and reduce their number without
loosing information.
Transformation to an orthogonal vector space.
Standardized data matrix X consists of m models and n variables
Where variables are all combinations of climate change signals (CCS)
of climate variables, indices, for different regions and seasons.
var1(sea1, reg1) var2(sea1, reg2)
Mod1
a1,1
a1,2
Mod2 
a
···
2
,
1

X=

···
a3,1
···
Modm
am , 1
···

5 / 16
[email protected]
Selecting an optimum subset
···
···
···
···
···
varn

a1,n
a2,n 

a3,n 
am , n
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection process – sketch
singular vector
decomposition
new variables
PCs
6 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection process – sketch
singular vector
decomposition
new variables
PCs
6 / 16
[email protected]
Selecting an optimum subset
silhouette
values
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Silhouette values
Purpose: Automated suggestion for optimum size of subset to select, i. e.
gaining k value for tree-cutting in hierarchical clustering
7 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Silhouette values
Purpose: Automated suggestion for optimum size of subset to select, i. e.
gaining k value for tree-cutting in hierarchical clustering
Silhouettes (Rousseeuw, 1987):
a distance measure between a
point and all other points in
different clusters.
testing for different number of
clusters
choosing the highest average
silhouette value
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math.,
20(0):53–65.
7 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Silhouette values
choosing the highest average
silhouette value
0.8
0.6
0.4
4 PCs (83%)
0.2
testing for different number of
clusters
2 PCs (63%)
3 PCs (75%)
0.0
Silhouettes (Rousseeuw, 1987):
a distance measure between a
point and all other points in
different clusters.
average silhouette value
1.0
Purpose: Automated suggestion for optimum size of subset to select, i. e.
gaining k value for tree-cutting in hierarchical clustering
2
4
6
8
clusters
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math.,
20(0):53–65.
7 / 16
[email protected]
Selecting an optimum subset
10
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection process – sketch
singular vector
decomposition
new variables
PCs
8 / 16
[email protected]
Selecting an optimum subset
silhouette
values
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection process – sketch
silhouette
values
singular vector
decomposition
new variables
PCs
8 / 16
[email protected]
hierarchical
clustering
Selecting an optimum subset
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
distance representing
similarity
9 / 16
[email protected]
Selecting an optimum subset
EC−EARTH_RCA4
NorESM1−M_RCA4
M−MPI−ESM−LR_RCA4
EC−EARTH_RACMO22E
EC−EARTH_HIRHAM5
GFDL−GFDL−ESM2M_RCA4
CanESM2_RCA4
CERFACS−CNRM−CM5_RCA4
clusters build by
minimizing distances
HadGEM2−ES_RCA4
bottom-up, starting with
m different clusters
MIROC5_RCA4
Hierarchical clustering is
applied to the euclidean
distances between the
PCs
IPSL−CM5A−MR_RCA4
Hierarchical clustering
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection process – sketch
silhouette
values
singular vector
decomposition
new variables
PCs
10 / 16
[email protected]
hierarchical
clustering
Selecting an optimum subset
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection process – sketch
silhouette
values
singular vector
decomposition
new variables
PCs
hierarchical
clustering
final selection:
cluster means
10 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
SVD
Silhouettes
Clustering
Selection
Selection from clusters
clusters with only 2
members which equally
representing the mean:
arbitrary choise
11 / 16
[email protected]
5
0
NorESM1−M_RCA4
EC−EARTH_RCA4
M−MPI−ESM−LR_RCA4
PC2 22.42%
models closest to those
means are selected
EC−EARTH_RACMO22E
IPSL−CM5A−MR_RCA4
EC−EARTH_HIRHAM5
MIROC5_RCA4
−5
multidimensional means
can represent clusters
CanESM2_RCA4
HadGEM2−ES_RCA4
GFDL−GFDL−ESM2M_RCA4
−10
each cluster represents a
group which is distinct
from the others
−10
−5
CERFACS−CNRM−CM5_RCA4
0
PC1 40.4%
Selecting an optimum subset
5
10
Introduction
Method
Experiments
Summary
Set-up
Set-up
Results
Experiments set-up
Variables: Climate change signals (CCS) of climate variables and
indices for different seasons and regions
Models: 11 RCM-GCM runs from EURO-CORDEX, rcp8.5
12 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
Set-up
Set-up
Results
Experiments set-up
Variables: Climate change signals (CCS) of climate variables and
indices for different seasons and regions
Models: 11 RCM-GCM runs from EURO-CORDEX, rcp8.5
To simulate different impact applications, we tested different . . .
combinations of climate variables (tas, pr, hurs, . . . ) and indices
(degree days, frost days, . . . )
seasons (DJF, MAM, . . . )
CCS periods (near, mid, far future)
12 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
Set-up
Set-up
Results
Experiments set-up
CCS: T2m, pr, wet-day freq., 2
degree days
2069-2098 vs. 1971-2000
all seasons
6 regions in northern Europe
13 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
Set-up
Set-up
Results
Experiments set-up
0.6
2 PCs (62%)
3 PCs (74%)
4 PCs (81%)
0.0
suggestion: 4 clusters
0.4
6 regions in northern Europe
0.2
all seasons
average silhouette value
2069-2098 vs. 1971-2000
0.8
1.0
CCS: T2m, pr, wet-day freq., 2
degree days
2
4
6
clusters
13 / 16
[email protected]
Selecting an optimum subset
8
10
Introduction
Method
Experiments
Summary
Set-up
Set-up
Results
Experiments set-up
CCS: T2m, pr, wet-day freq., 2
degree days
tas, pr, ind
2069-2098 vs. 1971-2000
13 / 16
[email protected]
EC−EARTH_RACMO22E
EC−EARTH_RCA4
EC−EARTH_HIRHAM5
NorESM1−M_RCA4
M−MPI−ESM−LR_RCA4
GFDL−GFDL−ESM2M_RCA4
MIROC5_RCA4
Selecting an optimum subset
CERFACS−CNRM−CM5_RCA4
hierarchical clustering
IPSL−CM5A−MR_RCA4
suggestion: 4 clusters
CanESM2_RCA4
6 regions in northern Europe
HadGEM2−ES_RCA4
all seasons
Introduction
Method
Experiments
Summary
Set-up
Set-up
Results
Experiments set-up
CCS: T2m, pr, wet-day freq., 2
degree days
tas, pr, ind
CanESM2_RCA4
EC−EARTH_RACMO22E
NorESM1−M_RCA4
4
all seasons
6
2069-2098 vs. 1971-2000
HadGEM2−ES_RCA4
final selection/suggestion: RCA4
with CanESM2, IPSL, CNRM,
and HIRHAM5 with EC-EARTH
0
−2
PC2 23.03%
−4
IPSL−CM5A−MR_RCA4
−6
hierarchical clustering
EC−EARTH_RCA4
GFDL−GFDL−ESM2M_RCA4
MIROC5_RCA4
−8
suggestion: 4 clusters
EC−EARTH_HIRHAM5
M−MPI−ESM−LR_RCA4
2
6 regions in northern Europe
−10
−5
CERFACS−CNRM−CM5_RCA4
0
PC1 38.88%
13 / 16
[email protected]
Selecting an optimum subset
5
Introduction
Method
Experiments
Summary
Set-up
Set-up
Results
Experiments set-up
CCS: T2m, pr, wet-day freq., 2
degree days
tas, pr, ind
CanESM2_RCA4
EC−EARTH_RACMO22E
NorESM1−M_RCA4
4
all seasons
6
2069-2098 vs. 1971-2000
HadGEM2−ES_RCA4
final selection/suggestion: RCA4
with CanESM2, IPSL, CNRM,
and HIRHAM5 with EC-EARTH
0
−2
PC2 23.03%
−4
IPSL−CM5A−MR_RCA4
−6
hierarchical clustering
EC−EARTH_RCA4
GFDL−GFDL−ESM2M_RCA4
MIROC5_RCA4
−8
suggestion: 4 clusters
EC−EARTH_HIRHAM5
M−MPI−ESM−LR_RCA4
2
6 regions in northern Europe
−10
−5
[email protected]
0
PC1 38.88%
!OBS! Arbitrary selection if only 2
cluster members
13 / 16
CERFACS−CNRM−CM5_RCA4
Selecting an optimum subset
5
Introduction
Method
Experiments
Summary
Set-up
Set-up
Results
Results
CCS of temperature is dominating the clustering in all
combinations
always the same two basic clusters
tas domination is not visible in the finally selected models
Influence of CCS of clim. indices is bigger at end of century
Choice of period influences the final selection
14 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
Summary
Outlook
Summary
Model “quality” control should be done before hand
Selection is period dependent
When tas included, it dominates the clustering
Selection is application dependent (there is no ONE answer)
Selection result is just a suggestion → do not use it blindly
Assuming the models work equally well in projecting possible
future climates
15 / 16
[email protected]
Selecting an optimum subset
Introduction
Method
Experiments
Summary
Summary
Outlook
Outlook
Selection needs to be compared to when used on bias corrected
data
More different regions needs to be tested
Include information about single periods (non CCS)
Direct test with impact models
Study will result in an automated web application to support
climate and impact researchers
16 / 16
[email protected]
Selecting an optimum subset