Statistics as a means to
construct knowledge in
climate and related sciences
-- a discourse --
Hans von Storch
Institute for Coastal Research
GKSS, Germany
9IMSC, Cape Town, 24-28 May 2004
The basic approach …
… is to combine systematically
empirical knowledge (“data”)
with
dynamical knowledge (“models”)
in order to determine
• characteristic parameters (“inference”)
• consistency of models and data (“testing”)
The knowledge represented by data and the knowledge
represented by models are both uncertain.
This uncertainty forces us to resort to statistical
concepts.
The resulting additional knowledge is
• best guesses of numbers (ideally together with
confidence intervals)
• evaluation of the consistency of theoretical
concepts with observational evidence.
These new knowledge claims depend on the
amount of available data.
In general: if more data are available, the
confidence in the numbers increases, but the
consistency of the concepts with the data decreases,
because tests then detect even small discrepancies.
In general, the problem may be conceptualized by
the state space formalism, with
- a state space equation, e.g.,
$\Psi_{t+1} = F(\Psi_t, \alpha, \eta) + \varepsilon$   (M)
with the state variable Ψt, external parameters η
and internal parameters α. The term ε is a random
component, which supposedly represents the
uncertainty of the model M.
- an observation equation
$x_t = B(\Psi_t) + \delta$   (B)
with the observable x, and the random component δ.
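The two equations (M) and (B) can be illustrated with a minimal Python sketch. The linear form F = alpha * psi + eta, the identity observation operator B, and all numerical values are assumptions chosen only for illustration; they do not come from the talk.

```python
import numpy as np

def simulate(n_steps, alpha=0.8, eta=0.0, sigma_eps=0.5, sigma_delta=0.3, seed=0):
    """Simulate a scalar state-space model.

    (M) psi[t+1] = F(psi[t], alpha, eta) + eps, with F = alpha * psi + eta (assumed)
    (B) x[t]     = B(psi[t]) + delta,           with B = identity (assumed)
    """
    rng = np.random.default_rng(seed)
    psi = np.zeros(n_steps)
    for t in range(n_steps - 1):
        psi[t + 1] = alpha * psi[t] + eta + rng.normal(0.0, sigma_eps)
    x = psi + rng.normal(0.0, sigma_delta, size=n_steps)   # noisy observable
    return psi, x

psi, x = simulate(5000)
# The stationary variance of psi is sigma_eps**2 / (1 - alpha**2);
# the observable x carries the additional variance sigma_delta**2.
print(psi.var(), x.var())
```

In this sketch the state psi is known because it was simulated; in applications only x is available, which is what the examples that follow deal with.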
Examples:
1. Goodness of fit
2. Extreme value
3. PIPs and POPs
4. Downscaling
5. Detection and attribution
6. Determination of parameters
7. Analysis
1. Goodness of fit
(M) $u \sim W(\alpha, \beta)$, Weibull distribution
$\alpha$ shape; $\beta$ scale parameter
$f_W(x) = \frac{\alpha}{\beta} \left( \frac{x}{\beta} \right)^{\alpha - 1} e^{-(x/\beta)^{\alpha}}$
(B) in case of wind in the extratropics
OWS M: good fit with $\alpha = 2.64$ (JJA) or 3.04 (DJF)
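A goodness-of-fit check of this kind can be sketched in Python with a Weibull probability plot: if the data are Weibull distributed, log(-log(1 - p)) is a straight line in log(u). The sketch below uses synthetic wind speeds, not the OWS M record. Reading the quoted value 2.64 as the shape parameter, and the scale of 8.0, are assumptions made only for this illustration.

```python
import numpy as np

def weibull_plot_fit(u):
    """Estimate Weibull shape and scale by a straight-line fit on a probability plot."""
    u = np.sort(np.asarray(u, dtype=float))
    n = u.size
    p = (np.arange(1, n + 1) - 0.5) / n      # plotting positions
    y = np.log(-np.log(1.0 - p))             # linear in log(u) if u is Weibull
    lx = np.log(u)
    slope, intercept = np.polyfit(lx, y, 1)
    shape = slope
    scale = np.exp(-intercept / slope)
    r = np.corrcoef(lx, y)[0, 1]             # informal measure of linearity
    return shape, scale, r

rng = np.random.default_rng(1)
u = 8.0 * rng.weibull(2.64, size=5000)       # synthetic "wind speeds" (assumed values)
shape, scale, r = weibull_plot_fit(u)
print(shape, scale, r)
```

A correlation r close to 1 indicates that the Weibull model is consistent with the sample; it is an informal diagnostic, not a formal test.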
2. Extreme values
Long memory?
(M) $\gamma(\tau) \sim \tau^{-k}$ with $0 < k < 1$
($\gamma$ being the autocovariance function)
(B) $P_q(r) \sim \frac{1}{R_q} e^{-(r/R_q)^k}$ with $r > 0$
the probability density function of the waiting time r
between two events of exceeding the level q.
$R_q = E[r]$ is the expected waiting time.
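The effect of long memory on waiting times can be explored with a short Python sketch. It generates a Gaussian series whose autocovariance decays like tau**(-k) by Fourier filtering white noise, which is one common recipe and not necessarily the one used by Bunde et al. It then compares the mean waiting time that follows a short previous waiting time with the one that follows a long previous waiting time. The series length, the 95% level and the helper names are assumptions for illustration.

```python
import numpy as np

def long_memory_series(n, k, rng):
    """Gaussian series with autocovariance ~ tau**(-k), by Fourier filtering white noise."""
    freqs = np.fft.rfftfreq(n)
    amp = np.zeros_like(freqs)
    amp[1:] = freqs[1:] ** (-(1.0 - k) / 2.0)    # power spectrum ~ f**-(1-k)
    spec = np.fft.rfft(rng.standard_normal(n)) * amp
    x = np.fft.irfft(spec, n)
    return (x - x.mean()) / x.std()

def return_intervals(x, quantile):
    """Waiting times r between consecutive exceedances of the given quantile level."""
    q = np.quantile(x, quantile)
    times = np.flatnonzero(x > q)
    return np.diff(times)

def conditional_means(r):
    """Mean waiting time after a short and after a long previous waiting time r0."""
    prev, nxt = r[:-1], r[1:]
    med = np.median(prev)
    return nxt[prev <= med].mean(), nxt[prev > med].mean()

rng = np.random.default_rng(2)
n = 2 ** 16
r_lm = return_intervals(long_memory_series(n, 0.4, rng), 0.95)
r_wn = return_intervals(rng.standard_normal(n), 0.95)
print("R_q:", r_lm.mean(), r_wn.mean())
print("long memory (after short r0, after long r0):", conditional_means(r_lm))
print("white noise (after short r0, after long r0):", conditional_means(r_wn))
```

For white noise the two conditional means should be about equal, while for the long-memory series a long waiting time tends to be followed by another long one.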
Figure: Distribution Pq(r) of return times r between consecutive extreme values; Rq is the expected value. Panels: synthetic example with k = 0.4; annual water levels of the Nile, 722-1284.
Bunde et al., 2004: Return intervals of rare events in records with long-term persistence …
Significance:
Extremes are not distributed uniformly
in time, as a Poisson process would
imply, but appear in clusters.
Figure: Expected waiting time for the next exceedance event, conditional upon the length r0 of the previous waiting time. Panels: synthetic examples with k = 0.4; annual water levels of the Nile, 722-1284.
Bunde et al., 2004: Return intervals of rare events in records with long-term persistence …
3. PIPs …
State space equation in low-dimensional subspace
(M) $\Psi_{t+1} = F(\Psi_t, \alpha, \eta) + \varepsilon$
Observational equation in high-dimensional space
(B) $x_t = P \Psi_t + \delta$
Parameters P, α determined such that
$E \| x_{t+1} - P F(\Psi_t, \alpha, \eta) \|^2 = \min$
… and POPs
Special form
(M) $\Psi_{t+1} = \lambda \Psi_t + \varepsilon$
(B) $x_t = P \Psi_t$
Ψ, λ complex numbers;
(M) describes the damped
rotation in a 2-dimensional space
spanned by complex eigenvectors
of $E(x_{t+1} x_t^T) \, E(x_t x_t^T)^{-1}$. All
eigenvectors form $P^T$.
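The POP construction can be sketched in Python: estimate the lag-1 and lag-0 covariance matrices, form their product as above, and read the oscillation period and damping time off the leading complex eigenvalue. The data below are synthetic (one damped rotation with a period of 40 time steps embedded in 5 dimensions); the function name pop_analysis and all numbers are assumptions for illustration.

```python
import numpy as np

def pop_analysis(x):
    """Estimate POPs from anomalies x of shape (n_time, n_space)."""
    x0, x1 = x[:-1], x[1:]
    c0 = x0.T @ x0 / len(x0)            # sample estimate of E(x_t x_t^T)
    c1 = x1.T @ x0 / len(x0)            # sample estimate of E(x_{t+1} x_t^T)
    a = c1 @ np.linalg.inv(c0)
    lam, patterns = np.linalg.eig(a)    # complex eigenvalues and eigenvectors
    order = np.argsort(-np.abs(lam))    # least damped mode first
    return lam[order], patterns[:, order]

# Synthetic data: one damped rotation (period 40 steps) seen in 5 dimensions.
rng = np.random.default_rng(3)
n, m = 20000, 5
lam_true = 0.95 * np.exp(2j * np.pi / 40.0)
noise = rng.standard_normal(n) + 1j * rng.standard_normal(n)
psi = np.zeros(n, dtype=complex)
for t in range(n - 1):
    psi[t + 1] = lam_true * psi[t] + noise[t]
q, _ = np.linalg.qr(rng.standard_normal((m, 2)))   # two orthonormal spatial patterns
x = np.outer(psi.real, q[:, 0]) + np.outer(psi.imag, q[:, 1])
x += 0.1 * rng.standard_normal((n, m))             # small observational noise
x -= x.mean(axis=0)

lam, patterns = pop_analysis(x)
period = 2.0 * np.pi / abs(np.angle(lam[0]))       # oscillation period in time steps
e_folding = -1.0 / np.log(abs(lam[0]))             # damping time in time steps
print(period, e_folding)
```

The real and imaginary parts of the leading eigenvector play the role of the two spatial patterns between which the signal rotates.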
Example: POP of MJO
Figure: Real and imaginary part of the spatial pattern in equatorial velocity potential at 200 hPa; 10-day forecast using the state space equation in 2-d space.
von Storch, H. and J.S. Xu, 1990: Principal Oscillation Pattern Analysis of the Tropical 30- to 60-day Oscillation:
Part I: Definition of an Index and its Prediction. - Climate Dyn. 4, 175-190
4. Downscaling
The state space is
simulated by “reality”
or by GCMs.
The observation
equation relates large-scale variables, which
are supposedly well
observed (analysed) or
simulated, to variables
with relevant impact
for clients.
Example: snowdrops
Figure: Large-scale state: JFM mean temperature anomaly; flowering date anomaly of the snowdrop (Galanthus nivalis).
Maak, K. and H. von Storch, 1997: Statistical downscaling of monthly mean
air temperature to the beginning of the flowering of Galanthus nivalis L. in
Northern Germany. - Intern. J. Biometeor. 41, 5-12
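In its simplest form, the observation equation of such a downscaling exercise is a regression of the local variable on the large-scale state. The Python sketch below fits a straight line to synthetic data; the data are NOT those of Maak and von Storch (1997), and the sensitivity of -6 days per kelvin and the noise levels are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic training data (hypothetical numbers, not the published data set).
t_anom = rng.normal(0.0, 1.5, size=60)                      # JFM temperature anomaly [K]
date_anom = -6.0 * t_anom + rng.normal(0.0, 4.0, size=60)   # flowering date anomaly [days]

# Fit the linear observation equation: date_anom = slope * t_anom + intercept + noise.
slope, intercept = np.polyfit(t_anom, date_anom, 1)

def downscale(t_large_scale):
    """Map large-scale temperature anomalies to local flowering date anomalies."""
    return slope * np.asarray(t_large_scale, dtype=float) + intercept

print(slope, intercept, downscale([1.0, 2.0]))
```

Once fitted, the same relation can be applied to large-scale temperatures taken from analyses or from GCM output.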
5. Detection and attribution
The state space dynamics is given by
the assumption that the complete
state of the atmosphere may be given
by
(M) $\Psi_t = \sum_k a_k(t) \, g_k + \varepsilon$
The “patterns” $g_k$ represent the
effect of a series of external
influences, while ε represents the
internal variability of the climate
system. Ψ describes the full 3-d
dynamics of the climate system.
The observation equation is
formulated in a parameter space (A),
and the state variable is projected on
a space of observed variables (L[Ψ])
(B) $A_k = L(\Psi_t)^T g_k^{r,ad}$
Here, L is the projector of the full
space on the space of observed (and
considered) variables, and $g_k^{r,ad}$ is the
adjoint pattern of $g_k$ in the reduced
space.
Detection means to test the null
hypothesis
$H_0 : A_k = 0$
while attribution means the
assessment that
$A_k$ is consistent with $a_k$
(i.e. $A_k$ lies in a suitably small
confidence “interval” of $a_k$).
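A detection test of this type can be sketched in Python for a single signal pattern. The adjoint pattern is taken here as C^-1 g / (g^T C^-1 g), with C the covariance of internal variability, which is a common "optimal fingerprint" choice and an assumption of this sketch. Two independent control samples are used, one to estimate C and one to estimate the spread of the amplitude under the null hypothesis. All patterns, variances and amplitudes are hypothetical, and the test assumes Gaussian variability.

```python
import numpy as np

def adjoint_pattern(g, cov):
    """Adjoint pattern g_ad = C^-1 g / (g^T C^-1 g), normalised so that g_ad @ g == 1."""
    w = np.linalg.solve(cov, g)
    return w / (g @ w)

def detect(y, g, control_a, control_b, z_crit=1.96):
    """Test H0: A = 0 for the amplitude A = g_ad @ y."""
    cov = np.cov(control_a, rowvar=False)        # internal variability, first control sample
    g_ad = adjoint_pattern(g, cov)
    amp = g_ad @ y
    null_sd = (control_b @ g_ad).std(ddof=1)     # spread of A under H0, second control sample
    return amp, null_sd, bool(abs(amp) > z_crit * null_sd)

rng = np.random.default_rng(5)
m = 5
sd = np.sqrt(np.array([1.0, 2.0, 0.5, 1.0, 3.0]))   # hypothetical internal variability
g = np.ones(m) / np.sqrt(m)                         # hypothetical signal pattern
control_a = sd * rng.standard_normal((500, m))
control_b = sd * rng.standard_normal((500, m))
y = 8.0 * g + sd * rng.standard_normal(m)           # "observations" with true amplitude 8

amp, null_sd, detected = detect(y, g, control_a, control_b)
print(amp, null_sd, detected)
```

Attribution would go one step further and ask whether the estimated amplitude is consistent with the amplitude expected from a forced model simulation.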
Detection and attribution (cont’d)
Figure: Attribution diagram for observed 50-year trends in JJA mean temperature.
The ellipsoids enclose non-rejection regions for testing
the null hypothesis that the 2-dimensional vector of signal
amplitudes estimated from observations has the same
distribution as the corresponding signal amplitudes
estimated from the simulated 1946-95 trends in the
greenhouse gas, greenhouse gas plus aerosol and solar
forcing experiments.
Courtesy G. Hegerl.
Zwiers, F.W., 1999: The detection of climate change. In: H. von Storch and G. Flöser (Eds.):
Anthropogenic Climate Change. Springer Verlag, 163-209, ISBN 3-540-65033-4
6. Determination of parameters
In general, when many observations are available,
optimal parameters α may be determined by
finding those α which minimize the functional
$E \| x_{t+1} - B(F(\Psi_t, \alpha, \eta)) \|^2$
Example: Determination of
parameters – oceanic dissipation
Figure: M2 tidal dissipation rates, estimated by combining Topex/Poseidon
altimeter data with a hydrodynamical tide model. The solid
lines encircle high dissipation areas in the deep ocean. From Egbert and
Ray (2000).
Egbert GD, Ray RD (2000) Significant dissipation of tidal energy in the deep
ocean inferred from satellite altimeter data. Nature 405:775-778
7. Analysis
Skillful estimates of the unknown
field Ψt are obtained by
integrating the state-space
equation and the observation
equation forward in time:
$\Psi^*_{t+1} = F(\Psi_t, \alpha, \eta)$
$x^*_{t+1} = B(\Psi^*_{t+1})$
and, as best guess
$\Psi_{t+1} = \Psi^*_{t+1} + K (x^*_{t+1} - x_{t+1})$
Example: spectral
nudging in RCMs
State space equation:
RCM
Observable xt: large-scale
features, provided by
analyses or GCM output.
Correction step: nudging
the large scales in the spectral
domain
Figure: Percentile-percentile diagram of local wind at an
ocean location as recorded by a local buoy and as
simulated in an RCM constrained by lateral control
only, and constrained by spectral nudging.
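The correction step of spectral nudging can be sketched in Python for a one-dimensional periodic field: only the Fourier coefficients up to a cut-off wavenumber are relaxed toward the driving large-scale field, while the smaller scales are left to the regional model. The function name, the cut-off k_max and the nudging strength are hypothetical, and real RCM implementations operate on two-dimensional fields with height-dependent coefficients.

```python
import numpy as np

def spectral_nudge(regional, driving, k_max, strength):
    """Relax wavenumbers 0..k_max of the regional field toward the driving field."""
    r_hat = np.fft.rfft(regional)
    d_hat = np.fft.rfft(driving)
    r_hat[: k_max + 1] += strength * (d_hat[: k_max + 1] - r_hat[: k_max + 1])
    return np.fft.irfft(r_hat, len(regional))

n = 128
xg = 2.0 * np.pi * np.arange(n) / n
driving = np.sin(xg)                                       # large-scale field from analysis or GCM
regional = 0.5 * np.sin(xg + 0.4) + 0.3 * np.sin(20 * xg)  # drifted large scale plus small-scale detail

nudged = spectral_nudge(regional, driving, k_max=3, strength=1.0)
# With strength 1 the large scale is replaced by the driving field,
# while the small-scale wave 20 is retained unchanged.
print(np.max(np.abs(nudged - (np.sin(xg) + 0.3 * np.sin(20 * xg)))))
```

A strength between 0 and 1 gives a partial relaxation, which is closer to how nudging is applied at every time step of a model integration.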
The purpose of statistics is …
• to specify pre-defined “models” of reality by fitting
characteristic numbers to observational evidence.
→ developing and extending models and theories
• to analyze states and changes by interpreting
empirical evidence in light of a pre-specified model.
→ monitoring weather (and climate)
• to test theories and models as to whether they are
valid in light of the empirical evidence.
→ falsifying theories and models
Potential of “professional statisticians”
The specification of the models is usually not a statistical
problem, but needs guidance by dynamical knowledge.
Therefore, when applying advanced methods in climate
science, “professional” statisticians often fail to achieve
significant knowledge gains.
We need market places, where
a) method-driven mathematical (and theoretical physics)
statisticians meet problem-driven people from climate
science
b) other problem-driven scientists (e.g., geostatisticians,
econometricians) meet climate scientists, to allow the export
of their methods to climate science.
So what?