Estimation of positive sum-to-one constrained
parameters with ensemble-based Kalman filters:
application to an ocean ecosystem model
Ehouarn Simon (1), Annette Samuelsen (1), Laurent Bertino (1), Dany Dumont (2)
(1) Nansen Environmental and Remote Sensing Center, Norway
(2) Institut des sciences de la mer, Université du Québec à Rimouski, Canada
16 November 2012
Outline
1. Estimation of positive sum-to-one constrained parameters with ensemble-based Kalman filters
2. Numerical results in GOTM-NORWECOM
Assimilation of ocean color data
[Figure: surface chlorophyll-a (CHLA) on 13-08-2008: forecast mean and observations.]
[Figure: time evolution of the errors: RMS error and standard deviation of the ensemble (mg/m3), January 2007 to January 2010.]
Difficulties
Nonlinear dynamics and positive variables (tracer concentration).
Numerous poorly known parameters.
Seasonal lack of observations (Arctic) and large error (satellite), few in-situ data.
Diet of herbivores and predators?
Zooplankton grazing preferences (relative diet).
▷ No prior knowledge; one set of values for the whole domain.
Adaptation of zooplankton to their local environment?
Non-Gaussianity and parameter estimation with the EnKF
Truncations of negative values
Potential depletion of components of the ensemble
▷ Large corrections on parameters in erroneous directions.
[Figure: two panels titled "Grazing efficiency f", each showing the mean and standard deviations (d^-1) against time (d), from 0 to 1500 days.]
Gaussian anamorphosis extension of ensemble-based Kalman filters
Analysis with transformed variables and observations
(Bertino et al., 2003)
Applicability of the methods in large systems
Effectiveness of the Gaussian anamorphosis extension of the EnKF to estimate
parameters in nonlinear frameworks of positive variables.
1. Estimation of positive sum-to-one constrained parameters with ensemble-based Kalman filters
Positive sum-to-one constrained parameters $(\pi_i)_{i=1:N}$
Estimation
Constraints:
\[ \forall i = 1:N, \quad \pi_i > 0 \quad \text{and} \quad \sum_{i=1}^{N} \pi_i = 1 \]
▷ Cannot be handled by the EnKF in that form.
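Dirichlet distribution of order N
\[ \forall i = 1:N, \quad \pi_i = \frac{\phi_i}{\sum_{k=1}^{N} \phi_k} \quad \text{with} \quad \phi_i \sim \Gamma(\theta_i, 1) \]
Gelman's formulation (1995)
\[ \forall i = 1:N, \quad \pi_i = \frac{e^{\phi_i}}{\sum_{k=1}^{N} e^{\phi_k}} \quad \text{with} \quad \phi_i \sim \mathcal{N}(\theta_i, \Sigma_i) \]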
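The two parameterizations above can be illustrated with a minimal Python sketch that uses only the standard library. The function names `dirichlet_preferences` and `gelman_preferences` are hypothetical, and the means and the common standard deviation of 0.5 used for the Gaussian draws are illustrative values, not settings from the experiments.

```python
import math
import random


def dirichlet_preferences(theta, rng):
    """Draw (pi_i) by normalizing independent Gamma(theta_i, 1) variables."""
    phi = [rng.gammavariate(t, 1.0) for t in theta]  # shape theta_i, scale 1
    total = sum(phi)
    return [p / total for p in phi]


def gelman_preferences(phi):
    """Map unconstrained phi_i to pi_i = exp(phi_i) / sum_k exp(phi_k)."""
    shift = max(phi)  # subtracting the maximum avoids overflow, result unchanged
    weights = [math.exp(p - shift) for p in phi]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
pi_dirichlet = dirichlet_preferences([2.0, 1.0, 1.0], rng)
# Gaussian phi_i with illustrative means and a common standard deviation of 0.5
phi = [rng.gauss(mean, 0.5) for mean in (0.0, 0.0, 0.0)]
pi_gelman = gelman_preferences(phi)
```

In both cases the constraints hold by construction. Only the second form keeps the transformed parameters Gaussian, which is the property the Kalman filter analysis relies on.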
Positive sum-to-one constrained parameters $(\pi_i)_{i=1:N}$
Hyperspherical coordinates: N − 1 parameters
\[
\begin{cases}
\pi_1 = \cos^2\left(\frac{\pi}{2}\phi_1\right) \\
\forall i = 2:N-1, \quad \pi_i = \prod_{k=1}^{i-1} \sin^2\left(\frac{\pi}{2}\phi_k\right) \cos^2\left(\frac{\pi}{2}\phi_i\right) \\
\pi_N = \prod_{k=1}^{N-2} \sin^2\left(\frac{\pi}{2}\phi_k\right) \sin^2\left(\frac{\pi}{2}\phi_{N-1}\right)
\end{cases}
\]
with $(\phi_i)_{i=1:N-1}$ distributed on the segment [0, 1].
Inversion of the hyperspherical coordinate system
Computed recursively, from $\phi_1$ to $\phi_{N-1}$.
Information on the distribution of the $(\pi_i)_{i=1:N}$, or available samples:
▷ Prior values for the $(\phi_i)_{i=1:N-1}$.
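The hyperspherical transformation and its recursive inversion can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the inversion assumes strictly positive preferences that sum to one.

```python
import math


def spherical_to_preferences(phi):
    """Map N-1 values phi_i in [0, 1] to N positive sum-to-one preferences."""
    prefs = []
    remaining = 1.0  # running product of sin^2(pi/2 * phi_k)
    for p in phi:
        angle = 0.5 * math.pi * p
        prefs.append(remaining * math.cos(angle) ** 2)
        remaining *= math.sin(angle) ** 2
    prefs.append(remaining)  # pi_N carries the last sin^2 factor
    return prefs


def preferences_to_spherical(prefs):
    """Recover the phi_i recursively (assumes strictly positive preferences)."""
    phi = []
    remaining = 1.0  # equals 1 minus the sum of the preferences already processed
    for p in prefs[:-1]:
        ratio = min(max(p / remaining, 0.0), 1.0)  # guard against round-off
        phi.append(2.0 / math.pi * math.acos(math.sqrt(ratio)))
        remaining -= p
    return phi


reference = [0.6, 0.15, 0.25]  # illustrative preferences summing to one
phi = preferences_to_spherical(reference)
recovered = spherical_to_preferences(phi)
```

The order of the preferences matters: the first one depends on $\phi_1$ only, while the last two share every factor, which is the asymmetry of this formulation.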
Expected values and variances of the $(\pi_i)_{i=1:N}$
Tuning of the parameters of the distributions of the $(\phi_i)_{i=1:N-1}$:
\[ \forall i = 1:N-1, \quad E[\pi_i] = m_i, \quad \mathrm{var}(\pi_i) = \sigma_i^2 \]
Assumption: the $(\phi_i)_{i=1:N-1} \sim (D_i([0,1], \Theta_i))_{i=1:N-1}$ are independent.
Expected values of the $(\pi_i)_{i=1:N-1}$
\[
\begin{cases}
\frac{1}{4}\left(E[e^{j\pi\phi_1}] + E[e^{-j\pi\phi_1}]\right) = m_1 - \frac{1}{2} \\
\forall i = 2:N-1, \quad \frac{1}{4}\left(E[e^{j\pi\phi_i}] + E[e^{-j\pi\phi_i}]\right) = \frac{m_i}{1 - \sum_{k=1}^{i-1} m_k} - \frac{1}{2}
\end{cases}
\]
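Since $\cos^2(\frac{\pi}{2}\phi) = \frac{1}{2} + \frac{1}{4}(e^{j\pi\phi} + e^{-j\pi\phi})$, each equation above fixes the value of $E[\cos^2(\frac{\pi}{2}\phi_i)]$. The sketch below computes these targets from prescribed means; the function names are hypothetical and the independence of the $\phi_i$ is assumed, as on the slide.

```python
def cos2_targets(means):
    """Return E[cos^2(pi/2 * phi_i)] needed so that E[pi_i] = m_i, i = 1..N-1.

    Assumes independent phi_i, so that E[pi_i] factorizes.
    """
    targets = []
    remaining = 1.0  # 1 - sum_{k<i} m_k
    for m in means[:-1]:
        targets.append(m / remaining)
        remaining -= m
    return targets


def char_function_terms(targets):
    """Right-hand sides of the system: (1/4)(E[e^{j pi phi_i}] + c.c.) = target - 1/2."""
    return [t - 0.5 for t in targets]


equal = [1.0 / 3.0] * 3  # equal preferences with N = 3
targets = cos2_targets(equal)
rhs = char_function_terms(targets)
```

For equal preferences the targets reduce to 1/(N − i + 1), here 1/3 and 1/2.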
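Variances of the $(\pi_i)_{i=1:N-1}$
\[
\begin{cases}
\frac{1}{16}\left(E[e^{2j\pi\phi_1}] + E[e^{-2j\pi\phi_1}]\right) = -\frac{3}{8} + \sigma_1^2 + m_1^2 - \frac{1}{4}\left(E[e^{j\pi\phi_1}] + E[e^{-j\pi\phi_1}]\right) \\
\forall i = 2:N-1, \quad \frac{1}{16}\left(E[e^{2j\pi\phi_i}] + E[e^{-2j\pi\phi_i}]\right) = -\frac{3}{8} - \frac{1}{4}\left(E[e^{j\pi\phi_i}] + E[e^{-j\pi\phi_i}]\right) + \frac{\sigma_i^2 + m_i^2}{\sum_{k=1}^{i} (-2)^{k-1} \left(\sigma_{i-k}^2 + m_{i-k}^2\right) \prod_{l=1}^{k-1} \frac{1}{4}\left(E[e^{j\pi\phi_{i-l}}] + E[e^{-j\pi\phi_{i-l}}]\right)}
\end{cases}
\]
Here the $k = i$ term uses the convention $\sigma_0^2 + m_0^2 = 1$, which appears to be implied by the derivation.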
Example with the triangular distribution
Triangular distribution: $\phi_i \sim T(0, 1, c_i)$
Tuning of the mode $c_i$.
Characteristic function:
\[ \forall t \in \mathbb{R} \setminus \{0\}, \quad E[e^{jt\phi_i}] = -2\,\frac{(1 - c_i) - e^{j c_i t} + c_i e^{jt}}{t^2 c_i (1 - c_i)} \]
Equal preferences: $E[\pi_i] = \frac{1}{N}$
A system of N − 1 nonlinear equations:
\[ \forall i = 1:N-1, \quad \frac{\cos(\pi c_i) + 2c_i - 1}{\pi^2 c_i (1 - c_i)} + \frac{N - i - 1}{2(N - i + 1)} = 0. \]
N > 3: no solution exists.
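The existence claim can be checked numerically with a small bisection solver. This is a sketch under stated assumptions: the function names are hypothetical, the search interval stops just short of 0 and 1 to avoid the removable singularities, and the left-hand side is assumed to be monotone in $c_i$ on (0, 1), so that the absence of a sign change means there is no root.

```python
import math


def mode_equation(c, n, i):
    """Left-hand side of the i-th equation for a triangular mode c in (0, 1)."""
    g = (math.cos(math.pi * c) + 2.0 * c - 1.0) / (math.pi ** 2 * c * (1.0 - c))
    return g + (n - i - 1) / (2.0 * (n - i + 1))


def solve_mode(n, i, tol=1e-12):
    """Bisection on (0, 1); returns None when the equation has no sign change."""
    lo, hi = 1e-6, 1.0 - 1e-6
    f_lo, f_hi = mode_equation(lo, n, i), mode_equation(hi, n, i)
    if f_lo * f_hi > 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = mode_equation(mid, n, i)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


modes_n3 = [solve_mode(3, i) for i in (1, 2)]  # both equations for N = 3
mode_n4 = solve_mode(4, 1)  # first equation for N = 4
```

With N = 3 both equations have a root in (0, 1), whereas the first equation for N = 4 shows no sign change on the interval.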
2. Numerical results in GOTM-NORWECOM
GOTM-NORWECOM-ASSIM
Assimilation
Deterministic ensemble Kalman filter (DEnKF, Sakov and Oke, 2008).
State vector: biogeochemical state vector and parameters.
100 members.
Gaussian anamorphosis (Bertino et al., 2003) for the biogeochemical state
variables (and for the parameters in the spherical formulation).
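For orientation, a DEnKF analysis step for a single directly observed state component can be sketched as below, following the standard formulation of Sakov and Oke (2008). It is a toy illustration, not the GOTM-NORWECOM-ASSIM implementation: the function name `denkf_update` is hypothetical, and the anamorphosis is left out.

```python
def denkf_update(ensemble, obs_index, y, obs_var):
    """DEnKF analysis for one scalar observation of state component obs_index.

    ensemble: list of members, each a list of state (and parameter) values.
    """
    n = len(ensemble)
    dim = len(ensemble[0])
    mean = [sum(member[j] for member in ensemble) / n for j in range(dim)]
    anomalies = [[member[j] - mean[j] for j in range(dim)] for member in ensemble]

    # Ensemble covariance between every component and the observed one.
    cov = [sum(a[j] * a[obs_index] for a in anomalies) / (n - 1) for j in range(dim)]
    gain = [c / (cov[obs_index] + obs_var) for c in cov]  # Kalman gain column

    innovation = y - mean[obs_index]
    mean_a = [mean[j] + gain[j] * innovation for j in range(dim)]

    # DEnKF: anomalies are updated with half the Kalman gain, no perturbed observations.
    return [
        [mean_a[j] + a[j] - 0.5 * gain[j] * a[obs_index] for j in range(dim)]
        for a in anomalies
    ]


forecast = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # three members, two components
analysis = denkf_update(forecast, obs_index=0, y=4.0, obs_var=1.0)
```

The second component plays the role of an unobserved parameter: it is corrected only through its ensemble covariance with the observed variable, which is how augmenting the state vector with parameters allows them to be estimated.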
Reference solution $x^t$
Deterministic simulation with different preferences:
▷ Mesozooplankton: πdia = 0.6, πmic = 0.15, πden = 0.25.
▷ Microzooplankton: πfla = 0.6, πden = 0.15, πdia = 0.25.
[Figure: reference solution, chlorophyll and nitrate.]
The data assimilation system
Configuration of the experiments
One year to warm up the ensemble, then four years with assimilation.
Observations: $y = x^t_P \ast G$, with $G \sim \Gamma\left(\frac{1}{\sigma_o^2}, \sigma_o^2\right)$, $\sigma_o = 0.3$.
▷ Surface chlorophyll (first two layers) every seven days.
Truncated-Gaussian perturbations in the whole water column every twelve hours:
phytoplankton and zooplankton only.
Robustness of the estimation: experiments repeated 20 times.
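The multiplicative observation noise can be sketched as follows, assuming that $\Gamma(\cdot, \cdot)$ is read as (shape, scale), which is not stated explicitly. Under that assumption G has mean 1 and standard deviation $\sigma_o$, so observations stay positive with a 30% relative error. The function name `gamma_observation` is hypothetical.

```python
import random


def gamma_observation(truth, sigma_o, rng):
    """Return truth * G with G ~ Gamma(shape=1/sigma_o^2, scale=sigma_o^2)."""
    shape = 1.0 / sigma_o ** 2
    scale = sigma_o ** 2
    return truth * rng.gammavariate(shape, scale)  # E[G] = 1, std(G) = sigma_o


rng = random.Random(42)
sigma_o = 0.3
samples = [gamma_observation(1.0, sigma_o, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)) ** 0.5
```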
[Figure: chlorophyll concentration (mg/m3) in layers 1 and 2, January 2000 to January 2004: reference, reference at observation times, and observations.]
Preferences: Gelman's formulation, $\pi_i = \frac{e^{\phi_i}}{\sum_{k=1}^{N} e^{\phi_k}}$
[Figure: six panels showing the mean and standard deviations of the estimated preferences, January 2000 to January 2004. Mesozooplankton: diatoms (πdia), microzooplankton (πmic), detritus (N) (πden). Microzooplankton: flagellates (πfla), detritus (N) (πden), diatoms (πdia).]
Preferences: spherical formulation
[Figure: six panels showing the mean and standard deviations of the estimated preferences, January 2000 to January 2004. Mesozooplankton: diatoms (πdia), microzooplankton (πmic), detritus (N) (πden). Microzooplankton: flagellates (πfla), detritus (N) (πden), diatoms (πdia).]
Preferences: ternary plots of the final estimates
[Figure: ternary plots for Gelman's formulation and the spherical formulation. Mesozooplankton axes: diatoms, MicroZ, detritus. Microzooplankton axes: diatoms, flagellates, detritus. Markers: true, initial, DEnKF.]
Spherical formulation: asymmetry of the transformation
Robustness of the results to the matching between preferences and transformed parameters
Same 20 experiments (observations, prior ensembles).
Microzooplankton: the assignment of preferences to transformed parameters is
permuted every four experiments.
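Mesozooplankton
\[
\begin{cases}
\pi_{DIA} = \cos^2\left(\frac{\pi}{2}\phi_1\right) \\
\pi_{MIC} = \sin^2\left(\frac{\pi}{2}\phi_1\right) \cos^2\left(\frac{\pi}{2}\phi_2\right) \\
\pi_{DET} = \sin^2\left(\frac{\pi}{2}\phi_1\right) \sin^2\left(\frac{\pi}{2}\phi_2\right)
\end{cases}
\]
Microzooplankton
\[
\begin{cases}
\pi_{DET} = \cos^2\left(\frac{\pi}{2}\phi_1\right) \\
\pi_{FLA} = \sin^2\left(\frac{\pi}{2}\phi_1\right) \cos^2\left(\frac{\pi}{2}\phi_2\right) \\
\pi_{DIA} = \sin^2\left(\frac{\pi}{2}\phi_1\right) \sin^2\left(\frac{\pi}{2}\phi_2\right)
\end{cases}
\]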
Microzooplankton: marginal improvement of the estimates of the preferences.
Mesozooplankton: slight degradation of the estimation.
RMS error (Mesozooplankton)
Diet       Prior (%)   Gelman   Spherical   Spher. + perm.
Diatoms    45          35       33.3        36.7
MicroZ     120         146.6    86.7        133.3
Detritus   32          44       48          52
[Figure: ternary plot for mesozooplankton (diatoms, MicroZ, detritus) showing the initial, DEnKF and true values.]
Conclusion and perspectives
Two approaches for estimating positive sum-to-one constrained
parameters.
▷ Gelman's formulation: Gaussian parameters, symmetry of the
transformation, N transformed parameters.
▷ Spherical formulation: N − 1 parameters to estimate, asymmetry of the
transformation, distribution of the transformed parameters?
Preliminary experiments indicate that the zooplankton grazing
preferences could be estimated with both approaches.
Quite simple framework.
Further investigation is needed:
▷ Distribution.
▷ Additional parameters to estimate.
▷ Real observations (Mike station).
Thank you!
Bibliography
Bertino L., Evensen G. and Wackernagel H.: Sequential Data Assimilation Techniques
in Oceanography, International Statistical Review, 71, 223-241, 2003.
Bocquet M., Pires C.A. and Wu L.: Beyond Gaussian statistical modeling in
geophysical data assimilation, Monthly Weather Review, 138, 8, 2997-3023, 2010.
Doron M., Brasseur P. and Brankart J.-M.: Stochastic estimation of biogeochemical
parameters of a 3D ocean coupled physical-biogeochemical model: Twin experiments.
Journal of Marine Systems, 87 (3-4), 194-207, 2011.
Gelman A.: Method of Moments Using Monte Carlo Simulation. Journal of
Computational and Graphical Statistics, 4, 1, 36-54, 1995.
Gelman A., Bois F. and Jiang J.: Physiological Pharmacokinetic Analysis Using
Population Modeling and Informative Prior Distributions. Journal of the American
Statistical Association, 91, 436, 1400-1412, 1996.
Simon E. and Bertino L.: Gaussian anamorphosis extension of the DEnKF for combined
state and parameter estimation: application to a 1D ocean ecosystem model, Journal of
Marine Systems, 89, 1-18, 2012.
Simon E., Samuelsen A., Bertino L. and Dumont D.: Estimation of positive
sum-to-one constrained zooplankton grazing preferences with the DEnKF: a twin
experiment, Ocean Science, 8, 587-602, 2012.
Construction of a monovariate anamorphosis:
Z (x) = ψ(Y (x))
Empirical anamorphosis
Empirical marginal distribution of Z (x): sample (zi )i =1:N .
Cumulative distribution function of Y (x): G .
\[ \psi(y) = \sum_{i=1}^{N} z_i \, \mathbf{1}_{\left]G^{-1}\left(\frac{i-1}{N}\right),\, G^{-1}\left(\frac{i}{N}\right)\right]}(y) \]
[Figure: three panels plotting physical values (mg/m3) against Gaussian values: 1- Empirical anamorphosis, 2- Interpolation, 3- Definition of the tails.]
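The first step, the empirical anamorphosis, can be sketched in Python as a step function built from the sorted sample. This sketch assumes that G is the standard normal distribution function and covers neither the interpolation nor the definition of the tails; the function name `empirical_anamorphosis` is hypothetical.

```python
import math
from statistics import NormalDist


def empirical_anamorphosis(sample):
    """Return psi with psi(y) = z_(i) when G(y) lies in ((i-1)/N, i/N]."""
    z = sorted(sample)
    n = len(z)
    gaussian = NormalDist()  # assumption: G is the standard normal CDF

    def psi(y):
        u = gaussian.cdf(y)
        i = min(max(math.ceil(u * n), 1), n)  # clip to a valid rank in 1..N
        return z[i - 1]

    return psi


psi = empirical_anamorphosis([3.0, 1.0, 2.0, 4.0])  # illustrative physical values
median_value = psi(0.0)
```

Because the result is piecewise constant and bounded by the sample extremes, it cannot be inverted as is, which is why the interpolation and tail definition steps follow.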
Major issues
Choice of the physical data set: specification of the likely and unlikely values.
Dependence on a reference model run (model bias, extreme events).