
Stochastic solution of large least squares systems
in variational data assimilation
Parallel 4DVAR without tangents and adjoints
J. Mandel, S. Gratton, and E. Bergou
University of Colorado Denver
Institute of Computer Science, Czech Academy of Sciences
INP-ENSEEIHT and CERFACS, Toulouse, France
Supported by Fondation STAE project ADTAO
NSF grants AGS-0835579 and DMS-1216481
Czech Grant Agency project 13-34856S
Preconditioned Iterative Methods, Prague 2013
Data assimilation

- Standard computing: data in, results out.
- Not much has changed in principle since we were taking punched cards and magnetic tapes to the mainframe.
- Data assimilation: combine data with a running model.
- Cast as a Bayesian update or a least squares problem. Used already in Apollo 11 navigation, now in GPS, and standard in numerical weather prediction.
Weak Constraint 4DVAR

- We want to determine x_0, ..., x_k (x_i = state at time i) approximately from the model and the observations (data):

    x_0 ≈ x_b                (state at time 0 ≈ the background)
    x_i ≈ M_i(x_{i-1})       (state evolution ≈ by the model)
    H_i(x_i) ≈ y_i           (value of the observation operator ≈ the data)

- Quantify "≈" by covariances:

    x_0 ≈ x_b  ⇔  ‖x_0 − x_b‖²_{B^{-1}} = (x_0 − x_b)^T B^{-1} (x_0 − x_b) ≈ 0,  etc.

- ⇒ the nonlinear least squares problem

    ‖x_0 − x_b‖²_{B^{-1}} + ∑_{i=1}^k ‖x_i − M_i(x_{i-1})‖²_{Q_i^{-1}} + ∑_{i=1}^k ‖y_i − H_i(x_i)‖²_{R_i^{-1}} → min over x_{0:k}
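
To make the cost function concrete, here is a minimal sketch of evaluating it with NumPy; the list-based API and all names are illustrative assumptions, not part of the talk.

```python
import numpy as np

def sqnorm(v, C):
    """||v||^2_{C^{-1}} = v^T C^{-1} v, without forming C^{-1} explicitly."""
    return float(v @ np.linalg.solve(C, v))

def wc4dvar_cost(x, xb, B, M, Q, H, R, y):
    """Weak-constraint 4DVAR cost for a trajectory x = [x_0, ..., x_k].
    M[i-1], H[i-1], Q[i-1], R[i-1], y[i-1] are the model, the observation
    operator, the covariances, and the data for step i = 1, ..., k (sketch API)."""
    J = sqnorm(x[0] - xb, B)                          # background term
    for i in range(1, len(x)):
        J += sqnorm(x[i] - M[i-1](x[i-1]), Q[i-1])    # model error term
        J += sqnorm(y[i-1] - H[i-1](x[i]), R[i-1])    # data misfit term
    return J
```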
Incremental 4DVAR

- Linearization at [x_0, ..., x_k] = x_{0:k}:

    M_i(x_{i-1} + z_{i-1}) ≈ M_i(x_{i-1}) + M_i'(x_{i-1}) z_{i-1}
    H_i(x_i + z_i) ≈ H_i(x_i) + H_i'(x_i) z_i

- The Gauss-Newton method (Bell, 1994; Courtier et al., 1994) consists of the iterations

    x_{0:k} ← x_{0:k} + z_{0:k}

  with the increments z_{0:k} solving the linear least squares problem

    ‖x_0 + z_0 − x_b‖²_{B^{-1}} + ∑_{i=1}^k ‖x_i + z_i − M_i(x_{i-1}) − M_i'(x_{i-1}) z_{i-1}‖²_{Q_i^{-1}} + ∑_{i=1}^k ‖y_i − H_i(x_i) − H_i'(x_i) z_i‖²_{R_i^{-1}} → min over z_{0:k}
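
The outer iteration is then a plain loop: solve the linearized problem for the increments, update the trajectory, repeat. A minimal sketch, assuming some inner solver `solve_linearized` (in this talk, the EnKS will play that role):

```python
def gauss_newton(x, solve_linearized, n_iter=6):
    """Incremental 4DVAR outer loop (sketch). x is the trajectory
    [x_0, ..., x_k]; solve_linearized(x) returns the increments
    [z_0, ..., z_k] minimizing the linearized cost at x."""
    for _ in range(n_iter):
        z = solve_linearized(x)                    # inner linear least squares
        x = [xi + zi for xi, zi in zip(x, z)]      # x_{0:k} <- x_{0:k} + z_{0:k}
    return x
```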
Linearized 4DVAR as Kalman smoother

- Write the linear least squares problem for the increments z_{0:k} as

    ‖z_0 − z_b‖²_{B^{-1}} + ∑_{i=1}^k ‖z_i − M_i z_{i-1} − m_i‖²_{Q_i^{-1}} + ∑_{i=1}^k ‖d_i − H_i z_i‖²_{R_i^{-1}} → min over z_{0:k}

- Theorem. The least squares solution z_{0:k} is the mean of the random vectors Z_{0:k} satisfying the stochastic equations

    Z_0 = z_b + V_0,                 V_0 ∼ N(0, B)
    Z_i = M_i Z_{i-1} + m_i + V_i,   V_i ∼ N(0, Q_i),  i = 1, ..., k
    d_i = H_i Z_i + W_i,             W_i ∼ N(0, R_i),  i = 1, ..., k

- This is the Kalman smoother (Rauch et al., 1965; Bell, 1994).
- Simulate by Monte Carlo ⇒ Ensemble Kalman Smoother (EnKS) (Evensen and van Leeuwen, 2000).
Kalman smoother

- To solve the stochastic Kalman smoother equations (Rauch et al., 1965; Bell, 1994)

    Z_0 = z_b + V_0,                 V_0 ∼ N(0, B)
    Z_i = M_i Z_{i-1} + m_i + V_i,   V_i ∼ N(0, Q_i),  i = 1, ..., k
    d_i = H_i Z_i + W_i,             W_i ∼ N(0, R_i),  i = 1, ..., k

  build the solution recursively: Z_{i|j} incorporates the data d_{1:j} only; we need Z_i = Z_{i|k}.

- The Kalman filter is the recursion Z_{i-1|i-1} ↦ Z_{i|i}.
- The Kalman smoother applies the same update to previous times to get Z_{0:i|i-1} ↦ Z_{0:i|i}.
- Eventually, we obtain Z_{0:k} as Z_{0:k|k}.
Kalman filter and smoother recursions

- Recall the notation: Z_{step number | data incorporated up to}.
- The Kalman filter incorporates the data d_i in the latest state, Z_{i|i-1} --data d_i--> Z_{i|i}:

    Z_{0|0} --M_1--> Z_{1|0} --data d_1--> Z_{1|1} --M_2--> Z_{2|1} --data d_2--> Z_{2|2} --> ...

- The Kalman smoother incorporates the data d_i also in all previous states, Z_{0:i|i-1} --data d_i--> Z_{0:i|i}:

    Z_{0|0} --M_1--> [Z_{0|0}, Z_{1|0}] --data d_1--> [Z_{0|1}, Z_{1|1}] --M_2--> [Z_{0|1}, Z_{1|1}, Z_{2|1}] --data d_2--> [Z_{0|2}, Z_{1|2}, Z_{2|2}]
Ensemble Kalman filter (EnKF)

Represent the random vector Z_{i|j} by an ensemble of realizations

    Z_{i|j}^N = [z_{i|j}^1, ..., z_{i|j}^N]

1. Initialize by sampling the background distribution:

    z_{0|0}^ℓ ∼ N(z_b, B),  ℓ = 1, ..., N.

2. For i = 1, ..., k:

   2.1 Advance in time and randomize by sampling the model error:

        z_{i|i-1}^ℓ = M_i z_{i-1|i-1}^ℓ + m_i + v_i^ℓ,  v_i^ℓ ∼ N(0, Q_i)

   2.2 Incorporate the data d_i:

        z_{i|i}^ℓ = z_{i|i-1}^ℓ − P_i^N H_i^T (H_i P_i^N H_i^T + R_i)^{-1} (H_i z_{i|i-1}^ℓ − d_i − w_i^ℓ),  w_i^ℓ ∼ N(0, R_i),

        where P_i^N is the sample covariance of the ensemble Z_{i|i-1}^N.

Approximate solution: the ensemble mean z_{i|i} ≈ (1/N)(z_{i|i}^1 + ... + z_{i|i}^N).
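
A minimal NumPy sketch of the stochastic (perturbed observations) analysis step 2.2; the ensemble layout (state dimension × ensemble size) and all names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_analysis(Z, H, R, d):
    """Stochastic (perturbed-observations) EnKF analysis step, a sketch.
    Z: n x N ensemble (columns are members), H: m x n observation matrix,
    R: m x m data error covariance, d: data vector of length m."""
    n, N = Z.shape
    A = Z - Z.mean(axis=1, keepdims=True)            # ensemble anomalies
    P = A @ A.T / (N - 1)                            # sample covariance P_i^N
    W = rng.multivariate_normal(np.zeros(len(d)), R, size=N).T  # perturbations w^l
    S = H @ P @ H.T + R                              # H P H^T + R
    return Z - P @ H.T @ np.linalg.solve(S, H @ Z - d[:, None] - W)
```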
Ensemble Kalman smoother (EnKS)

- Run the Ensemble Kalman filter (EnKF). It turns out that incorporating the data d_i replaces the ensemble by linear combinations of its members (transformation by a matrix T_i^N):

    Z_{i|i}^N = Z_{i|i-1}^N T_i^N,  T_i^N ∈ R^{N×N}.

- EnKS = EnKF + transform the history exactly the same way:

    Z_{0:i|i}^N = Z_{0:i|i-1}^N T_i^N.

Approximate solution: the ensemble mean z_{i|k} ≈ (1/N)(z_{i|k}^1 + ... + z_{i|k}^N).
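
A sketch of one EnKS step in NumPy: the perturbed-observations update above is a linear combination of the ensemble members, so it can be written as a right multiplication by an N × N matrix T, which is then also applied to every stored past ensemble. The derivation of T and all names are my illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def enks_step(history, Z, H, R, d):
    """One EnKS analysis (sketch): compute the N x N transform T with
    Z_new = Z @ T, and apply the same T to all past ensembles in `history`.
    Z and the entries of `history` are n x N; H is m x n; d has length m."""
    n, N = Z.shape
    C = np.eye(N) - np.ones((N, N)) / N              # centering: anomalies A = Z @ C
    HA = H @ Z @ C                                   # observed anomalies
    S = HA @ HA.T / (N - 1) + R                      # H P H^T + R
    W = rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
    # Z - P H^T S^{-1} (H Z - d - W) = Z @ T with the N x N transform:
    T = np.eye(N) - C @ HA.T @ np.linalg.solve(S, H @ Z - d[:, None] - W) / (N - 1)
    return [Zp @ T for Zp in history] + [Z @ T]
```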
Convergence of the EnKF and EnKS

Definition (stochastic L^p norm). For a random vector X: Ω → R^n,

    ‖X‖_p = (E|X|^p)^{1/p}.

Theorem (Mandel et al. (2011); Le Gland et al. (2011)). z_{i|i}^ℓ → Z_{i|i} in all L^p, 1 ≤ p < ∞, as the ensemble size N → ∞.

Corollary. The EnKF ensemble mean converges:

    (1/N)(z_{i|i}^1 + ... + z_{i|i}^N) → z_{i|i} = E(Z_{i|i})  in all L^p, 1 ≤ p < ∞, as N → ∞.

Corollary. The EnKS ensemble mean converges to the solution of the linear least squares problem:

    (1/N)(z_{i|k}^1 + ... + z_{i|k}^N) → z_i = z_{i|k} = E(Z_{i|k})  in all L^p, 1 ≤ p < ∞, as N → ∞.
Model operator

The linearized model M_i = M_i'(x_{i-1}) occurs only in advancing the time, as the action on the ensemble of the increments:

    M_i(x_{i-1} + z_{i-1}^ℓ) − x_i ≈ M_i(x_{i-1}) + M_i'(x_{i-1}) z_{i-1}^ℓ − x_i = M_i z_{i-1}^ℓ + m_i

Approximating by finite differences with a parameter τ > 0:

    M_i z_{i-1}^ℓ + m_i ≈ (M_i(x_{i-1} + τ z_{i-1}^ℓ) − M_i(x_{i-1})) / τ + M_i(x_{i-1}) − x_i

This needs N + 1 evaluations of M_i, at x_{i-1} and at x_{i-1} + τ z_{i-1}^ℓ, ℓ = 1, ..., N. Accurate in the limit τ → 0. For τ = 1, we recover the nonlinear model: M_i(x_{i-1} + z_{i-1}^ℓ) − x_i.
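
A sketch of this matrix-free action in Python; `model` stands for the nonlinear operator M_i, and all names are illustrative:

```python
import numpy as np

def linearized_model_action(model, x_prev, x_cur, Z, tau=0.1):
    """Finite-difference action M_i z + m_i on each ensemble column of Z,
    using only N + 1 evaluations of the nonlinear model (no tangent code).
    Z is n x N; model maps an n-vector to an n-vector."""
    base = model(x_prev)                       # M_i(x_{i-1}), one evaluation
    out = np.empty_like(Z)
    for ell in range(Z.shape[1]):
        out[:, ell] = (model(x_prev + tau * Z[:, ell]) - base) / tau \
                      + base - x_cur           # + m_i = M_i(x_{i-1}) - x_i
    return out
```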
Observation operator

The observation matrix occurs only in its action on the ensemble,

    H_i Z^N = [H_i z^1, ..., H_i z^N].

Approximating by finite differences with a parameter τ > 0:

    H_i z_i^ℓ ≈ (H_i(x_i + τ z_i^ℓ) − H_i(x_i)) / τ

This needs N + 1 evaluations of H_i, at x_i and at x_i + τ z_i^ℓ, ℓ = 1, ..., N; the same pattern as the finite-difference model action above. Accurate in the limit τ → 0.

τ = 1 ⇒ the method becomes exactly the EnKS run on the nonlinear problem ⇒ the EnKS is independent of the point of linearization ⇒ no convergence of the 4DVAR iterations. In our tests, τ = 0.1 seems to work well enough.
Tikhonov regularization and the Levenberg-Marquardt method

- Gauss-Newton may not converge, even locally. Add a penalty (Tikhonov regularization) to control the size of the increments z_i:

    ‖z_0 − z_b‖²_{B^{-1}} + ∑_{i=1}^k ‖z_i − M_i z_{i-1} − m_i‖²_{Q_i^{-1}} + ∑_{i=1}^k ‖d_i − H_i z_i‖²_{R_i^{-1}} + γ ∑_{i=0}^k ‖z_i‖²_{S_i^{-1}} → min over z_{0:k}

- This becomes the Levenberg-Marquardt method, which is guaranteed to converge for large enough γ.
- Implement the regularization as independent observations z_i ≈ 0 with error covariance S_i: simply run the analysis step a second time (Johns and Mandel, 2008), as sketched below. Here, this is statistically exact because the distributions in the Kalman smoother are Gaussian.
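
A sketch of this trick, reusing the `enkf_analysis` sketch from above: the penalty is one more observation z ≈ 0 with observation operator I, so the regularized analysis is just two successive analysis passes. The scaling S/γ of the artificial error covariance (so that the penalty weight is γ S^{-1}) and all names are my illustrative assumptions.

```python
import numpy as np

def regularized_analysis(Z, H, R, d, S, gamma):
    """EnKS analysis with Tikhonov regularization (sketch): assimilate the
    data d, then the artificial observation z = 0 with operator I and
    error covariance S / gamma, via a second analysis pass."""
    n = Z.shape[0]
    Z = enkf_analysis(Z, H, R, d)                            # ordinary data pass
    Z = enkf_analysis(Z, np.eye(n), S / gamma, np.zeros(n))  # penalty pass
    return Z
```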
Convergence

Theorem. Every iteration of the Levenberg-Marquardt or Gauss-Newton method with the stochastic linear solver by the EnKS converges, in all L^p, 1 ≤ p < ∞, as N → ∞ and τ → 0, to the same iteration with the exact linear solver.
Functional interface

No matrices are needed explicitly, only certain matrix-vector operations (one way to code this contract is sketched after the list):

- Evaluation of the model operator M_i(x_{i-1}) for a given state x_{i-1}.
- Evaluation of the observation operator H_i(x_i) for a given state x_i.
- Multiplication of a vector u_0 by a square root V of the background covariance B, B = VV^T. Similarly, multiplication of a vector u_i by a square root of the model error covariance Q_i, and of a data-dimensioned vector v_i by a square root of the data error covariance R_i. These matrix square roots are used only to generate samples from N(0, B), N(0, Q_i), and N(0, R_i).
- Solving a system with the matrix R_i.
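
A sketch of this contract as an abstract interface; the class name, method names, and the exact method set are illustrative assumptions:

```python
from abc import ABC, abstractmethod

class WC4DVarProblem(ABC):
    """Functional interface to a weak-constraint 4DVAR problem (sketch).
    No covariance matrices are formed; only their actions are required."""

    @abstractmethod
    def model(self, i, x):       # evaluate M_i(x_{i-1})
        ...

    @abstractmethod
    def obs(self, i, x):         # evaluate H_i(x_i)
        ...

    @abstractmethod
    def sqrt_B(self, u):         # V u with B = V V^T, to sample N(0, B)
        ...

    @abstractmethod
    def sqrt_Q(self, i, u):      # square root of Q_i acting on u
        ...

    @abstractmethod
    def sqrt_R(self, i, v):      # square root of R_i acting on v
        ...

    @abstractmethod
    def solve_R(self, i, v):     # solve R_i w = v
        ...
```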
Computational results

Lorenz 63 model

    dx/dt = −σ(x − y)
    dy/dt = ρx − y − xz
    dz/dt = xy − βz
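
For reference, the right-hand side in Python; the parameter values shown are the classical chaotic choice, an assumption since the slide does not state them:

```python
import numpy as np

def lorenz63(u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 system for u = (x, y, z)."""
    x, y, z = u
    return np.array([-sigma * (x - y),
                     rho * x - y - x * z,
                     x * y - beta * z])
```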
EnKS-4DVAR for the Lorenz 63 model

Twin experiment: one solution chosen as the truth.

    B = σ_b² diag(1, 1/4, 1/9),  R_i = σ_r² I,  H_i(x, y, z) = (x², y², z²).

Root mean square error of the EnKS-4DVAR iterations over 50 time steps, ensemble size 100, model error covariances Q_i ≈ 0:

    Iteration:  1      2      3     4     5     6
    RMSE:       20.16  15.37  3.73  2.53  0.09  0.09
Lorenz 96 model

    dx_j/dt = (1/κ)(x_{j-1}(x_{j+1} − x_{j-2}) − x_j + F),  j = 1, ..., 40

with cyclic boundary conditions x_{-1} = x_{39}, x_0 = x_{40}, x_{41} = x_1 (Lorenz, 2006).

- Behaves chaotically in the case κ = 1 and F = 8.
- A model of time evolution on a constant-latitude circle.

[Figure: time series of the component x(20) over 100 time steps.]
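
The right-hand side in Python, a small sketch using `np.roll` for the cyclic boundary conditions:

```python
import numpy as np

def lorenz96(x, F=8.0, kappa=1.0):
    """Right-hand side of the Lorenz 96 system with cyclic boundary
    conditions, for a state vector x of length 40 (or any length >= 4)."""
    return (np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + F) / kappa
```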
EnKS-4DVAR for the Lorenz 96 model

- Twin experiment: one solution chosen as the truth. The whole state is observed.
- Background covariance B = σ_b² diag(1, 1/4, 1/9, ...).
- Data error covariance R_i = σ_r² I, observation operator H_i(X_i) = X_i + X_i².
- Model error covariance Q_i = 0.1 C, C_{mn} = exp(−5|m − n|), ensemble size N = 40.

[Figure: RMSE of the EnKS-4DVAR solution over 50 time steps.]

The RMSE of the EnKS-4DVAR solution remains bounded over 50 time steps, at most about 5%.
An example where Gauss-Newton does not converge

    (x_0 − 2)² + (3 + x_1²)² + 10⁶ (x_0 − x_1)² → min

This is 4DVAR with x_b = 2, B = I, M_1 = I, H_1(x) = x², y_1 = 3, Q_1 = 10⁻⁶. Thus γ > 0 (regularization) is sometimes required.
Properties

4DVAR sets up a large nonlinear least squares problem.

Classical methods:
- In each iteration, sequential evaluations of the model, tangent, and adjoint operators, and the solution of a large linear least squares problem.
- Extra code for the tangent and adjoint operators.
- Gauss-Newton iterations may not converge, not even locally.
- Difficult to parallelize: only inside the model.

The new method:
- Solve the linear least squares problem from 4DVAR by the EnKS, naturally parallel over the ensemble members.
- The linear algebra glue is cheap; use parallel dense libraries.
- Finite differences ⇒ no tangent and adjoint operators needed.
- Adding Tikhonov regularization to the linear least squares problem ⇒ the Levenberg-Marquardt method with guaranteed convergence.
- Cheap and simple implementation of the Tikhonov regularization within the EnKS as an additional observation.
To do

- Testing on large, realistic problems.
- Localization in the EnKS (the standard sample covariance is low rank, with spurious terms).
- Adaptive control of the initial ensemble spread in each iteration.
- Adaptive control of the regularization (the Levenberg-Marquardt parameter γ).
- More theory, especially nonlinear and Levenberg-Marquardt convergence.
Related work

- The equivalence between weak-constraint 4DVAR and Kalman smoothing is approximate for nonlinear problems, but still useful (Fisher et al., 2005).
- Hamill and Snyder (2000) estimated the background covariance from an ensemble for 4DVAR.
- Gradient methods in the span of the ensemble for one analysis cycle (i.e., 3DVAR) include Zupanski (2005), Sakov et al. (2012) (with a square root EnKF as the linear solver in a Newton method), and Bocquet and Sakov (2012), who added regularization and used an LETKF-like approach to minimize the nonlinear cost function over linear combinations of the ensemble.
- Liu et al. (2008, 2009) combine ensembles with (strong constraint) 4DVAR and minimize in the observation space.
- Zhang et al. (2009) use the EnKF to obtain the covariance for 4DVAR, and 4DVAR to feed the mean analysis into the EnKF.
References
B. Bell. The iterated Kalman smoother as a Gauss-Newton method. SIAM Journal on
Optimization, 4(3):626–636, 1994. doi: 10.1137/0804035.
M. Bocquet and P. Sakov. Combining inflation-free and iterative ensemble Kalman
filters for strongly nonlinear systems. Nonlinear Processes in Geophysics, 19(3):
383–399, 2012. doi: 10.5194/npg-19-383-2012.
P. Courtier, J.-N. Thépaut, and A. Hollingsworth. A strategy for operational
implementation of 4D-Var, using an incremental approach. Quarterly Journal of the
Royal Meteorological Society, 120(519):1367–1387, 1994. ISSN 1477-870X. doi:
10.1002/qj.49712051912.
G. Evensen and P. J. van Leeuwen. An ensemble Kalman smoother for nonlinear
dynamics. Monthly Weather Review, 128(6):1852–1867, 2000. doi:
10.1175/1520-0493(2000)128<1852:AEKSFN>2.0.CO;2.
M. Fisher, M. Leutbecher, and G. A. Kelly. On the equivalence between Kalman
smoothing and weak-constraint four-dimensional variational data assimilation.
Quarterly Journal of the Royal Meteorological Society, 131(613, Part C):3235–3246,
2005. ISSN 0035-9009. doi: 10.1256/qj.04.142.
T. M. Hamill and C. Snyder. A hybrid ensemble Kalman filter–3D variational analysis
scheme. Monthly Weather Review, 128(8):2905–2919, 2000. doi:
10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.
C. J. Johns and J. Mandel. A two-stage ensemble Kalman filter for smooth data
assimilation. Environmental and Ecological Statistics, 15:101–110, 2008. doi:
10.1007/s10651-007-0033-0.
F. Le Gland, V. Monbet, and V.-D. Tran. Large sample asymptotics for the ensemble
Kalman filter. In D. Crisan and B. Rozovskii, editors, The Oxford Handbook of
Nonlinear Filtering. Oxford University Press, 2011. INRIA Report 7014, August
2009.
C. Liu, Q. Xiao, and B. Wang. An ensemble-based four-dimensional variational data
assimilation scheme. Part I: Technical formulation and preliminary test. Monthly
Weather Review, 136(9):3363–3373, 2008. doi: 10.1175/2008MWR2312.1.
C. Liu, Q. Xiao, and B. Wang. An ensemble-based four-dimensional variational data
assimilation scheme. Part II: Observing system simulation experiments with
Advanced Research WRF (ARW). Monthly Weather Review, 137(5):1687–1704,
2009. doi: 10.1175/2008MWR2699.1.
E. N. Lorenz. Predictability - a problem partly solved. In T. Palmer and
R. Hagendorn, editors, Predictability of Weather and Climate, pages 40–58.
Cambridge University Press, 2006.
J. Mandel, L. Cobb, and J. D. Beezley. On the convergence of the ensemble Kalman
filter. Applications of Mathematics, 56:533–541, 2011. doi:
10.1007/s10492-011-0031-2. arXiv:0901.2951, January 2009.
H. E. Rauch, F. Tung, and C. T. Striebel. Maximum likelihood estimates of linear
dynamic systems. AIAA Journal, 3(8):1445–1450, 1965.
P. Sakov, D. S. Oliver, and L. Bertino. An iterative EnKF for strongly nonlinear
systems. Monthly Weather Review, 140(6):1988–2004, 2012. doi:
10.1175/MWR-D-11-00176.1.
Y. Trémolet. Model-error estimation in 4D-Var. Q. J. Royal Meteorological Soc., 133
(626):1267–1280, 2007. ISSN 1477-870X. doi: 10.1002/qj.94.
F. Zhang, M. Zhang, and J. Hansen. Coupling ensemble Kalman filter with
four-dimensional variational data assimilation. Advances in Atmospheric Sciences,
26(1):1–8, 2009. doi: 10.1007/s00376-009-0001-8.
M. Zupanski. Maximum likelihood ensemble filter: Theoretical aspects. Monthly
Weather Review, 133(6):1710–1726, 2005. doi: 10.1175/MWR2946.1.