A Bayesian view of Data Assimilation
and Quality Control of Observations
Lecture, Intensive Course on Data Assimilation
Andrew Lorenc, Buenos Aires, Argentina. October 27 - November 7, 2008
© Crown copyright Met Office
Content

1. Bayes Theorem – adding information
   • Gaussian PDFs
   • Non-Gaussian observational errors – Quality Control
2. Simplest possible Bayesian NWP analysis
   - Two gridpoints, one observation.
3. Predicting the prior PDF
   – a Bayesian view of 4D-Var v Ensemble KF
Bayes Theorem – adding information
Gaussian PDFs
Non-Gaussian observational errors - Quality Control
Bayes' Theorem for Discrete Events

A, B      events
P(A)      probability of A occurring, or knowledge about A's past occurrence
P(A∩B)    probability that A and B both occur
P(A|B)    conditional probability of A given B

We have two ways of expressing P(A∩B):

$$P(A \cap B) = P(B)\,P(A\,|\,B) = P(A)\,P(B\,|\,A)$$

⇒ Bayes' Theorem:

$$P(A\,|\,B) = \frac{P(B\,|\,A)\,P(A)}{P(B)}$$

Can calculate P(B) from:

$$P(B) = P(B\,|\,A)\,P(A) + P(B\,|\,\bar{A})\,P(\bar{A})$$
Bayes theorem in continuous form,
to estimate a value x given an observation y^o

$$p(x\,|\,y^o) = \frac{p(y^o\,|\,x)\,p(x)}{p(y^o)}$$

p(x|y^o)   is the posterior distribution,
p(x)       is the prior distribution,
p(y^o|x)   is the likelihood function for x.

Can get p(y^o) by integrating over all x:

$$p(y^o) = \int p(y^o\,|\,x)\,p(x)\,dx$$
Assume Gaussian pdfs

Prior is Gaussian with mean x^b, variance V_b:

$$p(x) = (2\pi V_b)^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}\frac{(x - x^b)^2}{V_b}\right), \qquad x \sim N(x^b, V_b)$$

Ob y^o is Gaussian about the true value x, with variance V_o:  y^o ~ N(x, V_o)

$$p(y^o\,|\,x) = (2\pi V_o)^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}\frac{(y^o - x)^2}{V_o}\right)$$

Substituting gives a Gaussian posterior:

$$p(x\,|\,y^o) = (2\pi V_a)^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}\frac{(x - x^a)^2}{V_a}\right), \qquad x \sim N(x^a, V_a)$$
Advantages of Gaussian assumption

1. Best estimate is found by solving linear equations:

$$\frac{1}{V_a} = \frac{1}{V_o} + \frac{1}{V_b}, \qquad x^a = V_a\left(\frac{y^o}{V_o} + \frac{x^b}{V_b}\right)$$

$$p(x\,|\,y^o) = (2\pi V_a)^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}\frac{(x - x^a)^2}{V_a}\right)$$

Taking logs gives a quadratic equation; differentiating to find the extremum gives a linear equation.

2. Best estimate is a function of values & [co-]variances only. Often these are all we know.

3. Weights are independent of values.
Combination of Gaussian prior & observation
- Gaussian posterior,
- weights independent of values.

[Figure: four panels, each combining the prior x ~ N(0,3) with a likelihood p(y^o|x) ~ N(y^o,1) for y^o = 3, 5, 7, 9, giving posteriors x ~ N(2.25,0.75), N(3.75,0.75), N(5.25,0.75), N(6.75,0.75).]
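The panel values above follow from the linear formulas on the "Advantages" slide. A minimal Python sketch (illustrative, not from the lecture) reproduces them:

```python
def gaussian_posterior(xb, Vb, yo, Vo):
    """Combine prior N(xb, Vb) with obs likelihood N(yo, Vo):
    1/Va = 1/Vo + 1/Vb,  xa = Va*(yo/Vo + xb/Vb)."""
    Va = 1.0 / (1.0 / Vo + 1.0 / Vb)
    xa = Va * (yo / Vo + xb / Vb)
    return xa, Va

# Prior N(0,3), obs variance 1, as in the four panels:
for yo in (3.0, 5.0, 7.0, 9.0):
    print(yo, gaussian_posterior(0.0, 3.0, yo, 1.0))
# -> (2.25, 0.75), (3.75, 0.75), (5.25, 0.75), (6.75, 0.75)
```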
Quality Control example: Bayesian Dice
Discrete Bayes Theorem Applied to Gross Observational Errors

I have two dice. One is weighted towards throwing sixes. I have performed some experiments with them, and have the prior statistics that:

  for the weighted (W) die,  P(6|W) = 58/60
  for the good (G) die,      P(6|G) = 10/60

I choose one at random: P(W) = P(G) = ½ = 50%

I throw this die, and it shows a six. Now:

  P(6) = P(6|W) P(W) + P(6|G) P(G)
       = 58/60 × 1/2 + 10/60 × 1/2
       = 34/60

We can now apply Bayes' Theorem:

  P(G|6) = P(6|G) P(G) / P(6) = (10/60 × 1/2) / (34/60) = 5/34 ≈ 15%
  P(W|6) = P(6|W) P(W) / P(6) = (58/60 × 1/2) / (34/60) = 29/34 ≈ 85%
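As a sketch, the same arithmetic in a few lines of Python (names are illustrative):

```python
# Likelihoods of a six, from the prior experiments, and a uniform prior.
p6 = {"W": 58 / 60, "G": 10 / 60}
prior = {"W": 0.5, "G": 0.5}

p_six = sum(p6[d] * prior[d] for d in p6)               # P(6) = 34/60
posterior = {d: p6[d] * prior[d] / p_six for d in p6}   # Bayes' Theorem
print(posterior)  # {'W': 29/34 ~ 0.85, 'G': 5/34 ~ 0.15}
```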
Simple model for PDF of observations with errors

Assume that a small fraction of the observations are corrupted, and hence worthless. The others have Gaussian errors. For each observation we have:

$$p(y^o\,|\,x) = p(y^o\,|\,G \cap x)\,P(G) + p(y^o\,|\,\bar{G} \cap x)\,P(\bar{G})$$

G is the event "there is a gross error" and Ḡ means not G.

$$p(y^o\,|\,G \cap x) = \begin{cases} k & \text{over the range of plausible values} \\ 0 & \text{elsewhere} \end{cases}$$

$$p(y^o\,|\,\bar{G} \cap x) = N\!\left(y^o\,|\,H(x),\,E + F\right)$$
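A hedged sketch of this mixture likelihood in Python, for scalar observations (the 3% gross-error probability and flat density k = 0.02 are the values used in the example slides below; the function name is illustrative):

```python
import math

def obs_likelihood(yo, hx, var_ef, p_gross=0.03, k=0.02):
    """p(y^o|x): flat density k with probability P(G),
    Gaussian N(H(x), E+F) with probability P(not G)."""
    gauss = math.exp(-0.5 * (yo - hx) ** 2 / var_ef) / math.sqrt(2 * math.pi * var_ef)
    return p_gross * k + (1.0 - p_gross) * gauss
```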
Applying this model

• Can simply apply Bayes Theorem to the discrete event G:

$$P(G\,|\,y^o) = \frac{P(y^o\,|\,G)\,P(G)}{P(y^o)} = \frac{k\,P(G)}{k\,P(G) + N\!\left(y^o\,|\,H(x^b),\,R + \mathbf{H}\mathbf{B}\mathbf{H}^T\right)P(\bar{G})}$$

Lorenc, A.C. and Hammon, O., 1988: "Objective quality control of observations using Bayesian methods. Theory, and a practical implementation." Quart. J. Roy. Met. Soc., 114, 515-543.

• Or we can use the non-Gaussian PDF directly:

$$p(y^o\,|\,x) = p(y^o\,|\,G \cap x)\,P(G) + p(y^o\,|\,\bar{G} \cap x)\,P(\bar{G})$$

Ingleby, N.B., and Lorenc, A.C., 1993: "Bayesian quality control using multivariate normal distributions." Quart. J. Roy. Met. Soc., 119, 1195-1225.
Andersson, Erik and Järvinen, Heikki, 1999: "Variational quality control." Quart. J. Roy. Met. Soc., 125, 697-722.
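A sketch of the first formula as code, assuming scalar innovations (parameter names are illustrative):

```python
import math

def p_gross_given_obs(innovation, var_total, p_gross=0.03, k=0.02):
    """P(G|y^o), with innovation = y^o - H(x^b) and var_total = R + HBH^T."""
    gauss = math.exp(-0.5 * innovation ** 2 / var_total) / math.sqrt(2 * math.pi * var_total)
    num = k * p_gross
    return num / (num + gauss * (1.0 - p_gross))

# The probability of gross error grows steeply with the background departure:
for d in (0.0, 2.0, 4.0, 6.0):
    print(d, round(p_gross_given_obs(d, 1.0), 3))
```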
Posterior probability that an observation is "correct", as a function of its deviation from the background forecast.
Gaussian prior combined with observation with gross errors – extreme obs are rejected.

[Figure: four panels, each combining the prior x ~ N(0,3) with the likelihood p(y^o|x) ~ 97%·N(y^o,1) + 3%·0.02 for y^o = 3, 5, 7, 9. As the observation moves further from the prior, the posterior reverts towards the prior: the extreme observations are effectively rejected.]
Variational Penalty Functions

• Finding the most probable posterior value involves maximising a product [of Gaussians]
• By taking –ln of the posterior PDF, we can instead minimise a sum [of quadratics]
• This is often called the "Penalty Function" J
• Additive constants can be ignored
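Written out for the scalar Gaussian example above (a worked instance; the constant c absorbs the normalisation factors):

$$J(x) = -\ln p(x\,|\,y^o) + c = \underbrace{\frac{(x - x^b)^2}{2V_b}}_{J_b(x)} + \underbrace{\frac{(y^o - x)^2}{2V_o}}_{J_o(x)}$$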
Penalty functions: J(x) = -ln(p(x)) + c
p Gaussian ⇒ J quadratic

[Figure: four panels showing Jb(x) = -ln(p(x)) + c for x ~ N(0,3), Jo(x) = -ln(p(y^o|x)) + c for p(y^o|x) ~ N(y^o,1) with y^o = 3, 5, 7, 9, and their sum J(x) = Jb(x) + Jo(x); all curves are quadratic. Inset: the corresponding PDFs.]
Penalty functions: J(x) = -ln(p(x)) + c
p non-Gaussian ⇒ J non-quadratic

[Figure: four panels showing Jb(x) = -ln(p(x)) + c for x ~ N(0,3), Jo(x) for p(y^o|x) ~ 97%·N(y^o,1) + 3%·0.02 with y^o = 3, 5, 7, 9, and J(x) = Jb(x) + Jo(x); Jo flattens at large departures, so J is non-quadratic. Inset: the corresponding PDFs.]
Expected benefit as a function of analysed value

[Figure: curves are plotted for three different benefit functions, with widths C=0 (maximum at X), C=9 (maximum at +), C=100 (max at ). Shown for reference are the background pdf (with xb=0, B=9), and the observational pdf (with yo=10, R=0.5).]

Lorenc, A.C., 2002: "Atmospheric Data Assimilation and Quality Control." Ocean Forecasting, eds Pinardi & Woods, ISBN 3-540-67964-2, 73-96.
Other models for observation error

The Gaussian + flat distribution leads to a rather rapid rejection of observations with large deviations. This can cause problems:

1. When the guess is some way from the observation
2. When the observation is of (important) severe weather

Erik Andersson prefers a "Huber norm" penalty function:
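The slide's figure is not reproduced here; as a sketch of the general idea, a Huber-norm observation penalty is quadratic for small normalised departures and linear beyond a threshold (the threshold 1.5 below is illustrative, not a value from the lecture):

```python
def huber_penalty(r, delta=1.5):
    """Huber-norm J_o on the normalised innovation r: quadratic for |r| <= delta,
    linear beyond, so large departures are down-weighted rather than rejected."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)
```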
Simplest possible Bayesian NWP
analysis
Simplest possible example – 2 grid-points, 1 observation. Standard notation:

Ide, K., Courtier, P., Ghil, M., and Lorenc, A.C., 1997: "Unified notation for data assimilation: operational, sequential and variational." J. Met. Soc. Japan, Special issue "Data Assimilation in Meteorology and Oceanography: Theory and Practice," 75, No. 1B, 181-189.

Model is two grid points:

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

1 observed value y^o midway (but use notation for >1):

$$\mathbf{y}^o = (y^o)$$

Can interpolate an estimate y of the observed value:

$$y = H(\mathbf{x}) = \tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 = \mathbf{H}\mathbf{x} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

This example H is linear, so we can use matrix notation for fields as well as increments.
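In code, H is just a 1×2 interpolation matrix (a sketch with illustrative values):

```python
import numpy as np

H = np.array([[0.5, 0.5]])    # interpolation to the midpoint
x = np.array([1.0, 3.0])      # the two grid-point values (example numbers)
y = H @ x                     # y = (x1 + x2)/2 -> array([2.])
```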
background pdf

We have prior estimates x^b_1, x^b_2, each with error variance V_b:

$$p(x_1) = (2\pi V_b)^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}\frac{(x_1 - x_1^b)^2}{V_b}\right), \qquad
p(x_2) = (2\pi V_b)^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}\frac{(x_2 - x_2^b)^2}{V_b}\right)$$

But errors in x_1 and x_2 are usually correlated ⇒ must use a multi-dimensional Gaussian:

$$p(x_1 \cap x_2) = p(\mathbf{x}) = \left((2\pi)^2 |\mathbf{B}|\right)^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b)\right), \qquad \mathbf{x} \sim N(\mathbf{x}^b, \mathbf{B})$$

where B is the covariance matrix:

$$\mathbf{B} = V_b \begin{pmatrix} 1 & \mu \\ \mu & 1 \end{pmatrix}$$
background pdf

[Figure: the two-dimensional background pdf.]
Observational errors

Lorenc, A.C., 1986: "Analysis methods for numerical weather prediction." Quart. J. Roy. Met. Soc., 112, 1177-1194.

Instrumental error:

$$\mathbf{y}^o \sim N(\mathbf{y}^t, \mathbf{E}), \qquad
p(\mathbf{y}^o\,|\,\mathbf{y}^t) = (2\pi|\mathbf{E}|)^{-\frac{1}{2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{y}^o - \mathbf{y}^t)^T \mathbf{E}^{-1} (\mathbf{y}^o - \mathbf{y}^t)\right)$$

Error of representativeness:

$$\mathbf{y}^t \sim N(H(\mathbf{x}^t), \mathbf{F}), \qquad
p_t(\mathbf{y}\,|\,\mathbf{x}^t) = (2\pi|\mathbf{F}|)^{-\frac{1}{2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{y} - H(\mathbf{x}^t))^T \mathbf{F}^{-1} (\mathbf{y} - H(\mathbf{x}^t))\right)$$

Observational error combines these 2:

$$p(\mathbf{y}^o\,|\,\mathbf{x}^t) = \int p(\mathbf{y}^o\,|\,\mathbf{y})\, p_t(\mathbf{y}\,|\,\mathbf{x}^t)\, d\mathbf{y}
= (2\pi|\mathbf{E}+\mathbf{F}|)^{-\frac{1}{2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{y}^o - H(\mathbf{x}^t))^T (\mathbf{E}+\mathbf{F})^{-1} (\mathbf{y}^o - H(\mathbf{x}^t))\right)$$

$$\mathbf{y}^o \sim N(H(\mathbf{x}^t),\, \mathbf{E}+\mathbf{F})$$
[Figure: background pdf and obs likelihood function.]
Bayesian analysis equation

$$p(\mathbf{x}\,|\,\mathbf{y}^o) = \frac{p(\mathbf{y}^o\,|\,\mathbf{x})\,p(\mathbf{x})}{p(\mathbf{y}^o)}$$

Property of Gaussians that, if H is linearisable: x ~ N(x^a, A), where x^a and A are defined by:

$$\mathbf{A}^{-1} = \mathbf{B}^{-1} + \mathbf{H}^T(\mathbf{E}+\mathbf{F})^{-1}\mathbf{H}$$

$$\mathbf{x}^a = \mathbf{x}^b + \mathbf{A}\mathbf{H}^T(\mathbf{E}+\mathbf{F})^{-1}\left(\mathbf{y}^o - H(\mathbf{x}^b)\right)$$
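A minimal numpy sketch of these two equations, with R standing for E+F (function and variable names are illustrative):

```python
import numpy as np

def analyse(xb, B, H, R, yo):
    """A^-1 = B^-1 + H^T R^-1 H;  xa = xb + A H^T R^-1 (yo - H xb)."""
    Rinv = np.linalg.inv(R)
    A = np.linalg.inv(np.linalg.inv(B) + H.T @ Rinv @ H)
    xa = xb + A @ H.T @ Rinv @ (yo - H @ xb)
    return xa, A
```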
[Figure: background pdf, obs likelihood function, and posterior analysis PDF.]
Analysis equation

For our simple example the algebra is easily done by hand, giving:

$$\begin{pmatrix} x_1^a \\ x_2^a \end{pmatrix} = \begin{pmatrix} x_1^b \\ x_2^b \end{pmatrix} + \frac{\frac{1+\mu}{2}\,V_b}{E + F + \frac{1+\mu}{2}\,V_b}\left(y^o - \frac{x_1^b + x_2^b}{2}\right)\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
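A quick numerical check (illustrative numbers) that this hand-derived form agrees with the matrix analysis equations of the previous slide:

```python
import numpy as np

Vb, mu, E_plus_F = 3.0, 0.5, 1.0
xb = np.array([1.0, 2.0])
yo = 4.0
B = Vb * np.array([[1.0, mu], [mu, 1.0]])
H = np.array([[0.5, 0.5]])

d = yo - (H @ xb).item()                     # innovation y^o - H(x^b)

# Matrix form: A^-1 = B^-1 + H^T (E+F)^-1 H
A = np.linalg.inv(np.linalg.inv(B) + H.T @ H / E_plus_F)
xa_matrix = xb + (A @ H.T).ravel() * d / E_plus_F

# Hand-derived scalar weight from this slide
w = 0.5 * (1 + mu) * Vb / (E_plus_F + 0.5 * (1 + mu) * Vb)
xa_hand = xb + w * d

print(xa_matrix, xa_hand)   # both ~ [2.731, 3.731]
```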
How to estimate the prior PDF?
How to calculate its time evolution?
i.e. 4D-Var versus Ensemble KF
Fokker-Planck Equation
Ensemble methods attempt to sample the entire PDF.
Gaussian Probability Distribution Functions

• Easier to fit to sampled errors.
• Quadratic optimisation problems, with linear solution methods – much more efficient.
• The Kalman filter is optimal for linear models, but
  • it is not affordable for expensive models (despite the "easy" quadratic problem)
  • it is not optimal for nonlinear models.
• Advanced methods based on the Kalman filter can be made affordable:
  • Ensemble Kalman filter (EnKF, ETKF, ...)
  • Four-dimensional variational assimilation (4D-Var)
Ensemble Kalman filter
Fit Gaussian to forecast ensemble.
EnKF

[Figure: the EnKF update equations.]
Deterministic 4D-Var

Initial PDF is approximated by a Gaussian. Descent algorithm only explores a small part of the PDF, on the way to a local minimum.
Simple 4D-Var, as a least-squares best fit of a deterministic model trajectory to observations
Assumptions in deriving deterministic 4D-Var

Bayes Theorem gives the posterior PDF, with the obs likelihood function given by the observational error model. It is impossible to evaluate the integrals necessary to find the "best" x. Instead, assume the best x maximises the PDF, and minimises -ln(PDF).

Purser, R.J., 1984: "A new approach to the optimal assimilation of meteorological data by iterative Bayesian analysis." Preprints, 10th conference on weather forecasting and analysis, Am. Met. Soc., 102-105.
Lorenc, A.C., 1986: "Analysis methods for numerical weather prediction." Quart. J. Roy. Met. Soc., 112, 1177-1194.
The deterministic 4D-Var equations

Bayesian posterior pdf:

$$P(\mathbf{x}\,|\,\mathbf{y}^o) \propto P(\mathbf{x})\,P(\mathbf{y}^o\,|\,\mathbf{x})$$

Assume Gaussians:

$$P(\mathbf{x}) \propto \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b)\right)$$

$$P(\mathbf{y}^o\,|\,\mathbf{x}) = P(\mathbf{y}^o\,|\,\mathbf{y}) \propto \exp\!\left(-\tfrac{1}{2}(\mathbf{y} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{y}^o)\right), \qquad \mathbf{y} = H(M(\mathbf{x}))$$

But nonlinear model makes pdf non-Gaussian: full pdf is too complicated to be allowed for. So seek mode of pdf by finding minimum of penalty function:

$$J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b) + \tfrac{1}{2}(\mathbf{y} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{y}^o)$$

$$\nabla_{\mathbf{x}} J(\mathbf{x}) = \mathbf{B}^{-1}(\mathbf{x} - \mathbf{x}^b) + \mathbf{M}^*\mathbf{H}^*\mathbf{R}^{-1}(\mathbf{y} - \mathbf{y}^o)$$
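A toy illustration of these equations in Python, with a linear "model" so that the adjoint is simply the matrix transpose, and plain gradient descent standing in for the descent algorithm (all matrices and values are invented for the sketch):

```python
import numpy as np

M = np.array([[0.9, 0.1], [0.0, 0.95]])   # linear forecast model
H = np.array([[1.0, 0.0]])                # observe first component at obs time
B = np.eye(2)
R = np.array([[0.5]])
xb = np.array([0.0, 1.0])
yo = np.array([1.2])

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

def J(x):
    y = H @ (M @ x)
    return 0.5 * (x - xb) @ Binv @ (x - xb) + 0.5 * (y - yo) @ Rinv @ (y - yo)

def gradJ(x):
    # Adjoint of a linear model is its transpose: M* = M.T, H* = H.T
    y = H @ (M @ x)
    return Binv @ (x - xb) + M.T @ H.T @ Rinv @ (y - yo)

x = xb.copy()
for _ in range(200):                      # fixed-step descent
    x = x - 0.2 * gradJ(x)
print(x, J(x))
```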
Statistical, incremental 4D-Var
Statistical 4D-Var approximates entire PDF by a Gaussian.
Statistical 4D-Var - equations

Independent, Gaussian background and model errors ⇒ non-Gaussian pdf for general y:

$$P(\delta\mathbf{x}, \delta\boldsymbol{\eta}\,|\,\mathbf{y}^o) \propto
\exp\!\left(-\tfrac{1}{2}\left(\delta\mathbf{x} - (\mathbf{x}^b - \mathbf{x}^g)\right)^T \mathbf{B}^{-1} \left(\delta\mathbf{x} - (\mathbf{x}^b - \mathbf{x}^g)\right)\right)
\exp\!\left(-\tfrac{1}{2}\left(\delta\boldsymbol{\eta} + \boldsymbol{\eta}^g\right)^T \mathbf{Q}^{-1} \left(\delta\boldsymbol{\eta} + \boldsymbol{\eta}^g\right)\right)
\exp\!\left(-\tfrac{1}{2}(\mathbf{y} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{y}^o)\right)$$

Incremental linear approximations in forecasting model predictions of observed values convert this to an approximate Gaussian pdf:

$$\mathbf{y} = \tilde{\mathbf{H}}\tilde{\mathbf{M}}(\delta\mathbf{x}, \delta\boldsymbol{\eta}) + H\!\left(M(\mathbf{x}^g, \boldsymbol{\eta}^g)\right)$$

The mean of this approximate pdf is identical to the mode, so it can be found by minimising:

$$J(\delta\mathbf{x}, \delta\boldsymbol{\eta}) = \tfrac{1}{2}\left(\delta\mathbf{x} - (\mathbf{x}^b - \mathbf{x}^g)\right)^T \mathbf{B}^{-1} \left(\delta\mathbf{x} - (\mathbf{x}^b - \mathbf{x}^g)\right)
+ \tfrac{1}{2}\left(\delta\boldsymbol{\eta} + \boldsymbol{\eta}^g\right)^T \mathbf{Q}^{-1} \left(\delta\boldsymbol{\eta} + \boldsymbol{\eta}^g\right)
+ \tfrac{1}{2}(\mathbf{y} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{y}^o)$$
Incremental 4D-Var with Outer Loop

[Diagram: an outer, full-resolution iteration runs the FULL FORECAST MODEL (+η) from the background x^b to give the guess x^g and the predicted observations y; an inner, low-resolution incremental variational iteration uses the PERTURBATION FORECAST MODEL (+δη), the ADJOINT OF the P.F. MODEL, and a DESCENT ALGORITHM to update the increment δx; the model error terms are optional.]
Questions and answers
Content

1. Bayes Theorem – adding information
   • Gaussian PDFs
   • Non-Gaussian observational errors – Quality Control
2. Simplest possible Bayesian NWP analysis
   - Two gridpoints, one observation.
3. Predicting the prior PDF
   – a Bayesian view of 4D-Var v Ensemble KF