A Bayesian view of Data Assimilation and Quality Control of Observations
Lecture, Intensive Course on Data Assimilation
Andrew Lorenc, Buenos Aires, Argentina. October 27 – November 7, 2008
© Crown copyright Met Office

Content
1. Bayes Theorem – adding information
   • Gaussian PDFs
   • Non-Gaussian observational errors - Quality Control
2. Simplest possible Bayesian NWP analysis - Two gridpoints, one observation.
3. Predicting the prior PDF – a Bayesian view of 4D-Var v Ensemble KF

Bayes Theorem – adding information
• Gaussian PDFs
• Non-Gaussian observational errors - Quality Control

Bayes' Theorem for Discrete Events
A, B: events
P(A): probability of A occurring, or knowledge about A's past occurrence
P(A∩B): probability that A and B both occur
P(A|B): conditional probability of A given B
We have two ways of expressing P(A∩B):
$P(A \cap B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$
⇒ Bayes' Theorem:
$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
We can calculate P(B) from:
$P(B) = P(B \mid A)\,P(A) + P(B \mid \bar{A})\,P(\bar{A})$

Bayes' theorem in continuous form, to estimate a value x given an observation y^o
$p(x \mid y^o) = \dfrac{p(y^o \mid x)\,p(x)}{p(y^o)}$
p(x|y^o) is the posterior distribution, p(x) is the prior distribution, and p(y^o|x) is the likelihood function for x.
We can get p(y^o) by integrating over all x:
$p(y^o) = \int p(y^o \mid x)\,p(x)\,dx$

Assume Gaussian pdfs
The prior is Gaussian with mean x^b and variance V_b:
$p(x) = (2\pi V_b)^{-1/2} \exp\!\left(-\frac{(x - x^b)^2}{2 V_b}\right), \qquad x \sim N(x^b, V_b)$
The observation y^o is Gaussian about the true value x, with variance V_o:
$p(y^o \mid x) = (2\pi V_o)^{-1/2} \exp\!\left(-\frac{(y^o - x)^2}{2 V_o}\right), \qquad y^o \sim N(x, V_o)$
Substituting into Bayes' theorem gives a Gaussian posterior:
$p(x \mid y^o) = (2\pi V_a)^{-1/2} \exp\!\left(-\frac{(x - x^a)^2}{2 V_a}\right), \qquad x \sim N(x^a, V_a)$

Advantages of the Gaussian assumption
1. The best estimate x^a is found by solving linear equations:
$\frac{1}{V_a} = \frac{1}{V_o} + \frac{1}{V_b}, \qquad \frac{x^a}{V_a} = \frac{y^o}{V_o} + \frac{x^b}{V_b}$
Taking logs gives a quadratic equation; differentiating to find the extremum gives a linear equation.
2. The best estimate is a function of values & [co-]variances only. Often these are all we know.
3. The weights are independent of the values.

Combination of Gaussian prior & observation
- Gaussian posterior,
- weights independent of values.
[Figure: four panels combining the same prior with observations at increasing deviation.]
prior x ~ N(0,3), likelihood p(y^o|x) ~ N(3,1) → posterior x ~ N(2.25, 0.75)
prior x ~ N(0,3), likelihood p(y^o|x) ~ N(5,1) → posterior x ~ N(3.75, 0.75)
prior x ~ N(0,3), likelihood p(y^o|x) ~ N(7,1) → posterior x ~ N(5.25, 0.75)
prior x ~ N(0,3), likelihood p(y^o|x) ~ N(9,1) → posterior x ~ N(6.75, 0.75)

Quality Control example: Bayesian Dice
Discrete Bayes' theorem applied to gross observational errors.
I have two dice. One is weighted towards throwing sixes. I have performed some experiments with them, and have the prior statistics that:
• for the weighted (W) die, P(6|W) = 58/60
• for the good (G) die, P(6|G) = 10/60
I choose one at random: P(W) = P(G) = 1/2 = 50%.
I throw this die, and it shows a six. Now:
P(6) = P(6|W) P(W) + P(6|G) P(G) = 58/60 × 1/2 + 10/60 × 1/2 = 34/60
We can now apply Bayes' theorem:
P(G|6) = P(6|G) P(G) / P(6) = (10/60 × 1/2) / (34/60) = 5/34 ≈ 15%
P(W|6) = P(6|W) P(W) / P(6) = (58/60 × 1/2) / (34/60) = 29/34 ≈ 85%
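As a quick check of the dice arithmetic, here is a minimal Python sketch (not part of the lecture; the dice statistics are those quoted on the slide) applying the discrete Bayes' theorem with exact fractions:

```python
from fractions import Fraction

# Prior: each die equally likely to have been picked.
p_w = Fraction(1, 2)            # weighted die
p_g = Fraction(1, 2)            # good die

# Likelihood of throwing a six with each die (from the slide).
p6_given_w = Fraction(58, 60)
p6_given_g = Fraction(10, 60)

# Total probability of a six: the denominator in Bayes' theorem.
p6 = p6_given_w * p_w + p6_given_g * p_g      # = 34/60 (reduces to 17/30)

# Posterior probabilities after observing a six.
p_g_given_6 = p6_given_g * p_g / p6           # = 5/34  ≈ 15%
p_w_given_6 = p6_given_w * p_w / p6           # = 29/34 ≈ 85%

print(p_g_given_6, float(p_g_given_6))
print(p_w_given_6, float(p_w_given_6))
```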
Simple model for the PDF of observations with errors
Assume that a small fraction of the observations are corrupted, and hence worthless; the others have Gaussian errors. For each observation we have:
$p(y^o \mid x) = p(y^o \mid G \cap x)\,P(G) + p(y^o \mid \bar{G} \cap x)\,P(\bar{G})$
where G is the event "there is a gross error" and $\bar{G}$ means not G:
$p(y^o \mid G \cap x) = \begin{cases} k & \text{over the range of plausible values} \\ 0 & \text{elsewhere} \end{cases}$
$p(y^o \mid \bar{G} \cap x) = N\!\left(y^o \mid H(x),\, \mathbf{E}+\mathbf{F}\right)$

Applying this model
• We can simply apply Bayes' theorem to the discrete event G:
$P(G \mid y^o) = \dfrac{P(y^o \mid G)\,P(G)}{P(y^o)} = \dfrac{k\,P(G)}{k\,P(G) + N\!\left(y^o \mid H(x^b),\, \mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T\right) P(\bar{G})}$
Lorenc, A.C. and Hammon, O., 1988: "Objective quality control of observations using Bayesian methods. Theory, and a practical implementation." Quart. J. Roy. Met. Soc., 114, 515-543.
• Or we can use the non-Gaussian PDF directly:
$p(y^o \mid x) = p(y^o \mid G \cap x)\,P(G) + p(y^o \mid \bar{G} \cap x)\,P(\bar{G})$
Ingleby, N.B. and Lorenc, A.C., 1993: "Bayesian quality control using multivariate normal distributions." Quart. J. Roy. Met. Soc., 119, 1195-1225.
Andersson, E. and Järvinen, H., 1999: "Variational quality control." Quart. J. Roy. Met. Soc., 125, 697-722.

[Figure: posterior probability that an observation is "correct", as a function of its deviation from the background forecast.]

Gaussian prior combined with an observation with gross errors - extreme obs are rejected
[Figure: four panels; the posterior follows the observation while it is plausible, then reverts to the prior.]
prior x ~ N(0,3), likelihood p(y^o|x) ~ 97%·N(3,1) + 3%·0.02 → posterior x
prior x ~ N(0,3), likelihood p(y^o|x) ~ 97%·N(5,1) + 3%·0.02 → posterior x
prior x ~ N(0,3), likelihood p(y^o|x) ~ 97%·N(7,1) + 3%·0.02 → posterior x
prior x ~ N(0,3), likelihood p(y^o|x) ~ 97%·N(9,1) + 3%·0.02 → posterior x
For comparison, the purely Gaussian combinations of the earlier slide are shown again, alongside their penalty functions J.

Variational Penalty Functions
• Finding the most probable posterior value involves maximising a product [of Gaussians].
• By taking −ln of the posterior PDF, we can instead minimise a sum [of quadratics].
• This is often called the "Penalty Function" J.
• Additive constants can be ignored.

Penalty functions: J(x) = −ln(p(x)) + c; p Gaussian ⇒ J quadratic
[Figure: four panels, for observations at 3, 5, 7 and 9. In each panel: $J_b(x) = -\ln(p(x)) + c$ with x ~ N(0,3); $J_o(x) = -\ln(p(y^o \mid x)) + c$ with p(y^o|x) ~ N(3,1), N(5,1), N(7,1) or N(9,1); and $J(x) = J_b(x) + J_o(x)$.]

Penalty functions: J(x) = −ln(p(x)) + c; p non-Gaussian ⇒ J non-quadratic
[Figure: four panels as above, but with the gross-error likelihood p(y^o|x) ~ 97%·N(y^o,1) + 3%·0.02 for y^o = 3, 5, 7, 9; the observation term J_o flattens out for large deviations.]
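To make the non-quadratic penalty concrete, here is a minimal Python sketch (not the operational implementation; the numbers are those from the slides: prior N(0,3), obs error variance 1, P(gross error) = 3%, flat pdf height 0.02) that minimises J(x) = J_b(x) + J_o(x) by grid search for each of the four observations:

```python
import numpy as np

def gauss(y, mean, var):
    """Gaussian pdf value."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def J(x, yo, xb=0.0, vb=3.0, vo=1.0, p_gross=0.03, flat=0.02):
    jb = 0.5 * (x - xb) ** 2 / vb                           # quadratic background term
    p_obs = (1 - p_gross) * gauss(yo, x, vo) + p_gross * flat
    jo = -np.log(p_obs)                                     # non-quadratic obs term
    return jb + jo

x = np.linspace(-5.0, 15.0, 2001)
for yo in (3.0, 5.0, 7.0, 9.0):
    xa = x[np.argmin(J(x, yo))]
    print(f"yo = {yo}: minimising J gives x_a ~ {xa:.2f}")
```

For y^o = 3 the minimum is close to the purely Gaussian answer 2.25; as y^o grows, J becomes bimodal, and for y^o = 9 the global minimum sits near the background: the observation is effectively rejected, as in the figure.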
Expected benefit as a function of analysed value
[Figure: curves for three different benefit functions, with widths C=0 (maximum at X), C=9 (maximum at +) and C=100 (max at …). Shown for reference are the background pdf (x^b=0, B=9) and the observational pdf (y^o=10, R=0.5).]
Lorenc, A.C., 2002: "Atmospheric Data Assimilation and Quality Control." Ocean Forecasting, eds Pinardi & Woods. ISBN 3-540-67964-2, 73-96.

Other models for observation error
The Gaussian + flat distribution leads to a rather rapid rejection of observations with large deviations. This can cause problems:
1. when the guess is some way from the observation;
2. when the observation is of (important) severe weather.
Erik Andersson prefers a "Huber norm" penalty function, which grows linearly rather than flattening for large deviations.
[Figure: Huber-norm penalty function.]

Simplest possible Bayesian NWP analysis

Simplest possible example – 2 grid-points, 1 observation
Standard notation: Ide, K., Courtier, P., Ghil, M. and Lorenc, A.C., 1997: "Unified notation for data assimilation: operational, sequential and variational." J. Met. Soc. Japan, special issue "Data Assimilation in Meteorology and Oceanography: Theory and Practice", 75, No. 1B, 181-189.
The model is two grid points:
$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
There is one observed value, midway between them (but we use notation valid for more than one): $\mathbf{y}^o = (y^o)$.
We can interpolate an estimate y of the observed value:
$y = H(\mathbf{x}) = \tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 = \mathbf{H}\mathbf{x} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
This example's H is linear, so we can use matrix notation for fields as well as increments.

Background pdf
We have prior estimates x^b_1 and x^b_2, each with error variance V_b:
$p(x_1) = (2\pi V_b)^{-1/2} \exp\!\left(-\frac{(x_1 - x^b_1)^2}{2 V_b}\right), \qquad p(x_2) = (2\pi V_b)^{-1/2} \exp\!\left(-\frac{(x_2 - x^b_2)^2}{2 V_b}\right)$
But the errors in x_1 and x_2 are usually correlated ⇒ we must use a multi-dimensional Gaussian:
$p(x_1 \cap x_2) = p(\mathbf{x}) = \left((2\pi)^2 |\mathbf{B}|\right)^{-1/2} \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b)\right), \qquad \mathbf{x} \sim N(\mathbf{x}^b, \mathbf{B})$
where B is the covariance matrix:
$\mathbf{B} = V_b \begin{pmatrix} 1 & \mu \\ \mu & 1 \end{pmatrix}$
[Figure: background pdf.]

Observational errors
Lorenc, A.C., 1986: "Analysis methods for numerical weather prediction." Quart. J. Roy. Met. Soc., 112, 1177-1194.
Instrumental error: $\mathbf{y}^o \sim N(\mathbf{y}^t, \mathbf{E})$
$p(\mathbf{y}^o \mid \mathbf{y}^t) = (2\pi |\mathbf{E}|)^{-1/2} \exp\!\left(-\tfrac{1}{2}(\mathbf{y}^o - \mathbf{y}^t)^T \mathbf{E}^{-1} (\mathbf{y}^o - \mathbf{y}^t)\right)$
Error of representativeness: $\mathbf{y}^t \sim N(H(\mathbf{x}^t), \mathbf{F})$
$p_t(\mathbf{y} \mid \mathbf{x}^t) = (2\pi |\mathbf{F}|)^{-1/2} \exp\!\left(-\tfrac{1}{2}(\mathbf{y} - H(\mathbf{x}^t))^T \mathbf{F}^{-1} (\mathbf{y} - H(\mathbf{x}^t))\right)$
The observational error combines these two:
$p(\mathbf{y}^o \mid \mathbf{x}^t) = \int p(\mathbf{y}^o \mid \mathbf{y}^t)\, p_t(\mathbf{y}^t \mid \mathbf{x}^t)\, d\mathbf{y}^t = (2\pi |\mathbf{E}+\mathbf{F}|)^{-1/2} \exp\!\left(-\tfrac{1}{2}(\mathbf{y}^o - H(\mathbf{x}^t))^T (\mathbf{E}+\mathbf{F})^{-1} (\mathbf{y}^o - H(\mathbf{x}^t))\right)$
i.e. $\mathbf{y}^o \sim N(H(\mathbf{x}^t),\, \mathbf{E}+\mathbf{F})$.
[Figure: background pdf and obs likelihood function.]

Bayesian analysis equation
$p(\mathbf{x} \mid \mathbf{y}^o) = \dfrac{p(\mathbf{y}^o \mid \mathbf{x})\, p(\mathbf{x})}{p(\mathbf{y}^o)}$
It is a property of Gaussians that, if H is linearisable, $\mathbf{x} \sim N(\mathbf{x}^a, \mathbf{A})$, where x^a and A are defined by:
$\mathbf{A}^{-1} = \mathbf{B}^{-1} + \mathbf{H}^T (\mathbf{E}+\mathbf{F})^{-1} \mathbf{H}$
$\mathbf{x}^a = \mathbf{x}^b + \mathbf{A}\mathbf{H}^T (\mathbf{E}+\mathbf{F})^{-1} \left(\mathbf{y}^o - H(\mathbf{x}^b)\right)$
[Figure: background pdf, obs likelihood function, posterior analysis PDF.]

Analysis equation
For our simple example the algebra is easily done by hand, giving:
$\mathbf{x}^a = \begin{pmatrix} x^a_1 \\ x^a_2 \end{pmatrix} = \begin{pmatrix} x^b_1 \\ x^b_2 \end{pmatrix} + \dfrac{V_b \frac{1+\mu}{2}}{E + F + V_b \frac{1+\mu}{2}} \left(y^o - \dfrac{x^b_1 + x^b_2}{2}\right) \begin{pmatrix} 1 \\ 1 \end{pmatrix}$
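The following short Python sketch evaluates the general analysis equations above and checks them against the hand-derived closed form. The numerical values of V_b, μ, E, F, x^b and y^o are made-up illustrations, not from the lecture:

```python
import numpy as np

Vb, mu = 1.0, 0.5           # background error variance and correlation (assumed values)
E, F = 0.2, 0.1             # instrumental + representativeness error variances (assumed)
xb = np.array([1.0, 2.0])   # background at the two grid points (assumed)
yo = 2.0                    # single observation midway between them (assumed)

B = Vb * np.array([[1.0, mu], [mu, 1.0]])   # background error covariance
H = np.array([[0.5, 0.5]])                  # interpolation to the obs location
R = np.array([[E + F]])                     # combined observation error covariance

# General form from the slide: A^{-1} = B^{-1} + H^T (E+F)^{-1} H,
# x^a = x^b + A H^T (E+F)^{-1} (y^o - H x^b).
A = np.linalg.inv(np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H)
xa = xb + (A @ H.T @ np.linalg.inv(R) @ (np.array([yo]) - H @ xb)).ravel()

# Closed form for the two-grid-point example.
w = Vb * (1 + mu) / 2
xa_closed = xb + w / (E + F + w) * (yo - xb.mean())

print(xa, xa_closed)        # the two agree
```

Because the observation is equidistant from both points and their errors are equally correlated with it, both points receive the same increment, exactly as the (1, 1) vector in the closed form shows.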
How to estimate the prior PDF? How to calculate its time evolution?
i.e. 4D-Var versus Ensemble KF

Fokker-Planck Equation
The Fokker-Planck equation describes the exact time evolution of the prior PDF, but it is far too expensive to solve for NWP-sized models. Ensemble methods attempt to sample the entire PDF.

Gaussian Probability Distribution Functions
• Easier to fit to sampled errors.
• Quadratic optimisation problems, with linear solution methods – much more efficient.
• The Kalman filter is optimal for linear models, but:
  • it is not affordable for expensive models (despite the "easy" quadratic problem);
  • it is not optimal for nonlinear models.
• Advanced methods based on the Kalman filter can be made affordable:
  • Ensemble Kalman filter (EnKF, ETKF, ...)
  • Four-dimensional variational assimilation (4D-Var)

Ensemble Kalman filter
Fit a Gaussian to the forecast ensemble.
[Figure: EnKF update schematic.]

Deterministic 4D-Var
The initial PDF is approximated by a Gaussian. The descent algorithm only explores a small part of the PDF, on the way to a local minimum.
[Figure: simple 4D-Var, as a least-squares best fit of a deterministic model trajectory to observations.]

Assumptions in deriving deterministic 4D-Var
Bayes' theorem gives the posterior PDF, with the obs likelihood function defined as before. It is impossible to evaluate the integrals necessary to find the "best" x. Instead assume the best x maximises the PDF, and so minimises −ln(PDF).
Purser, R.J., 1984: "A new approach to the optimal assimilation of meteorological data by iterative Bayesian analysis." Preprints, 10th Conference on Weather Forecasting and Analysis, Am. Met. Soc., 102-105.
Lorenc, A.C., 1986: "Analysis methods for numerical weather prediction." Quart. J. Roy. Met. Soc., 112, 1177-1194.

The deterministic 4D-Var equations
Bayesian posterior pdf:
$P(\mathbf{x} \mid \mathbf{y}^o) \propto P(\mathbf{x})\, P(\mathbf{y}^o \mid \mathbf{x})$
Assume Gaussians:
$P(\mathbf{x}) \propto \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b)\right)$
$P(\mathbf{y}^o \mid \mathbf{x}) = P(\mathbf{y}^o \mid \mathbf{y}) \propto \exp\!\left(-\tfrac{1}{2}(\mathbf{y} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{y}^o)\right), \qquad \mathbf{y} = H(M(\mathbf{x}))$
But the nonlinear model makes the pdf non-Gaussian: the full pdf is too complicated to be allowed for. So seek the mode of the pdf by finding the minimum of the penalty function:
$J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b) + \tfrac{1}{2}(\mathbf{y} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{y}^o)$
$\nabla_{\mathbf{x}} J(\mathbf{x}) = \mathbf{B}^{-1}(\mathbf{x} - \mathbf{x}^b) + \mathbf{M}^{*}\mathbf{H}^{*}\mathbf{R}^{-1} (\mathbf{y} - \mathbf{y}^o)$
where M* and H* are the adjoints of the linearised model and observation operator.

Statistical, incremental 4D-Var
Statistical 4D-Var approximates the entire PDF by a Gaussian.

Statistical 4D-Var – equations
Independent, Gaussian background and model errors give a non-Gaussian pdf for general y:
$P(\delta\mathbf{x}, \delta\boldsymbol{\eta} \mid \mathbf{y}^o) \propto \exp\!\left(-\tfrac{1}{2}\left(\delta\mathbf{x} - (\mathbf{x}^b - \mathbf{x}^g)\right)^T \mathbf{B}^{-1} \left(\delta\mathbf{x} - (\mathbf{x}^b - \mathbf{x}^g)\right)\right) \exp\!\left(-\tfrac{1}{2}(\delta\boldsymbol{\eta} + \boldsymbol{\eta}^g)^T \mathbf{Q}^{-1} (\delta\boldsymbol{\eta} + \boldsymbol{\eta}^g)\right) \exp\!\left(-\tfrac{1}{2}(\mathbf{y} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{y}^o)\right)$
Incremental linear approximations in forecasting model predictions of the observed values convert this to an approximate Gaussian pdf:
$\mathbf{y} = \tilde{\mathbf{H}}\tilde{\mathbf{M}}(\delta\mathbf{x}, \delta\boldsymbol{\eta}) + H\!\left(M(\mathbf{x}^g, \boldsymbol{\eta}^g)\right)$
The mean of this approximate pdf is identical to its mode, so it can be found by minimising:
$J(\delta\mathbf{x}, \delta\boldsymbol{\eta}) = \tfrac{1}{2}\left(\delta\mathbf{x} - (\mathbf{x}^b - \mathbf{x}^g)\right)^T \mathbf{B}^{-1} \left(\delta\mathbf{x} - (\mathbf{x}^b - \mathbf{x}^g)\right) + \tfrac{1}{2}(\delta\boldsymbol{\eta} + \boldsymbol{\eta}^g)^T \mathbf{Q}^{-1} (\delta\boldsymbol{\eta} + \boldsymbol{\eta}^g) + \tfrac{1}{2}(\mathbf{y} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{y}^o)$

Incremental 4D-Var with Outer Loop
[Diagram: an inner low-resolution incremental variational iteration (descent algorithm, perturbation forecast model and the adjoint of the P.F. model, transforms U) nested inside an outer full-resolution iteration with the full forecast model, starting from the background x^b and guess x^g; optional model error terms +δη, +η.]
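To make the minimisation concrete, here is a toy Python sketch of deterministic 4D-Var for a scalar state with a linear model. Everything in it (the model factor m, the variances, the observation times and values) is a hypothetical illustration, and plain steepest descent stands in for the operational descent algorithm:

```python
import numpy as np

# Toy 4D-Var: scalar state, linear model x_{t+1} = m * x_t, H = identity.
# Minimises J(x0) = Jb + Jo using the adjoint-derived gradient.
m = 0.9                     # model propagator (assumed linear)
B, R = 1.0, 0.5             # background and observation error variances (assumed)
xb = 0.0                    # background state at t = 0 (assumed)
yo = {1: 0.8, 3: 0.5}       # observations at selected time steps (assumed)
nsteps = 4

def J_and_grad(x0):
    # Forward run of the model from x0; traj[t] = m**t * x0.
    traj = [x0]
    for _ in range(nsteps):
        traj.append(m * traj[-1])
    J = 0.5 * (x0 - xb) ** 2 / B
    J += sum(0.5 * (traj[t] - y) ** 2 / R for t, y in yo.items())
    # Adjoint gradient: for this linear model, d traj[t] / d x0 = m**t.
    grad = (x0 - xb) / B
    grad += sum(m ** t * (traj[t] - y) / R for t, y in yo.items())
    return J, grad

x0 = xb                     # start the descent from the background
for _ in range(200):
    _, g = J_and_grad(x0)
    x0 -= 0.1 * g           # steepest-descent step

J_final, _ = J_and_grad(x0)
print(f"analysis x0 ~ {x0:.4f}, J(x0) ~ {J_final:.4f}")
```

The descent only ever evaluates J and its gradient along its path, which is the point made on the "Deterministic 4D-Var" slide: it explores a small part of the PDF on the way to a (here unique, because everything is linear and Gaussian) minimum.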
Questions and answers

Content (recap)
1. Bayes Theorem – adding information
   • Gaussian PDFs
   • Non-Gaussian observational errors - Quality Control
2. Simplest possible Bayesian NWP analysis - Two gridpoints, one observation.
3. Predicting the prior PDF – a Bayesian view of 4D-Var v Ensemble KF

© Crown copyright Met Office