Approximation of likelihoods for non-linear mixed effects models
Rasmus Waagepetersen
Department of Mathematics, Aalborg University, Denmark
February 28, 2013

Outline for today
◮ integrals involving Gaussian densities - likelihoods for non-linear mixed models
◮ Laplace approximation
◮ Gauss-Hermite quadrature
◮ Monte Carlo methods and importance sampling

General problem: approximation of integrals of the form
$$\int_{\mathbb{R}^k} h(u) f(u)\,du$$
where $f(\cdot)$ is a Gaussian (normal) density.

Motivation: computation of likelihoods and conditional expectations in non-linear mixed effects models.

Poisson GLM
$\eta = X\beta$ is the linear predictor. The data $Y_i$ are Poisson distributed and independent,
$$Y_i \sim \text{Poisson}(\lambda_i),$$
where the mean is $E Y_i = \lambda_i = \exp(\eta_i)$. Note that the mean equals the variance:
$$\text{Var}\,Y_i = E Y_i = \lambda_i.$$
In practice we often encounter overdispersion and correlation: $\text{Var}\,Y_i > E Y_i$ and $Y_i$ and $Y_j$ not independent.

Poisson log-normal model
Introduce a vector of zero-mean normal random effects $U \sim N(0, \Sigma)$. Conditional on $U = u$, the $Y_i$ are Poisson and independent,
$$Y_i \mid U = u \sim \text{Poisson}(\exp(\eta_i)),$$
with $\eta = X\beta + Zu$. The joint conditional density of $Y$ is $f(y \mid u) = \prod_i f(y_i \mid u)$.

The marginal joint density of $y$ (the likelihood function) is
$$f(y) = \int_{\mathbb{R}^k} f(y \mid u) f(u)\,du,$$
where
$$f(u) = \frac{1}{|2\pi\Sigma|^{1/2}} \exp\Big(-\frac{1}{2} u^T \Sigma^{-1} u\Big).$$

A closer look at the integrand in the Poisson case:
$$f(y \mid u) f(u) = \Big[\prod_i \frac{1}{y_i!}\Big] \exp\Big(\sum_i \big[y_i \eta_i - \exp(\eta_i)\big]\Big) f(u).$$
If $Y_i \mid U = u$ is Bernoulli with
$$p_i(u) = P(Y_i = 1 \mid u) = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}$$
(logistic regression), then
$$f(y \mid u) f(u) = \prod_i p_i(u)^{y_i} (1 - p_i(u))^{1 - y_i} f(u) = \exp\Big(\sum_i \big[y_i \eta_i - \log(1 + \exp(\eta_i))\big]\Big) f(u).$$
In both cases the integrands are complicated - there is no analytical solution of the integral.

Simplest case: one-dimensional random effect
$U \sim N(0, \tau^2)$. Compute
$$f(y) = \int_{\mathbb{R}} f(y \mid u) f(u)\,du.$$
Possibilities:
◮ Laplace approximation.
◮ Numerical integration/quadrature.

Laplace approximation
Let $g(u) = \log(f(y \mid u) f(u))$ and choose $\hat u$ so that $g'(\hat u) = 0$ ($\hat u = \arg\max g(u)$). Taylor expansion around $\hat u$:
$$g(u) \approx \tilde g(u) = g(\hat u) + (u - \hat u) g'(\hat u) + \frac{1}{2}(u - \hat u)^2 g''(\hat u) = g(\hat u) - \frac{1}{2}(u - \hat u)^2 \big(-g''(\hat u)\big).$$
I.e. $\exp(\tilde g(u))$ is proportional to the normal density $N(\mu_{LP}, \sigma_{LP}^2)$ with $\mu_{LP} = \hat u$ and $\sigma_{LP}^2 = -1/g''(\hat u)$. Hence
$$f(y) = \int_{\mathbb{R}} \exp(g(u))\,du \approx \int_{\mathbb{R}} \exp(\tilde g(u))\,du = \exp(g(\hat u)) \int_{\mathbb{R}} \exp\Big(-\frac{1}{2\sigma_{LP}^2}(u - \mu_{LP})^2\Big)\,du = \exp(g(\hat u)) \sqrt{2\pi\sigma_{LP}^2}.$$
Note that the Laplace approximation is also possible in higher dimensions (multivariate Taylor expansion - later).

Prediction
The minimum mean square prediction of $U$ given $Y = y$ is the conditional expectation $E[U \mid Y = y]$. We have
$$f(u \mid y) = f(y \mid u) f(u)/f(y) \propto \exp(g(u)) \approx \text{const} \cdot \exp\Big(-\frac{1}{2\sigma_{LP}^2}(u - \mu_{LP})^2\Big),$$
where $\mu_{LP} = \hat u$ and $\sigma_{LP}^2 = -1/g''(\hat u)$. Note: $\mu_{LP}$ is the mode of the conditional distribution. It follows that
$$U \mid Y = y \approx N(\mu_{LP}, \sigma_{LP}^2)$$
and $\mu_{LP}$ is an approximation of $E[U \mid Y = y]$.

Gaussian quadrature
Gauss-Hermite quadrature (numerical integration) is
$$\int_{\mathbb{R}} h(x) \phi(x)\,dx \approx \sum_{i=1}^n w_i h(x_i),$$
where $\phi$ is the standard normal density and $(x_i, w_i)$, $i = 1, \dots, n$, are certain arguments and weights which can be looked up in a table. We can replace $\approx$ with $=$ whenever $h$ is a polynomial of degree $2n - 1$ or less (Arne's part of the course).

In principle we can use Gauss-Hermite quadrature directly, but we might need a large number $n$ of points to get accurate results (if the integrand is far from a low-order polynomial).

Adaptive Gaussian Quadrature
A large gain in precision can be obtained by adaptation using the Laplace approximation:
$$\int f(y \mid u) f(u)\,du = \int \frac{f(y \mid u) f(u)}{f(u; \mu_{LP}, \sigma_{LP}^2)} f(u; \mu_{LP}, \sigma_{LP}^2)\,du = \int \frac{f(y \mid \sigma_{LP} x + \mu_{LP}) f(\sigma_{LP} x + \mu_{LP})}{\phi(x)} \sigma_{LP}\, \phi(x)\,dx$$
(change of variable: $x = (u - \mu_{LP})/\sigma_{LP}$).

Advantage
$$\frac{f(y \mid \sigma_{LP} x + \mu_{LP}) f(\sigma_{LP} x + \mu_{LP})}{\phi(x)} \propto \frac{f(y \mid u) f(u)}{\phi(u; \mu_{LP}, \sigma_{LP}^2)}, \qquad x = (u - \mu_{LP})/\sigma_{LP},$$
which is close to constant ($\approx f(y)$) - hence adaptive G-H quadrature is very accurate.

GH scheme with $n = 5$:

x: 2.020  0.959  0.000  -0.959  -2.020
w: 0.011  0.222  0.533   0.222   0.011

(The $x$'s are roots of a Hermite polynomial, computed using e.g. ghq in library glmmML or gaussHermiteData() in fastGHQuad. GH schemes for $n = 5$ and $n = 10$ are available on the web page.)
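To make this concrete, here is a minimal R sketch for a one-dimensional random effect. The single-cluster Poisson log-normal model, the simulated data and the parameter values are assumptions made for this illustration, not taken from the lecture, and the quadrature nodes and weights are computed from scratch via the standard Golub-Welsch eigenvalue construction rather than taken from the packages mentioned above.

```r
# A minimal sketch: one cluster with a single random intercept (assumed example
# data and parameter values):  Y_i | U = u ~ Poisson(exp(beta + u)), U ~ N(0, tau^2)
set.seed(1)
beta <- 1
tau  <- 0.5
n    <- 10
y    <- rpois(n, exp(beta + rnorm(1, 0, tau)))

# g(u) = log f(y|u) + log f(u), the log of the integrand
g <- function(u) sum(dpois(y, exp(beta + u), log = TRUE)) + dnorm(u, 0, tau, log = TRUE)

# Laplace approximation: mu_LP = argmax g, sigma_LP^2 = -1/g''(mu_LP)
u_hat <- optimize(g, interval = c(-10, 10), maximum = TRUE)$maximum
g2    <- -n * exp(beta + u_hat) - 1 / tau^2      # analytic g''(u_hat) for this model
sigma <- sqrt(-1 / g2)
f_laplace <- exp(g(u_hat)) * sqrt(2 * pi) * sigma

# Gauss-Hermite nodes/weights for the standard normal weight function,
# int h(x) phi(x) dx ~ sum_i w_i h(x_i), via the Golub-Welsch eigenvalue construction
gh_rule <- function(nq) {
  J   <- matrix(0, nq, nq)
  off <- sqrt(seq_len(nq - 1))
  J[cbind(1:(nq - 1), 2:nq)] <- off
  J[cbind(2:nq, 1:(nq - 1))] <- off
  e <- eigen(J, symmetric = TRUE)
  list(x = e$values, w = e$vectors[1, ]^2)
}

# Adaptive Gauss-Hermite: substitute u = mu_LP + sigma_LP * x and integrate against phi(x)
gh     <- gh_rule(5)
f_aghq <- sum(gh$w * sapply(gh$x, function(x) exp(g(u_hat + sigma * x)) * sigma / dnorm(x)))

# Reference value by ordinary numerical integration
f_num <- integrate(function(u) sapply(u, function(v) exp(g(v))), -Inf, Inf)$value
c(laplace = f_laplace, aghq = f_aghq, numerical = f_num)
```

All three values should be close; with only five nodes the adaptive rule is typically already very near the numerical reference.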
Prediction of random effects for GLMM
The conditional mean
$$E[U \mid Y = y] = \int u\, f(u \mid y)\,du$$
is the minimum mean square error predictor, i.e.
$$E(U - \hat U)^2$$
is minimal with $\hat U = H(Y)$, where $H(y) = E[U \mid Y = y]$. It is difficult to evaluate
$$E[U \mid Y = y] = \int u\, f(y \mid u) f(u)/f(y)\,du$$
analytically.

Computation of conditional expectations
$$E[U \mid Y = y] = \int u\, \frac{f(y \mid u) f(u)}{f(y)}\,du = \frac{1}{f(y)} \int (\sigma_{LP} x + \mu_{LP}) \frac{f(y \mid \sigma_{LP} x + \mu_{LP}) f(\sigma_{LP} x + \mu_{LP})}{\phi(x)} \sigma_{LP}\, \phi(x)\,dx.$$
Note:
$$(\sigma_{LP} x + \mu_{LP}) \frac{f(y \mid \sigma_{LP} x + \mu_{LP}) f(\sigma_{LP} x + \mu_{LP})}{\phi(x)} \sigma_{LP}$$
behaves like a first-order polynomial in $x$ - hence Gauss-Hermite quadrature is still accurate.

Hierarchical/nested models
Suppose $U = (U_1, \dots, U_k)$ consists of independent normals and $Y_i = (Y_{i1}, \dots, Y_{in_i})$ only depends on $U_i$. Then we can factorize the likelihood into one-dimensional integrals:
$$f(y) = \prod_i \int_{\mathbb{R}} f(y_i \mid u_i) f(u_i)\,du_i.$$
Hence we are back in the one-dimensional case.

Suppose two levels of nested random effects ($Y_{ij} = (Y_{ij1}, \dots, Y_{ijn_i})$ depends on $U_{ij}$ and $U_i$):
$$f(y) = \prod_{i=1}^{m_1} \int_{\mathbb{R}} \Big[\prod_{j=1}^{m_2} \int_{\mathbb{R}} f(y_{ij} \mid u_{ij}, u_i)\, \phi(u_{ij})\,du_{ij}\Big] \phi(u_i)\,du_i.$$
Hence $m_1$ one-dimensional integrals, each involving $m_2$ one-dimensional integrals. The computational cost is roughly $O(m_1 m_2 q^2)$, where $q$ is the number of quadrature points for a one-dimensional integral (ignoring the cost of a possible Laplace approximation for adaptation). Still manageable.

Difficult cases for numerical integration - dimension $k \gg 1$
◮ correlated random effects: cost $q^k$
◮ crossed random effects (e.g. $\eta_{ij} = \mu + u_i + v_j$): cost $m_1\, q\, q^{m_2}$, where $m_1$ and $m_2$ are the numbers of $u_i$'s and $v_j$'s
It is not possible to factorize the likelihood into low-dimensional integrals, hence numerical quadrature may not be feasible. Alternatives: Laplace approximation or Monte Carlo computation.

Monte Carlo computation of likelihood for GLMM
The likelihood function is an expectation:
$$f(y) = \int_{\mathbb{R}^k} f(y \mid u) f(u)\,du = E f(y \mid U).$$
Use Monte Carlo simulations to approximate the expectation.

Simple Monte Carlo
$$f(y) = \int_{\mathbb{R}^k} f(y \mid u) f(u)\,du = E f(y \mid U) \approx \hat f(y) = \frac{1}{M} \sum_{l=1}^M f(y \mid U^l; \beta),$$
where the $U^l \sim N(0, \Sigma)$ are independent. Monte Carlo variance:
$$\text{Var}(L_{SMC}(\theta)) = \frac{1}{M} \text{Var}\, f(y \mid U^1; \beta).$$
Estimate $\text{Var}\, f(y \mid U^1; \beta)$ using the empirical variance estimate based on $f(y \mid U^l; \beta)$, $l = 1, \dots, M$:
$$\frac{1}{M - 1} \sum_{l=1}^M \big(f(y \mid U^l; \beta) - \hat f(y)\big)^2.$$
Often $\text{Var}\, f(y \mid U^1; \beta)$ is large, so a large $M$ is needed. Increasing the dimension $k$ leads to worse performance (useless in high dimensions - curse of dimensionality).

Importance sampling
Consider $Z \sim f$ and suppose we wish to evaluate $E h(Z)$ where $h(Z)$ has large variance. Suppose we can find a density $g$ so that
$$\frac{h(z) f(z)}{g(z)} \approx \text{const} \qquad \text{and} \qquad h(z) f(z) > 0 \Rightarrow g(z) > 0.$$
Then
$$E h(Z) = \int \frac{h(z) f(z)}{g(z)} g(z)\,dz = E \frac{h(Y) f(Y)}{g(Y)},$$
where $Y \sim g$. Note that the variance of $h(Y) f(Y)/g(Y)$ is small, so the estimate
$$E h(Z) \approx \frac{1}{M} \sum_{i=1}^M \frac{h(Y_i) f(Y_i)}{g(Y_i)}$$
has small Monte Carlo error.

Importance sampling for GLMM
Note $g(\cdot)$ is a probability density on $\mathbb{R}^k$.
$$f(y) = \int_{\mathbb{R}^k} f(y \mid u) f(u)\,du = \int_{\mathbb{R}^k} \frac{f(y \mid u) f(u)}{g(u)} g(u)\,du = E \frac{f(y \mid V) f(V)}{g(V)}, \qquad V \sim g(\cdot).$$
$$f(y) \approx \hat f_{IS,g}(\theta) = \frac{1}{M} \sum_{l=1}^M \frac{f(y \mid V^l) f(V^l)}{g(V^l)}, \qquad V^l \sim g(\cdot),\ l = 1, \dots, M.$$
Find $g$ so that $\text{Var}\, \frac{f(y \mid V) f(V)}{g(V)}$ is small. $\text{Var}\, \hat f_{IS,g}(\theta) < \infty$ if $f(y \mid v) f(v)/g(v)$ is bounded (i.e. use $g(\cdot)$ with heavy tails).

Possibility: note that
$$\frac{f(y \mid u) f(u)}{f(u \mid y)} = f(y) = \text{const},$$
and the Laplace approximation gives $U \mid Y = y \approx N(\mu_{LP}, \Sigma_{LP})$. Use for $g(\cdot)$ the density of the $N(\mu_{LP}, \Sigma_{LP})$ or $t_\nu(\mu_{LP}, \Sigma_{LP})$ distribution:
$$\frac{f(y \mid u) f(u)}{g(u)} \approx \text{const},$$
so small variance. Simulation is straightforward. Note: this is a 'Monte Carlo version' of adaptive Gaussian quadrature.
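Continuing the same simulated example (the objects y, beta, tau, g, u_hat and sigma are those defined in the sketch above; the sample size M is an arbitrary choice), the following sketch contrasts simple Monte Carlo with importance sampling based on the Laplace proposal $N(\mu_{LP}, \sigma_{LP}^2)$:

```r
# A sketch reusing y, beta, tau, g, u_hat and sigma from the Laplace / adaptive
# quadrature sketch above; M is an arbitrary choice
M <- 1000

# Simple Monte Carlo: average f(y | U^l) over U^l ~ N(0, tau^2)
u_smc <- rnorm(M, 0, tau)
h_smc <- sapply(u_smc, function(u) exp(sum(dpois(y, exp(beta + u), log = TRUE))))
f_smc <- mean(h_smc)

# Importance sampling with the Laplace-based proposal g_IS = N(mu_LP, sigma_LP^2):
# weights f(y | V^l) f(V^l) / g_IS(V^l), which should be nearly constant (close to f(y))
v    <- rnorm(M, u_hat, sigma)
h_is <- sapply(v, function(u) exp(g(u) - dnorm(u, u_hat, sigma, log = TRUE)))
f_is <- mean(h_is)

# Both estimate f(y); the importance sampling terms vary far less
c(smc = f_smc, is = f_is)
c(se_smc = sd(h_smc) / sqrt(M), se_is = sd(h_is) / sqrt(M))
```

Because the importance weights are nearly constant, the importance sampling estimate typically has a much smaller Monte Carlo standard error than simple Monte Carlo for the same M.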
Suppose a parametric model
$$f(y; \theta) = \int f(y \mid u; \theta) f(u; \theta)\,du = L(\theta).$$
Consider a fixed $\theta_0$ and take
$$g(u) = f(u \mid y, \theta_0) = f(y \mid u; \theta_0) f(u; \theta_0)/L(\theta_0).$$
Then
$$L(\theta) = \int \frac{f(y \mid u; \theta) f(u; \theta)}{g(u)} g(u)\,du = L(\theta_0) \int \frac{f(y \mid u; \theta) f(u; \theta)}{f(y \mid u; \theta_0) f(u; \theta_0)} f(u \mid y, \theta_0)\,du = L(\theta_0)\, E_{\theta_0}\Big[\frac{f(y \mid U; \theta) f(U; \theta)}{f(y \mid U; \theta_0) f(U; \theta_0)} \,\Big|\, Y = y\Big]$$
$$\Leftrightarrow \quad \frac{L(\theta)}{L(\theta_0)} = E_{\theta_0}\Big[\frac{f(y \mid U; \theta) f(U; \theta)}{f(y \mid U; \theta_0) f(U; \theta_0)} \,\Big|\, Y = y\Big].$$

This suffices for finding the MLE:
$$\arg\max_\theta L(\theta) = \arg\max_\theta \frac{L(\theta)}{L(\theta_0)}.$$
Moreover,
$$\frac{L(\theta)}{L(\theta_0)} \approx \frac{1}{M} \sum_{l=1}^M \frac{f(y \mid U^l; \theta) f(U^l; \theta)}{f(y \mid U^l; \theta_0) f(U^l; \theta_0)}, \qquad U^l \sim f(u \mid y; \theta_0).$$
So we can estimate the ratio $L(\theta)/L(\theta_0)$, where $L(\theta_0)$ is an unknown constant. We need $\theta$ close to $\theta_0$, so that $f(y \mid u; \theta) f(u; \theta)/g(u)$ is close to constant.

Maximization of likelihood using Newton-Raphson
Let
$$V_\theta(y, u) = \frac{d}{d\theta} \log f(y, u \mid \theta).$$
Then the score is
$$u(\theta) = \frac{d}{d\theta} \log L(\theta) = E_\theta[V_\theta(y, U) \mid Y = y]$$
and the observed information is
$$j(\theta) = -\frac{d^2}{d\theta^T d\theta} \log L(\theta) = -E_\theta[dV_\theta(y, U)/d\theta^T \mid Y = y] - \text{Var}_\theta[V_\theta(y, U) \mid Y = y].$$
Newton-Raphson:
$$\theta_{l+1} = \theta_l + j(\theta_l)^{-1} u(\theta_l).$$
All unknown expectations and variances can be estimated using the previous numerical integration or Monte Carlo methods!

Next time: simulation methods and integrated nested Laplace approximation (INLA).

Exercises
1. R exercises on the exercise sheets exercises6.pdf and exercises7.pdf (skip the last exercise on sheet exercises7.pdf).
2. Check the formulas for $u(\theta)$ and $j(\theta)$. How can these expressions be computed using importance sampling?
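Relating to the ratio $L(\theta)/L(\theta_0)$ above (and to exercise 2), here is a rough sketch for the simulated example, with $\theta = \tau$ and $\beta$ kept fixed, reusing y, beta, u_hat and sigma from the earlier sketches. Exact draws from $f(u \mid y; \theta_0)$ are not directly available for this model, so the sketch substitutes draws from the Laplace approximation $N(\mu_{LP}, \sigma_{LP}^2)$ of the conditional; the resulting ratio estimate is therefore only approximate.

```r
# A rough sketch of the ratio estimate L(tau)/L(tau0), reusing y, beta, u_hat and
# sigma from the sketches above. Draws from the Laplace approximation of f(u | y; tau0)
# stand in for exact conditional draws, so the estimate is only approximate.
log_joint <- function(u, tau) sum(dpois(y, exp(beta + u), log = TRUE)) + dnorm(u, 0, tau, log = TRUE)

tau0   <- 0.5                          # the fixed value theta_0
M      <- 2000
u_cond <- rnorm(M, u_hat, sigma)       # approximate draws from f(u | y; tau0)

ratio_hat <- function(tau) {           # estimate of L(tau)/L(tau0)
  mean(sapply(u_cond, function(u) exp(log_joint(u, tau) - log_joint(u, tau0))))
}

# Profile the ratio over a grid of tau values near tau0; since L(tau0) is a constant,
# the maximiser of the ratio is an (approximate) maximum likelihood estimate of tau
taus   <- seq(0.3, 0.8, by = 0.05)
ratios <- sapply(taus, ratio_hat)
taus[which.max(ratios)]                # approximate MLE of tau on the grid
```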