Penalized Spline Smoothing, Mixed Models
(and Bayesian Statistics): Three Players in a Liaison
Göran Kauermann
Centre for Statistics
Bielefeld University
Germany
London, 15 April 2010
(Penalized) Regression Splines in a nutshell
The simple smoothing model y = \mu(x) + \varepsilon is fitted by replacing

\mu(x) = \beta_0 + x\beta_x + x^2\beta_{xx} + \sum_{k=1}^{K} u_k (x - \tau_k)_+^2 = X(x)\beta + Z(x)u = B(x)\theta

for knots \tau_1, \tau_2, \ldots, \tau_K with K large. This yields the estimate

\hat{\mu} = B(B^T B)^{-1} B^T y

where B^T B is of dimension (q + K) \times (q + K), with K "large, but not too large".
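As a minimal sketch of this fit (my illustration, not code from the talk: the function name `trunc_basis` and the simulated data are assumptions, and plain powers are used instead of the x^j/j! scaling, which changes only the coefficient scale, not the fit):

```python
import numpy as np

def trunc_basis(x, knots, q=2):
    """B(x) = [1, x, ..., x^q, (x - tau_1)_+^q, ..., (x - tau_K)_+^q]."""
    X = np.vander(x, q + 1, increasing=True)       # columns 1, x, ..., x^q
    Z = np.maximum(x[:, None] - knots[None, :], 0.0) ** q
    return np.hstack([X, Z])

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)

knots = np.linspace(0, 1, 7)[1:-1]                 # K = 5 interior knots
B = trunc_basis(x, knots)
theta_hat = np.linalg.solve(B.T @ B, B.T @ y)      # = (B^T B)^{-1} B^T y
mu_hat = B @ theta_hat                             # fitted curve
```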
Regression Splines

We get the estimate via

\hat{\mu} = B(B^T B)^{-1} B^T y

[Figure: unpenalized regression spline fits on [0, 1]; left panel: K = 5, q = 2; right panel: K = 10, q = 2.]
Need for Penalization
Penalized Splines
(O'Sullivan, 1986; Eilers & Marx, 1996; Ruppert, Wand & Carroll, 2003)
“The Penalized Spline Recipe”:
1. Take a rich, high-dimensional basis B(x); choose K generously large.
2. Minimize the penalized least squares criterion

(y - B\theta)^T (y - B\theta) + \lambda \theta^T D \theta \to \min

with D an adequately chosen penalty matrix, in the simplest case the identity matrix.
3. Choose the penalty parameter \lambda in a data-driven way (e.g. by cross validation).
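A minimal sketch of step 2 (my illustration): the criterion has the closed-form minimizer \hat\theta = (B^T B + \lambda D)^{-1} B^T y. This reuses the hypothetical `trunc_basis` helper and data from the sketch above, and penalizes only the truncated-power block, as the mixed model formulation below does:

```python
# Penalized least squares: theta_hat = (B^T B + lambda * D)^{-1} B^T y.
lam = 1.0                                    # in practice chosen data-driven, e.g. by CV
knots = np.linspace(0, 1, 42)[1:-1]          # K = 40, "generously large"
B = trunc_basis(x, knots)
D = np.diag([0.0] * 3 + [1.0] * knots.size)  # identity on u; no penalty on the polynomial part
theta_pen = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
mu_pen = B @ theta_pen                       # penalized spline fit
```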
Unpenalized versus Penalized Spline
Reformulation
For B-splines (O'Sullivan splines) we can rewrite the penalized estimation as (Wand & Ormerod, 2008, Aust. & NZ J. Stat.)

y = X\beta + Zu + \varepsilon

with X low dimensional and Z high dimensional, and the penalized least squares criterion

(y - X\beta - Zu)^T (y - X\beta - Zu) + \lambda u^T D u \to \min_{u, \beta}

with penalty matrix D usually chosen as the identity matrix.
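Written out (a standard algebra step added for completeness, not taken from the slide), the minimizer solves the joint normal equations with C = (X, Z):

```latex
\begin{pmatrix} \hat\beta \\ \hat u \end{pmatrix}
  = \left( C^T C + \lambda \begin{pmatrix} 0 & 0 \\ 0 & D \end{pmatrix} \right)^{-1} C^T y,
  \qquad C = (X, Z).
```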
Linking Penalized Splines with Linear Mixed Models
We interpret the penalty as an a priori normal distribution:

u \sim N(0, \sigma_u^2 D^{-1})    (1)

y \mid u \sim N(X\beta + Zu, \sigma_\varepsilon^2 I)    (2)

With (1) and (2) we get a Linear Mixed Model.

The posterior Bayes estimate (or the BLUP) and the penalized estimate are equivalent, that is

\hat{\mu} = X\hat{\beta} + Z\hat{u}
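A minimal numerical check of this equivalence (my illustration, reusing the hypothetical objects `B`, `lam`, `mu_pen` from the sketches above): the GLS/BLUP route through the marginal covariance reproduces the penalized fit whenever \lambda = \sigma_\varepsilon^2/\sigma_u^2, by Henderson's mixed model equations.

```python
# Marginal covariance V = sigma_e^2 I + sigma_u^2 Z Z^T, GLS beta, BLUP u.
sigma_e2 = 0.09                     # assumed value; only the ratio matters
sigma_u2 = sigma_e2 / lam           # so that lambda = sigma_e2 / sigma_u2
Xp, Zp = B[:, :3], B[:, 3:]         # polynomial part, truncated powers
V = sigma_e2 * np.eye(x.size) + sigma_u2 * Zp @ Zp.T
beta_hat = np.linalg.solve(Xp.T @ np.linalg.solve(V, Xp),
                           Xp.T @ np.linalg.solve(V, y))
u_blup = sigma_u2 * Zp.T @ np.linalg.solve(V, y - Xp @ beta_hat)
assert np.allclose(Xp @ beta_hat + Zp @ u_blup, mu_pen, atol=1e-6)
```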
Extending Penalized Splines to
Generalized Linear Mixed Models (GLMM)
We can extend the model to

E(Y \mid x) = h\{\eta(x)\}

with h(\cdot) as known (natural) response function (the inverse link) and Y exponential-family distributed, f(y) \propto \exp\{y\,\eta(x) - b(\eta(x))\}.

Replacing \eta(x) = X(x)\beta + Z(x)u yields the GLMM

u \sim N(0, \sigma_u^2 D^{-1}), \qquad E(Y \mid u) = h\{X\beta + Zu\}
Marginal Likelihood and Laplace Approximation
The (marginal) likelihood results as

L(\beta) = |\sigma_u^2 D^{-1}|^{-1/2} \int \exp\left\{ \sum_{i=1}^{n} \left[ Y_i \eta(x_i) - b(\eta(x_i)) \right] \right\} \exp\left\{ -\frac{u^T D u}{2\sigma_u^2} \right\} du

The likelihood is not available analytically, but it can be approximated with a Laplace approximation.

This yields the penalized likelihood.
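The penalized likelihood itself did not survive extraction; as a standard reconstruction (my addition, following the usual GLMM treatment, e.g. Ruppert, Wand & Carroll, 2003), the Laplace approximation maximizes over u the penalized log-likelihood

```latex
\ell_p(\beta, u) = \sum_{i=1}^{n} \bigl[ Y_i \eta(x_i) - b(\eta(x_i)) \bigr]
  - \frac{u^T D u}{2\sigma_u^2},
  \qquad \eta(x_i) = X(x_i)\beta + Z(x_i)u .
```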
Properties of a Statistical Partnership
• The smoothing parameter \lambda = \sigma_\varepsilon^2/\sigma_u^2 can be estimated by maximum likelihood.
⇒ This avoids grid searching!
(Kauermann, 2005, JSPI; Ruppert, Wand & Carroll, 2003)
• The smoothing parameter can be selected in the presence of correlated errors.
⇒ AIC based selection fails here!
(Krivobokova & Kauermann, 2007, JASA)
More on the Statistical Partnership
• Asymptotic results on the number of knots
⇒ Penalized splines are asymptotically justified!
(Kauermann, Krivobokova & Fahrmeir, 2009, JRSS B; Claeskens, Krivobokova & Opsomer, 2009, Biometrika; Li & Ruppert, 2009, Biometrika)
• Validity of the Laplace approximation for Bayesian smoothing
⇒ Laplace is an alternative to MCMC!
(Kauermann, Krivobokova & Fahrmeir, 2009, JRSS B; Rue, Martino & Chopin, 2009, JRSS B)
More on the Statistical Partnership
• Local Adaptive Smoothing
⇒ Simple and fast computation!
(Krivobokova, Crainiceanu & Kauermann, 2008, JCGS)
• Model Selection
⇒ Automatic correction for degrees of freedom!
(Wager, Vaida & Kauermann, 2008, Aust. & NZ J. Stat.; Kneib & Greven, 2010, Biometrika)
• and ...
Selecting the Spline Dimension K
(a finite sample view)
Selecting K, the number of knots (in finite samples)
• Eilers & Marx (1996, Stat. Science) suggest using a generous number of knots.
• Ruppert (2002, JCGS) suggests the rule of thumb K = min(40, n/4).
• Wood (2006, Generalized Additive Models; R package mgcv) suggests K = 10 \cdot 3^{d-1} with d as the dimension of the covariates.
• Question: Can the mixed model formulation be used to determine the number of knots?
Linear Mixed Models and Truncated Polynomials
We use a truncated polynomial basis:

X(x) = \left( 1, x, \frac{x^2}{2}, \ldots, \frac{x^q}{q!} \right) \quad \text{and} \quad Z_K(x) = \left( \frac{(x - \tau_1)_+^q}{q!}, \ldots, \frac{(x - \tau_{K-1})_+^q}{q!} \right)

where (x)_+^q = x^q for x > 0 and 0 otherwise.

Coefficients u are penalized in the mixed model formulation:

u \sim N(0, \sigma_u^2 I_{K-1}), \qquad Y \mid u \sim N(X\beta + Z_K u, \sigma_\varepsilon^2 I_n)
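A minimal sketch of this basis construction (my illustration; `tp_basis` is a hypothetical helper name), with equidistant knots \tau_1 < \ldots < \tau_K = 1 as assumed later:

```python
from math import factorial
import numpy as np

def tp_basis(x, K, q):
    """Truncated polynomial basis with equidistant knots 0 < tau_1 < ... < tau_K = 1:
    X(x) = (1, x, x^2/2!, ..., x^q/q!),
    Z_K(x) = ((x - tau_1)_+^q / q!, ..., (x - tau_{K-1})_+^q / q!)."""
    taus = np.linspace(0.0, 1.0, K + 1)[1:]          # tau_1, ..., tau_K = 1
    X = np.column_stack([x**j / factorial(j) for j in range(q + 1)])
    Z = np.maximum(x[:, None] - taus[None, :-1], 0.0) ** q / factorial(q)
    return X, Z
```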
Marginal Formulation of the Linear Mixed Models
Integrating out the spline coefficients u yields the marginal model

Y \sim N\left( X\beta,\; \sigma_\varepsilon^2 V_K(\lambda) \right)

with \lambda = \sigma_\varepsilon^2/\sigma_u^2 and V_K(\lambda) = I + \frac{1}{\lambda} Z_K Z_K^T.

This yields the log-likelihood:

l_K(\beta, \sigma_\varepsilon^2, \lambda) = -\frac{n}{2} \log(\sigma_\varepsilon^2) - \frac{1}{2} \log|V_K(\lambda)| - \frac{1}{2\sigma_\varepsilon^2} (Y - X\beta)^T V_K(\lambda)^{-1} (Y - X\beta)
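A hedged sketch of evaluating this log-likelihood with \beta and \sigma_\varepsilon^2 profiled out (my illustration, reusing the hypothetical `tp_basis` helper above):

```python
def loglik_K(y, x, K, lam, q=2):
    """Profile log-likelihood l_K for fixed K and lambda (beta, sigma_e^2 maximized out)."""
    X, Z = tp_basis(x, K, q)
    n = y.size
    V = np.eye(n) + (1.0 / lam) * Z @ Z.T                  # V_K(lambda)
    beta = np.linalg.solve(X.T @ np.linalg.solve(V, X),
                           X.T @ np.linalg.solve(V, y))    # GLS estimate of beta
    r = y - X @ beta
    sig2 = r @ np.linalg.solve(V, r) / n                   # ML estimate of sigma_e^2
    return -0.5 * n * np.log(sig2) - 0.5 * np.linalg.slogdet(V)[1] - 0.5 * n
```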
K as parameter
Question: How does l_K(\beta, \sigma_\varepsilon^2, \lambda) depend on K?

Idea: We consider K as a (discrete-valued) parameter and maximize l_K(\beta, \sigma_\varepsilon^2, \lambda) with respect to K and \beta, \sigma_\varepsilon^2 and \lambda.
Example
Theoretical Justification
Assumptions:
• We assume that the covariate x is uniformly distributed on [0, 1] and the knots 0 < \tau_1 < \ldots < \tau_K = 1 are placed such that \tau_j - \tau_{j-1} = O(K^{-1}), j = 2, 3, \ldots, K.
• We assume that \sigma_u^2 = cK^{-1} for some constant c.

Note: The latter assumption guarantees differentiability of the (random) function \mu(x).
Profile Likelihood
The maximized likelihood equals

l_K(\hat\beta, \hat\sigma_\varepsilon^2, \hat\lambda) = -\frac{n}{2} \log(\hat\sigma_\varepsilon^2) - \frac{1}{2} \log|V_K(\hat\lambda)|

We can show that for K increasing and n fixed:

1. \hat\sigma_\varepsilon^2 = \sigma_\varepsilon^2 \left\{ 1 + O_p\left(n^{-1/2}\right) \right\}

2. \log|V_K(\lambda)| \approx \log\left(1 + \frac{n}{c}\right) O\left(1 + K^a\right) for some constant a < 0 and some constant c.

The maximized likelihood does NOT change when K increases.
Simulations
Practical Consequence
In practice, K can be chosen as follows:
1. Forward Selection Style:
Choose a moderate K, fit the model and increase K until the likelihood no longer increases.
2. Backward Selection Style:
Choose a generous K and decrease it slightly to check that the likelihood does not decrease.
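A minimal sketch of the forward selection style (my illustration; it assumes SciPy and the hypothetical `loglik_K` helper above, and the step size and tolerance are arbitrary choices):

```python
from scipy.optimize import minimize_scalar

def max_loglik(y, x, K, q=2):
    """Maximize l_K over lambda (searched on the log scale) for fixed K."""
    res = minimize_scalar(lambda t: -loglik_K(y, x, K, float(np.exp(t)), q),
                          bounds=(-10.0, 10.0), method="bounded")
    return -res.fun

K, ll = 5, max_loglik(y, x, 5)
while True:                          # increase K until l_K stops increasing
    ll_new = max_loglik(y, x, K + 5)
    if ll_new <= ll + 1e-4:
        break
    K, ll = K + 5, ll_new
print("selected K:", K)
```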
Generalization
The result extends to Generalized Linear Mixed Models (GLMM) of the form E(Y|x) = h\{\eta(x)\} with

l_K(\beta, \lambda) = -\frac{K}{2} \log(\sigma_u^2) + \log \int \exp\left\{ l_K(\beta \mid u) - \frac{1}{2} \frac{u^T u}{\sigma_u^2} \right\} du

The above integral is not available analytically.

It can be shown that for K \to \infty and sample size n fixed, the above likelihood can be approximated with a Laplace approximation.
Note: This is not an asymptotic but a finite sample result!
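For concreteness (a standard formula added here, not taken from the slide): writing g(u) = l_K(\beta \mid u) - u^T u / (2\sigma_u^2) with u of dimension K - 1, the Laplace approximation expands g around its maximizer:

```latex
\log \int e^{g(u)}\, du \approx g(\hat u) + \frac{K-1}{2}\log(2\pi)
  - \frac{1}{2} \log\left| -\frac{\partial^2 g(\hat u)}{\partial u\, \partial u^T} \right|,
  \qquad \hat u = \arg\max_u g(u).
```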
Example: Toxoplasmosis in Pregnancy
We observe Y_t = number of pregnant women with acute toxoplasmosis (found in regular screenings of more than 50,000 women in the federal state of Upper Austria),

Y_t \sim \text{Poisson}\left( \exp\{\mu(\text{week}_t) + t\beta_t\} \right),

where \mu(\text{week}) is a cyclic function.
Example: Toxoplasmosis in Pregnancy
Selecting the Spline Dimension K
(an asymptotic view)
How does K depend on n?
• Large K scenario:
K is chosen large, as in classical spline smoothing.
⇒ The penalty asymptotically dominates.
(Li & Ruppert, 2009, Biometrika; Claeskens, Krivobokova & Opsomer, 2009, Biometrika)
• Small K scenario:
K is chosen moderate, as in regression splines.
⇒ The penalty has asymptotically no influence.
(Kauermann, Krivobokova & Fahrmeir, 2009, JRSS B)
A Deeper Asymptotic Insight
We assume now that K depends on sample size n.
The research questions are then twofold:
1. In penalized spline smoothing, how should K = K(n) depend on n
such that consistency is justified?
2. If K = K(n), what is the impact on the validity of the Laplace approximation in GLMM?
(Focus in this talk)
Generalized Smoothing Model
We assume the generalized spline smoothing model

u \sim N(0, \sigma_u^2 D^{-1}), \qquad E(y \mid u) = h\{X\beta + Zu\}

yielding the marginal likelihood

l(\beta, \sigma_u^2) = \log \int f(y \mid u)\, \phi(u; \sigma_u^2 D^{-1})\, du

Question: If K = K(n), is the Laplace approximation of the marginal likelihood still valid?
Assumptions (sketch)
We work with truncated polynomials of order q and make the following
assumptions:
1. Let x_i have support [0, 1] and let the knots 0 = \tau_0 < \tau_1 < \ldots < \tau_K < \tau_{K+1} = 1 be chosen such that \tau_{j+1} - \tau_j = O(K^{-1}).
(knots are placed equidistantly)
2. We assume u \sim N(0, \sigma_u^2) with \sigma_u^{-2} \propto n^{2/(2q+3)}, with q as the order of the polynomial basis.
(this guarantees smoothness and consistency)
Result
We can show that

l(\beta, \sigma_u^2) = \int \exp\{l(\beta, u, \sigma_u^2)\}\, \phi(u; \sigma_u^2)\, du = l_{\text{Laplace}}(\beta, \sigma_u^2)\, \{1 + O(\varepsilon_0)\}

where \varepsilon_0, the leading error term in the Laplace approximation, is of negligible order.
Consequences of the Result
• Good news: The Laplace approximation is valid for q ≥ 1 (i.e. at least a linear basis), even in asymptotic terms.
• Shun & McCullagh (1995) derived the result for q = 0 and found inconsistency.
• The result holds even in a quite general (Bayesian) framework:
(Kauermann, Krivobokova & Fahrmeir, 2009, JRSS B; Rue, Martino & Chopin, 2009, JRSS B)
Discussion
• The liaison between Penalized Splines and Mixed Models allows for
new, innovative statistical modelling.
• Laplace approximation makes everything run and works even in complex models.
• The partnership builds a bridge to Bayesian Modelling.
• In fact, one may extend the idea by imposing a prior on β as well
(work in progress).
• ...
Papers and Preprints available from www.wiwi.uni-bielefeld.de/statistik