Hierarchical Models
Stefan Kiebel
Wellcome Dept. of Imaging Neuroscience
University College London
Overview
Why hierarchical models?
The model
Estimation
Expectation-Maximization
Summary Statistics approach
Variance components
Examples
Data
fMRI, single subject
fMRI, multi-subject
EEG/MEG, single subject
ERP/ERF, multi-subject
Hierarchical model for all imaging data!
Reminder: voxel by voxel
[Figure: a single-voxel time series (intensity over time) passes through model specification, parameter estimation, hypothesis and statistic to give the SPM.]
General Linear Model
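$$ y = X\theta + \epsilon $$
with $y$: $N \times 1$, $X$: $N \times p$, $\theta$: $p \times 1$, $\epsilon$: $N \times 1$
N: number of scans
p: number of regressors
Model is specified by
1. Design matrix $X$
2. Assumptions about $\epsilon$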
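As a minimal sketch of the general linear model above, the following fits a simulated single-voxel time series by ordinary least squares. The sizes, the regressors and the i.i.d. Gaussian error are assumptions chosen for illustration, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p = 100, 3                                   # scans, regressors (illustrative)
X = np.column_stack([np.ones(N), rng.standard_normal((N, p - 1))])
theta_true = np.array([2.0, 1.0, -0.5])         # hypothetical parameters
y = X @ theta_true + 0.1 * rng.standard_normal(N)   # assumed i.i.d. error

# Ordinary least squares: theta_hat = pinv(X) y
theta_hat = np.linalg.pinv(X) @ y
residuals = y - X @ theta_hat
sigma2_hat = residuals @ residuals / (N - p)    # unbiased error variance estimate
```

The design matrix and the error assumption are the only two ingredients; changing the error assumption is what the later slides on variance components address.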
Linear hierarchical model
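Hierarchical model
$$ y = X^{(1)}\theta^{(1)} + \epsilon^{(1)} $$
$$ \theta^{(1)} = X^{(2)}\theta^{(2)} + \epsilon^{(2)} $$
$$ \vdots $$
$$ \theta^{(n-1)} = X^{(n)}\theta^{(n)} + \epsilon^{(n)} $$
Multiple variance components at each level
$$ C_\epsilon^{(i)} = \sum_k \lambda_k^{(i)} Q_k^{(i)} $$
At each level, distribution of parameters is given by level above.
What we don't know: distribution of parameters and variance parameters.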
Example: Two level model
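First level:
$$ y = X^{(1)}\theta^{(1)} + \epsilon^{(1)} $$
Second level:
$$ \theta^{(1)} = X^{(2)}\theta^{(2)} + \epsilon^{(2)} $$
[Figure: the first-level design matrix $X^{(1)}$ is block diagonal with blocks $X_1^{(1)}$, $X_2^{(1)}$, $X_3^{(1)}$; the second-level design matrix is $X^{(2)}$.]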
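A two-level model of this kind can be simulated in a few lines. The sketch below assumes three subjects with identical single-regressor (boxcar) first-level designs and a second-level design that is a column of ones; all sizes and variances are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_scans = 3, 50                          # illustrative sizes

# First level: block-diagonal design, one boxcar regressor per subject
x_s = np.tile([0.0, 1.0], n_scans // 2).reshape(-1, 1)
X1 = np.kron(np.eye(n_subj), x_s)                # shape (150, 3)

# Second level: subject effects scatter around one population effect
X2 = np.ones((n_subj, 1))
theta2 = np.array([1.5])                         # hypothetical population effect
eps2 = 0.5 * rng.standard_normal(n_subj)         # between-subject error
theta1 = X2 @ theta2 + eps2

# Data: first-level prediction plus within-subject error
eps1 = rng.standard_normal(n_subj * n_scans)
y = X1 @ theta1 + eps1
```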
Algorithmic Equivalence
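Hierarchical model, estimated with Parametric Empirical Bayes (PEB):
$$ y = X^{(1)}\theta^{(1)} + \epsilon^{(1)} $$
$$ \theta^{(1)} = X^{(2)}\theta^{(2)} + \epsilon^{(2)} $$
$$ \vdots $$
$$ \theta^{(n-1)} = X^{(n)}\theta^{(n)} + \epsilon^{(n)} $$
Single-level model, estimated with Restricted Maximum Likelihood (ReML):
$$ y = \epsilon^{(1)} + X^{(1)}\epsilon^{(2)} + \dots + X^{(1)} \cdots X^{(n-1)}\epsilon^{(n)} + X^{(1)} \cdots X^{(n)}\theta^{(n)} $$
EM = PEB = ReML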
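The collapse of a hierarchical model into a single-level model can be checked numerically. This sketch assumes a two-level model with i.i.d. errors at each level (standard deviations `sigma1` and `sigma2` are hypothetical) and shows that substituting the second level into the first gives the same data, with a collapsed design matrix and a structured error covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_scans = 3, 50                          # illustrative sizes

x_s = np.tile([0.0, 1.0], n_scans // 2).reshape(-1, 1)
X1 = np.kron(np.eye(n_subj), x_s)                # first-level design
X2 = np.ones((n_subj, 1))                        # second-level design
theta2 = np.array([1.5])
sigma1, sigma2 = 1.0, 0.5                        # assumed error standard deviations
eps1 = sigma1 * rng.standard_normal(n_subj * n_scans)
eps2 = sigma2 * rng.standard_normal(n_subj)

# Hierarchical form
y_hier = X1 @ (X2 @ theta2 + eps2) + eps1

# Single-level form: y = eps1 + X1 eps2 + X1 X2 theta2
X = X1 @ X2                                      # collapsed design matrix
y_single = eps1 + X1 @ eps2 + X @ theta2

# Error covariance of the single-level model
V = sigma1**2 * np.eye(n_subj * n_scans) + sigma2**2 * (X1 @ X1.T)
```

The two variance terms in `V` are exactly the kind of covariance components that ReML weights with its hyperparameters.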
Estimation
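$$ y = X\theta + \epsilon $$
with $y$: $N \times 1$, $X$: $N \times p$, $\theta$: $p \times 1$, $\epsilon$: $N \times 1$
EM-algorithm
E-step:
$$ C_{\theta|y} = (X^T C_\epsilon^{-1} X)^{-1} $$
$$ \eta_{\theta|y} = C_{\theta|y} X^T C_\epsilon^{-1} y $$
M-step: maximise $L = \ln p(y|\lambda)$
$$ g = \frac{dL}{d\lambda}, \qquad J = -\left\langle \frac{d^2 L}{d\lambda^2} \right\rangle, \qquad \lambda \leftarrow \lambda + J^{-1} g $$
with
$$ C_\epsilon = \sum_k \lambda_k Q_k $$
Assume, at voxel j: $\lambda_{jk} = \lambda_j \lambda_k$
Friston et al. 2002, Neuroimage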
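The E- and M-steps above can be sketched as a small Fisher-scoring ReML routine. This is an illustrative implementation under stated assumptions, not SPM's code: the function name `reml`, the starting values, and the simulated two-component covariance are all hypothetical, and no safeguard against negative hyperparameters is included.

```python
import numpy as np

def reml(S, X, Q, n_iter=50, tol=1e-10):
    """Estimate lam in V = sum_k lam[k] * Q[k] by Fisher scoring.

    S is the sample second moment y y^T (averaged over voxels if pooled).
    """
    lam = np.ones(len(Q))                        # assumed starting point
    for _ in range(n_iter):
        V = sum(l * Qk for l, Qk in zip(lam, Q))
        Vi = np.linalg.inv(V)
        # E-step: conditional covariance of the parameters
        C_post = np.linalg.inv(X.T @ Vi @ X)
        P = Vi - Vi @ X @ C_post @ X.T @ Vi      # projects out the design
        # M-step: gradient g and expected information J
        PQ = [P @ Qk for Qk in Q]
        g = np.array([-0.5 * np.trace(A) + 0.5 * np.trace(A @ P @ S) for A in PQ])
        J = np.array([[0.5 * np.trace(A @ B) for B in PQ] for A in PQ])
        step = np.linalg.solve(J, g)
        lam = lam + step
        if np.max(np.abs(step)) < tol:
            break
    return lam

# Simulated example: two scan groups with different error variances
rng = np.random.default_rng(2)
N, n_vox = 40, 200
X = np.column_stack([np.ones(N), np.linspace(-1, 1, N)])
Q = [np.diag(np.r_[np.ones(20), np.zeros(20)]),
     np.diag(np.r_[np.zeros(20), np.ones(20)])]
lam_true = np.array([1.0, 4.0])
sd = np.sqrt(lam_true[0] * np.diag(Q[0]) + lam_true[1] * np.diag(Q[1]))
Y = X @ rng.standard_normal((2, n_vox)) + sd[:, None] * rng.standard_normal((N, n_vox))
S = Y @ Y.T / n_vox                              # pooled over voxels
lam_hat = reml(S, X, Q)
```

Pooling the second moment over voxels corresponds to the assumption that the covariance components have the same relative weights at every voxel.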
And in practice?
Most 2-level models are just too big to compute.
And even when they can be computed, estimation takes a long time!
Moreover, sometimes we're only interested in one specific effect and don't want to model all the data.
Is there a fast approximation?
Summary Statistics approach
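First level: Data, Design Matrix, Contrast Images
[Figure: each subject's data are fitted with the first-level design matrix, giving parameter estimates $\hat\theta$ and error variance estimates $\hat\sigma^2$ per subject; the resulting contrast images are passed to the second level.]
Second level: One-sample t-test @ 2nd level
$$ t = \frac{c^T\hat\theta}{\sqrt{\widehat{Var}(c^T\hat\theta)}} $$
giving the SPM(t)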
Validity of approach
The summary statistics approach is exact under i.i.d. assumptions at the 2nd level, if for each session:
• the within-session covariance is the same
• the first-level design is the same
• there is one contrast per session
In all other cases, the summary statistics approach seems to be robust against typical violations.
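A minimal sketch of the summary statistics approach, assuming every subject has the same first-level design and the same error variance (the conditions listed above). The data are simulated and the effect sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_subj, n_scans = 12, 100                        # illustrative sizes

# Same first-level design for every subject: boxcar plus constant
X1 = np.column_stack([np.tile([0.0, 1.0], n_scans // 2), np.ones(n_scans)])
c = np.array([1.0, 0.0])                         # contrast on the boxcar effect
pop_effect = 2.0                                 # hypothetical population effect

# First level: one contrast value per subject
con = np.empty(n_subj)
for s in range(n_subj):
    theta_s = np.array([pop_effect + 0.5 * rng.standard_normal(), 10.0])
    y_s = X1 @ theta_s + rng.standard_normal(n_scans)
    theta_hat = np.linalg.pinv(X1) @ y_s
    con[s] = c @ theta_hat

# Second level: one-sample t-test on the contrast values
t = con.mean() / (con.std(ddof=1) / np.sqrt(n_subj))
df = n_subj - 1
```

Only one number per subject reaches the second level, which is why this is so much cheaper than fitting the full two-level model.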
Mixed-effects
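[Figure: flow chart comparing two routes from the data $y$ to the second-level estimate: the summary statistics approach and the EM approach.]
$$ X = [X^{(0)}, X^{(1)}X^{(2)}], \qquad Q = \{Q_1^{(1)}, \dots, X^{(1)} Q_1^{(2)} X^{(1)T}, \dots\}, \qquad V = I $$
Step 1
$$ \hat\theta^{(1)} = (X^T V^{-1} X)^{-1} X^T V^{-1} y $$
$$ \hat V = \mathrm{REML}\{yy^T / n, X, Q\} $$
$$ \hat V = \sum_i \hat\lambda_i^{(1)} Q_i^{(1)} + \sum_j \hat\lambda_j^{(2)} X^{(1)} Q_j^{(2)} X^{(1)T} $$
Step 2
$$ \hat\theta^{(2)} = (X^T \hat V^{-1} X)^{-1} X^T \hat V^{-1} y $$
Friston et al. (2004), Mixed effects and fMRI studies, Neuroimage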
Sphericity
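$$ y = X\theta + \epsilon $$
$$ C_\epsilon = Cov(\epsilon) = E(\epsilon\epsilon^T) $$
'sphericity' means:
$$ Cov(\epsilon) = \sigma^2 I $$
i.e. $Var(\epsilon_i) = \sigma^2$ for every scan $i$, with zero covariance between scans
[Figure: $Cov(\epsilon)$ shown as a scans × scans matrix.]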
2nd level: Non-sphericity
Error
covariance
Errors are independent
but not identical
Errors are not independent
and not identical
1st level fMRI: serial correlations
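$$ \epsilon_t = a\,\epsilon_{t-1} + \eta_t \quad \text{with} \quad \eta_t \sim N(0, \sigma^2) $$
autoregressive process of order 1 (AR-1)
[Figure: $Cov(\epsilon)$, an $N \times N$ matrix, built from the autocovariance function.]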
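For a stationary AR(1) process (an assumption here: |a| < 1 and the process has reached its stationary distribution), the autocovariance at lag k is $\sigma^2 a^{|k|} / (1 - a^2)$. The helper name `ar1_cov` below is hypothetical; it builds the corresponding N × N error covariance matrix.

```python
import numpy as np

def ar1_cov(N, a, sigma2=1.0):
    """Covariance of a stationary AR(1) process with innovation variance sigma2."""
    lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    return sigma2 / (1.0 - a**2) * a**lags

C = ar1_cov(5, 0.4)      # small illustrative example
```

The matrix is constant along each diagonal and decays geometrically away from the main diagonal, which is the banded pattern usually shown for serial correlations.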
Restricted Maximum Likelihood
$$ y = X\theta + \epsilon, \qquad Cov(\epsilon) = \; ? $$
observed: $\sum_j y_j y_j^T$, summed over voxels $j$
ReML
estimated: $\hat\lambda_1 Q_1 + \hat\lambda_2 Q_2$
Estimation in SPM5
$$ y_j = X\theta_j + \epsilon_j $$
$$ \hat C_\epsilon = \widehat{Cov}(\epsilon) = \mathrm{ReML}\Big(\sum_j y_j y_j^T, X, Q\Big) $$
ReML (pooled estimate over voxels $j$)
Ordinary least-squares
$$ \hat\theta_{j,OLS} = X^{+} y_j $$
• optional in SPM5
• one pass through data
• statistic using effective degrees of freedom
Maximum Likelihood
$$ \hat\theta_{j,ML} = (X^T V^{-1} X)^{-1} X^T V^{-1} y_j $$
• 2 passes (first pass for selection of voxels)
• more accurate estimate of V
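The two estimators can be compared on simulated data. The sketch below assumes the error correlation matrix V is known (here an AR(1) pattern with a hypothetical coefficient); in practice V would come from the pooled ReML estimate. It also shows that the maximum likelihood estimate equals ordinary least squares applied to whitened data.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 60                                           # illustrative number of scans
X = np.column_stack([np.ones(N), np.sin(np.linspace(0, 6 * np.pi, N))])
theta_true = np.array([1.0, 2.0])                # hypothetical parameters

# Assumed known error correlation: AR(1) pattern with coefficient 0.4
lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
V = 0.4 ** lags
L = np.linalg.cholesky(V)                        # V = L L^T
y = X @ theta_true + L @ rng.standard_normal(N)  # correlated errors

# Ordinary least squares ignores V
theta_ols = np.linalg.pinv(X) @ y

# Maximum likelihood (generalised least squares) uses V
Vi = np.linalg.inv(V)
theta_ml = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)

# Equivalent route: whiten with W = L^{-1}, then apply OLS
W = np.linalg.inv(L)
theta_whitened = np.linalg.pinv(W @ X) @ (W @ y)
```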
2nd level: Non-sphericity
Errors can be non-independent and non-identical when…
• Several parameters per subject, e.g. repeated measures design
• Complete characterization of the hemodynamic response, e.g. taking more than 1 HRF parameter to 2nd level
Example data set (R Henson) on:
http://www.fil.ion.ucl.ac.uk/spm/data/
Example I
Stimuli: Auditory presentation (SOA = 4 secs) of (i) words and (ii) words spoken backwards, e.g. “Book” and “Koob”
Subjects: (i) 12 control subjects, (ii) 11 blind subjects
Scanning: fMRI, 250 scans per subject, block design
U. Noppeney et al.
Population differences
1st level: Controls, Blinds
2nd level: [Figure: design matrix $X$ and error covariance $V$] with contrast
$$ c^T = [1 \;\; -1] $$
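A second-level group comparison of this kind can be sketched as a two-column design with a difference contrast. The contrast values below are simulated, and the sketch assumes equal error variance in both groups; a covariance V with separate group variances would need the variance component estimation described earlier.

```python
import numpy as np

rng = np.random.default_rng(5)
n_ctrl, n_blind = 12, 11                         # group sizes from the example

# Hypothetical first-level contrast values, one per subject
con_ctrl = 1.0 + rng.standard_normal(n_ctrl)
con_blind = 3.0 + rng.standard_normal(n_blind)
y = np.concatenate([con_ctrl, con_blind])

# Second-level design: one indicator column per group
X = np.zeros((n_ctrl + n_blind, 2))
X[:n_ctrl, 0] = 1.0
X[n_ctrl:, 1] = 1.0
c = np.array([1.0, -1.0])                        # controls minus blind subjects

theta_hat = np.linalg.pinv(X) @ y                # the two group means
res = y - X @ theta_hat
df = len(y) - 2
sigma2_hat = res @ res / df
t = (c @ theta_hat) / np.sqrt(sigma2_hat * (c @ np.linalg.inv(X.T @ X) @ c))
```

Under the equal-variance assumption this reproduces the classical two-sample t-statistic.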
Example II
Stimuli: Auditory presentation (SOA = 4 secs) of words
• Motion: “jump”
• Sound: “click”
• Visual: “pink”
• Action: “turn”
Subjects: (i) 12 control subjects
Scanning: fMRI, 250 scans per subject, block design
Question: What regions are affected by the semantic content of the words?
U. Noppeney et al.
Repeated measures Anova
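1st level: Motion, Sound, Visual, Action
Question: Motion =? Sound =? Visual =? Action
2nd level: [Figure: design matrix $X$ and error covariance $V$] with contrast
$$ c^T = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix} $$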
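The multi-row contrast above tests whether all four condition means are equal. The sketch below computes the corresponding F-statistic on simulated data. It is deliberately simplified: it assumes spherical errors and a design with condition effects only, whereas the repeated measures analysis on the slide models non-sphericity through V.

```python
import numpy as np

rng = np.random.default_rng(6)
n_subj, n_cond = 12, 4                           # subjects, word categories

# Hypothetical contrast values: rows are subjects, columns are conditions
Y = rng.standard_normal((n_subj, n_cond)) + np.array([0.0, 0.5, 1.0, 1.5])
y = Y.ravel()                                    # subject-major ordering
X = np.kron(np.ones((n_subj, 1)), np.eye(n_cond))    # condition indicators

# Successive differences between conditions
C = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])

theta_hat = np.linalg.pinv(X) @ y                # condition means
res = y - X @ theta_hat
df_err = len(y) - np.linalg.matrix_rank(X)
sigma2_hat = res @ res / df_err

r = np.linalg.matrix_rank(C)                     # numerator degrees of freedom
d = C @ theta_hat
F = d @ np.linalg.solve(C @ np.linalg.inv(X.T @ X) @ C.T, d) / (r * sigma2_hat)
```

With this design the statistic coincides with the one-way ANOVA F for equality of the condition means.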
Example III: 2-level ERP model
Single subject: trial type 1, ..., trial type i, ..., trial type n
Multiple subjects: subject 1, ..., subject j, ..., subject m
Test for difference in N170
ERP-specific contrasts: Average between 150 and 190 ms
Example: difference between trial types
First level, with $X^{(1)}$ an identity matrix:
$$ y = X^{(1)}\theta^{(1)} + \epsilon^{(1)} $$
Second level:
$$ \theta^{(1)} = X^{(2)}\theta^{(2)} + \epsilon^{(2)} $$
2nd level contrast:
$$ c^T = [-1 \;\; 1] $$
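The ERP-specific contrast (averaging over the 150 to 190 ms window) followed by a difference between trial types can be sketched as two weight vectors. The time axis, the sampling step and the ERP values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical peristimulus time axis: -100 ms to 396 ms in 4 ms steps
times_ms = np.arange(-100, 400, 4)

# Simulated ERPs for two trial types (rows), one value per time bin
erp = rng.standard_normal((2, len(times_ms)))

# Time contrast: equal weights inside the window, zero outside
window = (times_ms >= 150) & (times_ms <= 190)
c_time = window / window.sum()
n170 = erp @ c_time                              # windowed mean per trial type

# Contrast between trial types
c_type = np.array([-1.0, 1.0])
difference = c_type @ n170
```

One such difference value per subject would then be the input to a test across subjects.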
Some practical points
RFX: If not using multi-dimensional contrasts at 2nd level (F-tests), use a series of 1-sample t-tests at the 2nd level.
Use a mixed-effects model only if seriously in doubt about the validity of the summary statistics approach.
If using variance components at 2nd level: always check your variance components (explore design).
Conclusion
Linear hierarchical models are general enough for typical
multi-subject imaging data (PET, fMRI, EEG/MEG).
Summary statistics are a robust approximation to mixed-effects analysis.
To minimize the number of variance components to be estimated at 2nd level, compute the relevant contrasts at 1st level and use a simple test at 2nd level.