Hierarchical Models - University College London

Hierarchical Models
Stefan Kiebel
Wellcome Dept. of Imaging Neuroscience
University College London
Overview
Why hierarchical models?
The model
Estimation
Expectation-Maximization
Summary Statistics approach
Variance components
Examples
Overview
Why hierarchical models?
The model
Estimation
Expectation-Maximization
Summary Statistics approach
Variance components
Examples
Data
fMRI, single subject
fMRI, multi-subject
EEG/MEG, single subject
ERP/ERF, multi-subject
Hierarchical model for all
imaging data!
Reminder: voxel by voxel
model
specification
Time
parameter
estimation
hypothesis
statistic
Intensity
single voxel
time series
SPM
Overview
Why hierarchical models?
The model
Estimation
Expectation-Maximization
Summary Statistics approach
Variance components
Examples
General Linear Model
p
1
y  X  
1
1

y
N
=
X
N
N: number of scans
p: number of regressors
p
+

N
Model is specified by
1. Design matrix X
2. Assumptions about 
Linear hierarchical model
Hierarchical model


(1 )

(2)
y  X
(1 )
 X
(2)
(1 )


(1 )
(2)
C


( n 1 )
 X
(n)

(n)

Multiple variance
components at each level
(i)

  Q
(n)
At each level, distribution of
parameters is given by level above.
What we don’t know: distribution of
parameters and variance parameters.
(i)
k
k
(i)
k
Example: Two level model
1 1
yX 

1
X

2  2 

(1)
y
=


X1
1
2 
1
 2 
+  
1
(1)
X2
 1 = X 2 
+  2 
(1)
X3
Second level
First level
Algorithmic Equivalence
Hierarchical
model


(1 )

(2)

(2)
(n)

(n)
y  X
(1 )
 X
(2)
(1 )

(1 )


( n 1 )
 X
(n)

Parametric
Empirical
Bayes (PEB)
EM = PEB = ReML
Single-level
model
y 

X
(1 )
 X 
... 
(1 )
( n 1 )
(2)

 
(1 )
(n) (n)
X X 
(1 )
X
(n)
Restricted
Maximum
Likelihood
(ReML)
Overview
Why hierarchical models?
The model
Estimation
Expectation-Maximization
Summary Statistics approach
Variance components
Examples
Estimation
y  X
N 1
Np
  
p 1
EM-algorithm
N 1
1
C |y  ( X C  X )
T
 |y  C |y X C  y
T
maximise
L  ln p(y| λ )
g
J
d
M-step
d L
d
2
1
 J g
k
Assume, at voxel j:
E-step
dL
2
C   k Qk
1
1
 jk   j k
Friston et al. 2002,
Neuroimage
And in practice?
Most 2-level models are just too big to
compute.
And even if, it takes a long time!
Moreover, sometimes we‘re only
interested in one specific effect and
don‘t want to model all the data.
Is there a fast approximation?
Summary Statistics approach
First level
Data
Design Matrix
ˆ1
ˆ12
Second level
Contrast Images
t
T
c ˆ
T
Vaˆr ( c ˆ )
SPM(t)
ˆ 2
ˆ 22
ˆ11
ˆ11
2
ˆ12
ˆ12
2
One-sample
t-test @ 2nd level
Validity of approach
The summary stats approach is exact under i.i.d.
assumptions at the 2nd level, if for each session:
Within-session covariance the same
First-level design the same
One contrast per session
All other cases: Summary stats approach seems to
be robust against typical violations.
Mixed-effects
y
X  [X
(0)
(1 )
X
data
X  [X
]
Q  {Q
V  I
Summary
statistics
(1 )
1
X
(1 )
, , X
X
(1 )
(2)
Q
]
(2)
1
X
(1 ) T
Step 1
ˆ
(1 )
 (X V
T
1
X)
1
T
X V
1
y
  REML { yy
Y  ˆ
V 
T
n , X , Q}
(1 )
X  X
(2)

(1 )
i
X
(1 ) 
(1 )
Qi X
(1 )  T

i
EM
approach
(0)

(2)
j
(2)
Qj
j
Step 2
ˆ
(2)
ˆ
 (X V
T
(2)
1
X)
1
T
X V
1
y
Friston et al. (2004)
Mixed effects and fMRI
studies, Neuroimage
, }
Overview
Why hierarchical models?
The model
Estimation
Expectation-Maximization
Summary Statistics approach
Variance components
Examples
Sphericity
C  Cov( )  E ( )
T
y  X  
Scans
‚sphericity‘ means:
Cov( )   I
2
Var ( i )  
 1
2
Scans
i.e.
2
2nd level: Non-sphericity
Error
covariance
Errors are independent
but not identical
Errors are not independent
and not identical
1st level fMRI: serial correlations
 t  a t 1  t with t ~ N (0,  2 )
autoregressive process of
order 1 (AR-1)
N
Cov( )
autocovariancefunction
N
Restricted Maximum Likelihood
y  X  
Cov( ) ?
observed
Q1
 yj yj
T
voxel j
ReML
estimated
Q2
ˆ1Q1  ˆ2Q2
Estimation in SPM5
y j  X j   j
Cˆ   Coˆv( )  ReML(
 y j y j , X , Q)
T
voxel j
ReML (pooled estimate)
ˆ

j ,OLS
 X yj
ˆ j , ML  ( X V X ) X V y j
Ordinary least-squares
•optional in SPM5
•one pass through data
•statistic using effective degrees
of freedom
T
1
1
T
1
Maximum Likelihood
•2 passes (first pass for selection of
voxels)
•more accurate estimate of V
2nd level: Non-sphericity
Errors can be
non-independent and non-identical
when…
Several parameters per subject, e.g. repeated measures design
Complete characterization of the hemodynamic response,
e.g. taking more than 1 HRF parameter to 2nd level
Example data set (R Henson) on:
http://www.fil.ion.ucl.ac.uk/spm/data/
Overview
Why hierarchical models?
The model
Estimation
Expectation-Maximization
Summary Statistics approach
Variance components
Examples
Example I
Stimuli:
Auditory Presentation (SOA = 4 secs) of
(i) words and (ii) words spoken backwards
e.g.
“Book”
and
“Koob”
Subjects:
Scanning:
(i) 12 control subjects
(ii) 11 blind subjects
fMRI, 250 scans per
subject, block design
U. Noppeney et al.
Population differences
1st level:
Controls
Blinds
2nd level:
V
c  [1  1]
T
X
Example II
Stimuli:
Auditory Presentation (SOA = 4 secs) of words
Motion Sound
Visual
Action
“jump” “click” “pink”
“turn”
Subjects:
(i) 12 control subjects
Scanning:
fMRI, 250 scans per
subject, block design
Question:
What regions are affected
by the semantic content of
the words?
U. Noppeney et al.
Repeated measures Anova
1st level:
Motion
Sound
?
Visual
?
?
=
Action
=
=
X
2nd level:
1

T
c  0
0

V
X
1
0
1
1
0
1
0 

0 
 1
Example 3: 2-level ERP model
Single subject
Multiple subjects
Trial type 1
Subject j
...
...
Trial type n
...
...
Trial type i
Subject 1
Subject m
Test for difference in N170
Example: difference
between trial types
ERP-specific contrasts:
Average between 150 and
190 ms
2nd level contrast

...
y
=
X
1
Identity
matrix
1
-1 1
 2 
c 
T
1
=
+
X
2 
second level
first level

2 
Some practical points
RFX: If not using multi-dimensional contrasts at 2nd level (Ftests), use a series of 1-sample t-tests at the 2nd level.
Use mixed-effects model only, if seriously in doubt about
validity of summary statistics approach.
If using variance components at 2nd level: Always check your
variance components (explore design).
Conclusion
Linear hierarchical models are general enough for typical
multi-subject imaging data (PET, fMRI, EEG/MEG).
Summary statistics are robust approximation to mixed-effects
analysis.
To minimize number of variance components to be estimated
at 2nd level, compute relevant contrasts at 1st level and use
simple test at 2nd level.