GLM - Wellcome Trust Centre for Neuroimaging

The General Linear Model (GLM) in SPM5
Klaas Enno Stephan
Wellcome Trust Centre for Neuroimaging
University College London
With many thanks to my colleagues in the
FIL methods group, particularly Stefan
Kiebel, for useful slides
SPM Short Course, May 2007
Wellcome Trust Centre for Neuroimaging
Overview of SPM
Image time-series
Realignment
Kernel
Design matrix
Smoothing
General linear model
Statistical parametric map (SPM)
Statistical
inference
Normalisation
Gaussian
field theory
p <0.05
Template
Parameter estimates
A very simple fMRI experiment
One session
Passive word
listening
versus rest
7 cycles of
rest and listening
Blocks of 6 scans
with 7 sec TR
Question: Is there a change
in the BOLD response
between listening and rest?
Stimulus function
Modelling the measured data
Why?
How?
Make inferences about effects of interest
1. Decompose data into effects and
error
2. Form statistic using estimates of
effects and error
stimulus
function
data
linear
model
effects
estimate
error
estimate
statistic
Voxel-wise time series analysis
model
specification
Time
parameter
estimation
hypothesis
statistic
BOLD signal
single voxel
time series
SPM
Time
=1
BOLD signal
+ 2
x1
+
x2
y  x11  x2 2  e
error
Single voxel regression model
e
Mass-univariate analysis: voxel-wise GLM
1
p
1
1

y
N
=
N
X
p
y  X  e
e ~ N ( 0,  I )
2
+
N
e
Model is specified by
1. Design matrix X
2. Assumptions about e
N: number of scans
p: number of regressors
The design matrix embodies all available knowledge about
experimentally controlled factors and potential confounds.
GLM assumes Gaussian “spherical” (i.i.d.) errors
sphericity = iid:
error covariance is
scalar multiple of
identity matrix:
Cov(e) = 2I
Examples for non-sphericity:
4
Cov(e)  
0
0

1
non-identity
1
Cov(e)  
0
0

1
2
Cov(e)  
1
1

2
non-identity
non-independence
Parameter estimation: Ordinary least squares
y  X  e
 1 
 
2 
=
y
+
e
X
Ordinary least
squares (OLS):
Estimate parameters
such that
T
1
T
ˆ
  (X X ) X y
N
e
2
t
t 1
minimal
OLS parameter estimate
(assuming iid error)
A geometric perspective
e  Ry
y  X  e
R I P
y
e
ˆ  ( X X ) X y
1
T
Design space
defined by X
x2
T
yˆ  Xˆ
x1
yˆ  Py
1
P  X (X X ) X
T
T
Correlated and orthogonal regressors
y
x2*
x2
x1
y  x11  x2  2  e
y  x11  x2  2  e
1   2  1
1  1;  2  1
Correlated regressors =
explained variance is
shared between
regressors
*
*
*
When x2 is orthogonalized
with regard to x1, only the
parameter estimate for x1
changes, not that for x2!
What are the problems of using the GLM for fMRI data?
1. BOLD responses have a
delayed and dispersed form.
HRF
2. The BOLD signal includes substantial amounts of
low-frequency noise.
3. The data are serially correlated (temporally
autocorrelated)
 this violates the assumptions of the noise model
in the GLM
Problem 1: Shape of BOLD response
Solution: Convolution model
hemodynamic
response
function
(HRF)
t
f  g (t )   f ( ) g (t   )d
0
The response of a linear time-invariant (LTI) system is the convolution of the input
with the system's response to an impulse (delta function).
expected BOLD response
= input function impulse response function (HRF)
Convolution model of the BOLD response
Convolve stimulus function with
a canonical hemodynamic
response function (HRF):
t
f  g (t )   f ( ) g (t   )d
0
 HRF
Problem 2: Low-frequency noise
Solution: High pass filtering
SY  SX  Se
S = residual forming matrix of DCT set
discrete cosine
transform (DCT) set
High pass filtering: example
blue =
data
black = mean + low-frequency drift
green = predicted response, taking into account
low-frequency drift
red =
predicted response (with low-frequency
drift explained away)
Problem 3: Serial correlations
et  aet 1   t with  t ~ N (0,  2 )
1st order autoregressive
process: AR(1)
N
Cov(e)
autocovariance
function
N
Dealing with serial correlations
• Pre-colouring: impose some known autocorrelation
structure on the data (filtering with matrix W) and use
Satterthwaite correction for df’s.
• Pre-whitening:
1. Use an enhanced noise model with hyperparameters
for multiple error covariance components.
2. Use estimated autocorrelation to specify matrix W
and use weighted least squares (WLS) estimation.
WY  WX   We
Multiple covariance components
V  Cov( e)
e ~ N (0,  V )
2
V   iQi
enhanced noise model
V
= 1
Q1
+ 2
Estimation of hyperparameters  with ReML (restricted maximum
likelihood).
Q2
Hyperparameter estimation
e ~ N (0,  V )
2
V  1Q1  2Q2
Three hyperparameters: 1, 2, 
• Assumption: error covariance pattern is identical within
voxels (of same tissue class), variance is not
 voxel-specific estimates of , but single estimate of  for
all voxels
• Estimation of  in SPM proceeds by pooling over all voxels
that survive an F-test at p<0.001 based on OLS estimates
(“first pass” during estimation in SPM)
Contrasts &
statistical parametric maps
c=10000000000
Q: activation during
listening ?
X
Null hypothesis:
1
0
c ˆ
T
t
Std (c ˆ )
T
t-statistic based on ML estimates
c ˆ
Wy  WX  W
t

̂  (WX ) Wy
Stˆd (c ˆ )
T
1 / 2
W V
T
 2V  Cov( )
Stˆd (c ˆ )  ˆ c (WX ) (WX ) c
c = +1 0 0 0 0 0 0 0 0 0 0
T
2 T

X
V
ˆ
2



Wy  WXˆ

2
tr ( R)
R  I  WX (WX )

ReMLestimate

T
Physiological confounds
• head movements
• arterial pulsations
• breathing
• eye blinks
• adaptation affects, fatigue, fluctuations in
concentration, etc.
Outlook: further challenges
• correction for multiple comparisons
• variability in the HRF across voxels
• slice timing
• limitations of frequentist statistics
 Bayesian analyses
• GLM ignores interactions among voxels
 models of effective connectivity
Summary
• Mass-univariate approach: same GLM for each voxel
• GLM includes all known experimental effects and
confounds
• Convolution with a canonical HRF
• High-pass filtering to account for low-frequency drifts
• Estimation of multiple variance components (e.g. to
account for serial correlations)
• Parametric statistics
Correction for multiple comparisons
• Mass-univariate approach:
We apply the GLM to each of a huge number of voxels
(usually > 100,000).
• Threshold of p<0.05  more than 5000 voxels
significant by chance!
• Massive problem with multiple comparisons!
• Solution: Gaussian random field theory
(Will’s talk)
Variability in the HRF
• HRF varies substantially across voxels and subjects
• For example, latency can differ by ± 1 second
• Solution: use multiple basis functions
• See talk on er-fMRI by Christian