Covariance components II
autocorrelation & nonsphericity
Alexa Morcom Oct. 2003
Methods by blondes vs. mullets?
Nonsphericity - what is it
and why do we care?
• Need to know expected behaviour of
parameters under H0 - if errors have less intrinsic variability
than assumed, there are fewer effective df, so ignoring this makes inference liberal
• Null distribution assumed normal
• Further assumed to be ‘iid’ - errors are
independent and identically distributed
• “Estimates of variance components are
used to compute statistics and variability in
these estimates determine the statistic’s
d.f.”
An illustration...
• A GLM with just 2 observations
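$$y = Xb + e$$
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = X \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$$
• iid assumptions: $e \sim N(0, \sigma^2 I)$
• General case: $e \sim N(0, C_e)$, with error covariance matrix $C_e$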
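Spherical (e2 plotted against e1)
$$C_e = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
Non-identical (e2 plotted against e1)
$$C_e = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}$$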
Non-independent (e2 plotted against e1)
Ce =
1 3
0.5 5
Varieties of nonsphericity in fMRI
• Temporal autocorrelation - 1st level
• Correlated repeated measures - 2nd level
• Unequal variances between groups - 2nd level
• Unequal within-subject variances - 1st level*
• Unbalanced designs at 1st level*
• (Spatial ‘nonsphericity’ or smoothness)
A traditional psychology example
Level 1: 0-back, subjects 1…12
Level 2: 1-back, subjects 1…12
Level 3: 2-back, subjects 1…12
• Repeated measures of RT across subjects
• RTs to levels 2 & 3 may be more highly
correlated than those to levels 1 & 2
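Sphericity
(n subjects, k treatments; $s_{ij}$ = sample var/cov)
$$\begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1k} \\ s_{21} & s_{22} & \cdots & s_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ s_{k1} & s_{k2} & \cdots & s_{kk} \end{pmatrix}$$
• Variance of difference between pair of levels constant
• Not easy to see!
Compound symmetry
$$\begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma^2 & \rho\sigma^2 & \cdots & \sigma^2 \end{pmatrix}$$
• Treatment variances equal, treatment covariances equal
• Can be checked by inspection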
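The difference between the two conditions can be checked numerically. The sketch below is only an illustration: the function names and the example matrices are made up for this purpose and do not come from the slides. It uses the identity var(x_i - x_j) = s_ii + s_jj - 2 s_ij to test whether all pairwise difference variances are equal.

```python
import numpy as np
from itertools import combinations


def difference_variances(S):
    """Return var(x_i - x_j) = s_ii + s_jj - 2*s_ij for every pair of levels."""
    S = np.asarray(S, dtype=float)
    k = S.shape[0]
    return np.array([S[i, i] + S[j, j] - 2.0 * S[i, j]
                     for i, j in combinations(range(k), 2)])


def is_spherical(S, tol=1e-10):
    """Sphericity: all pairwise difference variances are equal."""
    d = difference_variances(S)
    return bool(d.max() - d.min() <= tol)


def compound_symmetric(k, sigma2, rho):
    """sigma2 on the diagonal, rho*sigma2 everywhere else."""
    return sigma2 * ((1.0 - rho) * np.eye(k) + rho * np.ones((k, k)))


# Hypothetical example matrices (not taken from the slides).
cs = compound_symmetric(k=3, sigma2=2.0, rho=0.5)
spherical_not_cs = np.array([[1.0, 0.5, 1.0],
                             [0.5, 2.0, 1.5],
                             [1.0, 1.5, 3.0]])
nonspherical = np.diag([4.0, 1.0, 1.0])

for name, S in [("compound symmetric", cs),
                ("spherical, not compound symmetric", spherical_not_cs),
                ("nonspherical", nonspherical)]:
    print(name, difference_variances(S), is_spherical(S))
```

The second matrix has unequal variances and unequal covariances, yet every difference variance is the same, which is why sphericity is the weaker condition and is hard to spot by eye.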
The traditional psychology solution
• Sphericity - most liberal condition for the ratio of SS to be
distributed as F
• A measure of departure from sphericity: ε
• Under nonsphericity the ratio of SS is no longer exactly F, but is approximated by F with Greenhouse-Geisser
corrected d.f. (based on Satterthwaite approx.):
$F[(k-1)\varepsilon,\ (n-1)(k-1)\varepsilon]$
• A fudge in SPSS because ε must be estimated,
and this is imprecise (later…) so correction
slightly liberal
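As a rough illustration of the correction, the sketch below computes ε from a k x k covariance matrix using the standard double-centring form of the Greenhouse-Geisser estimate, then scales the degrees of freedom. The function names and the example matrix are assumptions for illustration; this is not SPSS's or SPM's code.

```python
import numpy as np


def greenhouse_geisser_epsilon(S):
    """epsilon = tr(Sc)^2 / ((k-1) * tr(Sc @ Sc)), Sc = double-centred S."""
    S = np.asarray(S, dtype=float)
    k = S.shape[0]
    C = np.eye(k) - np.ones((k, k)) / k      # centring matrix
    Sc = C @ S @ C                           # double-centred covariance
    return np.trace(Sc) ** 2 / ((k - 1) * np.trace(Sc @ Sc))


def corrected_df(S, n):
    """Greenhouse-Geisser corrected d.f.: [(k-1)eps, (n-1)(k-1)eps]."""
    k = np.asarray(S).shape[0]
    eps = greenhouse_geisser_epsilon(S)
    return (k - 1) * eps, (n - 1) * (k - 1) * eps


# Hypothetical covariance matrices for k = 3 levels.
S_cs = 2.0 * (0.5 * np.eye(3) + 0.5 * np.ones((3, 3)))   # compound symmetric
S_unequal = np.diag([4.0, 1.0, 1.0])                     # unequal variances

eps_cs = greenhouse_geisser_epsilon(S_cs)
eps_unequal = greenhouse_geisser_epsilon(S_unequal)
df_unequal = corrected_df(S_unequal, n=12)
print(eps_cs, eps_unequal, df_unequal)
```

ε equals 1 under sphericity and falls towards its lower bound 1/(k-1) as the departure grows. In practice S is itself a sample estimate, which is the source of the imprecision noted above.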
A more general GLM
$y = Xb + e$   (OLS)
$Wy = WXb + We$   (W/GLS)
• Weighting by W to render Cov(We) iid or known
$$\hat{b}_W = (WX)^{-} W y$$
$$C_{\hat{b}} = (WX)^{-} W C_e W^T (WX)^{-T}$$
(where $^{-}$ denotes the pseudoinverse)
• i.e. covariance of parameter estimates depends on
both the design and the error structure ...
• If $C_e$ is iid with var = $\sigma^2$, i.e. $C_e = \sigma^2 I$, then $W = I$ and $C_{\hat{b}} = \sigma^2 (X^T X)^{-1}$
• If single covariance component, direct estimation
• Otherwise iterative, or determine $C_e$ first ...
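The weighted estimator and its covariance translate almost directly into NumPy. The following is a minimal sketch, assuming Ce is known; the function name `weighted_glm`, the toy design and the noise level are all hypothetical.

```python
import numpy as np


def weighted_glm(y, X, W, Ce):
    """Estimate b from Wy = WXb + We and return (b_hat, C_b).

    b_hat = (WX)^- W y
    C_b   = (WX)^- W Ce W^T (WX)^-T
    """
    WX = W @ X
    pinv_WX = np.linalg.pinv(WX)             # (WX)^-, the pseudoinverse
    b_hat = pinv_WX @ (W @ y)
    C_b = pinv_WX @ W @ Ce @ W.T @ pinv_WX.T
    return b_hat, C_b


# Toy example (assumed values): intercept plus linear regressor, iid noise.
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])
sigma2 = 0.25
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(sigma2), size=n)

# iid case: W = I and Ce = sigma2 * I, so C_b reduces to sigma2 * (X^T X)^-1.
b_hat, C_b = weighted_glm(y, X, np.eye(n), sigma2 * np.eye(n))
print(b_hat)
print(C_b)
```

With W = I this is ordinary least squares. Any other W changes both the estimate and its covariance, which is the point of the bullet about design and error structure.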
Colouring & whitening...
• Imposed ‘temporal smoothing’, W = S (SPM99)
$Sy = SXb + Se$
$$C_{\hat{b}} = (SX)^{-} S C_e S^T (SX)^{-T}$$
S is known and $C_e$ assumed ‘swamped’
Resulting d.f. adjustment = Satterthwaite
(but better than Greenhouse-Geisser)
• Prewhitening: if $C_e$ is assumed known,
premultiply by $W = C_e^{-1/2}$ (SPM2)
$\hat{b}$ by OLS then is best estimator
& $C_{\hat{b}} = (X^T C_e^{-1} X)^{-1}$
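To make the prewhitening step concrete, the sketch below builds W as the inverse matrix square root of a known Ce, checks that the whitened errors are iid, and compares the resulting parameter covariance with that of unweighted OLS. The AR(1)-style Ce, its coefficient 0.6 and the design are assumed purely for illustration.

```python
import numpy as np


def inverse_sqrt(Ce):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(Ce)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


n = 30
idx = np.arange(n)
lags = np.abs(np.subtract.outer(idx, idx))
Ce = 0.6 ** lags                              # assumed AR(1)-style covariance
X = np.column_stack([np.ones(n), np.sin(2 * np.pi * idx / 10.0)])

W = inverse_sqrt(Ce)
whitened_cov = W @ Ce @ W.T                   # should be the identity

# Covariance of estimates after prewhitening, via the general W formula.
pinv_WX = np.linalg.pinv(W @ X)
C_b_white = pinv_WX @ whitened_cov @ pinv_WX.T

# Closed form from the slide: (X^T Ce^-1 X)^-1.
C_b_closed = np.linalg.inv(X.T @ np.linalg.inv(Ce) @ X)

# Unweighted OLS (W = I) applied to the same coloured errors.
pinv_X = np.linalg.pinv(X)
C_b_ols = pinv_X @ Ce @ pinv_X.T

print(np.diag(C_b_white))
print(np.diag(C_b_ols))
```

The prewhitened variances are never larger than the unweighted ones, which is the Gauss-Markov sense in which the estimator is "best".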
Effects on statistics
$$t = \frac{c^T b}{(c^T C_b c)^{1/2}}$$
• Estimation is better - increased precision of b
• Minimum covariance of estimator maximises t
as Cb is in denominator (& depends on X & Ce:
compare S, ‘bigger’ denominator)
• Precise determination of d.f. as function of W
(i.e. Ce) & design matrix X (if S, fewer)
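The dependence of t and its d.f. on W can be sketched as follows, using the Satterthwaite-type effective d.f. tr(RV)² / tr(RVRV), where R is the residual-forming matrix and V = W Ce Wᵀ. This assumes Ce is known up to a scale factor. The function name, the simulated data, the AR coefficient and the smoothing kernel width are all hypothetical.

```python
import numpy as np


def t_and_df(y, X, W, V_e, c):
    """t statistic and effective d.f. for contrast c under weighting W.

    V_e is the error covariance, assumed known up to a scale factor.
    """
    Wy, WX = W @ y, W @ X
    pinv_WX = np.linalg.pinv(WX)
    b = pinv_WX @ Wy
    R = np.eye(len(y)) - WX @ pinv_WX         # residual-forming matrix
    V = W @ V_e @ W.T                         # covariance of weighted errors
    RV = R @ V
    resid = R @ Wy
    sigma2 = (resid @ resid) / np.trace(RV)   # scale estimate
    C_b = sigma2 * pinv_WX @ V @ pinv_WX.T
    t = (c @ b) / np.sqrt(c @ C_b @ c)
    df = np.trace(RV) ** 2 / np.trace(RV @ RV)
    return t, df


rng = np.random.default_rng(1)
n = 40
idx = np.arange(n)
lags = np.abs(np.subtract.outer(idx, idx))
V_e = 0.4 ** lags                             # assumed AR(1)-style errors
X = np.column_stack([np.ones(n), (idx % 10 < 5).astype(float)])
c = np.array([0.0, 1.0])
y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(V_e) @ rng.normal(size=n)

# Prewhitening: W = V_e^(-1/2).
vals, vecs = np.linalg.eigh(V_e)
W_white = vecs @ np.diag(vals ** -0.5) @ vecs.T

# Imposed smoothing: row-normalised Gaussian kernel S (assumed width 2 scans).
S = np.exp(-lags ** 2 / (2 * 2.0 ** 2))
S /= S.sum(axis=1, keepdims=True)

t_white, df_white = t_and_df(y, X, W_white, V_e, c)
t_smooth, df_smooth = t_and_df(y, X, S, V_e, c)
print(t_white, df_white)
print(t_smooth, df_smooth)
```

With prewhitening V is the identity, so the effective d.f. equal the full n - p. With smoothing they are fewer.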
Estimating multiple covariance
components
• Doing this at every voxel would require ReML
at every voxel (my contract is too short…)
• As in SPSS, such estimation of Ce would be
imprecise, and inference ultimately too liberal:
Ce = rrT + X Cb XT
(critical ‘circularity’… )
• To avoid this, SPM2 uses spatial (cross-voxel)
pooling of covariance estimation
• This way, Ce estimate is precise & (prewhitened
OLS) estimation proceeds noniteratively
1st level nonsphericity
• Model Ce as linear combination of basis functions (bfs):
$$C_e(\lambda) = \sum_i \lambda_i Q_i = \lambda_1 Q_1 + \lambda_2 Q_2$$
• Timeseries autocorrelations in fMRI
(Low freq. 1/f removed by high-pass filter)
White noise is Q1
Lag 1 autoregressive AR(1) is Q2
[Figure: estimated Ce alongside its components Q1 and Q2]
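A two-component model of this kind can be sketched as below. The form of Q2 (a stationary AR(1) correlation matrix with a fixed, assumed coefficient `rho`) is one simple choice for illustration and is not necessarily how SPM constructs its components. The λ values are made up, whereas in practice they would be estimated by ReML.

```python
import numpy as np


def first_level_basis(n, rho=0.2):
    """Return (Q1, Q2): white-noise and AR(1)-style covariance components.

    rho is a hypothetical fixed AR coefficient, assumed for illustration.
    """
    idx = np.arange(n)
    lags = np.abs(np.subtract.outer(idx, idx))
    Q1 = np.eye(n)                 # white noise
    Q2 = rho ** lags               # AR(1) correlation, rho^|i-j|
    return Q1, Q2


def combine(lambdas, Qs):
    """C_e(lambda) = sum_i lambda_i * Q_i."""
    return sum(lam * Q for lam, Q in zip(lambdas, Qs))


Q1, Q2 = first_level_basis(n=8, rho=0.2)
lambdas = (1.0, 0.5)               # made-up hyperparameters
Ce = combine(lambdas, (Q1, Q2))
print(np.round(Ce[:3, :3], 3))
```

Only the two λ weights are free parameters. The shapes of Q1 and Q2 are fixed in advance, which keeps the estimation problem small.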
2nd level nonsphericity
• Here model unequal variance across measures,
&/or unequal covariance between measures
$$C_e(\lambda) = \sum_i \lambda_i Q_i = \lambda_1 Q_1 + \lambda_2 Q_2 + \dots$$
• No. of bfs (basis functions) depends on no. of measures &
options selected
• Nonsphericity? - variance for each measure, for all subjects
• Correlated repeated measures? - covariance of each pair of measures, for all subjects
• 3 measures: 3 diagonals Q1-Q3, 3 off-diag Q4-Q6
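For the 3-measure case, the six components can be written down with Kronecker products. This sketch assumes observations are ordered measure by measure (all subjects for measure 1, then measure 2, and so on); the function name and that ordering are illustrative assumptions rather than SPM's internal layout.

```python
import numpy as np
from itertools import combinations


def second_level_basis(n_subjects, n_measures):
    """Variance components (one per measure) followed by covariance
    components (one per pair of measures), shared across subjects."""
    k = n_measures
    I_subj = np.eye(n_subjects)
    Qs = []
    for m in range(k):                         # diagonals: Q1..Qk
        E = np.zeros((k, k))
        E[m, m] = 1.0
        Qs.append(np.kron(E, I_subj))
    for a, b in combinations(range(k), 2):     # off-diagonals: Qk+1..
        E = np.zeros((k, k))
        E[a, b] = E[b, a] = 1.0
        Qs.append(np.kron(E, I_subj))
    return Qs


Qs = second_level_basis(n_subjects=12, n_measures=3)
print(len(Qs), Qs[0].shape)
```

Each covariance component links the same subject's observations on two different measures, so a subject's repeated measures may correlate while different subjects stay independent.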
What difference does it make?
• SPM99 OLS method (applied incorrectly) & assuming iid: big t, lots of df, liberal
• Worsley & Friston’s SPM99 method with Satterthwaite df correction: smaller t, fewer df, valid but not ideal (conservative)
• SPM2 Gauss-Markov (ideal) estimator with prewhitening: full no. of df along with correct t value
Limitations of 2 level approach
$$y = X^{(1)} b^{(1)} + e^{(1)}$$
$$b^{(1)} = X^{(2)} b^{(2)} + e^{(2)}$$
$$y = X^{(1)} X^{(2)} b^{(2)} + X^{(1)} e^{(2)} + e^{(1)}$$
$$\mathrm{Cov}(y) = X^{(1)} C_e^{(2)} X^{(1)T} + C_e^{(1)} \quad \text{(into ReML)}$$
• 2-stage ‘summary statistic’ approach assumes
‘mixed effects’ covariance components are
separable at the 2 levels
• Specifically, assumes design X & variance
same for all subjects/ sessions, even if
nonsphericity modelled at each level
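The structure of Cov(y) in the two-level model can be seen in a small numerical example. The sketch assumes a toy design of 4 subjects with 5 scans each, a single first-level regressor per subject, and made-up variances at both levels; none of these values come from the slides.

```python
import numpy as np

n_subjects, n_scans = 4, 5
n_obs = n_subjects * n_scans

# First-level design: one constant regressor per subject (block diagonal).
X1 = np.kron(np.eye(n_subjects), np.ones((n_scans, 1)))

Ce2 = 2.0 * np.eye(n_subjects)     # assumed between-subject covariance
Ce1 = 0.5 * np.eye(n_obs)          # assumed within-subject (scan) covariance

# y depends on the stacked errors [e2; e1] through A = [X1, I].
A = np.hstack([X1, np.eye(n_obs)])
C_joint = np.block([[Ce2, np.zeros((n_subjects, n_obs))],
                    [np.zeros((n_obs, n_subjects)), Ce1]])
cov_y = A @ C_joint @ A.T

# Closed form from the slide.
cov_y_formula = X1 @ Ce2 @ X1.T + Ce1

print(cov_y[:n_scans + 1, :n_scans + 1])
```

Scans from the same subject share the between-subject variance, while scans from different subjects are uncorrelated. That mixing of the two levels is what the summary-statistic approach treats as separable.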