Machine learning techniques for
quantifying neural synchrony:
application to the diagnosis of
Alzheimer's disease from EEG
Justin Dauwels
LIDS, MIT
LMI, Harvard Medical School
Amari Research Unit, Brain Science Institute, RIKEN
June 9, 2008
RIKEN Brain Science Institute
• RIKEN Wako Campus (near Tokyo)
• about 400 researchers and staff (20% foreign)
• 300 research fellows and visiting scientists
• about 60 laboratories
• research covers most aspects of brain science
Collaborators
François Vialatte*, Theo Weber+, Shun-ichi Amari*, Andrzej Cichocki* (*RIKEN, +MIT)
Project
Early diagnosis of Alzheimer’s disease based on EEG
Financial Support
Research Overview
Machine learning & signal processing for applications in NEUROSCIENCE
= development of ALGORITHMS to analyze brain signals
• EEG (RIKEN, MIT, MGH): subject of this talk
• diagnosis of Alzheimer’s disease
• detection/prediction of epileptic seizures
• analysis of EEG evoked by visual/auditory stimuli
• EEG during meditation
• projects related to brain-computer interfaces (BCI)
• Calcium imaging (RIKEN, NAIST, MIT)
• effect of calcium on neural growth
• role of calcium propagation in glia cells and neurons
• Diffusion MRI (Brigham and Women’s Hospital, Harvard Medical School, MIT)
• estimation and clustering of tracts (future project)
Overview
Alzheimer’s Disease (AD)
EEG of AD patients: decrease in synchrony
Synchrony measure in time-frequency domain
Pairs of EEG signals
Collections of EEG signals
Numerical Results
Outlook
Alzheimer's disease
Outside glimpse: clinical perspective
Evolution of the disease (stages)
• Mild cognitive impairment (MCI), 2 to 5 years before AD (often unnoticed; stage at which the EEG data were recorded)
- 6 to 25% progress to Alzheimer's per year
• Mild (early stage)
- becomes less energetic or spontaneous
- noticeable cognitive deficits
- still independent (able to compensate)
• Moderate (middle stage)
- mental abilities decline
- personality changes
- becomes dependent on caregivers
• Severe (late stage)
- complete deterioration of the personality
- loss of control over bodily functions
- total dependence on caregivers
One disease, many symptoms: memory, language, executive functions, apraxia, apathy, agnosia, etc.
• 2% to 5% of people over 65 years old
• up to 20% of people over 80
Jeong 2004 (Nature)
[Video clips: memory (forgetting relatives); apathy; loss of self-control]
Video sources: Alzheimer society
Alzheimer's disease
Inside glimpse: brain atrophy
Amyloid plaques and neurofibrillary tangles
Video source: Alzheimer society
Images: Jannis Productions (R. Fredenburg; S. Jannis)
Video source: P. Thompson, J. Neuroscience, 2003
Alzheimer's disease
Inside glimpse: abnormal EEG
EEG system: inexpensive, mobile, useful for screening
• Brain “slow-down”
- slow rhythms (0.5-8 Hz) increase
- fast rhythms (8-30 Hz) decrease
(Babiloni et al., 2004; Besthorn et al., 1997; Jelic et al., 1996; Jeong, 2004; Dierks et al., 1993)
• Decrease of synchrony (focus of this project)
- AD vs. MCI (Hogan et al., 2003; Jiang et al., 2005)
- AD vs. control (Herrmann and Demiralp, 2005; Yagyu et al., 1997; Stam et al., 2002; Babiloni et al., 2006)
- MCI vs. mild AD (Babiloni et al., 2006)
Images: www.cerebromente.org.br
Spontaneous (scalp) EEG
[Figure: EEG x(t) vs. t (sec); Fourier power |X(f)|² vs. f (Hz); time-frequency map |X(t,f)|² (wavelet transform) showing time-frequency patterns (“bumps”)]
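The time-frequency map |X(t,f)|² above is obtained with a wavelet transform. A minimal NumPy sketch of such a map, assuming a complex Morlet wavelet with a fixed number of cycles (`n_cycles` and `morlet_power` are hypothetical choices, not taken from this work); the wavelet family and normalization behind the actual bump models may differ.

```python
import numpy as np

def morlet_power(x, fs, freqs, n_cycles=7.0):
    """Time-frequency map |X(t, f)|^2 (rows: freqs, columns: samples) via complex Morlet wavelets.

    Assumes every wavelet is shorter than x, so that mode="same" keeps len(x) samples.
    """
    x = np.asarray(x, dtype=float)
    power = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)               # temporal width (s) of the wavelet at f
        t = np.arange(-4 * sigma_t, 4 * sigma_t, 1 / fs)
        wavelet = np.exp(2j * np.pi * f * t - t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))   # unit-energy wavelet
        power[i] = np.abs(np.convolve(x, wavelet, mode="same")) ** 2
    return power

# Usage: a 3 s, 10 Hz oscillation sampled at 100 Hz (300 samples)
fs = 100.0
x = np.sin(2 * np.pi * 10 * np.arange(300) / fs)
freqs = np.arange(4, 31)                                   # 4 ... 30 Hz
tf_map = morlet_power(x, fs, freqs)
print(tf_map.shape, freqs[np.argmax(tf_map.mean(axis=1))])
```

The wavelet length shrinks with frequency, so low frequencies get fine frequency resolution and high frequencies get fine time resolution.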
Fourier transform vs. windowed Fourier transform
[Figure: Fourier transform with Fourier basis functions (low to high frequency); windowed Fourier transform: window function × Fourier basis functions = windowed basis functions, localized in both t and f]
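The windowed Fourier transform multiplies the signal by a sliding window before each Fourier transform, so each coefficient is a projection onto a windowed basis function. A minimal NumPy sketch, assuming a Hann window and hypothetical `win_len`/`hop` values (the slides do not specify them):

```python
import numpy as np

def stft_power(x, fs, win_len, hop):
    """|X(t, f)|^2 of a Hann-windowed Fourier transform (rows: frames, columns: frequencies)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])  # windowed segments
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)                      # Hz
    times = (starts + win_len / 2) / fs                             # frame centres (s)
    return times, freqs, power

fs = 100.0
x = np.sin(2 * np.pi * 10 * np.arange(300) / fs)                   # 3 s at 100 Hz
times, freqs, power = stft_power(x, fs, win_len=100, hop=25)
print(power.shape, freqs[np.argmax(power.mean(axis=0))])
```

Here a 1 s window gives 1 Hz frequency resolution; a longer window sharpens frequency resolution at the expense of time resolution, whereas the wavelet transform adapts the window length to the frequency.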
Signatures of local synchrony
[Figure: time-frequency patterns (“bumps”), f (Hz) vs. t (sec)]
• EEG stems from thousands of neurons
• a bump appears if the neurons are phase-locked = local synchrony
Comparing EEG signal rhythms?
[Figure: 2 signals and their time-frequency maps]
PROBLEM I:
• Signals of 3 seconds sampled at 100 Hz (300 samples)
• Time-frequency representation of one signal: about 25,000 coefficients
Comparing EEG signal rhythms? (2)
[Figure: one pixel vs. numerous neighboring pixels]
PROBLEM II: Shifts in time-frequency!
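PROBLEM II can be made concrete with a toy example (synthetic Gaussian blobs, not EEG; `gaussian_blob` and `pixel_correlation` are hypothetical helpers): comparing two maps pixel by pixel, the same bump shifted by a few pixels yields a low correlation, although the underlying rhythm is the same.

```python
import numpy as np

def gaussian_blob(shape, center, width):
    """Toy time-frequency map containing a single Gaussian 'bump'."""
    rows, cols = np.indices(shape)
    r0, c0 = center
    return np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2 * width ** 2))

def pixel_correlation(a, b):
    """Pearson correlation between two maps, compared pixel by pixel."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

shape, width = (40, 100), 3.0
map_1 = gaussian_blob(shape, (20, 50), width)
map_2 = gaussian_blob(shape, (20, 59), width)   # same bump, shifted by 3 widths in time
print(pixel_correlation(map_1, map_1), pixel_correlation(map_1, map_2))
```

A pixel-wise measure therefore penalizes small timing offsets as strongly as missing activity, which motivates comparing bumps rather than pixels.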
Sparse representation: bump model
[Figure: time-frequency map, f (Hz) vs. t (sec), and its sparse representation by bumps]
• time-frequency map: 10⁴ to 10⁵ coefficients → bump model: about 10² parameters
Assumptions:
1. the time-frequency map is a suitable representation
2. oscillatory bursts (“bumps”) convey the key information
Normalization:
F. Vialatte et al. “A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).
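A bump model replaces the dense map by a handful of parameters per bump. A minimal sketch, assuming Gaussian-shaped bumps with five parameters each (the `Bump` fields and `render` are hypothetical; the bump shape used by Vialatte et al. may differ), that renders a sparse bump list back into a dense map:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Bump:
    t: float        # centre in time (s)
    f: float        # centre in frequency (Hz)
    amp: float      # amplitude
    width_t: float  # width in time (s)
    width_f: float  # width in frequency (Hz)

def render(bumps, times, freqs):
    """Dense time-frequency map (rows: freqs, columns: times) from a sparse list of bumps."""
    T, F = np.meshgrid(times, freqs)
    tf_map = np.zeros_like(T, dtype=float)
    for b in bumps:
        tf_map += b.amp * np.exp(-((T - b.t) / b.width_t) ** 2 / 2
                                 - ((F - b.f) / b.width_f) ** 2 / 2)
    return tf_map

times = np.arange(300) / 100.0          # 3 s at 100 Hz
freqs = np.arange(4.0, 30.5, 0.5)       # 53 frequency bins, 4 ... 30 Hz
bumps = [Bump(1.0, 10.0, 2.0, 0.2, 1.5), Bump(2.2, 20.0, 1.0, 0.1, 2.0)]
dense = render(bumps, times, freqs)
print(dense.size, "pixels vs.", 5 * len(bumps), "bump parameters")
```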
Similarity of bump models...
How “similar” or “synchronous” are two bump models?
= GLOBAL synchrony
Reminder: bumps due to LOCAL synchrony
= MULTI-SCALE approach
... by matching bumps
[Figure: bump models y1 and y2; some bumps match, with an offset between matched bumps]
SIMILAR bump models if:
• many matches
• strongly overlapping matches
... by matching bumps (2)
• Bumps in one model, but NOT in other
→ fraction of “spurious” bumps ρspur
• Bumps in both models, but with offset
→ Average time offset δt (delay)
→ Timing jitter with variance st
→ Average frequency offset δf
→ Frequency jitter with variance sf
Synchrony: only st and ρspur relevant
Stochastic Event Synchrony (SES)
= (ρspur, δt, st, δf, sf )
PROBLEM: Given two bump models, compute (ρspur, δt, st, δf, sf )
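Once the matched pairs are known, the SES parameters follow directly. A minimal sketch, assuming the matching is already given (finding it is the actual inference problem), bump centres (t, f) only, and at least one match; `ses_from_matching` is a hypothetical helper, with ρspur taken as the fraction of all bumps that remain unmatched:

```python
from statistics import mean, pvariance

def ses_from_matching(y, y_prime, matches):
    """(rho_spur, delta_t, s_t, delta_f, s_f) for a given matching.

    y, y_prime: lists of bump centres (t, f)
    matches:    list of index pairs (k, k_prime) of matched bumps
    """
    dt = [y_prime[kp][0] - y[k][0] for k, kp in matches]   # time offsets
    df = [y_prime[kp][1] - y[k][1] for k, kp in matches]   # frequency offsets
    n_spurious = (len(y) - len(matches)) + (len(y_prime) - len(matches))
    rho_spur = n_spurious / (len(y) + len(y_prime))
    return rho_spur, mean(dt), pvariance(dt), mean(df), pvariance(df)

y = [(1.0, 10.0), (2.0, 12.0), (3.0, 8.0)]
y_prime = [(1.5, 10.0), (2.5, 12.0), (3.5, 8.0), (9.0, 20.0)]  # last bump has no partner
print(ses_from_matching(y, y_prime, [(0, 0), (1, 1), (2, 2)]))
```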
Average synchrony
1. Group electrodes into regions
2. Bump model for each region
3. SES for each pair of models
4. Average the SES parameters
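Steps 3 and 4 amount to a loop over all pairs of regions. A sketch, where `ses` is a hypothetical callable returning the SES tuple for two bump models and the zone names are made up; with 5 zones this gives 10 pairs:

```python
from itertools import combinations
from statistics import mean

def average_ses(zone_models, ses):
    """Average each SES parameter over all pairs of zone bump models.

    zone_models: dict mapping a zone name to its bump model
    ses:         hypothetical callable (model_a, model_b) -> tuple of SES parameters
    """
    per_pair = [ses(zone_models[a], zone_models[b])
                for a, b in combinations(sorted(zone_models), 2)]
    averaged = tuple(mean(values) for values in zip(*per_pair))
    return averaged, len(per_pair)

# Toy usage: "models" are numbers and the stand-in ses returns two fake parameters
zones = {f"zone_{i}": i for i in range(5)}
print(average_ses(zones, lambda a, b: (abs(a - b), a + b)))
```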
Beyond pairwise interactions...
Pairwise similarity vs. multi-variate similarity
...by clustering
[Figure: bump models y1, y2, y3, y4, y5 compared pairwise and clustered jointly]
HARD combinatorial problem!
Models are similar if:
• few deletions/large clusters
• little jitter
Constraint: each cluster contains at most one bump from each signal
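The clustering constraint and the cluster-size statistics can be checked directly on a candidate clustering. A sketch, assuming each cluster is a list of (signal index, bump index) pairs (a hypothetical encoding); the empirical fraction of clusters of each size is the kind of quantity the multivariate model summarizes as a cluster-size distribution:

```python
from collections import Counter

def is_valid_clustering(clusters):
    """True if every cluster holds at most one bump from each signal."""
    return all(len({signal for signal, _ in c}) == len(c) for c in clusters)

def cluster_size_distribution(clusters, n_signals):
    """Empirical fraction of clusters with size i, for i = 1, ..., M (M = n_signals)."""
    counts = Counter(len(c) for c in clusters)
    return {i: counts[i] / len(clusters) for i in range(1, n_signals + 1)}

clusters = [[(0, 0), (1, 0), (2, 1)],   # bump seen in all three signals
            [(0, 1), (2, 0)],           # bump missing in signal 1
            [(1, 1)]]                   # bump seen in signal 1 only
print(is_valid_clustering(clusters), cluster_size_distribution(clusters, 3))
print(is_valid_clustering([[(0, 0), (0, 1)]]))   # two bumps from signal 0 violate the constraint
```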
EEG Data
• EEG of 22 Mild Cognitive Impairment (MCI) patients and 38 age-matched control subjects (CTR), recorded at rest with eyes closed → spontaneous EEG
• All 22 MCI patients developed Alzheimer’s disease (AD) later on
• Electrodes located at 21 sites according to the international 10-20 system
• Electrodes grouped into 5 zones (reduces the number of pairs); 1 bump model per zone
• Used continuous “artifact-free” intervals of 20 s
• Band-pass filtered between 4 and 30 Hz
EEG data provided by Prof. T. Musha
Similarity measures
• Correlation and coherence
• Granger causality (linear system): DTF, ffDTF, dDTF, PDC, PC, ...
• Phase synchrony: compare instantaneous phases (wavelet/Hilbert transform)
• State space based measures: sync likelihood, S-estimator, S-H-N-indices, ...
• Information-theoretic measures: KL divergence, Jensen-Shannon divergence, ...
[Figure labels: TIME, FREQUENCY; no phase locking vs. phase locking]
Sensitivity (average synchrony)
[Figure: sensitivity of each family of measures: Corr/Coh, Granger, Info. Theor., State Space, Phase, SES]
Mann-Whitney test: a small p-value suggests a large difference between the statistics of the two groups
Significant differences for ffDTF and ρspur!
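The Mann-Whitney test compares the two groups (MCI vs. CTR) through ranks rather than raw values. A minimal pure-Python sketch of the U statistic with a two-sided normal-approximation p-value, without tie or continuity corrections (a simplification; `mann_whitney_u` and the input values are hypothetical):

```python
import math

def mann_whitney_u(x, y):
    """Mann-Whitney U of sample x against sample y, and a two-sided p-value.

    The p-value uses the normal approximation without tie or continuity
    correction (a simplification for illustration).
    """
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical per-subject synchrony values for two groups
print(mann_whitney_u([0.31, 0.28, 0.35, 0.30], [0.40, 0.42, 0.37, 0.45]))
```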
Classification
[Figure: classification plot (ffDTF)]
• Clear separation, but not yet useful as a diagnostic tool
• Additional indicators needed (fMRI, MEG, DTI, ...)
• Can be used for screening the population (inexpensive, simple, fast)
Correlations
Strong (anti-)correlations → “families” of synchrony measures
Ongoing work
Time-varying similarity parameters
[Figure: st over time: high st (no stimulus), low st (stimulus), high st (no stimulus)]
Future work
Matching event patterns instead of single events
[Figure: time-frequency map, f (Hz) vs. t (sec), showing coupling between frequency bands]
= allows us to extract patterns in the time-frequency map of EEG!
HYPOTHESIS:
Perhaps specific patterns occur in time-frequency EEG maps
• of AD patients
• before the onset of epileptic seizures
REMARK:
Such patterns are ignored by classical approaches: STATIONARITY/AVERAGING!
Conclusions
Measure for similarity of point processes (“stochastic event synchrony”)
Key idea: alignment of events
Solved by statistical inference
Application: EEG synchrony of MCI patients
About 85% correctly classified; perhaps useful for screening population
Ongoing/future work: time-varying SES, extracting patterns of bumps
References + software
References
Quantifying Statistical Interdependence by Message Passing on Graphs: Algorithms and Application to Neural Signals, Neural Computation (under revision)
A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease Based on EEG, NeuroImage (under revision)
Measuring Neural Synchrony by Message Passing, NIPS 2007
Quantifying the Similarity of Multiple Multi-Dimensional Point Processes by Integer Programming with Application to Early Diagnosis of Alzheimer's Disease from EEG, EMBC 2008 (submitted)
Software
MATLAB implementation of the synchrony measures
Machine learning for neuroscience
Multi-scale in time and space
Data fusion: EEG, fMRI, spike data, bio-imaging, ...
Large-scale inference
Visualization
Behavior ↔ Brain ↔ Brain Regions ↔ Neural Assemblies ↔ Single neurons ↔ Synapses ↔ Ion channels
Estimation
Simple closed-form expressions
• deltas (δt, δf): average offset
• sigmas (st, sf): variance of offset, ...where the conjugate prior adds artificial observations
Large-scale synchrony
Apparently, all brain regions are affected...
Alzheimer's disease
Outside glimpse: the future (prevalence)
[Figure: millions of sufferers, 1980 to 2050: USA (Hebert et al. 2003), 0 to 14 million; world (Wimo et al. 2003), 0 to 120 million, developed vs. developing countries]
• 2% to 5% of people over 65 years old
• up to 20% of people over 80
Jeong 2004 (Nature)
Ongoing and future work
Applications
Fluctuations of EEG synchrony
Caused by auditory stimuli and music (T. Rutkowski)
Caused by visual stimuli (F. Vialatte)
Yoga professionals (F. Vialatte)
Professional shogi players (RIKEN & Fujitsu)
Brain-Computer Interfaces (T. Rutkowski)
Spike data from interacting monkeys (N. Fujii)
Calcium propagation in glia cells (N. Nakata)
Neural growth (Y. Tsukada & Y. Sakumura)
...
Algorithms
alternative inference techniques (e.g., MCMC, linear programming)
time-dependent (Gaussian processes)
multivariate (T. Weber)
Fitting bump models
[Figure: initialisation, adaptation (gradient method) and result after adaptation; signal and fitted bump]
F. Vialatte et al. “A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).
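The initialisation/adaptation steps can be sketched as follows. Assumptions (not from the slides): Gaussian-shaped bumps with known widths, only amplitude and centre adapted, and a gradient method with per-parameter step sizes (a diagonal Gauss-Newton scaling) on the squared error; `fit_bump` is a hypothetical helper.

```python
import numpy as np

def fit_bump(Z, T, F, width_t, width_f, n_iter=200, damping=0.5):
    """Fit amplitude and centre of one Gaussian bump to the map Z (widths fixed).

    Initialisation at the maximum of Z, then adaptation by scaled gradient steps
    on the squared error 0.5 * sum((model - Z)**2).
    """
    i, j = np.unravel_index(np.argmax(Z), Z.shape)
    a, t0, f0 = Z[i, j], T[i, j], F[i, j]                  # initialisation
    for _ in range(n_iter):
        E = np.exp(-(T - t0) ** 2 / (2 * width_t ** 2) - (F - f0) ** 2 / (2 * width_f ** 2))
        r = a * E - Z                                      # residual
        dt = (T - t0) / width_t ** 2                       # d log E / d t0
        df = (F - f0) / width_f ** 2                       # d log E / d f0
        grad = np.array([np.sum(r * E), np.sum(r * a * E * dt), np.sum(r * a * E * df)])
        curv = np.array([np.sum(E ** 2), np.sum((a * E * dt) ** 2), np.sum((a * E * df) ** 2)])
        a, t0, f0 = np.array([a, t0, f0]) - damping * grad / curv   # adaptation step
    return a, t0, f0

times = np.arange(300) / 100.0
freqs = np.arange(4.0, 30.5, 0.5)
T, F = np.meshgrid(times, freqs)
Z = 2.0 * np.exp(-(T - 1.034) ** 2 / (2 * 0.2 ** 2) - (F - 10.2) ** 2 / (2 * 1.5 ** 2))
print(fit_bump(Z, T, F, width_t=0.2, width_f=1.5))
```

Scaling each gradient component by its curvature keeps one step size usable for parameters with very different units (seconds, Hz, amplitude).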
Boxplots
SURPRISE!
No increase in jitter, but significantly less matched activity!
Physiological interpretation
• neural assemblies more localized?
• harder to establish large-scale synchrony?
Similarity of bump models...
How “similar” or “synchronous” are two bump models?
Probabilistic inference
POINT ESTIMATION: $\theta^{(i+1)} = \arg\max_{\theta} \log p(y, y', c^{(i+1)}, \theta)$
• Uniform prior $p(\theta)$: $\delta_t, \delta_f$ = average offset; $s_t, s_f$ = variance of offset
• Conjugate prior $p(\theta)$: still closed-form expressions
• Other priors $p(\theta)$: numerical optimization (gradient method)
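For the uniform prior, the point estimates are the sample mean and variance of the offsets between matched bumps. With a scaled inverse chi-squared prior on the variance (`nu0` pseudo-observations with variance `s0`), the estimate stays in closed form; the sketch returns its posterior mode. It assumes the simple likelihood N(d; δ, s) and ignores the width-dependent scaling of the variances; `estimate_offset`, `nu0` and `s0` are hypothetical names.

```python
def estimate_offset(offsets, nu0=0.0, s0=0.0):
    """Point estimates (delta, s) of the mean and variance of matched-bump offsets.

    nu0 == 0: uniform prior -> delta = average offset, s = average squared deviation.
    nu0 > 0:  scaled inverse chi-squared prior on s, acting like nu0 artificial
              observations with variance s0; s is the posterior mode given delta.
    """
    n = len(offsets)
    delta = sum(offsets) / n
    ss = sum((d - delta) ** 2 for d in offsets)
    if nu0 == 0.0:
        return delta, ss / n
    return delta, (nu0 * s0 + ss) / (nu0 + n + 2)

print(estimate_offset([1.0, 2.0, 3.0]))                  # uniform prior
print(estimate_offset([1.0, 2.0, 3.0], nu0=2.0, s0=1.0)) # conjugate prior
```

The mode follows from the posterior $\propto s^{-((\nu_0+n)/2+1)} \exp(-(\nu_0 s_0 + \sum (d-\delta)^2)/(2s))$; the prior pulls the variance towards $s_0$, which keeps it from collapsing when only a few bumps are matched.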
Probabilistic inference
MATCHING: $c^{(i+1)} = \arg\max_c \log p(y, y', c, \theta^{(i)})$
EQUIVALENT to an (imperfect) bipartite max-weight matching problem:
$c^{(i+1)} = \arg\max_c \log p(y, y', c, \theta^{(i)}) = \arg\max_c \sum_{k,k'} w_{kk'}^{(i)} c_{kk'}$
s.t. $\sum_{k'} c_{kk'} \le 1$, $\sum_{k} c_{kk'} \le 1$ and $c_{kk'} \in \{0,1\}$
→ find the heaviest set of disjoint edges (not necessarily a perfect matching)
ALGORITHMS
• Polynomial-time algorithms give the optimal solution(s) (Edmonds-Karp and auction algorithm)
• Linear programming relaxation: the extreme points of the LP polytope are integral
• Max-product algorithm gives the optimal solution if it is unique [Bayati et al. (2005), Sanghavi (2007)]
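For small bump models the matching step can be solved exactly by brute force. This sketch (`max_weight_matching` is a hypothetical helper) uses dynamic programming over subsets of the bumps of y', which is exponential in their number; it only illustrates the objective (heaviest set of disjoint edges, unmatched bumps allowed), whereas the polynomial-time algorithms listed above scale to realistic models.

```python
from functools import lru_cache

def max_weight_matching(W):
    """Exact imperfect max-weight bipartite matching by dynamic programming.

    W[k][kp] is the weight w_kk' of pairing bump k of y with bump kp of y'.
    Returns (total weight, tuple of matched pairs). Exponential in len(W[0]).
    """
    n = len(W)
    m = len(W[0]) if W else 0

    @lru_cache(maxsize=None)
    def best(k, used):
        # best total for bumps k..n-1 of y, given the bitmask of already used bumps of y'
        if k == n:
            return 0.0, ()
        value, pairs = best(k + 1, used)                  # leave bump k unmatched
        for kp in range(m):
            # edges with w_kk' <= 0 can never increase the total, so skip them
            if not (used >> kp) & 1 and W[k][kp] > 0:
                v, p = best(k + 1, used | (1 << kp))
                if W[k][kp] + v > value:
                    value, pairs = W[k][kp] + v, ((k, kp),) + p
        return value, pairs

    return best(0, 0)

print(max_weight_matching([[3.0, -1.0, -2.0], [-1.0, 2.0, 1.0]]))
```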
Max-product algorithm
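MATCHING: $c^{(i+1)} = \arg\max_c \log p(y, y', c, \theta^{(i)})$
Generative model
$p(y, y', c, \theta) \propto I(c)\, p_\theta(\theta) \prod_{k,k'} \big( \mathcal{N}(t'_{k'} - t_k; \delta_t, s_{t,kk'})\, \mathcal{N}(f'_{k'} - f_k; \delta_f, s_{f,kk'})\, \beta^{-2} \big)^{c_{kk'}}$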
Max-product algorithm
MATCHING: $c^{(i+1)} = \arg\max_c \log p(y, y', c, \theta^{(i)})$
Conditioning on θ
[Figure: factor graph with messages μ↓ and μ↑]
Max-product algorithm (2)
• Iteratively compute messages
• At convergence, compute marginals $p(c_{kk'}) \propto \mu_{\downarrow}(c_{kk'})\, \mu_{\downarrow}(c_{kk'})\, \mu_{\uparrow}(c_{kk'})$
• Decisions: $c^*_{kk'} = \arg\max_{c_{kk'}} p(c_{kk'})$
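A sketch of max-product in the log (max-sum) domain for the imperfect matching problem. The message updates are the standard ones for max-weight matching with unmatched bumps allowed, derived from the matching constraints; the schedule (synchronous), initialization (messages set to the weights) and iteration count are assumptions, since the slides do not show them. The final "belief" adds the local weight and the two incoming constraint messages, mirroring the product of three messages in the marginal above.

```python
import numpy as np

def _max_excluding_self(M, axis):
    """For each entry, the max of the other entries along `axis` (-inf if there are none)."""
    M = np.moveaxis(M, axis, -1)
    if M.shape[-1] == 1:
        return np.moveaxis(np.full_like(M, -np.inf), -1, axis)
    order = np.sort(M, axis=-1)
    top1, top2 = order[..., -1:], order[..., -2:-1]
    is_top = np.arange(M.shape[-1]) == np.argmax(M, axis=-1)[..., None]
    return np.moveaxis(np.where(is_top, top2, top1), -1, axis)

def max_product_matching(W, n_iter=100):
    """Max-sum messages for imperfect bipartite max-weight matching.

    A[k, kp]: message from bump k of y to bump kp of y'
    B[k, kp]: message from bump kp of y' to bump k of y
    """
    W = np.asarray(W, dtype=float)
    A, B = W.copy(), W.copy()
    for _ in range(n_iter):
        A_new = W - np.maximum(0.0, _max_excluding_self(B, axis=1))
        B_new = W - np.maximum(0.0, _max_excluding_self(A, axis=0))
        A, B = A_new, B_new
    # log-ratio belief that c_kk' = 1: local weight plus the two constraint messages
    belief = (W - np.maximum(0.0, _max_excluding_self(B, axis=1))
                - np.maximum(0.0, _max_excluding_self(A, axis=0)))
    return {(int(k), int(kp)) for k, kp in zip(*np.nonzero(belief > 0))}

print(max_product_matching([[3.0, -1.0], [-1.0, 2.0]]))
```

The `max(0, ...)` terms encode the option of leaving a bump unmatched, which is what makes the matching imperfect.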
Algorithm
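PROBLEM: Given two bump models, compute $(\rho_{\text{spur}}, \delta_t, s_t, \delta_f, s_f)$, with $\theta = (\delta_t, s_t, \delta_f, s_f)$
APPROACH: $(c^*, \theta^*) = \arg\max_{c,\theta} \log p(y, y', c, \theta)$
SOLUTION: Coordinate descent
$c^{(i+1)} = \arg\max_c \log p(y, y', c, \theta^{(i)})$ → MATCHING → max-product
$\theta^{(i+1)} = \arg\max_\theta \log p(y, y', c^{(i+1)}, \theta)$ → ESTIMATION → closed form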
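The coordinate-descent loop can be sketched end to end for bump centres only. Assumptions: uniform priors (so the estimation step reduces to sample mean and variance), unscaled variances, a hypothetical variance floor of 1e-6, a fixed knob β, and an exact brute-force matcher instead of max-product for brevity; `ses` and `_best_matching` are hypothetical names.

```python
import math
from functools import lru_cache

def _best_matching(W):
    """Exact max-weight imperfect bipartite matching by DP (toy: exponential in len(W[0]))."""
    n, m = len(W), len(W[0])

    @lru_cache(maxsize=None)
    def best(k, used):
        if k == n:
            return 0.0, ()
        value, pairs = best(k + 1, used)                  # bump k stays unmatched
        for kp in range(m):
            if not (used >> kp) & 1 and W[k][kp] > 0:
                v, p = best(k + 1, used | (1 << kp))
                if W[k][kp] + v > value:
                    value, pairs = W[k][kp] + v, ((k, kp),) + p
        return value, pairs

    return best(0, 0)[1]

def ses(y, yp, beta, n_iter=20):
    """Coordinate descent for (rho_spur, delta_t, s_t, delta_f, s_f); y, yp: lists of (t, f)."""
    dt, df, st, sf = 0.0, 0.0, 1.0, 1.0                   # initial theta
    pairs = ()
    for _ in range(n_iter):
        # MATCHING: weights w_kk' for the current theta, then max-weight matching
        W = [[-((b[0] - a[0] - dt) ** 2 / st + (b[1] - a[1] - df) ** 2 / sf)
              - 2 * math.log(beta) for b in yp] for a in y]
        new_pairs = _best_matching(W)
        if new_pairs == pairs:
            break                                         # matching unchanged: converged
        pairs = new_pairs
        if not pairs:
            break
        # ESTIMATION (uniform prior): mean and variance of the matched offsets
        ot = [yp[kp][0] - y[k][0] for k, kp in pairs]
        of = [yp[kp][1] - y[k][1] for k, kp in pairs]
        dt, df = sum(ot) / len(ot), sum(of) / len(of)
        st = max(sum((o - dt) ** 2 for o in ot) / len(ot), 1e-6)
        sf = max(sum((o - df) ** 2 for o in of) / len(of), 1e-6)
    rho_spur = (len(y) + len(yp) - 2 * len(pairs)) / (len(y) + len(yp))
    return rho_spur, dt, st, df, sf

y = [(1.0, 10.0), (2.0, 12.0), (3.0, 8.0)]
yp = [(1.4, 10.1), (2.5, 11.9), (3.6, 8.0), (9.0, 20.0)]   # last bump is spurious
print(ses(y, yp, beta=math.exp(-5)))
```

A smaller β (larger $-2\log\beta$) admits more distant matches; with β too large, the shrinking variances can reject true pairs in later iterations.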
Generative model
[Figure: hidden bump model $y_{\text{hidden}}$ and two noisy observations y, y', offset by $(-\delta_t/2, -\delta_f/2)$ and $(\delta_t/2, \delta_f/2)$]
Generate bump model (hidden)
• geometric prior for the number n of bumps: $p(n) = (1 - \lambda S)(\lambda S)^n$
• bumps are uniformly distributed in a rectangle
• amplitude, width (in t and f) all i.i.d.
Generate two “noisy” observations y, y'
• offset between hidden and observed bump = Gaussian random vector with mean $(\pm\delta_t/2, \pm\delta_f/2)$ and covariance $\mathrm{diag}(s_t/2, s_f/2)$
• amplitude, width (in t and f) all i.i.d.
• “deletion” with probability $p_d$
Easily extendable to more than 2 observations…
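Sampling from the generative model makes the parameters tangible. A sketch for bump centres only (amplitudes and widths omitted), assuming the time-frequency rectangle [0, 3] s × [4, 30] Hz (hypothetical defaults) and `lam_S` standing for the product λS < 1; `sample_bump_models` is a hypothetical helper.

```python
import random

def sample_bump_models(lam_S, pd, delta_t, delta_f, s_t, s_f,
                       t_range=(0.0, 3.0), f_range=(4.0, 30.0), rng=random):
    """Sample hidden bump centres and two noisy observations y, y' (centres only)."""
    # geometric prior p(n) = (1 - lam_S) * lam_S**n, requires 0 <= lam_S < 1
    n = 0
    while rng.random() < lam_S:
        n += 1
    hidden = [(rng.uniform(*t_range), rng.uniform(*f_range)) for _ in range(n)]

    def observe(sign):
        obs = []
        for t, f in hidden:
            if rng.random() < pd:          # "deletion": bump not observed
                continue
            # offset ~ Gaussian with mean sign * (delta/2) and variance s/2
            obs.append((t + rng.gauss(sign * delta_t / 2, (s_t / 2) ** 0.5),
                        f + rng.gauss(sign * delta_f / 2, (s_f / 2) ** 0.5)))
        return obs

    return hidden, observe(-1), observe(+1)

hidden, y, y_prime = sample_bump_models(0.9, 0.2, 0.05, 0.0, 0.01, 0.5, rng=random.Random(0))
print(len(hidden), len(y), len(y_prime))
```

The difference of the two offsets of a bump observed in both y and y' then has mean $(\delta_t, \delta_f)$ and covariance $\mathrm{diag}(s_t, s_f)$, as used after eliminating $y_{\text{hidden}}$.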
Generative model (2)
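[Figure: observations y and y' with bumps i (in y) and i', j' (in y'); offsets $(-\delta_t/2, -\delta_f/2)$ and $(\delta_t/2, \delta_f/2)$]
• Binary variables $c_{kk'}$: $c_{kk'} = 1$ if k and k' are observations of the same hidden bump, else $c_{kk'} = 0$ (e.g., $c_{ii'} = 1$, $c_{ij'} = 0$)
• Constraints: $b_k = \sum_{k'} c_{kk'}$ and $b_{k'} = \sum_k c_{kk'}$ are binary (“matching constraints”)
• Generative model $p(y, y', y_{\text{hidden}}, c, \theta)$ with $\theta = (\delta_t, \delta_f, s_t, s_f)$ (symmetric in y and y')
• Eliminate $y_{\text{hidden}}$ → offset is a Gaussian RV with mean $(\delta_t, \delta_f)$ and covariance $\mathrm{diag}(s_t, s_f)$:
$p(y, y', c, \theta) = \int p(y, y', y_{\text{hidden}}, c, \theta)\, dy_{\text{hidden}}$
• Probabilistic inference: $(c^*, \theta^*) = \arg\max_{c,\theta} \log p(y, y', c, \theta)$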
Summary
• Bumps in one model, but NOT in other
→ fraction of “spurious” bumps ρspur
• Bumps in both models, but with offset
→ Average time offset δt (delay)
→ Timing jitter with variance st
→ Average frequency offset δf
→ Frequency jitter with variance sf
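PROBLEM: Given two bump models, compute $(\rho_{\text{spur}}, \delta_t, s_t, \delta_f, s_f)$, with $\theta = (\delta_t, s_t, \delta_f, s_f)$
APPROACH: $(c^*, \theta^*) = \arg\max_{c,\theta} \log p(y, y', c, \theta)$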
Objective function
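[Figure: observations y and y' with bumps i, i', j'; offsets $(-\delta_t/2, -\delta_f/2)$ and $(\delta_t/2, \delta_f/2)$]
• Logarithm of model: $\log p(y, y', c, \theta) = \sum_{k,k'} w_{kk'} c_{kk'} + \log I(c) + \log p_\theta(\theta) + \gamma$, where
$w_{kk'} = -\left( \frac{1}{s_t} (t'_{k'} - t_k - \delta_t)^2 + \frac{1}{s_f} (f'_{k'} - f_k - \delta_f)^2 \right) - 2 \log \beta$
(first term: Euclidean distance between bump centers) and $\beta = p_d (\lambda / V)^{1/2}$
• Large $w_{kk'}$ if: a) bumps are close; b) small $p_d$; c) few bumps per volume element
• No need to specify $p_d$, λ and V: they only appear through β, a knob to control the number of matches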
Distance measures
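Scaling (non-Euclidean):
$w_{kk'} = -\left( \frac{1}{s_{t,kk'}} (t'_{k'} - t_k - \delta_t)^2 + \frac{1}{s_{f,kk'}} (f'_{k'} - f_k - \delta_f)^2 \right) - 2 \log \beta$
with $s_{t,kk'} = (\Delta t_k + \Delta t'_{k'})\, s_t$ and $s_{f,kk'} = (\Delta f_k + \Delta f'_{k'})\, s_f$, where Δt and Δf are the bump widths in time and frequency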
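The width-dependent variances can be computed for all pairs at once. A NumPy sketch, assuming each bump is stored as (t, f, Δt, Δf) with Δt, Δf its widths (a hypothetical layout; `match_weights` is a hypothetical name):

```python
import numpy as np

def match_weights(y, yp, delta_t, delta_f, s_t, s_f, beta):
    """w_kk' for all pairs, with variances scaled by the summed bump widths.

    y, yp: arrays of shape (n, 4) and (n', 4) with columns (t, f, width_t, width_f)
    """
    y, yp = np.asarray(y, float), np.asarray(yp, float)
    dt = yp[None, :, 0] - y[:, None, 0]                    # t'_k' - t_k, shape (n, n')
    df = yp[None, :, 1] - y[:, None, 1]                    # f'_k' - f_k
    s_t_kk = (y[:, None, 2] + yp[None, :, 2]) * s_t        # (Δt_k + Δt'_k') s_t
    s_f_kk = (y[:, None, 3] + yp[None, :, 3]) * s_f        # (Δf_k + Δf'_k') s_f
    return -((dt - delta_t) ** 2 / s_t_kk + (df - delta_f) ** 2 / s_f_kk) - 2 * np.log(beta)

y = [[1.0, 10.0, 0.1, 1.0]]
yp = [[1.5, 10.0, 0.1, 1.0],     # narrow bump, offset 0.5 s
      [1.5, 10.0, 0.4, 1.0]]     # wider bump, same offset
print(match_weights(y, yp, 0.0, 0.0, 1.0, 1.0, beta=1.0))
```

With the same 0.5 s offset, the pair involving the wider bump receives the larger weight, i.e. broad bumps tolerate larger offsets.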
Prior for parameters
• Expect bumps to appear at about the same frequency, but delayed
• A frequency shift requires a non-linear transformation, which is less likely than a delay
• Conjugate priors for $s_t$ and $s_f$ (scaled inverse chi-squared)
• Improper priors for $\delta_t$ and $\delta_f$: $p(\delta_t) = 1 = p(\delta_f)$
Preliminary results for multi-variate model
[Figure: linear combination of $p_c$ for CTR vs. MCI]
Probabilistic inference
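PROBLEM: Given two bump models, compute $(\rho_{\text{spur}}, \delta_t, s_t, \delta_f, s_f)$, with $\theta = (\delta_t, s_t, \delta_f, s_f)$
APPROACH: $(c^*, \theta^*) = \arg\max_{c,\theta} \log p(y, y', c, \theta)$
SOLUTION: Coordinate descent
$c^{(i+1)} = \arg\max_c \log p(y, y', c, \theta^{(i)})$ → MATCHING
$\theta^{(i+1)} = \arg\max_\theta \log p(y, y', c^{(i+1)}, \theta)$ → POINT ESTIMATION
[Figure: alternating minimization $\min_{x \in X,\, y \in Y} d(x, y)$ between two sets X and Y]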
Generative model
[Figure: hidden bump model $y_{\text{hidden}}$ and M noisy observations $y_1, \ldots, y_5$]
Generate bump model (hidden)
• geometric prior for the number n of bumps: $p(n) = (1 - \lambda S)(\lambda S)^n$
• bumps are uniformly distributed in a rectangle
• amplitude, width (in t and f) all i.i.d.
Generate M “noisy” observations
• offset between hidden and observed bump = Gaussian random vector with mean $(\delta_{t,m}/2, \delta_{f,m}/2)$ and covariance $\mathrm{diag}(s_{t,m}/2, s_{f,m}/2)$
• amplitude, width (in t and f) all i.i.d.
• “deletion” with probability $p_d$ (or another prior $p_{c0}$ for the cluster size)
Parameters: $\theta = (\delta_{t,m}, \delta_{f,m}, s_{t,m}, s_{f,m}, p_c)$, where $p_c(i) = p(\text{cluster size} = i \mid y)$ for $i = 1, 2, \ldots, M$
Role of local synchrony
[Figure: stimuli (voice, face) → assembly activation → Hebbian consolidation; a later stimulus (voice) → assembly recall]
(Hebb 1949, Fuster 1997)
Probabilistic inference
PROBLEM: Given M bump models, compute $\theta = (\delta_{t,m}, \delta_{f,m}, s_{t,m}, s_{f,m}, p_c)$
APPROACH: $(c^*, \theta^*) = \arg\max_{c,\theta} \log p(y_1, \ldots, y_M, c, \theta)$
SOLUTION: Coordinate descent
$c^{(i+1)} = \arg\max_c \log p(y_1, \ldots, y_M, c, \theta^{(i)})$ → CLUSTERING (IP or MP)
$\theta^{(i+1)} = \arg\max_\theta \log p(y_1, \ldots, y_M, c^{(i+1)}, \theta)$ → POINT ESTIMATION
Integer program
• Max-product algorithm (MP) on a sparse graph
• Integer programming methods (e.g., LP relaxation)