Summary GC-LDA Related Work Example Topics

Generalized Correspondence-LDA Models (GC-LDA) For
Identifying Functional Regions in the Brain
Timothy N. Rubin¹, Oluwasanmi Koyejo², Michael N. Jones³, Tal Yarkoni⁴
Related Work
•  Existing large-scale meta-analytic approaches for identifying functional
regions tend to use multi-stage processes, which produce widely distributed
networks that are difficult to interpret:
[Figure: LDA-derived regions (Poldrack et al., 2012). Topic 8: Obesity, Cocaine, Disorder, Drug Abuse. Topic 10: Amnesia, Alzheimer's, Korsakoff, Wernicke.]
•  Modeling goals are to address the following weaknesses in existing methods:
•  Jointly identify cognitive function and spatial region (rather than identifying
spatial component in initial stage and mapping to functional component)
•  Identify spatially cohesive, interpretable spatial regions associated with
cognitive functions
•  Additionally, we hope to provide general improvements to Correspondence-LDA that are useful in other applications, such as image annotation
GC-LDA
[Figure: plate diagrams for (A) GC-LDA, (B) the No-Subregions model, and (C) the Subregions models. Variables: α, θ, γ, z, y, w, x, c, φ, β, λ1 … λN, µ, σ, π, δ. Plates: NW, NX, D, T, R.]
[Figure: Example Topics]
Generative process for GC-LDA (A), for each document d:
•  Sample a multinomial mixture θ_d over topics
•  To generate each peak activation in d:
o  Sample a topic y from θ_d
o  Sample an activation x from the spatial distribution f(Λ_y)
•  To generate each word token in d:
o  Sample a topic z from a multinomial distribution proportional to the counts of y-assignments within the document plus a smoothing count γ
o  Sample a word w from the multinomial distribution φ_z
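The generative steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: θ_d is drawn from a symmetric Dirichlet with parameter α, the spatial distribution f is an axis-aligned Gaussian (per-axis standard deviations rather than a full covariance matrix), and the names `generate_document`, `means`, and `sds` are hypothetical.

```python
import random

def sample_dirichlet(rng, alpha, n):
    # Normalized Gamma draws give a sample from a symmetric Dirichlet.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [g / total for g in draws]

def generate_document(rng, alpha, gamma, phi, means, sds, n_peaks, n_words):
    """Generate one document: peak activations first, then word tokens."""
    n_topics = len(phi)
    topics = list(range(n_topics))
    # Assumption: theta_d has a symmetric Dirichlet(alpha) prior.
    theta = sample_dirichlet(rng, alpha, n_topics)

    # Peak activations: topic y from theta_d, then coordinate x from f(Lambda_y).
    y = rng.choices(topics, weights=theta, k=n_peaks)
    x = [[rng.gauss(m, s) for m, s in zip(means[t], sds[t])] for t in y]

    # Word tokens: topic z proportional to (count of y-assignments + gamma).
    # Needs at least one activation or gamma > 0 so the weights are not all zero.
    weights = [y.count(t) + gamma for t in topics]
    z = rng.choices(topics, weights=weights, k=n_words)
    w = [rng.choices(range(len(phi[t])), weights=phi[t], k=1)[0] for t in z]
    return theta, y, x, z, w

rng = random.Random(0)
phi = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]            # 2 topics, 3-word vocabulary (made up)
means = [[-40.0, -20.0, 50.0], [30.0, -90.0, 0.0]]  # hypothetical (x, y, z) centers
sds = [[8.0, 8.0, 8.0], [10.0, 10.0, 10.0]]
theta, y, x, z, w = generate_document(rng, 1.0, 0.01, phi, means, sds, n_peaks=6, n_words=12)
```

With γ = 0, a word token can only take a topic that was assigned to at least one activation in the same document. A positive γ leaves a small probability for every topic.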
GC-LDA Versions Considered
(B) No-Subregions: Spatial distribution for each topic is a Multivariate Gaussian. Equivalent to a smoothed version of Correspondence-LDA, and to standard Correspondence-LDA when γ = 0
(C) Subregions Models: Spatial distribution for each topic is a Gaussian Mixture, where subregion r_y is sampled from topic y with probability π_{r,y} and each topic is associated with R Gaussians, one per subregion (we used R=2)
Considered two variants of (C): Unconstrained Subregions, and Constrained Subregions, where symmetry of means was enforced along the horizontal axis
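A small sketch may help make the Constrained Subregions idea concrete for R = 2. The code below is illustrative only and rests on several assumptions not stated on the poster: the "horizontal axis" is taken to be the first (left-right) coordinate with the midline at 0, each Gaussian is axis-aligned for brevity, and the helper names (`mirror`, `topic_density`) are hypothetical.

```python
import math

def diag_gaussian_pdf(x, mean, sd):
    """Density of an axis-aligned Gaussian (diagonal covariance, for brevity)."""
    density = 1.0
    for xi, mi, si in zip(x, mean, sd):
        density *= math.exp(-0.5 * ((xi - mi) / si) ** 2) / (si * math.sqrt(2 * math.pi))
    return density

def mirror(mean):
    # Assumption: the first coordinate is the left-right axis, midline at 0.
    return [-mean[0]] + list(mean[1:])

def topic_density(x, pi, means, sds):
    """Mixture density for one topic: sum over subregions r of pi[r] * N(x; means[r], sds[r])."""
    return sum(p * diag_gaussian_pdf(x, m, s) for p, m, s in zip(pi, means, sds))

# Constrained variant: the second subregion's mean mirrors the first across the midline.
left = [-40.0, -20.0, 50.0]           # hypothetical subregion center
means = [left, mirror(left)]
sds = [[8.0, 8.0, 8.0], [8.0, 8.0, 8.0]]
pi = [0.5, 0.5]                       # subregion probabilities for this topic

d_left = topic_density(left, pi, means, sds)
d_right = topic_density(mirror(left), pi, means, sds)
```

In the Unconstrained variant the two means would be free parameters. Here only one mean is free and the other is derived from it, so with equal weights and spreads the density is symmetric across the midline.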
Predictions on Holdout Data
•  We trained all three variants of the model on 80% of the document data and predicted all held-out activation and word tokens for a range of γ values
•  For all models, using a non-zero γ improves performance
•  Both Subregions Models outperform the No-Subregions model
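One way to see why a non-zero γ could help on held-out words is a toy calculation. The sketch below is not the evaluation procedure used for these results, which the poster does not describe. It assumes the topic assignments of a document's activations are known (`y_counts`) and scores held-out words by marginalizing over z. All values and names are hypothetical.

```python
import math

def word_log_likelihood(words, y_counts, gamma, phi):
    """Sum over tokens of log( sum_t p(z=t | y_counts, gamma) * phi[t][w] )."""
    n_topics = len(phi)
    total = sum(y_counts) + gamma * n_topics
    p_z = [(c + gamma) / total for c in y_counts]
    ll = 0.0
    for w in words:
        p_w = sum(p_z[t] * phi[t][w] for t in range(n_topics))
        ll += math.log(p_w) if p_w > 0 else float("-inf")
    return ll

phi = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]  # two made-up topics over a 3-word vocabulary
y_counts = [5, 0]                          # every activation was assigned to topic 0
held_out = [0, 1, 2]                       # word 2 is only explained by topic 1

ll_unsmoothed = word_log_likelihood(held_out, y_counts, 0.0, phi)  # -inf
ll_smoothed = word_log_likelihood(held_out, y_counts, 0.1, phi)    # finite
```

With γ = 0, any held-out word that is only explained by a topic absent from the document's activations gets probability zero. A small positive γ keeps the log-likelihood finite.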
[Figure: Log-Likelihood on held-out data. Panels: Activations + Words, Activations only, Words only.]
Summary
•  A central goal of cognitive neuroscience is to find a mapping from neural activity onto cognitive function.
•  We developed a Bayesian topic modeling approach, Generalized Correspondence-LDA (GC-LDA), to construct a functional neuro-anatomical atlas of the brain.
•  GC-LDA is a generalization of the Correspondence-LDA model (Blei & Jordan, 2003) for modeling multiple data types, where one type describes the other.
•  In the current application, input to the model consists of abstract text and reported (x, y, z) activation coordinates for over 11,000 fMRI publications.
•  The models learn a set of T topics, where each topic is parameterized by a probability distribution over linguistic features (e.g. 'vision', 'motion', 'perception') and a spatial distribution over neural activity.
•  Each topic captures a functional region of the brain, where the linguistic features for the topic describe a functional signature associated with the spatial distribution of the topic.
•  We consider three variants of the GC-LDA model, with an emphasis on variants that are particularly useful for modeling neuroimaging data.
[Figure: PCA-derived regions (Smith et al., 2009)]
1 SurveyMonkey. 2 University of Illinois at Urbana-Champaign. 3 Indiana University. 4 University of Texas at Austin.
Conclusions
•  Results on predicting holdout data indicate that a non-zero smoothing parameter γ significantly improves model performance across all variants
•  Predictions on holdout data suggest that the Subregions Models using Gaussian mixtures outperform the No-Subregions model (which is equivalent to a smoothed version of Correspondence-LDA)
•  Qualitative results demonstrate that GC-LDA identifies interpretable functional regions, consistent with known features described in the neuroimaging literature
•  We expect that the model improvements, in terms of both the γ parameter and the spatial distributions, are applicable to other fields such as image annotation
References
Poldrack, R. A., Mumford, J. A., Schonberg, T., Kalar, D., Barman, B., & Yarkoni, T. (2012). Discovering Relations Between Mind, Brain, and Mental Disorders Using Topic Mapping. PLoS Computational Biology, 8(10).
Smith, S. M., Fox, P. T., Miller, K. L., Glahn, D. C., Fox, P. M., Mackay, C. E., ... & Beckmann, C. F. (2009). Correspondence of the brain's functional architecture during activation and rest. Proceedings of the National Academy of Sciences, 106(31), 13040-13045.
Blei, D. M., & Jordan, M. I. (2003). Modeling annotated data. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 127-134). ACM.