Summary/Conclusions - Princeton CS

Mind Reading with fMRI
Ken Norman
Department of Psychology
Princeton University
May 1, 2007
Brain Scanning
• Today’s topic: Applying pattern classifiers to brain scanning
data, to decode the information represented in a person’s
brain at a particular point in time
• This is NOT the standard approach
• Standard approach:
• Stick someone in the scanner
• Have them perform a cognitive task
• Explore which brain regions are engaged by the
cognitive task
Brain Scanning
• If you’re interested in memory retrieval:
• Scan people while they’re retrieving memories
• Scan people during a control condition
• Look at which brain regions respond differentially
• This approach has been very productive for cognitive
neuroscience
Brain Scanning
• Alternative approach to analyzing brain scanning data:
• Use pattern classification algorithms, applied to distributed
patterns of neural activity, to identify the neural signatures of
particular thoughts and memories
• Once we have trained the classifier to recognize a particular
thought, we can use the classifier to track the comings and
goings of those thoughts over time
Motivation
• Why pattern classification?
• Reason #1: Improve the interface between fMRI and cognitive
theories
• Cognitive neuroscientists have developed very detailed
theories of how information is processed in the brain
• What information is represented in different brain structures?
• How is it represented?
• How is that information transformed at different stages of
processing?
• To directly test these theories, we need a way of decoding
the informational contents of the subject’s brain state
Motivation
• Reason #2: We aren’t doing as good a job of “data mining”
fMRI data as we could...
• We collect several GB of information from each subject
• There is a lot of information about subjects’ thoughts
buried in these big data files; the challenge is how to
extract this information
• Machine learning researchers have developed
tremendously powerful algorithms for extracting
meaningful regularities from large data sets
• These algorithms are not routinely used in fMRI data
analysis…
Outline
• 3 minute overview of functional MRI
• Brief overview of existing research on fMRI pattern
classification
• Technical challenges & machine learning issues
Brain Scanning 101
• How do we image neural activity with functional MRI?
• Brain regions that are active use up more metabolic
resources
• In particular, they use up more oxygen from the blood
• The MRI machine can be tuned to detect the difference
between oxygenated and deoxygenated blood
• By looking at which brain areas have deoxygenated vs.
oxygenated blood, we can get a sense of which brain areas
are active at a particular moment
Brain Scanning 101
• It takes approx. 2 seconds for the MRI machine to take a
snapshot of blood flow (across the entire brain)
fMRI images
• Big cube, made out of a grid of little cubes
– Pixel = one square in a 2D grid (“picture
element”)
– Voxel = one of the tiny little cubes in an
fMRI image (like a volumetric pixel)
• Voxels are approx. 3 millimeters on each side
• Neuron size ~ 10 micrometers
• Each voxel reflects the aggregate
activity of a very large number of neurons
• We aren’t directly measuring activity, we
are measuring blood flow!
• Blood flow response is smeared out in
time (peak response = ~6 sec after neural
activity)
Patterns in the brain
• Key idea: Cognitive states correspond to distributed patterns
of brain activity
• What do these “patterns in the brain” look like?
The Eight Categories Study
(Haxby et al. 2001)
Faces
Houses
Cats
Bottles
Scissors
Shoes
Chairs
Scrambled Pictures
slides courtesy of Jim Haxby
[Figure: Accuracy of Category Identification. Y-axis: Identification Accuracy ± SE, 0 to 100; chance level marked]
Overall Accuracy = 96%
Our Studies
• We set out to extend the basic pattern classification method
• The brain patterns from the Haxby study correspond to
several minutes’ worth of brain activity
• We wanted to see if we could classify cognitive states
based on single brain images (reflecting ~2 seconds’
worth of neural activity)
Pattern Classification Method
• General approach: Say that we want to be able to track the
presence of two different cognitive states in the subject’s brain
(e.g., viewing shoes vs. bottles) using fMRI
Pattern Classification Method
1. Acquire brain data
while the subject is
thinking about shoes
or bottles
Pattern Classification Method
1. Acquire brain data
2. Convert each
functional brain
volume (~ 2 seconds
worth of data) into a
vector that reflects the
pattern of activity
across voxels at that
point in time.
We typically do some
kind of feature
selection to cut down
on the number of
voxels
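The conversion in step 2 can be sketched as follows. This is a minimal illustration, not the lab's actual pipeline: the array layout (x, y, z, time), the function name `volumes_to_patterns`, and the boolean `mask` standing in for the result of feature selection are all assumptions made for the example.

```python
import numpy as np

def volumes_to_patterns(volumes, mask):
    """Flatten a 4D scan (x, y, z, time) into a (time, voxels) pattern matrix.

    `mask` is a boolean (x, y, z) array marking the voxels to keep
    (a hypothetical stand-in for the outcome of feature selection).
    """
    if volumes.shape[:3] != mask.shape:
        raise ValueError("mask must match the spatial shape of the volumes")
    # Boolean indexing over the three spatial axes gives (n_kept_voxels, time);
    # transposing puts one brain pattern (one volume) in each row.
    return volumes[mask].T

# Toy data: a 4 x 4 x 3 grid of voxels scanned at 10 time points.
rng = np.random.default_rng(0)
volumes = rng.normal(size=(4, 4, 3, 10))
mask = np.zeros((4, 4, 3), dtype=bool)
mask[1:3, 1:3, :] = True          # keep a 2 x 2 x 3 block (12 voxels)

patterns = volumes_to_patterns(volumes, mask)
print(patterns.shape)             # (10, 12)
```

Each row of `patterns` is then one observation for the classifier, with one column per retained voxel.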
Pattern Classification Method
1. Acquire brain data
2. Generate brain
patterns
3. Label brain patterns
according to whether
the subject was
viewing shoes vs.
bottles (adjusting for
lag in the blood flow
response)
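One simple way to adjust for the lag is to delay the stimulus labels by a fixed number of volumes. The sketch below assumes the figures quoted earlier in the talk (one volume every ~2 seconds, peak response ~6 seconds after neural activity); the function name `shift_labels` and the use of -1 for "unlabeled" are illustrative choices, and a fixed shift is only a rough approximation of the blood flow response.

```python
import numpy as np

def shift_labels(labels, lag_volumes, fill=-1):
    """Delay condition labels by `lag_volumes` scans.

    The first `lag_volumes` scans receive `fill`, meaning "unlabeled".
    """
    labels = np.asarray(labels)
    if lag_volumes == 0:
        return labels.copy()
    shifted = np.full_like(labels, fill)
    shifted[lag_volumes:] = labels[:-lag_volumes]
    return shifted

TR_SECONDS = 2.0          # assumed time per volume
PEAK_LAG_SECONDS = 6.0    # assumed delay of the peak blood flow response
lag = int(round(PEAK_LAG_SECONDS / TR_SECONDS))   # 3 volumes

# 0 = shoes on screen, 1 = bottles on screen (one entry per volume)
stimulus = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(shift_labels(stimulus, lag))    # [-1 -1 -1  0  0  0  0  1]
```

Volumes marked -1 would be dropped before training.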
Pattern Classification Method
1. Acquire brain data
2. Generate brain
patterns
3. Label brain patterns
4. Train a classifier to
discriminate between
bottle patterns and
shoe patterns
Simple Neural Network Classifier (Logistic Regression)
• To estimate how much subjects are thinking about bottles, compute a weighted
sum of voxel activity values; do the same for shoes
• Apply decision rule (e.g., sigmoid function)
• To train the classifier, we use a learning algorithm that sets the weights to
maximize decision performance (e.g., backpropagation)
[Figure: network diagram with an input layer (voxels) feeding an output layer (bottle vs. shoe)]
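The weighted sum, sigmoid, and weight learning described on this slide can be sketched in a few lines of NumPy. This is a simplified illustration rather than the classifier used in the studies: it uses a single output unit (probability of "bottle") instead of one unit per category, plain gradient descent (which is what backpropagation reduces to for a network with no hidden layer), and a small L2 penalty. The data are synthetic and all names and parameter values are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(patterns, labels, learning_rate=0.1, n_steps=500, l2=0.01):
    """Fit voxel weights by gradient descent on the logistic loss."""
    n_patterns, n_voxels = patterns.shape
    weights = np.zeros(n_voxels)
    bias = 0.0
    for _ in range(n_steps):
        prob = sigmoid(patterns @ weights + bias)   # weighted sum, then sigmoid
        error = prob - labels                       # gradient of loss w.r.t. the sum
        weights -= learning_rate * (patterns.T @ error / n_patterns + l2 * weights)
        bias -= learning_rate * error.mean()
    return weights, bias

def predict_proba(patterns, weights, bias):
    return sigmoid(patterns @ weights + bias)

# Synthetic data: 50 voxels, only the first 5 distinguish bottles from shoes.
rng = np.random.default_rng(0)
n_per_class, n_voxels = 40, 50
signal = np.zeros(n_voxels)
signal[:5] = 1.0
bottle = rng.normal(size=(n_per_class, n_voxels)) + signal
shoe = rng.normal(size=(n_per_class, n_voxels)) - signal
patterns = np.vstack([bottle, shoe])
labels = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])  # 1 = bottle

weights, bias = train_logistic(patterns, labels)
train_accuracy = np.mean((predict_proba(patterns, weights, bias) > 0.5) == labels)
print(train_accuracy)
```

Accuracy on the training patterns is an optimistic number; the meaningful test is on patterns that were not presented at training.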
Pattern Classification Method
1. Acquire brain data
2. Generate brain
patterns
3. Label brain patterns
4. Train the classifier
5. Apply the trained
classifier to new brain
patterns (not
presented at training).
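Step 5 is usually evaluated by holding out part of the data. The sketch below assumes the patterns come grouped into scanner runs and holds out one run at a time; the classifier is a simple correlation-based one (assign each test pattern to the class whose average training pattern it correlates with best), chosen only because it is short. Function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def zscore_rows(a):
    a = a - a.mean(axis=1, keepdims=True)
    return a / a.std(axis=1, keepdims=True)

def correlation_classify(train_x, train_y, test_x):
    """Label each test pattern with the class of the best-correlated template."""
    classes = np.unique(train_y)
    templates = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Pearson correlation between every test pattern and every class template.
    r = zscore_rows(test_x) @ zscore_rows(templates).T / test_x.shape[1]
    return classes[np.argmax(r, axis=1)]

def leave_one_run_out(patterns, labels, runs):
    """Train on all runs but one, test on the held-out run, and average."""
    accuracies = []
    for held_out in np.unique(runs):
        test = runs == held_out
        predicted = correlation_classify(patterns[~test], labels[~test], patterns[test])
        accuracies.append(np.mean(predicted == labels[test]))
    return float(np.mean(accuracies))

# Synthetic data: 4 runs, 10 patterns per class per run, 30 voxels.
rng = np.random.default_rng(0)
n_runs, n_per_class, n_voxels = 4, 10, 30
signal = np.zeros(n_voxels)
signal[:10] = 1.5
labels = np.tile(np.repeat([0, 1], n_per_class), n_runs)
runs = np.repeat(np.arange(n_runs), 2 * n_per_class)
patterns = rng.normal(size=(labels.size, n_voxels)) + np.outer(2 * labels - 1, signal)

accuracy = leave_one_run_out(patterns, labels, runs)
print(accuracy)
```

Any step that looks at the labels (including voxel selection) has to be run on the training runs only, otherwise the held-out accuracy is inflated.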
Free Recall & Mental Time Travel (Polyn et al.,
2005)
• How do we selectively retrieve memories from a particular
event?
• Intuitively: We try to recapture our mindset from that event
• Concretely: We try to make our brain state during recall
resemble our brain state during the original event
• “Mental Time Travel”
• Goal of the study: Use fMRI pattern-analysis to image this
process of mental time travel as it happens...
Imaging Mental Time Travel (Polyn et al., 2005)
• Memory experiment: Subjects study 3 types of stimuli
Jack Nicholson
Giza pyramids
flask
• Recall test: Recall items from all 3 categories, in any order
• Hypothesis: To recall a particular category, subjects try to
recapture their mindset from the study phase
• In concrete terms: Subjects try to make their brain state at
test resemble their brain state when they were studying that
category
• If subjects successfully recapture their brain state from the
study phase, this will trigger recall of specific studied items...
Analysis strategy
• Step 1: Feed fMRI data from the study phase into a pattern
classification algorithm
• Train the pattern classifier to recognize the brain patterns
associated with studying faces vs. locations vs. objects
Neural network classifier
• Mapping from voxel activity values to output units (one
per category)
Analysis strategy
• Step 2: Apply the trained classifier to brain data from the
retrieval phase
• Use the classifier to track, second-by-second, how well
the subject’s brain state at retrieval matches their brain
state when they were studying faces vs. locations vs.
objects
Predictions
• As subjects try to recall faces, locations, and objects, their
brain state should come into alignment with the brain
states associated with studying faces, locations, and
objects
• This neural measure of category-specific “mental
reinstatement” should be predictive of recall
[Figure: Final free recall, classifier output. Traces: match to face study context, match to location study context, match to object study context]
Classifier traces for Subject 9 during final free recall.
Other findings
• Kamitani & Tong (2005): decode the orientation of a striped
pattern that is being viewed by the subject (accurate to within
20 degrees)
2006 Pittsburgh competition
• Subjects were scanned while they watched 3 episodes of
“Home Improvement”
• Time-varying ratings obtained for “amusement”, “food”,
“tools”, “faces”...
• Goal: predict ratings using brain data
• Train a classifier using brain data + ratings from 2 episodes
• Then, feed the trained classifier the brain data from the 3rd
episode and use the classifier to predict (in a second-by-second fashion) the subject’s feature ratings
2006 competition
• Some representative correlation values:
• Amusement: .46
• Faces: .67
• Language: .69
• Laughter: .58
• Motion: .49
• Music: .76
• Tools: .62
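For reference, a correlation score of this kind can be computed as below. The sketch assumes the reported values are Pearson correlations between the predicted and the actual rating time series for the held-out episode; the slide does not spell out the exact scoring procedure, and the function name and toy numbers are made up for illustration.

```python
import numpy as np

def pearson_r(predicted, actual):
    """Pearson correlation between two equally long time series."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    p = predicted - predicted.mean()
    a = actual - actual.mean()
    return float(p @ a / np.sqrt((p @ p) * (a @ a)))

# Toy rating time series (one value per time point).
actual = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0])
noisy_prediction = np.array([0.5, 0.5, 2.5, 2.0, 2.5, 0.5])

print(pearson_r(noisy_prediction, actual))
```

Because correlation ignores scale and offset, a prediction can score well without matching the absolute rating values.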
2007 Pittsburgh competition
• www.braincompetition.org
Interim Summary
• By applying classifiers to fMRI data, we can derive a time-varying estimate of the subject’s cognitive state that relates
in a meaningful way to their behavior
• Technical challenges
Technical Challenges
• From the perspective of machine learning, fMRI classification is
a particularly difficult problem (Mitchell et al., 2004, Machine
Learning)
• Big patterns
• Noisy patterns
• Relatively few patterns
• What can we do to improve classification?
Classifiers
• We have tried lots of classifiers
– Neural network, correlation-based classifiers, support vector
machines, Gaussian Naive Bayes, boosting, k-nearest-neighbor, linear discriminant analysis...
• The exact classifier that we use doesn’t seem to matter (much);
nonlinear classifiers do not systematically outperform linear
classifiers...
• Regularization helps (e.g., ridge regression outperforms normal
regression)
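To make the regularization point concrete, here is a minimal sketch of ridge regression in closed form. It uses synthetic data with more voxels than patterns, the "big patterns, relatively few patterns" regime in which ordinary least squares has no unique solution. The function name, penalty values, and data are assumptions for illustration.

```python
import numpy as np

def ridge_fit(x, y, penalty):
    """Closed-form ridge weights: solve (X'X + penalty * I) w = X'y."""
    n_features = x.shape[1]
    return np.linalg.solve(x.T @ x + penalty * np.eye(n_features), x.T @ y)

# Synthetic data: 30 patterns, 100 voxels, only the first 10 voxels matter.
rng = np.random.default_rng(0)
n_patterns, n_voxels = 30, 100
x = rng.normal(size=(n_patterns, n_voxels))
true_w = np.zeros(n_voxels)
true_w[:10] = 1.0
y = x @ true_w + rng.normal(size=n_patterns)

w_weak = ridge_fit(x, y, penalty=0.1)      # little shrinkage
w_strong = ridge_fit(x, y, penalty=100.0)  # heavy shrinkage

# A larger penalty pulls the weight vector toward zero.
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

The penalty is a tuning parameter; in practice it would be chosen on held-out data rather than fixed in advance.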
Feature Selection
• Getting rid of noisy voxels greatly helps performance
• Standard method:
• Run a voxel-wise omnibus ANOVA on the conditions of interest
(e.g., face vs. location vs. object)
• Get rid of voxels that don’t vary significantly across conditions
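The voxel-wise ANOVA can be sketched as below. The F statistic is the standard one-way ANOVA computed separately for each voxel. One deliberate simplification: instead of thresholding on a significance level, this sketch keeps the `n_keep` voxels with the largest F values, which avoids needing an F-distribution routine. Function names and the synthetic data are assumptions.

```python
import numpy as np

def anova_f(patterns, labels):
    """One-way ANOVA F statistic for each voxel (column) across conditions."""
    classes = np.unique(labels)
    n_total, n_voxels = patterns.shape
    grand_mean = patterns.mean(axis=0)
    ss_between = np.zeros(n_voxels)
    ss_within = np.zeros(n_voxels)
    for c in classes:
        group = patterns[labels == c]
        group_mean = group.mean(axis=0)
        ss_between += group.shape[0] * (group_mean - grand_mean) ** 2
        ss_within += ((group - group_mean) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = n_total - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)

def select_top_voxels(patterns, labels, n_keep):
    """Indices of the n_keep voxels with the largest F values, in ascending order."""
    f = anova_f(patterns, labels)
    return np.sort(np.argsort(f)[::-1][:n_keep])

# Synthetic data: 3 conditions (e.g., face / location / object), 40 voxels,
# only the first 5 voxels differ across conditions.
rng = np.random.default_rng(0)
n_per_class, n_voxels = 20, 40
labels = np.repeat([0, 1, 2], n_per_class)
patterns = rng.normal(size=(labels.size, n_voxels))
patterns[:, :5] += 3.0 * labels[:, None]

selected = select_top_voxels(patterns, labels, n_keep=5)
print(selected)   # with this strong synthetic signal, voxels 0 through 4
```

The selection should be computed from training data only, then applied unchanged to the test patterns.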
Feature Selection
• This ANOVA method helps, but it has several problems
• Main benefit of linear classifiers is that they can aggregate weak
signals across voxels
• In light of this, it seems like a bad idea to discard individual
voxels just because the voxel’s signal is weak...
Feature Selection
• What we really want to do is to come up with multivariate
means of voxel selection
• we want to select sets of voxels that in aggregate carry
useful information
• Promising approach: Searchlights (Kriegeskorte et al., 2006,
PNAS)
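The searchlight idea can be sketched as follows: center a small sphere on each voxel in turn, and score the whole set of voxels inside the sphere rather than each voxel alone. This is a rough sketch under stated assumptions, not the procedure from the cited paper: the score used here is simply the Euclidean distance between the two conditions' mean local patterns, and the function names, radius, and synthetic data are made up for illustration.

```python
import numpy as np

def sphere_offsets(radius):
    """Integer (dx, dy, dz) offsets lying within `radius` voxels of the center."""
    r = int(np.floor(radius))
    grid = np.arange(-r, r + 1)
    dx, dy, dz = np.meshgrid(grid, grid, grid, indexing="ij")
    inside = dx**2 + dy**2 + dz**2 <= radius**2
    return np.stack([dx[inside], dy[inside], dz[inside]], axis=1)

def mean_pattern_distance(local_patterns, labels):
    """Multivariate score: distance between the two conditions' mean patterns."""
    a = local_patterns[labels == 0].mean(axis=0)
    b = local_patterns[labels == 1].mean(axis=0)
    return float(np.linalg.norm(a - b))

def searchlight(volumes, labels, radius, score_fn):
    """Score the sphere around every voxel of a (x, y, z, time) array."""
    shape = np.array(volumes.shape[:3])
    offsets = sphere_offsets(radius)
    scores = np.zeros(volumes.shape[:3])
    for center in np.ndindex(*volumes.shape[:3]):
        coords = np.array(center) + offsets
        # Drop sphere voxels that fall outside the volume.
        coords = coords[np.all((coords >= 0) & (coords < shape), axis=1)]
        local = volumes[coords[:, 0], coords[:, 1], coords[:, 2]].T  # (time, voxels)
        scores[center] = score_fn(local, labels)
    return scores

# Synthetic data: 6 x 6 x 6 voxels, 20 time points, signal in a 2 x 2 x 2 cluster.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
volumes = rng.normal(size=(6, 6, 6, labels.size))
volumes[2:4, 2:4, 2:4, labels == 1] += 2.0

scores = searchlight(volumes, labels, radius=1.5, score_fn=mean_pattern_distance)
print(scores[2, 2, 2], scores[0, 0, 0])
```

Spheres overlapping the informative cluster score higher than spheres elsewhere, so the score map can be thresholded to pick sets of voxels that carry information in aggregate.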
Dimensionality Reduction
• We are also exploring different methods of re-coding the data
• There is extensive redundancy across voxels (esp. spatially
proximal voxels)
• Is there a more efficient way to represent the input (i.e., with
fewer dimensions)?
• Manifold learning
• Spatial wavelet decomposition
• ICA
Dimensionality reduction algorithms
• Generative models (David Weiss & David Blei)
• Each brain state is made of a linear combination of
“neural topics”
• Each topic = a pattern of voxel activity across the whole
brain (positive and negative values are OK)
• To generate a brain state from topics, multiply each topic
by a positive value and sum the results
• Topics are constrained to be spatially sparse (L2
regularization; trying L1 also)
Next steps
• We know a lot about the brain (in general), the fMRI
response, and cognition that we are not telling the
classifier…
• Currently: Each brain pattern is treated as a distinct
observation
• In actuality: There is massive correlation between adjacent
time points
• Knowing the information represented at time n tells you a lot
about the information represented at time n + 1
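The temporal correlation mentioned here can be quantified with a lag-1 autocorrelation, sketched below on synthetic time series. The moving-average smoothing is only a crude stand-in for the sluggish blood flow response, and the function name and series are assumptions.

```python
import numpy as np

def lag1_autocorrelation(series):
    """Correlation between a time series and itself shifted by one time point."""
    series = np.asarray(series, dtype=float)
    centered = series - series.mean()
    return float(centered[:-1] @ centered[1:] / (centered @ centered))

rng = np.random.default_rng(0)
white = rng.normal(size=2000)                                # independent time points
smeared = np.convolve(white, np.ones(5) / 5, mode="valid")   # temporally smeared

print(lag1_autocorrelation(white))     # near 0
print(lag1_autocorrelation(smeared))   # well above 0
```

A high value means adjacent volumes are far from independent observations, which is the information a classifier that treats each pattern separately leaves unused.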
Next steps
• In addition to temporal correlation, there is extensive spatial
correlation
• Nearby voxels tend to represent similar things
• One way to address this issue is by spatially smoothing the
data (averaging together activity from nearby voxels)
• However, you can lose information this way
• A more sophisticated approach would be to directly measure
pairwise correlations between voxels and incorporate this
information in the model
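Spatial smoothing, and the information loss it can cause, can be illustrated with a simple box filter. This sketch averages each voxel with its 3 x 3 x 3 neighborhood; a box kernel is used only for brevity (it is an assumption of this example, not a statement about how the data in the talk were smoothed), and the function name and toy volume are made up.

```python
import numpy as np

def box_smooth(volume):
    """Replace each voxel with the mean of its 3 x 3 x 3 neighborhood."""
    padded = np.pad(volume, 1, mode="edge")   # repeat edge values at the border
    nx, ny, nz = volume.shape
    total = np.zeros(volume.shape, dtype=float)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                total += padded[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return total / 27.0

# A single strongly active voxel in an otherwise silent 5 x 5 x 5 volume.
volume = np.zeros((5, 5, 5))
volume[2, 2, 2] = 27.0

smoothed = box_smooth(volume)
print(smoothed[2, 2, 2])   # 1.0
print(smoothed[1, 1, 1])   # 1.0
```

After smoothing, the active voxel is indistinguishable from its neighbors: noise is averaged down, but so is any fine-grained spatial pattern.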
Next steps
• Currently, our analyses are focused on single subjects
• Is there some way to leverage data from other subjects to
help with classification?
• If you run 10 subjects in the Haxby 8-category experiment,
none of the subjects will have the exact same “shoe”
representation, but the shoe representations are not random
either
• It might be possible to draw on data from other subjects to
set priors on which voxels will be involved in representing
shoes
Next steps
• Also, there is an enormous body of evidence relating to
which brain structures are involved in a given cognitive task
• “face area”, “place area”
• We can use this information to set priors on voxel weights in
the classification process
Next steps
• The cognitive states that we are trying to classify often have
a hierarchical structure
• How you represent a stimulus depends on the task that you
are performing
• Informing the classifier about this hierarchical structure
should boost classification
Next steps
• Different tasks (dangerous/safe, land/water) have different
neural signatures
• If we can detect the neural signatures of these tasks, we can
conditionalize the classifier on which task representation is
present in the subject’s head
Next steps
• Lots of potential constraints
• Temporal autocorrelation
• Correlation between nearby voxels
• Data from other subjects in the same experiment
• Data from other experiments
• Hierarchical structure of cognitive states
• How do we inform the classifier of these constraints?
• Graphical models should provide a way of doing this…
Summary
• By applying pattern classification algorithms to neuroimaging
data, we can extract a tremendous amount of information
regarding what subjects are thinking, and how subjects’
thoughts evolve over time
• Plenty of room for improvement...
• Solving these problems will require meaningful contributions
from several disciplines: Cognitive psychology, neuroscience,
machine learning, engineering, signal processing, statistics,
and mathematics...
Computational Memory Lab
• Michael Bannert
• Melissa Carroll
• Denis Chigirev
• Greg Detre
• Chris Moore
• Ehren Newman
• Joel Quamme
• Susan Robison
• Per Sederberg
• Matt Weber
• David Weiss
• And many others…
http://compmem.princeton.edu
Princeton Colleagues
• David Blei (Comp. Sci.)
• Matt Botvinick (PSY)
• Jon Cohen (PSY)
• Ingrid Daubechies (Math)
• Jim Haxby (PSY)
• Fei-Fei Li (Comp. Sci.)
• Dan Osherson (PSY)
• Peter Ramadge (EE)
• Rob Schapire (Comp. Sci.)
• Greg Stephens (Physics)
my email: [email protected]
Princeton Multi-Voxel Pattern Analysis Toolkit
currently in public beta-testing:
www.csbmb.princeton.edu/mvpa
NiAM (NeuroImaging Analysis Methods) group
meets Fridays 2pm