Pixels to Voxels: Modeling Visual Representation in the Human Brain

Authors: Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant
Presenters: JunYoung Gwak, Kuan Fang
Outline
● Background
● Motivation
● Related Works
● Models
● Experiments
Background
Brain areas related to vision
Simulation of brain activities
Background
Functional magnetic resonance imaging (fMRI)
Blood oxygenation level-dependent (BOLD)
Motivation
[Diagram labels: Human vision; Computer vision; Feature representations; outputs on both sides: mountain/people/fishes/...]
Related Works
T. Naselaris et al., "Bayesian reconstruction of natural images from human brain activity," Neuron, 2009
Related Works
S. Nishimoto et al., "Reconstructing visual experiences from brain activity evoked by natural movies," Current Biology, 2011
Model: Ridge Regression
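A minimal sketch of a ridge-regression encoding model, assuming the usual closed-form solution W = (XᵀX + αI)⁻¹XᵀY. The slide names only the method; the function names, the regularization strength `alpha`, and the synthetic data below are illustrative assumptions.

```python
import numpy as np

def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y.

    features:  (n_images, n_features) array, one row per stimulus image.
    responses: (n_images, n_voxels) array, one column per voxel.
    Returns weights of shape (n_features, n_voxels).
    """
    n_features = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)

def predict(features, weights):
    """Predicted voxel responses for a set of stimulus features."""
    return features @ weights

# Synthetic stand-in data (hypothetical sizes, not the real fMRI dataset).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))           # image features
true_W = rng.standard_normal((10, 3))        # ground-truth weights for 3 voxels
Y = X @ true_W + 0.01 * rng.standard_normal((200, 3))
W = fit_ridge(X, Y, alpha=1e-3)
```

Each voxel gets its own weight column, so one solve fits all voxels that share a feature space.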
19-Category (19-Cat) Feature Representation
● 19-dimensional binary vector indicating the presence (1) or absence (0) of 19 semantic image categories (e.g. "furniture," "vehicle," "water," etc.)
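The indicator encoding can be sketched as below. Only the three category names quoted on the slide are used; the full list of 19 is not given here, so `CATEGORIES` is a shortened, illustrative stand-in and `category_indicator` is a hypothetical helper.

```python
import numpy as np

# Only these three names appear on the slide; the real feature space has 19.
CATEGORIES = ["furniture", "vehicle", "water"]

def category_indicator(labels, categories=CATEGORIES):
    """Return a binary vector: 1 if the category is present in the image, else 0."""
    present = set(labels)
    return np.array([1 if c in present else 0 for c in categories], dtype=np.int8)

# An image annotated as containing water and a vehicle.
vec = category_indicator({"water", "vehicle"})
```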
ConvNet Feature Representation
For each brain voxel, select one of the ConvNet layers as the ConvNet feature:
● AlexNet pretrained on ImageNet classification
● 7 feature spaces (conv-1 to conv-5, fc-6 to fc-7)
● Select the ConvNet layer that maximizes prediction accuracy of voxel activity
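The per-voxel layer selection can be sketched as an argmax over a layers-by-voxels accuracy table. The accuracy values below are made up, and the slides do not say on which data split the accuracy is measured; a split held out from the final evaluation would be the usual choice to avoid optimistic bias.

```python
import numpy as np

LAYERS = ["conv-1", "conv-2", "conv-3", "conv-4", "conv-5", "fc-6", "fc-7"]

def select_best_layer(accuracy):
    """accuracy: (n_layers, n_voxels) prediction accuracy of each layer's model.

    Returns, for every voxel, the index of the best layer and its accuracy.
    """
    best_idx = np.argmax(accuracy, axis=0)
    best_acc = accuracy[best_idx, np.arange(accuracy.shape[1])]
    return best_idx, best_acc

# Made-up accuracies for three voxels.
acc = np.zeros((7, 3))
acc[0, 0] = 0.6   # voxel 0 is best predicted from conv-1
acc[4, 1] = 0.5   # voxel 1 is best predicted from conv-5
acc[6, 2] = 0.4   # voxel 2 is best predicted from fc-7
best_idx, best_acc = select_best_layer(acc)
best_names = [LAYERS[i] for i in best_idx]
```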
Fisher-Vector (FV) Feature Representation
● Patch features: extracted using SIFT descriptors
● Prototypical patch features: a dictionary of 64 modes of patch features, learned using a Gaussian Mixture Model (GMM) on random natural image patches
● FV features: reflect the difference between patch features and prototypical patch features
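A simplified sketch of the "difference from prototypes" idea: the mean-gradient part of a Fisher vector under a diagonal-covariance GMM. This is not the full FV pipeline (no variance gradients, no power or L2 normalization), the GMM parameters are passed in rather than learned, and the example uses 2 modes in 2 dimensions instead of 64 modes over SIFT descriptors.

```python
import numpy as np

def fisher_vector_means(descriptors, weights, means, variances):
    """Mean-gradient part of a Fisher vector for a diagonal-covariance GMM.

    descriptors: (n, d) patch features; weights: (k,) mixture weights;
    means, variances: (k, d). Returns a vector of length k * d.
    """
    n = descriptors.shape[0]
    diff = descriptors[:, None, :] - means[None, :, :]             # (n, k, d)
    # Log of the weighted Gaussian density of each descriptor under each mode.
    log_prob = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
    log_prob += np.log(weights)
    log_prob -= log_prob.max(axis=1, keepdims=True)                # numerical stability
    posterior = np.exp(log_prob)
    posterior /= posterior.sum(axis=1, keepdims=True)              # soft assignments (n, k)
    # Posterior-weighted, variance-normalized deviation from each mode's mean.
    grad = np.einsum("nk,nkd->kd", posterior, diff / np.sqrt(variances))
    grad /= n * np.sqrt(weights)[:, None]
    return grad.ravel()

gmm_weights = np.array([0.5, 0.5])
gmm_means = np.array([[0.0, 0.0], [10.0, 10.0]])
gmm_vars = np.ones((2, 2))

# Descriptors scattered symmetrically around the first mode cancel out.
fv_zero = fisher_vector_means(np.array([[1.0, 0.0], [-1.0, 0.0]]), gmm_weights, gmm_means, gmm_vars)
# Descriptors shifted consistently away from the first mode do not.
fv_shift = fisher_vector_means(np.array([[1.0, 0.0], [1.0, 0.0]]), gmm_weights, gmm_means, gmm_vars)
```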
Encoding model performance
● How accurately does each model predict the brain activity?
  ○ Measure the performance of feature spaces from computer vision
    ■ Fisher-Vector (FV) feature
    ■ ConvNet feature
  ○ Compare against the previously studied 19-Category feature
● Accuracy measure: correlation coefficient between predicted and measured voxel responses
● To avoid correlations that arise by chance, focus on significant values
  ○ Null distribution: correlations obtained by chance, from 1000 permutations of the validation set responses
  ○ Take the upper (1 - p-value)-th percentile of the null distribution as the threshold for significant correlation
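The permutation threshold can be sketched for a single voxel as follows. The function names, the synthetic data, and the default `p_value=0.01` are illustrative assumptions. Note that 1000 permutations of one voxel cannot resolve a percentile as strict as p < 0.0001; that would need a larger null sample (for example, pooled across voxels), and the slides do not say how this was handled.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def significance_threshold(predicted, measured, p_value=0.01, n_permutations=1000, seed=0):
    """Upper (1 - p_value) percentile of correlations with shuffled responses."""
    rng = np.random.default_rng(seed)
    null = np.array([
        pearson_r(predicted, rng.permutation(measured))   # shuffling breaks the pairing
        for _ in range(n_permutations)
    ])
    return float(np.percentile(null, 100.0 * (1.0 - p_value)))

# Synthetic validation responses for one voxel (hypothetical data).
rng = np.random.default_rng(1)
measured = rng.standard_normal(120)
predicted = measured + 0.5 * rng.standard_normal(120)
r = pearson_r(predicted, measured)
threshold = significance_threshold(predicted, measured)
is_significant = r > threshold
```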
Encoding model performance
For each plot
● Each point: accuracy of a single voxel's prediction
● x-axis: 19-Cat accuracy
● y-axis: computer vision feature accuracy
● Gray area: accuracy below the significance threshold (p < 0.0001)
What this means
● Red dots: voxels where the computer vision feature predicts better than the 19-Cat model
● Blue dots: voxels where the 19-Cat model predicts better than the computer vision feature
● Gray dots: indistinguishable from noise, discarded
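The coloring rule can be sketched as below. It assumes a voxel is gray only when both models fall below the significance threshold; the slides do not spell out that detail, and the accuracies and threshold here are made up.

```python
import numpy as np

def color_voxels(acc_19cat, acc_cv, threshold):
    """Label each voxel red, blue, or gray following the scatter-plot legend."""
    colors = np.where(acc_cv > acc_19cat, "red", "blue")
    # Assumption: discard a voxel only if neither model predicts it significantly.
    not_significant = (acc_19cat < threshold) & (acc_cv < threshold)
    colors[not_significant] = "gray"
    return colors

acc_19cat = np.array([0.10, 0.50, 0.05])
acc_cv = np.array([0.40, 0.30, 0.02])
colors = color_voxels(acc_19cat, acc_cv, threshold=0.2)
```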
Encoding model performance
Plotted according to ROIs, as identified in earlier studies
[Figure: per-ROI scatter plots, grouped into early visual areas and higher visual areas]
Encoding model performance
Computer vision features clearly outperform 19-Cat features in lower visual areas
● Lower visual areas are known to be selective for structural information in natural images
● 19-Cat features do not carry any structural information
Encoding model performance
Computer vision features show performance comparable to 19-Cat features in higher visual areas
● Believed to be involved in form processing and object segmentation
In general, the ConvNet feature performs better than the FV feature
Encoding model performance
FV feature vs ConvNet feature
● Red: ConvNet feature outperforms FV feature
● Blue: FV feature outperforms ConvNet feature
The ConvNet feature outperforms the FV feature in early and intermediate visual areas
The two are comparable in higher visual areas
Investigating Voxel Tuning
● What we have shown:
  ○ FV and ConvNet features can be used to predict brain activity in many visual areas
● What we want to do:
  ○ Gain a better understanding of human visual representation by examining the FV and ConvNet models (for this paper, analysis using ConvNet only)
● How:
  ○ Visualize the top five images that activate/deactivate each voxel in different ROIs
  ○ Cluster the model weights within the same ROI and visualize each cluster
Voxel activation analysis
● One voxel from each ROI
● Top five images that activate/deactivate each voxel, out of 170k images
[Figure: top five images that activate this voxel]
[Figure: top five images that deactivate this voxel]
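The ranking step can be sketched as follows, assuming the images are ordered by the encoding model's predicted response for the voxel (170k images is far more than could be shown in a scanner). The function name and the toy features are illustrative.

```python
import numpy as np

def top_and_bottom_images(image_features, voxel_weights, k=5):
    """Indices of the k images with the highest and lowest predicted response."""
    predicted = image_features @ voxel_weights   # one predicted response per image
    order = np.argsort(predicted)                # ascending
    return order[::-1][:k], order[:k]

# Toy example: 12 "images" with one-hot features, so predicted response = weight.
toy_features = np.eye(12)
toy_weights = np.arange(12.0)
top5, bottom5 = top_and_bottom_images(toy_features, toy_weights)
```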
Voxel activation analysis
● V1:
  ○ Activity increases with high-frequency texture
  ○ Activity decreases with low-frequency texture
[Figure: top five images that activate this voxel]
[Figure: top five images that deactivate this voxel]
Voxel activation analysis
● V4 (function largely unknown from previous studies):
  ○ Activity increases with a blob in the center
  ○ Activity decreases with large-scale texture
[Figure: top five images that activate this voxel]
[Figure: top five images that deactivate this voxel]
Voxel activation analysis
● EBA:
  ○ Activity increases with people or animals
  ○ Activity decreases with scenes or texture
[Figure: top five images that activate this voxel]
[Figure: top five images that deactivate this voxel]
Voxel activation analysis
● PPA:
  ○ Activity increases with large scenes
  ○ Activity decreases with small, highly textured items
[Figure: top five images that activate this voxel]
[Figure: top five images that deactivate this voxel]
Model weight clustering
● Investigate the fine-grained structure of an ROI
● K-means clustering on the ConvNet model weights of EBA voxels with high prediction accuracy
● Repeated in two subjects to check reproducibility
● Visualize the top five images that activate/deactivate each cluster's average model weights
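A minimal NumPy sketch of the clustering step, using plain Lloyd's k-means on per-voxel weight vectors. The number of clusters (k=2, matching the two clusters shown on the following slides), the initialization, and the synthetic weights are assumptions; the slides do not give these details.

```python
import numpy as np

def kmeans(points, k, n_iters=50, seed=0):
    """Plain Lloyd's k-means. points: (n, d). Returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:                 # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the returned centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# Synthetic weight vectors for 40 voxels forming two well-separated groups.
rng = np.random.default_rng(0)
voxel_weights = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 4)),
    rng.normal(5.0, 0.1, size=(20, 4)),
])
labels, centers = kmeans(voxel_weights, k=2)
# Cluster-average weights, which can then be used to rank images per cluster.
cluster_means = np.array([voxel_weights[labels == j].mean(axis=0) for j in range(2)])
```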
Model weight clustering
● C1:
  ○ Activity increases with groups of people in action
  ○ Activity decreases with rounded shapes (subject s1) or landscapes (subject s2)
[Figure: top five images that activate this cluster]
[Figure: top five images that deactivate this cluster]
Model weight clustering
● C2:
  ○ Activity increases with a single person
  ○ Activity decreases with landscapes
[Figure: top five images that activate this cluster]
[Figure: top five images that deactivate this cluster]
Model weight clustering
● Clusters are spatially coherent
● Clusters are present at corresponding anatomical locations in both subjects
Conclusion
● Show that features used in computer vision can be used to predict human brain activity
● Propose a new way to investigate visual representation in the human brain by analyzing ConvNet models
  ○ Analysis results match previous findings about different ROIs
  ○ The analysis can be used to explore conventional ROIs in further detail