Pixels to Voxels: Modeling Visual Representation in the Human Brain
Authors: Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant
Presenters: JunYoung Gwak, Kuan Fang

Outline
● Background
● Motivation
● Related Works
● Models
● Experiments

Background
● Brain areas related to vision
● Simulation of brain activity

Background
● Functional magnetic resonance imaging (fMRI)
● Blood oxygenation level-dependent (BOLD) signal

Motivation
[Diagram: Human vision → mountain/people/fishes/...; Computer vision → feature representations → mountain/human/fishes/...]

Related Works
● T. Naselaris et al., "Bayesian Reconstruction of Natural Images from Human Brain Activity," Neuron, 2009
● S. Nishimoto et al., "Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies," Current Biology, 2011

Model: Ridge Regression
● For each voxel, a linear model maps image features to the measured response; its weights are fit with ridge (L2-regularized) regression

19-Category (19-Cat) Feature Representation
● A 19-dimensional binary vector indicating the presence (1) or absence (0) of 19 semantic image categories (e.g., "furniture," "vehicle," "water")

ConvNet Feature Representation
For each brain voxel, one ConvNet layer is selected as that voxel's feature space:
● AlexNet pretrained on ImageNet classification
● 7 candidate feature spaces (conv-1 to conv-5, fc-6 and fc-7)
● The selected layer is the one that maximizes prediction accuracy for that voxel's activity

Fisher-Vector (FV) Feature Representation
● Patch features: extracted using SIFT descriptors
● Prototypical patch features: a dictionary of 64 modes of patch features, learned by fitting a Gaussian Mixture Model (GMM) to random natural image patches
● FV features: reflect the differences between the patch features and the prototypical patch features

Encoding model performance
● How accurately does each model predict brain activity?
  ○ Measure the performance of feature spaces from computer vision:
    ■ Fisher-Vector (FV) features
    ■ ConvNet features
  ○ Compare them with the previously studied 19-Category features
● Accuracy measure: correlation coefficient between predicted and measured voxel responses
● To avoid counting correlations that arise by chance, only significant values are considered:
  ○ Null distribution: correlations obtained by chance, from 1000 permutations of the validation-set responses
  ○ Significance threshold: the value exceeded by only a fraction p of the null distribution (its (1 - p) quantile)

Encoding model performance
[Figure: per-voxel prediction accuracy, computer vision features vs. 19-Cat features]
● In each plot:
  ○ Each point: prediction accuracy for a single voxel
  ○ x-axis: 19-Cat accuracy
  ○ y-axis: computer vision feature accuracy
  ○ Gray area: accuracy below the significance threshold (p < 0.0001)
● What this means:
  ○ Red dots: voxels where the computer vision feature model outperforms the 19-Cat model
  ○ Blue dots: voxels where the 19-Cat model outperforms the computer vision feature model
  ○ Gray dots: indistinguishable from noise; discarded

Encoding model performance
● Plotted by ROI (region of interest), as identified in earlier studies
[Figure panels: early visual areas | higher visual areas]

Encoding model performance
● Computer vision features clearly outperform 19-Cat features in early (lower) visual areas:
  ○ These areas are known to be selective for structural information in natural images
  ○ 19-Cat features carry no structural information
● Computer vision features perform comparably to 19-Cat features in higher visual areas:
  ○ These areas are believed to be involved in form processing and object segmentation
● In general, ConvNet features perform better than FV features

Encoding model performance: FV vs. ConvNet features
● Red: ConvNet features outperform FV features
● Blue: FV features outperform ConvNet features
● ConvNet features outperform FV features in early and intermediate visual areas
● The two are comparable in higher visual areas

Investigating Voxel Tuning
● What we have shown:
  ○ FV and ConvNet features can be used to predict brain activity in many visual areas
● What we want to do:
  ○ Gain a better understanding of human visual representation by examining the FV and ConvNet models (this paper analyzes the ConvNet model only)
● How:
  ○ Visualize the top five images that activate/deactivate each voxel in different ROIs
  ○ Cluster the model weights within the same ROI and visualize each cluster

Voxel activation analysis
● One voxel from each ROI
● For each voxel, the top five images (out of 170k) that activate it and the top five that deactivate it
[Figures, one per ROI: top five images that activate the voxel | top five images that deactivate the voxel]
● V1:
  ○ Activity increases with high-frequency texture
  ○ Activity decreases with low-frequency texture
● V4 (an area whose tuning was largely unknown from previous studies):
  ○ Activity increases with a blob in the center
  ○ Activity decreases with large-scale texture
● EBA (extrastriate body area):
  ○ Activity increases with people or animals
  ○ Activity decreases with scenes or texture
● PPA (parahippocampal place area):
  ○ Activity increases with large scenes
  ○ Activity decreases with small, highly textured items

Model weight clustering
● Goal: investigate the fine-grained structure of an ROI
● K-means clustering of the ConvNet model weights of EBA voxels with high prediction accuracy
● Two subjects (people), to check that the results are reproducible
● Visualize the top five images that activate/deactivate each cluster's average model weights
[Figures, one per cluster: top five images that activate the cluster | top five images that deactivate the cluster]
● C1:
  ○ Activity increases with groups of people in action
  ○ Activity decreases with rounded shapes (subject 1) or landscapes (subject 2)
● C2:
  ○ Activity increases with a single person
  ○ Activity decreases with landscapes

Model weight clustering
● Clusters are spatially coherent
● Clusters are present at corresponding anatomical locations in both subjects

Conclusion
● Shows that features used in computer vision can predict human brain activity
● Proposes a new way to investigate visual representation in the human brain by analyzing ConvNet models:
  ○ The analysis results match previous findings about different ROIs
  ○ The analysis can be used to explore conventional ROIs in further detail
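The per-voxel encoding model from the "Model: Ridge Regression" and "ConvNet Feature Representation" slides can be sketched as: fit one ridge regression per candidate layer, score each on held-out responses by correlation, and keep the best layer. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`fit_ridge`, `select_layer`, etc.) are hypothetical, the penalty `lam` is fixed rather than tuned (the slides do not say how it was chosen), there is no intercept, and plain Python lists stand in for real feature matrices.

```python
# Hypothetical sketch: per-voxel ridge regression with ConvNet layer selection.
# Assumptions: fixed ridge penalty, no intercept, tiny list-based "matrices".
import math


def solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X, y, lam):
    """Ridge weights w = (X^T X + lam * I)^-1 X^T y (no intercept, assumed)."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(X, w):
    """Linear prediction of one voxel's response for each row (image) of X."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_layer(layers, y_train, y_val, lam=1.0):
    """Fit one ridge model per layer; keep the highest validation correlation.

    `layers` maps a layer name (e.g. "conv-1") to (X_train, X_val) feature
    rows for that layer; this layout is a hypothetical choice.
    """
    best = None
    for name, (x_train, x_val) in layers.items():
        w = fit_ridge(x_train, y_train, lam)
        r = pearson(predict(x_val, w), y_val)
        if best is None or r > best[1]:
            best = (name, r, w)
    return best  # (layer name, validation r, weights)
```

Because this sketch picks the layer with the same held-out responses it reports, a real analysis would choose the layer (and `lam`) on data kept separate from the final evaluation set.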
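The "Fisher-Vector (FV) Feature Representation" slide describes features that reflect how patch descriptors differ from the GMM's prototypical modes. A minimal sketch of the commonly used Fisher-vector formulation for a diagonal-covariance GMM (normalized gradients with respect to each mode's means and variances) follows. Assumptions: the GMM weights, means and standard deviations are given rather than learned, short float lists stand in for SIFT descriptors, and the power/L2 normalization often applied in practice is omitted; the slides do not say which FV variant the paper used, and the function names are hypothetical.

```python
# Hypothetical Fisher-vector sketch; the GMM is assumed already trained.
import math


def responsibilities(x, weights, means, stds):
    """Posterior probability of each GMM mode for one descriptor x."""
    logs = []
    for w, mu, sd in zip(weights, means, stds):
        ll = math.log(w)
        for xd, md, sdd in zip(x, mu, sd):
            z = (xd - md) / sdd
            ll -= 0.5 * (math.log(2 * math.pi * sdd * sdd) + z * z)
        logs.append(ll)
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(descriptors, weights, means, stds):
    """Concatenate first-order (mean) then second-order (variance) statistics."""
    n = len(descriptors)
    n_modes = len(weights)
    dim = len(means[0])
    g_mu = [[0.0] * dim for _ in range(n_modes)]
    g_sd = [[0.0] * dim for _ in range(n_modes)]
    for x in descriptors:
        gamma = responsibilities(x, weights, means, stds)
        for k in range(n_modes):
            for d in range(dim):
                z = (x[d] - means[k][d]) / stds[k][d]  # offset from mode k
                g_mu[k][d] += gamma[k] * z
                g_sd[k][d] += gamma[k] * (z * z - 1.0)
    fv = []
    for k in range(n_modes):
        fv += [v / (n * math.sqrt(weights[k])) for v in g_mu[k]]
    for k in range(n_modes):
        fv += [v / (n * math.sqrt(2 * weights[k])) for v in g_sd[k]]
    return fv
```

With the slide's 64 modes and D-dimensional descriptors, this layout yields 2 × 64 × D values per image.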
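The significance test on the "Encoding model performance" slide can be sketched as a permutation test: shuffle the validation responses, recompute the correlation with the fixed predictions, and take a high quantile of the resulting null distribution as the threshold. Assumptions: one voxel at a time, a nearest-rank quantile, and a seeded `random.Random`; the function names are hypothetical. With 1000 permutations, an empirical quantile cannot resolve p below 1/1000, so at p = 0.0001 this sketch simply returns the largest null correlation.

```python
# Hypothetical permutation-test sketch for one voxel's prediction accuracy.
import math
import random


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def significance_threshold(pred, resp, n_perm=1000, p=0.0001, seed=0):
    """Correlation a voxel must exceed to be called significant at level p.

    The null distribution correlates the fixed predictions with `n_perm`
    random permutations of the measured validation responses.
    """
    rng = random.Random(seed)
    shuffled = list(resp)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(pearson(pred, shuffled))
    null.sort()
    # Nearest-rank (1 - p) quantile; saturates at the null maximum
    # whenever p < 1 / n_perm.
    k = min(n_perm, max(1, math.ceil((1 - p) * n_perm))) - 1
    return null[k]


def is_significant(pred, resp, **kwargs):
    """True if the observed correlation exceeds the permutation threshold."""
    return pearson(pred, resp) > significance_threshold(pred, resp, **kwargs)
```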
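For the voxel activation analysis, ranking images reduces to computing each image's predicted response under the voxel's fitted weights and taking the extremes. A short sketch, assuming a linear model with no intercept; `image_features`, `weights` and `top_and_bottom` are hypothetical names, and a toy list replaces the 170k-image set.

```python
# Hypothetical sketch: rank images by one voxel's predicted response.
import heapq


def predicted_responses(image_features, weights):
    """Linear prediction of one voxel's response to each image (no intercept)."""
    return [sum(f * w for f, w in zip(feats, weights)) for feats in image_features]


def top_and_bottom(image_features, weights, k=5):
    """Indices of the k images with the highest and the lowest predicted response."""
    scores = predicted_responses(image_features, weights)
    idx = range(len(scores))
    activating = heapq.nlargest(k, idx, key=scores.__getitem__)
    deactivating = heapq.nsmallest(k, idx, key=scores.__getitem__)
    return activating, deactivating
```

`heapq.nlargest` and `heapq.nsmallest` avoid fully sorting a large image set when only five images per end are needed.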
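The "Model weight clustering" step can be sketched as k-means over the voxels' weight vectors, followed by ranking images by each cluster's average weight vector (the k-means centroid). This sketch assumes plain Lloyd's algorithm with the first k voxels as initial centroids, squared Euclidean distance, and an iteration cap; the slides do not specify these details, and a real analysis would typically use k-means++ initialization and several restarts. All names are hypothetical.

```python
# Hypothetical sketch: k-means on voxel weights, then rank images per cluster.
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Element-wise average of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns (centroids, cluster label of each point)."""
    centroids = [list(p) for p in points[:k]]  # assumed init: first k voxels
    labels = []
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return centroids, labels


def cluster_top_images(image_features, centroid, k=5):
    """Rank images by the response predicted from a cluster's average weights."""
    scores = [sum(f * w for f, w in zip(feats, centroid)) for feats in image_features]
    idx = range(len(scores))
    return (heapq.nlargest(k, idx, key=scores.__getitem__),
            heapq.nsmallest(k, idx, key=scores.__getitem__))
```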