NATURE VS. NURTURE
BIOLOGICAL PARALLELS TO DEEP LEARNING
Kai Arulkumaran
Imperial College London
2015-11-13
@KaiLashArul
OVERVIEW
Deep or Learning?
Biological Parallels
Black Boxes
DEEP OR LEARNING?
DEEP LEARNING'S PROMINENCE
"Unreasonably effective"
ImageNet Classification (Krizhevsky et al., 2012)
Transition from hand-engineered to learned features
Discovering higher-level abstract features
HISTORY:
BIOLOGICAL MOTIVATIONS
Artificial Neural Networks: neurons, synapses, synaptic weights, activations
First thoughts on flying machines: flapping wings
First practical flying machines: fixed-wing gliders
What are the relevant (biological) principles?
ANNs emit "average firing rates" ⇒ rate coding
Spiking Neural Networks (SNNs) emit spikes ⇒ temporal coding
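A minimal sketch of the distinction, under the simplifying assumption of a Bernoulli spike model: an ANN unit outputs an average firing rate directly, while a spiking unit emits a binary train whose empirical mean recovers that rate.

```python
# Toy contrast between rate coding and temporal coding (my illustration,
# not from the talk): a rate-coded output vs. a sampled spike train.
import numpy as np

rng = np.random.default_rng(0)

rate = 0.3  # "average firing rate" emitted by an ANN unit (spikes/timestep)

# Temporal coding: sample a Bernoulli spike train over discrete timesteps
timesteps = 1000
spikes = rng.random(timesteps) < rate  # True wherever a spike occurs

print("ANN output (rate coding):", rate)
print("SNN empirical rate (temporal coding):", spikes.mean())
print("First 20 timesteps:", spikes[:20].astype(int))
```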
REPRESENTATION LEARNING
a.k.a. feature learning (Bengio et al., 2013)
DL's priors are distributed representations and depth
Distinguishable regions and model parameters
Local representations: #regions ≈ #params
Distributed representations: #regions ≈ exp(#params)
Depth allows hierarchical organisation
Joint learning is ideal, but depth still confers benefits
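A toy worked example of the local/distributed gap, assuming n binary feature detectors (my own illustration): a local (one-hot) code can only dedicate one region per unit, while a distributed code distinguishes every on/off combination.

```python
# Why distributed representations carve input space into exponentially
# many regions: with n binary features, local codes give n regions,
# distributed codes give up to 2**n.
n = 10
local_regions = n            # one region per dedicated unit
distributed_regions = 2**n   # every on/off combination is distinct
print(local_regions, distributed_regions)  # 10 vs 1024
```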
CORTICAL LAYERS
Cerebral cortex has six main layers
"Cortical columns"
Drawings of cortical layers (Cajal, 1899)
IMITATING THE VISUAL CORTEX
Simple and complex cells in V1 (Hubel & Wiesel, 1959)
Visualising receptive fields (Hyvärinen et al., 2009)
Hierarchical Fourier-based operators (Granlund, 1978)
Neocognitron: S-cells and C-cells (Fukushima, 1980)
LEARNING VISUAL FEATURES
CNN layer 1 filters (Krizhevsky et al., 2012)
Resembles receptive fields in the visual cortex!
Same with sparse coding, k-means etc. (Memisevic, 2015)
Natural image statistics
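A sketch for reproducing this observation, assuming torchvision ≥ 0.13 and its pretrained AlexNet as a stand-in for Krizhevsky et al.'s original network: the first-layer filters come out as oriented, Gabor-like edge and colour detectors.

```python
# Inspect the first-layer filters of a pretrained CNN; they resemble
# receptive fields in the visual cortex, as the slide notes.
import torchvision.models as models
import matplotlib.pyplot as plt

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
filters = model.features[0].weight.detach()  # shape: (64, 3, 11, 11)

# Rescale each filter to [0, 1] for display
fmin = filters.amin(dim=(1, 2, 3), keepdim=True)
fmax = filters.amax(dim=(1, 2, 3), keepdim=True)
filters = (filters - fmin) / (fmax - fmin)

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.permute(1, 2, 0).numpy())  # CHW -> HWC
    ax.axis("off")
plt.show()
```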
INDEPENDENT COMPONENTS
ANALYSIS
ICA (Hyvärinen et al., 2009)
SPARSE CODING
Sparse coding (Olshausen & Field, 1997)
K-MEANS
k-means (Memisevic, 2015)
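A sketch of the k-means case in the spirit of Memisevic (2015), with scikit-learn's sample image standing in for a natural-image dataset (an assumption; any image collection would do). Whitening the patches first is what makes the centroids come out Gabor-like.

```python
# k-means on whitened natural-image patches learns oriented, Gabor-like
# centroids, just like ICA and sparse coding on the previous slides.
from sklearn.datasets import load_sample_image
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import PCA
from sklearn.cluster import MiniBatchKMeans

img = load_sample_image("china.jpg").mean(axis=2)  # grayscale natural image
patches = extract_patches_2d(img, (12, 12), max_patches=20000, random_state=0)
X = patches.reshape(len(patches), -1)
X = X - X.mean(axis=1, keepdims=True)  # per-patch DC removal

pca = PCA(whiten=True)  # whitening is key to Gabor-like centroids
Xw = pca.fit_transform(X)
km = MiniBatchKMeans(n_clusters=49, n_init=3, random_state=0).fit(Xw)

# Un-whiten the centroids back to pixel space for visualisation
centroids = pca.inverse_transform(km.cluster_centers_).reshape(-1, 12, 12)
print(centroids.shape)  # (49, 12, 12): plot these to see oriented edges
```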
CELLULAR NEURAL NETWORKS
One cell in analog hardware (Chua & Yang, 1988)
conv(input) + conv(recurrent) → nonlin(tanh)
Pixel input includes its neighbourhood, like an MRF
Each "cell" (pixel) has "dendrites"
BIOLOGICALLY INSPIRED
[Architecture diagram: Image → Convolution (C) → Nonlinearity → Divisive Norm → orientation-adapted Poolers (SS) → SI Descriptors]
Cortexica/BICV architecture
2x conv → pool → nonlin → feat-pool → norm
Internal feedback (gated connections)
REALTIME TRACKING
Virgin Marathon 2010
ONE MODULE 1/3
Convolution + pooling filters
Convolution (parameterised Gabor-like wavelets)
Retinotopic mapping (Wandell, 1995; Bharath & Ng, 2005)
Average pooling (subsampling)
Receptive field support increases (Foster, 1985)
ONE MODULE 2/3
Nonlinearity (absolute function)
A nonlinearity is needed for complex cells to gain invariance over their simple-cell inputs
Achieved via rectification functions (Hubel & Wiesel, 1962)
ONE MODULE 3/3
Feature space pooling (Granlund & Knutsson, 1994)
Double-angle mapping (K → 2):
$$\sum_{k=0}^{K/2-1} \left| f_k^{(\ell)}(m, n) \right| e^{j 2 \phi_k}$$
Orientation dominance in V1 cells (Ringach, 2002)
Divisive normalisation
Performed across the brain (Carandini & Heeger, 2012)
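A condensed sketch of one such module with toy parameters of my own (not the Cortexica/BICV implementation): Gabor-pair convolution, absolute-value rectification, the double-angle pooling above, and divisive normalisation.

```python
# One BICV-style module: conv -> |.| -> double-angle feature pooling -> norm.
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import convolve2d

def gabor_pair(size=9, theta=0.0, freq=0.25, sigma=2.0):
    """Even/odd (cosine/sine) Gabor kernels at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * xr), env * np.sin(2 * np.pi * freq * xr)

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))  # stand-in for a real image

K = 8  # orientation channels, phi_k spanning [0, pi)
S = np.zeros(img.shape, dtype=complex)
for theta in np.arange(K) * np.pi / K:
    even, odd = gabor_pair(theta=theta)
    f = convolve2d(img, even, mode="same") + 1j * convolve2d(img, odd, mode="same")
    # Rectify (|.|) and pool over feature space via the double-angle mapping
    S += np.abs(f) * np.exp(1j * 2 * theta)

# Divisive normalisation: divide by pooled local energy (Carandini & Heeger)
energy = uniform_filter(np.abs(S), size=5)
normalised = np.abs(S) / (1e-3 + energy)
print(normalised.shape, np.angle(S).shape)  # magnitude + 2x-orientation maps
```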
"TRADITIONAL" CV PIPELINE
Appearance-based localisation (Rivera-Rubio et al., 2015)
Real-time appearance-based indoor localisation
Trained and tested on low-resolution images
Data augmentation not just for DL (Chatfield et al., 2014)
EVEN SO...
We can hand-engineer low-level features reasonably well
But what about high-level features?
Deep learning tackles perceptual information
Good Old Fashioned AI tackles symbolic information
Can we bridge the conceptual gap (Hassabis, 2011)?
Let's look at the brain...
BIOLOGICAL PARALLELS
3 CONTROVERSIAL HYPOTHESES...
...for computation in the primate cortex (Dean et al., 2012)
Modular Minds Hypothesis
Single Algorithm Hypothesis
Scalable Cortex Hypothesis
MODULAR MINDS
Visual cortex, auditory cortex etc.
Evidence from neurophysiology, neuroimaging etc.
Specialism is fine, but multimodality ⟹ intelligence(?)
Psychology: auditory and visual word forms
As humans develop, we increasingly engage in crossmodal tasks (Cone et al., 2008)
Acts as a regulariser (Srivastava & Salakhutdinov, 2012)
ANALOGIES
Very human trait
word2vec: woman is to cat as man is to...
...dog (Mikolov et al., 2013)
Visual/word analogies (Kiros et al., 2014)
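A sketch of the analogy arithmetic with gensim; the pretrained vectors file is an assumption and must be downloaded separately (e.g. the Google News vectors of Mikolov et al., 2013).

```python
# word2vec analogy: cat - woman + man ≈ ?  (expected: dog, or similar)
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

print(kv.most_similar(positive=["cat", "man"], negative=["woman"], topn=3))
```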
SINGLE ALGORITHM
Neocortex has homogeneous high-level structure
Optic nerve rerouted to auditory cortex (Roe et al., 1992)
Can learn to use ultrasonic sensors (Warwick et al., 2005)
Is the algorithm backpropagation (or BPTT)?
BACKPROPAGATION
Is it biologically plausible?
Hinton likes "thought vectors" (Kiros et al., 2015)
Same was said for "wake-sleep" (Hinton et al., 1995), etc.
Bengio is working on target propagation (Bengio, 2014)
SCALABLE CORTEX
Bigger (deeper) is better (Bengio, 2009)
How do we compare against other primates?
The difference depends on the area, but is usually small (Dean et al., 2012)
Most noticeable in language areas (Granger, 2006)
Neuroscientists now compare against NNs
OBJECT CATEGORISATION
Complex cells are phase & ~translation invariant
Inferior Temporal Cortex categorises (Hung et al., 2005)
View-invariant representations (Quiroga et al., 2005)
Humans perform categorisation when activations in ITC
settle to become more stable (Ritchie et al., 2015)
ATTENTION
The brain has finite computational power
Selective attention during information processing
System naturally has a bottleneck (Anderson, 2004)
Visual attention is old (Schmidhuber & Huber, 1991)
Many more applications than that!
Caption-to-image (Mansimov et al., 2015)
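A minimal soft-attention sketch with toy dimensions of my own (not the Mansimov et al. model): a query scores a set of annotation vectors, a softmax turns the scores into weights, and the output is the weighted average, i.e. the bottleneck that selects what to process.

```python
# Soft attention: score -> softmax -> weighted sum over annotations.
import numpy as np

rng = np.random.default_rng(0)
annotations = rng.standard_normal((10, 16))  # 10 locations, 16-dim features
query = rng.standard_normal(16)              # current decoder state

scores = annotations @ query                 # alignment score per location
weights = np.exp(scores - scores.max())
weights /= weights.sum()                     # softmax attention weights
context = weights @ annotations              # attended summary vector
print(weights.round(3), context.shape)
```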
VANISHING GRADIENTS
Vanishing or exploding gradients (Hochreiter, 1991)
Normalisation to the rescue?
Canonical computation (Carandini & Heeger, 2012)
Batch Norm (can) work well (Ioffe & Szegedy, 2015)
Divisive normalisation in biology
Used for sparsity (Gülçehre & Bengio, 2013)
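A sketch relating the two normalisations on this slide, written out in NumPy with toy activations (my illustration): batch norm's subtractive/divisive step beside a simple divisive normalisation.

```python
# Batch norm (Ioffe & Szegedy, 2015) vs. divisive normalisation
# (Carandini & Heeger's "canonical computation"), side by side.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))  # batch of 32 pre-activations, 64 units
eps = 1e-5

# Batch normalisation (training-mode statistics; learnable gamma/beta omitted)
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Divisive normalisation: each unit divided by pooled activity in its layer
dn = x / (1.0 + np.abs(x).mean(axis=1, keepdims=True))

print(bn.std(axis=0)[:3].round(3))  # ≈ 1 per unit after batch norm
print(dn.shape)
```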
PARALLELS ARE AFFIRMING
Not doing things the same as everyone else
But finding the same things independently
One more neuroscience case study...
BLACK BOXES
UNDERSTANDING NNS
Human Brain Project
Aim: simulate a whole human brain by 2023
Simulating a brain ≠ intelligence(?)
Model uncertainty (Gal & Ghahramani, 2015)
Deep Gaussian Processes (Damianou & Lawrence, 2012)
REPRODUCIBLE RESEARCH
The "long tail" of science
DL community is great at sharing information
Code goes on Git repos
Datasets are stored as files/in databases
But every unrecorded experiment loses data
Enter FGLab: https://kaixhin.github.io/FGLab/
FGLAB 1/2
Client-server machine learning dashboard
Node.js + MongoDB or Docker
Web GUI and API
Command-line inputs
FGLAB 2/2
Structured/unstructured file outputs
Save what you want
Compare results
Data is saved
QUESTIONS
How can we further combine depth and learning?
Can we extract the relevant principles from biology?
Do we really understand our models?
THANKS
Anil Bharath (many biological references)
Nerhun Yildiz (Cellular Neural Networks)
Marta Garnelo (sanity check)
You (for listening?)
REFERENCES 1/4
[1] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep
convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
[2] Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new
perspectives. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(8), 1798-1828.
[3] Ramón y Cajal, S. (1899). Comparative study of the sensory areas of the human cortex.
[4] Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate
cortex. The Journal of physiology, 148(3), 574-591.
[5] Hyvärinen, A., Hurri, J., & Hoyer, P. O. (2009). Natural Image Statistics: A Probabilistic Approach
to Early Computational Vision (Vol. 39). Springer Science & Business Media.
[6] Granlund, G. H. (1978). In search of a general picture processing operator. Computer Graphics
and Image Processing, 8(2), 155-173.
[7] Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism
of pattern recognition unaffected by shift in position. Biological cybernetics, 36(4), 193-202.
[8] Memisevic, R. (2015). Visual features: From Fourier to Gabor. Deep Learning Summer School,
Montreal 2015.
[9] Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy
employed by V1?. Vision research, 37(23), 3311-3325.
[10] Chua, L. O., & Yang, L. (1988). Cellular neural network: Theory. IEEE Trans. Circuits Syst, 35,
1257-1272.
REFERENCES 2/4
[11] Wandell, B. A. (1995). Foundations of vision. Sinauer Associates.
[12] Bharath, A. A., & Ng, J. (2005). A steerable complex wavelet construction and its application
to image denoising. Image Processing, IEEE Transactions on, 14(7), 948-959.
[13] Foster, K. H., Gaska, J. P., Nagler, M., & Pollen, D. A. (1985). Spatial and temporal frequency
selectivity of neurones in visual cortical areas V1 and V2 of the macaque monkey. The Journal of
Physiology, 365(1), 331-363.
[14] Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional
architecture in the cat's visual cortex. The Journal of physiology, 160(1), 106.
[15] Granlund, G. H., & Knutsson, H. (1994). Signal Processing for Computer Vision. Springer
Science & Business Media.
[16] Ringach, D. L., Shapley, R. M., & Hawken, M. J. (2002). Orientation selectivity in macaque V1:
diversity and laminar dependence. The Journal of neuroscience, 22(13), 5639-5651.
[17] Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation.
Nature Reviews Neuroscience, 13(1), 51-62.
[18] Rivera-Rubio, J., Alexiou, I. & Bharath, A. A. (2015). Indoor Localisation with Regression
Networks and Place Cell Models. In Proceedings of the British Machine Vision Conference (pp.
147.1-147.12).
[19] Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the
details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531.
[20] Hassabis, D. (2011). Systems neuroscience and AGI. In Winter Intelligence Conference.
REFERENCES 3/4
[21] Dean, T. L., Corrado, G., & Shlens, J. (2012, July). Three Controversial Hypotheses Concerning
Computation in the Primate Cortex. In AAAI.
[22] Cone, N. E., Burman, D. D., Bitan, T., Bolger, D. J., & Booth, J. R. (2008). Developmental
changes in brain regions involved in phonological and orthographic processing during spoken
language processing. Neuroimage, 41(2), 623-635.
[23] Srivastava, N., & Salakhutdinov, R. R. (2012). Multimodal learning with deep boltzmann
machines. In Advances in neural information processing systems (pp. 2222-2230).
[24] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed
representations of words and phrases and their compositionality. In Advances in neural
information processing systems (pp. 3111-3119).
[25] Kiros, R., Salakhutdinov, R., & Zemel, R. S. (2014). Unifying visual-semantic embeddings with
multimodal neural language models. arXiv preprint arXiv:1411.2539.
[26] Roe, A. W., Pallas, S. L., Kwon, Y. H., & Sur, M. (1992). Visual projections routed to the auditory
pathway in ferrets: receptive fields of visual neurons in primary auditory cortex. The Journal of
neuroscience, 12(9), 3651-3664.
[27] Warwick, K., Gasson, M., Hutt, B., & Goodhew, I. (2005, October). An attempt to extend
human sensory capabilities by means of implant technology. In Systems, Man and Cybernetics,
2005 IEEE International Conference on (Vol. 2, pp. 1663-1668). IEEE.
[28] Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R. S., Torralba, A., Urtasun, R., & Fidler, S. (2015).
Skip-thought vectors. arXiv preprint arXiv:1506.06726.
[29] Hinton, G. E., Dayan, P., Frey, B. J., & Neal, R. M. (1995). The "wake-sleep" algorithm for
unsupervised neural networks. Science, 268(5214), 1158-1161.
REFERENCES 4/4
[30] Bengio, Y. (2014). How auto-encoders could provide credit assignment in deep networks via
target propagation. arXiv preprint arXiv:1407.7906.
[31] Bengio, Y. (2009). Learning deep architectures for AI. Foundations and trends® in Machine
Learning, 2(1), 1-127.
[32] Granger, R. (2006). Engines of the brain: The computational instruction set of human
cognition. AI Magazine, 27(2), 15.
[33] Hung, C. P., Kreiman, G., Poggio, T., & DiCarlo, J. J. (2005). Fast readout of object identity
from macaque inferior temporal cortex. Science, 310(5749), 863-866.
[34] Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant visual
representation by single neurons in the human brain. Nature, 435(7045), 1102-1107.
[35] Ritchie, J. B., Tovar, D. A., & Carlson, T. A. (2015). Emerging Object Representations in the
Visual System Predict Reaction Times for Categorization. PLoS computational biology, 11(6).
[36] Mansimov, E., Parisotto, E., Ba, J. L., & Salakhutdinov, R. (2015). Generating Images from
Captions with Attention. arXiv preprint arXiv:1511.02793.
[37] Gülçehre, Ç., & Bengio, Y. (2013). Knowledge matters: Importance of prior information for
optimization. arXiv preprint arXiv:1301.4083.
[38] Gal, Y., & Ghahramani, Z. (2015). Dropout as a Bayesian approximation: Representing model
uncertainty in deep learning. arXiv preprint arXiv:1506.02142.
[39] Damianou, A. C., & Lawrence, N. D. (2012). Deep gaussian processes. arXiv preprint
arXiv:1211.0358.