NATURE VS. NURTURE: BIOLOGICAL PARALLELS TO DEEP LEARNING
Kai Arulkumaran, Imperial College London, 2015-11-13, @KaiLashArul

OVERVIEW
Deep or Learning?
Biological Parallels
Black Boxes

DEEP OR LEARNING?

DEEP LEARNING'S PROMINENCE
"Unreasonably effective"
ImageNet classification (Krizhevsky et al., 2012)
Transition from hand-engineered to learned features
Discovering higher-level abstract features

HISTORY: BIOLOGICAL MOTIVATIONS
Artificial Neural Networks: neurons, synapses, synaptic weights, activations
First thoughts on flying machines: flapping wings
First practical flying machines: fixed-wing gliders
What are the relevant (biological) principles?
ANNs emit "average firing rates" ⇒ rate coding
SNNs emit spikes ⇒ temporal coding

REPRESENTATION LEARNING
a.k.a. feature learning (Bengio et al., 2013)
DL priors are distribution and depth
Distinguishable regions and model parameters
Local representations: #regions ≈ #params
Distributed representations: #regions ≈ e^#params
Depth allows hierarchical organisation
Joint learning is ideal, but depth still confers benefits

CORTICAL LAYERS
Cerebral cortex has six main layers
"Cortical columns"
Drawings of cortical layers (Cajal, 1899)

IMITATING THE VISUAL CORTEX
Simple and complex cells in V1 (Hubel & Wiesel, 1959)
Visualising receptive fields (Hyvärinen et al., 2009)
Hierarchical Fourier-based operators (Granlund, 1978)
Neocognitron: S-cells and C-cells (Fukushima, 1980)

LEARNING VISUAL FEATURES
CNN layer 1 filters (Krizhevsky et al., 2012)
Resemble receptive fields in the visual cortex!
Same with sparse coding, k-means etc. (Memisevic, 2015)
Natural image statistics
(see the filter-learning sketch at the end of this part)

INDEPENDENT COMPONENTS ANALYSIS
ICA (Hyvärinen et al., 2009)

SPARSE CODING
Sparse coding (Olshausen & Field, 1997)

K-MEANS
k-means (Memisevic, 2015)

CELLULAR NEURAL NETWORKS
One cell in analog hardware (Chua & Yang, 1988)
conv(input) + conv(recurrent) → nonlin(tanh)
Pixel input includes its neighbourhood, like an MRF
Each "cell" (pixel) has "dendrites"
(see the cell-update sketch at the end of this part)

BIOLOGICALLY INSPIRED
[Cortexica/BICV architecture diagram: Image → Convolution (C) → Nonlinearity → Pooling → Divisive Normalisation → SI Descriptors, with orientation-adapted pooling stages]
2x conv → pool → nonlin → feat-pool → norm
Internal feedback (gated connections)

REAL-TIME TRACKING
Virgin Marathon 2010

ONE MODULE 1/3
Convolution + pooling filters
Convolution (parameterised Gabor-like wavelets)
Retinotopic mapping (Wandell, 1995; Bharath & Ng, 2005)
Average pooling (subsampling)
Receptive field support increases (Foster et al., 1985)

ONE MODULE 2/3
Nonlinearity (absolute value function)
A nonlinearity is needed to provide invariance in complex cells, given inputs from simple cells
Achieved via rectification functions (Hubel & Wiesel, 1962)

ONE MODULE 3/3
Feature space pooling (Granlund & Knutsson, 1994)
Double-angle mapping (K → 2): ∑_{k=0}^{K/2−1} |f_k^{(ℓ)}(m, n)| e^{j2φ_k}
Orientation dominance in V1 cells (Ringach et al., 2002)
Divisive normalisation
Performed across the brain (Carandini & Heeger, 2012)
(see the module sketch at the end of this part)

"TRADITIONAL" CV PIPELINE
Appearance-based localisation (Rivera-Rubio et al., 2015)
Real-time appearance-based indoor localisation
Trained and tested on low-resolution images
Data augmentation is not just for DL (Chatfield et al., 2014)

EVEN SO...
We can hand-engineer low-level features OK
But what about high-level features?
Deep learning tackles perceptual information
Good Old Fashioned AI tackles symbolic information
Can we bridge the conceptual gap (Hassabis, 2011)?
Let's look at the brain...
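A minimal sketch of the filter-learning point above: ICA (like sparse coding or k-means) applied to natural image patches recovers Gabor-like receptive fields. The stock image, patch size and number of components below are illustrative choices, not values from the talk.

import numpy as np
from sklearn.datasets import load_sample_image
from sklearn.decomposition import FastICA
from sklearn.feature_extraction.image import extract_patches_2d

# Sample natural image patches (12x12 is an arbitrary choice)
image = load_sample_image("china.jpg").mean(axis=2)  # greyscale
patches = extract_patches_2d(image, (12, 12), max_patches=20000, random_state=0)
X = patches.reshape(len(patches), -1)
X = X - X.mean(axis=1, keepdims=True)  # remove the per-patch DC component

# ICA on whitened patches yields oriented, localised, band-pass filters
ica = FastICA(n_components=64, random_state=0, max_iter=500)
ica.fit(X)
filters = ica.components_.reshape(-1, 12, 12)

Plotting `filters` should show the oriented, Gabor-like structures that resemble the V1 receptive fields on the slides.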
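A sketch of the cellular neural network cell update summarised above as conv(input) + conv(recurrent) → nonlin(tanh). The templates A and B, the bias z and the Euler step size are hypothetical; Chua & Yang's original analog cell uses a piecewise-linear saturation rather than tanh.

import numpy as np
from scipy.signal import convolve2d

def cnn_cell_step(x, u, A, B, z=0.0, dt=0.1):
    """One Euler step of dx/dt = -x + A * f(x) + B * u + z over the pixel grid."""
    y = np.tanh(x)  # output nonlinearity, as summarised on the slide
    dx = -x + convolve2d(y, A, mode="same") + convolve2d(u, B, mode="same") + z
    return x + dt * dx

Here u is the fixed input image, and the small templates A and B couple each cell only to its neighbourhood, which gives the MRF-like locality noted on the slide.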
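A sketch of one module of the biologically inspired pipeline above: Gabor-like convolution, absolute-value rectification, average pooling, double-angle feature pooling and divisive normalisation. The Gabor parameterisation, number of orientations and epsilon are assumptions for illustration, not the Cortexica/BICV values.

import numpy as np
from scipy.signal import convolve2d

def gabor(theta, size=9, sigma=2.0, freq=0.25):
    """A real Gabor filter at orientation theta (an assumed parameterisation)."""
    r = np.arange(size) - size // 2
    yy, xx = np.meshgrid(r, r, indexing="ij")
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def module(image, K=8, eps=1e-3):
    thetas = np.pi * np.arange(K) / K
    # Simple cells: convolution, then rectification via the absolute value
    responses = [np.abs(convolve2d(image, gabor(t), mode="same")) for t in thetas]
    # Average pooling (2x2 subsampling) grows receptive field support
    pooled = [r[::2, ::2] for r in responses]
    # Double-angle feature pooling: sum_k |f_k| * exp(j * 2 * phi_k)
    z = sum(p * np.exp(2j * t) for p, t in zip(pooled, thetas))
    # Divisive normalisation by the total local orientation energy
    energy = sum(pooled)
    return z / (eps + energy)

The double-angle sum maps the K orientation channels to a single complex (two-channel) image, matching the K → 2 mapping on the slide.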
BIOLOGICAL PARALLELS

3 CONTROVERSIAL HYPOTHESES...
...for computation in the primate cortex (Dean et al., 2012)
Modular Minds Hypothesis
Single Algorithm Hypothesis
Scalable Cortex Hypothesis

MODULAR MINDS
Visual cortex, auditory cortex etc.
Evidence from neurophysiology, neuroimaging etc.
Specialism is fine, but multimodality ⟹ intelligence(?)
Psychology: auditory and visual word forms
As humans develop, we increasingly engage in crossmodal tasks (Cone et al., 2008)
Acts as a regulariser (Srivastava & Salakhutdinov, 2012)

ANALOGIES
A very human trait
word2vec: woman is to cat as man is to...
...dog (Mikolov et al., 2013)
Visual/word analogies (Kiros et al., 2014)
(see the analogy sketch before the references)

SINGLE ALGORITHM
Neocortex has a homogeneous high-level structure
Optic nerve rerouted to auditory cortex (Roe et al., 1992)
Can learn to use ultrasonic sensors (Warwick et al., 2005)
Is the algorithm backpropagation (or BPTT)?

BACKPROPAGATION
Is it biologically plausible?
Hinton likes "thought vectors" (Kiros et al., 2015)
The same was said for "wake-sleep" (Hinton et al., 1995), etc.
Bengio is working on target propagation (Bengio, 2014)

SCALABLE CORTEX
Bigger (deeper) is better (Bengio, 2009)
How do we compare against other primates?
Differences depend on the area, but are usually small (Dean et al., 2012)
Most noticeable in language areas (Granger, 2006)
Neuroscientists now compare against NNs

OBJECT CATEGORISATION
Complex cells are phase- and ~translation-invariant
Inferior Temporal Cortex categorises (Hung et al., 2005)
View-invariant representations (Quiroga et al., 2005)
Humans perform categorisation when activations in ITC settle and become more stable (Ritchie et al., 2015)

ATTENTION
The brain has finite computational power
Selective attention during information processing
The system naturally has a bottleneck (Anderson, 2004)
Visual attention is old (Schmidhuber & Huber, 1991)
Many more applications than that!
Caption-to-image (Mansimov et al., 2015)
(see the attention sketch before the references)

VANISHING GRADIENTS
Vanishing or exploding gradients (Hochreiter, 1991)
Normalisation to the rescue?
Canonical computation (Carandini & Heeger, 2012)
Batch Norm (can) work well (Ioffe & Szegedy, 2015)
Divisive normalisation in biology
Used for sparsity (Gülçehre & Bengio, 2013)
(see the normalisation sketch before the references)

PARALLELS ARE AFFIRMING
Not doing things the same as everyone else
But finding the same things independently
One more neuroscience case study...

BLACK BOXES

UNDERSTANDING NNS
Human Brain Project
Aim: simulate a whole human brain by 2023
Simulating a brain ≠ intelligence(?)
Model uncertainty (Gal & Ghahramani, 2015)
Deep Gaussian Processes (Damianou & Lawrence, 2012)
(see the Monte Carlo dropout sketch before the references)

REPRODUCIBLE RESEARCH
The "long tail" of science
The DL community is great at sharing information
Code goes on repos
Datasets are stored as files/in databases
But every unrecorded experiment loses data
Enter FGLab: https://kaixhin.github.io/FGLab/

FGLAB 1/2
Client-server machine learning dashboard
Node.js + MongoDB or Docker
Web GUI and API
Command-line inputs

FGLAB 2/2
Structured/unstructured file outputs
Save what you want
Compare results
Data is saved

QUESTIONS
How can we further combine depth and learning?
Can we extract the relevant principles from biology?
Do we really understand our models?

THANKS
Anil Bharath (many biological references)
Nerhun Yildiz (Cellular Neural Networks)
Marta Garnelo (sanity check)
You (for listening?)
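A toy sketch of the vector-offset analogy from the ANALOGIES slide. The 3-D embeddings below are invented for illustration; real word2vec vectors have hundreds of dimensions.

import numpy as np

emb = {  # hypothetical embeddings chosen so the analogy works out
    "woman": np.array([0.9, 0.1, 0.3]),
    "man":   np.array([0.9, 0.1, -0.3]),
    "cat":   np.array([0.1, 0.8, 0.3]),
    "dog":   np.array([0.1, 0.8, -0.3]),
}

def analogy(a, b, c):
    """Return the word closest (by cosine) to b - a + c, excluding the inputs."""
    q = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in emb if w not in {a, b, c}), key=lambda w: cos(emb[w], q))

print(analogy("woman", "cat", "man"))  # -> "dog" with these toy vectors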
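A sketch of the soft-attention bottleneck idea from the ATTENTION slide: a query scores a set of locations and a softmax turns the scores into a weighted average. This is a generic formulation with arbitrary dimensions, not the specific model of Mansimov et al.

import numpy as np

def soft_attention(query, keys, values):
    scores = keys @ query                 # one relevance score per location
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax: a soft information bottleneck
    return weights @ values               # attended context vector

rng = np.random.default_rng(0)
ctx = soft_attention(rng.normal(size=4), rng.normal(size=(10, 4)), rng.normal(size=(10, 8)))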
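A sketch contrasting the two normalisations on the VANISHING GRADIENTS slide: divisive normalisation across channels (the "canonical computation") and batch normalisation across examples. The epsilon values and the choice of normalisation pool are illustrative assumptions.

import numpy as np

def divisive_norm(x, eps=1e-3):
    """Divide each channel by the summed activity over channels (axis 0)."""
    return x / (eps + np.abs(x).sum(axis=0, keepdims=True))

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardise each feature by its batch statistics (axis 0 = batch)."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

Both keep activations in a well-scaled range, which is one way of reading the biological parallel on the slide.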
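A sketch of model uncertainty via Monte Carlo dropout (Gal & Ghahramani, 2015), as cited on the UNDERSTANDING NNS slide: keep dropout active at test time and read uncertainty off the spread of stochastic forward passes. The tiny random two-layer network is invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(16, 1)), rng.normal(size=(1, 16))

def forward(x, p=0.5):
    h = np.maximum(0.0, W1 @ x)           # ReLU hidden layer
    mask = rng.random(h.shape) < (1 - p)  # dropout stays on at test time
    return (W2 @ (h * mask / (1 - p))).item()

samples = [forward(np.array([[0.3]])) for _ in range(100)]
print(np.mean(samples), np.std(samples))  # predictive mean and uncertainty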
REFERENCES 1/4
[1] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
[2] Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
[3] y Cajal, S. R. (1899). Comparative study of the sensory areas of the human cortex.
[4] Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 148(3), 574-591.
[5] Hyvärinen, A., Hurri, J., & Hoyer, P. O. (2009). Natural Image Statistics: A Probabilistic Approach to Early Computational Vision (Vol. 39). Springer Science & Business Media.
[6] Granlund, G. H. (1978). In search of a general picture processing operator. Computer Graphics and Image Processing, 8(2), 155-173.
[7] Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193-202.
[8] Memisevic, R. (2015). Visual features: From Fourier to Gabor. Deep Learning Summer School, Montreal 2015.
[9] Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1?. Vision Research, 37(23), 3311-3325.
[10] Chua, L. O., & Yang, L. (1988). Cellular neural network: Theory. IEEE Transactions on Circuits and Systems, 35, 1257-1272.

REFERENCES 2/4
[11] Wandell, B. A. (1995). Foundations of Vision. Sinauer Associates.
[12] Bharath, A. A., & Ng, J. (2005). A steerable complex wavelet construction and its application to image denoising. IEEE Transactions on Image Processing, 14(7), 948-959.
[13] Foster, K. H., Gaska, J. P., Nagler, M., & Pollen, D. A. (1985). Spatial and temporal frequency selectivity of neurones in visual cortical areas V1 and V2 of the macaque monkey. The Journal of Physiology, 365(1), 331-363.
[14] Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1), 106.
[15] Granlund, G. H., & Knutsson, H. (1994). Signal Processing for Computer Vision. Springer Science & Business Media.
[16] Ringach, D. L., Shapley, R. M., & Hawken, M. J. (2002). Orientation selectivity in macaque V1: diversity and laminar dependence. The Journal of Neuroscience, 22(13), 5639-5651.
[17] Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1), 51-62.
[18] Rivera-Rubio, J., Alexiou, I., & Bharath, A. A. (2015). Indoor localisation with regression networks and place cell models. In Proceedings of the British Machine Vision Conference (pp. 147.1-147.12).
[19] Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531.
[20] Hassabis, D. (2011). Systems neuroscience and AGI. In Winter Intelligence Conference.

REFERENCES 3/4
[21] Dean, T. L., Corrado, G., & Shlens, J. (2012). Three controversial hypotheses concerning computation in the primate cortex. In AAAI.
[22] Cone, N. E., Burman, D. D., Bitan, T., Bolger, D. J., & Booth, J. R. (2008). Developmental changes in brain regions involved in phonological and orthographic processing during spoken language processing. NeuroImage, 41(2), 623-635.
[23] Srivastava, N., & Salakhutdinov, R. R. (2012). Multimodal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems (pp. 2222-2230).
[24] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111-3119).
[25] Kiros, R., Salakhutdinov, R., & Zemel, R. S. (2014). Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539.
[26] Roe, A. W., Pallas, S. L., Kwon, Y. H., & Sur, M. (1992). Visual projections routed to the auditory pathway in ferrets: receptive fields of visual neurons in primary auditory cortex. The Journal of Neuroscience, 12(9), 3651-3664.
[27] Warwick, K., Gasson, M., Hutt, B., & Goodhew, I. (2005). An attempt to extend human sensory capabilities by means of implant technology. In IEEE International Conference on Systems, Man and Cybernetics (Vol. 2, pp. 1663-1668). IEEE.
[28] Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R. S., Torralba, A., Urtasun, R., & Fidler, S. (2015). Skip-thought vectors. arXiv preprint arXiv:1506.06726.
[29] Hinton, G. E., Dayan, P., Frey, B. J., & Neal, R. M. (1995). The "wake-sleep" algorithm for unsupervised neural networks. Science, 268(5214), 1158-1161.

REFERENCES 4/4
[30] Bengio, Y. (2014). How auto-encoders could provide credit assignment in deep networks via target propagation. arXiv preprint arXiv:1407.7906.
[31] Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1-127.
[32] Granger, R. (2006). Engines of the brain: The computational instruction set of human cognition. AI Magazine, 27(2), 15.
[33] Hung, C. P., Kreiman, G., Poggio, T., & DiCarlo, J. J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science, 310(5749), 863-866.
[34] Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant visual representation by single neurons in the human brain. Nature, 435(7045), 1102-1107.
[35] Ritchie, J. B., Tovar, D. A., & Carlson, T. A. (2015). Emerging object representations in the visual system predict reaction times for categorization. PLoS Computational Biology, 11(6).
[36] Mansimov, E., Parisotto, E., Ba, J. L., & Salakhutdinov, R. (2015). Generating images from captions with attention. arXiv preprint arXiv:1511.02793.
[37] Gülçehre, Ç., & Bengio, Y. (2013). Knowledge matters: Importance of prior information for optimization. arXiv preprint arXiv:1301.4083.
[38] Gal, Y., & Ghahramani, Z. (2015). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. arXiv preprint arXiv:1506.02142.
[39] Damianou, A. C., & Lawrence, N. D. (2012). Deep Gaussian processes. arXiv preprint arXiv:1211.0358.