Deep Learning with Symbols
Daniel L. Silver
Intelligent Information Technology Research Lab, Acadia University, Wolfville, NS, Canada
HL NeSy Seminar, Dagstuhl, April 2017

Introduction
- Collaborators: Shameer Iqbal and Ahmed Galilia, Intelligent Information Technology Research Lab, Acadia University, Canada.

Motivation
- Humans are multimodal learners: we are able to associate one modality with another.
- Conjecture: knowing a concept has a lot to do with the fusion of sensory/motor channels.

Motivation
- Further conjecture: symbols allow us to share complex concepts quickly and concisely.
  - A human communications tool.
  - A coarse approximation of a noisy concept, e.g. the written digits 0 1 2 3 4 5 6 7 8 9 and their sounds: "one", "two", ...
- Symbols also help us to escape local minima when learning.

Objective 1
- To develop a multimodal system:
  - A generative deep learning architecture.
  - Trained using unsupervised algorithms.
  - Scales linearly in the number of channels.
  - Can reconstruct missing modalities.
- Train and test it on the digits 0-9 over four channels: image, audio, motor, and symbolic classification.

Background: Deep Belief Networks
- Stacked auto-encoders develop a rich feature space from unlabelled examples using unsupervised algorithms. [Source: Caner Hazibas, SlideShare]
- RBM = Restricted Boltzmann Machine; a DBN is built by stacking RBMs, each trained on the hidden activations of the layer below (a minimal training sketch follows).
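As background for the DBN training referred to above, here is a minimal sketch of contrastive divergence (CD-1), the standard unsupervised update for a single RBM. It is illustrative only: the layer sizes, learning rate, and names (`n_visible`, `n_hidden`, `cd1_step`) are assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_v, b_h, lr=0.01):
    """One contrastive-divergence (CD-1) update for a single RBM.

    v0: batch of visible vectors, shape (batch, n_visible).
    Returns the updated (W, b_v, b_h).
    """
    # Positive phase: hidden activations driven by the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step gives a reconstruction v1.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # CD-1 gradient estimate: <v0 h0> - <v1 h1>.
    W = W + lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_v = b_v + lr * (v0 - p_v1).mean(axis=0)
    b_h = b_h + lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h

# Example: an RBM over a 784-unit image channel with 256 hidden units (assumed sizes).
n_visible, n_hidden = 784, 256
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
batch = rng.random((32, n_visible))   # stand-in for real image data
W, b_v, b_h = cd1_step(batch, W, b_v, b_h)
```

Stacking RBMs trained this way, each on the hidden activations of the one below, yields the DBN feature hierarchy used for each channel.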
Background: Multimodal Learning
- The MML approach has been adopted by several deep learning researchers: (Srivastava and Salakhutdinov 2012), (Ngiam et al. 2011), (Socher et al. 2014), (Kiros, Salakhutdinov, and Zemel 2014), (Karpathy and Fei-Fei 2015).
- However, these systems tend to associate only two modalities, and the association layer is fine-tuned using supervised techniques such as back-propagation.

Background: Problem Refinement
[Figure slides illustrating the multimodal association architecture.]

Background: Problem Refinement #1
- Supervised fine-tuning does not scale well to three or more modalities: one must fine-tune all possible input-output modality combinations.
- The number of configurations grows exponentially as 2^n - 2, where n is the number of channels.
- Example: n = 3 channels gives 2^3 - 2 = 6 configurations; n = 4 gives 2^4 - 2 = 14; n = 5 gives 2^5 - 2 = 30.

Background: Problem Refinement #2
- Standard unsupervised learning using RBM approaches yields poor reconstruction.
- A channel that provides a simple, noise-free signal will dominate the other channels at the associative layer, making it difficult for another channel to generate the correct features there.

Theory and Approach: Network Architecture
- The classification channel provides a concise symbolic representation of the associative memory (AM).

Theory and Approach: RBM Training of the DBN Stack
- Each channel's DBN stack is trained bottom-up with RBMs. [Figure: the digit 8 on the image channel and the spoken word "eight" on the audio channel.]

Theory and Approach: Fine-Tuning with Iterative Back-Fitting
- Create and save the associative code h_j; split each associative-layer weight into a recognition weight w_r and a generative weight w_g.
- Update the recognition weights by Δw_r = ε(⟨v_i h_j⟩ − ⟨v_i′ h_j′⟩), which minimizes the reconstruction error Σ_k Σ_m (v_i − v_i′)², where v_i′ is the reconstruction of input v_i and h_j′ is the code generated from that reconstruction.
- Create a new h_j′ and repeat (a sketch of one update follows).
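To make the back-fitting update rule concrete, here is a minimal sketch of one step for one channel, under stated assumptions: the names (`W_r`, `W_g`, `h_target`), the sizes, and the learning rate are illustrative, and holding the generative weights fixed during the step is my reading of the weight split, not an implementation from the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backfit_step(v, h_target, W_r, W_g, lr=0.01):
    """One iterative back-fitting update for one channel.

    v:        channel input at the associative layer, shape (batch, n_v).
    h_target: saved associative code h_j, created from all channels.
    W_r:      recognition weights (channel -> associative layer), updated.
    W_g:      generative weights (associative layer -> channel), held fixed.
    Implements Delta w_r = lr * (<v h> - <v' h'>), which drives down the
    reconstruction error sum over (v - v')^2.
    """
    # Generate the channel reconstruction v' from the saved code.
    v_prime = sigmoid(h_target @ W_g.T)
    # A recognition pass on the reconstruction gives the new code h'.
    h_prime = sigmoid(v_prime @ W_r)
    # Contrast data-driven and reconstruction-driven statistics.
    W_r = W_r + lr * (v.T @ h_target - v_prime.T @ h_prime) / len(v)
    err = np.sum((v - v_prime) ** 2)
    return W_r, err

# Example shapes (assumed): a 784-unit channel into a 512-unit associative layer.
rng = np.random.default_rng(0)
W_r = rng.normal(0.0, 0.01, (784, 512))
W_g = W_r.copy()                 # recognition/generative weights split from one RBM weight
v = rng.random((16, 784))
h = sigmoid(v @ W_r)             # stand-in for the saved code h_j
W_r, err = backfit_step(v, h, W_r, W_g)
```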
Empirical Studies: Data and Method
- 10 repetitions x 20 male Canadian students = 2,000 examples in total, each with four channels:
  - Handwritten digit image (0-9).
  - Audio recording.
  - Vector of noisy motor coordinates.
  - Classification.
- 100 examples per student x 20 students; 10-fold cross-validation with 18 subjects in the training set (1,800 examples) and 2 subjects in the test set (200 examples).
- DEMO (Deep Learning and LML): http://ml3cpu.acadiau.ca [Iqbal and Silver, in press].

Empirical Studies: Evaluation
- Examine the reconstruction error of each channel given input on another channel.
- The error measure differs for each channel:
  - Class: misclassification error.
  - Image: agreement with an ANN classifier (99% accuracy).
  - Audio: the STFT signal cannot be reversed to create sound, so we measure agreement with an RF classifier (93% accuracy).
  - Motor: error = distance to the target vector template (anything below 2.2 is human readable).

Empirical Studies: Results
[Figures: reconstruction of the classification, image, and motor channels.]

Empirical Studies: Discussion
- Elimination of channel dominance is not perfect, but significantly better.
- The reconstruction error of a missing channel decreases as the number of available channels increases.
- NOTE: the classification channel is not strictly needed; it was introduced to clarify the concept in the associative memory, acting as a symbol for a noisy concept.

Objective 2
- To show that learning with symbols is easier and more accurate than learning without them:
  - A deep supervised learning architecture.
  - Develop a model to add two MNIST digits, with and without symbolic inputs.
  - Test on previously unseen examples.
- To examine what is happening in the network: is it learning addition, or just a mapping function?

Challenge in Training Deep Architectures
- Many tricks are used to overcome local minima; most are a form of inductive bias that favours portions of weight space where good solutions tend to be found.

Two Learners Are Better Than One!
- Consider that you are in the jungle, learning concepts; then you meet another person.
- You share symbols: accuracy improves and the learning rate increases.

Challenge in Training Deep Architectures
- A single learner is hampered by the presence of local minima within its representational space; overcoming this difficulty requires many training examples.
- Instead, an agent's learning effectiveness can be significantly improved with symbols, which inspires social interaction and the development of culture (Bengio 2013).

Empirical Studies: Learning to Add MNIST Digits
- Implemented in Google TensorFlow, with 1-3 hidden layers of ReLU units.
- Noisy (without symbols): input = 2 MNIST digit images (784 x 2 values); output = 2 MNIST digit images (784 x 2 values).
- With binary symbolic values for each digit: input = 1568 (images) + 10 + 10 (symbols) values; output = 1568 + 10 + 10 values.

DL Model Without Symbols / DL Model With Symbols
[Figures: the two network architectures; a sketch of the "with symbols" model follows.]
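Here is a minimal Keras sketch of the "with symbols" model, assuming only the sizes stated above (1568 + 10 + 10 inputs and outputs) and a single hidden layer; the hidden width, activations, optimizer, and loss choices are illustrative assumptions, not settings reported in the talk.

```python
import tensorflow as tf

# Inputs: two flattened 28x28 MNIST images (1568 values) plus a
# one-hot symbol (10 values) for each of the two input digits.
inputs = tf.keras.Input(shape=(1568 + 10 + 10,))

# The talk uses 1-3 hidden layers of ReLU units; one layer of 512
# units is an assumed configuration.
h = tf.keras.layers.Dense(512, activation="relu")(inputs)

# Outputs mirror the inputs: two digit images of the sum (e.g. "1"
# and "5" for 7 + 8) plus a one-hot symbol for each output digit.
img_out = tf.keras.layers.Dense(1568, activation="sigmoid", name="images")(h)
sym_out = tf.keras.layers.Dense(20, activation="sigmoid", name="symbols")(h)

model = tf.keras.Model(inputs, [img_out, sym_out])
model.compile(optimizer="adam",
              loss={"images": "mse", "symbols": "binary_crossentropy"})
model.summary()
```

The "without symbols" variant is the same sketch with the two 10-value symbol blocks dropped from both the input and the output.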
Most Recent Results
[Figures: output digits without symbols vs. with symbols.]

Discussion
- Results improved by about 10% with symbolic outputs (based on classification of the output digits by a highly accurate convolutional network).
- We believe we can do much better; the lab is working on:
  - Different architectures.
  - Varying the number of training examples that include symbols.
  - Interpreting hidden-node features.

Thank You! Questions?
https://ml3cpu.acadiau.ca/
[email protected]
http://tinyurl/dsilver

References
- Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning. Now Publishers.
- Bengio, Y. and LeCun, Y. (2007). Scaling learning algorithms towards AI. In L. Bottou, O. Chapelle, D. DeCoste, and J. Weston (eds.), Large Scale Kernel Machines. MIT Press.
- Bengio, Y. (2013). Evolving culture vs local minima. arXiv:1203.2990v1. http://arxiv.org/abs/1203.2990
- Goodfellow, I., Bengio, Y., and Courville, A. (2017). Deep Learning. MIT Press, Cambridge, MA.