A Scalable Unsupervised Deep Multimodal Learning System
Mohammed Shameer Iqbal, Daniel L. Silver
Acadia University, Wolfville, NS, Canada
FLAIRS-2016, Key Largo, Florida, USA – May 17, 2016
Intelligent Information Technology Research Lab, Acadia University, Canada

Introduction
Shameer Iqbal, MSc

Motivation
- Humans are multimodal learners
- We are able to associate one modality with another
- Conjecture: knowing a concept has a lot to do with the fusion of sensory/motor channels

Objective
To develop a multimodal system:
- A generative deep learning architecture
- Trained using unsupervised algorithms
- That scales linearly in the number of channels
- That can reconstruct missing modalities
- Trained and tested on digits 0-9
- Four channels: Image, Audio, Motor, Classification

Background: Deep Belief Networks

Background: Multimodal Learning
- The multimodal learning (MML) approach has been adopted by several deep learning researchers: (Srivastava and Salakhutdinov 2012), (Ngiam et al. 2011), (Socher et al. 2014), (Kiros, Salakhutdinov, and Zemel 2014), (Karpathy and Fei-Fei 2015)
- However, these systems tend to associate only 2 modalities
- The association layer is fine-tuned using supervised techniques such as back-propagation

Background: Problem Refinement #1
- Supervised fine-tuning does not scale well to three or more modalities
- Most approaches fine-tune all possible input-output modality combinations
- The number of combinations grows exponentially, as $2^n - 2$, where $n$ is the number of channels
- Example: $n = 3$ channels gives 6 configurations
  - $2^3 - 2 = 6$
  - $2^4 - 2 = 14$
  - $2^5 - 2 = 30$

Background: Problem Refinement #2
- Standard unsupervised learning using Deep Belief Network approaches yields poor reconstruction
- A channel that provides a simple, noise-free signal will dominate the other channels at the associative layer
- This makes it difficult for a channel to generate correct features at the associative layer

Theory and Approach: Network Architecture
- Example: "0" = [(5,1), (3,4), (4,8), (5,10), (7,8), (7,4), (5,1)]

Theory and Approach: RBM training of DBN Stack
(Figure: DBN stack with example inputs "8" and "eight")

Theory and Approach: Fine-tuning with Iterative Back-fitting
- Create and save $h_i$
- Split weights into $w_r$ and $w_g$
- Update weights: $\Delta w_r = \epsilon(\langle v_i h_j \rangle - \langle v_i' h_j' \rangle)$ … minimize $\sum_k \sum_m (v_i - v_i')^2$
- Create new $h_j'$

Empirical Studies: Objective
- To develop a 4-channel MM DLA that is able to reconstruct all channels given only one

Empirical Studies: Data and Method
- 10 reps x 20 male Canadian students, 2000 examples in total:
  - Handwritten digits 0-9
  - Audio recordings
  - Vector of noisy motor coordinates
  - Classifications
- 100 examples per student x 20 students
- Conducted 10-fold cross validation:
  - 18 subjects in training set (1800 examples)
  - 2 subjects in test set (200 examples)
- Demo

Empirical Studies: Data and Method (Evaluation)
- Examine the reconstruction error of a channel given input on another channel
- The error measure differs for each channel:
  - Class: misclassification error
  - Image: agreement with an ANN classifier (99% acc)
  - Audio: the STFT signal is not reversible, so sound cannot be recreated; agreement with an RF classifier (93% acc) is used instead
  - Motor: error = distance to target vector template (anything < 2.2 is human readable)

Empirical Studies: Results
- Reconstruction of Classification Channel
- Reconstruction of Image Channel
- Reconstruction of Motor Channel

Empirical Studies: Discussion
- Elimination of channel dominance is not perfect, but significantly better
- Additional experiments have shown that:
  - The classification channel is not needed
  - Reconstruction error of a missing channel decreases as the number of available channels increases

Conclusion
An unsupervised MM DLA with a shared associative layer and back-fitting is able to overcome the two challenges posed:
- Able to accurately reconstruct missing channel values given input on one
- Generates an activation in the associative layer useful for all modalities
- Scales linearly in the number of channels

Future work
- It is rare that all sensory/motor channels receive correlating input
  - Modify the algorithm to train with missing modal values
- Propose to iterate between RBM and back-fitting algorithms
  - Allow channel stacks and the associative layer to develop weights that better consider both within-channel reconstruction and cross-channel reconstruction
- Move to a parallel computing architecture to decrease experimentation time

Thank You! Questions?
https://ml3cpu.acadiau.ca/
[email protected]
http://tinyurl/dsilver
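Supplementary sketch for Problem Refinement #1: the slide gives the count of input-output modality combinations as 2^n - 2. The Python below enumerates those combinations under one assumed reading: every non-empty proper subset of channels serves as input and the remaining channels serve as output. The function names are hypothetical and are not taken from the authors' code.

```python
from itertools import combinations


def io_configurations(channels):
    """List every split of `channels` into non-empty inputs and non-empty outputs.

    Assumption: a configuration is a non-empty proper subset of channels used
    as input, with all remaining channels treated as output.
    """
    configs = []
    for k in range(1, len(channels)):  # input sizes 1 .. n-1
        for inputs in combinations(channels, k):
            outputs = tuple(c for c in channels if c not in inputs)
            configs.append((inputs, outputs))
    return configs


def count_configurations(n):
    """Closed form quoted on the slide: 2^n - 2."""
    return 2 ** n - 2


three = io_configurations(["image", "audio", "motor"])
four = io_configurations(["image", "audio", "motor", "class"])
```

For the four channels used in the talk this enumeration has 14 entries, whereas a design that adds one channel stack per modality grows only with n.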
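Supplementary sketch for the back-fitting slides: the update shown there is Δw_r = ε(⟨v_i h_j⟩ - ⟨v_i' h_j'⟩). The Python below is only one plausible reading of a single step for one training example, not the authors' implementation. It assumes w_r is used for the upward pass and w_g for the downward pass, uses sigmoid units without biases or sampling, and replaces the averages ⟨·⟩ with single-example products. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def up(v, w):
    """Hidden activations h_j from visible values v_i through weights w[i][j]."""
    return [sigmoid(sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(w[0]))]


def down(h, w):
    """Reconstructed visible values v_i' from hidden h_j through weights w[i][j]."""
    return [sigmoid(sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(w))]


def backfit_step(v, w_r, w_g, eps=0.1):
    """One assumed back-fitting step: returns updated w_r and squared error."""
    h = up(v, w_r)          # create and save h
    v_rec = down(h, w_g)    # reconstruct v' with the second weight copy
    h_rec = up(v_rec, w_r)  # create new h'
    new_w_r = [[w_r[i][j] + eps * (v[i] * h[j] - v_rec[i] * h_rec[j])
                for j in range(len(h))]
               for i in range(len(v))]
    sq_err = sum((vi - vr) ** 2 for vi, vr in zip(v, v_rec))
    return new_w_r, sq_err


# Tiny example: 2 visible units, 2 hidden units, all weights starting at zero.
w_r0 = [[0.0, 0.0], [0.0, 0.0]]
w_g0 = [[0.0, 0.0], [0.0, 0.0]]
w_r1, err = backfit_step([1.0, 0.0], w_r0, w_g0, eps=0.1)
```

The squared error returned here corresponds to the quantity the slide says is minimized, restricted to a single example rather than summed over the full data set.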
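Supplementary sketch for the Data and Method slide: the 10-fold cross validation holds out 2 of the 20 subjects per fold (200 test examples, 1800 training examples). The Python below shows one way such a subject-wise split could be built. The assumption that folds are consecutive pairs of subjects, and the helper name `subject_folds`, are illustrative and not stated in the talk.

```python
EXAMPLES_PER_SUBJECT = 100  # from the slide: 100 examples per student


def subject_folds(subjects, n_folds=10):
    """Yield (train_subjects, test_subjects), holding out whole subjects per fold.

    Assumption: each fold holds out a consecutive block of subjects.
    """
    per_fold = len(subjects) // n_folds
    for f in range(n_folds):
        test = subjects[f * per_fold:(f + 1) * per_fold]
        train = [s for s in subjects if s not in test]
        yield train, test


subjects = list(range(20))  # 20 students, identified here by index
folds = list(subject_folds(subjects))
sizes = [(len(train) * EXAMPLES_PER_SUBJECT, len(test) * EXAMPLES_PER_SUBJECT)
         for train, test in folds]
```

Splitting by subject rather than by example keeps every recording from a held-out student out of the training set, which matches the 18/2 subject counts on the slide.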