Knowledge Transfer in Artificial Neural Networks

A Scalable Unsupervised Deep
Multimodal Learning System
Mohammed Shameer Iqbal
Daniel L. Silver
Acadia University,
Wolfville, NS, Canada
FLAIRS-2016
Key Largo, Florida, USA – May 17, 2016
Intelligent Information Technology Research Lab, Acadia University, Canada
Introduction
Shameer Iqbal, MSc
Motivation



Humans are multimodal learners
We are able to associate one modality with another
Conjecture: knowing a concept has a lot to do with the fusion of sensory/motor channels
Objective

To develop a multimodal system





A generative deep learning architecture
Trained using unsupervised algorithms
That scales linearly in the number of channels
Can reconstruct missing modalities
Train and test it on digits 0-9


Four channels:
Image, Audio, Motor, Classification
Background
Deep Belief Networks
Background
Multimodal Learning

The MML approach has been adopted by several deep learning researchers:
(Srivastava and Salakhutdinov 2012)
(Ngiam et al. 2011), (Socher et al. 2014)
(Kiros, Salakhutdinov, and Zemel 2014)
(Karpathy and Fei-Fei 2015)
However, these systems tend to associate only two modalities
The association layer is fine-tuned using supervised techniques such as back-propagation
Background
Problem Refinement

#1 Supervised fine-tuning does not scale well to three or more modalities

Most approaches fine-tune all possible input-output modality combinations
The number of combinations grows exponentially ($2^n - 2$), where n is the number of channels
Background
Problem Refinement

Example: n = 3 channels gives 6 configurations
$2^3 - 2 = 6$
$2^4 - 2 = 14$
$2^5 - 2 = 30$
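The counts above can be checked with a short sketch. It assumes that a "configuration" means splitting the channels into a non-empty set of inputs and a non-empty set of outputs; the slides give only the formula, so that reading and the names `input_output_configurations` and `count_configurations` are illustrative assumptions.

```python
from itertools import combinations


def input_output_configurations(channels):
    """List every (inputs, outputs) split where both sides are non-empty.

    Assumption: this is what "input-output modality combination" means;
    the slides state only the count 2^n - 2.
    """
    configs = []
    n = len(channels)
    # Choose k input channels (1 <= k <= n - 1); the remaining ones are outputs.
    for k in range(1, n):
        for inputs in combinations(channels, k):
            outputs = tuple(c for c in channels if c not in inputs)
            configs.append((inputs, outputs))
    return configs


def count_configurations(n):
    """Count the splits for n generically named channels."""
    names = [f"ch{i}" for i in range(n)]
    return len(input_output_configurations(names))


for n in (3, 4, 5):
    # Enumerated count next to the closed form 2^n - 2.
    print(n, count_configurations(n), 2 ** n - 2)
```

Under this reading, each added channel roughly doubles the number of configurations that supervised fine-tuning would have to cover.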
Background
Problem Refinement



#2 Standard unsupervised learning using Deep Belief Network approaches yields poor reconstruction
A channel that provides a simple, noise-free signal will dominate over other channels at the associative layer
It is then difficult for the other channels to generate correct features at the associative layer
Theory and Approach
Network Architecture
Motor channel, digit "0" = [(5,1), (3,4), (4,8), (5,10), (7,8), (7,4), (5,1)]
Theory and Approach
RBM training of DBN Stack
Theory and Approach
Fine-tuning with Iterative Back-fitting
Create and save $h_i$
Split weights into $w_r$ and $w_g$
Theory and Approach
Fine-tuning with Iterative Back-fitting
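Update weights: $\Delta w_r = \epsilon(\langle v_i h_j \rangle - \langle v_i' h_j' \rangle)$ … minimize $\sum_k \sum_m (v_i - v_i')^2$
Create new $h_j'$
[Figure labels: split weights $w_r$ and $w_g$; $v_i$, $v_i'$]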
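The update rule on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the authors' back-fitting procedure: it assumes that $w_r$ is used for the upward pass, that $w_g$ is used for the downward reconstruction, that both start as copies of the same trained weights, and that unit probabilities stand in for sampled states. The function name `backfit_step` and the layer sizes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def backfit_step(v, w_r, w_g, lr=0.1):
    """Apply delta_w_r = lr * (<v h> - <v' h'>) once; w_g is left untouched.

    Assumed roles (not confirmed by the slides): w_r maps visible to hidden,
    w_g maps hidden back to visible.
    """
    h = sigmoid(v @ w_r)              # h_j computed from the data v_i
    v_recon = sigmoid(h @ w_g.T)      # reconstruction v_i'
    h_recon = sigmoid(v_recon @ w_r)  # h_j' computed from v_i'
    n = v.shape[0]
    # Batch averages play the role of the angle-bracket expectations.
    delta_w_r = lr * (v.T @ h - v_recon.T @ h_recon) / n
    sq_error = float(np.sum((v - v_recon) ** 2))  # sum of (v_i - v_i')^2
    return w_r + delta_w_r, sq_error


rng = np.random.default_rng(0)
v = (rng.random((8, 6)) > 0.5).astype(float)  # 8 toy examples, 6 visible units
w = rng.normal(0.0, 0.1, size=(6, 4))         # stand-in for trained RBM weights
w_r, w_g = w.copy(), w.copy()                 # "split" the weights

new_w_r, err = backfit_step(v, w_r, w_g)
print(new_w_r.shape)
```

In practice the step would be repeated over many batches while monitoring the squared reconstruction error.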
Empirical Studies
Objective

To develop a 4-channel multimodal deep learning architecture (MM DLA) that is able
to reconstruct all channels given only one
Empirical Studies
Data and Method

10 reps x 20 male Canadian students
Each example has four channels:
Handwritten digits 0-9
Audio recordings
Vector of noisy motor coordinates
Classifications
2000 examples in total (100 examples per student x 20 students)
Conducted 10-fold cross-validation:
18 subjects in training set (1800 examples)
2 subjects in test set (200 examples)
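The subject-wise split can be sketched as below. The slides give only the fold sizes, so assigning consecutive pairs of subjects to each test fold is an assumption, and `subject_folds` is a hypothetical helper name.

```python
def subject_folds(n_subjects=20, n_folds=10, examples_per_subject=100):
    """Build folds that hold out whole subjects rather than single examples.

    Assumption: subjects are assigned to test folds in consecutive pairs.
    """
    per_fold = n_subjects // n_folds
    subjects = list(range(n_subjects))
    folds = []
    for f in range(n_folds):
        test_subjects = subjects[f * per_fold:(f + 1) * per_fold]
        train_subjects = [s for s in subjects if s not in test_subjects]
        folds.append({
            "train_subjects": train_subjects,
            "test_subjects": test_subjects,
            "n_train": len(train_subjects) * examples_per_subject,
            "n_test": len(test_subjects) * examples_per_subject,
        })
    return folds


folds = subject_folds()
print(len(folds), folds[0]["n_train"], folds[0]["n_test"])  # prints: 10 1800 200
```

Holding out whole subjects means every test example comes from a writer and speaker the model has not seen during training.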
Empirical Studies
Data and Method

Evaluation:


Examine reconstruction error of a channel given
input on another channel
Error measure differs for each channel:




Class: misclassification error
Image: agreement with ANN classifier (99% acc)
Audio: the STFT signal is not reversible, so sound cannot be recreated; we measure agreement with an RF classifier (93% acc)
Motor: error = distance to target vector template (anything < 2.2 is human readable)
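The motor error measure can be illustrated with a small sketch. The slides do not say which distance is used, so the mean point-wise Euclidean distance below is an assumption, and the 2.2 threshold is only illustrative under that metric. The template is the digit "0" coordinate list from the Network Architecture slide; `motor_error` and `is_readable` are hypothetical names.

```python
import math

# Template for the digit "0", taken from the Network Architecture slide.
TEMPLATE_ZERO = [(5, 1), (3, 4), (4, 8), (5, 10), (7, 8), (7, 4), (5, 1)]
READABLE_THRESHOLD = 2.2  # from the slides; its scale depends on the authors' metric


def motor_error(reconstruction, template):
    """Mean Euclidean distance between corresponding points (assumed metric)."""
    if len(reconstruction) != len(template):
        raise ValueError("reconstruction and template must have the same length")
    total = sum(math.dist(p, q) for p, q in zip(reconstruction, template))
    return total / len(template)


def is_readable(reconstruction, template, threshold=READABLE_THRESHOLD):
    return motor_error(reconstruction, template) < threshold


# A reconstruction shifted one unit to the right of the template.
shifted = [(x + 1, y) for x, y in TEMPLATE_ZERO]
print(motor_error(shifted, TEMPLATE_ZERO), is_readable(shifted, TEMPLATE_ZERO))
```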
Empirical Studies
Results

Reconstruction of Classification Channel
Empirical Studies
Results

Reconstruction of Image Channel
Empirical Studies
Results

Reconstruction of Motor Channel
Empirical Studies
Discussion


Elimination of channel dominance is not perfect, but it is significantly improved
Additional experiments have shown that:


The classification channel is not needed
Reconstruction error of a missing channel decreases as the number of available channels increases
Conclusion

An unsupervised MM DLA with a shared associative layer and back-fitting is able
to overcome the two challenges posed:

Able to accurately reconstruct missing channel values given input on one channel


Generates an activation in the associative layer
useful for all modalities
Scales linearly in the number of channels
Future work


It is rare that all sensory/motor channels receive correlated input
Modify the algorithm to train with missing modal values



Propose to iterate between the RBM and back-fitting algorithms
Allow channel stacks and the associative layer to develop weights that better consider both within-channel and cross-channel reconstruction
Move to a parallel computing architecture to decrease experimentation time
Thank You!
QUESTIONS?



https://ml3cpu.acadiau.ca/
[email protected]
http://tinyurl/dsilver