
Deep Learning with Symbols
Daniel L. Silver
Acadia University,
Wolfville, NS, Canada
HL NeSy Seminar - Dagstuhl, April 2017
Intelligent Information Technology Research Lab, Acadia University, Canada
Introduction
Shameer Iqbal
Ahmed Galilia
Motivation
Humans are multimodal learners
We are able to associate one modality with another
Conjecture: knowing a concept has a lot to do with the fusion of sensory/motor channels
Motivation
Further Conjecture: Symbols allow us to share complex concepts quickly and concisely
A human communications tool
A coarse approximation of a noisy concept
[Figure: the digits 0 1 2 3 4 5 6 7 8 9 and their sounds: "one", "two", ...]
Symbols also help us to escape local minima when learning
Objective 1
To develop a multimodal system:
A generative deep learning architecture
Trained using unsupervised algorithms
That scales linearly in the number of channels
Can reconstruct missing modalities
Train and test it on digits 0-9
Four channels: Image, Audio, Motor, ... Symbolic Classification
Background
Deep Belief Networks
Stacked auto-encoders develop a rich feature space from unlabelled examples using unsupervised algorithms.
[Source: Caner Hazibas – slideshare]
Background
Deep Belief Networks
RBM = Restricted Boltzmann Machine; each layer of a DBN is trained as an RBM (a CD-1 sketch follows below).
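To make the layer-wise RBM training concrete, here is a minimal NumPy sketch of one contrastive-divergence (CD-1) update for a binary RBM; the function and variable names are my own illustration, not code from this system.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.01, rng=np.random):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0: batch of visible vectors, shape (batch, n_visible)
    W:  weights, shape (n_visible, n_hidden)
    """
    # Positive phase: sample hidden units from the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.uniform(size=p_h0.shape) < p_h0).astype(v0.dtype)
    # Negative phase: one Gibbs step back to the visibles and up again.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # dW = lr * (<v h>_data - <v' h'>_reconstruction)
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```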
Background
Multimodal Learning
The MML approach has been adopted by several deep learning researchers:
(Srivastava and Salakhutdinov 2012)
(Ngiam et al. 2011), (Socher et al. 2014)
(Kiros, Salakhutdinov, and Zemel 2014)
(Karpathy and Fei-Fei 2015)
However, these systems tend to associate only 2 modalities, and the association layer is fine-tuned using supervised techniques such as back-prop.
Background
Problem Refinement
#1 Supervised fine-tuning does not scale well to three or more modalities:
Must fine-tune all possible input-output modality combinations
The number of combinations grows exponentially as (2^n − 2), where n is the number of channels
Background
Problem Refinement
Example: n = 3 channels gives 6 configurations (enumerated by the script below):
(2^3 − 2) = 6
(2^4 − 2) = 14
(2^5 − 2) = 30
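As a small illustration (the channel names are mine), the configurations can be enumerated directly, confirming the 2^n − 2 count:

```python
from itertools import combinations

channels = ["image", "audio", "motor"]
n = len(channels)

# Every non-empty proper subset of channels can serve as the given
# inputs, with its complement to be reconstructed: 2^n - 2 cases.
configs = []
for k in range(1, n):
    for inputs in combinations(channels, k):
        outputs = tuple(c for c in channels if c not in inputs)
        configs.append((inputs, outputs))

print(len(configs))  # 6 for n = 3
```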
Background
Problem Refinement
#2 Standard unsupervised learning using RBM approaches yields poor reconstruction:
A channel that provides a simple, noise-free signal will dominate the other channels at the associative layer
This makes it difficult for another channel to generate the correct features at the associative layer
Theory and Approach
Network Architecture
[Figure: multi-channel DBN architecture; the classification channel provides a concise symbolic rep of the associative memory (AM)]
Theory and Approach
RBM Training of the DBN Stack
[Figure: each channel's stack is trained layer-by-layer as RBMs on its own data, e.g. an image of "8" with the spoken word "eight"]
Theory and Approach
Fine-tuning with Iterative Back-fitting
Create and save the hidden vector h_j, and split the weights into recognition weights w_r and generative weights w_g.
Generate a reconstruction v_i' of the visible vector v_i, then create a new hidden vector h_j' from it.
Update the recognition weights: Δw_r = ε(⟨v_i h_j⟩ − ⟨v_i' h_j'⟩), which works to minimize Σ_k Σ_m (v_i − v_i')² (a code sketch follows below).
[Figure: the back-fitting steps illustrated on an example digit "8" with the spoken word "eight"]
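A rough NumPy sketch of one back-fitting iteration as I read the update rule above; W_r and W_g are the split recognition and generative weights, and all names are illustrative rather than the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backfit_step(v, W_r, W_g, b_hid, b_vis, lr=0.01):
    """One iterative back-fitting update of the recognition weights W_r,
    holding the generative weights W_g fixed for this step."""
    h = sigmoid(v @ W_r + b_hid)          # create and save h_j
    v_rec = sigmoid(h @ W_g + b_vis)      # generate the reconstruction v_i'
    h_rec = sigmoid(v_rec @ W_r + b_hid)  # create the new h_j'
    # dW_r = lr * (<v h> - <v' h'>), which drives the reconstruction
    # toward the data and so reduces sum_k sum_m (v_i - v_i')^2.
    batch = v.shape[0]
    W_r += lr * (v.T @ h - v_rec.T @ h_rec) / batch
    return W_r, ((v - v_rec) ** 2).sum()
```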
Empirical Studies
Data and Method
10 reps of each digit x 20 male Canadian students:
Handwritten digits 0-9
Audio recordings
Vector of noisy motor coordinates
Classifications
100 examples per student x 20 students = 2000 examples in total
Conducted 10-fold cross-validation (a split sketch follows below):
18 subjects in training set (1800 examples)
2 subjects in test set (200 examples)
DEMO
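For concreteness, a minimal sketch of the subject-held-out 10-fold split described above, assuming the 2000 examples are stored contiguously by subject; the partitioning details are my assumption, not published code.

```python
import numpy as np

def subject_folds(n_subjects=20, n_folds=10, per_subject=100, seed=0):
    """Yield (train_idx, test_idx) index pairs that hold out 2 subjects
    per fold, mirroring the 18-train / 2-test split above."""
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(n_subjects)
    per_fold = n_subjects // n_folds  # 2 held-out subjects per fold
    all_idx = np.arange(n_subjects * per_subject)
    for f in range(n_folds):
        held_out = subjects[f * per_fold:(f + 1) * per_fold]
        test = np.concatenate([np.arange(s * per_subject, (s + 1) * per_subject)
                               for s in held_out])
        yield np.setdiff1d(all_idx, test), test
```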
Deep Learning and LML
http://ml3cpu.acadiau.ca
[Iqbal and Silver, in press]
Empirical Studies
Data and Method
Evaluation:
Examine the reconstruction error of each channel given input on another channel
The error measure differs for each channel (sketched below):
Class: misclassification error
Image: agreement with an ANN classifier (99% accurate)
Audio: the STFT signal is not reversible into sound, so agreement with an RF classifier (93% accurate) is used
Motor: error = distance to the target vector template (anything < 2.2 is human readable)
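A hedged sketch of the per-channel error measures; `classifier` stands in for the external ANN (image) or random-forest (audio) model mentioned above, and all names are illustrative.

```python
import numpy as np

def motor_error(pred, template):
    """Distance from a reconstructed motor vector to its target template;
    per the text, anything under 2.2 counts as human readable."""
    return np.linalg.norm(pred - template)

def classifier_agreement(recons, labels, classifier):
    """Fraction of reconstructed image/audio examples that an external,
    separately trained classifier assigns to the intended digit class."""
    preds = classifier.predict(recons)
    return float(np.mean(preds == labels))

def misclassification_error(pred_classes, labels):
    """Error on the classification channel itself."""
    return float(np.mean(pred_classes != labels))
```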
Empirical Studies
Results
Reconstruction of Classification Channel
[Figure: results chart]
Reconstruction of Image Channel
[Figure: results chart]
Reconstruction of Motor Channel
[Figure: results chart]
Empirical Studies
Discussion
Elimination of channel dominance is not perfect, but significantly better
Reconstruction error of the missing channel decreases as the number of available channels increases
NOTE: The classification channel is not needed:
It was introduced to clarify the concept in the associative memory
A symbol for a noisy concept
Objective 2
To show that learning with symbols is easier and more accurate than without:
A deep supervised learning architecture
Develop a model to add two MNIST digits
With and without symbolic inputs
Test on previously unseen examples
Examine what is happening in the network:
Is the network learning addition, or just a mapping function?
Challenge in Training Deep Architectures
Many tricks are used to overcome local minima; most are a form of inductive bias that favours portions of weight space where good solutions tend to be found.
Two learners are better than one!
Consider you're in the jungle, learning concepts ...
Then you meet another person
You share symbols:
Accuracy improves
Learning rate increases
Challenge in Training Deep Architectures
A single learner is hampered by the presence of local minima within its representation space
Overcoming this difficulty requires a lot of training examples
Instead, an agent's learning effectiveness can be significantly improved with symbols
This inspires social interaction and the development of culture (Bengio, Y.: Evolving culture vs local minima, ArXiv 1203.2990v1, 2013, http://arxiv.org/abs/1203.2990)
Empirical Studies:
Learning to Add MNIST Digits
Google TensorFlow
Noisy setup:
Input: 2 MNIST digit images (784 x 2 values)
Output: 2 MNIST digit images (784 x 2 values)
With binary symbolic values for each digit:
Input: 1568 (images) + 10 + 10 (symbols) values
Output: 1568 + 10 + 10 values
1-3 hidden layers of ReLU units (see the model sketch below)
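A minimal Keras sketch of the two setups, assuming a plain fully-connected network; the layer sizes, activations, and losses are my assumptions, not the authors' exact TensorFlow model.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_adder(with_symbols=True, hidden=(1024, 512, 1024)):
    """Two 784-d digit images in, two digit images out; with symbols,
    each side also carries two 10-way binary digit codes."""
    in_dim = 784 * 2 + (20 if with_symbols else 0)
    x = inp = layers.Input(shape=(in_dim,))
    for h in hidden:  # 1-3 hidden layers of ReLU units
        x = layers.Dense(h, activation="relu")(x)
    img_out = layers.Dense(784 * 2, activation="sigmoid", name="images")(x)
    outputs, losses = [img_out], {"images": "binary_crossentropy"}
    if with_symbols:
        outputs.append(layers.Dense(20, activation="sigmoid", name="symbols")(x))
        losses["symbols"] = "binary_crossentropy"
    model = Model(inp, outputs)
    model.compile(optimizer="adam", loss=losses)
    return model
```

The output digit images can then be scored by a separately trained convolutional classifier, as the discussion below describes.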
DL Model – Without Symbols
DL Model – With Symbols
Most recent results:
[Figure: output digits without symbols (left) and with symbols (right)]
Discussion:
Results improved by about 10% with symbolic outputs (based on classification of the output digits by a highly accurate convolutional network)
We believe we can do much better
The lab is working on:
Different architectures
Varying the number of training examples with symbols
Interpreting hidden node features
Thank You!
QUESTIONS?
https://ml3cpu.acadiau.ca/
[email protected]
http://tinyurl/dsilver
References
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning. Now Publishers.
Bengio, Y. and LeCun, Y. (2007). Scaling learning algorithms towards AI. In L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, editors, Large Scale Kernel Machines. MIT Press.
Bengio, Y. (2013). Evolving culture vs local minima. ArXiv 1203.2990v1. http://arxiv.org/abs/1203.2990
Goodfellow, I., Bengio, Y., and Courville, A. (2017). Deep Learning. MIT Press, Cambridge, MA.