Knowledge Transfer in Artificial Neural Networks

Lifelong Machine Learning and Reasoning
Daniel L. Silver
Acadia University,
Wolfville, NS, Canada
CoCo Workshop @ NIPS 2015
Montreal, Canada - Dec 12, 2015
Significant contributions by:
- Jane Gomes
- Moh. Shameer Iqbal
- Ti Wang
- Xiang Jiang
- Geoffrey Mason
- Hossein Parvar
Talk Outline
- Overview
- Lifelong Machine Learning
- Role of Deep Learning
- Connection to Knowledge Rep and Reasoning
- Learning to Reason (L2R)
- Empirical Studies
- Conclusion and Future Work
Overview
- It is now appropriate to seriously consider the nature of systems that learn and reason over a lifetime
- Advocate a systems approach in the context of an agent that can:
  - Acquire new knowledge through learning
  - Retain and consolidate that knowledge
  - Use it in future learning, reasoning, and other aspects of AI
[D.Silver, Q. Yang, L.Li 2013]
Overview
- Machine learning has made great strides in Learning to Classify (L2C) in a probabilistic manner in accord with the environment
[Figure: a probability density P(x) over the input space x]
Overview
- Propose: Learning to Reason, or L2R
- As per L. Valiant, D. Roth, R. Khardon, and L. Bottou: in a PAC sense, reasoning has to be adequate
[Figure: a probability density P(x) over the input space x]
Overview
- Motivation for Learning to Reason (L2R):
  - LML ↔ KR:
    - New insights into how to best represent common background knowledge acquired over time and over the input space
    - KR places additional constraints on internal representation, in the same way as LML does
  - Generative Deep Learning: to use the wealth of unlabelled examples and provide greater plasticity
Lifelong Machine Learning (LML)
- Considers systems that can learn many tasks over a lifetime:
  - From impoverished training sets
  - Across a diverse domain of tasks
  - Where practice of tasks happens
- Able to effectively and efficiently:
  - Consolidate (retain and integrate) learned knowledge
  - Transfer prior knowledge when learning a new task
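Sketched as code, this select-learn-consolidate loop might look like the following minimal sketch in Python (the class and method names are ours for illustration, not a real system or API):

```python
class LifelongLearner:
    """A minimal sketch of the LML loop: select prior knowledge as an
    inductive bias, learn the new task short-term, then consolidate."""
    def __init__(self):
        self.long_term = {}                      # consolidated domain knowledge

    def learn(self, task_id, examples, trainer):
        bias = list(self.long_term.values())     # knowledge selection -> inductive bias
        model = trainer(examples, bias)          # short-term inductive learning
        self.long_term[task_id] = model          # retention & consolidation (here: just store)
        return model

# e.g., with a trivial 'trainer' that ignores the bias and memorizes examples:
ll = LifelongLearner()
ll.learn("task-1", [((0, 1), 1)], lambda ex, bias: dict(ex))
```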
Lifelong Machine Learning (LML)
[Figure: an example xi in the space of examples X, a hypothesis hj in the space of hypotheses H, and a hypothesis space h'k in the space of hypothesis spaces H']
Lifelong Machine Learning (LML)
[Figure: the LML framework: training examples (x, f(x)) from instance space X enter an inductive learning system S (short-term memory); domain knowledge in long-term memory is selected and transferred as an inductive bias; the resulting model of classifier h, with h(x) ~ f(x), is evaluated on testing examples; retention & consolidation moves new knowledge into long-term memory]
Lifelong Machine Learning (LML)
[Figure: the LML framework realized with neural networks: a consolidated MTL network (long-term memory) holds tasks f1(x), f2(x), …, f9(x) over instance space X; knowledge selection and transfer provide an inductive bias to a multiple task learning (MTL) network [R. Caruana 1997] (short-term memory) with inputs x1 … xn, which learns the model of classifier h, h(x) ~ f(x); retention & consolidation integrates new tasks such as fk(x) into long-term memory]
csMTL and an Environmental Example
Stream flow rate prediction: x = weather data, f(x) = flow rate
[Figure: MAE (m^3/s) vs. years of data transferred (0-6) for the Shubenacadie river; series: No Transfer, Wilmot, Sharpe, Sharpe & Wilmot]
[Gaudette, Silver, Spooner 2006]
Context Sensitive MTL (csMTL)
- We have developed an alternative approach that is meant to overcome limitations of MTL networks:
  - Uses a single output: y = f(c,x)
  - Context inputs associate an example with a task, or indicate the absence of a primary input
  - Develops a fluid domain of task knowledge indexed by the context inputs
  - Supports consolidation of knowledge
  - Facilitates practicing a task
  - More easily supports tasks with vector outputs
[Figure: a csMTL network: context inputs c1 … ck and primary inputs x1 … xn feed shared hidden layers with one output for all tasks]
[Silver, Poirier and Currie, 2008]
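The single-output idea is easy to picture in code. Below is a minimal sketch (Python/NumPy; the layer sizes and weight values are our own illustration, not the trained networks from the papers): context and primary inputs are concatenated and passed through shared layers to one output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def csmtl_forward(c, x, params):
    """One forward pass: a single network computes y = f(c, x) for every task;
    the one-hot context c tells the shared layers which task the example belongs to."""
    a = np.concatenate([c, x])           # context inputs + primary inputs
    for W, b in params:
        a = sigmoid(W @ a + b)           # shared hidden layers, then the single output
    return a

# Hypothetical sizes: 3 context bits, 3 primary inputs, two hidden layers of 10.
sizes = [6, 10, 10, 1]
rng = np.random.default_rng(0)
params = [(rng.normal(0.0, 0.5, (m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

c = np.array([1.0, 0.0, 0.0])            # this example belongs to task 1
x = np.array([0.0, 1.0, 1.0])
print(csmtl_forward(c, x, params))       # one output shared by all tasks
```

Because every task shares the same weights, training examples for one task constrain the representation used by all the others, which is what makes both transfer and consolidation possible.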
csMTL and Tasks with Multiple Outputs
Liangliang Tu (2010)
- Image Morphing: inductive transfer between tasks that have multiple outputs
- Transforms 30x30 greyscale images using inductive transfer
- Three mapping tasks
[Figure: the three image-mapping tasks, labelled NA, NH, NS]
[Tu and Silver, 2010]
Two More Morphed Images
[Figure: two morphing examples, each showing a passport image, the morphed expression (angry, sad), and the filtered result]
LML via csMTL
[Figure: task rehearsal: functional transfer of virtual examples from the short-term learning network f1(c,x) supports slow consolidation into the long-term consolidated domain knowledge (CDK) network f'(c,x); representational transfer from the CDK supports rapid learning of new tasks; both networks share context inputs c1 … ck and standard inputs x1 … xn, with one output for all tasks]
LML via csMTL
- Consolidation via task rehearsal can be achieved very efficiently:
  - Need only train on a few virtual examples (as few as one) selected at random during each training iteration
- Maintains stable prior functionality while allowing representational plasticity for integration of a new task
[Silver, Mason and Eljabu 2015]
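A sketch of the rehearsal step (Python; the function and parameter names are ours, and `long_term` stands in for the consolidated network as a callable): each consolidation iteration mixes the new task's real examples with a few virtual examples whose targets are produced by the long-term network itself.

```python
import numpy as np

def rehearsal_batch(long_term, new_examples, prior_contexts, n_primary,
                    n_virtual=1, rng=None):
    """One consolidation batch: real (c, x, y) examples of the new task plus a few
    virtual examples of prior tasks, whose targets the long-term network supplies."""
    rng = rng or np.random.default_rng()
    batch = list(new_examples)
    for _ in range(n_virtual):                          # as few as one per iteration
        c = prior_contexts[rng.integers(len(prior_contexts))]   # pick a prior task
        x = rng.random(n_primary)                       # random point in the input space
        y = long_term(c, x)                             # rehearse prior functionality
        batch.append((c, x, y))
    return batch
```

Training the consolidated network on such mixed batches keeps prior tasks stable (they are continually rehearsed) while the new task's real examples reshape the shared representation.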
Deep Learning and LML
- Stacked RBMs develop a rich feature space from unlabelled examples using unsupervised algorithms
[Source: Caner Hazibas – slideshare]
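The flavour of that unsupervised step, as a rough CD-1 sketch in NumPy (ours, not the lab's code): each RBM learns features of the layer below, and the stack is trained greedily, one layer at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """One restricted Boltzmann machine trained with contrastive divergence (CD-1)."""
    def __init__(self, n_vis, n_hid, lr=0.1):
        self.W = rng.normal(0.0, 0.01, (n_vis, n_hid))
        self.b = np.zeros(n_vis)                     # visible biases
        self.c = np.zeros(n_hid)                     # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def cd1(self, v0):
        """One CD-1 update: a single Gibbs pass, then contrast data vs. reconstruction."""
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ self.W.T + self.b)   # reconstruction
        h1 = self.hidden_probs(v1)
        n = len(v0)
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

def train_stack(data, layer_sizes, epochs=20):
    """Greedy layer-wise pretraining: each RBM learns features of the layer below."""
    stack, v = [], data
    for n_hid in layer_sizes:
        rbm = RBM(v.shape[1], n_hid)
        for _ in range(epochs):
            rbm.cd1(v)
        stack.append(rbm)
        v = rbm.hidden_probs(v)                      # inputs for the next RBM
    return stack

# e.g., 6 binary inputs (context + primary), two feature layers of 10 units
data = (np.random.default_rng(1).random((200, 6)) > 0.5).astype(float)
stack = train_stack(data, [10, 10])
```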
Deep Learning and LML
- Transfer learning and consolidation work better with a deep learning csMTL: y = f(c,x), one output for all tasks
- Generative models are built using an RBM stack and unlabelled examples
- Inputs include both context and primary attributes
- Can produce a rich variety of features indexed by the context nodes
- Supervised learning is used to fine-tune all or a portion of the weights for multiple-task knowledge transfer or consolidation
[Figure: a deep csMTL network with context inputs c1 … ck and primary inputs x1 … xn, and one output for all tasks]
[Jiang and Silver, 2015]
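Tying the two earlier sketches together (again an illustration under our own assumptions, reusing `train_stack` and `csmtl_forward` from the sketches above): the RBM stack is trained on unlabelled (c, x) vectors, its weights seed the csMTL network, and supervised fine-tuning then adjusts all or some of those weights.

```python
import numpy as np
rng = np.random.default_rng(2)

# Unlabelled (context + primary) vectors drive the generative pretraining.
unlabelled = (rng.random((500, 6)) > 0.5).astype(float)
stack = train_stack(unlabelled, [10, 10])            # RBM sketch above

# Seed the csMTL hidden layers with the learned RBM weights and hidden biases.
params = [(rbm.W.T.copy(), rbm.c.copy()) for rbm in stack]
params.append((rng.normal(0.0, 0.1, (1, 10)), np.zeros(1)))   # fresh single output

# Supervised fine-tuning of all, or a frozen subset, of these weights on the
# labelled multi-task examples (e.g. by backprop through csmtl_forward) follows.
y = csmtl_forward(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0]), params)
```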
Deep Learning and LML
Experiments using the MNIST dataset
[Figure: the deep csMTL network used: context inputs c1 … ck and primary inputs x1 … xn, one output y = f(c,x) for all tasks]
[Jiang and Silver, 2015]
Deep Learning and LML
http://ml3cpu.acadiau.ca
[Wang and Silver, 2015]
[Iqbal and Silver, in press]
Deep Learning and LML
- Stimulates new ideas about:
  - How knowledge of the world is learned, consolidated, and then used for future learning and reasoning
  - How best to learn and represent common background knowledge
- Important to Big AI problem solving … such as reasoning
Knowledge Representation and Reasoning
- Focuses on the representation of information that can be used for reasoning
- Enables an entity to determine consequences by thinking rather than acting
- Traditionally requires a reasoning/inference engine to answer queries about beliefs
Knowledge Representation and Reasoning
- Reasoning could be considered "algebraic [systematic] manipulation of previously acquired knowledge in order to answer a new question" (L. Bottou 2011)
- Requires a method of acquiring and storing knowledge
- Learning from the environment is the obvious choice …
Learning to Reason (L2R)
- Concerned with the process of learning a knowledge base and reasoning with it [Khardon and Roth 97]
- Reasoning is subject to errors that can be bounded in terms of the inverse of the effort invested in the learning process
- Requires knowledge representations that are learnable and that facilitate reasoning
[Image: a speech bubble reading "This statement is false"]
Learning to Reason (L2R)
- Takes a probabilistic perspective on learning and reasoning [Khardon and Roth 97]
- The agent need not answer all possible knowledge queries:
  - Only those that are relevant to the environment in a (PAC) sense [Valiant 08, Juba 12&13]
Learning to Reason (L2R)
- Valiant and Khardon show formally:
  - L2R allows efficient learning of Boolean logical assertions in the PAC sense
  - Learned knowledge can be used to reason efficiently, and to an expected level of accuracy and confidence
- We wish to demonstrate that:
  - A knowledge base of Boolean functions is PAC-learnable from examples using a csMTL network
  - Even when the examples provide information about only a portion of the input space
  - … and explore an LML approach: consolidation over time and over the input space
Learning to Reason (L2R)
Propositional Logic Functions
- Input: truth-table terms, e.g. A B C … → True/False: 0 1 0 … → 1
- Simple terms and clauses: ~A, B, C, (~A v B), (~B v C)
- More complex functions: (~A v B) v (~B v C), (~A v C)
- Functions of functions: ~(~A v B) v ~(~B v C) v (~A v C)
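Concretely, each such function becomes a set of labelled truth-table examples. A small sketch (ours) of the enumeration:

```python
from itertools import product

def implies(p, q):              # A -> B is equivalent to (~A v B)
    return (not p) or q

def truth_table(literals, fn):
    """Enumerate every assignment of the literals and label it with the
    function's truth value: each row is one (inputs, True/False) example."""
    rows = []
    for values in product([0, 1], repeat=len(literals)):
        env = dict(zip(literals, map(bool, values)))
        rows.append((values, int(fn(env))))
    return rows

# e.g., the clause (~A v B):
for inputs, truth in truth_table("AB", lambda e: implies(e["A"], e["B"])):
    print(inputs, truth)        # (0,0)->1, (0,1)->1, (1,0)->0, (1,1)->1
```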
L2R with LML – Study 1
- Consider the Law of Syllogism:
  - KB: (A → B) ∧ (B → C)
  - Q: (A → C)
L2R with LML – Study 1
Learning the Law of Syllogism:
[Figure: the training set KB (truth-table examples of A → B and B → C) and the query set Q (truth-table examples of A → C), each example encoded over the inputs cA, cB, cC, A, B, C]
L2R with LML – Study 1
Learning the Law of Syllogism with a 6-10-10-1 network:
[Figure: the KB training set and query set Q over inputs cA, cB, cC, A, B, C]
Results: average over 30 runs: 89% correct on Q
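A compact sketch of the whole experiment (our own reconstruction: in particular, the context encoding that flags which literals a clause mentions is an assumption, and we use scikit-learn rather than the original trainer):

```python
from itertools import product
import numpy as np
from sklearn.neural_network import MLPClassifier

def implies(p, q):
    return (not p) or q

def examples(context, fn):
    """All 8 truth-table rows for one clause, with its context bits prepended."""
    return [(context + (a, b, c), int(fn(a, b, c)))
            for a, b, c in product([0, 1], repeat=3)]

# KB: (A -> B) with context (1,1,0), (B -> C) with context (0,1,1)
kb = (examples((1, 1, 0), lambda a, b, c: implies(a, b)) +
      examples((0, 1, 1), lambda a, b, c: implies(b, c)))
# Q: (A -> C) with context (1,0,1), never seen during training
query = examples((1, 0, 1), lambda a, b, c: implies(a, c))

X, y = np.array([r[0] for r in kb]), np.array([r[1] for r in kb])
net = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=5000, random_state=0)
net.fit(X, y)                                    # learn the KB from examples

Xq, yq = np.array([r[0] for r in query]), np.array([r[1] for r in query])
print("accuracy on Q:", (net.predict(Xq) == yq).mean())  # reasoning = testing on Q
```

The 89% figure above is the original reported result; this sketch only illustrates the setup, not a reproduction of it.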
L2R with LML – Study 2
Objective: Learn the Law of Syllogism (10 literals)
- KB: ((A∧B∨C) → (D∨E∨~F)) ∧ ((D∨E∨~F) → (G∨(~H∧I)∨~J))
- Q: (A∧B∨C) → (G∨(~H∧I)∨~J)
- Training set: 100% of the sub-KB examples:
  - (A∧B∨C) → (D∨E∨~F)
  - (D∨E∨~F) → (G∨(~H∧I)∨~J)
- Network: 20-10-10-1
- Results: average over 10 runs: 78% accuracy on Q
L2R with LML – Study 3
Objective: To learn the following knowledge base two different ways:
[Figure: the knowledge base formula]
1. From examples of the KB (1024 in total)
2. From examples of sub-clauses of the KB (sub-KB)
Training set: all possible sub-KB examples, using a 20-10-10-1 network
L2R with LML – Study 3
Objective: To learn the following knowledge base:
Results:
[Figure: mean accuracy on all KB examples (over 5 runs) vs. % of examples used for training]
Conclusion
- Learning to Reason (L2R) using a csMTL neural network:
  1. Uses examples to learn a model of logical functions in a probabilistic manner
  2. Consolidates knowledge from examples that represent only a portion of the input space
  3. Reasoning = testing the model using the truth table of Q
  4. Relies on context nodes to select the inputs that are relevant
- Results on a simple Boolean logic domain suggest promise
Future work
- Create a scope for determining those tasks that a trained network finds TRUE
- Thoroughly examine the effect of a probability distribution over the input space (train and test sets)
- Combine csMTL with deep learning architectures to learn hierarchies of abstract features (which tend to be DNF)
- Consider other learning algorithms
- Consider more complex knowledge bases – beyond propositional logic
Thank You!
[email protected]
http://tinyurl/dsilver
References:
- L. G. Valiant. Knowledge infusion: In pursuit of robustness in artificial intelligence. FSTTCS, 415-422, 2008.
- B. Juba. Implicit learning of common sense for reasoning. IJCAI, 939-946, 2013.
- R. Khardon and D. Roth. Learning to reason. Journal of the ACM, 44(5):697-725, 1997.
- D. Silver, R. Poirier, and D. Currie. Inductive transfer with context sensitive neural networks. Machine Learning Special Issue on Inductive Transfer, Springer, 73(3):313-336, 2008.
- D. Silver, G. Mason, and L. Eljabu. Consolidation using sweep task rehearsal: Overcoming the stability-plasticity problem. Advances in Artificial Intelligence, 28th Conference of the Canadian Artificial Intelligence Association (AI 2015), Springer, LNAI 9091, pp. 307-324, 2015.
- T. Wang and D. Silver. Learning paired-associate images with an unsupervised deep learning architecture. Advances in Artificial Intelligence (AI 2015), Springer, LNAI 9091, pp. 250-263, 2015.
- J. Gomes and D. Silver. Learning to reason in a probably approximately correct manner. Proceedings of CCECE 2014, Halifax, NS, May 2015, IEEE Press, pp. 1475-8.
- D. Silver. The consolidation of task knowledge for lifelong machine learning. Proceedings of the AAAI Spring Symposium on Lifelong Machine Learning, Stanford University, CA, AAAI, March 2013, pp. 46-48.
- D. Silver, Q. Yang, and L. Li. Lifelong machine learning systems: Beyond learning algorithms. Proceedings of the AAAI Spring Symposium on Lifelong Machine Learning, Stanford University, CA, AAAI, March 2013, pp. 49-55.
- D. Silver and L. Tu. Image morphing: Transfer learning between tasks that have multiple outputs. Advances in Artificial Intelligence, 25th Conference of the Canadian Artificial Intelligence Association (AI 2012), Toronto, ON, May 2012, Springer, LNAI 7310, pp. 194-205.
- D. Silver, I. Spooner, and L. Gaudette. Inductive transfer applied to modeling river discharge in Nova Scotia. Atlantic Geology: Journal of the Atlantic Geoscience Society, (45):191-203, 2009.