Combining Inductive and Analytical Learning
Chapter 12 of Machine Learning, Tom M. Mitchell
Natural Language Processing Lab, Korea University
한경수
July 9, 1999
Contents
- Motivation
- Inductive-Analytical Approaches to Learning
- Using Prior Knowledge to Initialize the Hypothesis
  - The KBANN Algorithm
- Using Prior Knowledge to Alter the Search Objective
  - The TANGENTPROP Algorithm
  - The EBNN Algorithm
- Using Prior Knowledge to Augment Search Operators
  - The FOCL Algorithm
Motivation (1/2)
- Inductive vs. analytical learning

                  Inductive Learning                Analytical Learning
  Goal:           Hypothesis fits data              Hypothesis fits domain theory
  Justification:  Statistical inference             Deductive inference
  Advantages:     Requires little prior knowledge   Learns from scarce data
  Pitfalls:       Scarce data, incorrect bias       Imperfect domain theory

- A spectrum of learning tasks
  - Most practical learning problems lie somewhere between these two extremes
    of the spectrum.
Motivation (2/2)
- What kinds of learning algorithms can we devise that make use of approximate
  prior knowledge, together with available data, to form general hypotheses?
  - domain-independent algorithms that employ explicitly provided,
    domain-dependent knowledge
- Desirable properties
  - Given no domain theory, learn at least as well as purely inductive methods.
  - Given a perfect domain theory, learn at least as well as purely analytical methods.
  - Given an imperfect domain theory and imperfect training data, combine the two
    to outperform either purely inductive or purely analytical methods.
  - Accommodate arbitrary and unknown errors in the domain theory.
  - Accommodate arbitrary and unknown errors in the training data.
The Learning Problem
- Given:
  - A set of training examples D, possibly containing errors
  - A domain theory B, possibly containing errors
  - A space of candidate hypotheses H
- Determine:
  - A hypothesis that best fits the training examples and the domain theory
    (one way to make "best fits" precise is sketched below)
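One way to make "best fits" precise, in the spirit of the weighted-error criterion
discussed in the chapter (the particular weights k_D and k_B here are only notation),
is to choose the hypothesis that minimizes a combined error:

    h^{*} = \operatorname*{arg\,min}_{h \in H} \; k_D \, error_D(h) + k_B \, error_B(h)

where error_D(h) is the error of h over the training data D, error_B(h) measures the
disagreement between h and the domain theory B, and k_D, k_B set their relative weight.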
Hypothesis Space Search
- Learning as a task of searching through a hypothesis space, characterized by
  - the hypothesis space H
  - the initial hypothesis h0
  - the set of search operators O, which define the individual search steps
  - the goal criterion G, which specifies the search objective
- Methods for using prior knowledge: use it to
  - derive an initial hypothesis h0 from which to begin the search
  - alter the objective G of the hypothesis space search
  - alter the available search steps O
  (these three options are illustrated in the sketch below)
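As a rough illustration, the three options correspond to three different hooks on a
generic greedy search loop. The Python sketch below uses hypothetical names (learn,
operators, objective) that are not from the chapter:

```python
# A rough illustration (names are illustrative, not the chapter's) of the generic
# search setting and the three places where prior knowledge B can enter.
def learn(h0, operators, objective, data, max_steps=100):
    """Greedy hypothesis-space search: start at h0, repeatedly take the best
    single step allowed by the operators, stop when nothing improves."""
    h = h0                                               # (1) KBANN: h0 derived from B
    for _ in range(max_steps):
        candidates = [op(h) for op in operators]         # (3) FOCL: operators generated from B
        if not candidates:
            break
        best = max(candidates, key=lambda c: objective(c, data))
        if objective(best, data) <= objective(h, data):  # (2) TangentProp/EBNN:
            break                                        #     objective G altered by B
        h = best
    return h
```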
Using Prior Knowledge to Initialize the Hypothesis
- Two steps
  1. Initialize the hypothesis to perfectly fit the domain theory.
  2. Inductively refine this initial hypothesis as needed to fit the training data.
- KBANN (Knowledge-Based Artificial Neural Network)
  - Given:
    - A set of training examples
    - A domain theory consisting of nonrecursive, propositional Horn clauses
  - Determine:
    - An artificial neural network that fits the training examples, biased by the
      domain theory
  - 1. Analytical step: create an initial network equivalent to the domain theory
       (sketched below)
  - 2. Inductive step: refine the initial network, using BACKPROPAGATION
- See Table 12.2 (p. 341).
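A minimal sketch of the analytical step, assuming each nonrecursive propositional Horn
clause is given as sets of positive and negated antecedent literals; the weight value
W = 4.0 and the near-zero "extra" connections follow the usual KBANN construction, but
the function and variable names here are illustrative:

```python
# Sketch of KBANN's analytical step (illustrative code, not Mitchell's implementation):
# each propositional Horn clause becomes one sigmoid unit whose weights implement the clause.
W = 4.0  # large weight assigned to antecedent connections

def clause_to_unit(positives, negatives, inputs, eps=0.0):
    """Return (weights, bias) for a sigmoid unit encoding Head :- positives, not(negatives).
    `inputs` lists everything this unit may connect to (other units or instance features)."""
    weights = {}
    for feat in inputs:
        if feat in positives:
            weights[feat] = W
        elif feat in negatives:
            weights[feat] = -W
        else:
            weights[feat] = eps  # near-zero link, available for later inductive refinement
    # Unit output exceeds 0.5 only when all positive antecedents hold and no negated one does.
    bias = -(len(positives) - 0.5) * W
    return weights, bias

# Example clause from the Cup domain theory: Cup :- Stable, Liftable, OpenVessel
weights, bias = clause_to_unit(
    positives={"Stable", "Liftable", "OpenVessel"},
    negatives=set(),
    inputs={"Stable", "Liftable", "OpenVessel", "HandleOnTop", "MadeOfStyrofoam"})
```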
Example: The Cup Learning Task
- [Figure: a neural network equivalent to the Cup domain theory]
- [Figure: the result of inductively refining the network]
Remarks
- KBANN vs. Backpropagation
  - When given an approximately correct domain theory and scarce training data,
    KBANN generalizes more accurately than Backpropagation.
  - Classifying promoter regions in DNA
    - Backpropagation: error rate 8/106
    - KBANN: error rate 4/106
- Bias
  - KBANN: the domain-specific theory
  - Backpropagation: a domain-independent syntactic bias toward small weight values
Using Prior Knowledge to Alter the Search Objective
- Use of prior knowledge
  - Incorporate it into the error criterion minimized by gradient descent, so the
    network must fit a combined function of the training data and the domain theory.
- Form of prior knowledge
  - Derivatives of the target function: certain types of prior knowledge can be
    expressed quite naturally this way.
  - Example: recognizing handwritten characters
    - "The identity of the character is independent of small translations and
      rotations of the image."
The TANGENTPROP Algorithm
- Domain knowledge
  - Expressed as derivatives of the target function with respect to
    transformations of its inputs
- Training derivatives
  - TANGENTPROP assumes the training derivatives of the target function are
    provided along with each training example:

    \langle\, x_i,\; f(x_i),\; \left.\frac{\partial f(x)}{\partial x}\right|_{x_i} \rangle

- Error function (a code sketch of this objective follows below)

    E = \sum_i \Big[ \big(f(x_i) - \hat{f}(x_i)\big)^2
        + \mu \sum_j \Big( \frac{\partial \hat{f}(s_j(\alpha, x_i))}{\partial \alpha}
        - \frac{\partial f(s_j(\alpha, x_i))}{\partial \alpha} \Big)^2 \Big|_{\alpha = 0} \Big]

  - s_j(\alpha, x): the jth transformation of x (e.g., rotation or translation),
    with s_j(0, x) = x
  - \mu: a constant that determines the relative importance of fitting the
    training values versus fitting the training derivatives
- See Table 12.4 (p. 349).
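The sketch below shows this objective in code, approximating the derivative along each
transformation by a finite difference in α; the function and parameter names are
illustrative, not from the chapter:

```python
import numpy as np

# Sketch of the TangentProp objective (illustrative, not the original algorithm's code):
# the usual squared error plus a penalty on the mismatch between the provided training
# derivatives and the network's derivative along each transformation s_j(alpha, x).
def tangentprop_loss(net, xs, ys, transforms, target_derivs, mu, d_alpha=1e-3):
    total = 0.0
    for x, y, dfs in zip(xs, ys, target_derivs):   # dfs[j] = d f(s_j(a, x))/d a at a = 0
        total += (y - net(x)) ** 2                 # fit the training value
        for j, s in enumerate(transforms):
            # d f_hat(s_j(alpha, x)) / d alpha at alpha = 0, by finite differences
            d_net = (net(s(d_alpha, x)) - net(s(0.0, x))) / d_alpha
            total += mu * (dfs[j] - d_net) ** 2    # fit the training derivative
    return total

# Example transformation: translate the input along a fixed direction v,
# s(alpha, x) = x + alpha * v.  "The output is invariant to this translation" is
# expressed by supplying a training derivative of 0 for every example.
def make_translation(v):
    v = np.asarray(v, dtype=float)
    return lambda alpha, x: np.asarray(x, dtype=float) + alpha * v
```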
Remarks
- TANGENTPROP combines the prior knowledge with observed training data by
  minimizing an objective function that measures both
  - the network's error with respect to the training example values, and
  - the network's error with respect to the desired derivatives.
- TANGENTPROP is not robust to errors in the prior knowledge.
  - The relative weight μ needs to be selected automatically
    → the EBNN algorithm
The EBNN Algorithm (1/2)
- Input
  - A set of training examples of the form ⟨x_i, f(x_i)⟩
  - A domain theory represented by a set of previously trained neural networks
- Output
  - A new neural network that approximates the target function
- Algorithm (a skeleton follows below)
  - Create a new, fully connected feedforward network to represent the target
    function.
  - For each training example, determine the corresponding training derivatives.
  - Use the TANGENTPROP algorithm to train the target network.
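A skeleton of this control flow is sketched below; the helper functions are passed in
rather than defined, since they depend on the network representation used, and all
names are illustrative:

```python
# Skeleton of EBNN's control flow (illustrative sketch, not the original implementation).
def ebnn(examples, domain_theory, new_net, gradient_of, credibility, tangentprop_train):
    target_net = new_net()                             # 1. new fully connected target network
    derivs, mus = [], []
    for x, fx in examples:                             # 2. explain each example with the theory
        derivs.append(gradient_of(domain_theory, x))   #    extract d A(x)/d x^j
        mus.append(credibility(domain_theory(x), fx))  #    mu_i: how well the theory predicts f(x_i)
    # 3. fit both the training values and the extracted derivatives (TangentProp-style)
    return tangentprop_train(target_net, examples, derivs, mus)
```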
The EBNN Algorithm (2/2)
- Computation of training derivatives (Figure 12.7, p. 353)
  - EBNN computes them itself for each observed training example: it explains the
    example in terms of the given domain theory, then extracts the training
    derivatives from this explanation, e.g.

    \left.\; \frac{\partial\, Cup}{\partial\, BottomIsFlat},\;
    \frac{\partial\, Cup}{\partial\, ConcavityPointsUp},\; \ldots,\;
    \frac{\partial\, Cup}{\partial\, MadeOfStyrofoam} \;\right|_{x = x_i}

  - These derivatives provide important information for distinguishing relevant
    from irrelevant features.
- How to weight the relative importance of the inductive and analytical components
  of learning
  - μ_i is chosen independently for each training example, by considering how
    accurately the domain theory predicts the training value for that particular
    example (a code sketch follows below).
- Error function

    E = \sum_i \Big[ \big(f(x_i) - \hat{f}(x_i)\big)^2
        + \mu_i \sum_j \Big( \frac{\partial A(x)}{\partial x^j}
        - \frac{\partial \hat{f}(x)}{\partial x^j} \Big)^2 \Big|_{x = x_i} \Big]

    \mu_i \equiv 1 - \frac{|A(x_i) - f(x_i)|}{c}, \qquad 0 \le \mu_i \le 1

  - A(x): the domain theory's prediction for input x
  - x_i: the ith training instance
  - x^j: the jth component of the input vector x
  - c: a normalizing constant
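The sketch below shows the two example-specific quantities this objective needs: the
domain theory's gradient at x_i and the weight μ_i. The finite-difference gradient and
all names are illustrative; EBNN proper extracts the derivatives from the domain-theory
networks themselves:

```python
import numpy as np

def domain_theory_gradient(A, x, h=1e-4):
    """d A(x) / d x^j for each component j, by central differences
    (a stand-in for extracting the derivatives from the domain-theory networks)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        grad[j] = (A(x + e) - A(x - e)) / (2.0 * h)
    return grad

def mu_weight(A_xi, f_xi, c):
    """mu_i = 1 - |A(x_i) - f(x_i)| / c: near 1 when the domain theory predicts
    the training value well, near 0 when it does not."""
    return 1.0 - abs(A_xi - f_xi) / c
```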
Remarks
- EBNN vs. symbolic explanation-based learning
  - The domain theory consists of neural networks rather than Horn clauses.
  - Relevant dependencies take the form of derivatives.
  - EBNN accommodates imperfect domain theories.
  - EBNN learns a fixed-size neural network
    - requires constant time to classify new instances
    - but may be unable to represent sufficiently complex functions
Using Prior Knowledge to Augment Search Operators
- The FOCL Algorithm (Figure 12.8, p. 358)
  - Two operators for generating candidate specializations (see the sketch below)
    1. Add a single new literal.
    2. Add a set of literals that constitute logically sufficient conditions for
       the target concept, according to the domain theory.
       - Select one of the domain theory clauses whose head matches the target
         concept.
       - Unfolding: each nonoperational literal is replaced by its definition,
         until the sufficient conditions have been restated in terms of
         operational literals.
       - Pruning: each literal is removed unless its removal reduces
         classification accuracy over the training examples.
  - FOCL selects among all these candidate specializations based on their
    performance over the data.
    - The domain theory is used in a fashion that biases the learner, but leaves
      the final search choices to be made based on performance over the training
      data (Figure 12.9, p. 361).
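The sketch below illustrates the two specialization operators under simple assumed data
structures: a clause body is a list of literal names, the domain theory maps each
nonoperational literal to the body of one defining clause, and accuracy_fn scores a
candidate body over the training data. All names are illustrative:

```python
# Sketch of FOCL's two operators for specializing the current clause preconditions
# (illustrative; not the original FOCL implementation).
def candidate_specializations(body, literals, target, domain_theory, accuracy_fn):
    # Operator 1: add a single new literal (the ordinary inductive step).
    for lit in literals:
        yield body + [lit]
    # Operator 2: add a logically sufficient condition derived from the domain theory.
    if target in domain_theory:
        sufficient = unfold(domain_theory[target], domain_theory)
        yield body + prune(sufficient, body, accuracy_fn)

def unfold(literals, domain_theory):
    """Replace each nonoperational literal by its definition until only operational
    (directly evaluable) literals remain."""
    out = []
    for lit in literals:
        if lit in domain_theory:
            out.extend(unfold(domain_theory[lit], domain_theory))
        else:
            out.append(lit)
    return out

def prune(sufficient, body, accuracy_fn):
    """Drop each unfolded literal whose removal does not reduce accuracy on the data."""
    kept = list(sufficient)
    for lit in list(kept):
        trial = [l for l in kept if l != lit]
        if accuracy_fn(body + trial) >= accuracy_fn(body + kept):
            kept = trial
    return kept
```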