Knowledge in Learning
Chapter 21
Using prior knowledge
For decision-tree and logical description learning, we assumed no prior knowledge.
In practice we do have prior knowledge, so how can we use it?
We need a logical formulation of learning, as opposed to function learning.
Inductive learning in the logical setting
The objective is to find a hypothesis that
explains the classifications of the
examples, given their descriptions.
Hypothesis ^ Descriptions |= Classifications
Descriptions - the conjunction of all the example
descriptions
Classifications - the conjunction of all the example
classifications
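For instance (borrowing the book's restaurant domain as an assumed example): if Descriptions include Patrons(X1,Some) and Classifications include WillWait(X1), then a hypothesis such as Patrons(x,Some) => WillWait(x), conjoined with the description, entails the classification.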
A cumulative learning process
Fig 21.1
The new approach is to design agents that already know something and are trying to learn some more.
Intuitively, this should be faster and better than learning from scratch, assuming the prior knowledge is correct.
How can we implement this cumulative learning with increasing knowledge?
Some examples of using knowledge
One can leap to general conclusions after only one observation.
Have you had such an experience?
Traveling to Brazil: one encounter suggests the language all Brazilians speak, but not everyone's name.
A pharmacologically ignorant but diagnostically sophisticated medical student can still infer a general prescription rule from a single observed case.
Some general schemes
Explanation-based learning (EBL)
Hypothesis ^ Descriptions |= Classifications
Background |= Hypothesis
does not learn anything factually new from the instance
Relevance-based learning (RBL)
Hypothesis ^ Descriptions |= Classifications
Background ^ Descriptions ^ Classifications |= Hypothesis
deductive in nature
Knowledge-based inductive learning (KBIL)
Background ^ Hypothesis ^ Descriptions |= Classifications
Inductive logic programming (ILP)
ILP can formulate hypotheses in general first-order logic.
Other methods, such as decision trees, use more restricted languages.
Prior knowledge is used to reduce the complexity of learning:
prior knowledge further reduces the hypothesis space
prior knowledge helps find shorter hypotheses
Again, this assumes the prior knowledge is correct.
Explanation-based learning
A method to extract general rules from
individual observations
The goal is to solve a similar problem faster
next time.
Memoization - speed up by saving computed results and avoiding re-solving problems from scratch
EBL takes this one step further - from observations to general rules
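As a concrete illustration of memoization (a minimal Python sketch; the Fibonacci example is ours, not from the slides):

    import functools

    @functools.lru_cache(maxsize=None)   # save each result; compute each n only once
    def fib(n):
        """Naive recursion is exponential; with the cache it is linear."""
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(100))  # answers instantly thanks to the saved subresults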
Why EBL?
Explaining why something is a good idea is
much easier than coming up with the idea.
Once something is understood, it can be
generalized and reused in other circumstances.
Extracting general rules from examples:
EBL constructs two proof trees simultaneously, variabilizing the constants of the first tree in the second.
An example (Fig 21.2)
Basic EBL
Given an example, construct a proof tree using
the background knowledge
In parallel, construct a generalized proof tree for
the variabilized goal
Construct a new rule (leaves => the root)
Drop any conditions that are true regardless of
the variables in the goal
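A minimal Python sketch of the rule-extraction step (our illustration; a full EBL system replays the proof with unification rather than renaming constants uniformly, so treat this as a simplification):

    def variabilize(term, mapping):
        # Replace each constant in a nested tuple term with a variable;
        # term[0] is the predicate symbol and is kept as-is.
        if isinstance(term, tuple):
            return (term[0],) + tuple(variabilize(a, mapping) for a in term[1:])
        if term not in mapping:
            mapping[term] = f"?x{len(mapping)}"   # fresh variable per constant
        return mapping[term]

    def extract_rule(proof):
        # proof = (conclusion, [subproofs]); a leaf is (fact, []).
        def leaves(p):
            goal, subs = p
            return [goal] if not subs else [l for s in subs for l in leaves(s)]
        mapping = {}
        body = [variabilize(l, mapping) for l in leaves(proof)]
        head = variabilize(proof[0], mapping)
        return body, head                          # read as: body => head

    # Ground proof: Father(Tom,Ann) and Parent(Ann,Bob) prove Grandfather(Tom,Bob)
    proof = (("Grandfather", "Tom", "Bob"),
             [(("Father", "Tom", "Ann"), []),
              (("Parent", "Ann", "Bob"), [])])
    print(extract_rule(proof))
    # -> Father(?x0,?x1) ^ Parent(?x1,?x2) => Grandfather(?x0,?x2)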
Efficiency of EBL
Choosing a general rule
too many rules -> slow inference
aim for gain - a significant increase in speed
make rules as general as possible
Operationality - a subgoal is operational if it is easy to solve
There is a trade-off between operationality and generality.
Efficiency in EBL is ultimately studied by empirical analysis.
Learning using relevant information
Prior knowledge: people in a given country usually speak the same language.
Observation: Fernando is Brazilian and speaks Portuguese.
We can logically conclude, via resolution, that all Brazilians speak Portuguese.
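Spelled out in the document's notation (the prior-knowledge rule below is the standard one for this example):

    Prior:        Nationality(x,n) ^ Nationality(y,n) ^ Language(x,l) => Language(y,l)
    Observations: Nationality(Fernando,Brazil) ^ Language(Fernando,Portuguese)
    Conclusion:   Nationality(y,Brazil) => Language(y,Portuguese)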
Functional dependencies
We have seen a form of relevance:
determination - language (Portuguese) is a function of nationality (Brazil)
A determination is really a relationship between predicates, e.g., Nationality(x,n) ≻ Language(x,l).
The corresponding generalization follows
logically from the determinations and
descriptions.
We can generalize from Fernando to all Brazilians, but not to all nations. Thus determinations limit the hypothesis space that needs to be considered.
Determinations specify a sufficient basis
vocabulary from which to construct hypotheses
concerning the target predicate.
A reduction in the H space size should make it
easier to learn the target predicate.
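A rough counting argument shows why (our illustration): a Boolean function of n attributes lives in a space of 2^(2^n) hypotheses; if determinations show that only d < n attributes are relevant, the space shrinks to 2^(2^d). Since the number of examples needed grows with log|H|, i.e. with 2^n versus 2^d, the saving can be exponential.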
Learning using relevant information
A determination P Q says if any examples match
on P, they must also match on Q
Find the simplest determination consistent with
the observations
Search through the space of determinations from one
predicate, two predicates
Algorithm - Fig 21.3 (page 635)
Time complexity is n choosing p.
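A minimal Python sketch of this search (our illustration of the Fig 21.3 idea; the function names are assumed):

    from itertools import combinations

    def consistent_det(attrs, examples):
        # examples: list of (attribute-dict, classification). attrs determine
        # the class if no two examples agree on attrs but disagree on the class.
        seen = {}
        for values, cls in examples:
            key = tuple(values[a] for a in attrs)
            if seen.setdefault(key, cls) != cls:
                return False
        return True

    def minimal_consistent_det(examples, all_attrs):
        # Try subsets of size 0, 1, 2, ... and return the first consistent
        # one; checking all size-p subsets costs O(n choose p) tests.
        for p in range(len(all_attrs) + 1):
            for attrs in combinations(all_attrs, p):
                if consistent_det(attrs, examples):
                    return attrs
        return None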
Combining relevance-based learning with decision-tree learning -> RBDTL
Its learning performance improves (Fig 21.4).
Other issues
noise handling
using other prior knowledge
from attribute-based to FOL
Inductive logic programming
It combines inductive methods with FOL.
ILP represents theories as logic programs.
ILP offers complete algorithms for inducing
general, first-order theories from examples.
It can learn successfully in domains where
attribute-based algorithms fail completely.
An example - a typical family tree (Fig 21.5)
Inverse resolution
If Classifications follow from Background ^ Hypothesis ^ Descriptions, we can prove this by resolution with refutation (resolution is refutation-complete).
If we run the proof backwards, we can find a Hypothesis such that the proof goes through.
Generating inverse proofs
A family tree example (Fig 21.6)
Inverse resolution involves search
Each inverse resolution step is nondeterministic
For any C and C1, there can be many C2
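A small instance (our illustration): let C = Grandfather(Tom,Bob) and C1 = Parent(Ann,Bob). Each of

    Parent(Ann,Bob) => Grandfather(Tom,Bob)
    Parent(Ann,y)   => Grandfather(Tom,y)
    Parent(x,y)     => Grandfather(Tom,y)

resolves with C1 to yield exactly C, so each is a legal C2; inverse resolution must search among them.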
Discovering new knowledge with IR
it is not easy - like a monkey with a typewriter
Discovering new predicates with IR
Fig 21.7
The ability to use background knowledge
provides significant advantages
Top-down learning (FOIL)
A generalization of decision-tree induction to the first-order case, by the author of C4.5 (Quinlan).
Start with a general rule and specialize it to fit the data.
Now we use first-order literals instead of attributes, and the hypothesis is a set of clauses instead of a decision tree.
Example: => Grandfather(x,y)
(page 642)
positive and negative examples
add literals one at a time to the left-hand side
e.g., Father(x,y) => Grandfather(x,y)
How to choose a literal? (Algorithm on page 643)
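A minimal Python sketch of the literal-selection heuristic (our illustration of FOIL's information gain; the coverage counts are assumed to be computed elsewhere by the learner):

    import math

    def foil_gain(p0, n0, p1, n1):
        # p0, n0: positive/negative bindings covered before adding the literal;
        # p1, n1: the counts after adding it.
        if p1 == 0:
            return 0.0                       # the literal keeps no positives
        i0 = -math.log2(p0 / (p0 + n0))      # information needed before
        i1 = -math.log2(p1 / (p1 + n1))      # ... and after
        return p1 * (i0 - i1)                # weighted by positives retained

    def choose_literal(candidates, counts, p0, n0):
        # counts: dict literal -> (p1, n1); pick the literal with the best gain.
        return max(candidates, key=lambda L: foil_gain(p0, n0, *counts[L]))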
Summary
Using prior knowledge in cumulative learning
Prior knowledge allows for shorter hypotheses.
Prior knowledge plays different logical roles, captured by the entailment constraints of EBL, RBL, and KBIL.
ILP generates new predicates so that concise new theories can be expressed.