
Learning from Observations
Chapter 18
Learning agents
Improve their behavior by studying their own experiences.
Acting -> Experience -> Better Acting
We'll study how to make an agent learn: what is needed for learning, and some representative methods of learning from observations.
A general model
What are the components of a learning agent?
Learning element - learns and improves the performance element (Fig 18.1)
Performance element - the agent proper, which perceives and acts
Problem generator - suggests exploratory actions
Critic - provides feedback on how the agent is doing
The design of a learning agent is affected by four issues:
prior knowledge, available feedback, representation, and the components of the performance element
What do we need?
Components of the performance element (p 527)
Each component should be learnable given feedback
Representation of the components
Propositional Logic, FOL, or others
Available feedback
Supervised, Reinforcement, Unsupervised
Prior knowledge
Nil, some (why not all?)
Putting it all together: learning can be viewed as learning a function
Inductive Learning
Data described by examples
an example is a pair (x, f(x))
Induction - given a collection of examples of f,
return a function h that approximates f.
Fig 18.2
Hypothesis - a candidate function h
Bias - a preference for one consistent hypothesis over another
Learning incrementally or in batch
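As a toy illustration of this setup (everything below is an assumption of the sketch, not taken from the chapter): f is hidden, we only see (x, f(x)) pairs, and the returned h is a crude nearest-neighbour lookup chosen only to keep the sketch runnable.
```python
# Minimal sketch of induction: examples are (x, f(x)) pairs of a hidden f, and
# induce() returns a hypothesis h that approximates f.
def induce(examples):
    def h(x):
        _, fx = min(examples, key=lambda ex: abs(ex[0] - x))
        return fx                                  # predict via the closest observed x
    return h

examples = [(x, x * x) for x in range(-5, 6)]      # samples of a hidden f(x) = x^2
h = induce(examples)
print(h(2.4))                                      # 4, i.e. f at the nearest sample x = 2
```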
Some questions about
inductive learning
Are there many forms of inductive learning?
We’ll learn some
Can we achieve both expressiveness and
efficiency?
How can one possibly know that one’s learning
algorithm has produced a theory that will
correctly predict the future?
If one does not, how can one say that the
algorithm is any good?
Learning decision trees
A decision tree takes as input an object described
by a set of properties and outputs a yes/no
"decision".
One of the simplest and yet most successful
forms of learning
To make a decision “wait” or “not wait”, we need
information such as … (page 532 for 10
attributes, a data set in Fig 18.5)
Patrons(Full)^WaitEstimate(0-10)^Hungry(N)=>WillWait
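Written out in code, such a rule is just a conjunction of attribute tests; the dictionary representation of an example below is my own choice, not the book's.
```python
def will_wait(x):
    # The slide's rule: Patrons(Full) ^ WaitEstimate(0-10) ^ Hungry(N) => WillWait
    return (x["Patrons"] == "Full"
            and x["WaitEstimate"] == "0-10"
            and x["Hungry"] == "No")

print(will_wait({"Patrons": "Full", "WaitEstimate": "0-10", "Hungry": "No"}))  # True
```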
Let’s make a decision
Where to start?
Expressiveness of a DT
Continuing the restaurant example - a possible DT (Fig 18.4)
The decision tree language is essentially
propositional, with each attribute test being a
proposition.
Any Boolean function can be written as a
decision tree (truth tables <-> DTs)
DTs represent many functions with much smaller
trees, but some Boolean functions (e.g., parity,
majority) have no compact tree
How many different functions are in the set of
all Boolean functions on n attributes?
How to find consistent hypotheses in the space
of all possible ones?
And which one is most likely the best?
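A standard counting argument (not spelled out on the slide) answers the first question: a truth table over n Boolean attributes has 2^n rows, and each row can independently be labelled true or false, so the number of distinct Boolean functions is
```latex
\bigl|\{\, f : \{0,1\}^n \to \{0,1\} \,\}\bigr| \;=\; 2^{2^n},
\qquad \text{e.g. } n = 6 \;\Rightarrow\; 2^{64} \approx 1.8 \times 10^{19}.
```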
Inducing DTs from examples
Extracting a pattern (DTs) means being able to
describe a large number of cases in a concise
way - a consistent & concise tree.
Applying Occam’s razor: the most likely
hypothesis is the simplest one that is consistent
with all observations.
How to find the smallest DT? Finding it exactly is intractable, so use a greedy heuristic:
Examine the most important attribute first (Fig 18.6)
Algorithm (Fig 18.7, page 537; sketched below)
A DT (Fig 18.8)
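A minimal Python sketch of the recursive scheme in Fig 18.7 (the function and helper names, and the dict-of-attributes example format, are my own; importance is measured by the information gain defined on the next slides):
```python
import math
from collections import Counter

def plurality_value(examples):
    """Most common classification among the examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def entropy(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(attr, examples):
    """How much testing `attr` reduces the information still needed (see next slides)."""
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [label for x, label in examples if x[attr] == value]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return entropy([label for _, label in examples]) - remainder

def decision_tree_learning(examples, attributes, parent_examples=()):
    if not examples:                                # no examples left: use parent's plurality
        return plurality_value(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:                            # all examples agree: return that class
        return labels.pop()
    if not attributes:                              # no attributes left to test
        return plurality_value(examples)
    best = max(attributes, key=lambda a: information_gain(a, examples))
    tree = {best: {}}                               # internal node testing `best`
    for value in {x[best] for x, _ in examples}:
        exs = [e for e in examples if e[0][best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = decision_tree_learning(exs, rest, examples)
    return tree
```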
Choosing the best attribute
A computational method - information theory
Information - informally, the more surprise you have,
the more information you have; mathematically,
I(P(v1),…,P(vn)) = sum over i of -P(vi) log2 P(vi)
I(1/2,1/2) = 1
I(0,1) = I(1,0) = 0
The information content alone does not yet tell us
which attribute best helps answer "what is the correct classification?".
Information gain - the difference between
the original and the new information requirement:
Remainder(A) = p1*I(branch 1) + … + pn*I(branch n),
where pi is the fraction of examples sent down branch i of attribute A (p1+…+pn = 1)
Gain(A) = I(before the test) - Remainder(A)
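A quick numeric check of these formulas on the restaurant data, using the Patrons split as I recall it from the AIMA example (None 0+/2-, Some 4+/0-, Full 2+/4-, out of 6 positive and 6 negative examples; verify against Fig 18.5):
```python
import math

def I(*ps):
    # Entropy of a distribution: I(1/2, 1/2) = 1 bit, I(0, 1) = I(1, 0) = 0 bits.
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Remainder(Patrons): each branch weighted by the fraction of examples it receives.
remainder = (2/12) * I(0/2, 2/2) + (4/12) * I(4/4, 0/4) + (6/12) * I(2/6, 4/6)
gain = I(6/12, 6/12) - remainder      # information before the test minus Remainder
print(round(gain, 3))                 # ~0.541 bits: Patrons is a highly informative test
```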
Which attribute?
Revisit the example of “Wait” or “Not Wait”
using your favorite 2 attributes.
Assessing the
performance
A fair assessment uses examples the learner has not seen.
Errors
Training and test sets:
Divide the data into two sets
Learn on the training set
Test on the test set
If necessary, shuffle the data and repeat
Learning curve - “happy graph” (Fig 18.9)
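A minimal sketch of this split/train/test loop; scikit-learn and the iris dataset are assumptions made only so the sketch runs, not part of the course material.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = []
for seed in range(5):                                    # shuffle and repeat
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)   # learn on training set
    scores.append(tree.score(X_te, y_te))                # accuracy on unseen test set
print(sum(scores) / len(scores))                         # average held-out accuracy
```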
Practical use of DT learning
BP’s use of GASOIL
Learning to fly on a flight simulator
An industrial strength system - Quinlan’s
C4.5
Who’s the next hero?
Some issues of DT
applications
Missing values
Multivalued attributes
Continuous-valued attributes
Learning general logical
descriptions
Inductive learning is a process of
searching for a good hypothesis in the
hypothesis space defined by the
representation language.
There are logical connections among
examples, hypotheses, and the goal.
Go beyond decision tree induction.
What’re Goal and Hypotheses
Goal predicate Q - WillWait
Learning is to find an equivalent logical
expression with which we can classify examples
Each hypothesis proposes such an expression - a
candidate definition of Q.
E.g., Fig 18.8 expresses the following (Hr):
Hypothesis space is the set of all hypotheses the
learning algorithm is designed to entertain.
One of the hypotheses is correct:
H1 V H2 V…V Hn
Each Hi predicts a certain set of examples - the
extension of the goal predicate.
Two hypotheses with different extensions are
logically inconsistent with each other; otherwise,
they are logically equivalent.
What are Examples
An example is an object of some logical
description to which the goal concept may or
may not apply.
One instance/tuple is an example
Ideally, we want to find a hypothesis that agrees
with all the examples.
The possible relations between f and h on an example are: ++, --, +- (false negative), -+ (false positive). In the last two cases, the example and h are logically inconsistent.
Current-best hypothesis
search
Maintain a single hypothesis
Adjust it as new examples arrive to
maintain consistency (Fig 18.10)
Generalization for positive examples
Specialization for negative examples
Algorithm
(Fig 18.11, page 547)
Each time a new example arrives, the hypothesis
must be checked for consistency with all
previously seen examples (see the simplified sketch below)
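A much-simplified sketch of this loop (my own construction, not Fig 18.11): the hypothesis is a conjunction of attribute=value tests drawn from the first positive example, generalization drops tests, specialization adds them back, and every update is checked against all examples seen so far.
```python
from itertools import combinations

def predicts_positive(h, x):
    # h: frozenset of (attribute, value) tests; classify positive iff all of them hold.
    return all(x.get(a) == v for a, v in h)

def consistent(h, examples):
    return all(predicts_positive(h, x) == label for x, label in examples)

def current_best_learning(examples):
    examples = list(examples)
    first_pos = next(x for x, label in examples if label)
    base = list(first_pos.items())          # test vocabulary (a simplification of this sketch)
    h, seen = frozenset(base), []           # start with the most specific hypothesis
    for x, label in examples:
        seen.append((x, label))
        if predicts_positive(h, x) == label:
            continue                        # already consistent with this example
        # False negative -> drop tests (generalize); false positive -> add tests back
        # (specialize). Either way, take the minimally changed hypothesis consistent
        # with every example seen so far.
        candidates = sorted((frozenset(c) for r in range(len(base) + 1)
                             for c in combinations(base, r)),
                            key=lambda c: len(c ^ h))
        h = next((c for c in candidates if consistent(c, seen)), None)
        if h is None:
            raise ValueError("no consistent conjunction; the full algorithm would backtrack")
    return h

examples = [({"Patrons": "Full", "Hungry": "Yes"}, True),
            ({"Patrons": "None", "Hungry": "No"}, False),
            ({"Patrons": "Full", "Hungry": "No"}, True)]
print(current_best_learning(examples))      # frozenset({('Patrons', 'Full')})
```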
Example of WillWait
Problems: nondeterministic; no guarantee of the
simplest or a correct h; backtracking may be needed
Least-commitment search
Keeping only one h as the best guess is the problem
-> can we keep all the consistent ones instead?
Version space (candidate elimination) algorithm
incremental
least-commitment
From intervals to boundary sets
G-set and S-set
Everything in between is guaranteed to be consistent
with the examples.
Version space
Generalization and specialization (Fig 18.13)
False positive for Si: too general, discard it
False negative for Si: too specific, generalize it minimally
False positive for Gi: too general, specialize it minimally
False negative for Gi: too specific, discard it
When to stop
One concept left (Si = Gi)
The version space collapses (no hypothesis is consistent with all examples)
Run out of examples
One major problem: can’t handle noise
Why does learning work?
How can one possibly know that his/her learning
algorithm will correctly predict the future?
How do we know that h is close enough to f
without knowing f?
Computational learning theory has provided some
answers. The basic idea is that any seriously wrong h
will make incorrect predictions, so with high probability
it will be found out after a small number of examples. Thus, if h
is consistent with a sufficiently large number of examples, it is
unlikely to be seriously wrong - it is probably approximately
correct.
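The standard quantitative version of this argument (a textbook result, not shown on the slide): for a hypothesis space H, to ensure that with probability at least 1 - delta the learned h has error at most epsilon, it suffices to see
```latex
m \;\ge\; \frac{1}{\varepsilon}\left(\ln\frac{1}{\delta} + \ln\lvert H\rvert\right)
```
examples, which is why a consistent h found from enough data is probably approximately correct.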
Summary
Learning is essential for intelligent agents
dealing with the unknowns
improving their capabilities over time
All types of learning can be considered as
learning an accurate representation h of f.
Inductive learning - inducing h from examples of f
Decision trees - deterministic Boolean functions
Learning logical theories - Current Best and
Least Commitment