Two perspectives of Machine Learning:
Machine Learning for advanced data analysis
Machine Learning for robust artificial agents
[Diagram: a generic learning system in which the Environment feeds a Learning Component that updates a Knowledge Base used by a Performance Component; performance ("quality") is plotted against training or time, with curves annotated "pessimism" and "optimism".]
Machine Learning Overview
Empirical, Supervised Learning
    Example: Naïve Bayesian Classifiers
    Subclass: Supervised Rule Induction
        Example: Decision tree induction
        Example: Brute-force induction of decision rules
Empirical, Unsupervised Learning
    Unsupervised Rule Induction
        Association Rule Learning
    Bayesian Network Learning
    Clustering
Analytical Learning
    Explanation-Based Learning
Empirical/Analytic Hybrids
Empirical (aka data-driven) Supervised Learning
Given: a set of classified objects (a training data set)
Find: a classifier (for predicting class membership of unclassified
data – a test set)
An example training set:

Index    V1     V2     V3    . .    Vm     C
1        v11    v21    v31   . .    vm1    c1
2        v12    v22    v32   . .    vm2    c2
. .      . .    . .    . .   . .    . .    . .
n        v1n    v2n    v3n   . .    vmn    cn
Subsequent example (but NOT general) assumptions:
two values per variable, two classes
[Diagram: supervised learning pipeline. The Environment supplies labeled data (e.g., v11 v21 v32 ... vm2 c1) to the Learning Component, which produces a Classifier; the Performance Component applies the classifier to unlabeled data (v11 v21 v32 ... vm2 c?), and accuracy on test data is plotted against the amount of training.]
Example: Decision tree classifiers (e.g., Recommender systems)
[ SciFi = -1, Terror = 1, Romance = -1, Ebert = 1, Siskel = 1,….]
[Figure: a decision tree whose internal nodes test attributes such as Ebert, Siskel, SciFi, BigStar, Terror, Romance, and Spouse, with branches labeled -1 and 1 and leaves predicting Rent-it or ~Rent-it.]

Social and/or content attributes; thumbs up/down or ratings
Learning a Decision Tree: standard greedy approach
(Top-Down Induction of Decision Trees)
Node TDIDT(Set Data, int (*TerminateFn)(Set), Variable (*SelectFn)(Set))
{
    if ((*TerminateFn)(Data)) return ClassNode(Data);
    BestVariable = (*SelectFn)(Data);
    return TestNode(BestVariable,
                    /* branch for value v1 */ TDIDT({d | d in Data and Value(BestVariable, d) = v1}),
                    /* branch for value v2 */ TDIDT({d | d in Data and Value(BestVariable, d) = v2}));
}
This is not the only way to learn a decision tree !!
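To make the pseudocode concrete, here is a minimal Python sketch (mine, not the slide's code) for binary +/-1 attributes with majority-class leaves; SelectFn and TerminateFn are passed in as functions, and the trivial placeholders below stand in for the measures discussed on the next slides.

    from collections import Counter

    # data: list of (vector, label) pairs with attribute values in {-1, 1};
    # attributes: list of indices into the vectors.
    def tdidt(data, attributes, select_fn, terminate_fn):
        if not data:
            return ("leaf", None)                       # no data reached this branch
        if terminate_fn(data, attributes):
            return ("leaf", Counter(c for _, c in data).most_common(1)[0][0])
        best = select_fn(data, attributes)
        rest = [a for a in attributes if a != best]
        split = {v: [(x, c) for x, c in data if x[best] == v] for v in (-1, 1)}
        return ("test", best,
                tdidt(split[-1], rest, select_fn, terminate_fn),    # branch for value -1
                tdidt(split[1], rest, select_fn, terminate_fn))     # branch for value 1

    def pure_or_out_of_attributes(data, attributes):    # placeholder TerminateFn
        return len({c for _, c in data}) <= 1 or not attributes

    tdidt([((-1, 1), "c1"), ((1, 1), "c2"), ((1, -1), "c2")],
          attributes=[0, 1],
          select_fn=lambda d, attrs: attrs[0],          # placeholder SelectFn
          terminate_fn=pure_or_out_of_attributes)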
Attribute Selection
BestVariable = (* SelectFn)(Data)
The big picture on attribute selection:
• if Vi and C are independent, Vi is least ("minimally") informative
• if each value of Vi is associated with exactly one C, Vi is most ("maximally") informative
• most cases are somewhere in between

V1 (independent of C):
         C1    C2
v11      50    50    100
v12      20    20     40
         70    70    140

V2 (each value associated with exactly one class):
         C1    C2
v21      70     0     70
v22       0    70     70
         70    70    140

V3 (somewhere in between):
         C1    C2
v31      50    60    110
v32      20    10     30
         70    70    140
Selecting the best divisive attribute (SelectFn):

Attribute Vi that minimizes:

    Σ_j P(Vi = vij) * Σ_k P(Ck | Vi = vij) * |log P(Ck | Vi = vij)|

• |log P(Ck | Vi = vij)| is the number of bits necessary to encode Ck conditioned on Vi = vij
• the inner sum (over k) is the expected number of bits necessary to encode C membership conditioned on Vi = vij
• weighting by P(Vi = vij) and summing over j gives the expected number of bits necessary to encode C conditioned on knowledge of Vi's value
• treat 0 * log 0 as 0, else a runtime error will be generated (log 0 is undefined)
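As a quick check of the measure, here is a small Python sketch (mine, not from the slides) applied to the V1/V2/V3 class-count tables on the previous slide; it confirms that V2 minimizes the expected number of bits.

    import math

    # Class counts per attribute value, taken from the V1/V2/V3 tables above:
    # each entry maps a value vij to [count of C1, count of C2].
    tables = {
        "V1": {"v11": [50, 50], "v12": [20, 20]},
        "V2": {"v21": [70, 0],  "v22": [0, 70]},
        "V3": {"v31": [50, 60], "v32": [20, 10]},
    }

    def expected_bits(table):
        """Sum_j P(Vi=vij) * Sum_k P(Ck|Vi=vij) * |log2 P(Ck|Vi=vij)|."""
        n = sum(sum(counts) for counts in table.values())
        total = 0.0
        for counts in table.values():
            nj = sum(counts)                      # number of data with Vi = vij
            for c in counts:
                p = c / nj                        # P(Ck | Vi = vij)
                if p > 0:                         # treat 0 * log 0 as 0
                    total += (nj / n) * p * abs(math.log2(p))
        return total

    for name, table in tables.items():
        print(name, round(expected_bits(table), 3))   # V1 1.0, V2 0.0, V3 0.978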
Selecting the best divisive attribute (an alternate measure):

Attribute Vi that maximizes:

    Σ_j P(Vi = vij) * Σ_k P(Ck | Vi = vij)^2
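Using the same table representation as the previous sketch, only the per-value statistic changes, and it is maximized rather than minimized:

    def expected_squared(table):
        """Sum_j P(Vi=vij) * Sum_k P(Ck|Vi=vij)^2, to be maximized."""
        n = sum(sum(counts) for counts in table.values())
        return sum((sum(counts) / n) * sum((c / sum(counts)) ** 2 for c in counts)
                   for counts in table.values())

    # On the V1/V2/V3 tables: V1 -> 0.5, V2 -> 1.0, V3 -> ~0.52; again V2 is chosen.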
Deciding when to stop expansion
(*TerminateFn)(Data)
Prospective Pruning
Assume that a decision tree has been constructed from training data, and it includes a node at the frontier of the tree that tests on V, with its left child yielding a prediction of class C1 (because the only two training data there are C1) and its right child predicting C2 (because the only training datum there is C2). The situation is illustrated here:

    V
      branch -1: leaf with counts C1 (2), C2 (0)
      branch  1: leaf with counts C1 (0), C2 (1)
Suppose that during subsequent use, it is found that
i. a large # of items (N > 1000) are classified to the node described above
ii. 50% of these have V= -1 and 50% of these have V = 1
iii. post classification analysis shows that of the N items reaching the node during usage, 67% were
C1 and 33% were C2
iv. of the 0.5 * N items that went to the left leaf during usage, 67% were C1 and 33% were C2
v. of the 0.5 * N items that went to the right leaf during usage, 67% were also C1 and 33% were C2
Deciding when to stop expansion
(*TerminateFn)(Data)
Prospective Pruning
Given the usage statistics (i)-(v) listed above:

Unpruned subtree (test on V):
    branch -1 leaf: C1 (2), C2 (0); 50% of items go here; predict C1; wrong 33% of the time
    branch  1 leaf: C1 (0), C2 (1); 50% of items go here; predict C2; wrong 67% of the time
    error rate = 0.5(0.33) + 0.5(0.67) = 0.5

Pruned to a single leaf: C1 (2), C2 (1); 100% of items go here; predict C1; wrong 33% of the time
    error rate = 1.0(0.33) = 0.33

So pruning away the test on V reduces the expected error rate from 0.5 to 0.33.
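The comparison above generalizes to a simple rule: keep the subtree only if its expected error (children weighted by the fraction of items they receive) beats the error of the single pruned leaf. A minimal sketch (a hypothetical helper, not from the slides):

    def should_prune(child_stats, pruned_error):
        """child_stats: list of (fraction of items reaching the child, child error rate)."""
        subtree_error = sum(frac * err for frac, err in child_stats)
        return pruned_error <= subtree_error

    # The example above: two children each seeing 50% of the items, wrong 33% and
    # 67% of the time; the single pruned leaf is wrong 33% of the time.
    should_prune([(0.5, 0.33), (0.5, 0.67)], pruned_error=0.33)   # True (0.33 <= 0.5)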
Variations of DT Induction
Regression trees predict values along a continuously-valued dependent variable
[Figure: a regression tree over one variable, with an illustration from the Srinivasan and Fisher (1995) IEEE Software Engineering paper on estimating software development time (http://dl.acm.org/citation.cfm?id=205309).]
Variations of DT Induction

[Figure: the regression tree over one variable, again illustrated with the IEEE Software Engineering paper on estimating software development time.]

We also discussed using linear regression at each regression tree leaf instead of using zero-order models (i.e., h(x) = θ0) at each leaf. For example, using a linear regression model over x, we might have a different linear model at each of two leaves of the regression tree. To make a prediction of y for a given x, we would classify the x to a leaf and then use the linear model over that leaf to estimate y by h(x).
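A minimal sketch of this idea (hypothetical numbers, not the paper's model): each leaf of a one-variable regression tree stores coefficients θ0 and θ1, and a zero-order leaf is simply the special case θ1 = 0.

    # Internal nodes route x by a threshold; leaves hold (theta0, theta1).
    def predict(node, x):
        if "split" in node:
            child = node["left"] if x <= node["split"] else node["right"]
            return predict(child, x)
        theta0, theta1 = node["model"]
        return theta0 + theta1 * x            # h(x); theta1 = 0 gives a zero-order leaf

    tree = {"split": 5.0,
            "left":  {"model": (2.0, 0.0)},   # constant leaf, h(x) = 2.0
            "right": {"model": (1.0, 0.5)}}   # linear leaf,  h(x) = 1.0 + 0.5 x
    predict(tree, 8.0)                        # -> 5.0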
Variations
• continuous attributes
    • hard versus soft splits
• other node types (e.g., perceptron trees)
• continuous classes (regression trees)
• forests of trees (ensembles)
• termination conditions (pruning)
• selection measures (see problem DT1)
• missing values
    • during training
    • during classification
• noise in data
• irrelevant attributes
• less greedy variants (e.g., lookahead)
• incremental construction
• applications (e.g., Banding)
• cognitive modeling (e.g., Hunt)
• DT-based approaches to nearest neighbor search, object recognition
• background knowledge to augment feature space
Auxiliary Slides
A worked TDIDT trace. Data: eight data, each a dVector over V1 … V4 plus a dClass:

Datum    V1    V2    V3    V4    dClass
d1        1    -1     1     1    c1 (or -1)
d2       -1     1    -1     1    c1
d3       -1    -1    -1     1    c1
d4       -1    -1     1    -1    c1
d5       -1     1     1    -1    c2 (or 1)
d6       -1    -1     1     1    c2
d7       -1     1    -1    -1    c2
d8        1     1     1    -1    c2

A Datum, d, is one row; a Set of Datum is the whole table. The test attribute might actually be represented by an index into the datum vector (e.g., instead of V4, a 3 might be given here, indicating a test of location 3 of a datum vector indexed from 0 to 3).

Best attribute at the root: V4. Assume the left branch always corresponds to -1 and the right branch always corresponds to 1. An internal node is written "4 V4 4": the two integers are the number of data sent down the left and right branches, respectively.

4 V4 4
    left  (V4 = -1): TDIDT([-1-11-1 c1, -111-1 c2, -11-1-1 c2, 111-1 c2])
    right (V4 =  1): TDIDT([1-111 c1, -11-11 c1, -1-1-11 c1, -1-111 c2])

Left subtree (V4 = -1): best attribute V2, giving the node "1 V2 3":
    left  (V2 = -1): TDIDT([-1-11-1 c1])
    right (V2 =  1): TDIDT([-111-1 c2, -11-1-1 c2, 111-1 c2])

The V2 = -1 call terminates in the leaf "0 C1 1" (i.e., predict -1); at a leaf the right integer is the number of data at the leaf in the predicted class and the left integer is the number not in that class. The V2 = 1 call terminates in the leaf "0 C2 3" (i.e., predict 1).

Right subtree (V4 = 1): best attribute V3, giving the node "2 V3 2":
    left  (V3 = -1): TDIDT([-11-11 c1, -1-1-11 c1])  →  leaf "0 C1 2"
    right (V3 =  1): TDIDT([1-111 c1, -1-111 c2])

For the V3 = 1 call the best attribute is V1, giving the node "1 V1 1":
    left  (V1 = -1): TDIDT([-1-111 c2])  →  leaf "0 C2 1"
    right (V1 =  1): TDIDT([1-111 c1])   →  leaf "0 C1 1"

The completed tree:

4 V4 4
    V4 = -1: 1 V2 3
        V2 = -1: 0 C1 1
        V2 =  1: 0 C2 3
    V4 =  1: 2 V3 2
        V3 = -1: 0 C1 2
        V3 =  1: 1 V1 1
            V1 = -1: 0 C2 1
            V1 =  1: 0 C1 1

In general, it might appear that the left integer field of a leaf will always be 0, but some termination functions allow "non-pure" leaves (e.g., when no split changes the class distribution significantly).
Decomposition and search are important principles in machine learning

[Figures (three build-up slides): data points scattered in the X-Y plane, then partitioned by axis-parallel splits at X = n and Y = m.]
Lots of different search algorithms possible !!
Ensembles of classifiers
Other supervised approaches: ANNs, SVMs, …
Relational (e.g., first-order) representations, such as:

    IF R(?c1, ?r1) ∧ R(?c2, ?r1) ∧ R(?c3, ?r2) ∧ R(?c4, ?r2) ∧ R(?c5, ?r2)
       ∧ ≠(?c1, ?c2) ∧ ≠(?c3, ?c4) ∧ ≠(?c3, ?c5) ∧ ≠(?c4, ?c5)
    THEN FullHouse(?c1, ?c2, ?c3, ?c4, ?c5)

[Figure: several five-card hands, e.g., KH KC 5S 5C 5D; 2C 2D 2H 9D 9H; 6S 6H 7C 7D 7H; AC AH 3C 3D 3H, each satisfying the FullHouse relation.]

The matching problem (on sets of feature vectors)
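As a toy illustration of this matching problem, a hand can be tested against the FullHouse rule by searching over assignments of its cards to ?c1 … ?c5; using each card exactly once plays the role of the ≠ constraints. A minimal sketch:

    from itertools import permutations

    # Cards are (rank, suit) pairs; R(?c, ?r) is read as "card ?c has rank ?r".
    def full_house(hand):
        return any(c1[0] == c2[0] and c3[0] == c4[0] == c5[0]
                   for c1, c2, c3, c4, c5 in permutations(hand))

    full_house([("K", "H"), ("K", "C"), ("5", "S"), ("5", "C"), ("5", "D")])   # True
    full_house([("K", "H"), ("K", "C"), ("5", "S"), ("5", "C"), ("2", "D")])   # False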
Empirical, Supervised Learning
    Example: Naïve Bayesian Classifiers
    Subclass: Supervised Rule Induction
        Example: Decision tree induction
        Example: Brute-force induction of decision rules
Empirical, Unsupervised Learning
    Unsupervised Rule Induction
        Association Rule Learning
    Bayesian Network Learning
    Clustering
Analytical Learning
    Explanation-Based Learning
Empirical/Analytic Hybrids
Unsupervised Performance Task: Pattern Completion

[Figure: an incomplete datum 0.5 ? 0.01 -0.12 …. ? is matched against a Knowledge Base containing clusters C' and C'' (with sub-structure such as c'1 and c''2); its missing values are filled in, yielding the completed datum 0.5 0.75 0.01 –0.12 …. –0.45.]
[Diagram: unsupervised learning pipeline. The Environment supplies data to the Learning Component, which builds a Knowledge Base; the Performance Component uses it to complete incomplete data (0.5 ? 0.01 -0.12 …. ? becomes 0.5 0.75 0.01 –0.12 …. –0.45), and accuracy on test data is plotted against the amount of training, with separate curves (A, B, C) for different attributes.]
Example: Unsupervised rule induction of Association Rules (market-basket analysis)

In a nutshell: run brute-force rule discovery for all possible consequents, not simply single variable values (e.g., V1=v12) but consequents that are conjunctions of variable values (e.g., V1=v12 & V4=v42 & V5=v51).

Retain rules A → C such that P(A & C) >= T1 and P(C|A) >= T2. These thresholds enable pruning of the search space (A and C are themselves conjunctions); see the sketch after this slide.

Problem: a plethora of rules, most of them uninteresting, are produced.

Solutions: organize/prune rules by
a) interestingness (e.g., A → C is interesting if P(A, C) >> P(A)P(C) or P(A, C) << P(A)P(C))
b) confidence (a confidence interval around coverage and/or accuracy)
c) support for a top-level goal
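A brute-force sketch of the idea (not an optimized Apriori implementation; for brevity the consequent here is a single item, although the slide allows conjunctions). The baskets and item names are made up for illustration.

    from itertools import combinations

    def association_rules(transactions, T1=0.3, T2=0.8, max_antecedent=2):
        n = len(transactions)
        items = sorted({i for t in transactions for i in t})
        rules = []
        for k in range(1, max_antecedent + 1):
            for a in combinations(items, k):
                A = set(a)
                n_a = sum(A <= t for t in transactions)          # support count of A
                if n_a == 0:
                    continue
                for c in items:
                    if c in A:
                        continue
                    n_ac = sum((A | {c}) <= t for t in transactions)
                    if n_ac / n >= T1 and n_ac / n_a >= T2:      # P(A & C) and P(C | A)
                        rules.append((A, c, n_ac / n, n_ac / n_a))
        return rules

    baskets = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"milk"}, {"bread", "milk"}]
    association_rules(baskets)   # [({'bread'}, 'milk', 0.75, 1.0)]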
Example (Empirical, Unsupervised): Learning Bayesian Networks

[Figure: a five-variable network with edges v1 → v3, v3 → v5, v2 → v4, and v3 → v4, and node tables P(v1); P(v2); P(v3|v1), P(v3|~v1); P(v4|v2,v3), P(v4|v2,~v3), P(v4|~v2,v3), P(v4|~v2,~v3); P(v5|v3), P(v5|~v3).]

Components of a Bayesian Network: a topology (a graph) that qualitatively displays the conditional independencies, and a probability table at each node.

Semantics of the graphical component: each variable v is independent of all of its non-descendants conditioned on its parents.

A Bayesian Network is a graphical representation of a joint probability distribution with (conditional) independence relationships made explicit.
Recall the chain rule (assume each Vi is a binary-valued variable, T or F):

P(v1 and v2 and ~v3 and v4 and ~v5)
  = P(v1) P(v2|v1) P(~v3|v1,v2) P(v4|v1,v2,~v3) P(~v5|v1,v2,~v3,v4)        (one factorization ordering)

(the partial products along the way are P(v1,v2), P(v1,v2,~v3), P(v1,v2,~v3,v4), and P(v1,v2,~v3,v4,~v5))

An alternative ordering:

P(v1 and v2 and ~v3 and v4 and ~v5)
  = P(v4) P(v2|v4) P(~v3|v4,v2) P(v1|v4,v2,~v3) P(~v5|v4,v2,~v3,v1)
P(v1 and v2 and ~v3 and v4 and ~v5)
  = P(v1) P(v2|v1) P(~v3|v1,v2) P(v4|v1,v2,~v3) P(~v5|v1,v2,~v3,v4)        (a factorization ordering)

Assume the following conditional independencies (P(v1) needs no assumption):

P(v2|v1) = P(v2)     (v2 independent of v1)
    and P(v2|~v1) = P(v2), P(~v2|v1) = P(~v2), P(~v2|~v1) = P(~v2)

P(~v3|v1,v2) = P(~v3|v1)     (v3 independent of v2 conditioned on v1)
    and P(~v3|v1,~v2) = P(~v3|v1), P(~v3|~v1,v2) = P(~v3|~v1), P(~v3|~v1,~v2) = P(~v3|~v1),
        P(v3|v1,v2) = P(v3|v1), P(v3|v1,~v2) = P(v3|v1), P(v3|~v1,v2) = P(v3|~v1), P(v3|~v1,~v2) = P(v3|~v1)

P(v4|v1,v2,~v3) = P(v4|v2,~v3) and ……

P(~v5|v1,v2,~v3,v4) = P(~v5|~v3) and …..
P(v1 and v2 and ~v3 and v4 and ~v5)
  = P(v1) P(v2|v1) P(~v3|v1,v2) P(v4|v1,v2,~v3) P(~v5|v1,v2,~v3,v4)
  = P(v1) P(v2) P(~v3|v1) P(v4|v2,~v3) P(~v5|~v3)

How many probabilities need be stored?

P(v1), P(~v1): 2 probabilities (actually only one, since P(~v1) = 1 – P(v1))

P(v2|v1) = P(v2), P(v2|~v1) = P(v2), P(~v2|v1) = P(~v2), P(~v2|~v1) = P(~v2):
    2 probabilities (or 1) instead of 4 (or 2)

P(~v3|v1,v2) = P(~v3|v1), with P(~v3|v1) = 1 – P(v3|v1),
    and P(~v3|v1,~v2) = P(~v3|v1), P(~v3|~v1,v2) = P(~v3|~v1), P(~v3|~v1,~v2) = P(~v3|~v1),
        P(v3|v1,v2) = P(v3|v1), P(v3|v1,~v2) = P(v3|v1), P(v3|~v1,v2) = P(v3|~v1), P(v3|~v1,~v2) = P(v3|~v1):
    4 probabilities (or 2) instead of 8 (or 4)

P(v4|v1,v2,~v3) = P(v4|v2,~v3) and ……:
    8 probabilities (or 4) instead of 16 (or 8)

P(~v5|v1,v2,~v3,v4) = P(~v5|~v3) and …..:
    4 probabilities (or 2) instead of 32 (or 16)
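A small sketch with made-up table entries (except P(v1) = 0.75, which appears on the next slide), just to show how the five factored tables reproduce the joint:

    # P(v1, v2, v3, v4, v5) = P(v1) P(v2) P(v3|v1) P(v4|v2,v3) P(v5|v3)
    P_v1 = 0.75
    P_v2 = 0.6
    P_v3_given_v1  = {True: 0.4, False: 0.2}                      # keyed by v1
    P_v4_given_v23 = {(True, True): 0.9, (True, False): 0.5,
                      (False, True): 0.3, (False, False): 0.1}    # keyed by (v2, v3)
    P_v5_given_v3  = {True: 0.7, False: 0.25}                     # keyed by v3

    def p(value, prob_true):            # P(V = value) from P(V = True)
        return prob_true if value else 1.0 - prob_true

    def joint(v1, v2, v3, v4, v5):
        return (p(v1, P_v1) * p(v2, P_v2) * p(v3, P_v3_given_v1[v1])
                * p(v4, P_v4_given_v23[(v2, v3)]) * p(v5, P_v5_given_v3[v3]))

    # P(v1 and v2 and ~v3 and v4 and ~v5) = 0.75 * 0.6 * 0.6 * 0.5 * 0.75 ≈ 0.101
    joint(True, True, False, True, False)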
For a particular factorization ordering, construct a Bayesian network as follows:

v1 is a "root", with table P(v1), P(~v1); e.g., P(v1) = 0.75 and P(~v1) = 0.25 = 1 – P(v1).

v2 is the second variable in the ordering. If v2 is independent of a subset of its predecessors in the ordering (possibly the empty set) conditioned on a disjoint subset of those predecessors (possibly all of them), then the latter subset becomes v2's parents; if that latter subset is empty, then v2 is also a "root".

Since P(v2|v1) = P(v2), v2 has no parents:

[Figure: two root nodes, v1 with table P(v1) and v2 with table P(v2).]
v3 is the third variable in the ordering. Since P(v3|v1,v2) = P(v3|v1), v1 alone is v3's parent:

[Figure: edge v1 → v3 added; v3's table holds P(v3|v1) and P(v3|~v1), with P(~v3|v1) = 1 – P(v3|v1) and P(~v3|~v1) = 1 – P(v3|~v1).]

Since P(v4|v1,v2,v3) = P(v4|v2,v3), v2 and v3 are v4's parents:

[Figure: edges v2 → v4 and v3 → v4 added; v4's table holds P(v4|v2,v3), P(v4|v2,~v3), P(v4|~v2,v3), P(v4|~v2,~v3), with P(~v4|v2,v3) = 1 – P(v4|v2,v3), etc.]
Since P(v5|v1,v2,v3,v4) = P(v5|v3), v3 alone is v5's parent, completing the network:

[Figure: the full network, with edges v1 → v3, v3 → v5, v2 → v4, v3 → v4 and node tables P(v1); P(v2); P(v3|v1), P(v3|~v1); P(v4|v2,v3), P(v4|v2,~v3), P(v4|~v2,v3), P(v4|~v2,~v3); P(v5|v3), P(v5|~v3).]

Components of a Bayesian Network: a topology (a graph) that qualitatively displays the conditional independencies, and a probability table at each node.

Semantics of the graphical component: each variable v is independent of all of its non-descendants conditioned on its parents.
Where does knowledge of conditional independence come from?

a) From data. Consider congressional voting records. Suppose that we have data on House votes (and political party), and suppose the variables are ordered Party, Immigration, StarWars, ….

Party: P(Republican) = 0.52 (226/435 Republicans, 209/435 Democrats)

To determine the relationship between Party and Immigration, we count:

Actual counts (Immigration):
              Yes    No
Republican     17   209
Democrat      160    49

Predicted counts if Immigration and Party were independent:
              Yes    No
Republican     92   134
Democrat       85   124

(e.g., predicted Republican/Yes count = P(Rep) * P(Yes) * 435 = 0.52 * (17+160)/435 * 435 ≈ 92)

Very different distributions – conclude that Party and Immigration are dependent.
So far the network is Party → Immigration, with
    P(Republican) = 0.52 (226/435 Republicans, 209/435 Democrats)
    P(Immigration=Yes | Rep) = 17/226 = 0.075, P(Immigration=Yes | Dem) = 0.765
(actual Immigration-by-Party counts as above).

Now consider StarWars.

Is StarWars independent of Party and Immigration (i.e., is P(StarWars | Party, Immigration) approximately equal to P(StarWars) for all combinations of variable values)?
    If yes, then stop and make StarWars a "root"; else continue.
Is StarWars independent of Immigration conditioned on Party?
    If yes, then stop and make StarWars a child of Party; else continue.
Is StarWars independent of Party conditioned on Immigration?
    If yes, then stop and make StarWars a child of Immigration; else continue.
Otherwise, make StarWars a child of both Party and Immigration.
Is StarWars independent of Party and Immigration? Count the StarWars votes:

Actual counts (StarWars by Party):
              Yes    No
Republican    219     7
Democrat       24   185

Actual counts (StarWars within Party × Immigration):
                  Immigration = Yes        Immigration = No
                  SW=Yes    SW=No          SW=Yes    SW=No
Republican           14        3              205        4
Democrat              8      152               16       33

Predicted counts if StarWars were independent of Party and Immigration
(each cell is P(Party & Immigration) * P(StarWars) * 435, e.g., P(Rep & Imm=Y) * P(SW=Y) * 435):
                  Immigration = Yes        Immigration = No
                  SW=Yes    SW=No          SW=Yes    SW=No
Republican          9.5      7.5              117       92
Democrat             89       71               27       22

The actual and predicted counts are different – StarWars is not independent of Party and Immigration.
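A small sketch reproducing the predicted counts above: if StarWars were independent of Party and Immigration, each three-way cell count would be approximately P(Party & Immigration) * P(StarWars) * N.

    actual = {                      # (party, immigration, starwars) -> count
        ("Rep", "Y", "Y"): 14,  ("Rep", "Y", "N"): 3,
        ("Rep", "N", "Y"): 205, ("Rep", "N", "N"): 4,
        ("Dem", "Y", "Y"): 8,   ("Dem", "Y", "N"): 152,
        ("Dem", "N", "Y"): 16,  ("Dem", "N", "N"): 33,
    }
    N = sum(actual.values())                               # 435
    n_pi, n_sw = {}, {}                                    # marginal counts
    for (party, imm, sw), c in actual.items():
        n_pi[(party, imm)] = n_pi.get((party, imm), 0) + c
        n_sw[sw] = n_sw.get(sw, 0) + c

    predicted = {cell: n_pi[cell[:2]] * n_sw[cell[2]] / N for cell in actual}
    predicted[("Rep", "Y", "Y")]   # ~9.5, versus an actual count of 14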
Further tests might indicate the structure Party → Immigration and Party → StarWars, i.e., that Immigration and StarWars are independent conditioned on Party.
Where does knowledge of conditional independence come from?

b) "First principles"

For example, suppose that the groundskeeper sets sprinkler timers to a fixed schedule that depends on the season (Summer, Winter, Spring, Fall), and suppose that the probability that it rains or not also depends on the season. We might write:

[Figure: a network over Season, Rains, and Sprinkler in which both Rains and Sprinkler have Season as their parent.]

This model might differ from one in which a homeowner manually turns on a sprinkler:

[Figure: an alternative network over Season, Rains, and Sprinkler with a different parent structure for Sprinkler.]
Example (Empirical, Unsupervised): Clustering
Given data (vectors of variable values)
Compute a partition (clusters) of the vectors, such that vectors within
a cluster tend to be similar, and vectors across clusters tend to be
dissimilar
For example,

        V1      V2     V3     V4   ………….    VM
1       0.3     0.7    0.1   -0.2  ………….   -0.5
2       0.4     0.8    0.01   0.1  ………….   -0.4
                     …………………..
N-1    -0.3     0.1    1.01   0.8  ………….    1.3
N      -0.5     0.03   1.1    0.9  ………….    0.9

[Figure: the data partitioned into two clusters, one containing data 1, 2, … and the other containing …, N-1, N.]
Cluster summary representations (e.g., the centroid)

[The same data table as above, partitioned into clusters C1 = {1, 2, …} and C2 = {…, N-1, N}.]

Centroid for C1:    0.35   0.75   0.05   -0.05  ….  –0.45
Centroid for C2:   -0.4    0.05   1.05    0.85  ….   1.1
Using summary representations for inference

An incomplete datum 0.5 ? 0.01 -0.12 …. ? is closest to the centroid for C1 (0.35 0.75 0.05 -0.05 …. –0.45) rather than the centroid for C2 (-0.4 0.05 1.05 0.85 …. 1.1), so its missing values are filled in from C1's centroid, yielding 0.5 0.75 0.01 –0.12 …. –0.45.
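A minimal sketch of the completion step, using only the five centroid entries visible above (missing values are written as None):

    def complete(datum, centroids):
        def dist(c):                          # distance over the known entries only
            return sum((x - y) ** 2 for x, y in zip(datum, c) if x is not None)
        best = min(centroids, key=dist)
        return [x if x is not None else y for x, y in zip(datum, best)]

    c1 = [0.35, 0.75, 0.05, -0.05, -0.45]     # centroid for C1
    c2 = [-0.4, 0.05, 1.05, 0.85, 1.1]        # centroid for C2
    complete([0.5, None, 0.01, -0.12, None], [c1, c2])
    # -> [0.5, 0.75, 0.01, -0.12, -0.45], as in the figure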
K-means

Clustering K-Means(Data, K) {
    ClusterCentroids = K randomly selected vectors from Data
    for each d in Data
        assign d to the cluster with the closest centroid
    do {
        compute new cluster centroids
        for each d in Data
            assign d to the cluster with the closest centroid
    } while NOT termination condition
}

“closest”: Euclidean distance
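A minimal Python sketch of the pseudocode above (a fixed number of iterations stands in for the termination condition; the toy data are made up):

    import random

    def k_means(data, k, iters=20):
        centroids = random.sample(data, k)               # K randomly selected vectors
        clusters = [[] for _ in range(k)]
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for d in data:                               # assign d to closest centroid
                i = min(range(k), key=lambda j: sum((x - c) ** 2
                        for x, c in zip(d, centroids[j])))
                clusters[i].append(d)
            centroids = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[j]
                         for j, cl in enumerate(clusters)]   # compute new centroids
        return clusters, centroids

    data = [(0.3, 0.7), (0.4, 0.8), (-0.3, 0.1), (-0.5, 0.03)]
    k_means(data, 2)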
"Influence"

[Figure: the relative influence on machine learning, over time, of Artificial Intelligence, Cognitive Psychology, Statistics, and other fields (databases, automata, …); the total "mass" of outside influences is decreasing because Machine Learning has become a community of its own.]
Doug’s view
Mary’s view
Moorthy’s view
Hua’s view
Jing’s view
Emily’s view
Empirical, Supervised Learning
    Example: Naïve Bayesian Classifiers
    Subclass: Supervised Rule Induction
        Example: Decision tree induction
        Example: Brute-force induction of decision rules
Empirical, Unsupervised Learning
    Unsupervised Rule Induction
        Association Rule Learning
    Bayesian Network Learning
    Clustering
Analytical Learning
    Explanation-Based Learning
Empirical/Analytic Hybrids
[Diagram: analytical learning pipeline. The Environment supplies problems and solved problems to the Learning Component, which updates a Knowledge Base used by the Performance Component for problem solving; quality or speed (or …) is plotted against the amount of training (number of problems solved).]
Learning macros: Given a plan, generalize the plan so that the generalized plan
can be applied in a greater number of situations
Objective: reusing previously-developed generalized plans (aka macro-operators)
will reduce the cost (improve the “speed”) of subsequent planning
[Figure: Start State, block A on block B on block C; GoalSpec, block B on block A.]

Plan: Unstack(A,B) Putdown(A) Unstack(B,C) Stack(B,A)

(Generalize)

Unstack(?x1, ?y1) Putdown(?x1) Unstack(?y1, ?z1) Stack(?y1, ?x1)
[Figure: Start State, A on B on C; GoalSpec, B on A.]

Plan: Unstack(A,B) Putdown(A) Unstack(B,C) Stack(B,A)

Generalized, with each step initially given its own variables:

    Unstack(?x1, ?y1)    Putdown(?x2)    Unstack(?y2, ?z1)    Stack(?y3, ?x3)

[Figure: beneath each step the slide aligns that step's preconditions and effects as On, OnTab, Holding, Clear, and Handemp literals over these variables; literals deleted by a step are marked with an x.]
Learning macros (continued): matching the effects of earlier steps against the preconditions of later steps identifies the substitutions {?x3/?x2} and {?y3/?y2} (in each substitution the first variable is replaced by the second).
Applying {?x3/?x2} and {?y3/?y2} rewrites the generalized plan (and its preconditions and effects) as:

    Unstack(?x1, ?y1)    Putdown(?x2)    Unstack(?y2, ?z1)    Stack(?y2, ?x2)
Further matching identifies {?y2/?y1}, giving:

    Unstack(?x1, ?y1)    Putdown(?x2)    Unstack(?y1, ?z1)    Stack(?y1, ?x2)
Further matching identifies {?x2/?x1}, giving the fully generalized plan:

    Unstack(?x1, ?y1)    Putdown(?x1)    Unstack(?y1, ?z1)    Stack(?y1, ?x1)
Learning macros:

[Figure: Start State, A on B on C; GoalSpec, B on A.]

The fully generalized plan

    Unstack(?x1, ?y1)    Putdown(?x1)    Unstack(?y1, ?z1)    Stack(?y1, ?x1)

is packaged as a macro-operator:

    Macrop(?x1, ?y1, ?z1)
        preconditions:   On(?x1, ?y1), On(?y1, ?z1), Clear(?x1), Handemp()
        resulting state: OnTab(?x1), Clear(?y1), Clear(?z1), On(?y1, ?x1)
            (the preconditions On(?x1, ?y1), On(?y1, ?z1), and Clear(?x1) no longer hold)
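Reading the macro as a STRIPS-style rule (a delete list for the preconditions that no longer hold and an add list for the new literals), here is a minimal sketch of reusing it, with hypothetical blocks D, E, and F:

    # Literals are tuples; variables are strings starting with "?".
    MACROP = {
        "pre":    [("On", "?x1", "?y1"), ("On", "?y1", "?z1"), ("Clear", "?x1"), ("Handemp",)],
        "delete": [("On", "?x1", "?y1"), ("On", "?y1", "?z1"), ("Clear", "?x1")],
        "add":    [("OnTab", "?x1"), ("Clear", "?y1"), ("Clear", "?z1"), ("On", "?y1", "?x1")],
    }

    def substitute(literal, binding):
        return tuple(binding.get(t, t) for t in literal)

    def apply_macro(state, binding):
        """Apply MACROP under a variable binding if its preconditions hold in state."""
        pre = {substitute(l, binding) for l in MACROP["pre"]}
        if not pre <= state:
            return None                                   # macro not applicable here
        state = state - {substitute(l, binding) for l in MACROP["delete"]}
        return state | {substitute(l, binding) for l in MACROP["add"]}

    # Reusing the generalized plan on new blocks D, E, F:
    start = {("On", "D", "E"), ("On", "E", "F"), ("OnTab", "F"), ("Clear", "D"), ("Handemp",)}
    apply_macro(start, {"?x1": "D", "?y1": "E", "?z1": "F"})
    # -> {("OnTab","D"), ("OnTab","F"), ("Clear","E"), ("Clear","F"),
    #     ("On","E","D"), ("Handemp",)}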