Computational Discovery of Communicable Knowledge

Cumulative Learning of Relational and
Hierarchical Skills from Problem Solving
Pat Langley
Institute for the Study of
Learning and Expertise
Palo Alto, CA
http://www.isle.org
This research was funded by Grant HR0011-04-1-0008 from the DARPA Information
Processing Technology Office, which may not agree with the points made in this talk.
Research Objectives
We are designing and implementing new learning methods that:
 operate over relational, hierarchical knowledge structures
 support reasoning, reactive control, and problem solving
 are embedded within a broader architectural framework
 utilize existing knowledge to increase learning rates
 acquire this knowledge in an incremental, cumulative manner
 are applicable to a variety of challenging domains
We hope to develop learning mechanisms that support horizontal
and vertical transfer both within and across domains.
The ICARUS Architecture*
[Architecture diagram. Components shown: Environment, Perception, Perceptual Buffer, Categorization and Inference, Long-Term Conceptual Memory, Short-Term Conceptual Memory, Long-Term Skill Memory, Skill Retrieval, Goal/Skill Stack, Means-Ends Analysis, Skill Execution, Motor Buffer.]
* without learning
Organization of Long-Term Memory
ICARUS organizes both concepts and skills in a hierarchical manner.
concepts
Each concept is defined in terms of other concepts and/or percepts.
skills
Each skill is defined in terms of other skills, concepts, and percepts.
Concepts from In-City Driving Domain
(in-segment (?self ?sg)
  :percepts ((self ?self segment ?sg) (segment ?sg)))

(aligned-with-lane (?self ?lane)
  :percepts ((self ?self) (lane-line ?lane angle ?angle))
  :positives ((in-lane ?self ?lane))
  :tests ((> ?angle -0.05) (< ?angle 0.05)))

(on-street (?self ?packet)
  :percepts ((self ?self) (packet ?packet street ?street)
             (segment ?sg street ?street))
  :positives ((not-delivered ?packet) (current-segment ?self ?sg)))

(increasing-direction (?self)
  :percepts ((self ?self))
  :positives ((increasing ?b1 ?b2))
  :negatives ((decreasing ?b3 ?b4)))
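As a rough illustration of how a definition like aligned-with-lane might be checked, the Python sketch below treats percepts as dictionaries and beliefs as tuples. The function name, the data layout, and the reading of the tests as a small symmetric tolerance around zero are all assumptions made for illustration; this is not the ICARUS matcher.

```python
# Hypothetical sketch: checking one concept against percepts and beliefs.
# The data layout and names are illustrative assumptions, not ICARUS code.

def aligned_with_lane(percepts, beliefs, tolerance=0.05):
    """Return the (concept, self, lane) tuples for which the concept holds."""
    selves = [p["name"] for p in percepts if p["type"] == "self"]
    matches = []
    for p in percepts:
        if p["type"] != "lane-line":
            continue
        lane, angle = p["name"], p["angle"]
        for me in selves:
            # :positives: the lower-level concept must already be believed.
            if ("in-lane", me, lane) not in beliefs:
                continue
            # :tests: assumed reading, the angle lies within +/- tolerance.
            if -tolerance < angle < tolerance:
                matches.append(("aligned-with-lane", me, lane))
    return matches


percepts = [
    {"type": "self", "name": "me"},
    {"type": "lane-line", "name": "line1", "angle": 0.02},
    {"type": "lane-line", "name": "line2", "angle": 0.30},
]
beliefs = {("in-lane", "me", "line1"), ("in-lane", "me", "line2")}
result = aligned_with_lane(percepts, beliefs)
print(result)
```

Only line1 passes both the belief check and the numeric test, so a single concept instance is returned.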
Organization of Long-Term Memory
ICARUS interleaves its long-term memories for concepts and skills.
[Figure: concept and skill hierarchies, with one skill and the concepts it refers to highlighted.]
For example, the skill highlighted here refers directly to
the highlighted concepts.
Skills from In-City Driving Domain
(turn-around-on-street (?self ?packet)
  :percepts ((self ?self segment ?segment direction ?dir)
             (building ?landmark))
  :start ((on-street-wrong-direction ?packet))
  :effects ((on-street-right-direction ?packet))
  :ordered ((get-in-U-turn-lane ?self) (prepare-for-U-turn ?self)
            (steer-for-U-turn ?self ?landmark)))

(get-aligned-in-segment (?self ?sg)
  :percepts ((lane-line ?lane angle ?angle))
  :requires ((in-lane ?self ?lane))
  :effects ((aligned-with-lane ?self ?lane))
  :actions ((steer (times ?angle 2))))

(steer-for-right-turn (?self ?int ?endsg)
  :percepts ((self ?self speed ?speed) (intersection ?int cross ?cross)
             (segment ?endsg street ?cross angle ?angle))
  :start ((ready-for-right-turn ?self ?int))
  :effects ((in-segment ?self ?endsg))
  :actions ((times steer 2)))
Basic ICARUS Processes
ICARUS matches patterns to recognize concepts and select skills.
concepts
Concepts are matched bottom up, starting from percepts.
skills
Skill paths are matched top down, starting from intentions.
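The two matching directions can be sketched in a few lines of Python. This is a simplified, propositional illustration with no variable binding; the dictionaries, the function names, and the top-level skill "turn-right" are hypothetical and chosen only to echo the driving examples.

```python
# Hypothetical sketch of the two matching directions; structures are assumptions.

# Each concept lists the lower-level concepts or percepts it is defined over.
CONCEPTS = {
    "in-lane": ["self", "lane-line"],
    "aligned-with-lane": ["in-lane"],
    "ready-for-right-turn": ["aligned-with-lane", "intersection"],
}

# Each skill maps to its ordered subskills; primitive skills have none.
SKILLS = {
    "turn-right": ["get-aligned-in-segment", "steer-for-right-turn"],
    "get-aligned-in-segment": [],
    "steer-for-right-turn": [],
}


def infer_bottom_up(percepts):
    """Add every concept whose components all hold, until nothing changes."""
    beliefs = set(percepts)
    changed = True
    while changed:
        changed = False
        for name, parts in CONCEPTS.items():
            if name not in beliefs and all(p in beliefs for p in parts):
                beliefs.add(name)
                changed = True
    return beliefs


def skill_path_top_down(intention, done):
    """Follow the first unfinished subskill at each level down to a primitive."""
    path = [intention]
    while SKILLS[path[-1]]:
        pending = [s for s in SKILLS[path[-1]] if s not in done]
        if not pending:
            break
        path.append(pending[0])
    return path


beliefs = infer_bottom_up({"self", "lane-line", "intersection"})
path = skill_path_top_down("turn-right", done={"get-aligned-in-segment"})
print(sorted(beliefs))
print(path)
```

Inference starts from percepts and climbs the concept hierarchy, while skill selection starts from the intention and descends until it reaches a skill with no subskills.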
A Trace of Means-Ends Problem Solving
An impasse causes ICARUS to invoke a means-ends problem solver.
[Figure: means-ends problem-solving trace with nodes numbered 1 to 11.]
The resulting traces provide the material for learning new relational
skills and concepts in terms of simpler components.
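A minimal propositional sketch of means-ends analysis follows. The skill names, the state representation, and the depth limit are invented for illustration; the ICARUS problem solver works over relational structures and a goal/skill stack, which this toy version does not attempt to reproduce.

```python
# Hypothetical propositional sketch of means-ends analysis; not ICARUS code.

# Each skill has start conditions and effects (all names are made up).
SKILLS = {
    "pick-up-key": {"start": set(), "effects": {"has-key"}},
    "unlock-door": {"start": {"has-key"}, "effects": {"door-open"}},
    "enter-room": {"start": {"door-open"}, "effects": {"in-room"}},
}


def means_ends(goal, state, trace, depth=0, max_depth=10):
    """Achieve goal by chaining backward off a skill that produces it."""
    if goal in state:
        return True
    if depth >= max_depth:
        return False
    for name, skill in SKILLS.items():
        if goal not in skill["effects"]:
            continue
        # Unmet start conditions become subgoals that are solved first.
        if all(means_ends(c, state, trace, depth + 1, max_depth)
               for c in sorted(skill["start"])):
            state |= skill["effects"]      # simulate executing the skill
            trace.append((goal, name))     # record which skill met which goal
            return True
    return False


state, trace = set(), []
solved = means_ends("in-room", state, trace)
print(solved)
print(trace)
```

The recorded trace lists each goal with the skill that achieved it, in the order achieved, which is the kind of material a learner could later compose into larger skills. Note that this sketch does not undo state changes when a branch fails, which a fuller version would need to handle.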
Learning Skills from Means-Ends Traces
[Figure: problem-solving trace, nodes 1 to 11, with new skill A marked; label: concept chaining.]
ICARUS learns skills for ordering subgoals from concept chaining.
Learning Skills from Means-Ends Traces
[Figure: problem-solving trace, nodes 1 to 11, with skills A and B marked; label: skill chaining.]
ICARUS learns skills for ordering subskills from skill chaining.
Learning Skills from Means-Ends Traces
[Figure: problem-solving trace, nodes 1 to 11, with skills A, B, and C marked; label: concept chaining.]
Each level of skill learning builds upon results from prior levels.
Learning Skills from Means-Ends Traces
[Figure: problem-solving trace, nodes 1 to 11, with skills A through D marked; label: skill chaining.]
This leads ICARUS to extend its skill hierarchy in a cumulative way.
Learning Skills from Means-Ends Traces
[Figure: problem-solving trace, nodes 1 to 11, with skills A through E marked; label: concept chaining.]
This in turn supports transfer both within and across problems.
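One way to picture the two learning cases is the Python sketch below. It assumes that a skill learned from skill chaining orders the skill that satisfied a start condition before the skill that was chained off, and that a skill learned from concept chaining orders the subgoals as they were achieved, with subconcepts that already held becoming start conditions. These details, along with every name and data structure here, are assumptions for illustration and are not taken from the slides.

```python
# Hypothetical sketch of composing new skills from a solved trace.
# The construction rules below are assumptions, not the ICARUS learner.

def learn_from_skill_chaining(name, goal, first_skill, second_skill):
    """New skill: run the skill that met the start condition, then the chained one."""
    return {
        "name": name,
        "head": goal,
        "start": sorted(first_skill["start"]),
        "ordered": [first_skill["name"], second_skill["name"]],
    }


def learn_from_concept_chaining(name, goal, subconcepts, initially_true, achieved_order):
    """New skill: subgoals in the order solved; already-true ones become start."""
    return {
        "name": name,
        "head": goal,
        "start": [c for c in subconcepts if c in initially_true],
        "ordered": [c for c in achieved_order if c in subconcepts],
    }


# Two made-up primitive skills taken from a solved problem.
pick = {"name": "pick-up-key", "start": {"at-table"}}
unlock = {"name": "unlock-door", "start": {"has-key"}}

library = {}
skill_b = learn_from_skill_chaining("B", "door-open", pick, unlock)
library[skill_b["name"]] = skill_b

skill_c = learn_from_concept_chaining(
    "C",
    "ready-to-enter",
    subconcepts=["door-open", "lights-on", "shoes-off"],
    initially_true={"shoes-off"},
    achieved_order=["lights-on", "door-open"],
)
library[skill_c["name"]] = skill_c
print(skill_b)
print(skill_c)
```

Because skill C lists door-open as a subgoal and skill B has door-open as its head, the later structure can build on the earlier one, which is the cumulative effect the slides describe.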
Transfer Results in FreeCell
FreeCell is a complex solitaire game in which all cards are visible.
We let ICARUS practice on versions with a small set of cards, then
examined its transfer to problems with more cards.
Transfer Results in FreeCell
Experiments revealed substantial transfer to the harder problems.
This held both for the percentage of problems solved and for the
effort required on successful attempts.
Directions for Future Research
Our initial results suggest ICARUS can transfer knowledge learned
on simple problems to complex ones from the same domain.
In future work, we intend to examine the additional issues of:
 vertical transfer to domains that use other domains as components;
 horizontal transfer to domains that share knowledge elements;
 horizontal transfer to tasks that require representation mapping.
The final problem is a key challenge in developing robust methods
for reusing learned knowledge.
We hope to evaluate our ideas on both action-oriented domains
like strategy games and inferential tasks like physics problems.
The General Game-Playing Testbed
Genesereth and Love (2005) have developed a framework that:
supports a wide variety of N-person games;
describes each game setting in a standard logical formalism;
specifies the rules of each game in a related formalism;
manages matches between players and records activities;
provides sample games for debugging candidate systems.
They have designed this framework to encourage research on
general approaches to intelligent behavior.
However, it also provides an excellent testbed for evaluating the
ability of learning systems to transfer within and across domains.
See http://games.stanford.edu for more details and examples.