Mind is About Predictions (Talk) - incompleteideas.net

Artificial Intelligence
Should Be About Predictions
Rich Sutton
AT&T Labs
with special thanks to
Michael Littman, Doina Precup, Satinder Singh,
David McAllester, Peter Stone
Outline
AI at an Impasse
A Predictive Proposal
Some of the Machinery
Prospects and Conclusion
It’s Hard to Build Large AI Systems
Brittleness
Unforeseen interactions
Scaling
Requires too much manual complexity management
 people must understand, intervene, patch and tune
 like programming
Need more self-maintenance
 learning, verification
 internal coherence of knowledge and experience
AI at an Impasse
We can’t go beyond ourselves
We can’t make AI systems more complex than we can
understand
 All the representations
 All the possible meanings
 All the interactions
Beyond that, we get bogged down




Brittleness
Continual manual tuning
Teams of people diverge on rep’ns and meanings
No big return for our efforts
What keeps the knowledge in an AI system
correct?
People do!
But eventually this is a dead end.
The key to a successful AI is that it can tell
for itself if it is working correctly.
The Verification Principle
An AI system can successfully maintain
knowledge only to the extent that it can
verify that knowledge itself
Two Strategies for Self-maintenance
Logical self-consistency
 Check statements for consistency with each other
 Establishes an internal coherence within the AI
 But tells us nothing about the external world
Consistency with data
 Make predictions, see if they happen
 Establishes a coherence between the AI and its
world
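A minimal sketch of the second strategy, consistency with data: a prediction is scored against what actually happened and nudged toward it. The function names, the step size, and the outcome data are all hypothetical, chosen only for illustration.

```python
def verification_error(predicted_prob, outcomes):
    """Mean squared error between a predicted probability and observed 0/1 outcomes."""
    return sum((predicted_prob - o) ** 2 for o in outcomes) / len(outcomes)


def update_prediction(predicted_prob, outcome, step_size=0.1):
    """Move the prediction a small step toward what actually happened."""
    return predicted_prob + step_size * (outcome - predicted_prob)


# Hypothetical data: did the predicted event happen on each trial? (1 = yes)
outcomes = [1, 1, 0, 1, 1, 1, 0, 1]

# A prediction that agrees with the data verifies better than one that does not.
error_good = verification_error(0.75, outcomes)
error_bad = verification_error(0.10, outcomes)
```

No person has to inspect the knowledge here: the comparison with experience is the check.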
Outline
AI at an Impasse
A Predictive Proposal
Some of the Machinery
Prospects and Conclusion
Mind is About Predictions
Hypothesis: Knowledge is predictive
About what-leads-to-what, under what ways of behaving
What will I see if I go around the corner?
Objects: What will I see if I turn this over?
Active vision: What will I see if I look at my hand?
Value functions: What is the most reward I know how to get?
Such knowledge is learnable, chainable
Hypothesis: Mental activity is working with predictions
Learning them
Combining them to produce new predictions (reasoning)
Converting them to action (planning, reinforcement learning)
Figuring out which are most useful
Philosophical and Psychological Roots
Like classical British empiricism (1650–1800)
 Knowledge is about experience
 Experience is central
But not anti-nativist (evolutionary experience)
Emphasizing sequential rather than simultaneous events
 Replace association/contiguity with prediction/contingency
Close to Tolman’s “Expectancy Theory” (1932–1950)
 Cognitive maps, vicarious trial and error
Psychology struggled to make it a science (1890–1950)
 Introspection
 Behaviorism, operational definitions
 Objectivity
Modern Computational View of Mind
OK to talk about insides of minds
OK to talk about the function and purpose of a design
We talk about Why
 Why a system works
 Why it should compute X and in manner Y
 Why such a system should achieve purpose Z
This is new, and resolves classical struggles
 Servo-mechanisms, state-transition probabilities
 Utility and decision theory
 Information as signal – subjective (private) yet clear
Purpose defines and constrains mental constructs
Informational View of Mind
Mind does information processing
Mind exchanges information with the world
Mind
experience
World
Only experience is known for sure
 Anything more public or “objective” is suspect
World is an I-O entity, a black box
Although we often seem to talk about what is inside,
All we can sensibly talk about is I-O behavior
Is Mind about Predictions?
OR
Is Mind about Action (or Policies)?
Of course it is ultimately about action
But action generation methods are relatively clear
 Value functions and decision theory
– Pick action that maximizes expected cumulative reward
 OR
– Policy gradient RL methods
– Execution-time search
– Reflexes and behavior-based robotics
– Learning-extended reflexes and conditioning
Flexible cognition requires more than action generation
Most mental activity is working with predictions
An old, simple, appealing idea
Mind as prediction engine!
Predictions are learnable, combinable
They represent cause and effect, and can be pieced
together to yield plans
Perhaps this old idea is essentially correct.
Just needs
 Development, revitalization in modern forms
 Greater precision, formalization, mathematics
 The computational perspective to make it respectable
 Imagination, determination, patience
– Not rushing to performance
Requisites of Prediction Proposal
The AI has to have a life
Predictions must be very flexible, expressive
 To capture a wide variety of world knowledge
– Mixtures of transition predictions
– Closed-loop action conditioning
– Closed-loop termination
And yet be grounded, directly comparable to data
Predictions must be combinable, compositional
 Support varieties of planning
 Projection and anticipation of futures
Outline
AI at an Impasse
A Predictive Proposal
Some of the Machinery
Prospects and Conclusion
Machinery for
General Transition Predictions
In steps of increasing expressiveness
 Simple state-transition predictions
 Mixtures of predictions
 Closed-loop termination
 Closed-loop action conditioning
While staying grounded in data
 Predictions and State
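The Simplest Transition Predictions
Experience: $s_t, a_t, s_{t+1}, a_{t+1}, s_{t+2}, a_{t+2}, \ldots$ (states $s$, actions $a$)
1-step Prediction ($X \xrightarrow{a} Y$):
$\Pr\{s_{t+1} = Y \mid s_t = X,\ a_t = a\}$
k-step Prediction ($X \xrightarrow{\pi} Y$):
$\Pr\{s_{t+k} = Y \mid s_t = X,\ a_t, \ldots, a_{t+k-1} \text{ given by } \pi\}$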
Mixtures of k-step Predictions:
Terminating over a period of time
Where will I be in 10–20 steps? (time steps of interest: k=10 to k=20 steps from now)
Where will I be in roughly k steps? (termination spread around k steps from now)
Arbitrary termination profiles are possible: short term, medium term, long term
But sometimes anything like this is too loose and sloppy...
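One way to read a termination profile is as a set of weights over k, so that the mixture prediction is a weighted average of k-step predictions. The sketch below assumes that reading. The corridor model and the profile values are hypothetical.

```python
def mixture_prediction(P, policy, start, profile):
    """Weighted average of k-step predictions; profile maps k -> weight (weights sum to 1)."""
    dist = {start: 1.0}
    mixture = {}
    for k in range(1, max(profile) + 1):
        # Advance the state distribution by one step under the policy.
        nxt = {}
        for s, mass in dist.items():
            for s2, p in P[(s, policy[s])].items():
                nxt[s2] = nxt.get(s2, 0.0) + mass * p
        dist = nxt
        # Add this horizon's contribution, if the profile gives it any weight.
        weight = profile.get(k, 0.0)
        if weight > 0.0:
            for s, mass in dist.items():
                mixture[s] = mixture.get(s, 0.0) + weight * mass
    return mixture


# Hypothetical corridor: action "R" moves right with prob 0.8, else stays put.
P = {
    (0, "R"): {1: 0.8, 0: 0.2},
    (1, "R"): {2: 0.8, 1: 0.2},
    (2, "R"): {2: 1.0},
}
policy = {0: "R", 1: "R", 2: "R"}

# "Where will I be in 1-2 steps?" as an even mixture of k=1 and k=2.
mixture = mixture_prediction(P, policy, 0, {1: 0.5, 2: 0.5})
```

A "10–20 steps" question would use a profile such as `{k: 1 / 11 for k in range(10, 21)}`.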
Closed-loop Termination
Terminate depending on what happens
E.g., instead of “Will I finish this report soon?”,
which uses a soft termination profile (probably in about an hour):
[Figure: termination probability vs. time, spread softly around 1 hr]
Use “Will I be done when my boss gets here?”
[Figure: termination probability vs. time, stepping from 0 to 1 when the boss arrives]
Only one precise but uncertain time matters
Closed-loop termination
allows time specification to be
both flexible and precise
Instead of “what will I see at t+100?”
Can say “what will I see when I open the box?”
or “when John arrives?”
“when the bus comes?”
“when I get to the store?”
Will we elect a black or a woman president first?
Where will the tennis ball be when it reaches me?
What time will it be when the talk starts?
A substantial increase in expressiveness
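A small sketch of closed-loop termination, using the "done when my boss gets here" question. Termination is a function of state (1 once the boss has arrived, 0 otherwise), so the prediction is about the state at that event rather than at a fixed time. The state names and all probabilities are invented for the example.

```python
def outcome_at_termination(P, policy, beta, start, tol=1e-12, max_steps=10_000):
    """Distribution over the state in which the experiment terminates.

    beta[s] is the probability of terminating on arrival in state s.
    """
    alive = {start: 1.0}   # probability mass that has not yet terminated
    outcome = {}
    for _ in range(max_steps):
        if sum(alive.values()) < tol:
            break
        next_alive = {}
        for s, mass in alive.items():
            for s2, p in P[(s, policy[s])].items():
                stopped = mass * p * beta[s2]
                continuing = mass * p - stopped
                if stopped > 0.0:
                    outcome[s2] = outcome.get(s2, 0.0) + stopped
                if continuing > 0.0:
                    next_alive[s2] = next_alive.get(s2, 0.0) + continuing
        alive = next_alive
    return outcome


# Hypothetical model: each step the boss arrives w.p. 0.1; otherwise the
# report gets finished w.p. 0.3 if it is still being written.
P = {
    ("working", "write"): {"boss_working": 0.1, "done": 0.27, "working": 0.63},
    ("done", "write"): {"boss_done": 0.1, "done": 0.9},
}
policy = {"working": "write", "done": "write"}
beta = {"working": 0.0, "done": 0.0, "boss_working": 1.0, "boss_done": 1.0}

outcome = outcome_at_termination(P, policy, beta, "working")
p_done_when_boss_arrives = outcome["boss_done"]
```

The arrival time itself stays uncertain, yet the question is precise: only the state at that one moment is measured.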
Closed-loop Action Conditioning
What happens depends on what you do
What you do depends on what happens
Each prediction has a closed-loop policy
Policy: States --> Actions
(or Probs.)
If you follow the policy, then you predict and verify
 Otherwise not
 If partly followed, temporal-difference methods can be
used
General Transition Predictions (GTPs)
Closed-loop terminations
And closed-loop policies
Correspond to arbitrary experiments
and the results of those experiments
What will I see if I go into the next room?
What time will it be when the talk is over?
Is there a dollar in the wallet in my pocket?
Where is my car parked?
Can I throw the ball into the basket?
Is this a chair situation?
What will I see if I turn this object around?
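Anatomy of a General Transition Prediction
1. Predictor: $p : S \to M$ (from states to the measurement space)
Recognizes the conditions, makes the prediction
2. Experiment
- policy $\pi : S \to A$ or $2^A$ (from states to actions)
- termination condition $\beta : S \to [0,1]$
- measurement function(s) $m : \{A \times S\}^* \to M$
The prediction is the expected measurement over experiment outcomes $e = a_t s_{t+1} \cdots a_{t+k-1} s_{t+k}$:
$p(s) = \sum_{e \in \{A \times S\}^*} \Pr\{e \mid s_t = s, \pi, \beta\}\, m(e) = E_{\pi,\beta}\{m(e)\}$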
Example: Open-the-door
Predictor
Use visual input to estimate
 Probabilities of succeeding in opening the door, and of other
outcomes (door locked, no handle, no real door)
 Expected cumulative cost (sub-par reward) in trying
Experiment
 Policy for walking up to the door, shaping grasp of handle,
turning, pulling, and opening the door
 Terminate on successful opening or various failure conditions
 Measure outcome and cumulative cost
RoboCup-Soccer Example
Safe to pass?
Predict the outcome
of choosing to pass
• The pass will take several steps to set up
– choosing to pass involves a whole action policy
• You may choose not to pass halfway through
• Terminations and outcomes:
– pass is aborted
– opponents touch the ball before teammate
– teammate touches first, appears to control ball
– ball goes out of bounds
Example: Pass-to-Teammate
Predictor uses perceived positions of ball, opponents,
etc. to estimate probabilities of
 Successful pass, openness of receiver
 Interception
 Reception failure
 Aborted pass, in trouble
 Aborted pass, something better to do
 Loss of time
Experiment
 Policy for maneuvering ball, or around ball, to set up and pass
 Termination strategy for aborting, recognizing completion
 Measurement of outcome, time
More Predictive Knowledge
John is in the coffee room
My car is in the South parking lot
What we know about geography, navigation
What we know about how an object looks, rotates
What we know about how objects can be used
Recognition strategies for objects and letters
The portrait of Washington on the dollar in the wallet in
my other pants in the laundry, has a mustache on it
 Composing experiments creates a productive rep’n language
Relational, Propositional, and Deictic
 ∀ objects X, if I drop X, then X will be on the floor
 Holding object X means predicting certain sensations if, for
example, one directs one’s eyes toward one’s hand
 Thus, on dropping, the predicted sensations are merely
transferred from the looking-at-hand prediction to the
looking-at-floor prediction
 Such transfer of existing predictions should be a common
part of visual knowledge - updated every time the eyes move
 ∃ X,Y such that Red(X), Blue(Y), and Above(X,Y)
 There is some place I can foveate and see Red
 There is some place I can foveate and see Blue
 If I foveate first the Red place, “mark” it, then the Blue
place, the mark will be Above the fovea (may need to search)
 These are typical ideas of modern, active, deictic vision
Combining Predictions
If the mind is about predictions,
Then thinking is combining predictions to
produce new ones
Predictions obviously compose
 If A->B and B->C, then A->C
GTPs are designed to do this generally
 Fit into “Bellman equations” of semi-Markov
extensions of dynamic programming
 Can also be used for simulation-based planning
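Composing Predictions
[Diagram: two predictions chained into one]
Prediction 1: from X, following $\pi_1, \beta_1$, ends in Y with transient measurement $T_1$
Prediction 2: from Y, following $\pi_2, \beta_2$, ends in Z with transient measurement $T_2$
Composed: from X, following $\pi_1, \beta_1$ then $\pi_2, \beta_2$, ends in Z with transient measurement $T_1 + T_2$
Transient measurement: e.g., elapsed time, cumulative reward
Final measurement: e.g., partial distribution of outcome states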
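Composing Predictions
[Diagram: composing predictions whose outcomes are uncertain]
Prediction 1: from X, following $\pi_1, \beta_1$: outcomes Y’ (.1), Y (.8), Y’’ (.1); transient measurement $T_1$
Prediction 2: from Y, following $\pi_2, \beta_2$: outcomes Y’ (.1), Z (.8), Y’’ (.1); transient measurement $T_2$
Composed: from X, following $\pi_1, \beta_1$, then if Y, $\pi_2, \beta_2$: transient measurement $T_1 + .8\,T_2$, with Z among the final outcomes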
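The composition with uncertain outcomes can be worked through numerically. In the sketch below each prediction is a pair (outcome distribution, expected transient measurement), and the second prediction is applied only when the first ends in Y. The outcome probabilities are the ones on the slide. The transient values 5 and 10 are placeholders for T1 and T2, and the `compose` helper is hypothetical.

```python
def compose(first, second, via):
    """Compose two predictions: run `first`, then `second` only if `first` ended in `via`."""
    outcomes1, transient1 = first
    outcomes2, transient2 = second
    p_via = outcomes1.get(via, 0.0)
    # Outcomes of the first prediction other than `via` are final as they stand.
    outcomes = {s: p for s, p in outcomes1.items() if s != via}
    # Mass that reached `via` is redistributed by the second prediction.
    for s, p in outcomes2.items():
        outcomes[s] = outcomes.get(s, 0.0) + p_via * p
    # Transient measurements add; the second is incurred only with prob p_via.
    return outcomes, transient1 + p_via * transient2


T1, T2 = 5.0, 10.0  # placeholder transient measurements, e.g. elapsed time
first = ({"Y'": 0.1, "Y": 0.8, "Y''": 0.1}, T1)
second = ({"Y'": 0.1, "Z": 0.8, "Y''": 0.1}, T2)

outcomes, transient = compose(first, second, via="Y")
```

The composed transient is T1 + .8 T2, as on the slide, and under this reading Z is reached with probability .8 × .8 = .64.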
Room-to-Room GTPs (General Transition Predictions)
“Options”: Sutton, Precup, & Singh, 1999; Precup, 2000
[Figure: gridworld of rooms joined by hallways, marking the target (goal) hallway and one GTP's policy and termination hallways]
4 stochastic primitive actions: up, down, left, right (fail 33% of the time)
8 multi-step GTPs (to each room's 2 hallways)
Predict: Probability of reaching each terminal hallway
Goal: minimize # steps + values for target and other outcome hallway
Planning with GTPs
[Figure: value propagation from V(goal)=1 at iterations #0, #1, and #2, in two rows: with cell-to-cell primitive actions, and with room-to-room options (GTPs)]
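A toy sketch of why planning with multi-step predictions propagates value faster. It is not the rooms gridworld from the figure: it assumes a one-dimensional corridor of 9 cells with "hallways" at cells 0, 4, and 8, deterministic moves, and a discount of 0.9, all of them simplifications chosen for brevity. Each model is a (discount, next state) pair, so an option that takes k steps contributes a discount of 0.9 to the power k.

```python
GAMMA = 0.9
N = 9
GOAL = 8
HALLWAYS = (0, 4, 8)


def primitive_models():
    """One-step moves left or right: (discount, next_state) pairs per state."""
    return {s: [(GAMMA, s2) for s2 in (s - 1, s + 1) if 0 <= s2 < N] for s in range(N)}


def option_models():
    """Multi-step models that run to a hallway of the current room."""
    models = {}
    for s in range(N):
        targets = [h for h in HALLWAYS if h != s and abs(h - s) <= 4]
        models[s] = [(GAMMA ** abs(h - s), h) for h in targets]
    return models


def value_iteration(models, sweeps):
    """Synchronous value iteration with V(goal) fixed at 1."""
    V = [0.0] * N
    V[GOAL] = 1.0
    for _ in range(sweeps):
        new_V = V[:]
        for s in range(N):
            if s != GOAL:
                new_V[s] = max(d * V[s2] for d, s2 in models[s])
        V = new_V
    return V


def sweeps_needed(models):
    """Number of sweeps before the far end of the corridor has nonzero value."""
    for k in range(1, N + 1):
        if value_iteration(models, k)[0] > 0.0:
            return k
    return None


with_primitives = sweeps_needed(primitive_models())
with_options = sweeps_needed(option_models())
```

Both planners arrive at the same value for cell 0, but the option models get there in far fewer sweeps because each backup spans a whole room.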
Learning Path-to-Goal with and without GTPs
[Figure: steps per episode (axis ticks 10 to 1000) versus episodes (axis ticks 1 to 10,000); curves: Primitives, GTPs, GTPs & primitives]
Rooms Example: Simultaneous Learning of
all 8 GTPs from their Goals
[Figure, two panels plotted against time steps (0 to 100,000): RMS error in goal prediction, and two subgoal state values (upper hallway subgoal, lower hallway subgoal), with learned values compared against ideal values]
All 8 hallway GTPs were learned accurately
and efficiently while actions are selected totally at random
Machinery for General Predictions
In steps of increasing expressiveness
 Simple state-transition predictions
 Mixtures of predictions
 Closed-loop termination
 Closed-loop action conditioning
While staying grounded in data
 Predictions and State
Predictive State Representations
Problem: So far we have assumed states,
but the world really just gives information, “observations”
Hypothesis: What we normally think of as state
is a set of predictions about outcomes of experiments
 Wallet’s contents, John’s location, presence of objects…
Prior work:
 Learning deterministic FSAs - Rivest & Schapire, 1987
 Adding stochasticity: An alternative to HMMs - Herbert Jaeger, 1999
 Adding action: An alternative to POMDPs - Littman, Sutton, & Singh
2001
Empty Gridworld with Local Sensing
Four actions: Up, Down, Right, Left
And four sensory bits
Distance to Wall Predictions (“meaning” of predictions)
R: 0, RR: 0, RRR: 1, RRRR: 1, . . .
D: 0, DD: 1, DDD: 1, . . .
4 GTPs suffice to identify each state
More needed to update PSR
Many more are computed from PSR
Predictive State
Representation (PSR)
Suppose we add one non-uniformity
R: 0, RR: 0, RRR: 1, RRRR: 1, . . .
D: 0, DD: 1, DDD: 1, . . .
Now there is much more to know
It would be challenging to program it all correctly
Other Extension Ideas
• Stochasticity
• Egocentric motion
• Multiple Rooms
• Second agent
• Moveable objects
• Transient goals
It’s easy to make such problems arbitrarily challenging
Outline
AI at an Impasse
A Predictive Proposal
Some of the Machinery
Prospects and Conclusion
How Could These Ideas Proceed?
• Build systems! Build Gridworlds!
• A performance orientation would be problematic
• The “Knowledge Representation” guys may not be
impressed
• But others I think will be very interested and
appreciative - throughout modern probabilistic AI
Conclusion:
Predictions are the Coin of the Mental Realm
Knowledge is Predictions
About what-leads-to-what, under what ways of behaving
Such knowledge is learnable, chainable
Mental activity is working with predictions
Learning them
Combining them to produce new predictions (reasoning)
Converting them to action (planning, reinforcement learning)
Figuring out which are most useful
Predictions are verifiable
A natural way to self-maintain knowledge,
which is essential for scaling AI beyond programming
Most of the machinery is simple but potentially powerful
Reliable Knowledge Requires Verification
We can distinguish
1. Having knowledge
2. Having the ability to verify knowledge
I.e., there is something beyond having knowledge
which we might call understanding its meaning
and which is key in practice to building powerful AIs
Summary of Results for
Predictive State Rep’ns (PSRs)
Exist compact, linear PSRs
 # tests ≤ # states in minimal POMDP
 # tests ≤ Rivest & Schapire’s Diversity
 # tests can be exponentially fewer than diversity and POMDP
Compact simulation/update process
Construction algorithm from POMDP
Learning/discovery algorithms of Rivest and Schapire,
and of Jaeger, do not immediately extend to PSRs
There are natural EM-like algorithms (current work)
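For the "compact simulation/update process" item, the linear PSR prediction and update can be written as below. The notation is an assumption taken from Littman, Sutton, & Singh (2001) rather than from these slides: $Q$ is the set of core tests, $p(Q \mid h)$ is the vector of their predictions after history $h$, and $m_t$ is the weight vector for test $t$.

```latex
% Prediction for any test t is linear in the core-test predictions:
p(t \mid h) = p(Q \mid h)^{\top} m_t

% Update of each core test q_i after taking action a and observing o:
p(q_i \mid hao) = \frac{p(a o q_i \mid h)}{p(a o \mid h)}
               = \frac{p(Q \mid h)^{\top} m_{a o q_i}}{p(Q \mid h)^{\top} m_{a o}}
```

Both the numerator and the denominator are predictions of tests, so the whole update stays inside the space of predictions.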