Artificial Intelligence Should Be About Predictions
Rich Sutton, AT&T Labs
With special thanks to Michael Littman, Doina Precup, Satinder Singh, David McAllester, and Peter Stone

Outline
• AI at an Impasse
• A Predictive Proposal
• Some of the Machinery
• Prospects and Conclusion

It’s Hard to Build Large AI Systems
• Brittleness
• Unforeseen interactions
• Scaling requires too much manual complexity management
  – people must understand, intervene, patch, and tune
  – like programming
• We need more self-maintenance
  – learning, verification
  – internal coherence of knowledge and experience

AI at an Impasse
• We can’t go beyond ourselves: we can’t make AI systems more complex than we can understand
  – all the representations
  – all the possible meanings
  – all the interactions
• Beyond that, we get bogged down
  – brittleness
  – continual manual tuning
  – teams of people diverge on representations and meanings
  – no big return for our efforts
• What keeps the knowledge in an AI system correct? People do! But eventually this is a dead end.
• The key to a successful AI is that it can tell for itself whether it is working correctly.

The Verification Principle
• An AI system can successfully maintain knowledge only to the extent that it can verify that knowledge itself.

Two Strategies for Self-maintenance
• Logical self-consistency
  – check statements for consistency with each other
  – establishes an internal coherence within the AI
  – but tells us nothing about the external world
• Consistency with data
  – make predictions, see if they happen
  – establishes a coherence between the AI and its world

Outline
• AI at an Impasse
• A Predictive Proposal
• Some of the Machinery
• Prospects and Conclusion

Mind is About Predictions
• Hypothesis: Knowledge is predictive
  – about what-leads-to-what, under what ways of behaving
  – What will I see if I go around the corner?
  – Objects: What will I see if I turn this over?
  – Active vision: What will I see if I look at my hand?
  – Value functions: What is the most reward I know how to get?
  – such knowledge is learnable and chainable
• Hypothesis: Mental activity is working with predictions
  – learning them
  – combining them to produce new predictions (reasoning)
  – converting them to action (planning, reinforcement learning)
  – figuring out which are most useful

Philosophical and Psychological Roots
• Like classical British empiricism (1650–1800)
  – knowledge is about experience; experience is central
  – but not anti-nativist (evolutionary experience)
  – emphasizing sequential rather than simultaneous events: replace association/contiguity with prediction/contingency
• Close to Tolman’s “Expectancy Theory” (1932–1950)
  – cognitive maps, vicarious trial and error
• Psychology struggled to make it a science (1890–1950)
  – introspection
  – behaviorism, operational definitions
  – objectivity

Modern Computational View of Mind
• OK to talk about the insides of minds
• OK to talk about the function and purpose of a design
• We talk about Why:
  – why a system works
  – why it should compute X, and in manner Y
  – why such a system should achieve purpose Z
• This is new, and it resolves classical struggles
  – servo-mechanisms, state-transition probabilities
  – utility and decision theory
  – information as signal: subjective (private) yet clear
  – purpose defines and constrains mental constructs

Informational View of Mind
• Mind does information processing; mind exchanges information with the world
  Mind ↔ experience ↔ World
• Only experience is known for sure; anything more public or “objective” is suspect
• The world is an I-O entity, a black box
  – although we often seem to talk about what is inside it, all we can sensibly talk about is its I-O behavior

Is Mind about Predictions? Or Is Mind about Action (Policies)?
• Of course it is ultimately about action
• But action-generation methods are relatively clear:
  – value functions and decision theory: pick the action that maximizes expected cumulative reward
  – policy-gradient RL methods
  – execution-time search
  – reflexes and behavior-based robotics
  – learning-extended reflexes and conditioning
• Flexible cognition requires more than action generation; most mental activity is working with predictions

An Old, Simple, Appealing Idea
• Mind as prediction engine!
• Predictions are learnable and combinable
• They represent cause and effect, and can be pieced together to yield plans
• Perhaps this old idea is essentially correct. It just needs:
  – development, revitalization in modern forms
  – greater precision, formalization, mathematics
  – the computational perspective to make it respectable
  – imagination, determination, patience: not rushing to performance

Requisites of the Prediction Proposal
• The AI has to have a life
• Predictions must be very flexible and expressive, to capture a wide variety of world knowledge
  – mixtures of transition predictions
  – closed-loop action conditioning
  – closed-loop termination
• And yet be grounded, directly comparable to data
• Predictions must be combinable, compositional
  – support varieties of planning
  – projection and anticipation of futures

Outline
• AI at an Impasse
• A Predictive Proposal
• Some of the Machinery
• Prospects and Conclusion

Machinery for General Transition Predictions
• In steps of increasing expressiveness:
  – simple state-transition predictions
  – mixtures of predictions
  – closed-loop termination
  – closed-loop action conditioning
• While staying grounded in data
• Predictions and state

The Simplest Transition Predictions
• Experience is a stream of states and actions: s_t, a_t, s_{t+1}, a_{t+1}, s_{t+2}, a_{t+2}, ...
• 1-step prediction (X → Y under action a):
  Pr{ s_{t+1} = Y | s_t = X, a_t = a }
• k-step prediction (X → Y under policy π):
  Pr{ s_{t+k} = Y | s_t = X, with a_t ... a_{t+k-1} given by π }
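To make these two predictions concrete, here is a minimal sketch in Python, assuming a small finite Markov chain induced by fixing a policy; the transition matrix and names are illustrative, not from the talk:

```python
import numpy as np

# Minimal sketch: 1-step and k-step transition predictions in a tiny
# 4-state Markov chain induced by fixing a policy in an MDP.
# P[i, j] = Pr{ s_{t+1} = j | s_t = i } under the fixed policy.
P = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.2, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 1.0],
])

def k_step_prediction(P, x, y, k):
    """Pr{ s_{t+k} = y | s_t = x } -- the k-step prediction from X to Y."""
    return np.linalg.matrix_power(P, k)[x, y]

print(k_step_prediction(P, 0, 3, 1))   # 1-step prediction: 0.0
print(k_step_prediction(P, 0, 3, 3))   # 3-step prediction: 0.8
```

The 1-step prediction is just an entry of P; the k-step prediction is an entry of P raised to the k-th power.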
Mixtures of k-step Predictions: Terminating over a Period of Time
• Where will I be in 10–20 steps? Where will I be in roughly k steps?
• A mixture assigns a weight to each future time step of interest and blends the corresponding k-step predictions
[Figure: termination profiles plotted as weight over time steps from now, e.g. peaked between k = 10 and k = 20 steps, and short-term, medium-term, and long-term profiles.]
• Arbitrary termination profiles are possible
• But sometimes anything like this is too loose and sloppy...

Closed-loop Termination
• Terminate depending on what happens
• E.g., instead of “Will I finish this report soon?”, which uses a soft termination profile (probability spread over times, peaking at about an hour),
  use “Will I be done when my boss gets here?”: only one precise but uncertain time matters, the moment the boss arrives
• Closed-loop termination allows the time specification to be both flexible and precise
  – instead of “What will I see at t+100?”
  – we can say “What will I see when I open the box?” “...when John arrives?” “...when the bus comes?” “...when I get to the store?”
  – Will we elect a black or a woman president first?
  – Where will the tennis ball be when it reaches me?
  – What time will it be when the talk starts?
• A substantial increase in expressiveness

Closed-loop Action Conditioning
• What happens depends on what you do; what you do depends on what happens
• Each prediction has a closed-loop policy
  – Policy: States → Actions (or action probabilities)
• If you follow the policy, then you predict and verify; otherwise not
• If the policy is partly followed, temporal-difference methods can be used
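Both termination ideas can be sketched in a few lines of Python. The sketch below (reusing the kind of transition matrix from the earlier example; all names and numbers are illustrative assumptions) computes a soft-profile mixture of k-step predictions, and a closed-loop prediction that terminates exactly on first entry to a designated condition state:

```python
import numpy as np

# Illustrative sketch; the matrix and function names are assumptions.
P = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.2, 0.0, 0.0, 0.8],
              [0.0, 0.0, 0.0, 1.0]])

def mixture_prediction(P, x, y, weights):
    """Soft termination profile: sum_k w_k * Pr{ s_{t+k} = y | s_t = x },
    where weights[k-1] is the weight on the k-step prediction."""
    pred, Pk = 0.0, np.eye(P.shape[0])
    for w in weights:
        Pk = Pk @ P              # Pk is now P^k
        pred += w * Pk[x, y]
    return pred

def closed_loop_prediction(P, x, terminals, max_k=1000):
    """Closed-loop termination: probability of each terminal state at the
    first moment any terminal is entered, however long that takes."""
    Q = P.copy()
    for t in terminals:
        Q[t] = 0.0
        Q[t, t] = 1.0            # absorb on arrival
    return np.linalg.matrix_power(Q, max_k)[x, terminals]

print(mixture_prediction(P, 0, 3, [0.0, 0.25, 0.5, 0.25]))  # "in roughly 3 steps"
print(closed_loop_prediction(P, 0, [3]))                    # ~1.0: eventually reached
```

The contrast is the point of the slides above: the mixture smears the prediction over a profile of times, while the closed-loop version conditions on an event rather than a clock.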
General Transition Predictions (GTPs)
• Closed-loop terminations and closed-loop policies
• These correspond to arbitrary experiments and the results of those experiments
  – What will I see if I go into the next room?
  – What time will it be when the talk is over?
  – Is there a dollar in the wallet in my pocket?
  – Where is my car parked?
  – Can I throw the ball into the basket?
  – Is this a chair situation?
  – What will I see if I turn this object around?

Anatomy of a General Transition Prediction
• Two parts:
  1. Predictor: recognizes the conditions and makes the prediction
     p : S → M, where S is the state space and M the measurement space
  2. Experiment e: a policy, a termination condition, and measurement function(s)
     – policy π : S → A, or π : S × A → [0,1]
     – termination condition β : S → [0,1]
     – measurement function m : (S × A)* → M
• The experiment unfolds as e = a_t, s_{t+1}, ..., a_{t+k-1}, s_{t+k}
• The prediction’s meaning: p(s) ≈ E{ m(e) | s_t = s, experiment e run from s }
  i.e., p(s) predicts the measured outcome of following π from s until β terminates

Example: Open-the-door Predictor
• Uses visual input to estimate:
  – the probabilities of succeeding in opening the door, and of the other outcomes (door locked, no handle, no real door)
  – the expected cumulative cost (sub-par reward) of trying
• Experiment:
  – policy for walking up to the door, shaping the grasp of the handle, turning, pulling, and opening the door
  – terminate on successful opening or on the various failure conditions
  – measure the outcome and the cumulative cost

RoboCup-Soccer Example: Safe to Pass?
• Predict the outcome of choosing to pass
• The pass will take several steps to set up: choosing to pass involves a whole action policy
• You may choose not to pass halfway through
• Terminations and outcomes:
  – pass is aborted
  – opponents touch the ball before the teammate
  – teammate touches first, appears to control the ball
  – ball goes out of bounds

Example: Pass-to-Teammate Predictor
• Uses perceived positions of the ball, opponents, etc. to estimate the probabilities of:
  – successful pass, openness of receiver
  – interception
  – reception failure
  – aborted pass, in trouble
  – aborted pass, something better to do
  – loss of time
• Experiment:
  – policy for maneuvering the ball, or around the ball, to set up and make the pass
  – termination strategy for aborting and for recognizing completion
  – measurement of outcome and time

More Predictive Knowledge
• John is in the coffee room
• My car is in the South parking lot
• What we know about geography and navigation
• What we know about how an object looks and rotates
• What we know about how objects can be used
• Recognition strategies for objects and letters
• The portrait of Washington on the dollar in the wallet in my other pants in the laundry has a mustache on it
• Composing experiments creates a productive representation language

Relational, Propositional, and Deictic Knowledge
• ∀ objects X: if I drop X, then X will be on the floor
  – holding object X means predicting certain sensations if, for example, one directs one’s eyes toward one’s hand
  – thus, on dropping, the predicted sensations are merely transferred from the looking-at-hand prediction to the looking-at-floor prediction
  – such transfer of existing predictions should be a common part of visual knowledge, updated every time the eyes move
• ∃ X, Y such that Red(X), Blue(Y), and Above(X, Y)
  – there is some place I can foveate and see Red
  – there is some place I can foveate and see Blue
  – if I foveate first the Red place, “mark” it, then the Blue place, the mark will be Above the fovea (may need to search)
• These are typical ideas of modern, active, deictic vision

Combining Predictions
• If the mind is about predictions, then thinking is combining predictions to produce new ones
• Predictions obviously compose: if A→B and B→C, then A→C
• GTPs are designed to do this generally
  – they fit into “Bellman equations” of semi-Markov extensions of dynamic programming
  – they can also be used for simulation-based planning

Composing Predictions
• A prediction from X to Y with probability p1 and transient measurement T1 (e.g., elapsed time, cumulative reward), followed by a prediction from Y to Z with probability p2 and transient T2, composes into a prediction from X to Z: “p1 then p2”, with probability p1·p2 and transient T1 + T2
• The final measurement (e.g., a partial distribution over outcome states) composes too: if the first prediction from X yields outcomes {Y′: .1, Y: .8, Y″: .1} with transient T1, and the second applies from Y with transient T2, then “p1 then, if Y, p2” yields outcomes {Y′: .1, Z: .8, Y″: .1} with transient T1 + .8·T2
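Here is a tiny sketch of that composition rule in Python. The encoding is my own (a prediction as an expected transient plus an outcome distribution), not code from the talk, but it reproduces the numbers on the slide above:

```python
from dataclasses import dataclass

# Illustrative encoding of GTP composition; names are assumptions.
@dataclass
class Prediction:
    transient: float   # expected elapsed time / cumulative reward
    outcomes: dict     # outcome state -> probability

def compose(first: Prediction, cond_state: str, second: Prediction) -> Prediction:
    """'first, then, if cond_state, second': reroute the probability that
    'first' ends in cond_state through 'second', adding its transient."""
    p = first.outcomes.get(cond_state, 0.0)
    outcomes = {s: q for s, q in first.outcomes.items() if s != cond_state}
    for s, q in second.outcomes.items():
        outcomes[s] = outcomes.get(s, 0.0) + p * q
    return Prediction(first.transient + p * second.transient, outcomes)

# X --(T1=3)--> {Y': .1, Y: .8, Y'': .1};  from Y, --(T2=2)--> {Z: 1.0}
first  = Prediction(3.0, {"Y'": 0.1, "Y": 0.8, "Y''": 0.1})
second = Prediction(2.0, {"Z": 1.0})
print(compose(first, "Y", second))
# Prediction(transient=4.6, outcomes={"Y'": 0.1, "Y''": 0.1, "Z": 0.8})
```

The composed transient is T1 + .8·T2 = 3 + .8·2 = 4.6, exactly the weighted rule stated above; chaining such compositions is what makes GTPs fit Bellman-style planning.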
Room-to-Room GTPs (Sutton, Precup & Singh, 1999; “Options,” Precup, 2000)
[Figure: the four-rooms gridworld. Four stochastic primitive actions (up, down, left, right), each failing 33% of the time; 8 multi-step GTPs, one to each room’s 2 hallways, each with a policy, termination at the hallways, and a target (goal) hallway.]
• Predict: the probability of reaching each terminal hallway, plus values for the target and the other outcome hallways
• Goal: minimize the number of steps

Planning with GTPs
[Figure: value iteration toward V(goal) = 1 over iterations #0, #1, #2, shown once with cell-to-cell primitive actions and once with room-to-room options (GTPs); with GTPs, value propagates a room at a time.]

Learning Path-to-Goal with and without GTPs
[Figure: learning curves of steps per episode (log scale, roughly 1000 down to 100) against episodes (10 to 10,000), for primitives only, GTPs only, and GTPs & primitives; the GTP curves drop far faster early in learning.]

Rooms Example: Simultaneous Learning of All 8 GTPs from Their Goals
[Figure: left panel, RMS error in goal predictions falling over 100,000 time steps; right panel, learned values approaching ideal values for the upper- and lower-hallway subgoal states.]
• All 8 hallway GTPs were learned accurately and efficiently while actions were selected totally at random

Machinery for General Predictions
• In steps of increasing expressiveness:
  – simple state-transition predictions
  – mixtures of predictions
  – closed-loop termination
  – closed-loop action conditioning
• While staying grounded in data
• Predictions and state

Predictive State Representations
• Problem: So far we have assumed states, but the world really just gives information, “observations”
• Hypothesis: What we normally think of as state is a set of predictions about the outcomes of experiments
  – the wallet’s contents, John’s location, the presence of objects...
• Prior work:
  – learning deterministic FSAs: Rivest & Schapire, 1987
  – adding stochasticity, an alternative to HMMs: Herbert Jaeger, 1999
  – adding action, an alternative to POMDPs: Littman, Sutton & Singh, 2001

Empty Gridworld with Local Sensing
• Four actions (Up, Down, Right, Left) and four sensory bits for adjacent walls
• Distance-to-wall predictions: the predictions for the action strings R, RR, RRR, RRRR, ... and D, DD, DDD, ... are the wall bits that would then be observed; their pattern of 0s and 1s is the “meaning” of the predictions, the distances to the walls
• 4 GTPs suffice to identify each state
• More are needed to update the PSR
• Many more can be computed from the PSR

Predictive State Representation (PSR)
• Suppose we add one non-uniformity to the gridworld: the same distance-to-wall tests no longer tell the whole story
• Now there is much more to know; it would be challenging to program it all correctly

Other Extension Ideas
• Stochasticity
• Egocentric motion
• Multiple rooms
• A second agent
• Moveable objects
• Transient goals
• It’s easy to make such problems arbitrarily challenging
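Returning to the empty gridworld, here is a minimal sketch of the distance-to-wall predictions. The grid size, sensor semantics (a wall bit in the direction of the last move), and the particular set of tests are my assumptions for illustration:

```python
# Minimal sketch: distance-to-wall predictions in an empty N x N gridworld.
N = 5

def test_prediction(pos, actions):
    """Predicted wall bit after executing the action string from pos.
    pos = (row, col); actions is a string over U/D/L/R."""
    moves = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
    r, c = pos
    for a in actions:
        dr, dc = moves[a]
        r = min(max(r + dr, 0), N - 1)   # walls block movement
        c = min(max(c + dc, 0), N - 1)
    dr, dc = moves[actions[-1]]
    # Sensor bit: is there a wall immediately in the last move's direction?
    return int(not (0 <= r + dr < N and 0 <= c + dc < N))

def predictive_state(pos):
    """Represent a state purely by the predictions of wall-sensing tests."""
    tests = ["R", "RR", "RRR", "RRRR", "D", "DD", "DDD", "DDDD"]
    return tuple(test_prediction(pos, t) for t in tests)

print(predictive_state((0, 0)))  # (0, 0, 0, 1, 0, 0, 0, 1): wall 4 right, 4 down
print(predictive_state((2, 3)))  # (1, 1, 1, 1, 0, 1, 1, 1): wall 1 right, 2 down
```

Each cell gets a distinct prediction vector, so the predictions themselves serve as state, which is exactly the PSR hypothesis above.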
Outline
• AI at an Impasse
• A Predictive Proposal
• Some of the Machinery
• Prospects and Conclusion

How Could These Ideas Proceed?
• Build systems! Build gridworlds!
• A performance orientation would be problematic
• The “Knowledge Representation” guys may not be impressed
• But others, I think, will be very interested and appreciative, throughout modern probabilistic AI

Conclusion: Predictions are the Coin of the Mental Realm
• Knowledge is predictions
  – about what-leads-to-what, under what ways of behaving
  – such knowledge is learnable and chainable
• Mental activity is working with predictions
  – learning them
  – combining them to produce new predictions (reasoning)
  – converting them to action (planning, reinforcement learning)
  – figuring out which are most useful
• Predictions are verifiable
  – a natural way to self-maintain knowledge, which is essential for scaling AI beyond programming
• Most of the machinery is simple but potentially powerful

Reliable Knowledge Requires Verification
• We can distinguish:
  1. having knowledge
  2. having the ability to verify knowledge
• I.e., there is something beyond having knowledge, which we might call understanding its meaning, and which is key in practice to building powerful AIs

Summary of Results for Predictive State Representations (PSRs)
• Compact, linear PSRs exist
  – # tests ≤ # states in the minimal POMDP
  – # tests ≤ Rivest & Schapire’s diversity
  – # tests can be exponentially fewer than the diversity and than the POMDP states
• Compact simulation/update process
• Construction algorithm from a POMDP
• The learning/discovery algorithms of Rivest and Schapire, and of Jaeger, do not immediately extend to PSRs
• There are natural EM-like algorithms (current work)
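As an illustration of the “compact simulation/update process,” here is a sketch of the linear PSR update after Littman, Sutton & Singh (2001) as I understand it; the toy vectors and matrix below are assumptions for illustration, not taken from the talk or guaranteed to come from a consistent PSR:

```python
import numpy as np

# Sketch of the linear PSR update (after Littman, Sutton & Singh, 2001).
# State is a vector p of predictions for a set of core tests. For each
# action a and observation o there is a vector m_ao with
#     Pr{ o | h, a } = p . m_ao
# and a matrix M_ao whose i-th column encodes the extended test a o q_i:
#     Pr{ q_i | h a o } = (p . M_ao[:, i]) / (p . m_ao)

def psr_update(p, M_ao, m_ao):
    """Update the core-test prediction vector after taking a, seeing o."""
    return (p @ M_ao) / (p @ m_ao)

# Toy numbers for a two-test PSR (illustration only).
p    = np.array([0.5, 0.25])
m_ao = np.array([0.6, 0.8])
M_ao = np.array([[0.3, 0.1],
                 [0.2, 0.4]])
print(psr_update(p, M_ao, m_ao))   # new prediction vector: [0.4, 0.3]
```

The update is one matrix-vector product and a normalization per time step, which is the sense in which the simulation/update process is compact.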