CS 188: Artificial Intelligence
Fall 2009
Advanced Applications: Robotics / Vision / Language
Dan Klein – UC Berkeley
Many slides from Pieter Abbeel, John DeNero

Announcements
Contest: amazing stuff – 59 teams!
Final qualifying tournament almost done running!
Congrats to current qualifiers!
Send us your info ASAP
Tonight: seeding tournament
Wednesday NOON: final tournament
Thursday: final results in class!
Today
Motivating Example
How do we specify a task like this?
[demo: autorotate / tictoc]
Autonomous Helicopter Flight
Autonomous Helicopter Setup
On-board inertial measurement unit (IMU)
Position
Send out controls to helicopter
Control inputs:
a_lon: Main rotor longitudinal cyclic pitch control (affects pitch rate)
a_lat: Main rotor latitudinal cyclic pitch control (affects roll rate)
a_coll: Main rotor collective pitch (affects main rotor thrust)
a_rud: Tail rotor collective pitch (affects tail rotor thrust)
Helicopter MDP
State:
Actions (control inputs):
a_lon: Main rotor longitudinal cyclic pitch control (affects pitch rate)
a_lat: Main rotor latitudinal cyclic pitch control (affects roll rate)
a_coll: Main rotor collective pitch (affects main rotor thrust)
a_rud: Tail rotor collective pitch (affects tail rotor thrust)
Transitions (dynamics):
s_{t+1} = f(s_t, a_t) + w_t
[f encodes helicopter dynamics]
[w is a probabilistic noise model]
Can we solve the MDP yet?

Problem: What’s the Reward?
Rewards for hovering:
[demo: hover]
Rewards for “Tic-Toc”?
Problem: what’s the target trajectory?
Just write it down by hand?
[demo: bad]
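The transition model s_{t+1} = f(s_t, a_t) + w_t from the Helicopter MDP slide can be illustrated with a short rollout. This is only a sketch: the function `f` below is a hypothetical 1-D point mass, not a helicopter model, and the Gaussian noise, `dt`, and `noise_std` are assumptions chosen for illustration.

```python
import random


def f(state, action, dt=0.1):
    """Hypothetical deterministic dynamics: a 1-D point mass, NOT a helicopter."""
    position, velocity = state
    return (position + dt * velocity, velocity + dt * action)


def step(state, action, rng, noise_std=0.01):
    """One transition s_{t+1} = f(s_t, a_t) + w_t, with w_t assumed Gaussian."""
    noiseless = f(state, action)
    return tuple(x + rng.gauss(0.0, noise_std) for x in noiseless)


def rollout(initial_state, actions, seed=0, noise_std=0.01):
    """Apply a sequence of actions and return every visited state."""
    rng = random.Random(seed)
    states = [initial_state]
    for action in actions:
        states.append(step(states[-1], action, rng, noise_std))
    return states


if __name__ == "__main__":
    for s in rollout((0.0, 0.0), [1.0] * 5):
        print(s)
```

Running the same actions with different seeds gives different trajectories, which is one reason a hand-written target trajectory may not be reachable in practice.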
Apprenticeship Learning
Goal: learn a reward function from expert demonstrations
Assume
Get expert demonstrations
Guess initial policy
Repeat:
Find w that makes the expert better than
Solve MDP for new weights w:

Pacman Apprenticeship!
[demo: pac apprentice]
Demonstrations are expert games
Features defined over states s
Score of a state given by:
Learning goal: find weights which explain expert actions
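One way to make "find weights which explain expert actions" concrete is a perceptron-style update over state features. This sketch assumes a linear score w · f(s) and swaps in a simple perceptron rule; it is not necessarily the procedure used in the lecture demo, and the feature names (`food`, `ghost`) and the data layout are hypothetical.

```python
def score(weights, features):
    """Linear score w . f(s) of a state described by a feature dict (an assumption)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def learn_weights(demonstrations, epochs=10):
    """Perceptron-style updates so the expert's choice scores highest.

    demonstrations: list of (candidates, expert_index) pairs, where candidates
    is a list of feature dicts, one per successor state the expert could have
    moved to, and expert_index marks the one the expert actually chose.
    """
    weights = {}
    for _ in range(epochs):
        mistakes = 0
        for candidates, expert_index in demonstrations:
            predicted = max(range(len(candidates)),
                            key=lambda i: score(weights, candidates[i]))
            if predicted != expert_index:
                mistakes += 1
                # Move weights toward the expert's state, away from our guess.
                for name, value in candidates[expert_index].items():
                    weights[name] = weights.get(name, 0.0) + value
                for name, value in candidates[predicted].items():
                    weights[name] = weights.get(name, 0.0) - value
        if mistakes == 0:
            break
    return weights


if __name__ == "__main__":
    demos = [
        ([{"food": 0.0, "ghost": 1.0}, {"food": 1.0, "ghost": 0.0}], 1),
        ([{"food": 1.0, "ghost": 1.0}, {"food": 0.0, "ghost": 0.0}], 1),
    ]
    print(learn_weights(demos))
```

In this toy data the expert always avoids the state with a ghost, so the learned weight on the hypothetical `ghost` feature ends up negative.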
Helicopter Apprenticeship?
[demo: unaligned]

Probabilistic Alignment
[figure: intended trajectory, expert demonstrations, time indices]
Intended trajectory satisfies dynamics.
Expert trajectory is a noisy observation of one of the hidden states.
But we don’t know exactly which one.

Alignment of Samples
[demo: alignment]
Result: inferred sequence is much cleaner!

Final Behavior
[demo: airshow]
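To give a feel for what aligning time indices involves, here is a sketch using dynamic time warping (DTW). DTW is a deterministic stand-in chosen for brevity; it is not the probabilistic alignment described on the slides, which treats each demonstration as a noisy observation of hidden states. The sequences and the squared-error cost are assumptions.

```python
def dtw_align(reference, demo):
    """Align two 1-D sequences; return (total cost, list of (i, j) index pairs)."""
    n, m = len(reference), len(demo)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (reference[i - 1] - demo[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both
                                 cost[i - 1][j],      # advance reference only
                                 cost[i][j - 1])      # advance demo only
    # Backtrack from the end to recover which time indices were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(moves)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    intended = [0, 1, 2, 3]
    demonstration = [0, 0, 1, 2, 2, 3]  # same shape, executed more slowly
    print(dtw_align(intended, demonstration))
```

Each pair in the returned path says which time index of the demonstration is matched to which index of the reference, which is the same kind of quantity the slide's "time indices" refer to.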
What is NLP?
Fundamental goal: analyze and process human language, broadly, robustly, accurately…
End systems that we want to build:
Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering…
Modest: spelling correction, text categorization…

Problem: Ambiguities
Headlines:
Enraged Cow Injures Farmer With Ax
Hospitals Are Sued by 7 Foot Doctors
Ban on Nude Dancing on Governor’s Desk
Iraqi Head Seeks Arms
Local HS Dropouts Cut in Half
Juvenile Court to Try Shooting Defendant
Stolen Painting Found by Tree
Kids Make Nutritious Snacks
Why are these funny?
Grammar: PCFGs
Natural language grammars are very ambiguous!
PCFGs are a formal probabilistic model of trees
Each “rule” has a conditional probability (like an HMM)
Tree’s probability is the product of all rules used
Parsing: Given a sentence, find the best tree – search!

Rule              Probability
ROOT → S          375/420
S → NP VP .       320/392
NP → PRP          127/539
VP → VBD ADJP     32/401
…..

Parsing as Search
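The search for the best tree under a PCFG can be sketched with a Viterbi CKY parser, where a tree's probability is the product of the probabilities of the rules it uses. The grammar below is a hypothetical toy grammar in Chomsky normal form, not the lecture's grammar, and its probabilities are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (not the lecture's grammar).
BINARY = {  # (left child, right child) -> list of (parent, probability)
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("Det", "N"): [("NP", 0.5)],
}
LEXICAL = {  # word -> list of (tag, probability)
    "she": [("NP", 0.5)],
    "saw": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "dog": [("N", 1.0)],
}


def viterbi_cky(words, root="S"):
    """Return (probability, tree) of the best parse rooted at `root`, or None."""
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][symbol] = (probability, tree)
    for i, word in enumerate(words):
        for tag, p in LEXICAL.get(word, []):
            best[(i, i + 1)][tag] = (p, (tag, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for left, (p_left, t_left) in best[(i, k)].items():
                    for right, (p_right, t_right) in best[(k, j)].items():
                        for parent, p_rule in BINARY.get((left, right), []):
                            p = p_rule * p_left * p_right
                            if p > best[(i, j)].get(parent, (0.0, None))[0]:
                                best[(i, j)][parent] = (p, (parent, t_left, t_right))
    return best[(0, n)].get(root)


if __name__ == "__main__":
    print(viterbi_cky("she saw the dog".split()))
```

This toy grammar is unambiguous, so there is only one tree to find; with an ambiguous grammar the same table keeps only the highest-probability analysis for each span and symbol.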
Syntactic Analysis
Hurricane Emily howled toward Mexico 's Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun, where frightened tourists squeezed into musty shelters .
[demo]

Machine Translation
Translate text from one language to another
Recombines fragments of example translations
Challenges:
What fragments? [learning to translate]
How to make efficient? [fast translation search]
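The idea of recombining translated fragments, and of searching efficiently over ways to do so, can be sketched with a small dynamic program. This is a heavily simplified illustration: the phrase table and its probabilities are hypothetical, the decoder is monotone (no reordering beyond what a fragment itself contains), and there is no language model.

```python
import math

# Hypothetical phrase table: source fragment -> list of (target fragment, probability).
PHRASES = {
    ("la",): [("the", 0.9)],
    ("maison",): [("house", 0.6), ("home", 0.4)],
    ("bleue",): [("blue", 0.9)],
    ("maison", "bleue"): [("blue house", 0.7)],
}


def translate_monotone(source, max_len=3):
    """Cover the source left to right with fragments, maximizing total probability."""
    n = len(source)
    # best[i] = (log-probability, target fragments) for the best cover of source[:i]
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if best[i] is None:
                continue  # no way to translate the first i words
            for target, p in PHRASES.get(tuple(source[i:j]), []):
                candidate = best[i][0] + math.log(p)
                if best[j] is None or candidate > best[j][0]:
                    best[j] = (candidate, best[i][1] + [target])
    return None if best[n] is None else " ".join(best[n][1])


if __name__ == "__main__":
    print(translate_monotone(["la", "maison", "bleue"]))
```

Because each prefix of the source keeps only its best-scoring cover, the search is polynomial in sentence length rather than enumerating every segmentation.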
Levels of Transfer
Machine Translation
[demo: MT]