CS 188: Artificial Intelligence
Fall 2009
Advanced Applications: Robotics / Vision / Language
Dan Klein – UC Berkeley
Many slides from Pieter Abbeel, John DeNero

Announcements
  Contest: amazing stuff – 59 teams!
    Final qualifying tournament almost done running!
    Congrats to current qualifiers! Send us your info ASAP
  Tonight: seeding tournament
  Wednesday NOON: final tournament
  Thursday: final results in class!

Today

Motivating Example
  How do we specify a task like this?
  [demo: autorotate / tictoc]

Autonomous Helicopter Flight
  Control inputs:
    a_lon: Main rotor longitudinal cyclic pitch control (affects pitch rate)
    a_lat: Main rotor latitudinal cyclic pitch control (affects roll rate)
    a_coll: Main rotor collective pitch (affects main rotor thrust)
    a_rud: Tail rotor collective pitch (affects tail rotor thrust)

Autonomous Helicopter Setup
  On-board inertial measurement unit (IMU)
  Position
  Send out controls to helicopter

Helicopter MDP
  State:
  Actions (control inputs): a_lon, a_lat, a_coll, a_rud, as listed above
  Transitions (dynamics):
    s_{t+1} = f(s_t, a_t) + w_t
    [f encodes helicopter dynamics]
    [w is a probabilistic noise model]
  Can we solve the MDP yet?

Problem: What’s the Reward?
  Rewards for hovering:
  [demo: hover]
  Rewards for “Tic-Toc”?
    Problem: what’s the target trajectory?
    Just write it down by hand?
  [demo: bad]

Apprenticeship Learning
  Goal: learn reward function from expert demonstration
  Assume
  Get expert demonstrations
  Guess initial policy
  Repeat:
    Find w which make the expert better than
    Solve MDP for new weights w:

Pacman Apprenticeship!
  [demo: pac apprentice]
  Demonstrations are expert games
  Features defined over states s
  Score of a state given by:
  Learning goal: find weights which explain expert actions

Helicopter Apprenticeship?
  [demo: unaligned]

Probabilistic Alignment
  [figure labels: Intended trajectory, Expert demonstrations, Time indices]
  Intended trajectory satisfies dynamics.
  Expert trajectory is a noisy observation of one of the hidden states.
  But we don’t know exactly which one.

Alignment of Samples
  [demo: alignment]
  Result: inferred sequence is much cleaner!

Final Behavior
  [demo: airshow]

What is NLP?
  Fundamental goal: analyze and process human language, broadly, robustly, accurately…
  End systems that we want to build:
    Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering…
    Modest: spelling correction, text categorization…

Problem: Ambiguities
  Headlines:
    Enraged Cow Injures Farmer With Ax
    Hospitals Are Sued by 7 Foot Doctors
    Ban on Nude Dancing on Governor’s Desk
    Iraqi Head Seeks Arms
    Local HS Dropouts Cut in Half
    Juvenile Court to Try Shooting Defendant
    Stolen Painting Found by Tree
    Kids Make Nutritious Snacks
  Why are these funny?

Grammar: PCFGs
  Natural language grammars are very ambiguous!
  PCFGs are a formal probabilistic model of trees
    Each “rule” has a conditional probability (like an HMM)
    Tree’s probability is the product of all rules used
  Parsing: Given a sentence, find the best tree – search!
  Example rules:
    ROOT → S          375/420
    S → NP VP .       320/392
    NP → PRP          127/539
    VP → VBD ADJP     32/401
    …..

Parsing as Search
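To search over parses, each candidate tree needs a score. The PCFG slide says a tree's probability is the product of all rules used. Below is a minimal Python sketch of that product. It uses only the four rules listed on the slide and reads each fraction as that rule's conditional probability. The nested-tuple tree encoding, the names RULES, PARTIAL_TREE, and tree_probability, and the choice to leave PRP, VBD, ADJP, and "." unexpanded are assumptions made for illustration. The slide gives no rules below those symbols, so the result is the probability of a partial derivation, not of a full parse.

```python
from fractions import Fraction

# The four rules shown on the slide, keyed by (parent, children).
# Each fraction is read here as the rule's conditional probability.
RULES = {
    ("ROOT", ("S",)): Fraction(375, 420),
    ("S", ("NP", "VP", ".")): Fraction(320, 392),
    ("NP", ("PRP",)): Fraction(127, 539),
    ("VP", ("VBD", "ADJP")): Fraction(32, 401),
}


def tree_probability(tree):
    """Multiply the probabilities of every rule used in the tree.

    A tree is a tuple (label, child, child, ...). A bare string is a
    symbol left unexpanded (an assumption of this sketch), so it
    contributes a factor of 1.
    """
    if isinstance(tree, str):
        return Fraction(1)
    label, *children = tree
    # The rule used at this node: parent label -> labels of its children.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULES[(label, rhs)]
    for child in children:
        prob *= tree_probability(child)
    return prob


# Hypothetical partial tree that uses each listed rule exactly once.
PARTIAL_TREE = ("ROOT", ("S", ("NP", "PRP"), ("VP", "VBD", "ADJP"), "."))

print(float(tree_probability(PARTIAL_TREE)))
```

A parser framed as search would compare this score across the candidate trees for a sentence and return the highest-scoring one. A full grammar would also need rules for the symbols left unexpanded here.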
Syntactic Analysis
  [demo]
  Hurricane Emily howled toward Mexico's Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun, where frightened tourists squeezed into musty shelters.

Machine Translation
  Translate text from one language to another
  Recombines fragments of example translations
  Challenges:
    What fragments? [learning to translate]
    How to make efficient? [fast translation search]

Levels of Transfer

Machine Translation
  [demo: MT]