Probabilistic Approaches to Artificial Intelligence
(aka: Neural and Genetic Approaches to AI)
CS B553, Spring 2013

What's in a (course) name?
• Official class title: Neural and Genetic Approaches to AI
  – Actually, NEURAL&GENTC APPR ARTFCL INTEL
  – This will appear on your schedules and transcripts
• But we won't cover neural and genetic topics
  – This semester: a new experimental course on "Probabilistic Approaches to AI"
  – Prof. Hauser taught a similar version of B553 last year

What's wrong with genetic and neural approaches?
• Nothing! They're fascinating and useful!
• We're teaching this experimental course as B553 for boring bureaucratic reasons
• That said, probabilistic approaches are very popular right now in most subfields of AI

Brief, biased history of AI
• 1950: Alan Turing speculates about machine intelligence, proposes the Turing Test
• 1955: Newell & Simon unveil the Logic Theorist to prove theorems automatically; introduce a precursor of Lisp
• 1956: Dartmouth Conference introduces the term Artificial Intelligence; participants include Shannon, Minsky, McCarthy, etc.

1960s: Progress & optimism
• AI as search through huge spaces
  – Techniques for heuristic search, e.g. branch & bound (1960), alpha-beta pruning (1963), A* search (1968), ...
• Artificial neural networks
  – Using perceptrons (1957); much excitement about their potential power
• Fuzzy logic (1965)
  – To model uncertainty
• Micro-worlds
  – E.g. Sussman's "Block world" in computer vision

1970s: Decline and the "AI Winter"
• 1969: Minsky and Papert's book Perceptrons proves limitations of neural networks
• 1971-72: Cook's and Karp's NP-completeness papers show many problems are simply intractable
• Overly optimistic predictions of the '60s don't materialize; funding agencies lose faith in AI

1980s: Rise from the ashes
• Resurgence of neural networks, with multi-layer networks and the backpropagation algorithm
• Much work in knowledge-based systems that try to represent and capture knowledge
• Expert systems apply rules defined by human experts to solve problems

Early 1990s: Another crash
• Expert systems very difficult to maintain
  – Difficult to manage huge knowledge bases
  – Suffered from "brittleness": systems would give nonsensical answers for no easily-explained reasons
• Many companies see these as the future and invest lots of money in developing them...

Mid 1990s to present
• Greater mathematical sophistication, connecting AI problems to other domains
  – "Revolutionary" (Norvig & Russell)
  – Connections to optimization, probabilistic and statistical models, information theory, etc.
• Focus on more concrete, less ambitious goals
• Less interest in biologically-inspired techniques in favor of techniques that seem to work in practice
• Moore's law makes hard problems more tractable

Probabilistic techniques
• AI is full of uncertainty
  – Can't observe the full state of the system
  – The observations we can make are noisy
  – Our models of the world are imperfect
  – Some systems can't be modeled anyway (chaotic and apparently nondeterministic systems)
  (image credit: Martin-Shepard 2010)
• Probabilistic frameworks give us a principled way of dealing with and reasoning about this uncertainty
  – Largely championed by Judea Pearl (2011 Turing Award)

But they're not a silver bullet!
• We'll still face challenges like...
  – Probability distributions that are impossibly complex, with intractably many dimensions
  – Parameter estimation problems that seem to require exponential amounts of data
  – Inference problems that are NP-hard
• Much work is thus devoted to balancing between what we'd like to model and what we are able to model
  – Probabilistic graphical models are a popular framework

Course goals
• Introduce the modern mathematical and algorithmic machinery used in probabilistic techniques for AI
  – Mostly in the graphical model framework
• You'll get to understand both the (nice, clean) theory and the (often messy) implementation details
  – Along three dimensions: model representation, inference, and parameter estimation (learning)
• Gain experience with different applications of these techniques to real AI problems
  – In vision, natural language processing, robotics, etc.

Course overview (tentative)
• Basic probability: notation, Bayes' law, Bayesian classifiers
• Representation: Bayesian networks, Markov networks
• Exact inference: variable elimination, conditioning, clique trees
• Approximate inference: BP, particle sampling, graph cuts
• Inference as optimization: gradient descent, Newton methods, stochastic optimization, genetic algorithms
• Parameter learning: ML and MAP estimation, Expectation Maximization
• Structure learning
• Temporal models: Markov chains, HMMs
• Applications

Course mechanics
• Syllabus, schedule, assignments, announcements, etc. on IU OnCourse
  – http://oncourse.indiana.edu/
• Readings from the textbook and selections from papers and other books
  – Textbook: Koller and Friedman, Probabilistic Graphical Models: Principles and Techniques, 2009.

Grading
• 50% Assignments
  – Mixture of pen-and-paper and programming problems
  – For programming problems, any general-purpose programming language is acceptable if you implement the important routines yourself (more detail on this later)
  – We'll typically recommend a language
• 20% Final project
• 30% In-class quizzes

Course staff
• Prof. David Crandall, 227 Informatics West; office hour: T 2-3 (tentative)
• Associate Instructor (AI): Alex Rudnick, 330I Lindley Hall; office hour: W 10-11am (tentative)

Prerequisites
• Technically, CS B551
• Practically:
  – Proficiency in a general-purpose programming language, e.g. C/C++, Matlab, Python, Java
  – Some level of mathematical maturity, esp. with statistics, linear algebra, and calculus
  – Willingness to learn some programming and/or math on your own if necessary

Project
• On a topic of your choice
• Three deliverables: a brief proposal, a final report (and source code), a brief presentation
• Wide range of possible projects, e.g.
  – Develop a new technique for problem X
  – Apply an existing technique to new application Y
  – Implement technique Z in a significantly faster way
  – Implement and compare techniques W and U
  – Or something else broadly related to probabilistic techniques

Academic integrity
• Read and understand the academic integrity (AI) policy on the syllabus
• We will look for and prosecute academic integrity violations
• Be especially careful with homework assignments
  – You may discuss homework problems at a high level (e.g. general strategies for solving problems), but you may not share code, and you must cite the other student in your submission
  – If you use ideas or code from another source (like a webpage or textbook), you must acknowledge the source in your submission

Review of basic probability concepts

Probability 101
• A finite probability space consists of:
  – A finite set S of mutually-exclusive outcomes
  – A function P assigning a probability P(s) ≥ 0 to each outcome s ∈ S, such that Σ_{s∈S} P(s) = 1
• An event A is a subset of S, A ⊆ S.
  – The probability of an event is defined as P(A) = Σ_{s∈A} P(s)

Basic identities
• For two events A and B...
  – What's the probability that either A or B (or both) occur?
    P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  – If A and B are disjoint, their intersection is the empty set, and the last term is 0.

Super simple example #1
• Suppose you roll a six-sided die 5 times. What's the probability of rolling a "three" all 5 times?

Super simple example #2
• Suppose you roll a six-sided die 5 times. What's the probability of rolling a "three" during the first roll?

Super simple example #3
• Suppose you roll a die 5 times. What's the probability of getting at least 1 six?
• Answer:
  – The probability of getting a six on a single roll is 1/6.
  – So the probability of getting a six among 5 rolls is 5 * 1/6 = 5/6.
  – WRONG! The events are not disjoint.

Example #3 (2nd try)
• Suppose you roll a die 5 times. What's the probability of getting at least 1 six?
  – Answer 2: Sum the probabilities of disjoint events
    P(at least 1 six) = P(1 six and 4 non-sixes) + P(2 sixes and 3 non-sixes) + P(3 sixes and 2 non-sixes) + P(4 sixes and 1 non-six) + P(5 sixes) = ...
  – Right, but a lot of work.

An example (3rd try)
• Suppose you roll a die 5 times. What's the probability of getting at least 1 six?
  – Either we get at least 1 six (event A), or we get no sixes (event B). A and B are clearly disjoint and their union is S. The probability of B is (5/6)^5.
    P(A) = 1 − P(B) = 1 − (5/6)^5 ≈ 0.598

The Birthday Problem
• Given our class of ~30 people, what's the probability that at least two of us share the same birthday?
  (plot: probability of a shared birthday vs. # of people)
  – A quick numerical check of this and the dice example appears at the end of this review.

Conditional probabilities
• Probability that one event occurs, given that another event is known to have occurred
  – Denoted P(A | B), "the probability of A given B"
  – Defined as: P(A | B) = P(A ∩ B) / P(B)
• Leads directly to the chain rule: P(A ∩ B) = P(A | B) P(B)
  – This idea of factoring a distribution into a product of two simpler distributions will be a recurring course theme!
  – More generally: P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) ... P(An | A1 ∩ ... ∩ An-1)

Conditional probabilities (cont.)
• Two events A and B are independent if P(A | B) = P(A)
  – Or, equivalently, if P(B | A) = P(B)
  – Independence is denoted A ⊥ B
• The joint probability of independent events A and B both occurring is then simply: P(A ∩ B) = P(A) P(B)
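As a quick sanity check on the dice and birthday calculations above, here is a small Python sketch (not part of the original slides; the function names are mine, and it assumes 365 equally likely birthdays) that computes each answer exactly via the complement trick and also estimates it by Monte Carlo simulation.

import random

def p_at_least_one_six(n_rolls=5):
    # Exact: complement of "no sixes in n_rolls", i.e. 1 - (5/6)^n
    return 1.0 - (5.0 / 6.0) ** n_rolls

def p_shared_birthday(n_people=30):
    # Exact: complement of "all n_people birthdays are distinct",
    # assuming 365 equally likely birthdays
    p_all_distinct = 1.0
    for i in range(n_people):
        p_all_distinct *= (365 - i) / 365.0
    return 1.0 - p_all_distinct

def simulate(event, n_trials=200000):
    # Monte Carlo estimate of P(event) from repeated independent trials
    return sum(event() for _ in range(n_trials)) / float(n_trials)

if __name__ == "__main__":
    print("at least one six, exact:     %.3f" % p_at_least_one_six())
    print("at least one six, simulated: %.3f" %
          simulate(lambda: any(random.randint(1, 6) == 6 for _ in range(5))))
    print("shared birthday, exact:      %.3f" % p_shared_birthday(30))
    print("shared birthday, simulated:  %.3f" %
          simulate(lambda: len(set(random.randint(1, 365) for _ in range(30))) < 30))

With this many trials the simulated estimates should agree with the exact values (about 0.598 for the dice example and about 0.706 for a 30-person class) to roughly two decimal places.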