The Menu Bar

Lecture 17: Dis-course
• Administrivia
• Boxing ourselves in with dis-course
• Adding reasoning
• If time: language learnability – the basic formal results

Professor Robert C. Berwick
[email protected]
6.863 Spring 2010 Lecture 17

DRT: Informal semantics
• When is a DRS satisfied in a model? If and only if it is an accurate picture of the information encoded in the model
• The following DRS should be satisfied iff discourse referents x and y can be embedded (i.e., associated with entities in the model) such that:
1. the first entity is a woman
2. the second is a boxer
3. the first stands in the admires relation to the second
• So this is: “a woman admires a boxer”

DRT: Semantics for complex conditions
• Negated DRSs: satisfied if it is not possible to find the picture inside the model
• Disjunctive DRSs: satisfied if at least one of these pictures can be embedded
• But what about implicational conditions?
• Satisfied iff every embedding of the antecedent also embeds the consequent
• “Every woman loves a boxer”
• So, discourse referents introduced in antecedents of conditionals are universally quantified
• Other discourse referents are existentially quantified

DRT: Accessibility
• Accessibility (the property valid antecedents have with respect to pronouns) is a geometrical concept:
• It depends on how DRSs are contained within each other, in part based on a parse tree
• The discourse referents of a DRS K1 are accessible from K2 when:
• K1 subordinates K2, or
• K1 equals K2

Parse tree geometry, accessibility, and pronouns
• Bush likes him (Bush cannot = him)
• Bush thinks that Clinton likes him (Bush can = him)
• What is the constraint about ‘accessibility’?
• Essentially, ‘environment frames’ (the boxes) say where a pronoun must be ‘free’ – read from parse trees as ‘c-command’ (recall ‘displaced’ elements)
• The constraint is that a pronoun must be locally free, and can be bound from a name outside its frame

DRT: Subordination
• DRS K1 subordinates K2 if and only if:
• K1 contains a condition of the form ¬K2
• K1 contains a condition of the form K2 ⇒ K, where K is some DRS
• K1 ⇒ K2 is a condition in some DRS K
• K1 contains a condition of the form K2 ∨ K or K ∨ K2, where K is some DRS
• K1 subordinates some DRS K, and K subordinates K2
• In short: look up, and with implication, look left.

Building DRSs
• DRSs can be constructed from parse trees in much the same way as formulas of FOL
• The lambda calculus can serve as a glue language:
• Extend the DRS language with the lambda operator, functional application, and a DRS-merge operator +
• Build representations bottom-up
• Linear representation in NLTK: ([x, y], [rocky(x), moose(y), has(x,y)])

Building discourse structures in NLTK

>>> from nltk.sem.drt import *
>>> dp = nltk.DrtParser()
>>> drs1 = dp.parse('([x, y], [rocky(x), moose(y), has(x, y)])')
>>> print drs1
([x,y],[rocky(x), moose(y), has(x,y)])
>>> print drs1.fol()
exists x y.((rocky(x) & moose(y)) & has(x,y))
>>> drs4 = dp.parse('([x, y], [rocky(x), moose(y), has(x, y)])')
>>> drs5 = dp.parse('([u, z], [PRO(u), natasha(z), bite(u, z)])')
>>> drs6 = drs4 + drs5
>>> print drs6.simplify()
([x,y,u,z],[rocky(x), moose(y), has(x,y), PRO(u), natasha(z), bite(u,z)])
>>> print drs6.simplify().resolve_anaphora()
([x,y,u,z],[rocky(x), moose(y), has(x,y), (u = [x,y,z]), natasha(z), bite(u,z)])

And finally a DRT parse…

>>> from nltk.parse import load_earley
>>> parser = load_earley('grammars/book_grammars/drt.fcfg', logic_parser=nltk.DrtParser())
>>> trees = parser.nbest_parse('Rocky has a
moose'.split())
>>> print trees[0].node['sem'].simplify()
([x,z2],[Rocky(x), moose(z2), has(x,z2)])

That’s all well and good, but what is this useful for?

The DRT grammar & a piece of a parse
• Original cf/semantic rule for ‘a’:
Det[num=sg,SEM=<\P Q. exists x.(P(x) & Q(x))>] -> 'a'
• Corresponding DRT rule:
Det[num=sg,SEM=<\P Q.([x],[]) + P(x) + Q(x)>] -> 'a'
(recall that DRT assumes the existential is there)
• Example subtree for ‘a moose’:
(NP[num='sg', SEM=<\Q.(([x],[moose(x)]) + Q(x))>]
  (Det[num='sg', SEM=<\P Q.((([x],[]) + P(x)) + Q(x))>] a)
  (Nom[num='sg', SEM=<\x.([],[moose(x)])>]
    (N[num='sg', SEM=<\x.([],[moose(x)])>] moose)))
• How does this work? Apply Det as a function to Nom as argument, i.e.,
Apply: \P Q.((([x],[]) + P(x)) + Q(x))
to: \x.([],[moose(x)])
yields: \Q.(([x],[]) + ([],[moose(x)]) + Q(x))
which simplifies to: \Q.(([x], [moose(x)]) + Q(x))

The Language use domain
• As inference tasks:
• Querying
• Consistency checking
• Informativity checking (what is this?)
• Plus we need to do multiple sentences…
• In short, we need something to handle this…

Ah, we must be able to reason!
• So, we need to know about presuppositions
• Because presupposition relates to consistency and informativity (remember the Spock episode?)
• You say you don’t remember the Spock episode? Well, let me remind you…
• Kirk: Spock, how many Romulans are there in Sector 53?
• Spock: None, Captain
• Kirk: You are sure?
• Spock: Affirmative
<Camera rocks violently>
• Kirk: Damn your Vulcan ears, Spock! I thought you said there were no Romulans in Sector 53!!!!
• Spock: But Captain, there is no Sector 53. Logic dictates….

What are presuppositions?
• Information a speaker expects to be taken for granted:
• The couple that won the dance contest was pleased.
• Jody loves her husband.
• Vincent regrets that Mia is married.
• These sequences are unacceptable:
• Jody has no husband.
?Jody loves her husband.
• Mia is not married. ?Vincent regrets that Mia is married.

Presupposition is not entailment
• Presupposition behaves differently from ordinary entailment
• Both:
• Jody loves her husband.
and its negation:
• Jody does not love her husband.
presuppose that Jody has a husband

Presupposition triggers
• Presuppositions are triggered by lexical items or syntactic constructions:
• the definite article: the
• factive verbs: regret, resent
• possessive pronouns (her), aspectual verbs (begin), iteratives (again), temporal subordinate clauses (when), quantifiers (every), and many more
• These lexical items and constructions are called presupposition triggers
• The use of a presupposition trigger introduces a presupposition that constrains the context

The binding problem
• The following sentence presupposes that someone owns an apartment:
• A boxer nearly escaped from his apartment
• But it is not just anyone, nor just any boxer, who is presupposed to own an apartment
• It is the boxer we are actually talking about who is presupposed to own the apartment
• The existentially quantified NP binds both the asserted boxer and the presupposed apartment

Accommodation
• The following sentence presupposes that Vincent has a boss:
• Vincent informed his boss
• Or: Spock, how many Romulans are in sector alpha3?
• What happens when the hearer has no information about whether Vincent has a boss (i.e., about whether the presupposition is true)?
• Simplistic solution: Reject the utterance (Spock!)
• More robust solution: Try to add the presupposition to the context
• The process of adding a presupposition is called accommodation

Presupposition as anaphora
• A presupposition trigger introduces a DRS containing the content of the presupposition
• Presuppositional DRSs are introduced locally, in the same DRS as the trigger, and marked with an α
• A presuppositional DRS (or elementary presupposition) must be resolved, either by:
• Binding: matching the content of an accessible, compatible antecedent
• Accommodation: adding its content to an accessible location (locally or to a superordinate DRS)
• Accommodation is subject to a number of constraints

Presupposition as anaphora: A resolution algorithm
1. Generate a DRS for the sentence with α-marked elementary presuppositions located together with their triggers. Merge this DRS with the discourse DRS so far.
2. Traverse the merged DRS, and on encountering an α-marked DRS try to:
a) (Partially) match the presupposed information to an accessible antecedent
b) Alternatively, add the presupposed information to the local DRS or to a superordinate DRS
3. After generating all possible readings, filter DRSs that violate acceptability constraints

Presupposition as anaphora: A simple binding example
• A waitress serves Vincent a coffee. Vincent lights up a cigarette. The waitress leaves.
[Figures showing resolution Steps 1–3 omitted]

A lot goes on behind the scenes… Non-determinism
• The algorithm generates a number of possible readings (exponential in the number of presuppositions) – we need constraints to filter out all of these…!
• The waitress as coffee
• The waitress as Vincent
• The waitress as cigarette
• A new waitress
• These readings are filtered out by step 3

So the constraints in general
• Will require some sort of reasoning system, however crude (i.e., that you cannot set waitresses on fire…)
• It will also require some sort of WordNet-like notion of ontology…
• Further, responses to questions in general will require some reasoning capability (as in the Star Trek example)
• Let’s cover some of the ‘conversational maxims’ based on these notions

Baby Bush
• Computes semantic representations and different quantifier scopings (and we already know how hard this can be…!)
• Accumulates information over sentences
• But… is very… uhm, still not the sharpest knife in the drawer

The Bush hierarchy
• Baby Bush: no inference capabilities
• Clever Bush: negative consistency checks (sophisticated prover)
• Sensitive Bush: negative and positive informativity checks
• Scrupulous Bush: eliminating superfluous readings
• Knowledgeable Bush: adding background knowledge
• Helpful Bush: question answering
• We start with the tabula rasa Bush: no inference capabilities. So, even though it can parse and form FOL semantics and do all the quantifier scoping, it will act dumb

Baby Bush
There are WMD in Iraq
Bush: OK
There are no WMD in Iraq
Bush: OK
Condi is a woman; Cheney does not like any woman
Bush: OK
Cheney likes Condi
Bush: OK
What’s missing? No consistency or informativity checks. Let’s define these.

What must be added: Inferential constraints
• Be consistent:
• Jody is married. Her husband is a dealer.
• Jody is not married. #Her husband is a dealer.
• The consistency constraint clearly helps to filter out nonsensical readings.
• Be informative:
• Jody is a boxer. #Jody is a boxer.
• Mia is married. #She has a husband.
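The consistency constraint above can be illustrated with a toy check. This is only a sketch, not the lecture's NLTK machinery: it uses two made-up propositional atoms and a hand-written piece of background knowledge, and tests consistency by brute-force enumeration of truth assignments (the "satisfied in at least one model" definition).

```python
# Sketch: brute-force propositional consistency check over a tiny
# vocabulary. Atom names and the background rule are illustrative.
from itertools import product

ATOMS = ["married", "has_husband"]

def background(m):
    # World knowledge (assumed): having a husband implies being married.
    return (not m["has_husband"]) or m["married"]

def consistent(facts):
    """True iff some truth assignment satisfies background + all facts."""
    for values in product([True, False], repeat=len(ATOMS)):
        m = dict(zip(ATOMS, values))
        if background(m) and all(f(m) for f in facts):
            return True
    return False

# "Jody is married. Her husband is a dealer." -- fine
ok = consistent([lambda m: m["married"], lambda m: m["has_husband"]])
# "Jody is not married. #Her husband is a dealer." -- filtered out
bad = consistent([lambda m: not m["married"], lambda m: m["has_husband"]])
print(ok, bad)  # True False
```

For first-order logic this enumeration is of course impossible in general (consistency checking is undecidable), which is exactly why real systems pair theorem provers with bounded model builders, as discussed below.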
Consistency checking
• A formula is consistent if it is satisfied in at least one model.
• Consistent formulas describe “conceivable” or “possible” states of affairs
• A formula that is not consistent is called inconsistent.
• Inconsistent formulas describe “inconceivable” or “impossible” states of affairs
• A finite set of formulas is consistent if the conjunction of the formulas is consistent.
• Consistency checking is undecidable for first-order logic.

So What More Do We Need?
1. Translate the English text and query to a representation in a logical language (viz., typed lambda calculus)
• Representation of the text’s literal meaning
2. Check for logical consistency and coherence of the representation
• To catch errors in translation
3. Combine with a representation of general world knowledge
4. Run automated reasoning tools to get an answer

Informativity checking
• A valid sentence is a sentence that is true in all models.
• What does validity have to do with informativity?
• We often call valid sentences uninformative and invalid sentences informative. Why?
• If φ1… φn ⊨ ψ, we say that ψ is uninformative wrt φ1… φn (it doesn’t add any new knowledge)
• If φ1… φn ⊭ ψ, we say that ψ is informative wrt φ1… φn
• Validity testing is also undecidable for first-order logic

Informativity

>>> dt.add_sentence('A person dances', informchk=True)
Sentence 'A person dances' under reading 'exists x.(person(x) & dance(x))':
Not informative relative to thread 'd0'

Now we have clever Bush
Bush is the President
Bush: OK
Bush is not the President
Bush: That is not OK
It is good to be king!
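The informativity test can be reduced to the consistency test: ψ is informative relative to φ1… φn iff {φ1,…, φn, ¬ψ} is consistent (the negation was still a live possibility). Here is a propositional stand-in for that reduction, again a sketch with invented atom names rather than the first-order tools NLTK actually calls:

```python
# Sketch: informativity via consistency of (priors + NOT psi).
# Atom names are illustrative, not from any real knowledge base.
from itertools import product

ATOMS = ["wmd_in_iraq", "bush_is_president"]

def consistent(facts):
    for values in product([True, False], repeat=len(ATOMS)):
        m = dict(zip(ATOMS, values))
        if all(f(m) for f in facts):
            return True
    return False

def informative(priors, psi):
    # psi is informative wrt priors iff priors + (not psi) has a model.
    return consistent(priors + [lambda m: not psi(m)])

prior = [lambda m: m["bush_is_president"]]
# Repeating a known fact is uninformative...
print(informative(prior, lambda m: m["bush_is_president"]))  # False
# ...but a logically independent statement is informative.
print(informative(prior, lambda m: m["wmd_in_iraq"]))        # True
```

This is the "dumb exchange" filter in miniature: repeating "Bush is the President" fails the test, while a genuinely new claim passes.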
Back to the press conference
• Consistency check removes ‘dumb’ interchanges

Clever Bush
Cheney is a Republican
Bush: OK
Bush likes every Republican
Bush: OK
Bush does not like Cheney
Bush: That is inconsistent

Model building
• Theorem provers check whether a formula or set of formulas is valid (true in all possible models)
• Model builders attempt to construct a model for a formula (or set of formulas), and so show that the formula is satisfiable (true in at least one possible model)
• So – we must limit model builders to a finite domain size… (cf. the ‘closed world’ assumption in AI), else this is unsolvable in general
• Uncertainty: if you don’t find a model, you don’t know whether the formula is satisfiable or not… but
• If you do find one, you can be sure the formula is satisfiable

Sensitive Bush: Informativity
• “Make your contribution as informative as is required (for the current purposes of the exchange).” H. P. Grice.
• Suggests the need for informativity checking – indeed, for informativity checking against background knowledge

Relationship between consistency and informativity
• ψ is informative iff ¬ψ is consistent:
• Informativity means that the negation was also a possibility, so you have found out something new
• ψ is informative wrt φ1… φn iff {φ1,…, φn, ¬ψ} is consistent
• ψ is uninformative iff ¬ψ is inconsistent:
• Uninformativity means that the negation simply was not a possibility (cf., “there are no Romulans because there is no Sector 53”)
• ψ is uninformative wrt φ1… φn iff {φ1,…, φn, ¬ψ} is inconsistent

Informativeness
• [Negative test] Lexical knowledge ∪ World knowledge ∪ Discourse-so-far ⇒ φ
• [Positive test] Lexical knowledge ∪ World knowledge ∪ Discourse-so-far ∪ {¬φ} has a model

That will get rid of dumb exchanges like this
Bush is not the President (ψ)
Bush: OK
Bush is not the President (φ)
(system now tries to prove ψ ⇒ φ, and easily wins)
Bush: I already know that!
(Clearly, this also will fix cases like: Bush likes every Republican; Cheney is a Republican; Bush likes Cheney – the last is uninformative)

What about these?
A Clinton loves a woman
• There are 2 quantifier scopings, but these are logically equivalent, so really there is only one distinct reading.

Knowledgeable Bush
• Assume: KB is a formula (the conjunction of a set of formulas) containing relevant background knowledge.
• Consistency:
• Let φ be the translation of a DRS to FOL
• Use first-order tools to check whether KB ∧ φ is consistent
• Informativity:
• Let ψ be the formula KB ∧ OLD, where OLD is the translation of the old DRS to FOL
• Let φ be the translation of the new DRS to FOL
• Use first-order tools to check whether φ is informative wrt ψ

[Figure: WordNet fragment showing hypernym (‘above’) and hyponym (‘below’) relations omitted]

NLTK: consistency & informativity checking

How close are we now to this?

>>> dt = nltk.DiscourseTester(['A student dances', 'Every student is a person'])
>>> dt.readings()
s0 readings: s0-r0: exists x.(student(x) & dance(x))
s1 readings: s1-r0: all x.(student(x) -> person(x))
>>> dt.add_sentence('No person dances', consistchk=True)
Inconsistent discourse d0 ['s0-r0', 's1-r0', 's2-r0']:
s0-r0: exists x.(student(x) & dance(x))
s1-r0: all x.(student(x) -> person(x))
s2-r0: -exists x.(person(x) & dance(x))
>>> dt.retract_sentence('No person dances', verbose=True)
Current sentences are
s0: A student dances
s1: Every student is a person

Not even close: the real Turing test
As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don’t know.
—Donald Rumsfeld, Feb. 12, 2002, Department of Defense news briefing

Language learning: the formal analysis
• What are the main results?
• Why is it important to know them?

Learning: Observe some values of a function
Guess the whole function
Another guess: Just as good?
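The function-guessing slides can be made concrete with the polynomial analogy used later in the lecture: finitely many observations never pin down the function unless you know its order in advance. Below is a small sketch (my own illustration, not from the slides) in which two different polynomials agree on every observed point yet diverge on unseen input:

```python
# Sketch: two polynomials that fit the same finite data perfectly.
def f(x):
    # A quadratic: f(x) = x^2
    return x * x

def g(x):
    # A cubic that interpolates the same three points:
    # g(x) = x^2 + x(x-1)(x-2), which vanishes at x = 0, 1, 2.
    return x * x + x * (x - 1) * (x - 2)

data = [0, 1, 2]                          # the observed inputs
agree = all(f(x) == g(x) for x in data)   # both fit the data exactly
print(agree, f(3), g(3))                  # True 9 15 -- they differ at x = 3
```

Both guesses are "just as good" on the evidence; only prior knowledge of the polynomial's order (two points determine a line, three a quadratic, and so on) breaks the tie.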
Perfect fit to perfect, incomplete data
• More data needed to decide

Imperfect fit to noisy data
• Will an ungrammatical sentence ruin baby forever? (yes, under a conservative strategy...)
• Or can baby figure out which data to (partly) ignore?

Poverty of the Stimulus
• Never enough input data to completely determine the polynomial…
• Always have infinitely many possibilities
• …unless you know the order of the polynomial ahead of time.
• 2 points determine a line
• 3 points determine a quadratic
• etc.
• In language learning, is it enough to know that the target language is generated by a CFG?
• without knowing the size of the CFG?

Poverty of the Stimulus (1955 on)
Chomsky (and the Cartesian tradition): Just like polynomials – never enough data unless you know something in advance. So kids must be born knowing what to expect in language.
• The guy is happy / is the guy happy
• The guy who is happy is tired /

Language learning: What kind of evidence?
• Children listen to language [unsupervised, ‘unlabeled’ data]
• Children are corrected?? [supervised, ‘labeled’ data]
• Children observe language in context
• Children observe frequencies of language
Remember: Language = set of strings (but we can generalize this)

Gold’s Theorem (1967)
A simple, but very powerful negative result along these lines: kids (or computers) can’t learn much without supervision or a priori knowledge
• Children listen to language
• Children are corrected?? – NOT! (is this true?)
• Children observe language in context

“Mother, I’d rather do it myself”: The classic example
• In controlled studies (from Brown, 1967 on): children are corrected for semantically malformed sentences, but not really all that often for syntactic ones (why?)
• When they are corrected for (usually minor morphological) overgeneralizations, e.g., goed/went, they (surprise) don’t usually listen
• Child: Nobody don’t like me
• Mother: No, say, “nobody likes me”
• Child: Nobody don’t like me
• Mother: No, say, “nobody likes me”
… (goes on 12 more times)
• Mother: OK, one last time: say, “nobody likes me”
• Child: Oh! Nobody don’t likes me!

But what about this kind of indirect negative evidence?
• If you haven’t heard a certain constraint/construction after a certain time, you’ll never hear it (because it’s not part of your target language/grammar)
• Example: suppose you hear, “I am happy” a thousand times
• But never “He am happy”
• Can we infer from (comparatively) low frequency to ungrammaticality? Tricky!
• Note that low probability on its own cannot imply ungrammaticality: if there are infinitely many grammatical sentences, then there cannot be a lower bound on their probability
• Because: if every grammatical sentence had probability at least ε, then there could be at most 1/ε grammatical sentences, which implies the language is finite

What about this idea?
• To ensure learnability: though particular sentences might overlap for 2 different languages, and even the frequencies (probabilities) of the 2 sentences might be the same, the distributions of two languages cannot overlap (this is a constraint to be hypothesized – it might be true or not)
• Otherwise it is hard to see how the learner could distinguish between L1 and L2
• We’ll come back to this… for now, let’s consider the basic case, and assume no indirect negative evidence

The Idealized Situation
• Babysitter talks; Baby listens
1. Babysitter outputs a sentence
2. Baby hypothesizes what the language is (given all sentences so far)
3. Go to step 1
• Guarantee: Babysitter’s language is in the set of hypotheses that Baby is choosing among
• Guarantee: Any sentence of Babysitter’s language is eventually uttered by Babysitter (even if there are infinitely many)
• Assumption: Vocabulary (or alphabet) is finite.
Can Baby learn under these conditions?

The real situation
two same way " little boy's mitten two good ones . twelve fourteen o'clock post office mail oh # your rope tricks little red spots little fish house oh poor little fellow your washing maxne some child outside . two seals # one strong man . one bareback rider . little soda water same just like what ? some water juice ? green stamp basket some pretty work ? two street lights one more what dirty water germs red street light . three dozen milk yes # and one dozen eggs . your dozen what one dozen eggs two little birdies ? your automatic rifle . two dirty fingers # but I don't think they're broken . dirty dirty fingers

Languages vs. Grammars
• Does Baby have to get the right grammar? (E.g., does VP have to be called VP?)

Learning in the limit
• There is some point at which Baby’s hypothesis is correct and never changes again. Baby has converged!
• Baby doesn’t have to know that it’s reached this point – it can keep an open mind about new evidence – but if its hypothesis is right, no such new evidence will ever come along.
• A family C of languages is learnable in the limit if one could construct a perfect C-Baby that can learn any language L ∈ C in the limit from a Babysitter who speaks L.
• Baby knows the class C of possibilities, but not L.
• Assumption: finite vocabulary.
• Is there a perfect finite-state Baby? Is there a perfect context-free Baby?

Conservative Strategy
• Baby’s hypothesis should always be the smallest language consistent with the data
• Works for finite languages? Let’s try it…
• Language 1: {aa,ab,ac}
• Language 2: {aa,ab,ac,ad,ae}
• Language 3: {aa,ac}
• Language 4: {ab}

Babysitter: aa ab ac ab aa …
Baby:       L3 L1 L1 L1 L1 …

Evil Babysitter
• To find out whether Baby is perfect, we have to see whether it gets 100% correct even in the most adversarial conditions
• Assume Babysitter is trying to fool Baby
• although she must speak only sentences from L
• and she must eventually speak each such sentence
• Does Baby’s strategy work?

Babysitter: aa ab ac ab aa ae ad …
Baby:       L3 L1 L1 L1 L1 …

An Unlearnable Family of Languages
• Family of languages: let Ln = set of all strings of length < n
• What is L0? What is L1? What is L∞?
• Our family is C = {L0, L1, …, L∞}
• If the true language is L∞, can Babysitter really follow the rules? She must eventually speak every sentence of L∞. Possible?
Yes: ε; a, b; aa, ab, ba, bb; aaa, aab, aba, abb, baa, …

An Unlearnable Family of Languages
• Our family is C = {L0, L1, …, L∞}
• A perfect C-Baby will distinguish among all of these depending on the input.
• But there is no perfect C-Baby…

An Unlearnable Family
• Suppose Baby adopts the conservative strategy, always picking the smallest possible language in C.
• So if Babysitter’s longest sentence so far has 75 words, Baby’s hypothesis is L76.
• This won’t always work: what language can’t a conservative Baby learn?

An Unlearnable Family
• Could a non-conservative Baby be a perfect C-Baby, and eventually converge to any of these?
• Claim: any perfect C-Baby must be “quasi-conservative”:
• If the true language is L76, and Baby posits something else, Baby must still eventually come back and guess L76 (since it’s perfect).
• So if the longest sentence so far is 75 words, and Babysitter keeps talking from L76, then eventually Baby must actually return to the conservative guess L76.
• Agreed?

Babysitter’s Revenge
• If the longest sentence so far is 75 words, and Babysitter keeps talking from L76, then eventually a perfect C-Baby must actually return to the conservative guess L76.
• Suppose the true language is L∞. Evil Babysitter can prevent our supposedly perfect C-Baby from converging to it.
• If Baby ever guesses L∞, say when the longest sentence is 75 words:
• Then Evil Babysitter keeps talking from L76 until Baby capitulates and revises her guess to L76 – as any perfect C-Baby must.
• So Baby has not stayed at L∞ as required.
• Then Babysitter can go ahead with longer sentences. If Baby ever guesses L∞ again, she plays the same trick again.
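The conservative half of this argument is easy to simulate. The sketch below (my illustration, not course code) runs a conservative Baby on a text for L∞ over the alphabet {a, b}: every new longest sentence forces a new guess Ln, so the guess grows without bound and the learner never converges.

```python
# Sketch: a conservative learner for C = {L0, L1, ..., L_inf}, where
# Ln = all strings of length < n, fed a text for L_inf.
from itertools import product

def conservative_guess(sentences):
    """Smallest Ln consistent with the data: n = (longest length) + 1."""
    return max((len(s) for s in sentences), default=0) + 1

def text_for_L_inf(max_len):
    """Babysitter speaking L_inf: eventually every string over {a, b}."""
    for n in range(max_len + 1):
        for tup in product("ab", repeat=n):
            yield "".join(tup)

seen, guesses = [], []
for s in text_for_L_inf(4):
    seen.append(s)
    guesses.append(conservative_guess(seen))

# Baby's guess keeps climbing (1, 2, 2, 3, ...): no convergence in sight.
print(sorted(set(guesses)))  # [1, 2, 3, 4, 5]
```

The mind-change count grows with every new sentence length, which is the unlearnability result in miniature: on L∞ the conservative Baby changes its mind infinitely often, and by the quasi-conservativity claim above, any perfect C-Baby can be forced into the same behavior.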
Implications
• We found that C = {L0, L1, …, L∞} isn’t learnable in the limit.
• How about the class of finite-state languages?
• Not unless you limit it further (e.g., # of states)
• After all, it includes all languages in C, and more, so the learner has a harder choice!
• How about the class of context-free languages?
• Not unless you limit it further (e.g., # of rules)

Is this too ‘adversarial’?
• Should we assume Babysitter is evil? Maybe more like Google….
• Perhaps Babysitter isn’t trying to fool the baby – not an adversarial situation

Punchline
• What about the class of probabilistic CFGs?
• Suppose Babysitter has to output sentences randomly with the appropriate probabilities (what does that mean?)
• Is s/he then unable to be too evil?
• Are there then perfect Babies that are guaranteed to converge to an appropriate probabilistic CFG?
• I.e., from hearing a finite number of sentences, Baby can correctly converge on a grammar that predicts an infinite number of sentences…
• But only if Baby knows the distribution function of the sentences a priori (Angluin)
• Even then, what is the complexity (# examples, time)?
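By contrast with the negative result, when Baby knows the class in advance and the class is suitably limited, the conservative strategy does converge. The sketch below (my illustration) reruns the four finite languages from the Conservative Strategy slide: on a text for Language 1, the smallest-consistent-language guesser settles on the right answer and never changes again.

```python
# Sketch: conservative learning succeeds on the slide's finite class.
# The four languages from the Conservative Strategy slide, smallest first.
C = [
    {"ab"},                           # Language 4
    {"aa", "ac"},                     # Language 3
    {"aa", "ab", "ac"},               # Language 1
    {"aa", "ab", "ac", "ad", "ae"},   # Language 2
]

def guess(seen):
    """Smallest language in C (by the ordering above) covering the data."""
    for lang in C:
        if seen <= lang:
            return lang
    return None

text = ["aa", "ab", "ac", "ab", "aa"]  # Babysitter speaks Language 1
seen, final = set(), None
for s in text:
    seen.add(s)
    final = guess(seen)

print(final == {"aa", "ab", "ac"})  # True: converged to Language 1
```

This mirrors the slide's table: the guess starts at Language 3 after hearing "aa", jumps to Language 1 once "ab" arrives, and stays there, which is exactly identification in the limit for this finite class.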