Lecture 17: Dis-course

The Menu Bar
• Administrivia
• Boxing ourselves in with dis-course
• Adding reasoning
• If time: language learnability – the basic formal results

Professor Robert C. Berwick
[email protected]
6.863 Spring 2010 Lecture 17
DRT: Informal semantics
• When is a DRS satisfied in a model?
• If and only if it is an accurate picture of the information encoded in the model
• The following DRS should be satisfied iff discourse referents x and y can be embedded (i.e., associated with entities in the model) such that:
  1. the first entity is a woman
  2. the second is a boxer
  3. the first stands in the admires relation to the second
• So this is: “a woman admires a boxer”

DRT: Semantics for complex conditions
• Negated DRS: satisfied if it is not possible to find the picture inside the model
• Disjunctive DRSs: satisfied if at least one of these pictures can be embedded
• But what about implicational conditions?
• Satisfied iff every embedding of the antecedent also embeds the consequent: “Every woman loves a boxer”
• So, discourse referents introduced in antecedents of conditionals are universally quantified
• Other discourse referents are existentially quantified
DRT: Accessibility
• Accessibility—the property valid antecedents
have with respect to pronouns—is a geometrical
concept:
• It depends on how DRSs are contained within
each other, in part based on a parse tree
• The discourse referents of a DRS K1 are
accessible from K2 when:
• K1 subordinates K2, or
• K1 equals K2
Parse tree geometry, accessibility, and pronouns
• Bush likes him (Bush cannot = him)
• Bush thinks that Clinton likes him (Bush can = him)
What is the constraint about ‘accessibility’?
Essentially, ‘environment frames’ (the boxes) say where a pronoun must be ‘free’ – read from parse trees as ‘c-command’ (recall ‘displaced’ elements)
The constraint is that a pronoun must be locally free, and can be bound by a name outside its frame
DRT: Subordination
• DRS K1 subordinates K2 if and only if:
• K1 contains a condition of the form ¬K2
• K1 contains a condition of the form K2 ⇒ K, where K is
some DRS
• K1 ⇒ K2 is a condition in some DRS K
• K1 contains a condition of the form K2 ∨ K or K ∨ K2, where K is some DRS
• K1 subordinates some DRS K, and K subordinates K2
• In short: look up, and with implication, look left.
Building DRSs
• DRSs can be constructed from parse trees in much the
same way as formulas of FOL
• The lambda calculus can serve as a glue language:
• Extend the DRS language with the lambda operator,
functional application, and a DRS-merge operator +
• Build representations bottom-up
• Linear representation in NLTK:
([x, y], [rocky(x), moose(y), has(x,y)])
Building discourse structures in NLTK
>>> from nltk.sem.drt import *
>>> dp = nltk.DrtParser()
>>> drs1 = dp.parse('([x, y], [rocky(x), moose(y), has(x, y)])')
>>> print drs1
([x,y],[rocky(x), moose(y), has(x,y)])
>>> print drs1.fol()
exists x y.((rocky(x) & moose(y)) & has(x,y))
>>> drs4 = dp.parse('([x, y], [rocky(x), moose(y), has(x, y)])')
>>> drs5 = dp.parse('([u, z], [PRO(u), natasha(z), bite(u, z)])')
>>> drs6 = drs4 + drs5
>>> print drs6.simplify()
([x,y,u,z],[rocky(x), moose(y), has(x,y), PRO(u), natasha(z), bite(u,z)])
>>> print drs6.simplify().resolve_anaphora()
([x,y,u,z],[rocky(x), moose(y), has(x,y), (u = [x,y,z]), natasha(z), bite(u,z)])
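The merge-and-resolve behavior above can be sketched without NLTK. Here is a toy reimplementation, a sketch of the idea only: the tuple representation, `merge`, and `resolve_anaphora` below are my own illustration, not NLTK's actual data structures.

```python
# Toy DRS: a pair (referents, conditions); DRS-merge (the '+' above)
# is just concatenation of the two halves.

def merge(drs1, drs2):
    """DRS-merge: union the referents and the conditions."""
    refs1, conds1 = drs1
    refs2, conds2 = drs2
    return (refs1 + refs2, conds1 + conds2)

def resolve_anaphora(drs):
    """Replace each PRO(u) condition with the list of possible
    antecedents: every other referent in the (flat) DRS."""
    refs, conds = drs
    out = []
    for cond in conds:
        if cond.startswith('PRO('):
            u = cond[4:-1]
            candidates = [r for r in refs if r != u]
            out.append('(%s = %s)' % (u, candidates))
        else:
            out.append(cond)
    return (refs, out)

drs4 = (['x', 'y'], ['rocky(x)', 'moose(y)', 'has(x,y)'])
drs5 = (['u', 'z'], ['PRO(u)', 'natasha(z)', 'bite(u,z)'])
drs6 = merge(drs4, drs5)
print(resolve_anaphora(drs6))
```

As in the NLTK session, the pronoun `u` ends up paired with the accessible referents `x`, `y`, `z`; picking among them is the job of the later filtering constraints.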
And finally a DRT parse…
>>> from nltk.parse import load_earley
>>> parser = load_earley('grammars/book_grammars/drt.fcfg',
logic_parser=nltk.DrtParser())
>>> trees = parser.nbest_parse('Rocky has a moose'.split())
>>> print trees[0].node['sem'].simplify()
([x,z2],[Rocky(x), moose(z2), has(x,z2)])
That’s all well and good, but what is this useful for?
The DRT grammar & a piece of a parse
Original cf/semantic rule for ‘a’:
Det[num=sg,SEM=<\P Q. exists x.(P(x) & Q(x))>] -> 'a'
Corresponding DRT rule:
Det[num=sg,SEM=<\P Q.([x],[]) + P(x) + Q(x)>] -> 'a'
(recall that DRT assumes the existential is there)
Example subtree for ‘a moose’:
(NP[num='sg', SEM=<\Q.(([x],[moose(x)]) + Q(x))>]
(Det[num='sg', SEM=<\P Q.((([x],[]) + P(x)) + Q(x))>] a)
(Nom[num='sg', SEM=<\x.([],[moose(x)])>]
(N[num='sg', SEM=<\x.([],[moose(x)])>] moose)))
How does this work? Apply Det as a fn to Nom as arg, i.e.,
Apply: \P Q.((([x],[]) + P(x)) + Q(x)) to: \x.([],[moose(x)]) yields:
\Q.(([x],[]) + ([],[moose(x)]) + Q(x)) which simplifies to:
\Q.(([x], [moose(x)]) + Q(x))
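This function application can be mimicked with plain Python closures, representing a DRS as a (referents, conditions) pair and + as merge. This is a sketch of the composition step only, not the NLTK machinery; the VP meaning at the end is an invented example to close off the derivation.

```python
def merge(d1, d2):
    """DRS-merge: concatenate referents and conditions."""
    return (d1[0] + d2[0], d1[1] + d2[1])

# \P Q.((([x],[]) + P(x)) + Q(x)) -- the DRT entry for 'a'
def det_a(P):
    return lambda Q: merge(merge((['x'], []), P('x')), Q('x'))

# \x.([],[moose(x)]) -- the Nom 'moose'
nom_moose = lambda v: ([], ['moose(%s)' % v])

# Det applied to Nom: effectively \Q.(([x],[moose(x)]) + Q(x))
np_a_moose = det_a(nom_moose)

# Supplying a (hypothetical) VP meaning finishes the derivation:
vp = lambda v: ([], ['has(rocky,%s)' % v])
print(np_a_moose(vp))   # (['x'], ['moose(x)', 'has(rocky,x)'])
```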
The Language use domain
• As inference tasks:
• Querying
• Consistency checking
• Informativity checking (what is this?)
• Plus we need to do multiple sentences…
• In short, we need something to handle this…
Ah, we must be able to reason!
• So, we need to know about presuppositions
• Because presupposition relates to consistency and informativity (remember the Spock episode?)
• You say you don’t remember the Spock episode?
• Well, let me remind you…
Kirk: Spock, how many Romulans are there in Sector 53?
Spock: None, Captain
Kirk: You are sure?
Spock: Affirmative
<Camera rocks violently>
Kirk: Damn your Vulcan ears, Spock! I thought you said there were no Romulans in Sector 53!!!!
Spock: But Captain, there is no Sector 53. Logic dictates….
What are presuppositions?
• Information a speaker expects to be taken for granted:
• The couple that won the dance contest was pleased.
• Jody loves her husband.
• Vincent regrets that Mia is married.
• These sequences are unacceptable:
• Jody has no husband. ?Jody loves her husband.
• Mia is not married. ?Vincent regrets that Mia is
married.
Presupposition is not entailment
• Presupposition behaves differently from ordinary
entailment
• Both:
• Jody loves her husband.
and its negation:
• Jody does not love her husband.
presuppose that Jody has a husband
Presupposition triggers
• Presuppositions are triggered by lexical items or syntactic constructions:
• the definite article: the
• factive verbs: regret, resent
• possessive pronouns (her), aspectual verbs (begin), iteratives (again), temporal subordinate clauses (when), quantifiers (every), and many more
• These lexical items and constructions are called presupposition triggers
• The use of a presupposition trigger introduces a presupposition that constrains the context

The binding problem
• The following sentence presupposes that someone owns an apartment:
• A boxer nearly escaped from his apartment
• But it is not just anyone — nor just any boxer — who is presupposed to own an apartment
• It is the boxer we are actually talking about who is presupposed to own the apartment
• The existentially quantified NP binds both the asserted boxer and the presupposed apartment
Accommodation
• The following sentence presupposes that Vincent has a boss:
• Vincent informed his boss
• Or: Spock, how many Romulans are in sector alpha3?
• What happens when the hearer has no information about whether Vincent has a boss (i.e., about whether the presupposition is true)?
• Simplistic solution: Reject the utterance (Spock!)
• More robust solution: Try to add the presupposition to the context
• The process of adding a presupposition is called accommodation
Presupposition as anaphora
• A presupposition trigger introduces a DRS containing the
content of the presupposition
• Presuppositional DRSs are introduced locally, in the
same DRS as the trigger and marked with an α
• A presuppositional DRS (or elementary presupposition)
must be resolved, either by:
• Binding: matching the content of an accessible,
compatible antecedent
• Accommodation: adding its content to an accessible
location (locally or to a superordinate DRS)
• Accommodation is subject to a number of constraints
Presupposition as anaphora:
A resolution algorithm
1. Generate the DRS for the sentence, with α-marked elementary presuppositions located together with their triggers. Merge this DRS with the discourse DRS so far.
2. Traverse the merged DRS, and on encountering an α-marked DRS try to:
  a) (Partially) match the presupposed information to an accessible antecedent
  b) Alternatively, add the presupposed information to the local DRS or to a superordinate DRS
3. After generating all possible readings, filter DRSs that violate acceptability constraints
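The binding-vs-accommodation choice in step 2 can be sketched for a single flat DRS. This is an illustration under strong simplifying assumptions (no embedded DRSs, no accessibility computation, a crude `types` table standing in for compatibility checking); the function names are mine, not from any presupposition-resolution library.

```python
def resolve(refs, conds, presup, types):
    """Generate readings for one alpha-marked presupposition.

    refs/conds: the merged DRS so far; presup: (referent, predicate),
    e.g. ('u', 'waitress'); types: known predicate of each referent.
    Binding equates the presupposed referent with a compatible
    antecedent; accommodation adds the presupposed content outright.
    """
    ref, pred = presup
    readings = []
    # Binding: one reading per compatible antecedent.
    for r in refs:
        if types.get(r) == pred:
            readings.append((refs, conds + ['%s = %s' % (ref, r)]))
    # Accommodation: add the presupposition to the (only) DRS.
    readings.append((refs + [ref], conds + ['%s(%s)' % (pred, ref)]))
    return readings

# "A waitress serves Vincent a coffee. ... The waitress leaves."
refs = ['x', 'y', 'z']
conds = ['waitress(x)', 'vincent(y)', 'coffee(z)', 'serve(x,y,z)']
types = {'x': 'waitress', 'y': 'vincent', 'z': 'coffee'}
for reading in resolve(refs, conds, ('u', 'waitress'), types):
    print(reading)
```

Without the compatibility check, the waitress could also be "resolved" to Vincent, the coffee, or the cigarette; those are exactly the readings step 3 must filter out.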
Presupposition as anaphora:
A simple binding example
• A waitress serves Vincent a coffee. Vincent lights up a cigarette. The waitress leaves.
(The slide shows the DRS construction in three steps.)
A lot goes on behind the scenes…
Non-determinism
• The algorithm generates a number of possible readings (exponential in the number of presuppositions) – we need constraints to filter out all of these…!
• Candidate resolutions for the waitress: as Vincent, as coffee, as cigarette, or as a new (accommodated) waitress
• The nonsensical readings are filtered out by step 3
So the constraints in general
• Will require some sort of reasoning system, however crude (i.e., that you cannot set waitresses on fire…)
• It will also require some sort of Wordnet-like notion of ontology…
• Further, responses to questions in general will require some reasoning capability (as in the Star Trek example)
• Let’s cover some of the ‘conversational maxims’ based on these notions
Baby Bush
• Computes semantic representations and different quantifier scopings (and we already know how hard this can be…!)
• Accumulates information over sentences
• But…is very….uhm, still not the sharpest knife in the drawer

The Bush hierarchy
• Baby Bush: No inference capabilities
• Clever Bush: negative consistency checks (sophisticated prover)
• Sensitive Bush: negative and positive informativity checks
• Scrupulous Bush: eliminating superfluous readings
• Knowledgeable Bush: adding background knowledge
• Helpful Bush: question answering
We start with the tabula rasa Bush: no inference capabilities. So, even though it can parse and form FOL semantics and do all the quantifier scoping, it will act dumb
Baby Bush
There are WMD in Iraq
Bush: OK
There are no WMD in Iraq
Bush: OK
Condi is a woman; Cheney does not like any woman
Bush: OK
Cheney likes Condi
Bush: OK
What’s missing? No consistency or informativity checks.
Let’s define these.
What must be added: Inferential constraints
• Be consistent:
• Jody is married. Her husband is a dealer.
• Jody is not married. #Her husband is a dealer.
• The consistency constraint clearly helps to filter out
nonsensical readings.
• Be informative:
• Jody is a boxer. #Jody is a boxer.
• Mia is married. #She has a husband.
Consistency checking
• A formula is consistent if it is satisfied in at least one
model.
• Consistent formulas describe “conceivable” or
“possible” states of affairs
• A formula that is not consistent is called inconsistent.
• Inconsistent formulas describe “inconceivable” or
“impossible” states of affairs
• A finite set of formulas is consistent if the conjunction of
the formulas is consistent.
• Consistency checking is undecidable for first-order logic.
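Although consistency checking is undecidable for first-order logic, the idea is easy to see in the propositional case, where one can simply enumerate truth assignments. A toy illustration (the encoding of the Jody examples as Python predicates is my own):

```python
from itertools import product

def consistent(formulas, atoms):
    """A set of propositional formulas is consistent iff some
    truth assignment satisfies all of them (brute-force search).
    Each formula is a function from an assignment dict to bool."""
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(f(v) for f in formulas):
            return True
    return False

# "Jody is married. Her husband is a dealer." vs the bad sequence:
married = lambda v: v['married']
not_married = lambda v: not v['married']
has_husband = lambda v: (not v['married']) or v['husband']  # married -> husband

print(consistent([married, has_husband], ['married', 'husband']))   # True
print(consistent([married, not_married], ['married', 'husband']))   # False
```

The brute-force search is exponential in the number of atoms, and for full FOL no terminating procedure exists at all, which is why the later slides resort to theorem provers and bounded model builders.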
So What More Do We Need?
1. Translate the English text and query to a
representation in a logical language (viz., typed
lambda calculus)
• Representation of text’s literal meaning
2. Check for logical consistency and coherence of
the representation
• To catch errors in translation
3. Combine with a representation of general world
knowledge
4. Run automated reasoning tools to get an answer
Informativity checking
• A valid sentence is a sentence that is true in all models.
• What does validity have to do with informativity?
• We often call valid sentences uninformative and invalid
sentences informative. Why?
• If φ1… φn ⊨ ψ, we say that ψ is uninformative wrt
φ1… φn (it doesn’t add any new knowledge)
• If φ1… φn ⊭ ψ, we say that ψ is informative wrt
φ1… φn
• Validity testing is also undecidable for first-order logic
Informativity
>>> dt.add_sentence('A person dances', informchk=True)
Sentence 'A person dances' under reading 'exists x.(person(x) &
dance(x))':
Not informative relative to thread 'd0'
Now we have clever Bush
Bush is the President
Bush: OK
Bush is not the President
Bush: That is not OK. It is good to be king!
Back to the press conference
• Consistency check removes ‘dumb’ interchanges
Clever Bush
Cheney is a Republican
Bush: OK
Bush likes every Republican
Bush: OK
Bush does not like Cheney
Bush: That is inconsistent
Model building
• Theorem provers check whether a formula or set of formulas is valid (true in all possible models)
• Model builders attempt to construct a model for a formula (or set of formulas), and so show that the formula is satisfiable (true in at least one possible model)
• So – we must limit model builders to a finite domain size… (cf. the ‘closed world’ assumption in AI), else this is unsolvable in general
• Uncertainty: if you don’t find a model within the size bound, you don’t know whether the formula is satisfiable or not… but
• If you do find one, you can be sure the formula is satisfiable
Sensitive Bush: Informativity
• “Make your contribution as informative as is
required (for the current purposes of the
exchange).” H. P. Grice.
• Suggests need for Informativity checking –
indeed, for informativity checking against
background knowledge
Relationship between
consistency and informativity
• ψ is informative iff ¬ψ is consistent:
• Informativity means that the negation was also a
possibility, so you have found out something new
• ψ is informative wrt φ1… φn iff {φ1,…, φn , ¬ψ} is
consistent
• ψ is uninformative iff ¬ψ is inconsistent:
• Uninformativity means that the negation simply was
not a possibility (cf., “there are no Romulans because
there is no sector 53”)
• ψ is uninformative wrt φ1… φn iff {φ1,…, φn , ¬ψ} is
inconsistent
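The equivalence on this slide — ψ is informative wrt φ1…φn iff {φ1,…,φn, ¬ψ} is consistent — can be checked directly in the propositional case. A self-contained sketch; the Bush/Cheney encoding and all names here are my own illustration:

```python
from itertools import product

def satisfiable(formulas, atoms):
    """Brute-force propositional satisfiability check."""
    return any(all(f(dict(zip(atoms, vals))) for f in formulas)
               for vals in product([True, False], repeat=len(atoms)))

def informative(premises, psi, atoms):
    """psi is informative wrt the premises iff premises + {not psi}
    is satisfiable, i.e. the premises do not already entail psi."""
    return satisfiable(premises + [lambda v: not psi(v)], atoms)

# "Bush likes every Republican; Cheney is a Republican" entails
# "Bush likes Cheney", so the latter is uninformative:
likes_all = lambda v: (not v['rep_cheney']) or v['likes_cheney']
rep = lambda v: v['rep_cheney']
likes = lambda v: v['likes_cheney']
atoms = ['rep_cheney', 'likes_cheney']

print(informative([likes_all, rep], likes, atoms))   # False
print(informative([rep], likes, atoms))              # True
```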
Informativeness
• [Negative test] Try to prove: Lexical knowledge ∪ World knowledge ∪ Discourse-so-far ⇒ φ (if this succeeds, φ is uninformative)
• [Positive test] Try to build a model of: Lexical knowledge ∪ World knowledge ∪ Discourse-so-far ∪ {¬φ} (if one exists, φ is informative)
That will get rid of dumb exchanges like this
Bush is not the President (ψ)
Bush: OK
Bush is not the President (φ)
(system now tries to prove ψ ⇒ φ, and easily wins)
Bush: I already know that!
(Clearly, this also will fix cases like:
Bush likes every Republican; Cheney is a Republican; Bush likes Cheney – the last is uninformative)
What about these?
A Clinton loves a woman
There are 2 quantifier scopings, but these are
logically equivalent, so really there is only one
distinct reading.
Knowledgeable Bush
• Assume: KB is a formula (the conjunction of a set of formulas) containing relevant background knowledge.
• Consistency:
• Let φ be the translation of a DRS to FOL
• Use first-order tools to check whether KB ∧ φ is consistent
• Informativity:
• Let ψ be the formula KB ∧ OLD, where OLD is the translation of the old DRS to FOL
• Let φ be the translation of the new DRS to FOL
• Use first-order tools to check whether φ is informative wrt ψ
Hypernym (‘above’) vs. hyponym (‘below’) in the WordNet hierarchy

NLTK: consistency & informativity checking
How close are we now to this?
>>> dt = nltk.DiscourseTester(['A student dances',
'Every student is a person'])
>>> dt.readings()
s0 readings: s0-r0: exists x.(student(x) & dance(x))
s1 readings: s1-r0: all x.(student(x) -> person(x))
>>> dt.add_sentence('No person dances', consistchk=True)
Inconsistent discourse d0 ['s0-r0', 's1-r0', 's2-r0']:
s0-r0: exists x.(student(x) & dance(x))
s1-r0: all x.(student(x) -> person(x))
s2-r0: -exists x.(person(x) & dance(x))
>>> dt.retract_sentence('No person dances', verbose=True)
Current sentences are
s0: A student dances
s1: Every student is a person
Not even close: the real Turing test
As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don’t know.
—Donald Rumsfeld,
Feb. 12, 2002, Department of Defense news briefing

Language learning: the formal analysis
• What are the main results?
• Why is it important to know them?
Learning: Observe some values of a function
Guess the whole function
Another guess: Just as good?
Perfect fit to perfect, incomplete data
More data needed to decide
Imperfect fit to noisy data
Will an ungrammatical sentence ruin baby forever?
(yes, under a conservative strategy ...)
Or can baby figure out which data to (partly) ignore?
Language learning:
What kind of evidence?
Poverty of the Stimulus
• Never enough input data to completely determine the
polynomial …
• Always have infinitely many possibilities
• … unless you know the order of the polynomial ahead of time.
• 2 points determine a line
• 3 points determine a quadratic
• etc.
• In language learning, is it enough to know that the target
language is generated by a CFG?
• without knowing the size of the CFG?
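The polynomial analogy can be made concrete: k points determine a unique polynomial of degree < k, but infinitely many of higher degree, so the data alone never decide. A pure-Python sketch using Lagrange interpolation (the particular data points are invented for illustration):

```python
def lagrange(points):
    """The unique polynomial of degree < len(points) through the
    given (x, y) points, via Lagrange interpolation."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Two observations of the "target" function:
data = [(0.0, 0.0), (1.0, 1.0)]

line = lagrange(data)                  # degree-1 fit: y = x
quad = lagrange(data + [(2.0, 4.0)])   # with a 3rd point: y = x^2

# Both fit the two observed points perfectly...
print(line(0.0), line(1.0))   # 0.0 1.0
print(quad(0.0), quad(1.0))   # 0.0 1.0
# ...but diverge everywhere else; only more data, or prior
# knowledge of the degree, can decide between them.
print(line(2.0), quad(2.0))   # 2.0 4.0
```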
Poverty of the Stimulus (1955 on)
Chomsky (and the Cartesian tradition): Just like
polynomials – never enough data unless you know
something in advance. So kids must be born knowing
what to expect in language.
• Children listen to language
• Children are corrected??
• Children observe language in context
• Children observe frequencies of language
The guy is happy / is the guy happy
The guy who is happy is tired /
• Children listen to language [unsupervised, ‘unlabeled’ data]
• Children are corrected?? [supervised, ‘labeled’ data]
• Children observe language in context
• Children observe frequencies of language
Remember: Language = set of strings (but we can generalize this)
Gold’s Theorem (1967)
a simple, but very powerful negative result along
these lines:
kids (or computers) can’t learn much
without supervision or a priori knowledge
• Children listen to language
• Children are corrected?? - NOT! (is this true?)
• Children observe language in context
“Mother, I’d rather do it myself”
The classic example
• In controlled studies (from Brown, 1967 on): children are corrected for semantically malformed sentences, but not really all that often for syntactic ones (why?)
• When they are corrected for (usually minor morphological) overgeneralizations, e.g., goed → went, they (surprise) don’t usually listen
• Child: Nobody don’t like me
• Mother: No, say, “nobody likes me”
• Child: Nobody don’t like me
• Mother: No, say, “nobody likes me”
…
(goes on 12 more times)
• Mother: OK, one last time: say, “nobody likes me”
• Child: Oh! Nobody don’t likes me!
But what about this kind of indirect negative evidence?
• If you haven’t heard a certain constraint/construction after a certain time, you’ll never hear it (because it’s not part of your target language/grammar)
• Example: suppose you hear, “I am happy” a thousand times
• But never “He am happy”
• Can we infer from (comparatively) low frequency to ungrammatical? Tricky!
• Note that low probability on its own cannot imply ungrammaticality: if there are infinitely many grammatical sentences, then there cannot be a lower bound on their probability
• Because: if all grammatical sentences have at least probability ε, then there can be at most 1/ε grammatical sentences, which implies the language is finite
What about this idea?
• To ensure learnability: Though particular sentences might overlap for 2 different languages, and even the frequencies (probabilities) of the 2 sentences might be the same, the distributions of two languages cannot overlap (this is a constraint to be hypothesized - it might be true or not)
• Otherwise hard to see how a learner could distinguish between L1 and L2
• We’ll come back to this…for now, let’s consider the basic case, and assume no indirect negative evidence
The Idealized Situation
• Babysitter talks
• Baby listens
1. Babysitter outputs a sentence
2. Baby hypothesizes what the language is (given all sentences so far)
3. Go to step 1
• Guarantee: Babysitter’s language is in the set of hypotheses that Baby is choosing among
• Guarantee: Any sentence of Babysitter’s language is eventually uttered by Babysitter (even if there are infinitely many)
• Assumption: Vocabulary (or alphabet) is finite.
Can Baby learn under these conditions?
The real situation
two same way
" little boy's mitten
two good ones .
twelve fourteen o'clock
post office mail
oh # your rope tricks
little red spots
little fish house
oh poor little fellow
your washing machine
some child outside .
two seals # one strong man .
one bareback rider .
little soda water
same just like what ?
some water juice ?
green stamp basket
some pretty work ?
two street lights
one more what
dirty water germs
red street light .
three dozen milk
yes # and one dozen eggs .
your dozen what
one dozen eggs
two little birdies ?
your automatic rifle .
two dirty fingers # but I don't think they're
broken .
dirty dirty fingers
Languages vs. Grammars
• Learning in the limit:
• There is some point at which Baby’s hypothesis is correct and never changes again. Baby has converged!
• Baby doesn’t have to know that it’s reached this point – it can keep an open mind about new evidence – but if its hypothesis is right, no such new evidence will ever come along.
• A family C of languages is learnable in the limit if one could construct a perfect C-Baby that can learn any language L ∈ C in the limit from a Babysitter who speaks L.
• Baby knows the class C of possibilities, but not L.
• Does Baby have to get the right grammar? (E.g., does VP have to be called VP?)
• Assumption: Finite vocabulary.
• Is there a perfect finite-state Baby?
• Is there a perfect context-free Baby?
Conservative Strategy
• Baby’s hypothesis should always be the smallest language consistent with the data
• Works for finite languages? Let’s try it …
• Language 1: {aa,ab,ac}
• Language 2: {aa,ab,ac,ad,ae}
• Language 3: {aa,ac}
• Language 4: {ab}

Babysitter:  aa   ab   ac   ab   aa   …
Baby:        L3   L1   L1   L1   L1   …
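The trace above is easy to reproduce: always guess the smallest language in the family that contains everything heard so far. A minimal sketch over the four finite languages from the slide:

```python
# Conservative strategy over a finite family of finite languages:
# always guess the smallest language consistent with the data so far.

LANGS = {
    'L1': {'aa', 'ab', 'ac'},
    'L2': {'aa', 'ab', 'ac', 'ad', 'ae'},
    'L3': {'aa', 'ac'},
    'L4': {'ab'},
}

def conservative_guess(data):
    """Smallest language in the family containing all sentences
    heard so far (None if no language is consistent)."""
    candidates = [(len(L), name) for name, L in LANGS.items()
                  if data <= L]
    return min(candidates)[1] if candidates else None

seen = set()
for sentence in ['aa', 'ab', 'ac', 'ab', 'aa']:
    seen.add(sentence)
    print(sentence, '->', conservative_guess(seen))
# aa -> L3, then ab -> L1, and L1 is never abandoned
```

After 'aa' the smallest consistent language is L3 = {aa,ac}; once 'ab' arrives only L1 and L2 remain consistent, and the conservative learner settles on the smaller L1, exactly as in the table.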
Evil Babysitter
• To find out whether Baby is perfect, we have to see whether it gets 100% correct even in the most adversarial conditions
• Assume Babysitter is trying to fool Baby
• although she must speak only sentences from L
• and she must eventually speak each such sentence
• Does Baby’s strategy work?

Babysitter:  aa   ab   ac   ab   aa   ae   ad   …
Baby:        L3   L1   L1   L1   L1   …
An Unlearnable Family of Languages
• Family of languages:
• Let Ln = set of all strings of length < n
• What is L0?
• What is L1?
• What is L∞?
• If the true language is L∞, can Babysitter really follow the rules?
• Must eventually speak every sentence of L∞. Possible?
• Yes: ε; a, b; aa, ab, ba, bb; aaa, aab, aba, abb, baa, …
• Our family is C = {L0, L1, …, L∞}
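The length-order enumeration ε; a, b; aa, ab, … that lets Babysitter eventually speak every sentence of L∞ is a two-line generator; a sketch:

```python
from itertools import count, islice, product

def all_strings(alphabet=('a', 'b')):
    """Enumerate every string over the alphabet in length order:
    '', 'a', 'b', 'aa', 'ab', ... -- so even the infinite language
    L-infinity can be spoken one sentence at a time, with every
    sentence eventually uttered."""
    for n in count(0):
        for tup in product(alphabet, repeat=n):
            yield ''.join(tup)

print(list(islice(all_strings(), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```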
An Unlearnable Family
• Let Ln = set of all strings of length < n
• Our family is C = {L0, L1, …, L∞}
• A perfect C-baby will distinguish among all of these depending on the input.
• But there is no perfect C-baby …
• Suppose Baby adopts the conservative strategy, always picking the smallest possible language in C.
• So if Babysitter’s longest sentence so far has 75 words, baby’s hypothesis is L76.
• This won’t always work:
What language can’t a conservative Baby learn?
An Unlearnable Family
• Our class is C = {L0, L1, …, L∞}
• Could a non-conservative baby be a perfect C-Baby, and
eventually converge to any of these?
• Claim: Any perfect C-Baby must be “quasi-conservative”:
• If true language is L76, and baby posits something else,
baby must still eventually come back and guess L76
(since it’s perfect).
• So if longest sentence so far is 75 words, and Babysitter
keeps talking from L76, then eventually baby must
actually return to the conservative guess L76.
• Agreed?
Babysitter’s Revenge
• If the longest sentence so far is 75 words, and Babysitter keeps talking from L76, then eventually a perfect C-baby must actually return to the conservative guess L76.
• Suppose the true language is L∞.
• Evil Babysitter can prevent our supposedly perfect C-Baby from converging to it.
• If Baby ever guesses L∞, say when the longest sentence is 75 words:
• Then Evil Babysitter keeps talking from L76 until Baby capitulates and revises her guess to L76 – as any perfect C-Baby must.
• So Baby has not stayed at L∞ as required.
• Then Babysitter can go ahead with longer sentences. If Baby ever guesses L∞ again, she plays the same trick again.
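The diagonalization can be simulated. Here Baby is modeled directly at the conservative guess any perfect C-Baby must eventually return to, and Evil Babysitter always utters a sentence of length n, which is legal (every string is in L∞) but lies outside Baby's current guess Ln. This is a sketch of the argument, not a proof:

```python
def quasi_conservative_guess(data):
    """Ln = all strings of length < n.  Any perfect C-baby must
    eventually return to the conservative guess L_{maxlen+1};
    we model that guess directly."""
    return max((len(s) for s in data), default=-1) + 1

# Evil Babysitter speaks only sentences of L-infinity, but each time
# Baby's guess is some finite Ln, she utters a sentence of length n,
# which lies outside Ln and forces Baby to revise.
seen, guesses = [], []
for _ in range(6):
    n = quasi_conservative_guess(seen)
    guesses.append(n)
    seen.append('a' * n)   # legal: every string is in L-infinity

print(guesses)   # [0, 1, 2, 3, 4, 5] -- Baby's guess grows forever
```

The guess index increases without bound, so Baby never converges; and the one language this strategy of utterances is consistent with, L∞, is exactly the one Baby can never settle on.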
Implications
• We found that C = {L0, L1, …, L∞} isn’t learnable in the
limit.
• How about class of finite-state languages?
• Not unless you limit it further (e.g., # of states)
• After all, it includes all languages in C, and more, so
learner has harder choice!
• How about class of context-free languages?
• Not unless you limit it further (e.g., # of rules)
Is this too ‘adversarial’?
• Should we assume Babysitter is evil?
• Maybe more like Google….
• Perhaps Babysitter isn’t trying to fool the baby – not an adversarial situation
Punchline
• What about the class of probabilistic CFGs?
• Suppose Babysitter has to output sentences randomly with the appropriate probabilities (what does that mean?)
• Is s/he unable to be too evil?
• Are there then perfect Babies that are guaranteed to converge to an appropriate probabilistic CFG?
• I.e., from hearing a finite number of sentences, Baby can correctly converge on a grammar that predicts an infinite number of sentences…
• But only if Baby knows the distribution fn of the sentences a priori (Angluin)
• Even then, what is the complexity (# examples, time)?