Slides from Day 16

Language Acquisition and Change
Nov 11, 2014
theredappledwells

Child Language Acquisition
Given raw speech input, infants must learn
•  Phonetic categories from sound
•  Articulatory correspondences to phonemes
•  Lexicons (vocabularies of words)
•  Word categories and syntax
•  Semantics and the mapping of words to entities and actions

Word segmentation
theredappledwells

Word Segmentation
•  Observed: sequence of letters without spaces
•  To learn: lexicon of words (and n-gram probabilities) + segmentation of the letter sequence into words
•  Given a lexicon, how to find the best segmentation of a letter sequence?

Viterbi Word Segmentation
[Figure: Viterbi lattice over the string theredappledwells, comparing candidate segmentations, e.g. P(the) = 0.40, P(the)P(red) = 0.02, max{P(there), P(the)P(re)} = 0.10, P(thered)P(app) = 0.0002, max{P(thered)P(apple), P(there)P(dapple)}]
Lexicon:
  apple    0.07
  app      0.01
  dapple   0.02
  dappled  0.03
  dwells   0.06
  dwell    0.01
  led      0.03
  re       0.08
  red      0.05
  the      0.40
  there    0.10
  well     0.06
  wells    0.02
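The max operations in the figure correspond to a dynamic-programming recurrence over prefixes of the string. Below is a minimal Python sketch (not from the slides) of Viterbi word segmentation under a unigram model, using the probabilities from the lexicon above; substrings not in the lexicon are treated as impossible.

```python
# Sketch: Viterbi (max-product) word segmentation with a fixed unigram lexicon.
# best[i] holds (log-probability of the best segmentation of s[:i], index where
# the last word of that segmentation starts).
import math

lexicon = {
    "apple": 0.07, "app": 0.01, "dapple": 0.02, "dappled": 0.03,
    "dwells": 0.06, "dwell": 0.01, "led": 0.03, "re": 0.08, "red": 0.05,
    "the": 0.40, "there": 0.10, "well": 0.06, "wells": 0.02,
}
max_len = max(len(w) for w in lexicon)

def segment(s):
    n = len(s)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)                       # empty prefix: log-prob 0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = s[j:i]
            if word in lexicon and best[j][0] > -math.inf:
                score = best[j][0] + math.log(lexicon[word])
                if score > best[i][0]:
                    best[i] = (score, j)        # new best way to cover s[:i]
    if best[n][0] == -math.inf:
        return None                             # no segmentation exists
    words, i = [], n
    while i > 0:                                # follow back-pointers
        j = best[i][1]
        words.append(s[j:i])
        i = j
    return list(reversed(words))

print(segment("theredappledwells"))             # ['the', 'red', 'apple', 'dwells']
```

A soft (sum-product) variant, as needed for soft EM below, would replace the max over previous positions with a sum over all segmentations.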
How to learn the lexicon
•  Initialize the lexicon
•  E-Step: segment the observations using Viterbi (hard EM) or compute a distribution over all possible segmentations (soft EM)
•  M-Step: take relative counts of the words produced by the segmentation
•  Is it similar to what the child is doing?

How to learn the lexicon
•  Problem: this doesn't work!
•  The maximum-likelihood strategy learns that every sentence is a single word
   –  this actually maximizes the probability of the observed corpus
•  Alternate solutions?

Bayesian Word Segmentation
•  Place a prior on the lexicon (Bayesian approach)
   –  We know the unigram probabilities should have a Zipfian distribution
   –  Give high probability to lexicons that are Zipfian, low probability to "flat" lexicons: Dirichlet prior
   –  Chinese Restaurant Process

Morphological Segmentation
•  Given a lexicon of words, how do infants learn the morphology of the language?
•  Or, given a lexicon of words in an unknown language, can we write a program that behaves like a linguist and analyzes the morphology?

Morphological Segmentation: English
Morphological Segmentation: Arabic
Morphological Segmentation: Ulwa

Morphological Segmentation
•  Focus on concatenative morphology
•  Quick heuristic: given these words
   walk walking war walked wall waltz walker wars
   how to split off morphemes?
•  Zellig Harris (1955): points of low predictability are morpheme boundaries
•  Prefix trees are a good initialization for the learner

Morphological Segmentation
Model:
•  Equivalence sets of affixes (signatures)
•  Each signature is associated with a set of stems
•  A signature together with its stems is a class

Morphological Segmentation
•  P(word) = P(class)P(stem|class)P(affix|class), summed over all classes
•  Each model is a probability distribution over classes, with each class being a probability distribution over stems and affixes
•  Goal: find the best model for the data

Morphological Segmentation
•  Maximum likelihood with EM:
   –  initialize the segmentation of the data with prefix trees
   –  group together signatures and find the associated probabilities by counting (M step)
   –  find expected counts over all possible segmentations of the data (E step)
•  Same problem: ends up treating each word as a single morpheme
•  Bayesian approach: use a prior favoring models that produce Zipfian distributions over stems and signatures

Morphological Segmentation: Results

Acquisition of Phonetic Categories
•  Children don't know what sounds are in their language
•  Phonemes are not universal; e.g.
   –  {R, L} is a single phoneme in Japanese
   –  Aspirated stops in Hindi are their own phonemes

Acquisition of Phonetic Categories
•  Run an algorithm resembling word segmentation on sound (represented as spectral or MFCC features)
•  More complex because each phoneme is a distribution over acoustics (like a Gaussian): this should be learned too

Acquisition of Phonetic Categories
Active research area
•  Current work finds units that are correlated with phonemes, but not exactly
   –  Are phonemes real? (non-segmental phonology)
•  Can build unified learners:
   –  phonemes from acoustics
   –  words and morphemes in terms of the learned phonemes
•  Next step: semantic discovery using visual cues and world knowledge?

Exemplar Theory in Linguistics
•  We store exemplars of sounds rather than an abstracted representation
   –  all the instances that we've ever heard of each phoneme/syllable/speech unit
   –  "memory decay": very old exemplars may be forgotten
   –  when we hear a new sound, we match it against the exemplars
•  Is the brain doing k-nearest neighbors?
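The closing question can be made concrete with a toy sketch (not from the slides): classify an incoming sound by majority vote over its k nearest stored exemplars. The feature vectors and labels below are invented for illustration; real exemplars would be acoustic measurements such as formants or MFCCs.

```python
# Toy exemplar-matching sketch: classify a new sound by majority vote among
# its k nearest stored exemplars (Euclidean distance in an assumed 2-D
# acoustic feature space). All numbers here are invented.
import math
from collections import Counter

# (feature vector, category label) pairs standing in for remembered tokens
exemplars = [
    ((310, 2200), "i"), ((330, 2150), "i"), ((300, 2300), "i"),
    ((700, 1200), "a"), ((720, 1150), "a"), ((680, 1250), "a"),
]

def classify(sound, k=3):
    # Sort stored exemplars by distance to the incoming sound, then vote.
    by_distance = sorted(exemplars, key=lambda ex: math.dist(sound, ex[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(classify((320, 2250)))   # -> i
```

Memory decay could be approximated by dropping or down-weighting older exemplars before the vote; the slides leave that mechanism open.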
Language Change

Great Vowel Shift in English
Let me not to the marriage of true minds
Admit impediments. Love is not love
Which alters when it alteration finds,
Or bends with the remover to remove:
O no! it is an ever-fixed mark
That looks on tempests and is never shaken;
It is the star to every wandering bark,
Whose worth's unknown, although his height be taken.
Love's not Time's fool, though rosy lips and cheeks
Within his bending sickle's compass come:
Love alters not with his brief hours and weeks,
But bears it out even to the edge of doom.
If this be error and upon me proved,
I never writ, nor no man ever loved.
Language Change Curves
[Three slides of figures]
Computational model of change that produces logistic (s-shaped) curves?

Language Change: Modeling
•  Incremental variations from each generation accumulate and lead to large-scale language change
   –  Variants arise as a result of exposure to other linguistic groups
   –  ... or because learners acquire language imperfectly from their parents
   –  Variants may be consciously preferred because they confer higher social status

Language Change: Dynamical Systems Approach
•  Discrete generations
•  Learners in G_t learn from the parent generation G_{t-1}
•  Two languages, L1 and L2 (e.g. L1 = traditional usage of "literally", L2 = figurative usage of "literally")
•  Each member of a generation learns and speaks exactly one of the two languages
•  Examples that a given member of G_t hears can come from any member of G_{t-1}

Language Change: Dynamical Systems Approach
•  At generation t, a fraction α_t of the population speaks L1 and 1-α_t speaks L2
•  Each speaker of L1 says a sentence s with probability P1(s)
•  Each speaker of L2 says a sentence s with probability P2(s)

Language Change: Dynamical Systems Approach
Let a be the probability that an L1 speaker's sentence is also in L2, and b the probability that an L2 speaker's sentence is also in L1. The probability that a learner hears a sentence belonging to:
•  L1 but not L2 = α_t(1-a)
•  L1 and L2 = α_t a + (1-α_t)b
•  L2 but not L1 = (1-α_t)(1-b)
•  L1 = ?
•  L2 = ?

Triggering Learning Algorithm
•  Learning algorithm of each child in generation t+1:
   1.  Pick a hypothesis language L (from L1 and L2) uniformly at random
   2.  Hear a new sentence s
   3.  If s can be parsed by hypothesis L, retain the hypothesis
   4.  Else, flip the hypothesis
   5.  Go back to step 2

Evolution
•  If each learner hears only two examples,
   α_{t+1} = 0.5 (α_t + (1-α_t)b)(α_t + (1-α_t)b)
           + 0.5 (1-α_t)(1-b) α_t(1-a)
           + 0.5 α_t(1-a) (α_t + (1-α_t)b)
           + 0.5 (α_t(a-1) + 1) α_t(1-a)
           = A α_t² + B α_t + C
   where A = 0.5(-a² + 2a + b² - 2b), B = -a + b - b² + 1, C = 0.5b²
   (a small simulation sketch appears at the end of these notes)

Evolution
[Figure]

Agent-Based Modeling
•  Dynamical systems are mathematically well-defined
•  ... but require simplistic assumptions about learning and interaction
•  Alternative: define a set of agents
   –  At each time step, agent A can talk to agent B with probability given by some features of A and B
   –  Agents are born, move, and die
   –  Pros: more flexible; can model interaction with parents, peers, and social groups
   –  Cons: no analytic solutions
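As referenced above, here is a small simulation sketch (not from the slides) that iterates the two-example update rule. The overlap parameters a and b are illustrative guesses; the slides do not give values.

```python
# Iterate the dynamical-systems update alpha_{t+1} = A*alpha_t^2 + B*alpha_t + C
# derived above for the two-example triggering learner.
a = 0.05   # prob. an L1 speaker's sentence is also in L2 (assumed value)
b = 0.20   # prob. an L2 speaker's sentence is also in L1 (assumed value)

A = 0.5 * (-a**2 + 2*a + b**2 - 2*b)
B = -a + b - b**2 + 1
C = 0.5 * b**2

alpha = 0.01   # initially almost everyone speaks L2
for t in range(40):
    print(f"generation {t:2d}: fraction speaking L1 = {alpha:.3f}")
    alpha = A * alpha**2 + B * alpha + C
# With these illustrative values the L1 share climbs along a roughly s-shaped
# path toward an equilibrium near 0.99; other choices of a and b behave differently.
```

An agent-based version would replace this closed-form update with explicit sampling of speaker-listener interactions, at the cost of the analytic solution noted on the last slide.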