Predictability and Anticipation
Information Theory meets Visual World
Seminar SS 2015
Maria Staudte

The details...
• Seminar with participation
• Active discussion
• Summary of papers due the night before
• Presentation
• One long or two short papers
• Pre-discussion of slides
• 4 CP (7 CP = presentation + term paper)

Schedule

Predictability

Information Theory
• Shannon (1948): How much information can be transmitted over a given channel, even though it may be noisy?
• How many bits are (minimally) needed to encode a certain message?
• What is the channel's capacity? How fast can information be transmitted over this channel?

Information Theory
• Rolling a die:
• How many bits are needed to encode the outcomes of a die?
• How many binary variables to code all states?
• 1 → 000, 2 → 001, 3 → 010, 4 → 011, 5 → 100, 6 → 101
• What is the entropy of rolling a die?
• How many yes-no questions are needed to determine the final state?
• log2(states) = questions: log2 6 ≈ 2.585

Entropy
• Entropy:
• A measure of uncertainty
• The less likely an event is, the more information it conveys
• Probability of events coupled with information amount
• Rolling a (fair) die: log2 6 ≈ 2.585
• Tossing a (fair) coin: log2 2 = 1
• Higher entropy: more uncertainty/information, less predictable outcome

Quiz: Which distribution has the highest entropy?
[Figure: four bar charts of probability distributions over the outcomes Black, Red, Green; from Matthew Crocker (UdS), Mathe III, 20. Juni 2013]
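The entropy values on these slides can be reproduced in a few lines of Python. This is a minimal sketch: the die and coin distributions are from the slides, while the skewed three-outcome distribution is invented to illustrate the quiz's point that the uniform distribution has the highest entropy.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p), with 0 * log 0 := 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_die = [1/6] * 6
fair_coin = [1/2] * 2
uniform3 = [1/3] * 3            # e.g. Black/Red/Green equally likely
skewed3 = [0.8, 0.1, 0.1]       # invented: one outcome dominates

print(round(entropy(fair_die), 3))   # 2.585 bits, as on the slide
print(entropy(fair_coin))            # 1.0 bit
print(entropy(skewed3) < entropy(uniform3))  # uniform is least predictable
```

The generator skips zero-probability outcomes, matching the convention that impossible events contribute no uncertainty.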
Entropy and Language
• High entropy ~ low predictability
• Small lexicon -> lower entropy
• Large lexicon -> higher entropy
• Language is efficient! Words are often "predictable"!

Verbal Selection
• Some verbs have selectional preferences, i.e. they constrain the upcoming object to a certain class of objects (Resnik, 1994)
• A measure of selectional preference strength can be derived from corpus occurrences (relative entropy of classes for a given verb)
• Reveals whether a verb is highly selective
• Selectional association determines the semantic fit of a class (its contribution to the overall strength distribution)
• Reveals which class is selected

Predictability
• Low predictability:
• The man sees the ?
• The man sees the book.
• Higher predictability:
• The man reads the ?
• The man reads the book.
• But what is predicted??
• "The man reads the ..."
• a noun (syntactic category, POS)
• readable objects (semantic category)
• "book", "newspaper" (specific word)

Psycholinguistics
• Conditional point-wise entropy / surprisal: the information given by a word in a given context
• H(y | x1 ... xn) = -log2 p(y | x1 ... xn)
• Higher predictability ~ lower surprisal
• Low predictability ~ more information / higher surprisal
• Hypothesis: more information/surprisal = more cognitive load for the processor

Predictability
• Higher predictability ~ lower surprisal
• "The man reads the ..."
• a noun (syntactic surprisal)
• readable objects (semantic category, semantic features)
• "book", "newspaper" (specific word)

Predictability
• Empirical measures of cognitive load that relate to predictability and surprisal:
• Cloze (predictability)
• Reading (surprisal)
• ERPs (surprisal)
• But what is predicted??
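The surprisal formula above is straightforward to operationalize. In this sketch the conditional probabilities are invented for illustration, not corpus estimates; only the formula itself comes from the slides.

```python
import math

def surprisal(p):
    """Surprisal in bits: -log2 of the word's conditional probability."""
    return -math.log2(p)

# Hypothetical conditional probabilities (illustrative only):
p_book_given_reads = 0.5   # "The man reads the ..." strongly constrains the noun
p_book_given_sees = 0.05   # "The man sees the ..." barely constrains it

print(surprisal(p_book_given_reads))           # 1.0 bit: predictable, low surprisal
print(round(surprisal(p_book_given_sees), 3))  # 4.322 bits: high surprisal
```

The contrast mirrors the slides: the more predictive verb "reads" yields lower surprisal at the object noun than "sees".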
Predictability
• Cloze
• The man reads the ___
• Reading
• Self-paced: The - man - reads - the - book
• Eye-tracking during reading of the full sentence
• ERPs
• ERP components during reading of the critical word

ERPs reveal that comprehenders predict (Kutas & Federmeier 1999; DeLong, Urbach & Kutas 2005)
• Semantic features
• "They then wanted to make the hotel look more like a tropical resort. So they planted ..." palms / pines / tulips
• Specific words
• e.g. "The day was breezy so the boy went outside to fly ..." a kite / an airplane
• Since 'a' and 'an' are identical in meaning and differ only in phonological form, a graded N400 effect on the article, correlated with its offline cloze probability, supports incremental, predictive pre-activation of specific nouns; no effect at all would be a sharp blow to predictive processing accounts
• Predicted content
[Figure: vertex ERPs to articles ('a' vs 'an') and nouns ('kite' vs 'airplane'), median split on article cloze probability (< 50% vs ≥ 50%); DeLong, Urbach & Kutas 2005: participants read 160 sentences word by word (200 ms per word) from a CRT while EEG was recorded at 26 scalp locations; ERPs for articles and nouns were binned by cloze probability]

Context
• Corpus-based measures use written text and exclude on-line information such as the (visual) environment
• Among other things (see other SFB projects), co-present people (speaker/listener) or objects (possible referents) are relevant context
• How does this context change predictability -> surprisal/cognitive load? Can it be measured and quantified?
• "read" may a priori select for book/newspaper, but in a given situation (e.g. assembling a Billy shelf) predict "manual", which here denotes a concrete object!
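Cloze probability, the offline measure used to bin the ERP data above, is simply the proportion of participants who produce a given continuation in a sentence-completion task. The completion counts below are invented for illustration; only the example sentence and candidate words come from the slides.

```python
from collections import Counter

# Hypothetical completions of "The day was breezy so the boy went outside to fly ..."
completions = ["kite"] * 43 + ["airplane"] * 4 + ["plane"] * 2 + ["balloon"]

counts = Counter(completions)
n = sum(counts.values())                       # number of participants (here 50)
cloze = {word: c / n for word, c in counts.items()}

print(cloze["kite"])      # 0.86 -> a high-cloze continuation
print(cloze["airplane"])  # 0.08 -> a low-cloze continuation
```

High- vs low-cloze items defined this way are what the median split in the figure above is based on.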
Anticipation

Visual World Paradigm
• The VWP links perception of visual information with language processing via time-locked eye-movement analysis
• Eye movements during language comprehension reveal details about the current interpretation (Cooper, 1974)
• immediate reference resolution (Cooper 1974; Tanenhaus et al. 1995)
• and anticipated referents (Altmann & Kamide, 1999)

Spoken word recognition (Allopenna, Magnuson & Tanenhaus 1998)
• "Pick up the candle"

Verb-based anticipation (Altmann & Kamide 1999)
• "The boy will move / eat the cake."

Anticipation vs priming (Kamide et al. 2003)
[Figure: percentage of trials per region of interest with looks to the motorbike in each of the four sentence conditions (each datapoint includes looks to the other objects, averaged across the appropriate conditions); visual (panel A) and auditory (panel B) stimuli from Experiment 2; Journal of Memory and Language 49 (2003) 133–156]

Predictability & Anticipation
• What is predicted??
• Matching words are easier to process
• Other matched/predicted words?
• Matching objects are inspected (Knoeferle & Crocker 2006)
• Constrained to the shown objects?
• Using the VWP to assess predictions and cognitive load (to indirectly determine surprisal)

Predictability & Anticipation
• "The boy will eat ..." / "... the train"
• -log2 p(candy | boy will eat) vs -log2 p(cake | boy will eat) vs -log2 p(steak | boy will eat) vs ...
• -log2 p(cake | boy will eat + visible objects) vs -log2 p(ball | boy will eat + visible objects)

Index of Cognitive Activity
• Frequency of pupil jitter indicates level of cognitive load (Marshall 2000; Demberg et al. 2013)
• Robust to changes in light
• Robust to eye movements!

InfoTheory & VWP?
• From corpus-derived probabilities for ling. stimuli...
• Cloze (for low-probability items?)
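One way to read the situated surprisal terms above: the visual scene restricts the set of plausible referents, and renormalizing over that set can change a word's surprisal drastically. A minimal sketch, with invented corpus-style probabilities and an invented scene; the verb context "the boy will eat" and the object "cake" are from the slides.

```python
import math

def surprisal(p):
    """-log2 p: the information carried by an event with probability p."""
    return -math.log2(p)

# Hypothetical probabilities p(object | "the boy will eat"), corpus-style:
p_language = {"cake": 0.30, "candy": 0.25, "steak": 0.20, "apple": 0.15, "soup": 0.10}

# Invented scene: only one edible object is co-present.
visible = {"cake", "ball", "train"}

# Condition on the scene: keep only visible candidates, then renormalize.
scene = {w: p for w, p in p_language.items() if w in visible}
total = sum(scene.values())
p_situated = {w: p / total for w, p in scene.items()}

print(round(surprisal(p_language["cake"]), 3))  # ~1.737 bits from language alone
print(surprisal(p_situated["cake"]))            # 0.0 bits: fully predictable in this scene
```

This is only one possible formalization of "+ visible objects"; the seminar question is precisely how such situated probabilities should be estimated and validated against eye movements.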
• Empirical estimate: Reading -> Auditory Processing
• ... to eye-tracking in visual context
• Cloze?
• Anticipatory eye-movements during prediction
• Gaze duration?
• ICA!?

Outlook
• Presentations on:
• Predictability: what is measured, how? Describe limits and expectations on VWP experiments
• Anticipation: what do comprehenders predict/anticipate? What do eye-movements reveal? Surprisal?
• ICA, an "empirical surprisal measure": how could it be used to combine the approaches?