Introduction

Predictability and Anticipation
Information Theory meets Visual World
Seminar SS 2015
Maria Staudte
The details...
• Seminar with participation
  • Summary of papers due the night before
  • Active discussion
• Presentation
  • One long / two short papers
  • Pre-discussion of slides
• 4 CP (7 CP = presentation + term paper)
Schedule
Predictability
Information Theory
• Shannon (1948): What amount of information can be transmitted over a given channel (even though it may be noisy)?
  • How many bits are (minimally) needed to encode a certain message?
  • What is the channel's capacity?
  • How fast can information be transmitted over this channel?
Information Theory
• Rolling a die:
  • How many bits are needed to encode the outcomes of a die?
  • How many binary variables to code all states?
    1 → 000, 2 → 001, 3 → 010, 4 → 011, 5 → 100, 6 → 101
  • What is the entropy for rolling a die?
    • How many yes/no questions are needed to determine the final state?
    • log2(states) = questions
    • log2 6 ≈ 2.585
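As a quick illustration (not from the slides): a fixed-length binary code for six outcomes needs ceil(log2 6) = 3 bits per roll, while the entropy gives the theoretical lower bound on the average number of yes/no questions:

```python
import math

NUM_OUTCOMES = 6  # faces of a die

# Fixed-length binary code: every outcome gets the same number of bits.
fixed_bits = math.ceil(math.log2(NUM_OUTCOMES))

# Entropy of the uniform distribution: the theoretical lower bound on
# the average number of bits (yes/no questions) needed per outcome.
entropy_bits = math.log2(NUM_OUTCOMES)

print(fixed_bits)              # 3
print(round(entropy_bits, 3))  # 2.585
```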
Entropy
• Entropy:
  • A measure of uncertainty
  • The less likely an event is, the more information it conveys
  • Probability of events coupled with information amount
• Rolling a (fair) die:
  • log2 6 ≈ 2.585
• Tossing a (fair) coin:
  • log2 2 = 1
• Higher entropy -> more uncertainty/information -> less predictable outcome
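The die and coin values above can be checked with a small sketch of the standard formula H = -Σ p·log2 p:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

die = [1 / 6] * 6   # fair six-sided die
coin = [0.5, 0.5]   # fair coin

print(round(entropy(die), 3))  # 2.585
print(entropy(coin))           # 1.0
```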
Quiz
Which distribution has the highest entropy?
[Four bar charts (1-4) showing probability distributions over the outcomes Black, Red, and Green; from Matthew Crocker (UdS), Mathe III, 20 June 2013]
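A rough way to see the answer: the more uniform a distribution, the higher its entropy. The three distributions below are illustrative stand-ins (the exact values from the original charts are not recoverable here):

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative distributions over the outcomes Black, Red, Green
# (the exact values from the original charts are assumed, not known):
uniform = [1 / 3, 1 / 3, 1 / 3]  # maximally uncertain
skewed = [0.8, 0.1, 0.1]         # one outcome dominates
peaked = [1.0, 0.0, 0.0]         # deterministic outcome

print(round(entropy(uniform), 3))  # 1.585
print(round(entropy(skewed), 3))
print(round(entropy(peaked), 3))
```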
Entropy and Language
• High entropy ~ low predictability
  • Small lexicon -> lower entropy
  • Large lexicon -> higher entropy
• Language is efficient!
• Often words are “predictable”!
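A small sketch of the lexicon-size point: for a uniform lexicon of N words, entropy is log2(N) bits, so a larger lexicon means higher entropy. (Real word frequencies are skewed, which lowers the entropy; that is one sense in which words are often predictable.)

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform lexicons: entropy is log2(N), growing with lexicon size N.
small_lexicon = [1 / 100] * 100
large_lexicon = [1 / 10_000] * 10_000

print(round(entropy(small_lexicon), 2))  # 6.64
print(round(entropy(large_lexicon), 2))  # 13.29

# Real word frequencies are highly skewed, which lowers the entropy:
# in context, words are often far more predictable than log2(N) suggests.
```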
Verbal Selection
• Some verbs have selectional preferences, i.e., they constrain the upcoming object to a certain class of objects (Resnik, 1994)
• Measure of selectional preference strength from occurrences in corpora (relative entropy of classes for a given verb)
  • Reveals whether a verb is highly selective
• Selectional association to determine the semantic fit of a class (its contribution to the overall strength distribution)
  • Reveals which class is selected
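A hedged sketch of the relative-entropy idea behind selectional preference strength; the object classes and all probabilities below are invented for illustration, not taken from Resnik (1994):

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(p[c] * math.log2(p[c] / q[c]) for c in p if p[c] > 0)

# Hypothetical prior distribution over object classes in a corpus:
prior = {"readable": 0.1, "edible": 0.2, "other": 0.7}

# Hypothetical object-class distributions for two verbs:
p_read = {"readable": 0.80, "edible": 0.05, "other": 0.15}  # highly selective
p_see = {"readable": 0.12, "edible": 0.20, "other": 0.68}   # barely selective

# Selectional preference strength: how far the verb's object-class
# distribution departs from the prior.
print(round(kl_divergence(p_read, prior), 3))
print(round(kl_divergence(p_see, prior), 3))
```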
Predictability
• Low predictability:
  • The man sees the ... ? (entropy?)
  • The man sees the book.
• Higher predictability:
  • The man reads the ... ?
  • The man reads the book.
Predictability
• But what is predicted?
• “The man reads the ...”
  • POS: noun (syntactic category)
  • Readable objects (semantic category)
  • “book”, “newspaper” (word)
Psycholinguistics
• Conditional point-wise entropy / surprisal:
  ➡ Information given by a word in a given context
  H(y | x1...xn) = -log2 p(y | x1...xn)
• Higher predictability ~ lower surprisal
• Low predictability ~ more information / higher surprisal
• Hypothesis: more information/surprisal = more cognitive load for the processor
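A minimal sketch of the surprisal formula; the conditional probabilities for “book” are invented for illustration:

```python
import math

def surprisal(p):
    """Surprisal in bits: -log2 p(word | context)."""
    return -math.log2(p)

# Invented conditional probabilities for the word "book":
p_after_reads = 0.4   # "The man reads the ..."  (predictable)
p_after_sees = 0.02   # "The man sees the ..."   (unpredictable)

print(round(surprisal(p_after_reads), 2))  # 1.32
print(round(surprisal(p_after_sees), 2))   # 5.64
```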
Predictability
• Higher predictability ~ lower surprisal
• “The man reads the ...”
  • POS: noun (syntactic surprisal)
  • Readable objects (semantic category, semantic features)
  • “book”, “newspaper” (specific word)
Predictability
• Empirical measures of cognitive load that relate to predictability and surprisal:
  • Cloze (predictability)
  • Reading (surprisal)
  • ERPs (surprisal)
• But what is predicted?
Predictability
• Cloze
  • The man reads the ___
• Reading
  • Self-paced: The - man - reads - the - book
  • Eye-tracking while reading the full sentence
• ERPs
  • ERP components during reading of the critical word
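Cloze probabilities are simply relative frequencies of completions across participants. A toy sketch with invented responses; cloze can then serve as an empirical estimate of p(word | context), and hence of surprisal:

```python
import math

# Invented completions of "The man reads the ___" from ten participants:
completions = ["book", "book", "newspaper", "book", "letter",
               "book", "newspaper", "book", "book", "menu"]

# Cloze probability: proportion of participants producing each word.
cloze = {w: completions.count(w) / len(completions) for w in set(completions)}
print(cloze["book"])  # 0.6

# Cloze as an empirical estimate of p(word | context) gives an
# estimated surprisal:
print(round(-math.log2(cloze["book"]), 2))  # 0.74
```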
Predicted content
• ERPs reveal that comprehenders predict:
  • Semantic features (Urbach & Kutas 2005)
    • “They then wanted to make the hotel look more like a tropical resort. So they planted...” palms / pines / tulips
  • Specific words (DeLong, Urbach & Kutas 2005)
    • “The day was breezy so the boy went outside to fly...” ‘a’ kite / ‘an’ airplane
[Background: excerpt from DeLong, Urbach & Kutas (2005) and a figure of vertex ERPs by median split on article cloze probability (< 50% vs. ≥ 50%); see also Kutas & Federmeier 1999]
Context
• Corpus-based measures use written text and exclude on-line information such as the (visual) environment
• Among other things (see other SFB projects), co-present people (speaker/listener) or objects (possible referents) are relevant context
• How does this context change predictability -> surprisal / cognitive load? Can it be measured and quantified?
• “read” may a priori select for book/newspaper, but in a given situation (e.g., constructing a Billy shelf) predict “manual”, which here denotes a concrete object!
Anticipation
Visual World Paradigm
• The VWP links perception of visual information with language processing via time-locked eye-movement analysis
• Eye movements during language comprehension reveal details about the current interpretation (Cooper, 1974)
  • immediate reference resolution (Cooper 1974; Tanenhaus et al. 1995)
  • and anticipated referents (Altmann & Kamide, 1999)
Spoken word recognition
(Allopenna, Magnuson & Tanenhaus 1998)
“Pick up the candle”
Verb-based anticipation
(Altmann & Kamide 1999)
“The boy will move/eat the cake.”
Anticipation vs. priming
(Kamide et al. 2003)
[Fig. 4 from Y. Kamide et al., Journal of Memory and Language 49 (2003) 133-156: percentage of trials in each region of interest with looks to the motorbike in each of the four sentence conditions (each datapoint includes looks to the other objects, averaged across the appropriate conditions); Experiment 2, visual (panel A) and auditory (panel B) stimuli, including looks to the Agent]
Predictability & Anticipation
• What is predicted?
• Matching words are easier to process
  • Other matched/predicted words?
• Matching objects are inspected
  • Constrained to shown objects? (Knoeferle & Crocker 2006)
• Using the VWP to assess predictions and cognitive load (to indirectly determine surprisal)
Predictability & Anticipation
“The boy will eat ...”
“... the train”
-log2 p(candy | boy will eat) vs. -log2 p(cake | boy will eat) vs. -log2 p(steak | boy will eat) vs. ...
-log2 p(cake | boy will eat + visible objects) vs. -log2 p(ball | boy will eat + visible objects)
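A hedged sketch of the conditioning step in these formulas: restricting the candidate set to the visible objects and renormalizing changes the probabilities, and hence the surprisal, of each continuation. All probabilities below are invented:

```python
import math

def surprisal(p):
    """Surprisal in bits."""
    return -math.log2(p)

# Invented corpus probabilities of continuations of "The boy will eat ...":
p_corpus = {"cake": 0.30, "candy": 0.25, "steak": 0.20,
            "soup": 0.15, "ball": 0.0001, "train": 0.0001}

def renormalize(p, visible):
    """Restrict the distribution to the visible objects and renormalize."""
    total = sum(p[w] for w in visible)
    return {w: p[w] / total for w in visible}

# In the scene only a cake, a ball, and a train are visible:
p_scene = renormalize(p_corpus, ["cake", "ball", "train"])

# Visual context sharpens the distribution, lowering the surprisal
# of the plausible visible referent:
print(round(surprisal(p_corpus["cake"]), 2))  # corpus only
print(round(surprisal(p_scene["cake"]), 2))   # corpus + visible objects
```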
Index of Cognitive Activity
• Frequency of pupil jitter indicates the level of cognitive load (Marshall 2000; Demberg et al. 2013)
  • Robust to changes in light
  • Robust to eye movements!
InfoTheory & VWP?
• From corpus-derived probabilities for linguistic stimuli...
  • Cloze (for low-probability items?)
  • Empirical estimate: Reading -> Auditory Processing
• ... to eye-tracking in visual context
  • Cloze?
  • Anticipatory eye movements during prediction
  • Gaze duration? ICA!?
Outlook
• Presentations on:
  • Predictability - what is measured, and how? Describe limits of and expectations for VWP experiments
  • Anticipation - what do comprehenders predict/anticipate? What do eye movements reveal? Surprisal?
  • ICA - an “empirical surprisal measure”: how could it be used to combine the approaches?