predictability - UC Berkeley Linguistics

Predictability and Phonetic
Attention:
How Context Shapes Exemplars
Jonathan Manker
PHREND
University of California, Santa Cruz
September 24, 2016
Overview of Presentation
• I. Background
1) Modes of Listening
2) Exemplar Theory
• II. Experiment I: Listening to Predictable and Unpredictable Words
• III. Experiment II: Imitation of Predictable and Unpredictable Words
• IV. Discussion and Conclusion
Background I: Modes of Listening
• Research has shown low-level phonetic information is used alongside
higher-level contextual information in speech recognition.
• Cole & Jakimik (1980): “Words are recognized through the interaction
of sound and knowledge.”
• Words excised from sentence contexts can be phonetically ambiguous
or even unrecognizable (coarticulation, reduction, etc.)
• Phoneme restoration (Warren 1970): speakers fail to notice [s]
replaced with a cough
• Marslen-Wilson & Welsh (1978): speakers more likely to notice onset
rather than coda speech errors ([n] replacing /m/ in ‘made’ vs. ‘time’)
‘What’ vs. ‘How’ Mode
• Lindblom et al. (1995):
• ‘what’ mode: more commonly used in speech, listeners focuses on
extracting meaning
• ‘how’ mode: listener focuses on details of sounds, pronunciation
• relevant in sound change: suggests Ohala’s hypocorrection could
result from listening in ‘how’ mode, failing to normalize expected
coarticulation
Listening Modes: Neurological Evidence
• Hickok and Poeppel (2004, 2007): Dual Stream Model
• apply dorsal and ventral streams to auditory processing
• Ventral stream:
• lower part of brain, bilateral
• mapping between abstracted sound and meaning
• Dorsal stream:
• upper portions of brain, left-hemisphere oriented
• active in sublexical processing such as phoneme identification, rhyming tasks
• involved in motor planning; suggests perception-production link
• Ventral analogous to ‘what’ mode; dorsal to ‘how’ mode
Background: Exemplar Theory
• Exemplar-based theories of speech (Johnson 1997, Pierrehumbert
2002) propose that individual instances of words are stored in
memory.
• Contrasted with theories only allowing abstract representations of
words
• Speakers may use all the available phonetic details to identify words,
rather than normalizing all speech to abstracted forms
• More recent work has suggested a need for both exemplars and
abstract representations (Pierrehumbert 2002, Goldinger 2007).
Exemplars and Higher-Level Knowledge
• Goldinger (2007): “[E]ach stored exemplar is actually a product of
perceptual input combined with prior knowledge…”
• Maye (2007): Exemplars may be shaped by different levels of
attention towards different cues, based on subjects’ L1
• Pierrehumbert (2006): Speakers attuned to more informative socioindexical features
The Current Study
• Cole & Jakimik’s observations demonstrate the effect of context on
speech recognition and perception
• Does context then mediate the activation of ‘what’/ventral vs.
‘how’/dorsal listening modes?
• Hypotheses:
• More attention to phonetic detail should occur with less context.
• Predictable speech may be stored as less precise exemplars, more
heavily influenced by higher level linguistic information.
Experiment I: Listening to Predictable and
Unpredictable Speech
• Target question: Do listeners store or process more detail for
unpredictable words, and less for predictable words?
• This experiment will focus on speech perception alone, to isolate the role
of the listener in attending to phonetic detail.
• Subjects will hear words in sentences, in either a predictable or
unpredictable context; the word will be repeated after a pause and
listeners will be asked to compare it to the original. Is the same?
• Prediction: Subjects will more notice more phonetic details of
unpredictable words and will make more errors in comparing predictable
words.
Experiment I: Stimuli
• Subjects are presented with a total of 80 recorded sentences:
• Of these, 30 are target sentences including:
• 15 with target words in predictable contexts
• 15 with different target words in unpredictable contexts
• Target words: All target sentences end in a target word, beginning with /k/, two
syllables, and initial stress.
• Predictable Context:
• “The pioneers made log ______,” or “Pennies are made out of ______.”
• Unpredictable Context:
• “Joe turned and saw the ______,” or “The next word is ______.”
• Predictability of words confirmed in Mechanical Turk survey: 30 subjects were asked to
give one guess about blank word. All target words were correctly guessed by between
9/30 and 30/30 participants.
Experiment I: Stimuli and tasks
• The same recordings of the target words are heard in both
predictability contexts (copied from predictable, pasted into
unpredictable; pronunciations are identical, only difference is
surrounding context)
• The subjects will hear these target sentences, with either predictable
or unpredictable targets, followed by a pause and some static, then
with the target word repeated.
• The target word is copied from the sentence, and is repeated either
identically, or is modified to have longer VOT (at least 100 ms) and
higher initial pitch (20 Hz higher than model).
• Examples: Pennies are made out of copper … [STATIC] … copper
The next word is copper … [STATIC] … copper
• Subjects are asked to judge if the repeated word sounds the same as
when it occurred in its sentence.
• Prediction: Subjects will be more accurate in their comparisons with
unpredictable words and less so with predictable words (more errors)
Experiment I: Tasks and Stimuli, cont’d
• The remaining 50/80 sentences are fillers of two types:
• 25 audio comparison sentences (just like the targets), except the target /
repeated word is not the last word in the sentence, and does not begin with
/k/
• 25 “content” questions such that:
• The listener simply hears “The professor taught linguistics for sixty years…” followed by
“How long did the professor teach linguistics?
• These were used in order to keep subjects listening both for meaning (“what?”) and
sound (“how?”)
Subject Groups
• 80 subjects participated via Amazon Mechanical Turk (with the
experiment run using SurveyGizmo)
• 4 groups with 20 subjects each:
• 2 groups were presented with 30/60 of
the target words, while the other 2 groups
were presented with the other half
of the words
• Within these groups, the targets
were counterbalanced for predictability, so
target words were only heard once by
each subject, either predictable or
unpredictable
Group A
heard e.g. ‘cabins’ in
pred. context, ‘copper
‘in unpred. context
Group B
heard e.g. ‘canvas’ in
pred. context, college’ in
unpred. context
Group A’
heard e.g. ‘cabins’ in
unpred. ‘copper’ in pred.
context
Group B’
heard e.g. ‘canvas’ in
unpred. context,
‘college’ in pred. context
Results
• The target data from this experiment was all categorical Y/N
responses (did the word sound the same when repeated?)
• Over 80 subjects and 2400 total responses (for target stimuli)
• Of those 2400, there were 535 errors (22.3% error rate)
• Subjects made significantly more errors in comparing predictable
words (p = 0.012) following my hypothesis.
• Errors for predictable words: 291 (54.4%)
• Errors for unpredictable words: 244 (45.6%)
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: response ~ answer * group * predictability * + (1|subject) + (1|word)
Chisq
Df
Pr(>Chisq)
answer
5.4117
1
0.02000*
group
2.1484
1
0.14272
predictability
6.2991
1
0.01208*
answer:group
1.7661
1
0.18387
answer:predictability
0.1064
1
0.74424
group:predictability
0.9430
1
0.33152
answer:group:predict 1.6281
1
0.20196
ability
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Experiment II: Imitation of Predictable and
Unpredictable Speech
• The first experiment established that predictability modulates
attention to phonetic details
• Thus, contextual information affects speech perception
• Could this phenomenon be relevant for sound change?
• Experiment II will examine the listener turned speaker--- how
perception influences production
• Phonetic accommodation paradigm will be used to understand what
phonetic details are perceived and how those affect production
Phonetic Accommodation
• phenomenon whereby characteristics (phonetic, syntactic, etc.) of one’s
speech are influenced through perceiving the speech of others.
• stored word exemplars determine pronunciation
• new exemplars influence new productions
• Earliest observation in lab setting, Goldinger (1998):
•
•
•
•
baseline word reading to post-stimulus
immediate vs. delayed shadowing
word frequency, # of repetitions
more impressionistic AXB format for comparing to model
• Quantitative measurements of specific manipulated features:
• Shockley, Sabadini & Fowler (2004), Nielsen (2011): VOT
• Tilsen (2009): vowel quality
Phonetic Accommodation, cont’d
• Other features of the phenomenon:
• automatic / subconscious ? (Goldinger 1998, Lewandowski 2012)
• Specificity of accommodated features: Nielsen (2011)
• Abstractness / generalizability: Nielsen (2011)
• Higher-level knowledge influences: Nye & Fowler (2003)
• Social factors:
• Divergence/convergence: Babel (2010), Babel (2012)
• Gender and conversational roles: Pardo (2006)
• Thus, phonetic accommodation is a useful tool for understanding
what was perceived and how this influences production
• Hypothesis: More phonetic accommodation will be observed for
unpredictable speech
Stimuli
• The same target words and sentences were also used for experiment
II.
• Predictable Context:
“The pioneers made log ______,” or “Pennies are made out of ______.”
• Unpredictable Context:
“Joe turned and saw the ______,” or “The next word is ______.”
• In all target sentences, the final /k/ initial word was digitally enhanced (VOT
was approximately doubled to at least 100 ms, pitch of the first syllable was
raised by about 20 Hz).
• Once again, the target words were excised from their pronunciations in the
predictable context into the unpredictable context; thus the same recording
of the same word was heard in both conditions (undoing any possible
hyperarticulation of unpredictable words).
Tasks
• The experiment consisted of a single block of 100 sentences in
random order:
• 30 with target predictable words
• 30 with target different unpredictable words
• 40 filler sentences (no /k/ final words, no digital enhancement)
• No target word was ever heard twice in the experiment (no priming
effect possible)
Subject Groups
• The experiment consisted of four groups of 10 subjects each (40
total).
• Two instructional conditions:
• Condition I: No instruction to imitation (20 subjects)
• Condition II: Told to “sound more like the model in some way” (20 subjects)
• Groups were counterbalanced for target word context:
• Group A heard 30 target predictable set A, 30 target unpredictable set B, and
40 fillers
• Group B heard 30 target predictable set B, 30 target predictable set A, and the
same 40 fillers.
• (Again, each subject heard each target word only once, in either a predictable
or unpredictable context depending on the group)
Group A
No instruction to imitate
Told to imitate
30 predictable:
“The pioneers made log
cabins.”
30 predictable:
“The pioneers made log
cabins.”
30 unpredictable:
“The first thing Mary saw
was the coffins.”
30 unpredictable:
“The first thing Mary saw
was the coffins.”
40 fillers
40 fillers
Group A’ – counterbalanced 30 predictable:
(reverse predictability)
“The vampires are sleeping
in coffins.”
30 unpredictable: “Joe
turned and saw the cabins.”
40 fillers
30 predictable:
“The vampires are
sleeping in coffins.”
30 unpredictable: “Joe
turned and saw the
cabins.”
40 fillers
Results
• With only one block (no baseline or post-exposure reading, which
would have primed words in listening task), our question was:
Is the subjects feature X (VOT, pitch, etc.) more similar to the model’s in
the predictable or unpredictable context?
Results: VOT
• votdiff = votsubj- votmodel
• relvotdiff = (votsubj / vlsubj–votmod / vlmod)
VOTDIFF
mean
median
RELVOTDIFF mean
median
NO INSTRUCTION TO IMITATE
Predictable
Unpredictable
words
words
TOLD TO IMITATE
Predictable Unpredictab
words
le words
-59.0 ms
-58.6 ms
-56.98%
-55.49 %
-42.8 ms
-45.4 ms
-45.65%
-45.76%
-54.1 ms
-52.0 ms
-50.15%
-49.09%
-40.7 ms
-41.9 ms
-43.23%
-44.08%
• Higher values (closer to zero) indicate VOT closer to the model.
Subjects were closer to the model when told to imitate, but more
importantly were closer to the model for unpredictable words.
Mixed effects model over all subjects reveals predictability and order
within experiment are significant predictors of VOT.
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: RELVOTDIFF ~ PREDICTABILITY * GROUP * ORDER * GENDER +
(1|SUBJECT) + (1|WORD)
predictability
order
NO INSTRUCTION TO
IMITATE
Chisq
Df Pr(>Chisq
)
24.6517 1 6.868e-07
***
0.0087
1 0.92588
TOLD TO IMITATE
Chisq
Df Pr(>Chisq)
1.3089
1
26.0099 1
predictability:
0.0780
1 0.77999 4.0696
order
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1
0.25260
3.397e-07
***
0.04366 *
• When given no
instruction to imitate,
predictability of the
target words is
significant.
• When told to imitate,
predictability is not
significant, however
order and a
predictability*order
interaction are.
• When told to imitate,
the subjects’ VOT
becomes more like the
models’ over time for
unpredictable words,
while there is no
change for predictable
words.
• Mixed effects model
shows significant
difference by
predictability in the 4th
quarter (tokens 75100), p = 0.0059 when
told to imitate
Results: Pitch
• Compared pitch of the target syllable (first syllable of [k] initial word)
to that of model
• relpitchdiff = (target.pitchmod / utterance.pitchmod)– (target.pitchsubj /
utterance.pitchsubj)
Analysis of Deviance Table (Type II Wald chisquare tests)
Given significant
differences for men and
women, these results
are considered
separately
Response: RELPITCHDIFF ~ PREDICTABILITY * GROUP * ORDER * GENDER +
(1|SUBJECT) + (1|WORD)
NO INSTRUCTION TO IMITATE
TOLD TO IMITATE
Chisq
Df Pr(>Chisq) Chisq Df Pr(>Chisq)
gender
6.0356
1 0.01402 * 3.3086 1 0.0689184 .
predictability: 0.0725
1 0.78771
4.0510 1 0.0441456 *
order:gender
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Pitch: Men
• Similar pattern to
VOT: closer to the
model for
unpredictable words
NO INSTRUCTION TO IMITATE TOLD TO IMITATE
Predictable Unpredictable
Predictable Unpredictable
RELPITCHDIFF
(MEN)
mean
median
-23.82%
-19.83%
-17.00%
-15.83%
-9.06%
-6.37%
-7.18%
-4.64%
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: RELPITCHDIFF (MEN)~ PREDICTABILITY * GROUP * ORDER * GENDER + (1|SUBJECT)
+ (1|WORD)
predictability
order
predictability:order
NO INSTRUCTION TO
IMITATE
Chisq
Df Pr(>Chisq)
12.931 1 0.0003232
0
***
7.1177 1 0.0076326 **
0.1243 1 0.7244663
TOLD TO IMITATE
Chisq
2.4758
Df
1
Pr(>Chisq)
0.115612
1.4566
9.0632
1
1
0.227467
0.002608
**
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
• Similar pattern to VOT (all subjects), for Predictability and Predictability:order-- when told to imitate male subjects get closer to the pitch contour but only
for unpredictable words (correlations: pred, r = -.06, unpred, r = 0.2)
• Unexpectedly, Order is significant when no instruction is given, and there is a
negative correlation--- subjects drift further from the model over time (r = -.1)
Pitch: Women
• Womens’ pitch on target
syllables was higher for
unpredictable words--- but
actually exceeding the model
in relative pitch.
RELPITCHDIFF
(WOMEN)
mean
median
NO INSTRUCTION TO IMITATE
Predictable
Unpredictable
words
words
-3.55%
6.88%
-5.01%
9.66%
TOLD TO IMITATE
Predictable Unpredictabl
words
e words
-2.30%
3.50%
0.06%
4.15%
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: RELPITCHDIFF (WOMEN)~ PREDICTABILITY * GROUP * ORDER + (1|SUBJECT) + (1|WORD)
NO INSTRUCTION TO IMITATE
Chisq
Df Pr(>Chisq)
predictability
25.4992 1
4.426e-07 ***
order
1.6495 1
0.19903
predictability:order
0.2155 1
0.64247
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
•
•
•
•
TOLD TO IMITATE
Chisq
Df Pr(>Chisq)
10.9250 1
0.0009487 ***
5.9768
1
0.0144956 *
0.2184
1
0.6402586
The pattern for women imitating pitch was different from men;
Predictability was significant in both experiments
No predictability:order interaction in either experiment
When told to imitate their pitch became more like the model over
time
Discussion
• The findings from these experiments
show that context modulates attention
to phonetic details, where fewer
phonetic details are noticed for
predictable words.
• Higher level linguistic knowledge
therefore can shape our exemplars,
agreeing with Goldinger’s (2007)
quote:
• “[E]ach stored exemplar is actually a product of perceptual input
combined with prior knowledge…”
Predictable words –>
blurred exemplars?
• The results also suggest something of a ‘how’ and a ‘what’ listening
mode, where predictable speech is processed in the ‘what’ mode and
unpredictable speech is processed in the ‘how’ mode.
• More complicated perhaps? Two extreme endpoints of a scale?
• Both predictability and
instructional conditions may push
speakers towards different
listening mode
• Weight of abstracted form vs. new
exemplar in both perception and
production
• Predictability*order interaction:
perhaps over time subjects
became accustomated to
prominence pattern, and
abstracted these details as well
for predictable words
Implications for Sound Change
• The imitation study showed that the difference in perception of words
according to predictability affected production as well; Thus relevant
for sound change.
• Another dimension of lexical diffusion? How to quantify a word’s global
predictability? (as opposed to frequency)
• Perhaps this process drives the profound phonological differences in
reliably more predictable word classes ---- such as function words, as
opposed to content words
• Could influence or facilitate the development of phrasal prominence?
(as opposed to Lindblom’s hyperarticulation /production based account)
Bibliography
• Babel, M. 2012. Evidence for phonetic and social selectivity in spontaneous phonetic imitation. Journal of Phonetics 40(1): 177–
189.
• Babel, M. & Bulatov, D. 2012. The role of fundamental frequency in phonetic accommodation. Language and Speech 552:231–248
• Cole, R.A. and Jakimik, J. 1980. A model of speech perception. In Cole, R. (ed.) Perception and Production of Fluent Speech, 133163.
• Goldinger, S. D. 1998. Echoes of echoes? An episodic theory of lexical access. Psychological Review 105(2): 251–279.
• Goldinger, S. D. 2007. A complementary-systems approach to abstract and episodic speech perception. In: Proceedings of the
17th International Congress of Phonetic Sciences, 49-54.
• Hickok, G. and Poeppel, D. 2004. Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy
of language. Cognition 92(1-2), 67-99.
• Hickok, G. and Poeppel, D. 2007. Opinion – The cortical organization of speech processing. Nature Reviews and Neuroscience 8(5)
393-402.
• Johnson, K. (1997). Speech perception without speaker normalization: An exemplar model. In Johnson & Mullennix (eds.) Talker
Variability in Speech Processing (pp. 145-165). San Diego: Academic Press.
• Lewandowski, N. 2012. Automaticity and consciousness in phonetic convergence (abstract). Proceedings of the Listening Talker
Workshop, 71.
• Lindblom, B., Guion, S., Hura, S., Moon, S-J., and Willerman, R. 1995. Is sound change adaptive? Rivista di Linguistica 7(1), 5-37.
• Maye, J. 2007. Learning to overcome L1 phonological biases. In: Proceedings of the 17th International Congress of Phonetic
Sciences.
• Marslen-Wilson, W. D., & Welsh, A. 1978. Processing interactions and lexical access during word-recognition in continuous
speech. Cognitive Psychology 10, 29-63.
• Nielsen, K. 2011. Specificity and abstractness of VOT imitation. Journal of Phonetics 39(2):132–142.
• Nye, P. W. & Fowler, C. A. 2003. Shadowing latency and imitation: the effect of familiarity with the phonetic patterning of English.
Journal of Phonetics 31(1): 63–79.
• Pardo, J. S. 2006. On phonetic convergence during conversational interaction. The Journal of the Acoustical Society of America
119(4): 2382.
• Pierrehumbert, J. 2002. Word-specific phonetics. In Gussenhoven, C., and Warner, N. (eds.), Laboratory Phonology VII (pp. 101139).
• Pierrehumber, J. 2006. The next toolkit. Journal of Phonetics 34, 516-531.
• Shockley K, Sabadini L, and Fowler CA (2004). Imitation in shadowing words. Perception and Psychophysics, 66, 422–9.
• Tilsen, S. 2009. Subphonemic and cross-phonemic priming in vowel shadowing: evidence for the involvement of exemplars in
production. Journal of Phonetics, 37: 3, 276-296.
• Warren, R. M. 1970. Perceptual restoration of missing speech sounds. Science 167, 392-393.