Predictability and Phonetic Attention: How Context Shapes Exemplars Jonathan Manker PHREND University of California, Santa Cruz September 24, 2016 Overview of Presentation • I. Background 1) Modes of Listening 2) Exemplar Theory • II. Experiment I: Listening to Predictable and Unpredictable Words • III. Experiment II: Imitation of Predictable and Unpredictable Words • IV. Discussion and Conclusion Background I: Modes of Listening • Research has shown low-level phonetic information is used alongside higher-level contextual information in speech recognition. • Cole & Jakimik (1980): “Words are recognized through the interaction of sound and knowledge.” • Words excised from sentence contexts can be phonetically ambiguous or even unrecognizable (coarticulation, reduction, etc.) • Phoneme restoration (Warren 1970): speakers fail to notice [s] replaced with a cough • Marslen-Wilson & Welsh (1978): speakers more likely to notice onset rather than coda speech errors ([n] replacing /m/ in ‘made’ vs. ‘time’) ‘What’ vs. ‘How’ Mode • Lindblom et al. (1995): • ‘what’ mode: more commonly used in speech, listeners focuses on extracting meaning • ‘how’ mode: listener focuses on details of sounds, pronunciation • relevant in sound change: suggests Ohala’s hypocorrection could result from listening in ‘how’ mode, failing to normalize expected coarticulation Listening Modes: Neurological Evidence • Hickok and Poeppel (2004, 2007): Dual Stream Model • apply dorsal and ventral streams to auditory processing • Ventral stream: • lower part of brain, bilateral • mapping between abstracted sound and meaning • Dorsal stream: • upper portions of brain, left-hemisphere oriented • active in sublexical processing such as phoneme identification, rhyming tasks • involved in motor planning; suggests perception-production link • Ventral analogous to ‘what’ mode; dorsal to ‘how’ mode Background: Exemplar Theory • Exemplar-based theories of speech (Johnson 1997, Pierrehumbert 2002) propose that individual instances of words are stored in memory. • Contrasted with theories only allowing abstract representations of words • Speakers may use all the available phonetic details to identify words, rather than normalizing all speech to abstracted forms • More recent work has suggested a need for both exemplars and abstract representations (Pierrehumbert 2002, Goldinger 2007). Exemplars and Higher-Level Knowledge • Goldinger (2007): “[E]ach stored exemplar is actually a product of perceptual input combined with prior knowledge…” • Maye (2007): Exemplars may be shaped by different levels of attention towards different cues, based on subjects’ L1 • Pierrehumbert (2006): Speakers attuned to more informative socioindexical features The Current Study • Cole & Jakimik’s observations demonstrate the effect of context on speech recognition and perception • Does context then mediate the activation of ‘what’/ventral vs. ‘how’/dorsal listening modes? • Hypotheses: • More attention to phonetic detail should occur with less context. • Predictable speech may be stored as less precise exemplars, more heavily influenced by higher level linguistic information. Experiment I: Listening to Predictable and Unpredictable Speech • Target question: Do listeners store or process more detail for unpredictable words, and less for predictable words? • This experiment will focus on speech perception alone, to isolate the role of the listener in attending to phonetic detail. • Subjects will hear words in sentences, in either a predictable or unpredictable context; the word will be repeated after a pause and listeners will be asked to compare it to the original. Is the same? • Prediction: Subjects will more notice more phonetic details of unpredictable words and will make more errors in comparing predictable words. Experiment I: Stimuli • Subjects are presented with a total of 80 recorded sentences: • Of these, 30 are target sentences including: • 15 with target words in predictable contexts • 15 with different target words in unpredictable contexts • Target words: All target sentences end in a target word, beginning with /k/, two syllables, and initial stress. • Predictable Context: • “The pioneers made log ______,” or “Pennies are made out of ______.” • Unpredictable Context: • “Joe turned and saw the ______,” or “The next word is ______.” • Predictability of words confirmed in Mechanical Turk survey: 30 subjects were asked to give one guess about blank word. All target words were correctly guessed by between 9/30 and 30/30 participants. Experiment I: Stimuli and tasks • The same recordings of the target words are heard in both predictability contexts (copied from predictable, pasted into unpredictable; pronunciations are identical, only difference is surrounding context) • The subjects will hear these target sentences, with either predictable or unpredictable targets, followed by a pause and some static, then with the target word repeated. • The target word is copied from the sentence, and is repeated either identically, or is modified to have longer VOT (at least 100 ms) and higher initial pitch (20 Hz higher than model). • Examples: Pennies are made out of copper … [STATIC] … copper The next word is copper … [STATIC] … copper • Subjects are asked to judge if the repeated word sounds the same as when it occurred in its sentence. • Prediction: Subjects will be more accurate in their comparisons with unpredictable words and less so with predictable words (more errors) Experiment I: Tasks and Stimuli, cont’d • The remaining 50/80 sentences are fillers of two types: • 25 audio comparison sentences (just like the targets), except the target / repeated word is not the last word in the sentence, and does not begin with /k/ • 25 “content” questions such that: • The listener simply hears “The professor taught linguistics for sixty years…” followed by “How long did the professor teach linguistics? • These were used in order to keep subjects listening both for meaning (“what?”) and sound (“how?”) Subject Groups • 80 subjects participated via Amazon Mechanical Turk (with the experiment run using SurveyGizmo) • 4 groups with 20 subjects each: • 2 groups were presented with 30/60 of the target words, while the other 2 groups were presented with the other half of the words • Within these groups, the targets were counterbalanced for predictability, so target words were only heard once by each subject, either predictable or unpredictable Group A heard e.g. ‘cabins’ in pred. context, ‘copper ‘in unpred. context Group B heard e.g. ‘canvas’ in pred. context, college’ in unpred. context Group A’ heard e.g. ‘cabins’ in unpred. ‘copper’ in pred. context Group B’ heard e.g. ‘canvas’ in unpred. context, ‘college’ in pred. context Results • The target data from this experiment was all categorical Y/N responses (did the word sound the same when repeated?) • Over 80 subjects and 2400 total responses (for target stimuli) • Of those 2400, there were 535 errors (22.3% error rate) • Subjects made significantly more errors in comparing predictable words (p = 0.012) following my hypothesis. • Errors for predictable words: 291 (54.4%) • Errors for unpredictable words: 244 (45.6%) Analysis of Deviance Table (Type II Wald chisquare tests) Response: response ~ answer * group * predictability * + (1|subject) + (1|word) Chisq Df Pr(>Chisq) answer 5.4117 1 0.02000* group 2.1484 1 0.14272 predictability 6.2991 1 0.01208* answer:group 1.7661 1 0.18387 answer:predictability 0.1064 1 0.74424 group:predictability 0.9430 1 0.33152 answer:group:predict 1.6281 1 0.20196 ability Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Experiment II: Imitation of Predictable and Unpredictable Speech • The first experiment established that predictability modulates attention to phonetic details • Thus, contextual information affects speech perception • Could this phenomenon be relevant for sound change? • Experiment II will examine the listener turned speaker--- how perception influences production • Phonetic accommodation paradigm will be used to understand what phonetic details are perceived and how those affect production Phonetic Accommodation • phenomenon whereby characteristics (phonetic, syntactic, etc.) of one’s speech are influenced through perceiving the speech of others. • stored word exemplars determine pronunciation • new exemplars influence new productions • Earliest observation in lab setting, Goldinger (1998): • • • • baseline word reading to post-stimulus immediate vs. delayed shadowing word frequency, # of repetitions more impressionistic AXB format for comparing to model • Quantitative measurements of specific manipulated features: • Shockley, Sabadini & Fowler (2004), Nielsen (2011): VOT • Tilsen (2009): vowel quality Phonetic Accommodation, cont’d • Other features of the phenomenon: • automatic / subconscious ? (Goldinger 1998, Lewandowski 2012) • Specificity of accommodated features: Nielsen (2011) • Abstractness / generalizability: Nielsen (2011) • Higher-level knowledge influences: Nye & Fowler (2003) • Social factors: • Divergence/convergence: Babel (2010), Babel (2012) • Gender and conversational roles: Pardo (2006) • Thus, phonetic accommodation is a useful tool for understanding what was perceived and how this influences production • Hypothesis: More phonetic accommodation will be observed for unpredictable speech Stimuli • The same target words and sentences were also used for experiment II. • Predictable Context: “The pioneers made log ______,” or “Pennies are made out of ______.” • Unpredictable Context: “Joe turned and saw the ______,” or “The next word is ______.” • In all target sentences, the final /k/ initial word was digitally enhanced (VOT was approximately doubled to at least 100 ms, pitch of the first syllable was raised by about 20 Hz). • Once again, the target words were excised from their pronunciations in the predictable context into the unpredictable context; thus the same recording of the same word was heard in both conditions (undoing any possible hyperarticulation of unpredictable words). Tasks • The experiment consisted of a single block of 100 sentences in random order: • 30 with target predictable words • 30 with target different unpredictable words • 40 filler sentences (no /k/ final words, no digital enhancement) • No target word was ever heard twice in the experiment (no priming effect possible) Subject Groups • The experiment consisted of four groups of 10 subjects each (40 total). • Two instructional conditions: • Condition I: No instruction to imitation (20 subjects) • Condition II: Told to “sound more like the model in some way” (20 subjects) • Groups were counterbalanced for target word context: • Group A heard 30 target predictable set A, 30 target unpredictable set B, and 40 fillers • Group B heard 30 target predictable set B, 30 target predictable set A, and the same 40 fillers. • (Again, each subject heard each target word only once, in either a predictable or unpredictable context depending on the group) Group A No instruction to imitate Told to imitate 30 predictable: “The pioneers made log cabins.” 30 predictable: “The pioneers made log cabins.” 30 unpredictable: “The first thing Mary saw was the coffins.” 30 unpredictable: “The first thing Mary saw was the coffins.” 40 fillers 40 fillers Group A’ – counterbalanced 30 predictable: (reverse predictability) “The vampires are sleeping in coffins.” 30 unpredictable: “Joe turned and saw the cabins.” 40 fillers 30 predictable: “The vampires are sleeping in coffins.” 30 unpredictable: “Joe turned and saw the cabins.” 40 fillers Results • With only one block (no baseline or post-exposure reading, which would have primed words in listening task), our question was: Is the subjects feature X (VOT, pitch, etc.) more similar to the model’s in the predictable or unpredictable context? Results: VOT • votdiff = votsubj- votmodel • relvotdiff = (votsubj / vlsubj–votmod / vlmod) VOTDIFF mean median RELVOTDIFF mean median NO INSTRUCTION TO IMITATE Predictable Unpredictable words words TOLD TO IMITATE Predictable Unpredictab words le words -59.0 ms -58.6 ms -56.98% -55.49 % -42.8 ms -45.4 ms -45.65% -45.76% -54.1 ms -52.0 ms -50.15% -49.09% -40.7 ms -41.9 ms -43.23% -44.08% • Higher values (closer to zero) indicate VOT closer to the model. Subjects were closer to the model when told to imitate, but more importantly were closer to the model for unpredictable words. Mixed effects model over all subjects reveals predictability and order within experiment are significant predictors of VOT. Analysis of Deviance Table (Type II Wald chisquare tests) Response: RELVOTDIFF ~ PREDICTABILITY * GROUP * ORDER * GENDER + (1|SUBJECT) + (1|WORD) predictability order NO INSTRUCTION TO IMITATE Chisq Df Pr(>Chisq ) 24.6517 1 6.868e-07 *** 0.0087 1 0.92588 TOLD TO IMITATE Chisq Df Pr(>Chisq) 1.3089 1 26.0099 1 predictability: 0.0780 1 0.77999 4.0696 order Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 1 0.25260 3.397e-07 *** 0.04366 * • When given no instruction to imitate, predictability of the target words is significant. • When told to imitate, predictability is not significant, however order and a predictability*order interaction are. • When told to imitate, the subjects’ VOT becomes more like the models’ over time for unpredictable words, while there is no change for predictable words. • Mixed effects model shows significant difference by predictability in the 4th quarter (tokens 75100), p = 0.0059 when told to imitate Results: Pitch • Compared pitch of the target syllable (first syllable of [k] initial word) to that of model • relpitchdiff = (target.pitchmod / utterance.pitchmod)– (target.pitchsubj / utterance.pitchsubj) Analysis of Deviance Table (Type II Wald chisquare tests) Given significant differences for men and women, these results are considered separately Response: RELPITCHDIFF ~ PREDICTABILITY * GROUP * ORDER * GENDER + (1|SUBJECT) + (1|WORD) NO INSTRUCTION TO IMITATE TOLD TO IMITATE Chisq Df Pr(>Chisq) Chisq Df Pr(>Chisq) gender 6.0356 1 0.01402 * 3.3086 1 0.0689184 . predictability: 0.0725 1 0.78771 4.0510 1 0.0441456 * order:gender Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Pitch: Men • Similar pattern to VOT: closer to the model for unpredictable words NO INSTRUCTION TO IMITATE TOLD TO IMITATE Predictable Unpredictable Predictable Unpredictable RELPITCHDIFF (MEN) mean median -23.82% -19.83% -17.00% -15.83% -9.06% -6.37% -7.18% -4.64% Analysis of Deviance Table (Type II Wald chisquare tests) Response: RELPITCHDIFF (MEN)~ PREDICTABILITY * GROUP * ORDER * GENDER + (1|SUBJECT) + (1|WORD) predictability order predictability:order NO INSTRUCTION TO IMITATE Chisq Df Pr(>Chisq) 12.931 1 0.0003232 0 *** 7.1177 1 0.0076326 ** 0.1243 1 0.7244663 TOLD TO IMITATE Chisq 2.4758 Df 1 Pr(>Chisq) 0.115612 1.4566 9.0632 1 1 0.227467 0.002608 ** Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 • Similar pattern to VOT (all subjects), for Predictability and Predictability:order-- when told to imitate male subjects get closer to the pitch contour but only for unpredictable words (correlations: pred, r = -.06, unpred, r = 0.2) • Unexpectedly, Order is significant when no instruction is given, and there is a negative correlation--- subjects drift further from the model over time (r = -.1) Pitch: Women • Womens’ pitch on target syllables was higher for unpredictable words--- but actually exceeding the model in relative pitch. RELPITCHDIFF (WOMEN) mean median NO INSTRUCTION TO IMITATE Predictable Unpredictable words words -3.55% 6.88% -5.01% 9.66% TOLD TO IMITATE Predictable Unpredictabl words e words -2.30% 3.50% 0.06% 4.15% Analysis of Deviance Table (Type II Wald chisquare tests) Response: RELPITCHDIFF (WOMEN)~ PREDICTABILITY * GROUP * ORDER + (1|SUBJECT) + (1|WORD) NO INSTRUCTION TO IMITATE Chisq Df Pr(>Chisq) predictability 25.4992 1 4.426e-07 *** order 1.6495 1 0.19903 predictability:order 0.2155 1 0.64247 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 • • • • TOLD TO IMITATE Chisq Df Pr(>Chisq) 10.9250 1 0.0009487 *** 5.9768 1 0.0144956 * 0.2184 1 0.6402586 The pattern for women imitating pitch was different from men; Predictability was significant in both experiments No predictability:order interaction in either experiment When told to imitate their pitch became more like the model over time Discussion • The findings from these experiments show that context modulates attention to phonetic details, where fewer phonetic details are noticed for predictable words. • Higher level linguistic knowledge therefore can shape our exemplars, agreeing with Goldinger’s (2007) quote: • “[E]ach stored exemplar is actually a product of perceptual input combined with prior knowledge…” Predictable words –> blurred exemplars? • The results also suggest something of a ‘how’ and a ‘what’ listening mode, where predictable speech is processed in the ‘what’ mode and unpredictable speech is processed in the ‘how’ mode. • More complicated perhaps? Two extreme endpoints of a scale? • Both predictability and instructional conditions may push speakers towards different listening mode • Weight of abstracted form vs. new exemplar in both perception and production • Predictability*order interaction: perhaps over time subjects became accustomated to prominence pattern, and abstracted these details as well for predictable words Implications for Sound Change • The imitation study showed that the difference in perception of words according to predictability affected production as well; Thus relevant for sound change. • Another dimension of lexical diffusion? How to quantify a word’s global predictability? (as opposed to frequency) • Perhaps this process drives the profound phonological differences in reliably more predictable word classes ---- such as function words, as opposed to content words • Could influence or facilitate the development of phrasal prominence? (as opposed to Lindblom’s hyperarticulation /production based account) Bibliography • Babel, M. 2012. Evidence for phonetic and social selectivity in spontaneous phonetic imitation. Journal of Phonetics 40(1): 177– 189. • Babel, M. & Bulatov, D. 2012. The role of fundamental frequency in phonetic accommodation. Language and Speech 552:231–248 • Cole, R.A. and Jakimik, J. 1980. A model of speech perception. In Cole, R. (ed.) Perception and Production of Fluent Speech, 133163. • Goldinger, S. D. 1998. Echoes of echoes? An episodic theory of lexical access. Psychological Review 105(2): 251–279. • Goldinger, S. D. 2007. A complementary-systems approach to abstract and episodic speech perception. In: Proceedings of the 17th International Congress of Phonetic Sciences, 49-54. • Hickok, G. and Poeppel, D. 2004. Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92(1-2), 67-99. • Hickok, G. and Poeppel, D. 2007. Opinion – The cortical organization of speech processing. Nature Reviews and Neuroscience 8(5) 393-402. • Johnson, K. (1997). Speech perception without speaker normalization: An exemplar model. In Johnson & Mullennix (eds.) Talker Variability in Speech Processing (pp. 145-165). San Diego: Academic Press. • Lewandowski, N. 2012. Automaticity and consciousness in phonetic convergence (abstract). Proceedings of the Listening Talker Workshop, 71. • Lindblom, B., Guion, S., Hura, S., Moon, S-J., and Willerman, R. 1995. Is sound change adaptive? Rivista di Linguistica 7(1), 5-37. • Maye, J. 2007. Learning to overcome L1 phonological biases. In: Proceedings of the 17th International Congress of Phonetic Sciences. • Marslen-Wilson, W. D., & Welsh, A. 1978. Processing interactions and lexical access during word-recognition in continuous speech. Cognitive Psychology 10, 29-63. • Nielsen, K. 2011. Specificity and abstractness of VOT imitation. Journal of Phonetics 39(2):132–142. • Nye, P. W. & Fowler, C. A. 2003. Shadowing latency and imitation: the effect of familiarity with the phonetic patterning of English. Journal of Phonetics 31(1): 63–79. • Pardo, J. S. 2006. On phonetic convergence during conversational interaction. The Journal of the Acoustical Society of America 119(4): 2382. • Pierrehumbert, J. 2002. Word-specific phonetics. In Gussenhoven, C., and Warner, N. (eds.), Laboratory Phonology VII (pp. 101139). • Pierrehumber, J. 2006. The next toolkit. Journal of Phonetics 34, 516-531. • Shockley K, Sabadini L, and Fowler CA (2004). Imitation in shadowing words. Perception and Psychophysics, 66, 422–9. • Tilsen, S. 2009. Subphonemic and cross-phonemic priming in vowel shadowing: evidence for the involvement of exemplars in production. Journal of Phonetics, 37: 3, 276-296. • Warren, R. M. 1970. Perceptual restoration of missing speech sounds. Science 167, 392-393.
© Copyright 2026 Paperzz