The effect of prosody and speaker voice information on the interpretation of hyperbole
Sunwoo Jeong, Simon Todd, Justine Kao, Noah Goodman, Meghan Sumner
Stanford University

Introduction
•  Pragmatically enriched utterances in everyday life: non-literal interpretations
   –  Irony: e.g. “That’s just great!”
   –  Metaphor: e.g. “You are the light of my life.”
   –  Hyperbole: e.g. “This place sells the best cheesecake in the universe!”, “This sandwich cost a million dollars!”

Prior studies on hyperbole
•  Lexical and syntactic factors that affect the interpretation of hyperbole: McCarthy and Carter (2004), Loewenberg (1982)
   –  Lexical items: numeric expressions (millions of), intensifying adverbs (absolutely), other adverbs (literally)
   –  Syntactic structures: polysyndetic structure (loads and loads and loads of)
•  Interactional factors: a shift in footing (Goffman 1979)

Prior studies on hyperbole
•  An experimental study by Kao et al. (2014)
   –  Subjects are presented with text stimuli (e.g. “It cost 1000 dollars.”) given in an interaction context
   –  They are asked to rate the likelihood of 500, 1000, etc. being the actual price of the item
•  Finding: subjects’ responses can be predicted by the price priors and the affect priors
   –  A rational speech act (RSA) model successfully models the experimental data

Our hypotheses
•  Prosodic cues and speaker voice information given in spoken words will also systematically affect the interpretation of hyperbole

Our hypotheses: the effect of prosody
•  Categorical differences in prosodic contours will affect the interpretation of hyperbole
   –  declarative contour vs. hyperbolic contour
•  More incremental adjustments in finer-grained prosodic cues will also affect the interpretation of hyperbole
   –  higher pitch of the main accented syllable, longer duration of the main accented syllable

Our hypotheses: the effect of speaker voice information
•  Macro-level social information about the speaker may influence the interpretation of hyperbole
   –  e.g. speaker gender
•  More specific impressions about the characteristics of the speaker may also influence the interpretation of hyperbole
   –  e.g. sounding impulsive, thrifty, etc.

Methodology
•  A perception study using spoken words
•  Closely resembles the experiment design in Kao et al. (2014), which used text stimuli
•  The stimuli are actual speech recordings that have been carefully manipulated to test the effects of different kinds of prosody

Methodology
1. Establishing the price priors for the items used in the stimuli
2. Main experiment
3. Post-experiment judgments on different speaker voices

2. Main experiment: stimuli
“I bought X yesterday. It cost Y dollars.”
•  X (item): perfume, china set, necklace, chainsaw, dumbbell rack, lawn mower
•  Y (said price): a hundred, a thousand, a million

2. Main experiment: questions
•  Question: How much do you think X actually cost? (numeric free response)

2. Main experiment: speakers
•  Stimuli recorded by 4 different speakers (2 male, 2 female) in two different prosodies
   –  Declarative (L* on the main accented syllable)
   –  Hyperbolic (H* on the main accented syllable)
•  No manipulations on the declarative prosody recordings
•  Further manipulations on the hyperbolic prosody recordings

2. Main experiment: prosodic manipulations
•  The main accented syllables were normalized within each gender for the baseline hyperbolic prosodies
•  The pitch and/or the duration of the main accented syllable bearing the pitch accent (HUNdred, THOUsand, MILLion) was manipulated, with minimal adjustment to adjacent syllables

2. Main experiment: prosodic manipulations
[Figure: two pitch tracks of “It cost me a thousand dollars.” (total durations 2.186 s and 2.181 s); x-axis: Time (s); y-axis: Pitch (Hz), 75 to 300]
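
The manipulation of the main accented syllable can be sketched in a few lines, using the scaling factors this study reports (F0 × 1.25, duration × 1.5). This is only an illustration of the arithmetic: the `Syllable` container, the `manipulate` helper, and the baseline values are hypothetical, and the sketch leaves adjacent syllables untouched, whereas the actual stimuli involved minimal adjustment to them.

```python
from dataclasses import dataclass, replace

# Scaling factors reported for the manipulated hyperbolic tokens.
PITCH_FACTOR = 1.25     # multiplies baseline F0 of the accented syllable
DURATION_FACTOR = 1.5   # multiplies baseline duration of the accented syllable


@dataclass(frozen=True)
class Syllable:
    """Hypothetical container for one syllable's measurements."""
    label: str
    f0_hz: float         # how F0 is summarized per syllable is an assumption
    duration_s: float
    accented: bool = False


def manipulate(syllables, scale_pitch=False, scale_duration=False):
    """Return a copy in which only the accented syllable is rescaled."""
    out = []
    for syl in syllables:
        if syl.accented:
            syl = replace(
                syl,
                f0_hz=syl.f0_hz * (PITCH_FACTOR if scale_pitch else 1.0),
                duration_s=syl.duration_s
                * (DURATION_FACTOR if scale_duration else 1.0),
            )
        out.append(syl)
    return out


# Made-up baseline values for "THOUsand"; not measurements from the study.
baseline = [
    Syllable("THOU", f0_hz=200.0, duration_s=0.20, accented=True),
    Syllable("sand", f0_hz=150.0, duration_s=0.25),
]

pitch_only = manipulate(baseline, scale_pitch=True)
both = manipulate(baseline, scale_pitch=True, scale_duration=True)

print(pitch_only[0].f0_hz)   # 250.0
print(both[0].duration_s)    # roughly 0.3
```

Calling `manipulate` with only `scale_duration=True` gives the duration-only token in the same way.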
2. Main experiment: prosodic manipulations
•  Baseline F0 values of the accented syllables multiplied by 1.25 to generate pitch-manipulated hyperbolic prosody tokens
•  Baseline durations of the accented syllables multiplied by 1.5 to generate duration-manipulated hyperbolic prosody tokens
•  Both durations and F0 values manipulated to generate tokens with both manipulations

2. Main experiment: design
•  Speaker: Male 1, Male 2, Female 1, Female 2
•  Item: perfume, necklace, china set, lawn mower, dumbbell rack, chainsaw
•  Said price: 100, 1000, 1000000
•  Manipulations (P = pitch, D = duration): Declarative; Hyperbolic; M1 (P/D) in Condition 1; M2 (P) in Condition 2; M3 (D) in Condition 3

1. Establishing baselines
•  Price prior experiment: “D bought E. How much do you think E cost?”
   –  D: feminine (Nicole), masculine (Jerry), or gender-neutral (Skyler) names
   –  E: stereotypically feminine items (perfume, necklace, china set), stereotypically masculine items (lawn mower, dumbbell rack, chainsaw); 6 items in total

1. Price priors: results

3. Subjective judgments on speaker voices
•  A sentence with neutral propositional content spoken by the 4 speakers: “I think he walked around the pond.”
•  Questions related to impressions about the speaker voice

3. Subjective judgments on speaker voices
•  Like-me rating: How much do you think the speaker is like you?
   –  on a scale from 1 to 100
•  Impulsiveness: Is the speaker likely to spend money without planning to?
   –  yes/no
•  Thriftiness: Is the speaker likely to save more than 20% of his/her income annually?
   –  yes/no

3. Subjective judgments on speaker voices
•  Income: What do you think is the most likely annual income of the speaker?
   –  less than 50,000
   –  between 50,000 and 100,000
   –  more than 100,000
•  Trendiness: Is the speaker likely to be up to date on recent style trends?
   –  yes/no

Procedures
•  Price prior experiment
   –  40 subjects
   –  Less than 5 minutes
•  Main experiment / post-experiment judgments
   –  150 subjects (50 for each of the 3 conditions)
   –  Allowed multiple breaks, 30-45 minutes
•  All recruited from Amazon Mechanical Turk

Interpreting the results
•  Roughly, less frequent literal interpretations are taken to indicate more hyperbolic readings
•  Roughly, lower estimated actual prices are taken to indicate stronger, more hyperbolic readings
•  Some caveats (the complex pragmatics of hyperbole)

The semantics and pragmatics of hyperbole
•  The Gricean view and the cooperative principle (Grice 1975)
   –  Speakers are cooperative agents whose aim is to make relevant and helpful contributions to the conversation
•  More and more exaggeration is not always equivalent to stronger hyperbole
•  Must be mindful of the pragmatics of hyperbole when interpreting the results

The semantics and pragmatics of hyperbole
“This perfume cost a million dollars!”
•  Rule out the literal interpretation based on prior knowledge
•  Reason that there must be a specific goal as to why the speaker decided to convey the non-literal meaning
•  Derive affective subtext (usually negative affect; e.g. Gibbs and O’Brien 1991)

Results: the case of said price 100
[Figure: histogram of dat1_1$price.actual for “It cost a hundred dollars.”; x-axis: answers from the subjects (estimated actual price), 0 to 250; y-axis: Frequency]

Results: the case of said price Million
[Figure: histogram of dat3_2$price.actual for “It cost a million dollars.”; x-axis: answers from the subjects (estimated actual price), 0 to 6000; y-axis: Frequency]

Results: the case of said price 1000
[Figure: histogram of dat2_1$price.actual for “It cost a thousand dollars.”; x-axis: answers from the subjects (estimated actual price), 0 to 2000; y-axis: Frequency]
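
The two rough measures used to read these response distributions (how often subjects give the literal price, and how low the estimated price is) can be computed as in the sketch below. Counting a response as "literal" only when it exactly equals the said price is an assumption made for illustration, as are the `summarize` helper and the response values.

```python
from statistics import mean


def summarize(responses, said_price):
    """Summarize free-response price estimates for one said price.

    Returns the share of literal responses (estimate equals the said price;
    this exact-match criterion is an assumption) and the mean estimate.
    """
    literal = sum(1 for r in responses if r == said_price)
    return {
        "literal_share": literal / len(responses),
        "mean_estimate": mean(responses),
    }


# Made-up responses to "It cost a thousand dollars."; not data from the study.
declarative = [1000, 1000, 900, 1000, 800]
hyperbolic = [1000, 600, 500, 700, 800]

decl_summary = summarize(declarative, said_price=1000)
hyp_summary = summarize(hyperbolic, said_price=1000)
print(decl_summary)
print(hyp_summary)
```

Under the reading described in the slides, a lower literal share and a lower mean estimate would both point toward a more hyperbolic interpretation, subject to the pragmatic caveats noted there.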
Results: the effect of prosody
•  3 different mixed effects models for different said price bins (hundred / thousand / million)
   –  Main dependent variable: estimated actual price (responses from the subjects)
   –  Independent variables: prosody, price prior means, price prior standard deviations, item sex
   –  Random effects: speakers, subjects

Results: the effect of prosody
•  At least a three-way distinction between the effects of different prosodies:
   –  Declarative
   –  Hyperbolic (base)
   –  Hyperbolic + Pitch + Duration

Results: the effect of prosody
•  Baseline hyperbolic prosodies elicit significantly more hyperbolic responses (i.e. lower estimated actual price) than declarative prosodies (p < 0.1)
•  Hyperbolic prosodies with both pitch and duration manipulations elicit significantly more hyperbolic responses (i.e. lower estimated actual price) than declarative prosodies (p < 0.001)

Results: the effect of prosody
[Figure: density of estimated actual price (price.actual, 0 to 2000) by manipulation (Declarative, Hyperbolic, Hyp+Duration, Hyp+Pitch, Hyp+Pitch+Duration); contrasts marked with *: Declarative vs. Baseline Hyperbolic; Baseline Hyperbolic vs. Hyperbolic + Pitch + Duration]
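
As a rough sketch of the model structure listed for the prosody analysis (fixed effects for prosody, price prior mean, price prior standard deviation, and item sex; random effects for speakers and subjects), one model per said-price bin could be written as below. The notation, and the choice of random intercepts rather than random slopes, are assumptions for illustration; the slides do not give the exact specification.

```latex
% y_{ijk}: estimated actual price given by subject i for speaker j's recording of item k
% x^{pros}: indicator vector for the prosody of that recording
% \mu_k, \sigma_k: price prior mean and standard deviation of item k; g_k: item sex
\[
y_{ijk} = \beta_0
        + \boldsymbol{\beta}_{\mathrm{pros}}^{\top}\,\mathbf{x}^{\mathrm{pros}}_{ijk}
        + \beta_{\mu}\,\mu_k
        + \beta_{\sigma}\,\sigma_k
        + \beta_{g}\,g_k
        + u_i + v_j + \varepsilon_{ijk},
\]
\[
u_i \sim \mathcal{N}(0, \tau_{\mathrm{subj}}^2), \qquad
v_j \sim \mathcal{N}(0, \tau_{\mathrm{spk}}^2), \qquad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma_{\varepsilon}^2).
\]
```

In this notation, a negative coefficient in \(\boldsymbol{\beta}_{\mathrm{pros}}\) for a hyperbolic prosody (relative to declarative) corresponds to lower estimated prices, i.e. a more hyperbolic reading.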
Results: the effect of speaker voice information
•  3 different linear regression models for different said price bins (hundred / thousand / million)
   –  Main dependent variable: estimated actual price (responses from the subjects)
   –  Independent variables: prosody, price prior means, price prior standard deviations, item sex, impulsiveness, thriftiness, estimated speaker income, trendiness
   –  Interactions between prosody and 4 kinds of speaker characteristics

Results: the effect of speaker voice information
•  There was a weak speaker gender effect (p < 0.05): female speakers elicited significantly lower estimated actual prices (more hyperbole)
•  However, this effect disappeared when more fine-grained speaker judgments were included as predictors (e.g. impulsiveness)
•  Surface gender effects are most likely mediated by other, more specific characteristics (cf. Ochs 1992, Podesva 2007)

Results: the effect of speaker voice information
•  Other things being equal, in general, if the speaker is perceived as being:
   –  Impulsive → significantly more hyperbolic interpretations
   –  Thrifty → significantly less hyperbolic interpretations
   –  Higher income → significantly more hyperbolic interpretations

By item: the effect of impulsiveness
[Figure: distributions of estimated actual price by item; panels: Impulsive: Yes, Impulsive: No]
•  Higher peaks in the 1000 area (literal interpretation) for the non-impulsive cases
•  More responses in the 500-800 region for the impulsive cases

By item: the effect of perceived speaker income
[Figure: distributions of estimated actual price by item for the Duration and Pitch manipulations; panels: Income: Low, Income: Mid, Income: High]
•  Perceived low income speakers elicit considerably more literal interpretations
   –  esp. in the case of lawn mower (combined with higher pitch)

Interactions between prosody and speaker voice information
•  Complex, multi-faceted interactions between prosody and speaker voice information
   –  If the speaker is perceived to be upper-class, higher pitch elicits significantly more hyperbolic interpretations
   –  If the speaker is perceived to be lower-class, higher pitch elicits significantly less hyperbolic interpretations (more literal interpretations)

Estimated speaker income: low
[Figure: estimated actual price (price.actual) by manipulation (Declarative, Hyperbolic, Hyp+Duration, Hyp+Pitch, Hyp+Pitch+Duration); annotations: higher pitch: literal interpretation; longer duration: hyperbolic interpretation]

Estimated speaker income: mid
[Figure: estimated actual price (price.actual) by manipulation (Declarative, Hyperbolic, Hyp+Duration, Hyp+Pitch, Hyp+Pitch+Duration); annotations: longer duration: literal interpretation; higher pitch: hyperbolic interpretation]

Estimated speaker income: high
[Figure: estimated actual price (price.actual) by manipulation (Declarative, Hyperbolic, Hyp+Duration, Hyp+Pitch, Hyp+Pitch+Duration); annotations: longer duration: literal interpretation; higher pitch: hyperbolic interpretation]
Interactions between prosody and speaker voice information
•  Second example
   –  If the speaker is perceived to be impulsive, the effects of higher pitch and longer duration become more pronounced
   –  If the speaker is perceived to be not impulsive, the effects of higher pitch and longer duration are considerably mitigated

[Figure: estimated actual price (price.actual, 0 to 2000) by manipulation (Declarative, Hyperbolic, Hyp+Duration, Hyp+Pitch, Hyp+Pitch+Duration); panels: Speaker impulsiveness: yes, Speaker impulsiveness: no]
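
One simple way to inspect an interaction of this kind is to compute the effect of a prosodic manipulation separately within each perceived-speaker group and compare the two. The sketch below does this with cell means; it is a descriptive check, not the regression analysis reported in the slides, and the `pitch_effect_by_group` helper and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pitch_effect_by_group(rows, baseline="Hyperbolic", manipulated="Hyp+Pitch"):
    """Mean estimated price under `manipulated` minus mean under `baseline`,
    computed separately for each perceived-speaker group.

    `rows` holds (group, manipulation, estimated_price) tuples. Every group
    needs at least one response in both cells, otherwise mean() raises.
    """
    cells = defaultdict(list)
    for group, manipulation, price in rows:
        cells[(group, manipulation)].append(price)
    groups = sorted({group for group, _ in cells})
    return {
        g: mean(cells[(g, manipulated)]) - mean(cells[(g, baseline)])
        for g in groups
    }


# Made-up responses for said price 1000; not data from the study.
rows = [
    ("impulsive", "Hyperbolic", 800), ("impulsive", "Hyperbolic", 900),
    ("impulsive", "Hyp+Pitch", 500), ("impulsive", "Hyp+Pitch", 600),
    ("not impulsive", "Hyperbolic", 900), ("not impulsive", "Hyperbolic", 1000),
    ("not impulsive", "Hyp+Pitch", 900), ("not impulsive", "Hyp+Pitch", 900),
]

effects = pitch_effect_by_group(rows)
interaction = effects["impulsive"] - effects["not impulsive"]
print(effects)       # {'impulsive': -300, 'not impulsive': -50}
print(interaction)   # -250
```

A within-group effect that is more negative means higher pitch lowers the estimated price more in that group; a nonzero difference between the groups' effects is what an interaction term in a regression would pick up.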
Interactions between prosody and speaker voice information
•  A complex interaction between prosodic cues and speaker voice information
•  A single prosodic cue such as higher pitch does not always categorically index a single kind of affective or pragmatic meaning
•  Different emphasis compared to previous works (e.g. Paeschke 2004, Ko et al. 2014)

Conclusion
•  Categorical differences in prosodic contours systematically affect the interpretation of hyperbole
•  Finer-grained differences in prosodic cues (e.g. higher pitch, longer duration) also systematically affect the interpretation of hyperbole

Conclusion
•  Speaker voice information such as perceived impulsiveness of the speaker also significantly influences the interpretation of hyperbole
•  The study suggests important three-way interactions between prosody, pragmatic meaning, and social meaning

Selected bibliography
Bergen, Leon, Noah D. Goodman, and Roger Levy. 2012. “That’s What She (could Have) Said: How Alternative Utterances Affect Language Use.” In Proceedings of the Thirty-Fourth Annual Conference of the Cognitive Science Society, eds. Naomi Miyake, David Peebles, and Richard P. Cooper. Austin, TX: Cognitive Science Society, 120–25.
Gibbs, Raymond W., and Jennifer O’Brien. 1991. “Psychological Aspects of Irony Understanding.” Journal of Pragmatics 16(6): 523–30.
Goffman, E. 1979. “Footing.” Semiotica 25(1-2): 1–30.
Grice, H. P. 1975. “Logic and Conversation.” In Syntax and Semantics Volume 3: Speech Acts, eds. P. Cole and J. Morgan.
Johnson, M., and E. Charniak. 2004. “A TAG-based Noisy Channel Model of Speech Repairs.” In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics.
Ochs, Elinor. 1992. “Indexing Gender.” In Rethinking Context, eds. Alessandro Duranti and Charles Goodwin. Cambridge, U.K.: Cambridge University Press, 335–359.

Selected bibliography
Kao, Justine T., Jean Y. Wu, Leon Bergen, and Noah D. Goodman. 2014. “Nonliteral Understanding of Number Words.” Proceedings of the National Academy of Sciences 111(33): 12002–7.
Ko, Sei Jin, Melody S. Sadler, and Adam D. Galinsky. 2014. “The Sound of Power: Conveying and Detecting Hierarchical Rank Through Voice.” Psychological Science: 0956797614553009.
Loewenberg, I. 1982. “Labels and Hedges, the Metalinguistic Turn.” Language and Style 15(3): 193–207.
McCarthy, Michael, and Ronald Carter. 2004. “‘There’s Millions of Them’: Hyperbole in Everyday Conversation.” Journal of Pragmatics 36(2): 149–84.
Paeschke, Astrid. 2004. “Global Trend of Fundamental Frequency in Emotional Speech.” In Speech Prosody 2004, International Conference.
Podesva, Robert J. 2007. “Phonation Type as a Stylistic Variable: The Use of Falsetto in Constructing a Persona.” Journal of Sociolinguistics 11: 478–504.

We thank Chris Potts, Penny Eckert, and Rob Podesva for their insightful comments. We also thank the four speakers who helped us with the stimuli creation, as well as numerous other participants who helped us during our recording sessions.
This project was supported in part by funding from the Clayman Institute for Gender Research.