Left or right, or perhaps equal? An acoustic analysis of compound stress in English
Interdisziplinäres Sprachwissenschaftliches Kolloquium Marburg, 19 May 2006
Gero Kunter, Universität Siegen

What is compound stress?
• definition: compound stress is the prominence relation between the members of a compound
• focus on English noun-noun compounds
• combinatorial possibilities:
  – left member is more prominent → left-ward stress: compúter company
  – right member is more prominent → right-ward stress: home phónes
  – equal prominence of both members? → level (equal, double) stress: knée-déep

Compound stress in English
• no systematic phonetic description available yet
• no or only partial answers to:
  – what are the relevant phonetic cues?
  – how do they affect perception?
  – can they be used to predict perceived stress position?
• number of different stress types unclear for English compounds:
  – two-way distinction (left-stress vs. right-stress; cf. Giegerich 2004)?
  – three-way distinction (left-stress vs. right-stress vs. level stress; cf. Jones 1967, 4th ed.; Dretzke 1998)?

Three research questions
1. How do listeners rate prominence within compounds? → perception experiment
2. Which acoustic measurements can predict prominence ratings? → model of compound stress
3. What stress categories exist for English NN compounds, and can they be assigned automatically? → classification of compound stress

Corpus data
• Boston University Radio Speech Corpus (Ostendorf et al. 1996)
• recordings of radio speakers (3 female, 4 male)
• high-quality studio recordings of semi-natural speech
• corpus contains ~5,000 NN compounds
• only a subset is used in this paper

Example
The device is attached to a plastic wristband. It looks like a watch. It functions like an electronic probation officer. When a computerized call is made to a former prisoner's home phone, that person answers by plugging in the device. The wristband can be removed only by breaking its clasp.

First research question: How do speakers perceive stress?

Previous research
• mainly concerned with:
  – stress in verbs vs. nouns (e.g. Fry 1958): íncrease vs. incréase
  – minimal pairs (e.g. Farnetani et al. 1988): páper bag vs. paper bág
• no account of compound stress variability
• only forced-choice experiments with given stress levels

Perception of compound stress
Stimuli:
• 105 random, unique items from the Boston corpus
• 15 items from each speaker
Participants:
• 32 native speakers of American English
• 1 excluded (81 items were rated at scale extremes)
Task:
• prominence rating for each compound on an unmarked scale (internal range: 0-999)
• rating done by moving a slider on a computer screen
• slider position represents relative perceived prominence within the compound

Rating examples
[Figure: two density plots of listener ratings; x-axis: prominence rating, y-axis: density]
• house speaker: mean = 291, s.d. = 235.9
• Vietnam War: mean = 601, s.d. = 240.4

Results
• generally, ratings for each item show clear peaks of prominence perception
• considerable variation of ratings between listeners (mean s.d. = 176.3 across all items and subjects)
  → speakers may strongly disagree about prominence patterns in compounds
• large overlaps may occur
  → compound stress categories are not a clear-cut case

Results
[Figure: density plot of mean prominence ratings with two marked peaks; x-axis: mean prominence rating, y-axis: density]
• distribution of mean ratings has peaks at 335 and 547
  → two different types of stress are perceived
• right-hand side of scale is disfavoured
• variation of ratings increases with mean of perception ratings (r = 0.32, p < 0.01)
  → left-stress is perceptually more salient (less rating variation, more decisive rating)

Means vs. unpooled results
[Figure: two density plots; x-axes: unpooled prominence rating and mean prominence rating, y-axis: density]
• both distributions have the same peaks
• preference for the left side of the scale is clearer in the unpooled distribution
• target for further analysis: mean perception ratings

Second research question: How can prominence perception be modelled?

Phonetic cues to compound stress
• the stressed member in a compound has
  – higher pitch
  – higher intensity
  – longer duration
  than the unstressed member (cf. Lehiste 1970, Farnetani et al. 1988)
• generally, pitch is claimed to be the strongest cue
• observations based on minimal pairs
• not reported in the literature: creaky voice phonation may occur in the unstressed (right?) member
  – example: breath test

Acoustic measurements
• for left and right member:
  – syllable with primary stress
  – sonorant part of rime
• pitch, intensity, and duration of left member in relation to right member
  – calculation of differences
  – difference is positive if the left value is larger than the right value
• creaky voice may occur independently in left and right member (absolute values are used)

Pitch difference
• mean F0 (F_left and F_right)
• pitch difference δpitch expressed in semitones:
\[ \delta_{\text{pitch}} = 12 \cdot \frac{\log\left(F_{\text{left}} / F_{\text{right}}\right)}{\log(2)} \]
• neutralization of gender-specific pitch range differences
• semitone scale is a good approximation of sound perception

Intensity and duration difference
Intensity:
• average intensity in dB (I_left and I_right)
• intensity difference δint:
\[ \delta_{\text{int}} = I_{\text{left}} - I_{\text{right}} \]
• difference neutralizes recording volume variations
Duration:
• length of sonorants of rime in msec (D_left and D_right)
• duration difference δdur:
\[ \delta_{\text{dur}} = D_{\text{left}} - D_{\text{right}} \]

Creaky voice
• phonation mode with various acoustic properties (cf. Blomgren et al. 1998, Gordon & Ladefoged 2001):
  – lowered F0
  – lowered intensity
  – high variation of glottal pulse timing (= jitter)
  – high amplitude variation between pulses (= shimmer)
  – ...
• used here: jitter (J_left and J_right)
• measurements are logarithmically transformed
• high jitter value: creaky voice phonation in the corresponding member

Data collection
• items from the perception experiment
• phonetic software: Praat (Boersma & Weenink 2006)
• boundaries of compound, member, and sonorant parts are annotated manually
• measurements taken by a Praat script
• automatic approach allows large-scale empirical studies

Automatic adjustments
• necessary for pitch and jitter measurements
• gender-specific pitch range (75-300 Hz for male, 100-500 Hz for female speakers)
• pitch ranges and sensitivity are adjusted if
  – no mean pitch can be determined
  – octave jumps occur
  → necessary with creaky voice phonation
• fallback: manual marking of glottal pulses

Regression analysis
• statistical model that predicts the results of the perception experiment on the basis of the acoustic measurements
• regression equation:
\[ Y = \hat{\beta}_0 + \hat{\beta}_1 \delta_{\text{pitch}} + \hat{\beta}_2 \delta_{\text{dur}} + \hat{\beta}_3 \delta'_{\text{int}} + \hat{\beta}_4 J'_{\text{left}} + \hat{\beta}_5 J'_{\text{right}} + \varepsilon \]
  – Y: response variable (= perception rating)
  – β̂n: regression coefficients
  – δx, J'x: predictors (= acoustic measurements)
  – ε: error (= model deviation)

Problems: intrinsic vowel properties
• vowel-intrinsic pitch, duration and intensity differences are neglected
• regression analysis is regarded as quite robust against this type of variation, but
  – do corrections of intrinsic properties improve the analysis?
  – how could these corrections be done?
  – e.g. pitch normalization for each / / against mean pitch of all / /s? For each speaker?
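
Sketch: computing the difference measures
The difference measures defined on the "Pitch difference", "Intensity and duration difference" and "Creaky voice" slides can be illustrated with a few lines of Python. This is only a minimal sketch, not the Praat script used in the study: the function names and the example values are hypothetical, and the natural logarithm for the jitter transform is an assumption, since the slides only say that jitter is "logarithmically transformed".

```python
import math


def semitone_difference(f_left_hz: float, f_right_hz: float) -> float:
    """Pitch difference in semitones: 12 * log(F_left / F_right) / log(2)."""
    if f_left_hz <= 0 or f_right_hz <= 0:
        raise ValueError("mean F0 values must be positive")
    return 12.0 * math.log(f_left_hz / f_right_hz) / math.log(2.0)


def intensity_difference(i_left_db: float, i_right_db: float) -> float:
    """Intensity difference in dB (left minus right)."""
    return i_left_db - i_right_db


def duration_difference(d_left: float, d_right: float) -> float:
    """Duration difference (left minus right), in the unit of the inputs."""
    return d_left - d_right


def log_jitter(jitter: float) -> float:
    """Log-transformed jitter; the base (natural log) is an assumption."""
    if jitter <= 0:
        raise ValueError("jitter must be positive")
    return math.log(jitter)


# Hypothetical measurements for one compound (not taken from the corpus).
d_pitch = semitone_difference(220.0, 200.0)   # left member slightly higher
d_int = intensity_difference(68.0, 65.5)      # left member louder
d_dur = duration_difference(110.0, 95.0)      # left member longer
print(round(d_pitch, 2), d_int, d_dur)
```

Because the pitch measure is a ratio on a logarithmic scale, the same musical interval gives the same value in a low and in a high register, for example `semitone_difference(110, 100)` and `semitone_difference(220, 200)`. This is the sense in which the measure neutralizes gender-specific pitch ranges.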

Problems: collinearity
• some of the measurements are highly correlated:
  – intensity difference (δint) correlates with pitch difference (δpitch)
  – jitter J correlates with pitch and intensity differences (δpitch and δint)
• solution: only the information that is not contained in the correlating variables is included in the analysis, i.e. the residuals δ'int, J'left and J'right
• for intensity:
\[ \delta_{\text{int}} = \hat{\beta}_0 + \hat{\beta}_1 \delta_{\text{pitch}} + \delta'_{\text{int}} \]
• for jitter:
\[ J_{\text{left}} = \hat{\beta}_0 + \hat{\beta}_1 \delta_{\text{pitch}} + \hat{\beta}_2 \delta_{\text{int}} + J'_{\text{left}} \]
\[ J_{\text{right}} = \hat{\beta}_0 + \hat{\beta}_1 \delta_{\text{pitch}} + \hat{\beta}_2 \delta_{\text{int}} + J'_{\text{right}} \]

Results
• jitter in the left member is not significant for the model
• all other variables are significant predictors of perceived stress position (p < 0.01)
• measurements predict perception rating as expected
• stress perception is more left-ward if
  – left pitch is higher
  – left intensity is higher
  – left duration is longer
  – right member has a high degree of jitter

Regression table
              β̂          std. err.   t         p(|t|)
intercept     480.30     10.75       44.69     < 0.001
δpitch        -19.87     1.74        -11.42    < 0.001
δ'int         -13.22     2.36        -5.60     < 0.001
δdur          -690.59    101.59      -6.80     < 0.001
J'right       28.31      10.76       2.63      < 0.01
F(4, 98) = 44.14, p < 0.001, R² = 0.629
• coefficients reflect the influence of each measurement on stress perception:
\[ Y = 480.30 - 19.87 \cdot \delta_{\text{pitch}} - 690.59 \cdot \delta_{\text{dur}} - 13.22 \cdot \delta'_{\text{int}} + 28.31 \cdot J'_{\text{right}} \]
• coefficient sizes are not necessarily comparable, because different scales are used
• a large amount of the variation in perception ratings is explained

Research question three: What stress categories exist for English NN compounds?

English compound stress levels
• usually two stress levels are assumed
  – left- and rightward stress (e.g. Giegerich 2004, Plag 2006)
• still, level stress is described occasionally (e.g. Dretzke 1998): gárden wáll, hóme rúle, knée-déep
• arguments are based on observational data or theoretical grounds
• no statistical support has been provided yet

Stress classification
• idea: phonetically similar items should belong to the same stress category
• procedure:
  – standardization of measurements
  – (standardized) coefficients from regression analysis are used as weights
  – hierarchical cluster analysis

Hierarchical cluster analysis
• clustering algorithm:
  – find the two most similar items or groups
  – combine them into a new group
  – calculate the group center
  – repeat until all items and groups are combined
• representation: binary tree
• vertical distance of linking nodes indicates similarity of groups

Results
[Figure: dendrogram of the compounds from the perception experiment; x-axis: items, y-axis: height (0 to 3000). Example items annotated on the two main branches: "home phones, eviction notices, police training, state colleges, ..." and "boarding schools, shellfish, house speaker, price tag, ..."]

Results
[Figure: box plots of perception ratings for cluster 1 and cluster 2, annotated with the same example items]
            stress type       median rating
cluster 1   left-stressed     343
cluster 2   right-stressed    527
• perception rating of the two groups is significantly different (t(103) = 6.529, p < 0.01, Cohen's d = 1.34)
• large effect size, overlap less than 33%
• group means correspond closely to the peaks in the perception experiment (335 and 547)

Summary
1. How do listeners rate prominence within compounds?
  → native speakers perceive two distinct stress levels
  → large amount of variation between raters
  → left-ward stress is perceived more unambiguously
2. Which acoustic measurements can predict prominence ratings?
  → pitch, duration, intensity differences and phonation in the right member contribute to stress perception
  → perceived stress can be predicted by these cues
  → prediction accuracy comparable to rating of listeners
3. What stress categories exist for English NN compounds, and can they be assigned automatically?
  → acoustic measurements divide compounds into two groups with significantly different stress ratings
  → no support for the hypothesized level stress category

Perspectives
• model can be used to predict perception scores for unrated compounds
• large-scale empirical work on rules governing compound stress becomes possible (with all 5,000 compounds in the Boston corpus)
• application in automatic speech recognition systems

References
Blomgren, Michael, Yang Chen, Manwa L. Ng & Harvey R. Gilbert (1998), ‘Acoustic, aerodynamic, physiologic, and perceptual properties of modal and vocal fry registers’, JASA 103(5), 2649-2658.
Boersma, Paul & David Weenink (2006), Praat: doing phonetics by computer (Version 4.4.20) [Computer program]. Retrieved May 3, 2006, from http://www.praat.org/
Dretzke, Burkhard (1998), Modern English and American Pronunciation, Paderborn: Schöningh.
Farnetani, Edda, Carol Taylor Torsello & Piero Cosi (1988), ‘English compound versus non-compound noun phrases in discourse: an acoustic and perceptual study’, Language and Speech 31(2), 157-180.
Fry, Dennis B. (1958), ‘Experiments in the perception of stress’, Language and Speech 1, 126-152.
Giegerich, Heinz (2004), ‘Compound or phrase? English noun-plus-noun constructions and the stress criterion’, English Language and Linguistics 8(1), 1-24.
Gordon, Matthew & Peter Ladefoged (2001), ‘Phonation types: a cross-linguistic overview’, Journal of Phonetics 29, 383-406.
Jones, Daniel (1967), The Pronunciation of English, 4th ed., Cambridge: CUP.
Lehiste, Ilse (1970), Suprasegmentals, Cambridge, Mass.: MIT Press.
Ostendorf, Mari, Patti Price & Stefanie Shattuck-Hufnagel (1996), Boston University Radio Speech Corpus, Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
Plag, Ingo (2006), ‘The variability of compound stress in English: structural, semantic and analogical factors’, English Language and Linguistics 10(1), 143-172.
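
Appendix sketch: residualization and prediction
The collinearity treatment and the fitted regression equation from the talk can be illustrated in plain Python. This is a hedged sketch under stated assumptions, not the analysis code behind the slides: `solve_linear`, `residualize` and `predict_rating` are hypothetical helper names, the toy data are invented, and the unit of δdur expected by the reported coefficient is not stated unambiguously on the slides, so values are passed through unchanged. Only the coefficients are taken from the regression table.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def residualize(target: Sequence[float], *predictors: Sequence[float]) -> List[float]:
    """Part of `target` not linearly explained by `predictors` (plus intercept)."""
    rows = [[1.0] + [float(p[i]) for p in predictors] for i in range(len(target))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve_linear(xtx, xty)  # least-squares fit via normal equations
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, target)]


def predict_rating(d_pitch: float, d_int_resid: float, d_dur: float,
                   j_right_resid: float) -> float:
    """Predicted prominence rating, using the coefficients reported in the talk."""
    return (480.30 - 19.87 * d_pitch - 13.22 * d_int_resid
            - 690.59 * d_dur + 28.31 * j_right_resid)


# Invented toy data: intensity difference partly follows pitch difference.
d_pitch = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
d_int = [-1.5, -1.2, 0.3, 0.4, 2.1, 2.4]
d_int_resid = residualize(d_int, d_pitch)  # corresponds to δ'int
print([round(v, 3) for v in d_int_resid])
```

With all predictors at zero, `predict_rating` returns the intercept; positive pitch, intensity or duration differences lower the predicted value, which corresponds to the left-hand side of the rating scale.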