What is Compound Stress?

Left or right – or equal after all?
Acoustic analysis of
compound stress in English
Interdisziplinäres Sprachwissenschaftliches Kolloquium
Marburg, 19 May 2006
Gero Kunter
Universität Siegen
[email protected]
What is compound stress?
• definition: compound stress is the prominence relation
between the members of a compound
• focus on English noun-noun compounds
• combinatorial possibilities:
– left member is more prominent
→ leftward stress: compúter company
– right member is more prominent
→ rightward stress: home phónes
– equal prominence of both members?
→ level (equal, double) stress: knée-déep
Compound stress in English
• no systematic phonetic description available yet
• no or only partial answers to:
– what are the relevant phonetic cues?
– how do they affect perception?
– can they be used to predict perceived stress position?
• number of different stress types unclear for English
compounds
– two-way distinction (cf. Giegerich 2004)
(left-stress vs. right-stress)?
– three-way distinction (cf. Jones 1967 [4th ed.], Dretzke 1998)
(left-stress vs. right-stress vs. level stress)?
Three research questions
1. How do listeners rate prominence within compounds?
→ perception experiment
2. Which acoustic measurements can predict prominence
ratings?
→ model of compound stress
3. What stress categories exist for English NN compounds,
and can they be assigned automatically?
→ classification of compound stress
Corpus data
• Boston University Radio Speech Corpus (Ostendorf et al.
1996)
• recordings of radio speakers (3 female, 4 male)
• high-quality studio recordings of semi-natural speech
• corpus contains ~5,000 NN compounds
• only subset used in this paper
Example
The device is attached to a plastic wristband. It looks like a watch. It functions like an electronic probation officer. When a computerized call is made to a former prisoner's home phone, that person answers by plugging in the device. The wristband can be removed only by breaking its clasp.
First research question:
How do speakers perceive stress?
Previous research
• mainly concerned with...
– stress in verbs vs. nouns (e.g. Fry 1958): íncrease vs. incréase
– minimal pairs (e.g. Farnetani et al. 1988): páper bag vs. paper bág
• no account of compound stress variability
• only forced-choice experiments with given stress levels
Perception of compound stress
Stimuli:
• 105 random, unique items from Boston corpus
• 15 items from each speaker
Participants:
• 32 native speakers of American English
• 1 excluded (81 items were rated at scale extremes)
Task:
• prominence rating for each compound on an unmarked scale (internal
range: 0-999)
• rating done by moving a slider on computer screen
• slider position represents relative perceived prominence within
compound
Rating examples
[Figure: two density plots of individual prominence ratings; x-axis: prominence rating]
• house speaker: x̄ = 291, s.d. = 235.9
• Vietnam War: x̄ = 601, s.d. = 240.4
Results
• generally, ratings for each item show clear peaks of
prominence perception
• considerable variation of ratings between listeners (mean
s.d. = 176.3 across all items and subjects)
→ speakers may strongly disagree about prominence
patterns in compounds
• large overlaps may occur
→ compound stress categories are not clear-cut
Results
• distribution of mean ratings has peaks at 335 and 547
→ two different types of stress are perceived
• right-hand side of scale is disfavoured
• variation of ratings increases with mean of perception ratings
(r = 0.32, p < 0.01)
→ left-stress is perceptually more salient (less rating variation,
more decisive rating)
[Figure: density of mean prominence ratings with the two peaks marked; x-axis: mean prominence rating, 0-1000]
Means vs. unpooled results
• both distributions have the same peaks
• preference for the left side of the scale is clearer in the unpooled distribution
• target for further analysis: mean perception ratings
[Figure: two density plots; x-axes: unpooled prominence rating and mean prominence rating, each 0-1000]
Second research question:
How can prominence perception be
modelled?
Phonetic cues to compound stress
• stressed member in a compound has...
– higher pitch
– higher intensity
– longer duration
...than the unstressed member
(cf. Lehiste 1970, Farnetani et al. 1988)
• generally, pitch is claimed to be the strongest cue
• observations based on minimal pairs
• not reported in literature: creaky voice phonation may
occur in unstressed (right?) member (example item: breath test)
Acoustic measurements
• for left and right member:
– syllable with primary stress
– sonorant part of rime
• pitch, intensity, and duration of left member in relation to
right member
– calculation of differences
– difference is clearly positive if left value is larger than right
value
• creaky voice may occur independently in left and right
member (absolute values are used)
Pitch difference
• mean F0 (Fleft and Fright)
• pitch difference δpitch expressed in semitones:
$$\delta_{\mathrm{pitch}} = 12 \cdot \frac{\log\left(F_{\mathrm{left}} / F_{\mathrm{right}}\right)}{\log(2)}$$
• neutralization of gender-specific pitch range differences
• semitone scale is a good approximation of sound perception
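As a worked illustration of the semitone formula on this slide, the following minimal Python sketch computes the pitch difference from two mean F0 values. The function name and the example frequencies are hypothetical and not taken from the corpus.

```python
import math


def semitone_difference(f_left_hz: float, f_right_hz: float) -> float:
    """Pitch of the left member relative to the right member, in semitones.

    Positive values mean the left member has the higher mean F0.
    """
    if f_left_hz <= 0 or f_right_hz <= 0:
        raise ValueError("mean F0 values must be positive")
    return 12.0 * math.log(f_left_hz / f_right_hz) / math.log(2.0)


# Hypothetical mean F0 values in Hz (illustration only).
octave = semitone_difference(220.0, 110.0)   # left member one octave higher
male = semitone_difference(120.0, 100.0)     # lower-pitched voice
female = semitone_difference(240.0, 200.0)   # higher-pitched voice, same ratio

print(round(octave, 3), round(male, 3), round(female, 3))
```

Because only the ratio of the two frequencies enters the formula, the two hypothetical voices above yield the same difference, which is the neutralization of pitch range differences that the slide refers to.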
Intensity and duration difference
Intensity:
• average intensity in dB (Ileft and Iright)
• intensity difference δint:
$$\delta_{\mathrm{int}} = I_{\mathrm{left}} - I_{\mathrm{right}}$$
• difference neutralizes recording volume variations
Duration:
• length of sonorants of rime in msec (Dleft and Dright)
• duration difference δdur:
$$\delta_{\mathrm{dur}} = D_{\mathrm{left}} - D_{\mathrm{right}}$$
Creaky voice
• phonation mode with various acoustic properties:
– lowered F0
– lowered intensity
– high variation of glottal pulse timing (= jitter)
– high amplitude variation between pulses (= shimmer)
– ...
(cf. Blomgren et al. 1998, Gordon & Ladefoged 2001)
• used here: jitter (Jleft and Jright)
• measurements are logarithmically transformed
• high jitter value: creaky voice phonation in corresponding
member
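The slides do not state which jitter measure was used. As a rough sketch, the example below assumes the common "local jitter" definition (mean absolute difference between consecutive glottal periods, divided by the mean period) and a natural-log transform. Both choices, the function names, and the pulse times are assumptions for illustration.

```python
import math


def local_jitter(pulse_times):
    """Mean absolute difference of consecutive periods over the mean period."""
    periods = [t2 - t1 for t1, t2 in zip(pulse_times, pulse_times[1:])]
    if len(periods) < 2:
        raise ValueError("need at least three glottal pulses")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


def log_jitter(pulse_times):
    """Log-transformed jitter (natural log is an assumption)."""
    return math.log(local_jitter(pulse_times))


# Hypothetical glottal pulse times in seconds (illustration only).
modal = [0.0000, 0.0100, 0.0201, 0.0301, 0.0402, 0.0502]   # near-regular pulses
creaky = [0.0000, 0.0150, 0.0380, 0.0500, 0.0790, 0.0930]  # irregular pulses

print(log_jitter(modal), log_jitter(creaky))
```

The irregular pulse train gives the higher (less negative) log-jitter value. A perfectly regular pulse train would have zero jitter, for which the logarithm is undefined, so real data would need a guard for that case.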
Data collection
• items from perception experiment
• phonetic software: Praat (Boersma & Weenink 2006)
• boundaries of compound, member, and sonorant parts are
annotated manually
• measurements taken by a Praat script
• automatic approach allows large-scale empirical studies
Automatic adjustments
• necessary for pitch and jitter measurements
• gender-specific pitch range (75-300 Hz for male, 100-500
Hz for female speakers)
• pitch ranges and sensitivity are adjusted if
– no mean pitch can be determined
– octave jumps occur
→ necessary with creaky voice phonation
• fallback: manual marking of glottal pulses
Regression analysis
• statistical model that predicts results from perception
experiment on basis of acoustic measurements
• regression equation:
$$Y = \hat\beta_0 + \hat\beta_1 \delta_{\mathrm{pitch}} + \hat\beta_2 \delta_{\mathrm{dur}} + \hat\beta_3 \delta'_{\mathrm{int}} + \hat\beta_4 J'_{\mathrm{left}} + \hat\beta_5 J'_{\mathrm{right}} + \varepsilon$$
– Y: response variable (= perception rating)
– β̂n: regression coefficients
– δx, J'x: predictors (= acoustic measurements)
– ε: error (= model deviation)
Problems: intrinsic vowel properties
• vowel-intrinsic pitch, duration and intensity differences are
neglected
• regression analysis is regarded as quite robust against
this type of variation, but
– do corrections of intrinsic properties improve the analysis?
– how could these corrections be done?
– e.g. pitch normalization for each / / against mean pitch of all
/ /s? For each speaker?
Problems: collinearity
• some of the measurements are highly correlated:
– intensity difference (δint) correlates with pitch difference
(δpitch)
– jitter J correlates with pitch and intensity differences (δpitch
and δint)
• solution: only information that is not contained in the
correlating variables is included in analysis
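• for intensity:
$$\delta_{\mathrm{int}} = \hat\beta_0 + \hat\beta_1 \delta_{\mathrm{pitch}} + \delta'_{\mathrm{int}}$$
• for jitter:
$$J_{\mathrm{left}} = \hat\beta_0 + \hat\beta_1 \delta_{\mathrm{pitch}} + \hat\beta_2 \delta_{\mathrm{int}} + J'_{\mathrm{left}}$$
$$J_{\mathrm{right}} = \hat\beta_0 + \hat\beta_1 \delta_{\mathrm{pitch}} + \hat\beta_2 \delta_{\mathrm{int}} + J'_{\mathrm{right}}$$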
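The correction for intensity can be read as a residualization: the intensity difference is regressed on the pitch difference, and only the residual enters the main model. The sketch below illustrates this for the one-predictor case with ordinary least squares. The helper names and the data are hypothetical. The jitter case works the same way but needs a two-predictor regression, which is not shown here.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residualize(y, x):
    """Part of y that is not linearly predictable from x."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical per-item measurements (illustration only).
delta_pitch = [3.0, 1.5, -0.5, -2.0, 0.8, 2.2]   # semitones
delta_int = [4.1, 2.0, -0.2, -3.1, 0.5, 3.4]     # dB

delta_int_resid = residualize(delta_int, delta_pitch)
print([round(r, 3) for r in delta_int_resid])
```

By construction the residuals are uncorrelated with the pitch difference, so the two predictors no longer compete for the same variance in the main regression.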
Results
• jitter in left member is not significant for the model
• all other variables are significant predictors for perceived
stress position (p < 0.01)
• measurements predict perception rating as expected
• stress perception is more leftward if
– left pitch is higher
– left intensity is higher
– left duration is longer
– right member has high degree of jitter
Regression table
              β̂        std. err.      t        p(|t|)
intercept    480.30      10.75      44.69    < 0.001
δpitch       -19.87       1.74     -11.42    < 0.001
δ'int        -13.22       2.36      -5.60    < 0.001
δdur        -690.59     101.59      -6.80    < 0.001
J'right       28.31      10.76       2.63    < 0.01
F(4, 98) = 44.14, p < 0.001, R² = 0.629
• coefficients reflect influence of measurement on stress perception:
$$Y = 480.30 - 19.87 \cdot \delta_{\mathrm{pitch}} - 690.59 \cdot \delta_{\mathrm{dur}} - 13.22 \cdot \delta'_{\mathrm{int}} + 28.31 \cdot J'_{\mathrm{right}}$$
• coefficient sizes are not necessarily comparable, because different
scales are used
• large amount of the variation of perception rating explained
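To make the fitted equation concrete, the sketch below plugs hypothetical measurements into the coefficients from the regression table. One assumption is labelled in the code: the duration difference is entered in seconds. The slides give durations in msec, but the size of the δdur coefficient only yields ratings within the 0-999 scale if seconds are used. The function name and input values are illustrative.

```python
# Coefficients copied from the regression table on this slide.
COEFFICIENTS = {
    "intercept": 480.30,
    "delta_pitch": -19.87,        # per semitone
    "delta_int_resid": -13.22,    # residualized intensity difference
    "delta_dur": -690.59,         # ASSUMPTION: per second, not per msec
    "jitter_right_resid": 28.31,  # residualized log jitter, right member
}


def predict_rating(delta_pitch, delta_int_resid, delta_dur, jitter_right_resid):
    """Predicted prominence rating on the 0-999 slider scale."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["delta_pitch"] * delta_pitch
        + c["delta_int_resid"] * delta_int_resid
        + c["delta_dur"] * delta_dur
        + c["jitter_right_resid"] * jitter_right_resid
    )


# Hypothetical item: left member 2 semitones higher, 1 unit more residual
# intensity, 50 ms longer, average residual jitter in the right member.
example = predict_rating(2.0, 1.0, 0.05, 0.0)
neutral = predict_rating(0.0, 0.0, 0.0, 0.0)
print(round(example, 2), round(neutral, 2))
```

With all predictors at zero the prediction equals the intercept. The hypothetical item with a higher, louder and longer left member receives a lower predicted rating, that is, one closer to the left end of the scale.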
Research question three:
What stress categories exist for English
NN compounds?
English compound stress levels
• usually two stress levels are assumed – left- and rightward stress (e.g. Giegerich 2004, Plag 2006)
• still, level stress is described occasionally (e.g. Dretzke
1998):
gárden wáll, hóme rúle, knée-déep
• arguments based on observational data or theoretical
grounds
• no statistical support has been provided yet
Stress classification
• idea: phonetically similar items should belong to the same
stress category
• procedure:
– standardization of measurements
– (standardized) coefficients from regression analysis are used
as weights
– hierarchical cluster analysis
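The first two steps of the procedure can be sketched as follows: each measurement is converted to z-scores and then scaled by a weight. The slides do not report the standardized coefficients, so the weights below are hypothetical placeholders, as are the function names and the data.

```python
import statistics


def standardize(values):
    """Convert raw measurements to z-scores (mean 0, sample s.d. 1)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def weighted_features(columns, weights):
    """Return one weighted, standardized feature vector per item.

    columns: dict mapping measurement name -> list of raw values (one per item)
    weights: dict mapping measurement name -> weight for that measurement
    """
    z = {name: standardize(vals) for name, vals in columns.items()}
    n_items = len(next(iter(columns.values())))
    return [[weights[name] * z[name][i] for name in columns] for i in range(n_items)]


# Hypothetical measurements for four items (illustration only).
columns = {
    "delta_pitch": [3.0, 1.5, -0.5, -2.0],
    "delta_dur": [0.04, 0.01, -0.02, -0.05],
}
# HYPOTHETICAL weights standing in for the standardized coefficients.
weights = {"delta_pitch": 0.6, "delta_dur": 0.3}

features = weighted_features(columns, weights)
print(features)
```

Standardizing first puts all measurements on a common scale, so that the weights rather than the original units determine how much each cue contributes to the distance between items.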
Hierarchical cluster analysis
• clustering algorithm:
– find the two most similar items or groups
– combine into a new group
– calculate group center
– repeat until all items and groups are combined
• representation: binary tree
• vertical distance of linking nodes indicates similarity of
groups
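The four steps of the algorithm can be sketched in plain Python as below. This is a minimal illustration that assumes Euclidean distance between group centers. The slides do not name the distance measure or linkage rule that was actually used, and the item labels and feature vectors are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def agglomerate(items, n_clusters=1):
    """Merge the two closest groups until n_clusters groups remain.

    items: dict mapping label -> feature vector
    Returns (clusters, merges); merges lists (distance, members) per step,
    which is the information a dendrogram would display.
    """
    clusters = [([label], list(vec)) for label, vec in items.items()]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # Step 1: find the two most similar groups (smallest center distance).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = math.dist(clusters[i][1], clusters[j][1])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        # Steps 2 and 3: combine them and compute the new group center.
        members = clusters[i][0] + clusters[j][0]
        center = centroid([items[m] for m in members])
        merges.append((dist, members))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members, center))
    return [sorted(c[0]) for c in clusters], merges


# Hypothetical weighted feature vectors (illustration only).
items = {
    "item_a": [0.0, 0.1],
    "item_b": [0.2, 0.0],
    "item_c": [3.0, 3.1],
    "item_d": [3.2, 2.9],
}
two_groups, merges = agglomerate(items, n_clusters=2)
print(sorted(two_groups))
```

Stopping at two groups corresponds to cutting the binary tree just below its top node. Running with `n_clusters=1` records every merge and its distance, which is what the tree representation shows.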
Results
[Figure: dendrogram from the hierarchical cluster analysis; y-axis: Height (0-3000); x-axis: items. The two main branches are labelled with example members:]
• home phones, eviction notices, police training, state colleges, ...
• boarding schools, shellfish, house speaker, price tag, ...
Items (leaf labels, in extraction order): tabulation errors, AIDS virus, boston harbor, phone calls, hiv virus, lynn activist, neighborhood residents, state employees, police training, budget cuts, Massachusetts House, lunchboxes, fenway park, user interfaces, Mathers case, health clinics, income tax, biotech centers, u.s. citizen, taxation committee, training ship, Welfare Department, state colleges, lawmakers, dinosaur researcher, roadways, senate president, households, spending reductions, myosin gene, state consent, Naushon Island, eviction notices, wristband, Vietnam War, homicide charge, king appointee, treaty rights, Seabrook, attrition program, campaign promise, correction officials, shellfish, fringe candidates, school districts, pay cut, automobile registration, bank customer, game plan, oil fires, bandaid, oil spill, war effort, Macintosh user, Senate committee, target market, tax hike, condo market, cancer patient, lawmaker, water use, Bush administration, role models, Javelin Software, Roxbury site, milltown, funding situation, house speaker, Beacon Hill, massachusetts towns, beacon hill, capital police, election day, community service, Finneran amendment, home phones, college board, state house, time zone, bond sales, watershed, girlfriend, trade barriers, air-time, paychecks, water pipes, prayer healing, bingo games, sex acts, boarding schools, Wonderland, bar association, growing season, learning disabilities, state officials, immigration policy, growth period, network, racing days, prison sentence, price tag, computer companies, birth control, T.H.M. problem, breathalizer results
Results
            stress type       median rating
cluster 1   left-stressed     343
cluster 2   right-stressed    527
• perception ratings of the two groups are significantly
different (t(103) = 6.529, p < 0.01, Cohen's d = 1.34)
• large effect size, overlap less than 33%
• group means correspond closely to peaks in
perception experiment (335 and 547)
[Figure: perception rating (y-axis, 200-700) by cluster; cluster 1: boarding schools, shellfish, house speaker, price tag, ...; cluster 2: home phones, eviction notices, police training, state colleges, ...]
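For readers who want to see how a test statistic and effect size of this kind are obtained, the sketch below computes a pooled-variance t statistic and Cohen's d for two groups of ratings. The pooled-standard-deviation form of d is an assumption (the slides do not say which variant was used), and the ratings are invented for illustration.

```python
import math
import statistics


def pooled_sd(a, b):
    """Pooled sample standard deviation of two groups."""
    na, nb = len(a), len(b)
    var = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return math.sqrt(var)


def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled s.d."""
    return (statistics.fmean(b) - statistics.fmean(a)) / pooled_sd(a, b)


def student_t(a, b):
    """Two-sample t statistic with len(a) + len(b) - 2 degrees of freedom."""
    na, nb = len(a), len(b)
    se = pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb)
    return (statistics.fmean(b) - statistics.fmean(a)) / se


# Invented mean ratings for two small groups (illustration only).
cluster_1 = [300, 320, 350, 360, 390]
cluster_2 = [480, 510, 530, 560, 600]

d = cohens_d(cluster_1, cluster_2)
t = student_t(cluster_1, cluster_2)
print(round(d, 3), round(t, 3))
```

The degrees of freedom of this test are the total number of items minus two, which matches the t(103) reported on the slide for 105 items.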
Summary
1. How do listeners rate prominence within compounds?
→ native speakers perceive two distinct stress levels
→ large amount of variation between raters
→ leftward stress is perceived more unambiguously
2. Which acoustic measurements can predict prominence
ratings?
→ pitch, duration, intensity differences and phonation in right member
contribute to stress perception
→ perceived stress can be predicted by these cues
→ prediction accuracy comparable to rating of listeners
Summary
3. What stress categories exist for English NN compounds,
and can they be assigned automatically?
→ acoustic measurements divide compounds into two groups with
significantly different stress ratings
→ no support for hypothesized level stress category
Perspectives
• model can be used to predict perception scores for
unrated compounds
• large-scale empirical work on rules governing compound
stress becomes possible (with all 5,000 compounds in the
Boston corpus)
• application in automatic speech recognition systems
References
Blomgren, Michael, Yang Chen, Manwa L. Ng & Harvey R. Gilbert (1998), ‘Acoustic, aerodynamic, physiologic, and perceptual properties of modal and vocal fry registers’, JASA 103(5), 2649-2658.
Boersma, Paul & David Weenink (2006), Praat: doing phonetics by computer (Version 4.4.20) [Computer program]. Retrieved May 3, 2006, from http://www.praat.org/
Dretzke, Burkhard (1998), Modern English and American Pronunciation, Paderborn: Schöningh.
Farnetani, Edda, Carol Taylor Torsello & Piero Cosi (1988), ‘English compound versus non-compound noun phrases in discourse: an acoustic and perceptual study’, Language and Speech 31(2), 157-180.
Fry, Dennis B. (1958), ‘Experiments in the perception of stress’, Language and Speech 1, 126-152.
Giegerich, Heinz (2004), ‘Compound or phrase? English noun-plus-noun constructions and the stress criterion’, English Language and Linguistics 8(1), 1-24.
Gordon, Matthew & Peter Ladefoged (2001), ‘Phonation types: a cross-linguistic overview’, Journal of Phonetics 29, 383-406.
Jones, Daniel (1967, 4th ed.), The Pronunciation of English, Cambridge: CUP.
Lehiste, Ilse (1970), Suprasegmentals, Cambridge, Mass: MIT Press.
Ostendorf, Mari, Patti Price & Stefanie Shattuck-Hufnagel (1996), Boston University Radio Speech Corpus, Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
Plag, Ingo (2006), ‘The variability of compound stress in English: structural, semantic and analogical factors’, English Language and Linguistics 10(1), 143-172.