Word accent, emphatic stress, and syntax in a synthesis rule

Dept. for Speech, Music and Hearing
Quarterly Progress and
Status Report
Word accent, emphatic
stress, and syntax in a
synthesis rule scheme for
Swedish
Carlson, R. and Granstròˆm, B.
journal:
volume:
number:
year:
pages:
STL-QPSR
14
2-3
1973
031-036
http://www.speech.kth.se/qpsr
32.
STL-QPSR 2 -3/1973
A m i x t u r e of the acute and grave r u l e s y s t e m s will give a tentative
s y s t e m predicting the fundamental frequency p a t t e r n of Swedish nonsense
word sentences without limitations concerning word accent. It should
be s t r e s s e d that the p r e s e n t r u l e s d e s c r i b e only a c e r t a i n subject' s be havior and that d i a l e c t a l and ideolectal d i f f e r e n c e s have not yet been
considered (4)
.
One important presupposition underlying o u r r u l e s y s t e m i s that the
intonation contour c a n be split into two p a r t s :
sentence intonation and
In s t a t e m e n t s pronounced by our subject the
word s t r e s s marking.
sentence intonation could be d e s c r i b e d simply a s a l i n e a r f a l l in F 0 with
constant s t a r t i n g and end values.
The s t r e s s m a r k i n g contour i s c h a r -
a c t e r i z e d by the location of m i n i m a and by a positive F 0 wave that conThe amplitude of t h i s wave w a s found t o be r e l a t e d
n e c t s t h e s e minima.
t o the duration of the preceding s t r e s s e d vowel (Fig. 111-B-I).
This
s y s t e m will p r e d i c t the intonational p a t t e r n of sentences consisting only
of acute accent words (Fig. 111-B -2).
With r e s p e c t to F
0
t h e r e i s a close s i m i l a r i t y between the syllable
with secondary s t r e s s in grave w o r d s and the s t r e s s e d syllable in acute
words.
However, the durational p a t t e r n i s different.
Typically a vowel
c a r r y i n g secondary s t r e s s w a s found to be about 23 % s h o r t e r than a
m a i n s t r e s s vowel, everything e l s e being equal(').
T h i s will influence
the peak height of the Fo wave ( ~ i g .111-B-I).
The p r i m a r y s t r e s s syllable of a grave word h a s a different s t r e s s
marking c o r r e l a t e .
The p a t t e r n i s the i n v e r s e of that of the acute accent,
and the r e f e r e n c e point i s a m a x i m u m r a t h e r than a minimum.
The
value of the maximum s e e m s to be a function of the duration of the word
receding the grave syllable.
Now t h e r e i s a contradiction between two
hypotheses viz. , the truncation hypothesis and the r a t e adjustment hypot h e s i s (3). The contradiction l e a d s t o the question, should the word intonation contour of p r i m a r y s t r e s s syllables in grave accent words have
a fixed shape which i s truncated in accordance with the duration of the
vowel o r should the m i n i m u m govern the p a t t e r n s o a s to cause a n a d justment of r a t e ? In our s y s t e m we have chosen the second hypothesis
in o r d e r t o make the r u l e s a s simple a s possible.
variance with the e a r l i e r r e s u l t s ( 3 ) .
not be of perceptual importance.
However, t h i s i s a t
The adopted r u l e might o r might
STL-QPSR 2-3/1973
The Fo r u l e s
The r u l e s have a s input the s t r e s s and tonal p a t t e r n and the duration
of s u c c e s s i v e s e g m e n t s in a given sentence.
LA1
T e m p o-ra l-location
--- - - - -of-
I.
Fo m i n i m a
i)
Locate one minimum in the middle of e a c h s t r e s s e d vowel
segment except p r i m a r y s t r e s s vowels in grave accent words.
ii) Move the m i n i m u m to the vowel onset in sentence boundary
words.
iii) Two additional m i n i m a a r e located a t the beginning and
end of the sentence.
2.
F, m a x i m a
Locate a maximum in the initial p a r t of s t r e s s e d vowels
c a r r y i n g p r i m a r y s t r e s s in grave a c c e n t words.
FBL -L v a-l u-e-s-of-m
-a-x-i m-aC o m ~ u t ethe value of m a x i m a (dealt with in stage A2) using
the durational coefficient for the preceding word.
i m,
a and
& L C , - onnect
,,
-m-a x,
- -m-i n-i m-a1.
Connect m a x i m a and m i n i m a by half a period of a cosine wave
2.
Connect m i n i m a with no m a x i m a inbetween by a complete,
inverted period of a cosine wave. The period length is d e t e r mined by the duration inbetween the two m i n i m a and the
amplitude i s a function of the duration of the s t r e s s e d vowel
containing the f i r s t minimum (Fig. 111-B - I).
-(D)- -Add
_ -the
- _sentence
_ - - -intonation
- - - - - component
-----F o r our subject: a l i n e a r fall f r o m 120 Hz
to 90 Hz.
An illustration of how t h e s e r u l e s function i s shown in Fig. 111-B-3.
It should be noted that the duration h a s been normalized for the sentence
c a r r y i n g the grave a c c e n t word in o r d e r t o get synchrony between the two
sentence s.
P o s s i b l e expansion of the m o d e l
Within the p r e s e n t d e s c r i p t i v e f r a m e w o r k it a p p e a r s possible to make
some generalizations a i m i n g a t a synthesis of a wider c l a s s of sentence
realizations.
A l t e r a t i o n s of the p h r a s e contour could obviously give such
effects a s e x p r e s s i o n of doubt (question tone) o r signalling that something
will follow (continuation tone).
150
200
250
300 msec
VOWEL DURATION
(MEDIAN)
Fig. 111-B-I.
Deviation of Fo peak f r o m the underlying
sentence contour a s a f u n c t i ~ nof the d u r a tion of the preceding s t r e s s e d vowel.
-
f0
stressed vowels
A
--
unstressed vowels
cmsononts
...
J
Fig. 111-B-2.
time
A fundamental frequency contour synthesized with the aid of the timing and F, rules
described in the text. The structural analysis of the sentence i s shown to the left.
STL-QPSR 2 -3/1973
34.
Depending on the s t r u c t u r a l a n a l y s i s of the sentence some variation
in the timing and intonation might be predicted.
In this section we will
d e m o n s t r a t e how the model handles junctural phenomena and e m p h a s i s
The sentence chosen i s the s a m e a s the one
within a single sentence.
One p r o c e d u r e that the m o d e l suggests
used in the previous sections.
for c r e a t i n g emphatic s t r e s s i s briefly touched upon in ref.").
The x
coefficient in the timing r u l e contains a c e r t a i n factor g r e a t e r than the
one for the emphatic s e g m e n t s and the i n v e r s e of that factor f o r the
surrounding syllables.
Through this f a c t o r the absolute value of the n
coefficient m a y be made continuously variable s o a s to provide the a p
-
The r e s u l t i n g modification of the t i m e
p r o p r i a t e d e g r e e of e m p h a s i s .
s t r u c t u r e will c a u s e a d e s i r a b l e modification of the intonational p a t t e r n
The p a t t e r n c a n be s e e n in Fig. 111-B -4.
without a l t e r i n g the F o - r u l e s.
Only one outstanding peak in F
0
a s s o c i a t e d with the e m p h a s i s a p p e a r s
a s a l s o noted f o r other languages (Svetozarova)t 5)
.
In Fig. 111-B-4 two a l t e r n a t i v e s a r e p r e s e n t e d ; e m p h a s i s on the
s t r e s s e d syllable v s on the whole word.
T h e s e r e a l i z a t i o n s can be in
I
f r e e variation but with a definitive p r e f e r e n c e f o r the syllable e m p h a s i s .
1
I
The next example i l l u s t r a t e s how in t h i s m o d e l a difference in syntactic a n a l y s i s of the s a m e sentence will be reflected in different p r o sodic p a t t e r n s .
Dividing the r e f e r e n c e sentence in two p h r a s e s a t two
1
different p l a c e s will convey a shift of meaning a s :
"John tog
bilen i garaget"
"John tog bilen
i garaget"
"John took
the c a r in the garage"
"John took the c a r
in the garage"
In the m o d e l t h i s h a s been handled by letting the p h r a s e s p a s s through
the r u l e s y s t e m independently but deleting the final and initial specification of Fo a t the juncture.
The r e s u l t can be s e e n in Fig. 111-B -5.
In
t h i s example informal listening t e s t s indicate that no i m p r e s s i o n of
subordination of the second p a r t of the sentence i s conveyed but a c l e a r
p r e j u n c t u r a l i n c r e a s e in perceived s t r e s s i s apparent.
The subordination could e a s i l y be taken c a r e of by postulating a higher node with a p p r o p r i a t e cu , p coefficients a s d i s c u s s e d extensively in ref. (2)
.
Conclusion:
At l e a s t two c l a s s e s of effects a r e omitted in the p r e s e n t
model of prosody.
One i s the variation in intensity depending on the
amplitude and s p e c t r a l
shape of the sound s o u r c e s , the other i s the
-- -Iaccent1I
.e.e.
stressed vowls(occent words)
words)
"
unstressed vowels
................. consonants
0
time
Fig. 111-B-3.
Synthesized fundamental frequency contours showing the difference between accent I and accent 11.
35.
STL-QPSR 2 -3/1973
s y s t e m a t i c distribution of positional allophones.
T h i s d o e s not m e a n that
we deny the importance of t h e s e effects, only that we want t o concentrate
on what we believe to be the two m o s t important prosodic f a c t o r s t h a t
a r e not segment-specific namely timing and Fo intonation.
The relevance of our r u l e s y s t e m h a s been d e m o n s t r a t e d by synthe sizing nonsense sentence s including the given example s and evaluating
them in a n informal listening situation.
The p r e s e n t r u l e s y s t e m i s only
tentative and obviously insufficient in many r e s p e c t s .
However, it i s in-
tended a s a b a s i s f o r a d e s c r i p t i o n that could be t e s t e d against new c l a s s e s
of analytical d a t a and it could a l s o be subject t o m o r e f o r m a l p e r c e p t u a l
evaluations.
T h i s will s u r e l y u r g e modifications and expansions but s t i l l
we feel that a t t h i s r a t h e r p r e l i m i n a r y stage some important g e n e r a l i z a tions have a l l the s a m e been captured.
References
(1)
R . G a r l s o n , B, G r a n s t r o m , B. Lindblom, and K. Rapp: "Some
timing and fundamental frequency c h a r a c t e r i s t i c s of Swedish
sentences: d a t a , r u l e s , and a perceptual evaluationft,
STL-QPSR 4/1972, pp. 11-19.
(2)
B. Lindblom and K. Rapp: ''Some t e m p o r a l r e g u l a r i t i e s of spoken
Swedish", p a p e r p r e sented a t the Symposium on Auditory
Analysis and P e r c e p t i o n of Speech, Leningrad, Aug. 2 1-23,
197 3.
Y. E r i k s o n and M. A l s t e r m a r k : "Fundamental frequency c o r r e l a t e s
of the grave word accent in Swedish: the effect of vowel d u r a tion'', STL-QPSR 2 -3/1972, pp. 53 -60.
(3)
(4)
E . Gdrding and P. Lindblad: "Constancy and variation in Svredish
word accent patterns'" Working P a p e r s 7, 1973. Phonetics
L a b o r a t o r y , University of Lund.
(5)
N. D
.
Svetozarova: "The inner s t r u c t u r e of intonation contours in
Russian", p a p e r p r e s e n t e d a t the Symposium on Auditory Analysis
and P e r c e p t i o n of Speech, Leningrad, Aug. 2 1-23, 1973.
-
stressed vowels
unstressed vowels
L
time
Fig. 111-B-4.
Synthesized fundamental frequency contours showing two examples of e m p h a s i s .
Emphasized syllables a r e shown in the s t r u c t u r a l analysis by thick lines.
STL-QPSR 2 -3/1973
IV.
A.
PSYCHOACOUSTICS
DETERMINATION OF DIFFERENCE LIMEN AT LOW FREQUENCIES
S u m m a r y of t h e s i s work for "civilingenjorsexamenfI
A. Askenfelt
Abstract
Difference l i m e n for frequency w a s d e t e r m i n e d in the region 40-200 Hz.
T h r e e types of stimuli w e r e u s e d ; double - b a s s synthesis, low-pass f i l t e r e d pulse t r a i n , and p u r e tones. Subjects listened to p a i r s of tones and
adjusted the frequency of the second tone until they had matched the pitches
to their own satisfaction. The standard deviation of the settings w a s used
a s m e a s u r e of DL.
The obtained DL' s a r e a tenth of the often quoted data of Shower &
Biddulph and Zwicker & Feldtkeller but in close a g r e e m e n t with the data
of J . N o r d m a r k (1968). P u r e tone s gave n e a r l y twice a s high value s of
D L ' S a s the complex tones.
1.
Introduction
It h a s been r e a l i z e d for a long t i m e that data on pitch d i s c r i m i n a t i o n
a r e of relevance not only to psychoacoustics but a l s o to musicology (1)
.
Since the end of the 19th century a number of investigations of difference
lirnen (DL) for frequency h a s been c a r r i e d out ( 2 , 3 , 4 1 5 9 6 2 7 ) . ~h~ results
of some of the m o r e important w o r k s a r e shown in Fig. IV-A-1.
A s seen
in the figure t h e r e i s a considerable d e g r e e of d i s a g r e e m e n t , which in
p a r t i s probably due to differing methods of m e a s u r e m e n t .
A l s o , it ap-
p e a r s to a m u s i c i a n that some of the values r e p o r t e d a r e strikingly high.
F o r e x a m p l e , the c l a s s i c a l work of Shower & ~ i d d u l ~ h gives
' ~ ) a D L of
3 Hz a t a frequency of 60 Hz.
This c o r r e s p o n d s to 8 5 c e n t s , a l m o s t a
semitone (1 semitone = 100 cents). All but one of these r e s u l t s a r e der i v e d f r o m stimuli consisting of p u r e sinusoids.
No a t t e m p t h a s e a r l i e r
been m a d e to m e a s u r e D L for frequency with m o r e " n a t u r a l f f sounds.
The purpose of this work i s t o d e t e r m i n e D L f o r low f r e q u e n c i e s using
stimuli s i m i l a r to the tones f r o m a m u s i c a l b a s s instrument.
2.
Stimuli
The double - b a s s w a s chosen to r e p r e s e n t the b a s s i n s t r u m e n t s .
Six
tones in the frequency region 40-200 Hz w e r e r e c o r d e d in a n anechoic
r o o m and analyzed by a n audio-frequency spectrograph.
t r u m i s shown in Fig. IV-A-2.
A typical s p e c -
The tones w e r e synthesized according to