A model of word and sentence intonation

Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
Öhman, S.
journal: STL-QPSR
volume: 9
number: 2-3
year: 1968
pages: 006-011
http://www.speech.kth.se/qpsr
B. A MODEL OF WORD AND SENTENCE INTONATION *
S.E.G. Öhman
Many investigators have suggested that the fundamental frequency contours of utterances can be divided into two components, namely, word intonation and sentence intonation, approximately as shown in Fig. I-B-1. The rightmost box is supposed to generate the fundamental frequency contour, f0(t), as a result of three major factors, namely, the vocal cord tension (here denoted by g(t)), articulatory interactions (indicated above the box), and acoustic interactions (indicated below the box). The articulatory interactions result from the secondary effects on the vibrations of the vocal cords which are associated with the production of high vowels, voiceless consonants, and glottal stops. The acoustic interactions result from the secondary fluctuations in the supra- and subglottal pressures which are due to the varying degrees of closure of the mouth. The vocal cord tension, g(t), is supposed to be the sum of two signals, namely, a sentence intonation component, gs(t), and a word intonation component, gw(t).
In languages that have word stress, such as English and German, the box labelled "Word intonation filter" may be assumed to generate the appropriate word stress pitch pattern, and the box labelled "Sentence intonation filter" to generate the slow phrase contour on which the stress fluctuations are superimposed. In what follows, however, I shall assume that the word tones of the Scandinavian languages are controlled by the lower box in Fig. I-B-1 and that both the basic phrase contours and the superimposed stress inflections are produced by the upper box. In a more general model it is really necessary to have three inputs, one for the basic phrase contour, one for the stress inflection, and one for the word tones, but since my purpose here is to discuss the Scandinavian word tones only, I shall make the simplification just mentioned.
* Verbatim version of paper presented at the 6th ICA, Tokyo 1968.
Fig. I-B-1. Functional model of larynx control. [Block diagram; labels: sentence intonation inputs, sentence intonation filter, word intonation inputs, word intonation filter, articulatory interaction signal, acoustic interaction signal, larynx model.]
STL-QPSR 2-3/1968
Qualitative analysis of a large number of pitch patterns of real utterances suggests that the fundamental frequency signal is the response of a relatively sluggish system to a sequence of relatively simple level changing commands as indicated in the left-most part of Fig. I-B-1. The neural signals that reach the laryngeal muscles are not entirely of this simple nature but the step function model has considerable analytical advantages and seems to be quite satisfactory for the purposes of a functional description.
It may in fact be assumed that each of the control branches of our model has the simple characteristics summarized in Fig. I-B-2. That is, the command generators smooth the step function inputs just like a critically damped third-order linear filter and moreover, the analogous vocal-cord tension is exponentially related to the fundamental frequency output. Unfortunately, I don't have time now to go into the experimental data on which these particular assumptions are based.
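The two assumptions of Fig. I-B-2 can be illustrated numerically. What follows is a minimal Python sketch, not the program used for this work. It assumes that the critically damped third-order filter has three coincident real poles, so that its unit-step response has the closed form 1 - exp(-x)(1 + x + x^2/2) with x = rate * t, and that the exponential relation takes the form f0 = base_hz * exp(g). The names `rate`, `base_hz`, and `commands`, and all numerical values, are hypothetical.

```python
import math

def step_response(t, rate):
    """Unit-step response of an assumed filter (rate / (s + rate))**3."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

def tension(t, commands, rate):
    """Smoothed tension g(t): superposed responses to level-changing steps.

    commands is a list of (onset_time_s, level_change) pairs.
    """
    return sum(change * step_response(t - onset, rate)
               for onset, change in commands)

def f0(t, commands, rate, base_hz):
    """Assumed exponential mapping from tension to fundamental frequency."""
    return base_hz * math.exp(tension(t, commands, rate))

# Hypothetical values, for illustration only.
commands = [(0.10, 0.30), (0.60, -0.40)]  # a rise followed by a larger fall
rate = 20.0                               # filter rate constant in 1/s (assumption)
base_hz = 100.0                           # resting pitch in Hz (assumption)

for ms in range(0, 1001, 100):
    t = ms / 1000.0
    print(f"{ms:5d} ms  {f0(t, commands, rate, base_hz):7.2f} Hz")
```

Because the filter is linear, the response to a sequence of steps is simply the sum of the shifted single-step responses; the exponential mapping is applied only afterwards.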
The model developed so far can be criticized on the grounds that most functions of relevance to speech production can be synthesized by means of smoothed step function sequences. It is clear, moreover, that only a small number of steps are necessary in order to reproduce the Scandinavian accents. This is illustrated in Fig. I-B-3.
Here we see the pitch contour of the utterance [sɛja mànnen ijɛn] as produced by a speaker of the Stockholm dialect. The word [mànnen] has the grave tone or accent. With the acute accent it would sound [mánnen]. No Scandinavian dialect has more than two tones.
The filled circles represent the period-by-period pitch measurements and the solid line is the smoothed response of the model to the step function input shown at the bottom. The vertical lines indicate acoustic segment boundaries.
At the beginning, the curve shows a small inflection which probably is due to the unstressed and hence reduced grave accent of the frame word [sɛja]. Then follows the falling-rising pitch pattern typical of the unreduced grave accent. The falling end contour of the phrase is visible in the rightmost part of the graph.
Fig. I-B-2. Detailed specification of one of the channels shown in Fig. I-B-1. [Diagram: functional model of laryngeal control in intonation; stages: neuro-motor command, vocal cord "tension", fundamental frequency.]
Although the synthesized curve matches the data well and although only a few commands are used for the synthesis, the step function input is not particularly revealing from the phonetic point of view. All we learn, essentially, is that it is all right to set the filter constants in such a way that the f0 output reaches 90 % of the target pitch level in about 250 msec in response to a single step input. This can be seen more clearly in Fig. I-B-4 which shows an utterance having the other Stockholm tonal accent, the acute one. It sounds like [sɛja mánnen ijɛn].
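The 90 % figure fixes the filter constant once a filter shape has been chosen. As a rough check, and assuming (this is not stated in the text) a filter with three coincident real poles whose unit-step response is 1 - exp(-x)(1 + x + x^2/2) with x = rate * t, the Python sketch below solves for the value of x at which the response reaches 0.9 and converts the 250 msec rise time into a rate constant. The function and variable names are hypothetical.

```python
import math

def normalized_step(x):
    """Step response of the assumed triple-pole filter at x = rate * t."""
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

def solve_crossing(level, lo=0.0, hi=50.0, iterations=80):
    """Bisection for the x at which the monotonic response reaches level."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if normalized_step(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rise_time_s = 0.250            # from the text: about 250 msec to reach 90 %
x90 = solve_crossing(0.90)     # dimensionless 90 % crossing point
rate = x90 / rise_time_s       # implied rate constant in 1/s

print(f"x90 = {x90:.3f}, rate = {rate:.1f} 1/s, 1/rate = {1000.0 / rate:.0f} ms")
```

Under these assumptions the crossing lies near x = 5.3, which corresponds to a rate of roughly 21 per second, or a time constant of a little under 50 msec for each of the three poles. A different filter shape would give a different value.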
The step function input is much simpler here than in the previous case, but on the other hand, the model response does not reproduce the data equally well. Note, in particular, the small pitch deflection during the [m] which is very systematic and cannot be explained by articulatory or acoustic influence from the consonant. It is of course possible to make up for this mismatch by adding a few more steps at the input, but this would only reduce the possibility of interpreting the input pattern in phonetic terms. Evidently, the present model allows us too much freedom and we need some more general principle by which the library of admissible inputs can be defined and constrained so that ad hoc introductions of step commands can be avoided. The aim of the work reported on in this paper was to look for constraining criteria of this sort among the Scandinavian accent patterns.
Fig. I-B-5 gives a summary of the acute and grave pitch contours of one hundred Scandinavian dialects as measured by the German phonetician Edvard Meyer several decades ago. Each subgraph presents from left to right: the name of the dialect, the acute pattern, and the grave pattern. As you can see, very many varieties exist. In Danish, for instance, the acute accent is a glottal stop in the middle of the vowel, and on the island of Gotland, as well as in several Dalecarlian dialects, the two accents - though completely distinct to the native speakers - are closely similar. They may sound like: acute accent, [sɛja mánnen ijɛn] versus grave [sɛja mànnen ijɛn].
Fig. I-B-5. Schematic acute and grave accent patterns of a hundred Scandinavian dialects according to E. A. Meyer: Die Intonation im Schwedischen, part II. [Axis: fundamental frequency (cps); curves A & B, curve C.]
Now, to get on with the analysis, the following assumption is introduced. For every stressed word a single step of positive amplitude is entered at the input of the sentence intonation filter. This step is made to start at the beginning of the first consonant of the stressed syllable (see Fig. I-B-6).
The consequences of this assumption are seen in the left part of Fig. I-B-6. The wriggled curves are measured data and the smooth curves are calculated model responses. The upper display represents the acute word [mó:nen] embedded in a sentence frame, [dɛ va mó:nen ja sa], and the lower display corresponds to the grave word [mò:nen] in [dɛ va mò:nen ja sa]. For technical reasons the pitch curve goes to zero during the voiceless consonant [s] where, in fact, it is undefined.
A positive going sentence intonation step representing the stress according to our previous assumption is introduced at the beginning of the [m] and a negative step representing the end contour is introduced later in the sentence, both in the acute and in the grave cases. The curves marked with the letter E show the difference between the calculated and the measured contours.
Note that the error curves have a negative dip both in the acute and in the grave cases. If our assumption that stress is a positive sentence intonation step is correct, then these dips must be the word intonation components of the two patterns, since they represent the residue after elimination of the sentence intonation component.
The right part of the figure shows the result of introducing an appropriately shaped negative pulse at the input of the word intonation filter. Almost perfect matches are obtained if this pulse is made to occur early in the acute case and late in the grave case. From the point of view of the present descriptive model, these negative pulses are the Stockholm tonal accents.
The proposal shown in Fig. I-B-7 therefore suggests itself. The Stockholm tonal accents can be synthesized with a sentence intonation step and a word intonation pulse which are coarticulated in the appropriate manner. The parameters of this model are: the amplitude of the step, marked A; the depth of the pulse, marked B; the duration of the pulse, marked D; and the timing of the pulse, marked t2. Furthermore, the possibly different time constants of the sentence intonation filter and the word intonation filter, to be denoted by α and β, respectively, are also of relevance.
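The parameters A, B, D, t2, α and β can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions, not the program used for the figures: the word intonation pulse is taken to be rectangular, both filters are taken to have three coincident real poles, α and β are treated as rate constants in 1/s, the mapping to f0 is taken as base_hz * exp(g), and the negative end-contour step is left out. All numerical values, and the names `t1` (onset of the stress step) and `base_hz`, are hypothetical.

```python
import math

def step_response(t, rate):
    """Unit-step response of the assumed filter (rate / (s + rate))**3."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

def sentence_component(t, A, t1, alpha):
    """gs(t): response to a positive stress step of amplitude A at t1."""
    return A * step_response(t - t1, alpha)

def word_component(t, B, D, t2, beta):
    """gw(t): response to a negative rectangular pulse of depth B and
    duration D starting at t2 (a step down followed by a step back up)."""
    return -B * (step_response(t - t2, beta) - step_response(t - t2 - D, beta))

def accent_f0(t, A, t1, alpha, B, D, t2, beta, base_hz):
    """f0(t) from g(t) = gs(t) + gw(t), with an assumed exponential mapping."""
    g = sentence_component(t, A, t1, alpha) + word_component(t, B, D, t2, beta)
    return base_hz * math.exp(g)

# Hypothetical settings: identical step, pulse early (acute) or late (grave).
common = dict(A=0.4, t1=0.0, alpha=20.0, B=0.3, D=0.10, beta=30.0, base_hz=100.0)
timings = {"acute": 0.0, "grave": 0.20}

for name, t2 in timings.items():
    contour = [accent_f0(ms / 1000.0, t2=t2, **common) for ms in range(0, 601, 50)]
    print(name, " ".join(f"{hz:6.1f}" for hz in contour))
```

In this sketch the two accents differ only in t2; every other parameter is shared, which mirrors the claim that the contrast lies in the relative timing of the step and the pulse.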
Fig. I-B-6. Comparison of Stockholm accent patterns with curves calculated by means of intonation model. The pulses marked I, IS, and IW represent model outputs with the same input commands that were used to match the empirical data but with the model constants α and β both set to 1000. [Panels: acute accent, Stockholm; grave accent, Stockholm. Vertical axis: Hz; horizontal axis: time, 0 to 1.0 sec.]
Fig. I-B-7. Input commands for Swedish accents. [Diagram: sentence intonation step of amplitude A and word intonation pulse of depth B, plotted against time.]
Some of the possibilities of this model are shown in Fig. I-B-8. In the curve family marked A the depth of the word accent pulse is zero and the model responses for varying stress step amplitudes are shown. The curve family marked α is similar except that here the amplitude is fixed and the filter time constant is changed. In the curve families marked B, β, and D, the stress step amplitude is zero and the depth, time constant, and duration of the word accent pulses are systematically varied. Finally, in the curve family marked t2 a typical stress step response has been combined with a typical accent pulse response for various timings of the latter pulse.
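One property behind the D and β families can be shown with a short calculation: because the filter is sluggish, a short word accent pulse does not reach its commanded depth. The sketch below assumes, for illustration only, a rectangular pulse and a filter with three coincident real poles of rate constant `beta`; the durations and the value of `beta` are hypothetical and are not taken from Fig. I-B-8.

```python
import math

def step_response(t, rate):
    """Unit-step response of the assumed filter (rate / (s + rate))**3."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

def pulse_response(t, D, beta):
    """Response to a unit rectangular pulse lasting D seconds from t = 0."""
    return step_response(t, beta) - step_response(t - D, beta)

def peak_fraction(D, beta, dt=0.0005, t_max=2.0):
    """Largest excursion on a time grid, as a fraction of the commanded depth."""
    n = int(round(t_max / dt))
    return max(pulse_response(i * dt, D, beta) for i in range(n + 1))

beta = 30.0  # hypothetical rate constant of the word intonation filter, 1/s
for D in (0.025, 0.05, 0.10, 0.20, 0.40):
    print(f"D = {1000 * D:5.0f} ms  peak = {peak_fraction(D, beta):.2f} of depth")
```

Under these assumptions the excursion grows with D and approaches the full depth only when the pulse is long compared with the filter time constant, so that B and D are not independent in their effect on short pulses.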
I have pointed to the importance of the relative timing of the accent commands for the grave/acute tonal contrast in the Stockholm dialect (cf. t2 in Fig. I-B-8). A similar situation seems to obtain in all Scandinavian dialects. This is indicated in Fig. I-B-9.
This figure shows a rearrangement of Meyer's data. Again, each subgraph shows from left to right: the name of the dialect, the acute pattern, and the grave pattern. It will be seen here that, as we go from dialect to dialect, the acoustic reflexes of the acute and the grave accent pulses are gradually shifted, as a pair it seems, either to the right or to the left depending on which direction we follow in the orbit.
Starting in Dalarna, for instance, the grave accent first shows up as a little groove in the onset ramp of the stress pattern and this groove penetrates toward the right as we move upward toward Stockholm whereafter it moves toward the end of the word until it finally drowns - in the Baltic sea it would seem ...
On the other hand, starting once more in Dalarna, the acute accent first shows up as a groove on the tail of the pitch pattern and this groove penetrates gradually toward the left in the word as we move toward Denmark where it shows up as a glottal stop, and then it continues toward the beginning of the word until finally it loses itself among the wolves of Lapland.
Fig. I-B-8. Accent model: effects of six parameters. Model responses to input commands of Fig. I-B-7. Explanation in text.
Every Scandinavian dialect is close to one or another of these patterns. This is an interesting fact which, although not completely understood as yet, appears to support at least the gross outlines of the model summarized here.
Acknowledgments
I should like to acknowledge the valuable cooperation of my colleague, Johan Liljencrants, in the writing of some of the computer programs required for this work.
References
E. A. Meyer: Die Intonation im Schwedischen, Teil I (Stockholm 1937).
B. Malmberg: Sydsvensk ordaccent (Lund 1953).