Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

A model of word and sentence intonation
Öhman, S.

journal: STL-QPSR
volume: 9
number: 2-3
year: 1968
pages: 006-011

http://www.speech.kth.se/qpsr

B. A MODEL OF WORD AND SENTENCE INTONATION *
S.E.G. Öhman

Many investigators have suggested that the fundamental frequency contours of utterances can be divided into two components, namely, word intonation and sentence intonation, approximately as shown in Fig. I-B-1. The rightmost box is supposed to generate the fundamental frequency contour, fo(t), as a result of three major factors, namely, the vocal cord tension (here denoted by g(t)), articulatory interactions (indicated above the box), and acoustic interactions (indicated below the box). The articulatory interactions result from the secondary effects on the vibrations of the vocal cords which are associated with the production of high vowels, voiceless consonants, and glottal stops. The acoustic interactions result from the secondary fluctuations in the supra- and subglottal pressures which are due to the varying degrees of closure of the mouth.

The vocal cord tension, g(t), is supposed to be the sum of two signals, namely, a sentence intonation component, gs(t), and a word intonation component, gw(t). In languages that have word stress, such as English and German, the box labelled "Word intonation filter" may be assumed to generate the appropriate word stress pitch pattern, and the box labelled "Sentence intonation filter" to generate the slow phrase contour on which the stress fluctuations are superimposed. In what follows, however, I shall assume that the word tones of the Scandinavian languages are controlled by the lower box in Fig. I-B-1 and that both the basic phrase contours and the superimposed stress inflections are produced by the upper box.
In a more general model it is really necessary to have three inputs, one for the basic phrase contour, one for the stress inflection, and one for the word tones, but since my purpose here is to discuss the Scandinavian word tones only, I shall make the simplification just mentioned.

* Verbatim version of paper presented at the 6th ICA, Tokyo 1968.

Fig. I-B-1. Functional model of larynx control. (Block diagram. Labels: sentence intonation inputs, sentence intonation filter, sentence intonation signal, word intonation inputs, word intonation filter, word intonation signal, larynx model, articulatory interaction, acoustic interaction.)

STL-QPSR 2-3/1968 7.

Qualitative analysis of a large number of pitch patterns of real utterances suggests that the fundamental frequency signal is the response of a relatively sluggish system to a sequence of relatively simple level-changing commands, as indicated in the left-most part of Fig. I-B-1. The neural signals that reach the laryngeal muscles are not entirely of this simple nature, but the step function model has considerable analytical advantages and seems to be quite satisfactory for the purposes of a functional description. It may in fact be assumed that each of the control branches of our model has the simple characteristics summarized in Fig. I-B-2. That is, the command generators smooth the step function inputs just like a critically damped third-order linear filter and, moreover, the analogous vocal-cord tension is exponentially related to the fundamental frequency output. Unfortunately, I don't have time now to go into the experimental data on which these particular assumptions are based.

The model developed so far can be criticized on the grounds that most functions of relevance to speech production can be synthesized by means of smoothed step function sequences. It is clear, moreover, that only a small number of steps are necessary in order to reproduce the Scandinavian accents. This is illustrated in Fig. I-B-3. Here we see the pitch contour of the utterance [sɛja mànnen ijɛn] as produced by a speaker of the Stockholm dialect. The word [mannen] has the grave tone or accent. With the acute accent it would sound [mánnen]. No Scandinavian dialect has more than two tones. The filled circles represent the period-by-period pitch measurements and the solid line is the smoothed response of the model to the step function input shown at the bottom. The vertical lines indicate acoustic segment boundaries. At the beginning, the curve shows a small inflection which probably is due to the unstressed and hence reduced grave accent of the frame word [sɛja]. Then follows the falling-rising pitch pattern typical of the unreduced grave accent. The falling end contour of the phrase is visible in the rightmost part of the graph.

Fig. I-B-2. Functional model of laryngeal control in intonation. Detailed specification of one of the channels shown in Fig. I-B-1. (Signal chain: neuro-motor command, vocal cord "tension", fundamental frequency.)

Although the synthesized curve matches the data well and although only a few commands are used for the synthesis, the step function input is not particularly revealing from the phonetic point of view.
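The single channel of Fig. I-B-2 can be sketched numerically as follows. This is only an illustration under stated assumptions, not the program used for the work reported here: the critically damped third-order filter is taken to have three equal real poles (transfer function rate^3 / (s + rate)^3), and the exponential relation is taken to be fo = f_base * exp(k * g). The names step_response, tension, f0, f_base and k, and all numerical values, are hypothetical.

```python
import math

def step_response(t, rate):
    """Unit-step response of a third-order low-pass filter with three
    equal real poles, i.e. transfer function rate**3 / (s + rate)**3
    (assumed form of the critically damped filter)."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

def tension(t, steps, rate):
    """Smoothed tension g(t) for a list of (onset_time, level_change) steps."""
    return sum(dg * step_response(t - t0, rate) for t0, dg in steps)

def f0(t, steps, rate, f_base=100.0, k=1.0):
    """Assumed exponential mapping from tension to frequency:
    f0 = f_base * exp(k * g)."""
    return f_base * math.exp(k * tension(t, steps, rate))

# Hypothetical command sequence: level up at 0.1 sec, back down at 0.5 sec.
steps = [(0.1, 0.3), (0.5, -0.3)]
for ms in range(0, 1001, 100):
    t = ms / 1000.0
    print(f"{t:4.1f} sec  {f0(t, steps, rate=20.0):6.1f} Hz")
```

Because the filter is linear, the response to a sequence of steps is simply the sum of shifted and scaled copies of one step response, which is what makes the step function input convenient analytically.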
All we learn, essentially, is that it is all right to set the filter constants in such a way that the fo output reaches 90 % of the target pitch level in about 250 msec in response to a single step input. This can be seen more clearly in Fig. I-B-4, which shows an utterance having the other Stockholm tonal accent, the acute one. It sounds like [sɛja mánnen ijɛn]. The step function input is much simpler here than in the previous case but, on the other hand, the model response does not reproduce the data equally well. Note, in particular, the small pitch deflection during the [m], which is very systematic and cannot be explained by articulatory or acoustic influence from the consonant. It is of course possible to make up for this mismatch by adding a few more steps at the input, but this would only reduce the possibility of interpreting the input pattern in phonetic terms. Evidently, the present model allows us too much freedom and we need some more general principle by which the library of admissible inputs can be defined and constrained so that ad hoc introductions of step commands can be avoided. The aim of the work reported on in this paper was to look for constraining criteria of this sort among the Scandinavian accent patterns.

Fig. I-B-5 gives a summary of the acute and grave pitch contours of one hundred Scandinavian dialects as measured by the German phonetician Edvard Meyer several decades ago. Each subgraph presents from left to right: the name of the dialect, the acute pattern, and the grave pattern. As you can see, very many varieties exist. In Danish, for instance, the acute accent is a glottal stop in the middle of the vowel, and on the island of Gotland, as well as in several Dalarna dialects, the two accents - though completely distinct to the native speakers - are closely similar. They may sound like: acute accent, [sɛja mánnen ijɛn] versus grave [sɛja mànnen ijɛn].

Fig. I-B-5. Schematic acute and grave accent patterns of a hundred Scandinavian dialects according to E. A. Meyer: Die Intonation im Schwedischen, part II.

Now, to get on with the analysis, the following assumption is introduced. For every stressed word a single step of positive amplitude is entered at the input of the sentence intonation filter. This step is made to start at the beginning of the first consonant of the stressed syllable (see Fig. I-B-6).

The consequences of this assumption are seen in the left part of Fig. I-B-6. The wriggled curves are measured data and the smooth curves are calculated model responses. The upper display represents the acute word [mó:nen] embedded in a sentence frame, [dɛ va mó:nen ja sa], and the lower display corresponds to the grave word [mò:nen] in [dɛ va mò:nen ja sa]. For technical reasons the pitch curve goes to zero during the voiceless consonant [s] where, in fact, it is undefined. A positive-going sentence intonation step representing the stress according to our previous assumption is introduced at the beginning of the [m] and a negative step representing the end contour is introduced later in the sentence, both in the acute and in the grave cases.
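The sentence intonation component just described, together with the earlier remark that the output should reach 90 % of the target level in about 250 msec, can be sketched as follows. The sketch assumes a third-order filter with three equal real poles and treats the filter constant as a rate in 1/sec. The end-contour amplitude and the onset times are hypothetical values, since the text does not give them, and the function names are illustrative only.

```python
import math

def step_response(t, rate):
    """Unit-step response of rate**3 / (s + rate)**3 (assumed filter form)."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

def rate_for_rise(fraction, rise_time, lo=0.0, hi=1000.0):
    """Bisection for the filter rate at which the step response
    reaches `fraction` of its final value at `rise_time` seconds."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if step_response(rise_time, mid) < fraction:
            lo = mid      # too sluggish: need a faster filter
        else:
            hi = mid      # fast enough: try a slower filter
    return 0.5 * (lo + hi)

def sentence_component(t, stress_onset, end_onset, rate,
                       stress_amp=1.0, end_amp=1.0):
    """gs(t): positive stress step followed by a negative end-contour step."""
    return (stress_amp * step_response(t - stress_onset, rate)
            - end_amp * step_response(t - end_onset, rate))

alpha = rate_for_rise(0.90, 0.250)  # 90 % of target after 250 msec
print(f"alpha = {alpha:.1f} per sec")
for ms in (0, 250, 500, 750, 1000, 1250):
    t = ms / 1000.0
    print(f"{ms:5d} msec  gs = {sentence_component(t, 0.0, 0.8, alpha):.3f}")
```

Under these assumptions the calibration gives a rate of roughly 21 per second. Subtracting such a calculated gs(t) from a measured contour leaves a residue that can then be examined for the word intonation component.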
The curves marked with the letter E show the difference between the calculated and the measured contours. Note that the error curves have a negative dip both in the acute and in the grave cases. If our assumption that stress is a positive sentence intonation step is correct, then these dips must be the word intonation components of the two patterns, since they represent the residue after elimination of the sentence intonation component.

The right part of the figure shows the result of introducing an appropriately shaped negative pulse at the input of the word intonation filter. Almost perfect matches are obtained if this pulse is made to occur early in the acute case and late in the grave case. From the point of view of the present descriptive model, these negative pulses are the Stockholm tonal accents.

The proposal shown in Fig. I-B-7 therefore suggests itself. The Stockholm tonal accents can be synthesized with a sentence intonation step and a word intonation pulse which are coarticulated in the appropriate manner. The parameters of this model are: the amplitude of the step, marked A; the depth of the pulse, marked B; the duration of the pulse, marked D; and the timing of the pulse, marked t2. Furthermore, the possibly different time constants of the sentence intonation filter and the word intonation filter, to be denoted by α and β, respectively, are also of relevance.

Fig. I-B-6. Comparison of Stockholm accent patterns with curves calculated by means of intonation model. The pulses marked I, IS, and IW represent model outputs with the same input commands that were used to match the empirical data but with the model constants α and β both set to 1000. (Panels: acute accent, Stockholm; grave accent, Stockholm. Axes: frequency in Hz against time in sec.)
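The step-plus-pulse proposal can be sketched with the six parameters named above. The sketch rests on several assumptions that are not stated in the text: the word intonation pulse is rectangular, both filters are third-order with three equal real poles, α and β are rates in 1/sec, the stress step starts at time zero, and t2 is the pulse onset measured from that moment. All numerical values are hypothetical.

```python
import math

def step_response(t, rate):
    """Unit-step response of rate**3 / (s + rate)**3 (assumed filter form)."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

def word_component(t, B, D, t2, beta):
    """gw(t): filter response to a rectangular negative pulse of depth B
    that starts at t2 and lasts D seconds."""
    return -B * (step_response(t - t2, beta)
                 - step_response(t - t2 - D, beta))

def accent_contour(t, A, B, D, t2, alpha, beta):
    """g(t) = gs(t) + gw(t), with the stress step of amplitude A at t = 0."""
    return A * step_response(t, alpha) + word_component(t, B, D, t2, beta)

def dip_time(B, D, t2, beta, t_end=1.5, dt=0.001):
    """Time (sec) at which the word component is lowest, on a dt grid."""
    n = int(round(t_end / dt))
    times = [i * dt for i in range(n + 1)]
    return min(times, key=lambda t: word_component(t, B, D, t2, beta))

A, B, D, alpha, beta = 1.0, 0.5, 0.10, 20.0, 30.0
for label, t2 in (("early pulse", 0.05), ("late pulse", 0.30)):
    dip = dip_time(B, D, t2, beta)
    g = accent_contour(dip, A, B, D, t2, alpha, beta)
    print(f"{label}: dip at {dip:.3f} sec, g = {g:.3f}")
```

In this sketch, moving t2 from an early to a late value shifts the dip in the word component by the same amount without changing its shape, so that the two settings differ only in where the dip falls relative to the stress step response.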
Fig. I-B-7. Input commands for Swedish accents. (Panels: sentence intonation step, amplitude A; word intonation pulse, depth B. Horizontal axis: time t.)

Some of the possibilities of this model are shown in Fig. I-B-8. In the curve family marked A the depth of the word accent pulse is zero and the model responses for varying stress step amplitudes are shown. The curve family marked α is similar except that here the amplitude is fixed and the filter time constant is changed. In the curve families marked B, β, and D, the stress step amplitude is zero and the depth, time constant, and duration of the word accent pulses are systematically varied. Finally, in the curve family marked t2 a typical stress step response has been combined with a typical accent pulse response for various timings of the latter pulse.

I have pointed to the importance of the relative timing of the accent commands for the grave/acute tonal contrast in the Stockholm dialect (cf. t2 in Fig. I-B-8). A similar situation seems to obtain in all Scandinavian dialects. This is indicated in Fig. I-B-9. This figure shows a rearrangement of Meyer's data. Again, each subgraph shows from left to right: the name of the dialect, the acute pattern, and the grave pattern. It will be seen here that, as we go from dialect to dialect, the acoustic reflexes of the acute and the grave accent pulses are gradually shifted, as a pair it seems, either to the right or to the left depending on which direction we follow in the orbit.
Starting in Dalarna, for instance, the grave accent first shows up as a little groove in the onset ramp of the stress pattern and this groove penetrates toward the right as we move upward toward Stockholm, whereafter it moves toward the end of the word until it finally drowns - in the Baltic sea it would seem ... On the other hand, starting once more in Dalarna, the acute accent first shows up as a groove on the tail of the pitch pattern and this groove penetrates gradually toward the left in the word as we move toward Denmark, where it shows up as a glottal stop, and then it continues toward the beginning of the word until finally it loses itself among the wolves of Lapland.

Fig. I-B-8. Accent model: effects of six parameters. Model responses to input commands of Fig. I-B-7. Explanation in text.

Every Scandinavian dialect is close to one or another of these patterns. This is an interesting fact which, although not completely understood as yet, appears to support at least the gross outlines of the model summarized here.

Acknowledgments

I should like to acknowledge the valuable cooperation of my colleague, Johan Liljencrants, in the writing of some of the computer programs required for this work.

References

E. A. Meyer: Die Intonation im Schwedischen, Teil I (Stockholm 1937).
B. Malmberg: Sydsvensk ordaccent (Lund 1953).