View PDF - CiteSeerX

FROM PROSODIC STRUCTURE TO INTONATION CONTOURS
Merle Horne and Marcus Filipsson
Dept. of Linguistics and Phonetics, Lund University, Helgonabacken 12, S-223 62 Lund,
Sweden
ABSTRACT
The design of a set of rules for generating Swedish intonation in a text-to-speech system
containing a linguistic preprocessor is presented.
1. INTRODUCTION
In a number of previous articles, we have presented the structure of the components in a
linguistic preprocessor to a Swedish text-to-speech system which has been developed
within the HSFR/NUTEK speech technology project ‘Intonation in Restricted Texts:
Modelling and Synthesis’.
The first component developed was a referent tracker (see Figure 1). This is a crucial
component for any text-to-speech system since generation of natural prosody requires
knowledge about information structure. The referent tracker recognizes contextual
coreference or cospecification relations between content words based on
pragmatic/situational givenness, morphological identity and lexical semantic identity-ofsense relations (hyponymy, synonymy, meronymy/partonymy) holding between the
words [1]. These identity-of-sense relations are modelled in a computerized lexicon [2]
and the tracking procedure works within an adjustable text window. In the output of the
referent tracker, all lexical words are specified as either contextually ‘New’ (N) or
‘Given’ (G). This information can then be used in the F0 generating component in order
to appropriately assign ‘sentence stresses’ or ‘focal accents’. Thus, for example, instead
of assigning the most prominent sentence stress to the last content word in a clause as has
generally been the case in existing text-to-speech systems, one can instead formulate a
more adequate rule which assigns the most prominent focal accent (sentence stress) to the
last New content word. Consequently, by having a referent-tracker in the lingustic
preprocessor, one can better model discourse parameters such as the New/Given
distinction which governs accent assignment to a great extent.
Another goal of the project has been to generate an abstract prosodic structure which can
be used in the text-to-speech system in order to better model prosodic boundary signalling
and the association of accents/tones with the segmental string in general (see Figure 2). In
[3-4], a prosodic structure is proposed containing three hierarchically ordered levels: the
Prosodic Word (PW), the Prosodic Phrase (PPh) and the Prosodic Utterance (PU). The
PPh is the central constituent on the basis of which the other constituents are defined.
Prosodically, it is characterized by a boundary tone (H% or L%), a degree of Final
Lengthening and a Silent Interval (breath pause) [5-6]. It corresponds syntactically very
often with the clause; for example, in a radio broadcast corpus containing 724 clauses,
where clauses also included elliptical clauses, 499 or 69% of these ended in a PPh
boundary. Syllable count also plays a role in determining the position of PPh boundaries:
a number of clauses can be grouped together in a PPh if they consist of a limited number
of syllables (e.g. elliptic clauses). In our material, where the speech rate is on the average
of about 5 syllables/second, PPh’s contained between 7 and 63 syllables, with a mean at
24 syllables (SD=10.3).
MORPH DECOMPOSITION
REFERENT TRACKING
60-WORD WINDOW
word
1
word
2
word
3
word . . .
4
Next Word
MARK AS
‘NEW’
–
STEM
–
– SYNONYM
IDENTITY
Pragmatically
with
with
previous
previous
Given
word
word
?
?
?
+
+
+
MARK
AS
‘GIVEN’
MARK
AS
‘GIVEN’
MARK
AS
‘GIVEN’
SUPER– ORDINATE
term w.r.t
previous
word
?
+
MARK
AS
‘GIVEN’
Figure 1. Design of the Referent Tracker which identifies Givenness on the basis of
pragmatic (contextual) givenness, morphological identity, as well as lexical semantic
identity of sense relationships (synonymy, hyponymy, and meronymy/partonymy).
PU
PPh
PW
Pause
F.L.
PPh
{H%
L%}
PW
Pause
F.L.
(Function Word) 0 Content Word (Function Word)
0
{H#
L#}
Hz
Figure 2. Prosodic hierarchy used in modelling intonation in our data-base (read radio
speech) showing constituents and associated acoustic correlates: H# and L# are Prosodic
Word boundary tones, H% and L% are Prosodic Phrase boundary tones, F.L. = Final
Lengthening.
PPh PPh
PPh PPh
350
H%
t
300
n
P
P
f
e
A me
A
e
250
da s
oc
U
U
n
r
e
S
S
p
m
200
E
E
e
x
f a
pu
t i
m
å a vä l r
ot r
ll
n d
t io
n k t er
ers x a
ti l
i t
150
l
L%
100
0
500
1000 1500 2000 2500 3000 3500 4000 4500 ms
Figure 3. F0 curve showing internal PPh boundaries for a fragment of the sentence
‘Tolvmånaders statsskuldväxlar låg stilla på 10,83 procent, medan 6-månadersväxlar fallit
5 punkter till 10,77 procent ‘Twelve-month state debt bonds remained stable at 10.83
percent, while 6-month bonds fell 5 points to 10.77 percent’. Numbers written in bold are
focussed.
Elliptical clauses containing less than 7 syllables are assigned a weaker boundary and
grouped together with another PPh. Linking of clauses was observed to take place in our
data if the first of two clauses contained less than 30 syllables and the resulting structure
did not contain on the average of more than 40 syllables. A further factor that governs
PPh boundary assignment is the number of focussed constituents in a clause. For
example, an optional PPh boundary was observed in our stock market data (radio speech)
between two focussed (new) Predicate Complements (e.g. a Direct Object and a
prepositional phrase functioning as an Adjunct). An example of this is shown in Figure 3.
PW
350 k
300
PW
u
r
PW
H
-
Hz
l
250
H#
a
s
n
e
200
r
L
150
100
PW
H*
0
l
f
un d
H*
e
L- r den i
L#
ö
n l
L*
L
H
d an
e
d
e
ha
L
n de l
L#
500
1500
2000
ms
1000
Figure 4. F0 curve showing PW-boundaries for a fragment of the sentence Kurserna föll
under den inledande handel på dagens Stockholmsbörs ‘Rates fell during the opening
trade at (today’s) Stockholm’s stock-market’. Words written in bold are focussed.
Within the PPhs, PWs are defined. Lexically, these are composed of a Content Word
(CW) followed by any Function Words (FW) up to the next Content Word. The Prosodic
Word is a rhythmical unit and is characterized by a word accent and a boundary tone
which is H# if the word does not have a focal accent and L# if the word does have a focal
accent. These boundary tones function to create the transitions between word accents.
Figure 4 presents an example of a fragment of a sentence analysed into PW’s. One or
more PPh’s (which consist of one or more PW’s) constitute a PU which corresponds
textually to a paragraph and correlates with a shift in discourse topic. Prosodically, PU
boundaries are marked by a greater degree of Final Lengthening and Silent Interval
duration than those associated with PPhs. The beginnings of PU’s are also characterized
acoustically by a high F0 level on the first accented word. This seems to be a general
characteristic of discourse intonation [7]; however, we have not as yet made any detailed
study of this and other ‘left-edge’ boundary markers in our data.
2. IMPLEMENTING THE PROSODIC STRUCTURE
Using the information obtained from the referent tracker and the prosodic parser, we are
currently involved in developing a rule component for generating intonation contours. A
set of rules associate the underlying prosodic structure with acoustic parameters (F0,
duration (Final Lengthening), Silent Intervals) has now partially been implemented.
Information concerning syllable boundaries, word accent type and stress which is also
needed in the formulation of the rules is extracted from the lexicon [2].
The rules make reference to the New/Given (coreferential) status of words when the word
accent form is assigned. For example, if a PPh-final word is associated with the label
N(ew) and is marked as Accent 1 in the lexicon, it will be realized phonetically with one
of the tonal patterns HL*H-, L*H- or H- depending on the number of syllables it
contains: if it contains a prestress syllable, it will be associated with all three tones in the
representation HL*H-, where the first H is associated with a point in the prestress
syllable, the L* with the beginning of the stressed Vowel, and the final H- with some
point in the post-stress syllable as in (1a). If there is no prestress syllable, the word will be
associated with the two rightmost tones L*H- in (1b), and if the word is a monosyllable,
then priority is given to the focal H- which is the only underlying tonal component
realized PPh-finally in the read speech being modelled (1c). Tones not associated with (or
‘delinked’ from) segments are not realized phonetically following conventions within
autosegmental phonology [8]. The rules thus generate a number of contextual variants of
the underlying word accent representations as assumed in [9].
1)
a) H
L* H -
ten.den.sen
‘the tendency’
[Accent 1]
[New]
b) H L*
H
-
poj.ken
‘the boy’
[Accent 1]
[New]
c) H L* H svag
‘weak’
[Accent 1]
[New]
The New/Given status of words also conditions the amount of accentual prominence
assigned to a word. A New word, for example, is assigned more prominence in terms of
F0 peak height in relation to the baseline than a contextually Given word. Function words
are further assigned less prominence than Given content words. Furthermore, there are
rules for downstepping of word accents after the last focal accent in a PPh (see [10] for a
discussion of this phenomenon in Swedish). These rules for downstepping must also
make reference to the New/Given status of words.
Reference to prosodic structure is needed when formulating the F0-rules first of all in
order to determine the location of the PW-boundary tones (H# or L#) which constitute the
transitions to a following word accent. Reference to prosodic structure is further needed
in order to be able to model how speakers chunk up speech into PPh’s and PU’s.
2.1. Methodology
As speech data for rule implementation purposes, we have used recorded speech in order
to obtain an optimal segmental quality. We followed this methodology in order to
simulate as well as possible a future stage where the F0 rules are part of a concatenative
synthesis system. After some practice, a male speaker (the second author) was able to
produce utterances with a more or less flat F0 contour, and with minimal variation in
intensity and duration (using a monotonous, robot-like speaking style). This was desirable
in order to be able to test the effect of moving the location of accents while avoiding
secondary influences from duration differences. In Figure 5 an example of the
‘deaccented’ speech is illustrated. Intonation contours were then generated from this input
data using an implementation of the PSOLA technique [11].
A system for creating F0-files from the pre-recorded sentences and the associated
prosodic structure was further developed. The pre-recorded original sentences were
manually labeled with (1) an orthographic transcription of each word, (2) syllable
boundaries and (3) vowel onset time for the primary stressed syllable. If the last word
Figure 5. A labeled example of a ‘deaccented’ reading of the sentence Den röda bilen är
min favorit ‘The red car is my favorite’.
ended in (a) voiceless sound(s), a label for the (4) actual end of voicing was added in
order to be able to correctly time the end of the phrase accent in the voiced portion of the
final syllable.
The system creates an ‘ordinary’ text file by extracting words from the label file with
orthographic transcriptions. This text file is then analysed by the referent tracker and
prosodic parser [3-4]. The parser looks up each word in the lexicon and constructs a
prosodic structure for a sentence in terms of PW’s, PPh’s and PU’s. The labels ‘New’ or
‘Given’ are also assigned to each word based on the referent tracker [1]. From the lexicon
the system also derives word accent type and the number of syllables for each word. A
check is made that the lexically defined number of syllables is equal to the number of
syllables derived from the manual labeling. An example of this lexical information is
given in Figure 6:
den 1 /d'än/ {pron} <den{pron}>
röda 2 /r"öd2 a/ {a23 } <röd{a23 }, a2e, 1>
bilen 1 /b'Ilën/ {s2 3 } <bil{s2 3 }, s2q, 1>
är 1 /'Ær/ {v pres} <är{v pres}>
min 1 /m'in/ {pron} <min{pron}>
favorit 1 /favor'It/ {s3} <favorit{s3}>
Figure 6. An example of the entries from Hedelin et al.’s lexicon [2] for the words in the
utterance in Figure 5 showing accent type and word class/inflexion class information.
The system contains a large set of rules for transforming a linguistically parsed sentence
into a label file consisting of a sequence of F0 values. These rules have as their domain
the Prosodic Phrase. Within the Prosodic Phrase, different sets of rules apply to a number
of positions: 1. Final Focal Prosodic Word, 2. Post-Focal Prosodic Words
(Downstepping), 3. Prefocal Prosodic Words.
Within each PW different rules are applied depending on the number of Function Words,
as well as the number of syllables before, after and within the Content Word. Durations
and timing are taken from the syllable boundaries and vowel onset times which are
manually labeled at this stage. Thus, a sequence of Time-F0 pairs are constructed. In the
top label tier in Figure 8, just below the F0 contours, is an example of such a sequence of
F0 values expressed in Hertz. The F0 values in the current rules are representative of a
male voice, with a range extending between 75 and 150 Hertz and with a floor at 100
Hertz. The prominence levels used are thus extrapolations from the data-base which is
spoken by a female whose F0 range is of course much higher. We have attempted,
however, to make the F0 levels realizing word accents and boundary tones for the male
speaker the same as the levels used by the female expressed in percent of the distance
between the bottom and top of the speaker-range.
The final part of the system takes this sequence of F0 values, interpolates them linearly
and produces an F0 file which is then used as input to the re-synthesis algorithm.
Examples of two such F0 contours can be found in Figure 8. Figure 7 shows an example
of a prosodically parsed sentence from which the lower F0 curve in Figure 8 is derived.
-- -- [ PU
-- -- [ PPh
-- -- [ PW
den Gw DT FW
röda N JJ CW
-- -- ] PW
-- -- [ PW
bilen Gi2 NN CW
är N VA FW
min Gw PS FW
-- -- ] PW
-- -- [ PW
favorit Gs5 NN CW
-- -- ] PW
-- -- ] PPh
. Gw SENT SENT
-- -- ] PU
# Gw PARA PARA
Figure 7. Output from the referent-tracker and prosodic parser for the sentence Den röda
bilen är min favorit ‘The red car is my favourite’ where röda is the final N(ew) content
word. Gw, Gi, and Gs represent Given(ness) due to lexical word class, morphological
identity and synonymy, respectively. DT= Determiner, JJ=Adjective, NN=Noun,
VA=Aux. Verb, PS=Possessive pronoun. FW stands for ‘Function Word’ and CW for
‘Content Word’.
2.2. Sample derivation. In order to give the reader some idea of how the rules function
in generating F0 contours, we will show the steps that are needed in order to derive the
lower F0 curve in Figure 8, reproduced below as Figure 9. The encircled numbers at
different points on the curve show the order in which the F0 values given in the rules are
associated with these points. The relevant rules are found in the flow-diagrams in Figures
10-11 which summarize the rules for PPh-final accent 2 (focal) position and postfocal
position, respectively. The rules in Figures 10-11 used to derive the F0 curve in Figure 9
are indicated in the flow-diagrams by the same numbers as those found on the curve.
The first value assigned by the rules in Figure 10 is that associated with the L% at the end
of the PPh. L% is assigned a level of 75 Hz according to rule (1). The algorithm then
proceeds to the last PW in the PPh which contains a [+new] CW (Content Word). This
corresponds to den röda ‘the red’ in the example sentence. The rules then check to see if
there are any FW’s (Function Words) before the CW. In this case, there is one. Since den
is an Accent 1 word, it is assigned a L* at the beginning of the syllable at 100 Hz (2) and
a H at the end of the (stressed) syllable at 110 Hz (3). The algorithm then asks if there are
more FW’s. In this case, there are none, so that the algorithm proceeds to ask if the focal
word has Accent 2. Since röda is indeed an accent 2 word, the rules check to see if there
are any post-secondary stress syllables in the word. Since there are none in röda, the focal
H- is assigned to the end of the secondary stressed syllable -da at 140 Hz (4). The
algorithm then proceeds to point (5) which assigns the word accent H* at the beginning of
the stressed vowel. Subsequently, the rule at point (6) assigns a L at the end of the
stressed vowel at 100 Hz. The algorithm in Figure 11 then assigns F0 values to the postfocal material.
Figure 8. F0 contours generated for the sentence Den röda bilen är min favorit ‘The red
car is my favorite ’. The top version has focal accents on röda ‘red’ as well as on the last
content word favorit ‘favorite’. This differs from the lower version where the last focal
accent does not fall on favorit but rather on röda‘red’. At the bottom are label tiers for
PW’s, PPh’s, syllable boundaries (s), vowel onset (v), and the automatically generated
sequence of F0 values (for the lower contour).
4
2
3
5
7
6
8
9
1
L%
PW
PPh
Figure 9. F0 curve for the utterance Den röda bilen är min favorit. Encircled numbers
represent the order in which the F0 values (indicated on the tier immediately below the F0
curve) associated with these positions are assigned by the F0 generating rules. These rules
are shown in Figures 10-11.
START
+
1
PHRASE-FINAL
FOCAL
ACCENT 2
Sentence 1: For each PPh in sentence
-
Is it the last PPh in the sentence?
Assign a L% at end of PPh (75Hz)
Assign a L% at end of PPh (90Hz)
Go to last PW in current PPh which contains a +new CW
+
2
L* at beg. of stressed syll (100 Hz)
H* at beg. of stressed syll. (110 Hz)
L at end of stressed syll. (100 Hz)
3 H at end of stressed syll. (110 Hz)
+
Go to next FW
-
Any FWÕs before CW?
+
Ð
FW Acc. 1?
More FW?
Ð
Is word Accent 2
+
Any post-2-ary stress syllables in word?
-
Are there any following PWs in current PPh?
4
GO TO
PHRASE_
FINAL
ACCENT
1 RULES
+
Are there any following PWs in current PPh?
+
H-goes in Mid.
of 2ary Stressed
syll. (130 Hz)
-
+
H -goes at End of 2ary
Stressed syll. (140 Hz)
-
L- goes at end of
post 2-ary stressed
syll. (110 Hz)
L goes at end of post
2-ary stress syll. (120Hz)
L- goes at End of 2-ary
Stressed Syll. 75Hz
L# also goes at end of PW (75Hz)
Ð
Go to 1st PW in current PPh
H goes at end of
2-ary stressed
Syll. (130 Hz)
-
H goes at end of
2-ary stressed Syll.
(140 Hz)
L# goes at end of current PW- 100 Hz
L# goes at end of
PW (75Hz)
5
H* goes at Beg. of stressed V-120Hz
6
L goes at end of stressed syll. (100 Hz) and
spreads to Beg. of 2ary-stressed Syll.
Are there any following PW’s in current PPh+?
+
DOWNSTEPPING
Û (Figure 11)
Figure 10. Rules for deriving the final focal Prosodic Word containing an Accent 2-word.
Values associated with points 1-6 in Figure 9 are assigned by rules in this set. Remaining
values are assigned by rules in Figure 11.
The algorithm in Figure 11 first calculates the number of stressed syllables from the end
of the current PW up to the end of the current PPh. This corresponds to 2 (on bílen and
favorít). The algorithm then goes to the first stressed syllable bi- and then asks if it is in
an Accent 2 word. Since it is not, the rules then ask if there any prestress syllables. Since
there are none in bilen, the algorithm then asks if bi- is the first of the post-focal stressed
syllables (in order to calculate the amount of downstepping). Since it is, and since it is not
the last stressed syllable, it is assigned a L* at the end of the stressed syllable at 100 Hz
(7). The algorithm then proceeds to the next and last stressed syllable, i.e. -it in favorit.
Since it is an Accent 1 word, the beginning of the prestress syllable -vo-, is assigned a H
with the same value as the preceding value (i.e. 100 Hz). The Accent 1 L* is assigned to
the beginning of the stressed syllable at 87.5 Hz, i.e. one half of the distance between
100Hz and the bottom of the range, 75Hz (n in rule 9 stands for the number of post-focal
stressed syllables)
Û
DOWNSTEPPING
Calculate number of stressed syllables from end of current PW up
to end of current PPh
Go to 1st stressed syllable
Are there any
prestress syll's?
+
H (same value as preceding value)
assigned to Beg. of prestress syll.
8
-
Is it in an Accent 2 word?
+
H*(same value as preceding value) assigned to
Beg. of stressed Syll.
L* assigned to beginning of stressed syll
at 100Hz-(i-1/n)x25Hz) (115 Hz if at
internal PPh boundary)
L assigned to End of stressed Syll. at
100Hz-(i-1/n)x25Hz (where n=number of
stressed syllables up to end of current PPh)
(115 Hz if at internal PPh boundary)
9
Is it the 1st stressed syll?
Go to Next
Stressed Syll.
∩
-
+
Is it the last stressed Syll.?
7
+
+
L# assigned to
end of Stressed
Syll. at
75Hz
L* assigned to end of Stressed Syll. at
100Hz
Go to 1st PW in current PPh
-
More stressed syll's up to end of current PPh?
Figure 11. F0 rules for deriving coutours in Post-Focal position. Values 7-9 in Figure 9
are assigned by rules in this set.
3. CONCLUSION
The output of a linguistic preprocessor which extracts information regarding the
New/Given status of content words as well as information concerning prosodic
constituent structure has been used in the formulation of a set of rules for F0 generation
which associate word accents and prosodic constituent boundary markers with the
segmental structure. The rules developed so far are based partially on observations of our
data-base and partially on predictions about the whole system. This has been necessary
since our goal has been to formulate the rules in as general a form as possible. The rules
are still in a very preliminary stage and only generate a small set of the possible
intonation patterns in the data-base. As noted above, we have not implemented rules for
Final Lengthening either. However, the system as it stands now allows one to
automatically generate intonation contours and to vary the acoustic parameter settings.
This then makes it possible to make diagnostic tests of the system and compare the output
with more data in order to fine-tune the system. Preliminary testing of the system has
indicated that listeners do in fact prefer a system that includes a referent tracker to one
that does not. For example, when asked which of Answer 1 or Answer 2 (see below) they
prefered as an answer to the question in (2), 79% of 94 listeners asked preferred Answer
2, i.e. the variant that would be generated by a system marking röda ‘red’ as the last new
content word in the utterance (favorit ‘favorite’ is deaccented since it is synonymous
with the expression tycker bäst om ‘like best’ which occurs in the question).
2) Question: Vilken bil tycker du bäst om?
‘What car do you like best?’
Answer 1) Den röda bilen är min favorit.
‘The red car is my favorite’
Answer 2) Den röda bilen är min favorit.
‘The red car is my favorite’
On the other hand, Answer 1 with focus on the final content word was preferred by 76%
of the 94 listeners as an answer to the question in (3) requesting a clarification of
previously heard utterance. In this case, the final lexical item is assumed to constitute
New information for the listener and is thus assigned a focal accent:
3) Question: Sa du att den gröna bilen är din farbrors?
Did you say that the green car is your uncle’s?
Answer 1) Nej, jag sa att den röda bilen är min favorit.
‘No, I said that the red car is my favorite’
Answer 2) Nej, jag sa att den röda bilen är min favorit.
‘No, I said that the red car is my favorite’
It is expected that after the implementation of rules for Final Lengthening and
lengthening of focussed words that the listener results will improve. It is possible that the
F0 prominence levels should also be adjusted. Nevertheless, even at this very preliminary
stage, a marked improvement in the naturalness of intonation has been observed using a
system with a linguistic preprocessor. Figure 12 summarizes in schematic form our
system for generation of intonation.
LEXICON
LABEL FILES
*.words
*.words
*.syllables
*.vowels
*.endvoicing
Convert from
*.words label file to
a text file
Prosodic Parser
Read label files
Read prosodic
structure
RULES
Assign F0 values to points in
time, thus creating a sequence of
time-F0 pairs.
Interpolate and
create an F0 file.
Resynthesize original
waveform with new
F0 using PSOLA
Figure 12. Design of system for extracting linguistic information from texts and
generating intonation contours.
ACKNOWLEDGEMENTS
This research was supported by a grant from the Swedish Language Technology Program
(HSFR-NUTEK).
REFERENCES
1.
2.
3.
4.
Horne, M., Filipsson, M., Ljungqvist, M.& Lindström, A. ‘‘Referent tracking in
restricted texts using a lemmatized lexicon: implications for generation of
prosody,’’ Proceedings Eurospeech ’93 (Berlin) Vol. 3: 2011-2014, 1993.
Hedelin, P., Jonsson, A. & Lindblad, P. ‘‘Svenskt uttalslexikon: 3 ed. Tech. Report’’.
Chalmer’s Univ. of Technology, Gothenburg, 1987.
Horne, M. & Filipsson, M. ‘‘Generating prosodic structure for Swedish text-tospeech,’’ Proceedings ICSLP 94 (Yokohama), Vol. 2: 711-714, 1994.
Horne, M. & Filipsson, M. ‘‘Computational extraction of lexico-grammatical
information for generation of Swedish intonation’’. Progress in speech synthesis,
ed. J. van Santen, R. Sproat, J. Olive, & J. Hirschberg. Springer, New York, 443457, 1996.
5. Horne, M. & Filipsson, M. ‘‘Computational modelling and generation of prosodic
structure in Swedish,’’ Proceedings XIIIth ICPhS, Stockholm, Vol. 4: 364-367,
1995.
6. Horne, M., Strangert, E. & Heldner, M. ‘‘Prosodic boundary strength in Swedish:
Final Lengthening and Silent Interval duration,’’ Proc. XIIIth ICPhS, Stockholm,
Vol. 1: 170-173, 1995.
7. Grosz, B. & Hirschberg, J. ‘‘Some intonational characteristics of discourse
structure,’’ Proceedings ICSLP92, Vol. 1:429-432, 1992.
8. Goldsmith, J. Autosegmental and metrical phonology. Blackwell, Oxford, 1990.
9. Bruce, G. Swedish word accents in sentence perspective, Gleerups, Lund, 1977.
10. Bruce, G. ‘‘Developing the Swedish intonation model,’’ Working Papers 22 (Dept.
of Ling., U. of Lund): 51-116, 1982.
11. Möhler, G. & Dogil, G. ‘‘Test environment for the two level model of Germanic
prominence,’’ Proceedings Eurospeech´95, Vol. 2: 1019-1022, 1995.