WORD FORM FREQUENCY AND PHONE DURATIONS IN FINNISH INFORMAL
DIALOGUE
Mietta Lennes
Department of Speech Sciences
University of Helsinki
ABSTRACT
Finnish is a quantity language. In continuous speech, however, the absolute durations of speech sounds do not
systematically reflect phonemic length. Phonetic duration
depends on many factors, e.g., the articulation, syllable structure, position within the word, accent placement within the
utterance, speaking rate and speech style. Moreover, different speech sounds have different probabilities in speech,
which may affect their temporal properties. This problem
is addressed in the present study. Speech sound durations
measured from informal spoken dialogues are compared
with the frequencies of the corresponding word forms.
Phone durations measured from the initial syllables of frequent words are found to be generally shorter than those in
rare words.
1. INTRODUCTION
In the Finnish language, at least eight vowel phonemes and
13 consonant phonemes can be distinguished. Each of these
phonemes may occur phonologically as either long or short.
In continuous speech, however, the absolute durations of
speech sounds do not systematically reflect phonemic length.
Both classical and recent studies exist on speech sound durations and their relationships with phonological quantity in
Finnish [1, 2, among others]. Phonetic duration is known
to depend on at least the phoneme type, its position within
the moraic structure, accent placement within the utterance,
speaking rate and style. However, most of the existing studies have been performed on read-aloud or prepared speech.
The probabilities of words and phonemes in spoken
Finnish are different from those in written language. Speakers need not produce common and predictable words as
clearly as rare, informative words. The frequencies of word
forms have been shown to affect the articulatory reduction
of vowel phones within word tokens in at least Finnish,
Dutch, and Russian [3, 4]. Accentuation tends to affect segmental durations as well. Therefore, it may be expected that
the probabilities of words contribute to phone durations in
Finnish speech. The aim of this study is to investigate the
relationships of word frequency and phone durations in six
spontaneous, informal Finnish dialogues. Phone durations
of different phone classes are studied as well.
The author’s work has been funded by the Academy of Finland
(projects 53623, 53005).
2. METHODS
Six dialogues were recorded from 12 native speakers (five
females) of Finnish, aged between 20 and 30 years. All
speakers were university students and they had lived in the
capital city area of Finland (Helsinki, Espoo, or Vantaa) for
most of their lives. The participants of each dialogue knew
each other well.
2.1. Recordings
The recordings were performed in an anechoic room at the
Laboratory of Acoustics and Audio Signal Processing at
Helsinki University of Technology. The speakers were sitting two meters apart, facing in opposite directions. They
heard both their own and the other speaker’s voice through
headphones. Thus, the situation somewhat resembled a telephone conversation. The speakers were left alone in the
room for 45-60 minutes, and a few general topics were given
for them to discuss. However, they were instructed not to
force themselves to keep to these topics.
Each speaker’s voice was recorded with a high-quality
headset microphone to a separate channel of a DAT recorder.
The digital stereo signal was then transferred to a computer.
The channels were separated into two sound files of identical length.
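The channel separation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes the transferred signal is a 16-bit stereo WAV file, and the function name `split_stereo` and its arguments are hypothetical.

```python
import array
import wave


def split_stereo(stereo_path, left_path, right_path):
    """Write each channel of a 16-bit stereo WAV file to its own mono file.

    Assumption: the input is 16-bit PCM stereo; other formats are rejected.
    """
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    # Stereo samples are interleaved: left, right, left, right, ...
    channels = ((left_path, samples[0::2]), (right_path, samples[1::2]))
    for path, channel in channels:
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
```

Because both output files are cut from the same interleaved stream, they have the same number of frames, which keeps the two speakers' annotations on a shared time axis.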
2.2. Annotation
Each speaker’s utterances were first transliterated following
Finnish orthographic conventions using the Praat program
[5]. Boundaries of utterances, words, and syllables were annotated semi-automatically as separate tiers. All phones in
the recorded material from 10 speakers were automatically
segmented and labelled. Fragments of these phonetic
annotations were manually corrected and used for further
analyses.
3. RESULTS
3.1. Word form frequencies
A frequency dictionary of 2441 word forms was created
from a total of 45044 word tokens spoken by the 10 speakers
within 6 dialogues. Since morphological analyses have not
yet been completed, the frequency dictionary did in some
cases contain several structurally identical occurrences of a
word form. The word frequency distribution is shown in
figure 1.
[Figure 1: x-axis: Word forms; y-axis: log(Frequency).]
Fig. 1. The frequency distribution of 2441 orthographically
different word forms within five informal Finnish dialogues.
The most common word form ‘se’ occurred 2291 times in
the material, the rarest words only once. Word frequency
is shown as logarithmic. Vertical lines indicate the division
into six subgroups, numbered 1-6 from lowest to highest
frequency, that were used for comparing phone durations.
3.2. Phone durations
All phone duration measurements were done with the Praat
program [5]. In order to build a set of data that would best
reflect variability due to the contextual probability of words,
all single-word utterances were excluded from the phone
duration analysis. Also, utterance-initial stop consonants
were excluded, since they tend to be unusually short (the
mere bursts of stops). Moreover, all utterance-final phones
were discarded to reduce effects of pre-pausal lengthening.
Since every word token has at least one syllable and since
the word-initial syllables allow for the greatest structural
complexity in Finnish, the phone durations were investigated primarily in initial syllables. The data also contains
geminates that occurred at the first syllable border. A total of 6585 phone segments (3583 syllables, 312 utterances)
were thus analyzed.
3.2.1. Associating phone segments with phonemes
The “phonemic” structure of syllables and words was automatically derived from the orthographic transcripts of each
type of unit. This method is often used in speech technology, since the Finnish orthography closely corresponds
to phonemic structure. However, this is only a pragmatic
means to arrive at a closed and definite set of labels for
phonetic segments, and the result depends solely on what
the transcriber has written. For instance, the length of a
phoneme is determined by whether the transcriber has typed
a single or a double character in the orthographic transcript,
and this decision is in turn mostly determined by written
forms. Therefore, in this paper, the notion of phoneme
refers only to the label of a phonetic segment that was
derived from the word label, and not to any phonologically defined entity.
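The orthography-based labelling described above can be illustrated with a short sketch. It is a deliberate simplification and not the author's tool: a doubled letter is taken as one long label and any other letter as a short label, while special cases such as the velar nasal written as "ng" or "nk" are ignored. The function name `phoneme_labels` is hypothetical.

```python
def phoneme_labels(word):
    """Derive (label, length) pairs from an orthographic word form.

    Assumption: two identical adjacent letters mark one long phoneme label;
    every other letter is one short label.
    """
    labels = []
    letters = word.lower()
    i = 0
    while i < len(letters):
        letter = letters[i]
        if i + 1 < len(letters) and letters[i + 1] == letter:
            labels.append((letter, "long"))
            i += 2  # consume both characters of the doubled letter
        else:
            labels.append((letter, "short"))
            i += 1
    return labels
```

For example, `phoneme_labels("niin")` yields a short n, a long i, and a short n, which shows how the length category follows directly from what the transcriber typed.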
In spontaneous speech, phonetic segmentations usually
contain sequences of segments which differ from the expected “phonemic” sequence with regard to the number of
segments and their labels. The syllable provides an articulatorily motivated unit that can in most cases be used to automatically associate the segmented and transcribed phones
with phoneme labels, as long as the syllable boundaries have
been marked in the segmentation. Each annotated syllable
was divided into at most five structural parts: onset1, onset2,
nucleus, coda1, and coda2, of which the nucleus (a vowel
part) was always required. Thus, a long vowel phoneme
would have the total duration of all vowel phone segments
that had been segmented within the boundaries of one syllable. Diphthongs were also dealt with as a separate group.
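The syllable-based association can be sketched as below. The paper does not specify how segments were assigned to the five structural parts, so the rule used here is an assumption: consonant segments before the first vowel form the onset, the contiguous run of vowel segments forms the nucleus, and the remaining segments form the coda. The names `VOWELS`, `split_syllable`, and `nucleus_duration` are hypothetical.

```python
VOWELS = set("aeiouyäö")  # assumption: vowel phones are labelled with these letters


def split_syllable(phones):
    """Split one syllable into onset1, onset2, nucleus, coda1 and coda2.

    `phones` is a list of (label, duration_ms) pairs in temporal order.
    """
    first = next((i for i, (label, _) in enumerate(phones) if label in VOWELS), None)
    if first is None:
        raise ValueError("a syllable needs a vowel nucleus")
    end = first
    while end < len(phones) and phones[end][0] in VOWELS:
        end += 1  # extend the nucleus over the contiguous vowel segments
    onset, nucleus, coda = phones[:first], phones[first:end], phones[end:]
    if len(onset) > 2 or len(coda) > 2:
        raise ValueError("more than two onset or coda segments")
    onset = onset + [None] * (2 - len(onset))
    coda = coda + [None] * (2 - len(coda))
    return {"onset1": onset[0], "onset2": onset[1], "nucleus": nucleus,
            "coda1": coda[0], "coda2": coda[1]}


def nucleus_duration(phones):
    """Total duration of all vowel segments in the syllable nucleus."""
    return sum(duration for _, duration in split_syllable(phones)["nucleus"])
```

With this rule, a syllable segmented as t (60 ms), u (50 ms), u (45 ms), l (40 ms) gets a nucleus duration of 95 ms, regardless of how many vowel segments the segmentation produced.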
3.2.2. Phone durations
The word forms were sorted according to their frequency
and divided into six groups that were numbered from 1 to 6
according to increasing frequency (group 1 containing rare
word forms and group 6 the most frequent words, see figure
1).
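The grouping step can be sketched as follows. The paper does not state where the group boundaries were placed, so this sketch assumes six equal-width bins on the natural-log frequency axis. The function name `frequency_groups` and its parameters are hypothetical.

```python
import math
from collections import Counter


def frequency_groups(tokens, n_groups=6):
    """Count word forms and assign each one a frequency group from 1 to n_groups.

    Assumption: groups are equal-width bins of log(frequency), with group 1
    holding the rarest forms and group n_groups the most frequent ones.
    """
    counts = Counter(tokens)
    if not counts:
        return counts, {}
    max_log = math.log(max(counts.values()))
    groups = {}
    for form, count in counts.items():
        if max_log == 0:
            groups[form] = 1  # every form occurs exactly once
            continue
        index = int(math.log(count) / max_log * n_groups) + 1
        groups[form] = min(index, n_groups)  # the top value falls in the last bin
    return counts, groups
```

Equal-width log bins are only one possible choice. Bins holding equal numbers of word forms or of tokens would give different group memberships, especially for the many forms that occur only once.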
Figure 2 shows the distributions of phone durations for
short and long phonemes and diphthongs within word-initial
syllables according to the six frequency groups. Frequency groups 5 and 6 contain mostly function words. Such
common function words as niin, joo, siis, ei can be held responsible for the longest durations of the long phonemes
and diphthongs in group 6.
The separation between long and short quantities (as determined from the orthographic labels) is most apparent for
the infrequent words in group 1. The mean durations of
short and long phonemes are different for all word frequencies, but the distribution of values is non-symmetric and
there is a large amount of variability.
[Figure 2: three panels (Short phonemes, Long phonemes, Diphthongs); x-axis: Word form frequency group (1-6); y-axis: Duration (ms).]
Fig. 2. Durations of short and long phonemes and diphthongs within word-initial syllables of six different word
frequency classes. Single-word utterances, utterance-initial
stops and utterance-final phones were excluded. Word frequency increases from group 1 (left) to group 6 (right). Despite the variability, a negative correlation was found for
both short and long phonemes. Some outliers of very long
duration are not visible.
Figure 3 indicates how phone duration for short
phonemes within word-initial syllables varies by word form
frequency for different phoneme labels. There is again a
great deal of variability in the duration values, suggesting
that many factors probably contribute to them. However, the
smoothing curves show a similar, slightly downward trend
for each phoneme label. The phoneme labels with the highest frequencies in the whole dialogue material were /i,e,A/
for vowels and /s,t,n/ for consonants.
[Figure 3: one panel per phoneme label (a, e, f, h, i, j, l, m, n, N, o, r, s, u, v, y, ä, ö); x-axis: log(Word form frequency); y-axis: Phone duration.]
Fig. 3. Phone durations for short phonemes that occurred in word-initial syllables. Frequency of word forms increases from
left to right. The smooth curves indicate that an increase in word form frequency is apparently associated with slightly
shorter phone durations. There are only a few observations for such rare phoneme labels as /ö/ and /f/. Word-initial stops
were excluded from the data, but geminates at the 1st-2nd syllable border were included. A small number of large values are
not visible.
In the current study, the word form frequencies have
only been analysed on the basis of orthographic transcripts,
and homonymous forms have not been separately considered. A morphological analysis of the word forms in this
corpus is underway, which may help to build models for
phone duration.
4. CONCLUSIONS
It has been shown that in casual Finnish speech, the phone
durations in the initial syllables of words tend to be shorter
in frequent and predictable words. This may reflect a
general increase in speaking rate. It may also be assumed
that some of the duration-based contrasts that exist in clearly
pronounced speech may not be as important in highly predictable parts of casual speech.
5. REFERENCES
[1] Jaakko Lehtonen, Aspects of quantity in standard Finnish, Number [VI] in Studia Philologica
Jyväskyläensia. Jyväskylä: University of Jyväskylä, 1970.
[2] Michael O’Dell, Intrinsic timing and quantity in
Finnish, Ph.D. thesis, University of Tampere, 2004.
[3] Mietta Lennes, “On the expected variability of vowel
quality in Finnish informal dialogue,” in Proceedings
of the 15th International Congress of Phonetic Sciences
(ICPhS), Barcelona, Spain, M. Solé, D. Recasens, and
J. Romero, Eds., 2003, pp. 2985–2988.
[4] Rob J. J. H. van Son, Olga Bolotova, Mietta Lennes, and
Louis C. W. Pols, “Frequency effects on vowel reduction in three typologically different languages (Dutch,
Finnish, Russian),” in ICSLP 2004 (INTERSPEECH),
4.-8.10.2004, Jeju Island, Korea, 2004.
[5] Paul Boersma and David Weenink, “Praat: doing
phonetics by computer,” 1992–2004, available at:
http://www.praat.org/.