WORD FORM FREQUENCY AND PHONE DURATIONS IN FINNISH INFORMAL DIALOGUE

Mietta Lennes
Department of Speech Sciences
University of Helsinki

The author's work has been funded by the Academy of Finland (projects 53623, 53005).

ABSTRACT

Finnish is a quantity language. In continuous speech, however, the absolute durations of speech sounds do not systematically reflect phonemic length. Phonetic duration depends on many factors, e.g., articulation, syllable structure, position within the word, accent placement within the utterance, speaking rate and speech style. Moreover, different speech sounds have different probabilities in speech, which may affect their temporal properties. This problem is addressed in the present study. Speech sound durations measured from informal spoken dialogues are compared with the frequencies of the corresponding word forms. Phone durations measured from the initial syllables of frequent words are found to be generally shorter than those in rare words.

1. INTRODUCTION

In the Finnish language, at least eight vowel phonemes and 13 consonant phonemes can be distinguished. Each of these phonemes may occur phonologically as either long or short. In continuous speech, however, the absolute durations of speech sounds do not systematically reflect phonemic length. Both classical and recent studies exist on speech sound durations and their relationships with phonological quantity in Finnish [1, 2, among others]. Phonetic duration is known to depend on at least the phoneme type, its position within the moraic structure, accent placement within the utterance, speaking rate and style. However, most of the existing studies have been performed on read-aloud or prepared speech.

The probabilities of words and phonemes in spoken Finnish are different from those in written language. Speakers need not produce common and predictable words as clearly as rare, informative words. The frequencies of word forms have been shown to affect the articulatory reduction of vowel phones within word tokens in at least Finnish, Dutch, and Russian [3, 4]. Accentuation tends to affect segmental durations as well. Therefore, it may be expected that the probabilities of words contribute to phone durations in Finnish speech.

The aim of this study is to investigate the relationship between word frequency and phone durations in six spontaneous, informal Finnish dialogues. Phone durations of different phone classes are studied as well.

2. METHODS

Six dialogues were recorded from 12 native speakers of Finnish (five of them female), aged between 20 and 30 years. All speakers were university students and had lived in the capital city area of Finland (Helsinki, Espoo, or Vantaa) for most of their lives. The participants of each dialogue knew each other well.

2.1. Recordings

The recordings were performed in an anechoic room at the Laboratory of Acoustics and Audio Signal Processing at Helsinki University of Technology. The speakers sat two meters apart, facing in opposite directions. They heard both their own and the other speaker's voice through headphones. Thus, the situation somewhat resembled a telephone conversation. The speakers were left alone in the room for 45-60 minutes, and a few general topics were given for them to discuss. However, they were instructed not to force themselves to keep to these topics.

Each speaker's voice was recorded with a high-quality headset microphone to a separate channel of a DAT recorder. The digital stereo signal was then transferred to a computer. The channels were separated into two sound files of identical length.

2.2. Annotation

Each speaker's utterances were first transcribed following Finnish orthographic conventions using the Praat program [5]. Boundaries of utterances, words, and syllables were annotated semi-automatically as separate tiers. All phones in the recorded material from 10 speakers were automatically segmented and labelled. Fragments of these phonetic annotations were manually corrected and used for further analyses.

3. RESULTS

3.1. Word form frequencies

A frequency dictionary of 2441 word forms was created from a total of 45044 word tokens spoken by the 10 speakers within 6 dialogues. Since morphological analyses have not yet been completed, the frequency dictionary did in some cases contain several structurally identical occurrences of a word form. The word frequency distribution is shown in Figure 1.

[Figure 1. Axes: Word forms (horizontal), log(Frequency) (vertical).]

Fig. 1. The frequency distribution of 2441 orthographically different word forms within five informal Finnish dialogues. The most common word form 'se' occurred 2291 times in the material, the rarest words only once. Word frequency is shown on a logarithmic scale. Vertical lines indicate the division into six subgroups, numbered 1-6 from lowest to highest frequency, that were used for comparing phone durations.

3.2. Phone durations

All phone duration measurements were done with the Praat program [5]. In order to build a set of data that would best reflect variability due to the contextual probability of words, all single-word utterances were excluded from the phone duration analysis. Also, utterance-initial stop consonants were excluded, since they tend to be unusually short (the mere bursts of stops). Moreover, all utterance-final phones were discarded to reduce effects of pre-pausal lengthening.

Since every word token has at least one syllable and since the word-initial syllables allow for the greatest structural complexity in Finnish, the phone durations were investigated primarily in initial syllables. The data also contains geminates that occurred at the first syllable border. A total of 6585 phone segments (3583 syllables, 312 utterances) were thus analyzed.

3.2.1. Associating phone segments with phonemes

The "phonemic" structure of syllables and words was automatically derived from the orthographic transcripts of each type of unit.
This method is often used in speech technology, since Finnish orthography closely corresponds to phonemic structure. However, this is only a pragmatic means to arrive at a closed and definite set of labels for phonetic segments, and the result depends solely on what the transcriber has written. For instance, the length of a phoneme is determined by whether the transcriber has typed a single or a double character in the orthographic transcript, and this decision is in turn mostly determined by written forms. Therefore, in this paper, the notion of phoneme refers only to the label of a phonetic segment that was derived from the word label, and not to any phonologically defined entity.

In spontaneous speech, phonetic segmentations usually contain sequences of segments which differ from the expected "phonemic" sequence with regard to the number of segments and their labels. The syllable provides an articulatorily motivated unit that can in most cases be used to automatically associate the segmented and transcribed phones with phoneme labels, as long as the syllable boundaries have been marked in the segmentation. Each annotated syllable was divided into at most five structural parts: onset1, onset2, nucleus, coda1, and coda2, of which the nucleus (a vowel part) was always required. Thus, a long vowel phoneme would have the total duration of all vowel phone segments that had been segmented within the boundaries of one syllable. Diphthongs were also dealt with as a separate group.

3.2.2. Phone durations

The word forms were sorted according to their frequency and divided into six groups that were numbered from 1 to 6 according to increasing frequency (group 1 containing rare word forms and group 6 the most frequent words, see Figure 1). Figure 2 shows the distributions of phone durations for short and long phonemes and diphthongs within word-initial syllables according to the six frequency groups. Frequency groups 5 and 6 contain mostly function words.
Such common function words as niin, joo, siis, and ei can be held responsible for the longest durations of the long phonemes and diphthongs in group 6. The separation between long and short quantities (as determined from the orthographic labels) is most apparent for the infrequent words in group 1. The mean durations of short and long phonemes are different for all word frequencies, but the distribution of values is non-symmetric and there is a large amount of variability.

[Figure 2. Panels: Short phonemes, Long phonemes, Diphthongs. Axes: Word form frequency group (horizontal), Duration (ms) (vertical).]

Fig. 2. Durations of short and long phonemes and diphthongs within word-initial syllables of six different word frequency classes. Single-word utterances, utterance-initial stops and utterance-final phones were excluded. Word frequency increases from group 1 (left) to group 6 (right). Despite the variability, a negative correlation was found for both short and long phonemes. Some outliers of very long duration are not visible.

Figure 3 indicates how phone duration for short phonemes within word-initial syllables varies by word form frequency for different phoneme labels. There is again a great deal of variability in the duration values, suggesting that many factors probably contribute to them. However, the smoothing curves show a similar, slightly downward trend for each phoneme label. The phoneme labels with the highest frequencies in the whole dialogue material were /i, e, A/ for vowels and /s, t, n/ for consonants.

[Figure 3. One panel per phoneme label: a, e, f, h, i, j, l, m, n, N, o, r, s, u, v, y, ä, ö. Axes: log(Word form frequency) (horizontal), Phone duration (vertical).]

Fig. 3. Phone durations for short phonemes that occurred in word-initial syllables. Frequency of word forms increases from left to right. The smooth curves indicate that an increase in word form frequency is apparently associated with slightly shorter phone durations. There are only a few observations for such rare phoneme labels as /ö/ and /f/. Word-initial stops were excluded from the data, but geminates at the 1st-2nd syllable border were included. A small number of large values are not visible.

In the current study, the word form frequencies have only been analysed on the basis of orthographic transcripts, and homonymous forms have not been separately considered. A morphological analysis of the word forms in this corpus is underway, which may help to build models for phone duration.

4. CONCLUSIONS

It has been shown that in casual Finnish speech, the phone durations in the initial syllables of words tend to be shorter in frequent and predictable words. This may reflect a general increase in speaking rate. It may also be assumed that some of the duration-based contrasts that exist in clearly pronounced speech may not be as important in highly predictable parts of casual speech.

5. REFERENCES

[1] Jaakko Lehtonen, Aspects of quantity in standard Finnish, Number [VI] in Studia Philologica Jyväskyläensia. Jyväskylä: University of Jyväskylä, 1970.

[2] Michael O'Dell, Intrinsic timing and quantity in Finnish, Ph.D. thesis, University of Tampere, 2004.

[3] Mietta Lennes, "On the expected variability of vowel quality in Finnish informal dialogue," in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, M. Solé, D. Recasens, and J. Romero, Eds., 2003, pp. 2985–2988.

[4] Rob J. J. H. van Son, Olga Bolotova, Mietta Lennes, and Louis C. W. Pols, "Frequency effects on vowel reduction in three typologically different languages (Dutch, Finnish, Russian)," in ICSLP 2004 (INTERSPEECH), 4.-8.10.2004, Jeju Island, Korea, 2004.

[5] Paul Boersma and David Weenink, "Praat: doing phonetics by computer," 1992–2004, available at: http://www.praat.org/.
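As an illustration of the frequency dictionary of Section 3.1 and the six frequency groups of Section 3.2.2, the following is a minimal Python sketch and not the procedure actually used in the study. The function name frequency_groups and the sample tokens are hypothetical. The paper does not state where the group boundaries lie, so the sketch assumes equal-width bins on the natural-logarithm frequency scale, anchored at a frequency of one.

```python
import math
from collections import Counter


def frequency_groups(tokens, n_groups=6):
    """Return (counts, groups) for a list of orthographic word tokens.

    counts maps each word form to its token frequency.
    groups maps each word form to a group 1..n_groups (1 = rarest).

    Assumption: groups are equal-width bins on the natural-log frequency
    scale between log(1) = 0 and the log of the highest frequency. The
    actual group boundaries used in the paper are not specified.
    """
    counts = Counter(tokens)  # frequency dictionary: word form -> count
    if not counts:
        return counts, {}
    max_log = math.log(max(counts.values()))
    groups = {}
    for form, count in counts.items():
        if max_log == 0:
            # Every form occurs exactly once, so all forms are equally rare.
            groups[form] = 1
            continue
        position = math.log(count) / max_log  # 0.0 for count 1, 1.0 for the top form
        # The top form would fall just past the last bin, so clamp it.
        groups[form] = min(n_groups, int(position * n_groups) + 1)
    return counts, groups


# Hypothetical toy input; a real run would use the word tier of the annotations.
tokens = ["se"] * 50 + ["niin"] * 5 + ["joo"] * 2 + ["talo"]
counts, groups = frequency_groups(tokens)
print(counts.most_common(2))
print(groups)
```

With this binning, the most frequent word form always lands in group 6 and any form that occurs only once lands in group 1. Other choices, such as groups containing equal numbers of word forms, would be equally compatible with the description in the text.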