Journal of Phonetics (1998) 26, 371-380
Article ID: jp 980080

On the speaker-dependence of the perceived prominence of F0 peaks

Carlos Gussenhoven* and Toni Rietveld
Centre for Language Studies, University of Nijmegen, Postbus 9103, NL-6500 HD Nijmegen, The Netherlands

Received 19 December 1997, revised 16 July 1998, accepted 24 September 1998

Stimuli in which a recording of an original, somewhat androgynous female voice had been manipulated by upscaling and downscaling of formant frequencies so as to simulate a female and a male speaker elicited significantly different prominence judgements from listeners, even when they had identical fundamental frequency (F0) contours. The stimuli consisted of brief sentences in which one word was provided with a H*+L pitch accent. Formant manipulations were done with the help of LPC resynthesis, and F0 was manipulated with the PSOLA technique. Accented syllables in the artificial female stimuli were judged to be less prominent than those in the artificial male stimuli. Since the only difference between the two relevant sets of stimuli resides in the spectral envelope patterns, a plausible interpretation of the results is, first, that listeners estimate an F0 range for the speaker onto which perceived contours are projected, enabling them to read off the contour’s prominence level; and second, that listeners assign higher frequency ranges to female than to male voices. These results confirm the frequently made assumption that perceptual pitch-scaling models which assign F0 values to phonological H-tones and L-tones must include a speaker-specific component.

© 1998 Academic Press

1. Introduction

If a woman were asked to imitate an intonation contour produced by a man, there would be two ways in which she could interpret her task.
In one interpretation, which we might refer to as the ‘phonetic interpretation’, she would attempt to reproduce the actual pattern of variation in vocal cord vibration, in effect trying to sound like a man: her voice would have, for her, unusually low pitch, and her pitch excursions would be smaller than she would make them herself when speaking normally. In the other interpretation, which we might call the ‘linguistic interpretation’, she would try to give an accurate version of the intonation contour as she herself would have produced it if she had wanted to say the same thing. In this interpretation, her rate of vocal cord vibration would be considerably higher, and the excursions of the original contour would appear enlarged when viewed on the same linear scale.

Our speaker’s ‘linguistic interpretation’ involves a number of steps. First, she will have to make an estimate of the model speaker’s pitch range, a scale covering the distance between what she expects will be the speaker’s lowest and highest pitches. Second, she will have to project the actual F0 contour of the model utterance onto this scale, so as to be able to judge how wide the excursions in the speaker’s contour are and where in the speaker’s pitch range the contour is located. A successful estimate of these parameters will enable her not to mistake a high, narrow-span contour for a contour spoken low in the pitch range containing fairly average pitch excursions. Third, she will have to project the model contour onto her own speaker-specific pitch range, so as to be able to decide where to begin, how high or how low to go, and where to end. The speaker-specific pitch range that this scenario presupposes has been referred to as the contour’s ‘graph paper’ by Pierrehumbert (1980); we will here use the term ‘reference scale’.

*Corresponding author. 0095-4470/98/040371+10 $30.00/0
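The three steps of the ‘linguistic interpretation’ can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that position within a pitch range is measured on a logarithmic frequency scale. The function names, the two pitch ranges, and the contour values are hypothetical and are not taken from the experiment.

```python
import math


def relative_position(f0_hz, range_low_hz, range_high_hz):
    """Position of an F0 value within a pitch range on a log-frequency
    scale: 0.0 is the bottom of the range, 1.0 is the top."""
    return math.log(f0_hz / range_low_hz) / math.log(range_high_hz / range_low_hz)


def project_contour(contour_hz, source_range, target_range):
    """Re-express a contour produced in the source speaker's range at the
    same relative positions in the target speaker's range."""
    src_low, src_high = source_range
    tgt_low, tgt_high = target_range
    projected = []
    for f0 in contour_hz:
        # Step 2: locate the value on the model speaker's scale.
        pos = relative_position(f0, src_low, src_high)
        # Step 3: read the same position off the imitator's own scale.
        projected.append(tgt_low * (tgt_high / tgt_low) ** pos)
    return projected


# Step 1: estimated ranges (hypothetical values, in Hz).
male_range = (80.0, 200.0)
female_range = (150.0, 375.0)

# A hypothetical one-peak contour: onset, peak, offset (Hz).
male_contour = [100.0, 160.0, 90.0]
female_version = project_contour(male_contour, male_range, female_range)

for original, projected in zip(male_contour, female_version):
    print(f"{original:6.1f} Hz -> {projected:6.1f} Hz")
```

With these illustrative values both ranges span the same frequency ratio, so the projected contour is the original multiplied by a constant factor. Its excursions are larger in Hz, as the text describes, while its relative position within the range is unchanged.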
From this perspective, without a reference scale, no judgement can be made of perceptual attributes that are determined by the contour’s F0 range, such as the liveliness of the contour (Traunmüller, 1988; Traunmüller & Eriksson, 1995), the degree of surprise (Gussenhoven & Rietveld, 1997), or the prominence of any F0 peaks in it (e.g., Rietveld & Gussenhoven, 1985; Terken, 1991). Experiments that have been concerned with the relation between these perceptual attributes and properties of the contour have quite reasonably assumed that the reference scale was constant in each experiment, since the stimuli were produced by the same (artificial or real) speaker. For instance, Beckman (1995) observes that “all of our theories of intonational structure include at least an implicit representation of the speaker’s overall pitch range in our models of the hearer’s competence.”

The purpose of our experiment was to provide a demonstration that listeners in fact adjust the reference scale according to their estimate of the speaker’s F0 range. There are a number of ways in which this could be done. We could use a variety of speakers with different individual pitch ranges, as evidenced by their average F0 over a series of utterances. Alternatively, we could rely on the commonly perceived differences in mean F0 between the speech of men and the speech of women in Dutch (van Bezooijen, 1996) and assume that listeners will create different reference scales for these two groups of speakers. We decided to use the latter strategy, and accordingly ran a perception experiment with two sets of stimuli in which we had manipulated formant values, such that one set sounded as if it were spoken by a woman and the other set sounded as if it were spoken by a man. Although, strictly speaking, our method will therefore only be able to show gender-specificity of the pitch range, we will assume that the results indicate that pitch range perception is in fact speaker-specific.
The perceptual attribute we have chosen to investigate is perceived prominence, also known as ‘intonational emphasis’ (Ladd & Morton, 1997). Rather than referring to different levels of phonological prominence (among which one could distinguish the weak branch of the foot, the strong branch of the foot, the word stress, and the accented syllable), perceived prominence amounts to the score obtained from native speakers in a perception task with gradient ‘emphasis’ or ‘prominence’ as the response category. In principle, a prominence perception task could be related to the whole contour, to the pitch-accented word, or to the syllable with which the pitch accent is associated. The first task may be too general for judges to feel comfortable with, since they may prefer to concentrate on a more specific relation between some aspect of the signal and a perceptual attribute. Streefkerk et al. (1997) find that the latter two tasks yield highly correlated results, but that perceived prominence is somewhat higher when listeners judge the prominence of an accented syllable than when they judge the prominence of an accented word. Most frequently, prominence perception tasks seek to establish the perceived prominence, or intonational emphasis, of accented syllables that are realised with the help of a pitch peak, the phonetic implementation of a H*+L pitch accent. Perceived prominence of F0 peaks is related to the maximum frequency excursion of the fundamental, with higher peaks eliciting greater perceived prominence than lower peaks (Pierrehumbert, 1979; Rietveld & Gussenhoven, 1985), as well as to properties of the surrounding F0 maxima (Pierrehumbert, 1979) and minima (Gussenhoven, Repp, Rietveld, Rump & Terken, 1997). Accordingly, we decided to measure the perceived prominence of a syllable in which a H*+L pitch peak is located.
The choice of the perceptual attribute ‘prominence’ as the dependent variable was made because, even more so than perceptual attributes like ‘surprise’ and ‘liveliness’, perceived prominence has been shown to be a very sensitive variable, which is readily affected by (within-speaker) changes in pitch range. This has been shown for overall pitch range modifications that are created by variation of the peak height in contours with relatively fixed low F0 values (Shriberg, Ladd, Terken & Stolcke, 1996; Ladd & Morton, 1997), as well as for pitch range modifications that rely on the wholesale shifting up or down of the contour in the speaker’s frequency range (Rietveld & Gussenhoven, 1985). The latter experiment in fact combined both types of variation, and showed that a ‘global’ seven semitone increase in F0 of a single-peak contour raises the perceived prominence of the peak by the same amount as a 1.5 semitone increase of just the peak. These results were interpreted as being due to the shifting up of the contour, or of the contour peak, along the reference scale, with higher values eliciting higher prominence levels.

There are also findings that are more appropriately interpreted as being due to shifts of the reference scale itself. A subtle effect of this kind is reported in Gussenhoven et al. (1997), who present evidence that listeners make an estimate of the location of the speaker’s reference scale on the basis of the F0 of the initial unaccented portion of the contour: a slight raising of the contour’s onset had the effect that the perceived prominence of the peak decreased. This was interpreted to mean that the speaker’s reference scale was raised, with the contour being held constant.[1]

This brief review of experimental findings and theoretical assumptions suggests that there are two ways in which we can manipulate the relation between contour and reference scale.
First, if the speaker is (assumed to be) the same, an increase in the F0 of the peak or of the entire contour will increase perceived prominence. Second, if the listener’s estimate of the reference scale were to change as a result of her impression that she was listening to a speaker with a higher average F0, the perceived prominence of the F0 peak will go down with (whole-contour) increases in the F0. That is, because a female speaker will be expected to have a higher reference scale, an F0 peak in a given F0 contour will be heard as less prominent if the listener believes she is listening to a female voice than if she thinks the speaker is male. Thus, there were two predictions that the experiment reported here was intended to test:

(1) If the formant structure suggests the speaker is female, a given F0 peak will have less perceived prominence than when the formant structure suggests the speaker is male.
(2) If the formant structure is held constant and the pitch range is increased, either by raising just the peak height (i.e., raising the maximum frequency excursion while leaving the remainder of the F0 contour unaltered; henceforth ‘peak height condition’) or by a wholesale raising of the entire contour (henceforth ‘baseline condition’), the perceived prominence will increase.

[1] Terken (1991) found that when the baseline of a one-peaked contour is given a declining shape by raising all values except the last, i.e., tilting it using the end point as a pivot, the perceived prominence of the peak goes up, apparently contradicting the finding of Gussenhoven et al. (1997). However, as explained in the latter article, peak height and beginning of the contour co-varied in the stimuli used by Terken (1991), so that effects of increased peak and raised level were confounded.
To avoid misunderstandings, we would like to make explicit that our experiment was not designed to show that the pitch of a preceding utterance will determine the perceived pitch range of a subsequent utterance by the same speaker, or that the pitch of the surrounding utterance fragments will determine the pitch range of any intervening fragment. Leather (1983) showed for Mandarin Chinese that the interpretation of the lexical tone on a syllable with a given F0 pattern will depend on the F0 of the surrounding sentence fragments, in a way reminiscent of the experiments by Ladefoged and Broadbent (1957), who showed that the same vowel spectrum will be interpreted differently depending on the spectral characteristics of the embedding sentence. Leather showed that when the surrounding pitch is relatively high, a given F0 value may be interpreted as the realisation of an L-tone, while a lower average surrounding pitch may cause the same F0 value to be interpreted as the realisation of an H-tone. In our case, it is the formant frequencies of the stimulus, rather than the surrounding F0, that we expected to affect the interpretation of the speaker’s pitch range.

2. The experiment

The above hypotheses were tested in a perception experiment in which natural utterances (henceforth ‘source utterances’, after Ladd & Morton, 1997) were provided with two modified spectrum envelope patterns, one representative of a female speaker and the other of a male speaker, and each artificial spectrum was subsequently provided with a number of artificial F0 contours.

2.1. Materials

A 25-year-old female speaker of Dutch recorded two fairly monotonous utterances on audiotape, one of which had the vowel [i] in word stress position and the other the vowel [a:]. The purpose of varying the degree of opening of the accented vowels in this way was to increase the generalisability of the results.
One sentence-like phrase, S1, was ‘Dat geblaat van die schapen daar’ (‘that bleating of those sheep there’), which was 1430 ms long and had an [a:] in ‘schapen’ of 161 ms; the other, S2, was ‘Dat geklier van die pieten daar’ (‘that fidgeting of those nits there’), which was 1630 ms long and had an [i] in ‘pieten’ of 66 ms. After AD-conversion (10 kHz sampling frequency) and Linear Predictive Coding (LPC) analysis (12 coefficients, frame length 10 ms, window 25 ms), the utterances were resynthesised in two versions, one of which was intended to be perceived as produced by a female voice and the other as produced by a male voice. Following suggestions by Elmlund, Frehr and Petersen (1992), the first, second, and third formants in the original female speech signal were downscaled by factors of 0.85, 0.85, and 0.80, respectively, in order to obtain a version that sounded as if spoken by a man. The ‘female’ voice was created by multiplying the first three formants in the original utterances by a factor of 1.2. Both versions, therefore, had artificial spectra. Informal tests with phonetically trained listeners confirmed that the artificial spectra sounded convincingly like a male and a female speaker, respectively. For each sentence, the amplitudes of the samples corresponding with the F0 peaks in the accented syllables were all scaled to the same value. For these manipulations, we used the LVS package (Vogten, 1985).

Subsequently, both the ‘male’ and the ‘female’ versions were resynthesised with a number of artificial one-peak intonation contours in which the height of the baseline and the height of the peak were varied. The contours consisted of one H*+L accent, on the syllables ‘scha’ and ‘pie’, respectively, with a boundary L-tone at the beginning and one at the end, giving an accent-lending peak superimposed on a slightly descending baseline.
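The arithmetic behind the formant manipulation can be sketched as follows. This is only an illustration of the scaling step, not of the LPC resynthesis carried out with the LVS package. It assumes that the three ‘male’ factors apply to the three lowest formants, in parallel with the ‘female’ scaling of the first three formants. The function name and the formant values of the example frame are hypothetical.

```python
# Scale factors reported in the text. Assigning the three 'male' factors to
# F1, F2 and F3 is an assumption made here for illustration.
MALE_FACTORS = (0.85, 0.85, 0.80)
FEMALE_FACTORS = (1.2, 1.2, 1.2)


def scale_formants(formants_hz, factors):
    """Multiply the lowest formants by the given factors and leave any
    higher formants unchanged."""
    scaled = list(formants_hz)
    for i, factor in enumerate(factors):
        if i < len(scaled):
            scaled[i] = scaled[i] * factor
    return scaled


# Hypothetical formant frequencies (Hz) for one analysis frame of an open
# vowel; these are not measurements from the source utterances.
source_frame = [800.0, 1400.0, 2800.0, 3900.0]

male_frame = scale_formants(source_frame, MALE_FACTORS)
female_frame = scale_formants(source_frame, FEMALE_FACTORS)

print("source:", source_frame)
print("male:  ", [round(f) for f in male_frame])
print("female:", [round(f) for f in female_frame])
```

Because both voices are derived from the same source frame, the two outputs differ only in their spectral envelope targets, which is the property the experiment relies on.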
The baseline was varied in three steps, ‘high’, ‘mid’ and ‘low’, and each of these baseline conditions was combined with two peak heights, to implement the peak height condition. Peaks had flanks of 100 ms and were aligned such that the F0 maxima occurred in the middle of the vowel. Table I gives the values we used. We did not cross both the ‘male’ and ‘female’ versions with each baseline condition: the low baseline condition is unrealistic when combined with the ‘female’ voice, while the high baseline condition is unrealistic when combined with the ‘male’ voice. Therefore, the stimuli that are relevant for testing the first hypothesis, which requires a comparison of the ‘female’ and ‘male’ versions, are those with the mid baseline. This subset consisted of 2 (sentences) × 2 (genders) × 2 (peak heights) = 8 stimuli (cf. the ‘female’ and the ‘male’ contours with baseline of 145 Hz in Fig. 1). Within the sets of stimuli for each ‘gender’, it is possible to test the second hypothesis. These two mutually exclusive subsets (cf. the two boxed sets of contours in Fig. 1) consisted of 2 (sentences) × 2 (peak heights) × 2 (baselines) = 8 stimuli each. The total number of stimuli was thus 16.

2.2. Procedure

Two versions of the stimulus tape were made, each with a different random order of the stimuli. Thirty subjects participated in the experiment, divided equally over the two tape versions. The 16 stimuli were presented three times, in eight blocks of six stimuli each. The interstimulus interval was 5.5 s. The eight blocks were mixed with 72 stimuli belonging to a different experiment on peak prominence for which the same task was used. This was a magnitude estimation task: subjects were asked to rate the level of prominence of (strictly, the ‘degree of emphasis’ on) the syllables ‘scha’ and ‘pie’ in each stimulus by putting a vertical mark across an uncalibrated horizontal line, one of which was printed on their score sheets for each stimulus.
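The stimulus counts given in Sections 2.1 and 2.2 can be reproduced by enumerating the design. The following Python sketch is illustrative only; the variable names and the tuple layout are assumptions, while the factor levels and counts come from the text.

```python
from itertools import product

SENTENCES = ("S1", "S2")
PEAKS = ("low", "high")
# Baselines combined with each manipulated voice: the low baseline was not
# used with the 'female' voice, nor the high baseline with the 'male' voice.
BASELINES = {"female": ("high", "mid"), "male": ("mid", "low")}

# Every stimulus is a (voice, baseline, peak, sentence) combination.
stimuli = [
    (voice, baseline, peak, sentence)
    for voice, baselines in BASELINES.items()
    for baseline, peak, sentence in product(baselines, PEAKS, SENTENCES)
]

# Subset for the first hypothesis: both voices, mid baseline only.
mid_subset = [s for s in stimuli if s[1] == "mid"]

# Subsets for the second hypothesis: one per voice.
per_voice = {v: [s for s in stimuli if s[0] == v] for v in BASELINES}

# Presentation schedule: three repetitions in blocks of six.
REPETITIONS = 3
BLOCK_SIZE = 6
presentations = len(stimuli) * REPETITIONS
blocks = presentations // BLOCK_SIZE

print(len(stimuli), len(mid_subset), {v: len(s) for v, s in per_voice.items()})
print(presentations, blocks)
```

The enumeration gives 16 stimuli in total, 8 in the mid-baseline subset, 8 per voice, and 48 presentations filling exactly eight blocks of six.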
Each block of six stimuli corresponded to a single page, and was preceded by an anchor stimulus, ‘Dat gedoe van die boeren daar’ (‘that fussing of those farmers there’), which had a baseline of 145-130 Hz and a peak of 190 Hz. The prominence level of the anchor stimulus was marked on the score sheet as the midpoint of the first scale of each page. The uncalibrated scales were subsequently quantized as 100-point scales.

TABLE I. F0 (Hz) of contour beginnings, contour ends, and peaks of artificial pitch contours

                   Baseline condition
                   High    Mid    Low
Baseline begins    175     145    115
Baseline ends      169     130    100
Low peak           235     190    155
High peak          245     205    165

Figure 1. Structure of the experimental contours with hypothetical male and female reference scales, indicated by the boxes.

2.3. Results

The resulting mean scores, pooled over subjects and repetitions, are shown in Fig. 2. The variation in the scores given was quite similar in all conditions: the mean SD was 13.33. The lowest SD was 11.28, which occurred in the condition ‘male speaker, sentence 1, 115 Hz’, while the highest was 15.05, which occurred in the condition ‘female speaker, sentence 1, 145 Hz’. This variation in the scores reflects the variation between the subjects but does not affect the analysis, as we are dealing with a within-subject design.

Panel (a) shows the effects of the baseline conditions and the peak height conditions for the ‘female’ voice, for the two source sentences separately, while panel (b) shows the equivalent scores for the ‘male’ voice. As can be seen, the higher baseline results in higher perceived prominence in all cases, while higher peaks result in higher perceived prominence in all cases except in the ‘mid’ male register for the sentence with [a:]. Separate analyses of variance (repeated measures, based on the means of the three repetitions) were carried out on the scores for the ‘female’ and ‘male’ voices, respectively, with BASELINE, PEAK, and SENTENCE as three two-level factors.
For the ‘female’ voice, only the main effects reached significance: BASELINE: F(1,29) = 24.67, p < 0.001, with the index of explained variance η² = 0.460; PEAK: F(1,29) = 6.84, p < 0.015, η² = 0.191; and SENTENCE: F(1,29) = 35.31, p < 0.001, η² = 0.549. For the ‘male’ voice the main effects BASELINE and PEAK were significant (F(1,29) = 25.82, p < 0.001, η² = 0.471 and F(1,29) = 31.48, p < 0.001, η² = 0.520, respectively). Two interactions were also significant: SENTENCE × PEAK (F(1,29) = 6.01, p < 0.021, η² = 0.172) and BASELINE × PEAK (F(1,29) = 9.24, p < 0.006, η² = 0.242).

Figure 2. Mean perceived prominence of the F0 peaks as a function of peak height and baseline, for each source sentence separately (panel a: ‘female’ voice, panel b: ‘male’ voice). Mean values based on 90 observations. S1 = Dat geblaat van die schapen daar; S2 = Dat geklier van die pieten daar.

Figure 3. Mean perceived prominence of the F0 peaks as a function of ‘gender’ and ‘peak’. Mean values each based on 180 observations.

Fig. 3 shows the effect of the GENDER and SENTENCE conditions in the mid baseline condition. As can be seen, the female voice consistently shows a lower perceived prominence than the male voice. An analysis of variance (repeated measures, based on the means of the three repetitions) was carried out on the perceived prominence values of the stimuli with the mid baseline. Four effects turned out to be significant or nearly significant at the 5% level: SENTENCE: F(1,29) = 14.43, p < 0.002, η² = 0.332; GENDER: F(1,29) = 68.43, p < 0.001, η² = 0.702; PEAK: F(1,29) = 3.69, p < 0.061 (near significant), η² = 0.113; and the interaction SENTENCE × GENDER: F(1,29) = 22.68, p < 0.001, η² = 0.439. The main effect SENTENCE is not relevant here. It could be expected to affect the prominence ratings, since the words to be rated differed in spectral composition and duration.
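The reported indices of explained variance can be cross-checked against the F ratios. The sketch below assumes that the index is partial eta squared, computed as F·df_effect / (F·df_effect + df_error), and that every test has 1 and 29 degrees of freedom (two-level factors, 30 subjects). Both are assumptions, since the text does not state how the index was derived. The function name and the data layout are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumed degrees of freedom for every test (see lead-in).
DF_EFFECT, DF_ERROR = 1, 29

# (effect, F ratio, reported index of explained variance), from Section 2.3.
REPORTED = [
    ("female: BASELINE", 24.67, 0.460),
    ("female: PEAK", 6.84, 0.191),
    ("female: SENTENCE", 35.31, 0.549),
    ("male: BASELINE", 25.82, 0.471),
    ("male: PEAK", 31.48, 0.520),
    ("male: SENTENCE x PEAK", 6.01, 0.172),
    ("male: BASELINE x PEAK", 9.24, 0.242),
    ("mid: SENTENCE", 14.43, 0.332),
    ("mid: GENDER", 68.43, 0.702),
    ("mid: PEAK", 3.69, 0.113),
    ("mid: SENTENCE x GENDER", 22.68, 0.439),
]

for label, f_value, reported in REPORTED:
    computed = partial_eta_squared(f_value, DF_EFFECT, DF_ERROR)
    print(f"{label:24s} computed {computed:.4f}  reported {reported:.3f}")
```

Under these assumptions every recomputed value lies within 0.001 of the reported one, which supports reading all eleven tests as having one effect degree of freedom.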
The manipulated gender of the voice appeared to be a strong factor: for both sentences, the ‘male’ voice yielded higher perceived prominence levels than the ‘female’ voice. Peak height was nearly significant (p < 0.061): three out of four comparisons show differences in perceived prominence that correspond with the differences in the F0 peaks. The interaction SENTENCE × GENDER was unexpected. It appears that for both levels of PEAK HEIGHT, the prominence difference between the ‘male’ and ‘female’ voices is somewhat larger for the sentence with [i] in the accented syllable (‘S2’ in Fig. 2) than for the sentence with [a:] in the accented syllable (‘S1’ in Fig. 2).

3. Discussion

The main finding is that the manipulated gender of the voice influences prominence judgements in the expected direction. When superimposed on a male voice, the H*+L peak in the same F0 contour leads to greater perceived prominence than when it is superimposed on a female voice. This confirms the hypothesis that perceived gender of the speaker is used by the listener to anchor the reference scale upon which contours are projected. In addition, within each artificial voice, both the raising of the baseline and the raising of the peak were seen to independently increase the perceived prominence of the peaks. This confirms our second hypothesis (as well as a great deal of earlier research): when the reference scale is held constant, higher pitch leads to greater prominence.

Could our ‘gender’ effect have other causes? It is well known that the manipulation of the spectral characteristics of speech signals may affect their loudness. For instance, Glave and Rietveld (1975) showed that speech signals with equal intensities but different spectra differ in loudness. A crucial factor here is the distribution of the spectral energy over the different critical bands.
In general, increasing the distance between formants will also increase the chance that energy will be concentrated in different critical bands. For obvious reasons, the distance between the formants of our ‘female voice’ was larger than the distance between formants in the ‘male voice’, meaning that, ceteris paribus, the female vowels might be perceived as somewhat louder than the male vowels. Our results speak against this, however; the stressed syllables in the words ‘schapen’ and ‘pieten’ as realised in the ‘male voice’ were judged to be more prominent than the corresponding syllables in the ‘female voice’. This strongly suggests that our upscaling of the formants did not by itself increase the perceived prominence. Moreover, the sheer size of the effect, which greatly exceeds that of a peak height increase of 10 Hz, makes it unlikely that it is an artefact of formant alteration.[2]

The results of our experiment are consistent with findings by Traunmüller and Eriksson (1995). They showed that the perceived liveliness of F0 excursions depends not only on F0 and speech rate, but also on the ‘amount of space available below F1’ (on this distance, see also Traunmüller, 1988). Specifically, they found that when the spectral distance between the first formant and the baseline (their value ‘F"’, which is very similar to the concept of the ‘baseline’ used in our study) is larger, the perceived degree of liveliness is smaller. This predicts that when F1 goes up, causing the spectral distance between F0 and F1 to increase, the perceived liveliness should decrease. As perceived liveliness and perceived prominence are likely to be related, it is not surprising to see the same dependence between spectral characteristics and perceived prominence in the results of our experiment. However, we do not believe that the distance between F1 and F0 itself is responsible for these two concurring findings. Rather, a higher F1 (and a higher F2, F3, etc.)
suggests a different speaker, one who is likely to have a higher average F0, and hence a higher ‘reference scale’. We assume it is this that causes pitch movements to sound less impressive, lively, surprised, etc. to the listener.

The unexpected interaction between SENTENCE and GENDER does not have an obvious explanation. We found that the effect of GENDER on perceived prominence was greater for the sentence with [i] than for the sentence with [a:] in the accented syllable. A possible explanation may be found in the effect of intrinsic pitch. Results obtained by Silverman (1987, chap. 3) for English suggest that [i] will be heard as less prominent than [a:], all else being equal. If the reference scale is nonlinear, the effect of the manipulated gender should be larger as the contours are scaled lower on the reference scales, i.e., larger for [i] than for [a:], which is in accordance with our finding. However, the interaction effect is small, and therefore we refrain from further attempts to explain it.

In summary, it was found that a change in the apparent gender of the speaker can cause the perceived prominence of an F0 peak to change: male speech is more prominent than female speech, all else being equal. Listeners appear to use the speaker’s voice characteristics to estimate the location of the reference scale on which the F0 contour is projected so as to ‘read off’ the speaker’s intended prominence level.

[2] Van Heuven and Menert (1996) failed to find any effect on stress perception of formant upscaling and downscaling in Dutch disyllables representing nonsense and real words presented in isolation. Since theirs was a forced-choice task selecting either the first or the second syllable as stressed, their results do not bear on our experiment. It is not clear why male speech should be biased for stress towards a different syllable than female speech, and neither is it clear that any overall effects of formant alteration on loudness should be biased towards either of the syllables.

The authors would like to thank reviewer Peter Assmann and an anonymous reviewer for helpful comments on an earlier version of this paper.

References

Beckman, M. E. (1995) Local shapes and global trends. Proceedings International Conference of Phonetic Sciences, Stockholm, Vol. II, 100-107
Bezooijen, R. van (1996) Socio-cultural aspects of pitch differences between Japanese and Dutch women. Language and Speech, 38, 253-265
Elmlund, M., Frehr, I. & Petersen, N. H. (1992) Formant transformation from male to female synthetic voices. Proceedings International Conference on Speech and Language Processing, Banff, Vol. II, 1187-1190
Glave, R. D. & Rietveld, A. C. M. (1975) Is the effort dependence of speech loudness explicable on the basis of acoustical cues? Journal of the Acoustical Society of America, 58, 875-879
Gussenhoven, C. & Rietveld, T. (1997) Empirical evidence for the contrast between L* and H* in Dutch rising contours. In A. Botinis et al. (eds), Intonation: Theory, Models and Applications, Proceedings of an ESCA Workshop. Grenoble: European Speech Communication Association. 169-172
Gussenhoven, C., Repp, B. H., Rietveld, A., Rump, H. H. & Terken, J. (1997) The perceptual prominence of fundamental frequency peaks. Journal of the Acoustical Society of America, 102, 3009-3022
Heuven, V. J. van & Menert, L. (1996) Why stress position bias? Journal of the Acoustical Society of America, 100, 2439-2451
Ladefoged, P. & Broadbent, D. E. (1957) Information conveyed by vowels. Journal of the Acoustical Society of America, 29, 98-104
Ladd, D. R. & Morton, R. (1997) The perception of intonational emphasis: continuous or categorical? Journal of Phonetics, 25, 313-342
Leather, J. (1983) Speaker normalization in perception of lexical tone. Journal of Phonetics, 11, 373-382
Pierrehumbert, J. (1979) The perception of fundamental frequency declination. Journal of the Acoustical Society of America, 66, 363-369
Pierrehumbert, J. (1980) The phonetics and phonology of English intonation. PhD dissertation, MIT. Published by Garland Press, New York, 1990
Rietveld, A. C. M. & Gussenhoven, C. (1985) On the relation between pitch excursion size and prominence. Journal of Phonetics, 13, 299-308
Shriberg, E. E., Ladd, D. R., Terken, J. & Stolcke, A. (1996) Modelling pitch range variation within and across speakers: Predicting F0 targets when “speaking up”. In Proceedings of the International Conference on Spoken Language Processing, Philadelphia. Supplement pp. 1-4
Silverman, K. A. E. (1987) The structure and processing of fundamental frequency contours. PhD dissertation, Cambridge
Streefkerk, B. M., Pols, L. C. W. & Ten Bosch, L. F. M. (1997) Prominence in read aloud sentences, as marked by listeners and classified automatically. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam, 21, 101-116
Terken, J. (1991) Fundamental frequency and perceived prominence of accented syllables. Journal of the Acoustical Society of America, 89, 1768-1776
Traunmüller, H. (1988) Paralinguistic variation and invariance in the characteristic frequencies of vowels. Phonetica, 45, 1-29
Traunmüller, H. & Eriksson, A. (1995) The perceptual evaluation of F0 excursions in speech as evidenced in liveliness estimations. Journal of the Acoustical Society of America, 97 (no. 3, March), 1905-1915
Vogten, L. (1985) Handleiding No. 67. LVS-Speech Processing Programs on IPO VAX 11/780. Eindhoven: Institute of Perception Research