Journal of Phonetics (1998) 26, 371–380
Article ID: jp 980080
On the speaker-dependence of the perceived prominence
of F0 peaks
Carlos Gussenhoven* and Toni Rietveld
Centre for Language Studies, University of Nijmegen, Postbus 9103, NL-6500 HD Nijmegen,
The Netherlands
Received 19 December 1997, revised 16 July 1998, accepted 24 September 1998
Stimuli in which a recording of an original, somewhat androgynous
female voice had been manipulated by means of upscaling and downscaling of formant frequencies so as to simulate a female and a male
speaker elicited significantly different prominence judgements from
listeners, even when they had identical fundamental frequency (F0)
contours. The stimuli consisted of brief sentences in which one word was
provided with a H*+L pitch accent. Formant manipulations were done
with the help of LPC resynthesis, and F0 was manipulated with the
PSOLA technique. Accented syllables in the artificial female stimuli were
judged to be less prominent than those in the artificial male stimuli. Since
the only difference between the two relevant sets of stimuli resides in the
spectral envelope patterns, a plausible interpretation of the results is, first,
that listeners estimate an F0 range for the speaker onto which
perceived contours are projected, enabling them to read off the contour’s
prominence level; and second, that listeners assign higher frequency
ranges to female than to male voices. These results confirm the frequently
made assumption that perceptual pitch-scaling models which assign F0
values to phonological H-tones and L-tones must include a speaker-specific component.
© 1998 Academic Press
1. Introduction
If a woman were asked to imitate an intonation contour produced by a man, there would
be two ways in which she could interpret her task. In one interpretation, which we might
refer to as the ‘phonetic interpretation’, she would attempt to reproduce the actual pattern of variation in vocal cord vibration, in effect trying to sound like a man: her voice
would have—for her—unusually low pitch, and her pitch excursions would be smaller
than she would make them herself when speaking normally. In the other interpretation,
which we might call the ‘linguistic interpretation’, she would try to give an accurate version of the intonation contour as she herself would have produced it if she had wanted to
say the same thing. In this interpretation, her rate of vocal cord vibration would be
considerably higher, and the excursions of the original contour would appear enlarged
when viewed on the same linear scale. Our speaker’s ‘linguistic interpretation’ involves
a number of steps. First, she will have to make an estimate of the model speaker’s pitch
range, a scale covering the distance between what she expects will be the speaker’s lowest
and highest pitches. Second, she will have to project the actual F0 contour of the model
utterance onto this scale, so as to be able to judge how wide the excursions in the
speaker’s contour are and where in the speaker’s pitch range the contour is located.
A successful estimate of these parameters will prevent her from mistaking a high, narrow-span contour for a contour spoken low in the pitch range with fairly average
pitch excursions. Third, she will have to project the model contour onto her own
speaker-specific pitch range, so as to be able to decide where to begin, how high or how
low to go, and where to end. The speaker-specific pitch range that this scenario
presupposes has been referred to as the contour’s ‘graph paper’ by Pierrehumbert (1980);
we will here use the term ‘reference scale’. From this perspective, without a reference
scale, no judgement can be made of perceptual attributes that are determined by the
contour’s F0 range, such as the liveliness of the contour (Traunmüller, 1988; Traunmüller
& Eriksson, 1995), the degree of surprise (Gussenhoven & Rietveld, 1997), or the
prominence of any F0 peaks in it (e.g., Rietveld & Gussenhoven, 1985; Terken, 1991).

*Corresponding author.
Experiments that have been concerned with the relation between these perceptual
attributes and properties of the contour have quite reasonably assumed that the reference scale was constant in each experiment, since the stimuli were produced by the same
(artificial or real) speaker. For instance, Beckman (1995) observes that ‘‘all of our theories
of intonational structure include at least an implicit representation of the speaker’s
overall pitch range in our models of the hearer’s competence.’’ The purpose of our
experiment was to provide a demonstration that listeners in fact adjust the reference scale
according to their estimate of the speaker’s F0 range. There are a number of ways in
which this could be done. We could use a variety of speakers with different individual
pitch ranges, as evidenced by their average F0 over a series of utterances. Alternatively,
we could rely on the commonly perceived differences in mean F0 between the speech
of men and the speech of women in Dutch (van Bezooijen, 1996) and assume that listeners will create different reference scales for these two groups of speakers. We decided to
use the latter strategy, and accordingly ran a perception experiment with two sets of stimuli in which we had manipulated formant values, such that one set sounded as if they
were spoken by a woman and the other set sounded as if they were spoken by a man.
Although, strictly speaking, our method can therefore only show that the pitch range is gender-specific, we will assume that the results indicate that pitch range
perception is in fact speaker-specific.
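The reference-scale idea can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that a reference scale is a frequency interval that is linear in semitones; the two scales (80–200 Hz and 150–350 Hz) are hypothetical placeholders, not estimates from this study. It projects one and the same 190 Hz peak onto both scales.

```python
import math

def semitones_above(f_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz up to f_hz, in semitones."""
    return 12.0 * math.log2(f_hz / ref_hz)

def relative_height(f_hz: float, scale_bottom_hz: float, scale_top_hz: float) -> float:
    """Position of f_hz on a reference scale: 0.0 = bottom, 1.0 = top.

    Assumption for illustration only: the scale is linear in semitones.
    """
    span = semitones_above(scale_top_hz, scale_bottom_hz)
    return semitones_above(f_hz, scale_bottom_hz) / span

# Hypothetical reference scales in Hz (placeholders, not measured values).
MALE_SCALE = (80.0, 200.0)
FEMALE_SCALE = (150.0, 350.0)

peak_hz = 190.0  # the same F0 peak, heard in either voice
male_pos = relative_height(peak_hz, *MALE_SCALE)
female_pos = relative_height(peak_hz, *FEMALE_SCALE)
print(f"male scale: {male_pos:.2f}, female scale: {female_pos:.2f}")
```

On this toy model the peak sits near the top of the ‘male’ scale and in the lower third of the ‘female’ one, which is the direction in which a listener’s estimate of the speaker should shift the perceived prominence of an unchanged contour.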
The perceptual attribute we have chosen to investigate is perceived prominence, also
known as ‘intonational emphasis’ (Ladd & Morton, 1997). Rather than referring to
different levels of phonological prominence (among which one could distinguish the weak
branch of the foot, the strong branch of the foot, the word stress, and the accented
syllable), perceived prominence amounts to the score obtained from native speakers in
a perception task with gradient ‘emphasis’ or ‘prominence’ as the response category. In
principle, a prominence perception task could be related to the whole contour, to the
pitch-accented word, or to the syllable with which the pitch accent is associated. The first
task may be too general for judges to feel comfortable with, since they may prefer to concentrate on a more specific relation between some aspect of the signal and a perceptual
attribute. Streefkerk et al. (1997) find that the latter two tasks yield highly correlated
results, but that perceived prominence is somewhat higher when listeners judge the
prominence of an accented syllable than when they judge the prominence of an accented
word. Most frequently, prominence perception tasks seek to establish the perceived
prominence, or intonational emphasis, of accented syllables that are realised with the
help of a pitch peak, the phonetic implementation of a H*+L pitch accent. Perceived
prominence of F0 peaks is related to the maximum frequency excursion of the fundamental, with higher peaks eliciting greater perceived prominence than lower peaks
(Pierrehumbert, 1979; Rietveld & Gussenhoven, 1985), as well as to properties of the
surrounding F0 maxima (Pierrehumbert, 1979) and minima (Gussenhoven, Repp, Rietveld, Rump & Terken, 1997). Accordingly, we decided to measure the perceived prominence of a syllable in which a H*+L pitch peak is located.
The choice of the perceptual attribute ‘prominence’ as the dependent variable was
made because, even more so than perceptual attributes like ‘surprise’ and ‘liveliness’,
perceived prominence has been shown to be a very sensitive variable, which is readily
affected by (within-speaker) changes in pitch range. This has been shown for overall pitch
range modifications that are created by variation of the peak height in contours with
relatively fixed low F0 values (Shriberg, Ladd, Terken & Stolcke, 1996; Ladd & Morton,
1997), as well as for pitch range modifications that rely on the wholesale shifting up or
down of the contour in the speaker’s frequency range (Rietveld & Gussenhoven, 1985).
The latter experiment in fact combined both types of variation, and showed that a
‘global’ seven semitone increase in F0 of a single-peak contour raises the perceived
prominence of the peak by the same amount as a 1.5 semitone increase of just the peak.
These results were interpreted as being due to the shifting up of the contour, or of the
contour peak, along the reference scale, with higher values eliciting higher prominence
levels. There are also findings that are more appropriately interpreted as being due to
shifts of the reference scale itself. A subtle effect of this kind is reported in Gussenhoven et
al. (1997), who present evidence that listeners make an estimate of the location of the
speaker’s reference scale on the basis of the F0 of the initial unaccented portion of the
contour: a slight raising of the contour’s onset had the effect that the perceived prominence of the peak decreased. This was interpreted to mean that the speaker’s reference
scale was raised, with the contour being held constant.1
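The comparison from Rietveld & Gussenhoven (1985) is stated in semitones, a logarithmic unit with 12 semitones per octave. The helpers below convert between hertz and semitones; the 190 Hz starting peak is a hypothetical value chosen only to show what a 7 st and a 1.5 st increase amount to in Hz, and the last line applies the same conversion to the two mid-baseline peaks of Table I (190 and 205 Hz).

```python
import math

def shift_by_semitones(f_hz: float, st: float) -> float:
    """Frequency lying st semitones above f_hz (below it if st is negative)."""
    return f_hz * 2.0 ** (st / 12.0)

def semitone_difference(f_low_hz: float, f_high_hz: float) -> float:
    """Interval from f_low_hz to f_high_hz in semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)

peak_hz = 190.0  # hypothetical starting peak
whole_contour_raised = shift_by_semitones(peak_hz, 7.0)  # 'global' 7 st raise
peak_only_raised = shift_by_semitones(peak_hz, 1.5)      # 1.5 st raise of the peak alone
print(f"7 st: {whole_contour_raised:.1f} Hz, 1.5 st: {peak_only_raised:.1f} Hz")
print(f"190 -> 205 Hz: {semitone_difference(190.0, 205.0):.2f} st")
```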
This brief review of experimental findings and theoretical assumptions suggests that
there are two ways in which we can manipulate the relation between contour and reference scale. First, if the speaker is (assumed to be) the same, an increase in the F0 of the
peak or of the entire contour will increase perceived prominence. Second, if the listener’s
estimate of the reference scale were to rise as a result of her impression that she was
listening to a speaker with a higher average F0, the perceived prominence of the F0 peak
would go down, even if the contour itself remained unchanged. That is, because a female speaker
will be expected to have a higher reference scale, an F0 peak in a given F0 contour will be
heard as less prominent if the listener believes she is listening to a female voice than if she
thinks the speaker is male. Thus, there were two predictions that the experiment reported
here was intended to test:
(1) If the formant structure suggests the speaker is female, a given F0 peak will have less
perceived prominence than when the formant structure suggests the speaker is male.
(2) If the formant structure is held constant and the pitch range is increased, either by
raising just the peak height (i.e., raising the maximum frequency excursion while
leaving the remainder of the F0 contour unaltered; henceforth ‘peak height condition’) or by a wholesale raising of the entire contour (henceforth ‘baseline condition’),
the perceived prominence will increase.

1 Terken (1991) found that when the baseline of a one-peaked contour is given a declining shape by raising all
values except the last, i.e., tilting it using the end point as a pivot, the perceived prominence of the peak goes up,
apparently contradicting the finding of Gussenhoven et al. (1997). However, as explained in the latter article,
peak height and beginning of the contour co-varied in the stimuli used by Terken (1991), so that effects of
increased peak and raised level were confounded.
To avoid misunderstandings, we would like to make explicit that our experiment was
not designed to show that the pitch of a preceding utterance will determine the perceived pitch range of a subsequent utterance by the same speaker, or that the pitch of the
surrounding utterance fragments will determine the pitch range of any intervening fragment. Leather (1983) showed for Mandarin Chinese that the interpretation of the lexical
tone on a syllable with a given F0 pattern will depend on the F0 of the surrounding
sentence fragments, in a way reminiscent of the experiments by Ladefoged and Broadbent (1957), who showed that the same vowel spectrum will be interpreted differently
depending on the spectral characteristics of the embedding sentence. Leather showed
that when the surrounding pitch is relatively high, a given F0 value may be interpreted as
the realisation of an L-tone, while a lower average surrounding pitch may cause the same
F0 value to be interpreted as the realisation of an H-tone. In our case, it is the formant
frequencies of the stimulus, rather than the surrounding F0, that we expected to affect the
interpretation of the speaker’s pitch range.
2. The experiment
The above hypotheses were tested in a perception experiment in which natural utterances (henceforth ‘source utterances’, after Ladd & Morton, 1997) were provided with two
modified spectrum envelope patterns, one representative of a female speaker and the
other of a male speaker, and each artificial spectrum was subsequently provided with
a number of artificial F0 contours.
2.1. Materials
A 25-year-old female speaker of Dutch recorded two fairly monotonous utterances on
audiotape, one of which had the vowel [i] in word stress position and the other the vowel
[a:]. The purpose of varying the degree of opening of the accented vowels in this way was
to increase the generalisability of the results. One sentence-like phrase, S1, was ‘Dat
geblaat van die schapen daar’ (‘that bleating of those sheep there’), which was 1430 ms
long and had an [a:] in ‘schapen’ of 161 ms; the other, S2, was ‘Dat geklier van die pieten
daar’ (‘that fidgeting of those nits there’), which was 1630 ms long and had an [i] in
‘pieten’ of 66 ms. After AD-conversion (10 kHz sampling frequency) and Linear Predictive Coding (LPC) analysis (12 coefficients, frame-length 10 ms, window 25 ms), the
utterances were resynthesised in two versions, one of which was intended to be perceived
as produced by a female voice and the other as produced by a male voice. Following
suggestions by Elmlund, Frehr and Petersen (1992), the first, second, and fifth formants
in the original female speech signal were downscaled by 0.85, 0.85, and 0.80, respectively,
in order to obtain a version that sounded as if spoken by a man. The ‘female’ voice was
created by multiplying the first three formants in the original utterances by a factor of 1.2.
Both versions, therefore, had artificial spectra. Informal tests with phonetically trained
listeners confirmed that the artificial spectra sounded convincingly like a male
and a female speaker, respectively. For each sentence, the amplitudes of the samples
corresponding with the F0-peaks in the accented syllables were all scaled to the same
value. For these manipulations, we used the LVS package (Vogten, 1985).
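The formant manipulation itself was done on LPC parameters with the LVS package, which is not reproduced here. As a minimal sketch of the arithmetic only, the function below applies the scale factors given above to the formant frequencies of a single analysis frame. The frame values are hypothetical, and leaving unlisted formants unchanged is our reading of the description; the Nyquist check follows from the 10 kHz sampling frequency.

```python
from typing import Dict, List

# Scale factors from the text: 'male' = F1, F2, F5 x 0.85, 0.85, 0.80;
# 'female' = F1-F3 x 1.2. Formants not listed stay unchanged (our assumption).
MALE_FACTORS: Dict[int, float] = {1: 0.85, 2: 0.85, 5: 0.80}
FEMALE_FACTORS: Dict[int, float] = {1: 1.2, 2: 1.2, 3: 1.2}
NYQUIST_HZ = 10_000 / 2  # half the 10 kHz sampling frequency

def scale_formants(formants_hz: List[float], factors: Dict[int, float]) -> List[float]:
    """Multiply formant n (1-based) by factors[n]; other formants are left as they are."""
    scaled = [f * factors.get(n, 1.0) for n, f in enumerate(formants_hz, start=1)]
    if any(f >= NYQUIST_HZ for f in scaled):
        raise ValueError("a scaled formant would exceed the Nyquist frequency")
    return scaled

# Hypothetical F1-F5 (Hz) for one frame of the original voice.
frame = [700.0, 1300.0, 2800.0, 3700.0, 4500.0]
male_frame = scale_formants(frame, MALE_FACTORS)
female_frame = scale_formants(frame, FEMALE_FACTORS)
print(male_frame)
print(female_frame)
```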
Subsequently, both the ‘male’ and the ‘female’ versions were resynthesised with a
number of artificial one-peak intonation contours in which the height of the baseline and
the height of the peak were varied. The contours consisted of one H*+L accent, on the
syllables ‘scha’ and ‘pie’, respectively, with a boundary L-tone at the beginning and one at
the end, giving an accent-lending peak superimposed on a slightly descending baseline.
The baseline was varied in three steps, ‘high’, ‘mid’ and ‘low’, and each of these baseline
conditions was combined with two peak heights, to implement the peak height condition.
Peaks had flanks of 100 ms and were aligned such that the F0 maxima occurred in the
middle of the vowel. Table 1 gives the values we used.
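The contour construction just described can be sketched as a list of (time, F0) target points for resynthesis. This is a minimal sketch under stated assumptions, not the procedure actually used: the baseline is taken to decline linearly in Hz, each 100 ms flank is a straight line between baseline and peak, and the peak time is hypothetical because the vowel onsets are not given here. The mid baseline with the low peak (145 to 130 Hz, peak 190 Hz) is taken from Table I, and the duration is that of S1.

```python
from typing import List, Tuple

def one_peak_contour(
    duration_ms: float,
    base_begin_hz: float,
    base_end_hz: float,
    peak_hz: float,
    peak_time_ms: float,
    flank_ms: float = 100.0,
) -> List[Tuple[float, float]]:
    """(time_ms, F0_hz) targets for one accent-lending peak on a declining baseline.

    Assumptions: linear baseline in Hz; straight-line rise and fall of
    flank_ms each, centred on peak_time_ms.
    """
    if not flank_ms <= peak_time_ms <= duration_ms - flank_ms:
        raise ValueError("peak and its flanks must fit inside the utterance")

    def baseline(t_ms: float) -> float:
        return base_begin_hz + (base_end_hz - base_begin_hz) * t_ms / duration_ms

    rise_start = peak_time_ms - flank_ms
    fall_end = peak_time_ms + flank_ms
    return [
        (0.0, base_begin_hz),
        (rise_start, baseline(rise_start)),
        (peak_time_ms, peak_hz),
        (fall_end, baseline(fall_end)),
        (duration_ms, base_end_hz),
    ]

# S1 (1430 ms), mid baseline 145 -> 130 Hz, low peak 190 Hz; peak time is hypothetical.
targets = one_peak_contour(1430.0, 145.0, 130.0, 190.0, peak_time_ms=800.0)
for t, f0 in targets:
    print(f"{t:7.1f} ms  {f0:6.1f} Hz")
```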
We did not cross both the ‘male’ and ‘female’ versions with each baseline condition:
the low baseline condition is unrealistic when combined with the ‘female’ voice, while the
high baseline condition is unrealistic when combined with the ‘male’ voice. Therefore, the
stimuli that are relevant for testing the first hypothesis, which requires a comparison of
the ‘female’ and ‘male’ versions, are those with the mid baseline. This subset consisted of
2 (sentences) × 2 (genders) × 2 (peak heights) = 8 stimuli (cf. the ‘female’ and the ‘male’
contours with baseline of 145 Hz in Fig. 1). Within the sets of stimuli for each ‘gender’, it
is possible to test the second hypothesis. These two mutually exclusive subsets (cf. the two
boxed sets of contours in Fig. 1) consisted of 2 (sentences) × 2 (peak heights) × 2 (baselines) = 8 stimuli each. The total number of stimuli was thus 16.
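The stimulus count follows from crossing each artificial voice with only its two realistic baselines. The enumeration below is a sketch of that design; the labels are illustrative names, not identifiers from the original materials.

```python
from itertools import product

SENTENCES = ("S1", "S2")
PEAKS = ("low", "high")
# Each voice is crossed only with its realistic baselines:
# no 'female' + low and no 'male' + high, as described in the text.
BASELINES_BY_GENDER = {"female": ("high", "mid"), "male": ("mid", "low")}

stimuli = [
    (sentence, gender, baseline, peak)
    for gender, baselines in BASELINES_BY_GENDER.items()
    for sentence, baseline, peak in product(SENTENCES, baselines, PEAKS)
]
gender_subset = [s for s in stimuli if s[2] == "mid"]     # tests hypothesis (1)
female_subset = [s for s in stimuli if s[1] == "female"]  # tests hypothesis (2)
male_subset = [s for s in stimuli if s[1] == "male"]      # tests hypothesis (2)
print(len(stimuli), len(gender_subset), len(female_subset), len(male_subset))  # 16 8 8 8
```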
2.2. Procedure
Two versions of the stimulus tape were made, each with a different random order of the
stimuli. Thirty subjects participated in the experiment, equally divided over the two
versions. The 16 stimuli were presented three times in eight blocks of six stimuli
each. The interstimulus interval was 5.5 s. The eight blocks were mixed with 72 stimuli
belonging to a different experiment on peak prominence for which the same task was
used. This was a magnitude estimation task: subjects were asked to rate the level of
prominence of (strictly, the ‘degree of emphasis’ on) the syllables ‘scha’ and ‘pie’ in each
stimulus by putting a vertical mark across an uncalibrated horizontal line, one of which
was printed on their score sheets for each stimulus. Each block of six stimuli corresponded to a single page, and was preceded by an anchor stimulus, ‘Dat gedoe van die boeren
daar’ (‘that fussing of those farmers there’), which had a baseline of 145–130 Hz and
a peak of 190 Hz. The prominence level of the anchor stimulus was marked on the score
sheet as the midpoint of the first scale of each page. The uncalibrated scales were subsequently quantized as 100-point scales.

TABLE I. F0 (Hz) of contour beginnings, contour ends, and peaks of artificial pitch contours

Baseline   Baseline begins   Baseline ends   Low peak   High peak
high             175              169           235        245
mid              145              130           190        205
low              115              100           155        165

Figure 1. Structure of the experimental contours with hypothetical male and female reference scales, indicated by the boxes.
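The response procedure involves two small pieces of arithmetic: the 16 stimuli presented three times must fill eight blocks of six, and each mark on an uncalibrated line is converted to a score. The sketch below checks the first and, for the second, assumes a linear mapping of the mark’s distance from the left end of the line onto 0–100. The text gives neither the line length nor the exact quantization rule, so the 120 mm line is hypothetical.

```python
N_STIMULI, PRESENTATIONS, BLOCK_SIZE = 16, 3, 6
n_blocks, leftover = divmod(N_STIMULI * PRESENTATIONS, BLOCK_SIZE)
print(n_blocks, leftover)  # 8 blocks, nothing left over

def quantize_mark(mark_mm: float, line_length_mm: float, points: int = 100) -> int:
    """Score for a mark placed mark_mm from the left end of a response line.

    Assumption: the line is mapped linearly onto 0..points.
    """
    if not 0.0 <= mark_mm <= line_length_mm:
        raise ValueError("mark lies outside the response line")
    return round(mark_mm / line_length_mm * points)

LINE_MM = 120.0  # hypothetical printed line length
print(quantize_mark(LINE_MM / 2, LINE_MM))  # the anchor's midpoint mark -> 50
```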
2.3. Results
The resulting mean scores, pooled over subjects and repetitions, are shown in Fig. 2. The
variation in the scores given was quite similar in all conditions: the mean SD was 13.33.
The lowest SD was 11.28, which occurred in the condition ‘male speaker, sentence 1,
115 Hz’, while the highest was 15.05, which occurred in the condition ‘female speaker,
sentence 1, 145 Hz’. This variation in the scores reflects the variation between the subjects but does not affect the analysis, as we are dealing with a within-subject design.
Panel (a) shows the effects of the baseline conditions and the peak height conditions for
the ‘female’ voice, for the two source sentences separately, while panel (b) shows the
equivalent scores for the ‘male’ voice. As can be seen, the higher baseline results in higher
perceived prominence in all cases, while higher peaks result in higher perceived prominence in all cases except in the ‘mid’ male register for the sentence with [a:].
Separate analyses of variance (repeated measures, based on the means of the three
repetitions) were carried out on the scores for the ‘female’ and ‘male’ voices, respectively,
with BASELINE, PEAK, and SENTENCE as three two-level factors. For the ‘female’ voice, only
the main effects reached significance: BASELINE: F(1,29) = 24.67, p < 0.001, and the index of
explained variance η² = 0.460; PEAK: F(1,29) = 6.84, p < 0.015, η² = 0.191; and SENTENCE:
F(1,29) = 35.31, p < 0.001, η² = 0.549. For the ‘male’ voice the main effects BASELINE and
PEAK were significant (F(1,29) = 25.82, p < 0.001, η² = 0.471 and F(1,29) = 31.48, p < 0.001,
η² = 0.520, respectively). Two interactions were also significant: SENTENCE × PEAK
(F(1,29) = 6.01, p < 0.021, η² = 0.172) and BASELINE × PEAK (F(1,29) = 9.24, p < 0.006,
η² = 0.242).
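The reported indices of explained variance can be checked against the F ratios. Assuming (our reading; the text does not name the measure) that the index is partial η², computed as F·df1 / (F·df1 + df2), the sketch below recomputes it with df = (1, 29) for each effect in the paragraph above. All values agree to within 0.001, which also supports reading the ‘female’ PEAK effect as F(1,29) rather than F(2,29), as expected for a two-level factor.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)

# (F, reported eta squared) for the separate 'female' and 'male' analyses; df = (1, 29).
REPORTED = {
    "female BASELINE": (24.67, 0.460),
    "female PEAK": (6.84, 0.191),
    "female SENTENCE": (35.31, 0.549),
    "male BASELINE": (25.82, 0.471),
    "male PEAK": (31.48, 0.520),
    "male SENTENCE x PEAK": (6.01, 0.172),
    "male BASELINE x PEAK": (9.24, 0.242),
}

for effect, (f_value, reported) in REPORTED.items():
    recomputed = partial_eta_squared(f_value, 1, 29)
    print(f"{effect:22s} recomputed {recomputed:.3f}  reported {reported:.3f}")
```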
Figure 2. Mean perceived prominence of the F0 peaks as a function of peak height and baseline, for each source sentence separately (panel a: ‘female’ voice, panel b: ‘male’ voice). Mean values based on 90 observations. S1 = Dat geblaat van die schapen daar; S2 = Dat geklier van die pieten daar.
Fig. 3 shows the effect of the GENDER and SENTENCE conditions in the mid baseline
condition. As can be seen, the female voice consistently shows a lower perceived prominence than the male voice.
Figure 3. Mean perceived prominence of the F0 peaks as a function of ‘gender’ and ‘peak’. Mean values each based on 180 observations.
An analysis of variance (repeated measures, based on the means of the three repetitions) was carried out on the perceived prominence values of the stimuli with the mid
baseline. Four effects turned out to be (near) significant at the 5% level: SENTENCE:
F(1,29) = 14.43, p < 0.002, η² = 0.332; GENDER: F(1,29) = 68.43, p < 0.001, η² = 0.702; PEAK:
F(1,29) = 3.69, p < 0.061 (near significant), η² = 0.113; and the interaction SENTENCE ×
GENDER: F(1,29) = 22.68, p < 0.001, η² = 0.439. The main effect SENTENCE is not relevant
here. It could be expected to affect the prominence ratings, since the words to be rated
differed in spectral composition and duration. The manipulated gender of the voice
appeared to be a strong factor: for both sentences, the ‘male’ voice yielded higher
perceived prominence levels than the ‘female’ voice. Peak height was nearly significant
(p < 0.061): three out of four comparisons show differences in perceived prominence that
correspond with the differences in the F0 peaks. The interaction SENTENCE × GENDER was
unexpected. It appears that for both levels of PEAK, the prominence difference
between the ‘male’ and ‘female’ voices is somewhat larger for the sentence with [i] (‘S2’ in
Fig. 2) in the accented syllable than for the sentence with [a:] (‘S1’ in Fig. 2) in the
accented syllable.
3. Discussion
The main finding is that the manipulated gender of the voice influences prominence
judgements in the expected direction. When superimposed on a male voice, the H*+L
peak in the same F0 contour leads to greater perceived prominence than when it is
superimposed on a female voice. This confirms the hypothesis that perceived gender of
the speaker is used by the listener to anchor the reference scale upon which contours are
projected. In addition, within each artificial voice, both the raising of the baseline and the
raising of the peak were seen to independently increase the perceived prominence of the
peaks. This confirms our second hypothesis (as well as a great deal of earlier research):
when the reference scale is held constant, higher pitch leads to greater prominence.
Could our ‘gender’ effect have other causes? It is well known that the manipulation of
the spectral characteristics of speech signals may affect their loudness. For instance,
Glave and Rietveld (1975) showed that speech signals with equal intensities but different
spectra differ in loudness. A crucial factor here is the distribution of the spectral energy
over the different critical bands. In general, increasing the distance between formants will
also increase the chance that energy will be concentrated in different critical bands. For
obvious reasons, the distance between the formants of our ‘female voice’ was larger than
the distance between formants in the ‘male voice’, meaning that, ceteris paribus, the
female vowels might be perceived as somewhat louder than the male vowels. Our results
speak against this, however; the stressed syllables in the words schapen and pieten as
realised in the ‘male voice’ were judged to be more prominent than the corresponding
syllables in the ‘female voice’. This strongly suggests that our upscaling of the formants
did not by itself increase the perceived prominence. Moreover, the sheer size of the effect,
which greatly exceeds that of a peak height increase of 10 Hz, makes it unlikely that it is
an artefact of formant alteration.2
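The critical-band argument can be illustrated with a Hz-to-Bark conversion. The sketch below swaps in Traunmüller’s widely used approximation z = 26.81 f / (1960 + f) − 0.53, which is not part of this study, and applies the ‘male’ (×0.85) and ‘female’ (×1.2) factors to hypothetical F1 and F2 values for an [i]-like vowel. It shows only that the upscaled formants lie further apart on the critical-band scale; it is not a loudness model.

```python
def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark, using Traunmüller's approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def f1_f2_distance_bark(f1_hz: float, f2_hz: float) -> float:
    """Distance between F1 and F2 on the Bark scale."""
    return hz_to_bark(f2_hz) - hz_to_bark(f1_hz)

# Hypothetical F1 and F2 (Hz) of an [i]-like vowel in the original voice.
F1, F2 = 300.0, 2300.0
male_dist = f1_f2_distance_bark(F1 * 0.85, F2 * 0.85)
female_dist = f1_f2_distance_bark(F1 * 1.2, F2 * 1.2)
print(f"'male': {male_dist:.2f} Bark, 'female': {female_dist:.2f} Bark")
```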
The results of our experiment are consistent with findings by Traunmüller and
Eriksson (1995). They showed that the perceived liveliness of F0 excursions depends not
only on F0 and speech rate, but also on the ‘amount of space available below F1’ (on this
distance, see also Traunmüller, 1988). Specifically, they found that when the spectral
distance between the first formant and the baseline (their value ‘F"’, which is very similar
to the concept of the ‘baseline’ used in our study) is larger, the perceived degree of liveliness is smaller. This predicts that when F1 goes up, causing the spectral distance between
F0 and F1 to increase, the perceived liveliness should decrease. As perceived liveliness
and perceived prominence are likely to be related, it is not surprising to see the same
dependence between spectral characteristics and perceived prominence in the results of
our experiment. However, we do not believe that the distance between F1 and F0 itself is
responsible for these two concurring findings. Rather, a higher F1 (and a higher F2, F3,
etc.) suggests a different speaker, one who is likely to have a higher average F0, and hence
a higher ‘reference scale’. We assume it is this that causes pitch movements to sound less
impressive, lively, surprised, etc. to the listener.
The unexpected interaction between SENTENCE and GENDER does not have an obvious
explanation. We found that the effect of GENDER on perceived prominence was greater for
the sentence with [i] than for the sentence with [a:] in the accented syllable. A possible
explanation may be found in the effect of intrinsic pitch. Results obtained by Silverman
(1987, chap. 3) for English suggest that [i] will be heard as less prominent than [a:], all
else being equal. If the reference scale is nonlinear, the effect of the manipulated gender
should be larger as the contours are scaled lower on the reference scales, i.e., larger for [i]
than for [a:], which is in accordance with our finding. However, the interaction effect is
small, and therefore we refrain from further attempts to explain it.
In summary, it was found that a change in the apparent gender of the speaker can
cause the perceived prominence of an F0 peak to change: all else being equal, an F0 peak
is heard as more prominent in male speech than in female speech. Listeners appear to use the speaker’s voice
characteristics to estimate the location of the reference scale on which the F0 contour is
projected so as to ‘read off’ the speaker’s intended prominence level.

2 Van Heuven and Menert (1996) failed to find any effect on stress perception of formant upscaling and
downscaling in Dutch disyllables representing nonsense and real words presented in isolation. Since theirs was
a forced-choice task selecting either the first or the second syllable as stressed, their results do not bear on our
experiment. It is not clear why male speech should be biased for stress towards a different syllable than female
speech, and neither is it clear that any overall effects of formant alteration on loudness should be biased
towards either of the syllables.
The authors would like to thank reviewer Peter Assmann and an anonymous reviewer for helpful comments on
an earlier version of this paper.
References
Beckman, M. E. (1995) Local shapes and global trends. Proceedings International Conference of Phonetic Sciences, Stockholm, Vol. II, 100–107
Bezooijen, R. van (1996) Socio-cultural aspects of pitch differences between Japanese and Dutch women. Language and Speech, 38, 253–265
Elmlund, M., Frehr, I. & Petersen, N. H. (1992) Formant transformation from male to female synthetic voices. Proceedings International Conference on Speech and Language Processing, Banff, Vol. II, 1187–1190
Glave, R. D. & Rietveld, A. C. M. (1975) Is the effort dependence of speech loudness explicable on the basis of acoustical cues? Journal of the Acoustical Society of America, 58, 875–879
Gussenhoven, C. & Rietveld, T. (1997) Empirical evidence for the contrast between L* and H* in Dutch rising contours. In A. Botinis et al. (eds), Intonation: Theory, Models and Applications, Proceedings of an ESCA Workshop. Grenoble: European Speech Communication Association, 169–172
Gussenhoven, C., Repp, B. H., Rietveld, A., Rump, H. H. & Terken, J. (1997) The perceptual prominence of fundamental frequency peaks. Journal of the Acoustical Society of America, 102, 3009–3022
Heuven, V. J. van & Menert, L. (1996) Why stress position bias? Journal of the Acoustical Society of America, 100, 2439–2451
Ladefoged, P. & Broadbent, D. E. (1957) Information conveyed by vowels. Journal of the Acoustical Society of America, 29, 98–104
Ladd, D. R. & Morton, R. (1997) The perception of intonational emphasis: continuous or categorical? Journal of Phonetics, 25, 313–342
Leather, J. (1983) Speaker normalization in perception of lexical tone. Journal of Phonetics, 11, 373–382
Pierrehumbert, J. (1979) The perception of fundamental frequency declination. Journal of the Acoustical Society of America, 66, 363–369
Pierrehumbert, J. (1980) The phonetics and phonology of English intonation. PhD dissertation, MIT. Published by Garland Press, New York, 1990
Rietveld, A. C. M. & Gussenhoven, C. (1985) On the relation between pitch excursion size and prominence. Journal of Phonetics, 13, 299–308
Shriberg, E. E., Ladd, D. R., Terken, J. & Stolcke, A. (1996) Modelling pitch range variation within and across speakers: Predicting F0 targets when “speaking up”. In Proceedings of the International Conference on Spoken Language Processing, Philadelphia. Supplement pp. 1–4
Silverman, K. A. E. (1987) The structure and processing of fundamental frequency contours. PhD dissertation, Cambridge
Streefkerk, B. M., Pols, L. C. W. & Ten Bosch, L. F. M. (1997) Prominence in read aloud sentences, as marked by listeners and classified automatically. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam, 21, 101–116
Terken, J. (1991) Fundamental frequency and perceived prominence of accented syllables. Journal of the Acoustical Society of America, 89, 1768–1776
Traunmüller, H. (1988) Paralinguistic variation and invariance in the characteristic frequencies of vowels. Phonetica, 45, 1–29
Traunmüller, H. & Eriksson, A. (1995) The perceptual evaluation of F0 excursions in speech as evidenced in liveliness estimations. Journal of the Acoustical Society of America, 97, 1905–1915
Vogten, L. (1985) Handleiding No. 67. LVS-Speech Processing Programs on IPO VAX 11/780. Eindhoven: Institute of Perception Research