The roles of fundamental frequency contours and sentence context

Wang et al.: JASA Express Letters
[http://dx.doi.org/10.1121/1.4811159]
Published Online 14 June 2013
The roles of fundamental frequency contours and
sentence context in Mandarin Chinese speech
intelligibility
Jiuju Wang and Hua Shu
State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University,
Beijing 100875, China
[email protected], [email protected]
Linjun Zhanga)
College of Chinese Studies, Beijing Language and Culture University, Beijing 100083,
China
[email protected]
Zhaoxing Liu
Beijing Chinese Language and Culture College, Beijing 100037, China
[email protected]
Yang Zhang
Department of Speech-Language-Hearing Sciences and Center for Neurobehavioral
Development, University of Minnesota, Minneapolis, Minnesota 55455
[email protected]
Abstract: Flattening the fundamental frequency (F0) contours of
Mandarin Chinese sentences reduces their intelligibility in noise but not
in quiet. It is unclear, however, how the absence of primary acoustic cue
for lexical tones might be compensated with the top-down information
of sentence context. In this study, speech intelligibility was evaluated
when participants listened to sentences and word lists with or without
F0 variations in quiet and noise. The results showed that sentence context partially explained the unchanged intelligibility of monotonous
Chinese sentences in quiet and further indicate that F0 variations and
sentence context act in concert during speech comprehension.
C 2013 Acoustical Society of America
V
PACS numbers: 43.71.Gv [AC]
Date Received: January 9, 2013
Date Accepted: May 28, 2013
Speech perception includes both top-down and bottom-up processes, and phonemic as
well as lexical identification is heavily influenced by context in degraded conditions
(Hickok and Poeppel, 2007). Although F0 contour, the primary prosodic feature, does
not determine segmental consonant or vowel identity by itself, it has a variety of linguistic functions, including marking emphasis and phrase boundaries. Previous studies
examining the effects of F0 contours on the intelligibility of English sentences produced
by healthy (Wingfield et al., 1984; Laures and Weismer, 1999) or hearing-impaired
speakers (Maassen and Povel, 1984) have found that the absence of F0 variation
decreases the intelligibility of speech compared with normal F0 variation.
In tonal languages like Mandarin Chinese, lexical tone contrasts are phonologically as important as phonemic contrasts. That is, the F0 contours for Chinese lexical
tones distinguish lexical meanings from otherwise identical strings of phonemes.
a)
Author to whom correspondence should be addressed.
J. Acoust. Soc. Am. 134 (1), July 2013
C 2013 Acoustical Society of America
V
EL91
Author's complimentary copy
1. Introduction
Wang et al.: JASA Express Letters
[http://dx.doi.org/10.1121/1.4811159]
Published Online 14 June 2013
However, recent studies with Mandarin Chinese materials surprisingly showed that
monotonous sentences with flattened F0 contours were just as intelligible as normal
sentences with natural F0 patterns in a quiet listening condition (Patel et al., 2010; Xu
et al., 2013). Because the speech materials in these studies used only normal sentences,
it is impossible to determine what cues (e.g., the remaining secondary acoustic cues for
lexical tones or higher-level semantic information based on the sentential context) are
utilized to comprehend the speech when the normal pitch patterns are flattened.
The first motivation for the present study was to examine the role of sentence
context in Mandarin Chinese intelligibility in a quiet listening condition. We manipulated the top-down information variable and obtained intelligibility scores using both
normal and word list sentences with natural or flat F0 contours. If high-level semantic
information plays an important role, the intelligibility of pitch-flattened word list
“sentences” would be substantially lower than that of the normal sentence counterparts
because the word list sentences are designed to be semantically incoherent to remove
the top-down contextual information. On the contrary, if low-level acoustic information is the major determinant for speech perception in quiet, the intelligibility of word
list sentences with flattened F0 contour would be comparable to that of the normal
sentences.
For speech-in-noise listening conditions, previous studies have consistently
demonstrated the importance of dynamic F0 contours to speech intelligibility regardless
of whether the target language is a tonal language or not (Laures and Bunton, 2003;
Binns and Culling, 2007; Patel et al., 2010; Miller et al., 2010). The results show that
naturally varying F0 contours improve speech intelligibility in background noise compared with flat or inverted F0 contours. The explanations proposed for these findings
maintain that dynamic changes in F0 direct the listener’s attention to the content words
of the utterance and assist with the segmentation of words in continuous speech.
Unchanging F0 lowers intelligibility of the utterance because it reduces the contrast
between words and makes it more difficult to parse continuous speech into meaningful
units (Laures and Bunton, 2003; Binns and Culling, 2007). As with the previous studies
that examined the intelligibility of speech in quiet, the speech-in-noise investigations
did not look into the contribution that semantic information provided by sentence context might make to speech intelligibility.
The objective of the present study was to investigate the role of sentence context
and the interaction between sentence context and F0 contours during Mandarin Chinese
speech comprehension in both quiet and noise. We hypothesized that sentence context
would contribute to the intelligibility of Chinese speech in all conditions, which partially
accounts for the unimpaired intelligibility of pitch flattened sentences in quiet. Furthermore,
different signal-to-noise ratios (SNRs) associated with the listening conditions (in quiet vs in
noise) might modulate the interaction between sentence context and F0 contours.
2. Methods
One hundred and fifty-six undergraduate participants from Beijing Normal University
were recruited. Five participants were omitted from the final analysis—three reported a
hearing disorder in subject screening, and the other two were omitted due to computer
error during data collection. The remaining 151 participants were all native Chinese
speakers between the ages of 18 and 23, and all had hearing sensitivity 20 dB hearing
level for octave frequencies between 250 and 8000 Hz bilaterally. In order to avoid the
stimulus order effects, we adopted a mixed-design with F0 contours (normal vs flat)
and background noise (quiet, SNR ¼ þ5 dB and 5 dB) as between-subject factors and
sentence context as a within-subject factor. The subject distribution in the betweensubject conditions was as follows: Quiet/normal F0 condition, number of subjects
(n) ¼ 26; quiet/flat F0 condition, n ¼ 25; SNR ¼ þ5 dB/normal F0 condition, n ¼ 25;
EL92 J. Acoust. Soc. Am. 134 (1), July 2013
Wang et al.: Chinese speech intelligibility
Author's complimentary copy
2.1 Subjects
Wang et al.: JASA Express Letters
[http://dx.doi.org/10.1121/1.4811159]
Published Online 14 June 2013
SNR ¼ þ5 dB/flat F0 condition, n ¼ 24; SNR ¼ 5 dB/normal F0 condition, n ¼ 25;
SNR ¼ 5 dB/flat F0 condition, n ¼ 26.
2.2 Materials
Fig. 1. Acoustic features of sample speech stimuli. Broadband spectrograms (SPG: 0 to 5 kHz), intensity envelopes (INT: 50 to 100 dB), and fundamental frequency contours (F0: 0 to 500 Hz) are displayed for (A) normal
sentence and its pitch-flattened counterpart; (B) word list sentence and its pitch-flattened counterpart.
J. Acoust. Soc. Am. 134 (1), July 2013
Wang et al.: Chinese speech intelligibility
EL93
Author's complimentary copy
To manipulate the F0 and sentence context effects, four types of target sentences were
created: Normal sentences and word list sentences with naturally intonated or unnaturally monotonous contours. The normal sentences were 20 declarative Chinese sentences
with a variety of topics, and each sentence was comprised of 5 to 9 words. Words
from the entire pool of the normal sentences were pseudo-randomly selected to form
the word list sentences, which were syntactically anomalous and semantically meaningless at the whole sentence level. They were matched in length (number of syllables)
with the normal sentences. The normal sentences and word list sentences were read by
a male native speaker of Chinese. Manipulation of F0 was done using Praat (Institute
of Phonetic Sciences, University of Amsterdam; downloadable at www.praat.org). A
flat F0 contour was created for each sentence at the sentence’s mean F0 and the resulting monotonous sentence was resynthesized using the PSOLA method (Fig. 1).
Consonant-misplaced sentences were used as masker stimuli. These sentences
were constructed by replacing the onset consonant of each syllable in the normal sentences with another consonant, provided that the replacement did not violate the phonotactic rules of Chinese. These consonant-misplaced sentences were syntactically
anomalous, unintelligible at both lexical and sentential levels. The masker sentences
were read by a female native speaker of Chinese. The choice of a male target speaker
and a female masking speaker was to enable the clear instruction “listen to the male
speaker” to be used throughout, rather than to train the subjects on the identity of the
target speaker (Scott et al., 2004).
Each target sentence was combined with masker noise at 2 SNR levels (þ5
and 5 dB). The level of the target sentences was fixed at 70 dB sound pressure level
and the level of the competing speech masker varied around the level of the target
speech. The masker speech was edited to be, on average, 1 s longer than the target
speech (500 ms prior to the beginning and 500 ms at the end of the target sentence) so
that no part of the speech target was unmasked.
Wang et al.: JASA Express Letters
[http://dx.doi.org/10.1121/1.4811159]
Published Online 14 June 2013
2.3 Procedure
Listeners were tested individually in a sound-attenuated booth facing a computer monitor. Stimuli were presented via loudspeakers (Edifier R18, Edifier Technology Co.
Ltd., Beijing, China). Because sentence context was the only within-subject factor,
each listener was presented with a total of 40 trials—20 normal sentences and 20 word
list sentences with natural or flat F0 contours in one background condition (quiet or
one SNR level). Listeners were instructed that they would be listening to sentences in
quiet or noise, and were asked to write down the words read by the male speaker. The
task was self-paced; listeners pressed a key to advance from trial to trial. Each sentence
could be heard only once. The first author scored the written responses. Incorrect or
omitted words were annotated, and performance scores were checked by an independent auditor blind to the experiment. Practice sentences were provided before the experiment, sampling all conditions. After the practice block, the experimenter checked the
readability of the participant’s handwriting.
3. Results
Intelligibility was determined by a keyword-correct count (Scott et al., 2004). The number of correct keywords (content words, varied across sentences from 4 to 7) identified
by each listener was counted and then converted to the percentage of the total number
of words and averaged across listeners. A 2 2 3 repeated measures analysis of variance (ANOVA), with sentence context as the within-subject factor and F0 contours
and background noise as the between-subject factors, was carried out. Results showed
that the three main effects were all highly significant [sentence context: F(1,
149) ¼ 414.06, p < 0.001, g2 ¼ 0.741; F0 contours: F(1, 149) ¼ 203.48, p < 0.001,
g2 ¼ 0.584; background noise: F(2, 148) ¼ 377.53, p < 0.001, g2 ¼ 0.839], revealing that
intelligibility was reduced by the lack of sentence context and natural contours, and in
the presence of background noise (Fig. 2).
Further analyses comparing all possible 2- and 3-way ANOVAs showed that
the interactions were all significant [p values < 0.001, g2 0.130 for all the 2-way interactions and p ¼ 0.002, g2 ¼ 0.08 for the 3-way interaction]. The significant interactions
revealed that intelligibility disproportionately decreased with increasing noise for flatcontour sentences versus natural-contour sentences as well as for word list sentences
versus normal sentences. Furthermore, intelligibility degraded to a greater extent for
word list sentences than normal sentences when the F0 patterns of the two types of sentences changed from natural contours to flat contours (Fig. 3). Bonferroni adjusted
post hoc pairwise comparisons revealed that intelligibility was significantly different for
most contrasting pairs of interest. The four exceptional pairs included normal sentences
with natural contours versus normal sentences with flat contours in quiet, normal sentences with natural contours in quiet versus their counterparts in 5 dB SNR background noise, word list sentences with nature contours versus normal sentences with
flat contours both in quiet and 5 dB SNR background noise.
The present study aimed to investigate the roles of sentence context, F0 contour, background noise, and the interactions among the factors in the intelligibility of Mandarin
Chinese speech. The significant main effects indicate that all three factors contribute to
the intelligibility of Chinese speech. That is, lack of sentence context and natural F0
variations, and the presence of background noise all lead to a decrease in intelligibility,
respectively. The significant interactions indicate that the contributions that sentence
context and F0 contours make to the intelligibility of Chinese speech are modulated by
the background noise conditions. Specifically, when presented in quiet, normal sentences with flat F0 contours are as intelligible as their counterparts with natural F0 contours. However, when presented in noise, flattening the F0 contours of normal sentences dramatically reduced the intelligibility compared with the mild decrease in
EL94 J. Acoust. Soc. Am. 134 (1), July 2013
Wang et al.: Chinese speech intelligibility
Author's complimentary copy
4. Discussion
Wang et al.: JASA Express Letters
[http://dx.doi.org/10.1121/1.4811159]
Published Online 14 June 2013
Fig. 2. Word-report scores sorted by the main effects of factors. Error bars represent standard deviation across
subjects.
Fig. 3. Word-report scores sorted by the interaction effects.
J. Acoust. Soc. Am. 134 (1), July 2013
Wang et al.: Chinese speech intelligibility
EL95
Author's complimentary copy
intelligibility for normal sentences with natural F0 contours. These results are consistent with previous findings (Patel et al., 2010; Xu et al., 2013), highlighting the importance of natural F0 contours for sentence intelligibility in noise. At the same time,
word list sentences with natural F0 contours were less intelligible than their normal sentence counterparts whether presented in quiet or noise, highlighting the contribution of
sentence context to speech intelligibility irrespective of the background conditions.
One issue of particular interest to the present study is to what extent the unimpaired intelligibility of sentences with flat F0 contours in quiet is attributed to the
remaining phonetic cues or semantic information from the rest of the sentence.
Although F0 contour is the primary acoustic cue for lexical tone perception, other cues
(e.g., duration and amplitude) are also utilized by native Chinese speakers to recognize
the types of tones (Wise and Chong, 1957; Liu and Samuel, 2004). It is also welldocumented that semantic information facilitates spoken word recognition and speech
comprehension (McClelland and Elman, 1986; Liu and Samuel, 2007). Because only
two types of speech materials, i.e., normal sentences with and without natural F0
Wang et al.: JASA Express Letters
[http://dx.doi.org/10.1121/1.4811159]
Published Online 14 June 2013
contours, were used in the previous studies (Patel et al., 2010; Xu et al., 2013), it was
difficult to separate the effects of high-level semantic and low-level acoustic information on intelligibility. In the present study, the result that absence of sentence context
reduced the intelligibility indicates that sentence context partially accounts for the
unimpaired intelligibility of monotonous Chinese sentences in quiet. Furthermore,
when both sentence context and natural F0 variations were deprived, the speech material was least intelligible whether in quiet or noise background conditions. This finding
indicates that F0 variations and sentence context act in concert during Chinese speech
comprehension. When word list sentences with natural F0 contours were directly compared with normal sentences with flat F0 contours, their intelligibility was similar in
the quiet and 5 dB SNR noise conditions, whereas the intelligibility of word list sentences with natural F0 contours was slightly higher than the normal sentences with flat F0
contours in the 5 dB SNR noise condition. This result indicates that the relative importance of sentence context and F0 contours for speech intelligibility depends on background conditions.
The test materials, especially the word list sentences and sentences with flat F0
contours were specially designed for this experiment. The lack of ecological validity of
the materials may limit the explanation of our results. However, in a recent study,
Feng et al. (2012) found that although lexical tone recognition for sine-wave Chinese
monosyllabic words is poor, the recognition accuracy for sine-wave sentences is very
high, reflecting the compensation effect of contextual information when the tonal information is poor. Our results, together with the findings of Feng et al. (2012), indicate
that the functional load of lexical tones on sentence comprehension is limited. That is,
lexical meaning access is possible in a sentence context when surface pitch patterns of
tones are altered, although lexical tones are as important as segmental phonemes in
specifying the meaning of a word in a tone language.
In a recent fMRI study, Xu et al. (2013) found that Mandarin Chinese sentences with natural or flat F0 contours elicited similar activation in the lexical-semantic
processing areas (e.g., the left insular, middle, and inferior temporal gyri) and at the
same time, monotonous sentences elicited greater activation in the left planum temporale than sentences with natural F0 contours. These results demonstrate that lexical
meaning can still be accessed in pitch-flattened Chinese sentences, and that this process
is realized by automatic recovery of the phonological representations of lexical tones
from the altered tonal patterns. Our results are consistent with the findings of Xu et al.
(2013), supporting the models which include an important role for top-down information in guiding speech perception (e.g., Hickok and Poeppel, 2007), given that listeners
can automatically use additional neural and cognitive resources to recover distorted
tonal patterns in sentences. As there are significant interactions among the three factors
(i.e., sentence context, F0 variation, and background noise) in the intelligibility of
Chinese speech, further studies are needed to better understand the brain mechanisms
involved in these cognitive processes.
Acknowledgments
The research was supported by grants from the Humanities and Social Sciences
Foundation (Projects for Young Scholars) of the Chinese Ministry of Education (Grant
No. 10YJCZH223) to L.J.Z., and from the Natural Science Foundation of China (Grant
No. 31271082), the Natural Science Foundation of Beijing (Grant No. 7132119), and the
Fundamental Research Fund for the Central Universities to H.S.
Binns, C., and Culling, J. F. (2007). “The role of fundamental frequency contours in the perception of
speech against interfering speech,” J. Acoust. Soc. Am. 122(3), 1765–1776.
Feng, Y. M., Xu, L., Zhou, N., Yang, G., and Yin, S. K. (2012). “Sine-wave speech recognition in a tonal
language,” J. Acoust. Soc. Am. 131(2), EL133–EL138.
EL96 J. Acoust. Soc. Am. 134 (1), July 2013
Wang et al.: Chinese speech intelligibility
Author's complimentary copy
References and links
Wang et al.: JASA Express Letters
[http://dx.doi.org/10.1121/1.4811159]
Published Online 14 June 2013
J. Acoust. Soc. Am. 134 (1), July 2013
Wang et al.: Chinese speech intelligibility
EL97
Author's complimentary copy
Hickok, G., and Poeppel, D. (2007). “The cortical organization of speech processing,” Nat. Rev. Neurosci.
8(5), 393–402.
Laures, J. S., and Bunton, K. (2003). “Perceptual effects of a flattened fundamental frequency at the
sentence level under different listening conditions,” J. Commun. Disord. 36(6), 449–464.
Laures, J. S., and Weismer, G. (1999). “The effects of a flattened fundamental frequency on intelligibility
at the sentence level,” J. Speech Lang. Hear. Res. 42(5), 1148–1156.
Liu, S., and Samuel, A. G. (2004). “Perception of Mandarin lexical tones when F0 information is
neutralized,” Lang Speech 47(2), 109–138.
Liu, S., and Samuel, A. G. (2007). “The role of Mandarin lexical tones in lexical access under different
contextual conditions,” Lang. Cognit. Processes 22(4), 566–594.
Maassen, B., and Povel, D. (1984). “The effect of correcting fundamental frequency on the intelligibility of
deaf speech and its interaction with temporal aspects,” J. Acoust. Soc. Am. 76, 1673–1681.
McClelland, J. L., and Elman, J. (1986). “The TRACE model of speech perception,” Cogn. Psychol. 18(1),
1–86.
Miller, S. E., Schlauch, R. S., and Watson, P. J. (2010). “The effects of fundamental frequency contour
manipulations on speech intelligibility in background noise,” J. Acoust. Soc. Am. 128(1), 435–443.
Patel, A. D., Xu, Y., and Wang, B. (2010). “The role of F0 variation in the intelligibility of Mandarin
sentences,” in Proceedings of Speech Prosody 2010 (Chicago, IL).
Scott, S. K., Rosen, S., Wickham, L., and Wise, R. J. (2004). “A positron emission tomography study of
the neural basis of informational and energetic masking effects in speech perception,” J. Acoust. Soc. Am.
115(2), 813–821.
Wingfield, A., Lombardi, L., and Sokol, S. (1984). “Prosodic features and the intelligibility of accelerated
speech: Syntactic versus periodic segmentation,” J. Speech Hear. Res. 27, 128–134.
Wise, C. M., and Chong, L. P.-H. (1957). “Intelligibility of whispering in a tone language,” J. Speech
Hear. Disord. 22(3), 335–338.
Xu, G., Zhang, L., Shu, H., Wang, X., and Li, P. (2013). “Access to lexical meaning in pitch-flattened
Chinese sentences: An fMRI study,” Neuropsychologia 51(3), 550–556.