Processing prosodic boundaries in natural and hummed speech. An fMRI study
Anja K. Ischebeck1,2, Angela D. Friederici2, Kai Alter2,3
1 Innsbruck Medical University, Clinical Department of Neurology, Austria
2 Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
3 Newcastle University Medical School, Newcastle upon Tyne, U.K.
Corresponding author:
Anja Ischebeck, PhD
Innsbruck Medical University
Anichstrasse 35
6020 – Innsbruck (Austria)
tel.: +43 512 504 23661
e-mail: [email protected]
Abstract
Speech contains prosodic cues such as pauses between different phrases of a sentence. These
intonational phrase boundaries (IPBs) elicit a specific component in ERP studies, the so-called
closure positive shift (CPS). The aim of the present fMRI study is to identify the neural
correlates of this prosody-related component in sentences containing segmental and prosodic
information (natural speech) and in hummed sentences containing only prosodic information.
Sentences with two IPBs, in both normal and hummed speech, activated the middle superior
temporal gyrus, the Rolandic operculum and the gyrus of Heschl more strongly than sentences
with one IPB. The results from a region of interest (ROI) analysis of auditory cortex and
auditory association areas suggest that the posterior Rolandic operculum, in particular,
supports the processing of prosodic information. A comparison of natural speech and hummed
sentences revealed a number of left-hemispheric areas within the temporal lobe as well as in
the frontal and parietal lobe that were activated more strongly for natural speech than for
hummed sentences. These areas constitute the neural network for the processing of natural
speech. The finding that no area was activated more strongly for hummed sentences compared
to natural speech suggests that prosody is an integral part of natural speech.
Introduction
The speech melody of an utterance can carry information that is critically important to
understand the meaning of a sentence (see, for a review, Frazier et al., 2006; Friederici &
Alter, 2004). In intonational languages such as German, Dutch, and English, sentence-level
prosodic information is conveyed mainly by the pitch contour of an utterance and the presence
of speech pauses. Sentences usually contain one or more major intonational phrases (IPh;
Selkirk, 1995) that can be separated by speech pauses; syntactically relevant speech breaks of
this kind are also referred to as intonational phrase boundaries (IPBs). In studies using an
event-related brain potential (ERP) paradigm, IPBs were observed to give rise to a positive
shift in the EEG signal that is referred to as the closure positive shift or CPS component
(Steinhauer et al., 1999), a component interpreted as being specifically related to the prosodic
information contained in IPBs. The present fMRI study attempts to identify the brain regions
that are involved in the processing of IPBs.
IPBs are often employed by the speaker to clarify the structure of an otherwise
syntactically ambiguous sentence. A sentence like 'Before Ben starts # the day dreaming has
to stop' has a different meaning than 'Before Ben starts the day # dreaming has to stop' (#
indicating a break). IPBs often separate major intonational phrases and correspond to
major syntactic boundaries (Cooper & Paccia-Cooper, 1981). They represent a high level in
the so-called prosodic hierarchy (Selkirk, 1995, 2000; Nespor & Vogel, 1983). In
psycholinguistic experiments, IPBs were shown to help resolve late closure ambiguities
(Schafer et al., 2000; Grabe et al., 1994).
Experimental evidence suggests that humans can make use of the prosodic information
contained in IPBs even in the absence of semantic or syntactic information. Behaviorally, it
has been shown that listeners are able to detect major prosodic boundaries in meaningless
speech materials, such as, for example, reiterant speech (i.e., a sentence spoken as a string of
repeated syllables while preserving the original prosodic contour) (de Rooij, 1976), spectrally
scrambled and low-pass filtered speech (de Rooij, 1975; Kreiman, 1982), and hummed
sentences ('t Hart & Cohen, 1990). It should be noted that the rationale for using stimuli of
this kind is based on the assumption that speech is separable into layers, such as semantics,
syntax and prosody. This assumption, however, may only be an approximation, as it has been
argued that the possibility of separating these layers is inherently limited (Austin, 1975;
Searle, 1969).
As noted above, in studies using an event-related brain potential (ERP) paradigm, IPBs
were observed to give rise to a positive shift in the EEG signal, the closure positive shift
(CPS; Steinhauer et al., 1999). Subsequent studies indicated that the CPS is specifically
related to the prosodic aspects of an IPB, as it was also observed for sentence materials that
were stripped of semantic information, such as pseudoword sentences, and for sentence
materials with reduced or absent segmental information, such as hummed sentences
(Pannekamp et al., 2005), and filtered speech materials (Steinhauer & Friederici, 2001). The
CPS component is typically distributed bilaterally with a central maximum and shifts to the
right for hummed sentences. However, due to the intrinsic difficulty of source localization in
EEG studies, it is unclear which brain regions generate the CPS component.
To our knowledge, only one imaging study so far has investigated the processing of IPBs
in speech (Strelnikov et al., 2006). Strelnikov et al. compared sentence materials in Russian
that contained an IPB (e.g. 'To chop not # to saw', meaning One should not chop but saw)
with sentences that did not contain an IPB ('Father bought him a coat'). Comparing sentences
with IPB ('segmented') to sentences without IPB ('not segmented'), stronger activation was
observed within the right posterior prefrontal cortex and an area within the right cerebellum.
In the reverse comparison, stronger activation was observed in the gyrus of Heschl,
bilaterally, and the left sylvian sulcus. However, the two types of sentence materials were
used in two different tasks, thus confounding stimulus type and task. Although the comparison
between segmented and non-segmented speech materials very likely yielded brain areas that
are relevant for prosody processing, it cannot be excluded that differences due to the tasks
contributed to the results.
Ideally, a study investigating the processing of IPBs should compare conditions that
do not differ in any other respect than the presence or absence of IPBs, keeping everything
else constant. In the electrophysiological studies reviewed above (e.g., Steinhauer et al.,
1999), the same task was used on sentence materials that had either one or two IPBs. These
sentence materials were carefully constructed as sentence pairs with the same or similar
words. It should be noted that the IPBs in these sentences were obligatory, entailing
differences with regard to the syntactic and semantic structure between the two sentences of
such a pair. However, these differences only play a role when the sentence materials are
presented naturally spoken, but not when their segmental content is removed by filtering or
humming.
Hummed speech has the advantage that it preserves the prosody of natural speech
while removing major aspects of the semantic and syntactic information of the utterance.
Human speakers have been shown to be able to selectively preserve the original prosodic
contour of an utterance. When asked to produce reiterant speech (i.e., selectively preserving
the prosodic contour of a meaningful utterance by repeating a syllable), the resultant
utterances preserved the prosodic aspects of normal speech, such as duration and pitch
(Larkey, 1983), as well as accentuation and boundary marking (Rilliard & Aubergé, 1998).
When utterances are hummed, the hummed version of an utterance has also been shown to
preserve the pitch contour and duration of the naturally spoken version (Pannekamp et al.,
2005). Most importantly, the CPS component Pannekamp et al. observed at IPBs within
hummed speech materials was similar to the CPS observed for natural speech, but more
lateralized to the right hemisphere.
Hummed speech has the additional advantage that it is a familiar human vocalization.
Unlike speech materials that are rendered unintelligible artificially and sound more
unfamiliar (Scott et al., 2000; Meyer et al., 2004), the known unintelligibility of humming
effectively prevents participants from any attempt to decipher the original speech content of
the signal.
In the present functional magnetic resonance imaging (fMRI) study, sentence pairs
containing one or two IPBs were presented auditorily to the participants as natural speech and
as hummed sentences. Similar to previous electrophysiological studies (Steinhauer et al. 1999;
Pannekamp et al., 2005; Isel et al., 2005), the sentence pairs were constructed using the same
or equivalent content words, except for one or two critical words. All sentences were
meaningful, syntactically correct and spoken with natural prosody. To ensure variability in the
sentence materials with regard to the position of the additional IPB, two types of sentence
pairs were constructed, one type with the additional IPB at an early position within the
sentence (type A), the other with the additional IPB at a later position in the sentence
(type B) (see Table 1 for examples of the sentence materials). The IPBs contained in the
sentences were obligatory, because such materials had been investigated in previous
electrophysiological studies: a CPS had been observed for sentences of type A (Steinhauer et
al., 1999; Pannekamp et al., 2005) as well as for coordination structures like type B
(Steinhauer, 2003). In the case of the naturally spoken sentences with obligatory IPBs,
observed activation differences might in part be due to the additional semantic and syntactic
differences between the sentences, rather than being solely due to the presence of IPBs. To
differentiate the processing of prosody from associated syntactic and semantic processing, a
hummed version of each sentence was also produced by a trained speaker who was instructed
to preserve the natural prosody of the original sentence.
The aim of the present study was to identify the neural structures involved in the
processing of sentence-level prosody by comparing sentences with two IPBs to sentences that
have only one IPB. To differentiate prosodic processing from syntactic and semantic
processing, differences between sentences with a different number of IPBs were investigated
separately for natural speech and hummed sentences.
With regard to the neural correlates of IPB processing we hypothesized that the
primary auditory cortices and the auditory association areas play an important role. These
areas are, among others, involved in the processing of complex auditory signals and speech
(see, for a review, Griffiths et al., 2004). The superior temporal gyrus has been observed to be
involved in the processing of prosody (Hesling et al., 2005; Doherty et al., 2004). Activation in
regions outside the temporal lobe has also been reported but appears to vary across different
studies. These activations were observed to depend on the specifics of the task (Tong et al.,
2005; Plante et al., 2002), the type of prosody involved (e.g., affective vs. linguistic,
Wildgruber et al., 2004) and the degree of propositional information contained in the stimulus
materials (Hesling et al., 2005; Tong et al., 2005; Gandour et al., 2002, 2004). It is possible
that areas outside the temporal lobe are also involved and that the auditory association areas
are only part of a more extended processing network. IPBs are realized by variations in the
prosody of an utterance. We therefore hypothesized that the superior temporal gyrus, among
others, will show a modulation, positive or negative, due to the presence of an additional IPB.
Furthermore, given the observed shift of the CPS from a bilateral distribution for
natural speech to a more right-hemispheric distribution for hummed sentences, a more
right-hemispheric lateralization for prosodic aspects may be observed. Evidence from patients
with brain lesions inspired a first rough hypothesis about prosody processing, namely, that the
right hemisphere plays a dominant role in prosody processing. Patients with lesions within the
left hemisphere often suffer from aphasia, while non-aphasic patients with right-hemispheric
damage seem to have difficulties perceiving or producing the prosodic aspects of speech (see,
for a review, Wong, 2002, but see Perkins et al., 1996). On the basis of this first rough
hypothesis, two more sophisticated classes of hypotheses have been developed. According to
the acoustic or cue-dependent class of hypotheses, hemispheric dominance is determined
solely by the acoustic properties of the auditory signal. Zatorre and Belin (2001), for example,
suggested that the left hemisphere is specialized in processing the high frequency components
that generate the vowels and consonants in speech, while the right hemisphere is specialized
in processing the low frequency patterns that make up the intonational contour of a syllable or
sentence (for a similar view, see Poeppel, 2003). According to the class of functional or
task-dependent hypotheses, hemispheric dominance depends on the function of prosodic
information (van Lancker, 1980) or on the attentional focus required, for example, by an
experimental task. A shift from the right hemisphere to the left is assumed to occur when the
task or function of the prosodic information contained in a speech signal involves language
processing rather than prosody processing. A review of the available empirical studies
suggests that lateralization is stimulus-dependent but can, in addition, vary as a function of
task (Friederici & Alter, 2004). It should be noted, however, that the first rough hypothesis of
right-hemispheric dominance of prosody is still under debate. If prosodic processes are mainly
subserved by the right hemisphere, we should find the right hemisphere activated in both
natural and hummed speech; if, moreover, prosody is represented neuronally as a separate
layer, a direct comparison of hummed and natural speech should result in more
left-hemispheric activation for natural speech.
Methods
Participants
Sixteen healthy right-handed young adults (8 female; mean age: 26.1 years, SD: 4.44)
took part in the experiment. The data of two participants were discarded from the analysis,
one because of scanner malfunction, the other because of too many errors. All participants
were or had been students of the University of Leipzig. They were native speakers of German
with normal hearing and had no history of neurological or psychiatric illness. Volunteers were
paid for their cooperation and had given written consent before the experiment. Ethical
approval for the present study was provided by the University of Leipzig.
Materials
Forty-eight German sentence pairs were constructed, each pairing a sentence with one
intonational phrase boundary (IPB) with a sentence with two. Ideally, the materials used
should only differ with regard to the
number of IPBs they contain. If possible, they should consist of the same words to ensure
similar lexical retrieval processes. They should also contain the same number of words and
syllables, so that they do not differ in length. To ensure some variability in the sentence
materials used with regard to the position of the additional IPB, two types of sentence pairs
were constructed. One type (A) had an early additional phrase boundary, one type (B) an
additional phrase boundary at a later position in the sentence (see Table 1 for examples of
the sentence materials). The IPBs contained in the sentences were obligatory because such
materials had been investigated in previous electrophysiological studies. Sentences of type A
had been used in earlier electrophysiological studies where they were observed to elicit the
CPS component (Steinhauer et al., 1999). These materials were also found to elicit the CPS
component when they were low-pass filtered (Steinhauer & Friederici, 2001) or hummed
(Pannekamp et al., 2005). In addition, coordination structures like the type B sentences used
here also have been investigated in previous electrophysiological studies (Steinhauer, 2003)
and a CPS was observed. The pairs of one and two IPB sentences of types A and B were as
similar as possible with regard to the words used. In type B, the same content words were
used. In type A, too, the same content words were used, with the exception of the verb, which
was either transitive or intransitive. To ensure comparability, the frequency and number of
syllables of the verb were matched within a type A sentence pair. We constructed these
sentence materials with the aim of creating conditions that do not differ by more than the
aspect
under scrutiny, namely, the number of IPBs. It should be noted that the sentence materials
constructed for this experiment come close to this ideal but are not perfect. The type A (and
type B) sentences with one or two IPBs additionally differ from each other with regard to
their syntactic structure and, in the case of type B, word order. One way to circumvent these
differences would have been to choose sentence materials with optional IPBs rather than
obligatory IPBs, as in the materials used here. While obligatory IPBs correspond to major
syntactic boundaries, optional IPBs are on a lower level of the prosodic hierarchy, namely the
level of phonological phrases (Selkirk, 1995). They do not necessarily correspond to syntactic
phrases (Truckenbrodt, 2005). Furthermore, optional IPBs depend on several factors such as
speech rate and hesitations (filled or unfilled pauses) which might make them more difficult
to detect for the listener. Although it is highly probable that optional IPBs also elicit a CPS
component, this has not yet been investigated. The choice of our materials was primarily
motivated by the aim of ensuring comparability with previous electrophysiological studies in
which a CPS component was observed. Although not optimal, we think that our materials are
sufficiently comparable to allow inferences on prosodic processing.
--- insert Table 1 about here ---
The sentences were spoken with natural prosody. Additionally, a hummed version of
each sentence was recorded. All materials were recorded by a trained female native speaker
of German. The speaker was instructed to produce the hummed version of each sentence after
its naturally spoken version, taking care to preserve the original prosody, speed and total
number of syllables of the normal sentence. In Figure 1, spectrograms and fundamental
frequency contours are given for a type A and a type B example sentence (hummed and
naturally spoken). The recorded materials were digitized (44.1 kHz) and normalized (70%)
with regard to amplitude envelope to ensure an equal volume for all stimuli.
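The amplitude normalization step can be illustrated with a minimal Python sketch (assuming
the numpy and soundfile libraries; the file layout and the reading of '70%' as peak amplitude
relative to full scale are illustrative assumptions, as the exact procedure is not specified here):

    import glob
    import numpy as np
    import soundfile as sf  # assumed audio I/O library; any WAV reader would do

    TARGET_PEAK = 0.70  # '70%' read as target peak relative to full scale (assumption)

    for path in glob.glob("stimuli/*.wav"):   # hypothetical file layout
        signal, rate = sf.read(path)          # rate: 44100 Hz after digitization
        peak = np.max(np.abs(signal))
        if peak > 0:
            signal = signal * (TARGET_PEAK / peak)  # equal peak level for all stimuli
        sf.write(path.replace(".wav", "_norm.wav"), signal, rate)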
--- insert Figure 1 about here ---
Design and Procedure
The task of the participants was to indicate, in a two-alternative forced-choice task, whether a
probe word presented after the sentence had been contained in the sentence or not. Probe
words were chosen for each sentence and belonged to different word classes (e.g., nouns,
verbs, determiners). The probes were spoken with final rising pitch, indicating a question. Of
the 96 sentences in total, 72 were selected as experimental materials, 18 as probe sentences,
and 6 as practice sentences. In each trial participants had to decide whether the probe word
had occurred in the sentence or not. 'Yes' ('No') answers were given with the middle (index)
finger of the right hand. Sentences from the experimental materials never contained the probe.
The probe sentences were not analyzed. In the case of the naturally spoken probe sentences,
the probe word was always contained in the sentence and a 'yes' answer was expected. In the
case of the hummed experimental sentence materials, the task was very easy. As soon as the
hummed sentence presentation finished, participants knew that 'no' had to be the answer. The
main purpose of the task in the case of the hummed sentence materials was to have
participants attend to the presentation of the sentence while it lasted. In the hummed probe
sentences, one word of the hummed sentence was naturally spoken. To prevent participants
from suspending attention as soon as they detected a naturally spoken word in the
hummed probe sentence, the probe word was identical to the naturally spoken word within the
probe sentence in only half of the trials. This ensured that participants had to wait until the
presentation of the probe word.
Naturally spoken and hummed sentence materials were presented in alternating blocks.
Every block consisted of 2 practice sentences, 24 experimental sentences, 6 probe sentences,
and 6 null events (i.e., trials in which no auditory stimulus was presented). The null events
were included to increase the efficiency of the design (Liu et al., 2001) and to provide a
baseline condition for analysis. The blocks of hummed materials were of the same structure.
The total experiment consisted of 6 blocks (3 hummed, 3 normal), that is, 228 trials in total.
Trials were randomized with first-order transition probabilities between conditions held
constant. Three randomizations were generated in total. Each trial began with the presentation
of a hummed or normal sentence. Then a probe word was presented 500 ms after the offset of
the sentence. After the presentation of the probe a waiting time was inserted to ensure a total
inter-trial interval (ITI) of 11 seconds (see Figure 2). With a TR of 3 s and an ITI of 11 s, trial
presentation and scanner triggering were synchronous every three trials. The beginning of
every third trial was therefore aligned to the scanner trigger. After this synchronization, a
random waiting time of 0 to 600 ms was inserted before the presentation of the sentence.
Stimulus presentation and response time recording were
controlled by a computer outside the scanner using Presentation software (Neurobehavioral
Systems Inc.). The experimental materials were presented over earphones within the scanner.
The participants were additionally protected from scanner noise by earplugs. They reported
that they had no difficulties understanding the sentences over the scanner noise. A button box
compatible with the scanner environment was used to record the responses of the participants.
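The timing and trial-count relations described above can be made concrete with a small
worked sketch (Python; the numbers follow the text, while the snippet itself is illustrative
and not the actual presentation script):

    TR = 3.0    # volume repetition time in seconds
    ITI = 11.0  # inter-trial interval in seconds

    # Three trials span 3 * 11 s = 33 s, which equals 11 * TR, so trial
    # onsets and volume acquisitions realign at the start of every third
    # trial; a random 0-600 ms jitter is then added before the sentence.
    assert 3 * ITI == 11 * TR

    # Block structure: 2 practice + 24 experimental + 6 probe + 6 null
    # trials per block; 6 blocks give the 228 trials reported in the text.
    trials_per_block = 2 + 24 + 6 + 6
    assert 6 * trials_per_block == 228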
--- insert Figure 2 about here ---
fMRI data acquisition
All imaging data were acquired using a 3T Bruker 30/100 Medspec MR scanner. A high
resolution structural scan using the 3D modified driven equilibrium Fourier transform
(MDEFT) imaging technique was also obtained (128 sagittal slices, 1.5 mm thickness, in-plane
resolution: 0.98 x 0.98 mm, field of view (FOV): 250 mm). Functional images were
acquired using a T2* gradient echo-planar imaging (EPI) sequence. For the functional
measurement, a noise-reduced sequence was chosen to ensure that noise would not impair
comprehension. Eighteen ascending axial slices per volume were obtained continuously, in
the plane of the anterior and posterior commissure (in-plane resolution of 3 x 3 mm, FOV: 192 mm, 5
mm thickness, 1 mm gap, echo time (TE): 40 ms, repetition time (TR): 3000 ms). The 18
slices covered the temporal lobes and part of the parietal lobes and cerebellum. According to a
short questionnaire given to the participants after the experiment, the noise level of the scanner
did not critically impair the auditory quality and comprehensibility of the naturally spoken or
the hummed stimulus materials.
Functional data analysis
fMRI data analysis was performed with the statistical parametric mapping (SPM2) software
(Wellcome Department of Cognitive Neurology, London, U.K.). The six blocks were
measured as separate runs with 142 volumes each. The first two images of each functional run
were discarded to ensure signal stabilization. The functional data of each participant were
motion-corrected. The structural image of each participant was registered to the time series of
functional images and normalized using the T1 template provided by SPM2, corresponding
approximately to Talairach & Tournoux Space (Talairach & Tournoux, 1988). The functional
images were normalized using the normalization parameters of the structural image and then
smoothed with a full-width half-maximum (FWHM) Gaussian kernel of 12 mm. A statistical
analysis on the basis of the general linear model was performed, as implemented in SPM2.
Though hummed and normal sentence materials were presented in blocks, an event-related
analysis was chosen. This made it possible to compare the sentences to the null events
interspersed within the blocks as a baseline, as well as to discard probe trials and error trials
from the analysis. The delta function of the trial onsets per condition was convolved with the
canonical form of the hemodynamic response function as given in SPM2 and with its first and
second temporal derivatives to generate model time courses for the different conditions. Each
trial was modelled in SPM2 using the beginning of the sentence as trial onset. Due to the
length of the auditorily presented sentence stimuli, the BOLD response was modelled with a
duration of 5.345 s (the average length of the sentences, hummed and natural). Errors and probe
sentences were modelled as separate conditions and excluded from the contrasts calculated.
The functional time series was high-pass filtered with a frequency cut-off of 1/80 Hz. No
global normalization was used. Motion parameters and the lengths of the individual sentences
per trial were entered into the analysis as parameters of no interest. For the random effects
analysis, the calculated contrast images from the first level analysis of each participant were
entered into a second level analysis.
ROI analysis
Eight regions of interest (ROIs) were created on the basis of the automated anatomical
labelling (AAL) atlas (Tzourio-Mazoyer et al., 2002): the superior temporal gyrus, the
Rolandic operculum, the gyrus of Heschl
and the inferior frontal gyrus (pars triangularis and pars opercularis). The ROI for the superior
temporal gyrus was divided into three parts, anterior (y > -15), middle (-35 < y < -15) and
posterior (y < -35), and the Rolandic operculum into two parts, anterior (y > -15) and
posterior (y < -15). A visualization of the temporal ROIs is given in Figure 3. The ROI
analysis was performed by averaging the effect sizes (contrast estimates) over all voxels for
the left and right ROIs per participant.
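The main steps of this analysis can be sketched in Python (using the nilearn and nibabel
libraries rather than SPM2, so function names and defaults differ from the analysis actually
performed; the file names and the events table below are illustrative assumptions):

    import numpy as np
    import pandas as pd
    import nibabel as nib
    from nibabel.affines import apply_affine
    from nilearn.glm.first_level import make_first_level_design_matrix

    TR = 3.0
    frame_times = np.arange(140) * TR  # 142 volumes per run minus 2 discarded images

    # Illustrative events table: one row per trial, modelled from sentence
    # onset with the average sentence duration of 5.345 s.
    events = pd.DataFrame({
        "onset": [11.0, 22.0, 33.0],  # hypothetical onsets
        "duration": [5.345] * 3,
        "trial_type": ["natural_2IPB", "natural_1IPB", "probe"],
    })

    # Canonical HRF plus derivative terms, high-pass filtered at 1/80 Hz.
    design = make_first_level_design_matrix(
        frame_times, events,
        hrf_model="spm + derivative + dispersion",
        high_pass=1.0 / 80,
    )

    # ROI definition: split an AAL mask (e.g., the superior temporal gyrus)
    # along the y axis into anterior (y > -15), middle (-35 < y < -15) and
    # posterior (y < -35) parts, then average contrast estimates per part.
    def split_roi_by_y(mask_img):
        voxels = np.argwhere(mask_img.get_fdata() > 0)
        y = apply_affine(mask_img.affine, voxels)[:, 1]  # voxel indices -> mm
        return {
            "anterior": voxels[y > -15],
            "middle": voxels[(y > -35) & (y < -15)],
            "posterior": voxels[y < -35],
        }

    # mask = nib.load("AAL_STG_left.nii")                 # hypothetical mask file
    # con = nib.load("con_2IPB_vs_1IPB.nii").get_fdata()  # hypothetical contrast image
    # means = {name: con[tuple(v.T)].mean()
    #          for name, v in split_roi_by_y(mask).items()}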
--- insert Figure 3 about here ---
Results
Behavioral
Participants were excluded from further analysis if they made more than 30% errors in any
condition. This led to the exclusion of one participant in addition to the participant who
was excluded due to scanner malfunction. Analyzing the data of the remaining fourteen
participants yielded 144 errors (5.7%) in total. Response times were measured from the
presentation of the probe word. Only correct responses to experimental trials, not probe trials,
within the interval of 200 to 2000 ms were analyzed. Response times were entered into a
repeated measures analysis of variance with SPEECH TYPE (natural vs. hummed) and IPB
(one vs. two IPBs) as factors. Participants reacted significantly faster to the hummed (473 ms,
SD = 46.05) than to the naturally spoken sentence materials (753 ms, SD = 85.36), yielding a
significant main effect of SPEECH TYPE (F(1,13) = 30.64, MSE = 35691, p < 0.001). The
number of IPBs did not influence response times; neither the main effect of IPB nor the
interaction SPEECH TYPE x IPB was significant.
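For completeness, a 2 x 2 repeated measures ANOVA of this kind can be sketched with the
statsmodels Python library (the data frame layout, with one mean response time per
participant and condition, and the file name are illustrative assumptions):

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # Expected layout: one row per participant (n = 14) and condition cell,
    # with columns subject, speech_type ('natural'/'hummed'),
    # ipb ('one'/'two') and rt (mean response time in ms).
    rt_data = pd.read_csv("mean_rts.csv")  # hypothetical file

    result = AnovaRM(rt_data, depvar="rt", subject="subject",
                     within=["speech_type", "ipb"]).fit()
    print(result)  # F and p values for both main effects and the interaction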
fMRI data: whole brain analysis
Comparisons of sentence materials with a different number of IPBs.
For natural speech, sentences with two IPBs activated the superior temporal gyrus,
extending to the Rolandic operculum and gyrus of Heschl, bilaterally, more strongly than
sentences with one IPB (Figure 4a, Table 2). In the corresponding comparison for hummed
sentences, a significant cluster was observed in the left supramarginal gyrus, extending to the
left superior temporal gyrus and the left gyrus of Heschl. No significant clusters were
observed in the reverse comparisons, for natural speech as well as for hummed sentences.
--- insert Figure 4 & Table 2 about here ---
Comparisons of the two basic types of materials (hummed sentences and natural speech) to
baseline
When natural speech was compared to baseline (null events), the strongest activations
were observed within the superior temporal gyrus, bilaterally, and the SMA (Table 2). Further
activations were observed within the right precentral gyrus, the right insula, the left superior
parietal lobule and in the cerebellum. When hummed sentences were compared to baseline,
activations were observed within the superior temporal gyrus, bilaterally, the precentral gyrus,
bilaterally, the SMA and the right middle frontal gyrus, as well as the left inferior parietal
lobule. Activations were also observed within the cerebellum, bilaterally.
Comparisons of hummed sentences to natural speech
Compared to hummed sentences, natural speech activated the inferior frontal gyrus and middle
temporal gyrus, bilaterally but with a left-hemispheric predominance, as well as the left
angular gyrus and the left thalamus and caudate, more strongly (Figure 4b, Table 2). No brain
area was significantly more activated for hummed sentences than for natural speech.
fMRI data: ROI analysis
To further investigate and compare the behaviour of the brain areas potentially involved in
IPB processing, we conducted an ROI-analysis over eight ROIs (pars opercularis and
triangularis of the inferior frontal gyrus, anterior, middle and posterior part of the superior
temporal gyrus, gyrus of Heschl, and anterior and posterior Rolandic operculum). The effect
sizes for each condition were averaged over all voxels contained within each ROI per side and
participant. The results per ROI, hemisphere and condition are shown in Figure 5. They were
entered into a repeated measures analysis of variance with ROI (8), HEMISPHERE (left vs.
right), IPB (1 vs. 2 IPBs), and SPEECH TYPE (naturally spoken vs. hummed) as factors. The
left hemisphere was on average more strongly activated than the right hemisphere, which is
reflected in a significant main effect of HEMISPHERE (F(1,13) = 49.53, MSE = 2.279, p <
0.001). Natural speech activated the brain areas investigated more strongly than hummed
speech, yielding a significant main effect of SPEECH TYPE (F(1,13) = 11.93, MSE = 4.415,
p < 0.01). More activation was observed for sentences containing two intonational phrase
boundaries than sentences containing one intonational phrase boundary giving a main effect
of IPB (F(1,13) = 17.45, MSE = 0.385, p < 0.01). The different overall activation levels
within the ROIs yielded a significant main effect of ROI (F(7,91) = 72.71, MSE = 1.435, p <
0.001). SPEECH TYPE and IPB yielded additive effects, as none of the interactions
containing both factors reached significance. The type of speech materials (hummed or
natural) influenced lateralization, yielding a significant SPEECH TYPE x HEMISPHERE
interaction (F(1,13) = 30.59, MSE = 0.419, p < 0.001); this influence depended on the ROI,
giving a significant triple ROI x SPEECH TYPE x HEMISPHERE interaction (F(7,91) = 3.92,
MSE = 0.103, p < 0.001). Lateralization, the type of materials, and the number of IPBs
influenced activation differently in different ROIs, giving the significant two-way interactions
ROI x HEMISPHERE (F(7,91) = 19.83, MSE = 0.437, p < 0.001), ROI x SPEECH TYPE (F(7,91) =
6.40, MSE = 0.175, p < 0.001) and ROI x IPB (F(7,91) = 16.00, MSE = 0.018, p < 0.001).
The significant triple interactions ROI x HEMISPHERE x IPB (F(7,91) = 6.14, MSE
= 0.007, p < 0.001) and ROI x HEMISPHERE x SPEECH TYPE (F(7,91) = 3.92, MSE =
0.103, p < 0.001) indicate that SPEECH TYPE and IPB each interacted with hemisphere
differently in different ROIs. The remaining interactions were not significant.
To characterize these differences between ROIs more closely, we conducted additional
ANOVAs separately for different ROIs. Significant main effects for IPB were observed in all
ROIs except the inferior frontal gyrus ROIs. The main effect of SPEECH TYPE was
significant in all ROIs but the anterior and posterior Rolandic operculum. Significant main
effects for HEMISPHERE were observed in all ROIs with the exception of the anterior STG
and the anterior Rolandic operculum. The HEMISPHERE x SPEECH TYPE interaction was
significant in all but three ROIs, the anterior and posterior Rolandic operculum and the gyrus
of Heschl. A HEMISPHERE x IPB interaction was observed in the posterior STG and the
gyrus of Heschl. No ROI showed a significant SPEECH TYPE x IPB interaction. The triple
interaction HEMISPHERE x SPEECH TYPE x IPB was significant only in the anterior
Rolandic operculum.
To investigate differences with regard to lateralization for the two types of material in
every ROI more closely, Newman-Keuls post-hoc tests were calculated for the interaction
HEMISPHERE x SPEECH TYPE. A stronger activation within the left hemisphere compared
to the right was observed for natural speech in nearly all ROIs, except the anterior Rolandic
operculum and the anterior part of the superior temporal gyrus. For hummed speech a
left-hemispheric dominance was observed in only four ROIs, the gyrus of Heschl, the middle and
posterior part of the superior temporal gyrus and the posterior Rolandic operculum. No ROI
showed a significantly stronger activation of the right hemisphere for hummed sentences
compared to natural speech.
--- insert Figure 5 about here ---
Discussion
The aim of the present study was to identify the brain areas involved in the processing of
sentence-level prosody by investigating the processing of intonational phrase boundaries
(IPBs). We will first discuss our results with regard to differences in the processing of
sentences with two IPBs as compared to sentences with one IPB, and later turn to the general
differences between natural and hummed speech.
Brain areas involved in IPB processing
In the whole brain analysis, stronger activation was observed for sentences with two
IPBs than for sentences with one IPB within the left superior temporal gyrus, for natural speech
as well as for hummed sentences. An additional focus within the superior temporal gyrus on
the right side was significant for natural speech but failed to reach significance for hummed
speech. This indicates that processing an additional IPB activates the superior temporal gyrus
similarly for natural speech and hummed sentences.
In the ROI-analysis (pars opercularis and triangularis of the inferior frontal gyrus,
anterior, middle and posterior part of the superior temporal gyrus, gyrus of Heschl, and
anterior and posterior Rolandic operculum), more activation for two than for one IPB was
observed only in the temporal ROIs, but not within the inferior frontal gyrus. This indicates
that prosody processing mainly involves brain areas related to auditory processing. In
addition, all ROIs but one, the posterior Rolandic operculum, showed a significant main effect
of, or an interaction with, SPEECH TYPE (natural speech or hummed sentences). This result
could be taken to indicate that these regions are involved in the processing of the more
complex segmental information contained in natural speech. The posterior Rolandic
operculum (bilaterally), on the other hand, did not show stronger activation for natural speech
compared to hummed sentences: there was no main effect or any interaction with SPEECH
TYPE. This suggests that this region might play a less pronounced role in the processing of
the specific spectral composition and the additional linguistic information (semantics, syntax)
contained in natural speech. As this region, however, was more strongly activated by
materials containing two IPBs rather than one (main effect of IPB), it could be speculated
this region might be specifically involved in the processing of the prosodic information.
Interestingly, there was no interaction of IPB and SPEECH TYPE in any of the ROIs
investigated. This indicates that the activation elicited by the presence of an additional IPB
did not depend on the comprehensibility of the utterance. Although
IPBs can aid the understanding of an utterance, we did not observe any evidence for an
interaction in the respective ROIs, not even within the left inferior frontal gyrus, a region
assumed to be involved in syntactic processing. This finding should not lead to the
conclusion, however, that prosody and syntax do not interact. It is possible that syntax and
prosody are processed independently from each other in different brain regions and that the
interaction between both types of information occurs in higher associative areas within the
brain. This is also suggested by a recent ERP study with patients suffering from lesions in the
corpus callosum (Friederici et al., 2007).
So far, only one neuroimaging study has investigated the perception of IPBs (Strelnikov et
al., 2006). In the condition with IPB, the IPB could be in one of two positions, changing the
meaning of the sentence. 'To chop not # to saw' ('Rubit nelzya, pilit') means: 'not to chop but
to saw', whereas 'To chop # not to saw' ('Rubit, nelzya pilit') means 'to chop and not to saw'.
This is a possible construction in Russian because, unlike in Germanic languages such as
German, Dutch or English, the negative word 'not' may have scope to its left or right
depending on the position of an IPB. In this condition, participants were first presented with
one of the two versions and had to select the appropriate alternative (one needs to: chop/saw).
In the condition without IPB, participants were first presented with a simple statement ('Father
bought him a coat') and then had to select the appropriate alternative ('His father bought him:
a coat/a watch'). Strelnikov and colleagues reported stronger activation for sentences
containing an IPB (e.g., 'To chop not # to saw', or 'To chop # not to saw') than for sentences
without an IPB (e.g., 'Father bought him a coat') within the right posterior prefrontal cortex
and an area within the right cerebellum. Different from our results, Strelnikov et al. did not
observe stronger activation within the superior temporal gyrus. However, as already outlined
in the introduction, the task used was different for the two sentence types. Although the task
consisted in both cases of a visually presented question with two response alternatives ('Father
bought him: a coat / a watch' and 'One needs to: saw / chop', for the two examples,
respectively), the first type of materials (sentences without IPBs) required a semantic
judgement based on the segmental content of the utterance while the second type of materials
(sentences with an IPB) required a semantic judgement based on the segmental content as
well as the prosodic information (IPB position) of the utterance. It is therefore possible that
the differences observed by Strelnikov et al. (2006) might not only be due to the presence or
absence of IPBs in the materials but also to task variables.
To summarize our results so far, we observed a stronger activation within the superior
temporal gyrus for sentences containing two IPBs compared to sentences containing one IPB.
The activation extended to the gyrus of Heschl and the Rolandic operculum. To find out
whether one of these areas might show a specialization with regard to prosody processing as
compared to the processing of the segmental content of speech, we also conducted an ROI
analysis. In the ROI-analysis, the posterior part of the Rolandic operculum was the only brain
region that showed a modulation of activation due to the presence of an additional IPB
independent of the amount of segmental information contained in natural speech as compared
to hummed sentences.
The processing of hummed sentences compared to natural speech
Hummed sentences also represent a human vocalization and a complex auditory
signal. Comparing the processing of natural speech with hummed sentences can therefore
yield information with regard to the brain areas processing the segmental content of speech as
compared to brain areas processing the more basic aspects of speech. Comparing the
processing of hummed sentences to baseline gives an initial overview of the brain areas
involved in the processing of hummed speech, as well as, potentially, in the processing of the
prosodic aspects of speech. The strongest activations were observed within the superior
temporal gyrus, extending to the Rolandic operculum and the gyrus of Heschl, areas involved
in auditory processing. Strong activations were also observed within the SMA, the right
precentral gyrus and the cerebellum. These activations are most likely due to the manual
response required by the task. On the basis of the logic of cognitive subtraction, subtracting
hummed sentences from natural speech should yield brain areas that are involved in the
processing of segmental information, lexico-semantics and syntax. Stronger activations for
natural speech than for hummed sentences were observed in the middle temporal gyrus,
bilaterally, and within the left opercular and triangular part of the inferior frontal gyrus,
corresponding approximately to BA 44 and 45. These fronto-temporal activations are related
to the processing of syntactic and semantic information contained in natural speech. Similar
activations have been reported in other studies comparing natural speech to unintelligible
speech (Spitsyna et al., 2006). Natural speech also activated the superior temporal gyrus more
strongly than hummed speech, possibly reflecting the involvement of the auditory cortex in
processing the more complex spectral components carrying the segmental information of
natural speech. It should be noted that the probe monitoring task used here might have had
some influence on the brain activation patterns observed. Our task induced an attentional
focus on lexico-semantic processing rather than on pure prosody processing, which might also
have influenced the activations we observed in temporal areas for hummed speech. In the
following, we will further discuss our findings against the background of the organization of
the auditory system with regard to its relation to speech and prosody processing.
The brain areas involved in the processing of complex sounds and speech
The auditory association cortex seems to represent mainly spectral and temporal
properties of auditory stimuli rather than more abstract high-level properties such as auditory
objects (see, for reviews, Griffiths et al., 2004; Griffiths & Giraud, 2004). The auditory
association areas are assumed to process more abstract properties than the primary auditory
cortex. Some of these properties are assumed to be processed in separate regions.
Rauschecker and Tian (2000) proposed that anterior regions subserve auditory object
identification while posterior regions process spatial information, similar to the ventral
('what') and dorsal ('where') pathways of the visual system (Ungerleider & Mishkin, 1982).
Although such an anterior-posterior distinction is supported in part by single cell recordings in
monkeys (Recanzone, 2000; Tian et al., 2001) as well as by lesion (Adriani et al., 2003;
Clarke et al., 2000) and brain imaging studies (Maeder et al., 2001), the distinction is not
clear-cut. The functional nature of this anterior-posterior division is therefore still under debate
(Zatorre et al., 2002).
With regard to the processing of speech, the dual pathway model (Rauschecker &
Tian, 2000) suggests that speech should activate regions anterior to the primary auditory
cortex, as it is a familiar and highly complex auditory signal with no relation to spatial
information. When intelligible speech is compared to non-speech, stronger left-lateralized
activations are obtained in anterior and middle parts of the superior temporal sulcus, for single
words as well as for sentences (Binder et al., 2000; Narain et al., 2003; Scott et al., 2000;
Meyer et al., 2004). However, activation within the posterior temporal and inferior parietal
regions is also often observed in imaging studies (Meyer et al., 2002; Spitsyna et al., 2006). Although
not directly derivable from the dual pathway model, this finding is not altogether surprising. It
has long been known from lesion studies that posterior temporal and inferior parietal regions
are critical for supramodal language comprehension (Geschwind, 1965).
The stronger activations for natural speech than for hummed sentences observed in the
present study within the superior temporal gyrus are compatible with previous results. We
observed stronger activation for natural than for hummed speech in middle and posterior parts
of the superior temporal gyrus and the posterior part of the Rolandic operculum. With regard
to the lack of activation in anterior parts of the superior temporal lobe, it could be speculated
that hummed speech is recognized as a familiar human vocalization. This familiarity might
explain differences from results of studies using speech materials that are rendered
unintelligible artificially and sound more unfamiliar (Scott et al., 2000; Meyer et al., 2004).
It should be noted, however, that the activations observed within the temporal cortex
for speech may not be specific for speech. Similar activations have been observed for other
complex auditory stimuli such as musical sounds and melodies (Griffiths et al., 1998;
Patterson et al., 2001; see, for a review, Price et al., 2005). It is possible that differences
between speech and other complex auditory stimuli are subtle and go beyond simple
localization. A recent study by Tervaniemi et al. (2006), for example, showed that the
superior temporal region reacts differently to changes in pitch or duration for speech syllables
than for musical sounds.
The brain areas involved in the perception of prosody
Activations within the superior temporal gyri have been regularly observed in imaging studies
when speech with or without prosodic modulation is presented and compared to baseline (e.g.
Wildgruber et al., 2004; Hesling et al., 2005; Gandour et al., 2003, 2004). Regions outside the
temporal lobe have also been observed to show a modulation of activation dependent on
prosodic properties but appear to vary considerably across different studies. Prosody-related
activations were observed to depend on the specifics of the task (Tong et al., 2005; Plante et
al., 2002), the type of prosody involved (e.g., affective vs. linguistic, Wildgruber et al., 2004)
and the degree of propositional information contained in the stimulus materials (Hesling et al.,
2005; Tong et al., 2005; Gandour et al., 2003, 2004). Although the superior temporal region
seems to be the main candidate for the perception of prosodic modulation in speech, other
areas outside the temporal lobe are likely to be involved and the auditory association areas are
likely to be only part of a more extended processing network for prosodic aspects of speech.
Imaging studies investigating the processing of sentence-level prosody often compared
natural speech to speech stimuli either with no segmental information or with no or little
prosodic information. One possibility is to remove the segmental information by filtering,
preserving the intonational contour of the sentence (Meyer et al., 2002, 2004). The results
from these studies point towards a fronto-temporal network with a right hemispheric
dominance. The Rolandic operculum, in particular in the right hemisphere, has been identified
as part of this network. Another approach is to remove the intonational contour of sentences,
for example, by high-pass filtering (Meyer et al., 2004; Hermann et al., 2003), or to reduce
prosody information by speaking with little prosodic expressiveness (Hesling et al., 2005).
The rationale for this approach is to identify prosody-related activations by subtracting
low-prosody speech from normal speech, thus ideally subtracting away activations related to the
processing of the segmental content of speech while leaving prosody-related activations. The
results from these studies, however, are mixed. Meyer et al. (2004) did not observe stronger
activations for normal speech than for low-prosody speech. It is possible that this finding is
due to the task, which required a prosody comparison between two successive sentence stimuli
in an experimental setting in which flattened speech [-prosodic information] and degraded
speech [-segmental information] were presented in a pseudo-randomized order. Hesling et al.
(2005) observed different results for intelligible and nonintelligible (i.e., low-pass filtered)
speech. In the case of high-prosody intelligible speech, the right superior temporal gyrus was
more strongly activated than in the case of low-prosody intelligible speech. High-prosody
nonintelligible speech did not activate any brain regions more strongly than low-prosody
nonintelligible speech. These results indicate that the additional prosodic information
contained in high-prosodic speech activates the superior temporal gyrus, although not very
strongly. Finally, a study by Doherty and colleagues (2004) compared sentences with a rising
pitch (question) to sentences with a falling pitch (statement). Although their results might not
exclusively reflect prosody processing, they observed, among others, a stronger activation
within the superior temporal gyrus, bilaterally, for sentences with a rising pitch. These
findings also indicate that the superior temporal gyrus might play a dominant role in the first
stages of prosody processing. The results of the present study, namely, that areas within and
around the superior temporal gyrus are involved in IPB processing, therefore agree well with
previous results.
The lateralization of prosody processing
Although a right-hemispheric dominance in prosody processing was initially suggested based
on evidence from patients, the results with regard to the perception of sentence-level
non-affective prosody show a mixed picture. Some studies show greater impairment in
right-hemisphere-damaged (RHD) patients compared to left-hemisphere-damaged (LHD)
patients and controls (Bryan, 1989), whereas others find greater impairment for LHD patients
(Pell & Baum, 1997; Perkins et al., 1996), or a more complicated pattern
of preserved function alongside impairments for both patient groups compared to controls
(Baum & Dwivedi, 2003; Baum et al., 1997; Walker et al., 2001; Imaizumi et al., 1998). A
possible reason for these differences might be that the patient groups comprise patients with
non-overlapping patterns of brain damage, often in fronto-parietal areas. Cognitive
functioning in these patients might therefore be compromised in a number of domains, such
as, for example, working memory and drawing inferences, faculties required in complex tasks
such as prosodic disambiguation of sentences. Imaging studies investigating the perception of
sentence-level prosody with healthy adults circumvent this problem and allow a separate
assessment of lateralization for different brain regions.
A number of imaging studies have investigated the lateralization of prosodic
processing (Hesling et al., 2005; Meyer et al., 2002, 2004; Kotz et al., 2003). Meyer et al.
(2002, 2004) observed stronger right-hemispheric activations within fronto-temporal areas for
low-pass filtered speech than for natural speech whereas Kotz et al. (2003) using natural
speech stimuli observed predominantly bilateral activations. Hesling et al. (2005) found
activation within the right superior temporal gyrus when high-prosody intelligible speech was
compared to low-prosody intelligible speech. Another way to remove the propositional
content of speech while preserving its prosodic contour is to present foreign language
materials to listeners with no knowledge of this language. In an experiment by Tong et al.
(2005), bilateral activation was observed for English speakers listening to Chinese language
materials for eight out of nine ROIs investigated. A stronger right-hemispheric activation was
observed for only one ROI within the middle frontal gyrus. Tong et al. also found mostly
bilateral activation (six out of nine ROIs) for Chinese participants listening to Chinese, and a
dependence of lateralization on the type of task. This evidence indicates that prosody, when
presented together with segmental information, is subserved by a bilaterally distributed
network of fronto-temporo-parietal brain areas.
In the present study, hummed sentence materials were presented as well as natural
speech. If the right hemisphere is specialized in the processing of sentence-level prosody,
the processing of hummed speech compared to baseline should show right-hemispheric
lateralization. In the whole brain analysis, bilateral
activation was observed in the superior temporal gyrus and the precentral gyrus, and stronger
activation of the right hemisphere was observed within the middle frontal gyrus for hummed
sentences. In the ROI-analysis, six out of eight ROIs showed stronger activation within the
left hemisphere for natural speech, whereas only four out of eight ROIs showed a
left-hemispheric dominance for hummed sentences. This suggests that prosodic processing is
organized more bilaterally than the processing of other properties of speech, such as syntax or
lexico-semantics. Further insights about the lateralization with regard to the processing of
specific linguistic aspects of prosody can be derived from the lateralization of IPB processing.
An interaction between the number of IPBs and lateralization was observed only in two ROIs
(posterior STG, gyrus of Heschl) with both ROIs showing a stronger modulation of activation
due to the number of IPBs in the left hemisphere, for natural speech as well as hummed
sentences. This could be taken to indicate that these two temporal areas show a
left-hemispheric dominance for prosody processing.
The left-hemispheric dominance in the temporal regions observed in the present study
might be due to some degree to the task employed. While other studies required participants
to attend directly to the prosodic information of the speech stimuli (Tong et al. 2005; Meyer et
al., 2002, 2004), a probe detection task was used here that required participants, even in the
hummed speech condition, to attend to and memorize a potentially appearing naturally spoken
word carrying segmental information. The results of the present study are therefore
compatible with the functional or task-dependent class of hypotheses. According to this class
of hypotheses, the lateralization of prosody processing could shift from the right to the left
hemisphere when the task or the stimulus materials promote attention to syntactic-semantic
rather than prosodic properties of the materials.
Conclusion
This study aimed at identifying the brain areas involved in the processing of sentence-level
prosody, and in particular, the processing of intonational phrase boundaries (IPBs). Sentences
with two IPBs activated the superior temporal gyrus, bilaterally, more strongly than sentences
with one intonational phrase boundary. This pattern of activation was very similar for natural
speech and hummed sentences. The results from the ROI analysis suggest that the posterior
Rolandic operculum might play a specific role in the processing of prosodic information,
because it was the only ROI not showing an influence of the type of speech materials
(hummed sentences or natural speech). When comparing natural speech and hummed
sentence materials, we found natural speech to activate a number of areas in the left
hemisphere more strongly than hummed sentences. The left-hemispheric dominance of
temporal activations observed for hummed sentences, however, might be due to the
attentional focus on segmental information required by the task employed in the present
study.
References
Adriani, M., Maeder, P., Meuli, R., Bellmann, A., Frischknecht, R., Villemure, J.-G., Mayer,
J., Annoni, J.-M., Bogousslavsky, J., Fornari, E., Thiran, J.-P., Clarke, S. 2003. Sound
recognition and localization in man: specialized cortical networks and effects of acute
circumscribed lesions. Exp Brain Res 153:591-604.
Austin, J. 1975. How to do things with words (The William James Lectures). 2nd edition. M.
Sbisà, J. O. Urmson (Eds). Cambridge, MA: Harvard University Press.
Baum, S. R., Dwivedi, V. D. 2003. Sensitivity to prosodic structure in left- and right-hemisphere-damaged individuals. Brain Lang 87:278-289.
Baum, S., Pell, M., Leonard, C., Gordon, J. 1997. The ability of right- and left-hemisphere-damaged individuals to produce and interpret prosodic cues marking phrasal
boundaries. Lang Speech 40:313–330.
Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S. F., Springer, J. A., Kaufman, J.
N., Possing, E. T. 2000. Human temporal lobe activation by speech and nonspeech
sounds. Cereb Cortex 10:512–528.
Brett, M., Christoff, K., Cusack, R., Lancaster, J. 2001. Using the Talairach atlas with the
MNI template. Neuroimage 13:S85.
Bryan, K. 1989. Language prosody and the right hemisphere. Aphasiology. 3:285–299.
Clarke, S., Bellmann, A., Meuli, R., Assal, G., Steck, A. J. 2000. Auditory agnosia and
auditory spatial deficits following left hemispheric lesions: evidence for distinct
processing pathways. Neuropsychologia 38:797-807.
Cooper, W. E., Paccia-Cooper, J. 1981. Syntax and speech. Cambridge, MA: Harvard University Press.
Doherty, C. P., West, W. C., Dilley, L. C., Shattuck-Hufnagel, S., Caplan, D. 2004.
Question/statement judgments: An fMRI study of intonation processing. Human Brain
Mapp 23:85–98.
Frazier, L., Carlson, K., Clifton, C. 2006. Prosodic phrasing is central to language
comprehension. Trends Cogn Sci 10:244-249.
Friederici, A. D., Alter, K. 2004. Lateralization of auditory language functions: A dynamic
dual pathway model. Brain Lang 89:267-276.
Friederici, A. D., von Cramon, D. Y., Kotz, S. A. 2007. Role of the corpus callosum in speech
comprehension: Interfacing syntax and prosody. Neuron 53:135-145.
Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L., Satthamnuwong, N.,
Lurito, J. 2003. Temporal integration of speech prosody is shaped by language
experience: An fMRI study. Brain Lang 84:318-336.
Gandour, J., Wong, D., Lowe, M., Dzemidzic, M., Satthamnuwong, N., Tong, Y., Li, X.
2002. A cross-linguistic fMRI study of spectral and temporal cues underlying
phonological processing. J Cogn Neurosci 14:1076–1087.
Gandour, J., Tong, Y., Wong, D., Talavage, T., Dzemidzic, M., Xu, Y., Li, X., Lowe, M.
2004. Hemispheric roles in the perception of speech prosody. Neuroimage 23:344-357.
Geschwind, N. 1965. Disconnexion syndromes in animals and man: Part I. Brain 88:237–294.
Grabe, E., Warren, P., Nolan, F. 1994. Resolving category ambiguities - evidence from stress
shift. Speech Communication 15:101-114.
Griffiths, T. D., Giraud, A. L. 2004. Auditory function. In: Human Brain Function. 2nd ed. R.
S. J. Frackowiak, K. J. Friston, C. D. Frith, R. J. Dolan, C. J. Price, S. Zeki, J.
Ashburner, W. Penny (Eds). Amsterdam, NL: Elsevier. pp. 61-75.
Griffiths, T. D., Warren, J. D., Scott, S. K., Nelken, I., King, A. J. 2004. Cortical processing
of complex sound: a way forward? Trends Neurosci. 27:181-185.
Griffiths, T.D., Büchel, C., Frackowiak, R. S. J., Patterson, R. D. 1998. Analysis of temporal
structure in sound by the human brain. Nat Neurosci. 1:422–427.
Grosjean, F., Hirt, C. 1996. Using prosody to predict the end of sentences in English and
French: Normal and brain-damaged subjects. Lang Cogn Proc. 11:107-134.
Herrmann, C. S., Friederici, A.D., Oertel, U., Maess, B., Hahne, A., Alter, K. 2003. The brain
generates its own sentence melody: A Gestalt phenomenon in speech perception. Brain
Lang, 85:396-401.
Hesling, I., Clement, S., Bordessoules, M., Allard, M. 2005. Cerebral mechanisms of prosodic
integration: evidence from connected speech. Neuroimage. 24:937-947.
Imaizumi, S., Mori, K., Kiritani, S., Hiroshi, H., Tonoike, M. 1998. Task-dependent laterality
for cue decoding during spoken language processing. Neuroreport. 9:899–903.
Isel, F., Alter, K., Friederici, A. D. 2005. Influence of prosodic information on the processing
of split particles: ERP evidence from spoken German. J Cogn Neurosci. 17:154-167.
Kotz, S. A., Meyer, M., Alter, K., Besson, M., von Cramon, D. Y., Friederici, A. D. 2003. On
the lateralization of emotional prosody: an event-related functional MR investigation.
Brain Lang. 86:366– 376.
Kreiman, J. 1982. Perception of sentence and paragraph boundaries in natural conversation.
Journal of Phonetics, 10:163-175.
Larkey, L. S. 1983. Reiterant speech: an acoustic and perceptual validation. J Acoust Soc Am.
73:1337-1345.
Liu, T. T., Frank, L. R., Wong, E. C., Buxton, R. B. 2001. Detection power, estimation
efficiency and predictability in event-related fMRI. Neuroimage 13:759–773.
Maeder, P. P., Meuli, R. A., Adriani, M., Bellmann, A., Fornari, E., Thiran, J.-P., Pittet, A.,
Clarke, S. 2001. Distinct pathways involved in sound recognition and localization: a
human fMRI study. Neuroimage 14:802-816.
Meyer, M., Alter, K., Friederici, A. D., Lohmann, G., von Cramon, D. Y. 2002. fMRI
reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum
Brain Mapp. 17:73–88.
Meyer, M., Steinhauer, K., Alter, K., Friederici, A. D., von Cramon, D. Y. 2004. Brain
activity varies with modulation of dynamic pitch variance in sentence melody. Brain
Lang. 89:277-289.
Narain, C., Scott, S. K., Wise, R. J. S., Rosen, S., Leff, A., Iversen, S. D., Matthews, P. M.
2003. Defining a left-lateralized response specific to intelligible speech using fMRI.
Cereb. Cortex. 13:1362–1368.
Nespor, M., Vogel, I. 1983. Prosodic structure above the word. In: Prosody: Models and
Measurements. A. Cutler, D. R. Ladd (Eds.) Berlin: Springer Verlag. pp. 123-140.
Pannekamp, A., Toepel, U., Alter, K., Hahne, A., Friederici, A. D. 2005. Prosody-driven
sentence processing: An event-related brain potential study. J Cogn Neurosci
17:407-421.
Patterson, R. D., Uppenkamp, S., Johnsrude, I., Griffiths, T. D. 2002. The processing of
temporal pitch and melody information in auditory cortex. Neuron 36:767–776.
Pell, M., Baum, S. 1997. The ability to perceive and comprehend intonation in linguistic and
affective contexts by brain-damaged adults. Brain Lang 57:80–99.
Perkins, J. M., Baran, J. A., Gandour, J. 1996. Hemispheric specialization in processing
intonation contours. Aphasiology 10:343–362.
Plante, E., Creusere, M., Sabin, C. 2002. Dissociating sentential prosody from sentence
processing: Activation interacts with task demands. Neuroimage. 17:401–410.
Poeppel, D. 2003. The analysis of speech in different temporal integration windows: Cerebral
lateralization as asymmetric sampling in time. Speech Comm. 41:245–255.
Price, C., Thierry, G., Griffiths, T. 2005. Speech-specific auditory processing: where is it?
Trends Cogn Sci. 9:271-276.
Rauschecker, J. P., Tian, B. 2000. Mechanisms and streams for processing of ‘what’ and
‘where’ in auditory cortex. Proc Natl Acad Sci 97:11800–11806.
Recanzone, G. H. 2000. Spatial processing in the auditory cortex of the macaque monkey.
Proc Natl Acad Sci. 97:11829-11835.
Rilliard, A., Aubergé, V. 1998. Reiterant speech for the evaluation of natural vs. synthetic
prosody. In: Third ESCA/COCOSDA Workshop on Speech Synthesis, Australia,
1998. pp. 87-92.
Rooij, J. J. de, 1975. Prosody and the perception of syntactic boundaries. IPO Annual
Progress Report 10:36-39.
Rooij, J. J. de, 1976. Perception of prosodic boundaries. IPO Annual Progress Report 11:20-24.
Schafer, A. J., Speer, S. R., Warren, P., White, S. D. 2000. Intonational disambiguation in
sentence production and comprehension. J Psycholing Res 29:169-182.
Schmahmann, J. D., Doyon, J., McDonald, D., Holmes, C., Lavoie, K., Hurwitz, A. S.,
Kabani, N., Toga, A., Evans, A., Petrides, M. 1999. Three-dimensional MRI atlas of
the human cerebellum in proportional stereotaxic space. Neuroimage 10:233-260.
Scott, S.K., Blank, C. C., Rosen, S., Wise, R. J. S. 2000. Identification of a pathway for
intelligible speech in the left temporal lobe. Brain. 123:2400–2406.
Searle, J. 1969. Speech acts: An essay in the philosophy of language. Cambridge: Cambridge
University Press.
Selkirk, E. 1995. Sentence prosody: intonation, stress, and phrasing. In The Handbook of
Phonological Theory, J. A. Goldsmith (Ed.). Cambridge, MA: Blackwell. pp. 550-569.
Selkirk, E. 2000. The interaction of constraints on prosodic phrasing. In: Prosody: Theory and
Experiment. M. Horne (Ed.). Dordrecht: Kluwer Academic Publishers. pp. 231-262.
Spitsyna, G., Warren, J. E., Scott, S. K., Turkheimer, F. E., Wise, R. J. S. 2006. Converging
language streams in the human temporal lobe. J Neurosci 26:7328-7336.
Stark, C. E. L., Squire, L. R. 2001. When zero is not zero: The problem of ambiguous
baseline conditions in fMRI. Proc Natl Acad Sci 98:12760–12766.
Steinhauer, K. 2003. Electrophysiological correlates of prosody and punctuation. Brain Lang
86:142-164.
Steinhauer, K., Alter, K., Friederici, A. D. 1999. Brain potentials indicate immediate use of
prosodic cues in natural speech processing. Nat Neurosci 2:191–196.
Steinhauer, K., Friederici, A. D. 2001. Prosodic boundaries, comma rules, and brain
responses: The closure positive shift in ERPs as a universal marker for prosodic
phrasing in listeners and readers. J Psycholing Res 30:267–295.
Strelnikov, K. N., Vorobyev, V. A., Chernigovskaya, T. V., Medvedev, S. V. 2006. Prosodic
clues to syntactic processing - a PET and ERP study. Neuroimage 29:1127-1134.
Talairach, J., Tournoux, P. 1988. Co-planar stereotaxic atlas of the human brain. New York:
Thieme.
Tervaniemi, M., Szameitat, A., Kruck, S., Schröger, E., Alter, K., De Baene, W., Friederici,
A. D. 2006. From air oscillations to music and speech: functional magnetic resonance
imaging evidence for fine-tuned neural networks in audition. J Neurosci 26:8647–
8652.
't Hart, J., Collier, R., Cohen, A. 1990. A perceptual study of intonation: an experimental-phonetic
approach to speech melody. Cambridge, England: Cambridge University Press.
Tian, B., Reser, D., Durham, A., Kustov, A., Rauschecker, J. 2001. Functional specialization
in rhesus monkey auditory cortex. Science 292:290–293.
Tomasi, D., Ernst, T., Caparelli, E. C., Chang, L. 2006. Common deactivation patterns during
working memory and visual attention tasks: an intra-subject fMRI study at 4 Tesla.
Human Brain Mapp 27:694-705.
Tong, Y., Gandour, J., Talavage, T., Wong, D., Dzemidzic, M., Xu, Y., Li, X., Lowe, M.
2005. Neural circuitry underlying sentence-level linguistic prosody. Neuroimage
28:417-428.
Truckenbrodt, H. 2005. A short report on intonation phrase boundaries in German.
Linguistische Berichte 203:273-296.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N.,
Mazoyer, B., Joliot, M. 2002. Automated anatomical labeling of activations in SPM
using a macroscopic anatomical parcellation of the MNI MRI single-subject brain.
Neuroimage 15:273-289.
Ungerleider, L., Mishkin, M. 1982. Two cortical visual systems. In: Analysis of visual
behavior. D. J. Ingle, M. A. Goodale, R. J. W. Mansfield (Eds). Cambridge, MA: MIT
Press. pp. 549-586.
Van Lancker, D. 1980. Cerebral lateralization of pitch cues in the linguistic signal. Papers in
Linguistics: International Journal of Human Communication 13:200–277.
Walker, J. P., Fongemie, K., Daigle, T. 2001. Prosodic facilitation in the resolution of
syntactic ambiguities in subjects with left and right hemisphere damage. Brain Lang
78:169-196.
Wildgruber, D., Hertrich, I., Riecker, A., Erb, M., Anders, S., Grodd, W., Ackermann, H.,
2004. Distinct frontal regions subserve evaluation of linguistic and emotional aspects
of speech intonation. Cereb Cortex 14:1384–1389.
Wong, P. C. M. 2002. Hemispheric specialization of linguistic pitch patterns. Brain Res Bull
59:83–95.
Zatorre, R. J., Belin, P. 2001. Spectral and temporal processing in human auditory cortex.
Cereb Cortex. 11:946–953.
Zatorre, R. J., Bouffard, M., Ahad, P., Belin, P. 2002. Where is ‘where’ in the human auditory
cortex? Nat Neurosci 5:905–909.
Acknowledgement. This work was supported by a grant from the Human Frontier Science
Program (HFSP RGP5300/2002-C102) awarded to Kai Alter.
Table 1. Example of the stimulus materials and mean length of the sentences in seconds,
for the naturally spoken and hummed versions (# marks an intonational phrase boundary).

Type A, 1 IPB (natural: 4.42 s, hummed: 4.64 s)
Peter verspricht Anna zu arbeiten # und das Büro zu putzen
'Peter promises Anna to work and to clean the office'

Type A, 2 IPBs (natural: 4.74 s, hummed: 5.06 s)
Peter verspricht # Anna zu entlasten # und das Büro zu putzen
'Peter promises to support Anna and to clean the office'

Type B, 1 IPB (natural: 5.73 s, hummed: 5.76 s)
Otto bringt Fleisch, # Ute und Georg kaufen Salat und Säfte für das Grillfest.
'Otto contributes meat, Ute and Georg buy salad and soft drinks for the barbecue.'

Type B, 2 IPBs (natural: 6.12 s, hummed: 5.85 s)
Otto bringt Fleisch, # Ute kauft Salat # und Georg kauft Säfte für das Grillfest.
'Otto contributes meat, Ute buys salad and Georg buys soft drinks for the barbecue.'
Table 2. Contrasts are thresholded at p < 0.001 uncorrected, p < 0.05 corrected at the cluster
level. Coordinates are reported as given by SPM2 (MNI space), corresponding only
approximately to Talairach-Tournoux space (Talairach and Tournoux, 1988; Brett et al., 2001).
Anatomical labels are given on the basis of the classification of the AAL (automated
anatomical labeling) atlas (Tzourio-Mazoyer et al., 2002). Cerebellar labels within the AAL
atlas are based on Schmahmann et al. (1999). The first label denotes the location of the
maximum; the following labels denote further areas containing a majority of voxels of the
activated cluster. Abbreviations: IFG_oper = inferior frontal gyrus, pars opercularis; IFG_tri =
inferior frontal gyrus, pars triangularis; MFG = middle frontal gyrus; SFG = superior frontal
gyrus; SMA = supplementary motor area; TP = temporal pole; MTG = middle temporal gyrus;
STG = superior temporal gyrus; AG = angular gyrus; ROp = Rolandic operculum; HeschlG =
gyrus of Heschl; SPL = superior parietal lobule; IPL = inferior parietal lobule; PrecG =
precentral gyrus; PostG = postcentral gyrus; SupramG = supramarginal gyrus; k = cluster size.
* Activation is part of a bigger cluster. ¹ Masked with natural speech > baseline (or hummed
sentences > baseline, respectively) at p < 0.05, inclusive.
Side    Region                      x     y     z     k      Z

Natural Speech > Baseline
Frontal
Right   SMA                         3     15    54    7897   5.36
Right   Insula*                     36    24    6     7897   4.77
Right   PrecG*                      30    -6    30    7897   4.63
Right   PrecG*                      54    3     48    7897   4.09
Temporal
Left    STG                         -60   -24   6     7897   6.25
Right   STG*                        66    -18   6     7897   6.03
Parietal
Left    SPL                         -30   -57   48    7897   4.68
Cerebellum
Right   Crus1                       42    -57   -36   7897   4.87

Hummed Sentences > Baseline
Frontal
Left    PrecG*                      -36   -12   66    4928   4.78
Right   SMA*                        3     -3    66    4928   5.81
Right   MFG*                        27    36    30    4928   4.42
Right   PrecG                       51    -3    0     4928   6.40
Temporal
Left    STG*                        -54   -6    -6    4928   5.90
Right   STG                         51    3     54    47     4.96
Parietal
Right   IPL                         27    -39   30    4928   3.63
Cerebellum
Left    Crus1                       -42   -54   -36   92     4.00
Right   VI                          33    -36   -36   93     4.04
Right   Crus1                       45    -57   -36   87     3.82

Natural > Hummed Sentences¹
Frontal
Left    IFG_tri, IFG_oper           -39   30    0     600    5.13
Right   SMA                         6     15    54    209    5.42
Right   IFG_tri                     45    27    18    50     4.13
Right   MFG                         33    3     54    65     4.30
Temporal
Left    TP, MTG, STG                -57   9     -24   583    5.01
Right   MTG, STG                    63    3     -18   150    4.43
Parietal
Left    AG, SPL                     -27   -63   42    83     4.00
Basal ganglia
Left    Thalamus, Caudate           -15   -12   12    347    4.13

Hummed > Natural Speech¹
ns

Natural Speech: 2 pauses > 1 pause
Temporo-parietal
Left    HeschlG, STG, ROp           -45   -18   12    287    4.51
Right   STG, ROp, HeschlG           42    3     -12   242    4.64

Natural Speech: 1 pause > 2 pauses
ns

Hummed Sentences: 2 pauses > 1 pause
Left    SupramG, STG, HeschlG       -63   -39   24    105    3.61

Hummed Sentences: 1 pause > 2 pauses
ns
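
Note on coordinates. The MNI coordinates in Table 2 correspond only approximately to the
Talairach-Tournoux atlas (see caption). For readers who wish to relate the two spaces, the
approximate piecewise-linear mapping discussed by Brett et al. (2001), often referred to as
the 'Brett transform', can be sketched in Python as follows. This is a convenience
illustration, not part of the original analysis, and the helper name mni2tal is ours:

def mni2tal(x, y, z):
    # Approximate MNI -> Talairach mapping (piecewise-linear 'Brett
    # transform'; cf. Brett et al., 2001). Input coordinates in mm.
    x_t = 0.9900 * x
    if z >= 0:  # above the AC-PC plane
        y_t = 0.9688 * y + 0.0460 * z
        z_t = -0.0485 * y + 0.9189 * z
    else:       # below the AC-PC plane
        y_t = 0.9688 * y + 0.0420 * z
        z_t = -0.0485 * y + 0.8390 * z
    return (round(x_t, 1), round(y_t, 1), round(z_t, 1))

# Example: the left STG maximum of the natural speech > baseline contrast.
print(mni2tal(-60, -24, 6))   # -> (-59.4, -23.0, 6.7)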
Figure Captions
Figure 1. Spectrograms (0 - 5000 Hz) and fundamental frequency contours (pitch: 0 - 500 Hz)
for sample stimuli of the sentence materials used for all conditions of the experiment. Natural
speech is presented on the left, the respective hummed version of the sentence on the right.
Spectrograms and pitch contours are given for each pair (1 and 2 IPBs) of sentence materials,
for type A sentences in the top half, for type B sentences in the bottom half.
Figure 2. Trial timing scheme.
Figure 3. A visualization of the six temporal ROIs as an ascending series of axial slices
(shown only for the left hemisphere). ROIs: anterior (y > -15, light blue), middle (-15 > y >
-35, yellow) and posterior (y < -35, brown) parts of the superior temporal gyrus, gyrus of
Heschl (dark blue), anterior (y > -35, medium blue) and posterior (y < -35, orange) parts of the
Rolandic operculum.
Figure 4. a) Comparison of sentences with 2 IPBs to sentences with 1 IPB: 2 IPBs > 1 IPB.
Natural speech (left) and hummed sentences (right). b) Comparison of speech types: Natural
speech > hummed sentences. Threshold: p < 0.001, uncorrected, showing only clusters with
more than 40 voxels, corresponding to p < 0.05 corrected at the cluster level. Left is left in the
image.
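
As an illustration of the thresholding described in this caption, the following minimal
Python sketch (simulated data; not the SPM2 pipeline actually used in the study) applies a
voxel-level cutoff of Z > 3.09, which corresponds to p < 0.001 one-tailed, and then discards
clusters of 40 or fewer voxels:

import numpy as np
from scipy import ndimage

# Simulated Z-statistic map with one genuine activation embedded in noise.
rng = np.random.default_rng(2)
z_map = rng.normal(size=(40, 48, 40))
z_map[10:16, 20:26, 15:21] += 4.0

# Voxel-level threshold: Z > 3.09 corresponds to p < 0.001, one-tailed.
suprathreshold = z_map > 3.09

# Group suprathreshold voxels into connected clusters and keep only those
# exceeding the 40-voxel extent threshold.
labels, n_clusters = ndimage.label(suprathreshold)
sizes = ndimage.sum(suprathreshold, labels, index=range(1, n_clusters + 1))
n_surviving = int((sizes > 40).sum())
print(n_clusters, "clusters above threshold,", n_surviving, "larger than 40 voxels")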
Figure 5. Results of the ROI-analysis (bars with black stripes = left hemisphere, white bars =
right hemisphere). Effect sizes were averaged over all voxels of each ROI. Error bars
represent the standard error of the mean. Abbreviations: n1 = natural speech with 1 IPB, n2 =
natural speech with 2 IPBs, h1 = hummed sentences with 1 IPB, h2 = hummed sentences with
2 IPBs, IFG oper = inferior frontal gyrus, pars opercularis, IFG tri = inferior frontal gyrus,
pars triangularis, ant/post RolOp = anterior/posterior part of the Rolandic operculum,
ant/mid/post STG = anterior/middle/posterior part of the superior temporal gyrus, Heschl =
gyrus of Heschl.
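
The ROI summary underlying Figure 5 can be illustrated with a minimal Python sketch
(hypothetical variable names; it assumes per-subject contrast maps and a binary ROI mask
are already available as numpy arrays): the effect size is averaged over all voxels of an ROI
for each subject, and the error bar is the standard error of that mean across subjects.

import numpy as np

def roi_effect_size(contrast_maps, roi_mask):
    # contrast_maps: (n_subjects, x, y, z) per-subject contrast estimates.
    # roi_mask: boolean (x, y, z) array defining the ROI.
    per_subject = contrast_maps[:, roi_mask].mean(axis=1)      # ROI mean per subject
    mean = per_subject.mean()                                  # bar height
    sem = per_subject.std(ddof=1) / np.sqrt(per_subject.size)  # error bar
    return mean, sem

# Hypothetical usage with simulated data: 16 subjects, a 10x10x10 volume,
# and a small cubic 'ROI'.
rng = np.random.default_rng(0)
maps = rng.normal(loc=0.2, scale=1.0, size=(16, 10, 10, 10))
mask = np.zeros((10, 10, 10), dtype=bool)
mask[3:6, 3:6, 3:6] = True
print(roi_effect_size(maps, mask))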
Footnotes
1. Although the term 'activation' suggests an absolute value, it is, in the case of fMRI data,
always relative. There are two reasons for this. First, the statistical analysis only evaluates
differences between conditions. Second, it is generally difficult to define an absolute baseline
of brain activation for physiological reasons (Stark & Squire, 2001; Tomasi et al., 2006). In
the present article, the term 'activation' is used only when an experimental condition is
compared to a low-level baseline. For comparisons between conditions, the term 'activation
difference' is used, or the term 'activation' with an adjective indicating the direction of the
difference (e.g., 'stronger').
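
To make this footnote concrete, consider a minimal general linear model in Python
(simulated data and a hypothetical two-condition design, not the analysis of the present
study): the constant term absorbs the overall signal level, so what is tested is a contrast,
i.e., a weighted difference between condition estimates.

import numpy as np

# Simulated design matrix: two condition regressors plus a constant.
rng = np.random.default_rng(1)
n_scans = 200
X = np.column_stack([
    rng.integers(0, 2, n_scans).astype(float),  # condition A on/off
    rng.integers(0, 2, n_scans).astype(float),  # condition B on/off
    np.ones(n_scans),                           # constant (overall level)
])

# Simulated voxel time course: A contributes more than B, plus noise.
true_betas = np.array([1.5, 0.5, 10.0])
y = X @ true_betas + rng.normal(0.0, 1.0, n_scans)

# Least-squares parameter estimates.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The contrast weights sum to zero over conditions: the absolute signal
# level (the constant) drops out, only the difference A - B is tested.
c = np.array([1.0, -1.0, 0.0])   # condition A > condition B
print(c @ beta_hat)              # estimated activation difference, ~1.0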