Neuron
Report

The Developmental Origins of Voice Processing in the Human Brain

Tobias Grossmann,1,2,* Regine Oberecker,2 Stefan Paul Koch,3 and Angela D. Friederici2
1Centre for Brain and Cognitive Development, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
2Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, 04103 Leipzig, Germany
3Berlin Neuroimaging Centre, Department of Neurology, Charité Universitätsmedizin, Luisenstrasse 56, 10099 Berlin, Germany
*Correspondence: [email protected]
DOI 10.1016/j.neuron.2010.03.001
Open access under CC BY license.
SUMMARY
In human adults, voices are processed in specialized
brain regions in superior temporal cortices. We examined the development of this cortical organization
during infancy by using near-infrared spectroscopy.
In experiment 1, 7-month-olds but not 4-month-olds
showed increased responses in left and right superior
temporal cortex to the human voice when compared
to nonvocal sounds, suggesting that voice-sensitive
brain systems emerge between 4 and 7 months of
age. In experiment 2, 7-month-old infants listened
to words spoken with neutral, happy, or angry prosody. Hearing emotional prosody resulted in increased responses in a voice-sensitive region in the
right hemisphere. Moreover, a region in right inferior
frontal cortex taken to serve evaluative functions in
the adult brain showed particular sensitivity to happy
prosody. The pattern of findings suggests that
temporal regions specialize in processing voices
very early in development and that, already in infancy,
emotions differentially modulate voice processing in
the right hemisphere.
INTRODUCTION
The human voice is one of the most important stimuli in our auditory environment: it not only conveys speech information but also allows us to recognize individuals and their emotional states (Belin et al., 2004). In human adults, voices
are processed in specialized brain regions located in the upper
bank of the superior temporal sulcus (Belin et al., 2000).
Recently, it has been shown that macaque monkeys have a
similar voice-selective region in the superior temporal plane
that preferentially responds to conspecific vocalizations, suggesting that recognizing the vocalizations of a species member is an evolutionarily conserved brain function
in primates that is independent of language (Petkov et al.,
2008, 2009). These voice-selective areas in auditory cortex,
similar to face-selective areas in visual cortex identified in
both human adults and monkeys (Kanwisher et al., 1997; Tsao
et al., 2006), are thought to bind the processing of crucial socially
relevant information to sensory systems.
In human adults, the voice-sensitive temporal regions not only
react to voice-specific information but are moreover sensitive to
emotional prosody crucial in social communication (Grandjean
et al., 2005; Ethofer et al., 2006). Such a modulation of sensory
processing by emotional signals is particularly strong for
threat-related emotions, occurs independent of attention, and
is thought to be a fundamental neural mechanism in both face- and voice-sensitive brain regions to prioritize the processing of
significant stimuli (see Vuilleumier, 2006, for a review). Although
well described for the adult brain, the developmental origins of
the cortical organization underlying voice and emotional prosody
processing in the human brain remain unknown. Here we report
two experiments with young infants that fill this gap.
Behavioral work has shown that newborn infants prefer human
voices to similar nonsocial auditory stimuli (Ecklund-Flores and
Turkewitz, 1996; Hutt et al., 1968) and their mother’s voice to
the voice of another newborn’s mother (DeCasper and Fifer,
1980). These postnatal listening preferences are primarily related
to infants’ sensitivity to prosodic characteristics of speech (Mehler et al., 1988; Moon et al., 1993). The latter finding is relevant
insofar as prosodic cues are known to play an essential role in
the perception of vocally communicated emotions (Scherer,
1986). Indeed, newborns of English- and Spanish-speaking
mothers presented with a range of vocal expressions (happy,
angry, sad, and neutral) in their native and nonnative language
showed an increase in eye-opening responses following the
onset of stimuli with happy prosody when compared to the other
emotions, but only when they listened to the vocal expression in
their native language (Mastropieri and Turkewitz, 1999). Despite
this very early form of sensitivity to happy prosody in familiar
contexts, further behavioral studies show that only from around
5 months of age do infants robustly discriminate between happy,
angry, and sad emotional prosody (Flom and Bahrick, 2007;
Walker-Andrews, 1997).
Recent electrophysiological work indicates an early sensitivity
to language-specific and emotion-specific prosodic information
in the speech signal. The processing of prosodic stress was
shown to elicit language-specific event-related brain potentials
(ERPs) in 4- to 5-month-old infants (Friederici et al., 2007). An
ERP study investigating the processing of emotional prosody
in 7-month-old infants (Grossmann et al., 2005) revealed
that infants discriminated between neutral, happy, and angry
emotional prosody. As early as 300 ms poststimulus onset, ERPs
for angry prosody differed from happy or neutral prosody over
frontal and central electrodes, suggesting a greater initial
attention to angry voices. Both angry and happy prosody resulted in a greater positive slow wave than neutral prosody at
temporal electrodes, pointing toward an enhanced sensory processing of emotionally loaded stimuli. Thus it appears that
aspects of the human voice and prosody, be it emotional or intonational, are processed early in life and that the brain reacts quite
specifically to these aspects in speech (for reviews of auditory
language functions during early infancy, see Friederici, 2006;
Kuhl, 2004).
Although this work has provided important insights, ERP data
cannot provide clear information on the exact brain regions that
are involved in processing prosody in infancy. Studies investigating the brain substrates of infants’ auditory discrimination
abilities by measuring their hemodynamic brain responses
indicate that, already by the age of 2 months, infants display a
left hemispheric advantage for spoken language, whereas music
results in bilateral patterns of activation in the planum temporale
(Dehaene-Lambertz et al., 2009). Furthermore, a right hemispheric advantage for the processing of language prosody in
the temporal cortex can be observed by the age of 3 months
(Homae et al., 2006). These lateralization patterns are quite
similar to those seen in adults (for reviews see Vigneau et al.,
2006; Friederici and Alter, 2004; Koelsch and Siebel, 2005).
However, despite the similar brain lateralization patterns, 2- to
3-month-old infants do not yet show specificity in their brain
responses in temporal cortex. Namely, direct contrasts between
speech and music, mother’s and stranger’s voice (Dehaene-Lambertz et al., 2009), and forward and backward speech
(Dehaene-Lambertz et al., 2002) did not reveal significant differences in 2- to 3-month-olds’ temporal cortex responses. This
suggests that the specialization of temporal brain regions
involved in speech and voice recognition occurs after the age of
3 months.
The present study used near-infrared spectroscopy (NIRS), which permits spatial localization of brain activation by measuring hemodynamic responses, to investigate the neurotopography of voice and emotional prosody processing in young infants (see Minagawa-Kawai et al., 2008; Lloyd-Fox et al., 2010, for reviews of
this method and its use with infants). Other neuroimaging techniques that are well established in adults are limited in their use
with infants because of methodological concerns. For example,
positron emission tomography (PET) exposes participants to
radioisotopes, and functional magnetic resonance imaging
(fMRI) requires the participant to remain very still and exposes
them to a noisy environment. Although both PET and fMRI
have been used with infants, this work is restricted to the study
of sleeping, sedated, or very young infants. NIRS is better suited
for infant research because it can accommodate a good degree
of movement from the infants, enabling them to sit upright on
their parent’s lap and behave relatively freely while watching or
listening to certain stimuli. In addition, unlike PET and fMRI,
NIRS systems are portable. Finally, despite its inferior spatial
resolution, NIRS, like fMRI, measures localized patterns of
hemodynamic responses, thus allowing for a comparison of
infant NIRS data with adult fMRI data (see Strangman et al.,
2002, for evidence of a strong correlation between the hemodynamic responses measured with fMRI and NIRS).
We first investigated voice sensitivity in infants, as voices have
been shown to be processed in specific temporal brain regions in
human adults and nonhuman primates (Petkov et al., 2008, 2009).
In experiment 1, we thus presented 4- and 7-month-old infants
with vocal and nonvocal sounds, in order to examine when
regions in infant temporal cortices become sensitive to the
human voice. We decided to study infants of these ages as prior
work suggests that speech and specific voices (e.g., mother’s
voice) do not yet evoke adult-like specialized temporal brain
responses in younger infants (Dehaene-Lambertz et al., 2009).
Second, we assessed whether the voice-sensitive regions as
identified in experiment 1 were modulated by emotional prosody
(Grandjean et al., 2005; Ethofer et al., 2006). In experiment 2, we
therefore presented 7-month-old infants with happy, angry, and
neutral prosody while measuring their brain responses.
RESULTS
Experiment 1
Our analysis of 7-month-old infants’ brain responses revealed
that three channels in posterior temporal cortex, two located in
the right hemisphere (channels 17 and 22) and one located in
the left hemisphere (channel 3), were sensitive to the human
voice (see Figure 1). These three brain regions showed significant increases in oxygenated hemoglobin (oxyHb) concentration
when the vocal condition was compared to the nonvocal condition (left hemisphere: channel 3: F(1, 15) = 4.782, p = 0.045; right hemisphere: channel 17: F(1, 15) = 5.626, p = 0.032, and channel 22: F(1, 15) = 5.797, p = 0.029). Similar increased activation effects were not obtained in our analysis of 4-month-old infants’ brain responses (see Figure 2). Rather, there was one region in the right hemisphere that showed significant increases in oxyHb when the nonvocal condition was compared to the vocal condition (channel 19: F(1, 15) = 5.07, p = 0.04). For the group of
7-month-olds, no brain regions were found in which the oxyHb
concentration changes were higher in the nonvocal than in the
vocal condition.
Figure 1. Voice-Sensitive Brain Regions Identified in 7-Month-Old Infants in Experiment 1
This graph depicts mean oxygenated hemoglobin concentration changes (±SEM) for vocal and other sounds measured from 24 NIRS channels. Channels that showed significant increases for vocal compared to other sounds are marked in red on the head model.

Figure 2. Brain Responses in 4-Month-Old Infants in Experiment 1
This graph depicts mean oxygenated hemoglobin concentration changes (±SEM) for vocal and other sounds measured from 24 NIRS channels. The channel that showed a significant increase for other sounds compared to vocal sounds is marked in blue on the head model.
The analysis of deoxygenated hemoglobin (deoxyHb) concentration changes revealed no significant differences between
conditions in 4- and 7-month-old infants. The fact that we did
not find any significant decreases in deoxyHb that accompanied
the increase in oxyHb, as one would expect on the basis of adult
work (Obrig and Villringer, 2003), is in line with previous infant
NIRS work (Grossmann et al., 2008; Meek, 2002; Nakato et al.,
2009). Several infant NIRS studies either failed to find a significant
decrease or even observed an increase in deoxyHb concentration. Although a number of factors such as immaturity of the
infant brain have been suggested to explain this difference
between infants and adults, the exact nature of this difference
remains an open question (for a discussion, see Meek, 2002;
Nakato et al., 2009).
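As a concrete illustration of this channel-wise analysis, the following minimal Python sketch (not the authors' code; the data layout and placeholder data are assumptions) compares trial-averaged oxyHb changes between the two conditions for each of the 24 channels. With only two conditions, the repeated-measures ANOVA statistic F(1, 15) is simply the square of the paired t statistic.

```python
# Minimal sketch of a channel-wise vocal vs. nonvocal comparison
# (illustrative only; variable names and data layout are assumptions,
# not the authors' actual pipeline).
import numpy as np
from scipy import stats

n_infants, n_channels = 16, 24
rng = np.random.default_rng(0)

# Assumed layout: per-infant mean oxyHb change per channel and condition,
# i.e., trial averages over the 20 s window after stimulus onset.
oxy_vocal = rng.normal(size=(n_infants, n_channels))     # placeholder data
oxy_nonvocal = rng.normal(size=(n_infants, n_channels))  # placeholder data

for ch in range(n_channels):
    t, p = stats.ttest_rel(oxy_vocal[:, ch], oxy_nonvocal[:, ch])
    # With two conditions, the repeated-measures ANOVA F equals t squared.
    print(f"channel {ch + 1}: F(1, {n_infants - 1}) = {t**2:.3f}, p = {p:.3f}")
```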
Experiment 2
Our analysis revealed two channels in the right hemisphere
(channels 15 and 17) that were sensitive to emotion in 7-month-old infants (see Figure 3). These channels showed significant
differences in oxyHb concentration when emotion (happy, angry,
and neutral prosody) was assessed as a within-subjects factor
in repeated-measures ANOVAs (right hemisphere: channel 15: F(2, 34) = 7.245, p = 0.002; channel 17: F(2, 34) = 4.977, p = 0.013). Of these two channels, channel 17 (located in posterior temporal cortex) had been identified as voice sensitive in experiment 1. Post hoc paired t tests showed that this channel had a significant increase in oxyHb when the angry condition was compared to the happy condition (t(17) = 2.165, p = 0.045) and when the angry condition was compared to the neutral condition (t(17) = 2.289, p = 0.035). Furthermore, channel 17 also showed an increase in oxyHb that was marginally significant when the happy condition was compared to the neutral condition (t(17) = 2.052, p = 0.056). Moreover, channel 15, located in the right inferior frontal cortex, showed a significant increase in oxyHb when the happy condition was compared to the angry condition (t(17) = 2.943, p = 0.009) and when the happy condition was compared to the neutral condition (t(17) = 2.765, p = 0.013), whereas the angry condition was not statistically different from the neutral condition (t(17) = 0.102, p = 0.92). As in experiment 1, the analysis of deoxyHb concentration
changes revealed no significant differences between conditions.
Figure 3. Brain Regions Modulated by Emotional Prosody in Experiment 2
This graph depicts mean oxygenated hemoglobin concentration changes (±SEM) for happy, angry, and neutral prosody measured from 24 NIRS channels. The channel that showed an increased sensitivity to angry prosody is marked in magenta, and the channel that showed increased sensitivity to happy prosody is marked in blue on the head model.
DISCUSSION
The present study investigated the specificity of voice and emotional prosody processing in the infant brain.
Voice Processing
In experiment 1, we found that 7-month-old infants showed
significantly increased hemodynamic responses in left and right
superior temporal cortex to the human voice when compared
to nonvocal sounds. This suggests that voices, as a class of
auditory objects with high occurrence and ecological interest,
are processed in a fairly specialized brain region by 7 months
of age. Strikingly, 4-month-old infants’ temporal regions did
not show a similar voice-sensitive responding in experiment 1,
indicating that voice sensitivity in the posterior temporal cortex
emerges between 4 and 7 months of age. The finding that the
group of younger infants did not show voice-sensitive responding is in line with earlier fMRI work in which 2- to 3-month-olds
failed to show adult-like increased temporal cortex responses
when speech was compared to backward speech (Dehaene-Lambertz et al., 2002) or music (Dehaene-Lambertz et al.,
2009). Instead, 4-month-old infants showed an increased hemodynamic response to nonvocal stimuli in one region in right temporal cortex located more anterior than the
region identified as voice sensitive in 7-month-olds. This finding
suggests that 4-month-olds’ brains are able to discriminate
between the two kinds of auditory stimuli but they seem to be
using different (immature) brain mechanisms for this discrimination, since only 7-month-olds show adult-like increased
responses to the human voice.
The brain region identified as voice sensitive in 7-month-olds
appears to be localized in similar portions of the superior
temporal cortex as in adults (see Belin et al., 2000, and Figure S1
for comparison of localization in adults), indicating developmental continuity in voice processing between 7-month-old
infants and adults. In adults, the voice-sensitive regions for
stimulus material identical to that used in the present experiment
1 were found in the upper bank of the superior temporal sulcus
(Belin et al., 2000). However, the spatial precision in localizing
cortical responses achieved with NIRS in infants is coarser
than the excellent spatial resolution obtained by fMRI used in
previous adult studies (see Aslin and Mehler, 2005; Lloyd-Fox
et al., 2010 for a discussion of the advantages and limitations
of using NIRS with infants). Furthermore, our current measurement technique did not provide us with information about the depth
at which the source of this activation is
located (see Blasi et al., 2007, for NIRS
methodology that allows for the measurement of depth-dependent hemodynamic responses in infants). Therefore,
we cannot assess whether the voice-sensitive regions identified in 7-month-old infants are located in the sulcus or
the gyrus of the superior temporal cortex.
Nevertheless, the functionally similar
brain responses in superior temporal
cortex in infants and adults suggest that the current infant
NIRS results and previous fMRI results with adults represent
homologous brain processes. Taken together with earlier work in nonhuman primates (Petkov et al., 2008), the results of experiment 1, by demonstrating that this brain specialization emerges early during human postnatal development, provide further support for the notion that sensitive responding
to the vocalizations of conspecifics is an evolutionarily important
brain function in primates.
Processing Emotional Prosody
The brain responses to emotional prosody as obtained in experiment 2 are in line with previous adult studies (Grandjean et al.,
2005; Ethofer et al., 2006). Hearing emotional prosody (happy
and angry) but not neutral prosody evoked an increased
response in a right temporal region in 7-month-old infants that
was identified as voice sensitive in experiment 1. This result indicates that the enhancement of sensory processing by emotional
signals is a fundamental and early developing neural mechanism
engaged to prioritize the processing of significant stimuli (Vuilleumier, 2006). It is interesting to note that the brain response
in right temporal cortex was larger to angry prosody when
compared to happy prosody, indicating that threatening signals
have a particularly strong impact on voice processing (see also
Grossmann et al., 2005). This heightened sensitivity to negative
information is in accordance with the notion of a negativity
bias, which is proposed to be an evolutionarily driven propensity
to attend and react more strongly to negative information
(Cacioppo and Berntson, 1999) that appears to emerge in the
second half of the first year of life (see Vaish et al., 2008).
In experiment 2, hearing happy prosody, but not angry or
neutral prosody, evoked an increased response in a region in
right inferior frontal cortex in 7-month-olds that did not show
voice sensitivity in experiment 1. Greater activation to happy
voices than angry voices in right inferior frontal cortex has also
Neuron 65, 852–858, March 25, 2010 ª2010 Elsevier Inc. 855
Neuron
Voice Processing in Human Infants
been observed in adults (Johnstone et al., 2006), suggesting
developmental continuity in how the human brain processes
happy prosody. Current models of prosody processing in adults
(Schirmer and Kotz, 2006; Wiethoff et al., 2008) hold that,
following the acoustic analysis in temporal cortices, information
is passed on to the inferior frontal regions for further and more
detailed evaluation. The finding that 7-month-olds engage right
inferior frontal cortex when listening to happy prosody might
therefore indicate that speech characterized by positive vocal
affect undergoes a more explicit evaluation than speech with
neutral or angry affect.
This finding might also relate to a number of behavioral findings suggesting that infants show strong preferences for infant-directed speech (so-called motherese). Compared to adult-directed speech, motherese possesses unique acoustic
characteristics: it is generally slower and contains exaggerated
pitch contours, hyperarticulation of vowels, and (critical for the
interpretation of the current findings) positive prosody (Fernald,
1985; Kuhl et al., 1997; Cooper and Aslin, 1990). It is also interesting to note that motherese with its happy prosody has been
found to facilitate learning, specifically language and word
learning in the developing infant (Kuhl, 2004; Liu et al., 2003;
Singh et al., 2002; Vallabha et al., 2007). Therefore, in conjunction with these behavioral findings, the inferior frontal response
to happy prosody observed in 7-month-old infants in experiment
2 may constitute the neural basis for a more detailed cognitive
evaluation of infant-directed happy speech.
Role of the Right Hemisphere
Even though voice-sensitive responses were observed in both hemispheres in 7-month-olds in experiment 1, the right hemisphere appeared more responsive to voices than to other sounds. While only one NIRS channel showed a voice-sensitive
response in the left hemisphere, two adjacent voice-sensitive
channels were found in the right hemisphere. Moreover, the
overall magnitude of the responses to voices in the two channels
in the right hemisphere was larger than that in the left hemisphere. The finding that the spatial extent and the magnitude
of the voice-sensitive response were larger in the right hemisphere is in line with adult imaging findings suggesting that
the voice-sensitive responses are predominant in the right
hemisphere (Belin et al., 2000). The modulation of infant brain
responses by emotion observed in experiment 2 was restricted
to the right hemisphere. Similarly, in adult neuroimaging studies,
responses in temporal cortex showed strongest effects of
emotion in the right hemisphere (Grandjean et al., 2005; Ethofer
et al., 2006). In conjunction with some lesion work (Borod et al.,
2002), this has led to the suggestion that the right hemisphere
plays a predominant role in processing emotional prosody (Wildgruber et al., 2002). However, in adults, lesion studies have also
pointed to a contribution of the left hemisphere to the understanding of emotional prosody (Kucharska-Pietura et al., 2003;
Ross et al., 1997; Van Lancker and Sidtis, 1992). But this can
be explained by the fact that in these adult lesion studies meaningful speech stimuli were used, and the left hemisphere is
thought to be involved in the recognition of emotion conveyed
through meaningful speech (Kucharska-Pietura et al., 2003).
The right hemisphere shows a clear dominance for prosodic information once any lexical information is absent in the acoustic stimuli (for a review, see Friederici and Alter, 2004). The current data from 7-month-old infants together with those from adults suggest that voice-sensitive regions in the right hemisphere play an important role in processing emotional prosody.
Implications for Neurodevelopmental Disorders
Finally, these findings might also have important implications
for neurodevelopmental disorders such as autism. Adult participants with autism fail to activate voice-sensitive regions in temporal cortex (Gervais et al., 2004). Furthermore, older children
and adults with autism are impaired in identifying emotion expressed through tone of voice (Hobson et al., 1989; Rutherford
et al., 2002; Van Lancker et al., 1989). Our findings demonstrating that voice-sensitive brain regions are already specialized
and modulated by emotional information by the age of 7 months
raise the possibility that the critical neurodevelopmental processes underlying impaired voice processing in autism might
occur before 7 months. Therefore, in future work the current
approach could be used to assess individual differences in
infants’ responses to voices and emotional prosody and might
thus serve as one of potentially multiple markers that can help
with an early identification of infants at risk for a neurodevelopmental disorder (for example, see Elsabbagh and Johnson,
2007).
EXPERIMENTAL PROCEDURES

Participants
The final sample in experiment 1 consisted of 16 7-month-old infants (eight girls) aged between 201 and 217 days (M = 210.2 days) and 16 4-month-old infants (seven girls) aged between 108 and 135 days (M = 123.1 days). The final sample in experiment 2 consisted of 18 7-month-old infants (eight girls) aged between 199 and 216 days (M = 211.8 days). An additional 26 infants were tested for experiment 1 (4 months: n = 6; 7 months: n = 8) and experiment 2 (7 months: n = 12) but not included in the final sample because they had too many motion artifacts, resulting in too few usable trials for analysis (minimum number of five trials per condition) (n = 18), or because of technical failure (n = 2). Note that an attrition rate at this level is within the normal range for an infant NIRS study (Minagawa-Kawai et al., 2008; Lloyd-Fox et al., 2010). All infants were born full-term (37–42 weeks gestation) and with normal birthweight (>2500 g). All parents gave informed consent before the study.
Stimuli
For experiment 1, the stimulus material consisted of forty 8-s-long trials of vocal and nonvocal sounds (16-bit/22 kHz sampling rate). Vocal trials included speech
(words and nonwords) as well as nonspeech vocalizations, and nonvocal trials
consisted of sounds from nature, animals, modern human environment (cars,
telephone, airplanes), and musical instruments (for more detail, see Belin et al.
[2000] and http://vnl.psy.gla.ac.uk). For experiment 2, the stimulus material
consisted of 74 semantically neutral German verbs previously validated and
used with adults (Schirmer and Kotz, 2006) and with infants (Grossmann
et al., 2005). A female speaker produced all words with happy, angry, and
neutral prosody. Words were taped with a DAT recorder and digitized at a
16-bit/44.1 kHz sampling rate. The three emotions did not differ with respect
to their mean intensity (for further acoustic analysis, see Grossmann et al.,
2005).
Procedures
Infants were seated on their parent’s lap in a dimly lit and sound-attenuated
room. Stimuli were presented via loudspeaker (SPL = 70 dB). In experiment
1, the experimental sessions consisted of 8-s-long trials during which various
vocal or nonvocal sound stimuli were presented consecutively. Voices and
nonvocal sounds were randomly distributed over the session with no more
than two trials of the same category occurring in a row. The intertrial interval
was 12 s. In experiment 2, the experimental session consisted of 5-s-long trials
during which five words of one emotion category (happy, angry, or neutral)
were presented consecutively. Trials from the different emotional categories
were randomly distributed over the session with no more than two trials of
the same category occurring consecutively. The intertrial interval was 15 s.
During the presentation of the acoustic stimuli, a cartoon was presented to
the infants on a computer screen placed at a 60 cm distance in order to
keep their attention and reduce motion artifacts. The experimental session
lasted on average 7 min, 20 s (average number of trials = 22).
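To make the presentation constraint concrete, here is a minimal Python sketch of one way to generate such a trial order (illustrative only; the category labels, trial counts, and rejection-sampling approach are assumptions, not the authors' presentation software):

```python
# Sketch of a pseudo-random trial order with no more than two
# consecutive trials of the same category (illustrative only).
import random

def make_sequence(trials_per_category, categories, max_run=2, seed=None):
    rng = random.Random(seed)
    pool = [c for c in categories for _ in range(trials_per_category)]
    while True:  # rejection sampling: reshuffle until the constraint holds
        rng.shuffle(pool)
        runs_ok = all(
            len(set(pool[i:i + max_run + 1])) > 1
            for i in range(len(pool) - max_run)
        )
        if runs_ok:
            return pool

# Experiment 2: happy, angry, and neutral trials, roughly 22 per session.
print(make_sequence(8, ["happy", "angry", "neutral"], seed=1))
```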
Data Acquisition and Analysis
In both experiments, cortical activation was measured using a Hitachi ETG-4000 NIRS system. The multichannel system uses two wavelengths at 695 nm and 830 nm. Two custom-built arrays consisting of nine optodes (five sources, four detectors) in a 12-channel (source-detector pairs) arrangement with an interoptode separation of 20 mm were placed over temporal and inferior frontal brain regions on each hemisphere (see Figures 1–3) using an Easycap (Falk Minow). The NIRS method relies on the optical determination of changes in oxygenated (oxyHb) and deoxygenated (deoxyHb) hemoglobin concentrations in cerebral cortex, which result from increased regional cerebral blood flow (Obrig and Villringer, 2003).
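For reference, the conversion from measured light attenuation to these concentration changes is conventionally done via the modified Beer-Lambert law. The methods above do not spell this out, so the following standard two-wavelength formulation is included only as an illustration:

\[
\Delta A(\lambda) = \left[ \varepsilon_{\mathrm{HbO_2}}(\lambda)\, \Delta c_{\mathrm{HbO_2}} + \varepsilon_{\mathrm{HbR}}(\lambda)\, \Delta c_{\mathrm{HbR}} \right] d \cdot \mathrm{DPF}(\lambda)
\]

where \(\Delta A(\lambda)\) is the attenuation change at wavelength \(\lambda\), \(\varepsilon\) are the hemoglobin extinction coefficients, \(d\) is the source-detector separation (20 mm here), and DPF is the differential pathlength factor. Measuring at two wavelengths (695 and 830 nm) yields two such equations, which can be solved for \(\Delta c_{\mathrm{HbO_2}}\) (oxyHb) and \(\Delta c_{\mathrm{HbR}}\) (deoxyHb).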
NIRS data were continuously sampled at 10 Hz. For analysis, after calculation of the hemoglobin concentration changes, pulse-related signal changes and overall trends were eliminated by low-pass filtering (Butterworth, 5th order, cutoff 0.5 Hz). Movement artifacts were corrected by an established procedure (see Koch et al., 2006; Wartenburger et al., 2007), which allows marking of artifacts and then padding the contaminated data segments by linear interpolation. Cortical activations were assessed statistically by comparing average concentration changes (oxyHb and deoxyHb) within trials (20 s after stimulus onset) between the experimental conditions using repeated-measures ANOVAs.
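A simplified sketch of this preprocessing chain, assuming the hemoglobin concentration time series have already been computed and artifact segments have been pre-marked (again an illustrative stand-in, not the authors' pipeline):

```python
# Simplified sketch of the preprocessing described above
# (illustrative stand-in, not the authors' pipeline).
import numpy as np
from scipy.signal import butter, filtfilt

FS = 10.0  # sampling rate in Hz

def lowpass(signal, cutoff=0.5, order=5, fs=FS):
    """5th-order Butterworth low-pass to suppress pulse-related changes."""
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, signal)

def interpolate_artifacts(signal, artifact_mask):
    """Pad marked (contaminated) segments by linear interpolation."""
    x = np.arange(len(signal))
    clean = ~artifact_mask
    return np.interp(x, x[clean], signal[clean])

def trial_average(signal, onsets, fs=FS, window_s=20.0):
    """Mean concentration change in the 20 s after each stimulus onset."""
    n = int(window_s * fs)
    return np.mean([signal[o:o + n].mean() for o in onsets])

# Example: one channel's oxyHb time series with a marked artifact segment.
oxy = np.random.default_rng(0).normal(size=3000)  # placeholder data
mask = np.zeros_like(oxy, dtype=bool)
mask[100:120] = True
oxy = interpolate_artifacts(oxy, mask)
oxy = lowpass(oxy)
print(trial_average(oxy, onsets=[200, 600, 1000]))
```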
SUPPLEMENTAL INFORMATION

Supplemental Information includes a supplemental figure related to Figure 1 and can be found with this article online at doi:10.1016/j.neuron.2010.03.001.

ACKNOWLEDGMENTS

T.G. was supported by a Sir Henry Wellcome Postdoctoral Fellowship awarded by the Wellcome Trust (082659/Z/07/Z).

Accepted: February 13, 2010
Published: March 24, 2010

REFERENCES

Aslin, R.N., and Mehler, J. (2005). Near-infrared spectroscopy for functional studies of brain activity in human infants: promise, prospects, and challenges. J. Biomed. Opt. 10, 11009.

Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P., and Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature 403, 309–312.

Belin, P., Fecteau, S., and Bédard, C. (2004). Thinking the voice: neural correlates of voice perception. Trends Cogn. Sci. 8, 129–135.

Blasi, A., Fox, S., Everdell, N., Volein, A., Tucker, L., Csibra, G., Gibson, A.P., Hebden, J.C., Johnson, M.H., and Elwell, C.E. (2007). Investigation of depth dependent changes in cerebral haemodynamics during face perception in infants. Phys. Med. Biol. 52, 6849–6864.

Borod, J.C., Bloom, R.L., Brickman, A.M., Nakhutina, L., and Curko, E.A. (2002). Emotional processing deficits in individuals with unilateral brain damage. Appl. Neuropsychol. 9, 23–36.

Cacioppo, J.T., and Berntson, G.G. (1999). The affect system: architecture and operating characteristics. Curr. Dir. Psychol. Sci. 8, 133–137.

Cooper, R.P., and Aslin, R.N. (1990). Preference for infant-directed speech in the first month after birth. Child Dev. 61, 1584–1595.

DeCasper, A.J., and Fifer, W.P. (1980). Of human bonding: newborns prefer their mothers’ voices. Science 208, 1174–1176.

Dehaene-Lambertz, G., Dehaene, S., and Hertz-Pannier, L. (2002). Functional neuroimaging of speech perception in infants. Science 298, 2013–2015.

Dehaene-Lambertz, G., Montavont, A., Jobert, A., Allirol, L., Dubois, J., Hertz-Pannier, L., and Dehaene, S. (2009). Language or music, mother or Mozart? Structural and environmental influences on infants’ language networks. Brain Lang., in press. Published online October 27, 2009. doi:10.1016/j.bandl.2009.09.003.

Ecklund-Flores, L., and Turkewitz, G. (1996). Asymmetric headturning to speech and nonspeech in human newborns. Dev. Psychobiol. 29, 205–217.

Elsabbagh, M., and Johnson, M.H. (2007). Infancy and autism: progress, prospects, and challenges. Prog. Brain Res. 164, 355–383.

Ethofer, T., Anders, S., Wiethoff, S., Erb, M., Herbert, C., Saur, R., Grodd, W., and Wildgruber, D. (2006). Effects of prosodic emotional intensity on activation of associative auditory cortex. Neuroreport 17, 249–253.

Fernald, A. (1985). Four-month-olds prefer to listen to motherese. Infant Behav. Dev. 8, 181–195.

Flom, R., and Bahrick, L.E. (2007). The development of infant discrimination of affect in multimodal and unimodal stimulation: the role of intersensory redundancy. Dev. Psychol. 43, 238–252.

Friederici, A.D. (2006). The neural basis of language development and its impairment. Neuron 52, 941–952.

Friederici, A.D., and Alter, K. (2004). Lateralization of auditory language functions: a dynamic dual pathway model. Brain Lang. 89, 267–276.

Friederici, A.D., Friedrich, M., and Christophe, A. (2007). Brain responses in 4-month-old infants are already language specific. Curr. Biol. 17, 1208–1211.

Gervais, H., Belin, P., Boddaert, N., Leboyer, M., Coez, A., Sfaello, I., Barthélémy, C., Brunelle, F., Samson, Y., and Zilbovicius, M. (2004). Abnormal cortical voice processing in autism. Nat. Neurosci. 7, 801–802.

Grandjean, D., Sander, D., Pourtois, G., Schwartz, S., Seghier, M.L., Scherer, K.R., and Vuilleumier, P. (2005). The voices of wrath: brain responses to angry prosody in meaningless speech. Nat. Neurosci. 8, 145–146.

Grossmann, T., Striano, T., and Friederici, A.D. (2005). Infants’ electric brain responses to emotional prosody. Neuroreport 16, 1825–1828.

Grossmann, T., Johnson, M.H., Lloyd-Fox, S., Blasi, A., Deligianni, F., Elwell, C., and Csibra, G. (2008). Early cortical specialization for face-to-face communication in human infants. Proc. R. Soc. Lond. B Biol. Sci. 275, 2803–2811.

Hobson, R.P., Ouston, J., and Lee, A. (1989). Naming emotion in faces and voices: abilities and disabilities in autism and mental retardation. Br. J. Dev. Psychol. 7, 237–250.

Homae, F., Watanabe, H., Nakano, T., Asakawa, K., and Taga, G. (2006). The right hemisphere of sleeping infant perceives sentential prosody. Neurosci. Res. 54, 276–280.

Hutt, S.J., Hutt, C., Lenard, H.G., von Bernuth, H., and Muntjewerff, W.F. (1968). Auditory responsivity in the human neonate. Nature 218, 888–890.

Johnstone, T., van Reekum, C.M., Oakes, T.R., and Davidson, R.J. (2006). The voice of emotion: an fMRI study of neural responses to angry and happy vocal expressions. Soc. Cogn. Affect. Neurosci. 1, 242–249.

Kanwisher, N., McDermott, J., and Chun, M.M. (1997). The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311.

Koch, S.P., Steinbrink, J., Villringer, A., and Obrig, H. (2006). Synchronization between background activity and visually evoked potential is not mirrored by focal hyperoxygenation: implications for the interpretation of vascular brain imaging. J. Neurosci. 26, 4940–4948.

Koelsch, S., and Siebel, W.A. (2005). Towards a neural basis of music perception. Trends Cogn. Sci. 9, 578–584.

Kucharska-Pietura, K., Phillips, M.L., Gernand, W., and David, A.S. (2003). Perception of emotions from faces and voices following unilateral brain damage. Neuropsychologia 41, 963–970.

Kuhl, P.K. (2004). Early language acquisition: cracking the speech code. Nat. Rev. Neurosci. 5, 831–843.

Kuhl, P.K., Andruski, J.E., Chistovich, I.A., Chistovich, L.A., Kozhevnikova, E.V., Ryskina, V.L., Stolyarova, E.I., Sundberg, U., and Lacerda, F. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science 277, 684–686.

Liu, H.M., Kuhl, P.K., and Tsao, F.M. (2003). An association between mothers’ speech clarity and infants’ speech discrimination skills. Dev. Sci. 6, 1–10.

Lloyd-Fox, S., Blasi, A., and Elwell, C.E. (2010). Illuminating the developing brain: the past, present and future of functional near-infrared spectroscopy. Neurosci. Biobehav. Rev. 34, 269–284.

Mastropieri, D., and Turkewitz, G. (1999). Prenatal experience and neonatal responsiveness to vocal expressions of emotion. Dev. Psychobiol. 35, 204–214.

Meek, J. (2002). Basic principles of optical imaging and application to the study of infant development. Dev. Sci. 5, 371–380.

Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J., and Amiel-Tison, C. (1988). A precursor of language acquisition in young infants. Cognition 29, 143–178.

Minagawa-Kawai, Y., Mori, K., Hebden, J.C., and Dupoux, E. (2008). Optical imaging of infants’ neurocognitive development: recent advances and perspectives. Dev. Neurobiol. 68, 712–728.

Moon, C., Cooper, R.P., and Fifer, W. (1993). Two-day-olds prefer their native language. Infant Behav. Dev. 16, 495–500.

Nakato, E., Otsuka, Y., Kanazawa, S., Yamaguchi, M.K., Watanabe, S., and Kakigi, R. (2009). When do infants differentiate profile face from frontal face? A near-infrared spectroscopic study. Hum. Brain Mapp. 30, 462–472.

Obrig, H., and Villringer, A. (2003). Beyond the visible—imaging the human brain with light. J. Cereb. Blood Flow Metab. 23, 1–18.

Petkov, C.I., Kayser, C., Steudel, T., Whittingstall, K., Augath, M., and Logothetis, N.K. (2008). A voice region in the monkey brain. Nat. Neurosci. 11, 367–374.

Petkov, C.I., Logothetis, N.K., and Obleser, J. (2009). Where are the human speech and voice regions, and do other animals have anything like them? Neuroscientist 15, 419–429.

Ross, E.D., Thompson, R.D., and Yenkosky, J. (1997). Lateralization of affective prosody in brain and the callosal integration of hemispheric language functions. Brain Lang. 56, 27–54.

Rutherford, M.D., Baron-Cohen, S., and Wheelwright, S. (2002). Reading the mind in the voice: a study with normal adults and adults with Asperger syndrome and high functioning autism. J. Autism Dev. Disord. 32, 189–194.

Scherer, K.R. (1986). Vocal affect expression: a review and a model for future research. Psychol. Bull. 99, 143–165.

Schirmer, A., and Kotz, S.A. (2006). Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends Cogn. Sci. 10, 24–30.

Singh, L., Morgan, J., and Best, C. (2002). Infants’ listening preferences: baby talk or happy talk? Infancy 3, 365–394.

Strangman, G., Culver, J.P., Thompson, J.H., and Boas, D.A. (2002). A quantitative comparison of simultaneous BOLD fMRI and NIRS recordings during functional brain activation. Neuroimage 17, 719–731.

Tsao, D.Y., Freiwald, W.A., Tootell, R.B.H., and Livingstone, M.S. (2006). A cortical region consisting entirely of face-selective cells. Science 311, 670–674.

Vaish, A., Grossmann, T., and Woodward, A. (2008). Not all emotions are created equal: the negativity bias in social-emotional development. Psychol. Bull. 134, 383–403.

Vallabha, G.K., McClelland, J.L., Pons, F., Werker, J.F., and Amano, S. (2007). Unsupervised learning of vowel categories from infant-directed speech. Proc. Natl. Acad. Sci. USA 104, 13273–13278.

Van Lancker, D., and Sidtis, J.J. (1992). The identification of affective-prosodic stimuli by left- and right-hemisphere-damaged subjects: all errors are not created equal. J. Speech Hear. Res. 35, 963–970.

Van Lancker, D.R., Cornelius, C., and Kreiman, J. (1989). Recognition of emotional-prosodic meanings in speech by autistic, schizophrenic, and normal children. Dev. Neuropsychol. 5, 207–226.

Vigneau, M., Beaucousin, V., Hervé, P.Y., Duffau, H., Crivello, F., Houdé, O., Mazoyer, B., and Tzourio-Mazoyer, N. (2006). Meta-analyzing left hemisphere language areas: phonology, semantics, and sentence processing. Neuroimage 30, 1414–1432.

Vuilleumier, P. (2006). How brains beware: neural mechanisms of emotional attention. Trends Cogn. Sci. 9, 585–594.

Walker-Andrews, A.S. (1997). Infants’ perception of expressive behaviors: differentiation of multimodal information. Psychol. Bull. 121, 1–20.

Wartenburger, I., Steinbrink, J., Telkemeyer, S., Friedrich, M., Friederici, A.D., and Obrig, H. (2007). The processing of prosody: evidence of interhemispheric specialization at the age of four. Neuroimage 34, 416–425.

Wiethoff, S., Wildgruber, D., Kreifelts, B., Becker, H., Herbert, C., Grodd, W., and Ethofer, T. (2008). Cerebral processing of emotional prosody—influence of acoustic parameters and arousal. Neuroimage 39, 885–893.

Wildgruber, D., Pihan, H., Ackermann, H., Erb, M., and Grodd, W. (2002). Dynamic brain activation during processing of emotional intonation: influence of acoustic parameters, emotional valence, and sex. Neuroimage 15, 856–869.