The co-occurrence of multisensory facilitation and cross

J Neurophysiol 106: 2896 –2909, 2011.
First published August 31, 2011; doi:10.1152/jn.00303.2011.
The co-occurrence of multisensory facilitation and cross-modal conflict in the
human brain
Andreea Oliviana Diaconescu, Claude Alain, and Anthony Randal McIntosh
1
Rotman Research Institute, Baycrest Centre, Toronto; and 2Department of Psychology, University of Toronto, Ontario, Canada
Submitted 5 April 2011; accepted in final form 26 August 2011
multisensory integration; auditory; visual; magnetoencephalography;
event-related synthetic aperture magnetometry analysis
PREVIOUS EXAMINATIONS OF MULTISENSORY processes demonstrated that perception by means of distinct sensory channels
not only contributes to the richness of each sensory experience
but also improves object detection and recognition (for a
review, see Doehrmann and Naumer 2008). Simple low-level
stimulus properties, including temporal correspondence and
spatial congruence, mediate multisensory facilitation in auditory and visual modalities. Indeed, a stimulus presented in one
sensory modality can facilitate the detection of a spatially
coincident stimulus presented in a distinct sensory modality.
Address for reprint requests and other correspondence: A. O. Diaconescu,
Laboratory for Social and Neural Systems Research, Dept. of Economics, Univ. of
Zürich, Blümlisalpstrasse 10, CH-8006, Zürich, Switzerland (e-mail:
[email protected]).
2896
McDonald et al. (2000) used an auditory cross-modal cueing
paradigm to demonstrate that the co-occurrence of an irrelevant
sound decreased the speed of detection of a subsequent light
when the target light appeared in the same location as the
sound. In addition, Diederich and Colonius (2004) provided
evidence of faster response times (RTs) to monochromatic
lights when they were accompanied by simple tones presented
simultaneously or at short intervals before or after target
stimuli. Therefore, a temporally synchronized stimulus in one
sensory modality facilitates the detectability of another stimulus presented in another sensory modality.
Behavioral facilitation in multisensory contexts capitalizes
on the spatial and temporal properties of cross-modal, audiovisual (AV) stimuli as well as their semantic relatedness. The
critical stimuli used in the present study included complex
sounds and semantically related black-and-white line drawings
of animate and inanimate objects. We examined the neural
processes that mediate the associations between familiar auditory and visual stimuli, which can be referred to as semantic
multisensory integration resembling naturalistic situations of
multisensory stimulation. Multisensory integration, in the present context, refers to the neural process by which unisensory
signals are combined to form a unique signal that is specifically
associated with the cross-modal stimulus. It is operationally
defined as a multisensory response, both neural and behavioral,
that is significantly distinct from the sum of the responses
evoked by the modality-specific component stimuli (cf. Stein et
al. 2010).
Whereas integration of semantically congruent cross-modal
stimuli speeds up object detection and recognition, crossmodal conflicts impair performance (Chen and Spence 2010).
Previous research demonstrated the presence of visual dominance during cross-modal conflicts, with ambiguous visual
inputs impairing the recognition of complex sounds more than
ambiguous auditory information impaired visual object recognition (Laurienti et al. 2004; Yuval-Greenberg and Deouell
2009). Furthermore, studies from Spence and colleagues
(Hartcher-O’Brien et al. 2008; Koppen et al. 2008; Koppen and
Spence 2007a, 2007b; Sinnett et al. 2007, 2008) reported
longer RTs following cross-modal stimuli when participants
were required to detect the presence of simple tones embedded
within cross-modal presentations. These findings further support the interference of the more dominant visual sensory
modality on the detection of auditory targets.
To date, few studies have examined the effects of both
multisensory facilitation and competition on performance. Sinnett et al. (2008) showed that presentations of complex sounds
alongside visual objects can either speed up or slow down RTs
depending on task demands. Participants were significantly
0022-3077/11 Copyright © 2011 the American Physiological Society
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
Diaconescu AO, Alain C, McIntosh AR. The co-occurrence of
multisensory facilitation and cross-modal conflict in the human brain.
J Neurophysiol 106: 2896 –2909, 2011. First published August 31,
2011; doi:10.1152/jn.00303.2011.—Perceptual objects often comprise a visual and auditory signature that arrives simultaneously
through distinct sensory channels, and cross-modal features are linked
by virtue of being attributed to a specific object. Continued exposure
to cross-modal events sets up expectations about what a given object
most likely “sounds” like, and vice versa, thereby facilitating object
detection and recognition. The binding of familiar auditory and visual
signatures is referred to as semantic, multisensory integration.
Whereas integration of semantically related cross-modal features is
behaviorally advantageous, situations of sensory dominance of one
modality at the expense of another impair performance. In the present
study, magnetoencephalography recordings of semantically related
cross-modal and unimodal stimuli captured the spatiotemporal patterns underlying multisensory processing at multiple stages. At early
stages, 100 ms after stimulus onset, posterior parietal brain regions
responded preferentially to cross-modal stimuli irrespective of task
instructions or the degree of semantic relatedness between the auditory and visual components. As participants were required to classify
cross-modal stimuli into semantic categories, activity in superior
temporal and posterior cingulate cortices increased between 200 and
400 ms. As task instructions changed to incorporate cross-modal
conflict, a process whereby auditory and visual components of crossmodal stimuli were compared to estimate their degree of congruence,
multisensory processes were captured in parahippocampal, dorsomedial, and orbitofrontal cortices 100 and 400 ms after stimulus onset.
Our results suggest that multisensory facilitation is associated with
posterior parietal activity as early as 100 ms after stimulus onset.
However, as participants are required to evaluate cross-modal stimuli
based on their semantic category or their degree of congruence,
multisensory processes extend in cingulate, temporal, and prefrontal
cortices.
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
J Neurophysiol • VOL
occipital complex (LOC), which has been previously linked to
visual object recognition (Malach et al. 1995).
Electrophysiological research of semantic cross-modal conflicts has focused primarily on conflicts regarding the phonetic
perception of speech. Whereas congruent articulatory gestures
significantly speed up auditory processes related to speech (van
Wassenhove et al. 2005), incompatible visual input can greatly
impair speech perception. The most documented case of crossmodal conflict during speech is the McGurk effect (McGurk
and MacDonald 1976), in which incongruent or temporally
asynchronous visual representations of speech modify the auditory percept phonetically (McGurk and MacDonald 1976).
Several MEG studies (e.g., Kaiser et al. 2005; Nishitani and
Hari 2002) showed oscillatory gamma-band activity in response to McGurk-like stimuli beginning in the posterior
parietal scalp region at 160 ms and extending over occipital
and temporal-to-inferior frontal regions between 200 and 320
ms. To determine the brain regions that are chiefly involved in
multisensory integration during speech, previous studies used
transcranial magnetic stimulation (TMS) and temporally perturbed task-related neural activity. Disruption of posterior
parietal activity 200 ms after stimulus onset impaired AV
integration of tones and shapes (Pourtois and de Gelder 2002).
In addition, TMS applied to the superior temporal sulcus (STS)
significantly reduced the McGurk effect within 100 ms after
auditory syllable onset (Beauchamp et al. 2010).
With respect to semantically incongruent combinations of
nonspeech sounds and visual objects, large negative amplitude
deflections were captured 390 ms after stimulus onset (Molholm et al. 2004). This ERP pattern resembled the N400
component, which was previously implicated in semantic mismatch processing of objects in linguistic contexts (Kutas and
Federmeier 2000). Although the effects of cross-modal incongruence were not explicitly examined, recent functional magnetic resonance imaging (fMRI) studies suggested that cingulate and inferior frontal cortices were recruited during semantic
conflicts (Laurienti et al. 2003; Van Petten and Luka 2006).
In the present study, we investigated the effects of both
multisensory facilitation and semantic cross-modal conflicts in
sensory-specific and modality-independent brain regions. We
predicted that the integrated neural signal would be larger and,
most importantly, that multisensory processes would engage a
distinct set of brain regions than each of the responses evoked
by the modality-specific component stimuli. We considered it
to be a principled prediction that multisensory integration
processes are not simply the linear combination of unisensory
responses. Recent multiple cell recording studies (Molholm et
al. 2006) and functional neuroimaging studies in humans
(Baumann and Greenlee 2007; Bishop and Miller 2008; Calvert et al. 2001; Grefkes et al. 2002; Macaluso et al. 2004;
Moran et al. 2008) showed that cross-modal stimuli not only
elicited increased activity in sensory-specific cortices but also
activated a distinct network of posterior parietal brain regions,
including the inferior parietal sulcus (IPS), the inferior parietal
lobule (IPL), and the superior parietal lobule (SPL).
All conditions used in the present study included both
unimodal and cross-modal presentations of familiar complex
sounds and visual objects. Behavioral multisensory facilitation
effects that capitalize on the semantic relatedness of crossmodal stimuli were assessed by asking participants to categorize stimuli based on their animacy (i.e., living vs. nonliving
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
faster to detect visual targets when they were embedded within
cross-modal presentations than when they were presented
alone. However, when required to identify complex sounds that
were either presented in isolation or as cross-modal pairs,
participants were significantly slower to respond to crossmodal stimuli compared with unimodal ones.
Given the behavioral advantages of multisensory integration
and the performance decreases following cross-modal conflicts, more research is required to determine whether a common network of brain regions is responsible for processing
cross-modal conflict compared with multisensory facilitation
processes. Furthermore, more research is required to determine
the extent to which multisensory processes are modulated by
the degree of semantic correspondence between visual and
auditory stimuli. Thus, in the present study, we used magnetoencephalography (MEG) recordings to investigate the neural
mechanisms underlying both multisensory facilitation and
cross-modal conflict during the presentation of complex sounds
and semantically related black-and-white line drawings of
animate and inanimate objects.
Noninvasive electrophysiological studies in humans that
used semantically related AV stimuli and contrasted crossmodal with unimodal stimulus presentations demonstrated
modulations of both early and late event-related related potentials (ERPs) over both sensory-specific and modality-independent cortical areas (e.g., Molholm et al. 2004; Yuval-Greenberg and Deouell 2007). For instance, recent studies found
increases in the visual N1 waveform at occipital sites for
cross-modal presentations compared with unimodal ones (Molholm et al. 2004; Yuval-Greenberg and Deouell 2007). YuvalGreenberg and Deouell (2007) also examined oscillatory gamma-band activity (30 –70 Hz) in response to cross-modal stimuli using a name verification task of naturalistic objects. Early
and late gamma-band activity at 90 and 260 ms, respectively,
was associated with low-level feature integration and higher
level object representation. Similarly, using a perceptual decision task, Schneider et al. (2008) found enhanced gamma-band
activity between 120 and 180 ms in response to semantically
congruent relative to semantically incongruent cross-modal
pairs of common objects.
There is increasing evidence that meaning and semantic
relatedness play an important role in multisensory integration.
For instance, Senkowski et al. (2007) compared the effects of
natural and abstract stimuli reflecting motion. Participants were
presented with a random stream of naturalistic and abstract
video clips and static target stimuli. Each stimulus class consisted of unimodal auditory, unimodal visual, and cross-modal
presentations. The participants’ task was to count the target
stimuli embedded within the natural and abstract stimulus
classes. Only the former showed evidence of early multisensory integration within 120 ms after stimulus onset. Furthermore, naturalistic motion represented by AV presentations
engaged occipital, temporal, and frontal regions.
Murray et al. (2004), on the other hand, demonstrated the
efficacy of cross-modal stimuli on subsequent memory recall
by showing that visual images were more likely to be remembered when they were previously presented along with semantically related complex sounds than when they were presented
only in the visual modality. Cross-modal presentations, in this
case, led to increases in cortical activity in the right lateral
2897
2898
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
objects). Conversely, both behavioral and neural responses to
cross-modal conflicts were examined by requiring participants
to detect the level of semantic correspondence between auditory and visual stimuli embedded within cross-modal presentations that were either semantically congruent or incongruent.
We were particularly interested to compare multisensory integration processes in conditions of semantic congruence and
behavioral facilitation to multisensory processes in conditions
containing cross-modal semantic conflict and reductions in
performance.
MATERIALS AND METHODS
Participants
Stimuli
Stimuli were selected to have semantically congruent auditory and
visual representations. Two types of stimuli, animate and inanimate,
were used in the study. Items were selected from four distinct
categories: 1) animals, 2) musical instruments, 3) automobiles, and 4)
household objects. The first category of stimuli was labeled as
“animate,” whereas the remaining three categories were considered
“inanimate” objects.
Black-and-white line drawings, selected from the Snodgrass and
Vanderwart (1980), served as the visual stimuli. All visual stimuli
were matched according to size (in pixels), brightness, and contrast.
Participants sat in an upright position and viewed the visual stimuli on
a back-projection screen that subtended ⬃30 degrees of visual angle
when they were seated 70 cm from the screen. Semantically related
nonspeech, complex sounds were matched in terms of loudness by
computing the mean root mean square value. Each complex sound
was assigned the mean amplitude; thus louder sounds were reduced,
whereas softer ones were amplified. Complex sounds were delivered
binaurally at an intensity level of 60 dB HL based on the audiometric
mean across both ears. Binaural auditory stimuli were presented via an
OB 822 Clinical Audiometer through ER30 transducers (Etymotic
Research, Elk Grove, IL) and connected with a 1.5-m length of
matched plastic tubing and foam earplugs to the participants’ ears.
Complex sounds and line drawings were paired to create crossmodal congruent stimulus combinations in which the AV stimuli
matched semantically (e.g., picture of a lion paired with the sound of
a roar or picture of an ambulance car paired with a siren). Incongruent
cross-modal stimulus combinations, on the other hand, were created
by randomly pairing complex sounds with visual objects from distinct
categories; thus a complex sound or a picture from the animate
category was always paired with a picture or complex sound from the
inanimate category. In summary, four stimulus types were employed:
1) auditory unimodal (A), 2) visual unimodal (V), 3) cross-modal
congruent (AV⫹), and 4) cross-modal incongruent (AV⫺).
To ensure that the complex sounds were easily nameable and
identifiable, we assessed accuracy values and RTs for each stimulus
exemplar in an initial behavioral pilot. Five young adults (mean age:
26 yr) participated in this initial pilot. Complex sounds were excluded
if detection accuracy levels fell below 75% and RTs exceeded 2 SD
above the mean RT values for each individual subject. Furthermore,
after behavioral testing, we also asked participants to rate complex
J Neurophysiol • VOL
Procedure
Each stimulus or stimulus pair was presented for 400 ms; for the
auditory stimuli, the 400-ms interval also included a 5-ms fall and rise
time. The time interval between the end of the stimulus presentation
and the beginning of the next trial varied between 2 and 4 s (equiprobable). See Fig. 1 for an illustration of the paradigm.
Multisensory facilitation, the RT gain following semantically congruent AV stimuli, was assessed using a simple detection and a feature
classification task. During the detection task, participants were instructed to respond (left index finger response) as quickly as possible
to any trial type: unimodal A, unimodal V, cross-modal congruent
(AV⫹), and cross-modal incongruent (AV⫺). Forty random presentations of each stimulus type and a total of 160 trials were used in this
condition. Therefore, because we used 30 exemplars for each trial
type, certain auditory or visual stimuli were presented more than once.
In the semantic classification task, participants were required to
make animacy or inanimacy judgments following unimodal A, unimodal V, and cross-modal congruent AV⫹ trial types (left index and
middle finger responses for animate and inanimate judgments, respectively). The correspondence between the response keys and the type of
stimulus presentation was counterbalanced between participants.
Table 1. List of auditory and visual stimuli used in the experiment
Animate
Cow
Pig
Sheep
Lamb
Dog
Cat
Horse
Donkey
Chicken
Turkey
Rooster
Carnival
Sparrow
Goose
Owl
Penguin
Whale
Dolphin
Fly
Cricket
Frog
Snake
Crocodile
Lion
Bear
Wolf
Tiger
Puma
Elephant
Monkey
106 • DECEMBER 2011 •
www.jn.org
Inanimate
Musical instruments
Violin
Guitar
Harp
Banjo
Trumpet
Bell
Clarinet
Piano
Organ
Drum
Cymbal
Automobiles/transportation
Train
Ship
Car
Ambulance
Fire truck
Motorcycle
icycle
Airplane
Helicopter
Household objects
Frying pan
Fork
Sink
Bottle
Hairdryer
Door
Telephone
Tea kettle
Camera
Clock
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
Eighteen young adults (9 males, ages 19 –25 yr, mean ⫾ SD: 24.17 ⫾
3.9 yr), all right-handed with healthy neurological histories, normal to
corrected-to-normal vision and hearing, and an average of 16.39 yr of
education, participated in this study. The study was approved by the
joint Baycrest Centre-University of Toronto Research Ethics Committee, and the rights and privacy of the participants were observed.
All participants gave formal informed consent before the experiment
and received monetary compensation for participation.
sounds based on their recognizability. On the basis of the behavioral
findings and the postexperiment questionnaire results, we excluded
several complex sounds along with their visual counterparts. Thus, for
each animate or inanimate category, 30 different exemplars from each
sensory modality (auditory and visual) were selected because they
were unambiguously categorized. See Table 1 for a complete list of
the visual and auditory stimuli used in the experiment. In total, 60
animate stimuli (30 auditory and 30 visual) and 60 inanimate stimuli
(30 auditory and 30 visual) were used.
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
2899
system for preprocessing and analysis of the MEG data. Three
participants were eliminated from the analysis due to excessive head
motion during the MEG scan. Thus MEG data from a total of 15
participants were preprocessed and subsequently analyzed.
Neuromagnetic activity was sampled at a rate of 1,250 Hz and was
recorded continuously in 6 experimental blocks (i.e., 3 experimental
conditions repeated once) of 15 min recording time each. Participants
took a minimum 5-min break between each block. We divided the
conditions into 15-min blocks to maximize the total number of trials
for each condition while also keeping the blocks short to minimize the
amount of head motion in the MEG scanner. The simple detection
condition was always presented first; however, the order of the rest of
the experimental condition blocks was randomly assigned to each
participant.
MEG Preprocessing
Forty randomized presentations of each stimulus type and a total of
120 trials were used in this condition.
To assess cross-modal conflicts, we asked participants to indicate
whether auditory and visual signatures within cross-modal presentations were semantically congruent or not. As such, in the congruency
task, participants discriminated between congruent and incongruent
cross-modal presentations by pressing one of two responses (left
index and middle finger response keys). Forty random presentations of
each stimulus type were used. The correspondence between the
response keys and the type of stimulus presentation was counterbalanced between participants to control for any finger dominance effects
on response latencies.
Presentation software (version 10.3; Neurobehavioral Systems, San
Francisco, CA; http://www.neurobs.com/) was used to control visual
and auditory stimulus delivery and to record participants’ response
latency and accuracy.
MEG Recordings
The MEG was recorded in a magnetically shielded room at the
Rotman Research Institute, Baycrest Centre, using a 151-channel
whole head neuromagnetometer (OMEGA; VSM Medtech, Vancouver, Canada). MEG collection was synchronized to the onset of each
visual stimulus by recording the luminance changes of the screen with
a photodiode. With respect to the auditory stimuli, the MEG data
collection was synchronized to the onset of the auditory sound
envelope.
Participants’ head positions within the MEG scanner were determined at the start and at the end of each recording block using
indicator coils placed on nasion and bilateral preauricular points.
These fiducial points established a head-based Cartesian coordinate
J Neurophysiol • VOL
MEG Data Analysis
Event-related SAM analysis. The “nonlinear beamformer” or synthetic aperture magnetometry (SAM) technique was used to model the
source of the measured magnetic field (Robinson and Vrba 1998;
Sekihara et al. 2001). SAM minimizes power or the variance of the
measured MEG signals such that signals emitted from sources outside
each specified voxel are suppressed (Brookes et al. 2007; Cheyne et
al. 2007). This enables one to display simultaneously active sources at
multiple sites, provided that they are not perfectly synchronized. To
obtain spatial precision without integrating power over long temporal
windows, we used an event-related version of the SAM analysis
technique, event-related SAM or ER-SAM, introduced by Cheyne et
al. (2006) to identify evoked brain responses from unaveraged singletrial data.
Despite their advantages in terms of spatial resolution and localization accuracy, the performance of beamformers can be reduced in
the presence of sources with high temporal correlations. However,
beamformers used to localize time-locked brain activity, including
ER-SAM, have been shown to be fairly robust in the presence of
sources with small to moderate correlations, exhibiting only small
decreases in amplitude in the presence of such sources (Gross et al.
2001; Hadjipapas et al. 2005; Sekihara et al. 2002). It is important to
consider, however, that the proximity between highly correlated
sources may reduce estimated power in such sources. Quraan and
Cheyne (2010) used simulations of fully correlated bilateral sources in
the auditory cortex as well as correlated bilateral sources in striate and
extrastriate areas with random Gaussian noise and demonstrated that
source cancellations were less likely to occur at signal-to-noise levels
typical of neural sources in current MEG systems. Another reason for
selecting ER-SAM over more traditional beamformers is that it can
localize sources that are time-locked to stimulus events even in the
presence of artifacts and instrumental noise (Cheyne et al. 2007).
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
Fig. 1. Experimental design. Four possible stimulus combinations were used in
this study: 1) unimodal visual (V; example, picture of a violin), 2) unimodal
auditory (A; example, sound of a “roar”), 3) cross-modal congruent or
simultaneous auditory and visual stimuli that were matched semantically
(AV⫹; example, picture of a bird matched with a corresponding “chirp”
sound), and 4) cross-modal incongruent or simultaneous auditory and visual
stimuli that were semantically mismatched (AV⫺; example, picture of a lion
matched with a “siren” sound). Each stimulus or stimulus pair was presented
for 400 ms; for the auditory stimulus, the 400-ms interval also included a 5-ms
fall and rise time. The time interval between the end of the stimulus presentation and the beginning of the next trial (ITI) varied between 2 and 4 s
(equiprobable).
Third-gradient noise correction was applied to the continuous MEG
data. Afterwards, the MEG data were parsed into epochs including a
200-ms prestimulus and 1,000-ms poststimulus activity window, and
DC offsets were removed from the entire epoch. Finally, MEG data
were band-pass filtered between 0.1 and 55 Hz and averaged across all
trial types.
Neuromagnetic activity was also coregistered to each participant’s
individual structural MRI. To constrain the sources of activation to
each participant’s head shape and structural anatomy, we acquired
MRI scans for each participant using a 3.0T Siemens Tim MAGNETOM Trio MRI scanner (syngo.MR software; Siemens Medical,
Erlangen, Germany) with 12-channel head coil. All participants’
structural MRIs and MEG source data were spatially normalized to
the Talairach standard brain using Analysis of Functional Neuroimaging software (AFNI; Cox 1996).
2900
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
J Neurophysiol • VOL
cantly different from random noise. Second, the reliability of each
source contribution was assessed using a bootstrap estimation of
standard errors for the MEG source saliences. The primary purpose of
the bootstrap is to determine those portions of the source waveforms
that show reliable experimental effects across subjects. Importantly,
with the use of bootstrap estimation of standard errors, no correction
for multiple comparisons was necessary, because the source saliences
were calculated in a single mathematical step on the whole brain at
once (McIntosh et al. 1996; McIntosh and Lobaugh 2004). These two
resampling techniques provide complementary information about the
statistical strength of the task effects and its reliability across participants. Statistical evaluation of task effects was performed using an
optimal number of 500 permutations (cf. Nichols and Holmes 2002)
and 300 bootstrap iterations (cf. Efron and Tibshirani 1986; McIntosh
et al. 1996).
RESULTS
Behavioral Results
Performance accuracy. In the simple detection condition,
accuracy levels were at ceiling in all four trial types (i.e., AV⫹,
AV⫺, A, and V). In the animacy condition, a larger number of
errors were observed for unimodal A compared with AV⫹ and
unimodal V trial types [F(2,13) ⫽ 207.33, P ⬍ 0.001]. Furthermore, in the congruency condition, participants showed
more false alarms in response to AV⫹ trial types. In other
words, participants were more likely to classify AV stimuli as
incongruent in cases of semantic congruence [F(1,14) ⫽
131.81, P ⬍ 0.0001]. See Table 3 for means and SD values in
the three conditions and four trial types.
Response times. MULTISENSORY FACILITATION. Only correct
responses were used in the RT data analysis. In multisensory
facilitation conditions (simple detection and semantic classification), RTs were analyzed using a 2 ⫻ 3 repeated-measures
analysis of variance (ANOVA) in which trial type (i.e., A, V,
and AV⫹) and condition (i.e., simple detection and semantic
classification) were the within-subject factors. First, a main
effect of condition was observed [F(1, 14) ⫽ 478.96, P ⬍
0.001] with RTs for semantic classification conditions being
longer (mean ⫾ SD: 657 ⫾ 17 ms) than RTs in the simple
detection condition (289 ⫾ 14 ms). Second, a main effect of
trial type was identified [F(2, 28) ⫽ 135.83, P ⬍ 0.001] with
unimodal A trial types showing significantly longer RTs than
unimodal V and cross-modal trial types across all three conditions (see Table 3 for means ⫾ SD values in each condition
and trial type; Fig. 2, A and B).
Finally, an interaction between conditions and trial types
was also found [F(2, 28) ⫽ 29.90, P ⬍ 0.0001] reflecting
significant behavioral facilitation to cross-modal compared
with unimodal A presentations in both semantic classification
and simple detection conditions (see Table 4 for an overview of
the repeated-measures ANOVA results). In the simple detection condition, post hoc t-tests suggested that RTs to unimodal
V trial types were also significantly slower than RTs to AV⫹
trial types (t ⫽ 11.47, P ⬍ 0.001) and RTs to AV⫺ trial types
(t ⫽ 11.35, P ⬍ 0.001). In conclusion, multisensory facilitation
was captured across both auditory and visual modalities in the
simple detection condition and in the auditory modality only in
the semantic classification condition.
CROSS-MODAL CONFLICTS. We examined the effects of crossmodal incongruence on performance by comparing congruent
(AV⫹) and incongruent (AV-) trial types in both simple
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
Similarly to previous beamforming approaches, the ER-SAM analysis uses the individual trials of each condition and the forward
solution for modeling optimal current direction to calculate a spatial
filter for each voxel using the minimum-variance beamforming algorithm (Cheyne et al. 2006). The spatial filter we used included 72
brain regions of interest adapted from the regional map coarse parcellation scheme of the cerebral cortex proposed by Kotter and
colleagues (Bezgin et al. 2008; Kotter and Wanke 2005). See Table 2
for a complete listing of the brain regions used with their respective
Talairach coordinates. Each brain region was defined by a threedimensional position vector and consisted of a unique set of sensor
coefficients that constituted a weighting matrix. Thus all brain regions
were constrained to be the same size.
The MEG data were then projected through this spatial filter to give
a measure of current density, as a function of time, in the target brain
region. Because this source time series was calculated using a
weighted sum of the MEG sensors, it had the same millisecond time
resolution as the original MEG sensor data. To enhance the spatial
precision of this technique, we used the participants’ structural MRIs
to constrain the ER-SAM images to each participant’s individual MRI
and to allow for spatial normalization and group averaging in stereotaxic space. The individual functional maps were overlaid on the
individual participant’s MRI based on coregistration with the indicator coils placed on the nasion and bilateral preauricular points. The
functional data were then transformed to the standard TalairachTournoux space using the same transformation applied to the structural MRI (AFNI software; Cox 1996).
PLS analysis. We used partial least squares (PLS) analysis (McIntosh et al. 1996) to examine neuromagnetic brain activity across all 72
brain regions of interest as a function of task demands, namely,
multisensory facilitation and cross-modal conflict. The term “partial
least squares” refers to the computation of an optimal squares fit to
part of a covariance structure that is attributable to the experimental
manipulations. PLS applied to MEG data is conceptually analogous to
the analysis of MEG difference waveforms, because it identifies
task-related differences in amplitude across all MEG sources by
deriving the optimal least squares contrasts that code for the task
differences. Because PLS performs this computation across the entire
dataset in time and space simultaneously, there is no need to specify
a priori MEG sources or time intervals for the analysis.
In the original version of PLS, singular value decomposition (SVD)
was used to identify the strongest effects in the data. For several
analyses discussed in this study, we used a nonrotated version of task
PLS, in which a priori contrasts restricted the spatiotemporal patterns
derived from PLS. This version of PLS has the advantage of allowing
direct assessment of hypothesized experimental effects. Task differences were examined across the entire epoch, including the prestimulus baseline (⫺0.2 s) and the poststimulus interval (1 s).
The relationship between two sets of matrices is analyzed: the first
pertains to the experimental design (or to the performance measures),
whereas the second contains MEG source activity across all time
points. The relationship between the two blocks of data is stored in a
cross-product matrix. Application of SVD decomposes this matrix
into three new matrices: 1) task saliences, 2) singular values, and
3) source saliences. The task saliences indicate the experimental
design profiles that best characterize the given matrix. The source
saliences reflect the MEG sources that characterize the corresponding
task pattern across space (expressed across a collection of MEG
sources) and time (expressed across all time points included in the
analysis). Task and source saliences are linked together via singular
values, which represent the square root of the eigenvalues. The
cross-product between the task saliences and original matrix creates
“brain scores,” which reflect the expression of the task contrast across
all participants (see Figs. 4 —7).
Statistical assessment. For statistical assessment of task effects,
two complementary resampling techniques were employed. First,
permutation tests assessed whether the task saliences were signifi-
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
2901
Table 2. Regional map coordinates with reference to Talairach-Tournoux
X
Y
Z
Region
BA
Hemisphere
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
0
0
0
0
⫺40
⫺60
⫺36
⫺36
⫺36
⫺24
⫺44
⫺44
⫺8
⫺28
⫺48
⫺48
⫺8
⫺8
⫺24
⫺24
⫺48
⫺28
⫺28
⫺4
⫺44
⫺16
⫺40
⫺56
⫺64
⫺64
⫺52
⫺52
⫺32
⫺8
⫺4
⫺4
⫺20
⫺20
40
60
36
36
36
24
44
44
8
28
48
48
8
8
24
24
48
28
28
4
44
16
40
56
64
64
52
52
32
8
4
32
⫺32
⫺48
16
⫺14
⫺14
8
16
⫺8
⫺24
⫺48
⫺64
⫺64
⫺56
32
36
36
48
44
64
32
⫺16
0
0
4
⫺28
⫺28
⫺16
⫺24
⫺24
12
⫺4
⫺28
⫺8
⫺84
⫺96
⫺88
⫺84
⫺14
⫺14
8
16
⫺8
⫺24
⫺48
⫺64
⫺64
⫺56
32
36
36
48
44
64
32
⫺16
0
0
4
⫺28
⫺28
⫺16
⫺24
⫺24
12
⫺4
⫺28
⫺8
⫺84
24
24
12
⫺8
4
4
56
⫺4
⫺4
56
20
28
54
54
12
32
40
20
⫺20
4
⫺8
⫺16
60
60
24
4
64
16
⫺12
⫺24
⫺28
⫺8
⫺28
4
⫺4
8
20
⫺12
4
4
56
⫺4
⫺4
56
20
28
54
54
12
32
40
20
⫺20
4
⫺8
⫺16
60
60
24
4
64
16
⫺12
⫺24
⫺28
⫺8
⫺28
4
⫺4
Anterior cingulate cortex
Posterior cingulate cortex
Retrosplenial cingulate cortex
Aubgenual cingulate cortex
A1 (primary auditory)
A2 (secondary auditory)
Frontal eye fields
Anterior insula
Claustrum
M1 (primary motor)
Inferior parietal cortex
Angular gyrus
Precuneus
Superior parietal cortex
Centrolateral prefrontal cortex
Dorsolateral prefrontal cortex
Dorsomedial prefrontal cortex
Medial prefrontal cortex
Orbitofrontal cortex
Frontal polar
Ventrolateral prefrontal cortex
Parahippocampal cortex
Dorsolateral premotor cortex
Medial premotor cortex
Ventrolateral premotor cortex
Pulvinar
S1 (primary somatosensory)
S2 (secondary somatosensory)
Middle temporal cortex
Inferior temporal cortex
Temporal pole
Superior temporal cortex
Ventral temporal cortex
Thalamus (ventral lateral nucleus)
V1 (primary visual)
V2 (secondary visual)
Cuneus
Fusiform gyrus
A1 (primary auditory)
A2 (secondary auditory)
Frontal eye fields
Anterior insula
Claustrum
M1 (primary motor)
Inferior parietal cortex
Angular gyrus
Precuneus
Superior parietal cortex
Centrolateral prefrontal cortex
Dorsolateral prefrontal cortex
Dorsomedial prefrontal cortex
Medial prefrontal cortex
Orbitofrontal cortex
Frontal polar
Ventrolateral prefrontal cortex
Parahippocampal cortex
Dorsolateral premotor cortex
Medial premotor cortex
Ventrolateral premotor cortex
Pulvinar
S1 (primary somatosensory)
S2 (secondary somatosensory)
Middle temporal cortex
Inferior temporal cortex
Temporal pole
Superior temporal cortex
Ventral temporal cortex
Thalamus (ventral lateral nucleus)
V1 (primary visual)
32
23
30
25
Midline
Midline
Midline
Midline
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Left
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Right
Continued
J Neurophysiol • VOL
106 • DECEMBER 2011 •
www.jn.org
22
6
13
4
40
39
7
7
46
9
8
10
11
10
6
6
9
3
43
21
20
38
22
18
19
22
6
13
4
40
39
7
7
46
9
8
10
11
10
6
6
9
3
43
21
20
38
22
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
Cluster
2902
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
Table 2. —Continued
Cluster
X
Y
Z
70
71
72
4
20
20
⫺96
⫺88
⫺84
8
20
⫺12
Region
V2 (secondary visual)
Cuneus
Fusiform gyrus
BA
Hemisphere
18
19
Right
Right
Right
Coordinates are included for the 72 brain regions of interest adapted from the regional map coarse parcellation scheme of the cerebral cortex proposed by Kotter
and Wanke (2005). Brodmann areas (BA) were determined using the brain atlas of Talairach and Tournoux (1988), because these landmarks provide more
meaningful representations of the magnetoencephalography source locations.
Cumulative distribution functions. To investigate the behavioral effects of multisensory integration further, we contrasted
cross-modal trial types (i.e., AV⫹ and AV-) against the inequality model proposed by Miller (1982). The group-averaged
probability for each cross-modal trial type was analyzed
against the predictions of the model based on the unimodal
Table 3. Descriptive statistics for the two dependent variables:
accuracy and RT
Variable
Accuracy
Simple detection AV⫹
Simple detection AV⫺
Simple detection A
Simple detection V
Semantic classification AV⫹
Semantic classification A
Semantic classification V
Congruency AV⫹
Congruency AV⫺
RT
Simple detection AV⫹
Simple detection AV⫺
Simple detection A
Simple detection V
Semantic classification AV⫹
Semantic classification A
Semantic classification V
Congruency AV⫹
Congruency AV⫺
Mean
98.33
97.33
98.73
97.07
97.20
90.40
97.53
84.87
89.93
254.54
249.72
330.18
284.59
592.91
779.42
598.79
957.835
1,099.843
SD
0.90
1.35
0.46
1.28
0.94
1.68
1.06
0.48
0.30
47.77
47.41
80.37
46.25
70.73
75.87
83.25
172.0379
192.0922
Values are percentages for accuracy and milliseconds for response time
(RT) for unimodal auditory (A), unimodal visual (V), cross-modal congruent
(AV⫹), and cross-modal incongruent (AV⫺) trial type statistics.
J Neurophysiol • VOL
Fig. 2. Response times (RTs; in ms) in simple detection (A), semantic feature
classification (B), and congruency conditions (C). Errors bars represent SE.
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
detection and congruency conditions. In the detection condition, we expected to find no differences between congruent and
incongruent presentations. However, in the congruency condition, we anticipated that RTs would be slower as participants
were required to assess the degree of mismatch between the
auditory and visual stimuli embedded within cross-modal
presentations.
We found a significant interaction between trial types and
conditions [F(1,14) ⫽ 37.39, P ⬍ 0.001]. As predicted, crossmodal congruent and incongruent trial types did not differ in
the simple detection condition; however, in the congruency
condition, incongruent compared with congruent trial types
exhibited significantly longer RTs (Fig. 2C). In the congruency
condition, post hoc t-tests revealed significant differences between congruent and incongruent AV trial types [t(14) ⫽ 6.19,
P ⬍ 0.0001]. Participants were slower to respond to incongruent AV trial types by ⬃140 ms. Although they were slower,
participants were also more accurate in categorizing incongruent compared with congruent presentations. Because we were
primarily interested in task differences in RTs, we instructed
participants to respond as fast as possible to each stimulus type.
Thus, in the congruency condition, responses to congruent AV
stimuli were faster, but also less accurate, than incongruent
ones. Performance accuracy was 4% lower in congruent compared with incongruent trials (t ⫽ 11.4, P ⬍ 0.0001) averaging
at 85% and 89%, respectively.
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
Table 4. Repeated-measures ANOVA results and descriptive statistics
Effect
Condition
Trial type
Condition ⫻
trial type
F
Hypothesis df
Error df
P
Partial ␩2
1.0000
2.0000
14.0000
13.0000
0.0000
0.0000
0.9716
0.9408
2.0000
13.0000
0.0002
0.7345
478.9638
135.8296
29.89567
95% Confidence
Interval
Effect
SD
Lower
bound
Upper
bound
289.77
657.04
14.78
17.58
258.07
619.34
321.47
694.74
423.72
554.80
441.69
12.92
15.58
15.64
396.00
521.39
408.15
451.44
588.21
475.24
254.54
330.18
284.59
592.91
779.42
598.79
12.33
20.75
11.94
18.26
19.59
21.49
228.08
285.68
258.98
553.74
737.41
552.69
280.99
374.69
310.21
632.08
821.43
644.89
Values are results of repeated-measures analysis of variance (ANOVA) and
other descriptive statistics as indicated. df, Degrees of freedom.
distributions. We generated subject-specific cumulative distribution functions (CDFs) for each trial type by dividing the time
series into smaller, 5-ms time bins, similar to the procedure
used by Laurienti and colleagues under similar task demands
(Laurienti et al. 2006; Peiffer et al. 2007). The race model CDF
was generated for each subject and averaged at each time bin.
Paired t-tests were used to compare multisensory responses
against the statistical facilitation predictions of the race model
based on the unimodal RT distributions for each participant. In
the simple detection condition, the probability of detecting
faster RTs following cross-modal presentations exceeded the
probability predicted by statistical facilitation associated with
redundant unisensory stimuli (t ⫽ 69.92, P ⬍ 0.001). Similarly, in the semantic classification condition, multisensory
responses were also faster than those predicted by the race
model (t ⫽ 19.25, P ⬍ 0.003). Note that the probabilities of
detecting shorter RTs in AV⫹ trials were not larger than the
probabilities of detecting short RTs in unimodal visual trials in
the semantic classification condition, further indicating that
semantic classification of visual objects does not benefit from
concurrent auditory presentations (see Fig. 3B).
MEG Results
Multisensory processing. In both simple detection and semantic classification conditions, significant differences between cross-modal and unimodal auditory presentations were
detected [first-order latent variable (LV1) ⫽ 48.94, P ⬍ 0.001;
Fig. 4A]. Multisensory integration, or increased MEG source
activity in response to cross-modal compared with unimodal
auditory stimuli, was captured in the left temporal pole (BA
38) and the left angular gyrus (BA 39) between 100 –300 and
200 – 400 ms, respectively (Fig. 4B). Cross-modal and unimodal V trial types also differed significantly (LV1 ⫽ 52.43,
P ⬍ 0.001; Fig. 4C). Multisensory processes were captured in
the left cuneus (BA 18) and the left IPL (BA 40) between
100 –300 and 200 – 400 ms, respectively, with these sources
showing larger amplitude modulations to cross-modal stimuli
compared with visual ones. (Fig. 4D).
Our initial prediction that posterior parietal cortices responded preferentially to cross-modal trial types was also
supported by an analysis that compared cross-modal (AV⫹
and AV⫺) with unisensory presentations (LV1 ⫽ 44.96, P ⬍
0.05; Fig. 5A). When comparing cross-modal stimuli with their
respective unisensory components, we found evidence of multisensory integration in the left precuneus (BA 7) and the right
primary motor cortex (BA 4) between 100 –200 and 300 – 600
ms after stimulus onset, respectively (Fig. 5B).
EFFECTS OF SEMANTIC EVALUATION. The multisensory integration neural processes described above were captured in both
Fig. 3. Cumulative probability density function (CDF) for cross-modal trial types and multisensory responses predicted by the race (or inequality) model.
Multisensory and calculated (predicted) race model CDFs are shown. Multisensory responses were faster than those predicted by the race model in both detection
(A) and semantic classification conditions (B). Note that in the semantic classification condition, multisensory facilitation was only detected when auditory stimuli
were accompanied by visual ones, rather than vice versa.
J Neurophysiol • VOL
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
Condition
Simple detection
Semantic classification
Trial type
AV⫹
A
V
Condition ⫻ trial type
Simple detection AV⫹
Simple detection A
Simple detection V
Semantic classification AV⫹
Semantic classification A
Semantic classification V
Mean
2903
2904
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
simple detection and semantic classification conditions. In the
semantic classification condition, increased amplitude modulations in response to semantically congruent cross-modal
stimuli were observed in a distinct set of brain regions. In
addition to multisensory responses in sensory-specific and
posterior parietal brain regions, increased activity in posterior
cingulate and superior temporal cortices were observed as
participants were required to classify cross-modal trial types
into animate and inanimate categories (LV1 ⫽ 48.13, P ⬍
0.05; Fig. 6A). The posterior cingulate cortex and the left STS
showed larger amplitudes in response to AV⫹ trial types
between 100 –300 and 500 –700 ms, respectively (Fig. 6B).
The effects of semantic classification across cingulate and
superior temporal regions were specific to semantically congruent cross-modal presentations. The specificity of cingulate
and temporal activity in response to cross-modal stimuli was
tested explicitly by assessing the impact of semantic categorization on unisensory processing. In visual-only trial types,
there was a trend toward increased activity in the left fusiform
gyrus as participants classified pictures into animate or inanimate categories. A similar trend was observed in the auditory
modality in the insula. These differences, however, were not
significant by permutation tests. Thus we concluded that posterior cingulate and left STS activity between 100 –300 and
500 –700 ms was specific to cross-modal presentations in the
semantic classification condition.
EFFECTS OF CROSS-MODAL CONFLICT. Compared with the multisensory facilitation effects in the simple detection conditions,
performance decreases were observed in AV⫹ and AV⫺ trials
as participants were required to determine the degree of con2
2
Fig. 5. Cross-modal compared with unimodal trial types in multisensory facilitation conditions. The source waveforms were derived from the Talairach
coordinates displayed at left. The bootstrap ratios above the source waveforms reflect the positive expression of the task contrast (A), i.e., larger amplitude
modulations for cross-modal compared with unimodal trial types. We found increased activity in the left precuneus and the right primary motor cortex (M1)
between 100 –200 and 300 – 600 ms (B).
J Neurophysiol • VOL
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
Fig. 4. Cross-modal compared with unimodal auditory (A and B) and unimodal visual processing (C and D) in multisensory facilitation conditions. The source
waveforms were derived from the Talairach coordinates displayed at left. The bootstrap ratios above the source waveforms reflect the positive expression of the
task contrast (A and C), i.e., larger amplitude modulations for cross-modal compared with unimodal auditory and unimodal visual trial types. The left temporal
pole and angular gyrus (AG) showed increased activity in response to cross-modal compared with unimodal auditory trial types between 100 –300 and 200 – 600
ms, respectively (B). The left cuneus and inferior parietal lobule (IPL) showed increased activity in response to cross-modal compared with unimodal visual trial
types between 100 –300 and 200 – 400 ms (D).
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
2905
Cortex
Fig. 6. Effects of semantic feature categorization on multisensory processing. The source waveforms were derived from the Talairach coordinates displayed at
left. The bootstrap ratios below the source waveforms reflect the negative expression of the task contrast (A), i.e., larger amplitude modulations for cross-modal
trial types in the semantic classification condition compared with cross-modal trial types in the simple detection condition. The medial posterior cingulate and
left superior temporal sulcus showed increased activity in response to cross-modal trial types in the semantic classification condition between 100 –300 and
500 –700 ms (B).
between 200 – 400 and 500 –1,000 ms after stimulus onset were
captured in the congruency condition when participants were
required to determine whether cross-modal presentations
matched semantically.
Fig. 7. Effects of cross-modal conflict on multisensory processing. The source waveforms were derived from the Talairach coordinates displayed at left. The
bootstrap ratios below the source waveforms reflect the negative expression of the task contrast (A), i.e., larger amplitude modulations for cross-modal trial types
in the congruent condition compared with the simple detection condition, and larger amplitudes for incongruent compared with congruent trial types. The left
parahippocampal cortex (PHC), dorsomedial prefrontal cortex (PFC), premotor cortex (PMC), and bilateral orbitofrontal cortex (OFC) showed increased activity
in response to cross-modal trial types in the congruency condition between 200 –500 and 600 –1,000 ms (B).
J Neurophysiol • VOL
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
gruency between auditory and visual targets. We observed
significant differences between cross-modal trial types in the
simple detection and the congruency conditions (LV1 ⫽ 55.25,
P ⬍ 0.001; Fig. 7A). Large-amplitude modulations extending
2906
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
DISCUSSION
Behavior Results
Multisensory facilitation at the behavioral level was observed in simple detection and semantic classification conditions with faster RTs and improved accuracy for cross-modal
compared with unimodal trial types. Multisensory responses in
both simple detection and semantic classification conditions
also exceeded the responses predicted by the race model (cf.
Fig. 3). In the semantic classification condition, however, the
presentation of the visual target facilitated auditory object
recognition, but not vice versa. Although motor performance
was slower for unimodal A compared with cross-modal trial
types, no significant differences between the unimodal visual
and cross-modal trial types were observed. Indeed, previous
studies demonstrated an asymmetry between auditory and
visual stimuli with increased dominance of vision over auditory perception during object recognition tasks. As demonstrated previously, concurrent visual presentations helped disambiguate auditory object processes more than vice versa
(Jaekl and Harris 2009; Yuval-Greenberg and Deouell, 2009).
Although the behavioral results in the semantic classification
condition indicate dominance of vision over audition, recent
ERP studies suggest that, similarly to the visual sensory mo-
dality, auditory animacy judgments begin within 100 ms after
stimulus onset (Murray et al. 2006). Similarly, Thorpe et al.
(1996) showed that animacy judgments of visual objects increase neural activity in sensory-specific areas 100 ms after
stimulus presentation.
In the congruency condition, which assessed the effects of
cross-modal semantic relatedness on classification performance, slower RTs and an increased number of false alarms
were observed in response to congruent presentations relative
to incongruent ones. When they were faster, participants were
also more likely to judge cross-modal stimuli as congruent.
Similar to previous behavioral findings (cf., Laurienti et al.
2003, 2004; Noppeney et al. 2008), even after incorrect trials
were excluded, RTs to congruent presentations continued to be
significantly faster than RTs to incongruent presentations.
Distinct Effects of Multisensory Facilitation and Cross-Modal
Conflict on Neuromagnetic Activity
MEG recordings and ER-SAM analysis allowed us to examine multisensory processes both temporally and spatially.
We observed evidence of multisensory integration across sensory-specific and posterior parietal sources in both auditory and
visual modalities. These spatiotemporal patterns were captured
in both simple detection and semantic classification conditions.
Within 100 –200 ms after stimulus onset, cross-modal stimuli,
compared with unimodal stimuli, elicited increased amplitude
modulations in left posterior parietal cortices, including the left
precuneus, left angular gyrus, and the left IPL. MEG source
activity in the left temporal pole and angular gyrus increased in
response to cross-modal presentations relative to unimodal
auditory presentations. Although we observed significant sensory modality differences in performance, the timing of multisensory responses did not differ between auditory and visual
modalities. Similar to the auditory modality, larger amplitudes
to cross-modal compared with unimodal visual trial types
extended between 100 –300 and 200 – 400 ms. In addition,
examination of multisensory responses against the responses
evoked by both modality-specific components revealed enhanced activity in the left posterior parietal sources, including
the left precuneus, and the right motor cortex between 100 and
400 and at 100 ms, respectively, in response to cross-modal
stimuli.
Posterior parietal activity has been previously linked to
multisensory processing. Neuroimaging studies using fMRI
recordings to examine multisensory integration processes
Fig. 8. Congruent compared with incongruent cross-modal matching. The source waveforms were derived from the Talairach coordinates displayed at left. The
bootstrap ratios below the source waveforms reflect the negative expression of the task contrast (A), i.e., larger amplitude modulations for cross-modal trial types
in the congruent condition compared with the simple detection condition, and larger amplitudes for incongruent compared with congruent trial types. The left
dorsomedial PFC and the left centrolateral PFC showed increased activity to incongruent compared with congruent trial types between 100 and 300 ms (B).
J Neurophysiol • VOL
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
The left parahippocampal cortex (PHC) and the left dorsomedial prefrontal cortex (PFC) showed large amplitude modulations between 200 – 400 and 400 – 600 ms in response to
AV⫹ and AV⫺ trial types in the congruency condition compared with the simple detection condition (Fig. 7B). Furthermore, the bilateral orbitofrontal cortex (BA 11) and the left
dorsolateral premotor cortex (PMC) were also associated with
cross-modal presentations in the congruency condition between 400 – 600 and 600 –1,000 ms (Fig. 7B). Source activity
in the left dorsolateral PMC was positively correlated with RTs
across all trial types in the congruency condition between 500
and 1,000 ms after stimulus onset (r ⫽ 0.942, 0.945, 0.71, and
0.82, P ⬍ 0.001, in AV⫹, AV⫺, A, and V trial types,
respectively).
Within the congruency condition, we also observed differences between congruent and incongruent trial types (LV1 ⫽
41.98, P ⬍ 0.05; Fig. 8A). The left dorsomedial PFC and the
left centrolateral PFC showed larger responses to incongruent
compared with congruent trial types between 100 and 300 ms
(Fig. 8B).
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
J Neurophysiol • VOL
lack of significant differences between animate and inanimate
trial types in motor and premotor sources.
Within cross-modal trial types, the left dorsomedial and
centrolateral prefrontal cortices responded preferentially to
incongruent cross-modal combinations compared with congruent ones 100 – 400 ms after stimulus onset. Previous fMRI
studies that examined the effects of semantic congruence on
multisensory integration suggested that the inferior frontal
cortex activated in response to cross-modal stimuli when auditory and visual representations were semantically mismatched. Noppeney et al. (2008, 2010) used pictures of animate (animals) and inanimate (man-made) objects concurrently
presented with matching and nonmatching complex sounds or
spoken words. Stronger effects of semantic relatedness were
observed in the posterior STS and middle temporal gyrus in
response to words. However, only medial and inferior frontal
cortices showed incongruency effects for both spoken words
and complex sounds when paired with animate or inanimate
pictures. In line with these results, Belardinelli et al. (2004)
also demonstrated that the inferior frontal cortex responded
more strongly to incongruent cross-modal combinations of
complex stimuli (Belardinelli et al. 2004).
The present study extends previous examinations of crossmodal conflict by demonstrating that detecting the degree of
semantic AV incongruence elicits increased activity in medial
and centrolateral prefrontal cortices as early as 100 ms after
stimulus onset. Furthermore, as participants are instructed to
identify the degree of cross-modal semantic relatedness, additional sources, namely, left PHC, show large-amplitude modulations as early as 200 ms after stimulus onset. After ⬃250
ms, premotor and bilateral orbitofrontal cortices show similar
task-related amplitude modulations with premotor activity reflecting motor response preparation.
Conclusion
Multisensory integration, or the distinct neural processes
associated with cross-modal relative to unimodal events,
was captured in sensory-specific and posterior parietal brain
regions within 100 and 200 ms after stimulus onset. Cingulate and superior temporal cortices responded preferentially
to cross-modal presentations between 100 and 500 ms as
participants classified cross-modal pairs into animate or
animate categories. Finally, in trials that required participants to detect the level of cross-modal incongruence, medial and centrolateral prefrontal cortices showed early amplitude modulations in response to incongruent pairs compared with congruent ones. These results suggest that
integration of semantically related complex sounds and pictures unfolds in multiple stages and across distinct parietal
and medial frontal cortical regions: first, sensory-specific
and posterior parietal neurons respond preferentially to
cross-modal stimuli irrespective of task demands and as
early as 100 ms after stimulus onset; second, regions in
superior temporal and posterior cingulate cortices respond
to cross-modal stimuli during semantic feature classification
trials between 100 and 500 ms; and finally, within 100 – 400
ms and 400 – 800 ms after stimulus onset, parahippocampal,
medial prefrontal, and orbitofrontal regions process crossmodal conflicts and respond to cross-modal stimuli when
complex sounds and pictures are semantically mismatched.
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
showed that posterior parietal regions including IPS, the IPL
(BA 40), and the SPL (BA 39) responded preferentially to
cross-modal compared with unimodal presentations (Baumann
and Greenlee 2007; Bishop and Miller 2008; Calvert et al.
2001; Grefkes et al. 2002; Macaluso et al. 2004; Moran et al.
2008). Previous ERP studies also found early-amplitude deflections across sensory-specific channels and late-amplitude
deflections in parietal scalp regions during the presentation of
semantically and spatially congruent AV stimuli (Giard and
Peronnet 1999; Teder-Salejarvi et al. 2002, 2005).
Although multisensory processes were captured bilaterally
in posterior parietal cortices, the difference between crossmodal and unimodal trial types was larger and more stable
across participants in the left hemisphere. This left hemispheric
bias in posterior parietal activity during multisensory processing is not uncommon. There is growing evidence that when
cross-modal activation patterns are not found bilaterally, they
are usually observed in the left hemisphere (cf. Baumann and
Greenlee 2007; Calvert et al. 2001; Grefkes et al. 2002;
Molholm et al. 2006).
In addition to increased amplitude modulations associated
with cross-modal stimuli, we also demonstrated that activity in
cingulate and superior temporal regions increased 100 and 500
ms after stimulus onset as participants were required to determine which cross-modal pairs pertained to animate or inanimate categories. These MEG source activation patterns were
specific to cross-modal trial types, because they were not
detected when contrasting unisensory responses across simple
detection and semantic classification conditions.
Previous neuroimaging studies also showed that simultaneously presented complex sounds and semantically-related visual stimuli recruited anterior cingulate and adjacent medial
prefrontal cortices (Laurienti et al. 2003). Furthermore, consistent with our current results, Beauchamp et al. (2004) and
van Atteveldt et al. (2010) showed that blood oxygen leveldependent activity in the posterior STS and the superior temporal cortex was specifically associated with the processing of
familiar, semantically related cross-modal stimuli. The present
study extends this line of research by demonstrating that neural
activity related to semantic feature classification begins 100 ms
after stimulus onset in cingulate regions and extends to superior temporal regions 400 ms later in response to cross-modal
stimulus presentations only.
In cross-modal matching trial types, activity in the left PHC
increased between 200 – 400 and 600 –1,000 ms across both
congruent and incongruent cross-modal pairs. Furthermore, the
bilateral orbitofrontal cortex (BA 11) and the left PMC showed
similar task-related activation patterns at later processing
stages, between 500 and 1,000 ms after stimulus onset. We also
found that increased activity in the left PMC predicted slower
performance in response to cross-modal stimuli as participants
assessed the degree of congruence between their auditory and
visual counterparts. Because response latencies in the congruency condition averaged at ⬃858 ms, increased premotor
activity in this condition may reflect motor response preparation. On the other hand, recent research has shown that left
PMCs respond preferentially to inanimate objects such as tools
(Maravita et al. 2002; Weisberg et al. 2007). Although we were
not interested in differences between animate and inanimate
stimuli, we compared the two stimulus categories and found a
2907
2908
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
GRANTS
This research was supported by grants from the Canadian Institutes of
Health Research and the J. S. McDonnell Foundation to A. R. McIntosh. A. O.
Diaconescu is supported by the Natural Sciences and Engineering Research
Council of Canada.
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the author(s).
AUTHOR CONTRIBUTIONS
A.O.D., C.A., and A.R.M. conception and design of research; A.O.D.
performed experiments; A.O.D. analyzed data; A.O.D. and A.R.M interpreted
results of experiments; A.O.D. prepared figures; A.O.D. drafted manuscript;
A.O.D., C.A., and A.R.M. edited and revised manuscript; A.O.D., C.A., and
A.R.M. approved final version of manuscript.
Baumann O, Greenlee MW. Neural correlates of coherent audiovisual
motion perception. Cereb Cortex 17: 1433–1443, 2007.
Beauchamp MS, Nath AR, Pasalar S. fMRI-Guided transcranial magnetic
stimulation reveals that the superior temporal sulcus is a cortical locus of the
McGurk effect. J Neurosci 30: 2414 –2417, 2010.
Belardinelli MO, Sestieri C, Di Matteo R, Delogu F, Del Gratta C, Ferretti
A, Caulo M, Tartaro A, Romani GL. Audio-visual crossmodal interactions in environmental perception: An fMRI investigation. Cogn Process 5:
167–174, 2004.
Bezgin G, Wanke E, Krumnack A, Kotter R. Deducing logical relationships
between spatially registered cortical parcellations under conditions of uncertainty. Neural Netw 21: 1132–1145, 2008.
Bishop CW, Miller LM. A multisensory cortical network for understanding
speech in noise. J Cogn Neurosci 21: 1790 –1805, 2009.
Brookes MJ, Stevenson CM, Barnes GR, Hillebrand A, Simpson MI,
Francis ST, Morris PG. Beamformer reconstruction of correlated sources
using a modified source model. Neuroimage 34: 1454 –1465, 2007.
Calvert GA, Hansen PC, Iversen SD, Brammer MJ. Detection of audiovisual integration sites in humans by application of electrophysiological
criteria to the BOLD effect. Neuroimage 14: 427– 438, 2001.
Chen YC, Spence C. When hearing the bark helps to identify the dog:
semantically-congruent sounds modulate the identification of masked pictures. Cognition 114: 389 – 404, 2010.
Cheyne D, Bakhtazad L, Gaetz W. Spatiotemporal mapping of cortical
activity accompanying voluntary movements using an event-related beamforming approach. Hum Brain Mapp 27: 213–229, 2006.
Cheyne D, Bostan AC, Gaetz W, Pang EW. Event-related beamforming: a
robust method for presurgical functional mapping using MEG. Clin Neurophysiol 118: 1691–1704, 2007.
Cox RW. AFNI: software for analysis and visualization of functional magnetic
resonance neuroimages. Comput Biomed Res 29: 162–173, 1996.
Diederich A, Colonius H. Bimodal and trimodal multisensory enhancement:
effects of stimulus onset and intensity on reaction time. Percept Psychophys
66: 1388 –1404, 2004.
Doehrmann O, Naumer MJ. Semantics and the multisensory brain: how
meaning modulates processes of audio-visual integration. Brain Res 1242:
136 –150, 2008.
Efron B, Tibshirani R. Bootstrap methods for standard errors, confidence
intervals and other measures of statistical accuracy. Statist Sci 1: 54 –77,
1986.
Giard MH, Peronnet F. Auditory-visual integration during multimodal object
recognition in humans: a behavioral and electrophysiological study. J Cogn
Neurosci 11: 473– 490, 1999.
Grefkes C, Weiss PH, Zilles K, Fink GR. Crossmodal processing of object
features in human anterior intraparietal cortex: an fMRI study implies
equivalencies between humans and monkeys. Neuron 35: 173–184, 2002.
Gross J, Kujala J, Hamalainen M, Timmermann L, Schnitzler A, Salmelin
R. Dynamic imaging of coherent sources: Studying neural interactions in the
human brain. Proc Natl Acad Sci USA 98: 694 – 699, 2001.
Jaekl PM, Harris LR. Sounds can affect visual perception mediated primarily
by the parvocellular pathway. Vis Neurosci 26: 477– 486, 2009.
Hadjipapas A, Hillebrand A, Holliday IE, Singh KD, Barnes GR. Assessing interactions of linear and nonlinear neuronal sources using MEG
beamformers: a proof of concept. Clin Neurophysiol 116: 1300 –1313, 2005.
J Neurophysiol • VOL
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
REFERENCES
Hartcher-O’Brien J, Gallace A, Krings B, Koppen C, Spence C. When
vision ‘extinguishes’ touch in neurologically-normal people: extending the
Colavita visual dominance effect. Exp Brain Res 186: 643– 658, 2008.
Kaiser J, Hertrich I, Ackermann H, Mathiak K, Lutzenberger W. Hearing
lips: gamma-band activity during audiovisual speech perception. Cereb
Cortex 15: 646 – 653, 2005.
Koppen C, Alsius A, Spence C. Semantic congruency and the Colavita visual
dominance effect. Exp Brain Res 184: 533–546, 2008.
Koppen C, Spence C. Assessing the role of stimulus probability on the
Colavita visual dominance effect. Neurosci Lett 418: 266 –271, 2007a.
Koppen C, Spence C. Seeing the light: exploring the Colavita visual dominance effect. Exp Brain Res 180: 737–754, 2007b.
Kotter R, Wanke E. Mapping brains without coordinates. Philos Trans R Soc
Lond B Biol Sci 360: 751–766, 2005.
Kutas M, Federmeier KD. Electrophysiology reveals semantic memory use
in language comprehension. Trends Cogn Sci 4: 463– 470, 2000.
Laurienti PJ, Burdette JH, Maldjian JA, Wallace MT. Enhanced multisensory integration in older adults. Neurobiol Aging 27: 1155–1163, 2006.
Laurienti PJ, Kraft RA, Maldjian JA, Burdette JH, Wallace MT. Semantic
congruence is a critical factor in multisensory behavioral performance. Exp
Brain Res 158: 405– 414, 2004.
Laurienti PJ, Wallace MT, Maldjian JA, Susi CM, Stein BE, Burdette JH.
Cross-modal sensory processing in the anterior cingulate and medial prefrontal cortices. Hum Brain Mapp 19: 213–223, 2003.
Macaluso E, George N, Dolan R, Spence C, Driver J. Spatial and temporal
factors during processing of audiovisual speech: a PET study. Neuroimage
21: 725–732, 2004.
Malach R, Reppas JB, Benson RR, Kwong KK, Jiang H, Kennedy WA,
Tootell RB. Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proc Natl Acad Sci USA 92:
8135– 8139, 1995.
Maravita A, Spence C, Kennett S, Driver J. Tool-use changes multimodal
spatial interactions between vision and touch in normal humans. Cognition
83: B25–B34, 2002.
McDonald JJ, Teder-Salejarvi WA, Hillyard SA. Involuntary orienting to
sound improves visual perception. Nature 407: 906 –908, 2000.
McGurk H, MacDonald J. Hearing lips and seeing voices. Nature 264:
746 –748, 1976.
McIntosh AR, Bookstein FL, Haxby JV, Grady CL. Spatial pattern analysis
of functional brain images using partial least squares. Neuroimage 3:
143–157, 1996.
McIntosh AR, Lobaugh NJ. Partial least squares analysis of neuroimaging
data: applications and advances. Neuroimage 23, Suppl 1: S250 –S263,
2004.
Miller J. Divided attention: evidence for coactivation with redundant signals.
Cogn Psychol 14: 247–279, 1982.
Molholm S, Ritter W, Javitt DC, Foxe JJ. Multisensory visual-auditory
object recognition in humans: a high-density electrical mapping study.
Cereb Cortex 14: 452– 465, 2004.
Molholm S, Sehatpour P, Mehta AD, Shpaner M, Gomez-Ramirez M,
Ortigue S, Foxe JJ. Audio-visual multisensory integration in superior
parietal lobule revealed by human intracranial recordings. J Neurophysiol
96: 721–729, 2006.
Moran RJ, Molholm S, Reilly RB, Foxe JJ. Changes in effective connectivity of human superior parietal lobule under multisensory and unisensory
stimulation. Eur J Neurosci 27: 2303–2312, 2008.
Murray MM, Camen C, Gonzalez Andino SL, Bovet P, Clarke S. Rapid
brain discrimination of sounds of objects. J Neurosci 26: 1293–1302, 2006.
Murray MM, Michel CM, Grave de Peralta R, Ortigue S, Brunet D,
Gonzalez Andino S, Schnider A. Rapid discrimination of visual and
multisensory memories revealed by electrical neuroimaging. Neuroimage
21: 125–135, 2004.
Nichols TE, Holmes AP. Nonparametric permutation tests for functional
neuroimaging: a primer with examples. Hum Brain Mapp 15: 1–25, 2002.
Nishitani N, Hari R. Viewing lip forms: cortical dynamics. Neuron 36:
1211–1220, 2002.
Noppeney U, Josephs O, Hocking J, Price CJ, Friston KJ. The effect of
prior visual information on recognition of speech and sounds. Cereb Cortex
18: 598 – 609, 2008.
Noppeney U, Ostwald D, Werner S. Perceptual decisions formed by accumulation of audiovisual evidence in prefrontal cortex. J Neurosci 30:
7434 –7446, 2010.
MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT
J Neurophysiol • VOL
Semantic confusion regarding the development of multisensory integration:
a practical solution. Eur J Neurosci 31: 1713–1720, 2010.
Talairach J, Tournoux P. Co-Planar Stereotaxic Atlas of the Human Brain,
translated by Rayport M. New York: Thieme Medical, 1988.
Teder-Salejarvi WA, Di Russo F, McDonald JJ, Hillyard SA. Effects of
spatial congruity on audio-visual multimodal integration. J Cogn Neurosci
17: 1396 –1409, 2005.
Teder-Salejarvi WA, McDonald JJ, Di Russo F, Hillyard SA. An analysis
of audio-visual crossmodal integration by means of event-related potential
(ERP) recordings. Brain Res Cogn Brain Res 14: 106 –114, 2002.
Thorpe S, Fize D, Marlot C. Speed of processing in the human visual system.
Nature 381: 520 –522, 1996.
van Atteveldt NM, Blau VC, Blomert L, Goebel R. fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in
human superior temporal cortex. BMC Neurosci 11: 11, 2010.
Van Petten C, Luka BJ. Neural localization of semantic context effects in
electromagnetic and hemodynamic studies. Brain Lang 97: 279 –293, 2006.
Van Wassenhove V, Grant KW, Poeppel D. Visual speech speeds up the
neural processing of auditory speech. Proc Natl Acad Sci USA 102: 1181–
1186, 2005.
Weisberg J, van Turennout M, Martin A. A neural system for learning about
object function. Cereb Cortex 17: 513–521, 2007.
Yuval-Greenberg S, Deouell LY. What you see is not (always) what you
hear: induced gamma band responses reflect cross-modal interactions in
familiar object recognition. J Neurosci 27: 1090 –1096, 2007.
Yuval-Greenberg S, Deouell LY. The dog’s meow: asymmetrical interaction
in cross-modal object recognition. Exp Brain Res 193: 603– 614, 2009.
106 • DECEMBER 2011 •
www.jn.org
Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017
Peiffer AM, Mozolic JL, Hugenschmidt CE, Laurienti PJ. Age-related
multisensory enhancement in a simple audiovisual detection task. Neuroreport 18: 1077–1081, 2007.
Pourtois G, de Gelder B. Semantic factors influence multisensory pairing: a
transcranial magnetic stimulation study. Neuroreport 13: 1567–1573, 2002.
Quraan MA, Cheyne D. Reconstruction of correlated brain activity with
adaptive filters in MEG. Neuroimage 49: 2387–2400, 2010.
Robinson S, Vrba J(editors) . Functional Neuroimaging by Synthetic Aperture Magnetometry. Sendai, Japan: Tohoku Univ. Press, 1998.
Schneider TR, Debener S, Oostenveld R, Engel AK. Enhanced EEG
gamma-band activity reflects multisensory semantic matching in visual-toauditory object priming. Neuroimage 42: 1244 –1254, 2008.
Sekihara K, Nagarajan SS, Poeppel D, Marantz A, Miyashita Y. Reconstructing spatio-temporal activities of neural sources using an MEG vector
beamformer technique. IEEE Trans Biomed Eng 48: 760 –771, 2001.
Senkowski D, Saint-Amour D, Kelly SP, Foxe JJ. Multisensory processing
of naturalistic objects in motion: a high-density electrical mapping and
source estimation study. Neuroimage 36: 877– 888, 2007.
Sinnett S, Soto-Faraco S, Spence C. The co-occurrence of multisensory
competition and facilitation. Acta Psychol (Amst) 128: 153–161, 2008.
Sinnett S, Spence C, Soto-Faraco S. Visual dominance and attention: the
Colavita effect revisited. Percept Psychophys 69: 673– 686, 2007.
Snodgrass JG, Vanderwart M. A standardized set of 260 pictures: norms for
name agreement, image agreement, familiarity, and visual complexity. J Exp
Psychol Hum Learn 6: 174 –215, 1980.
Stein BE, Burr D, Constantinidis C, Laurienti P, Meredith MA, Perrault
TJ, Ramachandran R, Röder B, Rowland BA, Sathian K, Schroeder
CE, Shams L, Stanford TR, Wallace MT, Yu L, Lewkowicz DJ.
2909