J Neurophysiol 106: 2896 –2909, 2011. First published August 31, 2011; doi:10.1152/jn.00303.2011. The co-occurrence of multisensory facilitation and cross-modal conflict in the human brain Andreea Oliviana Diaconescu, Claude Alain, and Anthony Randal McIntosh 1 Rotman Research Institute, Baycrest Centre, Toronto; and 2Department of Psychology, University of Toronto, Ontario, Canada Submitted 5 April 2011; accepted in final form 26 August 2011 multisensory integration; auditory; visual; magnetoencephalography; event-related synthetic aperture magnetometry analysis PREVIOUS EXAMINATIONS OF MULTISENSORY processes demonstrated that perception by means of distinct sensory channels not only contributes to the richness of each sensory experience but also improves object detection and recognition (for a review, see Doehrmann and Naumer 2008). Simple low-level stimulus properties, including temporal correspondence and spatial congruence, mediate multisensory facilitation in auditory and visual modalities. Indeed, a stimulus presented in one sensory modality can facilitate the detection of a spatially coincident stimulus presented in a distinct sensory modality. Address for reprint requests and other correspondence: A. O. Diaconescu, Laboratory for Social and Neural Systems Research, Dept. of Economics, Univ. of Zürich, Blümlisalpstrasse 10, CH-8006, Zürich, Switzerland (e-mail: [email protected]). 2896 McDonald et al. (2000) used an auditory cross-modal cueing paradigm to demonstrate that the co-occurrence of an irrelevant sound decreased the speed of detection of a subsequent light when the target light appeared in the same location as the sound. In addition, Diederich and Colonius (2004) provided evidence of faster response times (RTs) to monochromatic lights when they were accompanied by simple tones presented simultaneously or at short intervals before or after target stimuli. Therefore, a temporally synchronized stimulus in one sensory modality facilitates the detectability of another stimulus presented in another sensory modality. Behavioral facilitation in multisensory contexts capitalizes on the spatial and temporal properties of cross-modal, audiovisual (AV) stimuli as well as their semantic relatedness. The critical stimuli used in the present study included complex sounds and semantically related black-and-white line drawings of animate and inanimate objects. We examined the neural processes that mediate the associations between familiar auditory and visual stimuli, which can be referred to as semantic multisensory integration resembling naturalistic situations of multisensory stimulation. Multisensory integration, in the present context, refers to the neural process by which unisensory signals are combined to form a unique signal that is specifically associated with the cross-modal stimulus. It is operationally defined as a multisensory response, both neural and behavioral, that is significantly distinct from the sum of the responses evoked by the modality-specific component stimuli (cf. Stein et al. 2010). Whereas integration of semantically congruent cross-modal stimuli speeds up object detection and recognition, crossmodal conflicts impair performance (Chen and Spence 2010). Previous research demonstrated the presence of visual dominance during cross-modal conflicts, with ambiguous visual inputs impairing the recognition of complex sounds more than ambiguous auditory information impaired visual object recognition (Laurienti et al. 2004; Yuval-Greenberg and Deouell 2009). Furthermore, studies from Spence and colleagues (Hartcher-O’Brien et al. 2008; Koppen et al. 2008; Koppen and Spence 2007a, 2007b; Sinnett et al. 2007, 2008) reported longer RTs following cross-modal stimuli when participants were required to detect the presence of simple tones embedded within cross-modal presentations. These findings further support the interference of the more dominant visual sensory modality on the detection of auditory targets. To date, few studies have examined the effects of both multisensory facilitation and competition on performance. Sinnett et al. (2008) showed that presentations of complex sounds alongside visual objects can either speed up or slow down RTs depending on task demands. Participants were significantly 0022-3077/11 Copyright © 2011 the American Physiological Society www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 Diaconescu AO, Alain C, McIntosh AR. The co-occurrence of multisensory facilitation and cross-modal conflict in the human brain. J Neurophysiol 106: 2896 –2909, 2011. First published August 31, 2011; doi:10.1152/jn.00303.2011.—Perceptual objects often comprise a visual and auditory signature that arrives simultaneously through distinct sensory channels, and cross-modal features are linked by virtue of being attributed to a specific object. Continued exposure to cross-modal events sets up expectations about what a given object most likely “sounds” like, and vice versa, thereby facilitating object detection and recognition. The binding of familiar auditory and visual signatures is referred to as semantic, multisensory integration. Whereas integration of semantically related cross-modal features is behaviorally advantageous, situations of sensory dominance of one modality at the expense of another impair performance. In the present study, magnetoencephalography recordings of semantically related cross-modal and unimodal stimuli captured the spatiotemporal patterns underlying multisensory processing at multiple stages. At early stages, 100 ms after stimulus onset, posterior parietal brain regions responded preferentially to cross-modal stimuli irrespective of task instructions or the degree of semantic relatedness between the auditory and visual components. As participants were required to classify cross-modal stimuli into semantic categories, activity in superior temporal and posterior cingulate cortices increased between 200 and 400 ms. As task instructions changed to incorporate cross-modal conflict, a process whereby auditory and visual components of crossmodal stimuli were compared to estimate their degree of congruence, multisensory processes were captured in parahippocampal, dorsomedial, and orbitofrontal cortices 100 and 400 ms after stimulus onset. Our results suggest that multisensory facilitation is associated with posterior parietal activity as early as 100 ms after stimulus onset. However, as participants are required to evaluate cross-modal stimuli based on their semantic category or their degree of congruence, multisensory processes extend in cingulate, temporal, and prefrontal cortices. MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT J Neurophysiol • VOL occipital complex (LOC), which has been previously linked to visual object recognition (Malach et al. 1995). Electrophysiological research of semantic cross-modal conflicts has focused primarily on conflicts regarding the phonetic perception of speech. Whereas congruent articulatory gestures significantly speed up auditory processes related to speech (van Wassenhove et al. 2005), incompatible visual input can greatly impair speech perception. The most documented case of crossmodal conflict during speech is the McGurk effect (McGurk and MacDonald 1976), in which incongruent or temporally asynchronous visual representations of speech modify the auditory percept phonetically (McGurk and MacDonald 1976). Several MEG studies (e.g., Kaiser et al. 2005; Nishitani and Hari 2002) showed oscillatory gamma-band activity in response to McGurk-like stimuli beginning in the posterior parietal scalp region at 160 ms and extending over occipital and temporal-to-inferior frontal regions between 200 and 320 ms. To determine the brain regions that are chiefly involved in multisensory integration during speech, previous studies used transcranial magnetic stimulation (TMS) and temporally perturbed task-related neural activity. Disruption of posterior parietal activity 200 ms after stimulus onset impaired AV integration of tones and shapes (Pourtois and de Gelder 2002). In addition, TMS applied to the superior temporal sulcus (STS) significantly reduced the McGurk effect within 100 ms after auditory syllable onset (Beauchamp et al. 2010). With respect to semantically incongruent combinations of nonspeech sounds and visual objects, large negative amplitude deflections were captured 390 ms after stimulus onset (Molholm et al. 2004). This ERP pattern resembled the N400 component, which was previously implicated in semantic mismatch processing of objects in linguistic contexts (Kutas and Federmeier 2000). Although the effects of cross-modal incongruence were not explicitly examined, recent functional magnetic resonance imaging (fMRI) studies suggested that cingulate and inferior frontal cortices were recruited during semantic conflicts (Laurienti et al. 2003; Van Petten and Luka 2006). In the present study, we investigated the effects of both multisensory facilitation and semantic cross-modal conflicts in sensory-specific and modality-independent brain regions. We predicted that the integrated neural signal would be larger and, most importantly, that multisensory processes would engage a distinct set of brain regions than each of the responses evoked by the modality-specific component stimuli. We considered it to be a principled prediction that multisensory integration processes are not simply the linear combination of unisensory responses. Recent multiple cell recording studies (Molholm et al. 2006) and functional neuroimaging studies in humans (Baumann and Greenlee 2007; Bishop and Miller 2008; Calvert et al. 2001; Grefkes et al. 2002; Macaluso et al. 2004; Moran et al. 2008) showed that cross-modal stimuli not only elicited increased activity in sensory-specific cortices but also activated a distinct network of posterior parietal brain regions, including the inferior parietal sulcus (IPS), the inferior parietal lobule (IPL), and the superior parietal lobule (SPL). All conditions used in the present study included both unimodal and cross-modal presentations of familiar complex sounds and visual objects. Behavioral multisensory facilitation effects that capitalize on the semantic relatedness of crossmodal stimuli were assessed by asking participants to categorize stimuli based on their animacy (i.e., living vs. nonliving 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 faster to detect visual targets when they were embedded within cross-modal presentations than when they were presented alone. However, when required to identify complex sounds that were either presented in isolation or as cross-modal pairs, participants were significantly slower to respond to crossmodal stimuli compared with unimodal ones. Given the behavioral advantages of multisensory integration and the performance decreases following cross-modal conflicts, more research is required to determine whether a common network of brain regions is responsible for processing cross-modal conflict compared with multisensory facilitation processes. Furthermore, more research is required to determine the extent to which multisensory processes are modulated by the degree of semantic correspondence between visual and auditory stimuli. Thus, in the present study, we used magnetoencephalography (MEG) recordings to investigate the neural mechanisms underlying both multisensory facilitation and cross-modal conflict during the presentation of complex sounds and semantically related black-and-white line drawings of animate and inanimate objects. Noninvasive electrophysiological studies in humans that used semantically related AV stimuli and contrasted crossmodal with unimodal stimulus presentations demonstrated modulations of both early and late event-related related potentials (ERPs) over both sensory-specific and modality-independent cortical areas (e.g., Molholm et al. 2004; Yuval-Greenberg and Deouell 2007). For instance, recent studies found increases in the visual N1 waveform at occipital sites for cross-modal presentations compared with unimodal ones (Molholm et al. 2004; Yuval-Greenberg and Deouell 2007). YuvalGreenberg and Deouell (2007) also examined oscillatory gamma-band activity (30 –70 Hz) in response to cross-modal stimuli using a name verification task of naturalistic objects. Early and late gamma-band activity at 90 and 260 ms, respectively, was associated with low-level feature integration and higher level object representation. Similarly, using a perceptual decision task, Schneider et al. (2008) found enhanced gamma-band activity between 120 and 180 ms in response to semantically congruent relative to semantically incongruent cross-modal pairs of common objects. There is increasing evidence that meaning and semantic relatedness play an important role in multisensory integration. For instance, Senkowski et al. (2007) compared the effects of natural and abstract stimuli reflecting motion. Participants were presented with a random stream of naturalistic and abstract video clips and static target stimuli. Each stimulus class consisted of unimodal auditory, unimodal visual, and cross-modal presentations. The participants’ task was to count the target stimuli embedded within the natural and abstract stimulus classes. Only the former showed evidence of early multisensory integration within 120 ms after stimulus onset. Furthermore, naturalistic motion represented by AV presentations engaged occipital, temporal, and frontal regions. Murray et al. (2004), on the other hand, demonstrated the efficacy of cross-modal stimuli on subsequent memory recall by showing that visual images were more likely to be remembered when they were previously presented along with semantically related complex sounds than when they were presented only in the visual modality. Cross-modal presentations, in this case, led to increases in cortical activity in the right lateral 2897 2898 MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT objects). Conversely, both behavioral and neural responses to cross-modal conflicts were examined by requiring participants to detect the level of semantic correspondence between auditory and visual stimuli embedded within cross-modal presentations that were either semantically congruent or incongruent. We were particularly interested to compare multisensory integration processes in conditions of semantic congruence and behavioral facilitation to multisensory processes in conditions containing cross-modal semantic conflict and reductions in performance. MATERIALS AND METHODS Participants Stimuli Stimuli were selected to have semantically congruent auditory and visual representations. Two types of stimuli, animate and inanimate, were used in the study. Items were selected from four distinct categories: 1) animals, 2) musical instruments, 3) automobiles, and 4) household objects. The first category of stimuli was labeled as “animate,” whereas the remaining three categories were considered “inanimate” objects. Black-and-white line drawings, selected from the Snodgrass and Vanderwart (1980), served as the visual stimuli. All visual stimuli were matched according to size (in pixels), brightness, and contrast. Participants sat in an upright position and viewed the visual stimuli on a back-projection screen that subtended ⬃30 degrees of visual angle when they were seated 70 cm from the screen. Semantically related nonspeech, complex sounds were matched in terms of loudness by computing the mean root mean square value. Each complex sound was assigned the mean amplitude; thus louder sounds were reduced, whereas softer ones were amplified. Complex sounds were delivered binaurally at an intensity level of 60 dB HL based on the audiometric mean across both ears. Binaural auditory stimuli were presented via an OB 822 Clinical Audiometer through ER30 transducers (Etymotic Research, Elk Grove, IL) and connected with a 1.5-m length of matched plastic tubing and foam earplugs to the participants’ ears. Complex sounds and line drawings were paired to create crossmodal congruent stimulus combinations in which the AV stimuli matched semantically (e.g., picture of a lion paired with the sound of a roar or picture of an ambulance car paired with a siren). Incongruent cross-modal stimulus combinations, on the other hand, were created by randomly pairing complex sounds with visual objects from distinct categories; thus a complex sound or a picture from the animate category was always paired with a picture or complex sound from the inanimate category. In summary, four stimulus types were employed: 1) auditory unimodal (A), 2) visual unimodal (V), 3) cross-modal congruent (AV⫹), and 4) cross-modal incongruent (AV⫺). To ensure that the complex sounds were easily nameable and identifiable, we assessed accuracy values and RTs for each stimulus exemplar in an initial behavioral pilot. Five young adults (mean age: 26 yr) participated in this initial pilot. Complex sounds were excluded if detection accuracy levels fell below 75% and RTs exceeded 2 SD above the mean RT values for each individual subject. Furthermore, after behavioral testing, we also asked participants to rate complex J Neurophysiol • VOL Procedure Each stimulus or stimulus pair was presented for 400 ms; for the auditory stimuli, the 400-ms interval also included a 5-ms fall and rise time. The time interval between the end of the stimulus presentation and the beginning of the next trial varied between 2 and 4 s (equiprobable). See Fig. 1 for an illustration of the paradigm. Multisensory facilitation, the RT gain following semantically congruent AV stimuli, was assessed using a simple detection and a feature classification task. During the detection task, participants were instructed to respond (left index finger response) as quickly as possible to any trial type: unimodal A, unimodal V, cross-modal congruent (AV⫹), and cross-modal incongruent (AV⫺). Forty random presentations of each stimulus type and a total of 160 trials were used in this condition. Therefore, because we used 30 exemplars for each trial type, certain auditory or visual stimuli were presented more than once. In the semantic classification task, participants were required to make animacy or inanimacy judgments following unimodal A, unimodal V, and cross-modal congruent AV⫹ trial types (left index and middle finger responses for animate and inanimate judgments, respectively). The correspondence between the response keys and the type of stimulus presentation was counterbalanced between participants. Table 1. List of auditory and visual stimuli used in the experiment Animate Cow Pig Sheep Lamb Dog Cat Horse Donkey Chicken Turkey Rooster Carnival Sparrow Goose Owl Penguin Whale Dolphin Fly Cricket Frog Snake Crocodile Lion Bear Wolf Tiger Puma Elephant Monkey 106 • DECEMBER 2011 • www.jn.org Inanimate Musical instruments Violin Guitar Harp Banjo Trumpet Bell Clarinet Piano Organ Drum Cymbal Automobiles/transportation Train Ship Car Ambulance Fire truck Motorcycle icycle Airplane Helicopter Household objects Frying pan Fork Sink Bottle Hairdryer Door Telephone Tea kettle Camera Clock Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 Eighteen young adults (9 males, ages 19 –25 yr, mean ⫾ SD: 24.17 ⫾ 3.9 yr), all right-handed with healthy neurological histories, normal to corrected-to-normal vision and hearing, and an average of 16.39 yr of education, participated in this study. The study was approved by the joint Baycrest Centre-University of Toronto Research Ethics Committee, and the rights and privacy of the participants were observed. All participants gave formal informed consent before the experiment and received monetary compensation for participation. sounds based on their recognizability. On the basis of the behavioral findings and the postexperiment questionnaire results, we excluded several complex sounds along with their visual counterparts. Thus, for each animate or inanimate category, 30 different exemplars from each sensory modality (auditory and visual) were selected because they were unambiguously categorized. See Table 1 for a complete list of the visual and auditory stimuli used in the experiment. In total, 60 animate stimuli (30 auditory and 30 visual) and 60 inanimate stimuli (30 auditory and 30 visual) were used. MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT 2899 system for preprocessing and analysis of the MEG data. Three participants were eliminated from the analysis due to excessive head motion during the MEG scan. Thus MEG data from a total of 15 participants were preprocessed and subsequently analyzed. Neuromagnetic activity was sampled at a rate of 1,250 Hz and was recorded continuously in 6 experimental blocks (i.e., 3 experimental conditions repeated once) of 15 min recording time each. Participants took a minimum 5-min break between each block. We divided the conditions into 15-min blocks to maximize the total number of trials for each condition while also keeping the blocks short to minimize the amount of head motion in the MEG scanner. The simple detection condition was always presented first; however, the order of the rest of the experimental condition blocks was randomly assigned to each participant. MEG Preprocessing Forty randomized presentations of each stimulus type and a total of 120 trials were used in this condition. To assess cross-modal conflicts, we asked participants to indicate whether auditory and visual signatures within cross-modal presentations were semantically congruent or not. As such, in the congruency task, participants discriminated between congruent and incongruent cross-modal presentations by pressing one of two responses (left index and middle finger response keys). Forty random presentations of each stimulus type were used. The correspondence between the response keys and the type of stimulus presentation was counterbalanced between participants to control for any finger dominance effects on response latencies. Presentation software (version 10.3; Neurobehavioral Systems, San Francisco, CA; http://www.neurobs.com/) was used to control visual and auditory stimulus delivery and to record participants’ response latency and accuracy. MEG Recordings The MEG was recorded in a magnetically shielded room at the Rotman Research Institute, Baycrest Centre, using a 151-channel whole head neuromagnetometer (OMEGA; VSM Medtech, Vancouver, Canada). MEG collection was synchronized to the onset of each visual stimulus by recording the luminance changes of the screen with a photodiode. With respect to the auditory stimuli, the MEG data collection was synchronized to the onset of the auditory sound envelope. Participants’ head positions within the MEG scanner were determined at the start and at the end of each recording block using indicator coils placed on nasion and bilateral preauricular points. These fiducial points established a head-based Cartesian coordinate J Neurophysiol • VOL MEG Data Analysis Event-related SAM analysis. The “nonlinear beamformer” or synthetic aperture magnetometry (SAM) technique was used to model the source of the measured magnetic field (Robinson and Vrba 1998; Sekihara et al. 2001). SAM minimizes power or the variance of the measured MEG signals such that signals emitted from sources outside each specified voxel are suppressed (Brookes et al. 2007; Cheyne et al. 2007). This enables one to display simultaneously active sources at multiple sites, provided that they are not perfectly synchronized. To obtain spatial precision without integrating power over long temporal windows, we used an event-related version of the SAM analysis technique, event-related SAM or ER-SAM, introduced by Cheyne et al. (2006) to identify evoked brain responses from unaveraged singletrial data. Despite their advantages in terms of spatial resolution and localization accuracy, the performance of beamformers can be reduced in the presence of sources with high temporal correlations. However, beamformers used to localize time-locked brain activity, including ER-SAM, have been shown to be fairly robust in the presence of sources with small to moderate correlations, exhibiting only small decreases in amplitude in the presence of such sources (Gross et al. 2001; Hadjipapas et al. 2005; Sekihara et al. 2002). It is important to consider, however, that the proximity between highly correlated sources may reduce estimated power in such sources. Quraan and Cheyne (2010) used simulations of fully correlated bilateral sources in the auditory cortex as well as correlated bilateral sources in striate and extrastriate areas with random Gaussian noise and demonstrated that source cancellations were less likely to occur at signal-to-noise levels typical of neural sources in current MEG systems. Another reason for selecting ER-SAM over more traditional beamformers is that it can localize sources that are time-locked to stimulus events even in the presence of artifacts and instrumental noise (Cheyne et al. 2007). 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 Fig. 1. Experimental design. Four possible stimulus combinations were used in this study: 1) unimodal visual (V; example, picture of a violin), 2) unimodal auditory (A; example, sound of a “roar”), 3) cross-modal congruent or simultaneous auditory and visual stimuli that were matched semantically (AV⫹; example, picture of a bird matched with a corresponding “chirp” sound), and 4) cross-modal incongruent or simultaneous auditory and visual stimuli that were semantically mismatched (AV⫺; example, picture of a lion matched with a “siren” sound). Each stimulus or stimulus pair was presented for 400 ms; for the auditory stimulus, the 400-ms interval also included a 5-ms fall and rise time. The time interval between the end of the stimulus presentation and the beginning of the next trial (ITI) varied between 2 and 4 s (equiprobable). Third-gradient noise correction was applied to the continuous MEG data. Afterwards, the MEG data were parsed into epochs including a 200-ms prestimulus and 1,000-ms poststimulus activity window, and DC offsets were removed from the entire epoch. Finally, MEG data were band-pass filtered between 0.1 and 55 Hz and averaged across all trial types. Neuromagnetic activity was also coregistered to each participant’s individual structural MRI. To constrain the sources of activation to each participant’s head shape and structural anatomy, we acquired MRI scans for each participant using a 3.0T Siemens Tim MAGNETOM Trio MRI scanner (syngo.MR software; Siemens Medical, Erlangen, Germany) with 12-channel head coil. All participants’ structural MRIs and MEG source data were spatially normalized to the Talairach standard brain using Analysis of Functional Neuroimaging software (AFNI; Cox 1996). 2900 MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT J Neurophysiol • VOL cantly different from random noise. Second, the reliability of each source contribution was assessed using a bootstrap estimation of standard errors for the MEG source saliences. The primary purpose of the bootstrap is to determine those portions of the source waveforms that show reliable experimental effects across subjects. Importantly, with the use of bootstrap estimation of standard errors, no correction for multiple comparisons was necessary, because the source saliences were calculated in a single mathematical step on the whole brain at once (McIntosh et al. 1996; McIntosh and Lobaugh 2004). These two resampling techniques provide complementary information about the statistical strength of the task effects and its reliability across participants. Statistical evaluation of task effects was performed using an optimal number of 500 permutations (cf. Nichols and Holmes 2002) and 300 bootstrap iterations (cf. Efron and Tibshirani 1986; McIntosh et al. 1996). RESULTS Behavioral Results Performance accuracy. In the simple detection condition, accuracy levels were at ceiling in all four trial types (i.e., AV⫹, AV⫺, A, and V). In the animacy condition, a larger number of errors were observed for unimodal A compared with AV⫹ and unimodal V trial types [F(2,13) ⫽ 207.33, P ⬍ 0.001]. Furthermore, in the congruency condition, participants showed more false alarms in response to AV⫹ trial types. In other words, participants were more likely to classify AV stimuli as incongruent in cases of semantic congruence [F(1,14) ⫽ 131.81, P ⬍ 0.0001]. See Table 3 for means and SD values in the three conditions and four trial types. Response times. MULTISENSORY FACILITATION. Only correct responses were used in the RT data analysis. In multisensory facilitation conditions (simple detection and semantic classification), RTs were analyzed using a 2 ⫻ 3 repeated-measures analysis of variance (ANOVA) in which trial type (i.e., A, V, and AV⫹) and condition (i.e., simple detection and semantic classification) were the within-subject factors. First, a main effect of condition was observed [F(1, 14) ⫽ 478.96, P ⬍ 0.001] with RTs for semantic classification conditions being longer (mean ⫾ SD: 657 ⫾ 17 ms) than RTs in the simple detection condition (289 ⫾ 14 ms). Second, a main effect of trial type was identified [F(2, 28) ⫽ 135.83, P ⬍ 0.001] with unimodal A trial types showing significantly longer RTs than unimodal V and cross-modal trial types across all three conditions (see Table 3 for means ⫾ SD values in each condition and trial type; Fig. 2, A and B). Finally, an interaction between conditions and trial types was also found [F(2, 28) ⫽ 29.90, P ⬍ 0.0001] reflecting significant behavioral facilitation to cross-modal compared with unimodal A presentations in both semantic classification and simple detection conditions (see Table 4 for an overview of the repeated-measures ANOVA results). In the simple detection condition, post hoc t-tests suggested that RTs to unimodal V trial types were also significantly slower than RTs to AV⫹ trial types (t ⫽ 11.47, P ⬍ 0.001) and RTs to AV⫺ trial types (t ⫽ 11.35, P ⬍ 0.001). In conclusion, multisensory facilitation was captured across both auditory and visual modalities in the simple detection condition and in the auditory modality only in the semantic classification condition. CROSS-MODAL CONFLICTS. We examined the effects of crossmodal incongruence on performance by comparing congruent (AV⫹) and incongruent (AV-) trial types in both simple 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 Similarly to previous beamforming approaches, the ER-SAM analysis uses the individual trials of each condition and the forward solution for modeling optimal current direction to calculate a spatial filter for each voxel using the minimum-variance beamforming algorithm (Cheyne et al. 2006). The spatial filter we used included 72 brain regions of interest adapted from the regional map coarse parcellation scheme of the cerebral cortex proposed by Kotter and colleagues (Bezgin et al. 2008; Kotter and Wanke 2005). See Table 2 for a complete listing of the brain regions used with their respective Talairach coordinates. Each brain region was defined by a threedimensional position vector and consisted of a unique set of sensor coefficients that constituted a weighting matrix. Thus all brain regions were constrained to be the same size. The MEG data were then projected through this spatial filter to give a measure of current density, as a function of time, in the target brain region. Because this source time series was calculated using a weighted sum of the MEG sensors, it had the same millisecond time resolution as the original MEG sensor data. To enhance the spatial precision of this technique, we used the participants’ structural MRIs to constrain the ER-SAM images to each participant’s individual MRI and to allow for spatial normalization and group averaging in stereotaxic space. The individual functional maps were overlaid on the individual participant’s MRI based on coregistration with the indicator coils placed on the nasion and bilateral preauricular points. The functional data were then transformed to the standard TalairachTournoux space using the same transformation applied to the structural MRI (AFNI software; Cox 1996). PLS analysis. We used partial least squares (PLS) analysis (McIntosh et al. 1996) to examine neuromagnetic brain activity across all 72 brain regions of interest as a function of task demands, namely, multisensory facilitation and cross-modal conflict. The term “partial least squares” refers to the computation of an optimal squares fit to part of a covariance structure that is attributable to the experimental manipulations. PLS applied to MEG data is conceptually analogous to the analysis of MEG difference waveforms, because it identifies task-related differences in amplitude across all MEG sources by deriving the optimal least squares contrasts that code for the task differences. Because PLS performs this computation across the entire dataset in time and space simultaneously, there is no need to specify a priori MEG sources or time intervals for the analysis. In the original version of PLS, singular value decomposition (SVD) was used to identify the strongest effects in the data. For several analyses discussed in this study, we used a nonrotated version of task PLS, in which a priori contrasts restricted the spatiotemporal patterns derived from PLS. This version of PLS has the advantage of allowing direct assessment of hypothesized experimental effects. Task differences were examined across the entire epoch, including the prestimulus baseline (⫺0.2 s) and the poststimulus interval (1 s). The relationship between two sets of matrices is analyzed: the first pertains to the experimental design (or to the performance measures), whereas the second contains MEG source activity across all time points. The relationship between the two blocks of data is stored in a cross-product matrix. Application of SVD decomposes this matrix into three new matrices: 1) task saliences, 2) singular values, and 3) source saliences. The task saliences indicate the experimental design profiles that best characterize the given matrix. The source saliences reflect the MEG sources that characterize the corresponding task pattern across space (expressed across a collection of MEG sources) and time (expressed across all time points included in the analysis). Task and source saliences are linked together via singular values, which represent the square root of the eigenvalues. The cross-product between the task saliences and original matrix creates “brain scores,” which reflect the expression of the task contrast across all participants (see Figs. 4 —7). Statistical assessment. For statistical assessment of task effects, two complementary resampling techniques were employed. First, permutation tests assessed whether the task saliences were signifi- MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT 2901 Table 2. Regional map coordinates with reference to Talairach-Tournoux X Y Z Region BA Hemisphere 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 0 0 0 0 ⫺40 ⫺60 ⫺36 ⫺36 ⫺36 ⫺24 ⫺44 ⫺44 ⫺8 ⫺28 ⫺48 ⫺48 ⫺8 ⫺8 ⫺24 ⫺24 ⫺48 ⫺28 ⫺28 ⫺4 ⫺44 ⫺16 ⫺40 ⫺56 ⫺64 ⫺64 ⫺52 ⫺52 ⫺32 ⫺8 ⫺4 ⫺4 ⫺20 ⫺20 40 60 36 36 36 24 44 44 8 28 48 48 8 8 24 24 48 28 28 4 44 16 40 56 64 64 52 52 32 8 4 32 ⫺32 ⫺48 16 ⫺14 ⫺14 8 16 ⫺8 ⫺24 ⫺48 ⫺64 ⫺64 ⫺56 32 36 36 48 44 64 32 ⫺16 0 0 4 ⫺28 ⫺28 ⫺16 ⫺24 ⫺24 12 ⫺4 ⫺28 ⫺8 ⫺84 ⫺96 ⫺88 ⫺84 ⫺14 ⫺14 8 16 ⫺8 ⫺24 ⫺48 ⫺64 ⫺64 ⫺56 32 36 36 48 44 64 32 ⫺16 0 0 4 ⫺28 ⫺28 ⫺16 ⫺24 ⫺24 12 ⫺4 ⫺28 ⫺8 ⫺84 24 24 12 ⫺8 4 4 56 ⫺4 ⫺4 56 20 28 54 54 12 32 40 20 ⫺20 4 ⫺8 ⫺16 60 60 24 4 64 16 ⫺12 ⫺24 ⫺28 ⫺8 ⫺28 4 ⫺4 8 20 ⫺12 4 4 56 ⫺4 ⫺4 56 20 28 54 54 12 32 40 20 ⫺20 4 ⫺8 ⫺16 60 60 24 4 64 16 ⫺12 ⫺24 ⫺28 ⫺8 ⫺28 4 ⫺4 Anterior cingulate cortex Posterior cingulate cortex Retrosplenial cingulate cortex Aubgenual cingulate cortex A1 (primary auditory) A2 (secondary auditory) Frontal eye fields Anterior insula Claustrum M1 (primary motor) Inferior parietal cortex Angular gyrus Precuneus Superior parietal cortex Centrolateral prefrontal cortex Dorsolateral prefrontal cortex Dorsomedial prefrontal cortex Medial prefrontal cortex Orbitofrontal cortex Frontal polar Ventrolateral prefrontal cortex Parahippocampal cortex Dorsolateral premotor cortex Medial premotor cortex Ventrolateral premotor cortex Pulvinar S1 (primary somatosensory) S2 (secondary somatosensory) Middle temporal cortex Inferior temporal cortex Temporal pole Superior temporal cortex Ventral temporal cortex Thalamus (ventral lateral nucleus) V1 (primary visual) V2 (secondary visual) Cuneus Fusiform gyrus A1 (primary auditory) A2 (secondary auditory) Frontal eye fields Anterior insula Claustrum M1 (primary motor) Inferior parietal cortex Angular gyrus Precuneus Superior parietal cortex Centrolateral prefrontal cortex Dorsolateral prefrontal cortex Dorsomedial prefrontal cortex Medial prefrontal cortex Orbitofrontal cortex Frontal polar Ventrolateral prefrontal cortex Parahippocampal cortex Dorsolateral premotor cortex Medial premotor cortex Ventrolateral premotor cortex Pulvinar S1 (primary somatosensory) S2 (secondary somatosensory) Middle temporal cortex Inferior temporal cortex Temporal pole Superior temporal cortex Ventral temporal cortex Thalamus (ventral lateral nucleus) V1 (primary visual) 32 23 30 25 Midline Midline Midline Midline Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Left Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Right Continued J Neurophysiol • VOL 106 • DECEMBER 2011 • www.jn.org 22 6 13 4 40 39 7 7 46 9 8 10 11 10 6 6 9 3 43 21 20 38 22 18 19 22 6 13 4 40 39 7 7 46 9 8 10 11 10 6 6 9 3 43 21 20 38 22 Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 Cluster 2902 MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT Table 2. —Continued Cluster X Y Z 70 71 72 4 20 20 ⫺96 ⫺88 ⫺84 8 20 ⫺12 Region V2 (secondary visual) Cuneus Fusiform gyrus BA Hemisphere 18 19 Right Right Right Coordinates are included for the 72 brain regions of interest adapted from the regional map coarse parcellation scheme of the cerebral cortex proposed by Kotter and Wanke (2005). Brodmann areas (BA) were determined using the brain atlas of Talairach and Tournoux (1988), because these landmarks provide more meaningful representations of the magnetoencephalography source locations. Cumulative distribution functions. To investigate the behavioral effects of multisensory integration further, we contrasted cross-modal trial types (i.e., AV⫹ and AV-) against the inequality model proposed by Miller (1982). The group-averaged probability for each cross-modal trial type was analyzed against the predictions of the model based on the unimodal Table 3. Descriptive statistics for the two dependent variables: accuracy and RT Variable Accuracy Simple detection AV⫹ Simple detection AV⫺ Simple detection A Simple detection V Semantic classification AV⫹ Semantic classification A Semantic classification V Congruency AV⫹ Congruency AV⫺ RT Simple detection AV⫹ Simple detection AV⫺ Simple detection A Simple detection V Semantic classification AV⫹ Semantic classification A Semantic classification V Congruency AV⫹ Congruency AV⫺ Mean 98.33 97.33 98.73 97.07 97.20 90.40 97.53 84.87 89.93 254.54 249.72 330.18 284.59 592.91 779.42 598.79 957.835 1,099.843 SD 0.90 1.35 0.46 1.28 0.94 1.68 1.06 0.48 0.30 47.77 47.41 80.37 46.25 70.73 75.87 83.25 172.0379 192.0922 Values are percentages for accuracy and milliseconds for response time (RT) for unimodal auditory (A), unimodal visual (V), cross-modal congruent (AV⫹), and cross-modal incongruent (AV⫺) trial type statistics. J Neurophysiol • VOL Fig. 2. Response times (RTs; in ms) in simple detection (A), semantic feature classification (B), and congruency conditions (C). Errors bars represent SE. 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 detection and congruency conditions. In the detection condition, we expected to find no differences between congruent and incongruent presentations. However, in the congruency condition, we anticipated that RTs would be slower as participants were required to assess the degree of mismatch between the auditory and visual stimuli embedded within cross-modal presentations. We found a significant interaction between trial types and conditions [F(1,14) ⫽ 37.39, P ⬍ 0.001]. As predicted, crossmodal congruent and incongruent trial types did not differ in the simple detection condition; however, in the congruency condition, incongruent compared with congruent trial types exhibited significantly longer RTs (Fig. 2C). In the congruency condition, post hoc t-tests revealed significant differences between congruent and incongruent AV trial types [t(14) ⫽ 6.19, P ⬍ 0.0001]. Participants were slower to respond to incongruent AV trial types by ⬃140 ms. Although they were slower, participants were also more accurate in categorizing incongruent compared with congruent presentations. Because we were primarily interested in task differences in RTs, we instructed participants to respond as fast as possible to each stimulus type. Thus, in the congruency condition, responses to congruent AV stimuli were faster, but also less accurate, than incongruent ones. Performance accuracy was 4% lower in congruent compared with incongruent trials (t ⫽ 11.4, P ⬍ 0.0001) averaging at 85% and 89%, respectively. MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT Table 4. Repeated-measures ANOVA results and descriptive statistics Effect Condition Trial type Condition ⫻ trial type F Hypothesis df Error df P Partial 2 1.0000 2.0000 14.0000 13.0000 0.0000 0.0000 0.9716 0.9408 2.0000 13.0000 0.0002 0.7345 478.9638 135.8296 29.89567 95% Confidence Interval Effect SD Lower bound Upper bound 289.77 657.04 14.78 17.58 258.07 619.34 321.47 694.74 423.72 554.80 441.69 12.92 15.58 15.64 396.00 521.39 408.15 451.44 588.21 475.24 254.54 330.18 284.59 592.91 779.42 598.79 12.33 20.75 11.94 18.26 19.59 21.49 228.08 285.68 258.98 553.74 737.41 552.69 280.99 374.69 310.21 632.08 821.43 644.89 Values are results of repeated-measures analysis of variance (ANOVA) and other descriptive statistics as indicated. df, Degrees of freedom. distributions. We generated subject-specific cumulative distribution functions (CDFs) for each trial type by dividing the time series into smaller, 5-ms time bins, similar to the procedure used by Laurienti and colleagues under similar task demands (Laurienti et al. 2006; Peiffer et al. 2007). The race model CDF was generated for each subject and averaged at each time bin. Paired t-tests were used to compare multisensory responses against the statistical facilitation predictions of the race model based on the unimodal RT distributions for each participant. In the simple detection condition, the probability of detecting faster RTs following cross-modal presentations exceeded the probability predicted by statistical facilitation associated with redundant unisensory stimuli (t ⫽ 69.92, P ⬍ 0.001). Similarly, in the semantic classification condition, multisensory responses were also faster than those predicted by the race model (t ⫽ 19.25, P ⬍ 0.003). Note that the probabilities of detecting shorter RTs in AV⫹ trials were not larger than the probabilities of detecting short RTs in unimodal visual trials in the semantic classification condition, further indicating that semantic classification of visual objects does not benefit from concurrent auditory presentations (see Fig. 3B). MEG Results Multisensory processing. In both simple detection and semantic classification conditions, significant differences between cross-modal and unimodal auditory presentations were detected [first-order latent variable (LV1) ⫽ 48.94, P ⬍ 0.001; Fig. 4A]. Multisensory integration, or increased MEG source activity in response to cross-modal compared with unimodal auditory stimuli, was captured in the left temporal pole (BA 38) and the left angular gyrus (BA 39) between 100 –300 and 200 – 400 ms, respectively (Fig. 4B). Cross-modal and unimodal V trial types also differed significantly (LV1 ⫽ 52.43, P ⬍ 0.001; Fig. 4C). Multisensory processes were captured in the left cuneus (BA 18) and the left IPL (BA 40) between 100 –300 and 200 – 400 ms, respectively, with these sources showing larger amplitude modulations to cross-modal stimuli compared with visual ones. (Fig. 4D). Our initial prediction that posterior parietal cortices responded preferentially to cross-modal trial types was also supported by an analysis that compared cross-modal (AV⫹ and AV⫺) with unisensory presentations (LV1 ⫽ 44.96, P ⬍ 0.05; Fig. 5A). When comparing cross-modal stimuli with their respective unisensory components, we found evidence of multisensory integration in the left precuneus (BA 7) and the right primary motor cortex (BA 4) between 100 –200 and 300 – 600 ms after stimulus onset, respectively (Fig. 5B). EFFECTS OF SEMANTIC EVALUATION. The multisensory integration neural processes described above were captured in both Fig. 3. Cumulative probability density function (CDF) for cross-modal trial types and multisensory responses predicted by the race (or inequality) model. Multisensory and calculated (predicted) race model CDFs are shown. Multisensory responses were faster than those predicted by the race model in both detection (A) and semantic classification conditions (B). Note that in the semantic classification condition, multisensory facilitation was only detected when auditory stimuli were accompanied by visual ones, rather than vice versa. J Neurophysiol • VOL 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 Condition Simple detection Semantic classification Trial type AV⫹ A V Condition ⫻ trial type Simple detection AV⫹ Simple detection A Simple detection V Semantic classification AV⫹ Semantic classification A Semantic classification V Mean 2903 2904 MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT simple detection and semantic classification conditions. In the semantic classification condition, increased amplitude modulations in response to semantically congruent cross-modal stimuli were observed in a distinct set of brain regions. In addition to multisensory responses in sensory-specific and posterior parietal brain regions, increased activity in posterior cingulate and superior temporal cortices were observed as participants were required to classify cross-modal trial types into animate and inanimate categories (LV1 ⫽ 48.13, P ⬍ 0.05; Fig. 6A). The posterior cingulate cortex and the left STS showed larger amplitudes in response to AV⫹ trial types between 100 –300 and 500 –700 ms, respectively (Fig. 6B). The effects of semantic classification across cingulate and superior temporal regions were specific to semantically congruent cross-modal presentations. The specificity of cingulate and temporal activity in response to cross-modal stimuli was tested explicitly by assessing the impact of semantic categorization on unisensory processing. In visual-only trial types, there was a trend toward increased activity in the left fusiform gyrus as participants classified pictures into animate or inanimate categories. A similar trend was observed in the auditory modality in the insula. These differences, however, were not significant by permutation tests. Thus we concluded that posterior cingulate and left STS activity between 100 –300 and 500 –700 ms was specific to cross-modal presentations in the semantic classification condition. EFFECTS OF CROSS-MODAL CONFLICT. Compared with the multisensory facilitation effects in the simple detection conditions, performance decreases were observed in AV⫹ and AV⫺ trials as participants were required to determine the degree of con2 2 Fig. 5. Cross-modal compared with unimodal trial types in multisensory facilitation conditions. The source waveforms were derived from the Talairach coordinates displayed at left. The bootstrap ratios above the source waveforms reflect the positive expression of the task contrast (A), i.e., larger amplitude modulations for cross-modal compared with unimodal trial types. We found increased activity in the left precuneus and the right primary motor cortex (M1) between 100 –200 and 300 – 600 ms (B). J Neurophysiol • VOL 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 Fig. 4. Cross-modal compared with unimodal auditory (A and B) and unimodal visual processing (C and D) in multisensory facilitation conditions. The source waveforms were derived from the Talairach coordinates displayed at left. The bootstrap ratios above the source waveforms reflect the positive expression of the task contrast (A and C), i.e., larger amplitude modulations for cross-modal compared with unimodal auditory and unimodal visual trial types. The left temporal pole and angular gyrus (AG) showed increased activity in response to cross-modal compared with unimodal auditory trial types between 100 –300 and 200 – 600 ms, respectively (B). The left cuneus and inferior parietal lobule (IPL) showed increased activity in response to cross-modal compared with unimodal visual trial types between 100 –300 and 200 – 400 ms (D). MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT 2905 Cortex Fig. 6. Effects of semantic feature categorization on multisensory processing. The source waveforms were derived from the Talairach coordinates displayed at left. The bootstrap ratios below the source waveforms reflect the negative expression of the task contrast (A), i.e., larger amplitude modulations for cross-modal trial types in the semantic classification condition compared with cross-modal trial types in the simple detection condition. The medial posterior cingulate and left superior temporal sulcus showed increased activity in response to cross-modal trial types in the semantic classification condition between 100 –300 and 500 –700 ms (B). between 200 – 400 and 500 –1,000 ms after stimulus onset were captured in the congruency condition when participants were required to determine whether cross-modal presentations matched semantically. Fig. 7. Effects of cross-modal conflict on multisensory processing. The source waveforms were derived from the Talairach coordinates displayed at left. The bootstrap ratios below the source waveforms reflect the negative expression of the task contrast (A), i.e., larger amplitude modulations for cross-modal trial types in the congruent condition compared with the simple detection condition, and larger amplitudes for incongruent compared with congruent trial types. The left parahippocampal cortex (PHC), dorsomedial prefrontal cortex (PFC), premotor cortex (PMC), and bilateral orbitofrontal cortex (OFC) showed increased activity in response to cross-modal trial types in the congruency condition between 200 –500 and 600 –1,000 ms (B). J Neurophysiol • VOL 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 gruency between auditory and visual targets. We observed significant differences between cross-modal trial types in the simple detection and the congruency conditions (LV1 ⫽ 55.25, P ⬍ 0.001; Fig. 7A). Large-amplitude modulations extending 2906 MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT DISCUSSION Behavior Results Multisensory facilitation at the behavioral level was observed in simple detection and semantic classification conditions with faster RTs and improved accuracy for cross-modal compared with unimodal trial types. Multisensory responses in both simple detection and semantic classification conditions also exceeded the responses predicted by the race model (cf. Fig. 3). In the semantic classification condition, however, the presentation of the visual target facilitated auditory object recognition, but not vice versa. Although motor performance was slower for unimodal A compared with cross-modal trial types, no significant differences between the unimodal visual and cross-modal trial types were observed. Indeed, previous studies demonstrated an asymmetry between auditory and visual stimuli with increased dominance of vision over auditory perception during object recognition tasks. As demonstrated previously, concurrent visual presentations helped disambiguate auditory object processes more than vice versa (Jaekl and Harris 2009; Yuval-Greenberg and Deouell, 2009). Although the behavioral results in the semantic classification condition indicate dominance of vision over audition, recent ERP studies suggest that, similarly to the visual sensory mo- dality, auditory animacy judgments begin within 100 ms after stimulus onset (Murray et al. 2006). Similarly, Thorpe et al. (1996) showed that animacy judgments of visual objects increase neural activity in sensory-specific areas 100 ms after stimulus presentation. In the congruency condition, which assessed the effects of cross-modal semantic relatedness on classification performance, slower RTs and an increased number of false alarms were observed in response to congruent presentations relative to incongruent ones. When they were faster, participants were also more likely to judge cross-modal stimuli as congruent. Similar to previous behavioral findings (cf., Laurienti et al. 2003, 2004; Noppeney et al. 2008), even after incorrect trials were excluded, RTs to congruent presentations continued to be significantly faster than RTs to incongruent presentations. Distinct Effects of Multisensory Facilitation and Cross-Modal Conflict on Neuromagnetic Activity MEG recordings and ER-SAM analysis allowed us to examine multisensory processes both temporally and spatially. We observed evidence of multisensory integration across sensory-specific and posterior parietal sources in both auditory and visual modalities. These spatiotemporal patterns were captured in both simple detection and semantic classification conditions. Within 100 –200 ms after stimulus onset, cross-modal stimuli, compared with unimodal stimuli, elicited increased amplitude modulations in left posterior parietal cortices, including the left precuneus, left angular gyrus, and the left IPL. MEG source activity in the left temporal pole and angular gyrus increased in response to cross-modal presentations relative to unimodal auditory presentations. Although we observed significant sensory modality differences in performance, the timing of multisensory responses did not differ between auditory and visual modalities. Similar to the auditory modality, larger amplitudes to cross-modal compared with unimodal visual trial types extended between 100 –300 and 200 – 400 ms. In addition, examination of multisensory responses against the responses evoked by both modality-specific components revealed enhanced activity in the left posterior parietal sources, including the left precuneus, and the right motor cortex between 100 and 400 and at 100 ms, respectively, in response to cross-modal stimuli. Posterior parietal activity has been previously linked to multisensory processing. Neuroimaging studies using fMRI recordings to examine multisensory integration processes Fig. 8. Congruent compared with incongruent cross-modal matching. The source waveforms were derived from the Talairach coordinates displayed at left. The bootstrap ratios below the source waveforms reflect the negative expression of the task contrast (A), i.e., larger amplitude modulations for cross-modal trial types in the congruent condition compared with the simple detection condition, and larger amplitudes for incongruent compared with congruent trial types. The left dorsomedial PFC and the left centrolateral PFC showed increased activity to incongruent compared with congruent trial types between 100 and 300 ms (B). J Neurophysiol • VOL 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 The left parahippocampal cortex (PHC) and the left dorsomedial prefrontal cortex (PFC) showed large amplitude modulations between 200 – 400 and 400 – 600 ms in response to AV⫹ and AV⫺ trial types in the congruency condition compared with the simple detection condition (Fig. 7B). Furthermore, the bilateral orbitofrontal cortex (BA 11) and the left dorsolateral premotor cortex (PMC) were also associated with cross-modal presentations in the congruency condition between 400 – 600 and 600 –1,000 ms (Fig. 7B). Source activity in the left dorsolateral PMC was positively correlated with RTs across all trial types in the congruency condition between 500 and 1,000 ms after stimulus onset (r ⫽ 0.942, 0.945, 0.71, and 0.82, P ⬍ 0.001, in AV⫹, AV⫺, A, and V trial types, respectively). Within the congruency condition, we also observed differences between congruent and incongruent trial types (LV1 ⫽ 41.98, P ⬍ 0.05; Fig. 8A). The left dorsomedial PFC and the left centrolateral PFC showed larger responses to incongruent compared with congruent trial types between 100 and 300 ms (Fig. 8B). MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT J Neurophysiol • VOL lack of significant differences between animate and inanimate trial types in motor and premotor sources. Within cross-modal trial types, the left dorsomedial and centrolateral prefrontal cortices responded preferentially to incongruent cross-modal combinations compared with congruent ones 100 – 400 ms after stimulus onset. Previous fMRI studies that examined the effects of semantic congruence on multisensory integration suggested that the inferior frontal cortex activated in response to cross-modal stimuli when auditory and visual representations were semantically mismatched. Noppeney et al. (2008, 2010) used pictures of animate (animals) and inanimate (man-made) objects concurrently presented with matching and nonmatching complex sounds or spoken words. Stronger effects of semantic relatedness were observed in the posterior STS and middle temporal gyrus in response to words. However, only medial and inferior frontal cortices showed incongruency effects for both spoken words and complex sounds when paired with animate or inanimate pictures. In line with these results, Belardinelli et al. (2004) also demonstrated that the inferior frontal cortex responded more strongly to incongruent cross-modal combinations of complex stimuli (Belardinelli et al. 2004). The present study extends previous examinations of crossmodal conflict by demonstrating that detecting the degree of semantic AV incongruence elicits increased activity in medial and centrolateral prefrontal cortices as early as 100 ms after stimulus onset. Furthermore, as participants are instructed to identify the degree of cross-modal semantic relatedness, additional sources, namely, left PHC, show large-amplitude modulations as early as 200 ms after stimulus onset. After ⬃250 ms, premotor and bilateral orbitofrontal cortices show similar task-related amplitude modulations with premotor activity reflecting motor response preparation. Conclusion Multisensory integration, or the distinct neural processes associated with cross-modal relative to unimodal events, was captured in sensory-specific and posterior parietal brain regions within 100 and 200 ms after stimulus onset. Cingulate and superior temporal cortices responded preferentially to cross-modal presentations between 100 and 500 ms as participants classified cross-modal pairs into animate or animate categories. Finally, in trials that required participants to detect the level of cross-modal incongruence, medial and centrolateral prefrontal cortices showed early amplitude modulations in response to incongruent pairs compared with congruent ones. These results suggest that integration of semantically related complex sounds and pictures unfolds in multiple stages and across distinct parietal and medial frontal cortical regions: first, sensory-specific and posterior parietal neurons respond preferentially to cross-modal stimuli irrespective of task demands and as early as 100 ms after stimulus onset; second, regions in superior temporal and posterior cingulate cortices respond to cross-modal stimuli during semantic feature classification trials between 100 and 500 ms; and finally, within 100 – 400 ms and 400 – 800 ms after stimulus onset, parahippocampal, medial prefrontal, and orbitofrontal regions process crossmodal conflicts and respond to cross-modal stimuli when complex sounds and pictures are semantically mismatched. 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 showed that posterior parietal regions including IPS, the IPL (BA 40), and the SPL (BA 39) responded preferentially to cross-modal compared with unimodal presentations (Baumann and Greenlee 2007; Bishop and Miller 2008; Calvert et al. 2001; Grefkes et al. 2002; Macaluso et al. 2004; Moran et al. 2008). Previous ERP studies also found early-amplitude deflections across sensory-specific channels and late-amplitude deflections in parietal scalp regions during the presentation of semantically and spatially congruent AV stimuli (Giard and Peronnet 1999; Teder-Salejarvi et al. 2002, 2005). Although multisensory processes were captured bilaterally in posterior parietal cortices, the difference between crossmodal and unimodal trial types was larger and more stable across participants in the left hemisphere. This left hemispheric bias in posterior parietal activity during multisensory processing is not uncommon. There is growing evidence that when cross-modal activation patterns are not found bilaterally, they are usually observed in the left hemisphere (cf. Baumann and Greenlee 2007; Calvert et al. 2001; Grefkes et al. 2002; Molholm et al. 2006). In addition to increased amplitude modulations associated with cross-modal stimuli, we also demonstrated that activity in cingulate and superior temporal regions increased 100 and 500 ms after stimulus onset as participants were required to determine which cross-modal pairs pertained to animate or inanimate categories. These MEG source activation patterns were specific to cross-modal trial types, because they were not detected when contrasting unisensory responses across simple detection and semantic classification conditions. Previous neuroimaging studies also showed that simultaneously presented complex sounds and semantically-related visual stimuli recruited anterior cingulate and adjacent medial prefrontal cortices (Laurienti et al. 2003). Furthermore, consistent with our current results, Beauchamp et al. (2004) and van Atteveldt et al. (2010) showed that blood oxygen leveldependent activity in the posterior STS and the superior temporal cortex was specifically associated with the processing of familiar, semantically related cross-modal stimuli. The present study extends this line of research by demonstrating that neural activity related to semantic feature classification begins 100 ms after stimulus onset in cingulate regions and extends to superior temporal regions 400 ms later in response to cross-modal stimulus presentations only. In cross-modal matching trial types, activity in the left PHC increased between 200 – 400 and 600 –1,000 ms across both congruent and incongruent cross-modal pairs. Furthermore, the bilateral orbitofrontal cortex (BA 11) and the left PMC showed similar task-related activation patterns at later processing stages, between 500 and 1,000 ms after stimulus onset. We also found that increased activity in the left PMC predicted slower performance in response to cross-modal stimuli as participants assessed the degree of congruence between their auditory and visual counterparts. Because response latencies in the congruency condition averaged at ⬃858 ms, increased premotor activity in this condition may reflect motor response preparation. On the other hand, recent research has shown that left PMCs respond preferentially to inanimate objects such as tools (Maravita et al. 2002; Weisberg et al. 2007). Although we were not interested in differences between animate and inanimate stimuli, we compared the two stimulus categories and found a 2907 2908 MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT GRANTS This research was supported by grants from the Canadian Institutes of Health Research and the J. S. McDonnell Foundation to A. R. McIntosh. A. O. Diaconescu is supported by the Natural Sciences and Engineering Research Council of Canada. DISCLOSURES No conflicts of interest, financial or otherwise, are declared by the author(s). AUTHOR CONTRIBUTIONS A.O.D., C.A., and A.R.M. conception and design of research; A.O.D. performed experiments; A.O.D. analyzed data; A.O.D. and A.R.M interpreted results of experiments; A.O.D. prepared figures; A.O.D. drafted manuscript; A.O.D., C.A., and A.R.M. edited and revised manuscript; A.O.D., C.A., and A.R.M. approved final version of manuscript. Baumann O, Greenlee MW. Neural correlates of coherent audiovisual motion perception. Cereb Cortex 17: 1433–1443, 2007. Beauchamp MS, Nath AR, Pasalar S. fMRI-Guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect. J Neurosci 30: 2414 –2417, 2010. Belardinelli MO, Sestieri C, Di Matteo R, Delogu F, Del Gratta C, Ferretti A, Caulo M, Tartaro A, Romani GL. Audio-visual crossmodal interactions in environmental perception: An fMRI investigation. Cogn Process 5: 167–174, 2004. Bezgin G, Wanke E, Krumnack A, Kotter R. Deducing logical relationships between spatially registered cortical parcellations under conditions of uncertainty. Neural Netw 21: 1132–1145, 2008. Bishop CW, Miller LM. A multisensory cortical network for understanding speech in noise. J Cogn Neurosci 21: 1790 –1805, 2009. Brookes MJ, Stevenson CM, Barnes GR, Hillebrand A, Simpson MI, Francis ST, Morris PG. Beamformer reconstruction of correlated sources using a modified source model. Neuroimage 34: 1454 –1465, 2007. Calvert GA, Hansen PC, Iversen SD, Brammer MJ. Detection of audiovisual integration sites in humans by application of electrophysiological criteria to the BOLD effect. Neuroimage 14: 427– 438, 2001. Chen YC, Spence C. When hearing the bark helps to identify the dog: semantically-congruent sounds modulate the identification of masked pictures. Cognition 114: 389 – 404, 2010. Cheyne D, Bakhtazad L, Gaetz W. Spatiotemporal mapping of cortical activity accompanying voluntary movements using an event-related beamforming approach. Hum Brain Mapp 27: 213–229, 2006. Cheyne D, Bostan AC, Gaetz W, Pang EW. Event-related beamforming: a robust method for presurgical functional mapping using MEG. Clin Neurophysiol 118: 1691–1704, 2007. Cox RW. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res 29: 162–173, 1996. Diederich A, Colonius H. Bimodal and trimodal multisensory enhancement: effects of stimulus onset and intensity on reaction time. Percept Psychophys 66: 1388 –1404, 2004. Doehrmann O, Naumer MJ. Semantics and the multisensory brain: how meaning modulates processes of audio-visual integration. Brain Res 1242: 136 –150, 2008. Efron B, Tibshirani R. Bootstrap methods for standard errors, confidence intervals and other measures of statistical accuracy. Statist Sci 1: 54 –77, 1986. Giard MH, Peronnet F. Auditory-visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J Cogn Neurosci 11: 473– 490, 1999. Grefkes C, Weiss PH, Zilles K, Fink GR. Crossmodal processing of object features in human anterior intraparietal cortex: an fMRI study implies equivalencies between humans and monkeys. Neuron 35: 173–184, 2002. Gross J, Kujala J, Hamalainen M, Timmermann L, Schnitzler A, Salmelin R. Dynamic imaging of coherent sources: Studying neural interactions in the human brain. Proc Natl Acad Sci USA 98: 694 – 699, 2001. Jaekl PM, Harris LR. Sounds can affect visual perception mediated primarily by the parvocellular pathway. Vis Neurosci 26: 477– 486, 2009. Hadjipapas A, Hillebrand A, Holliday IE, Singh KD, Barnes GR. Assessing interactions of linear and nonlinear neuronal sources using MEG beamformers: a proof of concept. Clin Neurophysiol 116: 1300 –1313, 2005. J Neurophysiol • VOL 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 REFERENCES Hartcher-O’Brien J, Gallace A, Krings B, Koppen C, Spence C. When vision ‘extinguishes’ touch in neurologically-normal people: extending the Colavita visual dominance effect. Exp Brain Res 186: 643– 658, 2008. Kaiser J, Hertrich I, Ackermann H, Mathiak K, Lutzenberger W. Hearing lips: gamma-band activity during audiovisual speech perception. Cereb Cortex 15: 646 – 653, 2005. Koppen C, Alsius A, Spence C. Semantic congruency and the Colavita visual dominance effect. Exp Brain Res 184: 533–546, 2008. Koppen C, Spence C. Assessing the role of stimulus probability on the Colavita visual dominance effect. Neurosci Lett 418: 266 –271, 2007a. Koppen C, Spence C. Seeing the light: exploring the Colavita visual dominance effect. Exp Brain Res 180: 737–754, 2007b. Kotter R, Wanke E. Mapping brains without coordinates. Philos Trans R Soc Lond B Biol Sci 360: 751–766, 2005. Kutas M, Federmeier KD. Electrophysiology reveals semantic memory use in language comprehension. Trends Cogn Sci 4: 463– 470, 2000. Laurienti PJ, Burdette JH, Maldjian JA, Wallace MT. Enhanced multisensory integration in older adults. Neurobiol Aging 27: 1155–1163, 2006. Laurienti PJ, Kraft RA, Maldjian JA, Burdette JH, Wallace MT. Semantic congruence is a critical factor in multisensory behavioral performance. Exp Brain Res 158: 405– 414, 2004. Laurienti PJ, Wallace MT, Maldjian JA, Susi CM, Stein BE, Burdette JH. Cross-modal sensory processing in the anterior cingulate and medial prefrontal cortices. Hum Brain Mapp 19: 213–223, 2003. Macaluso E, George N, Dolan R, Spence C, Driver J. Spatial and temporal factors during processing of audiovisual speech: a PET study. Neuroimage 21: 725–732, 2004. Malach R, Reppas JB, Benson RR, Kwong KK, Jiang H, Kennedy WA, Tootell RB. Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proc Natl Acad Sci USA 92: 8135– 8139, 1995. Maravita A, Spence C, Kennett S, Driver J. Tool-use changes multimodal spatial interactions between vision and touch in normal humans. Cognition 83: B25–B34, 2002. McDonald JJ, Teder-Salejarvi WA, Hillyard SA. Involuntary orienting to sound improves visual perception. Nature 407: 906 –908, 2000. McGurk H, MacDonald J. Hearing lips and seeing voices. Nature 264: 746 –748, 1976. McIntosh AR, Bookstein FL, Haxby JV, Grady CL. Spatial pattern analysis of functional brain images using partial least squares. Neuroimage 3: 143–157, 1996. McIntosh AR, Lobaugh NJ. Partial least squares analysis of neuroimaging data: applications and advances. Neuroimage 23, Suppl 1: S250 –S263, 2004. Miller J. Divided attention: evidence for coactivation with redundant signals. Cogn Psychol 14: 247–279, 1982. Molholm S, Ritter W, Javitt DC, Foxe JJ. Multisensory visual-auditory object recognition in humans: a high-density electrical mapping study. Cereb Cortex 14: 452– 465, 2004. Molholm S, Sehatpour P, Mehta AD, Shpaner M, Gomez-Ramirez M, Ortigue S, Foxe JJ. Audio-visual multisensory integration in superior parietal lobule revealed by human intracranial recordings. J Neurophysiol 96: 721–729, 2006. Moran RJ, Molholm S, Reilly RB, Foxe JJ. Changes in effective connectivity of human superior parietal lobule under multisensory and unisensory stimulation. Eur J Neurosci 27: 2303–2312, 2008. Murray MM, Camen C, Gonzalez Andino SL, Bovet P, Clarke S. Rapid brain discrimination of sounds of objects. J Neurosci 26: 1293–1302, 2006. Murray MM, Michel CM, Grave de Peralta R, Ortigue S, Brunet D, Gonzalez Andino S, Schnider A. Rapid discrimination of visual and multisensory memories revealed by electrical neuroimaging. Neuroimage 21: 125–135, 2004. Nichols TE, Holmes AP. Nonparametric permutation tests for functional neuroimaging: a primer with examples. Hum Brain Mapp 15: 1–25, 2002. Nishitani N, Hari R. Viewing lip forms: cortical dynamics. Neuron 36: 1211–1220, 2002. Noppeney U, Josephs O, Hocking J, Price CJ, Friston KJ. The effect of prior visual information on recognition of speech and sounds. Cereb Cortex 18: 598 – 609, 2008. Noppeney U, Ostwald D, Werner S. Perceptual decisions formed by accumulation of audiovisual evidence in prefrontal cortex. J Neurosci 30: 7434 –7446, 2010. MULTISENSORY FACILITATION AND CROSS-MODAL CONFLICT J Neurophysiol • VOL Semantic confusion regarding the development of multisensory integration: a practical solution. Eur J Neurosci 31: 1713–1720, 2010. Talairach J, Tournoux P. Co-Planar Stereotaxic Atlas of the Human Brain, translated by Rayport M. New York: Thieme Medical, 1988. Teder-Salejarvi WA, Di Russo F, McDonald JJ, Hillyard SA. Effects of spatial congruity on audio-visual multimodal integration. J Cogn Neurosci 17: 1396 –1409, 2005. Teder-Salejarvi WA, McDonald JJ, Di Russo F, Hillyard SA. An analysis of audio-visual crossmodal integration by means of event-related potential (ERP) recordings. Brain Res Cogn Brain Res 14: 106 –114, 2002. Thorpe S, Fize D, Marlot C. Speed of processing in the human visual system. Nature 381: 520 –522, 1996. van Atteveldt NM, Blau VC, Blomert L, Goebel R. fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex. BMC Neurosci 11: 11, 2010. Van Petten C, Luka BJ. Neural localization of semantic context effects in electromagnetic and hemodynamic studies. Brain Lang 97: 279 –293, 2006. Van Wassenhove V, Grant KW, Poeppel D. Visual speech speeds up the neural processing of auditory speech. Proc Natl Acad Sci USA 102: 1181– 1186, 2005. Weisberg J, van Turennout M, Martin A. A neural system for learning about object function. Cereb Cortex 17: 513–521, 2007. Yuval-Greenberg S, Deouell LY. What you see is not (always) what you hear: induced gamma band responses reflect cross-modal interactions in familiar object recognition. J Neurosci 27: 1090 –1096, 2007. Yuval-Greenberg S, Deouell LY. The dog’s meow: asymmetrical interaction in cross-modal object recognition. Exp Brain Res 193: 603– 614, 2009. 106 • DECEMBER 2011 • www.jn.org Downloaded from http://jn.physiology.org/ by 10.220.33.3 on June 18, 2017 Peiffer AM, Mozolic JL, Hugenschmidt CE, Laurienti PJ. Age-related multisensory enhancement in a simple audiovisual detection task. Neuroreport 18: 1077–1081, 2007. Pourtois G, de Gelder B. Semantic factors influence multisensory pairing: a transcranial magnetic stimulation study. Neuroreport 13: 1567–1573, 2002. Quraan MA, Cheyne D. Reconstruction of correlated brain activity with adaptive filters in MEG. Neuroimage 49: 2387–2400, 2010. Robinson S, Vrba J(editors) . Functional Neuroimaging by Synthetic Aperture Magnetometry. Sendai, Japan: Tohoku Univ. Press, 1998. Schneider TR, Debener S, Oostenveld R, Engel AK. Enhanced EEG gamma-band activity reflects multisensory semantic matching in visual-toauditory object priming. Neuroimage 42: 1244 –1254, 2008. Sekihara K, Nagarajan SS, Poeppel D, Marantz A, Miyashita Y. Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Trans Biomed Eng 48: 760 –771, 2001. Senkowski D, Saint-Amour D, Kelly SP, Foxe JJ. Multisensory processing of naturalistic objects in motion: a high-density electrical mapping and source estimation study. Neuroimage 36: 877– 888, 2007. Sinnett S, Soto-Faraco S, Spence C. The co-occurrence of multisensory competition and facilitation. Acta Psychol (Amst) 128: 153–161, 2008. Sinnett S, Spence C, Soto-Faraco S. Visual dominance and attention: the Colavita effect revisited. Percept Psychophys 69: 673– 686, 2007. Snodgrass JG, Vanderwart M. A standardized set of 260 pictures: norms for name agreement, image agreement, familiarity, and visual complexity. J Exp Psychol Hum Learn 6: 174 –215, 1980. Stein BE, Burr D, Constantinidis C, Laurienti P, Meredith MA, Perrault TJ, Ramachandran R, Röder B, Rowland BA, Sathian K, Schroeder CE, Shams L, Stanford TR, Wallace MT, Yu L, Lewkowicz DJ. 2909
© Copyright 2026 Paperzz