Exp Brain Res
DOI 10.1007/s00221-013-3674-2

Research Article

Effect of pitch–space correspondence on sound-induced visual motion perception

Souta Hidaka · Wataru Teramoto · Mirjam Keetels · Jean Vroomen

Received: 14 March 2013 / Accepted: 2 August 2013
© Springer-Verlag Berlin Heidelberg 2013

Abstract  The brain tends to associate specific features of stimuli across sensory modalities. The pitch of a sound, for example, is associated with spatial elevation, such that higher-pitched sounds are felt as being "up" in space and lower-pitched sounds as being "down." Here we investigated whether changes in the pitch of sounds could drive visual motion perception in the same way as changes in the location of sounds. We demonstrate that only sounds alternating in up/down location induced illusory vertical motion of a static visual stimulus; sounds alternating in higher/lower pitch did not induce this illusion. The pitch of a sound did not even modulate the visual motion perception induced by sounds alternating in up/down location. Interestingly, though, sounds alternating in higher/lower pitch could become a driver for visual motion if they had been paired in a previous exposure phase with vertical visual apparent motion. Thus, only after prolonged exposure did the pitch of a sound become an inducer of upper/lower visual motion. This occurred even if, during exposure, the pitch and location of the sounds were paired in an incongruent fashion. These findings indicate that pitch–space correspondence is not strong enough to drive or modulate visual motion perception. However, associative exposure can increase the saliency of pitch–space relationships, after which pitch can induce visual motion perception by itself.

Keywords  Crossmodal correspondence · Multisensory perception · Auditory space · Pitch · Visual motion perception

Electronic supplementary material  The online version of this article (doi:10.1007/s00221-013-3674-2) contains supplementary material, which is available to authorized users.

S. Hidaka (*)
Department of Psychology, Rikkyo University, 1-2-26, Kitano, Niiza-shi, Saitama 352-8558, Japan
e-mail: [email protected]

W. Teramoto
Department of Computer Science and Systems Engineering, Muroran Institute of Technology, 27-1 Mizumoto-cho, Muroran 050-8585, Japan

M. Keetels · J. Vroomen
Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, 5000 LE Tilburg, The Netherlands

Introduction

People receive a large amount of sensory input from multiple modalities in the surrounding environment. Our brain automatically and efficiently integrates these multisensory inputs to establish coherent and robust percepts and cognitions (Ernst and Bülthoff 2004). While an influential cue for this integration is spatiotemporal consistency (Calvert et al. 2004), more abstract "correspondences" between sensory inputs can also serve to associate or bind them. One example is the pitch of a sound, which provides not only a high–low sensation of pitch but also an up–down (or high/low) impression in space (Bernstein and Edelstein 1971; Evans and Treisman 2010; Mudd 1963; Pratt 1930; Roffler and Butler 1968; Rusconi et al. 2006) (see Marks 2004; Spence 2011 for reviews). At least three different "cues" may be responsible for establishing such crossmodal correspondences: correspondence in the "magnitude" of brain activity (bright lights and loud sounds both induce "more" brain activity), natural statistics including the simple co-occurrence of events (bigger objects produce lower-pitched sounds), and semantic consistency (the word "high" is common to high-pitched sounds and upper spatial elevation) (Spence 2011). In many circumstances, these cues co-occur, making it difficult to tease them apart and to establish their contributions in isolation.
To demonstrate these crossmodal associations, many studies have adopted tasks such as speeded classification. For example, the reaction time to an upper (or lower) visual target is faster when a higher-pitched (or lower-pitched) sound is concurrently presented than when a lower-pitched (or higher-pitched) sound is presented (Bernstein and Edelstein 1971; Spence 2011). Some studies have also demonstrated correspondence effects using more "indirect" response or attentional tasks (Evans and Treisman 2010; Chiou and Rich 2012; Klapetek et al. 2012; Mossbridge et al. 2011; Parise and Spence 2008, 2012). Evans and Treisman (2010) showed that higher- or lower-pitched sounds speed up reaction times not only for judgments of a visual target's position but also for judgments of a visual target's feature (orientation) when the target is presented at a congruent spatial position (i.e., upper or lower) relative to an incongruent one (i.e., lower or upper). Chiou and Rich (2012) also reported that higher- or lower-pitched sounds worked as a "spatial" cue, such that these sounds shifted participants' attention to an upper or lower visual location. These findings suggest that attentional/response levels of processing could be involved in crossmodal correspondences. Recently, several studies have also demonstrated crossmodal correspondence effects using unspeeded perceptual tasks (e.g., temporal order judgment (TOJ), methods of adjustment) for pitch–size, pitch–shape (Parise and Spence 2009), voice–size (Sweeny et al. 2012), and auditory amplitude–visual spatial frequency (Guzman-Martinez et al. 2012) pairs. These studies thus indicate the involvement of perceptual processing in crossmodal correspondences. With regard to pitch–space correspondence, one study reported that continuous changes (glides) in pitch affected visual motion perception (Maeda et al. 2004).
However, since continuous pitch changes can induce a motion impression by themselves (Walker 1987), crossmodal interaction in motion processing, rather than pitch–space correspondence, is assumed to be the main contributor to this finding. Therefore, although pitch–space correspondence is considered one of the most typical examples of crossmodal correspondence, its effects in the perceptual domain had not been investigated in a manner that is directly comparable between the pitch and the spatial information of sounds. The aim of the present study was thus to investigate whether changes in the pitch of sounds could drive visual motion perception in the same way as changes in the location of sounds. In order to introduce discrete rather than continuous pitch changes, we adopted sound-induced illusory visual motion as a tool. Sounds alternating in space can induce a strong illusory motion perception of static visual stimuli (Hidaka et al. 2009) [see Supplementary Information S1A (movie)]. Relying on signal-detection theory (Macmillan and Creelman 2004), this phenomenon has been found to affect sensitivity (d-prime) (Hidaka et al. 2011b). In line with previous studies, we here report that sounds that alternate in vertical space (up/down) also induce vertical illusory motion (Teramoto et al. 2010b). The new finding in the current study is that sounds that alternate in pitch (high/low) do not induce this illusory visual motion, so that they are not comparable to sounds that alternate in spatial location (up/down) [see Supplementary Information S1B (movie)] (Experiment 1). Moreover, the driving effect of the auditory up–down spatial information is not affected by whether the pitch of the sounds changes in a congruent (higher-pitched sound in the upper location, lower-pitched sound in the lower location) or incongruent (higher-pitched sound in the lower location, lower-pitched sound in the upper location) fashion (Experiment 2).
These findings indicate that pitch–space correspondence is not so strong as to drive or modulate visual motion perception. Importantly, though, we also report that high- and low-pitched sounds can acquire a driving effect on visual motion if they have been associated with vertical visual apparent motion for a few minutes (Teramoto et al. 2010a). After prolonged exposure, higher/lower-pitched sounds thus induced up–down visual motion, even if these sounds had been paired in an incongruent fashion with visual motion during exposure (Experiment 3). This may indicate that associative exposure can increase the saliency of pitch–space corresponding relationships, after which pitch can induce visual motion perception by itself.

Experiments 1 and 2

Experiments 1 and 2 tested whether the alternation of higher- and lower-pitched sounds induced visual motion similar to sounds that alternate between upper and lower locations (Hidaka et al. 2009; Teramoto et al. 2010b; Hidaka et al. 2011b). In Experiment 1, we investigated the driving effect of pitch alone on visual motion perception; sounds were presented either with or without alternation in pitch (alternating pitch and constant pitch conditions, respectively), while the spatial elevation of the sounds was kept constant (thus perceived as coming from the center) (Fig. 1a) [see also Supplementary Information S1B (movie)]. In Experiment 2, we further investigated modulatory effects of pitch on auditory spatial information; sounds always alternated between upper and lower locations, while the pitch alternated either in a congruent fashion (i.e., higher pitch in the upper location, lower pitch in the lower location) or in an incongruent fashion (i.e., higher pitch in the lower location, lower pitch in the upper location) (Fig. 1b).

Fig. 1 Absence of driving and modulatory effects of pitch information in sound-induced visual motion perception (Experiments 1 and 2).
a, b Schematic illustrations of the stimuli and auditory conditions of Experiments 1 and 2, respectively. In Experiment 1, sounds were presented either with or without pitch alternation while the auditory spatial information was fixed. In Experiment 2, sounds were presented alternating between upper and lower locations, while the pitch of the sounds alternated either in a congruent or an incongruent fashion. c, d Results of Experiments 1 and 2, respectively. Error bars denote the standard error of the mean (N = 8). Asterisks indicate statistically significant differences (p < .05)

Methods

Participants and apparatus

Written consent was obtained from each participant prior to the experiments. The experiments were approved by the local ethics committee of Rikkyo University. Each of the 16 participants (eight in each experiment) had normal or corrected-to-normal vision and normal hearing. The participants were naïve to the purpose of the experiment. A customized PC and MATLAB (The MathWorks Inc.) with the Psychophysics Toolbox (Brainard 1997; Pelli 1997) were used to control the experiment. Visual stimuli were presented on a CRT display with a resolution of 800 × 600 pixels and a refresh rate of 60 Hz. The viewing distance was 45 cm. Auditory stimuli were generated digitally (sampling frequency 44.1 kHz) and delivered through loudspeakers. The upper speaker was set 50 cm above, and the lower speaker 50 cm below, the center of the display. The horizontal position of the speakers was aligned with that of the visual stimuli. A numeric keypad was used for recording responses. We confirmed with a digital oscilloscope that the onsets of the visual and auditory stimuli were synchronized. The observers were instructed to place their heads on a chin rest. All the experiments were conducted in a dark room.

Stimuli

We presented a red circle (0.4° in diameter; 17.43 cd/m2) as a fixation point on a black background.
A sequence of white bars (3° × 0.2°; 5.08 cd/m2) was presented as the visual stimulus in the right visual field at an eccentricity of either 10° or 20°. Each bar was presented for 400 ms with a 100 ms inter-stimulus interval (ISI). For the auditory stimuli, two white noise bursts were created and filtered with a 1-octave frequency band; the center frequency was either 3 kHz (higher) or 1.2 kHz (lower). These sounds were presented for 50 ms with a cosine ramp of 5 ms at onset and offset. The amplitude was adjusted such that the two sounds were equally loud in our experimental situation. The sound pressure levels of the lower and higher tones were 69 and 73 dB SPL, respectively, with monaural presentation, and 70 and 73 dB SPL, respectively, with binaural presentation.1 We confirmed that these stimuli could induce a pitch–space correspondence effect in the response domain by adopting a speeded classification task (Rusconi et al. 2006) (see Fig. 2). The onset timing of each noise burst was synchronized with that of the visual stimulus.

1 We confirmed that the stimuli could be discriminated only by pitch. We sequentially presented higher- and lower-pitched tones, or vice versa, with a 1,000 ms ISI and asked 10 participants to judge which tone was perceived as higher in pitch or larger in amplitude (these response domains were randomly assigned in each trial). Pitch discrimination performance was nearly perfect (the percentage of correct responses (standard error of the mean) was 94.5 % (1.9 %) and 93 % (2.1 %) in the monaural and binaural presentations, respectively). In contrast, amplitude discrimination performance was not significantly different from chance (54.5 % (9.0 %) and 55.0 % (8.9 %), t(9) = 0.50 and 0.58, in the monaural and binaural presentations, respectively).

Fig. 2 We confirmed that our auditory stimuli had pitch–space correspondence effects in a speeded classification task. After a 500 ms presentation of the fixation point, participants (N = 8) were presented with a single high- or low-pitched band-pass noise (center frequency of 3 or 1.2 kHz, respectively) and were asked to make judgments as quickly and as accurately as possible about either the location (upper or lower; location discrimination task) or the pitch (higher or lower; pitch discrimination task) of the sounds, ignoring the irrelevant dimension. The sound was presented from either the upper or the lower loudspeaker in the location judgment task and from both loudspeakers (i.e., without spatial elevation) in the pitch judgment task. a The stimulus–response mapping was either congruent (i.e., the upper response key for upper-location or higher-pitched sounds) or incongruent (i.e., the lower response key for upper-location or higher-pitched sounds). In the congruent response key assignment, the "8" and "2" keys on the numeric keypad were assigned to the upper/higher and lower responses, respectively. In the incongruent assignment, the relationship was reversed. Reaction time (RT) and accuracy were recorded. The experiment consisted of 160 trials: Judgment (2) × Key assignment (2) × Pitch (2) × Repetitions (20). Each judgment type and key assignment was introduced in a blocked design, and the order of these conditions was counterbalanced among the participants. While the pitch was fixed in each block and counterbalanced among the blocks for the location judgment, the pitch was randomly varied among the trials for the pitch judgment. RTs shorter than 200 ms or longer than 1,200 ms were excluded from the analysis (location-congruent task: 2.19 %, location-incongruent task: 6.56 %, pitch-congruent task: 0.31 %, pitch-incongruent task: 5.00 %). Mean error rates of the location and pitch judgments were as follows: location-congruent task: 10.94 %, location-incongruent task: 17.50 %, pitch-congruent task: 0.94 %, pitch-incongruent task: 3.75 %. b For the RT data, a two-way repeated measures ANOVA with Congruency × Judgment type found a main effect of Congruency (F(1, 7) = 9.49, p < .05): RTs in the congruent condition were significantly shorter than those in the incongruent condition. A main effect of Judgment type (F(1, 7) = 8.16, p < .05) revealed that RTs for location judgments were longer than those for pitch judgments. The interaction between the factors was not significant (F(1, 7) = 1.38, p = .29). These results indicate that our higher/lower-pitched sounds were associated with an upper/lower response space, just like sounds from upper and lower locations. Error bars denote the standard error of the mean (N = 8). Asterisks indicate statistically significant differences (p < .05)

Procedure

Each experiment consisted of training and main sessions. In each session, the participants were asked to judge whether the visual stimulus was perceived as static or moving. During the main session, the participants were asked to make the judgments while trying to ignore the sounds. The training session consisted of 40 trials: Visual stimulus (2; static/moving) × Eccentricity (2) × Repetition (10). The white bar was presented six times without the sounds. The bar was vertically displaced back and forth by 0.2° in the moving condition and remained at a fixed location in the static condition. The training session was repeated until discrimination performance exceeded 75 % for each eccentricity. This session was introduced because visual stimuli presented at relatively large eccentricities were sometimes perceived as moving even without sounds (e.g., Hidaka et al. 2009), and this effect should be dissociated from the auditory driving effect.
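The RT screening and congruency contrast reported in the Fig. 2 caption can be illustrated with a minimal sketch (written in Python rather than the authors' MATLAB code; the trial data layout here is hypothetical):

```python
def congruency_effect(trials):
    """Drop RTs outside the 200-1200 ms window, then contrast mean
    RTs for congruent vs. incongruent stimulus-response mappings.
    `trials` is a list of (rt_ms, is_congruent) tuples -- a
    hypothetical layout, not the authors' actual data format."""
    kept = [(rt, c) for rt, c in trials if 200 <= rt <= 1200]
    congruent = [rt for rt, c in kept if c]
    incongruent = [rt for rt, c in kept if not c]
    # a positive value means the congruent mapping yielded faster responses
    return sum(incongruent) / len(incongruent) - sum(congruent) / len(congruent)
```

In the reported data, this contrast was positive (congruent RTs shorter than incongruent RTs), corresponding to the main effect of Congruency.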
In the main session of Experiment 1, the sounds were presented from both the upper and lower loudspeakers (i.e., giving the impression that the sound came from the center). This session consisted of 240 trials: Visual stimulus (2) × Eccentricity (2) × Sound (3) × Repetition (20). The sounds were presented either with pitch changes (alternating pitch condition) or without pitch changes (constant pitch condition). A silent condition was also included as a baseline. In Experiment 2, the main session consisted of 320 trials: Visual stimulus (2) × Eccentricity (2) × Sound (4) × Repetition (20). The sounds were presented from the upper and lower loudspeakers with either a congruent (higher-pitched sound in the upper location) or an incongruent (lower-pitched sound in the upper location) pitch-to-location assignment. Conditions with constant pitch sounds (constant pitch condition) and without sounds (silent condition) were also included. In each experiment, the order of the conditions was randomly assigned in each trial and counterbalanced among the participants.

Results and discussion

We calculated d-prime and β values (Macmillan and Creelman 2004) as indices of perceptual sensitivity and response/decisional biases, respectively (Hidaka et al. 2011b). We regarded responses of perceived static stimuli as a "hit" for the static trials and as a "false alarm" for the moving trials (Supplementary Information S2). Thus, lower d-prime values in conditions where sounds were present indicate sound-induced illusory visual motion. For Experiment 1, a two-way repeated measures analysis of variance (ANOVA) with Sound × Eccentricity on the d-prime values revealed a significant main effect of Eccentricity (F(1, 7) = 18.28, p < .005). However, the crucial main effect of Sound (F(2, 14) = 0.89, p = .43) and the interaction (F(2, 14) = 0.44, p = .65) were not significant (Fig. 1c). The ANOVA on the β values revealed a significant main effect of Sound (F(2, 14) = 6.13, p < .05).
The post hoc test (p < .05) revealed that the β value of the alternating pitch condition was smaller than those of the other conditions. This indicates that, consistent with the results of the speeded classification task (Fig. 2), changes in pitch were effective in the response/decisional domain. These results thus demonstrate that sounds alternating in high/low pitch do not induce illusory visual motion perception. In Experiment 2, the two-way repeated measures ANOVA on the d-prime values revealed main effects of Sound (F(3, 21) = 6.64, p < .005) and Eccentricity (F(1, 7) = 15.47, p < .01), as well as a significant interaction between these factors (F(3, 21) = 3.32, p < .05) (Fig. 1d). For the simple main effect of Sound at 20° of eccentricity (F(3, 42) = 6.68, p < .001), post hoc tests (Tukey's HSD, p < .05) revealed that the d-prime for the silent condition was always higher than for the other conditions and that there was no difference among the sound-present conditions. Sounds alternating between upper and lower locations thus always induced illusory visual motion of static stimuli at 20°, irrespective of the presence or absence of pitch–space congruency. The corresponding simple main effect at 10° of eccentricity was not significant (F(3, 42) = 2.81, p = .06). The ANOVA on the β values revealed a significant main effect of Sound (F(3, 21) = 3.87, p < .05). The post hoc test showed that the β value of the congruent condition was smaller than that of the constant pitch condition. Again, this suggests that the correspondence between pitch and spatial information is effective particularly in the response/decisional domain (see also Fig. 2). In line with previous studies (Hidaka et al. 2009, 2011b; Teramoto et al. 2010b), these results show that auditory spatial shifts induced illusory motion perception of static visual stimuli, especially at a far peripheral eccentricity.
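As an illustration of the indices used in this analysis, d-prime and β can be computed from hit and false-alarm counts with the standard signal-detection formulas. This is a minimal sketch with made-up counts, not the authors' analysis code; the 0.5-count correction is one common convention for avoiding infinite z-scores:

```python
import math
from statistics import NormalDist

def dprime_beta(hits, misses, false_alarms, correct_rejections):
    """d' = z(H) - z(FA); beta = likelihood ratio at the criterion.
    Here a 'hit' is a "static" response on a static trial and a
    'false alarm' is a "static" response on a moving trial, following
    the mapping used in the paper. Rates are nudged away from 0 and 1
    so the z-transform stays finite."""
    h = (hits + 0.5) / (hits + misses + 1)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(h) - z(fa)
    beta = math.exp((z(fa) ** 2 - z(h) ** 2) / 2)
    return d_prime, beta
```

Under this mapping, a sound that makes a static stimulus appear to move lowers the "static" hit rate on static trials and hence lowers d-prime, which is why sound-induced illusory motion shows up as reduced sensitivity.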
Most importantly, congruency between sound location and pitch did not modulate this perceptual effect. Taken together, the results of Experiments 1 and 2 show that the alternation of pitch neither induces illusory visual motion perception nor modulates the driving effect of auditory spatial information.

Experiment 3

In Experiment 3, we examined whether the saliency of pitch as spatial information could be changed by associating pitch with visual apparent motion. Previous studies have shown that arbitrary pitch information can become associated with, and induce, horizontal visual motion within a few minutes of exposure to a paired presentation of these stimuli (Teramoto et al. 2010a). This sound-contingent visual motion perception has been confirmed to occur at a perceptual level (Hidaka et al. 2011a; Kobayashi et al. 2012a, b). Here, we used a 9-minute exposure phase in which sounds alternating in pitch (higher/lower) without spatial elevation (thus perceived as appearing from the center) were paired with visual stimuli (white bars) alternately displaced by 5° in the vertical direction. Pitch and visual stimulus locations were either congruent (e.g., a higher-pitched sound paired with the upper visual stimulus) or incongruent (e.g., a higher-pitched sound paired with the lower visual stimulus) (Fig. 3a) [see also Supplementary Information S1C (movie)]. Test sessions were held before and after the exposure sessions to quantify the effects of exposure on visual motion perception.

Fig. 3 Effects of associative exposure between vertical visual motion and sounds alternating in pitch (Experiment 3). a Schematic illustrations of the exposure and test sessions and an example of the session flow. In a 9-minute exposure session, sounds alternating in pitch and appearing from a central location were paired with visual stimuli alternating between vertical locations. The pitch of the sound was either congruent or incongruent with the visual stimulus locations. Test sessions were held before (pre-test) and after (post-test) the exposure sessions. In each test session, the visual stimuli shifted in the upward or downward direction while the pitch changed from higher to lower or vice versa. b Psychometric functions. On the horizontal axis, negative values indicate downward visual motion, and positive values indicate upward motion. The point of 50 % responses was estimated as the point of subjective stationarity (PSS). c Amount of PSS shift. Error bars denote one standard error of the mean (N = 8). Asterisks indicate statistically significant differences (p < .05)

Methods

Eight participants were recruited for this experiment. They had normal or corrected-to-normal vision and normal hearing and were naïve to the purpose of the experiment. On a display with a resolution of 1,600 × 1,200 pixels, the white bars were presented in the left and right visual fields at 10° of eccentricity. The experiment consisted of three sessions: a pre-test, an exposure, and a post-test session. In the exposure session, the white bar constantly moved up and down over a distance of 5° for 9 min (the duration of each bar was 400 ms and the ISI between successive bars was 100 ms, amounting to 540 exposures in total). The onset of the visual stimuli was synchronized with that of the sounds. The pitch and the location of the visual stimulus were either congruent (i.e., higher-pitched sound with the upper visual stimulus) or incongruent (i.e., higher-pitched sound with the lower visual stimulus). Participants were asked to keep looking at the fixation point. The visual stimuli were presented in either the right or the left visual field for each exposure type because the contingency effect has sharp spatial selectivity (~5°) around the exposed visual field (Teramoto et al. 2010a; Hidaka et al. 2011a).
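The count of 540 exposures follows directly from the timing: one up-down alternation pairs two 400 ms bars, each followed by a 100 ms ISI, so a cycle takes 1,000 ms and a 9-min session contains 540 of them. A one-line check (an illustrative sketch, not experiment code):

```python
def exposure_cycles(duration_min=9, bar_ms=400, isi_ms=100):
    """One audiovisual up-down cycle lasts 2 * (bar + ISI) ms."""
    return duration_min * 60 * 1000 // (2 * (bar_ms + isi_ms))
```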
In the pre- and post-test sessions, the points of subjective stationarity for motion direction were measured using a motion-nulling procedure with the method of constant stimuli (Teramoto et al. 2010a; Hidaka et al. 2011a; Kobayashi et al. 2012a, b). Estimating the PSS with a motion-nulling procedure is a traditional, reliable, and direct measurement of motion perception (e.g., Arman et al. 2006; Cavanagh and Favreau 1985; Mateeff et al. 1985). Typically, in this procedure, the magnitude of illusory motion perception is measured by presenting physical motion in the same or the opposite direction as the illusory motion, with various displacement sizes. If illusory motion is not perceived, observers' responses and the resulting PSS are fully consistent with the perception of the physical motion. However, if illusory motion occurs, the PSS will shift, because illusory motion perception boosts the percept of a consistent physical motion signal and even cancels out an inconsistent physical motion signal. In each trial, two visual stimuli were sequentially presented, producing vertical apparent motion. The moving distances (0.06, 0.12, 0.24, and 0.48°) and directions (upward/downward) were randomly assigned. The onsets of the two visual stimuli were synchronized with the higher and lower sounds (H–L) or vice versa (L–H). A silent condition was also included as a baseline. The participants were asked to judge the perceived motion direction of the visual stimuli while ignoring the sounds, if presented. Each pre- and post-test session consisted of 240 trials: Distance (4) × Direction (2) × Sound (3) × Repetition (10). In each session, the conditions were randomly assigned in each trial and counterbalanced among the participants. While the pre-tests were completed successively in one of the visual fields (left or right), the post-tests were introduced after each exposure type in the exposed visual field.
The exposure types and exposed visual fields (Congruent–Left and Incongruent–Right, or Incongruent–Left and Congruent–Right) were randomly assigned to each participant and counterbalanced among the participants. The experiment lasted about an hour. In all other respects, the apparatus, stimuli, and procedures were identical to those of Experiment 1.

Results and discussion

We plotted the proportion of upward motion perception as a function of moving distance (Fig. 3b). To estimate the points of subjective stationarity (PSSs), we obtained the 50 % point by fitting a cumulative normal-distribution function to each individual's psychometric function. In order to compare the PSSs (Supplementary Information S3) between the sound conditions and between the exposure types, we calculated the amount of PSS shift by subtracting the PSS of the silent condition from those of the sound conditions in each test session and exposure type (Kobayashi et al. 2012b) (Fig. 3c). A three-way repeated measures ANOVA on the PSS with Test session × Exposure × Sound found a significant interaction between Test session and Sound (F(1, 7) = 5.87, p < .05). While the simple main effect of Sound was not significant in the pre-test session (F(1, 14) = 1.79, p = .20), it was significant in the post-test session (F(1, 14) = 13.18, p < .005). The PSSs shifted in the negative (downward motion) direction in the H–L condition and in the positive (upward motion) direction in the L–H condition, irrespective of exposure type. The congruent condition thus induced illusory visual motion in the same direction that the participants had been exposed to (e.g., exposure in which H–L sounds were paired with downward motion led H–L sounds to induce frequent downward motion perception), whereas the incongruent condition induced illusory visual motion in the direction opposite to the exposed pairing (e.g., even after exposure in which H–L sounds were paired with upward motion, H–L sounds induced frequent downward motion perception).
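The PSS estimation described above (the 50 % point of a cumulative normal function fitted to the proportion of "upward" responses) can be sketched as follows. This is a coarse least-squares grid search meant to illustrate the idea, not the authors' fitting routine; the displacement values mirror those used in the test sessions:

```python
from statistics import NormalDist

def fit_pss(displacements, p_upward):
    """Fit p(upward) = Phi((x - mu) / sigma) by grid search over
    mu (the PSS, in deg) and sigma (inverse slope), returning mu.
    A negative PSS means a downward physical displacement is needed
    to null a perceived upward drift, and vice versa."""
    phi = NormalDist().cdf
    best_err, best_mu = float("inf"), 0.0
    for m in range(-600, 601):            # candidate mu: -0.6 to 0.6 deg
        mu = m / 1000
        for s in range(5, 101, 5):        # candidate sigma: 0.05 to 1.0 deg
            sigma = s / 100
            err = sum((phi((x - mu) / sigma) - p) ** 2
                      for x, p in zip(displacements, p_upward))
            if err < best_err:
                best_err, best_mu = err, mu
    return best_mu
```

Subtracting the silent-condition PSS from each sound condition's PSS then gives the PSS shift of the kind plotted in Fig. 3c.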
Consistent with Experiment 1, we found that pitch did not affect visual motion perception before exposure. After exposure, however, a change in pitch did alter visual motion perception. In line with previous studies (Teramoto et al. 2010a; Hidaka et al. 2011a; Kobayashi et al. 2012a, b), these results indicate that a new pitch–space association can be formed and that this association induces sound-driven visual motion. What is new is that the effect of exposure occurred in a congruent manner irrespective of the exposure type: H–L/L–H sounds consistently induced downward/upward visual motion perception. A previous study suggested that a top–down association between pitch and brightness could serve to make their correspondence relationship salient to the participants and thereby elicit the correspondence effect in a visual search task (Klapetek et al. 2012). In line with this idea, one explanation for the current findings would be that the pitch–space correspondence existing at higher processing levels modulates perceptual association in a congruent manner through top–down control, such as attentional guidance to a specific relationship (Ahissar and Hochstein 1993; Chiou and Rich 2012). An alternative explanation would be that the pitch–space correspondence is originally, but only weakly, represented at the perceptual processing level. In fact, there was a slight (albeit non-significant) trend for a pitch–space correspondence in the pre-test. If a representation of the congruent pitch–space association is weakly established, the congruent exposure might activate such representations and directly induce the congruency effect. A representation of the incongruent pitch–space association would be newly shaped by the prolonged exposure in the current study and would activate the counterpart congruent representation through enhanced crossmodal connectivity (Zangenehpour and Zatorre 2010), so that the congruency effect might appear.
Under either explanation, the findings suggest that prolonged exposure can increase the saliency of pitch–space corresponding relationships, after which pitch can induce visual motion perception.

General discussion

The present study investigated whether discrete changes in the pitch of sounds, which is typically associated with spatial location information (pitch–space correspondence), could be effective for visual motion perception. We adopted sound-induced illusory visual motion as a tool for investigating the perceptual effects of pitch–space correspondence on visual motion perception in a manner that is directly comparable between the pitch and the spatial information of sounds. We found that, contrary to changes in the location of sounds, pitch changes had neither a driving nor a modulating effect on visual motion perception (Experiments 1 and 2). We further found that, after pitch changes had been associated with visual motion, they did become inducers of visual motion in a congruent manner, even after an incongruent paired association (Experiment 3). These findings suggest that pitch–space correspondence is originally not strong enough to drive or modulate visual motion perception. However, associative exposure can increase the saliency of pitch–space corresponding relationships, after which pitch can induce visual motion perception. In Experiments 1 and 2, we focused on differences in d-prime as the index of perceptual sensitivity to visual motion. We confirmed that the alternation of pitch information did not change the d-prime values, while it did have some effect on the β values, which index response/decisional biases. In Experiment 3, in addition to the changes in PSSs demonstrating the driving effect of pitch on visual motion, we also confirmed that the slopes of the psychometric functions (just noticeable differences; JNDs) did not differ among the auditory conditions, even in the post-test session [see Supplementary Information S4].
If response/decisional biases had existed, JNDs should have differed, because the upward/downward sound presentations would have induced frequent upward/downward responses, especially when the motion direction was uncertain. These results suggest that the findings of the current experiments cannot simply be explained by response/decisional biases. Pitch–space correspondence effects have often been demonstrated with speeded classification tasks (Bernstein and Edelstein 1971) (see also Fig. 2). Recently, such correspondence has also been demonstrated using indirect response tasks (Evans and Treisman 2010; Klapetek et al. 2012; Parise and Spence 2008, 2012) and attentional tasks (Chiou and Rich 2012; Mossbridge et al. 2011). These findings indicate that attentional/response levels of processing could be involved in crossmodal correspondences. Some studies have also successfully demonstrated crossmodal correspondence effects using unspeeded perceptual tasks for pitch–size and pitch–shape (Parise and Spence 2009), voice–size (Sweeny et al. 2012), and auditory amplitude–visual spatial frequency (Guzman-Martinez et al. 2012) pairs. However, no such demonstration has been reported for pitch–space correspondence. Maeda et al. (2004) also reported that changes in pitch induced visual motion perception. However, considering that they presented continuous pitch changes (glides), which are more likely than the two discrete pitches used in the current research to elicit an impression of vertical motion (Walker 1987), a key element in Maeda et al. (2004) may be an audiovisual interaction in motion processing rather than pitch–space correspondence. Thus, based on an unspeeded perceptual task (sound-induced visual motion perception), our results provide direct evidence that pitch–space correspondence is limited at the perceptual level. It has been debated how crossmodal correspondences are acquired in the brain.
Recently, it has been reported that pitch–space and pitch–shape correspondence effects are observed even in pre-linguistic 4-month-old infants (Walker et al. 2010) and that not only humans but also chimpanzees exhibit a pitch–luminance correspondence (Ludwig et al. 2011). These findings suggest that crossmodal correspondences may be innate, because linguistic- or conceptually based acquisition processes are inapplicable in these cases. However, it has also been argued that there is linguistic diversity: not all languages use the same spatial metaphor for pitch (Dolscheid et al. 2011). Moreover, both humans and chimpanzees could learn the relationships of multimodal inputs/events from natural statistics, including the simple co-occurrence of events (Adams et al. 2004; Ernst 2005, 2007). Thus, we could also consider that crossmodal correspondences are empirically acquired after birth (Mossbridge et al. 2011; Spence and Deroy 2012). The current findings echo both of these ideas: Experiment 3 demonstrated that the pitched sounds did come to induce visual motion after paired exposure to pitch and visual spatial information. Moreover, the correspondence relationships were observed irrespective of exposure type. These findings indicate that representations of pitch–space correspondence originally exist and that they can be activated via associative exposure so as to become effective at a perceptual level. In the current study, we investigated the perceptual reality of pitch–space correspondence by adopting a behavioral task that is highly compatible with pitch–space associations in sounds (sound-induced visual motion).
Recently, it was reported that, while continuous pitch changes had biasing effects on visual motion perception equivalent to those of a natural auditory motion signal in a behavioral task, the underlying neural responses differed: whereas the natural auditory motion signal modulated responses in the hMT area, continuous pitch changes did not have such an effect but instead modulated responses in relatively higher brain regions (superior/intraparietal sulcus) (Sadaghiani et al. 2009). Based on these findings, brain imaging techniques will be needed in the near future to investigate where the representations of pitch–space correspondence originally reside and how they are activated and associated with perceptual processing during associative exposure in the brain.

Acknowledgments We thank Wouter D.H. Stumpel for his technical support. We are grateful to the anonymous reviewers for their valuable and insightful comments and suggestions on early versions of the manuscript. This research was supported by the Ministry of Education, Culture, Sports, Science and Technology, Grant-in-Aid for Specially Promoted Research (No. 19001004), and the Rikkyo University Special Fund for Research.

References

Adams WJ, Graf EW, Ernst MO (2004) Experience can change the “light-from-above” prior. Nat Neurosci 7:1057–1058
Ahissar M, Hochstein S (1993) Attentional control of early perceptual learning. Proc Natl Acad Sci USA 90:5718–5722
Arman AC, Ciaramitaro VM, Boynton GM (2006) Effects of feature-based attention on the motion aftereffect at remote locations. Vision Res 46:2968–2976
Bernstein IH, Edelstein BA (1971) Effects of some variations in auditory input upon visual choice reaction time. J Exp Psychol 87:241–247
Brainard DH (1997) The psychophysics toolbox. Spat Vis 10:433–436
Calvert GA, Spence C, Stein BE (eds) (2004) The handbook of multisensory processes. MIT Press, Cambridge
Cavanagh P, Favreau OE (1985) Color and luminance share a common motion pathway. Vision Res 25:1595–1601
Chiou R, Rich AN (2012) Cross-modality correspondence between pitch and spatial location modulates attentional orienting. Perception 41:339–353
Dolscheid S, Shayan S, Majid A, Casasanto D (2011) The thickness of musical pitch: psychophysical evidence for the Whorfian hypothesis. In: Proceedings of the 33rd Annual Conference of the Cognitive Science Society, pp 537–542
Ernst MO (2005) A Bayesian view on multimodal cue integration. In: Knoblich G, Thornton I, Grosjean M, Shiffrar M (eds) Human body perception from the inside out. Oxford University Press, New York, pp 105–131
Ernst MO (2007) Learning to integrate arbitrary signals from vision and touch. J Vis 7(7):1–14
Ernst MO, Bülthoff HH (2004) Merging the senses into a robust percept. Trends Cogn Sci 8:162–169
Evans KK, Treisman A (2010) Natural cross-modal mappings between visual and auditory features. J Vis 10(1):6, 1–12
Guzman-Martinez E, Ortega L, Grabowecky M, Mossbridge J, Suzuki S (2012) Interactive coding of visual spatial frequency and auditory amplitude-modulation rate. Curr Biol 22:383–388
Hidaka S, Manaka Y, Teramoto W, Sugita Y, Miyauchi R, Gyoba J, Suzuki Y, Iwaya Y (2009) Alternation of sound location induces visual motion perception of a static object. PLoS ONE 4:e8188
Hidaka S, Teramoto W, Kobayashi M, Sugita Y (2011a) Sound-contingent visual motion aftereffect. BMC Neurosci 12:44
Hidaka S, Teramoto W, Sugita Y, Manaka Y, Sakamoto S, Suzuki Y (2011b) Auditory motion information drives visual motion perception. PLoS ONE 6:e17499
Klapetek A, Ngo MK, Spence C (2012) Does crossmodal correspondence modulate the facilitatory effect of auditory cues on visual search? Atten Percept Psychophys 74:1154–1167
Kobayashi M, Teramoto W, Hidaka S, Sugita Y (2012a) Indiscriminable sounds determine the direction of visual motion. Sci Rep 2:365
Kobayashi M, Teramoto W, Hidaka S, Sugita Y (2012b) Sound frequency and aural selectivity in sound-contingent visual motion aftereffect. PLoS ONE 7:e36803
Ludwig VU, Adachi I, Matsuzawa T (2011) Visuoauditory mappings between high luminance and high pitch are shared by chimpanzees (Pan troglodytes) and humans. Proc Natl Acad Sci USA 108:20661–20665
Macmillan NA, Creelman CD (2004) Detection theory: a user’s guide, 2nd edn. Lawrence Erlbaum Associates Inc, New Jersey
Maeda F, Kanai R, Shimojo S (2004) Changing pitch induced visual motion illusion. Curr Biol 14:R990–R991
Marks LE (2004) Cross-modal interactions in speeded classification. In: Calvert GA, Spence C, Stein BE (eds) The handbook of multisensory processes. MIT Press, Cambridge, pp 85–105
Mateeff S, Hohnsbein J, Noack T (1985) Dynamic visual capture: apparent auditory motion induced by a moving visual target. Perception 14:721–727
Mossbridge JA, Grabowecky M, Suzuki S (2011) Changes in auditory frequency guide visual-spatial attention. Cognition 121:133–139
Mudd SA (1963) Spatial stereotypes of four dimensions of pure tone. J Exp Psychol 66:347–352
Parise C, Spence C (2008) Synesthetic congruency modulates the temporal ventriloquism effect. Neurosci Lett 442:257–261
Parise CV, Spence C (2009) “When birds of a feather flock together”: synesthetic correspondences modulate audiovisual integration in non-synesthetes. PLoS ONE 4:e5664
Parise CV, Spence C (2012) Audiovisual crossmodal correspondences and sound symbolism: a study using the implicit association test. Exp Brain Res 220:319–333
Pelli DG (1997) The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat Vis 10:437–442
Pratt CC (1930) The spatial character of high and low tones. J Exp Psychol 13:278–285
Roffler SK, Butler RA (1968) Factors that influence the localization of sound in the vertical plane. J Acoust Soc Am 43:1255–1259
Rusconi E, Kwan B, Giordano BL, Umiltà C, Butterworth B (2006) Spatial representation of pitch height: the SMARC effect. Cognition 99:113–129
Sadaghiani S, Maier JX, Noppeney U (2009) Natural, metaphoric, and linguistic auditory direction signals have distinct influences on visual motion processing. J Neurosci 29:6490–6499
Spence C (2011) Crossmodal correspondences: a tutorial review. Atten Percept Psychophys 73:971–995
Spence C, Deroy O (2012) Crossmodal correspondences: innate or learned? Iperception 3:316–318
Sweeny TD, Guzman-Martinez E, Ortega L, Grabowecky M, Suzuki S (2012) Sounds exaggerate visual shape. Cognition 124:194–200
Teramoto W, Hidaka S, Sugita Y (2010a) Sounds move a static visual object. PLoS ONE 5:e12255
Teramoto W, Manaka Y, Hidaka S, Sugita Y, Miyauchi R, Sakamoto S, Gyoba J, Iwaya Y, Suzuki Y (2010b) Visual motion perception induced by sounds in vertical plane. Neurosci Lett 479:221–225
Walker R (1987) The effects of culture, environment, age, and musical training on choices of visual metaphors for sound. Percept Psychophys 42:491–502
Walker P, Bremner JG, Mason U, Spring J, Mattock K, Slater A, Johnson SP (2010) Preverbal infants’ sensitivity to synaesthetic cross-modality correspondences. Psychol Sci 21:21–25
Zangenehpour S, Zatorre RJ (2010) Crossmodal recruitment of primary visual cortex following brief exposure to bimodal audiovisual stimuli. Neuropsychologia 48:591–600