Mechanisms of simultaneity constancy

by Laurence Harris*, Vanessa Harrar, and Phil Jaekl

Multisensory Integration Laboratory, Centre for Vision Research, York University, Toronto, Ontario, Canada M3J 1P3

*Corresponding author: Laurence Harris, Department of Psychology, York University, Toronto, Ontario, Canada M3J 1P3. Email: [email protected]; Phone: 416-736-2100 x 66108; Fax: 416-736-5857

Words: 6,085; Figures: 10; Tables: 0; Version: 8

Chapter to appear in: "Issues of space and time in perception and action", edited by Romi Nijhawan (Cambridge University Press).

Although time is essential for proper perception of the outside world, it is created by the brain. There is no sensory system for time, only a perception of the relative time between events. Since time perception is a reconstructive process, the perception of "now" must always lag behind (Dennett and Kinsbourne, 1992; Stone et al., 2001), which means that the perception of time is by definition always wrong to a degree. Are we able to correct for the errors that result from various stimuli taking different amounts of time to be processed, and so accurately perceive simultaneity? This chapter reviews the circumstances in which simultaneity is correctly perceived and proposes a three-stage mechanism for achieving this.

Combining information from different senses that corresponds to a single event poses several problems. The senses collect data in different reference frames, generally about different attributes of an object, and with different resolutions and reliability. Furthermore, information picked up by different senses is often redundant, which raises the issue of how information might be averaged between, or selected from, a range of signals. Two areas where the different senses may provide redundant information are the location of an object in space and its location in time.
Making accurate and consistent judgments about the relative timing of events requires our perceptual system to overcome the different operating speeds associated with each individual sense. These differences arise from both intrinsic and extrinsic factors. There are many reasons why information about a given event takes a different amount of time depending on the sensory route it takes to reach the central nervous system. Three sources of these differences are: the time it takes the energy from the event to reach the neural sensors; the time for the transduction process (Spence and Squire, 2003); and the neural transmission time for the information to pass from the transducers to the central nervous system (Von Békésy, 1963; Macefield et al., 1989). Many factors can influence the timing of each of these three sources, and any two simultaneous stimuli will often differ in the timing of all of them. Reconstructing the actual timing of an event, or its timing relative to another event, therefore involves making some allowance for these variable delays. If observers are to identify correctly the simultaneity of the components of multimodal events, they need to allow for all of these factors. We and others have demonstrated that in some situations the brain can take many of these factors into account and correctly realize when stimuli from various modalities inform about a common object or event (Engel and Dougherty, 1971; Sugita and Suzuki, 2003; Kopinska and Harris, 2004; Alais and Carlile, 2005). This chapter will explore how effectively this can be done both cross-modally and intra-modally.

Simultaneity constancy

Simultaneity constancy is the ability to correctly perceive simultaneous events as such, despite variations in the timing of the internal representations of the individual components of the stimuli (Kopinska and Harris, 2004).
Simultaneity constancy is conceptually no different from other forms of perceptual constancy in which the percept, in this case of relative time, is not present in the sensory information and must be deduced (Walsh and Kulikowski, 1998). For example, the size of an object is correctly perceived despite continuous changes in its retinal size – a phenomenon known as size constancy (Gregory, 1963; McKee and Smallman, 1998). Simultaneity constancy maintains a percept of simultaneous stimuli across many channels and types of information, thus indicating a common attribute – the presence of a complex single event.

Methods used to assess perceived simultaneity

Two methods are generally used to assess the perception of simultaneity:

1) A forced-choice decision can be made between whether two stimuli are "simultaneous" or "successive." Generally these decisions are plotted as a normal distribution, with the number of times subjects said "simultaneous" as a function of the stimulus onset asynchrony (SOA) between the two stimuli. The peak of this curve indicates the SOA at which subjects are most likely to say "simultaneous": the point of subjective simultaneity (PSS). The width of these curves typically indicates that stimuli need to be separated by about 100–200 ms to be reliably perceived as separate (see for example figures 9a and c, taken from Stone et al., 2001 and Fujisaki et al., 2004).

2) A forced-choice temporal order judgement (TOJ) as to which of a pair of stimuli came on first. A psychometric curve is fitted to the percentage of times a subject said one stimulus was first, plotted as a function of the SOA. JNDs are typically around +/- 100 ms (see for example figure 3). When TOJ performance is at chance, subjects cannot tell which stimulus came first, and this point can be taken as the point of subjective simultaneity (PSS) (see for example Spence et al., 2001 and figures 3 and 5).
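The logic of method 2 can be sketched computationally. The snippet below is a minimal illustration with hypothetical data, using simple linear interpolation rather than the sigmoid fitting actually used in these studies; it estimates the PSS as the SOA at which "light first" responses cross 50%.

```python
def estimate_pss(soas, p_first):
    """Estimate the point of subjective simultaneity (PSS): the SOA at
    which the proportion of 'first' responses crosses 50%, found by
    linear interpolation between adjacent sampled SOAs."""
    for i in range(len(soas) - 1):
        p0, p1 = p_first[i], p_first[i + 1]
        if p0 != p1 and (p0 - 0.5) * (p1 - 0.5) <= 0:
            return soas[i] + (0.5 - p0) * (soas[i + 1] - soas[i]) / (p1 - p0)
    return None  # responses never cross 50% in the sampled range

# Hypothetical TOJ data: negative SOA = touch first, positive = light first
soas = [-200, -100, -50, 0, 50, 100, 200]
p_light_first = [0.02, 0.10, 0.30, 0.50, 0.72, 0.90, 0.98]
print(estimate_pss(soas, p_light_first))  # 0.0 (PSS at true simultaneity)
```

With these made-up proportions the 50% crossing falls at an SOA of zero, i.e. veridical perception of simultaneity; a non-zero result would indicate the shift discussed in the sections that follow.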
The advantage of the former method is that it is a direct judgement of perceived simultaneity; the disadvantage is that it is psychophysically uncontrolled, i.e. it is vulnerable to random fluctuations of criterion. The advantage of temporal order judgements is that they provide statistically reliable psychometric data relatively immune to subjective criterion biases, because of the forced choice between two independent alternatives. The disadvantage is that the decision unnaturally forces the subject to concentrate on the temporal sequence, which might affect their perception. We are presently developing a third method in which subjects estimate the direction of apparent motion between the members of a stimulus pair. The relative timing at which subjects cannot tell the direction indicates that the stimuli are perceived not as staggered, which evokes apparent motion, but as simultaneous, with no resulting apparent motion (Harrar et al., submitted).

Simultaneity constancy in the auditory/visual system

Judging whether visual and auditory stimuli originate from a single event requires that many potential sources of difference between the timing of the two signals be taken into account. These include the longer time it takes for sound to reach the ears, as opposed to the negligible amount of time it takes light to reach the eyes; this difference depends on the distance of the source from the observer. Other differences result from the faster transduction time of sound than of light, and from the various intensities of sounds and lights, which also contribute to the variability in processing times (Spence et al., 2001). To assess the effect of the different velocities of light and sound, we had subjects sit at various distances from a computer that generated light and sound stimuli that could be staggered in time.
In addition to varying the distance, we also varied the effective intensity of the visual stimulus by viewing it through dark glasses in one condition and with peripheral vision in another. Both of these manipulations slow the processing of the visual stimulus (Wilson and Anstis, 1969; Nickalls, 1996).

Figure 1. The effect of distance, intensity, and eccentricity on judgments of simultaneity. Subjects were presented with lights and sounds from a computer at various distances, either during foveal viewing without any glasses (filled circles), wearing dark glasses (filled squares), or during eccentric viewing (open triangles). The point of subjective simultaneity (average of 10 subjects) is plotted as a function of distance. The horizontal dashed line indicates true simultaneity; the slanted dotted line indicates the delay of sound due to its slower speed. Data from Kopinska and Harris, 2004.

Figure 1 shows that factors that introduce differences in the timing of the light and sound stimuli do not affect the PSS. Sound takes about 3 ms to travel 1 m. Since our corridor was 32 m long, this should add up to a 96 ms lag for the sound, as indicated by the slanted dotted line in figure 1. The dark glasses added about 30 ms to the processing time of light, as expected from the Pulfrich effect (Pulfrich, 1922) and as measured by reaction times to the present stimuli. Wearing dark glasses should therefore have shifted the function by this amount. Similarly, eccentric viewing delays the detection of visual stimuli, as measured by simple reaction times, by about 40 ms for the present stimuli. These changes would be expected to shift the curves by these amounts.
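The predicted auditory lag can be sketched as a simple function of distance. The sketch below assumes a speed of sound of roughly 343 m/s; the chapter's 3 ms/m figure is a coarser approximation of the same quantity.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate speed of sound in air

def sound_lag_ms(distance_m):
    """Expected lag of a sound relative to light from the same event."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0

# At the far end of the 32 m corridor the sound should arrive
# roughly 93 ms late (about 96 ms with the 3 ms/m approximation).
print(round(sound_lag_ms(32)))  # 93
```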
However, there were no statistically significant differences between the three curves (Kopinska and Harris, 2004), indicating that all these variable delays are taken into account before the point at which the brain integrates the information to decide on the stimuli's relative timing. Notice that the timing differences the brain needs to take into account include some due to outside sources (the speed of sound), some due to fixed neural constraints (the speed of transduction of each sense), and some due to factors that vary from moment to moment (the brightness of an object or its location on the retina).

Simultaneity constancy in the visual/tactile system

Another example of a cross-modal temporal comparison in which timing differences need to be taken into account for accurate TOJs is the comparison between visual and tactile stimuli. Tactile stimuli, like auditory stimuli, are transduced faster than visual stimuli (King and Palmer, 1985; Poppel et al., 1990), and this difference therefore needs to be compensated for if accurate judgements about the relative timing of tactile and visual stimuli are to be made. To see how well these factors were taken into account, we measured TOJs between lights and touches on different parts of the body.

Figure 2. The reaction times to touches on various parts of the body as a function of distance from the centre of the head. There is an increase in reaction time with distance from the head. Superimposed on this graph is the average reaction time to lights (grey bar). The reaction time to a touch on the hand is indicated by the picture of a hand.

Figure 2 shows that simple reaction times at different points on the body are longer for body parts that are further from the centre of the head.
Also, reaction times to touches on the body are generally faster than those to lights, such that the reaction time to a touch on the hand is about 40 ms faster than to a light. Visual and tactile stimuli on the hand are likely to originate in single events, such as when people watch themselves manipulating an object or pressing a piano key. A simultaneity constancy mechanism is required to resynchronize the brain's representations of the two stimuli by compensating for the difference in processing times for these two aspects of what should really be regarded as a single (bimodal) stimulus. Figure 3 shows psychometric functions of visual/tactile temporal order judgments involving the hand. The average PSS (shown by the vertical dotted line) is not significantly different from zero, despite the fact that touches on the hand are processed 40 ms faster than lights (vertical solid line). Eagleman (Eagleman, in press) has found similar evidence in support of a simultaneity constancy mechanism for visual/tactile processing.

Figure 3. Perceived timing of lights and touches on the hand. (A) shows psychometric functions (obtained using method 2 described above) from which the point of subjective simultaneity (the 50% point) could be measured for each subject. Each thin line shows an individual subject's data. The average PSS is shown by the vertical dashed line and the difference in reaction times between the same stimuli is shown by the vertical solid line. The PSS and the difference in reaction times (prediction) are compared in the histogram in (B).

Interestingly, Harrar and Harris (in press) found that if the touches and lights were on different parts of the body, correct judgments of simultaneity were not found.
When touches and lights are on different parts of the body they are likely to correspond to multiple events, suggesting that spatial congruency is required for binding stimuli into a single event (Spence et al., 2001; Spence et al., 2003; Spence and Squire, 2003; Soto-Faraco et al., 2004; Zampini et al., 2005a; Zampini et al., 2005b).

Simultaneity constancy within a modality

The simultaneity detecting mechanism does not only need to deal with differences across senses. Sometimes multiple stimuli detected by a single sense can evoke responses with different latencies. In the touch system, for example, the time at which information reaches the brain depends on the length of the nerves from the various parts of the body (Macefield et al., 1989). When subjects are asked which of two touches on different parts of the body came first, the answer is predictable not from which part was actually touched first, but from the differences in distance from the head. There does not seem to be a simultaneity constancy mechanism within the touch sense (Bergenheim et al., 1996; Harrar and Harris, in press). There is also a demand for an intramodal simultaneity constancy mechanism within vision. A single visual image stimulates different regions of the retina, and those regions will be processed at different speeds according to their eccentricity and the local intensity of the image (Wilson and Anstis, 1969).

Figure 4. Simultaneity constancy within the visual system. (A) Subjects viewed patches at an eccentricity of 19 degrees. The patches could be either bright or dim, illustrated as white or grey circles. Left and right pairs, separated by 14 degrees, were presented separately for each judgment trial. Reaction times were measured to each patch individually. (B)
Bright patches were responded to 20 ms faster than dim patches, leading to the prediction that the dim light would need to lead the bright light by 20 ms for the pair to be perceived as simultaneous. However, the PSS was not significantly different from zero.

In the experiment illustrated in figure 4, targets with different luminances were used. Reaction times showed that these patches were processed at different speeds, but TOJs showed that the timing difference was accounted for and simultaneity constancy was achieved, i.e. the PSS was not significantly different from true simultaneity.

Occasions on which simultaneity constancy is not engaged

The present authors and others (Engel and Dougherty, 1971) have found that simultaneity constancy is engaged for various pairs of stimuli, including light/sound, light/touch, and light/light pairs (Sugita and Suzuki, 2003; Kopinska and Harris, 2004; Alais and Carlile, 2005). There are, however, some situations in which simultaneity constancy is not engaged. We have already seen that there is substantial evidence against a simultaneity constancy mechanism for multiple touches along the body. When the simultaneity constancy mechanism is not engaged, differences in the processing times of two stimuli resulting from either internal or external sources are reflected in the PSS. That is, subjects make errors in judging which stimulus was first by not taking into account factors that alter the timing of brain responses. For example, working in the open air, Lewald and Guski (2003) found the PSS for light/sound pairs of stimuli to vary with distance. Another example comes from an experiment that investigated simultaneity constancy in a more realistic context (Arnold et al., 2005). When two disks move on a collision course they can be seen either to pass through each other or to bounce off each other. The latter interpretation is made more likely if a sound accompanies the "collision".
The sound is most effective in doing this if it is presented not at the instant of collision but just prior to it – suggesting that the sound and the visual event need to match in "brain time" rather than in real time (Arnold et al., 2005). Furthermore, the most effective time varies with distance, indicating no compensation for the different velocities of light and sound and no effective simultaneity constancy mechanism. Why simultaneity constancy seems to be engaged in some circumstances and not others is unclear and currently being investigated.

Mechanisms of Simultaneity Constancy

Given that simultaneity constancy is engaged under many circumstances, it now behooves us to explain how these variations in timing can be compensated for. There are two broad classes of models: bottom-up and top-down. A bottom-up model would involve altering the timing of information as it passed through the nervous system, such that by the time it arrived at the relevant decision-making site any timing differences would already be removed. Such a scheme could only compensate for internal variations, such as nerve lengths and transduction time differences, since these are independent of context and do not change for a given stimulation – the conduction velocity from the hand to the brain is independent of what touches the hand. Other factors, such as the delay introduced by the speed of sound, require information that is context dependent (distance from the observer) and not present in the stimulus itself (the distance from the observer is a percept created by the observer). A bottom-up model cannot compensate fully for such factors because they are not fixed. Reaction time differences such as those shown in figure 2 suggest empirically that no peripheral compensation for timing differences takes place at a low level: if there were such bottom-up peripheral compensation, then reaction times to touches anywhere on the body would be the same.
Top-down explanations of simultaneity constancy involve taking into account the context in which the stimulus components occur, and using this as part of a reconstructive process to obtain the correct relative timings. Intuitively it seems unlikely that all the sources of variation in timing could be hardwired into the system, or that it would be advantageous for them to be so. The system could learn to interpret particular delays between two stimuli in a particular context as corresponding to simultaneity. But if exposed to a new delay between two stimuli, a top-down system could recalibrate, assuming that this new pairing now corresponded to simultaneity in the outside world. Fujisaki et al. (2004) showed that the simultaneity constancy mechanism is indeed flexible in this way: the PSS between lights and sounds could be shifted following a period of repeated exposure to pairs of stimuli separated by a particular delay. Although flexibility is not proof of a top-down system, it is a defining characteristic. We replicated Fujisaki et al.'s experiment by presenting a time-staggered sound/light pair, with the light coming on before the sound. We used headphones and an LED and presented the new contingency repeatedly for 5 min. This caused subjects' PSSs to shift 40 ms, but curiously not towards the experienced stagger (figure 5). Although this result confirms that the PSS is flexible, it is odd that after adapting to 'light first' as an indication of true simultaneity, 'light first' was not subsequently interpreted as simultaneous! Fujisaki et al. (2004) adapted subjects to several different temporal intervals and found that responses varied depending on the SOA adapted to. However, as can be seen from their figures 2b and 2d, subjects' shifts were all in the direction of 'sound first', even after adapting to 'light first' stimuli.
This suggests that there may be a bias towards shifting the PSS in the direction of 'sound first', but it is not yet known why. A possible confound is that both Fujisaki et al. (2004) and Harrar and Harris (2005) used headphones for the presentation of sounds, which separated them spatially from the light and may have made it more difficult to bind the stimuli.

Figure 5. The result of adapting to a temporally staggered light/sound pair. After 5 min of repeated exposure to a light followed 250 ms later by a sound (delivered through headphones) (see insert) there was a significant shift in temporal order judgments between light and sound (before, filled circles; after, open circles; error bars over five subjects pooled). The vertical lines indicate the PSS before (solid line) and after (dashed line). The PSS shifted 40 ms towards 'sound first'.

If reaction times were similarly affected by such an adaptation regime, it would be evidence for a bottom-up process in which the total functioning of the system could be altered in response to experience, but which would then, blindly and without regard to individual context, operate with a particular delay. We therefore repeated Fujisaki et al. (2004)'s experiment for all combinations of light, sound, and touch stimuli, and tested reaction times both to the stimuli within the pair that was adapted to and to the third stimulus (the one not present in the adaptation phase). Figure 6 shows that reaction times to all three types of stimuli were unaffected by all adaptation regimes.
Figure 6. Reaction times after adapting to staggered stimulus pairs. Subjects adapted to combinations of light, sound, and touch stimuli that were staggered in time by 100 ms (left panel). Sounds were presented through earphones while the lights and touches were on the subjects' index finger. The graphs on the right show the mean reaction times for each stimulus (see key on graph) and their standard errors. The middle point (labeled 'no adapt') is the reaction time prior to adaptation. There were no significant differences in reaction times following any adaptation.

These experiments show that reaction times, and the hardwired processing that leads to them, are not altered by an adaptation regime that was capable of altering TOJs (figure 5). However, as we will see, although the light/sound system's properties can be altered by a few minutes of experience, this does not seem to hold for pair comparisons involving the touch system. The general robustness of reaction times, even after adapting to staggered light/sound pairs, is consistent with temporal compensation not being a bottom-up process. Another limitation of a bottom-up class of model is that if, for example, the processing time of a particular stimulation were extended (in response to environmental demands or to exposure to contrived temporal pairings in the laboratory), this added delay would subsequently be applied in all circumstances where that stimulation occurred, regardless of which other stimuli were present.
For example, if the processing of light were delayed to compensate for an unusually slow sound coming from a distant event, the added delay would affect all subsequent perceptions involving visual stimuli. To check whether this was so, we adapted subjects to each of three combinations of time-staggered stimuli (light/sound, light/touch, sound/touch) and, after each adaptation, tested TOJs for all three combinations interleaved. After adapting to the time-staggered sound/light combination, PSSs for the adapted pair shifted but the PSSs for the sound/touch and light/touch pairs did not (cf. figure 5). Thus there is no general slowing or speeding up of the processing of individual stimuli: although the experienced light/sound pair was affected by the adaptation, the other pairs were unaffected, despite having one member in common with the pair for which the PSS did shift. This suggests that there are multiple, independent simultaneity constancy mechanisms, as shown in figure 7, for the various multimodal tasks that we may encounter.

Figure 7. Multiple, independent mechanisms for simultaneity constancy. The lack of cross coupling between stimulus pairs after adaptation suggests that the perceived temporal aspects of multimodal stimulus combinations are processed individually, as opposed to a single mechanism adjusting all individual components. On the far left are the individual stimuli, which are then paired with other stimuli in independent comparisons that bring them into synchrony separately, appropriate for various tasks, examples of which are described on the right.

After adapting to time-staggered light/touch and sound/touch pairs we found no significant shifts of the PSS between the stimuli adapted to; however, we did find a shift in the sound/light PSS after adapting to the light/touch pair.
This suggests that the sound/light system is more plastic than the other systems and can be adjusted in response to even a few minutes' experience with a staggered pair, whereas systems involving touch seem to be less plastic. Eagleman's chapter in this book (Eagleman, in press) has pointed to the importance of active involvement in engaging the touch system. He suggests that the touch system is fundamentally different from the more passive auditory and visual systems and is usually interpreted in terms of actions and the positions of the limbs. Further, a touch on a given part of the body always occurs at a known distance from the observer, and this distance does not need to be inferred the way it does for visual and auditory signals. In the somatosensory homunculus each group of cells represents a specified location on the body (Penfield and Rasmussen, 1950; Bergenheim et al., 1996). Bergenheim et al. (1996) suggest that, since the distance is a fixed attribute of the stimulation site, it is also possible that the relative time at which the stimulus occurred could be part of the signal: a "temporal homunculus". The lack of plasticity in the tactile system may be evidence for such a fixed temporal homunculus. If simultaneity constancy is a top-down process, then contextual elements such as distance and velocity can be taken into account to pair stimuli together before specific times are attributed to the elements. However, taking these factors into account may carry large computational demands, and these valuable resources should only be allocated when necessary. Relevant stimuli need to be carefully selected from the vast array of potential candidates, shortcuts applied where possible, and computationally demanding contextual elements taken into account only when necessary. We propose a three-stage model to achieve this.
Simultaneity constancy: a three-stage model

In the first stage of the model, individual stimuli that may be part of a single object or event are identified based on their temporal and spatial proximity to each other. In the second stage, a first approximation is applied to the time difference between the stimuli, based on the most likely time difference. In the third stage, specific adjustments are made that are based on experience with the particular stimuli. Following the third stage, simultaneity constancy is achieved.

Figure 8. The three-stage model for simultaneity constancy.

Model first stage

The first part of the three-stage model addresses the question of which stimuli are selected for temporal compensation. A potential guide for identifying likely candidates is whether they occur close together in space and time (Bertelson and Aschersleben, 2003; Lewald and Guski, 2003; Spence et al., 2001). Although there is considerable agreement that windows for temporal and spatial integration exist, the sizes of these windows are less clear. To measure the importance of spatial separation, we used targets that differed in brightness and arranged them to be either close together (separated by 8 degrees) or far apart (separated by 39 degrees). Stimuli of different brightness show simultaneity constancy that overcomes the slower processing of the dimmer stimulus (see above, figure 4). The pair that was close together showed simultaneity constancy, correcting for a reaction time difference of over 20 ms. However, the pair that was far apart did not. This suggests a spatial window for intramodal visual simultaneity constancy of between 8 and 39 degrees, consistent with previous studies of intermodal sensory integration (Lewald and Guski, 2003 (greater than 20 degrees); Bertelson and Aschersleben, 2003 (less than 50 degrees); Spence et al., 2001).
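A first-stage selection of this kind can be caricatured as a pair of window tests. The window sizes below are illustrative values drawn from the ranges discussed here, not established parameters.

```python
# Illustrative first-stage binding gate: two stimuli are candidates
# for a single event only if they fall inside both a spatial and a
# temporal window. Window sizes are rough values from the text
# (spatial window somewhere between 8 and 39 degrees; temporal
# window roughly +/- 100-200 ms), not established parameters.
SPATIAL_WINDOW_DEG = 20.0
TEMPORAL_WINDOW_MS = 200.0

def may_bind(separation_deg, soa_ms):
    """True if the pair is eligible for simultaneity constancy."""
    return (abs(separation_deg) <= SPATIAL_WINDOW_DEG
            and abs(soa_ms) <= TEMPORAL_WINDOW_MS)

print(may_bind(8, 50))   # True: close pair, small asynchrony
print(may_bind(39, 50))  # False: far pair, outside the spatial window
```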
This observation is also consistent with a similar spatial constraint on the visual/tactile simultaneity constancy mechanism. For touch/light simultaneity constancy, Harrar and Harris (2005) showed that compensation took place when the stimuli were both presented on the hand but did not occur when one was on the hand and the other on the foot. Similar results were found by Spence et al. (2003), who found that PSSs did not differ from true simultaneity when visual and tactile stimuli were presented at the same location, while the PSSs did differ significantly from zero when the two stimuli were presented in different positions. Thus, although there is considerable evidence for a spatial window, it seems to be rather broad. Perhaps a temporal window will enable a more refined selection of stimuli for binding. There must be a limit to the time acceptance window for simultaneity constancy, to prevent all the events that ever happen from being collected into a single time epoch and all being treated as simultaneous! A simultaneity constancy mechanism should only be engaged in situations where a single event is likely to have created stimuli in different modalities. For touch and light this corresponds to a maximum of about 100 ms (see figure 2), but when sound is involved the delay can be very much larger, since it depends on the distance of the source from the observer. Thus an integration range in time corresponds to an operating range in distance. When asked whether two events are simultaneous, or which of two events came first, there is a probabilistic uncertainty associated with the decision. Examples of these are shown in figure 9, which indicates an uncertainty window of about +/- 100 to 200 ms. For a light/sound comparison this therefore corresponds to an operating distance range of 30–60 m.
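The conversion from a temporal uncertainty window to an operating distance range is a one-line calculation (assuming a speed of sound of roughly 343 m/s; the 30–60 m figures quoted above use the coarser 3 ms/m approximation):

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate speed of sound in air

def operating_distance_m(window_ms):
    """Greatest source distance at which the travel delay of sound
    still falls within a temporal window of the given half-width."""
    return window_ms / 1000.0 * SPEED_OF_SOUND_M_PER_S

for w in (100, 200):
    print(w, "ms ->", round(operating_distance_m(w)), "m")
# Windows of 100 and 200 ms give roughly 34 m and 69 m respectively.
```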
This interval corresponds to that obtained by other methods of assessing the temporal integration window (Slutsky and Recanzone, 2001; Fendrich and Corballis, 2001; Morein-Zamir et al., 2003; Lewald and Guski, 2003; Jaekl and Harris, in press).

Figure 9. The width of the temporal integration window. Several studies using different methods suggest that the window within which temporal integration occurs is between 100 and 200 ms. (Data from Stone et al., 2001; Harrar and Harris, in press; Fujisaki et al., 2004.)

Only within these rather broad temporal and spatial windows can stimuli be combined into a single percept and simultaneity constancy, either intramodal or intermodal, operate.

Model second stage

The first stage of the simultaneity constancy mechanism selected appropriate stimuli to be bound into a single event. Once stimuli have been identified as possibly arising from the same event, we propose that a second stage in the simultaneity constancy mechanism applies a best-guess, fixed time delay that brings the average relative time between the stimuli closer to zero and thus reduces computational demands. For example, as seen in figure 2, the processing time of touches on the body is generally faster than that of lights. So when a multimodal stimulus occurs that involves tactile and visual stimulation, such as watching something touch the skin or looking at an object being manipulated in the hand, the tactile input can be anticipated as leading the visual input by roughly 40 ms on average. Adding a delay of around this amount to the representation of all tactile input thus reduces the need for complex computations tailored to particular requirements, and will be approximately correct for most circumstances.
Similarly, lights are generally processed more slowly than sounds originating within a few meters, and so a fixed relative delay might also be applied to the neural representation of all sound stimuli when they are paired with visual stimuli. The proposed delay would be fixed for a given pair of stimuli as follows:
light/touch: slow touch by 40 ms
light/sound: slow sound by 40 ms
touch/sound: no adjustment
If specific experience is not available, the fixed-rule second stage is all that can be applied. An example of a situation in which specific experience is not available is when lights and touches are presented in close proximity on the sole of the foot.
[Figure 10 appears here: temporal order judgments for touches and lights on the foot (top panel) and a comparison of the reaction time prediction with the PSS (lower panel).]
Figure 10. PSS for touches and lights on the foot. The top panel shows the temporal order judgments for each pair of stimuli as a function of the delay between the two components. Positive and negative values on the horizontal axis indicate which of the stimuli was presented first, as shown by the inserted cartoons. The grey lines show sigmoidal fits through each subject's data, and the average data points and standard errors are superimposed on these curves. The black curve is a sigmoid reconstructed from the average PSS and standard deviation of all of these curves. The temporal delay at the PSS indicated by the average curve is shown by a dashed vertical line, and the prediction from the reaction time differences is shown as a solid vertical line. In the histograms in the lower panel, the reaction time prediction (black bar) is compared with the PSS (shaded bar). The vertical axis shows the delays between the stimuli with the polarity indicated by the inserted cartoons. Standard errors are also shown.
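The PSS extraction described in the caption amounts to fitting a sigmoid psychometric function to the temporal order judgments and reading off its 50% point. A crude grid-search version on synthetic data gives the flavour (the logistic form, grid values, and data are illustrative assumptions, not the chapter's actual fitting procedure):

```python
import math

def p_light_first(delay_ms, pss_ms, slope_ms):
    """Logistic psychometric function: probability of judging 'light first'
    as a function of delay (positive delay = light physically first)."""
    return 1.0 / (1.0 + math.exp(-(delay_ms - pss_ms) / slope_ms))

def estimate_pss(delays_ms, proportions):
    """Least-squares grid search over candidate PSS and slope values.
    The PSS is the delay at which 'light first' responses reach 50%."""
    best_err, best_pss = float("inf"), None
    for pss in range(-100, 101, 5):           # candidate PSS values, ms
        for slope in (10, 20, 30, 40, 50):    # candidate slopes, ms
            err = sum((p_light_first(d, pss, slope) - p) ** 2
                      for d, p in zip(delays_ms, proportions))
            if err < best_err:
                best_err, best_pss = err, pss
    return best_pss

# Synthetic observer who perceives simultaneity when the light leads by 30 ms:
delays = [-200, -100, -50, 0, 50, 100, 200]
data = [p_light_first(d, 30, 30) for d in delays]
print(estimate_pss(delays, data))  # -> 30
```

A nonzero fitted PSS, as in figure 10, is the signature of uncompensated (or, here, anticompensatory) processing delays.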
The reaction time prediction is zero, while the PSS requires the light to be presented 30 ms before the touch on the foot. Figure 2 shows that reaction times to touches on the foot and to lights happen to be approximately the same. If no compensation were present, the prediction would therefore be that the PSS occurs when lights and touches are applied to the foot simultaneously. However, we found a shift in the PSS away from accurate simultaneity such that subjects perceived simultaneity when the light was presented about 30 ms before the touch (figure 10): an anticompensatory shift. We interpret this as revealing the application of a fixed delay (a 40 ms delay to the tactile stimulus) corresponding to the second stage of a simultaneity constancy mechanism. The lack of experience of feedback from the foot meant that further fine adjustments could not be made. If this "best-guess" average tactile delay were added to the processing of a tactile and visual combination on the hand, it would allow veridical perception of simultaneity and relieve the system of the need for further calculation, but on the foot it causes the temporal relations to be misperceived. It is an open question whether the second-stage fixed rules can be adapted for these unusual stimulus combinations, or whether the period of plasticity has passed. Veridical perception of simultaneity usually requires specific experience with the stimulus pair in question so that variations in timing specific to the stimuli and context involved, such as intensity and distance, can be taken into account. The third stage of our model takes such specific timing requirements into account.
Model third stage
The third stage of our model is where adjustments are made based on particular experience of the stimulus pair in question.
This is where factors such as the speed of sound and the distance of the source from the observer are taken into account: information that is not present in the stimulus and must be added from previous experience. Evidence for this third stage has been presented above in the many conditions where stimuli were correctly perceived as simultaneous despite differences in processing time that could not be assumed without reference to the stimulus situation. Examples described above included compensation for the slower speed of sound at distances of up to 32 meters when sounds were paired with lights (figure 1), and compensation for the variation with intensity of the processing times of lights (figure 4). In these situations, the rough approximations added at the second stage are "tweaked" at the third stage so that perception is veridical. How this compensation might be achieved is a question to which we unfortunately have no answer. The comparisons needed to answer either of the main questions used to assess the perception of simultaneity (were they simultaneous? which came first?) must take place in memory, which is necessarily accessed after the event. Thus, a reconstruction of the entire multimodal event may be imagined, taking into account as many factors as possible. This reconstructive process is flexible enough to compensate for delays of at least 100 ms (such as those found between the auditory and visual components of events 32 meters away). If, however, the timing information is required immediately for some purpose or decision that does not allow the luxury of reflection after the event, it is possible that no adjustment can be made. The example of two disks colliding and emitting a sound on contact (Arnold et al., 2005) may be one such task.
Conclusions
This chapter has reviewed the occurrence of an important general perceptual principle: simultaneity constancy.
A reconstructive process is involved that is able to resynchronize asynchronous signals by taking into account many factors, both internal and external, that would otherwise distort accurate knowledge of timing. Following the reconstruction, accurate estimates can be made about the relative timing of events. There appear to be a multitude of parallel mechanisms working on different combinations of stimuli. The rules for when these systems are activated are not clear, and the spatiotemporal constraints appear loosely defined. Without tight spatial or temporal acceptance windows, it seems likely that the system would sometimes be engaged inappropriately and the timing of unrelated stimuli would be brought into perceptual alignment. Such accidental alignment, however, is unlikely to introduce a problem and appears to be an acceptable side-effect of a system that manipulates the perception of space and time to suit its needs.
Acknowledgements
This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada to LRH. Vanessa Harrar and Philip Jaekl hold NSERC graduate fellowships. Many of the experiments reported here were carried out by Agnieszka Kopinska and we are grateful for her permission to use them here. We are also very grateful to Sugirhini and Shamini Selvanayagarajah for their help with the data collection.
Reference List
Alais D, Carlile S (2005) Synchronizing to real events: subjective audiovisual alignment scales with perceived auditory depth and speed of sound. Proc Natl Acad Sci U S A 102: 2244-2247
Arnold DH, Johnston A, Nishida S (2005) Timing sight and sound. Vision Res 45: 1275-1284
Bergenheim M, Johansson H, Granlund B, Pedersen J (1996) Experimental evidence for a sensory synchronization of sensory information to conscious experience. In: Hameroff SR, Kaszniak AW, Scott AC (eds) Towards a science of consciousness: The first Tucson discussions and debates. MIT Press, Cambridge, MA, pp 301-310
Bertelson P, Aschersleben G (2003) Temporal ventriloquism: crossmodal interaction on the time dimension. 1. Evidence from auditory-visual temporal order judgment. Int J Psychophysiol 50: 147-155
Dennett DC, Kinsbourne M (1992) Time and the observer: the where and when of consciousness in the brain. Behavioral and Brain Sciences 15: 183-201
Eagleman D (in press) Recalibrating touch. In: Nijhawan R (ed) Issues of space and time in perception and action. Cambridge University Press, Cambridge, UK
Engel GR, Dougherty WG (1971) Visual-auditory distance constancy. Nature 234: 308
Fendrich R, Corballis PM (2001) The temporal cross-capture of audition and vision. Perception and Psychophysics 63: 719-725
Fujisaki W, Shimojo S, Kashino M, Nishida S (2004) Recalibration of audiovisual simultaneity. Nat Neurosci 7: 773-778
Gregory RL (1963) Distortion of visual space as inappropriate constancy scaling. Nature 199: 678-680
Harrar V, Harris LR (in press) Simultaneity constancy: detecting events with touch and vision. Exp Brain Res
Harrar V, Winter R, Harris LR (submitted) The effect of distance on unimodal and multimodal apparent motion. Neuropsychologia
Jaekl P, Harris LR (in press) Time shifting: a direct measurement of auditory-visual temporal integration. Neuropsychologia
King AJ, Palmer AR (1985) Integration of visual and auditory information in bimodal neurones in the guinea-pig superior colliculus. Exp Brain Res 60: 492-500
Kopinska A, Harris LR (2004) Simultaneity constancy. Perception 33: 1049-1060
Lewald J, Guski R (2003) Cross-modal perceptual integration of spatially and temporally disparate auditory and visual stimuli. Brain Res Cogn Brain Res 16: 468-478
Macefield G, Gandevia SC, Burke D (1989) Conduction velocities of muscle and cutaneous afferents in the upper and lower limbs of human subjects. Brain 112: 1519-1532
McKee SP, Smallman HS (1998) Size and speed constancy. In: Walsh V, Kulikowski JJ (eds) Perceptual constancy. Cambridge University Press, Cambridge, UK, pp 373-408
Morein-Zamir S, Soto-Faraco S, Kingstone A (2003) Auditory capture of vision: examining temporal ventriloquism. Brain Res Cogn Brain Res 17: 154-163
Nickalls RWD (1996) The influence of target angular velocity on visual latency difference determined using the rotating Pulfrich effect. Vision Res 36: 2865-2872
Penfield W, Rasmussen T (1950) The cerebral cortex of man. Macmillan, New York
Poppel E, Schill K, von Steinbuchel N (1990) Sensory integration within temporally neutral systems states: a hypothesis. Naturwissenschaften 77: 89-91
Pulfrich C (1922) Die Stereoskopie im Dienste der isochromen und heterochromen Photometrie. Naturwissenschaften 25: 553-564
Slutsky DA, Recanzone GH (2001) Temporal and spatial dependency of the ventriloquism effect. Neuroreport 12: 7-10
Soto-Faraco S, Ronald A, Spence C (2004) Tactile selective attention and body posture: assessing the multisensory contributions of vision and proprioception. Percept Psychophys 66: 1077-1094
Spence C, Baddeley R, Zampini M, James R, Shore DI (2003) Multisensory temporal order judgments: when two locations are better than one. Percept Psychophys 65: 318-328
Spence C, Shore DI, Klein RM (2001) Multisensory prior entry. J Exp Psychol Gen 130: 799-832
Spence C, Squire S (2003) Multisensory integration: maintaining the perception of synchrony. Curr Biol 13: R519-R521
Stone JV, Hunkin NM, Porrill J, Wood R, Keeler V, Beanland M, Port M, Porter NR (2001) When is now? Perception of simultaneity. Proc Roy Soc Lond B 268: 31-38
Sugita Y, Suzuki Y (2003) Audiovisual perception: implicit estimation of sound-arrival time. Nature 421: 911
Von Békésy G (1963) Interaction of paired sensory stimuli and conduction in peripheral nerves. J Appl Physiol 18: 1276-1284
Walsh V, Kulikowski J (eds) (1998) Perceptual constancy: why things look as they do. Cambridge University Press, Cambridge, UK
Wilson JA, Anstis SM (1969) Visual delay as a function of luminance. American Journal of Psychology 82: 350-358
Zampini M, Brown T, Shore DI, Maravita A, Roder B, Spence C (2005a) Audiotactile temporal order judgments. Acta Psychol (Amst) 118: 277-291
Zampini M, Guest S, Shore DI, Spence C (2005b) Audio-visual simultaneity judgments. Percept Psychophys 67: 531-544