Mechanisms of simultaneity constancy

Laurence Harris*, Vanessa Harrar, and Phil Jaekl
Multisensory Integration Laboratory,
Centre for Vision Research,
York University,
Ontario, Canada M3J 1P3
Corresponding author:
Laurence Harris,
Department of Psychology,
York University,
Ontario, Canada M3J 1P3
Email: [email protected]
Phone: 416-736-2100 x 66108
Fax: 416-736-5857
Chapter to appear in:
“Issues of space and time in perception and action” edited by Romi
Nijhawan. (Cambridge University Press)
Although time is essential for proper perception of the outside world, it is created by the brain.
There is no sensory system for time, only a perception of relative time between events. Since time
is a reconstructive process, the perception of “now” must always lag behind (Dennett and
Kinsbourne, 1992; Stone et al., 2001) which means that perception of time is by definition always
wrong to a degree. Are we able to correct for errors that result from various stimuli taking
different amounts of time to be processed in the reconstruction of time and accurately perceive
simultaneity? This chapter reviews the circumstances in which simultaneity is correctly perceived
and proposes a three-stage mechanism for achieving this.
Combining information from different senses that corresponds with a single event has several
problems associated with it. The senses collect data in different reference frames, generally about
different attributes of an object and with different resolutions and reliability. Furthermore
information picked up by different senses is often redundant – which raises the issue of how
information might be averaged between or selected from a range of signals. Areas where the
different senses may provide redundant information are the location of an object in space and
time. Making accurate and consistent judgments about the relative timing of events requires our
perceptual system to overcome the different operation speeds associated with each individual
sense. These differences arise from both intrinsic and extrinsic factors.
There are many reasons why information about a given event takes a different amount of time
depending on the sensory route it takes to reach the central nervous system. Three sources of
these differences are: the time it takes the energy from the event to reach the neural sensors; the
time for the transduction process (Spence and Squire, 2003); and the neural transmission time for
the information to pass from the transducers to the central nervous system (Von Békésy, 1963;
Macefield et al., 1989). Many factors can influence the timing of each of these three sources and
any two simultaneous stimuli will often differ in the timing of all of them. Reconstructing the actual
timing of an event, or its timing relative to another event, therefore involves making some
allowance for these variable delays. If observers are to identify correctly the simultaneity of the
components of multimodal events, they need to allow for all of these factors. We and others have
demonstrated that in some situations, the brain can take many of these factors into account and
correctly realize when stimuli from various modalities inform about a common object or event
(Engel and Dougherty, 1971; Sugita and Suzuki, 2003; Kopinska and Harris, 2004; Alais and
Carlile, 2005). This chapter will explore how effectively this can be done both cross-modally and
within a single modality.
Simultaneity constancy
Simultaneity constancy is the ability to correctly perceive simultaneous events as such, despite
variations in the timing of the internal representations of the individual components of the stimuli
(Kopinska and Harris, 2004). Simultaneity constancy is conceptually not different from other
forms of perceptual constancy where the perception, in this case of relative time, is not present in
sensory information and must be deduced (Walsh and Kulikowski, 1998). For example the size of
an object is correctly perceived despite continuous changes in its retinal size – a phenomenon
known as size constancy (Gregory, 1963; McKee and Smallman, 1998). Simultaneity constancy
maintains a percept of simultaneous stimuli across many channels and types of information thus
indicating a common attribute – the presence of a complex single event.
Methods used to assess perceived simultaneity
There are two methods generally used to assess the perception of simultaneity:
1) A forced-choice decision can be made between whether two stimuli are “simultaneous” or
“successive.” Generally these decisions are plotted as a normal distribution with “number
of times subjects said simultaneous” as a function of the stimulus onset asynchrony
(SOA) between the two stimuli. The peak of this curve indicates the SOA where subjects
are most likely to say “simultaneous”: the point of subjective simultaneity (PSS). The
width of these curves typically indicates that stimuli need to be separated by about 100-200 ms
to be reliably perceived as separate (see for example figures 9 a and c, taken from
Stone et al., 2001, and Fujisaki et al., 2004).
2) A forced-choice temporal order judgement (TOJ) as to which of a pair of stimuli came on
first. A psychometric curve is fitted to the percentage of times a subject said one stimulus
was first, plotted as a function of the SOA. Just noticeable differences (JNDs) are typically
around +/- 100 ms (see for example figure 3). When TOJ performance is at chance, subjects
cannot tell which stimulus came first and this point can be taken as the point of subjective
simultaneity (PSS) (see for example Spence et al., 2001 and figures 3 and 5).
The advantage of the former method is that it is a direct judgement of perceived simultaneity; the
disadvantage is that it is psychophysically uncontrolled, i.e. it is vulnerable to random
fluctuations of criterion. The advantage of temporal order judgements is that they provide
statistically reliable psychometric data relatively immune to subjective criteria biases because of
the forced choice between two independent alternatives. The disadvantage is that the decision
forces the subject unnaturally to concentrate on the temporal sequence which might affect their
perception. We are presently developing a third method in which subjects estimate the direction
of apparent motion between members of a stimulus pair. The relative timing of the stimuli when
subjects cannot tell the direction indicates that the stimuli are perceived not as staggered, which
evokes apparent motion, but as simultaneous with no resulting apparent motion (Harrar et al.,
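The second method above (fitting a psychometric curve to TOJ data and reading off the 50% point) can be illustrated with a minimal sketch. It assumes a cumulative Gaussian as the psychometric function and a simple grid-search least-squares fit; neither choice is stated in the chapter, and the data are synthetic. The sign convention (positive SOA = light first) and the definition of the JND as the distance from the 50% to the 75% point are likewise assumptions made for illustration.

```python
import math

def cumulative_gaussian(soa_ms, pss_ms, sigma_ms):
    """Probability of answering 'light first' at a given SOA (positive = light leads)."""
    return 0.5 * (1.0 + math.erf((soa_ms - pss_ms) / (sigma_ms * math.sqrt(2.0))))

def fit_toj(soas_ms, p_light_first):
    """Grid-search least-squares fit of a cumulative Gaussian; returns (pss_ms, sigma_ms)."""
    best = None
    for pss in range(-100, 101):            # candidate PSS values in 1 ms steps
        for sigma in range(20, 201, 5):     # candidate spreads in 5 ms steps
            err = sum((cumulative_gaussian(s, pss, sigma) - p) ** 2
                      for s, p in zip(soas_ms, p_light_first))
            if best is None or err < best[0]:
                best = (err, pss, sigma)
    return best[1], best[2]

# Synthetic (made-up) data generated from a known curve: PSS = 20 ms, sigma = 80 ms.
soas = [-300, -200, -100, -50, 0, 50, 100, 200, 300]
props = [cumulative_gaussian(s, 20, 80) for s in soas]

pss, sigma = fit_toj(soas, props)
jnd = 0.6745 * sigma   # 50% -> 75% distance for a Gaussian (assumed JND definition)
print(pss, sigma, round(jnd, 1))
```

With real data the proportions would be noisy and a maximum-likelihood fit would normally be preferred; the grid search is used here only to keep the example dependency-free.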
Simultaneity constancy in the auditory/visual system
Judging whether visual and auditory stimuli originate from a single event requires that many potential
sources of difference between the timing of the two signals be taken into account. These
include the longer time it takes for sound to reach the ears as opposed to the negligible amount of
time it takes light to reach the eyes. This difference depends on the distance of the source from
the observer. Other differences result from the faster transduction of sound than of light, and from
the various intensities of both sounds and lights, which also contribute to the variability in
processing times (Spence et al., 2001).
To assess the effect of the different velocities of light and sound, we had subjects sit at various
distances from a computer that generated light and sound stimuli that could be staggered in time.
In addition to varying the distance we also varied the effective intensity of the visual stimulus by
viewing it through dark glasses in one condition and with peripheral vision in another condition.
Both of these manipulations slow the processing of the visual stimulus (Wilson and Anstis, 1969;
Nickalls, 1996).
[Figure 1 plot. Vertical axis: time (ms; negative = sound first). Horizontal axis: distance between observer and stimuli (m). Legend: central viewing, eccentric viewing, dark glasses, speed of sound, complete compensation.]
Figure 1. The effect of distance, intensity, and eccentricity on judgments of simultaneity. Subjects
were presented with lights and sounds from a computer at various distances either during foveal
viewing without any glasses (filled circles), wearing dark glasses (filled squares) or eccentric
viewing (open triangles). The point of subjective simultaneity (average of 10 subjects) is plotted
as a function of distance. The horizontal dashed line indicates true simultaneity; the slanted
dotted line indicates the delay of sound due to its slower speed. Data from Kopinska and Harris,
Figure 1 shows that factors that introduce differences in the timing of the light and sound stimuli
do not affect the PSS. Sound takes about 3 ms to travel 1 m. Since our corridor was 32 m long,
this should add up to a 96 ms lag for the sound, as indicated by the dashed angled line in figure 1.
The dark glasses added about 30 ms to the processing time of light, as expected from Pulfrich
(Pulfrich, 1922) and measured by reaction times to the present stimuli. Wearing dark glasses
should therefore have shifted the function by this amount. Similarly eccentric viewing delays the
detection of visual stimuli as measured by simple reaction times by about 40 ms for the present
stimuli. These changes would be expected to shift the curves by these amounts. However, there
were no statistically significant differences between the three curves (Kopinska and Harris, 2004)
indicating that all these variable delays are taken into account before the point at which the brain
integrates the information to decide on their relative timing.
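The arithmetic behind these expectations can be written out as a short sketch. It uses only the figures quoted above (about 3 ms per metre for sound, about 30 ms for dark glasses, about 40 ms for eccentric viewing); the function and constant names are hypothetical, and the calculation describes what would be expected if no compensation took place, which is the prediction the data contradict.

```python
SOUND_DELAY_MS_PER_M = 3.0    # approximate travel time of sound, as quoted in the text
DARK_GLASSES_DELAY_MS = 30.0  # added visual processing time with dark glasses
ECCENTRIC_DELAY_MS = 40.0     # added visual processing time with eccentric viewing

def uncompensated_sound_lag_ms(distance_m, extra_visual_delay_ms=0.0):
    """Extra lag of the sound relative to the light, compared with central viewing
    at zero distance, if none of the delays were compensated for."""
    return SOUND_DELAY_MS_PER_M * distance_m - extra_visual_delay_ms

corridor_m = 32.0
print(uncompensated_sound_lag_ms(corridor_m))                         # central viewing
print(uncompensated_sound_lag_ms(corridor_m, DARK_GLASSES_DELAY_MS))  # dark glasses
print(uncompensated_sound_lag_ms(corridor_m, ECCENTRIC_DELAY_MS))     # eccentric viewing
```

Under these assumptions the three viewing conditions would be expected to produce curves offset from one another by 30 to 40 ms and all rising with distance, whereas the measured PSS did neither.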
Notice that the differences in timing that the brain needs to take into account include some due to
outside sources (the speed of sound), some due to fixed neural constraints (the speed of
transduction of each sense), and some due to factors that vary from moment to moment (the
brightness of an object or its location on the retina).
Simultaneity constancy in the visual/tactile system
Another example of a cross-modal temporal comparison where timing differences need to be
taken into account for accurate TOJs is the comparison between visual and tactile stimuli. Tactile
stimuli, like auditory stimuli, are transduced faster than visual stimuli (King and Palmer, 1985;
Poppel et al., 1990) and this difference therefore needs to be compensated for if accurate
judgements about the relative timing of tactile and visual stimuli are to be made. To see how well
these factors were taken into account we measured TOJs between lights and touches on different
parts of the body.
[Figure 2 plot. Vertical axis: reaction time (ms). Horizontal axis: distance along body (cm). Annotations: longer reaction time to light than to touch on finger; transmission time from hand; internal processing; response generation time.]
Figure 2. The reaction times to touches on various parts of the body as a function of distance
from the centre of the head. There is an increase in reaction time with distance from the head.
Superimposed on this graph is the average reaction time to lights (grey bar). The reaction time to
a touch on the hand is pointed to by the picture of a hand.
Figure 2 shows that simple reaction times at different points on the body are longer for body parts
that are further from the centre of the head. Also, the reaction times to touches on the body are
generally faster than to lights such that the reaction time to a touch on the hand is about 40 ms
faster than to a light.
Visual and tactile stimuli on the hand are likely to originate in single events, such as when people
watch themselves manipulating an object or pressing a piano key. A simultaneity constancy
mechanism is required to resynchronize the brain’s representations of the two stimuli by
compensating for the temporal difference in processing times for these two aspects of what
should really be regarded as a single (bimodal) stimulus.
Figure 3 shows psychometric functions of visual/tactile temporal order judgments involving the
hand. The average PSS (shown by the vertical dotted line) is not significantly different from zero
despite the fact that touches on the hand are processed 40 ms faster than lights (vertical solid line).
Eagleman (Eagleman, in press) has found similar evidence in support of a simultaneity constancy
mechanism for visual/tactile processing.
[Figure 3 plots. (A) Number of times subject said “light first” against time between stimuli (ms), from touch first to light first. (B) Delay (ms) for the prediction and the PSS. Annotated values: prediction 34.2 ms; 4.2 ms.]
Figure 3. Perceived timing of lights and touches on the hand. (A) shows psychometric functions
(obtained using method 2 described above) from which the point of subjective simultaneity (the
50% point) could be measured for each subject. Each thin line shows an individual subject’s
data. The average PSS is shown by the vertical dashed line and the difference in reaction times
between the same stimuli is shown by the vertical solid line. The PSS and the difference in
reaction times (prediction) are compared in the histogram in B.
Interestingly, Harrar and Harris (in press) found that if the touches and lights were on different
parts of the body, correct judgments of simultaneity were not made. When touches and lights are
on different parts of the body they are likely to correspond to multiple events, suggesting that
spatial congruency is required for the binding of stimuli into a single event (Spence et al., 2003;
Spence et al., 2001; Spence and Squire, 2003; Soto-Faraco et al., 2004; Zampini et al., 2005a;
Zampini et al., 2005b).
Simultaneity constancy within a modality
The simultaneity detecting mechanism does not only need to deal with differences across senses.
Sometimes multiple stimuli detected by a single sense can evoke responses with different
latencies. In the touch system, for example, the time at which information reaches the brain
depends on the length of the nerves from the various parts of the body (Macefield et al., 1989).
When asked which of two touches on different parts of the body came first, the answer is
predictable not from which part was actually touched first, but from the differences in distance
from the head. There does not seem to be a simultaneity constancy mechanism within the touch
sense (Bergenheim et al., 1996; Harrar and Harris, in press).
There is also a demand for an intramodal simultaneity constancy mechanism within vision. A
single visual image stimulates different regions of the retina and those regions will be processed
at different speeds according to their eccentricity and the local intensity of the image (Wilson and
Anstis, 1969).
[Figure 4 plot. Axis: time difference between stimuli (ms), from dim first to bright first.]
Figure 4. Simultaneity constancy within the visual system. (A) Subjects viewed patches at an
eccentricity of 19 degrees. The patches could be either bright or dim, illustrated as white or grey
circles. Left and right pairs, separated by 14 degrees, were presented separately for each
judgment trial. Reaction times were measured to each patch individually. (B). Bright patches
were responded to 20 ms faster than dim patches, leading to the prediction that the dim light
would need to lead the bright light by 20 ms for them to be perceived as simultaneous. However
the PSS was not significantly different from zero.
In the experiment illustrated in figure 4, targets with different luminances were used. Reaction
times showed that these patches were processed at different speeds, but TOJs showed that the
timing difference was accounted for and simultaneity constancy was achieved, i.e. the PSS was
not significantly different from true simultaneity.
Occasions on which simultaneity constancy is not engaged
The present authors and others (Engel and Dougherty, 1971) have found simultaneity constancy
is engaged for various pairs of stimuli including light/sound, light/touch, and light/light pairs
(Sugita and Suzuki, 2003; Kopinska and Harris, 2004; Alais and Carlile, 2005). There are
however some situations in which simultaneity constancy is not engaged. We have already seen
that there is substantial evidence against a simultaneity constancy mechanism for multiple
touches along the body. When the simultaneity constancy mechanism is not engaged, differences
in the processing times of two stimuli resulting from either internal or external sources are
reflected in the PSS. That is, subjects make errors in judging which stimulus was first by not
taking into account factors that alter the timing of brain responses. For example, working in the
open air, Lewald and Guski (2003) found the PSS for light/sound pairs of stimuli to vary with
distance.
Another example comes from an experiment which has investigated simultaneity constancy in a
more realistic context (Arnold et al., 2005). When two disks move on a collision course they can
be seen either to pass through each other, or to bounce off each other. The latter interpretation is
made more likely if a sound accompanies the “collision”. The sound is most effective in doing
this if it is presented not at the instant of collision, but just prior to it – suggesting that the sound
and the visual event need to match in “brain time” rather than in real time (Arnold et al., 2005).
Furthermore, the most effective time varies with distance, indicating no compensation for the
different velocities of light and sound and no effective simultaneity constancy mechanism. Why
simultaneity constancy seems to be engaged in some circumstances and not others, is unclear and
currently being investigated.
Mechanisms of Simultaneity Constancy
Given that simultaneity constancy is engaged under many circumstances, it now behooves us to
explain how these variations in timing can be compensated for. There are two broad classes of
models: bottom up and top down.
A bottom-up model would involve altering the timing of information as it passed through the
nervous system such that by the time it arrived at the relevant decision-making site any timing
differences would already be removed. Such a scheme could only compensate for internal
variations, such as nerve lengths and transduction time differences since they are independent of
context, and do not change for a given stimulation – the conduction velocity from the hand to the
brain is independent of what touches the hand. Other factors, such as the delay introduced by the
speed of sound, require information that is context dependent (distance from the observer) and
not present in the stimulus itself (the distance from the observer is a percept created by the
observer). A bottom-up model cannot compensate fully for such things because they are not
fixed. Reaction time differences such as shown in figure 2 suggest empirically that no peripheral
compensation for timing differences takes place at a low level. If there were such bottom-up
peripheral compensation then reaction times to touches anywhere on the body would be the same.
Top-down explanations of simultaneity constancy involve taking into account the context in
which the stimulus components occur and using this as part of a reconstructive process to obtain
the correct relative timings. Intuitively it seems unlikely that all the sources of variation in timing
could be hardwired into the system or that it would be advantageous for it to be so. The system
could learn to interpret particular delays between two stimuli in a particular context as
corresponding to simultaneity. But if exposed to a new delay between two stimuli, a top-down
system could recalibrate assuming that this new pairing now corresponded to simultaneity in the
outside world. Fujisaki et al. (Fujisaki et al., 2004) showed that the simultaneity constancy
mechanism is indeed flexible in this way. They showed that the PSS between lights and sounds
could be shifted following a period of repeated exposure to pairs of stimuli separated by a
particular delay. Although flexibility is not proof of a top-down system, it is a defining
We replicated Fujisaki et al.’s experiment by presenting a time-staggered sound/light pair, with
the light coming on before the sound. We used headphones and an LED and presented the new
contingency repeatedly for 5 min. This caused subjects’ PSSs to shift 40 ms, but curiously not
towards the experienced stagger (figure 5). Although this result confirms that the PSS is flexible,
it is odd that after adapting to ‘light first’ as an indication of true simultaneity, ‘light first’ was not
subsequently interpreted as simultaneous! Fujisaki et al. (Fujisaki et al., 2004) adapted subjects
to several different temporal intervals and found that responses varied depending on the SOA
adapted to. However, as can be seen from their figures 2b and 2d, subjects’ shifts were all in the
direction of ‘sound first’ even after adapting to ‘light first’ stimuli. This suggests that there may
be a bias towards shifting the PSS in the direction of sound first, but it is not yet known why. A
possible confound may be that both Fujisaki et al. (2004) and Harrar and Harris (2005) used
headphones for the presentation of sounds which separated them spatially from the light and
made it more difficult to bind the stimuli.
[Figure 5 plot. Vertical axis: probability of choosing “light first”. Horizontal axis labels: -300 ms, -200 ms, -100 ms; light first; sound first.]
Figure 5. The result of adapting to a temporally staggered light/sound pair. After 5 min of
repeated exposure to a light followed 250 ms later by a sound (delivered through headphones)
(see insert) there was a significant shift in temporal order judgments between light and sound
(before, filled circles; after open circles; error bars over five subjects pooled). The vertical lines
indicate the PSS before (solid line) and after (dashed line). The PSS shifted 40 ms towards ‘sound
first’.
If reaction times were similarly affected by such an adaptation regime, it would be evidence for a
bottom-up process in which the total functioning of the system could be altered in response to
experience but which would then, blindly and without regard to individual context, operate with a
particular delay. We therefore repeated the experiment of Fujisaki et al. (2004) for all combinations
of light, sound, and touch stimuli and tested reaction times both to the stimuli within the pair that
was adapted to and to the third stimulus (the one not present in the adaptation phase). Figure 6
shows that reaction times to all three types of stimuli were unaffected by all adaptation regimes.
[Figure 6 panels: adapt to sound/light; adapt to sound/touch; adapt to light/touch. Each panel has a reaction time plot that includes a ‘no adapt’ point.]
Figure 6. Reaction times after adapting to staggered stimulus pairs. Subjects adapted to
combinations of light, sound, and touch stimuli that were staggered in time by 100 ms (left panel).
Sounds were presented through earphones while the lights and touches were on the subjects’
index finger. The graphs on the right show the mean reaction times for each stimulus (see key on
graph) and their standard errors. The middle point (labeled ‘no adapt’) is the reaction times
prior to adaptation. There were no significant differences for reaction times following any
These experiments show that reaction times, and the hardwired processing that leads to them, are
not altered by an adaptation regime that was capable of altering TOJs (figure 5). However, as we
will see, although the light/sound system’s properties can be altered by a few minutes of
experience, this does not seem to hold for pair comparisons involving the touch system. The general
robustness of reaction times, even after adapting to staggered light/sound pairs, is consistent with
temporal compensation not being a bottom-up process.
Another limitation of a bottom-up class of model is that if, for example, the processing time of a
particular stimulation were extended (in response to environmental demands or exposure to
contrived temporal pairing in the laboratory) this added delay in processing time would
subsequently be applied in all circumstances where that stimulation occurred regardless of which
other stimuli were present. For example if the processing of light were delayed to compensate for
an unusually slow sound coming from a distant event, the added delay would affect all
subsequent perceptions involving visual stimuli. To check if this was so we adapted subjects to
each of three combinations of time-staggered stimuli (light/sound, light/touch, sound/touch) and
after each adaptation, tested TOJs to all three combinations interleaved. After adapting to the
time-staggered sound/light combination, PSSs for the adapted pair shifted but the PSS for the
sound/touch and light/touch pairs did not (cf. figure 5). Thus there is not a general slowing or
speeding up of the processing of individual stimuli since although the experienced light/sound
pair was affected by the adaptation, the other pairs were unaffected despite having one member in
common with the pair for which the PSS did shift. This suggests that there are multiple,
independent, simultaneity constancy mechanisms as shown in figure 7 for the various multimodal
tasks that we may encounter.
Figure 7. Multiple, independent mechanisms for simultaneity constancy. The lack of cross
coupling between stimulus pairs after adaptation suggests that the perceived temporal aspects of
multimodal stimulus combinations are processed individually, as opposed to a single mechanism
that adjusts all individual components. On the far left are the individual stimuli which are then
paired with other stimuli involving independent comparisons which brings them into synchrony
separately, appropriate for various tasks, examples of which are described on the right.
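The reasoning that leads from the lack of cross coupling to independent pairwise mechanisms can be made explicit with a small sketch. It contrasts two hypothetical accounts: one in which adaptation changes the latency of a single modality, and one in which it changes an offset stored for a stimulus pair. The function names are illustrative, the 40 ms value is the shift reported for figure 5, only the magnitude of a shift (not its direction) is modelled, and only the result for light/sound adaptation is considered.

```python
from itertools import combinations

MODALITIES = ("light", "sound", "touch")
PAIRS = [frozenset(p) for p in combinations(MODALITIES, 2)]

def shifts_if_modality_latency_changes(changed_modality, shift_ms):
    """Per-modality account: every pair that contains the changed modality shifts."""
    return {pair: (shift_ms if changed_modality in pair else 0.0) for pair in PAIRS}

def shifts_if_pair_offset_changes(adapted_pair, shift_ms):
    """Pairwise account: only the offset of the adapted pair changes."""
    target = frozenset(adapted_pair)
    return {pair: (shift_ms if pair == target else 0.0) for pair in PAIRS}

def shifted_pairs(shifts):
    """Set of pairs whose PSS is predicted to move."""
    return {pair for pair, s in shifts.items() if s != 0.0}

# Pattern described in the text after light/sound adaptation:
# the light/sound PSS moved; sound/touch and light/touch did not.
observed = {frozenset(("light", "sound"))}

per_modality_matches = any(
    shifted_pairs(shifts_if_modality_latency_changes(m, 40.0)) == observed
    for m in ("light", "sound")
)
pairwise_matches = shifted_pairs(
    shifts_if_pair_offset_changes(("light", "sound"), 40.0)
) == observed

print(per_modality_matches, pairwise_matches)
```

Slowing or speeding either light or sound alone necessarily moves a second pair, so under these assumptions only the pairwise account reproduces the observed pattern.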
After adapting to time-staggered light/touch and sound/touch pairs we found no significant shifts
of the PSS between the stimuli adapted to. However, we did find a shift in the sound/light PSS
after adapting to the light/touch pair. This suggests that the sound/light system is more plastic
than other systems and can be adjusted in response to even a few minutes’ experience with a
staggered pair, whereas systems involving touch seem to be less plastic. Eagleman’s chapter in
this book (Eagleman, in press) has pointed to the importance of active involvement in engaging
the touch system. He suggests that the touch system is fundamentally different from the more
passive auditory and visual systems and is usually interpreted in terms of actions and the
positions of the limbs. Further, a touch on a given part of the body always occurs at a known
distance from the observer and does not need to be inferred the way it has to be for visual and
auditory signals. In the somatosensory homunculus each group of cells represents a specified
location on the body (Penfield and Rasmussen, 1950; Bergenheim et al., 1996). Bergenheim et al.
(1996) suggest that since the distance is a fixed attribute of the stimulation site it is also possible
that the relative time at which the stimulus occurred could be part of the signal; a “temporal
homunculus”. The lack of plasticity in the tactile system may be evidence for such a fixed
temporal homunculus.
If the simultaneity constancy model is a top-down process, then contextual elements such as
distance and velocity can be taken into account to pair the stimuli together more easily before specific
times are attributed to them. However, taking these factors into account may be associated
with large computational demands and these valuable resources should only be allocated when
necessary. Relevant stimuli need to be carefully selected from the vast array of potential
candidates, shortcuts applied if possible, and computationally demanding contextual elements
taken into account only when necessary. We propose a three stage model to achieve this.
Simultaneity constancy: a three stage model
In the first stage of the model individual stimuli that may be part of a single object or event are
identified based on temporal and spatial proximity to each other. In the second stage, a first
approximation is applied to the time difference between the stimuli, based on the most likely time
difference. In the third stage, specific adjustments are made that are based on experience with the
particular stimuli. Following the third stage, simultaneity constancy is achieved.
Figure 8. The three stage model for simultaneity constancy.
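The three stages can be sketched as a small pipeline. This is an illustrative reading of the model rather than an implementation taken from the chapter: the window sizes, the data structure, and the way a learned adjustment is applied are all assumptions, and the 40 ms fixed delays for touch and sound (when paired with light) are the approximate values the chapter gives for the second stage.

```python
from dataclasses import dataclass

@dataclass
class Stimulus:
    modality: str        # "light", "sound" or "touch"
    arrival_ms: float    # time at which its internal representation is available
    position_deg: float  # direction relative to the observer (illustrative)

# Stage 1 windows (assumed values within the broad ranges discussed in the chapter).
TEMPORAL_WINDOW_MS = 200.0
SPATIAL_WINDOW_DEG = 20.0

# Stage 2 fixed delays, keyed by stimulus pair: which member is slowed, and by how much.
# Pairs not listed receive no adjustment (an assumption).
STAGE2_FIXED_DELAY_MS = {
    frozenset(("touch", "light")): {"touch": 40.0},
    frozenset(("sound", "light")): {"sound": 40.0},
}

def stage1_same_event(a, b):
    """Stage 1: accept the pair only if it is close together in time and space."""
    return (abs(a.arrival_ms - b.arrival_ms) <= TEMPORAL_WINDOW_MS
            and abs(a.position_deg - b.position_deg) <= SPATIAL_WINDOW_DEG)

def perceived_asynchrony_ms(a, b, learned_adjustment_ms=0.0):
    """Corrected (b minus a) timing, or None if stage 1 rejects the pair."""
    if not stage1_same_event(a, b):
        return None
    delays = STAGE2_FIXED_DELAY_MS.get(frozenset((a.modality, b.modality)), {})
    t_a = a.arrival_ms + delays.get(a.modality, 0.0)   # stage 2: best-guess fixed delay
    t_b = b.arrival_ms + delays.get(b.modality, 0.0)
    return (t_b - t_a) - learned_adjustment_ms         # stage 3: experience-based tuning

touch = Stimulus("touch", 100.0, 0.0)
light = Stimulus("light", 140.0, 0.0)      # arrives 40 ms after the touch
far_light = Stimulus("light", 140.0, 90.0)

print(perceived_asynchrony_ms(touch, light))      # pair bound and resynchronized
print(perceived_asynchrony_ms(touch, far_light))  # pair rejected at stage 1
```

Keeping the stage 2 table keyed by pair rather than by modality reflects the suggestion that the mechanisms for different stimulus pairs are independent.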
Model first stage
The first part of the three stage model addresses the question of which stimuli are selected for
temporal compensation. A potential guide for identifying likely candidates is if they occur close
together in space and time (Bertelson and Aschersleben, 2003; Lewald and Guski, 2003; Spence
et al., 2001). Although there is considerable agreement that temporal and spatial windows
for integration exist, the sizes of these windows are less clear.
To measure the importance of spatial separation, we used targets that differed in brightness and
arranged them to be either close together (separated by 8 degs) or far apart (separated by 39
degs). Stimuli of different brightness show simultaneity constancy that overcomes the slower
processing of the dimmer stimulus (see above, figure 4). The pair that was close together showed
simultaneity constancy, correcting for a reaction time difference of over 20 ms. However, the
pair that was far apart did not. This suggests a spatial window for intramodal visual simultaneity
constancy between 8 and 39 degrees, consistent with previous studies of intermodal sensory
integration (greater than 20 degrees: Lewald and Guski, 2003; less than 50 degrees: Bertelson and
Aschersleben, 2003; Spence et al., 2001). This observation is also consistent with a similar
spatial constraint for the visual/tactile simultaneity constancy mechanism. For touch/light
simultaneity constancy, Harrar and Harris (2005) showed that compensation took place when the
stimuli were both presented on the hand and did not occur when one was on the hand and the
other on the foot. Similar results were reported by Spence et al. (2003), who found that PSSs did not
differ from true simultaneity when visual and tactile stimuli were presented at the same location,
while the PSSs did differ significantly from 0 when the two stimuli were presented in different
positions.
Thus, although there is considerable evidence for a spatial window, it seems to be rather broad.
Perhaps a temporal window will enable a more refined selection of stimuli for binding. There
must be a limit to a time acceptance window for simultaneity constancy, to prevent all the events
that ever happen being collected into a single time epoch and all being treated as simultaneous!
A simultaneity constancy mechanism should only be engaged in situations where a single event is
likely to have created stimuli in different modalities. For touch and light this corresponds to a
maximum of about 100 ms (see figure 2), but when sound is involved, the delay can be very
much larger since it depends on the distance of the source to the observer. Thus an integration
range in time corresponds to an operating range in distance.
When asked if two events are simultaneous, or which of two events came first, there is a
probabilistic uncertainty associated with the decision. Examples of these are shown in figure 9
which indicates an uncertainty window of about +/- 100 to 200 ms. For a light/sound comparison
this therefore corresponds to an operating distance range of 30-60 m. This interval corresponds to
that obtained by other methods of assessing the temporal integration window (Slutsky and
Recanzone, 2001; Fendrich and Corballis, 2001; Morein-Zamir et al., 2003; Lewald and Guski,
2003; Jaekl and Harris, in press).
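The conversion from a temporal window to an operating distance can be checked with a short calculation. It assumes the figure of about 3 ms per metre for sound used earlier in the chapter; the function name is hypothetical.

```python
def operating_distance_m(window_ms, sound_delay_ms_per_m=3.0):
    """Distance at which the travel time of sound alone fills the temporal window."""
    return window_ms / sound_delay_ms_per_m

near_limit = operating_distance_m(100.0)
far_limit = operating_distance_m(200.0)
print(round(near_limit, 1), round(far_limit, 1))
```

On this assumption a window of 100 to 200 ms corresponds to roughly 33 to 67 m, in line with the approximate 30-60 m range given above.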
[Figure 9 panels. Panel labels: from Stone et al. (2001); from Harris and Harrar (in press); from Fujisaki et al. (2004). Remaining labels: hand touch/hand light; delay (ms), from touch first to light first; prediction 34.2 ms; 4.2 ms.]
Figure 9. The width of the temporal integration window. Several studies using different methods
suggest that the window within which temporal integration occurs is between 100 and 200 ms.
(Data from Stone et al., 2001; Harrar and Harris, in press; Fujisaki et al., 2004.)
Only within these rather broad temporal and spatial windows can stimuli be combined into a
single percept and simultaneity constancy, either intramodal or intermodal, operate.
Model second stage
The first stage of the simultaneity constancy mechanism selects appropriate stimuli to be bound
into a single event. Once stimuli have been identified as possibly arising from the same event, we
propose that a second stage in the simultaneity constancy mechanism applies a best-guess, fixed
time delay to stimuli that brings the average relative time between them closer to zero and thus
reduces computational demands.
For example, as seen in figure 2, touches on the body are generally processed faster
than lights. So when a multimodal stimulus occurs that involves tactile and visual
stimulation, such as watching something touch the skin or looking at an object being manipulated
in the hand, the tactile input can be anticipated as leading the visual input by roughly 40 ms on
average. Adding a delay of around this amount to the representation of all tactile input thus
reduces the need for complex computations tailored to particular requirements and will be
approximately correct for most circumstances. Similarly, lights are generally processed more slowly
than sounds within a few meters, and so a fixed relative delay might also be applied to the neural
representation of all sound stimuli when they are paired with visual stimuli. The proposed delay
would be fixed for a given pair of stimuli as follows:
touch paired with light: slow touch by 40 ms
sound paired with light: slow sound by 40 ms
touch paired with sound: no adjustment
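The fixed rule can be sketched as a small lookup, shown here in Python purely as an illustration. The function name, the dictionary, and the example arrival times (100 ms for touch, 140 ms for light) are hypothetical. Only the 40 ms delays for touch and for sound when paired with light come from the text, and any pair without a rule is assumed to need no adjustment.

```python
# Fixed second-stage delays (ms), keyed by the unordered stimulus pair.
# Only the 40 ms values come from the chapter; all names are illustrative.
FIXED_DELAY_MS = {
    frozenset({"touch", "light"}): ("touch", 40.0),
    frozenset({"sound", "light"}): ("sound", 40.0),
}


def apply_second_stage(arrival_ms):
    """Return a copy of arrival_ms with the fixed delay added to one modality.

    arrival_ms maps two modality names to the times (ms) at which their
    neural signals arrive. Pairs without a rule are returned unchanged.
    """
    adjusted = dict(arrival_ms)
    rule = FIXED_DELAY_MS.get(frozenset(arrival_ms))
    if rule is not None:
        modality, delay = rule
        adjusted[modality] += delay
    return adjusted


# Hypothetical hand stimulation: the touch signal arrives 40 ms before the light signal.
hand = apply_second_stage({"touch": 100.0, "light": 140.0})
residual_ms = hand["light"] - hand["touch"]  # relative timing left after the fixed delay
```

With these made-up arrival times the residual asynchrony is zero, which is the sense in which a single fixed delay is "approximately correct" for the common case without any stimulus-specific calculation.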
If specific experience is not available, the fixed-rule second stage is all that can be applied. An
example of a situation in which specific experience is not available is when lights and touches are
presented in close proximity on the sole of the foot.
[Figure 10 panels: foot touch / foot light. Top panel: number of times subject said “light ...” against time between stimuli (ms), from touch first to light first; prediction 1.7 ms. Lower panel: prediction and PSS, delay (ms).]
Figure 10. PSS for touches and lights on the foot. The top panel shows the temporal order
judgments for each pair of stimuli as a function of the delay between the two components.
Positive and negative values on the horizontal axis indicate which of the stimuli was
presented first, as shown by the inserted cartoons. The grey lines show sigmoidal fits
through each subject’s data and the average data points and standard errors are
superimposed on these curves. The black curve is a sigmoid reconstructed from the
average PSS and standard deviation of all of these curves. The temporal delay at the PSS
indicated by the average curve is shown by a dashed vertical line and the prediction from
the reaction time differences is shown as a solid vertical line. In the histograms in the
lower panel, the reaction time prediction (black bar) is compared with the PSS (shaded
bar). The vertical axis shows the delays between the stimuli with the polarity indicated
by the inserted cartoons. Standard errors are also shown. The reaction time prediction is
zero, while the PSS requires light to be presented 30 ms before the touch on the foot.
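The way a PSS is read off a sigmoidal fit to temporal order judgments can be illustrated with a minimal sketch. It is not the fitting procedure used for figure 10: the logistic form, the coarse grid search, the sign convention (positive delays mean the light was presented first) and the data points are all assumptions, and the data are synthetic values generated from a known curve rather than measurements.

```python
import math


def logistic(soa_ms, pss_ms, slope_ms):
    """Probability of a 'light first' response at a given delay (ms).

    Positive soa_ms means the light was presented first (assumed convention).
    """
    return 1.0 / (1.0 + math.exp(-(soa_ms - pss_ms) / slope_ms))


def fit_pss(soas_ms, p_light_first):
    """Coarse least-squares grid search for the PSS and slope of a logistic."""
    best = None
    for pss in range(-200, 201):        # candidate PSS values in 1 ms steps
        for slope in range(5, 201, 5):  # candidate slope parameters in 5 ms steps
            err = sum((logistic(s, pss, slope) - p) ** 2
                      for s, p in zip(soas_ms, p_light_first))
            if best is None or err < best[0]:
                best = (err, pss, slope)
    return best[1], best[2]


# Synthetic proportions generated from a known curve (PSS 30 ms, slope 40 ms).
soas = [-200, -100, -50, 0, 50, 100, 200]
proportions = [logistic(s, 30, 40) for s in soas]
pss_ms, slope_ms = fit_pss(soas, proportions)
print(f"estimated PSS: {pss_ms} ms, slope parameter: {slope_ms} ms")
```

The PSS is the delay at which the fitted curve crosses 50%, so a positive estimate under this convention means the light had to lead the touch for the pair to seem simultaneous.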
Figure 2 shows that reaction times to touches on the foot and to lights happen to be approximately
the same. The prediction, if no compensation is present, is therefore that the PSS would be when
lights and touches were applied simultaneously to the foot. However, we found a shift in the PSS
away from accurate simultaneity such that subjects perceived simultaneity between the light and
the touch when the light was presented about 30 ms before the touch (figure 10): an anticompensatory shift. We interpret this as revealing the application of a fixed delay (a 40 ms delay to the
tactile stimulus) corresponding to the second stage in a simultaneity constancy mechanism. The
lack of experience of feedback from the foot meant that further fine adjustments could not be
made. If this “best-guess” average tactile delay were added to the processing of a tactile and
visual combination on the hand, it would allow veridical perception of simultaneity and relieve
the system of the need for further calculation, but on the foot it causes the temporal relations to be
misperceived. It is an interesting question as to whether the second stage fixed rules can be
adapted or whether the period for being plastic has passed for these unusual stimulus
Veridical perception of simultaneity usually requires specific experience with the stimulus pair in
question so that variations in timing specific to the stimuli and context involved, such as intensity
and distance, can be taken into account. The third stage of our model takes into account such
specific timing requirements.
Model third stage
The third stage of our model is where adjustments are made based on particular experience of the
stimulus pair in question. This is where things such as the speed of sound and distance from the
observer are taken into account – information that is not present in the stimulus and that must be
added from previous experience.
Evidence for this third stage has been presented above in many conditions where stimuli were
correctly perceived as simultaneous despite differences in processing time that could not be
assumed without reference to the stimulus situation. Examples described above included
compensation for the slower speed of sound for distances of up to 32 meters when paired with
lights (figure 1), and compensation for the variation with intensity of the processing times of
lights (figure 4). In these situations, the rough approximations that are added at the second stage
are “tweaked” at the third stage so that perception is veridical.
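The size of the distance-dependent correction involved can be sketched with a few lines of Python. The speed of sound (343 m/s) is an assumed value not given in the chapter, the function names are hypothetical, and the sketch says nothing about how the nervous system might implement the correction.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C


def sound_travel_delay_ms(distance_m, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Time (ms) for sound to cover distance_m; light travel time is ignored."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return 1000.0 * distance_m / speed_m_per_s


def compensated_asynchrony_ms(measured_audio_lag_ms, distance_m):
    """Audio lag (ms) remaining after subtracting the expected travel delay."""
    return measured_audio_lag_ms - sound_travel_delay_ms(distance_m)


for d in (1, 8, 16, 32):
    print(f"{d:>2} m: sound lags light by about {sound_travel_delay_ms(d):.0f} ms")
```

At 32 meters the expected lag under this assumption is a little over 90 ms, so a correction of that size must come from knowledge of the source distance rather than from any fixed rule.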
How this compensation might be achieved is a question to which we unfortunately have no answer.
The comparisons needed to answer either of the main questions used to assess the perception of
simultaneity (were they simultaneous? which came first?) must take place in memory, which is
necessarily accessed after the event. Thus, a reconstruction of the entire multimodal event may be
imagined taking into account as many factors as possible. This reconstructive process is flexible
enough to compensate for delays of up to at least 100 ms (such as found between the auditory and
visual components of events 32 meters away). If, however, the timing information is required
immediately for some purpose or decision which does not allow the luxury of reflection after the
event, it is possible that no adjustment can be made. The example of two disks colliding and
emitting a sound on contact (Arnold et al., 2005) may be one such task.
This chapter has reviewed the occurrence of an important general perceptual principle:
simultaneity constancy. A reconstructive process is involved that is able to resynchronize
asynchronous signals by taking into account many factors, both internal and external, which
would otherwise distort accurate knowledge of timing. Following the reconstruction, accurate
estimates can be made about the relative timing of events. There appear to be a multitude of
parallel mechanisms working on different combinations of stimuli. The rules of when to activate
the systems are not clear and the spatiotemporal constraints appear loosely defined. Without tight
spatial or temporal acceptance windows it seems likely that the system would be engaged
inappropriately and the timing of unrelated stimuli would often be brought into perceptual
alignment. Such accidental alignment, however, is unlikely to introduce a problem and appears to
be an acceptable side-effect of a system that manipulates the perception of space and time to suit
its needs.
This work was supported by a grant from the Natural Sciences and Engineering Research Council of
Canada to LRH. Vanessa Harrar and Philip Jaekl hold NSERC graduate fellowships. Many of the
experiments reported here were carried out by Agnieszka Kopinska and we are grateful for her
permission to use them here. We are also very grateful to Sugirhini & Shamini Selvanayagarajah
for their help with the data collection.
Reference List
Alais, D. and Carlile, S. Synchronizing to real events: subjective audiovisual alignment
scales with perceived auditory depth and speed of sound. Proc Natl Acad Sci U S
A. 2005 Feb 8; 102(6):2244-7.
Arnold, D. H.; Johnston, A., and Nishida, S. Timing sight and sound. Vision Res. 2005
May; 45(10):1275-84.
Bergenheim M, Johansson H, Granlund B, Pedersen J (1996) Experimental evidence for
a sensory synchronization of sensory information to conscious experience. In:
Hameroff, SR, Kaszniak, AW, Scott, AC (ed) Towards a science of
consciousness: The first Tucson discussions and debates. MIT press, Cambridge,
MA. pp. 301-310
Bertelson, P. and Aschersleben, G. Temporal ventriloquism: crossmodal interaction on
the time dimension. 1. Evidence from auditory-visual temporal order judgment.
Int J Psychophysiol. 2003 Oct; 50(1-2):147-55.
Dennett DC, Kinsbourne M (1992) Time and the observer: the where and when of
consciousness in the brain. Behavioural and Brain Sciences 15: 183-201
Eagleman D (in press) Recalibrating touch. In: Nijhawan, R (ed) Issues of space and time
in perception and action. Cambridge University Press, Cambridge, UK
Engel GR, Dougherty WG (1971) Visual-auditory distance constancy. Nature 234: 308
Fendrich R, Corballis PM (2001) The temporal cross-capture of audition and vision.
Perception and Psychophysics 63: 719-725
Fujisaki W, Shimojo S, Kashino M, Nishida S (2004 Jul) Recalibration of audiovisual
simultaneity. Nat Neurosci 7: 773-8
Gregory, R. L. Distortion of visual space as inappropriate constancy scaling. Nature.
1963 Aug 17; 199:678-80.
Harrar, V. and Harris, L. R. Simultaneity constancy: detecting events with touch and
vision. Exp Brain Res. in press.
Harrar V, Winter R, Harris LR (submitted) The effect of distance on unimodal and
multimodal apparent motion. Neuropsychologia
Jaekl P, Harris LR (in press) Time Shifting: A Direct Measurement of Auditory-Visual
Temporal Integration. Neuropsychologia
King AJ, Palmer AR (1985) Integration of visual and auditory information in bimodal
neurones in the guinea-pig superior colliculus. Exp. Brain Res. 60: 492-500
Kopinska A, Harris LR (2004) Simultaneity constancy. Perception 33: 1049-1060
Lewald, J. and Guski, R. Cross-modal perceptual integration of spatially and temporally
disparate auditory and visual stimuli. Brain Res Cogn Brain Res. 2003 May;
Macefield, G.; Gandevia, S. C., and Burke, D. Conduction velocities of muscle and
cutaneous afferents in the upper and lower limbs of human subjects. Brain. 1989
Dec; 112 ( Pt 6):1519-32.
McKee SP, Smallman HS (1998) Size and speed constancy. In: Walsh, V, Kulikowski,
JJ (ed) Perceptual constancy. Cambridge University Press, Cambridge, U.K. pp.
Morein-Zamir, S.; Soto-Faraco, S., and Kingstone, A. Auditory capture of vision:
examining temporal ventriloquism. Brain Res Cogn Brain Res. 2003 Jun;
Nickalls RWD (1996) The Influence of Target Angular Velocity on Visual Latency
Difference Determined Using the Rotating Pulfrich Effect. Vision Res. 36: 2865-2872
Penfield, W. and Rasmussen, T. 1950 The cerebral cortex of man. Macmillan, New York
Poppel, E.; Schill, K., and von Steinbuchel, N. Sensory integration within temporally
neutral systems states: a hypothesis. Naturwissenschaften. 1990 Feb; 77(2):89-91.
Pulfrich C (1922) Die Stereoskopie im Dienste der isochromen und heterochromen
Photometrie. Naturwissenschaften 25: 553-564
Slutsky, D. A. and Recanzone, G. H. Temporal and spatial dependency of the
ventriloquism effect. Neuroreport. 2001 Jan 22; 12(1):7-10.
Soto-Faraco, S.; Ronald, A., and Spence, C. Tactile selective attention and body posture:
assessing the multisensory contributions of vision and proprioception. Percept
Psychophys. 2004 Oct; 66(7):1077-94.
Spence, C.; Baddeley, R.; Zampini, M.; James, R., and Shore, D. I. Multisensory
temporal order judgments: when two locations are better than one. Percept
Psychophys. 2003 Feb; 65(2):318-28.
Spence C, Shore DI, Klein RM (2001 Dec) Multisensory prior entry. J Exp Psychol Gen
130: 799-832
Spence, C. and Squire, S. Multisensory integration: maintaining the perception of
synchrony. Curr Biol. 2003 Jul 1; 13(13):R519-21.
Stone RV, Hunkin NM, Porrill J, Wood R, Keeler V, Beanland M, Port M, Porter NR
(2001) When is now? Perception and simultaneity. Proc. Roy. Soc. Lond. B. 268:
Sugita Y, Suzuki Y (2003) Audiovisual perception: Implicit estimation of sound-arrival
time. Nature 421: 911
Von Békésy G (1963) Interaction of paired sensory stimuli and conduction in peripheral
nerves. J. Applied Physiol. 18: 1276-1284
Walsh, V. and Kulikowski, J. 1998 Perceptual constancy: why things look as they do.
Cambridge University Press, Cambridge, UK
Wilson JA, Anstis SM (1969) Visual delay as a function of luminance. American Journal
of Psychology 82: 350-8
Zampini, M.; Brown, T.; Shore, D. I.; Maravita, A.; Roder, B., and Spence, C.
Audiotactile temporal order judgments. Acta Psychol (Amst). 2005a Mar;
Zampini, M.; Guest, S.; Shore, D. I., and Spence, C. Audio-visual simultaneity
judgments. Percept Psychophys. 2005b Apr; 67(3):531-44.