Copyright 2009 by Human Factors and Ergonomics Society, Inc. All

PROCEEDINGS of the HUMAN FACTORS and ERGONOMICS SOCIETY 53rd ANNUAL MEETING—2009
16
Do Velocity Vectors Support Multiple Object Tracking?
Krista Oinonen1, Lauri Oksama1, Esa Rantanen2 and Jukka Hyönä1
1
University of Turku, Finland
Rochester Institute of Technology, Rochester, NY, USA
2
We examined the effect of velocity vectors on human ability to extrapolate movement of multiple moving
objects. This was achieved using an identity tracking task, in which objects (line-drawings of familiar objects) first move for 5 s, after which they temporarily disappear from view. In order to examine different
aspects of vector information, we compared 4 velocity vector conditions: (1) no vectors were displayed; (2)
only a history trail was displayed; (3) a condition where the end of the velocity vector pointed to exactly
where the objects would reappear after the masking; (4) a constant heading length was displayed. Based on
recent studies on Keane and Pylyshyn (2006) and Oinonen et al. (2009), we hypothesized that the condition, where exact location information is presented, would improve performance compared to baseline. The
results supported this hypothesis by indicating that history trail or directional vector do not improve performance but only complete visual information about the future location will help to anticipate movement
in multiple object tracking.
Copyright 2009 by Human Factors and Ergonomics Society, Inc. All rights reserved. 10.1518/107118109X12524440832665
INTRODUCTION
In many real-life dynamic visual environments, such as
traffic, sports, or air traffic control (ATC), it is important to
anticipate in what direction critical objects are about to move
and where they would be positioned in the near future. In order to aid to predict the targets’ trajectories into the future, velocity vectors (or, track velocity and direction vector lines) are
common in ATC plan view displays (PVD). The velocity vectors achieve a very simple thing of making the dynamic characteristics of targets (i.e., aircraft) on controllers’ display visually available. Without such visualization the controller
would be under a heavy cognitive burden having to rely on
past experience on the target dynamics (memory) and integration of a large number of circumstantial factors to predict the
targets’ trajectories into the future.
It appears that the putative benefits of visual representation of target dynamics were almost coincidental to the old radar technology and cathode ray tube (CRT) displays. Because
the radar antenna continuously rotates and thus ‘sweeps’ the
atmosphere for returns from solid objects, a target is only ‘illuminated’ for the moment the radar antenna is pointed at it;
phosphorus coating of the CRT displays provided an afterglow
of the radar return (Cole, 1985), forming a history trail for a
moving target for past 3-5 positions (depending on the phosphorus decay rate). From such history trails controllers can
glean targets’ direction, velocity (the distance between successive past positions), and rate of turn information, and extrapolate from these the future trajectories of the targets. In subsequent digital radar PVDs the history of the targets continued to
be provided, together with the computed velocity vector projected ahead of the target (Nolan, 1998). However, remarkably
little research has been published on the theoretical underpinnings of predictor displays in ATC.
But are vectors really helpful? Do they really help in
movement extrapolation? We raise these questions as recent
studies on tracking of multiple moving objects that temporarily disappear from the observer’s field of view have suggested
that observers are not able to update or anticipate objects’ lo-
cations (Keane & Pylyshyn, 2006; Oinonen, Oksama &
Hyönä, 2009). Our ability to extrapolate movement of multiple moving objects during interruption or occlusion hence
appears to be quite poor.
According to Keane and Pylyshyn (2006) observers can
only remember where the critical objects were located at the
time of their disappearance or slightly before that. Even knowing the objects’ movement directions does not help observers
to extrapolate and predict the future locations of multiple objects. According to the displacement hypothesis of Keane and
Pylyshyn, predicting object’s location is the poorer the farther
the objects move while invisible, causing bigger displacement
from their last visible locations.
These results seem to cast a shadow on the intuitive idea
that observers can easily anticipate multiple objects’ movement and benefit from velocity vector information. In particular, it follows from these studies that vectors including only directional information (heading or history) may not be beneficial at all, as they do not give any exact visual information on
targets’ future locations. Cognitively demanding movement
extrapolation process is still required. Instead, only vectors
that provide exact information on targets’ future location may
be truly helpful.
Because these experimental tasks are clearly analogous to
those of air traffic controllers, where the occlusion of targets
corresponds to the controller taking his or her eyes off the
PVD to perform other tasks, an ATC task offers an attractive
approach to deeper understanding of the contribution of velocity vectors to human multiple object tracking performance.
Past Research on Predictor Displays
Much of predictor display literature pertains to control
engineering (e.g., Kelley, 1962; 1968) and active vehicular
control with applications in nuclear submarines (Berbert &
Kelley, 1962) and lunar landing (Fargel & Ulbrich, 1963). For
a flavor of this body of literature and further examples of driving an automobile in traffic, a blind pedestrian using a cane or
electronic obstacle detector, and remote manipulation of solid
PROCEEDINGS of the HUMAN FACTORS and ERGONOMICS SOCIETY 53rd ANNUAL MEETING—2009
objects using artificial sensors and effectors, see Sheridan
(1966). Textbook descriptions of predictor displays are quite
short and lack any treatment of theoretical background. Kantowitz and Sorkin (1983) only cite Palmer, Jago, Baty, and
O’Connor (1980) about the benefits of predictor displays, and
Sanders and McCormick (1993) cite Kelley (1962; 1968) as
well as Roscoe, Corl, and Jensen (1981) and Jensen (1981)
about quickened displays.
As far as empirical human-in-the-loop (HITL) studies are
concerned, these come from either direct vehicular control or
airborne collision detection settings. Kelley (1968) cites several studies that have demonstrated the benefits of predictor
displays in vehicular control, for example controlling a spacecraft and correct rapidly changing thrust disturbances (Besco,
1964) and submarine displays in collision avoidance task
(McLane & Wolf, 1966). In the aviation domain most of the
predictor display research has been on advanced cockpit displays, either perspective flight displays (e.g., Jensen, 1981;
Grunwald, 1981, Merwin, & Wickens, 1996; Wickens,
Haskell, & Harte, 1989) or cockpit displays of traffic information (CDTIs; Hart & Loomis, 1980; Kelly, 1983; Kelly &
Abbot, 1984; Kelly & William, 1983; Palmer, Baty, & O'Connor, 1979; Palmer et al., 1980; Williams; 1983). These studies
also include HITL simulations with strong evidence about the
benefits of predictor displays to human performance in estimating future events (e.g., conflicts with other aircraft).
Palmer et al. (1980) examined straight and curved predictor lines in an airborne conflict detection task where subjects
estimated whether the intruder would pass in front of or behind the ownship. Predictors clearly aided the participating pilots’ performance when both aircraft were moving along
straight trajectories as well as when either aircraft was turning
if the predictors displayed turn-rate information. Interestingly,
the presence or absence of flight path history or the information update rate did not influence performance (although the
pilots preferred having history trails displayed and continuous
ownship position update).
In a similar experiment Hart and Loomis (1980) explored
the effect of CDTI velocity vectors on pilots' ability to judge
lateral relationships between their own and other aircraft. The
participating pilots judged as quickly as possible whether the
other aircraft would pass ahead of or behind the ownship at the
closest point of approach. Straight predictor lines reduced response time from about 38 s to 31 s and errors from about
15% to 0%in straight flight paths scenarios, and from 40 s to
34 s and from 28% to 5%, respectively, in curved flight path
scenarios. Straight predictors in curved flight path scenarios
did not help much.
Palmer (1983) also manipulated the quality of information
provided by the predictor vector in simulated conflict avoidance maneuvering tasks. The participating pilots were able to
avoid triggering conflict avoidance system (CAS) alarm 90%
of the time when the predictive information was free of noise,
but their performance suffered substantially (avoiding triggering the CAS alarm only 78% of the time) when predictor information was degraded.
In an eye-movement study on a CDTI Ellis and Stark
(1986) showed that the most common points of interest in that
17
display were the leader line predictors, with the aircraft symbol and history trail receiving respectively fewer visual fixations. These results attest to the value of the velocity vectors to
the subjects (8 experienced airline pilots) in their task (intruder
passing in front of or behind the ownship).
Although there seems to be ample empirical evidence for
predictor lines supporting task performance, it should be noted
that all the experimental tasks reviewed above were typically
very short in duration (quick judgments) and involved only a
few targets (typically only two, ownship and an intruder).
They were hence quite different from the ATC-like tasks of
interest to us in our study.
Velocity Vectors in ATC
There is little empirical research on different predictor
displays in ATC, despite their ubiquitous presence on most
modern ATC systems. Perhaps this is due to the long evolution of predictor information on ATC displays and the recent
research focus on the development of cockpit displays (i.e.,
CDTIs). Unfortunately, the results from CDTI research generalize poorly to ATC because the air traffic controller’s task is
fundamentally different from the pilot’s. In all of the experiments reviewed in the previous section the task was to estimate at the most two targets’ relative position in the immediate future, or compare a vehicle’s state to some criterion
value in direct vehicular control. However, while controllers
need to estimate aircraft trajectories and judge potential conflicts between aircraft in a pair, they also must maintain situation awareness on a much larger scale, involving tens of aircraft in busy sectors. The research paradigm of tracking multiple moving objects that are temporarily masked appears uniquely suitable for examination of the effects of velocity vectors on tracking performance in a task that is closely analogous to that of air traffic controllers.
The present study
In the present study we were interested in the influence of
velocity vectors on movement extrapolation. A version of
multiple identity tracking task (MIT, see Oksama & Hyönä,
2004, 2008) was employed, where at some point all moving
targets temporarily disappear from the screen (Oinonen, Oksama, & Hyönä, 2009). Participants were instructed to continue tracking the invisibly moving targets and extrapolate their
movement. When the targets reappeared, the participants’ task
was to indicate as fast as possible the current location of one
auditorily probed target. In MIT, the moving elements all have
distinct identities analogously to real-life tracking tasks; thus,
the task required continuous updating of what–where–
bindings. The task was designed to be a sensitive measure of
immediate situation awareness of the observers.
In order to examine different features of vector information, we compared four velocity vector conditions (no vectors,
only a history trail, velocity vectors corresponding to the
masking duration, and heading vectors with constant length)
over three target masking durations. Based on Keane and Pylyshyn (2006) and Oinonen et al. (2009) we expected that the
PROCEEDINGS of the HUMAN FACTORS and ERGONOMICS SOCIETY 53rd ANNUAL MEETING—2009
vectors corresponding to masking duration, and thereby visually presenting exact target location into the future, would
improve performance the most compared to the baseline.
METHOD
Participants
Twenty undergraduate psychology students (median age
24 years) from University of Turku, Finland, volunteered for
the study.
Apparatus and Experimental Task
The stimuli were presented on a 19-inch Samsung SyncMaster monitor with a resolution of 1280 by 1024 pixels controlled by NVIDIA GeForce FX5200 card, AMD Sempron
2400+, 1.66 GHz, 1.0 GB of RAM computer. The experiment
was programmed with the E-prime software (Schneider,
Eschman, & Zuccolotto, 2002a, 2002b). The program that
generated the motion sequences was written in Visual Basic.
The stimuli included 8 black-and-white line drawings
(familiar objects and animals), chosen from the picture corpus
of Snowgrass and Vanderwart (1980). The pictures were chosen so that their names in Finnish began with a consonant and
comprised 5 letters and 2 syllables. Their pronunciation time
was about 700 ms. The names were also matched with respect
to their frequency in a written language corpus (Laine & Virtanen, 1999). Pictures were 75 x 39 pixels in size, and extended a visual angle of 1.9 x 1.0 degrees.
The objects’ movement directions were randomly chosen
from among the four intermediate compass directions (northeast, south-east, south-west, and north-west). Object speed
was 2.42 degrees/s. Objects were allowed to move with intersecting trajectories. The objects moved for 5 s. after which
they disappeared from the display but continued to move. Participants were instructed to continue to track the invisibly
moving targets and extrapolate their movement during masking (disappearance). After a given time the objects reappeared
and the participants’ task was to immediately locate and indicate the auditorily designated target on the screen.
Design
Independent variables. We manipulated the masking duration (1680, 2520 and 3360 ms, resulting in the objects’ displacement during masking of approximately 4, 6, and 8 degrees of visual angle, respectively) and the number of targets
to be tracked (2 and 3 out of the total of 8 objects). We also
employed four velocity vector conditions: (1) in the baseline
(B) condition, no vectors were displayed; (2) in the track (T)
condition, only a history trail was displayed, indicating the objects’ movement direction but not their future positions; (3) in
the congruent (C) condition the velocity vectors corresponded
to the masking duration, that is, the end of the velocity vector
pointed to exactly where the objects would reappear after the
masking; (4) finally, in the heading (H) condition the objects’
velocity vector lengths were constant predicting the objects’
18
positions 2520 ms into the future. Note that this matched the
congruent condition when the masking duration was 2520 ms,
but with masking duration of 1680 ms the vector pointed
farther and with 3360 ms masking duration closer than where
the objects would reappear. This condition tested whether the
length of the vector aided in anticipation of the targets’
movement. In all visual vector conditions (track, congruent
and heading) the vectors were presented in all eight objects.
Dependent variable. The movement extrapolaton performance was measured by response time (RT) to locate a requested target at a moment of targets’ reappearance.
Design. The experimental design included three withinparticipant variables: set-size (2-3), masking duration (840 ms,
1680 ms, 2520 ms) and vector condition (B, T, C, or H). Each
of the resulting 24 conditions was replicated 10 times, resulting in a total of 240 trials for each participant.
Procedure
A total of 60 different object trajectories were created, for
10 different trajectories per set-size and masking duration. The
same trajectories were used in all vector conditions. The target
pictures to be identified and their mean distance from the
screen center at a moment they reappeared were matched
across all experimental conditions.
At the beginning of each trial, the objects were displayed
stationary for 1s. After that, a black frame flashed on and off
for ten times (3s) around the target objects. All objects then
began to move in a random and continuous fashion on the
screen. The tracking period lasted for 5 s, after which the objects disappeared for one of the three masking durations.
When the objects reappeared on the screen (they no longer
moved), participants heard simultaneously a name of one target picture via the earphones. They were instructed to locate
and point the specific target with the computer mouse as
quickly as possible. The mouse cursor was initially positioned
in the center of the screen. Participants’ eye-to-screen distance
(57 cm) was controlled using a chin rest.
The baseline, track, congruent and heading conditions
were presented in separate blocks. The other variables were
randomized within these blocks. The order of blocks was
counterbalanced across participants. There was a short rest period between the blocks. The entire session took about 100
min.
RESULTS
The results supported our hypothesis. The congruent condition where the vector length corresponded to the masking
duration and thus predicted the target position upon its reappearance, resulted in the fastest identification (see Figure 1).
Before analysis, erroneous responses (1.3%) were removed from the data. A few response time outliers (0.6%)
were also removed (z < -3.29 or z > 3.29). The remaining response time data were analyzed with a 2 (Set-Size) x 3 (Masking Duration) x 4 (Vector Condition) repeated measures analysis of variance (ANOVA). A Greenhouse-Geisser correction
was applied to the p-values whenever needed.
PROCEEDINGS of the HUMAN FACTORS and ERGONOMICS SOCIETY 53rd ANNUAL MEETING—2009
1300
1250
1200
Baseline
Track
ms 1150
Congruent
Heading
1100
1050
1000
1680
2520
3360
Masking duration
Figure 1. Response times (ms) as a function of masking duration (ms) and vector condition (baseline, track, congruent and
heading). Vertical lines depict standard error of means (±1
SEM).
The main effect of set-size was significant, F(1, 19) =
233.53, p < .001; RTs were lengthened as the set-size increased. The main effect of masking duration was significant
as well, F(2, 38) = 10.77, p < .001; RTs were lengthened as as
a function of masking duration. The main effect of vector condition was also significant, F(3, 57) = 4.49, p < .01. Planned
comparisons were conducted where the baseline condition was
compared to other vector conditions. Comparisons indicated
that the congruent vector condition differed from the baseline,
F(1, 19) = 6.51, p < .05); RTs were faster in the congruent
vector than in the baseline condition. The other vector conditions did not differ from baseline; track, F(1, 19) = 1.68, p >.
20; heading, F < 1.
The interaction between set-size and masking duration
was significant, F(2, 28) = 5.84, p < .05); with the effect of
masking duration was more robust for set-size three than setsize two. The other interactions were not significant (Set-size
x Vector Condition, F(3, 57) = 1.87, p > .10; Masking Duration x Vector Condition, F < 1; 3-way, F < 1).
DISCUSSION
Our study examined the influence of velocity vectors on
movement extrapolation when tracking multiple invisible objects. We tested this by using the MIT task where at some
point all moving targets temporarily disappeared from view.
The participants task was to immediately locate and indicate a
requested target after it reappeared on the screen. We compared four velocity vector conditions (no vector, only a history
trail, velocity vector corresponding to the masking duration,
and constant vector length) over three target masking durations.
Our results indicate that only the vector providing complete information about the target’s future location improved
19
movement extrapolation when compared to the condition
where no visual vectors were present. Surprisingly, neither the
history trail nor the forward heading vector with constant
length (not pointing to the future location in all trials) did improve performance. If anything, the history trail condition was
slightly worse than the baseline condition (no velocity information at all).
Our results suggest that only explicit visual information
about the future locations of invisibly moving targets will help
to anticipate their locations after occlusion. On the other hand,
movement history or directional information of targets’ trajectory does little to help on movement extrapolation with multiple targets. Our results are consistent with Keane and Pylyshyn’s (2006) displacement hypothesis. According to this hypothesis, when objects disappear from view, observers maintain the last visible locations of the target objects in the visual
short term memory (VSTM). When the objects reappear, this
VSTM representation is used as a reference in relocating the
targets. This view is also supported by Oksama and Hyönä
(2008), whose model of multiple identity tracking (MOMIT)
suggests that in order to maintain an up-to-date representation
of multiple moving targets focal attention is serially switched
between the targets. When focal attention is directed to a target element its identity-location binding is refreshed and updated. And when a target becomes unattented, or in this case
invisible, its last visible and attended location is stored in the
VSTM. Therefore, relocating a recently unattended target relies on stored information on that target’s last position. Congruent vectors provide explicit location information of the targets’ position after occlusion, which information can then be
stored in VSTM for future use. This is why congruent vectors
are superior to other vector conditions in multiple identity
tracking. This may also explain why the trail condition produced the worst performance. The trail may draw observers’
visual attention toward it and thus bias movement extrapolation in the wrong direction.
REFERENCES
Berbert, A. G., & Kelley, C. R. (1962). Piloting nuclear submarines with controls that look into future. Electronics,
XXXV (June 8, 1962).
Besco, R. O. (1964). Manual attitude control systems: Vol II.
Display format considerations (Technical report to the
NASA). Culver City, CA: Hughes Aircraft Company.
Cole, H. W. (1985). Understanding radar. London: Collins.
Ellis, S.R., & Stark, L. (1986). Statistical dependency in visual
scanning. Human Factors, 28(4), 421-438.
Fargel, L. C., & Ulbrich, E. A. (1963). Predictive controllers
applied to lunar landing. Instrumentation and Control
Systems, 36, 130-131.
Grunwald, A. J. (1981). Predictor symbology in computergenerated perspective displays. In J. Lyman & A Bejczy
(Eds.), Proceedings of the 17th Annual Conference in Manual Control (NASA-JPL Pub 81-95).
Hart, S. G., & Loomis, L. L. (1980). Evaluation of the potential format and content of a cockpit display of traffic information. Human Factors, 22(5), 591-604.
PROCEEDINGS of the HUMAN FACTORS and ERGONOMICS SOCIETY 53rd ANNUAL MEETING—2009
Jensen, R. S. (1981). Prediction and quickening in perspective
flight displays for curved landing approaches. Human
Factors, 23, 355-363.
Kantowitz, B. H. & Sorkin, R. S. (1983). Human factors: Understanding people-systems relationships. New York:
John Wiley & Sons.
Keane, B. P., & Pylyshyn, Z. W. (2006). Is motion extrapolation employed in multiple object tracking? Tracking as a
low-level, non-predictive function. Cognitive Psychology,
52, 346-368.
Kelley, C. R. (1962). Predictor instruments look to the future.
Control Engineering, 86, 86-90
Kelley, C. R. (1968). Manual and automatic control. New
York: Wiley
Kelly, J. R. (1983). Effect of lead-aircraft ground-speed quantization on self-spacing performance using a cockpit display of traffic information. NASA TP-2194.
Kelly, J. R., & Abbott T. S. (1984). In-trail spacing dynamics
of multiple CDTI-equipped aircraft queues. NASA TM85699. Hampton, VA: NASA Langley.
Kelly, J. R. & Williams, D. H. (1983). Factors affecting intrail following using CDTI. Proceedings of the Eighteenth
Annual Conference on Manual Control, Frank L. George,
ed., (AFWAL-TR-83-3021), pp. 485-513. U.S. Air Force.
Laine, M. & Virtanen, P. (1999). WordMill Lexical Search
Program. Turku, Finland: Centre for Cognitive Neuroscience, University of Turku.
McLane, R. C., & Wolf, J. D. (1966). Symbolic and pictorial
displays for submarine control. Paper presented at the
MIT–NASA Working Conference on Manual Control,
Cambridge, MA, February 28–March 2, 1966.
Merwin, D. H., & Wickens, C. D. (1996). Evaluation of perspective and coplanar cockpit displays of traffic information to support hazard awareness in free flight (Technical
Report ARL-96-5). Savoy, IL: Aviation Research Laboratory.
Nolan, M. S. (1998). Fundamentals of air traffic control (3rd
ed.). Pacific Grove, CA: Brooks/Cole—Wadsworth.
Oinonen, K., Oksama, L., & Hyönä, J. (2009). Movement
extrapolation and identity-location binding for invisible
objects. In submission.
Oksama, L., & Hyönä, J. (2004). Is multiple object tracking
carried out automatically by an early vision mechanism
20
independent of higher-order cognition? An individual difference approach. Visual Cognition, 11, 631-671.
Oksama, L., & Hyönä, J. (2008). Dynamic binding of identity
and location information: A serial model of multiple identity tracking. Cognitive Psychology, 56, 237-283.
Palmer, E. A. (1983). Conflict resolution maneuvers during
near miss encounters with cockpit traffic displays. Proceedings of the 27th Annual Meeting of the Human Factors Society. Santa Monica, CA: Human Factors Society.
Palmer, E. A., Jago, S. J., Baty, D. L., & O’Connor, S. L.
(1980). Perception of horizontal aircraft separation on a
cockpit display of traffic information. Human Factors,
22(5), 605-620.
Palmer, E., Baty, D., & O'Connor, S. (1979). Perception of
aircraft separation with various svmbols on a cockpit display of traffic information. Proceedings of the 15th Annual Conference on Manual Control. Wright-Patterson
Air Force Base, OH.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002a). Eprime user’s guide. Pittsburgh, PA: Psychology Software
Tools, Inc.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002b). Eprime reference guide. Pittsburgh, PA: Psychology Software Tools, Inc.
Sanders, M. A. McCormick, E. J. (1993). Human factors in
engineering and design. New York: McGraw-Hill.
Sheridan, T. B. (1966). Three models of preview control.
IEEE Transactions on Human Factors in Electronics,
HFE-7(2), 91-102.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set
of 260 pictures: Norms for name agreement, image
agreement, familiarity and visual complexity. Journal of
Experimental Psychology: Human Learning and Memory, 6, 174–215.
Roscoe, S. N., Corl, L., & Jensen, R. S. (1981). Flight display
dynamics revisited. Human Factors, 23, 341-353.
Wickens, C. D., Haskell, I., & Harte, K. (1989). Ergonomic
design for perspective flight path displays. IEEE Control
Systems Magazine, 9(4), 3-8.
Williams, D. H. (1983). Time-based self-spacing techniques
using cockpit display of traffic information during approach to landing in a terminal area vectoring environment. NASA-TM-84601.