Exaggerating Temporal Differences Enhances

Research Article
Harold Hill1 and Frank E. Pollick1,2
1Department of Psychology, University of Glasgow, Glasgow, Scotland, United Kingdom, and 2ATR Human Information
Processing Laboratories, Kyoto, Japan
Abstract—Humans are very good at perceiving each other’s movements. In this article, we investigate the role of time-based information in the recognition of individuals from point light biological
motion sequences. We report an experiment in which we used an
exaggeration technique that changes temporal properties while
keeping spatial information constant; differences in the durations of
motion segments are exaggerated relative to average values. Participants first learned to recognize six individuals on the basis of a
simple, unexaggerated arm movement. Subsequently, they recognized
positively exaggerated versions of those movements better than the
originals. Absolute duration did not appear to be the critical cue. The
results show that time-based cues are used for the recognition of
movements and that exaggerating temporal differences improves performance. The results suggest that exaggeration may reflect general
principles of how diagnostic information is encoded for recognition in
different domains.
Humans and other animals are very good at perceiving each other’s movements, or what is called biological motion (Johansson,
1973). Biological motion can be represented by moving point light
displays, which are very limited displays consisting only of the motion
of points corresponding to the joints of the actor. How the complex
patterns of relative motion in these displays are perceived and interpreted remains underspecified. Like other moving stimuli, point light
displays have spatial, temporal, and spatiotemporal properties. In this
article, we investigate the effects of varying temporal properties of
simple motion sequences, leaving spatial cues constant.
The movements of the points in displays of biological motion
allow people to immediately recognize the action depicted (e.g., walking). Further, the same limited information allows people to make
more specific categorizations with a reasonable degree of accuracy;
for example, observers may be able to determine the direction of
walking, the sex of the walker, and even the identity of the walker
(e.g., Cutting & Kozlowski, 1977; Mather & Murdoch, 1994). The
information that allows such decisions is not yet fully understood, but
there are at least two possibilities. These can be illustrated through the
example of categorizing sex from a moving point light display of
someone walking. One possibility is that people use motion as a cue
to three-dimensional shape. In the case of the walker, the relative
movements of shoulder and hip joints specify the center of moment,
and this captures differences in torso shape between males and females (Cutting, Proffitt, & Kozlowski, 1978). Alternatively, motion
itself may directly inform categorization. For example, the velocities
of shoulder and hip joints of the walker appear to specify sex even
Address correspondence to Harold Hill, Department of Psychology, University College London, Gower St., London WC1E 6BT, United Kingdom;
e-mail: [email protected]
VOL. 11, NO. 3, MAY 2000
when differences in torso shape have been eliminated (Mather &
Murdoch, 1994). In this case, a spatiotemporal motion cue, body
sway, is used directly.
Evidence that spatiotemporal properties are particularly important
for the perception of biological motion comes from masking studies.
For example, noise masks with the same properties as the motion
being masked are particularly effective for disrupting the perception
of biological motion, and this cannot be accounted for in terms of
spatial properties alone (Cutting, Moore, & Morrison, 1988). Purely
temporal properties are also important. For example, disrupting the
temporal synchronization of points in a biological motion display by
starting them at different places in their cycle disrupts perception of
the motion (Bertenthal & Pinto, 1994). Temporal presentation properties (e.g., frame durations and interframe intervals) also affect the
perception of biological motion. Further, the effects of these parameters on biological motion are distinct from their effects on other
motion displays, such as displays of simple translational motion
(Mather, Radford, & West, 1992; Shiffrar & Freyd, 1990, 1993;
Thornton, Pinto, & Shiffrar, 1998). In particular, biological motion
information is integrated over longer time periods than nonbiological
motion and adapts to stimulus-based properties, such as number of
cycles, rather than stimulus-independent properties, such as absolute
time (Neri, Morrone, & Burr, 1998).
In this study, we wished to test whether exaggerating temporal
properties can facilitate the recognition of biological motion. In the
spatial domain, exaggerating differences from the average—often referred to as caricaturing—facilitates recognition (e.g., of faces; Brennan, 1985; Rhodes, 1996; Rhodes, Brennan, & Carey, 1987).
Investigators have argued that the enhanced recognition of exaggerations suggests either that faces may actually be stored as exaggerations or that exaggerations provide better access to the stored
representation than their unexaggerated originals (Rhodes et al.,
1987). Examples of such “supernormal” stimuli that produce a stronger response than the actual stimuli they replace are common in the
animal world (Tinbergen, 1953, pp. 206–210). These have in common
that some property or properties of the stimuli appear critical, and
supernormal stimuli are produced only when these properties are
It is possible to exaggerate movement by exaggerating positions in
individual frames in a manner analogous to the way faces are exaggerated (Fidopiastis & Pollick, 1998). However, in this research, we
calculated and exaggerated differences in the temporal rather than the
spatial domain, that is, in terms of durations rather than distance. Any
exaggeration increases differences between stimuli, “stretching” the
underlying dimensions of variability. However, as with supernormal
stimuli, the stretching will affect performance only if these dimensions are critical for the task.
We used point light displays of the right sagittal view of the arm
Copyright © 2000 American Psychological Society
Temporal Exaggeration
movement involved in picking up, drinking from, and putting down a
glass. These sequences were divided into segments at key frames
specifying the start and end of each segment (see Fig. 1). Animators
use similar key frames, defining spatial positions in these frames
manually while generating intermediate frames automatically (Thomas & Johnson, 1981). As Figure 2a shows, all but one of the key
frames that we used fell at local minima on the velocity profile of the
wrist, that is, at relatively stationary points between periods of movement. We used the velocity profile of the wrist to determine the key
frames, as extremities of this kind are known to be important for the
perception of biological motion (Mather et al., 1992). The segments
between the key frames correspond to distinct parts of the motion
sequence—for example, the first segment corresponds to picking up
the glass. The key frame not at a minimum is at a point where the
velocity profile changed markedly and corresponds to the onset of
actual drinking.
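The segmentation idea described above can be sketched in a few lines of Python. This is only an illustration, assuming a two-dimensional wrist trajectory sampled at a fixed interval; the function names `tangential_speed` and `local_minima` are hypothetical, and the article does not say how the minima were located or whether the profile was smoothed first. The one key frame that did not fall at a minimum (the onset of drinking) is not covered by this sketch.

```python
import math


def tangential_speed(xs, ys, dt):
    """Speed between consecutive samples of a 2-D trajectory.

    xs, ys: wrist coordinates per frame; dt: time between frames.
    Returns len(xs) - 1 speed values.
    """
    return [
        math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) / dt
        for i in range(len(xs) - 1)
    ]


def local_minima(speed):
    """Indices where speed dips below the previous sample and does
    not exceed the next one (candidate key frames)."""
    return [
        i
        for i in range(1, len(speed) - 1)
        if speed[i] < speed[i - 1] and speed[i] <= speed[i + 1]
    ]


# Hypothetical trajectory: the wrist slows down once and speeds up again.
xs = [0.0, 1.0, 3.0, 4.0, 4.5, 6.0, 8.0, 9.0]
ys = [0.0] * len(xs)
speed = tangential_speed(xs, ys, dt=1.0)
candidates = local_minima(speed)
```

With real motion-capture data, some smoothing before the minimum search would probably be needed so that measurement noise does not produce spurious key frames.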
We temporally exaggerated sequences by scaling the durations of
the movement segments relative to average values. (See Fig. 2b for
the exaggeration of the velocity profile shown in Fig. 2a.) The actor
whose movement data the velocity profiles in Figure 2 were derived
from picked up, lifted, and drank from the glass in a shorter time than
average. In the exaggeration, the durations of these segments became
even shorter. Spatial properties like the distance traveled by points
between the consecutive key frames shown in Figure 1 remained the
same, so spatiotemporal properties, including peak and average velocity of segments, also changed when temporal properties changed.
The last two segments of this actor’s movement had durations close to
average and were therefore not much affected by exaggeration. (Additional technical details and examples of the animations used are
available on the World Wide Web at http://www.psy.gla.ac.uk/harry/
Temporal exaggerations allowed us to test the effect of varying
temporal properties of movement while leaving spatial properties
In this experiment, participants learned to recognize individuals
from their arm movements. The task and stimuli chosen were relatively arbitrary, but had the advantage that the task demands were
clearly specified and the information available was controllable. We
compared recognition performance for learned and exaggerated sequences, thus determining whether the properties affected by temporal
exaggeration are important for this task.
Fig. 1. Single frames taken from an animation sequence at points corresponding to the key frames shown in Figure 2. The
frames represent the maximum spatial extents of the different phases of movement for both unexaggerated and exaggerated
sequences. The images are shown in reverse polarity with enlarged dots; they are framed for illustrative purposes. The individual
frames represent (a) the start position, (b) picking up the glass, (c) the beginning of drinking, (d) the end of drinking, (e)
replacing the glass, and (f) the final position.
properties are constant. We predicted that temporal exaggeration would
improve performance if it affects dimensions of psychological space
that are important for recognition by stretching them in a way that
facilitates discrimination between individuals.
Sixteen students from the University of Glasgow took part in the
experiment. They were paid for their participation, which took about
an hour.
Fig. 2. Example velocity profile for one sequence from one actor (a)
and the same sequence after temporal exaggeration by a factor of +1
(b). The profiles are based on tangential wrist velocity. Dotted lines
indicate the positions of key frames. The profile in (b) illustrates a
time-varying exaggeration in which total duration was not normalized.
We tested both time-varying and time-normalized exaggerations.
For both types of exaggerations, the relative durations of segments
were produced in the same manner; that is, differences from average
values were increased by a scaling factor. However, for time-normalized exaggerations, we normalized the overall length of the
sequences so that this was always equal to the average value for the
actor involved. Comparing these two types of exaggeration provides
a test of whether the total duration of movement sequences is itself the
critical cue to identifying the actor.
We also varied the level of exaggeration by using different scaling
factors. We used two positive levels of exaggeration, with differences
increased by factors of +0.5 (+50%) and +1 (+100%); an unexaggerated condition, with differences unchanged (0%); and a negative level
of exaggeration, which was scaled by −0.5 (−50%). Positive exaggerations should be recognized better, and negative exaggerations
worse, if exaggeration enhances dimensions of variability
important for the task. The original learned sequences correspond to
the zero-exaggeration level of the time-varying exaggerations and
provide a baseline measure of performance.
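The two manipulations can be sketched as follows. This is a hedged illustration rather than the procedure actually used: the function names are hypothetical, the segment durations are made-up numbers, and the uniform rescaling in `time_normalize` is an assumption (the text says only that overall length was set to the actor's average). A level k leaves the sequence unchanged at 0, doubles each difference from the average at +1, and halves it at −0.5.

```python
def exaggerate(durations, average, k):
    """Scale each segment's difference from the average by (1 + k).

    durations: segment durations of one sequence.
    average:   average duration of each segment (same length).
    k:         exaggeration level, e.g. -0.5, 0, +0.5, +1.
    """
    result = [a + (1.0 + k) * (d - a) for d, a in zip(durations, average)]
    # A large k applied to a very short segment could give a
    # non-positive duration, which has no physical meaning.
    if any(r <= 0 for r in result):
        raise ValueError("exaggeration produced a non-positive duration")
    return result


def time_normalize(durations, target_total):
    """Rescale all segments uniformly so they sum to target_total
    (assumed here to be the actor's average total duration)."""
    scale = target_total / sum(durations)
    return [d * scale for d in durations]


# Hypothetical durations in seconds for three segments.
original = [0.75, 1.0, 1.5]
grand_average = [1.0, 1.0, 1.25]

plus_one = exaggerate(original, grand_average, 1.0)     # differences doubled
minus_half = exaggerate(original, grand_average, -0.5)  # differences halved
normalized = time_normalize(plus_one, target_total=sum(original))
```

Because only durations change, the distance covered in each segment stays the same, so segment velocities change along with the durations.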
In summary, the experiment tested whether changing temporal
properties affects recognition of biological motion even when spatial
We presented sequences as QuickTime animations using PsyScope
(Cohen, MacWhinney, Flatt, & Provost, 1993). In the initial stage, for
each sequence observers saw the first name of the person followed by
an original, unexaggerated example of the movement. Three different
sequences were shown for each of six individuals, making a total of
18 sequences. Participants did not make any responses during this
stage of the experiment.
The second stage of the experiment involved training with feedback. Participants saw 1 of the 18 learned sequences at random and
indicated who they thought it was. Responses were made using the
keys 1 to 6, each corresponding to a particular actor. A list of which
number corresponded to which actor was available throughout the
experiment. Automatic feedback indicated whether the response was
right or wrong. If the response was wrong, the correct name and
number were shown, and that sequence was shown again later in this
stage of the experiment. Participants trained until they had correctly
identified all 18 sequences twice each. The number of trials to criterion was recorded for each participant.
In the third and final stage of the experiment, participants saw a
mixture of the learned sequences and previously unseen, exaggerated
sequences in a random order. They responded in the same way as in
the second training stage, but there was no feedback and incorrectly
identified trials were not repeated. There were 144 trials in this stage
(6 actors × 2 types of exaggeration × 4 levels of exaggeration × 3
The core design was a 2 (exaggeration type: time-varying or time-normalized) × 4 (exaggeration level: −0.5, 0, +0.5, or +1) × 6 (actor)
within-participants factorial design. Exaggeration levels were the scaling factors, which were applied to the initial differences between the
sequence and the grand average.
We had recorded 12 repetitions for each actor when creating the
stimuli. In order to make use of all the repetitions and avoid effects
due to idiosyncrasies of particular sequences, we produced four versions of the experiment. Each version used a different random triplet
of the 12 available sequences for each actor. Version was included as
an additional between-participants variable.
Hits and false alarms for each actor in each condition were combined into a single measure of sensitivity, dL, an analogue of d′ based
on a logistic distribution (Snodgrass & Corwin, 1988). A dL of zero
corresponds to chance performance, and a dL of 5.4 corresponds to
perfect performance (3/3 hits and 0/15 false alarms) in any condition
of this experiment.
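As a rough check on these figures, the measure can be computed as below. The sketch assumes the logistic form dL = ln{[H(1 − FA)] / [(1 − H)FA]} together with the correction recommended by Snodgrass and Corwin (1988), in which 0.5 is added to each count and 1 to each number of trials; the article does not state that this correction was applied, but under that assumption 3/3 hits and 0/15 false alarms give ln(217), which rounds to the stated maximum of 5.4. The function name `d_l` is hypothetical.

```python
import math


def d_l(hits, n_targets, false_alarms, n_lures):
    """Logistic sensitivity measure dL from raw counts.

    Rates are corrected by adding 0.5 to each count and 1 to each
    trial total (an assumption here), so rates of 0 or 1 stay finite.
    """
    h = (hits + 0.5) / (n_targets + 1)
    fa = (false_alarms + 0.5) / (n_lures + 1)
    return math.log((h * (1.0 - fa)) / ((1.0 - h) * fa))


# Perfect performance in one cell of the design: 3 hits out of 3
# target trials, no false alarms on the 15 trials of other actors.
ceiling = d_l(3, 3, 0, 15)

# Equal corrected hit and false-alarm rates give dL = 0 (chance).
chance = d_l(1, 3, 1, 3)
```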
Fig. 3. Performance as a function of exaggeration. Error bars show
standard errors.
The average number of trials to correctly recognize each sequence
twice in the training-with-feedback stage was 87, with a standard
deviation of 33. The minimum possible would have been 36 trials (18
× 2). Clearly, observers took time to learn these discriminations, and
there was considerable variation among observers.
A 2 (type of exaggeration) × 4 (level of exaggeration) × 6 (actor)
× 4 (version) analysis of variance showed a main effect of level of
exaggeration, F(3, 36) = 9.3, p < .001, and an interaction between
actor and version, F(15, 60) = 2.1, p < .05. There was no effect of
exaggeration type, nor were there any higher-order interactions (all
ps > .1).
The effect of level of exaggeration is illustrated in Figure 3. It is
clear that performance increased with increasing level of exaggeration, with positively exaggerated examples recognized better than the
learned examples. A post hoc Newman-Keuls test showed significant
differences (α = .05) between the −0.5 level of exaggeration and all
other levels of exaggeration and between the 0 and +1 levels of
exaggeration. A post hoc orthogonal linear contrast was consistent
with a linear relationship over the range tested, F(1, 36) = 26.8, p <
.05, with no significant nonlinear residual, F(2, 36) = 0.6, p > .1.
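For readers unfamiliar with orthogonal contrasts, the sketch below shows how a linear trend and its nonlinear remainder are separated for four equally spaced levels. It uses the standard orthogonal polynomial weights and is not the authors' analysis code. The means for the forward sequences are given only graphically (Fig. 3), so the input numbers are the four mean dL values reported in Footnote 1 for the backward-played sequences. The F ratios in the text also require error terms that are not reported, so only the contrast estimates are computed.

```python
# Orthogonal polynomial weights for four equally spaced levels
# (-0.5, 0, +0.5, +1). Each set sums to zero, and the sets are
# mutually orthogonal.
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)
CUBIC = (-1, 3, -3, 1)


def contrast(means, weights):
    """Weighted sum of condition means for one contrast."""
    if len(means) != len(weights):
        raise ValueError("need one weight per mean")
    if sum(weights) != 0:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * m for w, m in zip(weights, means))


def orthogonal(a, b):
    """True if two weight sets are orthogonal."""
    return sum(x * y for x, y in zip(a, b)) == 0


# Mean dL per level for backward sequences, as reported in Footnote 1.
backward_means = [2.9, 2.9, 3.2, 2.6]

linear_estimate = contrast(backward_means, LINEAR)
# The quadratic and cubic contrasts together make up the
# 2-df nonlinear residual.
quadratic_estimate = contrast(backward_means, QUADRATIC)
cubic_estimate = contrast(backward_means, CUBIC)
```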
The interaction between version and actor shows that some of the
actors were better recognized depending on the particular set of sequences. As this effect was not strong, was independent of the main
effect of level of exaggeration, and was not itself of theoretical interest, it was not pursued further.
The results confirm that changing temporal properties affects the
recognition of identity from biological motion1 even when the spatial
1. The stimuli used are convincing depictions of biological motion although we used only six points and represented only part of the body. To test
properties remain constant. Exaggerating temporal differences improved performance on a task involving discriminating among people.
This suggests that the physical properties that we altered—duration
and properties based on duration—are psychologically important for
the task of recognizing different individuals from their movements.
Exaggerating temporal differences from mean values facilitated recognition although participants had learned unexaggerated versions.
Although exaggerations were based on differences in duration, total
duration does not appear to be the critical cue, as there was no difference between time-varying and time-normalized exaggerations.
Although theories of object recognition (e.g., Biederman, 1987;
Marr & Nishihara, 1978) emphasize the importance of spatial information, primarily shape, it appears that temporally derived information is also useful for recognition. The effect of temporal exaggeration
shows that spatial cues, which were available (e.g., the distances
between points and the motion path), do not fully determine performance for this task. Participants reported using spatial cues (e.g., the
height to which the glass was lifted), but performance was not determined by spatial cues alone, as it was also sensitive to the temporal or
spatiotemporal cues altered by temporal exaggeration. This result is
consistent with previous evidence that temporal and spatiotemporal
properties of the signal and noise are critical in determining performance on tasks based on the perception of biological motion (Bertenthal & Pinto, 1994; Cutting et al., 1988; Thornton et al., 1998).
There was no effect of whether the absolute duration of the sequence was allowed to vary or was time normalized, ruling out absolute duration as the critical cue. A purely temporal cue like relative
duration might still be important, however. For example, the relative
durations of the segments of a movement may capture the rhythm of
movement and be useful for recognition. Temporal exaggeration
might increase individual differences in rhythm and thereby facilitate
recognition. However, spatiotemporal cues, such as peak velocity, are
also affected by temporal exaggeration, and it may be one or more of
these cues that is critical for recognition. Participants mentioned differences in both speed and duration as cues that they had used.
The lack of an effect of time normalizing suggests that whatever
whether the results are specific to the perception of biological motion or if they
reflect general properties of the perception of moving point stimuli, we conducted a shortened version of the experiment in which all sequences, both
learned and tested, were played backward. This manipulation does not affect
low-level properties of motion—points still move the same distances in the
same time. However, sequences played backward look jerky, perhaps because
they disrupt the rhythm of movement and violate implicit knowledge of how
people move. In the backward experiment, there was an effect of level of
exaggeration, F(3, 33) = 5.1, p < .05, but the pattern was very different from
that reported for forward sequences. Mean dLs (with standard errors in parentheses) were 2.9 (0.3) for the −0.5 exaggeration, 2.9 (0.3) for no exaggeration,
3.2 (0.3) for the +0.5 exaggeration, and 2.6 (0.3) for the +1.0 exaggeration.
(These means are higher than in the forward condition because participants
only had to distinguish between four rather than six people.) A Newman-Keuls
test showed a significant difference (α = .05) between +0.5 and +1 exaggerations (performance was worse for +1 exaggerations), but no other significant
differences. An orthogonal contrast was not consistent with a linear relationship, F(1, 33) = 1.1, p > .1, the nonlinear residual being significant, F(2, 33)
= 5.3, p < .05. There was a main effect of actor in this experiment, F(3, 33)
= 6.2, p < .05, but this was independent of level of exaggeration, p > .1. In
summary, the very different pattern of results found when sequences were
played backward is consistent with interpreting the results of the main experiment as being specific to the perception of biological motion.
the critical cue is, it is encoded in a relative rather than an absolute
way. It appears characteristic of cyclic biological motions that critical
properties are specified in terms of the number of stimulus cycles
rather than in terms of absolute duration (Neri et al., 1998). The results
reported here suggest that noncyclic biological motion is also encoded
in relative, stimulus-based terms. In summary, there are important
temporal or spatiotemporal cues diagnostic for recognition, and these
are encoded in relative, not absolute, terms.
Recognition performance increased with level of exaggeration.
Negative exaggerations were more poorly recognized than the unaltered sequences, and +1 exaggerations were recognized better than the
originals. The relationship appeared linear over the range tested. Any
exaggeration necessarily makes stimuli more discriminable by increasing the differences between examples, and exaggeration would
therefore be expected to facilitate discrimination between exaggerated
sequences. However, the current results go further in that exaggeration facilitated recognition of previously learned, unexaggerated sequences. This suggests that temporal exaggeration enhanced
psychologically important information about motion that was encoded
from the learned, unexaggerated sequences.
In the case of face recognition, it has been suggested that exaggerations are associated with advantages because faces are stored as
deviations from the average in a hypothetical “face space” (Valentine,
1991; for a review, see Rhodes, 1996). The representations might
themselves be exaggerations; this would directly explain improved
performance, as exaggerations would be a closer match than nonexaggerated examples. Alternatively, exaggerations might provide better
access than unexaggerated examples to representations that are themselves veridical (Rhodes et al., 1987); this would be possible within a
scheme in which both the local density of the representation space and
the distance between representations are important (Krumhansl,
1978). The results presented here are consistent with the possibility
that movements are encoded relative to an average spatiotemporal
template or prototype (Posner & Keele, 1968). However, the effects of
exaggeration can be accounted for within other representational
schemes, for example, an exemplar-based account (Medin & Schaffer,
1978; Nosofsky, 1986). It is beyond the scope of this article to determine the nature of the representation involved, but the results do
show that temporal information is diagnostic and must be accessible
from whatever representation is used (cf. Schyns, 1998). The current
results also extend the face-caricature effect to a different domain and
suggest that exaggeration may reflect general principles of how information is encoded for within-class discriminations.
For the most part, participants did not report being aware that the
timing was being varied. With sequential presentation, as used in this
experiment, differences are not, in general, readily apparent, although
they can be seen when sequences are presented simultaneously. In the
cases in which there was an obvious difference, such as when the glass
was picked up particularly fast, this was reliably associated with a
particular person and so provided a cue to recognition regardless of
how natural it looked. Similarly, even when facial caricatures are
obviously distorted, they are still clearly recognizable.
Although the movement, task, and stimuli used here were relatively arbitrary, we think that the results reflect general properties of
how biological motion is perceived. It should be possible to temporally exaggerate any movement that can be divided into segments.
Indeed, the widespread use of key frames by animators suggests that
this would normally be the case. Exaggerating the durations of these
segments would be expected to enhance performance on tasks requiring discrimination between the motions involved, at least when temporal or spatiotemporal properties are diagnostic. Our recognition task
is just a particular case of a within-category discrimination, and we
would expect that the physical and psychological dimensions varied
here would be important for other tasks involving the categorization
of biological motion.
In conclusion, temporal properties are important to how biological
motion is perceived, and exaggerating differences appears to facilitate
recognition on the basis of biological motion. The results generalize
the effect of exaggeration to another domain and are consistent with
exaggeration reflecting general properties of the way information for
within-class categorizations is encoded or accessed.
Acknowledgments—We would like to thank the actors and participants
who took part in the experiment. We also thank Pascal Mamassian, Philippe Schyns, Donald Morrison, and the two reviewers for helpful and
insightful comments on an earlier draft of the article, and Alan Johnston for
related discussions. The work was supported by Engineering and Physical
Science Research Council Grant GR/M36052 to Frank Pollick.
Bertenthal, B.I., & Pinto, J. (1994). Global processing of biological motions. Psychological Science, 5, 221–225.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115–147.
Brennan, S.E. (1985). The caricature generator. Leonardo, 18, 170–178.
Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic
interactive environment for designing psychology experiments. Behavior Research
Methods, Instruments, & Computers, 25, 257–271.
Cutting, J.E., & Kozlowski, L.T. (1977). Recognizing friends by their walk: Gait perception without familiarity cues. Bulletin of the Psychonomic Society, 9, 353–356.
Cutting, J.E., Moore, C., & Morrison, R. (1988). Masking the motions of human gait.
Perception & Psychophysics, 44, 339–347.
Cutting, J.E., Proffitt, D.R., & Kozlowski, L.T. (1978). A biomechanical invariant for gait
perception. Journal of Experimental Psychology: Human Perception and Performance, 4, 357–372.
Fidopiastis, C.M., & Pollick, F.E. (1998). Recognition of exaggerated human movement.
Investigative Ophthalmology and Visual Science, 39, S1094.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis.
Perception & Psychophysics, 14, 201–211.
Krumhansl, C. (1978). Concerning the applicability of geometric models to similarity data:
The interrelationship between similarity and spatial density. Psychological Review,
85, 445–463.
Marr, D., & Nishihara, H. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London B,
200, 269–294.
Mather, G., & Murdoch, L. (1994). Gender discrimination in biological motion displays
based on dynamic cues. Proceedings of the Royal Society of London B, 258, 273–
Mather, G., Radford, K., & West, S. (1992). Low-level visual processing of biological
motion. Proceedings of the Royal Society of London B, 249, 149–155.
Medin, D.L., & Schaffer, M.M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.
Neri, P., Morrone, M., & Burr, D. (1998). Seeing biological motion. Nature, 395, 894–
Nosofsky, R. (1986). Attention, similarity and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Posner, M., & Keele, S. (1968). On the genesis of abstract ideas. Journal of Experimental
Psychology, 77, 353–363.
Rhodes, G. (1996). Superportraits: Caricatures and recognition. Hove, England: Psychology Press.
Rhodes, G., Brennan, S., & Carey, S. (1987). Identification and ratings of caricatures:
Implications for mental representations of faces. Cognitive Psychology, 19, 473–
Schyns, P.G. (1998). Diagnostic recognition: Task constraints, object information and
their interactions. Cognition, 67, 147–179.
Shiffrar, M., & Freyd, J.J. (1990). Apparent motion of the human body. Psychological
Science, 1, 257–264.
Shiffrar, M., & Freyd, J.J. (1993). Timing and apparent motion path choice with human
body photographs. Psychological Science, 4, 379–384.
Snodgrass, J.G., & Corwin, J. (1988). Pragmatics of measuring recognition memory:
Applications to dementia and amnesia. Journal of Experimental Psychology: General, 117, 34–50.
Thomas, F., & Johnson, O. (1981). The illusion of life: Disney animation. New York:
Thornton, I.M., Pinto, J., & Shiffrar, M. (1998). The visual perception of human locomotion. Cognitive Neuropsychology, 15, 535–552.
Tinbergen, N. (1953). The herring gull’s world. London: Collins.
Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion, and
race upon face recognition. Quarterly Journal of Experimental Psychology, 43A,
(RECEIVED 4/28/99; ACCEPTED 8/16/99)