QUADRUPED GAIT ANALYSIS USING SPARSE MOTION INFORMATION
David P. Gibson, Neill W. Campbell and Barry T. Thomas
Department of Computer Science
University of Bristol
Bristol
BS8 1UB, United Kingdom
[email protected]
ABSTRACT
In this paper we propose a system that recognises gait and
quadruped structure from a sparse set of tracked points. The motion information is derived from dynamic
wildlife film footage and is consequently extremely complex and noisy. The gait analysis is carried out on footage
that contains quadrupeds, usually walking in profile to the
camera; however, this part of the system is pose-independent
and useful for gait detection in general. The dominant motion is assumed to be generated by the background and its
relationship to the camera motion, enabling the background motion to be removed as
an initial step. Along with frequency analysis, an eigengait
model is used as a template to synchronise clusters of points
and to establish an underlying spatio-temporal structure.
Given this synchronised structure, further tracking observations are used to deform the structure to better fit the overall
motion. We demonstrate that the use of an eigengait model
enables the spatio-temporal localisation of walking animals
and assists in overcoming difficulties caused by occlusion,
tracking failure and noisy measurements.
1. INTRODUCTION
The motivation for this paper is a system for automatically
archiving and indexing large video databases of wildlife film
footage. In particular, the detection and classification of gaited animals are important for the efficient reuse of the
video content in terms of creating new wildlife productions.
The major difficulty in analysing such video data is the vast
range of content and variability of low-level image properties. The footage has been captured under many different conditions, causing dramatic variations in texture and particularly colour. This is typical of our data, and hence this paper concentrates purely on motion-based recognition as a
first step to higher level understanding of the video content.
Previous work on gait analysis has tended to focus on
the biometrics of human subjects. Standard approaches typically begin with motion analysis such as optical flow or Condensation, often in conjunction with background subtraction.
Periodic features are then extracted and used for recognition
[1, 2, 3] or are matched and fitted to statistical models of
motion [4, 5].
Understanding how things move over time is a powerful cue for recognition and classification; this has been demonstrated by work on Moving Light Displays (MLDs). Much work has been carried out regarding MLDs, indicating that, for humans at least, a small number of points moving in a particular way can convey a large amount of potentially useful information [6, 7]. It is these concepts that inspire the work
described by this paper.
2. METHOD
The video footage used in this work focuses on dynamic
outdoor scenes of walking animals. The content consists of
multiple motions derived from the movement of the animal,
often in addition to the apparent motion generated by a panning camera. Scenes often contain large sky regions, which
afford little or no texture information, as well as
complex foliage such as long grass. Other factors contribute
to the complexity of the scenes, such as wind animating foliage or unsteady camera motions. All of the above factors cause standard motion analysis techniques such as optical flow and block matching to become very unstable. For
this reason the Kanade-Lucas-Tomasi (KLT) point tracker
has been used [8]. Whilst the motion information generated by the KLT tracker is sparse, it is at least transformed
into a discrete set of trajectories, some of which are tracked
over many frames. For this paper 150 points per frame were
tracked.
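As an illustration of this step, a sparse KLT tracker of the kind described in [8] can be run with OpenCV's Shi-Tomasi detector and pyramidal Lucas-Kanade tracker. This is a minimal sketch, not the authors' implementation; the file name and all parameter values except the 150-point budget are assumptions.

```python
import cv2

# Illustrative sketch of sparse KLT tracking in the spirit of [8].
# "wildlife.avi" and the detector/tracker parameters are placeholders.
cap = cv2.VideoCapture("wildlife.avi")
ok, frame = cap.read()
prev_grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Select up to 150 good features to track, as in the paper.
points = cv2.goodFeaturesToTrack(prev_grey, maxCorners=150,
                                 qualityLevel=0.01, minDistance=7)

trajectories = [[p.ravel()] for p in points]  # one (x, y) list per point

while True:
    ok, frame = cap.read()
    if not ok:
        break
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: propagate each point into the new frame.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_grey, grey,
                                                  points, None)
    for traj, p, s in zip(trajectories, new_pts, status):
        if s:  # extend only trajectories that were tracked successfully
            traj.append(p.ravel())
    points, prev_grey = new_pts, grey
```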
2.1. Dominant Motion Removal
In this work the dominant motion, or non-motion, has been
assumed to be generated by image points belonging to the
background, i.e. non-animal parts of each image. Trajectories of points tracked over multiple frames are extracted
from the sparse motion information and a translational model
is fitted to the trajectories using RANSAC. The model is applied to the displacement vectors of points over n frames,
where 1 < n < 10. Points from the background tend to be
tracked over many frames and move in a consistent manner. The dominant cluster of trajectories can then be extracted, leaving the trajectories generated by the foreground
object of interest, see Figure 1. This sequence of footage
consists of 250 frames of a male lion walking to the left
whilst being tracked by a panning camera. The camera pan
is not smooth, causing the position of the animal to vary
over time. However, the pan is consistent enough to be able
to extract the background motion over the entire sequence.
In Figure 1 the background motion is measured as a pan to
the right with a velocity vector of (4.21, −0.02) pixels per
frame.
Fig. 1. Top left is frame n, top right frame n + 1. Bottom left shows the dominant background trajectories, bottom right the remaining foreground object of interest. The
background is measured as moving with a velocity vector of
(4.21, −0.02) pixels per frame.
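The dominant-motion fit lends itself to a very small RANSAC: since a translation is fully determined by a single displacement vector, each hypothesis is just one sampled trajectory's mean displacement, and the dominant cluster is the largest consensus set. A minimal sketch under that reading, with an illustrative iteration count and inlier tolerance:

```python
import numpy as np

def dominant_translation(displacements, n_iters=200, tol=1.0, rng=None):
    """RANSAC fit of a translational model to per-trajectory
    displacement vectors (shape (N, 2), pixels per frame).

    Returns the dominant velocity and a boolean inlier mask; inliers
    are taken as background, the rest as the foreground object.
    n_iters and tol are illustrative choices, not the paper's values.
    """
    rng = rng or np.random.default_rng()
    best_inliers = None
    for _ in range(n_iters):
        # A translation is determined by a single sampled displacement.
        candidate = displacements[rng.integers(len(displacements))]
        residuals = np.linalg.norm(displacements - candidate, axis=1)
        inliers = residuals < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Re-estimate the velocity from the full consensus set.
    velocity = displacements[best_inliers].mean(axis=0)
    return velocity, best_inliers
```

Foreground trajectories are then simply those outside the returned consensus set.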
Fig. 2. Three frames out of the 120 frames of a horse walking on a treadmill. Twelve key points were manually tracked
and labelled for each frame. Lines depicting a crude skeletal
form have been added for clarity.
Fig. 3. Responses of each frame of the labelled horse frames
to the first three principal modes of the eigengait space.
2.2. Generation of an Eigengait Space
A sequence of images of a horse walking on a treadmill was
used as an exemplar of quadruped gait. The sequence consists of 120 frames, approximately four cycles of the gait,
with each frame being hand labelled with 12 (x,y) key points
manually tracked over the sequence, see Figure 2. Principal component analysis was carried out on the resulting 120 24-dimensional vectors to generate an eigengait space, see Figure 3.
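The eigengait construction is standard PCA on the stacked key-point vectors. A minimal sketch, assuming the hand-labelled coordinates are available as a (120, 24) array; the file name is a placeholder:

```python
import numpy as np

# X: 120 frames x 24 values (12 hand-labelled (x, y) key points per
# frame), assumed to be loaded from the labelled treadmill sequence.
X = np.load("horse_keypoints.npy")           # placeholder file name
mean_shape = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean_shape, full_matrices=False)

# Rows of Vt are the eigengait modes; project every frame onto the
# first three principal modes, as plotted in Figure 3.
responses = (X - mean_shape) @ Vt[:3].T      # shape (120, 3)
```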
The responses of Figure 3 were then approximated by a temporally ordered Gaussian Mixture Model (GMM) [9] to give
a continuous gait model as shown in Figure 4.
Fig. 4. A temporally ordered GMM of the responses in Figure 3. A spline has been drawn through the centres of each mixture component.
2.3. Directionality and Spatial Division
To establish an initial structure of the foreground point cloud,
heuristics based on general observations of walking animals are used. These include the generalities that the scene is oriented such that sky is up and that animals tend not to walk
upside down nor walk backwards. Over a period of time,
given the way the camera is moving or the general direction of the foreground point cloud, a front-to-back, top-to-bottom
assignment is made. The spatial mean of the foreground
points is then used as the centroid of the assumed animal
and the points are subdivided into quadrants as shown in
Figure 5.
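A sketch of the subdivision, assuming the front/back direction has already been resolved by the heuristics above and that image y grows downwards; the integer labelling is illustrative:

```python
import numpy as np

def assign_quadrants(points, facing_left=True):
    """Split foreground points (N, 2) into quadrants about their centroid.

    Returns integer labels: 0 = top front, 1 = top rear,
    2 = bottom front, 3 = bottom rear (image y grows downwards).
    """
    cx, cy = points.mean(axis=0)
    front = points[:, 0] < cx if facing_left else points[:, 0] > cx
    top = points[:, 1] < cy
    return np.select(
        [top & front, top & ~front, ~top & front],
        [0, 1, 2],
        default=3,
    )
```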
Subject         Frequency (Hz)
Horse           1.67
Lion            1.57
Wildebeest      1.72
Human†          1.91
Horse (trot)    2.84

Table 1. Frequencies of different creatures' walking gaits. † As calculated from [1].
Fig. 5. The centroid of the foreground points is used to subdivide them into quadrants, with foreground/background motion determining the direction of movement as described in Figure 1.
2.4. Frequency Analysis and Synchronisation
If a gait-like frequency can be detected, the tracking observations can be matched to, and synchronised with, the eigengait model. To establish the existence of a gait, frequency analysis is used. The trajectories generated by the KLT tracker contain noise; in addition, they contain spatial drifts caused by irregular camera motion or animal movement. This confounds the frequency analysis of individual trajectories. To overcome these difficulties, the mean of the vertical differences is used to establish the fundamental frequency of the gait,
$$ f(t) = \frac{\sum_{i=1}^{n} \left( y_t^i - y_{t-1}^i \right)}{n} \otimes G(t, \sigma) \qquad (1) $$
where $y_t^i$ is the vertical position of the $i$th of the $n$ tracked points in a quadrant at frame $t$, and $G(t, \sigma)$ is a Gaussian kernel; the size of the kernel for this work was 9 units of time.
Figure 6, top, is the resulting signal observed for the top rear
quadrant of the lion walking sequence. By using the vertical
differences, as opposed to positions, the effects of poor camera tracking are greatly reduced, and with temporal smoothing a more representative signal is observed. The maximum-magnitude component of the Fourier spectrum gives the fundamental gait frequency. This is matched against a set of previously measured fundamental gait frequencies, Table 1, and if it falls within an allowable range the presence of a gaited animal is assumed. At least one cycle of gait is required to give a reasonable estimate of the fundamental frequency. A useful aspect of this approach, with respect to gait detection, is that the fundamental frequency detection is not dependent on the 3-dimensional position or pose of a creature.
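Equation (1) and the subsequent peak picking can be sketched as follows. The frame rate and the acceptance band are assumptions, needed only to convert the dominant FFT bin into Hz and compare it with Table 1:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def fundamental_frequency(y, fps=25.0):
    """Estimate the fundamental gait frequency for one quadrant, per (1).

    y: (T, n) array of vertical positions of the n tracked points in
    the quadrant over T frames. fps is an assumption; the paper does
    not state the frame rate.
    """
    # Mean vertical difference per frame, smoothed by a Gaussian kernel.
    # sigma=2, truncate=2 yields a 9-tap kernel, matching the paper's
    # "9 units of time"; the sigma value itself is an assumption.
    f = gaussian_filter1d(np.diff(y, axis=0).mean(axis=1),
                          sigma=2.0, truncate=2.0)
    spectrum = np.abs(np.fft.rfft(f - f.mean()))
    freqs = np.fft.rfftfreq(len(f), d=1.0 / fps)
    return freqs[spectrum[1:].argmax() + 1]   # skip the DC bin

# Detection in the spirit of Table 1: accept a walking gait if the
# estimate lies near the tabulated walk frequencies (band is illustrative).
def is_walking_gait(freq, lo=1.2, hi=2.2):
    return lo <= freq <= hi
```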
Fig. 6. The observed signal for the top rear quadrant of the first 100 frames of the lion walking sequence (top). The same signal from the horse-based eigengait model (middle). The new signal (bottom) is generated by warping the model signal to the observed signal.
Figure 6, middle, shows the equivalent top rear signal
generated by the horse-based eigengait model. By synchronising the model gait frequency with the observed lion gait frequency we can generate a new spatio-temporal structure (STS) that walks in phase with, and at the same rate as,
the observed point cloud, Figure 6, bottom. This is accomplished by detecting maxima and minima in the observed
and model-generated signals and then performing a piecewise linear interpolation. The matched temporal control points can now be projected out of the eigengait space to generate a new STS, synchronised with the observations. The STS is spatially located using the centroid of the cloud of points as well as the spatial mean of the top rear and front quadrants, Figure 7. It should be noted that the model-based information does not consider any 3-dimensional depth information, such as nearside legs as opposed to farside legs.
Fig. 7. The new synchronised and spatially updated STS points for the first 100 frames of the lion sequence (circles); the original horse-based eigengait model is shown as crosses.
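The synchronisation step might be sketched as follows: extrema are detected in both signals, matched in order, and a piecewise-linear map between their time stamps warps the model signal onto the observed one. scipy's peak finder stands in for the authors' unspecified extremum detector, which is an assumption:

```python
import numpy as np
from scipy.signal import find_peaks

def synchronise(model, observed):
    """Warp a 1-D model gait signal onto an observed one by matching
    alternating maxima/minima and interpolating piecewise-linearly.

    Matching extrema simply in temporal order is a simplification of
    whatever correspondence the authors used.
    """
    def extrema(sig):
        peaks, _ = find_peaks(sig)
        troughs, _ = find_peaks(-sig)
        return np.sort(np.concatenate([peaks, troughs]))

    m_ext, o_ext = extrema(model), extrema(observed)
    k = min(len(m_ext), len(o_ext))          # match extrema in order
    # Map each observed frame index to model time via the piecewise-
    # linear warp, then resample the model signal at those positions.
    warped_t = np.interp(np.arange(len(observed)), o_ext[:k], m_ext[:k])
    return np.interp(warped_t, np.arange(len(model)), model)
```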
3. RESULTS
Three frames from the lion walking sequence are shown in Figure 8 with the STS overlaid. The system described in this paper has been successfully applied to other walking-animal sequences, including an elephant, a cheetah and a zebra. Additionally, gait recognition rates of over 80% have been achieved using a threshold on the frequency analysis.
Fig. 8. Frames 12, 44 and 85 from the lion sequence with the spatio-temporal structure overlaid.
4. CONCLUSIONS
In this paper we have shown that a horse-based eigengait model can be used to generate a spatio-temporal structure for other quadrupeds given a sparse set of motion trajectories. The model-based approach enables tracking errors such as noise, occlusion and tracking failure to be overcome. After frequency analysis and synchronisation, a new spatio-temporal structure is updated based on measurements that fit the expected model evolution. It has been shown that the STS can track a sparse set of trajectories over a large number of frames without losing lock on the foreground point cloud. Additionally, the system offers reliable pose-independent gait detection. Future work includes taking into account more complex camera motions, such as zooms and lens effects, and instances of multiple animals.
5. ACKNOWLEDGEMENTS
This work is sponsored by the DTI Broadcast Link Programme. Thanks are due to the BBC Natural History Unit (Bristol) and Matrix Data Ltd.
6. REFERENCES
[1] J. J. Little and J. E. Boyd, "Recognising People by Their Gait: The Shape of Motion," Videre: Journal of Computer Vision Research, vol. 1, no. 2, 1998.
[2] R. B. Polana and R. C. Nelson, "Nonparametric Recognition of Nonrigid Motion," Tech. Rep. TR575, University of Rochester, 1995.
[3] R. Cutler and L. Davis, "Robust Real-Time Periodic Motion Detection, Analysis and Applications," IEEE Transactions on PAMI, vol. 22, no. 8, pp. 781–796, August 2000.
[4] D. Ormoneit, H. Sidenbladh, M. J. Black, and T. Hastie, "Learning and Tracking Cyclic Human Motion," in NIPS, 2000, pp. 894–900.
[5] D. Meyer, J. Pösl, and H. Niemann, "Gait Classification with HMMs for Trajectories of Body Parts Extracted by Mixture Densities," in BMVC, 1998, pp. 459–468.
[6] C. Cedras and M. Shah, "A Survey of Motion Analysis from Moving Light Displays," in CVPR, Seattle, Washington, June 1994, pp. 20–24, IEEE Computer Society.
[7] H. Lakany and G. Hayes, "An Algorithm for Recognising Walkers," in Audio- and Video-based Biometric Person Authentication, 1997, pp. 112–118, IAPR, Springer.
[8] J. Shi and C. Tomasi, "Good Features to Track," in CVPR, Seattle, Washington, June 1994, pp. 593–600, IEEE Computer Society.
[9] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, 1995.