QUADRUPED GAIT ANALYSIS USING SPARSE MOTION INFORMATION

David P. Gibson, Neill W. Campbell and Barry T. Thomas
Department of Computer Science, University of Bristol, Bristol BS8 1UB, United Kingdom
[email protected]

ABSTRACT

In this paper we propose a system that recognises gait and quadruped structure from a sparse set of tracked points. The motion information is derived from dynamic wildlife film footage and is consequently extremely complex and noisy. The gait analysis is carried out on footage that contains quadrupeds, usually walking in profile to the camera; however, this part of the system is pose independent and useful for gait detection in general. The dominant motion is assumed to be generated by the background and its relationship to the camera motion, enabling its removal as an initial step. Along with frequency analysis, an eigengait model is used as a template to synchronise clusters of points and to establish an underlying spatio-temporal structure. Given this synchronised structure, further tracking observations are used to deform the structure to better fit the overall motion. We demonstrate that the use of an eigengait model enables the spatio-temporal localisation of walking animals and assists in overcoming difficulties caused by occlusion, tracking failure and noisy measurements.

1. INTRODUCTION

The motivation for this paper is a system for automatically archiving and indexing large video databases of wildlife film footage. In particular, the detection and classification of gaited animals is important for the efficient reuse of the video content when creating new wildlife productions. The major difficulty in analysing such video data is the vast range of content and the variability of low-level image properties. The footage has been captured under many different conditions, causing dramatic variations in texture and particularly colour.
This is typical of our data, and hence this paper concentrates purely on motion-based recognition as a first step towards higher-level understanding of the video content.

Previous work on gait analysis has tended to focus on the biometrics of human subjects. Standard approaches initially include motion analysis such as optical flow or condensation, often in conjunction with background subtraction. Periodic features are then extracted and used for recognition [1, 2, 3] or are matched and fitted to statistical models of motion [4, 5]. Understanding how things move over time is a powerful cue for recognition and classification, as demonstrated by work on Moving Light Displays (MLDs). Much work has been carried out on MLDs indicating that, for humans at least, a small number of points moving in a particular way can convey a large amount of potentially useful information [6, 7]. It is these concepts that inspire the work described in this paper.

2. METHOD

The video footage used in this work focuses on dynamic outdoor scenes of walking animals. The content consists of multiple motions derived from the movement of the animal, often in addition to the apparent motion generated by a panning camera. Scenes often contain large sky regions, which afford very little or no texture information, as well as complex foliage such as long grass. Other factors contribute to the complexity of the scenes, such as wind animating foliage or unsteady camera motion. All of the above factors cause standard motion analysis techniques such as optical flow and block matching to become very unstable. For this reason the Kanade-Lucas-Tomasi (KLT) point tracker has been used [8]. Whilst the motion information generated by the KLT tracker is sparse, it is at least transformed into a discrete set of trajectories, some of which are tracked over many frames. For this paper 150 points per frame were tracked.

2.1. Dominant Motion Removal

In this work the dominant motion, or non-motion, has been assumed to be generated by image points belonging to the background, i.e. non-animal parts of each image. Trajectories of points tracked over multiple frames are extracted from the sparse motion information and a translational model is fitted to the trajectories using RANSAC. The model is applied to the displacement vectors of points over n frames, where 1 < n < 10. Points from the background tend to be tracked over many frames and move in a consistent manner. The dominant cluster of trajectories can then be extracted, leaving the trajectories generated by the foreground object of interest, see Figure 1. This sequence of footage consists of 250 frames of a male lion walking to the left whilst being tracked by a panning camera. The camera pan is not smooth, causing the position of the animal to vary over time. However, the pan is consistent enough for the background motion to be extracted over the entire sequence. In Figure 1 the background motion is measured as a pan to the right with a velocity vector of (4.21, −0.02) pixels per frame.

Fig. 1. Top left is frame n, top right frame n + 1. Bottom left shows the dominant background trajectories, bottom right the remaining foreground object of interest. The background is measured as moving with a velocity vector of (4.21, −0.02) pixels per frame.

Fig. 2. Three frames out of the 120 frames of a horse walking on a treadmill. Twelve key points were manually tracked and labelled for each frame. Lines depicting a crude skeletal form have been added for clarity.

Fig. 3. Responses of each frame of the labelled horse frames to the first three principal modes of the eigengait space.

2.2. Generation of an Eigengait Space

A sequence of images of a horse walking on a treadmill was used as an exemplar of quadruped gait.
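Before detailing the eigengait construction, the dominant-motion removal of Section 2.1 can be sketched in code. The paper does not give the RANSAC inlier threshold or iteration count, so the values below (`tol`, `iters`) and the synthetic displacement data are purely illustrative; only the idea of fitting a translational model to displacement vectors and keeping the largest consensus set comes from the paper.

```python
import numpy as np

def dominant_motion_ransac(displacements, iters=200, tol=1.0, seed=0):
    """Fit a translational model to per-point displacement vectors with
    RANSAC. `displacements` is an (N, 2) array of (dx, dy) vectors taken
    from point trajectories; the dominant (background) motion is the
    translation supported by the largest consensus set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(displacements), dtype=bool)
    for _ in range(iters):
        # A pure translation is defined by a single sampled displacement.
        candidate = displacements[rng.integers(len(displacements))]
        residuals = np.linalg.norm(displacements - candidate, axis=1)
        inliers = residuals < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine the model as the mean displacement of the consensus set.
    background_motion = displacements[best_inliers].mean(axis=0)
    return background_motion, best_inliers

# Synthetic example: 120 background points panning right at roughly
# (4.2, 0) pixels/frame plus 30 foreground points moving left, loosely
# modelled on the lion sequence of Figure 1.
rng = np.random.default_rng(1)
bg = np.array([4.2, 0.0]) + 0.1 * rng.standard_normal((120, 2))
fg = np.array([-2.0, 0.5]) + 0.3 * rng.standard_normal((30, 2))
displacements = np.vstack([bg, fg])
motion, mask = dominant_motion_ransac(displacements)
foreground = displacements[~mask]  # trajectories left after background removal
```

Because the model is a pure translation, a single sampled displacement fully determines a candidate, which keeps each RANSAC iteration trivially cheap.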
The sequence consists of 120 frames, approximately four cycles of the gait, with each frame hand-labelled with 12 (x, y) key points manually tracked over the sequence, see Figure 2. Principal component analysis was carried out on the 120 resulting 24-dimensional vectors to generate an eigengait space, see Figure 3. The responses of Figure 3 were then approximated by a temporally ordered Gaussian Mixture Model (GMM) [9] to give a continuous gait model, as shown in Figure 4.

Fig. 4. A temporally ordered GMM of the responses in Figure 3. A spline has been drawn through the centres of each mixture component.

2.3. Directionality and Spatial Division

To establish an initial structure of the foreground point cloud, heuristics based on general observations of walking animals are used. These include the generalities that the scene is oriented such that the sky is up and that animals tend not to walk upside down nor walk backwards. Over a period of time, given the way the camera is moving or the general direction of the foreground point cloud, a front-to-back, top-to-bottom assignment is made. The spatial mean of the foreground points is then used as the centroid of the assumed animal and the points are subdivided into quadrants, as shown in Figure 5.

Fig. 5. The centroid of the foreground points is used to subdivide into quadrants, with foreground/background motion determining the direction of movement as described in Figure 1.

Subject        Frequency (Hz)
Horse          1.67
Lion           1.57
Wildebeest     1.72
Human†         1.91
Horse (trot)   2.84

Table 1. Frequencies of different creatures' walking gaits. † As calculated from [1].

2.4. Frequency Analysis and Synchronisation

If a gait-like frequency can be detected, the tracking observations can be matched to, and synchronised with, the eigengait model. To establish the existence of a gait, frequency analysis is used. The trajectories generated by the KLT tracker contain noise and, in addition, spatial drifts caused by irregular camera motion or animal movement. This confounds the frequency analysis of individual trajectories. To overcome these difficulties, the mean of the vertical differences is used to establish the fundamental frequency of the gait:

f(t) = \frac{1}{n} \sum_{i=1}^{n} \left( y_t^i - y_{t-1}^i \right) \otimes G(t, \sigma)   (1)

where y_t^i is the vertical position of tracked point i at frame t, n is the number of tracked points in a quadrant, and G(t, \sigma) is a Gaussian kernel; the size of the kernel for this work was 9 units of time. Figure 6, top, is the resulting signal observed for the top rear quadrant of the lion walking sequence. By using the vertical differences, as opposed to positions, the effects of poor camera tracking are greatly reduced, and with temporal smoothing a more representative signal is observed. The maximum magnitude component of the Fourier spectrum gives the fundamental gait frequency. This is matched against a set of previously measured fundamental gait frequencies, and if it is within an allowable range the presence of a gaited animal is assumed, Table 1. At least one cycle of gait is required to give a reasonable estimate of the fundamental frequency. A useful aspect of this approach, with respect to gait detection, is that the fundamental frequency detection is not dependent on the 3-dimensional position or pose of a creature.

Fig. 6. The observed signal for the top rear quadrant of the first 100 frames of the lion walking sequence, top. The same signal from the horse based eigengait model, middle. The new signal is generated by warping the model signal, middle, to the observed signal, top.

Figure 6, middle, shows the equivalent top rear signal generated by the horse based eigengait model.
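The frequency-detection step of Eq. (1) can be sketched as follows: average the per-frame vertical differences of a quadrant's points, smooth with a Gaussian kernel, and take the peak of the Fourier spectrum. The kernel size of 9 follows the paper, but the frame rate (25 fps), the Gaussian sigma, the allowable frequency range and the synthetic test signal are assumptions made for illustration.

```python
import numpy as np

def gait_frequency(y, fps=25.0, width=9, sigma=2.0):
    """Estimate the fundamental gait frequency from vertical positions.
    `y` is a (frames, points) array for the tracked points in one
    quadrant. Implements the idea of Eq. (1): mean per-frame vertical
    differences, smoothed by a Gaussian kernel of `width` samples."""
    diffs = np.diff(y, axis=0).mean(axis=1)      # mean vertical differences
    half = width // 2
    t = np.arange(-half, half + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()                                 # normalised Gaussian kernel
    smooth = np.convolve(diffs, g, mode="same")  # temporal smoothing
    spectrum = np.abs(np.fft.rfft(smooth - smooth.mean()))
    freqs = np.fft.rfftfreq(len(smooth), d=1.0 / fps)
    return freqs[np.argmax(spectrum[1:]) + 1]    # skip the DC bin

# Synthetic quadrant: 10 points oscillating at a lion-like 1.57 Hz with
# noise, sampled over 100 frames at an assumed 25 fps.
rng = np.random.default_rng(0)
frames = np.arange(100) / 25.0
y = np.sin(2 * np.pi * 1.57 * frames)[:, None] + 0.2 * rng.standard_normal((100, 10))
f0 = gait_frequency(y, fps=25.0)
is_gait = 1.0 < f0 < 3.0  # within the walking-gait range suggested by Table 1
```

Averaging the differences across the points in a quadrant cancels much of the per-trajectory tracker noise, which is why a clear spectral peak survives even with fairly noisy input.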
By synchronising the model gait frequency with the observed lion gait frequency we can generate a new spatio-temporal structure (STS) that walks in phase with, and at the same rate as, the observed point cloud, Figure 6, bottom. This is accomplished by detecting maxima and minima in the observed and model-generated signals and then performing a piecewise linear interpolation. The matched temporal control points can now be projected out of the eigengait space to generate a new STS, synchronised with the observations. The STS is spatially located using the centroid of the cloud of points as well as the spatial means of the top rear and front quadrants, Figure 7. It should be noted that the model-based information does not consider any 3-dimensional depth information, such as nearside legs as opposed to farside legs.

Fig. 7. The new synchronised and spatially updated STS points for the first 100 frames of the lion sequence, circles. The original horse based eigengait model is shown as crosses.

3. RESULTS

Three frames from the lion walking sequence are shown in Figure 8, with the STS overlaid. The system described in this paper has been successfully applied to other walking animal sequences, including an elephant, a cheetah and a zebra. Additionally, gait recognition rates of over 80% have been achieved using a threshold on the frequency analysis.

Fig. 8. Frames 12, 44 and 85 from the lion sequence with the spatio-temporal structure overlaid.

4. CONCLUSIONS

In this paper we have shown that a horse-based eigengait model can be used to generate a spatio-temporal structure for other quadrupeds, given a sparse set of motion trajectories. The model-based approach enables tracking errors such as noise, occlusion and tracking failure to be overcome. After frequency analysis and synchronisation, a new spatio-temporal structure is updated based on measurements that fit the expected model evolution. It has been shown that the STS can track a sparse set of trajectories over a large number of frames without losing lock on the foreground point cloud. Additionally, the system offers reliable pose-independent gait detection. Future work includes taking into account more complex camera motions, such as zooms and lens effects, and instances of multiple animals.

5. ACKNOWLEDGEMENTS

This work is sponsored by the DTI Broadcast Link Programme. Thanks are due to the BBC Natural History Unit (Bristol) and Matrix Data Ltd.

6. REFERENCES

[1] J. J. Little and J. E. Boyd, "Recognising People by Their Gait: The Shape of Motion," Videre: Journal of Computer Vision Research, vol. 1, no. 2, 1998.
[2] R. B. Polana and R. C. Nelson, "Nonparametric Recognition of Nonrigid Motion," Tech. Rep. TR575, University of Rochester, 1995.
[3] R. Cutler and L. Davis, "Robust Real-Time Periodic Motion Detection, Analysis and Applications," IEEE Transactions on PAMI, vol. 22, no. 8, pp. 781–796, August 2000.
[4] D. Ormoneit, H. Sidenbladh, M. J. Black, and T. Hastie, "Learning and Tracking Cyclic Human Motion," in NIPS, 2000, pp. 894–900.
[5] D. Meyer, J. Pösl, and H. Niemann, "Gait Classification with HMMs for Trajectories of Body Parts Extracted by Mixture Densities," in BMVC, 1998, pp. 459–468.
[6] C. Cedras and M. Shah, "A Survey of Motion Analysis from Moving Light Displays," in CVPR, Seattle, Washington, June 1994, pp. 20–24, IEEE Computer Society.
[7] H. Lakany and G. Hayes, "An Algorithm for Recognising Walkers," in Audio- and Video-based Biometric Person Authentication, 1997, pp. 112–118, IAPR, Springer.
[8] J. Shi and C. Tomasi, "Good Features to Track," in CVPR, Seattle, Washington, June 1994, pp. 593–600, IEEE Computer Society.
[9] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, 1995.