A Three-Level Scheme for Real-Time Ball Tracking
Xiaofeng Tong1, Tao Wang1, Wenlong Li1, Yimin Zhang1
Bo Yang2, Fei Wang2, Lifeng Sun2, and Shiqiang Yang2
1 Intel China Research Center, Beijing, P.R. China, 100080
{xiaofeng.tong,tao.wang,wenlong.li,yimin.zhang}@intel.com
2 Department of Computer Science and Technology,
Tsinghua University, Beijing 100084
[email protected], [email protected],
{sunlf,yangshq}@tsinghua.edu.cn
Abstract. A three-level method is proposed to achieve robust, real-time ball
tracking in soccer videos. It comprises object-level, intra-trajectory-level, and
inter-trajectory-level processing. Because of heavy noise and frequent occlusion, it is
difficult to identify the single true ball in one frame. Thus, at the object level,
multiple objects rather than a single one are detected and taken as ball candidates
using shape and color features. Then, at the intra-trajectory level, each ball candidate
is tracked by a Kalman filter over successive frames, which yields many initial
trajectories in a video segment. These trajectories are then scored and filtered
according to their length and their relationship in a time-line model. From the
remaining trajectories we construct a distance graph, in which a node represents a
trajectory and an edge weight is the distance between two trajectories. At the
inter-trajectory level, we use the Dijkstra algorithm to find the optimal path in the
graph. Finally, to smooth the trajectory, we apply cubic spline interpolation to bridge
the gaps between adjacent trajectories. The algorithm was tested on broadcast soccer
games from FIFA2006 and achieved an F-score of 80.26%. The overall speed, 35.6 fps on
MPEG-2 data, exceeds real time.
Keywords: Ball tracking, Kalman filtering, trajectory inference, data association, sports video.
1 Introduction
Detection and tracking of semantic objects (ball, players, goalmouth, faces, jersey
numbers, etc.) have attracted much attention and have been used for advanced semantic
analysis in addition to highlight extraction in sports video [8]. In field-ball sports,
the ball is the focus of competition. Ball detection, tracking, and trajectory analysis
are therefore of great benefit to semantics mining, such as tactics and team activity
analysis [4].
In the previous literature, X. Yu [8] proposed a detection-based tracking method for
ball trajectory selection, inference, and extension, and applied it to semantic event
detection. D. Liang [2] used Kalman filtering to track the ball, based on detection
verification over several initial successive frames. X. Tong [5] presented a
Condensation-based ball tracking algorithm using color and shape features. F. Yan [7]
used a trajectory-level verification scheme based on data association to infer and
determine the ball trajectory in tennis video. Earlier, chromatic and morphological
features were used to detect the ball in [3].
N. Sebe, Y. Liu, and Y. Zhuang (Eds.): MCAM 2007, LNCS 4577, pp. 161 – 171, 2007.
© Springer-Verlag Berlin Heidelberg 2007
Soccer ball tracking poses several challenges. First, broadcast video contains camera
motion, so the estimated ball motion mixes the ball’s intrinsic motion with camera
motion. Second, it is not easy to construct an effective model for detecting the ball,
because its size and shape change constantly and it is subject to occlusion and noise
(see Fig. 1). Third, frequent occlusion, frequent changes of speed and direction, and
flight outside the playfield all challenge the robustness of ball tracking over long
periods.
Fig. 1. Difficulties of ball detection (from left to right: shape blur, occlusion and noise)
These challenges make ball tracking quite different from traditional object tracking.
First, more than one object is needed for tracking initialization: because detection is
unreliable and noise is heavy, tracking cannot simply rely on a single detection.
Second, long-term tracking without timely correction is unstable because of frequent
occlusion and shot transitions. We therefore consider it feasible and practical to treat
tracking as data association based on detection.
In this paper, we propose a three-level scheme, comprising object-, intra-trajectory-,
and inter-trajectory-level processing, to realize an effective ball tracking solution.
The idea is to identify the unique, true ball trajectory among the multiple trajectories
generated by many ball-like candidates. At the object level, we use the dominant color
of the video (the field color in soccer) to extract non-field objects within the
playfield, including the ball, players, line marks, and others. A filter with shape and
color cues is then applied to remove noise, and several qualified candidates are
retained in each frame. At the intra-trajectory level, many possible trajectories are
generated through motion fitting of ball candidates in consecutive frames. Some false
alarms are removed through trajectory filtering in terms of trajectory length and the
relationship between trajectories on the time-line. At this stage, there are still many
trajectory candidates in a video segment. To separate the true trajectories from the
others, at the inter-trajectory level we construct a graph in which a node represents a
trajectory and an edge weight is the distance between a pair of trajectories, and find
the optimal path with the Dijkstra algorithm. To smooth and link up the final ball
trajectory, a cubic spline is used to interpolate the gap between two adjacent
trajectories.
Compared with previous methods, the contributions of this paper are:
1) Propose a practical, hierarchical scheme for real-time tracking.
2) Apply a time-line model to define trajectory distance.
3) Use the Dijkstra algorithm on the trajectory distance graph to find the optimal ball path.
4) Utilize cubic spline interpolation to predict the ball position in missing frames.
We tested the method on FIFA2006 soccer games and obtained an F-score of 80.26%.
The speed is 35.6 fps on MPEG-2 data (720x576 pixels, 25 fps, Intel Core 2 Duo).
2 Framework
The overall flowchart is illustrated in Fig. 2. It consists of: (1) Pre-processing for
view-type classification; in this step, only global views are selected for the ball
tracking process. (2) Object-level processing, where multiple ball candidates are
identified through image segmentation and connected-component analysis. To eliminate
noise, Hough line detection and verification with shape and color cues are performed.
After this filtering, the remaining candidates have a high probability of being the
ball. (3) Intra-trajectory-level processing, which connects ball candidates, generates
initial ball trajectories, and deletes the false ones. (4) Inter-trajectory-level
processing, which defines the distance between trajectories, constructs a distance graph
representing the trajectories and their distances, and finds the optimal path consisting
of non-overlapping trajectories. Finally, we use cubic spline interpolation to smooth
and bridge the gap between two adjacent trajectories.
Fig. 2. Framework of the method
3 Methodology
3.1 Pre-processing
The pre-processing module performs view-type classification and selects global views for
ball detection and tracking. We first learn the dominant color (which corresponds to the
grass color) by accumulating HSV color histograms over many frames. We then segment the
playfield using the dominant color. According to the area of the playfield and the size
of non-field objects, we classify each view with a decision tree into one of four view
types: global view, medium view, close-up, and out of view (Fig. 3) [1].
Fig. 3. View types: global view, medium view, close-up, and out of view
3.2 Object Level Processing
In a global view, we extract the non-field regions using dominant color segmentation
(Fig. 4(a)). The ball is included in these regions (white regions), but there is much
noise: line marks, player regions, falsely segmented regions, etc. To discriminate the
ball and remove noise, we apply three-step filtering:
1) We use the Hough transform to detect straight lines within the playfield and remove
them.
2) We filter out unqualified regions using shape features: (a) size, the length of the
longer axis of a region; (b) area (A), the area of the region; (c) eccentricity, the
ratio of the longer axis to the shorter axis; and (d) circularity factor, defined as
$4\pi A / P^{2}$, where P is the perimeter.
3) We use an SVM classifier with color cues to further verify the ball candidates.
A four-dimensional color feature vector <r, b, s, v> is used, in which r = R/G and
b = B/R in RGB color space, and s and v are the saturation and value components in HSV
space.
Intermediate results are presented in Fig. 4.
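The shape and color cues used in steps 2 and 3 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the `eps` guard, and every threshold in `passes_shape_filter` are hypothetical, and the region measurements (area, perimeter, axis lengths) are assumed to come from a prior connected-component analysis.

```python
import colorsys
import math

def shape_features(area, perimeter, major_axis, minor_axis):
    """Return the four shape cues described in the text for one region."""
    return {
        "size": major_axis,                        # length of the longer axis
        "area": area,
        "eccentricity": major_axis / minor_axis,   # longer axis / shorter axis
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
    }

def color_features(r8, g8, b8):
    """Return <r, b, s, v> with r = R/G, b = B/R and HSV saturation and value."""
    eps = 1e-6  # assumption: guard against division by zero on dark pixels
    _, s, v = colorsys.rgb_to_hsv(r8 / 255.0, g8 / 255.0, b8 / 255.0)
    return (r8 / (g8 + eps), b8 / (r8 + eps), s, v)

def passes_shape_filter(f, max_size=30, max_area=400, max_ecc=3.0, min_circ=0.5):
    """Keep small, compact, roughly circular regions (thresholds are placeholders)."""
    return (f["size"] <= max_size and f["area"] <= max_area
            and f["eccentricity"] <= max_ecc and f["circularity"] >= min_circ)
```

A perfect disc has circularity 1, while an elongated line-mark fragment scores close to 0, which is why this cue separates the ball from residual line segments. The color vector would then be passed to a trained classifier, which is not shown here.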
Fig. 4. Results of object-level processing. (a) Binary image segmented by dominant color;
(b) after removing line marks; (c) after filtering with shape features; and (d) after
filtering with color.
After obtaining ball candidates in each frame, we divide the video into shots (called
ball-shots) and then perform tracking within each ball-shot. If the frame-number
interval between two adjacent ball frames is larger than a threshold
(interval > th_delta), a ball-shot boundary is declared at that time point. In this
work, th_delta is set to 2*fps, where fps is the frame rate (frames per second) of the
video.
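The ball-shot boundary rule can be illustrated with a short sketch. The function name `split_ball_shots` and the input representation (a sorted list of frame numbers that contain at least one ball candidate) are assumptions made for this example.

```python
def split_ball_shots(ball_frames, fps=25):
    """Group sorted ball-frame numbers into ball-shots.

    A new ball-shot starts whenever the gap to the previous ball frame
    exceeds th_delta = 2 * fps, as stated in the text.
    """
    th_delta = 2 * fps
    shots = []
    for frame in ball_frames:
        if shots and frame - shots[-1][-1] <= th_delta:
            shots[-1].append(frame)   # same ball-shot
        else:
            shots.append([frame])     # interval > th_delta: declare a boundary
    return shots
```

With fps = 25 the threshold is 50 frames, so a gap of exactly 50 frames stays within one ball-shot and a gap of 51 frames starts a new one.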
3.3 Intra-trajectory Level Processing
Once the ball candidates in each frame have been obtained, we generate initial
trajectories by linking adjacent candidates in the spatial-temporal domain. The
prediction is performed by Kalman filtering with the formulation:
\[ X_k = A X_{k-1} + w_k \]
\[ Z_k = H X_k + v_k \]
where $X_k$ and $Z_k$ are the state and measurement vectors at time $k$, and $w_k$ and
$v_k$ are the system and measurement noise. $A$ and $H$ are the state transition and
measurement matrices, respectively. In this work, we set:
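\[
X = \begin{bmatrix} x \\ y \\ v_x \\ v_y \end{bmatrix}, \quad
Z = \begin{bmatrix} x \\ y \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\]
where $(x, y)$ is the ball’s center and $(v_x, v_y)$ is its velocity in the x- and
y-directions.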
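For concreteness, the standard Kalman predict and update steps for this constant-velocity model can be sketched in plain Python. The matrices A and H follow the text; the noise covariances Q and R, the initial covariance P, and all helper names are assumptions, since the paper does not report them.

```python
# State transition A and measurement matrix H as given in the text.
A = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
H = [[1, 0, 0, 0],
     [0, 1, 0, 0]]

def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def predict(x, P, Q):
    """Time update: x' = A x and P' = A P A^T + Q (Q is an assumed noise covariance)."""
    return matmul(A, x), add(matmul(matmul(A, P), transpose(A)), Q)

def update(x_pred, P_pred, z, R):
    """Measurement update with z = [[x], [y]] (R is an assumed noise covariance)."""
    S = add(matmul(matmul(H, P_pred), transpose(H)), R)   # innovation covariance
    K = matmul(matmul(P_pred, transpose(H)), inv2(S))      # Kalman gain
    x_new = add(x_pred, matmul(K, sub(z, matmul(H, x_pred))))
    P_new = sub(P_pred, matmul(matmul(K, H), P_pred))      # (I - K H) P'
    return x_new, P_new
```

The state is a 4x1 column vector `[[x], [y], [vx], [vy]]`. Calling `predict` gives the expected ball position in the next frame, and `update` corrects the state once a candidate near that position has been accepted as the measurement.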
A trajectory is initialized by a seed candidate that does not yet belong to any
trajectory, and it grows if the position predicted by the Kalman filter is verified by a
candidate in the next adjacent frame. The procedure, which is similar to that of [8], is
summarized in Fig. 5.
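The seed-and-grow procedure can be sketched as below. To keep the example short, the Kalman prediction is replaced by a plain constant-velocity extrapolation, which is a simplification and not the filter described in the text. The gate radius `gate`, the nearest-candidate rule, and stopping at the first unverified frame are likewise assumptions, because the details of Fig. 5 are not reproduced here.

```python
import math

def grow_trajectories(frames, gate=15.0):
    """frames[t] lists the (x, y) candidates of frame t; returns lists of (x, y, t)."""
    used = set()            # (frame, index) pairs already assigned to a trajectory
    trajectories = []
    for t0, candidates in enumerate(frames):
        for i0, (x, y) in enumerate(candidates):
            if (t0, i0) in used:
                continue    # only unassigned candidates may seed a trajectory
            used.add((t0, i0))
            traj, vx, vy = [(x, y, t0)], 0.0, 0.0
            for t in range(t0 + 1, len(frames)):
                px, py = x + vx, y + vy          # constant-velocity prediction
                best = None
                for i, (cx, cy) in enumerate(frames[t]):
                    d = math.hypot(cx - px, cy - py)
                    if (t, i) not in used and d <= gate and (best is None or d < best[0]):
                        best = (d, i, cx, cy)
                if best is None:
                    break   # prediction not verified: the trajectory ends here
                _, i, cx, cy = best
                used.add((t, i))
                vx, vy, x, y = cx - x, cy - y, cx, cy
                traj.append((x, y, t))
            trajectories.append(traj)
    return trajectories
```

Isolated false candidates end up as very short trajectories under this scheme, which is what the subsequent length-based filtering exploits.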
Fig. 5. Flowchart of initial trajectory generation
The initial trajectories link all possible candidates along a reasonable path. We assign
a confidence to each trajectory according to its length (duration):
\[
\text{confidence} =
\begin{cases}
\text{false}, & \text{if } \text{length} < T_1 \\
\text{true}, & \text{if } \text{length} > T_2 \\
\text{pendent}, & \text{otherwise}
\end{cases}
\]
We then remove the unconfident (false) trajectories. If a pendent trajectory is covered
by another one (A is covered by B when A.start > B.start and A.end < B.end; see
Table 1), it is also deleted. The remaining trajectories are resolved at the
inter-trajectory level. The procedure is shown in Fig. 5. The ball candidates are shown
in Fig. 6(a), and the initial trajectories are displayed in Fig. 6(b). Fig. 6(c) shows
the trajectory filtering procedure, in which the circled trajectories are false.
Fig. 6(d) is the filtering result.
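The length-based scoring and the covering test can be written compactly. The threshold values `t1` and `t2` below are placeholders (the paper does not report T1 and T2), and representing a trajectory as a dictionary with `start` and `end` frame numbers is an assumption of this sketch.

```python
def confidence(length, t1=5, t2=15):
    """Three-way confidence from trajectory length; t1 and t2 are hypothetical."""
    if length < t1:
        return "false"
    if length > t2:
        return "true"
    return "pendent"

def is_covered(a, b):
    """A is covered by B when A.start > B.start and A.end < B.end."""
    return a["start"] > b["start"] and a["end"] < b["end"]

def filter_trajectories(trajs, t1=5, t2=15):
    """Drop false trajectories and pendent trajectories covered by another one."""
    kept = []
    for a in trajs:
        length = a["end"] - a["start"] + 1
        c = confidence(length, t1, t2)
        if c == "false":
            continue
        if c == "pendent" and any(is_covered(a, b) for b in trajs if b is not a):
            continue
        kept.append(a)
    return kept
```

Because a covering trajectory is necessarily longer than the one it covers, a false trajectory can never cover a pendent one, so it makes no difference whether the covering test runs before or after the false trajectories are removed.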
Fig. 6. Intra-trajectory filtering
3.4 Inter-trajectory Level Processing
We further identify the true trajectories via path optimization at the inter-trajectory
level. The ball’s trajectory within a shot should be smooth and continuous. We define
the distance between two trajectories and generate a distance graph. Then, we find the
optimal path with the Dijkstra algorithm.
We apply the time-line model to define the distance between two trajectories. Let A be a
trajectory, and let A.start and A.end be its start and end times. The symbol “<” means
“earlier than”, “>” means “later than”, “<=” means “no later than”, and “>=” means “no
earlier than”. The symbols “a” and “b” denote ball candidates in trajectories A and B,
respectively. The distance between two trajectories is defined in Table 1.
Table 1. Trajectory distance definition between trajectories A and B
No  Description                          Example    Distance
1   A.end < B.start                      (diagram)  dist(A.end, B.start)
2   A.end > B.start and A.end <= B.end   (diagram)  min(dist(a, b)), a, b ∈ A∩B, a < b
3   A.end > B.end                        (diagram)  ∞
Let a = <x_a, y_a, t_a> and b = <x_b, y_b, t_b> be two ball points; then
\[ \mathrm{dist}(a, b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2} + \alpha \, |t_a - t_b| \]
where $\alpha$ is a weighting factor for the time interval. In this work, we set
$\alpha = 7$.
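The three cases of Table 1 can be sketched in code. The point distance is assumed here to be the Euclidean spatial distance plus α times the absolute frame gap. The reading of case 2 (a drawn from A and b from B, both inside the temporal overlap, with a earlier than b) and the handling of the case A.end = B.start, which Table 1 does not list, are interpretations made for this example.

```python
import math

ALPHA = 7.0  # weight of the time interval, as set in the text

def dist(a, b, alpha=ALPHA):
    """Distance between two ball points given as (x, y, t)."""
    (xa, ya, ta), (xb, yb, tb) = a, b
    return math.hypot(xa - xb, ya - yb) + alpha * abs(ta - tb)

def trajectory_distance(A, B):
    """Distance from trajectory A to trajectory B; both are time-ordered (x, y, t) lists."""
    a_end, b_start, b_end = A[-1][2], B[0][2], B[-1][2]
    if a_end < b_start:                    # case 1: B starts after A ends
        return dist(A[-1], B[0])
    if a_end > b_end:                      # case 3: A ends after B ends
        return math.inf
    # Case 2: A and B overlap in time. Consider point pairs inside the overlap
    # [B.start, A.end] with a earlier than b (assumed interpretation).
    in_a = [a for a in A if b_start <= a[2] <= a_end]
    in_b = [b for b in B if b_start <= b[2] <= a_end]
    pairs = [dist(a, b) for a in in_a for b in in_b if a[2] < b[2]]
    return min(pairs) if pairs else math.inf   # no valid pair: treated as unlinkable
```

The time term penalizes long gaps, so a spatially close but temporally distant trajectory is a worse successor than one that follows immediately.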
In the trajectory graph, each trajectory is a node, and the links between trajectories
are edges. The distance between two trajectories is used as the edge weight. We add two
extra nodes to the graph, a start node and an end node. The start node is a trajectory
consisting of a single ball-frame, with a ball candidate at the start time of the
ball-shot. The distance from the start node to each of its neighbors is given by the
temporal distance alone (the spatial distance is zero). The end node is defined
analogously at the end time of the ball-shot. With this graph, we apply the Dijkstra
algorithm to find the optimal path from start to end. The resulting path consists of
optimal, time-ordered trajectories. Fig. 7 shows an example: the trajectories before and
after optimization are shown in (a) and (b) (blue solid lines).
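A minimal sketch of the path search is given below, using Dijkstra's algorithm with a binary heap from the Python standard library. The graph representation (a dictionary of dictionaries of edge weights), the function name `shortest_path`, and the node labels in the usage note are assumptions. Infinite weights, which mark trajectory pairs that cannot be linked, are simply skipped.

```python
import heapq
import math

def shortest_path(weights, start, end):
    """Dijkstra on weights[u][v] = edge weight; returns (cost, list of nodes)."""
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        cost, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)
        if u == end:
            break
        for v, w in weights.get(u, {}).items():
            if math.isinf(w):
                continue                  # unlinkable pair of trajectories
            new_cost = cost + w
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if end not in best:
        return math.inf, []
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[end], path[::-1]
```

For example, with trajectories labeled "A", "B", and "C" and edges start→A (7), start→B (14), A→C (20), B→C (5), A→end (100), and C→end (7), the search returns the path start, B, C, end with total cost 26. The trajectories on the returned path are the ones kept as the ball trajectory of the shot.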
Fig. 7. Inter-trajectory optimization. (a) Multiple trajectory candidates; (b) filtered
trajectories (blue solid lines).
Fig. 8. Trajectory interpolation. (a) Trajectories before interpolation; (b) interpolation
result (points marked with blue “+” are interpolated).
An interval (gap) between two adjacent trajectories arises when candidates are missing.
Candidates are often missed when the motion speed and direction (the motion model)
change, for example when a player receives a pass, dribbles, and then passes the ball to
a teammate. The gap can be interpolated from the two trajectories on either side of it.
We apply cubic spline interpolation to fill the gap (Fig. 8).
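The gap filling can be sketched with a natural cubic spline fitted separately to x(t) and y(t). The paper does not state the boundary conditions or which points are used as knots, so the natural boundary conditions, the use of all points from both neighboring trajectories, and the function names are assumptions of this sketch.

```python
from bisect import bisect_right

def natural_cubic_spline(ts, ys):
    """Return a function evaluating the natural cubic spline through (ts, ys)."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Solve the tridiagonal system for the second-order coefficients c.
    diag, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (ts[i + 1] - ts[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(t):
        j = min(max(bisect_right(ts, t) - 1, 0), n - 1)   # segment containing t
        dt = t - ts[j]
        return ys[j] + b[j] * dt + c[j] * dt ** 2 + d[j] * dt ** 3

    return evaluate

def fill_gap(before, after):
    """Interpolate the missing frames between two time-ordered (x, y, t) trajectories."""
    pts = before + after
    ts = [p[2] for p in pts]
    fx = natural_cubic_spline(ts, [p[0] for p in pts])
    fy = natural_cubic_spline(ts, [p[1] for p in pts])
    return [(fx(t), fy(t), t) for t in range(before[-1][2] + 1, after[0][2])]
```

Interpolating x and y as independent functions of the frame index keeps the problem one-dimensional and yields one estimated position for every missing frame in the gap.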
4 Experiments and Analysis
4.1 Data-Set and Evaluation Criterion
We tested the algorithm on FIFA2006 broadcast soccer videos in MPEG-2 format. We manually
labeled a one-minute video (1497 frames), marking the ball’s position and radius in
every frame in which 1) the ball is visible, or 2) the ball is occluded but its position
can be estimated from neighboring frames. The evaluation criterion is based on the
overlap between the ground-truth and tracked regions.
4.2 Intermediate Results
In the experiments, we extract the intermediate results of the first ball-shot and show
them in Fig. 9. The detected/tracked results are displayed with blue “+”, and the true
results are shown with red “o”. The horizontal axis is time, and the vertical axis is
the position along the width of the image frame. The ball candidates are shown in
Fig. 9(a). The trajectories linked from these candidates, after filtering, are displayed
in Fig. 9(b); most false trajectories have been removed. Fig. 9(c) shows the optimal
path found by the Dijkstra algorithm. Fig. 9(d) is the result after linking and
smoothing. It can be seen that most correct candidates are retained and most missing
cases are smoothly interpolated.
The smoothing and interpolation results from frame 50 to frame 89 are shown in Fig. 10.
The ground truth is displayed as a red dashed line, and the tracking and interpolated
results are displayed as blue filled circles. Note that some of the ground truth was
obtained by estimation and cannot be detected, because of occlusion or blur.
Fig. 9. A ball tracking result. (a) Ball candidates; (b) filtered trajectories; (c) optimal
path; (d) final trajectories. Detection/tracking results are displayed with blue “+”; the
ground truth is shown with red “o”.
Fig. 10. An interpolation result for frames 50 to 89
Fig. 11. An example of interpolation in a sequence (frames 714, 722, 730, 738, 746, 754,
762, and 770)
Another example sequence is shown in Fig. 11. We extract 8 representative frames from 80
successive frames. In these results, most frames in the two occluded segments, frames
714-738 and 762-770, are correctly interpolated.
4.3 Tracking Results
We compared the tracking results with the ground truth frame by frame on the one-minute
video. The statistics are given in Table 2. Of the 1497 frames in total, 928 are
global-view frames, forming 3 ball-shots, in which ball tracking is conducted. The shot
boundaries are correctly obtained. In the detection phase, the ball is correctly located
in 356 frames. Through tracking, we finally obtain 572 ball-frames.
Table 2. Information of the one-minute video clip
Total frames  Global-view frames  Ball-shots  Detected ball-frames  Tracked ball-frames
1497          928                 3           356                   572
The results contain correct, false, and missing instances. The additional correctly
tracked frames, beyond those detected, are obtained by interpolation; in these frames it
is actually impossible to detect the ball because of occlusion or clutter. The results
are shown in Table 3.
Table 3. Tracking performance
Correct  False  Missing  Precision  Recall  F-score
494      78     165      86.36%     74.96%  80.26%
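The reported figures are consistent with the standard definitions of precision, recall, and F-score, as the short check below shows. The definitions are assumed rather than quoted from the paper, and the function name is hypothetical.

```python
def tracking_scores(correct, false, missing):
    """Precision, recall, and F-score from counts of correct, false, and missing frames."""
    precision = correct / (correct + false)
    recall = correct / (correct + missing)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

p, r, f = tracking_scores(494, 78, 165)
print(f"{p:.2%} {r:.2%} {f:.2%}")  # prints "86.36% 74.96% 80.26%"
```

Note that correct + false = 572, which matches the number of tracked ball-frames in Table 2.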
In the final result, we found that some fragmentary trajectories (interval and
short-length trajectories) are often missing. This is because short trajectories are
removed during intra-trajectory processing.
We also tested the method on a complete game between Spain and France in FIFA2006. The
result is also satisfactory: most ball-frames are correctly tracked. The processing,
including all modules (view-type classification, line-mark detection with the Hough
transform, and three-level ball tracking), runs faster than real time with two threads
on an Intel Core 2 Duo machine with a 2.4 GHz CPU and 2.0 GB of RAM. The test data is in
MPEG-2 format (720x576 pixels, 25 fps). The processing speed is given in Table 4.
Table 4. Processing speed
Module                      Time (seconds)  Speed (fps)
only view-type              24.48           61.15 (1497/24.48)
only ball tracking          17.50           53.03 (928/17.50)
view-type + ball tracking   41.98           35.66 (1497/41.98)
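The speeds in Table 4 are frame counts divided by processing time, and they can be compared against the 25 fps frame rate of the video. The snippet below simply redoes that arithmetic; the helper name `speed_fps` is hypothetical.

```python
def speed_fps(frames, seconds):
    """Processing speed in frames per second."""
    return frames / seconds

VIDEO_FPS = 25  # frame rate of the test video, as stated in the text

modules = [
    ("only view-type", 1497, 24.48),
    ("only ball tracking", 928, 17.50),
    ("view-type + ball tracking", 1497, 41.98),
]
for name, frames, seconds in modules:
    fps = speed_fps(frames, seconds)
    print(f"{name}: {fps:.2f} fps, {fps / VIDEO_FPS:.2f}x real time")
```

The ball-tracking row divides by 928 rather than 1497 because tracking runs only on the global-view frames.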
5 Conclusions
In this paper, we proposed a three-level scheme for ball tracking in broadcast soccer
video. To address the difficulty of determining a single ball, we detect multiple
candidates per frame at the object level. We then link these candidates to generate
initial trajectories and apply intra-trajectory processing to filter out uncertain
trajectories. To obtain the final optimal path in a shot, we use the inter-trajectory
procedure to optimize the ball path with the Dijkstra algorithm. We tested the algorithm
on real broadcast soccer video and obtained satisfactory performance. The whole
processing runs faster than real time. In future work, the verification of paths
composed of small fragmentary trajectories should be improved.
References
1. Li, J., Wang, T., Hu, W., Zhang, Y.: Soccer highlight detection using two-dependent Bayesian network. In: ICME 2006 (2006)
2. Liang, D., Liu, Y., Huang, Q., Gao, W.: A scheme for ball detection and tracking in broadcast soccer video. In: Ho, Y.-S., Kim, H.J. (eds.) PCM 2005. LNCS, vol. 3768, Springer, Heidelberg (2005)
3. Gong, Y., Sin, L., Chuan, C., Zhang, H., Sakauchi, M.: Automatic parsing of TV soccer programs. In: Proc. of 2nd International Conference on Multimedia Computing and Systems, pp. 167–174 (1995)
4. Sullivan, J., Carlsson, S.: Tracking and labeling of interesting multiple targets. In: ECCV 2006 (2006)
5. Tong, X., Lu, H., Liu, Q.: An effective and fast soccer ball detection and tracking method. In: ICPR 2004, pp. 795–798 (2004)
6. Welch, G., Bishop, G.: An introduction to the Kalman filter. Technical Report TR95-041, University of North Carolina at Chapel Hill (1995)
7. Yan, F., Kostin, A., Christmas, W., Kittler, J.: A novel data association algorithm for object tracking in clutter with application to tennis video analysis. In: CVPR 2006 (2006)
8. Yu, X., Xu, C., Tian, Q., Leong, H.: A ball tracking framework for broadcast soccer video. In: ICME 2003 (2003)