Advances in Computer Science and Engineering Volume 6, Number 1, 2011, Pages 93-104 Published Online: February 22, 2011 This paper is available online at http://pphmj.com/journals/acse.htm © 2011 Pushpa Publishing House TRACKING OF MULTIPLE SOCCER PLAYERS USING A 3D PARTICLE FILTER BASED ON DETECTOR CONFIDENCE TAKURO NISHINO, TETSUYA TAKIGUCHI and YASUO ARIKI Graduate School of Engineering Kobe University e-mail: [email protected] [email protected] [email protected] Abstract In this paper, a novel method for tracking multiple soccer players based on a support vector machine (SVM) and a particle filter is proposed for a monocular fixed soccer video. This paper considers three main topics. First, the state of a particle filter presented in a 3D world coordinate space, instead of a 2D image coordinate space, provides a more realistic model tracking because depth considerations are taken into account. Second, by integrating SVM into the particle filter framework, the initial positions of the players can be detected or rediscovered when the tracker has lost the players. Third, the feature histogram and correlation are combined when the likelihood for a particle is computed. The effectiveness of this method has been confirmed by experiments that actually tracked multiple soccer players. 1. Introduction When professional cameramen video or edit a sports game, they are required to make it interesting for the TV or video audience. In order to address this requirement by computer, automatic production systems including digital camera work have been studied (ex. [1, 2]). A typical video production technology involves shooting the Keywords and phrases: particle filter, soccer, tracking, earth mover’s distance. Received October 11, 2010 94 TAKURO NISHINO, TETSUYA TAKIGUCHI and YASUO ARIKI sports game using a fixed camera, and then clipping and connecting the video frames. This technology allows videos to be edited according to individual audience preferences. An automatic production system is composed of several image recognition techniques and requires the semantic analysis of the process and strategy of the game. The position and trajectory of players in the game can provide useful information to facilitate the semantic analysis. In order to obtain such information, detection and tracking of objects (players, the ball, and the lines) are required. Many tracking methods have been proposed to date, and one of the popular methods of tracking-by-detection has been reported, where a detection framework is integrated into a tracking algorithm [3, 4, 6]. It makes it easy to get the initial positions of targets or rediscover them when the tracker has lost the targets. For object detection, there are two types of detection algorithms; SVM [3, 4] and AdaBoost [5, 6]. Both methods can detect their targets correctly if an abundance of training data is used. In the tracking algorithm, a particle filter is often employed because of its robustness with respect to occlusion. Therefore, our tracking algorithm also employs tracking-by-detection with a particle filter for robust tracking despite occlusion. In our system, at first, players are detected on the first frame. Next, tracking using a particle filter is carried out, where the particle filter approximates the probabilistic model discretely using a large number of particles. Each particle is translated by a state transition model and is weighted by an observation model, and then the state is propagated. Due to the detection results, the particles are weighted more accurately, where the proposed method computes the likelihood for a particle using a linear combination of the feature histogram and correlation. 2. Player Detection For each video frame, player detection is first performed. The classifier that detects players is trained in advance using the training data. The number of training data is 3,000 positive samples and 1,000 negative samples. Examples of positive and negative samples are shown in the upper part and the lower part of Fig. 1, respectively. The gray value at each pixel on a gray scale image was used as a feature. In the detection step, the multiple detectors search the video frame by sliding the window and changing the size, and the classifier outputs the detected regions TRACKING OF MULTIPLE SOCCER PLAYERS USING A 3D … 95 with an SVM score c(det) for a detector det. The SVM score is defined as the distance between the window image and the decision boundary of the classifier. Figure 1. Training samples. Figure 2 shows the detection results. Player regions are correctly detected, but in the case where multiple players are crossing in front of each other, an individual player cannot be detected. Figure 2. Detection results. 3. Tracking Algorithm 3.1. 3D particle filter A tracking algorithm is based on estimating the distribution of each state of a particle filter. The world coordinate is fixed to the soccer stadium, and the state x at time t is defined as Eq. (1) using the 3D coordinate. x = [ p x , p y , v x , v y , a x , a y ]T . (1) The state x consists of the 3D world position ( p x , p y ) , the velocity (v x , v y ) and the acceleration (a x , a y ). The state presented in the 3D world coordinate space, rather than the 2D image coordinate space, makes tracking more robust owing to depth considerations when occlusion occurs. Here, the height of the players is not considered, and ( p x , p y ) represents the position of a player’s centroid. TAKURO NISHINO, TETSUYA TAKIGUCHI and YASUO ARIKI 96 Then, we assumed that the state transition model from time t − τ to t is an accelerated motion model as follows: x (t ) = Cx (t − τ ) + ϒ I 2 ∗2 τI 2∗2 C = O2 ∗2 I 2 ∗2 O2 ∗2O2 ∗2 ( τ 2 2 ) I 2 ∗2 τI 2 ∗2 I 2 ∗2 ϒ = [ε p I1∗2 , ε v I1∗2 , ε a I1∗2 ]T . The process noises ε p , εv , ε a for state variables are independently drawn from zero-mean normal distributions. The variance σ2p for the position noise changes with the size of the tracking target, whereas the variances σ2v and σ2a for the velocity and acceleration noise are inversely proportional to the number of successfully tracked frames. Hence, we can expect that longer a target is tracked successfully, the less the particles are spread. 3.2. Assigning detection to tracker In order to integrate the detector into the tracking framework using SVM, a detector/tracker pair must be decided. For this pairing decision, the algorithm described in [3] is utilized. The matching algorithm works as follows. First, a matching score s(tr , det ) for each pair (tr, det ) of the tracker tr and the detector det is computed as shown in Eq. (2): N s(tr , det ) = g (tr , det ) ⋅ c(det ) + α p N (det − p ) , p∈tr (2) sizetr − sizedet g (tr , det ) = p( sizedet | tr ) = p N , sizetr (3) ∑ where p N (det − p ) ~ N (det − p; 0, σ 2det ) denotes the normal distribution evaluated for the distance between the detector det and each particle p of tracker tr, and g (tr , det ) is a gating function which additionally assesses each detector. sizetr denotes the size of tracker located at p y , and sizedet denotes the window size of the TRACKING OF MULTIPLE SOCCER PLAYERS USING A 3D … 97 detector det. The values of α and σ 2det are set experimentally. The meaning of Eq. (2) is that a pair consisting of tracker tr and detector det gets a higher matching score s(tr , det ) , if their average distance and window size difference are smaller. Finally, the matching matrix function S (the number of trackers × the number of detectors), whose element is s(tr , det ) , is computed. Then, the pair (tr ∗ , det ∗ ) with the maximum score in S is iteratively selected, and the rows and columns belonging to the tracker tr∗ and the detector det ∗ in S are deleted. This is repeated until no further valid pair is available. 3.3. Observation model To compute the weight ωtr, p for a particle p of the tracker tr, our algorithm estimates the conditional likelihood of the new observation yt of the propagated particle, where the state is xt . For this purpose, we combine different sources of information, namely the likelihood based on the detector det ∗ , the histogram distance and the correlation as follows: ωtr, p = p( yt | xt ) = p D + β ⋅ p H + (1 − β) ⋅ pC . (4) The first term pD denotes the likelihood based on the detector det ∗ , which is computed as follows: p D = I (tr ) (φ ⋅ c( p ) + ψ ⋅ p N ( p − det ∗ )) , 1 s(tr , det ∗ ) > threshold I (tr ) = , 0 other (5) where c ( p ) denotes the SVM score at the particle position, and p N ( p − det ∗ ) denotes the distance between the particle p and the associated detector det ∗ , evaluated under a normal distribution p N . I (tr ) is the indicator function that returns 1 if a detector is associated with the tracker, namely the matching score s(tr , det ∗ ) is larger than threshold. Otherwise, 0 is returned. When a matching 98 TAKURO NISHINO, TETSUYA TAKIGUCHI and YASUO ARIKI detection is found, this term robustly guides the particles. The parameters φ, ψ and threshold are set experimentally. The second and third terms ( pH and pC ) denote the likelihood based on the histogram distance and the likelihood based on the correlation, respectively. The former is robust to occlusion or pose change of players, whereas the variance of particle grows more than the necessity. On the other hand, the latter is weak to occlusion but it has an advantage in that the particle variance is small. Therefore, the weight for the former increases, when the distance between players is small (when occlusion of players tends to occur). In contrast, when the distance is long, the weight for the latter increases. Due to this method, the players can be tracked accurately for a long time. In this paper, we employ the Earth Mover’s Distance (EMD) [7] as the histogram distance and Normalized Cross-Correlation as the correlation. They are defined as follows: pH = pH (distEMD ) , pC = ∑ ∑ i, j i, j (6) {I (i, j ) − I } ⋅ {T (i, j ) − T } {I (i, j ) − I }2 ⋅ β = p N (min disti ) , i∈ P ∑ i, j {I (i, j ) − T }2 , (7) (8) where distEMD in Eq. (6) denotes EMD. The normalized cross-correlation pC is defined by Eq. (7). Here, I ( x, y ) and T ( x, y ) are gray values at point ( x, y ) in the search area and the template image, respectively. I and T are averaged gray value of the search area and the template image, respectively. Eq. (8) denotes the weight of pH and pC . It is defined as the distance between the target particle and the closest player, evaluated under a normal distribution p N . disti denotes the distance between the target player and player i, and P denotes a set of players. 3.4. Calibration The perspective projection of a pinhole camera model is needed to project world TRACKING OF MULTIPLE SOCCER PLAYERS USING A 3D … 99 coordinates ( X w , Yw , Z w ) of state x to image coordinates ( x f , y f ) as shown in Eq. (9). A is an intrinsic parameter, R is the rotation matrix, and t is the translation vector. R and t are extrinsic parameters. Xw x f y = A[R | t ] Yw f Zw 1 1 (9) A soccer field has known details, such as the length of lines in the penalty area and the goal area, etc. Therefore, several points on an image are manually associated with the world coordinates of the soccer field, and then the camera parameters are calibrated using the Tsai-method [8]. Fig. 3 shows the points used for calibration. Figure 3. The points used for calibration (red points). 4. Experiment 4.1. Experiment environment We videoed a film of a soccer game played during the 38th National High School Soccer Championship in Japan. The image size was 1,280 × 720 pixels with 24-bit color. The image size of the player template was 20 × 40 pixels. The number of particles was 300, and the frame rate was 30 fps. As the experiment data, the video sequence as shown in Fig. 4 was used, where the left half of the field located in the lower half of the image. We extracted 12 video clips of 330 frames (on average), and evaluated the results in terms of tracking accuracy as defined by Eq. (10). 1 Tracking Accuracy = P P of correctly tracked frames for player i ∑ Number Number of frames including player i i =1 (10) 100 TAKURO NISHINO, TETSUYA TAKIGUCHI and YASUO ARIKI Figure 4. Frame shot used in experiment. If the distance between the correct 2D image coordinate obtained manually and the coordinate given to a 2D image from a tracking result obtained by Eq. (9) was five pixels or less, it was defined as correct tracking. P denotes the number of players. The camera parameters estimated by calibration are shown in Table 1. Table 1. Estimated camera parameters Extrinsic parameters R t Intrinsic parameters Rx 93.29 [deg] f 3.16 [m] Ry 5.39 [deg] k 3.61e-02 Ry –0.30 [deg] sx 1.0 tx –28.38 [m] sy 1.0 ty 12.25 [m] Cx 640 [pixel] tz 71.53 [m] Cy 360 [pixel] 4.2. Experiment results Fig. 5 shows the tracking results obtained using the proposed 3D particle filter compared with the traditional 2D particle filter [1], where we used the normalized cross-correlation as the likelihood evaluation. The average tracking accuracy was improved significantly from 63.1% to 70.3%, so it can be said that modeling based TRACKING OF MULTIPLE SOCCER PLAYERS USING A 3D … 101 on the 3D world coordinates is effective. But the tracking accuracy decreased for sample 13, sample 12, sample 09 and sample 05, where the players hardly move or simply move linearly. It appears that the accuracy decreased in the proposed method, even though the motion was quite simple, because complex 3D modeling is being carried out. Figure 5. Tracking accuracy of the proposed 3D tracking compared with 2D tracking. We compared the performance of pD , pH and pC in Eq. (4). The results are shown in Table 2, where the circle indicates the utilization of the corresponding likelihood. In pH , the Bhattacharyya distance and EMD were compared. Comparing the second line with the third line in Table 2, EMD showed better performance than the Bhattacharyya distance. It can be said that EMD is more effective than the Bhattacharyya distance with respect to occlusion. Comparing the second line with the fourth line in Table 2, pD improved the tracking accuracy significantly. Table 2. Tracking accuracy in terms of observation model pD pH pC Tracking Accuracy [%] × × 0 70.30 × Bhattacharyya 0 72.41 × EMD 0 73.64 0 Bhattacharyya 0 75.22 102 TAKURO NISHINO, TETSUYA TAKIGUCHI and YASUO ARIKI Fig. 6 shows the tracking results for video clips with narrow windows viewed close-up. A number of occlusion incidents occurred during a short time, but the Figure 6. Tracking results (close-up view). proposed method was able to track the players without problems. Fig. 7 shows the tracking results for video clips with wide windows view from further away. In both the cases, the players can be tracked accurately even when occlusion occurs. Figure 7. Tracking results (distant view). TRACKING OF MULTIPLE SOCCER PLAYERS USING A 3D … 103 5. Conclusion In this paper, we proposed a novel method for detecting and tracking multiple objects soccer players in a fixed monocular soccer video. By making use of the state in the 3D world coordinate space in a particle filter, we were able to achieve a more realistic model. In addition, the observation model incorporating the different information sources makes tracking more robust when occlusion or pose changes occur. In future, we are planning to carry out work focusing on the detection of more complex soccer events (e.g., offsides and fouls) using the 3D positional information of players acquired by the proposed method. References [1] Y. Ariki, T. Takiguchi and K. Yano, Digital camera work for soccer video production with event recognition and accurate ball tracking by switching search method, IEEE International Conference on Multimedia & Expo, pp. 889-892, Hannover, Germany, June 2008. [2] T. Nishino, T. Takiguchi and Y. Ariki, Situation recognition using 3D positional information of ball from monocular soccer image sequence, International Conference on Multimedia, Information Technology and its Applications (MITA’2009) (2009) 109-112. [3] M. Breitenstein, F. Reichin, B. Leibe, E. Koller-Meier and L. V. Gool, Robust tracking-by-detection using a detector confidence particle filter, The 12th IEEE International Conference on Computer Vision (ICCV), pp. 1515-1522, Kyoto, Japan, Sept. 2009. [4] G. Zhu, C. Xu, Q. Huang and W. Gao, Automatic multi player detection and tracking in broadcast sports video using support vector machine and particle filter, IEEE International Conference on Multimedia & Expo (ICME), pp. 1629-1632, Toronto, Canada, July 2006. [5] Y. Wu, X. Tong, Y. Zhang and H. Lu, Boosted interactively distributed particle filter for automatic multi-object tracking, IEEE International Conference on Image Processing (ICIP), pp. 1844-1847, San Diego, U. S. A., Oct. 2008. [6] K. Okuma, A. Taleghani, N. D. Freitas, J. J. Littele and D. G. Lowe, A boosted particle filter: multi target detection and tracking, The 8th European Conference on Computer Vision (ECCV), pp. 28-39, Prague, Czech, May 2004. 104 TAKURO NISHINO, TETSUYA TAKIGUCHI and YASUO ARIKI [7] Y. Rubner, C. Tomasi and L. J. Guibas, The earth mover’s distance a metric for image retrieval, International Journal of Computer Vision (IJCV) 40(2) (2000), 99-121. [8] R. Y. Tsai, An efficient and accurate camera calibration technique for 3D machine vision, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 364-374, Miami, U. S. A., June 1986. [9] Xinguo Yu, Xin Yan and Yiqun Li, Tool-aided semantics acquisition for live soccer video, IEEE International Conference on Multimedia & Expo, pp. 893-896, Hannover, Germany, June 2008.
© Copyright 2026 Paperzz