Pattern Recognition 41 (2008) 2422–2433. doi:10.1016/j.patcog.2007.12.013

Posterior probability measure for image matching

Zuren Feng a,∗, Na Lu a, Ping Jiang b,c

a State Key Laboratory for Manufacturing Systems Engineering, Systems Engineering Institute, Xi'an Jiaotong University, Xi'an 710049, China
b Department of Computing, University of Bradford, Bradford BD7 1DP, UK
c Tongji University, Shanghai 201804, China

∗ Corresponding author. Tel.: +86 29 82 667 771; fax: +86 29 82 665 487. E-mail addresses: [email protected] (Z. Feng), [email protected] (N. Lu), [email protected] (P. Jiang).

Received 2 July 2007; received in revised form 12 December 2007; accepted 16 December 2007

Abstract

Template matching is one of the principal techniques in visual tracking. Various similarity measures have been developed to find the target in an acquired image by matching it against a template. However, mismatching or misidentification may occasionally occur because of the background pixels included in the designated target model. Taking the statistical features of a search region into account, a novel similarity measure is proposed that decreases the interference of the background pixels enclosed in the model. It highlights the significant target features while reducing the influence of features shared by the target and the background, and it exhibits an excellent monotonic property and a distinct peak-like distribution. The new measure is also shown to have a direct interpretation as a posterior probability and is therefore named the posterior probability measure (PPM). The PPM can be obtained through a pixel-wise computation, which makes it well suited to image matching. The pixel-wise computation also enables a fast update of the measure after the target region changes, leading to a new adaptive scaling method for tracking a target of varying size. Experiments show that the PPM provides higher localization precision and a discriminatory power superior to existing similarity measures such as the Bhattacharyya coefficient, the Kullback–Leibler divergence, and normalized cross correlation. The effectiveness of the adaptive scaling method is also demonstrated experimentally.
© 2008 Elsevier Ltd. All rights reserved.

Keywords: Similarity measure; Image matching; Visual tracking; Posterior probability

1. Introduction

A variety of image tracking algorithms have been developed in recent years [1], including model-based methods [2,3], feature-based methods [4–6], knowledge-based methods [7–11], learning-based methods [12–14], and foreground–background discrimination methods [15–18]. Among these, the mean-shift algorithm has been widely studied and applied in image segmentation and tracking [4,19–22] for its simplicity and robustness. In the visual tracking field, the model-based mean-shift algorithm has achieved prominent success because of its distinctive real-time performance. The foreground–background discrimination methods [15–18] transform object tracking into classification; classifiers such as the support vector machine, linear discriminant analysis, and the variance ratio have been trained to discriminate an object from the background. However, two crucial aspects of visual tracking demand particular attention: localization precision and target scale adaptation [19,23–25]. Model matching is widely employed in the tracking methods mentioned above.
For template-matching-based image tracking, a vital factor influencing target localization precision is the selection of a proper similarity measure, which evaluates the closeness between a template serving as the target model and a candidate region in the image. The most widely used similarity measures include the Bhattacharyya coefficient [26–28], the Kullback–Leibler divergence [27,29,30], normalized cross correlation [31], and the histogram intersection distance [32]. Among them, the Bhattacharyya coefficient has been applied in the mean-shift tracking algorithm. The Bhattacharyya coefficient treats the target pixels and the background pixels in a target model equally; the background pixels may therefore interfere with the matching and cause localization bias or even mismatching. The same flaw exists in the other similarity measures. Moreover, features shared by the target and the background also distort the matching result. Consequently, a more reliable similarity measure that takes background interference into account is needed. In this paper, such a similarity measure, the posterior probability measure (PPM), is presented; it deals with the issue by considering the statistical features of a search region.

During a visual tracking process, the scale of the tracked target may vary continuously. In order to achieve precise localization, the target scale, or the kernel bandwidth, must be adaptive; the kernel bandwidth is a scale parameter closely related to the target size when kernel-based methods such as mean shift are employed. Numerous methods have been developed for adaptive adjustment of the target scale. Comaniciu et al. [19] proposed applying the mean-shift procedure three times with three different kernel bandwidths, obtained by adding or subtracting 10 percent of the current target size, and selecting the scale with the maximum Bhattacharyya coefficient. This method works well when the target shrinks but may fail when the target expands [25]. A data-driven scale selection method was presented in Refs. [23,24], in which the kernel bandwidth is calculated by analyzing the local statistical characteristics around each data point. Collins developed a scheme for blob tracking over scale space [25]: a scale-space mean-shift procedure is first executed, and the spatial-space calculation is then processed over multiple scales. All these methods are complicated and computationally demanding, which hinders their application to real-time tracking. The proposed PPM can be calculated through a concise pixel-wise computation, so a fast and convenient adaptive scaling method can be developed; the visual tracking experiments reveal the excellent performance of this method.

In short, this paper makes two major contributions. First, a new similarity measure, the PPM, is proposed, which is well suited to blob matching tasks. Second, the new measure leads to a fast and convenient adaptive scaling method. Experiments show that the PPM-based tracking method achieves more precise tracking than the other similarity-measure-based algorithms under rotation, occlusion, and cluttered backgrounds. The peak-like shape of the PPM similarity distribution is more distinct, which improves both tracking accuracy and robustness.
For the PPM, the influence of the background pixels is suppressed by considering the statistics of a search region, while the contribution of the actual target features is enhanced, which improves the precision of target localization. The model scale constraint is also relaxed, because more background pixels are allowed to be contained. In addition, the PPM value of a candidate region can be computed simply by summing the individual contribution of each pixel in the region; as a result, the adaptive scaling algorithm can be implemented with pixel-wise computation.

The paper is organized as follows. Section 2 gives some preliminary definitions. Section 3 analyzes the performance of the existing similarity measures and the reasons for localization failure. The new similarity measure is proposed in Section 4, together with comparisons of matching results between the PPM and the other similarity measures. Section 5 analyzes the properties and limitations of the PPM. The adaptive scaling method is described in Section 6. Section 7 gives and demonstrates the probabilistic interpretation of the PPM. Visual tracking experiments are presented in Section 8, and Section 9 concludes.

2. Image matching and image feature analysis

2.1. Image matching

In this paper, the aim of image matching, or blob matching, is to search for the sub-area within an image that is most similar to a predefined target model. The obtained sub-area is the target region, and all sub-areas matched against the target model are target candidates. Both the target model and the target candidates are characterized by a histogram vector:

target model: $q = \{q_u\}_{u=1,\ldots,m_u}$,
target candidate: $p = \{p_u\}_{u=1,\ldots,m_u}$,

where $m_u$ is the dimension of the histogram. The similarity measure is a function evaluating the degree of similarity between the target model and a target candidate:

similarity measure: $\rho(i) \equiv \rho(q, p_i)$,

where $i$ is the index of the target candidate and $p_i$ is the histogram of target candidate $i$. The target candidate with the largest similarity value is recognized as the target. An image matching task can therefore be formulated as

$$\max_{i\in\{1,\ldots,n_s\}} \rho(q, p_i), \qquad (1)$$

where $n_s$ is the number of target candidates in the search region. Usually there is a rough prediction of the target location, so it is not necessary to search the whole image; the area searched for a match is called the search region.
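To make the formulation of Eq. (1) concrete, the sketch below scans a search region exhaustively and returns the candidate window whose histogram maximizes a supplied similarity measure. This is a minimal illustration, not the paper's implementation: the single-channel 16-bin quantization, the function names, and the dense window-by-window scan are assumptions made for this sketch.

```python
import numpy as np

def histogram(region, bins=16):
    """Unnormalized histogram of a gray-level region, quantized into
    `bins` features. The paper does not fix the feature space; this
    single-channel quantization is an assumption of the sketch."""
    idx = (region.astype(np.int64) * bins) // 256
    return np.bincount(idx.ravel(), minlength=bins)

def match(search_region, q, win_h, win_w, similarity):
    """Exhaustive form of Eq. (1): maximize similarity(q, p_i) over
    all candidate windows i inside the search region."""
    best_score, best_pos = -np.inf, None
    H, W = search_region.shape
    for y in range(H - win_h + 1):
        for x in range(W - win_w + 1):
            p = histogram(search_region[y:y + win_h, x:x + win_w])
            score = similarity(q, p)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```

Any of the measures discussed in this paper can be plugged in as the `similarity` argument, which is how the comparisons in the following sections should be read.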
2.2. Image feature analysis

In order to enclose the whole target, it is inevitable that background pixels are merged into the target model. When the background is cluttered or the imaging quality is poor, these pixels may interfere with localization of the true target and lead to biased localization or misidentification. Figs. 1 and 2 provide two examples of this background interference: background pixels are clearly present in both target models. In the ice hockey player model (Fig. 1), besides the player, which is the tracked target, there are many background pixels from the ice rink. Fig. 2 was taken while the ping-pong ball bounced quickly; in the corresponding target model, background pixels are interlaced with the ping-pong pixels.

Fig. 1. Ice hockey image and target model.

Fig. 2. Ping-pong image and target model.

Due to the movement of the target or the camera, complications such as partial occlusion or low-quality images may arise, causing the target to lose dominance in the image. At the same time, the background pixels present in the target model may appear in the image in large quantities. Without considering background interference, a high similarity value can then no longer indicate the appearance of the target; this is the main reason for incorrect target localization. Although excluding the background pixels from the target model may be very difficult, as in the case of the ping-pong model (Fig. 2), a new similarity measure that weights target pixels and background pixels differently offers a solution.

3. Performance analysis of the existing similarity measures

There exist many similarity measures and distance metrics, such as the Euclidean distance, Bhattacharyya coefficient, cross correlation, normalized cross correlation, and Kullback–Leibler divergence. Among these, the Bhattacharyya coefficient and the Kullback–Leibler divergence are widely applied [4,29]. Here we take the Bhattacharyya coefficient as a representative to analyze the matching performance of the existing similarity measures, all of which treat the pixels in the target model identically.

3.1. Bhattacharyya coefficient

The Bhattacharyya coefficient is a popular similarity measure that has been employed in the well-known mean-shift tracking algorithm. The Bhattacharyya coefficient between $p$ and $q$ is

$$\rho(p, q) = \sum_{u=1}^{m_u} \sqrt{p_u \cdot q_u}, \qquad (2)$$

where $p$ and $q$ are the normalized histogram vectors of the target candidate and the target model, respectively, $p_u$ and $q_u$ are their $u$th elements, and $m_u$ is the number of histogram bins.
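As a reference point for the comparisons that follow, a direct reading of Eq. (2) in code might look as follows. The normalization step is included because Eq. (2) assumes $p$ and $q$ are normalized histograms; the helper name is ours.

```python
import numpy as np

def bhattacharyya(q, p):
    """Bhattacharyya coefficient of Eq. (2) between two histograms,
    normalized here so that each sums to 1."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sqrt(p * q).sum())
```

Passed as the `similarity` argument of the earlier `match` sketch, this reproduces the kind of exhaustive Bhattacharyya matching analyzed below.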
3.2. Mismatch analysis of the Bhattacharyya coefficient

When the Bhattacharyya coefficient is applied in visual tracking, localization bias or even mismatch may occur occasionally. In this section, the causes of such matching failures are analyzed through two instances in which the pixels of the target models can be divided into two categories, target pixels and background pixels.

Fig. 3 shows a matching instance of the ice hockey player using the same target model as in Fig. 1. Fig. 3(a) is the correct target region; Fig. 3(b) is the localization result obtained with the Bhattacharyya coefficient, which is clearly biased. Table 1 compares the correct target region in Fig. 3(a) with the biased target region in Fig. 3(b) in terms of the Bhattacharyya coefficient values and the counts of target and background pixels. In the correct target region, 61% of the pixels are target pixels and 21% are background pixels; the remaining 18% are not included in the predefined target model and do not affect the similarity value. The corresponding Bhattacharyya coefficient is 0.805. In the biased region, however, there are 30% target pixels, 63% background pixels, and 7% others, and the Bhattacharyya coefficient reaches 0.884, higher than that of the correct target region. Matching with the Bhattacharyya coefficient therefore yields the biased target region of Fig. 3(b). It can be deduced that the higher percentage of background pixels (63%) in Fig. 3(b) causes the bias.

Fig. 3. Matching instance of ice hockey player. (a) Correct target region and its enlargement. (b) Biased target region obtained by the Bhattacharyya coefficient and its enlargement.

Table 1. Matching performance analysis of the Bhattacharyya coefficient for the ice hockey instance

                                  Target model   Correct target region   Biased target region
Target pixels                     621 (46%)      858 (61%)               421 (30%)
Background pixels                 749 (54%)      294 (21%)               887 (63%)
Bhattacharyya coefficient value   —              0.805                   0.884

Fig. 4 and Table 2 provide another matching instance, for the ping-pong model of Fig. 2. In this case a complete mismatch occurs. Table 2 shows that the high percentage of background pixels (71%) in the mistaken target region has inflated the Bhattacharyya coefficient and caused the mismatch.

Fig. 4. Matching instance of ping-pong sequence. (a) Correct target region and its enlargement. (b) Mistaken target region obtained by the Bhattacharyya coefficient and its enlargement.

Table 2. Matching performance analysis of the Bhattacharyya coefficient for the ping-pong instance

                                  Target model   Correct target region   Mistaken target region
Target pixels                     580 (80%)      218 (30%)               0
Background pixels                 145 (20%)      3 (0.4%)                515 (71%)
Bhattacharyya coefficient value   —              0.167                   0.276

The above analysis leads to two conclusions. First, when the Bhattacharyya coefficient is employed as the similarity measure for blob matching, localization failures such as bias or mismatch may take place. Second, the background pixels in a target model may influence the similarity value so strongly that the similarity value of another region can exceed that of the correct target region; this is the main reason for matching failures. The other existing similarity measures, which likewise do not differentiate between target and background pixels, exhibit problems similar to the Bhattacharyya coefficient's.

4. Posterior probability measure

4.1. Derivation of the measure

As discussed above, the background pixels included in the target model may lead to mistaken localization, and differentiating between target pixels and background pixels is one way to avoid it. For target matching, it is common to search for the best-matched target region within a predefined search region that is much larger than the target region; in other words, a search region contains mainly background pixels and comparatively few target pixels. Consequently, the histogram of the search region provides clues to the statistical characteristics of the background. Let $s$ be the unnormalized histogram vector of the search region. A large $s_u$ indicates that feature $u$ is likely a feature of background pixels, whereas for a feature of target pixels $s_u$ is likely small. Under this consideration, $1/s_u$ can be introduced into an existing similarity measure as a rectifying weight that reduces the weight given to background features and increases the weight given to target features.
Following this idea and taking cross correlation [31] as a prototype, a new similarity measure can be defined as

$$\rho(p, q) = \frac{1}{m}\sum_{u=1}^{m_u} \frac{p_u q_u}{s_u}, \qquad (3)$$

where $s_u$, $p_u$, and $q_u$ are the $u$th elements of the unnormalized histogram vectors $s$, $p$, and $q$, respectively; $p$ and $q$ are the histogram vectors of the target candidate and the target model; and $m$, the number of pixels in the target model, is a normalization constant. It will be justified theoretically in Section 7 that the measure $\rho(p, q)$ defined in Eq. (3) evaluates the posterior probability of a target candidate being identified as the target, i.e. it is a posterior probability measure (PPM).

4.2. Comparison analysis

To compare with the matching performance of the Bhattacharyya coefficient shown in Tables 1 and 2, the matchings of the ice hockey player and the ping-pong ball are carried out again using the PPM; Tables 3 and 4 show the results. The "Search region" rows give the pixel counts of the search regions, which show that the background pixels far outnumber the target pixels. As a result of introducing the weights $1/s_u$ in Eq. (3), the importance of the background pixels to the PPM is markedly weakened, whereas that of the target pixels is strengthened. The experimental results show that the target regions in both instances are localized correctly, as they exhibit higher PPM values than the biased or mistaken target regions (Tables 3 and 4).

Table 3. PPM matching performance analysis for the ice hockey instance

                        Target pixels   Background pixels   PPM value
Target model            621             749                 —
Correct target region   858             294                 0.356
Biased target region    421             887                 0.232
Search region           1388            8840                —

Table 4. PPM matching performance analysis for the ping-pong instance

                          Target pixels   Background pixels   PPM value
Target model              580             145                 —
Correct target region     218             3                   0.196
Mistaken target region    0               515                 0.048
Search region             302             6939                —
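A feature-wise reading of Eq. (3) can be sketched as below. Here $p$, $q$, and $s$ are the unnormalized histograms of the candidate, the model, and the search region; the guard against empty search-region bins is an implementation choice of ours (such bins simply contribute nothing), not something the paper specifies.

```python
import numpy as np

def ppm(q, p, s, m):
    """Posterior probability measure of Eq. (3):
    rho(p, q) = (1/m) * sum_u p_u * q_u / s_u,
    where m is the number of pixels in the target model (for an
    unnormalized model histogram, m == q.sum())."""
    occupied = s > 0   # bins actually present in the search region
    return float((p[occupied] * q[occupied] / s[occupied]).sum() / m)
```

Since every candidate lies inside the search region, each bin occupied in the candidate is also occupied in $s$, so the guard rarely fires. With the earlier `match` sketch, the PPM can be used as `similarity=lambda q, p: ppm(q, p, s, m)` once $s$ and $m$ are fixed.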
5. Matching performance evaluation of the PPM

A common characteristic of the existing similarity measures is that they treat the target pixels and the background pixels in a target model equally, unlike the proposed PPM, which treats them differently. Taking the Bhattacharyya coefficient as their representative, this section evaluates the matching performance of the PPM against them.

A similarity measure is expected to have a distinctly sharp, peak-like distribution in order to achieve a reliable match. Therefore, the distribution shapes of the two similarity measures are compared first, as shown in Fig. 5, where the big squares in the images indicate the search regions and the small ones the obtained target regions. The similarity distribution of the PPM clearly has a more distinct and sharper peak than that of the Bhattacharyya coefficient. This can be explained by the PPM's suppression of background pixels: the further a pixel lies from the target center, the more likely it is a background pixel, and the more its importance in the PPM is attenuated. A PPM distribution therefore tends to be shaped into a sharper peak than a Bhattacharyya one.

Fig. 5. Similarity distribution comparisons. (a) PPM similarity distribution. (b) Bhattacharyya similarity distribution. (c) Localization results of PPM.

Because of this background-suppressing capability, we can also expect a match that is more reliable and more tolerant of a varying model scale. A target model of larger scale generally encloses more background pixels, so the corresponding similarity measure suffers more background interference, which may lead to mistaken localization with the existing measures; the PPM weakens this interference. In Fig. 6, the scales of the three target models are 17 × 17, 29 × 29, and 39 × 39, respectively. Even though the target model has been greatly enlarged in the second and third images, enclosing more background pixels, the PPM distributions retain monotonic and distinct peak shapes (Fig. 6(a)), whereas the Bhattacharyya-coefficient-based ones are destroyed.

Fig. 6. Target model scale tolerance comparisons. (a) PPM similarity distribution. (b) Bhattacharyya coefficient similarity distribution. (c) Localization results of PPM.

Finally, the PPM can be expected to improve the localization accuracy relative to the Bhattacharyya coefficient. Fig. 7(a) depicts the similarity distributions of the PPM and the corresponding matching results; Fig. 7(b) presents the Bhattacharyya cases. The PPM correctly localizes the targets, whereas the Bhattacharyya coefficient produces bias or mismatch because of background interference.

Fig. 7. Matching accuracy comparisons. (a) PPM-based similarity distributions and localization results. (b) Bhattacharyya-coefficient-based similarity distributions and localization results.

6. Computation property and scale adaptation

The computation of the Bhattacharyya coefficient is feature-wise, as shown in Eq. (2): the coefficient is the summation of $\sqrt{p_u \cdot q_u}$ over the features $u$. The PPM computation in Eq. (3) is feature-wise too, but it can be transformed into a pixel-wise form, as shown in Section 6.1. The pixel-wise computation allows the PPM of a region to be obtained by accumulating the effects of individual pixels, one pixel at a time, which simplifies scale adaptation (Section 6.2). The Bhattacharyya coefficient, by contrast, cannot be computed pixel-wise in a straightforward manner because of the square-root operation involved.

6.1. Pixel-wise computation of the new measure

In Eq. (3), $p_u$ is the number of pixels in a target candidate that belong to histogram bin (feature) $u$, i.e. $p_u = \sum_{\{j:\,B(j)=u\}} 1$, where $B(j)$ is the bin that pixel $j$ belongs to. Substituting this into Eq. (3) gives

$$\rho(p, q) = \frac{1}{m}\sum_{u=1}^{m_u} \frac{q_u}{s_u}\Bigg(\sum_{\{j:\,B(j)=u\}} 1\Bigg) = \frac{1}{m}\Bigg(\sum_{\{j:\,B(j)=1\}} \frac{q_1}{s_1} + \sum_{\{j:\,B(j)=2\}} \frac{q_2}{s_2} + \cdots + \sum_{\{j:\,B(j)=m_u\}} \frac{q_{m_u}}{s_{m_u}}\Bigg). \qquad (4)$$

Because $\{j: B(j)=1\} \cup \{j: B(j)=2\} \cup \cdots \cup \{j: B(j)=m_u\} = \{1, \ldots, m\}$, where $m_u$ is the number of bins and $m$ is the total number of pixels in the region, Eq. (4) can be rearranged into the pixel-wise form

$$\rho(p, q) = \frac{1}{m}\sum_{j=1}^{m} \frac{q_{u(j)}}{s_{u(j)}}, \qquad (5)$$

where $j$ is the pixel index and $q_{u(j)}$ and $s_{u(j)}$ are the values of the target model histogram and the search region histogram at the bin $u = B(j)$ that pixel $j$ falls into. The value $q_{u(j)}/s_{u(j)}$ can be regarded as the contribution of pixel $j$ to the similarity value; thus, the PPM can be computed by accumulating the individual pixels' contributions.
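In code, the pixel-wise form of Eq. (5) only needs the bin index of each pixel in the candidate region. A vectorized sketch follows; the pre-quantized `bin_map` representation is our assumption, as is the guard against empty bins.

```python
import numpy as np

def ppm_pixelwise(bin_map, q, s, m):
    """Eq. (5): accumulate each pixel's contribution q_{u(j)} / s_{u(j)},
    where u(j) = B(j) is the histogram bin of pixel j. `bin_map` holds
    the bin indices of the pixels in the candidate region."""
    contrib = q[bin_map] / np.maximum(s[bin_map], 1)  # per-pixel q/s, zero-bin guard
    return float(contrib.sum() / m)
```

Because the measure is a plain sum of per-pixel terms, growing or shrinking the region only adds or removes the border pixels' contributions, which is exactly what the scale adaptation of Section 6.2 exploits.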
6.2. Scale adaptation

Scale adaptation is an adaptive process that fits the target region to a variable target scale for the purpose of precise target tracking. The pixel-wise PPM in Eq. (5) provides an efficient formula for scale adaptation because it is simply a linear combination of contributions from individual pixels. Suppose a target has been localized in a target region of size $w(k)$ (width and height) at the $k$th frame of a video and its PPM $\rho(p(y^*), q)$ has been obtained, where $y^*$ denotes the current target center position, $p(y^*)$ is the corresponding target histogram vector, and $q$ is the target model. We can average the PPM to obtain the approximate contribution of each pixel to the similarity measure:

$$\bar{\phi} = \frac{\rho(p(y^*), q)}{m}, \qquad (6)$$

where $m$ is the number of pixels in the target region.

Due to the motion of the target, its scale may change, and the size of the target region has to adapt to this change. This is done by examining the average similarity contribution of the pixels in the inner and outer layers of the target region against the average contribution of the pixels on the current target region border. Fig. 8 illustrates the first inner layer, the border, and the first outer layer of a target region. The average similarity contributions of the pixels in these layers are denoted as

$$\bar{\phi}_i, \quad i = -a, \ldots, 0, \ldots, a, \qquad (7)$$

where $i < 0$ denotes the $i$th inner layer, $i > 0$ the $i$th outer layer, and $i = 0$ the target region border; $a$ is the comparison step of the scale adaptation and is set to 1 without loss of generality.

Fig. 8. Scale adaptation illustration.

An empirical scale adaptation formula, which has been applied in a great number of experiments and has proved to give good tracking performance, is

$$w(k+1) = \begin{cases} w(k)+2 & \text{if } \bar{\phi}_{-1} > 0.8 \text{ and } \bar{\phi}_0 > 0.75 \text{ and } \bar{\phi}_1 > 0.6,\\[2pt] w(k)-2 & \text{if } \bar{\phi}_0 < 0.6 \text{ and } \bar{\phi}_1 < 0.3,\\[2pt] w(k) & \text{otherwise,} \end{cases} \qquad (8)$$

where $w(k)$ is the size of the target region at frame $k$. In Eq. (8), the expanding condition, $\bar{\phi}_{-1} > 0.8$, $\bar{\phi}_0 > 0.75$, and $\bar{\phi}_1 > 0.6$, means that the average contribution of all three layers retains a consistently high level of similarity and the pixels at the border are likely to be part of the target, so the target region should be enlarged. The contracting condition, $\bar{\phi}_0 < 0.6$ and $\bar{\phi}_1 < 0.3$, indicates situations where the average contribution of the pixels on the target border is rather small and decreases rapidly when moving out of the current target region; consequently, the target region should be reduced. Fig. 9 shows two examples of scale adaptation for target tracking.

Fig. 9. Scale adaptation results. (a) Adaptive tracking results of ping-pong sequence. (b) Adaptive tracking results of face sequence.
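A sketch of the update rule in Eq. (8) follows. The paper compares the layer averages against the region average of Eq. (6) but does not spell out the normalization, so interpreting the thresholds as the ratios $\bar{\phi}_i/\bar{\phi}$ is our reading; treat the function as illustrative rather than definitive.

```python
def adapt_scale(w_k, phi_bar, phi_layers):
    """Empirical scale update of Eq. (8).

    w_k        -- current target region size w(k)
    phi_bar    -- region-average per-pixel contribution, Eq. (6)
    phi_layers -- dict {-1: inner-layer avg, 0: border avg, +1: outer-layer avg}

    The ratio normalization below is an assumption of this sketch.
    """
    r = {i: phi_layers[i] / phi_bar for i in (-1, 0, 1)}
    if r[-1] > 0.8 and r[0] > 0.75 and r[1] > 0.6:
        return w_k + 2   # border still target-like: expand
    if r[0] < 0.6 and r[1] < 0.3:
        return w_k - 2   # contribution collapses at the border: contract
    return w_k           # otherwise keep the current size
```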
7. Probabilistic justification of the PPM

The similarity measure formulated in Eq. (3) evaluates the posterior probability of a target candidate being the true target.

Proof. Let $n_s$ be the number of pixels in the search region, $m$ the number of pixels in the target model, and $n_c$ the number of pixels in the target candidate. Suppose event $X$ represents that the current target candidate is the target in the search region. Without assessing the distributions of the regions, the prior probability $P(X)$ is given simply by the ratio of the number of pixels in the candidate region to that in the search region:

$$P(X) = \frac{n_c}{n_s}. \qquad (9)$$

Suppose event $Y_u$ represents that feature $u$ is detected in the search region; then

$$P(Y_u) = \frac{s_u}{n_s}, \qquad (10)$$

where $s_u$ is the $u$th element of the histogram vector of the search region, as defined in Eq. (3). Therefore, in the search region, the probability that a target candidate is the target and contains the $u$th feature of the target model is

$$P(XY_u) = P(X)P(Y_u|X) = \frac{n_c}{n_s}\cdot\frac{p_u}{n_c} = \frac{p_u}{n_s}, \qquad (11)$$

where $p_u$ is the $u$th element of the histogram vector of the target candidate, as defined in Eq. (3), and $P(Y_u|X)$ denotes the probability that the $u$th feature is detected in the candidate region. By Bayes' theorem, the posterior probability that the current target candidate is the target after detecting the $u$th feature is

$$P(X|Y_u) = \frac{P(X)P(Y_u|X)}{P(Y_u)} = \frac{p_u}{n_s}\Big/\frac{s_u}{n_s} = \frac{p_u}{s_u}. \qquad (12)$$

Taking $q_u/m$, the proportion of the $u$th feature in the target model, as the normalizing constant, the posterior probability associated with feature $u$ can be represented as $\frac{q_u}{m}\cdot\frac{p_u}{s_u}$, where $q_u$ is the $u$th element of the histogram vector of the target model. Examining all features $u = 1, \ldots, m_u$, we obtain the combined posterior probability of the target candidate being the target:

$$P(X|Y) = \sum_{u=1}^{m_u} \frac{q_u}{m}\,P(X|Y_u) = \sum_{u=1}^{m_u} \frac{q_u}{m}\cdot\frac{p_u}{s_u} = \frac{1}{m}\sum_{u=1}^{m_u} \frac{p_u q_u}{s_u}, \qquad (13)$$

which is the PPM proposed in Eq. (3). □

8. Experimental comparisons

8.1. Tracking performance with features shared by target and background

The mean-shift algorithm is a local search strategy toward the maximum of the similarity defined by a similarity measure. For each frame, the center of the target region is initialized and then updated iteratively as a contribution-weighted mean:

$$\hat{y}(i+1) = \frac{\sum_{j=1}^{m} x_j(i)\,\phi_j}{\sum_{j=1}^{m} \phi_j}, \qquad (14)$$

where $m$ is the number of pixels in the target region, $i$ is the iteration index, $x_j(i)$ is the position of pixel $j$ in the current region, $\phi_j$ is the similarity contribution of pixel $j$, and $\hat{y}(i+1)$ is the updated center position of the target region. The mean-shift update of the region center in Eq. (14) is applied successively and converges to a stationary position where the similarity is maximal or, at least, locally maximal. A more detailed description of the mean-shift search procedure can be found in Refs. [4,19]. With the PPM, as discussed in Section 6.1, the pixel contribution $\phi_j$ is

$$\phi_j = \frac{q_{u(j)}}{s_{u(j)}}. \qquad (15)$$

By iterating the mean shift of Eq. (14) with $\phi_j$ from Eq. (15), the target can be localized.
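Putting Eqs. (14) and (15) together gives the PPM mean-shift iteration sketched below. The pre-quantized `frame_bins` array, the square window, and the convergence tolerance are assumptions of this sketch, which also assumes the window stays inside the frame.

```python
import numpy as np

def ppm_mean_shift(frame_bins, center, half, q, s, max_iter=20, tol=0.5):
    """Iterate the weighted-mean update of Eq. (14) with the per-pixel
    PPM contributions phi_j = q_{u(j)} / s_{u(j)} of Eq. (15)."""
    cy, cx = center
    for _ in range(max_iter):
        ys = np.arange(int(round(cy)) - half, int(round(cy)) + half + 1)
        xs = np.arange(int(round(cx)) - half, int(round(cx)) + half + 1)
        bins = frame_bins[np.ix_(ys, xs)]
        phi = q[bins] / np.maximum(s[bins], 1)  # Eq. (15), zero-bin guard
        total = phi.sum()
        if total == 0:                          # no target-like pixels found
            break
        yy, xx = np.meshgrid(ys, xs, indexing="ij")
        ny = (yy * phi).sum() / total           # Eq. (14), row coordinate
        nx = (xx * phi).sum() / total           # Eq. (14), column coordinate
        converged = np.hypot(ny - cy, nx - cx) < tol
        cy, cx = ny, nx
        if converged:                           # stationary (local) maximum
            break
    return cy, cx
```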
Fig. 10 shows a comparison between the PPM mean-shift tracker and the conventional mean-shift tracker [4] on an ice hockey player sequence of 186 frames. Some color features coexist in both the target model and the background, such as the ice rink pixels shown in Fig. 1. The experiment shows that the PPM tracker provides superior tracking performance with far less bias than the conventional mean-shift tracker. Fig. 10 presents some comparison frames; for convenience of illustration, only the parts within the search regions (big squares) are shown, and the small squares indicate the localized targets.

Fig. 10. Comparison results of ice hockey sequence. (a) PPM mean-shift tracking results. (b) Conventional mean-shift tracking results.

Fig. 11 gives some representative comparisons between the PPM and three typical image trackers: the Kullback–Leibler divergence based trust-region tracker [30], the normalized cross-correlation based tracker [31], and the histogram intersection distance based tracker [32]. In the latter two trackers, an exhaustive search is utilized. In all the frames, the PPM-based tracker precisely tracks the target, whereas the trackers based on the other three measures exhibit occasional bias or mismatching, as shown in Fig. 11.

Fig. 11. Comparisons with the other similarity measures. (The correct trackings are obtained by the PPM-based tracker; the other results are obtained by the trackers indicated in the subtitles.) (a) Comparisons with the Kullback–Leibler divergence based trust-region tracker. (b) Comparisons with normalized cross-correlation. (c) Comparisons with histogram intersection distance.

8.2. Tracking performance on a sequence of poor imaging quality

Fig. 12 gives some comparison results for tracking a ping-pong ball whose image is severely blurred by its speed. In this case the conventional mean-shift tracker fails to localize the target (Fig. 12(b)), whereas the PPM still produces satisfactory localization (Fig. 12(a)).

Fig. 12. Comparison results of ping-pong sequence. (a) PPM tracking results. (b) Tracking results of conventional mean-shift tracker.

8.3. Tracking performance under rotation and occlusion

To track a moving target under possible rotation and occlusion, Fig. 13 compares the PPM-based tracker with the foreground–background texture discrimination tracker [15]. That tracker conceives of tracking as building a distinction between foreground and background, thereby transforming tracking into classification. Nguyen and Smeulders [15] reported that their method outperforms several state-of-the-art trackers, such as SSD [33], WSL [34], and Collins and Liu [16], but also indicated that it may fail when features similar to the target appear in the background. In the comparison experiment of Fig. 13, the mark '95' on the T-shirt is tracked. Due to the person's motion, the mark can be rotated, deformed, and even partially occluded. The PPM-based tracker shows satisfactory results, whereas the foreground–background texture discrimination tracker fails in some frames, such as frames 2, 3, 4, and 6 in Fig. 13(b). In both trackers, the same adaptive scaling method [19] is employed.

Fig. 13. Comparisons under rotations and occlusions. (a) PPM tracking results. (b) Tracking results of foreground–background texture discrimination tracker.

8.4. Application in robot soccer

The proposed measure and scale adaptation method have also been successfully applied to our robot soccer team for ball tracking. Fig. 14 gives some football tracking results. On a 2.4 GHz computer, for an image of 640 × 480 pixels, each matching computation takes only 2.4 ms.

Fig. 14. Football tracking results.

9. Conclusion

For blob matching, the existing similarity measures may produce matching bias or misidentification due to background interference, which creates the need for a more robust and effective similarity measure. By taking the statistics of a search region as a statistical estimate of the background pixels, a new measure for blob matching, the PPM, was proposed. This measure effectively attenuates the influence of the background pixels.
In comparison with the other existing similarity measures, the PPM has a more distinctly monotonic, peak-like distribution and achieves higher matching accuracy. This paper showed that the PPM is in fact the posterior probability of a target candidate being identified as the target, and developed a pixel-wise algorithm for PPM-based image tracking. A novel and simple scale adaptation algorithm was also presented. Comparison experiments between the PPM and the existing similarity measures were carried out for image tracking and demonstrated more precise and robust tracking performance.

Acknowledgments

We would like to thank Dr. John Baruch for proofreading the paper, and Dr. Liangfu Li and Yin Tao for their helpful discussions. The research is supported by the National Natural Science Foundation of China under Grant no. 60475023, the National Doctoral Foundation of China under Grant no. 20050698032, the National Basic Research Program of China (973 Program) under Grant no. 2007CB311006, and the Hi-tech Research and Development Program of China (863 Program) under Grant no. 2006AA04Z222.

References

[1] C. Yang, R. Duraiswami, L. Davis, Efficient mean-shift tracking via a new similarity measure, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
[2] B. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, in: Proceedings of the International Joint Conference on Artificial Intelligence, 1981, pp. 674–679.
[3] G. Hager, P. Belhumeur, Efficient region tracking with parametric models of geometry and illumination, IEEE Trans. Pattern Anal. Mach. Intell. 20 (10) (1998) 1025–1039.
[4] D. Comaniciu, V. Ramesh, P. Meer, Kernel-based object tracking, IEEE Trans. Pattern Anal. Mach. Intell. 25 (5) (2003) 564–577.
[5] M. Isard, A. Blake, Contour tracking by stochastic propagation of conditional density, in: Proceedings of the European Conference on Computer Vision, Cambridge, UK, 1996, pp. 343–356.
[6] P. Fieguth, D. Terzopoulos, Color based tracking of heads and other mobile objects at video frame rates, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, 1997, pp. 21–27.
[7] J. Yang, A. Waibel, A real-time face tracker, in: Proceedings of WACV, Sarasota, FL, 1996, pp. 142–147.
[8] C.R. Wren, A. Azarbayejani, T. Darrell, A. Pentland, Pfinder: real-time tracking of the human body, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 780–785.
[9] G. Bradski, Computer vision face tracking for use in a perceptual user interface, Intel Technol. J. 2 (Q2) (1998).
[10] C. Sminchisescu, B. Triggs, Kinematic jump processes for monocular 3D human tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, Madison, WI, 2003, pp. 69–76.
[11] G. Cheung, S. Baker, T. Kanade, Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, Madison, WI, 2003, pp. 77–84.
[12] M.J. Black, A.D. Jepson, Eigentracking: robust matching and tracking of articulated objects using a view-based representation, Int. J. Comput. Vision 26 (1) (1998) 63–84.
[13] S. Avidan, Support vector tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, Kauai, HI, 2001, pp. 184–191.
[14] O. Williams, A. Blake, R. Cipolla, A sparse probabilistic learning algorithm for real-time tracking, in: Proceedings of the International Conference on Computer Vision, Nice, France, 2003, pp. 353–360.
[15] H.T. Nguyen, A.W.M. Smeulders, Robust tracking using foreground–background texture discrimination, Int. J. Comput. Vision 69 (3) (2006) 277–293.
[16] R.T. Collins, Y. Liu, M. Leordeanu, Online selection of discriminative tracking features, IEEE Trans. Pattern Anal. Mach. Intell. 27 (10) (2005) 1631–1643.
[17] S. Avidan, Support vector tracking, IEEE Trans. Pattern Anal. Mach. Intell. 26 (8) (2004) 1064–1072.
[18] S. Avidan, Ensemble tracking, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2) (2007) 261–271.
[19] D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2000, pp. 142–149.
[20] D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. 24 (5) (2002) 603–619.
[21] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Anal. Mach. Intell. 17 (8) (1995) 790–799.
[22] G. Hager, M. Dewan, C. Stewart, Multiple kernel tracking with SSD, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, Washington, DC, 2004, pp. 790–797.
[23] D. Comaniciu, An algorithm for data-driven bandwidth selection, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2) (2003) 281–288.
[24] D. Comaniciu, V. Ramesh, P. Meer, The variable bandwidth mean shift and data-driven scale selection, in: Proceedings of the IEEE International Conference on Computer Vision, vol. 1, 2001, pp. 438–445.
[25] R.T. Collins, Mean-shift blob tracking through scale space, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2003, pp. II-234–240.
[26] A. Djouadi, O. Snorrason, F. Garber, The quality of training-sample estimates of the Bhattacharyya coefficient, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 92–97.
[27] T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Commun. Technol. 15 (1967) 52–60.
[28] J. Lin, Divergence measures based on the Shannon entropy, IEEE Trans. Inf. Theory 37 (1991) 145–151.
[29] T.-L. Liu, H.-T. Chen, Real-time tracking using trust-region methods, IEEE Trans. Pattern Anal. Mach. Intell. 26 (3) (2004) 397–402.
[30] T.-L. Liu, H.-T. Chen, Real-time tracking using trust-region methods, IEEE Trans. Pattern Anal. Mach. Intell. 26 (3) (2004) 397–402.
[31] S. Theodoridis, K. Koutroumbas, Pattern Recognition, Academic Press, New York, 2003, pp. 337–340.
[32] A. Joukhadar, A. Scheuer, Fast contact detection between moving deformable polyhedra, in: Proceedings of the IEEE International Conference on Intelligent Robots and Systems, vol. 3, 1999, pp. 1810–1815.
[33] B.D. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, in: Proceedings of the DARPA Image Understanding Workshop, 1981, pp. 121–130.
[34] A.D. Jepson, D.J. Fleet, T.F. El-Maraghi, Robust online appearance models for visual tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 2001, pp. 415–422.

About the Author—ZUREN FENG received his M.Eng. and Ph.D. degrees in Information and Control Engineering from Xi'an Jiaotong University, Xi'an, China, in 1982 and 1988, respectively. Since 1994, he has been a Professor in the Systems Engineering Institute, Xi'an Jiaotong University.
In 1992, he worked as a visiting scholar at INRIA, France, on manipulator control with flexible joints and applications of Petri nets in DEDS. In 1994, he was invited by Kassel University, Germany, for research on mobile service robots. Professor Feng has been Deputy Dean of the School of Electronics and Information Engineering at Xi'an Jiaotong University since 2001. From 1998 to 2001, he was Deputy Dean of the Academy of Engineering and Science and Head of the Systems Engineering Institute at Xi'an Jiaotong University. He is a member of the Committee of Deep Space Exploration Technology, Chinese Society of Astronautics, a member of the Council of the Shaanxi Society of Automation, China, and a member of the Academic Council of Chinese Key Labs in Manufacturing. He won the National Outstanding Achievement Award for Ph.D. Scholars in 1991. His research interests include robotics and automation, multi-agent systems, intelligent optimization, adaptive control, and vision-based robot navigation. He has been involved in many academic and practical projects and has published nearly 100 papers on relevant topics.

About the Author—NA LU is a Ph.D. candidate in the Systems Engineering Institute, Xi'an Jiaotong University, P.R. China. Her research interests include visual tracking, image matching, feature extraction, and mobile robotics.

About the Author—PING JIANG received his B.Eng., M.Eng., and Ph.D. degrees in Information and Control Engineering from Xi'an Jiaotong University, Xi'an, P.R. China, in 1985, 1988, and 1992, respectively. He joined the Department of Electrical Engineering at Tongji University, Shanghai, as a Lecturer in 1992 and became an Associate Professor in 1994. Since 1997, he has been a Professor in the Department of Information and Control Engineering, Tongji University, Shanghai, P.R. China. From 1998 to 2000, he worked as an Alexander von Humboldt Research Fellow in the Lehrstuhl fuer Allgemeine und Theoretische Elektrotechnik, Universitaet Erlangen-Nuernberg, Germany. From 2002 to 2003, he was a Senior Research Fellow at Glasgow Caledonian University for the IST project DIECoM. Since 2003, he has held a lecturer appointment in Cybernetics and Virtual Systems at the University of Bradford. His research interests include intelligent control and intelligent robots; distributed artificial intelligence and multi-agent systems; and distributed control networks and their applications. He has been involved in over 30 projects and has published over 100 papers on relevant topics.