Posterior probability measure for image matching
Zuren Feng a,∗ , Na Lu a , Ping Jiang b,c
a State Key Laboratory for Manufacturing Systems Engineering, Systems Engineering Institute, Xi’an Jiaotong University, Xi’an 710049, China
b Department of Computing, University of Bradford, Bradford BD71DP, UK
c Tongji University, Shanghai 201804, China
Received 2 July 2007; received in revised form 12 December 2007; accepted 16 December 2007
Abstract
Template matching is one of the principal techniques in visual tracking. Various similarity measures have been developed to find the target
in an acquired image by matching with a template. However, mismatching or misidentification may sporadically occur due to the influence of
the background pixels included in the designated target model. Taking into account the statistical features of a search region, a novel similarity
measure is proposed, which can decrease the interference of the background pixels enclosed in the model. It highlights the significant target
features and at the same time reduces the influence of the features shared by both the target and the background. It exhibits an excellent
monotonic property and a distinct peak-like distribution. This new measure is also demonstrated to have a direct interpretation as a posterior probability and is named the posterior probability measure (PPM). The proposed PPM can be obtained through a pixel-wise computation and
exhibits suitability for image matching. The pixel-wise computation also enables a fast measure update after a target region has changed, which
results in a new adaptive scaling method for tracking a target with a varying size. Experiments show that the PPM provides higher localization precision and a discriminatory power superior to existing similarity measures such as the Bhattacharyya coefficient, the Kullback–Leibler divergence, and normalized cross correlation. The effectiveness of the adaptive scaling method is also demonstrated in experiments.
© 2008 Elsevier Ltd. All rights reserved.
Keywords: Similarity measure; Image matching; Visual tracking; Posterior probability
1. Introduction
A variety of image tracking algorithms have been developed in recent years [1], including model-based methods [2,3], feature-based methods [4–6], knowledge-based
methods [7–11], learning-based methods [12–14], and
foreground–background discrimination methods [15–18].
Among these methods, the mean-shift algorithm has been
widely studied and applied in image segmentation and tracking
[4,19–22], for its simplicity and robustness. In the visual tracking field, the model-based mean-shift algorithm has achieved
prominent success because of its distinctive real-time performance. The foreground–background discrimination methods
[15–18] transform object tracking into classification. Classifiers such as support vector machine, linear discriminant
analysis, and variance ratio have been trained to discriminate
an object from a background. However, there are two crucial
aspects of visual tracking that demand particular attention: localization precision and target scale adaptation [19,23–25].
Model matching has been widely employed in the tracking methods mentioned above. For template-matching-based image tracking, a vital factor that influences the target localization precision is the selection of a proper similarity measure, which evaluates the degree of similarity between a template serving as the target model and a candidate region in the image. The most widely
used similarity measures include the Bhattacharyya coefficient
[26–28], Kullback–Leibler divergence [27,29,30], normalized
cross correlation [31], histogram intersection distance [32], and
so on. Among them, the Bhattacharyya coefficient has been applied in the mean-shift tracking algorithm. The Bhattacharyya
coefficient treats the target pixels and the background pixels
in a target model equally. Therefore, the background pixels
may interfere with the target matching and cause localization
bias or even mismatching. The same flaw also exists in other
similarity measures. What is more, the pixels shared by the
target and the background will also influence the matching result. Consequently, there is a necessity for the development of
a more reliable similarity measure which takes the background
interference into account. In this paper, a similarity measure, i.e. the posterior probability measure (PPM), is presented to deal with this issue by considering the statistical features of a search region.
During a visual tracking process, the scale of the tracked
target may continuously vary. In order to achieve precise localization, the target scale or the kernel bandwidth must be adaptive. The kernel bandwidth is a scale parameter that is closely
related to the target size when kernel-based methods such as
mean-shift are employed. Numerous methods have been developed for adaptive adjustments of the target scale. Comaniciu
et al. [19] proposed to apply the mean-shift procedure three
times with three different kernel bandwidths derived by adding
or reducing 10 percent of the current target size during each
repetition, from which the scale with the maximum Bhattacharyya coefficient is selected as optimal. This method works
well when the target becomes smaller, but may fail when the
target expands [25]. A data-driven scale selection method was
presented in Refs. [23,24]. By analyzing the local statistical
characteristics around each data point, the kernel bandwidth is
calculated. Collins developed a scheme for blob tracking over
the scale space [25]. A scale space mean-shift procedure is first
executed and then the spatial space calculation is processed
over multiple scales. All these methods are complicated and computationally demanding, which hinders their application to real-time tracking. The proposed PPM can be calculated by
following a concise pixel-wise computation so that a fast and
convenient adaptive scaling method can be developed. The visual tracking experiments reveal the excellent performance of
this method.
In short, there are two major contributions in this paper.
Firstly, a new similarity measure, the PPM, is proposed, which
is most appropriate for blob matching tasks. Secondly, this new
measure leads to the development of a fast and convenient
adaptive scaling method.
Experiments have shown that the PPM-based tracking
method can achieve more precise tracking than the other
similarity-measure-based algorithms under circumstances involving rotation, occlusion, or cluttered background. The
peak-like shape of the similarity distribution of the PPM is
more distinct, which improves both tracking accuracy and robustness. For the PPM, the influence of the background pixels
is suppressed by considering the statistics of a search region,
while the contribution from the actual target features is enhanced, which leads to the improvement in the precision of the
target localization. The constraint on the model scale is also relaxed because more background pixels can be tolerated in the model.
In addition, the PPM value of a candidate region can be simply
computed by adding up the individual contribution of each
pixel in the region. As a result, the adaptive scaling algorithm
can be implemented using pixel-wise computing.
The paper is organized as follows: Section 2 gives some preliminary definitions. Section 3 analyzes the performance of the existing similarity measures and the reasons for localization failure. The new similarity measure is proposed in Section 4, where comparisons of matching results between the PPM and the other similarity measures are also given. Section 5 analyzes the properties and limitations of the PPM. The adaptive scaling method is described in Section 6. Section 7 gives and demonstrates the probabilistic interpretation of the PPM. Visual tracking experiments are presented in Section 8. Section 9 concludes the paper.
2. Image matching and image feature analysis
2.1. Image matching
In this paper, the aim of image matching or blob matching is
to search for a sub-area within an image which is most similar
to a predefined target model. The obtained sub-area is the target region. All the sub-areas to be matched against the target
model are target candidates. Both the target model and the target candidates are characterized by a histogram vector. Thus,
we have
target model: $q = \{q_u\}_{u=1,\dots,m_u}$,

target candidate: $p = \{p_u\}_{u=1,\dots,m_u}$,

where $m_u$ is the dimension of the histogram. The similarity measure is a function evaluating the degree of similarity between the target model and a target candidate:

similarity measure: $\rho(i) \equiv \rho(q, p_i)$,

where i is the index of the target candidate and $p_i$ is the histogram of target candidate i. The target candidate with the largest similarity value will be recognized as the target. Therefore, an image matching task can be formulated as

$$\max_{i \in \{1,\dots,n_s\}} \rho(q, p_i), \qquad (1)$$

where $n_s$ is the number of target candidates in the search region. Usually, there is a rough prediction of the target location.
Therefore, it is not necessary to search the whole image. The
area searched for a match is called the search region.
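To make the formulation of Eq. (1) concrete, the sketch below enumerates square candidate windows over a search region and keeps the one maximizing a pluggable similarity measure. This is a minimal illustration rather than the paper's implementation: the grayscale histogram features, the 32-bin quantization, the square window, and the one-pixel search stride are all assumptions made here.

```python
import numpy as np

def region_histogram(region, n_bins=32):
    # Unnormalized histogram of a grayscale region (feature space and
    # bin count are illustrative assumptions, not fixed by the paper).
    h, _ = np.histogram(region, bins=n_bins, range=(0, 256))
    return h.astype(float)

def best_match(search_region, q, win, similarity, n_bins=32):
    # Eq. (1): evaluate every candidate window i in the search region
    # and return the one maximizing similarity(q, p_i).
    H, W = search_region.shape
    best_score, best_pos = -np.inf, None
    for y in range(H - win + 1):
        for x in range(W - win + 1):
            p = region_histogram(search_region[y:y + win, x:x + win], n_bins)
            score = similarity(q, p)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```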
2.2. Image feature analysis
In order to enclose the whole target, it is inevitable that background pixels are included in the target model. When the background is cluttered or the imaging quality is poor, these pixels may interfere with the localization of the true target and lead to biased localization or misidentification.
Figs. 1 and 2 provide two examples to illustrate the aforementioned background interference. It can be easily observed
that there exist background pixels in the target models. In the
ice hockey player model (Fig. 1), in addition to the player,
which is the tracked target, there are also a lot of background
pixels of the ice rink. Fig. 2 is an image taken while the ping-pong ball bounces quickly. In the corresponding target model, the background pixels are interlaced with the ping-pong pixels.
2424
Z. Feng et al. / Pattern Recognition 41 (2008) 2422 – 2433
Fig. 1. Ice hockey image and target model.
Fig. 2. Ping-pong image and target model.
Due to the movement of the target or the camera, complications such as partial occlusion or low image quality may arise, causing the target to lose dominance in the image. On the other hand, the background pixels present in the target model may appear in the image in large quantities. Without considering background interference, a high similarity value can then no longer indicate the appearance of the target, which is the main reason for incorrect target localization. Although excluding the background pixels from the target model may be very difficult, as in the case of the ping-pong model (Fig. 2), a new similarity measure that weights target pixels and background pixels differently could be a solution.
Fig. 3. Matching instance of ice hockey player. (a) Correct target region and its enlargement. (b) Biased target region obtained by the Bhattacharyya coefficient and its enlargement.

Table 1. Matching performance analysis of the Bhattacharyya coefficient for the ice hockey instance

                                  Target model   Correct target region   Biased target region
Target pixels                     621 (46%)      858 (61%)               421 (30%)
Background pixels                 749 (54%)      294 (21%)               887 (63%)
Bhattacharyya coefficient value   –              0.805                   0.884

3. Performance analysis of the existing similarity measures

There exist many kinds of similarity measures and distance metrics, such as the Euclidean distance, the Bhattacharyya coefficient, cross correlation, normalized cross correlation, the Kullback–Leibler divergence, etc. Among these measures, the Bhattacharyya coefficient and the Kullback–Leibler divergence have found wide application [4,29]. Here we take the Bhattacharyya coefficient as an example to analyze the matching performance of the existing similarity measures, which treat all the pixels in the target model identically.

3.1. Bhattacharyya coefficient

The Bhattacharyya coefficient is a popular similarity measure which has been employed in the well-known mean-shift tracking algorithm. The Bhattacharyya coefficient between p and q is formulated as

$$\rho(p, q) = \sum_{u=1}^{m_u} \sqrt{p_u \cdot q_u}, \qquad (2)$$

where p and q are the histogram vectors of the target candidate and the target model, respectively, $p_u$ and $q_u$ are the uth elements of the vectors p and q, and $m_u$ is the number of bins of the histogram.
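For reference in the analysis below, Eq. (2) transcribes directly into code. Since the Bhattacharyya coefficient is defined on normalized histograms (its value then lies in [0, 1]), the sketch normalizes p and q first; this normalization convention is assumed here.

```python
import numpy as np

def bhattacharyya(p, q):
    # Eq. (2): rho(p, q) = sum_u sqrt(p_u * q_u), with p and q
    # normalized so that the coefficient lies in [0, 1].
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))
```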
Fig. 4. Matching instance of ping-pong sequence. (a) Correct target region
and its enlargement. (b) Mistaken target region obtained by the Bhattacharyya
coefficient and its enlargement.
Table 2. Matching performance analysis of the Bhattacharyya coefficient for the ping-pong instance

                                  Target model   Correct target region   Mistaken target region
Target pixels                     580 (80%)      218 (30%)               0
Background pixels                 145 (20%)      3 (0.4%)                515 (71%)
Bhattacharyya coefficient value   –              0.167                   0.276
3.2. Mismatch analysis of the Bhattacharyya coefficient
When the Bhattacharyya coefficient is applied in visual tracking, localization bias or even mismatch may occur occasionally.
In this section, the causes of the matching failures are analyzed
and illustrated through two instances, in which the pixels of the
target models can be divided into two categories, target pixels
and background pixels.
Fig. 3 shows a matching instance of the ice hockey player
using the same target model as in Fig. 1. Fig. 3(a) is the correct
target region. Fig. 3(b) is the localization result obtained by the
Bhattacharyya coefficient which is clearly a biased target region. Table 1 gives a comparison between the case of the correct
target region in Fig. 3(a) and the case of the biased target region
in Fig. 3(b) in terms of Bhattacharyya coefficient values, and
the pixel counts of the target pixels and the background pixels.
In the correct target region, there are 61% target pixels and 21%
background pixels. The remaining 18% of the pixels are not included in the predefined target model and do not affect the similarity value. The corresponding Bhattacharyya coefficient is
0.805. However, in the biased region, there are 30% target pixels, 63% background pixels, and 7% others. The Bhattacharyya
coefficient of the biased region reaches 0.884, which is higher
than that of the correct target region. Therefore, matching with the Bhattacharyya coefficient produces a biased target region, as shown in Fig. 3(b). It can be deduced that the higher percentage of background pixels (63%) in Fig. 3(b) could cause the bias.
Fig. 4 and Table 2 provide another matching instance for the
case of the ping-pong model in Fig. 2. In this case, a complete
mismatch happens. It can be observed from Table 2 that the high
percentage of background pixels (71%) in the mistaken target
region has affected the Bhattacharyya coefficient and caused
the mismatch.
The above analysis leads to two conclusions. Firstly, when
the Bhattacharyya coefficient is employed as the similarity measure for blob matching, localization failures such as bias or
mismatch may take place. Secondly, the background pixels in a
target model may influence the similarity value notably, so that
the similarity value of other regions may even exceed that of
the correct target region. This is the main reason for matching
failures.
The other existing similarity measures, which do not differentiate between target pixels and background pixels, exhibit
similar problems to the Bhattacharyya coefficient.
4. Posterior probability measure
4.1. Derivation of the measure
As discussed above, the background pixels included in the target model may lead to mistaken localization. Differentiating between target pixels and background pixels is one way to avoid this. For target matching, it is common to search for the best matched target region within a predefined search region, which is much larger than the target region. In other words, a search region contains mainly background pixels and relatively few target pixels. Consequently, the histogram of the search region can provide clues to the statistical characteristics of the background pixels. Assuming s is the unnormalized histogram vector of the search region, a larger $s_u$ indicates that feature u is more likely to be a background feature, whereas for a target feature $s_u$ is likely to be small. Under this consideration, $1/s_u$ can be introduced into the existing similarity measures as a rectifying weight, reducing the weight given to background features and increasing the weight given to target features. Following this idea and taking cross
correlation [31] as a prototype, a new similarity measure can
be defined as
$$\rho(p, q) = \frac{1}{m} \sum_{u=1}^{m_u} \frac{p_u q_u}{s_u}, \qquad (3)$$

where $s_u$, $p_u$, and $q_u$ represent the uth elements of the unnormalized histogram vectors s, p, and q, respectively; p and q are the histogram vectors of the target candidate and the target model, respectively; and m is the number of pixels in the target model, which serves as a normalization constant.

It will be justified theoretically in Section 7 that the new measure $\rho(p, q)$ defined above in Eq. (3) evaluates the posterior probability of a target candidate being identified as the target, i.e. it is a posterior probability measure (PPM).

Table 3. PPM matching performance analysis for the ice hockey instance

                    Target model   Correct target region   Biased target region   Search region
Target pixels       621            858                     421                    1388
Background pixels   749            294                     887                    8840
PPM value           –              0.356                   0.232                  –

Table 4. PPM matching performance analysis for the ping-pong instance

                    Target model   Correct target region   Mistaken target region   Search region
Target pixels       580            218                     0                        302
Background pixels   145            3                       515                      6939
PPM value           –              0.196                   0.048                    –
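Eq. (3) likewise admits a direct transcription on unnormalized histograms; a minimal sketch follows. The guard against bins that are empty in the search region is an assumption added here; the paper does not discuss this case, and it cannot arise when the candidate lies inside the search region, since then $s_u \ge p_u$.

```python
import numpy as np

def ppm(p, q, s):
    # Eq. (3): rho(p, q) = (1/m) * sum_u p_u * q_u / s_u, where p, q, s
    # are the unnormalized candidate, model, and search-region histograms
    # and m = sum_u q_u is the number of pixels in the target model.
    m = q.sum()
    mask = s > 0                    # guard against empty bins (assumption)
    return float(np.sum(p[mask] * q[mask] / s[mask]) / m)
```

With the exhaustive matcher sketched in Section 2.1, the measure plugs in as, e.g., `best_match(image, q, 17, lambda q_, p_: ppm(p_, q_, s))`.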
4.2. Comparison analysis
For comparison with the matching performance of the Bhattacharyya coefficient shown in Tables 1 and 2, the matching of the ice hockey player and the ping-pong ball is carried out again using the PPM. Tables 3 and 4 show the results. The last columns of the tables give the numbers of pixels in the search regions, which indicate that the background pixels far outnumber the target pixels. As a result of introducing the weights $1/s_u$ in Eq. (3), the importance of the background pixels to the PPM is clearly weakened, whereas the importance of the target pixels is strengthened. The experimental results show that the target regions in both instances are localized correctly, as they exhibit higher PPM values than the biased or mistaken target regions (Tables 3 and 4).
Fig. 5. Similarity distribution comparisons. (a) PPM measure similarity distribution. (b) Bhattacharyya similarity distribution. (c) Localization results of PPM.
5. Matching performance evaluation of the PPM
One common characteristic of the existing similarity measures is that they treat the target pixels and the background pixels in a target model equally, unlike the proposed PPM which
treats them differently. Taking the Bhattacharyya coefficient as
their representative, this section will evaluate the matching performance of the PPM in comparison with them.
A similarity measure is often expected to have a distinctly
sharp peak-like distribution in order to achieve a reliable match.
Therefore, the distribution shapes of two similarity measures
are compared first as shown in Fig. 5, where the big squares in
the images indicate the search regions and the small ones indicate the obtained target regions. From the figure, it is clear that
the similarity distribution of the PPM has a more distinct and
sharper peak than the distribution of the Bhattacharyya coefficient. This phenomenon can be explained by the suppressing
capability of the PPM on background pixels. A pixel further
from the target center means that this pixel is more likely to
be a background pixel. Its importance in the PPM is attenuated
more as a result. Therefore, the distribution of the PPM tends to be shaped into a sharper peak than that of the Bhattacharyya coefficient.
Because of the background suppressing capability of the PPM, we can also expect a match that is more reliable and more tolerant of a varying model scale. A target model with a larger scale generally encloses more background pixels, and the corresponding similarity measure suffers more background interference. This may lead to mistaken localization when the existing similarity measures are used. The PPM, however, can weaken this interference effect. In Fig. 6, the scales of the three target models are 17 × 17, 29 × 29, and 39 × 39, respectively. Even though the target model scales are greatly enlarged in the second and third images, with more background pixels enclosed, the PPM distributions still retain monotonic and distinct peak shapes, as shown in Fig. 6(a), whereas the Bhattacharyya-coefficient-based ones are damaged.
Finally, we can expect that the PPM can effectively improve
the localization accuracy in comparison with the Bhattacharyya
coefficient. Fig. 7(a) depicts the similarity distributions of the
PPM and the corresponding matching results. Fig. 7(b) presents
the cases of the Bhattacharyya coefficient. The PPM correctly
localizes the targets, whereas the Bhattacharyya coefficient results in bias or mismatch because of the background interference.
6. Computation property and scale adaptation
The computation of the Bhattacharyya coefficient is feature-wise, as shown in Eq. (2): the coefficient is the summation of $\sqrt{p_u \cdot q_u}$ over each feature u. The PPM computation given in Eq. (3) is feature-wise too, but it can be transferred into a pixel-wise form, as shown in Section 6.1. The pixel-wise computation allows the PPM of a region to be obtained by accumulating the effects of individual pixels, from one pixel to another. This simplifies scale adaptation (Section 6.2). In contrast, the Bhattacharyya coefficient cannot be computed in a pixel-wise manner straightforwardly because of the square root operation involved.

Fig. 6. Target model scale tolerance comparisons. (a) PPM similarity distribution. (b) Bhattacharyya coefficient similarity distribution. (c) Localization results of PPM.

Fig. 7. Matching accuracy comparisons. (a) The PPM-based similarity distributions and localization results. (b) Bhattacharyya-coefficient-based similarity distributions and localization results.

Fig. 8. Scale adaptation illustration.

6.1. Pixel-wise computation of the new measure

In Eq. (3), $p_u$ represents the number of pixels in a target candidate belonging to histogram bin (or feature) u, i.e. $p_u = \sum_{\{j: B(j)=u\}} 1$, where B(j) is the bin that pixel j belongs to. Substituting this into Eq. (3), we have

$$\rho(p, q) = \frac{1}{m} \sum_{u=1}^{m_u} \frac{q_u}{s_u} \sum_{\{j: B(j)=u\}} 1
= \frac{1}{m} \left( \sum_{\{j: B(j)=1\}} \frac{q_1}{s_1} + \sum_{\{j: B(j)=2\}} \frac{q_2}{s_2} + \cdots + \sum_{\{j: B(j)=m_u\}} \frac{q_{m_u}}{s_{m_u}} \right). \qquad (4)$$

Because $\{j: B(j)=1\} \cup \{j: B(j)=2\} \cup \cdots \cup \{j: B(j)=m_u\} = \{1, \dots, m\}$, where $m_u$ is the number of bins and m is the total number of pixels in the region, we can rearrange Eq. (4) into the pixel-wise form

$$\rho(p, q) = \frac{1}{m} \sum_{j=1}^{m} \frac{q_u(j)}{s_u(j)}, \qquad (5)$$

where j is the pixel index, $q_u(j)$ is the uth bin value of the target model histogram with u = B(j), and $s_u(j)$ is the uth bin value of the search region histogram with u = B(j). Obviously, the value of $q_u(j)/s_u(j)$ can be regarded as the contribution of pixel j to the similarity value. Thus, the PPM can be computed by accumulating the individual pixels' contributions.
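The equivalence of the feature-wise form, Eq. (3), and the pixel-wise form, Eq. (5), can be checked numerically. The sketch below uses synthetic data and realizes the bin assignment B(j) as uniform gray-level quantization; both choices are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 16
bin_width = 256 // n_bins
model  = rng.integers(0, 256, size=(17, 17))   # synthetic target model
cand   = rng.integers(0, 256, size=(17, 17))   # synthetic target candidate
search = rng.integers(0, 256, size=(80, 80))   # synthetic search region

def hist(img):
    # Unnormalized gray-level histogram over n_bins equal-width bins.
    return np.histogram(img, bins=n_bins, range=(0, 256))[0].astype(float)

q, p = hist(model), hist(cand)
s = np.maximum(hist(search), 1)                # guard empty bins (assumption)
m = q.sum()                                    # m: pixels in the target model

rho_feature = np.sum(p * q / s) / m            # feature-wise PPM, Eq. (3)

B = cand.ravel() // bin_width                  # B(j): bin index of pixel j
rho_pixel = np.sum(q[B] / s[B]) / m            # pixel-wise PPM, Eq. (5)

assert np.isclose(rho_feature, rho_pixel)      # the two forms agree
```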
6.2. Scale adaptation
Scale adaptation is an adaptive process to fit a target region to
a variable target scale for the purpose of precise target tracking.
The pixel-wise PPM in Eq. (5) provides an efficient formula for
scale adaptation because it is simply a linear combination of
contributions from individual pixels. Suppose a target has been localized in a target region of size w(k) (width and height) at the kth frame of a video and its PPM $\rho(p(y^*), q)$ has been obtained, where $y^*$ denotes the current target center position,
$p(y^*)$ is the corresponding target histogram vector, and q is the target model. We can average the PPM to obtain the approximate contribution of each pixel to the similarity measure:

$$\bar{\rho} = \frac{\rho(p(y^*), q)}{m}, \qquad (6)$$

where m is the number of pixels in the target region.

Due to the motion of the target, the scale of the target may change, and the size of the target region has to adapt to this change. This is done by examining the average similarity contribution of the pixels in the inner and outer layers of the target region against the average contribution of the pixels on the current target region border. Fig. 8 illustrates the first inner layer, the border, and the first outer layer of a target region. The average similarity contributions of the pixels in these layers are denoted as

$$\bar{\rho}_i, \quad i = -a, \dots, 0, \dots, a, \qquad (7)$$

where i < 0 denotes the ith inner layer, i > 0 the ith outer layer, and i = 0 the target region border; a is the comparison step of scale adaptation and is set to 1 without loss of generality.

An empirical scale adaptation formula, which has been applied in a large number of experiments and has shown good tracking performance, is

$$w(k+1) = \begin{cases} w(k) + 2 & \text{if } \bar{\rho}_{-1} > 0.8 \text{ and } \bar{\rho}_0 > 0.75 \text{ and } \bar{\rho}_1 > 0.6, \\ w(k) - 2 & \text{if } \bar{\rho}_0 < 0.6 \text{ and } \bar{\rho}_1 < 0.3, \\ w(k) & \text{otherwise}, \end{cases} \qquad (8)$$

where w(k) is the size of the target region at frame k.

In Eq. (8), the expanding condition, $\bar{\rho}_{-1} > 0.8$, $\bar{\rho}_0 > 0.75$, and $\bar{\rho}_1 > 0.6$, means that the average contributions of all three layers retain a consistently high level of similarity, so the pixels at the border are likely to be part of the target and the target region should be enlarged. The contracting condition, $\bar{\rho}_0 < 0.6$ and $\bar{\rho}_1 < 0.3$, indicates situations where the average contribution of the pixels on the target border is rather small and decreases rapidly when moving out of the current target region; consequently, the target region should be reduced. Fig. 9 shows two examples of scale adaptation for target tracking.

Fig. 9. Scale adaptation results. (a) Adaptive tracking results of ping-pong sequence. (b) Adaptive tracking results of face sequence.
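Once the layer averages of Eq. (7) are available, the rule of Eq. (8) reduces to a few comparisons. The sketch below assumes square regions, a = 1, and a caller that supplies the mean per-pixel contributions $q_u(j)/s_u(j)$ of the first inner layer, the border, and the first outer layer; the function and argument names are mine, not the paper's.

```python
def adapt_scale(w_k, rho_inner, rho_border, rho_outer):
    # Eq. (8): update the target region size w(k) -> w(k+1) from the
    # average per-pixel contributions (Eq. (7)) of the first inner
    # layer (i = -1), the border (i = 0), and the first outer layer (i = 1).
    if rho_inner > 0.8 and rho_border > 0.75 and rho_outer > 0.6:
        return w_k + 2          # expand: border pixels are still target-like
    if rho_border < 0.6 and rho_outer < 0.3:
        return w_k - 2          # contract: contributions fall off at the border
    return w_k                  # otherwise keep the current size
```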
7. Probabilistic justification of the PPM

The similarity measure formulated in Eq. (3) evaluates the posterior probability of a target candidate being the true target.
Proof. Let $n_s$ be the number of pixels in the search region, m the number of pixels in the target model, and $n_c$ the number of pixels in the target candidate.

Suppose event X represents that the current target candidate is the target in the search region. Without assessing the distributions of the regions, the prior probability P(X) can be obtained simply as the ratio of the number of pixels in the candidate region to that in the search region:

$$P(X) = \frac{n_c}{n_s}. \qquad (9)$$

Suppose event $Y_u$ represents that feature u is detected in the search region; then

$$P(Y_u) = \frac{s_u}{n_s}, \qquad (10)$$

where $s_u$ is the uth element of the histogram vector of the search region, as defined in (3). Therefore, in the search region, the probability that a target candidate is the target and contains the uth feature of the target model is

$$P(XY_u) = P(X)\,P(Y_u|X) = \frac{n_c}{n_s} \cdot \frac{p_u}{n_c} = \frac{p_u}{n_s}, \qquad (11)$$

where $p_u$ is the uth element of the histogram vector of the target candidate, as defined in (3), and $P(Y_u|X)$ denotes the probability that the uth feature is detected in the candidate region. According to Bayes' theorem, the posterior probability that the current target candidate is the target after detecting the uth feature is

$$P(X|Y_u) = \frac{P(X)\,P(Y_u|X)}{P(Y_u)} = \frac{p_u}{n_s} \Big/ \frac{s_u}{n_s} = \frac{p_u}{s_u}. \qquad (12)$$

Taking $q_u/m$, the ratio of the uth feature in the target model, as the normalizing constant, the posterior probability contributed by feature u can be represented as $\frac{q_u}{m} \cdot \frac{p_u}{s_u}$, where $q_u$ is the uth element of the histogram vector of the target model. Examining all features, $u = 1, \dots, m_u$, we obtain the combined posterior probability of the target candidate being the target:

$$P(X|Y) = \sum_{u=1}^{m_u} \frac{q_u}{m}\, P(X|Y_u) = \sum_{u=1}^{m_u} \frac{q_u p_u}{m\, s_u} = \frac{1}{m} \sum_{u=1}^{m_u} \frac{p_u q_u}{s_u}, \qquad (13)$$

which is the PPM proposed in Eq. (3). □
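The chain of Eqs. (9)–(13) can be sanity-checked on made-up counts: weighting the per-feature posteriors $p_u/s_u$ of Eq. (12) by $q_u/m$ reproduces the direct evaluation of Eq. (3). The histogram values below are arbitrary and serve only this check.

```python
import numpy as np

rng = np.random.default_rng(1)
m_u = 8                                        # number of bins (arbitrary)
q = rng.integers(1, 50, m_u).astype(float)     # target model histogram
p = rng.integers(0, 50, m_u).astype(float)     # target candidate histogram
s = p + rng.integers(1, 500, m_u)              # search-region histogram, s_u >= p_u
m = q.sum()                                    # pixels in the target model

posterior_u = p / s                            # Eq. (12): P(X | Y_u) = p_u / s_u
combined = np.sum((q / m) * posterior_u)       # Eq. (13): weighted combination
direct = np.sum(p * q / s) / m                 # Eq. (3): the PPM directly
assert np.isclose(combined, direct)
```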
8. Experimental comparisons
8.1. Tracking performance with features shared by target and
background
The mean-shift algorithm is a local search strategy toward the maximum similarity defined by a similarity measure. For each frame, the center of the target region is initialized and then updated iteratively as the similarity-contribution-weighted mean of the pixel positions:

$$\hat{y}(i+1) = \frac{\sum_{j=1}^{m} x_j(i)\, \rho_j}{\sum_{j=1}^{m} \rho_j}, \qquad (14)$$

where m is the number of pixels in the target region, i is the iteration index, $x_j(i)$ is the position of pixel j in the current region, $\rho_j$ is the similarity contribution of pixel j, and $\hat{y}(i+1)$ is the updated center position of the target region.
The region center in Eq. (14) is updated successively and converges to a stationary position where the similarity is maximal or, at least, locally maximal. A more detailed description of the mean-shift search procedure can be found in Refs. [4,19].
With the PPM, as discussed in Section 6.1, the pixel contribution $\rho_j$ can be represented as

$$\rho_j = \frac{q_u(j)}{s_u(j)}. \qquad (15)$$
By iterating the mean shift of Eq. (14) with $\rho_j$ given by Eq. (15), the target can be localized.
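A sketch of the PPM mean-shift iteration of Eqs. (14) and (15) follows: every pixel in the current window votes for its coordinates with weight $q_u(j)/s_u(j)$. The gray-level bin assignment, the iteration cap, and the assumption that the window stays inside the frame are simplifications made here, not details from the paper.

```python
import numpy as np

def ppm_mean_shift(frame, center, win, q, s, n_bins=16, max_iter=20):
    # Iterate Eq. (14) with per-pixel weights rho_j = q_u(j)/s_u(j)
    # (Eq. (15)) until the window center stops moving.
    # `frame` is a 2-D integer (grayscale) array; `q` and `s` are the
    # model and search-region histograms over n_bins bins.
    half = win // 2
    bin_width = 256 // n_bins
    for _ in range(max_iter):
        cy, cx = center
        patch = frame[cy - half:cy + half + 1, cx - half:cx + half + 1]
        B = patch // bin_width                     # B(j): bin of pixel j
        w = q[B] / np.maximum(s[B], 1e-9)          # rho_j, guarded (assumption)
        ys, xs = np.mgrid[cy - half:cy + half + 1, cx - half:cx + half + 1]
        denom = max(w.sum(), 1e-9)
        new = (int(round((ys * w).sum() / denom)), # Eq. (14): weighted mean
               int(round((xs * w).sum() / denom)))
        if new == center:                          # converged to a fixed point
            return center
        center = new
    return center
```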
Fig. 10 shows a comparison between the PPM mean-shift tracker and the conventional mean-shift tracker [4] on the ice hockey player sequence, which contains 186 frames. Some color features coexist in both the target model and the background, such as the ice rink pixels shown in Fig. 1. The experiment shows that the PPM tracker provides superior tracking performance, with far less bias, than the conventional mean-shift tracker. For convenience of illustration, only the parts within the search regions (big squares) are shown; the small squares indicate the localized targets.
Fig. 11 gives some representative comparisons between
the PPM and three typical image trackers, which are the
Kullback–Leibler divergence based trust-region tracker [30],
the normalized cross-correlation based tracker [31], and the
histogram intersection distance-based tracker [32]. In the latter
two trackers, an exhaustive search is utilized. In all the frames,
the PPM-based tracker precisely tracks the target, whereas the trackers based on the other three measures exhibit occasional bias or mismatching, as shown in Fig. 11.
8.2. Tracking performance with poor imaging quality sequence
Fig. 12 gives some comparison results for tracking a ping-pong ball, whose image is severely blurred because of its speed. In this case the conventional mean-shift tracker fails to localize the target (see Fig. 12(b)), whereas the PPM still produces satisfactory localization (see Fig. 12(a)).
8.3. Tracking performance under rotation and occlusion
To track a moving target with possible rotation and occlusion, Fig. 13 gives some comparisons between the PPM-based tracker and the foreground–background texture discrimination tracker [15]. The latter treats tracking as the development of a distinction between foreground and background; tracking is thereby transformed into classification. Nguyen and Smeulders [15] claimed that their tracking method outperforms several state-of-the-art trackers, such as SSD [33], WSL [34], and Collins and Liu [16]. It was also indicated in Ref. [15] that the tracker may fail when the background contains features similar to the target. A comparison experiment is carried out as shown in Fig. 13, where the mark '95' on the T-shirt is tracked. Due to the motion of the person, the mark can be rotated, deformed, and even partially occluded. The PPM-based tracker shows satisfactory tracking results, whereas the foreground–background texture discrimination tracker fails in some frames, such as frames 2, 3, 4, and 6 in Fig. 13(b). In both trackers, the same adaptive scaling method [19] is employed.
Fig. 10. Comparison results of ice hockey sequence. (a) PPM mean-shift tracking results. (b) Conventional mean-shift tracking results.
Fig. 11. Comparisons with the other similarity measures. (The correct tracking results are obtained by the PPM-based tracker; the other results are obtained by the trackers indicated in the subcaptions.) (a) Comparisons with the Kullback–Leibler divergence based trust-region tracker. (b) Comparisons with normalized cross-correlation. (c) Comparisons with histogram intersection distance.
Fig. 13. Comparisons under rotations and occlusions. (a) PPM tracking results.
(b) Tracking results of foreground–background texture discrimination tracker.
Fig. 12. Comparison results of ping-pong sequence. (a) PPM tracking results.
(b) Tracking results of conventional mean-shift tracker.
Fig. 14. Football tracking results.
8.4. Application in robot soccer
The proposed measure and scale adaptation method have
also been successfully applied to our robot soccer team for ball
tracking. Fig. 14 gives some football tracking results. On a 2.4 GHz computer, each matching computation for an image of 640 × 480 pixels takes only 2.4 ms.
9. Conclusion
For blob matching, the existing similarity measures may produce matching bias or misidentification due to background interference, which creates the need for a more robust and effective similarity measure. By taking the statistics of a search region as an estimate of the background statistics, a new measure for blob matching, the PPM, was proposed. This measure can effectively attenuate the influence of the background
pixels. In comparison with the other existing similarity measures, the PPM has a more distinctly monotonic and peak-like
distribution and can achieve higher matching accuracy. This
paper showed that the PPM is in fact the posterior probability
of a target candidate being identified as a target and developed
a pixel-wise algorithm for the PPM-based image tracking. A
novel and simple scale adaptation algorithm was also presented.
Comparison experiments between the PPM and the existing
similarity measures were carried out for image tracking and
demonstrated a more precise and robust tracking performance.
Acknowledgments
We would like to thank Dr. John Baruch for proofreading the
paper. We would also like to thank Dr. Liangfu Li and Yin Tao
for their helpful discussions. The research is supported by the
National Natural Science Foundation of China under Grant no.
60475023, National Doctoral Foundation of China under Grant
no. 20050698032, National Basic Research Program (973 Program) under Grant no. 2007CB311006, and Hi-tech Research
and Development Program of China (863 Program) under Grant
no. 2006AA04Z222.
References
[1] C. Yang, R. Duraiswami, L. Davis, Efficient mean-shift tracking via a new
similarity measure, in: Proceedings of the 2005 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition (CVPR’05),
2005.
[2] B. Lucas, T. Kanade, An iterative image registration technique with an
application to stereo vision, in: Proceedings of the International Joint
Conference on Artificial Intelligence, 1981, pp. 674–679.
[3] G. Hager, P. Belhumeur, Efficient region tracking with parametric models
of geometry and illumination, IEEE Trans. Pattern Anal. Mach. Intell.
20 (10) (1998) 1025–1039.
[4] D. Comaniciu, V. Ramesh, P. Meer, Kernel-based object tracking, IEEE
Trans. Pattern Anal. Mach. Intell. 25 (5) (2003) 564–577.
Z. Feng et al. / Pattern Recognition 41 (2008) 2422 – 2433
[5] M. Isard, A. Blake, Contour tracking by stochastic propagation of
conditional density, in: Proceedings of the European Conference on
Computer Vision, Cambridge, UK, 1996, pp. 343–356.
[6] P. Fieguth, D. Terzopoulos, Color based tracking of heads and other
mobile objects at video frame rates, in: Proceedings of the IEEE
Conference on Computer Vision Pattern Recognition, Puerto Rico, 1997,
pp. 21–27.
[7] J. Yang, A. Waibel, A real-time face tracker, in: Proceedings of WACV,
Sarasota, FL, 1996, pp. 142–147.
[8] C.R. Wren, A. Azarbayejani, T. Darrell, A. Pentland, Pfinder: real-time
tracking of the human body, IEEE Trans. Pattern Anal. Mach. Intell. 19
(7) (1997) 780–785.
[9] G. Bradski, Computer vision face tracking for use in a perceptual user
interface, Intel Technol. J. 2 (Q2) (1998).
[10] C. Sminchisescu, B. Triggs, Kinematic jump processes for monocular
3D human tracking, in: Proceedings of the IEEE Conference on
Computer Vision Pattern Recognition, vol. 1, Madison, WI, 2003,
pp. 69–76.
[11] G. Cheung, S. Baker, T. Kanade, Shape-from-silhouette of articulated
objects and its use for human body kinematics estimation and motion
capture, in: Proceedings of the IEEE Conference on Computer Vision
Pattern Recognition, vol. 1, Madison, WI, 2003, pp. 77–84.
[12] M.J. Black, A.D. Jepson, Eigentracking: robust matching and tracking
of articulated objects using a view-based representation, Int. J. Comput.
Vision 26 (1) (1998) 63–84.
[13] S. Avidan, Support vector tracking, Proceedings of the IEEE Conference
on Computer Vision Pattern Recognition, vol. 1, Kauai, HI, 2001, pp.
184–191.
[14] O. Williams, A. Blake, R. Cipolla, A sparse probabilistic learning
algorithm for real-time tracking, in: Proceedings of the International
Conference on Computer Vision, Nice, France, 2003, pp. 353–360.
[15] H.T. Nguyen, A.W.M. Smeulders, Robust tracking using
foreground–background texture discrimination, Int. J. Comput. Vision
69 (3) (2006) 277–293.
[16] R.T. Collins, Y. Liu, M. Leordeanu, Online selection of discriminative
tracking features, IEEE Trans. Pattern Anal. Mach. Intell. 27 (10) (2005)
1631–1643.
[17] S. Avidan, Support vector tracking, IEEE Trans. Pattern Anal. Mach.
Intell. 26 (8) (2004) 1064–1072.
[18] S. Avidan, Ensemble tracking, IEEE Trans. Pattern Anal. Mach. Intell.
29 (2) (2007) 261–271.
2433
[19] D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid
objects using mean shift, IEEE Comput. Vision Pattern Recognition 2
(2000) 142–149.
[20] D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature
space analysis, IEEE Trans. Pattern Anal. Mach. Intell. 24 (5) (2002)
603–619.
[21] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern
Anal. Mach. Intell. 17 (8) (1995) 790–799.
[22] G. Hager, M. Dewan, C. Stewart, Multiple kernel tracking with ssd,
in: Proceedings of the IEEE Conference on Computer Vision Pattern
Recognition, vol. 1, Washington, DC, 2004, pp. 790–797.
[23] D. Comaniciu, An algorithm for data-driven bandwidth selection, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2) (2003) 281–288.
[24] D. Comaniciu, V. Ramesh, P. Meer, The variable bandwidth mean shift
and data-driven scale selection, IEEE Int. Conf. Comput. Vision 1 (2001)
438–445.
[25] R.T. Collins, Mean-shift blob tracking through scale space, IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition 2 (2003) II-234–II-240.
[26] A. Djouadi, O. Snorrason, F. Garber, The quality of training-sample
estimates of the Bhattacharyya coefficient, IEEE Trans. Pattern Anal.
Mach. Intell. 12 (1990) 92–97.
[27] T. Kailath, The divergence and Bhattacharyya distance measures in signal
selection, IEEE Trans. Commun. Technol. 15 (1967) 52–60.
[28] J. Lin, Divergence measures based on the Shannon entropy, IEEE Trans.
Inf. Theory 37 (1991) 145–151.
[29] T.-L. Liu, H.-T. Chen, Real-time tracking using trust-region methods,
IEEE Trans. Pattern Anal. Mach. Intell. 26 (3) (2004) 397–402.
[30] T.-L. Liu, H.-T. Chen, Real-time tracking using trust-region methods,
IEEE Trans. Pattern Anal. Mach. Intell. 26 (3) (2004) 397–402.
[31] S. Theodoridis, K. Koutroumbas, Pattern Recognition, Academic Press, New York, 2003, pp. 337–340.
[32] A. Joukhadar, A. Scheuer, Fast contact detection between moving
deformable polyhedra, in: Proceedings of IEEE International Conference
on Intelligent Robots and Systems, vol. 3, 1999, pp. 1810–1815.
[33] B.D. Lucas, T. Kanade, An iterative image registration technique with
an application to stereo vision, in: Proceedings of the DARPA Imaging
Understanding Workshop, 1981, pp. 121–130.
[34] A.D. Jepson, D.J. Fleet, T.F. El-Maraghi, Robust online appearance
models for visual tracking, in: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, vol. I, 2001, pp. 415–422.
About the Author—ZUREN FENG received his M. Eng. and Ph.D. degrees in Information and Control Engineering from Xi’an Jiaotong University, Xi’an,
China, in 1982 and 1988, respectively. Since 1994, he has been a Professor in Systems Engineering Institute, Xi’an Jiaotong University. In 1992, he worked as
a visiting scholar in INRIA, France, for research on manipulator control with flexible joints and applications of Petri Nets in DEDS. In 1994, he was invited
by Kassel University, Germany, for research on mobile service robots. Professor Feng has been Deputy Dean of the School of Electronics and Information
Engineering at Xi’an Jiaotong University since 2001. From 1998 to 2001, he was the Deputy Dean of the Academy of Engineering and Science and also
the Head of the Systems Engineering Institute at Xi’an Jiaotong University. He is now a member of the Committee of Deep Space Exploration Technology,
Chinese Society of Astronautics, a member of the Council of the Shaanxi Society of Automation, China, and a member of the Academic Council of Chinese Key Labs in Manufacturing. He won the National Outstanding Achievement Award for Ph.D. Scholars in 1991. His research interests include robotics and automation, multi-agent systems, intelligent optimization, adaptive control, and vision-based robot navigation. He has been involved in many academic and practical projects and has published nearly 100 papers on relevant topics.
About the Author—NA LU is a Ph.D. candidate of Systems Engineering Institute, Xi’an Jiaotong University, P.R. China. Her research interests include visual
tracking, image matching, feature extraction, and mobile robotics.
About the Author—PING JIANG received B. Eng., M. Eng., and Ph.D. degrees in Information and Control Engineering from Xi’an Jiaotong University,
Xi’an, PR China, in 1985, 1988, and 1992, respectively. He joined the Department of Electrical Engineering at Tongji University, Shanghai, as a Lecturer
in 1992 and as an Associate Professor in 1994. Since 1997, he has been a Professor in the Department of Information and Control Engineering, Tongji
University, Shanghai, PR China. From 1998 to 2000, he worked as an Alexander von Humboldt Research Fellow in Lehrstuhl fuer Allgemeine und Theoretische
Elektrotechnik, Universitaet Erlangen-Nuernberg, Germany. From 2002 to 2003, he was a Senior Research Fellow at Glasgow Caledonian University for the
IST Project DIECoM. He has held a lecturer appointment in Cybernetics and Virtual Systems at the University of Bradford since 2003. His research interests include intelligent control and intelligent robots, distributed artificial intelligence and multi-agent systems, and distributed control networks and their applications. He has been involved in over 30 projects and has published over 100 papers on relevant topics.