Optimization and Tracking of Polygon Vertices
for Shape Coding
Janez Zaletelj, Jurij F. Tasic
University of Ljubljana, Faculty of Electrical Engineering,
Trzaska 25, SI-1000 Ljubljana, Slovenia
{janez.zaletelj, jurij.tasic}@fe.uni-lj.si
http://ldos.fe.uni-lj.si
Abstract. The efficiency of shape coding is an important problem in
low-bitrate object-based video compression. Lossy contour coding methods typically include contour approximation by polygons or splines, spatial and/or temporal prediction of vertices, and entropy coding of the
prediction error. In conventional contour coding schemes, however, the
coding gain in the interframe mode is typically small. This indicates
that the temporal redundancy is not successfully removed. The paper
addresses the issue of temporal shape decorrelation by proposing a
Kalman filtering-based approach to vertex tracking and prediction. A
temporal vertex association procedure is proposed that minimizes the bit rate in each frame. The prediction error is coded using adaptive
arithmetic encoding. Vertex optimization is employed to reduce the shape
reconstruction error.
1
Introduction
In the context of object-based video coding, shape information is crucial for
content-based access and manipulation of video streams. Because it represents
a substantial part of the total bit rate, efficient coding methods are needed.
MPEG-4 uses Context-based Arithmetic Encoding [9], which is a bitmap-based coding method. However, lossy contour-based coding methods can achieve a higher
coding efficiency by encoding control points / vertices of spline or polygon approximation. Most contour-based shape coding methods [1,2,3,5,10,11] concentrate on finding a set of polygon vertices / spline control points to minimize the
bit rate while satisfying the maximum allowable distortion criterion. Intra-frame
relative addressing of vertices is employed and predefined variable length codes
are used for entropy coding, which limits the rate-distortion efficiency. Temporal
decorrelation of shape is generally not addressed adequately, with the exception
of [7] which uses Lagrangian optimization of control point positions to obtain a
rate-distortion optimal solution.
The computational complexity of the optimal methods [5], [7] is quadratic in
the number of admissible control points. Their optimality can only be claimed
within the limitations imposed by the chosen code structure, motion compensation scheme, approximation scheme, width of the admissible control-point band,
and code words. By using different coding schemes or different prediction /
motion compensation methods it is thus possible to derive a suboptimal, computationally less intensive method which would yield a similar rate-distortion
efficiency.
The proposed method capitalizes on the temporal shape correlation by finding correspondences between polygon vertices in successive frames. In each frame,
the video object’s shape is approximated by a polygon using a method based on
iterative polygon refinement [11]. In Section 2, a vertex optimization procedure
is defined which minimizes the distortion of the reconstructed shape. A method
of finding a rate-distortion optimal polygon approximation is given in Section 3.
Temporal correspondence of vertices in successive frames is determined on the
basis of vertex tracking using Kalman filtering. Using a predicted set of vertices, we find correspondences which minimize the coding rate (see Eq. 7). This
allows interframe relative addressing, which reduces the dynamic range
and consequently the bit rate. A high coding efficiency is achieved by employing
adaptive arithmetic encoding of vertices. The encoder uses different probability
distributions for each coding mode which are continually adapted to the input
signal (Figure 2). Results of the experiments, given in Section 4, indicate that
the proposed combination of temporal tracking, adaptive arithmetic encoding
and vertex optimization outperforms standard shape coding methods based on
the polygon approximation.
2
State of the Art in Vertex-based Shape Coding
It is clear that the boundaries of the video object in successive frames are highly
correlated (see Fig. 1). However, the coding gain achieved by exploiting this
redundancy is relatively small compared to the intermode gain for grayscale
coders. Algorithms which exploit the interframe redundancy are mostly based on
the intraframe techniques adapted to the use of prediction based on a combined
spatio-temporal context [9], [7].
The efficiency of such pixel-based approaches is degraded by boundary
misalignment and boundary noise, which are present even if no motion has occurred. A simple translational object motion model used to align boundaries
performs poorly under nonrigid object deformations. To compensate for local
boundary deformations, two-stage global-local motion compensation was proposed in [8]. Instead of predicting boundary pixels, a spatio-temporal prediction
of the angle and size of vectors connecting B-spline control points was proposed
in [7]. However, relative encoding of control points is essentially the same as in
intramode, so the magnitude of the coded vectors was not reduced.
3
Optimal Encoding Using Distortion Minimization and
Vertex Prediction
The computational complexity of the rate-distortion optimal vertex selection
methods [7], [5] is high due to exhaustive global search for the optimal vertex
positions. By using separate distortion and rate minimizations, it is possible
to find a suboptimal, but computationally less intensive solution. We propose
an iterative rate-distortion optimization scheme which is based on the polygon refinement method. Polygon vertices are added or removed on the basis of
the target bit rate $R_{max}$. In each iteration, a rate-optimal spatial or temporal
correspondence, and consequently the coding mode, is defined for each vertex.
Temporal prediction of the vertex positions based on Kalman filtering is
used to increase the coding gain by reducing the magnitude of the prediction
error.
3.1
Distortion-optimized Polygon Approximation
A number of shape coding methods rely on polygon approximation of the object’s contour. In a lossy contour coding scheme, the reconstruction error of the
polygon approximation needs to be evaluated. Different distance metrics are defined on the basis of the Euclidean distance between the contour pixel and the
closest point on the polygon segment [5]. The peak absolute distance $D_p$ is defined as the maximum of the pixel distances to the polygon and is useful for
finding a polygon which satisfies the maximum peak distance criterion $D_p^{max}$.
However, within the MPEG-4 evaluation of shape coders the distortion metric
$D_n$ was used, because it is more sensitive to the shape reconstruction error.
It is defined on the basis of a comparison of the original and reconstructed
binary object masks. Let $B^t = (b_j^t : j = 0, \ldots, N_B^t - 1)$ denote an ordered set of
boundary elements of the object in frame $t$, and let $S^t = (s_k^t : k = 0, \ldots, N^t - 1)$
denote an ordered set of polygon vertices. Let $O(B^t)$ denote a set of pixels which
belong to the video object in frame $t$, and let $R(S^t)$ denote a set of pixels which
belong to the reconstruction of the video object. $D_n$ is defined as the relative
number of erroneously represented pixels of the reconstructed binary shape mask
$$D_n(S^t, B^t) = \frac{\left| \left( O(B^t) \setminus R(S^t) \right) \cup \left( R(S^t) \setminus O(B^t) \right) \right|}{\left| O(B^t) \right|}. \tag{1}$$
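As an illustration of Eq. 1, the following Python sketch evaluates $D_n$ for two masks stored as sets of pixel coordinates. The set representation and the function name `relative_area_error` are assumptions made for this example only; the paper does not prescribe a data structure.

```python
def relative_area_error(original, reconstructed):
    """Return Dn: size of the symmetric difference divided by |original|."""
    if not original:
        raise ValueError("original mask must contain at least one pixel")
    mismatched = original ^ reconstructed  # pixels present in exactly one mask
    return len(mismatched) / len(original)


# A 4x4 object; the reconstruction misses one column and adds one stray pixel.
obj = {(x, y) for x in range(4) for y in range(4)}
rec = {(x, y) for x in range(3) for y in range(4)} | {(5, 5)}
print(relative_area_error(obj, rec))  # 5 of 16 pixels differ: 0.3125
```

Both missing and superfluous pixels count as errors, which is why the symmetric difference is used rather than a one-sided set difference.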
An iterated refinement method [11] is widely used as a polygon approximation technique because of its hierarchical nature. In each iteration it refines the
polygon segment with the maximum distance from the boundary by inserting a
new vertex. The method finds a polygon satisfying the maximum peak distance
criterion $D_p^{max}$ using a minimal number of vertices. However, because vertex positions are restricted to the contour points, the resulting polygon is suboptimal
in terms of the distortion $D_n$. In [3] a vertex adjustment was proposed which
minimizes either the average absolute distance $D_a$ or the peak absolute distance $D_p$.
However, neither of these guarantees the minimization of the distortion $D_n$. We
thus propose to integrate the vertex adjustment using $D_n$ as a criterion into the
iterated polygon refinement method.
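The refinement loop described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of [11]: the contour is assumed to be a closed, ordered list of points, the initial polygon is assumed to consist of the first contour point and the point farthest from it, and the names `point_segment_distance` and `refine_polygon` are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def refine_polygon(contour, d_max):
    """Return sorted contour indices of vertices with peak distance <= d_max."""
    n = len(contour)
    # Assumed initialisation: point 0 and the contour point farthest from it.
    far = max(range(n), key=lambda i: math.dist(contour[0], contour[i]))
    vertices = sorted({0, far})
    while True:
        worst_dist, worst_idx = 0.0, None
        for pos, start in enumerate(vertices):
            end = vertices[(pos + 1) % len(vertices)]
            span = (end - start) % n
            # Contour points strictly between two consecutive vertices.
            for step in range(1, span):
                i = (start + step) % n
                d = point_segment_distance(contour[i], contour[start], contour[end])
                if d > worst_dist:
                    worst_dist, worst_idx = d, i
        if worst_idx is None or worst_dist <= d_max:
            return vertices
        # Insert a new vertex at the contour point with the largest error.
        vertices = sorted(vertices + [worst_idx])


# Boundary of a 4x4 square, traversed once.
square = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)]
          + [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
print(refine_polygon(square, 0.5))
```

Because new vertices are always taken from the contour itself, the sketch also shows the limitation noted above: the result satisfies the peak distance criterion but is not optimized for $D_n$.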
The iterative optimization procedure seeks a set of optimal vertices $S^{t*}$,
where each vertex $s_k^t$ is adjusted from its original position $s_k^{t,0}$ within the search
range given by $\| s_k^t - s_k^{t,0} \| \le D_p^{max}$:
$$\left( s_0^{t*}, \ldots, s_{N^t-1}^{t*} \right) = \arg\min_{s_0^t, \ldots, s_{N^t-1}^t} D_n\left( s_0^t, \ldots, s_{N^t-1}^t, B^t \right). \tag{2}$$
The obtained polygon yields a minimum shape reconstruction error for the given
number of segments $N^t$.
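One simple way to approximate the minimization in Eq. 2 is a greedy coordinate descent that moves one vertex at a time and keeps a move only if $D_n$ decreases. The Python sketch below is a swapped-in heuristic for illustration and is not claimed to be the procedure used in the paper; it reaches a local minimum only. The pixel-centre rasterization with the even-odd rule, the integer search grid and all function names are assumptions of this example.

```python
from itertools import product


def inside(px, py, polygon):
    """Even-odd rule test for the point (px, py)."""
    hit = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge crosses the horizontal line through py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                hit = not hit
    return hit


def rasterize(polygon, width, height):
    """Set of pixels whose centres lie inside the polygon."""
    return {(x, y) for x, y in product(range(width), range(height))
            if inside(x + 0.5, y + 0.5, polygon)}


def dn(obj, polygon, width, height):
    """Distortion of Eq. 1 for a polygon against the object mask `obj`."""
    return len(obj ^ rasterize(polygon, width, height)) / len(obj)


def optimize_vertices(polygon, obj, width, height, radius):
    """Move one vertex at a time within `radius` of its original position."""
    origin = list(polygon)
    best = list(polygon)
    best_err = dn(obj, best, width, height)
    improved = True
    while improved:
        improved = False
        for k, (ox, oy) in enumerate(origin):
            for dx, dy in product(range(-radius, radius + 1), repeat=2):
                if dx * dx + dy * dy > radius * radius:
                    continue  # outside the search range around s_k^{t,0}
                trial = best[:k] + [(ox + dx, oy + dy)] + best[k + 1:]
                err = dn(obj, trial, width, height)
                if err < best_err:
                    best, best_err, improved = trial, err, True
    return best, best_err


# Object: a 6x6 square; the starting polygon has slightly displaced corners.
obj = rasterize([(2, 2), (8, 2), (8, 8), (2, 8)], 12, 12)
start = [(3, 2), (8, 3), (7, 8), (2, 7)]
polygon, err = optimize_vertices(start, obj, 12, 12, radius=1)
print(dn(obj, start, 12, 12), err)
```

The search range is measured from the original vertex position rather than from the current one, which matches the constraint stated with Eq. 2.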
3.2
Temporal Prediction of Polygon Vertices by Kalman Filtering
The proposed encoding scheme uses a local motion prediction and compensation
scheme, based on the Kalman prediction of vertex positions. Fig. 1 shows the
polygon approximation of the video object boundary in two successive frames of
the Children test sequence. Temporal prediction of the polygon is shown in both
frames by a dotted line. Temporally matched polygon vertices are represented
by circles, and unmatched vertices are represented by squares. The proposed
temporal matching and prediction effectively reduces the magnitude of the coded
prediction error vectors (shown by solid lines in Fig. 1, right).
Fig. 1. Polygon approximation (dashed line) of two frames from the Children test
sequence. Temporally predicted polygon is shown by a dotted line. Temporally matched
vertices are represented by circles, and unmatched vertices are represented by squares
(right). Solid lines (right) represent temporal prediction error vectors.
Let the vector $X_k^t = (p_k^t, v_k^t)^T$ denote the Kalman state variable which describes
the position and velocity of one polygon vertex. The Kalman filter relates
states at different time instants through the equation
$$X_k^{t+1} = F X_k^t + w_k, \tag{3}$$
where $F$ is a transition matrix, and $w_k = (w_k^p, w_k^v)^T$ represents the acceleration
of the vertex modelled as a white noise process.
The Kalman filter provides a prediction of the vertex position and velocity in
each frame $X_k^{t-} = F X_k^{t-1}$, and updates the state variables and error covariance
matrices when a new measurement of the vertex position is available. Let $P^{t-} =
(p_k^{t-} : k = 0, \ldots, N^{t-1} - 1)$ denote an ordered set of predicted polygon vertices
serving as a basis for finding an optimal correspondence between polygon vertices
in the current frame $S^t$ and vertices in the previous frame $S^{t-1}$.
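The predict and update steps can be sketched for a single coordinate of a single vertex as follows. The sketch assumes a unit frame interval, so that $F = [[1, 1], [0, 1]]$, treats the x and y coordinates independently, and uses illustrative noise variances `q` and `r`; none of these values is given in the paper, and the class name `AxisKalman` is hypothetical.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate of one vertex."""

    def __init__(self, position, q=1.0, r=1.0):
        self.p, self.v = float(position), 0.0   # state: position and velocity
        self.c = [[r, 0.0], [0.0, q]]           # error covariance (assumed start)
        self.q, self.r = q, r                   # process / measurement variance

    def predict(self):
        """Apply F = [[1, 1], [0, 1]]; return the predicted position."""
        self.p += self.v
        (a, b), (c, d) = self.c
        # F C F^T, with process noise added to the velocity term (assumption).
        self.c = [[a + b + c + d, b + d], [c + d, d + self.q]]
        return self.p

    def update(self, measured):
        """Correct the prediction with a measured position (H = [1, 0])."""
        (a, b), (c, d) = self.c
        s = a + self.r                  # innovation variance
        k_p, k_v = a / s, c / s         # Kalman gain
        residual = measured - self.p
        self.p += k_p * residual
        self.v += k_v * residual
        self.c = [[(1 - k_p) * a, (1 - k_p) * b],
                  [c - k_v * a, d - k_v * b]]


# A vertex coordinate moving by 2 pixels per frame.
track = AxisKalman(0.0)
for t in range(1, 21):
    predicted = track.predict()
    track.update(2.0 * t)
print(round(predicted, 3), round(track.v, 3))
```

Once the velocity estimate has settled, the difference between the measured and predicted position becomes small for steady motion, which is the quantity coded in the interframe mode.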
3.3
Problem Formulation
We seek an approximating polygon which effectively minimizes the distortion of
the approximation, given the maximum target bit rate $R_{max}$. We wish to select
an optimal number of polygon vertices $N^t$ and an optimal coding mode $\psi(k, t)$
(Eq. 5) for each vertex $s_k^t$, such that the total distortion is minimized
$$\min_{s_0^t, \ldots, s_{N^t-1}^t} D_n\left( s_0^t, \ldots, s_{N^t-1}^t \right) \quad \text{subject to} \quad R\left( s_0^t, \ldots, s_{N^t-1}^t \right) \le R_{max}. \tag{4}$$
Adding a new polygon vertex, which is a result of polygon refinement and distortion optimization (see Sect. 3.1), generally decreases the distortion $D_n$ and increases
the bit rate $R$. In each iteration, the minimum coding rate is found by optimal
intra/inter vertex matching, which selects an appropriate coding mode for each
vertex.
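The interplay between refinement and the rate constraint of Eq. 4 can be summarized by a short control loop. The Python sketch below is schematic: `add_vertex` and `rate` are hypothetical callables standing in for the refinement step of Sect. 3.1 and for the minimum coding rate found by vertex matching, and only the case of adding vertices is shown.

```python
def fit_to_rate(add_vertex, rate, polygon, r_max):
    """Refine `polygon` for as long as its coding rate stays within r_max."""
    while True:
        candidate = add_vertex(polygon)   # returns None when no vertex can be added
        if candidate is None or rate(candidate) > r_max:
            return polygon
        polygon = candidate


# Toy stand-ins: each vertex costs 10 bits, at most 10 vertices are available.
toy_add = lambda p: p + [len(p)] if len(p) < 10 else None
toy_rate = lambda p: 10 * len(p)
print(fit_to_rate(toy_add, toy_rate, [0, 1], r_max=55))
```

With the toy stand-ins the loop stops at five vertices, the largest polygon whose rate of 50 bits does not exceed the budget of 55 bits.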
Given a set of polygon vertices $S^t$ and a set of predicted vertices $P^{t-}$, the
goal of the vertex matching is to find a rate-optimal encoding. For each vertex $s_k^t$
two coding modes are available: the intraframe mode, where the prediction error with respect
to the spatially predicted position $s_k^{t-} = f(s_{k-1}^t, s_{k-2}^t, \ldots)$ is encoded, and
the interframe mode, where the prediction error with respect to the temporal
prediction $p_l^{t-}$ is encoded. Because of the temporal redundancy, the interframe
compensated coding generally requires fewer bits. In the context of Kalman
filtering and prediction, the vertex matching is a data association step, necessary to associate a new measurement with each Kalman filter. The problem is to
associate a set of $N^t$ optimized polygon vertices with the $N^{t-1}$ predicted vertices.
The most likely assignment is the one that minimizes the coding cost function.
Let the binary functions $\psi(k, t)$ and $\chi(k, l, t)$ define the coding mode and the
temporal correspondence of the vertex $s_k^t$, respectively:
$$\psi(k, t) = \begin{cases} 1 & \text{if } s_k^t \text{ is coded in intramode with respect to } s_k^{t-}, \\ 0 & \text{if } s_k^t \text{ is coded in intermode,} \end{cases} \tag{5}$$
$$\chi(k, l, t) = \begin{cases} 1 & \text{if } s_k^t \text{ is coded in intermode with respect to } p_l^{t-}, \\ 0 & \text{otherwise,} \end{cases} \tag{6}$$
where $l = 0, \ldots, N^{t-1} - 1$ indexes over the list of the Kalman filters and $k =
0, \ldots, N^t - 1$ indexes over the measurement list. If the vertex $s_k^t$ is coded in
intramode, then its intermode correspondence function is zero for all $l$, $\chi(k, l, t) =
0$. The vertex assignment problem is now formulated as a minimization of a
coding rate given by
$$R(\chi, \psi) = \sum_{k=0}^{N^t-1} \sum_{l=0}^{N^{t-1}-1} \left( \psi(k, t) \cdot R^S(s_k^t - s_k^{t-}) + \chi(k, l, t) \cdot R^T(s_k^t - p_l^{t-}) \right), \tag{7}$$
where $R^S(s_k^t - s_k^{t-})$ is a coding cost for the intramode encoding (Eq. 9), and
$R^T(s_k^t - p_l^{t-})$ is a coding cost for intermode vertex encoding (Eq. 8). The minimization is performed by Dijkstra’s algorithm [12] which finds the shortest path
in the weighted directed graph, where vertices correspond to the pairs $(s_k^t, p_l^{t-})$
and edge weights correspond to the coding costs of the prediction error.
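A shortest-path formulation of the assignment can be sketched in Python with the standard `heapq` module. The paper does not spell out the graph layout, so the sketch makes its own assumptions: the matching is order-preserving (a vertex may only be matched to a prediction with a higher index than the one used by its predecessor), graph nodes are pairs (number of coded vertices, number of consumed predictions), and the costs are supplied as precomputed tables `intra_cost[k]` and `inter_cost[k][l]`. The function name `match_vertices` is hypothetical.

```python
import heapq


def match_vertices(intra_cost, inter_cost):
    """Return (total_bits, modes); modes[k] is None for intra, else the index l."""
    n_cur = len(intra_cost)
    n_pred = len(inter_cost[0]) if inter_cost else 0
    start = (0, 0)            # (vertices coded so far, predictions consumed)
    best = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue          # stale queue entry
        k, l = node
        if k == n_cur:
            break             # every current vertex has a coding mode
        edges = [((k + 1, l), intra_cost[k], None)]            # intramode
        edges += [((k + 1, j + 1), inter_cost[k][j], j)        # intermode with p_j
                  for j in range(l, n_pred)]
        for nxt, weight, choice in edges:
            if cost + weight < best.get(nxt, float("inf")):
                best[nxt] = cost + weight
                parent[nxt] = (node, choice)
                heapq.heappush(heap, (cost + weight, nxt))
    # Walk back from the terminal node to recover the chosen modes.
    modes = []
    while node != start:
        node, choice = parent[node]
        modes.append(choice)
    return cost, modes[::-1]


intra = [10, 10, 10]                      # bits per vertex in intramode
inter = [[2, 9], [8, 3], [9, 9]]          # bits for vertex k against prediction l
print(match_vertices(intra, inter))
```

In the example the first two vertices are matched to the two predictions and the third, for which no prediction is left, falls back to intramode.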
3.4
Adaptive Arithmetic Encoding of the Prediction Error
Prediction error vectors $s_k^t - s_k^{t-}$ and $s_k^t - p_l^{t-}$ are losslessly encoded using three
components: the octant difference index $d_k = o_k - o_{k-1}$, and the max $M_k$ and min $m_k$ components [3]. This allows the dynamic range and probability distribution of
each component to be adjusted independently, thus increasing the coding efficiency.
Probability distributions are adaptively adjusted with each incoming symbol (see
Fig. 2) and they can be initialized to a predefined function, for example a Laplacian distribution for the $M$ and $m$ components. Typical probability distributions of
the $M$ component in the intra and intermode are plotted in Fig. 2. The intermode distribution $p_{M,T}(M_k)$ is highly non-uniform, which explains the higher
coding efficiency of the intermode coding.
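The decomposition of a prediction error vector into an octant and the max and min components can be illustrated with the Python sketch below. The octant numbering (counter-clockwise, starting at the positive x axis) and the handling of ties are assumptions of this example; the exact convention of [3] is not reproduced in the text. The octant difference $d_k$ would then be formed from the octants of consecutive vectors.

```python
def encode_vector(dx, dy):
    """Split an integer vector into (octant, max component, min component)."""
    a, b = abs(dx), abs(dy)
    if dx >= 0 and dy >= 0:
        quadrant = 0
    elif dx < 0 and dy >= 0:
        quadrant = 1
    elif dx < 0:
        quadrant = 2
    else:
        quadrant = 3
    # Counter-clockwise, even quadrants start x-major, odd ones start y-major.
    second_half = (b > a) if quadrant % 2 == 0 else (a > b)
    return 2 * quadrant + int(second_half), max(a, b), min(a, b)


def decode_vector(octant, big, small):
    """Inverse of encode_vector."""
    quadrant, second_half = divmod(octant, 2)
    x_major = (second_half == 0) if quadrant % 2 == 0 else (second_half == 1)
    a, b = (big, small) if x_major else (small, big)
    sign_x = 1 if quadrant in (0, 3) else -1
    sign_y = 1 if quadrant in (0, 1) else -1
    return sign_x * a, sign_y * b


print(encode_vector(3, 1), encode_vector(-1, -4))
```

Separating direction from magnitude is what lets each component have its own, narrower probability model.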
Fig. 2. Probability distributions of the Max component of the prediction error for
intramode coding $p_{M,S}(M_k)$ (left) and for intermode coding $p_{M,T}(M_k)$ (right)
The default coding mode is the interframe mode. An octant difference index 4
is used as an escape symbol, indicating that the next symbol is a control symbol.
Control symbols are employed to change the coding mode, indicate a new frame,
an overflow of the Min or Max component of a vector, etc. Coding costs for intra
and inter encoding can be estimated from the entropies of coding symbols:
$$R^T(s_k^t - p_l^{t-}) = -\log_2 p_d(d_k) - \log_2 p_{M,T}(M_k) - \log_2 p_{m,T}(m_k) \tag{8}$$
$$R^S(s_k^t - s_k^{t-}) = -\log_2 p_d(\mathrm{esc}) - \log_2 p_{\mathrm{esc}}(\mathrm{intra}) - \log_2 p_d(d_k) - \log_2 p_{M,S}(M_k) - \log_2 p_{m,S}(m_k) \tag{9}$$
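The cost estimates of Eqs. 8 and 9 can be computed from adaptive symbol counts, as in the following Python sketch. It models only the probability estimation and the ideal code length, not the arithmetic coder itself. The uniform initialization, the alphabets, the dictionary keys and the use of a separate `"esc"` entry in the octant-difference model are assumptions of this example (the text above uses octant difference index 4 as the escape symbol and mentions a Laplacian initialization as one option).

```python
import math


class AdaptiveModel:
    """Symbol counts that adapt after every coded symbol."""

    def __init__(self, alphabet):
        self.counts = {s: 1 for s in alphabet}   # uniform start (assumption)
        self.total = len(self.counts)

    def bits(self, symbol):
        """Ideal code length -log2 p(symbol) under the current estimate."""
        return -math.log2(self.counts[symbol] / self.total)

    def update(self, symbol):
        """Adapt the distribution to the symbol that was just coded."""
        self.counts[symbol] += 1
        self.total += 1


def inter_cost(models, d, big, small):
    """Eq. 8: octant difference plus intermode Max and Min components."""
    return models["d"].bits(d) + models["M,T"].bits(big) + models["m,T"].bits(small)


def intra_cost(models, d, big, small):
    """Eq. 9: escape and mode-switch symbols, then the intramode components."""
    return (models["d"].bits("esc") + models["esc"].bits("intra")
            + models["d"].bits(d) + models["M,S"].bits(big) + models["m,S"].bits(small))


models = {
    "d": AdaptiveModel(list(range(8)) + ["esc"]),      # hypothetical alphabets
    "esc": AdaptiveModel(["intra", "new_frame"]),
    "M,T": AdaptiveModel(range(32)), "m,T": AdaptiveModel(range(32)),
    "M,S": AdaptiveModel(range(32)), "m,S": AdaptiveModel(range(32)),
}
for _ in range(50):
    models["M,T"].update(1)                            # small errors dominate intermode
print(inter_cost(models, 0, 1, 1), intra_cost(models, 0, 1, 1))
```

As the intermode counts concentrate on small magnitudes, the estimated intermode cost of a small prediction error drops, while the intramode cost always carries the extra escape and mode-switch symbols.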
4
Experimental Results
Fig. 3 shows the rate-distortion curve of the proposed algorithm for the MPEG-4 test sequence ‘Children’. The efficiency of the proposed method is compared
to the baseline method [2], the object-adaptive vertex encoding method [1] and the B-spline-based method [7] (left). The proposed method outperforms the baseline
and object-adaptive encoding, but the rate-distortion optimized B-spline encoding performs better at all bit rates. This is because of the smaller reconstruction
error of B-splines, and also because of global vertex and VLC code word optimization.
Fig. 3. Rate-distortion efficiency of the proposed method compared to methods [1,2,7]
(left). Rate-distortion curves of the proposed method with and without vertex optimization compared to method [3] (right)
Fig. 3 (right) demonstrates the effect of vertex optimization. Compared to
the vertex coding method [3], which also uses vertex adjustment, the proposed
method is more effective, and the bit rate reduction due to optimization ranges
up to 40 percent. The coding gain due to the temporal prediction of vertices
varies from sequence to sequence and depends on the amount of the contour
motion.
5
Conclusion
In this work, we propose a novel approach to the rate-distortion controlled encoding of video object shape information which is based on polygon approximation and vertex encoding. It is a combination of distortion-optimized polygon
approximation, Kalman-based tracking and prediction of polygon vertices, and
adaptive arithmetic encoding of the prediction error. Its efficiency comes from
using a constant-velocity motion model for each vertex separately. The model
predicts the position of the vertex in the next frame and allows the tracking of
boundary segments moving in different directions. The coding efficiency exceeds that of conventional
polygon-based shape coding methods, and is close to that of rate-distortion optimized B-spline shape coding.
It is expected that further coding gains can be achieved by using a better
approximation technique, for example B-splines, by adapting the motion model
parameters to the actual sequence and by employing hybrid intra/inter prediction modes.
References
1. O’Connell, K.J.: Object-adaptive vertex-based shape coding method. IEEE Trans.
Circuits Syst. Video Technol., Vol. 7, (1997) 251–255
2. Lee, S., Cho, D., Cho, Y., Son, S., Jang, E., Shin, J.: Binary shape coding using
1-D distance values from baseline. In: Proc. ICIP (1997) I-508–511
3. Chung, J., Lee, J., Moon, J., Kim, J.: A new vertex-based binary shape coder for
high coding efficiency. Signal Processing: Image Comm., Vol. 15 (2000) 665–684
4. Freeman, H.: On the encoding of arbitrary geometric configurations. IRE Trans.
Electron. Comput., Vol. 10 (1961) 260–268
5. Katsaggelos, A.K., Kondi, L.P., Meier, F.W., Ostermann, J., Schuster, G.M.:
MPEG-4 and rate-distortion-based shape-coding techniques. Proc. IEEE, Vol. 86,
(1998) 1126–1154.
6. Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic coding for data compression. Commun. ACM, Vol. 30, No. 6 (1987) 520–540
7. Melnikov, G., Schuster, G.M., Katsaggelos, A.K.: Shape Coding Using Temporal
Correlation and Joint VLC Optimization. IEEE Trans. Circ. Syst. Video Techn.,
Vol. 10, No. 5 (2000) 744–754
8. Cho, S.H., Kim, R.C., Oh, S.S., Lee, S.U.: A Coding Technique for the Contours in
Smoothly Perfect Eight-Connectivity Based on Two-Stage Motion Compensation.
IEEE Trans. CSVT, Vol.9 (1999) 59–69
9. Brady, N., Bossen, F.: Shape compression of moving objects using context-based
arithmetic encoding. Signal Processing: Image Communication, Vol. 15 (2000) 601–
617
10. Kim, J.I., Bovik, A.C., Evans, B.L.: Generalized predictive binary shape coding
using polygon approximation. Signal Processing: Image Communication, Vol. 15
(2000) 643–663
11. Gerken, P.: Object-based analysis-synthesis coding of image sequences at very low
bit rates. IEEE Trans. CSVT, vol. 4, (1994) 228–235
12. Dijkstra, E.W.: A Note on Two Problems in Connexion with Graphs. Numerische
Mathematik, Vol. 1 (1959) 269–271