
MULTIMEDIA PROCESSING
EE5359 PROJECT INTERIM REPORT
ADAPTIVE INTERPOLATION FILTER FOR H.264/AVC
-Bhavana Prabhakar
Student Id: 1000790889
PROPOSAL:
Implement a non-separable adaptive interpolation filter in H.264 which
 Analytically minimizes the energy of the prediction error (PE), where h^FP_{i,j} is the 2-D filter coefficient for each fractional-pel position, and both the original and the predicted image depend on x and y:

PE = Σ_x Σ_y ( original(x, y) − h^FP_{i,j} · predicted(x, y) )²
 Reduces the distorting effects caused by aliasing, motion blur, motion estimation inaccuracies, etc.
 Further improvements can be achieved by applying a locally adaptive filter, which is adapted to the local properties of the image.
 Steps will be taken to reduce blurring effects by first considering only displacement without blurring effects. With d_av the average displacement vector, the prediction signal S_i*(jΩ), expressed in the frequency domain, is given as

S_i*(jΩ) = S_{i−1}*(jΩ) · e^(j·d_av·Ω)    eq.(1)
 With the intention of compensating the blurring effects, the adaptive interpolation filter H(jΩ) for perfect motion-compensated prediction has to satisfy the condition

S_i(jΩ) = S_i*(jΩ) · H(jΩ)    eq.(2)
 An individual filter is to be used for the interpolation of each fractional-pel position.
 The estimation of the coefficients and the motion compensation are performed in the steps given in [16].
 Filter coefficients are to be coded: they are subject to quantization, followed by prediction and entropy coding.
 The aliasing effects are minimized by suppressing the high-frequency components.
 Steps are to be taken to reduce blurring effects.
ABSTRACT:
For reducing the bit-rate of video signals, current coding standards apply
hybrid coding with motion-compensated prediction and transform coding of the
prediction error [16]. From prior research [3] it is known that aliasing components contained in an image signal, as well as motion blur, limit the prediction efficiency obtained by motion compensation. Hence, the objective is to show that the analytical development of an optimal interpolation filter under particular constraints is possible, resulting in coding improvements for broadcast-quality material compared to the H.264/advanced video coding (AVC) [8] high profile. Furthermore, the spatial adaptation to local image characteristics enables further improvements for common intermediate format (CIF) sequences compared to a globally adaptive filter. Additionally, it will be shown that the presented approach is generally applicable, i.e., motion blur can also be exactly compensated if particular constraints are fulfilled.
REQUIREMENT OF INTERPOLATION:
Motion-compensated prediction (MCP) is the key to the success of the
modern video coding standards, as it removes the temporal redundancy in video
sequences and reduces the size of bit streams significantly. With MCP, the pixels
to be coded are predicted from the temporally neighboring ones, and only the
prediction errors and the motion vectors (MV) are transmitted. However, due to the
finite sampling rate, the actual position of the prediction in the neighboring frames
may be out of the sampling grid, where the intensity is unknown, so the intensities
of the positions in between the integer pixels, called sub-pixel positions, must be
interpolated and the resolution of MV is increased accordingly.
INTERPOLATION IN H.264/AVC
In H.264/AVC, since the resolution of the MV is quarter-pixel, the reference frame is interpolated to 16 times its size for MCP, i.e., 4 times along each dimension. As shown in Fig. 1(a), the interpolation defined in H.264 includes two stages, interpolating the half-pixel and quarter-pixel sub-positions, respectively. The interpolation in the first stage is separable, which means the sampling rate in one direction is doubled by inserting zero-valued samples followed by filtering with a 1-D filter h1 = [1, 0, −5, 0, 20, 32, 20, 0, −5, 0, 1]/32 [21], and then the process repeats in the other direction. The second stage, which is non-separable, uses bilinear filtering supported by the integer pixels and the interpolated half-pixel values.
Fig. 1 Interpolation process of (a) the filter in H.264/AVC, (b) the optimal AIF, and (c) the separable AIF. ↑m denotes an m-fold increase of the sampling rate due to interpolation.
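To make the first-stage operation concrete, the following is a minimal numpy sketch of zero-insertion followed by filtering with h1. Border handling and the integer rounding/clipping of the standard are ignored, so this illustrates the principle only.

import numpy as np

# Stage-1 half-pel interpolation sketch: double the sampling rate of one row
# by zero-insertion, then low-pass filter with the upsampled 6-tap kernel h1.
h1 = np.array([1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]) / 32.0

def upsample_2x(row: np.ndarray) -> np.ndarray:
    """Insert a zero after every sample, then filter with h1."""
    zero_stuffed = np.zeros(2 * len(row))
    zero_stuffed[::2] = row                      # originals at even indices
    return np.convolve(zero_stuffed, h1, mode="same")

row = np.array([10, 12, 15, 20, 18, 14, 11, 9], dtype=float)
half_pel_row = upsample_2x(row)   # even taps = originals, odd taps = half-pel
print(half_pel_row)

Applying the same function along the other direction completes the separable first stage.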
REVIEW OF ADAPTIVE INTERPOLATION FILTERS (AIF)
Considering the time-varying statistics of video sources, some researchers
propose using an adaptive interpolation filter (AIF) [16], which is one of the design elements making KTA significantly outperform JM [12]. With AIF, the filter
coefficients are optimized on a frame basis, such that for each frame the energy of
the MCP error is minimized. The optimal filter coefficients are quantized, coded, and transmitted as the side information of the associated frame.
2-D non-separable AIF [22], whose interpolation process is shown in Fig. 1(b), increases the spatial sampling rate 16-fold in a single step by zero-insertion, and each sub-pixel position is interpolated directly by filtering the surrounding 6 × 6 integer pixels. Fig. 2(a) shows the support region of 2-D non-separable AIF. As the spatial statistics are assumed to be isotropic, the filter h is circularly symmetric, and therefore only 1/8 of the coefficients need to be coded, as shown in Fig. 2(b). The assumption that the spatial statistics are isotropic may not hold for every
frame in a video sequence. Hence, 2-D separable AIF is proposed, which treats the spatial statistics of the horizontal and vertical directions as different and reduces the complexity of 2-D non-separable AIF. The 1-D AIFs for the two directions are separately designed. As shown in Fig. 1(c), the horizontal sampling rate is increased four times by zero-insertion and a 1-D filter h1 calculated for the current frame is applied. Then, the process repeats for the vertical direction using the separately designed vertical filter, as sketched below.
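A minimal sketch of this separable two-pass application follows, assuming the per-direction filters have already been assembled into polyphase form on the 4× grid (h_horiz and h_vert are hypothetical placeholders); border handling is ignored.

import numpy as np

# Separable AIF application sketch: raise the sampling rate 4x horizontally
# by zero-insertion plus 1-D filtering, then repeat vertically with the
# second per-frame filter.
def upsample_filter_1d(row, h, factor=4):
    zs = np.zeros(factor * len(row))
    zs[::factor] = row                            # originals every 'factor' taps
    return np.convolve(zs, h, mode="same")

def separable_interpolate(frame, h_horiz, h_vert, factor=4):
    tmp = np.apply_along_axis(upsample_filter_1d, 1, frame, h_horiz, factor)
    return np.apply_along_axis(upsample_filter_1d, 0, tmp, h_vert, factor)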
Fig. 2 2-D non-separable AIF's (a) support region and (b) coded coefficients [22]
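The saving implied by circular symmetry can be illustrated with a short sketch that groups the 6 × 6 coefficient positions into orbits under the eight mirror/transpose symmetries, as applies to a fully symmetric sub-pel position; this illustrates the principle only, not the exact per-position grouping used in [22].

# Count the unique coefficients of a 6x6 filter under full 8-fold symmetry
# (horizontal/vertical mirroring plus transposition).
N = 6

def orbit(i: int, j: int) -> frozenset:
    """All positions equivalent to (i, j) under the dihedral symmetries."""
    pts = set()
    for a, b in [(i, j), (j, i)]:                # transposition
        for p in (a, N - 1 - a):                 # horizontal mirror
            for q in (b, N - 1 - b):             # vertical mirror
                pts.add((p, q))
    return frozenset(pts)

orbits = {orbit(i, j) for i in range(N) for j in range(N)}
print(len(orbits), "unique coefficients out of", N * N)   # 6 out of 36

For positions with fewer symmetries, more coefficients survive, which is why roughly 1/8 of the coefficients are coded overall.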
DESCRIPTION:
To reduce the bit-rate of video signals, the international telecommunication union (ITU) coding standards [14] apply hybrid video coding with motion-compensated prediction combined with transform coding of the prediction error. In the first step the motion-compensated prediction is performed. The temporal redundancy, i.e., the correlation between consecutive images, is exploited for the prediction of the current image from already transmitted images. In a second step, the residual error is transform coded, thus reducing the spatial redundancy.
For performing motion-compensated prediction, the current image of a
sequence is split into blocks. For each block a displacement vector d⃗_i is estimated and transmitted that refers to the corresponding position of its image signal in an already transmitted reference image. The displacement vectors have fractional-pel resolution. H.264/AVC [8] is based on ¼-pel displacement resolution [1]. Displacement vectors with fractional resolution may refer to positions in the reference image which are located between the sampled positions. In order to estimate and compensate the fractional-pel displacements, the reference image has to be interpolated at the fractional-pel positions.
H.264/AVC [8] uses a 6-tap Wiener interpolation filter with filter
coefficients (1, −5, 20, 20, −5, 1)/32. The interpolation process is depicted in Fig. 3 and can be subdivided into two steps. At first, the half-pel positions aa, bb, b, hh, ii, jj and cc, dd, h, ee, ff, gg are calculated using a horizontal or a vertical 6-tap Wiener filter, respectively. Applying the same Wiener filter to the half-pel positions aa, bb, b, hh, ii, jj, the half-pel position j is computed. In the second step, the remaining quarter-pel positions are obtained using a bilinear filter applied to the already calculated half-pel positions and the existing full-pel positions. A small worked example of both steps follows Fig. 3.
Fig. 3 Integer pixels (shaded blocks with upper-case letters) and fractional-pel positions (non-shaded blocks with lower-case letters); example for filter size 6 × 6 [15]
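As the announced worked example, the sketch below computes one half-pel value with the 6-tap Wiener filter, including the rounding and clipping used in the standard, and one quarter-pel value by bilinear averaging; the sample values are arbitrary.

# One row of integer samples E, F, G, H, I, J around the half-pel position b.
def clip255(x: int) -> int:
    return max(0, min(255, x))

E, F, G, H, I, J = 80, 90, 100, 120, 110, 95

# Step 1: half-pel sample b between G and H (6-tap Wiener filter, rounded).
b = clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)

# Step 2: quarter-pel sample a between G and b (bilinear with rounding).
a = (G + b + 1) >> 1

print("half-pel b =", b, " quarter-pel a =", a)   # b = 112, a = 106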
An adaptive interpolation filter as proposed in [3] is estimated independently for every image. This approach makes it possible to take into account alterations of the image signal properties, such as aliasing, by minimizing the prediction error energy. An analytical calculation of the optimal filter coefficients is not possible there due to the nonlinearity caused by the subsequent application of 1-D filters. In [4] a 3-D filter is proposed, combining two techniques: a 2-D spatial filter with a motion compensated interpolation filter (MCIF).
The main disadvantage of MCIF is its sensitivity to displacement vector estimation errors. Besides aliasing, there are further distorting factors which impair the efficiency of motion-compensated prediction. The main disadvantage of combining a 2-D spatial filter with an MCIF as proposed in [4] is its numerical approach to determining the coefficients of a separable 2-D filter. Due to the iterative procedure, this method is nondeterministic in terms of time and requires a significantly higher encoder complexity.
In order to guarantee a limited increase of encoder complexity compared to the standard H.264/AVC [8] on the one hand, and to reach the theoretical bound for the coding gain obtainable by means of a 2-D filter on the other hand, a non-separable filter scheme is proposed. An individual filter will be used for the interpolation of each fractional-pel position.
In the following, the calculation of the filter coefficients is shown more precisely. Let us assume that h^SP_00, h^SP_01, …, h^SP_54, h^SP_55 are the 36 filter coefficients of a 6×6-tap 2-D filter used for a particular sub-pel position SP. Then the value p^SP (SP ∈ {a, …, o}) to be interpolated is computed by a two-dimensional convolution:

p^SP = Σ_{i=1..6} Σ_{j=1..6} P_{i,j} · h^SP_{i−1,j−1}

where P_{i,j} is an integer sample value (A1 … F6).
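As a minimal sketch, this convolution reduces to an element-wise product and sum over the 6 × 6 support; both arrays below are hypothetical placeholders.

import numpy as np

# One sub-pel value is a 36-tap weighted sum of the surrounding 6x6 integer
# samples P with the position's filter h^SP.
rng = np.random.default_rng(0)
P = rng.integers(0, 256, size=(6, 6)).astype(float)   # integer samples A1..F6
h_SP = np.full((6, 6), 1.0 / 36.0)                    # hypothetical filter for one SP

p_SP = float(np.sum(P * h_SP))                        # two-dimensional convolution
print(p_SP)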
The calculation of the coefficients and the motion compensation are performed in the following steps:
1. Displacement vectors d = [mvx, mvy]^T are estimated for every image to be coded. For the purpose of interpolation, the standard interpolation filter of H.264/AVC is applied to every reference image.
2. 2-D filter coefficients h^SP_{i,j} are calculated for each sub-pel position SP independently by minimization of the prediction error energy:
(e^SP)² = Σ_x Σ_y ( S_{x,y} − Σ_i Σ_j h^SP_{i,j} · P_{x̃+i, ỹ+j} )²

with x̃ = x + ⌊mvx⌋ − FO and ỹ = y + ⌊mvy⌋ − FO,

where S_{x,y} is the original image, P_{x,y} is a previously decoded image, i and j are the filter indices, mvx and mvy are the components of the estimated displacement vector, FO is a so-called filter offset centering the filter (FO = filter_size/2 − 1, i.e., FO = 2 in case of a 6-tap filter), and ⌊·⌋ is the floor operator, which maps the estimated displacement vector mv to the next full-pel position smaller than mv. This is a necessary step, since the previously decoded images contain information only at full-pel positions. Note that for the error minimization only those sub-pel positions are used which were referred to by motion vectors. Thus, for each of the sub-pel positions a … o, an individual set of equations is set up by computing the derivative of (e^SP)² with respect to the filter coefficient h^SP_{k,l}. The number of equations is equal to the number of filter coefficients used for the current sub-pel position SP:
0 = ∂(e^SP)² / ∂h^SP_{k,l}
  = ∂/∂h^SP_{k,l} [ Σ_x Σ_y ( S_{x,y} − Σ_i Σ_j h^SP_{i,j} · P_{x̃+i, ỹ+j} )² ]
  = −2 Σ_x Σ_y ( S_{x,y} − Σ_i Σ_j h^SP_{i,j} · P_{x̃+i, ỹ+j} ) · P_{x̃+k, ỹ+l}

∀ k, l ∈ [0; 5]
For each sub-pel position e, f, g, i, j, k, m, n, o, using a 6×6-tap 2-D filter, a system of 36 equations with 36 unknowns has to be solved. For the remaining sub-pel positions, which require a 1-D filter, systems of 6 equations have to be solved. This results in 360 filter coefficients (nine 2-D filter sets with 36 coefficients each and six 1-D filter sets with 6 coefficients per set). A numerical sketch of this least-squares setup is given below.
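The announced sketch accumulates the normal equations implied by the derivative condition for one sub-pel position and solves the 36 × 36 system with numpy. Boundary handling, multiple reference frames and the real encoder's per-position bookkeeping are omitted; 'samples' is a hypothetical list of the pixels whose motion vector refers to this position.

import numpy as np

def solve_aif_filter(S, P, samples, fo=2):
    """
    S       : original image, indexed S[y, x]
    P       : previously decoded reference image, indexed P[y, x]
    samples : iterable of (x, y, fmvx, fmvy) with the motion vector
              components already floored to full-pel precision
    fo      : filter offset centering the 6-tap filter (filter_size/2 - 1)
    """
    R = np.zeros((36, 36))          # autocorrelation of reference patches
    r = np.zeros(36)                # cross-correlation with the original
    for x, y, fmvx, fmvy in samples:
        xt, yt = x + fmvx - fo, y + fmvy - fo
        patch = P[yt:yt + 6, xt:xt + 6].ravel()   # 6x6 support, row-major
        R += np.outer(patch, patch)
        r += S[y, x] * patch
    # R must be non-singular, i.e., enough pixels must refer to this position.
    return np.linalg.solve(R, r).reshape(6, 6)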
3. New displacement vectors are estimated. For the purpose of interpolation, the adaptive interpolation filter computed in step 2 is applied. This step enables reducing motion estimation errors caused by aliasing, camera noise, etc. on the one hand, and treating the problem in the rate-distortion sense on the other hand.
4. Steps 2 and 3 can be repeated until a particular quality improvement threshold is achieved. Since some of the displacement vectors are different after step 3, it is conceivable to estimate new filter coefficients, adapted to the new displacement vectors; a loop sketch follows below.
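The repetition of steps 2 and 3 can be sketched as the following loop; the three callables are hypothetical stand-ins for the encoder's actual motion estimation, filter estimation and error measurement routines.

def aif_iteration(frame, reference, estimate_motion, estimate_filters,
                  prediction_error, rel_threshold=0.01, max_iters=4):
    # Step 1: motion estimation with the standard H.264/AVC filter.
    mvs = estimate_motion(frame, reference, filters=None)
    filters = None
    prev_err = prediction_error(frame, reference, mvs, filters)
    for _ in range(max_iters):
        filters = estimate_filters(frame, reference, mvs)   # step 2
        mvs = estimate_motion(frame, reference, filters)    # step 3
        err = prediction_error(frame, reference, mvs, filters)
        if prev_err - err < rel_threshold * prev_err:       # diminishing gain
            break
        prev_err = err
    return mvs, filters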
The quantization, prediction and entropy coding:
• First, finding an optimal quantization step size is a very important step. On the one hand, the finer the quantization, the more accurate the prediction. On the other hand, the amount of side information increases, which may impair the coding gain. As a trade-off, 9 bits will be used for the amplitude of the filter coefficients.
• Second, the entropy coded differences to the standard Wiener filter are to be transmitted for the 1-D fractional-pel positions. In the case of a symmetric filter, these are the fractional-pel positions a and b. The filter coefficients for the remaining 1-D fractional-pel positions are to be obtained from these filters by mirroring.
• In order to predict the filter coefficients of the 2-D positions, the calculated filter coefficients for the 1-D positions are used. For the prediction of the non-separable 2-D filter, which can also be regarded as a polyphase filter [6], 2-D separable filters are used. Fig. 4 illustrates an example with the interpolated impulse response of a predicted filter at the fractional-pel position j and the actually calculated filter coefficients from [16]. The spline surface represents the interpolated prediction of the coefficients of a polyphase filter. The dots represent the calculated values of the coefficients of such a filter, sampled at fractional-pel positions. The greater the distance from the dots to the plotted surface, the greater the prediction error. The filter coefficients for the remaining 2-D positions (g, i, k, m, n and o) are also obtained by mirroring in the case of a symmetric filter. In the case of a non-symmetric filter, the filter coefficients are predicted in the same manner.
Fig. 4 Prediction of the impulse response of a 6 × 6-tap 2-D Wiener filter at the fractional-pel position j (displacement vector [0.5, 0.5]) and the actually calculated filter coefficients from [16]
• The entropy coding is performed using the signed exp-Golomb code [1]. This code is well suited to Laplacian-distributed data and is already implemented in the standard H.264/AVC, so no additional calculations or look-up tables are required; a short sketch of the mapping follows this subsection.
In order to keep the amount of the necessary side information as low as possible, thus enabling the highest coding gains, the filter coefficients are subject to quantization, followed by prediction and entropy coding. For error resilience reasons, only intra prediction is performed. The aliasing effects are minimized by suppressing the high-frequency components.
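The announced sketch of the signed exp-Golomb mapping follows the se(v) construction of H.264: a signed value is mapped to an unsigned codeNum, which is then coded with a unary prefix plus binary suffix.

def signed_exp_golomb(k: int) -> str:
    """Signed exp-Golomb codeword (H.264 se(v)) for integer k, as a bit string.
    Mapping: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ..."""
    code_num = 2 * k - 1 if k > 0 else -2 * k
    binary = bin(code_num + 1)[2:]            # codeNum + 1 in binary
    return "0" * (len(binary) - 1) + binary   # leading zeros = suffix length - 1

# Small filter-coefficient differences get short codewords:
for k in [0, 1, -1, 2, -2]:
    print(k, "->", signed_exp_golomb(k))
# 0 -> 1, 1 -> 010, -1 -> 011, 2 -> 00100, -2 -> 00101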
Steps will be taken to reduce blurring effects by first considering only displacement without blurring effects. With d_av the average displacement vector, the prediction signal S_i*(jΩ), expressed in the frequency domain, is given in eq.(1):

S_i*(jΩ) = S_{i−1}*(jΩ) · e^(j·d_av·Ω)    eq.(1)
With the intention of compensating the blurring effects, the adaptive interpolation filter H(jΩ) for perfect motion-compensated prediction has to satisfy the condition given in eq.(2):

S_i(jΩ) = S_i*(jΩ) · H(jΩ)    eq.(2)
It can be shown that by the implementation of an adaptive interpolation filter, blurring effects caused by motion can be reduced, as the sketch below illustrates.
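The frequency-domain relation of eq.(1) can be checked numerically: multiplying the spectrum of a band-limited signal by the phase term e^(j·d_av·Ω) and transforming back yields the signal fractionally shifted by d_av, which is precisely the operation the sub-pel interpolation filter approximates. A minimal numpy sketch:

import numpy as np

n = 64
x = np.arange(n)
s = np.sin(2 * np.pi * 3 * x / n)              # band-limited test signal

d_av = 0.5                                     # half-pel displacement
omega = 2 * np.pi * np.fft.fftfreq(n)          # discrete frequencies Omega
s_pred = np.fft.ifft(np.fft.fft(s) * np.exp(1j * d_av * omega)).real

# The phase-shifted spectrum corresponds to s evaluated at x + d_av:
print(np.allclose(s_pred, np.sin(2 * np.pi * 3 * (x + d_av) / n)))   # True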
Software that will be used: Key technical area (KTA) software version 2.3 [12] and H.264 reference software version JM 11.0 [11].
LIST OF ACRONYMS:
1. AIF: Adaptive interpolation filter
2. AVC: Advanced video coding
3. BD-ROM: Blu-ray disc read-only memory
4. CIF: Common intermediate format
5. HD-DVD: High definition digital video disc
6. ITU: International telecommunication union
7. KTA: Key technical area
8. MCIF: Motion compensated interpolation filter
9. MCP: Motion compensated prediction
10. MPEG: Moving picture experts group
11. MV: Motion vector
12. VCEG: Video coding experts group
REFERENCES:
[1] JVT of ISO/IEC & ITU-T, Draft ITU-T Recommendation H.264 and Draft ISO/IEC 14496-10 AVC, Doc. JVT-G050, Pattaya, Thailand, 2003.
[2] O. Werner, “Drift analysis and drift reduction for multi resolution hybrid video coding”,
Signal processing: Image commun., vol. 8, no. 5, pp. 387–409, Jul. 1996.
[3] T. Wedi and H. G. Musmann, “Motion and aliasing compensated prediction for hybrid video
coding”, IEEE Trans. circuits and syst. video technol., vol. 13, no. 7, pp. 577–586, Jul. 2003.
[4] T. Wedi, “Adaptive interpolation filter for motion and aliasing compensated prediction”, in
Proc VCIP, San Jose, CA, USA, pp. 415–422, Jan. 2002.
[5] M. Budagavi, “Video compression using blur compensation”, in Proc.IEEE ICIP, Genova,
Italy, pp. 882–885, Sep. 2005.
[6] R. E. Crochiere and L. R. Rabiner, “Multi-rate signal processing”, Englewood Cliffs, NJ:
Prentice Hall, pp. 88–91, 1983.
[7] R. W. Schaefer and A. V. Oppenheim, “Discrete-time signal processing”, Englewood Cliffs,
NJ: Prentice-Hall, 1989.
[8] T. Wiegand et al., “Overview of the H.264/AVC video coding standard”, IEEE Trans. circuits and syst. video technol., vol. 13, no. 7, pp. 560-576, Jul. 2003.
[9] Y. Vatis and J. Ostermann, “Locally adaptive non-separable interpolation filter for H.264/AVC”, in Proc. IEEE ICIP, Atlanta, GA, pp. 33–36, Oct. 2006.
[10] T.Wedi, “Adaptive interpolation filter for motion compensated prediction”, Proc. IEEE
ICIP, Rochester, NY, pp. 509–512, Sep. 2002.
[11] H.264/AVC reference software version JM11.0
http://iphome.hhi.de/suehring/tml/download/old_jm/jm11.0.zip, Jan. 2007 [Online].
[12] KTA software, version JM11.0 KTA2.3.
http://www.tnt.uni-hannover.de/~vatis/kta/jm11.0kta2.3.zip, Mar. 2007 [Online].
[13] Y. Vatis and J. Ostermann, “Prediction of P- and B-frames using a 2-D non-separable adaptive Wiener interpolation filter”, in ITU-T SG16/Q.6 (VCEG) Doc. VCEG-AD08, Hangzhou, China, Oct. 2006.
[14] Y. Vatis and J. Ostermann, ITU-T SG16/Q.6 (VCEG) Doc. VCEG-AE16, Marrakech, Morocco, Jan. 2007.
[15] S. Wittmann and T. Wedi, “Separable adaptive interpolation filter”, in ITU-T SG16/Q.6, Doc. C-0219, Geneva, Switzerland, Jul. 2007.
[16] Y. Vatis and J. Ostermann, “Adaptive interpolation filter for H.264/AVC”, IEEE Trans. circuits and syst. video technol., vol. 19, pp. 179-192, Feb. 2009.
[17] D. Rusanovskyy, K. Ugur, and J. Lainema, “Adaptive interpolation with directional filters”,
in ITU-T SG16/Q.6 Doc. VCEG-AG21, Shenzhen,China, Oct. 2007.
[18] D. Marpe, T. Wiegand and G. J. Sullivan, “The H.264/MPEG-4 AVC standard and its
applications”, IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.
[19] T. Wiegand and G. J. Sullivan, “The picturephone is here: Really”, IEEE Spectrum, vol.48,
pp. 50-54, Sep. 2011.
[20] I. E. Richardson, “The H.264 advanced video compression standard”, 2nd Edition, Wiley,
2010.
[21] Y. Vatis, B. Edler, D. T. Nguyen, I. Wassermann, and J. Ostermann, “Coding of coefficients
of 2-D nonseparable adaptive Wiener interpolation filter,” in Proc. SPIE VCIP, Beijing, China,
pp. 623–631, Jul. 2005.
[22] Adaptive interpolation filter for video coding, http://www.h265.net/2010/07/adaptive-interpolation-filter-for-video-coding.html, Aug. 2010 [Online].