MULTIMEDIA PROCESSING EE5359 PROJECT INTERIM REPORT
ADAPTIVE INTERPOLATION FILTER FOR H.264/AVC
- Bhavana Prabhakar, Student Id: 1000790889

PROPOSAL:
Implement a non-separable adaptive interpolation filter in H.264 which analytically minimizes the energy of the prediction error (PE), where h^SP_{i,j} is the 2-D filter coefficient for each fractional-pel (sub-pel) position SP and both the original and the predicted image depend on x and y:

PE = Σ_x Σ_y ( S_{x,y} − Σ_i Σ_j h^SP_{i,j} · P_{x+i, y+j} )²

where S_{x,y} is the original image and P the predicted (interpolated reference) image. This reduces the distorting effects caused by aliasing, motion blur, motion estimation inaccuracies, etc. Further improvements can be achieved by applying a locally adaptive filter, which is adapted to the local properties of the image.

Steps will be taken to reduce blurring effects by first considering only displacement without blurring. With d_av the average displacement vector, the prediction signal S*_i expressed in the frequency domain is

S*_i(jΩ) = S_{i−1}(jΩ) · e^{jΩ·d_av}          eq. (1)

In order to compensate the blurring effects, the adaptive interpolation filter H(jΩ) for perfect motion-compensated prediction has to satisfy the condition

S_i(jΩ) = S*_i(jΩ) · H(jΩ)                     eq. (2)

An individual filter is to be used for the interpolation of each fractional-pel position. The estimation of the coefficients and the motion compensation are performed in the steps given in [16]. The filter coefficients are to be coded: they are subject to quantization, followed by prediction and entropy coding. The aliasing effects are minimized by suppressing the high-frequency components.

ABSTRACT:
To reduce the bit-rate of video signals, current coding standards apply hybrid coding with motion-compensated prediction and transform coding of the prediction error [16]. From prior research [3] it is known that aliasing components contained in an image signal, as well as motion blur, limit the prediction efficiency obtained by motion compensation.
Hence, the objective is to show that the analytical development of an optimal interpolation filter under particular constraints is possible, resulting in coding improvements for broadcast-quality material compared to the H.264/advanced video coding (AVC) [8] high profile. Furthermore, the spatial adaptation to local image characteristics enables further improvements for common intermediate format (CIF) sequences compared to a globally adaptive filter. Additionally, it will be shown that the presented approach is generally applicable, i.e., motion blur can also be exactly compensated if particular constraints are fulfilled.

REQUIREMENT OF INTERPOLATION:
Motion-compensated prediction (MCP) is the key to the success of modern video coding standards, as it removes the temporal redundancy in video sequences and reduces the size of bit streams significantly. With MCP, the pixels to be coded are predicted from temporally neighboring ones, and only the prediction errors and the motion vectors (MV) are transmitted. However, due to the finite sampling rate, the actual position of the prediction in a neighboring frame may lie off the sampling grid, where the intensity is unknown. Therefore the intensities of the positions between the integer pixels, called sub-pixel positions, must be interpolated, and the resolution of the MV is increased accordingly.

INTERPOLATION IN H.264/AVC:
In H.264/AVC, since the resolution of the MV is quarter-pixel, the reference frame is interpolated to 16 times its size for MCP, i.e., four times in each dimension. As shown in Fig. 1(a), the interpolation defined in H.264 comprises two stages, interpolating the half-pixel and quarter-pixel sub-positions, respectively. The interpolation in the first stage is separable: the sampling rate in one direction is doubled by inserting zero-valued samples followed by filtering with the 1-D filter h1 = [1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]/32 [21], and then the process repeats in the other direction.
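As an illustration of this first, separable stage, the sketch below applies the normalized 6-tap half-pel filter to a 1-D row of integer samples. The helper name `half_pel` is hypothetical (not part of the JM or KTA software), and border handling is omitted for brevity:

```python
import numpy as np

# The 1-D half-pel filter of H.264/AVC; its taps sum to 32, so the
# normalized filter has unit DC gain.
H264_HALF_PEL = np.array([1, -5, 20, 20, -5, 1]) / 32.0

def half_pel(row):
    """Half-pel samples of a 1-D signal: output k is the interpolated
    value between row[k+2] and row[k+3] (no border handling)."""
    n = len(row) - 5
    return np.array([row[i:i + 6] @ H264_HALF_PEL for i in range(n)])
```

On a linear ramp the filter reproduces the midpoints exactly, which is one way to see that it behaves like an ideal interpolator for low-frequency content while attenuating aliasing at high frequencies.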
The second stage, which is non-separable, uses bilinear filtering supported by the integer pixels and the interpolated half-pixel values.

Fig. 1 Interpolation process of (a) the filter in H.264/AVC, (b) the optimal AIF, and (c) the separable AIF. ↑m denotes an m-fold increase of the sampling rate due to interpolation.

REVIEW OF ADAPTIVE INTERPOLATION FILTERS (AIF):
Considering the time-varying statistics of video sources, some researchers have proposed an adaptive interpolation filter (AIF) [16], which is one of the design elements making KTA significantly outperform JM [12]. With AIF, the filter coefficients are optimized on a frame basis, such that for each frame the energy of the MCP error is minimized. The optimal filter coefficients are quantized, coded, and transmitted as side information of the associated frame. The 2-D non-separable AIF [22], whose interpolation process is shown in Fig. 1(b), increases the spatial sampling rate 16-fold in a single step by zero-insertion, and each sub-pixel position is interpolated directly by filtering the surrounding 6x6 integer pixels. Fig. 2(a) shows the support region of the 2-D non-separable AIF. As the spatial statistics are assumed to be isotropic, the filter h is circularly symmetric, and therefore only 1/8 of the coefficients are coded, as shown in Fig. 2(b). However, the assumption of isotropic spatial statistics may not hold for every frame in a video sequence. The 2-D separable AIF was therefore proposed; it treats the spatial statistics of the horizontal and vertical directions as different and reduces the complexity of the 2-D non-separable AIF. The 1-D AIFs for the two directions are designed separately. As shown in Fig. 1(c), the horizontal sampling rate is increased four times by zero-insertion and a 1-D filter h1 calculated for the current frame is applied. Then, the process repeats for the vertical direction using h1.

Fig.
2. 2-D non-separable AIF: (a) support region and (b) coded coefficients [22].

DESCRIPTION:
To reduce the bit-rate of video signals, the International Telecommunication Union (ITU) coding standards [14] apply hybrid video coding with motion-compensated prediction combined with transform coding of the prediction error. In the first step, motion-compensated prediction is performed: the temporal redundancy, i.e., the correlation between consecutive images, is exploited for the prediction of the current image from already transmitted images. In a second step, the residual error is transform coded, reducing the spatial redundancy. To perform motion-compensated prediction, the current image of a sequence is split into blocks. For each block a displacement vector d_i is estimated and transmitted that refers to the corresponding position of its image signal in an already transmitted reference image. The displacement vectors have fractional-pel resolution; H.264/AVC [8] is based on 1/4-pel displacement resolution [1]. Displacement vectors with fractional resolution may refer to positions in the reference image that are located between the sampled positions. In order to estimate and compensate the fractional-pel displacements, the reference image has to be interpolated at the fractional-pel positions. H.264/AVC [8] uses a 6-tap Wiener interpolation filter with filter coefficients (1, -5, 20, 20, -5, 1)/32. The interpolation process is depicted in Fig. 3 and can be subdivided into two steps. First, the half-pel positions aa, bb, b, hh, ii, jj and cc, dd, h, ee, ff, gg are calculated, using a horizontal or vertical 6-tap Wiener filter, respectively. Applying the same Wiener filter to the fractional-pel positions aa, bb, b, hh, ii, jj yields the fractional-pel position j.
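The integer arithmetic behind these two steps can be sketched as follows. The rounding offsets and shifts follow the standard's convention for a single half-pel sample; this is a simplification, since position j is actually derived from unclipped intermediate values at higher precision:

```python
def clip255(v):
    """Clip to the 8-bit sample range [0, 255]."""
    return max(0, min(255, v))

def half_pel_sample(E, F, G, H, I, J):
    # First stage: 6-tap filter with rounding offset 16 and shift by 5
    # (division by 32), applied to six integer samples in a row/column.
    return clip255((E - 5*F + 20*G + 20*H - 5*I + J + 16) >> 5)

def quarter_pel_sample(p, q):
    # Second stage: bilinear average with rounding between a full-pel
    # (or half-pel) sample p and a neighbouring sample q.
    return (p + q + 1) >> 1
```

For a flat region all samples survive unchanged, and a quarter-pel value is simply the rounded midpoint of its two supporting samples.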
In the second step, the remaining quarter-pel positions are obtained using a bilinear filter applied to the already calculated half-pel positions and the existing full-pel positions.

Fig. 3. Integer pixels (shaded blocks with upper-case letters) and fractional pixel positions (non-shaded blocks with lower-case letters). Example for filter size 6x6. [15]

An adaptive interpolation filter as proposed in [3] is estimated independently for every image. This approach makes it possible to take into account the alteration of image signal properties, such as aliasing, by minimizing the prediction error energy. An analytical calculation of optimal filter coefficients is not possible there due to the nonlinearity caused by the subsequent application of 1-D filters. In [4] a 3-D filter is proposed, combining two techniques: a 2-D spatial filter and a motion compensated interpolation filter (MCIF). One disadvantage of the MCIF is its sensitivity to displacement vector estimation errors. Besides aliasing, there are further distorting factors that impair the efficiency of motion-compensated prediction. A further disadvantage of the approach in [4] is its numerical procedure for determining the coefficients of a separable 2-D filter: due to the iterative procedure, the method is nondeterministic in terms of time and requires significantly higher encoder complexity. In order to guarantee a limited increase of encoder complexity compared to standard H.264/AVC [8] on the one hand, and to reach the theoretical bound for the coding gain obtainable with a 2-D filter on the other hand, a non-separable filter scheme is proposed. An individual filter is used for the interpolation of each fractional-pel position. In the following, the calculation of the filter coefficients is shown more precisely.
Let us assume that h^SP_{0,0}, h^SP_{0,1}, ..., h^SP_{5,4}, h^SP_{5,5} are the 36 filter coefficients of a 6x6-tap 2-D filter used for a particular sub-pel position SP. Then the value p^SP to be interpolated is computed by a two-dimensional convolution:

p^SP = Σ_{i=1..6} Σ_{j=1..6} P_{i,j} · h^SP_{i−1,j−1}

where P_{i,j} is an integer sample value (A1 ... F6). The calculation of the coefficients and the motion compensation are performed in the following steps:

1. Displacement vectors d = (mvx, mvy) are estimated for every image to be coded. For the purpose of interpolation, the standard interpolation filter of H.264/AVC is applied to every reference image.

2. 2-D filter coefficients h^SP_{i,j} are calculated for each sub-pel position SP independently by minimizing the prediction error energy

(e^SP)² = Σ_x Σ_y ( S_{x,y} − Σ_i Σ_j h^SP_{i,j} · P_{x̃+i, ỹ+j} )²

with x̃ = x + ⌊mvx⌋ − FO and ỹ = y + ⌊mvy⌋ − FO,

where S_{x,y} is the original image, P a previously decoded image, i, j the filter indices, mvx and mvy the components of the estimated displacement vector, FO the so-called filter offset centering the filter (FO = filter_size/2 − 1, i.e., FO = 2 for a 6-tap filter), and ⌊·⌋ the floor operator, which maps the estimated displacement vector mv to the next full-pel position smaller than mv. This is a necessary step, since the previously decoded images contain information only at full-pel positions. Note that for the error minimization only those sub-pel positions are used which were actually referred to by motion vectors. Thus, for each of the sub-pel positions a, ..., o an individual set of equations is set up by computing the derivative of (e^SP)² with respect to the filter coefficients h^SP_{k,l}. The number of equations is equal to the number of filter coefficients used for the current sub-pel position SP.
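The per-position minimization of step 2 reduces to an ordinary least-squares problem over the 36 coefficients. The sketch below builds and solves the resulting normal equations; the function name and the `samples` data layout are illustrative assumptions, not the reference implementation:

```python
import numpy as np

def solve_aif_coefficients(S, P, samples):
    """Least-squares estimate of the 6x6 filter for one sub-pel
    position. `samples` holds tuples (x, y, xt, yt): (x, y) indexes the
    original image S, while (xt, yt) is the top-left full-pel anchor of
    the 6x6 support in the decoded reference P (motion vector already
    floored and the filter offset FO subtracted)."""
    # one row per sample: the 36 reference pixels supporting the prediction
    A = np.array([P[yt:yt + 6, xt:xt + 6].ravel()
                  for _, _, xt, yt in samples], dtype=float)
    b = np.array([S[y, x] for x, y, _, _ in samples], dtype=float)
    # normal equations of min_h ||A h - b||^2, i.e. d(e^SP)^2 / dh = 0
    h = np.linalg.solve(A.T @ A, A.T @ b)
    return h.reshape(6, 6)
```

With synthetic data generated by a known 6x6 filter, the solver recovers that filter exactly (up to floating-point error), which is a useful sanity check before running on real prediction data.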
0 = ∂(e^SP)² / ∂h^SP_{k,l} = Σ_x Σ_y 2 · ( S_{x,y} − Σ_i Σ_j h^SP_{i,j} · P_{x̃+i, ỹ+j} ) · ( −P_{x̃+k, ỹ+l} ),   ∀ k, l ∈ [0; 5]

For each 2-D sub-pel position e, f, g, i, j, k, m, n, o, which uses a 6x6-tap 2-D filter, a system of 36 equations in 36 unknowns has to be solved. For the remaining sub-pel positions, which require a 1-D filter, systems of 6 equations have to be solved. This results in 360 filter coefficients (nine 2-D filter sets with 36 coefficients each and six 1-D filter sets with 6 coefficients each).

3. New displacement vectors are estimated. For the purpose of interpolation, the adaptive interpolation filter computed in step 2 is applied. This step reduces motion estimation errors caused by aliasing, camera noise, etc., on the one hand, and treats the problem in the rate-distortion sense on the other hand.

Steps 2 and 3 can be repeated until a particular quality-improvement threshold is reached. Since some of the displacement vectors differ after step 3, it is conceivable to estimate new filter coefficients adapted to the new displacement vectors.

The quantization, prediction and entropy coding:
• First, finding an optimal quantization step size is very important. On the one hand, the finer the quantization, the more accurate the prediction; on the other hand, the amount of side information increases, which may impair the coding gain. As a trade-off, 9 bits are used for the amplitude of the filter coefficients.
• Second, the entropy-coded differences to the standard Wiener filter are transmitted for the 1-D fractional-pel positions. In the case of a symmetric filter, these are the fractional-pel positions a and b; the filter coefficients for the remaining 1-D fractional-pel positions are obtained by mirroring.
• In order to predict the filter coefficients of the 2-D positions, the calculated filter coefficients of the 1-D positions are used.
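The first two bullets above can be sketched as follows. The uniform step size 2^-9 is one plausible reading of the "9 bits for the amplitude" trade-off, and `coeff_residuals` is a hypothetical helper; the exact quantizer in [16] may differ:

```python
import numpy as np

# Standard 6-tap Wiener taps, used here as the prediction for the
# 1-D adaptive filter coefficients.
WIENER_1D = np.array([1, -5, 20, 20, -5, 1]) / 32.0

def quantize(c, bits=9):
    # Uniform quantizer with step 2**-bits -- an assumed reading of
    # "9 bits for the amplitude of the filter coefficients".
    step = 2.0 ** -bits
    return round(c / step) * step

def coeff_residuals(h_1d, bits=9):
    """Quantized differences to the standard Wiener filter; these
    residuals (not the raw coefficients) are what would be entropy
    coded and transmitted as side information."""
    return [quantize(c - w, bits) for c, w in zip(h_1d, WIENER_1D)]
```

An adaptive filter that happens to equal the standard Wiener filter thus produces all-zero residuals, which cost almost nothing to transmit.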
For the prediction of the non-separable 2-D filter, which can also be regarded as a polyphase filter [6], 2-D separable filters are used. Fig. 4 illustrates an example with the interpolated impulse response of a predicted filter at the fractional-pel position j and the actually calculated filter coefficients from [16]. The spline surface represents the interpolated prediction of the coefficients of a polyphase filter. The dots represent the calculated values of the coefficients of such a filter, sampled at fractional-pel positions. The greater the distance between a dot and the plotted surface, the greater the prediction error. In the case of a symmetric filter, the filter coefficients for the remaining 2-D positions (g, i, k, m, n and o) are also obtained by mirroring. In the case of a non-symmetric filter, the filter coefficients are predicted in the same manner.

Fig. 4. Prediction of the impulse response of a 6x6-tap 2-D Wiener filter at the fractional-pel position j (displacement vector [0.5, 0.5]) and actually calculated filter coefficients from [16].

• The entropy coding is performed using the signed Exp-Golomb code [1]. This code is well suited to Laplacian-distributed data and is already implemented in the standard H.264/AVC, so no additional calculations or look-up tables are required.

In order to keep the amount of necessary side information as low as possible, thus enabling the highest coding gains, the filter coefficients are subject to quantization, followed by prediction and entropy coding. For error resilience reasons, only intra prediction is performed. The aliasing effects are minimized by suppressing the high-frequency components. Steps are taken to reduce blurring effects by first considering only displacement without blurring effects.
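The signed Exp-Golomb code mentioned above is the se(v) code of H.264/AVC. A minimal bit-string sketch of the unsigned code and the signed mapping:

```python
def exp_golomb_unsigned(code_num):
    """ue(v): a prefix of leading zeros, then the binary form of v + 1.
    Small values get short codewords, matching a Laplacian-like
    distribution of the residuals."""
    x = code_num + 1
    return "0" * (x.bit_length() - 1) + format(x, "b")

def exp_golomb_signed(v):
    """se(v): H.264 maps v > 0 to 2v - 1 and v <= 0 to -2v, then
    applies the unsigned code, so magnitudes near zero stay cheap."""
    return exp_golomb_unsigned(2 * v - 1 if v > 0 else -2 * v)
```

For example, a residual of 0 costs a single bit, while ±1 cost three bits each, which is why predicting the coefficients before coding pays off.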
Here d_av is the average displacement vector, and the prediction signal S* expressed in the frequency domain is given in eq. (1):

S*_i(jΩ) = S_{i−1}(jΩ) · e^{jΩ·d_av}          eq. (1)

In order to compensate the blurring effects, the adaptive interpolation filter H(jΩ) for perfect motion-compensated prediction has to satisfy the condition given in eq. (2):

S_i(jΩ) = S*_i(jΩ) · H(jΩ)                     eq. (2)

It can be shown that, by implementing an adaptive interpolation filter, blurring effects caused by motion can be reduced.

Software that will be used: Key technical area (KTA) software version 2.3 [12] and H.264/AVC reference software version JM 11.0 [11].

LIST OF ACRONYMS:
1. AIF: Adaptive interpolation filter
2. AVC: Advanced video coding
3. BD-ROM: Blu-ray Disc read-only memory
4. CIF: Common intermediate format
5. HD-DVD: High-definition digital video disc
6. ITU: International Telecommunication Union
7. KTA: Key technical area
8. MCIF: Motion compensated interpolation filter
9. MCP: Motion compensated prediction
10. MPEG: Moving Picture Experts Group
11. MV: Motion vector
12. VCEG: Video Coding Experts Group

REFERENCES:
[1] JVT of ISO/IEC and ITU-T, "Draft ITU-T Recommendation H.264 and Draft ISO/IEC 14496-10 AVC", Doc. JVT-G050, Pattaya, Thailand, 2003.
[2] O. Werner, "Drift analysis and drift reduction for multiresolution hybrid video coding", Signal Processing: Image Commun., vol. 8, no. 5, pp. 387–409, Jul. 1996.
[3] T. Wedi and H. G. Musmann, "Motion and aliasing compensated prediction for hybrid video coding", IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 577–586, Jul. 2003.
[4] T. Wedi, "Adaptive interpolation filter for motion and aliasing compensated prediction", in Proc. VCIP, San Jose, CA, USA, pp. 415–422, Jan. 2002.
[5] M. Budagavi, "Video compression using blur compensation", in Proc. IEEE ICIP, Genova, Italy, pp. 882–885, Sep. 2005.
[6] R. E. Crochiere and L. R. Rabiner, "Multirate Digital Signal Processing", Englewood Cliffs, NJ: Prentice Hall, pp.
88–91, 1983.
[7] A. V. Oppenheim and R. W. Schafer, "Discrete-Time Signal Processing", Englewood Cliffs, NJ: Prentice-Hall, 1989.
[8] T. Wiegand et al., "Overview of the H.264/AVC video coding standard", IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[9] Y. Vatis and J. Ostermann, "Locally adaptive non-separable interpolation filter for H.264/AVC", in Proc. IEEE ICIP, Atlanta, GA, pp. 33–36, Oct. 2006.
[10] T. Wedi, "Adaptive interpolation filter for motion compensated prediction", in Proc. IEEE ICIP, Rochester, NY, pp. 509–512, Sep. 2002.
[11] H.264/AVC reference software version JM 11.0, http://iphome.hhi.de/suehring/tml/download/old_jm/jm11.0.zip, Jan. 2007 [Online].
[12] KTA software, version JM11.0KTA2.3, http://www.tnt.uni-hannover.de/~vatis/kta/jm11.0kta2.3.zip, Mar. 2007 [Online].
[13] Y. Vatis and J. Ostermann, "Prediction of P- and B-frames using a 2-D non-separable adaptive Wiener interpolation filter", ITU-T SG16/Q.6 (VCEG), Doc. VCEG-AD08, Hangzhou, China, Oct. 2006.
[14] Y. Vatis and J. Ostermann, ITU-T SG16/Q.6 (VCEG), Doc. VCEG-AE16, Marrakech, Morocco, Jan. 2007.
[15] S. Wittmann and T. Wedi, "Separable adaptive interpolation filter", ITU-T SG16/Q.6, Doc. C-0219, Geneva, Switzerland, Jul. 2007.
[16] Y. Vatis and J. Ostermann, "Adaptive interpolation filter for H.264/AVC", IEEE Trans. Circuits Syst. Video Technol., vol. 19, pp. 179–192, Feb. 2009.
[17] D. Rusanovskyy, K. Ugur, and J. Lainema, "Adaptive interpolation with directional filters", ITU-T SG16/Q.6, Doc. VCEG-AG21, Shenzhen, China, Oct. 2007.
[18] D. Marpe, T. Wiegand, and G. J. Sullivan, "The H.264/MPEG-4 AVC standard and its applications", IEEE Communications Magazine, vol. 44, pp. 134–143, Aug. 2006.
[19] T. Wiegand and G. J. Sullivan, "The picturephone is here. Really", IEEE Spectrum, vol. 48, pp. 50–54, Sep. 2011.
[20] I. E. Richardson, "The H.264 Advanced Video Compression Standard", 2nd ed., Wiley, 2010.
[21] Y. Vatis, B. Edler, D. T.
Nguyen, I. Wassermann, and J. Ostermann, "Coding of coefficients of 2-D non-separable adaptive Wiener interpolation filter", in Proc. SPIE VCIP, Beijing, China, pp. 623–631, Jul. 2005.
[22] Adaptive interpolation filter for video coding, http://www.h265.net/2010/07/adaptive-interpolation-filter-for-video-coding.html, Aug. 2010 [Online].