Fast loop filtering using separable characteristics of the integer 2-D discrete cosine transform

Yung-Lyul Lee
Sejong University, Department of Internet Engineering, 98 Kunja-dong, Kwangjin-gu, Seoul 143-747, Korea
E-mail: [email protected]

Il-Hong Shin
Hyun Wook Park, MEMBER SPIE
KAIST, Department of Electrical Engineering, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Korea
E-mail: {ssi,hwpark}@athena.kaist.ac.kr

Abstract. When an image is highly compressed, video coding using the discrete cosine transform (DCT) and quantization produces noticeable image degradations, such as blocking artifacts and ringing noise. In order to reduce the degradations, a loop filtering algorithm is proposed, which consists of simple deblocking filtering and a fast decision rule for ringing and blocking conditions. The proposed method utilizes a four-point integer DCT for detecting blocking and ringing conditions. It also adopts an early termination policy during detection of the blocking and ringing conditions in the DCT domain. This policy speeds up computation of the filtering. The proposed method is compared with the loop filtering of the Joint Video Team (JVT) codec. In the experiments, the computation time of the proposed method is approximately 20% shorter than that of the JVT loop filtering, while the PSNR is almost the same. © 2003 Society of Photo-Optical Instrumentation Engineers. [DOI: 10.1117/1.1594193]

Subject terms: loop filtering; blocking artifacts; discrete cosine transform; image coding.

Paper 020550 received Dec. 18, 2002; revised manuscript received Feb. 21, 2003; accepted for publication Feb. 24, 2003.

1 Introduction

Recently, video compression has been widely used for efficient utilization of communication and data storage in various applications such as multimedia, videophone, video conference, and video streaming.
The Joint Video Team (JVT) codec [1], which is a joint standard of ITU-T Recommendation H.264 and ISO/IEC MPEG-4 Part 10, uses spatial-domain loop filtering [1] to reduce blocking artifacts. It exploits edge information, intra- and inter-coded-block information, skipped blocks, and the difference of motion vectors between the current block and its neighbor blocks. The JVT loop filtering enhances the image quality. However, it requires much computation time in the decoder compared with the other video coding modules [2].

For low-bit-rate moving-picture coding, several loop filtering methods have been proposed for reducing the blocking artifacts [3-7]. These methods simply used a three-tap lowpass filter [(1,2,1) or (1,14,1)] in order to reduce blocking artifacts on block boundaries. However, the (1,14,1) three-tap filter was too weak to reduce the blocking artifacts significantly even though it improved the PSNR, whereas the (1,2,1) three-tap filter can degrade the image details on the block boundary.

TMN10 (test model near-term for ITU-T Recommendation H.263, Version 2) [5,6] adopted a loop filtering method that performs deblocking filtering on the block boundary, where the filter coefficients are determined with consideration of the image edge and quantization parameter. The image edge is detected by using a (1,-4,4,-1) four-tap filter. This method shows good subjective quality, but it degrades image details, and the PSNR is lower than that of TMN10 without loop filtering.

A loop filtering method [7] with low computational complexity was developed and applied to TMN10. It shows good subjective quality and good PSNR, but it can cause encoder-decoder mismatch because it uses an 8×8 noninteger discrete cosine transform (DCT) that can cause integer DCT (IDCT) mismatch. At present, the JVT codec uses a 4×4 integer DCT in order to avoid this problem.

Opt. Eng. 42(9) 2588–2594 (September 2003)
This paper proposes a fast loop filtering method with blocking and ringing conditions based on 4×4 integer DCT coefficients [8,9]. An early termination policy during computation of the DCT coefficients speeds up the proposed loop filtering, since calculation of some DCT coefficients is skipped when the early termination condition is satisfied. Section 2 explains the extraction process of the blocking and ringing conditions using separable 1-D horizontal and vertical DCT. The proposed loop filtering method, which consists of weak and strong deblocking filters, is described in Sec. 3. In Sec. 4, experimental results are presented to compare the proposed method with the JVT's loop filtering with respect to PSNR and computational complexity. Finally, conclusions are given in Sec. 5.

0091-3286/2003/$15.00 © 2003 Society of Photo-Optical Instrumentation Engineers

2 Blocking and Ringing Conditions

As shown in Fig. 1, loop filtering is applied to the reconstructed image in the JVT encoder and decoder, and the loop-filtered image is used as the reference image for motion estimation (ME) and motion compensation (MC) of the following frames.

Fig. 1 Block diagram of JVT codec including the proposed loop filtering.

The generalized 4×4 2-D DCT [8] is expressed as follows:

$$X(k,l) = \frac{1}{2\sqrt{2}}\, C(k)\, C(l) \sum_{i=0}^{3} \sum_{j=0}^{3} x(i,j)\, \cos\frac{(2i+1)k\pi}{8}\, \cos\frac{(2j+1)l\pi}{8},$$

$$C(m) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } m = 0, \\ 1 & \text{for } m = 1,2,3, \end{cases} \qquad (1)$$

where x(i,j) is the pixel value at position (i,j) of the 4×4 block, whose 2-D DCT coefficients are X(k,l). In Eq. (1), j and i are the horizontal and vertical indices, respectively.
The 2-D DCT of a 4×4 block can be expressed as a 1-D vertical DCT followed by a 1-D horizontal DCT [9,10], as follows:

$$X = A x A^{T} = y A^{T}, \qquad y = A x, \qquad (2)$$

where

$$X = \begin{bmatrix} X(0,0) & X(0,1) & X(0,2) & X(0,3) \\ X(1,0) & X(1,1) & X(1,2) & X(1,3) \\ X(2,0) & X(2,1) & X(2,2) & X(2,3) \\ X(3,0) & X(3,1) & X(3,2) & X(3,3) \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad (3a)$$

$$x = \begin{bmatrix} x(0,0) & x(0,1) & x(0,2) & x(0,3) \\ x(1,0) & x(1,1) & x(1,2) & x(1,3) \\ x(2,0) & x(2,1) & x(2,2) & x(2,3) \\ x(3,0) & x(3,1) & x(3,2) & x(3,3) \end{bmatrix}, \qquad y = \begin{bmatrix} y(0,0) & y(0,1) & y(0,2) & y(0,3) \\ y(1,0) & y(1,1) & y(1,2) & y(1,3) \\ y(2,0) & y(2,1) & y(2,2) & y(2,3) \\ y(3,0) & y(3,1) & y(3,2) & y(3,3) \end{bmatrix}, \qquad (3b)$$

and A^T is the transpose of A. This calculation can be performed by addition and bitwise-shift operations.

When only the dc component has a nonzero value in the DCT domain, all 4×4 pixels have the same value in the spatial domain. In this case, the block can cause horizontal and vertical blocking artifacts. When only the far-left column coefficients of the 4×4 DCT coefficients have nonzero values, the 4×4 block may induce horizontal blocking artifacts. When only the top row coefficients of the 4×4 DCT coefficients have nonzero values, the 4×4 block may induce vertical blocking artifacts.

In order to detect blocking and ringing conditions, the reconstructed image is transformed by a 4×4 integer DCT. Rows and columns of the 4×4 DCT coefficients are investigated for verifying the existence of nonzero coefficients in each position. The extraction of the DCT coefficients and the investigation of nonzero coefficients require considerable computation time. In order to reduce the computation load, the investigation of DCT coefficients is performed during the DCT. Therefore, the DCT can be terminated before completion of the 2-D DCT, if some conditions are satisfied. In the proposed fast decision method, a four-point 1-D vertical DCT is first performed by the matrix multiplication of A by x in Eq. (2).
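The separable form of Eq. (2) lends itself to a short illustration. The following Python sketch, offered under stated assumptions rather than as the authors' implementation, applies the matrix A of Eq. (3a) with additions and one-bit shifts, forms y = Ax, and then evaluates individual coefficients X(k,l) on demand so that a scan can stop at the first nonzero value. The function names (`dct4_int`, `vertical_pass`, `coefficient`, `all_zero`) and the `positions` argument are hypothetical; the actual scan order of Fig. 2 and the flag logic of Figs. 3 and 4 are not reproduced here.

```python
# Integer 4x4 transform matrix A from Eq. (3a).
A = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def dct4_int(v):
    """Four-point integer transform of v (same result as A times v),
    using only additions, subtractions, and one-bit left shifts."""
    s03, d03 = v[0] + v[3], v[0] - v[3]
    s12, d12 = v[1] + v[2], v[1] - v[2]
    return [s03 + s12, (d03 << 1) + d12, s03 - s12, d03 - (d12 << 1)]

def vertical_pass(x):
    """y = A x: transform every column of the 4x4 block x."""
    cols = [dct4_int([x[i][j] for i in range(4)]) for j in range(4)]
    return [[cols[j][k] for j in range(4)] for k in range(4)]

def coefficient(y, k, l):
    """One entry of X = y A^T: X(k,l) = sum over j of y(k,j) * A(l,j)."""
    return sum(y[k][j] * A[l][j] for j in range(4))

def all_zero(y, positions):
    """True if every listed coefficient is zero. The loop returns at the
    first nonzero coefficient, so later ones are never computed
    (illustrative early termination; `positions` is a hypothetical order)."""
    for k, l in positions:
        if coefficient(y, k, l) != 0:
            return False
    return True

# A flat block has only a dc coefficient.
flat = [[10] * 4 for _ in range(4)]
y = vertical_pass(flat)
print(coefficient(y, 0, 0))  # 160
```

Computing coefficients one at a time from y costs more than a full butterfly pass when every coefficient is needed, so the benefit of this arrangement depends on how often the scan stops early.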
After the vertical pass, a 1-D horizontal DCT is performed with an early termination policy to detect blocking and ringing conditions. The numbers in Fig. 2 denote the computation order of DCT coefficients to detect the corresponding conditions. The horizontal flag, vertical flag, and ringing condition become 1 when all the coefficients except those specified by numbers are zeros in Fig. 2(b), 2(c), and 2(d), respectively. The flags and ringing condition are defined by the pseudocode in Figs. 3 and 4.

Fig. 2 DCT coefficients used for detection of the blocking and ringing conditions: (a) dc coefficients, (b) DCT coefficients for detection of the horizontal flag and ringing condition, (c) DCT coefficients for detection of the vertical flag and ringing condition, (d) DCT coefficients for detection of the ringing condition. The numbers denote the computation order of the DCT coefficients.

Fig. 3 Extraction routine for flags and condition for early termination.

Fig. 4 Extraction routine for ringing condition for early termination.

First, the far-left column coefficients are investigated for horizontal flags in the order in Fig. 2(b). As described in the pseudocode of Fig. 3, the horizontal flag is computed first, followed by investigation of the DCT coefficients for the vertical flag. When the ringing condition becomes 1 in Fig. 3, the extraction routine is terminated according to the early termination policy. To improve computational efficiency, the ringing condition is first investigated for the far-left column and top row coefficients, as shown in Fig. 3. If the ringing condition is still 0 after investigation of the far-left column and top row coefficients, the DCT coefficients X(0,2), X(1,1),...,X(3,3) in Fig. 2(d) are calculated sequentially in a zigzag scan order by matrix multiplication of y by A^T, and the coefficients are investigated for detecting the ringing condition as shown in Fig. 4. Finally, horizontal and vertical blocking conditions are defined by consideration of the above horizontal flag, vertical flag, and ringing condition as shown in Fig. 5.

Fig. 5 Decision of the horizontal and vertical blocking conditions from horizontal flag, vertical flag, and ringing condition.

3 The Proposed Loop Filtering

After obtaining the horizontal blocking, vertical blocking, and ringing conditions for each 4×4 block as in Sec. 2, strong or weak deblocking filtering is performed on the boundary of the 4×4 block according to those conditions. Horizontal strong filtering is applied if the ringing condition is 0 and the horizontal blocking conditions of both the current block and its left block are 1. Otherwise, weak horizontal filtering is applied to the block boundary, except when the current block is a skipped block, to which loop filtering is not applied.

In Fig. 6, pixels A, B, C, and D straddle a horizontal block boundary, where the block boundary is between B and C. In strong horizontal filtering that reduces blocking artifacts, smoothing is applied to the four pixels A, B, C, and D, where the five-tap smoothing filter has filter coefficients (1/8, 1/4, 1/4, 1/4, 1/8). For horizontal weak filtering, the two pixels B and C are slightly modified in consideration of the macroblock type of the current block and the left block. JVT defines seven macroblock types [1]. Usually similar regions have the same macroblock types. Thus, a slightly weaker deblocking filter is applied to the boundary between blocks having different macroblock types, whereas a slightly stronger weak deblocking filter is applied to those having the same macroblock type.

Fig. 6 Pixels for filtering on the horizontal block boundary.

Fig. 7 Weak filtering method: (a) pixel values around block boundary in 1-D cut view, (b) weakly filtered pixels when the current block and the block to its left have the same macroblock types, (c) weakly filtered pixels when the macroblock types of the current block and the block to its left are different.

As shown in Fig. 7(a), C and D are pixels inside the current block, and A and B are pixels on the block to its left. When the current block and the block to its left have the same macroblock types, the boundary pixels B and C are replaced with new values as follows:

$$B_{\text{new}} = B - \frac{B - C}{q}, \qquad C_{\text{new}} = C + \frac{B - C}{q}, \qquad (4)$$

where q is the quantization parameter of the JVT codec. The pixels filtered using Eq. (4) are shown in Fig. 7(b). When the macroblock types of the current block and the block to its left are different, the boundary pixels B and C in Fig. 7(a) are replaced with the following new values as shown in Fig. 7(c):

$$B_{\text{new}} = B + \frac{4(C - B) + (A - D)}{8q}, \qquad C_{\text{new}} = C - \frac{4(C - B) + (A - D)}{8q}. \qquad (5)$$

Vertical filtering can be applied to vertical boundary pixels in the same way as horizontal filtering, using the vertical blocking conditions of the current block and the block above in place of the horizontal blocking conditions.

Figure 8 shows the blocking artifacts of the "Foreman" sequence in intra and inter frames when their quantization parameters q are 24 and 25, respectively. The "Foreman" sequence was compressed by the JVT codec without loop filtering. As shown at the diagonal lines of the wall in Fig. 8(a) and 8(b), the blocking artifacts in an intra frame are more serious than those of an inter frame. In addition, the intra frame is used as the first reference frame of motion compensation for the following inter frames, so that the intra frame needs to be filtered more elaborately.

Fig. 8 Blocking artifacts in the "Foreman" sequence: (a) an intra frame when q is 24, (b) an inter frame when q is 25.
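As a worked illustration of the weak filters in Eqs. (4) and (5), the following Python sketch evaluates both update rules on one row of boundary pixels. It is a minimal sketch under assumptions: the function names are hypothetical, the results are left as floating-point values, and any rounding or clipping to the valid pixel range, which the text does not specify, is omitted.

```python
def weak_filter_same_type(b, c, q):
    """Eq. (4): current and left blocks have the same macroblock type.
    Returns the new values of the boundary pixels (B, C)."""
    delta = (b - c) / q
    return b - delta, c + delta

def weak_filter_diff_type(a, b, c, d, q):
    """Eq. (5): current and left blocks have different macroblock types.
    Returns the new values of the boundary pixels (B, C)."""
    delta = (4 * (c - b) + (a - d)) / (8 * q)
    return b + delta, c - delta

# Pixels A, B | C, D across a boundary with a step of 24 and q = 24.
a, b, c, d, q = 100, 100, 124, 124, 24
print(weak_filter_same_type(b, c, q))        # (101.0, 123.0)
print(weak_filter_diff_type(a, b, c, d, q))  # (100.375, 123.625)
```

For this step edge, Eq. (4) moves B and C toward each other by 1 gray level each, while Eq. (5) moves them by 0.375, which matches the statement that the filter for differing macroblock types is the weaker of the two.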
The proposed loop filtering algorithm is therefore designed in consideration of the difference between the intra frame and inter frame, as shown in Fig. 9. In Fig. 9, the neighbor block means the block to the left of the current block when horizontal deblocking filtering is applied, and the block above the current block when vertical deblocking filtering is applied. In the intra frame, the ringing condition of the current block and the blocking conditions of the current and neighbor blocks decide whether strong or weak deblocking filtering is applied to the boundary pixels. In the inter frame, strong or weak deblocking filtering is selected by consideration of skipped blocks, motion vector, block type, and ringing and blocking conditions as shown in Fig. 9. Since the reference block of the current inter block has already been filtered, the inter block can be refiltered and smoothed too much if its neighbor blocks have the same motion vectors or are skipped blocks. Therefore, the motion vectors and coded block patterns (CBPs) are considered to prevent oversmoothing of the current block.

Fig. 9 Flow graph of the proposed loop-filtering algorithm, where Bc is the blocking condition of the current block, Bp is the blocking condition of the neighbor block, and Rc is the ringing condition of the current block. "Not Coded" means a skipped block, and MVc and MVp are the motion vectors of the current block and its neighbor block, respectively. & and | mean logical AND and logical OR operation, respectively.

4 Experimental Results

In the experiments, the intra frame (I frame) and inter frames (P frames) are filtered to reduce blocking artifacts and ringing noise. We used the Exp-Golomb code [1]; variable-block ME having 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 block sizes; motion search range -16 to +16; quarter-pixel MC; 4×4 DCT; and the rate-distortion optimization of the JVT codec [3]. Several video sequences were used for the experiment: the 10-Hz "Foreman" sequence of the Quarter Common Intermediate Format (QCIF), the 10-Hz "News" sequence of the QCIF, the 15-Hz "Silent Voice" sequence of the QCIF, and the 15-Hz "Paris" sequence of the Common Intermediate Format (CIF). Each video sequence was compressed with the proposed loop filtering and the JVT loop filtering.

Most experiments show very similar results. The rate-distortion plots for each sequence are shown in Fig. 10, and Table 1 shows the differences [11] of the bit rates and the PSNR between the proposed loop filtering and the JVT loop filtering. In Table 1, the negative values of the bit rates mean that the proposed loop filtering has lower bit rates than the JVT, and the positive values of the luminance (PSNR-Y) and chrominance PSNRs (PSNR-U and PSNR-V) mean that the proposed filtering has higher PSNR than the JVT.

Although the PSNR difference between the proposed and JVT loop filtering methods is very small as shown in Fig. 10, the computation time of the proposed method is approximately 20% shorter than that of the JVT loop filtering on a 1.5-GHz Pentium-IV for various video sequences, as shown in Fig. 11. In Fig. 11, the computation time of the proposed loop filtering without the early termination is also presented to demonstrate the effect of the early termination. When the early termination during detection of the ringing and blocking conditions is not used, the computation time of the proposed method is similar to that of the JVT loop filtering.

Figure 12 compares the subjective quality of the proposed loop filtering with that of the JVT. Comparing Fig. 12(a) with Fig. 12(b) around the eyes in the "Foreman" sequence, and Fig. 12(c) with Fig. 12(d) at the lady's nose in the "News" sequence, it is seen that the proposed loop filtering makes the image details less blurred.
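The comparisons above are stated in terms of PSNR, which the text uses without giving a formula. As a reminder of the usual definition, the following Python sketch computes PSNR between a reference and a decoded signal. It assumes 8-bit samples with a peak value of 255, which is a common convention rather than something stated in this paper, and the function name `psnr` is hypothetical. It does not reproduce the averaged rate-distortion differences [11] reported in Table 1.

```python
import math

def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally long sequences of pixel values.
    `peak` is the assumed maximum sample value (255 for 8-bit video)."""
    if len(reference) != len(decoded):
        raise ValueError("sequences must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)
```

Applied separately to the Y, U, and V planes, a function of this kind would give per-component values corresponding to the PSNR-Y, PSNR-U, and PSNR-V quantities discussed above.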
Fig. 10 Rate-distortion (PSNR versus bit rate) plots of the proposed loop filtering and JVT loop filtering for (a) "Foreman" sequence with QCIF, (b) "News" sequence with QCIF, (c) "Silent Voice" sequence with QCIF, and (d) "Paris" sequence with CIF. Throughout, fps stands for frames/s, and Kbps for 1024 bits/s.

Table 1 Differences in bit rates and PSNRs between the proposed and the JVT loop filtering, where QP is the quantization parameter.

Sequence        QP    Bit rate    PSNR-Y    PSNR-U    PSNR-V
Foreman QCIF    12    -0.16        0.01     -0.05     -0.05
                16    -0.09        0.02     -0.03      0.00
                20    -0.6         0.01     -0.02     -0.07
                24     0.05        0.06      0.03     -0.05
Silent QCIF     12     0.03       -0.01      0.04     -0.01
                16     0.10       -0.03      0.02     -0.12
                20     0.19        0.07     -0.08     -0.15
                24     0.25        0.08     -0.16     -0.08
News QCIF       12     0.23       -0.02      0.02     -0.06
                16     0.12       -0.05     -0.17     -0.20
                20     0.11       -0.04     -0.12     -0.13
                24     0.12        0.03      0.03     -0.10
Paris CIF       12     0.47       -0.02     -0.01     -0.05
                16     0.58        0.01     -0.06      0.03
                20     0.26       -0.02     -0.08      0.00
                24     0.15       -0.02      0.01     -0.03

Fig. 11 Computation time of the proposed loop filtering and JVT loop filtering for four test sequences of Table 1. The vertical axis is normalized to the computation time of JVT loop filtering.

Fig. 12 Parts of the decoded frames of the "Foreman" (a, b) and "News" (c, d) sequences containing detail: (a, c) the proposed loop-filtered frame, (b, d) the JVT loop-filtered frame.

5 Conclusions

This paper has proposed loop filtering using separable DCT characteristics. Comparing the proposed method with the JVT loop filtering, we find that the proposed one requires less computation time due to the early termination during detection of the ringing and blocking conditions, while the PSNR is preserved. Also the subjective quality of the filtered image is improved, especially in complex regions. In general, loop filtering increases the computational complexity in the encoder and decoder. The proposed loop filtering method can be very useful in real applications to reduce the computational burden.
Acknowledgment

This work was partly supported by a Korea Research Foundation grant (KRF-2002-003-D00339).

References

1. T. Wiegand, "Joint final committee draft (JFCD) of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)," JVT-D157 (Aug. 2002).
2. C. Blanch and K. Denolf, "Memory complexity analysis of the AVC Codec JM1.7," ISO/IEC JTC1/SC29/WG11 MPEG02/M8378, Fairfax (May 2002).
3. CCITT Recommendation H.261, "Video codec for audiovisual services at p×64 kbits/s" (Dec. 1990).
4. K. K. Pang and T. K. Tan, "Optimum loop filter in hybrid coders," IEEE Trans. Circuits Syst. Video Technol. 4, 158–167 (1994).
5. ITU Telecom. Standardization Sector, "Video codec test model near-term," Version 10 (TMN10) Draft 1, H.263 Ad Hoc Group (Apr. 1998).
6. ITU Telecom. Standardization Sector, "Video coding for low bitrate communication," Draft ITU-T Recommendation H.263 Version 2 (Jan. 1998).
7. Y. L. Lee and H. W. Park, "Loop filtering and post-filtering for low-bit-rates moving picture coding," Signal Process. Image Commun. 16, 871–890 (2001).
8. K. R. Rao and P. Yip, "Fast algorithm for DCT-II," Chap. 4 in Discrete Cosine Transform, Academic Press, New York (1990).
9. A. Hallapuro and M. Karczewicz, "Low complexity transform and quantization—part I: basic implementation," ISO/IEC MPEG & ITU-T Q.6/SG16 Q.6, JVT-B038 (Jan. 2002).
10. A. K. Jain, "Image transform," Chap. 5 in Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ (1989).
11. G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," ITU-T Q.6/SG16 VCEG-M33 (Mar. 2001).

Yung-Lyul Lee received the BS and MS degrees in electronic engineering from Sogang University, Seoul, Korea, in 1985 and 1987, respectively, and the PhD degree in electrical and electronic engineering from Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1999. He has been an assistant professor in the Department of Internet Engineering, Sejong University, Seoul, Korea, since 2001. He was a principal researcher at the Samsung Electronics Co. Ltd. from 1987 to 2001. His current research interests include video compression, image processing, watermarking, and multimedia systems.

Il-Hong Shin was born in Pohang, Korea, in 1978. He received the BS and MS degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 2000 and 2002, respectively. He is currently working toward the PhD at the same university. His research interests include image and video coding, video communication, rate control, and multimedia systems.

Hyun Wook Park received the BS degree in electrical engineering from Seoul National University, Seoul, Korea, in 1981, and the MS and PhD degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Seoul, Korea, in 1983 and 1988, respectively. He has been a professor in the Electrical Engineering Department, KAIST, Daejeon, Korea, since 1993. He was a research associate at the University of Washington from 1989 to 1992 and was a senior executive researcher at the Samsung Electronics Co. Ltd. from 1992 to 1993. His current research interests include image computing systems, image compression, medical imaging, and multimedia systems. He is a senior member of the IEEE and a member of SPIE.