Fast loop filtering using separable characteristics
of the integer 2-D discrete cosine transform
Yung-Lyul Lee
Sejong University
Department of Internet Engineering
98 Kunja-dong, Kwangjin-gu
Seoul 143-747, Korea
E-mail: [email protected]
Il-Hong Shin
Hyun Wook Park, MEMBER SPIE
KAIST 373-1
Department of Electrical Engineering
Guseong-dong, Yuseong-gu
Daejeon 305-701, Korea
E-mail: {ssi,hwpark}@athena.kaist.ac.kr
Abstract. When an image is highly compressed, video coding using the
discrete cosine transform (DCT) and quantization produces noticeable
image degradations, such as blocking artifacts and ringing noise. To reduce
these degradations, a loop filtering algorithm is proposed that consists of
simple deblocking filtering and a fast decision rule for ringing and blocking
conditions. The proposed method uses a four-point integer DCT to detect
blocking and ringing conditions. It also adopts an early termination policy
during detection of these conditions in the DCT domain, which speeds up
the filtering computation. The proposed method is compared with the loop
filtering of the Joint Video Team (JVT) codec. Experimental results show that
the computation time of the proposed method is approximately 20% shorter
than that of the JVT loop filtering, while the PSNR is almost the same.
© 2003 Society of Photo-Optical Instrumentation Engineers. [DOI: 10.1117/1.1594193]
Subject terms: loop filtering; blocking artifacts; discrete cosine transform; image
coding.
Paper 020550 received Dec. 18, 2002; revised manuscript received Feb. 21,
2003; accepted for publication Feb. 24, 2003.
1 Introduction
Video compression is widely used to make efficient use of communication
bandwidth and data storage in various applications such as multimedia,
videophone, videoconferencing, and video streaming. The Joint Video Team
(JVT) codec,1 which is a joint standard of ITU-T Recommendation H.264 and
ISO/IEC MPEG-4 Part 10, uses spatial-domain loop filtering1 to reduce
blocking artifacts. It exploits edge information, intra- and inter-coded-block
information, skipped blocks, and the difference of motion vectors between
the current block and its neighbor blocks. The JVT loop filtering enhances
the image quality. However, it requires considerable computation time in the
decoder compared with the other video coding modules.2
For low-bit-rate moving-picture coding, several loop-filtering methods have
been proposed for reducing the blocking artifacts.3–7 These methods simply
used a three-tap lowpass filter [(1,2,1) or (1,14,1)] to reduce blocking
artifacts on block boundaries. However, the (1,14,1) three-tap filter was too
weak to reduce the blocking artifacts significantly even though it improved
the PSNR, whereas the (1,2,1) three-tap filter can degrade the image details
on the block boundary. TMN10 (test model near-term for ITU-T Recommendation
H.263, Version 2)5,6 adopted a loop filtering method that performs deblocking
filtering on the block boundary, where the filter coefficients are determined
with consideration of the image edge and quantization parameter. The image
edge is detected by using a (1,-4,4,-1) four-tap filter. This method shows
good subjective quality, but it degrades image details, and the PSNR is lower
than that of TMN10 without loop filtering.
A loop filtering method7 with low computational complexity was developed and
applied to TMN10. It shows good subjective quality and good PSNR, but it can cause
2588 Opt. Eng. 42(9) 2588–2594 (September 2003)
encoder-decoder mismatch because it uses an 8×8 noninteger discrete cosine transform (DCT), which can cause inverse DCT (IDCT) mismatch. At present, the JVT codec uses a
4×4 integer DCT in order to avoid this problem.
This paper proposes a fast loop filtering method with blocking and ringing
conditions based on 4×4 integer DCT coefficients.8,9 An early termination
policy during computation of the DCT coefficients speeds up the proposed loop
filtering, since calculation of some DCT coefficients is skipped when the
early termination condition is satisfied. Section 2 explains the extraction
process of the blocking and ringing conditions using separable 1-D horizontal
and vertical DCT. The proposed loop filtering method, which consists of weak
and strong deblocking filters, is described in Sec. 3. In Sec. 4, experimental
results are presented to compare the proposed method with the JVT loop
filtering with respect to PSNR and computational complexity. Finally,
conclusions are given in Sec. 5.
2 Blocking and Ringing Conditions
Fig. 1 Block diagram of JVT codec including the proposed loop filtering.

As shown in Fig. 1, loop filtering is applied to the reconstructed image in the JVT encoder and decoder, and the
loop-filtered image is used as the reference image for motion estimation (ME) and motion compensation (MC) of the
following frames. The generalized 4×4 2-D DCT8 is expressed as follows:

$$
X(k,l) = \frac{1}{2\sqrt{2}}\, C(k)\, C(l) \sum_{i=0}^{3} \sum_{j=0}^{3} x(i,j)
\cos\frac{(2i+1)k\pi}{8} \cos\frac{(2j+1)l\pi}{8},
\qquad
C(m) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } m = 0, \\ 1 & \text{for } m = 1, 2, 3, \end{cases}
\tag{1}
$$

where x(i,j) is the pixel value at position (i,j) of the 4×4 block, and X(k,l) are its 2-D DCT coefficients. In Eq. (1), j and i are the horizontal and
vertical indices, respectively. The 2-D DCT of a 4×4 block
can be expressed as a 1-D vertical DCT followed by a 1-D
horizontal DCT,9,10 as follows:

$$
X = A x A^{T} = y A^{T}, \qquad y = A x, \tag{2}
$$

where

$$
X = \begin{bmatrix}
X(0,0) & X(0,1) & X(0,2) & X(0,3) \\
X(1,0) & X(1,1) & X(1,2) & X(1,3) \\
X(2,0) & X(2,1) & X(2,2) & X(2,3) \\
X(3,0) & X(3,1) & X(3,2) & X(3,3)
\end{bmatrix},
\qquad
A = \begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix},
\tag{3a}
$$

$$
x = \begin{bmatrix}
x(0,0) & x(0,1) & x(0,2) & x(0,3) \\
x(1,0) & x(1,1) & x(1,2) & x(1,3) \\
x(2,0) & x(2,1) & x(2,2) & x(2,3) \\
x(3,0) & x(3,1) & x(3,2) & x(3,3)
\end{bmatrix},
\qquad
y = \begin{bmatrix}
y(0,0) & y(0,1) & y(0,2) & y(0,3) \\
y(1,0) & y(1,1) & y(1,2) & y(1,3) \\
y(2,0) & y(2,1) & y(2,2) & y(2,3) \\
y(3,0) & y(3,1) & y(3,2) & y(3,3)
\end{bmatrix},
\tag{3b}
$$

and $A^{T}$ is the transpose of A. Because the entries of A are ±1 and ±2, this calculation can
be performed by addition and bitwise-shift operations.
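As an illustration of the last remark, the following minimal Python sketch evaluates Eq. (2) with additions, subtractions, and one-bit left shifts only, and checks the result against direct matrix products. The butterfly grouping in `transform4` is a common factorization of A and is an assumption here, not necessarily the ordering used by the authors; the function names and the sample block are hypothetical.

```python
# Forward 4x4 integer transform matrix A of Eq. (3a).
A = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def transform4(v):
    """Return A times the length-4 vector v using additions, subtractions, and shifts only."""
    s0, s1 = v[0] + v[3], v[1] + v[2]   # sums of mirrored samples
    d0, d1 = v[0] - v[3], v[1] - v[2]   # differences of mirrored samples
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def forward_dct4x4(x):
    """Compute X = A x A^T as a vertical pass (y = A x) followed by a horizontal pass."""
    columns = [transform4([x[i][j] for i in range(4)]) for j in range(4)]
    y = [[columns[j][k] for j in range(4)] for k in range(4)]   # y = A x
    return [transform4(row) for row in y]                       # X = y A^T


def forward_dct4x4_reference(x):
    """Same result by direct matrix products, used to check the butterfly version."""
    y = [[sum(A[k][i] * x[i][j] for i in range(4)) for j in range(4)] for k in range(4)]
    return [[sum(y[k][j] * A[l][j] for j in range(4)) for l in range(4)] for k in range(4)]


# Arbitrary sample block (hypothetical pixel values).
block = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]
assert forward_dct4x4(block) == forward_dct4x4_reference(block)
```

Each call to `transform4` needs eight additions or subtractions and two shifts, so no multiplication is required for either the vertical or the horizontal pass.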
When only the dc component has a nonzero value in the
DCT domain, all 4×4 pixels have the same value in the
spatial domain. In this case, the block can cause horizontal
and vertical blocking artifacts. When only the far-left column of the 4×4 DCT coefficients has nonzero values, the 4×4 block may induce horizontal blocking
artifacts. When only the top row of the 4×4
DCT coefficients has nonzero values, the 4×4 block may
induce vertical blocking artifacts.
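These three cases can be checked numerically with the integer transform of Eq. (2). The sketch below is only an illustration: the helper names and the test blocks are hypothetical, and the transform is evaluated by plain matrix products for brevity.

```python
# Integer transform matrix A of Eq. (3a).
A = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def dct4x4(x):
    """X = A x A^T, written as y = A x followed by X = y A^T."""
    y = [[sum(A[k][i] * x[i][j] for i in range(4)) for j in range(4)] for k in range(4)]
    return [[sum(y[k][j] * A[l][j] for j in range(4)) for l in range(4)] for k in range(4)]


def nonzero_positions(X):
    """Set of (k, l) index pairs at which the coefficient X(k, l) is nonzero."""
    return {(k, l) for k in range(4) for l in range(4) if X[k][l] != 0}


flat = [[7] * 4 for _ in range(4)]                      # all pixels equal
rows_constant = [[v] * 4 for v in (10, 12, 20, 30)]     # no variation along j
cols_constant = [[10, 12, 20, 30] for _ in range(4)]    # no variation along i

print(sorted(nonzero_positions(dct4x4(flat))))           # dc coefficient only
print(sorted(nonzero_positions(dct4x4(rows_constant))))  # far-left column only (l = 0)
print(sorted(nonzero_positions(dct4x4(cols_constant))))  # top row only (k = 0)
```

A block without horizontal variation therefore has energy only in the far-left column X(k,0), and a block without vertical variation only in the top row X(0,l), which is the structure the flags of this section test for.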
In order to detect blocking and ringing conditions, the
reconstructed image is transformed by a 4×4 integer DCT.
Rows and columns of the 4×4 DCT coefficients are investigated to verify the existence of nonzero coefficients in
each position. The extraction of the DCT coefficients and
the investigation of nonzero coefficients require considerable computation time. In order to reduce the computation
load, the investigation of DCT coefficients is performed
during the DCT. Therefore, the DCT can be terminated before completion of the 2-D DCT if some conditions are
satisfied. In the proposed fast decision method, a four-point
1-D vertical DCT is first performed by the matrix multiplication of A by x in Eq. (2). Then, the 1-D horizontal DCT is
performed with an early termination policy to detect blocking and ringing conditions.
Fig. 2 DCT coefficients used for detection of the blocking and ringing conditions: (a) dc coefficients,
(b) DCT coefficients for detection of the horizontal flag and ringing condition, (c) DCT coefficients for
detection of the vertical flag and ringing condition, (d) DCT coefficients for detection of the ringing
condition. The numbers denote the computation order of the DCT coefficients.

Fig. 3 Extraction routine for flags and condition for early termination.

Fig. 5 Decision of the horizontal and vertical blocking conditions
from horizontal flag, vertical flag, and ringing condition.

The numbers in Fig. 2 denote the computation order of
DCT coefficients to detect the corresponding conditions.
The horizontal flag, vertical flag, and ringing condition become 1 when all the coefficients except those specified by
numbers are zeros in Fig. 2(b), 2(c), and 2(d), respectively.
The flags and ringing condition are defined by the
pseudocode in Figs. 3 and 4. First, the far-left column coefficients are investigated for horizontal flags in the order in
Fig. 2(b). As described in the pseudocode of Fig. 3, the
horizontal flag is computed, and then the DCT coefficients for the vertical flag are investigated. When the ringing
condition becomes 1 in Fig. 3, the extraction routine is
terminated according to the early termination policy. To
improve computational efficiency, the ringing condition is
first investigated for the far-left column and
top row coefficients as shown in Fig. 3. If the ringing condition is still 0 after investigation of the far-left column and top
row coefficients, the DCT coefficients X(0,2),
X(1,1),...,X(3,3) in Fig. 2(d) are calculated sequentially in
a zigzag scan order by matrix multiplication of y by $A^{T}$,
and the coefficients are investigated for detecting the ringing condition as shown in Fig. 4.
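The exact flag definitions are given by the pseudocode of Figs. 3 and 4, which is not reproduced in the text, so the following Python sketch illustrates only the early-termination idea: after the vertical pass y = Ax, horizontal coefficients are computed one at a time and the scan stops at the first nonzero one. The test used here (any nonzero coefficient outside the far-left column and top row), the scan order `INNER_SCAN`, and all function names are hypothetical stand-ins, not the authors' rule.

```python
# Integer transform matrix A of Eq. (3a).
A = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

# Hypothetical scan order over the nine coefficients with k >= 1 and l >= 1.
INNER_SCAN = [(1, 1), (1, 2), (2, 1), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]


def vertical_dct(x):
    """1-D vertical DCT of every column: y = A x."""
    return [[sum(A[k][i] * x[i][j] for i in range(4)) for j in range(4)] for k in range(4)]


def coefficient(y, k, l):
    """Single coefficient X(k, l) = sum_j y(k, j) A(l, j), i.e. one entry of y A^T."""
    return sum(y[k][j] * A[l][j] for j in range(4))


def scan_with_early_exit(y, positions):
    """Return (found_nonzero, coefficients_computed), stopping at the first nonzero value."""
    computed = 0
    for k, l in positions:
        computed += 1
        if coefficient(y, k, l) != 0:
            return True, computed       # early termination: the rest are never computed
    return False, computed


flat = [[10] * 4 for _ in range(4)]
checker = [[10 if (i + j) % 2 == 0 else 0 for j in range(4)] for i in range(4)]

print(scan_with_early_exit(vertical_dct(flat), INNER_SCAN))     # (False, 9)
print(scan_with_early_exit(vertical_dct(checker), INNER_SCAN))  # (True, 1)
```

For the textured block the scan stops after a single coefficient, whereas the flat block requires all nine; the saving therefore depends on how often detailed blocks occur.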
Finally, horizontal and vertical blocking conditions are
defined by consideration of the above horizontal flag, vertical flag, and ringing condition as shown in Fig. 5.
3 The Proposed Loop Filtering
Fig. 4 Extraction routine for ringing condition for early termination.

After obtaining the horizontal blocking, vertical blocking,
and ringing conditions for each 4×4 block as in Sec. 2,
strong or weak deblocking filtering is performed on the
boundary of the 4×4 block according to those conditions.
Horizontal strong filtering is applied if the ringing condition is 0 and the horizontal blocking conditions of both the
current block and its left block are 1. Otherwise, weak horizontal filtering is applied to the block boundary, unless
the current block is a skipped block, in which case loop filtering is not applied.
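The selection rule stated in this paragraph can be written as a small decision function. This is a minimal sketch of the prose rule only; the complete algorithm of Fig. 9 also considers frame type, motion vectors, and coded block patterns, which are not modeled here. The function name, argument names, and string labels are hypothetical.

```python
def choose_horizontal_filter(ringing, blocking_current, blocking_left, skipped):
    """Select the filter for the boundary between the current block and its left block.

    ringing          -- ringing condition of the current block (0 or 1)
    blocking_current -- horizontal blocking condition of the current block (0 or 1)
    blocking_left    -- horizontal blocking condition of the left block (0 or 1)
    skipped          -- True if the current block is a skipped block
    """
    if skipped:
        return "none"        # loop filtering is not applied to skipped blocks
    if ringing == 0 and blocking_current == 1 and blocking_left == 1:
        return "strong"
    return "weak"


print(choose_horizontal_filter(0, 1, 1, skipped=False))  # strong
print(choose_horizontal_filter(1, 1, 1, skipped=False))  # weak
print(choose_horizontal_filter(0, 1, 1, skipped=True))   # none
```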
In Fig. 6, pixels A, B, C, and D straddle a horizontal
block boundary, where the block boundary is between B
and C. In strong horizontal filtering, which reduces blocking
artifacts, smoothing is applied to the four pixels A, B, C,
and D using a five-tap smoothing filter with coefficients (1/8, 1/4, 1/4, 1/4, 1/8).
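A minimal Python sketch of such a smoothing step is given below. The text specifies only the filter coefficients and the four pixels that are modified, so several details are assumptions: the five-tap window is centered on the pixel being filtered, all taps read unfiltered input pixels, and the result is rounded to the nearest integer. The function name and the sample line are hypothetical.

```python
def strong_filter_line(line, c_index):
    """Smooth pixels A, B, C, D on one line of pixels crossing a block boundary.

    line    -- list of integer pixel values
    c_index -- index of pixel C, the first pixel of the current block, so that
               A, B, C, D sit at c_index - 2 .. c_index + 1
    Assumes c_index >= 4 and c_index + 3 < len(line) so every window fits.
    """
    taps = (1, 2, 2, 2, 1)                # (1/8, 1/4, 1/4, 1/4, 1/8) scaled by 8
    out = list(line)
    for p in range(c_index - 2, c_index + 2):
        acc = sum(t * line[p - 2 + n] for n, t in enumerate(taps))
        out[p] = (acc + 4) >> 3           # divide by 8 with rounding (assumed)
    return out


# A step of height 40 across the boundary between index 3 (B) and index 4 (C).
step = [10, 10, 10, 10, 50, 50, 50, 50]
print(strong_filter_line(step, 4))        # [10, 10, 15, 25, 35, 45, 50, 50]
```

Under these assumptions the abrupt step is replaced by a ramp over the four boundary pixels, while a constant line is left unchanged.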
Fig. 6 Pixels for filtering on the horizontal block boundary.

Fig. 7 Weak filtering method: (a) pixel values around block boundary in 1-D cut view, (b) weakly filtered pixels when the current block
and the block to its left have the same macroblock types, (c) weakly
filtered pixels when the macroblock types of the current block and
the block to its left are different.

Fig. 8 Blocking artifacts in the "Foreman" sequence: (a) an intra
frame when q is 24, (b) an inter frame when q is 25.

For horizontal weak filtering, the two pixels B and C are
slightly modified in consideration of the macroblock type
of the current block and the left block. JVT defines seven
macroblock types.1 Usually similar regions have the same
macroblock types. Thus, a slightly weaker deblocking filter
is applied to the boundary between blocks having different
macroblock types, whereas a slightly stronger weak deblocking filter is applied to those having the same macroblock type. As shown in Fig. 7(a), C and D are pixels inside
the current block, and A and B are pixels in the block to its
left. When the current block and the block to its left have
the same macroblock type, the boundary pixels B and C
are replaced with new values as follows:

$$
B_{\text{new}} = B - \frac{B - C}{q}, \qquad C_{\text{new}} = C + \frac{B - C}{q}, \tag{4}
$$

where q is the quantization parameter of the JVT codec.
The pixels filtered using Eq. (4) are shown in Fig. 7(b).
When the macroblock types of the current block and the
block to its left are different, the boundary pixels B and C
in Fig. 7(a) are replaced with the following new values as
shown in Fig. 7(c):

$$
B_{\text{new}} = B + \frac{4(C - B) + (A - D)}{8q}, \qquad
C_{\text{new}} = C - \frac{4(C - B) + (A - D)}{8q}. \tag{5}
$$
Vertical filtering is applied to vertical boundary pixels in the same way as horizontal filtering, using the vertical blocking conditions of the current block
and the block above in place of the horizontal blocking conditions.
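The two weak-filtering updates of Eqs. (4) and (5) can be evaluated directly, as in the following minimal Python sketch. Exact rational arithmetic is used so that the size of each correction is visible; how the results are rounded and clipped to integer pixel values is not stated in the text and is left out. The function name, argument names, and sample values are hypothetical.

```python
from fractions import Fraction


def weak_filter(a, b, c, d, q, same_mb_type):
    """Return (B_new, C_new) for boundary pixels B and C.

    a, b -- pixels A and B in the left block (B is next to the boundary)
    c, d -- pixels C and D in the current block (C is next to the boundary)
    q    -- quantization parameter
    same_mb_type -- True if both blocks have the same macroblock type
    """
    if same_mb_type:
        delta = Fraction(b - c, q)                        # Eq. (4)
        return b - delta, c + delta
    delta = Fraction(4 * (c - b) + (a - d), 8 * q)        # Eq. (5)
    return b + delta, c - delta


# Hypothetical pixels with a step of 24 between B and C, and q = 24.
print(weak_filter(98, 100, 124, 126, 24, same_mb_type=True))    # B: 101, C: 123
print(weak_filter(98, 100, 124, 126, 24, same_mb_type=False))   # B: 100 + 17/48, C: 124 - 17/48
```

For this example, Eq. (4) moves each boundary pixel by one gray level toward the other, whereas Eq. (5) moves it by 17/48 of a gray level, in line with the statement that the filter for different macroblock types is the weaker of the two.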
Figure 8 shows the blocking artifacts of the "Foreman"
sequence in intra and inter frames when their quantization
parameters q are 24 and 25, respectively. The "Foreman"
sequence was compressed by the JVT codec without loop
filtering. As shown at the diagonal lines of the wall in Fig.
8(a) and 8(b), the blocking artifacts in an intra frame are
more serious than those in an inter frame. In addition, the
intra frame is used as the first reference frame of motion
compensation for the following inter frames, so that the
intra frame needs to be filtered more elaborately. Therefore,
the proposed loop filtering algorithm is designed in consideration of the difference between the intra frame and inter
frame, as shown in Fig. 9. In Fig. 9, the neighbor block
means the block to the left of the current block when horizontal deblocking filtering is applied, and the block above
the current block when vertical deblocking filtering is applied.
In the intra frame, the ringing condition of the current
block and the blocking conditions of the current and neighbor blocks decide whether strong or weak deblocking filtering is applied to the boundary pixels. In an inter frame,
strong or weak deblocking filtering is selected by consideration of skipped blocks, motion vectors, block type, and
ringing and blocking conditions as shown in Fig. 9. Since
the reference block of the current inter block has already
been filtered, the inter block can be refiltered and smoothed
too much if its neighbor blocks have the same motion vectors or are skipped blocks. Therefore, the motion vectors and
coded block patterns (CBPs) are considered to prevent
oversmoothing of the current block.
4 Experimental Results
Fig. 9 Flow graph of the proposed loop-filtering algorithm, where Bc is the blocking condition of the
current block, Bp is the blocking condition of the neighbor block, and Rc is the ringing condition of the
current block. "Not Coded" means a skipped block, and MVc and MVp are the motion vectors of the
current block and its neighbor block, respectively. & and | mean logical AND and logical OR operation,
respectively.

In the experiments, the intra frame (I frame) and inter frames (P
frames) are filtered to reduce blocking artifacts and ringing
noise. We used the Exp-Golomb code1; variable-block ME
having 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4
block sizes; motion search range -16 to +16; quarter-pixel MC; 4×4 DCT; and the rate-distortion optimization
of the JVT codec.3 Several video sequences were used for
the experiment: the 10-Hz "Foreman" sequence of the
Quarter Common Intermediate Format (QCIF), the 10-Hz
"News" sequence of the QCIF, the 15-Hz "Silent Voice"
sequence of the QCIF, and the 15-Hz "Paris" sequence of
the Common Intermediate Format (CIF). Each video sequence was compressed with the proposed loop filtering
and the JVT loop filtering.
Most experiments show very similar results. The rate-distortion plots for each sequence are shown in Fig. 10, and
Table 1 shows the differences11 of the bit rates and the
PSNR between the proposed loop filtering and the JVT
loop filtering. In Table 1, negative values of the bit rate
mean that the proposed loop filtering has a lower bit rate
than the JVT, and positive values of the luminance
(PSNR-Y) and chrominance PSNRs (PSNR-U and
PSNR-V) mean that the proposed filtering has a higher
PSNR than the JVT. Although the PSNR difference between the proposed and JVT loop filtering methods is very
small as shown in Fig. 10, the computation time of the
proposed method is approximately 20% shorter than that of
the JVT loop filtering on a 1.5-GHz Pentium-IV for various
video sequences, as shown in Fig. 11. Figure 11 also presents the computation time of the proposed loop filtering without
early termination, to demonstrate its effect. When early termination
during detection of the ringing and blocking conditions is
not used, the computation time of the proposed method is
similar to that of the JVT loop filtering. Figure 12 compares
the subjective quality of the proposed loop filtering with
that of the JVT. Comparing Fig. 12(a) with Fig. 12(b)
around the eyes in the "Foreman" sequence, and Fig. 12(c)
with Fig. 12(d) at the lady's nose in the "News" sequence,
it is seen that the proposed loop filtering makes the image
details less blurred.
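For reference, the PSNR values compared in this section follow the usual definition, 10 log10(peak² / MSE). The minimal Python sketch below assumes 8-bit samples (peak value 255) and takes flat lists of sample values; the function name and the sample data are hypothetical, and this is not the measurement code used in the experiments.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally long sequences of sample values.

    peak=255 assumes 8-bit samples; returns infinity for identical inputs.
    """
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


original = [100, 100, 100, 100]
filtered = [101, 99, 101, 99]            # every sample off by one gray level
print(round(psnr(original, filtered), 2))
```

The same function applies separately to the luminance and chrominance planes to obtain PSNR-Y, PSNR-U, and PSNR-V.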
Fig. 10 Rate-distortion (PSNR versus bit rate) plots of the proposed
loop filtering and JVT loop filtering for (a) "Foreman" sequence with
QCIF, (b) "News" sequence with QCIF, (c) "Silent Voice" sequence
with QCIF, and (d) "Paris" sequence with CIF. Throughout, fps
stands for frames/s, and Kbps for 1024 bits/s.

Table 1 Differences in bit rates and PSNRs between the proposed
and the JVT loop filtering, where QP is the quantization parameter.

Sequence       QP   Bit rate   PSNR-Y   PSNR-U   PSNR-V
Foreman QCIF   12    -0.16      0.01    -0.05    -0.05
               16    -0.09      0.02    -0.03     0.00
               20    -0.6       0.01    -0.02    -0.07
               24     0.05      0.06     0.03    -0.05
Silent QCIF    12     0.03     -0.01     0.04    -0.01
               16     0.10     -0.03     0.02    -0.12
               20     0.19      0.07    -0.08    -0.15
               24     0.25      0.08    -0.16    -0.08
News QCIF      12     0.23     -0.02     0.02    -0.06
               16     0.12     -0.05    -0.17    -0.20
               20     0.11     -0.04    -0.12    -0.13
               24     0.12      0.03     0.03    -0.10
Paris CIF      12     0.47     -0.02    -0.01    -0.05
               16     0.58      0.01    -0.06     0.03
               20     0.26     -0.02    -0.08     0.00
               24     0.15     -0.02     0.01    -0.03

Fig. 11 Computation time of the proposed loop filtering and JVT
loop filtering for four test sequences of Table 1. The vertical axis is
normalized to the computation time of JVT loop filtering.

Fig. 12 Parts of the decoded frames of the "Foreman" (a, b) and
"News" (c, d) sequences containing detail: (a, c) the proposed loop-filtered frame, (b, d) the JVT loop-filtered frame.

5 Conclusions
This paper has proposed loop filtering using separable DCT
characteristics. Comparing the proposed method with the
JVT loop filtering, we find that the proposed one requires
less computation time due to the early termination during
detection of the ringing and blocking conditions, while the
PSNR is preserved. The subjective quality of the filtered image is also improved, especially in complex regions. In
general, loop filtering increases the computational complexity in the encoder and decoder. The proposed loop filtering method can therefore be useful in practical applications
in which this computational burden must be reduced.
Acknowledgment
This work was partly supported by a Korea Research Foundation grant (KRF-2002-003-D00339).
References
1. T. Wiegand, "Joint final committee draft (JFCD) of joint video specification (ITU-T Rec. H.264|ISO/IEC 14496-10 AVC)," JVT-D157 (Aug. 2002).
2. C. Blanch and K. Denolf, "Memory complexity analysis of the AVC Codec JM1.7," ISO/IEC JTC1/SC29/WG11 MPEG02/M8378, Fairfax (May 2002).
3. CCITT Recommendation H.261, "Video codec for audiovisual services at p×64 kbits/s" (Dec. 1990).
4. K. K. Pang and T. K. Tan, "Optimum loop filter in hybrid coders," IEEE Trans. Circuits Syst. Video Technol. 4, 158–167 (1994).
5. ITU Telecom. Standardization Sector, "Video codec test model near-term," Version 10 (TMN10) Draft 1, H.263 Ad Hoc Group (Apr. 1998).
6. ITU Telecom. Standardization Sector, "Video coding for low bitrate communication," Draft ITU-T Recommendation H.263 Version 2 (Jan. 1998).
7. Y. L. Lee and H. W. Park, "Loop filtering and post-filtering for low-bit-rates moving picture coding," Signal Process. Image Commun. 16, 871–890 (2001).
8. K. R. Rao and P. Yip, "Fast algorithm for DCT-II," Chap. 4 in Discrete Cosine Transform, Academic Press, New York (1990).
9. A. Hallapuro and M. Karczewicz, ‘‘Low complexity transform and
quantization—part I: basic implementation,’’ ISO/IEC MPEG &
ITU-T Q.6/SG16 Q.6, JVT-B038 (Jan. 2002).
10. A. K. Jain, "Image transform," Chap. 5 in Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ (1989).
11. G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," ITU-T Q.6/SG16 VCEG-M33 (Mar. 2001).
Yung-Lyul Lee received the BS and MS degrees in electronic engineering from Sogang University, Seoul, Korea, in 1985 and 1987,
respectively, and the PhD degree in electrical and electronic engineering from Korea Advanced Institute of Science and Technology
(KAIST), Taejon, Korea, in 1999. He has been an assistant professor in the Department of Internet Engineering, Sejong University,
Seoul, Korea, since 2001. He was a principal researcher at the
Samsung Electronics Co. Ltd. from 1987 to 2001. His current research interests include video compression, image processing, watermarking, and multimedia systems.
Il-Hong Shin was born in Pohang, Korea, in 1978. He received the
BS and MS degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea,
in 2000 and 2002, respectively. He is currently working toward the
PhD at the same university. His research interests include image
and video coding, video communication, rate control, and multimedia systems.
Hyun Wook Park received the BS degree in electrical engineering
from Seoul National University, Seoul, Korea, in 1981, and the MS
and PhD degrees in electrical engineering from Korea Advanced
Institute of Science and Technology (KAIST), Seoul, Korea, in 1983
and 1988, respectively. He has been a professor in the Electrical
Engineering Department, KAIST, Daejeon, Korea, since 1993. He
was a research associate at the University of Washington from 1989
to 1992 and was a senior executive researcher at the Samsung
Electronics Co. Ltd. from 1992 to 1993. His current research interests include image computing systems, image compression, medical imaging, and multimedia systems. He is a senior member of the
IEEE and a member of SPIE.