
Optimization of the Deblocking Filter in the H.264
Codec for Real-Time Implementation
Hitesh Yadav, Student Member, IEEE, and K. R. Rao, Member, IEEE

Abstract— Blocking artifacts are visible in the decoded frames of most video coding standards at low bit rate coding. The latest video coding standard, H.264/AVC, uses an in-loop deblocking filter to remove these artifacts, but the main drawback of that filter is its high implementation complexity. In this paper, we propose a lower-complexity in-loop deblocking filter. In the proposed method, the maximum and minimum values among the six pixels across an edge are computed to decide whether the pixels of the block should be filtered. For intra frames, the block is further classified as a smooth or mildly textured region, and the appropriate filter is applied according to this classification. The main advantage of the proposed method is its low complexity compared to JM 9.8 (the H.264 reference software). A performance comparison of the proposed method and the current method is presented.
Index Terms— Deblocking filter, Post filter, Loop filter,
H.264 standard.
EDICS Category: IMD-CODE
I. INTRODUCTION
Signal source compression methods and coding bit rates
normally influence the perceptual quality of compressed
images and video [1]. In general, the lower the bit rate, the more
severe the coding artifacts in the reconstructed video.
Lower bit rates are desirable in many applications like video
streaming because of the channel bandwidth constraints. The
block discrete cosine transform (BDCT) based coding scheme
introduces blocking artifacts in flat regions and ringing
artifacts along object edges at low bit rates [1]. Deblocking
filters are used to remove the blocking artifacts in the decoded
video. Although the deblocking filters improve the objective
and subjective qualities of output video frames, they are
usually computationally intensive.
Manuscript received July xx, 2006; revised Xxxxx xx, 20xx. Hitesh Yadav and Dr. K. R. Rao are with the Electrical Engineering Department, University of Texas at Arlington, TX 76010 USA (e-mail: [email protected], [email protected]).

A number of deblocking algorithms have been proposed for reducing blocking artifacts in BDCT-based compressed images with minimal smoothing of true edges. They can be classified into three key categories: projection onto convex sets (POCS), weighted sums of pixels across block boundaries,
and adaptive filters. The POCS-based algorithm [2] iteratively projects the entire picture back and forth between two constraint sets. Its computation and implementation complexity is high compared to the other two approaches, although it generally gives the best visual quality for most video. The computational complexity of the weighted-sum-based algorithms [3] is higher than that of the adaptive algorithms. Because the computational complexity of adaptive algorithms [4] is low, they are the preferred algorithms for real time implementation.
H.264/AVC uses an adaptive in-loop deblocking filter to
remove the blocking artifacts visible in decoded frames at low
bit rate coding.
The main drawback of this deblocking filter is its
implementation complexity. Analysis of run time profiles of
decoder sub-functions indicates that the deblocking filter
process in H.264 is the most computationally intensive part
[5]. Though recently efficient techniques to reduce the
implementation complexity have been proposed [6]-[8], the
complexity still cannot be reduced significantly because of the
flow of the algorithm itself. The program code [9] includes
extensive conditional branching, which makes the code unsuitable
for deeply pipelined processors and ASIC implementations. In
addition, this program code exposes little parallelism. Hence
this code is unsuitable for VLIW processors, which are
otherwise well suited to video encoding/decoding applications.
As discussed above, the H.264/AVC deblocking filter has high implementation complexity. In this paper, we present a simpler deblocking filter algorithm that reduces the implementation complexity while maintaining the perceptual quality of the existing deblocking algorithm. In Section II, we present the algorithm for both intra and inter frames. In Section III, the results obtained from the proposed algorithm are compared with those obtained from the JM reference software [9].
II. PROPOSED ALGORITHM
The blocking artifacts are visible in both intra and inter
frames. The basic operation of the deblocking filter is as
follows: the deblocking filter is applied to all edges of the 4x4
pixel blocks in each macroblock, except the edges on the
boundary of a frame or a slice. For each block, vertical edges
are filtered from left to right first, and then horizontal edges
are filtered from top to bottom. This process is repeated for all macroblocks in a frame. The one-dimensional view of a 4x4 block edge is shown in Fig. 1. Here q0, q1, q2, q3 represent the pixel values of the current 4x4 block, and p0, p1, p2, p3 represent the pixel values of the 4x4 block adjacent to the current block.
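As a rough illustration of this traversal order (this is not code from the JM software; filter_vertical_edge and filter_horizontal_edge are hypothetical placeholders for the per-edge filtering described in the following subsections), a C sketch for one 16x16 luma macroblock could look as follows.

    /* Placeholders for the per-edge filtering described in Section II. */
    void filter_vertical_edge(unsigned char *frame, int stride, int x, int y);
    void filter_horizontal_edge(unsigned char *frame, int stride, int x, int y);

    /* Edge traversal order for one 16x16 luma macroblock at pixel position
     * (mb_x, mb_y): vertical edges left to right, then horizontal edges top
     * to bottom; edges on a frame or slice boundary are skipped. */
    void deblock_macroblock(unsigned char *frame, int stride,
                            int mb_x, int mb_y,
                            int left_is_boundary, int top_is_boundary)
    {
        int e;

        for (e = 0; e < 4; e++)              /* vertical edges at x = 0, 4, 8, 12 */
            if (e != 0 || !left_is_boundary)
                filter_vertical_edge(frame, stride, mb_x + 4 * e, mb_y);

        for (e = 0; e < 4; e++)              /* horizontal edges at y = 0, 4, 8, 12 */
            if (e != 0 || !top_is_boundary)
                filter_horizontal_edge(frame, stride, mb_x, mb_y + 4 * e);
    }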
A. INTRA FRAMES
Intra frames are more susceptible to blocking artifacts
compared to inter frames [10]. The smooth blocks in an intra
frame have more severe blocking artifacts compared to other
blocks [10]. The proposed method as applied to intra frames is
shown in Fig. 2.
The first three blocks in Fig. 2 check the conditions at the
slice boundaries, which are set by the user. These
three blocks are the same as used by the existing deblocking
filter in H.264. The next step in Fig. 2 is to compute the
maximum and minimum values among the six pixels across an
edge (p2, p1, p0, q0, q1, q2) and then calculate the difference
between the maximum and the minimum value. If this
difference is greater than the QP of the current block, the block most likely represents a true edge and therefore should not be filtered. On the other hand, if the difference is less than the QP of the current block, filtering should be applied to that block to remove the blocking artifacts; in this case the block most likely represents a smooth or mildly textured area. The
next step in Fig. 2 is to find the difference between adjacent
pixels of a 4x4 block edge. For example, the absolute
difference between p3 and p2 is calculated and if that
difference is less than a fixed threshold (in this case the
threshold is set to two) then one is assigned to a variable diff.
The above process is repeated for all the adjacent pixels across
an edge (Fig. 1) in a 4x4 block. The variable strength denotes
the sum of the variable diff across all pixels of a 4x4 block
edge (Fig. 1). If the variable strength is greater than a fixed threshold (in this case the threshold is set to four), the block is most likely a smooth region and strong filtering is applied to it. Several thresholds were tried before settling on the above values. On the other hand, if the variable strength is less than this threshold, the block most likely represents a mildly textured or high-activity region and weak filtering is applied to it. Here, strong filtering means applying a
low pass filter to the three adjacent pixels (p2, p1, p0, q0, q1, q2)
on either side of the boundary of a block and weak filtering
means applying a low pass filter to the pixels (p0, q0) on either
side of the boundary of a block.
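A minimal C sketch of this intra-frame decision, assuming p[0..3] and q[0..3] hold the pixels of Fig. 1 and using hypothetical strong_filter and weak_filter routines whose exact low pass taps are not specified in this letter, could be written as follows.

    #include <stdlib.h>   /* abs() */

    /* Hypothetical low pass filters; the exact taps are not given here.
     * strong_filter() would modify p2..q2, weak_filter() only p0 and q0. */
    void strong_filter(unsigned char p[4], unsigned char q[4]);
    void weak_filter(unsigned char p[4], unsigned char q[4]);

    /* Intra-frame decision for one 1-D edge (Fig. 1): p[0..3] = p0..p3 of the
     * neighboring block, q[0..3] = q0..q3 of the current block. */
    void filter_intra_edge(unsigned char p[4], unsigned char q[4], int qp)
    {
        unsigned char pix[6]  = { p[2], p[1], p[0], q[0], q[1], q[2] };
        unsigned char line[8] = { p[3], p[2], p[1], p[0], q[0], q[1], q[2], q[3] };
        int max = pix[0], min = pix[0];
        int i, strength = 0;

        /* Max./min. of the six pixels across the edge. */
        for (i = 1; i < 6; i++) {
            if (pix[i] > max) max = pix[i];
            if (pix[i] < min) min = pix[i];
        }

        if (max - min >= qp)        /* large spread: most likely a true edge */
            return;

        /* Count the seven adjacent pixel pairs (p3..q3) whose absolute
         * difference is below the fixed threshold of two. */
        for (i = 0; i < 7; i++)
            if (abs(line[i] - line[i + 1]) < 2)
                strength++;

        if (strength > 4)
            strong_filter(p, q);    /* smooth region: filter p2..q2 */
        else
            weak_filter(p, q);      /* mildly textured region: filter p0, q0 only */
    }

The text does not specify how the case of a spread exactly equal to QP is handled; the sketch treats it as an edge and leaves it unfiltered.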
B. INTER FRAMES
The exploitation of interframe redundancies relies on the
transfer of previously coded information from motion
compensated reference frames to the current predictive coded
picture. Having filtered reference frames available for motion compensation reduces the blocking effect, but some artifacts remain because the prediction from the reference frames may not align exactly with the block boundaries. The proposed method is applied to inter frames as shown in Fig. 2. After the slice-boundary checks, the maximum and minimum values among the six pixels across an edge (p2, p1, p0, q0, q1, q2) are computed, and the difference between them is calculated. If this difference is less than QP, the current block most likely represents a smooth or mildly textured area, and a low pass filter (weak filtering) is applied to the adjacent pixels on either side of the block or MB boundary. Otherwise, if the difference is greater than QP, the current block most likely represents a true edge and is therefore not filtered.
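Under the same assumptions as the intra sketch above, the inter-frame decision reduces to the max./min. spread test followed by weak filtering of p0 and q0 only.

    void weak_filter(unsigned char p[4], unsigned char q[4]);   /* as in the intra sketch */

    /* Inter-frame decision for one 1-D edge (Fig. 1). */
    void filter_inter_edge(unsigned char p[4], unsigned char q[4], int qp)
    {
        unsigned char pix[6] = { p[2], p[1], p[0], q[0], q[1], q[2] };
        int max = pix[0], min = pix[0];
        int i;

        for (i = 1; i < 6; i++) {
            if (pix[i] > max) max = pix[i];
            if (pix[i] < min) min = pix[i];
        }

        /* Spread below QP: smooth or mildly textured area, weak filtering of
         * p0 and q0; otherwise the edge is most likely real and left untouched. */
        if (max - min < qp)
            weak_filter(p, q);
    }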
Fig. 1 4x4 block edge (vertical or horizontal)
III. RESULTS FOR INTRA AND INTER FRAMES
The simulations are done using the JM 9.8/FRExt software [9] with the High profile and the 4:2:0 Y, Cb, Cr sampling format. The PSNR values
for different test sequences [10] using the proposed method,
JM 9.8 (H.264 software) [9] and reconstruction without loop
filter are given in Table 1. As Table 1 shows, reconstructing intra frames with the proposed method gives better PSNR values than reconstruction without a loop filter, and PSNR values similar to reconstruction with the JM 9.8 loop filter. Figures 3 and 4 show visually the removal of blocking artifacts by the proposed method relative to reconstruction without a loop filter, and Fig. 5 shows the corresponding result with the JM 9.8 loop filter. Table 2 shows that the proposed loop filter gives more bit savings than encoding without a loop filter. Although the loop filter improves the visual quality of the decoded frames, for a few sequences its PSNR value is lower. Motion estimation is based on the SAD (sum of absolute differences), and the filtered reference frames are used in motion compensation, so a reference frame can be visually superior and yet give a larger SAD. For this reason, a few sequences require more bits when a loop filter is included in the codec. The PSNR values for different test sequences [11] using the proposed method, JM 9.8 (H.264 software) [9], and reconstruction without a loop filter are given in Table 3. The PSNR values are better for most sequences using the proposed method as compared to the JM method (H.264 software).
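To make the SAD/PSNR argument concrete, the following sketch uses the standard definitions of these metrics (it is not code from the JM software): the motion search minimizes SAD against the filtered reference frame, while the tables report the PSNR of the reconstructed frames.

    #include <math.h>
    #include <stdlib.h>

    /* SAD between a 16x16 block of the current frame and a candidate block of
     * the (possibly deblocked) reference frame; the motion search minimizes this. */
    int sad_16x16(const unsigned char *cur, const unsigned char *ref, int stride)
    {
        int x, y, sad = 0;
        for (y = 0; y < 16; y++)
            for (x = 0; x < 16; x++)
                sad += abs(cur[y * stride + x] - ref[y * stride + x]);
        return sad;
    }

    /* PSNR in dB of a reconstructed 8-bit frame against the original:
     * PSNR = 10 * log10(255^2 / MSE); assumes the frames are not identical. */
    double psnr(const unsigned char *orig, const unsigned char *rec, int n_pixels)
    {
        double mse = 0.0;
        int i;
        for (i = 0; i < n_pixels; i++) {
            double d = (double)orig[i] - (double)rec[i];
            mse += d * d;
        }
        mse /= (double)n_pixels;
        return 10.0 * log10(255.0 * 255.0 / mse);
    }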
Fig. 2 Decision flow of filtering of pixels at the block edges (slice-boundary checks, the max./min. difference test against QP, the intra/inter branch, and the strong/weak filtering decision based on the strength count)
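The flow of Fig. 2 can also be summarized as a small dispatcher that applies the slice-level checks and then calls the intra or inter edge decision sketched in Section II; as before, all routines and flags are hypothetical placeholders rather than the JM implementation.

    void filter_intra_edge(unsigned char p[4], unsigned char q[4], int qp);  /* Section II.A sketch */
    void filter_inter_edge(unsigned char p[4], unsigned char q[4], int qp);  /* Section II.B sketch */

    /* Illustrative dispatcher for the decision flow of Fig. 2. */
    void filter_edge(unsigned char p[4], unsigned char q[4], int qp,
                     int mb_is_intra,
                     int deblocking_disabled,           /* user: all edges of the slice */
                     int edge_on_slice_boundary,
                     int boundary_filtering_disabled)   /* user: slice-boundary edges   */
    {
        /* Slice-level checks set by the user (first three blocks of Fig. 2). */
        if (deblocking_disabled)
            return;
        if (edge_on_slice_boundary && boundary_filtering_disabled)
            return;

        if (mb_is_intra)
            filter_intra_edge(p, q, qp);   /* smooth vs. mildly textured decision */
        else
            filter_inter_edge(p, q, qp);   /* weak filtering or no filtering */
    }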
IV. CONCLUSIONS
The proposed method is able to reduce the blocking artifacts
in the reconstructed video. It gives visual quality nearly identical to that obtained with the JM 9.8 (H.264 software) loop filter, and it gives better PSNR values for most sequences, especially for inter frames. At the same time, the proposed method has lower implementation complexity than the JM 9.8 loop filter because its algorithm flow is simpler.
The proposed deblocking filter can be implemented in a real
time system. By doing so, its exact reduction in
implementation complexity compared to JM 9.8 can be
determined. The deringing filter [10] can also be incorporated
in the in-loop filter to see the visual improvement of the
reconstructed video. The proposed method uses image
enhancement techniques to reduce the artifacts in the
reconstructed video. Image recovery techniques can also be
explored to reduce the artifacts in H.264 decoded video. Transforms that avoid blocking artifacts while retaining the benefits of the integer DCT can also be explored.
TABLE 1.
COMPARISON OF PSNR VALUES (dB) FOR DIFFERENT TEST SEQUENCES.

Test clip (QCIF) | QP | Reconstruction with Proposed Method | Reconstruction without Loop Filter, JM 9.8 | Reconstruction with Loop Filter, JM 9.8
Foreman          | 37 | 31.615 | 31.363 | 31.637
Car phone        | 45 | 26.671 | 26.291 | 26.612
Silent           | 39 | 29.489 | 29.269 | 29.330
Container        | 39 | 29.539 | 29.383 | 29.493
Bridge           | 37 | 29.886 | 29.738 | 29.828
News             | 35 | 32.204 | 32.040 | 32.043
Container        | 45 | 25.718 | 25.553 | 25.710
Fig. 3 A reconstructed I-frame from the H.264 decoder, QP = 45, without using a loop filter

Fig. 4 A reconstructed I-frame from the H.264 decoder, QP = 45, with the proposed method

Fig. 5 A reconstructed I-frame from the H.264 decoder, QP = 45, with the JM (H.264 software) method

TABLE 2.
COMPARISON OF TOTAL NUMBER OF BITS USED FOR ENCODING A P OR B FRAME IN H.264 COMPRESSED SEQUENCES.

Test clip (QCIF) - Type of frame | QP | Reconstruction with Proposed Method | Reconstruction without Loop Filter, JM 9.8 | Reconstruction with Loop Filter, JM 9.8
Foreman-P     | 39 | 2085 | 2131 | 2107
News-P        | 39 | 4119 | 4074 | 4235
Car phone-P   | 39 |  817 |  897 |  720
Foreman-B     | 39 |  489 |  515 |  499
News-B        | 39 | 1424 | 1802 | 1687
Car phone-B   | 39 |  197 |  203 |  193

TABLE 3.
COMPARISON OF PSNR VALUES (dB) FOR DIFFERENT TEST SEQUENCES FOR P AND B FRAMES.

Test clip (QCIF) - Type of frame | QP | Reconstruction with Proposed Method | Reconstruction without Loop Filter, JM 9.8 | Reconstruction with Loop Filter, JM 9.8
Foreman-P     | 39 | 29.883 | 29.834 | 29.692
News-P        | 39 | 27.846 | 27.528 | 27.771
Car phone-P   | 39 | 30.271 | 30.024 | 30.171
Foreman-B     | 39 | 28.925 | 28.879 | 29.054
News-B        | 39 | 28.085 | 28.307 | 27.982
Car phone-B   | 39 | 29.637 | 29.438 | 29.491

REFERENCES
[1] M.-Y. Shen and C.-C. Jay Kuo, "Review of postprocessing techniques for compression artifact removal," Journal of Visual Communication and Image Representation, vol. 9, pp. 2-14, Mar. 1998.
[2] A. Zakhor, “Iterative procedures for reduction of blocking effects in
transform image coding,” IEEE Trans. CSVT, vol.2, pp. 91-95, Mar. 1992.
[3] A. Z. Averbuch, A. Schlar, and D. L. Donoho, "Deblocking of block-transform compressed images using weighted sums of symmetrically aligned pixels," IEEE Trans. Image Processing, vol. 14, pp. 200-212, Feb. 2005.
[4] P. List et al., "Adaptive deblocking filter," IEEE Trans. CSVT, vol. 13, pp. 614-619, July 2003.
[5] V. Lappalainen, A. Hallapuro, and T. D. Hamalainen, “Complexity of
optimized H.26L video decoder implementation”, IEEE Trans. CSVT, vol. 13,
pp. 717-725, July 2003.
[6] M. N. Bojnordi, O. Fatemi, and M. R. Hashemi, “An efficient deblocking
filter with self- transposing memory architecture for H.264/AVC”, ICASSP,
vol. II, pp. 925-928, May 2006.
[7] G. Khurana et al., "A pipeline hardware implementation of in-loop deblocking filter in H.264/AVC," IEEE Trans. on Consumer Electronics, vol. 52, no. 2, May 2006.
[8] S. C. Chang et al., "A platform based bus-interleaved architecture for deblocking filter in H.264/MPEG-4 AVC," IEEE Int'l Conf. on Consumer Electronics, 2005.
[9] H.264 software (JM 9.8/FRExt), http://iphome.hhi.de/suehring/tml/download/jm98.zip.
[10] H. R. Wu and K. R. Rao, “Digital video image quality and perceptual
coding”, Taylor and Francis, 2006.
[11] Test sequences obtained from: http://trace.eas.asu.edu/yuv/qcif.html