J. Vis. Commun. Image R. 14 (2003) 22–40
www.elsevier.com/locate/yjvci
Efficient block-based video encoder embedding
a Wiener filter for noisy video sequences
Sung Deuk Kim and Jong Beom Ra*
Department of EECS, Korea Advanced Institute of Science and Technology, 373-1 Kusongdong, Yusonggu,
Taejon, Republic of Korea
Received 24 September 2001; accepted 15 November 2002
Abstract
Since pre-filtering removes camera noise and improves coding efficiency dramatically, its efficient implementation has been an important issue in video sequence coding. Based on approximated generalized Wiener filtering and a two-dimensional discrete cosine transform (DCT) factorization, this paper introduces a novel pre-filtering scheme that is performed inside a video encoder. The proposed pre-filtering is performed by scaling the DCT coefficients of original image blocks for intra block coding and those of motion-compensated error blocks for inter block coding, respectively. Even though the pre-filtering operation is embedded in a video encoder, its additional computational complexity is marginal for given signal-to-noise ratio (SNR) estimates, and the overall architecture of the conventional video encoder is maintained. Notwithstanding its simplicity, the proposed pre-filtering scheme gives good filtering and coding performance for noisy video sequences.
© 2003 Elsevier Science (USA). All rights reserved.
Keywords: Pre-filtering; Noise removal; DCT-domain filtering
1. Introduction
Nowadays, block-based video encoders such as MPEG-1, MPEG-2, and H.263
are widely used for storing and transmitting video sequences. These video encoders
achieve good compression performance by reducing redundant information residing
*
Corresponding author. Fax: +82-42-869-8360.
E-mail address: [email protected] (Jong Beom Ra).
1047-3203/03/$ - see front matter © 2003 Elsevier Science (USA). All rights reserved.
doi:10.1016/S1047-3203(02)00012-3
in a video sequence. It is well known that the DCT, motion-compensated prediction, and variable length coding (VLC) are very useful tools for improving coding efficiency in encoders, as they reduce spatial, temporal, and statistical redundancy, respectively. Since video sequences are highly correlated spatially and temporally, their coding efficiency can be improved dramatically by using these technologies. In a
practical video encoder, however, a video sequence obtained from a camera usually
conveys noise, and this noise term degrades not only image quality but also coding
efficiency due to its uncorrelated nature. Therefore, it is common to apply a pre-filtering procedure in encoding a noisy image or video sequence in order to improve
encoded image quality and coding efficiency (Al-Shaykh and Mersereau, 1998; Vasconcelos and Dufaux, 1997).
Noise removal schemes for video sequences have been widely studied. Among them, a spatial-domain adaptive Wiener filtering scheme (Lim, 1990) and a motion-compensated spatio-temporal filtering scheme (Boo and Bose, 1998) have given good de-noising performance for video sequences. However, these filtering schemes have been studied from the perspective of noise filtering itself rather than that of optimizing a video encoder with pre-filtering, because the noise removal operation has been considered a process independent of video encoding. As a consequence, the incorporation of pre-filtering into video encoding has been implemented through straightforward cascading. In this cascaded structure, however, computational complexity is increased significantly due to the additional pre-filtering stage.
This paper aims at realizing an efficient video coder, including a pre-filtering step, for noisy video sequences. Since a video coder is based on the DCT, there have been some activities to optimize video processing operations in the DCT domain (Jónsson, 1997; Merhav and Bhaskaran, 1997). Similarly, we try to embed a pre-filtering scheme inside a block-based video coder. In order to achieve an efficient encoder structure with pre-filtering that works fully in the DCT domain for fast processing, the concept of the approximated generalized Wiener filter is explored (Jain, 1989; Pratt, 1972), and the pre-filtering operation is simply implemented by scaling the DCT coefficients. Since a video encoder adaptively encodes a block as an intra block or an inter block, two different filters are designed and used for intra blocks and inter blocks, respectively. Therefore, the overall architecture of a conventional video encoder is maintained irrespective of the insertion of the pre-filtering operation.
The approximated generalized Wiener filter is based on zero-mean image block data. Therefore, to filter non-zero-mean image blocks, their mean values are first estimated and subtracted from the data, then filtering is applied, and finally the mean values are added back to the filtered data. For efficient filtering in the DCT domain, we adopt a DCT-domain mean estimation scheme based on scaling, and combine the whole filtering operation, including mean estimation, subtraction, zero-mean block filtering, and mean addition, into a unified scaling operation. Since the filtering is performed by scaling DCT coefficients, we also propose a method for jointly optimizing the filtering operation with the scaled DCT (Feig and Winograd, 1992).
The parameters of the approximated generalized Wiener filter are determined on the basis of data covariance. Hence, proper covariance estimates are needed for both mean-subtracted intra and inter blocks. To find the covariance model of the motion-compensated inter frame, there have been several approaches, including a somewhat empirical one (Niehsen and Brunig, 1999) and a more theoretical one (Chen and Pang, 1993). In this paper, to obtain covariance estimates for both mean-subtracted intra and inter blocks in MPEG-4 test sequences, we adopt an approach similar to that of Niehsen and Brunig (1999).
Although pre-filtering is merged into video encoding and performed just by scaling DCT coefficients, the proposed scheme gives good filtering and coding performance, especially for inter frames, when compared with the cascaded combination of a spatial-domain adaptive Wiener filter and a video encoder.
This paper is organized as follows. Section 2 briefly reviews approximated generalized Wiener filtering and proposes a filtering architecture based on scaling for non-zero-mean image blocks. Section 3 deals with the proposed video encoder scheme for noisy sequences. Intensive simulation results for the proposed scheme are given in Section 4. Finally, Section 5 provides concluding remarks.
2. Approximated generalized Wiener filtering
2.1. Brief review
Generalized Wiener filtering is an efficient method for approximately implementing the Wiener filter by using a fast unitary transform such as the DCT (Jain, 1989). Fig. 1 shows the block diagram of an approximated generalized Wiener filter for non-zero-mean image data. Here, the input $v$ and the output $\hat{w}$ are row-ordered column vectors denoting an observed noisy image block and the filtered image block, respectively. Note that the dimension of both $v$ and $\hat{w}$ is $64 \times 1$ for an $8 \times 8$ block. Since $v$ is usually non-zero-mean image data, its mean value $\hat{m}$ is estimated and subtracted from the data $v$. Then, after filtering the mean-subtracted data $z$, the mean value $\hat{m}$ is added to the filtered result $\hat{y}$. In the case of no blur, generalized Wiener filtering for the zero-mean image observation model is described as follows:

$$\hat{y} = A^T [A L A^T] A z \approx A^T \tilde{L} Z, \qquad (1)$$

where $\tilde{L} = A L A^T$, $L = [I + \sigma_n^2 R^{-1}]^{-1}$, $R = E[y y^T]$, $Z = A z$, and $R$ and $\sigma_n^2$ denote the covariance matrix of $y$ and the noise variance, respectively. Here, $A$ is a unitary transform. Since the DCT is adopted in our case, $A = (C_8 \otimes C_8)$, where $C_8$ and $\otimes$ denote an $8 \times 8$ DCT matrix and the Kronecker product operator, respectively. Now that $\tilde{L}$
Fig. 1. Approximated generalized Wiener filtering for non-zero-mean images.
S. Deuk Kim, J. Beom Ra / J. Vis. Commun. Image R. 14 (2003) 22–40
25
is nearly diagonal for many unitary transforms, Eq. (1) can be approximated by the following equation:

$$\hat{y} = A^T \hat{Y}, \qquad (2)$$

where $\hat{Y} = \tilde{L} Z \approx [\mathrm{Diag}\,\tilde{L}]\, Z$. Therefore, by mapping Eq. (2) onto an $8 \times 8$ block,

$$\hat{Y}(k,l) \approx \tilde{P}(k,l)\, Z(k,l), \qquad (3)$$

where

$$\tilde{P}(k,l) \approx \frac{1}{1 + (\sigma_n^2/\sigma^2)(1/W(k,l))}, \qquad (4)$$

$W(k,l)$ are the normalized elements on the diagonal of $A R A^T$, and $\sigma^2$ denotes the variance of the desired data $y$. $\sigma^2$ is usually estimated by subtracting the noise variance from the variance of $z$. In Eq. (3), it should be noted that for zero-mean image data, the approximated generalized Wiener filtering actually corresponds to scaling their 2-D DCT coefficients with $\tilde{P}(k,l)$. Once $\hat{y}(m,n)$ is determined, the final filtered image, $\hat{w}(m,n)$, is obtained by adding $\hat{m}(m,n)$ to $\hat{y}(m,n)$.
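The scaling interpretation of Eqs. (3) and (4) can be sketched in a few lines. The following is a minimal illustration written for this article rather than the authors' implementation: `scipy.fft.dctn` supplies the 2-D DCT, and $W(k,l)$ is derived from a hypothetical separable AR(1) covariance model (parameter `rho`) that merely stands in for the covariance estimates discussed later in the paper.

```python
import numpy as np
from scipy.fft import dct, dctn, idctn

def dct_energy_weights(rho=0.9, n=8):
    """Normalized diagonal of C R C^T for a separable AR(1) covariance
    R[i, j] = rho**|i - j| (an illustrative stand-in for W(k, l))."""
    i = np.arange(n)
    R = rho ** np.abs(i[:, None] - i[None, :])
    C = dct(np.eye(n), axis=0, norm='ortho')   # orthonormal DCT-II matrix
    w = np.diag(C @ R @ C.T)
    w = w / w.mean()                           # normalize to unit mean
    return np.outer(w, w)                      # separable 2-D extension

def wiener_scale_zero_mean(z, noise_var, W):
    """Eqs. (3)-(4): filter a zero-mean 8x8 block by scaling its DCT
    coefficients with P~(k, l) = 1 / (1 + (sigma_n^2/sigma^2) / W)."""
    Z = dctn(z, norm='ortho')
    sigma2 = max(np.mean(Z ** 2) - noise_var, 0.0)  # signal variance estimate
    if sigma2 == 0.0:
        return np.zeros_like(z)                     # nothing but noise left
    P = 1.0 / (1.0 + (noise_var / sigma2) / W)
    return idctn(P * Z, norm='ortho')
```

Since every entry of $\tilde{P}$ lies in $(0,1]$, the filter can only shrink coefficient energy, and with `noise_var = 0` it reduces to the identity.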
2.2. Proposed filtering architecture for non-zero-mean image blocks
The block diagram of the approximated generalized Wiener filtering for non-zero-mean images in Fig. 1 can be redrawn by performing the mean addition and subtraction procedures in the DCT domain instead of the spatial domain (see Fig. 2). Here, we assume that the mean block can be obtained by scaling the DCT coefficients of the observed noisy block with a certain weighting matrix $S(k,l)$, i.e., $\hat{M}(k,l) = S(k,l)\, V(k,l)$. Using this assumption and Eq. (3), the filtered image data can be represented in the DCT domain as follows:

$$\hat{W}(k,l) = \hat{Y}(k,l) + \hat{M}(k,l) = (\tilde{P}(k,l)(1 - S(k,l)) + S(k,l))\, V(k,l) = F(k,l)\, V(k,l), \qquad (5)$$

where

$$F(k,l) = \tilde{P}(k,l)(1 - S(k,l)) + S(k,l) = \frac{1 + S(k,l)\,(\sigma_n^2/\sigma^2)(1/W(k,l))}{1 + (\sigma_n^2/\sigma^2)(1/W(k,l))}. \qquad (6)$$
Fig. 2. DCT-domain representation of the approximated generalized Wiener filtering for non-zero-mean images. All procedures are performed entirely in the DCT domain.
It is noted in Eq. (5) that the whole filtering operation can be merged into a unified scaling operation with a scaling matrix $F(k,l)$, which is determined by the signal-to-noise level, the corresponding covariance estimates, and the mean estimation scheme.
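The claim that Eq. (5) merges mean subtraction, zero-mean filtering, and mean re-insertion into a single scaling pass can be checked numerically. The sketch below, written for this text, uses the DC-only mean estimator $S_1$ of Eq. (7) and an arbitrary illustrative $\tilde{P}$ table; both paths share the same $\tilde{P}$, so the comparison is exact.

```python
import numpy as np
from scipy.fft import dctn, idctn

n = 8
S1 = np.zeros((n, n))
S1[0, 0] = 1.0                                   # Eq. (7): DC term as the mean
k = np.arange(n)
P = 1.0 / (1.0 + 0.3 * (1.0 + k[:, None] + k[None, :]))  # illustrative P~ in (0, 1]
F = P * (1.0 - S1) + S1                          # Eq. (5): unified scaling matrix

rng = np.random.default_rng(1)
v = rng.uniform(0, 255, size=(n, n))             # non-zero-mean noisy block

# Path 1 (Fig. 1): subtract the mean, scale the DCT by P~, add the mean back.
m = v.mean()
yhat = idctn(P * dctn(v - m, norm='ortho'), norm='ortho')
w_spatial = yhat + m

# Path 2 (Fig. 2): a single scaling of the DCT of v by F.
w_dct = idctn(F * dctn(v, norm='ortho'), norm='ortho')

assert np.allclose(w_spatial, w_dct)
```

At the DC position $F = 1$, so the mean passes through untouched, while every AC coefficient is scaled by $\tilde{P}$ exactly as in the zero-mean case.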
Therefore, the remaining task is to find an appropriate $S(k,l)$ for mean estimation. Here, we choose two kinds of $S(k,l)$ that meet the assumption $\hat{M}(k,l) = S(k,l)\, V(k,l)$ in the DCT domain. The simplest and most efficient choice is to use the DC value of the $8 \times 8$ block data as the mean block data, i.e.,

$$S(k,l) = S_1(k,l) = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \qquad (7)$$
Another $S(k,l)$ is chosen by assuming that the mean block data are obtained by the convolution of the observed block data with the $5 \times 5$ averaging kernel

$$\mathrm{avg}(m,n) = \frac{1}{25} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}. \qquad (8)$$

For the convolution process, the pixels outside the $8 \times 8$ block are defined by mirroring the boundary pixels. Since the convolution kernel of Eq. (8) has a separable form, we can rewrite the averaging operation in matrix form as follows:

$$\hat{m} = h v h^T, \qquad (9)$$

where

$$h = \frac{1}{5} \begin{bmatrix} 2 & 2 & 1 & 0 & 0 & 0 & 0 & 0 \\ 2 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 & 1 & 2 & 2 \end{bmatrix}. \qquad (10)$$
If Eq. (9) is represented in the DCT domain, we obtain

$$\hat{M} = H V H^T, \qquad (11)$$

where $\hat{M} = C_8 \hat{m} C_8^T$, $V = C_8 v C_8^T$, and $H = C_8 h C_8^T$. Fortunately, for the averaging kernel of Eq. (8), $H$ becomes a diagonal matrix (Ng and Yip, 2001), and Eq. (11) is more simply described as

$$\hat{M}(k,l) = H(k,k)\, H(l,l)\, V(k,l) = S(k,l)\, V(k,l), \qquad (12)$$
where

$$S(k,l) = S_2(k,l) = d_k\, d_l, \qquad d = \begin{bmatrix} 1.00000 \\ 0.85239 \\ 0.48284 \\ 0.07023 \\ -0.20000 \\ -0.23592 \\ -0.08284 \\ 0.11329 \end{bmatrix}, \qquad (13)$$

with $d_k = H(k,k)$ the diagonal elements of $H$; for example, $S_2(0,0) = 1.00000$, $S_2(1,1) = 0.72658$, and $S_2(2,2) = 0.23314$.
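The structure of Eqs. (9)-(13) can be verified numerically: build $h$ from the mirrored 5-tap average, transform it with the $8 \times 8$ orthonormal DCT matrix, and check that $H$ is diagonal with the entries listed in Eq. (13). This is an independent check written for this text, not the authors' code.

```python
import numpy as np
from scipy.fft import dct

n, taps = 8, 2                                   # 5-tap average: offsets -2..2

def mirror(idx, n=8):
    """Half-sample mirroring: -1 -> 0, -2 -> 1, 8 -> 7, 9 -> 6."""
    if idx < 0:
        return -idx - 1
    if idx >= n:
        return 2 * n - 1 - idx
    return idx

# Eq. (10): each row of h accumulates the mirrored 5-tap averaging weights.
h = np.zeros((n, n))
for i in range(n):
    for k in range(i - taps, i + taps + 1):
        h[i, mirror(k)] += 1.0 / 5.0

C8 = dct(np.eye(n), axis=0, norm='ortho')        # orthonormal DCT-II matrix
H = C8 @ h @ C8.T                                # Eq. (11)

off = H - np.diag(np.diag(H))                    # H is diagonal (Ng and Yip, 2001)
d = np.diag(H)
S2 = np.outer(d, d)                              # Eqs. (12)-(13): S2(k,l) = H(k,k)H(l,l)
```

The diagonal entries reproduce the values printed in Eq. (13); they equal the 5-tap box-filter frequency response $(1 + 2\cos\omega_k + 2\cos 2\omega_k)/5$ sampled at $\omega_k = k\pi/8$.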
3. Proposed video encoder for noisy video sequences
3.1. DCT-domain pre-filtering inside a video encoder
We have observed that the approximated generalized Wiener filtering can be performed by scaling transformed coefficients even for non-zero-mean image data. Based on this observation, we now investigate how the pre-filtering operation can be integrated into a video encoder.
Fig. 3 depicts the overall encoding structure for intra blocks. When we combine the
pre-filtering scheme with a video encoder, we note that the inverse DCT (IDCT) in the
Wiener filtering procedure is canceled by the DCT in the video encoder. (This is based
on the assumption that the DCT is chosen as a unitary transform for approximated
generalized Wiener filtering.) It means that only one DCT operation is needed when
approximated generalized Wiener filtering is merged into the video encoder.
The concept of Fig. 3 is also valid for processing inter blocks, under the assumption that the motion-compensated prediction data, $p(m,n)$, do not contain input noise.
Fig. 3. Proposed architecture for encoding intra blocks.
Fig. 4. Proposed encoder architecture for encoding inter blocks.
Therefore, the overall architecture of the conventional video encoder is maintained, except that the transformed coefficients are scaled with a unified weighting matrix, $F(k,l)$. Fig. 4 depicts the block diagram for processing inter blocks. It should be noted from Eq. (6) that $F(k,l)$ depends on the block mode, because the covariance estimates are different for intra and inter blocks.
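A sketch of how this scaling step sits in the block loop of Figs. 3 and 4. The function below is schematic and written for this text: `F_intra` and `F_inter` would come from Eq. (6) with the mode-specific covariance estimates, and quantization and VLC are omitted.

```python
import numpy as np
from scipy.fft import dctn

def prefiltered_coeffs(block, prediction, F_intra, F_inter, mode):
    """Return the pre-filtered DCT coefficients that would enter the
    quantizer: intra blocks scale the block's own DCT (Fig. 3), inter
    blocks scale the DCT of the motion-compensated residual (Fig. 4)."""
    if mode == 'intra':
        return F_intra * dctn(block, norm='ortho')
    residual = block - prediction                 # motion-compensated error
    return F_inter * dctn(residual, norm='ortho')
```

With a zero prediction and identical tables the two modes coincide, which is a quick sanity check that the inter path reduces to the intra path.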
3.2. Merging pre-filtering into scaled DCT
The scaled DCT is a well-known fast DCT algorithm that factorizes the DCT into a core operation, a scaling operation, and a permutation operation (Feig and Winograd, 1992). Since approximated generalized Wiener filtering can be performed by scaling the transformed coefficients, it may be possible to further optimize it by combining the scaling operation for filtering with the scaling operation of the DCT itself. In addition, the permutation operation can be merged by properly modifying the zigzag scanning order.
According to the scaled DCT, an $8 \times 8$ DCT matrix, $C_8$, can be factorized as follows:

$$C_8 = P_8 D_8 R_{8,1} M_8 R_{8,2}, \qquad (14)$$
where $R_{8,2} = \tilde{B}_1 B_2 B_3$. Here, $P_8$ is a permutation matrix; $M_8$ is a sparse multiplication matrix whose nonzero entries are $\pm 1$ and the cosine factors $c(2)$, $c(4)$, and $c(6)$; $R_{8,1}$, $\tilde{B}_1$, $B_2$, and $B_3$ are sparse addition (butterfly) matrices with entries $0$ and $\pm 1$, whose explicit forms are given in Feig and Winograd (1992); and the diagonal scaling matrix is

$$D_8 = \frac{1}{8}\,\mathrm{diag}\!\left(\sqrt{2},\; c^{-1}(4),\; c^{-1}(6),\; c^{-1}(2),\; c^{-1}(5),\; c^{-1}(1),\; c^{-1}(3),\; c^{-1}(7)\right), \qquad (15)$$

with $c(k) = \cos(2\pi k/32)$. It should be noted that $C_8$ can be easily implemented by scaling and permuting the result of $R_{8,1} M_8 R_{8,2}$. In addition, Eq. (14) can be expanded to the two-dimensional DCT, i.e.,
$$\begin{aligned} C_8 \otimes C_8 &= (P_8 D_8 R_{8,1} M_8 R_{8,2}) \otimes (P_8 D_8 R_{8,1} M_8 R_{8,2}) \\ &= ((P_8 D_8) \otimes (P_8 D_8))\,((R_{8,1} M_8 R_{8,2}) \otimes (R_{8,1} M_8 R_{8,2})) \\ &= (P_8 \otimes P_8)(D_8 \otimes D_8)(R_{8,1} \otimes R_{8,1})(M_8 \otimes M_8)(R_{8,2} \otimes R_{8,2}) \\ &= P D (R_1 M R_2). \end{aligned} \qquad (16)$$
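Eq. (16) rests on the mixed-product property of the Kronecker product, $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, applied repeatedly. A quick numerical confirmation with arbitrary seeded matrices, written for this text:

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, C, D = (rng.normal(size=(4, 4)) for _ in range(4))

# Mixed-product property used in Eq. (16):
left = np.kron(A, B) @ np.kron(C, D)
right = np.kron(A @ C, B @ D)
assert np.allclose(left, right)

# Applied to a five-factor product such as P8 D8 R81 M8 R82, the Kronecker
# square of the product equals the product of the Kronecker squares.
factors = [rng.normal(size=(4, 4)) for _ in range(5)]  # stand-ins for the DCT factors
prod = np.linalg.multi_dot(factors)
lhs = np.kron(prod, prod)
rhs = np.linalg.multi_dot([np.kron(f, f) for f in factors])
assert np.allclose(lhs, rhs)
```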
Fig. 5. Joint optimization with the scaled DCT.
Therefore, $C_8 \otimes C_8$ can also be implemented by the two-dimensional scaling and permuting of $R_1 M R_2$.
As shown in Fig. 5, the scaling factor residing in the scaled DCT, $D(k',l')$, and the scaling factor for pre-filtering, $F(k',l')$, can be merged into a unified scaling term, $H(k',l')$. Here, the coordinates $(k',l')$ are used instead of $(k,l)$, because the permutation operation still remains after scaling to complete the scaled DCT. However, since the permutation operation just alters the absolute positions of the transformed coefficients, it can be combined with the zigzag scanning operation in the VLC, which also alters coefficient positions (see Fig. 6(a)). Hence, the combined scanning order is given in Fig. 6(b).
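Merging the scaled-DCT permutation into the zigzag scan amounts to composing two index permutations, which can be sketched as follows. The permutation `perm` here is an arbitrary stand-in written for this text, since the actual $P_8 \otimes P_8$ ordering depends on the factorization in Eq. (14).

```python
import numpy as np

def zigzag_indices(n=8):
    """Standard zigzag scan order as flat indices into a row-major n x n block."""
    pos = [(i, j) for i in range(n) for j in range(n)]
    pos.sort(key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))
    return np.array([i * n + j for i, j in pos])

n = 8
zz = zigzag_indices(n)
rng = np.random.default_rng(4)
perm = rng.permutation(n * n)        # stand-in: scaled position -> natural position
inv_perm = np.argsort(perm)

s = rng.normal(size=n * n)           # coefficients in scaled-DCT order
x = np.empty_like(s)
x[perm] = s                          # cascaded step 1: apply the permutation
cascaded = x[zz]                     # cascaded step 2: zigzag scan

combined = inv_perm[zz]              # Fig. 6(b): one merged scanning order
assert np.array_equal(cascaded, s[combined])
```

Reading the scaled-order coefficients directly through the combined order gives exactly the permute-then-zigzag result, so the explicit permutation step disappears.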
In summary, $H(k',l')$, which simultaneously performs the filtering and the scaled DCT, can be described as

$$H(k',l') = D(k',l')\, F(k',l') = \frac{D(k',l') + D(k',l')\, S(k',l')\, (\sigma_n^2/\sigma^2)(1/W(k',l'))}{1 + (\sigma_n^2/\sigma^2)(1/W(k',l'))}, \qquad (17)$$
where

$$\sigma^2 = \max(\sigma_z^2 - \sigma_n^2,\, 0),$$

and

$$\sigma_z^2 = \sum_{k=0}^{7}\sum_{l=0}^{7} Z^2(k,l) = \sum_{k'=0}^{7}\sum_{l'=0}^{7} (1 - S(k',l'))^2\, D^2(k',l')\, V_c^2(k',l'). \qquad (18)$$
Fig. 6. (a) The normal zigzag scanning order, and (b) the combined scanning order for the permutation of
the scaled DCT and the zigzag scanning of the VLC.
Here, $V_c(k',l')$ denotes the transformed coefficients just after the operation $R_1 M R_2$ is completed in the scaled DCT. It should be noted that $D(k',l')$ and $S(k',l')$ have constant values, and $W(k',l')$, the normalized elements on the diagonal of $A R A^T$, also becomes constant if the covariance matrix $R$ of $y$ is estimated as a constant. Therefore, the operations related to $D(k',l')$, $S(k',l')$, and $W(k',l')$ in Eqs. (17) and (18) can be pre-computed and stored in a small memory.
As noted above, the increase in computational complexity caused by integrating the pre-filtering process into the DCT is only due to the calculation of $H(k',l')$. This calculation consists of $64 \times 2$ multiplications and additions and 64 divisions, in addition to the estimation of the SNR $\sigma^2/\sigma_n^2$. Also note that the computational complexity of the SNR estimation can be controlled depending on the desired filtering performance.
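Eqs. (17) and (18) can be exercised with a short routine. This sketch, written for this text, treats the `S`, `W`, and `D` tables as given constants; the sum in Eq. (18) is divided by 64 here so that the estimate is commensurate with the pixel-domain noise variance, which is an assumption about the paper's normalization.

```python
import numpy as np

def unified_scale_table(Vc, noise_var, S, W, D):
    """Per-block computation of H(k', l') from Eqs. (17) and (18)."""
    # Eq. (18): signal variance estimate from the mean-removed coefficients.
    sigma_z2 = np.sum(((1.0 - S) * D * Vc) ** 2) / 64.0
    sigma2 = max(sigma_z2 - noise_var, 0.0)
    if sigma2 == 0.0:
        return S * D                    # limiting case: keep only the mean
    r = (noise_var / sigma2) / W        # (sigma_n^2 / sigma^2)(1 / W)
    return (D + D * S * r) / (1.0 + r)  # Eq. (17)
```

By construction $H = D \cdot F$ with $F$ from Eq. (6), and with `noise_var = 0` the routine returns `D` itself, i.e., the filtering disappears and only the scaled DCT remains.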
4. Simulation results
We have used an H.263 video encoder for this simulation. Only the first frame was
encoded as the intra frame and the other frames were encoded as inter frames. All
optional modes of the H.263 were turned off. Noisy video sequences were obtained
by adding additive white Gaussian noise (AWGN) to original MPEG-4 test sequences. We used a fixed quantization parameter for all intra and inter blocks and
encoded 300 frames with a frame rate of 10 Hz. The Hall monitor, Mother and daughter, Foreman, and Coast guard sequences in the QCIF format were used as test video
sequences. In order to quantitatively measure the amount of noise added to the original image and the visual quality of the reconstructed images, $\mathrm{SNR} = 10 \log_{10}(\sigma^2/\sigma_n^2)$ and $\mathrm{PSNR} = 10 \log_{10}(255^2/\sigma_n^2)$ are used, respectively. Here, $\sigma^2$ is the variance of the desired (or original) image, and $\sigma_n^2$ is the variance of the difference between the desired image and the acquired (or noisy) image.
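As a small worked example, the two measures can be computed as follows (helpers written for this text; note that $\sigma_n^2$ is defined above as the variance of the difference image, not its mean square):

```python
import numpy as np

def snr_db(desired, noisy):
    """SNR = 10 log10(sigma^2 / sigma_n^2); sigma_n^2 is the variance
    of the difference between the noisy and the desired image."""
    return 10.0 * np.log10(np.var(desired) / np.var(noisy - desired))

def psnr_db(desired, reconstructed):
    """PSNR = 10 log10(255^2 / sigma_n^2) for 8-bit imagery."""
    return 10.0 * np.log10(255.0 ** 2 / np.var(reconstructed - desired))
```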
To obtain $H(k',l')$ in Eq. (17), we need to find $W(k,l)$, and these can be easily obtained if the covariance matrix $R$ is known. Hence, to find realistic covariance estimates for mean-subtracted intra and inter blocks, we use an estimate of the covariance matrix averaged over the horizontal and vertical directions (Niehsen and Brunig, 1999), i.e.,

$$\hat{R}_{1D} = \frac{8}{2\,\mathrm{trace}(Y_s Y_s^T)}\left( Y_s Y_s^T + Y_s^T Y_s \right), \qquad (19)$$
where $Y_s$ denotes the $8 \times 8$ matrix form of the mean-subtracted block data, $y$. Note that Eq. (19) coincides with the unit-variance normalization for Toeplitz covariance matrices in the wide-sense stationary case. In order to subtract block-based mean data from intra and inter blocks, we try two mean estimation methods corresponding to Eqs. (7) and (13), and name them method 1 and method 2, respectively. For covariance estimation, we use four MPEG-4 sequences, Akiyo, Container ship, News, and Silent voice, in the QCIF format, which are different from the test sequences used for filtering. Each sequence has 300 frames and is encoded with a frame rate of 10 Hz and a fixed quantization parameter of 2. First, for mean-subtracted intra and inter blocks, we obtain normalized covariance estimates frame by frame, through
frame-based averaging. Then, we examine their minimal, maximal, and averaged values over all the corresponding frames in the four sequences, and plot them in Fig. 7 for intra blocks and Fig. 8 for inter blocks, respectively. In the simulation of the proposed DCT-domain Wiener filtering, we use the averaged covariance estimates, as in Niehsen and Brunig (1999), to obtain a constant $W(k',l')$. Also, the mean-subtracted image variance $\sigma^2$ is updated block by block.
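The covariance step can be sketched as follows (written for this text): Eq. (19) is applied to a mean-subtracted block, and $W(k,l)$ is then read off the diagonal of $C_8 \hat{R} C_8^T$, as in Section 2.

```python
import numpy as np
from scipy.fft import dct

def covariance_estimate(Ys):
    """Eq. (19): direction-averaged covariance estimate, normalized so
    that trace(R) = 8 (unit variance per pixel)."""
    G = Ys @ Ys.T
    return (8.0 / (2.0 * np.trace(G))) * (G + Ys.T @ Ys)

def dct_weights(R):
    """Normalized diagonal of C8 R C8^T, used as W(k, l) in Eq. (4)."""
    C8 = dct(np.eye(8), axis=0, norm='ortho')
    w = np.diag(C8 @ R @ C8.T)
    return w / w.mean()
```

The $8/(2\,\mathrm{trace})$ prefactor guarantees $\mathrm{trace}(\hat{R}_{1D}) = 8$ regardless of the block, since $\mathrm{trace}(Y_s Y_s^T) = \mathrm{trace}(Y_s^T Y_s)$.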
Fig. 7. Normalized covariance estimates after frame-based averaging for intra blocks. Depicted are
minimal, maximal, and averaged block covariance estimates for all the intra frames in the four sequences.
Fig. 8. Normalized covariance estimates after frame-based averaging for inter blocks. Depicted are
minimal, maximal, and averaged block covariance estimates for all the inter frames in the four sequences.
To compare with the proposed embedded Wiener filtering in the DCT domain, we also perform a simulation of the spatial-domain adaptive Wiener filtering of Lim (1990), which is defined as

$$\hat{u}(n_1,n_2) = \hat{m}(n_1,n_2) + \frac{\hat{\sigma}^2(n_1,n_2)}{\hat{\sigma}^2(n_1,n_2) + \sigma_n^2}\,\left(m(n_1,n_2) - \hat{m}(n_1,n_2)\right), \qquad (20)$$

where

$$\hat{m}(n_1,n_2) = \frac{1}{(2M+1)^2} \sum_{k_1=n_1-M}^{n_1+M}\; \sum_{k_2=n_2-M}^{n_2+M} m(k_1,k_2),$$

$$\hat{\sigma}^2(n_1,n_2) = \max\left(\hat{\sigma}_z^2(n_1,n_2) - \sigma_n^2,\, 0\right),$$

and
Fig. 9. Encoding performance of intra frames. Each frame has an SNR of 20 dB. (a) Hall monitor,
(b) Mother and daughter, (c) Foreman, and (d) Coast guard sequences.
$$\hat{\sigma}_z^2(n_1,n_2) = \frac{1}{(2M+1)^2} \sum_{k_1=n_1-M}^{n_1+M}\; \sum_{k_2=n_2-M}^{n_2+M} \left(m(k_1,k_2) - \hat{m}(n_1,n_2)\right)^2. \qquad (21)$$
Here, $\hat{u}(n_1,n_2)$ denotes the filtered result of the observed noisy image, $m(n_1,n_2)$. For video encoding based on spatial-domain Wiener filtering, each frame of a noisy video sequence is filtered and then applied to the input of the conventional video encoder. In this simulation, $M$ is set to 2, and the noise variance is assumed to be known.
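For reference, Eqs. (20) and (21) can be implemented compactly with a box filter. This sketch, written for this text, uses `scipy.ndimage.uniform_filter`, whose default reflecting boundary mode matches the mirrored extension used elsewhere in the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_wiener(noisy, noise_var, M=2):
    """Spatial-domain adaptive Wiener filter of Lim (1990), Eqs. (20)-(21)."""
    size = 2 * M + 1
    m_hat = uniform_filter(noisy, size=size)                 # local mean
    var_z = uniform_filter(noisy ** 2, size=size) - m_hat ** 2
    var_z = np.maximum(var_z, 0.0)                           # guard tiny negatives
    sig2 = np.maximum(var_z - noise_var, 0.0)                # local signal variance
    denom = sig2 + noise_var
    gain = np.divide(sig2, denom, out=np.zeros_like(sig2), where=denom > 0)
    return m_hat + gain * (noisy - m_hat)                    # Eq. (20)
```

Since the gain lies in $[0,1]$, the output always lies between the local mean and the observation; with zero noise variance the filter passes the input through.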
Figs. 9 and 10 show the encoding performance for intra frames corrupted with noise of 20 and 10 dB SNR, respectively. From the figures, we can see that the encoding performance for intra frames depends somewhat on the image characteristics and the amount of noise. As the images become smoother, spatial-domain Wiener filtering gives slightly better results. On the contrary, for complex images with a large amount of detail, such as Coast guard, the spatial-domain Wiener
Fig. 10. Encoding performance of intra frames. Each frame has an SNR of 10 dB. (a) Hall monitor,
(b) Mother and daughter, (c) Foreman, and (d) Coast guard sequences.
filtering gives somewhat worse results. In general, we can say that the three filtering methods give comparable results in terms of PSNR, but much better results than no filtering. It is also interesting to note in Fig. 10 that, in the case of no filtering, the encoding performance is better at low bitrates. This is because the coarse quantization at low bitrates provides a low-pass filtering effect. Fig. 11 shows the encoded results for the intra frame of the Foreman sequence, which is corrupted with noise of 10 dB SNR. It is noticeable that the spatial-domain filtering needs more coding bits than the proposed method for a fixed quantization parameter. From the images, we can see that the proposed pre-filtering methods incur blocking artifacts at $8 \times 8$ block boundaries due to their block-based processing. The blocking artifacts are more noticeable with method 1 than with method 2. As the amount of noise becomes lower, the blocking artifacts become less visible. Also, as the quantization step becomes coarser, the major source of blocking artifacts changes from the block-based pre-filtering to the block-based quantization. These blocking artifacts can be effectively alleviated by using the post-filtering technique recommended in MPEG-4 (MPEG-4, 1997; Kim et al., 1999).
Fig. 11. Encoded images of the intra frame in Foreman sequence which is corrupted with a noise of 10 dB
SNR. A fixed quantization parameter of 2 is used. (a) No filtering (24.0 dB, 195 kbits), (b) spatial-domain
adaptive Wiener filtering (29.2 dB, 109 kbits), (c) method 1 (29.0 dB, 87 kbits), and (d) method 2 (29.2 dB,
77 kbits).
Figs. 12 and 13 show the encoding performance for inter frames corrupted with noise of 20 and 10 dB SNR, respectively. When compared with the intra frame results, it is interesting to note that the proposed pre-filtering performed inside a video encoder generally gives better results than the spatial-domain adaptive Wiener filtering performed at the input stage of a video encoder. That is, the proposed method gives better results for inter frames than for intra frames, probably because it utilizes the temporal correlation exploited by motion-compensated prediction for inter frames. The gap between the PSNR of the proposed method and that of the spatial filtering increases as the SNR becomes lower. Unlike the intra frame results, where the performance depends strongly on the image characteristics, the best inter frame results are generally obtained with method 1. When we consider both intra and inter frame filtering, however, the difference between method 1 and
Fig. 12. Encoding performance of inter frames. Each frame has an SNR of 20 dB. (a) Hall monitor,
(b) Mother and daughter, (c) Foreman, and (d) Coast guard sequences.
Fig. 13. Encoding performance of inter frames. Each frame has an SNR of 10 dB. (a) Hall monitor,
(b) Mother and daughter, (c) Foreman, and (d) Coast guard sequences.
method 2 is not significant. From these results, we may conclude that the selection of $S(k,l)$ has only a marginal impact on the filtering performance. Note in Fig. 13 that, in the case of no filtering, the PSNR of the noisy video also decreases as the bitrate increases, as in Fig. 10.
Fig. 14 shows inter-coded images of the 30th frame of the Foreman sequence for a fixed quantization parameter. From these figures, it can be seen that the proposed method provides more pleasant image quality with fewer coding bits than the spatial-domain Wiener filtering. It is also interesting to note that, unlike in intra frames, the blocking artifacts are not noticeable in inter frames. Since the motion-compensated inter blocks are more noise-like than intra blocks, block-based scaling of DCT coefficients incurs blocking artifacts more seriously in intra blocks than in inter blocks.
Fig. 14. Encoded images of the 30th frame in Foreman sequence which is corrupted with a noise of 10 dB SNR. A fixed quantization parameter of 2 is used. (a) No filtering (24.7 dB, 1774 kbps), (b) spatial-domain adaptive Wiener filtering (29.6 dB, 892 kbps), (c) method 1 (30.2 dB, 466 kbps), and (d) method 2 (30.1 dB, 494 kbps).
In all the simulations above, we assume the baseline mode of H.263. However, even when the advanced modes of H.263 are turned on, the overall tendency of the performance curves for the applied pre-filtering methods is quite similar to that of the baseline, aside from a slight improvement in coding efficiency.
5. Conclusions
A pre-filtering scheme tightly coupled with a conventional encoder structure has been proposed. Unlike the conventional method, which cascades a pre-filter and a video encoder, the proposed method operates by scaling the transformed coefficients of original image blocks and motion-compensated error blocks inside a video encoder. For this filtering, we adopted approximated generalized Wiener filtering and a DCT-domain mean estimation scheme for non-zero-mean image data, in order to perform the whole operation with a single scaling step. In addition, covariance estimates of mean-subtracted intra and inter blocks were examined and utilized for their filtering. Since the DCT is mandatory for many block-based video
coders and the whole filtering process requires only scaling operations on transformed coefficients rather than convolution operations, the overall architecture of the conventional video encoder is maintained. Also, the increase in computational complexity due to pre-filtering is marginal for given SNR estimates.
The proposed method has been shown to provide good filtering and encoding performance, especially for inter frames, when compared with the cascaded combination of a spatial-domain adaptive Wiener filter and a video encoder. Considering the structural simplicity and good coding performance of the proposed encoder, it can be very useful for low-cost video coding applications in noisy environments.
References
Al-Shaykh, O.K., Mersereau, R.M., 1998. Lossy compression of noisy images. IEEE Trans. Image
Processing 7 (12), 1641–1652.
Vasconcelos, N., Dufaux, F., 1997. Pre and post-filtering for low bit-rate video coding, in: Proceedings of the International Conference on Image Processing, vol. 2, Santa Barbara, CA, pp. 291–294.
Lim, J.S., 1990. Two-Dimensional Signal and Image Processing. Prentice-Hall.
Boo, K.J., Bose, N.K., 1998. A motion-compensated spatio-temporal filter for image sequences with
signal-dependent noise. IEEE Trans. Circuits Syst. Video Technol. 8 (3), 287–298.
Jónsson, R.H., 1997. Efficient DCT domain implementation of picture masking and compositing, in: Proceedings of the International Conference on Image Processing, vol. 2, Santa Barbara, CA, pp. 366–369.
Merhav, N., Bhaskaran, V., 1997. Fast algorithms for DCT-domain image down-sampling and for inverse
motion compensation. IEEE Trans. Circuits Syst. Video Technol. 7 (3), 468–476.
Jain, A.K., 1989. Fundamentals of Digital Image Processing. Prentice-Hall.
Pratt, W.K., 1972. Generalized Wiener filter computation techniques. IEEE Trans. Comput. 21 (7), 636–641.
Feig, E., Winograd, S., 1992. Fast algorithms for the discrete cosine transform. IEEE Trans. Signal
Processing 40 (9), 2174–2193.
Niehsen, W., Brunig, M., 1999. Covariance analysis of motion-compensated frame differences. IEEE
Trans. Circuits Syst. Video Technol. 9 (4), 536–539.
Chen, C.-F., Pang, K.K., 1993. The optimal transform of motion-compensated frame difference images in
a hybrid coder. IEEE Trans. Circuits Syst. II: Analog Digital Signal Processing 40 (6), 393–397.
Ng, M.K., Yip, A.M., 2001. A fast MAP algorithm for high-resolution image reconstruction with
multisensors. Multidimensional Systems and Signal Processing 12, 143–164.
MPEG-4 video verification model V.8.0, ISO/IEC JTC 1/SC 29/WG 11/N1796, July 1997.
Kim, S.D., Yi, J., Kim, H.M., Ra, J.B., 1999. A deblocking filter with two separate modes in block-based
video coding. IEEE Trans. Circuits Syst. Video Technol. 9 (1), 156–160.
SUNG DEUK KIM received the B.S. degree in electronics engineering from the Kyungpook National
University, Korea, in 1994, and the M.S. and Ph.D. in electrical engineering from KAIST, Korea in 1996
and 2000, respectively. From 2000 to 2003, he was employed by LG Electronics, Co., Ltd. Since March
2003, he has been a faculty member in the Department of Electronics Engineering Education at Andong
National University, Korea. His research interests include image and video processing.
JONG BEOM RA received the B.S. degree in electronic engineering in 1975 from Seoul National
University, and the M.S. and Ph.D. in electrical engineering from KAIST, Korea, in 1977 and 1983,
respectively. From 1983 to 1987, he was a member of the faculty at Columbia University, New York,
engaged in the development of medical imaging systems such as high field magnetic resonance imaging and
spherical positron emission tomography systems. In July 1987, he joined the Department of Electrical
Engineering and Computer Science at KAIST, where he is now a professor. His research interests are
digital image processing, video signal processing, 3-D visualization, and medical imaging such as MRI.