J. Vis. Commun. Image R. 14 (2003) 22–40 www.elsevier.com/locate/yjvci

Efficient block-based video encoder embedding a Wiener filter for noisy video sequences

Sung Deuk Kim and Jong Beom Ra*

Department of EECS, Korea Advanced Institute of Science and Technology, 373-1 Kusongdong, Yusonggu, Taejon, Republic of Korea

Received 24 September 2001; accepted 15 November 2002

Abstract

Since pre-filtering removes camera noise and improves coding efficiency dramatically, its efficient implementation has been an important issue in video sequence coding. Based on approximated generalized Wiener filtering and a two-dimensional discrete cosine transform (DCT) factorization, this paper introduces a novel pre-filtering scheme that is performed inside a video encoder. The proposed pre-filtering is performed by scaling the DCT coefficients of original image blocks for intra block coding and those of motion-compensated error blocks for inter block coding, respectively. Even though the pre-filtering operation is embedded in the video encoder, its additional computational complexity is marginal for given signal-to-noise ratio (SNR) estimates, and the overall architecture of the conventional video encoder is maintained. Notwithstanding its simplicity, the proposed pre-filtering scheme gives good filtering and coding performance for noisy video sequences. © 2003 Elsevier Science (USA). All rights reserved.

Keywords: Pre-filtering; Noise removal; DCT-domain filtering

1. Introduction

Nowadays, block-based video encoders such as MPEG-1, MPEG-2, and H.263 are widely used for storing and transmitting video sequences. These video encoders achieve good compression performance by reducing redundant information residing

* Corresponding author. Fax: +82-42-869-8360. E-mail address: [email protected] (Jong Beom Ra).

1047-3203/03/$ - see front matter © 2003 Elsevier Science (USA). All rights reserved. doi:10.1016/S1047-3203(02)00012-3
in a video sequence. It is well known that the DCT, motion-compensated prediction, and variable length coding (VLC) are very useful tools for improving coding efficiency in encoders, since they reduce spatial, temporal, and statistical redundancy, respectively. Since video sequences are highly correlated spatially and temporally, their coding efficiency can be improved dramatically by using these technologies. In a practical video encoder, however, a video sequence obtained from a camera usually conveys noise, and this noise degrades not only image quality but also coding efficiency due to its uncorrelated nature. Therefore, it is common to apply a pre-filtering procedure when encoding a noisy image or video sequence, in order to improve encoded image quality and coding efficiency (Al-Shaykh and Mersereau, 1998; Vasconcelos and Dufaux, 1997). Noise removal schemes for video sequences have been widely studied. Among them, a spatial-domain adaptive Wiener filtering scheme (Lim, 1990) and a motion-compensated spatio-temporal filtering scheme (Boo and Bose, 1998) have given good de-noising performance for video sequences. However, these filtering schemes have been studied from the viewpoint of noise filtering itself rather than that of optimizing a video encoder with pre-filtering, because the noise removal operation has been considered a process independent of video encoding. As a consequence, the incorporation of pre-filtering into video encoding has been implemented through straightforward cascading. In this cascaded structure, however, computational complexity increases significantly due to the additional pre-filtering stage. This paper aims at realizing an efficient video coder that includes a pre-filtering step for noisy video sequences. Since a video coder is based on the DCT, there have been some efforts to optimize video processing operations in the DCT domain (Jónsson, 1997; Merhav and Bhaskaran, 1997).
Similarly, we try to embed a pre-filtering scheme inside a block-based video coder. In order to achieve an efficient encoder structure with pre-filtering that fully works in the DCT domain for fast processing, the concept of the approximated generalized Wiener filter is explored (Jain, 1989; Pratt, 1972), and the pre-filtering operation is simply performed by scaling the DCT coefficients. Since a video encoder adaptively encodes a block as an intra block or an inter block, two different filters are designed and used for intra blocks and inter blocks, respectively. Therefore, the overall architecture of a conventional video encoder is maintained irrespective of the insertion of the pre-filtering operation. The approximated generalized Wiener filter is based on zero-mean image block data. Therefore, for filtering non-zero-mean image blocks, their mean values are to be estimated and subtracted from the data before filtering is applied. Finally, the mean values are added back to the filtered data. For efficient filtering in the DCT domain, we adopt a DCT-domain mean estimation scheme based on scaling, and combine the whole filtering operation, including mean estimation, subtraction, zero-mean block filtering, and addition, into a unified scaling operation. Since the filtering is performed by scaling DCT coefficients, we also propose a method for jointly optimizing the filtering operation with the scaled DCT (Feig and Winograd, 1992). The parameters of the approximated generalized Wiener filter are determined on the basis of data covariance. Hence, proper covariance estimates are needed for both mean-subtracted intra and inter blocks. To find the covariance model of the motion-compensated inter frame, there have been several approaches, including a somewhat empirical one (Niehsen and Brünig, 1999) and a more theoretic one (Chen and Pang, 1993).
In this paper, to obtain covariance estimates for both mean-subtracted intra and inter blocks in MPEG-4 test sequences, we adopt an approach similar to that of Niehsen and Brünig (1999). Although pre-filtering is merged into video encoding and performed just by scaling DCT coefficients, the proposed scheme gives good filtering and coding performance, especially for inter frames, when compared with the cascade of a spatial-domain adaptive Wiener filter and a video encoder. This paper is organized as follows. Section 2 briefly reviews approximated generalized Wiener filtering and proposes a filtering architecture based on scaling for non-zero-mean image blocks. Section 3 deals with the proposed video encoder scheme for noisy sequences. Intensive simulation results for the proposed scheme are given in Section 4. Finally, Section 5 provides concluding remarks.

2. Approximated generalized Wiener filtering

2.1. Brief review

Generalized Wiener filtering is an efficient method for approximately implementing the Wiener filter by using a fast unitary transform such as the DCT (Jain, 1989). Fig. 1 shows the block diagram of an approximated generalized Wiener filter for non-zero-mean image data. Here, input $v$ and output $\hat{w}$ are row-ordered column vectors denoting an observed noisy image block and the filtered image block, respectively. Note that the dimension of both $v$ and $\hat{w}$ is $64 \times 1$ for an $8 \times 8$ block. Since $v$ is usually non-zero-mean image data, its mean value $\hat{m}$ is estimated and subtracted from the data $v$. Then, after filtering the mean-subtracted data $z$, the mean value $\hat{m}$ is added to the filtered result $\hat{y}$. In the case of no blur, generalized Wiener filtering for the zero-mean image observation model is described as follows:

$$\hat{y} = A^T [A L A^T] A z = A^T \tilde{L} Z, \qquad (1)$$

where $\tilde{L} = A L A^T$, $L = [I + \sigma_n^2 R^{-1}]^{-1}$, $R = E[y y^T]$, $Z = A z$, and $R$ and $\sigma_n^2$ denote the covariance matrix of $y$ and the noise variance, respectively. Here, $A$ is a unitary transform.
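The approximation used in Eq. (2) rests on $\tilde{L} = A L A^T$ being nearly diagonal when $A$ is the DCT. The following numpy sketch is our own illustration, not from the paper: it assumes a first-order autoregressive (AR(1)) covariance model with correlation 0.95 and an arbitrary noise variance, and measures how little of the energy of $\tilde{L}$ lies off its diagonal for an 8-point block.

```python
import numpy as np

N = 8
# Orthonormal 8-point DCT-II matrix, used as the unitary transform A.
A = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
               np.cos((2 * n + 1) * k * np.pi / (2 * N))
               for n in range(N)] for k in range(N)])

# Assumed AR(1) covariance model for the clean signal y (rho = 0.95).
rho = 0.95
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

sigma_n2 = 0.1                                   # noise variance (assumed known)
L = np.linalg.inv(np.eye(N) + sigma_n2 * np.linalg.inv(R))  # L = [I + s_n^2 R^-1]^-1
L_tilde = A @ L @ A.T                            # transform-domain filter of Eq. (1)

# Fraction of L_tilde's energy lying off the diagonal.
off = L_tilde - np.diag(np.diag(L_tilde))
ratio = np.linalg.norm(off) / np.linalg.norm(L_tilde)
```

Because this ratio is small, replacing $\tilde{L}$ by its diagonal loses little, which is what turns the filter into a per-coefficient scaling in Eqs. (2)–(4).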
Since the DCT is adopted in our case, $A = C_8 \otimes C_8$, where $C_8$ and $\otimes$ denote an $8 \times 8$ DCT matrix and the Kronecker product operator, respectively. Since $\tilde{L}$ is nearly diagonal for many unitary transforms, Eq. (1) can be approximated by the following equation:

$$\hat{y} = A^T \hat{Y}, \qquad (2)$$

where $\hat{Y} = \tilde{L} Z \approx [\mathrm{Diag}\,\tilde{L}]\, Z$. Therefore, by mapping Eq. (2) into an $8 \times 8$ block,

$$\hat{Y}(k,l) \approx \tilde{P}(k,l)\, Z(k,l), \qquad (3)$$

where

$$\tilde{P}(k,l) \cong \frac{1}{1 + (\sigma_n^2/\sigma^2)(1/W(k,l))}, \qquad (4)$$

$W(k,l)$ are the normalized elements on the diagonal of $A R A^T$, and $\sigma^2$ denotes the variance of the desired data $y$. $\sigma^2$ is usually estimated by subtracting the noise variance from the variance of $z$. In Eq. (3), it should be noted that for zero-mean image data, the approximated generalized Wiener filtering actually corresponds to scaling the 2-D DCT coefficients with $\tilde{P}(k,l)$. Once $\hat{y}(m,n)$ is determined, the final filtered image, $\hat{w}(m,n)$, is obtained by adding $\hat{m}(m,n)$ to $\hat{y}(m,n)$.

Fig. 1. Approximated generalized Wiener filtering for non-zero-mean images.

2.2. Proposed filtering architecture for non-zero-mean image blocks

The block diagram of the approximated generalized Wiener filtering for non-zero-mean images in Fig. 1 can be redrawn by performing the mean addition and subtraction procedures in the DCT domain instead of the spatial domain (see Fig. 2). Here, we assume that the mean block can be obtained by scaling the DCT coefficients of the observed noisy block with a certain weighting matrix $S(k,l)$, i.e., $\hat{M}(k,l) = S(k,l)\, V(k,l)$. Using this assumption and Eq. (3), the filtered image data can be represented in the DCT domain as follows:

$$\hat{W}(k,l) = \hat{Y}(k,l) + \hat{M}(k,l) = \left(\tilde{P}(k,l)\,(1 - S(k,l)) + S(k,l)\right) V(k,l) = F(k,l)\, V(k,l), \qquad (5)$$

where

$$F(k,l) = \tilde{P}(k,l)\,(1 - S(k,l)) + S(k,l) = \frac{1 + S(k,l)\,(\sigma_n^2/\sigma^2)(1/W(k,l))}{1 + (\sigma_n^2/\sigma^2)(1/W(k,l))}. \qquad (6)$$

Fig. 2. DCT-domain representation of the approximated generalized Wiener filtering for non-zero-mean images.
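Eqs. (4)–(6) collapse the whole filter into a single table of per-coefficient gains. The sketch below is our own illustration: $W(k,l)$ is derived from an assumed separable AR(1) covariance model (the paper instead estimates it from data), $S$ is the DC-only weighting of Eq. (7), and the SNR is arbitrary. It builds $F(k,l)$, checks the two forms of Eq. (6) against each other, and applies the filter as a pure DCT-coefficient scaling.

```python
import numpy as np

N = 8
# Orthonormal 8-point DCT-II matrix.
C8 = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
                np.cos((2 * n + 1) * k * np.pi / (2 * N))
                for n in range(N)] for k in range(N)])

# W(k,l): normalized diagonal of A R A^T, here from an assumed separable
# AR(1) covariance model (rho = 0.95).
rho = 0.95
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
w1 = np.diag(C8 @ R @ C8.T)
W = np.outer(w1, w1)
W *= W.size / W.sum()          # normalize to unit average (assumed convention)

# DC-only mean-estimation weights S1 of Eq. (7).
S = np.zeros((N, N))
S[0, 0] = 1.0

snr_inv = 0.25                                      # sigma_n^2 / sigma^2 (arbitrary)
P = 1.0 / (1.0 + snr_inv / W)                       # Eq. (4)
F = P * (1.0 - S) + S                               # Eq. (6), first form
F_closed = (1.0 + S * snr_inv / W) / (1.0 + snr_inv / W)   # Eq. (6), closed form

def filter_block(v):
    """Whole filter (mean handling included) as one DCT-coefficient scaling."""
    V = C8 @ v @ C8.T          # forward 2-D DCT
    return C8.T @ (F * V) @ C8 # scale coefficients, inverse-transform

flat = np.full((N, N), 5.0)    # constant block: only its DC coefficient is nonzero
flat_out = filter_block(flat)
```

Because $S_1$ forces $F(0,0) = 1$, a constant block passes through unchanged, while AC coefficients are attenuated the more strongly the smaller their $W(k,l)$ is relative to the noise level.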
All procedures are fully performed in the DCT domain. It is noted in Eq. (5) that the whole filtering operation can be merged into a unified scaling operation with a scaling matrix $F(k,l)$, which is determined by the signal-to-noise level, the corresponding covariance estimates, and the mean estimation scheme. Therefore, the remaining task is to find an appropriate $S(k,l)$ for mean estimation. Here, we choose two kinds of $S(k,l)$ that meet the assumption $\hat{M}(k,l) = S(k,l)\, V(k,l)$ in the DCT domain. The simplest and most efficient choice is to use the DC value of the $8 \times 8$ block data as the mean block data, i.e.,

$$S(k,l) = S_1(k,l) = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \qquad (7)$$

Another $S(k,l)$ is chosen by assuming that the mean block data is obtained by the convolution of the observed block data with the $5 \times 5$ averaging kernel

$$\mathrm{avg}(m,n) = \frac{1}{25} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}. \qquad (8)$$

For the convolution process, the pixels outside the $8 \times 8$ block are defined by mirroring the boundary pixels. Since the convolution kernel of Eq. (8) has a separable form, we can rewrite the averaging operation in matrix form as follows:

$$\hat{m} = h v h^T, \qquad (9)$$

where

$$h = \frac{1}{5} \begin{bmatrix} 2 & 2 & 1 & 0 & 0 & 0 & 0 & 0 \\ 2 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 & 1 & 2 & 2 \end{bmatrix}. \qquad (10)$$

If Eq. (9) is represented in the DCT domain, we obtain

$$\hat{M} = H V H^T, \qquad (11)$$

where $\hat{M} = C_8 \hat{m} C_8^T$, $V = C_8 v C_8^T$, and $H = C_8 h C_8^T$. Fortunately, for the averaging kernel of Eq. (8), $H$ becomes a diagonal matrix (Ng and Yip, 2001) and Eq. (11) is more simply described as
$$\hat{M}(k,l) = H(k,k)\, H(l,l)\, V(k,l) = S(k,l)\, V(k,l), \qquad (12)$$

where, since $S_2(k,l) = H(k,k)\,H(l,l)$ with $H(4,4)$, $H(5,5)$, and $H(6,6)$ negative,

$$S(k,l) = S_2(k,l) = \begin{bmatrix} 1.00000 & 0.85239 & 0.48284 & 0.07023 & -0.20000 & -0.23592 & -0.08284 & 0.11329 \\ 0.85239 & 0.72658 & 0.41157 & 0.05986 & -0.17048 & -0.20109 & -0.07061 & 0.09657 \\ 0.48284 & 0.41157 & 0.23314 & 0.03391 & -0.09657 & -0.11391 & -0.04000 & 0.05470 \\ 0.07023 & 0.05986 & 0.03391 & 0.00493 & -0.01405 & -0.01657 & -0.00582 & 0.00796 \\ -0.20000 & -0.17048 & -0.09657 & -0.01405 & 0.04000 & 0.04718 & 0.01657 & -0.02266 \\ -0.23592 & -0.20109 & -0.11391 & -0.01657 & 0.04718 & 0.05566 & 0.01954 & -0.02673 \\ -0.08284 & -0.07061 & -0.04000 & -0.00582 & 0.01657 & 0.01954 & 0.00686 & -0.00939 \\ 0.11329 & 0.09657 & 0.05470 & 0.00796 & -0.02266 & -0.02673 & -0.00939 & 0.01283 \end{bmatrix}. \qquad (13)$$

3. Proposed video encoder for noisy video sequences

3.1. DCT-domain pre-filtering inside a video encoder

We have observed that the approximated generalized Wiener filtering can be performed by scaling transformed coefficients even for non-zero-mean image data. From this observation, we investigate how the pre-filtering operation is integrated into a video encoder. Fig. 3 depicts the overall encoding structure for intra blocks. When we combine the pre-filtering scheme with a video encoder, we note that the inverse DCT (IDCT) in the Wiener filtering procedure is canceled by the DCT in the video encoder. (This is based on the assumption that the DCT is chosen as the unitary transform for approximated generalized Wiener filtering.) This means that only one DCT operation is needed when approximated generalized Wiener filtering is merged into the video encoder. The concept of Fig. 3 is also valid for processing inter blocks, on the assumption that the motion-compensated prediction data, $p(m,n)$, does not contain input noise.

Fig. 3. Proposed architecture for encoding intra blocks.

Fig. 4. Proposed encoder architecture for encoding inter blocks.
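The diagonalization claimed in Eqs. (11)–(13) is easy to verify numerically: with mirrored boundaries, the separable 5-tap average is represented exactly by a diagonal $H$ in the DCT domain, and $S_2$ is the outer product of that diagonal with itself. The following sketch is our own check, not part of the paper; it reconstructs $h$, $H$, and $S_2$.

```python
import numpy as np

N = 8
# Orthonormal 8-point DCT-II matrix C8.
C8 = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
                np.cos((2 * n + 1) * k * np.pi / (2 * N))
                for n in range(N)] for k in range(N)])

def mirror(j, n=N):
    """Reflect an out-of-range index across the block boundary (pixel mirroring)."""
    if j < 0:
        return -1 - j
    if j >= n:
        return 2 * n - 1 - j
    return j

# Build the matrix h of Eq. (10): 5-tap averaging with mirrored boundaries.
h = np.zeros((N, N))
for i in range(N):
    for j in range(i - 2, i + 3):
        h[i, mirror(j)] += 1.0 / 5.0

H = C8 @ h @ C8.T              # Eq. (11): H = C8 h C8^T (diagonal)
d = np.diag(H)
S2 = np.outer(d, d)            # Eq. (13): S2(k,l) = H(k,k) H(l,l)
```

The diagonal `d` reproduces the first row of the matrix in Eq. (13), e.g. `d[1]` ≈ 0.85239 and `d[4]` = −0.2, and the off-diagonal entries of `H` vanish to machine precision, confirming the result of Ng and Yip (2001).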
Therefore, the overall architecture of the conventional video encoder is maintained, except that the transformed coefficients are scaled with a unified weighting matrix, $F(k,l)$. Fig. 4 depicts the block diagram for processing inter blocks. It should be noted from Eq. (6) that $F(k,l)$ depends on the block mode, because the covariance estimates are different for intra and inter blocks.

3.2. Merging pre-filtering into the scaled DCT

The scaled DCT is a well-known fast DCT algorithm that factorizes the DCT into a core operation, a scaling operation, and a permutation operation (Feig and Winograd, 1992). Since approximated generalized Wiener filtering can be performed by scaling the transformed coefficients, it may be possible to optimize it further by combining the scaling operation for filtering with the scaling operation of the DCT itself. In addition, the permutation operation can be merged by properly modifying the zigzag scanning order. According to the scaled DCT, an $8 \times 8$ DCT matrix, $C_8$, can be factorized as

$$C_8 = P_8\, D_8\, R_{8,1}\, M_8\, R_{8,2}, \qquad (14)$$

where $R_{8,2} = \tilde{B}_1 B_2 B_3$. Here, $P_8$ is a permutation matrix; $R_{8,1}$, $\tilde{B}_1$, $B_2$, and $B_3$ are sparse butterfly matrices whose nonzero entries are $\pm 1$; $M_8$ is a sparse matrix whose nonzero entries are $1$, $c(2)$, $c(4)$, and $c(6)$; and

$$D_8 = \frac{1}{8}\,\mathrm{diag}\!\left(\sqrt{2},\; c^{-1}(4),\; c^{-1}(6),\; c^{-1}(2),\; c^{-1}(5),\; c^{-1}(1),\; c^{-1}(3),\; c^{-1}(7)\right), \qquad (15)$$

with $c(k) = \cos(2\pi k/32)$. (The explicit entries of the remaining factor matrices are given in Feig and Winograd (1992).) It should be noted that $C_8$ can be easily implemented by scaling and permuting the result of $R_{8,1} M_8 R_{8,2}$. In addition, Eq. (14) can be expanded to the two-dimensional DCT, i.e.,

$$C_8 \otimes C_8 = (P_8 D_8 R_{8,1} M_8 R_{8,2}) \otimes (P_8 D_8 R_{8,1} M_8 R_{8,2}) = \left((P_8 D_8) \otimes (P_8 D_8)\right)\left((R_{8,1} M_8 R_{8,2}) \otimes (R_{8,1} M_8 R_{8,2})\right) = (P_8 \otimes P_8)(D_8 \otimes D_8)(R_{8,1} \otimes R_{8,1})(M_8 \otimes M_8)(R_{8,2} \otimes R_{8,2}) = P D (R_1 M R_2). \qquad (16)$$

Therefore, $C_8 \otimes C_8$ can also be implemented by the two-dimensional scaling and permuting of $R_1 M R_2$. As shown in Fig. 5, the scaling factor residing in the scaled DCT, $D(k',l')$, and the scaling factor for pre-filtering, $F(k',l')$, can be merged into a unified scaling term, $H(k',l')$. Here, the coordinates $(k',l')$ are used instead of $(k,l)$.

Fig. 5. Joint optimization with the scaled DCT.
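Eq. (16) relies only on the mixed-product property of the Kronecker product, $(A \otimes A)(B \otimes B) = (AB) \otimes (AB)$. A quick numpy check, with random stand-ins for the five factors (illustrative only; the true factors are the sparse Feig–Winograd matrices), confirms that the 2-D operator factors the same way as the 1-D one:

```python
import numpy as np

rng = np.random.default_rng(0)
# Random 8x8 stand-ins for P8, D8, R81, M8, R82 (the real factors are sparse).
P8, D8, R81, M8, R82 = (rng.standard_normal((8, 8)) for _ in range(5))

C8 = P8 @ D8 @ R81 @ M8 @ R82                    # Eq. (14)

# Left side of Eq. (16): the 2-D operator built directly.
lhs = np.kron(C8, C8)

# Right side: factor by factor, via the mixed-product property.
rhs = (np.kron(P8, P8) @ np.kron(D8, D8) @ np.kron(R81, R81)
       @ np.kron(M8, M8) @ np.kron(R82, R82))
```

Since `lhs` and `rhs` agree, the 2-D scaled DCT inherits the same core/scale/permute split as the 1-D factorization, which is what allows the filter gains to be folded into the scaling stage.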
This is because the permutation operation still remains after the scaling, to complete the scaled DCT. However, since the permutation operation just alters the absolute positions of the transformed coefficients, it can be combined with the zigzag scanning operation in the VLC, which also alters coefficient positions (see Fig. 6(a)). The combined scanning order is given in Fig. 6(b). In summary, $H(k',l')$, which simultaneously performs the filtering and the scaling of the scaled DCT, can be described as

$$H(k',l') = D(k',l')\, F(k',l') = D(k',l')\, \frac{1 + S(k',l')\,(\sigma_n^2/\sigma^2)(1/W(k',l'))}{1 + (\sigma_n^2/\sigma^2)(1/W(k',l'))}, \qquad (17)$$

where $\sigma^2 = \max(\sigma_z^2 - \sigma_n^2,\, 0)$ and

$$\sigma_z^2 = \sum_{k=0}^{7}\sum_{l=0}^{7} Z^2(k,l) = \sum_{k'=0}^{7}\sum_{l'=0}^{7} (1 - S(k',l'))^2\, D^2(k',l')\, V_c^2(k',l'). \qquad (18)$$

Here, $V_c(k',l')$ denotes the transformed coefficients obtained just after the operation $R_1 M R_2$ is completed in the scaled DCT. It should be noted that $D(k',l')$ and $S(k',l')$ have constant values, and that $W(k',l')$, the normalized elements on the diagonal of $A R A^T$, becomes constant if the covariance matrix of $y$, $R$, is estimated as a constant. Therefore, the operations related to $D(k',l')$, $S(k',l')$, and $W(k',l')$ in Eqs. (17) and (18) can be pre-computed and stored in a small memory. As noticed above, the increase in computational complexity caused by integrating the pre-filtering process into the DCT is only due to the calculation of $H(k',l')$, which consists of $64 \times 2$ multiplications and additions and 64 divisions, in addition to the estimation of the SNR $\sigma^2/\sigma_n^2$. Also note that the computational complexity of the SNR estimation can be controlled depending on the desired filtering performance.

Fig. 6. (a) The normal zigzag scanning order, and (b) the combined scanning order for the permutation of the scaled DCT and the zigzag scanning of the VLC.

4.
Simulation results

We have used an H.263 video encoder for the simulations. Only the first frame was encoded as an intra frame and the other frames were encoded as inter frames. All optional modes of H.263 were turned off. Noisy video sequences were obtained by corrupting original MPEG-4 test sequences with additive white Gaussian noise (AWGN). We used a fixed quantization parameter for all intra and inter blocks and encoded 300 frames at a frame rate of 10 Hz. The Hall monitor, Mother and daughter, Foreman, and Coast guard sequences in the QCIF format were used as test video sequences. In order to quantitatively measure the amount of noise added to the original image and the visual quality of the reconstructed images, $\mathrm{SNR} = 10 \log_{10}(\sigma^2/\sigma_n^2)$ and $\mathrm{PSNR} = 10 \log_{10}(255^2/\sigma_n^2)$ are used, respectively. Here, $\sigma^2$ is the variance of the desired (or original) image, and $\sigma_n^2$ is the variance of the difference between the desired image and the acquired (or noisy) image. To obtain $H(k',l')$ in Eq. (17), we need to find $W(k,l)$, and these can easily be obtained if the covariance matrix $R$ is known. Hence, to find realistic covariance estimates for mean-subtracted intra and inter blocks, we use an estimate of the covariance matrix for both horizontal and vertical directions, i.e. (Niehsen and Brünig, 1999),

$$\hat{R}_{1D} = \frac{8}{2\,\mathrm{trace}(Y_s^T Y_s)}\left(Y_s^T Y_s + Y_s Y_s^T\right), \qquad (19)$$

where $Y_s$ denotes the $8 \times 8$ matrix form of the mean-subtracted block data, $y$. Note that Eq. (19) coincides with the unit-variance normalization for Toeplitz covariance matrices in the wide-sense stationary case. In order to subtract block-based mean data from intra and inter blocks, we try the two mean estimation methods corresponding to Eqs. (7) and (13), and name them method 1 and method 2, respectively. For covariance estimation, we use four MPEG-4 sequences, Akiyo, Container ship, News, and Silent voice in the QCIF format, which are different from the test sequences used for filtering.
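The estimator of Eq. (19) averages the column-wise and row-wise sample covariances of a block and normalizes them to unit variance per sample. A direct numpy transcription (our own sketch, applied here to a random stand-in block) is:

```python
import numpy as np

def cov_1d(Ys):
    """Normalized 1-D covariance estimate of Eq. (19) for an 8x8
    mean-subtracted block Ys, averaging horizontal and vertical directions."""
    G = Ys.T @ Ys + Ys @ Ys.T
    return 8.0 * G / (2.0 * np.trace(Ys.T @ Ys))

rng = np.random.default_rng(1)
Ys = rng.standard_normal((8, 8))
Ys -= Ys.mean()                 # mean-subtracted block data
R_hat = cov_1d(Ys)
```

By construction, the trace of the estimate is exactly 8 (unit variance for each of the 8 samples), which matches the normalization assumed for $W(k,l)$.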
Each sequence has 300 frames and is encoded at a frame rate of 10 Hz with a fixed quantization parameter of 2. First, for mean-subtracted intra and inter blocks, we obtain normalized covariance estimates frame by frame through frame-based averaging. Then, we examine their minimal, maximal, and averaged values over all the corresponding frames in the four sequences, and plot them in Fig. 7 for intra blocks and Fig. 8 for inter blocks, respectively. In the simulation of the proposed DCT-domain Wiener filtering, we use the averaged covariance estimates, as in Niehsen and Brünig (1999), to obtain a constant $W(k',l')$. Also, the mean-subtracted image variance $\sigma^2$ is updated block by block.

Fig. 7. Normalized covariance estimates after frame-based averaging for intra blocks. Depicted are minimal, maximal, and averaged block covariance estimates for all the intra frames in the four sequences.

Fig. 8. Normalized covariance estimates after frame-based averaging for inter blocks. Depicted are minimal, maximal, and averaged block covariance estimates for all the inter frames in the four sequences.

For comparison with the proposed embedded Wiener filtering in the DCT domain, we also perform simulations with the spatial-domain adaptive Wiener filtering in Lim (1990), which is defined as

$$\hat{u}(n_1,n_2) = \hat{m}(n_1,n_2) + \frac{\hat{\sigma}^2(n_1,n_2)}{\hat{\sigma}^2(n_1,n_2) + \sigma_n^2}\,\left(m(n_1,n_2) - \hat{m}(n_1,n_2)\right), \qquad (20)$$

where

$$\hat{m}(n_1,n_2) = \frac{1}{(2M+1)^2} \sum_{k_1=n_1-M}^{n_1+M}\; \sum_{k_2=n_2-M}^{n_2+M} m(k_1,k_2),$$

$$\hat{\sigma}^2(n_1,n_2) = \max\!\left(\hat{\sigma}_z^2(n_1,n_2) - \sigma_n^2,\, 0\right),$$

and

$$\hat{\sigma}_z^2(n_1,n_2) = \frac{1}{(2M+1)^2} \sum_{k_1=n_1-M}^{n_1+M}\; \sum_{k_2=n_2-M}^{n_2+M} \left(m(k_1,k_2) - \hat{m}(n_1,n_2)\right)^2. \qquad (21)$$

Here, $\hat{u}(n_1,n_2)$ denotes the filtered result of the observed noisy image, $m(n_1,n_2)$. For video encoding based on spatial-domain Wiener filtering, each frame of a noisy video sequence is filtered and then fed to the input of the conventional video encoder. In this simulation, $M$ is set to 2, and the noise variance is assumed to be known.

Figs. 9 and 10 show the encoding performance for intra frames corrupted with noise of 20 and 10 dB SNR, respectively. From the figures, we can see that the encoding performance for intra frames depends somewhat on the image characteristics and the amount of noise. For smoother images, the spatial-domain Wiener filtering gives slightly better results. On the contrary, for complex images that have a large amount of detail, such as Coast guard, the spatial-domain Wiener filtering gives somewhat worse results. In a general sense, we can say that the three filtering methods give comparable results in terms of PSNR, but much better results than no filtering. It is also interesting to note in Fig. 10 that, in the case of no filtering, the encoding performance is better at low bitrates. This is because the coarse quantization at low bitrates provides a low-pass filtering effect. Fig. 11 shows encoded results for the intra frame of the Foreman sequence, which is corrupted with noise of 10 dB SNR. It is noticed that the spatial-domain filtering needs more coding bits than the proposed method for a fixed quantization parameter.

Fig. 9. Encoding performance of intra frames. Each frame has an SNR of 20 dB. (a) Hall monitor, (b) Mother and daughter, (c) Foreman, and (d) Coast guard sequences.

Fig. 10. Encoding performance of intra frames. Each frame has an SNR of 10 dB. (a) Hall monitor, (b) Mother and daughter, (c) Foreman, and (d) Coast guard sequences.
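The reference filter of Eqs. (20) and (21) used for comparison is straightforward to implement with sliding-window statistics. The following sketch is our own implementation, not the paper's code; the borders are mirrored, which we assume to match the mirroring convention of Eq. (10), and the toy input is a hypothetical flat patch plus Gaussian noise.

```python
import numpy as np

def box_mean(x, M):
    """(2M+1) x (2M+1) sliding average with mirrored borders."""
    size = 2 * M + 1
    p = np.pad(x, M, mode='symmetric')
    acc = np.zeros_like(x, dtype=float)
    for i in range(size):
        for j in range(size):
            acc += p[i:i + x.shape[0], j:j + x.shape[1]]
    return acc / size ** 2

def adaptive_wiener(m, sigma_n2, M=2):
    """Spatial-domain adaptive Wiener filter of Eqs. (20) and (21)."""
    m_hat = box_mean(m, M)                    # local mean
    sz2 = box_mean(m * m, M) - m_hat ** 2     # local variance, Eq. (21)
    s2 = np.maximum(sz2 - sigma_n2, 0.0)      # signal variance estimate, clamped
    return m_hat + s2 / (s2 + sigma_n2) * (m - m_hat)   # Eq. (20)

# Toy usage: a flat 64x64 patch corrupted by Gaussian noise of variance 100.
rng = np.random.default_rng(2)
noisy = 128.0 + rng.normal(0.0, 10.0, (64, 64))
out = adaptive_wiener(noisy, sigma_n2=100.0)
```

In flat regions $\hat{\sigma}^2 \approx 0$ and the output collapses to the local mean, while around strong edges $\hat{\sigma}^2 \gg \sigma_n^2$ and the pixel passes through almost unchanged, which is the edge-preserving behavior of the adaptive filter.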
From the images, we can see that the proposed pre-filtering methods incur blocking artifacts at the $8 \times 8$ block boundaries due to their block-based processing. The blocking artifacts are more noticeable in method 1 than in method 2. As the amount of noise becomes lower, the blocking artifacts become less visible. Also, as the quantization step becomes coarser, the major source of blocking artifacts changes from block-based pre-filtering to block-based quantization. These blocking artifacts can be effectively alleviated by using the post-filtering technique recommended in MPEG-4 (MPEG-4, 1997; Kim et al., 1999).

Fig. 11. Encoded images of the intra frame in the Foreman sequence, which is corrupted with noise of 10 dB SNR. A fixed quantization parameter of 2 is used. (a) No filtering (24.0 dB, 195 kbits), (b) spatial-domain adaptive Wiener filtering (29.2 dB, 109 kbits), (c) method 1 (29.0 dB, 87 kbits), and (d) method 2 (29.2 dB, 77 kbits).

Figs. 12 and 13 show the encoding performance for inter frames corrupted with noise of 20 and 10 dB SNR, respectively. When compared with the intra frame results, it is interesting to note that the proposed pre-filtering performed inside the video encoder generally gives better results than the spatial-domain adaptive Wiener filtering performed at the input stage of a video encoder. That is, the proposed method gives better results for inter frames than for intra frames, perhaps because it utilizes the temporal correlation captured by motion-compensated prediction for inter frames. The gap between the PSNR value of the proposed method and that of the spatial filtering increases as the SNR becomes lower. Unlike the intra frame results, where performance depends considerably on the image characteristics, the best inter frame results are generally produced by method 1. When we consider both intra and inter frame filtering, however, the difference between method 1 and method 2 is not significant. From these results, we may conclude that the selection of $S(k,l)$ has a marginal impact on filtering performance. Note in Fig. 13 that, in the case of no filtering, the PSNR of the noisy video also decreases as the bitrate increases, as in Fig. 10.

Fig. 12. Encoding performance of inter frames. Each frame has an SNR of 20 dB. (a) Hall monitor, (b) Mother and daughter, (c) Foreman, and (d) Coast guard sequences.

Fig. 13. Encoding performance of inter frames. Each frame has an SNR of 10 dB. (a) Hall monitor, (b) Mother and daughter, (c) Foreman, and (d) Coast guard sequences.

Fig. 14 shows inter-coded images of the 30th frame of the Foreman sequence for a fixed quantization parameter. From these figures, it can be seen that the proposed method provides more pleasant image quality with fewer coding bits than the spatial-domain Wiener filtering. It is also interesting to note that, unlike in intra frames, the blocking artifacts are not noticeable in inter frames. Since motion-compensated inter blocks are more noise-like than intra blocks, block-based scaling of DCT coefficients incurs blocking artifacts more severely in intra blocks than in inter blocks.

Fig. 14. Encoded images of the 30th frame in the Foreman sequence, which is corrupted with noise of 10 dB SNR. A fixed quantization parameter of 2 is used. (a) No filtering (24.7 dB, 1774 kbps), (b) spatial-domain adaptive Wiener filtering (29.6 dB, 892 kbps), (c) method 1 (30.2 dB, 466 kbps), and (d) method 2 (30.1 dB, 494 kbps).

In all the simulations above, we assume the baseline of H.263. However, even when the advanced modes of H.263 are turned on, the overall tendency of the performance curves for the applied pre-filtering methods is quite similar to that of the baseline, apart from a slight improvement in coding efficiency.

5.
Conclusions

A pre-filtering scheme tightly coupled with a conventional encoder structure has been proposed. Unlike the conventional method that cascades a pre-filter and a video encoder, the proposed method operates by scaling the transformed coefficients of original image blocks and motion-compensated error blocks inside a video encoder. For this filtering, we adopted approximated generalized Wiener filtering together with a DCT-domain mean estimation scheme for non-zero-mean image data, in order to perform the whole operation with a single scaling step. In addition, covariance estimates of mean-subtracted intra and inter blocks were examined and utilized for their filtering. Since the DCT is mandatory in many block-based video coders, and the whole filtering process requires only scaling operations on transformed coefficients rather than convolution operations, the overall architecture of the conventional video encoder is maintained. Also, the increase in computational complexity due to pre-filtering is marginal for given SNR estimates. The proposed method proves to provide good filtering and encoding performance, especially for inter frames, when compared with the cascade of a spatial-domain adaptive Wiener filter and a video encoder. Considering the structural simplicity and good coding performance of the proposed encoder, it can be very useful for low-cost video coding applications in noisy environments.

References

Al-Shaykh, O.K., Mersereau, R.M., 1998. Lossy compression of noisy images. IEEE Trans. Image Processing 7 (12), 1641–1652.
Vasconcelos, N., Dufaux, R., 1997. Pre and post-filtering for low bit-rate video coding. In: Proceedings of the International Conference on Image Processing, vol. 2, Santa Barbara, CA, pp. 291–294.
Lim, J.S., 1990. Two-Dimensional Signal and Image Processing. Prentice-Hall.
Boo, K.J., Bose, N.K., 1998.
A motion-compensated spatio-temporal filter for image sequences with signal-dependent noise. IEEE Trans. Circuits Syst. Video Technol. 8 (3), 287–298.
Jónsson, R.H., 1997. Efficient DCT domain implementation of picture masking and compositing. In: Proceedings of the International Conference on Image Processing, vol. 2, Santa Barbara, CA, pp. 366–369.
Merhav, N., Bhaskaran, V., 1997. Fast algorithms for DCT-domain image down-sampling and for inverse motion compensation. IEEE Trans. Circuits Syst. Video Technol. 7 (3), 468–476.
Jain, A.K., 1989. Fundamentals of Digital Image Processing. Prentice-Hall.
Pratt, W.K., 1972. Generalized Wiener filter computation techniques. IEEE Trans. Comput. 21 (7), 636–641.
Feig, E., Winograd, S., 1992. Fast algorithms for the discrete cosine transform. IEEE Trans. Signal Processing 40 (9), 2174–2193.
Niehsen, W., Brünig, M., 1999. Covariance analysis of motion-compensated frame differences. IEEE Trans. Circuits Syst. Video Technol. 9 (4), 536–539.
Chen, C.-F., Pang, K.K., 1993. The optimal transform of motion-compensated frame difference images in a hybrid coder. IEEE Trans. Circuits Syst. II: Analog Digital Signal Processing 40 (6), 393–397.
Ng, M.K., Yip, A.M., 2001. A fast MAP algorithm for high-resolution image reconstruction with multisensors. Multidimensional Systems and Signal Processing 12, 143–164.
MPEG-4 video verification model V.8.0, ISO/IEC JTC 1/SC 29/WG 11/N1796, July 1997.
Kim, S.D., Yi, J., Kim, H.M., Ra, J.B., 1999. A deblocking filter with two separate modes in block-based video coding. IEEE Trans. Circuits Syst. Video Technol. 9 (1), 156–160.

SUNG DEUK KIM received the B.S. degree in electronics engineering from Kyungpook National University, Korea, in 1994, and the M.S. and Ph.D. degrees in electrical engineering from KAIST, Korea, in 1996 and 2000, respectively. From 2000 to 2003, he was employed by LG Electronics, Co., Ltd.
Since March 2003, he has been a faculty member in the Department of Electronics Engineering Education at Andong National University, Korea. His research interests include image and video processing.

JONG BEOM RA received the B.S. degree in electronic engineering from Seoul National University in 1975, and the M.S. and Ph.D. degrees in electrical engineering from KAIST, Korea, in 1977 and 1983, respectively. From 1983 to 1987, he was a member of the faculty at Columbia University, New York, engaged in the development of medical imaging systems such as high-field magnetic resonance imaging and spherical positron emission tomography systems. In July 1987, he joined the Department of Electrical Engineering and Computer Science at KAIST, where he is now a professor. His research interests are digital image processing, video signal processing, 3-D visualization, and medical imaging such as MRI.