Image and Vision Computing 24 (2006) 1269–1277
www.elsevier.com/locate/imavis

Dominant colour extraction in DCT domain

Jianmin Jiang a,b,*, Ying Weng b, PengJie Li c
a Southwest University, China
b University of Bradford, UK
c Institute of Acoustics, Chinese Academy of Sciences

Received 23 March 2004; received in revised form 11 January 2006; accepted 5 April 2006

Abstract
As most MPEG and JPEG compression standards are DCT-based, we propose a simple, low-cost and fast algorithm that extracts dominant colour features directly in the DCT domain, without full decompression to access the pixel data. As dominant colour is one of the colour features used to construct the MPEG-7 dominant colour descriptor, the proposed algorithm provides a useful technique for already compressed videos and images for which MPEG-7 dominant colour descriptors are needed. The proposed algorithm is computationally efficient, eliminating the need for the IDCT on compressed videos and images. Extensive experiments also indicate that it extracts dominant colour features with performance competitive with the pixel-domain extraction described in MPEG-7.
Crown Copyright © 2006 Published by Elsevier B.V. All rights reserved.

Keywords: Dominant colour features; MPEG-7; Feature extraction in compressed domain

1. Introduction
Along with the digital revolution and the success of the international standardization of image compression techniques, a rapid increase in the volume of images and videos can be seen almost everywhere, including media publications, Web-based publications and various electronic services. This rich source of visual information in our daily lives has created the need for algorithms and techniques to browse, search, retrieve and manage the content of compressed digital images and videos.
To this end, research on content-based image indexing and retrieval is receiving tremendous interest within both the academic community and industry [1,2]. Since most of this research is carried out in the pixel domain, where input images are assumed to be in their original raw format, full decompression has to be performed before any content description or content-based management developed in the pixel domain can be applied. In practice, it may be argued that this computing overhead does not really present a problem, since full decompression is fast and its algorithms are efficiently implemented. Thus, it could be accepted that the cost of the IDCT, or indeed of full decompression, will not be a bottleneck for the management of compressed images. When millions of compressed images need to be accessed, however, the computing cost of full decompression can become an issue that deserves attention and better solutions. As a result, a new wave of academic research is emerging on direct content access and feature extraction in the compressed domain rather than in the pixel domain [10–13]. Following the successful development of the MPEG-1, MPEG-2 and MPEG-4 standards, MPEG has developed a new standard, MPEG-7 [4–8], formally known as the Multimedia Content Description Interface, to describe multimedia content. While the previous standards focus on the coding and representation of audio–visual content, MPEG-7 focuses on content description across various modalities, including image, video, audio, speech, graphics and their combinations. Following the research efforts in content-based image indexing and retrieval, the MPEG-7 visual standard for content description has developed a range of descriptors corresponding to common features, including colour, texture, shape and motion [5,7,8].
* Corresponding author (J. Jiang).
ISSN 0262-8856. doi:10.1016/j.imavis.2006.04.009
Among these features, the colour descriptors mainly comprise the scalable colour descriptor (SCD), the dominant colour descriptor (DCD) [5,9], the colour layout descriptor and the group-of-pictures colour descriptor. The DCD is mainly designed for fast and efficient image browsing and retrieval by describing the local spatial colour distribution at a global level. For a given image region, the colours are clustered into a small number of representative colours, producing a compact representation of the colour distribution inside the region. In contrast to histogram-based approaches, the DCD contains only the representative colours, their percentages in the region, the spatial coherency of the colours and the colour variances. To provide an efficient description for already compressed images and videos, we propose in this paper a simple dominant colour extraction algorithm that works directly in the DCT domain to formulate the MPEG-7 DCD. In comparison with the existing state of the art, i.e. MPEG-7 and other colour-based descriptions of image and video content published in the literature, our contribution and its novelty can be summarised as follows: whereas all existing work extracts dominant colour features in the pixel domain, the proposed algorithm extracts them directly in the DCT domain. When millions of compressed images and video frames need to be accessed through their dominant colour descriptions, the proposed technique works directly on their compressed format without full decompression, whereas existing techniques, including MPEG-7, need full decompression to obtain the pixel data before the dominant colour features can be extracted. The remainder of the paper is divided into two further sections.
Section 2 describes the design of the proposed algorithm in detail, and Section 3 covers its assessment and evaluation through extensive experiments. Some concluding remarks are also given in Section 3.

2. Dominant colour extraction in DCT domain
MPEG-7 dominant colour extraction is mainly based on the work of Deng et al. [9]. This descriptor captures the salient colours in images or regions in the pixel domain through a colour clustering process. Accordingly, the descriptor is defined to include the representative colours, their percentages in the region, the spatial coherency of the dominant colours and the colour variance of each dominant colour. Its full definition is given below:

$$F = \{\{c_i, p_i, v_i\}, s\}, \quad i = 1, 2, \ldots, N \qquad (1)$$

where $c_i$ is the $i$th dominant colour, $p_i$ its percentage value, $v_i$ its colour variance, $s$ the spatial coherency and $N$ the number of dominant colours. It is reported that this descriptor achieves competitive performance in comparison with high-dimensional histogram methods [7] when used for image indexing in the 3D colour space. Although significant advantages have been demonstrated for this descriptor in terms of colour representation and low dimensionality, it cannot be applied to compressed images without decompression. The reason is that it operates only in the pixel domain, and reaching the pixel domain requires a sequence of operations including entropy decoding and the IDCT. On the other hand, detailed analysis of DCT coefficients reveals that they carry many statistical features reflecting the same content and nature of the image as the pixels do. The DC coefficient represents the average intensity of a block of pixels, whereas the AC coefficients reflect the variation of colour among the pixels within the block. When arranged in zig-zag order, the AC coefficients also correspond to frequency components in ascending order.
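As a small illustration of the two properties above, the Python sketch below builds the zig-zag scan order of an 8×8 block and computes a 2D DCT directly from its definition. It assumes the orthonormal DCT-II normalisation, under which the DC coefficient equals N times the block mean. JPEG uses the same scaling, apart from its level shift of the pixel values. The helper names `zigzag_order`, `dct_matrix` and `dct2` are ours, not the paper's.

```python
import math


def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG-style zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes the anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Odd anti-diagonals are scanned with the row increasing,
        # even ones with the row decreasing.
        order.extend(diag if s % 2 else reversed(diag))
    return order


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
            for k in range(n)]


def dct2(block):
    """2D DCT of a square block, computed as T @ block @ T^T."""
    n = len(block)
    t = dct_matrix(n)
    tmp = [[sum(t[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]
    return [[sum(tmp[k][j] * t[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]


# Hypothetical block: flat grey (100) with two brighter pixels (140).
block = [[100.0] * 8 for _ in range(8)]
block[0][0] = block[0][1] = 140.0
coeffs = dct2(block)
mean = sum(map(sum, block)) / 64
print(round(coeffs[0][0], 6), round(8 * mean, 6))  # DC = N * mean: 810.0 810.0
# The 63 AC coefficients, from low to high frequency.
ac_in_zigzag = [coeffs[r][c] for r, c in zigzag_order()[1:]]
```

Under a different DCT scaling, the DC-to-mean factor changes, but the zig-zag order does not.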
Following this analysis of the relationship between DCT coefficients and the image content in pixels, each block of 8×8 pixels, as used in JPEG [3] and MPEG, can be classified as a dominant colour block or a non-dominant colour block. The number of non-zero AC coefficients is directly related to the variance of the colour components among the pixels. It can therefore serve as a parameter that controls this classification. To improve accuracy, a further parameter can be added to indicate the position of the last non-zero AC coefficient along the zig-zag order, since it represents the highest frequency component present in the block. Therefore, blocks can be classified as dominant colour blocks, non-dominant colour blocks or uncertain blocks via the following mechanism in the DCT domain:

$$\text{block}_i = \begin{cases} \text{dominant\_colour}, & \text{if } N_{nz} < T_1 \text{ and } P < T_2 \\ \text{uncertain}, & \text{otherwise} \end{cases} \qquad (2)$$

where $\text{block}_i$ is the $i$th block of the image in raster-scan order, $N_{nz}$ is the number of non-zero AC coefficients in $\text{block}_i$, $P$ is the position of the last non-zero AC coefficient along the zig-zag order in $\text{block}_i$, and $T_1$ and $T_2$ are the first two thresholds, introduced to identify dominant colour blocks. For uncertain blocks, there are two possibilities: either the block is a non-dominant colour block, or it contains dominant colour in smaller regions. To identify the definitely non-dominant colour blocks, controlling the number of non-zero AC coefficients remains sufficient. One extreme case is a block in which all 63 AC coefficients are non-zero.
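The two features used in (2) can be read off a block of quantised DCT coefficients in a single zig-zag pass. The sketch below is a minimal Python illustration under several assumptions. The block is taken to be an 8×8 list of integers, such as quantised JPEG coefficients. P counts zig-zag positions 1 to 63 for the AC coefficients and is 0 when none is non-zero. The thresholds are T1 = 12 and T2 = 20, the values reported in this section. The function names `ac_features` and `classify_eq2` are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order


def ac_features(coeffs):
    """Return (N_nz, P) for an 8x8 block of quantised DCT coefficients.

    N_nz: number of non-zero AC coefficients.
    P: zig-zag position (1..63) of the last non-zero AC coefficient, 0 if none.
    """
    n_nz, last = 0, 0
    for pos, (r, c) in enumerate(zigzag_order(len(coeffs))):
        if pos > 0 and coeffs[r][c] != 0:  # position 0 is the DC coefficient
            n_nz += 1
            last = pos
    return n_nz, last


T1, T2 = 12, 20  # threshold values reported in the text


def classify_eq2(coeffs, t1=T1, t2=T2):
    """Eq. (2): 'dominant_colour' if N_nz < T1 and P < T2, else 'uncertain'."""
    n_nz, p = ac_features(coeffs)
    return "dominant_colour" if n_nz < t1 and p < t2 else "uncertain"


smooth = [[0] * 8 for _ in range(8)]
smooth[0][0], smooth[0][1], smooth[1][0] = 800, -3, 2  # DC plus two low-frequency ACs
busy = [[1] * 8 for _ in range(8)]                     # all 63 AC coefficients non-zero
print(ac_features(smooth), classify_eq2(smooth))  # (2, 2) dominant_colour
print(ac_features(busy), classify_eq2(busy))      # (63, 63) uncertain
```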
Accordingly, the following condition can be set up to identify definitely non-dominant colour blocks:

$$\text{block}_i = \begin{cases} \text{non\_dominant\_colour}, & \text{if } N_{nz} > T_3 \\ \text{uncertain}, & \text{otherwise} \end{cases} \qquad (3)$$

For the latter case, in which dominant colour may exist in smaller regions, we can use our previous work [13] to decompose the 8×8 block into four sub-blocks of 4×4 coefficients in the DCT domain. Specifically, given a block of 8×8 DCT coefficients represented by [C], the matrix of the four sub-blocks of 4×4 DCT coefficients can be derived from the following equation [13]:

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = 2 \times [A] \times [C] \times [A]^{T} \qquad (4)$$

where $C_{ij}$ is the 4×4 matrix of DCT coefficients of the decomposed sub-block in position $(i, j)$, and [A] is an 8×8 parameter matrix whose values are given in Table 1.

Table 1
Parameter matrix [A]
 0.5      0        0        0        0.5      0        0        0
 0.4531   0.2079  -0.0373   0.0114  -0.4531   0.2079   0.0373   0.0114
 0        0.5      0        0        0       -0.5      0        0
 0        0        0.5      0        0        0        0.5      0
-0.1591   0.3955   0.2566  -0.0488   0.1591   0.3955  -0.2566  -0.0488
 0.1063  -0.1762   0.3841   0.2452  -0.1063  -0.1762  -0.3841   0.2452
 0        0        0        0.5      0        0        0       -0.5
-0.0901   0.1389  -0.1877   0.4329   0.0901   0.1389   0.1877   0.4329

As each sub-block now contains only 16 coefficients (one DC and 15 AC coefficients), we propose to use the square sum of the AC coefficients, instead of (2), to identify dominant colour sub-blocks:

$$\text{sub\_block}_i = \begin{cases} \text{dominant\_colour}, & \text{if } S < T_4 \\ \text{non\_dominant}, & \text{otherwise} \end{cases} \qquad (5)$$

where $S = \sum_{k=1}^{15} AC_k^2$ is the square sum of the 15 AC coefficients of the 4×4 sub-block, and $T_4$ is the fourth threshold introduced for block classification. As the block decomposition in the DCT domain can be repeated until a single coefficient is reached [13], the classification can be further refined to identify dominant colour at the level of 2×2 blocks and even of individual coefficients, the latter being equivalent to MPEG-7 extraction in the pixel domain. In practice, however, we found that one level of decomposition gives sufficient accuracy for dominant colour extraction.

After all the blocks are classified, the non-dominant colour blocks and sub-blocks are discarded, and the mean value of the pixels inside each dominant colour block is calculated in the DCT domain to construct the MPEG-7 descriptor as well as the histogram for dominant colour analysis. According to the DCT equations [8], the mean value of the pixels inside each block can be obtained in the DCT domain as follows:

$$m = \frac{1}{N}\, C(0,0) \qquad (6)$$

where $N = 8$ for dominant colour blocks and $N = 4$ for dominant colour sub-blocks. In this way, each dominant colour block or sub-block is represented by a single value, which can also be used to construct a so-called dominant colour image. A few such images are illustrated in Fig. 4.

The above design requires four thresholds, which can in principle be determined empirically from the relationship between DCT coefficients and the image content in pixels. For example, it can be inferred that a block with more than 30 non-zero AC coefficients contains no dominant colour. More than 30 different variation components present in an 8×8 pixel block indicate that no single colour component dominates among its pixels. As this threshold increases, fewer blocks are identified as non-dominant colour blocks, but the identification becomes more accurate. If the threshold decreases, more non-dominant colour blocks are identified, although the accuracy of the extraction may suffer. As a result, T3 is set to 30 in the proposed algorithm. The principle is similar for thresholds T1 and T2. T1 monitors the number of non-zero AC coefficients, and this number indicates the variety of intensity values inside the block. We therefore set T1 to 12, about one fifth of the 63 AC coefficients.
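The remaining steps can be combined as below: the non-dominant test (3), the one-level decomposition (4), the sub-block test (5) and the mean value (6). This is a minimal pure-Python sketch, and it rests on several assumptions. It uses the orthonormal DCT-II. Instead of typing in Table 1, it builds M = T8·diag(T4ᵀ, T4ᵀ), whose rows, divided by √2, match the rows listed in Table 1 apart from the order in which two of them are listed. It then obtains the sub-blocks as Mᵀ·C·M, which equals 2·Aᵀ·C·A for A = M/√2. How the printed Table 1 is oriented relative to Eq. (4) is our assumption. T3 = 30 follows the text, while the T4 = 1.0 used below is purely illustrative. All function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis (row k = k-th basis vector)."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(r) for r in zip(*a)]


def decomposition_matrix():
    """M = T8 . diag(T4^T, T4^T), so that C = M . [sub-block DCTs] . M^T."""
    t8, t4t = dct_matrix(8), transpose(dct_matrix(4))
    d = [[0.0] * 8 for _ in range(8)]
    for i in range(4):
        for j in range(4):
            d[i][j] = d[i + 4][j + 4] = t4t[i][j]
    return matmul(t8, d)


M = decomposition_matrix()
A_HAT = [[v / math.sqrt(2) for v in row] for row in M]  # compare with Table 1
T3 = 30


def split_into_subblocks(c):
    """Eq. (4), orthonormal reading: [[C11, C12], [C21, C22]] = M^T C M."""
    s = matmul(matmul(transpose(M), c), M)
    return [[[row[4 * bj:4 * bj + 4] for row in s[4 * bi:4 * bi + 4]] for bj in range(2)]
            for bi in range(2)]


def ac_square_sum(sub):
    """S in Eq. (5): sum of the squares of the 15 AC coefficients of a 4x4 sub-block."""
    return sum(sub[i][j] ** 2 for i in range(4) for j in range(4) if (i, j) != (0, 0))


def block_mean(c):
    """Eq. (6): m = C(0,0) / N for an N x N orthonormal DCT block."""
    return c[0][0] / len(c)


def classify_uncertain(c, n_nz, t4, t3=T3):
    """Apply (3); if still uncertain, decompose once and apply (5) to each sub-block.

    Returns (label, mean) pairs: one for a non-dominant block, else four for
    the sub-blocks in the order C11, C12, C21, C22.
    """
    if n_nz > t3:
        return [("non_dominant_colour", None)]
    out = []
    for sub in (sb for pair in split_into_subblocks(c) for sb in pair):
        if ac_square_sum(sub) < t4:
            out.append(("dominant_colour", block_mean(sub)))
        else:
            out.append(("non_dominant", None))
    return out


# Hypothetical 8x8 pixel block: left half 50, right half 200.
pixels = [[50.0] * 4 + [200.0] * 4 for _ in range(8)]
t8 = dct_matrix(8)
c = matmul(matmul(t8, pixels), transpose(t8))
n_nz = sum(1 for i in range(8) for j in range(8) if (i, j) != (0, 0) and abs(c[i][j]) > 1e-9)
print(n_nz, classify_uncertain(c, n_nz, t4=1.0))  # four dominant sub-blocks, means near 50, 200, 50, 200
```

For this left/right step block, (2) alone would leave the block uncertain. Its last non-zero AC coefficient, at (0, 7), lies at zig-zag position 28, which exceeds T2 = 20. One level of decomposition then resolves it into four flat sub-blocks.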
As T2 > T1, and the position of the last non-zero AC coefficient indicates the maximum frequency component inside the block, we set T2 to 20. These settings of T1–T3 are fixed in the proposed algorithm and are applied to all the images tested. Extensive experimental results suggest that the settings are appropriate and that no adjustment is needed.

T4 is an important parameter in this algorithm, as it provides finer information about dominant colour in smaller regions. Since dominant colour information proved more difficult to extract from these smaller regions, an adaptive approach is needed to determine the value of T4. Given M blocks of AC coefficients for a compressed image or video frame, an AC absolute sum, $As_i$, can be calculated for the $i$th block by:

$$As_i = \sum_{j=1}^{63} |AC_j|, \quad \forall i \in [0, M-1] \qquad (7)$$

By arranging all the AC absolute sums in ascending order with respect to their frequencies, a curve can be produced. One example, for Lena.jpg, is shown in Fig. 1. The point of maximum curvature on this curve can be taken as the value of T4, which keeps T4 adaptive to the content of each individual image. As the curve is produced from discrete samples, a mean filter is applied to smooth it and help locate the maximum curvature.

Fig. 1. Illustration of square sum curves.

To reduce the complexity of the algorithm, an alternative is to set T4 to 10% of the maximum AC absolute sum, which lies roughly around the point of maximum curvature. Our experiments do not reveal any significant difference between these two approaches. The proposed dominant colour extraction algorithm is summarised in Fig. 2.

3. Experimental results and conclusions
To evaluate the proposed algorithm, we carried out extensive experiments in which dominant colour features were extracted and compared with those of the existing MPEG-7 technique in the pixel domain.
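The adaptive selection of T4 described in Section 2 can be sketched as follows. This is a hedged Python illustration, not the authors' code. It computes the AC absolute sums of Eq. (7) and sorts them in ascending order, which is our reading of how the curve in Fig. 1 is formed. It then smooths the curve with a moving-average (mean) filter and picks the sample of maximum discrete curvature after normalising both axes to [0, 1]. It also provides the simpler 10%-of-maximum alternative. The window size, the curvature formula, the axis normalisation and the function names are all assumptions.

```python
def ac_absolute_sums(blocks):
    """Eq. (7): As_i = sum of |AC_j| over the 63 AC coefficients of block i."""
    return [sum(abs(v) for r, row in enumerate(b) for c, v in enumerate(row) if (r, c) != (0, 0))
            for b in blocks]


def mean_filter(values, window=5):
    """Centred moving average; the window is truncated at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def t4_max_curvature(sums, window=5):
    """Value of the smoothed ascending curve at its point of maximum discrete curvature."""
    y = mean_filter(sorted(sums), window)
    n = len(y)
    if n < 3:
        raise ValueError("need at least three blocks")
    lo = min(y)
    span = (max(y) - lo) or 1.0
    ys = [(v - lo) / span for v in y]  # normalise both axes to [0, 1]
    h = 1.0 / (n - 1)
    best_i, best_k = 1, -1.0
    for i in range(1, n - 1):
        d1 = (ys[i + 1] - ys[i - 1]) / (2 * h)               # central first difference
        d2 = (ys[i + 1] - 2 * ys[i] + ys[i - 1]) / (h * h)   # central second difference
        kappa = abs(d2) / (1 + d1 * d1) ** 1.5
        if kappa > best_k:
            best_i, best_k = i, kappa
    return y[best_i]


def t4_ten_percent(sums):
    """Simpler alternative from the text: 10% of the maximum AC absolute sum."""
    return 0.1 * max(sums)


# Hypothetical AC absolute sums: many smooth blocks, a few textured ones.
sums = [float(v) for v in range(1, 91)] + [float(v) for v in range(200, 2001, 200)]
print(t4_ten_percent(sums), t4_max_curvature(sums))
```

Normalising both axes keeps the curvature from being dominated by the scale of the sums; without it, the maximum-curvature point would depend on the units of the coefficients.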
As given in (1), the MPEG-7 dominant colour descriptor contains four elements. Since $s$ is optional according to MPEG-7 [5–8], our experiments focus on the comparison of the other three elements, i.e. $c_i$, $p_i$ and $v_i$.
Fig. 2. Summary of the proposed algorithm.
Considering that each image may have a large number of dominant colour elements, we arrange the detailed comparison in three phases. In the first phase, we compare the first four major dominant colours extracted by the proposed algorithm in the DCT domain with those of the original MPEG-7 descriptor in the pixel domain. The results are given in Tables 2–4. Table 2 lists all the elements of the MPEG-7 DCD for the first four dominant colours of Lena.jpg, and Table 3 lists the same details for Peppers.jpg. In Table 4, the colour image Baboon.jpg is used to illustrate a detailed comparison between the DCDs extracted by the proposed algorithm and by the original MPEG-7, for all three colour components (Y, Cr and Cb). Fig. 3 illustrates the results of the second phase of comparison, where the first 16 dominant colours of each image are shown as bar charts. Tables and bar charts have limited space and cannot include all the dominant colour elements extracted by the proposed algorithm. In the third phase, therefore, all the classified blocks are displayed as an image for visual inspection, in which the pixels of each block take the mean value m given by (6). These images are illustrated in Fig. 4, where white blocks are non-dominant colour blocks that have been discarded. From all the phases of comparison, the following observations can be made about the proposed algorithm:
• The dominant colour extracted in the DCT domain effectively represents the salient colours in the original image.
• The dominant colour descriptor extracted by the proposed algorithm in the DCT domain remains very close to that of MPEG-7 in the pixel domain.
• The images reconstructed from the dominant colour values (m) show that the content of the original images is preserved very well.
• The proposed algorithm extracts dominant colour at the level of blocks or sub-blocks rather than pixels, because a colour must have a reasonable critical mass to be regarded as dominant.

Table 2
MPEG-7 dominant colour descriptor for Lena.jpg
Rank | Dominant colour in pixel domain | Dominant colour in DCT domain | Percentage in pixel (%) | Percentage in DCT (%) | Variance in pixel | Variance in DCT
1st | 154 | 154 | 15.48 | 15.65 | 4.7681 | 4.9656
2nd | 137 | 137 | 14.32 | 13.30 | 5.1019 | 5.0744
3rd | 52 | 52 | 11.66 | 11.81 | 4.5647 | 4.3800
4th | 103 | 120 | 11.03 | 10.65 | 4.5941 | 4.9783

Table 3
MPEG-7 dominant colour descriptor for Peppers.jpg
Rank | Dominant colour in pixel domain | Dominant colour in DCT domain | Percentage in pixel (%) | Percentage in DCT (%) | Variance in pixel | Variance in DCT
1st | 86 | 103 | 13.18 | 12.19 | 4.7572 | 4.6621
2nd | 103 | 86 | 12.19 | 12.08 | 4.7032 | 4.6389
3rd | 188 | 171 | 11.32 | 10.82 | 4.5192 | 5.0841
4th | 154 | 188 | 11.23 | 10.77 | 4.8958 | 4.4151

In this context, the proposed algorithm can be implemented in two modes. The first mode extracts dominant colour from blocks of 8×8 pixels only; the second extracts dominant colour from blocks of 8×8 pixels as well as from sub-blocks of 4×4 pixels. In the first mode, the computing cost of the proposed algorithm is only the three comparisons given by (2) and (3). In the second mode, each uncertain block incurs two costs. The first is deriving the four sub-blocks of 4×4 DCT coefficients specified by (4), for which a standard implementation of Eq. (4) requires 64×8×2 = 1024 multiplications and 56×8×2 = 896 additions. The second is examining the four sub-blocks as specified by (5), where deriving S requires 15 multiplications and 14 additions. Therefore, the total computing cost is 1039 multiplications and 910 additions.
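The operation counts just given can be reproduced with a few lines of arithmetic. This Python sketch makes three assumptions that appear to match the text's accounting. Each of the two 8×8 matrix products in (4) is evaluated directly, with 8 multiplications and 7 additions per output entry. The factor 2 is folded into the constant matrix. S is counted once at 15 multiplications and 14 additions. The function names are hypothetical.

```python
def matrix_product_cost(n=8):
    """(multiplications, additions) for one direct n x n by n x n matrix product."""
    return n * n * n, n * n * (n - 1)


def second_mode_cost():
    """Operation counts for one uncertain block, following the text's accounting."""
    mul, add = matrix_product_cost(8)
    mul, add = 2 * mul, 2 * add      # Eq. (4): [A][C], then ([A][C])[A]^T
    s_mul, s_add = 15, 14            # Eq. (5): S = sum of 15 squared AC coefficients
    return {"eq4": (mul, add), "total": (mul + s_mul, add + s_add)}


print(second_mode_cost())  # {'eq4': (1024, 896), 'total': (1039, 910)}
```

Note that 64×7×2 and 56×8×2 both give 896. Evaluating S separately for each of the four sub-blocks would add a further 3 × 15 multiplications and 3 × 14 additions; the sketch keeps the single count used in the text.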
In contrast, in the existing pixel-domain dominant colour extraction, the IDCT alone requires 4096 multiplications and 4032 additions per block [14], and this cost is incurred for every block in the image. With the proposed algorithm, only a small number of blocks need sub-block examination. In fact, for all the test images reported in this paper, dominant colour was extracted at the block level only, and no sub-block division was incurred. In addition, we applied the proposed algorithm to a database of around 1000 JPEG-compressed images to perform small-scale content-based image retrieval using dominant colour descriptors. Descriptors were extracted both in the pixel domain, as specified by MPEG-7, and in the DCT domain, as proposed in this paper. The experiments show that almost exactly the same dominant colour retrieval results were obtained in the two domains. Fig. 5 illustrates some retrieved samples, where the query image is presented at the top left corner.

Table 4
MPEG-7 dominant colour descriptor for Baboon.jpg (Y, Cr and Cb components of colour)
Component | Rank | Dominant colour in pixel domain | Dominant colour in DCT domain | Percentage in pixel (%) | Percentage in DCT (%) | Variance in pixel | Variance in DCT
Y | 1st | 120 | 120 | 16.00 | 18.86 | 4.7525 | 4.5058
Y | 2nd | 137 | 137 | 13.81 | 16.38 | 4.9140 | 4.8607
Y | 3rd | 171 | 188 | 12.64 | 15.66 | 4.8899 | 4.2962
Y | 4th | 188 | 171 | 11.63 | 15.39 | 4.6438 | 4.8768
Cr | 1st | 120 | 120 | 40.54 | 41.93 | 4.3625 | 4.3645
Cr | 2nd | 137 | 137 | 27.53 | 27.53 | 4.1437 | 4.1185
Cr | 3rd | 103 | 103 | 9.25 | 9.14 | 4.6982 | 4.8808
Cr | 4th | 154 | 154 | 5.56 | 5.52 | 4.6934 | 4.9141
Cb | 1st | 120 | 120 | 38.71 | 40.96 | 4.6315 | 4.7426
Cb | 2nd | 103 | 103 | 18.89 | 21.90 | 4.8266 | 4.8026
Cb | 3rd | 137 | 154 | 15.89 | 13.11 | 4.6401 | 4.7625
Cb | 4th | 154 | 137 | 10.56 | 12.04 | 4.4632 | 4.0554

Fig. 3. Comparison of dominant colour histograms with 16 bins between the proposed algorithm and MPEG-7 in the pixel domain.
Fig. 4. Illustration of image samples reconstructed from the extracted dominant colours.
Fig. 5. Illustration of dominant colour-based image retrieval.

In addition, the mean values of the dominant colour blocks extracted by the proposed algorithm can be exploited to construct a so-called dominant colour histogram. According to the dominant colour extraction principle, this histogram should reflect the major structure of the general histogram constructed from the original pixel data. In this way, the proposed technique can be used to predict the image histogram directly in the DCT domain, extending it from dominant colour extraction to histogram extraction. Fig. 6 compares the dominant colour histogram extracted in the DCT domain with the original image histogram of colour intensity values in the pixel domain. It shows that the dominant colour histograms constructed by the proposed algorithm do reflect the major structure of the image histogram in the pixel domain. This offers good potential for further research on extracting image histograms directly in the DCT domain, which would be useful for other applications that process compressed images.

Fig. 6. Visual comparison between the dominant colour histogram extracted by the proposed algorithm and the general colour histogram extracted in the pixel domain.

In this paper, an algorithm for dominant colour extraction in the DCT domain has been proposed, which provides an effective and fast processing technique for the content description of compressed videos and images. MPEG-7 works on raw images and videos to formulate content descriptors in the process of their compression. The proposed algorithm, by contrast, will be useful for extracting MPEG-7 dominant colour descriptors directly from already compressed videos and images. As a result of the successful JPEG and MPEG standardization activities, almost all digital videos are now stored in compressed format. Therefore, feature extraction and image processing in the DCT domain offer advantages in reducing computing cost and in directly accessing the content of compressed images and videos.

Finally, the authors wish to acknowledge the financial support of Resource: The Council for Museums, Archives and Libraries in the UK, under research grant LIC/RE/082, and of the Natural Science Foundation of China.

References
[1] R. Brunelli, O. Mich, M. Modena, A survey on the automatic indexing of video data, J. Visual Commun. Image Represent. (1999) 78–112.
[2] B. Ko, J. Peng, H. Byun, Region-based image retrieval using probabilistic feature relevance learning, Pattern Anal. Appl. 4 (2–3) (2001) 174–184.
[3] G.K. Wallace, The JPEG still picture compression standard, Commun. ACM 34 (4) (1991) 31–45.
[4] Overview of MPEG-7 standard (version 5.0), ISO/IEC JTC1/SC29/WG11 N4031, March 2001.
[5] T. Sikora, The MPEG-7 visual standard for content description – an overview, IEEE Trans. Circuits Syst. Video Technol. 11 (6) (2001) 696–702.
[6] B.S. Manjunath, J. Ohm, V.V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Trans. Circuits Syst. Video Technol. 11 (6) (2001) 703–715.
[7] M. Bober, MPEG-7 visual shape descriptors, IEEE Trans. Circuits Syst. Video Technol. 11 (6) (2001) 716–719.
[8] S. Jeannin, A. Divakaran, MPEG-7 visual motion descriptors, IEEE Trans. Circuits Syst. Video Technol. 11 (6) (2001) 720–724.
[9] Y. Deng, B.S. Manjunath, C. Kenney, An efficient color representation for image retrieval, IEEE Trans. Image Process. 10 (1) (2001) 140–147.
[10] G. Feng, J. Jiang, JPEG compressed image retrieval via statistical features, Pattern Recogn., in press.
[11] R. Reeves, K. Kubik, W. Osberger, Texture characterization of compressed aerial images using DCT coefficients, Proceedings of SPIE: Storage and Retrieval for Image and Video Databases V, vol. 3022, February 1997, pp. 398–407.
[12] M. Liu, J. Jiang, C. Hou, Combination of image indexing and compression, Proceedings of ICASSP'02, 2002.
[13] J. Jiang, G. Feng, The spatial relationship of DCT coefficients between a block and its sub-blocks, IEEE Trans. Signal Process. 50 (5) (2002) 1160–1169.
[14] J. Jiang, Y. Weng, Video extraction for fast content access to MPEG compressed videos, IEEE Trans. Circuits Syst. Video Technol. 14 (5) (2004) 595–605.