Dominant colour extraction in DCT domain

Image and Vision Computing 24 (2006) 1269–1277
www.elsevier.com/locate/imavis
Jianmin Jiang a,b,*, Ying Weng b, PengJie Li c
a Southwest University, China
b University of Bradford, UK
c Institute of Acoustics, Chinese Academy of Sciences
Received 23 March 2004; received in revised form 11 January 2006; accepted 5 April 2006
Abstract
As most MPEG and JPEG compression standards are DCT-based, we propose a simple, low-cost and fast algorithm to extract dominant
colour features directly in DCT domain without involving full decompression to access the pixel data. As dominant colour is one of the colour
features used in constructing MPEG-7 dominant colour descriptors, the proposed algorithm provides a useful technique for already
compressed videos and images where MPEG-7 dominant colour descriptors are needed. While the proposed algorithm presents advantages in
terms of computing efficiency, i.e. eliminating the need for IDCT on compressed videos and images, extensive experiments also show that it
achieves competitive performance in extracting dominant colour features when compared with the pixel domain extraction
described in MPEG-7.
Crown Copyright © 2006 Published by Elsevier B.V. All rights reserved.
Keywords: Dominant colour features; MPEG-7; Feature extraction in compressed domain
1. Introduction
Along with the digital revolution and the success of international standardization of image compression
techniques, a rapid increase in the volume of images and videos is seen almost everywhere, including media
publications, Web-based publications and e-services of all kinds. The rich source of visual information made
available in daily life has brought the need to develop algorithms and techniques to browse, search, retrieve
and manage the content of those compressed digital images and videos. To this end, research on content-based
image indexing and retrieval is receiving tremendous interest within both the academic community and the
industrial sectors [1,2]. Since most of such research is carried out in pixel domain, in which input images are
assumed to be in their original raw format, full decompression has to be performed prior to the content
description or content-based management developed in pixel domain. In practice, it may be argued that this
overhead does not really present any problem, since full decompression is fast and its algorithm implementation
is efficient. Thus, it could be accepted that the cost of IDCT, or indeed the full decompression, will not be a
bottleneck for such compressed image management. When millions of such compressed images need to be accessed,
however, the computing cost of full decompression could become an issue that deserves attention. As a result, a
new wave of academic research is emerging for direct content access and feature extraction in compressed domain
rather than in pixel domain [10–13].
* Corresponding author.
E-mail address: [email protected] (J. Jiang).
Following the successful development of the MPEG-1, MPEG-2 and MPEG-4 standards, MPEG has now developed
a new standard, MPEG-7 [4–8], which is formally known as the multimedia content description interface. While
the previous standards focus on coding and representation of audio–visual content, MPEG-7 focuses on content
description with various modalities including image, video, audio, speech, graphics, and their combinations.
To follow the research efforts in content-based image indexing and retrieval, the MPEG-7 visual standard
for content description has developed a range of descriptors corresponding to common features including
colour, texture, shape, and motion [5,7,8]. Among these, the colour descriptors mainly comprise the scalable
colour descriptor (SCD), the dominant colour descriptor (DCD) [5,9], the colour layout descriptor, and the
group-of-pictures colour descriptor. The DCD is mainly developed for fast and efficient image browsing and
retrieval via the description of local spatial colour distribution at global level. For a given region of an
image, the colours are clustered into a small number of representative colours to produce a more compact
representation of the colour distribution inside the image. In comparison with histogram-based approaches,
the DCD only contains the representative colours, their percentages in a region, the spatial coherency of the
colour and the colour variance. To provide efficient description for already compressed images and videos,
we propose in this paper a simple dominant colour extraction algorithm that works directly in DCT domain to
formulate the MPEG-7 DCD. In comparison with the existing state of the art, i.e. MPEG-7 and other colour-based
descriptions of image and video content published in the literature, our contribution and its novelty can be
highlighted as follows: while all existing work extracts dominant colour features in pixel domain, our proposed
algorithm extracts them directly in DCT domain. When millions of compressed images and video frames need to be
accessed with their dominant colour descriptions, the proposed technique works directly with their compressed
format without full decompression, whereas the existing techniques, including MPEG-7, need full decompression
to obtain the pixel data before the dominant colour features can be extracted.
0262-8856/$ - see front matter
doi:10.1016/j.imavis.2006.04.009
The remainder of the paper is divided into two sections. Section 2 provides the detailed
description of the proposed algorithm design, and Section 3 covers the assessment and evaluation
of the proposed algorithm via extensive experiments, followed by some concluding remarks.
2. Dominant colour extraction in DCT domain
MPEG-7 dominant colour extraction mainly comes from the
work by Deng and Manjunath [9]. This descriptor highlights
the salient colour in images or regions in pixel domain via
colour clustering process. Hence, the descriptor is defined to
include the representative colours, their percentages in the
region, spatial coherency of dominant colours, and colour
variances for each dominant colour. Its full definition is given
below
$$F = \{\{c_i, p_i, v_i\}, s\}, \quad i = 1, 2, \ldots, N \tag{1}$$
where $c_i$ is the ith dominant colour; $p_i$ represents its percentage
value; $v_i$ is its colour variance; and $s$ represents the spatial
coherency.
It is reported that this descriptor achieves competitive performance in comparison with high-dimensional
histogram methods [7] when used for image indexing within the 3D colour space. Although significant advantages
have been illustrated with this descriptor in terms of colour representation and low dimension, it cannot be
used to process compressed images without decompression, since it is limited to pixel domain and therefore
requires a sequence of operations including entropy decoding and IDCT. On the other hand, detailed analysis of
DCT coefficients reveals that they carry many statistical features that reflect the same content and the same
nature of the image as the pixels do. While the DC coefficient represents the average intensity value of a
block of pixels, the AC coefficients reflect the variance of colour changes in pixels within the same block.
When arranged in zig-zag order, the AC coefficients also indicate frequency components in ascending order.
Following this analysis of the relationship between DCT coefficients and image content in pixels, it becomes
clear that each block of 8×8 pixels, as designed in JPEG [3] and MPEG, can be classified as a dominant colour
block or a non-dominant colour block. Since the number of non-zero AC coefficients directly relates to the
variance of colour components among the pixels, it can be used as a parameter to control this block
classification. To improve the accuracy, a further parameter can be added to indicate the position of the last
non-zero AC coefficient along the zig-zag order, since it represents the highest frequency component present
inside the block. Therefore, all the blocks can be classified into dominant colour blocks, non-dominant colour
blocks or uncertain blocks via the following mechanism in DCT domain:
$$block_i = \begin{cases} \text{dominant\_colour}, & \text{if } N_{nz} < T_1 \text{ and } P < T_2 \\ \text{uncertain}, & \text{otherwise} \end{cases} \tag{2}$$
where $block_i$ stands for the ith block inside the image along the raster-scanning order, $N_{nz}$ is the
number of non-zero AC coefficients inside $block_i$, $P$ represents the position of the last non-zero AC
coefficient along the zig-zag order inside $block_i$, and finally $T_1$ and $T_2$ stand for the first two
thresholds we introduced to identify the dominant colour blocks.
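The two quantities used in (2) can be read off a coefficient block with a short scan. The following is a minimal sketch, assuming each block is available as an 8×8 array of quantised DCT coefficients with the DC term at position (0, 0); the function names are illustrative and do not come from the paper.

```python
import numpy as np

def zigzag_order(n=8):
    """Return the (row, col) pairs of an n x n block in JPEG zig-zag order."""
    # Sort by anti-diagonal; odd diagonals run top to bottom, even ones bottom to top.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def ac_stats(block):
    """Return (Nnz, P) for one block.

    Nnz is the number of non-zero AC coefficients and P is the zig-zag
    position (1..63) of the last non-zero one, or 0 if all AC terms are zero.
    """
    block = np.asarray(block)
    order = zigzag_order(block.shape[0])
    ac = np.array([block[r, c] for r, c in order[1:]])  # skip the DC term
    nonzero = np.flatnonzero(ac)
    nnz = int(nonzero.size)
    last = int(nonzero[-1]) + 1 if nnz else 0
    return nnz, last

def is_dominant(block, t1, t2):
    """Condition of Eq. (2): few non-zero AC terms, all at low frequencies."""
    nnz, last = ac_stats(block)
    return nnz < t1 and last < t2

# Hypothetical block: a strong DC term and two small low-frequency AC terms.
example = np.zeros((8, 8), dtype=int)
example[0, 0], example[0, 1], example[1, 1] = 800, 5, 2
print(ac_stats(example), is_dominant(example, t1=12, t2=20))
```

Counting zeros only makes sense on quantised coefficients, which is why the sketch assumes the values are taken as they come out of entropy decoding.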
For those uncertain blocks, there exist two possibilities. One is that the block concerned is a
non-dominant colour block, and the other is that the block contains dominant colour in smaller regions.
To identify the definitely non-dominant colour blocks, checking the number of non-zero AC coefficients
remains sufficient; the extreme case is that all 63 AC coefficients are non-zero. Therefore, the following
condition can be set up to determine those definitely non-dominant colour blocks:
$$block_i = \begin{cases} \text{non\_dominant\_colour}, & \text{if } N_{nz} > T_3 \\ \text{uncertain}, & \text{otherwise} \end{cases} \tag{3}$$
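Taken together, conditions (2) and (3) give a three-way decision per block. The sketch below is one way to write it, assuming the two block statistics have already been computed; the default thresholds are the values the paper settles on later in this section (12, 20 and 30), and the function name is illustrative.

```python
def classify_block(nnz, last_pos, t1=12, t2=20, t3=30):
    """Three-way block classification following Eqs. (2) and (3).

    nnz      -- number of non-zero AC coefficients in the block
    last_pos -- zig-zag position of the last non-zero AC coefficient
    """
    if nnz < t1 and last_pos < t2:
        return "dominant_colour"       # Eq. (2)
    if nnz > t3:
        return "non_dominant_colour"   # Eq. (3)
    return "uncertain"                 # candidate for sub-block analysis

print(classify_block(3, 5))     # smooth block
print(classify_block(40, 63))   # busy block
print(classify_block(15, 25))   # neither test fires
```

Only blocks that come back as uncertain need the sub-block decomposition, so both tests act as cheap filters in front of the more expensive step.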
For the latter case, in which dominant colour may exist in smaller regions, we can use our previous
work [13] to decompose the 8×8 block into four sub-blocks of 4×4 coefficients in DCT domain. Specifically,
given a block of 8×8 DCT coefficients represented by [C], the matrix of four sub-blocks of 4×4 DCT
coefficients can be derived from the following equation [13]:
$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = 2 \times [A] \times [C] \times [A]^T \tag{4}$$
where $C_{ij}$ is a matrix of 4×4 DCT coefficients representing the decomposed sub-block, and [A] is an 8×8
parameter matrix, whose values are given in Table 1.

Table 1
Parameter matrix [A], columns 1 to 5
 0.5      0.4531   0       -0.1591   0
 0        0.2079   0.5      0.3955   0
 0       -0.0373   0        0.2566   0.5
 0        0.0114   0       -0.0488   0
 0.5     -0.4531   0        0.1591   0
 0        0.2079  -0.5      0.3955   0
 0        0.0373   0       -0.2566   0.5
 0        0.0114   0       -0.0488   0
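Eq. (4) can be checked numerically. The sketch below assumes the orthonormal DCT-II used by JPEG and builds a parameter matrix from the 8-point and 4-point DCT bases rather than from the tabulated values; under that assumption it reproduces entries such as 0.5 and 0.4531 from Table 1. The helper names are illustrative, and the construction is a derivation under the stated assumption rather than the procedure given in [13].

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix D, so that C = D @ X @ D.T for an n x n block X."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def parameter_matrix():
    """Build [A] such that 2 * A @ C @ A.T stacks the four 4x4 sub-block DCTs."""
    d8, d4 = dct_matrix(8), dct_matrix(4)
    b = np.zeros((8, 8))
    b[:4, :4] = d4   # 4-point DCT applied to samples 0..3
    b[4:, 4:] = d4   # 4-point DCT applied to samples 4..7
    return (b @ d8.T) / np.sqrt(2.0)

def decompose(c8):
    """Split an 8x8 DCT block into (C11, C12, C21, C22) without leaving DCT domain."""
    a = parameter_matrix()
    s = 2.0 * a @ c8 @ a.T
    return s[:4, :4], s[:4, 4:], s[4:, :4], s[4:, 4:]

# Compare against transforming each 4x4 pixel quadrant directly.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(8, 8)).astype(float)
d8, d4 = dct_matrix(8), dct_matrix(4)
c11, c12, c21, c22 = decompose(d8 @ pixels @ d8.T)
print(np.allclose(c11, d4 @ pixels[:4, :4] @ d4.T))
```

In practice the matrix would be computed once and reused for every uncertain block, since it does not depend on the image.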
As the number of coefficients is now reduced to 16 in those sub-blocks (one DC and 15 AC), we propose
to use the square sum of the AC coefficients to determine the dominant colour sub-blocks instead of (2).
This is detailed below:
$$sub\_block_i = \begin{cases} \text{dominant\_colour}, & \text{if } S < T_4 \\ \text{non\_dominant}, & \text{otherwise} \end{cases} \tag{5}$$
where $S = \sum_{k=1}^{15} AC_k^2$ is the square sum of all AC coefficients inside the 4×4 sub_block, and
$T_4$ is the fourth threshold introduced for the block classification.
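The test in (5) amounts to a few lines of code. This is a minimal sketch, assuming a sub-block is a 4×4 array with the DC term at position (0, 0); the function names and the example threshold are illustrative only.

```python
import numpy as np

def ac_square_sum(sub_block):
    """S in Eq. (5): sum of squares of the 15 AC coefficients of a 4x4 sub-block."""
    sub = np.asarray(sub_block, dtype=float)
    return float(np.sum(sub ** 2) - sub[0, 0] ** 2)  # total energy minus the DC term

def classify_sub_block(sub_block, t4):
    """Eq. (5): a sub-block is dominant when its AC energy stays below T4."""
    return "dominant_colour" if ac_square_sum(sub_block) < t4 else "non_dominant"

# Hypothetical sub-block with two small AC terms: S = 3^2 + (-4)^2 = 25.
sub = np.zeros((4, 4))
sub[0, 0], sub[0, 1], sub[1, 0] = 100.0, 3.0, -4.0
print(ac_square_sum(sub), classify_sub_block(sub, t4=30.0))
```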
As the block decomposition in DCT domain can be repeated until one single coefficient is reached [13],
the block classification can be further tuned to identify the dominant colour at the level of 2×2 blocks
and of each individual coefficient, which would be the same as that of MPEG-7 in pixel domain. In practice,
however, it is found that one level of decomposition gives sufficient accuracy in dominant colour
extraction.
After all the blocks are classified, the non-dominant colour blocks or sub_blocks are discarded, and
the mean value of the pixels inside each dominant colour block is calculated in DCT domain to construct
the MPEG-7 descriptor as well as the histogram for dominant colour analysis. According to the DCT
equations [8], the mean value of pixels inside each block can be obtained in DCT domain via the following:
$$m = \frac{1}{N} C(0,0) \tag{6}$$
where $N = 8$ for dominant colour blocks and $N = 4$ for dominant sub-blocks.
In this way, each dominant colour block or sub-block is essentially represented by one single value,
which can also be used to construct a so-called dominant colour image. A few such images are illustrated
in Fig. 4.
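Relation (6) follows from the DC basis function being constant over the block. The sketch below checks it numerically, assuming the orthonormal DCT-II normalisation used by JPEG; the helper names are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix D, so that C = D @ X @ D.T for an n x n block X."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def block_mean_from_dc(dc, n):
    """Eq. (6): mean pixel value of an n x n block from its DC coefficient."""
    return dc / n

rng = np.random.default_rng(0)
for n in (8, 4):
    pixels = rng.integers(0, 256, size=(n, n)).astype(float)
    d = dct_matrix(n)
    coeffs = d @ pixels @ d.T
    # The two printed values agree up to floating-point rounding.
    print(n, block_mean_from_dc(coeffs[0, 0], n), pixels.mean())
```

If the coefficients are taken from a JPEG stream, the DC value has to be dequantised first, and the level shift applied by the encoder has to be added back to obtain the mean in the original pixel range.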
Table 1 (continued)
Parameter matrix [A], columns 6 to 8
 0.1063   0       -0.0901
-0.1762   0        0.1389
 0.3841   0       -0.1877
 0.2452   0.5      0.4329
-0.1063   0        0.0901
-0.1762   0        0.1389
-0.3841   0        0.1877
 0.2452  -0.5      0.4329
The above design requires four thresholds, which can in principle be determined empirically in
accordance with the relationship between DCT coefficients and the image content in pixels. For example,
it can be inferred that a block with more than 30 non-zero AC coefficients contains no dominant colour in
its 8×8 pixels. This is because, when more than 30 different variance values are present in the pixel
block, no colour component dominates among the pixels inside this block. As this threshold increases, the
number of blocks identified as non-dominant colour is reduced, yet more accuracy is achieved. If the
threshold decreases, however, the accuracy of extraction may suffer, although more non-dominant colour
blocks are identified. As a result, $T_3$ is set to 30 in the proposed algorithm. With thresholds $T_1$ and
$T_2$, the principle is similar. As $T_1$ monitors the number of non-zero AC coefficients, and this number
indicates the variety of intensity values inside the block, we set $T_1$ to 12, which is around one fifth
of the total number of AC coefficients. As $T_2 > T_1$, and the position of the last non-zero AC
coefficient indicates the maximum frequency component inside the block, we set $T_2$ to 20. These settings
for $T_1$ to $T_3$ are fixed in the proposed algorithm and are applied to all images tested. Extensive
experimental results suggest that the settings are appropriate and that no adjustment is needed.
$T_4$ is an important parameter in this algorithm, since it provides finer information about the
dominant colour in smaller regions. Since dominant colour information proved more difficult to extract
from those smaller regions, an adaptive approach needs to be developed to determine the specific value
for $T_4$.
Given M blocks of AC coefficients for a compressed image or a video frame, an AC absolute sum, $As_i$,
can be calculated for the ith block by:
$$As_i = \sum_{j=1}^{63} |AC_j|, \quad \forall i \in [0, M-1] \tag{7}$$
By arranging all the AC absolute sums in ascending order with respect to their frequencies, a curve can
be produced. One example for Lena.jpg is shown in Fig. 1. From this curve, the point of maximum curvature
can be taken as the value of $T_4$, which remains adaptive to the content of each individual image. As the
curve is produced from discrete samples, mean filtering is applied to smooth the curve and help locate the
maximum curvature. To reduce the complexity of the algorithm design, an alternative is to set $T_4$ to 10%
of the maximum AC absolute sum, which is roughly around the maximum curvature point. Our experiments do
not reveal any significant difference between these two approaches.
Fig. 1. Illustration of square sum curves.
As a result, the proposed dominant colour extraction algorithm can be summarised in Fig. 2.
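The simpler of the two ways of choosing T4, 10% of the largest AC absolute sum from (7), can be sketched as follows. The sketch assumes the blocks of one image are stacked in an array of shape (M, 8, 8) with the DC term at position (0, 0); the function names are illustrative, and the curvature-based variant is not shown because the text does not specify it in enough detail.

```python
import numpy as np

def ac_absolute_sums(blocks):
    """Eq. (7): sum of |AC_j| over the 63 AC coefficients of each block."""
    blocks = np.asarray(blocks, dtype=float)
    total = np.abs(blocks).sum(axis=(1, 2))
    return total - np.abs(blocks[:, 0, 0])  # remove the DC contribution

def t4_ten_percent(blocks, fraction=0.1):
    """Set T4 to a fixed fraction (10% in the text) of the largest AC absolute sum."""
    return fraction * float(ac_absolute_sums(blocks).max())

# Three hypothetical blocks with AC absolute sums of 10, 40 and 0.
blocks = np.zeros((3, 8, 8))
blocks[0, 0, 0], blocks[0, 0, 1] = 500.0, -10.0
blocks[1, 3, 3] = 40.0
blocks[2, 0, 0] = -7.0
print(ac_absolute_sums(blocks), t4_ten_percent(blocks))
```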
3. Experimental results and conclusions
To evaluate the proposed algorithm, we carried out extensive experiments to extract dominant colour
features in comparison with the existing MPEG-7 technique in pixel domain. As given in (1), the MPEG-7
colour descriptor contains four elements. Since $s$ is optional according to MPEG-7 [5–8], however, our
experiments focus on the comparison of the three other elements, i.e. $c_i$, $p_i$ and $v_i$.
Considering the fact that each image may have a large number of dominant colour elements, we arrange the
detailed comparison in three phases. In the first phase, we compare the first four major dominant colours
between the proposed algorithm in DCT domain and the original MPEG-7 descriptor in pixel domain, which
are given in Tables 2–4. Table 2 lists all the elements of the MPEG-7 DCD (dominant colour descriptor)
for the first four dominant colours for Lena.jpg, and Table 3 lists the same details for Peppers.jpg. In
Table 4, we choose the colour image Baboon.jpg as the sample to illustrate the detailed comparison of all
three components between the DCDs extracted by the proposed algorithm and by the original MPEG-7.
Fig. 2. Summary of the proposed algorithm.
Table 2
MPEG-7 dominant colour descriptor for Lena.jpg
Lena   Dominant colour   Dominant colour   Percentage in   Percentage in   Variance   Variance
       in pixel domain   in DCT domain     pixel (%)       DCT (%)         in pixel   in DCT
1st    154               154               15.48           15.65           4.7681     4.9656
2nd    137               137               14.32           13.30           5.1019     5.0744
3rd    52                52                11.66           11.81           4.5647     4.3800
4th    103               120               11.03           10.65           4.5941     4.9783

Table 3
MPEG-7 dominant colour descriptor for Peppers.jpg
Peppers  Dominant colour   Dominant colour   Percentage in   Percentage in   Variance   Variance
         in pixel domain   in DCT domain     pixel (%)       DCT (%)         in pixel   in DCT
1st      86                103               13.18           12.19           4.7572     4.6621
2nd      103               86                12.19           12.08           4.7032     4.6389
3rd      188               171               11.32           10.82           4.5192     5.0841
4th      154               188               11.23           10.77           4.8958     4.4151

Fig. 3 illustrates the experimental results in the second phase of comparison, where the first 16
dominant colours for each image are illustrated in bar-charts. To cover all the dominant colour elements
extracted by the proposed algorithm, and to overcome the drawback that tables and bar-charts have limited
space to include all dominant colour elements, we further display all the classified blocks as an image
for visual inspection, in which each pixel is represented by m as specified in (6). These images are
illustrated in Fig. 4, where white blocks are the non-dominant colour blocks and are thus discarded. From
all the phases of comparison, the following observations can be made about the proposed algorithm:
• The dominant colour extracted in DCT domain effectively represents the salient colour in the original
image.
• The dominant colour descriptor extracted by the proposed algorithm in DCT domain remains very close to
that of MPEG-7 in pixel domain.
• The images reconstructed via dominant colour (m) reveal that the content of the original images is
preserved very well.
• The proposed algorithm essentially extracts dominant colour in terms of blocks or sub-blocks rather
than pixels, because a dominant colour must have a reasonable critical mass to be regarded as dominant.
In this context, the proposed algorithm can be implemented in two modes. The first mode extracts dominant
colour from blocks of 8×8 pixels only, and the second extracts dominant colour from blocks of 8×8 pixels
as well as sub-blocks of 4×4 pixels. In the first mode, the computing cost of the proposed algorithm is
only the three comparisons given by (2) and (3). In the second mode, the uncertain blocks will incur:
(i) the computing cost of deriving the four sub-blocks (4×4 DCT coefficients) as specified by (4), for
which a standard implementation requires 64×8×2=1024 multiplications and 56×8×2=896 additions; and
(ii) the computing cost of examining the sub-blocks as specified by (5), where deriving S requires 15
multiplications and 14 additions. Therefore, the total computing cost is 1039 multiplications and 910
additions. By contrast, in the existing pixel-domain dominant colour extraction, the IDCT alone requires
4096 multiplications and 4032 additions [14], and this cost is incurred for every block inside the image.
For the proposed algorithm, only a small number of blocks need sub-block examination. As a matter of
fact, for all the test images reported in the paper, dominant colour is extracted only at the block level
and no sub-block division is incurred.
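The operation counts quoted in the last observation can be reproduced with simple arithmetic. The sketch below assumes naive matrix products for Eq. (4) and a direct, non-separable evaluation of the 2-D IDCT, which is the reading that matches the quoted 4096 and 4032; the function names are illustrative.

```python
def matmul_cost(n):
    """(multiplications, additions) for one naive n x n by n x n matrix product."""
    return n * n * n, n * n * (n - 1)

def decomposition_cost(n=8):
    """Eq. (4) as two matrix products: [A][C], then the result times [A]^T."""
    mults, adds = matmul_cost(n)
    return 2 * mults, 2 * adds

def square_sum_cost(num_ac=15):
    """Eq. (5): one squaring per AC coefficient, then summing them."""
    return num_ac, num_ac - 1

def direct_idct_cost(n=8):
    """Direct 2-D IDCT: each of the n*n pixels is a weighted sum of n*n coefficients."""
    return (n * n) ** 2, (n * n) * (n * n - 1)

dec = decomposition_cost()
sq = square_sum_cost()
total = (dec[0] + sq[0], dec[1] + sq[1])
print(dec, sq, total, direct_idct_cost())
# (1024, 896) (15, 14) (1039, 910) (4096, 4032)
```

Fast separable IDCT implementations need far fewer operations than the direct form, so the gap in a real decoder would be narrower than these figures suggest.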
In addition, we applied the proposed algorithm to a database of around 1000 JPEG compressed images to
perform small-scale content-based image retrieval via the dominant colour descriptors in both pixel
domain, as proposed by MPEG-7, and DCT domain, as proposed in this paper. The experiments reveal that
almost exactly the same retrieval results were observed in terms of dominant colour retrieval. Fig. 5
illustrates such retrieved samples, where the query image is presented at the top left corner.

Table 4
MPEG-7 dominant colour descriptor
       Dominant colour   Dominant colour   Percentage in   Percentage in   Variance   Variance
       in pixel domain   in DCT domain     pixel (%)       DCT (%)         in pixel   in DCT
Y component of colour Baboon.jpg
1st    120               120               16.00           18.86           4.7525     4.5058
2nd    137               137               13.81           16.38           4.9140     4.8607
3rd    171               188               12.64           15.66           4.8899     4.2962
4th    188               171               11.63           15.39           4.6438     4.8768
Cr component of colour Baboon.jpg
1st    120               120               40.54           41.93           4.3625     4.3645
2nd    137               137               27.53           27.53           4.1437     4.1185
3rd    103               103               9.25            9.14            4.6982     4.8808
4th    154               154               5.56            5.52            4.6934     4.9141
Cb component of colour Baboon.jpg
1st    120               120               38.71           40.96           4.6315     4.7426
2nd    103               103               18.89           21.90           4.8266     4.8026
3rd    137               154               15.89           13.11           4.6401     4.7625
4th    154               137               10.56           12.04           4.4632     4.0554

Fig. 3. Comparison of dominant colour histograms with 16 bins between the proposed and MPEG-7 in pixel domain.
Fig. 4. Illustration of reconstructed image samples via extracted dominant colour.
In addition, the mean value of dominant colour blocks extracted by the proposed algorithm can also be
exploited to construct a so-called dominant colour histogram. According to the dominant colour extraction
principle, this histogram should reflect the major structure of the general histogram constructed from
the original pixel data. In this way, the proposed technique can be used to predict the image histogram
directly in DCT domain, which could lead to histogram extraction rather than just dominant colour
extraction. Fig. 6 illustrates such detailed comparisons between the extracted dominant colour histogram
in DCT domain and the original image histogram based on colour intensity values in pixel domain. It is
shown that the dominant colour histograms constructed from the proposed algorithm do reflect the major
structure of the image histogram in pixel domain. This provides good potential for further research in
extracting image histograms directly from DCT domain, which would be useful for other applications in
processing compressed images.
Fig. 5. Illustration of dominant colour-based image retrieval.
Fig. 6. Visual comparison between the dominant colour histogram extracted by the proposed algorithm and
the general colour histogram extracted in pixel domain.
In this paper, an algorithm for dominant colour extraction in DCT domain is proposed, which provides an
effective and fast processing technique for content description of compressed videos and images. While
MPEG-7 works on raw images and videos to formulate content descriptors in the process of their
compression, the proposed algorithm will be very useful in extracting MPEG-7 dominant colour descriptors
directly from already compressed videos and images. As a result of the successful JPEG and MPEG
standardization activities, almost all digital videos are now stored in compressed format. Therefore,
feature extraction or image processing in DCT domain will provide certain advantages in reducing
computing cost and directly accessing the content of compressed images and videos. Finally, the authors
wish to acknowledge the financial support from Resource: The Council for Museums, Archives and Libraries
in the UK under the research grant LIC/RE/082, and the Natural Science Foundation in China.
References
[1] R. Brunelli, O. Mich, M. Modena, A survey on the automatic indexing of video data, J. Visual Commun. Image Represent. (1999) 78–112.
[2] B. Ko, J. Peng, H. Byun, Region-based image retrieval using probabilistic feature relevance learning, Pattern Anal. Appl. 4 (2–3) (2001) 174–184.
[3] G.K. Wallace, The JPEG still picture compression standard, Commun. ACM 34 (4) (1991) 31–45.
[4] Overview of MPEG-7 standard (version 5.0), ISO/IEC JTC1/SC29/WG11 N4031, March 2001.
[5] T. Sikora, The MPEG-7 visual standard for content description - an overview, IEEE Trans. Circuits Syst. Video Technol. 11 (6) (2001) 696–702.
[6] B.S. Manjunath, J. Ohm, V.V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Trans. Circuits Syst. Video Technol. 11 (6) (2001) 703–715.
[7] M. Bober, MPEG-7 visual shape descriptors, IEEE Trans. Circuits Syst. Video Technol. 11 (6) (2001) 716–719.
[8] S. Jeannin, A. Divakaran, MPEG-7 visual motion descriptors, IEEE Trans. Circuits Syst. Video Technol. 11 (6) (2001) 720–724.
[9] Y. Deng, B.S. Manjunath, C. Kenney, An efficient color representation for image retrieval, IEEE Trans. Image Process. 10 (1) (2001) 140–147.
[10] G. Feng, J. Jiang, JPEG compressed image retrieval via statistical features, Pattern Recogn., in press.
[11] R. Reeves, K. Kubik, W. Osberger, Texture characterization of compressed aerial images using DCT coefficients, Proceedings of SPIE: Storage and Retrieval for Image and Video Databases V, vol. 3022, February 1997, pp. 398–407.
[12] M. Liu, J. Jiang, C. Hou, Combination of image indexing and compression, Proceedings of ICASSP'02, 2002.
[13] J. Jiang, G. Feng, The spatial relationship of DCT coefficients between a block and its sub-blocks, IEEE Trans. Signal Proc. 50 (5) (2002) 1160–1169.
[14] J. Jiang, Y. Weng, Video extraction for fast content access to MPEG compressed videos, IEEE Trans. Circuits Syst. Video Technol. 14 (5) (2004) 595–605.