Region-Based Image Retrieval with Perceptual Colors
Ying Liu¹, Dengsheng Zhang¹, Guojun Lu¹, and Wei-Ying Ma²
¹ Gippsland School of Computing and Information Technology,
Monash University, Vic, 3842, Australia
{ying.liu, dengsheng.zhang, guojun.lu}@infotech.monash.edu.au
² Microsoft Research Asia, No. 49 ZhiChun Road, Beijing, 100080, China
[email protected]
Abstract. Due to the ‘semantic gap’ between low-level visual features and the rich semantics in the user’s mind, the performance of traditional content-based image retrieval systems falls far short of user expectations. In an attempt to reduce this ‘semantic gap’, this paper introduces a region-based image retrieval system that uses high-level semantic color names. For each segmented region, we define a perceptual color as the low-level color feature of the region. This perceptual color is then converted to a semantic color name. In this way, the system reduces the ‘semantic gap’ between numerical image features and the richness of human semantics. Four different ways to calculate the perceptual color are studied. Experimental results confirm the substantially improved performance of the proposed system compared with traditional CBIR systems.
1 Introduction
To overcome the drawback of traditional text-based image retrieval systems, which require a considerable amount of human labor, content-based image retrieval (CBIR) was introduced in the early 1990s. CBIR indexes images by their low-level features, such as color, shape, and texture. Commercial products and experimental prototype systems developed in the past decade include the QBIC system [1], Photobook system [2], Netra system [3], SIMPLIcity system [4], etc. However, extensive experiments on CBIR systems show that in many cases low-level image features cannot describe the high-level semantic concepts in the user’s mind. Hence, the performance of CBIR is still far from the user’s expectations [5][6]. ‘The discrepancy between the relatively limited descriptive power of low-level imagery features and the richness of user semantics’ is referred to as the ‘semantic gap’ [7].
In order to improve the retrieval accuracy of CBIR systems, the research focus in CBIR has shifted from designing sophisticated feature extraction algorithms to reducing the ‘semantic gap’ [8]. Recent work on narrowing the ‘semantic gap’ can be roughly classified into three categories: 1) Using region-based image retrieval (RBIR), which represents images at the region level with the intention of being closer to the perception of the human visual system [9]. 2) Introducing
K. Aizawa, Y. Nakamura, and S. Satoh (Eds.): PCM 2004, LNCS 3332, pp. 931–938, 2004.
c Springer-Verlag Berlin Heidelberg 2004
relevance feedback into the image retrieval system for continuous learning through on-line interaction with users to improve retrieval accuracy [7]. 3) Extracting semantic features from low-level image features using machine learning or data mining techniques [5].
We intend to develop an RBIR system with high-level concepts obtained from numerical region features such as color, texture, and spatial position. This paper reports our initial experimental results using semantic color names. Firstly, each database image is segmented into homogeneous regions. Then, for each region, a perceptual color is defined. This is different from conventional methods using color histograms or color moments [4][9]. The perceptual color is then converted to a semantic color name (for example, ‘grass green’, ‘sky blue’). In this way, the ‘semantic gap’ is reduced. Another advantage of the system is that it allows users to perform queries by keyword (for example, ‘find images with sky blue regions’).
The remainder of the paper is organized as follows. In Section 2, we describe our system in detail. Section 3 explains the test data set and the performance evaluation model. Experimental results are given in Section 4. Finally, Section 5 concludes this paper.
2 System Description
Our system includes three components: image segmentation, color naming, and query processing.
2.1 Image Segmentation
Natural scenes are rich in both color and texture, and a wide range of natural images can be considered as a mosaic of regions with different colors and textures. We intend to relate low-level region features to high-level semantics such as color names used in daily life (pink, green, sky blue, etc.) and real-world texture patterns (grass, sky, trees, etc.). For this purpose, we first use ‘JSEG’ [10] to segment images into regions homogeneous in color and texture. Fig. 1 gives a few examples.
2.2 Color Naming
Perceptual Colors: Instead of using traditional color features such as color moments or color histograms [4][9], we define a perceptual color for each segmented region with the intention of relating it to semantic color names.
Although millions of colors can be defined in computer systems, the colors that can be named by users are limited [11]. For example, the first two colors in Fig. 2 correspond to two different points in HSV (Hue, Saturation, Value) space, but users are likely to name them both as ‘pink’. Similarly, both of the next two colors could be named ‘sky blue’. The HSV values (with ranges [0,360], [0,100], [0,100], respectively) of the four colors are given below.
Pink: (H,S,V) = (326, 42,100), (330, 40, 100)
SkyBlue: (H,S,V) = (200, 42, 93), (202, 40,100)
HSV is visually the most natural color space. We define a perceptual color in HSV space for each region and then convert it to a semantic color name. Four different ways to define the perceptual color are studied.
– We use the average HSV value of all the pixels in a region as its perceptual color (referred to as ‘Ave-cl’). This is reasonable, as most regions obtained using JSEG are homogeneous in color.
– The Hue value is circular; for example, both ‘0’ and ‘360’ represent ‘red’. Averaging Hue values may therefore result in a color very different from what we expect: (0+360)/2 = 180, which means the average of two ‘red’ pixels is ‘cyan’. To solve this problem, we first calculate the average RGB value of a region and then convert it to the HSV domain. This result is referred to as ‘RGB-cl’.
– Due to inaccuracy in image segmentation, pixels not belonging to the region of interest might be included in the ‘Ave-cl’ calculation, resulting in a color perceptually different from that of the region. Hence, we consider using the dominant color of a region as its perceptual color. For this, we first calculate the color histogram (10*4*4 bins) of the region and select the bin with the maximum size. The average HSV value of all the pixels in the selected bin is used as the dominant color, referred to as ‘Dm-cl’.
– Considering that the histogram of a region may contain more than one bin of large size, we calculate the average HSV value of all the pixels from the M (M>1) largest bins as the perceptual color. Experimentally, we select all those bins with size no less than 68% of the maximum-size bin. The result is referred to as ‘Dmm-cl’.
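The four definitions above can be sketched in code. This is an illustrative sketch, not the authors' implementation: a region's pixels are assumed to be given as a NumPy array of HSV triples (H in [0,360], S and V in [0,100]), and as RGB triples in [0,1] for ‘RGB-cl’; the function names are ours.

```python
import colorsys
import numpy as np

def _bin_indices(hsv, bins):
    """Quantize HSV pixels into a bins[0] x bins[1] x bins[2] histogram grid."""
    h = np.clip((hsv[:, 0] / 360.0 * bins[0]).astype(int), 0, bins[0] - 1)
    s = np.clip((hsv[:, 1] / 100.0 * bins[1]).astype(int), 0, bins[1] - 1)
    v = np.clip((hsv[:, 2] / 100.0 * bins[2]).astype(int), 0, bins[2] - 1)
    return h, s, v

def ave_cl(hsv):
    """'Ave-cl': plain per-channel mean of the region's HSV pixels.
    Note the circular-Hue problem discussed in the text."""
    return hsv.mean(axis=0)

def rgb_cl(rgb):
    """'RGB-cl': average in RGB first, then convert the mean to HSV,
    avoiding the circular-Hue averaging problem."""
    r, g, b = rgb.mean(axis=0)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # h, s, v each in [0,1]
    return np.array([h * 360.0, s * 100.0, v * 100.0])

def dm_cl(hsv, bins=(10, 4, 4)):
    """'Dm-cl': average HSV of the pixels falling in the largest bin
    of a 10x4x4 HSV histogram."""
    flat = np.ravel_multi_index(_bin_indices(hsv, bins), bins)
    counts = np.bincount(flat, minlength=np.prod(bins))
    return hsv[flat == counts.argmax()].mean(axis=0)

def dmm_cl(hsv, bins=(10, 4, 4), thresh=0.68):
    """'Dmm-cl': average HSV over all bins whose size is no less than
    68% of the maximum-size bin."""
    flat = np.ravel_multi_index(_bin_indices(hsv, bins), bins)
    counts = np.bincount(flat, minlength=np.prod(bins))
    keep = np.where(counts >= thresh * counts.max())[0]
    return hsv[np.isin(flat, keep)].mean(axis=0)
```

On a region of 90 pink pixels contaminated by 10 green background pixels, ‘Ave-cl’ drifts toward green while ‘Dm-cl’ and ‘Dmm-cl’ recover the pink, matching the behavior described below.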
We observed that in most cases the four perceptual colors are very similar, as in Fig. 3(1) and 3(2). However, in some special cases, ‘Ave-cl’ results in a color visually very different from that of the original region. For example, in region 3(a), due to inaccurate segmentation, a small part of the green background (left side) is included in the flower. In addition, some pixels are not pink but dark yellow (at the center of the flower) or gray (between the petals). As a result, the ‘Ave-cl’ in 3(b) turns out to be different from the color of the region in 3(a).
Color Naming: Color naming maps a numerical color space to semantic color names used in natural language. A qualification color naming model is often used, in which the Hue value is quantized into a small set of about 10-20 base color names [12]. In [12], the author uniformly quantized the Hue value into 10 base colors, such as red, orange, yellow, etc. Saturation and Luminance are each quantized into 4 bins, serving as adjectives signifying the richness and brightness of the color. There are two problems with the model used in [12]. Firstly, uniform quantization of the Hue value is not appropriate, as colors in the HSV space are not uniformly distributed (refer to Fig. 4). The reason is that different colors have different wavelength bands; for example, the bands of yellow and blue are 565-590nm and 450-500nm, respectively. The second problem is that in [12], ‘red’
corresponds to Hue values from 0 to 36 (normalized to 0-0.1 in [12]). However, we notice that the Hue of ‘red’ can be around either 0 or 360.
Considering the problems mentioned above, we design a color naming model as follows. Firstly, we define 8 base colors, red, orange, yellow, green, cyan, blue, purple, magenta, with the Hue ranges [0,8) or [345,360], [8,36), [36,80), [80,160), [160,188), [188,262), [262,315), [315,345), respectively. Saturation and Value are quantized into 3 bins as in Fig. 5, with the corresponding adjectives shown in Table 1. The asterisks indicate special cases: when S=0 and V=1, we have ‘grey’; when S=0 and V>80, we have ‘white’; when V=0, we always get ‘black’. Base color names with their adjectives can be simplified to other commonly used color names. For instance, ‘pale magenta’ is named ‘pink’.
Finally, we obtain 8*2*2+3=35 different colors. For example, the first two
colors in Fig. 2 are both named as ‘pink’. Similarly, the other two colors are
named as ‘sky blue’.
In this way, the low-level color features are mapped to high-level semantic
color names, thus reducing the ‘semantic gap’.
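The naming model can be sketched as follows. The Hue ranges are taken from the text above; the S/V split points (50), the adjectives ‘pale’ and ‘dark’, and the ‘pale blue’ to ‘sky blue’ simplification are our assumptions standing in for the details of Fig. 5 and Table 1, which are not reproduced here.

```python
# Hue ranges (degrees) for the 8 base colors, as defined in the text.
HUE_RANGES = [
    ("red",     [(0, 8), (345, 360.001)]),  # 'red' wraps around 0/360
    ("orange",  [(8, 36)]),
    ("yellow",  [(36, 80)]),
    ("green",   [(80, 160)]),
    ("cyan",    [(160, 188)]),
    ("blue",    [(188, 262)]),
    ("purple",  [(262, 315)]),
    ("magenta", [(315, 345)]),
]

def base_hue_name(h):
    """Return the base color name for a Hue value in [0, 360]."""
    for name, ranges in HUE_RANGES:
        for lo, hi in ranges:
            if lo <= h < hi:
                return name
    return "red"  # unreachable given the ranges above

def color_name(h, s, v, s_split=50, v_split=50):
    """Map an HSV triple (H in [0,360], S and V in [0,100]) to a name.
    The split points, adjectives, and simplification table are assumed."""
    # Special cases marked by asterisks in Table 1.
    if v == 0:
        return "black"
    if s == 0:
        return "white" if v > 80 else "grey"
    adj = ("pale " if s < s_split else "") + ("dark " if v < v_split else "")
    full = adj + base_hue_name(h)
    # Simplify some adjective+base combinations to common names.
    return {"pale magenta": "pink", "pale blue": "sky blue"}.get(full, full)
```

With these assumed splits, the two pink samples of Fig. 2 ((326, 42, 100) and (330, 40, 100)) both map to ‘pink’, and the two sky-blue samples map to ‘sky blue’, as the text requires.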
2.3 Query Processing
All database images are segmented into regions, and their low-level color features and color names are stored for retrieval purposes. The system supports different types of queries.
1) Query by specified region - The user selects a region of interest from an image as the query region. The system calculates the low-level color feature and color name of the query region. All images containing region(s) of the same color name are selected to form a candidate set C. The images in C are then ranked according to their EMD [13] distance to the query image. With the region distance defined as the Euclidean distance between region color features, EMD measures the overall distance between two images.
2) Query by keyword - The keyword is selected from the 35 semantic colors defined. In this case, the system returns all images containing region(s) of the same color name as specified by the keyword.
In this paper, we work on the first case, which is more complex.
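A minimal sketch of the ranking step, assuming SciPy is available: each image is summarized as a signature of (weight, HSV color) pairs with weights summing to 1, the ground distance between regions is the Euclidean distance between their color features, and the EMD [13] is solved as a small transportation linear program. The function name and signature layout are ours.

```python
import numpy as np
from scipy.optimize import linprog

def emd(weights1, colors1, weights2, colors2):
    """Earth Mover's Distance between two region signatures.
    Each signature is (region weights summing to 1, region colors);
    the ground distance is Euclidean between region color features."""
    w1, w2 = np.asarray(weights1, float), np.asarray(weights2, float)
    c1, c2 = np.asarray(colors1, float), np.asarray(colors2, float)
    m, n = len(w1), len(w2)
    # Ground-distance matrix d[i, j] = ||c1[i] - c2[j]||.
    d = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)
    # Transportation LP: minimize sum f_ij * d_ij over flows f_ij >= 0,
    # with each source i shipping w1[i] and each sink j receiving w2[j].
    A_eq, b_eq = [], []
    for i in range(m):
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1
        A_eq.append(row); b_eq.append(w1[i])
    for j in range(n):
        col = np.zeros(m * n); col[j::n] = 1
        A_eq.append(col); b_eq.append(w2[j])
    res = linprog(d.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun
```

In use, the candidate set C is first filtered by matching color names, then sorted in ascending order of `emd(query_sig, candidate_sig)`.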
3 Database and Performance Evaluation
The Corel data set is often used to evaluate the performance of image retrieval systems due to its large size, heterogeneous content and available human-annotated ground truth. However, to be used as a test set in an image retrieval system, some pre-processing work is necessary for the following two reasons: 1) some images with similar content are divided into different categories, for example, the images in ‘Balloon1’ and ‘Balloon2’; 2) some ‘category labels’ are very abstract, and the images within the category can vary largely in content. For instance, the category ‘Australia’ includes pictures of city buildings, Australian wild animals, etc. A few examples are given in Fig. 6.
Fig. 1. JSEG segmentation results
Fig. 2. Example colors
Fig. 3. Region perceptual colors (a) original region, (b) ‘Ave-cl’, (c) ‘RGB-cl’, (d)
‘Dm-cl’, (e) ‘Dmm-cl’
Fig. 4. HSV color space (H,S)
Fig. 5. Quantization of S, V
Fig. 6. Example images from category ‘Australia’
Fig. 7. Query images/regions examples
Hence, it is better to select a subset of the Corel images with ground truth data available, or to make some necessary changes when setting the ground truth data. We selected 5,000 Corel images as our test set (ground truth available).
‘JSEG’ segmentation produces 29,187 regions (5.84 regions per image on average) with size no less than 3% of the original image. We ignore smaller regions, considering that regions should be large enough for us to study their texture patterns later.
Precision and recall are often used to measure retrieval performance in CBIR systems. Precision (Pr) is defined as the ratio of the number of relevant images retrieved, Nrel, to the total number of retrieved images, N. Recall (Re) is defined as the number of relevant images retrieved, Nrel, over the total number of relevant images available in the database, Nall. We calculate the average Pr and Re of 30 queries with N = 10, 20, ..., 100, and obtain the Pr~Re curve. A few query images and the specified regions are displayed in Fig. 7.
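These definitions translate directly into code; a small sketch (function names are ours):

```python
def precision_recall(retrieved, relevant):
    """Pr = Nrel / N, Re = Nrel / Nall, following the definitions above."""
    n_rel = len(set(retrieved) & set(relevant))
    return n_rel / len(retrieved), n_rel / len(relevant)

def pr_re_curve(ranked, relevant, cutoffs=range(10, 101, 10)):
    """One (Pr, Re) point per cutoff N = 10, 20, ..., 100,
    yielding the Pr~Re curve for a single query."""
    return [precision_recall(ranked[:n], relevant) for n in cutoffs]
```

Averaging such curves over the 30 queries gives the Pr~Re curves plotted in Fig. 8.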
4 Experimental Results
Firstly, we compare the performance of our RBIR system using ‘Ave-cl’, ‘RGB-cl’, ‘Dm-cl’ and ‘Dmm-cl’, respectively. The Pr~Re curves are given in Fig. 8(a). The results show that ‘Dm-cl’ and ‘Dmm-cl’ perform better than ‘Ave-cl’. ‘RGB-cl’ works better than ‘Ave-cl’ but not as well as ‘Dm-cl’ and ‘Dmm-cl’. In addition, the performance of ‘Dm-cl’ is very close to that of ‘Dmm-cl’. In this work, we use ‘Dmm-cl’. Fig. 9 compares the retrieval results for query 1 using ‘Ave-cl’ and ‘Dmm-cl’.
Our experiments also show that the proposed color naming scheme works better than that used in [12]. Due to space limitations, we do not give the results here.
In addition, we compare our system (denoted as ‘R’) with a CBIR system using a global color histogram (referred to as ‘G’). In system ‘G’, images are represented by their HSV-space color histogram with H, S, V uniformly quantized into 18, 4, 4 bins, respectively. The similarity of two images is measured by the Euclidean distance between their color histograms.
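System ‘G’ can be sketched as follows, assuming as before that an image's pixels are given as a NumPy array of HSV triples (H in [0,360], S and V in [0,100]); the 18*4*4 quantization and the Euclidean distance are those stated above, while the function names are ours.

```python
import numpy as np

def global_histogram(hsv_pixels, bins=(18, 4, 4)):
    """Normalized global HSV histogram with H, S, V uniformly
    quantized into 18, 4 and 4 bins (system 'G')."""
    h = np.clip((hsv_pixels[:, 0] / 360.0 * bins[0]).astype(int), 0, bins[0] - 1)
    s = np.clip((hsv_pixels[:, 1] / 100.0 * bins[1]).astype(int), 0, bins[1] - 1)
    v = np.clip((hsv_pixels[:, 2] / 100.0 * bins[2]).astype(int), 0, bins[2] - 1)
    flat = np.ravel_multi_index((h, s, v), bins)
    hist = np.bincount(flat, minlength=np.prod(bins)).astype(float)
    return hist / hist.sum()  # normalize so image size does not matter

def histogram_distance(h1, h2):
    """Euclidean distance between two image histograms."""
    return np.linalg.norm(h1 - h2)
```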
We observed that ‘R’ works well when the region of interest is recognized and the color names defined can describe it well. For example, in query 2, the query region is the ‘eagle’. ‘R’ recognizes the ‘eagle’ and successfully finds many relevant images. Fig. 10 gives the retrieval results, with ‘R’ returning 8 relevant images within the top 10 retrieved, while ‘G’ finds only 3.
In other cases, such as query 3, both ‘R’ and ‘G’ work well. Due to the large green background in the query image and the relevant database images, the retrieval accuracy of ‘G’ is very high. On the other hand, the color name ‘grass green’ represents the grass region well. Hence, the retrieval performance of ‘R’ is also very good. Among the first 10 images retrieved, the number of relevant images retrieved by both ‘G’ and ‘R’ is 10.
Fig.8(b) compares the performance of ‘G’ and ‘R’ over 30 queries.
Fig. 8. (a) Using different perceptual colors, (b) ‘G’-‘R’ over 30 queries
Fig. 9. Retrieval Results for query 1. The first image is the query image. ‘Q’ refers to
query region. ‘T’ refers to the relevant images selected.
Fig. 10. Retrieval Results for query 2. The first image is the query image. ‘Q’ refers
to query region. ‘T’ refers to the relevant images selected.
5 Conclusions
This paper presents a region-based image retrieval system using high-level semantic color names. For each segmented region, a perceptual color is defined,
which is then converted to a semantic color name using our color naming algorithm. In this way, the system reduces the ‘semantic gap’ between numerical image features and the richness of human semantics. Experimental results confirm the substantially improved performance of the proposed system compared with conventional CBIR systems.
In our future work, we will make use of multiple types of low-level image
features to extract more accurate semantics. We expect the performance of our
system to be further improved.
References
1. C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz, "Efficient and Effective Querying by Image Content," J. Intell. Inform. Syst., vol. 3, no. 3-4, pp. 231-262, 1994.
2. A. Pentland, R. W. Picard, and S. Sclaroff, "Photobook: Content-Based Manipulation of Image Databases," Inter. Jour. Computer Vision, vol. 18, no. 3, pp. 233-254, 1996.
3. W. Y. Ma and B. Manjunath, "Netra: A Toolbox for Navigating Large Image Databases," Proc. of ICIP, pp. 568-571, 1997.
4. J. Z. Wang, J. Li, and G. Wiederhold, "SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture Libraries," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947-963, 2001.
5. A. Mojsilovic and B. Rogowitz, "Capturing Image Semantics with Low-Level Descriptors," Proc. of ICIP, pp. 18-21, 2001.
6. X. S. Zhou and T. S. Huang, "CBIR: From Low-Level Features to High-Level Semantics," Proc. SPIE Image and Video Communication and Processing, San Jose, CA, Jan. 24-28, 2000.
7. Yixin Chen, J. Z. Wang, and R. Krovetz, "An Unsupervised Learning Approach to Content-Based Image Retrieval," IEEE Proc. Inter. Symposium on Signal Processing and Its Applications, pp. 197-200, July 2003.
8. A. W. M. Smeulders, M. Worring, A. Gupta, and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, Dec. 2000.
9. Feng Jing, Mingjing Li, Lei Zhang, Hong-Jiang Zhang, and Bo Zhang, "Learning in Region-Based Image Retrieval," Proc. Inter. Conf. on Image and Video Retrieval (CIVR 2003), 2003.
10. Y. Deng, B. S. Manjunath, and H. Shin, "Color Image Segmentation," Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR '99, Fort Collins, CO, vol. 2, pp. 446-451, June 1999.
11. E. B. Goldstein, Sensation and Perception, 5th Edition, Brooks/Cole, 1999.
12. D. M. Conway, "An Experimental Comparison of Three Natural Language Color Naming Models," Proc. East-West International Conference on Human-Computer Interactions, St. Petersburg, Russia, pp. 328-339, 1992.
13. Y. Rubner, C. Tomasi, and L. Guibas, "A Metric for Distributions with Applications to Image Databases," Proc. of the 1998 IEEE Inter. Conf. on Computer Vision, Jan. 1998.