Does Ontology Help in Image Retrieval? — A Comparison
between Keyword, Text Ontology and Multi-Modality
Ontology Approaches
Huan Wang, Song Liu, and Liang-Tien Chia
Centre for Multimedia and Network Technology, School of Computer Engineering
Nanyang Technological University, Singapore 639798
{wa0004an, pg03988006, asltchia}@ntu.edu.sg
ABSTRACT
Ontologies are effective for representing domain concepts and relations in the form of a semantic network, and many efforts have been made to incorporate ontologies into information matchmaking and retrieval. This trend is further accelerated by ontologies' ability to unite high-level concepts with low-level features. In this paper we compare traditional keyword-based image retrieval with the promising ontology-based image retrieval. To be complete, we construct ontologies not only on text annotation but also on a combination of text annotation and image features. The experiments are conducted on a medium-sized data set of about 4000 images. The results demonstrate the efficacy of combining text and image features in a multi-modality ontology to improve image retrieval.
Categories and Subject Descriptors
H.5.1 [INFORMATION INTERFACES AND PRESENTATION]:
Multimedia Information Systems
General Terms
Algorithms, Performance, Experimentation
1. INTRODUCTION
Image retrieval has always been one of the most active research
fields. Nowadays most popular web image retrieval systems are
based on keyword searching in the surrounding text of images.
However, this kind of retrieval requires adequate text with correct keyword descriptions of the corresponding images, a condition that often fails to hold in practice. Moreover, most purely text-based retrieval systems simply ignore the helpful image features that can be extracted through multimedia analysis, and irrelevant images are retrieved as a result.
Content-based image retrieval (CBIR) has been studied for many years. It extracts image features such as dominant color, color histogram, texture, and object shape. Its main problem is the semantic gap between low-level image features and high-level human-understandable concepts. Although techniques such as relevance feedback (RF) have been proposed to bridge this gap, they require substantial user interaction.
Ontologies are designed to capture shared knowledge and overcome semantic heterogeneity among domains. Knowledge is collected from domain experts and represented in description logics. Machines can understand these uniform representations, so the gap between human-readable knowledge and machine-understandable logic is naturally bridged. A few applications [5] have used ontologies built purely from MPEG-7 feature descriptors. For example, [6] designed ontologies as middle-level structures to bridge the semantic gap between low-level features and high-level concepts, and demonstrated their effectiveness.
The main contribution of this paper is a comparison between the aforementioned traditional approaches and the ontology-based approaches. Although many researchers have tried vector- or tree-structured approaches, no actual application or explicit results have been reported. To carry out this experiment, we designed different ontologies in a selected domain and constructed the corresponding domain knowledge. We compare the retrieval performance of the different approaches and discuss their pros and cons. To the best of our knowledge, no existing work has compared keyword-based retrieval with ontology-based retrieval, or pure text ontology-based retrieval with multi-modality ontology-based retrieval, in the domain of image retrieval.
The rest of this paper is organized as follows: Section 2 introduces some related work. Section 3 focuses on the discussion of the
designed ontology model, which is later used in the comparison.
The experimental results and conclusions are given in Sections 4
and 5 respectively.
2. RELATED WORK
In this section we briefly review previous work on different approaches, including content-based image retrieval (CBIR), text-based image search engines, and some ontology-based retrieval systems. Some of these techniques are applied in our comparison. The most intuitive way to retrieve web images is to use the surrounding textual information. Many text-based image search engines, such as Yahoo(TM) and Google(TM), have been designed and made available on the WWW. These engines search for images by text features such as file name and annotation. In our experiment, we use the Google image search engine to collect domain-specific images, and we keep these images in ranking order according to the Google image results. This set of images and its ranking order serves as the retrieval result of a representative keyword search engine. Besides text information, image content is also a useful resource for retrieval. CBIR captures image features and organizes them into a meaningful retrieval architecture [7]. However, the semantic gap between image features and high-level concepts is still an open issue, and ontology has been validated in practice [4][5][6] to bridge this semantic gap.
Copyright is held by the author/owner(s).
MM’06, October 23–27, 2006, Santa Barbara, California, USA.
ACM 1-59593-447-2/06/0010.
[Figure 1 here: a layered structure linking the Animal Domain Ontology (Section 3.1), the Textual Description Ontology (Section 3.2), and the Visual Description Ontology (Section 3.3) to the Generated Image Concepts (Section 3.4). Pre-defined classes (e.g. Canine, Fox, Wolf, Habitat, Fur, ImageProperty) are connected by relationships such as hasDistribution, hasFur, hasPixColor, hasColor, and hasContent; generated classes such as RedFox and RedWolf instantiate these properties. Ellipses denote pre-defined classes and rectangles denote generated classes.]
Figure 1: Layer structure of Ontologies
Furthermore, attempts [1][3] have been made to combine text information with image features for better retrieval performance. However, these works either depend on domains with uniform or simple image contents, or fail to give explicit experimental results.
3. ONTOLOGY STRUCTURE
In this section we briefly discuss the structures of the text and multi-modality ontologies, on which we base our image retrieval and obtain results comparable with text-based image retrieval. As our main focus is the comparison, we do not include details of ontology construction, which are available at [8]. The experimental domain is canine, a sub-domain of animal. It is a challenging domain due to the animals' varied shapes and complex living environments, and ontology is therefore effective at capturing and integrating the various aspects of information. The ontology is well defined for semantic research scenarios and open to further extension in our future work.

Table 1: Performance of image classification
Classification                                  ACCR
Colorful/Graylike                               0.921
Photograph/Drawing                              0.842
Outdoor/Indoor                                  0.806
HumanRelevantScene/Buildings/Wildlife           0.794
Greenery/Sand/Stone/Snow/Others                 0.814
WhiteFur/RedFur/GrayFur/BrownFur/NonAnimal      0.634
3.1 Animal Domain Ontology
The animal domain ontology is the basis of the text ontology and the multi-modality ontology that follow. It provides semantic information about the taxonomy of the target domain and handles the classification of animal species. This work is usually done by domain experts; in our case, we derive the formal definition and domain knowledge from the BBC Science & Nature Animal category¹, which provides standard, unified descriptions of around 620 animals in various aspects. 20 subspecies under the canine domain are collected as our experimental subjects. We re-define the hyponym relationship between two concepts as a subclass property in this ontology. For example, fox is a kind of canine (hyponymy), so fox is defined as a subclass of canine in the animal domain ontology. A motivating example: without this domain ontology, a dhole image would not be returned for a search for wild dog, while it actually should be.

3.2 Textual Description Ontology
The textual description ontology is purely text-based and encapsulates high-level narrative animal descriptions. Through this ontology, each animal is associated with its domain knowledge, which is why it captures semantic interpretations from different contexts better than a single keyword. Several classes have been defined, such as "ScientificName", "Diet", "Habitat", "Distribution" and "ColorDescription", and semantic relationships have been generated to connect the different concepts, including "hasName", "hasDiet", "hasHabitat", "hasDistribution" and "hasColorDescription". The class and relationship definitions of this ontology are likewise extracted from the BBC Science & Nature Animal category. We also generate general knowledge ontologies, such as a geographical ontology and a color ontology, which are associated with the text ontologies by the relationships defined above.
¹ http://www.bbc.co.uk/nature/wildfacts/animals_a_z.shtml
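The subclass reasoning of Section 3.1 and the typed relations of Section 3.2 can be sketched in plain data structures. This is an illustrative sketch, not the paper's OWL/description-logic implementation; the class and property names follow the paper, while the helper functions (`is_subclass`, `retrieve`) and the concrete data values are hypothetical.

```python
# Hypothetical sketch of the animal domain ontology (Section 3.1) and the
# textual description ontology (Section 3.2) as plain Python structures.

# Domain ontology: each hyponym pair becomes a subclass edge (child -> parent).
SUBCLASS_OF = {
    "Fox": "Canine",
    "RedFox": "Fox",
    "CapeFox": "Fox",
    "WildDog": "Canine",
    "Dhole": "WildDog",
}

def is_subclass(cls, ancestor):
    """Transitive subclass test: walk child -> parent edges to the root."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

# Textual description ontology: each animal concept carries typed relations.
TEXT_ONTOLOGY = {
    "RedFox": {"hasDistribution": "USA", "hasColorDescription": "Red"},
    "RedWolf": {"hasDistribution": "Asia", "hasColorDescription": "Brown"},
    "Dhole": {"hasHabitat": "Forest"},
}

def retrieve(query_class):
    """Return all concepts subsumed by the query class."""
    return sorted(c for c in SUBCLASS_OF if is_subclass(c, query_class))
```

Here `retrieve("WildDog")` returns `["Dhole", "WildDog"]`, matching the motivating dhole example above: the subclass edges supply the hyponym knowledge that a bare keyword match lacks.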
3.3 Visual Description Ontology
Besides the single-modality textual description ontology, we define a multi-modality ontology that combines textual descriptions with image features. First we build a specific knowledge base in which the classes and relationships are extracted from low-level features. We then incorporate this visual description ontology with the aforementioned text ontology into the multi-modality ontology. The combined ontology works better on images with loosely coupled text annotation.
We formulate each image classification scheme as a class in the
ontology and define the image categories under each classification scheme as its subclasses. These classes include GreyLikeImage, ColorImage, ContentType, OutdoorScene, IndoorScene, BuildingRelevant, HumanRelevant, WildlifeScene and FurColor. A complete list of relationships extracted from low-level features follows: "hasPixColor", "hasPixProp", "hasEnvironment", "hasContent" and "hasFur". We thus have high-level descriptions generated not only from textual information but also from low-level image attributes.
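As an illustration, the mapping from per-scheme classifier outputs to the visual description relations listed above might be sketched as follows. The relation names follow Section 3.3 and the label sets mirror the classification schemes of Table 1, but the `VISUAL_RELATIONS` table and the `image_to_visual_concept` helper are hypothetical names of ours, not the paper's implementation.

```python
# Hypothetical sketch: turning low-level classifier outputs (one per
# classification scheme of Table 1) into visual description ontology
# properties. The classifiers themselves (SVMs in the paper) are stubbed out.

VISUAL_RELATIONS = {
    "hasPixColor":    ["Colorful", "Graylike"],
    "hasPixProp":     ["Photograph", "Drawing"],
    "hasEnvironment": ["Outdoor", "Indoor"],
    "hasContent":     ["HumanRelevantScene", "Buildings", "Wildlife"],
    "hasFur":         ["WhiteFur", "RedFur", "GrayFur", "BrownFur", "NonAnimal"],
}

def image_to_visual_concept(classifier_outputs):
    """Map each scheme's predicted label to an ontology property,
    skipping schemes whose label is not a known category."""
    concept = {}
    for relation, labels in VISUAL_RELATIONS.items():
        label = classifier_outputs.get(relation)
        if label in labels:
            concept[relation] = label
    return concept
```

The resulting property dictionary is the per-image "visual concept" that the multi-modality ontology combines with the textual description.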
3.4 Examples of the Generated Classes
Figure 1 shows the structure of the canine ontology in our system. Ellipses and rectangles represent pre-defined classes and generated classes respectively, and the horizontal line in the middle separates the two kinds. Part of the ontology is omitted for lack of space. Two examples, red fox and red wolf, show how we define concrete animal concepts using the ontology. Both are generated classes under the superclass canine. From the textual description ontology, we know the red fox is distributed in the USA and the red wolf in Asia. The visual description ontology indicates that the fur color of the red wolf is brown while that of the red fox is red, even though in a query they share the same keyword red. The visual description information helps filter out a majority of inaccurate results. For instance, from an indoor background we can reasonably infer that a wild cape fox is unlikely to appear in the image.
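The kind of filtering just described, where visual facts rule out implausible animal concepts, could be sketched as a small rule set. This is an illustrative stand-in, not the paper's actual mechanism (semantic matchmaking there is delegated to a description-logic reasoner); the `EXCLUSION_RULES` table and `plausible` helper are hypothetical.

```python
# Hypothetical sketch of Section 3.4's filtering idea: each rule maps a
# visual description fact to the animal concepts it makes implausible.

EXCLUSION_RULES = {
    # A wild cape fox is unlikely to appear in an indoor scene.
    ("hasEnvironment", "Indoor"): {"CapeFox"},
    # Brown fur contradicts the red fox (its fur is red; the red wolf's is brown).
    ("hasFur", "BrownFur"): {"RedFox"},
}

def plausible(candidates, visual_facts):
    """Drop candidate concepts contradicted by any visual fact."""
    excluded = set()
    for fact in visual_facts.items():
        excluded |= EXCLUSION_RULES.get(fact, set())
    return sorted(set(candidates) - excluded)
```

For an image classified as indoor with brown fur, both CapeFox and RedFox would be filtered out, leaving RedWolf as the plausible interpretation.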
[Figure 2 here: four panels, for Arctic Fox, Bush Dog, Coyote and Ethiopian Wolf, plotting the number of correct images retrieved against the number of images retrieved (in ranking order, up to 200) for Google, the text ontology, the multi-modality ontology and the optimal result.]
Figure 2: A comparison of image retrieval results between different approaches (1)
4. EXPERIMENTAL RESULTS
In the experiment we compare the ontology-based image retrieval systems with Google Image Search, which is among the best keyword-based search engines and handles over 2 billion images. The experimental data set consists of 4000 images in total: the top 200 Google images for each of the 20 canine subspecies. Google Image Search is used because it is accessible, so other researchers can easily compare their experimental results with our performance. We use only the top 200 ranked results because these images are statistically and visually higher in significance and ranking. The web images and their web pages are downloaded by our image crawler. The results of low-level feature extraction using SVMs, which are used to build the multi-modality ontology, are shown in Table 1. For the comparison we evaluate Google Image Search against text ontology-based retrieval and multi-modality ontology-based retrieval. For semantic matchmaking of the ontologies, we choose RACER version 1.9 [2] as our reasoner, since it provides consistency checking of the knowledge base, computes entailed knowledge via resolution, and processes queries through complex reasoning.
To evaluate the performance of high-level information extraction based on low-level features, we list the average correct classification rates (ACCR) in Table 1. In each classification, one third of the data, randomly selected, is used for training and the rest for testing. We repeat each classification 10 times and average the results to obtain the ACCR. The last classification scheme does not achieve very good performance because the fur color of a fox is affected by changes in illumination and viewing angle.
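The evaluation protocol just described (ten repetitions, each with a random one-third training / two-thirds test split, accuracies averaged into the ACCR) can be sketched as below. A trivial nearest-mean classifier on one-dimensional features stands in for the paper's SVMs, and the function and variable names are ours, not the paper's.

```python
# Sketch of the ACCR protocol: average accuracy over repeated random
# one-third train / two-thirds test splits. The classifier is a stub.

import random

def accr(samples, labels, runs=10, seed=0):
    rng = random.Random(seed)           # fixed seed for reproducibility
    data = list(zip(samples, labels))
    rates = []
    for _ in range(runs):
        rng.shuffle(data)
        cut = len(data) // 3            # one third for training
        train, test = data[:cut], data[cut:]
        # Nearest-mean stub: one mean feature value per class.
        groups = {}
        for x, y in train:
            groups.setdefault(y, []).append(x)
        means = {y: sum(v) / len(v) for y, v in groups.items()}
        correct = sum(
            1 for x, y in test
            if min(means, key=lambda c: abs(means[c] - x)) == y
        )
        rates.append(correct / len(test))
    return sum(rates) / runs
```

The same averaging over runs applies regardless of the underlying classifier; swapping the nearest-mean stub for an SVM would reproduce the protocol behind Table 1.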
4.1 Keyword versus Text Ontology
We apply semantic matchmaking to the 200 top-ranking Google images with web pages and present the overall performance of the different approaches in Figure 2. In this test, the 200 images in each category include the training and test data used in the image classification test above. We can see that the overall performance of textual description ontology retrieval is slightly better than keyword-based search. However, text ontology retrieval is still hampered by the lack of text information within the web page. For example, if no related concept or relationship can be extracted from the surrounding text, the generated class of the image is void. The result ranking is based on the degree of match: exact match, subsume match and disjoint. As a void class is disjoint with every pre-defined canine class, such an image is ranked low in the final result even if it is correct.

4.2 Keyword, Text Ontology versus Multi-Modality Ontology
From the figure, we can see that the multi-modality ontology-based retrieval outperforms the others, returning more relevant images at higher ranks. In the best case, arctic fox, the multi-modality ontology-based retrieval almost overlaps the optimal blue line, which returns the N correct images in the first N ranking positions. This result benefits greatly from the high accuracy of image feature classification for WhiteFur, whose ACCR is 0.826 (this value differs from the one in Table 1, which averages over all fur types). However, in most cases there remain gaps between the multi-modality ontology results and the optimal results. We presume this could be due to one or more of the following reasons. First, the performance can be affected by the accuracy of image feature classification. Second, the lack of text information in the web pages yields less correspondence in the text ontology and multi-modality ontology. Third, as we have not yet completed a study of the accuracy of rule-based engines and reasoners, we are not sure whether the reasoner we use provides the best matchmaking result.
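The three-level ranking described in Section 4.1 (exact match, then subsume match, then disjoint, with void classes treated as disjoint) can be sketched as follows. In the paper this matchmaking is performed by the RACER reasoner over description logics; here subsumption is a simple walk up a toy subclass hierarchy, and all names are hypothetical.

```python
# Hypothetical sketch of the exact / subsume / disjoint match ranking.

SUBCLASS_OF = {"RedFox": "Fox", "Fox": "Canine",
               "GreyWolf": "Wolf", "Wolf": "Canine"}

def ancestors(cls):
    """All superclasses of cls in the toy hierarchy."""
    seen = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        seen.append(cls)
    return seen

def match_degree(generated_cls, query_cls):
    """0 = exact, 1 = subsume, 2 = disjoint (incl. a void generated class)."""
    if generated_cls == query_cls:
        return 0
    if generated_cls is not None and query_cls in ancestors(generated_cls):
        return 1
    return 2

def rank(images, query_cls):
    """Stable sort: images with a better (lower) match degree come first."""
    return sorted(images, key=lambda img: match_degree(img["class"], query_cls))
```

An image whose generated class is `None` (no concept extracted from the surrounding text) always falls into the disjoint bucket, which is exactly why correct images with sparse annotation end up ranked low.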
4.3 Comparison of Precision Results
In Web image search and retrieval systems, precision is a very important measure of performance, because most users browse only a limited number of results; the recall rate of a Web image retrieval system is therefore less crucial. In Figure 3, we show the number of correct images retrieved in the top 20, 40, 60 and 80 results for all 20 canine subspecies. The 20 subspecies are: 1. aardwolf, 2. African wild dog, 3. bat-eared fox, 4. black jackal, 5. cape fox, 6. arctic fox, 7. gray fox, 8. red fox, 9. kit fox, 10. bush dog, 11. coyote, 12. dhole, 13. dingo, 14. Ethiopian wolf, 15. fennec fox, 16. golden jackal, 17. gray wolf, 18. maned wolf, 19. red wolf and 20. spotted hyena. From the figure, we can see that for the top 20 results, nearly all images retrieved by the multi-modality ontology are correct. In most cases, ontology-based image retrieval achieves better precision than keyword-based image search. Since we implement only a generic image classification mechanism not particularly designed for the target domain, Table 1 suggests that retrieval based on image classification results alone may fail to outperform normal text-based image retrieval. However, by combining high-level textual information with low-level image features, we are able to improve the retrieval precision by about 5 to 30 percent.

[Figure 3 here: four panels plotting, for each of the 20 animal subspecies, the number of correct images retrieved in the top 20, top 40, top 60 and top 80 results for Google, the text ontology and the multi-modality ontology.]
Figure 3: A comparison of image retrieval results between different approaches (2)
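The quantity plotted in Figure 3 is simply the number of relevant results among the top k, which divided by k gives precision at k. A minimal sketch, with hypothetical helper names:

```python
# Precision at k: Figure 3 plots the correct count; dividing by k gives
# the precision used to compare the three systems.

def correct_at_k(ranked_relevance, k):
    """Count relevant results (1s) among the top k of a ranked 0/1 list."""
    return sum(ranked_relevance[:k])

def precision_at_k(ranked_relevance, k):
    return correct_at_k(ranked_relevance, k) / k
```

The reported 5 to 30 percent improvement corresponds to the difference in `precision_at_k` between the multi-modality ontology ranking and the keyword-based ranking at the cutoffs shown in Figure 3.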
5. CONCLUSION
This paper has presented a comparison between keyword-based image retrieval and ontology-based image retrieval. The different approaches have their own pros and cons. The keyword-based approach is user friendly and easy to apply with acceptable retrieval precision, while the semantically rich ontology addresses the need for complete image descriptions and improves retrieval precision. However, the lack of text information that affects the keyword approach remains a problem for the text ontology approach; ontology works better in combination with image features. Although there is a trade-off between complexity and performance, ontology is still a viable choice when better performance is expected over a smaller result set.

6. REFERENCES
[1] Y. A. Aslandogan, C. Thier, C. T. Yu, J. Zou, and N. Rishe. Using semantic contents and WordNet in image retrieval. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 286-295, 1997.
[2] V. Haarslev and R. Moller. RACER system description. In Proceedings of the International Joint Conference on Automated Reasoning, pages 701-705, 2001.
[3] S. Hammiche, S. Benbernou, M.-S. Hacid, and A. Vakali. Semantic retrieval of multimedia data. In MMDB, pages 36-44, 2004.
[4] B. Hu, S. Dasmahapatra, P. Lewis, and N. Shadbolt. Ontology-based medical image annotation with description logics. In Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, pages 77-82, 2002.
[5] J. Hunter. Adding multimedia to the Semantic Web - building an MPEG-7 ontology. In International Semantic Web Working Symposium, August 2001.
[6] S. Liu, L.-T. Chia, and S. Chan. Ontology for nature-scene image retrieval. In On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE, pages 1050-1061, June 2004.
[7] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Trans. PAMI, 22(12):1349-1379, December 2000.
[8] H. Wang, S. Liu, C. Zhou, and L.-T. Chia. Ontology Construction. Available at: http://cemnet.ntu.edu.sg/pet device/wanghuan/Ontology%20Construction.pdf, April 2006.