Computers and the Humanities 33: 293–318, 1999.
© 1999 Kluwer Academic Publishers. Printed in the Netherlands.
Access to Pictorial Material: A Review of Current
Research and Future Prospects
CORINNE JÖRGENSEN
School of Information and Library Studies, 534 Baldy Hall, University at Buffalo, Buffalo, NY
14260-1020, USA (E-mail: [email protected])
Abstract. Rapid expansion in the digitization of images and image collections has vastly increased
the numbers of images available to scholars and researchers through electronic means. This research
review will familiarize the reader with current research applicable to the development of image
retrieval systems and provides additional material for exploring the topic further, both in print
and online. The discussion will cover several broad areas, among them classification and indexing
systems used for describing image collections and research initiatives into image access focusing on
image attributes, users, queries, tasks, and cognitive aspects of searching. Prospects for the future of
image access, including an outline of future research initiatives, are discussed. Further research in
each of these areas will provide basic data which will inform and enrich image access system design
and will hopefully provide a richer, more flexible, and satisfactory environment for searching for and
discovering images. Harnessing the true power of the digital image environment will only be possible
when image retrieval systems are coherently designed from principles derived from the fullest range
of applicable disciplines, rather than from isolated or fragmented perspectives.
Key words: content-based retrieval, image databases, image indexing, image retrieval
1. Introduction
Rapid expansion in the digitization of images and image collections (and their
associated indexing records) has vastly increased the numbers of images available
to scholars and researchers through electronic means, and access to collections of
images is a research focus of several of the Digital Library research initiatives
underway at universities across the nation. Current interest in accessing visual
materials is also spurred by the use of multimedia in education and entertainment and an accompanying commercial sector investment in profitable large-scale
imagebases aimed towards the publishing and entertainment industries.1 Other
large scale imagebases include newswire collections and collections such as a
patent and trademark imagebase containing over 600,000 images.
These and other large-scale imagebases require new techniques and applications
for retrieval and have spurred new initiatives aimed at improving access to collections of visual materials. Current research is focusing on several broad areas: image
indexing and classification; image users and image uses; and machine methods
for image parsing. There are a number of theoretical and practical questions to
be addressed within each research area. A lingering theoretical and philosophical
question for image indexing and classification is the appropriateness of text-based
retrieval for materials in cognitive modalities other than text. Classification questions include the range of image attributes which needs to be represented and
balancing the representation of domain-specific images meeting specialized needs
with the need for a generalized, domain-spanning indexing system for images
(paralleling a similar problem in natural language processing). On a practical
implementation level, there is the question of how to balance the access needs of
users with time and money constraints on the classification process itself.
In the area of users and uses, there is a great need for research leading to a
better understanding of the cognitive processes involved in searching for an image,
as well as the fundamental question of the meaning of similarity in a visual perceptual environment. Although the utilization of computer methods for retrieval of
fundamental image attributes such as color and texture is an active and well-funded
area for research, it is not yet clear exactly what the appropriate role for these
methods is in relation to user needs. An additional problem area is the ability
to share image information among different organizations and across different
computing platforms. While the application of computerized methods for retrieval
is fundamental across the various research initiatives, a concern is the lack of
communication among researchers in these different areas. Discussions of indexing
and classification systems and of user needs/uses remain generally in the library
and information science (LIS) literatures, while research in machine methods is
reported in conferences such as Photonics West and journals in computer science.
With the exception of a few researchers and projects, neither community appears
to be actively drawing upon other potentially useful literatures (such as cognitive
science) to develop and specify systems for accessing images.
This research review will familiarize the reader with current research applicable to the development of image retrieval systems. The discussion will cover
several broad areas, among them classification and indexing systems used for
describing image collections; research initiatives into image access; and prospects
for the future of image access, including an outline of future research initiatives.
Harnessing the true power of the digital image environment will only be possible
when image retrieval systems are coherently designed from principles derived from
the fullest range of applicable disciplines, rather than from isolated or fragmented
perspectives. The following, it is hoped, will provide a foundation for further
exploration and will be a touchstone for further discussion and collaboration among
the diverse communities of researchers concerned with image access and retrieval.
2. Classification Systems
Providing access to collections of images has long been a problem for the library
community, and through the years, librarians and archivists have dealt with this
problem by creating classification systems for images, subject heading lists,
thesauri, or other lists of index terms from which assignment of textual indexing
terms takes place. These systems were based on expertise gained through years
of working with print image collections and generally reflected personal knowledge, the subject requirements of collections, or the needs of specific groups of
users.
One of the earliest was developed with the goal of universality of subject
coverage and places pictorial materials into three broad categories: history, art, and
science (Simons and Tansey, 1970). Some of these classification systems focus on
a specific area, which may be limited in scope by discipline, format, style or period,
or other criteria of interest to a narrow research community. Others are designed
to be applicable in more generalized collections of images but are frequently influenced by historical or art historical considerations such as subjects/themes or the
iconographical interpretation of images.
A survey of these early guides reveals a variety of approaches, even within a
single scheme. These classification systems offer few or no scope notes and are an
eclectic mix of subject, iconography, time period, and various other access points.
The schemes range from the seemingly very simple, with only nine major headings
(Evans, 1980), to more complex ones in which a heading such as “nature” (a single
main heading in Evans) is broken into two main headings, depending on whether
nature is “wild” or “domesticated” (Green, 1984). Often the impetus for codifying
these schemes in print form was the realization that the knowledge used to organize
extensive picture collections resided with one person in a particular setting and
could not easily be transmitted to new personnel, should the need arise (Gilbert,
1973).
Hourihane (1989) provides a useful review of a number of classification systems for images and objects, especially for fine arts, and describes the
wide variety found in approaches and structure. For instance, systems may employ
subject headings or a thesaurus structure (LC Thesaurus for Graphic Materials;
Princeton Index of Christian Art; Art and Architecture Thesaurus); may employ
several levels of classification; and may be hierarchical or non-hierarchical within
levels (A Subject Index for the Visual Arts, Yale Center for British Art). Natural
language, numerical notation, codes, or a combination of several of these are used
to describe images. Natural language-based systems may either use a controlled
vocabulary or free-text indexing using detailed descriptions.
2.1. GENERAL SYSTEMS
In response to the needs of image indexers and users, several larger-scale initiatives have been undertaken recently to develop more generally applicable image
indexing systems. The two best-known and most widely used of these systems
in the United States are the Art and Architecture Thesaurus and the Library of
Congress Thesaurus for Graphic Materials (LCTGM).
2.1.1. Art and Architecture Thesaurus
Work on the Art and Architecture Thesaurus (AAT) began in 1979 and the first
edition was published in 1990. The AAT is a controlled vocabulary for describing
and retrieving information on fine art, architecture, decorative art, and material
culture. The thesaurus was inspired by the problems encountered by an architectural historian and its original intent was to create a “Universal Access System” for
slides. From these beginnings, the project evolved into the idea of a broader access
tool in the form of a hierarchically-structured thesaurus based on the collaboration
of scholars in the field.
The development of the AAT also relied on input from cataloguer surveys as
well as expert opinion. It is used in collections ranging from a few hundred to over
a million items, and organizations utilizing it include slide, drawing, and photograph collections, archives, museums, libraries, indexing services, architecture and
design firms, and art and architecture dictionary and encyclopedia projects (AAT
21–22).
Work on the AAT also spanned the move from print to electronic catalogs, and
computerized versions of the thesaurus have evolved from a flat ASCII data file
with no retrieval software to single-user software running on a PC with database
and word-processing functions (Art & Architecture Thesaurus: Authority Reference Tool, Version 2.1). A machine-readable version can also be imported into
existing or developing applications, allowing its integration beyond a stand-alone
tool for data entry and searching. It is also available as an online service through
The Research Libraries Information Network (RLIN) and the Canadian Heritage
Information Network (CHIN), as well as on the World Wide Web as the Art
& Architecture Thesaurus Browser (http://www.gii.getty.edu/aat_browser/), which
allows browsing and searching of terms by exact match or keyword match in scope
notes or other fields and display of hierarchies.
Several interesting online research and demonstration projects are utilizing
the Art & Architecture Thesaurus. One demonstration project currently under
development within the Getty Information Institute’s Technology Research &
Development group is Arthur (ART media and text Hub and Retrieval System,
available at http://www.aphip.getty.edu/arthur). Arthur uses the AMORE image
system, developed by NEC USA, Inc. to index and search 30,000 images and
associated text of several hundred selected Websites. Images can be retrieved by
image similarity, contextual similarity (utilizing text in the web page near the
image), or by keywords in the web pages. Additional access is provided through
the use of the Getty Information Institute’s Vocabularies, which include the AAT,
The Union List of Artist Names (ULAN), and The Getty Thesaurus of Geographic
Names (TGN).
Two other examples of online projects utilizing the AAT are the GRASP project
(http://arttic.com/GRASP/public/News03/News03/GRASP-Ontology.html), a system for the description of stolen or recovered art items, and The National Graphic
Design Image Database at Cooper Union (http://ngda.cooper.edu/#info), under
development at the Herb Lubalin Study Center of Design and Typography in the
School of Art. The project aims to build a virtual visual encyclopedia through an
electronic community of students and educators, and the site is intended to explore
issues related to the history and theory of graphic design, especially the expansion
of the vocabulary used to describe graphic design.
The AAT in its latest edition has evolved into a complex tool with seven
broad facets (Associated Concepts, Physical Attributes, Styles and Periods, Agents,
Activities, Materials, and Objects) encompassing thirty-three hierarchies. Because it had
earlier been noted that cataloging slide collections with the first edition of the
Art and Architecture Thesaurus took additional time (Besser and Snow, 1990, p. 226),
a companion volume, the Guide to Indexing and Cataloging with the Art
and Architecture Thesaurus, was published simultaneously with the second edition
of the AAT by Oxford University Press. Soergel (1995) provides a thorough review
of the AAT and describes it as a significant achievement, yet concludes that in
order to reach its full potential, it needs a painstaking and thorough restructuring
to create a more useful polyhierarchical network of concept relationships.
Rather than using a specialized classification system, librarians and picture
archivists have sometimes tried to apply an existing general classification system,
such as the Dewey Decimal System or the Library of Congress Subject Headings,
to an image collection. These attempts have generally not been successful (Enser,
1993; Frost, 1996) leading to the conclusion that such systems provide a very
sparse language for image indexing, particularly in image content description. This
recognition stimulated further efforts to create generalized systems which will be
adequate for images.
2.1.2. LCTGM
As Parker (1987, pp. v–viii) describes in the introduction to the first edition, the
LCTGM was created in response to requests from catalogers for terms that could
be used for image description. It “provides a substantial body of terms for subject
indexing of pictures, particularly large general collections of historical images” and
“offers catalogers a controlled vocabulary for describing a broad range of subjects,
including activities, objects, and types of people, events, and places depicted in still
pictures.” It does not aim to provide coverage of art historical and iconographical
concepts but does supply terms for some abstract ideas. The main source of terminology for the LCTGM is the Library of Congress Subject Headings, but terms
are also drawn from the Art and Architecture Thesaurus, the Legislative Indexing
Vocabulary, and all terms found in the thesaurus Descriptive Terms for Graphic
Materials: Genre and Physical Characteristics Headings (GMGPC), to be used in
cases where types and formats of graphic materials are the subjects of images. The
second edition has added additional terminology and has separated out the genre
and physical characteristics headings for the convenience of the user.
The LCTGM is strongly reflective of the LCSH subject headings, with almost
30% of its terms covering historical concerns such as industrial, political, and
social development (Jörgensen, 1996). It contains some terms relating to abstract
concepts, emotions, and thematic and “story” elements, as well as a large number
of “object” terms. Greenberg (1993), in a comparison of the Art and Architecture Thesaurus with The Library of Congress Thesaurus for Graphic Materials,
notes the difference between “terms” in a thesaurus, which represent a single
concept and may be coordinated at the time of use, and “subject headings,”
which name a subject and are precoordinated. The high degree of precoordination of the LCTGM constrains application of many of its terms, and while it
serves its purpose of providing historical access to pictorial materials, especially
those concerned with the history of the United States, it appears to lack utility
for more general applicability to a wider variety of pictorial materials, in which
a broader range of attributes may be important. A research project utilizing an
enhanced organization of LCTGM was conducted at the Institute for the Learning
Sciences, Northwestern University (Gordon, 1998). The Deja Vu Project created
a browsing system using new links among LCTGM terms called “Expectation
Packages,” which group index terms together in schema-like arrangements under
five main categories of Events, Places, People, Things, and Miscellaneous, with
the event being the locus of the organization. For instance, the components of
an Expectation Package for “Going to a pawnshop to hock some valuable items”
contain:
Events: Secondhand sales, Usury
Places: Pawnshops
People: Poor persons
Things: Clocks & watches, High-fidelity sound systems, Jewelry, Musical instruments
Misc: Debt, Poverty
(Gordon, 1998, p. 403)
The system was tested at two locations using the LCTGM to access visual
materials, the Library of Congress Prints and Photographs Division, and the North
Dakota Institute for Regional Studies in Fargo, ND. The research found three
benefits to the system: improved access for subject content searches, bringing the
thesaurus to the front of the search process, and making digital materials viewable
from the search tool. Disadvantages included the limitations of a single thesaurus
for access, the need for traditional query-based search tools, and the subjectivity
of the Expectation Packages. Other controlled vocabularies would require Expectation Packages with different slots, and further research is needed to determine
if in fact different Expectation Packages would be required for very specialized
domains, such as medicine. Nevertheless, the system demonstrates that there may
be utility in the application of cognitively-derived organizational structures to
existing vocabularies.
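The grouping of index terms into slots around an event can be pictured as a small data structure. The following is a minimal sketch only, not the Deja Vu Project's implementation: the class name, the field names, and the `related_terms` helper are hypothetical, and only the terms of the pawnshop package are taken from Gordon (1998).

```python
from dataclasses import dataclass, field


# Hypothetical model of an Expectation Package: five slots of index terms
# grouped around one event description. All names here are assumptions.
@dataclass
class ExpectationPackage:
    description: str
    events: list = field(default_factory=list)
    places: list = field(default_factory=list)
    people: list = field(default_factory=list)
    things: list = field(default_factory=list)
    misc: list = field(default_factory=list)

    def all_terms(self):
        """Return every index term in the package, across all five slots."""
        return self.events + self.places + self.people + self.things + self.misc


def related_terms(packages, term):
    """Return the terms that share a package with `term`, excluding `term`."""
    found = set()
    for package in packages:
        terms = package.all_terms()
        if term in terms:
            found.update(t for t in terms if t != term)
    return sorted(found)


pawnshop = ExpectationPackage(
    description="Going to a pawnshop to hock some valuable items",
    events=["Secondhand sales", "Usury"],
    places=["Pawnshops"],
    people=["Poor persons"],
    things=["Clocks & watches", "High-fidelity sound systems",
            "Jewelry", "Musical instruments"],
    misc=["Debt", "Poverty"],
)

# Browsing from one term surfaces the other terms of the same scenario.
print(related_terms([pawnshop], "Pawnshops"))
```

A browsing interface built on such a structure would let a searcher who starts from a place term move sideways to the events, people, and things expected in the same scenario, which is the kind of link a strictly hierarchical thesaurus does not supply.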
2.2. “ONTOLOGIES” OR A CLASSIFICATION BY ANY OTHER NAME ...
Research in “Content-Based Retrieval” (discussed more fully below) concentrates
on the application of machine methods to lower levels of attributes (e.g. color,
texture) for indexing and retrieval of images. Of interest to the discussion on classification systems is the fact that this community of researchers is recognizing that
some form of textual classification is a useful adjunct to these online systems. This
community of researchers frequently refers to a classification system as an “ontology” or “world model.” A formal ontology is populated with concepts, among
which are established relations, typically of the types “is,” “has,” and “belongs to.”
Domeshek et al. (1996) note that much work has been done in the area of ontology creation, but ontologies generally remain “difficult to build, to agree upon,
and to specify in directly reusable fashion” and suggest semi-formal specifications of ontologies composed of both free text and formal representation. While a
number of content-based image retrieval systems describe the use of an “ontology,”
those in the LIS community will recognize these ontologies as simple hierarchical
classification systems.
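The point that such an ontology reduces to a hierarchy can be illustrated with a toy example. The sketch below stores concepts and relations as triples and follows the “is” links upward; the concepts are invented for illustration, only the three relation types come from the discussion above, and the function names are assumptions rather than part of any system reviewed here.

```python
# Toy ontology as (subject, relation, object) triples. The concepts are
# invented for illustration; only the relation types come from the text.
TRIPLES = [
    ("oak", "is", "tree"),
    ("tree", "is", "plant"),
    ("tree", "has", "leaves"),
    ("oak", "belongs to", "forest scenes"),
]


def objects(subject, relation):
    """Return the objects directly linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def ancestors(concept):
    """Follow "is" links upward to collect every broader concept."""
    result = []
    frontier = objects(concept, "is")
    while frontier:
        current = frontier.pop(0)
        if current not in result:
            result.append(current)
            frontier.extend(objects(current, "is"))
    return result


print(ancestors("oak"))         # broader concepts reached through "is"
print(objects("tree", "has"))   # parts attached through "has"
```

Read this way, the chain of “is” links is simply a broader-term hierarchy of the kind a thesaurus already provides.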
As an example, the creators of the WebSEEk image retrieval system found that
their users primarily initiate subject-based queries and prefer to navigate within
a clearly defined hierarchical semantic space. They created an ontology composed
of more than 2000 classes using a multilevel hierarchy. Their system gathers
candidate terms for the ontology by searching the meta-tags and URLs associated with online images; these terms are then verified with human assistance.
However, a closer look at the ontology reveals problems with mixed hierarchical and semantic levels. For instance, the top level contains terms referring to
subjects and topics (insects, music), format (photography, photographs), and genre
forms (humor). Following links to lower levels (nature) finds the same problems
(mixed subjects and formats such as rivers and videos). Systems with mixed
organizational terms such as these often prove confusing and frustrating to the
user.
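The term-gathering step described above, in which candidate terms are drawn from the URLs of online images, might look roughly like the following. This is a sketch under stated assumptions and not WebSEEk's actual procedure: the tokenization rule and the `candidate_terms` function are hypothetical, and the example address uses a placeholder domain.

```python
import re
from urllib.parse import urlparse


def candidate_terms(url):
    """Split a URL's path into lowercase alphabetic tokens as candidate terms."""
    path = urlparse(url).path
    # Drop a trailing file extension such as ".jpg" before tokenizing.
    path = re.sub(r"\.[A-Za-z0-9]+$", "", path)
    # Keep runs of letters only; digits and punctuation act as separators.
    tokens = re.findall(r"[A-Za-z]+", path)
    seen = []
    for token in tokens:
        token = token.lower()
        if token not in seen:
            seen.append(token)
    return seen


# example.com is a placeholder domain, not a site from the study.
print(candidate_terms("http://www.example.com/nature/rivers/colorado_river01.jpg"))
```

The output of such a step mixes directory names, subjects, and file-naming habits, so human verification remains necessary before a term enters the classification.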
Thus, the application of classificatory techniques in these systems is rather
a “good news, bad news” scenario. While it is heartening that classification is
being recognized as a value-added process in automated image retrieval systems,
it is disheartening that many creators of these systems are not turning to the
community with the most expertise and experience in the creation of classification
systems. End-users will not continue to use systems which seem arbitrary and
illogical, no matter how initially appealing their automated retrieval capabilities
may be.
2.3. CLASSIFICATION BY COMPUTER
As mentioned above, there is a basic question as to whether text-based indexing is
the most appropriate retrieval mechanism for materials in different modalities, such
as images, video, and sound, and researchers from widely different disciplines have
called for investigation into a more direct retrieval method than using text as image
surrogates (Rorvig et al., 1988; Stam, 1989; Besser, 1990; Seloff, 1990; Svenonius,
1984). Computer scientists are actively developing “content-based” techniques for
the direct retrieval of visual image content such as shape, texture, color, and spatial
relationships. These techniques allow a user to by-pass the use of text in searching
by selecting an area of an image and requesting the system to “get more like this.”
In addition to indexing individual images, the techniques are useful for partitioning
very large sets of images.
2.3.1. Content-Based Retrieval
A technique that is being widely investigated is the use of color histograms to
represent images. A color histogram is a visual representation of the color distribution in an image, and recent research has demonstrated the utility of these
in characterizing images. However, color histograms do not necessarily produce
unique object or scene identifiers, as a single color histogram can map back to
many different objects or scenes. A second level of analysis of co-occurrence of
colors may be useful as well.
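A small example makes the non-uniqueness concrete. The sketch below builds a normalized histogram over coarsely quantized colors and compares histograms by histogram intersection; that measure is one commonly used choice and is an assumption here, since the discussion above does not name one. The function names, the quantization into four levels per channel, and the tiny pixel lists are likewise illustrative.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Normalized histogram over quantized (r, g, b) bins.

    `pixels` is a list of (r, g, b) tuples with channel values 0-255;
    each channel is quantized into `levels` equal-width bins.
    """
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))


RED, BLUE = (255, 0, 0), (0, 0, 255)
left_red = [RED, RED, BLUE, BLUE]    # red pixels first
right_red = [BLUE, BLUE, RED, RED]   # same colors, different arrangement
all_red = [RED, RED, RED, RED]

h_a, h_b, h_c = map(color_histogram, (left_red, right_red, all_red))
print(h_a == h_b)               # different layouts, identical histograms
print(intersection(h_a, h_c))   # only the red half overlaps
```

Because the histogram discards position, the two differently arranged images are indistinguishable, which is why a second level of analysis such as color co-occurrence may be needed.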
Texture is another basic visual feature used in content-based retrieval. Haralick
(1979) characterizes texture as an “organized area phenomenon” with two basic
components: basic units (primitives) out of which the texture is composed and
spatial distribution of the primitives. While there have been efforts to define
basic “textons” (Julesz, 1981), a number of elements have been proposed, and
texture remains difficult to define. Most texture analysis models texture as a
two-dimensional, gray-level variation (Ahuja, 1992).
Shape-similarity-based methods retrieve objects which match or are closely
similar to a given shape or image. Such methods are familiar in automated
manufacturing, where visual inspection of products is done by the computer.
Object shape is determined by edge detection, region growing, or a combination
of both. Edge detection is concerned with finding locations in which specific
types of changes occur; region growing is concerned with finding areas where
no changes occur and attempts to determine which pixels belong to the same surface.
No technique has yet been developed which can automatically and consistently
identify high-contrast regions as easily as the human visual system. Therefore,
the process of identifying and outlining objects in arbitrary, unconstrained images
currently requires human intervention and guidance, with the computer assisting
by providing refinement techniques and flexible user control.
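The contrast between the two approaches can be shown on a tiny gray-level grid. The following is a deliberately simplified sketch, assuming a fixed difference threshold for edges and a fixed tolerance around a seed pixel for region growing; production systems use far more elaborate operators, and the function names and values here are illustrative only.

```python
from collections import deque


def edge_pixels(image, threshold):
    """Mark pixels whose right or lower neighbor differs by more than `threshold`."""
    rows, cols = len(image), len(image[0])
    edges = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    if abs(image[r][c] - image[nr][nc]) > threshold:
                        edges.add((r, c))
    return edges


def grow_region(image, seed, tolerance):
    """Collect 4-connected pixels whose gray level is within `tolerance` of the seed."""
    rows, cols = len(image), len(image[0])
    base = image[seed[0]][seed[1]]
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - base) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# A 3x4 gray-level grid: a dark area on the left, a bright area on the right.
image = [
    [10, 12, 200, 205],
    [11, 13, 198, 202],
    [9, 12, 201, 199],
]

print(sorted(edge_pixels(image, threshold=50)))            # where change occurs
print(sorted(grow_region(image, seed=(0, 0), tolerance=10)))  # where it does not
```

Even in this toy case both results depend on hand-chosen parameters (the threshold, the tolerance, the seed), which hints at why unconstrained images still require human guidance.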
An iconic or graphical search is a search in which direct input such as a shape
drawn on the computer screen is used as the search key. In order to process graphical or iconic commands, the query image must be captured and then undergo the
same types of representation and interpretation processes as the stored images
before matching can take place. Matching must be flexible enough to refer to
attributes at different levels of detail and must accommodate a range of uncertainty.
This process is exceedingly complex, and, in order to be successful, requires
applications of artificial intelligence as well as sophisticated algorithms.
Beyond the need to derive useful, automatically extracted representations of image features, there is the need to determine measures of similarity among these
representations once they have been created. Gupta et al. (1997, p. 38) explain
this problem using texture as an example. They describe texture as composed of
randomness, periodicity, and directionality, which can each be represented by a
number. They continue:
Consider an image with 10 different regions, each with a different texture
value. These texture values form 10 points in the randomness-periodicity-directionality coordinates. How similar is this image with another that has 10
other textured regions? Many distance functions are defined between point sets
. . . How well do these distance functions portray the human sense of difference in appearance? We have seen no thorough investigation of the issue to
date.
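To make the question concrete, one of the many possible point-set distances can be written out. The sketch below uses the symmetric Hausdorff distance, chosen here purely as an example and not proposed by Gupta et al.; the texture values are made up, and each region is assumed to be a point in (randomness, periodicity, directionality) space.

```python
import math


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


# Each tuple is (randomness, periodicity, directionality) for one region.
# The values are made up for illustration.
image_1 = [(0.1, 0.8, 0.3), (0.5, 0.2, 0.9)]
image_2 = [(0.1, 0.8, 0.3), (0.5, 0.2, 0.4)]

print(hausdorff(image_1, image_1))  # identical sets are at distance zero
print(hausdorff(image_1, image_2))  # driven entirely by the one differing region
```

The function returns a number for any pair of images, but nothing in the computation guarantees that a smaller number corresponds to images a person would judge more alike, which is exactly the gap the quotation identifies.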
2.3.2. Current Systems
Two widely discussed content-based systems are QBIC (Query by Image
Content) and WebSEEk. QBIC, a software product originally developed by the
Machine Vision Group at IBM’s Almaden Research Center, uses content-based
retrieval of color, texture, shape, position, or combinations of these. Several
methods have been used to input a search. Attributes such as color and texture
may be chosen from visual palettes, while shapes can be drawn and regions can
be specified with a mouse. An X-Windows version of QBIC was tested by the Art
and Art History Department Slide Library at the University of California at Davis
on a database of approximately 2000 images (Holt et al., 1997). The user performs
queries based on example images; a thumbnail image is displayed, and the system
can search for other images with similar color, texture or overall layout. The user
can also use graphical tools to specify arbitrary characteristics such as a color
histogram: 20% of a specific shade of blue, 30% of a shade of green. The search
will return results in the form of thumbnail images arranged in descending order
of match to the user’s query. Text attributes such as the artist’s name or media can
also be used to restrict the search. A web version of QBIC continues this research
(http://libra.ucdavis.edu; http://wwwqbic.almaden.ibm.com/) and adds a new color
layout function that allows searches for color in a specific location.
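A query of the kind just described, with requested proportions of particular colors and results returned in descending order of match, can be sketched as follows. The scoring rule (total shortfall against the requested fractions) is an assumption made for illustration and is not QBIC's actual matching function; the collection, the color names, and the function names are hypothetical.

```python
def match_score(query, histogram):
    """Sum of shortfalls between requested and actual color fractions (0 is best)."""
    return sum(max(0.0, wanted - histogram.get(color, 0.0))
               for color, wanted in query.items())


def rank(query, collection):
    """Return image names ordered from best to worst match."""
    return sorted(collection, key=lambda name: match_score(query, collection[name]))


# Hypothetical collection: image name -> fraction of pixels per named color.
collection = {
    "seascape": {"blue": 0.60, "green": 0.10, "white": 0.30},
    "meadow": {"blue": 0.25, "green": 0.55, "brown": 0.20},
    "portrait": {"brown": 0.70, "red": 0.30},
}

# The query from the text: 20% of a shade of blue, 30% of a shade of green.
query = {"blue": 0.20, "green": 0.30}
print(rank(query, collection))
```

In an interface of this type the ranked names would be shown as thumbnails, and a text attribute such as the artist's name could be applied as a filter before ranking.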
Results from this system are reported as being quite variable. For example,
searches for shapes in fine art images are problematic. The study’s initial conclusion was that while QBIC cannot yet replace conventional database tools
for thematic searches, it can provide additional capabilities as a sorting tool rather than as
a searching tool.
WebSEEk (http://www.ctr.columbia.edu/webseek) is described as “a semiautomatic image search and cataloging engine whose objective is to provide a visual
search gateway for collecting, analyzing, indexing and searching for the Web’s
visual information” (Chang et al., 1997, p. 66). As described above, WebSEEk
extracts key terms from URLs and HTML tags for indexing and combines these
with some content-based retrieval of color features using binary color sets and
color histograms. Content-based retrieval in this system is limited because of the
vast number of images on the web. The current system had indexed (as of 1997)
650,000 images on the web and 10,000 video sequences. The creators of WebSEEk
are also researching the issues involved in querying the web using different search
engines. MetaSEEk (http://www.ctr.columbia.edu/metaseek), a prototype meta-image search engine, allows browsing and searching of random images from
several search engines: VisualSEEk/WebSEEk, QBIC, and Virage. Chang et al.
(1997) discuss multiple difficulties revealed by the project. The challenges in such
an endeavor become quickly apparent to an end-user. A quick search by the author
for “fear” resulted in an image of an old-fashioned icebag; clicking upon this
image in the “find more like this” mode retrieved a silhouette of a witch on a
broomstick!
Two other software demonstrations which may be visited on the web are
the Berkeley Digital Library Project (http://elib.cs.berkeley.edu/cypress) and the
Excalibur Image Surfer (http://isurf.interpix.com). A website on “Visual Information Management” bringing together many links to research groups, publications,
software demonstrations, and image sites is found at: http://rfv.insalyon.fr/~jolion/SESAME/T2/T2-7.html.
2.4. SUMMARY
There is currently great interest in research of this type, yet many of the content-based techniques that are evolving are the ones that are currently computationally
possible. While shape, texture, and color all appear intuitively to be important
in an image, there is as yet little understanding of whether these features
on their own should be the focus of attention in retrieval, or whether these
features are more important because of what they uniquely contribute to the
holistic perception of the image. One researcher notes that text-based search
techniques remain the “most direct, accurate, and efficient methods for finding
‘unconstrained’ images and video” and “challenges remain in applying the [above]
content-based image search tools to meet real user needs” (italics added) (Chang
et al., 1997). In addition, while many advances have occurred in the areas of
automatic shape, color, and texture recognition, practical, reliable and consistent
implementation of these techniques for image retrieval remains far in the future.
Romer (1995) notes that “The reality for this technology is that completely automatic content-based recognition is on a VERY distant horizon. It is much more
likely that the cooperative efforts of text-based and content-based methods will
yield the most interesting and useful results for representing image and motion
content for a very long time to come.” Thus, the problems of image retrieval
cannot be addressed by computational techniques alone; much basic research
is still needed in more fundamental questions such as which features are most
usefully represented using these techniques and the nature of similarity in a visual
environment.
As can be seen from the above discussion, a variety of different classification
systems for images have been created from a number of different disciplinary,
theoretical, or functional perspectives. Classification systems may be created based
upon the contents of a specific collection or for the users of a specific collection.
The focus may be upon a single collection or the concern may be to provide
links among related collections. Concerns for image users may include information
beyond the visual content of the item, such as the circumstances of production, technique or medium, the history of the item, or meaningful links among
related items. A number of sources may be used to gather text terms to apply to
images, among them subject headings, thesauri, free-text description, or associated
texts.
With the large scale digitization of image collections and widespread availability of images in large databases (especially the World Wide Web), there is
a realization that a better understanding of the process of searching for images
is needed. While there have been a large number of studies devoted to textual
information seeking and several models of this process have been proposed (among
them Ellis, 1987; Bates, 1989), there have been far fewer published studies on
how people search for images. Current research efforts focusing on these and other
questions related to image indexing and retrieval are discussed below.
3. Research
While there are many questions that need to be addressed, there are several basic
areas which need a stronger empirical foundation from which to begin to answer
these questions: the attributes needed for effective image indexing, the types of
queries put to image collections and imagebases, the users of these collections, the
tasks and image uses of the users, and the cognitive processes involved in searching
for images. While answers in one area can inform decisions in another area, when
trying to specify optimal methods for image retrieval it quickly becomes apparent
that research and knowledge are needed in all of these areas.
It should also be noted that research in these areas can be time-consuming
and difficult. The nature of these questions requires human-centered research and
detailed observation of processes, and qualitative methods are often the most
appropriate tool. Therefore, research of this type has been more limited than in
other areas of information seeking. The review below will summarize some key
studies and advances in understanding in these areas.
3.1. ATTRIBUTES
Questions concerning which image attributes are necessary for image retrieval fall
into two groups, those focusing on the range and type of attributes necessary, and
those concerned with the level of “granularity” needed in indexing.
3.1.1. Range and types of attributes
The data on unconstrained image description is sparse. Romer (1993) conducted
research that contributed to the development of the Kodak Picture Exchange application for commercial photography. Image search questions (in the words of the
originator) were collected from both image owners (photo agencies) and image
users (graphic artists, art directors). She found that search and review techniques
focused on such attributes as compositional qualities; artistic techniques, genre,
medium; subjective aspects; and spatial aspects. Continuing research by O’Connor
(1996) found a wide variety of concepts and levels of specificity in
a small sample of image descriptions. He found that when picture descriptions
were elicited using a form with three sections (Caption, Descriptive Words and
Phrases, and Reactions), subjects would list non-object terms describing a metaphor, story, interior monologue, or emotional reaction, as well as observed objects.
This suggests the possibility of functional as well as topical searches.
Jörgensen (1995) conducted exploratory research investigating attributes typically described by subjects in several types of tasks using pictorial images. The goal
of the research was to describe as full a range of attributes as possible, and the
numerical distributions and conceptual relationships among these attributes. Participants performed describing, sorting, and searching tasks, and content analysis
of word and phrase data was used to define a number of image attributes falling
into twelve higher level attribute classes. The data suggest that indexing of literal
objects is of prime significance, as is indexing of the human form and other human
characteristics. The concept of location of specific items within an image occurs
frequently and may be useful in image indexing, although not necessarily only by
inclusion of positional terms. Color is both typically and consistently described and
appears to cue attention to certain attributes or areas as well as providing a holistic
visual impact.
This research demonstrated that “Content/Story” and other abstract and
affective attributes are also typically described, suggesting that image indexing
may benefit by the addition of more subjective aspects of images than have traditionally been addressed by image classification systems. Analysis of a sorting task
further supported the importance of such attributes as theme, setting, and other
“story” elements such as relationships among people depicted in the image. The
combination of data from several tasks suggests that the research data comprise
a fairly complete range of pictorial attributes found in spontaneous image descriptions by non-specialist users. Other research has evaluated attributes in terms of
users’ search failures.
In research conducted by Hibler et al. (1992), retrieval failures occurred
mostly because of indexing omissions. The two most frequently omitted categories
responsible for search failure were omission of what appeared to be a relatively minor detail (an item of clothing) and failure to index frequently occurring
objects (walls, roof). The high specificity of the indexing language in this experiment increased precision but decreased recall, so that searches conducted on
general terms (hats, landscapes, eating) were less successful, omitting many items
considered relevant to searchers.
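The trade-off between precision and recall described above can be illustrated with a small sketch. The item identifiers and index terms below are hypothetical and are not drawn from the data of Hibler et al.; the sketch assumes retrieval by exact match on assigned index terms.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of item identifiers."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical index: each image carries only highly specific terms.
index = {
    "img1": {"bowler hat"},
    "img2": {"top hat"},
    "img3": {"straw boater"},
    "img4": {"umbrella"},
}

def search(term):
    """Exact-match retrieval on assigned index terms."""
    return {image for image, terms in index.items() if term in terms}

# A searcher asking for hats in general considers img1 to img3 relevant.
relevant_hats = {"img1", "img2", "img3"}

specific = precision_recall(search("bowler hat"), relevant_hats)
general = precision_recall(search("hats"), relevant_hats)
```

In this toy index the specific term retrieves only relevant material but misses two of the three relevant images, while the general term retrieves nothing at all because no image was indexed at that level.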
3.1.2. “Granularity” of attributes
An additional factor relating to an attribute is the issue of granularity: upon how
many semantic or hierarchical levels should access to image attributes be provided?
There are two issues here, one being the issue of what an image is “of” and “about”
(an image is “of” a flag but is “about” patriotism), and the other being upon what
level of specificity an index term should fall (e.g. fruit/apple/Red Delicious).
The issue of what an image is “of” and “about” draws upon Panofsky’s (1962)
discussion of iconography and iconology (item depicted and its symbolic or referential meaning) and has been discussed at length in the literature (most notably
by Shatford (1986) and Drabenstott (1986)). Studies of queries (discussed in 3.2)
demonstrate that multiple levels of attributes are needed, ranging from the specific
item named to the generic category of an item to the item’s “meaning.”
Research from cognitive psychology using images of individual objects as
stimuli suggests that objects and colors are most frequently named on what is
termed the “Basic Level” (Berlin and Kay, 1969; Rosch et al., 1976; Smith et
al., 1978). This Basic Level is neither the most specific nor the most abstract
level but is rather an intermediate level, such as “apple” in the above example.
Research demonstrates that basic level concepts are categorized faster, are used
almost exclusively in free-naming tasks, are learned sooner than other types of
concepts, and are employed similarly across different cultures (Lassaline et al.,
1992). Basic Level categories thus fulfill what is termed the “principle of cognitive
economy” by being both informative and general. Jörgensen (1995) found that
the majority of attributes (including both perceptual and interpretive) named in
image describing tasks were named on the Basic Level; this may have implications
for term choice in indexing images. She also found that freely generated terms
describing both physical and more abstract aspects of images showed less variability than might have been expected, suggesting some constraints may exist on the
process of communicating about visually perceived data.
3.2. QUERIES
Several studies focus on queries to image collections. Enser (1993) analyzed
almost 3000 recorded requests contained in 1000 request forms submitted to a
large European picture archive, the Hulton Deutsch Collection Limited. Enser
described users as representing their requests at a greater level of specificity
than users of online catalogs, with the majority of requests (69%) falling into
the “unique” category, a specific instance of a general category (“George III”
as opposed to “kings”). Requests were refined by date (34%), location, action,
event, or technical specification (the specification of image orientation or type of
image). As a result of this research, Enser suggests (if the Hulton collection is
characteristic of non-domain specific image banks) that a significant proportion of
requests could be satisfied by automatic matching operations on picture captions.
He notes that non-unique subjects (“purgatory”) need a more detailed indexing
system, but concludes that subject indexing, because of its expense, is of low
utility and that reliance upon experienced intermediaries will continue to be the
norm.
Armitage and Enser (1997) extend this work further by analyzing and categorizing additional user requests for still and moving images from seven libraries.
This analysis forms the basis for a faceted framework for queries with four main
categories (who, what, when, where) and three levels of abstraction for each
category (specific, generic, and abstract). The authors note that the incidence
of “unmediated” transactions is increasing and that some of the image libraries
studied are engaged in the development of additional image delivery mechanisms
and are in need of further information which could contribute to good interface
design for image retrieval systems.
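One way to picture this faceted framework is as a grid of four categories by three levels of abstraction. The sketch below is illustrative only: the class names and the hand-coded example requests are assumptions made here and are not part of Armitage and Enser's study.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

class Category(Enum):
    WHO = "who"
    WHAT = "what"
    WHEN = "when"
    WHERE = "where"

class Level(Enum):
    SPECIFIC = "specific"
    GENERIC = "generic"
    ABSTRACT = "abstract"

@dataclass(frozen=True)
class Facet:
    category: Category
    level: Level

# The full framework: every category paired with every level of abstraction.
FRAMEWORK = [Facet(c, l) for c, l in product(Category, Level)]

# Hypothetical hand-coded requests; the coding is illustrative only.
coded_requests = {
    "George III": Facet(Category.WHO, Level.SPECIFIC),
    "kings": Facet(Category.WHO, Level.GENERIC),
    "cathedrals": Facet(Category.WHAT, Level.GENERIC),
}
```

A request that combines several facets could be represented as a set of `Facet` values rather than a single one.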
Keister (1994) reported on reconstructed query logs at the National Library of
Medicine. While some of these requests are for specific items, in contrast to Enser’s
work, one-third to one-half of users’ queries were “image construct queries,” in
which images are constructed with words describing both abstract concepts and
concrete image elements such as specific objects. She concludes that, although
abstract or emotional concepts may be used in end-user descriptions of images,
the aesthetic and emotional needs of the user are highly subjective and not appropriate for the cataloger to consider. Rather, she suggests visual element cataloging
supplemented by a visual surrogate.
The following example demonstrates the four basic levels of image queries
emerging from these studies:
1. Requests for a specific item (The Picture of Rouen Cathedral painted by
Monet).
2. Requests for a specific instance of a general category (Rouen Cathedral).
3. Requests for a general topical or subject category of images (cathedrals).
4. Requests for images communicating a particular abstract concept or affective
response (pictures of cathedrals symbolizing the power of religion in the life
of an ordinary person of the Middle Ages).
The need for both a generic classification and indexing to the greatest level of
specificity is apparent. These queries can also be further specified by the
addition of other facets relating to either the content or production of the image,
such as time or accessibility. This suggests that incorporating semantic term hierarchies
into image retrieval systems may be useful, but research needs to be done
to determine whether such hierarchies should be created at the time of indexing,
generated at the time of searching from a resident thesaurus, or used in a browsing
mode as a tool of query refinement.
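As one illustration of the second option, generating a hierarchy at search time from a resident thesaurus, the following sketch expands a query term to include its narrower terms. The thesaurus contents and the function name `expand` are hypothetical; the fruit/apple/Red Delicious chain echoes the example given in section 3.1.2.

```python
# Hypothetical thesaurus fragment: broader term -> narrower terms.
NARROWER = {
    "fruit": ["apple", "pear"],
    "apple": ["Red Delicious", "Granny Smith"],
}

def expand(term):
    """Return the term plus all of its narrower terms, depth-first."""
    expanded = [term]
    for child in NARROWER.get(term, []):
        expanded.extend(expand(child))
    return expanded
```

Under these assumptions, a search on "fruit" would also match images indexed only under "Red Delicious", while a search on "pear" would be left unchanged.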
3.3. USERS
Once again, only a few studies describe users of image collections in any detail.
Enser described the Hulton Deutsch Collection as receiving requests from book,
magazine, and newspaper publishers, advertising and design companies, television
and audiovisual companies, and “other” (3%). Armitage and Enser’s later study
spanned a number of collections having both a general user base (the National Film
and Television Archive) and specialist audiences (“expert” users in such specific
fields as natural history, town planning, engineering, art history, and medicine). The
National Library of Medicine collection is used primarily by picture and publication professionals (50%) and health professionals (33%), with the remainder
divided among the museum and academic community and the general public. The
images are used in books, television documentaries, movies, educational projects,
or for reference.
Keister describes further how requests vary among the different user groups.
The museum or academic community often has precise citations to the images it
desires. Health professionals ask for images in keeping with the NLM’s orientation
and images can be accessed by appropriate topic, such as a particular disease.
Picture professionals (still picture researchers, TV, film, or media personnel), on
the other hand, think visually and use art and/or graphics jargon describing specific
image features desired (action shot, horizontal, color).
In contrast to these groups, art historians need access to a different set of image
attributes (Panofsky, 1962; Drabenstott, 1986; Stam and Giral, 1988). Bakewell
(1988) describes one important need of this group: access to “visual traditions”
and particular visual facets (e.g., occurrences of blue cloaks in eighteenth-century
German painting) by means of a variety of indices. This report draws conclusions
that sound familiar from the realm of textual searching: information needs of
scholars are dynamic; the breadth, amount and quality of information sought as
well as the manner in which the literature is browsed varies according to the stage
of research; and access to different types of resources as well as different types of
searching strategies is appropriate.
3.4. TASKS / USES
Users come to image collections with image requests which relate to specific tasks.
For instance, the museum, academic, and art historical communities have tasks
which are related to research questions. Image content is often used as “evidence”
in the proposing of an hypothesis or the construction of an argument. For instance, a
social historian may need images of urban intersections to support a line of research
concerned with the effects of urban infrastructure on social relations. In this case,
such qualities as composition, technique, lighting, or perspective are irrelevant,
as may be the particular location depicted. In contrast, these image qualities are
important to other communities such as art historians or perhaps journalists. Other
users such as publishers or art directors have different tasks; they may want an
image which will have a particular emotional effect or may wish to use an image
as part of a package which will communicate a specific message. Thus, the nature
of the task will play a large part in determining which specific image attributes may
be requested.
While such factors appear intuitive, given the time and money constraints
which generally affect the indexing process, there is a need for research that
describes in more detail the relationships among different types of tasks and
specific attributes and that delineates the general types of image use tasks.
For instance, in her
research, Jörgensen (1995) concluded that the strongest factor affecting the image
descriptions (besides the actual visual content of the image) was the nature of
the specific task being performed. The three describing tasks and the sorting
task produced two markedly different distributions of data, and these differing
results suggest that different sets of attributes assume importance based on type of
task.
Fidel (1997) conducted a small exploratory study with 100 requests from a
stock photo agency. Using Jörgensen’s attribute classes, her analysis demonstrated
that the distribution of attributes in these requests closely matched the distribution of attributes found in Jörgensen’s sorting task results, further supporting the
importance of task in the process of searching for images.
The author of the current paper suggests three major categories of images
(which are not mutually exclusive) to provide a framework for further taxonomy
building:
• Data images (images in which raw data is captured and perhaps processed for
visual clarity).
• Informative images (images to which human intelligence has been applied to
organize the visual material for communication of information).
• Expressional images (pictorial, photo-realistic, abstract, subject to multiple
interpretations).
These major types reflect the original motivation (research, communication,
expression) and process (attribute capture, attribute organization, attribute creation)
in the production of an image. This is not to say that these categories of use
are mutually exclusive. Indeed, an image can be created for one purpose yet be
used effectively for a completely different purpose, and one image can potentially
fulfill all three functions. Therefore, each of these categories may require different
approaches in terms of image access, and a single image may require multiple
treatments. However, some notion of the various motivations and functionalities of
an image can serve as an additional indexing device or perhaps as a partitioning
device in a very large collection of images.
Korf Vidal (1995, 81–82) presents a similar model derived from her analysis of
image clusters produced by a spontaneous sorting task of images all depicting the
same subject matter (the Brooklyn Bridge), but created in a wide range of media.
The data revealed several distinct image clusters, which she suggests are organized
according to a communicative continuum: images about the person who made the
image (expressions/expressive); images about the culture in which the image was
made (active messages); and images about the objects or persons depicted in the
image (passive evidence).
Fidel (1997) also describes the different uses of images and presents a
continuum of use between two poles, the “Data Pole,” at which images (cartographic, medical, chemical structure) are used as sources of information, and the
“Objects Pole,” at which images are needed as “objects” for some task, such as
a magazine cover, an advertisement, or a picture in a book. She further describes
image and search characteristics for each pole. For instance, at the “Data Pole,”
users are looking for the smallest set which can provide the information needed,
whereas at the “Objects Pole” browsing sets are needed. The author calls for
additional research to further define and validate these characteristics.
There is, in fact, evidence that users themselves recognize and describe images
in terms which suggest such categorization. For instance, both Jörgensen and Korf
Vidal found examples of such user recognition and description of the purpose of
an image in their research:
Korf Vidal: artistic, clever use of bridge as icon, commemorative, could be
used for greeting cards, documentary, ephemeral uses, “framed art,” take-offs for
profit (Korf Vidal, 1995, 74).
Jörgensen: Advertisement, Big Postcard, Cartoon, Comics, Erotic Art, Postcards, War Footage, Children’s Art (Jörgensen, 1995).
3.5. SEARCHES
Roddy (1991, 48), speaking of the issue of anticipating a searcher’s needs,
comments: “One of the great failures of image access at present is its inability to provide reliable information on what might be called a typical session.”
Studies focusing on visual search techniques and retrieval in multimedia systems
have produced results which are difficult to interpret as the system itself often
is a confounding factor (e.g. Dunlop, 1991). Nevertheless, research describing
searching and the image attributes relevant for the process is needed for determining adequate representational structures for images.
Rorvig et al. (1988) conducted some early work on human image searching
with the NASA Visual Thesaurus, which provided a retrieval interface based on
the broad hypothesis that images can be somewhat robustly substituted for text and
that access to images will reduce ambiguity in both term assignment and searching.
Experimental results indicated that most users, while initially enamored with the
ability to search using surrogate images as input, reverted to text-based retrieval
functions as they became more expert. This suggests that while such a strategy
may be useful with newer users, a simple one-to-one visual/verbal replacement
strategy in searching may not be adequate for all searching.
Two studies describe searching strategies in pictorial databases. Batley (1988)
studied searching behavior of library staff, university students, and school children, and identified four visual information search strategies: Seeking, Focused
Exploring, Open Exploring, and Wandering. She proposes that, given a flexible
visual retrieval environment, users will adopt familiar search strategies and will
engage in a range of search activities from the non-exploratory to the unstructured
exploratory, in much the same way as in a traditional library environment.
Hastings (1994) conducted an exploratory investigation of the search behavior
of eight art historians (with specialties in Caribbean art) in both a manual and
computerized environment. She found that searches became more complex in the
computerized environment, and identified three types of search styles: Browse-Searcher, Subject Searcher, and Text-Searcher. Browse-Searchers created their
own categories for the images and used more complex images, Subject Searchers
imposed a preconceived classification scheme on the images and used textual
information to aid in identification of objects and activities, and Text-Searchers
worked primarily from textual information.
Romer (1993) discovered five search and review patterns in her research and
enumerates several visual thinking processes that were observed with professional
photo editors. She found that visual thinking is stimulated by images, and that
people often start to look for images by using images in either a random or directed
search. A second observation was that images already selected provide the basis
to continue a search. People desire to be able to use selected images to submit a
request such as “Get me more like the ones I just found.” These patterns contributed
to the construction of data records that provide access to two different “points of
view,” that of editorial and advertising users.
The search patterns discussed above broadly represent the two ends of a
continuum; on one end is a focused, specific type of searching and on the other
is a looser, exploratory, browsing type of searching. Fidel relates these search
behaviors to the intended use of the image. While many information retrieval
system designers have tried to accommodate several types of searches by providing
flexible system manipulation, the question remains as to whether this need could
also be addressed through new or expanded approaches to indexing or other forms
of knowledge representation for images.
3.6. SUMMARY
These studies indicate that, just as in text searching, different types of queries,
users, tasks, and searching strategies necessitate flexible retrieval mechanisms
which can accommodate a variety of requests.
Decisions that must be made in the selection of terms to describe images include
whether to limit indexing to visual appearance or to incorporate interpretive and
symbolic meanings and whether to index multiple hierarchical levels. Classification has traditionally been concerned with the first three of the four query levels listed in section 3.2, and numerous
reasons are given for not addressing the fourth-level facets, which seem to be
situated more within the individual user’s own interpretations. However, those
working with providing access to image collections know that the fourth level is a
key one and some researchers have argued for addressing both the “of” and “about”
of an image. Beyond this broad categorization, image queries can be modified by a
number of other more specific considerations, such as date, location, activity, event,
image orientation, perspective or type of representation.
The following chart (Jörgensen, 1995) summarizes some relationships among
types of image queries and associated attributes as revealed in the literature
reviewed.
ART HISTORICAL SEARCH (context of production)
    Type of query: Text-based Information
    Associated attributes: creator, title, size, material, type, nationality,
    time period, technique, genre

VISUAL CONTENT SEARCH (named object)
    Type of query: Specific Object or Generic Object
    Associated attributes: color, size, location, texture, shape, orientation

TOPICAL SEARCH (named person, place)
    Type of query: Specific Instance or Generic Category
    Associated attributes: time, location, event or activity

EVENT SEARCH (named event)
    Type of query: Specific Event or General Event Type
    Associated attributes: time, setting, activity

AFFECTIVE SEARCH (named response)
    Type of query: Emotion or Atmosphere
    Associated attributes: expression, color, lighting, composition

CONCEPTUAL SEARCH (named concept)
    Type of query: Abstract, Symbolic, Thematic, Political, Social,
    Interpretive, State (stable or change)
    Associated attributes: any attribute

The question of what is needed for image retrieval is inherently multidisciplinary, and effective solutions to system design will draw upon concepts and data
from a number of fields, including classification research, information retrieval,
computer science, and cognitive science. Those interested in these problems
will also find relevant discussion in the fields of aesthetics, cognitive science,
linguistics, and philosophy, although these are beyond the limits of the present
discussion. Other reviews of the image indexing problem can be found in
Rasmussen (1997) and Lancaster (1998).
4. Research Agenda
Current research has focused on needs of specific user groups, searching behaviors, and image attributes. While much more research in each of these three
areas is needed, a broader research agenda should also include investigation of the
interplay and complementarity among these research efforts. Several researchers
(Sledge, 1995) have proposed indexing of attributes to facilitate searching from
particular points of view. Practical system implementations could match unique
user group needs with pertinent attribute set indexing, and searching behaviors
could also be accommodated within a flexible interface design. Other areas for a
broad-based research agenda include integration and selected application of current
text-based tools for image indexing, new representational techniques for attribute
sets, specification of image taxonomies, and research into the nature of similarity
in a visual environment.
4.1. USES OF CURRENT TOOLS
As can be seen from the current review, there already exist a variety of subject
heading lists and indexing guides for images, as well as more generalized classification systems for textual materials. Rather than lament the shortcomings of these
systems when applied to visual indexing, it may be more productive to launch a
major research effort aimed towards a thorough review of these systems with the
goal of specifying which parts of these systems are most useful within specific
indexing contexts.
This would need to be done within a conceptual framework which would be
informed by previous and continuing research on image attributes, points of view,
and user needs. There are several tantalizing aspects to this research, even though
it remains in what some perhaps consider the “inappropriate” textual world. The
appeal of using portions of existing general systems such as LC or Dewey is that
one search could bring up both textual and visual materials on the same topic.
Another intriguing idea is the building of some form of distributed indexing
mechanism for images within such a conceptualized framework. Those who have
knowledge and expertise in a system such as the AAT could provide terms which
are appropriate to the visual item and to the point of view of their users, while others
could contribute subject or topical access terms from LCTGM or Dewey. Experts
in graphics could provide access to compositional or production aspects while
scholars could provide interpretive analysis from realms as diverse as art history,
cultural studies, and postmodernism. Such distributed indexing is currently taking
place for other forms of media, such as movies, using the Internet. One interesting
example of this distributed indexing approach for images can be found
on the website of the Fine Arts Museums of San Francisco (http://www.thinker.org/),
which allows visitors to add indexing terms to images.
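A minimal sketch of how contributions from several indexers might be combined into a single record is given below. The function name `merge_terms` and the example terms are invented for illustration and have not been checked against the actual vocabularies named; the sketch assumes each contribution arrives as a (vocabulary, term) pair.

```python
from collections import defaultdict

def merge_terms(contributions):
    """Combine (vocabulary, term) pairs from many indexers into one record."""
    merged = defaultdict(set)
    for vocabulary, term in contributions:
        merged[vocabulary].add(term)  # sets discard duplicate suggestions
    return {vocab: sorted(terms) for vocab, terms in merged.items()}

# Hypothetical contributions for one image of a bridge.
contributions = [
    ("AAT", "suspension bridges"),
    ("LCTGM", "Bridges"),
    ("visitor", "sunset"),
    ("visitor", "foggy"),
    ("visitor", "foggy"),
]

record = merge_terms(contributions)
```

Keeping the source vocabulary alongside each term lets a retrieval system decide later which contributions to expose to which group of users.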
4.2. NEW REPRESENTATIONAL TECHNIQUES
Among the problems of image indexing is the hierarchical nature of many of the
classification tools, which limits the number and types of relationships which can
be expressed in their vocabulary structure. The AAT, in an attempt to address these
problems, has evolved from a subject heading list to a hierarchical and faceted
thesaurus. Other representational techniques which can represent complex image
attribute relationships are needed, and schemas and semantic networks are two
which are currently being investigated.
Romer (1995) suggests the restructuring of hierarchical resources into semantic
networks, which are structures that represent knowledge in an interconnected
manner. Within a semantic network, it is possible to assign several relationships
between terms with differing weights to provide a clear notion of the semantic
strength between terms. Many of the limitations surrounding hierarchical and
faceted thesauri may be eliminated by the use of a network structure. Semantic
networks could provide the missing link between visual elements and their associated meanings. An example of this type of relationship is the use of “masks” to
identify visual elements of race, gender, and class (Okon, 1995).
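A minimal sketch of such a network follows, assuming weights between 0 and 1 and relationship labels chosen purely for illustration. None of the terms, labels, or weights come from Romer (1995); the flag and patriotism pair echoes the "of" and "about" example in section 3.1.2.

```python
class SemanticNetwork:
    """Terms joined by labeled, weighted, directed relationships."""

    def __init__(self):
        # term -> list of (relationship, target term, weight)
        self.links = {}

    def relate(self, source, relationship, target, weight):
        """Record one relationship from source to target."""
        self.links.setdefault(source, []).append((relationship, target, weight))

    def related(self, term, min_weight=0.0):
        """Return (target, relationship, weight) tuples, strongest first."""
        matches = [(target, relationship, weight)
                   for relationship, target, weight in self.links.get(term, [])
                   if weight >= min_weight]
        return sorted(matches, key=lambda match: match[2], reverse=True)

net = SemanticNetwork()
net.relate("flag", "symbolizes", "patriotism", 0.8)
net.relate("flag", "kind of", "banner", 0.6)
net.relate("flag", "depicted with", "flagpole", 0.3)

strong_links = net.related("flag", min_weight=0.5)
```

Unlike a strict hierarchy, the same term can carry several kinds of relationship at once, and the weight threshold lets a search widen or narrow the set of associated meanings.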
For generalized retrieval, a fruitful approach may be to implement indexing
attributes through the use of a schema-based template (such as the Expectation
Packages discussed earlier), drawing upon a variety of controlled vocabularies and
providing some predefined “Points of View” which target specific attribute sets of
interest to particular groups of users.
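To make the idea concrete, the sketch below treats an index record as a set of attribute fields and a "Point of View" as a named subset of those fields. The field names loosely follow the summary chart in section 3.6; the record contents, the two views, and the function name `view` are hypothetical.

```python
# Hypothetical index record; values are invented for illustration.
record = {
    "creator": "unknown photographer",
    "technique": "gelatin silver print",
    "time period": "1930s",
    "objects": ["bridge", "river"],
    "color": "black and white",
    "composition": "horizontal",
    "emotion": "gloomy",
}

# Each point of view targets the attribute set of interest to one user group.
POINTS_OF_VIEW = {
    "art historical": ["creator", "technique", "time period"],
    "picture professional": ["objects", "color", "composition", "emotion"],
}

def view(record, point_of_view):
    """Return only the fields that the chosen point of view exposes."""
    fields = POINTS_OF_VIEW[point_of_view]
    return {name: record[name] for name in fields if name in record}

art_view = view(record, "art historical")
```

The same underlying record serves both groups; only the selection of fields presented for searching and display changes.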
4.3. IMAGE TYPOLOGIES
Just as textual materials can represent not only data but a wide range of human
thought and creativity, so too can images represent a variety of perceptual content,
from raw data, organized information, or creative illustration to “art.” And just as
text may represent more than one type of communication at a time (for instance,
poetry can still communicate information), so too can images be both creative and
informative. At present, in the wake of the uniformity and lack of context which
keyword searching has imposed on vastly dissimilar types of texts, we are only just
beginning to understand that different texts and domains may have different search
requirements (Albrechtsen, 1993), and that we still have a very imperfect model
for textual searching.
Thus, in addition to a sound knowledge of the full range of image attributes and an understanding of user search behaviors in a visual environment,
the development of image typologies and definition of their content and function could provide heuristics for indexing particular types of images and guide
image indexing in a more generalized retrieval context as well. Some very interesting research has been performed in this area already. Lohse et al. (1994) have
developed a structural classification of visual representations (in contrast to a functional taxonomy, which focuses on the intended use and purpose of the graphic
material). Eleven categories of visual representations emerged from the research:
graphs, tables, graphical tables, time charts, networks, structure diagrams, process
diagrams, maps, cartograms, icons, and pictures. These categories were evaluated on ten scales representing such attributes of the images as spatial-nonspatial,
attractive-unattractive, numeric-nonnumeric, and concrete-abstract. In this and
other research, there is evidence suggesting that photo-realistic pictures communicate the least amount of “information.” This suggests that this type of image
is not usefully grouped with the rest of the image types, and supports the notion
of “informative images” discussed above. Other work towards developing image
typologies was discussed during a panel presentation at the latest conference of the
American Society for Information Science.2 There needs to be more research aimed
at developing image typologies concerned with structure and function, content and
meaning.
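As a minimal sketch of how such a typology might be carried into an indexing record, the eleven structural categories reported by Lohse et al. can be written as an enumeration and paired with ratings on the scales named above. The class and function names, the 1-to-9 rating range, and the sample records are illustrative assumptions, not data or methods from the study; only four of the ten scales are named in the text, so only those appear here.

```python
from dataclasses import dataclass, field
from enum import Enum


class VisualCategory(Enum):
    """The eleven structural categories listed in the text."""
    GRAPH = "graph"
    TABLE = "table"
    GRAPHICAL_TABLE = "graphical table"
    TIME_CHART = "time chart"
    NETWORK = "network"
    STRUCTURE_DIAGRAM = "structure diagram"
    PROCESS_DIAGRAM = "process diagram"
    MAP = "map"
    CARTOGRAM = "cartogram"
    ICON = "icon"
    PICTURE = "picture"


# Only the four scales named in the text; the remaining six are not listed there.
SCALES = (
    "spatial-nonspatial",
    "attractive-unattractive",
    "numeric-nonnumeric",
    "concrete-abstract",
)


@dataclass
class ImageRecord:
    """Hypothetical indexing record: one category plus optional scale ratings."""
    identifier: str
    category: VisualCategory
    ratings: dict = field(default_factory=dict)

    def rate(self, scale, value):
        # The 1-9 range is an assumption made for this sketch.
        if scale not in SCALES:
            raise ValueError(f"unknown scale: {scale}")
        if not 1 <= value <= 9:
            raise ValueError("rating must be between 1 and 9")
        self.ratings[scale] = value


def group_by_category(records):
    """Group record identifiers by structural category, preserving input order."""
    groups = {}
    for record in records:
        groups.setdefault(record.category, []).append(record.identifier)
    return groups
```

Grouping records in this way would let an indexing tool apply category-specific heuristics, for example treating maps differently from photo-realistic pictures, which is the kind of guidance a typology is meant to provide.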
4.4. DEFINING VISUAL SIMILARITY
An understanding of image similarity features is needed before the “get more
images like this one” search can supply the searcher with accurate results, yet similarity in a visual environment still remains to be defined. Early theories of similarity
relied upon weighted feature lists, but researchers (Medin and Wattenmaker, 1987;
Lockhead, 1992) have noted that physical features do not predict classification
performance. Jörgensen (1995), in a sorting task, provided data that demonstrate
that similarity among images cannot be represented solely by perceptual attributes, but must take into account interpretive attributes as well. As discussed above,
however, research also suggests that there is some measure of consistency in
naming the objects and items that appear in images.
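The point that perceptual features alone are insufficient can be illustrated with a small sketch. It assumes that each image is described by two sets of terms, one perceptual and one interpretive, and it scores each set with a weighted feature-overlap (weighted Jaccard) measure before blending the two scores. The measure, the function names, the blending parameter, and the sample terms are all assumptions chosen for illustration; they are not taken from the studies cited above.

```python
def weighted_overlap(features_a, features_b, weights=None):
    """Weighted Jaccard overlap of two feature sets; unlisted features weigh 1.0."""
    weights = weights or {}
    union = features_a | features_b
    if not union:
        return 0.0
    shared = sum(weights.get(f, 1.0) for f in features_a & features_b)
    total = sum(weights.get(f, 1.0) for f in union)
    return shared / total


def image_similarity(image_a, image_b, perceptual_weight=0.5, weights=None):
    """Blend perceptual and interpretive overlap into one score in [0, 1]."""
    if not 0.0 <= perceptual_weight <= 1.0:
        raise ValueError("perceptual_weight must be between 0 and 1")
    perceptual = weighted_overlap(
        image_a["perceptual"], image_b["perceptual"], weights
    )
    interpretive = weighted_overlap(
        image_a["interpretive"], image_b["interpretive"], weights
    )
    return perceptual_weight * perceptual + (1.0 - perceptual_weight) * interpretive


# Two hypothetical images that look alike but are read very differently.
balloon = {
    "perceptual": {"red", "round", "centered"},
    "interpretive": {"celebration", "joyful"},
}
stop_sign = {
    "perceptual": {"red", "round", "centered"},
    "interpretive": {"warning", "gloomy"},
}

perceptual_only = image_similarity(balloon, stop_sign, perceptual_weight=1.0)
blended = image_similarity(balloon, stop_sign, perceptual_weight=0.5)
```

With these sample descriptions, a purely perceptual comparison treats the two images as identical, while the blended score is halved because the interpretive terms do not overlap at all. A "get more images like this one" search built on the first score alone would therefore return results that a searcher might not consider similar.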
These results prompt two lines of thought: (1) that definition of similarity in
a visual environment must not be narrowly constrained; and (2) that the process
of communicating about visual images within a particular cultural context may
provide some constraints on naming image attributes. If this were so, it would
enable a certain amount of “bravado” on the part of image catalogers in attaching
abstract or affective terms such as “mysterious” or “gloomy” to an image. It would
also suggest that assigning free-text terms or some form of distributed indexing
methods may prove useful.
Another larger question, which has barely been asked in the community of
researchers investigating image retrieval, is the function of the process of searching
for an image. If the goal of the process is to stimulate and enhance creativity rather
than to conduct a precise and efficient search, then methods for browsing or perhaps
for revealing unexpected connections or human-centered perspectives would in fact
be more appropriate. Research scientists tend to forget that, in the humanities, the
process is often at least as important as the product, and the process itself produces
new knowledge and understanding.
5. Conclusion
Further research in each of the areas discussed above will provide basic data which
will inform and enrich image access system design and will hopefully provide a
richer, more flexible, and satisfactory environment for searching for and discovering images. However, such research projects require the knowledge and theories
of a variety of disciplines. Mechanisms are also needed to ensure that
communication among researchers in different disciplines takes place, in order to
provide the broad perspective needed both to define and to solve the problems
associated with accessing visual materials.
Notes
1 For examples, see Corbis Images: http://www.corbisimages.com/; PNI’s PictureQuest:
http://www.pniltd.com/picturequest.html.
2 ASIS Panel discussion, “Theory and Practice in the Organization of Image and Other Visuo-Spatial
Data: from Retrieval to Metadata,” Wednesday, October 28. Panel participants were Clifford Lynch,
Abby Goodrum, Myke Gluck, and Corinne Jörgensen.
References
Ahuja, N. “Texture Analysis.” In Encyclopedia of Artificial Intelligence. Ed. S. C. Shapiro, New York:
Wiley, 1992, pp. 1101–1115.
Albrechtsen, H. “Subject Analysis and Indexing: From Automated Indexing to Domain Analysis.”
The Indexer 18 (1993), 219–224.
Armitage, L. H. and P. G. Enser. “Analysis of User Need in Image Archives.” Journal of Information
Science, 23 (4) (1997), 287–299.
Bakewell, E. Object, Image, Inquiry: The Art Historian at Work. Santa Monica, Calif., AHIP, 1988.
Bates, M. J. “The Design of Browsing and Berrypicking Techniques for the Online Search Interface.”
Online Review, 13 (1989) 407–424.
Batley, S. Visual Information Retrieval: Browsing Strategies in Pictorial Databases, Ph.D., University
of Aberdeen, 1988.
Berlin, B. and P. Kay. Basic Color Terms. Berkeley, University of California Press, 1969.
Besser, H. “Visual Access to Visual Images – The UC Berkeley Image Database Project.” Library
Trends 38 (1990), 787–798.
Besser, H. and M. Snow. “Access to Diverse Collections in University Settings: The Berkeley
Dilemma.” In Beyond the Book: Extending MARC for Subject Access. Eds. T. Petersen and P.
Molholt, Boston: G. K. Hall, 1990, pp. 203–224.
Chang, S. F. and J. R. Smith. “Visual Information Retrieval from Large Distributed Online
Repositories.” Communications of the ACM 40(12) (1997), 63–71.
Domeshek, E. and S. Kedar. Interactive Information Retrieval Systems with Minimalist Representation. AAAI-96: Thirteenth National Conference on Artificial Intelligence, Portland, OR,
1996.
Drabenstott, K. M. Subject Access to Visual Resources Collections: A Model for Computer
Construction of Thematic Catalogs. New York: Greenwood Press, 1986.
Dunlop, M. D. Multimedia Information Retrieval, University of Glasgow, 1991.
Ellis, D. The Derivation of a Behavioural Model for Information Retrieval System Design, University
of Sheffield, 1987.
Enser, P. G. B. “Query Analysis in a Visual Information Retrieval Context.” Journal of Document
and Text Management 1(1) (1993), 25–52.
Evans, H. Picture Librarianship. New York: K. G. Saur, 1980.
Fidel, R. “The Image Retrieval Task: Implications for the Design and Evaluation of Image
Databases.” The New Review of Hypermedia and Multimedia 3 (1997), 181–199.
Frost, O. “The University of Michigan School of Information Art Image Browser: Designing and
Testing a Model for Image Retrieval.” Knowledge Organization and Change. Ed. R. Green.
Frankfurt/Main, Indeks Verlag, 5 (1996), 182–188.
Gilbert, K. D. Picture Indexing for Local History Materials. Monroe NY: Library Research
Associates, 1973.
Gordon, A. S. The Design of Knowledge-Rich Browsing Interfaces for Retrieval in Digital Libraries,
Ph.D., Northwestern University, 1998.
Green, S. J. The Classification of Pictures and Slides. Denver CO: Little Books, 1984.
Greenberg, J. “Intellectual control of visual archives: A comparison between the Art and Architecture Thesaurus and Library of Congress Thesaurus for Graphic Materials.” Cataloging &
Classification Quarterly 16(1) (1993), 85–117.
Gupta, A. and S. Santini. “In Search of Information in Visual Media.” Communications of the ACM
40(12) (1997), 34–42.
Haralick, R. M. “Statistical and Structural Approaches to Texture.” Proceedings IEEE 67 (1979),
786–804.
Hastings, S. K. An Exploratory Study of Intellectual Access to Digitized Art Images, Ph.D., The
Florida State University, 1994.
Hibler, J. N. D. and C. H. Leung. Image Storage and Retrieval Systems. Proceedings -SPIE, the
International Society for Optical Engineering, San Jose, CA., 1992.
Holt, B. and K. Weiss. Proceedings of the 60th ASIS Annual Meeting. ASIS ’97, Washington, D.C.,
Information Today, Inc., 1997.
Hourihane, C. “A Selective Survey of Systems of Subject Classification.” Computers and the History
of Art. Eds. W. Vaughan, A. Hamber and J. Miles. London: Mansell Publishing Limited, 1989,
pp. 117–129.
Jörgensen, C. Image Attributes: An Investigation, Ph.D. Syracuse University, 1995.
Jörgensen, C. Proceedings of the 6th ASIS SIG/CR Classification Research Workshop. Classification
Research Workshop, Chicago IL, American Society for Information Science Special Interest
Group/Classification Research, 1995.
Jörgensen, C. “The Applicability of Existing Classification Systems to Image Attributes: A Selected
Review.” Knowledge Organization and Change. Ed. R. Green. Frankfurt/Main, Indeks Verlag, 5
(1996), 189–197.
Julesz, B. “Textons, the Elements of Texture Perception and Their Interactions.” Nature 290 (1981),
91–97.
Keister, L. H. “User Types and Queries: Impact on Image Access Systems.” Challenges in Indexing
Electronic Text and Images. Eds. R. Fidel, T. B. Hahn, E. M. Rasmussen and P. J. Smith. Medford
NJ, Learned Information, Inc., 1994.
Korf Vidal, N. Experimental Image Taxonomy: An Inquiry into Spontaneous Image Organization,
Master’s Thesis, Cornell University, 1995.
Lancaster, F. W. “Indexing Multimedia Sources.” Indexing and Abstracting in Theory and Practice.
Champaign IL, University of Illinois Graduate School of Library and Information Science, 1998,
pp. 206–221.
Lassaline, M. E. and E. J. Wisniewski. “Basic Levels in Artificial and Natural Categories: Are
All Basic Levels Created Equal?” Percepts, Concepts, and Categories: The Representation and
Processing of Information. Ed. B. Burns, New York: North-Holland, 93 (1992).
Lockhead, G. R. “On Identifying Things.” Percepts, Concepts, and Categories: The Representation
and Processing of Information. Ed. B. Burns, New York: North-Holland, 93 (1992), 109–143.
Lohse, G. L. and K. Biolsi. “A Classification of Visual Representations.” Communications of the
ACM 37(12) (1994), 36–49.
Medin, D. L. and W. D. Wattenmaker. “Category Cohesiveness, Theories, and Cognitive Archeology.” Concepts and Conceptual Development: Ecological and Intellectual Factors in Categorization. Ed. U. Neisser, New York: Cambridge University Press, 1987, pp. 25–62.
Okon, C. “IBM’s Image Recognition Tech for Databases at Work: QBIC.” Advanced Imaging, 1995,
pp. 63–65.
Panofsky, E. Studies in Iconology. New York: Harper & Row, 1962.
Parker, E. B. LC Thesaurus for Graphic Materials: Topical Terms for Subject Access. Washington,
D. C.: Library of Congress, 1987.
Rasmussen, E. M. “Indexing Images.” Annual Review of Information Science and Technology
(ARIST). Ed. M. E. Williams, Medford NJ: Information Today, Inc., 32 (1997), 169–196.
Roddy, K. “Subject Access to Visual Resources: What the 90s Might Portend.” Library Hi Tech, 9(1)
(1991), 45–49.
Romer, D. Getty Information Institute Online Conference on Digitizing Technologies, 1995.
Romer, D. M. A Keyword is Worth 1,000 Images. Rochester, NY: Kodak, Inc., 1993.
Rorvig, M. E. and C. H. Turner. The NASA Image Collection Visual Thesaurus. Proceedings.
American Society for Information Science 17th Mid-Year Meeting, Ann Arbor, MI, 1988.
Rosch, E. and C. B. Mervis. “Basic Objects in Natural Categories.” Cognitive Psychology, 8 (1976),
382–439.
Seloff, G. A. “Automated Access to the NASA-JSC Image Archive.” Library Trends, 38(4) (1990).
Shatford, S. “Analyzing the Subject of a Picture: A Theoretical Approach.” Cataloging & Classification Quarterly 6(3) (1986), 39–62.
Simons, W. and L. C. Tansey. A Slide Classification System for the Organization and Automatic
Indexing of Interdisciplinary Collections of Slides and Pictures. Santa Cruz CA, University of
California, 1970.
Sledge, J. Points of View. Multimedia Computing and Museums. Philadelphia, Archives & Museum
Informatics, 1995, pp. 335–346.
Smith, E. E. and G. J. Balzano. “Nominal, Perceptual, and Semantic Codes in Picture Categorization.” Semantic Factors in Cognition. Eds. J. W. Cotton and R. L. Klatzky, Hillsdale, NJ: Lawrence
Erlbaum Associates, 1978, pp. 137–168.
Soergel, D. “The Art and Architecture Thesaurus (AAT): A Critical Appraisal.” Visual Resources 10
(1995), 369–400.
Stam, D. C. “The Quest for a Code, or a Brief History of the Computerized Cataloging of Art Objects.”
Art Documentation 8 (1989), 7–15.
Stam, D. C. and A. Giral. “Linking Art Objects and Art Information.” Library Trends, 37(2) (1988),
117–264.
Svenonius, E. Thesauri. Automatic Processing of Art History Data and Documents. Eds. L. Corti and
M. Schmitt. Los Angeles: The J. Paul Getty Trust, 1984, pp. 33–48.