Fear of Authority?
Authority Control and Thesaurus Building
for Art and Material Culture Information
Murtha Baca
SUMMARY. Until the 1980s, concepts like authority control, controlled vocabularies, metadata, and metadata schemas were all but unknown
in the world of art and material culture information. This paper traces
the evolution and current status of tools and resources for authority
control of art information, and gives examples of how the lack of authority control can impede end-user access. Collection-specific thesauri and subject indexes, and vocabulary-assisted searching and
query expansion are also discussed.
© 2004 by The Haworth Press, Inc. All rights reserved.
KEYWORDS. Authority control, art museum information, thesauri,
thesaurus building, local thesauri, vocabulary-assisted searching, query
expansion, LCNAF, LCSH, TGM, AAT, ULAN, TGN, ICONCLASS,
access points, controlled vocabularies, subject access, metadata schemas,
CDWA, VRA Core Categories, Cataloguing Cultural Objects
Murtha Baca is affiliated with the Getty Research Institute.
[Haworth co-indexing entry note]: “Fear of Authority? Authority Control and Thesaurus Building for Art
and Material Culture Information.” Baca, Murtha. Co-published simultaneously in Cataloging & Classification Quarterly (The Haworth Information Press, an imprint of The Haworth Press, Inc.) Vol. 38, No. 3/4,
2004, pp. 143-151; and: Authority Control in Organizing and Accessing Information: Definition and International Experience (ed: Arlene G. Taylor, and Barbara B. Tillett) The Haworth Information Press, an imprint of
The Haworth Press, Inc., 2004, pp. 143-151.
http://www.haworthpress.com/web/CCQ
Digital Object Identifier: 10.1300/J104v38n03_13
My first job at the Getty, in the late 1980s, was as a name authority editor in
what was then called the Vocabulary Coordination Group (now the Getty Vocabulary Program). With a joint doctorate in art history and Italian language
and literature, and having up to that point devoted myself to teaching and
translation work, I was immersed in visual culture and well aware of the power
of language, but had never heard of “authority control.” Unlike the library
world, where cataloging and authority control have long been viewed as essential for providing access to information, the art and cultural heritage communities have only relatively recently become aware of the importance of
managing information and using standards and authority files in order to provide access to their collections.
Although the art library and visual resources communities were aware of
tools like Library of Congress Subject Headings (LCSH), the Library of Congress Name Authority File (LCNAF), and the Thesaurus for Graphic Materials (TGM),1
few if any art museums were even aware of the need for such tools, much less
the existence of the specific tools themselves until the last decade or so. In the
mid- and late 1980s, when TGM I (Subject Terms) and TGM II (Genre and
Physical Characteristics Terms) were being developed, most art museums
didn’t have anything that could be called a collections information system.
Many still don’t, even though they may have purchased collection management software.
In 1980, the Getty began work on the Art & Architecture Thesaurus
(AAT),2 seeking to build a tool that would be useful as an authority file for
those whose job it was to catalog and describe not only bibliographic materials
about art and material culture, but also visual surrogates of works of art, architecture, and material culture (in the case of slide libraries, photographic archives, and similar repositories), as well as the objects themselves (in the case
of museums, archives, and other holding institutions). In those now distant
days before the advent of the World Wide Web and the creation of millions of
uncataloged, largely unstructured Web resources, little did we know how potentially powerful authority files and thesauri could be; the variant terms and
broader and narrower terms can increase both precision and recall in the online
environment, as we shall see in the examples given below.
In the mid-1980s, the Getty began developing its second vocabulary tool:
the Union List of Artist Names (ULAN),3 a database of preferred and variant
names, biographical information, and bibliographic citations for artists, architects, and other creators in the field of visual arts and architecture. Once again,
the goal was to create an authority file (in this case, a personal name authority)
for catalogers and indexers in the visual arts and architecture. Another reason
for the development of the ULAN was the fact that multiple Getty projects
built and maintained their own local name authorities; by creating a union resource, we could both enhance the scope and depth of our name authority
work, and eliminate redundant work on different projects. In the late 1980s,
work began on the third vocabulary database: the Getty Thesaurus of Geographic Names (TGN),4 with the data first published on the Web in 1997 (yes,
it can take that long to build a good thesaurus, particularly when it has more
than a million names in it).
Building our three vocabularies has been, and continues to be, a time- and
labor-intensive undertaking. Perhaps an even greater challenge is getting museums to use these and other tools for authority control and end-user access.
Curators and other museum professionals tend to be horrified by an expression
like “authority control.” The mere idea of an art historian who considers him- or herself to be an “authority” on a particular artist, school, or art form being
told the exact name to use for a particular artist, or what an object in the collections under his or her care should be called, is abhorrent. Thus, a period of
“consciousness raising” and education began as museums made the first attempts to control their collections information, and the effort is still going on
as of this writing. Museums, or rather the decision-makers and those who allocate resources (financial and human) at museums, need to understand that simply purchasing computers, a scanner, and collection management software
will not provide good access to their collections information for the wide
range of potential users of that information, from in-house users (registrar’s office, curatorial departments, education department, security staff, etc.) to external users (from advanced researchers to first-time museum visitors and
casual Web visitors). The skills, tools, and methods long known to the library
community, including cataloging and controlled vocabularies, are essential
for organizing and publishing information on any collection. And curators and
other art experts need to understand that authority files do include a preferred
form or heading, but also accommodate variant names and forms, which are
clustered together with the preferred or display form. The philosophy of the
Getty Vocabulary Program is that all names or terms in a cluster are equally
valid as both access points and descriptive metadata; one is not “better” than
the others.
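The idea of a cluster, in which a preferred form and its variants all lead to the same record, can be sketched as a small data structure. This is a minimal Python sketch under stated assumptions: the class and function names, the record identifier, and the choice of case-insensitive matching are hypothetical and do not describe how the Getty vocabularies are actually stored; the names themselves are taken from the Honthorst example discussed below.

```python
from dataclasses import dataclass, field


@dataclass
class NameCluster:
    """A hypothetical authority record: one preferred form plus its variants.

    Every name in the cluster is treated as an equally valid access point.
    """
    record_id: str
    preferred: str
    variants: list[str] = field(default_factory=list)

    def all_names(self) -> list[str]:
        return [self.preferred, *self.variants]


def build_access_index(clusters: list[NameCluster]) -> dict[str, str]:
    """Map every name form (case-folded) to the record it belongs to."""
    index: dict[str, str] = {}
    for cluster in clusters:
        for name in cluster.all_names():
            index[name.casefold()] = cluster.record_id
    return index


honthorst = NameCluster(
    record_id="example-0001",  # hypothetical identifier
    preferred="Honthorst, Gerrit van",
    variants=["Gerrit van Honthorst", "Gherardo delle Notti"],
)
index = build_access_index([honthorst])
print(index["gherardo delle notti"])  # example-0001
```

Because the lookup does not privilege the preferred form, any of the three names reaches the same record.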
Let’s take a look at what happens when there is no authority control for art
databases. AMICO (the Art Museum Image Consortium),5 one of the first large-scale image repositories for the
study of art history, “federates” information and images from the many member museums that contribute to it. The idea is to provide access to high-quality,
high-resolution images (and the images in the AMICO library are certainly
beautiful) and information, for educational and research purposes.
The AMICO library (a misnomer, considering that the word “library” implies organization and classification) is built by taking contributed records
from participating institutions. Although the AMICO data dictionary is
loosely based on Categories for the Description of Works of Art (CDWA),6
and there is a required format for contribution of data to AMICO, this is not the
same as contributing MARC records to a bibliographic utility. Many, if not
most, of the institutions that contribute to AMICO don’t appear to consistently
follow a standard metadata schema in their local collection management systems, nor do they appear to have authority control on their artist names; the
AMICO library certainly doesn’t. So, if a user enters a search for “creator =
van gogh” from the Simple Search screen, he or she is taken directly to an alphabetical list in which the name “van Gogh” does not appear. At this point, many users would simply assume that the AMICO library contains no images of works
by Vincent van Gogh. If the user persists, however, and tries the strategy of
simply entering the single keyword “gogh” in the “creator” field, he or she is
presented with another alphabetical display, in which there are 48 hits for
“Gogh, Vincent Van” and 3 hits for “Gogh, Vincent Willem Van.” This is because the Philadelphia Museum of Art, which is one of the AMICO contributors, uses the form “Gogh, Vincent Willem Van,” while all of the other
contributing museums use “Gogh, Vincent Van” (incidentally, the preferred
LCNAF form is “Gogh, Vincent van”; inexplicably, AMICO chose to capitalize the prefix “van,” even though its contributing museums, and of course
LCNAF, which follows AACR, do not). At least two things are happening
here: (1) there is no authority control, so the variant form used by the Philadelphia Museum of Art is not clustered with the preferred form used by all of the
other contributing museums; and (2) the search engine is not doing keyword
searching, but phrase searching, so the direct form “van Gogh” is not retrieved
because the records in the AMICO repository only have the inverted form
“Gogh, Vincent Van” in the “Creator” field.
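The two failures described above, phrase matching against an inverted heading and unclustered variant headings, can be reproduced in a few lines. The following Python sketch is illustrative only: the tokenizer and matching functions are simplified stand-ins, not AMICO's actual search engine, and the authority mapping is hypothetical; the hit counts are the ones reported above.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-case word tokens; punctuation such as commas is ignored."""
    return re.findall(r"\w+", text.casefold())


def phrase_match(query: str, value: str) -> bool:
    """True if the query tokens appear contiguously, in order, in the value."""
    q, v = tokens(query), tokens(value)
    return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))


def keyword_match(query: str, value: str) -> bool:
    """True if every query token appears somewhere in the value."""
    return set(tokens(query)) <= set(tokens(value))


heading = "Gogh, Vincent Van"
print(phrase_match("van gogh", heading))   # False: the heading is inverted
print(keyword_match("van gogh", heading))  # True

# Creator values as contributed (48 + 3 records, per the counts above).
contributed = ["Gogh, Vincent Van"] * 48 + ["Gogh, Vincent Willem Van"] * 3

# Hypothetical authority mapping that clusters both forms under the LCNAF heading.
authority = {
    "gogh, vincent van": "Gogh, Vincent van",
    "gogh, vincent willem van": "Gogh, Vincent van",
}
print(Counter(authority[name.casefold()] for name in contributed))
# Counter({'Gogh, Vincent van': 51})
```

Either fix alone helps: keyword matching finds the inverted heading, and clustering gathers all 51 records under one heading.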
As anyone who has worked with art-historical materials knows, pre-modern artist names can be particularly problematic. A search for “gherardo delle
notti,” or even the single keyword “notti” on the Web site of the Hermitage
Museum,7 retrieves no results, because the Hermitage lists this Dutch artist
who spent much of his career in Italy under the Dutch form of his name,
“Gerrit van Honthorst.” The same thing occurs on the Web sites of the National
Gallery, London; the National Gallery of Art, Washington, DC; the National Portrait Gallery; the Louvre; and even in the databases of commercial
image collections such as the Bridgeman Art Library.8 And, as with an
AMICO search, a search on the Hermitage site for “Gerrit van Honthorst” retrieves zero results, while a search on the single keywords “gerrit” or
“honthorst” does. Why? Because the search engine is doing a phrase search,
and the name is given only in inverted form (“Honthorst, Gerrit van”). Once
again, there are at least two barriers to end-user access. The Uffizi in Florence,
instead, uses only the nickname by which Gerrit van Honthorst became famous during his stay in Italy, “Gherardo delle Notti,” which appears in virtually
all of the scholarly literature in the Italian language on this particular
artist. The LCNAF record for this artist (which was contributed by the Getty
through NACO, the name authority component of the Library of Congress’s Program for Cooperative Cataloging) has eight variant forms in addition to the preferred form “Honthorst,
Gerrit van” (Figure 1); the ULAN record for the same artist (Figure 2) has 26
variant names in addition to the preferred name (which is identical to the
LCNAF preferred name) and the display name “Gerrit van Honthorst”; this is
because the ULAN allows “variants of variants,” and also includes a “display
name” in natural order, both for purposes of display and to accommodate
phrase searches. All of these names, which have appeared in scholarly literature, in primary documents, or on art objects, are valid access points that can
lead users to the information they are seeking.9
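The ULAN's natural-order display name matters because a phrase search for “gerrit van honthorst” can only match a stored string in that order. A minimal Python sketch of deriving a natural-order form from an inverted heading follows; the function names are hypothetical, and the rule that trailing lower-case words such as “van” are surname particles is a simplifying assumption that will not hold for every name, which is why it is safer to record the display form explicitly, as the ULAN does.

```python
def natural_order(inverted: str) -> str:
    """Convert an inverted heading such as 'Honthorst, Gerrit van' to natural order.

    Simplifying assumption: trailing lower-case words after the comma
    (e.g. 'van', 'de') are surname particles and move in front of the
    surname. Headings without a comma are returned unchanged.
    """
    if "," not in inverted:
        return inverted
    surname, rest = (part.strip() for part in inverted.split(",", 1))
    words = rest.split()
    particles: list[str] = []
    while words and words[-1].islower():
        particles.insert(0, words.pop())
    return " ".join(words + particles + [surname])


def phrase_searchable_forms(heading: str) -> set[str]:
    """Store both the inverted and the natural-order form for phrase searching."""
    return {heading.casefold(), natural_order(heading).casefold()}


forms = phrase_searchable_forms("Honthorst, Gerrit van")
print("gerrit van honthorst" in forms)  # True
```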
In addition to variant forms, of course, the hierarchical structure of authority files that take the form of thesauri can make them potentially very powerful
as retrieval tools. Thus, to use an example from the AAT, a non-expert user
will retrieve a cartonnier even if he or she doesn’t know the specific name and
has searched on “cabinet,” provided that the object has been indexed using the
broader term. A searcher who is looking for the town in Tuscany with many
medieval towers, but can only remember that it’s near Siena and begins with
“San,” can find the name San Gimignano by searching on “Siena” in the TGN
and expanding the hierarchy below Siena province.
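Retrieval through the hierarchy works by expanding a search term to include everything beneath it. The Python sketch below assumes a tiny, hypothetical thesaurus fragment stored as child-to-parent links (it is not actual AAT or TGN data) and hypothetical object records indexed with specific terms. It expands the query downward at search time; the text above describes the alternative of recording the broader term with the object at cataloging time, and for this fragment either approach yields the same hits.

```python
from collections import defaultdict

# Hypothetical fragment (child -> parent); not actual AAT or TGN records.
PARENT = {
    "cartonnier": "cabinet",
    "bonnetière": "cabinet",
    "cabinet": "furniture",
    "Siena": "Siena province",
    "San Gimignano": "Siena province",
    "Siena province": "Tuscany",
}

CHILDREN: dict[str, list[str]] = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)


def narrower_terms(term: str) -> list[str]:
    """All terms below `term` in the hierarchy, at any depth."""
    found, stack = [], list(CHILDREN.get(term, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(CHILDREN.get(current, []))
    return found


def search(term: str, indexed: dict[str, str]) -> list[str]:
    """Return IDs of objects indexed with `term` or any narrower term."""
    wanted = {term, *narrower_terms(term)}
    return sorted(obj for obj, obj_term in indexed.items() if obj_term in wanted)


# Hypothetical object records, each indexed with one specific term.
OBJECTS = {"obj-1": "cartonnier", "obj-2": "bonnetière", "obj-3": "chair"}
print(search("cabinet", OBJECTS))  # ['obj-1', 'obj-2']
print("San Gimignano" in narrower_terms("Siena province"))  # True
```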
In addition to searching by personal and geographic names and object
types, users seeking information and/or images of works of art often search by
what is depicted in or on those works–their subject matter.10 “Subject matter”
can range from ordinary objects depicted in or on a work of art, to complex
narrative and iconographic themes. For searchers looking for depictions of
particular objects, tools like LC’s Thesaurus for Graphic Materials and the
AAT can be very helpful. The Thesaurus for Graphic Materials offers the
broader term “bathing suits” (and its variant “swimsuits”) for the search term
“bikini.” The AAT also distinguishes between “bikinis (bathing suits)” and
“bikinis (underwear).”
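Homographs such as the AAT's two “bikinis” terms are distinguished by a parenthetical qualifier, which lets a system detect that an unqualified search is ambiguous and ask the user which sense is meant. A short Python sketch follows; the term list is a hypothetical fragment and the function names are illustrative.

```python
import re
from typing import Optional

# Hypothetical fragment; qualifiers follow the pattern "term (qualifier)".
TERMS = ["bikinis (bathing suits)", "bikinis (underwear)", "bathing suits"]

QUALIFIED = re.compile(r"^(?P<base>.+?)\s*\((?P<qualifier>[^)]+)\)$")


def split_term(term: str) -> tuple[str, Optional[str]]:
    """Split 'bikinis (underwear)' into ('bikinis', 'underwear')."""
    match = QUALIFIED.match(term)
    return (match["base"], match["qualifier"]) if match else (term, None)


def candidates(query: str) -> list[str]:
    """All thesaurus terms whose unqualified form equals the query."""
    q = query.strip().casefold()
    return [t for t in TERMS if split_term(t)[0].casefold() == q]


options = candidates("bikinis")
if len(options) > 1:
    # Ambiguous: show the qualifiers so the user can pick a sense.
    print([split_term(t)[1] for t in options])  # ['bathing suits', 'underwear']
```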
For users searching for the narrative content or iconographic themes of
works of art, a powerful if misunderstood (and mis-marketed) tool that specifically classifies the narrative content of figurative works of art (particularly
Western European art) is ICONCLASS.11 This tool can assist users with variant terms (e.g., “Heracles” for “Hercules”; “Hera” for “Juno”), but it also uses a
hierarchical structure to help users identify specific narrative episodes that are
“children” of broad concepts (e.g., “Hercules in love with Deianira, daughter
of Oeneus,” which is a child of the broader concept “Love-affairs of Hercules,” in its turn a child of “Story of Hercules,” which is a child of “Greek heroic
legends”). Another potentially powerful feature of the ICONCLASS
system is the set of keywords that it associates with specific notations.12 Thus the
notation 94A332 (“Hercules searching for Hylas”) comes “pre-packaged”
with keywords like “mythology,” “Greek legend,” “hero,” “searching,” “sailing,” “Mysia” (the place where Hercules’ beloved companion Hylas was abducted by a nymph), and so on. These keywords make it
possible to identify the iconographic or narrative content of images and/or to
retrieve images that have been indexed with these keywords. For example, if
the ICONCLASS description and accompanying keywords had been used to
index the images, a search on “hair cutting” would retrieve images of Samson
having his hair cut off by Delilah or by a Philistine (the perpetrator varies, especially in depictions from the Baroque period).
FIGURE 1. Library of Congress Name Authority File (LCNAF) Record for Gerrit van Honthorst
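Keyword retrieval of this kind amounts to an inverted index from keywords to notations, and from notations to the images indexed with them. The Python sketch below uses the notation 94A332 and the partial keyword list quoted above; the image identifier is hypothetical, and the structure is an illustration, not the ICONCLASS software or data format.

```python
from collections import defaultdict

# Notation -> keywords, using the partial keyword list quoted in the text.
NOTATION_KEYWORDS = {
    "94A332": ["mythology", "Greek legend", "hero", "searching", "sailing", "Mysia"],
}

# Hypothetical image records, each indexed with one or more notations.
IMAGE_NOTATIONS = {"image-17": ["94A332"]}


def build_keyword_index(notation_keywords, image_notations):
    """Map each case-folded keyword to the set of images it should retrieve."""
    images_by_notation = defaultdict(set)
    for image, notations in image_notations.items():
        for notation in notations:
            images_by_notation[notation].add(image)
    index = defaultdict(set)
    for notation, keywords in notation_keywords.items():
        for keyword in keywords:
            index[keyword.casefold()] |= images_by_notation[notation]
    return index


index = build_keyword_index(NOTATION_KEYWORDS, IMAGE_NOTATIONS)
print(sorted(index["mysia"]))  # ['image-17']
```

The image is never indexed with the word “Mysia” directly; it inherits that keyword from the notation assigned to it.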
In order to exploit the power of the hierarchical structure of a thesaurus, the
broader term(s) must be entered either manually or automatically at the point
of cataloging. Thus the cataloger who is describing a “bonnetière” will also
enter the term “cabinet” and even “furniture” to assist searchers who do not
know the specific name of the object for which they are searching; it may even
be useful (if heretical, at least in the traditional library world) to enter a wrong
broader term, or “false parent” such as “desk,” because that may be how the
user has interpreted the object. Of course, this kind of cataloging can be labor- and hence time-intensive. A “mechanized” solution is to write a computer program that automatically adds the broader term or terms from the thesaurus. Thus, if a user enters the word “desk” on the search page of the Getty
Web site,13 the results will include a page for a “secrétaire à abattant” on
which the word “desk” does not appear; unbeknownst to the end-user (and this
in itself could be a source of usability problems), a program has inserted that
term into the Keywords META tag in the source of the HTML page, because
that is the parent term in the local thesaurus.
FIGURE 2. Union List of Artist Names (ULAN) Record for Gerrit van Honthorst
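The kind of program described here can be sketched in a few lines of Python. The local thesaurus below is a hypothetical two-level fragment: “desk” as the parent of “secrétaire à abattant” comes from the example above, while “furniture” is an added assumption; the function names are illustrative, and this is not the Getty's actual code.

```python
from html import escape

# Hypothetical local thesaurus: term -> parent term.
LOCAL_PARENT = {"secrétaire à abattant": "desk", "desk": "furniture"}


def broader_terms(term: str) -> list[str]:
    """Walk up the local hierarchy, stopping at the top or at a cycle."""
    chain: list[str] = []
    while term in LOCAL_PARENT and LOCAL_PARENT[term] not in chain:
        term = LOCAL_PARENT[term]
        chain.append(term)
    return chain


def keywords_meta(object_term: str) -> str:
    """Build a Keywords META tag from an object term plus its broader terms."""
    keywords = [object_term, *broader_terms(object_term)]
    return f'<meta name="keywords" content="{escape(", ".join(keywords))}">'


print(keywords_meta("secrétaire à abattant"))
# <meta name="keywords" content="secrétaire à abattant, desk, furniture">
```

Because the inserted terms are invisible on the rendered page, a user may not see why the result matched, which is the usability concern raised above.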
Another machine solution is to interpose an authority file between the
searcher and the resources being searched. If a visitor to the Getty Web site
enters the name “Carucci” on the search page, Web pages relating to the artist
known as Pontormo (the name of his birthplace), whose birth name was
Jacopo Carucci, will be retrieved. This is because the user’s search statement
is being run against a copy of the ULAN data, and when a match is found, all
of the name forms, preferred and variant, from that record are submitted to the
search engine. Of course, a computer program cannot decide to include a misnomer or false parent on the grounds that some users are likely
to use it in their search for a particular item. (I am convinced that in my lifetime,
no computer program will be able to catalog better than an appropriately
trained human cataloger.)
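Interposing an authority file amounts to query expansion: match the user's words against every name in a copy of the authority data, then pass all the names from any matching record to the search engine. The following Python sketch assumes a hypothetical, much-abbreviated record (not actual ULAN data) and an OR-query syntax that a given search engine may or may not accept.

```python
import re

# Hypothetical, abbreviated stand-in for a ULAN-derived record.
AUTHORITY = [
    {"id": "example-pontormo", "names": ["Pontormo", "Jacopo Carucci", "Carucci, Jacopo"]},
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.casefold()))


def expand_query(query: str, authority: list[dict]) -> list[str]:
    """Return every name form from records with a name containing all query words."""
    q = tokens(query)
    if not q:
        return []
    expanded: list[str] = []
    for record in authority:
        if any(q <= tokens(name) for name in record["names"]):
            for name in record["names"]:
                if name not in expanded:
                    expanded.append(name)
    return expanded or [query]  # no authority match: search the words as typed


def to_or_query(names: list[str]) -> str:
    """Quote each name and join with OR (syntax assumed, engine-dependent)."""
    return " OR ".join(f'"{name}"' for name in names)


print(to_or_query(expand_query("Carucci", AUTHORITY)))
# "Pontormo" OR "Jacopo Carucci" OR "Carucci, Jacopo"
```

The sketch only adds names already in the authority record; a misnomer or false parent still has to be added by a person.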
Simply adopting or interposing a published authority file or classification
system such as LCSH or the ULAN or ICONCLASS is not, however, the most
efficient way of enhancing end-user access to art information (or information
in any other field of study or interest, for that matter). Large authority files like
the Library of Congress authorities or the AAT or TGN (which has more than
1 million names referring to circa 900,000 places) are not only unwieldy as
“searching assistants,” they probably aren’t the right tools to use to enhance
precision and recall in searching specific collections. Many museums and
other cultural heritage collections (and the vendors who build the systems they
use for collection management, many of which now include thesaurus modules) are coming to realize that the best way to enhance end-user access by
means of vocabularies is to build collection-specific thesauri and indexes, taking terms and names from standard published authorities such as LCSH (and
recording the source of such terms), but also adding further variants from
curators, educators, and even “wrong” terms (e.g., “pot” as a broader term for
“lekythos,” or “jar” as an alternate term for “hydria”). Again, some of this can
be automated (e.g., by writing a program that takes the broader terms and variant terms from the local thesaurus and uses them to populate the Keywords
META tag on a Web page for a museum object, as in the example from the
Getty Museum given above); but at some point (in this case,
when the local thesaurus or subject index is being constructed, or when the object is originally being cataloged), a human being who both understands the
collections and understands thesaurus construction and authority control has
to do the work–that is, a person with skill, good judgment, experience, and
knowledge of the material being described, to echo Michael Gorman in his essay “Authority Control in the Context of Bibliographic Control in the Electronic Environment” in the present collection of essays.
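A collection-specific thesaurus that records where each term came from can be represented simply. In this hypothetical Python sketch, the “lekythos”/“pot” and “hydria”/“jar” pairs come from the text, while the class names and the source labels (“published” versus “local”) are assumptions; a real system would record the specific published vocabulary cited.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Label:
    text: str
    source: str  # e.g. "published" (with a citation) or "local"


@dataclass
class LocalEntry:
    preferred: Label
    broader: list[Label] = field(default_factory=list)
    alternates: list[Label] = field(default_factory=list)

    def access_points(self) -> list[str]:
        """Preferred, alternate, and broader terms, all usable for retrieval."""
        return [label.text for label in (self.preferred, *self.alternates, *self.broader)]


# Hypothetical entries; the "wrong" terms are deliberate local additions.
THESAURUS = {
    "lekythos": LocalEntry(Label("lekythos", "published"), broader=[Label("pot", "local")]),
    "hydria": LocalEntry(Label("hydria", "published"), alternates=[Label("jar", "local")]),
}

# Reverse index: any access point (case-folded) -> entries it leads to.
# A set is used because a broad term such as "pot" may serve many entries.
LOOKUP: dict[str, set[str]] = defaultdict(set)
for key, entry in THESAURUS.items():
    for point in entry.access_points():
        LOOKUP[point.casefold()].add(key)

print(sorted(LOOKUP["jar"]))  # ['hydria']
```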
In the museum world as in the library world, cataloging and authority control are (or should be) essential for organizing, documenting, and providing
good end-user access to information on our collections–in short, they are part
of the indispensable set of tools we need to fulfill our basic mission of preserving and providing access to our collections. Metadata schemas like Categories
for the Description of Works of Art, MARC VIM, and the VRA Core Categories14 that are specifically designed for cataloging works of art and visual materials exist, as do a range of vocabulary tools that are appropriate for
populating metadata element sets for art and material culture. As of this writing, an editorial team of members of the Visual Resources Association, with
an advisory group of leading experts on cataloging works of art from the museum, library, and archival communities, is nearing completion of the first version
of Cataloguing Cultural Objects: A Guide to Describing Cultural Works
and Their Images, which one hopes will become an essential part of the art and
image cataloger’s “desktop.”15
Still missing at many museums are an awareness of these tools and methods, and the skilled people to implement them, in order to ensure that what we
have all rushed to make available on the World Wide Web can be found, identified, selected, and eventually obtained16 (or at least viewed) by our huge audience of end-users.
NOTES
1. “[D]eveloped to support the cataloging and retrieval needs of the Library of
Congress Prints and Photographs Division.” See www.loc.gov/rr/print/tgm1/.
2. Available at www.getty.edu/research/conducting_research/vocabularies/aat.
3. Available at www.getty.edu/research/conducting_research/vocabularies/ulan.
4. Available at www.getty.edu/research/conducting_research/vocabularies/tgn.
5. See www.amico.org. The AMICO library is available by subscription from the
Research Libraries Group.
6. A metadata element set developed by a task force that was modeled after
NISTF, sponsored by the College Art Association and what was formerly known as the
Getty Art History Information Program. See www.getty.edu/research/conducting_research/datastandards/cdwa/.
7. See www.hermitagemuseum.org/. All of the Web searches in this article were
conducted in late August, 2003.
8. On the Web at www.bridgeman.co.uk.
9. N.B. The full ULAN record does not appear in this figure; it also includes roles,
events, related persons, sources for all names, and bibliographic citations.
10. See M. Baca, ed., Introduction to Art Image Access (Los Angeles: Getty Publications, 2002), especially the essays by Sara Shatford Layne and Colum Hourihane.
11. See http://www.iconclass.nl/.
12. One of the weaknesses of the ICONCLASS system is the alphanumeric notation that it employs, which is quite forbidding and user-unfriendly. This notation is, I believe, a vestige of the very early, paper-based days of this system (which dates from the
1950s).
13. At www.getty.edu/search/.
14. Available at http://www.vraweb.org/vracore3.htm.
15. Available as of this writing in draft form at http://www.vraweb.org/CCOweb/.
16. These are the four generic user tasks identified by IFLA’s Functional Requirements for Bibliographic Records final report, which is available at http://www.ifla.org/VII/s13/frbr/frbr.htm.