Fear of Authority? Authority Control and Thesaurus Building for Art and Material Culture Information
Murtha Baca
SUMMARY. Until the 1980s, concepts like authority control, controlled vocabularies, and metadata and schemas were all but unknown in the world of art and material culture information. This paper traces the evolution and current status of tools and resources for authority control of art information, and gives examples of how the lack of authority control can impede end-user access. Collection-specific thesauri and subject indexes, and vocabulary-assisted searching and query expansion are also discussed. [Article copies available for a fee from The Haworth Document Delivery Service: 1-800-HAWORTH. Website: <http://www.HaworthPress.com> © 2004 by The Haworth Press, Inc. All rights reserved.]
KEYWORDS. Authority control, art museum information, thesauri, thesaurus building, local thesauri, vocabulary-assisted searching, query expansion, LCNAF, LCSH, TGM, AAT, ULAN, TGN, ICONCLASS, access points, controlled vocabularies, subject access, metadata schemas, CDWA, VRA Core Categories, Cataloguing Cultural Objects
Murtha Baca is affiliated with the Getty Research Institute.
[Haworth co-indexing entry note]: “Fear of Authority? Authority Control and Thesaurus Building for Art and Material Culture Information.” Baca, Murtha. Co-published simultaneously in Cataloging & Classification Quarterly (The Haworth Information Press, an imprint of The Haworth Press, Inc.) Vol. 38, No. 3/4, 2004, pp. 143-151; and: Authority Control in Organizing and Accessing Information: Definition and International Experience (ed: Arlene G. Taylor, and Barbara B. Tillett) The Haworth Information Press, an imprint of The Haworth Press, Inc., 2004, pp. 143-151.
http://www.haworthpress.com/web/CCQ © 2004 by The Haworth Press, Inc. All rights reserved. Digital Object Identifier: 10.1300/J104v38n03_13
My first job at the Getty, in the late 1980s, was as a name authority editor in what was then called the Vocabulary Coordination Group (now the Getty Vocabulary Program). With a joint doctorate in art history and Italian language and literature, and having up to that point devoted myself to teaching and translation work, I was immersed in visual culture and well aware of the power of language, but had never heard of “authority control.” Unlike the library world, where cataloging and authority control have long been viewed as essential for providing access to information, the art and cultural heritage communities have only relatively recently become aware of the importance of managing information and using standards and authority files in order to provide access to their collections. Although the art library and visual resource communities were aware of tools like LCSH, LCNAF, and the Thesaurus for Graphic Materials (TGM),1 few if any art museums were even aware of the need for such tools, much less the existence of the specific tools themselves, until the last decade or so. In the mid- and late 1980s, when TGM I (Subject Terms) and TGM II (Genre and Physical Characteristics Terms) were being developed, most art museums didn’t have anything that could be called a collections information system. Many still don’t, even though they may have purchased collection management software.
In 1980, the Getty began work on the Art & Architecture Thesaurus (AAT),2 seeking to build a tool that would be useful as an authority file for those whose job it was to catalog and describe not only bibliographic materials about art and material culture, but also visual surrogates of works of art, architecture, and material culture (in the case of slide libraries, photographic archives, and similar repositories), as well as the objects themselves (in the case of museums, archives, and other holding institutions). In those now distant days before the advent of the World Wide Web and the creation of millions of uncataloged, largely unstructured Web resources, little did we know how potentially powerful authority files and thesauri could be; the variant terms and broader and narrower terms can increase both precision and recall in the online environment, as we shall see in the examples given below.
In the mid-1980s, the Getty began developing its second vocabulary tool: the Union List of Artist Names (ULAN),3 a database of preferred and variant names, biographical information, and bibliographic citations for artists, architects, and other creators in the field of visual arts and architecture. Once again, the goal was to create an authority file (in this case, a personal name authority) for catalogers and indexers in the visual arts and architecture. Another reason for the development of the ULAN was the fact that multiple Getty projects built and maintained their own local name authorities; by creating a union resource, we could both enhance the scope and depth of our name authority work, and eliminate redundant work on different projects.
In the late 1980s, work began on the third vocabulary database: the Getty Thesaurus of Geographic Names (TGN),4 with the data first published on the Web in 1997 (yes, it can take that long to build a good thesaurus, particularly when it has more than a million names in it).
Building our three vocabularies has been, and continues to be, a time- and labor-intensive undertaking. Perhaps an even greater challenge is getting museums to use these and other tools for authority control and end-user access. Curators and other museum professionals tend to be horrified by an expression like “authority control.” The mere idea of an art historian who considers him- or herself to be an “authority” on a particular artist, school, or art form being told the exact name to use for a particular artist, or what an object in the collections under his or her care should be called, is abhorrent. Thus, a period of “consciousness raising” and education began as museums made the first attempts to control their collections information, and the effort is still going on as of this writing. Museums, or rather the decision-makers and those who allocate resources (financial and human) at museums, need to understand that simply purchasing computers, a scanner, and collection management software will not provide good access to their collections information for the wide range of potential users of that information, from in-house users (registrar’s office, curatorial departments, education department, security staff, etc.) to external users (from advanced researchers to first-time museum visitors and casual Web visitors). The skills, tools, and methods long known to the library community, including cataloging and controlled vocabularies, are essential for organizing and publishing information on any collection. And curators and other art experts need to understand that authority files do include a preferred form or heading, but also accommodate variant names and forms, which are clustered together with the preferred or display form. The philosophy of the Getty Vocabulary Program is that all names or terms in a cluster are equally valid as both access points and descriptive metadata; one is not “better” than the others.
Let’s take a look at what happens when there is no authority control for art databases. AMICO,5 one of the first large-scale image repositories for the study of art history, “federates” information and images from the many member museums that contribute to it. The idea is to provide access to high-quality, high-resolution images (and the images in the AMICO library are certainly beautiful) and information, for educational and research purposes. The AMICO library (a misnomer, considering that the word “library” implies organization and classification) is built by taking contributed records from participating institutions. Although the AMICO data dictionary is loosely based on Categories for the Description of Works of Art (CDWA),6 and there is a required format for contribution of data to AMICO, this is not the same as contributing MARC records to a bibliographic utility. Many, if not most, of the institutions that contribute to AMICO don’t appear to consistently follow a standard metadata schema in their local collection management systems, nor do they appear to have authority control on their artist names; the AMICO library certainly doesn’t. So, if a user enters a search for “creator= van gogh” from the Simple Search screen, he or she is taken directly to an alphabetical list, in which the name “van Gogh” does not appear. At this point, many users would simply assume that the AMICO library contains no images of works by Vincent van Gogh.
If the user persists, however, and tries the strategy of simply entering the single keyword “gogh” in the “creator” field, he or she is presented with another alphabetical display, in which there are 48 hits for “Gogh, Vincent Van” and 3 hits for “Gogh, Vincent Willem Van.” This is because the Philadelphia Museum of Art, which is one of the AMICO contributors, uses the form “Gogh, Vincent Willem Van,” while all of the other contributing museums use “Gogh, Vincent Van” (incidentally, the preferred LCNAF form is “Gogh, Vincent van”; inexplicably, AMICO chose to capitalize the prefix “van,” even though its contributing museums, and of course LCNAF, which follows AACR, do not). At least two things are happening here: (1) there is no authority control, so the variant form used by the Philadelphia Museum of Art is not clustered with the preferred form used by all of the other contributing museums; and (2) the search engine is not doing keyword searching, but phrase searching, so the direct form “van Gogh” is not retrieved because the records in the AMICO repository only have the inverted form “Gogh, Vincent Van” in the “Creator” field.
As anyone who has worked with art-historical materials knows, pre-modern artist names can be particularly problematic. A search for “gherardo delle notti,” or even the single keyword “notti” on the Web site of the Hermitage Museum,7 retrieves no results, because the Hermitage lists this Dutch artist who spent much of his career in Italy under the Dutch form of his name, “Gerrit van Honthorst.” The same thing occurs on the Web sites of the National Gallery, London; the National Gallery of Art, Washington, DC; the National Portrait Gallery; the Louvre; and even in the databases of commercial image collections such as the Bridgeman Art Library.8 And, as with an AMICO search, a search on the Hermitage site for “Gerrit van Honthorst” retrieves zero results, while a search on the single keyword “gerrit” or “honthorst” does retrieve records. Why?
Because the search engine is doing a phrase search, and the name is given only in inverted form (“Honthorst, Gerrit van”). Once again, there are at least two barriers to end-user access. The Uffizi in Florence, instead, uses only the nickname by which Gerrit van Honthorst became famous during his stay in Italy, “Gherardo delle Notti,” which appears in virtually all of the scholarly literature in the Italian language on this particular artist. The LCNAF record for this artist (which was contributed by the Getty through the Library of Congress’s Program for Cooperative Cataloging, NACO) has eight variant forms in addition to the preferred form “Honthorst, Gerrit van” (Figure 1); the ULAN record for the same artist (Figure 2) has 26 variant names in addition to the preferred name (which is identical to the LCNAF preferred name) and the display name “Gerrit van Honthorst”; this is because the ULAN allows “variants of variants,” and also includes a “display name” in natural order, both for purposes of display and to accommodate phrase searches. All of these names, which have appeared in scholarly literature, in primary documents, or on art objects, are valid access points that can lead users to the information they are seeking.9
In addition to variant forms, of course, the hierarchical structure of authority files that take the form of thesauri can make them potentially very powerful as retrieval tools. Thus, to use an example from the AAT, a non-expert user will retrieve a cartonnier even if he or she doesn’t know the specific name and has searched on “cabinet,” provided that the object has been indexed using the broader term. A searcher who is looking for the town in Tuscany with many medieval towers, but can only remember that it’s near Siena and begins with “San,” can find the name San Gimignano by searching on “Siena” in the TGN and expanding the hierarchy below Siena province.
FIGURE 1. Library of Congress Name Authority File (LCNAF) Record for Gerrit van Honthorst
In addition to searching by personal and geographic names and object types, users seeking information and/or images of works of art often search by what is depicted in or on those works–their subject matter.10 “Subject matter” can range from ordinary objects depicted in or on a work of art, to complex narrative and iconographic themes. For searchers looking for depictions of particular objects, tools like LC’s Thesaurus for Graphic Materials and the AAT can be very helpful. The Thesaurus for Graphic Materials offers the broader term “bathing suits” (and its variant “swimsuits”) for the search term “bikini.” The AAT also distinguishes between “bikinis (bathing suits)” and “bikinis (underwear).” For users searching for the narrative content or iconographic themes of works of art, a powerful, if misunderstood (and mis-marketed), tool that specifically classifies the narrative content of figurative works of art (particularly western European art) is ICONCLASS.11 This tool can assist users with variant terms (e.g., “Heracles” for “Hercules”; “Hera” for “Juno”), but it also uses a hierarchical structure to help users identify specific narrative episodes that are “children” of broad concepts (e.g., “Hercules in love with Deianira, daughter of Oeneus,” which is a child of the broader concept “Love-affairs of Hercules,” in its turn a child of “Story of Hercules,” which is a child of “Greek heroic legends”). Another potentially powerful feature of the ICONCLASS system is the set of keywords that it associates with specific notations.12 Thus the notation 94A332 (“Hercules searching for Hylas”) comes “pre-packaged” with keywords like “mythology,” “Greek legend,” “hero,” “searching,” “sailing,” “Mysia” (the place where Hercules’ beloved companion Hylas was abducted by a nymph), and so on.
The power of these keywords can make it possible to identify the iconographic or narrative content of images and/or to retrieve images that have been indexed with these keywords. For example, if the ICONCLASS description and accompanying keywords had been used to index the images, a search on “hair cutting” would retrieve images of Samson having his hair cut off by Delilah or by a Philistine (the perpetrator varies, especially in depictions from the Baroque period).
FIGURE 2. Union List of Artist Names (ULAN) Record for Gerrit van Honthorst
In order to exploit the power of the hierarchical structure of a thesaurus, the broader term(s) must be entered either manually or automatically at the point of cataloging. Thus the cataloger who is describing a “bonnetière” will also enter the term “cabinet” and even “furniture” to assist searchers who do not know the specific name of the object for which they are searching; it may even be useful (if heretical, at least in the traditional library world) to enter a wrong broader term, or “false parent” such as “desk,” because that may be how the user has interpreted the object. Of course this kind of cataloging can be labor- and hence time-intensive. A “mechanized” solution is to write a computer program that will automatically include the broader term or terms from the thesaurus. Thus, if a user enters the word “desk” on the search page of the Getty Web site,13 the results will include a page for a “secrétaire à abattant” on which the word “desk” does not appear; unbeknownst to the end-user (and this in itself could be a source of usability problems), a program has inserted that term into the Keyword META tag in the source of the HTML page, because that is the parent term in the local thesaurus. Another machine solution is to interpose an authority file between the searching and the resources being searched.
If a visitor to the Getty Web site enters the name “Carucci” on the search page, Web pages relating to the artist known as Pontormo (the name of his birthplace), whose given name was Jacopo Carucci, will be retrieved. This is because the user’s search statement is being run against a copy of the ULAN data, and when a match is found, all of the name forms, preferred and variant, from that record are submitted to the search engine. Of course, a computer program can’t make the decision to include a misnomer or false parent because that’s what some users may be likely to use in their search for a particular item. (I am convinced that in my lifetime, no computer program will be able to catalog better than an appropriately trained human cataloger.)
Simply adopting or interposing a published authority file or classification system such as LCSH or the ULAN or ICONCLASS is not, however, the most efficient way of enhancing end-user access to art information (or information in any other field of study or interest, for that matter). Large authority files like the Library of Congress authorities or the AAT or TGN (which has more than 1 million names referring to circa 900,000 places) are not only unwieldy as “searching assistants,” they probably aren’t the right tools to use to enhance precision and recall in searching specific collections.
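The “Carucci” search described above can be sketched as a small query-expansion step in Python. This is an illustration under stated assumptions, not the Getty’s actual implementation: the function names expand_query and search, the tuple-based authority list, and the two sample pages are all hypothetical, and the name cluster contains only the two forms mentioned in the text.

```python
# Hypothetical stand-in for a local copy of name authority data:
# each tuple is one cluster of equally valid name forms.
AUTHORITY = [
    ("Pontormo", "Jacopo Carucci"),
]

# Hypothetical Web pages, keyed by file name.
PAGES = {
    "pontormo.html": "A painting by Pontormo.",
    "desk.html": "A writing desk.",
}


def expand_query(query: str, authority: list[tuple[str, ...]]) -> list[str]:
    """Return every name form of the first cluster the query matches.

    A match is either the full name or any single word of a name,
    so "Carucci" matches "Jacopo Carucci". Unmatched queries pass through.
    """
    needle = query.strip().lower()
    for names in authority:
        for name in names:
            lowered = name.lower()
            if needle == lowered or needle in lowered.split():
                return list(names)
    return [query]


def search(pages: dict[str, str], terms: list[str]) -> list[str]:
    """Return the pages whose text contains at least one of the terms."""
    lowered = [term.lower() for term in terms]
    return [
        title
        for title, text in pages.items()
        if any(term in text.lower() for term in lowered)
    ]


print(search(PAGES, ["Carucci"]))                           # unexpanded query
print(search(PAGES, expand_query("Carucci", AUTHORITY)))    # expanded query
```

The unexpanded query finds nothing because the sample page uses only the name “Pontormo”; once the authority file is interposed, the same query retrieves that page. The design choice worth noting is that the expansion happens at search time, so the pages themselves need no re-cataloging.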
Many museums and other cultural heritage collections (and the vendors who build the systems they use for collection management, many of which now include thesaurus modules) are coming to realize that the best way to enhance end-user access by means of vocabularies is to build collection-specific thesauri and indexes, taking terms and names from standard published authorities such as LCSH (and recording the source of such terms), but also adding further variants from curators, educators, and even “wrong” terms (e.g., “pot” as a broader term for “lekythos,” or “jar” as an alternate term for “hydria”). Again, some of this can be automated (e.g., by writing a program that takes the broader terms and variant terms from the local thesaurus and uses them to populate the Keywords META tag on a Web page for a museum object, as in the example from the Getty Museum given above); but at some point (in this case, when the local thesaurus or subject index is being constructed, or when the object is originally being cataloged), a human being who both understands the collections and understands thesaurus construction and authority control has to do the work–that is, a person with skill, good judgment, experience, and knowledge of the material being described, to echo Michael Gorman in his essay “Authority Control in the Context of Bibliographic Control in the Electronic Environment” in the present collection of essays.
In the museum world as in the library world, cataloging and authority control are (or should be) essential for organizing, documenting, and providing good end-user access to information on our collections–in short, they are part of the indispensable set of tools we need to fulfill our basic mission of preserving and providing access to our collections.
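The automated step mentioned above, populating the Keywords META tag from a local thesaurus, might look something like the following Python sketch. The dictionaries BROADER and VARIANTS and the functions broader_terms and keywords_meta are hypothetical; the handful of terms comes from the examples in this article, including the deliberately “wrong” ones, and does not represent any museum’s actual thesaurus.

```python
from html import escape

# Hypothetical local thesaurus: each term points to its broader term.
# "pot" for "lekythos" is a deliberate "false parent" chosen by a cataloger.
BROADER = {
    "secrétaire à abattant": "desk",
    "bonnetière": "cabinet",
    "cabinet": "furniture",
    "lekythos": "pot",
}

# Hypothetical alternate terms supplied by curators or educators.
VARIANTS = {
    "hydria": ["jar"],
}


def broader_terms(term: str, broader: dict[str, str]) -> list[str]:
    """Walk up the hierarchy and return all ancestors, nearest first."""
    chain: list[str] = []
    seen = {term}
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against an accidental cycle
            break
        seen.add(term)
        chain.append(term)
    return chain


def keywords_meta(term: str, broader: dict[str, str],
                  variants: dict[str, list[str]]) -> str:
    """Build a keywords META tag from a term, its variants, and its ancestors."""
    keywords = [term, *variants.get(term, []), *broader_terms(term, broader)]
    content = escape(", ".join(keywords), quote=True)
    return f'<meta name="keywords" content="{content}">'


for object_term in ("bonnetière", "secrétaire à abattant", "hydria"):
    print(keywords_meta(object_term, BROADER, VARIANTS))
```

The program only copies what the thesaurus already contains. Deciding that “pot” or “desk” belongs in the thesaurus in the first place remains the human cataloger’s judgment.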
Metadata schemas like Categories for the Description of Works of Art, MARC VIM, and the VRA Core Categories14 that are specifically designed for cataloging works of art and visual materials exist, as do a range of vocabulary tools that are appropriate for populating metadata element sets for art and material culture. As of this writing, an editorial team of members of the Visual Resources Association, with an advisory group of leading experts on cataloging works of art from the museum, library, and archival communities, is nearing completion of the first version of Cataloguing Cultural Objects: A Guide to Describing Cultural Works and Their Images, which one hopes will become an essential part of the art and image cataloger’s “desktop.”15 Still missing at many museums are an awareness of these tools and methods, and the skilled people to implement them, in order to ensure that what we have all rushed to make available on the World Wide Web can be found, identified, selected, and eventually obtained16 (or at least viewed) by our huge audience of end-users.
NOTES
1. “[D]eveloped to support the cataloging and retrieval needs of the Library of Congress Prints and Photographs Division.” See www.loc.gov/rr/print/tgm1/.
2. Available at www.getty.edu/research/conducting_research/vocabularies/aat.
3. Available at www.getty.edu/research/conducting_research/vocabularies/ulan.
4. Available at www.getty.edu/research/conducting_research/vocabularies/tgn.
5. See www.amico.org. The AMICO library is available by subscription from the Research Libraries Group.
6. A metadata element set developed by a task force that was modeled after NISTF, sponsored by the College Art Association and what was formerly known as the Getty Art History Information Program. See www.getty.edu/research/conducting_research/datastandards/cdwa/.
7. See www.hermitagemuseum.org/. All of the Web searches in this article were conducted in late August, 2003.
8. On the Web at www.bridgeman.co.uk.
9. N.B. The full ULAN record does not appear in this figure; it also includes roles, events, related persons, sources for all names, and bibliographic citations.
10. See M. Baca, ed., Introduction to Art Image Access (Los Angeles: Getty Publications, 2002), especially the essays by Sara Shatford Layne and Colum Hourihane.
11. See http://www.iconclass.nl/.
12. One of the weaknesses of the ICONCLASS system is the alphanumerical notations that it employs, which are quite forbidding and user-unfriendly. These are, I believe, a vestige of the very early, paper-based days of this system (which dates from the 1950s).
13. At www.getty.edu/search/.
14. Available at http://www.vraweb.org/vracore3.htm.
15. Available as of this writing in draft form at http://www.vraweb.org/CCOweb/.
16. These are the four generic user tasks identified by IFLA’s Functional Requirements for Bibliographic Records final report, which is available at http://www.ifla.org/VII/s13/frbr/frbr.htm.