Renate Beilharz - Australian Library and Information Association

Cataloguing is riding the waves of change
Renate Beilharz
Teacher
Library and Information Studies
Box Hill Institute
Abstract
Quality catalogue data is essential for effective resource discovery. Consistent catalogue
records, using international standards, have allowed records to be shared easily from
system to system, with copy cataloguing becoming standard in most libraries. It has been
postulated that it is no longer necessary to teach cataloguing in library technician courses
because the percentage of graduates who will be obliged to catalogue is very small.
Some library professionals appear to see cataloguing as no longer relevant because library
management systems are no longer the key source of information for users. The digital
world has transformed the way users search for information to meet their needs, with the
library catalogue being only one source of information amongst many available to
information seekers.
The semantic web is the latest development in the world wide web (WWW), allowing
computers to understand what is being sought by users and library staff, thus providing
relevant and focussed search results. For this to happen, consistent, quality metadata is
required. Cataloguing is the process of creating metadata for library resources; trained
cataloguers understand the need for quality metadata and understand the process of
creating it.
The introduction of the new cataloguing standard, Resource Description and Access (RDA),
ensures that data created for library management systems is compatible with metadata
schema of the digital world. RDA has positioned library cataloguing data firmly in the world of
the semantic web, with its modular data elements and its focus on recording relationships.
Creators of metadata, that is cataloguers, will always be required to ensure that resource
discovery systems are effective. The library industry has generations of experience in
creating consistent, authoritative cataloguing. These skills are also required by metadata
creators in the information industry. Cataloguing, as a skill, is here to stay, though maybe it
should be called metadata creation, and so more easily ride the waves of change.
Introduction
Providing effective and efficient access to physical and digital information collections is a
fundamental role of libraries and information centres. The tool used by staff and patrons to
find, select, identify and access resources is the library’s Online Public Access Catalogue
(OPAC). Underpinning all effective OPACs, and discovery tools in libraries, is standardised,
quality cataloguing data.
The collaborative nature of the library information industry has facilitated the development of
strong bibliographic networks, allowing sharing of catalogue data across library management
systems (LMS) and between institutions. The prevalence of these networks has contributed
to the decline in the number of library professionals who undertake original cataloguing.
This, in turn, has led to the suggestion that it is no longer necessary to teach cataloguing in
library technician courses.
Supporting the argument that cataloguing is not a skill required by library professionals is the
observation that OPACs are no longer the favoured tool for accessing information by many
library users. The simplicity of the Google search allows easy access to the huge amount of
information represented by the WWW. Therefore it is the preferred search tool for people
looking for information online.1
Online catalogues and OPACs have undergone considerable enhancement over the years
with the development of federated searching and discovery layers. At the same time the
WWW has not been static since its commercial development in the 1990s, moving from the
information web to the collaborative web and now on to the semantic web.2 The change in
resource discovery in the WWW and in OPACs has been significant. What hasn’t changed is
the purpose of both: to assist user access to information resources. Together they will play
an essential role in the future of information discovery, providing relevant and focussed
search results, especially through the implementation of semantic technologies.
The semantic web will allow computers to understand what is being sought by users and
library staff and provide relevant and focussed search results. For this to happen, consistent
quality metadata is required. Cataloguing is the process of creating metadata for resources;
trained cataloguers understand the need for quality metadata and understand the process of
creating it.
Cataloguing
Cataloguing in librarianship uses a number of international rules and standards. The Anglo-American Cataloguing Rules, 2nd edition (AACR2) is one of these. It has been in use since
1978 and is being replaced by RDA this year. RDA is based on principles and concepts
developed by the international library community over the past two decades, including the
Statement of International Cataloguing Principles (ICP).
The ICP was developed by the International Federation of Library Associations and
Institutions (IFLA) in 2009. The ICP states “the highest [principle] is the convenience of the
user.”3 It goes on to list the objectives and functions of a catalogue, which are to enable users to:
- Find: an item that meets their need (by an author, on a subject, with a certain title), e.g. a work on knitting.
- Identify: confirm that the item is the one they are looking for, or distinguish between two similar items, e.g. Sue Smith’s book on knitting or John Smith’s book on knitting.
- Select: check the form of an item or its suitability for a particular group, e.g. works on knitting in large print.
- Obtain: get the information needed to acquire the item, that is, to borrow, request or download a copy.
1 Hider, Philip 2012, Information resource description, Facet Publishing, London.
2 Chatfield, Tom 2011, 50 digital ideas you really need to know, Quercus, London.
3 IFLA 2009, Statement of international cataloguing principles. Retrieved 2 March 2013 from http://ifla.org/publications/statement-of-international-cataloguing-principles.
Every instruction and guideline in RDA is linked back to one or more of these objectives,
ensuring the user is the focus of all cataloguing activities.
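The four user tasks can be sketched as simple operations over catalogue records. This is a minimal illustration only, not how any library management system is implemented: the Record fields, the sample titles and the shelf locations are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A deliberately simplified catalogue record (hypothetical fields)."""
    title: str
    author: str
    subject: str
    form: str      # e.g. "regular print" or "large print"
    location: str  # invented shelf locations, standing in for access details


# Invented sample data for illustration only.
CATALOGUE = [
    Record("Knitting basics", "Sue Smith", "knitting", "regular print", "Shelf A"),
    Record("Knitting basics", "John Smith", "knitting", "large print", "Shelf B"),
    Record("Possum magic", "Mem Fox", "picture books", "regular print", "Shelf C"),
]


def find(records, subject):
    """Find: all items on a subject."""
    return [r for r in records if r.subject == subject]


def identify(records, title, author):
    """Identify: distinguish between similar items by title and author."""
    return [r for r in records if r.title == title and r.author == author]


def select(records, form):
    """Select: keep only items in a suitable form."""
    return [r for r in records if r.form == form]


def obtain(record):
    """Obtain: return the information needed to get the item."""
    return record.location
```

A user looking for a large print work on knitting would, in this sketch, call find(CATALOGUE, "knitting"), narrow the result with select(..., "large print"), and pass the remaining record to obtain. Each step works only because every record fills the same fields in the same way, which is the point about consistent, quality cataloguing data.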
In reality these principles are nothing radically new; library catalogues have been focussing
on users for years, assisting access to information. What has changed is the functionality
and capabilities of library management systems. These have evolved from card systems to
complex computerised systems with interactive search screens featuring graphics, access to
online resources and strong keyword searching capabilities. Today, libraries are integrating
the ‘catalogue search’ with other information repositories and sources, creating discovery
layers to break down the barriers between the traditional library catalogue and other
information sources. The National Library of Australia’s Trove is an example of a catalogue
search system covering a range of information sources and databases.4
RDA is the descriptive cataloguing standard designed to meet the demands of these new
systems. The Joint Steering Committee for the Development of RDA (JSC) states:
RDA - Resource Description and Access will be a new standard for resource
description and access, designed for the digital world. Built on foundations
established by the Anglo-American Cataloguing Rules (AACR), RDA will provide a
comprehensive set of guidelines and instructions on resource description and access
covering all types of content and media. RDA will enable users of library catalogues
and other systems of information organization to find, identify, select, and obtain
resources appropriate to their information needs.5
4 NLA 2013, Trove. Retrieved 14 March 2013 from http://trove.nla.gov.au.
Cataloguing, using AACR or RDA, is the process of creating descriptive metadata about
items (that is, data about data, or information about information), designed to assist with
finding and accessing resources efficiently. For a long time library cataloguers, with their
card catalogues, were the queens and kings of the metadata world. Their metadata
standards, that is, cataloguing standards, allowed and still allow easy sharing of catalogue
resources between libraries.
The library world does not have a monopoly on creating metadata for the online world. “In
the development of RDA, consideration has also been given to the metadata standards used
in other communities (archives, museums, publishers, semantic web, etc.) to attain an
effective level of alignment between those standards and RDA.”6 RDA is designed not just to
work with MARC21, the computer data format of catalogue records that most library
management systems use, but also with other metadata standards such as Dublin Core and
ONIX7. RDA is overtly breaking down the barriers between library catalogues and other
information repositories. RDA is opening up the way for library catalogues to become
integrated with the semantic web.
World Wide Web
Since 1993, when the WWW software was placed in the public domain, data management and sharing via the internet has
developed considerably. The continual development of the WWW has greatly impacted
library management systems and catalogues.
Twenty years ago, information was placed on web pages and users moved between static
web pages via Uniform Resource Locators (URLs) and hyperlinks to read the data or
information. Retrospectively, this is called Web 1.0, the information web. Web 2.0, called the
social web, became popular about 2004; it allows users to create, collaborate and
communicate over the WWW.8 Many library catalogues have added Web 2.0 functionality
such as user tagging, ratings and reviews, and the integration of data from social websites
(such as LibraryThing) into OPACs. As soon as the term Web 2.0 was coined, people
started talking about Web 3.0, looking forward to the ‘next big thing’. There have been many
definitions of Web 3.0 - the semantic web concept is one of them.
The semantic web enables computers to ‘understand’ the meaning behind the information
being shared, to disambiguate searches and provide accurate relevant search results.9 This
is achieved by adding machine-readable metadata to resources, allowing computers to
recognise how the resources are related to each other. By labelling subjects and objects in a
consistent manner, resources on the web can be linked in relationships, which allows
for accurate and relevant search results. The Resource Description Framework (RDF),
developed by the World Wide Web Consortium, and the accompanying linked data model, is
a method of digital publishing that “allows computers or people to explore the web and find
relevant and related information to things or concepts. It makes information on the web more
useful and enables data from different sources to be connected and queried.” The Wikipedia
article on linked data goes on to say that “the idea is very old and is closely related to concepts including …
controlled headings in library catalogs.”10 So it is not surprising that there is a lot of interest in
the library world in RDF and linked data. OCLC has created a short video on the
topic, Linked data for libraries, which is an excellent explanation of the concept of linked
data and its relevance to libraries.11
5 JSC for development of RDA 2007, Strategic plan for RDA. Retrieved 5 March 2013 from http://www.rdajsc.org/stratplan.html.
6 RDA toolkit 2013, 0.2 Relationship to other standards for resource description and access. Retrieved 4 March 2013, from http://access.rdatoolkit.org/.
7 RDA toolkit 2013, 0.2 Relationship to other standards for resource description and access. Retrieved 4 March 2013, from http://access.rdatoolkit.org/.
8 Chatfield, Tom 2011, 50 digital ideas you really need to know, Quercus, London.
9 For a simple explanation of the semantic web watch: Sporny, Manu 2007, Introduction to the semantic web. Retrieved 4 February 2013, from http://www.youtube.com/watch?v=OGg8A2zfWKg.
RDF breaks information down into discrete subjects and objects (including persons). Each
subject or thing is given a Uniform Resource Identifier (URI). A URI works like a URL within
the WWW. A URL takes humans to a unique web page, while a URI (in RDF) takes
computers to a unique concept (a subject or an object). In library terms, each subject heading
is given a URI so that a computer can find that term by its identifier rather than by matching the
letters and words of the heading. By using URIs, computers can disambiguate concepts.
RDF also uses URIs to define relationships between concepts, to assist with searching. These
statements are called linked data triples because each has three parts, e.g. this subject (1) has a
relationship with (2) that object (3). For example, looking at linked data triples in library
terms, a novel (subject) is written by (relationship) an author (object).
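The difference between matching on words and matching on identifiers can be sketched in a few lines. This is an illustration under stated assumptions rather than real RDF tooling: the URIs use the reserved example.org domain and are invented, as are the labels and the two helper functions.

```python
EX = "http://example.org/"  # reserved documentation domain; every URI below is invented

WRITTEN_BY = EX + "relationship/writtenBy"

# Human-readable labels are attached to URIs. Two different people share one label.
labels = {
    EX + "person/1": "J. Smith",
    EX + "person/2": "J. Smith",
    EX + "work/1": "A novel",
    EX + "work/2": "A knitting manual",
}

# Each triple is (subject, relationship, object), with every part given as a URI.
triples = [
    (EX + "work/1", WRITTEN_BY, EX + "person/1"),
    (EX + "work/2", WRITTEN_BY, EX + "person/2"),
]


def works_by_label(name):
    """Match on the text of the name: the two J. Smiths cannot be told apart."""
    return sorted(s for s, p, o in triples if p == WRITTEN_BY and labels[o] == name)


def works_by_uri(person_uri):
    """Match on the identifier: exactly one person, so no ambiguity."""
    return sorted(s for s, p, o in triples if p == WRITTEN_BY and o == person_uri)
```

Searching by the label "J. Smith" returns both works, while searching by the URI of the first person returns only the novel. Authority control in a library catalogue serves the same purpose, with the URI taking the place of the authorised heading.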
URIs and linked data triples already exist in the library world. Library of Congress subject
headings (LCSH) have URIs, as do the Schools Online Thesaurus (ScOT) headings used in
the Schools Catalogue Information Service (SCIS) database. ScOT headings are linked to
related LCSHs using linked data ontologies (language).12
10 Bizer, Christian; Heath, Tom; Berners-Lee, Tim (2009). "Linked Data - The Story So Far". International Journal on Semantic Web and Information Systems 5 (3): 1–22 in Wikipedia 2013, Linked data. Retrieved 4 March 2013 from http://en.wikipedia.org/wiki/Linked_data.
11 OCLC 2012, Linked data for libraries. Retrieved 30 February 2013, from http://www.youtube.com/watch?v=fWfEYcnk8Z8.
12 Schools Online Thesaurus 2013, Science. Retrieved 6 March 2013, from http://vocabulary.curriculum.edu.au/scot/1885.html.
RDA and FRBR
Library catalogues already capture a lot of useful bibliographic data and relationships. As is
demonstrated in the image below, subjects and objects are explicitly linked to each other in a
catalogue database by relationships such as ‘written by’, ‘illustrated by’ etc.
[Figure: Possum magic is linked by ‘written by’ to Mem Fox (born in 1946), by ‘illustrated by’ to Julie Vivas, by ‘work first published in’ to 1983, and by ‘format’ to Picture book.]
Linked data requires discrete and granular information chunks, to allow accurate searching
of data elements. Catalogues already do this to some extent. RDA extends this by moving
library data even closer to RDF and a linked data environment, clearly defining elements or
data sets so they can be shared across different applications. RDA’s element sets and
vocabularies have been included in the Open Metadata Registry, which is a set of RDF-based controlled vocabularies and a fundamental piece of technical infrastructure for the
semantic web.13
RDA is underpinned by the Functional Requirements family, a set of conceptual models
developed in the international library community over the past decade. It encompasses:
FRBR – Functional Requirements for Bibliographic Records
FRAD – Functional Requirements for Authority Data
FRSAD – Functional Requirements for Subject Authority Data
13 W3C 2011, Open metadata registry. Retrieved 3 March 2013, from http://www.w3.org/2001/sw/wiki/OpenMetadataRegistry.
FRBR has been referred to as the ‘fairy godmother of RDA.’14 RDA is strongly underpinned
by FRBR. “FRBR [is] a conceptual model for organizing bibliographic and authority
information based on the needs of the data’s users.”15 Note the link again to user needs,
following the ICP. In amongst all the tech talk, we must never lose sight of the fact that all of
this is for the convenience of the user.
FRBR is a conceptual model that lends itself to the semantic web, RDF and linked data,
because, like linked data triples, it is built from entities (subjects and objects) and
relationships. Entities in FRBR, and therefore in RDA, are clearly defined. There are 11
defined entities, including works, editions of works, people, places and concepts. These
entities have defined relationships with other entities and each entity has its own attributes.
This is best explained using Facebook as an example.
I am an entity in Facebook.
I have attributes: name is Renate, gender is female and height is 160 cm.
I have relationships with other entities in Facebook: ‘friend of’, ‘mother of’.
This library example illustrates the FRBR entities, relationships and attributes that already
exist in library catalogues, along with linked data triples.
Possum magic is a work entity.
It has a ‘written by’ relationship with Mem Fox (person entity).
It has an ‘illustrated by’ relationship with Julie Vivas (person entity).
Possum magic has attributes, including first published 1983 and picture book format.
Mem Fox has an attribute, year of birth (1946).
14 Welsh, A. & Batley, S. 2012, Practical cataloguing: AACR2, RDA and MARC 21, Facet Publishing, London.
15 Hart, Amy 2010, The RDA primer: a guide for the occasional cataloguer, Linworth, Santa Barbara, California.
[Figure: the same diagram labelled with FRBR terms. Entities: Possum magic, Mem Fox, Julie Vivas. Relationships: written by, illustrated by. Attributes: 1946 (born), 1983 (work first published), Picture book (format).]
The diagram contains linked data triples of the kind used in RDF:
Possum magic (subject) is written by (relationship) Mem Fox (object).
Possum magic (subject) is illustrated by (relationship) Julie Vivas (object).
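The step from entities, attributes and relationships to triples can be sketched as follows. This is a simplified illustration, not the FRBR or RDA data model itself: the Entity class, its field names and the relationship_triples helper are assumptions made for the example, while the data comes from the Possum magic example above.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A simplified stand-in for a FRBR entity (hypothetical structure)."""
    name: str
    kind: str                                          # e.g. "work" or "person"
    attributes: dict = field(default_factory=dict)     # attribute name -> value
    relationships: list = field(default_factory=list)  # (relationship, Entity) pairs


mem_fox = Entity("Mem Fox", "person", {"year of birth": 1946})
julie_vivas = Entity("Julie Vivas", "person")
possum_magic = Entity(
    "Possum magic",
    "work",
    {"first published": 1983, "format": "picture book"},
    [("written by", mem_fox), ("illustrated by", julie_vivas)],
)


def relationship_triples(entity):
    """Express each relationship of an entity as a (subject, relationship, object) triple."""
    return [(entity.name, rel, other.name) for rel, other in entity.relationships]
```

Calling relationship_triples(possum_magic) yields the two triples listed above. In a real linked data environment each name would be replaced by a URI, and attribute values such as the year 1983 could be recorded as triples with literal values.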
RDA “… will be a new standard for resource description and access, designed for the digital
world.”16 This could refer to the fact that AACR, the standard being replaced, was
originally written for a non-digital world, without the internet and resources such as CDs,
DVDs and MP4 files. But it means much more than that. It means that RDA has positioned
library cataloguing data firmly in the world of the semantic web, with its modular data
elements and its focus on recording relationships. It ensures that the highest principle of the
ICP, the convenience of the user, is being followed.
Linked data in libraries is here. It is not a thing of the future. Powerful metadata, clearly
defined elements and datasets and accurate cataloguing can produce some excellent tools
for the user. OCLC has some fascinating experimental search tools worth exploring such as
The Virtual International Authority File (VIAF)17 and WorldCat Identities Network.18
Embracing RDA, FRBR and linked data models will ensure that library catalogue systems
will continue to be an integral part of discovery systems, and will provide users with
accurate, relevant and focussed search results.
16 JSC for development of RDA 2007, Strategic plan for RDA. Retrieved 5 March 2013 from http://www.rdajsc.org/stratplan.html.
The video Linked data for libraries from OCLC ends with the statement:
… and because cataloguers and other librarians are already so good at creating and
maintaining data about these kind of relationships, libraries can be in the forefront of
the linked data revolution, building resources and services that help people find the
information they need from as many linked, authoritative sources as possible.19
Conclusion
Quality cataloguing and metadata are essential ingredients for the future of effective
resource discovery in the semantic web environment. It is important that all library
professionals have a sound understanding of the nature of the metadata that drives the
search and discovery systems within their library. This will enable them to assist patrons in
accessing the information resources required.
Creators of metadata, that is, cataloguers, will always be required to ensure that resource
discovery systems are effective. The library industry has generations of experience in
creating consistent, authoritative cataloguing. These skills are also required by metadata
creators in the information industry. Cataloguing, as a skill, is here to stay, though maybe it
should be called metadata creation, and so more easily ride the waves of change.
17 www.viaf.org
18 http://experimental.worldcat.org/idnetwork/
19 OCLC 2012, Linked data for libraries. Retrieved 30 March 2013, from http://www.youtube.com/watch?v=fWfEYcnk8Z8.