Herding cats: indexing British Columbia’s
political debates using controlled vocabulary
Julie McClung
Controlled vocabulary has been drawing a lot of interest for indexers, and numerous articles on the topic have
appeared in indexing publications including The Indexer. This article discusses how and why the team of indexers
at the British Columbia Legislative Assembly employ vocabulary control in their work practices to index political
debate transcripts.
Controlled vocabulary has been drawing a lot of interest for
indexers, and articles on the topic have appeared in indexing
publications including The Indexer. The ASI has an online
Taxonomies and Controlled Vocabularies Special Interest
Group (TCV SIG), which is an excellent resource for
indexers. My discussion covers how and why the team of
indexers at the British Columbia (BC) Legislative Assembly
employ vocabulary control in their work practices to index
political debate transcripts.
Controlled vocabulary
What is a controlled vocabulary? To answer that, we should
start with ‘what is natural language?’ Natural language is the
wording of a text as its author composed it. A
controlled vocabulary is a subset of that language whose
vocabulary, syntax (the way words are put together), semantics (meaning of words) and pragmatics (the way words are
used) are limited (Wellisch, 1996: 214). In that sense, any index
is a controlled subset of the natural language of its text.
The purpose of vocabulary control is to attain consistency
in describing content and to guide users to where desired
information is (Hedden, 2008: 33). According to the ASI’s
TCV SIG website, ‘a controlled vocabulary, also called an
authority file, is an authoritative list of terms to be used in
indexing.’ In this article I use the terms ‘controlled vocabulary’ and ‘authority file’ interchangeably. So in a practical
sense a controlled vocabulary means the terms (or descriptors
or subject headings) and term relationships (equivalent, hierarchical and associative) that are used to describe concepts
and collect similar information (Leise, 2008: 121–2).
Controlled vocabularies are useful tools because they
reduce the ambiguities that are inherent in language, where
there are many different ways of saying the same thing. They
address the issues of homographs (words spelled the same
but with different meanings), synonyms (different words
with identical or similar meanings) and polysemes (words
with multiple, related meanings) in this way: they ensure
that each concept in a text is described using only one
authorized term, and each authorized term in the controlled
vocabulary describes only one concept (Wikipedia, nd).
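The one-term-per-concept rule can be sketched in a few lines of Python. This is only an illustration, not the structure of any real authority file: the dictionary layout and the function name `authorized_term` are assumptions, and the sample entries borrow cross-references quoted later in this article.

```python
from typing import Optional

# Illustrative entries only: each key is a term an indexer might meet in
# the text (lower-cased); each value is the one authorized heading.
AUTHORITY_FILE = {
    "physicians": "Physicians",    # authorized term maps to itself
    "doctors": "Physicians",       # variant: Doctors, See Physicians
    "agriculture": "Agriculture",  # authorized term maps to itself
    "farming": "Agriculture",      # variant: Farming, See Agriculture
}


def authorized_term(term: str) -> Optional[str]:
    """Return the single authorized heading for a term, or None when the
    term is not covered by this illustrative authority file."""
    return AUTHORITY_FILE.get(term.strip().lower())
```

Because every variant resolves to exactly one heading, two indexers who meet ‘doctors’ and ‘physicians’ in different speeches would file both under the same entry. A term that returns `None` signals a gap the indexers would need to research.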
Typically, controlled vocabulary is applied when indexing
collective works, large projects or ongoing publications, and
when multiple indexers work on projects. Most indexers are
familiar with traditional book indexing, which can be called
a closed system of indexing for a number of reasons
(Klement, 2002: 23–31; Mulvany, 2005: 4–5). Information in
book indexes is often cross-posted or double-posted so that
users have multiple entry points when looking for topics.
Terms found in the text – the natural language used in the
text – are found in the index, as well as broader subject
headings that may be created by the indexer. As such, each
index is unique to its content.
In contrast to back-of-the-book indexing, ongoing or
collective works and periodicals must be addressed with an
open system of indexing. These indexing projects require an
approach that can anticipate additional content without
affecting the index headings or structure, and can contend
effectively with variant terminology and synonym control.
This is achieved through the use of vocabulary control, in
the form of synonym rings (synonymous terms), authority
files (which may or may not contain relationships between
terms), taxonomies or thesauri that display the complex
relationships between terms (Leise, 2008: 122).
Why BC Hansard indexers use controlled
vocabulary
A controlled vocabulary in the form of an authority file¹ is
used at the BC Legislative Assembly to guide the subject
indexing of political debates for a number of reasons. Parliamentary indexing holds many challenges due to its
open-ended nature, multiple speakers, specialized content,
workload and workflow, and end products, as noted by
several indexers of the proceedings at different Commonwealth Parliaments (Skakle and Spratt, 2000: 65; Grist, 2003:
138–9; Bilodeau, 2008: 116).
In parliamentary indexing, the source text is an ever-growing collection of documents rather than a single volume
with a distinct beginning, middle and end. The ongoing
debates in the Legislature are transcribed, edited and
published during the legislative session, much like periodicals. Political debate transcripts (Hansard²) cover discrete
sittings of the House (such as from 10.00 am to 12.00 pm),
and consist of the discussions by Members of the Legislative
Assembly (MLAs) of business such as legislation and
government ministry expenditures.
The Indexer Vol. 27 No. 2 June 2009
McClung: Herding cats
A typical session of the BC Legislature, which is roughly
one year between opening and prorogation, can generate 12
volumes of debates, each containing 6–12 issues and often
amounting to over 4,000 pages of text. Individual sitting
indexes and cumulative sessional indexes are published at
regular intervals online as the session progresses, and are
typeset and printed after the session has ended. With all this
material available, users need to be able to search across
multiple years easily; a controlled vocabulary makes this
possible by keeping subject headings consistent across many
sessional (or yearly) indexes.
In addition, the multiple MLAs (or, for the purposes of
this article, ‘authors’) whose speeches make up the source
text create diverse content and ensure a wide vocabulary
pool with many variant terms that can be challenging to
index. The debates have structure in that there is an agenda
(the ‘Routine Proceedings’) for the business to be discussed,
but within that structure there is a certain amount of leeway
for wide-ranging topics. The parliamentary proceedings do
not necessarily read like a proofread book; while there are
well-rehearsed speeches, there are also extemporaneous
speeches, and unexpected events can arise. These are living,
breathing human beings speaking in the chamber, and politicians, as you can imagine, can engage in much ‘uncontrolled
vocabulary.’ Debate topics take unexpected turns, especially
on the odd occasion when filibusters run late into the evening.
The upshot in terms of indexing political debate
transcripts is that these ‘multiple authors’ (there are 79
MLAs in total for BC, soon to be 85 with the next general
election) can come up with a breathtaking variety of
synonyms for the same subjects. A typical book publication
has one author, but the many speakers in political debate
transcripts create a much wider range of language to
contend with, as with any content produced by multiple
authors. This makes the previously mentioned problem of
homographs, synonyms and polysemes even harder to
address. Instead of a single authorial ‘voice’ to follow, there
is a chorus – and in the case of political debates, it comes
complete with hecklers.
As in any specialized area, there is a great deal of jargon,
technical terminology, acronyms and interesting program
names to wade through. Much of this cacophony comes from
the broad spectrum of opinion and political positions taken
by members of different political parties. For example, take an
exchange about a contract between two different entities:
depending on which side of the House a politician sits, this
contract might be called a scheme, an agreement, shady
business dealings or a value-added partnership. In choosing
the term ‘contract,’ the language of the index meets polarized
content in the middle, presenting unbiased and nonpartisan
terms for information retrieval from the source text. For all
of these reasons, indexers can find it difficult to analyze the highly politicized debates and still
describe content in the same way every time.
Furthermore, the workload associated with creating the
indexes to the debates requires indexers to function as a
team, but it is difficult to make indexing practices consistent
amongst individual indexers.³ As many authorities have
noted, indexers will index content in different ways (Kells
and Smith, 2005; Wellisch, 1996: 419), so it is very difficult to
have standardized indexing practices without consulting an
authority file. When indexers consult the same resource,
they are better able to describe topics they encounter in a
similar way and therefore make consistent indexing decisions. The controlled vocabulary, then, is like a map that
answers two vital questions for the indexer: ‘What do I call
this topic?’ and ‘Where do I put this topic?’
BC Hansard indexers create a number of index files that
are merged together to create larger indexes – cumulative
Subject, Members and Business Indexes – as the Legislative
session progresses. A consistent indexing approach makes
this merging and editing task much easier, as the ‘backbone’
of the index structure does not change despite the addition
of new content. Finally, indexers create a number of indexes
not just for the transcripts of debates of the Legislature, but
for the Legislative committee transcripts as well. These are
separate but related index products, and again, it benefits
users to be able to search for information across different
indexes using the same subject headings.
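To see why consistent headings make merging easier, consider the following rough sketch in Python. The per-sitting files, the page locators and the function name `merge_indexes` are all hypothetical and chosen only for illustration; the actual file formats used by BC Hansard indexers are not described here.

```python
from collections import defaultdict

# Hypothetical per-sitting index files: authorized heading -> page locators.
sitting_1 = {"Agriculture": [12, 15], "Wildlife": [20]}
sitting_2 = {"Agriculture": [48], "Physicians": [51]}


def merge_indexes(*files: dict) -> dict:
    """Combine several index files into one cumulative index.

    Entries are matched purely on the heading text, so the merge only
    collects related locators together when every file used the same
    authorized heading for the same subject.
    """
    merged = defaultdict(list)
    for index_file in files:
        for heading, pages in index_file.items():
            merged[heading].extend(pages)
    # Sort headings alphabetically and locators numerically, dropping repeats.
    return {heading: sorted(set(pages)) for heading, pages in sorted(merged.items())}


cumulative = merge_indexes(sitting_1, sitting_2)
```

Had one indexer filed the second sitting's discussion under ‘Farming’ instead of ‘Agriculture’, the merge would produce two separate entries for one subject, which an editor would then have to find and reconcile by hand.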
How BC Hansard indexers use controlled
vocabulary
The Hansard Index to debates has been available since
1970, the same year the Official Report of Debates of the
Legislative Assembly (Hansard) transcripts began to be
published in BC. The Library of Congress Subject Headings, among other resources such as federal government
publications and other documentation particular to BC’s
history and culture, were consulted in creating subject
headings and guiding terminology decisions. Over time an
alphabetical list of subject headings was compiled, which is
the foundation of the current authority file. Despite its
considerable size and scope at present – the authority file
is now a specialized resource that reflects 39 years of
subjects unique to BC politics – it is still by no means
exhaustive. To further aid indexers, the authority file
contains many scope notes which clarify information about
particular subjects.
The backbone of this authority file is made up of year-in
and year-out subjects that arise from important government
policy areas and portfolio responsibilities such as education,
energy, forestry, health care, mining, post-secondary education, social services and transportation. The authority file’s
most frequently used subject headings flow from these
major policy areas. For example, a significant amount of
discussion takes place each year about expenditures for the
responsibilities of the Health Ministry. To organize information easily for government staff and other users of the
indexes, health-related topics appear as these compound
main headings in the authority file:
Health
Health – Continuing and long-term care
Health – Health authorities
Health – Health care facilities and hospitals
Health – Health care funding
Health – Health care system
Health – Public health
Numerous cross-references are included to guide indexers
to the authorized main heading (e.g. Hospitals, See Health –
Health care facilities and hospitals), and many variant
terms direct indexers to the preferred term (e.g. Doctors,
See Physicians). The indexers therefore have a useful blueprint to follow when indexing a new session’s health-related
content, and a predictable structure that serves users well
over many years’ worth of debates.
As mentioned, the authority file contains many terms and
term relationships to aid indexers in describing subjects
consistently for index entries. Equivalence relationships
address synonyms and direct indexers to preferred terms
(e.g. Farming, See Agriculture), hierarchical relationships
reflect how information is arranged from broader subjects to
narrower subjects (e.g. the main heading Plants has
subheadings such as Invasive plants), and associative relationships show related information (e.g. the main heading
Education (K–12) is cross-referenced to the related topic of
Post-secondary education).
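The three kinds of relationship just described could be modelled along the following lines. This is a minimal sketch in Python under stated assumptions: the `Heading` class, its field names and the ‘See also’ wording for associative links are illustrative choices, not a description of how the BC Hansard authority file is actually stored.

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    """One authorized heading and its relationships (illustrative layout)."""
    name: str
    use_for: list = field(default_factory=list)   # equivalence: variants that point here
    narrower: list = field(default_factory=list)  # hierarchical: subheadings
    related: list = field(default_factory=list)   # associative: related headings


# Sample entries taken from the examples in the text above.
AUTHORITY_FILE = {
    h.name: h
    for h in [
        Heading("Agriculture", use_for=["Farming"]),
        Heading("Plants", narrower=["Invasive plants"]),
        Heading("Education (K–12)", related=["Post-secondary education"]),
    ]
}


def cross_references(authority_file: dict) -> list:
    """Render 'See' and 'See also' lines from the stored relationships."""
    lines = []
    for heading in authority_file.values():
        for variant in heading.use_for:
            lines.append(f"{variant}, See {heading.name}")
        for other in heading.related:
            lines.append(f"{heading.name}, See also {other}")
    return sorted(lines)
```

Keeping the variants on the authorized heading, rather than as free-standing entries, is one way to guarantee that each variant has only one ‘home’.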
Words and phrases are chosen carefully to avoid duplicating information in the authority file. Having more than
one ‘home’ for a distinct subject could result in indexers
putting the same concept in more than one place in the
index, which makes the organization of information less efficient and ultimately makes it harder for users to search for
all available information on a particular subject. Changes to
terms are not made without considerable research and
consultation, and more significant changes to terms and
term relationships are made between parliaments to
minimize the transition effort.
In practice, then, indexers consult the authority file to
confirm how to index subjects; it is their ‘map’ for what to
call topics and where to put them. A few examples below
illustrate how a new BC Hansard indexer would consult the
authority file.
Consider the typically BC subject of ferry boat services
that transport people and goods between Vancouver Island
and the mainland. Where is the home for this subject in the
indexes? A quick keyword search of the authority file using
the term ‘ferry’ reveals information on how the main
heading and possible subheadings of the index entry would
look:
Ferries and ferry services (main heading)
Fast ferries (subheading)
Routes and services (subheading)
Vessels (subheading)
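A keyword lookup of this kind might be sketched as follows in Python. The two-entry excerpt, the treatment of Bears and Wolves as subheadings of Wildlife, and the function name `keyword_search` are assumptions for illustration; the real authority file is far larger and its search tool is not described in this article.

```python
# Illustrative excerpt only: main heading -> subheadings.
AUTHORITY_FILE = {
    "Ferries and ferry services": ["Fast ferries", "Routes and services", "Vessels"],
    "Wildlife": ["Bears", "Wolves"],
}


def keyword_search(keyword: str) -> dict:
    """Return each main heading, with all of its subheadings, where the
    keyword occurs in the main heading or in any subheading (case ignored)."""
    needle = keyword.lower()
    hits = {}
    for main, subs in AUTHORITY_FILE.items():
        if needle in main.lower() or any(needle in sub.lower() for sub in subs):
            hits[main] = subs
    return hits


ferry_hits = keyword_search("ferry")
```

Returning the whole entry, rather than only the matching line, shows the indexer both what to call the topic and where it sits in the structure.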
For another example, take a discussion about mountain
caribou protection. A consultation of the authority file
would confirm that this subject is categorized under
Wildlife, along with other species such as Bears and Wolves.
In these ways the authority file proves to be a comprehensive resource that allows indexers of BC Hansard to
describe and organize subjects consistently as they read,
mark up and enter index records for thousands of pages of
debates each year. For the majority of subjects discussed in
debate, there is a well-established organizational structure
which helps indexers describe content consistently and, in
turn, helps users of the index retrieve information easily.
Challenges of using controlled vocabulary
Because language is constantly evolving, authority files
require ongoing maintenance; adjustments must be made to
a controlled vocabulary to reflect these changes. As Fred
Leise notes, ‘Once a controlled vocabulary has been created,
it cannot just be shelved and forgotten’ (2008: 126). Terms
may be added, removed or revised, or the relationships
changed, to keep the authority file current.
For example, emerging topics in BC political debates,
such as global warming issues, represent a new body of
information to be assimilated. There was simply no mention
of climate change in the political debates even as recently as
ten years ago, and so the authority file was lacking in that
subject area. After careful research and consultation, the
authority file now contains new subject headings Carbon
and Climate change to reflect the new content in debates.
In the same way, term evolution is a challenge: new terms
arise while others fade away into obscurity. One important
example for the BC Hansard Index is the changing terms for
the indigenous peoples of BC. The source text of the
debates often used the terms ‘Indians’ and ‘Native peoples’
in the 1970s and 1980s. In the 1990s and into the
2000s, however, most references shifted to
‘Aboriginal peoples’ and ‘First nations.’ These changes were
eventually reflected in the controlled vocabulary, which
made the transition in a new parliament from the old term
Indians to the new term First nations.
Also, what seem to be new subjects are often old subjects
in disguise; indexers are constantly sifting through synonymous terms. At present, the authority file guides indexers to
use the term ‘social equity issues’ to describe the imbalance
between the rich and the poor in addressing climate change.
More and more, however, the term ‘climate justice’ is
appearing, which is now cross-referenced to the former
term. Should ‘climate justice’ become more widely known,
perhaps in time this term will become preferred and the
authority file will be adjusted accordingly.
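Such an adjustment amounts to swapping the direction of a cross-reference. The Python below is a hypothetical sketch of that swap, assuming a simple variant-to-preferred mapping; the function name `promote_variant` and the capitalization of the headings are illustrative only, and the change itself is something the article raises as a possibility, not a decision that has been made.

```python
def promote_variant(variants: dict, old_preferred: str, new_preferred: str) -> dict:
    """Return a new variant -> preferred mapping in which new_preferred
    replaces old_preferred as the authorized term.

    The original mapping is left untouched so the change can be reviewed
    before it is adopted.
    """
    updated = {}
    for variant, preferred in variants.items():
        if variant == new_preferred:
            continue  # the promoted term is no longer a variant
        # Re-point anything that led to the old term; leave the rest alone.
        updated[variant] = new_preferred if preferred == old_preferred else preferred
    updated[old_preferred] = new_preferred  # the old term becomes a cross-reference
    return updated


current = {"Climate justice": "Social equity issues", "Farming": "Agriculture"}
proposed = promote_variant(current, "Social equity issues", "Climate justice")
```

Retaining the old term as a cross-reference means users who learned the earlier heading are still led to the right place.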
Conclusions
Using controlled vocabulary in the form of an authority file
helps the BC Hansard indexing team maintain consistent
indexing practices when handling challenging content and
workload. This standardized approach to indexing subjects
over multiple years of debate publications assists individual
indexers in carrying out their tasks, and results in easy and
efficient index searches for users, including MLAs, precinct
staff, government employees, the media and the general
public. Despite the challenges of managing such a resource,
it is an indispensable tool for BC Hansard indexers.
Notes
1 This authority file is to guide indexing practices for subjects
only. Indexers also create index records for named entities such
as people, organizations, geographies and works, but that is
outside the scope of the subject authority file discussed in this
article.
2 The transcripts of the debates are also referred to as Hansard, a
name that comes from the Hansard family, the first authorized
printers of the official records of the 19th-century British
parliamentary debates.
3 For an interesting discussion of guidelines for team indexing of
proceedings of the Canadian Senate committees, see Stephanie
Bilodeau’s recent article (2008).
References
Agee, V. (2008) Controlling our own vocabulary: a primer for
indexers working in the world of taxonomy. Key Words 16(1),
30–1.
ASI Taxonomies and Controlled Vocabularies Special Interest
Group (2007) About taxonomies and controlled vocabularies
[online] www.taxonomies-sig.org/ (accessed 26 March 2008).
Bilodeau, S. (2008) The Parliament of Canada: indexing the work
of the Senate committees. The Indexer 26(3), 114–17.
Grist, D. (2003) Indexing legislative text: Alberta Hansard. The
Indexer 23(3), 138–9.
Hedden, H. (2008) Controlled vocabularies, thesauri, and
taxonomies. The Indexer 26(1), 33–4.
Kells, K. and Smith, S. (2005) Inside indexing: the decision-making
process. Oregon: Northwest Indexing Press.
Klement, S. (2002) Open-system versus closed-system indexing: a
vital distinction. The Indexer 23(1), 23–31.
Landes, C. (2007) Workshop report: creating taxonomies
and controlled vocabularies. Key Words 15(3), 74–5.
Legislative Assembly of British Columbia (2009) British Columbia
Hansard Index [online] www.legis.gov.bc.ca/hansard/8-2.htm
(accessed 13 March 2009).
Leise, F. (2008) Controlled vocabularies: an introduction. The
Indexer 26(3), 121–6.
Leise, F., Fast, K., and Steckel, M. (2002) What is a controlled
vocabulary? Boxes and arrows: the design behind the design
[online] www.boxesandarrows.com/view/what_is_a_controlled_vocabulary_
(accessed 27 February 2009).
Mulvany, N. C. (2005) Indexing books, 2nd edn. Chicago:
University of Chicago Press.
Skakle, S. and Spratt, T. (2000) Indexing the proceedings and
publications of the Scottish Parliament. The Indexer 22(2),
65–8.
Wellisch, H.H. (1996) Indexing from A to Z, 2nd edn. New York:
H.W. Wilson.
Wikipedia (nd) Controlled vocabulary [online]
http://en.wikipedia.org/wiki/Controlled_vocabulary (accessed 27
February 2009).
Julie McClung is an experienced indexer of political debates at the
British Columbia Legislative Assembly. Her responsibilities include
leading the preparation of subject, speaker and business indexes for
House and legislative committee transcripts, publishing the final
revised index at the end of each legislative session, and responding
to requests for information on legislative proceedings. Email:
[email protected]