Herding cats: indexing British Columbia’s political debates using controlled vocabulary

Julie McClung

Controlled vocabulary has been drawing a lot of interest for indexers, and numerous articles on the topic have appeared in indexing publications including The Indexer. This article discusses how and why the team of indexers at the British Columbia Legislative Assembly employ vocabulary control in their work practices to index political debate transcripts.

Controlled vocabulary has been drawing a lot of interest for indexers, and articles on the topic have appeared in indexing publications including The Indexer. The ASI has an online Taxonomies and Controlled Vocabularies Special Interest Group (TCV SIG) which is an excellent resource for indexers. My discussion covers how and why the team of indexers at the British Columbia (BC) Legislative Assembly employ vocabulary control in their work practices to index political debate transcripts.

Controlled vocabulary

What is a controlled vocabulary? To answer that, we should start with ‘what is natural language?’ Natural language is the words of a text; the way it is composed by an author. A controlled vocabulary is a subset of that language whose vocabulary, syntax (the way words are put together), semantics (meaning of words) and pragmatics (the way words are used) are limited (Wellisch, 1996: 214). Therefore, any index really is a controlled subset of the natural language that was in the text. The purpose of vocabulary control is to attain consistency in describing content and to guide users to where desired information is (Hedden, 2008: 33). According to the ASI’s TCV SIG website, ‘a controlled vocabulary, also called an authority file, is an authoritative list of terms to be used in indexing.’ In this article I use the terms ‘controlled vocabulary’ and ‘authority file’ interchangeably.
So in a practical sense a controlled vocabulary means the terms (or descriptors or subject headings) and term relationships (equivalent, hierarchical and associative) that are used to describe concepts and collect similar information (Leise, 2008: 121–2). Controlled vocabularies are useful tools because they reduce the ambiguities that are inherent in language, where there are many different ways of saying the same thing. They address the issues of homographs (words spelled the same but with different meanings), synonyms (different words with identical or similar meanings) and polysemes (words with multiple, related meanings) in this way: they ensure that each concept in a text is described using only one authorized term, and each authorized term in the controlled vocabulary describes only one concept (Wikipedia, nd).

Typically, controlled vocabulary is applied when indexing collective works, large projects or ongoing publications, and when multiple indexers work on projects. Most indexers are familiar with traditional book indexing, which can be called a closed system of indexing for a number of reasons (Klement, 2002: 23–31; Mulvany, 2005: 4–5). Information in book indexes is often cross-posted or double-posted so that users have multiple entry points when looking for topics. Terms found in the text – the natural language used in the text – are found in the index, as well as broader subject headings that may be created by the indexer. As such, each index is unique to its content.

In contrast to back-of-the-book indexing, ongoing or collective works and periodicals must be addressed with an open system of indexing. These indexing projects require an approach that can anticipate additional content without affecting the index headings or structure, and can contend effectively with variant terminology and synonym control.
This is achieved through the use of vocabulary control, in the form of synonym rings (synonymous terms), authority files (which may or may not contain relationships between terms), taxonomies or thesauri that display the complex relationships between terms (Leise, 2008: 122).

Why BC Hansard indexers use controlled vocabulary

A controlled vocabulary in the form of an authority file (see note 1) is used at the BC Legislative Assembly to guide the subject indexing of political debates for a number of reasons. Parliamentary indexing holds many challenges due to its open-ended nature, multiple speakers, specialized content, workload and workflow, and end products, as noted by several indexers of the proceedings at different Commonwealth Parliaments (Skakle and Spratt, 2000: 65; Grist, 2003: 138–9; Bilodeau, 2008: 116).

In parliamentary indexing, the source text is an ever-growing collection of documents rather than a single volume with a distinct beginning, middle and end. The ongoing debates in the Legislature are transcribed, edited and published during the legislative session, much like periodicals. Political debate transcripts (Hansard; see note 2) cover discrete sittings of the House (such as from 10.00 am to 12.00 pm), and consist of the discussions by Members of the Legislative Assembly (MLAs) of business such as legislation and government ministry expenditures.

The Indexer Vol. 27 No. 2 June 2009 McClung: Herding cats

A typical session of the BC Legislature, which is roughly one year between opening and prorogation, can generate 12 volumes of debates, each containing 6–12 issues and often amounting to over 4,000 pages of text. Individual sitting indexes and cumulative sessional indexes are published at regular intervals online as the session progresses, and are typeset and printed after the session has ended.
With all this material available, users need to be able to search over multiple years easily, and therefore a controlled vocabulary allows for consistent subject headings across many sessional (or yearly) indexes.

In addition, the multiple MLAs (or, for the purposes of this article, ‘authors’) whose speeches make up the source text create diverse content and ensure a wide vocabulary pool with many variant terms that can be challenging to index. The debates have structure in that there is an agenda (the ‘Routine Proceedings’) for the business to be discussed, but within that structure there is a certain amount of leeway for wide-ranging topics. The parliamentary proceedings do not necessarily read like a proofread book; while there are well-rehearsed speeches, there are also extemporaneous speeches, and unexpected events can arise. These are living, breathing human beings speaking in the chamber, and politicians, as you can imagine, can engage in much ‘uncontrolled vocabulary.’ Debate topics take unexpected turns, especially the odd time when filibusters run late into the evening.

The upshot in terms of indexing political debate transcripts is that these ‘multiple authors’ (there are 79 MLAs in total for BC, soon to be 85 with the next general election) can come up with a breathtaking variety of synonyms for the same subjects. A typical book publication has one author, but the many speakers in political debate transcripts create a much wider range of language to contend with, as with any content produced by multiple authors. This makes the previously mentioned problem of homographs, synonyms and polysemes even harder to address. Instead of a single authorial ‘voice’ to follow, there is a chorus – and in the case of political debates, it comes complete with hecklers.

As in any specialized area, there is a lot of jargon and technical terminology, acronyms and interesting program names to wade through.
A lot of this cacophony comes from the broad spectrums of opinion and political positions taken by different political party members. For example, take an exchange about a contract between two different entities: depending on which side of the House a politician sits, this contract might be called a scheme, an agreement, shady business dealings or a value-added partnership. In choosing the term ‘contract,’ the language of the index meets polarized content in the middle, presenting unbiased and nonpartisan terms for information retrieval from the source text. For all of these reasons, the highly politicized debates can be difficult for indexers to analyze to ensure that they are routinely describing content in the same way.

Furthermore, the workload associated with creating the indexes to the debates requires indexers to function as a team, but it is difficult to make indexing practices consistent amongst individual indexers (see note 3). As many authorities have noted, indexers will index content in different ways (Kells and Smith, 2005; Wellisch, 1996: 419), so it is very difficult to have standardized indexing practices without consulting an authority file. When indexers consult the same resource, they are better able to describe topics they encounter in a similar way and therefore make consistent indexing decisions. The controlled vocabulary, then, is like a map that answers two vital questions for the indexer: ‘What do I call this topic?’ and ‘Where do I put this topic?’

BC Hansard indexers create a number of index files that are merged together to create larger indexes – cumulative Subject, Members and Business Indexes – as the Legislative session progresses. A consistent indexing approach makes this merging and editing task much easier, as the ‘backbone’ of the index structure does not change despite the addition of new content.
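To illustrate why shared headings make merging easier, here is a minimal sketch in Python. It is not the BC Hansard workflow or software: the sample index files, the page locators, the `VARIANTS` mapping and both function names are hypothetical, chosen only to show the idea. In practice indexers consult the authority file before entering a heading; normalizing at merge time, as done here, is a simplification.

```python
from collections import defaultdict

# Hypothetical variant-to-authorized-term mapping, loosely based on the
# 'contract' example in the text. This is NOT the real authority file.
VARIANTS = {
    "agreement": "Contract",
    "scheme": "Contract",
    "value-added partnership": "Contract",
}


def authorized(term):
    """Return the authorized heading for a term, or the term unchanged."""
    return VARIANTS.get(term.lower(), term)


def merge_indexes(*index_files):
    """Merge several {heading: [locators]} index files into one.

    Headings are normalized first, so entries for the same concept
    collect under a single heading however each indexer phrased them.
    """
    merged = defaultdict(set)
    for index in index_files:
        for heading, locators in index.items():
            merged[authorized(heading)].update(locators)
    # Sort headings case-insensitively and locators numerically.
    return {h: sorted(merged[h]) for h in sorted(merged, key=str.lower)}


# Two made-up sitting indexes prepared by different indexers.
sitting_a = {"Contract": [101, 104], "Forestry": [110]}
sitting_b = {"Scheme": [212], "agreement": [215], "Forestry": [220]}

cumulative = merge_indexes(sitting_a, sitting_b)
for heading, locators in cumulative.items():
    print(heading, locators)
```

Because every variant resolves to one authorized heading, the cumulative index gains locators under existing headings rather than sprouting near-duplicate headings that an editor would have to reconcile by hand.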
Finally, indexers create a number of indexes not just for the transcripts of debates of the Legislature, but for the Legislative committee transcripts as well. These are separate but related index products, and again, it benefits users to be able to search for information across different indexes using the same subject headings.

How BC Hansard indexers use controlled vocabulary

The Hansard Index to debates has been available since 1970, the same year the Official Report of Debates of the Legislative Assembly (Hansard) transcripts began to be published in BC. The Library of Congress Subject Headings, among other resources such as federal government publications and other documentation particular to BC’s history and culture, were consulted in creating subject headings and guiding terminology decisions. Over time an alphabetical list of subject headings was compiled, which is the foundation of the current authority file. Despite its considerable size and scope at present – the authority file is now a specialized resource that reflects 39 years of subjects unique to BC politics – it is still by no means exhaustive. To further aid indexers, the authority file contains many scope notes which clarify information about particular subjects.

The backbone of this authority file is made up of year-in and year-out subjects that arise from important government policy areas and portfolio responsibilities such as education, energy, forestry, health care, mining, post-secondary education, social services and transportation. The authority file’s most frequently used subject headings flow from these major policy areas. For example, a significant amount of discussion takes place each year about expenditures for the responsibilities of the Health Ministry.
To organize information easily for government staff and other users of the indexes, health-related topics appear as these compound main headings in the authority file:

Health
Health – Continuing and long-term care
Health – Health authorities
Health – Health care facilities and hospitals
Health – Health care funding
Health – Health care system
Health – Public health

Numerous cross-references are included to guide indexers to the authorized main heading (e.g. Hospitals, See Health – Health care facilities and hospitals), and many variant terms direct indexers to the preferred term (e.g. Doctors, See Physicians). The indexers therefore have a useful blueprint to follow when indexing a new session’s health-related content, and a predictable structure that serves users well over many years’ worth of debates.

As mentioned, the authority file contains many terms and term relationships to aid indexers in describing subjects consistently for index entries. Equivalence relationships address synonyms and direct indexers to preferred terms (e.g. Farming, See Agriculture), hierarchical relationships reflect how information is arranged from broader subjects to narrower subjects (e.g. the main heading Plants has subheadings such as Invasive plants), and associative relationships show related information (e.g. the main heading Education (K–12) is cross-referenced to the related topic of Post-secondary education).

Words and phrases are chosen carefully to avoid duplicating information in the authority file. Having more than one ‘home’ for a distinct subject could result in indexers putting the same concept in more than one place in the index, which makes the organization of information less efficient and ultimately makes it harder for users to search for all available information on a particular subject.
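The three kinds of term relationship can be pictured as a small data structure. The following Python sketch is illustrative only: the `AuthorityFile` class, its field names and its methods are assumptions made for this example, not a description of the software or file format used at BC Hansard. The sample terms are the ones quoted in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityFile:
    """A toy authority file (hypothetical structure, for illustration)."""

    # Equivalence: variant term -> preferred term ('See' references).
    see: dict = field(default_factory=dict)
    # Hierarchy: main heading -> list of subheadings (narrower subjects).
    narrower: dict = field(default_factory=dict)
    # Association: heading -> set of related headings.
    related: dict = field(default_factory=dict)

    def add_see(self, variant, preferred):
        # Variants are stored lower-cased so lookups ignore case.
        self.see[variant.lower()] = preferred

    def add_narrower(self, main, sub):
        self.narrower.setdefault(main, []).append(sub)

    def add_related(self, a, b):
        # Assumption: associative links are recorded in both directions.
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def preferred(self, term):
        """Follow a 'See' reference; other terms are returned unchanged."""
        return self.see.get(term.lower(), term)


af = AuthorityFile()
af.add_see("Farming", "Agriculture")
af.add_see("Doctors", "Physicians")
af.add_see("Hospitals", "Health – Health care facilities and hospitals")
af.add_narrower("Plants", "Invasive plants")
af.add_related("Education (K–12)", "Post-secondary education")

print(af.preferred("farming"))
print(af.narrower["Plants"])
print(af.related["Post-secondary education"])
```

Keeping a single `see` mapping from each variant to exactly one preferred term is what gives every subject one ‘home’: whichever synonym an indexer starts from, `preferred` leads to the same heading.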
Changes to terms are not made without considerable research and consultation, and any more significant changes to terms and term relationships are made between parliaments to minimize the transition effort.

In practice, then, indexers consult the authority file to confirm how to index subjects; it is their ‘map’ for what to call topics and where to put them. A few examples below illustrate how a new BC Hansard indexer would consult the authority file. Consider the typically BC subject of ferry boat services that transport people and goods between Vancouver Island and the mainland. Where is the home for this subject in the indexes? A quick keyword search of the authority file using the term ‘ferry’ reveals information on how the main heading and possible subheadings of the index entry would look:

Ferries and ferry services (main heading)
    Fast ferries (subheading)
    Routes and services (subheading)
    Vessels (subheading)

For another example, take a discussion about mountain caribou protection. A consultation of the authority file would confirm that this subject is categorized under Wildlife, along with other species such as Bears and Wolves.

In these ways the authority file proves to be a comprehensive resource that allows indexers of BC Hansard to describe and organize subjects consistently as they read, mark up and enter index records for thousands of pages of debates each year. For the majority of subjects discussed in debate, there is a well-established organizational structure which helps indexers describe content consistently and in turn, helps users of the index retrieve information easily.

Challenges of using controlled vocabulary

Because language is constantly evolving, authority files require ongoing maintenance; adjustments must be made to a controlled vocabulary to reflect these changes. As Fred Leise notes, ‘Once a controlled vocabulary has been created, it cannot just be shelved and forgotten’ (2008: 126).
Terms may be added, removed or revised, or the relationships changed, to keep the authority file current. For example, emerging topics in BC political debates, such as global warming issues, represent a new body of information to be assimilated. There was simply no mention of climate change in the political debates even as recently as ten years ago, and so the authority file was lacking in that subject area. After careful research and consultation, the authority file now contains new subject headings Carbon and Climate change to reflect the new content in debates.

In the same way, term evolution is a challenge: new terms arise while others fade away into obscurity. One important example for the BC Hansard Index is the changing terms for the indigenous peoples of BC. The source text of the debates often used the terms ‘Indians’ and ‘Native peoples’ in the 1970s and 1980s. However, in the 1990s and into the 2000s this changed to the majority of references being to ‘Aboriginal peoples’ and ‘First nations.’ These changes were eventually reflected in the controlled vocabulary, which made the transition in a new parliament from the old term Indians to the new term First nations.

Also, what seem to be new subjects are often old subjects in disguise; indexers are constantly sifting through synonymous terms. At present, the authority file guides indexers to use the term ‘social equity issues’ to describe the imbalance between the rich and the poor in addressing climate change. More and more, however, the term ‘climate justice’ is appearing, which is now cross-referenced to the former term. Should ‘climate justice’ become more widely known, perhaps in time this term will become preferred and the authority file will be adjusted accordingly.

Conclusions

Using controlled vocabulary in the form of an authority file helps the BC Hansard indexing team maintain consistent indexing practices when handling challenging content and workload.
This standardized approach to indexing subjects over multiple years of debate publications assists individual indexers in carrying out their tasks, and results in easy and efficient index searches for users, including MLAs, precinct staff, government employees, the media and the general public. Despite the challenges of managing such a resource, it is an indispensable tool for BC Hansard indexers.

Notes

1 This authority file is to guide indexing practices for subjects only. Indexers also create index records for named entities such as people, organizations, geographies and works, but that is outside the scope of the subject authority file discussed in this article.
2 The transcripts of the debates are also referred to as Hansard, a name that comes from the Hansard family, the first authorized printers of the official records of the 19th-century British parliamentary debates.
3 For an interesting discussion of guidelines for team indexing of proceedings of the Canadian Senate committees, see Stephanie Bilodeau’s recent article (2008).

References

Agee, V. (2008) Controlling our own vocabulary: a primer for indexers working in the world of taxonomy. Key Words 16(1), 30–1.
ASI Taxonomies and Controlled Vocabularies Special Interest Group (2007) About taxonomies and controlled vocabularies [online] www.taxonomies-sig.org/ (accessed 26 March 2008).
Bilodeau, S. (2008) The Parliament of Canada: indexing the work of the Senate committees. The Indexer 26(3), 114–17.
Grist, D. (2003) Indexing legislative text: Alberta Hansard. The Indexer 23(3), 138–9.
Hedden, H. (2008) Controlled vocabularies, thesauri, and taxonomies. The Indexer 26(1), 33–4.
Kells, K. and Smith, S. (2005) Inside indexing: the decision-making process. Oregon: Northwest Indexing Press.
Klement, S. (2002) Open-system versus closed-system indexing: a vital distinction. The Indexer 23(1), 23–31.
Landes, C. (2007) Workshop report: creating taxonomies and controlled vocabularies. Key Words 15(3), 74–5.
Legislative Assembly of British Columbia (2009) British Columbia Hansard Index [online] www.legis.gov.bc.ca/hansard/8-2.htm (accessed 13 March 2009).
Leise, F. (2008) Controlled vocabularies: an introduction. The Indexer 26(3), 121–6.
Leise, F., Fast, K. and Steckel, M. (2002) What is a controlled vocabulary? Boxes and arrows: the design behind the design [online] www.boxesandarrows.com/view/what_is_a_controlled_vocabulary_ (accessed 27 February 2009).
Mulvany, N. C. (2005) Indexing books, 2nd edn. Chicago: University of Chicago Press.
Skakle, S. and Spratt, T. (2000) Indexing the proceedings and publications of the Scottish Parliament. The Indexer 22(2), 65–8.
Wellisch, H. H. (1996) Indexing from A to Z, 2nd edn. New York: H.W. Wilson.
Wikipedia (nd) Controlled vocabulary [online] http://en.wikipedia.org/wiki/Controlled_vocabulary (accessed 27 February 2009).

Julie McClung is an experienced indexer of political debates at the British Columbia Legislative Assembly. Her responsibilities include leading the preparation of subject, speaker and business indexes for House and legislative committee transcripts, publishing the final revised index at the end of each legislative session, and responding to requests for information on legislative proceedings. Email: [email protected]