145 - The Indexer - The International Journal

The Subject Access Project:
A comparison with PRECIS
Carolynn E. Bett
Following a summary of the aims and
recommendations of Atherton's SAP Report is a
critique of her discussion on PRECIS, interwoven
with corrective explanations of how
PRECIS works. In the SAP Report PRECIS is
confused with a bibliographic file, with document
analysis, with indexing policy and with the British
MARC Verbal Feature Heading. PRECIS is none
of the above, but rather an indexing method
based on a system of grammar, with computer
codes used to create a printed index. As well as
clearing up errors in relation to PRECIS, the
display value of PRECIS and machine applications
to PRECIS are noted in response to
Atherton's desiderata for a new indexing system.
This paper has two purposes; firstly, to
commend to a wider readership the ideas on
subject access presented by Pauline Atherton in
Books Are For Use; Final Report of the Subject
Access Project to the Council on Library
Resources; secondly, to comment on some of the
statements about PRECIS (PREserved Context
Index System) which Atherton freely admits were
not based on careful testing, as was the rest of her
report.1
The aims of the research undertaken in the
Subject Access Project (SAP) were to find out
about:
(1) availability of suitable information in
    books to produce augmented subject
    descriptions;
(2) cost of inputting these subject descriptions
    in machine-readable form;
(3) cost of computer storage of a BOOKS
    database (MARC-like records
    augmented with subject descriptions);
(4) cost of online searching of a BOOKS
    database;
(5) benefits derived from online searching of
    BOOKS.2
The Indexer Vol. 11 No. 3 April 1979
The research was considered a necessary
preparation for adapting the conventional
catalogue to the innovations of MARC (MAchine
Readable Cataloguing), International Standard
Bibliographic Description (ISBD) and online
search services. While many machine-readable
databases are available for 'free-text searching'
of journal literature, few are available for
monograph literature. Moreover, the traditional
subject descriptions for books do not capture the
specific information contained in each book.
Atherton prepared thorough statistics on the
number of books with useful tables of contents
and/or indexes, giving interesting breakdowns by
subject. Rules were developed for selecting
subject descriptions from the table of contents
and the back of the book index to augment the
MARC record. The thoroughness of the testing
and documentation of findings is, indeed,
impressive.
Many recommendations fire the
imagination for further research into fuller
exploitation of our mechanized bibliographic
systems.
It is hoped that the recommendation for
further work on an objective method of in-depth,
efficient indexing3 will be carried out. Scanning
the table of contents and the back of the book
index, traditionally, has provided quick and
specific entry into monograph literature for
reference librarians. Why does it, then, come as a
revolutionary suggestion to adapt this approach
to online searching?
The recommendation that publishers improve
the quality of tables of contents and indexes4
should be taken as seriously as the cataloguing in
publication project has been, to bring more
quickly and directly 'every book its reader'.5
However, what do we do until publishers assume
this responsibility?
Rather than following through the recommendation
to put online 'a version of Roget's
Thesaurus',6 let us consider an online thesaurus
in which the synonyms are hidden and the
hierarchical structure revealed, as an aid to
searching. This could be achieved by having the
user-selected term matched to a number address
which holds all the synonyms. In a free-text
system, all synonyms could be posted to search
the text without the user even being aware that
this was being done. Another approach would be
to control the indexing terms according to the
thesaurus but still leave the searcher free to use
his own language, with the computer translating
and searching behind the scenes.7
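The hidden-synonym lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the concept addresses, the synonym sets and the function names `expand` and `search` are all hypothetical, and a real free-text system would match words rather than raw substrings.

```python
from typing import Dict, List, Set

# Hypothetical data: each numeric address holds all the synonyms for one concept.
CONCEPTS: Dict[int, Set[str]] = {
    101: {"teachers", "schoolteachers", "educators"},
    102: {"students", "pupils"},
}

# Reverse lookup: any user-selected term leads to its concept address.
TERM_TO_ADDRESS: Dict[str, int] = {
    term: address for address, synonyms in CONCEPTS.items() for term in synonyms
}


def expand(term: str) -> Set[str]:
    """Return the term plus every synonym stored at its address (if it has one)."""
    key = term.lower()
    address = TERM_TO_ADDRESS.get(key)
    return set(CONCEPTS[address]) if address is not None else {key}


def search(term: str, texts: List[str]) -> List[int]:
    """Post all synonyms against the texts; the user only ever supplies one term."""
    variants = expand(term)
    return [i for i, text in enumerate(texts)
            if any(v in text.lower() for v in variants)]
```

A searcher asking for 'students' would, under these assumptions, also retrieve texts that mention only 'pupils', without ever seeing the synonym list.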
PRECIS provides a system for controlling
indexing terms while leaving the vocabulary
open-ended to accommodate new concepts. On
page 4, Atherton refers to PRECIS as a 'database
of the BNB (British National Bibliography)'.
This is technically incorrect and leads to a great
deal of confusion, evident in later discussions of
PRECIS in the SAP. PRECIS was developed by
the BNB, but PRECIS is not a database. Rather,
it is an indexing method based on a system of
grammar, approximating natural language
grammar, which throws new light on to the
structure of natural language. However, PRECIS
is much more rigid than most natural languages.
Its grammar verbalizes concepts so that their
context can be expressed:
e.g.
TEACHERS. Secondary Schools
    Attitudes to students
PRECIS may be applied to any type or form of
materials: books, research, A-V materials, artifacts,
etc. There are two PRECIS files: the
subject file which shows all the subject strings
(i.e., sentences) together with the PRECIS
reference indicator numbers; and the reference
file, which shows thesaural relationships between
terms. The bibliographic file at BNB is a MARC
file, although not all bibliographical files indexed
by PRECIS are in the MARC format. In the
British MARC format there are three fields
pertaining directly to PRECIS: 690, 691 and 692,
with a fourth field 083 that is derived from the
PRECIS string. Field 690 contains the complete
PRECIS string(s) with codes; field 691 contains
the Subject Indicator Number(s) (SINs); and field
692 contains the Reference Indicator Number(s)
(RINs). Field 083 contains the Verbal Feature
Heading, which is derived from the PRECIS
string and used only in the classified catalogue of
the BNB. Field 690 is held in CAN MARC 680
and is used for international sharing.8 Not all of
us in Canada who are indexing with the PRECIS
method are using the MARC format for our
bibliographic data; however, we must all create a
RIN file and SIN file (number files) which are
matched to the PRECIS files to print subject
indexes complete with cross references. Usually
the indexes created by PRECIS programs are
two-stage, with the index pointing to a classified
catalogue, as in BNB, or pointing to a listing by
document number, as in the Ontario Educational
Research Information System (ONTERIS);
however, a one-stage computer-produced book
and microfiche index using short bibliographic
citations has been produced for Aurora High
School, Ontario.9
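As a rough illustration of how the number files might be matched to the PRECIS files when an index is printed, consider the sketch below. Everything in it is an assumption made for the example: the SIN and RIN values, the cross-reference, the plain-text form of the string (real strings carry computer codes) and the names `Document` and `index_input` are hypothetical and do not reproduce the actual PRECIS programs.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Document:
    doc_id: str
    sins: List[str]  # Subject Indicator Numbers (field 691 in a British MARC record)


# Hypothetical subject file: SIN -> (subject string, RINs attached to that string).
SUBJECT_FILE: Dict[str, Tuple[str, List[str]]] = {
    "S1": ("Secondary schools. Teachers. Attitudes to students", ["R1"]),
}

# Hypothetical reference file: RIN -> (referred-from term, referred-to term).
REFERENCE_FILE: Dict[str, Tuple[str, str]] = {
    "R1": ("Educational personnel", "Teachers"),
}


def index_input(docs: List[Document]):
    """Collect the strings and cross references needed to print a subject index."""
    strings: List[Tuple[str, str]] = []
    references: Set[Tuple[str, str]] = set()
    for doc in docs:
        for sin in doc.sins:
            string, rins = SUBJECT_FILE[sin]
            strings.append((string, doc.doc_id))
            for rin in rins:
                references.add(REFERENCE_FILE[rin])
    return strings, sorted(references)
```

The point of the design is that a bibliographic record need only carry numbers; the strings and the thesaural relationships are held once, in the two PRECIS files.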
A strict grammar can be applied only when the
indexer has acquired a notion of what the book is
about. This process may be likened to rough
notes taken in a lecture, which are later
developed into a publishable paper. Experienced
PRECIS indexers often go from the rough notes
stage to the finished grammatical stage in the
twinkling of an eye and write down only the
finished strings (sentences). The point is that
Atherton deals with methods of arriving at the
rough notes, otherwise known as document
analysis, whereas PRECIS deals with expressing
the analysis in such a manner that the computer
can print meaningful subject indexes.
Those on the PRECIS team often mention that
document analysis is the most fundamental
problem, but as yet international guidelines for
objective establishment of the subject of a
document, although in progress, have not been
published. It is to be hoped the results of
Atherton's Subject Access Project will
profoundly influence the development of such
guidelines.
In the section of the SAP report on comparison
of the BOOKS system with PRECIS, PRECIS
does not fare very well. Let me stress that
Atherton has acknowledged the cursory treatment
given PRECIS and that those of us involved
with PRECIS appreciate her recognition that
PRECIS is to be acknowledged, studied and
come to terms with, although the Library of
Congress system is of consuming interest to
North Americans.
Atherton recognizes that in BOOKS, the
database built for SAP, the subject headings
derived from the table of contents and back of
the book indexes are like content notes.10 She has
an indexing policy of providing up to 30 entries
per book. The BNB indexing policy approximates
that of the Library of Congress, namely, indexing
is designed to summarize the subject of the book.
The BNB, however, will index to a maximum of
four strings, giving, in the permutations of the
strings, many more access points than L.C.
The indexing policy for the Aurora PRECIS
Project more nearly approximates Atherton's in
its analytical approach to information contained
between two covers. In the Aurora Project, the
strings tend to be concise (i.e., have only a
few terms compared with ONTERIS strings);
therefore, it may take several strings to express
the multi-concepts contained in each document.
The indexing policy is guided by the needs of
students and teachers as related to the
curriculum. At ONTERIS, our policy and
materials lead to the writing of few, but rather
long, complex strings giving, like Aurora, many
access points for each document.
In examples 6, 7, 8 & 9, Atherton reprints both
the PRECIS entry and the string, thereby making
PRECIS appear immensely complex with its
crush of $s, Zs and 000s. The codes are, in fact,
the indexer's instructions to the computer. The
user is never exposed to these private conversations
in the printed index, nor are they of
any interest to one wanting to grasp how PRECIS
works. 'Shunting' of the strings to show the
various access points would have been more
informative.
A PRECIS entry is designed to show context
dependency in two directions and is therefore
designed in a two-line format. The Lead, or entry
term into the index, is in the top left position.
Moving towards the right on the top line one
reads increasingly broader terms called the
Qualifier. The narrower context on the line below
is the Display. A simple example of this format
is:
CATHEDRALS. Canada
    Architectural features
When the broadest term is in the lead, there is no
qualifier. Thus, the shunted entries from figure
V.6 on page 74 are as follows:
GREAT BRITAIN
    Urban regions. Social planning—Conference proceedings
URBAN REGIONS. Great Britain
    Social planning—Conference proceedings
SOCIAL PLANNING. Urban regions. Great Britain
    —Conference proceedings
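The shunting shown in these entries can be sketched in a few lines. This is a deliberately simplified illustration, not the PRECIS programs themselves: it assumes the string is already a plain list of terms running from broadest to narrowest, that every term is to appear as a Lead, and that a form term such as 'Conference proceedings' is simply appended to the Display. The names `Entry`, `shunt` and `render` are hypothetical, and the real system relies on the indexer's codes to handle many more cases.

```python
from typing import List, NamedTuple, Optional

FORM_SEP = "\u2014"  # the dash that introduces a form term in the printed entry


class Entry(NamedTuple):
    lead: str       # entry term, top left
    qualifier: str  # broader context, read to the right of the lead
    display: str    # narrower context, on the line below


def shunt(terms: List[str], form: Optional[str] = None) -> List[Entry]:
    """Make one entry per term, moving each term in turn into the lead position."""
    entries = []
    for i, term in enumerate(terms):
        # Broader terms follow the lead, nearest context first.
        qualifier = ". ".join(reversed(terms[:i]))
        # Narrower terms stay in their original order on the second line.
        display = ". ".join(terms[i + 1:])
        if form:
            display += FORM_SEP + form
        entries.append(Entry(term.upper(), qualifier, display))
    return entries


def render(entry: Entry) -> str:
    """Lay an entry out in the two-line format."""
    top = entry.lead + (". " + entry.qualifier if entry.qualifier else "")
    return top + "\n    " + entry.display if entry.display else top
```

Calling `shunt(["Great Britain", "Urban regions", "Social planning"], form="Conference proceedings")` yields the three entries listed above; when the broadest term is in the lead the qualifier is empty, as the text notes.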
Figure V.2 would have given an interesting
example of shunting. However, the entries as they
appear in Atherton's report are not written in the
correct PRECIS format; rather, they have been
extracted from the BNB classified catalogue
verbal feature heading (field 083 of the British
MARC format), which is explained beginning
p. 399 of the PRECIS manual by Derek Austin.
Field 083 is never shunted; however, the shunted
entries for the subject were found in the 1971
PRECIS index. PRECIS I was superseded by
PRECIS II in 1974 with drastic changes being
introduced at that time. While it might not have
been possible for Atherton to have checked the
latest edition of the SIN file to extract examples
for her publication released in February 1978,
examples should have been taken from
PRECIS II indexes, as was Figure V.1.
Similar errors occur in Figures 3, 5 and 6. It
is unfortunate that Atherton accepted her information
from a misinformed colleague,
without verifying it herself.
PRECIS was developed initially in order to
enlist the aid of the computer in printing subject
indexes. In a sense, it is unfair to compare
PRECIS, a contextual system, with a system that
was designed specifically for computer searching
using Boolean logic with single-concept terms.
However, computer searching is the mode we are
now working with and it is useful to see how the
various systems can be adapted for online use.
Ann Schabas is comparing PRECIS strings with
Library of Congress Subject Headings (LCSH),
and published results thus far indicate that
PRECIS gives better performance than
LCSH.11 The online mode gives the searcher
flexibility to alter the search strategy as the search
progresses; therefore, it seems likely that
PRECIS would give even better performance
online than in the batch mode used by Schabas.
At ONTERIS, searchers, using their own
terms, with a thesaurus, can ask for a display of
the PRECIS entries in which their terms appear.
Upon seeing the terms in context of the PRECIS
entry, the relevant abstracts may be chosen for
display and printing. However, because of the
small size of the ONTERIS file and the detail of
the sorting, usually one string points to only one
document. This is definitely not true of the BNB
file, where one string has been used to index many
documents. In the larger file, a selection stage
showing the search terms in the context of
PRECIS strings should prevent many false drops.
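A selection stage of this kind might look roughly like the sketch below. The postings, document numbers and function names are hypothetical, and the string 'Great Britain. Family planning' is invented purely to show how a false drop could arise; nothing here reproduces the ONTERIS or BNB programs.

```python
from typing import Dict, List

# Hypothetical postings: subject string -> documents indexed by that string.
POSTINGS: Dict[str, List[str]] = {
    "Great Britain. Urban regions. Social planning": ["B1", "B2"],
    "Great Britain. Family planning": ["B3"],
    "Canada. Cathedrals. Architectural features": ["B4"],
}


def strings_in_context(term: str, postings: Dict[str, List[str]]) -> List[str]:
    """Return every subject string containing the term, for the searcher to scan."""
    wanted = term.lower()
    return sorted(s for s in postings if wanted in s.lower())


def documents_for(chosen: List[str], postings: Dict[str, List[str]]) -> List[str]:
    """Return the documents behind the strings the searcher judged relevant."""
    found: List[str] = []
    for string in chosen:
        for doc_id in postings.get(string, []):
            if doc_id not in found:
                found.append(doc_id)
    return found
```

A search on 'planning' would show both planning strings; a searcher interested in urban regions could then discard the family planning string before any documents are displayed.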
Atherton is concerned with the online display
of subject terms.12 In considering the example on
page 69, the PRECIS entry is more informative
and less dense to scan than the index terms (IT).
The display value of PRECIS might well be
considered, particularly since BOOKS weights
terms in relation to however many pages in a
book a term covers.14 PRECIS, however, gives
weighting to terms through a precise and
meaningful grammatical relationship. The
weighting is implicit in the structure and does not
have to be noted separately.
In Figure V.2, the verbal feature heading,
derived from PRECIS, is closer to natural
language and is, therefore, easier to read and
absorb. The extra information in the ITs could
have been captured in extra PRECIS strings, had
the indexing policy so demanded. In Figure V.9,
it is claimed that the same classification number
was searched in PRECIS as in BOOKS. However,
the BOOKS number is PN1998 A2 P48, being
interpreted:
Motion pictures
Biography
Collective
(Main entry Phillips)
Whereas, the BNB number is PN1998.A3C8,
being interpreted:
Motion pictures
Biography
Individual
(Main entry Cukor)13
Needless to say, a collective biography turned up
rather more name entries than a biography of a
single person. Doubtless, this slip was part of the
cursory treatment given PRECIS; however, an
error like this might cause someone to dismiss
PRECIS unjustly.
PRECIS needs to be compared with BOOKS in
terms of cost and time of input and retrieval, cost
of storage, time and effort needed to train indexers, and relevance of output. However, in the
comparison, the indexing policies should be
comparable and the fullest power of the computer
should be utilized.
NOTES
1 Atherton, Pauline. Books are for use; final report of
the Subject Access Project to the Council on Library
Resources. Syracuse, N.Y.: Syracuse University School
of Information Studies, February 1978. Research
Studies No. 4. p. 68.
2 Ibid., 3rd prelim. page.
3 Ibid., p. 87.
4 Ibid., p. 87.
5 Ibid., p. 129.
6 Ibid., p. 86.
7 Austin, Derek. PRECIS: A Manual of Concept
Analysis and Subject Indexing. Council of the British
National Bibliography, Ltd., 1974, p. 412.
8 Canadian MARC Communication Format:
Monographs. 2d ed. Ottawa: National Library of
Canada, 1974, Appendix H, p. 14.
9 Aurora High School Library. PRECIS Subject
Index Catalogue. Toronto: University of Toronto
Library Automation System, 1977.
10 Atherton, p. 83.
11 Schabas, Ann. Machine Searching of UK MARC.
In Wellisch, Hans. Proceedings of the International
PRECIS Workshop, University of Maryland, 1976,
p. 149-153.
12 Atherton, p. 86.
13 Library of Congress Classification Schedule.
14 Atherton, p. 21.
Ms. Bett is developing an online thesaurus as a
search tool for the databases at ONTERIS
(Ontario Educational Research Information
System), a research project funded by the Ontario
Ministry of Education. ONTERIS produces a
printed PRECIS index to volumes of detailed
abstracts and also provides for online displays of
PRECIS strings when the databases are queried
by Boolean logic according to the conventions of
ISIS programs.
*
The Society offers its congratulations to
Derek Austin, to whom the American Library
Association has awarded its Margaret Mann
Citation for 1978 'in recognition of his significant
contribution to the establishment of a new
direction in subject analysis through the
development of the PREserved Context Indexing
System, his persistence in refining and perfecting
its techniques, and his skill in defining, interpreting
and sharing its concepts'.
A progress report on the PRECIS Translingual Project describes the switching
procedure from the source language lexicon
through a pivot file, which consists of coded
addresses of all equivalents for a given concept,
to the target language lexicon. Work is nearing
completion on the translation of prepositions, the
addition of articles and the provision of inflexions
where appropriate.1
Nevertheless, the Library of Congress reports
that it has studied the feasibility of adding
PRECIS strings to its MARC cataloguing data
and has concluded: 'In view of the fact that the
addition of PRECIS strings to all current
cataloguing would cost approximately $1,000,000
per year and that there has been no demand to do
this, the Library of Congress will not seek money
from Congress or from any other source to
maintain two subject heading/indexing
systems.'2
1 British Library Bibliographic Services Division
Newsletter (10), August 1978, 2-3.
2 Library of Congress information bulletin 37 (9), 3rd
March 1978, 154.