ALIS 31(3-4) 77-92

Annals of Library Science and Documentation
1984, 31(3-4), 77-92
INDEX LANGUAGE DEVICES IN VARIOUS
THESAURI: A COMPARATIVE STUDY*
CHARLES SPURGEN
Indian Petrochemicals Corp. Ltd.
Vadodara, Gujarat.
Presents a checklist to compare and contrast different thesauri, such as Thesaurofacet (TF), Thesaurus of Engineering and Scientific Terms (TEST), Engineering Index Thesaurus (EIT), INSPEC Thesaurus (INSPEC), Root Thesaurus (RT) and Energy Information Data Base Subject Thesaurus (EIDBST). The concepts "recall" and "precision" are explained and the different recall and precision devices used in the above thesauri are examined. Recall devices examined include synonym control, term linkage, control of word forms, etc. Precision devices, such as coordination, role indicators, etc., are also examined. Suggests also the need to design and develop a thesaurus in the newly emerging discipline called Appropriate Technology (AT), incorporating the various recall and precision devices.
INTRODUCTION
Index language of a retrieval system exerts a considerable influence on its performance. The requirements that seem to be of most importance in a controlled vocabulary for information retrieval are: (i) it should have "warrant" derived from the terminology of the literature and the information needs of the actual or potential users; (ii) it must be sufficiently specific to allow the conduct of the great majority of searches at an acceptable level of precision; (iii) it must be sufficiently pre-coordinated to avoid most problems of false coordination and incorrect term relationships; (iv) it should promote consistency in indexing and searching by the control of synonyms, near synonyms and quasi-synonyms; (v) it should reduce terminological ambiguity through the separation of homographs and through the definition of terms whose meaning or scope would otherwise be unclear; and (vi) it should assist the indexer and searcher in the selection of the most appropriate terms needed to represent a particular subject through its hierarchical and cross-reference structure.

*Paper presented at the DRTC Annual Seminar (20) (1983), 21-2[illegible] February 1983.
Vol 31 Nos 3-4 Sept-Dec 1984
SCOPE OF THE PAPER
Any index language can be dissected to display its various devices. The index language devices used in different thesauri will improve the retrieval performance of an information retrieval system. The various recall and precision devices used in different thesauri are examined with suitable examples. The use of the recall and precision devices is studied in different thesauri, independent of any particular information retrieval system. The index languages or thesauri examined to find their devices and how they work include Thesaurus of Engineering and Scientific Terms (TEST), Thesaurofacet (TF), Engineering Index Thesaurus (EIT), INSPEC Thesaurus (INSPEC), Root Thesaurus (RT) and Energy Information Data Base Subject Thesaurus (EIDBST). It should be noted that no attempt is made to analyse the shortcomings of the thesauri or to comment in depth on their structure and methods of compilation. Further, this study gives a kind of clue of how to go about designing and developing a suitable index language or vocabulary control in a newly emerging discipline like Appropriate Technology (AT), for which a great literary warrant exists and enormous growth, output and use of literature are envisaged. It is suggested to design and develop a thesaurus incorporating the various recall and precision devices to ensure the retrieval performance of an AT information system.
CRITERIA FOR COMPARING VARIOUS THESAURI

For comparing different thesauri the following questions may be asked. These questions may be taken as the criteria for comparison.

1. What is the basic form of vocabulary - alphabetic or systematic, and if the latter, sequential (as in a classification schedule) or diagrammatic?
2. By what method is an individual term located in the scheme - by following an alphabetical sequence, or by scanning the whole structure?
3. How many terms are there in the thesaurus?
4. How specific are the terms? This is always a relative question - specifically relative to other terminologies, or to normally encountered text or search terms.
5. Does the terminology include compound terms, phrases of two or more words, and if so, are there rules concerning their admission?
6. To what extent are word forms (singular and plural, words with the same root) confounded or kept separate? Are there rules governing this?
7. How are homographs of different meaning treated?
8. Is the use of some terms limited by scope notes or definitions?
9. To what extent are synonyms and near-synonyms confounded?
10. If synonyms and near-synonyms are barred, are they listed in the terminology as lead-in words? How many such lead-in words are there? What percentage of the total terminology are lead-in words?
11. Are links made between a general term and those specific to it? If so, what is the average number of links in a hierarchy? How many terms on an average are linked into a single hierarchy? To what extent do terms form part of more than one hierarchy?
12. Are links made between terms related in ways other than genus to species? What other relations are included?
13. How are links between terms displayed? What is the average number of links per term?
14. If coding is used, what purposes does it serve?

A statement showing the comparison of various thesauri is given in Annexure I at the end.

RECALL AND PRECISION

The term recall refers to a measure of whether or not a particular item is retrieved, or the extent to which the retrieval of wanted items occurs. The measure of the completeness of a search in a data base is frequently referred to as a recall ratio, and the statement 80% recall implies that 8/10 of the relevant documents in the data base were found.

The term precision refers to a measure of signal-to-noise ratio in certain kinds of information systems. 20% precision implies that out of the 50 documents retrieved, only 10 are relevant.

These two measures, recall ratio and precision ratio, are illustrated by means of the table given below.

Literature Search Results

                 Relevant       Not relevant             Total
Retrieved        a (Hits)       b (Noise)                a + b
Not retrieved    c (Misses)     d (Correctly rejected)   c + d
Total            a + c          b + d                    a + b + c + d (Total collection)
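The two figures quoted above (80% recall, 20% precision) can be checked against the cells of the Literature Search Results table. The following is a minimal sketch, assuming the paper's own definitions (recall = a / (a + c), precision = a / (a + b), both multiplied by 100); the class name `SearchOutcome` and the sample counts other than those quoted in the text are illustrative assumptions, not part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchOutcome:
    """Cells of the 2 x 2 Literature Search Results table (hypothetical helper)."""
    a: int  # relevant and retrieved (hits)
    b: int  # not relevant but retrieved (noise)
    c: int  # relevant but not retrieved (misses)
    d: int  # not relevant and not retrieved (correctly rejected)

    def recall_ratio(self) -> float:
        """Percentage of all relevant documents that were retrieved."""
        relevant = self.a + self.c
        if relevant == 0:
            raise ValueError("no relevant documents in the collection")
        return 100.0 * self.a / relevant

    def precision_ratio(self) -> float:
        """Percentage of retrieved documents that are relevant."""
        retrieved = self.a + self.b
        if retrieved == 0:
            raise ValueError("no documents were retrieved")
        return 100.0 * self.a / retrieved


# 8 of the 10 relevant documents found: the "80% recall" case in the text.
recall_case = SearchOutcome(a=8, b=0, c=2, d=90)
# 50 documents retrieved, only 10 relevant: the "20% precision" case in the text.
precision_case = SearchOutcome(a=10, b=40, c=0, d=50)

print(recall_case.recall_ratio())        # 80.0
print(precision_case.precision_ratio())  # 20.0
```

The values of `d`, and the cells not mentioned in the text, are arbitrary fillers; neither ratio depends on `d`.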
Recall relates to the ability of the system to retrieve relevant documents and precision relates to its ability not to retrieve irrelevant documents.

The recall ratio is defined as,

\[ \text{Recall ratio} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of relevant documents in the collection}} \times 100 = \frac{a}{a+c} \times 100 \]

The precision ratio is defined as,

\[ \text{Precision ratio} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of documents retrieved}} \times 100 = \frac{a}{a+b} \times 100 \]

The precision ratio and recall ratio, used jointly, express the filtering capacity of the system - its ability to let through what is wanted and to hold back what is not. Recall and precision tend to be related inversely.

Recall and Precision Devices

An index language includes the following three components: 1) a vocabulary (i.e. a set of terms), 2) a syntax (i.e. grammatical structure), 3) rules for use of the language and for controlling changes in it.

A complete index language may include certain devices which can be used with the actual terms themselves to achieve various results. These devices may be referred to as index-language devices. They may be divided into two groups:

1. Devices that group terms together into classes of one type or other. Such devices will reduce the size of the specifier vocabulary and will allow improvements in recall. These are called recall devices.
2. Devices that, when used in association with terms, increase the shades of meaning. These devices increase the number of specifiers in the vocabulary and allow improvements in precision. These are called precision devices.

Recall devices tend to increase the size of retrieved document sets while precision devices tend to reduce them.

The most common recall devices are:
Synonym control
Term linkage
Control of word forms
Clustering

The different precision devices include:
Coordination
Links
Role indicators
Weighting

Each of these devices is discussed below with suitable examples from various thesauri. Before examining the devices in various thesauri, it is better to note the scope and structure of each thesaurus.

SCOPE AND STRUCTURE OF VARIOUS THESAURI

Thesaurus of Engineering and Scientific Terms (TEST)

The Engineers Joint Council (EJC) TEST is a list of engineering and related scientific terms and their relationships for use as a vocabulary reference in indexing and retrieving technical information. To assist the user in determining appropriate descriptors from several approaches the terminology is organised in four divisions. The principal division is the Thesaurus of Terms. The others serve as indexes to it: Permuted Index, Subject Category Index and Hierarchical Index.

Thesaurus of Terms

The terms appear in alphabetical order (letter-by-letter). Each descriptor is shown with its hierarchical structuring, other cross-references and scope notes. All cross-references are reciprocal. Terms are given in noun and plural form where possible and the direct form of entry is used. Terms that have more than one accepted meaning are accompanied by an explanation.

Permuted Index

It is organised as follows: An entry is made for every term under every significant word.
The KWOC format is used. It is a display of all terms in the thesaurus permuted according to each significant word in the single and multiword terms, arranged in alphabetical order on the permuted words. Index words that are themselves descriptors appear in bold face italics, otherwise in bold face. Terms that are USE references rather than descriptors are preceded by a centered dot (·). This index tends to group generically related terms.

Subject Category Index
This index is a two-level arrangement consisting of 22 major subject fields. Each subject
field is further subdivided into groups. The
subject fields, the group within a subject field
and the descriptors within a group are arranged
in alphabetical order. Groups have a four-digit
notation and in the thesaurus
part
the notation(s) of the appropriate group(s) are given
for each descriptor.
Hierarchical Index

It displays descriptor families based on the BT-NT references as shown in the Thesaurus of Terms. Only descriptors that, in the Thesaurus of Terms, have no Broader Terms and
two or more levels of Narrower Terms are
selected as main entries. If a descriptor belongs
in two or more separate descriptor families
it will appear in each at its proper hierarchical
level. Each descriptor family is alphabetically
located.
Thesaurofacet
Thesaurofacet, compiled by the English Electric
Company, ranges over the whole field of science
and technology
but covers subject areas in
varying depths. Control engineering, computers,
measurement
and testing, physics and management science are covered at a greater detail.
In Thesaurofacet, the thesaurus replaces the alphabetical subject index. The terms in the system appear twice, once in the thesaurus and once in the schedules, the link between the two locations for the term being the notation or class number. The information given about the term in the thesaurus is additional to that given in the classification schedules. In the same way, the information about a term in the classification is additional to that in the thesaurus. The two parts of the system are complementary, and if used separately, incomplete.
Thesaurus (Main part)
An entry is made for every term, including
some spelling variants and in some cases, inverted forms. The entries are arranged in alphabetical order. USE references are employed to
control synonyms and to provide entries for
specific terms that are not used for indexing
or searching. The UF reciprocals also appear.
Against each descriptor in the thesaurus
is
placed its class number. When we go to this
number
in the classification
schedules,
the
faceted display reveals the complete hierarchy
of broader
terms and narrower terms. The
thesaurus
section of the thesaurofacet
also
contains some RT's and some BT's, but does
not duplicate
any feature of the classified
section. The related terms shown in the thesaurus are not terms related hierarchically.
The more important
of these relationships
are indicated in Fig. 1.
The thesaurus does not display the same
BT relationship as shown in the faceted structure. A particular concept may quite properly
belong in several hierarchies. The faceted display
shows only the principal hierarchy. Others are
revealed in the thesaurus. In the cases of Television Camera Tubes the additional hierarchy,
indicated by the abbreviation
BT(A) is the
hierarchy of television apparatus.
The thesaurus similarly lists additional narrower terms indicated by NT(A). For example, the term JETS appears in the faceted schedules as follows:

CWJ  Jets
CWK  Jet streams
CWL  Plumes
CWM  Wall jets
CWO  Couette flow
CWP  Jet mixing
CWQ  Propulsive jets
The jets listed here are all fluid dynamic
jets. But there are other types of jets also.
These additional hierarchical linkages are shown as NT(A)'s under Jets in the thesaurus:

Jets
NT(A)  Jets (hovercraft)
       Plasmajets

The classification scheme and the thesaurus together reveal the "multiple hierarchical linkage of terms". In the Thesaurofacet, the faceted classification replaces the usual hierarchical structure built into a thesaurus by means of BT-NT references. Similarly the thesaurus replaces the alphabetical subject index that would normally accompany the schedules in a conventional faceted classification. Thesaurofacet uses the following symbols for descriptors and for other preferred terms that are expressed by a combination of descriptors: UF, NT(A), BT(A), RT, S BT, S BT(A), S RT. S indicates a descriptor to be used in a combination.

Schedules (classified index)

In the schedules all descriptors and a number of preferred terms that are expressed by a combination of descriptors are displayed. For some descriptors, narrower or related descriptors not to be seen from the linear arrangement are given in the schedules (as a rule such cross-references are given only in the part labelled thesaurus). These cross-references are all designated by '*' and no distinction between Narrower Term and Related Term is made. A few scope notes are also given in the schedules. The notation is ordinal using both numeric and alphabetic characters as the base.

Engineering Index Thesaurus (EIT)

The primary focus of this thesaurus is Plastics and Electrical/Electronics Engineering. Terms are displayed in sets, each consisting of a Main Term (MT) followed by indented Cross Reference Terms (CRTs) related to the Main Term in various ways. Term-sets are displayed in the alphabetical order of their Main Terms. Each Cross Reference Term has a code which identifies the relationship existing within the term-set between the CRT and the MT. The basic structure of the thesaurus is hierarchical. Terms of both broader and narrower meaning are provided within the term-set, supplemented by related descriptors.

The distinguishing characteristics of the EIT are:

a) Each term is a concept, not necessarily a single word.

b) The basic term-set consists of a hierarchical arrangement of cross reference terms in the following sequence:

Broader Term (BT): always includes the MT concept.
Narrower Term (NT): always included within the MT concept.
Related Term (RT):
1. closely related to the MT concept, but less rigorously than NT or BT;
2. an RT may be from a different (but conceptually related) hierarchical structure;
3. it may show a class of use applicable to the MT;
4. it may bear a whole-part relationship to the MT;
5. it may be a near synonym.
Used for: a term which is forbidden for use in indexing.
Use: a specified descriptor must be substituted for a forbidden MT.

c) All the above relationships are displayed reciprocally within the thesaurus, e.g.

BASIC TERM-SET                 RECIPROCALS DERIVED FROM BASIC TERM-SET

(MT) Aberrations/Lenses
  USED FOR Astigmatism         (MT) Astigmatism
                                 USE ABERRATIONS/LENSES
  BT Physical properties       (MT) Physical properties
                                 NT Aberrations/Lenses
  NT Spherical Aberration      (MT) Spherical Aberration
                                 BT Aberrations/Lenses
  RT Lenses                    (MT) Lenses
                                 RT Aberrations/Lenses

d) Scope notes are provided where needed for clarification or for special instructions.
INSPEC Thesaurus

INSPEC Thesaurus of terms for physics, electrotechnology, computers and control can be used as a controlled language for either indexing or searching. The thesaurus has two parts:

a) Alphabetical display of terms

This is the main part of the thesaurus, listing in a single alphabetical order all terms, both preferred and lead-in. Bold type is used to indicate preferred terms. Cross-reference or lead-in terms are shown in medium type.

b) Hierarchical display of terms

This collects the individual terms scattered throughout the main alphabetical display and displays them in families or hierarchies under the top terms (TT) of the hierarchy.
ROOT Thesaurus (RT)

BSI has produced the ROOT thesaurus, based on the English Electric Thesaurofacet, as a controlled vocabulary for technology, to meet the needs of the English-speaking world. The ROOT thesaurus has been published both to meet the demand for a modern, ready-made indexing tool and to act as a guide to technical terminology. ROOT's presentation in the form of a subject display with a complementary alphabetical list ensures that experienced and new users alike are able to achieve a new standard of consistency. ROOT's unique computer-aided generation system simplifies updating and extension, allows the production of foreign language or multilingual versions and facilitates the production of a whole range of specialised thesauri, enabling users to adapt it to their own specific requirements. The thesaurus consists of a subject display with a complementary alphabetical list. The subject display is the principal part of the thesaurus as it gives the greatest assistance to the user by displaying terms in a complete subject context rather than in alphabetical sequence. The alphabetical list is derived automatically from the subject display and contains all information (with the exception of instructional notes) to one hierarchical level in a conventional thesaurus format. Entries for descriptors have the following format:

DESCRIPTOR. Notation
(Scope Note)
=   Non-preferred synonym or quasi-synonym
<   Broader term in the same part of the display
>   Narrower term in the same part of the display
-   Related term in the same part of the display
*<  Broader term from another part of the display. Notation
*>  Narrower term from another part of the display. Notation
*-  Related term from another part of the display. Notation

Non-descriptors, which may be synonyms or synthesized terms, have the following formats respectively:

Synonym
→ Descriptor. Notation

** Synthesized term. Notation
→ Descriptor A + Descriptor B

Reciprocal entries of the following type are provided for the terms making up a synthesis:

Descriptor A
+ Descriptor B
=** Synthesized term. Notation

Chemical formulae and symbols are shown in the chemistry schedule as synonyms (but limited to inorganic substances only). These formulae and symbols are listed in alphabetical order in the chemical formulae index following the main alphabetical list. The abbreviations BT, NT etc. used in many thesauri are specific to the English language. Since ROOT has been designed for multilingual applications, these abbreviations have been replaced by symbols which are linguistically neutral. A key to the symbols is given below.

Symbol    Meaning
<         Broader term
>         Narrower term
-         Related term
*<        Broader term in an alternative hierarchy
*>        Narrower term in an alternative hierarchy
*-        Related term in an alternative hierarchy
=         Non-preferred synonym or quasi-synonym
→         Use (the term or combination of terms following the arrow should be used instead of the term preceding it)
+         This symbol appears between terms which are used to synthesize a given concept.
**        Synthesized term (the term which follows the symbol is a non-descriptor which should be represented by a combination of terms)
=**       The term (a non-descriptor) following the symbol should be represented by the combination of descriptors preceding it.
(..)      Scope note or instructional note. This clarifies the meaning of a term in the context of the thesaurus or gives guidance on the use of a term.
(By ..)   Facet indicator. This is a device used in the subject display section to group together terms having a common characteristic.
Energy Information Data Base Subject Thesaurus

This thesaurus, brought out by the U S Department of Energy (DOE), is being used for the subject indexing activities of DOE, allowing consistent machine storage and retrieval of information necessary to the accomplishment of the DOE mission.

Terms in the thesaurus are listed alphabetically on a "word by word" basis. All the terms associated with the alphabetic entry are enumerated below it. Terms which have a hierarchical relationship to the entry are identified by the symbols BT and NT for Broader Term and Narrower Term. Terms with an affinitive relationship are identified by RT (Related Term). Terms with a preferential relationship are identified by USE, and the reciprocals (UF) are used to designate USED FOR. Scope notes on the use of a particular descriptor are given for some of the terms. Descriptors added since December 1974 have the entry date (e.g. DA 1977) and in some cases even a definition of the term is given, which appears as part of the word block. A number [01] in square brackets listed to the right of some descriptors indicates terms that appear in a thesaurus (IAEA-INIS-13) published by the International Nuclear Information System (INIS).

Some descriptors in the thesaurus represent a class consisting of a large number of specific descriptors, e.g. even-even nuclei. Long lists of narrower terms in the word blocks of this type of term are not always of equal usefulness to indexers and retrievers. Hence the display of these word blocks has been truncated in the main body of the thesaurus, and whenever this has been done it is indicated by an asterisk (*) at the level below which no narrower terms are listed. If the asterisk appears next to the alphabetic entry, i.e. when none of the narrower terms below the alphabetic entry are listed, the complete word block for the descriptor in question is given with full listings of narrower terms in the Appendix to the thesaurus. Further, the alphabetic entry is followed by the scope note: "For specific terms, consult the Appendix". If the asterisk appears next to the reference indicator "NT" of a term within a word block, narrower terms more specific than that identified by the asterisk are not listed in that particular word block. They are listed in the word block of the descriptor which is accompanied by the asterisk.
RECALL DEVICES IN VARIOUS THESAURI
Synonym Control

This device makes for consistency since the vocabulary user is directed to one specific term from other synonymous terms. The vocabulary is thus reduced in size and at the same time the term definitions are broadened. This device is helpful both to the person indexing documents and to the person trying to retrieve documents. Recall will tend to be improved by such synonym control, but possibly at the expense of precision. In each of the thesauri this is done by clear reciprocal cross-referencing. For example, in the INSPEC thesaurus, there are 5200 postable terms and 3800 non-postable terms. The notation USE indicates that the term is not postable and that the following term should be used.

e.g. Crystal Energy
     USE Lattice Energy

Similarly, USED FOR (UF) is the reciprocal of the USE cross-reference and identifies postable terms.

e.g. Lattice Energy
     UF Crystal Energy
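The reciprocal USE / UF mechanism described above can be sketched as a simple lookup table. This is only an illustration under stated assumptions: the dictionary `USE_REFS` and the function names are hypothetical, and the two term pairs are the examples quoted in this paper (Crystal Energy / Lattice Energy from INSPEC, Trade / Commerce). It is not how any of the thesauri were actually generated.

```python
from collections import defaultdict

# USE references: non-postable (lead-in) term -> postable descriptor.
# Terms are stored in lower case so that look-ups ignore capitalisation.
USE_REFS = {
    "crystal energy": "lattice energy",
    "trade": "commerce",
}


def derive_used_for(use_refs):
    """Derive the reciprocal UF entries from the USE references."""
    used_for = defaultdict(list)
    for lead_in, descriptor in use_refs.items():
        used_for[descriptor].append(lead_in)
    return {descriptor: sorted(terms) for descriptor, terms in used_for.items()}


def to_descriptor(term, use_refs):
    """Map an indexer's or searcher's term to the postable descriptor.

    A term with no USE reference is assumed to be postable already.
    """
    key = term.strip().lower()
    return use_refs.get(key, key)


print(to_descriptor("Crystal Energy", USE_REFS))  # lattice energy
print(derive_used_for(USE_REFS))
```

Because indexer and searcher are both routed through the same mapping, documents on "crystal energy" and "lattice energy" end up posted and searched under one descriptor, which is the source of the recall gain (and the possible precision loss) noted in the text.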
The same method and notations are also used in TEST, TF, and EIDBST.

e.g. Vibration meters  1402
     UF  Vibrometers
     BT  Measuring instruments
     NT  Seismometers
     RT  Accelerometers
         Acoustic measurement
         Frequency meters

In EIT, the same principle is employed but instead of the notation UF, the full form USED FOR is used.

e.g. Photochemical Reactions
     Used for Photochemistry

RT also uses the same principle but instead of USE and UF, the symbols → and = are used.

e.g. Radioactivity CNN.CN
     = Radioactive decay
     -~ Ionizing radiation

     Radioactive decay
     → Radioactivity CNN.CN

Term Linkage

It is possible to impose a hierarchical structure on a vocabulary by means of a formal classificatory organisation (overt classification) or through an appropriate network of cross-references (covert classification). Hierarchical structure is a recall device because it groups classes together, and forms them into larger classes. A hierarchy expresses relationships of super/subordination of concepts. In indexing, it may be useful to have a display of term relationships if the policy is to index both generically and specifically. For searching, however, by moving from a more specific term to a more generic one, the device allows a broadening of the search strategy and thus the retrieval of more documents.

Hierarchical Linkage

A hierarchy allows the relationships between terms to be shown by cross-references, e.g. in TEST. TEST gives and explains the following notations: Broader Term (BT), Narrower Term (NT), Related Term (RT), Use (USE), Used For (UF).

These cross-references are also used in the other thesauri and normally have the following specific applications:

Broader term - indicates that the term(s) following the BT notation represent more inclusive concepts that cover, among others, the term used.

e.g. Vibrational spectra  2006
     BT Electromagnetic Spectra

Narrower term - indicates that the subject term(s) following the NT notation represent more specific concepts than the term used; it is the reciprocal of the BT.

e.g. Stress analysis  1402
     NT X-ray stress analysis

Related term - indicates that two indexable terms are closely related but are not structured within the broader/narrower tree or hierarchy.

e.g. Spark machining
     RT Chipless machining

     Chipless machining
     RT Spark machining

USE - indicates the term is not postable and that the following term should be used instead.

e.g. Trade
     USE Commerce

USED FOR - is the reciprocal of the USE cross-reference and identifies postable terms.

e.g. Commerce
     UF Trade
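The broadening of a search by moving from a specific term to a more generic one, as described under Term Linkage, can be sketched as a walk over BT links. This is a minimal illustration, not a description of any retrieval system studied here: the table `BT` is a hypothetical fragment assembled from BT/NT examples quoted in this paper, and the function names are assumptions.

```python
from collections import deque

# Broader-term (BT) links: term -> list of its broader terms.
# A term may have more than one broader term (multiple hierarchies).
BT = {
    "x-ray stress analysis": ["stress analysis"],
    "vibrational spectra": ["electromagnetic spectra"],
    "socks": ["clothing", "hosiery"],
    "hosiery": ["clothing"],
}


def derive_narrower(bt):
    """Build the reciprocal NT links so that every BT has a matching NT."""
    nt = {}
    for term, broader_terms in bt.items():
        for broader in broader_terms:
            nt.setdefault(broader, []).append(term)
    return {term: sorted(narrower) for term, narrower in nt.items()}


def broaden(term, bt, levels=1):
    """Return the search term plus all broader terms within `levels` BT steps."""
    seen = {term}
    frontier = deque([(term, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == levels:
            continue  # do not climb beyond the requested number of levels
        for broader in bt.get(current, []):
            if broader not in seen:
                seen.add(broader)
                frontier.append((broader, depth + 1))
    return sorted(seen)


print(broaden("socks", BT))           # ['clothing', 'hosiery', 'socks']
print(derive_narrower(BT)["clothing"])  # ['hosiery', 'socks']
```

Searching on the broadened set retrieves documents posted under the generic terms as well, which raises recall; the `levels` parameter is one hypothetical way to limit how much precision is given up.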
TEST and INSPEC do in fact give a hierarchical index displaying descriptor families based on BT-NT references. INSPEC contains Top Terms (TT) which are the most general (broadest) terms of any category.

TF and RT have introduced additional broader and narrower terms. They give information about concepts which are broader or narrower hierarchically than the index terms, but which are displayed in schedules other than those chosen as the primary hierarchy for the display of the index terms. In TF, an (A) in brackets is given after the BT or NT to show that the relationship occurs in an auxiliary or additional hierarchy. RT uses *< and *> respectively.

e.g. TF thesaurus entry

Jets  CWJ
RT     Jet condenser
       Jet dispersal valves
       Jet dispensers
       Jet pumps
       Jet rectifiers
       Nozzles
NT(A)  Jets (hovercraft)
       Plasmajets

(These terms will not be found in the schedules at CWJ. They are in fact at REJ and EXQ respectively.)

In the INSPEC thesaurus the degree of specification is taken to a greater length, as shown in the example below.

Waves
Charge density waves
elastic waves
acoustic waves
magnetoacoustic effects
magnetoacoustic resonance
ferroacoustic resonance

The cross-reference structure for TEST and INSPEC is similar, with reciprocal entries for each cross-reference, e.g. (from TEST):

Clothing
UF  Apparel
NT  Arctic clothing
    Footwear
    Hosiery
    Socks
RT  Apparel fabrics

Footwear
BT  Clothing
NT  Boots (footwear)
    Shoes
RT  Arctic clothing
    Hosiery
    Socks

Hosiery
BT  Clothing
NT  Socks
RT  Footwear

Socks
BT  Clothing
    Hosiery
RT  Footwear

Non-hierarchical linkages
Various relationships can exist between terms that are neither hierarchical nor synonymous. A few examples from different thesauri are shown below:

a) Antonymity

Non-destructive tests
RT Destructive tests (TEST)

Linear systems
RT Non-linear systems (EIT)

Antiferroelectric materials
RT Ferroelectric materials (INSPEC)

Sometimes only one concept is used to express both relationships.

Insolubility
USE Solubility (INSPEC)

Positrons
USE Antipositrons (EIT)

Antibaryons
USE Baryons (INSPEC)
b)
Instrumental
Machine tools
RT Machining
Hydrometers
- Density measurement
Tape recorders
RT Audio recording
X-ray apparatus
RT X-rays
c)
Quality control
RT Acceptability
(INSPEC)
Abstracting
RT Abstract publications
(RT)
(INSPEC)
g)
(EIT)
(TF)
Milk RT cows
(EIDBST)
Permanent magnet motors
RT Permanent magnets
Data processing
Data
(INSPEC)
(RT)
h)
Ceramics
RT Ceramic capacitors
(EIT)
(TEST)
Revenue
RT Income
Design
-Technical drawing
(TEST)
(EIT)
(RT)
are not
the same as
Specificity
Machining
RT Grooving
(EIT)
Lasers
RT Ring lasers
Material
Plastic laminates
RT Sandwich construction
Data storage devices
RT Computer storage devices
Pulse modulation
RT Pulse compression
The above relationships
synonymity.
This relationship may be considered as the reciprocal of. the instrumental relationship.
d)
(TEST)
Similarity
(TF)
Steam turbines
RT Boilers
(TF)
Explosives
RT Accidents
Dependency
Information retrieval
RT Subject indexing
(TEST)
i)
Generality
ficity)
(TF)
(the
Crawler tractors
(INSPEC)
reciprocal
of
speci(TF)
RT Tractors
Wool
RT Textiles
Cellulose
RT Paper
Magnetic materials
RT Permanent magnets
e)
Electrolytes
RT Electrolysis
Cathode ray tubes
RT Television equipment
Counting tubes
RT Geiger counters
(EIT)
(EIT)
(TF)
(TEST)
(INSPEC)
(RT)
The latter two relationships could have been given as narrower/broader term relationships rather than as related terms. A term can, of course, have any combination of the above relationships.
Nuclear fuels
(TEST)
RT
Fissionable materials (material)
Neutron sources (utilisation)
Like the various relationships connected by the
notation RT, similar relationships can also exist
between the USE and USED FOR notations.
(e.g.)
a)
Cause and effect
Photoelectric cells
RT Photoelectricity
Periodical articles
- periodicals
(TF)
Utilization
Linoleum
RT Floor coverings
f)
(EIDBST)
(INSPEC)
Generality
Bisphenol-A USE Bisphenols
(EIT)
Thermometer
USE Temperature
measuring instruments
(TEST)
Ann Lib Sci Doc
INDEX LANGUAGES
b)
IN THESAURI
Specificity
welding
welding
welding
welding
welding
Window projectiles
USE
Window rockets (TEST)
c)
Material
Cast stone
USE
Concrete products
d)
(TEST)
Instrumental
Training programs
USE
Personnel Jevelopment
(TEST)
Sometimes it will be very difficult to identify the RT relationships that exist between
terms.
Since TEST, INSPEC etc employ direct form of
entry, the other words, such as, electric arc
welding, spot welding, fusion welding will not
be so readily found. This problem is reduced to
an extent in TF, RT etc as they make inverted
entries in several cases.
However, in other thesauri the hierarchical
structure permits a certain measure of control,
e.g.
Control of word forms
This is essentially a searching device rather than an indexing device. It implies the acceptance of all variant word forms rather than just one. Thus all words having the same root or stem are compounded into one set or group, usually by truncation.
As an example, the words weld, welds, welding, welded, weldable, weldability and welder may all be reduced to the root "Weld", which becomes a descriptor in the vocabulary.
When a search is conducted on the root descriptor weld, the distinctions among the various parts of speech are eliminated. This will improve recall, since a searcher seeking information on the process of welding as applied to titanium will also be interested in a document discussing the property of weldability of titanium or in one which discusses welded titanium products.
Dropping of word endings, like control of synonyms, will tend to improve recall, but at the same time to reduce precision. However, in an on-line system high precision can also be achieved if, at the indexing stage, postings are not made at the root word. At the indexing stage postings may be made at separate terms, and provision may be made for compounding
at the search stage.
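The arrangement just described, with separate postings at indexing time and compounding by truncation at search time, can be sketched as follows. This is only an illustrative sketch: the document numbers, the postings dictionary and the function names are hypothetical and are not taken from any of the thesauri examined.

```python
# Hypothetical postings file: every word form is posted separately at the
# indexing stage, so nothing is lost by stemming at input time.
postings = {
    "welding": {1, 2},
    "weldability": {3},
    "welded": {4},
    "welder": {5},
    "titanium": {1, 3, 4, 6},
    "wellhead": {7},
}


def exact_search(term, postings):
    """Return the documents posted under exactly this word form."""
    return set(postings.get(term, set()))


def truncation_search(root, postings):
    """Compound, at the search stage, all word forms that begin with `root`."""
    hits = set()
    for term, docs in postings.items():
        if term.startswith(root):
            hits |= docs
    return hits


# High-recall search: any form of "weld" combined with titanium.
broad = truncation_search("weld", postings) & exact_search("titanium", postings)

# High-precision search: only the process "welding" combined with titanium.
narrow = exact_search("welding", postings) & exact_search("titanium", postings)

print(sorted(broad))
print(sorted(narrow))
```

Because the word forms are kept apart in the file, the searcher can choose between the truncated (recall-oriented) and the exact (precision-oriented) search without any re-indexing.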
Since all the thesauri are arranged in alphabetical order of keyword, many words with the same root can easily be found together.
e.g.
weld defects
weldability
welded joints
welding
welding current
welding electrodes
welding fluxes
welding machines
welding pads
(EIT)
Welding
NT
Arc welding (EIT)
Electric welding
Friction welding
Fusion welding
Gas welding
Hot plate welding
Induction welding
Microwelding
Pressure welding
Resistance welding
Solvent welding
Spin welding
Spot welding
Ultrasonic welding
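The way such a list of narrower terms gathers the directly entered welding terms under one heading can be illustrated with a short sketch of a generic search. The NT links are those listed above for Welding; the postings, document numbers and function names are hypothetical.

```python
# Narrower-term (NT) links as listed under Welding above.
narrower_terms = {
    "Welding": [
        "Arc welding", "Electric welding", "Friction welding",
        "Fusion welding", "Gas welding", "Hot plate welding",
        "Induction welding", "Microwelding", "Pressure welding",
        "Resistance welding", "Solvent welding", "Spin welding",
        "Spot welding", "Ultrasonic welding",
    ],
}

# Hypothetical postings, invented for illustration only.
postings = {
    "Welding": {10},
    "Spot welding": {11, 12},
    "Fusion welding": {13},
    "Brazing": {14},
}


def expand(term, narrower_terms):
    """Return the term together with all of its narrower terms, at any depth."""
    expanded, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in expanded:
            expanded.add(current)
            stack.extend(narrower_terms.get(current, []))
    return expanded


def generic_search(term, narrower_terms, postings):
    """Union the postings of a term and of every term narrower than it."""
    docs = set()
    for t in expand(term, narrower_terms):
        docs |= postings.get(t, set())
    return docs


welding_family = expand("Welding", narrower_terms)
hits = generic_search("Welding", narrower_terms, postings)
print(len(welding_family), sorted(hits))
```

A search on Welding alone would retrieve only the documents posted at that heading; following the NT links also brings in the documents posted under Spot welding and Fusion welding, which would otherwise be scattered through the alphabetical sequence.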
The provision of a permuted index, as given in TEST and INSPEC, helps to draw various roots and words together.
e.g.
Consumption
(TEST)
Consumption rate
Daily consumption rate
Food consumption
Fuel consumption
Oxygen consumption
Water consumption
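A permuted index of this kind can be sketched in a few lines: every word of every term becomes an access point under which the full term is listed. The terms are those of the example above; the function name and the lower-casing of access words are assumptions made for the sketch, not a description of how TEST compiles its index.

```python
from collections import defaultdict

# Terms from the Consumption example above.
terms = [
    "Consumption rate",
    "Daily consumption rate",
    "Food consumption",
    "Fuel consumption",
    "Oxygen consumption",
    "Water consumption",
]


def permuted_index(terms):
    """Map each word to the alphabetically sorted terms that contain it."""
    index = defaultdict(list)
    for term in terms:
        for word in term.lower().split():
            index[word].append(term)
    return {word: sorted(entries) for word, entries in sorted(index.items())}


index = permuted_index(terms)
for word, entries in index.items():
    print(word, "->", "; ".join(entries))
```

Under the access word "consumption" all six terms are brought together, although in a direct-entry alphabetical sequence they would file under C, D, F, O and W.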
PRECISION DEVICES IN VARIOUS THESAURI
Coordination
There are two types of coordination: pre and post. In precoordination the intersections between concepts are made at the indexing stage,
e.g. evaporative cooling (evaporation and cooling).
In postcoordination, the intersections between independent terms (unit terms) are made at the
search stage.
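The difference between the two kinds of coordination, and the false coordination that postcoordination can produce, can be sketched with set intersection. The postings and document numbers below are hypothetical; document 3 is assumed to mention evaporation and cooling as two unrelated topics.

```python
# Hypothetical postings, invented for illustration only.
postings = {
    "Evaporation": {1, 2, 3},
    "Cooling": {1, 3, 4},
    # Precoordinated term: the intersection was made by the indexer.
    "Evaporative cooling": {1},
}

# Postcoordination: the searcher intersects two unit terms at the search stage.
postcoordinated = postings["Evaporation"] & postings["Cooling"]

# Precoordination: the searcher looks up the ready-made compound term.
precoordinated = postings["Evaporative cooling"]

# Documents retrieved only because both unit terms happen to be assigned,
# although the document does not deal with evaporative cooling.
false_coordinations = postcoordinated - precoordinated

print(sorted(postcoordinated), sorted(precoordinated), sorted(false_coordinations))
```

In this sketch the postcoordinate search retrieves one document more than the precoordinated term does; that extra document is the kind of false association which precoordination is meant to avoid.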
A postcoordinated example: computers and design (computerised design).
Precoordination helps to reduce the size of the vocabulary, the number of documents assigned to terms and false associations.
In some cases single concepts may exist along with precoordinated terms, e.g. Helicopter rotors. Documents dealing with the rotors of helicopters can be well indexed under Helicopter rotors. Therefore, the user at the search stage should not retrieve documents on the subject as a result of intersecting Helicopters and Rotors.
In some cases the single terms in the thesauri have to be combined to give the precoordinated concept.
e.g. Hot forging
USE Forging and Hot working (TEST)
Asbestos cement pipes
** Pipes + Asbestos cement (RT)
Role indicators
Role indicators are used to a limited extent in almost all the thesauri by the addition of an extra distinguishing term or qualifier.
e.g. Condensers (electric), Condensers (liquefiers), Concrete tiles (tubes), Hardening (materials), Hardening (systems), etc. (TEST)
Indicators (economics), Spinning (metals), Spinning (textiles), Range measurement (radar) (TF)
Bonds (chemical) (INSPEC)
Choppers (circuits) (RT)
Doping (crystal) (EIDBST)
EIT uses a slash instead of brackets, e.g. Identification/chemical analysis, Identity period/crystallography.
The addition of scope notes can also help define the role, use or limitations of a term.
e.g. Cores
(excludes magnetic cores) (TEST)
Production
(limited to industrial production) (INSPEC)
Doublet-1 device
(Quadrupolar configuration) (EIDBST)
Mufflers
(To suit context, choose term or terms from those listed below) (EIT)
Handles
(For doors, or for lifting containers, etc) (RT)
NEED FOR A THESAURUS IN APPROPRIATE TECHNOLOGY (AT)
What is Appropriate Technology (AT)
Appropriate Technology (AT) is technology which is most suitably adapted to the conditions of a given situation. It is compatible with the human, financial, and material resources which surround its application. AT, by its very name, describes approaches for solving the problems of a community that do not call for massive doses of capital or very sophisticated technology.
On the other hand, it uses all the locally available resources, skills and other endowment factors and is capable of being managed by the users themselves and furthermore conserves resources and in no way accounts for any disharmony in that situation. The basic criteria for the appropriateness of a technology are:
1. Simplicity of that technology both for its fabrication and operation
2. Amenability to adoption at very small scales
3. Should be preferably labour intensive
4. Should have low capital intensity
5. Uses all the locally available raw materials
6. Optimises all the local skill-spectrum of that community
7. Both the technology and the related services are available at a fair price, and within easy reach.
8. Should be mass-replicable
Volunteers in Technical Assistance (VITA) have published a thesaurus in AT but its suitability to developing countries is yet to be examined and improved, if necessary. ILO/RCTT (ESCAP) Regional Technical consultation on Documentation and Information Systems on Alternate Technologies held at SIET Institute,
Hyderabad from 8-13 September 1980, rightly identified the need for an index language in AT for the storage and retrieval of AT information at the local, regional and international levels. The various AT areas include windmills, rural housing, solar cookers and ovens, spinning, weaving, biogas, water lifting, drinking water supply, rural sanitation, rural industries, waste utilisation, etc. All these topics have become socially relevant, and social and population pressures and government policies make the people and the scientists pay some attention to this newly emerging discipline called AT. As a result, research and development work in AT is being carried out in almost all CSIR laboratories, KVIC institutions, engineering colleges, etc., and an enormous amount of literature is being published.
Over a period of time SENDOC of SIET Institute has accumulated a substantial amount of information in AT in the form of books, product profiles, technical notes, periodical articles, etc. No doubt they have been classified with the existing schemes. But the need will arise sooner or later to create a data base purely on AT and to establish linkages with similar national and international organisations for the acquisition and transfer of AT information. In such a situation we need a specific vocabulary control device exclusively on AT which may incorporate all the recall and precision devices discussed earlier.
Even in a very narrow subject like waste utilization, which comes within the scope of AT, a large number of articles and process/product profiles have been indexed and stored in the Central Information File (CIF) of SENDOC. Since AT is an interdisciplinary subject, it borrows concepts from technical, economic, management, behavioural and regulatory areas. At the moment no one thesaurus is available covering the above areas in the depth which is warranted. Hence depth indexing seems to be impossible. Non-availability of nascent terms in the existing thesauri forces the indexer to go for non-depth indexing, leading to more recall, and justifies the need for an AT thesaurus.
CONCLUSION
The most common recall and precision devices have been reviewed and their use in different index languages or thesauri has been examined. The various devices used in these index languages will be very helpful to anyone interested in constructing a new thesaurus that achieves optimum recall and precision.
ACKNOWLEDGEMENT
I am grateful to Dr. J B Subramaniam, Director
(Documentation),
SIET Institute for giving me
encouragement
during the preparation of this
paper. I am also grateful to Shri S G Raghu,
Principal Director, SIET Institute for having
permitted me to present this paper in the DRTC
Annual Seminar - 20 (1983).
REFERENCES
1. Energy Information Data Base Subject Thesaurus. United States Department of Energy, 1978.
2. Engineering Index Thesaurus. CCM Information Corporation, 1972.
3. INSPEC Thesaurus. Institution of Electrical Engineers, 1979.
4. John J R: Evaluating indexing systems; a review after Cranfield. Indexer 1980, 12(1), 14-21.
5. Jones K P: Problems associated with the use of compound words in thesauri, with special reference to BS 5723: 1979. Journal of Documentation 1981, 37(2), 53-68.
6. Lancaster F W: Information retrieval systems; characteristics, testing and evaluation. Ed 2. 1979.
7. Lancaster F W: Vocabulary control for information retrieval, 1972.
8. Maron M E: Depth of indexing. Journal of the American Society of Information Science 1979, 30(4), 224.
9. Mater E: Increasing information retrieval precision. International Forum on Information and Documentation 1980, 5(4), 12-18.
10. Raitt D I: Recall and precision devices in interactive bibliographic search and retrieval systems. Aslib Proceedings 1980, 32(7/8), 281-301.
11. ROOT Thesaurus, Part I & II. British Standards Institution, 1981.
12. Soergel D: Indexing languages and thesauri: Construction and maintenance. 1974.
13. Steinweg H: Specificity in subject headings. Library Resources and Technical Services 1979, 23(1), 55-68.
14. Thesaurofacet. The English Electric Company Ltd. 1969.
15. Thesaurus of Engineering and Scientific Terms. Engineers Joint Council. 1969.
16. Vickery B C: Techniques of information retrieval. 1970.
17. Wilson: The end of specificity. Library Resources and Technical Services 1979, 23(2), 116-122.
I
THING
Thing/part
Thing/property
Thing/process
Thing/thing as attribute
Thing/application
PROPERTY
Property /process
Property /property as attribute
PROCESS
Process/thing (agent)
Process/property
Fig. 1
Boilers
RT Steam pipes
Lasers
RT coherence
Ships
RT Shipbuilding
Arcs
RT Arc furnaces
Adaptive filters
RT Signal processing
Charge (electric)
RT Charge measurement
Skew
RT Skew girders
Landing (flight)
RT Landing gear
Detonation
RT Detonation waves
Annexure - I
Comparison of various Thesauri
TEST
TF
EIT
INSPEC
RT
EIDBST
1.
Form of Vocabulary
(a)
Yes
Alphabetic
Yes
Yes
Yes
Yes
Yes
1. Arrangement
2.
Yes
Yes
No
No
No
No
b) Word-by-word
No
No
Yes
Yes
Yes
Yes
Location of individual term
Alphabetical sequence
Scanning the whole
structure
Yes
Yes
Yes
Yes
Yes
Yes
No
No
No
No
No
No
(a)
Total No. of terms
23364
23000
11800
9000
17300
NA
(b)
(c)
Preferred terms
17810
16000
NA
5200
11800
NA
(a)
(b)
3.
a) Letter-by-letter
Lead-in terms
(cross-reference )
5554
4.
Are terms specific?
5.
Terminology inclusive of
(a)
(b)
6.
7.
8.
10.
11.
Compound terms
Phrases of words
7000
Yes
NA
Yes
3800
Yes
5500
Yes
NA
Yes
Yes
Yes
Yes
Yes
Yes
Yes
No
Yes
No
Yes
Yes
No
Word forms (words with the
same root)
(a)
Confounded
Yes
Yes
Yes
Yes
Yes
Yes
(b)
Kept separate
No
No
No
No
No
No
Homographs
( a)
Qualifiers
Yes
Yes
Yes
Yes
Yes
Yes
(b)
Part of the term
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
No
No
No
No
No
Yes
Use of some terms limited
1.
Scope note
2.
9.
Yes
Definitions
Synonym/near-synonym
(a)
Control/alternate
spellings
Yes
Yes
Yes
Yes
Yes
Yes
(b)
Concept representation by
more than one concept
Yes
Yes
Yes
Yes
Yes
No
Yes
Yes
Yes
Yes
Yes
Yes
NA
3800
5500
NA
NA
42.22
31. 79
NA
If synonyms/near-synonyms
barred
(a)
Are they listed as
lead-in words
(b)
How many such lead-in
words are there
(c)
What percentage
(of total terminology)
Are links made between a
general term and those
specific to it?
(a)
What is the average number
of links in a hierarchy?
5554
7000
23.7
30.43
Yes
Yes
Yes
Yes
Yes
Yes
3
4
4
3
4
3
TF
(b)
(c)
12.
13.
14.
INSPEC
RT
EIDBST
How many terms on average
are linked into single
hierarchy?
10
12
6
6
6
15
Do terms form part of more
than one hierarchy?
Yes
Yes
Yes
Yes
Yes
Yes
Term relationships
(a)
Hierarchical
Yes
Yes
Yes
Yes
Yes
Yes
(b)
Non-hierarchical
Yes
Yes
Yes
Yes
Yes
Yes
Whether coding is used
Yes
Yes
No
Yes
Yes
a number
(01) is used
Purpose
to direct
to direct
to subject
category
index
to the
schedule
used in
INSPEC
MTS
Input to
INSPEC
data base
to direct
to say it
to schedule appears in
INIS thesaurus also.
1.
Appendix
Yes
No
No
No
No
Yes
2.
3.
Permuted index
Yes
No
No
No
No
No
Subject category
index/schedule
Yes
Yes
No
No
Yes
No
Hierarchical index
Yes
No
No
Yes
No
No
Chemical formula index
No
No
No
No
Yes
No
4.
5.
EIT
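The lead-in percentages given in the annexure can be checked from the term counts in the same table. The sketch below assumes the column assignment that follows from the counts themselves (preferred terms plus lead-in terms equal the total for TEST, TF, INSPEC and RT); the dictionary layout and function name are illustrative only. EIT and EIDBST are omitted because their lead-in counts are given as NA.

```python
# (total number of terms, lead-in terms) as read from the annexure.
# Column assignment is an assumption based on the arithmetic of the table.
counts = {
    "TEST": (23364, 5554),
    "TF": (23000, 7000),
    "INSPEC": (9000, 3800),
    "RT": (17300, 5500),
}


def lead_in_percentage(total, lead_in):
    """Lead-in terms as a percentage of the total terminology."""
    return round(100 * lead_in / total, 2)


percentages = {
    name: lead_in_percentage(total, lead_in)
    for name, (total, lead_in) in counts.items()
}

for name, value in percentages.items():
    print(name, value)
```

The computed figures agree with the annexure for TF (30.43), INSPEC (42.22) and RT (31.79); for TEST the computation gives 23.77, which the annexure shows as 23.7.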