Canadian Thesaurus of Construction Science and Technology:
A HyperCard Stack
Dana J. Vanier¹
Abstract
This paper describes a HyperCard application based on the Canadian Thesaurus of
Construction Science and Technology. It points to four major revelations within the field
of information processing for thesauri: 1) Automated tools can assist the indexing tasks of
librarians, 2) Thesauri are essential for browsing large information datasets, 3) Thesauri
originally developed for hard copy presentation can receive new uses in electronic form,
and 4) Performance of large HyperCard stacks is not debilitated by dataset size or number
of records.
Introduction
This paper describes a HyperCard application based on the Canadian Thesaurus of
Construction Science and Technology (CTCST 78). It was developed to assist in the
identification of a Classification System [Vanier 91] for the National Building Code of
Canada, and it could serve the needs of a wide spectrum of construction practitioners and
librarians.
As Canadians become increasingly comfortable with the Information Age, they are searching
for electronic technical information in ever-increasing numbers to meet their day-to-day
information needs. In the past, on-line access charges, expensive in-house computer
systems or cryptic user interfaces made it impossible for the average user to search the
available data economically. Recent advances in both computer hardware and software, as
well as the development of new computer technologies such as hypertext [Vanier 90], have
opened new "information doors".
The Institute for Research in Construction (IRC) has been a principal source of technical
information for the construction industry for over 50 years. In addition to a wide selection
of research papers on construction-related topics, it publishes numerous practitioner papers
each year, as well as the National Building Code of Canada (in conjunction with the
Associate Committee on the NBCC). The increasing use of word processors, databases
and desktop publishing systems at the IRC in recent years has provided the opportunity
to supply electronic versions of these documents to clients or to software developers.
One such project, Construction Resources, is an attempt to produce an electronic collection
of all publications produced by IRC [Worling]. To date, it contains over 150 Canadian
Building Digests, 10 Building Science Insight Workshop proceedings, the National
Building Code, the National Fire Code, the Canadian Farm Building Code, and 3500
bilingual bibliographic references to all IRC publications. Construction Resources includes
all of the text, graphics and tables of these documents. This represents over 100 megabytes
of information, 3500 pages of text and figures and three man-years of work. The intention
is to distribute this collection on optical disk.
¹ Institute for Research in Construction, National Research Council
Searching such a large collection of information on a desktop workstation or personal
computer quickly demonstrates the need for a controlled vocabulary and intelligent search
strategies, techniques unknown to the average computer user. Hence the need for tools
such as thesauri, in this case a construction thesaurus.
The Canadian Thesaurus of Construction Science and Technology [CTCST 78] was
developed by the IF group of the Université de Montréal in the mid-1970s. Although it
was intended as a hard copy publication, it was developed on a database of that era running
on a mainframe computer. The publication was printed using the computer output and has
been in use by librarians for the past 15 years. It is one of the few bilingual thesauri in
existence, the only English-French construction thesaurus and one of a handful of
construction thesauri. It was used as the base for the Thesaurus Latin of the Conseil
international du bâtiment (CIB), Working Group 57 [reference].
An electronic copy of the CTCST was obtained by downloading the data to a personal
computer [Chailloux] and the ASCII file was imported into HyperCard. In HyperCard,
various utilities were developed, as and when required, to serve the needs of the author and
the Information Service of the IRC. The HyperCard Construction Thesaurus is the result
of these activities.
Figure 1 demonstrates a standard thesaurus presentation from CTCST; Figure 2 shows the
equivalent HyperCard representation.
THESAURI [6]
Categories shown: FT, UF, BT, RT, PT
Terms shown: thesaurus; controlled vocabularies; documentation (documents); dictionaries; glossaries; indexing languages; indexing; post coordination; bt nt relationships; keywords; related terms
Figure 1
Conventional Thesaurus
HyperCard Construction Thesaurus
Any well-structured technical publication lends itself to easy automation [Vanier 90]. This
holds true for the CTCST. There is a well-defined structure of Descriptor, Descriptor
Level (denoted by [n]), French Terms (FT), Broader Terms (BT), Narrower Terms (NT),
Whole Terms (WT), Part Terms (PT), Preferential Terms (UF - Use For), Use Terms
(US - USe), General Terms (GT), Related Terms (RT), Associated Terms (AT) and Scope Notes
(SN). The data provided were identical to the hard copy thesaurus, with all the Descriptors
and associated Terms listed sequentially. This flat file structure was duplicated in
HyperCard by creating a card (record) for each Descriptor and importing the associated
Terms into individual fields in that record, as shown in Figure 2. The entire HyperCard
Construction Thesaurus was 4.2 megabytes in size and contained over 15,500 records.
Figure 2
HyperCard Thesaurus
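The card-per-Descriptor layout described above can be sketched in Python (not the HyperTalk used in the actual stack). The paper does not document the layout of the ASCII file, so the line format assumed here, a "NAME [n]" header followed by "CODE term" lines, is purely illustrative, as are the names `Card` and `parse_flat_file` and the sample data.

```python
import re
from dataclasses import dataclass, field

# Term categories named in the paper.
CODES = ("FT", "BT", "NT", "WT", "PT", "UF", "US", "GT", "RT", "AT", "SN")
# ASSUMED header layout: descriptor name followed by its level in brackets.
HEADER = re.compile(r"^(?P<name>.+?)\s*\[(?P<level>\d+)\]$")


@dataclass
class Card:
    """One record per Descriptor, with one list of Terms per category."""
    descriptor: str
    level: int
    fields: dict = field(default_factory=dict)


def parse_flat_file(text):
    """Build one Card per Descriptor from the assumed flat-file layout."""
    cards = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = Card(header["name"].lower(), int(header["level"]))
            cards[current.descriptor] = current
            continue
        code, _, term = line.partition(" ")
        if current is not None and code in CODES and term:
            current.fields.setdefault(code, []).append(term.strip())
    return cards


# Illustrative sample only; the category assignments are assumptions.
SAMPLE = """\
THESAURI [6]
UF controlled vocabularies
RT dictionaries
RT glossaries
"""
cards = parse_flat_file(SAMPLE)
print(cards["thesauri"].fields["RT"])
```

Keeping each category as a separate list mirrors the separate fields on a HyperCard card, so later operations can address Related Terms or Narrower Terms independently.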
Functionality of HyperCard Construction Thesaurus
There are three basic functions in the existing database: 1) Browse, 2) Search and 3)
Record information.
Browse
The Browse feature allows the user full access to all of the terms shown in Figure 2. If the
user wishes to access Glossaries as a Related Term, (s)he points and clicks at that
location on the screen. This will find Glossaries as a Descriptor and display the record.
These cross references are not explicitly tagged or "hard-wired"; instead, the computer software
(HyperCard script) interprets the Term that was clicked and searches for a record with that
name (using the standard HyperCard FIND command). This takes approximately 2
seconds on a Mac II or double that time on a Mac SE.
The cursor keys or pull-down menu options allow the user to browse sequentially, that is,
to go to the next card in the sequence, the previous card or the last card viewed.
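The idea that cross references are resolved at click time, rather than stored as links, can be illustrated with a short Python sketch. It substitutes a dictionary lookup for the HyperCard FIND command, so it shows the principle only and says nothing about the timings reported above; `find_card` and the sample cards are hypothetical.

```python
def find_card(cards, clicked_term):
    """Return the card whose Descriptor matches the clicked Term, or None.

    The Term text itself is the only "link": nothing is stored on the
    originating card beyond the words the user clicked on.
    """
    return cards.get(clicked_term.strip().lower())


# Hypothetical cards keyed by Descriptor name.
cards = {
    "thesauri": {"RT": ["glossaries", "dictionaries"]},
    "glossaries": {"RT": ["thesauri"]},
}

target = find_card(cards, "Glossaries")
print(target)
```

Because the lookup uses the Term text, adding or renaming a Descriptor requires no maintenance of explicit links, at the cost of a search on every click.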
Search
The Find Descriptor button in Figure 2 allows the user to enter a string of characters for
searching. The string can contain any number of words, and words can be truncated
(e.g. Thesaur...). The software will display the found terms in alphabetical order.
Successive presses of the Enter key will scroll the user through the results. This takes
approximately 1 second for each successive Find Descriptor.
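A truncated search of this kind might be sketched as follows. The matching rules are assumptions, since the paper does not spell them out: a trailing "..." is taken to mean "any word starting with this stem", and every word of the query must match. The function names and the sample Descriptors are hypothetical.

```python
def matches(descriptor, query):
    """True if every query word matches some word of the descriptor."""
    words = descriptor.lower().split()
    for part in query.lower().split():
        if part.endswith("..."):
            stem = part[:-3]  # truncated word: prefix match
            found = any(word.startswith(stem) for word in words)
        else:
            found = part in words  # whole-word match
        if not found:
            return False
    return True


def find_descriptors(descriptors, query):
    """Return the matching descriptors in alphabetical order."""
    return sorted(d for d in descriptors if matches(d, query))


# Hypothetical sample of Descriptors.
descriptors = [
    "thesauri",
    "keywords",
    "thesaurus construction",
    "indexing languages",
]
hits = find_descriptors(descriptors, "Thesaur...")
print(hits)
```

Stepping through `hits` one item at a time corresponds to the successive presses of the Enter key described above.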
Record information
Every Descriptor that has been viewed is saved automatically in an Audit Trail. This can
be accessed at any time using the Audit Trail feature displayed in Figure 3. The Audit
Trail maintains a list of the last 60 Descriptors viewed by the user.
In addition, the user can record Selected Terms for future recall by pointing and clicking
at any Term while holding down the option key. The Show Record button in Figure 2
will display the Selected Terms and the Audit Trail. The result of these two
operations is shown in Figure 3.
Audit Trail: keywords, descriptors, keyword cards, descriptors, keywords, indexing, thesauri
Selected Terms: identifiers, thesauri, keyword cards, coordinate indexes, identifiers
Figure 3
Selected Terms and Audit Trail
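The two recording mechanisms can be sketched with a bounded queue for the Audit Trail and a plain list for the Selected Terms. This is an illustrative Python sketch, not the stack's script; the class name `Session` and its methods are hypothetical. Repeated entries are kept, as Figure 3 lists the same term more than once.

```python
from collections import deque


class Session:
    """Records viewed Descriptors and explicitly selected Terms."""

    def __init__(self, trail_size=60):
        # A deque with maxlen silently drops the oldest entry when full.
        self.audit_trail = deque(maxlen=trail_size)
        self.selected_terms = []

    def view(self, descriptor):
        """Called automatically whenever a Descriptor is displayed."""
        self.audit_trail.append(descriptor)

    def select(self, term):
        """Called when the user option-clicks a Term."""
        self.selected_terms.append(term)


session = Session()
for i in range(65):  # view 65 hypothetical Descriptors
    session.view(f"descriptor {i}")
session.select("identifiers")
session.select("thesauri")
print(len(session.audit_trail), session.audit_trail[0])
```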
Two other features, Synonym List and Cascading Terms, have been included to
assist users. The Synonym List provides a summary of the Terms associated with the
current Descriptor. It accumulates data from the Related Terms, Associated Terms,
Preferential Terms, and Use Terms into a Synonym List. It also collects the Terms from
the General, Broader and Whole categories and treats these as Global Descriptors. It does
the same for the Narrower Terms and Part Terms to create Specific Descriptors. These are
presented to the user in the form shown in Figure 4.
Columns shown: Global Terms, Specific Terms
Legible entries: post coordination, thesaurus
Figure 4
Synonyms, Global and Specific Terms
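The grouping of term categories into Synonyms, Global Terms and Specific Terms can be written compactly. The sketch below follows the grouping stated in the text; the function name `summarize`, the dictionary representation of a card, and the sample card (including its category assignments) are assumptions for illustration.

```python
# Grouping of term categories as described in the text.
GROUPS = {
    "synonyms": ("RT", "AT", "UF", "US"),
    "global": ("GT", "BT", "WT"),
    "specific": ("NT", "PT"),
}


def summarize(card):
    """Collect a card's Terms into Synonym, Global and Specific lists."""
    return {
        group: [term for code in codes for term in card.get(code, [])]
        for group, codes in GROUPS.items()
    }


# Hypothetical card; the category assignments are illustrative.
card = {
    "UF": ["controlled vocabularies"],
    "BT": ["documentation"],
    "RT": ["dictionaries", "glossaries"],
    "PT": ["keywords"],
}
summary = summarize(card)
print(summary["synonyms"])
```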
Cascading Terms are all Terms referenced by the Narrower Terms and the Part Terms,
cascading throughout the entire Thesaurus. This implies, in some cases, the great-(times
10) grandchildren of the parent Descriptor. An example is shown for the Term
"Behaviour" in Figure 5. In the case of Related Terms, only first-order Terms are included,
as Related Terms of Related Terms are not necessarily related to the original Descriptor.
Narrow Terms (legible entries): animal behaviour, aggression, hostility, violence, group behaviour, crime, juvenile delinquency, riots, vandalism
Related Terms (legible entries): affectivity, life styles, physiological effect, process, psychological stress
Part Terms (legible entries): attitudes, hostility, understanding, behavioural patterns
Figure 5
Cascading Terms
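Collecting Cascading Terms amounts to a traversal of the Narrower Term and Part Term references. The following Python sketch uses a breadth-first traversal with a visited set, which is one possible approach and not necessarily how the HyperCard script does it. The function name `cascading_terms` is hypothetical, and the nesting in the sample data is assumed for illustration.

```python
from collections import deque


def cascading_terms(cards, start):
    """Return every Term reachable from `start` through NT and PT fields."""
    seen = {start}  # guards against cycles in the thesaurus
    order = []
    queue = deque([start])
    while queue:
        card = cards.get(queue.popleft(), {})
        for child in card.get("NT", []) + card.get("PT", []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical hierarchy; the nesting shown here is an assumption.
cards = {
    "behaviour": {
        "NT": ["animal behaviour", "group behaviour"],
        "PT": ["attitudes"],
    },
    "animal behaviour": {"NT": ["aggression"]},
    "aggression": {"NT": ["violence"]},
    "group behaviour": {"NT": ["crime"]},
}
print(cascading_terms(cards, "behaviour"))
```

Related Terms are deliberately left out of the traversal, in line with the first-order rule described above.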
Pointing and clicking on any Term in the columns for the Selected Terms, Audit Trail,
Cascading Terms, or Synonym List (Figures 3, 4, or 5) will bring the user back to
that Descriptor in the Thesaurus.
Advantages of a HyperCard Construction Thesaurus
Automated tools can assist the indexing tasks of librarians
Indexing publications is a time-consuming, repetitive task requiring considerable
experience on the part of the indexer. The HyperCard Construction Thesaurus provides a
recording device for the indexer that not only records desired terms, but also quickly brings the
user to the desired location in the text, without having to stumble through an alphabetic
listing.
Thesauri originally developed for hard copy presentation can receive new uses in electronic form
The CTCST was used extensively in the development of the Classification System for the
National Building Code of Canada. One principal complaint was the time involved in finding
a Descriptor in the 400 pages of the Thesaurus, sometimes taking up to 30 seconds to
retrieve the references to one word.
In addition, the HyperCard Construction Thesaurus eliminated the need for the KWIC
(keyword in context) listing (another 200-page document) of the CTCST, owing to the fact
that a simple flat file can always be searched as a KWIC. The software will find the
occurrence of a word whether it is the first or the last word in a field.
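The reason a searchable flat file makes a printed KWIC listing redundant can be shown in a few lines: every word of every Descriptor becomes an access point, whatever its position. This Python sketch builds such an index explicitly for illustration, whereas the stack simply searches the fields; `kwic_index` and the sample Descriptors are hypothetical.

```python
def kwic_index(descriptors):
    """Return sorted (keyword, descriptor) pairs, one per word occurrence."""
    entries = []
    for descriptor in descriptors:
        for word in descriptor.split():
            entries.append((word.lower(), descriptor))
    return sorted(entries)


index = kwic_index(["indexing languages", "keyword cards"])
for keyword, descriptor in index:
    print(f"{keyword:12} {descriptor}")
```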
Thesauri are essential for browsing large information datasets
The Construction Resources project has over 100 megabytes of information and this could
grow to 500 megabytes in the next 3 years. Any intelligent search methodology should
include a method to expand the search criteria through the use of Global and Specific Terms
and of Synonyms. The HyperCard Construction Thesaurus permits this, as shown in
Figure 4. These Terms could be used to expand the vocabulary. Vocabulary switching
could occur transparently to the user: the software could automatically include "dictionaries,
glossaries, indexing, and controlled vocabularies" as alternate search terms for "thesaurus".
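Transparent vocabulary switching could look something like the following Python sketch, in which the query is expanded with its synonyms before the documents are scanned. The synonym entry is the one quoted in the text; the function names, the simple substring matching and the sample document titles are assumptions for illustration.

```python
def expand_query(term, synonyms):
    """Return the search term followed by its alternate search terms."""
    expanded = [term]
    for alternate in synonyms.get(term, []):
        if alternate not in expanded:
            expanded.append(alternate)
    return expanded


def search(documents, term, synonyms):
    """Return documents containing the term or any of its alternates."""
    terms = expand_query(term, synonyms)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


# Synonym entry taken from the example in the text.
SYNONYMS = {
    "thesaurus": [
        "dictionaries",
        "glossaries",
        "indexing",
        "controlled vocabularies",
    ],
}
# Hypothetical document titles.
documents = [
    "Glossaries for building codes",
    "Thermal insulation of walls",
    "Indexing construction reports",
]
print(search(documents, "thesaurus", SYNONYMS))
```

Without the expansion step the same query returns nothing, since none of the sample titles contains the word "thesaurus".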
Performance of large HyperCard stacks is not debilitated by dataset size or number of records
Many years ago, in the early days of the Macintosh, the general feeling regarding this computer
was "it is a toy". The same was felt about HyperCard in its early days: "sure it was fun to
play with, but when you have to do real serious data manipulation, get a real database!"
Recent experiments with HyperCard and datasets approaching 5 megabytes have shown
that it can outperform many real databases for many operations. Experiments with the
HyperCard Construction Thesaurus have indicated that stacks approaching 15,000 records
perform extraordinarily well for a large number of tasks. This is attributed to the fact that
the CTCST contains a well-distributed vocabulary because it represents a full selection of
the English language. This fact allows the Find function to use the HyperCard internal
index efficiently and effectively. Other, non-homogeneous HyperCard stacks do not
perform as quickly for this operation.
Conclusions
HyperCard provides a good environment for the development of prototypes. Specifically,
this tool could assist indexers and construction practitioners, and it took under 2 man-weeks
to write and refine. Additional functionality could be designed, programmed and
implemented with ease. The HyperCard Construction Thesaurus will enrich the search
vocabulary for users browsing large datasets. It will provide new uses for a document
developed over 15 years ago and may perhaps initiate new interest in revising and updating the
CTCST. HyperCard can no longer be considered a toy, as it can perform as well as many
of the existing "real databases".
References
[Chailloux]
Chailloux, Jean-Jacques, PhD Thesis
[CTCST 78]
Canadian Thesaurus of Construction Science and Technology,
Construction Industry Development Council, Industry Science and
Technology Canada (formerly Department of Industry, Trade and
Commerce, Government of Canada), Ottawa, 1978
[NBCC 90]
National Building Code of Canada, Associate Committee on the National
Building Code, National Research Council Canada, Ottawa, Canada.
[Vanier 91]
Vanier, Dana J., "A Classification System to Extract Project-Specific
Building Codes", Computers and Building Regulations, VTT
Symposium 125, Espoo, Finland, May 1991.
[Vanier 90]
Vanier, Dana J., "Hypertext - A Computer Tool to Assist Building
Design", The Electronic Design Studio, MIT Press, Cambridge MA,
1990.
[Worling]
Worling, Jamie L., Mellon, B. Scott, Vanier, Dana J., "Electronic
Technical Information for the Construction Industry", to be published
in the near future in Electric Architect - Journal of Computers in
Architecture.