How does the Thesaurus work?
• Introduction
• Which backend fields are important for Thesaurus functionality?
• Processing differences between single and multi-hierarchy thesauri
• What tabs do I need for single and multi-hierarchy thesauri?
• How does the Thesaurus validate terms?
Introduction
The Thesaurus module stores one or more thesauri whose terms can be used to fill fields in other modules. It
supports both single hierarchy and multi-hierarchy thesauri and has the ability to:
• Define what validation should be performed when a term is entered into a field.
• Restrict the values entered into a field to a sub-tree of a thesaurus.
• Expand Thesaurus terms at query time.
Thesaurus functionality is configured in the KE EMu Registry (see
http://emudev.mel.kesoftware.com/Registry/index.html for details about the Registry options).
The Thesaurus module is accessed by selecting the Thesaurus button in the Command Centre. As with other
modules, it displays in Search mode by default. The KERichEdit and KELinkGrid controls include the
UseThesaurus property. When this option is selected, the Browse button is added to the control. When the
Browse button is selected, the Thesaurus enters Browse mode. In Browse mode the left half of the screen
displays a tree view of the thesauri stored in the ethesaurus database. This example shows three thesauri at
the top level of the tree: AAT (Art & Architecture Thesaurus), KE and LCSH TGM I (Library of Congress
Subject Headings). Select a sign to expand the tree:
www.kesoftware.com
The following symbols indicate thesaural relationships and whether or not a term is acceptable for use (ISO
5964 Annex A):
• A green tick indicates that a term may be selected for use.
• A red cross indicates that a term should not be used.
Select the Show Use/Used For Terms toggle button in the Tool bar to hide or reveal Use/Used For terms:
• A right pointing (purple) arrow indicates that a term has a Use term, e.g. Babies in the example
above has a Use term of Infants.
• A left pointing (mauve) arrow indicates that a term has a Used For term, e.g. Bacteria has a
Used For term of Germs.
Which backend fields are important for Thesaurus functionality?
A thesaurus is a hierarchy of records. Implementing this structure in EMu involves linking terms through
attachments. The following design approach was adopted:
• Each term may have both broader and narrower terms. The broader term link is the one that must be
made; from it, a reverse reference can be derived to give the broader term its narrower terms.
• A term may have a Use reference and, if it does, the Use term will have a Used For reference. The
Use term link is the one that must be made; from it, a reverse reference can be derived to give the
Use term its Used For terms.
Although this approach simplifies tree restructuring, it does require three queries to gather all information
about a term:
1. The first to get the term.
2. The second to find narrower terms.
3. The third to find Used For references.
To speed up the process, flags were added to indicate whether narrower terms or Used For references exist
for a term. The columns used for this purpose are:
TerHasNarrower_tab - nested table of flags
TerHasUseFor - single flag
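As a rough illustration of why these flags help, the following Python sketch derives both flags from the stored broader and Use links, so that a tree view could decide whether a term can be expanded or has Used For references without running the second and third queries. This is not EMu code: the Term class, its attribute names and the IRN values are hypothetical, and a People term has been invented as a broader term for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Term:
    """Hypothetical in-memory stand-in for a Thesaurus record."""
    irn: int
    name: str
    broader: Optional[int] = None  # IRN of the broader term, if any
    use: Optional[int] = None      # IRN of the Use term, if any


def compute_flags(terms: List[Term]) -> Dict[int, Dict[str, bool]]:
    """Derive 'has narrower' and 'has Used For' flags from the stored links.

    Assumes every broader/use IRN refers to a term in the same list.
    """
    flags = {t.irn: {"has_narrower": False, "has_used_for": False} for t in terms}
    for t in terms:
        if t.broader is not None:
            # The broader term gains a narrower term through the reverse reference.
            flags[t.broader]["has_narrower"] = True
        if t.use is not None:
            # The Use term gains a Used For reference through the reverse reference.
            flags[t.use]["has_used_for"] = True
    return flags


terms = [
    Term(irn=1, name="People"),              # invented broader term
    Term(irn=2, name="Infants", broader=1),
    Term(irn=3, name="Babies", use=2),       # Babies: Use Infants
    Term(irn=4, name="Bacteria"),
    Term(irn=5, name="Germs", use=4),        # Germs: Use Bacteria
]
flags = compute_flags(terms)
print(flags[1])  # flags for People
print(flags[2])  # flags for Infants
```

In this sketch the flags are recomputed from scratch; a stored flag, as described above, trades that recomputation for the need to keep the flag up to date whenever a link changes.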
When the Thesaurus is opened in Browse mode it accesses the Registry to determine whether or not there are
any entry terms for the column being accessed. If there are none, it displays the collection of top terms. A top
term is defined as a term that does not have any broader terms. This is indicated in the record by the flag:
TerTopTerm - single flag
A record status field has been included to allow each institution to assign its own record statuses to its
thesaurus. Because these statuses are institution-specific, an additional flag is required to determine which
records are valid for use: the selection in the Valid Term field (Yes / No) is converted to a flag that signifies
whether the record is valid for use. This flag is:
TerValidTermFlag - single flag
To verify that a term is in a particular sub-tree, an internal representation of a thesaurus hierarchy has been
introduced. Each record may be part of one or more hierarchies. Each hierarchy is represented as a
colon-separated list of IRNs. The internal hierarchy of a particular term is the colon-separated string of the
IRNs of all the broader terms of this term, from the top of the tree down. Thus a top term has no internal
hierarchy because it is at the top of the tree. A term that has a top term as its broader term has an internal
hierarchy consisting of the IRN of that top term, and so forth. The field used to represent this is:
TerInternalHierarchy_tab - nested table of internal hierarchies
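The construction of an internal hierarchy can be sketched as follows for the single hierarchy case. This is an illustration only, not EMu code: the broader mapping (term IRN to broader term IRN, with None for a top term), the function name and the IRN values are all hypothetical.

```python
from typing import Dict, Optional


def internal_hierarchy(irn: int, broader: Dict[int, Optional[int]]) -> str:
    """Return the colon-separated IRNs of all broader terms, top of the tree first."""
    chain = []
    seen = {irn}
    parent = broader.get(irn)
    while parent is not None:
        if parent in seen:
            # Guard against a broken tree in which a term is its own ancestor.
            raise ValueError(f"cycle detected at IRN {parent}")
        seen.add(parent)
        chain.append(parent)
        parent = broader.get(parent)
    # The chain was collected from the term upwards, so reverse it.
    return ":".join(str(i) for i in reversed(chain))


# Hypothetical tree: 12 is a top term, 54 is below 12, 87 below 54, 26 below 87.
broader = {12: None, 54: 12, 87: 54, 26: 87}

print(repr(internal_hierarchy(12, broader)))  # top term: empty string
print(internal_hierarchy(54, broader))
print(internal_hierarchy(26, broader))
```

For a multi-hierarchy thesaurus a term would carry one such string per hierarchy, which is why the field is a nested table.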
Processing differences between single and multi-hierarchy thesauri
The main difference between single and multi-hierarchy thesauri is the way the internal hierarchy and
narrower terms flag fields are used.
For a single hierarchy thesaurus the TerHasNarrower_tab field acts as a flag. This flag is set to no if there
are no narrower terms and to yes if there are. When the tree is adjusted (i.e. the broader terms of a record are
changed) the field TerInternalHierarchy_tab is adjusted accordingly to reflect the changes in tree hierarchy.
For multi-hierarchy thesauri the TerHasNarrower_tab field holds one flag per internal hierarchy. Each flag is
set to no if there are no narrower terms and to yes if there are. When the tree is adjusted (i.e. the broader
terms of a record are changed) each hierarchy in TerInternalHierarchy_tab is set by asking the user which
broader hierarchy to attach into (if more than one exists) and which sub-tree hierarchy to link the term into
(if one or more exists).
What tabs do I need for single and multi-hierarchy thesauri?
The following Registry entries give the tab definitions for the Thesaurus module:
The Default Tabs
Group|Default|Table|ethesaurus|Tabs|Default|All;-AllLcsTab;-AllAatTab;-AllHieTab;-AllHie2Tab;-AllRelTab
For a single hierarchy thesaurus
Group|Default|Table|ethesaurus|Tabs|TerAcronym|acronym|+AllHieTab
For a multi-hierarchy thesaurus
Group|Default|Table|ethesaurus|Tabs|TerAcronym|acronym|+AllHie2Tab;+AllRelTab
How does the Thesaurus validate terms?
There are two types of term validation with the Thesaurus – when a user:
1. inserts a record into the Thesaurus.
2. takes a term from the Thesaurus for use in another module.
When a term is inserted into a thesaurus a query is performed to ensure that it is unique within that thesaurus.
Every term must also have an acronym associated with it, e.g. AAT, which identifies which thesaurus the
term belongs to. If users wish to create their own thesaurus, a default can be set up to insert their acronym for
them. The query performed to test for uniqueness queries on both the term and the acronym. If a record
already exists with this term and acronym, the user is informed and is unable to save the record; if one
does not exist, the record is saved. This means that a term stored in the Thesaurus module is not
guaranteed to be unique across the whole module, because the same term may occur in different thesauri;
however, it will always be unique within a thesaurus.
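The uniqueness rule amounts to treating the pair of term and acronym as the key. A minimal Python sketch of that check follows. The in-memory store, the function name and the exception class are hypothetical, and exact (case-sensitive) matching is an assumption, since the comparison rules of the actual query are not described here.

```python
from typing import Set, Tuple


class DuplicateTermError(ValueError):
    """Raised when a term already exists within the same thesaurus."""


def insert_term(store: Set[Tuple[str, str]], term: str, acronym: str) -> None:
    """Save a term only if no record with the same term and acronym exists."""
    key = (term, acronym)
    if key in store:
        raise DuplicateTermError(f"{term!r} already exists in thesaurus {acronym!r}")
    store.add(key)


store: Set[Tuple[str, str]] = set()
insert_term(store, "Infants", "AAT")
insert_term(store, "Infants", "KE")  # same term, different thesaurus: accepted

try:
    insert_term(store, "Infants", "AAT")  # same term, same thesaurus: rejected
except DuplicateTermError as err:
    print("not saved:", err)
```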
When a thesaurus term is used in another module the following tests take place to ensure that it is valid:
• The internal hierarchy is checked to ensure that it is valid for this field.
  Each field that uses the thesaurus must define entry terms and from these it is possible to determine
  whether or not a term exists in a sub-tree. This is made possible by looking at the internal hierarchy
  of the entry terms. If the internal hierarchy of the selected term starts with the internal hierarchy of
  the entry term, the term is valid. As an example, an entry term has an internal hierarchy of 12:54. A
  term with an internal hierarchy of 12:54:87:26 would be valid whereas one with 25:54:37 would not.
• The term is then checked to determine if it has any Use references.
  If it does, the Use terms are queried for and added to the matching set.
• The term is checked to see if it is valid (valid flag set to "Y").
Therefore for a term to be valid for a given field it must have a matching internal hierarchy and have its valid
flag set to yes.
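The hierarchy and valid flag tests can be sketched in Python as below, using the 12:54 example from the text. The function names are hypothetical and this is not EMu code. Two assumptions are made: hierarchies are compared IRN by IRN rather than as raw strings, so that 12:54 is not mistaken for a prefix of 12:540, and a term with several internal hierarchies is accepted if any one of them matches any entry term hierarchy.

```python
from typing import Iterable, Tuple


def parse_hierarchy(value: str) -> Tuple[int, ...]:
    """Split a colon-separated IRN string such as '12:54' into a tuple of ints."""
    return tuple(int(part) for part in value.split(":") if part)


def in_sub_tree(term_hierarchy: str, entry_hierarchy: str) -> bool:
    """True if the term's internal hierarchy starts with the entry term's."""
    term = parse_hierarchy(term_hierarchy)
    entry = parse_hierarchy(entry_hierarchy)
    return term[:len(entry)] == entry


def is_valid_for_field(term_hierarchies: Iterable[str],
                       entry_hierarchies: Iterable[str],
                       valid_flag: str) -> bool:
    """A term must have its valid flag set to 'Y' and a matching internal hierarchy."""
    if valid_flag != "Y":
        return False
    entries = list(entry_hierarchies)
    return any(in_sub_tree(t, e) for t in term_hierarchies for e in entries)


print(in_sub_tree("12:54:87:26", "12:54"))  # the valid example from the text
print(in_sub_tree("25:54:37", "12:54"))     # the invalid example from the text
print(is_valid_for_field(["12:54:87:26"], ["12:54"], "N"))
```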