Modeling Classification Systems in Multicultural and Multilingual Contexts
Joan S. Mitchell
OCLC, Inc., Dublin, Ohio, USA
Marcia Lei Zeng
Kent State University, Kent, Ohio, USA
Maja Žumer
University of Ljubljana, Slovenia
Abstract
This paper reports on the second part of the authors' research on modeling classification
systems with the conceptual model defined in the Functional Requirements for Subject
Authority Data (FRSAD) final report. In an earlier study, the authors explored whether the
FRSAD conceptual model could be extended beyond subject authority data to model
classification data. The focus of the current study is to determine if classification data modeled
using FRSAD can be used to solve real-world discovery problems in multicultural and
multilingual contexts. The paper discusses the relationships between entities (same type or
different types) in the context of classification systems that involve multiple translations and/or
multicultural implementations. Results of two case studies are presented in detail: (a) two
instances of the DDC (DDC 22 in English, and the Swedish-English mixed translation of DDC
22), and (b) Chinese Library Classification. The use cases of conceptual models in practice are
also discussed.
Modeling Classification Systems in Multicultural and Multilingual Contexts
This paper reports on part of a long-term research project to model classification systems in
multilingual/multicultural contexts using general conceptual models to support organization and
discovery in Semantic Web settings. The work reported here uses the Functional Requirements
for Subject Authority Data (FRSAD) conceptual model to derive implementable data models
suitable for classification systems that involve multiple translations and/or multicultural
implementations.
The FRSAD conceptual model identifies entities, attributes, and relationships as they relate to
subject authority data (FRSAR, 2010). The two main entities are thema (any entity used as a
subject of a work) and nomen (any sign or sequence of signs that a thema is known by, referred
to or addressed as). Within a given subject authority system, a nomen should be the appellation
of one thema. The FRSAD conceptual model is summarized in figure 1.
Figure 1: FRSAD conceptual model
The FRSAD conceptual model includes two general attributes defined for themas, thema type
and scope note. Additional attributes may be defined in a specific implementation. The model
defines a wider set of attributes for nomens: type of nomen, scheme, reference source,
representation, language, script, script conversion, form, time of validity, audience and status.
Additional attributes may be defined in a specific implementation. The FRSAD conceptual
model also provides for relationships between entities of the same type (nomen-to-nomen,
thema-to-thema) and different types of entities (thema-to-nomen).
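As a rough illustration of the entities and relationships just described, the following Python sketch represents themas and nomens as simple data classes. The class and field names, the subset of nomen attributes shown, and the SubjectAuthoritySystem registry are assumptions made for this example and are not defined in the FRSAD report. The registry only enforces the statement that, within a given system, a nomen should be the appellation of one thema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Nomen:
    """A sign or sequence of signs by which a thema is known."""
    value: str
    nomen_type: str                  # FRSAD attribute "type of nomen"
    scheme: Optional[str] = None     # FRSAD attribute "scheme"
    language: Optional[str] = None   # FRSAD attribute "language"


@dataclass
class Thema:
    """Any entity used as a subject of a work."""
    thema_type: Optional[str] = None
    scope_note: Optional[str] = None
    nomens: list[Nomen] = field(default_factory=list)


class SubjectAuthoritySystem:
    """Hypothetical registry: one nomen names at most one thema."""

    def __init__(self) -> None:
        self._by_nomen: dict[Nomen, Thema] = {}

    def register(self, thema: Thema, nomen: Nomen) -> None:
        owner = self._by_nomen.get(nomen)
        if owner is not None and owner is not thema:
            raise ValueError(f"nomen {nomen.value!r} already names another thema")
        if owner is None:
            self._by_nomen[nomen] = thema
            thema.nomens.append(nomen)   # thema-to-nomen relationship

    def lookup(self, nomen: Nomen) -> Optional[Thema]:
        return self._by_nomen.get(nomen)


# Example values are illustrative only.
system = SubjectAuthoritySystem()
law = Thema(thema_type="class")
notation = Nomen("340", nomen_type="notation", scheme="DDC 22")
system.register(law, notation)
print(system.lookup(notation) is law)
```

A thema may carry many nomens, while the registry rejects any attempt to attach an already registered nomen to a second thema.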
While FRSAD was developed as a general model of aboutness, its original focus was subject
authority data (i.e., controlled vocabularies). Models based on the structures and functions of
controlled vocabularies (such as thesauri and subject heading systems) often need to be adjusted
or extended to accommodate classification systems, which have been developed with different
functions, structures and underlying theories. FRSAD and W3C’s Simple Knowledge
Organization System (SKOS) (W3C, 2009) are skewed toward conventional thesauri or subject
heading systems; how to implement these models for the complex data in classification
systems remains an open research question. Classification system developers have been
proposing their own SKOS extensions in order to publish their systems in the new formats;
such extensions in turn produce divergent data models, which defeats the original purpose of
a more interoperable model. A general implementation-oriented conceptual model based on
FRSAD may help to resolve these problems.
In an earlier study, we considered whether the FRSAD conceptual model could be extended
beyond subject authority data to model classification data (Mitchell, Zeng & Žumer, 2011). We
posited that the following general statements could be applied to any classification scheme
within the FRSAD conceptual model:
• thema corresponds to the full category description of the class
• nomen is the symbol (or surrogate) used to represent the full category description
Using the Dewey Decimal Classification (DDC)1 system as a case study, we found that the
FRSAD conceptual model appears to accommodate DDC data at the thema/nomen level. The
DDC case study also demonstrated that an understanding of the semantic relationships within a
classification scheme is required in order to make additional statements with respect to what can
be considered equivalent to the full category description of the class (thema) and what can be
considered as a surrogate to represent the full category description (nomen). As further work, we
recommended investigation of DDC translations and mappings in the context of the FRSAD
conceptual model, application of the model to other classification schemes, and modeling the
Relative Index as a separate controlled vocabulary to explore a topic-centered view.
In this paper, we investigate relationships between entities of the same type and between entities
of different types in the context of classification systems that involve multiple translations
and/or multicultural implementations. We use as our case studies (a) two instances of the DDC
(DDC 22 in English, and the Swedish-English mixed translation of DDC 22), and (b) Chinese
Library Classification.2
DDC Translation Case Study
For the DDC case study, we used the updated DDC 22 databases of the two language editions in
the WebDewey implementations (WebDewey [OCLC] and WebDewey [National Library of
Sweden]).3 DDC 22 (22/eng) was initially published in English in 2003 in print and web
versions; the Swedish-English mixed translation of DDC 22 (22/swe) was published in a web
version in 2011 based on the latest version of the 22/eng database. The Swedish mixed
translation is based on a model in which the category descriptions associated with the DDC are
expressed in the vernacular to form the basic framework of the translation; classes from the
corresponding English-language edition on which the mixed translation is based are ingested
directly to complete the hierarchies where needed (Mitchell, Rype & Svanberg, 2011).
In our initial study of whether the FRSAD conceptual model could be extended to model
classification data, we identified any Dewey class (e.g., 340 Law) as corresponding to a thema.
Nomens include class notation or surrogates, e.g., “340” (class notation),
“http://dewey.info/class/340/” (URI), “Social sciences/Law” (caption in the full hierarchical
context). Since topics in class-here notes in the DDC are considered to be functionally
equivalent to the class (“approximate the whole” of the class), they are also considered
alternative nomens for the class/thema. Relative Index terms with a functional equivalence4
relationship to the class are also alternative nomens for the class/thema.
1 DDC, Dewey, Dewey Decimal Classification and WebDewey are registered trademarks of OCLC Online
Computer Library Center, Inc.
2 In a companion study, the authors are investigating the application of the Functional Requirements for
Bibliographic Records (FRBR) model to classification systems and versions of those systems at the system level for
the purposes of specifying provenance of classification data and facilitating collaborative efforts for using and
reusing classification data, particularly in a linked data setting.
3 DDC 23 was published in print and web versions in English in 2011; however, the full Swedish-English
implementation of DDC 23 was not available in time for inclusion in the study.
Relationships between DDC classes in the translations
For any class at the same notational level in either language edition within equivalent
hierarchical structures, the following two statements hold true:
1. Thema [class] 22/eng = Thema [class] 22/swe
2. The nomen (the symbol or surrogate) used to represent the thema in either language
edition may be used interchangeably (if not limited by a language-specific attribute).
In figure 2, the class 741.5 Comic books, graphic novels, fotonovelas, cartoons, caricatures,
comic strips in the English-language edition is functionally equivalent to the corresponding class
741.5 in the Swedish mixed edition.
Figure 2: Equivalent classes at the same notational level
Because the themas are functionally equivalent, most nomens associated with the class in either
edition may be used interchangeably:
• 741.5
• Arts & recreation / Drawing & decorative arts / Drawing and drawings / Special
applications / Comic books, graphic novels, fotonovelas, cartoons, caricatures, comic
strips
• Konst & fritid / Teckningskonst och konsthantverk / Teckningskonst och teckningar
/ Speciella tillämpningsområden / Seriemagasin, grafiska romaner, fotonovelas, tecknade
serier, karikatyrer, skämtserier
• http://dewey.info/class/741.5/ (see footnote 5)
4 We use the term “functional equivalence” and the symbol “=” to signify near equivalence. Functional equivalence
is similar to the SKOS relationship “closeMatch”: concepts that are sufficiently similar that they can be used
interchangeably in some information retrieval applications (W3C, 2009).
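The interchangeability rule for class 741.5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `limited_to` field is a hypothetical way of recording a language-specific attribute, and the third nomen is a placeholder string, since the form of a language-specific URI is not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nomen:
    value: str
    # Hypothetical field: the edition to which a language-specific
    # attribute limits this nomen; None means no such limit.
    limited_to: Optional[str] = None


def interchangeable(nomen: Nomen, edition: str) -> bool:
    """True if the nomen may represent the shared thema in `edition`."""
    return nomen.limited_to is None or nomen.limited_to == edition


nomens_741_5 = [
    Nomen("741.5"),
    Nomen("http://dewey.info/class/741.5/"),
    # Placeholder, not a real URI: stands for a URI with a language attribute.
    Nomen("<URI with language attribute>", limited_to="22/swe"),
]

for edition in ("22/eng", "22/swe"):
    usable = [n.value for n in nomens_741_5 if interchangeable(n, edition)]
    print(edition, usable)
```

Under these assumptions the notation and the plain URI serve both editions, while the language-limited placeholder is usable only for the 22/swe instance of the thema.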
These statements also hold true for any class at the same notational level in either language
edition in which one class has been expanded further, and the other remains a logical
abridgment. In figure 3, thema [class T2—4888] 22/eng = thema [class T2—4888] 22/swe.
Figure 3: Equivalent classes at the same notational level, with further expansion in one edition
Figure 4 further illustrates the relationships between the terminal class T2—4888 (22/eng) and
interoperable subclasses T2—48884 (22/swe) and T2—48887 (22/swe). For the subclasses that
exist only in one of the hierarchies, the following relationships exist between editions:
1. Each subclass (thema) is represented by edition-specific nomens.
2. The nomens are valid within the edition with the expanded hierarchy, but are not valid
within the edition without the expanded hierarchy.
5 If the URI includes attribute=language, the URI cannot be used as a nomen for a thema beyond a language-specific
instance of a thema.
Figure 4: Relationships between terminal classes in one edition with interoperable subclasses in another edition
(figure labels: T2—4888+ (22/swe); T2—4888 (22/eng))
However, the individual themas may be summed as a scope note associated with the terminal
class in the other hierarchy. The nomens expressed as truncated symbols may be used to retrieve
the terminal class; surrogate nomens (e.g., index terms) associated with the subclasses may also
be used to retrieve the terminal class in the other hierarchy, but cannot be used to represent the
terminal class.
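Retrieval by truncated notation can be illustrated with a short Python sketch. It assumes that each edition is represented simply as a set of valid notations and that truncation proceeds one character at a time; the function name `resolve_notation` is hypothetical.

```python
from typing import Optional


def resolve_notation(notation: str, edition_classes: set[str]) -> Optional[str]:
    """Return the longest leading part of `notation` that is a class in the edition.

    The result identifies the class to retrieve; per the text, the finer
    notation still cannot *represent* the terminal class.
    """
    candidate = notation
    while candidate:
        if candidate in edition_classes:
            return candidate
        candidate = candidate[:-1]   # truncate one symbol from the right
    return None


# Notations taken from the figure 3 and figure 4 discussion.
eng = {"T2—4888"}
swe = {"T2—4888", "T2—48884", "T2—48887"}

print(resolve_notation("T2—48884", swe))   # the subclass itself
print(resolve_notation("T2—48884", eng))   # falls back to the terminal class
```

A production system would also need to respect notational features such as decimal points and table prefixes; the character-by-character loop is only the simplest form of the idea.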
Relationships between index terms
Relative Index terms with a functional-equivalence relationship to the class (Relative Index
terms that match either the caption or topics in the class-here note), and Relative Index terms
that are near-synonyms of such index terms (i.e., lexical variants), are considered
alternative nomens for the class/thema. The same nomen/thema relationships across editions
hold as described above.
Within the translation system that supports development and maintenance of DDC editions in
various languages, there is the ability to capture functional equivalence between index terms.
The relationships supported by this feature may prove important in future work. In figure 5, the
English-language index terms associated with class 796.963 are functionally equivalent to the
parallel index entries in the Swedish version. “Bandy” can be used as an alternative nomen for
class 796.963 in either edition. “Rink bandy” and “Innebandy,” while functionally equivalent to
each other as index terms, cannot serve as alternative nomens for class 796.963; neither is
functionally equivalent to the class 796.963 because each is part of a scope note (Mitchell, Zeng
& Žumer, 2011, p. 246).
Figure 5: Relationships between Relative Index terms
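The distinction drawn for class 796.963 can be expressed as data. The sketch below assumes a hypothetical `equivalent_to_class` flag on each index term and a separate set recording term-to-term equivalence across editions; neither name comes from the DDC translation system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexTerm:
    term: str
    class_number: str
    # True if the term matches the caption or a class-here topic;
    # False if it names a topic that is only part of a scope note.
    equivalent_to_class: bool


terms = [
    IndexTerm("Bandy", "796.963", True),
    IndexTerm("Rink bandy", "796.963", False),
    IndexTerm("Innebandy", "796.963", False),
]

# Functional equivalence between index terms across editions.
term_equivalences = {("Rink bandy", "Innebandy")}


def alternative_nomens(class_number: str, index: list[IndexTerm]) -> list[str]:
    """Index terms that may serve as alternative nomens for the class."""
    return sorted(
        {t.term for t in index if t.class_number == class_number and t.equivalent_to_class}
    )


def terms_equivalent(a: str, b: str) -> bool:
    """Term-to-term equivalence, independent of any equivalence to the class."""
    return a == b or (a, b) in term_equivalences or (b, a) in term_equivalences


print(alternative_nomens("796.963", terms))
print(terms_equivalent("Innebandy", "Rink bandy"))
```

Keeping the two relationships separate lets a term pair be equivalent to each other without either term becoming a nomen for the class.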
Chinese Library Classification Case Study
A second focus of our investigation is the Chinese Library Classification (CLC) 5th edition
(CLC, 2012). Our interest here lies in applying the FRSAD conceptual model to a scheme
developed outside an Anglophone viewpoint to see if additional issues emerge.
The sample classes were selected from the Chinese culture-related areas, including:
• J292 Chinese Calligraphy, Sculpture (subordinate of J2 Painting). Chinese calligraphy is
a distinctive case because users usually look for calligraphy works by (a) style
and (b) historical period. CLC thus provides J292.2 Calligraphy works according to time
period, and J292.3 Calligraphy works according to style.
• R2 Chinese Medicine. Chinese medicine is one part of the R Medicine schedule. R2 was
selected because Chinese medicine is a whole system that covers topics from theories and
classics to medicinal materials, diagnosis, and clinical practices. A mapping between R2 of
the CLC and Medical Subject Headings (MeSH) has existed since the 1990s (Lin, 1992).
• D93/97 Law of different countries. These classes were selected to represent
multicultural areas. Because the laws of potentially any country may need to be classified
for collections in Chinese libraries, it is of interest to see whether a predefined
classificatory structure can be applied to every country’s laws regardless of social system.
• A general table of nationality was also selected for the study. It is one of the seven
general tables used together with the major schedules. Many specialized tables are also
available in each schedule.
All sample classes selected (J292, R2, and D93/97) appear to have the same components. Using
J292 as an example (see figure 6), the individual classes in the CLC have the following
components: class here, class elsewhere, include, see also, and a note on subdividing by
following the examples of specific class(es).
Figure 6: Class J29 Calligraphy in CLC explained in FRSAD terminology
Although the CLC grew from different roots, its microstructures of hierarchies and class
descriptions are very similar to those of the DDC. In this case, the thema is Class J292. The
nomens of the thema include the CLC number J292, the full caption 书法、篆刻, and its record
number C007181. We consider as thema any topic coextensive with the full meaning of the
class; such topics appear in the “Class here” note and are functionally equivalent to
the class. Following the framework used for the DDC, the “Include” and “Subdivide following
examples” notes are considered scope notes; they are not equal to the class or the thema here.
The “Class elsewhere” and “See also” notes indicate associative relationships between this
thema and other themas.
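The reading of the CLC note types in FRSAD terms can be summarized as a small lookup table. In the Python sketch below, the `Role` enumeration, the `ClassRecord` class, and the particular notes attached to the sample record are assumptions for illustration; only the note type names and the three nomens of J292 come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    EQUIVALENT_TOPIC = "topic functionally equivalent to the class"
    SCOPE_NOTE = "scope note (not equal to the thema)"
    ASSOCIATIVE = "associative relationship to another thema"


# Note types named in the text, mapped to the role assigned to them there.
NOTE_ROLES = {
    "Class here": Role.EQUIVALENT_TOPIC,
    "Include": Role.SCOPE_NOTE,
    "Subdivide following examples": Role.SCOPE_NOTE,
    "Class elsewhere": Role.ASSOCIATIVE,
    "See also": Role.ASSOCIATIVE,
}


@dataclass
class ClassRecord:
    nomens: dict[str, str]                              # nomen type -> value
    notes: dict[str, list[str]] = field(default_factory=dict)

    def notes_with_role(self, role: Role) -> list[str]:
        """Return the note types on this record that play the given role."""
        return [kind for kind in self.notes if NOTE_ROLES.get(kind) is role]


j292 = ClassRecord(
    nomens={"notation": "J292", "caption": "书法、篆刻", "record number": "C007181"},
    # Note contents are left empty; only the note types matter for this sketch.
    notes={"Class here": [], "Include": [], "See also": []},
)
print(j292.notes_with_role(Role.SCOPE_NOTE))
```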
The Chinese Library Classification provides “alternative class” notation for classes that can be
members of two different superordinate classes. Square brackets are used to indicate
alternative notation. In the CLC, alternative notation is most frequently associated with
interdisciplinary concepts.6 For example, the preferred position for Environmental Biology in the
CLC is X17; the alternative class is Q89 (see figure 7).
6 In contrast, the DDC provides options most frequently for culturally sensitive topics not given preferred treatment
in the standard notation, e.g., jurisdiction, ethnic/national group, language, religion (Mitchell, 1995).
Q Biological sciences
……
[Q89] Environmental biology
Preferred class: X17
X1 Environmental Sciences – Basic Theory
……
X17 Environmental biology
Note: Here under X17 Environmental biology, all subdivisions and semantic relationships
between a class and other classes are systematically presented. If in an implementation a
decision is made to use Q89 as the preferred class and X17 as the alternative class, the
bracket will be moved from Q89 to X17. The subdivisions under Q89 will be formed following
those listed under previous X17. The semantic relationships of those classes will be kept.
Figure 7: Alternative notation [Q89] and its equivalent, preferred class X17 (Panzer & Zeng, 2009)
What does this mean in terms of the FRSAD model? Is the thema “[Q89] Environmental
biology” equivalent to the thema “X17 Environmental biology”? The answer depends on the
inheritance rules within a particular classification system. In terms of the thema-nomen
relationship, “Environmental biology”@CLC has two nomens (Q89 and X17). The nomen-nomen
relationship between them is similar to what is called ‘equivalence’ in a thesaurus.
Because the subdivisions and semantic relationships within “Environmental biology” remain as
a whole unit, the thema can be considered the same no matter which family it belongs to. What
will be different is the semantic relationship of “Environmental biology” with its superordinate
or family, either “X Environmental sciences” or “Q Biological sciences”. An alternative
reading, however, is that the change of superordinate or family also changes the thema
itself, so that “X17 Environmental biology” is a different thema from “Q89 Environmental
biology”. We welcome discussion on this issue, and
plan to explore it further as part of our future work on mappings between classification systems.
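The two readings of the alternative class can be contrasted concretely. The Python sketch below is illustrative only: the `Thema` class, its `family` field, and the two builder functions are hypothetical names, and the sketch takes no position on which reading is correct.

```python
from dataclasses import dataclass, field


@dataclass
class Thema:
    label: str
    family: str                       # top-level family, e.g. "X" or "Q"
    nomens: list[str] = field(default_factory=list)


def single_thema_reading() -> dict[str, Thema]:
    """One thema with two nomens; only its family varies by implementation."""
    env_bio = Thema("Environmental biology", family="X", nomens=["X17", "Q89"])
    return {"X17": env_bio, "Q89": env_bio}


def two_thema_reading() -> dict[str, Thema]:
    """Two themas, because a change of family is taken to change the thema."""
    x17 = Thema("Environmental biology", family="X", nomens=["X17"])
    q89 = Thema("Environmental biology", family="Q", nomens=["Q89"])
    return {"X17": x17, "Q89": q89}


first = single_thema_reading()
second = two_thema_reading()
print(first["X17"] is first["Q89"])     # same thema reached by either nomen
print(second["X17"] is second["Q89"])   # distinct themas
```

Under the first reading a lookup by either notation reaches the same object, so a mapping to another scheme is stated once; under the second, each notation would need its own mapping.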
Use Cases
The key motivation for our work is to determine if classification data modeled using the
implementation-oriented FRSAD conceptual model can be used to solve real-world discovery
problems in multicultural and multilingual contexts. Although conceptual models are purely
theoretical, they are important in practice. They provide a thorough and detailed description
of a domain, procedure, service, etc., and thus enable deep understanding. Such understanding
is particularly necessary when designing any computer application.
The value of FRSAD is that it fosters understanding of knowledge organization systems (KOS)
in general. By clearly differentiating the thing itself (thema) and the label used to refer to
it (nomen) and, consequently, the attributes and relationships associated with the two entity
types, FRSAD enables the understanding of and interoperability among various KOS that have
inherited different data models. For the same reason, FRSAD also supports the design of new KOS.
When talking about multicultural and multilingual issues of KOS in general and classification
systems in particular, the use cases for a conceptual model include translation, mapping,
integration and discovery. If we go back to figure 4, the scope note of T2—4888 (22/eng) can be
automatically extended by all the geographic areas explicitly provided for in subdivisions of
T2—4888 (22/swe); this is an example of integration. (Such integration can take place virtually;
it does not need to be stored in the scope note of the English-language class.) In turn, integration
supports discovery. For example, the Relative Index terms associated with the geographic areas
found in the subdivisions of T2—4888+ (22/swe) can be used to retrieve information at the level
of specificity provided for in either edition. Thus Pajala kommun (Sverige), the Relative Index
term associated with T2—488846 (22/swe), can be used to retrieve works classed with
T2—488846 or with T2—4888.
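The integration and discovery use case can be sketched as follows. The Python code assumes that the finer edition's index is a simple mapping from index term to notation and that the coarser edition is a set of notations; the function names are hypothetical, and index terms stand in for the subclass captions, which are not given here.

```python
def notations_for_retrieval(term: str, index: dict[str, str], coarse: set[str]) -> list[str]:
    """Notations to search for `term`: its own class in the finer edition,
    then the terminal class that contains it in the coarser edition."""
    specific = index.get(term)
    if specific is None:
        return []
    result = [specific]
    prefix = specific[:-1]
    while prefix:
        if prefix in coarse:
            result.append(prefix)
            break
        prefix = prefix[:-1]
    return result


def virtual_scope_note(terminal: str, index: dict[str, str]) -> list[str]:
    """Terms from the finer edition's subdivisions, computed on demand
    rather than stored in the coarser edition's scope note."""
    return sorted(
        term for term, notation in index.items()
        if notation.startswith(terminal) and notation != terminal
    )


swe_index = {"Pajala kommun (Sverige)": "T2—488846"}
eng_classes = {"T2—4888"}

print(notations_for_retrieval("Pajala kommun (Sverige)", swe_index, eng_classes))
print(virtual_scope_note("T2—4888", swe_index))
```

Computing the scope note extension at query time reflects the point that the integration can take place virtually.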
Future Work
In our initial study, our third recommendation for future work was to model the Relative Index as a
separate controlled vocabulary under the FRSAD conceptual model to explore a topic-centered
view and expose topic-to-topic relationships. As part of this future work, we plan to investigate
mapped terminology from other controlled vocabularies. In some cases, the relationship between
the mapped terminology and the DDC class is formally defined; in other instances, the mapped
term – DDC class relationship is a looser association motivated by extending the access
vocabulary in the system. Mapped headings are nomens for themas in those respective subject
heading systems; can these headings be used as nomens for the classes (themas) in the DDC to
which they are mapped? The answer likely depends on the relationship (if any) specified
between the subject heading and the DDC class to which it is mapped. We also plan further
investigation of mapped relationships between different classification systems.
References
Chinese Library Classification (CLC). (2012). Edition 5 Web version. Beijing, National Library
of China. Available at: http://1.202.200.235/login.aspx
IFLA Working Group on the Functional Requirements for Subject Authority Records (FRSAR).
(2010). Functional Requirements for Subject Authority Data (FRSAD): A Conceptual Model.
Edited by M. L. Zeng, M. Žumer, & A. Salaba. Available at:
http://www.ifla.org/files/classification-and-indexing/functional-requirements-for-subject-authority-data/frsad-final-report.pdf.
Lin, M. et al. (Ed.) (1992). Concordance of the Chinese Library Classification Schedule R,
Medical Subject Headings (MeSH), and Thesaurus of Chinese Medicine. Beijing: China
Science and Technology Press.
Mitchell, J. S. (1995). Options in the Dewey Decimal Classification system: The current
perspective. Co-published simultaneously in Cataloging & Classification Quarterly, 19, 3/4,
89-103; and Classification: Options and Opportunities. Edited by A. Thomas, 89-103.
Binghamton, NY: Haworth.
Mitchell, J. S., Rype, I., & Svanberg, M. (2011). Mixed translations of the DDC: Design,
usability, and implications for knowledge organization in multilingual environments. In:
Subject access: Preparing for the future. Edited by P. Landry, L. Bultrini, E. T. O’Neill, & S.
K. Roe, 77-89. Berlin: De Gruyter Saur.
Mitchell, J. S., Zeng, M. L., & Žumer, M. (2011). Extending models for controlled vocabularies
to classification systems: Modelling DDC with FRSAD. In Classification & ontology:
formal approaches and access to knowledge: proceedings of the International UDC Seminar,
19-20 September 2011, The Hague, The Netherlands. Edited by A. Slavic & E. Civallero,
241-250. Würzburg: Ergon Verlag.
Panzer, M., & Zeng, M. L. (2009). Modeling classification systems in SKOS: some challenges
and best-practice recommendations. In: Semantic interoperability of linked data:
Proceedings of the International Conference on Dublin Core and Metadata Applications,
Seoul, October 12-16, 2009. Edited by S. Oh, S. Sugimoto, & S. A. Sutton, 3-14. Seoul:
Dublin Core Metadata Initiative and National Library of Korea. Available at:
http://dcpapers.dublincore.org/ojs/pubs/article/view/974
W3C. (2009). SKOS Simple Knowledge Organization System Reference. (W3C Recommendation
August 18, 2009). Edited by A. Miles and S. Bechhofer. Available at: http://www.w3.org/TR/skos-reference/
WebDewey. Dublin, OH: OCLC. Available at: http://dewey.org/webdewey.
WebDewey. Stockholm: National Library of Sweden. Available at: http://deweysv.pansoft.de.
Joan S. Mitchell is editor in chief of the Dewey Decimal Classification (DDC) system at OCLC
Online Computer Library Center, Inc. Prior to joining OCLC in 1993, she was director of
educational technology at Carnegie Mellon University and an adjunct professor in the School of
Information Sciences at the University of Pittsburgh. Her research interests include localization
and interoperability in classification systems, and the application of general conceptual models to
classification systems. In 2005, the American Library Association awarded her the Melvil
Dewey Medal, which recognizes distinguished service to the profession of librarianship.
Marcia Lei Zeng is a professor at Kent State University. She has been involved in the
development and research of knowledge organization systems for over 20 years and has been
contributing to related standards including NISO Z39.19 and ISO 25964 for controlled
vocabularies. She was also the chair of the IFLA Working Group that developed the model of
Functional Requirements for Subject Authority Data (FRSAD), and an Invited Expert on the
W3C Library Linked Data Incubator Group. She is a member of the Executive Board of the
International Society for Knowledge Organization (ISKO) and Director-at-large of the American
Society for Information Science and Technology (ASIS&T).
Maja Žumer is Professor of Information Science at the University of Ljubljana (Slovenia). Her
research interests include design and evaluation of information retrieval systems, end-user
interfaces, and conceptual modeling. She has been involved in several IFLA working groups,
NISO committees, and several EU projects. She has received several international and national
research grants. She is a member of the IFLA FRBR Review Group and was the co-chair of the
IFLA Working Group on the Functional Requirements for Subject Authority Records (FRSAR).