Modeling Classification Systems in Multicultural and Multilingual Contexts

Joan S. Mitchell
OCLC, Inc., Dublin, Ohio, USA

Marcia Lei Zeng
Kent State University, Kent, Ohio, USA

Maja Žumer
University of Ljubljana, Slovenia

Abstract

This paper reports on the second part of an initiative by the authors to study classification systems using the conceptual model defined in the Functional Requirements for Subject Authority Data (FRSAD) final report. In an earlier study, the authors explored whether the FRSAD conceptual model could be extended beyond subject authority data to model classification data. The focus of the current study is to determine whether classification data modeled using FRSAD can be used to solve real-world discovery problems in multicultural and multilingual contexts. The paper discusses the relationships between entities (of the same type or of different types) in the context of classification systems that involve multiple translations and/or multicultural implementations. Results of two case studies are presented in detail: (a) two instances of the DDC (DDC 22 in English, and the Swedish-English mixed translation of DDC 22), and (b) the Chinese Library Classification. The use cases of conceptual models in practice are also discussed.

Modeling Classification Systems in Multicultural and Multilingual Contexts

This paper reports on part of a long-term research project to model classification systems in multilingual/multicultural contexts using general conceptual models to support organization and discovery in Semantic Web settings. The work reported here uses the Functional Requirements for Subject Authority Data (FRSAD) conceptual model to derive implementable data models suitable for classification systems that involve multiple translations and/or multicultural implementations.
The FRSAD conceptual model identifies entities, attributes, and relationships as they relate to subject authority data (FRSAR, 2010). The two main entities are thema (any entity used as a subject of a work) and nomen (any sign or sequence of signs that a thema is known by, referred to or addressed as). Within a given subject authority system, a nomen should be the appellation of one thema. The FRSAD conceptual model is summarized in figure 1.

Figure 1: FRSAD conceptual model

The FRSAD conceptual model includes two general attributes defined for themas: thema type and scope note. Additional attributes may be defined in a specific implementation. The model defines a wider set of attributes for nomens: type of nomen, scheme, reference source, representation, language, script, script conversion, form, time of validity, audience and status. Here too, additional attributes may be defined in a specific implementation. The FRSAD conceptual model also provides for relationships between entities of the same type (nomen-to-nomen, thema-to-thema) and between entities of different types (thema-to-nomen).

While FRSAD was developed as a general model of aboutness, the original focus was subject authority data (i.e., controlled vocabularies). Models based on the structures and functions of controlled vocabularies (such as thesauri and subject heading systems) often need to be adjusted or extended to accommodate classification systems, which have been developed with different functional emphases, structures, and underlying theories. FRSAD and W3C’s Simple Knowledge Organization System (SKOS) (W3C, 2009) are skewed toward conventional thesauri or subject heading systems; the research question of how to implement these models for the complex data in classification systems remains to be answered.
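To make the entities and attributes described above more concrete, the following is a minimal Python sketch of themas and nomens. It is an illustration only, not a data model prescribed by FRSAD: the class names, field names, and the `SubjectAuthority` registry are hypothetical, and only a subset of the nomen attributes listed above is included. The registry enforces the statement that, within a given scheme, a nomen should be the appellation of one thema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Nomen:
    """A sign or sequence of signs by which a thema is known (subset of attributes)."""
    value: str
    nomen_type: str                 # e.g. "notation", "caption", "URI"
    scheme: str                     # e.g. "DDC 22"
    language: Optional[str] = None  # None means not language-specific


@dataclass
class Thema:
    """Any entity used as a subject of a work."""
    thema_id: str                   # hypothetical local identifier
    thema_type: Optional[str] = None
    scope_note: Optional[str] = None
    nomens: List[Nomen] = field(default_factory=list)


class SubjectAuthority:
    """Hypothetical registry: within one scheme, a nomen names exactly one thema."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], Thema] = {}

    def add_nomen(self, thema: Thema, nomen: Nomen) -> None:
        key = (nomen.scheme, nomen.value)
        owner = self._index.get(key)
        # Reject a nomen that already names a different thema in this scheme.
        if owner is not None and owner is not thema:
            raise ValueError(f"{nomen.value!r} already names {owner.thema_id}")
        self._index[key] = thema
        if nomen not in thema.nomens:
            thema.nomens.append(nomen)

    def lookup(self, scheme: str, value: str) -> Optional[Thema]:
        """Return the thema named by this nomen value, or None if unknown."""
        return self._index.get((scheme, value))
```

A thema can carry any number of nomens, while each (scheme, value) pair resolves to at most one thema. A real implementation would likely add the remaining nomen attributes and the thema-to-thema and nomen-to-nomen relationships.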
Classification system developers have been proposing their own SKOS extensions in order to present their systems in the new formats; such extensions in turn create divergent data models, which defeats the original purpose of a more interoperable model. A general, implementation-oriented conceptual model based on FRSAD may help resolve these problems.

In an earlier study, we considered whether the FRSAD conceptual model could be extended beyond subject authority data to model classification data (Mitchell, Zeng & Žumer, 2011). We posited that the following general statements could be applied to any classification scheme within the FRSAD conceptual model:
• thema corresponds to the full category description of the class
• nomen is the symbol (or surrogate) used to represent the full category description

Using the Dewey Decimal Classification (DDC)¹ system as a case study, we found that the FRSAD conceptual model appears to accommodate DDC data at the thema/nomen level. The DDC case study also demonstrated that an understanding of the semantic relationships within a classification scheme is required in order to make additional statements about what can be considered equivalent to the full category description of the class (thema) and what can be considered a surrogate representing the full category description (nomen). As further work, we recommended investigation of DDC translations and mappings in the context of the FRSAD conceptual model, application of the model to other classification schemes, and modeling the Relative Index as a separate controlled vocabulary to explore a topic-centered view.

In this paper, we investigate relationships between entities of the same type and between entities of different types in the context of classification systems that involve multiple translations and/or multicultural implementations.
We use as our case studies (a) two instances of the DDC (DDC 22 in English, and the Swedish-English mixed translation of DDC 22), and (b) the Chinese Library Classification.²

DDC Translation Case Study

For the DDC case study, we used the updated DDC 22 databases of the two language editions in the WebDewey implementations (WebDewey [OCLC] and WebDewey [National Library of Sweden]).³ DDC 22 (22/eng) was initially published in English in 2003 in print and web versions; the Swedish-English mixed translation of DDC 22 (22/swe) was published in a web version in 2011 based on the latest version of the 22/eng database. The Swedish mixed translation is based on a model in which the category descriptions associated with the DDC are expressed in the vernacular to form the basic framework of the translation; classes from the corresponding English-language edition on which the mixed translation is based are ingested directly to complete the hierarchies where needed (Mitchell, Rype & Svanberg, 2011).

In our initial study of whether the FRSAD conceptual model could be extended to model classification data, we identified any Dewey class (e.g., 340 Law) as corresponding to a thema. Nomens include class notation or surrogates, e.g., “340” (class notation), “http://dewey.info/class/340/” (URI), “Social sciences/Law” (caption in the full hierarchical context). Since topics in class-here notes in the DDC are considered to be functionally equivalent to the class (“approximate the whole” of the class), they are also considered alternative nomens for the class/thema. Relative Index terms with a functional equivalence⁴ relationship to the class are also alternative nomens for the class/thema.

¹ DDC, Dewey, Dewey Decimal Classification and WebDewey are registered trademarks of OCLC Online Computer Library Center, Inc.
² In a companion study, the authors are investigating the application of the Functional Requirements for Bibliographic Records (FRBR) model to classification systems and versions of those systems at the system level for the purposes of specifying provenance of classification data and facilitating collaborative efforts for using and reusing classification data, particularly in a linked data setting.
³ DDC 23 was published in print and web versions in English in 2011; however, the full Swedish-English implementation of DDC 23 was not available in time for inclusion in the study.
⁴ We use the term “functional equivalence” and the symbol “=” to signify near equivalence. Functional equivalence is similar to the SKOS relationship “closeMatch”: concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications (W3C, 2009).

Relationships between DDC classes in the translations

For any class at the same notational level in either language edition within equivalent hierarchical structures, the following two statements hold true:
1. Thema [class] 22/eng = Thema [class] 22/swe
2. The nomen (the symbol or surrogate used to represent the thema) in either language edition may be used interchangeably (if not limited by a language-specific attribute).

In figure 2, the class 741.5 Comic books, graphic novels, fotonovelas, cartoons, caricatures, comic strips in the English-language edition is functionally equivalent to the corresponding class 741.5 in the Swedish mixed edition.

Figure 2: Equivalent classes at the same notational level

Because the themas are functionally equivalent, most nomens associated with the class in either edition may be used interchangeably:
• 741.5
• Arts & recreation / Drawing & decorative arts / Drawing and drawings / Special applications / Comic books, graphic novels, fotonovelas, cartoons, caricatures, comic strips
• Konst & fritid / Teckningskonst och konsthantverk / Teckningskonst och teckningar / Speciella tillämpningsområden / Seriemagasin, grafiska romaner, fotonovelas, tecknade serier, karikatyrer, skämtserier
• http://dewey.info/class/741.5/⁵

⁵ If the URI includes attribute=language, the URI cannot be used as a nomen for a thema beyond a language-specific instance of a thema.

These statements also hold true for any class at the same notational level in either language edition in which one class has been expanded further, and the other remains a logical abridgment. In figure 3, thema [class T2—4888] 22/eng = thema [class T2—4888] 22/swe.

Figure 3: Equivalent classes at the same notational level, with further expansion in one edition

Figure 4 further illustrates the relationships between the terminal class T2—4888 (22/eng) and interoperable subclasses T2—48884 (22/swe) and T2—48887 (22/swe). For the subclasses that exist only in one of the hierarchies, the following relationships exist between editions:
1. Each subclass (thema) is represented by edition-specific nomens.
2. The nomens are valid within the edition with the expanded hierarchy, but are not valid within the edition without the expanded hierarchy.

Figure 4: Relationships between terminal classes in one edition with interoperable subclasses in another edition (panels: T2—4888+ (22/swe) and T2—4888 (22/eng))

However, the individual themas may be summed as a scope note associated with the terminal class in the other hierarchy. The nomens expressed as truncated symbols may be used to retrieve the terminal class; surrogate nomens (e.g., index terms) associated with the subclasses may also be used to retrieve the terminal class in the other hierarchy, but cannot be used to represent the terminal class.
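The retrieval-by-truncation behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the mechanism used in WebDewey: the function name is hypothetical, and the two notation sets are small made-up samples in which only T2—4888, T2—48884, and T2—48887 are taken from the text. The idea is that a notation valid only in the expanded edition is shortened one character at a time until it matches a class that exists in the other edition.

```python
from typing import Optional, Set


def nearest_valid_class(notation: str, valid_notations: Set[str]) -> Optional[str]:
    """Return the longest prefix of `notation` that is a valid class, or None."""
    candidate = notation
    while candidate:
        if candidate in valid_notations:
            return candidate
        # Truncate by one character and try the broader notation.
        candidate = candidate[:-1]
    return None


# Illustrative samples only; the broader notations are assumed ancestors.
ENG_22 = {"T2—4", "T2—48", "T2—488", "T2—4888"}
SWE_22 = ENG_22 | {"T2—48884", "T2—48887"}

# A Swedish-only subclass resolves to the terminal class in the English edition.
terminal_in_english = nearest_valid_class("T2—48884", ENG_22)
```

Here `terminal_in_english` is "T2—4888", while the same notation looked up against `SWE_22` resolves to itself. The truncated notation retrieves the terminal class; it does not make the subclass notation a valid nomen in the edition without the expanded hierarchy.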
Relationships between index terms

Relative Index terms with a functional-equivalence relationship to the class (Relative Index terms that match either the caption or topics in the class-here note), and Relative Index terms with a near-synonym relationship to such index terms (i.e., lexical variants), are considered alternative nomens for the class/thema. The same nomen/thema relationships across editions hold as described above.

The translation system that supports development and maintenance of DDC editions in various languages can capture functional equivalence between index terms. The relationships supported by this feature may prove important in future work. In figure 5, the English-language index terms associated with class 796.963 are functionally equivalent to the parallel index entries in the Swedish version. “Bandy” can be used as an alternative nomen for class 796.963 in either edition. “Rink bandy” and “Innebandy,” while functionally equivalent to each other as index terms, cannot serve as alternative nomens for class 796.963; neither is functionally equivalent to the class 796.963 because each is part of a scope note (Mitchell, Zeng & Žumer, 2011, 246).

Figure 5: Relationships between Relative Index terms

Chinese Library Classification Case Study

A second focus of our investigation is the Chinese Library Classification (CLC) 5th edition (CLC, 2012). Our interest here lies in applying the FRSAD conceptual model to a scheme developed outside an Anglophone viewpoint to see if additional issues emerge. The sample classes were selected from Chinese culture-related areas and from multicultural areas:
• J292 Chinese Calligraphy, Sculpture (subordinate of J2 Painting). Chinese calligraphy is a distinctive case because users usually look for calligraphy works based on (a) style and (b) historical period. CLC thus provides J292.2 Calligraphy works according to time period, and J292.3 Calligraphy works according to style.
• R2 Chinese Medicine. Chinese medicine is one part of the R Medicine schedule. R2 was selected because Chinese medicine is a whole system that covers topics from theories and classics to medicinal materials, diagnosis, and clinical practices. A mapping between R2 of the CLC and Medical Subject Headings (MeSH) has existed since the 1990s (Lin, 1992).
• D93/97 Law of different countries. These classes were selected to represent multicultural areas. Because the laws of potentially any country may need to be classified for collections in Chinese libraries, it is of interest to see whether a pre-defined classificatory structure can be applied to every country’s laws regardless of social system.

A general table of nationality was also selected for the study. It is one of the seven general tables to be used together with the major schedules. Many specialized tables are also available in each schedule.

All sample classes selected (J292, R2, and D93/97) appear to have the same components. Using J292 as an example (see figure 6), the CLC provides the following components for individual classes: class here, class elsewhere, include, see also, and a note on subdividing that should follow the examples of specific class(es).

Figure 6: Class J29 Calligraphy in CLC explained in FRSAD terminology

Although it grew from different roots, the CLC’s microstructures of hierarchies and class descriptions are very similar to the DDC’s microstructures. In this case, the thema is Class J292. The nomens of the thema include CLC number J292, the full caption 书法、篆刻, and its record number C007181. We consider any topic co-extensive with the full meaning of the class, as found in the “Class here” note, to be the thema: these are the topics that are functionally equivalent to the class. Using the framework that the DDC has used, the “Include” and “Subdivide following examples” notes are considered scope notes; they are not equal to the class or the thema here.
The “Class elsewhere” and “See also” notes indicate the associative relationships between this thema and other themas.

The Chinese Library Classification provides “alternative class” notation for classes that can be members of two different superordinate classes. Square brackets are used to indicate alternative notation. In the CLC, alternative notation is most frequently associated with interdisciplinary concepts.⁶ For example, the preferred position for Environmental Biology in the CLC is X17; the alternative class is Q89 (see figure 7).

⁶ In contrast, the DDC provides options most frequently for culturally sensitive topics not given preferred treatment in the standard notation, e.g., jurisdiction, ethnic/national group, language, religion (Mitchell, 1995).

Q Biological sciences
    ……
    [Q89] Environmental biology
        Preferred class: X17

X1 Environmental Sciences – Basic Theory
    ……
    X17 Environmental biology

Note: Here under X17 Environmental biology, all subdivisions and semantic relationships between a class and other classes are systematically presented. If in an implementation a decision is made to use Q89 as the preferred class and X17 as the alternative class, the bracket will be moved from Q89 to X17. The subdivisions under Q89 will be formed following those listed under previous X17. The semantic relationships of those classes will be kept.

Figure 7: Alternative notation [Q89] and its equivalent, preferred class X17 (Panzer & Zeng, 2009)

What does this mean in terms of the FRSAD model? Is the thema “[Q89] Environmental biology” equivalent to the thema “X17 Environmental biology”? The answer depends on the inheritance rules within a particular classification system. In terms of the thema-nomen relationship, “Environmental biology”@CLC has two nomens (Q89 and X17). The nomen-nomen relationship between them is similar to what is called ‘equivalence’ in a thesaurus.
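Under the reading in which “Environmental biology”@CLC is a single thema with two nomens, the bracket swap described in the note to figure 7 can be sketched as follows. This is a hypothetical illustration, not a CLC data format: the class name `ClassedThema` and its methods are assumptions, and the sketch deliberately ignores subdivisions and the relationship to the superordinate class.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClassedThema:
    """One thema with a preferred notation and bracketed alternative notations."""
    caption: str
    preferred: str
    alternatives: Tuple[str, ...] = ()

    def notations(self) -> Tuple[str, ...]:
        """Preferred notation first; alternatives shown in square brackets."""
        return (self.preferred,) + tuple(f"[{n}]" for n in self.alternatives)

    def with_preferred(self, notation: str) -> "ClassedThema":
        """Return a local variant in which `notation` becomes the preferred class."""
        if notation == self.preferred:
            return self
        if notation not in self.alternatives:
            raise ValueError(f"{notation!r} is not an alternative notation")
        # The former preferred notation moves into the bracketed alternatives.
        others = tuple(n for n in self.alternatives if n != notation)
        return ClassedThema(self.caption, notation, others + (self.preferred,))


standard = ClassedThema("Environmental biology", preferred="X17", alternatives=("Q89",))
local = standard.with_preferred("Q89")
```

In this sketch `standard.notations()` gives ("X17", "[Q89]") and `local.notations()` gives ("Q89", "[X17]"), while the caption is unchanged. Whether the swap leaves the thema itself unchanged is a modeling decision that the sketch simply assumes.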
Because the subdivisions and semantic relationships within “Environmental biology” remain as a whole unit, the thema can be considered the same no matter which family it belongs to. What differs is the semantic relationship of “Environmental biology” with its superordinate or family, either “X Environmental sciences” or “Q Biological sciences”. A different understanding, however, is that because the superordinate or family has changed, the original thema (the “Environmental biology” class) has also changed. On this view, “X17 Environmental biology” is a different thema from “Q89 Environmental biology”. We welcome discussion on this issue, and plan to explore it further as part of our future work on mappings between classification systems.

Use Cases

The key motivation for our work is to determine whether classification data modeled using the implementation-oriented FRSAD conceptual model can be used to solve real-world discovery problems in multicultural and multilingual contexts.

Although purely theoretical, conceptual models are important in practice. These models provide a thorough and detailed description of a domain, procedure, service, etc., and thus enable deep understanding. Such understanding is particularly necessary when designing any computer application.

The value of FRSAD is that it fosters understanding of knowledge organization systems (KOS) in general. By clearly differentiating the thing itself (thema) from the label used to refer to it (nomen) and, consequently, the attributes and relationships associated with the two entity types, FRSAD enables understanding of, and interoperability among, KOS that have inherited different data models. For the same reason, FRSAD also supports the design of new KOS.

For multicultural and multilingual issues of KOS in general and classification systems in particular, the use cases for a conceptual model include translation, mapping, integration and discovery.
If we go back to figure 4, the scope note of T2—4888 (22/eng) can be automatically extended by all the geographic areas explicitly provided for in subdivisions of T2—4888 (22/swe); this is an example of integration. (Such integration can take place virtually; it does not need to be stored in the scope note of the English-language class.) In turn, integration supports discovery. For example, the Relative Index terms associated with the geographic areas found in the subdivisions of T2—4888+ (22/swe) can be used to retrieve information at the level of specificity provided for in either edition, e.g., Pajala kommun (Sverige), the Relative Index term associated with T2—488846 (22/swe), can be used to retrieve works classed with T2—488846 or with T2—4888.

Future Work

In our initial study, our third recommendation for future work was to model the Relative Index as a separate controlled vocabulary under the FRSAD conceptual model to explore a topic-centered view and expose topic-to-topic relationships. As part of this future work, we plan to investigate mapped terminology from other controlled vocabularies. In some cases, the relationship between the mapped terminology and the DDC class is formally defined; in other instances, the mapped term – DDC class relationship is a looser association motivated by extending the access vocabulary in the system. Mapped headings are nomens for themas in those respective subject heading systems; can these headings be used as nomens for the classes (themas) in the DDC to which they are mapped? The answer likely depends on the relationship (if any) specified between the subject heading and the DDC class to which it is mapped. We also plan further investigation of mapped relationships between different classification systems.

References

Chinese Library Classification (CLC). (2012). Edition 5 Web version. Beijing: National Library of China. Available at: http://1.202.200.235/login.aspx

IFLA Working Group on the Functional Requirements for Subject Authority Records (FRSAR). (2010). Functional Requirements for Subject Authority Data (FRSAD): A Conceptual Model. Edited by M. L. Zeng, M. Žumer, & A. Salaba. Available at: http://www.ifla.org/files/classification-and-indexing/functional-requirements-for-subject-authority-data/frsad-final-report.pdf

Lin, M. et al. (Ed.) (1992). Concordance of the Chinese Library Classification Schedule R, Medical Subject Headings (MeSH), and Thesaurus of Chinese Medicine. Beijing: China Science and Technology Press.

Mitchell, J. S. (1995). Options in the Dewey Decimal Classification system: The current perspective. Co-published simultaneously in Cataloging & Classification Quarterly, 19, 3/4, 89-103; and Classification: Options and Opportunities. Edited by A. Thomas, 89-103. Binghamton, NY: Haworth.

Mitchell, J. S., Rype, I., & Svanberg, M. (2011). Mixed translations of the DDC: Design, usability, and implications for knowledge organization in multilingual environments. In: Subject access: Preparing for the future. Edited by P. Landry, L. Bultrini, E. T. O’Neill, & S. K. Roe, 77-89. Berlin: De Gruyter Saur.

Mitchell, J. S., Zeng, M. L., & Žumer, M. (2011). Extending models for controlled vocabularies to classification systems: Modelling DDC with FRSAD. In: Classification & ontology: formal approaches and access to knowledge: proceedings of the International UDC Seminar, 19-20 September 2011, The Hague, The Netherlands. Edited by A. Slavic & E. Civallero, 241-250. Würzburg: Ergon Verlag.

Panzer, M., & Zeng, M. L. (2009). Modeling classification systems in SKOS: some challenges and best-practice recommendations. In: Semantic interoperability of linked data: Proceedings of the International Conference on Dublin Core and Metadata Applications, Seoul, October 12-16, 2009. Edited by S. Oh, S. Sugimoto, & S. A. Sutton, 3-14. Seoul: Dublin Core Metadata Initiative and National Library of Korea. Available at: http://dcpapers.dublincore.org/ojs/pubs/article/view/974

W3C. (2009). SKOS Simple Knowledge Organization System Reference. (W3C Recommendation August 18, 2009). Edited by A. Miles and S. Bechhofer. Available at: http://www.w3.org/TR/skos-reference/

WebDewey. Dublin, OH: OCLC. Available at: http://dewey.org/webdewey

WebDewey. Stockholm: National Library of Sweden. Available at: http://deweysv.pansoft.de

Joan S. Mitchell is editor in chief of the Dewey Decimal Classification (DDC) system at OCLC Online Computer Library Center, Inc. Prior to joining OCLC in 1993, she was director of educational technology at Carnegie Mellon University and an adjunct professor in the School of Information Sciences at the University of Pittsburgh. Her research interests include localization and interoperability in classification systems, and the application of general conceptual models to classification systems. In 2005, the American Library Association awarded her the Melvil Dewey Medal, which recognizes distinguished service to the profession of librarianship.

Marcia Lei Zeng is professor at Kent State University. She has been involved in the development and research of knowledge organization systems for over 20 years and has been contributing to related standards including NISO Z39.19 and ISO 25964 for controlled vocabularies. She was also the chair of the IFLA Working Group that developed the model of Functional Requirements for Subject Authority Data (FRSAD), and an Invited Expert on the W3C Library Linked Data Incubator Group. She is a member of the Executive Board of the International Society for Knowledge Organization (ISKO) and Director-at-large of the American Society for Information Science and Technology (ASIS&T).

Maja Žumer is Professor of Information Science at the University of Ljubljana (Slovenia). Her research interests include design and evaluation of information retrieval systems, end-user interfaces, and conceptual modeling. She has been involved in several IFLA working groups, NISO committees, and several EU projects. She has received several international and national research grants. She is a member of the IFLA FRBR Review Group and was the co-chair of the IFLA Working Group on the Functional Requirements for Subject Authority Records (FRSAR).