Cognizant 20-20 Insights | March 2016

Cognitive Integration: How Canonical Models and Controlled Vocabulary Enable Smarter and Faster Systems Interoperability

Pharma companies are showing greater interest in adopting the canonical model approach to provide a standardized and highly abstracted way for partners to integrate with their systems. But without a controlled vocabulary that defines the semantics behind this approach, systems integration can be a difficult, costly and time-consuming activity for all parties.

Executive Summary

Today, everything impacts everything else. “Six degrees of separation” is an anachronism, since nothing seems that far away. In a world characterized by dense interconnections,[1] the ability to integrate seamlessly is the precursor to an ordered coexistence. Business has moved forward, forging new partnerships to deliver new capabilities and customer experiences. Digital is premised on agility and innovation. Integration is a foundational capability for success in the new, disruptive digital world.

Systems integration has long been on most enterprises’ radar; literature abounds with methods of integration,[2] from data-centric and services-centric approaches through process-centric models. The abundance of tools and platforms testifies to the continued challenges of integration and the relative shortcomings of the various approaches. Newer modes of offering, such as cloud-based software as a service (SaaS), accentuate the challenge further.

When integration was confined to a smaller number of systems, the problem was easily handled, primarily because most of the systems involved were within an enterprise’s boundaries. Such integrations were executed using point-to-point approaches that served this purpose well. Point-to-point approaches began to show their inherent weakness when the integrating systems involved partner applications and/or SaaS applications.
The systems involved are now often outside the direct influence of an enterprise. In addition, these disparate systems challenged the way IT departments handled differences in data models as well as in the data itself. This created the need for greater flexibility and richer contextual understanding: cognitive integration. In principle, cognitive integration adds semantic capabilities for reasoning.

Developments in canonical models and controlled vocabulary give enterprises a way to achieve cognitive integration (i.e., the systems integrate semantically without excruciating coding efforts). In brief, canonical models provide an abstracted representation of entities, while controlled vocabulary provides the agreed meaning of the terms and values exchanged. Canonical models help in standardization; controlled vocabulary helps to alleviate semantic differences between systems.

The emerging nature of business ecosystems and the attendant integration challenges can be better appreciated by looking at a realistic business scenario. This white paper explores an integration approach that combines a canonical model with controlled vocabulary and illustrates how this facilitates cognitive integration. Although the approach is generally applicable to any domain, the issues of possible multiple interpretations of data and the need to add context for appropriate usage semantics are best understood by examining the type of data being exchanged between systems in the life sciences domain.

Business Context

The process of bringing a new drug to market is time-consuming, requiring numerous ecosystem players to work together. Patients, regulators, scientists, manufacturers, key opinion leaders and supply chain stakeholders all play vital roles at various stages of the process. When so many business entities need to collaborate, there is an inherent need for information exchange.
It is highly desirable, therefore, that such information exchange is achieved in a way that handles semantic differences intelligently.

Clinical trials comprise a significant part of the efforts to bring new drugs to market. Conducting clinical trials is an endeavor in itself, and specialized business entities such as contract research organizations (CROs) assist sponsors in this effort. CROs also offer services beyond clinical trials, such as filing and regulatory affairs. The global CRO market is approximately $27 billion and is set to hit $32.7 billion by 2017.[3] CROs will continue to grow as pharmaceuticals companies continue outsourcing certain portfolios of studies to them, while retaining some of the studies and other core competencies in house.

Consider the situation where a large pharmaceuticals organization is working with a number of CROs. Several trials may be ongoing simultaneously, and each CRO may be working on one or many trials. Each trial might have a specific way of gathering and organizing data. Complicating matters further, CROs often have their own systems to manage the clinical data they collect. The exchange of experimental data within and outside of pharmaceuticals companies becomes an integration nightmare, due to the number of CROs and other partners and the variations in their data formats. There is additional complexity: The absence of standardization or universally accepted norms can lead to multiple interpretations of the data.

A simple solution to these challenges would be to map and reconcile the data manually. However, this laborious effort is not scalable: The addition of a new CRO partner would require the pharmaceuticals company to repeat these efforts. What if a standardized approach to interaction could be used that allows for dynamism in the way information can be expressed by different CROs?
If that were possible, the integration of information across various participating systems could be more elegantly handled. For successful integration with business partners, the following would ideally be required:

• It should be possible to achieve integration without needing to make major changes to the systems used by the pharmaceuticals company or the CRO.
• It should be possible for the CRO to explore and understand how to integrate with the pharmaceuticals company.
• Addition of new CROs should not require significant system integration effort.

Canonical Models and Controlled Vocabulary

Consider two applications being integrated. In point-to-point integration, changes need to be made to both applications. The same approach would have to be repeated for any new application to be integrated. This introduces brittleness and avoidable engineering effort. To alleviate the point-to-point integration pains, canonical models were proposed. In a canonical approach, each application translates its data into a common format understandable to all applications; this loosely coupled pattern minimizes the impact of change.

[Figure 1: Canonical Model. Partners 1 through 4 exchange data with the enterprise’s logical data model through a canonical data model.]

A canonical approach aims to create a common logical model for all applications that need to be integrated. It is not influenced by technology. In this approach, all applications use the canonical model to exchange information. Implementation specifics determine the exact mechanisms of data transfer, usually achieved via some kind of transformation logic (see Figure 1). While this loosely coupled approach appears to be a panacea for integration woes, it is not without challenges.
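To make the loose-coupling claim concrete before turning to those challenges, the short sketch below counts the translators each style needs. It assumes the simplest case, in which every application exchanges data with every other one in both directions; the function names are illustrative and do not come from any library.

```python
def point_to_point_translators(n_apps: int) -> int:
    """One translator per ordered pair of applications."""
    return n_apps * (n_apps - 1)


def canonical_translators(n_apps: int) -> int:
    """One translator into and one out of the canonical format per application."""
    return 2 * n_apps


# Compare the two styles for a few ecosystem sizes.
for n in (2, 4, 10):
    print(n, point_to_point_translators(n), canonical_translators(n))
```

Under these assumptions, adding an eleventh application means 20 new point-to-point translators but only 2 canonical ones. For two systems the canonical layer actually needs more translators (4 versus 2), which is consistent with point-to-point approaches having served small, in-house integrations well.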
The canonical model, by introducing an additional layer, can aggravate semantic integration. What was usually resolved between applications by directly understanding the contextual underpinnings now becomes more difficult to resolve. Consider the scenario for a concept known as “culture,” a term commonly encountered in microbiology,[4] for which the CRO and the pharmaceuticals company’s systems use different data model definitions with unique semantics (see Figure 2). This illustration has been simplified to include only a few attributes to avoid complicating the subject. The canonical model defined by the pharmaceuticals enterprise for the purpose of integration with the CRO systems could be visualized as in Figure 3.

Although the canonical model provides fields to map the CRO and pharmaceuticals company’s data models, the data values can continue to pose integration challenges. While “Petri Dish” and “Plate” denote the same type of container in Figure 2, the CRO would have no knowledge that the pharmaceuticals organization has standardized on the term “Petri Dish,” and sending any other equivalent term will not help to semantically integrate the two systems. The companies need a better, more cognitive ability to understand such nuances during integration. Controlled vocabulary (CV) is an attempt to provide such improved cognition.
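As a rough picture of what such a vocabulary might hold, the sketch below models one CV term with a scope, a set of preferred values and agreed synonyms. The class `ControlledTerm`, its fields and the scope label are assumptions made for illustration; only the “Petri Dish”/“Plate” pair comes from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class ControlledTerm:
    """One controlled-vocabulary term: scope, preferred values, agreed synonyms."""
    name: str
    scope: str
    preferred: set[str]
    synonyms: dict[str, str] = field(default_factory=dict)  # synonym -> preferred value

    def resolve(self, value: str) -> str:
        """Return the preferred value for `value`, or raise if it is not in the CV."""
        key = value.strip().casefold()
        for p in self.preferred:
            if p.casefold() == key:
                return p
        for synonym, p in self.synonyms.items():
            if synonym.casefold() == key:
                return p
        raise ValueError(f"{value!r} is not a valid value for CV term {self.name!r}")


container = ControlledTerm(
    name="Container",
    scope="Microbiology",  # hypothetical scope label
    preferred={"Petri Dish"},
    synonyms={"Plate": "Petri Dish"},
)

print(container.resolve("plate"))
```

Rejecting unknown values rather than passing them through is a design choice in this sketch: it surfaces a vocabulary gap at integration time instead of letting an unrecognized synonym reach the receiving system.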
A CV is a predefined, authorized term/concept with agreed alternates or synonyms, mapped to a set of valid and unique values, and has a defined scope or describes a specific domain.[5] For simple illustration purposes, consider Figure 4, where the CV “Gender” is the concept and can use “Male” or “Female” as values.

It is not unusual for a term to have different meanings depending on the context in which it is used. For example, “Temperature” is a very common term, but incubation temperature and storage temperature convey different meanings although both are temperatures. CV equips programmers with a context for each term and thus supports more accurate usage. Figure 5 shows potential CV usage.

Figure 2: Fictitious Culture Definition: CRO and Pharmaceuticals Company

CRO Definition (Culture)        Pharma Definition (Culture)
Species Name: String            Species Tax Id: Long
Growth Temperature: String      Incubation Temp: Float
                                Temp UOM: String
Container: String               Media Container: String

CRO Culture Data                Pharma Culture Data
“S. cerevisiae 101 S”           1337652
“32 °C”                         32.0
                                “°Celsius”
“Plate”                         “Petri Dish”

Unit of measure (UoM) is a good example of a CV that depends on the measurement context, since every measurement type needs to be expressed in terms of some unit. The context can be length, temperature or weight. The allowed values need to be restricted depending on the measurement context. The UoM CV can be visualized as depicted in Figure 6.

Figure 3: Canonical Model for Culture

<<Entity>> Culture
+ taxonomyId: string
+ growthTemperature: float
+ temperatureUOM: string
+ growthContainer: string

Figure 4: Controlled Vocabulary: An Illustrative Example

Term/Concept: Gender
Valid, unique values: Male, Female

Figure 5: CV Examples

Domain            Context       CV Term                  Value(s)
Pharmaceuticals   Taxonomy      Species                  Saccharomyces cerevisiae, S. cerevisiae
Poultry Farming   Incubation    Incubation Temperature   30, 37
Pharmaceuticals   Storage       Storage Temperature      4, 10, 30
General           Temperature   Unit of Measure          °Celsius, °C, °Fahrenheit, °F

Figure 6: Understanding CV Context

UOM context    Allowed values
Length         [millimeter, meter, inch, feet]
Temperature    [Celsius, Fahrenheit, Kelvin]
Weight         [gram, pound, ounce]

Cognitive Integration

Figure 7 illustrates the shortcomings of integration achieved using a canonical model without CV. As illustrated in Figure 2, the integrating CRO might have a different model for Culture. Figure 7 reveals the complex and brittle transformation required to populate the CRO data into the canonical model. Given the above scenario, the following points are observed:

• Temperature needs to be split into value and UoM.
• Though the value of the Container attribute can be mapped to the canonical model, the value itself cannot be consumed directly by the pharmaceuticals application, since that application uses a different value, a synonym, for the Container.

These transformations become more complicated as data models grow and offer more scope for differences in values or data types, and the effort has to be repeated for each CRO. CV can provide better context and resolve integration semantics. A potential architecture is shown in Figure 8. The numbers in Figure 8 indicate the flow from the perspective of the integrating partner (CRO): get canonical model -> get CV -> populate canonical model -> send data. The snippets of code illustrate how an attribute (Container) in the canonical model (Culture) is further cognitively elaborated in CV.

The pharmaceuticals company can offer semantically powerful integration by publishing the canonical model with an accompanying controlled vocabulary. These models and data can be made available as data as a service (DaaS) via OData or an equivalent framework.
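As a hedged illustration of the first two steps of that flow, the sketch below builds the two request URLs a CRO might issue against an OData service. The `$metadata` document and the `$filter` and `$select` query options are standard OData conventions; the service root, the `ControlledVocabulary` entity set and its property names are hypothetical, since the paper does not specify them, and the sketch is not a reproduction of the original snippets.

```python
from urllib.parse import quote, urlencode

# Hypothetical service root; a real pharma partner would publish its own.
SERVICE_ROOT = "https://pharma.example.com/odata"


def canonical_model_url() -> str:
    """Step 1: an OData service describes its entity model at $metadata."""
    return f"{SERVICE_ROOT}/$metadata"


def cv_lookup_url(term: str) -> str:
    """Step 2: query a hypothetical ControlledVocabulary entity set for one term."""
    escaped = term.replace("'", "''")  # OData doubles single quotes in string literals
    options = {
        "$filter": f"Term eq '{escaped}'",
        "$select": "PreferredValue,Synonyms",
    }
    # Keep $, comma and quote characters readable; encode spaces as %20.
    query = urlencode(options, safe="$,'", quote_via=quote)
    return f"{SERVICE_ROOT}/ControlledVocabulary?{query}"


print(canonical_model_url())
print(cv_lookup_url("Container"))
```

The sketch only constructs URLs and sends nothing, so it runs without network access; issuing the requests and parsing the responses would depend on the service the pharmaceuticals company actually publishes.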
We suggest OData here since it is a standard devised specifically for sharing data and also has excellent discovery/query capabilities.[6] A CRO that wants to integrate with a pharma company should first inspect the canonical model and try to understand it. To better understand the semantics behind the canonical model, the CRO could query the CV through the published OData DaaS. The CRO can then proceed with integration much more smoothly using the canonical model and CV.

Figure 7: Integration Using Canonical Model Without CV

CRO value -> transformation -> canonical Culture attribute
“S. cerevisiae 101 S” -> look up taxonomy ID from online source and pass ID -> taxonomyId: string
“32 °C” -> parse float value; split value and unit -> growthTemperature: float
“32 °C” -> hardcode unit to “°Celsius” where “°C” -> temperatureUOM: string
“Plate” -> hardcode container to “Petri Dish” where “Plate” -> growthContainer: string

[Figure 8: Integration Architecture with Canonical Model and CV. CRO 1 through CRO n, and future partners, (1) inspect the canonical model and retrieve CV info and (2) retrieve the preferred CV value from the canonical model and CV data web service, a RESTful web service (OData) backed by a controlled vocabulary database; they then (3) populate the canonical model (smart data population) and (4) integrate smoothly with the pharma application and data.]

With this approach, integration between the CRO and the pharmaceuticals data model can happen through significantly smarter, smoother and reusable/repeatable transformations. Note that minor changes in the values stored in the CRO data model can assist in smoother integration. In Figure 9, storing the taxonomy ID instead of the full species name, and the temperature value without the unit, does not require data model changes. At the same time, integration is much smoother. The UoM can be stored in a separate column or table as the CRO chooses. Again, this is a relatively minor change, but implementation specifics need to be considered before deciding on the options available.
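The Figure 9 flow can be sketched in a few lines. The sketch assumes the CRO has already retrieved the relevant CV entries and holds them in a dictionary keyed by canonical attribute; the `CV` contents, the `preferred` helper and the `to_canonical` function are hypothetical, while the `Culture` attributes follow the canonical model in Figure 3.

```python
from dataclasses import dataclass

# Hypothetical CV content retrieved from the pharma company: for each canonical
# attribute, accepted synonyms (lower-cased) map to the preferred value.
CV = {
    "temperatureUOM": {"°c": "°Celsius", "celsius": "°Celsius", "°celsius": "°Celsius"},
    "growthContainer": {"plate": "Petri Dish", "petri dish": "Petri Dish"},
}


@dataclass
class Culture:
    """Canonical Culture entity with the four attributes of Figure 3."""
    taxonomyId: str
    growthTemperature: float
    temperatureUOM: str
    growthContainer: str


def preferred(attribute: str, value: str) -> str:
    """Look up the preferred CV value; fail loudly on values the CV does not know."""
    try:
        return CV[attribute][value.strip().lower()]
    except KeyError:
        raise ValueError(f"{value!r} is not in the CV for {attribute}") from None


def to_canonical(tax_id: str, temperature: str, uom: str, container: str) -> Culture:
    """Populate the canonical model from CRO values without per-CRO hardcoding."""
    return Culture(
        taxonomyId=tax_id,
        growthTemperature=float(temperature),
        temperatureUOM=preferred("temperatureUOM", uom),
        growthContainer=preferred("growthContainer", container),
    )


print(to_canonical("1337652", "32", "°C", "Plate"))
```

In contrast to the hardcoded rules of Figure 7, supporting a new synonym here means adding an entry to the vocabulary data rather than changing the transformation code.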
However, to enable this cognitive intelligence:

• CVs need to be defined with preferred synonyms for values, and maintained and shared by the pharmaceuticals company with partnering CROs.
• The canonical model exposed by the pharmaceuticals company needs to state which CV to refer to for each attribute.
• CV is preferably exposed to CROs via DaaS.

Figure 9: Integration Using Canonical Model and CV

CRO value -> transformation -> canonical Culture attribute
“1337652” -> taxonomyId: string
“32” -> parse float value -> growthTemperature: float
“°C” -> retrieve preferred synonym from pharma UOM CV -> temperatureUOM: string
“Plate” -> retrieve preferred synonym from pharma Container CV -> growthContainer: string

The above is a simple example. Imagine how much this approach can help with the disparate data models and large data sets involved in information sharing between various CROs and pharmaceuticals organizations. However, beware that the benefits of this approach could be limited or even counterproductive unless a conscious collaborative effort is invested in standardizing the CV.

Looking Forward

Canonical models continue to fascinate integrators because they can drive standardization. However, the approach has also met with considerable resistance due to the perceived complexity and the additional engineering effort required. At some level in integration tasks, as we have illustrated in this paper, engineers still have to tackle nearly the same transformation challenges as with point-to-point integration, albeit in a different form. We have illustrated that with CV these challenges can be minimized and better cognition achieved.

We foresee integration between pharmaceuticals companies and CROs moving toward as-a-service offerings. We envisage that enterprises across the industry will publish canonical models and CV as services for partners. We strongly believe that tools and platforms in this area will achieve significant growth.
Simplicity, reduction of engineering efforts and elegance will determine the success of these integration offerings. We also believe that research and advances in knowledge management will strongly influence CV and, indirectly, canonical models. Advances in artificial intelligence in the area of reasoning and logic are likely to boost cognitive capabilities. We foresee an exciting future with multifarious disciplines coming together to create innovative possibilities.

Footnotes

[1] Saha, Pallab, “A Systemic Perspective to Managing Complexity with Enterprise Architecture.” 1-580 (2014), DOI:10.4018/978-1-4666-4518-9.
[2] Eliana Kaneshima and Rosana T. Vaccare Braga, “Patterns for enterprise application integration,” from Proceedings of the 9th Latin-American Conference on Pattern Languages of Programming (SugarLoafPLoP ’12). ACM, New York City, Article 2, 16 pages. DOI=http://dx.doi.org/10.1145/2591028.2600811.
[3] http://www.clinicalleader.com/doc/an-overview-of-top-clinical-cros-0001.
[4] https://en.wikipedia.org/wiki/Microbiological_culture.
[5] Alasdair J. G. Gray, Norman Gray, and Iadh Ounis, “Searching and exploring controlled vocabularies,” from Proceedings of the WSDM ’09 Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR ’09), ACM, New York City, 1-5. DOI=http://dx.doi.org/10.1145/1506250.1506252.
[6] http://www.odata.org.

About the Authors

Raghuraman Krishnamurthy is a Senior Director within Cognizant’s Life Sciences business unit. Raghu has over 22 years of IT experience and is responsible for pre-sales, solutions, architecture and technology consulting for life sciences customers. He focuses on cloud, mobility and big data. Raghu holds a master’s degree from IIT, Bombay and MOOC certificates from Harvard, Wharton, Stanford and MIT. He can be reached at [email protected] | LinkedIn: https://www.linkedin.com/pub/raghuraman-krishnamurthy/4/1a9/ba0.
Vinod Ranganathan is a Senior Architect within Cognizant’s Life Sciences business unit. He has over 14 years of combined experience in the life sciences and IT domains and is responsible for solutions and architecture proposals and design, technology consulting and implementation guidance for life sciences customers and projects. Vinod’s primary expertise is in Java-related technologies with an active interest in big data and cloud technologies. He holds a master’s degree in biotechnology from Pune University, a diploma in advanced computing from C-DAC, Pune and is a TOGAF 9 certified architect. Vinod can be reached at [email protected].

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 100 development and delivery centers worldwide and approximately 221,700 employees as of December 31, 2015, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: [email protected]

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: [email protected]

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: [email protected]

© Copyright 2016, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

Codex 1849