Cognizant 20-20 Insights
Cognitive Integration: How Canonical
Models and Controlled Vocabulary Enable
Smarter and Faster Systems Interoperability
Pharma companies are showing greater interest in adopting the
canonical model approach to provide a standardized and highly
abstracted way for partners to integrate with their systems. But
without a controlled vocabulary that defines the semantics behind
this approach, systems integration can be a difficult, costly and
time-consuming activity for all parties.
Executive Summary
Today, everything impacts
everything else. “Six degrees of separation” is an
anachronism, since nothing seems that far away.
In a world characterized by dense interconnections,1 the ability to seamlessly integrate is the
precursor to an ordered coexistence. Business
has moved forward, forging new partnerships to
deliver new capabilities and customer experiences. Digital is premised on agility and innovation.
Integration is a foundational capability for success
in the new, disruptive digital world.
Systems integration has long been on most enterprises’ radar; literature abounds with methods of
integration,2 from data-centric and services-centric approaches through process-centric models.
The abundance of tools and platforms testifies to
the continued challenges of integration and the
relative shortcomings in various approaches.
With newer delivery models such as cloud-based software as a service (SaaS), the challenge
is further accentuated.
cognizant 20-20 insights | march 2016
When integration was confined to a smaller
number of systems, the problem was easily
handled. This was primarily because
most of the systems involved were within an
enterprise’s boundaries. Such integrations were
executed using point-to-point approaches that
served this purpose well.
Point-to-point approaches began to show their
inherent weakness when the integrating systems
involved partner applications and/or SaaS applications. The systems involved are now often
outside the direct influence of an enterprise. In
addition, these disparate systems challenged the
way IT departments handled differences in data
models as well as data. This created the need
for greater flexibility and richer contextual understanding: cognitive integration. In principle,
cognitive integration adds semantic
capabilities for reasoning.
Developments in canonical models and controlled
vocabulary give enterprises a way to achieve
cognitive integration (i.e., systems integrate semantically without excessive coding
effort). In brief, canonical models provide an
abstracted representation of entities. Controlled
vocabulary defines the accepted terms and values used to describe them.
While canonical models help in standardization,
controlled vocabulary helps to alleviate semantic
differences between systems.
The emerging nature of business ecosystems
and the attendant integration challenges can
be better appreciated by looking at a realistic
business scenario. This white paper explores an
integration approach that combines a canonical
model with controlled vocabulary and illustrates how this facilitates cognitive integration.
Although the approach is generally applicable to
any domain, the issues of possible multiple interpretations of data and the need to add context for
appropriate usage semantics are best understood
by examining the type of data being exchanged
between systems in the life sciences domain.
Business Context
The process of bringing a new drug to market is
time-consuming, requiring numerous ecosystem
players to work together. Patients, regulators,
scientists, manufacturers, key opinion leaders
and supply chain stakeholders all play vital roles
at various stages of the process. When so many
business entities need to collaborate and successfully work together, there is an inherent need
for information exchange. It is highly desirable,
therefore, that such information exchange is
achieved in a way that handles semantic differences intelligently.
Clinical trials comprise a significant part of the
efforts to bring new drugs to market. Conducting
clinical trials is an endeavor in itself, and there
are specialized business entities such as contract
research organizations (CROs) which assist
sponsors in this effort. CROs also offer services
beyond clinical trials, such as filing and regulatory
affairs. The global CRO market is approximately
$27 billion and is set to hit $32.7 billion by 2017.3
CROs will continue to grow as pharmaceuticals
companies continue outsourcing certain portfolios
of studies to CROs – while they retain some of the
studies and other core competencies in house.
Consider the situation where a large pharmaceuticals organization is working with a number of
CROs. Several trials may be ongoing simultaneously, and each CRO may be working on one or
many trials. Each trial might have a specific way
of gathering and organizing data. Complicating matters further, CROs often have their own
systems to manage the clinical data they collect.
The exchange of experimental data within and
outside of pharmaceuticals companies becomes an
integration nightmare, due to the number of CROs/
other partners and their variations in data formats.
There is additional complexity: The absence of
standardization or universally accepted norms can
lead to multiple interpretations of the data.
A simple solution to the above challenges would be
to map and reconcile the data manually. However, this
laborious effort is not scalable: The addition of a
new CRO partner would require the pharmaceuticals company to repeat these efforts. What if
a standardized approach to interaction could be
used that allows for dynamism in the way information can be expressed by different CROs? If
that were possible, the integration of information
across various participating systems could be
more elegantly handled.
For successful integration with business partners,
the following would be ideally required:
• It should be possible to achieve integration
without needing to make major changes to the
systems used by the pharmaceuticals company
or the CRO.
• It should be possible for the CRO to explore and
understand how to integrate with the pharmaceuticals company.
• Adding new CROs should not require significant system integration effort.
Canonical Models and
Controlled Vocabulary
Consider two applications being integrated. In
point-to-point integration, changes need to be
made to both the applications. The same approach
would have to be repeated for any new application
to be integrated. This introduces brittleness and
avoidable engineering effort. To alleviate the pains of point-to-point integration, canonical models were
proposed. In a canonical approach, each application translates its data into a common format
understandable to all applications; this loosely
coupled pattern minimizes the impact of change.
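The engineering effort at stake can be made concrete with a small count. The sketch below is an illustration only, not part of the original analysis: it assumes every system must exchange data with every other system and that each direction of exchange needs its own translator. The function names are hypothetical.

```python
def point_to_point_translators(n: int) -> int:
    """Directed translators needed when each of n systems maps to every other."""
    return n * (n - 1)


def canonical_translators(n: int) -> int:
    """Each system needs one translator to and one from the canonical format."""
    return 2 * n


# Compare the two approaches as the number of integrated systems grows.
for n in (2, 3, 4, 10):
    print(n, point_to_point_translators(n), canonical_translators(n))
```

Under these assumptions the canonical approach costs more for two systems, breaks even at three, and grows linearly afterwards while point-to-point grows quadratically. Adding one more partner means two new translators rather than a pair for every existing system.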
Figure 1. Canonical Model (diagram labels: Partner 1, Partner 2, Partner 3, Partner 4; Canonical Data Model; Logical Data Model; Enterprise)
A canonical approach aims to create a common
logical model for all applications that need to
be integrated. It is not influenced by technology.
In this approach, all applications will use the
canonical model to exchange information. Implementation specifics will determine the exact
mechanisms of data transfer, usually achieved via
some kind of transformation logic (see Figure 1).
While this loosely coupled approach appears to be
a panacea for integration woes, it is not without
challenges. Because it introduces an
additional layer, the canonical model can make
semantic integration harder. Differences that were usually resolved
between applications by directly understanding
the contextual underpinnings now become more
difficult to resolve.
Consider the concept of
“culture,” a term commonly encountered in microbiology,4 for which the CRO's and the pharmaceuticals company’s systems use different data model
definitions, each with its own semantics (see Figure 2).
The illustration includes only a few attributes to keep the example simple.
The canonical model defined by the pharmaceuticals enterprise for the purpose of integration with
the CRO systems could be visualized as in Figure 3.
Although the canonical model provides fields to
map the CRO and pharmaceuticals company’s
data models, the data values can continue to pose
integration challenges. While “Petri Dish” and
“Plate” denote the same type of container in Figure
2, the CRO has no way of knowing that the pharmaceuticals organization has standardized on the
term “Petri Dish,” and sending any other equivalent
term will not semantically integrate the two
systems. The companies need a more
cognitive ability to understand such nuances
during integration. Controlled vocabulary (CV) is
an attempt to provide such improved cognition. A
CV is a predefined, authorized term/concept with
agreed alternates or synonyms, mapped to a set of
valid and unique values, that has a defined scope
or describes a specific domain.5 For illustration, consider Figure 4, where
the CV “Gender” is the concept and can take “Male”
or “Female” as values.
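The definition above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ControlledTerm`, its fields, and the scope labels are hypothetical, and the synonym pair "Petri Dish"/"Plate" is taken from the Figure 2 example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ControlledTerm:
    """One CV entry: a concept, its scope, and its valid values with synonyms."""
    concept: str
    scope: str  # hypothetical label for the domain the term describes
    # Maps each preferred (valid, unique) value to its agreed alternates.
    values: Dict[str, List[str]] = field(default_factory=dict)

    def preferred(self, raw: str) -> str:
        """Return the preferred value for a raw value or one of its synonyms."""
        needle = raw.strip().casefold()
        for value, alternates in self.values.items():
            candidates = [value] + alternates
            if needle in (c.casefold() for c in candidates):
                return value
        raise ValueError(f"{raw!r} is not a valid value for {self.concept}")


gender = ControlledTerm("Gender", "General", {"Male": [], "Female": []})
container = ControlledTerm("Container", "Microbiology", {"Petri Dish": ["Plate"]})

print(gender.preferred("female"))
print(container.preferred("Plate"))
```

Keeping synonyms next to the preferred value means a partner can send "Plate" and still arrive at the term the pharmaceuticals company has standardized on, while unknown values fail loudly instead of passing through.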
It is not unusual that a term could have different
meanings based on usage context. For example,
“Temperature” is a very common term, but
incubation temperature and storage temperature convey different meanings although both
are temperatures. A CV gives programmers
the context for each term and thus supports more accurate
interpretation. Figure 5 shows
potential CV usage.
Figure 2. Fictitious Culture Definition: CRO and Pharmaceuticals Company

CRO Definition (Culture)        CRO Culture Data
Species Name: String            “S. cerevisiae 101 S”
Growth Temperature: String      “32 °C”
Container: String               “Plate”

Pharma Definition (Culture)     Pharma Culture Data
Species Tax Id: Long            1337652
Incubation Temp: Float          32.0
Temp UOM: String                “°Celsius”
Media Container: String         “Petri Dish”
Unit of measure (UoM) is a good example of a CV
that depends on the measurement context, since
all measurement types will need to be expressed
in terms of some units. Context can be length,
temperature or weight. The allowed values need
to be restricted depending on the measurement
context. The UoM CV can be visualized as depicted in
Figure 6.
Figure 3. Canonical Model for Culture
<<Entity>> Culture
+taxonomyId: string
+growthTemperature: float
+temperatureUOM: string
+growthContainer: string

Figure 4. Controlled Vocabulary: An Illustrative Example
Term/concept: Gender
Valid, unique values: Male, Female
Figure 5. CV Examples

Domain            Context       CV Term                  Value(s)
Pharmaceuticals   Taxonomy      Species                  Saccharomyces cerevisiae, S. cerevisiae
Poultry Farming   Incubation    Incubation Temperature   30, 37
Pharmaceuticals   Storage       Storage Temperature      4, 10, 30
General           Temperature   Unit of Measure          °Celsius, °C, °Fahrenheit, °F
Figure 6. Understanding CV Context

UOM contexts:
Length: [millimeter, meter, inch, feet]
Temperature: [Celsius, Fahrenheit, Kelvin]
Weight: [gram, pound, ounce]
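The context-dependent restriction of UoM values described above can be sketched as a lookup keyed by measurement context. The contexts and units are those of Figure 6; the names `UOM_BY_CONTEXT` and `is_valid_unit` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List

# Allowed units per measurement context, as listed in Figure 6.
UOM_BY_CONTEXT: Dict[str, List[str]] = {
    "Length": ["millimeter", "meter", "inch", "feet"],
    "Temperature": ["Celsius", "Fahrenheit", "Kelvin"],
    "Weight": ["gram", "pound", "ounce"],
}


def is_valid_unit(context: str, unit: str) -> bool:
    """Return True only if the unit is allowed in the given measurement context."""
    return unit in UOM_BY_CONTEXT.get(context, [])


print(is_valid_unit("Temperature", "Kelvin"))
print(is_valid_unit("Temperature", "gram"))
```

A unit that is valid in one context, such as "gram" for Weight, is rejected for Temperature, which is the restriction the CV is meant to enforce.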
Cognitive Integration
Figure 7 illustrates the shortcomings of integration achieved using a canonical model without
using CV. As illustrated in Figure 2, the integrating CRO might have a different model for Culture.
Figure 7 reveals the complex and brittle transformation required to populate the CRO data into
the canonical model.
Given the above scenario, the following points can be
observed:
• Temperature needs to be split into value and
UoM.
• Though the value of the Container attribute
can be mapped to the canonical model, the
value itself cannot be consumed directly by
the pharmaceuticals application since it uses a
different value, a synonym, for the Container.
These transformations become more complicated
with larger data models, which offer more scope for differences in values
and data types, and the effort must
be repeated for each CRO. A CV can provide
better context and resolve the semantic differences. A
potential architecture is shown in Figure 8.
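To show why the Figure 7 transformation is brittle, here is a rough sketch of what such hardcoded mapping logic might look like. It is an assumption-laden illustration: the function name, the dictionary keys for the CRO record, and the `TAXONOMY_IDS` table (a stand-in for the "online source" lookup) are hypothetical, and the degree sign in "32 °C" is our reading of the figure.

```python
import re

# Hypothetical stand-in for looking up the taxonomy ID from an online source.
TAXONOMY_IDS = {"S. cerevisiae 101 S": "1337652"}


def cro_to_canonical_hardcoded(cro: dict) -> dict:
    """Map one CRO Culture record to the canonical model using hardcoded rules."""
    # Split a string such as "32 °C" into its numeric value and its unit.
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(\S.*?)\s*", cro["Growth Temperature"])
    if match is None:
        raise ValueError("unrecognized temperature: " + cro["Growth Temperature"])
    value, unit = match.groups()

    # Hardcoded synonym fixes: every new spelling requires a code change.
    if unit == "°C":
        unit = "°Celsius"
    container = cro["Container"]
    if container == "Plate":
        container = "Petri Dish"

    return {
        "taxonomyId": TAXONOMY_IDS[cro["Species Name"]],
        "growthTemperature": float(value),
        "temperatureUOM": unit,
        "growthContainer": container,
    }


record = cro_to_canonical_hardcoded({
    "Species Name": "S. cerevisiae 101 S",
    "Growth Temperature": "32 °C",
    "Container": "Plate",
})
print(record)
```

Each `if` branch encodes knowledge about one CRO's spelling, so the logic has to be revisited for every new partner and every new synonym.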
The numbers in Figure 8 indicate the
flow from the integrating partner's (CRO's) perspective: get canonical model → get CV → populate
canonical model → send data. The snippets of
code illustrate how an attribute (Container) in the
canonical model (Culture) is further cognitively
elaborated in CV.
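The four-step flow can be sketched end to end with an in-memory stand-in for the pharmaceuticals company's service. Everything named here is hypothetical: `StubPharmaService`, the shape of the model description (each attribute carrying a type and an optional CV name), and the assumption that the CRO record already uses the canonical attribute names. No real web service is called.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical description of the canonical Culture model: each attribute
# declares its type and, where relevant, the CV that governs its values.
CANONICAL_CULTURE: Dict[str, Dict[str, Any]] = {
    "taxonomyId": {"type": str, "cv": None},
    "growthTemperature": {"type": float, "cv": None},
    "temperatureUOM": {"type": str, "cv": "UOM"},
    "growthContainer": {"type": str, "cv": "Container"},
}

# Hypothetical CV store: preferred value mapped to its accepted synonyms.
CV_STORE: Dict[str, Dict[str, List[str]]] = {
    "UOM": {"°Celsius": ["°C"], "°Fahrenheit": ["°F"]},
    "Container": {"Petri Dish": ["Plate"]},
}


class StubPharmaService:
    """In-memory stand-in for the published canonical model and CV services."""

    def __init__(self) -> None:
        self.received: List[Tuple[str, Dict[str, Any]]] = []

    def get_canonical_model(self, entity: str) -> Dict[str, Dict[str, Any]]:
        if entity != "Culture":
            raise KeyError(entity)
        return CANONICAL_CULTURE

    def get_cv(self, name: str) -> Dict[str, List[str]]:
        return CV_STORE[name]

    def send(self, entity: str, record: Dict[str, Any]) -> None:
        self.received.append((entity, record))


def preferred(cv: Dict[str, List[str]], raw: str) -> str:
    """Resolve a raw value to the preferred value listed in the CV."""
    for value, synonyms in cv.items():
        if raw == value or raw in synonyms:
            return value
    raise ValueError(f"{raw!r} is not in the controlled vocabulary")


def integrate(service: StubPharmaService, cro_record: Dict[str, str]) -> Dict[str, Any]:
    model = service.get_canonical_model("Culture")        # 1. get canonical model
    record: Dict[str, Any] = {}
    for attribute, spec in model.items():
        value = spec["type"](cro_record[attribute])
        if spec["cv"] is not None:
            cv = service.get_cv(spec["cv"])                # 2. get CV
            value = preferred(cv, value)
        record[attribute] = value                          # 3. populate canonical model
    service.send("Culture", record)                        # 4. send data
    return record


service = StubPharmaService()
result = integrate(service, {
    "taxonomyId": "1337652",
    "growthTemperature": "32",
    "temperatureUOM": "°C",
    "growthContainer": "Plate",
})
print(result)
```

Because the synonym knowledge lives in the CV rather than in the transformation code, a new partner spelling is handled by adding a synonym to the vocabulary, not by changing `integrate`.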
The pharmaceuticals company can offer semantically powerful integration by publishing the
canonical model with an accompanying controlled
vocabulary. These models and data can be made
available as data as a service (DaaS) via OData or
an equivalent framework. We suggest OData here
since it is a specifically devised standard for the
purpose of sharing data and also has excellent
discovery/query capabilities.6 A CRO that wants
to integrate with a pharma company should first
inspect the canonical model. To better understand the semantics behind
the canonical model, the CRO can query the CV
through the published OData-based DaaS. The CRO can
then proceed with integration much more smoothly
using the canonical model and CV.

Figure 7. Integration Using Canonical Model Without CV

CRO value                Transformation                                       Canonical attribute (<<Entity>> Culture)
“S. cerevisiae 101 S”    Look up taxonomy ID from online source and pass ID   + taxonomyId: string
“32 °C”                  Split value and unit; parse float value              + growthTemperature: float
“32 °C”                  Hardcode unit to “°Celsius” where “°C”               + temperatureUOM: string
“Plate”                  Hardcode container to “Petri Dish” where “Plate”     + growthContainer: string

Figure 8. Integration Architecture with Canonical Model and CV
Components: CRO 1 Application & Data, CRO 2, CRO n, Future Partners; Populated Canonical Model; Canonical Model; CV Data; Web Service; RESTful Web Service (OData); Controlled Vocabulary Database; Pharma Application & Data.
Flow:
1. Inspect Canonical Model and Retrieve CV Info
2. Retrieve Preferred CV Value
3. Smart Data Population
4. Integrate Smoothly
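A CV query of the kind described above could be expressed with the standard OData `$filter` and `$select` query options. The sketch below only builds the request URL; the service root, the entity set name `ControlledVocabulary` and the property names `term`, `preferredValue` and `synonyms` are assumptions made for illustration, since the paper does not specify the published schema.

```python
from urllib.parse import quote, urlencode

# Hypothetical service root; a real deployment would publish its own.
SERVICE_ROOT = "https://pharma.example.com/odata"


def cv_query_url(term: str) -> str:
    """Build an OData query URL that asks for the CV entries of one term."""
    # OData string literals escape a single quote by doubling it.
    literal = term.replace("'", "''")
    options = {
        "$filter": f"term eq '{literal}'",
        "$select": "preferredValue,synonyms",
    }
    # Keep "$", "'" and "," readable; percent-encode spaces as %20.
    query = urlencode(options, quote_via=quote, safe="$',")
    return f"{SERVICE_ROOT}/ControlledVocabulary?{query}"


print(cv_query_url("Container"))
```

A CRO could issue such a request once per CV-governed attribute and cache the result, so that the preferred values are available when the canonical model is populated.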
With this approach, integration between the
CRO and the pharmaceuticals data models can
use a significantly smarter, smoother
and more reusable transformation. Note
that minor changes to the values stored in the CRO
data model can make integration smoother still.
In Figure 9, the CRO stores the taxonomy ID instead of the
full species name, and the temperature value
without the unit; neither requires data model
changes, yet integration becomes much
smoother. The UoM can be stored in a separate
column or table, as the CRO chooses. Again, this
is a relatively minor change, but implementation
specifics need to be considered before deciding
among the available options.
However, to enable this cognitive intelligence:
• CVs need to be defined with preferred synonyms
for values, and maintained and shared by the pharmaceuticals company with partnering CROs.
• The canonical model exposed by the pharmaceuticals company needs to indicate which CV to refer to for each attribute.
• CVs are preferably exposed to CROs via DaaS.
Figure 9. Integration Using Canonical Model and CV

CRO value    Transformation                                         Canonical attribute (<<Entity>> Culture)
“1337652”    (passed through)                                       + taxonomyId: string
“32”         Parse float value                                      + growthTemperature: float
“°C”         Retrieve preferred synonym from Pharma UOM CV          + temperatureUOM: string
“Plate”      Retrieve preferred synonym from Pharma Container CV    + growthContainer: string
The above is a simple example. The approach helps far more with the disparate data
models and large data sets involved in information sharing between various CROs and pharmaceuticals organizations. However, beware that
the benefits of this approach could be limited, or the approach could
even be counterproductive, unless a conscious collaborative effort is invested in standardizing the CV.
Looking Forward
Canonical models continue to fascinate integrators as they can drive standardization. However,
the approach has also met with considerable
resistance due to the perceived complexity and
the additional engineering efforts required. At
some level in integration tasks, as we have illustrated in this paper, engineers still have to tackle
nearly the same transformation challenges as
with point-to-point integration, albeit in a different
form. We have shown that using a CV can
minimize this effort and achieve better cognition.
We foresee the future direction for integration
between pharmaceuticals companies and CROs
as moving towards as-a-service offerings. We
envisage that enterprises across the industry
will publish canonical models and CV as services
for partners. We strongly believe that tools and
platforms in this area will achieve significant
growth. Simplicity, reduction of engineering
efforts and elegance will determine the success
of these integration offerings.
We also believe that research and advances in
knowledge management will strongly influence
CV and, indirectly, canonical models. Advances
in artificial intelligence in the area of reasoning
and logic are likely to boost cognitive capabilities.
We foresee an exciting future with multifarious
disciplines coming together to create innovative
possibilities.
Footnotes
1. Saha, Pallab, “A Systemic Perspective to Managing Complexity with Enterprise Architecture.” 1-580 (2014), DOI:10.4018/978-1-4666-4518-9.
2. Eliana Kaneshima and Rosana T. Vaccare Braga, “Patterns for enterprise application integration,” from Proceedings of the 9th Latin-American Conference on Pattern Languages of Programming (SugarLoafPLoP ‘12). ACM, New York City, Article 2, 16 pages. DOI=http://dx.doi.org/10.1145/2591028.2600811.
3. http://www.clinicalleader.com/doc/an-overview-of-top-clinical-cros-0001.
4. https://en.wikipedia.org/wiki/Microbiological_culture.
5. Alasdair J. G. Gray, Norman Gray, and Iadh Ounis, “Searching and exploring controlled vocabularies,” from Proceedings of the WSDM ‘09 Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR ‘09), ACM, New York City, 1-5. DOI=http://dx.doi.org/10.1145/1506250.1506252.
6. http://www.odata.org.
About the Authors
Raghuraman Krishnamurthy is a Senior Director within Cognizant’s Life Sciences business unit. Raghu
has over 22 years of IT experience and is responsible for pre-sales, solutions, architecture and technology
consulting for life sciences customers. He focuses on cloud, mobility and big data. Raghu holds a master’s
degree from IIT, Bombay and MOOC certificates from Harvard, Wharton, Stanford and MIT. He can be
reached at [email protected] | LinkedIn: https://www.linkedin.com/pub/raghuraman-krishnamurthy/4/1a9/ba0.
Vinod Ranganathan is a Senior Architect within Cognizant’s Life Sciences business unit. He has over 14
years of combined experience in the life sciences and IT domains and is responsible for solutions and
architecture proposals and design, technology consulting and implementation guidance for life sciences
customers and projects. Vinod’s primary expertise is in Java-related technologies with an active interest
in big data and cloud technologies. He holds a master’s degree in biotechnology from Pune University,
a diploma in advanced computing from C-DAC, Pune and is a TOGAF 9 certified architect. Vinod can be
reached at [email protected].
About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business
process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction,
technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 100 development and delivery centers worldwide and
approximately 221,700 employees as of December 31, 2015, Cognizant is a member of the NASDAQ-100,
the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and
fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.
World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: [email protected]

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: [email protected]

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: [email protected]

© Copyright 2016, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is
subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.
Codex 1849