Semantically enrich your MarkLogic triple store

About TEMIS
TEMIS helps organizations structure, manage
and exploit their unstructured information
assets. Its flagship platform, Luxid®,
identifies and extracts targeted information
to semantically enrich content with domain-specific triples. This helps organizations to
intelligently archive, manage, package, deliver,
access and analyze increasing volumes of
information.
Founded in 2000, TEMIS operates in the
United States, UK, France and Germany, and is
represented worldwide through its network of
certified partners.
References
AAAS (American Association for the
Advancement of Science), Agence France-Presse, American Lawyer Media, BASF,
Deloitte, Elsevier, Europol, French Ministry
of Defense, French Ministry of Finance,
Gannett (USA Today), Les Echos, Merck KGaA,
OECD, RSNA (Radiological Society of North
America), Sanofi, Springer Science+Business
Media, The McGraw-Hill Companies, the U.S.
Department of Agriculture, Thomson Reuters,
Unicancer, Volkswagen, Wiley, Wolters Kluwer.
Unstructured content represents 80% of your information
assets. It’s a potential treasure trove of business information,
but are you exploiting it? Luxid® extracts business information
from your unstructured content, structures it into triples and
feeds them into your MarkLogic triple store, enabling you to
query, visualize and analyze them for insights that are critical
to your competitiveness.
What’s a triple?
A triple is a way of encoding information about
objects. It is a key part of RDF (Resource Description Framework), a W3C standard that enables
computers to access, mesh, and take action on
information that is distributed across the Web. RDF
triples take the shape of statements that link a
subject to an object via a predicate. In each of
the following three triples, the middle term is the
predicate linking the subject to the object.
Leonardo_DiCaprio stars_in Titanic
James_Cameron directed Titanic
Titanic launched_in 1997
Triples typically make claims about real-world
objects (also called resources or entities) – in the
above example, actors, directors, movies – and
may be published in public knowledge bases. To
ensure robust operation, RDF triples unambiguously identify each of the entities they refer to
with a Uniform Resource Identifier (URI), and take
their predicates from a vocabulary (or ontology) published alongside the knowledge base.
Triples are commonly stored in a dedicated repository such as the MarkLogic 7 triple store.
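As a rough illustration of the structure described above, the three movie statements can be written as subject/predicate/object records, serialized as N-Triples lines, and filtered by pattern. This is a minimal sketch in plain Python: the http://example.org/movies/ namespace and the names Triple, to_ntriples and match are assumptions made for this example, and are not part of Luxid® or MarkLogic.

```python
from typing import List, NamedTuple, Optional

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/movies/"


class Triple(NamedTuple):
    subject: str                  # URI of the entity the statement is about
    predicate: str                # URI of a vocabulary term
    obj: str                      # URI of another entity, or a literal value
    obj_is_literal: bool = False  # True when obj is a plain value such as "1997"


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


def match(store: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


triples = [
    Triple(EX + "Leonardo_DiCaprio", EX + "stars_in", EX + "Titanic"),
    Triple(EX + "James_Cameron", EX + "directed", EX + "Titanic"),
    Triple(EX + "Titanic", EX + "launched_in", "1997", obj_is_literal=True),
]

for t in triples:
    print(to_ntriples(t))
```

Calling match(triples, o=EX + "Titanic") returns the two statements whose object is the movie. A dedicated triple store answers the same kind of pattern question at scale, typically through a query language such as SPARQL.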
How can triples be used?
Triples can be queried, navigated, visualized and
analyzed in the context of any task that has at its
core the exploitation of knowledge, whether proprietary to the organization, or available from a
third party, or even a combination of both. Recurring use cases that leverage triples include:
Linked Open Data
DBpedia is an open data initiative that involves the
public sharing of knowledge housed in a queryable
triple store. Similar queryable information repositories include GeoNames (geographical features),
data.gov (US federal, state, and local data) as well
as legislation.gov.uk (UK statutory law). In the Life
Sciences, UniProt and DrugBank are similar initiatives that offer information about proteins and
drugs. Linking your data with these Linked Open
Data repositories enables your users to extend their
knowledge about the entities in your triple store, thus
enabling more advanced queries and insights.
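One common way to express such a link is an owl:sameAs statement that ties a local entity to its counterpart in a public repository. The sketch below assumes that convention; it is not a description of how Luxid® or MarkLogic perform linking. The example.org URIs and the describe function are hypothetical, and the "external" set is a hand-written stand-in for data that would normally be retrieved from the public repository.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Triples held in an in-house store (hypothetical example.org URIs).
local = {
    ("http://example.org/movies/Titanic",
     "http://example.org/movies/launched_in", "1997"),
    # The link tying the local entity to its public counterpart.
    ("http://example.org/movies/Titanic", OWL_SAME_AS,
     "http://dbpedia.org/resource/Titanic_(1997_film)"),
}

# Hand-written stand-in for triples obtained from a public repository.
external = {
    ("http://dbpedia.org/resource/Titanic_(1997_film)",
     "http://dbpedia.org/ontology/director",
     "http://dbpedia.org/resource/James_Cameron"),
}


def describe(entity, local_store, external_store):
    """Collect (predicate, object) pairs about an entity.

    Follows owl:sameAs links one hop, from the local store outwards,
    so that facts stated about the linked external resource are
    returned together with the local ones.
    """
    aliases = {entity} | {o for s, p, o in local_store
                          if s == entity and p == OWL_SAME_AS}
    return {(p, o) for s, p, o in local_store | external_store
            if s in aliases and p != OWL_SAME_AS}


facts = describe("http://example.org/movies/Titanic", local, external)
for predicate, value in sorted(facts):
    print(predicate, value)
```

With the link in place, a question about the local Titanic entity can also draw on the director statement that only exists in the external data.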
Commercial Information Products
Triple stores can likewise be exploited for commercial publications. In portals, they enable new,
added-value information navigation and analytics
features alongside more traditional content-driven
offerings. The BBC’s coverage of the 2012 Olympics
is a well-known example: it used this approach
to report key information about countries, teams,
players, and disciplines. Knowledge bases that rely
on queryable triple stores are also a growing product category. They enable the seamless integration of structured information into end-user workflow applications and analytics tools.
Enterprise Linked Data
Triple stores may also house proprietary information about any entity present in an organization’s
world view: other organizations (suppliers, competitors, partners), people (customers, employees,
notable individuals), products (parts, accessories,
options), objects of research (molecules, proteins,
diseases), etc.
Here again, information formerly scattered
across the enterprise can be queried, explored, visualized or analyzed to answer questions
such as the following:
• What business relationships exist between a
potential partner and my competitors?
• How do our clinical results compare to publicly
available information about side effects caused
by molecules with comparable modes of action?
• Which experts are most closely involved in our
area of investigation yet most remote from our
teams?
Provided that relevant information is also available from
third-party triple stores (commercial or open), it can
be analyzed jointly with proprietary triples, enabling insights that would not be available otherwise.
How do you create triples?
Unstructured content represents 80% of your organization’s information assets, a potential treasure
trove of business insights. Thanks to Luxid®, a
complementary application to MarkLogic that is
integrated via REST Web Services, you can now
extract business information from your content and
feed it as triples into your MarkLogic triple store.
Based on an award-winning natural language
processing pipeline that analyzes your content,
Luxid® extracts information about your organization’s entities of interest and their relationships to
derive precise and relevant triples in 20 languages.
Aligned with your taxonomy or ontology, these
triples then become natively accessible to any application leveraging your MarkLogic triple store, in
particular for querying, visualization and analytics
purposes.
Platform overview and key components
Luxid® Annotation Server
A NATURAL LANGUAGE PROCESSING PIPELINE
THAT EXTRACTS STRUCTURED INFORMATION
FROM YOUR UNSTRUCTURED CONTENT
• Streamlined and scalable architecture
• Easy to integrate thanks to REST Web Services
• Supports 20 languages and extracts categories, entities, relationships or
terminology mentioned in text
• Extraction engines based on syntax, statistics, taxonomy, machine
learning and business rules
Luxid® Skill Cartridge® Library
A COLLECTION OF ANNOTATORS DEDICATED
TO YOUR SPECIFIC DOMAIN OR APPLICATION
• Each Skill Cartridge® focuses on recurring areas of interest: People and location names, Information about companies and their relationships, Categorization of news, Biology, Medicine, Chemistry, Finance, Legal, Defense
& Security, etc.
• Luxid® Community hosts numerous complementary partner-provided Skill
Cartridges®
Luxid® Content Enrichment Studio
A SUITE OF FOUR TOOLS TO OPTIMIZE YOUR
PLATFORM
• Create/Customize/Extend your Skill Cartridges®
• Create/Maintain your Taxonomies, Thesauri and Ontologies
• Project your ontology into a Skill Cartridge®
• Exploit the morpho-syntactic reasoning engines, statistical models,
machine learning and/or business rules
• Measure, Track and Optimize extraction quality
France
Tour Mattei
207 rue de Bercy
75012 Paris
T : +33 (0)1 80 98 11 00
[email protected]
United Kingdom
The Euston Office
1 Euston Square
London, NW1 2FD
T : +44 (0)777 474 6278
[email protected]
United States
6110 Executive Boulevard
Suite 690
Rockville, MD 20852
T : +1 240 477 1800
[email protected]
Germany
Blumenstraße 15
D-69115 Heidelberg
T : +49 - 6221 1375 3-0
[email protected]