OWL-DL Ontology for Classification of Lipids

OWL-DL Ontology for Classification of Lipids
1
Hong-Sang Low1, Christopher J.O. Baker2, Alexander Garcia3, Markus R. Wenk1,
National University of Singapore, Singapore; 2University of New Brunswick, Saint John,
Canada; 3 University of Bremen, Bremen, Germany.
Abstract
Lipids can be systematically classified according to
functional
properties,
structural
features,
biochemical origin or biological system. However
Lipid nomenclature has yet to become a robust
research tool since no rigorous definitions exist for
membership of specific lipid classes. Lipids need to
be defined in a manner that is systematic yet at the
same time semantically explicit. We report the reuse
of existing lipid nomenclature, ontology describing
chemical structure and the extension of the OWL-DL
Lipid Ontology in order to support the classification
of lipid molecules. We applied class definitions, DLaxioms, to describe lipid classes and illustrate
suitability of the ontology for the classification of
Fatty Acyl lipids and Mycolic acids.
Introduction
IUPAC-IUBMB proposed a systematic nomenclature
for lipids which received limited adoption by the
lipid community2. The proposed classification was
complicated and prone to erroneous application by
scientists2. Moreover the naming scheme was not
extended and does not adequately represent many
novel lipid classes discovered in the recent decades.
As a result lipids still lack systematic classification
and a nomenclature that is universally adopted by the
biomedical research community. The LIPIDMAPS
consortium aims to solve this problem by developing
a scientifically robust, comprehensive and extensible
classification system adapted from the IUPAC
nomenclature2. This classification scheme organizes
lipids from different phylogeny and synthetic
domains. Still, uptake by the scientific community
has been gradual and lipid literature is steeped with
instances of lipid synonyms that do not follow the
new nomenclature. In addition to that, LIPIDMAPS
nomenclature is still a cumbersome naming scheme.
primarily used is-a relationships3 to categorize lipids
and definitions for lipid classes in LIPIDMAPS
remained implicit. Moreover they are often
dependent on a chemical diagram in the form a
molecular graphic file that can only be accurately
classified by a trained lipid expert. No rigorous
definition, independent of a graphical diagram, exists
and the graphical definitions are not flexible, nor are
they extensible. Changes in such definitions require
the redrawing of the chemical diagram/definition.
Subsequently,
communicating,
storing
and
transferring of such structural definitions in the
current format is inefficient and there is much
reliance on trained experts. There is therefore a need
for lipids to be defined in a manner that is systematic
and explicit (following LIPIDMAPS hierarchical
structure). A rigorous definition would involve a
minimal necessary and sufficient declaration for class
membership that could adequately describe a lipid
without a molecular structure diagram being
necessary.
Description logics (DL) describes a domain using
class descriptions according to a logic based semantic
previously used to represent chemical knowledge4,5.
Using DL, it is possible to define a lipid with
necessary conditions where epoxy fatty acid is
defined as a lipid that at least have a carboxylic acid
group and an epoxy group. Moreover, we can define
with necessary and sufficient conditions such where
epoxy fatty acid is defined as a lipid that can only
have epoxy group and carboxylic acid group. Any
molecules that have functional groups other than
epoxy group and carboxylic acid group cannot be
considered as an epoxy fatty acid.
O
OH
O
Figure 1. An example of epoxy fatty acid
Hierarchical Classification of Lipid Nomenclature
Lipids are organic compounds and can be
systematically classified in many ways such as using
atomic connectivity, physicochemical properties,
presence of functional groups, or the type of
bioactivities. Albeit an important contribution, the
LIPIDMAPS central repository of lipids has
The Lipid Ontology
The first reported ontology was exclusively
developed to conceptualize and capture knowledge in
the domain of lipids through the use of concepts,
relations, instances and constraints on concepts6. It is
a resource that provides a common terminology for
the lipid domain and a basis for interoperability
between information systems. It had also been used
to support navigation of lipid literature in
conjunction with text mining. Full details of the
initial conceptualization are given in earlier
publication6. This ontology has been extended to
describe the LIPIDMAPS nomenclature classification
explicitly using description logics (OWL-DL) and to
support reasoning and inference tasks.
Prior to extending the ontology for classification
tasks we reviewed existing ontologies for reusable
components. We reviewed the Chemical Ontology7
for reuse of the Organic_Group hierarchy and the
Lipid Ontology was enriched with 32 functional
groups from Chemical Ontology. An additional 63
new concepts were added into the Organic_Group
super-concept in the current iteration of Lipid
Ontology. The Organic_Group hierarchy was
reorganized and asserted with new is-a relationships.
Moreover, in order to describe the lipids with
complex chemical moieties, we rename the
Organic_Group concept into Simple_Organic_Group
and position it together with newly created
Complex_Organic_Group
and
Chain_Group
concepts under a new Organic_Group concept. In
addition to that, we have also created the new
Ring_System concept to describe lipids with ring
structure.
In Chemical Ontology, Organic_Compound are
concepts with hasPart relationship to concepts under
Organic_Group. The same property is used in Lipid
Ontology to relate concepts under Lipid to concepts
under Organic_Group. Inversely, an inverse property
partOf is used to relate concepts under
Organic_Group with concepts under Lipid. For the
latest iteration of Lipid Ontology, we added 400 DL
definitions into the ontology for all lipid classes,
except for Polyketides. (See Table 1 for complete
summary of Lipid Ontology)
Functional Groups used in Lipid Classification
Lipids are complicated biomolecules and most lipids
can only be adequately classified with more than one
distinct functional group. Lipids are defined by
multiple sets of organic groups and these definitions
are used to restrict the membership of individual
lipids to specific classes of lipids. DL rules with
greater complexity than described in Chemical
Ontology7 are needed to describe lipids. Primarily we
re-use, from the Chemical Ontology, the axiom
“Organic_Compound hasPart Organic_Group” to
relate Lipid concepts to Organic_Group concepts.
We then define in Lipid Ontology, two concepts to
describe lipid structures, namely Organic_Group and
Ring_System. Organic_Group has the sub-groups (i)
Simple_Organic_Group, (ii) Complex_Organic_Group, (iii) Chain_Group. Simple_Organic_Group
subsumes concepts that describe basic functional
groups
whereas
Complex_Organic_Group
encapsulates glycans and amino acids. Glycans, in
particular, are used to classify lipids such as
sacharrolipid, and other sugar- linked lipids such as
sphingolipids. Chain_Group consists of the
Carbon_Chain_Group and the Sphingoid_Base_Chain_Group. The Sphingoid_Base_Chain_Group is
used exclusively for Sphingolipid whereas
Carbon_Chain_Group is applied to other lipid classes
accordingly. The Ring_System consists of (i)
Isoprenoid_ring_derivative, (ii) Monocyclic_Ring_
Group and (iii) Polycyclic_Ring_System. These
concepts are used to define lipids that have at least
one or more rings. Specifically, they are used mainly
for Sterol_Lipid, Prenol_Lipid and other ring lipids.
In the Lipid Ontology these concepts are extensively
used to provide the necessary structural descriptions
to define the identity of lipid-based compounds.
Subject
Total No. of Classes
Primitive Class
Defined Class
No. of Lipid Classes
No. of Lipid Classes with DL axioms
Total No. of Restrictions
Total No. of Properties
DL Expressivity
No.
715
447
268
428
400
901
46
ALCHIQ(D)
Table 1. Summary of the current Lipid Ontology
Hierarchical Classification of Lipids
Lipid concepts are organized hierarchically with the
super-classes restricted by generic necessary
conditions. More specific necessary conditions are
used to define membership requirements for sub
classes of lipid. At the end of hierarchy, lipid classes
are restricted by necessary and sufficient conditions
and closure axioms. Super-classes are not closed by
closure axiom to avoid inconsistency among
disjointed sibling classes. More specific lipid classes
are defined this way. The first involves specifying the
subclass of the present class to restrict the definition
of a lipid. Necessary conditions such a “hasPart some
Carboxylic_Acid_ derivative_Group” can be further
specified by the subclass of Carboxylic_Acid_
Derivative_Group, eg. an aldehyde. The second
approach uses a Cardinality Axiom. The axiom can
be applied to concepts at any level. Once declared,
the cardinality axiom restricts the number of a
particular concept to be allowed in a restriction. Lipid
classes can be defined by the number of certain
organic group concept or Chain_ Group concept. For
example, Triacylglycerol is an Acylglycerol with 3
acyl chains. Its superclass is restricted with an
existential axiom “has some Acyl_ Chain”. This is
further specified with the following cardinality axiom
“hasAcyl_Chain exactly 3”.
DL Axioms for the definition of Fatty_Acyl
Fatty acyls are a diverse group synthesized by chainelongation of an acetyl-CoA primer with malonylCoA/methylmalonyl-CoA groups2. We define a
Fatty_Acyl as a lipid that has at least one
Carboxylic_Acid_derivative_Group and at least one
Acyl_Chain. Docosanoid is a subclass of Fatty_Acyl
that inherits from Fatty_Acyl, the Carboxylic_Acid_
derivative_Group as well as Acyl_Chain. The
Carboxylic_Acid_derivative_Group in Docosanoid is
further specified to be a Carboxylic_ Acid, whereas
the Acyl_Chain of is specified with a cardinality
axiom and the property hasAcyl_ Chain.
Consequently, Docosanoid is defined to have only 1
Acyl_Chain. Moreover, Docosanoid has multiple and
distinct functional groups such as Carboxylic_Acid,
Alkenyl_Group, Alcohol and Cyclopentenone. These
functional groups are made to relate with Dosocanoid
via the property “hasPart” in conjuction with the
existential axiom “some”. A closure axiom is needed
to restrict the type of relationship constraints allowed
for a lipid class. The closure axiom is applied to
Docosanoids so that lipids of this class can only have
the
following
functional
groups,
namely,
Carboxylic_Acid, Alkenyl_Group, Alcohol, Cyclopentenone and Acyl_Chain. (see Figure 2)
DL Axioms for unusual Fatty_Acyl classes
As LIPIDMAPS nomenclature classifies lipids based
on their chemical structure or biosynthetic origin,
lipids such as Fatty_alcohol, Fatty_Nitrille,
Fatty_ether and Hydrocarbon are classified as a
member of Fatty_Acyl although they do not have an
Acyl_Group. We reconcile this by expanding the
definition of Fatty_Acyl to include Alkyl_Chain, a
characteristic structure of these un-usual Fatty_Acyl
classes. A Fatty_alcohol inherits an Alkyl_Chain
from Fatty_Acyl and is further defined to have only 1
Alkyl_Chain in the necessary and sufficient
condition. This necessary and sufficient condition
also includes a “hasPart” property that connects
Fatty_alcohol to an Alcohol class. This definition
enables us to include lipids without an Acyl_Group
as a member of Fatty_Acyl (see Figure 2). In
addition we create a new lipid class, namely
Fatty_Acyl_ derivative, a subclass of Fatty _Acyl.
Extension of Mycolic Acid Class
Mycolic acids are a key component of the cell wall of
Mycobacterium tubeculosis sps. and are implicated
mycobacterial disease. By 1998 there existed 500 or
more known chemical structures of related
mycolates8 and yet LIPIDMAPS currently contains
only 3 mycolic acid records. Consequently many
mycolic acids with known structures have yet to be
systematically classified. Classification of these
lipids is critical for system-level analysis of
mycobacterial pathogenesis. We illustrate extension
of the ontology to include new Mycolic_Acid classes
and demonstrate the assignment of an alpha mycolate
(see Figure 1) to the appropriate Lipid class. Based
on LIPIDMAPS nomenclature, we assign
Mycolic_acid as a member of Fatty_Acid and extend
Mycolic_acid classification with 9 defined subclasses
(see Table 2).
Figure 2. DL definitions of Docosanoid, Fatty alcohol, Alpha mycolic acid
These are distributed among five primitive classes.
General_mycolic_acid, General_ methylated_
mycolic_ acid,
General_ alpha_ mycolic_ acid,
Oxygenated_mycolic_acid, General_ methoxy_
mycolic_ acid.
classification of lipids. Ontology driven classification
has been applied to proteins9 and small molecules7
through the coordination of protein domain or
pharmacophore analysis, OWL-DL ontology, an
instance store, and DL reasoning. By adding
precisely defined DL-axioms to the lipid ontology we
can apply as smiliar approach for the automated
classification of lipids. Our approach is extensible to
accommodate novel lipids and we have extended the
use of DL-axioms to classify all lipid classes (except
for polyketides). In support of mycobacterial diseases
it is imperative to further extend lipid nomenclature
and classification. Our goal is to make available
formalized DL definitions of lipids organized
according to LIPIDMAPS nomenclature and to
establish a reusable standard for the lipid researchers
and the bioinformatics community interested in
lipidomics .
Acknowledgements
Table 2. Known classes of mycolic acid
Alpha mycolic acid is a mycolic acid that has
carboxylic acid, cyclopropane and an alpha-hydroxyl
acid group. The carboxylic acid group is a member of
the acyl group and it is not an ester group. Therefore,
according to the classification scheme below, alpha
mycolic acid must be a member of Fatty_Acyl.
Among members of Fatty_Acyl, only Octadecanoid,
Docosanoid, Eisocsanoid and Fatty_Acid have
Carboxylic_Acid. Alpha_mycolic_acid does not have
a cycloketone group and therefore, it cannot be
Docosanoid, Eicosanoid or Octadecanoid. Therefore,
it is a member of Fatty_Acid. Among members of
Fatty_Acid, only Mycolic_acid has Alpha-Hydroxy
_Acid_Group and a Meromycolic_Chain. Therefore,
alpha mycolic acid is classified under this class of
Fatty_Acid. Because Alpha_mycolic_acid is the only
class that accepts mycolic acid with Cyclopropane,
the lipid example in Figure 1 is classified as a
member of Alpha_mycolic_acid. (see Figure 2)
NUS Office of Life Science (R-183-000-607-712), ARF
(R-183-000-160-112), BMRC A*STAR (R-183-000-134305), New Brunswick Innovation Fund, Canada.
References
1.
2.
3.
4.
5.
6.
Conclusion
Lipid research is increasingly integrated within
systems level biology such as lipidomics1 where lipid
classification is required before appropriate
annotation of chemical functions can be applied. In
this paper we have sought to address the ongoing
challenge of classifying lipids through the adoption
of W3C standard knowledge representation and the
application of DL axioms. We initiated the process of
defining lipids according to appropriate functional
groups with the intent of using the ontology for
7.
8.
9.
Wenk MR: The emerging field of Lipidomics.
Nat. Rev. Drug Discov. 2005, 4: 594-610.
Fahy E, etal. A comprehensive classification
system for lipids. J. Lipid Res. 2005, 46: 839-62.
Sud M, Fahy E, Cotter D, Brown A, Dennis EA,
etal. LMSD: LIPID MAPS structure database.
Nucl. Acids Res. 2007, 35: D527-32.
Baader FI, Horrocks I, Sattler U: Description
logics as ontology languages for the semantic
web. In Festschrift Jörg Siekmann, LNAI 2003.
Villanueva-Rosales N, Dumontier M: Describing
chemical functional groups in OWL-DL for the
classification of chemical compounds. OWLED
2007, co-loc. ESWC2007 Innsbruck, Austria.
Baker CJO, etal.: Towards ontology-driven
navigation of the lipid bibliosphere. BMC
Bioinformatics. 2008, 9 (Suppl 1):S5.
Feldman HJ, etal: CO: A Chemical Ontology for
Identification of Functional Groups and
Semantic Comparison of Small Molecules. FEBS
Letters. 2005, 579:4685-4691.
Barry CE 3rd, Lee RE, Mdluli K,etal.: Mycolic
acids: structure, biosynthesis and physiological
functions. Prog. Lip. Res. 1998, 37:143-179.
Wolstencroft K, etal: Protein classification using
ontology classification. Bioinf. 2006, 22: e530-8.