Computing Chemical Organizations in Biological

Bioinformatics Advance Access published May 14, 2008
Computing Chemical Organizations in Biological
Networks
F. Centler∗, C. Kaleta, P. Speroni di Fenizio†, and P. Dittrich‡
Bio Systems Analysis Group, Jena Centre for Bioinformatics (JCB), and Department of
Mathematics and Computer Science, Friedrich-Schiller-University Jena, D-07743 Jena, Germany
Associate Editor: Prof. Thomas Lengauer
1 INTRODUCTION
Biological knowledge of the myriad of processes taking place in
living cells is typically represented in network models. Such models
detail, how genes regulate each other’s activity (de Jong, 2002), how
signals detected at the cell surface are propagated within the cell leading to a change in gene expression (Papin et al., 2005), and how
external molecules are internalized and inter-converted into different
metabolites for energy production (Heinrich and Schuster, 1996). As
biologists acquire more and more knowledge, these network models
not only increase in size, but also first steps are made towards comprehensive whole cell models (Tomita et al., 1999). With classic
∗ Current
address: Department of Environmental Microbiology, UFZ Centre for Environmental Research Leipzig-Halle, Germany
† Current address; Research Institute in Networks & Communications Engineering (RINCE), Dublin City University, Dublin 9, Ireland
‡ to whom correspondence should be addressed ([email protected])
analysis methods reaching their limits for such networks, the theory
of chemical organizations (Dittrich and Speroni di Fenizio, 2007)
has been proposed as a novel analysis technique. The main aim is
to identify subnetworks that do not create any species outside the
subnetwork and that are able to maintain themselves over time. The
method cannot only be applied to network models in which the set of
species is fixed, but also to constructive dynamical systems in which
novel species can appear as the system evolves in time. Signalling
systems are an example for such systems as the combinatorial complexity of molecular interactions leads to a vast number of potential
species (Blinov et al., 2004). The method has been applied to study
a photochemical model of the Martian atmosphere (Centler and Dittrich, 2007), models of HIV infection (Matsumaru et al., 2006a),
and a model of E. coli (Centler et al., 2007). Furthermore, the evolution of an artificial chemistry was investigated (Matsumaru et al.,
2006b) and chemical computing systems have been constructed with
the help of organization theory (Matsumaru et al., 2007). In this
paper, we lay the algorithmic foundation for computing chemical
organizations.
Another topology based method to study metabolic networks is
the analysis using elementary modes Schuster et al. (1999). In contrast to organization theory, only steady states are considered in
this approach. However, both methods are closely related and complement each other (Kaleta et al., 2006). Elementary modes are
minimal pathways that can operate at steady state. A biologically
feasible system state usually comprises a combination of several
elementary modes. As organizations embody such feasible system
states, elementary modes can be grouped according to the organizations within they are embedded. This allows for the identification of
elementary modes that act in concert, possibly providing a common
functionality. Furthermore, this classification eases the analysis of
the usually vast number of elementary modes (Kaleta et al., 2006).
2 THEORY OF CHEMICAL ORGANIZATIONS
Extending ideas by Fontana and Buss (1994), the theory of chemical
organizations (Dittrich and Speroni di Fenizio, 2007) provides a new
method to analyze complex reaction networks. A reaction network
hM, Ri is defined by a set of reactions R ⊆ PM (M) × PM (M)
occurring among molecular species M, with PM (M) denoting the
set of all multisets with elements from M. hM, Ri implies an m ×
n stoichiometric matrix S = (si,j ) where si,j is the number of
molecules of species i ∈ M that is produced in reaction j ∈ R
(i.e., right hand side minus left hand side); m = |M|, n = |R|.
At its core, chemical organization theory defines an organization
O ⊆ M as a set of molecular species that is (algebraically) closed
© The Author (2008). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
1
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on June 3, 2013
ABSTRACT
Motivation: Novel techniques are required to analyze computational
models of intracellular processes as they increase steadily in size and
complexity. The theory of chemical organizations has recently been
introduced as such a technique that links the topology of biochemical
reaction network models to their dynamical repertoire. The network
is decomposed into algebraically closed and self-maintaining subnetworks called organizations. They form a hierarchy representing all
feasible system states including all steady states.
Results: We present three algorithms to compute the hierarchy of
organizations for network models provided in SBML format. Two
of them compute the complete organization hierarchy, while the
third one uses heuristics to obtain a subset of all organizations for
large models. While the constructive approach computes the hierarchy starting from the smallest organization in a bottom-up fashion,
the flux-based approach employs self-maintaining flux distributions
to determine organizations. A runtime comparison on 16 different
network models of natural systems showed that none of the two
exhaustive algorithms is superior in all cases. Studying a “genomescale” network model with 762 species and 1193 reactions, we
demonstrate how the organization hierarchy helps to uncover the
model structure and allows to evaluate the model’s quality, for example by detecting components and subsystems of the model whose
maintenance is not explained by the model.
Availability: All data and a Java implementation that plugs into the
Systems Biology Workbench is available from
http://www.minet.uni-jena.de/csb/prj/ot/tools.
Contact: [email protected]
Centler et al
2
3 ALGORITHM I: CONSTRUCTIVE APPROACH
To compute the organizations for a given reaction network, one
could simply test all species combinations for the properties of
closure and self-maintenance in a brute force fashion. However,
this approach is only feasible for networks with few species (i.e.,
less than 30 species) as the number of sets to test equals 2n , with
n being the number of network species. The first more elaborate
algorithm to compute organizations determines in a first step all
semi-organizations of the reaction network. This is done in a recursive manner: given an already determined semi-organization so, the
semi-organizations containing so are computed in the next step. To
find a semi-organization above so, the network structure is taken
into account to select species that, when added to so, are likely to
give rise to a larger semi-organization. In this constructive fashion,
the hierarchy of semi-organizations is computed from bottom up. As
all organizations are also semi-organizations, the first step delivers
all candidate species sets for organizations. Then, in the second step,
all semi-organizations have to be identified that are also organizations. The property of self-maintenance is the property distinguishing
both organization types. All semi-organizations, for which it can be
shown that a flux vector in accordance with the self-maintenance
condition exists, are also organizations. This requires to solve a
linear programming problem for each semi-organization.
3.1 Step 1: Computing Semi-Organizations
The algorithm starts with the smallest semi-organization and creates
the whole hierarchy of semi-organizations from the bottom up.
3.1.1 The Main Loop. The main function computeSemiOrganizations() is depicted as pseudo code in Figure 1. The set
SOsToCheck contains all semi-organizations that have already
been found and that still have to be processed. It is initialized
with the smallest semi-organization computed by closure({}).
The function closure(set) computes the smallest closed set that
contains the species set set. This is done by iteratively adding all
species to set that, according to the reaction rules, can be produced
from the species in set. Each addition might enable new reactions. The iteration stops as soon as no new novel species can be
created. Taking the closure of the empty set delivers the smallest
semi-organization of the network. If the network contains no input
species, it is the empty set. If input species are present, the smallest
semi-organization contains all input species and their closure.
The central function of the main loop is SOsDirectlyAbove(so)
(see Figure 1 for pseudo code). It delivers the set of all semiorganizations that are directly above semi-organization so, that
means all semi-organizations that contain so and that do not contain any other semi-organization that contains so1 . In the Hasse
diagram, these are exactly those larger semi-organizations to which
1 More precisely, at least those semi-organizations are delivered. Under
certain circumstances, additional semi-organizations are returned that are
not directly above semi-organization so but rather contain another semiorganization directly above so. Consider for example the reaction network h{a, b}, {a → a + b}i. The system contains three organizations: {a, b} above {b} above {}. Applied to the empty set, the function SOsDirectlyAbove({}) returns both {b} and {a, b} here, since
function SOsDirectlyAboveContaining() first creates the closure
of its argument (see Section 3.1.2). For argument {a}, the closure is already
{a, b} and hence this semi-organization is returned although not being
directly above the semi-organization {}.
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on June 3, 2013
and self-maintaining. Closure ensures that given the species of an
organization, there is no reaction within the reaction network that
can create a novel species not yet present in the organization. Formally, O ⊆ M is closed, if for all reactions (A → B) ∈ R with
A ∈ PM (O), then also B ∈ PM (O).
Self-maintenance guarantees that all molecular species that are
consumed within the organization O can be recreated simultaneously from organization species at a sufficient rate for their
maintenance. A set of species O ⊆ M is called self-maintaining if
a flux vector v = (v1 , v2 , . . . , vn ) ∈ Rn
≥0 exists such that the following three conditions are fulfilled: (1) For every reaction (A →
B) ∈ R with A ∈ PM (O), its corresponding flux is vA→B > 0.
(2) For every reaction (A → B) ∈ R with A ∈
/ PM (O), its corresponding flux is vA→B = 0. (3) For every species i ∈ O, its
concentration change is nonnegative: (Sv)i ≥ 0. In other words: if
we consider only the subnetwork made up by the species of O (conditions (1) and (2)), we can find a positive flux vector, such that no
species of O decays (condition (3)).
The set of organizations, together with the set inclusion ⊆, forms
a partially ordered set that can be visualized as a hierarchical structure in a Hasse diagram (see Figure 3 for an example). Linking to the
dynamics, Dittrich and Speroni di Fenizio (2007) have shown that
all steady states of the system are instances of organizations, given
that the dynamics of the system is described by a set of ordinary
differential equations.
Besides these fundamental concepts, the following definitions
proved helpful. A set of species S ⊆ M is called semi-selfmaintaining, if all species s ∈ S that are consumed within the
reaction network implied by S are also produced within the same
network. This reaction network implied by S consists of all reactions (A → B) ∈ R with A ∈ PM (S). A species s ∈ S is
consumed (produced) if there exists a reaction (A → B) ∈ R
with A ∈ PM (S) for which s is contained more times in A than
in B (more times in B than in A). Note that each self-maintaining
set is semi-self-maintaining (Dittrich and Speroni di Fenizio, 2007).
A set of species S ⊆ M that is closed and semi-self-maintaining
is called a semi-organization. A (semi-) organization O ⊆ M is a
connected (semi-) organization, if it is empty, or there exists a species s ∈ O so that all species of O are connected to s. Two species
si and sj ∈ O are connected to each other, if there exists a sequence
of n species s1 , · · · , sn with sk ∈ O for k = 1, · · · , n, such that
si and s1 , sk and sk+1 for k = 1, · · · , n − 1, and sn and sj are
directly connected. Two species so and sp ∈ O are directly connected, if there exists a reaction (A → B) ∈ R with A ∈ PM (O),
so ∈ A ∪ B, and sp ∈ A ∪ B. All input species of the network (i.e.,
all species that appear as products in reactions with empty reactant
side) are defined as being connected to each other. Given a set of
organizations O, an organization O ∈ O with its corresponding
set of reactions R(O) := {(A → B) ∈ R | A ∈ PM (O) and
B ∈ PM (O)} is an elementary organization, if thereSexist no subset of organizations P ⊆ O \ O such that R(O) = P ∈P R(P ).
In other words, when viewing organizations as sets of reactions, an
elementary organization cannot be represented as the union of other
organizations. An organization O is called a reactive organization
if all species it contains are educt or product in at least one reaction of the set of reactions R(O) that occur within O. Note that all
connected organizations with more than one species are also reactive
organizations. The opposite is not true, as a reactive organization
may contain several connected subnetworks that are disparate.
Computing Chemical Organizations
so has a connection. In each iteration of the main loop, the smallest semi-organization2 current is taken from SOsToCheck. All
semi-organizations returned by SOsDirectlyAbove(current)
are then added to SOsToCheck, and current is removed from
the set. The iteration stops when no semi-organization is left in
SOsToCheck. In order to avoid processing the same set of species twice, a hash structure can be used to keep track of already
processed sets (not shown in pseudo code).
3.2 Step 2: Test for Self-Maintenance
Every semi-organization determined in the first step of the algorithm
is a candidate for an organization. For a semi-organization to also
be an organization, it must be shown that a flux vector exists in
accordance with the self-maintenance condition. Let O be a semiorganization with m′ = |O| species implying n′ reactions (i.e. all
reactants of n′ reactions are contained in O). With SO being the
stoichiometric matrix of this subnetwork, we must determine the
′
existence of a flux vector v ∈ Rn
≥0 with v > 0 and SO ·v ≥ 0. With
the solution space of these inequalities forming a convex polyhedral
cone in the positive orthant, originating in the point of origin, the
problem is equivalent to finding a flux vector v with v > 1 and
SO · v ≥ 0. This is a linear programming problem that can be
solved using the simplex method (Dantzig, 1963). Since only the
existence of such a flux vector v is of concern, only the first phase
of the simplex method needs to be performed.
3.3 Connected Semi-Organizations
A simple combinatorial explosion can lead to a vast number of chemical organizations. Consider the reaction network
h{s1 , · · · , sn }, {}i. The network does not contain any reaction.
Consequently, all n2 species combinations are closed and selfmaintaining, and hence organizations. However, the organizations
contain only isolated network species that do not react with each
other. In order to avoid this combinatorial complexity, it is adequate to only consider connected organizations (cf. Section 2). In
such organizations, all contained species form a single subnetwork
2 If
the smallest semi-organization is not unique, the choice is random.
Function SOsDirectlyAbove
Input: semi-organization so, reaction network hM, Ri
Output: set of all semi-orgs. directly above so in result
result ← ∅ ; usableSpecies ← M \ so ;
foreach s ∈ usableSpecies do
result ← result ∪ SOsDirectlyAboveContaining(so, { s } );
end
Function SOsDirectlyAboveContaining
Input: semi-organization so, species set species to be contained in new semi-orgs., reaction network hM, Ri
Output: set of all semi-organizations directly above so that
contain species in result
result ← ∅ ; closure ← closure(so ∪ species );
if closure is semi-self-maintaining then
result ← { closure } ;
else
spcsToProduce ← {s ∈ closure | s is consumed but not
produced in closure } ;
producingSets ←producerSets(spcsToProduce );
foreach set ∈ producingSets do
result ← result ∪ SOsDirectlyAboveContaining(so, species ∪ set );
end
end
Fig. 1. The constructive approach to compute semi-organizations.
in which all species are connected to each other via reactions.
In order to compute connected semi-organizations, only the function SOsDirectlyAbove() needs to be modified. Now, only
those species are added to an already discovered semi-organization
that are directly connected to a member species of that semiorganization. The Supplement contains the pseudo code. The hierarchy of connected organizations can also be used to compute the
full organization hierarchy as detailed in the Supplement.
4 ALGORTIHM II: FLUX-BASED APPROACH
The second algorithm takes a somehow opposite strategy as Algorithm I by combining self-maintaining flux distributions until closed
sets are discovered. While the first approach operates on species,
the flux-based approach operates on reactions. First, the flux based
approach computes elementary organizations (Step 1 and 2), which
3
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on June 3, 2013
3.1.2 Functions for Computing Semi-Organizations. In order to
find a larger semi-organization that contains semi-organization so,
it is checked in function SOsDirectlyAbove(so) for each
species not in so, whether this species is part of such a larger
semi-organization.
The function SOsDirectlyAboveContaining(so,
speciesSet) computes all semi-organizations that are directly
above so and contain the species in speciesSet (see Figure 1
for pseudo code). First, the closure of the union of so and
speciesSet is computed. If the computed closure is semi-selfmaintaining, a semi-organization with the desired properties is
found and the function returns. If not, those species in the closure are identified that are consumed but not produced. In order
to become a semi-organization, these species must be produced
somehow. The function producerSets(speciesSet) (see Supplement for pseudo code) returns all possible species combinations
that can produce all species in speciesSet. For each such combination, SOsDirectlyAboveContaining() is recursively called
again. This time, the producer combination is additionally required
to be present in the new semi-organizations.
Function computeSemiOrganizations
Input: reaction network hM, Ri
Output: set of all semi-organizations in result
result ← ∅ ; SOsToCheck ← { closure( {} ) };
while SOsToCheck 6= ∅ do
current ← getSmallestSO(SOsToCheck );
SOsToCheck ← SOsToCheck ∪
SOsDirectlyAbove(current );
SOsToCheck ← SOsToCheck \ { current };
result ← result ∪ { current };
end
Centler et al
Step 1: Computing Extreme Rays To compute the set of extreme
rays VB required for Step 2, we implemented the well-known Schuster algorithm (Schuster et al., 1999) to compute elementary modes.
This algorithm computes the extreme rays of a convex cone P ′
defined by v ≥ 0 and S·v = 0. By adding an outflux for each metabolite to the stoichiometric matrix used for the elementary mode
computation, the algorithm can also compute the extreme rays of
the cone P.
Step 2: Computing Elementary Organizations The central
function in the computation of elementary organizations is
organizationsAbove() (see Figure 2 for its pseudo code).
Given a self-maintenance flux vector v ∈ P and its corresponding reaction set v set , it computes all reactive organizations O that
contain M (v set ) and for which there exists no other organization
being a subset of O and a superset of M (v set )3 . Hence, the smallest
organizations containing M (v set ) are computed.
First, the closure of the reaction set v set , respectively M (v set ),
is computed. This is done by taking the species set M (v set ) and
iteratively adding all species to the set that can be created by reacset
tions of the network from the species set. The reaction set vClosure
contains all reactions that can take place in the generated closed
set of species. If this reaction set is identical to v set , the species
set
set M (v set ) is an organization. The reaction set vClosure
contains
set
more reactions than v
if either species were added, or M (v set )
is closed but v set does not contain all reactions that are possible in
set
this set. One such reaction is taken, and all reaction sets vB
∈ VBset
3 More
precisely, at least those organizations are computed. Under certain circumstances, organizations are also in the result set for which a subset
Os is also an organization and contains M (vset ). However, in such cases
Os is also contained in the resulting set of organizations.
4
Function organizationsAbove
Input: reaction network hM, Ri, set of reaction sets VBset
corresponding to the extreme rays spanning the selfmaintenance cone P, reaction set v set corresponding to
a self-maintenance flux vector v ∈ P
Output: set of the smallest reactive organizations containing
the reactions of v set in result
set
result ← ∅ ; vClosure
← closure(v set );
set
if vClosure
== v set then
result ← { M(v set )};
else
set
select one reaction r with r ∈ (vClosure
\ v set );
set
set
foreach vB
∈ VBset with r ∈ vB
do
result = result ∪ organizationsAbove(v set ∪
set
vB
);
end
end
Fig. 2. The central recursive function of the flux-based approach to compute
organizations.
that contain this reaction are consecutively combined with the original reaction set v set and the function is called again recursively.
set
As the initial reaction set v set and the extreme ray reaction sets vB
correspond to flux vectors fulfilling the self-maintenance condition,
also a flux vector vu fulfilling the self-maintenance property exists
set
for the union vuset = v set ∪ vB
. Hence, all reaction sets that are
considered in the recursive function calls are associated with selfmaintaining flux vectors. To obtain all elementary organizations, the
function organizationsAbove() is called for each reaction
set
set vB
∈ VBset corresponding to one of the extreme rays defining
P.
Step 3: Computing Reactive Organizations Elementary organizations are combined to determine all reactive organizations. This is
done by taking all possible combinations of two elementary organizations and calling organizationsAbove() for the union
of their reaction sets. For every newly discovered organization,
this organization must again be combined with each of the elementary organizations, and organizationsAbove() must be
called again for the reaction set unions.
Step 4: Computing All Organizations The organizations we have
obtained so far all possess a different set of reactions. Consequently,
the final step consists of searching for organizations having the same
set of reactions as already discovered ones, but containing different
species sets. Hence, we need to determine for all discovered organizations all species sets, that can be added to the organization without
changing its set of reactions. This can be done by simply inspecting
the reaction list.
5 ALGORITHM III: HEURISTIC APPROACH
The flux-based approach requires the computation of the extreme
rays of the cone P as input. The computation of extreme rays was
shown to belong to the class of NP problems (Samatova et al., 2002)
and is the bottle neck in the flux-based approach. However, instead
of starting with the set of all extreme rays VB , it is feasible to
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on June 3, 2013
are combined in Step 3 to obtain reactive organizations, that is, organizations where each species participates at least in one reaction
within the organization. Organizations containing species that do
not react are determined in the final Step 4.
Starting with the condition of self-maintenance, methods from
convex analysis can be employed. Given a reaction network hM, Ri
and its m × n stoichiometric matrix S, a flux vector v ∈ Rn
≥0 fulfilling the self-maintenance condition must be found to show that a
species set is self-maintaining. All such flux vectors lie in a convex
polyhedral cone P in the n-dimensional flux space Rn
≥0 , originating
in the point of origin. The cone is defined by the n + m inequalities v ≥ 0 and S · v ≥ 0. These constraints can be transformed
into a matrix A representing the set of spanning vectors VB or
extreme rays of P (Gagneur and Klamt, 2004). Each point within
P can be written as a linear combination of these extreme rays.
Thus, we can compute organizations by searching for combinations of extreme rays whose corresponding set of species fulfills the
closure condition.
Given a flux vector v, Algorithm II relies only on the set of reactions that have positive fluxes in v and not on their specific flux
values. Hence, we define v set as the set of reaction indices containing all reactions that have positive fluxes in v. Considering the set
of extreme rays VB defining P, VBset describes the set of reaction
set
sets vB
corresponding to the extreme rays vB ∈ VB . The species
set that corresponds to a reaction set v set is denoted by M (v set ).
It contains all reactants and products of the reactions contained in
v set .
Computing Chemical Organizations
skip the first step of the algorithm by directly starting with a set
of elementary organizations. To obtain such a set, a simple heuristic approach is used. A random walk through the reaction network
delivers a set of species. After computing the closure of this species
set, it is tested whether the closure also fulfills the self-maintenance
condition (see Section 3.2). After having determined a sufficient
large set of organizations, the associated set of elementary organizations Oel is determined. The reaction sets of the organizations
in Oel are then used as input to the second step of the extreme ray
algorithm to compute the complete set of organizations that can be
found by combining organizations from Oel . The heuristic approach
was able to correctly determine the whole set of organizations for all
tested networks to which we could also apply the regular flux-based
method.
The algorithms have been implemented in Java as a command line
tool and as an application that integrates into the Systems Biology Workbench (Sauro et al., 2003). Networks can be provided in
SBML format (Finney and Hucka, 2003), or in a human friendlier
proprietary format. SBML files are read using a Java-based SBML
parser from the JigCell project (Vass et al., 2006). The computed
hierarchy of organizations is stored in XML format (OrgML) and
can be visualized and interactively explored using the accompanying
Java application OrgView. To solve the linear programming problem
for checking for self-maintenance in the constructive approach, we
use lpsolve (Berkelaar et al., 2005). In order to study the runtime
behavior of the deterministic Algorithms I and II, we compute the
organizations for 16 different network models of natural systems
including a series of ten networks that increase in size in an iterative
fashion created using BioNetGen (Blinov et al., 2004).
The results are shown in Table 1. The complete hierarchy of
connected organizations using Algorithm I could be determined
for all networks. After 24 hours of computing time, Algorithm
I did not terminate for the Mars nightside network and networks
EGFR 5-9. Algorithm II did not terminate for the Mars night- and
dayside networks and networks EGFR 5-10. Generally, computing the hierarchy of connected organizations is much faster than
determining the complete hierarchy of organizations. Comparing
the runtime of Algorithms I and II shows that no method is superior in all cases. Algorithm II shows weak performance when the
number of extreme rays is large, as in the full martian photochemistry models, which contain a large number of cycles. When the
extreme rays can be computed quickly, Algorithm II tends to be
faster (e.g., EGFR 4). Although there is a tendency that larger networks require longer computation time, the size of the network
alone is only a limited indicator for the (expected) runtime. Adding a
single reaction can change the number of organizations and the runtime dramatically (e.g., Mars, nightside vs. Mars, dayside, Table 1).
Additionally, larger networks not necessarily contain more organizations. For example, the largest network EGFR 10 has only 32
organizations, while EGFR 4, a subnetwork of the former, contains
12288 organizations. However, computing these many organizations using Algorithm I is almost twice as fast as computing the mere
32 organizations of the larger network.
Model
Species/
Reacts.
Dry Mars, night 7/
Dry Mars, day 7/
Mars, night
31/
Mars, day
31/
Lambda phage 55/
Central E. coli 92/
EGFR 1
7/
EGFR 2
8/
EGFR 3
10/
EGFR 4
19/
EGFR 5
37/
EGFR 6
69/
EGFR 7
131/
EGFR 8
295/
EGFR 9
356/
EGFR 10
356/
Organizations
(Alg. I / Alg. II)
Connected
Elem. Extr.
Orgs. (Alg. I) Orgs. Rays
15
22 (1s/1s)
7
16
6 (1s/1s)
3
103 1088640 (9.2d/-)
27
104
1484 (50s/-)
8
81
7 (1s/2s)
2
168
4 (2s/3s)
1
3
48 (2s/1s)
8
5
48 (2s/1s)
8
8
96 (4s/1s)
9
21 12288 (479s/13s) 16
76
n/a (-/-)
25
213
n/a (-/-)
39
494
n/a (-/-)
70
1260
n/a (-/-)
174
3088
n/a (-/-)
72
3749
32 (831s/-)
13
(1s)
(1s)
(2s)
(1s)
(1s)
(1s)
(1s)
(1s)
(1s)
(1s)
(2s)
(3s)
(8s)
(122s)
(863s)
(1278s)
1
3
1
7
5
1
2
2
2
2
2
2∗
3∗
5∗
6∗
8∗
0
24
72
103
1
2
3
5
170
-
The runtime is computed as the sum of user and system time reported by memtime of
the constructive and flux-based approach, using a Pentium 4 processor at 1.8 GHz and 1 GB
RAM. The four Mars models are taken from Centler and Dittrich (2007), the lambda network
was derived from a petri net model by Doi (2005), the E. coli model is taken from Centler
et al. (2007), and the model EGFR 10 of epidermal growth factor receptor (EGFR) is taken
from Blinov et al. (2006); EGFR 1 - EFGR 9 represent steps of smaller networks in the
iterative network generation process of BioNetGen leading to EFGR 10. In cases marked ’-’,
the algorithm did not terminate within 24 hours. Runtime is rounded up to seconds and also
includes the computation of the hierarchical relations between organizations necessary for
their representation in a Hasse diagram. In cases marked ’*’, the heuristic approach was used
to estimate the number of elementary organizations. In all these cases, all reactive connected
organizations were found. They were computed within the first 10 minutes of a total runtime
of 24 hours. In all networks except for Mars, dayside and Lambda phage, the elementary
organizations are identical to the reactive connected organizations. For the exceptions, all
connected organizations are reactive.
6.1 Performance Analysis
The parameters network size, number of organizations, and required
runtime are only weakly correlated. The main factor determining
both the number of organizations and runtime is not the network
size, but the specific network topology. For any given number
of components it is possible to create a reaction network that
either does not contain any semi-organization or in which any species combination forms a semi-organization. This is true even for
connected semi-organizations, as detailed in the Supplement. One
key element of the constructive algorithm is to find larger semiorganizations by recursivly adding species to semi-organizations
that already have been computed. Instead of trying to add species
in all possible combinations in brute-force manner, this process is
guided by the network topology. For a given species set containing species that are not yet produced from the set, such species
are added that enable reactions that lead to the production of the
former species, in order to create a semi-self-maintaining species
set. If, for example in an extreme case, the network only consists
of decay reactions for all species, the algorithm discovers in the
first recursion step that no species can be added to the empty species set that would enable a reaction that creates other species. In
5
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on June 3, 2013
6 IMPLEMENTATION AND PERFORMANCE
Table 1. Number of regular, connected, and elementary organizations and
extreme rays in network models, and required runtime of the constructive (Alg. I)
and flux-based (Alg. II) approach
Centler et al
this case, the algorithm quickly terminates, independent of network
size, returning the empty set as the only semi-organization.
While the number of extreme rays usually explodes with increasing network size (Gagneur and Klamt, 2004), this is not necessarily
the case for reactive connected organizations (cf. Table 1). As
pathways consist of combinations of adjacent reactions, the addition of a reaction to the network increases the number of potential
pathways through the system. For organizations however, the addition of a reaction can even reduce their number. This can be
exemplified for the number of closed sets of a system. If the network consists of species without any reactions, every species set is
a closed set. Adding a reaction reduces the number of closed sets as
the set of reactants is no longer closed. Accordingly, the number of
organization candidates is reduced.
6.2 Runtime Complexity
7 APPLICATIONS
To demonstrate the use of organization theory to analyze biochemical reaction network models, we apply the method to a genomescale metabolic network of E. coli by Reed et al. (2003). After
splitting reversible reactions into explicit forward and backward
reactions, the reaction network contains 762 species and 1193 reactions. As the complete hierarchy of organizations could neither be
determined by Algorithm I nor Algorithm II, the heuristic approach
was used and only reactive organizations are considered. Four different scenarios which differ in input species and decay reactions as
shown in Table 2 are studied. The resulting organization hierarchies
are depicted in Figure 3.
In the first scenario, exactly those species are defined as input
that are also input in the original network by Reed et al. (2003). A
complex growth medium is modeled (see Table 2 for a list of input
species). The heuristic approach computes 70 elementary organizations after 48 hours of computing time, creating an organization
hierarchy of 88296 organizations. No organization contains the
complete network. However, the three largest organizations contain
the pseudo species Biomass, indicating states in which the cell is
able to produce biomass required for its survival.
In the second scenario, a decay reaction is added for all network
species. By this, all species are assumed to decay spontaneously,
simulating flow conditions. To be part of an organization, a species must now be produced at a positive rate from the input species
set. While in the first scenario organizations exist which only consist of closed reaction loops, for example a reversible reaction in
which one species reacts to another and back, such cycles cannot
form organizations in the second scenario. For this scenario, the
heuristic approach only identified one organization. It contains the
input species and all species that can be created from the input. It is
6
Fig. 3. Hasse diagram of organizations of the genome-scale model of E. coli
metabolism for the four scenarios as listed in Table 2. Organizations are
vertically arranged according to size, with the smallest organization at the
bottom (31 species in Scenarios 1, 2, and 4, and 34 species in Scenario
3) and the largest at the top (567 species in Scenario 1 and 547 species in
Scenarios 3 and 4). A link between two organizations indicates that the upper
organization contains the lower and there is no other organization between
them. For the first scenario, only the elementary organizations are shown.
the closure of the input species, that means it is the smallest closed
species set that contains the input species.
To explain this surprising finding, we analyzed in the first scenario, how organizations are created by the set of organizations
to which it has downlinks. Several mechanisms can form the larger organizations. In the simplest case, the union of the downlink
organizations already represents the larger organization. In other
cases, interactions between species of downlink organizations create
new species that are part of the larger organization. And finally,
in some cases it is necessary to add novel species to create the
larger organization. The closure of the union of downlink organization species and the additional species then forms the larger
organization. We call the smallest such sets of additional species
creator sets. Among the creator sets of the first scenario are 16 nonmetabolic network species representing the acyl carrier protein in
several states, tRNA, and thioredoxin, as listed in Table 2. Being a
metabolic network model, the model does not contain the de novo
biosynthesis pathways of these species. They cannot be created from
the network.
In the third scenario, we add input reactions for the three nonmetabolic species acyl carrier protein, thioredoxin, and tRNA-Glu.
This ensures that these species are unconditionally available to the
system. In this setting, the network contains two organizations. The
smallest organization again is the closure of the input species. The
second organization contains 547 species including pseudo species
Biomass, indicating bacterial growth. Adding the currency metabolites ATP, NAD, coA, mq8, and proteins thioredoxin and ACP, and
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on June 3, 2013
Organization computation is NP-hard. This can be shown by analyzing a specific subproblem, the question whether a reaction network
contains a reactive organization apart from the empty set or not. This
problem is NP-complete and thus we conclude that organization
computation is NP-hard. The proof is detailed in the Supplement.
It is based on a reduction to the 3-SAT problem. A formula F in
3-SAT can be translated into a reaction network for which the existence of a non-empty reactive organization implies the satisfiability
of F . Vice versa, if there is no solution such that F evaluates to true,
the network does not contain a non-empty reactive organization.
Computing Chemical Organizations
Table 2. Organizations of the genome-scale model of E. coli metabolism in four scenarios differing in input and decay reactions
Scenario
Input
Decay
Reactions
Connected Orgs.
Elementary Orgs.
Reactive Orgs.
1
2
3
4
CM
CM
CM + NMS*
CM
All
All
All - NMS
1193
1955
1958
1939
n.a.
1
2
4
≥ 70
1
2
4
n.a.
1
2
4
Largest Organization
Species
Reactions
567
31
547
547
956
77
1482
1464
CM denotes the set of input species given by Reed et al. (2003), they are: acetate, CO2 , Fe2+ , D-Glucose, glycine, water, H+, K+, D-Lactate, L-Lactate,
sodium, ammonia, oxygen, phosphate, sulfate, and succinate. NMS denotes the set of non-metabolic species including proteins and tRNA; they are: 3hmrsACP,
ACP, acACP, actACP, apoACP, ddcaACP, hdeACP, malACP, myrsACP, octeACP, palmACP, tdeACP, trdox, trdrd, glutrna, and trnaglu (for abbrevations see the
Supplement of Reed et al. (2003)). NMS* denotes the basic non-metabolic species ACP, thioredoxin, and tRNA-Glu. Organizations were computed using the
heuristic approach, running for 48 hours. Except for the first scenario, all organizations were found quickly at the beginning of the computation. Thus, only the
results for Scenarios 2, 3, and 4 can be assumed to be complete.
A total of 215 species do not appear in any organization in Scenarios 3 and 4 (see Supplement for a species list). Among them are
the 28 substrate deadend species, which only appear as reactants but
never as products in reactions. Such deadends indicate incomplete
knowledge regarding the metabolism. Besides the obvious incompleteness of the model indicated by the presence of substrate and
product deadend metabolites, organization theory provides a second
measure of incompleteness. For species that do not appear in any
organization, there exists no feasible state in which these species can
be maintained, or produced at positive rates under the given growth
conditions. Hence, the reaction network model does not explain the
synthesis of such species.
8 CONCLUSION
We have presented three algorithms to compute the hierarchy of
organizations for a given reaction network. Two of them, the constructive and the flux-based approach, compute the hierarchy of
organizations in an exhaustive manner. However, due to the exponential nature of the problem as a network can be constructed
in which any species subset is an organization, these algorithms
not always finish in reasonable time. In such cases, the heuristic
approach can be used to compute at least a significant subset of
all organizations. While the constructive approach computes chemical organizations in a bottom-up manner starting from the smallest
organization, the flux-based approach combines extreme rays of
the self-maintenance cone to find organizations. Testing both algorithms on several network models shows that neither of them is
superior in all cases. Not depending on extreme rays, the constructive approach has advantages in networks where the number
of extreme rays is very large (e.g., the mars model). When the network contains many species but few extreme rays, the flux-based
approach is favorable. Since the number of extreme rays usually
grows exponentially with network size, this case might proof rare.
However, if a network model is to be analyzed, it cannot be guaranteed beforehand, which algorithm will be more appropriate. If the
constructive algorithm is chosen, it is sufficient for many applications to only compute the hierarchy of connected organizations which
is usually much faster then computing all organizations.
If both the constructive and the flux-based algorithm fail to compute the organization hierarchy in reasonable time, the proposed
heuristic algorithm, which is based on the flux-based approach, can
7
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on June 3, 2013
tRNA-Glu to the small organization is sufficient to create the larger
organization.
In the fourth scenario, the unconditional availability of the nonmetabolic species is relaxed. Instead of providing them as input, we
remove their decay reactions. In this case, we obtain four organizations. The smallest organization is again the closure of the input
species. The organization next in size, containing 487 species, is
created by adding the currency metabolites ATP, NAD, coA, mq8,
and protein thioredoxin. Adding ACP to this organization gives rise
to the second largest organization containing 532 species, including
pseudo species Biomass. Adding tRNA-Glu finally creates the largest organization encompassing 547 species. This is identical to the
largest organization in the third scenario.
Analyzing the four different scenarios reveals the importance of
the non-metabolic species. Although being essential for biomass
production, their synthesis is not included in the reaction network
model. The largest metabolite sets forming self-maintaining networks did not contain all species of the network model. This hints to
parts of the network model that need refinement in order to become
a truly genome-scale network model of E. coli metabolism.
Taking an evolutionary perspective, one could ask what is the
smallest set of species that can create the whole metabolism as
represented by the largest organization from the input species and
their direct derivatives as represented by the smallest organization.
We find that such a creator set contains the non-metabolic species
and four (non-unique) more species, for example ATP, NAD, coA,
and mq8. For growth on a complex medium that provides 16 food
species (see Table 2), these creator species are enough to create all
547 species present in the largest organization. When assuming that
the metabolism has evolved from a small set of metabolite transforming enzymes to a more complex network, the creator sets can give
hints on species that were crucial in the expansion process of the
metabolism.
If the currency metabolites and the non-metabolic species are
assumed to be unconditionally available, the network only contains
one organization encompassing the 547 species of the largest organizations in Scenarios 3 and 4. The lack of a richer organizational
structure indicates that the metabolism acts as an indivisible unit.
The organization hierarchy of 4 organizations in Scenario 4 is a mere
artefact of the incompleteness of the network model, as it does not
contain the de novo biosynthesis pathways for all of its components.
Centler et al
ACKNOWLEDGEMENT
We are grateful to Axel von Kamp and Stefan Schuster for helpful discussions on extreme rays and elementary modes. We thank
two anonymous reviewers for helpful comments. We acknowledge
financial support by the BMBF (0312704A), DFG (Di852/4-1/2),
and EU (12789).
REFERENCES
Berkelaar, M., Eikland, K., and Notebaert, P. (2005). lp solve: Open source (mixedinteger) linear programming system, version 5.5.
8
Blinov, M. L., Faeder, J. R., Goldstein, B., and Hlavacek, W. S. (2004). BioNetGen:
software for rule-based modeling of signal transduction based on the interactions of
molecular domains. Bioinformatics, 20(17), 3289–3291.
Blinov, M. L., Faeder, J. R., Goldstein, B., and Hlavacek, W. S. (2006). A network
model of early events in epidermal growth factor receptor signaling that accounts
for combinatorial complexity. Biosystems, 83(2-3), 136–151.
Centler, F. and Dittrich, P. (2007). Chemical organizations in atmospheric photochemistries - a new method to analyze chemical reaction networks. Planet Space Sci,
55, 413–428.
Centler, F., Speroni di Fenizio, P., Matsumaru, N., and Dittrich, P. (2007). Chemical
organizations in the central sugar metabolism of escherichia coli. In A. Deutsch,
L. Brusch, H. Byrne, G. de Vries, and H.-P. Herzel, editors, Mathematical Modeling
of Biological Systems, Volume I, pages 109–123, Boston. Birkhäuser.
Dantzig, G. B. (1963). Linear Programming and Extensions. Princeton University
Press, Princeton.
de Jong, H. (2002). Modeling and simulation of genetic regulatory systems: a literature
review. J Comput Biol, 9(1), 67–103.
Dittrich, P. and Speroni di Fenizio, P. (2007). Chemical organization theory. Bull Math
Biol, 69(4), 1199–1231.
Doi,
A. (2005).
A hfpn model of bacteriophage lambda,
http://www.genomicobject.net/member3/GONET/lambda.html.
accessed in Feb. 2005.
Finney, A. and Hucka, M. (2003). Systems biology markup language: Level 2 and
beyond. Biochem Soc Trans, 31(Pt 6), 1472–1473.
Fontana, W. and Buss, L. W. (1994). ’The arrival of the fittest’: Towards a theory of
biological organization. B Math Biol, 56, 1–64.
Gagneur, J. and Klamt, S. (2004). Computation of elementary modes: a unifying
framework an the new binary approach. BMC Bioinformatics, 5, 175.
Heinrich, R. and Schuster, S. (1996). The Regulation of Cellular Systems. Chapman &
Hall, New York.
Kaleta, C., Centler, F., and Dittrich, P. (2006). Analyzing molecular reaction networks:
From pathways to chemical organizations. Mol Biotechnol, 34(2), 117–124.
Matsumaru, N., Centler, F., Speroni di Fenizio, P., and Dittrich, P. (2006a). Chemical
organization theory applied to virus dynamics. it - Information Technology, 48(3),
154–160.
Matsumaru, N., Speroni di Fenizio, P., Centler, F., and Dittrich, P. (2006b). On the evolution of chemical organizations. In S. Artmann and P. Dittrich, editors, Proceedings
of the 7th German Workshop on Artificial Life, pages 135–146, Berlin. Aka.
Matsumaru, N., Centler, F., Speroni di Fenizio, P., and Dittrich, P. (2007). Chemical
organization theory as a theoretical base for chemical computing. Int J Unconv
Comput, 3(4), 285–309.
Papin, J. A., Hunter, T., Palsson, B. O., and Subramaniam, S. (2005). Reconstruction of
cellular signalling networks and analysis of their properties. Nat Rev Mol Cell Bio,
6(2), 99–111.
Reed, J. L., Vo, T. D., Schilling, C. H., and Palsson, B. O. (2003). An expanded
genome-scale model of escherichia coli k-12 (ijr904 gsm/gpr). Genome Biol, 4(9),
R54.
Samatova, N., Geist, A., Ostrouchov, G., and Melechko, A. (2002). Parallel out-ofcore algorithm for genome-scale enumeration of metabolic systemic pathways. In
Parallel and Distributed Processing Symposium., Proceedings International, IPDPS
2002, pages 185–192.
Sauro, H. M., Hucka, M., Finney, A., Wellock, C., Bolouri, H., Doyle, J., and Kitano,
H. (2003). Next generation simulation tools: the systems biology workbench and
biospice integration. OMICS, 7(4), 355–372.
Schuster, S., Dandekar, T., and Fell, D. A. (1999). Detection of elementary flux modes
in biochemical networks: a promising tool for pathway analysis and metabolic
engineering. Trends Biotechnol, 17(2), 53–60.
Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T. S., Matsuzaki, Y., Miyoshi, F.,
Saito, K., Tanida, S., Yugi, K., Venter, J. C., and Hutchison, C. A. (1999). E-cell:
software environment for whole-cell simulation. Bioinformatics, 15(1), 72–84.
Vass, M. T., Shaffer, C. A., Ramakrishnan, N., Watson, L. T., and Tyson, J. J. (2006).
The jigcell model builder: A spreadsheet interface for creating biochemical reaction network models. IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 03(2), 155–164.
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on June 3, 2013
be applied. On current desktop computers, the algorithms can be
applied to networks having a species number in the order of 100
species for the deterministic approaches and in the order of 1000
species for the heuristic approach.
Testing the algorithms on a set of example network models
demonstrates that neither the runtime nor the number of organizations is strictly correlated with the network size. Both are
more dependend on the network topology which is difficult to
parameterize.
As an example application of chemical organization theory, we
analyze a genome-scale metabolic model of E. coli. The analysis unveiled four important aspects of the model. First, the largest
organizations in the four analyzed scenarios represent the largest
subnetworks that can operate in biologically feasible states, that
means steady states and states with increasing metabolite concentrations. However, these subnetworks do not encompass the whole
network model. Several species did not appear in any organization.
They can serve as starting points in the refinement of the metabolic
model to become truly “genomic-scale”.
Second, three non-metabolic species were identified that proofed
essential for the metabolic network: acyl carrier protein, thioredoxin, and tRNA-Glu. However, their de novo biosynthesis is not
accounted for in the model.
Third, only a small set of species is required in addition to the
input species to create the metabolic network. In Scenario 3, only
the currency metabolites ATP, NAD, Coenzyme A, and Ubiquinone 8 need to be added to the smallest organization to create
the largest organization in which biomass can be produced. This
metabolites are central to the model and might shed light onto the
evolution of the metabolism. And fourth, if the non-metabolic and
currency metabolites are assumed to be available at all times, the
metabolic network model only contains one large organization. The
metabolism appears as a indivisible unit that operates as a whole.
Thus, chemical organization theory can on one hand be used to
identify parts in the network model that need refinement, and on the
other hand it helps to explain model structure and model behavior.
In this way organization theory serves as a valuable tool for assessing the quality of whole scale models. By focussing on biologically
feasible system states including steady states and states featuring
accumulating species, it complements classical flux-based approaches, which focus on steady state flux distributions in metabolic
networks.