Are we ready for genome-scale modeling in plants?

Plant Science 191–192 (2012) 53–70
Review
Eva Collakova a,∗, Jiun Y. Yen b, Ryan S. Senger b
a Department of Plant Pathology, Physiology, and Weed Science, 308 Latham Hall, Virginia Tech, Blacksburg, VA, USA
b Biological Systems Engineering Department, Virginia Tech, Blacksburg, VA, USA
Article info
Article history:
Received 26 January 2012
Received in revised form 17 April 2012
Accepted 18 April 2012
Available online 3 May 2012
Keywords:
AraGEM
C4GEM
Flux balance analysis
Genome-scale model
Metabolic network reconstruction
Photorespiration
Photosynthesis
Plant primary and secondary metabolism
Abstract
As it is becoming easier and faster to generate various types of high-throughput data, one would expect
that by now we should have a comprehensive systems-level understanding of biology, biochemistry, and
physiology at least in major prokaryotic and eukaryotic model systems. Despite the wealth of available
data, we only get a glimpse of what is going on at the molecular level from the global perspective. The
major reason is the high level of cellular complexity and our limited ability to identify all (or at least the important) components and their interactions in a virtually infinite number of internal and external conditions.
Metabolism can be modeled mathematically by the use of genome-scale models (GEMs). GEMs are in silico
metabolic flux models derived from available genome annotation. These models predict the combination
of flux values of a defined metabolic network given the influence of internal and external signals. GEMs
have been successfully implemented to model bacterial metabolism for over a decade. However, it was not
until 2009 that the first GEM for Arabidopsis thaliana cell-suspension cultures was generated. Genome-scale modeling (“GEMing”) in plants brings new challenges primarily due to the missing components and
complexity of plant cells represented by the existence of: (i) photosynthesis; (ii) compartmentation; (iii)
variety of cell and tissue types; and (iv) diverse metabolic responses to environmental and developmental
cues as well as pathogens, insects, and competing weeds. This review presents a critical discussion of the
advantages of existing plant GEMs, while identifying key targets for future improvements. Plant GEMs tend
to be accurate in predicting qualitative changes in selected aspects of central carbon metabolism, while
secondary metabolism is largely neglected mainly due to the missing (unknown) genes and metabolites.
As such, these models are suitable for exploring metabolism in plants grown in favorable conditions,
but not in field-grown plants that have to cope with environmental changes in complex ecosystems.
AraGEM is the first GEM describing a photosynthetic and photorespiring plant cell (Arabidopsis thaliana).
We demonstrate the use of AraGEM given the current (limited) knowledge of plant metabolism and
reveal the unexpected robustness of AraGEM by a series of in silico simulations. The major focus of these
simulations is on the assessment of the: (i) network connectivity; (ii) influence of CO2 and photon uptake
rates on cellular growth rates and production of individual biomass components; and (iii) stability of
plant central carbon metabolism with internal pH changes.
© 2012 Elsevier Ireland Ltd. Open access under CC BY-NC-ND license.
Contents
1. Introduction
1.1. Metabolic networks for GEMing are reconstructed from genome annotation
1.2. Constraining the model allows minimizing the number of possible flux solutions
1.2.1. Stoichiometric constraints (mass balancing) assume no changes of fluxes with time
1.2.2. Constraining the flux solution space minimizes the number of possible flux distribution solutions
1.2.3. Objective functions are a mathematical representation of cellular “goals”
1.3. GEMs can be used to predict metabolic phenotypes in microorganisms
2. Challenges associated with GEMing in plants revolve around the complexity of plant cell metabolism
2.1. Lack of comprehensiveness in plant metabolic networks and missing components
2.2. Compartmentation
2.3. Photosynthesis and photorespiration
2.4. Diversity of plant cell and tissue types and responses to the environment influence the objective functions
3. GEMing in plants
3.1. Arabidopsis thaliana cell-suspension culture GEM
3.2. Seed GEMs in plants
3.2.1. Compartmented barley seed endosperm flux balance analysis models
3.2.2. Compartmented flux balance analysis models of developing Brassica napus embryos
3.3. AraGEM, an Arabidopsis flux balance analysis model representing a C3 plant cell
3.4. C4GEM, maize, sugarcane, and sorghum flux balance analysis models representing C4 plant cells
3.5. Maize GEM iRS1563 representing C4 plants
4. Evaluating AraGEM through simulations
5. How can GEMing in plants be improved?
6. Conclusions
Acknowledgements
References
∗ Corresponding author. Tel.: +1 540 231 7867.
E-mail address: [email protected] (E. Collakova).
0168-9452 © 2012 Elsevier Ireland Ltd. Open access under CC BY-NC-ND license.
http://dx.doi.org/10.1016/j.plantsci.2012.04.010
1. Introduction
Genome-scale models (GEMs) have proven useful in bacteria
for: (i) elucidating metabolism and physiology; (ii) assessing the
essentiality of the metabolic steps; and (iii) designing rational metabolic engineering strategies [1–5]. The first GEM was
generated in 1999 for Haemophilus influenzae using the limited genome annotation and metabolic data available at the time
[6]. With the development and improvement of high-throughput
sequencing, proteomic, and metabolomic technologies, GEMs were
constructed for many organisms including bacteria, cyanobacteria,
fungi, animals, and plants [7–13]. The first GEMs generated
in plants are as follows: (i) Arabidopsis thaliana cell-suspension
culture GEM [14]; (ii) barley seed flux balance models [15,16];
(iii) AraGEM and C4GEM models for Arabidopsis and the monocots
maize, sugarcane, and sorghum, respectively [17,18]; (iv) rapeseed models [19–21]; and (v) updated AraGEM and C4GEM models
[22]. These models and challenges associated with genome-scale
modeling (“GEMing”) will be reviewed here after a brief introduction to generating GEMs.
1.1. Metabolic networks for GEMing are reconstructed from
genome annotation
GEMs are in silico metabolic flux models derived from genome
annotation that contain stoichiometry of all known metabolic
reactions of an organism of interest [3,23]. Metabolic flux (total
reaction rate) is typically defined as the amount of a metabolite
being converted by a particular enzyme per unit of time and per
gram (dry cell weight) of cells (or biomass). Each metabolic reaction in every pathway of a reconstructed network is represented
by relevant metabolites (including cofactors) and an enzyme or
a transporter encoded by the corresponding gene(s) present in
the genome of the organism. These gene-to-protein-to-reaction
associations are indispensable in deciding whether a particular
reaction takes place in the organism of interest as well as validating mutant predictions and implementing the actual metabolic
engineering strategies [24]. Fully sequenced genomes, or at least
comprehensive high-quality transcriptomic data, are needed to
extract the information about the participating enzymes and, subsequently, metabolic reactions and pathways. The databases most
commonly used to extract the metabolic information for specific organisms include: (i) Kyoto Encyclopedia of Genes and
Genomes (KEGG); (ii) the SEED database; (iii) BioCyc and AraCyc; (iv) Reactome; (v) TAIR; and (vi) the Biochemical Genetic
and Genomic knowledgebase (BiGG) [25–31]. It is well established
that many inconsistencies exist among these databases, and the
resulting network must be manually curated by an expert of the
organism of interest. These irregularities include inaccurate gene
annotation, inconsistent metabolite names, and incomplete formulas as well as the presence of unbalanced reactions for
water, protons, high-energy cofactors, and monomers/polymers.
However, several groups in the field are dedicated to building consistency among databases and enabling fully automated
genome-scale metabolic network reconstructions [32]. In the case
of plant cells, when compartmentation of metabolic pathways
in individual organelles is desired, one can consult a number
of databases for the localization of the corresponding enzymes
and their isoforms and seek transporters enabling metabolite
transport between organelles. These databases include: (i) SUBA;
(ii) Plant Proteome Database (PPDB); (iii) AraPerox; and (iv)
UniProtKB/Swiss-Prot [33–36].
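The reconstruction step described above can be illustrated with a minimal sketch that turns a reaction list into a stoichiometric matrix and keeps the gene-to-reaction associations alongside it. The network, the metabolite names, and the gene identifiers are hypothetical, and isoenzymes are assumed to follow simple OR logic; real reconstructions are curated from the databases listed above.

```python
# Hypothetical toy network: negative coefficients are substrates, positive are products.
reactions = {
    "uptake": {"A": 1},                    # -> A (exchange reaction)
    "r1": {"A": -1, "B": 1},               # A -> B
    "r2": {"A": -1, "C": 1},               # A -> C
    "r3": {"B": -1, "C": -1, "D": 1},      # B + C -> D
    "export": {"D": -1},                   # D -> (biomass sink)
}

# Hypothetical gene-to-reaction associations; r2 has two isoenzymes (OR logic assumed).
gene_rules = {"r1": ["geneX"], "r2": ["geneY1", "geneY2"], "r3": ["geneZ"]}


def build_stoichiometric_matrix(reactions):
    """Return metabolites (rows), reaction ids (columns), and the matrix S."""
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    rxn_ids = list(reactions)
    S = [[reactions[r].get(m, 0) for r in rxn_ids] for m in metabolites]
    return metabolites, rxn_ids, S


def reactions_lost(gene, gene_rules):
    """Reactions whose only associated gene is the deleted one."""
    return [r for r, genes in gene_rules.items() if genes == [gene]]


metabolites, rxn_ids, S = build_stoichiometric_matrix(reactions)
for name, row in zip(metabolites, S):
    print(name, row)
print(reactions_lost("geneX", gene_rules))   # r1 has no backup gene
print(reactions_lost("geneY1", gene_rules))  # r2 keeps its second isoenzyme
```

Keeping the gene associations next to the matrix is what later allows a predicted reaction knockout to be mapped back to the gene(s) that would have to be modified.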
1.2. Constraining the model allows minimizing the number of
possible flux solutions
The reconstructed metabolic network and its reaction stoichiometry (stored in a stoichiometric matrix) extracted from
databases determine the possible routes of carbon flow and cofactor
balancing. This represents the first constraint, as metabolites can
only be converted along pathways specific to the studied system.
Nevertheless, this model enables infinite possibilities for combinations of all allowed fluxes and needs to be further constrained to
minimize the number of the possible flux distribution solutions.
GEMs tend to be under-determined, constraint-based flux models yielding multiple solutions to a single problem [37,38]. As with
any under-determined system, one has to minimize the large number of possible flux solutions that are allowed by the structure of
the metabolic network to a few likely solutions. This goal can be
achieved by using various constraints on flux values within individual steps in metabolic pathways [39]. Examples of these constraints
include mass balancing, “physico-thermo-chemical” constraints, and
actual flux measurements. They are used to define a “window” of
possible values a flux can have in the studied system and are briefly
described here. Detailed information on constraint-based modeling
and its applications can be found elsewhere [5,10,39–43].
1.2.1. Stoichiometric constraints (mass balancing) assume no
changes of fluxes with time
During metabolism, concentrations of individual metabolites
change with time t, which is represented by Eq. (1), where $X_i$ is
the concentration of metabolite i and $S_{ij}$ is the stoichiometric coefficient of compound i (placed
in rows) in reaction j (placed in columns) of the metabolic
network. The parameter $\nu$ is a vector containing the flux values through all reactions in the network, with $\nu_j$ the flux through reaction j [3]. The stoichiometric
matrix S contains all metabolites and reactions that ultimately are
reconstructed from the genome annotation of the studied organism, as described above.
$$\frac{dX_i}{dt} = \sum_j S_{ij}\,\nu_j \qquad (1)$$
Because the rate of cell growth is much slower than the rate of
any enzymatic reaction, one may assume metabolic pseudo steady-state for very short time scales such that the rate of change of
metabolite concentrations $dX_i/dt$ is assumed to be zero [3,24,44].
This is the basis for steady-state mass balancing represented by
Eq. (2). This commonly termed “flux balance equation” is typically
solved by linear programming software.
$$S \cdot \nu = 0 \qquad (2)$$
By applying the pseudo steady-state assumption, the differential equation (Eq. (1)) is replaced with a linear equation (Eq. (2)).
This was originally designed for bacteria growing in a chemostat,
where the culture was believed to very nearly achieve steady-state
metabolism. Usefulness of the modeling strategy has resulted in
this approach being applied more broadly. However, for a GEM,
there are usually more reactions than metabolites in S and we
refer to such a system as under-determined [37,38]. The many flux
solutions to an under-determined metabolic network must be narrowed down by introducing additional constraints to reduce the
allowable flux solution space into a “phenotypic solution space”
[38]. An example of the phenotypic solution space is shown in Fig. 1
for three fluxes as a heptahedron. In theory, this phenotypic solution space contains all possible flux combinations that the cell could
ever use. The particular combination of flux values that the cell
will use is dependent on the influences of: (i) the cellular objective
(e.g., maximizing growth rate); (ii) genetic regulation; and (iii) the
environmental and developmental cues. The question we are really
asking is: What combination of the fluxes can be expected in a cell
that has adapted to the specific conditions (e.g., a nitrogen-starved
cell)? In GEMing, knowledge of a “cellular objective” and realistic
“constraints” are required to solve this problem by linear programming with Eq. (2). The topic of cellular objective functions will be
revisited after introducing other potential methods for constraining
the GEM.
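A minimal sketch of the steady-state condition and of under-determination is given below. It assumes a hypothetical five-reaction, three-metabolite network (not taken from any published GEM), checks whether candidate flux vectors satisfy the flux balance equation, and counts the remaining degrees of freedom as the number of reactions minus the rank of S.

```python
from fractions import Fraction

# Hypothetical toy network. Columns: v1 (-> A), v2 (A -> B), v3 (A -> C),
# v4 (B ->), v5 (C ->). Rows: metabolites A, B, C.
S = [
    [1, -1, -1, 0, 0],   # A
    [0, 1, 0, -1, 0],    # B
    [0, 0, 1, 0, -1],    # C
]


def mass_balance(S, v):
    """Return S.v, the net rate of change of each internal metabolite."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def rank(matrix):
    """Matrix rank by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


v_a = [10, 10, 0, 10, 0]    # all carbon routed through B
v_b = [10, 4, 6, 4, 6]      # carbon split between B and C
v_bad = [10, 5, 3, 5, 3]    # A would accumulate, so this is not a steady state

print(mass_balance(S, v_a), mass_balance(S, v_b), mass_balance(S, v_bad))
degrees_of_freedom = len(S[0]) - rank(S)
print(degrees_of_freedom)
```

Both `v_a` and `v_b` balance every metabolite, which is the toy version of the statement that many flux distributions satisfy the same stoichiometry until further constraints and an objective are imposed.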
1.2.2. Constraining the flux solution space minimizes the number
of possible flux distribution solutions
The solution space can be further reduced in size by implementing “physico-thermo-chemical” constraints. The most commonly
used of these constraints include thermodynamic and enzyme
capacity constraints [45–47]. Implementing thermodynamics
requires that each reaction is described by its standard Gibbs free
energy change (derived from the free energies of formation of its metabolites), which together with known (or assumed)
metabolite concentrations, helps to determine the reversibility
constraints for each reaction. Thus, thermodynamically unfavorable reactions are removed by constraining them to zero flux [47]. A
number of reactions operate only in a single direction. For example,
decarboxylases tend to catalyze only the decarboxylation of their
substrates under physiological conditions. If the studied organism lacks
the corresponding carboxylase, the range
of possible flux solutions can be restricted by constraining the
minimal decarboxylase flux to zero. The maximal possible flux constraints can be set by using maximum capacity for a specific enzyme
($\nu_{max}$) if known from in vitro or in vivo enzyme activity studies
[24]. Maximum flux constraints can also be based on measured
substrate uptake rates, which directly relate to the maximal possible biomass formed in a growing cell [24]. More sophisticated
ways to constrain the model include proton and cofactor balancing
[24,48] and integrating a variety of high-throughput data through
integrative metabolic omic analysis [49]. The resulting phenotypic
solution space can ideally be further explored to identify a single
flux distribution (or a family of fluxes) to describe the relationships
among competing metabolic pathways or between metabolism and
the cellular environment. Method development for extracting the
most useful flux distributions from the phenotypic space remains
an important area of systems biology research.
Fig. 1. Simplified GEMing workflow (modified from [47]). Step 1: Compartmented
metabolic network is extracted from genome annotation and available metabolic
and protein location databases. Metabolites are shown as circles of different colors;
external substrates and biomass products are in gray and black, respectively, while
the internal metabolites are in white. Enzymes E localized to different compartments
carry out the reactions and transporters T enable metabolite uptake and transport.
Stoichiometry and mass balancing $S \cdot \nu = 0$ is applied. Step 2: Other constraints (Section 1.2.2) are implemented to minimize the phenotypic allowable solution space
(shown as a heptahedron for all possible combinations of three fluxes $\nu_1$, $\nu_2$, $\nu_3$) within
the 3D flux vector space. Imagine that this space would be 15-dimensional for the
example of the network (for 15 fluxes represented by 8 enzymatic steps and 7
transporters) and 1,798-dimensional for the updated AraGEM [22]. Step 3: Specific
objective functions (Section 1.2.3) are minimized or maximized to obtain solutions
that lie within the phenotypic space (a single solution with flux values of, e.g., 1, 2,
and 0.5 mmol g$^{-1}$ dry weight h$^{-1}$ for $\nu_1$, $\nu_2$, and $\nu_3$, respectively, is shown as a circle).
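The thermodynamic and capacity constraints of this section can be sketched as a small calculation. The example assumes the standard relation dG = dG0 + RT ln Q, approximates activities by molar concentrations, and treats all stoichiometric coefficients as one; the reaction, its standard free energy change, the concentrations, and the capacity value are all hypothetical.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1
T = 298.15    # temperature in K (assumed)


def reaction_gibbs(dg0, substrates, products):
    """Gibbs energy change dG = dG0 + R*T*ln(Q); Q uses molar concentrations."""
    q = math.prod(products) / math.prod(substrates)
    return dg0 + R * T * math.log(q)


def flux_bounds(dg, v_max, tol=1e-9):
    """Translate the sign of dG into (lower, upper) bounds on the net flux."""
    if dg < -tol:
        return (0.0, v_max)    # forward direction only, capped by enzyme capacity
    if dg > tol:
        return (-v_max, 0.0)   # reverse direction only
    return (0.0, 0.0)          # at equilibrium there is no net flux


# Hypothetical reaction S -> P with dG0 = +5 kJ/mol and a capacity of 8 flux units.
low_product = reaction_gibbs(5.0, substrates=[1e-3], products=[1e-5])
high_product = reaction_gibbs(5.0, substrates=[1e-3], products=[1e-2])
print(flux_bounds(low_product, v_max=8.0))    # product kept low: forward flux allowed
print(flux_bounds(high_product, v_max=8.0))   # product accumulated: only reverse flux
```

The same reaction is assigned opposite directions depending on the assumed concentrations, which is why the choice of metabolite levels matters when reversibility constraints are derived this way.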
1.2.3. Objective functions are a mathematical representation of
cellular “goals”
As mentioned above, the phenotypic solution space contains
a very large number of possible flux combinations. Each reaction is represented by a range of flux values (from minimum to
maximum) that the reaction could ever have in the reconstructed
metabolic network. The actual solution (a combination of values
for all fluxes in the network, or what modelers like to call “flux
distribution”) is dependent on the “goal” of the cell. This goal is
commonly described by one or more objective functions when
the flux balance equation is solved by linear programming [38].
Anthropomorphically oriented cellular goals are only a mathematical convenience for capturing the influence of the environment
or genotype on the flux phenotype for modeling purposes and
should be viewed as such in this review. The set of flux values
in a cell that, for example, operates to grow as
fast as possible will differ from that in a cell that operates to generate and
store chemical energy, and this is reflected in objective functions.
In the bacterial world, these objective functions typically include:
(i) regulating growth (minimizing or usually maximizing biomass
production); (ii) increasing the production of desired metabolites
(maximizing fluxes through particular pathways); (iii) minimizing
production of undesired metabolites (minimizing fluxes through
the relevant pathway(s)); and/or (iv) maximizing chemical energy production
(e.g., ATP or reductant) [1,3,38,50]. The rationale behind maximizing the biomass production is based on the natural selection
process. Bacteria growing under favorable growth conditions will
grow and divide as fast as possible to out-compete any other potentially present bacterial species and become the dominant species
in that environment. On the other hand, many bacteria are known
to grow slowly and export antimicrobial agents as a mechanism of
competition. For chemostat-based GEMing studies when competition is not an issue, the most commonly used objective function is
maximizing the biomass production.
The production of existing (e.g., antibiotics, solvents, etc.) or
novel metabolites (e.g., by metabolic engineering) is usually connected with low growth rates. In that case, the cellular goal becomes
a bi-level optimization, maximizing the growth rate as well as the
production of a target metabolite. Some highly reduced metabolites
such as fatty acids found in oil and membranes require significant
chemical energy in the form of ATP and reductant (e.g., NADPH)
for their synthesis. If the goal is to produce high levels of these
metabolites then maximizing the fluxes through the pathways that
are known to produce these high-energy cofactors will also need
to be considered as another objective function. It is apparent that
some objective functions need to be maximized, while others need to be minimized to obtain a desired metabolic phenotype. Regardless, just as
protein folding is a thermodynamically driven process, distribution
of metabolic flux throughout a network is also thermodynamically
driven. Researchers in systems biology are still looking for these
connections. Until found, the use of multiple objective functions
to describe cellular function is the best approach to study multiple
behaviors in the cell.
From the mathematical perspective, flux balance analysis is the
most popular method to explore the phenotypic solution space
[1,3,38]. This approach uses linear programming (with Eq. (2)) to
minimize or maximize the objective function(s) within the phenotypic solution space to find the flux solution(s). Because the
objective function is either minimized or maximized, it represents
extremes at either end of the average (possibly physiologically relevant) objective function. As a result, the solution usually lies at the
edges of the phenotypic solution space (Step 3 in Fig. 1). There is
much debate about how well this represents an in vivo scenario. The
use of multiple objective functions is needed to model, for instance,
stressful or nutrient-limited conditions that are not represented by
extreme objective functions; this currently represents a major
challenge of GEMing. Much research is also in progress to build
genetic regulation into this approach.
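A toy flux balance analysis can make the geometry concrete. The sketch below assumes a hypothetical network (v1: uptake of A; v2: A to B; v3: A to C; v4: B to biomass; v5: C to byproduct) in which steady state leaves two free fluxes, v2 and v3. Instead of a linear programming solver, it swaps in brute-force enumeration of the corner points of the feasible region, which is only practical for problems this small; the bounds are illustrative values.

```python
from itertools import combinations

# Inequalities p*x + q*y <= r on the free fluxes x = v2 and y = v3 (hypothetical bounds).
constraints = [
    ((-1, 0), 0),    # v2 >= 0
    ((0, -1), 0),    # v3 >= 0
    ((1, 1), 10),    # v2 + v3 <= measured uptake rate of A
    ((1, 0), 6),     # v2 <= enzyme capacity
]


def vertices(constraints, tol=1e-9):
    """Corner points of the feasible region, from pairwise line intersections."""
    points = []
    for ((a0, a1), b), ((c0, c1), d) in combinations(constraints, 2):
        det = a0 * c1 - a1 * c0
        if abs(det) < tol:
            continue  # parallel boundaries never intersect
        x = (b * c1 - a1 * d) / det
        y = (a0 * d - b * c0) / det
        if all(p * x + q * y <= r + tol for (p, q), r in constraints):
            points.append((x, y))
    return points


def full_flux_vector(x, y):
    """Recover v1..v5 from the free fluxes using the steady-state relations."""
    return [x + y, x, y, x, y]


corner_points = vertices(constraints)
best = max(x for x, _ in corner_points)  # objective: maximize biomass flux v4 = v2
optimal = [p for p in corner_points if abs(p[0] - best) < 1e-9]
for x, y in optimal:
    print(full_flux_vector(x, y))
```

In this example two corner points reach the same maximal biomass flux, so the optimum sits on an edge of the feasible region and is not unique, in line with the discussion of solutions lying at the boundary of the phenotypic solution space.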
Modeling (including linear programming) is typically done in
MATLAB using the COBRA toolbox that has been developed as a
shared resource with numerous useful tools for GEMing [51,52]. As
discussed above, due to the large phenotypic solution space, there
is usually more than one solution, but linear programming and the
majority of flux balance analysis algorithms commonly return only
a single flux distribution [2]. From all possible solutions, the one that
shows the smallest value for the sum of all fluxes will be typically
chosen to represent the specific objective function(s) [53]. This
choice is based on the assumption that cells will conserve resources
when it comes to nutrients and energy and will, therefore, likely
minimize the total carbon flux through the metabolic network [54].
Much research has been performed to identify suitable objective
functions in microbes. One such example is the Biological Objective
Solution Search algorithm [55], which relies on experimental data
to locate a suitable objective for the network. With a description of
GEMs and their representation by flux values established in these
sections, the next sections describe the applications of GEMs.
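The selection rule based on the smallest sum of fluxes can be sketched as follows. The two flux distributions are hypothetical alternative optima of a five-reaction toy network (v1: uptake; v2: A to B; v3: A to C; v4: B to biomass; v5: C to byproduct) that give the same biomass flux v4; only the tie-breaking step is shown, not the optimization that produced them.

```python
# Hypothetical alternative optima (v1..v5) with identical biomass flux v4 = 6.
alternatives = {
    "no_byproduct": [6, 6, 0, 6, 0],
    "with_byproduct": [10, 6, 4, 6, 4],
}


def total_flux(v):
    """Sum of absolute flux values, used here as a proxy for resource use."""
    return sum(abs(x) for x in v)


chosen = min(alternatives, key=lambda name: total_flux(alternatives[name]))
print(chosen, total_flux(alternatives[chosen]))
```

The distribution that avoids the byproduct branch carries less total flux for the same biomass output, so it is the one retained under the resource-conservation assumption.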
1.3. GEMs can be used to predict metabolic phenotypes in
microorganisms
GEMs provide information about the distribution of fluxes
in the genome-scale metabolic network for: (i) a specific genotype; (ii) a developmental stage governed by regulatory factors;
and (iii) environmental conditions. These are built into the GEM
through the stoichiometric matrix, constraints, and chosen objective functions. Historically, GEMs have been said to provide the
connection between the genotype and/or environmental influences
and resulting metabolic phenotypes, represented by fluxes. Generating genetic mutants in silico along with implementing diverse
objective functions through flux balance analysis and comparing
the resulting GEMs offer an insight into the changes of fluxes within
the metabolic network [42]. This type of in silico enzymatic reaction (‘gene’) knockout analysis is also useful in identifying essential
metabolic reactions and genes by constraining the flux through
the candidate reaction to zero to address the question of whether
all biomass components are still capable of being produced by
the metabolic network [56–58]. The reaction (and corresponding
gene(s) encoding enzyme(s) that catalyze it) will be considered
essential if other reactions of the network cannot synthesize the
components of biomass. The wild type and knockout mutants can
then be compared in effort to recognize metabolic re-programming
that occurs in response to genetic or environmental perturbations.
Of course, correct gene-to-protein-to-reaction associations [24] are
needed for validation purposes, specifically to genetically modify expression of relevant genes targeted for mutant analysis and
metabolic engineering.
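An in silico knockout screen can be sketched on a toy network. In a GEM, each candidate flux is constrained to zero and the flux balance problem is solved again; the simplified version below replaces that step with a reachability check (a reaction can fire once all of its substrates are available), which ignores stoichiometric and cofactor balancing. The network and reaction names are hypothetical.

```python
# Hypothetical network: reaction id -> (substrates, products).
reactions = {
    "uptake": (set(), {"A"}),
    "r1": ({"A"}, {"B"}),
    "r2a": ({"B"}, {"C"}),
    "r2b": ({"B"}, {"C"}),       # alternative route to the same product
    "r3": ({"C"}, {"biomass"}),
}


def can_make(target, reactions, knocked_out=frozenset()):
    """True if `target` becomes available when the knocked-out reactions are removed."""
    active = [sp for rid, sp in reactions.items() if rid not in knocked_out]
    available = set()
    changed = True
    while changed:
        changed = False
        for substrates, products in active:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return target in available


essential = [rid for rid in reactions if not can_make("biomass", reactions, {rid})]
print(essential)
```

Reactions with a parallel route (`r2a` and `r2b`) are not flagged as essential on their own, whereas every step on the single remaining path to biomass is.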
In addition to flux balance analysis, several optimization methods
that are more appropriate for perturbed cells have been used to predict their metabolic
phenotype. The two most commonly used methods are the “minimization of metabolic adjustment” [59] and
the “regulatory on/off minimization” [60] algorithms. Minimization of metabolic adjustment uses linear (as in flux balance analysis)
or quadratic optimization and the assumption that the affected cell
will prefer to minimize the effect of the genetic or environmental
perturbation. This is reflected as the smallest possible change in the
total flux between the wild type and perturbed cells [59]. The regulatory on/off minimization algorithm is similar to minimization of
metabolic adjustment except that instead of the minimization of
the total flux value, the total number of altered fluxes is minimized
[60]. The rationale behind regulatory on/off minimization is based
on the assumption that the cell will dump the extra carbon to short
as opposed to long alternate pathways as a metabolic adaptation
to the specific perturbation. Minimization of metabolic adjustment and regulatory on/off minimization are based on attractive
assumptions that coincide with the idea of conservation of cellular
resources committed to metabolism, while achieving the cellular
objective. Minimization of metabolic adjustment tends to be more
accurate for an initial mutant (before strain evolution), while flux
balance analysis and regulatory on/off minimization were shown
to be suitable for an adapted mutant [60].
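The difference between the two criteria can be illustrated by scoring candidate mutant flux distributions against a wild-type reference. This is only a sketch of the two distance measures: the published algorithms search the whole feasible flux space by quadratic or mixed-integer optimization, whereas here two hypothetical candidates (assumed feasible) are simply compared. The Euclidean distance stands in for the quadratic form of minimization of metabolic adjustment, and the change threshold is an assumed value.

```python
import math

# Hypothetical wild-type flux distribution and two candidate mutant distributions.
wild_type = [10.0, 6.0, 4.0, 6.0, 4.0, 0.0]
candidates = {
    "many_small_changes": [9.0, 5.5, 3.5, 5.5, 3.5, 0.0],
    "few_large_changes": [10.0, 6.0, 2.0, 6.0, 4.0, 2.0],
}


def euclidean_distance(v, w):
    """Overall flux change, as minimized in the quadratic variant of the first method."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))


def changed_fluxes(v, w, threshold=1e-6):
    """Number of fluxes that differ from the reference, as minimized by the second method."""
    return sum(1 for a, b in zip(v, w) if abs(a - b) > threshold)


moma_choice = min(candidates, key=lambda k: euclidean_distance(candidates[k], wild_type))
room_choice = min(candidates, key=lambda k: changed_fluxes(candidates[k], wild_type))
print(moma_choice, room_choice)
```

The first criterion prefers spreading a perturbation over many small adjustments, while the second prefers rerouting through as few reactions as possible even if those fluxes change substantially.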
GEMs are invaluable tools in rational metabolic engineering in
microorganisms. So how can GEMs be used to predict genetic modifications to obtain a desired metabolic phenotype (e.g., to produce
large amounts of a highly valuable metabolite) in a systematic
way without knowing which genes to knock out? The OptKnock,
OptFlux, and OptORF approaches (and their improved versions)
are three commonly used algorithms that perform systematic
gene knockout simulations with GEMs, given specific conditions,
to enhance production of specific metabolites [39,61–63]. These
methods were useful in increasing the production of lactate in
Escherichia coli K-12, succinate in Mannheimia succiniproducens, and
ethanol in Saccharomyces cerevisiae [64]. Efforts have also been
made to model the production of valuable chemicals during the
stationary phase of culture growth by instituting additional objective functions and minimizing the metabolic adjustment between
the exponential and stationary growth phases [65].
2. Challenges associated with GEMing in plants revolve
around the complexity of plant cell metabolism
2.1. Lack of comprehensiveness in plant metabolic networks and
missing components
Ideally, a GEM should contain all possible metabolites and
enzymes organized in interconnected pathways that can exist in
the given organism, based on its genome sequence. This can be
a challenge especially for complex eukaryotic organisms, such as
plants, where the function of most proteins remains unknown. Try
to imagine that the task is to put together a functional car, for which
one has all parts, but limited knowledge about the function of individual parts and how they assemble and function together. Those
with limited knowledge may still have a pretty good idea about the
“engine, fuel injection, and transmission systems” of the plant cell
represented by the central carbon metabolism. But, this knowledge
is probably not so great regarding the obscure parts representing
secondary metabolism. In general, we could assemble a working
vehicle, but we would not know if we had a sedan, a truck, or a
school bus.
The E. coli K-12 genome contains 4,290 predicted protein-coding
open reading frames, of which 1,366 (32%) have been included
in a comprehensive genome-scale metabolic network reconstruction, iJO1366 [12]. Of the 1,366 metabolism-related genes, only
38 were inferred through bioinformatics. In contrast, Arabidopsis is predicted to have 28,775 genes as of September 11, 2011
[29]. Molecular function was experimentally determined for only
13% of all predicted genes, 44% of gene functions were computationally predicted, 30% are of unknown function, and 12% are
not annotated [29]. The maize genome is predicted to contain
32,540 genes and 53,764 transcripts [66]. For about 42% of the
total maize genes there is very little or no functional annotation
information available [22]. The high percentage of plant genes with
function inferred only through bioinformatics has raised concerns
about mis-annotation. The problem with mis-annotation and the
genes with unknown function relates mostly to plant secondary
metabolism and transporters. Here, many enzymes and the intermediates of specialized metabolic pathways as well as transporters
remain uncharacterized regarding their substrate specificities. For
example, methyltransferases involved in tocopherol biosynthesis
were originally mis-annotated as sterol methyltransferases until
their true function was confirmed by a combination of reverse
genetics and biochemical characterization [67]. In some cases,
homologues in different organisms have different functions. In
bacteria, PurU (N5-formyl tetrahydrofolate deformylase) provides
formyl groups for purine biosynthesis [68], but two Arabidopsis
PurU homologues are involved in photorespiration and do not seem
to have anything to do with purine biosynthesis [69].
The situation gets even more complicated with metabolites.
Collectively, plants produce over 200,000 primary and secondary
metabolites, but structures of only about 10–20% of these metabolites are known [70,71]. The collection of all metabolites in a cell
is referred to as the metabolome [72–74]. Obviously, only some
pathways are active under specific conditions at any given time,
resulting in the production of a portion of all possible metabolites
in specific cells/tissues, which complicates efforts to capture the
plant metabolome. As such, the metabolic networks extracted from
fully sequenced plant genomes and metabolic network databases
will have many gaps and errors. Automated gap-filling algorithms
such as GapFind and GapFill can be implemented [75]. However,
these must be used with caution, and often they are not. There are
instances when a particular step or a pathway is present in one
organism but not the other, or when completely different pathways
in different organisms can produce the same metabolite.
For example, plants use homoserine kinase that provides Ophosphohomoserine for the biosynthesis of both threonine and
methionine [76]. In bacteria, threonine is made from homoserine
the same way as in plants, but methionine is made using pathways
involving succinylated or acetylated as opposed to the phosphorylated forms of homoserine [77]. Threonine catabolism involves
threonine dehydrogenase in microorganisms and animals [78,79].
However, the activity and roles of putative plant threonine dehydrogenases in threonine degradation remain to be demonstrated
experimentally [76]. This problem concerns not only differences between
prokaryotic and eukaryotic organisms but also differences among organisms within
the same taxon. Unlike other bacteria and plants, Clostridium acetobutylicum is a fermentative microorganism that has an incomplete
TCA cycle and does not use the oxidative pentose phosphate pathway
[80,81]. However, both initial drafts of the flux balance analysis
model for this organism had different versions of the TCA cycle that
also differed from what was eventually found through 13C-based
steady-state metabolic flux analysis [48,65,81,82]. Metabolic networks that are extracted from databases using bioinformatics tools
and then used for GEMing should still be curated by an expert and
supplemented with biochemical data where necessary.
2.2. Compartmentation
Central carbon metabolism in bacteria occurs inside the cytosol,
so many bacterial GEMs include only ‘extracellular’ and ‘intracellular’ compartments. However, microbial GEMs that consider
additional compartments, such as the ‘periplasm’ of Gram-negative
bacteria, do exist [12,58,83]. Keeping the number of compartments to a minimum
simplifies the mathematical modeling. From the modeling perspective, the extracellular metabolites include substrates and products
of metabolism as well as some transported cofactors. The intracellular compounds represent the intermediates of metabolism.
The substrates can originate from the extracellular compartment in
abundance and are provided to the cell through transporters often
constrained to a desired flux. In the case of multicellular plants,
the substrates are produced elsewhere (e.g., photosynthetic assimilate or CO2 from the environment) and supplied to the sink cells.
The products must accumulate and constitute cellular biomass or
are excreted to the extracellular medium. Internal metabolites are
associated with at least one reaction that produces them and at least
one reaction that consumes them for those reactions to receive flux.
All metabolites transported between the extracellular and intracellular compartments must be accompanied by specific transport
reactions that supply or remove metabolites from the extracellular environment. Identifying relevant transporters and determining
their substrate specificities remains one of the major challenges in
GEMing.
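The requirement that every internal metabolite has at least one producing and one consuming reaction can be checked with a simple connectivity scan. The sketch below is only an illustration under stated assumptions: the toy network, the reaction names, and the function `find_dead_ends` are hypothetical, and the scan is a plain topological check, not the GapFind algorithm cited above.

```python
# Hypothetical toy network: negative coefficients are consumed, positive are produced.
TOY_NETWORK = {
    "uptake_glc": {"stoich": {"glc": 1}, "reversible": False},
    "r1": {"stoich": {"glc": -1, "g6p": 1}, "reversible": False},
    "r2": {"stoich": {"g6p": -1, "pyr": 1}, "reversible": False},
    "r3": {"stoich": {"g6p": -1, "x": 1}, "reversible": False},   # x is never consumed
    "r4": {"stoich": {"y": -1, "pyr": 1}, "reversible": False},   # y is never produced
    "export_pyr": {"stoich": {"pyr": -1}, "reversible": False},
}


def find_dead_ends(reactions):
    """Return metabolites lacking a producing or a consuming reaction."""
    produced, consumed = set(), set()
    for rxn in reactions.values():
        for met, coeff in rxn["stoich"].items():
            # A reversible reaction can act as both producer and consumer.
            if coeff > 0 or rxn["reversible"]:
                produced.add(met)
            if coeff < 0 or rxn["reversible"]:
                consumed.add(met)
    all_mets = produced | consumed
    return {
        "no_producer": sorted(all_mets - produced),
        "no_consumer": sorted(all_mets - consumed),
    }


dead_ends = find_dead_ends(TOY_NETWORK)
print(dead_ends)
```

Reactions touching a metabolite in either list cannot carry steady-state flux, so such metabolites are natural starting points for manual curation or for adding the missing transport reaction.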
Plant cells have several compartments defined by organelles
that separate different metabolic processes and provide variable
conditions for enzymes. Many plant pathways of central carbon
metabolism are duplicated, which is represented by the presence
of differentially localized isoforms of the same enzymatic activities in proteomes from individual cellular compartments. In many
different plant cell types, glycolytic enzymes are present in both
cytosol and chloroplasts [84–87]. The existence of different compartments and enzyme isoforms in plant cells complicates GEMing
in terms of metabolic network reconstruction and the actual modeling. First, plant metabolic networks are more complex, and fluxes
through duplicated pathways (and this is true for any set of parallel pathways, whether duplicated or alternate) can be dissected only when
there are differences in the production or consumption of chemical energy between the parallel pathways. Second, deciding the
specific compartment for the particular isoform is cumbersome
as there is only bioinformatic localization prediction available for
many proteins. Ideally, direct experimental evidence (e.g., protein
import and/or co-localization studies) should be combined with
proteomic studies for the specific cell/tissue/organ type and (if possible) at relevant developmental stages and conditions. With the
proteomic studies, caution is advised, as many cytosolic proteins
tend to be associated with individual organelles. Many glycolytic
enzymes in the cytosol are bound to the outer membranes of
mitochondria and chloroplasts, probably enabling metabolite channeling and transport [88], and could be misinterpreted as being
mitochondrial and plastidic.
Third, the cytosol and different organelles provide different
conditions for metabolism in terms of: (i) pH; (ii) salt concentrations; and (iii) energy/redox status. Not considering the
differences in environments in diverse compartments also has
serious implications when thermodynamic constraints are implemented to narrow down possible flux distribution solutions.
Possible reversibility of a reaction depends on thermodynamics and
the concentration of participating metabolites and cofactors. Current methods of isolating organelles for metabolomic purposes are
not reliable [89], though non-aqueous fractionation can potentially
be useful [90]. The major problem of non-aqueous fractionation
is associated with the inability to separate metabolites present
in different organelles to discrete regions, but this problem can
be overcome by implementing marker molecules for different
organelles in combination with novel computational approaches
[90]. Other novel approaches using laser-capture microdissection
and pressure catapulting coupled with high spatial resolution imaging mass spectrometry (referred to as the “mass microscope”)
show a lot of promise for measuring levels of major metabolites
in specific cell types and even large organelles [91,92]. However,
these approaches still need significant improvements. The resolution (10 µm) of this “mass microscope” is too low to be useful
for distinguishing metabolites in cytosol, mitochondria, and plastids. Sensitivity is also low, as only major metabolites present at
the levels of about 1% of dry weight can be detected [92]. Metabolite charge depends on its pKa and pH of the environment and it
influences enzyme activity. Considering pH changes in different
compartments is useful for proton balancing. Thermodynamic balancing constraints have not yet been applied to plant GEMs and
only one has included proton balancing [22]. Currently, research is
underway to allow GEMing to aid in the quantification of metabolite fluxes between different tissues. These obstacles should be
addressed as additional models are created.
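The dependence of reaction directionality on compartment-specific metabolite concentrations can be illustrated with the standard relation ΔG' = ΔG'° + RT ln Q. The following is a minimal sketch under stated assumptions: the standard Gibbs energy and all concentrations are hypothetical values chosen for illustration, and the function name `delta_g` is not taken from any cited tool.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol*K)


def delta_g(dg0_prime, product_conc, substrate_conc, temp_k=298.15):
    """Transformed Gibbs energy (kJ/mol) from the mass-action ratio Q."""
    q = math.prod(product_conc) / math.prod(substrate_conc)
    return dg0_prime + R_KJ * temp_k * math.log(q)


# Hypothetical reaction with a standard transformed Gibbs energy of +5 kJ/mol.
DG0 = 5.0

# Hypothetical compartment A: product kept low, substrate high (Q = 0.01).
dg_a = delta_g(DG0, product_conc=[1e-5], substrate_conc=[1e-3])

# Hypothetical compartment B: product accumulates (Q = 10).
dg_b = delta_g(DG0, product_conc=[1e-3], substrate_conc=[1e-4])

print(f"compartment A: {dg_a:.2f} kJ/mol, compartment B: {dg_b:.2f} kJ/mol")
```

With these assumed numbers the same reaction is thermodynamically feasible in the forward direction in one compartment and not in the other, which is why a single reversibility assignment for all compartments can mislead a thermodynamically constrained model.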
2.3. Photosynthesis and photorespiration
The existence of photosynthesis and photorespiration
contributes to the complexity of plant metabolic networks. Photosynthetic rates can be measured under specific conditions [93].
These measurements can then be used to constrain the flux model
or used to validate GEM predictions. However, in some recalcitrant
model systems, including developing seeds, photosynthesis cannot
be easily measured, and the validation of flux models is not easily
achieved in these systems [94]. In photosynthetic
cells, the light reactions produce chemical energy in the form of
NADPH and ATP from splitting water, while the dark reactions
of the Calvin–Benson–Bassham cycle consume these high-energy
cofactors to ultimately produce sucrose. The linear electron transport chain produces both NADPH and ATP, while the dark reactions
require more ATP than NADPH. To balance the NADPH and ATP
production, ferredoxin will transfer electrons back to photosystem
I (through the cytochrome bf complex and plastocyanin) rather
than to NADP reductase in the alternate cyclic electron transport
and only ATP is produced [95]. Both cyclic and linear electron
transport chains must be included in the model and the relative
fluxes through these routes depend on ATP and NADPH demand by
the dark reactions and other pathways. RubisCO can also fix O2 in
C3 plants and photorespiration should be included. Photorespiration is minimized or excluded in C4 plants and the relevant C4 cycle
involving phosphoenolpyruvate carboxylase in the mesophyll cells
is included in the model. Obviously, photosynthetic fluxes must be
constrained to zero in non-photosynthetic plant cells.
Significant modeling challenges exist involving the changes
in central carbon metabolism of photosynthetic organisms during the diurnal and circadian rhythms. Many models have aimed
to predict light-dependent photosynthetic metabolism, but in all
cases biomass composition was used for plants grown in standard light/dark regimes. As such, these GEMs may represent an
average of mixed metabolism (represented by average flux values) over a long period of time during biomass accumulation.
A GEM provides a combination of static flux values that would
explain the resulting biomass composition, given all implemented
constraints and objective functions. The clashing of two different
types of metabolic activities that are then averaged leads to a failure to adhere to the metabolic steady-state assumption needed
for flux balance analysis. Different pathways will be active under
light compared to dark conditions. Under light, photosystems will
be active, generating chemical energy used for CO2 fixation in
the Calvin–Benson–Bassham cycle. Sucrose and starch are made.
In the dark, the light reactions will no longer be active, changing the redox status of the cell. The chemical energy for the
Calvin–Benson–Bassham cycle has to come from other sources, e.g.,
respiration, in the dark. Dynamic flux balance analysis or more
sophisticated types of kinetic modeling provide a way to model
changes in fluxes [53,96]. Biomass and internal metabolites can be
measured at short intervals during a time course involving diurnal
or circadian metabolite changes for dynamic flux balance analysis.
Metabolite levels and fluxes are assumed not to change dramatically over a short period of time of the measurements, conforming
to the steady-state assumption. Subsequently, the measurements
for each time point are used to model fluxes individually by flux
balance analysis to obtain a view of dynamic change in fluxes in normally non-stationary model systems. Additional GEMing tools that
will help with this problem are new unpublished methods that remove
the dependency on the biomass composition. These approaches
remain in their infancy and are not discussed further in this review.
2.4. Diversity of plant cell and tissue types and responses to the
environment influence the objective functions
Plants are sessile organisms that have evolved numerous sophisticated mechanisms in response to environmental changes. These
mechanisms involve: (i) sensing specific types of biotic and abiotic stimuli; (ii) transducing relevant signals; and (iii) turning
on/off genes involved in the actual response to the changes in the
environment. These responses include various types of metabolic
and physiological adaptation processes, from the molecular to the
whole plant and even ecosystem level. At the same time, the plant
follows its own developmental program that is also tightly connected to changes in physiology and metabolism in specific organs
and cell and tissue types. From the GEMing perspective, this translates into an enormous complexity of possible metabolic networks
and flux combinations in these networks, as the responses to developmental and environmental changes are associated with global
metabolic reprogramming. Available high-throughput transcriptomic, proteomic, and metabolomic data are needed to refine the
network after reconstructing it from genome annotation to ensure
that all enzymes and metabolites relevant to the model system
(specific cell types under defined conditions) are present.
Such complexity leads to an inherent problem associated
with GEMing: the selection of appropriate objective functions.
Minimization of metabolic adjustment and regulatory on/off minimization are algorithms designed to minimize total flux and
number of fluxes, respectively, and eliminate redundant parallel pathways. Plant transcriptomes and proteomes often reveal
that the enzymes of the redundant pathways are still present and
probing metabolism with 13C-labeled substrates demonstrates that
there is indeed carbon flowing through these pathways [33,97–99].
So, what is the cause of this discrepancy? Does a cell not try to conserve resources by minimizing fluxes? It probably does, but GEMing
involves a very simple and limited set of major objective functions. Other possible cellular goals are not considered, and many
pathways will not be ‘needed’ in the model (but needed in reality) when these goals are not included. Bacteria are much simpler
in that sense, though they also respond to environmental changes.
However, from the metabolic engineering perspective, bacteria are
grown under optimized controlled growth conditions that maximize the production of the target metabolite.
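Why minimizing total flux removes a parallel pathway from the solution can be seen in a deliberately tiny example. This is a sketch under stated assumptions: the network (one substrate, one precursor, a one-step direct route and a two-step bypass) and all names are hypothetical, and because the problem has a single degree of freedom, the linear program is solved by evaluating the two vertices of the feasible segment instead of calling a solver.

```python
def total_flux(v_direct, v_bypass, demand):
    """Sum of all fluxes: uptake + direct step + two bypass steps + biomass drain."""
    uptake = demand
    drain = demand
    return uptake + v_direct + 2.0 * v_bypass + drain


def minimize_total_flux(demand):
    """Minimize total flux subject to v_direct + v_bypass = demand, fluxes >= 0.

    The feasible set is a line segment, and a linear objective attains its
    minimum at one of the segment's endpoints (vertices).
    """
    vertices = [(demand, 0.0), (0.0, demand)]
    return min(vertices, key=lambda v: total_flux(v[0], v[1], demand))


best_direct, best_bypass = minimize_total_flux(10.0)
print(best_direct, best_bypass)
```

In this toy case the longer bypass receives zero flux even though nothing forbids its use, which mirrors the discrepancy described above between such objective functions and the labeling experiments that show carbon flowing through redundant routes.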
What about plants? Given the existing toolsets, it is conceivable that GEMs can be used to predict phenotypes and design
metabolic engineering strategies for specific cell types and growth
conditions. However, there is no guarantee that these specific conditions will be met in the field. In fact, plants growing in the field will
simultaneously experience numerous abiotic and biotic stresses
and will adapt accordingly. The cellular objectives become much
more complicated than maximizing growth. Constantly changing
weather and plant–plant, –pathogen, and –herbivore interactions
will cause up-regulation of various metabolic and physiological
adaptations and defense mechanisms in field-grown plants. Significant cellular resources in terms of carbon and energy sources are
required for these mechanisms to operate. As such, substantial flux
will be redirected from the central carbon metabolism intended
for growth to the pathways of secondary metabolism for the production of defense metabolites. Limited information on secondary
metabolism in a plant GEM will diminish its predictive capabilities
when plant–environment and plant–pathogen interactions are in
play. Clearly, this is an area that will benefit from the incorporation of
advanced experimental research into systems biology.
3. GEMing in plants
It is clear that GEMing in plants is extremely challenging, but as:
(i) the high-throughput technologies are improved; (ii) novel ones
emerge; and (iii) the missing parts of metabolic networks are identified, it will be possible to gain a systems-level understanding
of metabolic processes. Currently, there are several GEMs developed for plants. These models adequately describe primarily the
photosynthetic or non-photosynthetic central carbon metabolism
(Table 1). While secondary metabolism is included in these models,
it must be largely neglected in flux calculations due to unavailability
of complete metabolomic and proteomic data. However, all these
models were able to correctly predict certain aspects of central carbon metabolism in specific cell types. This leads to the question
of how much information about the metabolic network is really
needed to be able to predict specific phenotypes.
3.1. Arabidopsis thaliana cell-suspension culture GEM
Cell-suspension cultures are a useful tool in plant research.
Though not photosynthetic, they have been used as an approximation of a heterotrophic plant cell (e.g., root cells) with the
assumption that at least some aspects of central carbon metabolism
will be similar between cell cultures and diverse specialized root
cell types [100,101]. Clearly, the diversity of non-photosynthetic
plant cells cannot easily be captured with simple cell culture models. Such models should be viewed as a stepping stone for more
appropriate models using specific/relevant constraints and objective functions applicable to plant metabolism in specialized cells.
They should also be viewed more as models demonstrating the
application of modeling approaches to plant cells rather than providing meaningful metabolic and mutant predictions. The modeling
approaches used to generate the first plant cell-culture model [14]
were similar to those in bacteria. As with bacteria, using cultures of
homogeneous plant cells clearly offers advantages when it comes
to simplifications of the models and the ability to use a number of
assumptions that greatly simplify the modeling process. Not dealing with: (i) diverse cell types; (ii) photosynthesis; and (iii) the
need for compartments in bacterial cells provides an obvious advantage, but at the same time, this diminishes the relevance and the
predictive power of the analogous plant models.
The Arabidopsis cell-culture GEM is a minimization-of-metabolic-adjustment-like model based on linear programming
with the minimization of the total flux as an objective function
[14]. Organellar separation of metabolic pathways is not included
in this model. The metabolic network for this GEM was extracted
from AraCyc [30] using ScrumPy [102]. The network was carefully
curated to the final version of 855 reactions that were mass balanced for major atoms (each reaction has the same number of each
atom on both sides). Unbalanced reactions in the model would
upset the fundamental laws of mass and energy conservation. AraCyc is a useful tool for most applications, and improvements are
anticipated for its use with GEMing. However, it contains errors in
formulas as well as orphan reactions and metabolites, which prevents accurate atom balancing of the model; nearly 700 AraCyc
reactions were unbalanced for protons, of which over 370 reactions
also did not have the correct oxygen balancing [14].
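Atom balancing of the kind described here can be checked automatically once each metabolite has a chemical formula. The sketch below is an illustration under stated assumptions: the function names and the example reactions are hypothetical, the parser handles only simple formulas without parentheses or charges, and it is not the procedure used with ScrumPy in [14].

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets, no charge)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def atom_imbalance(stoich, formulas):
    """Return {element: products minus substrates}, listing unbalanced elements only."""
    total = Counter()
    for met, coeff in stoich.items():
        for element, n in parse_formula(formulas[met]).items():
            total[element] += coeff * n
    return {el: n for el, n in total.items() if n != 0}


FORMULAS = {"glc": "C6H12O6", "o2": "O2", "co2": "CO2", "h2o": "H2O"}

# Complete oxidation of glucose, written correctly and with a deliberate error.
balanced = {"glc": -1, "o2": -6, "co2": 6, "h2o": 6}
unbalanced = {"glc": -1, "o2": -6, "co2": 6, "h2o": 5}

print(atom_imbalance(balanced, FORMULAS))
print(atom_imbalance(unbalanced, FORMULAS))
```

An empty result means the reaction conserves every element; any non-empty result points to the elements, typically hydrogen and oxygen as noted above, whose formulas or coefficients need curation.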
Constraints included the stoichiometric mass balancing and the
physiologically relevant biomass composition. This was based on
experimental measurements of individual major biomass components in cells growing on glucose as the sole carbon source. Soluble
metabolites, salts, and CO2 were not accounted for in the total
biomass. While the soluble metabolites are difficult to quantify (and
individually they do not contribute much to the final biomass), CO2
lost during metabolism contributes significantly to the biomass
especially in heavily respiring heterotrophic cells. The total net
flux of CO2 production can be easily measured [103] and used as
an additional constraint. Nevertheless, this setup is clever
because, rather than maximizing the biomass production as in most
bacterial studies, specific biomass production with the correct composition [50] is defined based on the typical growth of real plant cell
cultures under standard growth conditions. Specific biomass information including composition is mathematically expressed in a
so-called “biomass constituting equation”, representing a sum of the
amounts of all significant individual components constituting the
biomass.
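A biomass constituting equation of this kind can be assembled from measured mass fractions by converting each component to a molar coefficient. The following is a minimal sketch under stated assumptions: the component names, mass fractions, and approximate molar masses are hypothetical placeholders and are not the measurements used in [14].

```python
def biomass_coefficients(mass_fractions, molar_masses):
    """Convert g component per g dry weight into mmol per g dry weight."""
    return {
        name: 1000.0 * fraction / molar_masses[name]
        for name, fraction in mass_fractions.items()
    }


def biomass_equation(coefficients):
    """Write the biomass reaction as a sum of weighted components."""
    lhs = " + ".join(f"{c:.3f} {name}" for name, c in sorted(coefficients.items()))
    return f"{lhs} -> biomass"


# Hypothetical composition (g per g dry weight) and approximate molar masses (g/mol).
MASS_FRACTIONS = {"protein_residue": 0.30, "glucosyl_unit": 0.40, "fatty_acid": 0.05}
MOLAR_MASSES = {"protein_residue": 110.0, "glucosyl_unit": 162.0, "fatty_acid": 270.0}

coefficients = biomass_coefficients(MASS_FRACTIONS, MOLAR_MASSES)
print(biomass_equation(coefficients))
```

Fixing the flux through such a reaction to a measured growth rate, instead of maximizing it, corresponds to the setup described above in which biomass production with the correct composition is imposed as a constraint.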
Table 1
Summary of plant flux balance analysis models. All models presented here used biomass composition measurements as constraints and flux balance analysis except for the
bna572, which is a flux variability analysis model* [19,20]. Photosynthetic models assume that the ratio of RubisCO CO2 fixation and oxygenation is 3:1. Most of these models
used physiologically irrelevant, extreme objective functions because the real ones are not known. With the exception of the updated AraGEM and C4GEM [22], secondary
metabolism is not included. ** denotes the corresponding number of reactions and metabolites in the updated AraGEM model.

Model: Arabidopsis – heterotrophic cell suspension culture [14]
Reactions: 1,406
Metabolites: 1,253
Objective functions:
• Minimize total flux
Biological question: Assessing changes in metabolism at different energy states in a heterotrophic cell
Comments:
• Objective function prevented many redundant reactions from being used
• No compartments

Model: Barley – photoheterotrophic and heterotrophic cereal endosperm [15,16]
Reactions: 257
Metabolites: 234
Objective functions:
• Maximize growth (linear optimization)
• Minimize total flux (quadratic optimization)
Biological question: Estimating the influence of hypoxia and aerobic conditions on metabolic fluxes
Comments:
• Small compartmented tissue-specific model (65 transporters)
• 2nd model – improved with the actual differential O2 and metabolite measurements within the endosperm

Model: Rape – photoheterotrophic metabolism in developing oilseed [21]
Reactions: 313
Metabolites: 262
Objective functions:
• Actual substrate uptake and biomass production rates based on high-quality measurements from the literature
Biological question: Predicting metabolic regulation and mutant phenotypes related to oil accumulation
Comments:
• Flux balance analysis used in a combination with principal component analysis to minimize the number of solutions (objective functions were physiologically relevant)
• Regulation of oil biosynthesis predicted

Model: Rape – bna572, photoheterotrophic metabolism in developing oilseed* [19,20]
Reactions: 572
Metabolites: 376
Objective functions:
• Flux balance: minimize substrate uptake rates (photons and carbon and nitrogen sources)
• Flux variability: fixed objective function, then minimize and maximize each reaction
Biological question: Exploring phenotypic solution space with flux variability analysis in search for alternate active pathways in developing rapeseed embryos
Comments:
• Highly comprehensive and compartmented oilseed model
• Flux variability along with in silico mutant analysis – exploration of phenotypic solution space obtained from flux balance analysis and the use of alternate pathways in relation to carbon use efficiency
• Some fluxes in the bna572 model were comparable to the fluxes in the corresponding 13C metabolic flux analysis model
• Flux balance and variability analyses – unable to select less efficient alternate pathways that are known to be active in vivo

Model: Arabidopsis – AraGEM (iRS1597, an updated AraGEM**) – C3 photosynthesis, photorespiration, and heterotrophic metabolism in mesophyll cells [18,22]
Reactions: 1,567 (1,798**)
Metabolites: 1,748 (1,820**)
Objective functions:
• Minimize substrate uptake rates (photons for C3 and sucrose for heterotrophic metabolism)
Biological question: Assessing metabolic changes in C3 vs. photosynthetic and heterotrophic metabolism, respectively; predicting sources of energy in mesophyll cells
Comments:
• Compartmented model of C3 photosynthesis
• Good qualitative model predicting expected metabolic changes

Model: Maize, sorghum, sugar cane – C4GEM – C4 photosynthesis in mesophyll and bundle sheath cells [17]
Reactions: 1,588
Metabolites: 1,755
Objective functions:
• As in AraGEM
Biological question: As in AraGEM, but comparing fluxes in 2 cell types and distinguishing different types of C4 photosynthesis
Comments:
• Compartmented 2-cell model of C4 photosynthesis containing 112 transporters
• Fluxes in photosystem I and II were distinguished in 2 cell types
• Detailed analysis of high-energy cofactor requirements in different modes of C4 photosynthesis

Model: Maize – iRS1563, an updated C4GEM [22]
Reactions: 1,985
Metabolites: 1,825
Objective functions:
• Maximize biomass production
Biological question: Predicting metabolic phenotypes in known lignin biosynthesis mutants and assessing the effects of changes in cell wall composition on biomass production
Comments:
• Expanded C4GEM specific to maize
• Known aspects of secondary metabolism included
• Photon utilization and CO2 uptake – not constrained
• Validation provided by existing data on lignin biosynthetic mutants

The goal of this study was to interrogate the phenotypic solution
space. In other words, the authors sought to ascertain which fluxes
change, and by how much, in cells existing at various energy states (cells with different ATP availabilities) driven by
growth conditions. It is interesting to note that only 232 reactions
were active and only 42 reactions responded to systematic alterations in ATP demands [14]. As expected, the responsive reactions
were all related to the production or consumption of chemical
energy. Minimization of the total flux coincides with the objective
of minimizing the number of active routes (fluxes), which could
explain the low number of reactions carrying fluxes [14]. The objective function used here prevented the use of most reactions (they
carried zero flux). In reality, there is much metabolic redundancy
in plant central carbon metabolism and plant cells simultaneously
use these redundant and parallel pathways. Carbon can easily be
redirected from one pathway to the other if the main route is
blocked, resulting in the same amount of cell biomass and composition. The flux solutions obtained in [14] were biased by the use
of the highly limiting objective function and the question is how
comparable these modeled fluxes are to the situation in vivo. Two
flux maps were generated for Arabidopsis cell-suspension cultures
grown under heterotrophic conditions at two different oxygen
levels [99] using 13C-based steady-state metabolic flux analysis.
This procedure involves a combination of measurements and
iterative modeling [104,105]. Accumulation of individual biomass
components and substrate uptake rate measurements are used to
constrain fluxes. Tracking the carbon from substrates through a
defined metabolic network to final products of metabolism using
13C techniques enables one to discern which pathways are active
and, in some cases, even to dissect the forward and reverse fluxes
through reversible reactions [104,106]. Metabolic flux analysis
showed that under both normal- and high-oxygen conditions, both
glycolysis and the TCA cycle were very active and that the TCA cycle
behaved as a cycle [99]. In the case of the GEM, the overall fluxes
of these two pathways were low and the TCA cycle was not complete
[14]. This case illustrates the need for experimentally-derived constraints in central carbon metabolism. Installing a level of genetic
regulation in a plant GEM to satisfy the need for constraints is an
ambitious but ultimately necessary approach from a systems
biology perspective.
It is difficult to compare these two models in this case
because the metabolic network of the metabolic flux analysis
model lacked reactions involving ribulose-1,5-bisphosphate carboxylase/oxygenase (RubisCO) and the photorespiratory pathway. In
theory, RubisCO is not supposed to be active in non-photosynthetic
tissues, such as non-green seeds, roots, and plant cell-suspension
cultures. RubisCO is assumed to be inactive and is not included in
the metabolic flux analysis models of maize and sunflower seeds
and maize root-tips [107–110]. However, this enzyme is involved
in increasing the carbon use efficiency in green oilseeds by refixing significant amounts of CO2 lost in photoheterotrophic-type
metabolism [103,111]. RubisCO and photorespiratory enzymes are
found in the proteomes of plant cell cultures [112], but their in vivo
activities and physiological relevance to heterotrophic metabolism
need to be experimentally elucidated.
There was a discrepancy around CO2 and O2 fixation by RubisCO
in the flux balance analysis model. The two gases compete for the
same active site in RubisCO [113]. This means that CO2 and O2 fixation cannot be separated when both gases are present and the
only way to shift the preference for either carboxylation or oxygenation is by changing the relative availability of the two gases.
Based on the flux balance analysis model, RubisCO was active with
both CO2 and O2 at low ATP demand, but the activity towards the
two substrates was disconnected: O2, but not CO2, was predicted to be used at all times regardless of the ATP demand in the
GEM [14]. These results contradict the expectations for high ATP
demand regarding the separation of carboxylation and oxygenation. At high ATP demands, substrates are oxidized at high levels,
meaning that O2 is consumed and CO2 produced, which would favor
the carboxylation of ribulose-1,5-bisphosphate by RubisCO. Subsequently, both the oxygenation and photorespiration should be
disfavored [14]. The most likely explanation for this discrepancy
was that RubisCO-catalyzed carboxylation and oxygenation were
not linked. There was also a need to constrain these two inseparable reactions together by using experimentally-derived constraints
to drive the relative flux through carboxylation versus oxygenation based on the actual relative availabilities of CO2 and O2 under
different energy conditions.
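Linking the two RubisCO reactions amounts to adding one extra linear equality to the flux balance problem. The sketch below shows how such a coupling row could be built and checked; it is an illustration under stated assumptions. The reaction identifiers and function names are hypothetical, and the 3:1 carboxylation-to-oxygenation ratio is the value that the photosynthetic models in Table 1 assume, used here only as an example and not as a measured value for cell cultures.

```python
def rubisco_coupling_row(reaction_ids, carboxylase_id, oxygenase_id, ratio):
    """Coefficients c with sum(c[i] * v[i]) == 0, enforcing v_carb = ratio * v_oxy."""
    row = [0.0] * len(reaction_ids)
    row[reaction_ids.index(carboxylase_id)] = 1.0
    row[reaction_ids.index(oxygenase_id)] = -ratio
    return row


def satisfies(row, fluxes, tol=1e-9):
    """Check whether a candidate flux distribution obeys the coupling constraint."""
    return abs(sum(c * v for c, v in zip(row, fluxes))) <= tol


# Hypothetical reaction order in the flux vector.
REACTIONS = ["RBC_carboxylation", "RBO_oxygenation", "PGK"]
row = rubisco_coupling_row(REACTIONS, "RBC_carboxylation", "RBO_oxygenation", ratio=3.0)

coupled = [3.0, 1.0, 5.0]      # carboxylation and oxygenation in a 3:1 ratio
uncoupled = [0.0, 1.0, 5.0]    # oxygenation without carboxylation

print(row, satisfies(row, coupled), satisfies(row, uncoupled))
```

Appending this row to the stoichiometric equality constraints would exclude solutions in which oxygenation proceeds without carboxylation; in practice the ratio would need to be set from the relative CO2 and O2 availabilities, as the text argues.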
3.2. Seed GEMs in plants
Seeds provide means of sexual propagation in plants, but
they are also a source of nutritious and useful chemicals. Dry
seed biomass consists primarily of proteins, lipids, starch and
polysaccharides, and cell wall material and the relative levels and
composition of these individual storage compounds depend on the
genotype of the plant species and growth conditions [114,115].
Seed storage compounds are synthesized by the pathways of
central carbon metabolism during seed development and are mobilized during seed germination, providing nutrients and energy for
seedling establishment before photosynthesis takes over, enabling
seedling growth [94,103,110,116]. Maternally provided organic
and inorganic nutrients are metabolized through: (i) glycolysis;
(ii) the pentose phosphate pathway; (iii) the TCA cycle; and (iv) individual pathways leading to amino acids for protein, fatty acids for
oil, and monosaccharide precursors for starch, oligosaccharide, and
cell wall syntheses [94,103,110]. Seeds also accumulate secondary
metabolites, providing protection against herbivores, but individually, these compounds contribute less than 1% to the final biomass
and due to poor flux value scalability so far have been excluded
from the seed models. For example, glucosinolates, which are considered major secondary metabolites in Brassicales, constitute less
than 1% and 4% of biomass in rape and Arabidopsis seed, respectively [117]. In addition, glucosinolates are synthesized most likely
in both maternal tissues and the embryo, but the specific locations
and their contribution to overall glucosinolate synthesis remain
controversial [118,119]. Glucosinolate metabolism in seeds cannot currently be modeled accurately because of this ambiguity in the
location of biosynthesis. It is reasonable to argue that the levels
of these protective metabolites will increase in field-grown plants
and that the relevant metabolic pathways will have to be included in future
GEMs intended for elucidating seed metabolism for metabolic engineering purposes.
Under optimal conditions, bacteria maximize their growth while conserving
resources, which is one of the mechanisms by which they become
the dominant species in an environment. Similar to bacteria, one
would expect developing seeds to maximize the production of
biomass of optimal composition with minimal effort under
favorable conditions, enabling competition with other plant
species during germination. As such, real biomass measurements
can be used to constrain the model, and the maximization of
biomass production and minimization of total flux appear to be
acceptable objective functions. However, as noted in Section 3.1,
minimizing total flux may not be appropriate for complex
eukaryotic cells because of active redundant and parallel pathways and
the lack of knowledge about additional required, but ignored, processes that influence biomass production in seeds. Despite
these issues, seed central carbon metabolism is relatively simple
and usually fulfills the metabolic steady-state requirement needed
for flux balance analysis [104,120]. Therefore, it is not surprising
that the very first flux balance model in plants [15] was focused on
elucidating seed metabolism.
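The steady-state requirement mentioned above means that, for every internal metabolite, the fluxes producing it and consuming it cancel out, i.e., S·v = 0 for a stoichiometric matrix S and a flux vector v. The following minimal Python sketch checks this condition on a toy three-reaction network. The reaction names, the lumped stoichiometry, and the flux values are illustrative assumptions and are not taken from any of the cited seed models.

```python
# Toy steady-state check (S . v = 0). All names and numbers are hypothetical.

REACTIONS = ["sucrose_uptake", "glycolysis_lumped", "pyruvate_to_biomass"]

# Rows: internal metabolites; columns: reactions in the order of REACTIONS.
# glycolysis_lumped: 1 sucrose -> 4 pyruvate (carbon-balanced, cofactors omitted).
S = [
    [1.0, -1.0, 0.0],   # sucrose
    [0.0, 4.0, -1.0],   # pyruvate
]


def residuals(stoich, fluxes):
    """Return the net production rate of each metabolite (one value per row of S)."""
    return [sum(c * v for c, v in zip(row, fluxes)) for row in stoich]


def is_steady_state(stoich, fluxes, tol=1e-9):
    """True if every internal metabolite is balanced within the tolerance."""
    return all(abs(r) <= tol for r in residuals(stoich, fluxes))


balanced = [1.0, 1.0, 4.0]     # uptake, lumped glycolysis, biomass drain
unbalanced = [1.0, 1.0, 3.0]   # pyruvate would accumulate

print(is_steady_state(S, balanced))    # True
print(residuals(S, unbalanced))        # [0.0, 1.0]
```

A flux distribution that fails this check implies a metabolite pool that grows or shrinks over time, which flux balance analysis does not allow.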
3.2.1. Compartmented barley seed endosperm flux balance
analysis models
The endosperm of developing cereal seeds combines photoheterotrophic and heterotrophic metabolism and
is the site of synthesis of the major storage compounds, starch
and seed storage proteins [121]. Maternally provided sucrose is
used in glycolysis or in starch biosynthesis, while asparagine and
glutamine are sources of nitrogen for other amino acids for seed
storage protein synthesis. The endosperm also has to metabolically adapt to low O2 availability as it becomes less permeable to
O2 with increasing density, while still being able to make seed
storage compounds, ensuring seed growth [121,122]. This biochemical and physiological information and other genomic and
proteomic data provided the basis to refine the model of central carbon metabolism in developing barley endosperm extracted from a
number of databases [15]. The final model contained four different
compartments: (i) a single extracellular compartment for the
substrates, cofactors, and biomass compounds and (ii) three intracellular compartments (cytosol, mitochondria, and amyloplast) for
internal metabolites. Particular focus was given to transporters,
enabling the movement of internal metabolites between compartments. Barley endosperm comprises the majority, and the
embryo less than 4%, of dry, husk-free barley seed biomass [123].
As such, the actual composition of barley seeds (predominantly
representing the endosperm) was used to constrain the biomass
fluxes. Two successive objective functions were used: (i) maximizing growth by linear programming followed by (ii) minimizing the
overall flux by quadratic optimization [124]. Fluxes were modeled
in barley endosperm under: (i) anoxic; (ii) hypoxic; and (iii) aerobic conditions by simulating different O2 and sucrose uptake rates
[15].
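The two successive objective functions described above can be written compactly in standard flux balance notation. The block below is a sketch only: the symbols S (stoichiometric matrix), v (flux vector), v^lb and v^ub (flux bounds), v_bio (biomass flux), and μ* (optimal growth rate) are notation assumed here, and the sum of squared fluxes is one common way to express "overall flux" as a quadratic objective, not necessarily the exact form used in [15].

```latex
% Stage 1 (linear program): maximize growth
\mu^{*} \;=\; \max_{v} \; v_{\mathrm{bio}}
\quad \text{s.t.} \quad S\,v = 0, \qquad v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}

% Stage 2 (quadratic program): minimize overall flux at the optimal growth rate
\min_{v} \; \sum_{i} v_{i}^{2}
\quad \text{s.t.} \quad S\,v = 0, \qquad v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}, \qquad v_{\mathrm{bio}} = \mu^{*}
```

In this formulation, the anoxic, hypoxic, and aerobic scenarios differ only in the bounds placed on the O2 and sucrose uptake fluxes.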
However, barley endosperm is not uniform; it contains
a gradient of O2 levels, from aerobiosis at the periphery to hypoxia
in the middle of the seed. High O2 levels (over 450 µM) are observed
at the periphery, where photosynthesis occurs, and low O2 levels
(less than 5 µM) in the middle of illuminated barley seeds [16]. This
means that the metabolism is different in the endosperm located
close to the seed coat and in the middle of the endosperm. It was
elegantly shown by non-invasive in vivo magnetic resonance
imaging that alanine is made as the final product of fermentation in the
central endosperm, where O2 levels are low, and is then transported
through plasmodesmata to peripheral endosperm layers, where it is
converted to pyruvate and used in aerobic metabolism [16]. Additional metabolomic measurements on different dissected layers of
barley endosperm demonstrated changes in several metabolites of
central carbon metabolism [16]. As such, biomass has to be measured
separately for the individual endosperm layers corresponding to
different levels of O2. It appears that a high degree of
hypoxia in cereal seeds is relevant not only to later stages of development, when the endosperm is already densely packed with starch
and storage proteins that lower its permeability to O2 [121], but also
to tissue deep within seeds [16]. To account for these differences in O2 levels
and central carbon metabolism, flux balance analysis was repeated
using the original network [15] and these relevant measurements
[16].
Barley endosperm flux balance analysis models predicted
metabolic changes associated with anoxia and different levels of
hypoxia relative to the aerobic conditions in developing barley
seeds [15,16]. Both models predicted the expected accumulation
of alanine under hypoxic conditions [15,16]. From the qualitative
perspective, most of the predicted changes in flux distributions
associated with anaerobiosis vs. aerobiosis were expected. For
example, the models described in detail the gradual switch from
respiration and high starch production in aerobic endosperm
(e.g., young seeds or peripheral endosperm exposed to light)
to an incomplete TCA cycle, high fluxes through the glycolytic
and fermentative pathways, and low starch/biomass production
in hypoxic dense endosperm (e.g., mature seeds or the central
endosperm in the darkness). The non-cyclic TCA cycle is observed
primarily because succinate dehydrogenase connects the TCA cycle
with the mitochondrial electron transport chain, which carries little flux when O2 is limited. Based on these simulations, the rate of
the biomass accumulation should decline with the increasing age
and density of developing barley seeds as O2 availability decreases.
From the quantitative perspective, the model predictions regarding
the growth rates were in accordance with the experimental findings [15]. This is not surprising, given that the actual composition of seed
storage compounds was used to constrain biomass fluxes.
The original model also predicted the existence of unexpected
metabolic adaptations to anoxia [15]. First, the futile cycle involving
phosphofructokinase and pyrophosphate:fructose-6-phosphate 1-phosphotransferase running in opposite directions leads to the use
of ATP and generation of pyrophosphate during anoxia. Pyrophosphate produced by ADP-glucose pyrophosphorylase could be used
to help drive sucrose mobilization for starch synthesis by UDP-glucose pyrophosphorylase in anoxic endosperm. Second, the
RubisCO bypass was required for growth in anoxic endosperm
due to an already incomplete TCA cycle and low O2 availability,
which limited additional available routes of carbon flow [15]. While
the RubisCO bypass can re-fix carbon lost during central carbon
metabolism in developing green oilseeds that are capable of photosynthesis [103,111], its activity may be irrelevant in heterotrophic
cereal endosperm during anoxia. Based on O2 measurements, barley endosperm is probably never completely anoxic [16]. As such,
the relevance of the futile cycle and RubisCO bypass to cereal
endosperm metabolism remains elusive.
Overall, these flux balance analysis models predicted barley
endosperm metabolism during growth. The next question is how
well the predicted internal fluxes agree with the
actual internal fluxes. Many GEMs predict the biomass fluxes, but
there is little agreement between the modeled and real internal
fluxes [125]. This presents an enormous opportunity for systems
biologists. One also has to keep in mind that plant metabolism
is resilient and highly regulated, and plant cells utilize redundant
pathways that: (i) may be missing from the model or (ii) may not be chosen as possible solutions because of the flux-minimization bias of the
optimization algorithms.
3.2.2. Compartmented flux balance analysis models of developing
Brassica napus embryos
Rapeseeds predominantly accumulate oil and protein as storage
compounds and represent an important source of food and biofuel
[115]. Oil and proteins are synthesized through the central carbon
and nitrogen metabolic pathways during embryo development.
The embryos are green and capable of performing photosynthesis,
which generates over 50% of the reducing power and ATP used in
extensive oil accumulation [94,126]. A flux balance analysis model
was reconstructed primarily from the AraCyc database [30] and
available literature. The objective function was based on data from
the literature regarding substrate uptake and rapeseed biomass
production during the early stages of oil accumulation. Flux balance
analysis was used along with principal component analysis and
in silico mutant analysis to investigate “metabolic network plasticity” with specific emphasis on the metabolic regulation of lipid
production. The principal component analysis-assisted exploration
of the entire phenotypic solution space was based on the reduction of the dimensionality in a highly complex biological system to
identify viable solutions. The predicted growth including lipid accumulation and its regulation by the transcription factor WRINKLED1
agreed with the experimental observations [21].
The Brassica napus bna572 model is so far the most comprehensive
seed model, containing 10 compartments, including apoplastic and
intermembrane spaces and the thylakoid lumen [20]. Flux variability
analysis [53] was used to interrogate other possible optimal flux
distribution solutions in this flux balance analysis model and to predict metabolic fluxes under photoheterotrophic and heterotrophic
conditions in combination with different organic and inorganic
nitrogen sources, respectively [19,20]. Flux variability analysis
makes it possible to assess the ranges of flux values that represent optimal solutions under the constraints and objective functions used. This means
that alternate and redundant pathways are also explored by this
approach and may be chosen as an optimal solution. The actual
biomass production rates, including the carbon balance and housekeeping energy requirements of cultured B. napus embryos, were
used as constraints, while the substrate uptake (photons and carbon
and nitrogen sources) was minimized in flux balance analysis [20].
For flux variability analysis, the value of the objective function was fixed, and
the flux through each reaction was then minimized and maximized in turn
to narrow the ranges for flux distributions. Individual
fluxes were categorized into 18 different types and 6 classes based
on the flux variability/stability and essentiality/substitutability to
assess which pathways were active under four different conditions.
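The flux variability procedure described above can be summarized as a pair of optimization problems solved for every reaction j. This is a sketch in generic notation; the symbols S, v, v^lb, v^ub, the objective vector c, and the optimal objective value Z* are assumed here for illustration, and details such as tolerances on Z* are omitted.

```latex
% Step 1: solve the flux balance problem once to obtain the optimal objective value
Z^{*} \;=\; \min_{v} \; c^{\top} v
\quad \text{s.t.} \quad S\,v = 0, \qquad v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}

% Step 2: for each reaction j, find the flux range compatible with that optimum
v_{j}^{\min} \;=\; \min_{v} \; v_{j}, \qquad
v_{j}^{\max} \;=\; \max_{v} \; v_{j}
\quad \text{s.t.} \quad S\,v = 0, \qquad v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}, \qquad c^{\top} v = Z^{*}
```

A reaction with v_j^min = v_j^max carries a fixed flux in every optimal solution, whereas a wide interval indicates that alternate or redundant routes can substitute for it.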
The predicted fluxes were compared to the actual fluxes
obtained by 13C-based steady-state metabolic flux analysis [19].
It is very important to keep in mind that these two models had
different network structures, which makes any direct comparison of them extremely difficult. Differences in network
structure are a result of poor resolution of compartmentation of
some pathways (e.g., cytosolic and plastidic glycolysis) or missing steps (e.g., plastidic malic enzyme as a source of carbon and
energy for fatty acid synthesis) in 13C-based flux models [19]. On
the other hand, 13C-based flux models are well constrained based
on 13C-signatures reflecting real fluxes. The flux variability analysis yielded predictions that were for the most part consistent with
the actual fluxes in developing rapeseed embryos. However, the
results obtained from the correlation analysis used for flux comparison are somewhat misleading. Although the correlation coefficient
was high when bna572 was compared to the 13C-flux model [19],
this result was biased because all of the small fluxes clustered
close to zero. In fact, numerous small positive or negative (opposite
direction) fluxes in the 13C-flux model were compared to the corresponding small, but several-fold different or zero, fluxes in the
bna572 model. For example, the TCA cycle never carried any flux
between α-ketoglutarate and malate in the bna572 model, while
there was a small positive flux between these metabolites based
on 13C metabolic flux analysis [19]. There were also clearly other
more significant disagreements between the two models. Pathways
known to be active in vivo that represented unfavorable “wasteful” or “energetically inefficient” processes were never selected to
be part of the solution by flux variability analysis due to insufficient constraints and extreme objective functions. These pathways
included: (i) amino acid uptake (ammonia uptake was energetically most efficient); (ii) glucose and fructose uptake (sucrose was
energetically most efficient); and (iii) pentose-phosphate pathway (reductant was produced exclusively by photosynthesis and
a number of dehydrogenases) [19,20,94,126–128]. However, it is
important to note that flux variability analysis is extremely useful,
as it enables predictions regarding the existence and essentiality
of novel and alternate pathways not considered in small 13C-based
flux models. These predictions can then be tested experimentally
by 13C-labeling approaches in plants [104,120].
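The caveat about the correlation analysis raised in this section can be illustrated with a toy calculation. In the Python sketch below, a few large fluxes agree well between two hypothetical flux sets, while the small fluxes disagree in sign or magnitude. All flux values are invented for illustration and do not come from bna572 or from the 13C-based model in [19].

```python
# Toy illustration of how a few large fluxes can dominate a correlation.
# All flux values are hypothetical and chosen only to show the effect.
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlate(pairs):
    """Correlate the first and second members of (measured, predicted) pairs."""
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# (measured flux, predicted flux) pairs
large = [(100.0, 95.0), (80.0, 85.0), (60.0, 58.0)]
small = [(0.5, 0.0), (-0.4, 0.0), (0.3, 1.2), (-0.2, 0.9), (0.6, -0.8), (0.1, 0.0)]

r_all = correlate(large + small)   # dominated by the three large fluxes
r_small = correlate(small)         # agreement among the small fluxes only

print(f"all fluxes: r = {r_all:.3f}; small fluxes only: r = {r_small:.3f}")
```

With these numbers the coefficient over all fluxes is close to 1, whereas the coefficient over the small fluxes alone is negative, which is why a high overall correlation says little about agreement among low-flux reactions.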
3.3. AraGEM, an Arabidopsis flux balance analysis model
representing a C3 plant cell
AraGEM v.1.0 is a flux balance analysis model and it was the first
attempt to predict and compare the central carbon metabolism in
compartmented plant cells representing: (i) photosynthetic leaves;
(ii) photorespiring leaves; and (iii) non-photosynthetic respiring cells/tissues/organs (e.g., roots). The first, photosynthesis-only
scenario is realistic for plants grown under saturating levels of
CO2 when the oxygenation reaction of RubisCO is suppressed.
Modeling of: (i) photosynthetic and (ii) photorespiring leaves
allowed estimation of the effects of photorespiration on changes of
fluxes in central carbon metabolism. The network reconstruction
involved all available databases that provided the stoichiometry
and enzyme localization information as well as the information pertaining to individual gene-enzyme-reaction relationships.
AraGEM includes several compartments (cytosol, mitochondria,
plastids, peroxisomes, and vacuoles) and 18 plasma membrane and
83 interorganellar transporters [18]. A new, updated version of the
AraGEM network, iRS1597, was generated using up-to-date annotations, but this network was not used for modeling [22]. During
curation, 75 reactions that were absent from KEGG and AraCyc were
identified as essential for biomass production by flux balance analysis. Fluxes leading to the synthesis of sugars, amino acids, fatty
acids, nucleotides, vitamins and cofactors, and cell-wall components represent 47 different biomass drains that were included
in the biomass constituting equation. The uptake of photons, CO2,
inorganic salts, and selected sugars and amino acids represent
18 uptake fluxes. During network reconstruction, 446 singleton
metabolites were also identified [18].
In non-photosynthetic cells, photosynthetic and photorespiratory fluxes were set to zero, while sucrose uptake was allowed.
The rate of biomass production was held constant and the actual
biomass composition measurements [129] were used. The objective function involved the minimization of sucrose uptake and
utilization. For photosynthetic cells without photorespiration, the
flux through the oxygenation reaction of RubisCO was set to zero,
while for normally photosynthesizing and photorespiring mesophyll cells the carboxylation-oxygenation ratio of RubisCO was
constrained to 3:1, based on available data [130]. In both cases, CO2
and photon uptake were set as free fluxes (subjected to modeling)
and sucrose uptake from the extracellular space was constrained
to zero. Leaf biomass composition [131] was used to constrain the
biomass fluxes. The objective function involved minimizing the
photon uptake rates.
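The coupling of RubisCO carboxylation and oxygenation described above amounts to one extra linear constraint on the flux vector. The following Python sketch shows how such a constraint row could be built and checked. The reaction identifiers ("RBC", "RBO", "PHOTON_UPTAKE") and the flux values are hypothetical placeholders and are not AraGEM identifiers; only the 3:1 ratio and the zero bound on oxygenation come from the text.

```python
# Sketch of coupling RubisCO carboxylation and oxygenation at a fixed ratio.
# Reaction ids ("RBC", "RBO", "PHOTON_UPTAKE") are hypothetical placeholders.

def ratio_constraint_row(reaction_ids, carboxylase_id, oxygenase_id, ratio):
    """Coefficients a such that a . v = 0 enforces v_carb = ratio * v_oxy."""
    row = [0.0] * len(reaction_ids)
    row[reaction_ids.index(carboxylase_id)] = 1.0
    row[reaction_ids.index(oxygenase_id)] = -float(ratio)
    return row


def split_rubisco_flux(total_flux, ratio):
    """Split a total RubisCO flux into (carboxylation, oxygenation)."""
    oxygenation = total_flux / (ratio + 1.0)
    return total_flux - oxygenation, oxygenation


reaction_ids = ["RBC", "RBO", "PHOTON_UPTAKE"]

# Photorespiring leaf: carboxylation:oxygenation constrained to 3:1.
row = ratio_constraint_row(reaction_ids, "RBC", "RBO", ratio=3)
carb, oxy = split_rubisco_flux(4.0, ratio=3)

# Photosynthesis-only scenario: oxygenation is simply bounded to zero.
bounds_no_photorespiration = {"RBO": (0.0, 0.0)}

# Any flux vector honoring the ratio makes the constraint row evaluate to zero.
fluxes = [carb, oxy, 10.0]
constraint_value = sum(a * v for a, v in zip(row, fluxes))
print(row, carb, oxy, constraint_value)
```

Appending such a row to the stoichiometric matrix keeps the two reactions linked, so the optimizer cannot route flux through carboxylation alone.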
AraGEM is an extensive model of Arabidopsis metabolism, and
its development is considered an important achievement. Given
the complexity of plant metabolism, a number of simplifications
had to be implemented to reduce this complexity and to enable
flux balance analysis. Carbon and energy resources needed for
general cell maintenance, degradation and component turn-over
processes, and futile cycles are estimated at about 37% of total
cellular needs [125], and were included as a simple ATP maintenance term in the AraGEM model. Fatty acids
were represented solely by palmitic acid, which reduced the complexities that arise with lipid heterogeneity. The resulting impacts of
these simplifications on the relationship between cell growth and
photon uptake rates are minimal. However, the impacts of similar simplifications on fluxes through individual pathways remain
under investigation for several models. Despite the simplifications,
AraGEM was able to predict metabolic phenotypes in photosynthetic and non-photosynthetic Arabidopsis cells quite accurately
from the qualitative perspective. These predictions included: (i) the
activation of the traditional photorespiratory cycle and corresponding biomass loss as CO2 during photorespiration; (ii) activation of
sucrose degradation, glycolysis, and TCA cycle in root cells; and
(iii) identity of specific sources of reductant for nitrogen reduction in light and dark conditions. AraGEM also provided novel
predictions regarding metabolic flexibility and the production of
high-energy cofactors in different compartments under various
conditions [18].
3.4. C4GEM, maize, sugarcane, and sorghum flux balance analysis
models representing C4 plant cells
C4 metabolism enables some plants such as maize, sugarcane,
and sorghum to minimize water loss and photorespiration by
concentrating the CO2 used by RubisCO [132–134]. Briefly, C4
metabolism involves CO2 fixation by phosphoenolpyruvate carboxylase, generating malate in mesophyll cells. Malate is then
transported to bundle-sheath cells, where its decarboxylation to
pyruvate by malic enzyme takes place. In the bundle-sheath cells,
CO2 released from malate is fixed by RubisCO in the traditional
Calvin–Benson–Bassham cycle. Pyruvate is transported back to
the mesophyll cells and enters the C4 cycle. There are species-specific variations to C4 metabolism depending on whether NAD- or NADP-dependent malic enzymes and phosphoenolpyruvate carboxykinase and mitochondria are involved, and these modes were
accounted for in the selected model C4 plant species. Compartmented AraGEM was used as a template for the reconstruction
of C4GEM, to which pathways and transporters related to the C4
metabolism were added [17]. Species-specific gene annotation was
implemented to generate models of central carbon metabolism in
maize (11,623 genes), sugarcane (3,881 genes), and sorghum (3,557
genes) [17].
GEMing of C4 metabolism offers a unique opportunity to interrogate the interactions between two different cell types and serves
as a basis for future studies involving modeling of multiple cell
types. Bundle-sheath and mesophyll cells were modeled as duplicated subsets of the whole system and the intercellular transport by
the plasmodesmata provided the needed flux link between these
two cell types [17]. Metabolites were exchanged between the cells
and remained ‘intracellular’ in both cells. In a single-cell model,
they would be excreted from the cell and considered ‘extracellular’. The network differences between the cell types were achieved
by implementing cell-specific constraints based on available transcriptomic and/or proteomic data for that cell type. For example,
the C4 cycle operates only in mesophyll cells, so fluxes through this
cycle can be constrained to zero in the bundle-sheath cells.
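The two-cell construction described above (duplicate the network, link the copies through plasmodesmata, then apply cell-specific constraints) can be sketched with plain Python dictionaries. The reaction names, the lumped stoichiometries, the "_M"/"_BS" suffixes, and the default bounds are illustrative assumptions, not the identifiers or values used in C4GEM.

```python
# Sketch of a two-cell-type model built by duplicating one network.
# Reaction names and lumped stoichiometries are illustrative assumptions.

DEFAULT_BOUNDS = (-1000.0, 1000.0)

single_cell = {
    # CO2 fixation by PEP carboxylase, lumped through to malate.
    "PEPC_lumped": {"pep": -1.0, "co2": -1.0, "malate": 1.0},
    # Malate decarboxylation by malic enzyme.
    "malic_enzyme": {"malate": -1.0, "pyruvate": 1.0, "co2": 1.0},
}


def duplicate_network(reactions, cell_tags):
    """Copy every reaction and metabolite once per cell type, suffixing ids."""
    model = {}
    for tag in cell_tags:
        for rid, stoich in reactions.items():
            model[f"{rid}_{tag}"] = {f"{m}_{tag}": c for m, c in stoich.items()}
    return model


def add_intercellular_transport(model, metabolites, src, dst):
    """Add one transport reaction per metabolite; both pools stay intracellular."""
    for met in metabolites:
        model[f"PD_{met}"] = {f"{met}_{src}": -1.0, f"{met}_{dst}": 1.0}


# M = mesophyll, BS = bundle sheath
model = duplicate_network(single_cell, cell_tags=("M", "BS"))
add_intercellular_transport(model, ["malate", "pyruvate"], src="M", dst="BS")

bounds = {rid: DEFAULT_BOUNDS for rid in model}
# Cell-specific constraint: switch off PEP carboxylase in the bundle sheath,
# as would be done from transcriptomic or proteomic evidence.
bounds["PEPC_lumped_BS"] = (0.0, 0.0)

print(sorted(model))
```

Because the transport reactions connect two intracellular pools, the exchanged metabolites never appear as extracellular species, which matches the distinction from a single-cell model drawn above.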
C4GEM delineated major differences in metabolism between
the bundle-sheath and mesophyll cells that overall agreed with
maize proteomic data. Bundle-sheath cells tend to have high fluxes
through the pathways relevant to CO2 assimilation, while the
mesophyll cells have highly active pathways involving pyruvate
and nucleotide metabolism as well as nitrogen assimilation and
metabolism [17]. A surprising prediction that came out of the
model was the existence of differential fluxes in photosystems I
and II, represented by the cyclic and linear electron transport chains,
respectively, in the two cell types. This distinction was enabled
by the differences in ATP requirements in the bundle-sheath versus
mesophyll cells and for the different malate decarboxylation modes
mentioned above. C4 cycling in mesophyll cells requires more ATP
than C3 photosynthesis and, therefore, mesophyll cells would be expected to
have a very active cyclic electron transport chain around photosystem I, which produces only ATP and helps keep the appropriate
ATP/NADPH balance. However, C4GEM predicted that for the
NADP-dependent malic enzyme mode, only the linear electron transport chain was active in mesophyll cells, while the bundle-sheath
cells had only the cyclic electron transport chain. Based on simulations,
the cyclic electron transport chain is more efficient in the bundle-sheath than in mesophyll cells with the NADP-malic enzyme-type
metabolism [17]. Because the photon uptake was minimized, the
cyclic electron transport was selected as the most efficient solution
for ATP production in the bundle-sheath, but not in mesophyll cells.
Minimizing the uptake of substrates (e.g., photons and
sucrose under photoautotrophic and photoheterotrophic conditions, respectively) with maximal production of biomass of the
appropriate composition provides the ability to assess the efficiency of cellular metabolism. It would be interesting to see flux
distribution predictions for physiologically relevant conditions
for a C4 plant model that includes constraints relating to the
actual capacities (maximal quantum yield) of the photosystems
and the downstream photosynthetic components including CO2
fixation based on available measurements [135]. Using photons
rather than the components of the photosystems is a mathematical convenience that allows controlling the downstream fluxes. Is
the minimization of the photon uptake a physiologically relevant
objective function for a photosynthetic cell? We are questioning
this objective function because the amount of light absorbed by
photosynthetic apparatus does not translate to the same amount
of electron flux. Not every photon transferring its energy to an
electron is used in photochemistry and there are processes such
as photoinhibition that are not accounted for in these models.
Minimizing photon uptake might be relevant to very low light
conditions or during an abiotic stress when photosynthesis is
suppressed [136,137], but not to a healthy plant growing under
favorable conditions when photons and minerals are abundant.
Photosystems are usually hit with excess light energy that needs to
be dissipated as heat (non-photochemical quenching of chlorophyll
fluorescence) or by release of fluorescent light from excited chlorophylls [138]. To assess the influence of photon and CO2 uptake rates
on biomass production, we performed a simulation study using
AraGEM [18], which also used the minimization of photon utilization as an objective function. Results of these simulations are
presented in Section 4.
3.5. Maize GEM iRS1563 representing C4 plants
An updated model of C4GEM (iRS1563) was generated for
maize mesophyll and bundle-sheath cells [22]. This model
describes more detailed biosyntheses of cell wall components
and lipids and includes selected, well-studied pathways of secondary metabolism (flavonoids, terpenoids, phenylpropanoids,
anthocyanins, carotenoids, chlorophylls, glucosinolates, and some
hormones) not present in previous models of central carbon
metabolism [17,18]. iRS1563 contains 674 more metabolites and
893 more reactions than C4GEM, which covers mostly central carbon metabolism. This model also has 445 reactions and
369 metabolites that are unique to maize when compared to the
updated AraGEM (iRS1597) due to the differences between C3 and
C4 metabolism as well as in secondary metabolism. All reactions are
elementally and charge balanced in six compartments [22]. However, challenges still exist for proton balancing based on pKa values
to enable organelles to operate at different pH values. Measurements of biomass composition for maize and other closely related
species were used to constrain biomass fluxes in this flux balance
analysis model [22].
As with C4GEM, iRS1563 was designed to predict flux distributions under different physiological conditions represented by
photosynthesis, photorespiration, and respiration in mesophyll and
bundle-sheath cells of a C4 plant. Unfortunately, no data or discussion was provided to compare this model to C4GEM in terms of
flux predictions for these different types of physiological conditions. Changing lignin composition in cell walls without affecting
total biomass accumulation is a challenging task important for cellulosic biofuel production [139,140]. Therefore, the authors focused
on predicting metabolic phenotypes in two well-characterized
lignin biosynthesis mutants under photosynthetic conditions and
on prediction validation [22]. Fluxes through the steps disrupted
in the mutants, cinnamyl alcohol dehydrogenase and caffeate 3-O-methyltransferase, were set to 10% of the wild-type activities
predicted by the model. Presence of a residual activity of these
enzymes was necessary to ensure lignin biomass production. Maximal photon utilization and CO2 uptake were allowed. The flux
balance analysis model was about 81% accurate in predicting the
expected qualitative changes in mutant fluxes when compared to
the experimental findings [22].
4. Evaluating AraGEM through simulations
The AraGEM model (v. 1.0) [18] was simulated in an effort to
demonstrate to plant modelers: (i) how critical constraints can
impact overall model results; and (ii) the robustness of the AraGEM
model. To achieve this goal, the following were examined: (i)
the relationship between the specific growth rate and the photon
uptake rate given a constrained CO2 uptake rate; (ii) the impact of
varying the stoichiometry of the biomass equation; (iii) the ability
of AraGEM to synthesize all compounds in the model; and (iv) the
impact of a net influx or efflux of free protons on the overall model
convergence. These factors play a key role in using GEMs to guide
metabolic engineering in silico. As shown in Fig. 2, it is critical to
consider how a metabolic engineering strategy might influence the
growth rate, photon uptake rate, and CO2 uptake rate of the mutant
plant. If one of these parameters is known (or can be assumed),
AraGEM will calculate the rest. However, it is presently unrealistic to assume that a GEM (of any kind) can predict all of these
parameters for a genetic mutant without some input or constraints
supplied by the user. For each curve in Fig. 2, the steady-state
solution for a given CO2 uptake rate lies at the junction between
the horizontal and vertical portions of the curve. At photon
uptake rates below this junction, an imbalance between CO2
and the photons required for growth is observed, leading to a model
that does not converge in flux balance analysis. Photon
uptake rates above this junction represent a scenario in which photons are consumed in excess. The regulatory structure of the plant cell does not
allow either scenario to happen in the physical world.
Fig. 2. AraGEM model convergence with constrained CO2 uptake. The relationship
between the cellular growth rate and photon uptake rate is shown for cases in which
the CO2 uptake rate was constrained to 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, and
4.5 mmol g−1 dry weight h−1 as labeled in the figure.
Next, the biomass constituting equation of the AraGEM model
was investigated to determine its impact on the relationship
between the growth rate and the photon uptake rate and to address
the question of how well individual biomass components need to be
determined. The latter is especially relevant to plant models, as
the biomass measurements for different biomass components often
come from diverse types of measurements, experiments, research
groups, and even cell types and plant species. For these simulations, the CO2 uptake rate was held constant at 2.5 mmol g−1 dry
weight h−1. The following components of the biomass constituting equation were varied: (i) cellulose and xylose; (ii) the lignin
components (4-coumaryl alcohol, coniferyl alcohol, and sinapyl
alcohol); (iii) the lipids; and (iv) the maintenance ATP built into
the biomass constituting equation. The stoichiometric coefficient
for each of these parameters was varied by multipliers of 0; 0.5; 1;
1.5; 2; and 4, relative to their values in the AraGEM model, while the
composition of the rest of the biomass components was kept constant (Fig. 3). For the cases of varying lignin components (Fig. 3B)
and lipids (Fig. 3C), very little influence is observed for the relationship between the growth rate and photon uptake rate. The
cellulose and xylose components (Fig. 3A) show greater sensitivity and lead to a reduction in the growth and photon uptake rates
as their stoichiometric coefficients are increased in the biomass
constituting equation. The ATP maintenance component (Fig. 3D)
has by far the greatest influence on the relationship between the growth
rate and photon uptake rate of the cell. As the stoichiometry of
required ATP maintenance was increased, the photon uptake rate needed to
achieve the steady-state growth rate increased considerably. This
result is common among several GEMs and has also been reported
for bacteria, in which increases in ATP demand are accompanied
by increases in substrate uptake rates [5]. Nevertheless, the AraGEM
model is considered to be a robust model, providing a fairly consistent relationship between the growth rate and photon uptake rate
even when considerable changes are made to its biomass constituting equation.
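The sensitivity analysis described above rescales selected coefficients of the biomass constituting equation and repeats the flux balance calculation for each variant. The Python sketch below covers only the rescaling step. The component names and coefficient values are hypothetical placeholders rather than AraGEM values; only the multipliers are those listed in the text, and the flux balance solver itself is not shown.

```python
# Sketch of the biomass-coefficient sweep. Coefficients are hypothetical
# placeholders, not the values used in AraGEM.

MULTIPLIERS = (0, 0.5, 1, 1.5, 2, 4)  # multipliers listed in the text

base_biomass = {
    "cellulose": 0.5,
    "xylose": 0.25,
    "lipid": 0.125,
    "atp_maintenance": 32.0,
}


def scale_components(biomass, components, multiplier):
    """Return a new biomass equation with the chosen coefficients scaled."""
    return {
        met: (coeff * multiplier if met in components else coeff)
        for met, coeff in biomass.items()
    }


# One biomass equation per multiplier, varying cellulose and xylose together.
variants = {
    m: scale_components(base_biomass, {"cellulose", "xylose"}, m)
    for m in MULTIPLIERS
}

# Each variant would then be passed to a flux balance solver (not shown) to
# recompute the relationship between growth rate and photon uptake rate.
for m, eq in variants.items():
    print(m, eq["cellulose"], eq["xylose"], eq["lipid"])
```

Returning a new dictionary for each multiplier leaves the reference equation untouched, so the same baseline can be reused for the lignin, lipid, and ATP maintenance sweeps.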
Additional simulations were performed with the AraGEM model
to test its robustness and capabilities. When the biomass constituting equation provided with the AraGEM model is used, fluxes
of central carbon metabolism dominate the flux balance analysis
results, and little to no flux is observed through the secondary
metabolite pathways. Of the 1,567 total reactions in the AraGEM
model, only 435 receive non-zero (or non-trivial) flux. A simulation
study was performed to determine whether this was: (i) the result
of an incomplete model or (ii) a function of secondary metabolites not being included in the biomass constituting equation. We
Fig. 3. Some biomass components influence the modeling results more than others. The relationship between the growth rate and the photon uptake rate is shown for varying
the stoichiometric coefficients of specific components by multipliers of 0, 0.5, 1, 1.5, 2, and 4 (direction shown from 0 to 4 by arrows). CO2 uptake was held constant. The
stoichiometric parameters of the following biomass components were varied: (A) cellulose and xylose; (B) lignin components including (i) 4-coumaryl alcohol, (ii) coniferyl
alcohol, and (iii) sinapyl alcohol; (C) lipids; and (D) maintenance ATP requirements of the biomass equation.
5. How can GEMing in plants be improved?
Fig. 4. AraGEM is not sensitive to pH change. The number of reactions whose fluxes remain
unchanged when the specific proton flux is varied among (i) −10, (ii) 0, and (iii) 10 mmol
H+ g−1 dry weight h−1.
constructed an algorithm where every metabolite of the AraGEM
model was added to the biomass constituting equation individually,
and the model was tested for growth. If the cell was able to grow
in silico, the metabolite was capable of being produced by AraGEM.
The results of this study revealed that all 1,748 metabolites of the
AraGEM model are capable of being produced. This further reveals
that all 1,567 reactions of the model are complete and capable of
carrying flux. This also indicates that the 435 reactions used by the
AraGEM model, as published, are a function of its biomass constituting equation. This leads to an important point: if this
model is to be used to study secondary metabolite pathways, these
secondary metabolites must be added (manually) to the biomass
constituting equation. This means that the AraGEM model is ready
for additional uses involving secondary metabolism (given small
modifications to the biomass constituting equation). This applies primarily to studies involving plant-environment interactions
as well as interactions of plants with pathogens, herbivores, and
other plants or any study highlighting the importance of secondary
metabolism. As is the case of all GEMs, the biomass constituting
equation ultimately determines how a metabolic network is used,
given constraints applied to substrate uptake and product secretion
[5].
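The producibility test described above (add one metabolite at a time to the biomass equation and check for growth) can be sketched on a small network. This is an illustrative sketch under stated assumptions, not the authors' algorithm or code: the five-metabolite network, the metabolite names, and the function `growth_with_extra` are hypothetical. Metabolite E is deliberately left without a producing reaction so that the test has something to reject.

```python
import numpy as np
from scipy.optimize import linprog

METABOLITES = ["A", "B", "C", "D", "E"]
# Hypothetical reactions (columns):
# uptake (-> A), A -> B, B -> C, A -> D, E -> D, biomass (C ->)
S = np.array([
    [1, -1,  0, -1,  0,  0],  # A
    [0,  1, -1,  0,  0,  0],  # B
    [0,  0,  1,  0,  0, -1],  # C (the only original biomass component)
    [0,  0,  0,  1,  1,  0],  # D (a "secondary metabolite" outside biomass)
    [0,  0,  0,  0, -1,  0],  # E (consumed but never produced)
], dtype=float)
BIOMASS = 5
BOUNDS = [(0, 10)] + [(0, None)] * 5  # uptake is capped at 10

def growth_with_extra(met_index):
    """Maximum growth after adding one unit of a metabolite to the biomass equation."""
    S_test = S.copy()
    S_test[met_index, BIOMASS] -= 1.0  # biomass now also consumes this metabolite
    c = np.zeros(S.shape[1])
    c[BIOMASS] = -1.0                  # linprog minimizes, so negate to maximize growth
    res = linprog(c, A_eq=S_test, b_eq=np.zeros(S.shape[0]),
                  bounds=BOUNDS, method="highs")
    return -res.fun if res.success else 0.0

# A metabolite counts as producible if the augmented model can still grow.
producible = {m: growth_with_extra(i) > 1e-9 for i, m in enumerate(METABOLITES)}
print(producible)
```

In this toy, D carries no flux under the original biomass equation but becomes active as soon as it is added to it, which mirrors the point made in the text about secondary metabolites.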
Finally, the impact of the specific proton flux was investigated
using the AraGEM model. The specific proton flux was initially
developed for bacteria and describes the total rate at which protons
are secreted or consumed by the cell [48]. Obviously, this parameter is more important to acid-producing bacteria than it is for
plants. To study the specific proton flux using AraGEM, a proton
exchange reaction was installed and constrained to values of −10,
0, and 10 mmol protons g−1 dry cell weight h−1. These values correspond to proton efflux, no proton exchange, and proton influx,
respectively. Flux balance analysis was applied to all cases, and few
changes in fluxes were observed. The number of reactions maintaining identical fluxes is shown in Fig. 4. Of the 1,567 reactions,
1,449 maintained an identical flux for all three cases of the specific
proton flux. This represents a considerable difference from bacterial
GEMs, which are particularly sensitive to the specific proton flux
parameter [5]. Collectively, the results of these simulation studies
with AraGEM suggest that this model (i) is completely functional;
(ii) is robust; and (iii) has a metabolic network that is functionally
different from GEMs created for bacteria.
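The proton flux experiment above follows a simple pattern: fix a proton exchange reaction at each of three values, run flux balance analysis, and count the reactions whose flux is the same in all cases. A minimal sketch of that pattern is given below. The five-reaction network, the reaction names, and the reversible `h_buffer` sink are hypothetical and were chosen so that each optimum is unique; the sketch does not reproduce AraGEM.

```python
import numpy as np
from scipy.optimize import linprog

REACTIONS = ["uptake", "convert", "growth", "h_exchange", "h_buffer"]
# Hypothetical network: uptake (-> A), convert (A -> B + H), growth (B ->),
# h_exchange (-> H; positive = proton influx), h_buffer (H ->, reversible).
S = np.array([
    [1, -1,  0, 0,  0],  # A
    [0,  1, -1, 0,  0],  # B
    [0,  1,  0, 1, -1],  # H
], dtype=float)

def fba(proton_flux):
    """Maximize growth with the proton exchange fixed at the given value."""
    c = np.zeros(len(REACTIONS))
    c[2] = -1.0  # linprog minimizes, so negate to maximize growth
    bounds = [(0, 10), (0, None), (0, None),
              (proton_flux, proton_flux), (-1000, 1000)]
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x

# Efflux, no exchange, and influx, as in the text.
fluxes = np.array([fba(p) for p in (-10.0, 0.0, 10.0)])
unchanged = np.all(np.isclose(fluxes, fluxes[0], atol=1e-6), axis=0)
n_unchanged = int(unchanged.sum())
print(f"{n_unchanged} of {len(REACTIONS)} reactions keep an identical flux")
```

Comparing flux vectors in this way is only meaningful when the optimum is unique, as it is here; in a genome-scale model, alternative optima can make two solutions differ even when the proton constraint has no real effect.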
Constructing the first compartmented plant GEMs describing
photoautotrophic and photoheterotrophic metabolism including
photorespiration and C4 metabolism in two different cell types
required an enormous effort and was quite an accomplishment.
The models described here represent the first stepping stone in
plant GEMing, but there is room for improvement. Modeling involving these plant genome-scale reconstructions provided mostly
qualitative predictions of flux distributions for selected aspects of
primarily central carbon metabolism, but most importantly an idea
about model properties and a framework for improvements. One of
the major obstacles in plant GEMing is the large number of missing
parts and unknown interactions among the known parts. Reverse
genetics and enzyme activity and specificity studies along with subcellular localization will slowly populate the list of genes/enzymes
of known metabolic functions facilitating far better gene annotation. Transcriptomics and proteomics are extremely useful in
delineating the presence of transcripts and proteins in plant cells
and subcellular localization of enzymes [112,141,142]. Advancing high-throughput non-targeted metabolomics will help identify
novel secondary metabolites that need to be included in plant
GEMs, but this effort has to be connected to metabolite localization
to relevant cell types and organelles and for defined developmental stages and growth conditions. Laser-capture microdissection
and pressure catapulting, highly sensitive liquid chromatography-tandem mass spectrometry, and novel approaches including the
‘mass microscope’ [89,91,92] will become indispensable for this
goal. Recent advancements in non-aqueous fractionation will also
aid in these goals [90]. Considering that there are many different plant cell types that are recalcitrant to separation from other
cells and virtually an infinite number of internal and environmental
stimuli and their combinations, this is a Sisyphean task. Probably
the best approach with the current knowledge base is to focus on
cell types and conditions of interest. New parts and functionalities
can be added to the existing networks to generate comprehensive
template models for major plant species, from which specific networks operating in cells of interest can be extracted. This procedure
is similar to what evolved from the first E. coli K-12 MG1655 models
for bacteria [7,12,56].
Current plant GEMs tend to use existing data to constrain the
biomass fluxes, but these data often come from different studies of different biomass
metabolites and are sometimes even extrapolated from other plant species. The
same is true for model validation. Often, validation of a predicted flux distribution relies solely on experimental evidence that
an enzyme predicted to be active in silico is present in similar tissues.
The following points relate to problems with the different data sources used in GEMing. Biomass and
its composition are highly variable traits that largely depend on
the developmental stage and environmental conditions. GEMing
should include measurements relevant to the model system at a
defined developmental stage and conditions and, if possible, from
appropriate cell types. Additional flux constraints should be added
to minimize the phenotypic solution space. Plant models are currently only constrained based on biomass fluxes, substrate uptake,
or a broad range of physiologically relevant metabolite concentrations [14,15,17,18,21,22], but do not consider thermodynamic
constraints for the internal reactions or other constraints. Only
maize iRS1563 was fully balanced for all elements and included
proton and charge balancing [22].
Objective functions used routinely for bacterial GEMing may
not be the most appropriate for GEMing in plant cells. Identifying
proper objective functions of specific plant cell types will improve
the predictive capabilities of resulting models. Because plant cells
are exposed to complex signals at the same time, modeling will
require multiple and more complex objective functions. The idea
of minimizing the use of resources while maximizing the output
under the given circumstances is probably valid, but difficult to prove because
the circumstances surrounding growth are often incompletely known and are not considered in the model as resource and
energy drains (e.g., responses to pathogens and herbivores). Cell
maintenance, “wasteful” degradation processes, and futile cycles
also should be considered individually, as they represent significant sinks of resources and energy that are different in different cell
types [18,22,125]. In addition, plant cells respond metabolically to
convoluted stimuli coming from the adjacent cells and the environment, which also requires redirection of carbon flow to additional,
often neglected pathways. It is advisable to start looking at interactions of cells in tissues and include these pathways as active in
plant GEMs. The Nielsen and Maranas groups are elegantly pioneering the way of addressing these problems with their systems
involving two cell types in C4 plants [17,22].
Using the minimization of fluxes or steps as objective functions
in GEMing tends to get rid of “redundant” pathways. If there are
two redundant pathways that have similar cofactor balance, the
flux minimization algorithms will choose one pathway, despite
the fact that they are both active in vivo, which can be proven
by transcriptomic, metabolomic, and fluxomic approaches. These
pathways may seem to be redundant, but obviously they are needed
to allow metabolic adaptation in plant cells and are a result of
complex metabolic regulation that cannot easily be accounted for
in a flux balance analysis model. Finding the minimum number
of appropriate objective functions that correctly represent cellular
physiology and metabolism is probably one of the most problematic challenges in GEMing in general. One will probably get away
with using a few objective functions if appropriate, physiologically
relevant, flux constraints are implemented. Then such properly
constrained models should start yielding accurate predictions of
internal fluxes.
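The effect of a flux-minimization objective on alternative routes can be seen in a small example. The sketch below is illustrative only: the network, with a one-step route and a two-step route between the same metabolites, is hypothetical, and the routes were given different lengths so that the optimum is unique. For routes of equal cost the linear program has alternative optima and the solver simply returns one of them.

```python
import numpy as np
from scipy.optimize import linprog

REACTIONS = ["uptake", "route_1", "route_2a", "route_2b", "growth"]
# Hypothetical network: uptake (-> A), route_1 (A -> B),
# route_2a (A -> X), route_2b (X -> B), growth (B ->).
S = np.array([
    [1, -1, -1,  0,  0],  # A
    [0,  0,  1, -1,  0],  # X
    [0,  1,  0,  1, -1],  # B
], dtype=float)

# Fix growth at 1 and minimize the sum of all (non-negative) fluxes.
c = np.ones(len(REACTIONS))
bounds = [(0, None)] * 4 + [(1.0, 1.0)]
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
if not res.success:
    raise RuntimeError(res.message)

min_flux = dict(zip(REACTIONS, np.round(res.x, 6)))
print(min_flux)
```

The two-step route receives zero flux although nothing in the network forbids it, which is the behavior the text warns about when both routes are known to be active in vivo.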
The presented plant flux balance analysis models were reasonably accurate in predicting biomass fluxes, which is not surprising,
since the actual cellular biomass composition was used to constrain most of these models. Prediction of internal fluxes in these
plant models was only qualitative, and a model was considered a success when the expected pathways were predicted to be
active in silico. Flux balance analysis has been found to perform
poorly in isolated cases in predicting the internal fluxes regardless of the model system. This is possibly a result of too many
degrees of freedom in these under-determined models and possibly due to the use of extreme, physiologically irrelevant objective
functions. Comparison of results obtained from flux balance analysis and 13C-based steady-state metabolic flux analysis revealed
that the actual flux distributions are within the phenotypically
feasible solution space of bacterial flux balance models [125]. However, the sampling of this solution space with extreme objective
functions used in flux balance analysis yielded flux combinations
that were substantially different from the actual flux distributions.
13C-based fluxes were in close agreement with the actual in vivo
fluxes, and therefore 13C-based steady-state metabolic flux analysis
is suitable for validating flux balance analysis predictions [125]. The situation is similar in plant systems, although
the direct comparison with the actual fluxes is not always possible. Comparison of developing rapeseed flux balance/variability
and 13C-based flux models revealed that while some fluxes were
comparable between the two models, others were not, and many
suboptimal but in vivo-active alternate pathways were not selected
as part of the solution [19,20]. Flux variability analysis (all possible
metabolic routes are considered and the resulting range of fluxes
for each reaction is returned from the analysis) [53] used in these
studies should be expanded with proper experimentally-derived
constraints and the exclusion of extreme objective functions
[19,20].
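Flux variability analysis, as defined in the parenthetical above, can be sketched in a few lines. The example below is a hedged illustration on a hypothetical four-reaction network with two parallel routes; the reaction names and the helper `solve` are assumptions. It follows the common formulation in which the objective is first optimized and each flux is then minimized and maximized while the objective is held at its optimum (here within a small numerical tolerance).

```python
import numpy as np
from scipy.optimize import linprog

REACTIONS = ["uptake", "route_1", "route_2", "growth"]
# Hypothetical network: uptake (-> A), two parallel routes A -> B, growth (B ->).
S = np.array([
    [1, -1, -1,  0],  # A
    [0,  1,  1, -1],  # B
], dtype=float)
GROWTH = 3
base_bounds = [(0, 10), (0, None), (0, None), (0, None)]

def solve(c, bounds):
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res

# Step 1: ordinary flux balance analysis (maximize growth).
c_growth = np.zeros(len(REACTIONS))
c_growth[GROWTH] = -1.0
max_growth = -solve(c_growth, base_bounds).fun

# Step 2: hold growth at its optimum, then minimize and maximize every flux.
fva_bounds = list(base_bounds)
fva_bounds[GROWTH] = (max_growth - 1e-6, None)

flux_ranges = {}
for j, name in enumerate(REACTIONS):
    c = np.zeros(len(REACTIONS))
    c[j] = 1.0
    lowest = solve(c, fva_bounds).fun
    highest = -solve(-c, fva_bounds).fun
    flux_ranges[name] = (round(lowest, 4), round(highest, 4))

for name, (lowest, highest) in flux_ranges.items():
    print(f"{name}: [{lowest}, {highest}]")
```

Both parallel routes report the full range from 0 to 10, so the analysis exposes the alternative pathway that a single optimal solution would hide. Experimentally derived bounds would be added to `base_bounds` to narrow these ranges.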
A significant challenge for GEMing in the future involves
the addition of flux constraints on internal fluxes or at branch
points within the metabolic network itself. 13C-based steady-state
metabolic flux analysis provided an accurate representation of
internal flux distributions, as demonstrated by the consistency
between the predicted and the actual, experimentally measured
13C-labeling in internal metabolites [125]. As such, this approach
should be used to provide validation for flux balance analysis
flux distribution predictions or these approaches should be used
together to obtain the most appropriate predictions of fluxes
and metabolic phenotypes. Alternatively, rather than using 13C-based steady-state metabolic flux analysis to validate flux balance
or variability models, 13C-based flux measurements should be
used as physiologically relevant flux constraints, which would
preclude the use of extreme objective functions (Jackie Shanks,
personal communication). If necessary, unique objective functions (or multiple combinations of these functions) instead of the
physiologically irrelevant extreme ones should be used. These
appropriate objective functions may ultimately be back-calculated
by comparing measured fluxes and flux balance analysis results
[47,55]. However, 13C-based steady-state metabolic flux analysis cannot be used in purely photosynthetic systems, where
13CO2 is the sole source of carbon, because every carbon in
every metabolite will become labeled to the same degree. In
such cases, kinetic modeling such as isotopically nonstationary 13C-based metabolic flux analysis is needed [143]. Despite
the many successes of GEMing, significant challenges lie ahead
for systems biology research not only in plants, but also in
bacteria.
6. Conclusions
The current plant GEMs have provided very promising predictions. However, given the extreme level of complexity of plant
metabolism, much research is left to be done. Missing parts can be
incorporated as technology involving different omics progresses.
Future focus will be on secondary metabolism. Choosing the right
model systems, for which experimental validations are available,
as well as using known constraints and appropriate objective
functions will play a crucial role in improving plant GEMs. However, the missing parts relate not only to metabolites and
enzymes, but also to known and novel regulators in both primary and secondary metabolism. Fluxes represent the cellular
metabolic phenotypes and inherently reflect convoluted regulatory mechanisms. In other words, the final flux value through
a specific step is a result of the influence of all possible active
regulatory mechanisms that can affect the enzyme activity of this
step. The question is how we can dissect these mechanisms and
integrate them into the model by associating relevant regulators with the enzymes. Whether we like it or not, integration of
regulation into GEMs will become important not only for understanding the basis of plant metabolic plasticity, but also for rational
metabolic engineering, as regulation tends to “overwrite” effects of
changes to expression levels of biosynthetic genes. Then we will
be able to address the questions of metabolic adaptations in plants
exposed to diverse stimuli and generate predictions for rational
plant metabolic engineering to improve plant stress tolerance and
resistance to pathogens and herbivores and ultimately biomass
production. So, are we ready for GEMing in plants? Definitely!
Acknowledgements
Research funding was obtained from NSF-MCB-1052145 and the
Biodesign and Bioprocessing Research Center at Virginia Tech. EC is
primarily responsible for writing the review. RS and JY performed
the simulation study with the AraGEM model. RS provided the discussion of AraGEM model simulations and edited the manuscript.
References
[1] K.J. Kauffman, P. Prakash, J.S. Edwards, Advances in flux balance analysis,
Current Opinion in Biotechnology 14 (2003) 491–496.
[2] E. Murabito, E. Simeonidis, K. Smallbone, J. Swinton, Capturing the essence of
a metabolic network: a flux balance analysis approach, Journal of Theoretical
Biology 260 (2009) 445–452.
[3] C.H. Schilling, J.S. Edwards, D. Letscher, B.O. Palsson, Combining pathway
analysis with flux balance analysis for the comprehensive study of metabolic
systems, Biotechnology and Bioengineering 71 (2000) 286–306.
[4] L.M. Liu, R. Agren, S. Bordel, J. Nielsen, Use of genome-scale metabolic models
for understanding microbial physiology, FEBS Letters 584 (2010) 2556–2564.
[5] R.S. Senger, Biofuel production improvement with genome-scale models: the
role of cell composition, Biotechnology Journal 5 (2010) 671–685.
[6] J.S. Edwards, B.O. Palsson, Systems properties of the Haemophilus influenzae Rd metabolic genotype, Journal of Biological Chemistry 274 (1999)
17410–17416.
[7] D.J. Baumler, R.G. Peplinski, J.L. Reed, J.D. Glasner, N.T. Perna, The evolution
of metabolic networks of E. coli, BMC System Biology 5 (2011) 182.
[8] N.C. Duarte, et al., Global reconstruction of the human metabolic network
based on genomic and bibliomic data, Proceedings of the National Academy
of Sciences of the United States of America 104 (2007) 1777–1782.
[9] I. Famili, J. Forster, J. Nielsen, B.O. Palsson, Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale
reconstructed metabolic network, Proceedings of the National Academy of
Sciences of the United States of America 100 (2003) 13134–13139.
[10] A.M. Feist, M.J. Herrgard, I. Thiele, J.L. Reed, B.O. Palsson, Reconstruction of
biochemical networks in microorganisms, Nature Reviews Microbiology 7
(2009) 129–143.
[11] P.C. Fu, Genome-scale modeling of Synechocystis sp PCC 6803 and prediction
of pathway insertion, Journal of Chemical Technology and Biotechnology 84
(2009) 473–483.
[12] J.D. Orth, et al., A comprehensive genome-scale reconstruction of
Escherichia coli metabolism-2011, Molecular Systems Biology 7 (2011) 1–9.
[13] O. Rolfsson, B.O. Palsson, I. Thiele, The human metabolic reconstruction Recon
1 directs hypotheses of novel human metabolic functions, BMC System Biology 5 (2011) 155.
[14] M.G. Poolman, L. Miguet, L.J. Sweetlove, D.A. Fell, A genome-scale metabolic
model of Arabidopsis and some of its properties, Plant Physiology 151 (2009)
1570–1581.
[15] E. Grafahrend-Belau, F. Schreiber, D. Koschutzki, B.H. Junker, Flux balance
analysis of barley seeds: a computational approach to study systemic properties of central metabolism, Plant Physiology 149 (2009) 585–598.
[16] H. Rolletschek, et al., Combined noninvasive imaging and modeling
approaches reveal metabolic compartmentation in the barley endosperm,
Plant Cell 23 (2011) 3041–3054.
[17] C.G.D. Dal’Molin, L.E. Quek, R.W. Palfreyman, S.M. Brumbley, L.K. Nielsen,
C4GEM, a genome-scale metabolic model to study C4 plant metabolism, Plant
Physiology 154 (2010) 1871–1885.
[18] C.G.D. Dal’Molin, L.E. Quek, R.W. Palfreyman, S.M. Brumbley, L.K. Nielsen,
AraGEM, a genome-scale reconstruction of the primary metabolic network
in Arabidopsis, Plant Physiology 152 (2010) 579–589.
[19] J. Hay, J. Schwender, Computational analysis of storage synthesis in developing Brassica napus L. (oilseed rape) embryos: flux variability analysis in
relation to 13 C metabolic flux analysis, Plant Journal 67 (2011) 513–525.
[20] J. Hay, J. Schwender, Metabolic network reconstruction and flux variability
analysis of storage synthesis in developing oilseed rape (Brassica napus L.)
embryos, Plant Journal 67 (2011) 526–541.
[21] E. Pilalis, A. Chatziioannou, B. Thomasset, F. Kolisis, An in silico compartmentalized metabolic model of Brassica napus enables the systemic study of
regulatory aspects of plant central metabolism, Biotechnology and Bioengineering 108 (2011) 1673–1682.
[22] R. Saha, P.F. Suthers, C.D. Maranas, Zea mays iRS1563: a comprehensive
genome-scale metabolic reconstruction of maize metabolism, PLoS One 6
(2011) 1–12.
[23] E. Ruppin, J.A. Papin, L.F. de Figueiredo, S. Schuster, Metabolic reconstruction,
constraint-based analysis and game theory to probe genome-scale metabolic
networks, Current Opinion in Biotechnology 21 (2010) 502–510.
[24] J.L. Reed, T.D. Vo, C.H. Schilling, B.O. Palsson, An expanded genome-scale
model of Escherichia coli K-12 (iJR904 GSM/GPR), Genome Biology 4 (2003)
R54.
[25] P.D. Karp, et al., Expansion of the BioCyc collection of pathway/genome
databases to 160 genomes, Nucleic Acids Research 33 (2005) 6083–6089.
[26] H. Ogata, et al., KEGG: Kyoto encyclopedia of genes and genomes, Nucleic
Acids Research 27 (1999) 29–34.
[27] R. Overbeek, et al., The subsystems approach to genome annotation and its use
in the project to annotate 1000 genomes, Nucleic Acids Research 33 (2005)
5691–5702.
[28] J. Schellenberger, J.O. Park, T.M. Conrad, B.O. Palsson, BiGG: a Biochemical
Genetic and Genomic knowledgebase of large scale metabolic reconstructions, BMC Bioinformatics 11 (2010) 213.
[29] P. Lamesch, et al., The Arabidopsis Information Resource (TAIR): improved
gene annotation and new tools, Nucleic Acids Research 40 (2012)
D1202–D1210.
[30] P.F. Zhang, et al., MetaCyc and AraCyc. Metabolic pathway databases for plant
research, Plant Physiology 138 (2005) 27–37.
[31] D. Croft, et al., Reactome: a database of reactions, pathways and biological
processes, Nucleic Acids Research 39 (2011) D691–D697.
[32] C.S. Henry, et al., High-throughput generation, optimization and analysis of
genome-scale metabolic models, Nature Biotechnology 28 (2010), 977-U922.
[33] J.L. Heazlewood, R.E. Verboom, J. Tonti-Filippini, I. Small, A.H. Millar, SUBA:
the Arabidopsis subcellular database, Nucleic Acids Research 35 (2007)
D213–D218.
[34] Q. Sun, et al., PPDB, the Plant Proteomics Database at Cornell, Nucleic Acids
Research 37 (2009) D969–D974.
[35] S. Reumann, C.L. Ma, S. Lemke, L. Babujee, AraPerox. A database of putative
Arabidopsis proteins from plant peroxisomes, Plant Physiology 136 (2004)
2587–2608.
[36] B. Boeckmann, et al., The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, Nucleic Acids Research 31 (2003) 365–370.
[37] H.P.J. Bonarius, G. Schmid, J. Tramper, Flux analysis of underdetermined
metabolic networks: the quest for the missing constraints, Trends in Biotechnology 15 (1997) 308–314.
[38] A. Varma, B.O. Palsson, Metabolic flux balancing – basic concepts, scientific
and practical use, Bio-Technology 12 (1994) 994–998.
[39] N.D. Price, J.L. Reed, B.O. Palsson, Genome-scale models of microbial cells:
evaluating the consequences of constraints, Nature Reviews Microbiology 2
(2004) 886–897.
[40] A.M. Feist, B.O. Palsson, The growing scope of applications of genome-scale
metabolic reconstructions using Escherichia coli, Nature Biotechnology 26
(2008) 659–667.
[41] R. Mahadevan, B.O. Palsson, D.R. Lovley, In situ to in silico and back: elucidating
the physiology and ecology of Geobacter spp. using genome-scale modelling,
Nature Reviews Microbiology 9 (2011) 39–50.
[42] C.B. Milne, P.J. Kim, J.A. Eddy, N.D. Price, Accomplishments in genome-scale
in silico modeling for industrial and medical biotechnology, Biotechnology
Journal 4 (2009) 1653–1670.
[43] J.A. Papin, N.D. Price, S.J. Wiback, D.A. Fell, B.O. Palsson, Metabolic pathways in the post-genome era, Trends in Biochemical Sciences 28 (2003)
250–258.
[44] E.T. Papoutsakis, Equations and calculations for fermentations of butyric acid
bacteria, Biotechnology and Bioengineering 26 (1984) 174–187.
[45] C.S. Henry, M.D. Jankowski, L.J. Broadbelt, V. Hatzimanikatis, Genome-scale
thermodynamic analysis of Escherichia coli metabolism, Biophysical Journal
90 (2006) 1453–1461.
[46] M.D. Jankowski, C.S. Henry, L.J. Broadbelt, V. Hatzimanikatis, Group contribution method for thermodynamic analysis of complex metabolic networks,
Biophysical Journal 95 (2008) 1487–1499.
[47] J.L. Reed, B.O. Palsson, Thirteen years of building constraint-based in silico
models of Escherichia coli, Journal of Bacteriology 185 (2003) 2692–2699.
[48] R.S. Senger, E.T. Papoutsakis, Genome-scale model for Clostridium acetobutylicum: Part II. Development of specific proton flux states and numerically
determined sub-systems, Biotechnology and Bioengineering 101 (2008)
1053–1071.
[49] K. Yizhak, T. Benyamini, W. Liebermeister, E. Ruppin, T. Shlomi, Integrating
quantitative proteomics and metabolomics with a genome-scale metabolic
network model, Bioinformatics 26 (2010) i255–i260.
[50] A.M. Feist, B.O. Palsson, The biomass objective function, Current Opinion in
Microbiology 13 (2010) 344–349.
[51] S.A. Becker, et al., Quantitative prediction of cellular metabolism with
constraint-based models: the COBRA Toolbox, Nature Protocols 2 (2007)
727–738.
[52] J. Schellenberger, et al., Quantitative prediction of cellular metabolism with
constraint-based models: the COBRA Toolbox v2.0, Nature Protocols 6 (2011)
1290–1307.
[53] R. Mahadevan, C.H. Schilling, The effects of alternate optimal solutions in
constraint-based genome-scale metabolic models, Metabolic Engineering 5
(2003) 264–276.
[54] K. Smallbone, E. Simeonidis, Flux balance analysis: a geometric perspective,
Journal of Theoretical Biology 258 (2009) 311–315.
[55] E.P. Gianchandani, M.A. Oberhardt, A.P. Burgard, C.D. Maranas, J.A. Papin,
Predicting biological system objectives de novo from internal state measurements, BMC Bioinformatics 9 (2008) 43.
[56] J.S. Edwards, B.O. Palsson, The Escherichia coli MG1655 in silico metabolic
genotype: its definition, characteristics, and capabilities, Proceedings of the
National Academy of Sciences of the United States of America 97 (2000)
5528–5533.
[57] J. Forster, I. Famili, B. Palsson, J. Nielsen, Large-scale evaluation of in silico gene
deletions in Saccharomyces cerevisiae, OMICS 7 (2003) 193–202.
[58] I. Thiele, et al., A community effort towards a knowledge-base and mathematical model of the human pathogen Salmonella Typhimurium LT2, BMC System
Biology 5 (2011) 8.
[59] D. Segre, D. Vitkup, G.M. Church, Analysis of optimality in natural and perturbed metabolic networks, Proceedings of the National Academy of Sciences
of the United States of America 99 (2002) 15112–15117.
[60] T. Shlomi, O. Berkman, E. Ruppin, Regulatory on/off minimization of
metabolic flux changes after genetic perturbations, Proceedings of the
National Academy of Sciences of the United States of America 102 (2005)
7695–7700.
[61] A.P. Burgard, P. Pharkya, C.D. Maranas, OptKnock: a bilevel programming
framework for identifying gene knockout strategies for microbial strain optimization, Biotechnology and Bioengineering 84 (2003) 647–657.
[62] J. Kim, J.L. Reed, OptORF: optimal metabolic and regulatory perturbations
for metabolic engineering of microbial strains, BMC System Biology 4 (2010)
53.
[63] I. Rocha, et al., OptFlux: An open-source software platform for in silico
metabolic engineering, BMC System Biology 4 (2010) 45.
[64] R. Mahadevan, Constraint-based genome-scale models of cellular
metabolism, in: C.D. Smolke (Ed.), The Metabolic Pathway Engineering
Handbook, Taylor & Francis Group, Boca Raton, 2010.
[65] J. Lee, H. Yun, A.M. Feist, B.O. Palsson, S.Y. Lee, Genome-scale reconstruction
and in silico analysis of the Clostridium acetobutylicum ATCC 824 metabolic
network, Applied Microbiology and Biotechnology 80 (2008) 849–862.
[66] P.S. Schnable, et al., The B73 maize genome: complexity, diversity, and
dynamics, Science 326 (2009) 1112–1115.
[67] D. Shintani, D. DellaPenna, Elevating the vitamin E content of plants through
metabolic engineering, Science 282 (1998) 2098–2100.
[68] P.L. Nagy, G.M. McCorkle, H. Zalkin, PurU a source of formate for
PurT-dependent phosphoribosyl-N-formylglycinamide synthesis, Journal of
Bacteriology 175 (1993) 7066–7073.
[69] E. Collakova, et al., Arabidopsis 10-formyl tetrahydrofolate deformylases are
essential for photorespiration, Plant Cell 20 (2008) 1818–1832.
[70] R.J. Bino, et al., Potential of metabolomics as a functional genomics tool, Trends
in Plant Science 9 (2004) 418–425.
[71] E. Pichersky, D.R. Gang, Genetics and biochemistry of secondary metabolites in plants: an evolutionary perspective, Trends in Plant Science 5 (2000)
439–445.
[72] K. Saito, F. Matsuda, Metabolomics for functional genomics, systems biology,
and biotechnology, in: S. Merchant, W.R. Briggs, D. Ort (Eds.), Annual Review
of Plant Biology, 2010, pp. 463–489.
[73] O. Fiehn, Metabolomics – the link between genotypes and phenotypes, Plant
Molecular Biology 48 (2002) 155–171.
[74] A.R. Fernie, Metabolome characterisation in plant system analysis, Functional
Plant Biology 30 (2003) 111–120.
[75] V.S. Kumar, M.S. Dasika, C.D. Maranas, Optimization based automated curation of metabolic reconstructions, BMC Bioinformatics 8 (2007) 212.
[76] G. Jander, V. Joshi, Aspartate-derived amino acid biosynthesis in Arabidopsis
thaliana, in: The Arabidopsis Book, The American Society of Plant Biologists,
2009, pp.c1–15.
[77] U. Gophna, E. Bapteste, W.F. Doolittle, D. Biran, E.Z. Ron, Evolutionary plasticity of methionine biosynthesis, Gene 355 (2005) 48–57.
[78] B.R. Epperly, E.E. Dekker, L-Threonine dehydrogenase from Escherichia coli –
identification of an active-site cysteine residue and metal-ion studies, Journal
of Biological Chemistry 266 (1991) 6086–6092.
[79] A.J. Edgar, Molecular cloning and tissue distribution of mammalian L-threonine 3-dehydrogenases, BMC Biochemistry 3 (2002) 19.
[80] D. Amador-Noguez, et al., Systems-level metabolic flux profiling elucidates
a complete, bifurcated tricarboxylic acid cycle in Clostridium acetobutylicum,
Journal of Bacteriology 192 (2010) 4452–4461.
[81] S.B. Crown, et al., Resolving the TCA cycle and pentose-phosphate pathway of
Clostridium acetobutylicum ATCC 824: isotopomer analysis, in vitro activities
and expression analysis, Biotechnology Journal 6 (2011) 300–305.
[82] R.S. Senger, E.T. Papoutsakis, Genome-scale model for Clostridium acetobutylicum: Part I. Metabolic network resolution and analysis, Biotechnology
and Bioengineering 101 (2008) 1036–1052.
[83] A.M. Feist, et al., A genome-scale metabolic reconstruction for Escherichia coli
K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information,
Molecular Systems Biology 3 (2007) 121.
[84] V.M.E. Andriotis, N.J. Kruger, M.J. Pike, A.M. Smith, Plastidial glycolysis in
developing Arabidopsis embryos, New Phytologist 185 (2010) 649–662.
[85] S. Baud, I.A. Graham, A spatiotemporal analysis of enzymatic activities
associated with carbon metabolism in wild-type and mutant embryos of
Arabidopsis using in situ histochemistry, Plant Journal 46 (2006) 155–169.
[86] L.E. Anderson, J.A. Bryant, A.A. Carol, Both chloroplastic and cytosolic phosphoglycerate kinase isozymes are present in the pea leaf nucleus, Protoplasma
223 (2004) 103–110.
[87] W.C. Plaxton, The organization and regulation of plant glycolysis, Annual
Review of Plant Physiology and Plant Molecular Biology 47 (1996) 185–214.
[88] P. Giege, et al., Enzymes of glycolysis are functionally associated with the
mitochondrion in Arabidopsis cells, Plant Cell 15 (2003) 2140–2151.
[89] L.W. Sumner, et al., Spatially resolved plant metabolomics, in: R.D. Hall (Ed.),
Annu. Plant Rev. Biol. Plant Metabolomics, John Wiley & Sons, Chichester,
West Sussex, UK, 2011, pp. 343–366.
[90] S. Klie, et al., Analysis of the compartmentalized metabolome – a validation of
the non-aqueous fractionation technique, Frontiers in Plant Science 2 (2011)
1–27.
[91] J. Grassl, N.L. Taylor, A.H. Millar, Matrix-assisted laser desorption/ionisation
mass spectrometry imaging and its development for plant protein imaging,
Plant Methods 7 (2011) 21.
[92] T. Harada, et al., Visualization of volatile substances in different organelles
with an atmospheric-pressure mass microscope, Analytical Chemistry 81
(2009) 9153–9157.
[93] N.R. Baker, J. Harbinson, D.M. Kramer, Determining the limitations and regulation of photosynthetic energy transduction in leaves, Plant, Cell and
Environment 30 (2007) 1107–1125.
[94] J. Schwender, J.B. Ohlrogge, Y. Shachar-Hill, A flux model of glycolysis and the
oxidative pentosephosphate pathway in developing Brassica napus embryos,
Journal of Biological Chemistry 278 (2003) 29442–29453.
[95] C. Miyake, Alternative electron flows (water-water cycle and cyclic electron
flow around PSI) in photosynthesis: molecular mechanisms and physiological
functions, Plant Cell Physiology 51 (2010) 1951–1963.
[96] A.L. Meadows, R. Karnik, H. Lam, S. Forestell, B. Snedecor, Application of
dynamic flux balance analysis to an industrial Escherichia coli fermentation,
Metabolic Engineering 12 (2010) 150–160.
[97] J. Kilian, et al., The AtGenExpress global stress expression data set: protocols,
evaluation and model data analysis of UV-B light, drought and cold stress
responses, Plant Journal 50 (2007) 347–363.
[98] Y. Shimada, H. Goda, Y. Yoshida, An introduction to AtGenExpress. How to
use the data sets and what to learn from large scale gene expression analysis,
Plant and Cell Physiology 46 (2005) S141.
[99] T.C.R. Williams, et al., Metabolic network fluxes in heterotrophic Arabidopsis
cells: stability of the flux distribution under different oxygenation conditions,
Plant Physiology 148 (2008) 704–718.
[100] C.J. Baxter, et al., The metabolic response of heterotrophic Arabidopsis cells
to oxidative stress, Plant Physiology 143 (2007) 312–325.
[101] M. Lehmann, et al., The metabolic response of Arabidopsis roots to oxidative
stress is distinct from that of heterotrophic cells in culture and highlights a
complex relationship between the levels of transcripts, metabolites, and flux,
Molecular Plant 2 (2009) 390–406.
[102] M.G. Poolman, ScrumPy: metabolic modelling with Python, IEE Proceedings
– Systems Biology 153 (2006) 375–378.
[103] D.K. Allen, J.B. Ohlrogge, Y. Shachar-Hill, The role of light in soybean seed
filling metabolism, Plant Journal 58 (2009) 220–234.
[104] R.G. Ratcliffe, Y. Shachar-Hill, Measuring multiple fluxes through plant
metabolic networks, Plant Journal 45 (2006) 490–511.
[105] W. Wiechert, M. Mollney, S. Petersen, A.A. de Graaf, A universal framework for C-13 metabolic flux analysis, Metabolic Engineering 3 (2001)
265–283.
[106] W. Wiechert, A.A. de Graaf, Bidirectional reaction steps in metabolic networks:
1. Modeling and simulation of carbon isotope labeling experiments, Biotechnology and Bioengineering 55 (1997) 101–117.
[107] A.P. Alonso, V.L. Dale, Y. Shachar-Hill, Understanding fatty acid synthesis in
developing maize embryos using metabolic flux analysis, Metabolic Engineering 12 (2010) 488–497.
[108] A.P. Alonso, F.D. Goffman, J.B. Ohlrogge, Y. Shachar-Hill, Carbon conversion
efficiency and central metabolic fluxes in developing sunflower (Helianthus
annuus L.) embryos, Plant Journal 52 (2007) 296–308.
[109] A.P. Alonso, et al., A metabolic flux analysis to study the role of sucrose synthase in the regulation of the carbon partitioning in central metabolism in
maize root tips, Metabolic Engineering 9 (2007) 419–432.
[110] A.P. Alonso, D.L. Val, Y. Shachar-Hill, Central metabolic fluxes in the
endosperm of developing maize seeds and their implications for metabolic
engineering, Metabolic Engineering 13 (2011) 96–107.
[111] J. Schwender, F. Goffman, J.B. Ohlrogge, Y. Shachar-Hill, Rubisco without the
Calvin cycle improves the carbon efficiency of developing green seeds, Nature
432 (2004) 779–782.
[112] K. Baerenfaller, et al., Genome-scale proteomics reveals Arabidopsis
thaliana gene models and proteome dynamics, Science 320 (2008)
938–941.
[113] I. Andersson, A. Backlund, Structure and function of Rubisco, Plant Physiology
and Biochemistry 46 (2008) 275–291.
[114] S. Baud, J.P. Boutin, M. Miquel, L. Lepiniec, C. Rochat, An integrated overview
of seed development in Arabidopsis thaliana ecotype WS, Plant Physiology and
Biochemistry 40 (2002) 151–160.
[115] R.J. Weselake, et al., Increasing the flow of carbon into seed oil, Biotechnology
Advances 27 (2009) 866–878.
[116] M.J. Holdsworth, L. Bentsink, W.J.J. Soppe, Molecular networks regulating Arabidopsis seed maturation, after-ripening, dormancy and germination, New
Phytologist 179 (2008) 33–54.
[117] P.D. Brown, J.G. Tokuhisa, M. Reichelt, J. Gershenzon, Variation of glucosinolate accumulation among different organs and developmental stages of
Arabidopsis thaliana, Phytochemistry 62 (2003) 471–481.
[118] S.X. Chen, B.L. Petersen, C.E. Olsen, A. Schulz, B.A. Halkier, Long-distance
phloem transport of glucosinolates in Arabidopsis, Plant Physiology 127
(2001) 194–201.
[119] H.H. Nour-Eldin, B.A. Halkier, Piecing together the transport pathway of
aliphatic glucosinolates, Phytochemistry Reviews 8 (2009) 53–67.
[120] D.K. Allen, I.G.L. Libourel, Y. Shachar-Hill, Metabolic flux analysis in
plants: coping with complexity, Plant, Cell and Environment 32 (2009)
1241–1257.
[121] J.T. van Dongen, et al., Phloem import and storage metabolism are highly coordinated by the low oxygen concentrations within developing wheat seeds,
Plant Physiology 135 (2004) 1809–1821.
[122] H. Rolletschek, W. Weschke, H. Weber, U. Wobus, L. Borisjuk, Energy state and
its control on seed development: starch accumulation is associated with high
ATP and steep oxygen gradients within barley grains, Journal of Experimental
Botany 55 (2004) 1351–1359.
[123] M.M. Paluska, A.K. Dobrenz, R.T. Ramage, Seed size and seedling components
in Arivat barley, Journal of the Arizona-Nevada Academy of Science 14 (1979)
88–90.
[124] R. Schuetz, L. Kuepfer, U. Sauer, Systematic evaluation of objective functions
for predicting intracellular fluxes in Escherichia coli, Molecular Systems Biology 3 (2007) 119.
[125] X.W. Chen, A.P. Alonso, D.K. Allen, J.L. Reed, Y. Shachar-Hill, Synergy between
13C-metabolic flux analysis and flux balance analysis for understanding
metabolic adaption to anaerobiosis in E. coli, Metabolic Engineering 13 (2011)
38–48.
[126] J. Schwender, J.B. Ohlrogge, Probing in vivo metabolism by stable isotope
labeling of storage lipids and proteins in developing Brassica napus embryos,
Plant Physiology 130 (2002) 347–361.
[127] B.H. Junker, J. Lonien, L.E. Heady, A. Rogers, J. Schwender, Parallel determination of enzyme activities and in vivo fluxes in Brassica napus embryos grown on
organic or inorganic nitrogen source, Phytochemistry 68 (2007) 2232–2242.
[128] J. Schwender, Y. Shachar-Hill, J.B. Ohlrogge, Mitochondrial metabolism in
developing embryos of Brassica napus, Journal of Biological Chemistry 281
(2006) 34040–34047.
[129] G.J. Niemann, J.B.M. Pureveen, G.B. Eijkel, H. Poorter, J.J. Boon, Differential
chemical allocation and plant adaptation – A Py-MS study of 24 species differing in relative growth rate, Plant and Soil 175 (1995) 275–289.
[130] R.R. Wise, J.K. Hoober, Synthesis, export and partitioning of end products of
photosynthesis, in: R.R. Wise, J.K. Hoober (Eds.), Structure and Function of
Plastids, Springer, Dordrecht, The Netherlands, 2007, pp. 274–288.
[131] H. Poorter, M. Bergkotte, Chemical composition of 24 wild species differing
in relative growth rate, Plant, Cell and Environment 15 (1992) 221–229.
[132] G.E. Edwards, V.R. Franceschi, E.V. Voznesenskaya, Single-cell C4 photosynthesis versus the dual-cell (Kranz) paradigm, Annual Review of Plant Biology
55 (2004) 173–196.
[133] W. Majeran, Y. Cai, Q. Sun, K.J. van Wijk, Functional differentiation of bundle sheath and mesophyll maize chloroplasts determined by comparative
proteomics, Plant Cell 17 (2005) 3111–3140.
[134] S. von Caemmerer, R.T. Furbank, The C4 pathway: an efficient CO2 pump,
Photosynthesis Research 77 (2003) 191–207.
[135] W.R. Chen, X. Yang, Z.L. He, Y. Feng, F.H. Hu, Differential changes in photosynthetic capacity, 77K chlorophyll fluorescence and chloroplast ultrastructure
between Zn-efficient and Zn-inefficient rice genotypes (Oryza sativa) under
low zinc stress, Physiologia Plantarum 132 (2008) 89–101.
[136] C.H. Foyer, H. Vanacker, L.D. Gomez, J. Harbinson, Regulation of photosynthesis and antioxidant metabolism in maize leaves at optimal and
chilling temperatures: review, Plant Physiology and Biochemistry 40 (2002)
659–668.
[137] S.I. Allakhverdiev, et al., Heat stress: an overview of molecular responses in
photosynthesis, Photosynthesis Research 98 (2008) 541–550.
[138] Z.R. Li, S. Wakao, B.B. Fischer, K.K. Niyogi, Sensing and responding to excess
light, Annual Review of Plant Biology 60 (2009) 239–260.
[139] O.J. Sanchez, C.A. Cardona, Trends in biotechnological production of fuel
ethanol from different feedstocks, Bioresource Technology 99 (2008)
5270–5295.
[140] V. Mechin, et al., In search of a maize ideotype for cell wall enzymatic degradability using histological and biochemical lignin characterization, Journal of
Agricultural and Food Chemistry 53 (2005) 5872–5881.
[141] A. Brautigam, U. Gowik, What can next generation sequencing do for you?
Next generation sequencing as a valuable tool in plant research, Plant Biology
12 (2010) 831–841.
[142] F.M. Canovas, et al., Plant proteome analysis, Proteomics 4 (2004) 285–298.
[143] J.D. Young, A.A. Shastri, G. Stephanopoulos, J.A. Morgan, Mapping photoautotrophic metabolism with isotopically nonstationary 13C flux analysis,
Metabolic Engineering 13 (2011) 656–665.