Plant Science 191–192 (2012) 53–70

Plant Science

journal homepage: www.elsevier.com/locate/plantsci

Review

Are we ready for genome-scale modeling in plants?

Eva Collakova a,∗, Jiun Y. Yen b, Ryan S. Senger b

a Department of Plant Pathology, Physiology, and Weed Science, 308 Latham Hall, Virginia Tech, Blacksburg, VA, USA
b Biological Systems Engineering Department, Virginia Tech, Blacksburg, VA, USA

Article info

Article history: Received 26 January 2012; Received in revised form 17 April 2012; Accepted 18 April 2012; Available online 3 May 2012

Keywords: AraGEM; C4GEM; Flux balance analysis; Genome-scale model; Metabolic network reconstruction; Photorespiration; Photosynthesis; Plant primary and secondary metabolism

Abstract

As it is becoming easier and faster to generate various types of high-throughput data, one would expect that by now we should have a comprehensive systems-level understanding of biology, biochemistry, and physiology, at least in major prokaryotic and eukaryotic model systems. Despite the wealth of available data, we only get a glimpse of what is going on at the molecular level from the global perspective. The major reason is the high level of cellular complexity and our limited ability to identify all (or at least the important) components and their interactions in a virtually infinite number of internal and external conditions. Metabolism can be modeled mathematically by the use of genome-scale models (GEMs). GEMs are in silico metabolic flux models derived from available genome annotation. These models predict the combination of flux values of a defined metabolic network given the influence of internal and external signals. GEMs have been successfully implemented to model bacterial metabolism for over a decade. However, it was not until 2009 that the first GEM for Arabidopsis thaliana cell-suspension cultures was generated.
Genome-scale modeling (“GEMing”) in plants brings new challenges, primarily due to missing components and the complexity of plant cells represented by the existence of: (i) photosynthesis; (ii) compartmentation; (iii) a variety of cell and tissue types; and (iv) diverse metabolic responses to environmental and developmental cues as well as pathogens, insects, and competing weeds. This review presents a critical discussion of the advantages of existing plant GEMs, while identifying key targets for future improvements. Plant GEMs tend to be accurate in predicting qualitative changes in selected aspects of central carbon metabolism, while secondary metabolism is largely neglected, mainly due to missing (unknown) genes and metabolites. As such, these models are suitable for exploring metabolism in plants grown in favorable conditions, but not in field-grown plants that have to cope with environmental changes in complex ecosystems. AraGEM is the first GEM describing a photosynthetic and photorespiring plant cell (Arabidopsis thaliana). We demonstrate the use of AraGEM given the current (limited) knowledge of plant metabolism and reveal the unexpected robustness of AraGEM by a series of in silico simulations. The major focus of these simulations is on the assessment of the: (i) network connectivity; (ii) influence of CO2 and photon uptake rates on cellular growth rates and production of individual biomass components; and (iii) stability of plant central carbon metabolism with internal pH changes.

© 2012 Elsevier Ireland Ltd. Open access under CC BY-NC-ND license.

∗ Corresponding author. Tel.: +1 540 231 7867. E-mail address: [email protected] (E. Collakova).
0168-9452 http://dx.doi.org/10.1016/j.plantsci.2012.04.010

Contents

1. Introduction
1.1. Metabolic networks for GEMing are reconstructed from genome annotation
1.2. Constraining the model allows minimizing the number of possible flux solutions
1.2.1. Stoichiometric constraints (mass balancing) assume no changes of fluxes with time
1.2.2. Constraining the flux solution space minimizes the number of possible flux distribution solutions
1.2.3. Objective functions are a mathematical representation of cellular “goals”
1.3. GEMs can be used to predict metabolic phenotypes in microorganisms
2. Challenges associated with GEMing in plants revolve around the complexity of plant cell metabolism
2.1. Lack of comprehensiveness in plant metabolic networks and missing components
2.2. Compartmentation
2.3. Photosynthesis and photorespiration
2.4. Diversity of plant cell and tissue types and responses to the environment influence the objective functions
3. GEMing in plants
3.1. Arabidopsis thaliana cell-suspension culture GEM
3.2. Seed GEMs in plants
3.2.1. Compartmented barley seed endosperm flux balance analysis models
3.2.2. Compartmented flux balance analysis models of developing Brassica napus embryos
3.3. AraGEM, an Arabidopsis flux balance analysis model representing a C3 plant cell
3.4. C4GEM, maize, sugarcane, and sorghum flux balance analysis models representing C4 plant cells
3.5. Maize GEM iRS1563 representing C4 plants
4. Evaluating AraGEM through simulations
5. How can GEMing in plants be improved?
6. Conclusions
Acknowledgements
References

1. Introduction

Genome-scale models (GEMs) have proven useful in bacteria for: (i) elucidating metabolism and physiology; (ii) assessing the essentiality of the metabolic steps; and (iii) designing rational metabolic engineering strategies [1–5]. The first GEM was generated in 1999 for Haemophilus influenzae using the limited genome annotation and metabolic data available at the time [6].
With the development and improvement of high-throughput sequencing, proteomic, and metabolomic technologies, GEMs were constructed for many organisms including bacteria, cyanobacteria, fungi, animals, and plants [7–13]. The first GEMs generated in plants are as follows: (i) Arabidopsis thaliana cell-suspension culture GEM [14]; (ii) barley seed flux balance models [15,16]; (iii) AraGEM and C4GEM models for Arabidopsis and the monocots maize, sugarcane, and sorghum, respectively [17,18]; (iv) rapeseed models [19–21]; and (v) updated AraGEM and C4GEM models [22]. These models and the challenges associated with genome-scale modeling (“GEMing”) will be reviewed here after a brief introduction to generating GEMs.

1.1. Metabolic networks for GEMing are reconstructed from genome annotation

GEMs are in silico metabolic flux models derived from genome annotation that contain the stoichiometry of all known metabolic reactions of an organism of interest [3,23]. Metabolic flux (total reaction rate) is typically defined as the amount of a metabolite being converted by a particular enzyme per unit of time and per gram (dry cell weight) of cells (or biomass). Each metabolic reaction in every pathway of a reconstructed network is represented by the relevant metabolites (including cofactors) and an enzyme or a transporter encoded by the corresponding gene(s) present in the genome of the organism. These gene-to-protein-to-reaction associations are indispensable in deciding whether a particular reaction takes place in the organism of interest, as well as in validating mutant predictions and implementing the actual metabolic engineering strategies [24]. Fully sequenced genomes, or at least comprehensive high-quality transcriptomic data, are needed to extract the information about the participating enzymes and, subsequently, metabolic reactions and pathways.
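The role of gene-to-protein-to-reaction associations can be illustrated with a minimal sketch. The rule format (nested tuples with "and" for enzyme complexes and "or" for isoenzymes) and all gene and reaction names below are hypothetical and chosen only for illustration; they are not taken from any of the models reviewed here.

```python
# Hypothetical gene-to-protein-to-reaction (GPR) rules as nested tuples.
# ("and", a, b): an enzyme complex that needs every listed gene.
# ("or", a, b): isoenzymes, any one listed gene is sufficient.

def reaction_active(rule, present_genes):
    """Return True if the GPR rule is satisfied by the set of present genes."""
    if isinstance(rule, str):  # a leaf is a single gene identifier
        return rule in present_genes
    operator, *operands = rule
    results = [reaction_active(r, present_genes) for r in operands]
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError(f"unknown operator: {operator}")

# Invented rules for three invented reactions.
gpr_rules = {
    "R_complex": ("and", "geneA", "geneB"),
    "R_isozymes": ("or", "geneC", "geneD"),
    "R_mixed": ("or", ("and", "geneA", "geneB"), "geneE"),
}

def active_reactions(present_genes):
    """List reactions whose GPR rule is satisfied, in sorted order."""
    return sorted(name for name, rule in gpr_rules.items()
                  if reaction_active(rule, present_genes))

wild_type = {"geneA", "geneB", "geneC", "geneD", "geneE"}
knockout_a = wild_type - {"geneA"}
print(active_reactions(wild_type))
print(active_reactions(knockout_a))
```

In this toy case, removing geneA disables the reaction that depends solely on the complex, while the reaction with an alternative isoenzyme stays available. This is the kind of decision such associations support when a reaction is included in, or removed from, a reconstructed network.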
The databases most commonly used to extract the metabolic information for specific organisms include: (i) the Kyoto Encyclopedia of Genes and Genomes (KEGG); (ii) the SEED database; (iii) BioCyc and AraCyc; (iv) Reactome; (v) TAIR; and (vi) the Biochemical Genetic and Genomic knowledgebase (BiGG) [25–31]. It is well established that many inconsistencies exist among these databases, and the resulting network must be manually curated by an expert on the organism of interest. These irregularities include inaccurate gene annotation, inconsistent metabolite names, and incomplete formulas as well as the presence of unbalanced reactions for water, protons, high-energy cofactors, and monomers/polymers. However, several groups in the field are dedicated to building consistency among databases and enabling fully automated genome-scale metabolic network reconstructions [32]. In the case of plant cells, when compartmentation of metabolic pathways in individual organelles is desired, one can consult a number of databases for the localization of the corresponding enzymes and their isoforms and seek transporters enabling metabolite transport between organelles. These databases include: (i) SUBA; (ii) Plant Proteome Database (PPDB); (iii) AraPerox; and (iv) UniProtKB/Swiss-Prot [33–36].

1.2. Constraining the model allows minimizing the number of possible flux solutions

The reconstructed metabolic network and its reaction stoichiometry (stored in a stoichiometric matrix) extracted from databases determine the possible routes of carbon flow and cofactor balancing. This represents the first constraint, as metabolites can only be converted along pathways specific to the studied system. Nevertheless, this model still allows infinitely many combinations of the permitted fluxes and needs to be further constrained to minimize the number of possible flux distribution solutions.
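As a minimal sketch of how reaction stoichiometry is stored in a stoichiometric matrix, the following example builds the matrix for an invented four-reaction network. The reaction and metabolite names are assumptions made for illustration, and only internal metabolites are given rows.

```python
# Toy network (invented for illustration): metabolites A and B are internal.
# Each reaction maps metabolite -> stoichiometric coefficient
# (negative = consumed, positive = produced).
reactions = {
    "uptake":    {"A": 1},           # external substrate -> A
    "convert":   {"A": -1, "B": 1},  # A -> B
    "biomass":   {"B": -1},          # B -> biomass (leaves the system)
    "byproduct": {"A": -1},          # A -> secreted byproduct
}

def build_stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_names, S) with metabolites in rows
    and reactions in columns."""
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    S = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, S

metabolites, names, S = build_stoichiometric_matrix(reactions)
for metabolite, row in zip(metabolites, S):
    print(metabolite, row)
```

Even in this small case the matrix has more columns (reactions) than rows (metabolites), so the network structure alone leaves many flux combinations open.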
GEMs tend to be under-determined, constraint-based flux models yielding multiple solutions to a single problem [37,38]. As with any under-determined system, one has to narrow the large number of possible flux solutions that are allowed by the structure of the metabolic network down to a few likely solutions. This goal can be achieved by using various constraints on flux values within individual steps in metabolic pathways [39]. Examples of these constraints include mass balancing, “physico-thermo-chemical” constraints, and actual flux measurements; together they define a “window” of possible values a flux can have in the studied system, and they are briefly described here. Detailed information on constraint-based modeling and its applications can be found elsewhere [5,10,39–43].

1.2.1. Stoichiometric constraints (mass balancing) assume no changes of fluxes with time

During metabolism, concentrations of individual metabolites change with time $t$, which is represented by Eq. (1), where $X_i$ is the concentration of metabolite $i$ and $S_{ij}$ is a stoichiometric coefficient matrix containing the stoichiometry of compounds $i$ (placed in rows) for each reaction $j$ (placed in columns) in the metabolic network. The parameter $\nu_j$ is a vector containing the flux values through all reactions in the network [3]. The stoichiometric matrix $S$ contains all metabolites and reactions that ultimately are reconstructed from the genome annotation of the studied organism, as described above.

$$\frac{dX_i}{dt} = \sum_j S_{ij} \cdot \nu_j \qquad (1)$$

Because the rate of cell growth is much slower than the rate of any enzymatic reaction, one may assume metabolic pseudo steady-state for very short time scales such that the rate of change of metabolite concentrations $dX_i/dt$ is assumed to be zero [3,24,44]. This is the basis for steady-state mass balancing represented by Eq. (2). This equation, commonly termed the “flux balance equation”, is typically solved by linear programming software.

$$S \cdot \nu = 0 \qquad (2)$$

By applying the pseudo steady-state assumption, the differential equation (Eq. (1)) is replaced with a linear equation (Eq. (2)). This was originally designed for bacteria growing in a chemostat, where the culture was believed to very nearly achieve steady-state metabolism. The usefulness of the modeling strategy has resulted in this approach being applied more broadly. However, for a GEM, there are usually more reactions than metabolites in $S$, and we refer to such a system as under-determined [37,38]. The many flux solutions to an under-determined metabolic network must be narrowed down by introducing additional constraints to reduce the allowable flux solution space into a “phenotypic solution space” [38]. An example of the phenotypic solution space is shown in Fig. 1 for three fluxes as a heptahedron. In theory, this phenotypic solution space contains all possible flux combinations that the cell could ever use. The particular combination of flux values that the cell will use is dependent on the influences of: (i) the cellular objective (e.g., maximizing growth rate); (ii) genetic regulation; and (iii) the environmental and developmental cues. The question we are really asking is: What combination of the fluxes can be expected in a cell that has adapted to the specific conditions (e.g., a nitrogen-starved cell)? In GEMing, knowledge of a “cellular objective” and realistic “constraints” are required to solve this problem by linear programming with Eq. (2). The topic of cellular objective functions will be revisited after introducing other potential methods for constraining the GEM.

1.2.2. Constraining the flux solution space minimizes the number of possible flux distribution solutions

The solution space can be further reduced in size by implementing “physico-thermo-chemical” constraints. The most commonly used of these constraints include thermodynamic and enzyme capacity constraints [45–47].
Implementing thermodynamics requires that each reaction is described by its standard Gibbs free energy of formation, which together with known (or assumed) metabolite concentrations, helps to determine the reversibility constraints for each reaction. Thus, thermodynamically unfavorable reactions are removed by constraining them to zero flux [47]. A number of reactions operate only in a single direction. For example, decarboxylases tend to catalyze only the decarboxylation of their substrates at physiological conditions. In the case of the absence of the corresponding carboxylase in the studied organism, the range of possible flux solutions can be restricted by constraining the minimal decarboxylase flux to zero. The maximal possible flux constraints can be set by using maximum capacity for a specific enzyme ($\nu_{max}$) if known from in vitro or in vivo enzyme activity studies [24]. Maximum flux constraints can also be based on measured substrate uptake rates, which directly relate to the maximal possible biomass formed in a growing cell [24]. More sophisticated ways to constrain the model include proton and cofactor balancing [24,48] and integrating a variety of high-throughput data through integrative metabolic omic analysis [49]. The resulting phenotypic solution space can ideally be further explored to identify a single flux distribution (or a family of fluxes) to describe the relationships among competing metabolic pathways or between metabolism and the cellular environment. Method development for extracting the most useful flux distributions from the phenotypic space remains an important area of systems biology research.

Fig. 1. Simplified GEMing workflow (modified from [47]). Step 1: Compartmented metabolic network is extracted from genome annotation and available metabolic and protein location databases. Metabolites are shown as circles of different colors; external substrates and biomass products are in gray and black, respectively, while the internal metabolites are in white. Enzymes E localized to different compartments carry out the reactions and transporters T enable metabolite uptake and transport. Stoichiometry and mass balancing $S \cdot \nu = 0$ is applied. Step 2: Other constraints (Section 1.2.2) are implemented to minimize the phenotypic allowable solution space (shown as a heptahedron for all possible combinations of three fluxes $\nu_1$, $\nu_2$, $\nu_3$) within the 3D flux vector space. Imagine that this space would be 15-dimensional for the example network (15 fluxes represented by 8 enzymatic steps and 7 transporters) and 1,798-dimensional for the updated AraGEM [22]. Step 3: Specific objective functions (Section 1.2.3) are minimized or maximized to obtain solutions that lie within the phenotypic space (a single solution with flux values of, e.g., 1, 2, and 0.5 mmol g⁻¹ dry weight h⁻¹ for $\nu_1$, $\nu_2$, and $\nu_3$, respectively, is shown as a circle).

1.2.3. Objective functions are a mathematical representation of cellular “goals”

As mentioned above, the phenotypic solution space contains a very large number of possible flux combinations. Each reaction is represented by a range of flux values (from minimum to maximum) that the reaction could ever have in the reconstructed metabolic network. The actual solution (a combination of values for all fluxes in the network, or what modelers like to call “flux distribution”) is dependent on the “goal” of the cell. This goal is commonly described by one or more objective functions when the flux balance equation is solved by linear programming [38]. Anthropomorphically oriented cellular goals are only a mathematical convenience of capturing the influence of the environment or genotype on the flux phenotype for modeling purposes and should be viewed as such in this review.
The set of flux values in a cell that, for example, operates to grow as fast as possible will differ from that in a cell that operates to generate and store chemical energy, and this is reflected in the objective functions.

In the bacterial world, these objective functions typically include: (i) regulating growth (minimizing or usually maximizing biomass production); (ii) increasing the production of desired metabolites (maximizing fluxes through particular pathways); (iii) minimizing production of undesired metabolites (minimizing fluxes through the relevant pathway(s)); and/or (iv) chemical energy production (e.g., ATP or reductant) [1,3,38,50]. The rationale behind maximizing the biomass production is based on the natural selection process. Bacteria growing under favorable growth conditions will grow and divide as fast as possible to out-compete any other potentially present bacterial species and become the dominant species in that environment. On the other hand, many bacteria are known to grow slowly and export antimicrobial agents as a mechanism of competition. For chemostat-based GEMing studies, when competition is not an issue, the most commonly used objective function is maximizing the biomass production. The production of existing (e.g., antibiotics, solvents, etc.) or novel metabolites (e.g., by metabolic engineering) is usually connected with low growth rates. In that case, the cellular goal becomes a bi-level optimization, maximizing the growth rate as well as the production of a target metabolite. Some highly reduced metabolites such as fatty acids found in oil and membranes require significant chemical energy in the form of ATP and reductant (e.g., NADPH) for their synthesis.
If the goal is to produce high levels of these metabolites, then maximizing the fluxes through the pathways that are known to produce these high-energy cofactors will also need to be considered as another objective function. It is apparent that some objective functions need to be maximized, while others need to be minimized to obtain a desired metabolic phenotype. Regardless, just as protein folding is a thermodynamically driven process, distribution of metabolic flux throughout a network is also thermodynamically driven. Researchers in systems biology are still looking for these connections. Until they are found, the use of multiple objective functions to describe cellular function is the best approach to study multiple behaviors in the cell.

From the mathematical perspective, flux balance analysis is the most popular method to explore the phenotypic solution space [1,3,38]. This approach uses linear programming (with Eq. (2)) to minimize or maximize the objective function(s) within the phenotypic solution space to find the flux solution(s). Because the objective function is either minimized or maximized, it represents extremes at either end of the average (possibly physiologically relevant) objective function. As a result, the solution usually lies at the edges of the phenotypic solution space (Step 3 in Fig. 1). There is much debate about how well this represents an in vivo scenario. The use of multiple objective functions is needed to model, for instance, stressful or nutrient-limited conditions that are not represented by extreme objective functions, which currently represents a major challenge of GEMing. Much research is also in progress to build genetic regulation into this approach. Modeling (including linear programming) is typically done in MATLAB using the COBRA toolbox, which has been developed as a shared resource with numerous useful tools for GEMing [51,52].
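The interplay of steady state, flux bounds, and objective function can be sketched on a toy network. The example below is not a linear programming solver and does not use the COBRA toolbox: it replaces the solver with exhaustive enumeration over integer flux values, which is only workable for a tiny invented network. The network, the bounds, and the units are all assumptions made for illustration.

```python
from itertools import product

# Invented toy network: uptake -> A; A -> B (convert); B -> biomass; A -> byproduct.
names = ["uptake", "convert", "biomass", "byproduct"]
S = [
    [1, -1, 0, -1],  # metabolite A
    [0, 1, -1, 0],   # metabolite B
]
# Assumed flux bounds (arbitrary units): uptake limited to 10, convert capped at 6.
bounds = {"uptake": (0, 10), "convert": (0, 6),
          "biomass": (0, 10), "byproduct": (0, 10)}

def is_steady_state(fluxes):
    """Check that S . v = 0 holds for every internal metabolite."""
    return all(sum(c * v for c, v in zip(row, fluxes)) == 0 for row in S)

def feasible_fluxes():
    """Enumerate integer flux vectors within bounds that satisfy steady state."""
    ranges = [range(bounds[n][0], bounds[n][1] + 1) for n in names]
    return [v for v in product(*ranges) if is_steady_state(v)]

def optimize(objective):
    """Return every feasible flux vector that maximizes the given objective."""
    candidates = feasible_fluxes()
    best = max(objective(v) for v in candidates)
    return [dict(zip(names, v)) for v in candidates if objective(v) == best]

max_biomass = optimize(lambda v: v[names.index("biomass")])
max_byproduct = optimize(lambda v: v[names.index("byproduct")])
print(len(max_biomass), "optimal flux vectors for biomass")
print(max_byproduct)
```

In this sketch, maximizing biomass pushes the conversion step to its capacity limit and yields several equally optimal flux vectors that differ only in byproduct secretion, whereas maximizing byproduct secretion yields a single flux vector. Changing the objective therefore selects a different region of the same feasible space.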
As discussed above, due to the large phenotypic solution space, there is usually more than one solution, but linear programming and the majority of flux balance analysis algorithms commonly return only a single flux distribution [2]. From all possible solutions, the one that shows the smallest value for the sum of all fluxes will typically be chosen to represent the specific objective function(s) [53]. This choice is based on the assumption that cells will conserve resources when it comes to nutrients and energy and will, therefore, likely minimize the total carbon flux through the metabolic network [54]. Much research has been performed to identify suitable objective functions in microbes. One such example is the Biological Objective Solution Search algorithm [55], which relies on experimental data to locate a suitable objective for the network. With a description of GEMs and their representation by flux values established in these sections, the next sections describe the applications of GEMs.

1.3. GEMs can be used to predict metabolic phenotypes in microorganisms

GEMs provide information about the distribution of fluxes in the genome-scale metabolic network for: (i) a specific genotype; (ii) a developmental stage governed by regulatory factors; and (iii) environmental conditions. These are built into the GEM through the stoichiometric matrix, constraints, and chosen objective functions. Historically, GEMs have been said to provide the connection between the genotype and/or environmental influences and resulting metabolic phenotypes, represented by fluxes. Generating genetic mutants in silico along with implementing diverse objective functions through flux balance analysis and comparing the resulting GEMs offer an insight into the changes of fluxes within the metabolic network [42].
This type of in silico enzymatic reaction (‘gene’) knockout analysis is also useful in identifying essential metabolic reactions and genes by constraining the flux through the candidate reaction to zero to address the question of whether all biomass components are still capable of being produced by the metabolic network [56–58]. The reaction (and corresponding gene(s) encoding enzyme(s) that catalyze it) will be considered essential if other reactions of the network cannot synthesize the components of biomass. The wild type and knockout mutants can then be compared in effort to recognize metabolic re-programming that occurs in response to genetic or environmental perturbations. Of course, correct gene-to-protein-to-reaction associations [24] are needed for validation purposes, specifically to genetically modify expression of relevant genes targeted for mutant analysis and metabolic engineering. Several more appropriate optimization methods in addition to flux balance analysis have been used to predict the metabolic phenotype in perturbed cells. Two most commonly used methods include the “minimization of metabolic adjustment” [59] and the “regulatory on/off minimization” [60] algorithms. Minimization of metabolic adjustment uses linear (as flux balance analysis) or quadratic optimization and the assumption that the affected cell will prefer to minimize the effect of the genetic or environmental perturbation. This is reflected as the smallest possible change in the total flux between the wild type and perturbed cells [59]. The regulatory on/off minimization algorithm is similar to minimization of metabolic adjustment except that instead of the minimization of the total flux value, the total number of altered fluxes is minimized [60]. The rationale behind regulatory on/off minimization is based on the assumption that the cell will dump the extra carbon to short as opposed to long alternate pathways as a metabolic adaptation to the specific perturbation. 
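The difference between the two criteria can be sketched by ranking a short list of precomputed candidate flux vectors. This is a simplification: the published algorithms search the whole feasible space by quadratic or mixed-integer programming rather than comparing a few candidates. All flux values below are invented, and the tolerance used to call a flux "altered" is an assumption.

```python
# Wild-type flux distribution and two candidate mutant distributions
# (all numbers invented for illustration).
wild_type = [10.0, 6.0, 6.0, 4.0, 0.0]
candidates = {
    "small_shifts_everywhere": [9.0, 5.0, 5.0, 3.0, 1.0],
    "one_large_reroute":       [10.0, 6.0, 6.0, 0.0, 4.0],
}

def moma_distance(reference, fluxes):
    """Sum of squared flux differences (a Euclidean-type distance criterion)."""
    return sum((r - f) ** 2 for r, f in zip(reference, fluxes))

def room_count(reference, fluxes, tolerance=0.5):
    """Number of fluxes that changed by more than an assumed tolerance."""
    return sum(1 for r, f in zip(reference, fluxes) if abs(r - f) > tolerance)

# Pick the candidate that each criterion considers closest to the wild type.
moma_choice = min(candidates, key=lambda k: moma_distance(wild_type, candidates[k]))
room_choice = min(candidates, key=lambda k: room_count(wild_type, candidates[k]))
print("distance criterion prefers:", moma_choice)
print("count criterion prefers:", room_choice)
```

With these invented numbers, the distance-based criterion prefers many small adjustments, while the count-based criterion prefers a few large ones, which is why the two approaches can predict different flux distributions for the same perturbation.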
Minimization of metabolic adjustment and regulatory on/off minimization are based on attractive assumptions that coincide with the idea of conservation of cellular resources committed to metabolism, while achieving the cellular objective. Minimization of metabolic adjustment tends to be more accurate for an initial mutant (before strain evolution), while flux balance analysis and regulatory on/off minimization were shown to be suitable for an adapted mutant [60].

GEMs are invaluable tools in rational metabolic engineering in microorganisms. So how can GEMs be used to predict genetic modifications to obtain a desired metabolic phenotype (e.g., to produce large amounts of a highly valuable metabolite) in a systematic way without knowing which genes to knock out? The OptKnock, OptFlux, and OptORF approaches (and their improved versions) are three commonly used algorithms that perform systematic gene knockout simulations with GEMs, given specific conditions, to enhance production of specific metabolites [39,61–63]. These methods were useful in increasing the production of lactate in Escherichia coli K-12, succinate in Mannheimia succiniciproducens, and ethanol in Saccharomyces cerevisiae [64]. Efforts have also been made to model the production of valuable chemicals during the stationary phase of culture growth by instituting additional objective functions and minimizing the metabolic adjustment between the exponential and stationary growth phases [65].

2. Challenges associated with GEMing in plants revolve around the complexity of plant cell metabolism

2.1. Lack of comprehensiveness in plant metabolic networks and missing components

Ideally, a GEM should contain all possible metabolites and enzymes organized in interconnected pathways that can exist in the given organism, based on its genome sequence.
This can be a challenge especially for complex eukaryotic organisms, such as plants, when the function of most proteins remains unknown. Try to imagine that the task is to put together a functional car, for which one has all parts, but limited knowledge about the function of individual parts and how they assemble and function together. Those with limited knowledge may still have a pretty good idea about the “engine, fuel injection, and transmission systems” of the plant cell represented by the central carbon metabolism. But, this knowledge is probably not so great regarding the obscure parts representing secondary metabolism. In general, we could assemble a working vehicle, but we would not know if we had a sedan, a truck, or a school bus. The E. coli K-12 genome contains 4,290 predicted protein-coding open reading frames, of which 1,366 (32%) have been included in a comprehensive genome-scale reconstruction metabolic network iJO1366 [12]. Of the 1,366 metabolism-related genes, only 38 were inferred through bioinformatics. Conversely, Arabidopsis is predicted to have 28,775 genes as of September 11, 2011 [29]. Molecular function was experimentally determined for only 13% of all predicted genes, 44% of gene functions were computationally predicted, 30% are of unknown function, and 12% are not annotated [29]. The maize genome is predicted to contain 32,540 genes and 53,764 transcripts [66]. For about 42% of the total maize genes there is very little or no functional annotation information available [22]. The high percentage of plant genes with function inferred only through bioinformatics has raised concerns about mis-annotation. The problem with mis-annotation and the genes with unknown function relates mostly to plant secondary metabolism and transporters. Here, many enzymes and the intermediates of specialized metabolic pathways as well as transporters remain uncharacterized regarding their substrate specificities. 
For example, methyltransferases involved in tocopherol biosynthesis were originally mis-annotated as sterol methyltransferases until their true function was confirmed by a combination of reverse genetics and biochemical characterization [67]. In some cases, homologues in different organisms have different functions. In bacteria, PurU (N10-formyl tetrahydrofolate deformylase) provides formyl groups for purine biosynthesis [68], but two Arabidopsis PurU homologues are involved in photorespiration and do not seem to have anything to do with purine biosynthesis [69].

The situation gets even more complicated with metabolites. Collectively, plants produce over 200,000 primary and secondary metabolites, but structures of only about 10–20% of these metabolites are known [70,71]. The collection of all metabolites in a cell is referred to as the metabolome [72–74]. Obviously, only some pathways are active under specific conditions at any given time, resulting in the production of a portion of all possible metabolites in specific cells/tissues, which complicates efforts to capture the plant metabolome. As such, the metabolic networks extracted from 57 fully sequenced plant genomes and metabolic network databases will have many gaps and errors. Automated gap-filling algorithms such as GapFind and GapFill can be implemented [75]. However, these must be used with caution, and often they are not. There are instances when a particular step or a pathway is present in one organism but not in another, or when completely different pathways in different organisms can produce the same metabolite. For example, plants use homoserine kinase that provides O-phosphohomoserine for the biosynthesis of both threonine and methionine [76]. In bacteria, threonine is made from homoserine the same way as in plants, but methionine is made using pathways involving succinylated or acetylated as opposed to the phosphorylated forms of homoserine [77].
Threonine catabolism involves threonine dehydrogenase in microorganisms and animals [78,79]. However, the activity and roles of putative plant threonine dehydrogenases in threonine degradation remain to be demonstrated experimentally [76]. This problem relates to differences between prokaryotic and eukaryotic organisms and also to organisms in the same taxa. Unlike other bacteria and plants, Clostridium acetobutylicum is a fermentative microorganism that has an incomplete TCA cycle and does not use the oxidative pentose phosphate pathway [80,81]. However, both initial drafts of the flux balance analysis model for this organism had different versions of the TCA cycle that also differed from what was eventually found through 13C-based steady-state metabolic flux analysis [48,65,81,82]. Metabolic networks that are extracted from databases using bioinformatics tools and then used for GEMing should still be curated by an expert and supplemented with biochemical data where necessary.

2.2. Compartmentation

Central carbon metabolism in bacteria occurs inside the cytosol, so many bacterial GEMs include only ‘extracellular’ and ‘intracellular’ compartments. However, microbial GEMs that consider additional compartments, such as the ‘periplasm’ of Gram-negative bacteria, do exist [12,58,83]. Minimizing the number of compartments is preferred from the mathematical modeling perspective. In modeling terms, the extracellular metabolites include substrates and products of metabolism as well as some transported cofactors. The intracellular compounds represent the intermediates of metabolism. The substrates can originate from the extracellular compartment in abundance and are provided to the cell through transporters often constrained to a desired flux. In the case of multicellular plants, the substrates are produced elsewhere (e.g., photosynthetic assimilate or CO2 from the environment) and supplied to the sink cells.
The products must either accumulate and constitute cellular biomass or be excreted to the extracellular medium. Internal metabolites are associated with at least one reaction that produces them and at least one reaction that consumes them for those reactions to receive flux. All metabolites transported between the extracellular and intracellular compartments must be accompanied by specific transport reactions that supply or remove metabolites from the extracellular environment. Identifying relevant transporters and determining their substrate specificities remains one of the major challenges in GEMing.

Plant cells have several compartments defined by organelles that separate different metabolic processes and provide variable conditions for enzymes. Many plant pathways of central carbon metabolism are duplicated, which is represented by the presence of differentially localized isoforms of the same enzymatic activities in proteomes from individual cellular compartments. In many different plant cell types, glycolytic enzymes are present in both cytosol and chloroplasts [84–87]. The existence of different compartments and enzyme isoforms in plant cells complicates GEMing in terms of metabolic network reconstruction and the actual modeling. First, plant metabolic networks are more complex and fluxes through duplicated pathways (and this is true for any set of parallel pathways, whether duplicated or alternate) can be dissected only when there are differences in the production or consumption of chemical energy between the parallel pathways. Second, deciding the specific compartment for the particular isoform is cumbersome as there is only bioinformatic localization prediction available for many proteins.
Ideally, direct experimental evidence (e.g., protein import and/or co-localization studies) should be combined with proteomic studies for the specific cell/tissue/organ type and (if possible) at relevant developmental stages and conditions. With the proteomic studies, caution is advised, as many cytosolic proteins tend to be associated with individual organelles. Many glycolytic enzymes in the cytosol are bound to the outer membranes of mitochondria and chloroplasts, probably enabling metabolite channeling and transport [88], and could be misinterpreted as being mitochondrial or plastidic. Third, the cytosol and different organelles provide different conditions for metabolism in terms of: (i) pH; (ii) salt concentrations; and (iii) energy/redox status. Not considering the differences in environments in diverse compartments also has serious implications when thermodynamic constraints are implemented to narrow down possible flux distribution solutions. Possible reversibility of a reaction depends on thermodynamics and the concentration of participating metabolites and cofactors. Current methods of isolating organelles for metabolomic purposes are not reliable [89], though non-aqueous fractionation can potentially be useful [90]. The major problem of non-aqueous fractionation is associated with the inability to separate metabolites present in different organelles to discrete regions, but this problem can be overcome by implementing marker molecules for different organelles in combination with novel computational approaches [90]. Other novel approaches using laser-capture microdissection and pressure catapulting coupled with high spatial resolution imaging mass spectrometry (referred to as the “mass microscope”) show a lot of promise for measuring levels of major metabolites in specific cell types and even large organelles [91,92]. However, these approaches still need significant improvements.
The resolution (10 µm) of this “mass microscope” is too low to be useful for distinguishing metabolites in cytosol, mitochondria, and plastids. Sensitivity is also low, as only major metabolites present at the levels of about 1% of dry weight can be detected [92]. Metabolite charge depends on its pKa and pH of the environment and it influences enzyme activity. Considering pH changes in different compartments is useful for proton balancing. Thermodynamic balancing constraints have not yet been applied to plant GEMs and only one has included proton balancing [22]. Currently, research is underway to allow GEMing to aid in the quantification of metabolite fluxes between different tissues. These obstacles should be addressed as additional models are created.

2.3. Photosynthesis and photorespiration

The existence of photosynthesis and photorespiration contributes to the complexity of plant metabolic networks. Photosynthetic rates can be measured under specific conditions [93]. These measurements can then be used to constrain the flux model or used to validate GEM predictions. However, in some recalcitrant model systems, including developing seeds, photosynthesis cannot be easily measured, and the validation of flux models is not easily achieved in these systems [94]. When modeling photosynthetic cells, the light reactions produce chemical energy in the form of NADPH and ATP from splitting water, while the dark reactions of the Calvin–Benson–Bassham cycle consume these high-energy cofactors to ultimately produce sucrose. The linear electron transport chain produces both NADPH and ATP, while the dark reactions require more ATP than NADPH. To balance the NADPH and ATP production, ferredoxin will transfer electrons back to photosystem I (through the cytochrome bf complex and plastocyanin) rather than to NADP reductase in the alternate cyclic electron transport and only ATP is produced [95]. Both cyclic and linear electron transport chains must be included in the model and the relative fluxes through these routes depend on ATP and NADPH demand by the dark reactions and other pathways. RubisCO can also fix O2 in C3 plants and photorespiration should be included. Photorespiration is minimized or excluded in C4 plants and the relevant C4 cycle involving phosphoenolpyruvate carboxylase in the mesophyll cells is included in the model. Obviously, photosynthetic fluxes must be constrained to zero in non-photosynthetic plant cells.

Significant modeling challenges exist involving the changes in central carbon metabolism of photosynthetic organisms during the diurnal and circadian rhythms. Many models have aimed to predict light-dependent photosynthetic metabolism, but in all cases biomass composition was used for plants grown in standard light/dark regimes. As such, these GEMs may represent an average of mixed metabolism (represented by average flux values) over a long period of time during biomass accumulation. A GEM provides a combination of static flux values that would explain the resulting biomass composition, given all implemented constraints and objective functions. The clashing of two different types of metabolic activities that are then averaged leads to a failure to adhere to the metabolic steady-state assumption needed for flux balance analysis. Different pathways will be active under light compared to dark conditions. Under light, photosystems will be active, generating chemical energy used for CO2 fixation in the Calvin–Benson–Bassham cycle. Sucrose and starch are made. In the dark, the light reactions will no longer be active, changing the redox status of the cell. The chemical energy for the Calvin–Benson–Bassham cycle has to come from other sources, e.g., respiration, in the dark. Dynamic flux balance analysis or more sophisticated types of kinetic modeling provide a way to model changes in fluxes [53,96]. Biomass and internal metabolites can be measured at short intervals during a time course involving diurnal or circadian metabolite changes for dynamic flux balance analysis. Metabolite levels and fluxes are assumed not to change dramatically over a short period of time of the measurements, conforming to the steady-state assumption. Subsequently, the measurements for each time point are used to model fluxes individually by flux balance analysis to obtain a view of dynamic change in fluxes in normally non-stationary model systems. Additional GEMing tools that will aid this problem are new unpublished methods that remove the dependency on the biomass composition. These approaches remain in their infancy and are not discussed further in this review.

2.4. Diversity of plant cell and tissue types and responses to the environment influence the objective functions

Plants are sessile organisms that have evolved numerous sophisticated mechanisms in response to environmental changes. These mechanisms involve: (i) sensing specific types of biotic and abiotic stimuli; (ii) transducing relevant signals; and (iii) turning on/off genes involved in the actual response to the changes in the environment. These responses include various types of metabolic and physiological adaptation processes, from the molecular to the whole plant and even ecosystem level. At the same time, the plant follows its own developmental program that is also tightly connected to changes in physiology and metabolism in specific organs and cell and tissue types. From the GEMing perspective, this translates into an enormous complexity of possible metabolic networks and flux combinations in these networks, as the responses to developmental and environmental changes are associated with global metabolic reprogramming. Available high-throughput transcriptomic, proteomic, and metabolomic data are needed to refine the network after reconstructing it from genome annotation to ensure that all enzymes and metabolites relevant to the model system (specific cell types under defined conditions) are present.

Such complexity leads to an inherent problem associated with GEMing, the selection of appropriate objective functions. Minimization of metabolic adjustment and regulatory on/off minimization are algorithms designed to minimize total flux and number of fluxes, respectively, and eliminate redundant parallel pathways. Plant transcriptomes and proteomes often reveal that the enzymes of the redundant pathways are still present and probing metabolism with 13C-labeled substrates demonstrates that there is indeed carbon flowing through these pathways [33,97–99]. So, what is the cause for this discrepancy? Does a cell not try to conserve resources by minimizing fluxes? It probably does, but GEMing involves a very simple and limited number of major objective functions. Other possible cellular goals are not considered and many pathways will not be ‘needed’ in the model (but needed in reality) when these goals are not included. Bacteria are much simpler in that sense, though they also respond to environmental changes. However, from the metabolic engineering perspective, bacteria are grown under optimized controlled growth conditions that maximize the production of the target metabolite. What about plants? Given the existing toolsets, it is conceivable that GEMs can be used to predict phenotypes and design metabolic engineering strategies for specific cell type and growth conditions. However, there is no guarantee that these specific conditions will be met in the field. In fact, plants growing in the field will simultaneously experience numerous abiotic and biotic stresses and will adapt accordingly.
The cellular objectives become much more complicated than maximizing growth. Constantly changing weather and plant–plant, –pathogen, and –herbivore interactions will cause up-regulation of various metabolic and physiological adaptations and defense mechanisms in field-grown plants. Significant cellular resources in terms of carbon and energy sources are required for these mechanisms to operate. As such, substantial flux will be redirected from the central carbon metabolism intended for growth to the pathways of secondary metabolism for the production of defense metabolites. Limited information on secondary metabolism in a plant GEM will diminish its predictive capabilities when plant–environment and plant–pathogen interactions are in play. Clearly, this is an area that will benefit from incorporation of advanced experimental research into systems biology.

3. GEMing in plants

It is clear that GEMing in plants is extremely challenging, but as: (i) the high-throughput technologies are improved; (ii) novel ones emerge; and (iii) the missing parts of metabolic networks are identified, it will be possible to gain a systems-level understanding of metabolic processes. Currently, there are several GEMs developed for plants. These models adequately describe primarily photosynthetic or non-photosynthetic central carbon metabolism (Table 1). While secondary metabolism is included in these models, it must be largely neglected in flux calculations due to unavailability of complete metabolomic and proteomic data. However, all these models were able to correctly predict certain aspects of central carbon metabolism in specific cell types. This leads to the question of how much information about the metabolic network is really needed to be able to predict specific phenotypes.

3.1. Arabidopsis thaliana cell-suspension culture GEM

Cell-suspension cultures are a useful tool in plant research.
Though not photosynthetic, they have been used as an approximation of a heterotrophic plant cell (e.g., root cells) with the assumption that at least some aspects of central carbon metabolism will be similar between cell cultures and diverse specialized root cell types [100,101]. Clearly, the diversity of non-photosynthetic plant cells cannot easily be captured with simple cell culture models. Such models should be viewed as a stepping stone for more appropriate models using specific/relevant constraints and objective functions applicable to plant metabolism in specialized cells. They should also be viewed more as models demonstrating the application of modeling approaches to plant cells rather than providing meaningful metabolic and mutant predictions. The modeling approaches used to generate the first plant cell-culture model [14] were similar to those in bacteria. As with bacteria, using cultures of homogeneous plant cells clearly offers advantages when it comes to simplifications of the models and the ability to use a number of assumptions that greatly simplify the modeling process. Not dealing with: (i) diverse cell types; (ii) photosynthesis; and (iii) the need for compartments, as is the case for bacterial cells, provides an obvious advantage, but at the same time, this diminishes the relevance and the predictive power of the analogous plant models.

The Arabidopsis cell-culture GEM is a minimization-of-metabolic-adjustment-like model based on linear programming with the minimization of the total flux as an objective function [14]. Organellar separation of metabolic pathways is not included in this model. The metabolic network for this GEM was extracted from AraCyc [30] using ScrumPy [102]. The network was carefully curated to the final version of 855 reactions that were mass balanced for major atoms (each reaction has the same number of each atom on both sides). Unbalanced reactions in the model would upset the fundamental laws of mass and energy conservation.
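The atom-balancing requirement described above can be illustrated with a minimal sketch. This is not the curation procedure or the ScrumPy interface used in [14]; the function names, the toy reactions, and the restriction to simple formulas without brackets or charges are all assumptions made for illustration.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets, no charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def atom_imbalance(reaction, formulas):
    """Return {element: net atom count} for every element that does not balance.

    `reaction` maps metabolite name -> integer stoichiometric coefficient
    (negative for substrates, positive for products). An empty result
    means the reaction is balanced for all atoms.
    """
    net = Counter()
    for metabolite, coefficient in reaction.items():
        for element, n in parse_formula(formulas[metabolite]).items():
            net[element] += coefficient * n
    return {element: n for element, n in net.items() if n != 0}


# Hypothetical toy data for illustration only.
FORMULAS = {"glucose": "C6H12O6", "O2": "O2", "CO2": "CO2", "H2O": "H2O"}
RESPIRATION = {"glucose": -1, "O2": -6, "CO2": 6, "H2O": 6}
MISSING_WATER = {"glucose": -1, "O2": -6, "CO2": 6}  # water left out on purpose

print(atom_imbalance(RESPIRATION, FORMULAS))
print(atom_imbalance(MISSING_WATER, FORMULAS))
```

A reaction that returns an empty dictionary is balanced; any other result lists the elements that are gained or lost, which is the kind of error that would have to be corrected before the network is used for flux calculations. Proton and charge balancing would need an extended formula parser and are not covered by this sketch.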
AraCyc is a useful tool for most applications, and improvements are anticipated for its use with GEMing. However, it contains errors in formulas as well as orphan reactions and metabolites, which prevents accurate atom balancing of the model; nearly 700 AraCyc reactions were unbalanced for protons, of which over 370 reactions also did not have the correct oxygen balancing [14]. Constraints included the stoichiometric mass balancing and the physiologically relevant biomass composition. The latter was based on experimental measurements of individual major biomass components in cells growing on glucose as the sole carbon source. Soluble metabolites, salts, and CO2 were not accounted for in the total biomass. While the soluble metabolites are difficult to quantify (and individually they do not contribute much to the final biomass), CO2 lost during metabolism contributes significantly to the biomass especially in heavily respiring heterotrophic cells. The total net flux of CO2 production can be easily measured [103] and used as an additional constraint. Nevertheless, this setup is clever because, rather than maximizing the biomass production as in most bacterial studies, specific biomass production with the correct composition [50] is defined based on the typical growth of real plant cell cultures under standard growth conditions. Specific biomass information including composition is mathematically expressed in a so-called “biomass constituting equation”, representing a sum of the amounts of all significant individual components constituting the biomass. The goal of this study was to interrogate the phenotypic solution space. In other words, the authors sought to ascertain which fluxes change, and by how much, in cells existing at various energy states (cells with different ATP availabilities) driven by growth conditions.
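The setup described above can be summarized as a small optimization problem. The following is a sketch in illustrative notation, not the formulation printed in [14]: the stoichiometric matrix S, the flux vector v, the flux through the biomass constituting equation v_bm with its measured value b, the ATP-consuming drain v_ATP with imposed demand a, and the bounds l_j and u_j are all symbols assumed here for illustration.

```latex
\begin{aligned}
\min_{v}\quad & \sum_{j} \lvert v_j \rvert \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\mathrm{bm}} = b, \\
& v_{\mathrm{ATP}} = a, \\
& l_j \le v_j \le u_j \quad \text{for all } j .
\end{aligned}
```

Splitting each reversible flux into non-negative forward and reverse parts, v_j = v_j^+ - v_j^-, replaces the absolute values with a linear objective, so the problem can be solved by linear programming. Under these assumptions, repeating the optimization over a range of values of a would correspond to the systematic alteration of ATP demand used to explore the solution space.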
It is interesting to note that only 232 reactions were active and only 42 reactions responded to systematic alterations in ATP demands [14]. As expected, the responsive reactions were all related to the production or consumption of chemical energy. Minimization of the total flux coincides with the objective of minimizing the number of active routes (fluxes), which could explain the low number of reactions carrying fluxes [14]. The objective function used here prevented the use of most reactions (they carried zero flux). In reality, there is much metabolic redundancy in plant central carbon metabolism and plant cells simultaneously use these redundant and parallel pathways. Carbon can easily be redirected from one pathway to the other if the main route is blocked, resulting in the same amount of cell biomass and composition.

Table 1
Summary of plant flux balance analysis models. All models presented here used biomass composition measurements as constraints and flux balance analysis except for the bna572, which is a flux variability analysis model* [19,20]. Photosynthetic models assume that the ratio of RubisCO CO2 fixation and oxygenation is 3:1. Most of these models used physiologically irrelevant, extreme objective functions because the real ones are not known. With the exception of the updated AraGEM and C4GEM [22], secondary metabolism is not included. ** denotes corresponding number of reactions and metabolites in the updated AraGEM model.

Model: Arabidopsis – heterotrophic cell suspension culture [14]
Reactions: 1,406
Metabolites: 1,253
Objective functions:
• Minimize total flux
Biological question: Assessing changes in metabolism at different energy states in a heterotrophic cell
Comments:
• Objective function prevented many redundant reactions from being used
• No compartments

Model: Barley – photoheterotrophic and heterotrophic cereal endosperm [15,16]
Reactions: 257
Metabolites: 234
Objective functions:
• Maximize growth (linear optimization)
• Minimize total flux (quadratic optimization)
Biological question: Estimating the influence of hypoxia and aerobic conditions on metabolic fluxes
Comments:
• Small compartmented tissue-specific model (65 transporters)
• 2nd model – improved with the actual differential O2 and metabolite measurements within the endosperm

Model: Rape – photoheterotrophic metabolism in developing oilseed [21]
Reactions: 313
Metabolites: 262
Objective functions:
• Actual substrate uptake and biomass production rates based on high-quality measurements from the literature
Biological question: Predicting metabolic regulation and mutant phenotypes related to oil accumulation
Comments:
• Flux balance analysis used in a combination with principal component analysis to minimize the number of solutions (objective functions were physiologically relevant)
• Regulation of oil biosynthesis predicted

Model: Rape – bna572, photoheterotrophic metabolism in developing oilseed* [19,20]
Reactions: 572
Metabolites: 376
Objective functions:
• Flux balance: minimize substrate uptake rates (photons and carbon and nitrogen sources)
• Flux variability: fixed objective function, then minimize and maximize each reaction
Biological question: Exploring phenotypic solution space with flux variability analysis in search for alternate active pathways in developing rapeseed embryos
Comments:
• Highly comprehensive and compartmented oilseed model
• Flux variability along with in silico mutant analysis – exploration of phenotypic solution space obtained from flux balance analysis and the use of alternate pathways in relation to carbon use efficiency
• Some fluxes in the bna572 model were comparable to the fluxes in the corresponding 13C metabolic flux analysis model
• Flux balance and variability analyses – unable to select less efficient alternate pathways that are known to be active in vivo

Model: Arabidopsis – AraGEM (iRS1597, an updated AraGEM**) – C3 photosynthesis, photorespiration, and heterotrophic metabolism in mesophyll cells [18,22]
Reactions: 1,567 (1,798**)
Metabolites: 1,748 (1,820**)
Objective functions:
• Minimize substrate uptake rates (photons for C3 and sucrose for heterotrophic metabolism)
Biological question: Assessing metabolic changes in C3 vs. photosynthetic and heterotrophic metabolism, respectively; predicting sources of energy in mesophyll cells
Comments:
• Compartmented model of C3 photosynthesis
• Good qualitative model predicting expected metabolic changes

Model: Maize, sorghum, sugar cane – C4GEM – C4 photosynthesis in mesophyll and bundle sheath cells [17]
Reactions: 1,588
Metabolites: 1,755
Objective functions:
• As in AraGEM
Biological question: As in AraGEM, but comparing fluxes in 2 cell types and distinguishing different types of C4 photosynthesis
Comments:
• Compartmented 2-cell model of C4 photosynthesis containing 112 transporters
• Fluxes in photosystem I and II were distinguished in 2 cell types
• Detailed analysis of high-energy cofactor requirements in different modes of C4 photosynthesis

Model: Maize – iRS1563, an updated C4GEM [22]
Reactions: 1,985
Metabolites: 1,825
Objective functions:
• Maximize biomass production
Biological question: Predicting metabolic phenotypes in known lignin biosynthesis mutants and assessing the effects of changes in cell wall composition on biomass production
Comments:
• Expanded C4GEM specific to maize
• Known aspects of secondary metabolism included
• Photon utilization and CO2 uptake – not constrained
• Validation provided by existing data on lignin biosynthetic mutants
The flux solutions obtained in [14] were biased by the use of the highly limiting objective function and the question is how comparable these modeled fluxes are to the situation in vivo. Two flux maps were generated for Arabidopsis cell-suspension cultures grown under heterotrophic conditions at two different oxygen levels [99] using 13C-based steady-state metabolic flux analysis. This procedure involves a combination of measurements and iterative modeling [104,105]. Accumulation of individual biomass components and substrate uptake rate measurements are used to constrain fluxes. Tracking the carbon from substrates through a defined metabolic network to final products of metabolism using 13C techniques enables one to discern which pathways are active and, in some cases, even to dissect the forward and reverse fluxes through reversible reactions [104,106]. Metabolic flux analysis showed that under both normal- and high-oxygen conditions, both glycolysis and the TCA cycle were very active and that the TCA cycle behaved as a cycle [99]. In the case of the GEM, the overall fluxes of these two pathways were low and the TCA cycle was not complete [14]. This case illustrates the need for experimentally-derived constraints in central carbon metabolism. Installing a level of genetic regulation in a plant GEM to satisfy the need for constraints is an ambitious but ultimately necessary approach from a systems biology perspective. It is difficult to compare these two models in this case because the metabolic network of the metabolic flux analysis model lacked reactions involving ribulose-1,5-bisphosphate carboxylase/oxygenase (RubisCO) and the photorespiratory pathway. In theory, RubisCO is not supposed to be active in non-photosynthetic tissues, such as non-green seeds, roots, and plant cell-suspension cultures.
RubisCO is assumed to be inactive and is not included in the metabolic flux analysis models of maize and sunflower seeds and maize root-tips [107–110]. However, this enzyme is involved in increasing the carbon use efficiency in green oilseeds by refixing significant amounts of CO2 lost in photoheterotrophic-type metabolism [103,111]. RubisCO and photorespiratory enzymes are found in the proteomes of plant cell cultures [112], but their in vivo activities and physiological relevance to heterotrophic metabolism need to be experimentally elucidated.

There was a discrepancy around CO2 and O2 fixation by RubisCO in the flux balance analysis model. The two gases compete for the same active site in RubisCO [113]. This means that CO2 and O2 fixation cannot be separated when both gases are present and the only way to shift the preference for either carboxylation or oxygenation is by changing the relative availability of the two gases. Based on the flux balance analysis model, RubisCO was active with both CO2 and O2 at low ATP demand, but the activity towards the two substrates was disconnected: O2, but not CO2, was predicted to be used at all times regardless of the ATP demand in the GEM [14]. These results contradict the expectations for high ATP demand regarding the separation of carboxylation and oxygenation. At high ATP demands, substrates are oxidized at high levels, meaning that O2 is consumed and CO2 produced, which would favor the carboxylation of ribulose-1,5-bisphosphate by RubisCO. Consequently, both the oxygenation and photorespiration should be disfavored [14]. The most likely explanation for this discrepancy was that RubisCO-catalyzed carboxylation and oxygenation were not linked in the model.
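One way to link the two RubisCO activities in a flux model is to add a single linear coupling constraint that fixes their ratio. The sketch below is illustrative only and is not taken from [14]: the reaction identifiers, the function names, and the 3:1 ratio (the value assumed in the photosynthetic models of Table 1) are assumptions, and in a heterotrophic cell culture the ratio would have to come from measured relative availabilities of CO2 and O2.

```python
def coupling_row(reaction_ids, carboxylase_id, oxygenase_id, ratio):
    """Build one linear constraint row: v_carboxylase - ratio * v_oxygenase = 0."""
    row = [0.0] * len(reaction_ids)
    row[reaction_ids.index(carboxylase_id)] = 1.0
    row[reaction_ids.index(oxygenase_id)] = -float(ratio)
    return row


def satisfies(row, fluxes, tol=1e-9):
    """Check whether a flux vector obeys the coupling constraint within a tolerance."""
    return abs(sum(c * v for c, v in zip(row, fluxes))) <= tol


# Hypothetical reaction identifiers; the 3:1 ratio is only an illustrative value.
REACTIONS = ["RBC", "RBO", "PGK"]
ROW = coupling_row(REACTIONS, "RBC", "RBO", ratio=3)

linked = [3.0, 1.0, 5.0]    # carboxylation : oxygenation = 3 : 1
unlinked = [0.0, 1.0, 5.0]  # oxygenation without any carboxylation

print(ROW, satisfies(ROW, linked), satisfies(ROW, unlinked))
```

Appending such a row to the equality constraints of the linear program would exclude solutions in which oxygenation carries flux while carboxylation does not, which is the disconnected behavior discussed above.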
There was also a need to constrain these two inseparable reactions together by using experimentally-derived constraints to drive the relative flux through carboxylation versus oxygenation based on the actual relative availabilities of CO2 and O2 under different energy conditions.

3.2. Seed GEMs in plants

Seeds provide a means of sexual propagation in plants, but they are also a source of nutritious and useful chemicals. Dry seed biomass consists primarily of proteins, lipids, starch and polysaccharides, and cell wall material and the relative levels and composition of these individual storage compounds depend on the genotype of the plant species and growth conditions [114,115]. Seed storage compounds are synthesized by the pathways of central carbon metabolism during seed development and are mobilized during seed germination, providing nutrients and energy for seedling establishment before photosynthesis takes over, enabling seedling growth [94,103,110,116]. Maternally provided organic and inorganic nutrients are metabolized through: (i) glycolysis; (ii) the pentose phosphate pathway; (iii) the TCA cycle; and (iv) individual pathways leading to amino acids for protein, fatty acids for oil, and monosaccharide precursors for starch, oligosaccharide, and cell wall syntheses [94,103,110]. Seeds also accumulate secondary metabolites, providing protection against herbivores, but individually, these compounds contribute less than 1% to the final biomass and due to poor flux value scalability so far have been excluded from the seed models. For example, glucosinolates, which are considered major secondary metabolites in Brassicales, constitute less than 1% and 4% of biomass in rape and Arabidopsis seed, respectively [117]. In addition, glucosinolates are synthesized most likely in both maternal tissues and the embryo, but the specific locations and their contribution to overall glucosinolate synthesis remain controversial [118,119].
Glucosinolate metabolism in seeds cannot currently be modeled accurately due to this ambiguity in the biosynthesis location. It is reasonable to argue that the levels of these protective metabolites will increase in field-grown plants and that the relevant metabolic pathways will have to be included in future GEMs intended for elucidating seed metabolism for metabolic engineering purposes.

Bacteria will maximize their growth while saving the resources under optimal conditions as one of the mechanisms to become the dominant species in the environment. Similar to bacteria, one would expect developing seeds to maximize the production of biomass of optimal composition with the minimal effort under favorable conditions, enabling the competition with other plant species during germination. As such, real biomass measurements can be used to constrain the model, and the maximization of biomass production and minimization of total flux appear to be acceptable objective functions. However, as noted in Section 3.1, minimizing total flux is completely inappropriate for complex eukaryotic cells due to active redundant and parallel pathways and the lack of knowledge about additional required, but ignored processes that will influence the biomass production in seeds. Despite these issues, seed central carbon metabolism is relatively simple and usually fulfills the metabolic steady-state requirement needed for flux balance analysis [104,120]. Therefore, it is not surprising that the very first flux balance model in plants [15] was focused on elucidating seed metabolism.

3.2.1. Compartmented barley seed endosperm flux balance analysis models

The endosperm of developing cereal seeds represents a combination of photoheterotrophic and heterotrophic metabolism and is the site of the synthesis of the major storage compounds, starch and seed storage proteins [121].
Maternally provided sucrose is used in glycolysis or in starch biosynthesis, while asparagine and glutamine are sources of nitrogen for other amino acids for seed storage protein synthesis. The endosperm also has to metabolically adapt to low O2 availability as it becomes less permeable to O2 with increasing density, while still being able to make seed storage compounds, ensuring seed growth [121,122]. This biochemical and physiological information and other genomic and proteomic data provided the basis to refine the model of central carbon metabolism in developing barley endosperm extracted from a number of databases [15]. The final model contained four different compartments: (i) a single extracellular compartment for the substrates, cofactors, and biomass compounds and (ii) three intracellular compartments (cytosol, mitochondria, and amyloplast) for internal metabolites. Particular focus was given to transporters, enabling the movement of internal metabolites between compartments. Barley endosperm comprises the majority, and the embryo less than 4%, of dry, husk-free barley seed biomass [123]. As such, the actual composition of barley seeds (predominantly representing the endosperm) was used to constrain the biomass fluxes. Two successive objective functions were used: (i) maximizing growth by linear programming followed by (ii) minimizing the overall flux by quadratic optimization [124]. Fluxes were modeled in barley endosperm under: (i) anoxic; (ii) hypoxic; and (iii) aerobic conditions by simulating different O2 and sucrose uptake rates [15]. However, barley endosperm is not quite uniform and it contains a gradient of O2 levels, from aerobiosis at the periphery to hypoxia in the middle of the seed. High O2 levels (over 450 µM) are observed at the periphery where photosynthesis occurs and low O2 levels (less than 5 µM) in the middle of illuminated barley seeds [16]. This means that the metabolism is different in the endosperm located close to the seed coat and in the middle of the endosperm. It was elegantly shown by using non-invasive in vivo magnetic resonance imaging that alanine is made as the final product of fermentation in the central endosperm where O2 levels are low and then transported through plasmodesmata to peripheral endosperm layers where it is converted to pyruvate and used in aerobic metabolism [16]. Additional metabolomic measurements on dissected layers of barley endosperm demonstrated changes in several metabolites of central carbon metabolism [16]. As such, the individual endosperm layers corresponding to different levels of O2 have to be used for biomass measurements individually. It appears that a high degree of hypoxia in cereal seeds is relevant not only to later stages of development when the endosperm is already densely packed with starch and storage proteins, lowering the permeability to O2 [121], but also deep within seeds [16]. To account for these differences in O2 levels and central carbon metabolism, flux balance analysis was re-done using the original network [15] and these relevant measurements [16]. Barley endosperm flux balance analysis models predicted metabolic changes associated with anoxia and different levels of hypoxia relative to the aerobic conditions in developing barley seeds [15,16]. Both models predicted the expected accumulation of alanine under hypoxic conditions [15,16]. From the qualitative perspective, most of the predicted changes in flux distributions associated with anaerobiosis vs. aerobiosis were expected.
For example, the models described in detail the gradual switch from respiration and high starch production in aerobic endosperm (e.g., young seeds or peripheral endosperm exposed to light) to an incomplete TCA cycle, high fluxes through the glycolytic and fermentative pathways, and low starch/biomass production in hypoxic dense endosperm (e.g., mature seeds or the central endosperm in the darkness). The non-cyclic TCA cycle is observed primarily because succinate dehydrogenase connects the TCA cycle with the mitochondrial electron transport chain, which carries little flux when O2 is limited. Based on these simulations, the rate of the biomass accumulation should decline with the increasing age and density of developing barley seeds as O2 availability decreases. From the quantitative perspective, the model predictions regarding the growth rates were in accordance with the experimental findings [15]. This is not surprising when the actual composition of seed storage compounds was used to constrain biomass fluxes. The original model also predicted the existence of unexpected metabolic adaptations to anoxia [15]. First, the futile cycle involving phosphofructokinase and pyrophosphate:fructose-6-phosphate 1-phosphotransferase running in opposite directions leads to the use of ATP and generation of pyrophosphate during anoxia. Pyrophosphate produced by ADP-glucose pyrophosphorylase could be used to help drive sucrose mobilization for starch synthesis by UDP-glucose pyrophosphorylase in anoxic endosperm. Second, the RubisCO bypass was required for growth in anoxic endosperm due to an already incomplete TCA cycle and low O2 availability, which limited additional available routes of carbon flow [15]. While the RubisCO bypass can re-fix carbon lost during central carbon metabolism in developing green oilseeds that are capable of photosynthesis [103,111], its activity may be irrelevant in heterotrophic cereal endosperm during anoxia.
Based on O2 measurements, barley endosperm is probably never completely anoxic [16]. As such, the relevance of the futile cycle and RubisCO bypass to cereal endosperm metabolism remains elusive. Overall, these flux balance analysis models predicted barley endosperm metabolism during growth. The next question is how well the predicted internal fluxes agree with the actual internal fluxes. Many GEMs predict the biomass fluxes, but there is little agreement between the modeled and real internal fluxes [125]. This presents an enormous opportunity for systems biologists. One also has to keep in mind that plant metabolism is resilient and highly regulated, and plant cells utilize redundant pathways that: (i) may be missing from the model or (ii) may not be chosen as possible solutions due to the flux-minimization bias of the optimization algorithms.

3.2.2. Compartmented flux balance analysis models of developing Brassica napus embryos

Rapeseeds predominantly accumulate oil and protein as storage compounds and represent an important source of food and biofuel [115]. Oil and proteins are synthesized through the central carbon and nitrogen metabolic pathways during embryo development. The embryos are green and capable of performing photosynthesis, which generates over 50% of the reducing power and ATP used in extensive oil accumulation [94,126]. A flux balance analysis model was reconstructed primarily from the AraCyc database [30] and available literature. The objective function was based on data from the literature regarding substrate uptake and rapeseed biomass production during the early stages of oil accumulation. Flux balance analysis was used along with principal component analysis and in silico mutant analysis to investigate “metabolic network plasticity” with specific emphasis on the metabolic regulation of lipid production.
The principal component analysis-assisted exploration of the entire phenotypic solution space was based on the reduction of the dimensionality in a highly complex biological system to identify viable solutions. The predicted growth including lipid accumulation and its regulation by the transcription factor WRINKLED1 agreed with the experimental observations [21]. The Brassica napus bna572 model is so far the most comprehensive seed model, containing 10 compartments, including apoplastic and intermembrane spaces and thylakoid lumen [20]. Flux variability analysis [53] was used to interrogate other possible optimal flux distribution solutions in this flux balance analysis model to predict metabolic fluxes under photoheterotrophic and heterotrophic conditions in combination with different organic and inorganic nitrogen sources, respectively [19,20]. Flux variability analysis makes it possible to assess the ranges of flux values that represent optimal solutions given the constraints and objective functions used. This means that alternate and redundant pathways are also explored by this approach and may be chosen as an optimal solution. The actual biomass production rates including the carbon balance and housekeeping energy requirements of cultured B. napus embryos were used as constraints, while the substrate uptake (photons and carbon and nitrogen sources) was minimized in flux balance analysis [20]. For flux variability analysis, the objective function value was fixed, and the flux through each reaction was then minimized and maximized in turn to narrow the ranges for flux distributions. Individual fluxes were categorized into 18 different types and 6 classes based on the flux variability/stability and essentiality/substitutability to assess which pathways were active under four different conditions. The predicted fluxes were compared to the actual fluxes obtained by 13C-based steady-state metabolic flux analysis [19].
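The flux variability analysis procedure described above can be stated compactly as a pair of linear programs per reaction. The formulation below is the standard textbook form rather than a transcription of the bna572 implementation. The symbols S (stoichiometric matrix), v (flux vector), c (objective weights, here standing for the minimized substrate uptake), and Z* (the optimal objective value from flux balance analysis) are notation assumed for illustration.

```latex
\begin{aligned}
\text{FBA:}\quad & Z^{*} = \min_{v}\; c^{\mathsf T} v
  && \text{s.t. } S v = 0,\;\; v^{\min} \le v \le v^{\max} \\[4pt]
\text{FVA, for each reaction } j:\quad
  & v_j^{\mathrm{lo}} = \min_{v}\; v_j, \qquad v_j^{\mathrm{hi}} = \max_{v}\; v_j
  && \text{s.t. } S v = 0,\;\; v^{\min} \le v \le v^{\max},\;\; c^{\mathsf T} v = Z^{*}
\end{aligned}
```

A reaction with equal lower and upper values is fixed across all optimal solutions, whereas a wide interval indicates that alternate or redundant routes can carry the flux at no cost to the objective.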
It is very important to keep in mind that these two models had different network structures and performing any direct comparison of these models is extremely difficult. Differences in network structure are a result of poor resolution of compartmentation of some pathways (e.g., cytosolic and plastidic glycolysis) or missing steps (e.g., plastidic malic enzyme as a source of carbon and energy for fatty acid synthesis) in 13C-based flux models [19]. On the other hand, 13C-based flux models are well constrained based on 13C-signatures reflecting real fluxes. The flux variability analysis yielded predictions that were for the most part consistent with the actual fluxes in developing rapeseed embryos. However, the results obtained from the correlation analysis used for flux comparison are somewhat misleading. Although the correlation coefficient was high when bna572 was compared to the 13C-flux model [19], this result was biased because all small fluxes fell together close to zero. In fact, numerous small positive or negative (opposite direction) fluxes in the 13C-flux model were compared to the corresponding small, but several-fold different or zero fluxes in the bna572 model. For example, the TCA cycle never carried any flux between α-ketoglutarate and malate in the bna572 model, while there was a small positive flux between these metabolites based on 13C metabolic flux analysis [19]. There were also clearly other more significant disagreements between the two models. Pathways known to be active in vivo that represented unfavorable “wasteful” or “energetically inefficient” processes were never selected to be part of the solution by flux variability analysis due to insufficient constraints and extreme objective functions.
These pathways included: (i) amino acid uptake (ammonia uptake was energetically most efficient); (ii) glucose and fructose uptake (sucrose was energetically most efficient); and (iii) pentose-phosphate pathway (reductant was produced exclusively by photosynthesis and a number of dehydrogenases) [19,20,94,126–128]. However, it is important to note that flux variability analysis is extremely useful, as it enables predictions regarding the existence and essentiality of novel and alternate pathways not considered in small 13C-based flux models. These predictions then can be tested experimentally by 13C-labeling approaches in plants [104,120].

3.3. AraGEM, an Arabidopsis flux balance analysis model representing a C3 plant cell

AraGEM v.1.0 is a flux balance analysis model and it was the first attempt to predict and compare the central carbon metabolism in compartmented plant cells representing: (i) photosynthetic leaves; (ii) photorespiring leaves; and (iii) non-photosynthetic respiring cells/tissues/organs (e.g., roots). The first, photosynthesis-only scenario is realistic for plants grown under saturating levels of CO2 when the oxygenation reaction of RubisCO is suppressed. Modeling of: (i) photosynthetic and (ii) photorespiring leaves allowed estimation of the effects of photorespiration on changes of fluxes in central carbon metabolism. The network reconstruction involved all available databases that provided the stoichiometry and enzyme localization information as well as the information pertaining to individual gene-enzyme-reaction relationships. AraGEM includes several compartments (cytosol, mitochondria, plastids, peroxisomes, and vacuoles) and 18 plasma membrane and 83 interorganellar transporters [18]. A new, updated version of the AraGEM network, iRS1597, was generated using up-to-date annotations, but this network was not used for modeling [22].
During curation, 75 reactions that were absent from KEGG and AraCyc were identified as essential for biomass production by flux balance analysis. Fluxes leading to the synthesis of sugars, amino acids, fatty acids, nucleotides, vitamins and cofactors, and cell-wall components represent 47 different biomass drains that were included in the biomass constituting equation. The uptake of photons, CO2, inorganic salts, and selected sugars and amino acids represent 18 uptake fluxes. During network reconstruction, 446 singleton metabolites were also identified [18]. In non-photosynthetic cells, photosynthetic and photorespiratory fluxes were set to zero, while sucrose uptake was allowed. The rate of biomass production was held constant and the actual biomass composition measurements [129] were used. The objective function involved the minimization of sucrose uptake and utilization. For photosynthetic cells without photorespiration, the flux through the oxygenation reaction of RubisCO was set to zero, while for normally photosynthesizing and photorespiring mesophyll cells the carboxylation-oxygenation ratio of RubisCO was constrained to 3:1, based on available data [130]. In both cases, CO2 and photon uptake were set as free fluxes (subjected to modeling) and sucrose uptake from the extracellular space was constrained to zero. Leaf biomass composition [131] was used to constrain the biomass fluxes. The objective function involved minimizing the photon uptake rates. AraGEM is an extensive model of Arabidopsis metabolism, and its development is considered an important achievement. Given the complexity of plant metabolism, a number of simplifications had to be implemented to reduce this complexity and to enable flux balance analysis.
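The constraint set described above for photosynthesizing, photorespiring mesophyll cells can be summarized as a single linear program. This is a sketch in assumed notation: S is the stoichiometric matrix, v the flux vector, the descriptive subscripts are not AraGEM reaction identifiers, and the symbol for the fixed biomass production rate is introduced here for illustration only.

```latex
\begin{aligned}
\min_{v}\quad & v_{\text{photon}} \\
\text{s.t.}\quad & S v = 0, \\
& v_{\text{biomass}} = \mu_{\text{fixed}}, \\
& v_{\text{carboxylation}} = 3\, v_{\text{oxygenation}}, \\
& v_{\text{sucrose uptake}} = 0, \\
& v_{\mathrm{CO_2}\text{ uptake}},\; v_{\text{photon}} \ \text{free}
\end{aligned}
```

The photosynthesis-only scenario replaces the ratio constraint with a zero oxygenation flux. The non-photosynthetic scenario sets the photosynthetic and photorespiratory fluxes to zero, allows sucrose uptake, and minimizes the sucrose uptake flux instead of the photon uptake flux.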
Carbon and energy resources needed for general cell maintenance, degradation and component turn-over processes, and futile cycles are estimated at about 37% of total cellular needs [125], and were included as a simple ATP maintenance term in the AraGEM model. The inclusion of fatty acids was represented solely by palmitic acid and reduced the complexities that arise with lipid heterogeneity. The resulting impacts of these simplifications on the relationship between cell growth and photon uptake rates are minimal. However, the impacts of similar simplifications on fluxes through individual pathways remain under investigation for several models. Despite the simplifications, AraGEM was able to predict metabolic phenotypes in photosynthetic and non-photosynthetic Arabidopsis cells quite accurately from the qualitative perspective. These predictions included: (i) the activation of the traditional photorespiratory cycle and corresponding biomass loss as CO2 during photorespiration; (ii) activation of sucrose degradation, glycolysis, and TCA cycle in root cells; and (iii) identity of specific sources of reductant for nitrogen reduction in light and dark conditions. AraGEM also provided novel predictions regarding metabolic flexibility and the production of high-energy cofactors in different compartments under various conditions [18].

3.4. C4GEM, maize, sugarcane, and sorghum flux balance analysis models representing C4 plant cells

C4 metabolism enables some plants such as maize, sugarcane, and sorghum to minimize water loss and photorespiration by concentrating CO2 levels used by RubisCO [132–134]. Briefly, C4 metabolism involves CO2 fixation by phosphoenolpyruvate carboxylase, generating malate in mesophyll cells. Malate is then transported to bundle-sheath cells, where its decarboxylation to pyruvate by malic enzyme takes place. In the bundle-sheath cells, CO2 released from malate is fixed by RubisCO in the traditional Calvin–Benson–Bassham cycle.
Pyruvate is transported back to the mesophyll cells and enters the C4 cycle. There are species-specific variations to C4 metabolism depending on whether NAD- or NADP-dependent malic enzymes and phosphoenolpyruvate carboxykinase and mitochondria are involved, and these modes were accounted for in the selected model C4 plant species. Compartmented AraGEM was used as a template for the reconstruction of C4GEM, to which pathways and transporters related to the C4 metabolism were added [17]. Species-specific gene annotation was implemented to generate models of central carbon metabolism in maize (11,623 genes), sugarcane (3,881 genes), and sorghum (3,557 genes) [17]. GEMing of C4 metabolism offers a unique opportunity to interrogate the interactions between two different cell types and serves as a basis for future studies involving modeling of multiple cell types. Bundle-sheath and mesophyll cells were modeled as duplicated subsets of the whole system and the intercellular transport by the plasmodesmata provided the needed flux link between these two cell types [17]. Metabolites were exchanged between the cells and remained ‘intracellular’ in both cells. In a single-cell model, they would be excreted from the cell and considered ‘extracellular’. The network differences between the cell types were achieved by implementing cell-specific constraints based on available transcriptomic and/or proteomic data for that cell type. For example, the C4 cycle operates only in mesophyll cells, so fluxes through this cycle can be constrained to zero in the bundle-sheath cells. C4GEM delineated major differences in metabolism between the bundle-sheath and mesophyll cells that overall agreed with maize proteomic data.
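The two-cell construction described above (duplicate the network, link the copies through plasmodesmata, then apply cell-specific constraints) can be sketched with plain dictionaries. This is a minimal illustration and not the C4GEM code: the three lumped reactions, the metabolite names, the `PD_` prefix, the `_M`/`_BS` suffixes, and the bound of 1000 are all hypothetical placeholders.

```python
# Toy single-cell network: reaction -> {metabolite: stoichiometric coefficient}.
# Reaction and metabolite names are hypothetical placeholders, not C4GEM identifiers.
SINGLE_CELL = {
    "PEPC": {"CO2": -1, "PEP": -1, "malate": 1},      # lumped C4 carboxylation to malate
    "ME": {"malate": -1, "pyruvate": 1, "CO2": 1},    # malic enzyme decarboxylation
    "PPDK": {"pyruvate": -1, "PEP": 1},               # PEP regeneration from pyruvate
}

BIG = 1000.0  # arbitrary large flux bound (assumption)


def duplicate(network, tag):
    """Copy a network, suffixing every reaction and metabolite with a cell tag."""
    return {
        f"{rxn}_{tag}": {f"{met}_{tag}": coef for met, coef in stoich.items()}
        for rxn, stoich in network.items()
    }


def build_two_cell_model(network, shuttled, closed):
    """Join mesophyll (M) and bundle-sheath (BS) copies through plasmodesmata."""
    model = {**duplicate(network, "M"), **duplicate(network, "BS")}
    bounds = {rxn: (0.0, BIG) for rxn in model}
    for met in shuttled:
        # The metabolite moves between cells and stays intracellular in both.
        model[f"PD_{met}"] = {f"{met}_M": -1, f"{met}_BS": 1}
        bounds[f"PD_{met}"] = (-BIG, BIG)  # reversible intercellular transport
    for rxn, tag in closed:
        bounds[f"{rxn}_{tag}"] = (0.0, 0.0)  # cell-specific constraint
    return model, bounds


model, bounds = build_two_cell_model(
    SINGLE_CELL,
    shuttled=["malate", "pyruvate"],
    closed=[("PEPC", "BS")],  # C4 carboxylation switched off in the bundle sheath
)
```

Because the shuttled metabolites appear in both cell copies and never in an extracellular pool, they remain internal to the combined system, which mirrors the 'intracellular in both cells' treatment described in the text.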
Bundle-sheath cells tend to have high fluxes through the pathways relevant to CO2 assimilation, while the mesophyll cells have highly active pathways involving pyruvate and nucleotide metabolism as well as nitrogen assimilation and metabolism [17]. A surprising prediction that came out of the model was the existence of differential fluxes in photosystems I and II represented by the linear and cyclic electron transport chains, respectively, in two cell types. This distinction was enabled because of the differences in ATP requirements in the bundle-sheath versus mesophyll cells and for the different malate decarboxylation modes mentioned above. C4 cycling in mesophyll cells requires more ATP than C3 photosynthesis and therefore, mesophyll cells should have a very active cyclic electron transport chain around photosystem I, which produces only ATP and helps keep the appropriate ATP/NADPH balance. However, C4GEM predicted that for the NADP-dependent malic enzyme mode, only the linear electron transport chain was active in mesophyll cells, while the bundle-sheath cells had only the cyclic electron transport chain. Based on simulations, the cyclic electron transport chain is more efficient in the bundle-sheath than in mesophyll cells with the NADP-malic enzyme-type metabolism [17]. Because the photon uptake was minimized, the cyclic electron transport was selected as the most efficient solution for ATP production in the bundle-sheath, but not in mesophyll cells. Minimizing the uptake of substrates (e.g., photons and sucrose under photoautotrophic and photoheterotrophic conditions, respectively) with maximal production of biomass of the appropriate composition provides the ability to assess the efficiency of cellular metabolism.
It would be interesting to see flux distribution predictions for physiologically relevant conditions for a C4 plant model that includes constraints relating to the actual capacities (maximal quantum yield) of the photosystems and the downstream photosynthetic components including CO2 fixation based on available measurements [135]. Using photons rather than the components of the photosystems is a mathematical convenience that allows controlling the downstream fluxes. Is the minimization of the photon uptake a physiologically relevant objective function for a photosynthetic cell? We are questioning this objective function because the amount of light absorbed by photosynthetic apparatus does not translate to the same amount of electron flux. Not every photon transferring its energy to an electron is used in photochemistry and there are processes such as photoinhibition that are not accounted for in these models. Minimizing photon uptake might be relevant to very low light conditions or during an abiotic stress when photosynthesis is suppressed [136,137], but not to a healthy plant growing under favorable conditions when photons and minerals are abundant. Photosystems are usually hit with excess light energy that needs to be dissipated as heat (non-photochemical quenching of chlorophyll fluorescence) or by release of fluorescent light from excited chlorophylls [138]. To assess the influence of photon and CO2 uptake rates on biomass production, we performed a simulation study using AraGEM [18], which also used the minimization of photon utilization as an objective function. Results of these simulations are presented in Section 4.

3.5. Maize GEM iRS1563 representing C4 plants

An updated model of C4GEM (iRS1563) was generated for maize mesophyll and bundle-sheath cells [22].
This model describes more detailed biosyntheses of cell wall components and lipids and includes selected, well-studied pathways of secondary metabolism (flavonoids, terpenoids, phenylpropanoids, anthocyanins, carotenoids, chlorophylls, glucosinolates, and some hormones) not present in previous models of central carbon metabolism [17,18]. iRS1563 contains 674 more metabolites and 893 more reactions than C4GEM, which is relevant mostly to central carbon metabolism. This model also has 445 reactions and 369 metabolites that are unique to maize when compared to the updated AraGEM (iRS1597) due to the differences between C3 and C4 metabolism as well as in secondary metabolism. All reactions are elementally and charge balanced in six compartments [22]. However, challenges still exist for proton balancing based on pKa values to enable organelles to operate at different pH values. Measurements of biomass composition for maize and other closely related species were used to constrain biomass fluxes in this flux balance analysis model [22]. As with C4GEM, iRS1563 was designed to predict flux distributions under different physiological conditions represented by photosynthesis, photorespiration, and respiration in mesophyll and bundle-sheath cells of a C4 plant. Unfortunately, no data or discussion was provided to compare this model to C4GEM in terms of flux predictions for these different types of physiological conditions. Changing lignin composition in cell walls without affecting total biomass accumulation is a challenging task important for cellulosic biofuel production [139,140]. Therefore, the authors focused on predicting metabolic phenotypes in two well-characterized lignin biosynthesis mutants under photosynthetic conditions and on prediction validation [22]. Fluxes through the steps disrupted in the mutants, cinnamyl alcohol dehydrogenase and caffeate 3-O-methyltransferase, were set to 10% of the wild-type activities predicted by the model.
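The knockdown constraint described above can be sketched as a small helper that turns predicted wild-type fluxes into mutant flux bounds. This is an illustration only: the function name, the reaction labels `CAD`, `COMT`, and `RBCL`, and the flux values are hypothetical, and fixing the flux (rather than merely capping it) is one reading of "set to 10%".

```python
def knockdown_bounds(wild_type_flux, targets, residual=0.10):
    """Pin each target reaction to a fraction of its predicted wild-type flux.

    Returns {reaction: (lower, upper)} with lower == upper, i.e. the flux is
    fixed rather than merely capped (an interpretation of "set to").
    """
    if not 0.0 <= residual <= 1.0:
        raise ValueError("residual must be a fraction between 0 and 1")
    bounds = {}
    for rxn in targets:
        value = residual * wild_type_flux[rxn]
        bounds[rxn] = (value, value)
    return bounds


# Hypothetical wild-type fluxes in arbitrary units, not values from iRS1563.
wild_type = {"CAD": 2.0, "COMT": 0.5, "RBCL": 10.0}
mutant_bounds = knockdown_bounds(wild_type, targets=["CAD", "COMT"])
```

Keeping `residual` as a parameter makes it easy to see what a full knockout (`residual=0.0`) would do to a model whose biomass equation still demands the downstream product.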
Presence of a residual activity of these enzymes was necessary to ensure lignin biomass production. Maximal photon utilization and CO2 uptake were allowed. The flux balance analysis model was about 81% accurate in predicting the expected qualitative changes in mutant fluxes when compared to the experimental findings [22].

4. Evaluating AraGEM through simulations

The AraGEM model (v. 1.0) [18] was simulated in an effort to demonstrate to plant modelers: (i) how critical constraints can impact overall model results; and (ii) the robustness of the AraGEM model. To achieve this goal, the following were examined: (i) the relationship between the specific growth rate and the photon uptake rate given a constrained CO2 uptake rate; (ii) the impact of varying the stoichiometry of the biomass equation; (iii) the ability of AraGEM to synthesize all compounds in the model; and (iv) the impact of a net influx or efflux of free protons on the overall model conversion. These factors play a key role in using GEMs to guide metabolic engineering in silico. As shown in Fig. 2, it is critical to consider how a metabolic engineering strategy might influence the growth rate, photon uptake rate, and CO2 uptake rate of the mutant plant. If one of these parameters is known (or can be assumed), AraGEM will calculate the rest. However, it is presently unrealistic to assume that a GEM (of any kind) can predict all of these parameters for a genetic mutant without some input or constraints supplied by the user. For each curve in Fig. 2, the steady-state solution for a given CO2 uptake rate lies at the junction between the horizontal and vertical portions of the curve. At photon uptake rates below this junction, an imbalance between CO2 and the photons required for growth is observed, leading to a model that does not converge in flux balance analysis. Photon uptake rates above this junction represent a scenario in which photons are consumed in excess.
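The junction described above can be illustrated with a deliberately simplified two-limit calculation. This is a stand-in for intuition only, not flux balance analysis and not AraGEM: the function name and the two lumped yields (`photons_per_co2`, `biomass_per_co2`) are hypothetical values chosen for readability.

```python
def growth_at(co2_uptake, photon_uptake, photons_per_co2=10.0, biomass_per_co2=0.02):
    """Classify one (CO2, photon) pair for a toy cell with CO2 uptake held fixed.

    photons_per_co2 and biomass_per_co2 are hypothetical lumped yields,
    not parameters taken from AraGEM.
    """
    required = co2_uptake * photons_per_co2  # photons needed to use all fixed CO2
    if photon_uptake < required:
        # The fixed CO2 uptake cannot be assimilated: no steady state exists.
        return {"status": "infeasible", "growth": None, "excess_photons": None}
    growth = co2_uptake * biomass_per_co2
    excess = photon_uptake - required
    status = "junction" if excess == 0 else "excess photons"
    return {"status": status, "growth": growth, "excess_photons": excess}


# Sweep photon uptake at a CO2 uptake of 2.5 mmol g-1 dry weight h-1.
sweep = {photons: growth_at(2.5, photons) for photons in (10.0, 25.0, 40.0)}
```

In this toy setting the lowest photon value gives no steady state, the middle value sits exactly at the junction, and the highest value leaves growth unchanged while photons go unused, which are the two off-junction scenarios discussed in the text.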
The regulatory structure of the plant cell does not allow either scenario to happen in the physical world.

Fig. 2. AraGEM model conversion with constrained CO2 uptake. The relationship between the cellular growth rate and photon uptake rate is shown for cases in which the CO2 uptake rate was constrained to 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, and 4.5 mmol g−1 dry weight h−1 as labeled in the figure.

Next, the biomass constituting equation of the AraGEM model was investigated to determine its impact on the relationship between the growth rate and the photon uptake rate and to address the question of how well individual biomass components need to be determined. The latter is especially relevant to plant models, as the biomass measurements for different biomass components often come from diverse types of measurements, experiments, research groups, and even cell types and plant species. For these simulations, the CO2 uptake rate was held constant at 2.5 mmol g−1 dry weight h−1. The following components of the biomass constituting equation were varied: (i) cellulose and xylose; (ii) the lignin components (4-coumaryl alcohol, coniferyl alcohol, and sinapyl alcohol); (iii) the lipids; and (iv) the maintenance ATP built into the biomass constituting equation. The stoichiometric coefficient for each of these parameters was varied by multipliers of 0; 0.5; 1; 1.5; 2; and 4, relative to their values in the AraGEM model, while the composition of the rest of the biomass components was kept constant (Fig. 3). For the cases of varying lignin components (Fig. 3B) and lipids (Fig. 3C), very little influence is observed for the relationship between the growth rate and photon uptake rate. The cellulose and xylose components (Fig. 3A) show greater sensitivity and lead to a reduction in the growth and photon uptake rates as their stoichiometric coefficients are increased in the biomass constituting equation.
The ATP maintenance component (Fig. 3D) shows by far the greatest influence on the relationship between the growth rate and photon uptake rate of the cell. As the stoichiometry of required ATP maintenance was increased, the photon uptake rate to achieve the steady-state growth rate increased considerably. This result is common among several GEMs and has also been reported for bacteria, in which increases in ATP demands are accompanied by increases in substrate uptake rates [5]. Nevertheless, the AraGEM model is considered to be a robust model, providing a fairly consistent relationship between the growth rate and photon uptake rate even when considerable changes are made to its biomass constituting equation.

Fig. 3. Some biomass components influence the modeling results more than others. The relationship between the growth rate and the photon uptake rate is shown for varying the stoichiometric coefficients of specific components by multipliers of 0, 0.5, 1, 1.5, 2, and 4 (direction shown from 0 to 4 by arrows). CO2 uptake was held constant. The stoichiometric parameters of the following biomass components were varied: (A) cellulose and xylose; (B) lignin components including (i) 4-coumaryl alcohol, (ii) coniferyl alcohol, and (iii) sinapyl alcohol; (C) lipids; and (D) maintenance ATP requirements of the biomass equation.

Additional simulations were performed with the AraGEM model to test its robustness and capabilities. When the biomass constituting equation provided with the AraGEM model is used, fluxes of central carbon metabolism dominate the flux balance analysis results, and little to no flux is observed through the secondary metabolite pathways. Of the 1,567 total reactions in the AraGEM model, only 435 receive non-zero (or non-trivial) flux. A simulation study was performed to determine whether this was: (i) the result of an incomplete model or (ii) a function of secondary metabolites not being included in the biomass constituting equation. We constructed an algorithm where every metabolite of the AraGEM model was added to the biomass constituting equation individually, and the model was tested for growth. If the cell was able to grow in silico, the metabolite was capable of being produced by AraGEM. The results of this study revealed that all 1,748 metabolites of the AraGEM model are capable of being produced. This further reveals that all 1,567 reactions of the model are complete and capable of carrying flux. This also indicates that the 435 reactions used by the AraGEM model, as published, are a function of its biomass constituting equation. This leads to an important point: if this model is to be used to study secondary metabolite pathways, these secondary metabolites must be added (manually) to the biomass constituting equation. This means that the AraGEM model is ready for additional uses involving secondary metabolism (given small modifications to the biomass constituting equation). This is applicable primarily for studies involving plant-environment interactions as well as interactions of plants with pathogens, herbivores, and other plants or any study highlighting the importance of secondary metabolism. As is the case of all GEMs, the biomass constituting equation ultimately determines how a metabolic network is used, given constraints applied to substrate uptake and product secretion [5]. Finally, the impact of the specific proton flux was investigated using the AraGEM model. The specific proton flux was initially developed for bacteria and describes the total rate at which protons are secreted or consumed by the cell [48]. Obviously, this parameter is more important to acid-producing bacteria than it is for plants. To study the specific proton flux using AraGEM, a proton exchange reaction was installed and constrained to values of −10, 0, and 10 mmol protons g−1 dry cell weight h−1. These values correspond to proton efflux, no proton exchange, and proton influx, respectively. Flux balance analysis was applied to all cases, and few changes in fluxes were observed. The number of reactions maintaining identical fluxes is shown in Fig. 4. Of the 1,567 reactions, 1,449 maintained an identical flux for all three cases of the specific proton flux. This represents a considerable change from bacterial GEMs, which are particularly sensitive to the specific proton flux parameter [5]. Collectively, the results of these simulation studies with AraGEM suggest that this model: (i) is completely functional; (ii) is robust; and (iii) has a metabolic network that is functionally different from GEMs created for bacteria.

Fig. 4. AraGEM is not sensitive to pH change. The number of reactions remaining unchanged by varying the specific proton flux from (i) −10, (ii) 0, and (iii) 10 mmol H+ g−1 dry weight h−1.

5. How can GEMing in plants be improved?

Constructing the first compartmented plant GEMs describing photoautotrophic and photoheterotrophic metabolism including photorespiration and C4 metabolism in two different cell types required an enormous effort and was quite an accomplishment. The models described here represent the first stepping stone in plant GEMing, but there is room for improvement. Modeling involving these plant genome-scale reconstructions provided mostly qualitative predictions of flux distributions for selected aspects of primarily central carbon metabolism, but most importantly an idea about model properties and a framework for improvements. One of the major obstacles in plant GEMing is the large number of missing parts and unknown interactions among the known parts. Reverse genetics and enzyme activity and specificity studies along with subcellular localization will slowly populate the list of genes/enzymes of known metabolic functions facilitating far better gene annotation.
Transcriptomics and proteomics are extremely useful in delineating the presence of transcripts and proteins in plant cells and subcellular localization of enzymes [112,141,142]. Advancing high-throughput non-targeted metabolomics will help identify novel secondary metabolites that need to be included in plant GEMs, but this effort has to be connected to metabolite localization to relevant cell types and organelles and for defined developmental stages and growth conditions. Laser-capture microdissection and pressure catapulting, highly sensitive liquid chromatography-tandem mass spectrometry, and novel approaches including the ‘mass microscope’ [89,91,92] will become indispensable for this goal. Recent advancements in non-aqueous fractionation will also aid in these goals [90]. Considering that there are many different plant cell types that are recalcitrant to separation from other cells and virtually an infinite number of internal and environmental stimuli and their combinations, this is a Sisyphean task. Probably the best approach with the current knowledge base is to focus on cell types and conditions of interest. New parts and functionalities can be added to the existing networks to generate comprehensive template models for major plant species, from which specific networks operating in cells of interest can be extracted. This procedure is similar to what evolved from the first E. coli K-12 MG1655 models for bacteria [7,12,56]. Current plant GEMs tend to use the existing data to constrain the biomass fluxes, but these data often come from different studies of different biomass metabolites and are even extrapolated from different plant species; the same is true for model validation. Often, validation of a predicted flux distribution relies solely on experimental evidence that an enzyme predicted to be active in silico is present in similar tissues. The following relates to problems with different data sources used in GEMing.
Biomass and its composition are highly variable traits that largely depend on the developmental stage and environmental conditions. GEMing should include measurements relevant to the model system at a defined developmental stage and conditions and, if possible, from appropriate cell types. Additional flux constraints should be added to minimize the phenotypic solution space. Plant models are currently only constrained based on biomass fluxes, substrate uptake, or a broad range of physiologically relevant metabolite concentrations [14,15,17,18,21,22], but do not consider thermodynamic constraints for the internal reactions or other constraints. Only maize iRS1563 was fully balanced for all elements and included proton and charge balancing [22]. Objective functions used routinely for bacterial GEMing may not be the most appropriate for GEMing in plant cells. Identifying proper objective functions of specific plant cell types will improve the predictive capabilities of resulting models. Because plant cells are exposed to complex signals at the same time, modeling will require multiple and more complex objective functions. The idea of minimizing the use of resources while maximizing the output under given circumstances is probably valid, but difficult to prove because the circumstances surrounding growth are often incompletely known and are not considered in the model as resource and energy drains (e.g., responses to pathogens and herbivores). Cell maintenance, “wasteful” degradation processes, and futile cycles also should be considered individually, as they represent significant sinks of resources and energy that are different in different cell types [18,22,125]. In addition, plant cells respond metabolically to convoluted stimuli coming from the adjacent cells and the environment, which also requires redirection of carbon flow to additional, often neglected pathways.
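To make the element and charge balancing mentioned above for iRS1563 concrete, the following is a minimal sketch of how a single reaction can be checked. It is not the procedure used in any of the cited models; the function names, the data layout, and the hexokinase example (with formulas and charges assumed for a near-neutral pH) are illustrative assumptions.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or hydrates)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def imbalance(reaction, metabolites):
    """Return (net element counts, net charge); ({}, 0) means the reaction is balanced.

    reaction:    metabolite id -> stoichiometric coefficient
                 (negative for substrates, positive for products)
    metabolites: metabolite id -> (formula, charge)
    """
    net = Counter()
    net_charge = 0
    for met, coeff in reaction.items():
        formula, charge = metabolites[met]
        for element, n in parse_formula(formula).items():
            net[element] += coeff * n
        net_charge += coeff * charge
    return {e: n for e, n in net.items() if n != 0}, net_charge

# Assumed formulas and charges (illustrative, protonation state near pH 7)
METS = {
    "glc": ("C6H12O6", 0),
    "atp": ("C10H12N5O13P3", -4),
    "g6p": ("C6H11O9P", -2),
    "adp": ("C10H12N5O10P2", -3),
    "h":   ("H", 1),
}

# Hexokinase written with and without the released proton
balanced = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
unbalanced = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1}

print(imbalance(balanced, METS))    # no element or charge left over
print(imbalance(unbalanced, METS))  # one H and one charge unit missing on the product side
```

Running such a check over every reaction of a reconstruction is one simple way to locate the proton and charge gaps that the text identifies as unresolved in most current plant models.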
It is advisable to start looking at interactions of cells in tissues and include these pathways as active in plant GEMs. The Nielsen and Maranas groups are elegantly pioneering the way of addressing these problems with their systems involving two cell types in C4 plants [17,22]. Using the minimization of fluxes or steps as objective functions in GEMing tends to get rid of “redundant” pathways. If there are two redundant pathways that have similar cofactor balance, the flux minimization algorithms will choose one pathway, despite the fact that they are both active in vivo, which can be proven by transcriptomic, metabolomic, and fluxomic approaches. These pathways may seem to be redundant, but obviously they are needed to allow metabolic adaptation in plant cells and are a result of complex metabolic regulation that cannot easily be accounted for in a flux balance analysis model. Finding the minimum number of appropriate objective functions that correctly represent cellular physiology and metabolism is probably one of the most problematic challenges in GEMing in general. One will probably get away with using a few objective functions if appropriate, physiologically relevant, flux constraints are implemented. Then such properly constrained models should start yielding accurate predictions of internal fluxes. The presented plant flux balance analysis models were reasonably accurate in predicting biomass fluxes, which is not surprising, since the actual cellular biomass composition was used to constrain most of these models. Prediction of internal fluxes in these plant models was only qualitative and the model was considered a success when the expected pathways were predicted to be active in silico. Flux balance analysis has been found to perform poorly in isolated cases in predicting the internal fluxes regardless of the model system. 
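The effect of flux minimization on redundant pathways can be illustrated with a deliberately small sketch. It assumes a hypothetical network in which a fixed demand for a product can be met by route A (one reaction) or route B (two reactions), and it replaces the linear program of a real flux minimization with a brute-force enumeration of integer flux splits. All names and numbers are illustrative.

```python
def total_flux(flux_a, demand, steps_a=1, steps_b=2):
    """Sum of reaction fluxes when flux_a of the demand uses route A and the rest route B."""
    flux_b = demand - flux_a
    return steps_a * flux_a + steps_b * flux_b

def min_total_flux_solution(demand, steps_a=1, steps_b=2):
    """Return (flux through A, flux through B) with the smallest total flux.

    Ties are resolved by taking the first candidate, which mimics the
    arbitrary choice a solver makes among equally good solutions.
    """
    candidates = range(demand + 1)  # integer flux carried by route A
    best_a = min(candidates, key=lambda a: total_flux(a, demand, steps_a, steps_b))
    return best_a, demand - best_a

# Unequal route lengths: the shorter route carries everything
print(min_total_flux_solution(10))

# Equal route lengths: every split costs the same, yet only one route is reported
print(min_total_flux_solution(10, steps_a=1, steps_b=1))

# A split in which both routes are active is feasible but never selected
print(total_flux(6, 10))
```

In both cases the reported solution uses a single route, even though any split that meets the demand satisfies the steady-state constraints. This is the behavior described above for pathways that are both active in vivo.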
This is possibly a result of too many degrees of freedom in these under-determined models and possibly due to the use of extreme, physiologically irrelevant objective functions. Comparison of results obtained from flux balance analysis and 13C-based steady-state metabolic flux analysis revealed that the actual flux distributions are within the phenotypically feasible solution space of bacterial flux balance models [125]. However, the sampling of this solution space with extreme objective functions used in flux balance analysis yielded flux combinations that were substantially different from the actual flux distributions. 13C-based fluxes were in close agreement with the actual in vivo fluxes and therefore, 13C-based steady-state metabolic flux analysis is suitable to provide validation of flux balance analysis predictions [125]. The situation is similar in plant systems, although the direct comparison with the actual fluxes is not always possible. Comparison of developing rapeseed flux balance/variability and 13C-based flux models revealed that while some fluxes were comparable between the two models, others were not and many suboptimal, but in vivo-active alternate pathways were not selected as part of the solution [19,20]. Flux variability analysis (all possible metabolic routes are considered and the resulting range of fluxes for each reaction is returned from the analysis) [53] used in these studies should be expanded with proper experimentally-derived constraints and the exclusion of extreme objective functions [19,20]. A significant challenge for GEMing in the future involves the addition of flux constraints on internal fluxes or at branch points within the metabolic network itself. 13C-based steady-state metabolic flux analysis provided an accurate representation of internal flux distributions, as demonstrated by the consistency between the predicted and the actual, experimentally measured 13C-labeling in internal metabolites [125].
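The idea behind flux variability analysis, and the effect of adding a constraint on an internal flux, can be sketched on a toy network. The example below assumes a hypothetical system with one bounded uptake reaction, two parallel internal routes, and a biomass drain. It enumerates integer steady-state flux vectors by brute force instead of solving the pair of linear programs per reaction that a real analysis would use. The "measured" bounds on route A are invented values standing in for a 13C-derived flux estimate.

```python
from itertools import product

UPTAKE_MAX = 10  # hypothetical upper bound on substrate uptake (arbitrary units)

def feasible_points(extra_constraint=None):
    """Enumerate integer steady-state flux vectors of the toy network.

    Steady state forces uptake = route_a + route_b = biomass.
    extra_constraint is an optional predicate on a flux dictionary.
    """
    points = []
    for route_a, route_b in product(range(UPTAKE_MAX + 1), repeat=2):
        uptake = route_a + route_b
        if uptake > UPTAKE_MAX:
            continue
        fluxes = {"uptake": uptake, "route_a": route_a,
                  "route_b": route_b, "biomass": uptake}
        if extra_constraint is None or extra_constraint(fluxes):
            points.append(fluxes)
    return points

def flux_variability(points, objective="biomass"):
    """Return the (min, max) of each flux over all points with the optimal objective value."""
    best = max(p[objective] for p in points)
    optimal = [p for p in points if p[objective] == best]
    return {name: (min(p[name] for p in optimal), max(p[name] for p in optimal))
            for name in optimal[0]}

# Biomass maximization alone leaves the split between the two routes undetermined
unconstrained = flux_variability(feasible_points())

# Assumed measurement: route A carries between 6 and 7 units
measured = flux_variability(feasible_points(lambda v: 6 <= v["route_a"] <= 7))

print(unconstrained)
print(measured)
```

Without the internal constraint each route can carry anything from zero to the full uptake at the same optimal growth, whereas the assumed measurement narrows both ranges to a single unit. The same reasoning motivates using measured internal fluxes as constraints in genome-scale models.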
As such, this approach should be used to provide validation for flux balance analysis flux distribution predictions or these approaches should be used together to obtain the most appropriate predictions of fluxes and metabolic phenotypes. Alternatively, rather than using 13C-based steady-state metabolic flux analysis to validate flux balance or variability models, 13C-based flux measurements should be used as physiologically relevant flux constraints, which would preclude the use of extreme objective functions (Jackie Shanks, personal communication). If necessary, unique objective functions (or multiple combinations of these functions) instead of the physiologically irrelevant extreme ones should be used. These appropriate objective functions may ultimately be back-calculated by comparing measured fluxes and flux balance analysis results [47,55]. However, 13C-based steady-state metabolic flux analysis cannot be used in purely photosynthetic systems, where 13CO2 is the sole source of carbon because every carbon in every metabolite will become labeled to the same degree. In such cases, kinetic modeling such as isotopically nonstationary 13C-based metabolic flux analysis is needed [143]. Despite the many successes of GEMing, significant challenges lie ahead for systems biology research not only in plants, but also in bacteria.

6. Conclusions

The current plant GEMs have provided very promising predictions. However, given the extreme level of complexity of plant metabolism, much research is left to be done. Missing parts can be incorporated as technology involving different omics progresses. Future focus will be on secondary metabolism. Choosing the right model systems, for which experimental validations are available, as well as using known constraints and appropriate objective functions will play a crucial role in improving plant GEMs.
However, the missing parts relate not only to metabolites and enzymes, but also to known and novel regulators in both primary and secondary metabolism. Fluxes represent the cellular metabolic phenotypes and inherently reflect convoluted regulatory mechanisms. In other words, the final flux value through a specific step is a result of the influence of all possible active regulatory mechanisms that can affect the enzyme activity of this step. The question is how we can dissect these mechanisms and integrate them into the model by associating relevant regulators with the enzymes. Whether we like it or not, integration of regulation into GEMs will become important not only for understanding the basis of plant metabolic plasticity, but also for rational metabolic engineering, as regulation tends to “overwrite” effects of changes to expression levels of biosynthetic genes. Then we will be able to address the questions of metabolic adaptations in plants exposed to diverse stimuli and generate predictions for rational plant metabolic engineering to improve plant stress tolerance and resistance to pathogens and herbivores and ultimately biomass production. So, are we ready for GEMing in plants? Definitely!

Acknowledgements

Research funding was obtained from NSF-MCB-1052145 and the Biodesign and Bioprocessing Research Center at Virginia Tech. EC is primarily responsible for writing the review. RS and JY performed the simulation study with the AraGEM model. RS provided the discussion of AraGEM model simulations and edited the manuscript.

References

[1] K.J. Kauffman, P. Prakash, J.S. Edwards, Advances in flux balance analysis, Current Opinion in Biotechnology 14 (2003) 491–496. [2] E. Murabito, E. Simeonidis, K. Smallbone, J. Swinton, Capturing the essence of a metabolic network: a flux balance analysis approach, Journal of Theoretical Biology 260 (2009) 445–452. [3] C.H. Schilling, J.S. Edwards, D.
Letscher, B.O. Palsson, Combining pathway analysis with flux balance analysis for the comprehensive study of metabolic systems, Biotechnology and Bioengineering 71 (2000) 286–306. [4] L.M. Liu, R. Agren, S. Bordel, J. Nielsen, Use of genome-scale metabolic models for understanding microbial physiology, FEBS Letters 584 (2010) 2556–2564. [5] R.S. Senger, Biofuel production improvement with genome-scale models: the role of cell composition, Biotechnology Journal 5 (2010) 671–685. [6] J.S. Edwards, B.O. Palsson, Systems properties of the Haemophilus influenzae Rd metabolic genotype, Journal of Biological Chemistry 274 (1999) 17410–17416. [7] D.J. Baumler, R.G. Peplinski, J.L. Reed, J.D. Glasner, N.T. Perna, The evolution of metabolic networks of E. coli, BMC System Biology 5 (2011) 182. [8] N.C. Duarte, et al., Global reconstruction of the human metabolic network based on genomic and bibliomic data, Proceedings of the National Academy of Sciences of the United States of America 104 (2007) 1777–1782. [9] I. Famili, J. Forster, J. Nielson, B.O. Palsson, Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale reconstructed metabolic network, Proceedings of the National Academy of Sciences of the United States of America 100 (2003) 13134–13139. [10] A.M. Feist, M.J. Herrgard, I. Thiele, J.L. Reed, B.O. Palsson, Reconstruction of biochemical networks in microorganisms, Nature Reviews Microbiology 7 (2009) 129–143. [11] P.C. Fu, Genome-scale modeling of Synechocystis sp PCC 6803 and prediction of pathway insertion, Journal of Chemical Technology and Biotechnology 84 (2009) 473–483. [12] J.D. Orth, et al., A comprehensive genome-scale reconstruction of Escherichia coli metabolism-2011, Molecular Systems Biology 7 (2011) 1–9. [13] O. Rolfsson, B.O. Palsson, I. Thiele, The human metabolic reconstruction Recon 1 directs hypotheses of novel human metabolic functions, BMC System Biology 5 (2011) 155. [14] M.G. Poolman, L. 
Miguet, L.J. Sweetlove, D.A. Fell, A genome-scale metabolic model of Arabidopsis and some of its properties, Plant Physiology 151 (2009) 1570–1581. [15] E. Grafahrend-Belau, F. Schreiber, D. Koschutzki, B.H. Junker, Flux balance analysis of barley seeds: a computational approach to study systemic properties of central metabolism, Plant Physiology 149 (2009) 585–598. [16] H. Rolletschek, et al., Combined noninvasive imaging and modeling approaches reveal metabolic compartmentation in the barley endosperm, Plant Cell 23 (2011) 3041–3054. [17] C.G.D. Dal’Molin, L.E. Quek, R.W. Palfreyman, S.M. Brumbley, L.K. Nielsen, C4GEM, a genome-scale metabolic model to study C4 plant metabolism, Plant Physiology 154 (2010) 1871–1885. [18] C.G.D. Dal’Molin, L.E. Quek, R.W. Palfreyman, S.M. Brumbley, L.K. Nielsen, AraGEM, a genome-scale reconstruction of the primary metabolic network in Arabidopsis, Plant Physiology 152 (2010) 579–589. [19] J. Hay, J. Schwender, Computational analysis of storage synthesis in developing Brassica napus L. (oilseed rape) embryos: flux variability analysis in relation to 13 C metabolic flux analysis, Plant Journal 67 (2011) 513–525. [20] J. Hay, J. Schwender, Metabolic network reconstruction and flux variability analysis of storage synthesis in developing oilseed rape (Brassica napus L.) embryos, Plant Journal 67 (2011) 526–541. [21] E. Pilalis, A. Chatziioannou, B. Thomasset, F. Kolisis, An in silico compartmentalized metabolic model of Brassica napus enables the systemic study of regulatory aspects of plant central metabolism, Biotechnology and Bioengineering 108 (2011) 1673–1682. [22] R. Saha, P.F. Suthers, C.D. Maranas, Zea mays iRS1563: a comprehensive genome-scale metabolic reconstruction of maize metabolism, PLoS One 6 (2011) 1–12. [23] E. Ruppin, J.A. Papin, L.F. de Figueiredo, S. 
Schuster, Metabolic reconstruction, constraint-based analysis and game theory to probe genome-scale metabolic networks, Current Opinion in Biotechnology 21 (2010) 502–510. [24] J.L. Reed, T.D. Vo, C.H. Schilling, B.O. Palsson, An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR), Genome Biology 4 (2003) R54. [25] P.D. Karp, et al., Expansion of the BioCyc collection of pathway/genome databases to 160 genomes, Nucleic Acids Research 33 (2005) 6083–6089. [26] H. Ogata, et al., KEGG: Kyoto encyclopedia of genes and genomes, Nucleic Acids Research 27 (1999) 29–34. [27] R. Overbeek, et al., The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes, Nucleic Acids Research 33 (2005) 5691–5702. [28] J. Schellenberger, J.O. Park, T.M. Conrad, B.O. Palsson, BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions, BMC Bioinformatics 11 (2010) 213. [29] P. Lamesch, et al., The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools, Nucleic Acids Research 40 (2012) D1202–D1210. [30] P.F. Zhang, et al., MetaCyc and AraCyc. Metabolic pathway databases for plant research, Plant Physiology 138 (2005) 27–37. [31] D. Croft, et al., Reactome: a database of reactions, pathways and biological processes, Nucleic Acids Research 39 (2011) D691–D697. [32] C.S. Henry, et al., High-throughput generation, optimization and analysis of genome-scale metabolic models, Nature Biotechnology 28 (2010), 977-U922. [33] J.L. Heazlewood, R.E. Verboom, J. Tonti-Filippini, I. Small, A.H. Millar, SUBA: the Arabidopsis subcellular database, Nucleic Acids Research 35 (2007) D213–D218. [34] Q. Sun, et al., PPDB, the Plant Proteomics Database at Cornell, Nucleic Acids Research 37 (2009) D969–D974. [35] S. Reumann, C.L. Ma, S. Lemke, L. Babujee, AraPerox., A database of putative Arabidopsis proteins from plant peroxisomes, Plant Physiology 136 (2004) 2587–2608. [36] B. 
Boeckmann, et al., The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, Nucleic Acids Research 31 (2003) 365–370. [37] H.P.J. Bonarius, G. Schmid, J. Tramper, Flux analysis of underdetermined metabolic networks: the quest for the missing constraints, Trends in Biotechnology 15 (1997) 308–314. [38] A. Varma, B.O. Palsson, Metabolic flux balancing – basic concepts, scientific and practical use, Bio-Technology 12 (1994) 994–998. [39] N.D. Price, J.L. Reed, B.O. Palsson, Genome-scale models of microbial cells: evaluating the consequences of constraints, Nature Reviews Microbiology 2 (2004) 886–897. [40] A.M. Feist, B.O. Palsson, The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli, Nature Biotechnology 26 (2008) 659–667. [41] R. Mahadevan, B.O. Palsson, D.R. Lovley, In situ to in silico and back: elucidating the physiology and ecology of Geobacter spp. using genome-scale modelling, Nature Reviews Microbiology 9 (2011) 39–50. [42] C.B. Milne, P.J. Kim, J.A. Eddy, N.D. Price, Accomplishments in genome-scale in silico modeling for industrial and medical biotechnology, Biotechnology Journal 4 (2009) 1653–1670. [43] J.A. Papin, N.D. Price, S.J. Wiback, D.A. Fell, B.O. Palsson, Metabolic pathways in the post-genome era, Trends in Biochemical Sciences 28 (2003) 250–258. [44] E.T. Papoutsakis, Equations and calculations for fermentations of butyric acid bacteria, Biotechnology and Bioengineering 26 (1984) 174–187. [45] C.S. Henry, M.D. Jankowski, L.J. Broadbelt, V. Hatzimanikatis, Genome-scale thermodynamic analysis of Escherichia coli metabolism, Biophysical Journal 90 (2006) 1453–1461. [46] M.D. Jankowski, C.S. Henry, L.J. Broadbelt, V. Hatzimanikatis, Group contribution method for thermodynamic analysis of complex metabolic networks, Biophysical Journal 95 (2008) 1487–1499. [47] J.L. Reed, B.O. 
Palsson, Thirteen years of building constraint-based in silico models of Escherichia coli, Journal of Bacteriology 185 (2003) 2692–2699. [48] R.S. Senger, E.T. Papoutsakis, Genome-scale model for Clostridium acetobutylicum: Part II. Development of specific proton flux states and numerically determined sub-systems, Biotechnology and Bioengineering 101 (2008) 1053–1071. [49] K. Yizhak, T. Benyamini, W. Liebermeister, E. Ruppin, T. Shlomi, Integrating quantitative proteomics and metabolomics with a genome-scale metabolic network model, Bioinformatics 26 (2010) i255–i260. [50] A.M. Feist, B.O. Palsson, The biomass objective function, Current Opinion in Microbiology 13 (2010) 344–349. [51] S.A. Becker, et al., Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox, Nature Protocols 2 (2007) 727–738. [52] J. Schellenberger, et al., Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0, Nature Protocols 6 (2011) 1290–1307. [53] R. Mahadevan, C.H. Schilling, The effects of alternate optimal solutions in constraint-based genome-scale metabolic models, Metabolic Engineering 5 (2003) 264–276. [54] K. Smallbone, E. Simeonidis, Flux balance analysis: a geometric perspective, Journal of Theoretical Biology 258 (2009) 311–315. [55] E.P. Gianchandani, M.A. Oberhardt, A.P. Burgard, C.D. Maranas, J.A. Papin, Predicting biological system objectives de novo from internal state measurements, BMC Bioinformatics 9 (2008) 43. [56] J.S. Edwards, B.O. Palsson, The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities, Proceedings of the National Academy of Sciences of the United States of America 97 (2000) 5528–5533. [57] J. Forster, I. Famili, B. Palsson, J. Nielsen, Large-scale evaluation of in silico gene deletions in Saccharomyces cerevisiae, OMICS 7 (2003) 193–202. [58] I. 
Thiele, et al., A community effort towards a knowledge-base and mathematical model of the human pathogen Salmonella Typhimurium LT2, BMC System Biology 5 (2011) 8. [59] D. Segre, D. Vitkup, G.M. Church, Analysis of optimality in natural and perturbed metabolic networks, Proceedings of the National Academy of Sciences of the United States of America 99 (2002) 15112–15117. [60] T. Shlomi, O. Berkman, E. Ruppin, Regulatory on/off minimization of metabolic flux changes after genetic perturbations, Proceedings of the National Academy of Sciences of the United States of America 102 (2005) 7695–7700. [61] A.P. Burgard, P. Pharkya, C.D. Maranas, OptKnock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization, Biotechnology and Bioengineering 84 (2003) 647–657. [62] J. Kim, J.L. Reed, OptORF: optimal metabolic and regulatory perturbations for metabolic engineering of microbial strains, BMC System Biology 4 (2010) 53. [63] I. Rocha, et al., OptFlux: An open-source software platform for in silico metabolic engineering, BMC System Biology 4 (2010) 45. [64] R. Mahadevan, Constraint-based genome-scale models of cellular metabolism, in: C.D. Smolke (Ed.), The Metabolic Pathway Engineering Handbook, Taylor & Francis Group, Boca Raton, 2010. [65] J. Lee, H. Yun, A.M. Feist, B.O. Palsson, S.Y. Lee, Genome-scale reconstruction and in silico analysis of the Clostridium acetobutylicum ATCC 824 metabolic network, Applied Microbiology and Biotechnology 80 (2008) 849–862. [66] P.S. Schnable, et al., The B73 maize genome: complexity, diversity, and dynamics, Science 326 (2009) 1112–1115. [67] D. Shintani, D. DellaPenna, Elevating the vitamin E content of plants through metabolic engineering, Science 282 (1998) 2098–2100. [68] P.L. Nagy, G.M. McCorkle, H. Zalkin, PurU a source of formate for PurT-dependent phosphoribosyl-N-formylglycinamide synthesis, Journal of Bacteriology 175 (1993) 7066–7073. [69] E. Collakova, et al., Arabidopsis 10-formyl tetrahydrofolate deformylases are essential for photorespiration, Plant Cell 20 (2008) 1818–1832. [70] R.J. Bino, et al., Potential of metabolomics as a functional genomics tool, Trends in Plant Science 9 (2004) 418–425. [71] E. Pichersky, D.R. Gang, Genetics and biochemistry of secondary metabolites in plants: an evolutionary perspective, Trends in Plant Science 5 (2000) 439–445. [72] K. Saito, F. Matsuda, Metabolomics for functional genomics, systems biology, and biotechnology, in: S. Merchant, W.R. Briggs, D. Ort (Eds.), Annual Review of Plant Biology, 2010, pp. 463–489. [73] O. Fiehn, Metabolomics – the link between genotypes and phenotypes, Plant Molecular Biology 48 (2002) 155–171. [74] A.R. Fernie, Metabolome characterisation in plant system analysis, Functional Plant Biology 30 (2003) 111–120. [75] V.S. Kumar, M.S. Dasika, C.D. Maranas, Optimization based automated curation of metabolic reconstructions, BMC Bioinformatics 8 (2007) 212. [76] G. Jander, V. Joshi, Aspartate-derived amino acid biosynthesis in Arabidopsis thaliana, in: The Arabidopsis Book, The American Society of Plant Biologists, 2009, pp.c1–15. [77] U. Gophna, E. Bapteste, W.F. Doolittle, D. Biran, E.Z. Ron, Evolutionary plasticity of methionine biosynthesis, Gene 355 (2005) 48–57. [78] B.R. Epperly, E.E. Dekker, L-Threonine dehydrogenase from Escherichia coli – identification of an active-site cysteine residue and metal-ion studies, Journal of Biological Chemistry 266 (1991) 6086–6092. [79] A.J. Edgar, Molecular cloning and tissue distribution of mammalian L-threonine 3-dehydrogenases, BMC Biochemistry 3 (2002) 19. [80] D. Amador-Noguez, et al., Systems-level metabolic flux profiling elucidates a complete, bifurcated tricarboxylic acid cycle in Clostridium acetobutylicum, Journal of Bacteriology 192 (2010) 4452–4461. [81] S.B. Crown, et al., Resolving the TCA cycle and pentose-phosphate pathway of Clostridium acetobutylicum ATCC 824: isotopomer analysis, in vitro activities and expression analysis, Biotechnology Journal 6 (2011) 300–305. [82] R.S. Senger, E.T. Papoutsakis, Genome-scale model for Clostridium acetobutylicum: Part I. Metabolic network resolution and analysis, Biotechnology and Bioengineering 101 (2008) 1036–1052. [83] A.M. Feist, et al., A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information, Molecular Systems Biology 3 (2007) 121. [84] V.M.E. Andriotis, N.J. Kruger, M.J. Pike, A.M. Smith, Plastidial glycolysis in developing Arabidopsis embryos, New Phytologist 185 (2010) 649–662. [85] S. Baud, I.A. Graham, A spatiotemporal analysis of enzymatic activities associated with carbon metabolism in wild-type and mutant embryos of Arabidopsis using in situ histochemistry, Plant Journal 46 (2006) 155–169. [86] L.E. Anderson, J.A. Bryant, A.A. Carol, Both chloroplastic and cytosolic phosphoglycerate kinase isozymes are present in the pea leaf nucleus, Protoplasma 223 (2004) 103–110. [87] W.C. Plaxton, The organization and regulation of plant glycolysis, Annual Review of Plant Physiology and Plant Molecular Biology 47 (1996) 185–214. [88] P. Giege, et al., Enzymes of glycolysis are functionally associated with the mitochondrion in Arabidopsis cells, Plant Cell 15 (2003) 2140–2151. [89] L.W. Sumner, et al., Spatially resolved plant metabolomics, in: R.D. Hall (Ed.), Annu. Plant Rev. Biol. Plant Metabolomics, John Wiley & Sons, Chichester, West Sussex, UK, 2011, pp. 343–366. [90] S. Klie, et al., Analysis of the compartmentalized metabolome – a validation of the non-aqueous fractionation technique, Frontiers in Plant Science 2 (2011) 1–27. [91] J. Grassl, N.L. Taylor, A.H. Millar, Matrix-assisted laser desorption/ionisation mass spectrometry imaging and its development for plant protein imaging, Plant Methods 7 (2011) 21. [92] T. Harada, et al., Visualization of volatile substances in different organelles with an atmospheric-pressure mass microscope, Analytical Chemistry 81 (2009) 9153–9157. [93] N.R. Baker, J. Harbinson, D.M. Kramer, Determining the limitations and regulation of photosynthetic energy transduction in leaves, Plant, Cell and Environment 30 (2007) 1107–1125. [94] J. Schwender, J.B. Ohlrogge, Y. Shachar-Hill, A flux model of glycolysis and the oxidative pentosephosphate pathway in developing Brassica napus embryos, Journal of Biological Chemistry 278 (2003) 29442–29453. [95] C. Miyake, Alternative electron flows (water-water cycle and cyclic electron flow around PSI) in photosynthesis: molecular mechanisms and physiological functions, Plant Cell Physiology 51 (2010) 1951–1963. [96] A.L. Meadows, R. Karnik, H. Lam, S. Forestell, B. Snedecor, Application of dynamic flux balance analysis to an industrial Escherichia coli fermentation, Metabolic Engineering 12 (2010) 150–160. [97] J. Kilian, et al., The AtGenExpress global stress expression data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses, Plant Journal 50 (2007) 347–363. [98] Y. Shimada, H. Goda, Y. Yoshida, An introduction to AtGenExpress. How to use the data sets and what to learn from large scale gene expression analysis, Plant Cell Physiology 46 (2005), S141-S141. [99] T.C.R. Williams, et al., Metabolic network fluxes in heterotrophic Arabidopsis cells: stability of the flux distribution under different oxygenation conditions, Plant Physiology 148 (2008) 704–718. [100] C.J. Baxter, et al., The metabolic response of heterotrophic Arabidopsis cells to oxidative stress, Plant Physiology 143 (2007) 312–325. [101] M.
Lehmann, et al., The metabolic response of Arabidopsis roots to oxidative stress is distinct from that of heterotrophic cells in culture and highlights a complex relationship between the levels of transcripts, metabolites, and flux, Molecular Plant 2 (2009) 390–406. [102] M.G. Poolman, ScrumPy: metabolic modelling with Python, IEE Proceedings – Systems Biology 153 (2006) 375–378. [103] D.K. Allen, J.B. Ohlrogge, Y. Shachar-Hill, The role of light in soybean seed filling metabolism, Plant Journal 58 (2009) 220–234. [104] R.G. Ratcliffe, Y. Shachar-Hill, Measuring multiple fluxes through plant metabolic networks, Plant Journal 45 (2006) 490–511. [105] W. Wiechert, M. Mollney, S. Petersen, A.A. de Graaf, A universal framework for C-13 metabolic flux analysis, Metabolic Engineering 3 (2001) 265–283. [106] W. Wiechert, A.A. deGraaf, Bidirectional reaction steps in metabolic networks: 1. Modeling and simulation of carbon isotope labeling experiments, Biotechnology and Bioengineering 55 (1997) 101–117. [107] A.P. Alonso, V.L. Dale, Y. Shachar-Hill, Understanding fatty acid synthesis in developing maize embryos using metabolic flux analysis, Metabolic Engineering 12 (2010) 488–497. [108] A.P. Alonso, F.D. Goffman, J.B. Ohlrogge, Y. Shachar-Hill, Carbon conversion efficiency and central metabolic fluxes in developing sunflower (Helianthus annuus L.) embryos, Plant Journal 52 (2007) 296–308. [109] A.P. Alonso, et al., A metabolic flux analysis to study the role of sucrose synthase in the regulation of the carbon partitioning in central metabolism in maize root tips, Metabolic Engineering 9 (2007) 419–432. [110] A.P. Alonso, D.L. Val, Y. Shachar-Hill, Central metabolic fluxes in the endosperm of developing maize seeds and their implications for metabolic engineering, Metabolic Engineering 13 (2011) 96–107. [111] J. Schwender, F. Goffman, J.B. Ohlrogge, Y. 
Shachar-Hill, Rubisco without the Calvin cycle improves the carbon efficiency of developing green seeds, Nature 432 (2004) 779–782. [112] K. Baerenfaller, et al., Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics, Science 320 (2008) 938–941. [113] I. Andersson, A. Backlund, Structure and function of Rubisco, Plant Physiology and Biochemistry 46 (2008) 275–291. [114] S. Baud, J.P. Boutin, M. Miquel, L. Lepiniec, C. Rochat, An integrated overview of seed development in Arabidopsis thaliana ecotype WS, Plant Physiology and Biochemistry 40 (2002) 151–160. [115] R.J. Weselake, et al., Increasing the flow of carbon into seed oil, Biotechnology Advances 27 (2009) 866–878. [116] M.J. Holdsworth, L. Bentsink, W.J.J. Soppe, Molecular networks regulating Arabidopsis seed maturation, after-ripening, dormancy and germination, New Phytologist 179 (2008) 33–54. [117] P.D. Brown, J.G. Tokuhisa, M. Reichelt, J. Gershenzon, Variation of glucosinolate accumulation among different organs and developmental stages of Arabidopsis thaliana, Phytochemistry 62 (2003) 471–481. [118] S.X. Chen, B.L. Petersen, C.E. Olsen, A. Schulz, B.A. Halkier, Long-distance phloem transport of glucosinolates in Arabidopsis, Plant Physiology 127 (2001) 194–201. [119] H.H. Nour-Eldin, B.A. Halkier, Piecing together the transport pathway of aliphatic glucosinolates, Phytochemistry Reviews 8 (2009) 53–67. [120] D.K. Allen, I.G.L. Libourel, Y. Shachar-Hill, Metabolic flux analysis in plants: coping with complexity, Plant, Cell and Environment 32 (2009) 1241–1257. [121] J.T. van Dongen, et al., Phloem import and storage metabolism are highly coordinated by the low oxygen concentrations within developing wheat seeds, Plant Physiology 135 (2004) 1809–1821. [122] H. Rolletschek, W. Weschke, H. Weber, U. Wobus, L. 
Borisjuk, Energy state and its control on seed development: starch accumulation is associated with high ATP and steep oxygen gradients within barley grains, Journal of Experimental Botany 55 (2004) 1351–1359. 70 E. Collakova et al. / Plant Science 191–192 (2012) 53–70 [123] M.M. Paluska, A.K. Dobrenz, R.T. Ramage, Seed size and seedling components in Arivat barley, Journal of the Arizona-Nevada Academy of Science 14 (1979) 88–90. [124] R. Schuetz, L. Kuepfer, U. Sauer, Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli, Molecular Systems Biology 3 (2007) 119. [125] X.W. Chen, A.P. Alonso, D.K. Allen, J.L. Reed, Y. Shachar-Hill, Synergy between 13 C-metabolic flux analysis and flux balance analysis for understanding metabolic adaption to anaerobiosis in E. coli, Metabolic Engineering 13 (2011) 38–48. [126] J. Schwender, J.B. Ohlrogge, Probing in vivo metabolism by stable isotope labeling of storage lipids and proteins in developing Brassica napus embryos, Plant Physiology 130 (2002) 347–361. [127] B.H. Junker, J. Lonien, L.E. Heady, A. Rogers, J. Schwender, Parallel determination of enzyme activities and in vivo fluxes in Brassica napus embryos grown on organic or inorganic nitrogen source, Phytochemistry 68 (2007) 2232–2242. [128] J. Schwender, Y. Shachar-Hill, J.B. Ohlrogge, Mitochondrial metabolism in developing embryos of Brassica napus, Journal of Biological Chemistry 281 (2006) 34040–34047. [129] G.J. Niemann, J.B.M. Pureveen, G.B. Eijkel, H. Poorter, J.J. Boon, Differential chemical allocation and plant adaptation – A Py-Ms study of 24 species differing in relative growth rate, Plant Soil 175 (1995) 275–289. [130] R.R. Wise, J.K. Hoober, Synthesis, export and partitioning of end products of photosynthesis, in: R.R. Wise, J.K. Hoober (Eds.), Structure and Function of Plastids, Springer, Dordrecht, The Netherlands, 2007, pp. 274–288. [131] H. Poorter, M. 
Bergkotte, Chemical composition of 24 wild species differing in relative growth rate, Plant, Cell and Environment 15 (1992) 221–229. [132] G.E. Edwards, V.R. Franceschi, E.V. Voznesenskaya, Single-cell C4 photosynthesis versus the dual-cell (Kranz) paradigm, Annual Review of Plant Biology 55 (2004) 173–196. [133] W. Majeran, Y. Cai, Q. Sun, K.J. van Wijk, Functional differentiation of bundle sheath and mesophyll maize chloroplasts determined by comparative proteomics, Plant Cell 17 (2005) 3111–3140. [134] S. von Caemmerer, R.T. Furbank, The C4 pathway: an efficient CO2 pump, Photosynthesis Research 77 (2003) 191–207. [135] W.R. Chen, X. Yang, Z.L. He, Y. Feng, F.H. Hu, Differential changes in photosynthetic capacity, 77K chlorophyll fluorescence and chloroplast ultrastructure between Zn-efficient and Zn-inefficient rice genotypes (Oryza sativa) under low zinc stress, Physiologia Plantarum 132 (2008) 89–101. [136] C.H. Foyer, H. Vanacker, L.D. Gomez, J. Harbinson, Regulation of photosynthesis and antioxidant metabolism in maize leaves at optimal and chilling temperatures: review, Plant Physiology and Biochemistry 40 (2002) 659–668. [137] S.I. Allakhverdiev, et al., Heat stress: an overview of molecular responses in photosynthesis, Photosynthesis Research 98 (2008) 541–550. [138] Z.R. Li, S. Wakao, B.B. Fischer, K.K. Niyogi, Sensing and responding to excess light, Annual Review of Plant Biology 60 (2009) 239–260. [139] O.J. Sanchez, C.A. Cardona, Trends in biotechnological production of fuel ethanol from different feedstocks, Bioresource Technology 99 (2008) 5270–5295. [140] V. Mechin, et al., In search of a maize ideotype for cell wall enzymatic degradability using histological and biochemical lignin characterization, Journal of Agricultural and Food Chemistry 53 (2005) 5872–5881. [141] A. Brautigam, U. Gowik, What can next generation sequencing do for you? Next generation sequencing as a valuable tool in plant research, Plant Biology 12 (2010) 831–841. [142] F.M. 
Canovas, et al., Plant proteome analysis, Proteomics 4 (2004) 285–298. [143] J.D. Young, A.A. Shastri, G. Stephanopoulos, J.A. Morgan, Mapping photoautotrophic metabolism with isotopically nonstationary 13 C flux analysis, Metabolic Engineering 13 (2011) 656–665.