In Silico Evolution of early Metabolism. Alexander Ullrich, Christoph Flamm, Markus Rohrschneider, Peter F. Stadler Abstract Scenarios of Evolution We developed a simulation tool for the evolution of early metabolism, allowing us to speculate on the formation of metabolic pathways from catalyzed chemical reactions and network properties. Our model consists of a protocellular entity with a simple RNAbased genetic system and an evolving metabolism of catalytically active ribozymes that manipulate a rich underlying chemistry. Ensuring an almost openended and fairly realistic simulation is crucial for understanding the first steps in metabolic evolution. We show here how our simulation tool can be helpful in arguing for or against hypotheses on the evolution of metabolic pathways. We demonstrate that seemingly mutually exclusive hypotheses may well be compatible when we take into account that different processes dominate different phases in the evolution of a metabolic system. Our results suggest that forward evolution shapes metabolic network in the very early steps of evolution. In later and more complex stages, enzyme recruitment supersedes forward evolution, keeping a core set of pathways from the early phase. Energy Energy Energy Energy Energy Energy Biomass Biomass Biomass (a) (b) Forward Evolution Assumes that an organism is able to make use of certain molecules from the environment. Individuals that can produce these beneficial molecules by themselves gain an advantage in selection in the case of depletion of the “food source”. Therefore, new chemical reactions are added that produce beneficial molecules from precursors that are abundant in the environment or that are produced in turn by the organism’s metabolism. As a consequence, one should observe more ancient enzymes downstream in present-day metabolic pathways. Towards the entry point of the pathway, younger and younger going enzymes should be found. Forward evolution could be seen as an extension or counterpart of the backward evolution hypothesis, reversing the direction of pathway evolution. Granick, and later Cordon, argue for a pathway evolution in forward direction, requiring that the intermediates are already beneficial to the organism. This is in particular plausible for catabolic pathways, where the organism can extract more energy by breaking food molecules downs to simpler and simpler end products. Older enzymes are then expected to be upstream in the pathway, with younger enzymes appearing further downstream. Patchwork Model Shell Hypothesis The patchwork model explains the formation of pathways by recruiting enzymes from existing pathways. The recruited enzymes may change their reaction chemistry and metabolic function in the new pathways and specialize later trough evolution. This introduction of new catalytic activities lead to a selective advantage. Looking at the constitution of a pathway formed by enzyme recruitment, we should observe a mosaic-like picture of older and younger enzymes mixed throughout the pathway. The shell hypothesis was proposed by Morowitz. It argues for the case of the reductive citric acid cycle that in the beginning an auto-catalytic core is formed from which new catalytic activities and pathways could be recruited and fed. Thus a metabolic shell would form around this core. Enzymes in the core would likely be less prone to mutational changes because they are essential for the organism. Thus, one should still be able to observe a core of ancient enzymes. Existing New Existing Backward Evolution same chemistry Core Shell Recruit (c) (d) Hypotheses about the formation and evolution of metabolic pathways [1]. (a) Backward evolution, (b) Forward evolution, (c) Patchwork model, (d) Shell hypothesis. Colored squares represent enzymes, gray circles are metabolites. Color encoding for enzymes stand for their age, red being older and blue being younger enzymes. classification tree of organic reactions for functional assignment of the catalytic set (Step B). Thus we have implemented an evolvable sequence-to-function map, allowing the metabolic organization to escape from the confines of the chemical space set by the initial conditions of the simulation. The metabolic subsystem is built upon a graph-based artificial chemistry endowed with a built-in thermodynamics. To generate the metabolic reaction network, induced by the catalytic set on the set of metabolites, a rule-based stochastic simulation is performed. Reaction rates are calculated “on the fly” from the chemical graphs of the reactants. To identify the elementary flux modes, i.e., extreme pathways, of the resulting reaction network, a metabolic flux analysis is performed. (Step D). The fitness of an organism is computed as the maximum of the yield function (e.g. biomass production) over all extreme pathways. Finally, genetic variation operators are applied to the genome (Step E). U A CC UG G G AA A UU GG U C AG CG C CG A B E C flux v B C v The computational model ([2]) is composed of a genetic and a metabolic subsystem. The genetic subsystem is implemented as a cyclic RNA genome. A special sequence motif indicates the start of genes which are of constant length. The RNA sequence corresponding to the “coding sequence” of a gene is folded into the (secondary) structure using the Vienna RNA Package(Step A). During chemical reactions, bond formation/breaking is confined to a small subset of atoms of the reacting molecules. A cyclic graph abstraction, called the imaginary transition state (ITS) [3], can be used to capture the changes in the reactive center. Furthermore, over 90% of all known organic reactions can be classified by their ITS and organized in a hierarchical structure [3]. Sequence and structure features of the folded RNA gene products are mapped into the flu x Computational Model O D flux v O A Scheme of the simulation system. (A) Decoding of (RNA-)genes to catalytic molecules; (B) Assignment of catalytic functions to “ribozymes”, through mapping from structural and sequential information of the RNA molecule to a reaction logo in the hierarchy [3]; (C) Construction and stochastic simulation of the metabolic network; (D) Metabolic Flux analysis and fitness evaluation; (E) Application of genetic variation operators. Results For this study, we performed 100 runs with a population size of 100 cells running for 100 generations and performing 100 network expansion (stochastic simulation) steps per generation, the input molecules were cyclobutadiene, ethenol, phthalic anhydride, methylbutadiene, and cyclohexa-1,3-diene. We use data from a simple simulation run, starting with only two input molecules and developing only few enzymes, for the visualization of an evolutionary time series and the reaction lifetime overview. Another, more complex, setting is used in a simulation run in which we investigate the evolutionary history of the involved genes/enzymes, depicted in the catalytic function genealogy for all generations. 4230404140 20402040 204040 4240204040 302040 424040 3141404021 21424020 413140414140 31404140 214141404140 SOURCE C6O6 2140 4040 SOURCE C6O6 2140 4040 2140 4040 C12 O12 2140 4040 C12 O12 C12 O12 C12O12 2140 4040 2140 4040 2140 4040 204040 C12 O12 C12 O12 C12 O12 C6O5 2140 4040 2141414020 C6O6 C6O6 2140 4040 2140 4040 C12O12 C12O12 C12O12 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 C12O12 C12O12 C12O12 C12 O12 C12O12 402130 21414040 2140 4040 204040 2140 4040 204040 C12O11 C12O11 204040 relative metabolite connectivity relative enzyme connectivity 30 20 10 C 0 20 40 60 80 C12O11 C12O11 214120 C12O11 214140 204040 C12O10 C 214040 SINK SINK 412020 (a) (b) 424131 424041404040 5 SOURCE C6O6 424130 SOURCE C6O6 C6O6 C6O6 4140402140 0 100 204040 10 2140 4040 0 C12O11 2140 4040 2140 4040 204040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 C12O12 C12 O12 C12 O12 C12 O12 C12 O12 C12 O12 C12O12 2140 4040 2140 4040 204040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 2140 4040 C12 O12 C12O12 C12 O12 C12 O12 C12 O12 C12 O12 4240402140 0 occurrence time of enzyme [generations] 20 40 60 80 C12 O12 100 occurrence time of metabolite [generations] 204040 204040 C12 O12 C6O5 204040 2140 4040 C12 O12 2140 4040 2140 4040 204040 C12 O11 C12O11 C12 O12 C12O12 C12O12 204040 204040 C12O11 C12O11 C12O11 C12O11 C12O11 204040 C12O11 204040 C12 O11 204040 204040 C12 O11 204040 204040 204040 C12 O12 C6O5 204040 2140 4040 C12 O12 2140 4040 C12 O11 2140 4040 2140 4040 C12O11 C12O11 204040 204040 2140 4040 204040 C12 O11 204040 C12O11 C12O11 204040 204040 C12 O12 C12O12 204040 204040 C12 O11 C12 O11 C12O11 204040 204040 C12O11 C12 O11 204040 414141402140 204040 C12O11 404040 204040 4020304040 C12O10 Average relative connectivity of (left) enzymes and (right) metabolites introduced in the same generation, for 100 generations. The height of the bars shows the fraction of the overall connections that are accounted by enzymes/metabolites from a particular generation. All values are averages over 100 simulation runs. Input molecules are not considered in the statistic, they account for nearly 50 percent of metabolite connectivity. 1 1 forward links backward links forward pathways backward pathways 0.8 relative abundance relative abundance 0.8 0.6 0.4 0.2 forward links backward links forward pathways backward pathways C12O10 C SINK C12 O10 C12O10 C 21404040 SINK (c) 403040 (d) 414020 214020 A series of simulated metabolic networks after (a) 10, (b) 30, (c) 66, and (d) 100 generations. Colored squares represent chemical reactions, gray circles represent metabolites. Metabolites involved in a reaction are connected to it in the network graph. The coloring for the reactions encode their age, where red stands for older (occurrence in early generation) and blue for newer (later generation) reactions. 20412040 413130 402040 21424040 0.6 0 0 10 10 20 20 30 30 40 40 50 50 60 60 70 70 80 80 90 90 100 100 54 54 202040 52 52 50 50 48 48 46 46 0.4 402140 44 44 42 42 40 40 38 38 4021404020 413140 28 28 26 26 0 20 40 60 time [generations] 80 0 We have introduced a simulation tool that models the early evolution of metabolism in a quite realistic setting and provides many tools for the detailed investigation of metabolic evolution. Using both simple example and a series of more complex simulation runs, the evolution of the components on the small scale (metabolites, enzymes) as well as on systems (pathways, networks) was investigated. The simulations allow to discriminate between different scenarios for the evolution of metabolic pathways. Based on the observations from this study, we argue that the different evolutionary hypotheses can be reconciled, in that they act in different phases of evolution, i.e. in different scenarios we might observe another strategy at work. Here, we suggest that forward evolution dominates in the earliest steps and is then superseded by a phase of enzyme recruitment, however, leaving behind a trace in form of a core set of forward evolved pathways. 36 36 34 34 0.2 32 32 30 30 0 Conclusion 0 20 40 60 213140 24 24 22 22 80 time [generations] 20 20 18 18 414141 16 16 14 14 414040 12 12 10 10 Evolutionary history of simulated metabolic networks. For the first 100 generations, we show the number of links and pathways that conform to the forward and backward evolution scenarios, respectively. Links are pairs of (left) consecutive reactions or (right) consecutive metabolites along a pathway. A pathway is identified as “forward-evolved” if at least one of its links is forward and none backward. In the first generations, the network consists predominantly of forward (reaction) links and pathways. After about 20 generations, the relative abundance of forward pathways decreases drastically but quickly reaches a persistent plateau value. We find that enzymes from earlier stages are significantly higher connected than those from later stages. Further, we observe forward evolution in the first period succeeded by patchwork evolution, however keeping a core set of pathways from the early steps. 0 0 10 10 20 20 30 30 40 40 50 50 60 60 70 70 80 80 90 90 8 8 6 6 4 4 2 2 214030 0 0 100 100 412030 Life Time Diagram: Reactions Life Time Diagram: Reactions 312040 Life-time diagram for metabolic reactions. (left) Life-time of reactions, (right) union network graph over all 100 generations. The reactions in the life-time diagram are positioned corresponding to their position in the union network graph. The rows have colored entries if the corresponding reaction was present at a certain generation (columns 1-100). Older reactions are red, newer blue. The colored bars show the age distribution of reactions in the network in the same order as in the lifetime overview. The first bar represents our results, following the pattern for backward evolution, forward evolution and the patchwork model. In an early and simple environment forward evolution is the main evolutionary force in our simulations. In longer simulations we observe all kinds of evolutionary events, such as divergence, convergence, multiplication and deletion. 2140304140 Genealogy of catalytic functions and gene dosage over 2000 generations. Each row represents an observed catalytic function. Black horizontal lines indicate time intervals in which genes coding for that catalytic function were present in the genome (0-200: from left to right). The thickness of the black lines indicates the number genes with a given function. Thin vertical red lines indicate points where the accumulation of mutations caused a transitions between catalytic functions. If the number of genes copies in a function class increases without a transition from another gene, then the increase is due to a gene duplication. A new gene can be created in the genome through the fortuitous formation of a TATA-box. Conversely, a gene can vanish if its TATA-box is destroyed by mutation. On the left of the chart a numerical encoding of the graph transformations performed by the “enzyme” is plotted. References [1] G. Caetano-Anolles et. al. The origin and evolution of modern metabolism. Inter J Biochem and Cell Biol (2009). [2] Flamm et. al. Evolution of metabolic networks: A computational framework. J. Syst. Chem. (2010) [3] J.B. Hendrickson. Comprehensive system for classification and nomenclature of organic reactions. J Chem Inf ComputSci (1997) Albeit our simulation environment is still a drastic simplification of chemistry, it is realistic enough to investigate the evolution of early metabolism. Computer simulations like this one are likely to provide new insights about the general evolutionary mechanisms governing biological systems in particular in regimes that are not readily observable. Our approach of a realistic, yet computationally feasible, model appears to be a promising step in this direction.
© Copyright 2026 Paperzz