Originally published 8 May 2008; corrected 9 June 2008 www.sciencemag.org/cgi/content/full/1154456/DC1 Supporting Online Material for Predictive Behavior Within Microbial Genetic Networks Ilias Tagkopoulos, Yir-Chung Liu, Saeed Tavazoie* *To whom correspondence should be addressed. E-mail: [email protected] Published 8 May 2008 on Science Express DOI: 10.1126/science.1154456 This PDF file includes: SOM Text Figs. S1 to S21 Table S1 References Correction (9 June 2008): Minor corrections to data (in SOM text and in legend of fig. S17). 1. Evolution in Variable Environments (EVE) 1.1 Simulation environment In our simulation framework (EVE: Evolution in Variable Environments) populations of in silico organisms compete and evolve under user-specified dynamic environments. Each organism (cell) contains a biochemical network whose chemical species and their interactions/transformations are modeled according to the “central dogma”. Simulations utilized multi-node supercomputer clusters (BlueGene/L and Beowulf running on an average of 500 node workload for over 2 years). Recent studies (S1-5) have made significant progress in simulating biochemical networks and extracting novel insights in the context of focused biological questions (e.g. circadian rhythms, chemotaxis, segmentation, etc.). Extending from these works, the EVE framework integrates many features that improve the biochemical, evolutionary, and ecological realism of our simulations, features that are crucial for simulating microbial regulatory networks in the context of interactions with the environment. From the biochemical perspective, dynamics are modeled as stochastic events, not ordinary differential equations, a feature that accounts for phenomena arising from small numbers of molecules. Additionally, We take into account the full range of molecular species and their interactions, including genes, mRNAs, siRNAs, miRNAs, proteins, and modified proteins. From the evolutionary perspective, EVE is designed to simulate growth, mutation, and selection in a highly realistic fashion. EVE uses an asynchronous simulation framework, where events such as cell-division, death, mutations etc., occur in a parallel, asynchronous fashion. Another important distinction is the way mutations are modeled, having a range of types and magnitudes that map to specific biological counterparts. Another important dimension of realism is the capacity for gene-duplication and divergence which is a recurring theme in the evolution of complex behavior in many of our experiments. Another very important advantage of EVE—and essential for addressing our primary thesis here—is the way in which the evolving organisms interact with their environments, sensing and responding to dynamic and temporally structured signals. In this context, our organisms are not evolved to explicitly implement specific functions (e.g. an oscillator), rather they evolve to maximize fitness through learning the probabilistic relationships between signals they can sense and the abundance of resources. For ease of interpretation, our simulations here have focused on sharply structured environments that correspond to familiar logic gates (e.g. delayed dynamic XOR). However, real microbial habits are likely characterized by probabilistic relationships amongst a large number of variables that show rather complex and fuzzy relationships with fitness. This is a fundamental departure from the traditional view of the relationship between regulatory-network structure and function. Our simulation framework provides the capacity to explore this new paradigm. The details of the EVE simulation environment are provided below. 1.2 Network components Each organism comprises three types of nodes with distinct functionalities: • • • Gene/RNA Nodes Protein Nodes Modified Protein Nodes Gene/RNA Nodes capture gene regulation and transcription of RNA (mRNA or siRNA). Protein Nodes summarize the translation of mRNA to proteins and model post-transcriptional regulation. Modified Protein Nodes capture the modification (phosphorylation, acetylation, etc) of proteins and other conformational changes that may lead to gain/loss of function. The value of each node is a discrete variable that corresponds to the number of node molecules present at a particular time. We define as node triplet any set of a Gene/RNA Node and its corresponding Protein and Modified Protein Nodes. 1.3 Network Topology The essence of each artificial organism is its internal interaction network. This can be depicted as a directed graph where each vertex is a Gene/RNA, Protein or Modified Protein node, and each edge a regulatory dependency between the two vertices that it connects. Edges are directional and their values can be either negative or positive depending on the type of interaction. A quantitative representation of the resulting graph is the interaction matrix W, where any non-zero value wij denotes the strength and regulation type (negative – positive) of node i by node j. Figure S1 depicts a randomly created internal network and its interaction matrix. 1.4 A probabilistic dynamical model We created a probabilistic dynamical model of intracellular biochemical interactions that regards an organism as a well-stirred and spatially homogeneous reaction chamber. This allows us to efficiently capture stochastic phenomena, for example, noisy gene expression caused by a small number of molecules. However, due to lack of spatial structure, this simulation framework is unsuitable when compartmentalization and spatial localization is important to the function of the network/organism. The general framework is similar to a dynamic Monte Carlo algorithm (S6) with a random selection method where in each time unit the probability of each event (node creation, deletion, mutation, molecule creation and degradation) is calculated and a particular event occurs if its probability is found to be higher than a randomly generated number. Although this approach can be computationally more intensive when compared to first reaction algorithms (S7), it requires less memory and is easier to parallelize in multi-processor environments. 1.5 Expression of RNAs, proteins, and modified proteins The value of each node corresponds to the total number of molecules of a particular type that are present at that given time. The value of each node may fluctuate but always by one molecule at a time when measured in consecutive time points. This fluctuation is a result of molecule creation, transformation or degradation. To calculate the molecule production probability (i.e. the probability that a new molecule will be created) we use two levels of sigmoid functions: One to capture the effect of each individual regulator on the target node, and a second to compute the sum of those effects. More specifically to calculate the molecule production probability of node i we first calculate the function: , … , , , … , , , 1 · tanh ∑$ · , , ̃ "# Where the function , , ̃ " corresponds to the regulatory effect of node j on node i: 1 , , ̃ " · (1 tanh ) 2 * +* ,+* -. & In the previous equations, n is the total number of nodes, basali the basal expression parameter, /01 the value of the ith row and jth column in the interaction matrix, 21 the value of node j, 30 and 40 the midpoint and slope of the main target-specific sigmoid function, 301 and 5,67 the midpoint and slope of the regulatorspecific sigmoid function. In the case of protein translation and modification, the molecule production probability also depends on the number of substrate molecules (i.e. RNA, proteins) that are present in the environment. Additionally, a realistic model has to capture the saturation effects of the translation and modification machinery (i.e. ribosomes, enzymes) at high molecular concentrations. For this reason is multiplied by yet another function 8 that is also a sigmoid in the case of protein translation and modification: 9 8 , · 8 Where 8 : 1 · ;1 tanh < 2 1 + =>?=+@?/ @B++C=+@? Finally, the molecule production probability is given by: + =>?C>+D=+@? + 9 H 0 + 0 I 9 I 1 + 9 J 1 0 E F9 1 Figure S2 gives a general schematic of the expression model and depicts the effect various values of midpoint and slope have on the sigmoid function +* * , +* , , +* ". In our experiments, the parameters 3 and 4 of function 8 take the heuristic values 10 and 4 respectively. Although RNA molecules are not consumed during translation, each modification event converts one unmodified protein molecule to its modified form and thus each protein modification causes its (unmodified protein) pool to be reduced by one. Proteins that get modified have a low (<0.1) probability of returning to their unmodified state. Although external regulation of the modified to unmodified transitions is a feature in EVE, it was disabled in the experiments we present here. 1.6 Degradation Degradation of molecules occurs in a somewhat similar manner. The probability that a molecule of node i is degraded at any given time unit is proportional to the number of molecules present and is given by: 1 DKLMN EKLMN , KLMN , KLMN , DKLMN " 2 1 DKLMN 2 · tanh ) KLMN - KLMN Where 20 , OPQR0 , 3PQR0 , 4PQR0 are the number of molecules present, node specific degradation parameter, degradation midpoint and slope respectively ( KLMN 50 ?B KLMN 4 in the experiments presented here). 1.7 Triplet duplication and deletion A Node triplet can be created or destroyed at any time unit during the course of an experiment. During triplet duplication, a new triplet is created by cloning (duplicating) an existing one. The new triplet initially has exactly the same associations and parameters as the old one, although it may eventually take on its own course during evolution. Similarly, during a triplet deletion event a whole triplet is deleted. It is important to note that these two processes apply on the triplet as a whole (i.e. Gene/RNA, protein and modified protein nodes) and not to a partial set of nodes within the triplet. 1.8 Modeling mutations Like triplet duplications and deletions, mutations happen at any time point and in any organism. Mutations always target one or more parameters in a particular node. During a mutational event, one of the following node-specific parameter groups is altered: • • • • Target node midpoint (30 ) and slope (40 ). Interaction weights (/01), regulator specific midpoint (301 ) and slope (4,01 ). Basal expression parameter (basali). Degradation parameter (OPQR0 ) The parameters within each group were specifically selected in order to match their biological counterparts. A mutation in the first group captures a sequence or conformational change in the target node that has an effect in its regulation function as a whole. A mutation in the second group models the case where a change in residue(s) or base pair(s) – either in the regulator or target node – results in a change of the interaction affinity between the target and the regulator node. This type of mutation is regulator-target specific since it does not affect other regulatory interactions associated with the target node. Finally a mutation may change the basal expression parameter (e.g. promoter becomes more/less leaky) or the degradation parameter (e.g. point mutations in RNA or protein that affects its stability). In addition to the type of parameters that a mutation targets, mutations can also be categorized based on the severity of the change. As such, mutations can be grouped in two classes: Mild and Strong mutational events. In mild mutational events, the parameters drift according to a Gaussian distribution around their current value, whereas in strong mutational events they can take on any random value within the valid (initial) range. Strong mutations are relatively rare events in the course of the experiment and they model large impact changes that can result in loss – or even reversal – of function (e.g. an activating transcription factor that becomes a repressor for a specific target gene). Figure S3 demonstrates the variety of possible mutational events accompanying the evolution of a toy oscillator circuit. 1.9 Biochemical parameters and organism attributes Organism parameters can be classified into three categories and are reported here with the ranges we used during initialization in low mutation rate (LM) experiments. Parameters are sampled from a uniform distribution, unless otherwise stated. • General parameters: o Energy: The energy that the organism has at each time point (initially 8 · 10V energy units). o Fitness: The Pearson-correlation between the time series vectors that represent the pattern of response pathway expression and abundance of environmental resources. o Maintenance Energy Cost: Energy cost per unit time associated with the maintenance of nodes. The maintenance energy cost for node i has a value of Costi = xi∙M energy units, where xi is the node-specific maintenance cost and M the mutation modifier parameter. For simplicity, in our experiments xi = 1 for all nodes. The mutation modifier per unit time models the cost of DNA repair and other proofreading mechanisms and is the relative ratio between the mutation rates at the beginning of the experiment to its current value. Thus, an organism evolved to have lower mutation rate has a higher mutation modifier parameter M (here M is equal to one since mutation rate per organism is constant). o Response pathway cost: Energy cost per unit time associated with maintaining the response pathway (modified protein RP1) active. It is linearly proportional to the number of RP1 molecules present in the organism and it is given by costresponse_pathway = z∙P, where z is the default cost for one protein per time unit (here z = 10) and P is the number of RP1 molecules. • • Mutation Specific parameters: o Triplet Creation Probability: The probability of triplet creation in each time unit (10-7 in LM experiments). o Triplet Destruction Probability: The probability of triplet destruction in each time unit (10-7 in LM experiments). o Mild Mutation Probability: The probability of mild mutation in any node per unit time(10-6 in LM experiments). o Strong Mutation Probability: The probability of strong mutation in any node per unit time (2∙10-7 in LM experiments). o Evolvability: The rate of change in the creation, destruction, strong and mild mutation probabilities. This is used only in experiments that allow organisms to vary their mutation probabilities during the course of evolution (here 0.2 in all experiments). Kinetic and Network related parameters: o Number of Nodes: The total number of nodes in the cell (initially from 0 to 30). o Node abundance: Abundance of molecules corresponding to a node (initially from 0 to 30). o Connectivity: The fraction of actual to all possible connections in the internal network (initially from 0.01 to 0.20). o Basal: A vector that contains the basal levels of transcription, translation and modification for all nodes in the internal network (initially from 0 to 0.5). o Degradation: A vector that contains the degradation rates of all node values in the internal network (initially from 0.05 to 1). o Interaction matrix: A matrix with all regulatory dependencies between nodes and environmental signals. Distribution of interaction weights initially obeys a power law with W 1.5 while the connection probability of an environmental signal to a node decreases exponentially with the number of nodes (in the same organism) it is already connected to. o Target Slope and Midpoint: Two vectors that contain the slope and midpoint of the primary target-specific sigmoid function that is used to model the regulation of a specific node by other nodes and signals (initially slope ranges within [1,3] and midpoint within [1,1]). o Regulator Slope and Midpoint: Two vectors that contain the slope and midpoint of a secondary regulator-specific sigmoid function that is used to model the effect of regulation by a node or signal of another node (initially the regulator slope ranges in [1,4] and the regulator midpoint in [0,10]. 1.10 Modeling organism-environment interactions Organisms compete in an environment of constant population size that may contain any number of signals with arbitrary or predefined characteristics. Various events that have a deleterious or beneficial fitness effect on the organisms occur during the time course of an experiment. The temporal pattern of event occurrence is flexible, ranging from purely stochastic to perfectly periodic. In the present work experiments have a single event type, namely the presence of energy resources that organisms can harvest through the expression of a metabolic pathway that is represented by a modified protein called “response protein 1” (RP1). Environmental signals can be programmed to correlate – partially or fully with the future occurrence of an event, making it possible for organisms to predict environmental trajectory. To achieve this, the organisms have to “learn” how to extract environmental information from one or more signals. A specific node can couple/decouple its expression to an external signal as a consequence of random mutations. Signal values can range from -1 to 1 and their regulatory effect is directly proportional to that value. The creation and destruction of any RNA, Protein or Modified Protein has an energy cost. Of special interest is RP1, the modified protein of triplet 1, whose creation cost is ten times more than that of other proteins. In addition to its high creation cost, the response protein has a maintenance cost per unit time that is proportional to the number of its molecules present at any one time. This models well the real metabolic machinery in cells, whose expression and maintenance is energetically costly. The probability of death decreases exponentially with the energy level of the organism and approaches unity as energy levels get close to zero. In order to keep the population size constant, a randomly generated organism is placed in the population after the death of another. The amount of energy that organisms acquire from the environment per unit time is proportional both to the amplitude of the resource and to the response pathway expression. When an organism’s energy level reaches a predefined threshold (in our analysis it is set to twice the initial energy level), the organism undergoes mitosis. During mitosis, the organism is cloned and its energy level is reduced to half. The newly created daughter cell takes the place of the organism with the least energy. 1.11 EVE simulation algorithm We developed two different algorithms that simulate our evolutionary environment and obtained similar results with both. Although the results presented here are derived from the real-time version of the simulator, both generation-based and real-time implementations are described for the sake of completeness. 1.11.1 Generation based simulator At the beginning of each experiment, a fixed-sized population of organisms that receive a predefined amount of energy units is created. During a given time duration, organisms undergo mutations while their node values are continuously updated. At the end of the time interval, a new generation of organisms is formed by “sampling with replacement” from the current population. The probability for an organism to be selected for the next generation is directly proportional to the energy that it has at that point. Each time a new generation is formed, the energy of all organisms is initialized. The simulation ends after a predefined number of generations. 1.11.2 Real-time simulator At the beginning of each experiment, a fixed-sized population of organisms that receive a predefined amount of energy units is created. At each time point during the experiment, organisms can mutate, divide and die in a parallel and asynchronous manner as described in the previous section. Statistics are gathered at periodic time intervals (epochs) without this affecting the evolutionary trajectory of the experiment. The experiment ends after a predefined amount of time units have elapsed. 1.12 Auxiliary algorithms 1.12.1 Deconstruction The deconstruction algorithm was developed to assess the link/node significance and derive the minimal internal network (i.e. the core computational module) of an organism. The algorithm deconstructs networks in either a “parallel” or “serial” mode. In parallel deconstruction, the significance of each individual link in the internal network of an organism is evaluated within a time window which is at least an order of magnitude larger than the interval between two epochs. During this evaluation process, a link gets knocked out and the fitness of the resulting organism is assessed. Regardless of the link significance, each link is restored in the internal network at the end of every evaluation. The program terminates once the fitness effect of all links is assessed. Serial deconstruction operates similarly with the difference that if a link proves to be insignificant (less than 5% fitness change with respect to the initial fitness), it is not restored in the original network, which leads to systematic pruning of non-essential links. To cope with redundant pathways in the network, serial deconstruction performs multiple randomizations (permutations) of link assessment. While parallel deconstruction is able to find the fitness impact of a single mutation in the network, serial deconstruction enables us to recover redundancies and minimal networks responsible for a particular phenotype. Each link significance is assessed multiple times (3 in our analysis) both in parallel and serial deconstruction, while serial deconstruction creates several (3 in our analysis) permutations of the order by which links get knocked out in order to uncover coexisting minimal networks. The results of parallel and serial deconstruction can be visualized as a fitness matrix (eg. Fig. S13-A), depicting the negative fitness fraction change between the initial and mutated phenotypes. 1.12.2 Epistasis analysis Epistasis is defined as any non-additive interaction between two or more distinct mutations, such that their combined effect on a phenotype deviates from the sum of their individual effects. Here, we compare the fitness effect of link deletions in an organism’s internal network, both when two deletions occur independently or concurrently as a pair. We use a variation of the parallel deconstruction algorithm that knocks out two links simultaneously at each evaluation cycle in order to assess the epistatic potential of link pairs, as manifested in their combined fitness effect (Fig. S13C, S14A). 1.12.3 Reverse engineering As an alternative approach to network inference, we used the ARACNE algorithm (S8) which is based on mutual information analysis of expression data. We conducted time series experiments and gathered the expression signatures of all nodes in any given network. To accurately mimic “microarray” experiments, we collected node expression data under different environmental conditions and stimuli, as for example forcing one of the signals up or down. Figure S15B shows the topology of the reconstructed network that belongs to the organism in Fig. 4, where the reconstruction algorithm discovered 4 out of the 11 links that are essential for the organism to retain its phenotypic behavior. However the reconstructed organism has none of the phenotypic properties of its original counterpart, thus rendering it essentially unfit under the original selection environment. Nevertheless, systematic analysis of reconstructed networks evolved under various mutation rates suggests that network reconstruction can indeed be helpful in identifying a portion of the essential links in the biochemical network, possibly at the expense of a high false-positive rate (Figure S16). 1.13 Additional simulations All experiments presented here have a fixed population size of Np= 200 and time duration D = 1.8∙107 time units. Relative mutation rates are 1:1:10:2 for creation, destruction, mild and strong mutation probabilities respectively. To obtain statistical significance for each environment and mutation rate combination, we ran 64 independent simulations. Based on the selection pressure in the environment, experiments can be classified into the following categories: • • Delayed Gates: Signals and resource are related by OR, AND, XOR, NAND, NOR dynamic logic functions. Multi-gates: Signals and resource are interchangeably related by combinations of OR, AND, XOR, NAND, NOR dynamic logic functions. • • • Oscillators: Selection pressure to evolve oscillatory expression of RP1 with or without a periodic guiding signal. Bi-stable Switches: Selection pressure to evolve bi-stability in environments where two environmental signals operate as ON/OFF pulse switches. Duration/variance Locking: Selection pressure to evolve networks that predict the duration of an environmental resource that has fluctuating duration or phase variance. Figure S4 summarizes the correlation between the environmental signals and the resource presence in each experiment type. Figures S9-S11 depict minimal networks and/or phenotypes of representative organisms that belong in the above categories (evolved oscillatory and multi-gate organisms have large complex networks and thus are not included here). 2. Experimental Methods 2.1 Bacterial strains and growth conditions E. coli strain MG1655 and its derivates were used in all physiology and evolution experiments. A library of Tn5 transposon mutants (S9) with a diversity of ~106 was used as the starting point for evolution experiments in order to: 1) increase the diversity of genetic perturbations on which selection acts and 2) allow marking and monitoring of individual mutants within the evolving population. Bacteria were grown in batch or bioreactor vessels in M9 minimal media supplemented with 0.4% glucose. 2.2 Controlled temperature and oxygen perturbations Physiological perturbations were carried out under a controlled environment in the context of bioreactor (Bioflo 110, New Brunswick Scientific) growth. Thermoelectric sensors and heaters were used to shift temperature profiles between 250 C and 370 C, and polarographic dissolved oxygen sensors (Mettler Toledo) and nitrogen gas was used to rapidly change oxygen saturation between anaerobic (0% dissolved oxygen) and aerobic (16-21% dissolved oxygen) condition. The cultures were maintained in exponential phase (O.D.600 0.2-0.4) through controlled dilution, where fresh media is pumped in and spent media is pumped out at a controlled rate. Prior to the perturbations, cells were maintained in the pre-transition environment for at least eight generations (8-24 hours). All experiments were performed in duplicate, with high reproducibility (Fig. S17). Unless specifically indicated, all perturbations were performed during exponential growth, and cells were harvested at variable intervals and rapidly mixed in an ice-cold ethanol/phenol solution (5% water-saturated phenol in ethanol) in order to stop mRNA degradation. After spinning down the cells at 5000 rpm for 5 minutes at 40 C, the media was discarded, and cell pellets were frozen on dry-ice/ethanol for storage at -800 C. 2.3 Microarray transcriptional profiling A hot-phenol procedure was used to extract total RNA. Cell pellets were lysed with 500 µl TE (pH8.0), 50 µl 10% SDS and lysozyme (0.5 mg/ml). Total RNA was extracted sequentially with phenol/chloroform (preheated to 640 C), followed by chloroform/isoamyl alcohol. RNA was precipitated with 1/10 volume of 3M NaOAc (pH 5.2) and 2 volumes of ethanol. After incubating overnight at –200 C, samples were spun down and pellets were washed with ice cold 70% ethanol (prepared with DEPC- H2O). RNA was resuspended in water, DNase treated (RQ1 RNase-free DNase/ Promega, WI) and purified using an RNeasy purification kit (Qiagen, CA). Fluorescent cDNA was synthesized from total RNA with 15 µg of total RNA serving as template. Cyanine-labeled Cy3 or Cy5-dUTP (Amersham Bioscience) and pdN6 random hexamers (GE healthcare) were used in reverse transcription reactions utilizing SuperScript II (Invitrogen, CA) for cDNA synthesis. Fluorescent genomic DNA was generated as described in (S9). Genomic DNA served as a universal reference for all hybridizations. E. coli genomic DNA was fragmented (500 – 2000 bp) using mechanical shearing, and subjected to Cy3 or Cy5-dUTP labeling using the BioPrime DNA labeling system (Invitrogen, CA). Following purifications through CyScribe GFX Purification Kit (Amersham Biosciences), Cy-dye labeled genomic DNA and RNA-derived cDNA probes were combined with a hybridization buffer (5X SSC, 0.1% SDS, 50% formamide, and 10 μg Salmon sperm DNA) and hybridized for ~ 16 hours at 420 C to a DNA microarray containing 95% of all ORFs in the E. coli MG1655 genome (S9). All microarray experiments were carried out in biological duplicates and showed high reproducibility (Fig. S18). 2.4 Microarray data analysis Microarray slides were scanned on a GenePix 4000B scanner (Axon Instruments), and fluorescence data for each of the duplicates on each slide were analyzed using GENEPIX PRO 5.0 software. In a quality- control step, elements with poor spot morphology or exhibiting uneven hybridization caused by dust particles or scratches, were flagged manually and excluded from further analyses. After local background subtraction and global normalization relative to the genomic DNA reference, duplicate measurements on the same array were averaged, yielding a single vector for each hybridization. Seven time-points were assayed for each perturbation, corresponding to 0, 4, 8, 12, 20, 28, and 44 minutes post transition. After relative scaling, biological duplicates from all experiments were averaged, leading to a single set of timeseries data for each perturbation. To compare our results to other perturbations, we combined our temperature/oxygen transcriptional responses with previously published studies of UV (S10) and osmolarity stress (S11) in E. coli. For most of the analyses, we focused on changes in gene expression that occurred at various points (e.g. 20 minutes) relative to the zero time-point reference. Log (base 2) transformations of fold-changes were utilized in determining the global correlation between various perturbations (Fig. 5B, 6D). Sets of differentially expressed genes were defined as showing 1.5 fold or higher increase/decrease in gene expression relative to the time zero reference. Variation of this threshold from 1.33 to 2.00 gave very similar results. The hyper-geometric distribution was used to determine the chance-probability of observing overlaps in sets of differentially regulated genes (Fig. 5A). The combined expression dataset (consisting of 2612 genes across 34 conditions/time-points) is available and can be downloaded from our web site. 2.5 Experimental evolution under an ecologically incoherent dynamic environment A diverse library of Tn5 transposon mutants (S9) was used as the starting point for evolution of E. coli MG1655 for optimal growth under a dynamic environment where temperature and oxygen transitions were positively correlated with a 40 minute time-lag (Fig. 6A). In order to avoid selection for periodic behavior, intervals of constant temperature/oxygen were sampled from a Gaussian distribution with a mean of 150 minutes (37C/O2+) and 300 minutes (25C/O2-) and standard deviations of 30 minutes (37C/O2+) and 60 minutes (25C/O2-), respectively. In total, 44 cycles of selection were carried out over the course of 384 hours. The starting population contained ~1x1010 transposon mutants (diversity of 5x105) growing in exponential phase (maintained between O.D.600 0.20-0.30 by continuous pumping of fresh media and discarding spent media) in a volume of ~500 ml of M9 media supplemented with 0.4% glucose. Periodic sampling of the culture allowed us to monitor growth-rate and archive representative samples of the population at -800 C. The presence of a single randomly inserted transposon in every genome allowed us to monitor population diversity and track high-fitness mutants during the evolution experiment through the application of a gel-based genetic footprinting procedure (Fig. S20). The amplification of transposon insertion sites from a sample of 100 colonies isolated at the end of selection identified a highly abundant mutant that made up 55% of the population. This “evolved” mutant, designated as aldB::Tn5 was used in subsequent fitness evaluation and physiology experiments. 2.6 Competition between parental and evolved strains To establish the fitness advantage of the evolved strain, we carried out competition experiments within the setting of the dynamic environment imposed during the evolution. The evolved and the parental strains were distinguishable through the differential expression of a LacZ marker. This system allowed us to easily quantify the relative fitness advantage of the evolved mutant using colony counting on plates (S12). A 60-hour competition experiment established the strong fitness advantage of the evolved strain (Fig. S21). 3. Figures and tables 0 16.19 0 [0 0 0 Z 0 0 0 W =Z 0 0 Z0 Z0 9.91 0 Y0 0 0 0 0 13.64 0 0 0 b a 0 0 0 a 0 3.77 0 a 0 0 0 à 0 0 0 Figure S1 | A randomly created internal network and corresponding interaction matrix. Red and blue edges denote activation and repression respectively. The absolute value in each edge corresponds to the weight of regulation while a positive/negative sign represents activation/inhibition respectively. Green edges depict creation in the case of mRNA transcription and protein translation, or transformation in the case of protein modification. A fi1(x1,m ˜ i1,s˜i1) fi2(x2,m ˜ i2,s˜i2) Gi ... Pi fin(xn,m ˜ in,s˜in) B C 0.8 Probability 1.0 ˜ = 2.5 m ˜ = 3.75 m m ˜ =5 m ˜ = 6.25 m ˜ = 7.5 0.6 0.4 s˜ = 1 s˜ = 2 s˜ = 3 s˜ = 4 0.6 0.4 0.2 0.2 0 ˜ =5 m 0.8 Probability 1.0 s˜ = 2.5 1 2 3 4 5 6 7 8 Number of Molecules 9 10 0 1 2 3 4 5 6 7 8 9 10 Number of Molecules Figure S2 | Expression model and molecule production probability. (A) A two level sigmoid function is used in the expression model. The molecule production probability is represented as a second level sigmoid that has as input the sum of all individual regulatory effects. (B) Plot of function , , ̃ " for different values of the midpoint, while keeping the slope constant , +* 2.5 (C) Plot of function , , ̃ " for different values of the slope, while keeping the midpoint constant ( 5 Figure S3 | Impact of mutational events on network parameters. (A) Network topology of a toy oscillator, that was evolved based on an initially user defined, non-functional design. Proteins MP0 and P0 activate (red lines) G1, while MP1 represses (blue line) G0 and G1. The P0-G1 activation (interaction weight c ) is depicted with a dashed line since it is deleted during the evolution. (B) Expression profile of both triplets over time. In each plot, blue, green and red waveforms correspond to RNA, protein and modified protein levels respectively. (C) Key parameters and their values in the initial state, after type1, type2 and type3/4 mutations. The expression profile of both triplets for each state is shown at the bottom of the figure. Figure S4 | Correlation of environmental signals with resource abundance for various environments. The resource abundance (red) during a specific signal pattern (black) is given for all resource-signal correlations mentioned in the text. Depending on the selection pressure there can be none, one or more signals present in the environment: oscillators have zero or one, toggle switches and gates have two signals while multi-gates have three. In a toggle switch, Signals 1 and 2 serve as ON and OFF switches respectively. In a multi-gate the third signal indicates a switching event, for example a transition from an AND to an OR logic. Figure S5 | Fitness (Pearson-correlation) Histogram for all delayed logic gate experiments. Each subplot depicts the histogram of Pearson-correlation between response protein expression and resource abundance of the fittest organism in the final populations (64 final populations per mutation rate/environment) of Supplementary Table 1. Each row corresponds to a gate type (AND, OR, XOR, NOR, NAND) and each column to a mutation rate (VLM, LM, MM, HM). For each subplot, x-axis and y-axis represent Pearson Correlation and number of experiments respectively. Signal S2 Signal S2 Signal S1 Signal S1 A B Response Resource Signal S1 Signal S2 Response Resource Signal S1 Signal S2 Time 1.14 MP0 P0 1.09 2.60 RP1 MP2 P1 MP3 P2 MP0 2.03 P3 12.28 RP1 P0 Promoter 0 G0 -0.63 RNA1 1.27 RNA2 4.21 Promoter 1 G1 P1 1.58 1.83 RNA0 Time 3.39 -1.34 1.86 RNA3 Promoter 2 G2 Promoter 0 G0 1.53 P2 2.34 1.86 1.85 RNA2 Promoter 1 G1 Signal S2 Signal S1 C -1.24 -1.27 RNA1 1.11 RNA0 G3 Promoter 3 MP2 1.13 1.23 Promoter 2 G2 Signal S2 Signal S1 Response Resource Signal S1 Signal S2 D Response Resource Signal S1 Signal S2 Time Time 1.05 1.37 0.97 1.86 -7.74 -7.74 1.37 MP0 9.70 RP1 MP2 MP3 -5.16 -8.82 5.09 -8.82 -8.82 5.90 -8.82 -7.86 4.28 -6.39 P1 0.97 1.37 P2 P0 P3 1.37 1.91 -1.841.91 2.58 2.30 2.13 2.04 1.85 -1.63 1.85 5.46 RNA0 RNA1 RNA2 RNA3 2.36 2.60 Promoter 0 G0 Promoter 1 MP0 RP1 3.60 -8.26 -10.79 -1.38 -9.74 7.25 P0 2.26 -5.41 7.38 2.71 RNA0 0.37 2.60 G1 Promoter 2 G2 G3 Promoter 3 G0 Signal 2 1.23 Promoter 0 7.35 P1 8.71 -8.87 3.84 -2.26 RNA1 Promoter1 G1 Signal 1 E Response Resource Signal S1 Signal S2 Time -11.87 1.93 0.33 0.36 1.90 -7.69 MP0 RP1 MP2 1.76 3.07 -9.86 -8.80 7.35 P0 -10.03 -11.56 4.25 28.82 3.23 RNA0 4.62 P1 -7.70 G0 -6.65 11.86 2.72 0.90 7.41 6.97 RNA1 Promoter 1 RNA2 1.20 -0.62 Promoter 0 P2 -7.19 G1 G2 Promoter 2 Figure S6 | Network topology of the fittest cell during the evolution of a delayed XOR. The networks portrayed here correspond to the fittest cell at epochs (1-5) of [Fig. 3]. Activation and inhibition are represented by red and blue arrows respectively, while solid and dashed lines indicate essential and nonessential links. The phenotypic behavior of each organism is depicted in the upper right corner. (A, B) The initially random, low fitness network evolves to a network that partially infers the resource fluctuation in the environment by coupling, although poorly, to signal S1. (C) Later, organisms evolve to predict all resource peaks, although their behavior is noisy and not matched well with resource abundance. (D) Once a stable phenotype emerges, it takes over the population despite the fact that organisms with higher fitness were present in the population. (E) A triplet duplication (triplet 0 to triplet 2) and subsequent optimizing mutations give rise to the final fit organism. Fitness Figure S7 | Fitness trajectory in a delayed XOR experiment. The evolution of a separate XOR experiment under the same conditions as that of [Fig. 3] is depicted. (A) Fitness trajectory of a delayed XOR experiment where the presence of any of the two signals alone is necessary and sufficient condition for future resource abundance. Red and black lines correspond to the highest and mean fitness in the population at each epoch respectively. Fitness is defined as the Pearson-correlation between resource abundance and response protein expression. (B) The phenotypic behavior of the fittest organism at different points along the evolutionary trajectory. Each subplot consists of four rows: the first row depicts the abundance profiles of the RNA (blue), protein (green), and modified protein (red) of the response pathway. The second, third and fourth rows correspond to the resource abundance, and environmental signals S1 and S2 respectively. Figure S8 | Network topology of the fittest cell during the evolution of a delayed XOR. The networks portrayed correspond to the fittest cell at epochs (1-4) of Fig. S7. Activation and inhibition is represented by red and blue arrows, while solid and dashed lines indicate essential and non-essential links respectively. The phenotypic behavior of these organisms is depicted in the upper right corner. The initially random and unfit network progressively evolves to a network that can accurately predict the resource fluctuation in the environment. (A) Initial random network. (B) Partial correlation evolved within the first 500 epochs. (C) A duplication event of triplet 0 facilitates the evolution of a higher fitness organism. (D) Further optimization by less severe mutations and emergence of the final phenotype. Signal S2 Signal S1 A Signal S1 Signal S2 B Response Response Resource Resource Signal S1 Signal S1 Signal S2 Signal S2 Time Time 13.89 MP3 RP1 P3 P1 MP4 MP6 RP1 P4 P6 P1 RNA4 RNA6 2. 73 3.03 11.66 RNA3 RNA1 RNA 3 RNA1 Promoter3 G3 Promoter1 G1 Promoter4 G4 G6 Signal S1 Signal S2 C Promoter 3 Promoter 6 G3 RNA 4 -1. 04 -1. 21 Promoter 1 G1 G4 Signal S2 Signal S1 D C Promoter 4 Response Resource Signal S1 Signal S2 Response Time Resource Signal S1 MP5 Signal S2 MP0 Time -5.83 P5 -1.99 P0 -1.21 1.10 -1.22 RNA5 RNA0 1.34 MP4 13.42 Promoter 5 G5 P4 MP 0 Promoter 0 G0 MP3 RNA4 MP2 RNA1 P1 Promoter 4 RNA0 G0 2.83 2.41 7.39 Promoter 0 P1 2.11 RP1 0.99 P0 RP1 -4.05 RNA 1 G1 G4 P3 -5.52 RNA 3 Promoter1 Promoter 3 P2 -1.47 Promoter 1 G1 RNA 2 G3 Promoter 2 G2 Figure S9 | Minimal networks of delayed dynamic logic gates evolved in environments with low mutation rate. (A) Delayed OR gate where the response pathway is activated through protein P3, modified protein MP6 or both simultaneously. The presence of either signal S1 or S2 is sufficient for the activation of the response pathway. Protein P4 catalyzes the modification of P1 to MP1 and increases the overall fitness of the system (Pearson correlation between MP1 expression and resource presence increases to 0.82 from 0.63) (B) Delayed NOR gate where the absence of both signals is necessary for the activation of the response pathway. (C) Delayed AND gate where the response protein is expressed only when both signals are present. Signal 2 initiates the transcription of G1, while high MP0 concentration is necessary for the translation of P1. The latter happens only in the presence of signal S1, since it regulates the expression of G0. (D) Delayed NAND gate where the response protein is not expressed when both signals are present simultaneously. Figure S10 | Minimal network and behavior of an ON/OFF switch. (A) Signal S1 (ON signal) activates the transcription of G1 that leads to the production of RNA1, P1 and RP1. RP1 acts in a positive feedback loop that activates the translation of P1 and locks the creation of RP1 molecules in the absence of Signal S2 (OFF signal). Once a Signal S2 pulse is present, MP7 and P7 directly or indirectly (through inhibition of RNA3, a positive regulator of P1-MP1 modification) inhibit the expression of MP1. (B) Signal S1 and Signal S2 pulses indicate the start and the end of resource presence in the environment respectively. Response protein expression (first row, red line) correlates well (>0.9 Pearson Correlation) with resource abundance (second row). Number of Occurencies Figure S11 | Evolution of a duration inference mechanism. (A) In an experiment where the duration of the resource presence is variable, organisms have evolved to infer duration variability and highly correlate the expression of response protein with the presence of resource. The duration inference mechanism is based on node coupling to a continuous environmental signal (Signal S3) whose phase is indicative of the environmental variance. (B) Distribution over 90 epochs of response protein duration when the duration of resource presence in the environment interchanges between 4000 ± 300 and 6000 ± 300 time units. Response protein expression follows a bimodal distribution with two peaks that correspond to short (red) and long (blue) expression periods. The duration of a response pulse is defined as the time interval between the ascending and descending edge, measured at the 50% of peak expression amplitude. Number of Cells Number of Cells Figure S12 | Fitness and network-size trajectory during the evolution of the delayed XOR network ([Fig. 3]). Each point along the time axis corresponds to an epoch, i.e. 4,500 time units in the experiment. (A) A time series depiction of population fitness for all cells (organisms). There are two fitness fixation points at 0.6 and 0.95 that occur around 1200 and 3000 respectively and correspond to the emergence of the one and two signal “mutual exclusive” computational modules, as described in the paper. After the second dominant fixation we observe a decline in the variation of population fitness. (B) Node number distribution in the internal non-minimal network of all cells. Nodes are always created and destroyed in triplets (gene, protein and modified protein nodes), hence the discrete nature of the plot. Although the node number variance decreases after the first fixation (where only six nodes comprise the minimal network), there is still triplet duplication events that ultimately lead to the optimal phenotype (which requires a minimum of nine nodes). MP2 P2 RNA 2 RP1 P1 RNA 1 MP0 P0 RNA 0 MP2 P2 RP1 RNA 2 P1 RNA 1 MP0 P0 Connection ID RNA0 Figure S13 | Network deconstruction in the delayed XOR experiment ([Fig. 3]). (A) Fitness effect of single link knockouts (parallel deconstruction) illustrates link essentiality in the internal network. The fitness impact of links vary from zero (green) to one (dark red) and corresponds to the fitness fraction change that the link severance causes with respect to the fitness of the original network (a 100% fitness change is denoted by 1, 42% by 0.42 and so forth). Matrix elements that depict non-existent links are colored blue. (B) Knockouts of node pairs reveal functional similarities between nodes: simultaneous deletion of all links in nodes P0 and P2 reveals strong negative epistasis. (C) Epistatic interaction matrix of all 81x81 possible link pair knockouts. Each matrix element shows the fitness effect of a specific link pair knockout. Non-existent link pairs are colored blue. A Pair Link ID (0,3) (0,5) (1,4) (1,5) (2,0) (2,6) (3,0) (3,1) (3,3) (3,5) (3,6) (3,7) (4,0) (4,3) (4,4) (4,6) (5,1) (5,2) (5,5) (5,7) (5,8) (6,3) (6,5) (7,4) (7,5) (8,0) (8,6) (0,3) 0.34 0.96 0.34 0.44 0.02 0.46 (0,5) 0.34 0.95 0.33 0.46 0.02 0.46 (1,4) 0.34 0.96 0.34 0.42 0.02 0.44 (1,5) 0.35 0.95 0.33 0.44 0.45 (2,0) 0.34 0.95 0.33 0.44 0.03 0.48 (2,6) 0.34 0.95 0.34 0.46 0.02 0.44 (3,0) 0.34 0.95 0.34 0.43 0.02 0.46 1 0.9 0.8 (3,1) 0.34 0.34 0.34 0.35 0.34 0.34 0.34 0.33 0.34 0.94 0.34 1.00 0.34 0.34 0.33 0.35 0.34 0.34 0.35 0.98 0.34 0.34 0.34 0.34 0.34 0.34 0.33 (3,3) 0.34 0.97 0.34 0.45 0.7 0.02 0.47 Pair Link ID (3,5) 0.96 0.95 0.96 0.95 0.95 0.95 0.95 0.94 0.97 0.96 0.95 0.94 0.96 0.96 0.95 0.95 0.99 0.95 0.91 0.98 0.95 0.95 0.95 0.96 0.96 0.96 0.95 (3,6) 0.34 0.95 0.34 0.41 0.02 0.44 0.6 (3,7) 0.34 0.33 0.34 0.33 0.33 0.34 0.34 1.00 0.34 0.94 0.34 0.34 0.34 0.34 0.34 0.34 1.00 0.34 0.35 0.34 0.33 0.34 0.33 0.33 0.34 0.34 0.33 (4,0) 0.34 0.96 0.34 (4,3) 0.34 0.96 0.34 0.33 0.95 0.34 0.35 0.95 0.34 (4,4) 0.02 (4,6) 0.20 0.20 0.46 0.02 0.42 0.47 0.02 0.53 0.43 0.46 0.44 0.46 0.5 0.02 (5,1) 0.44 0.46 0.42 0.44 0.44 0.46 0.43 0.34 0.45 0.99 0.41 1.00 0.46 0.47 0.43 0.44 0.44 0.42 0.49 0.99 0.44 0.47 0.41 0.42 0.44 0.44 0.43 (5,2) 0.34 (5,5) 0.95 0.34 0.42 0.03 0.02 0.02 0.35 0.02 0.91 0.02 0.35 0.02 0.02 0.4 0.46 0.02 0.47 0.02 0.02 0.49 0.02 0.02 0.03 0.02 0.3 (5,7) 0.46 0.46 0.44 0.45 0.48 0.44 0.46 0.98 0.47 0.98 0.44 0.34 0.42 0.53 0.46 0.46 0.99 0.46 0.47 0.44 0.43 0.47 0.47 0.48 0.42 0.46 0.48 (5,8) 0.34 0.95 0.33 (6,3) 0.34 0.95 0.34 (6,5) 0.34 0.95 0.33 (7,4) 0.34 0.96 0.33 0.42 0.02 0.48 (7,5) 0.34 0.96 0.34 0.44 0.02 0.42 (8,0) 0.34 0.96 0.34 0.44 0.03 0.46 (8,6) 0.33 0.95 0.33 0.43 0.02 0.48 0.02 0.44 0.02 0.43 0.47 0.02 0.47 0.02 0.41 0.47 B 0.2 0.1 0 C3 C2 C1 P1 P1 RNA 1 P1 P2 G1 P0 G1 RP1 G1 P2 P1 28.82 RNA 1 RP1 P0 RP 1 RP 1 11.86 P0 7.41 6. 97 P2 Promoter 1 G1 P0 - 8. 80 -9. 86 P1 P2 -15.16 RNA 1 Promoter 1 G1 Figure S14 | Epistatic Analysis of a delayed XOR network ([Fig. 4]). (A) Epistatic matrix of existent pair link knockouts derived from the augmented epistatic matrix of Fig. S13c. Only fitness changes more than 1% are shown. Each link pair is represented by the target-regulated and its source-regulator node respectively. Nodes are numbered from 0 to 8 from the lower triplet index to the higher and from RNA to Protein to Modified Protein. For example, matrix row (3,1) denotes the knockout of the direct regulation of G1 by P0. Strong negative epistasis is observed at the severance of both positive regulatory links from P0 and P2 to G1 (link pair (3,1) and (3,7)), both inhibitory interactions between P0, P2 and P1 modification (link pair (5,1) and (5,7)), as well as their crosslink combinations: knockout of P0 (P2) mediated activation of G1 and repression of P1 modification by P2 (P0) – which corresponds to link pairs (3,1)-(5,7) and (3,7)-(5,1) respectively. Positive epistasis occurs during simultaneous deletion of both inhibitory RP1 to G1 and self-activating RP1 to RP1 interactions, during deletion of repression/activation pair P1-RP1 and P1-G1 (link pair (5,1) and (3,1)), and during deletion of P2-RP1 and P2-G1 (link pair (5,7) and (3,7)). (B) Monochromatic clustering of epistatic interactions (Segre,D. et al. Nat. Genet. 37, 7783) reveals the existence of remarkably compact sub-modules that are responsible for keeping P1 expression on after a S1/S2 signal pulse (c1), response pathway activation (c2) and delayed behavior of the circuit (c3). A G0 50 P0 45 MP0 40 G1 35 P1 30 RP1 25 G2 P2 20 MP2 15 S1 10 S2 5 R 0 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 Time B S1 S2 MP0 RP1 P0 P1 G0 G1 MP2 P2 G2 E + + - NE RC + + + + Figure S15 | Microarray analysis and reverse engineering. (A) Depiction of node expression (G0MP2), signals (S1-S2) and resource (R) abundance of the fittest organism evolved in a delayed XOR environment ([Fig. 4]). The organism successfully predicts resource abundance and expresses RP1 to harvest resources when present. (B) Network reconstruction derived from the application of an information-theoretic reconstruction algorithm (ARACNE; Basso,K. et al. Nat. Genet. 37, 382-390) and “microarray” data of network activity for various combinations of exposures to signal S1 and S2. Thick and thin red lines represent essential (E) and non-essential (NE) links that were successfully reconstructed (RC) respectively, while the blue dotted lines map to essential links that were not recovered by the reconstruction algorithm. Black lines indicate links that are predicted in the reconstructed network but which do not exist in the original network. Sensitivity Precision 60.00% Original 40.00% Minimal 20.00% 80.00% 60.00% Original 40.00% 0.00% ANDlm ANDmm ANDhm ORlm ORmm ORhm XORlm XORmm XORhm NORlm NORmm NORhm NANDlm NANDmm 0.00% 1 3 5 7 9 11 13 Minimal 20.00% ANDlm ANDmm ANDhm ORlm ORmm ORhm XORlm XORmm XORhm NORlm NORmm NORhm NANDlm NANDmm Precision 80.00% Sensitivity 100.00% 100.00% 1 3 5 7 9 11 13 Figure S16 | Statistics of reconstructed networks using a mutual information-based algorithm (ARACNE). We report the total number of links, link percentage, true positives (TP), false positives (FP), de de dh ), sensitivity ( ) and specificity ( ) of true negatives (TN), false negatives (FN), precision ( defge defgh dhfge the reconstructed network compared both to the original and minimal network of each organism. The two plots indicate the precision and sensitivity of reconstruction with respect to the original and minimal network. Interestingly, the precision of reconstruction is higher for the original network while the sensitivity of reconstruction is higher for the minimal network. The apparent asymmetry stems from the fact that all links in the minimal network are essential and in that respect they have to be present in the reconstructed network for it to fully exhibit its phenotype. B 41.0 36.0 T 31.0 26.0 21.0 -20 0 20 40 60 80 Temperature (0 C) Temperature (0 C) A D 20.0 15.0 O2 10.0 5.0 0.0 -20 0 20 40 60 Time after O2 shift (min) 80 31.0 T 26.0 21.0 -20 0 20 40 60 80 Time after temp.shift (min) Dissolved O2 (%) C Dissolved O2 (%) Time after temp.shift (min) 36.0 20.0 15.0 O2 10.0 5.0 0.0 -20 0 20 40 60 80 Time after O2 shift (min) Figure S17. Temperature and oxygen perturbation profiles are highly reproducible. Temperature and oxygen levels were precisely monitored and controlled using a New Brunswick BioFlo 110 bioreactor (New Brunswick Scientific). All experiments were performed in duplicate with high reproducibility (red vs. blue). (A) Temperature is shifted from 250 C to 370 C within a 20 minute interval while the culture is maximally aerated (18% dissolved O2). (B) Temperature is shifted from 370 C to 250 C within a 20 minute interval while the culture is maximally aerated (18% dissolved O2). (C) Oxygen saturation is downshifted within a 20 minute interval while temperature is maintained at 250 C. (D) Oxygen saturation is up-shifted within a 20 minute interval while temperature is maintained at 370 C. 2 B r = 0.969 10 10 10 1 0 -1 2 10 T up; 16 min. (2) T up; 0 min. (2) A 10 r = 0.951 1 10 0 10 -1 10 10 -1 0 1 10 10 10 2 10 10 0 1 10 10 2 T up; 16 min. (1) T up; 0 min. (1) 2 10 T up; 16 min. (1) C -1 r = 0.808 1 10 0 10 -1 10 10 -1 10 0 1 10 10 2 T up; 0 min. (1) Figure S18. Comparison of microarray expression profiles. High reproducibility of independent experimental replicates: (A) temperature up perturbation, 0 minutes, (B) temperature up perturbation, 16 minutes. (C) Lower correlation reflecting differential expression between 0 and 16 minutes post temperature up transition. UP-REGULATED DOWN-REGULATED A 292 109 O2 B 316 292 89 292 O2 212 50 p < 10-31 51 296 373 p < 10-1 327 OSM 247 71 247 O2 p < 10-9 161 p < 10-28 T 43 O2 p < 10-11 230 T O2 UV 78 247 O2 T O2 D p < 10-58 T O2 C 198 354 p < 10-1 UV 32 335 p < 100 OSM Figure S19. Transcriptional responses to ecologically correlated perturbations in temperature and oxygen show statistically significant overlap. Sets of genes that are differentially expressed by a factor of 1.5 fold show highly significant overlap when temperature and oxygen perturbations correspond to ecologically correlated changes accompanying transitions between the outside environment and the GI-tract (A, B). Significantly lower overlap is seen between the oxygen down-shift response and UV response (C), and response to hyper-osmolarity stress (D). B A transposon chromosomal DNA 0 48 96 144 192 240 288 Lyse cells or isolate genomic DNA from mutants Digest with restriction enzymes Y-linker Ligate to Y-linker 1st cycle of PCR PCR amplification of transposonadjacent genes DNA In vitro transcription RNA cDNA synthesis cDNA Figure S20. Parallel amplification of transposon-adjacent sequences and monitoring population diversity. (A) Genomic DNA extracted from a population of transposon mutants exposed to a selection is restriction digested, and the resulting fragments are ligated to a partially double-stranded Y-linker (Girgis et al. PLoS Genetics 3(9): e154 (2007)). In the initial PCR cycle, primer annealing to a sequence within the transposon and DNA polymerase mediated extension generates a double-stranded DNA template. In subsequent PCR cycles, primers annealing to a region within the transposon and the top strand complement of the Y-linker allows exponential amplification of transposon-adjacent sequences. An in vitro transcription reaction using the PCR products generates RNA which, in the presence of reverse transcriptase and modified nucleotides, generates fluorescently labeled cDNA. The fluorescently labeled cDNA can then used in subsequent microarray hybridizations in order to get a genome-wide picture of transposon insertions. (B) The DNA products resulting from the PCR amplification of transposon adjacent sequences can be easily visualized on an ethidium-bromide gel, providing a qualitative picture of population diversity throughout the evolution experiment. At the beginning, the population has maximal diversity (smear at time zero). Distinct bands correspond to individual mutants that become increasingly abundant as a function of selection. white/red: B 0 hr 12 hr 24 hr 50/57 86/71 392/99 Population fraction A 36 hr 162/9 48 hr 60 hr 414/10 302/5 1 0.8 0.6 Evolved 0.4 Parental 0.2 0 0 12 24 36 48 60 Hours Figure S21. Competition experiments between the parental and the evolved strain. (A) Differential expression of a LacZ marker allows monitoring of relative growth in mixed competition experiments. (B) The evolved strain rapidly outcompetes the parental over the course of 60 hours of growth under ecologically incoherent correlations between temperature and oxygen. VLM LM MM HM Total AND 14/64 17/64 40/64 22/64 93/256 (36.3%) OR 0/64 7/64 18/64 0/64 25/256 (9.8%) XOR 0/64 4/64 13/64 0/64 17/256 (6.6%) NOR 3/64 25/64 39/64 23/64 90/256 (35.2%) NAND 0/64 4/64 5/64 11/64 20/256 (7.8%) Total 17/320 57/320 115/320 56/320 (5.3%) (17.8%) (35.9%) (17.5%) Supplementary Table 1 | Success-rate of five delayed dynamic logic gate experiments across four different mutation rates. All experiments were conducted for 4000 epochs (1.8∙107 time units). An experiment is considered successful if the input-output characteristics of the fittest organism are representative of the gate we select for. Operationally this translates to having the Pearson Correlation between the resource abundance and the response protein expression higher than 0.7. From left to right the columns represent environments that have very low mutation (VLM, triplet creation probability 10-8), low mutation (LM, triplet creation probability 10-7), medium mutation (MM, triplet creation probability 10-6), high mutation (HM, triplet creation probability 10-5) rate. In each table entry the first value corresponds to the number of successful experiments and the second to the total number of experiments conducted. Other experiments with lower success rate included oscillators with (3/256) and without (1/512) synchronization signal, bistable switches (2/64) and OR-XOR multi-gates (1/128). References S1. P. Francois and V. Hakim, Proc.Natl.Acad.Sci.U.S.A 101, 580-585 (2004). S2. P. Francois, V. Hakim, E. D. Siggia, Mol.Syst.Biol. 3, 154 (2007). S3. Rodrigo G., Carrera J., A. Jaramillo, CEJB 2, 233-253 (2007). S4. O. S. Soyer, T. Pfeiffer, S. Bonhoeffer, J.Theor.Biol. 241, 223-232 (2006). S5. A. Wagner, Proc.Natl.Acad.Sci.U.S.A 102, 11775-11780 (2005). S6. K. A. Fichthorn and W. H. Weinberg, J.Chem.phys. 95, 1090-1096 (1991). S7. D. T. Gillespie, J.Phys.Chem. 81, 2340-2361 (1977). S8. K. Basso et al., Nat.Genet. 37, 382-390 (2005). S9. H. S. Girgis, Y. Liu, W. S. Ryu, S. Tavazoie, PLoS.Genet. 3, e154 (2007). S10. J. Courcelle, A. Khodursky, B. Peter, P. O. Brown, P. C. Hanawalt, Genetics 158, 41-64 (2001). S11. K. J. Cheung, V. Badarinarayana, D. W. Selinger, D. Janse, G. M. Church, Genome Res. 13, 206-215 (2003). S12. S. F. Elena and R. E. Lenski, Nat.Rev.Genet. 4, 457-469 (2003).
© Copyright 2026 Paperzz