Evolving a Primitive Eukaryotic Cell Cycle Model
Malte Lücken, Jotun Hein, Bela Novak
Abstract
Current published models of eukaryotic cell cycles are minimalistic: they contain only the components essential for the cell cycle to function. In reality, the networks making up these cell cycles must be much larger to be robust to deleterious mutations, exhibiting many alternative pathways that can be activated should one pathway fail. How large and redundant current cell cycles are can be investigated by modeling their evolution from a primitive eukaryotic model. In this project we present the Algorithm for Cell-cycle Evolution (ACE), which models the evolutionary process starting from a primitive eukaryotic cell cycle model. Preliminary results show the potential of this algorithm: it reproduces artifacts of known cell cycles and even generates a candidate primitive embryonic cell cycle model. However, the results also indicate room for further development of ACE, especially regarding the criteria used to generate selection pressure for “good” cell cycle models. Further, more detailed investigations into models evolved with ACE are necessary before comprehensive conclusions can be drawn.
1 Introduction
At the point of conception, whether by sexual or asexual reproduction, any organism is just a single cell. In order to create the biodiversity that we observe around us today, this cell must replicate itself by division. It is this replication that allows one cell to develop into a complex multicellular organism, or into a whole colony of bacteria. However, uncontrolled cell division and proliferation result in cancer. Understanding the regulatory network underlying cell division is therefore of central importance. In the cell division cycle of current eukaryotes, the cell goes through four distinct phases, with a possible fifth, quiescent phase that can be entered to halt the periodic cell division the cell otherwise undergoes.
The distinction of these four phases (called G1, S, G2, and M) ensures that each of the two daughter cells produced in one cycle obtains exactly one copy of every chromosome. Because this process occurs so frequently, it must be carried out with near-perfect precision and therefore requires careful regulation. Arguably the most important transition is the G1/S transition, where the cell proceeds from a first growth phase (G1) to the DNA synthesis phase (S). As it is of paramount importance that this synthesis occurs only once per cycle, eukaryotes have developed a system to ensure this, using a so-called licensing factor [9]. This licensing factor is present before DNA replication starts, and “loads” the origins of replication (ORs) on the DNA with the proteins needed to start replication. The accumulation of proteins at the ORs (“loading”) is, however, not enough to start DNA replication, which additionally requires the presence of an activator. Furthermore, once this activator is present, the licensing factor must no longer be present at a concentration sufficient to keep loading the ORs, as this would result in multiple firing of DNA replication and thus in more than two copies of the DNA. In light of these considerations, models such as the SIMM (substrate-inhibitor-multiply-modified) model [17] have been developed, which give the licensing factor the additional functions of being both an inhibitor and a substrate of the activator. Thus, the inhibitor both inhibits the activator and is modified by it, which causes its own degradation, resulting in a double negative feedback loop between inhibitor and activator, as seen in Figure 1. This network motif between the inhibitor and the activator can be observed not only in G1/S transitions, but also at many protein-regulated checkpoints in more complex cell cycle models, as it is characteristic of a bistable switch [17].
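To illustrate why a double negative feedback loop acts as a bistable switch, the following minimal Python sketch integrates a hypothetical pair of mutually inhibiting nodes using the sigmoidal node response of the Reinitz framework described in the Methods. The weight, offset and step size are illustrative assumptions and are not taken from the SIMM model [17]; the point is only that two initial conditions on either side of the symmetric state settle into two different stable states.

```python
import math


def logistic(z: float) -> float:
    """Sigmoid node response F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_switch(act0, inh0, w=-10.0, offset=5.0, dt=0.01, steps=5000):
    """Forward-Euler integration of a hypothetical activator-inhibitor pair.

    dAct/dt = F(offset + w * Inh) - Act
    dInh/dt = F(offset + w * Act) - Inh

    A negative weight w makes each node inhibit the other, giving a
    double negative feedback loop as in Figure 1. All values are illustrative.
    """
    act, inh = act0, inh0
    for _ in range(steps):
        d_act = logistic(offset + w * inh) - act
        d_inh = logistic(offset + w * act) - inh
        act, inh = act + dt * d_act, inh + dt * d_inh
    return act, inh


# Start just on either side of the symmetric (unstable) state Act = Inh = 0.5.
print(simulate_switch(0.6, 0.4))  # settles with Act high, Inh low
print(simulate_switch(0.4, 0.6))  # settles with Inh high, Act low
```

Because the symmetric state Act = Inh = 0.5 is unstable for this steep coupling, a small initial imbalance decides which of the two stable states the switch ends up in.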
Figure 1: Schematic diagram of a double negative feedback loop as seen between the Activator and Inhibitor in a SIMM model network.
In primitive eukaryotes it has been suggested that S and M phases can be triggered by the same activator [8, 2]. In such a case, the entire cell cycle can be modeled by the aforementioned G1/S transition model, now representing a “G” to “S/M” phase transition, where the cell cycle only switches between the initial growth phase “G” and the phase of DNA synthesis with subsequent segregation, “S/M” [9]. In the first phase the licensing factor is highly abundant and loads the ORs; during this phase the cell grows and the replication machinery is duplicated. In the “S/M” phase, the activator is highly abundant and the licensing factor has been degraded. Here, DNA synthesis occurs, and once the activator is degraded below a certain threshold, segregation and cell division are triggered [9]. Rather than representing the original G, S and M phases, the phases in this model represent a pre-replicative phase (G) and a replicative phase (S/M) [9]. This eukaryotic cell cycle model, while very simplified compared to the four-phase model described above, fulfills the condition of having only one copy of each chromosome in each daughter cell. It is therefore a viable model for a primitive eukaryote. On a molecular level, it can be modeled in several ways, two of which are shown in Figures 2 and 3. In the schematic diagrams in these figures, each node represents a protein or a protein complex, and arrows represent biochemical interactions. These reactions are reversible if the base of the arrow-line has a circular shape, as in the inhibitor-activator binding reaction that forms the complex. The arrowhead indicates the dominant direction of the interaction, meaning that the rate in this direction is larger than the opposite rate.
Furthermore, the four circles represent degradation, and arrows pointing towards a node that do not originate from another node denote synthesis. The equations derived from Figures 2 a) and 3 a) can be found in the supplementary material.
Figure 2: a) Schematic diagram of a very simple cell cycle model using only a single “activator” (kinase), inhibited by a stoichiometric inhibitor, which is also a substrate of the activator. The activator (ACT) catalyses the degradation of the inhibitor (INH) only in the complex form (COM). In addition to the interactions depicted, there is also a background decay of total inhibitor at rate $k_{di}$ and of total activator at rate $k_{da}$. b) Solution of the system of non-linear differential equations set up on the basis of the model depicted in a). The exact equations, as well as the parameter sets, can be found in the supplementary material. Mass is not shown on the plot, as its values are an order of magnitude higher. The transcription factor (TF) is modeled as a Hill function dependent on the activator.
Figure 2 depicts possibly the simplest activator-inhibitor model, with a single checkpoint at the G1/S phase transition. Here, the inhibitor and activator form a complex in a reversible process. The inhibitor in this complex is then degraded under catalysis by the activator, returning the activator in the complex to its free state. In this model it is important that $k_{ass} \gg k_{diss}$, meaning the inhibitor binds tightly to the activator, and that $k_{ass} > k_{cat}$, which ensures that the complex is an inactive form of the activator and not merely an intermediate step in activator-mediated inhibitor degradation. The latter condition ensures a time delay in inhibitor degradation, which is needed for the model to oscillate [4]. The key to oscillation in this model is, however, neither this time delay nor the inhibitor-activator motif, but the relationship between the transcription factor (TF) and the activator.
This relationship is modeled by a Hill function governing the level of TF (cf. Eq. (6.1g) in the Supplementary Material), and thus introduces a negative feedback loop into the system. This network motif is strongly linked with oscillatory systems [16, 4]. In this model, DNA synthesis is fired as soon as the activator level peaks. Cell division is modeled as mass halving when the activator level drops below a threshold value. Thus, mass is the parameter controlling cell cycle oscillation, with the mass growth rate determining the period of the cell cycle.
Figure 3: a) Schematic diagram of a simplified SIMM model [17]. The activator (ACT) catalyzes the degradation of the phosphorylated inhibitor (INH-P), which is produced by the activator-inhibitor complex (COM). In addition to the interactions depicted, there is also a background decay of total inhibitor at rate $k_{di}$ and of total activator at rate $k_{da}$. Furthermore, the activator, ACT, is synthesized at a rate $k_{sa} \times \mathrm{mass} \times TF_a$, where $TF_a$ is the transcription factor concentration. b) Solution of the system of non-linear differential equations set up on the basis of the model depicted in a). The exact equations, as well as the parameter sets, can be found in the supplementary material. The transcription factor (TF) is modeled as a Hill function dependent on the activator.
The simplified SIMM model of a primitive eukaryotic cell cycle is depicted in Figure 3. This model is very similar to the previous one, but differs in that it introduces a phosphorylated inhibitor protein (INH-P), whose degradation is catalyzed by the activator. This system is more robust to fluctuations in activator levels: in the absence of INH-P the activator cannot immediately start to degrade the inhibitor, and is instead immediately bound by any residual INH.
Moreover, this mechanism introduces a further time delay in inhibitor degradation, meaning that all activator is immediately bound by the inhibitor, and therefore the switch between high inhibitor and high activator levels becomes much quicker. This “quick” switch is desirable, as it represents a more sensitive checkpoint. When these simple models for a primitive eukaryote are contrasted with cell cycle models for fission yeast [13], the difference in complexity quickly becomes evident. However, if the simple models presented above do represent a primitive eukaryote, then the two are merely different stages on the same evolutionary ladder. In order to better understand this ladder, a model for evolution is needed. In this report, an algorithm to model cell cycle evolution is introduced. Besides modeling evolution, such an evolutionary model can provide information on the level of redundancy to be expected in cell cycle regulatory networks, as current models represent cell cycle networks that have been reduced to their essential components and are thus minimalistic. Furthermore, it may become possible on this basis to extract phylogenetic information from the cell cycle models of different species. The core of any evolutionary algorithm is an accurate model of the evolutionary process. Evolution from primitive eukaryotes to organisms with more complex, and therefore more robust and/or more sensitive, regulatory networks governing the cell cycle is a slow, large-scale process. On a smaller scale, this process comprises point mutations gradually altering the properties of individual proteins, gene duplications, deleterious mutations rendering whole proteins dysfunctional, etc. An algorithm that models this evolutionary process must therefore focus on the small-scale changes that underlie it. However, these changes are individually random and do not necessarily increase fitness. In nature this is balanced out by the large number of organisms.
All organisms undergo mutations; some increase fitness, others do not. Organisms that are fitter than their peers are more likely to secure the continued existence of their DNA, and thus also of the regulatory network that governs their cell cycle. This can happen in several ways, such as a higher likelihood of reproduction, or better adaptation to living conditions with a resulting higher survival rate. The Algorithm for Cell-cycle Evolution (ACE) presented in this report attempts to reflect this process in sufficient detail to model evolution accurately.
2 Results
2.1 Primitive Network
A cell cycle model is described by a set of ordinary differential equations (ODEs). When designing an algorithm to evolve a cell cycle, it is important that the equation space of all possible systems describing cell cycles is well defined, as the algorithm must be able to explore this space automatically to build new cell cycle models. In the case of the equations governing the models depicted in Figures 2 and 3 (found in the Supplementary Material), the mathematical space underlying the equations incorporates multiple types of interactions. These include catalysis, synthesis and degradation, and reversible and irreversible reactions of one and two reactants, all of which look mathematically different. In these molecular models, the complexity necessary to model an oscillating network arises from an intricate wiring of many mathematically simple interactions. Due to the large number of possible combinations of interactions, the equation space is large, as is the parameter set describing a system. Sampling such a large space for possible cell cycle networks is slow, which makes a molecular model of the cell cycle a poor basis for an evolutionary algorithm. Here, we instead use a mathematical framework developed by Reinitz and colleagues [7], as described in [16] (cf. Methods).
This framework uses simple combinations of only two types of interactions to generate the complexity necessary to model the cell cycle. It does so by hiding the complexity in the sigmoidal response of each node to its interaction signals, so that fewer interaction types are needed and the equation space is smaller. Instead of modeling basic interactions such as binding, synthesis, and degradation of molecular components, this framework models interactions simply as activation or inhibition. This approach also reduces the number of nodes necessary in the network: for example, the inhibitor that binds tightly to the activator to form a complex, which produces phosphorylated inhibitor to be degraded under catalysis by the activator, is now simply the double negative feedback loop shown in Figure 1. This type of interaction is the basis of both aforementioned molecular models, so both can be summarized in a single Reinitz model. This model is constructed to have the minimum number of nodes required for an oscillating network, three [4, 16], as well as a mass input. The network and its solution are shown in Figure 4.
Figure 4: a) Schematic diagram of a primitive cell cycle in the Reinitz framework. There is a double negative feedback loop between activator (Act) and inhibitor (Inh), and a negative feedback loop between activator and transcription factor (TF), which is the main oscillator. b) Solution of the system of non-linear differential equations set up on the basis of the model depicted in a). The equations are given in the text. It must be noted that although mass is depicted as a node, it does not behave as a protein node in the Reinitz framework. Instead, it grows exponentially, with cell division modeled as mass halving once the activator concentration falls below 0.5.
The equations solved on this network, modeled within the Reinitz framework, are:
$$\frac{d\,\mathrm{Act}}{dt} = \gamma_1 \left[ F(\sigma_1 W_1) - \mathrm{Act} \right], \qquad W_1 = \omega_{10} + \omega_{12}\,\mathrm{Inh} + \omega_{13}\,\mathrm{TF} \qquad (2.1a)$$
$$\frac{d\,\mathrm{Inh}}{dt} = \gamma_2 \left[ F(\sigma_2 W_2) - \mathrm{Inh} \right], \qquad W_2 = \omega_{20} + \omega_{21}\,\mathrm{Act} \qquad (2.1b)$$
$$\frac{d\,\mathrm{TF}}{dt} = \gamma_3 \left[ F(\sigma_3 W_3) - \mathrm{TF} \right], \qquad W_3 = \omega_{31}\,\mathrm{Act} + \omega_{34}\,\mathrm{mass} \qquad (2.1c)$$
$$\frac{d\,\mathrm{mass}}{dt} = \mu\,\mathrm{mass} \qquad (2.1d)$$
where $\omega_{12}$, $\omega_{21}$ and $\omega_{31}$ are negative constants reflecting inhibition, and $\omega_{13}$ and $\omega_{34}$ are positive constants reflecting activation. $\mu$ is the mass growth rate, and $F$ is the sigmoid function $F(\sigma_i W_i) = 1/(1 + \exp(-\sigma_i W_i))$. This simulation incorporates all features expected of a primitive cell cycle. Firstly, the activator peaks quickly after a period of high inhibitor concentration, denoting a sensitive and quick transition into the “S/M” phase. Secondly, the activator decays slowly, gradually resetting for the “S/M” to G transition and allowing time for DNA synthesis before cell division is triggered. Moreover, the cycle has a constant period.
2.2 ACE - Algorithm for Cell-cycle Evolution
The Algorithm for Cell-cycle Evolution (ACE) presented here computes a possible evolutionary path using the above network as a starting point. It does so using an approach that is designed to closely follow the natural evolutionary process explained in the introduction, and that has been shown to be successful in modeling pathway evolution [10, 14]. In ACE, a simulation is set up containing N individual, independent “organisms”, each of which is represented by a cell cycle network with a specific parameter set. When the program is initialized, all N organisms are identical and must be functional for the initial networks to represent a viable starting point for an evolutionary path. To complete the initialization, the fitness of each organism is calculated and the fitnesses are scaled to sum to 1. Each evolutionary step then starts with one organism being chosen for reproduction and one organism being chosen for deletion.
These organisms are chosen randomly, with their relative fitness determining their likelihood of reproduction or survival. The organism to be reproduced is duplicated, and the copy replaces the deleted organism in the simulation. Subsequently, the duplicated daughter organism is mutated, producing a new, evolved cell cycle model. Finally, the fitness of each cell cycle is recalculated and rescaled. This process is shown in Figure 5.
Figure 5: Schematic diagram of the computational setup of ACE. A simulation step is carried out by first randomly choosing a network for reproduction and one for deletion, weighted by the network fitnesses F (reproduction) or by 1 − F rescaled to sum to 1 (deletion). The network to be reproduced is then duplicated and “evolved” by a random mutation, and saved in place of the network chosen for deletion. Finally, the fitnesses of all networks in the simulation are recalculated and rescaled to sum to 1.
The approach implemented in ACE means that all organisms in the simulation are in constant competition for survival. Because reproduction and deletion are weighted by fitness, fitter organisms have a better chance to survive, which gives a central role to the method used to calculate fitness: the characteristics this function focuses on determine what creates selection pressure. However, the core of the algorithm is the mutation method. This method defines the individual steps that make up the greater evolutionary process, and thereby constitutes the sampler of cell cycle equation space. These two functions are explained briefly below.
2.2.1 Fitness calculation
In an evolutionary process, every network along the evolutionary path from a primitive cell cycle to any evolved cell cycle must be a viable model for a cell cycle network.
This must be the case, as a mutation of an ancestral cell cycle that does not support cell cycle function would cause the organism carrying this mutation to die, and thus its DNA would not be preserved. Therefore, having a functioning cell cycle must be central to the fitness calculation in ACE, and thus the dominant criterion creating selection pressure. Whether a cell cycle is functional or not is determined by its most fundamental necessary condition: oscillation. If a network is to represent a cell cycle, the node concentrations must follow the same path in every cell cycle oscillation, as the cell cycle is traversed in the same way countless times in any organism. This describes a limit cycle oscillation [11, Chapter 7]. ACE calculates the fitness of a proposed cell cycle network by assigning a value of $1 - Nc$ to all networks that have the potential to enter a limit cycle oscillation, while those without this potential are assigned a fitness of 0. Here, $N$ is the number of nodes in a network and $c$ is a constant fitness cost per node. A network's fitness is penalized by its number of nodes because transcription and translation of proteins cost energy, so larger networks have larger associated energy costs. However, as the number of proteins in eukaryotic proteomes tends to be of the order of $10^4$ to $10^5$ [5], the additional energy cost of one protein in any primitive eukaryotic organism should be relatively small. Further details on the fitness calculation can be found in the Methods section.
2.2.2 Mutation
The actual sampler of cell cycle equation space in ACE is its mutation function. This function defines how a cell cycle network can be evolved into a network that represents the next step in evolution. ACE allows one of seven different types of mutation to be applied to the network's adjacency matrix, and also allows the continuous node property parameters to be mutated.
The seven types of mutation consist of five network topology mutations (node addition, node duplication, node deletion, edge addition, and edge deletion) and two edge weight mutations (edge scaling and edge inversion). These mutation steps and the mutation of continuous node property parameters are elaborated on in the Methods section. As mentioned in the introduction, current cell cycle models are minimalistic, as they represent only the essential interactions needed to model the cell cycle of an organism. In reality, a large amount of redundancy can be expected, as any essential pathway, such as the cell cycle, must be robust to random mutations [10]. For this reason, two important concepts are incorporated into ACE. Firstly, the nodes and edges of the initial network input into the algorithm cannot be deleted, and its edges cannot be switched between activation and inhibition by edge inversion. This is because these edges represent interactions essential for cell cycle function, and are thus likely to be conserved throughout evolution. This also has the practical benefit of conserving the central oscillator between the transcription factor and the activator, which helps keep evolved networks in an oscillating regime. Secondly, an equilibration period is implemented in ACE. During this period, the initial network is evolved only by node additions, interaction additions, and node duplications. This grows the network to a certain level of redundancy before the full range of mutations is allowed. The equilibration runs either until all networks have reached a predefined number of nodes, or for a predefined number of steps per node. This enables a more realistic evolutionary pathway.
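The selection step of Figure 5 can be sketched in a few lines of Python. This is a hypothetical illustration rather than ACE's own code: the function names, the cost per node c = 0.01, and the toy "network" (just a node count) are assumptions. Fitness follows the rule of Section 2.2.1 ($1 - Nc$ if the oscillation test passes, else 0), reproduction is weighted by the rescaled fitness F, and deletion by 1 − F rescaled to sum to 1.

```python
import copy
import random


def fitness(n_nodes, oscillates, cost_per_node=0.01):
    """Fitness rule of Section 2.2.1: 1 - N*c if the oscillation test passes, else 0.

    cost_per_node (c) = 0.01 is a hypothetical placeholder value.
    """
    return 1.0 - n_nodes * cost_per_node if oscillates else 0.0


def rescale(values):
    """Rescale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def ace_step(population, evaluate, mutate, rng):
    """One reproduction/deletion step in the style of Figure 5 (sketch).

    evaluate(net) -> raw fitness; mutate(net, rng) -> mutated network.
    """
    f = rescale([evaluate(net) for net in population])
    indices = range(len(population))
    parent = rng.choices(indices, weights=f, k=1)[0]  # weighted by F
    victim = rng.choices(indices, weights=rescale([1 - x for x in f]), k=1)[0]  # by 1 - F
    # The daughter is a mutated copy of the parent and replaces the deleted organism.
    population[victim] = mutate(copy.deepcopy(population[parent]), rng)
    return population


# Toy usage: a "network" is just a node count, the hypothetical mutation always
# adds one node, and every network is assumed to pass the oscillation test.
def add_one_node(net, rng):
    net["nodes"] += 1
    return net


rng = random.Random(0)
population = [{"nodes": 3} for _ in range(3)]
for _ in range(10):
    ace_step(population, lambda net: fitness(net["nodes"], True), add_one_node, rng)
print([net["nodes"] for net in population])
```

The sketch does not prevent the same organism from being picked for both reproduction and deletion, since the text does not say whether ACE excludes this. During the equilibration period, `mutate` would be restricted to node additions, interaction additions and node duplications.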
2.3 Evolved Networks
The evolved networks shown in Figures 6, 7, and 8 are results from different ACE runs of 100 iterations on the primitive network shown in Figure 4, with an equilibration period lasting until all networks had at least 6 nodes (not including mass). The three evolved models presented here are examples from runs that generated 12 evolved networks with non-zero fitness, 6 of which were found to have oscillating solutions with the output parameter set. The results should be interpreted as preliminary studies that show the potential of ACE, as they were obtained for a small number of iterations with the minimum simulation size of 3 organisms.
Figure 6: a) Schematic diagram of the evolved network. Added nodes are labelled P3 to P10; green interactions represent positive interactions (activation) and red interactions represent negative interactions (inhibition). The thickness of each depicted interaction is scaled by its weight. b) Solution of the system of equations derived from the network depicted in a). The parametrization of the solution was obtained directly from ACE and can be found in the supplementary material. The inhibitor is not visible on this plot, as it asymptotes to 0. The activator can be seen to oscillate in bursts.
Figure 6 shows an evolved network in which the inhibitor concentration asymptotes to 0, and thus does not take part in the oscillation. Furthermore, the oscillation does not seem to have a constant period, as a longer period is seen to be followed by a shorter one. This artifact is also observed in the “quantized” periods of the fission yeast cell cycle [13]. Moreover, the activator concentration can be seen to oscillate in bursts. This type of behaviour could be caused by the two coupled oscillator motifs (negative feedback loops [16]) between the activator, the transcription factor and the P8 node shown in Figure 6 a).
This coupling may result in a superposition of two oscillations at similar frequencies, causing the observed beat pattern. The negative feedback loop between the inhibitor and P4 can be ignored in this interpretation, as the inhibitor concentration is close to 0.
Figure 7: a) Schematic diagram of the evolved network. Added nodes are labelled P3 to P8; green interactions represent positive interactions (activation) and red interactions represent negative interactions (inhibition). The thickness of each depicted interaction is scaled by its weight. b) Solution of the system of equations derived from the network depicted in a). The parametrization of the solution was obtained directly from ACE and can be found in the supplementary material. The activator, inhibitor and TF concentrations progress in bursts; however, the period seems to be relatively stable.
The network shown in Figure 7 shows a burst-like activator progression similar to that of the previous model, but with a more regular period. A possible explanation for this behaviour could again be the number of negative feedback loops, which couple the entire network into two groups (TF-Act-P8 and the rest). The superposition of these many oscillators may give rise to the observed pattern.
Figure 8: a) Schematic diagram of the evolved network. Added nodes are labelled P3 to P9; green interactions represent positive interactions (activation) and red interactions represent negative interactions (inhibition). The thickness of each depicted interaction is scaled by its weight. b) Solution of the system of equations derived from the network depicted in a). The parametrization of the solution was obtained directly from ACE and can be found in the supplementary material. Apart from the mass asymptoting to 0, the cell cycle is similar to the primitive cell cycle model input into the system, suggesting it may be a viable model.
In contrast to the previous two networks, Figure 8 shows the progression expected of a cell cycle, with the exception of mass asymptoting to 0. This cell cycle exhibits a quick change between high inhibitor and high activator regimes, a slow decay in activator concentration that gives the cell sufficient time for DNA synthesis before cell division occurs, and a regular period. Interestingly, investigations into the evolution of the adjacency matrix, as well as the network diagram, suggest that this cell cycle has undergone more node duplications than the previous examples. This indicates that node duplication may have played an important part in cell cycle evolution, a result also found by Tagkopoulos et al. [14] for metabolic networks. Furthermore, this cell cycle exhibits more stabilizing motifs than those in Figures 6 and 7. These stabilizing motifs are double negative or positive feedback loops, which can stabilize the oscillation [16]. They may have played a role in preventing the burst-like pattern exhibited by the two previous models, as this model also has multiple negative feedback loops. However, further investigations are needed to confirm these conjectures.
3 Methods
3.1 Reinitz Framework
The mathematical framework used to describe the cell cycle models in ACE was originally proposed by Reinitz and colleagues [7]. It is used here as interpreted and developed by Tyson and Novak [16]. The Reinitz framework incorporates the non-linearity of biological processes directly in the node equation, and can therefore create non-linearity via only two types of interactions, which are governed by a small set of parameters. In this framework, the network node concentrations $X_i$ are modeled via the equations:
$$\frac{dX_i}{dt} = \gamma_i \left[ F(\sigma_i W_i) - X_i \right], \qquad W_i = \omega_{i0} + \sum_{j=1}^{N} \omega_{ij} X_j, \qquad i = 1, \dots, N \qquad (3.1)$$
where the function $F(\sigma_i W_i)$ is given by:
$$F(\sigma_i W_i) = \frac{1}{1 + \exp(-\sigma_i W_i)} \qquad (3.2)$$
In this framework, $\gamma_i$ is the parameter that governs the time scale on which $X_i$ is regulated, $\sigma_i$ determines the slope of the sigmoid governing interactions with node $X_i$ and thus characterizes how strongly this node behaves as a “binary switch”, and $W_i$ is a function that incorporates all interactions with node $X_i$ in the network. The variables $\omega_{ij}$ in (3.1) denote the strength of the interaction of node $X_j$ with node $X_i$, and $\omega_{i0}$ is the sigmoid offset, which determines the $W_i$ threshold for interaction. This last parameter can be interpreted as the synthesis minus the degradation of $X_i$. Finally, $N$ denotes the number of nodes in the network. This model scales all quantities $X_i$ to a maximum of 1, reducing the number of parameters. Furthermore, the sigmoid function $F$ from (3.2) is a reasonable approximation for most interaction types, such as the hyperbolas resulting from mass action, or even Hill functions scaled to 1. In fact, Equation (3.1) can be shown to be a generalization of a simple synthesis and degradation differential equation in the molecular model framework. A more in-depth discussion of this model can be found in [7]. In this framework, the tunable parameters that can be made subject to mutations in the evolutionary algorithm are $\gamma_i$, $\omega_{i0}$, $\sigma_i$, and the interaction matrix $\omega_{ij}$.
3.2 ODE solver
All systems of non-linear ODEs in the scope of this project were solved using the XPP/XPPAUT software developed by Bard Ermentrout's lab [3]. Parametrization of both the molecular and the Reinitz primitive cell cycle models was done with the aid of the XPP-AUTO function, which generates bifurcation diagrams of two-dimensional systems of ODEs. These diagrams visualize the effect of parameter variation on the limit cycle, greatly simplifying the search for good parameter sets.
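All simulations in this report were run in XPPAUT. As a hypothetical, tool-independent alternative, Eq. (3.1), with the primitive network of Eq. (2.1) as an instance, can be integrated by a simple forward-Euler loop. The sketch below assumes NumPy; the function names, the step size, the downward-crossing rule used for the mass-halving event, and all parameter values are illustrative assumptions. In particular, they are not the parameter sets from the supplementary material and are not tuned to oscillate.

```python
import numpy as np


def logistic(z):
    """F in Eq. (3.2), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def reinitz_rhs(x, omega, omega0, sigma, gamma, ext=0.0, ext_weights=None):
    """Eq. (3.1) in vector form: gamma * (F(sigma * W) - x), W = omega0 + omega @ x.

    ext / ext_weights add an input that is not a Reinitz node (here: mass).
    """
    w = omega0 + omega @ x
    if ext_weights is not None:
        w = w + ext_weights * ext
    return gamma * (logistic(sigma * w) - x)


def simulate_primitive(params, x0, mass0, mu, t_end, dt=0.01, threshold=0.5):
    """Forward-Euler integration of Eq. (2.1) for x = (Act, Inh, TF).

    Mass grows exponentially (2.1d) and is halved whenever Act falls
    below `threshold`, mimicking cell division. Rows are (Act, Inh, TF, mass).
    """
    omega, omega0, sigma, gamma, w_mass = params
    x, mass = np.asarray(x0, dtype=float), float(mass0)
    n_steps = int(round(t_end / dt))
    traj = np.empty((n_steps + 1, 4))
    traj[0] = [*x, mass]
    for k in range(1, n_steps + 1):
        act_before = x[0]
        x = x + dt * reinitz_rhs(x, omega, omega0, sigma, gamma, mass, w_mass)
        mass += dt * mu * mass
        if act_before >= threshold > x[0]:  # downward crossing: divide
            mass /= 2.0
        traj[k] = [*x, mass]
    return traj


# Placeholder parameters with the sign pattern of Eq. (2.1); these are NOT the
# values from the supplementary material and are not tuned to oscillate.
omega = np.array([[0.0, -8.0, 6.0],   # Act <- Inh (omega_12 < 0), TF (omega_13 > 0)
                  [-8.0, 0.0, 0.0],   # Inh <- Act (omega_21 < 0)
                  [-6.0, 0.0, 0.0]])  # TF  <- Act (omega_31 < 0)
omega0 = np.array([-1.0, 4.0, 0.0])   # omega_10, omega_20; (2.1c) has no offset
sigma = np.array([2.0, 2.0, 2.0])
gamma = np.array([1.0, 1.0, 0.5])
w_mass = np.array([0.0, 0.0, 5.0])    # omega_34 > 0: mass activates TF
traj = simulate_primitive((omega, omega0, sigma, gamma, w_mass),
                          x0=[0.1, 0.9, 0.1], mass0=0.5, mu=0.01, t_end=100.0)
print(traj[-1])
```

Forward Euler is chosen only for transparency: with $\gamma_i\,dt \le 1$, each update is a convex combination of $X_i$ and $F(\sigma_i W_i)$, so the protein variables stay between 0 and 1. A stiff or adaptive solver would be preferable for real parameter scans.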
The systems were simplified to two dimensions by modeling some nodes as “fast”, meaning that their concentrations adapt immediately to the slower variables. This allowed them to be described as functions of the slower variables rather than by differential equations.
3.3 Evolutionary algorithm
The setup of ACE is summarized in Figure 5. The input network is read into the program from a file in a specific format, which includes all relevant information on the adjacency matrix, the number of nodes, and the $\omega_0$, $\sigma$ and $\gamma$ values. This network is copied to all N “organisms” in the simulation, which then become distinct during equilibration. ACE also writes its output to a file. The networks in this output file are described in the same format as the input network, together with their fitnesses. Furthermore, all run parameters are listed at the head of this file. Details about the format and the algorithm can be found in the comments in the code, which is available upon request. The preliminary results shown in Figures 6, 7 and 8 were generated with different random seeds and the same simulation parameters, which can be found in the supplementary material. Results can be reproduced by hard-coding the random seeds into the algorithm. A Python script to convert the output of ACE into a format readable by Bard Ermentrout's XPPAUT was also created, and is also available upon request. The general setup of the algorithm is modeled after Soyer et al.'s pathway evolution software [10] and Tagkopoulos et al.'s EVE software [14]. What distinguishes ACE, and where most of its complexity lies, however, are the fitness calculation function and the mutation function, which are elaborated on below.
3.3.1 Fitness Calculation
As mentioned previously, a necessary condition for a dynamic network to represent a cell cycle is limit cycle oscillation.
Thus, to determine the fitness of a network as a cell cycle, it must be investigated whether the system of nonlinear ODEs defined on this network oscillates. In two- and three-dimensional systems, the standard method of testing for oscillating solutions is linear stability analysis [4, 11]. This method examines the behavior of the linearized system in the vicinity of a fixed point, i.e. a point where dX_j/dt = 0 for all j. This is done by computing the Jacobian matrix of the system at that fixed point and calculating its eigenvalues [12, 4]. Complex eigenvalues of the Jacobian indicate oscillatory behavior near the fixed point, but the sign of their real part determines whether a limit cycle oscillation can arise. A negative real part corresponds to a damped oscillation, so that the system asymptotically approaches the fixed point [4]; this type of fixed point is stable [11]. To obtain a sustained oscillation, we instead need an unstable fixed point, which gives the condition [4, 11, 12]:

$$\exists\, \lambda \in \Lambda : \; x > 0 \,\wedge\, y \neq 0, \quad \lambda = x + iy \qquad (3.3)$$

Here, Λ denotes the set of all eigenvalues of the Jacobian, λ is one of them, and x and y are its real and imaginary parts. The condition effectively states that we are looking for a solution that oscillates outward from a fixed point [12]. Condition (3.3) on its own therefore does not ensure a sustained limit cycle oscillation in the absence of a controlling parameter. However, such an autonomous limit cycle oscillation is not what we require of a cell cycle model. As the molecular models described in the introduction show, mass plays an essential role in controlling cell cycle oscillation. Because the mass of the system is not modeled as a continuous function of the other variables (cf. Equation (2.1d)), it cannot be included in the eigenvalue and fixed point calculations.
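Condition (3.3) can be checked numerically once a fixed point is known. The following sketch assumes NumPy is available and leaves out how fixed points are located (e.g. with a root finder); it approximates the Jacobian by central differences and tests for an eigenvalue with positive real part and non-zero imaginary part. The function names and the tolerance `tol`, added to cope with floating-point noise, are assumptions and not taken from ACE.

```python
from typing import Callable, Sequence

import numpy as np


def numerical_jacobian(
    f: Callable[[np.ndarray], Sequence[float]], x_star: Sequence[float], h: float = 1e-6
) -> np.ndarray:
    """Central-difference approximation of J[i, j] = d f_i / d x_j at x_star."""
    x0 = np.asarray(x_star, dtype=float)
    n = x0.size
    jac = np.empty((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        jac[:, j] = (np.asarray(f(x0 + step)) - np.asarray(f(x0 - step))) / (2.0 * h)
    return jac


def spirals_outward(jac: np.ndarray, tol: float = 1e-9) -> bool:
    """Condition (3.3): some eigenvalue lambda = x + iy has x > 0 and y != 0."""
    eigenvalues = np.linalg.eigvals(jac)
    unstable = eigenvalues.real > tol
    oscillatory = np.abs(eigenvalues.imag) > tol
    return bool(np.any(unstable & oscillatory))


# Usage: a linear system whose fixed point at the origin is an unstable spiral
# (eigenvalues 0.1 +/- 1i).
A = np.array([[0.1, -1.0], [1.0, 0.1]])
J = numerical_jacobian(lambda x: A @ x, [0.0, 0.0])
print(spirals_outward(J))  # True
```

A stable spiral (negative real parts) or an unstable node (real eigenvalues only) both return `False`, mirroring the two cases that condition (3.3) excludes.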
These calculations are therefore performed at constant mass values ranging from 0.1 to 0.9 in steps of 0.1, meaning that mass control is not taken into account. Finding an unstable steady state around which the dynamic solution spirals outward indicates that either a mass-controlled limit cycle or an uncontrolled limit cycle, as often seen in embryonic cell cycles [1, Chapter 17], can occur. This is therefore regarded as sufficient for a network to be considered a cell cycle, as elaborated on in the Discussion section.

In light of these considerations, and following Soyer and Bonhoeffer's investigation of the evolution of complexity in generic signaling pathways [10], the fitness of a cell cycle in ACE is calculated as

$$\text{Fitness} = \begin{cases} 1 - N c & \text{if } \alpha = 1 \\ 0 & \text{if } \alpha = 0 \end{cases}$$

where N is the number of nodes in the network and c is the fitness cost per node (NODE FITNESS COST in Table 5). Here, α = 1 if the network fulfills condition (3.3) and α = 0 otherwise. A fitness of 0 is assigned to non-functional cell cycles to ensure that these are not reproduced; therefore, the evolutionary pathway taken by any functioning cell cycle output by ACE is one that could have occurred naturally.

3.3.2 Network Mutation

As mentioned previously, the mutation function incorporates seven possible mutation steps that evolve a cell cycle network into one representing the next generation of the cell cycle in an evolutionary process. Each time the mutation function is called, one node is picked at random, and one of these mutations is performed according to predefined probabilities; nodes of the initial network are, however, preserved to a certain extent. If an edge mutation is to be performed, one of the incoming or outgoing interactions of this node is selected for mutation, with the exception of the mass interaction of the initial network, which is kept unchanged to encourage the conservation of mass control in evolved networks. Additionally, there is a probability that the node properties (i.e. ω0, σ and γ) of the selected node are also scaled.
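The selection step of the mutation function could be sketched as follows, using the relative probabilities Mut[0] to Mut[6] and NODE PROP MUTATION from Table 5. This is a hypothetical sketch: `choose_mutation` and `MUTATION_WEIGHTS` are invented names, the protection of initial-network nodes and of the mass interaction is not modeled, and during equilibration the EqMut probabilities would take the place of the Mut values.

```python
import random
from typing import Dict, Tuple

# Relative probabilities Mut[0]..Mut[6] from Table 5 (they need not sum to 1).
MUTATION_WEIGHTS: Dict[str, float] = {
    "node duplication": 0.1,
    "node deletion": 0.2,
    "node addition": 0.1,
    "interaction deletion": 0.3,
    "interaction addition": 0.3,
    "interaction scaling": 0.3,
    "interaction inversion": 0.15,
}
NODE_PROP_MUTATION = 0.5  # probability that node properties are also scaled (Table 5)


def choose_mutation(rng: random.Random, n_nodes: int) -> Tuple[int, str, bool]:
    """Pick a node uniformly at random, a mutation type by relative weight, and
    whether the node's properties (omega0, sigma, gamma) are also scaled."""
    node = rng.randrange(n_nodes)
    kinds = list(MUTATION_WEIGHTS)
    kind = rng.choices(kinds, weights=[MUTATION_WEIGHTS[k] for k in kinds], k=1)[0]
    scale_props = rng.random() < NODE_PROP_MUTATION
    return node, kind, scale_props


print(choose_mutation(random.Random(7), 4))
```

Since `random.choices` normalizes the weights, node deletion, for example, is chosen with probability 0.2 / 1.45 ≈ 0.14.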
Node property scaling is treated as a separate, additional step because large-scale changes, such as node additions, deletions and duplications or changes in interactions, are likely to also alter properties such as the binding energy between two proteins, slightly changing the way a node behaves within the cell cycle. In reality, these properties may change slightly and much more often than other mutations occur; ACE instead models them as larger changes over longer time periods.

Of the seven mutation processes, node deletion, edge deletion and node duplication are straightforward: a node or edge is simply selected at random and deleted or duplicated. Edge inversion switches an interaction between inhibition and activation, i.e. it multiplies the interaction weight ω_ij by −1. In edge addition, an interaction that the selected node does not already have is chosen from all possible incoming and outgoing interactions, excluding interactions outgoing to the mass node. The chosen interaction is added by sampling its weight from a uniform distribution over the domain of allowed interaction weights given in Table 1. The weight assigned to this interaction must fulfill the condition |ω_ij| > 0.01.

Node addition is implemented similarly to edge addition. In this mutation type, a new node is added to the network with the same number of interactions as a randomly chosen existing node. This node is forced to have at least one incoming and one outgoing interaction, with all interaction weights sampled in the same way as in edge addition. The ω0 value is sampled from a uniform distribution, similarly to ω_ij; however, if none of the incoming interactions are inhibitory, the domain of ω0 is restricted to negative values. Otherwise, as evident from Equation (3.1), the node concentration could only increase when simulated. The node parameters γ and σ are sampled from the probability density function (pdf) p(x) = 1/(2x ln 10) defined on [0.1, 10]. This pdf is uniform on a logarithmic scale and has a median (and geometric mean) of 1.

Interaction scaling and the scaling of node properties follow a similar line of thought. As nothing is known about which parameter regions give a functioning cell cycle model before the networks are evaluated by the fitness function, it is impossible to drive the network parameters into these regions via mutation. Doing so would be a common approach, representing a discrete analogue of the Ornstein-Uhlenbeck process [6, Chapter 15]. Instead, the mutations in ACE are unbiased: multiplicative scaling factors have a median of 1 (their exponent has zero mean) and additive changes have zero mean, so that parameters are not changed on average. This represents a random walk through parameter space [6, Chapter 15]. For the same reason, interaction weights are sampled from a uniform distribution.

Accordingly, interaction scaling is implemented by sampling a value from the Gaussian distribution N(0, 0.5), with standard deviation 0.5, and using it as an exponent of 2 to give a scaling factor. The Gaussian sample is bounded to the domain [−2, 2], meaning that the maximum scaling is by a factor of 1/4 or 4, while approximately 68% of scaling factors lie between 0.707 and 1.414. In node property scaling, γ and σ are scaled by a factor 10^z with z sampled from the Gaussian N(0, 0.1), while a new ω0 value is sampled directly from the Gaussian N(ω0^old, 0.5), where ω0^old is the previous value of ω0. As in node addition, if there are no incoming inhibitory interactions, ω0 is only sampled in the negative range. Moreover, all scaling keeps the scaled values within their respective domains, shown in Table 1. The mass growth rate µ from Equation (2.1d) and the threshold at which the mass is halved are kept constant during evolution in ACE, as these values control only the oscillation period.
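The sampling and scaling rules of the mutation function can be sketched with Python's standard `random` module. This is an illustration under stated assumptions, not ACE's implementation: all function names are hypothetical, the bound on the Gaussian exponent is enforced by rejection sampling, and values are kept inside the Table 1 domains by clipping (for ω_ij, γ and σ) or by rejection (for ω0), since the text does not specify how this is done.

```python
import math
import random

# Allowed domains from Table 1.
OMEGA_IJ_DOMAIN = (-10.0, 10.0)
OMEGA_0_DOMAIN = (-5.0, 5.0)
GAMMA_SIGMA_DOMAIN = (0.1, 10.0)


def clip(value: float, bounds: tuple) -> float:
    lo, hi = bounds
    return min(max(value, lo), hi)


def new_interaction_weight(rng: random.Random) -> float:
    """Edge addition: uniform on the omega_ij domain, redrawn until |w| > 0.01."""
    while True:
        w = rng.uniform(*OMEGA_IJ_DOMAIN)
        if abs(w) > 0.01:
            return w


def invert_interaction(w: float) -> float:
    """Edge inversion: activation <-> inhibition."""
    return -w


def log_uniform(rng: random.Random, lo: float = 0.1, hi: float = 10.0) -> float:
    """Density p(x) = 1 / (x ln(hi/lo)) on [lo, hi], i.e. uniform in log10(x)."""
    return 10.0 ** rng.uniform(math.log10(lo), math.log10(hi))


def truncated_gauss(rng: random.Random, mu: float, sd: float, lo: float, hi: float) -> float:
    """Gaussian sample restricted to [lo, hi] by rejection (assumed truncation method)."""
    while True:
        z = rng.gauss(mu, sd)
        if lo <= z <= hi:
            return z


def scale_interaction(rng: random.Random, w: float) -> float:
    """w * 2**z with z ~ N(0, 0.5) bounded to [-2, 2], so the factor lies in [1/4, 4]."""
    factor = 2.0 ** truncated_gauss(rng, 0.0, 0.5, -2.0, 2.0)
    return clip(w * factor, OMEGA_IJ_DOMAIN)


def scale_gamma_or_sigma(rng: random.Random, value: float) -> float:
    """value * 10**z with z ~ N(0, 0.1), kept inside [0.1, 10] by clipping."""
    return clip(value * 10.0 ** rng.gauss(0.0, 0.1), GAMMA_SIGMA_DOMAIN)


def resample_omega0(rng: random.Random, omega0_old: float, has_inhibitory_input: bool) -> float:
    """New omega0 ~ N(omega0_old, 0.5); restricted to values <= 0 when the node
    has no inhibitory input."""
    upper = OMEGA_0_DOMAIN[1] if has_inhibitory_input else 0.0
    for _ in range(1000):
        w0 = rng.gauss(omega0_old, 0.5)
        if OMEGA_0_DOMAIN[0] <= w0 <= upper:
            return w0
    return clip(omega0_old, (OMEGA_0_DOMAIN[0], upper))  # fallback, also an assumption


rng = random.Random(1)
print(new_interaction_weight(rng), log_uniform(rng), scale_interaction(rng, 3.0))
```

With lo = 0.1 and hi = 10, the density in `log_uniform` is 1/(x ln 100) = 1/(2x ln 10), and its median is sqrt(0.1 · 10) = 1.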
| Mutated variable | Domain |
|---|---|
| ω_ij | [−10, 10] |
| ω_i0 | [−5, 5] |
| γ_i | [0.1, 10] |
| σ_i | [0.1, 10] |

Table 1: Allowed domains for mutated parameters

4 Discussion and Conclusions

The preliminary results obtained from short simulations of ACE on the primitive cell cycle model in the Reinitz framework show that, while not all evolved networks represent good cell cycles, they do generate artifacts found in current eukaryotic cell cycle models. Figures 6 and 7 in particular do not represent expected outcomes of eukaryotic cell cycles; however, the beat-frequency pattern generated by these models can be likened to activator progression in the cell cycle of Chlamydomonas reinhardtii [18]. This alga exhibits different activity of a cell cycle related kinase depending on the light-dark cycle, which controls growth rates. Figure 8, on the other hand, fulfills the characteristics of a good cell cycle model but does not seem to be mass-controlled. This type of model may be relevant when considering embryonic cell cycles [1, Chapter 17]. Obtaining such a mass-independent embryonic cell cycle was mentioned as a possible outcome of ACE, since the fitness calculation uses constant mass values. This, together with the fact that only 6 of the 12 cell cycle networks resulting from the preliminary simulations are functional for the given parameter set, shows that the fitness function has room for improvement.

The fitness function is built around finding steady states of the system that fulfill condition (3.3), meaning that the main criterion for the fitness of a network is that the dynamic solution spirals outward from an unstable fixed point. Such an oscillating solution can generally follow one of three dynamic paths: it can spiral outward toward infinity, it can spiral outward and settle at another fixed point, or it can spiral outward toward a limit cycle around the unstable fixed point.
Within the mathematical framework implemented here, the first is impossible, as the domain is bounded by X_j ∈ [0, 1] for all j. If the last occurs, the oscillation condition is fulfilled and the network can be seen as a viable cell cycle model. However, should the solution spiral toward a stable steady state, the problem becomes more complex. Steady states move, and can disappear, when a bifurcation parameter is varied [11, Chapter 3]. If the stable steady state that a solution spirals toward moves or disappears due to the variation of such a bifurcation parameter, the solution can enter a limit cycle oscillation previously blocked by this steady state. When mass is this bifurcation parameter, this is exactly what is meant by mass-controlled limit cycle oscillation. Even if the bifurcation parameter is not mass, the network can still be regarded as a potential cell cycle model. As the aim of this algorithm is to find new cell cycle network topologies, establishing the potential for limit cycle oscillation independently of a specific parametrization is deemed sufficient to decide whether a network topology could be a cell cycle. As the network grows during evolution, the number of fixed points toward which the network can spiral increases. Finding suitable parameter sets that allow oscillation, if such sets exist, thus becomes increasingly difficult at larger scales and is infeasible for an algorithm that evaluates the fitness of every network in the simulation at each iteration.

The case that remains unsolved despite the above argument is a solution that spirals out to a stable steady state which cannot be removed by varying any parameter. The aforementioned boundary for X_j is one such steady state. As mentioned above, excluding this possibility is very computationally intensive.
It could, however, be done by taking the output parameter set and computing the time course of the system of ODEs, including the mass equation, around any unstable steady state found that fulfills condition (3.3). Subsequently, it must be tested whether the solution changes by less than a threshold value over several time points after a fixed equilibration time. However, it is impossible to predict how long the system would take to reach a stable fixed point, so a predetermined cutoff would be necessary to stay within feasible computation times. Therefore, even with this type of testing, it would be impossible to ensure that a limit cycle oscillation is reached. Furthermore, the linearization used in linear stability analysis makes it very difficult to interpret the dynamics of the solution from stability criteria alone [12]. This approximation becomes worse the further the solution is from the unstable steady state, yet we are looking precisely for solutions that move away from this steady state. These considerations outline the limits of the implemented fitness function. While there is room for improvement in the evaluation of oscillation, the currently implemented method is a viable way of approaching the problem.

Apart from the investigation of oscillation, the fitness function can also be improved in other ways. The preliminary results suggest that evolved networks resemble expected cell cycle properties more closely when there are fewer negative feedback loops and more double negative or positive feedback loops. While this interpretation needs considerably more investigation to be confirmed, in particular by reducing the evolved networks to their essential interactions to simplify their interpretation, previous evaluation of these functional motifs supports this result [16].
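To make the loop-counting idea concrete, the sketch below enumerates the simple directed cycles of an interaction matrix and assigns each the sign of the product of its interaction weights, so that a double negative loop counts as positive. It follows the convention of (3.1) that ω_ij is the effect of node j on node i. The function names and the penalty and reward weights in `motif_adjusted_fitness` are hypothetical and not part of ACE, and exhaustive cycle enumeration grows quickly with network size.

```python
from typing import List, Sequence, Tuple

Cycle = Tuple[List[int], int]


def signed_cycles(omega: Sequence[Sequence[float]]) -> List[Cycle]:
    """All simple directed cycles with their sign (+1 positive, -1 negative loop).

    omega[i][j] != 0 is an interaction from node j to node i, as in W_i of (3.1).
    Each cycle is reported once, starting from its smallest node index.
    """
    n = len(omega)
    cycles: List[Cycle] = []

    def extend(start: int, node: int, path: List[int], sign: int) -> None:
        for nxt in range(start, n):  # only nodes >= start, so each cycle is found once
            w = omega[nxt][node]
            if w == 0:
                continue
            new_sign = sign if w > 0 else -sign
            if nxt == start:
                cycles.append((list(path), new_sign))
            elif nxt not in path:
                path.append(nxt)
                extend(start, nxt, path, new_sign)
                path.pop()

    for start in range(n):
        extend(start, start, [start], 1)
    return cycles


def motif_adjusted_fitness(base: float, omega, penalty: float = 0.01, reward: float = 0.01) -> float:
    """HYPOTHETICAL scoring: penalize negative loops, reward positive (incl. double negative) loops."""
    if base <= 0.0:
        return 0.0  # non-functional networks stay at fitness 0
    cycles = signed_cycles(omega)
    n_negative = sum(1 for _, s in cycles if s < 0)
    n_positive = len(cycles) - n_negative
    return base - penalty * n_negative + reward * n_positive


# Interaction weights of the primitive Reinitz model (Table 4);
# nodes: 0 activator, 1 inhibitor, 2 TF, 3 mass.
reinitz_omega = [
    [0.0, -5.0, 7.0, 0.0],
    [-5.0, 0.0, 0.0, 0.0],
    [-7.0, 0.0, 0.0, 5.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(signed_cycles(reinitz_omega))  # [([0, 1], 1), ([0, 2], -1)]
```

For the primitive model, this recovers the activator-inhibitor double negative loop (positive sign) and the negative loop through the TF.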
The fitness calculation could be improved by including a penalty for negative feedback loops and a reward for stabilizing motifs. The mutation of networks also has room for improvement. The probabilities of the different types of mutation events can be further refined from the values currently used. The relative probabilities of, e.g., node duplication and node addition are expected to play a role in the way evolution is modeled; however, these probabilities are currently educated estimates. Further investigation into their effects would help refine the current model.

All in all, ACE is shown to be a promising algorithm for evolving cell cycle models. In its current basic form, it is able to generate evolved cell cycles that exhibit artifacts of known eukaryotic models, and even a potential candidate for an embryonic cell cycle network, pending further investigation. Simple improvements to the fitness function or further investigation to refine the mutation parameters have the potential to improve the results, and the current version can be seen as a good starting point for an evolutionary algorithm.

5 Future Work

As mentioned in the Discussion section, there is considerable room for improvement in ACE. Regarding the fitness function, a penalty for negative feedback loops and a reward for double negative and positive feedback loops should be added. This would ideally reduce the number of results with beat-frequency patterns and produce more networks with stable oscillations. More technical improvements to this function concern testing for solutions that oscillate toward stable steady states, although this is a longer-term goal. An easier test to implement would be to check whether the inhibitor, activator or mass asymptotically approaches 0, which is also an undesirable case. Future work should also be directed toward testing evolved networks more rigorously to obtain more precise results.
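The two time-course checks discussed above, i.e. whether the solution stops changing after a fixed equilibration time and whether a variable asymptotically approaches 0, could look roughly as follows. This is a hedged sketch with hypothetical function names: it uses a simple forward-Euler integrator rather than XPPAUT, and the step size, window and thresholds are arbitrary illustrative choices (an adaptive solver is usually preferable in practice).

```python
from typing import Callable, List, Sequence

State = List[float]


def euler_trajectory(
    f: Callable[[State], Sequence[float]], x0: Sequence[float], dt: float, n_steps: int
) -> List[State]:
    """Forward-Euler time course; returns the n_steps + 1 visited states."""
    x = list(x0)
    states = [list(x)]
    for _ in range(n_steps):
        x = [xi + dt * dxi for xi, dxi in zip(x, f(x))]
        states.append(x)
    return states


def settles(states: Sequence[State], equilibration: int, window: int, threshold: float) -> bool:
    """True if, after `equilibration` steps, no variable changes by `threshold` or more
    between consecutive states for `window` steps (a predetermined cutoff)."""
    segment = states[equilibration : equilibration + window + 1]
    if len(segment) < window + 1:
        raise ValueError("trajectory shorter than equilibration + window")
    return all(
        abs(b - a) < threshold
        for prev, curr in zip(segment, segment[1:])
        for a, b in zip(prev, curr)
    )


def approaches_zero(states: Sequence[State], index: int, tail: int, eps: float) -> bool:
    """True if variable `index` stays below `eps` over the last `tail` states."""
    return all(abs(s[index]) < eps for s in states[-tail:])


# Usage: a decaying variable settles (and approaches 0); a harmonic oscillator does not.
decay = euler_trajectory(lambda x: [-x[0]], [1.0], 0.01, 2000)
osc = euler_trajectory(lambda x: [x[1], -x[0]], [1.0, 0.0], 0.01, 2000)
print(settles(decay, 1000, 100, 1e-4), settles(osc, 1000, 100, 1e-4))  # True False
print(approaches_zero(decay, 0, 100, 1e-3))  # True
```

As noted in the Discussion, a `False` from `settles` does not prove that a limit cycle has been reached; the system may simply not have converged within the cutoff.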
Individual networks should be tested for their essential components by creating a minimization algorithm that knocks out interactions that are unnecessary to reproduce the observed dynamics, as well as nodes whose concentrations asymptotically approach 0. Furthermore, mutation probabilities should be investigated as mentioned in the previous section. An aspect that would help in all future investigations is a summary statistic describing the suitability of a model as a cell cycle. This statistic could be used not only as a benchmark for the aforementioned investigations, but also in the fitness calculation.

Finally, there are interesting investigations that are, however, very long-term goals. ACE could be modified to investigate possible paths a primitive cell cycle model took to arrive at models such as the fission or budding yeast cell cycles. This would mean creating a reverse mutation function that starts at a more complex cell cycle, such as that of fission yeast, and ends at a primitive cell cycle model. This type of approach could also give an indication of the amount of redundancy likely to exist in these current models. ACE could also be applied to biological models other than the cell cycle by rewriting the fitness function. This function determines the criteria against which a network is evaluated; thus, given a different set of criteria, networks representing different models can be generated. Finally, an alternative modeling approach would be to use a continuous-time Markov process with a continuous state space, such as R+ or intervals of the real numbers. Such models have been used extensively in population genetics to model gene frequency change [15].
This approach would have had the benefit of modeling real evolutionary time and would therefore be better suited for phylogenetic inference; however, it would have been more difficult to implement for a cell cycle model and was thus unsuitable for a short project.

6 Acknowledgements

I would like to thank Lukas Hutter, Dr. P.K. Vinod and Dr. T. Zhang for help with using the XPP software and for insightful conversations about the cell cycle.

References

[1] Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D Watson. Molecular Biology of the Cell, 3rd edition. Garland, New York, 1994.
[2] Stephen J Elledge. Cell cycle checkpoints: preventing an identity crisis. Science, 274(5293):1664–1672, 1996.
[3] Bard Ermentrout. XPPAUT. In Computational Systems Neurobiology, pages 519–531. Springer, 2012.
[4] James E Ferrell Jr, Tony Yu-Chen Tsai, and Qiong Yang. Modeling the cell cycle: why do certain circuits oscillate? Cell, 144(6):874–885, 2011.
[5] Paul M Harrison, Anuj Kumar, Ning Lang, Michael Snyder, and Mark Gerstein. A question of size: the eukaryotic proteome and the problems in defining it. Nucleic Acids Research, 30(5):1083–1090, 2002.
[6] Samuel Karlin and Howard Milton Taylor. A Second Course in Stochastic Processes, volume 2. Academic Press, 1981.
[7] Eric Mjolsness, David H Sharp, and John Reinitz. A connectionist model of development. Journal of Theoretical Biology, 152(4):429–453, 1991.
[8] Kim Nasmyth. Evolution of the cell cycle. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 349(1329):271–281, 1995.
[9] Bela Novak, Attila Csikasz-Nagy, Bela Gyorffy, Kim Nasmyth, and John J Tyson. Model scenarios for evolution of the eukaryotic cell cycle. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 353(1378):2063–2076, 1998.
[10] Orkun S Soyer and Sebastian Bonhoeffer. Evolution of complexity in signaling pathways. Proceedings of the National Academy of Sciences, 103(44):16337–16342, 2006.
[11] Steven Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry and Engineering. Perseus Books Group, 2001.
[12] Jörg W Stucki. Stability analysis of biochemical systems: a practical guide. Progress in Biophysics and Molecular Biology, 33:99–187, 1979.
[13] Akos Sveiczer, Attila Csikasz-Nagy, Bela Gyorffy, John J Tyson, and Bela Novak. Modeling the fission yeast cell cycle: quantized cycle times in wee1− cdc25Δ mutant cells. Proceedings of the National Academy of Sciences, 97(14):7865–7870, 2000.
[14] Ilias Tagkopoulos, Yir-Chung Liu, and Saeed Tavazoie. Predictive behavior within microbial genetic networks. Science, 320(5881):1313–1317, 2008.
[15] Elizabeth Alison Thompson. Human Evolutionary Trees. Cambridge University Press, 1975.
[16] John J Tyson and Béla Novák. Functional motifs in biochemical reaction networks. Annual Review of Physical Chemistry, 61:219–240, 2010.
[17] Anael Verdugo, PK Vinod, John J Tyson, and Bela Novak. Molecular mechanisms creating bistable switches at cell cycle transitions. Open Biology, 3(3), 2013.
[18] Vilém Zachleder, Oliver Schläfli, and Arminio Boschetti. Growth-controlled oscillation in activity of histone H1 kinase during the cell cycle of Chlamydomonas reinhardtii (Chlorophyta). Journal of Phycology, 33(4):673–681, 1997.

Supplementary Material

Molecular Models of the Primitive Cell Cycle

The equations governing the simplest inhibitor-activator model mentioned in the introduction are:

$$\begin{aligned}
\frac{dC}{dt} &= k_{ass} A I - (k_{diss} + k_{di} + k_{da} + k_{cat} A)\, C \qquad (6.1a)\\
\frac{dA_t}{dt} &= k_{sa}\, \mathrm{mass}\, TF_a - k_{da} A_t \qquad (6.1b)\\
\frac{dI_t}{dt} &= k_{si} - k_{di} I_t - k_{cat}\, C A \qquad (6.1c)\\
\frac{d\,\mathrm{mass}}{dt} &= \mu\, \mathrm{mass} \qquad (6.1d)\\
I &= I_t - C \qquad (6.1e)\\
A &= A_t - C \qquad (6.1f)\\
TF_a &= \frac{\Phi}{1 + (A/\epsilon)^{n}} \qquad (6.1g)
\end{aligned}$$

Equation (6.1g) is a Hill function for self-inhibition, and Φ, ε and n are tunable parameters, where n > 1 must be fulfilled. Here, the activator is abbreviated to A, the complex to C and the inhibitor to I.
The quantities with subscript t denote the total concentration of the respective protein in the system. Mass increases exponentially in time according to (6.1d) and is halved once the activator falls below a certain threshold. This models cell division at the end of M phase, when DNA synthesis has long been fired by a high activator concentration that is by then slowly decaying.

The simplified SIMM model is governed by the following equations:

$$\begin{aligned}
\frac{dC}{dt} &= k_{ass} A I - (k_{diss} + k_{di} + k_{da} + k_{cat})\, C \\
\frac{dI_p}{dt} &= k_{cat} C - (k_{dp} + k_{di} + k_{cat2} A)\, I_p \\
\frac{dA_t}{dt} &= k_{sa}\, \mathrm{mass}\, TF_a - k_{da} A_t \\
\frac{dI_t}{dt} &= k_{si} - k_{di} I_t - k_{cat2}\, I_p A \\
\frac{d\,\mathrm{mass}}{dt} &= \mu\, \mathrm{mass} \\
I &= I_t - C \\
A &= A_t - C \\
TF_a &= \frac{\Phi}{1 + (A/\epsilon)^{n}}
\end{aligned}$$

Parametrization of Primitive Cell Cycle Networks

| Parameter | Value | Parameter description |
|---|---|---|
| Φ | 1 | TF Hill function parameter |
| ε | 0.001 | TF Hill function parameter |
| n | 1.1 | TF Hill function parameter |
| thresh | 0.06 | Activator threshold at which mass halving occurs |
| µ | 0.05 | Mass growth rate |
| k_sa | 0.2 | Synthesis rate of Activator |
| k_si | 0.2 | Synthesis rate of Inhibitor |
| k_da | 0.5 | Degradation rate of Activator |
| k_di | 0.05 | Degradation rate of Inhibitor |
| k_cat | 50 | Catalysis rate of Complex |
| k_ass | 100 | Association rate to Complex |
| k_diss | 0.01 | Dissociation rate from Complex |

Table 2: Simplest activator-inhibitor molecular model parameter set

| Parameter | Value | Parameter description |
|---|---|---|
| Φ | 1 | TF Hill function parameter |
| ε | 0.001 | TF Hill function parameter |
| n | 1.1 | TF Hill function parameter |
| thresh | 0.06 | Activator threshold at which mass halving occurs |
| µ | 0.05 | Mass growth rate |
| k_sa | 1 | Synthesis rate of Activator |
| k_si | 0.5 | Synthesis rate of Inhibitor |
| k_da | 0.025 | Degradation rate of Activator |
| k_di | 0.05 | Degradation rate of Inhibitor |
| k_cat | 0.5 | Catalysis rate of Complex |
| k_cat2 | 20 | Rate of INH-P degradation catalyzed by Activator |
| k_dp | 0.5 | Dephosphorylation rate of INH-P |
| k_ass | 100 | Association rate to Complex |
| k_diss | 0.01 | Dissociation rate from Complex |

Table 3: Simplified SIMM molecular model parameter set

| Parameter | Value | Parameter description |
|---|---|---|
| γ_1 | 1 | Activator timescale parameter |
| γ_2 | 5 | Inhibitor timescale parameter |
| γ_3 | 0.1 | TF timescale parameter |
| σ_1 | 1 | Activator sigmoid steepness parameter |
| σ_2 | 4 | Inhibitor sigmoid steepness parameter |
| σ_3 | 2 | TF sigmoid steepness parameter |
| ω_10 | −1 | Activator synthesis − degradation rate |
| ω_12 | −5 | Activator inhibition weight by Inhibitor |
| ω_13 | 7 | Activator activation weight by TF |
| ω_20 | 1.5 | Inhibitor synthesis − degradation rate |
| ω_21 | −5 | Inhibitor inhibition weight by Activator |
| ω_30 | 0 | TF synthesis − degradation rate |
| ω_31 | −7 | TF inhibition weight by Activator |
| ω_34 | 5 | TF activation weight by Mass |
| µ | 0.01 | Mass growth rate |
| thresh | 0.5 | Activator threshold at which mass halving occurs |

Table 4: Reinitz model parameter set. Further explanations of the parameters can be found in the Methods section.

Parametrization of Evolved Cell Cycle Networks

| Parameter | Value | Parameter description |
|---|---|---|
| N MIN | 6 | Minimum size of all networks to exit equilibration |
| Batchsize | 3 | Number of "organisms" in the simulation |
| nMax | 20 | Maximum number of nodes per network |
| MAX GENERATION | 100 | Number of evolution iterations |
| NODE FITNESS COST | 0.001 | Fitness cost per node |
| NODE PROP MUTATION | 0.5 | Probability of node property mutation |
| EqMut[0] | 0.3 | Equilibration probability of node duplication |
| EqMut[1] | 0 | Equilibration probability of node deletion |
| EqMut[2] | 0.3 | Equilibration probability of node addition |
| EqMut[3] | 0 | Equilibration probability of interaction deletion |
| EqMut[4] | 0.4 | Equilibration probability of interaction addition |
| EqMut[5] | 0 | Equilibration probability of interaction scaling |
| EqMut[6] | 0 | Equilibration probability of interaction inversion |
| Mut[0] | 0.1 | Relative probability of node duplication |
| Mut[1] | 0.2 | Relative probability of node deletion |
| Mut[2] | 0.1 | Relative probability of node addition |
| Mut[3] | 0.3 | Relative probability of interaction deletion |
| Mut[4] | 0.3 | Relative probability of interaction addition |
| Mut[5] | 0.3 | Relative probability of interaction scaling |
| Mut[6] | 0.15 | Relative probability of interaction inversion |
| Ace1: seed | -1379610050 | Uniform sampler random seed for Fig 6 run |
| Ace1: gausSeed | 1379610050 | Gaussian sampler random seed for Fig 6 run |
| Ace2: seed | -1379933119 | Uniform sampler random seed for Fig 7 run |
| Ace2: gausSeed | 1379933119 | Gaussian sampler random seed for Fig 7 run |
| Ace3: seed | -1379933503 | Uniform sampler random seed for Fig 8 run |
| Ace3: gausSeed | 1379933503 | Gaussian sampler random seed for Fig 8 run |

Table 5: Parameter set of the ACE runs used to produce the evolved networks