Evolving a Primitive Eukaryotic Cell Cycle Model

Malte Lücken, Jotun Hein, Bela Novak
Abstract
Current published models of eukaryotic cell cycles are minimalistic
models of essential components for the cell cycle to function. In reality,
the networks making up these cell cycles must be much larger to be robust
to deleterious mutations, exhibiting many alternative pathways that can
be activated should one pathway fail. The size and redundancy of current cell cycles can be investigated by modeling their evolution from
a primitive eukaryotic model. In this project we present the Algorithm
for Cell-cycle Evolution (ACE), which models the evolutionary process
starting at a primitive eukaryotic cell cycle model. Preliminary results
show the potential this algorithm has by reproducing artifacts of known
cell cycles, and even generating a candidate primitive embryonic cell cycle
model. However, the results also indicate room for further development of
ACE, especially regarding its criteria used to generate selection pressure
for “good” cell cycle models. Further, more detailed investigations into
evolved models from ACE are necessary to make comprehensive conclusions.
1 Introduction
At the point of conception, whether by sexual or asexual reproduction, any organism is just a single cell. In order to create the biodiversity that we observe in
our surroundings today, this cell must replicate itself by division. It is this replication that allows one cell to develop into a complex multicellular organism, or
a whole colony of bacteria. However, uncontrolled cell division and proliferation
results in cancer. Thus, understanding the regulatory network underlying cell
division is of central importance.
In the cell division cycle of current eukaryotes, the cell goes through
four distinct phases, with a possible fifth, quiescent phase that halts
the periodic cell division the cell otherwise undergoes. The distinction
of these four phases (called G1, S, G2, and M) ensures that each of the two
daughter cells produced in one cycle obtains exactly one copy of all chromosomes.
This process necessitates careful regulation, as it must be carried out with near
perfect precision, due to the high frequency at which it occurs.
Arguably the most important transition is thus the G1/S transition, where
the cell proceeds from a first growth phase (G1) to the DNA synthesis phase (S).
As it is of paramount importance that this synthesis only occurs once, eukaryotes
have developed a system to ensure this, using a so-called licensing factor [9]. This
licensing factor is present before DNA replication starts, and “loads” the origins
of replication (ORs) on the DNA with proteins needed to start replication. The
accumulation of proteins at the ORs (“loading”) is however not enough to start
DNA replication, which requires the presence of an activator. Furthermore, once
this activator is present, it is important that the licensing factor is no longer
at a sufficient concentration to keep loading the ORs, as this would result in
multiple firing of DNA replication and thus more than two copies of the DNA.
In light of these considerations, models such as the SIMM (substrate-inhibitor-multiply-modified) model [17] have been developed, which give the licensing
factor the additional functions of being an inhibitor of the activator, as well
as being a substrate of it. Thus, the inhibitor both inhibits the activator, and
is modified by it to cause its own degradation, resulting in a double negative
feedback loop between inhibitor and activator as seen in Figure 1. This type
of network motif between the inhibitor and the activator can be observed not
only in G1/S transitions, but in many protein regulated checkpoints in more
complex cell cycle models, as it is typical for a bistable switch [17].
Figure 1: Schematic diagram of a double negative feedback loop as seen between
the Activator and Inhibitor in a SIMM model network.
In primitive eukaryotes it has been suggested that S and M phases can be
triggered by the same activator [8, 2]. In such a case, the entire cell cycle can
be modeled by the aforementioned G1/S transition model, now representing a
“G” - “S/M” phase transition, where the cell cycle only switches between the
initial growth phase “G”, and the DNA synthesis with subsequent segregation
phase “S/M” [9].
In the first phase the licensing factor is at high abundance, which loads the
ORs. In this phase the cell grows and the replication machinery is duplicated. In
the “S/M” phase, the activator is at high abundance, and the licensing factor has
been degraded. Here, DNA synthesis occurs, and once the activator has been degraded
below a certain threshold, segregation and cell division are triggered [9]. Rather than
representing the original G, S and M phases, the phases in this model represent
a pre-replicative phase (G) and a replicative phase (S/M) [9].
This eukaryotic cell cycle model, while very simplified compared to the four-phase model described above, fulfills the condition of having only one copy of
each chromosome in each daughter cell. It is therefore a viable model for a
primitive eukaryote.
On a molecular level, it can be modeled in several ways, two of which can
be seen in Figures 2 and 3.
In the schematic diagrams in these figures each node represents a protein
or a protein complex, and arrows represent biochemical interactions. These
reactions are reversible if the base of the arrow-line has a circular shape as
in the inhibitor-activator binding reaction that forms the complex. The arrow
represents the dominant direction of the interaction, meaning that the rate
in this direction is larger than the opposite rate. Furthermore, the four circles
represent degradation and arrows pointing towards a node that don’t come from
another node, denote synthesis. The equations derived from Figures 2 a) and 3
a) can be found in the supplementary material.
Figure 2: a) Schematic diagram of a very simple cell cycle model using only a
single “activator” (kinase), inhibited by a stoichiometric inhibitor, which is also
a substrate of the activator. The activator (ACT) catalyses the degradation
of the inhibitor (INH) only in the complex form (COM). In addition to the
interactions depicted, there is also a background decay of total inhibitor at rate
kdi and total activator at rate kda . b) Solution of the system of non-linear
differential equations set up on the basis of the model depicted in a). The exact
equations, as well as the parameter sets, can be found in the supplementary
material. Mass is not shown on the plot as it varies an order of magnitude
higher. The transcription factor (TF) is modeled as a Hill function dependent
on the activator.
Figure 2 depicts possibly the simplest activator-inhibitor model with a single
checkpoint at the G1/S phase transition. Here, the inhibitor and activator form
a complex in a reversible process. The inhibitor in this complex is then degraded
under catalysis from the activator, returning the activator in the complex back
to its original state. In this model it is important that kass ≫ kdiss, meaning
the inhibitor binds tightly to the activator, and that kass > kcat , which ensures
the complex is an inactive form of the activator, and not merely an intermediate
step in activator mediated inhibitor degradation. The latter condition ensures
a time delay in inhibitor degradation, which is needed to have an oscillating
model [4].
The key to oscillation in this model is however not this time delay, or the
inhibitor-activator motif, but the transcription factor (TF) - activator relationship.
This relationship is modeled by a Hill function governing the level of TF
(c.f. (6.1g) in Supplementary Material), and thus introduces a negative feedback loop into the system. This network motif is strongly linked with oscillatory
systems [16, 4].
In this model, DNA synthesis is fired as soon as the activator level peaks.
Cell division is modeled as mass halving, when the activator level drops below
a threshold value. Thus, mass is the parameter controlling cell cycle oscillation
with the mass growth rate controlling the period of the cell cycle.
Figure 3: a) Schematic diagram of a simplified SIMM model [17]. The activator (ACT) catalyzes the degradation of the phosphorylated inhibitor (INH-P),
which is created by activator-inhibitor complex (COM). In addition to the interactions depicted, there is also a background decay of total inhibitor at rate kdi
and total activator at rate kda . Furthermore the activator, ACT, is synthesized
at a rate ksa × mass × TFa , where TFa is the transcription factor concentration.
b) Solution of the system of non-linear differential equations set up on the basis
of the model depicted in a). The exact equations, as well as the parameter sets,
can be found in the supplementary material. The transcription factor (TF) is
modeled as a Hill function dependent on the activator.
The simplified SIMM model of a primitive eukaryotic cell cycle is depicted
in Figure 3. This model is very similar to the aforementioned one, yet it differs
from the prior in that it introduces a phosphorylated inhibitor protein (INH-P),
whose degradation is catalyzed by the activator. This system is more robust
to fluctuations in activator levels, as the activator cannot immediately start to
degrade the inhibitor in the absence of INH-P, but instead it will immediately be
bound by any residual INH. Moreover, this mechanism introduces a further time
delay in inhibitor degradation, meaning all activator is immediately bound to
the inhibitor and therefore the switch between high inhibitor and high activator
becomes much quicker. This “quick” switch is desired as it represents a more
sensitive checkpoint.
When contrasting these simple models for a primitive eukaryote with cell
cycle models for fission yeast [13], the different levels of complexity become
evident very quickly. However, if the simple models presented above do represent
a primitive eukaryote, then the two are merely different stages on the same
evolutionary ladder. In order to better understand this ladder, a model for
evolution is needed. In this report, an algorithm to model cell cycle evolution
is introduced.
Besides modeling evolution, such an evolutionary model can provide information on the level of redundancy to be expected in cell cycle regulatory
networks, as current models represent cell cycle networks that were reduced
to their essential components and are thus minimalistic. Furthermore, extraction of phylogenetic information from cell cycle models of different species may
become possible on this basis.
The core of any evolutionary algorithm is an accurate model of the evolutionary process. Evolution from primitive eukaryotes to organisms with more
complex, and therefore more robust and/or more sensitive regulatory networks
governing the cell cycle is a slow, large-scale process. On a smaller scale, this
process comprises point mutations gradually altering properties
of individual proteins, gene duplications, deleterious mutations rendering whole
proteins dysfunctional, etc. An algorithm that models this evolutionary process must therefore focus on the small-scale changes that underlie it. However,
these changes are individually random and do not necessarily result in an increase in fitness. In nature this is balanced out by having a large number of
organisms.
All organisms undergo mutations; some increase fitness, others don’t. Those
organisms that are fitter than their peers are more likely to secure the continued existence of their DNA and thus also the regulatory network which governs
their cell cycle. This securing of their DNA’s existence can be done in several ways, such as higher likelihood of reproduction, better adaptation to living
conditions with a resulting higher survival rate, etc.
The Algorithm for Cell-cycle Evolution (ACE) presented in this report attempts to reflect this process in the appropriate detail to accurately model
evolution.
2 Results
2.1 Primitive Network
A cell cycle model is described by a set of ordinary differential equations (ODEs).
When designing an algorithm to evolve a cell cycle, it is important that the equation space of all possible systems that describe cell cycles is well defined, as the
algorithm must automatically be able to explore this space to build new cell
cycle models. In the case of the equations governing the models depicted in
Figures 2 and 3 (found in the Supplementary Material section), the mathematical space underlying the equations incorporates multiple types of interactions.
These interactions include catalysis, synthesis and degradation, and reversible
and irreversible reactions of 1 and 2 reactants, all of which look mathematically
different.
In these molecular models the necessary complexity to model an oscillating
network arises from a complex wiring of many mathematically simple interactions.
Due to the large number of possible combinations of interactions, the
equation space is large, as is the parameter set describing a system. Sampling
such a large space for possible cell cycle networks is a slow process, making
the implementation of an evolutionary algorithm on a molecular model of a cell
cycle non-ideal.
Here, we use a mathematical framework developed by Reinitz and colleagues
[7] as described in [16] (c.f. Methods). This framework allows the use of simple
combinations of two types of interactions to generate the complexity necessary
to model the cell cycle. This is done by hiding the complexity in the sigmoidal
node response to interaction signals, thus creating a smaller equation space by
having fewer interaction types.
Instead of modeling basic interactions, such as binding, synthesis, and degradation of molecular components, this framework models interactions simply as
activation or inhibition. This approach also reduces the number of nodes necessary in the network, as e.g. the inhibitor binding tightly to the activator to
form a complex, which produces phosphorylated inhibitor to be degraded under catalysis by the activator, is now simply a double negative feedback loop
shown in Figure 1. This type of interaction is the basis of both aforementioned
molecular models; therefore, they can both be summarized in a single Reinitz
model.
This model is constructed to have the minimum number of nodes required
for an oscillating network, 3 [4, 16], as well as a mass input. The network and
its solution are shown in Figure 4 below.
Figure 4: a) Schematic diagram of a primitive cell cycle in the Reinitz framework.
There is a double negative feedback loop between activator (Act) and inhibitor
(Inh) and a negative feedback loop between activator and transcription factor
(TF), which is the main oscillator. b) Solution of the system of non-linear
differential equations set up on the basis of the model depicted in a). These can
be found in the text.
It must be noted that although mass is depicted as a node, it does not
behave as a protein node in the Reinitz framework. It is rather characterized
by exponential growth, with cell division modeled as mass halving once the
activator concentration falls below 0.5. The equations solved on this network,
modeled within the Reinitz framework, are:
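\begin{align}
\frac{d\,\mathrm{Act}}{dt} &= \gamma_1 \left[ F(\sigma_1 W_1) - \mathrm{Act} \right], & W_1 &= \omega_{10} + \omega_{12}\,\mathrm{Inh} + \omega_{13}\,\mathrm{TF} \tag{2.1a} \\
\frac{d\,\mathrm{Inh}}{dt} &= \gamma_2 \left[ F(\sigma_2 W_2) - \mathrm{Inh} \right], & W_2 &= \omega_{20} + \omega_{21}\,\mathrm{Act} \tag{2.1b} \\
\frac{d\,\mathrm{TF}}{dt} &= \gamma_3 \left[ F(\sigma_3 W_3) - \mathrm{TF} \right], & W_3 &= \omega_{31}\,\mathrm{Act} + \omega_{34}\,\mathrm{mass} \tag{2.1c} \\
\frac{d\,\mathrm{mass}}{dt} &= \mu\,\mathrm{mass} \tag{2.1d}
\end{align}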
where ω12 , ω21 and ω31 are negative constants reflecting inhibition, and ω13 and ω34 are positive
constants reflecting activation. µ is the mass growth rate, and
the function F is a sigmoid function with the equation F (σi Wi ) = 1/(1 +
exp(−σi Wi )).
This simulation incorporates all features expected of a primitive cell cycle.
Firstly, the activator peaks quickly after a period of high inhibitor concentration, denoting a sensitive and quick transition into “S/M” phase. Secondly,
the activator decays slowly, gradually resetting for the “S/M” - G transition
and allowing time for DNA synthesis to occur before cell division is triggered.
Moreover, a cycle at constant period is obtained.
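As an illustration, equations (2.1a)-(2.1d) together with the division rule can be integrated with a simple fixed-step scheme. The following Python sketch is not the XPP setup used for Figure 4: the parameter values, the initial state, the step size and all function names are assumptions made purely for illustration, so the resulting trajectory should not be expected to reproduce the figure.

```python
import numpy as np

def sigmoid(x):
    """F(x) = 1 / (1 + exp(-x)), the node response of the Reinitz framework."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical parameter values, chosen only to make the sketch runnable.
PARAMS = dict(
    gamma=(1.0, 1.0, 0.5),
    sigma=(10.0, 10.0, 10.0),
    w10=-0.2, w12=-1.0, w13=1.0,   # inputs to Act
    w20=0.5, w21=-1.0,             # inputs to Inh
    w31=-1.0, w34=1.0,             # inputs to TF
    mu=0.05,                       # mass growth rate
)

def rhs(state, p):
    """Right-hand side of equations (2.1a)-(2.1d)."""
    act, inh, tf, mass = state
    g1, g2, g3 = p["gamma"]
    s1, s2, s3 = p["sigma"]
    W1 = p["w10"] + p["w12"] * inh + p["w13"] * tf
    W2 = p["w20"] + p["w21"] * act
    W3 = p["w31"] * act + p["w34"] * mass
    return np.array([
        g1 * (sigmoid(s1 * W1) - act),
        g2 * (sigmoid(s2 * W2) - inh),
        g3 * (sigmoid(s3 * W3) - tf),
        p["mu"] * mass,
    ])

def simulate(p=PARAMS, state0=(0.1, 0.9, 0.1, 0.5),
             t_end=200.0, dt=0.01, threshold=0.5):
    """Forward-Euler integration with mass halving at cell division."""
    n_steps = int(round(t_end / dt))
    traj = np.empty((n_steps + 1, 4))
    traj[0] = state0
    division_times = []
    for k in range(n_steps):
        prev = traj[k]
        new = prev + dt * rhs(prev, p)
        # Division: the activator falls below the threshold from above.
        if prev[0] >= threshold and new[0] < threshold:
            new[3] *= 0.5
            division_times.append((k + 1) * dt)
        traj[k + 1] = new
    return traj, division_times

if __name__ == "__main__":
    traj, divisions = simulate()
    print("number of divisions:", len(divisions))
```

Forward Euler is used here only for brevity; any standard ODE integrator with event handling could take its place.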
2.2 ACE - Algorithm for Cell-cycle Evolution
The Algorithm for Cell-cycle Evolution (ACE) presented here computes a possible evolutionary path using the above network as a starting point. This is done
using an approach that is designed to closely follow the natural evolutionary
process explained in the introduction, and has been shown to be successful in
modeling pathway evolution [10, 14].
In ACE a simulation is set up containing N individual, independent “organisms”, each of which is represented by a cell cycle network with a specific
parameter set. When the program is initialized all N organisms are identical
and must be functional in order for the initial networks to represent a viable
starting point for an evolutionary path. To finish the simulation initialization,
the fitness of each organism is calculated and scaled to sum to 1.
The simulation starts to evolve by one organism being chosen to be reproduced and one organism being chosen to be deleted. These organisms are chosen
randomly with their relative fitness representing their likelihood of survival or
reproduction. The organism to be reproduced is duplicated and placed in the
simulation instead of the deleted organism. Subsequently, the duplicated daughter organism is mutated, producing a new, evolved cell cycle model. Finally,
the fitness of each cell cycle is re-calculated and rescaled. This process is shown
in Figure 5.
Figure 5: Schematic diagram of the computational setup of ACE. A simulation
is carried out by first randomly choosing a network for reproduction and one for
deletion, weighted by the network fitnesses, F (reproduction) or by 1−F rescaled
to sum to 1 (deletion). The network to be reproduced is then duplicated and
“evolved” by a random mutation, and saved in place of the network chosen for
deletion. Finally the fitnesses of all networks in the simulation are recalculated
and rescaled to sum to 1.
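One iteration of the scheme in Figure 5 can be sketched in Python as follows. This is a minimal sketch rather than the ACE code itself: the function names, the representation of an organism, the uniform fallback for all-zero weights, and the decision to allow the same organism to be chosen for both reproduction and deletion are assumptions. The fitness and mutation functions are passed in as placeholders.

```python
import copy
import random

def normalise(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total == 0:
        # Assumption: fall back to uniform weights if every weight is zero.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def ace_step(population, fitness_fn, mutate_fn, rng):
    """Perform one reproduction/deletion/mutation step in place.

    Returns the indices of the reproduced and the replaced organism.
    """
    # Fitnesses are recalculated and rescaled at the start of every step.
    fitness = normalise([fitness_fn(org) for org in population])
    indices = range(len(population))
    # Reproduction is weighted by F, deletion by 1 - F rescaled to sum to 1.
    parent = rng.choices(indices, weights=fitness)[0]
    victim = rng.choices(indices,
                         weights=normalise([1.0 - f for f in fitness]))[0]
    # Copy the parent first, so the step still works if parent == victim.
    child = mutate_fn(copy.deepcopy(population[parent]), rng)
    population[victim] = child
    return parent, victim

if __name__ == "__main__":
    rng = random.Random(1)
    # Toy organisms: only the node count is tracked in this illustration.
    population = [{"n_nodes": 3} for _ in range(3)]
    toy_fitness = lambda org: 1.0 - 0.01 * org["n_nodes"]
    toy_mutation = lambda org, r: {"n_nodes": org["n_nodes"] + 1}
    for _ in range(5):
        ace_step(population, toy_fitness, toy_mutation, rng)
    print(population)
```

Because the population size stays fixed and one organism is replaced per step, the scheme resembles a Moran-type process with fitness-weighted birth and death.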
The approach implemented in ACE means that all organisms in the simulation are in constant competition for survival. The reproduction and deletion
weighted by the organism’s fitness results in fitter organisms having a better
chance to survive, giving a central role to the method that is used to calculate
fitness. The characteristics that are focussed on in this function will determine
what causes selection pressure. However, the core of the algorithm is the mutation method. This method defines the individual steps that result in the greater
evolutionary process, thereby constituting the sampler of cell cycle equation
space. These two functions are explained briefly below.
2.2.1 Fitness calculation
In an evolutionary process, it is important that every network along the evolutionary path taken from a primitive cell cycle to any evolved cell cycle, is a
viable model for a cell cycle network. This must be the case, as a mutation to an
ancestral cell cycle, that does not support cell cycle function, would cause the
organism with this mutation to die, and thus its DNA would not be preserved.
Therefore, having a functioning cell cycle must be central to fitness calculation
in ACE, and is thus the dominant criterion creating selection pressure.
Whether a cell cycle is functional or not is determined by its most fundamental necessary condition: oscillation. If a network is to represent a cell cycle,
the node concentrations must vary along the same path for every cell cycle oscillation, as the cell cycle is traversed in the same way countless times in any
organism. This describes a limit cycle oscillation [11, Chapter 7].
ACE calculates the fitness of a proposed cell cycle network by assigning
a value of 1 − N c to all networks that have the potential of entering a limit
cycle oscillatory phase, while those that don’t have this potential are assigned
a fitness of 0. Here N is the number of nodes in a network and c is a constant
fitness cost per node. A network’s fitness is penalized by the number of nodes as
transcription and translation of proteins costs energy, thus larger networks have
larger energy costs associated. However, as the number of proteins in eukaryotic
proteomes tends to be of the order 10^4 to 10^5 [5], the additional energy cost of one
protein in any primitive eukaryotic organism should be relatively small. Further
details on the fitness calculation can be found in the Methods section.
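The fitness rule described above can be written down in a few lines. In this Python sketch the function name and the default cost per node are hypothetical, and the test for potential limit cycle oscillation is represented only by a boolean argument.

```python
def network_fitness(n_nodes, can_oscillate, cost_per_node=0.01):
    """Return 1 - N*c for potentially oscillating networks, 0 otherwise.

    n_nodes       -- N, the number of nodes in the network
    can_oscillate -- True if the network has the potential to enter
                     a limit cycle oscillation
    cost_per_node -- c, the fitness cost per node (hypothetical value)
    """
    if not can_oscillate:
        return 0.0
    return 1.0 - n_nodes * cost_per_node

if __name__ == "__main__":
    print(network_fitness(6, True))    # larger networks pay a small penalty
    print(network_fitness(6, False))   # non-oscillating networks get 0
```

With a small cost per node, the difference between oscillating networks of similar size is minor compared with the gap between oscillating and non-oscillating networks, which is what makes oscillation the dominant criterion.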
2.2.2 Mutation
The actual sampler of cell cycle equation space in ACE is its mutation function.
This mutation function defines how a cell cycle network can be evolved to generate a network that represents the next step in evolution. ACE allows one of
seven different types of mutation to be applied to the network’s adjacency matrix, as well as allowing a mutation of the continuous node property parameters
to occur.
The seven types of mutation consist of five network topology mutations:
node addition, node duplication, node deletion, edge addition, and edge deletion, and two edge weight mutations: edge scaling and edge inversion. These
mutation steps and the mutation of continuous node property parameters are
elaborated on in the Methods section.
As mentioned in the introduction, current cell cycle models are minimalistic
as they represent only essential interactions needed to model the cell cycle of
an organism. In reality, a large amount of redundancy can be expected, as any
essential pathway, such as the cell cycle, must be robust to random mutations
occurring [10]. For this reason, two important concepts are incorporated into
ACE.
Firstly, the nodes and edges of the initial network input into the algorithm
cannot be deleted or, in the case of edges, changed between activation and
inhibition by edge inversion. This is done as these edges represent essential interactions for cell cycle function, and are thus likely to be conserved throughout
evolution. Furthermore, this has the practical implication that it conserves the
central oscillator between the transcription factor and the activator, which helps
keep evolved networks in an oscillating regime.
Secondly, an equilibration period is implemented in ACE. In this equilibration period the initial network is only evolved by node and interaction additions,
and node duplications. This period serves to grow the network so that a certain level
of redundancy has been attained once the full range of mutations is allowed. Equilibrating the system in this way is done
either until all networks have reached a predefined node size, or for a predefined
number of steps per node. This enables a more realistic evolutionary pathway.
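The seven mutation types and the protection of the initial network can be illustrated on an interaction matrix. The following Python sketch makes several assumptions that are not taken from ACE itself: the matrix convention W[i, j] for the weight of the interaction of node j on node i, new nodes being added without any interactions, a duplicated node inheriting all incoming and outgoing interactions of the original, and all function names. The random choice of mutation type and target is left out.

```python
import numpy as np

def add_node(W):
    """Node addition: append a node without any interactions (assumption)."""
    n = W.shape[0]
    out = np.zeros((n + 1, n + 1))
    out[:n, :n] = W
    return out

def duplicate_node(W, k):
    """Node duplication: the copy inherits all interactions of node k."""
    n = W.shape[0]
    out = add_node(W)
    out[n, :n] = W[k, :]   # incoming interactions
    out[:n, n] = W[:, k]   # outgoing interactions
    out[n, n] = W[k, k]    # self-interaction, if any
    return out

def delete_node(W, k, n_core):
    """Node deletion: the first n_core (initial) nodes are protected."""
    if k < n_core:
        return W.copy()
    return np.delete(np.delete(W, k, axis=0), k, axis=1)

def add_edge(W, i, j, weight):
    """Edge addition: only fills an empty entry."""
    out = W.copy()
    if out[i, j] == 0.0:
        out[i, j] = weight
    return out

def delete_edge(W, i, j, core_edges):
    """Edge deletion: initial edges are protected."""
    out = W.copy()
    if (i, j) not in core_edges:
        out[i, j] = 0.0
    return out

def scale_edge(W, i, j, factor):
    """Edge scaling: change the interaction strength, keep its sign."""
    out = W.copy()
    out[i, j] *= factor
    return out

def invert_edge(W, i, j, core_edges):
    """Edge inversion: swap activation and inhibition, except for initial edges."""
    out = W.copy()
    if (i, j) not in core_edges:
        out[i, j] = -out[i, j]
    return out

# Initial network with nodes Act (0), Inh (1), TF (2); unit weights are
# placeholders, and the mass input is omitted for brevity.
W_INIT = np.array([[0.0, -1.0, 1.0],
                   [-1.0, 0.0, 0.0],
                   [-1.0, 0.0, 0.0]])
CORE_EDGES = {(0, 1), (0, 2), (1, 0), (2, 0)}

if __name__ == "__main__":
    W = duplicate_node(W_INIT, 2)            # allowed during equilibration
    W = invert_edge(W, 3, 0, CORE_EDGES)     # allowed only afterwards
    print(W)
```

During the equilibration period only `add_node`, `add_edge` and `duplicate_node` would be applied; the remaining operators become available afterwards.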
2.3 Evolved Networks
The networks evolved from ACE shown in Figures 6, 7, and 8, are results from
different runs after 100 iterations on the primitive network shown in Figure 4,
with an equilibration period lasting until all networks have at least 6 nodes (not
including mass). The evolved models presented here are 3 examples from runs that
generated 12 evolved networks with non-zero fitnesses, of which 6 were found to
have oscillating solutions using the output parameter set. The results should be
interpreted as preliminary studies that show the potential of ACE, as they were
only done for a small number of iterations for the minimum simulation size of
3 organisms.
Figure 6: a) Schematic diagram of the evolved network. Added nodes are labelled P3 to P10, green interactions represent positive interactions (activation)
and red interactions represent negative interactions (inhibition). Depicted interaction thickness is scaled by the interaction weight. b) Solution of the system
of equations derived from the network depicted in a). Parametrization of the
solution was obtained directly from ACE and can be found in the supplementary material. Inhibitor cannot be seen on this plot as it asymptotes to 0. The
activator activity can be seen to oscillate in bursts.
Figure 6 shows an evolved network in which the inhibitor concentration
asymptotes to 0, and thus doesn’t take part in the oscillation. Furthermore, the
oscillation does not seem to have a constant period, as a longer period is seen
to be followed by a shorter one. This behaviour is an artifact also observed in
the “quantized” periods of the fission yeast cell cycle [13].
Moreover, the activator concentration time solution can be seen to oscillate
in bursts. This type of behaviour could be caused by the two coupled oscillator
motifs (negative feedback loops [16]) between the activator, the transcription
factor and the P8 node shown in Figure 6 a). This coupled oscillator may result in a superposition of two oscillations at similar frequencies, causing the
beat frequency pattern observed. The negative feedback loop between the inhibitor and P4 can be ignored in this interpretation, as the inhibitor is close to
concentration 0.
Figure 7: a) Schematic diagram of the evolved network. Added nodes are labelled P3 to P8, green interactions represent positive interactions (activation)
and red interactions represent negative interactions (inhibition). Depicted interaction thickness is scaled by the interaction weight. b) Solution of the system
of equations derived from the network depicted in a). Parametrization of the
solution was obtained directly from ACE and can be found in the supplementary
material. The progression of activator, inhibitor and TF concentrations occurs
in bursts; however, the period seems to be relatively stable.
The network in Figure 7 shows a burst-like activator progression similar to that of the previous model; however, the period appears to be more regular. A
possible explanation for this type of behaviour could again be the number of
negative feedback loops, which couple the entire network into two groups (TF-Act-P8 and the rest). The large number of oscillators seems to have superposed
to give the observed pattern.
Figure 8: a) Schematic diagram of the evolved network. Added nodes are labelled P3 to P9, green interactions represent positive interactions (activation)
and red interactions represent negative interactions (inhibition). Depicted interaction thickness is scaled by the interaction weight. b) Solution of the system
of equations derived from the network depicted in a). Parametrization of the
solution was obtained directly from ACE and can be found in the supplementary material. Apart from the mass asymptoting to 0, the cell cycle is similar
to the primitive cell cycle model input into the system, suggesting it may be a
viable model.
In contrast to the previous two networks, Figure 8 shows a progression expected of a cell cycle, with the exception of mass asymptoting to 0. This cell cycle
exhibits a quick change between high inhibitor and high activator regimes, with
a slow decay in activator concentration giving the cell sufficient time for DNA
synthesis before cell division occurs, and a regular period. Interestingly, this
cell cycle seems to have undergone more node duplications than the previous
examples as investigations into the adjacency matrix evolution, and the network
diagram, suggest. This indicates duplication of nodes may have played an important part in cell cycle evolution, a result that is also found by Tagkopoulos
et al. [14] for metabolic networks. Furthermore, this cell cycle exhibits more
stabilizing motifs than those in Figures 6 and 7. These stabilizing motifs are
double negative or positive feedback loops, which can stabilize the oscillation
[16]. They may have played a role in preventing the burst-like pattern the two
previous models exhibited, as this model also has multiple negative feedback
loops. However, further investigations are needed to confirm these conjectures.
3 Methods
3.1 Reinitz Framework
The mathematical framework used to describe the cell cycle models in ACE was
originally proposed by Reinitz and colleagues [7]. It is used here as interpreted
and developed by Tyson and Novak [16].
The Reinitz framework incorporates the non-linearity of biological processes
directly in the node equation, and can therefore create non-linearity via only
two types of interactions, which are governed by a small set of parameters. In
this model network node concentrations, Xi , are modeled via the equation:
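\[
\frac{dX_i}{dt} = \gamma_i \left[ F(\sigma_i W_i) - X_i \right], \qquad
W_i = \omega_{i0} + \sum_{j=1}^{N} \omega_{ij} X_j, \qquad i = 1, \ldots, N \tag{3.1}
\]
where the function F (σi Wi ) is given by:
\[
F(\sigma_i W_i) = \frac{1}{1 + \exp(-\sigma_i W_i)} \tag{3.2}
\]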
In this framework γi is the parameter that governs the time scale on which
Xi regulation takes place, σi determines the slope of the sigmoid governing interaction with node Xi and thus characterizes the level of “binary switch”-nature
of this node, and Wi is a function which incorporates all interactions with node
Xi in the network. The variables ωij from (3.1) denote the interaction strength
of node Xj with the node Xi , and ωi0 is the sigmoid offset which determines
the Wi threshold for interaction. This last parameter can be interpreted to be
the synthesis minus the degradation of Xi . Finally, N denotes the number of
nodes in the network.
This model scales all quantities Xi to 1, reducing the number of parameters.
Furthermore, the sigmoid function F from (3.2) is a reasonable approximation
for most interaction types, such as hyperbolas resulting from mass-action, or
even Hill-functions scaled to 1. In fact, equation 3.1 can be shown to be a
generalization of a simple synthesis and degradation differential equation in the
molecular model framework. A more in depth discussion of this model can be
found in [7].
In this framework, the tunable parameters, which can be made subject to
mutations in the evolutionary algorithm are γi , ωi0 , σi , and the interaction
matrix ωij .
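Equations (3.1) and (3.2) translate directly into a vectorized right-hand side. The following Python sketch assumes that the parameters are stored as NumPy arrays, with `omega[i, j]` holding ωij; the function and argument names are chosen for illustration only.

```python
import numpy as np

def reinitz_rhs(X, gamma, sigma, omega0, omega):
    """Evaluate dX_i/dt = gamma_i * [F(sigma_i * W_i) - X_i] for all nodes.

    X      -- node concentrations, shape (N,)
    gamma  -- time-scale parameters gamma_i, shape (N,)
    sigma  -- sigmoid slopes sigma_i, shape (N,)
    omega0 -- sigmoid offsets omega_i0, shape (N,)
    omega  -- interaction matrix omega_ij, shape (N, N)
    """
    W = omega0 + omega @ X                    # net input to each node, eq. (3.1)
    F = 1.0 / (1.0 + np.exp(-sigma * W))      # sigmoid response, eq. (3.2)
    return gamma * (F - X)

if __name__ == "__main__":
    # A single node without inputs relaxes towards F(sigma * omega0).
    X = np.array([0.0])
    gamma, sigma = np.array([1.0]), np.array([2.0])
    omega0, omega = np.array([0.5]), np.zeros((1, 1))
    for _ in range(2000):
        X = X + 0.01 * reinitz_rhs(X, gamma, sigma, omega0, omega)
    print(X, 1.0 / (1.0 + np.exp(-1.0)))
```

Since F takes values strictly between 0 and 1, every Xi is driven towards that interval, which is the sense in which all quantities are scaled to 1.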
3.2 ODE solver
All systems of non-linear ODEs in the scope of this project were solved using the
XPP/XPPAUT software developed by Bard Ermentrout’s lab [3]. Parametrization of both the molecular and the Reinitz primitive cell cycle models was done
with the aid of the XPP-AUTO function, which generates bifurcation diagrams
of two-dimensional systems of ODEs. These diagrams visualize the effect parameter variation has on the limit cycle, greatly simplifying the search for good
parameter sets. The systems were simplified to two dimensions by modeling
some nodes as “fast”, meaning their concentrations adapted according to the
slower variables immediately. This allowed them to be described as a function
of the “slower” variables, not as a differential equation.
3.3 Evolutionary algorithm
The setup of ACE is adequately described by Figure 5. The input network is
read into the program as a file with a specific format, which includes all relevant
information on the adjacency matrix, the number of nodes, and the ω0 , σ and
γ values. This network is copied to all N “organisms” in the simulation, which
then become distinct in the process of equilibration.
The output of ACE is also done in file format. The networks in this output
file are described in the same format as the input network, with their fitnesses
displayed as well. Furthermore, all details of the run parameters are included at
the head of this file. Details about the format and the algorithm can be found
in the comments directly in the code, which is available upon request.
The preliminary results shown in Figures 6, 7 and 8 were generated at different random seeds with the same simulation parameters, which can be found in
the supplementary material. The code is set up to be able to reproduce results
if the random seeds are hard coded into the algorithm.
A python script to convert the output of ACE into output readable by Bard
Ermentrout’s XPPAUT was also created, and is also available upon request.
The general setup of the algorithm is modeled on Soyer et al.’s pathway evolution software [10] and Tagkopoulos et al.’s EVE software [14]. The uniqueness
of ACE, as well as most of the algorithm’s complexity, lies within
the fitness calculation function and the mutation function, which are further
elaborated on below.
3.3.1
Fitness Calculation
As mentioned previously, a necessary condition for a dynamic network to represent a cell cycle is limit cycle oscillation. Thus, in order to determine the
fitness of a network as a cell cycle, it must be investigated whether the system
of nonlinear ODEs solved on this network oscillates.
In two- and three-dimensional systems the standard method of testing for oscillating
solutions is linear stability analysis [4, 11]. This method
examines the behaviour of the linearized system in the vicinity of a fixed point,
i.e. a point where dXj/dt = 0 for all j. This is done by taking the Jacobian
matrix of the system at that fixed point and calculating its eigenvalues [12, 4].
Complex eigenvalues of the Jacobian denote an oscillating system; however, the
real part of the complex eigenvalue is important when looking for limit
cycle oscillations. A negative real part of a complex eigenvalue corresponds
to a damped oscillation of the system, and thus the system asymptotically
approaches the fixed point [4]. This type of fixed point is stable [11].
In order to obtain a sustained oscillation of the system, we instead need an unstable
fixed point, which gives the condition [4, 11, 12]:
\[
\exists\, \lambda \in \Lambda : x > 0 \,\wedge\, y \neq 0 \quad \text{for } \lambda = x + iy \tag{3.3}
\]
Here, Λ denotes the set of all eigenvalues of the Jacobian, λ is one of these
eigenvalues, x is its real part and y is its imaginary part.
The condition effectively says that we are looking for a solution which oscillates
outwards from a fixed point [12].
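As an illustration of condition (3.3), the following minimal sketch checks it for a two-dimensional system, where the eigenvalues follow directly from the trace and determinant of the Jacobian. The function names and the example matrices are hypothetical and chosen only for illustration; they are not taken from the ACE code, which must handle networks of arbitrary size.

```python
import cmath


def eigenvalues_2x2(jac):
    """Eigenvalues of a 2x2 Jacobian [[a, b], [c, d]] from its trace and determinant."""
    (a, b), (c, d) = jac
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def fulfills_condition_3_3(eigenvalues, tol=1e-12):
    """True if some eigenvalue has a positive real part and a non-zero imaginary part."""
    return any(ev.real > tol and abs(ev.imag) > tol for ev in eigenvalues)


# Hypothetical Jacobians evaluated at a fixed point (illustrative numbers only).
unstable_spiral = [[0.5, -2.0], [2.0, 0.5]]    # eigenvalues 0.5 +/- 2i
stable_spiral = [[-0.5, -2.0], [2.0, -0.5]]    # eigenvalues -0.5 +/- 2i
saddle = [[1.0, 0.0], [0.0, -1.0]]             # real eigenvalues 1 and -1

for name, jac in [("unstable spiral", unstable_spiral),
                  ("stable spiral", stable_spiral),
                  ("saddle", saddle)]:
    print(name, fulfills_condition_3_3(eigenvalues_2x2(jac)))
```

Only the unstable spiral satisfies the condition: the stable spiral oscillates but decays toward the fixed point, and the saddle is unstable without oscillating.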
Condition (3.3) alone does not ensure a sustained limit cycle oscillation
in the absence of a controlling parameter. However, a limit cycle oscillation without
a controlling parameter is not what we require for a cell cycle model. As the
molecular models depicted in the introduction show, mass plays an essential
role in controlling cell cycle oscillation. However, as the mass of the system
is not modeled as a continuous function of the other variables (cf. Equation
(2.1d)), it cannot be included in the eigenvalue and fixed point calculations. These
calculations are instead performed with a constant mass value, which is varied
between 0.1 and 0.9 in steps of 0.1, meaning mass-control is not taken into
account. Finding an unstable steady state around which the dynamic solution
spirals outward means that either a mass-controlled limit cycle or an uncontrolled limit
cycle, as often seen in embryonic cell cycles [1, Chapter 17], can occur. Therefore,
this is regarded as sufficient for a network to be seen as a cell cycle.
This is further elaborated on in the discussion section.
In light of these considerations, the fitness of a cell cycle in ACE is calculated
by the equation:
\[
\text{Fitness} =
\begin{cases}
1 - N c & \text{if } \alpha = 1 \\
0 & \text{if } \alpha = 0
\end{cases}
\]
after Soyer and Bonhoeffer’s investigation of the evolution of complexity in
generic signaling pathways [10]. Here, α = 1 if the network fulfills condition
(3.3) and α = 0 otherwise.
A fitness of 0 is assigned to non-functional cell cycles to ensure these are not
reproduced, and therefore the evolutionary pathway taken by any functioning
cell cycle output by ACE is one that could have occurred naturally.
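The fitness rule can be sketched in a few lines. This sketch assumes that N in the fitness equation is the number of nodes in the network and that c is the "Fitness cost per node" listed in Table 5 (0.001). The function name and signature are hypothetical. Non-oscillating networks receive a fitness of 0, following the statement above.

```python
def fitness(n_nodes, fulfills_condition, node_cost=0.001):
    """Fitness of a network: 1 - N*c if condition (3.3) holds (alpha = 1), else 0.

    n_nodes            -- assumed meaning of N: number of nodes in the network
    fulfills_condition -- alpha, True if an unstable oscillating fixed point was found
    node_cost          -- assumed meaning of c: fitness cost per node (Table 5)
    """
    if not fulfills_condition:
        return 0.0
    return 1.0 - n_nodes * node_cost


print(fitness(3, True))    # small oscillating network
print(fitness(20, True))   # largest network allowed by nMax in Table 5
print(fitness(20, False))  # non-oscillating network
```

With a per-node cost this small, the size penalty only separates networks that already oscillate. It never rescues a non-functional one.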
3.3.2
Network Mutation
As mentioned previously, the mutation function incorporates seven possible mutation steps that evolve a cell cycle network into one representing the next generation of cell cycle in an evolutionary process. Each time the mutation function
is called, one node is picked at random, and one of these mutations is performed based on a predefined probability, with the exception of initial network
nodes, which are preserved to a certain extent. If an edge mutation is to be
performed, one of the outgoing or incoming interactions of this node is selected
to be mutated, with the exception of the mass interaction in the initial network.
This interaction is kept unchanged to encourage conservation of mass-control
in evolved networks. Additionally, there is a probability that the node properties (i.e. ω0, σ and γ) of the selected node are also scaled. This is considered
separately, as large-scale changes such as node additions/deletions/duplications or
interaction changes are likely to also bring about changes in e.g. the binding energy between two proteins, slightly changing the way a node reacts within the
cell cycle. These properties may in fact change very slightly much more often
than other mutations occur; ACE, however, models this as larger changes
over longer time periods.
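The selection step described above can be sketched as a weighted random choice. The weights below are the relative probabilities Mut[0] to Mut[6] and the node property mutation probability from Table 5. The function name, the mutation labels and the return format are hypothetical. The protection of initial network nodes and of the mass interaction is deliberately left out of this sketch.

```python
import random

# Order follows Mut[0]..Mut[6] in Table 5.
MUTATION_TYPES = [
    "node_duplication",
    "node_deletion",
    "node_addition",
    "interaction_deletion",
    "interaction_addition",
    "interaction_scaling",
    "interaction_inversion",
]
RELATIVE_PROBABILITIES = [0.1, 0.2, 0.1, 0.3, 0.3, 0.3, 0.15]


def pick_mutation(rng, n_nodes, node_prop_probability=0.5):
    """Pick a random node, one mutation type, and whether node properties are scaled."""
    node = rng.randrange(n_nodes)
    # random.choices normalizes the weights, so they need not sum to 1.
    mutation = rng.choices(MUTATION_TYPES, weights=RELATIVE_PROBABILITIES, k=1)[0]
    scale_properties = rng.random() < node_prop_probability
    return node, mutation, scale_properties


rng = random.Random(0)  # fixed seed so that a run can be reproduced
for _ in range(3):
    print(pick_mutation(rng, n_nodes=4))
```

Passing an explicit `random.Random` instance mirrors the reproducibility requirement mentioned in Section 3.3, since the same seed then yields the same sequence of mutations.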
Of the seven mutation processes, node and edge deletion, as well as node
duplication are trivial, as a node or edge is simply selected at random and
deleted or duplicated. Edge inversion changes an interaction between inhibition
and activation and therefore multiplies the interaction weight ωij by −1. In
edge addition, an interaction that the selected node does not already have is
chosen from all possible incoming and outgoing interactions with the exception
of an interaction outgoing to the mass node. The chosen interaction is added
by sampling a value from a uniform distribution over the domain of allowed
interaction weights, found in Table 1. The weight assigned to this interaction
must fulfill the condition |ωij | > 0.01.
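A minimal sketch of edge addition under stated assumptions: the weight matrix is stored as weights[target][source], matching the reading of ωij as the effect of node j on node i; the mass variable occupies its own index; a zero entry means "no interaction"; and the condition |ωij| > 0.01 is enforced by redrawing. None of these implementation details are specified in the text, and all names are hypothetical. Self-interactions are treated like any other pair here, which is also an assumption.

```python
import random

WEIGHT_DOMAIN = (-10.0, 10.0)  # allowed domain of w_ij from Table 1
MIN_ABS_WEIGHT = 0.01          # |w_ij| must exceed this value


def candidate_new_edges(weights, node, mass):
    """(target, source) pairs involving `node` that have no interaction yet.

    Pairs whose target is the mass index are excluded, i.e. nothing may act on mass.
    """
    candidates = set()
    for other in range(len(weights)):
        if node != mass and weights[node][other] == 0.0:
            candidates.add((node, other))   # new incoming interaction of `node`
        if other != mass and weights[other][node] == 0.0:
            candidates.add((other, node))   # new outgoing interaction of `node`
    return sorted(candidates)


def sample_interaction_weight(rng):
    """Uniform draw over the allowed domain, redrawn until |w| > MIN_ABS_WEIGHT."""
    while True:
        w = rng.uniform(*WEIGHT_DOMAIN)
        if abs(w) > MIN_ABS_WEIGHT:
            return w


# Interaction weights of the Reinitz model in Table 4 (indices 0-2: nodes, 3: mass).
weights = [
    [0.0, -5.0, 7.0, 0.0],
    [-5.0, 0.0, 0.0, 0.0],
    [-7.0, 0.0, 0.0, 5.0],
    [0.0, 0.0, 0.0, 0.0],
]
rng = random.Random(1)
target, source = rng.choice(candidate_new_edges(weights, node=1, mass=3))
weights[target][source] = sample_interaction_weight(rng)
print(target, source, weights[target][source])
```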
Node addition is implemented similarly to edge addition. In this mutation
type a new node is added to the network, with the same number of interactions as a randomly chosen node already in the network. This node is forced
to have at least one incoming and one outgoing interaction, with all interaction
weights sampled by the same method as in edge addition. The ω0
value is sampled from a uniform distribution similarly to ωij; however, if none
of the incoming interactions are inhibitory, the domain of ω0 is restricted to
negative values. This is done because the node concentration could otherwise only
increase when simulated, as is evident from equation (3.1). The node characteristic parameters γ and σ are sampled from a probability distribution function
(pdf) p(x) = 1/(2x) defined between 0.1 and 10. This pdf has an expectation value
of 1 and looks uniform on a logarithmic scale.
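One way to draw from a density proportional to 1/x on [0.1, 10] is to sample the base-10 exponent uniformly in [−1, 1], which corresponds to a density of 1/2 per unit of log10 x. The sketch below assumes this is what is meant by "uniform on a logarithmic scale"; the function name is hypothetical. Note that for such a distribution it is the median and the geometric mean that equal 1. The fully normalized density in x is 1/(x ln 100).

```python
import math
import random


def sample_log_uniform(rng, low=0.1, high=10.0):
    """Draw x in [low, high] such that log10(x) is uniformly distributed."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10.0 ** exponent


rng = random.Random(42)
samples = [sample_log_uniform(rng) for _ in range(10000)]
below_one = sum(1 for x in samples if x < 1.0) / len(samples)
print(min(samples), max(samples), below_one)
```

Roughly half of the samples fall below 1, so a newly added node is equally likely to be faster or slower (γ) and steeper or shallower (σ) than a node with unit parameters.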
Interaction scaling and scaling of node properties are implemented based on
a similar thought process. As nothing is known about the parameter regions
which give a functioning cell cycle model before the networks are evaluated
by the fitness function, it is impossible to drive the network parameters into
these regions via mutation. Such a drive would be a common approach, representing a
discrete analogue of the Ornstein-Uhlenbeck process [6, Chapter 15]. Instead, the
scaling method of the network parameters in ACE is implemented to have an
expectation value of 1, i.e. the parameters are not changed on average. This
method is implemented to represent a random walk through parameter space
[6, Chapter 15]. It is also for this reason that interaction weights are sampled
from a uniform distribution.
In light of this consideration, interaction scaling is implemented by sampling
a value from the Gaussian distribution N(0, 0.5) (mean 0, standard deviation 0.5) and using this as the exponent
of 2 to give a scaling factor. The Gaussian sampling is bounded to the domain
[−2; 2], meaning the maximum scaling possible is by a factor of 1/4 or 4, while
approximately 68% of scaling factors lie between 0.707 and 1.414. In node
property scaling, γ and σ are scaled by sampling an exponent of 10 from the
Gaussian distribution N(0, 0.1), while a new ω0 value is sampled directly from
the Gaussian N(ω0′, 0.5), where ω0′ is the previous value of ω0. As is the case in
node addition, if there are no incoming negative interactions, ω0 is only sampled
in the negative range. Moreover, all scaling is done so that the scaled values
remain within their respective domains, shown in Table 1.
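The scaling rules above can be sketched as follows. Two implementation details are assumptions, since the text does not specify them: the bound on the Gaussian exponent is enforced by redrawing, and values are kept inside their Table 1 domains by clamping (for ω0, by redrawing). All function names are hypothetical.

```python
import random


def truncated_gauss(rng, mean, sd, low, high):
    """Gaussian draw, redrawn until it falls inside [low, high]."""
    while True:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            return value


def clamp(value, low, high):
    return max(low, min(high, value))


def scale_interaction(rng, weight, domain=(-10.0, 10.0)):
    """Multiply a weight by 2**e with e ~ N(0, 0.5) bounded to [-2, 2]."""
    factor = 2.0 ** truncated_gauss(rng, 0.0, 0.5, -2.0, 2.0)
    return clamp(weight * factor, *domain)


def scale_node_property(rng, value, domain=(0.1, 10.0)):
    """Multiply gamma or sigma by 10**e with e ~ N(0, 0.1)."""
    factor = 10.0 ** rng.gauss(0.0, 0.1)
    return clamp(value * factor, *domain)


def resample_omega0(rng, previous, only_negative=False, domain=(-5.0, 5.0)):
    """Draw a new omega_0 from N(previous, 0.5) inside its domain."""
    while True:
        value = rng.gauss(previous, 0.5)
        if domain[0] <= value <= domain[1] and (value < 0.0 or not only_negative):
            return value


rng = random.Random(3)
print(scale_interaction(rng, -5.0))
print(scale_node_property(rng, 1.0))
print(resample_omega0(rng, 0.2, only_negative=True))
print(2.0 ** -0.5, 2.0 ** 0.5)  # one-standard-deviation scaling factors
```

The last line reproduces the 0.707 and 1.414 bounds quoted above, which correspond to exponents of one standard deviation either side of zero.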
The mass growth rate µ from Equation (2.1d), as well as the threshold at
which the mass is halved, are left constant over evolution in ACE, as these values
control only the oscillation period.
4
Discussion and Conclusions
The preliminary results obtained from short simulations of ACE on the primitive cell cycle model in the Reinitz framework show that while not all evolved
networks represent good cell cycles, they do generate artifacts found in current
eukaryotic cell cycle models. Figures 6 and 7 specifically do not represent expected outcomes of eukaryotic cell cycles; however, the beat-frequency pattern
generated by these models can be likened to activator progression in the cell
cycle of Chlamydomonas reinhardtii [18]. This alga exhibits different activity
of a cell cycle related kinase dependent on the light-dark cycle, which controls
growth rates.

Mutated Variable    Domain
ωij                 [−10; 10]
ωi0                 [−5; 5]
γi                  [0.1; 10]
σi                  [0.1; 10]
Table 1: Allowed domains for mutated parameters

Figure 8, on the other hand, fulfills the characteristics of a good cell cycle
model, but does not seem to be mass-controlled. This type of model may
be relevant when considering embryonic cell cycles [1, Chapter 17].
Obtaining such a mass-independent embryonic cell cycle was mentioned as
a possible outcome of ACE, considering that the fitness calculation uses a
constant mass value. This, together with the fact that only 6 of the 12 cell cycle networks resulting
from the preliminary simulations were functional for the given parameter set,
shows that the fitness function has room for improvement.
The fitness function is built around finding steady states of the system that
fulfill condition (3.3), meaning the main criterion for the fitness calculation of
a network is that the dynamic solution spirals outward from an unstable fixed
point. This type of oscillating solution can generally result in one of three
dynamic paths: it can spiral outward toward infinity, it can spiral outward to
settle at another fixed point, or it can spiral outward toward a limit cycle around
the unstable fixed point. Within the mathematical framework implemented
here, the first is impossible, as the domain is bounded by Xj ∈ [0; 1] ∀j. If the
last occurs, the oscillation condition is fulfilled and thus the network can be
seen as a viable cell cycle model. However, should the solution spiral toward a
stable steady state, the problem becomes more complex.
Steady states move and can disappear if a bifurcation parameter is varied [11,
Chapter 3]. If the stable steady state that a solution spirals toward moves or
disappears due to the variation of such a bifurcation parameter, the solution can
enter a limit cycle oscillatory phase previously blocked by this steady state. This
type of dynamic is what is meant by mass-controlled limit cycle oscillation, in
the case that mass is the described bifurcation parameter. However, even if this
occurs for a bifurcation parameter that is not mass, the network can be regarded
as a potential cell cycle model. As finding new cell cycle network topologies is the
aim of this algorithm, finding the potential for limit cycle oscillation, independent
of a specific parametrization, is deemed sufficient to decide whether a network topology
has the potential of being a cell cycle.
As the network grows with evolution, the number of fixed points which the
network can spiral toward increases. Finding suitable parameter sets that allow
for oscillation, if existent, at larger scales thus becomes increasingly difficult and
therefore infeasible for an algorithm which evaluates fitness for every network
in the simulation at each iteration.
The case which remains unsolved despite the above argumentation is that in which
the solution spirals out to a stable steady state that cannot be removed by
variation of any parameter. The aforementioned boundary for Xj is one such
steady state. As mentioned above, removing this possibility is very computationally intensive. It can, however, be done by taking the output parameter
set into account and computing the time solution of the system of ODEs,
including the mass equation, around any unstable steady state found which fulfills condition (3.3). Subsequently, it must be tested whether the solution
changes by less than a threshold value for several time points after a fixed equilibration time. However, it is impossible to predict how much time the system
would take to reach a stable fixed point, and thus a predetermined cutoff would
be necessary to stay within feasible computational time limits. Therefore, even
with this type of testing, ensuring that a limit cycle oscillatory path is reached would
be impossible.
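The convergence test described above can be sketched as a check on the tail of a simulated trajectory. The function name, the trajectory format (a list of state tuples sampled at successive time points) and the synthetic example trajectories are hypothetical. A real test would use the numerical solution of the network ODEs after the equilibration time.

```python
import math


def has_settled(trajectory, threshold, n_points):
    """True if every variable changes by less than `threshold` over each of the
    last `n_points` steps of the trajectory."""
    if len(trajectory) < n_points + 1:
        return False
    tail = trajectory[-(n_points + 1):]
    for previous, current in zip(tail, tail[1:]):
        if any(abs(c - p) >= threshold for p, c in zip(previous, current)):
            return False
    return True


# Synthetic one-variable trajectories standing in for ODE solutions.
damped = [(math.exp(-0.1 * t) * math.cos(t),) for t in range(200)]
sustained = [(math.cos(t),) for t in range(200)]

print(has_settled(damped, threshold=1e-6, n_points=5))
print(has_settled(sustained, threshold=1e-6, n_points=5))
```

As the text notes, a negative result only shows that the solution has not settled within the simulated window. It cannot prove that a limit cycle has been reached.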
Furthermore, the linearization approximation in linear stability analysis makes
an interpretation of the dynamics of the solution from stability criteria alone very
difficult [12]. This approximation becomes worse the further the solution is from
the unstable steady state; however, we are looking for solutions that move away from
this steady state.
These considerations outline the limits of the implemented fitness function.
While there is room for improvement on the topic of oscillation evaluation, the
currently implemented method is a viable way of approaching the problem.
Apart from the investigation of oscillation, the fitness function can also be
improved in other ways. As is evident from the preliminary results, evolved networks
seem to resemble expected cell cycle properties more when the number of negative feedback loops is reduced, and when there are more double negative or
positive feedback loops. While this interpretation needs much more investigation
to be confirmed, specifically by reducing the evolved networks to their essential
interactions to simplify their interpretation, previous evaluation of these functional motifs supports this result [16]. The fitness calculation could be improved
by including a penalty for negative feedback loops and a reward for stabilizing
motifs.
Furthermore, the mutation of networks also has room for improvement.
Probabilities for different types of mutation events can be further refined from
the values used currently. The relative probabilities of e.g. node duplication
and node addition are sure to play a role in the way evolution is modeled;
however, these probabilities are currently simply estimates. Further
investigation into their effects would help refine the current model.
All in all, ACE is shown to be a promising algorithm to evolve cell cycle
models. In its current fundamental format, it is able to generate evolved cell
cycles which exhibit artifacts from known eukaryotic models, and even a potential candidate for an embryonic cell cycle network pending further investigation.
Simple improvements to the fitness function or further investigation to refine
mutation parameters have the potential of improving the results, yet the current
version can be seen as a good starting point for an evolutionary algorithm.
5
Future Work
As mentioned in the Discussion section, there are many improvements to be
made to ACE. Regarding the fitness function, a penalty should be added for
negative feedback loops and a reward for double negative and positive
feedback loops. This would ideally reduce the number of results with
beat-frequency patterns and produce more networks with stable oscillations.
More technical improvements can be made to this function regarding testing for
solutions which oscillate towards stable steady states, although this is surely a more
long-term goal. An easier test to implement would be to check whether inhibitor,
activator or mass asymptotes to 0, which is also an undesirable case.
Future work should also be directed toward testing evolved networks more
rigorously to obtain more precise results. Individual networks should be tested
for their essential components by creating a minimizing algorithm, which
knocks out interactions that are unnecessary to reproduce the observed dynamics, as well as nodes which asymptote to 0 concentration.
Furthermore, investigation into mutation probabilities should be done as
mentioned in the previous section.
An aspect which would help in all future investigations would be to find
a summary statistic which describes the suitability of a model as a cell cycle.
This statistic could not only be used as a benchmark for the aforementioned
investigations, but could also be used in the fitness calculation.
There are also interesting investigations which are, however, very long-term
goals. ACE can be modified to investigate possible paths a primitive cell cycle
model took to arrive at models such as the fission or budding yeast cell cycles.
This would mean creating a reverse mutation function which starts at a more
complex cell cycle such as fission yeast and ends at a primitive cell cycle model.
This type of approach can also give an indication of the amount of redundancy
likely to exist in these current models.
Another avenue is to apply ACE to other biological
models apart from the cell cycle by rewriting the fitness function. This function determines the criteria against which a network is evaluated; thus,
given a different set of criteria, networks that represent different models can be
generated.
Finally, an alternative modeling approach would be to use a continuous-time
Markov process with a continuous state space such as R+, or intervals of
real numbers. Such models have been used extensively in population genetics to
model gene frequency change [15]. This approach would have had the benefit of
real evolutionary time and would therefore be better suited for phylogenetic inference;
however, it would have been more difficult to implement for a cell cycle model, and
thus unsuitable for a short project.
6
Acknowledgements
I would like to thank Lukas Hutter, Dr. P.K. Vinod and Dr. T. Zhang for help
with using the XPP software and insightful conversations about the cell cycle.
References
[1] Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and
James D Watson. Molecular biology of the cell (3rd edition). Garland, New
York, 1994.
[2] Stephen J Elledge. Cell cycle checkpoints: preventing an identity crisis.
Science, 274(5293):1664–1672, 1996.
[3] Bard Ermentrout. Xppaut. In Computational Systems Neurobiology, pages
519–531. Springer, 2012.
[4] James E Ferrell Jr, Tony Yu-Chen Tsai, and Qiong Yang. Modeling the
cell cycle: why do certain circuits oscillate? Cell, 144(6):874–885, 2011.
[5] Paul M Harrison, Anuj Kumar, Ning Lang, Michael Snyder, and Mark
Gerstein. A question of size: the eukaryotic proteome and the problems in
defining it. Nucleic acids research, 30(5):1083–1090, 2002.
[6] Samuel Karlin and Howard Milton Taylor. A second course in stochastic
processes. Academic Press, 1981.
[7] Eric Mjolsness, David H Sharp, and John Reinitz. A connectionist model
of development. Journal of theoretical Biology, 152(4):429–453, 1991.
[8] Kim Nasmyth. Evolution of the cell cycle. Philosophical Transactions of the
Royal Society of London. Series B: Biological Sciences, 349(1329):271–281,
1995.
[9] Bela Novak, Attila Csikasz-Nagy, Bela Gyorffy, Kim Nasmyth, and John J
Tyson. Model scenarios for evolution of the eukaryotic cell cycle. Philosophical Transactions of the Royal Society of London. Series B: Biological
Sciences, 353(1378):2063–2076, 1998.
[10] Orkun S Soyer and Sebastian Bonhoeffer. Evolution of complexity in
signaling pathways. Proceedings of the National Academy of Sciences,
103(44):16337–16342, 2006.
[11] Steven Strogatz. Nonlinear dynamics and chaos: with applications to
physics, biology, chemistry and engineering. Perseus Books Group, 2001.
[12] Jörg W Stucki. Stability analysis of biochemical systems: a practical guide.
Progress in Biophysics and Molecular Biology, 33:99–187, 1979.
[13] Akos Sveiczer, Attila Csikasz-Nagy, Bela Gyorffy, John J Tyson, and Bela
Novak. Modeling the fission yeast cell cycle: Quantized cycle times in wee1− cdc25Δ mutant cells. Proceedings of the National Academy of Sciences,
97(14):7865–7870, 2000.
[14] Ilias Tagkopoulos, Yir-Chung Liu, and Saeed Tavazoie. Predictive behavior
within microbial genetic networks. Science, 320(5881):1313–1317, 2008.
[15] Elizabeth Alison Thompson. Human evolutionary trees. CUP Archive,
1975.
[16] John J Tyson and Béla Novák. Functional motifs in biochemical reaction
networks. Annual review of physical chemistry, 61:219–240, 2010.
[17] Anael Verdugo, PK Vinod, John J Tyson, and Bela Novak. Molecular
mechanisms creating bistable switches at cell cycle transitions. Open biology, 3(3), 2013.
[18] Vilém Zachleder, Oliver Schläfli, and Arminio Boschetti. Growth-controlled
oscillation in activity of histone H1 kinase during the cell cycle of Chlamydomonas reinhardtii (Chlorophyta). Journal of Phycology, 33(4):673–681,
1997.
Supplementary Material
Molecular Models of the Primitive Cell Cycle
The equations governing the simplest inhibitor activator model mentioned in
the introduction are:
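\begin{align*}
\frac{\partial C}{\partial t} &= k_{ass}\, A\, I - (k_{diss} + k_{di} + k_{da} + k_{cat}\, A)\, C \tag{6.1a} \\
\frac{\partial A_t}{\partial t} &= k_{sa}\, \text{mass}\; TF_a - k_{da}\, A_t \tag{6.1b} \\
\frac{\partial I_t}{\partial t} &= k_{si} - k_{di}\, I_t - k_{cat}\, C\, A \tag{6.1c} \\
\frac{\partial\, \text{mass}}{\partial t} &= \mu\, \text{mass} \tag{6.1d} \\
I &= I_t - C \tag{6.1e} \\
A &= A_t - C \tag{6.1f} \\
TF_a &= \frac{\Phi}{1 + (A / [?])^{n}} \tag{6.1g}
\end{align*}
Here [?] marks the Hill function parameter whose symbol is missing from the source.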
Equation 6.1g is a Hill function for self-inhibition, and Φ, , and n are tunable
parameters, where n > 1 must be fulfilled. Here the activator is abbreviated
to A, the complex to C and the inhibitor to I. The quantities with subscript t
denote the total concentration of this protein in the system.
Mass is modeled as exponentially increasing in time by (6.1d); it is halved
once the activator falls below a certain threshold, thereby modeling the cell
division at the end of M phase, when synthesis has long since been fired by a high
activator concentration, which is then decaying slowly.
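As a short worked example of what the mass equation implies for the cycle time: assuming the cycle is strictly periodic with exactly one halving of mass per cycle, the exponential growth in (6.1d) must double the mass between two divisions. The period T below follows from this assumption alone and uses the mass growth rate µ = 0.05 listed in Table 2.

```latex
\begin{align*}
\text{mass}(t) &= \text{mass}(0)\, e^{\mu t}
  && \text{solution of (6.1d) between divisions} \\
e^{\mu T} &= 2
  && \text{mass doubles between two halvings} \\
T &= \frac{\ln 2}{\mu} = \frac{0.693}{0.05} \approx 13.9
  && \text{time units, for } \mu = 0.05
\end{align*}
```

Under this assumption the period of a mass-controlled cycle is set by µ, while the threshold determines at which mass the divisions occur.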
The simplified SIMM-model is governed by the following equations:
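\begin{align*}
\frac{\partial C}{\partial t} &= k_{ass}\, A\, I - (k_{diss} + k_{di} + k_{da} + k_{cat})\, C \\
\frac{\partial I_p}{\partial t} &= k_{cat}\, C - (k_{dp} + k_{di} + k_{cat2}\, A)\, I_p \\
\frac{\partial A_t}{\partial t} &= k_{sa}\, \text{mass}\; TF_a - k_{da}\, A_t \\
\frac{\partial I_t}{\partial t} &= k_{si} - k_{di}\, I_t - k_{cat2}\, I_p\, A \\
\frac{\partial\, \text{mass}}{\partial t} &= \mu\, \text{mass} \\
I &= I_t - C \\
A &= A_t - C \\
TF_a &= \frac{\Phi}{1 + (A / [?])^{n}}
\end{align*}
Here [?] marks the Hill function parameter whose symbol is missing from the source.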
Parametrization of Primitive Cell Cycle Networks
Parameter   Value   Parameter description
Φ           1       TF Hill function parameter
[?]         0.001   TF Hill function parameter (symbol missing in source)
n           1.1     TF Hill function parameter
thresh      0.06    Activator threshold at which mass halving occurs
µ           0.05    Mass growth rate
ksa         0.2     Synthesis rate of Activator
ksi         0.2     Synthesis rate of Inhibitor
kda         0.5     Degradation rate of Activator
kdi         0.05    Degradation rate of Inhibitor
kcat        50      Catalysis rate of Complex
kass        100     Association rate to Complex
kdiss       0.01    Disassociation rate from Complex
Table 2: Simplest Activator-Inhibitor molecular model parameter set
Parameter   Value   Parameter description
Φ           1       TF Hill function parameter
[?]         0.001   TF Hill function parameter (symbol missing in source)
n           1.1     TF Hill function parameter
thresh      0.06    Activator threshold at which mass halving occurs
µ           0.05    Mass growth rate
ksa         1       Synthesis rate of Activator
ksi         0.5     Synthesis rate of Inhibitor
kda         0.025   Degradation rate of Activator
kdi         0.05    Degradation rate of Inhibitor
kcat        0.5     Catalysis rate of Complex
kcat2       20      Rate of INH-P degradation catalyzed by Activator
kdp         0.5     Dephosphorylation rate of INH-P
kass        100     Association rate to Complex
kdiss       0.01    Disassociation rate from Complex
Table 3: Simplified SIMM molecular model parameter set
Parameter   Value   Parameter description
γ1          1       Activator timescale parameter
γ2          5       Inhibitor timescale parameter
γ3          0.1     TF timescale parameter
σ1          1       Activator sigmoid steepness parameter
σ2          4       Inhibitor sigmoid steepness parameter
σ3          2       TF sigmoid steepness parameter
ω10         -1      Activator synthesis - degradation rate
ω12         -5      Activator inhibition weight by Inhibitor
ω13         7       Activator activation weight by TF
ω20         1.5     Inhibitor synthesis - degradation rate
ω21         -5      Inhibitor inhibition weight by Activator
ω30         0       TF synthesis - degradation rate
ω31         -7      TF inhibition weight by Activator
ω34         5       TF activation weight by Mass
µ           0.01    Mass growth rate
thresh      0.5     Activator threshold at which mass halving occurs
Table 4: Reinitz model parameter set. Further explanations of the parameters
can be found in the Methods section
Parametrization of Evolved Cell Cycle Networks
Parameter             Value         Parameter description
N MIN                 6             Minimum size of all networks to exit equilibration
Batchsize             3             Number of “organisms” in the simulation
nMax                  20            Maximum number of nodes per network
MAX GENERATION        100           Number of evolution iterations
NODE FITNESS COST     0.001         Fitness cost per node
NODE PROP MUTATION    0.5           Probability of node property mutation
EqMut[0]              0.3           Equilibration probability of node duplication
EqMut[1]              0             Equilibration probability of node deletion
EqMut[2]              0.3           Equilibration probability of node addition
EqMut[3]              0             Equilibration probability of interaction deletion
EqMut[4]              0.4           Equilibration probability of interaction addition
EqMut[5]              0             Equilibration probability of interaction scaling
EqMut[6]              0             Equilibration probability of interaction inversion
Mut[0]                0.1           Relative probability of node duplication
Mut[1]                0.2           Relative probability of node deletion
Mut[2]                0.1           Relative probability of node addition
Mut[3]                0.3           Relative probability of interaction deletion
Mut[4]                0.3           Relative probability of interaction addition
Mut[5]                0.3           Relative probability of interaction scaling
Mut[6]                0.15          Relative probability of interaction inversion
Ace1: seed            -1379610050   Uniform sampler random seed for Fig 6 run
Ace1: gausSeed        1379610050    Gaussian sampler random seed for Fig 6 run
Ace2: seed            -1379933119   Uniform sampler random seed for Fig 7 run
Ace2: gausSeed        1379933119    Gaussian sampler random seed for Fig 7 run
Ace3: seed            -1379933503   Uniform sampler random seed for Fig 8 run
Ace3: gausSeed        1379933503    Gaussian sampler random seed for Fig 8 run
Table 5: Parameter set of ACE run to produce evolved networks