doi: 10.1111/j.1420-9101.2007.01466.x
Organism size promotes the evolution of specialized cells
in multicellular digital organisms
M. WILLENSDORFER
Program for Evolutionary Dynamics, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
Keywords:
differentiated multicellularity;
division of labour;
early life;
somatic cells.
Abstract
Specialized cells are the essence of complex multicellular life. Fossils allow us
to study the modification of specialized, multicellular features such as jaws,
scales, and muscular appendages. But it is still unclear what organismal
properties contributed to the transition from undifferentiated organisms,
which contain only a single cell type, to multicellular organisms with
specialized cells. Using digital organisms I studied this transition. My
simulations show that the transition to specialized cells happens faster in
organisms composed of many cells than in organisms composed of few cells.
Large organisms suffer less from temporarily unsuccessful evolutionary
experiments with individual cells, allowing them to evolve specialized cells
via evolutionary trajectories that are unavailable to smaller organisms. This
demonstrates that the evolution of simple multicellular organisms which are
composed of many functionally identical cells accelerates the evolution of
more complex organisms with specialized cells.
Introduction
In multicellular organisms, cells differentiate and specialize to form tissues which cooperate to form organs
such as brains, kidneys, hearts, stomachs and lungs.
Without specialized cells multicellular organisms would
be nothing more than a homogeneous lump of cells. It is
a widely accepted consequence of evolutionary theory
that differentiated organisms with specialized cells
evolved from undifferentiated ancestors (Darwin, 1859;
Buss, 1988; Knoll, 2003; King, 2004).
It is believed that the pre-existence of undifferentiated
multicellularity conveys advantages for the evolution of
specialized cells (Buss, 1988; Maynard-Smith, 1989).
One argument regards the alleviation of reproductive
competition in organisms that develop from a single cell.
In such organisms, cells are genetically identical and
genes that encode for the development of specialized cells
would not curtail their own propagation by creating
nonreproductive cells (Buss, 1988; Maynard-Smith,
1989; Maynard-Smith & Szathmary, 1997; Dawkins,
1999; Michod & Roze, 2001).
Correspondence: M. Willensdorfer, Program for Evolutionary Dynamics,
Department of Molecular and Cellular Biology, Harvard University,
Cambridge, MA 02138, USA.
e-mail: [email protected]
In this work, I demonstrate that undifferentiated
multicellularity conveys an additional advantage for the
evolution of specialized cells. I find that the size of a
multicellular organism affects its fitness landscape. Mutations that differentiate individual cells are less detrimental in organisms composed of many cells than
in organisms composed of few cells. This changes the
evolutionary landscape and accelerates the evolution of
specialized cells. The insight that the size of an organism
affects its ability to evolve new, specialized cells is vital
for our understanding of how complex multicellular life
evolved.
To study the evolution of a complex feature like
differentiated multicellularity, it is desirable to use an
experimental system in which the evolutionary path
from one stage to another is not preset but discovered by
evolution itself. Digital organisms provide such a framework. Digital organisms are entities that are able to
replicate and perform specific tasks. They compete for a
common resource and are exposed to mutations. The
combination of replication, competition and mutation
results in an evolutionary process, which can be used to
address biological questions (Adami et al., 2000; Wilke
et al., 2001; Yedid & Bell, 2002; Lenski et al., 2003; Chow
et al., 2004).
So far, however, digital organisms have not been
equipped with the ability to evolve multicellularity. To
close this gap, I developed and implemented digital
organisms that are able to evolve multicellularity of
varying complexity. The resulting digital self-replicating
cellular organisms (DISCOs) are similar to the digital
organisms used by the Avida software platform (Ofria &
Wilke, 2004). The Supplementary material contains
details about how a single DISCO cell works and how
multicellularity is implemented. Besides providing
insight into the evolution of specialized cells, this paper
demonstrates how readily existing artificial life systems
can be extended to study the evolution of complex
multicellular features.
© 2007 THE AUTHOR. J. EVOL. BIOL. 21 (2008) 104–110
JOURNAL COMPILATION © 2007 EUROPEAN SOCIETY FOR EVOLUTIONARY BIOLOGY
For the following, it is sufficient to know that a DISCO
has a genome that can encode logic functions as well as
the development of a multicellular organism. The fitness
of a DISCO is determined by its merit and its speed of
replication. The merit is determined by the logic functions that the DISCO can execute, which are encoded in
its genome. Logic functions differ in their complexity.
The higher the complexity of the logic function, the
larger the merit increase (see Supplementary material
and Lenski et al., 2003).
The speed of replication is mainly determined by the
number of cells a DISCO is composed of. Several types of
cells exist. The default (D) cell is the replicative cell.
Every DISCO has exactly one D cell and it is the only cell
type in a unicellular organism. D cells can, if instructed
by the genome, produce somatic X and Y cells. Somatic
cells are always associated with a cost as a multicellular
organism spends time and energy growing them,
whereas a unicellular organism can use these resources
to produce offspring. On the other hand, by computing
logic functions, somatic cells can increase a DISCO’s
merit yielding a benefit that outweighs these costs. Some
functions, however, can only be utilized by specific,
specialized cells.
Specialized cells are common in biology. The model
structure studied in this work is motivated by the
heterocysts of cyanobacteria. Heterocysts are cells specialized for the fixation of nitrogen. They provide an
oxygen-free environment for the nitrogen-fixing
enzyme. To accomplish this, they develop thick cell walls
that shut out oxygen. They also degrade photosystem II,
which produces oxygen. These features allow heterocysts
to fix nitrogen, but prevent them from carrying out
functions of nonspecialized cells, such as cell division and
photosynthesis via photosystem II.
Y cells are specialized cells in DISCOs and analogous to
heterocysts in cyanobacteria. Y cells are different from
normal (D and X) cells. The differences allow Y cells to
utilize the three most complex logic functions which
cannot be utilized by the nonspecialized D and X cells.
This specialization, however, makes it impossible for Y
cells to utilize the six logic functions that nonspecialized
D and X cells can utilize. Thus, similar to heterocysts, Y
cells are specialized for certain tasks (see first column in
Fig. 1b and Supplementary material).
It is important to emphasize that a multicellular DISCO
can only benefit from a given logic function if: (a) the
function is encoded in the genome; and (b) cell types that
are able to utilize the function are present. This is
analogous to heterocystous cyanobacteria that can only
benefit from nitrogen fixation if: (a) the nitrogen-fixing
enzyme is correctly encoded in the genome; and (b) cells
with degenerated photosystem II and thick, oxygen-impermeable cell walls exist.
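Conditions (a) and (b) can be made concrete with a minimal Python sketch. The layout used here is an assumption for illustration only: the nine logic functions are indexed 0 to 8, the first six are taken to be the ones usable by D and X cells and the last three the ones usable by Y cells. The names `USABLE_BY` and `utilized_functions` are hypothetical; the actual encoding is described in the Supplementary material.

```python
# Hypothetical layout: functions 0-5 are usable by nonspecialized cells (D, X),
# functions 6-8 (the three most complex ones) only by specialized Y cells.
NONSPECIFIC = frozenset(range(0, 6))
Y_SPECIFIC = frozenset(range(6, 9))
USABLE_BY = {"D": NONSPECIFIC, "X": NONSPECIFIC, "Y": Y_SPECIFIC}


def utilized_functions(encoded, cell_types):
    """Return the logic functions an organism can actually benefit from.

    encoded    -- set of function indices encoded in the genome (condition a)
    cell_types -- iterable of cell types present in the organism (condition b)
    """
    usable = set()
    for cell_type in cell_types:
        usable |= USABLE_BY[cell_type]
    return set(encoded) & usable


# A genome that encodes a Y-specific function (index 7) gains nothing from it
# as long as the organism has no Y cell ...
print(sorted(utilized_functions({0, 3, 7}, ["D", "X", "X"])))       # [0, 3]
# ... and benefits from it once a Y cell is present.
print(sorted(utilized_functions({0, 3, 7}, ["D", "X", "X", "Y"])))  # [0, 3, 7]
```

The sketch only tracks which functions count; how much merit each function contributes is left out.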
Simulations and results
In this work, I am interested in the transition from simple
multicellularity to a more complex multicellularity with
specialized cells. In particular, I would like to know if the
pre-existence of undifferentiated multicellular organisms
has an effect on the evolution of specialized cells. To
study this, I will compare ‘−X’ and ‘+X’ simulations. In
+X simulations, X cells are able to increase the merit of
a DISCO. A DISCO composed of one D cell and n X cells
has n + 1 times the merit of a unicellular DISCO. In +X
simulations, the evolution of X cells, that is, undifferentiated multicellularity, is encouraged. This is not the case
in −X simulations in which X cells are not able to increase
the merit of the organism and are therefore disadvantageous. In −X simulations, a transition to differentiated
multicellularity with specialized cells has to occur directly
from a unicellular ancestor.
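The difference between the two treatments can be written down in a few lines of Python. This is a sketch under simplifying assumptions: it considers only D and X cells, expresses merit relative to a unicellular DISCO with the same genome, and ignores the developmental cost of growing the extra cells. The function name `relative_merit` is hypothetical.

```python
def relative_merit(n_x_cells, x_cells_contribute):
    """Merit of a DISCO with one D cell and n X cells, relative to a
    unicellular DISCO that encodes the same logic functions.

    x_cells_contribute -- True for +X simulations, False for -X simulations
    """
    if n_x_cells < 0:
        raise ValueError("number of X cells must be non-negative")
    if x_cells_contribute:
        # +X: every X cell adds as much merit as the D cell itself.
        return n_x_cells + 1
    # -X: X cells add nothing, so only the D cell counts.
    return 1


for n in (0, 1, 4):
    print(n, relative_merit(n, True), relative_merit(n, False))
```

In the −X case the merit stays flat while each additional cell still costs cell divisions, which is why X cells are disadvantageous there.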
To study the transition to differentiated organisms,
I evolved, as a first step, undifferentiated DISCOs. To
ensure that DISCOs do not evolve specialized cells, I
suspended the ability of Y cells to increase the merit of
the organism for this initial set of simulations (see
Table S1 and Fig. 1b). For each set of simulations (−X
and +X), I conducted 500 independent runs that differ
only with respect to the seed for the random number
generator. Each simulation was initiated with a genome
that encodes only replication. In other words, the
ancestral DISCO was unicellular and could not compute
any logic function. For computational reasons, I used an
effective population size of 200 organisms and stopped a
simulation after 10 000 generations (see Supplementary
material for more details). Following the logic of Lenski
et al. (2003), at the end of each simulation I determined
the most recent common ancestor of the population and
its line of descent. Similar to a palaeontologist, I use this
(digital) fossil record to determine when each trait
appeared. But in contrast to a palaeontologist, I know
the fitness of each fossil and study 500 independent
instances of one evolutionary process. This gives me the
opportunity to discover general properties of the process
at hand.
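The bookkeeping behind such a digital fossil record can be sketched as follows. The representation is an assumption made for illustration: every organism is stored with a pointer to its parent (the founding ancestor has parent `None`), and the helper names `line_of_descent`, `most_recent_common_ancestor` and `first_appearance` are hypothetical, not part of the DISCO software.

```python
def line_of_descent(parent, organism):
    """Ancestors of `organism`, ordered from the founding ancestor to `organism`."""
    lineage = [organism]
    while parent[lineage[-1]] is not None:
        lineage.append(parent[lineage[-1]])
    lineage.reverse()
    return lineage


def most_recent_common_ancestor(parent, population):
    """Last organism shared by the lines of descent of all population members."""
    lineages = [line_of_descent(parent, organism) for organism in population]
    mrca = None
    for same_depth in zip(*lineages):
        if len(set(same_depth)) != 1:
            break  # the lineages have diverged
        mrca = same_depth[0]
    return mrca


def first_appearance(lineage, has_trait):
    """Position along the line of descent at which a trait first shows up."""
    for step, organism in enumerate(lineage):
        if has_trait(organism):
            return step
    return None


# Toy genealogy: a -> b -> c -> {d, e}, with a side branch b -> f.
parent = {"a": None, "b": "a", "c": "b", "d": "c", "e": "c", "f": "b"}
traits = {"a": set(), "b": {"X cell"}, "c": {"X cell", "NAND"},
          "d": {"X cell", "NAND"}, "e": {"X cell", "NAND"}, "f": {"X cell"}}

mrca = most_recent_common_ancestor(parent, ["d", "e"])
fossils = line_of_descent(parent, mrca)
print(mrca, fossils)                                             # c ['a', 'b', 'c']
print(first_appearance(fossils, lambda o: "NAND" in traits[o]))  # 2
```

Positions along the line of descent are counted in ancestors, not in generations; a real record would store the generation of each ancestor alongside its genome.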
As expected, none of the −X simulations evolved
multicellularity during the first 10 000 generations. All
DISCOs remained unicellular. On the other hand, 491 of
the 500 +X simulations evolved undifferentiated multicellularity. Multicellular DISCOs are very diverse with
respect to their size. They have body sizes ranging from
two to 13 cells, with size five as the most frequent. The
organisms were also very successful in evolving logic
functions. Most simulations evolved DISCOs that can
compute all six available functions; few evolved ‘just’ five
functions.
Fig. 1 Multicellularity in digital self-replicating cellular organisms (DISCOs). (a) The first five cell divisions of a DISCO with a genome that
encodes for two X cells (green shaded regions) and one Y cell (blue shaded region). The first three cell divisions produce the somatic X and
Y cells. Every further division produces offspring which is released into the environment. (b) The merit of this multicellular organism
during the first and second 10 000 generations of the −X and +X simulations. The genome encodes for five (of nine) logic functions as indicated
by the nine-digit binary sequence. D, X and Y cells can utilize these functions (second column) only according to their cell type specificity
(first column) and receive a corresponding merit (third column) which is used to calculate the merit of the organism (fourth column). Note
that Y cells are not able to increase the merit of the organism during the first 10 000 generations and that X cells are not able to increase
the merit of the organism during the −X simulations. Cells that do not increase merit are disadvantageous, as they increase the number of
cell divisions that are required to reach maturity. A more detailed description is available in the Supplementary material.
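The division schedule described in the caption of Fig. 1a can be sketched in Python. The order in which the somatic cells are produced (X cells before the Y cell) and the function name `division_outcomes` are assumptions for illustration; the caption only states that the first three divisions yield the somatic cells and all later ones yield offspring.

```python
def division_outcomes(somatic_cells, n_divisions):
    """Outcome of each of the first `n_divisions` divisions of the D cell.

    somatic_cells -- somatic cell types encoded in the genome, in the
                     (assumed) order in which they are grown
    """
    outcomes = []
    for division in range(n_divisions):
        if division < len(somatic_cells):
            # The organism is still growing: this division adds a somatic cell.
            outcomes.append(somatic_cells[division])
        else:
            # The organism is mature: this division releases an offspring.
            outcomes.append("offspring")
    return outcomes


# Organism of Fig. 1a: two X cells and one Y cell.
print(division_outcomes(["X", "X", "Y"], 5))
# A unicellular DISCO spends every division on offspring.
print(division_outcomes([], 5))
```

Over the same five divisions the four-celled organism releases two offspring and the unicellular one releases five, which is the developmental cost that somatic cells have to repay through merit.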
To study the evolution of specialized cells, I used each
of these most recent common ancestors as a starting point
for another 2 × 500 simulations. This time, Y cells were
able to utilize functions that had not been available so
far. They can increase the fitness of a DISCO substantially
(see Supplementary material and Fig. 1b). In such a
situation, one expects the evolution of DISCOs with Y
cells and Y-cell-specific functions, which was indeed the
case. As expected, multicellular DISCOs in the −X
simulations were exclusively bicellular, composed of
one D and one Y cell. Specialized cells were discovered
in 197 of the 500 −X and in 308 of the 500 +X
simulations. This difference is highly significant (two-sample test for equality of proportions: χ² = 48.40,
d.f. = 1, P = 3.5 × 10⁻¹²). Apparently, the pre-existence
of undifferentiated multicellular organisms promotes the
evolution of more complex multicellular organisms with
specialized cells.
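The reported test statistic can be checked from the two counts alone. The sketch below uses only the Python standard library and assumes that the test was a Pearson chi-squared test on the 2 × 2 table with Yates' continuity correction; that assumption is not stated in the text, but it appears to reproduce the reported value. The function name `two_sample_proportion_test` is hypothetical.

```python
import math


def two_sample_proportion_test(successes_a, n_a, successes_b, n_b):
    """Chi-squared test (1 d.f.) for equality of two proportions,
    with Yates' continuity correction (an assumption, see lead-in)."""
    observed = [
        [successes_a, n_a - successes_a],
        [successes_b, n_b - successes_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [successes_a + successes_b,
                  total - successes_a - successes_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    # Upper tail of the chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# 197 of 500 -X runs and 308 of 500 +X runs discovered specialized cells.
chi2, p_value = two_sample_proportion_test(197, 500, 308, 500)
print(f"chi2 = {chi2:.2f}, P = {p_value:.1e}")
```

Without the continuity correction the statistic would come out slightly larger (about 49.3), so the correction is the more plausible reading of the reported 48.40.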
To study why undifferentiated multicellularity promotes the evolution of Y cells, I examined the evolutionary paths that lead to DISCOs with specialized cells.
Considering the order of events we have three possibilities. Mutations can result in the simultaneous (si)
appearance of Y cells and Y-cell-specific functions, or
the two traits may appear in succession, either first the
cell and then the function (cf) or first the function and
then the cell (fc). The digital fossil record allows us to
determine via which path and at what time Y cells were
discovered (see Fig. 2). Two features are conspicuous.
First, for the −X simulations si is the most frequently
travelled path; about 46% of the specialized cells are
discovered simultaneously with the cell function. Secondly, the −X and +X simulations differ noticeably only
with respect to cf (red triangles). Apparently, evolving
first the cell and then the function is much easier for
undifferentiated multicellular organisms than it is for
unicellular ones and accounts for the significant difference in the number of simulations that discovered Y cells
between the +X and the −X simulations.
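The three paths can be told apart from two time stamps in the fossil record. The following sketch assumes that the generation at which Y cells first appear and the generation at which a Y-cell-specific function first appears are both known; the function names and the sign convention for the waiting time (positive along cf, negative along fc) are assumptions for illustration.

```python
def classify_path(t_cell, t_function):
    """Classify the evolutionary path from the generation at which Y cells
    (t_cell) and Y-cell-specific functions (t_function) first appeared."""
    if t_cell == t_function:
        return "si"  # cell and function appear simultaneously
    if t_cell < t_function:
        return "cf"  # first the cell, then the function
    return "fc"      # first the function, then the cell


def waiting_time(t_cell, t_function):
    """Generations between the appearance of Y cells and Y-cell functions;
    positive along cf, negative along fc, zero for si."""
    return t_function - t_cell


for t_cell, t_function in [(120, 120), (300, 312), (950, 800)]:
    print(classify_path(t_cell, t_function), waiting_time(t_cell, t_function))
```

The generation numbers in the usage loop are made up and serve only to exercise the three cases.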
To explain these observations, we have to consider the
fitness of organisms along the three possible paths.
Especially, the intermediates for cf and fc are of interest.
Let Dc and Df denote DISCOs along the evolutionary
paths cf and fc. That is, Dc is a DISCO that encodes Y cells
but not (yet) Y-cell-specific logic functions, and Df is a
DISCO that has acquired Y-cell-specific functions but not
(yet) Y cells. If Dc and Df have a low fitness, then they are
not maintained for long in the population and there is
less opportunity for a second mutation to give rise to the
missing Y-cell function or the missing Y cell. In such a
case, evolution along the corresponding paths is impaired
and one would expect most specialized cells to evolve
directly via si (Iwasa et al., 2003, 2004).
[Figure 2: x-axis, number of generations (0 to 10 000); y-axis, number of simulations (0 to 200); legend: +X simulations, −X simulations, cf (first cell then function), si (simultaneously cell and function), fc (first function then cell).]
Fig. 2 Number of simulations that evolved specialized cells as a
function of time. Simulations are grouped according to the evolutionary paths cf, si and fc (see figure legend and main text) that led
to the evolution of new cell types that utilize new functions.
Noticeable differences between simulations with unicellular (−X)
and undifferentiated multicellular (+X) ancestors exist only with
respect to evolutionary path cf along which specialized cells (Y cells)
appear before the genome encodes for the specialized functions.
The digital fossil record provides information about the
time, t, that Dc and Df are maintained in the population,
as well as the organism size at that point (see Fig. 3). Let
us first discuss the data for organisms of size one, that is,
the −X simulations. None of the 51 simulations that
evolved Y cells via cf maintained Dc for more than 20
generations in the population. This suggests that mutations that provide a DISCO with Y cells are deleterious.
The digital fossil record shows that the fitness decrease is
not a result of a merit decrease due to a loss of logic
functions. Rather, the loss in fitness is caused by
developmental costs. DISCOs with Y cells but without
Y-cell functions grow one additional, unused cell. This
constitutes a fitness burden. In the Supplementary
material, I show that mutations that transform a unicellular DISCO into a bicellular DISCO decrease the relative
fitness from 1 to about 0.62. Thus, the fitness of Dc is
indeed low.
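Why a relative fitness of 0.62 leaves so little time for a second mutation can be illustrated with a deliberately crude calculation. It is not the DISCO model: it assumes a rare mutant lineage in which every carrier leaves on average w offspring per generation (against 1 for the resident type), so that the expected lineage size after g generations is w^g and the expected cumulative number of carriers is 1/(1 − w). The function names are hypothetical, and 0.93 is the relative fitness that the paper quotes for a 10-celled DISCO growing one unused cell.

```python
import math


def expected_total_carriers(w):
    """Expected cumulative number of carriers of a lineage founded by one
    mutant with relative fitness w < 1 (sum of w**g over all generations)."""
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie strictly between 0 and 1")
    return 1.0 / (1.0 - w)


def generations_until_below(w, threshold=0.01):
    """Generations until the expected lineage size w**g falls below threshold."""
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie strictly between 0 and 1")
    return math.log(threshold) / math.log(w)


for w in (0.62, 0.93):
    print(f"w = {w}: about {expected_total_carriers(w):.1f} carriers in total, "
          f"expected lineage size below 1% after "
          f"{generations_until_below(w):.0f} generations")
```

Under these assumptions the less costly intermediate offers several times as many carriers, and hence opportunities, for the second mutation to occur.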
What about the fitness of Df? The data in Fig. 3 show
that Df is easier to maintain in the population than Dc
and suggest that mutations along path fc are less
deleterious than mutations along cf. But for the following
reasons, we can actually expect most mutations that
generate Y-cell functions in DISCOs without Y cells
(mutations along path fc) to be very deleterious. The
digital fossil record shows that most mutations (>95%)
that generate Y-cell functions in DISCOs with Y cells
(along evolutionary path cf) destroy at least one of the
previously evolved logic functions. This is not detrimental for DISCOs with Y cells because they trade a
Y-cell-specific function for a nonspecific function. However,
DISCOs without Y cells cannot utilize the newly discovered logic function and experience ‘just’ a loss of already
evolved logic functions. Hence, most mutations that lead
to Df are actually deleterious and evolution via fc can
only use a small (<5%) subset of neutral mutations. All
things considered, Df and Dc have on average a low
fitness and we should not be surprised that many
specialized cells evolve via si (Iwasa et al., 2003, 2004).
But why and how does the situation change for
undifferentiated, multicellular organisms? Why is the
rate of evolution via cf higher in +X than in −X
simulations (see Fig. 2)? We can answer this question
by considering the developmental cost of Y cells for
organisms of different sizes. As one would expect, the
burden of developing one additional unused cell is more
substantial for small organisms than it is for large ones.
For example, a size increase by one decreases the relative
fitness of a unicellular DISCO to 0.62, whereas the
relative fitness of a 10-celled DISCO is decreased to only
0.93 (see Supplementary material). Hence, the intermediates along cf are less deleterious for large organisms and
are therefore maintained longer in the population. This is
evidenced by a significant (Kendall’s τ-statistic: z = 5.15,
P = 2.64 × 10⁻⁷) correlation between t and the size of the
organism for cf (see Fig. 3). Consequently, the rate of
evolution along cf increases with organism size and large
organisms are significantly more likely to evolve Y cells
via cf (Pearson’s chi-squared test: χ² = 37.20, d.f. = 13,
P = 3.9 × 10⁻⁴, see numbers in Fig. 3). By lowering the
barriers along one of the evolutionary paths, organism
size promotes the evolution of specialized cells and,
therefore, of more complex multicellular organisms.
[Figure 3: x-axis, number of generations between the appearance of Y cells and Y-cell functions; y-axis, organism size (1 to 15); panels: FC (first function then cell) and CF (first cell then function); legend: +X simulations, −X simulations.]
Fig. 3 Time, t, in number of generations
between the appearance of specialized cells
(Y cells) and the appearance of specialized
functions. The data are grouped according to
the size of the digital self-replicating cellular
organism immediately before the appearance
of Y cells. The plot contains data from the +X
(organism size greater than one) and −X
(organism size equals one) simulations. The
panel in the middle shows the number of
simulations that evolved Y cells via fc, si
and cf respectively. The correlation between
t and the size of the organism for evolutionary path cf is evident (Kendall’s τ-statistic:
z = 5.15, P = 2.64 × 10⁻⁷). It shows how
organism size affects the evolution of specialized cells by reducing the detrimental
effect of temporarily unsuccessful evolutionary experiments with individual cells.
To increase expressiveness, I added small
random noise to the organism size and
used different plot regions for fc and cf.
If this is the case, then we should also find evidence for
this phenomenon within the +X simulations. In particular, organisms composed of many cells should evolve
specialized cells earlier than organisms composed of few
cells. Fig. 4 shows the size distribution of DISCOs with
and without Y cells at different time points of the +X
simulations. For example, after 200 generations, 14
simulations discovered Y cells. Only two of those
(<15%) were discovered in organisms smaller than seven
cells, even though most simulations (>60%) contained
organisms smaller than seven cells. This bias towards
specialized cells in larger organisms is even more
pronounced in later stages of the simulations and
remains significant (see P-values in Fig. 4). Even within
the +X simulations, we observe that an increase in
organism size eases the evolution of specialized cells.
[Figure 4: four panels (after 200, 500, 5000 and 10 000 generations); x-axis, organism size (1 to 15); y-axis, number of simulations; P-values printed in the panels, listed here without panel assignment: 3.45e−02, 1.48e−04, 1.63e−10, 5.13e−09.]
Fig. 4 Number of simulations that evolved
organisms with (white bars) and without
(black bars) specialized cells after 200, 500,
5000 and 10 000 generations, grouped
according to the size of the organism. Large
organisms show a significant bias (see
P-values of a Pearson’s chi-squared test)
towards discovering specialized cells earlier
than small organisms.
Discussion
For multicellular organisms (Bonner, 1965, 2004; Bell &
Mooers, 1997) as well as insect (Wilson, 1971) or
human (Blau, 1974) societies, it is known that the
degree of specialization increases with the size of the
system. Large systems seem to be able to benefit more
from specialized units. Consequently, the lack of specialization in very small multicellular organisms (Bell &
Mooers, 1997) might be explained by the existence of a
minimum threshold size at which specialization becomes advantageous. It is important to emphasize that
this is not the case in this paper. Specialized cells can
increase the fitness of small and large DISCOs substantially. Nonetheless, there is a significant correlation
between organism size and the presence of specialized
cells (see Fig. 4). This correlation is based on evolutionary constraints in small organisms and not on a
minimum threshold size at which specialization becomes advantageous.
This paper demonstrates how artificial life simulations
can be used to study the evolution of complex multicellularity. Currently, the implementation of DISCOs allows
only for a sudden change of cell types which results in an
equally sudden change of the logic functions that the cell
can utilize. For biological systems, one might favour
models in which new cell types evolve more gradually
with intermediate, ‘chimeric’ types that can perform new
and old functions, but both just suboptimally due to an
inherent trade-off. Even for such models similar results
can be expected. Allowing for chimeric cell types does
not change the fact that individual cells affect the fitness
of the whole organism less in large organisms (organisms composed of many cells) than in small organisms
(organisms composed of few cells).
In general, loss of function mutations are more
frequent than gain of function mutations. It is therefore likely that many evolutionary trajectories exist
that lead to specialized cells but involve intermediate
fitness losses due to loss of function mutations. As I
have demonstrated in this paper, such mutations are
less harmful to organisms composed of many cells. In
such organisms, mutations that create chimeric cells
with (temporarily) mediocre functionality can be maintained in the population until additional mutations
accumulate that allow the new cell type to execute the
new function at its full potential. As loss of function
mutations are so frequent, an increase in organism size
can be expected to substantially increase the arsenal of
evolutionary trajectories that are available for the
evolution of specialized cells and is therefore an
important step towards the evolution of complex
multicellularity.
The insight that the size of a biological system affects its
evolutionary landscape can also be applied to other
aspects of biology. Take, for example, gene duplication. It
is commonly accepted that gene duplication accelerates
the discovery of new gene functions. After a gene
duplication, one copy of the gene can execute the old
function. The other copy can accumulate mutations
which might eventually lead to new functions (Lynch &
Conery, 2000). A corollary from this paper is that the
genome size of an organism has a crucial impact on the
organism’s ability to discover new gene functions by
means of gene duplication. For very small genomes (e.g.
the genome of a virus) a duplicated gene constitutes a
substantial fitness burden and might not be maintained
in the population long enough to adopt a new function.
For big genomes (e.g. eukaryotic genomes) a duplicate
of a single gene constitutes an insignificant additional
burden and can be maintained long enough to discover a
new role. The results from this paper regarding organism
size and the evolution of complex multicellularity can
also be applied to genome sizes and the evolution of
complex genomes.
Organism size has always been considered an important factor for the evolution of multicellularity. In fact,
benefits of increased size are thought to have promoted
the transition from unicellular to undifferentiated multicellular life (Bonner, 1965, 2001; Kirk, 1997, 2003;
King, 2004). Advantages of increased size include predator evasion (Boraas et al., 1998), increased motility
(Kirk, 2003) or increased capacity to store nutrients
(Koufopanou & Bell, 1993; Kerszberg & Wolpert, 1998).
In this work, I observe another, less obvious, benefit of
organism size. This benefit – the ability to discover new,
specialized cells via trajectories that are inaccessible to
small organisms – does not concern the fitness of the
organism itself, but its ability to evolve more complex
multicellular forms.
Acknowledgments
I am grateful to Erick Matsen for mesmerizing and
amazing discussions, Matthew Hegreness and Reinhard
Bürger for help with the manuscript, and Martin Nowak
for invaluable input. I am supported by a Merck-Wiley
fellowship. Support from the NSF/NIH joint programme
in mathematical biology (NIH grant R01GM078986) is
gratefully acknowledged. The Program for Evolutionary
Dynamics at Harvard University is sponsored by
J. Epstein.
References
Adami, C., Ofria, C. & Collier, T.C. 2000. Evolution of biological
complexity. Proc. Natl Acad. Sci. U. S. A. 97: 4463–4468.
Bell, G. & Mooers, A. 1997. Size and complexity among
multicellular organisms. Biol. J. Linn. Soc. 60: 345–363.
Blau, P. 1974. On the Nature of Organizations. Wiley, New York.
Bonner, J.T. 1965. Size and Cycle. Princeton University Press,
Princeton, NJ.
Bonner, J.T. 2001. First Signals: The Evolution of Multicellular
Development. Princeton University Press, Princeton, NJ.
Bonner, J.T. 2004. Perspective: the size–complexity rule. Evolution 58: 1883–1890.
Boraas, M., Seale, D. & Boxhorn, J. 1998. Phagotrophy by a
flagellate selects for colonial prey: a possible origin of multicellularity. Evol. Ecol. 12: 153–164.
Buss, L.W. 1988. The Evolution of Individuality. Princeton University Press, Princeton, NJ.
Chow, S.S., Wilke, C.O., Ofria, C., Lenski, R.E. & Adami, C.
2004. Adaptive radiation from resource competition in digital
organisms. Science 305: 84–86.
Darwin, C. 1859. On the Origin of Species: By Means of Natural
Selection. Murray, London.
Dawkins, R. 1999. The Extended Phenotype, revised edn. Oxford
University Press, Oxford.
Iwasa, Y., Michor, F. & Nowak, M.A. 2003. Evolutionary
dynamics of escape from biomedical intervention. Proc. Biol.
Sci. 270: 2573–2578.
Iwasa, Y., Michor, F. & Nowak, M.A. 2004. Stochastic tunnels in
evolutionary dynamics. Genetics 166: 1571–1579.
Kerszberg, L. & Wolpert, E. 1998. The origin of metazoa and the
egg: a role for cell death. J. Theor. Biol. 193: 535–537.
King, N. 2004. The unicellular ancestry of animal development.
Dev. Cell 7: 313–325.
Kirk, D.L. 1997. Volvox: A Search for the Molecular and Genetic
Origins of Multicellularity and Cellular Differentiation. Cambridge
University Press, Cambridge.
Kirk, D. 2003. Seeking the ultimate and proximate causes of
volvox multicellularity and cellular differentiation. Integr.
Comp. Biol. 43: 247–253.
Knoll, A.H. 2003. Life on a Young Planet: The First Three Billion
Years of Evolution on Earth. Princeton University Press, Princeton, NJ.
Koufopanou, V. & Bell, G. 1993. Soma and germ – an
experimental approach using volvox. Proc. R. Soc. Lond. Ser.
B-Biol. Sci. 254: 107–113.
Lenski, R.E., Ofria, C., Pennock, R.T. & Adami, C. 2003. The
evolutionary origin of complex features. Nature 423: 139–144.
Lynch, M. & Conery, J.S. 2000. The evolutionary fate and
consequences of duplicate genes. Science 290: 1151–1155.
Maynard-Smith, J. 1989. Evolutionary Progress. University of
Chicago Press, Chicago, IL, pp. 219–230.
Maynard-Smith, J. & Szathmary, E. 1997. The Major Transitions
in Evolution, reprint edn. Oxford University Press, Oxford.
Michod, R.E. & Roze, D. 2001. Cooperation and conflict in the
evolution of multicellularity. Heredity 86: 1–7.
Ofria, C. & Wilke, C.O. 2004. Avida: a software platform for
research in computational evolutionary biology. Artif. Life 10:
191–229.
Wilke, C.O., Wang, J.L., Ofria, C., Lenski, R.E. & Adami, C.
2001. Evolution of digital organisms at high mutation rates
leads to survival of the flattest. Nature 412: 331–333.
Wilson, E.O. 1971. The Insect Societies. Harvard University Press,
Cambridge.
Yedid, G. & Bell, G. 2002. Macroevolution simulated with
autonomously replicating computer programs. Nature 420:
810–812.
Supplementary Material
The following supplementary material is available for
this article:
Appendix Supporting online material.
This material is available as part of the online
article from: http://www.blackwell-synergy.com/doi/abs/10.1111/j.1420-9101.2007.01466.x
Please note: Blackwell Publishing are not responsible
for the content or functionality of any supplementary
materials supplied by the authors. Any queries (other
than missing material) should be directed to the corresponding author for the article.
Received 14 June 2007; revised 9 September 2007; accepted 5 October
2007
Supporting Online Material for
Organism Size Promotes the Evolution of Specialized
Cells in Multicellular Digital Organisms
Martin Willensdorfer
Program for Evolutionary Dynamics,
Department of Molecular and Cellular Biology,
Harvard University, Cambridge, MA 02138, USA.
E-mail: [email protected]
Contents
1 Digital Self-Replicating Cellular Organisms (DISCOs)
1.1 The Cell
1.2 Environment, Fitness, Merit, and Logic Functions
1.3 Multicellularity
2 Details about the -X and +X simulations
3 Developmental cost of one Y cell
A List of Instructions
B Example Genomes
1 Digital Self-Replicating Cellular Organisms (DISCOs)
This section provides details about DISCOs. Since I was motivated by Lenski et al. (1), DISCOs are very similar to Avidians, that is, digital organisms developed by Christoph Adami (2).
However, there are some important differences: (a) DISCOs have a cell cycle with a metabolic
and a replication phase, (b) DISCOs have a minimal and disjunct set of instructions for each
phase, (c) DISCOs can only copy their genome to a daughter strand and cannot modify their
own genome, and (d) DISCOs can evolve multicellularity. Modifications (a)–(c) make it possible to directly identify the genomic basis of a phenotypic feature and modification (d) makes it
possible to study the evolution of multicellularity.
Section 1.1 explains in detail how a single DISCO cell works. Section 1.2 describes how
fitness is realized in DISCOs and how a DISCO can increase its fitness by computing logic
functions. Finally, Section 1.3 explains aspects of multicellularity in DISCOs.
1.1 The Cell
A DISCO cell is a computing automaton with the ability to replicate. Each DISCO has a
genome, that is, a sequence of instructions and modifiers. The instructions are designed to
operate on the automaton and change its state in certain ways. This state change depends not
only on the instruction but also on modifiers in the genome. For the following, however, I will
use the term instruction to refer to an instruction and all the modifiers in the genome that affect
the action of this instruction. Appendix A contains detailed information about how modifiers
affect each instruction. In DISCOs modifiers are also used to encode multicellularity in the
genome as we will see in Section 1.3.
Naturally, the execution of a sequence of instructions leads to a sequence of state changes
of the automaton. The right sequence of instructions can cause state changes that result in the
computation of logic functions, genome replication, and, finally, cell division. Which sequence
of instructions a cell will actually execute is determined by its genome. Hence, the genome
determines the ability of a cell to compute logic functions as well as speed and accuracy of
replication, which constitute a cell's phenotype.
The life of a DISCO cell is divided into two phases, a metabolic and a replication phase.
At the onset of each phase the automaton is reset and starts to execute instructions from the
beginning of the genome. During the metabolic phase a cell can read and manipulate data to
compute logic functions. During the replication phase a cell can copy its genome and initiate
cell division. Each phase has its own set of instructions. We have a set of instructions for
the metabolic phase (blank, io, nand, swap, push, pop) and a set of instructions for the
replication phase (copy, search, mov-head, if-label, divide).
A newly created cell first enters the metabolic phase. The transition from metabolic to
replication phase happens after the execution of m metabolic instructions (m is a simulation
parameter and equals 300 in all described simulations). The replication phase, on the other
hand, ends at the successful or unsuccessful attempt to initiate cell division, that is, at the
execution of a divide instruction.
During the metabolic phase a DISCO cell can create data in the form of logic compounds.
To make the production of logic compounds possible the automaton is equipped with three
registers (A, B, and C) and one stack. The registers are the operating platform. They are used
to store and manipulate data. New logic compounds can only be generated by using the nand
instruction which manipulates data stored in the registers. The most basic logic compound is a
logic variable and every logic compound is composed of logic variables and logic operators that
connect these variables. Logic variables are supplied by the io instruction and stored in one
of the three registers. By loading a new variable to a register, the io instruction might have to
replace a logic compound that is stored in this register. Whenever a logic compound is replaced,
the io instruction checks whether this compound is equivalent to one of nine logic functions.
If this is the case, then the DISCO will be rewarded (see next section). Data can also be copied
from one of the registers to the stack with the push instruction and recovered with the pop
instruction. The DISCO reads the metabolic instructions one by one from the genome. If the
end of the genome is reached, the DISCO will continue from the beginning, that is, the genome
is processed circularly.
One might wonder how those activities relate to a biological metabolism. The analogy
becomes quite obvious if one considers autotrophic processes like carbon fixation in plants.
Plants convert carbon dioxide into organic compounds. The first stable intermediate is a 3-carbon compound. These trioses can be condensed into hexoses, sucrose, or cellulose. They
can also be used to make amino acids and lipids. Hence, more complex organic compounds are
assembled from simpler ones and all those compounds contribute to the fitness of the organism.
The situation is analogous in DISCOs. However, a DISCO handles logic compounds instead of
organic ones. Logic variables are assembled with the nand instruction to form more complex
logic compounds and, as with real organisms, certain compounds can contribute considerably
to the fitness of the organism.
The metabolic phase is followed by a replication phase. During the replication phase a
DISCO can copy its genome and initiate cell division. The copy instruction instructs the
automaton to copy a genome element to the daughter strand. Which element of the genome
is copied depends on the position of the so-called read head. Heads are pointers that the
automaton can use to mark elements of the genome. Besides the read head the automaton has
an instruction head and a flow head. The read head shows the automaton which element
of the genome has to be copied next. It moves forward to the next element after each copy
event. The instruction head shows the automaton which instruction has to be executed next and
also moves forward after the instruction has been executed. The flow head is used to mark
positions in the genome to which either the instruction head or the read head can be moved by the
mov-head instruction. Together with the if-label instruction, this makes it possible to encode jumps
and loops in the genome. Cell division is initiated by the divide instruction. The execution
of a divide instruction will always lead to a switch from the replication phase to the metabolic
phase, regardless of whether it was successful or not.
Appendix A describes precisely how each instruction changes the state of the automaton
and how modifiers can affect this state change. Appendix B contains some example genomes
and illustrates how a sequence of instructions can encode logic functions and cell replication.
1.2 Environment, Fitness, Merit, and Logic Functions
DISCOs live in an unstructured environment that can accommodate a given number of organisms. In this environment DISCOs compete for the opportunity to execute instructions. For each
iteration only one DISCO is selected and the selected DISCO can execute only one instruction.
Most of the time the execution of an instruction will only affect the DISCO internally. At some
point, however, a DISCO will produce offspring. The offspring is first exposed to replacement
(copy), insertion, and deletion mutations and then placed at a randomly chosen position in the
environment. In most cases this will result in the replacement (death) of another DISCO.
The frequency with which DISCOs are chosen to execute an instruction is proportional to
their merit. Consequently, the more merit a DISCO has, the more frequently it will be chosen
to execute an instruction. Obviously, if the number of instructions that have to be executed until
a cell divides is the same for two DISCOs, then the one with the greater merit will have the
higher fitness. On the other hand, if two DISCOs have the same merit, then the one that has to
execute fewer instructions until cell division will have the higher fitness. We see that there are
two components that determine the fitness of a DISCO: (a) its merit (relative to the merit of the
other organisms) and (b) the speed of replication (i.e., the number of instructions that have to
be executed until a cell divides).
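The two fitness components can be illustrated with a minimal sketch. It assumes merit-proportional sampling via Python's `random.choices`; the function names `pick_organism` and `relative_rate` and the example merits are hypothetical and are not part of the DISCO implementation.

```python
import random

def pick_organism(merits, rng):
    """Return the index of the organism that executes the next instruction.

    The chance of being picked is proportional to merit, as described above.
    """
    return rng.choices(range(len(merits)), weights=merits, k=1)[0]

def relative_rate(merit, total_merit, instructions_per_division):
    """Expected divisions per iteration (hypothetical helper).

    Component (a): probability of being picked, merit / total_merit.
    Component (b): number of instructions needed until the cell divides.
    """
    return (merit / total_merit) / instructions_per_division

# Illustrative population of two organisms with merits 1 and 4.
rng = random.Random(1)
merits = [1.0, 4.0]
counts = [0, 0]
for _ in range(10_000):
    counts[pick_organism(merits, rng)] += 1
```

Under these assumptions, the organism with the larger merit is picked more often, and at equal merit the organism that needs fewer instructions per division has the larger `relative_rate`.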
As mentioned above, a DISCO can manipulate data to create logic compounds. By creating
compounds that are equivalent to specific logic functions, a DISCO can increase its merit and
consequently its fitness. A DISCO can only generate logic compounds by applying the nand
instruction. The nand instruction connects two data elements (a data element is either a logic
variable or a logic compound) with the NAND (“Not And”) operation. The NAND operator
(. | .) is defined as the negation (¬ .) of the conjunction (. ∧ .), that is, (a|b) = ¬(a ∧ b). The
resulting logic compound is false if and only if both a and b are true. Formal logic shows
that every truth-functional compound can be expressed by using just the NAND operator (3).
For example, we can write ¬a = a|a and a ∧ b = (a|b)|(a|b). Since a DISCO can apply the
NAND operation, it can also produce a multitude of logic compounds. With the io instruction
a DISCO can test whether a data element is equivalent to one of up to nine logic functions. If
so, then the DISCO is rewarded with a merit increase. By how much the merit is increased
depends on the complexity of the logic function (see below and Section 2). Hence, a DISCO
receives a reward for the construction of meaningful logic compounds. Selection will therefore
favor DISCOs that can construct such logic compounds.
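The NAND identities quoted above can be checked bit-wise on 64-bit integers. The following is a small sketch; the helper name `nand`, the mask, and the random seed are illustrative assumptions. The OR identity is not given in the text but follows from the same rules.

```python
import random

MASK = (1 << 64) - 1  # keep every result within 64 bits

def nand(a, b):
    """Bit-wise NAND: the negation of the conjunction of a and b."""
    return ~(a & b) & MASK

rng = random.Random(0)
a, b = rng.getrandbits(64), rng.getrandbits(64)

not_a = nand(a, a)                      # ¬a = a|a
and_ab = nand(nand(a, b), nand(a, b))   # a ∧ b = (a|b)|(a|b)
or_ab = nand(nand(a, a), nand(b, b))    # a ∨ b = (a|a)|(b|b)
```

The three constructions use one, two, and three NAND operations, respectively.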
For simplicity, the merit of an organism remains unchanged until it produces offspring.
Each time a DISCO produces new offspring its merit is newly calculated according to the logic
compounds it was able to create. One has to keep in mind that newly born DISCOs have not yet
had the chance to construct any logic compounds. To avoid disadvantages for newborns, they
start off with the parental merit until they reach maturity and produce their own offspring.
The table below lists the nine logic operators that a DISCO can compute to change its merit.
During a simulation logic operators are identified by calculating the truth value of the expression
in the second column, where C is a logic compound that a DISCO generated. The truth value
of the expression in column two is calculated by using randomly generated 64-bit integers as
instances for the logic variables a and b. The NAND operation as well as the logic operations
are applied bit-wise. An example is given below.
It is convenient to summarize which logic functions the genome of a DISCO can compute
by using a nine digit binary code. For example a 000000000 tells us that the genome does not
encode for any logic function; a 101101000 that the genome can compute NOT, AND, OR N,
and AND N, and a 111111011 that the genome can compute all nine logic functions except for
XOR.
logic operator   definition (see footnote 1)                   minimum number of NAND required
NOT              C(a) ≡ ¬a                                     1
NAND             C(a, b) ≡ ¬(a ∧ b)                            1
AND              C(a, b) ≡ a ∧ b                               2
OR N             C(a, b) ≡ a ∨ ¬b ∨ C(a, b) ≡ ¬a ∨ b           2
OR               C(a, b) ≡ a ∨ b                               3
AND N            C(a, b) ≡ a ∧ ¬b ∨ C(a, b) ≡ ¬a ∧ b           3
XOR              C(a, b) ≡ (a ∨ b) ∧ ¬(a ∧ b)                  4
NOR              C(a, b) ≡ ¬(a ∨ b)                            4
EQU              C(a, b) ≡ (a ∧ b) ∨ (¬a ∧ ¬b)                 5
a       : 110101110001110011111001100010001000001000. . .   (64-bit integer)
b       : 000100101101101001011100000111110000101000. . .   (64-bit integer)
C(a, b) : 000100100001100001011000000010000000001000. . .   (64-bit integer)
We have C(a, b) ≡ a ∧ b and conclude that compound C(a, b) encodes for AND.
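The identification step and the nine-digit binary code can be sketched as follows. This is an illustration only: the names `identify` and `binary_code` are hypothetical, and the example uses only the leading 42 bits printed above, since the remaining bits are elided in the source.

```python
# Order of the nine logic functions in the binary code.
ORDER = ["NOT", "NAND", "AND", "OR N", "OR", "AND N", "XOR", "NOR", "EQU"]

def identify(c, a, b, width=64):
    """Return the names of all logic functions that compound c matches bit-wise."""
    mask = (1 << width) - 1
    inv = lambda x: ~x & mask  # bit-wise negation within `width` bits
    patterns = {
        "NOT": [inv(a)],
        "NAND": [inv(a & b)],
        "AND": [a & b],
        "OR N": [a | inv(b), inv(a) | b],
        "OR": [a | b],
        "AND N": [a & inv(b), inv(a) & b],
        "XOR": [(a | b) & inv(a & b)],
        "NOR": [inv(a | b)],
        "EQU": [(a & b) | (inv(a) & inv(b))],
    }
    return [name for name in ORDER if c in patterns[name]]

def binary_code(names):
    """Nine-digit code with a 1 for every function in `names`."""
    return "".join("1" if name in names else "0" for name in ORDER)

# Leading bits of a, b, and C(a, b) as printed in the example.
A_BITS = "110101110001110011111001100010001000001000"
B_BITS = "000100101101101001011100000111110000101000"
C_BITS = "000100100001100001011000000010000000001000"
found = identify(int(C_BITS, 2), int(A_BITS, 2), int(B_BITS, 2), width=len(A_BITS))
```

With random inputs, two different functions can agree by chance on every bit position, but for 64 random bits this is very unlikely, which is presumably why wide integers are used.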
1.3 Multicellularity
So far I have described how the genome of a DISCO determines the phenotype of a single cell.
The genome can also encode information about the development of a multicellular organism.
Footnote 1: ¬, ∧, ∨, and ≡ symbolize the logical negation, conjunction (and), disjunction (or), and equivalence, respectively.
A DISCO's life always starts with a single cell, the default (D) cell. Each organism has exactly
one D cell and the D cell is the only reproductive cell. After the first cell division the D cell has
two options. It can either release the daughter cell into the environment as offspring and remain
unicellular, or retain the cell as a first step towards the development of a multicellular organism.
If, how many, and what kind of cells are retained is encoded in the genome.
To understand how multicellularity is encoded, we have to remind ourselves that the genome
is a sequence of instructions and modifiers. Among the set of instructions, the divide instruction is special because it is the only one that is not affected by modifiers. To keep things as
simple as possible, I decided to exploit this feature of the divide instruction. In particular,
the D cell will retain one cell for each divide instruction in the genome that is followed by a
modifier. Depending on the kind of modifier the retained daughter cell is assigned to a somatic
cell type. For example, a ‘divide|A’ encodes an X cell, whereas a ‘divide|B’ and a
‘divide|C’ encode a Y and a Z cell, respectively. (Z cells are not relevant for this work
and just mentioned for completeness.) After a daughter cell has been retained for each such
divide instruction, the D cell is able to release daughter cells as offspring into the environment (see Figure 1 for an example). Please note that the genome might not contain any divide
instruction that is followed by a modifier and would therefore encode for a unicellular organism
(see Appendix B for examples).
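The encoding rule can be expressed as a short sketch that reads the somatic cell types from a genome written in the `|`-separated notation of Appendix B. The function name `somatic_cells` is hypothetical, and the second and third genomes in the usage lines are made-up illustrations rather than evolved genomes.

```python
MODIFIERS = {"A", "B", "C"}
CELL_TYPE = {"A": "X", "B": "Y", "C": "Z"}  # divide|A -> X, divide|B -> Y, divide|C -> Z

def somatic_cells(genome):
    """List the somatic cells encoded by divide instructions followed by a modifier."""
    tokens = genome.split("|")
    cells = []
    for token, following in zip(tokens, tokens[1:]):
        if token == "divide" and following in MODIFIERS:
            cells.append(CELL_TYPE[following])
    return cells

# The ancestral replication sequence: divide is followed by an instruction.
unicellular = somatic_cells("search|copy|if-label|C|A|divide|mov-head|A|B")
# Hypothetical variants in which divide is followed by a modifier.
one_y = somatic_cells("search|copy|if-label|C|A|divide|B|mov-head|A|B")
two_x_one_y = somatic_cells("divide|A|divide|A|divide|B|search|copy|divide")
```

An empty list corresponds to a unicellular organism.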
Most multicellular organisms have specialized cells. Even though the specialized cells of
a multicellular organism contain in most cases the same genome as the replicative cells, they
behave differently. Specialized cells in DISCOs work analogously. The genome of a cell might
be able to compute all logic functions, but cells can only utilize logic functions according to
their cell type. For this work, D and X cells can only benefit from the first six logic functions
and Y cells only from the last three functions that might be encoded in the genome (see Section
2).
It is important to point out that, even though a DISCO can be composed of several cells,
each cell is still an independent automaton that executes one instruction after another. In fact,
whenever a multicellular DISCO is selected by the environment to execute an instruction, each
cell of this organism will execute one instruction.
2 Details about the -X and +X simulations
This section describes the -X and the +X simulations in more detail. For computational reasons
I limited the population size to 200 (uni- or multicellular) DISCOs. I conducted 500 runs for
each type of simulation, which differed only with respect to the seed for the random number
generator. Both types of simulations have two parts: (a) an initial 10 000 generations ($2 \times 10^6$
replication events) in which specialized cells are not beneficial and (b) a further 10 000 generations in which specialized cells are advantageous.
For the first 10 000 generations, each simulation was initiated with the same genome. This
genome contains 141 ‘blank’ instructions (essentially a place holder, see Appendix A) and a
sequence of 9 instructions, ‘search|copy|if-label|C|A|divide|mov-head|A|B’,
which encodes cell replication (see Appendix B). Thus, the ancestral DISCO could not compute any logic function and was unicellular. The initial genome has length 150. During all
simulations, genome length was restricted to [145, 155], that is, offspring was nonviable if its
genome was smaller than 145 or larger than 155. After the first 10 000 generations I determined
the most recent common ancestor (MRCA) of the population. The genome of the MRCA was
then used to seed the population for the next 10 000 generations. That is, the second part of a
run was initialized with a genome that was produced during the first part of this run.
During the simulations, offspring was exposed to copy, insertion, and deletion mutations.
For each offspring, the number of copy, insertion, and deletion mutations is chosen from a
Poisson distribution with mean 0.45, 0.025, and 0.025, respectively. An average mutation rate
of 0.5 instructions per replication (≈ $3.4 \times 10^{-3}$ mutations per instruction per replication) was
chosen because it seemed to be optimal for the evolution of Y cell specific logic functions.
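As an illustration of the mutation scheme, the sketch below draws the three mutation counts for one offspring and applies the genome length restriction. It is not the simulation code: the Poisson sampler (Knuth's multiplication method), the function names, and the seed are assumptions made for this example.

```python
import math
import random

# Mean number of mutations per offspring, as stated in the text.
MEANS = {"copy": 0.45, "insertion": 0.025, "deletion": 0.025}
MIN_LEN, MAX_LEN = 145, 155  # allowed genome lengths

def poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def mutation_counts(rng):
    """Number of copy, insertion, and deletion mutations for one offspring."""
    return {kind: poisson(mean, rng) for kind, mean in MEANS.items()}

def is_viable(genome_length):
    """Offspring is nonviable if its genome is shorter than 145 or longer than 155."""
    return MIN_LEN <= genome_length <= MAX_LEN

rng = random.Random(42)
samples = [sum(mutation_counts(rng).values()) for _ in range(20_000)]
mean_total = sum(samples) / len(samples)
```

Averaged over many offspring, `mean_total` should be close to the stated 0.5 mutations per replication.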
The merit of an organism is calculated based on the merit of its cells. The merit of a cell
in turn is calculated based on the logic functions it can compute and utilize. A cell that cannot
compute any logic function has merit 1. The second column of Table 1 shows how the merit
of a cell changes if it is able to compute (and utilize) the corresponding logic function. The
values were taken from Lenski et al. (1) and reflect the complexity of the respective function.
In particular, the merit of a cell is multiplied by $2^n$ if the cell is able to compute a logic function
of complexity n, where n gives the minimum number of NAND operations that are required
to construct the logic function [see (1) and Section 1.2]. As mentioned before, not every cell
is able to utilize every logic function. Which kind of cell can utilize which kind of functions
during the first and the second part of the +X and the -X simulations is shown in Table 1.
Finally, the merit of the multicellular organism has to be calculated based on the merit of
each cell. In short, the merit of an organism during the +X simulations is given by SUM(X,D)*SET(Y)
and during the -X simulations by SET(X,D)*SET(Y). The expression SUM(X,D) denotes
the sum of merits of all D and X cells. Hence, during the +X simulations D and X cells contribute linearly to the merit of the organism. SET(.) is equal to the merit of a cell that
can compute the set of functions that all the cells in the argument can compute. For example,
SET(X,D) equals the merit of a cell that can compute the same set of functions that all X and D
cells can compute. Since X and D cells can encode and utilize the same functions, SET(X,D)
is essentially equal to the merit of one D cell. Hence, additional X cells cannot contribute to the
merit of the organism. Similarly, SET(Y) equals the merit of a cell that can compute the set of
functions that all Y cells can compute. Again, since one Y cell can compute the set of functions
that all Y cells can compute, SET(Y) is equal to the merit of one Y cell.
The use of SUM(X,D)*SET(Y) for the +X simulations was motivated by specialized cells
in cyanobacteria. X and D cells are functionally equivalent and contribute additively to the
merit of the organism: Two photosynthesizing vegetative cells can fix approximately twice as
much carbon as one photosynthesizing cell. Y cells, however, are specialized cells that amplify
the activity of X and D cells (by providing nitrogen, for example). This is reflected in the
multiplicative contribution of Y cells to the merit of the organism. I use SET(Y) instead of
SUM(Y) because DISCOs are thought to be small enough so that one Y cell can amplify the
merit of X and D cells as well as two Y cells.
Figure 1 shows the first five cell divisions in the life of a multicellular DISCO composed of
two X cells and one Y cell. It also shows what the merit of this organism would be during the
first and the second part of the +X and -X simulations.
                                   cell types that can utilize the given function
                                   first 10 000 generations    second 10 000 generations
logic function   change in merit   +X        -X                +X        -X
NOT              ×2^1              D,X       D                 D,X       D
NAND             ×2^1              D,X       D                 D,X       D
AND              ×2^2              D,X       D                 D,X       D
OR N             ×2^2              D,X       D                 D,X       D
OR               ×2^3              D,X       D                 D,X       D
AND N            ×2^3              D,X       D                 D,X       D
XOR              ×2^4              -         -                 Y         Y
NOR              ×2^4              -         -                 Y         Y
EQU              ×2^5              -         -                 Y         Y
Table 1: Merit increase and cell type specificity during the +X and -X simulations. Please note
that X cells can utilize functions during the -X simulations but are not able to contribute to the
merit of the organism, since I am using SET(D,X)*SET(Y) instead of SUM(D,X)*SET(Y).
Figure 1: Multicellularity in DISCOs. (a) The first five cell divisions of a DISCO with a genome
that encodes two X cells and one Y cell. As explained in Section 1.3, if the genome contains
divide instructions that are followed by a modifier, then the D cell will retain daughter cells
to build a multicellular organism. The genome of this DISCO contains two divide|A and
one divide|B instructions. Consequently, the first three cell divisions are used to produce
two X cells and one Y cell. Thereafter every further division results in cells that are released
into the environment as offspring. (b) Calculating the merit of a multicellular organism. Let us
assume that the genome of this DISCO encodes for AND, OR, AND N, XOR, and EQU, that is,
001011101. A logic function can increase a cell's merit only if it is encoded in the genome and
utilizable by the given cell type. Here, for example, the D cell can compute and utilize AND,
OR, and AND N (001011000) and has therefore a merit of $1 \times 2^2 \times 2^3 \times 2^3 = 2^8 = 256$.
For the first 10 000 generations, Y cells cannot utilize any logic function and have therefore
always merit 1. For the second part, however, Y cells can utilize the last three logic functions
(000000111). Since the genome of this organism encodes XOR and EQU, Y cells receive a
merit of $1 \times 2^4 \times 2^5 = 2^9 = 512$ for 000000101. The -X simulations differ from the +X
simulations in that X cells cannot contribute to the merit of the organism. I use SET(D,X)
instead of SUM(D,X) and one D cell can compute the same set of functions as D and X cells
together. We can clearly see that some somatic cells do not increase the merit of the DISCO
(X cells during the -X simulations and Y cells during the first 10 000 generations). However,
somatic cells constitute a cost, since they delay the time it takes to reach maturity and finally
produce offspring. Hence, somatic cells that do not increase the merit of the organism are
disadvantageous.
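The merit rules of Section 2 can be sketched for the organism of Figure 1 (genome code 001011101, two X cells, one Y cell). The function names and argument layout are hypothetical. The sketch also assumes that SET(Y) equals 1 when the organism has no Y cell, which the text does not state explicitly.

```python
# Minimum number of NAND operations for NOT, NAND, AND, OR N, OR, AND N, XOR, NOR, EQU.
COMPLEXITY = [1, 1, 2, 2, 3, 3, 4, 4, 5]

def cell_merit(genome_code, usable_code):
    """Merit of one cell: times 2^n for every function that is encoded and utilizable."""
    merit = 1
    for encoded, usable, n in zip(genome_code, usable_code, COMPLEXITY):
        if encoded == "1" and usable == "1":
            merit *= 2 ** n
    return merit

def organism_merit(genome_code, n_x, has_y, plus_x, second_part):
    """SUM(X,D)*SET(Y) for +X simulations, SET(X,D)*SET(Y) for -X simulations."""
    d_merit = cell_merit(genome_code, "111111000")        # D and X cells: first six functions
    y_usable = "000000111" if second_part else "000000000"
    set_y = cell_merit(genome_code, y_usable) if has_y else 1  # assumption: 1 without Y cell
    dx_term = d_merit * (1 + n_x) if plus_x else d_merit
    return dx_term * set_y

d_cell = cell_merit("001011101", "111111000")
y_cell = cell_merit("001011101", "000000111")
plus_x_second = organism_merit("001011101", n_x=2, has_y=True, plus_x=True, second_part=True)
minus_x_second = organism_merit("001011101", n_x=2, has_y=True, plus_x=False, second_part=True)
```

Here `d_cell` and `y_cell` reproduce the values 256 and 512 given in the caption. The organism-level values follow from the two formulas under the stated assumptions.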
3 Developmental cost of one Y cell
I will calculate the fitness of an organism composed of n + 1 cells relative to the fitness of an
organism composed of n cells. Let us first consider n = 1. The unicellular organism produces
offspring with every cell division at rate $r_1$, i.e., (1 cell) → (1 cell) + (1 cell). Its fitness is given by the rate
of cell division. The bicellular organism produces offspring only after it reached maturity. In
DISCOs (see Figure 1a) only one cell is able to divide. Hence, we have (1 cell) → (2 cells) → (2 cells) + (1 cell).
If $x_1$ and $x_2$ denote the frequency of the bicellular organisms in the unicellular and the bicellular
stage of development, respectively, and $r_2$ the rate of cell division, then we can use the following
differential equation to describe the population,
\[
\begin{aligned}
\dot{x}_1 &= -r_2 x_1 + r_2 x_2 - \Phi x_1, \\
\dot{x}_2 &= r_2 x_1 - \Phi x_2.
\end{aligned}
\tag{1}
\]
The fitness of the bicellular organism is given by the average fitness $\Phi$ at equilibrium, which is
given by the largest eigenvalue of
\[
\begin{pmatrix}
-r_2 & r_2 \\
r_2 & 0
\end{pmatrix}.
\tag{2}
\]
A short calculation shows that the eigenvalues, $\lambda$, are given by the solutions of $\lambda(\lambda + r_2) - r_2^2 = 0$.
Equivalently, we can solve $\lambda(\lambda + 1) - 1 = 0$ and multiply the solution with $r_2$. In any case, the
fitness of the bicellular organism at equilibrium is given by $\Phi = \frac{\sqrt{5} - 1}{2}\, r_2 \approx 0.62\, r_2$.
Hence the fitness of the bicellular organism relative to the fitness of the unicellular organism
equals $0.62\, r_2 / r_1$. In DISCOs the rate of cell division is proportional to the merit of the organism
and the efficiency of genome duplication. Since we are just interested in the effect of mutations
that add an additional cell to an organism, we have $r_1 \approx r_2$. Hence, mutations that turn a
unicellular into a bicellular organism decrease the relative fitness to 0.62.
The calculations for an organism of size n are similar. To calculate the fitness, $\Phi_n$, of an
organism of size n, we have to calculate the largest eigenvalue of the following n × n matrix,
\[
\begin{pmatrix}
-1 & 0 & \cdots & 0 & 1 \\
1 & -1 & \ddots & & 0 \\
0 & 1 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & -1 & 0 \\
0 & \cdots & 0 & 1 & 0
\end{pmatrix}.
\tag{3}
\]
The eigenvalues $\lambda$ are given by the roots of $\lambda(\lambda + 1)^{n-1} - 1$. The fitness of an organism of
size n + 1 relative to an organism of size n is then given by $\Phi_{n+1}/\Phi_n$. For example, the cost
of one additional, unused cell in an organism of size 10 is $\Phi_{11}/\Phi_{10} = 0.184/0.197 = 0.934$.
The dotted line in Figure 2 shows $\Phi_n$ and the solid line $\Phi_{n+1}/\Phi_n$ for a wide range of organism
sizes.
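The fitness values can be approximated numerically. The sketch below finds the positive root of λ(λ + 1)^(n−1) − 1 by bisection instead of computing matrix eigenvalues. This is a substitute technique chosen for brevity: the left-hand side increases monotonically for λ > 0, so the positive root is unique. The function name `phi` and the tolerance are illustrative.

```python
def phi(n, tol=1e-12):
    """Positive root of lam * (lam + 1)**(n - 1) - 1 = 0, found by bisection."""
    def f(lam):
        return lam * (lam + 1) ** (n - 1) - 1

    lo, hi = 0.0, 1.0  # f(0) = -1 < 0 and f(1) = 2**(n - 1) - 1 >= 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

phi_2 = phi(2)                # bicellular organism, in units of r2
cost_10 = phi(11) / phi(10)   # cost of one unused cell at size 10
```

For n = 2 this gives the value (√5 − 1)/2 derived above, and `cost_10` comes out close to the ratio 0.934 quoted in the text, which is computed there from the rounded values 0.184 and 0.197.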
[Figure 2 plot: relative fitness (vertical axis, 0.0 to 1.0) versus organism size (horizontal axis, ticks at 1, 5, 10, 50, 100, 500).]
Figure 2: Developmental cost of an additional, unused cell. The dotted line shows the fitness of
an organism of size n relative to the fitness of a unicellular organism. The solid line shows the
fitness of an organism of size n + 1 relative to the fitness of an organism of size n. Hence, it
shows how deleterious mutations are that add one, unused Y cell to an organism of size n.
References and Notes
1. Lenski R.E., Ofria C., Pennock R.T., & Adami C. (2003) Nature 423, 139–44.
2. Ofria C. & Wilke C.O. (2004) Artif Life 10, 191–229.
3. Goldfarb W. (2003) Deductive Logic (Hackett Publishing Company).
A List of Instructions
This section describes how instructions together with modifiers change the state of a DISCO cell. In the following
inst will refer to any of the available instructions, mod will refer to a sequence of modifiers, and C(mod) to
its complement. We have the three modifiers A, B, and C with their complements B, C, and A, respectively. The
complement of a sequence of modifiers is given by the sequence of complements, for example, C(CACABB) =
ABABCC.
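The complement rule is simple enough to state as a two-line sketch. The name `complement` is chosen here for illustration.

```python
# Complements of the three modifiers: A -> B, B -> C, C -> A.
COMPLEMENT = {"A": "B", "B": "C", "C": "A"}

def complement(mods):
    """Complement of a modifier sequence, taken element by element."""
    return "".join(COMPLEMENT[m] for m in mods)

example = complement("CACABB")
```

The same rule gives A|B as the complement of C|A, which is the label used by the replication loop in Appendix B.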
Instructions for the replication phase
The copy instruction: The copy instruction copies the instruction/modifiers to which the read head points to the
daughter strand and moves the read head to the instruction/modifier following the just copied genome elements.
The search instruction: The search instruction repositions the flow head depending on modifier sequences
in the genome. We have to distinguish the following cases.
• The search instruction is followed by another instruction, i.e., we have . . . search inst . . . : In this
case the flow head is moved to point at inst.
• The search instruction is followed by a sequence of modifiers, i.e., we have . . . search mod . . . :
In this case we look for the complement of the modifier sequence C(mod) in the genome. If this sequence
can be found, then the flow head is moved to point at the instruction following C(mod). If this sequence
cannot be found, then the flow head is moved to point to the instruction following mod.
The mov-head instruction: The mov-head moves either the instruction head or the read head to the position
of the flow head. The instruction head is moved to the flow head if the DISCO cell executes a mov-head that
is followed by modifier A, i.e., mov-head A. If the mov-head is followed by modifier B, i.e., mov-head B,
then the read head is moved to the flow head. In all other cases, the mov-head instruction does not affect the state
of the automaton.
The if-label instruction: The if-label instruction can be used to skip instructions depending on the most
recently copied genome element. Let us consider the sequence if-label mod inst. The inst instruction
will be executed only if the most recently copied genome element is the complement of mod. If this is not the case,
then inst will be ignored and the instruction head moved forward to the next instruction. In case of if-label
inst, the automaton will execute inst if and only if the most recently copied genome element is an instruction.
The divide instruction: The divide instruction initiates cell division and terminates the replication phase.
The divide instruction will successfully generate a new DISCO cell if the genome has the right length. For this
work the genome cannot be smaller than 145 or larger than 155 modifiers and instructions long. The generated
daughter cell is either used as a somatic cell or released into the environment as offspring.
Instructions for the metabolic phase
The state changes for the metabolic phase are more straightforward than the ones for the replication phase. They
only affect the merit of the organism and data stored in the registers and the stack. In the following I will use s_i, a,
b, and c to denote logic compounds and x to denote a newly generated logic variable. The first column of the following
tables shows the state of the automaton before the sequence in the second column is executed. The third column
shows the state of the automaton after the execution of the sequence. Not shown is the advance of the instruction
head to the next instruction.
The blank instruction: The blank instruction does not change the state of the automaton. It is essentially just
used as a place holder in ancestral genomes. Usually, the blank instruction is not part of the pool of instructions
from which instructions for the copy and insertion mutations are chosen. Consequently, blank instructions will
eventually disappear from the genome.
The io instruction: The merit of the DISCO before the execution of the io instruction is given by m. The merit
of the DISCO, after testing a, b, and c for logic functions is given by ma , mb , and mc , respectively. A DISCO cell
is rewarded only once for a given logic function during one metabolic phase.
before                  instruction   after
merit:m A:a B:b C:c     io A          merit:ma A:x B:b C:c
merit:m A:a B:b C:c     io B          merit:mb A:a B:x C:c
merit:m A:a B:b C:c     io C          merit:mc A:a B:b C:x
merit:m A:a B:b C:c     io inst       merit:mb A:a B:x C:c
The nand instruction: Please note that the nand instruction is the only instruction that can generate new logic
compounds.
before          instruction   after
A:a B:b C:c     nand A        A:a|b B:b C:c
A:a B:b C:c     nand B        A:a B:b|c C:c
A:a B:b C:c     nand C        A:a B:b C:c|a
A:a B:b C:c     nand inst     A:a B:b|c C:c
The swap instruction:
before          instruction   after
A:a B:b C:c     swap A        A:b B:a C:c
A:a B:b C:c     swap B        A:a B:c C:b
A:a B:b C:c     swap C        A:c B:b C:a
A:a B:b C:c     swap inst     A:a B:c C:b
The push instruction:
before                                 instruction   after
A:a B:b C:c stack: s1 , s2 , . . .     push A        stack: a, s1 , s2 , s3 , . . .
A:a B:b C:c stack: s1 , s2 , . . .     push B        stack: b, s1 , s2 , s3 , . . .
A:a B:b C:c stack: s1 , s2 , . . .     push C        stack: c, s1 , s2 , s3 , . . .
A:a B:b C:c stack: s1 , s2 , . . .     push inst     stack: b, s1 , s2 , s3 , . . .
The pop instruction:
before                                 instruction   after
A:a B:b C:c stack: s1 , s2 , . . .     pop A         A:s1 B:b C:c stack: s2 , s3 , . . .
A:a B:b C:c stack: s1 , s2 , . . .     pop B         A:a B:s1 C:c stack: s2 , s3 , . . .
A:a B:b C:c stack: s1 , s2 , . . .     pop C         A:a B:b C:s1 stack: s2 , s3 , . . .
A:a B:b C:c stack: s1 , s2 , . . .     pop inst      A:a B:s1 C:c stack: s2 , s3 , . . .
B Example Genomes
Encoding Replication
The following shows a sequence of the state changes that lead to genome replication. During the replication phase,
state changes affect only the position of the read, flow, and instruction head. These positions are indicated by
underlines, overlines, and bold font, respectively, i.e., read head, flow head, and instruction head.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
daughter strand:
search moves the flow head to the copy instruction
• io|search|copy|if-label|C|A|divide|mov-head|A|B
daughter strand:
copy copies the io instruction to the daughter strand and moves the read head forward
• io|search|copy|if-label|C|A|divide|mov-head|A|B
daughter strand: io
The complement of C|A is given by A|B. Since the most recently copied element is io and not A|B,
if-label ignores divide and advances the instruction head to the mov-head instruction.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
daughter strand: io
mov-head moves the instruction head to the position of the flow head.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
daughter strand: io
copy copies the search instruction to the daughter strand and moves the read head forward.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
daughter strand: io|search
This loop continues until the automaton copies A|B, the last genome element.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
daughter strand: io|search|copy|if-label|C|A|divide|mov-head
copy copies A|B to the daughter strand and moves the read head forward.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
daughter strand: io|search|copy|if-label|C|A|divide|mov-head|A|B
Now the most recently copied genome element is identical to the complement of C|A, and if-label does
not skip the divide instruction.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
daughter strand: io|search|copy|if-label|C|A|divide|mov-head|A|B
The divide instruction ends the replication phase and the daughter strand is used to build a new cell.
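The copy loop described above can be traced with a small Python sketch. It is a simplified illustration under stated assumptions, not the automaton used in the simulations: the function name `replicate`, the cyclic complement map (chosen so that the complement of C|A is A|B), and the treatment of every |-separated token as one genome element are assumptions made here. Instructions that play no role in replication are simply skipped.

```python
# Simplified, hypothetical model of the replication loop walked through above.

NOPS = ("A", "B", "C")
COMPLEMENT = {"A": "B", "B": "C", "C": "A"}  # assumed: C|A -> A|B


def replicate(genome, max_steps=10_000):
    """Run the genome until divide is executed and return the daughter strand."""
    daughter = []
    ip = 0    # instruction head
    read = 0  # read head
    flow = 0  # flow head
    for _ in range(max_steps):
        inst = genome[ip]
        if inst == "search":
            # Assumed: the flow head is placed on the element after search.
            flow = ip + 1
            ip += 1
        elif inst == "copy":
            # Copy the element under the read head and advance the read head.
            daughter.append(genome[read])
            read += 1
            ip += 1
        elif inst == "if-label":
            # Collect the label that follows if-label.
            j = ip + 1
            label = []
            while j < len(genome) and genome[j] in NOPS:
                label.append(genome[j])
                j += 1
            wanted = [COMPLEMENT[x] for x in label]
            matched = len(wanted) > 0 and daughter[-len(wanted):] == wanted
            # Execute the next instruction only if the complement was just copied.
            ip = j if matched else j + 1
        elif inst == "mov-head":
            # Jump back to the flow head, which closes the copy loop.
            ip = flow
        elif inst == "divide":
            return daughter
        else:
            # io, nand, push, pop and stray register arguments: ignored here.
            ip += 1
    raise RuntimeError("divide was not reached")


genome = "io|search|copy|if-label|C|A|divide|mov-head|A|B".split("|")
print("|".join(replicate(genome)))
```

In this model divide is reached only after the final A|B has been copied, so the returned daughter strand equals the parent genome.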
Encoding NAND and AND
This sequence creates two logic compounds. The first one (a|b)|(a|b) is equivalent to AND and the second one a|b
is equivalent to NAND, where a and b are logic variables.
io|io|C|nand|push|pop|C|nand|io|io|C
executed   state
(start)    A:  B:             C:     stack:     merit = 1
io         A:  B:a            C:     stack:     merit = 1
io|C       A:  B:a            C:b    stack:     merit = 1
nand       A:  B:a|b          C:b    stack:     merit = 1
push       A:  B:a|b          C:b    stack:a|b  merit = 1
pop|C      A:  B:a|b          C:a|b  stack:     merit = 1
nand       A:  B:(a|b)|(a|b)  C:a|b  stack:     merit = 1
io         A:  B:c            C:a|b  stack:     merit = 1 × 2^2
io|C       A:  B:c            C:d    stack:     merit = 1 × 2^2 × 2^1
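The claim that (a|b)|(a|b) is equivalent to AND and a|b to NAND can be checked with a truth table. The Python sketch below is an illustration only; the helper names are hypothetical. The merit line assumes that the last two merit entries above read 1 × 2^2 and 1 × 2^2 × 2^1, i.e. that the first compound multiplies merit by 2^2 and the second by 2^1.

```python
from itertools import product


def nand(x: bool, y: bool) -> bool:
    """Logical NAND, written x|y in the text."""
    return not (x and y)


def first_compound(a: bool, b: bool) -> bool:
    """(a|b)|(a|b), the compound held in register B before the first output."""
    return nand(nand(a, b), nand(a, b))


def second_compound(a: bool, b: bool) -> bool:
    """a|b, the compound held in register C before the second output."""
    return nand(a, b)


# Compare both compounds with AND and NAND over all four input pairs.
for a, b in product([False, True], repeat=2):
    assert first_compound(a, b) == (a and b)
    assert second_compound(a, b) == (not (a and b))

# Merit after both outputs, under the assumed multipliers 2^2 and 2^1.
merit = 1 * 2**2 * 2**1
print(merit)  # 8
```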
Encoding NAND, AND, and Replication
This sequence is pieced together from the two sequences above. The first part encodes the two logic functions
NAND and AND, and the second part encodes genome replication.
io|io|C|nand|push|pop|C|nand|io|io|C|search|copy|if-label|C|A|divide|
mov-head|A|B
Encoding NAND, AND, Replication, and One Y Cell
This sequence is a derivative of the previous sequence. It contains a divide|B and therefore encodes a
multicellular organism with one Y cell. Please note that the if-label|C before the divide is required to
prevent a premature initiation of cell division.
io|io|C|nand|push|if-label|C|divide|B|pop|C|nand|io|io|C|search|copy|
if-label|C|A|divide|mov-head|A|B
doi: 10.1111/j.1420-9101.2007.01496.x
Erratum
A publication error led to an error being introduced into Fig. 1(b) of Willensdorfer (2008) in the print edition of the
journal (Journal of Evolutionary Biology, vol. 21(1), p. 106). A corrected version of this figure is reproduced below. This
figure is correct in the online edition of the journal.
[Figure 1, panels (a) and (b)]
Fig. 1 Multicellularity in digital self-replicating cellular organisms (DISCOs). (a) The first five cell divisions of a DISCO with a genome that
encodes for two X cells (green shaded regions) and one Y cell (blue shaded region). The first three cell divisions produce the somatic X and
Y cells. Every further division produces offspring which is released into the environment. (b) The merit of this multicellular organism
during the first and second 10 000 generations of the −X and +X simulations. The genome encodes for five (of nine) logic functions as indicated
by the nine-digit binary sequence. D, X and Y cells can utilize these functions (second column) only according to their cell type specificity
(first column) and receive a corresponding merit (third column) which is used to calculate the merit of the organism (fourth column). Note
that Y cells are not able to increase the merit of the organism during the first 10 000 generations and that X cells are not able to increase
the merit of the organism during the −X simulations. Cells that do not increase merit are disadvantageous, as they increase the number of
cell divisions that are required to reach maturity. A more detailed description is available in the Supplementary material.
Reference
Willensdorfer, M. (2008). Organism size promotes the evolution of specialized cells in multicellular digital organisms. J. Evol. Biol. 21:
104–110.
© 2008 THE AUTHOR. J. EVOL. BIOL. 21 (2008) 646
JOURNAL COMPILATION © 2008 EUROPEAN SOCIETY FOR EVOLUTIONARY BIOLOGY