doi: 10.1111/j.1420-9101.2007.01466.x

Organism size promotes the evolution of specialized cells in multicellular digital organisms

M. WILLENSDORFER
Program for Evolutionary Dynamics, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA

Keywords: differentiated multicellularity; division of labour; early life; somatic cells.

Abstract

Specialized cells are the essence of complex multicellular life. Fossils allow us to study the modification of specialized, multicellular features such as jaws, scales, and muscular appendages. But it is still unclear what organismal properties contributed to the transition from undifferentiated organisms, which contain only a single cell type, to multicellular organisms with specialized cells. Using digital organisms I studied this transition. My simulations show that the transition to specialized cells happens faster in organisms composed of many cells than in organisms composed of few cells. Large organisms suffer less from temporarily unsuccessful evolutionary experiments with individual cells, allowing them to evolve specialized cells via evolutionary trajectories that are unavailable to smaller organisms. This demonstrates that the evolution of simple multicellular organisms which are composed of many functionally identical cells accelerates the evolution of more complex organisms with specialized cells.

Introduction

In multicellular organisms, cells differentiate and specialize to form tissues which cooperate to form organs such as brains, kidneys, hearts, stomachs and lungs. Without specialized cells multicellular organisms would be nothing more than a homogeneous lump of cells. It is a widely accepted consequence of evolutionary theory that differentiated organisms with specialized cells evolved from undifferentiated ancestors (Darwin, 1859; Buss, 1988; Knoll, 2003; King, 2004).
It is believed that the pre-existence of undifferentiated multicellularity conveys advantages for the evolution of specialized cells (Buss, 1988; Maynard-Smith, 1989). One argument concerns the alleviation of reproductive competition in organisms that develop from a single cell. In such organisms, cells are genetically identical and genes that encode the development of specialized cells would not curtail their own propagation by creating nonreproductive cells (Buss, 1988; Maynard-Smith, 1989; Maynard-Smith & Szathmary, 1997; Dawkins, 1999; Michod & Roze, 2001).

Correspondence: M. Willensdorfer, Program for Evolutionary Dynamics, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA. e-mail: [email protected]

In this work, I demonstrate that undifferentiated multicellularity conveys an additional advantage for the evolution of specialized cells. I find that the size of a multicellular organism affects its fitness landscape. Mutations that differentiate individual cells are less detrimental in organisms composed of many cells than in organisms composed of few cells. This changes the evolutionary landscape and accelerates the evolution of specialized cells. The insight that the size of an organism affects its ability to evolve new, specialized cells is vital for our understanding of how complex multicellular life evolved.

To study the evolution of a complex feature like differentiated multicellularity, it is desirable to use an experimental system in which the evolutionary path from one stage to another is not preset but discovered by evolution itself. Digital organisms provide such a framework. Digital organisms are entities that are able to replicate and perform specific tasks. They compete for a common resource and are exposed to mutations.
The combination of replication, competition and mutation results in an evolutionary process, which can be used to address biological questions (Adami et al., 2000; Wilke et al., 2001; Yedid & Bell, 2002; Lenski et al., 2003; Chow et al., 2004). So far, however, digital organisms have not been equipped with the ability to evolve multicellularity. To close this gap, I developed and implemented digital organisms that are able to evolve multicellularity of varying complexity. The resulting digital self-replicating cellular organisms (DISCOs) are similar to the digital organisms used by the Avida software platform (Ofria & Wilke, 2004). The Supplementary material contains details about how a single DISCO cell works and how multicellularity is implemented. Besides providing insight into the evolution of specialized cells, this paper demonstrates how readily existing artificial life systems can be extended to study the evolution of complex multicellular features.

© 2007 THE AUTHOR. J. EVOL. BIOL. 21 (2008) 104–110 JOURNAL COMPILATION © 2007 EUROPEAN SOCIETY FOR EVOLUTIONARY BIOLOGY

For the following, it is sufficient to know that a DISCO has a genome that can encode logic functions as well as the development of a multicellular organism. The fitness of a DISCO is determined by its merit and its speed of replication. The merit is determined by the logic functions that the DISCO can execute, which are encoded in its genome. Logic functions differ in their complexity. The higher the complexity of the logic function, the larger the merit increase (see Supplementary material and Lenski et al., 2003). The speed of replication is mainly determined by the number of cells a DISCO is composed of.

Several types of cells exist. The default (D) cell is the replicative cell. Every DISCO has exactly one D cell and it is the only cell type in a unicellular organism. D cells can, if instructed by the genome, produce somatic X and Y cells.
Somatic cells are always associated with a cost as a multicellular organism spends time and energy growing them, whereas a unicellular organism can use these resources to produce offspring. On the other hand, by computing logic functions, somatic cells can increase a DISCO’s merit yielding a benefit that outweighs these costs. Some functions, however, can only be utilized by specific, specialized cells.

Specialized cells are common in biology. The model structure studied in this work is motivated by the heterocysts of cyanobacteria. Heterocysts are cells specialized for the fixation of nitrogen. They provide an oxygen-free environment for the nitrogen-fixing enzyme. To accomplish this, they develop thick cell walls that shut out oxygen. They also degrade photosystem II, which produces oxygen. These features allow heterocysts to fix nitrogen, but prevent them from carrying out functions of nonspecialized cells, such as cell division and photosynthesis via photosystem II.

Y cells are specialized cells in DISCOs and analogous to heterocysts in cyanobacteria. Y cells are different from normal (D and X) cells. The differences allow Y cells to utilize the three most complex logic functions which cannot be utilized by the nonspecialized D and X cells. This specialization, however, makes it impossible for Y cells to utilize the six logic functions that nonspecialized D and X cells can utilize. Thus, similar to heterocysts, Y cells are specialized for certain tasks (see first column in Fig. 1b and Supplementary material).

It is important to emphasize that a multicellular DISCO can only benefit from a given logic function if: (a) the function is encoded in the genome; and (b) cell types that are able to utilize the function are present.
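The two conditions above can be written down as a small set operation. The following is a minimal sketch and not the DISCO implementation: the function indices 0 to 8, the assumption that indices 6 to 8 are the three most complex (Y-cell-specific) functions, and the names `USABLE_BY` and `beneficial_functions` are all hypothetical and chosen only for illustration.

```python
# Assumption: the nine logic functions are indexed 0-8 by increasing complexity,
# so indices 6-8 stand for the three most complex (Y-cell-specific) functions.
NONSPECIALIZED_FUNCTIONS = frozenset(range(0, 6))  # usable by D and X cells
SPECIALIZED_FUNCTIONS = frozenset(range(6, 9))     # usable by Y cells only

USABLE_BY = {
    "D": NONSPECIALIZED_FUNCTIONS,
    "X": NONSPECIALIZED_FUNCTIONS,
    "Y": SPECIALIZED_FUNCTIONS,
}


def beneficial_functions(encoded, cell_types):
    """Return the functions that are (a) encoded in the genome and
    (b) usable by at least one cell type present in the organism."""
    usable = set()
    for cell_type in cell_types:
        usable |= USABLE_BY[cell_type]
    return set(encoded) & usable


# A genome that encodes a Y-cell-specific function (6) gains nothing from it
# until a Y cell is present.
print(beneficial_functions({0, 1, 6}, ["D"]))        # functions 0 and 1 only
print(beneficial_functions({0, 1, 6}, ["D", "Y"]))   # functions 0, 1 and 6
```

Under these assumptions, a Y cell in an organism whose genome encodes no Y-cell-specific function contributes nothing, and a Y-cell-specific function in an organism without Y cells contributes nothing either.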
This two-part requirement is analogous to heterocystous cyanobacteria that can only benefit from nitrogen fixation if: (a) the nitrogen-fixing enzyme is correctly encoded in the genome; and (b) cells with degraded photosystem II and thick, oxygen-impermeable cell walls exist.

Simulations and results

In this work, I am interested in the transition from simple multicellularity to a more complex multicellularity with specialized cells. In particular, I would like to know if the pre-existence of undifferentiated multicellular organisms has an effect on the evolution of specialized cells. To study this, I will compare ‘−X’ and ‘+X’ simulations. In +X simulations, X cells are able to increase the merit of a DISCO. A DISCO composed of one D cell and n X cells has n + 1 times the merit of a unicellular DISCO. In +X simulations, the evolution of X cells, that is, undifferentiated multicellularity, is encouraged. This is not the case in −X simulations in which X cells are not able to increase the merit of the organism and are therefore disadvantageous. In −X simulations, a transition to differentiated multicellularity with specialized cells has to occur directly from a unicellular ancestor.

To study the transition to differentiated organisms, I evolved, as a first step, undifferentiated DISCOs. To ensure that DISCOs do not evolve specialized cells, I suspended the ability of Y cells to increase the merit of the organism for this initial set of simulations (see Table S1 and Fig. 1b). For each set of simulations (−X and +X), I conducted 500 independent runs that differ only with respect to the seed for the random number generator. Each simulation was initiated with a genome that encodes only replication. In other words, the ancestral DISCO was unicellular and could not compute any logic function. For computational reasons, I used an effective population size of 200 organisms and stopped a simulation after 10 000 generations (see Supplementary material for more details).
Following the logic of Lenski et al. (2003), at the end of each simulation I determined the most recent common ancestor of the population and its line of descent. Similar to a palaeontologist, I use this (digital) fossil record to determine when each trait appeared. But in contrast to a palaeontologist, I know the fitness of each fossil and study 500 independent instances of one evolutionary process. This gives me the opportunity to discover general properties of the process at hand.

As expected, none of the −X simulations evolved multicellularity during the first 10 000 generations. All DISCOs remained unicellular. On the other hand, 491 of the 500 +X simulations evolved undifferentiated multicellularity. Multicellular DISCOs are very diverse with respect to their size. They have body sizes ranging from two to 13 cells, with size five as the most frequent. The organisms were also very successful in evolving logic functions. Most simulations evolved DISCOs that can compute all six available functions; few evolved ‘just’ five functions.

Fig. 1 Multicellularity in digital self-replicating cellular organisms (DISCOs). (a) The first five cell divisions of a DISCO with a genome that encodes for two X cells (green shaded regions) and one Y cell (blue shaded region). The first three cell divisions produce the somatic X and Y cells. Every further division produces offspring which is released into the environment. (b) The merit of this multicellular organism during the first and second 10 000 generations of the −X and +X simulations. The genome encodes for five (of nine) logic functions as indicated by the nine-digit binary sequence. D, X and Y cells can utilize these functions (second column) only according to their cell type specificity (first column) and receive a corresponding merit (third column) which is used to calculate the merit of the organism (fourth column). Note that Y cells are not able to increase the merit of the organism during the first 10 000 generations and that X cells are not able to increase the merit of the organism during the −X simulations. Cells that do not increase merit are disadvantageous, as they increase the number of cell divisions that are required to reach maturity. A more detailed description is available in the Supplementary material.

To study the evolution of specialized cells, I used each of these most recent common ancestors as a starting point for another 2 × 500 simulations. This time, Y cells were able to utilize functions that had not been available so far. They can increase the fitness of a DISCO substantially (see Supplementary material and Fig. 1b). In such a situation, one expects the evolution of DISCOs with Y cells and Y-cell-specific functions, which was indeed the case. As expected, multicellular DISCOs in the −X simulations were exclusively bicellular, composed of one D and one Y cell. Specialized cells were discovered in 197 of the 500 −X and in 308 of the 500 +X simulations. This difference is highly significant (two-sample test for equality of proportions: $\chi^2 = 48.40$, d.f. = 1, $P = 3.5 \times 10^{-12}$). Apparently, the pre-existence of undifferentiated multicellular organisms promotes the evolution of more complex multicellular organisms with specialized cells.

To study why undifferentiated multicellularity promotes the evolution of Y cells, I examined the evolutionary paths that lead to DISCOs with specialized cells. Considering the order of events we have three possibilities. Mutations can result in the simultaneous (si) appearance of Y cells and Y-cell-specific functions, or the two traits may appear in succession, either first the cell and then the function (cf) or first the function and then the cell (fc).
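The test of proportions reported above (197 of 500 versus 308 of 500) can be recomputed from the counts alone. The sketch below is a plain Pearson chi-squared test on the 2 × 2 table using only the standard library; the function name `two_proportion_chi2` is hypothetical. The use of Yates' continuity correction is an assumption on my part: the text does not say which variant was used, but the corrected statistic is the one that agrees with the reported value of 48.40.

```python
import math


def two_proportion_chi2(x1, n1, x2, n2, yates=True):
    """Pearson chi-squared test for equality of two proportions (2x2 table).

    Returns (chi2, p_value) with 1 degree of freedom.
    `yates=True` applies the continuity correction (an assumption here).
    """
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected

    # For 1 d.f. the chi-squared upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = two_proportion_chi2(197, 500, 308, 500)
print(f"chi2 = {chi2:.2f}, d.f. = 1, P = {p:.1e}")
```

Calling the function with `yates=False` gives a slightly larger statistic, which is why the corrected form is assumed here.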
The digital fossil record allows us to determine via which path and at what time Y cells were discovered (see Fig. 2). Two features are conspicuous. First, for the −X simulations si is the most frequently travelled path; about 46% of the specialized cells are discovered simultaneously with the cell function. Secondly, the −X and +X simulations differ noticeably only with respect to cf (red triangles). Apparently, evolving first the cell and then the function is much easier for undifferentiated multicellular organisms than it is for unicellular ones and accounts for the significant difference in the number of simulations that discovered Y cells between the +X and the −X simulations.

Fig. 2 Number of simulations that evolved specialized cells as a function of time. Simulations are grouped according to the evolutionary paths cf, si and fc (see figure legend and main text) that led to the evolution of new cell types that utilize new functions. Noticeable differences between simulations with unicellular (−X) and undifferentiated multicellular (+X) ancestors exist only with respect to evolutionary path cf along which specialized cells (Y cells) appear before the genome encodes for the specialized functions.
[Figure: x-axis, number of generations (0 to 10 000); y-axis, number of simulations (0 to 200); legend: +X simulations, −X simulations; cf (first cell then function), si (simultaneously cell and function), fc (first function then cell).]

To explain these observations, we have to consider the fitness of organisms along the three possible paths. The intermediates for cf and fc are of particular interest. Let Dc and Df denote DISCOs along the evolutionary paths cf and fc. That is, Dc is a DISCO that encodes Y cells but not (yet) Y-cell-specific logic functions, and Df is a DISCO that has acquired Y-cell-specific functions but not (yet) Y cells. If Dc and Df have a low fitness, then they are not maintained for long in the population and there is less opportunity for a second mutation to give rise to the missing Y-cell function or the missing Y cell. In such a case, evolution along the corresponding paths is impaired and one would expect most specialized cells to evolve directly via si (Iwasa et al., 2003, 2004).

The digital fossil record provides information about the time, t, that Dc and Df are maintained in the population, as well as the organism size at that point (see Fig. 3). Let us first discuss the data for organisms of size one, that is, the −X simulations. None of the 51 simulations that evolved Y cells via cf maintained Dc for more than 20 generations in the population. This suggests that mutations that provide a DISCO with Y cells are deleterious. The digital fossil record shows that the fitness decrease is not a result of a merit decrease due to a loss of logic functions. Rather, the loss in fitness is caused by developmental costs. DISCOs with Y cells but without Y-cell functions grow one additional, unused cell. This constitutes a fitness burden. In the Supplementary material, I show that mutations that transform a unicellular DISCO into a bicellular DISCO decrease the relative fitness from 1 to about 0.62. Thus, the fitness of Dc is indeed low.

What about the fitness of Df? The data in Fig. 3 show that Df is easier to maintain in the population than Dc and suggest that mutations along path fc are less deleterious than mutations along cf. But for the following reasons, we can actually expect most mutations that generate Y-cell functions in DISCOs without Y cells (mutations along path fc) to be very deleterious. The digital fossil record shows that most mutations (>95%) that generate Y-cell functions in DISCOs with Y cells (along evolutionary path cf) destroy at least one of the previously evolved logic functions. This is not detrimental for DISCOs with Y cells because they gain a Y-cell-specific function in exchange for a nonspecific one. However, DISCOs without Y cells cannot utilize the newly discovered logic function and experience ‘just’ a loss of already evolved logic functions. Hence, most mutations that lead to Df are actually deleterious and evolution via fc can only use a small (<5%) subset of neutral mutations. All things considered, Df and Dc have on average a low fitness and we should not be surprised that many specialized cells evolve via si (Iwasa et al., 2003, 2004).

Fig. 3 Time, t, in number of generations between the appearance of specialized cells (Y cells) and the appearance of specialized functions. The data are grouped according to the size of the digital self-replicating cellular organism immediately before the appearance of Y cells. The plot contains data from the +X (organism size greater than one) and −X (organism size equals one) simulations. The panel in the middle shows the number of simulations that evolved Y cells via fc, si and cf respectively. The correlation between t and the size of the organism for evolutionary path cf is evident (Kendall’s τ statistic: z = 5.15, $P = 2.64 \times 10^{-7}$). It shows how organism size affects the evolution of specialized cells by reducing the detrimental effect of temporarily unsuccessful evolutionary experiments with individual cells. To increase expressiveness, I added small random noise to the organism size and used different plot regions for fc and cf.
[Figure: left panel, FC (first function then cell); right panel, CF (first cell then function); y-axis, organism size (1 to 15); x-axis, number of generations between the appearance of Y cells and Y cell functions; legend: +X simulations, −X simulations. For organisms of size one the middle panel reads 55 (fc), 91 (si) and 51 (cf).]

But why and how does the situation change for undifferentiated, multicellular organisms? Why is the rate of evolution via cf higher in +X than in −X simulations (see Fig. 2)? We can answer this question by considering the developmental cost of Y cells for organisms of different sizes. As one would expect, the burden of developing one additional unused cell is more substantial for small organisms than it is for large ones. For example, a size increase by one decreases the relative fitness of a unicellular DISCO to 0.62, whereas the relative fitness of a 10-celled DISCO is decreased to only 0.93 (see Supplementary material). Hence, the intermediates along cf are less deleterious for large organisms and are therefore maintained longer in the population. This is evidenced by a significant (Kendall’s τ statistic: z = 5.15, $P = 2.64 \times 10^{-7}$) correlation between t and the size of the organism for cf (see Fig. 3). Consequently, the rate of evolution along cf increases with organism size and large organisms are significantly more likely to evolve Y cells via cf (Pearson’s chi-squared test: $\chi^2 = 37.20$, d.f. = 13, $P = 3.9 \times 10^{-4}$, see numbers in Fig. 3).

By lowering the barriers along one of the evolutionary paths, organism size promotes the evolution of specialized cells and, therefore, of more complex multicellular organisms. If this is the case, then we should also find evidence for this phenomenon within the +X simulations. In particular, organisms composed of many cells should evolve specialized cells earlier than organisms composed of few cells. Fig. 4 shows the size distribution of DISCOs with and without Y cells at different time points of the +X simulations. For example, after 200 generations, 14 simulations discovered Y cells. Only two of those (<15%) were discovered in organisms smaller than seven cells, even though most simulations (>60%) contained organisms smaller than seven cells. This bias towards specialized cells in larger organisms is even more pronounced in later stages of the simulations and remains significant (see P-values in Fig. 4). Even within the +X simulations, we observe that an increase in organism size eases the evolution of specialized cells.

Fig. 4 Number of simulations that evolved organisms with (white bars) and without (black bars) specialized cells after 200, 500, 5000 and 10 000 generations, grouped according to the size of the organism. Large organisms show a significant bias (see P-values of a Pearson’s chi-squared test) towards discovering specialized cells earlier than small organisms.
[Figure: four panels (after 200, 500, 5000 and 10 000 generations); x-axis, organism size (1 to 15); y-axis, number of simulations; P-values shown in the panels: 3.45e−02, 1.48e−04, 1.63e−10, 5.13e−09.]

Discussion

For multicellular organisms (Bonner, 1965, 2004; Bell & Mooers, 1997) as well as insect (Wilson, 1971) or human (Blau, 1974) societies, it is known that the degree of specialization increases with the size of the system. Large systems seem to be able to benefit more from specialized units. Consequently, the lack of specialization in very small multicellular organisms (Bell & Mooers, 1997) might be explained by the existence of a minimum threshold size at which specialization becomes advantageous.
It is important to emphasize that this is not the case in this paper. Specialized cells can increase the fitness of small and large DISCOs substantially. Nonetheless, there is a significant correlation between organism size and the presence of specialized cells (see Fig. 4). This correlation is based on evolutionary constraints in small organisms and not on a minimum threshold size at which specialization becomes advantageous.

This paper demonstrates how artificial life simulations can be used to study the evolution of complex multicellularity. Currently, the implementation of DISCOs allows only for a sudden change of cell types which results in an equally sudden change of the logic functions that the cell can utilize. For biological systems, one might favour models in which new cell types evolve more gradually with intermediate, ‘chimeric’ types that can perform new and old functions, but both just suboptimally due to an inherent trade-off. Even for such models similar results can be expected. Allowing for chimeric cell types does not change the fact that individual cells affect the fitness of the whole organism less in large organisms (organisms composed of many cells) than in small organisms (organisms composed of few cells).

In general, loss-of-function mutations are more frequent than gain-of-function mutations. It is therefore likely that many evolutionary trajectories exist that lead to specialized cells but involve intermediate fitness losses due to loss-of-function mutations. As I have demonstrated in this paper, such mutations are less harmful to organisms composed of many cells. In such organisms, mutations that create chimeric cells with (temporarily) mediocre functionality can be maintained in the population until additional mutations accumulate that allow the new cell type to execute the new function at its full potential.
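The size dependence behind this argument can be illustrated with a toy life-history calculation. It is not the calculation from the Supplementary material and is not expected to reproduce the values of 0.62 and 0.93 quoted in the text; it only shows the qualitative trend. The assumptions are mine: every cell division takes one time unit, a newborn organism needs n − 1 divisions to grow to n cells (as in Fig. 1a), every later division releases one single-celled offspring, and merit is held fixed. Under these assumptions the per-division growth factor λ satisfies λ^(n−1)(λ − 1) = 1 (from the Euler–Lotka equation), and the function names below are hypothetical.

```python
import math


def growth_factor(n_cells: int) -> float:
    """Solve lam**(n_cells - 1) * (lam - 1) = 1 for lam in (1, 2] by bisection.

    lam is the population growth factor per division time in the toy model.
    """
    lo, hi = 1.0, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mid ** (n_cells - 1) * (mid - 1.0) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def relative_fitness_after_extra_cell(n_cells: int) -> float:
    """Ratio of Malthusian growth rates (log growth factors) when an organism
    of n_cells grows one additional cell that contributes no merit."""
    return math.log(growth_factor(n_cells + 1)) / math.log(growth_factor(n_cells))


for size in (1, 2, 5, 10):
    print(size, round(relative_fitness_after_extra_cell(size), 3))
```

In this toy model the relative cost of one unused cell shrinks as the organism grows, which is the direction of the effect described in the text.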
As loss-of-function mutations are so frequent, an increase in organism size can be expected to substantially increase the arsenal of evolutionary trajectories that are available for the evolution of specialized cells and is therefore an important step towards the evolution of complex multicellularity.

The insight that the size of a biological system affects its evolutionary landscape can also be applied to other aspects of biology. Take, for example, gene duplication. It is commonly accepted that gene duplication accelerates the discovery of new gene functions. After a gene duplication, one copy of the gene can execute the old function. The other copy can accumulate mutations which might eventually lead to new functions (Lynch & Conery, 2000). A corollary of this paper is that the genome size of an organism has a crucial impact on the organism’s ability to discover new gene functions by means of gene duplication. For very small genomes (e.g. the genome of a virus) a duplicated gene constitutes a substantial fitness burden and might not be maintained in the population long enough to adopt a new function. For big genomes (e.g. eukaryotic genomes) a duplicate of a single gene constitutes an insignificant additional burden and can be maintained long enough to discover a new role. The results from this paper regarding organism size and the evolution of complex multicellularity can also be applied to genome sizes and the evolution of complex genomes.

Organism size has always been considered an important factor for the evolution of multicellularity. In fact, benefits of increased size are thought to have promoted the transition from unicellular to undifferentiated multicellular life (Bonner, 1965, 2001; Kirk, 1997, 2003; King, 2004). Advantages of increased size include predator evasion (Boraas et al., 1998), increased motility (Kirk, 2003) or increased capacity to store nutrients (Koufopanou & Bell, 1993; Kerszberg & Wolpert, 1998).
In this work, I observe another, less obvious, benefit of organism size. This benefit – the ability to discover new, specialized cells via trajectories that are inaccessible to small organisms – does not concern the fitness of the organism itself, but its ability to evolve more complex multicellular forms.

Acknowledgments

I am grateful to Erick Matsen for mesmerizing and amazing discussions, Matthew Hegreness and Reinhard Bürger for help with the manuscript, and Martin Nowak for invaluable input. I am supported by a Merck-Wiley fellowship. Support from the NSF/NIH joint programme in mathematical biology (NIH grant R01GM078986) is gratefully acknowledged. The Program for Evolutionary Dynamics at Harvard University is sponsored by J. Epstein.

References

Adami, C., Ofria, C. & Collier, T.C. 2000. Evolution of biological complexity. Proc. Natl Acad. Sci. U. S. A. 97: 4463–4468.
Bell, G. & Mooers, A. 1997. Size and complexity among multicellular organisms. Biol. J. Linn. Soc. 60: 345–363.
Blau, P. 1974. On the Nature of Organizations. Wiley, New York.
Bonner, J.T. 1965. Size and Cycle. Princeton University Press, Princeton, NJ.
Bonner, J.T. 2001. First Signals: The Evolution of Multicellular Development. Princeton University Press, Princeton, NJ.
Bonner, J.T. 2004. Perspective: the size–complexity rule. Evolution 58: 1883–1890.
Boraas, M., Seale, D. & Boxhorn, J. 1998. Phagotrophy by a flagellate selects for colonial prey: a possible origin of multicellularity. Evol. Ecol. 12: 153–164.
Buss, L.W. 1988. The Evolution of Individuality. Princeton University Press, Princeton, NJ.
Chow, S.S., Wilke, C.O., Ofria, C., Lenski, R.E. & Adami, C. 2004. Adaptive radiation from resource competition in digital organisms. Science 305: 84–86.
Darwin, C. 1859. On the Origin of Species: By Means of Natural Selection. Murray, London.
Dawkins, R. 1999. The Extended Phenotype, revised edn. Oxford University Press, Oxford.
Iwasa, Y., Michor, F. & Nowak, M.A. 2003. Evolutionary dynamics of escape from biomedical intervention. Proc. Biol. Sci. 270: 2573–2578.
Iwasa, Y., Michor, F. & Nowak, M.A. 2004. Stochastic tunnels in evolutionary dynamics. Genetics 166: 1571–1579.
Kerszberg, L. & Wolpert, E. 1998. The origin of metazoa and the egg: a role for cell death. J. Theor. Biol. 193: 535–537.
King, N. 2004. The unicellular ancestry of animal development. Dev. Cell 7: 313–325.
Kirk, D.L. 1997. Volvox: A Search for the Molecular and Genetic Origins of Multicellularity and Cellular Differentiation. Cambridge University Press, Cambridge.
Kirk, D. 2003. Seeking the ultimate and proximate causes of volvox multicellularity and cellular differentiation. Integr. Comp. Biol. 43: 247–253.
Knoll, A.H. 2003. Life on a Young Planet: The First Three Billion Years of Evolution on Earth. Princeton University Press, Princeton, NJ.
Koufopanou, V. & Bell, G. 1993. Soma and germ – an experimental approach using volvox. Proc. R. Soc. Lond. Ser. B-Biol. Sci. 254: 107–113.
Lenski, R.E., Ofria, C., Pennock, R.T. & Adami, C. 2003. The evolutionary origin of complex features. Nature 423: 139–144.
Lynch, M. & Conery, J.S. 2000. The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155.
Maynard-Smith, J. 1989. Evolutionary Progress. University of Chicago Press, Chicago, IL, pp. 219–230.
Maynard-Smith, J. & Szathmary, E. 1997. The Major Transitions in Evolution, reprint edn. Oxford University Press, Oxford.
Michod, R.E. & Roze, D. 2001. Cooperation and conflict in the evolution of multicellularity. Heredity 86: 1–7.
Ofria, C. & Wilke, C.O. 2004. Avida: a software platform for research in computational evolutionary biology. Artif. Life 10: 191–229.
Wilke, C.O., Wang, J.L., Ofria, C., Lenski, R.E. & Adami, C. 2001. Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature 412: 331–333.
Wilson, E.O. 1971. The Insect Societies. Harvard University Press, Cambridge.
Yedid, G. & Bell, G. 2002. Macroevolution simulated with autonomously replicating computer programs. Nature 420: 810–812.

Supplementary Material

The following supplementary material is available for this article:
Appendix Supporting online material.
This material is available as part of the online article from: http://www.blackwell-synergy.com/doi/abs/10.1111/j.1420-9101.2007.01466.x
Please note: Blackwell Publishing are not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Received 14 June 2007; revised 9 September 2007; accepted 5 October 2007

Supporting Online Material for
Organism Size Promotes the Evolution of Specialized Cells in Multicellular Digital Organisms

Martin Willensdorfer
Program for Evolutionary Dynamics, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA. E-mail: [email protected]

Contents
1 Digital Self-Replicating Cellular Organisms (DISCOs)
1.1 The Cell
1.2 Environment, Fitness, Merit, and Logic Functions
1.3 Multicellularity
2 Details about the -X and +X simulations
3 Developmental cost of one Y cell
A List of Instructions
B Example Genomes

1 Digital Self-Replicating Cellular Organisms (DISCOs)

This section provides details about DISCOs. Since I was motivated by Lenski et al. (1), DISCOs are very similar to Avidians, that is, digital organisms developed by Christoph Adami (2).
However, there are some important differences: (a) DISCOs have a cell cycle with a metabolic and a replication phase, (b) DISCOs have a minimal and disjoint set of instructions for each phase, (c) DISCOs can only copy their genome to a daughter strand and cannot modify their own genome, and (d) DISCOs can evolve multicellularity. Modifications (a)–(c) make it possible to directly identify the genomic basis of a phenotypic feature, and modification (d) makes it possible to study the evolution of multicellularity. Section 1.1 explains in detail how a single DISCO cell works. Section 1.2 describes how fitness is realized in DISCOs and how a DISCO can increase its fitness by computing logic functions. Finally, Section 1.3 explains aspects of multicellularity in DISCOs.

1.1 The Cell

A DISCO cell is a computing automaton with the ability to replicate. Each DISCO has a genome, that is, a sequence of instructions and modifiers. The instructions are designed to operate on the automaton and change its state in certain ways. This state change depends not only on the instruction but also on modifiers in the genome. In the following, however, I will use the term instruction to refer to an instruction and all the modifiers in the genome that affect the action of this instruction. Appendix A contains detailed information about how modifiers affect each instruction. In DISCOs, modifiers are also used to encode multicellularity in the genome, as we will see in Section 1.3.

Naturally, the execution of a sequence of instructions leads to a sequence of state changes of the automaton. The right sequence of instructions can cause state changes that result in the computation of logic functions, genome replication, and, finally, cell division. Which sequence of instructions a cell will actually execute is determined by its genome. Hence, the genome determines the ability of a cell to compute logic functions as well as the speed and accuracy of replication, which constitute a cell's phenotype.
The life of a DISCO cell is divided into two phases, a metabolic and a replication phase. At the onset of each phase the automaton is reset and starts to execute instructions from the beginning of the genome. During the metabolic phase a cell can read and manipulate data to compute logic functions. During the replication phase a cell can copy its genome and initiate cell division. Each phase has its own set of instructions: one set for the metabolic phase (blank, io, nand, swap, push, pop) and one set for the replication phase (copy, search, mov-head, if-label, divide). A newly created cell first enters the metabolic phase. The transition from the metabolic to the replication phase happens after the execution of m metabolic instructions (m is a simulation parameter and equals 300 in all described simulations). The replication phase, on the other hand, ends at the successful or unsuccessful attempt to initiate cell division, that is, at the execution of a divide instruction.

During the metabolic phase a DISCO cell can create data in the form of logic compounds. To make the production of logic compounds possible, the automaton is equipped with three registers (A, B, and C) and one stack. The registers are the operating platform. They are used to store and manipulate data. New logic compounds can only be generated by using the nand instruction, which manipulates data stored in the registers. The most basic logic compound is a logic variable, and every logic compound is composed of logic variables and logic operators that connect these variables. Logic variables are supplied by the io instruction and stored in one of the three registers. By loading a new variable into a register, the io instruction might have to replace a logic compound that is stored in this register. Whenever a logic compound is replaced, the io instruction checks whether this compound is equivalent to one of nine logic functions.
If this is the case, then the DISCO will be rewarded (see next section). Data can also be copied from one of the registers to the stack with the push instruction and recovered with the pop instruction. The DISCO reads the metabolic instructions one by one from the genome. If the end of the genome is reached, the DISCO will continue from the beginning, that is, the genome is processed circularly.

One might wonder how those activities relate to a biological metabolism. The analogy becomes quite obvious if one considers autotrophic processes like carbon fixation in plants. Plants convert carbon dioxide into organic compounds. The first stable intermediate is a 3-carbon compound. These trioses can be condensed into hexoses, sucrose, or cellulose. They can also be used to make amino acids and lipids. Hence, more complex organic compounds are assembled from simpler ones, and all those compounds contribute to the fitness of the organism. The situation is analogous in DISCOs. However, a DISCO handles logic compounds instead of organic ones. Logic variables are assembled with the nand instruction to form more complex logic compounds and, as with real organisms, certain compounds can contribute considerably to the fitness of the organism.

The metabolic phase is followed by a replication phase. During the replication phase a DISCO can copy its genome and initiate cell division. The copy instruction instructs the automaton to copy a genome element to the daughter strand. Which element of the genome is copied depends on the position of the so-called read head. Heads are pointers that the automaton can use to mark elements of the genome. Besides the read head, the automaton has an instruction head and a flow head. The read head shows the automaton which element of the genome has to be copied next. It moves forward to the next element after each copy event.
The instruction head shows the automaton which instruction has to be executed next and also moves forward after the instruction has been executed. The flow head is used to mark positions in the genome to which either the instruction head or the read head can be moved by the mov-head instruction. Together with the if-label instruction, this makes it possible to encode jumps and loops in the genome. Cell division is initiated by the divide instruction. The execution of a divide instruction will always lead to a switch from the replication phase to the metabolic phase, regardless of whether it was successful or not.

Appendix A describes precisely how each instruction changes the state of the automaton and how modifiers can affect this state change. Appendix B contains some example genomes and illustrates how a sequence of instructions can encode logic functions and cell replication.

1.2 Environment, Fitness, Merit, and Logic Functions

DISCOs live in an unstructured environment that can accommodate a given number of organisms. In this environment DISCOs compete for the opportunity to execute instructions. For each iteration only one DISCO is selected, and the selected DISCO can execute only one instruction. Most of the time the execution of an instruction will only affect the DISCO internally. At some point, however, a DISCO will produce offspring. The offspring is first exposed to replacement (copy), insertion, and deletion mutations and then placed at a randomly chosen position in the environment. In most cases this will result in the replacement (death) of another DISCO.

The frequency with which DISCOs are chosen to execute an instruction is proportional to their merit. Consequently, the more merit a DISCO has, the more frequently it will be chosen to execute an instruction. Obviously, if the number of instructions that have to be executed until a cell divides is the same for two DISCOs, then the one with the greater merit will have the higher fitness.
On the other hand, if two DISCOs have the same merit, then the one that has to execute fewer instructions until cell division will have the higher fitness. We see that there are two components that determine the fitness of a DISCO: (a) its merit (relative to the merit of the other organisms) and (b) the speed of replication (i.e., the number of instructions that have to be executed until a cell divides).

As mentioned above, a DISCO can manipulate data to create logic compounds. By creating compounds that are equivalent to specific logic functions, a DISCO can increase its merit and consequently its fitness. A DISCO can only generate logic compounds by applying the nand instruction. The nand instruction connects two data elements (a data element is either a logic variable or a logic compound) with the NAND ("Not And") operation. The NAND operator (. | .) is defined as the negation (¬ .) of the conjunction (. ∧ .), that is, (a|b) = ¬(a ∧ b). The resulting logic compound is false if and only if both a and b are true. Formal logic shows that every truth-functional compound can be expressed by using just the NAND operator (3). For example, we can write ¬a = a|a and a ∧ b = (a|b)|(a|b). Since a DISCO can apply the NAND operation, it can also produce a multitude of logic compounds.

With the io instruction a DISCO can test whether a data element is equivalent to one of up to nine logic functions. If so, then the DISCO is rewarded with a merit increase. By how much the merit is increased depends on the complexity of the logic function (see below and Section 2). Hence, a DISCO receives a reward for the construction of meaningful logic compounds. Selection will therefore favor DISCOs that can construct such logic compounds.

For simplicity, the merit of an organism remains unchanged until it produces offspring. Each time a DISCO produces new offspring its merit is newly calculated according to the logic compounds it was able to create.
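The NAND identities quoted in this section can be checked numerically. The following is a minimal sketch and not the simulation code: it assumes that logic variables are represented as random 64-bit integers and that NAND is applied bit-wise, as the supplementary text describes for function identification. The names `MASK` and `nand`, the fixed seed, and the three-NAND construction of OR (a standard identity, not quoted from the text) are illustrative choices.

```python
import random

MASK = (1 << 64) - 1  # keep every value within 64 bits


def nand(x: int, y: int) -> int:
    """Bit-wise NAND, (x|y) = not (x and y), truncated to 64 bits."""
    return ~(x & y) & MASK


rng = random.Random(0)  # fixed seed, used only to make the sketch repeatable
a = rng.getrandbits(64)
b = rng.getrandbits(64)

# NOT a = a|a  (one NAND operation)
not_a = nand(a, a)
# a AND b = (a|b)|(a|b)  (two NAND operations, the inner a|b is reused)
a_nand_b = nand(a, b)
a_and_b = nand(a_nand_b, a_nand_b)
# a OR b = (a|a)|(b|b)  (three NAND operations; standard identity, assumption)
a_or_b = nand(nand(a, a), nand(b, b))

print(not_a == (~a & MASK))  # True
print(a_and_b == (a & b))    # True
print(a_or_b == (a | b))     # True
```

The number of NAND operations used for NOT, AND, and OR here (1, 2, and 3) is what the text calls the complexity of a logic function.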
One has to keep in mind that newly born DISCOs have not yet had the chance to construct any logic compounds. To avoid a disadvantage for newborns, they start off with the parental merit until they reach maturity and produce their own offspring.

The table below lists the nine logic operators that a DISCO can compute to change its merit. During a simulation, logic operators are identified by calculating the truth value of the expression in the second column, where C is a logic compound that a DISCO generated. The truth value of the expression in column two is calculated by using randomly generated 64-bit integers as instances for the logic variables a and b. The NAND operation as well as the logic operations are applied bit-wise. An example is given below.

It is convenient to summarize which logic functions the genome of a DISCO can compute by using a nine-digit binary code. For example, 000000000 tells us that the genome does not encode any logic function; 101101000 that the genome can compute NOT, AND, OR N, and AND N; and 111111011 that the genome can compute all nine logic functions except for XOR.

logic operator   definition¹                                      minimum number of NAND required
NOT              C(a) ≡ ¬a                                        1
NAND             C(a, b) ≡ ¬(a ∧ b)                               1
AND              C(a, b) ≡ a ∧ b                                  2
OR N             C(a, b) ≡ a ∨ ¬b  or  C(a, b) ≡ ¬a ∨ b           2
OR               C(a, b) ≡ a ∨ b                                  3
AND N            C(a, b) ≡ a ∧ ¬b  or  C(a, b) ≡ ¬a ∧ b           3
XOR              C(a, b) ≡ (a ∨ b) ∧ ¬(a ∧ b)                     4
NOR              C(a, b) ≡ ¬(a ∨ b)                               4
EQU              C(a, b) ≡ (a ∧ b) ∨ (¬a ∧ ¬b)                    5

a       : 110101110001110011111001100010001000001000...   64-bit integer
b       : 000100101101101001011100000111110000101000...   64-bit integer
C(a, b) : 000100100001100001011000000010000000001000...   64-bit integer

We have C(a, b) ≡ a ∧ b and conclude that compound C(a, b) encodes AND.

1.3 Multicellularity

So far I have described how the genome of a DISCO determines the phenotype of a single cell. The genome can also encode information about the development of a multicellular organism.
¹ ¬, ∧, ∨, and ≡ symbolize logical negation, conjunction (and), disjunction (or), and equivalence, respectively.

A DISCO's life always starts with a single cell, the default (D) cell. Each organism has exactly one D cell, and the D cell is the only reproductive cell. After the first cell division the D cell has two options. It can either release the daughter cell into the environment as offspring and remain unicellular, or retain the cell as a first step towards the development of a multicellular organism. Whether cells are retained, how many, and of what kind is encoded in the genome.

To understand how multicellularity is encoded, we have to remind ourselves that the genome is a sequence of instructions and modifiers. Among the set of instructions, the divide instruction is special because it is the only one that is not affected by modifiers. To keep things as simple as possible, I decided to exploit this feature of the divide instruction. In particular, the D cell will retain one cell for each divide instruction in the genome that is followed by a modifier. Depending on the kind of modifier, the retained daughter cell is assigned to a somatic cell type. For example, a 'divide|A' encodes an X cell, whereas a 'divide|B' and a 'divide|C' encode a Y and a Z cell, respectively. (Z cells are not relevant for this work and are mentioned only for completeness.) After a daughter cell has been retained for each such divide instruction, the D cell is able to release daughter cells as offspring into the environment (see Figure 1 for an example). Please note that the genome might not contain any divide instruction that is followed by a modifier and would therefore encode a unicellular organism (see Appendix B for examples).

Most multicellular organisms have specialized cells. Even though the specialized cells of a multicellular organism contain in most cases the same genome as the replicative cells, they behave differently. Specialized cells in DISCOs work analogously.
The genome of a cell might be able to compute all logic functions, but cells can only utilize logic functions according to their cell type. For this work, D and X cells can only benefit from the first six logic functions and Y cells only from the last three functions that might be encoded in the genome (see Section 2).

It is important to point out that, even though a DISCO can be composed of several cells, each cell is still an independent automaton that executes one instruction after another. In fact, whenever a multicellular DISCO is selected by the environment to execute an instruction, each cell of this organism will execute one instruction.

2 Details about the -X and +X simulations

This section describes the -X and the +X simulations in more detail. For computational reasons I limited the population size to 200 (uni- or multicellular) DISCOs. I conducted 500 runs for each type of simulation, which differed only with respect to the seed for the random number generator. Both types of simulations have two parts: (a) an initial 10 000 generations (2 × 10^6 replication events) in which specialized cells are not beneficial and (b) a further 10 000 generations in which specialized cells are advantageous.

For the first 10 000 generations, each simulation was initiated with the same genome. This genome contains 141 ‘blank’ instructions (essentially a placeholder, see Appendix A) and a sequence of 9 instructions, ‘search|copy|if-label|C|A|divide|mov-head|A|B’, which encodes cell replication (see Appendix B). Thus, the ancestral DISCO could not compute any logic function and was unicellular. The initial genome has length 150. During all simulations, genome length was restricted to [145, 155], that is, offspring was nonviable if its genome was shorter than 145 or longer than 155. After the first 10 000 generations I determine the most recent common ancestor (MRCA) of the population.
The genome of the MRCA is then used to seed the population for the next 10 000 generations. That is, the second part of a run is initialized with a genome that was produced during the first part of this run.

During the simulations, offspring was exposed to copy, insertion, and deletion mutations. For each offspring, the number of copy, insertion, and deletion mutations is chosen from a Poisson distribution with mean 0.45, 0.025, and 0.025, respectively. An average mutation rate of 0.5 instructions per replication (≈ 3.4 × 10^−3 mutations per instruction per replication) was chosen because it seemed to be optimal for the evolution of Y cell specific logic functions.

The merit of an organism is calculated based on the merit of its cells. The merit of a cell in turn is calculated based on the logic functions it can compute and utilize. A cell that cannot compute any logic function has merit 1. The second column of Table 1 shows how the merit of a cell changes if it is able to compute (and utilize) the corresponding logic function. The values were taken from Lenski et al. (1) and reflect the complexity of the respective function. In particular, the merit of a cell is multiplied by 2^n if the cell is able to compute a logic function of complexity n, where n gives the minimum number of NAND operations that are required to construct the logic function [see (1) and Section 1.2].

As mentioned before, not every cell is able to utilize every logic function. Which kind of cell can utilize which kind of functions during the first and the second part of the +X and the -X simulations is shown in Table 1.

Finally, the merit of the multicellular organism has to be calculated based on the merit of each cell. In short, the merit of an organism during the +X simulations is given by SUM(X,D)*SET(Y) and during the -X simulations by SET(X,D)*SET(Y). The expression SUM(X,D) denotes the sum of merits of all D and X cells.
Hence, during the +X simulations D and X cells contribute linearly to the merit of the organism. SET(.) is equal to the merit of a cell that can compute the set of functions that all the cells in the argument can compute. For example, SET(X,D) equals the merit of a cell that can compute the same set of functions that all X and D cells can compute. Since X and D cells can encode and utilize the same functions, SET(X,D) is essentially equal to the merit of one D cell. Hence, additional X cells cannot contribute to the merit of the organism. Similarly, SET(Y) equals the merit of a cell that can compute the set of functions that all Y cells can compute. Again, since one Y cell can compute the set of functions that all Y cells can compute, SET(Y) is equal to the merit of one Y cell.

The use of SUM(X,D)*SET(Y) for the +X simulations was motivated by specialized cells in cyanobacteria. X and D cells are functionally equivalent and contribute additively to the merit of the organism: two photosynthesizing vegetative cells can fix approximately twice as much carbon as one photosynthesizing cell. Y cells, however, are specialized cells that amplify the activity of X and D cells (by providing nitrogen, for example). This is reflected in the multiplicative contribution of Y cells to the merit of the organism. I use SET(Y) instead of SUM(Y) because DISCOs are thought to be small enough that one Y cell can amplify the merit of X and D cells as well as two Y cells can.

Figure 1 shows the first five cell divisions in the life of a multicellular DISCO composed of two X cells and one Y cell. It also shows what the merit of this organism would be during the first and the second part of the +X and -X simulations.
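The merit rules of this section, together with the divide-modifier encoding of Section 1.3, can be written down compactly. The following is a minimal sketch under stated assumptions, not the author's implementation: the function names (`body_plan`, `cell_merit`, `organism_merit`), the string masks, and the convention that SET(Y) equals 1 when an organism has no Y cell are illustrative choices. The numbers in the usage lines follow from the formulas SUM(X,D)*SET(Y) and SET(X,D)*SET(Y) for the example genome code 001011101 with two X cells and one Y cell.

```python
# Complexity n (minimum number of NAND operations) for the nine functions, in
# the order NOT, NAND, AND, OR N, OR, AND N, XOR, NOR, EQU.
COMPLEXITY = [1, 1, 2, 2, 3, 3, 4, 4, 5]
D_X_USABLE = "111111000"  # D and X cells: first six functions
Y_USABLE = "000000111"    # Y cells: last three functions (second part only)
NONE_USABLE = "000000000"
CELL_TYPE = {"A": "X", "B": "Y", "C": "Z"}  # modifier after divide -> cell type


def body_plan(genome: str) -> dict:
    """Count retained somatic cells: one per divide followed by a modifier."""
    elements = genome.split("|")
    counts = {"X": 0, "Y": 0, "Z": 0}
    for current, following in zip(elements, elements[1:]):
        if current == "divide" and following in CELL_TYPE:
            counts[CELL_TYPE[following]] += 1
    return counts


def cell_merit(genome_code: str, usable: str) -> int:
    """Multiply merit 1 by 2**n for every function both encoded and usable."""
    merit = 1
    for encoded, allowed, n in zip(genome_code, usable, COMPLEXITY):
        if encoded == "1" and allowed == "1":
            merit *= 2 ** n
    return merit


def organism_merit(genome_code, n_x, n_y, plus_x, second_part):
    """SUM(X,D)*SET(Y) for +X runs, SET(X,D)*SET(Y) for -X runs."""
    d_cell = cell_merit(genome_code, D_X_USABLE)
    y_usable = Y_USABLE if second_part else NONE_USABLE
    # Assumption: without any Y cell the SET(Y) factor is 1.
    set_y = cell_merit(genome_code, y_usable) if n_y > 0 else 1
    first_factor = (1 + n_x) * d_cell if plus_x else d_cell
    return first_factor * set_y


code = "001011101"  # AND, OR, AND N, XOR, EQU
print(cell_merit(code, D_X_USABLE))               # 256
print(cell_merit(code, Y_USABLE))                 # 512
print(organism_merit(code, 2, 1, True, False))    # 768
print(organism_merit(code, 2, 1, False, False))   # 256
print(organism_merit(code, 2, 1, True, True))     # 393216
print(organism_merit(code, 2, 1, False, True))    # 131072
```

Applied to the one-Y-cell example genome of Appendix B, `body_plan` counts a single `divide|B` and therefore one Y cell; the final `divide` of that genome is followed by an instruction and releases offspring instead.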
                              cell types that can utilize the given function
logic      change      first 10 000 generations     second 10 000 generations
function   in merit    +X          -X               +X          -X
NOT        ×2^1        D,X         D                D,X         D
NAND       ×2^1        D,X         D                D,X         D
AND        ×2^2        D,X         D                D,X         D
OR N       ×2^2        D,X         D                D,X         D
OR         ×2^3        D,X         D                D,X         D
AND N      ×2^3        D,X         D                D,X         D
XOR        ×2^4        -           -                Y           Y
NOR        ×2^4        -           -                Y           Y
EQU        ×2^5        -           -                Y           Y

Table 1: Merit increase and cell type specificity during the +X and -X simulations. Please note that X cells can utilize functions during the -X simulations but are not able to contribute to the merit of the organism, since I am using SET(D,X)*SET(Y) instead of SUM(D,X)*SET(Y).

Figure 1: Multicellularity in DISCOs. (a) The first five cell divisions of a DISCO with a genome that encodes two X cells and one Y cell. As explained in Section 1.3, if the genome contains divide instructions that are followed by a modifier, then the D cell will retain daughter cells to build a multicellular organism. The genome of this DISCO contains two divide|A instructions and one divide|B instruction. Consequently, the first three cell divisions are used to produce two X cells and one Y cell. Thereafter every further division results in cells that are released into the environment as offspring. (b) Calculating the merit of a multicellular organism. Let us assume that the genome of this DISCO encodes AND, OR, AND N, XOR, and EQU, that is, 001011101. A logic function can increase a cell's merit only if it is encoded in the genome and utilizable by the given cell type. Here, for example, the D cell can compute and utilize AND, OR, and AND N (001011000) and therefore has a merit of 1 × 2^2 × 2^3 × 2^3 = 2^8 = 256. For the first 10 000 generations, Y cells cannot utilize any logic function and therefore always have merit 1. For the second part, however, Y cells can utilize the last three logic functions (000000111). Since the genome of this organism encodes XOR and EQU, Y cells receive a merit of 1 × 2^4 × 2^5 = 2^9 = 512 for 000000101.
The -X simulations differ from the +X simulations in that X cells cannot contribute to the merit of the organism. I use SET(D,X) instead of SUM(D,X), and one D cell can compute the same set of functions as D and X cells together. We can clearly see that some somatic cells do not increase the merit of the DISCO (X cells during the -X simulations and Y cells during the first 10 000 generations). However, somatic cells constitute a cost, since they delay the time it takes to reach maturity and finally produce offspring. Hence, somatic cells that do not increase the merit of the organism are disadvantageous.

3 Developmental cost of one Y cell

I will calculate the fitness of an organism composed of n + 1 cells relative to the fitness of an organism composed of n cells. Let us first consider n = 1. The unicellular organism produces offspring with every cell division at rate r_1, i.e., D → D + D. Its fitness is given by the rate of cell division. The bicellular organism produces offspring only after it has reached maturity. In DISCOs (see Figure 1a) only one cell is able to divide. Hence, we have D → DY → DY + D. If x_1 and x_2 denote the frequency of the bicellular organisms in the unicellular and the bicellular stage of development, respectively, and r_2 the rate of cell division, then we can use the following differential equations to describe the population,

\[
\begin{aligned}
\dot{x}_1 &= -r_2 x_1 + r_2 x_2 - \Phi x_1 \\
\dot{x}_2 &= r_2 x_1 - \Phi x_2 .
\end{aligned}
\tag{1}
\]

The fitness of the bicellular organism is given by the average fitness Φ at equilibrium, which is given by the largest eigenvalue of

\[
\begin{pmatrix}
-r_2 & r_2 \\
r_2 & 0
\end{pmatrix} .
\tag{2}
\]

A short calculation shows that the eigenvalues, λ, are given by the solutions of λ(λ + r_2) − r_2^2 = 0. Equivalently, we can solve λ(λ + 1) − 1 = 0 and multiply the solution by r_2. In any case, the fitness of the bicellular organism at equilibrium is given by Φ = ((√5 − 1)/2) r_2 ≈ 0.62 r_2. Hence the fitness of the bicellular organism relative to the fitness of the unicellular organism equals 0.62 r_2/r_1.
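The eigenvalue calculation can also be done numerically. The sketch below is illustrative only: it sets the division rate to 1 and finds the positive root of λ(λ + 1)^(n−1) − 1, which for n = 2 is the polynomial λ(λ + 1) − 1 from above and for larger n is the characteristic polynomial used in the remainder of this section. The function name `fitness`, the bisection method, and the tolerance are assumptions of this example, not part of the original analysis.

```python
def fitness(n: int, tol: float = 1e-12) -> float:
    """Positive root of lam * (lam + 1)**(n - 1) - 1 = 0, found by bisection.

    The polynomial is increasing for lam > 0, equals -1 at lam = 0 and
    2**(n - 1) - 1 >= 0 at lam = 1, so exactly one root lies in (0, 1].
    """
    def f(lam: float) -> float:
        return lam * (lam + 1.0) ** (n - 1) - 1.0

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Bicellular organism: should agree with (sqrt(5) - 1) / 2, about 0.62.
print(fitness(2), (5 ** 0.5 - 1) / 2)

# Cost of one additional, unused cell in an organism of size 10.
print(fitness(11) / fitness(10))
```

With unrounded roots the last ratio comes out close to 0.93, in line with the value the text obtains from the rounded fitnesses of organisms of size 10 and 11.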
In DISCOs the rate of cell division is proportional to the merit of the organism and the efficiency of genome duplication. Since we are just interested in the effect of mutations that add an additional cell to an organism, we have r_1 ≈ r_2. Hence, mutations that turn a unicellular into a bicellular organism decrease the relative fitness to 0.62.

The calculations for an organism of size n are similar. To calculate the fitness, Φ_n, of an organism of size n, we have to calculate the largest eigenvalue of the following n × n matrix,

\[
\begin{pmatrix}
-1 & 0 & \cdots & 0 & 1 \\
1 & -1 & \ddots & & 0 \\
0 & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & 1 & -1 & 0 \\
0 & \cdots & 0 & 1 & 0
\end{pmatrix} .
\tag{3}
\]

The eigenvalues λ are given by the roots of λ(λ + 1)^(n−1) − 1. The fitness of an organism of size n + 1 relative to an organism of size n is then given by Φ_{n+1}/Φ_n. For example, the cost of one additional, unused cell in an organism of size 10 is Φ_11/Φ_10 = 0.184/0.197 = 0.934. The dotted line in Figure 2 shows Φ_n and the solid line Φ_{n+1}/Φ_n for a wide range of organism sizes.

[Figure 2: relative fitness (vertical axis, 0.0 to 1.0) plotted against organism size (horizontal axis, 1 to 500, logarithmic scale).]

Figure 2: Developmental cost of an additional, unused cell. The dotted line shows the fitness of an organism of size n relative to the fitness of a unicellular organism. The solid line shows the fitness of an organism of size n + 1 relative to the fitness of an organism of size n. Hence, it shows how deleterious mutations are that add one unused Y cell to an organism of size n.

References and Notes

1. Lenski R.E., Ofria C., Pennock R.T., & Adami C. (2003) Nature 423, 139–44.
2. Ofria C. & Wilke C.O. (2004) Artif Life 10, 191–229.
3. Goldfarb W. (2003) Deductive Logic (Hackett Publishing Company).

A List of Instructions

This section describes how instructions together with modifiers change the state of a DISCO cell. In the following, inst will refer to any of the available instructions, mod will refer to a sequence of modifiers, and C(mod) to its complement.
We have the three modifiers A, B, and C with their complements B, C, and A, respectively. The complement of a sequence of modifiers is given by the sequence of complements, for example, C(CACABB) = ABABCC.

Instructions for the replication phase

The copy instruction: The copy instruction copies the instruction/modifiers to which the read head points to the daughter strand and moves the read head to the instruction/modifier following the just copied genome elements.

The search instruction: The search instruction repositions the flow head depending on modifier sequences in the genome. We have to distinguish the following cases.
• The search instruction is followed by another instruction, i.e., we have . . . search inst . . . : In this case the flow head is moved to point at inst.
• The search instruction is followed by a sequence of modifiers, i.e., we have . . . search mod . . . : In this case we look for the complement of the modifier sequence, C(mod), in the genome. If this sequence can be found, then the flow head is moved to point at the instruction following C(mod). If this sequence cannot be found, then the flow head is moved to point to the instruction following mod.

The mov-head instruction: The mov-head instruction moves either the instruction head or the read head to the position of the flow head. The instruction head is moved to the flow head if the DISCO cell executes a mov-head that is followed by modifier A, i.e., mov-head A. If the mov-head is followed by modifier B, i.e., mov-head B, then the read head is moved to the flow head. In all other cases, the mov-head instruction does not affect the state of the automaton.

The if-label instruction: The if-label instruction can be used to skip instructions depending on the most recently copied genome element. Let us consider the sequence if-label mod inst. The inst instruction will be executed only if the most recently copied genome element is the complement of mod.
If this is not the case, then inst will be ignored and the instruction head moved forward to the next instruction. In the case of if-label inst, the automaton will execute inst if and only if the most recently copied genome element is an instruction.

The divide instruction: The divide instruction initiates cell division and terminates the replication phase. The divide instruction will successfully generate a new DISCO cell if the genome has the right length. For this work the genome cannot be shorter than 145 or longer than 155 modifiers and instructions. The generated daughter cell is either used as a somatic cell or released into the environment as offspring.

Instructions for the metabolic phase

The state changes for the metabolic phase are more straightforward than the ones for the replication phase. They only affect the merit of the organism and the data stored in the registers and the stack. In the following I will use s_i, a, b, and c to denote logic compounds and x to denote a newly generated logic variable. The first column of the following tables shows the state of the automaton before the sequence in the second column is executed. The third column shows the state of the automaton after the execution of the sequence. Not shown is the advance of the instruction head to the next instruction.

The blank instruction: The blank instruction does not change the state of the automaton. It is essentially just used as a placeholder in ancestral genomes. Usually, the blank instruction is not part of the pool of instructions from which instructions for the copy and insertion mutations are chosen. Consequently, blank instructions will eventually disappear from the genome.

The io instruction: The merit of the DISCO before the execution of the io instruction is given by m. The merit of the DISCO after testing a, b, and c for logic functions is given by m_a, m_b, and m_c, respectively. A DISCO cell is rewarded only once for a given logic function during one metabolic phase.
before                      instruction   after
merit:m  A:a B:b C:c        io A          merit:m_a  A:x B:b C:c
                            io B          merit:m_b  A:a B:x C:c
                            io C          merit:m_c  A:a B:b C:x
                            io inst       merit:m_b  A:a B:x C:c

The nand instruction: Please note that the nand instruction is the only instruction that can generate new logic compounds.

before                      instruction   after
A:a B:b C:c                 nand A        A:a|b B:b C:c
                            nand B        A:a B:b|c C:c
                            nand C        A:a B:b C:c|a
                            nand inst     A:a B:b|c C:c

The swap instruction:

before                      instruction   after
A:a B:b C:c                 swap A        A:b B:a C:c
                            swap B        A:a B:c C:b
                            swap C        A:c B:b C:a
                            swap inst     A:a B:c C:b

The push instruction:

before                      instruction   after
A:a B:b C:c                 push A        stack: a, s1, s2, s3, . . .
stack: s1, s2, . . .        push B        stack: b, s1, s2, s3, . . .
                            push C        stack: c, s1, s2, s3, . . .
                            push inst     stack: b, s1, s2, s3, . . .

The pop instruction:

before                      instruction   after
A:a B:b C:c                 pop A         A:s1 B:b C:c   stack: s2, s3, . . .
stack: s1, s2, . . .        pop B         A:a B:s1 C:c   stack: s2, s3, . . .
                            pop C         A:a B:b C:s1   stack: s2, s3, . . .
                            pop inst      A:a B:s1 C:c   stack: s2, s3, . . .

B Example Genomes

Encoding Replication

The following shows a sequence of the state changes that lead to genome replication. During the replication phase, state changes affect only the position of the read, flow, and instruction head. These positions are indicated by underlines, overlines, and bold font, respectively, i.e., read head, flow head, and instruction head.

• io|search|copy|if-label|C|A|divide|mov-head|A|B
  daughter strand:
  search moves the flow head to the copy instruction.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
  daughter strand:
  copy copies the io instruction to the daughter strand and moves the read head forward.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
  daughter strand: io
  The complement of C|A is given by A|B. Since the most recently copied element is io and not A|B, if-label ignores divide and advances the instruction head to the mov-head instruction.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
  daughter strand: io
  mov-head moves the instruction head to the position of the flow head.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
  daughter strand: io
  copy copies the search instruction to the daughter strand and moves the read head forward.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
  daughter strand: io|search
  This loop continues until the automaton copies A|B, the last genome element.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
  daughter strand: io|search|copy|if-label|C|A|divide|mov-head
  copy copies A|B to the daughter strand and moves the read head forward.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
  daughter strand: io|search|copy|if-label|C|A|divide|mov-head|A|B
  Now the most recently copied genome element is identical to the complement of C|A, and if-label does not skip the divide instruction.
• io|search|copy|if-label|C|A|divide|mov-head|A|B
  daughter strand: io|search|copy|if-label|C|A|divide|mov-head|A|B
  The divide instruction ends the replication phase and the daughter strand is used to build a new cell.

Encoding NAND and AND

This sequence creates two logic compounds. The first one, (a|b)|(a|b), is equivalent to AND and the second one, a|b, is equivalent to NAND, where a and b are logic variables.
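The register semantics listed in Appendix A can be exercised on the NAND/AND genome io|io|C|nand|push|pop|C|nand|io|io|C with a small interpreter. This is a hedged sketch, not the DISCO implementation. It assumes a single pass over the genome (no circular processing and no limit of m instructions), represents logic variables as random 64-bit integers, skips nand and push on empty registers, and tests a replaced compound against every ordered pair of variables supplied so far. The names `run_metabolic` and `functions_of` are hypothetical.

```python
import random

MASK = (1 << 64) - 1
COMPLEXITY = {"NOT": 1, "NAND": 1, "AND": 2, "OR_N": 2, "OR": 3,
              "AND_N": 3, "XOR": 4, "NOR": 4, "EQU": 5}
NEXT = {"A": "B", "B": "C", "C": "A"}  # second operand of nand and swap


def functions_of(value, inputs):
    """Logic functions that `value` equals for some ordered pair of inputs."""
    found = set()
    for a in inputs:
        if value == ~a & MASK:
            found.add("NOT")
        for b in inputs:
            if a == b:
                continue
            tests = {"NAND": ~(a & b), "AND": a & b, "OR_N": a | ~b,
                     "OR": a | b, "AND_N": a & ~b, "XOR": a ^ b,
                     "NOR": ~(a | b), "EQU": ~(a ^ b)}
            found |= {name for name, t in tests.items() if value == t & MASK}
    return found


def run_metabolic(genome, seed=0):
    """Execute the genome once; return (merit, set of rewarded functions)."""
    rng = random.Random(seed)
    reg = {"A": None, "B": None, "C": None}
    stack, inputs, rewarded, merit = [], [], set(), 1
    elems = genome.split("|")
    i = 0
    while i < len(elems):
        inst = elems[i]
        # A modifier directly after the instruction selects the register;
        # without one, register B is used (the "inst" rows of the tables).
        if i + 1 < len(elems) and elems[i + 1] in reg:
            r, i = elems[i + 1], i + 2
        else:
            r, i = "B", i + 1
        if inst == "io":
            if reg[r] is not None:  # test the compound that gets replaced
                for name in functions_of(reg[r], inputs) - rewarded:
                    merit *= 2 ** COMPLEXITY[name]  # rewarded only once
                    rewarded.add(name)
            reg[r] = rng.getrandbits(64)  # new logic variable
            inputs.append(reg[r])
        elif inst == "nand":
            x, y = reg[r], reg[NEXT[r]]
            if x is not None and y is not None:
                reg[r] = ~(x & y) & MASK
        elif inst == "swap":
            reg[r], reg[NEXT[r]] = reg[NEXT[r]], reg[r]
        elif inst == "push":
            if reg[r] is not None:
                stack.insert(0, reg[r])
        elif inst == "pop":
            if stack:
                reg[r] = stack.pop(0)
    return merit, rewarded


merit, rewarded = run_metabolic("io|io|C|nand|push|pop|C|nand|io|io|C")
print(merit, sorted(rewarded))  # 8 ['AND', 'NAND']
```

The final merit of 8 corresponds to 1 × 2^2 × 2^1: the AND compound is rewarded when the second-to-last io replaces register B, and the NAND compound when the last io replaces register C.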
genome: io|io|C|nand|push|pop|C|nand|io|io|C

executed   A    B              C        stack    merit
(start)    A:   B:             C:       stack:       merit = 1
io         A:   B: a           C:       stack:       merit = 1
io C       A:   B: a           C: b     stack:       merit = 1
nand       A:   B: a|b         C: b     stack:       merit = 1
push       A:   B: a|b         C: b     stack: a|b   merit = 1
pop C      A:   B: a|b         C: a|b   stack:       merit = 1
nand       A:   B: (a|b)|(a|b) C: a|b   stack:       merit = 1
io         A:   B: c           C: a|b   stack:       merit = 1 × 2^2
io C       A:   B: c           C: d     stack:       merit = 1 × 2^2 × 2^1

Encoding NAND, AND, and Replication

This sequence is pieced together from the two sequences above. The first part encodes the two logic functions NAND and AND, and the second part encodes genome replication.

io|io|C|nand|push|pop|C|nand|io|io|C|search|copy|if-label|C|A|divide|mov-head|A|B

Encoding NAND, AND, Replication, and One Y Cell

This sequence is a derivative of the previous sequence. It contains a divide|B and therefore encodes a multicellular organism with one Y cell. Please note that the if-label|C before the divide is required to prevent a premature initiation of cell division.

io|io|C|nand|push|if-label|C|divide|B|pop|C|nand|io|io|C|search|copy|if-label|C|A|divide|mov-head|A|B

doi: 10.1111/j.1420-9101.2007.01496.x

Erratum

A publication error led to an error being introduced into Fig. 1(b) of Willensdorfer (2008) in the print edition of the journal (Journal of Evolutionary Biology, vol. 21(1), p. 106). A corrected version of this figure is reproduced below. This figure is correct in the online edition of the journal.

Fig. 1 Multicellularity in digital self-replicating cellular organisms (DISCOs). (a) The first five cell divisions of a DISCO with a genome that encodes for two X cells (green shaded regions) and one Y cell (blue shaded region). The first three cell divisions produce the somatic X and Y cells.
Every further division produces offspring that are released into the environment. (b) The merit of this multicellular organism during the first and second 10 000 generations of the −X and +X simulations. The genome encodes for five (of nine) logic functions as indicated by the nine-digit binary sequence. D, X and Y cells can utilize these functions (second column) only according to their cell type specificity (first column) and receive a corresponding merit (third column), which is used to calculate the merit of the organism (fourth column). Note that Y cells are not able to increase the merit of the organism during the first 10 000 generations and that X cells are not able to increase the merit of the organism during the −X simulations. Cells that do not increase merit are disadvantageous, as they increase the number of cell divisions that are required to reach maturity. A more detailed description is available in the Supplementary material.

Reference

Willensdorfer, M. (2008). Organism size promotes the evolution of specialized cells in multicellular digital organisms. J. Evol. Biol. 21: 104–110.

© 2008 THE AUTHOR. J. EVOL. BIOL. 21 (2008) 646. JOURNAL COMPILATION © 2008 EUROPEAN SOCIETY FOR EVOLUTIONARY BIOLOGY