“prokaryote” and the polyphyletic origin of genes

HYPOTHESIS
The indefinable term “prokaryote” and the polyphyletic origin of genes
MASSIMO DI GIULIO
Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources, CNR, Via P.
Castellino, 111, 80131 Naples, Italy
email: [email protected]
Running title: Prokaryote term and gene polyphyletic origin
Keywords: Progenote - LUCA - Evolutionary stages – Early Evolution of Life
Abstract
An analysis has been performed of implications existing between the presence/absence of the
evolutionary stage of the Prokaryote, that is to say, the presence/absence of common
characteristics between archaea and bacteria, and the monophyletic/polyphyletic origin of
genes of proteins. Thereafter, a theorem stating that: “the polyphyletic origin of proteins
would imply absence of common characteristics between bacteria and archaea and therefore
the lack of the evolutionary stage of the Prokaryote, and vice versa that the indefinable
Prokaryote stage implies a polyphyletic origin of proteins”, has been made and validated. The
conclusion is that given the absence of true in common characteristics between archaea and
bacteria, the origin of genes of proteins should have been polyphyletic.
The non-biological meaning of the Prokaryote term and the polyphyletic origin of genes: an
introduction
The monophyletic origin of genes seems to be the only sensible explanation for their
existence. This would derive, for example, from the observation that we can build plausible
multiple alignments of orthologous proteins taken from the three domains of life. In a more
explicit way, the fact that it is possible to align orthologous proteins from the three domains
of life would seem to imply that these proteins were already present at the Last Universal
Common Ancestor (LUCA) stage. Therefore, they were simply inherited in the three domains
of life, thus defining a their monophyletic origin. This reasoning seems to be sensible and
therefore would discredit considerably the polyphyletic hypothesis as an explanation for the
origin of genes of proteins. However, it is possible, at least in principle, that at the LUCA
stage protein modules were present, active, and codified by means of pieces of RNA that
were assembled by trans-splicing to form mRNAs. The translation of these mRNAs resulted
in proteins whose genes (DNA) actually evolved only later, that is to say, only after the
formation of the main phyletic lines (Di Giulio 2008a). In other words, at the LUCA stage
protein modules might have existed and codified by means of pieces of RNA. The one that
we observe today in the multiple alignments of orthologous proteins is substantially and
primarily an alignment of modules. These modules might have been different in the different
phyletic lines of divergence (also if homologues), because assembled by trans-splicing only
late on, during or after the formation of the three domains of life (Di Giulio 2006b, 2008a,
2008b, 2008d), to form proteins that we align today. In this way we would be able to explain
why we can obtain multiple alignments of the same protein taken from organisms from the
three domains of life, also under the hypothesis that the model for their evolution has been
the polyphyletic one and not only that monophyletic (Di Giulio 2006b, 2008a, 2008b, 2008d).
There are several observations that bring us to consider that the origin of genes could
have been polyphyletic. Firstly, it has been suggested that the origin of DNA might have been
relatively recent, that is to say, only after that the main plyletic lines were established
(Mushegian and Koonin 1996; Leipe et al. 1999; Aravind et al. 1999; Poole and Logan 2005;
Forterre 2005, 2006). If this would have been true then the origin of genes must have been
necessarily polyphyletic, having DNA, for hypothesis, evolved late on and independently in
the main phyletic lines (Di Giulio 2008a). Furthermore, there are strong observations that
bring us to consider that the origin of genes of tRNAs has been polyphyletic (Di Giulio 1999,
2004, 2006a, 2006b, 2008a, 2008b, 2008c, 2008d, 2008e, 2009b 2012, 2013a, 2013b).
Indeed, given that the hypothesis of ancestrality of split tRNA genes of Nanoarchaeum
equitans has been strongly corroborated (Di Giulio 2004, 2005, 2006a, 2006b, 2008a, 2008b,
2008c, 2008d, 2008e, 2009a, 2009b, 2011, 2012a, 2012b, 2014; Branciamore and Di Giulio
2011). The presence of split tRNA genes both at the LUCA stage and in actual organisms
should then prove a late assembly of the genes of tRNAs, in the main phyletic lines. In other
words, their origin should have been polyphyletic (Di Giulio 1999, 2004, 2006b, 2012a,
2013a, 2013b). There are also other observations that consider that the origin of genes of
proteins might have been polyphyletic (Lupas et al. 2001; Di Giulio 1999, 2004, 2006b,
2012a, 2013a, 2013b). In conclusion, although the polyphyletic hypothesis for the origin of
genes has never been taken in account, there are several observations and explanations that
bring us to consider that such origin is less absurd than commonly thought (Di Giulio 1999,
2004, 2006b, 2012, 2013a, 2013b).
There is a debate on the Prokaryote term, that is to say if it has, or does not have, an
actual biological meaning (Pace 2006, 2009; Martin and Koonin 2006; Sapp 2006; Dolan and
Margulis 2007; Whitman 2009; Doolittle and Zhaxybayeva 2009, 2013; Fox-Keller 2010; Di
Giulio 2015). Indeed, although analyses have reconstructed a complex LUCA, for example,
composed of genes and proteins comparable to that of current organisms (Mushegian and
Koonin 1996; Ouzounis and Kyrpides 1996; Ouzounis et al. 2006; Mat et al. 2008; Tuller et
al. 2010). We are, nevertheless, unable to define in a likewise clear and immediate way what
in reality is a prokaryote – i.e. what characteristics it had - as, in contrast, these analyses
would easily seem to imply (Di Giulio 2015). That is to say, we are not essentially able to
associate to the Prokaryote word a single characteristic, for instance, of the cellular level that
is really in common between bacteria and archaea. This would seem to deny the very
existence of the Prokaryote stage because not appreciable and not supported by the presence
of at least a character that was in common between archaea and bacteria (Di Giulio 2015; Fig.
1b).
It has been maintained that hundreds of genes, in particular the operational genes, are
commonly present in archaea and bacteria and they are generally absent in eukaryotes
(Doolittle and Zhaxybayeva 2013). These genes would identify a common ancestor between
them (Doolittle and Zhaxybayeva 2013). But this would not clarify the meaning of the
Prokaryote term because the existence of a common ancestor between bacteria and archaea indicated from the presence of in common proteins between them - would not imply that the
evolutionary stage of the Prokaryote has existed. This is because these proteins might have
been present – as seen above – under different forms, for example, under form of modules
and assembled only late on, in the main phyletic lines. This implies a polyphyletic origin for
proteins. Therefore the not existence of the evolutionary stage of the Prokaryote (Fig. 1a), but
only that of a progenote (Fig. 1b), given the rapid dynamism that would seem involved in the
evolution of these proteins (Di Giulio 2000, 2001, 2011, 2015; see below). It would result
therefore that proteins would not be a good indicator of an evolutionary stage as for instance,
the cellular membrane or the organization of a chromosome might instead be it. This is
because proteins might not represent a clear cellular level. They would be a kind of character
that might reflect both the presence of an evolutionary stage and its absence, given the
evolutionary dynamism that would seem to be associated to their structure.
Then, the set of problems would seem that to understand if the multiple alignments of
orthologous proteins define “complete” proteins in the common ancestor, that is to say, they
define that the evolutionary stage of the Prokaryote really existed (Fig. 1a). Or if, on the
contrary, these multiple alignments would imply “incomplete” proteins, i.e. still in rapid
evolution, that is to say, with a mode and tempo of evolution completely different from the
current one. The latter scenario would imply therefore the absence of the evolutionary stage
of the Prokaryote, and instead the presence of the one of the progenote (Fig. 1b). Multiple
alignments of orthologous proteins have been analyzed and it has been concluded that these
might support, or at least to be compatible, with a polyphyletic origin of genes, and thus the
absence of the evolutionary stage of the Prokaryote (Di Giulio, 2008a). It being understood
that multiple alignments of orthologous proteins from the domains of life should be read also
in the light of a polyphyletic origin of genes. Here, I analyze this kind of set of problems,
considering our incapacity to define with accuracy and immediacy the Prokaryote term and
its relationships with the polyphyletic origin of proteins.
[I assume that the domains of life are only two - Bacteria and Archaea – and not three,
that is to say, eukaryotes are derived from some archaea as now there is the tendency to
consider (Williams et al. 2012, 2013; Spang et al. 2015). However, the conclusions seem true
also if eukaryotes would represent a third domain of life. Furthermore, all crucial terms used
in this work have been previously defined (Woese 1987; Woese and Fox 1977; Doolittle and
Brown 1994; Pace 2006, 2009; Di Giulio 2000, 2001, 2006b, 2011, 2015)].
The non-biological meaning of the Prokaryote term does imply a polyphyletic origin of genes
and vice versa: a theorem
We start to reason under the hypothesis that the origin of proteins has been polyphyletic. This
should imply, by definition, an “immaturity” of proteins at the stage of LUCA. The absence
of common characteristics between bacteria and archaea (Fig. 1b) would be a simple
consequence of that. Indeed, the immaturity of proteins should imply, that is to say would be
equivalent to, an immaturity of all cellular structures that, therefore, might not have been in
common between bacteria and archaea as still rapidly evolving (Fig. 1b). Therefore, such to
not define the evolutionary stage of the Prokaryote (genote) (Fig. 1a), but only that of a
progenote (Fig. 1b). We can see that a polyphyletic origin of proteins should imply the
absence of common structures between archaea and bacteria, and therefore our incapacity to
define the Prokaryote term in a biological sense. Does exist the possibility that immature
proteins, that is to say still in rapid evolution, can instead go to define complete structures,
i.e. for instance, belonging to and defining a determined cellular level and thus an
evolutionary stage? It seems to me that this is not possible because if proteins are immature,
that is in rapid evolution, then also the structures of corresponding cellular level should be
rapidly evolving. Consequently it seems to me remote the possibility that instead a structure
of this “cellular” level can have been passed to bacteria and archaea, as still rapidly evolving,
that is to say, with a tempo and mode typical of a progenote and not of a genote (Woese
1987; Woese and Fox 1977; Doolittle and Brown 1994; Di Giulio 2000, 2001, 2006b, 2011,
2015).
Is the opposite also true? That is to say, the absence of in common characteristics
between bacteria and archaea would imply an immaturity of proteins and therefore a
polyphyletic origin of these or not? The answer seems to be positive. If proteins would have
been in an advanced stage of “maturation”, this would have imposed evolved cellular
structures, i.e. complete. These structures would have been in common between bacteria and
archaea, and this, instead, is not observable. Indeed, the absence of common characteristics
between archaea and bacteria is first of all absence of common cellular structures. This can
only reflect the presence of proteins still in rapid evolution, i.e. premature, at the LUCA’s
stage, and thus a polyphyletic origin of these latter. Whereas the absence of common
characteristics between archaea and bacteria cannot reflect the presence of proteins
completely evolved, i.e. mature. These proteins would have implied the very contrary. That is
to say, the presence at the LUCA’s stage of at least some complete characters that would
have become in common between bacteria and archaea, and this is not true for hypothesis.
In conclusion, it seems that all this can be included in the following theorem:
“the polyphyletic origin of proteins does imply absence of common characteristics
between bacteria and archaea, and therefore the absence of the evolutionary stage of
Prokaryote and vice versa, the indefinable evolutionary stage of the Prokaryote does
imply a polyphyletic origin of proteins”. That is to say, the theorem would make
equivalent the following points: (1) the absence of in common characteristics between
archaea and bacteria, (2) the absence of the evolutionary stage of Prokaryote i.e. the
presence of that of progenote, and (3) the polyphyletic origin of genes of proteins. The
highly significant biological consequence would be that the origin of proteins would
have followed the polyphyletic model and not the usually accepted monophyletic one.
Acknowledgments
This work was carried out in the Department of Philosophy and History of Science, Faculty
of Science, Charles University in Prague, Czech Republic, during my stay in this university,
financed by the short-term mobility program of the National Research Council of Italy. I
would like to thank Prof. Jaroslav Flegr for discussions and cordial hospitality.
References
Aravind I., Walkers D. R., Koonin E. K. 1999 Conserved domains in DNA repair proteins
and evolution of repair systems. Nucleic Acids Res. 27, 1223–1242.
Branciamore S., Di Giulio M. 2011 The presence in tRNA molecule sequences of the double
hairpin, an evolutionary stage through which the origin of this molecule is thought to have
passed. J. Mol. Evol. 72, 352-363.
Di Giulio M. 1999 The non-monophyletic origin of tRNA molecule. J. Theor. Biol. 197,
403–414.
Di Giulio M. 2000 The universal ancestor lived in a thermophilic or hyperthermophilic
environment. J. Theor. Biol. 203, 203–213.
Di Giulio M. 2001 The non-universality of the genetic code: the universal ancestor was a
progenote. J. Theor. Biol. 209, 345–349.
Di Giulio M. 2004 The origin of the tRNA molecule: implications for the origin of protein
synthesis. J. Theor. Biol. 226, 89–93.
Di Giulio M. 2006a Nanoarchaeum equitans is a living fossil. J. Theor. Biol. 242, 257–260.
Di Giulio M. 2006b The non-monophyletic origin of the tRNA molecule and the origin of
genes only after the evolutionary stage of the Last Universal Common Ancestor (LUCA). J.
Theor. Biol. 240, 343–352.
Di Giulio M. 2008a The origin of genes could be polyphyletic. Gene 426, 39–46.
Di Giulio M. 2008b The split genes of Nanoarchaeum equitans are an ancestral character.
Gene 421, 20–26.
Di Giulio M. 2008c Permuted tRNA genes of Cyanidioschyzon merolae, the origin of the
tRNA molecule and the root of the Eukarya domain. J. Theor. Biol. 253, 587–592.
Di Giulio M. 2008d Split genes, ancestral genes. In: Tze-Fei Wong J., Lazcano A. (Eds)
Prebiotic Evolution and Astrobiology. Landes Bioscience Pub- lisher, Texas, USA, chapter
13.
Di Giulio M. 2008e Transfer RNA genes in pieces are an ancestral characther. EMBO Rep. 9,
820.
Di Giulio M. 2009a A comparison among the models proposed to explain the origin of the
tRNA molecule: a synthesis. J Mol Evol 69, 1–9.
Di Giulio M. 2009b Formal proof that the split genes of tRNAs of Nanoarchaeum equitans
are an ancestral character. J. Mol. Evol. 69, 505–511.
Di Giulio M. 2011 The Last Universal Common Ancestor (LUCA) and the Ancestors of
Archaea and Bacteria were Progenotes. J. Mol. Evol. 72, 119–126.
Di Giulio M. 2012a The origin of the tRNA molecule: independent data favor a specific
model of its evolution. Biochimie 94, 1464-1466.
Di Giulio M. 2012b The 'recently' split transfer RNA genes may be close to merging the two
halves of the tRNA rather than having just separated them. J. Theor. Biol. 310, 1-2.
Di Giulio M. 2013a A polyphyletic model for the origin of tRNAs has more support than a
monophyletic model. J. Theor. Biol. 318, 124–128.
Di Giulio M. 2013b Is Nanoarchaeum equitans a paleokaryote? J. Biol. Res. 19, 83–88.
Di Giulio M. 2014 The split genes of Nanoarchaeum equitans have not originated in its
lineage and have been merged in another Nanoarchaeota: a reply to Podar et al. J. Theor.
Biol. 349, 167-169.
Di Giulio M. 2015 The Non-Biological Meaning of the Term ‘‘Prokaryote’’ and Its
Implications. J. Mol. Evol. 80, 98–101.
Dolan M. F., Margulis L. 2007 Advances in biology reveal truth about prokaryotes. Nature
445, 21.
Doolittle W. F., Brown J. R. 1994 Tempo mode the progenote and the universal root. Proc.
Natl. Acad. Sci. U. S. A. 91, 6721–6728.
Doolittle W. F., Zhaxybayeva O. 2009 On the origin of the prokaryotic species. Genome Res.
19, 744–756.
Doolittle W. F., Zhaxybayeva O. 2013 What is a prokaryote? E Rosenberg et al. (eds) The
Prokaryotes—Prokaryotic Biology and Symbiotic Associations, Springer, Berlin Heidelberg
pp. 21–37.
Forterre P. 2005 The two ages of the RNA world, and the transition to the DNA world: a
story of viruses and cells. Biochimie 87, 793–803.
Forterre P. 2000 Three RNA cells for ribosomal lineages and three DNA virus to replicate
their genomes: a hypothesis for the origin of cellular domain. Proc. Natl. Acad. Sci. U. S. A.
103, 3669–3674.
Fox-Keller E. 2010 The mirage of a space between nature and nurture. Duke Univesity Press,
Durham.
Leipe D. D., Aravind L., Koonin E. V. 1999 Did DNA replication evolve twice
independently? Nucleic Acids Res. 27, 3389–3401.
Lupas A. N., Ponting C. P., Russell R. B. 2001 On the Evolution of Protein Folds: Are
Similar Motifs in Different Protein Folds the Result of Convergence, Insertion, or Relics of
an Ancient Peptide World? J. Struct. Biol. 134, 191–203.
Martin W., Koonin E. V. 2006 A positive definition of prokaryotes. Nature 442, 868.
Mat W. K., Xue H., Wong J. T. 2008 The genomics of LUCA. Front Biosci. 3, 5605–5613.
Mushegian A. R., Koonin E. V. 1996 A minimal gene set for cellular life derived by
comparison of complete bacterial genomes. Proc. Natl. Acad. Sci. U. S. A. 93, 10268–10273.
Ouzounis C., Kyrpides N. 1996 The emergence of major cellular processes in evolution.
FEBS Lett. 390, 119–123.
Ouzounis C. A., Kunin V., Darzentas N., Goldovsky L. 2006 A minimal estimate for the gene
content of the last universal common ancestor. Res. Microbiol. 157, 57–68.
Pace N. R. 2006 Time for a change. Nature 441, 289.
Pace N. R. 2009 Rebuttal: the modern concept of prokaryote. J. Bacteriol. 191, 2006–2007.
Poole A. M., Logan D. T. 2005 Modern mRNA proofreading and repair: clues that the last
universal ancestor possessed an RNA genome? Mol. Biol. Evol. 22, 1444–1455.
Sapp R. 2006 Two faces of the prokaryote concept. Intern. Microbiol. 9, 163-172.
Spang A., Saw J. H., Jørgensen S. L., Zaremba-Niedzwiedzka K., Martijn J., Lind A. E., van
Eijk R., Schleper C., Guy L., Ettema T. J. 2015 Complex archaea that bridge the gap between
prokaryotes and eukaryotes. Nature 521, 173-179.
Tuller T., Birin H., Gophna U., Kupiec M., Ruppin E. 2010 Reconstruction ancestral gene
content by coevolution. Genome Res. 20, 122–132.
Whitman W. B. 2009 The modern concept of the Procaryote. J. Bacteriol. 191, 2000–2005.
Widmann J., Di Giulio M., Yarus M., Knight R. 2005 tRNA creation by hairpin duplication.
J. Mol. Evol. 4, 524-530.
Williams T. A., Foster P. G., Nye T. M., Cox C. J., Embley T. M. 2012 A congruent
phylogenomic signal places eukaryotes within the Archaea. Proc. R. Soc. Lond. B 279, 4870–
4879.
Williams T. A., Foster P. G., Cox C. J., Embley T. M. 2013 An archaeal origin of,
Eukaryotes supports only two primary domains of life. Nature 504, 231–236.
Woese C. R. 1987 Bacterial evolution. Microbiol Res. 51, 221–271.
Woese C. R., Fox G. E. 1977 The concept of cellular evolution. J. Mol. Evol. 10, 1–6.
Received 9 May 2016, in revised form 6 July 2016; accepted 23 August 2016
Unedited version published online: 25 August 2016
Figure 1. Three possible evolutionary branching from the universal ancestor (LUCA) are
showed. (a) The universal ancestor is a Prokaryote because the characters (geometrical
forms) are “identical” in the bacterial and archaeal domains. (b) LUCA is a progenote
because the characters are completely different between the bacterial and archaeal domains;
and (c) only a few characters are identical in the bacterial and archaeal domains, while the
most part of characters are different in the two domains of life. This would define a “pseudoProkaryote” condition or better “pseudo-progenote“ condition of LUCA that is indicated as
possibility but it is not discussed in the text.