AMER. ZOOL., 29:487-499 (1989)
Evolution of Eukaryotic Microorganisms and Their
Small Subunit Ribosomal RNAs1
MITCHELL L. SOGIN
Division of Molecular and Cellular Biology in Department of Pediatrics,
National Jewish Center for Immunology and Respiratory Medicine, 1400 Jackson Street, Denver, Colorado 80206 and
Department of Microbiology and Immunology, University of Colorado Health Sciences Center,
Denver, Colorado 80206
SYNOPSIS. Traditional views about the origin of eukaryotes and relationships between
major "kingdoms" reflect interpretations of the fossil record and comparisons of phenotypic characters. This perspective is challenged by phylogenetic frameworks inferred
from comparisons of macromolecular sequences which share a common ancestry. Similarities between ribosomal RNA genes demonstrate that instead of being relatively recent
biological inventions, eukaryotes represent a discrete lineage that may be as old as the
archaebacterial and eubacterial lines of descent. The diversity of protistan small subunit
rRNA sequences exceeds that seen within the entire prokaryotic world. The earliest
branching lineages include diplomonads, microsporidians, and tritrichomonads. Yet, other
major protistan groups diverged relatively late in the evolutionary history of nucleated
cells. Rather than being a concise evolutionary assemblage, the Protista should be regarded
as a collection of paraphyletic lineages. In contrast, the Fungi, Plantae, and Animalia are
independent monophyletic groupings. They originated nearly simultaneously during a
relatively recent period characterized by a massive diversification of forms. This novel
view of eukaryotic evolution suggests that a reliance upon large phenotypic differences
in delineating kingdoms can obscure true genealogical relationships. Instead of dividing
eukaryotes into four or more major divisions, they should be considered as a single
kingdom that encompasses a progression of independently diverging lineages.
1 From the Symposium on Science as a Way of Knowing—Cell and Molecular Biology presented at the Annual
Meeting of the American Society of Zoologists, 27-30 December 1988, at San Francisco, California.
INTRODUCTION
The planet Earth is about 4.7 billion years old. The first life form appeared 3.5-4.2
billion years ago and it consisted of microscopic, relatively simple cells. Over the
ensuing eons of years, primitive cells evolved into the ten million different
species which represent the existing biological diversity. All organisms including
animals, plants, fungi, and an untold collection of microbial species have their common
ancestral roots within these earliest life forms. Insights into historical events
that dominated the evolution of this biosphere are linked to the understanding of
phylogenetic relationships between the major groups of extant organisms.
Early systematic biologists relied upon the largest measurable differences in
macroscopic structural organization and modes of life style to place living organisms into
the plant or animal world. With the introduction of Darwinian concepts and the
discovery of the microbial world, taxonomists were forced to revise their classification
schemes into phylogenies or "natural systems" which reflected the historical evolution
of the organisms being considered. Many of the newly discovered organisms
could not be unequivocally classified as plants or animals, e.g., organisms that were
both motile and photosynthetic or nonmotile and non-photosynthetic. This led to
proposals for new kingdoms, most significantly from Haeckel (1892) who placed all
microbial forms into the kingdom Protista.
It was not until the 1960s that the eukaryote/prokaryote dichotomy (as originally
described by Chatton in 1937) became widely accepted (Stanier and van Niel,
1962; Stanier, 1970). More recently through the use of molecular techniques,
Woese discovered the Archaebacteria which represent a third major line of
evolutionary descent (Woese and Fox, 1977).
In a five kingdom scheme that emphasizes a dichotomy between prokaryotes and
eukaryotes, Whittaker (1969) has further divided the eukaryotic lineage into four
major groups. Early support for this classification scheme by Margulis (1974) has
contributed to its widespread acceptance.
A "bubble gram" depicting the five kingdom relationships is shown in Figure 1
(Margulis and Schwartz, 1982). In addition
to the Monera, which would include the
Eubacteria and Archaebacteria as described
by Woese, there are four eukaryotic kingdoms: the Animalia (motile, heterotrophic,
lacking cell walls, multicellular with two or
more kinds of tissue), the Plantae (nonmotile, photo-autotrophic, cell walls made
of cellulose, multicellular with two or more
kinds of tissue), the Fungi (non-motile,
osmotrophic type of heterotrophic nutrition, chitinous cell walls, a single kind of
vegetative tissue), and the Protoctista (or
Protista) (both motile and non-motile, all
modes of nutrition, generally unicellular).
With the development of analytical tools
for defining subcellular features the debate
over the number of eukaryotic kingdoms
that represent independent lineages has
intensified. For example, in some phylogenies protists are treated as a cohesive
grouping (Corliss, 1984), while in others
they are divided into multiple kingdoms
(Cavalier-Smith, 1981). The phenotypic
variation within the Protista far exceeds
that seen in other eukaryotic kingdoms.
Instead of being a related phylogenetic
grouping, individual protist classes may
represent independent lineages, each worthy of "kingdom level" status.
There is a general consensus that protists represent the oldest and most diverse
of the eukaryotic kingdoms. However, the
identity of early diverging lineages as well
as protistan groups ancestral to plants, animals, and fungi is still an unresolved issue.
Soft-bodied protists are not well represented in the fossil record and because of
their enormous biochemical, cytological,
and physiological variation, there is no
agreement about which characters are most
useful for making phylogenetic inferences.
Even if there were a consensus about an
effective set of phylogenetic markers, the
extent of similarity between groups could
not be reliably determined from comparisons of traditional evolutionary markers;
the rate of genotypic change is not necessarily closely linked to phenotypic variation.
Establishing "natural systems" based
upon resemblance is useful for the taxonomist but this method alone is not sufficient
to reconstruct the evolutionary history of
distantly related organisms. A phylogenetic framework based upon objective,
quantitative measurements is needed so
that the number of major eukaryotic groups
and relationships between them can be settled.
MOLECULAR EVOLUTIONARY MARKERS
As an alternative to traditional phenotypic markers, quantitative phylogenetic
frameworks can be established through
comparisons of macromolecular sequences
which share a common ancestry (Zuckerkandl and Pauling, 1965). If a given molecular phylogeny proves to be sufficiently
robust, it can be used to evaluate the utility
of various non-molecular traits for inferring evolutionary relationships. Given the
current state of biotechnology and information processing capabilities, it is not feasible to determine and compare complete
genome sequences from representatives of
each of the major eukaryotic lineages.
However, molecular phylogenies can be
inferred from sequence comparisons
between one or a few genes if they satisfy
the following restrictions: 1) The compared molecules (or genes that define the
molecules) must be evolutionary homologues and easy to isolate; 2) The sequences
must change sufficiently slowly to allow
measurements of even the largest genealogical distances between the compared
organisms; 3) The genes defining the macromolecules must not undergo transfer
between species (if lateral transfer has
occurred, the inferred phylogeny will be
that of the genes, not the organisms); and
4) The macromolecules must consist of a
statistically significant number of independently variable sites. (Ideally they should
contain several functional domains which
can be used to test for convergent evolution. Identical trees inferred from comparisons of sequence domains that can vary
independently must reflect divergent evolution from a common ancestor; it is
unlikely that functionally separate macromolecular sequences will converge at the
same rate during evolution and produce
identical phylogenetic frameworks.) Ribosomal RNAs (rRNAs) or their coding
regions satisfy these requirements and have
been widely used to study both prokaryotic
and eukaryotic phylogenetic relationships
(Woese, 1987).
FIG. 1. A phylogeny of life on Earth based on the Whittaker five-kingdom system and the symbiotic theory
of the origin of eukaryotic cells. (Reproduced from Five Kingdoms: An Illustrated Guide to the Phyla of Life on
Earth by Margulis and Schwartz, 1982)
RNA COMPONENTS OF THE
TRANSLATION APPARATUS
The ribosome is a multi-functional,
ribonucleoprotein complex which mediates
the translation of messenger RNAs into
proteins. All ribosomes can be dissociated
into two unequal subunits (Wittmann,
1982). In prokaryotes, the smaller subunit
contains a single large rRNA that is generally 16S in size. Its evolutionary homologue in eukaryotes can be as large as 19S
(Noller, 1984; Gunderson and Sogin,
1986). The larger ribosomal subunit is normally twice as large as the small subunit
and in prokaryotes it typically contains two
RNA molecules, 5S and 23S in size. (In a
few organisms, the primary 23S rRNA
transcript is cleaved into two smaller molecules [Marrs and Kaplan, 1970].) In the
eukaryotic large ribosomal subunit, there
are at least three RNA species. These
include the 5S and 28S RNAs plus a third
molecule, the 5.8S rRNA which is evolutionarily homologous to the 5' terminal
region of the prokaryotic 23S RNA. In
prokaryotes, all three RNA species are
embodied within a single primary transcript which, via a series of RNA processing events, is reduced to the size of the
mature ribosomal RNA species (Pace,
1973). In eukaryotes, the 18S, 5.8S and
28S RNAs are defined by a contiguous coding region (Attardi and Amaldi, 1970)
which is evolutionarily homologous to the
prokaryotic 16S-23S-5S rRNA cistron. The
eukaryotic 5S rRNA coding region is usually not linked to the other rRNA genes
and is transcribed by a different class of
RNA polymerases (Reeder, 1974).
STRUCTURAL ANALYSIS OF
RIBOSOMAL RNAs
The 5S and 5.8S rRNA data base comprises the largest collection of macromolecular sequences which share a common ancestry. More than 700 prokaryotic
and eukaryotic species are represented.
Strategies for direct sequence analyses of
these rRNA molecules take advantage of
their relatively small size (fewer than 120
nucleotide positions for 5S and 170 for
5.8S rRNAs). However, the utility of small
rRNA species in defining very close or very
distant evolutionary relationships is diminished by the minimal number of positions
that can independently vary. These molecules are comprised of a few well constrained helical domains. When a helical
structure is altered during evolution, several "non-clock-like" changes can be introduced and so, an overestimate of evolutionary distance will result. Conversely, in
comparisons of distantly related organisms, the majority of mutable sites in the
5S and 5.8S rRNAs may have undergone
multiple mutational events which leads to
an underestimate of evolutionary distance.
The statistical limitations of 5S and 5.8S
rRNA sequences in phylogenetic tree
reconstructions are not inherent in comparisons of the larger ribosomal RNAs. In
Escherichia coli, the small subunit rRNA
(16S-like rRNA) is 1,542 nucleotides in
length (Brosius et al., 1978) which compares to a typical eukaryotic 16S-like rRNA
(commonly referred to as 18S or 19S
rRNA) of 1,797 nucleotides as represented
by the yeast, Saccharomyces cerevisiae (Rubtsov
et al., 1980). The large subunit rRNAs
(23S-like rRNAs) of prokaryotes and
eukaryotes are about twice the size of the
16S-like rRNA species. Both are suitable
for measuring the extent of genetic relatedness between diverse groups of organisms.
Phylogenetic frameworks spanning a
broad range of evolutionary distances have
been inferred from structural similarities
between eukaryotic 16S-like rRNAs (Sogin
et al., 1986b; Sogin and Gunderson, 1987;
Gunderson et al., 1987). Because of functional constraints which must have been
established in the earliest common ancestors to the three primary lines of descent,
large portions of the 16S-like rRNA are
nearly invariant and can be used to measure very distant relationships. The highly
conserved elements are interspersed among
moderately conserved and highly variable
regions. The variable regions are useful
for estimating the extent of genetic relatedness between more closely related species
(Elwood et al., 1985).
All prokaryotic and eukaryotic 16S-like
rRNAs can be folded into a consensus secondary structure model (Gutell et al., 1986)
similar to the diagram of the ciliated protozoan Tetrahymena thermophila rRNA
shown in Figure 2. The structure is comprised of three domains which fold about
a "minimal core" region. There are as
many as fifty helical domains that are considered to be phylogenetically proven. (A
helical structure is considered to be phylogenetically proven if it is present in most
16S-like rRNAs and its formation is independent of absolute primary structure conservation; several compensating base
changes must be found that maintain the
helical structure.) Several regions which are
extremely variable in size and sequence
composition cannot be folded into a consensus structure. These "expansion
regions" are not presented in helical array.
The details of the model and its correlation
with functional roles in the ribosome have
been discussed by Noller et al. (1986).
FIG. 2. Secondary structure model of the 16S-like rRNA of the ciliated protozoan Tetrahymena thermophila.
The secondary structure for the T. thermophila 16S-like rRNA (Sogin et al., 1986) was drawn according to
the pairing scheme proposed by Gutell (Gutell et al., 1986). The heavy lines indicate the location of highly
conserved sequences. Oligonucleotides complementary to these sites can be used to initiate DNA synthesis
in dideoxynucleotide chain termination sequencing protocols. The brackets V1-V7 indicate regions that
display extreme length variation in eukaryotic 16S-like rRNAs.
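The covariation criterion for a "phylogenetically proven" helix can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name, the threshold of two distinct canonical pair types, and the requirement that every sequence (rather than most) forms a pair are all simplifications introduced here, not part of the published procedure.

```python
# Canonical Watson-Crick pairs in RNA, plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def helix_pair_support(column_i, column_j, allow_wobble=True, min_pair_types=2):
    """Check two alignment columns for compensating base changes.

    column_i and column_j hold the bases found at two aligned positions,
    one character per sequence. The pair counts as supported when every
    sequence can form a base pair AND at least `min_pair_types` different
    canonical pairs occur, i.e. pairing is kept while the bases change.
    """
    if len(column_i) != len(column_j):
        raise ValueError("columns must cover the same sequences")
    allowed = CANONICAL | WOBBLE if allow_wobble else CANONICAL
    pairs = list(zip(column_i.upper(), column_j.upper()))
    all_paired = all(pair in allowed for pair in pairs)
    canonical_types = {pair for pair in pairs if pair in CANONICAL}
    return all_paired and len(canonical_types) >= min_pair_types


# Hypothetical columns from four aligned sequences: G-C, G-C, C-G and A-U.
print(helix_pair_support("GGCA", "CCGU"))
# An invariant G-C pair gives no covariation evidence on its own.
print(helix_pair_support("GGGG", "CCCC"))
```

A helix would be scored by applying the check to each of its positions; a column pair that is always G-C is compatible with the helix but does not by itself demonstrate it.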
Within the past few years methods have
been developed which permit the direct
determination of partial 16S-like rRNA
sequences (Qu et al., 1983; Lane et al., 1985)
or complete coding region sequences
(Elwood et al., 1985) with less effort than
that required to sequence a 5S or 5.8S
rRNA. The strategy for rapidly sequencing these molecules or their genes is based
upon the interspersion of highly conserved
sequences along the length of the rRNA
molecules. Typically, an rRNA coding
region is cloned into the single strand phage
M13. DNA sequencing templates containing the coding or noncoding strands for
the entire 16S-like rRNA are prepared and
oligonucleotides (15-20 nucleotides in
length) that are complementary to highly
conserved elements can be used to initiate
DNA synthesis in dideoxynucleotide
sequencing protocols. The locations of the
highly conserved primer sites used in the
dideoxynucleotide sequencing reactions
are shown in the secondary structure model
presented in Figure 2. A detailed sequencing strategy for the Paramecium tetraurelia
16S-like rRNA gene is shown in Figure 3
(Sogin and Elwood, 1986). For studies of
rRNA primary structure, the time-limiting
factor is that required to construct genomic libraries and identify recombinant
clones containing rRNA coding regions.
FIG. 3. Sequencing strategy for the 16S-like rRNA coding region from the ciliated protozoan, Paramecium
tetraurelia. The P. tetraurelia 16S-like rRNA coding region is contained within a 2,000 base pair restriction
fragment defined by the enzyme Pst I. Both orientations of the fragment were subcloned into the single
stranded phages M13 mp18 and M13 mp19. Single stranded DNA templates were prepared from the recombinant
phage and sequenced using the dideoxynucleotide chain termination sequencing protocols (Sanger
and Coulson, 1975). Oligonucleotides (15-20 nucleotides in length) that are complementary to conserved
coding and noncoding regions were used to prime DNA synthesis in the sequencing reactions. The direction
and extent of sequence analysis from a given primer are shown by the arrows. The locations of the primers
in the P. tetraurelia rRNA are indicated on the arrows.
More recently the development of the
Polymerase Chain Reaction (PCR) technology (Mullis and Faloona, 1987; Saiki et
al., 1988) has dramatically reduced the
effort required to obtain rDNA coding
regions for sequence analysis. The use of
PCR technology to synthesize or amplify
many copies of rRNA coding regions is
illustrated in Figure 4. Comparison of sixty
eukaryotic 16S-like rRNA sequences
reveals that there are conserved sequence
elements proximal to the 5' and 3' termini
which can be used as initiation sites in
PCR experiments (Medlin et al., 1988).
Sequences between the conserved elements can be exponentially amplified by
repetitive cycles of denaturing duplex
DNA, annealing primers complementary
to the conserved sequence elements, and
then primer extension using DNA polymerase. The products of the primer extension as well as the original duplex DNA
can serve as templates in successive amplification cycles. Within a few hours, several
μg of DNA encoding 16S-like rRNAs can
be obtained from as little as 0.1 nanograms
of bulk genomic DNA. The resulting product can be cloned into the single stranded
phage M13 or characterized by modifications of the dideoxynucleotide sequencing
protocols for analyzing double stranded
DNA templates. The fidelity of synthesis
in the PCR amplification procedures has
been evaluated by comparisons to sequence
determinations of cloned genomic rRNA
genes. Fewer than one error per 2,000 nucleotides was
observed in the amplified rRNA coding
regions (Medlin et al., 1988). These technological advances have led to the explosive expansion of the data base for 16S-like
rRNA sequences and the attendant dramatic restructuring of our perspective of
eukaryote evolution.
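The exponential character of PCR amplification can be made concrete with a small Python sketch. It assumes an idealized reaction in which the amount of target grows by a fixed factor of (1 + efficiency) per cycle; the function name, the `efficiency` parameter, and the example quantities are illustrative assumptions, not values taken from the protocols cited above.

```python
def cycles_to_reach(start_amount, target_amount, efficiency=1.0):
    """Count the cycles needed for `start_amount` of target DNA to reach
    `target_amount`, assuming every cycle multiplies the target by
    (1 + efficiency). efficiency=1.0 is perfect doubling.
    """
    if start_amount <= 0:
        raise ValueError("start_amount must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    cycles = 0
    amount = start_amount
    while amount < target_amount:
        amount *= 1 + efficiency  # products and originals both act as templates
        cycles += 1
    return cycles


# Hypothetical example: a million-fold increase under perfect doubling.
print(cycles_to_reach(1.0, 1_000_000.0))
# The same increase if only 80% of templates are copied in each cycle.
print(cycles_to_reach(1.0, 1_000_000.0, efficiency=0.8))
```

Because the starting material is bulk genomic DNA, of which the rRNA coding region is only a small fraction, the fold amplification of the target itself is larger than the ratio of final product to input DNA would suggest.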
FIG. 4. Polymerase chain reaction for amplifying DNA sequences. Target DNAs between known DNA
sequences 20-25 nucleotides in length can be exponentially amplified by repetitive cycles of denaturing
duplex DNA, annealing primers complementary to the known sequence elements, and then primer
extension using DNA polymerase (Mullis and Faloona, 1987; Saiki et al., 1988). The products of the primer
extension as well as the original duplex DNA can serve as templates in successive amplification cycles.
Amplification primers proximal to the 5' and 3' termini of 16S-like rRNA genes have been designed which
permit the rapid amplification of rDNA genes from nanogram quantities of genomic DNA (Medlin et al., 1988).
The primers contain polylinkers at their 5' termini in order to facilitate the cloning of the PCR
amplification products into the single stranded phage M13 for subsequent DNA sequence analysis.
INFERENCE OF RIBOSOMAL RNA
BASED PHYLOGENIES
Parsimony, Cladistic, or Distance matrix methods can be used to infer phylogenetic
relationships from molecular data. Advantages and limitations for each of the
approaches have been reviewed by Felsenstein (1982). In general, both parsimony
and distance methods produce similar tree topologies in comparisons of our rRNA
sequences. A more critical aspect of phylogenetic tree inference from molecular
comparisons is the method employed for aligning rRNA sequences. Since eukaryotic
16S-like rRNAs vary in length from 1,244 nucleotides in Vairimorpha necatrix
(Vossbrinck et al., 1987) to as many as 2,305 in Acanthamoeba castellanii (Gunderson and
Sogin, 1986) it is necessary to align homologous positions among all the compared
sequences. Computerized procedures have been developed in which numerical scores
and penalties (for matches, mismatches, and alignment gaps) are tallied in order to
derive the optimal alignment between two sequences (Smith et al., 1981). This
technique is totally objective but the computational time rises geometrically with the
number of simultaneously considered sequences and it does not take into account
the significance of secondary and tertiary structure similarities. A preferable
approach is the use of computer assisted methods that consider the phylogenetic
conservation of both primary and secondary structures in the alignment process.
Initially the highly conserved sequences are aligned by using a computer assisted
algorithm. Alignment gaps are introduced to juxtapose similar regions in the sequence
collection. The procedure is repeated but the search for similar sequences is restricted
to tracts that display lower similarity. If several sequences are considered
simultaneously, it is possible to identify very short sequences of weaker similarity that
represent homologous positions in all the 16S-like rRNA coding regions. The alignment
within regions that display length variations is refined by a consideration of
phylogenetically conserved higher order structures. Sequence elements that define
evolutionarily proven helices are juxtaposed by the appropriate placement of
alignment gaps. Only those positions which are in obvious alignment (using the criteria
of primary and/or secondary structure conservation) should be used when
computing similarities between rRNA sequences. When distantly related organisms
representing each of the three primary lines of descent are compared, the
number of positions that can be unambiguously aligned is reduced to approximately
one thousand (Sogin et al., 1988). If an analysis includes only closely related
species, e.g., thirteen species within the
genus Tetrahymena (Sogin et al., 1986a), all
of the sites in the 16S-like rRNAs can be
unambiguously aligned. The reliability of
branching patterns in a given phylogenetic
framework is improved when larger numbers of properly aligned positions are used
to calculate similarity values.
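The score-and-penalty approach to pairwise alignment mentioned above can be sketched as follows. This is the textbook dynamic-programming formulation of global alignment, shown only to make the idea concrete; it is not necessarily the procedure of Smith et al. (1981), the score values are arbitrary assumptions, and, as the text points out for such methods, it ignores secondary structure entirely.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best global alignment
    of sequences a and b under a simple linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a aligned against leading gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b aligned against leading gaps
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_alignment("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The table has one cell per pair of positions, so the cost for two sequences grows with the product of their lengths; extending the same exhaustive search to many sequences at once is what makes the purely numerical approach impractical for large collections.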
For the inference of phylogenetic trees
by distance matrix methods, similarity values must be computed for all possible pairwise comparisons of homologous nucleotide positions. Similarity is defined as:
s = m/(m + u + g/2)
where m is the number of sequence positions with matching nucleotides, u is the
number of positions with non-matching
nucleotides, and g is the number of
sequence gaps. (Only the first 5 positions
in a gap are considered in making the calculations. Large insertions or deletions
probably reflect single rare events.) The
similarity values are converted to "distance" values (the number of evolutionary
changes per 100 positions) using the formula of Jukes and Cantor (1969) which
compensates for the probability of multiple events at the same position. The distances are then converted to phylogenetic
trees using a modification (Elwood et al.,
1985) of the distance matrix methods (Fitch
and Margoliash, 1967). The evaluation of
alternative phylogenetic trees is based upon
the agreement of the distance data separating pairs of organisms and the sum of
tree segment lengths joining the organisms
in the tree.
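The similarity and distance calculations described above can be sketched in Python. The similarity function follows s = m/(m + u + g/2) with at most five positions counted per gap. The distance function assumes the standard Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3)(1 - s)), scaled to changes per 100 positions; the text cites Jukes and Cantor (1969) without printing the formula, so that exact form is an assumption here, as are the function names and the use of "-" as the gap symbol.

```python
import math

GAP = "-"
MAX_GAP_POSITIONS = 5  # only the first 5 positions of a gap are counted


def similarity(seq_a, seq_b):
    """Compute s = m / (m + u + g/2) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    m = u = g = 0
    run = 0  # length of the gap currently being traversed
    for a, b in zip(seq_a, seq_b):
        if a == GAP and b == GAP:
            continue  # column is uninformative for this pair
        if a == GAP or b == GAP:
            run += 1
            if run <= MAX_GAP_POSITIONS:
                g += 1
            continue
        run = 0
        if a == b:
            m += 1
        else:
            u += 1
    return m / (m + u + g / 2)


def jukes_cantor_distance(s):
    """Convert a similarity value to changes per 100 positions
    (assumed standard Jukes-Cantor form, see lead-in)."""
    p = 1.0 - s  # observed fraction of differing positions
    if p >= 0.75:
        raise ValueError("sequences too divergent for this correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0) * 100.0


s = similarity("AC--GT", "ACTTGT")  # m = 4, u = 0, g = 2
print(s)
print(jukes_cantor_distance(s))
```

The corrected distance always exceeds the raw percentage of differences, and the gap between the two widens as similarity falls, which is the compensation for multiple events at the same position.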
A THREE KINGDOM PHYLOGENY
The phylogenetic tree shown in Figure
5 includes representatives from the three
primary lines of descent. It is based upon
the comparison of 1,030 positions in 16S-like rRNAs that can be unambiguously
aligned in all of the considered organisms.
Evolutionary distances between organisms
and nodes are proportional to the line segment lengths presented in the figure. Our
algorithm for searching for the optimal tree
topology is limited to a maximum of thirty-two
organisms (a constraint imposed by the
enormous computational time required to
explore larger numbers of alternative tree
topologies). To include as many independent eukaryotic lineages as possible, only
prokaryotic species that represent the
known rRNA sequence diversity within the
eubacterial or archaebacterial kingdoms
are included in the analysis. In this phylogeny the organisms separate into three
major subtrees or primary lines of descent.
They correspond to the Eubacteria,
Archaebacteria and the Eukaryota. The
root of the tree is unknown but the succession order for lineages within each of the
subtrees relative to the other major groups
is shown in the figure.
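The computational burden behind the thirty-two organism limit can be appreciated by counting candidate trees. The sketch below uses the standard combinatorial result that n labelled organisms can be joined by (2n - 5)!! distinct unrooted, fully bifurcating topologies; the function name is an illustrative assumption, and the count says nothing about how the search algorithm used for Figure 5 actually prunes these alternatives.

```python
def unrooted_tree_count(n_taxa):
    """Number of distinct unrooted bifurcating topologies for n_taxa
    labelled organisms: (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5)."""
    if n_taxa < 3:
        raise ValueError("at least three organisms are needed")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


for n in (4, 5, 10, 32):
    print(n, unrooted_tree_count(n))
```

Each added organism multiplies the number of alternatives by the number of branches it could attach to, so the total grows faster than exponentially and an exhaustive comparison of topologies quickly becomes impossible.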
The earliest known divergence in the
eukaryotic subtree is represented by the
diplomonad Giardia lamblia. The extreme
rRNA sequence divergence between G.
lamblia and other eukaryotes is unequaled
within the other primary lineages. The G.
lamblia/Plasmodium berghei and G. lamblia/
Trypanosoma brucei structural similarity values are 0.675 and 0.677, respectively. This
compares to inter-kingdom values of 0.700
and 0.711 for Sulfolobus solfataricus/E. coli
and Halobacterium volcanii/E. coli, respectively. It is unlikely that the deep branching pattern is an artefact of accelerated
rates of nucleotide substitution in G.
lamblia rRNA relative to that of other
eukaryotes. Unusually rapid rates of
nucleotide substitution in a sequence can
be detected by comparing its divergence
from a distantly related sequence with that
of a close relative and the distantly related
sequence. If a sequence is subject to a fast
rate of evolutionary change, its divergence
from the control sequence will be unusually large; the higher the mutation rate for
a given sequence, the greater the divergence from a distantly related sequence.
The distance between G. lamblia and E. coli
is comparable to that between D. discoideum
and E. coli or Xenopus laevis and E. coli. The
early divergence of G. lamblia is corroborated by the retention of features in its
rRNA coding region that are characteristic of archaebacterial and eubacterial
16S-like rRNAs. In contrast to all other
characterized eukaryotes, the G. lamblia
16S-like rRNA even contains a prokaryotic-like "Shine-Dalgarno" messenger RNA
binding site (Sogin et al., 1988).
[Figure 5: unrooted tree. Legible leaf labels: Zea mays, Ochromonas danica, Achlya bisexualis, Entamoeba histolytica,
Naegleria gruberi, Euglena gracilis, Trypanosoma brucei, Leishmania donovani, Tritrichomonas vaginalis,
Vairimorpha necatrix, Giardia lamblia, Halobacterium volcanii, Sulfolobus solfataricus.]
FIG. 5. Multi-kingdom tree inferred from 16S-like rRNAs. A computer assisted method was used to align
the 16S-like rRNA sequences from divergent representatives of the Eubacteria, Archaebacteria and Eukaryota.
The alignments were influenced by considering the evolutionary conservation of both primary and secondary
structure features (Elwood et al., 1985). The distance matrix methods (Fitch and Margoliash, 1967) were used
to infer an unrooted multi-kingdom tree in which the line segment lengths represent the evolutionary distance
between organisms.
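The rate check described above, in which a sequence's divergence from a distant reference is compared with that of its relatives, can be sketched as follows. The function names, the tolerance, and all numerical distances are hypothetical; the text reports that the G. lamblia to E. coli distance is comparable to the others but does not give the values.

```python
def excess_divergence(d_test_to_outgroup, d_relative_to_outgroup):
    """How much further the test sequence lies from the outgroup than a
    relative does (same units as the inputs)."""
    return d_test_to_outgroup - d_relative_to_outgroup


def appears_fast_evolving(d_test_to_outgroup, d_relatives_to_outgroup, tolerance=5.0):
    """Flag the test sequence if it is further from the outgroup than
    every relative by more than `tolerance` (an assumed threshold)."""
    if not d_relatives_to_outgroup:
        raise ValueError("at least one relative is required")
    return all(
        excess_divergence(d_test_to_outgroup, d) > tolerance
        for d in d_relatives_to_outgroup
    )


# Hypothetical distances to a prokaryotic outgroup, in changes per 100 positions.
relatives = [58.0, 61.0]
print(appears_fast_evolving(60.0, relatives))  # comparable divergence
print(appears_fast_evolving(80.0, relatives))  # unusually large divergence
```

A lineage that passes this check branches deeply because it separated early, not because its sequence has been changing unusually quickly.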
Branchings that occurred soon after that
of G. lamblia include the microsporidian V.
necatrix and the tritrichomonad Tritrichomonas vaginalis. All three are protozoans
which have adopted parasitic life styles and
lack mitochondria. The parasitic life style
may have played a critical role leading to
the survivorship of these lineages during
unfavorable periods in their evolutionary
history, but parasitism alone cannot explain
early branching lineages. Free living diplomonads related to G. lamblia can be identified in our contemporary biosphere and
other parasitic species including Plasmodium and Pneumocystis display late branching patterns. The early divergence of G.
lamblia in the history of nucleated cells is
consistent with the lack of mitochondria,
the apparent absence of ER and Golgi, the
evident lack of sexual life cycle stages (Feely
et al., 1984), the prokaryotic-like organization of its ribosomal RNA coding
region (Boothroyd et al., 1987) and the
remarkably simple constellation of proteins associated with its cytoskeleton (Peattie, unpublished results). G. lamblia probably separated from other eukaryotes prior
to the full development of sub-cellular features such as Golgi and ER, earlier than
the endosymbiotic event(s) that gave rise
to mitochondria, and before the cytoskeleton had reached the level of complexity
found in other eukaryotic microorganisms.
The early branchings are followed by a
progression of other independent protist
lineages including coincident branchings
of the amoeboflagellate lineage (as represented by Naegleria gruberi) and a lineage
leading to euglenoids and kinetoplastids (as
represented by Euglena gracilis and
Trypanosoma brucei). Other discrete
branchings include the Rhizopoda (as represented by Entamoeba histolytica), the Acrasiomycota
(D. discoideum), the Apicomplexa
(Plasmodium berghei) and the nearly concurrent separation of ciliates, chlorophytes, chrysophytes (including the oomycetes) and acanthamoebae. The 16S-like
rRNA based phylogeny clearly demonstrates that the kingdom Protista encompasses a collage of seemingly unrelated lineages. This contrasts with the plants,
animals, and fungi which appear as monophyletic groups. These "higher kingdoms"
separated nearly simultaneously late in the
evolutionary history of the nucleated cells
during a period which also gave rise to
numerous other groups. The branchings
for these major lineages span a remarkably
short time-frame which precludes the identification of the proper order of succession
for the nearly concurrent diverging lineages. Their nodes are separated by fewer
than one nucleotide change per one
hundred positions. As additional organisms are included in the rRNA based phylogenies, a more accurate picture of events
which occurred during this period of radiation should emerge.
The rapid proliferation of morphologically distinct lineages could reflect major
environmental changes that occurred
approximately 800-1,200 million years
ago. For example, a large increase in atmospheric oxygen could have led to the development of new ecological niches. The rapid
diversification of a small number of lineages that survived some cataclysmic event
could also explain the present data. Resolution between these two possibilities might
be achieved with the inclusion of more protist sequences in the rRNA based phylogenetic trees. A prediction of the cataclysmic event model is that similar radiating
patterns will be observed in earlier diverging lineages. The time of radiation in the
earlier diverging lineages should correspond to that for the separation of plants,
animals, and fungi. It is also conceivable
that the radiative period of evolution was
not triggered by a major environmental
perturbation. The rate of phenotypic evolution may have suddenly accelerated in
response to novel mechanisms related to
the management of genetic information.
For example, "cis" splicing mechanisms for
processing RNA might have been invented
or vectors for rapidly exchanging genetic
information between species may have
appeared. Both of these developments
would facilitate the shuffling and exchange
of genetic elements which could have dramatically influenced the rate of phenotypic
change.
Hypotheses suggesting the chimeric
nature of the eukaryotic cell receive support from the phylogeny presented in Figure 5. According to the serial endosymbiotic theory, as discussed by Margulis
(1981), the eukaryotic cell arose from a
series of associations between prokaryotic
species. These endosymbiotic events gave
rise to many features in nucleated cells
which are considered to be characteristic
of eukaryotes. Although the nuclear-encoded rRNAs are not specifically related
to any prokaryotic group, it is clear that
branches representing the mitochondrial
and chloroplast compartments of eukaryotic cells converge on specific lineages in
the eubacterial subtree. The Zea mays mitochondrial rRNA sequence is most similar
to that of the eubacterium Agrobacterium
tumefaciens. The Z. mays chloroplast rRNA
affiliates with the small subunit rRNA of
the cyanobacterium, Anacystis nidulans. A
common ancestry for mitochondrial or
chloroplast 16S-like rRNAs and their
eubacterial relatives offers compelling
proof of their endosymbiotic origins in
eukaryotes. However, the absence of mitochondria and chloroplasts in early
diverging lineages suggests that the endosymbiotic origins of these organelles did not occur at the earliest stages
of eukaryotic evolution. In fact, it is probable that multiple independent endosymbiotic events gave rise to mitochondria and
chloroplasts. At least one of these endosymbioses is thought to have occurred after
the divergence of plants and animals. This
view is supported by Gray's demonstration
(Gray et al., 1989) that evolutionary trees
inferred from comparisons of nuclear
rRNA genes or from mitochondrial genes
have different topologies. Similarly, the
deep divergence of Euglena gracilis and its
separation from other photosynthetic
groups by many nonphotosynthetic lineages suggests the possible acquisition of
chloroplasts by multiple, independent
endosymbiotic events.
NEW PERSPECTIVES ON THE EVOLUTION OF EUKARYOTES

The discovery of Archean microfossils demonstrated that microorganisms existed at least as early as 3.5 billion years ago (Schopf and Walter, 1983). The general morphology of these fossils suggests that the most ancient living cells were exclusively prokaryotic. The earliest eukaryotic microfossils do not appear until the Proterozoic. Based upon these interpretations, the time of divergence between prokaryotes and eukaryotes is commonly placed within the last 1-2 billion years. However, this traditional view of organismal evolution is not supported by the rRNA based phylogenies. The extraordinary depths of branching in the eukaryotic subtree eclipse those seen within the entire prokaryotic world. The divergence between the eukaryote and prokaryote lineages probably occurred early in the evolutionary history of this biosphere. The absence of eukaryotic microbial fossils older than two billion years and the apparent lack of significant biochemical diversity are frequently cited as evidence for a recent origin of the eukaryotic lineage. However, such interpretations may have distorted our view of microbial evolution. The identification of eukaryote microfossils is generally based upon their minimal size approaching 10 µm. Since there are extant eukaryotes as small as 1 µm in diameter, e.g., Nanochlorum eukaryotum, and since soft-bodied protozoans such as diplomonads are unlikely to be preserved in the fossil record, the lack of fossil record support for extremely ancient eukaryotes is not surprising. Similarly, assessments of eukaryotic biochemical diversity may be misleading. Most data regarding biochemical pathways and mechanisms are derived from studies of animals, plants or single-cell organisms such as Saccharomyces. These groups represent recent branchings in the eukaryotic line of descent and, therefore, may be expected to have similar biochemical motifs. The true eukaryotic biochemical diversity may be represented by euglenoids, kinetoplastids, microsporidians and diplomonads; the rRNA tree suggests that these organisms represent the earliest branching lineages in the eukaryotic line of descent. An alternative explanation for the unexpected evolutionary diversity of eukaryotes is that they represent a rapidly evolving lineage which diverged from an ancestral prokaryotic lineage much later than the separation of the Archaebacteria from the Eubacteria. To date, the analysis of rRNA sequence data has not convincingly identified a prokaryotic lineage specifically ancestral to all eukaryotic nuclear ribosomal RNA genes. Further, it is improbable that the rate of evolutionary change in each of the eukaryotic lineages would have accelerated nearly simultaneously.

The branching pattern in the eukaryotic subtree demands a reconsideration of what kingdom level phylogenetic boundaries are meant to represent. There is little correspondence between evolutionary distances inferred from comparisons of rRNA sequences and the phenotypic differences between the traditional major eukaryotic groups. Evolutionary distances separating plants, animals and fungi are dwarfed by similar comparisons between lineages within the kingdom Protista. Even within monophyletic protist lineages, e.g., the Ciliophora (Sogin and Elwood, 1986) or chrysophytes (Gunderson et al., 1987), the extent of rRNA diversity can be equivalent to or sometimes exceeds that observed among the "higher" eukaryotic kingdoms. On the basis of rRNA sequence diversity, the ciliates and chrysophytes might be considered as separate kingdoms with equivalent taxonomic rank to the Plantae, Fungi, and Animalia.

Apparently, major phenotypic variations used to define kingdom level groups in "natural systems" are not necessarily marked by deep genealogical differences. The notion of kingdom within the eukaryotic world is further challenged by the extreme depths of branching for many protist lineages, many of which have only a limited number of representatives. For example, the diplomonads, as represented by G. lamblia, are a small group of organisms which are more isolated from other eukaryotes than groups frequently recognized as separate kingdoms (animals, plants, fungi). Are diplomonads, then, to be given kingdom level status? Similarly, the depths of branching for several independent lineages including microsporidians, kinetoplastids/euglenoids, amoeboflagellates, slime molds, and apicomplexans are much older than those exhibited by "higher" eukaryotic kingdoms. Even if each of these lineages were to be considered as a separate kingdom, such a scheme would lead to the non-equivalent use of taxonomic rank in different lineages. The significance of higher level taxonomic divisions in the eukaryotic world must be re-evaluated. The rDNA data suggest that eukaryotes should be considered as a continuing progression of diverging lineages.

ACKNOWLEDGMENTS

This work was supported by a grant from the National Institutes of Health (GM32964) to M.L.S.

REFERENCES

Attardi, G. and F. Amaldi. 1970. Structure and synthesis of ribosomal RNA. Ann. Rev. Biochem. 39:183-226.
Boothroyd, J. C., A. Wang, D. A. Campbell, and C. C. Wang. 1987. An unusually compact ribosomal DNA repeat in the protozoan Giardia lamblia. Nucleic Acids Res. 15:4065-4084.
Brosius, J., M. L. Palmer, P. J. Kennedy, and H. F. Noller. 1978. Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 75:4801-4805.
Cavalier-Smith, T. 1981. Eukaryote kingdoms: Seven or nine? Biosystems 14:461-481.
Chatton, E. 1937. Titres et travaux scientifiques. Sete, Sottano, Italy.
Corliss, J. O. 1984. The kingdom Protista and its 45 phyla. Biosystems 17:87-126.
Elwood, H. J., G. J. Olsen, and M. L. Sogin. 1985. The small subunit ribosomal RNA gene sequences from the hypotrichous ciliates Oxytricha nova and Stylonychia pustulata. J. Mol. Biol. and Evol. 2:399-410.
Feely, D. E., S. L. Erlandsen, and D. G. Chase. 1984. Giardia and Giardiasis. Plenum Press, New York.
Felsenstein, J. 1982. Numerical methods for inferring evolutionary trees. Q. Rev. Biol. 57:379-404.
Fitch, W. M. and E. Margoliash. 1967. Construction of phylogenetic trees. Science 155:279-284.
Gray, M. W., R. Cedergren, Y. Abel, and D. Sankoff. 1989. On the evolutionary origin of the plant mitochondrion and its genome. Proc. Natl. Acad. Sci. U.S.A. (In press)
Gunderson, J. H., H. J. Elwood, A. Ingold, K. Kindle, and M. L. Sogin. 1987. Phylogenetic relationships between chlorophytes, chrysophytes and oomycetes. Proc. Natl. Acad. Sci. U.S.A. 84:5823-5827.
Gunderson, J. H. and M. L. Sogin. 1986. Length variations in eukaryotic rRNAs: Small subunit rRNAs from the protists Acanthamoeba castellani and Euglena gracilis. Gene 44:63-70.
Gutell, R. R., G. Weiser, C. R. Woese, and H. F. Noller. 1986. Comparative anatomy of 16S-like ribosomal RNA. Prog. Nucleic Acid Res. Mol. Biol. 32:155-216.
Haeckel, E. 1892. The history of creation or the development of the earth and its inhabitants by the action of natural causes, 4th ed. (translated from the 8th German edition by E. R. Lankester). D. Appleton and Co., New York.
Jukes, T. H. and C. R. Cantor. 1969. Evolution of protein molecules. In H. N. Munro (ed.), Mammalian protein metabolism, pp. 21-132. Academic Press, New York.
Lane, D. J., B. Pace, G. J. Olsen, D. A. Stahl, M. L. Sogin, and N. R. Pace. 1985. Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc. Natl. Acad. Sci. U.S.A. 82:6955-6959.
Margulis, L. 1974. Five-kingdom classification and the origin and evolution of cells. Evol. Biol. 7:45-78.
Margulis, L. 1981. Symbiosis in cell evolution. W. H. Freeman, San Francisco.
Margulis, L. and K. V. Schwartz. 1982. Five kingdoms: An illustrated guide to the phyla of life on earth. W. H. Freeman, San Francisco.
Marrs, B. and S. Kaplan. 1970. 23S precursor ribosomal RNA of Rhodopseudomonas spheroides. J. Mol. Biol. 49:297-317.
Medlin, L., H. J. Elwood, S. Stickel, and M. L. Sogin. 1988. Sequence analysis of enzymatically amplified genomic small subunit rRNA genes from the diatom, Skeletonema pustulata. Gene 71:491-499.
Mullis, K. B. and F. A. Faloona. 1987. Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol. 155:335-350.
Noller, H. F. 1984. Structure of ribosomal RNA. Ann. Rev. Biochem. 53:119-162.
Noller, H. F., M. Asire, A. Barta, S. Douthwaite, T. Goldstein, R. R. Gutell, D. Moazed, J. Normanly, J. B. Prince, S. Stern, K. Triman, S. Turner, B. Van Stolk, V. Wheaton, B. Weiser, and C. R. Woese. 1986. In B. Hardesty and G. Kramer (eds.), Structure, function and genetics of ribosomes, pp. 141-163. Springer Verlag, Berlin.
Pace, N. R. 1973. Structure and synthesis of the ribosomal ribonucleic acids of prokaryotes. Bacteriol. Rev. 37:562-603.
Qu, L. H., B. Michot, and J-P. Bachellerie. 1983. Improved methods for structure probing in large RNAs: A rapid 'heterologous' sequencing approach is coupled to the direct mapping of nuclease accessible sites. Applications to the 5' terminal domain of eukaryotic 28S rRNA. Nucl. Acids Res. 11:5903-5920.
Reeder, R. H. 1974. Ribosomes from eukaryotes: Genetics. In M. Nomura, A. Tissieres, and P. Lengyel (eds.), Ribosomes, pp. 489-518. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
Rubtsov, P. M., M. M. Musakhanov, V. M. Zakharyev, A. S. Krayev, K. G. Skryabin, and A. A. Bayev. 1980. The structure of the yeast ribosomal RNA genes. I. The complete nucleotide sequence of the 18S ribosomal RNA gene from Saccharomyces cerevisiae. Nucleic Acids Res. 8:5779-5794.
Saiki, R., D. H. Gelfand, S. Stoffel, S. J. Scharf, R. Higuchi, G. T. Horn, K. B. Mullis, and H. A. Erlich. 1988. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487-491.
Sanger, F. and A. R. Coulson. 1975. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94:441-448.
Schopf, J. W. and M. R. Walter. 1983. Archean microfossils: New evidence of ancient microbes, Chapter 9. In J. W. Schopf (ed.), Earth's earliest biosphere: Its origin and evolution, pp. 214-239. Princeton University Press, Princeton.
Smith, T. F., M. S. Waterman, and W. M. Fitch. 1981. Comparative biosequence metrics. J. Mol. Evol. 18:38-46.
Sogin, M. L. and H. J. Elwood. 1986. Primary structure of the Paramecium tetraurelia small subunit rRNA coding region: Phylogenetic relationships within the Ciliophora. J. Mol. Evol. 23:53-60.
Sogin, M. L., H. J. Elwood, and J. H. Gunderson. 1986a. Evolutionary diversity of the eukaryotic small subunit rRNA genes. Proc. Natl. Acad. Sci. U.S.A. 83:1383-1387.
Sogin, M. L. and J. H. Gunderson. 1987. Structural diversity of eukaryotic small subunit ribosomal RNAs: Evolutionary implications. In Endocytobiology III. Ann. New York Acad. Sci. 503:125-139.
Sogin, M. L., J. H. Gunderson, H. J. Elwood, R. A. Alonso, and D. A. Peattie. 1988. Phylogenetic significance of the Kingdom concept: An unusual eukaryotic 16S-like ribosomal RNA from Giardia lamblia. Science 243:75-77.
Sogin, M. L., A. Ingold, M. Karlok, H. Nielsen, and J. Engberg. 1986b. Phylogenetic evidence for the acquisition of ribosomal RNA introns subsequent to the divergence of some of the major Tetrahymena groups. EMBO 5:3625-3630.
Stanier, R. Y. 1970. Some aspects of the biology of cells and their possible evolutionary significance. In H. P. Charles and B. C. J. G. Knight (eds.), Organization and control in prokaryotic and eukaryotic cells, pp. 1-38. Cambridge University Press, Cambridge.
Stanier, R. Y. and C. B. van Niel. 1962. The concept of a bacterium. Arch. Microbiol. 42:437-466.
Vossbrinck, C. R., J. V. Maddox, S. Friedman, B. A. Debrunner-Vossbrinck, and C. R. Woese. 1987. Ribosomal RNA sequence suggests microsporidia are extremely ancient eukaryotes. Nature (London) 326:411-414.
Whittaker, R. H. 1969. New concepts of kingdoms of organisms. Science 163:150-160.
Wittmann, H. G. 1982. Components of bacterial ribosomes. Ann. Rev. Biochem. 51:155-183.
Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221-271.
Woese, C. R. and G. E. Fox. 1977. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc. Natl. Acad. Sci. U.S.A. 74:5088-5090.
Zuckerkandl, E. and L. Pauling. 1965. Molecules as documents of evolutionary history. J. Theor. Biol. 8:357-366.