The Small-Subunit Ribosomal RNA Gene Sequences
from the Hypotrichous Ciliates Oxytricha nova
and Stylonychia pustulata¹
Hille J. Elwood,* Gary J. Olsen,†,‡,² and Mitchell L. Sogin*
*Department of Molecular and Cellular Biology, National Jewish Hospital and Research
Center; †Department of Biochemistry, Biophysics and Genetics, University of Colorado
School of Medicine; and ‡Department of Molecular and Cellular Biology, National Jewish
Hospital and Research Center
We have determined the complete nucleotide sequence of the small-subunit
ribosomal RNA genes for the ciliate protozoans Stylonychia pustulata and Oxytricha
nova. The sequences are homologous and sufficiently similar that these organisms
must be closely related. In a phylogeny inferred from comparisons
of several eukaryotic small-subunit
ribosomal RNAs, the divergence of the ciliates from the
eukaryotic line of descent is seen to coincide with the radiation of the plants, the
animals, and the fungi. This radiation is preceded by the divergence of the slime
mold, Dictyostelium discoideum.
Introduction
Eukaryotic microorganisms are classified in either the protoctista or the fungi.
In contrast to the fungi, which are a relatively cohesive phylogenetic grouping, the
protoctists are a heterogeneous collection of organisms that display enormous physiological, cytological, and biochemical diversity (Margulis and Schwartz 1982). Consequently, it has been very difficult to infer consistent phylogenies for these simple
eukaryotes using classical taxonomic approaches, i.e., comparative studies of phenotypes. As an alternative, similarities between ribosomal RNA sequences can be used
to define quantitative phylogenetic relationships for these organisms. We have previously reported that comparisons of small-subunit ribosomal RNA sequences indicate
that the protoctist, Dictyostelium discoideum, represents the deepest divergence in the
eukaryotic line of descent yet characterized by molecular phylogeny (McCarroll et al.
1983). In this paper we have expanded our analysis to include the two closely related
protoctists, Oxytricha nova and Stylonychia pustulata. In a phylogeny inferred from
the small-subunit ribosomal RNA similarities, these ciliates are seen to diverge from
the eukaryotic line of descent significantly later than the branching of D. discoideum.
Material and Methods
Reagents
Restriction enzymes, bacterial alkaline phosphatase, DNA polymerase/Klenow
fragment, and the DNA synthesis kit were purchased from New England Biolabs. [α-35S]dATP was purchased from New England Nuclear. DNA ligase was prepared using
the methods of Panet et al. (1973). The dideoxynucleotides were purchased from
Pharmacia P-L Biochemicals.
1. Key words: small-subunit ribosomal RNA genes, hypotrichous ciliates, evolutionary relationships.
Address for correspondence and reprints: Dr. M. L. Sogin, Department of Molecular and Cellular
Biology, National Jewish Hospital, 3800 East Colfax, Denver, Colorado 80206.
2. Present address: Biology Department, Indiana University, Bloomington, Indiana 47405.
Mol. Biol. Evol. 2(5):399-410. 1985.
© 1985 by The University of Chicago. All rights reserved.
0737-4038/85/0205-0655$02.00
Preparation of Plasmids Containing Ribosomal RNA Genes
The complete macronuclear ribosomal RNA transcription units (inserted into
the PstI site of pBR322) from O. nova and S. pustulata were provided by M. Swanton
(Swanton et al. 1982). The recombinant plasmids were grown in Escherichia coli
strain HB101 and amplified in the presence of chloramphenicol. Plasmid DNA was
isolated using the SDS-alkali lysis procedures described by Maniatis et al. (1982).
Subcloning of Small-Subunit rRNA Genes
Restriction fragments containing the small-subunit ribosomal RNA genes were
isolated from the recombinant plasmids described above. The 3.27-Kb HindIII/HindIII
restriction fragment from O. nova and the 3.12-Kb HindIII/HindIII fragment from
S. pustulata were electrophoretically fractionated in 0.75% agarose gels built in E
buffer (40 mM Tris-acetate, pH 8.2, 20 mM sodium acetate, and 2 mM EDTA). The
regions of the gel containing the DNA fragments, as defined by ethidium bromide
staining, were excised and placed in vials containing 5 M NaI. The gels were dissolved
by heating at 48 C, and the DNA was adsorbed to glass beads as described by Vogelstein
and Gillespie (1979). After elution from the beads, the DNA was concentrated by
ethanol precipitation and suspended in LT buffer (10 mM Tris-HCl, pH 7.5, 10 mM
NaCl, and 0.5 mM EDTA). These fragments were cloned into the multiple cloning
site of the M13/mp9 single-stranded phage (Messing 1983).
The phage-cloning vector was prepared by digesting 20 µg of the M13/mp9 replicative form with HindIII. The linearized vector was treated with bacterial alkaline
phosphatase and electrophoretically purified on agarose gels. Subsequent to extraction
from the agarose gels, 20 ng of the vector plus 60 ng of the gel-purified DNA fragments
containing the small-subunit rRNA genes were incubated for 18 h at 10 C with 10
units of DNA ligase in 10 µl of ligation buffer (50 mM Tris-HCl, pH 7.4, 10 mM
MgCl2, 10 mM dithiothreitol, 1 mM spermidine, 1 mM ATP, and 100 µg/ml bovine
serum albumin). The recombinant M13 vectors were used to transform Escherichia
coli, strain JM103 (Messing 1983). Noncolored M13 plaques were selected, and C-tests, as described by Messing (1983), were used to determine the size and orientation
of the presumptive rDNA inserts.
Preparation of Primers
Primers for the dideoxynucleotide chain termination sequencing protocols that
are complementary to evolutionarily conserved coding and noncoding strands in the
small-subunit ribosomal RNA genes were prepared using the phosphotriester protocols
(Matteucci and Caruthers 1981). The deoxyoligonucleotides were purified on 40 × 20
cm × 0.8 mm thick 20% polyacrylamide gels that had been prepared in 8 M urea and
1× NNB solution (134 mM Tris base, 45 mM boric acid, and 2.5 mM EDTA). After
electrophoresis at 40 W and room temperature, the primers were located by UV shadowing (Hassur and Whitlock 1974) and eluted from the gels with TE buffer (10 mM
Tris-HCl, pH 7.5, and 0.5 mM EDTA). The eluted primers were bound to a C8 Bond
Elut column (Analytichem International) in TE buffer plus 50 mM NH4OAc. After
elution with 50% acetonitrile, the primers were lyophilized and suspended in 10 mM
Tris-HCl, pH 7.2.
Dideoxynucleotide Sequencing
Template DNA was prepared from the recombinant M13 clones as described by
Messing (1983). Six nanograms of M13 primer (New England Biolabs) or the synthetic
primers complementary to evolutionarily conserved regions of the ribosomal RNA
genes were annealed to 6 µg of template DNA in annealing solution (10 mM Tris-HCl, pH 7.2, 10 mM MgCl2, 1 mM dithiothreitol) by heating to 65 C for 5 min and
slow cooling to room temperature over 30 min. Klenow fragment of DNA polymerase
and 30 µCi [α-35S]dATP were added. This mix was distributed to each of five tubes
containing the deoxynucleotide triphosphates (dNTPs) plus one dideoxynucleotide
triphosphate (ddNTP). The dNTPs were present at a concentration of 0.4 mM, and
the ddNTPs as follows: ddA, 0.176 mM; ddG, 0.68 mM; ddT, 1.0 mM; ddC, 1.0 mM;
and ddG/dI, 0.35 mM/0.4 mM. (Band-compression artifacts on polyacrylamide sequencing gels occur with a frequency of 1-2 errors/100 residues. These artifacts occur
primarily in dideoxyguanosine chain termination reactions and often result in erroneous sequence interpretations. Band compressions are caused by strong secondary-structure interactions that distort gel sieving patterns or effect premature chain termination in the dideoxynucleotide sequencing reactions. The error rate can be reduced
to ~0.5% by substituting deoxyinosine for deoxyguanosine in an additional dideoxyguanosine chain termination sequencing reaction. Because the stacking interactions
of deoxyinosine are weaker than those of deoxyguanosine, the secondary-structure
stabilities are altered. The reduced secondary-structure stabilities minimize the band
compressions that can be detected by comparing dideoxyguanosine-terminated
reactions containing deoxyinosine with similar reactions containing deoxyguanosine.)
After incubation for 20 min at 37 C, a nonradioactive chase mix (1 mM in all dNTPs)
plus additional Klenow enzyme was added and incubation was continued for 15 min.
The reactions were halted by addition of EDTA to a concentration of 10 mM. The
samples were dried under vacuum and then resuspended in 10 µl of gel-loading buffer
(0.1% xylene cyanol/0.1% bromphenol blue in formamide). Two microliters of each
sample were loaded onto 6% or 8% polyacrylamide sequencing gels (Sanger and Coulson
1975) that had been prepared in 8 M urea with a salt gradient from 2.5× NNB
(bottom) to 0.5× NNB (top). After electrophoresis at 40 W and room temperature,
the gels were soaked for 30 min in 10% methanol/10% acetic acid/1% glycerol and
then vacuum dried onto a sheet of 3-mm paper. The radioactive bands were located
by autoradiography using Kodak XL1 film.
Results
Eukaryotic, small-subunit ribosomal RNAs encoded by the nucleus vary in length
from 1,771 nucleotides in Stylonychia pustulata (present paper) to more than 2,450
nucleotides in Trypanosoma brucei (Hasan et al. 1982; M.L.S. and H.J.E., unpublished
data). Comparisons of five eukaryotic and 20 prokaryotic small-subunit ribosomal
RNA sequences as well as T1 oligonucleotide catalogues representing more than 200
prokaryotic organisms (Fox et al. 1980) reveal that universal or eukaryote-specific
sequences (regions that are conserved among all organisms or among all eukaryotes,
respectively) are interspersed among semiconserved sequences (regions of intermediate
conservation) and nonconserved sequences (regions that display very high rates of
genetic drift). The semiconserved sequences are useful for the construction of quantitative molecular phylogenies involving distantly related organisms, whereas the nonconserved regions are valuable for resolving close phylogenetic relationships. The highly
conserved regions, because of a lack of sequence variation, do not contribute information about sequence divergence; however, they are potentially useful for rapidly
sequencing small-subunit ribosomal RNA genes.
Sequence Analysis of the Small-Subunit Ribosomal RNA Genes
The dideoxynucleotide chain termination protocols were used to sequence portions of the coding and noncoding strands of the S. pustulata and O. nova small-subunit rDNA genes cloned into the single-stranded phage M13/mp9. We synthesized
13 oligonucleotides (15-17-mers) that are complementary to coding and noncoding
strands of universal and eukaryote-specific regions. These regions, strategically located
in all eukaryotic small-subunit ribosomal RNA genes, were used to initiate synthesis
in the dideoxynucleotide chain termination sequencing protocols. The eukaryote-specific and universal oligonucleotide primer sequences as well as their locations in eukaryotic and prokaryotic small-subunit rRNAs (as represented by D. discoideum and
Escherichia coli, respectively) are listed in table 1. From a given primer site it was
generally possible to determine the sequence of 300-500 nucleotides. The sequencing
strategies for the two ciliate small-subunit ribosomal RNA genes presented in this
paper are shown in figure 1. Figure 2 displays the small-subunit ribosomal RNA gene
sequences from O. nova and S. pustulata aligned with the previously reported small-subunit ribosomal RNA genes from D. discoideum (McCarroll et al. 1983; Ozaki et
al. 1984), Saccharomyces cerevisiae (Rubtsov et al. 1980; Mankin et al. 1981) and E.
coli (Brosius et al. 1978).
Table 1
Synthetic DNA Oligonucleotides Complementary to Conserved Regions
in Eukaryotic Small-Subunit Ribosomal RNA Gene Sequences
Eukaryotic Location (a)   Prokaryotic Location (b)   Sequence
4->20                     (9->25)                    CTGGTTGATCCTGCCAG (c)
366->382                  (298->314)                 AGGGTTCGATTCCGGAG (c)
555->570                  (515->530)                 GTGCCAGCRGCCGCGG (c)
892->906                  (686->700)                 YAGAGGTGAAATTCT (c)
1125->1141                (906->922)                 GAAACTTAAAKGAATTG (c)
1704->1720                (1391->1407)               TGYACACACCGCCCGTC (c)
393->377                  (325->309)                 TCAGGCTCCCTCTCCGG (d)
571->557                  (531->517)                 ACCGCGGCKGCTGGC (d)
906->892                  (700->686)                 AGAATTTCACCTCTG (d)
1139->1125                (920->906)                 ATTCCTTTRAGTTTC (d)
1277->1262                (1061->1047)               CGGCCATGCACCACC (d)
1719->1705                (1406->1392)               ACGGGCGGTGTGTRC (d)
1860->1845                (1526->1511)               CYGCAGGTTCACCTAC (d)
NOTE.-The locations of phylogenetically conserved sequences in eukaryotic small-subunit ribosomal RNAs
were identified in comparisons of five eukaryotic and 20 prokaryotic small-subunit ribosomal RNA sequences
as well as from T1 oligonucleotide catalogues representing more than 200 prokaryotic organisms.
(a) Nucleotide positions of the synthetic DNA oligomers in eukaryotic small-subunit ribosomal RNAs as represented by Dictyostelium discoideum.
(b) Analogous nucleotide positions of the synthetic DNA oligomers in prokaryotic small-subunit ribosomal
RNAs as represented by E. coli.
(c) Synthetic DNA oligonucleotides complementary to evolutionarily conserved regions of the coding strand
of eukaryotic small-subunit ribosomal RNA genes.
(d) Synthetic DNA oligonucleotides complementary to evolutionarily conserved regions of the noncoding strand
of eukaryotic small-subunit ribosomal RNA genes.
FIG. 1.-Restriction map and strategies used to determine the DNA sequences of the Stylonychia
pustulata and Oxytricha nova small-subunit ribosomal RNA genes. The small-subunit ribosomal RNA
coding region for S. pustulata resides within a 3.12-Kb HindIII restriction fragment (top panel) and that for
O. nova within a 3.27-Kb HindIII restriction fragment (bottom panel). Synthetic DNA oligomers that are
complementary to evolutionarily conserved regions in eukaryotic small-subunit RNAs were used to prime
the dideoxynucleotide chain termination sequencing protocols. The arrows indicate the extent of sequence
data read from a particular primer site. The locations of the primers in eukaryotic small-subunit rRNAs, as
represented by Dictyostelium discoideum, are indicated on the arrows, and the analogous positions in the
prokaryotic small-subunit rRNAs, as represented by E. coli, are included in parentheses.
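Several of the primers in table 1 contain degenerate positions (R, Y, and K), which follow the standard IUPAC nucleotide codes. As an illustration only, the following Python sketch expands a degenerate primer into the unambiguous sequences it encodes and forms its reverse complement. The function names are hypothetical and are not part of the methods described in this paper.

```python
from itertools import product

# Standard IUPAC nucleotide codes; only R, Y, and K occur in table 1.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "N": "ACGT",
}

# Complement of each code, e.g. R (A or G) pairs with Y (T or C).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "S": "S", "W": "W", "N": "N",
}


def expand(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in primer))]


def reverse_complement(primer):
    """Reverse-complement a primer, keeping degenerate codes degenerate."""
    return "".join(COMPLEMENT[base] for base in reversed(primer))


def matches(specific, degenerate):
    """True if `specific` is one of the sequences allowed by `degenerate`."""
    return len(specific) == len(degenerate) and all(
        s in IUPAC[d] for s, d in zip(specific, degenerate)
    )
```

For example, the reverse complement of the coding-strand primer at 892->906, YAGAGGTGAAATTCT, is AGAATTTCACCTCTR, and the noncoding-strand primer listed at 906->892 (AGAATTTCACCTCTG) is one of the two sequences that this degenerate form allows.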
Similarity Calculations and Tree Construction
The ribosomal RNA sequences were aligned using a nonrigorous procedure that
considers the phylogenetic conservation of both primary- and secondary-structural
features (McCarroll et al. 1983). Initially, short subregions of identical or similar primary structure in approximately homologous positions were aligned for the sequences
shown in figure 2 and for incomplete or unpublished sequences from Euplotes aediculatus and Paramecium tetraurelia (M.L.S., J. Gunderson, and H.J.E., unpublished
data). Alignment gaps were placed by eye to juxtapose regions of high similarity in
the various sequences. The procedure was repeated in order to detect regions of weaker
similarity. The alignments in regions of length variation were further refined by lining
up those secondary structures that appear to be evolutionarily conserved in all taxa
whose sequences have been examined. In any alignment procedure, the merits of
improved sequence similarity versus the introduction of alignment gaps must be
weighed. Our method of alignment is not rigorously defined, but we believe that the
positions aligned by this procedure have a higher probability of being in homologous
alignment than those aligned by maximizing the number of matching nucleotides.
FIG. 2.-Sequence of the Stylonychia pustulata and Oxytricha nova small-subunit ribosomal RNA
coding regions aligned with other small-subunit rRNAs. The sequences of the S. pustulata and the O. nova
small-subunit rRNAs are shown aligned with those from Dictyostelium discoideum (McCarroll et al. 1983),
Saccharomyces cerevisiae (Rubtsov et al. 1980; Mankin et al. 1981), and E. coli (Brosius et al. 1978). The S.
pustulata and O. nova sequences were determined as described under Experimental Procedures using the
sequencing strategies depicted in fig. 1. Initially the small-subunit sequences shown as well as those from
Xenopus laevis (Salim and Maden 1981), Zea mays (Messing et al. 1984), rat (Chan et al. 1984), rice
(Takaiwa et al. 1984), Tetrahymena thermophila (Spangler and Blackburn 1985), Paramecium tetraurelia,
and Euplotes aediculatus (M.L.S., J. Gunderson, and H.J.E., unpublished data) were aligned according to
primary structure. The locations of evolutionarily conserved structures were then used to refine the alignment
where length variation occurred. The differences in sequence length were compensated by introducing appropriate gaps (-) in the sequences. A number system for the aligned sequences, as well as number systems
for each sequence, is provided.
We will employ the word “structural” to distinguish similarity defined using our type
of alignment from those using maximal matches. We define structural similarity, s,
as
s = m/(m + u + g/2),
where m is the number of sequence positions with matching nucleotides in the two
sequences, u is the number of sequence positions with nonmatching nucleotides, and
g is the number of sequence positions that have a gap in one sequence opposite a
nucleotide in the other sequence. A special case is the occurrence of large insertions
and deletions. These events are likely to be the result of single rare events rather than
of the compounding of large numbers of single nucleotide events. Therefore, only the
first five sequence gaps in a string of gaps were counted in determining g.
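The calculation of s, including the five-position limit on long gaps, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions and is not the FORTRAN program used in this work: the function name, the "-" gap symbol, and the decision to skip columns that are gapped in both sequences (without interrupting a gap run) are all assumptions made for the example.

```python
def structural_similarity(seq_a, seq_b, gap="-", max_gap_run=5):
    """Compute s = m / (m + u + g/2) for two aligned sequences.

    m: positions with matching nucleotides
    u: positions with nonmatching nucleotides
    g: positions with a gap opposite a nucleotide, counting only the
       first `max_gap_run` positions of each uninterrupted run of gaps
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    m = u = g = 0
    run = 0           # length of the current run of gap positions
    run_owner = None  # which sequence carries the gap in the current run

    for a, b in zip(seq_a, seq_b):
        a_gap, b_gap = a == gap, b == gap
        if a_gap and b_gap:
            # Assumption: a column gapped in both sequences carries no
            # pairwise information and does not interrupt a gap run.
            continue
        if a_gap or b_gap:
            owner = "a" if a_gap else "b"
            run = run + 1 if owner == run_owner else 1
            run_owner = owner
            if run <= max_gap_run:
                g += 1
        else:
            run, run_owner = 0, None
            if a == b:
                m += 1
            else:
                u += 1

    denominator = m + u + g / 2
    if denominator == 0:
        raise ValueError("no comparable positions")
    return m / denominator
```

With this rule, an eight-position deletion contributes the same amount to the denominator as a five-position deletion, so a single large insertion or deletion does not dominate the similarity value.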
Pairwise comparisons of all homologous nucleotide positions for the small-subunit
ribosomal RNA sequences shown in figure 2 as well as those of Zea mays (Messing
et al. 1984), rice (Takaiwa et al. 1984), Xenopus laevis (Salim and Maden 1981),
Halobacterium volcanii (Gupta et al. 1983), and rat (Chan et al. 1984) were used to
compute similarity values. If one treats the structural similarities as representing the
fraction of sites that are identical, then h = 1 - s is the fraction of sites that are different
and may be used in the formula of Jukes and Cantor (1969) to get a corresponding
structural distance expressed in terms of nucleotide substitutions/site. Both the similarity values and the computed structural distances are presented in table 2.
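The conversion from similarity to distance can be sketched as follows, using the standard form of the Jukes and Cantor correction, d = -(3/4) ln[1 - (4/3)h] with h = 1 - s. The function name is hypothetical, and the sketch is offered only as an illustration of the arithmetic.

```python
import math


def jukes_cantor_distance(s):
    """Structural distance (substitutions/site) from structural similarity s.

    Uses d = -(3/4) * ln(1 - (4/3) * h), where h = 1 - s is the fraction
    of sites that differ.
    """
    h = 1.0 - s
    argument = 1.0 - (4.0 / 3.0) * h
    if argument <= 0.0:
        # At h >= 0.75 the sequences are indistinguishable from random
        # and the correction is undefined.
        raise ValueError("similarity too low for the Jukes-Cantor correction")
    return -0.75 * math.log(argument)
```

As a check on the arithmetic, a similarity of 0.977 gives a distance of about 0.023, and a similarity of 0.985 gives about 0.015, which agree with the corresponding entries of table 2.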
The structural distances were converted to phylogenetic trees by a variation of
the method of Fitch and Margoliash (1967). The evaluation of alternative phylogenetic
trees was based on the agreement of the structural distance data separating pairs of
organisms and the sum of the tree segment lengths joining the organisms in the tree.
The difference between the sum of the tree segment lengths and the structural distance
for each pair of organisms was squared. This error was divided by the variance of the
structural distance estimate (Olsen 1983).
The variance σ² is defined as
σ² = s(1 - s)/{n[1 - (4/3)h]²},
where σ² is the variance, s is the structural similarity, and n = m + u + g/2 (Kimura
and Ohta 1972; Hori and Osawa 1977). The summation of weighted errors for all
pairs of organisms is defined as the tree error. For a given tree topology, the tree
segment lengths that minimize the tree error were determined. If a topology yielded
a negative length segment (negative length segments are mathematical artifacts that
have no evolutionary meaning), the tree error was penalized by a factor of 3 for every
negative length segment in the tree.
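The weighting scheme can be illustrated with a small Python sketch. It assumes the variance takes the form h(1 - h)/(n[1 - (4/3)h]²) with h = 1 - s, and it reads "penalized by a factor of 3" as multiplying the tree error by 3 once for each negative segment; both the function names and that reading of the penalty are assumptions made for this example.

```python
def distance_variance(s, n):
    """Variance of the structural distance estimate.

    s: structural similarity; n = m + u + g/2 compared positions.
    Assumed form: h(1 - h) / (n * [1 - (4/3)h]^2), with h = 1 - s.
    """
    h = 1.0 - s
    return h * (1.0 - h) / (n * (1.0 - (4.0 / 3.0) * h) ** 2)


def tree_error(pairs, segment_lengths, negative_penalty=3.0):
    """Sum of variance-weighted squared errors over all pairs of organisms.

    pairs: iterable of (tree_distance, structural_distance, variance),
           where tree_distance is the sum of the segment lengths joining
           the two organisms in the tree.
    segment_lengths: all segment lengths in the tree; each negative one
           multiplies the error by `negative_penalty` (an assumption
           about how the penalty is applied).
    """
    error = sum((tree_d - obs_d) ** 2 / var for tree_d, obs_d, var in pairs)
    n_negative = sum(1 for length in segment_lengths if length < 0)
    return error * negative_penalty ** n_negative
```

Dividing by the variance means that large distances, which are estimated less precisely, contribute less to the tree error than small distances with the same absolute discrepancy.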
Determining the tree geometry and branch lengths that best fit the structural-similarity data is an optimization problem of considerable magnitude (there are
2 × 10⁶ possible unrooted trees for 10 organisms). Because it is not practical to test
all possible trees, we have used an algorithm in which the effects of a given set of
rearrangements on a given phylogenetic tree are tested, and then the best of all tested
alternatives (i.e., the most improved tree) is maintained and is used as the starting
point for another round of optimization. Two simple classes of tree rearrangements
are tested by the optimization algorithm (Olsen 1983). Both regard the current tree
as sets of subtrees connected by segments. A subtree can range from a single sequence
to N - 3 sequences, where N is the number of organisms represented in the tree. A
subtree can be moved to a new location by removing its nearest node from the tree
and inserting this node into an alternative tree segment. In the first class of rearrangements, the effect of moving each possible subtree (one at a time) to every alternative
location in the tree is systematically tested. The second class of rearrangements tested
involves interchanging the locations of a pair of subtrees. The effect of all possible
pairwise interchanges of subtrees (one pair at a time) is tested. The rearrangement
that leads to the most improved tree is used as the starting point for a new round of
optimization. For 10 organisms there are on the order of 250 independent trees per
round of optimization. The number of rounds of optimization required for a convergent
solution depends on the topology of the initial tree. Computer programs for calculating
similarities and structural distances and for implementing our phylogenetic-tree evaluation and optimization algorithm have been written in FORTRAN for execution on
the Digital VAX 11/750. The program is useful for trees with fewer than 30 organisms.
It appears to have successfully found the optimum of all sequence sets tested (as
evaluated by the “optimal” tree being independent of the initial tree).
Table 2
Structural Similarity and Distance between Small-Subunit
Ribosomal RNA Gene Sequences
                 STRUCTURAL SIMILARITY (s) TO
ORGANISM (a)     Rat    X. lae Rice   Z. may S. cer S. pus O. nov D. dis H. vol E. col
Rat              ...    0.958  0.794  0.790  0.777  0.742  0.742  0.676  0.539  0.493
X. laevis        0.044  ...    0.797  0.795  0.778  0.744  0.745  0.691  0.542  0.494
Rice             0.240  0.236  ...    0.977  0.819  0.805  0.806  0.720  0.550  0.493
Z. mays          0.247  0.240  0.023  ...    0.820  0.809  0.807  0.720  0.552  0.495
S. cerevisiae    0.265  0.263  0.207  0.206  ...    0.817  0.818  0.729  0.551  0.501
S. pustulata     0.316  0.313  0.225  0.221  0.209  ...    0.985  0.734  0.559  0.515
O. nova          0.316  0.312  0.224  0.223  0.208  0.015  ...    0.737  0.557  0.517
D. discoideum    0.425  0.398  0.350  0.350  0.337  0.328  0.324  ...    0.547  0.503
H. volcanii      0.715  0.706  0.687  0.682  0.684  0.666  0.670  0.696  ...    0.605
E. coli          0.845  0.843  0.845  0.840  0.820  0.780  0.774  0.815  0.561  ...
NOTE.-The upper-right half of the table gives s values for all pairs of aligned small-subunit rRNA sequences. If 1 - s is considered to be the fraction of sites that differ, the formula of Jukes
and Cantor (1969) can be used to compute the structural distances (average number of base changes per sequence position), which are shown in the lower-left half of the table.
(a) Sequence data from rat (Chan et al. 1984), X. laevis (Salim and Maden 1981), rice (Takaiwa et al. 1984), Z. mays (Messing et al. 1984), S. cerevisiae (Rubtsov et al. 1980), D. discoideum (McCarroll
et al. 1983), H. volcanii (Gupta et al. 1983), and E. coli (Brosius et al. 1978).
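The size of the search space and the round-by-round strategy described above can be illustrated with a short Python sketch. The count of unrooted bifurcating trees uses the standard product 3 × 5 × ... × (2N - 5). The `optimize` function is a generic best-improvement search in which `neighbors` and `error` are hypothetical callables standing in for the two classes of subtree rearrangements and the tree error; it is not the FORTRAN implementation.

```python
def unrooted_tree_count(n_taxa):
    """Number of distinct unrooted bifurcating trees for n_taxa organisms."""
    if n_taxa < 3:
        raise ValueError("at least three organisms are required")
    count = 1
    for k in range(3, 2 * n_taxa - 5 + 1, 2):
        count *= k  # 3 * 5 * ... * (2N - 5)
    return count


def optimize(start, neighbors, error):
    """Best-improvement search over tree rearrangements.

    start:     initial tree (any object)
    neighbors: callable returning the rearranged trees to test in one round
    error:     callable returning the tree error of a tree
    Each round keeps the most improved alternative and stops when no
    tested rearrangement lowers the error.
    """
    current, current_error = start, error(start)
    while True:
        best = min(neighbors(current), key=error, default=None)
        if best is None or error(best) >= current_error:
            return current
        current, current_error = best, error(best)
```

For 10 organisms `unrooted_tree_count` returns 2,027,025, in line with the figure of 2 × 10⁶ quoted in the text. Because the search only accepts improvements, it can in principle stop at a local optimum, which is why independence of the result from the initial tree is a useful check.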
The computer-assisted optimization algorithm described above was used to infer
the phylogenetic tree shown in figure 3 from the similarity and structural distance
data presented in table 2. This optimized tree (tree error = 0.30) is consistent with S.
pustulata and O. nova being closely related but places the divergence of D. discoideum
from the eukaryotic line of descent prior to the branching of these ciliates. The tree
with the second-lowest error (tree error = 0.32) was constructed by interchanging the
branching order of S. cerevisiae and the ciliates. We interpret the slight difference in
tree errors and the small negative segment (-0.01) between the ciliate and the fungal
branchings in the second-best tree to mean that the branching order for these two
groups is not statistically significant.
Discussion
Ciliates are unicellular heterokaryotic organisms that have cilia at some point in
their life cycle. Classification schemes for these organisms are generally based on characterization of ciliature and infraciliature. Corliss (1979) has proposed a taxonomy
that includes 23 orders divided among three major classes. The hypotrichous ciliates
Stylonychia pustulata and Oxytricha nova are members of the Polyhymenophora and
are considered to be members of the suborder Sporadotrichina. As shown in our
inferred phylogeny (fig. 3), the S. pustulata/O. nova ribosomal RNA distance is consistent with this placement and is comparable to the divergence between rice and
Zea mays.
FIG. 3. Inferred phylogenetic tree. The evolutionary distance between nodes of the tree is given alongside the segment connecting
them and is represented in the horizontal component of their separation.
As in the case of Dictyostelium discoideum, the ciliophora are taxonomically
treated as members of the protoctista and are considered to be a very ancient phylum.
Our phylogeny indicates that the emergence of the ciliophora was preceded by the
rhizopodea as represented by D. discoideum, thus supporting the notion that the protoctista are not monophyletic. In fact, based on the structural distances of ribosomal
RNA shown in table 2, the evolutionary distance between these two major protoctistan
groups is comparable to the evolutionary distance between plants and D. discoideum
or to that between animals and D. discoideum. We have previously argued that the
large distance between the rRNA of D. discoideum and published sequences from
other eukaryotes represents an early branching in the eukaryotic line of descent rather
than an unusually high rate of genetic drift (fast evolutionary-clock speed) or convergent
evolution in the rRNAs of other eukaryotes (see McCarroll et al. 1983).
Finally, the phylogeny shown in figure 3 suggests that the early branching of D.
discoideum was followed by a radiative period that gave rise to the animals, the plants,
the fungi, and the ciliates. It will be of interest to characterize other representatives of
the protoctista to determine whether major phylogenetic groups radiated from the
eukaryotic line of descent at a similar time.
Acknowledgements
This research was supported by grant GM32964 from the National Institutes of
Health.
LITERATURE CITED
BROSIUS, J., M. L. PALMER, P. J. KENNEDY, and H. F. NOLLER. 1978. Complete nucleotide
sequence of a 16S ribosomal RNA gene from Escherichia coli. Proc. Natl. Acad. Sci. USA
75:4801-4805.
CHAN, Y., R. GUTELL, H. F. NOLLER, and I. G. WOOL. 1984. The nucleotide sequence of a
rat 18S ribosomal ribonucleic acid gene and a proposal for the secondary structure of 18S
ribosomal ribonucleic acid. J. Biol. Chem. 259:224-230.
CORLISS, J. O. 1979. The ciliated protozoa: characterization, classification, and guide to literature,
2nd ed. Pergamon, New York. 455 pp.
FITCH, W. M., and E. MARGOLIASH. 1967. Construction of phylogenetic trees. Science 155:
279-284.
FOX, G. E., E. STACKEBRANDT, R. B. HESPELL, J. GIBSON, J. MANILOFF, T. A. DYER, R. S.
WOLFE, W. E. BALCH, R. S. TANNER, L. J. MAGRUM, L. B. ZABLEN, R. BLAKEMORE, R.
GUPTA, L. BONEN, B. J. LEWIS, D. A. STAHL, K. R. LUEHRSEN, K. N. CHEN, and C. R.
WOESE. 1980. The phylogeny of prokaryotes. Science 209:457-463.
GUPTA, R., J. M. LANTER, and C. R. WOESE. 1983. Sequence of the 16S ribosomal RNA from
Halobacterium volcanii, an archaebacterium. Science 221:656-659.
HASAN, G., J. J. TURNER, and J. S. CORDINGLEY. 1982. Ribosomal RNA genes of Trypanosoma
brucei: cloning of a rRNA gene containing a mobile element. Nucleic Acids Res. 10:6747-6761.
HASSUR, S. M., and H. W. WHITLOCK. 1974. UV shadowing - a new and convenient method
for the location of ultraviolet-absorbing species in polyacrylamide gels. Anal. Biochem. 59:
162-164.
HORI, H., and S. OSAWA. 1979. Evolutionary change in 5S RNA secondary structure and a
phylogenic tree of 54 5S RNA species. Proc. Natl. Acad. Sci. USA 76:381-385.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N.
MUNRO, ed. Mammalian protein metabolism. Academic Press, New York.
KIMURA, M., and T. OHTA. 1973. On the stochastic model for estimation of mutational distances
between homologous proteins. J. Mol. Evol. 2:87-90.
MANIATIS, T., E. F. FRITSCH, and J. SAMBROOK. 1982. Large-scale isolation of plasmid DNA.
Pp. 86-94 in Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory, Cold
Spring Harbor, N.Y.
MANKIN, A. S., A. M. KOPYLOV, and A. A. BOGDANOV. 1981. Modification of 18S rRNA in
the 40S ribosomal subunit of yeast with dimethylsulfate. FEBS Lett. 134:11-14.
MARGULIS, L., and K. N. SCHWARTZ. 1982. Protoctista. P. 71 in P. BREWER, ed. Five kingdoms.
W. H. Freeman, New York.
MATTEUCCI, M. D., and M. H. CARUTHERS. 1981. Synthesis of deoxyoligonucleotides on a
polymer support. J. Am. Chem. Soc. 103:3185-3191.
MCCARROLL, R., G. J. OLSEN, Y. D. STAHL, C. R. WOESE, and M. L. SOGIN. 1983. Nucleotide
sequence of Dictyostelium discoideum small-subunit ribosomal ribonucleic acid inferred
from the gene sequence: evolutionary implications. Biochemistry 22:5858-5868.
MESSING, J. 1983. New M13 vectors for cloning. Pp. 20-78 in R. WU, L. GROSSMAN, and K.
MOLDAVE, eds. Methods in enzymology: recombinant DNA. Part C. Academic Press, New
York.
MESSING, J., J. CARLSON, G. HAGEN, I. RUBENSTEIN, and A. OLESON. 1984. Cloning and
sequencing of the ribosomal RNA genes in maize: the 17S region. DNA 3:31-40.
OLSEN, G. J. 1983. Comparative analysis of nucleotide sequence data. Ph.D. thesis. University
of Colorado, Boulder.
OZAKI, T., Y. KOSHIKAWA, Y. IIDA, and M. IWABUCHI. 1984. Sequence analysis of the transcribed
and 5’ non-transcribed regions of the ribosomal RNA gene in Dictyostelium discoideum.
Nucleic Acids Res. 12:4171-4184.
PANET, A., J. H. VAN DE SANDE, P. C. LOEWEN, H. G. KHORANA, A. J. RAAE, J. R. LILLEHAUG,
and K. KLEPPE. 1973. Physical characterization and simultaneous purification of bacteriophage T4 induced polynucleotide kinase, polynucleotide ligase, and deoxyribonucleic acid
polymerase. Biochemistry 12:5045-5050.
RUBTSOV, P. M., M. M. MUSAKHANOV, V. M. ZAKHARYEV, A. S. KRAYEV, K. G. SKRYABIN,
and A. A. BAYEV. 1980. The structure of the yeast ribosomal RNA genes. I. The complete
nucleotide sequence of the 18S ribosomal RNA gene from Saccharomyces cerevisiae. Nucleic
Acids Res. 8:5779-5794.
SALIM, M., and B. E. H. MADEN. 1981. Nucleotide sequence of Xenopus laevis 18S ribosomal
RNA inferred from gene sequence. Nature 291:205-208.
SANGER, F., and A. R. COULSON. 1975. A rapid method for determining sequences in DNA
by primed synthesis with DNA polymerase. J. Mol. Biol. 94:441-448.
SPANGLER, E. A., and E. H. BLACKBURN. The nucleotide sequence of the 17S ribosomal RNA
gene of Tetrahymena thermophila and the identification of point mutations resulting in
resistance to the antibiotics paromomycin and hygromycin. J. Biol. Chem. 260:6334-6340.
SWANTON, M. T., R. M. MCCARROLL, and B. B. SPEAR. 1982. The organization of macronuclear
rDNA molecules of four hypotrichous ciliated protozoans. Chromosoma 85:1-9.
TAKAIWA, F., K. OONO, and M. SUGIURA. 1984. The complete nucleotide sequence of a rice
17S rRNA gene. Nucleic Acids Res. 12:5441-5448.
VOGELSTEIN, B., and D. GILLESPIE. 1979. Preparative and analytic purification of DNA from
agarose. Proc. Natl. Acad. Sci. USA 76:615-619.
WALTER M. FITCH, reviewing editor
Received January 23, 1985; revision received May 14, 1985.