Nucleic Acids Research
Volume 17 Number 19 1989
Euglena gracilis chloroplast ribosomal protein operon: a new chloroplast gene for ribosomal protein
L5 and description of a novel organelle intron category designated group III
David A.Christopher1 and Richard B.Hallick1,2*
1Department of Molecular and Cellular Biology and 2Department of Biochemistry, University of Arizona,
Tucson, AZ 85721, USA
Received July 20, 1989; Accepted August 18, 1989
ABSTRACT
We describe the structure (3840 bp) of a novel Euglena gracilis chloroplast ribosomal protein operon
that encodes the five genes rpl16-rpl14-rpl5-rps8-rpl36. The gene organization resembles the spc
and the 3'-end of the S10 ribosomal protein operons of E. coli. The rpl5 gene is a new chloroplast gene
not previously reported for any chloroplast genome to date and also not described as a nuclear-encoded,
chloroplast protein gene. The operon contains at least 7 introns. We present evidence from primer
extension analysis of chloroplast RNA for the correct in vivo splicing of five of the introns. Two
of the introns within the rps8 gene flank an 8 bp exon, the smallest exon yet characterized in a
chloroplast gene. Three introns resemble the classical group II introns of organelle genomes. The
remaining 4 introns appear to be unique to the Euglena chloroplast DNA. They are uniform in size
(95-109 nt), share common features with each other and are distinct from both group I and group
II introns. We designate this new intron category as 'group III'.
INTRODUCTION
Of the estimated 60 different ribosomal proteins that comprise the prokaryotic-like 70S
ribosomes of chloroplasts (1), genes for 20 are located on the tobacco (2) and liverwort
(3) chloroplast genomes. The remainder appear to be encoded by the nuclear genome,
and synthesized as precursor polypeptides on cytoplasmic ribosomes (4,5). Many of the
chloroplast-encoded ribosomal protein genes are organized in clusters that resemble the
S10, spc and alpha ribosomal protein operons of E. coli (6). Some ribosomal protein genes
have been shown to be transcribed (7,8,9), trans-spliced (10,11,12) and to encode proteins
present in ribosomes in vivo (13,14,15). However, the molecular events that coordinate
the expression of the chloroplast and nuclear-encoded ribosomal constituents during
ribosome biogenesis are poorly understood.
In the photosynthetic protist, Euglena gracilis, 11 chloroplast ribosomal protein genes
have been characterized, including rps7-rps12 (16), rpl20 (17), rpl23-rpl2-rps19-rpl22-rps3
(18), rps14 (19), and rps4-rps11 (Stevenson, J. and Hallick, R.B., manuscript in
preparation). In the present paper, we describe a novel chloroplast ribosomal protein operon
that contains the five genes, rpl16-rpl14-rpl5-rps8-rpl36. The rpl5 locus is a new chloroplast
gene, not previously reported for any chloroplast genome and also not described as a
nuclear-encoded, chloroplast protein gene. This operon is interrupted by at least 7 introns. Two
introns within the rps8 gene flank an 8 bp exon, the smallest yet characterized in a chloroplast
gene. Three introns are relatively small examples of the well-known group II intron (20)
category found in both chloroplast and mitochondrial genomes. The remaining 4 introns
belong to a previously described class of small introns unique to the Euglena chloroplast
DNA and occurring in several ribosomal protein genes (18). Evidence is presented that
these introns comprise a new category of organelle intron, the members of which are highly
related to each other, but distinct from both group I and group II introns. We designate
this new intron category 'group III'.
MATERIALS AND METHODS
Molecular Cloning and DNA Sequencing
Chloroplast DNA from photoautotrophically-grown E. gracilis strain Z was prepared as
described (21). The 3.3 kb chloroplast EcoRI restriction fragment EcoM (Fig. 2) was
isolated by agarose gel electrophoresis (22) and cloned in both orientations in the EcoRI
site of pBS-(Blue Scribe-, Stratagene Cloning Systems, Inc.). The resulting plasmids are
designated pEZC541.1 and pEZC541.2. A plasmid library of chloroplast BglII restriction
fragments of 3.0-5.0 kb in size cloned in the BamHI site of pBS- was constructed with
E. coli XL-1 Blue as host. Replica filters of the library were screened by colony
hybridization (23) with 32P-labeled DNA probes from purified EcoM fragment, and also
from a 600 bp DNA fragment encoding the E. gracilis 3'-end of rps14 and the complete
trnF and trnC genes (19). Plasmid pEZC942, which contains the 4.2 kb E. gracilis
chloroplast BglII fragment Bgl12 (Fig. 2) that overlaps both EcoM and EcoRI fragment
EcoA (24), was selected. The 2.0 kb EcoRI/BglII fragment of pEZC942 was subcloned
into pBS- and is designated pEZC942.20. Plasmid pEZC948.1, which contains the 3.5
kb BglII fragment BglM (Fig. 2) located entirely within EcoA, was also selected from
the library. The plasmid pEZC948.2, which contains the fragment BglM in the opposite
orientation to that of plasmid pEZC948.1, was kindly provided by G. Yepiz-Plascencia
of this laboratory. The adjacent BglII fragments enabled DNA sequence analysis across
the EcoM-EcoA EcoRI restriction sites.
Double-stranded plasmid DNAs were purified on cesium chloride gradients after alkaline
lysis (23) and linearized by double digestion with PstI and XbaI (for pEZC942 and
pEZC948) or with PstI and BamHI (for pEZC541.2 and pEZC942.20). Overlapping
deletion subclones (25) were generated as in (18), except 7-10 units of exonuclease III/µg
DNA was used to digest from either the BamHI or XbaI ends of the linearized plasmid
DNAs. Single-stranded template DNA was prepared using the helper phage M13K07 (26).
Templates for 100% of both DNA strands were sequenced via the [α-35S]-dATP dideoxy
chain termination method (27), with Sequenase (United Biochemical Co.) using standard
T3 and M13 primers.
DNA Sequence Analysis
Analysis of sequence data was performed on IBM PC-XT and PC-AT computers using
the DNA and protein analysis programs of Mount and Conrad (28). Protein open reading
frames were aligned with known polypeptide sequences compiled in the Protein Identification
Resource (P.I.R.) of the National Biomedical Research Foundation, Georgetown University,
using the FASTA search algorithm described in (29). Multiple protein alignments were
prepared with the program described in (30).
RNA Isolation
Purified chloroplasts from photoautotrophically grown E. gracilis cultures were resuspended
in lysis buffer (0.5% SDS, 10 mM Tris-HCl pH 7.5, 1 mM EDTA, 5 mM DTT), extracted
three times with equal volumes of phenol saturated with 10 mM Tris-HCl pH 7.5, once
with phenol-chloroform-isoamyl alcohol (1:1:24), once with ether and then ethanol
precipitated. The resulting nucleic acid was digested with RQ1 RNase-free DNase (Promega
Biotechnology), at 1 unit/µg DNA, 15 min., 37°C. After phenol extraction and ethanol
precipitation, the chloroplast RNA was subjected to electrophoresis on 0.66 M formaldehyde
agarose gels (31).
RNA Hybridization
32P-labeled synthetic RNAs complementary to the rpl16-rpl14-rpl5 transcripts were
prepared from linearized DNA templates (32) and used for probing membrane filter
(Northern) blots of chloroplast RNA fractionated on 0.66 M formaldehyde/1% agarose
gels (31). Separated RNAs were transferred to Genescreen filters (Dupont Co.).
Prehybridization was done at 55°C for 12 hrs in 50% deionized formamide, 5×SSC, 0.5%
SDS, 20 mM NaPO4 pH 6.8, 0.5 mg/ml Ficoll, 0.5 mg/ml polyvinyl pyrrolidone, 0.5
mg/ml BSA, and 200 µg/ml heat denatured herring sperm DNA. Hybridization buffer
and conditions were as for pre-hybridization, with the addition of probe [2×10^7 dpm]
32P-labeled RNA. The membrane filters were washed in 0.5% SDS, 0.1×SSC and 20 mM
NaPO4 pH 6.8, twice at 55°C for 1 hr and once at 65°C for 15 min. Autoradiography was
done with 12-16 hr exposures on Kodak X-omat-AR X-ray film.
Primer Extension cDNA Sequence Analysis
The purified oligo-deoxynucleotide primers, 5'-TCGGCTATATTTACCGTAAG for rps8
exon 3 (positions 3008-3027, Fig. 1) and 5'-TCCAAAGTTTGCCCCCACGT for rpl16
exon 4 (positions 885-904, Fig. 1), were synthesized at the University of Arizona
Biotechnology Center. The primers were 5'-end labeled with T4-polynucleotide kinase
(23). A total of 8×10^6 dpm of 32P-oligonucleotide primer was co-precipitated with 12
µg chloroplast RNA and dissolved in 10 µl 200 mM KCl, 10 mM Tris-HCl (pH 8.3 at
43°C). The mixture was heated to 85°C for 3 min. and quickly cooled on ice. The mixture
was then placed at 45°C for 1 hr and slowly cooled to room temperature. The primer
extension reactions were carried out in a series of five 7 µl reaction mixtures containing
4 units of AMV reverse transcriptase (Bethesda Research Labs), 2 µl of annealing mixture,
1.5 µl of 5× reverse transcriptase buffer (100 mM Tris-HCl pH 8.3, 50 mM MgCl2,
25 mM DTT, 280 µg/ml actinomycin D), 300 µM of each dATP, dTTP, dCTP and dGTP
and 160 µM of one of the four dideoxyribonucleotides, except in the N-reaction which
lacked ddNTPs. After incubation at 43°C for 45 min., 5 µl of loading dye (96% formamide,
10 mM EDTA, 0.1% xylene cyanol, 0.1% bromophenol blue) was added. After heating
to 85°C for 2 min., the samples were electrophoresed through 0.25 mm thick 6%
polyacrylamide gels containing 7 M urea, 89 mM Tris-borate (pH 8.2) and 2 mM EDTA
and visualized by autoradiography. Additional control primer extension reactions were
done as above except the 5'-32P-labeled primer was annealed to 1 µg of a genomic
chloroplast DNA clone (a single-stranded recombinant phagemid, pEZC541.2).
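Because each sequencing primer must anneal to the mRNA, it should be the reverse complement of the RNA-like strand shown in Fig. 1 over the stated coordinates. The following is a minimal sketch of that check, using only the two primer sequences and coordinate ranges quoted above; the function and variable names are illustrative assumptions, not part of the original methods.

```python
# Primer sequences and coordinates are quoted from the Methods text.
# All names (reverse_complement, PRIMERS, primer_targets) are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# name -> (primer 5'->3', first position, last position on the Fig. 1 strand)
PRIMERS = {
    "rps8 exon 3": ("TCGGCTATATTTACCGTAAG", 3008, 3027),
    "rpl16 exon 4": ("TCCAAAGTTTGCCCCCACGT", 885, 904),
}


def primer_targets(primers):
    """Map each primer to the RNA-like strand sequence it should anneal to."""
    targets = {}
    for name, (seq, start, end) in primers.items():
        # An inclusive range start..end must contain end - start + 1 bases.
        if len(seq) != end - start + 1:
            raise ValueError(f"{name}: primer length does not match coordinates")
        targets[name] = reverse_complement(seq)
    return targets


if __name__ == "__main__":
    for name, target in primer_targets(PRIMERS).items():
        print(f"{name}: expected RNA-like strand 5'-{target}-3'")
```

The returned sequences can be compared by eye with the Fig. 1 sequence at the stated positions.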
RESULTS
Characterization of the rpl16-rpl14-rpl5-rps8-rpl36 Chloroplast Ribosomal Protein Genes
We are interested in the organization, expression, RNA maturation pathways, and novel
introns of Euglena chloroplast ribosomal protein genes, and the evolutionary relationship
of these genes and introns to other comparable ribosomal protein operons (17,18,19,33).
The region under study is located in the E. gracilis chloroplast EcoRI fragments EcoM
and EcoA (24). Preliminary evidence for the presence of ribosomal protein genes on the
EcoM fragment of Euglena chloroplast DNA was obtained via Southern hybridization
experiments. EcoM hybridizes to tobacco chloroplast DNA fragments SalI-10 and 11, and
BamHI-7 and 10 which encode a cluster of ribosomal protein genes (33) (data not shown).
The DNA sequence of 3840 bp from the EcoM and EcoA fragments (24) of Euglena
*
100
TACCTAATTTAGAAMTTTATTATCTTTTATTTTCGTTGTTTTTATAGTCTTATATGTTAAGTCCTAAGCGAACGAAGTTTCGTAAATATCATAGAGGTAG
M L S P K R T K F R K Y H R G R
rpL16
200
AT TAACAGGTAAATCTATATGAGACT T TTT TTATGTAAAAATAAAATT T TTAT TGCGTT TTACCACGATAGT TAT T TTATT CGAT TTGAT TTAAGT GA
L T G K I Y
300
AATTTTTTATTTAATTGATAAMGTTGTTTTTGGTAATTATGCTTGTGCGATCTAAATAATTTAAMAAATTAATTATGAGTAAATTTAATTATTTTTTGTT
D K V V F G N Y A (L)
400
TAAGTTTTAAATTATT GTAATTAAATATCTTTAATAGCATTTTATTCTTAACGTAATMAAT GTTTAATAATTTTAAATT TTAAGTATTAAATATT TT TT T
500
GAGCTAATAAMMTTAAAATAATAAAAATACAT TTAATCTTTACTAAATTAAGTAGATTAT TATTAAAT TT T TTAT T TTATAT TAT GATATACAATT
600
TGATTTTGTACTTATTTTACTTTTTTTAAATTTATTAATTAAAMGCCAAMTGCATTACATTTTGCTTGTTTGGATTTTTTAAGGCATATTGTACTTGTTT
700
TACAATCATTAGAGCCTGGTTGGATAACTTCACGTCAAATTGAAGCTGCTTTGTGGTTTTTTGTGATTTCTAATAAATTTATATAGATTTTTTGCTTTCA
O
S
L
P G WJ
E
I
S R Q
T
E A A
I
800
AAGTTATTTAATAAMTTTTTATTGTTTATAAMcTTTTTTGGTTTTcGATAAAAAGGTTTTTTAAcTTATTTAcTTATMAGATGATTTTTCrTTMTTGTT
900
TTATTTTAAATAAATTTTTACTGTTTTCTTTTCTAAAAAAAAATTTTTTTATTTAATTCGTAGAGTTATTACAAGATATGCAAAACGTGGGGGCAAACTT
R R V I T R Y A K R G G K L
1000
TGGATAAGAATTTTTCCCGATAAGCCTGTAACATTTCGAGCAGCCGAAACTCGCATGGGTTCAGGGAAAGGAAMTGTAGAATATTGGGTTGCAATTGTAA
W I R I F P D K P V T F R A A E T R M G S G K G N V E Y W V A I V
1100
AACCTGGAAAAATTCTTTACGAAGTATTGGGTATTTCAGAATCCATTGCAAAGTATTCTTTAAAAATAGCAGGATATAMMATGCCTATTAAAACTCGTGT
G K
K P
I
L
Y
E
G
V L
I
S
S E
I
A K Y
S
L
I
K
A G Y
K M
P
I
K T
R V
1200
TAT TGTTAAMMTTTAAMACT GTTTAATGCTAATT GTT TT TGTTAATAATATT TT TGGTAAMAT TCTTAT TTT GGTAGT TCT TCGT TCTT TATAAT TTAT T
I V K I *
1300
T T TTAGT TTAGATGATTAAACCACAAACATATTTGAAAATTGCGGATAATACTGGAGCACAAAAAT TATGTGTAT TCGTATAT TAGGACCAAAT TGTCA
rpL14
M I K P 0 T Y L K I A D N T G A 0 K I M C I R I L G P N C 0
1400
GTATGCGAATATTGGTGATATAATAATAGCGGTTGTAAAATTGTGTATTAGAGTTAATAATTTATATATCTTTTTATGTTTTTATTATTTTATTATTCTT
A N
Y
G D
I
I
I
A V V K
I
1500
TTATATTGATATTGTTTATAAATTTATAAAGATTTTATCTATTTTAAAGAAGCTATTCCTAACATGGTTGTTMAAAMATCAGATATTGTTAAAGCTGTTA
E A
P N M V V K K
I
S D
I
V K A V
1600
TTGTTAGAACTGTTAAAGGAGTACGTAGAGAAAGTGGAATGGCAATTCGTTTTGACGAAAATGCTGCTGTCATMTTMTAATGATCGTTCACCTAAAGG
I
V R
V K G V R
T
E
R
S G M A
I
F D E
R
N A A V
I
N N D
I
R
P K
S
G
1700
TACAAGAATT T TCGGCCCCAT TGCTCGCGAATTGCGAGAAAAGGAATTCGTAAAATAATGTCCTTAGCGCCAGAGGT TGTGTGATACAATAAAGACT TC
T R I F G P I A R E L R E K E F V K I M S L A P E V V*
1800
TCATAATAAGTTGTTTTTCATTTTTAGCGTATATCTATATTTTCTTAATTTTTTGTATGTGTGTTTTTTTATTATAAMTTTTACTTTTATCATTCCTATT
1900
TATAGTTCCAAAMTATTTTTATTAAAATTTAGTGTAGTAGTAATATTTMAAMAGGATTTTATMAAAMAATGCAAAGATTAAATCGTTTTATTTAGAAAC
rpL5
M Q R L K S F Y L E T
2000
TATCAT TCCCAAACTTAAAGAAGAAT TTGGTTATGT TAATTCTTATAGGGTTCCTAAATTMAAAMAGAT TGT TATAAMTCGAGGATT TGAT GAAT CT TGT
I I P K L K E E F G Y V N S Y R V P K L K K I V I N R G F D E S C
2100
CAAAATTCAAAAATTTTGGAAGTTTTATTAAATGAATTAGAAATTATTTCTGGTCAAAMGCCTATTATAAGTAAGGCGAAAAAAGCTATTGCTAACTTTA
Q
S
N
K
I
L
E
V
L
L
N
E
L
E
I
I
S
G Q K
P
I
S
I
K A
K
A
K
I
A
N
F
2200
AACTTAAAGAAAAGATGCCTGTTGGTATGTTTTTGACTTTGCGTAGTGAAMGATGTACAGTTTTTTAGACCGGTTAATTAATCTATCTTTACCTAGMT
K
L
E
K
K M
P
V G M
F
T
L
L
R
S E
K M Y
S
F
L
D
R
L
I
N
L
S
L
P
R I
2300
TAGAGATTTTCAAGGAATAAACAAGAATTGTTTTGATGGATCAGGTAACTTTAGTTTTGGGTTAAGTGAACAATCAATGTTCCCTGMATTAACTTTGAT
R D
F O G
I
N
K N
C
F D
G
S G N
F S
F G
L
S E Q S M
F
P E
I
N
F D
2400
AAAATGATTAAAGTACAAGGTTTGAATATAACAATCGT TACAACTGCTGAAACGAATCAAGAAGCTTTCTTT CT TTTAMAAGAATTAGGTATCCCGTTCC
K M
I
K V Q G
L
N
I
T
I
V T
T A E
T
N Q E A
F
F
L
L
K E
L
G
I
P
F
2500
GAGATTAATTTTTATACTTTTTTTTACGTGTTAAGATGTACAAAATCTATTTTTTATTMTAGTAATTTTATTGTTTTATTTTTAGAATTTTATGACAAA
R D*
rps8
M T N
2600
TAT TGAT TTGCGAGT TTAAGTGTAATAATATATTATTAATAATTT T TTATAT TTAT TAT CTATATATATGT TGAAAT TCGGTAAATAAGAGCAT TAT TAA
I D
2700
ATATTTTGATAGTAGTATTTTTCTTTAACAAACCGTTTATTATTGTTTTGTAAAATTTATTCATTAGTTGGGAGATCACTTATTTTTATTACTATATTAA
2800
TATTACATAATTAAATTAGTTAAATATTAGTAAATCTTGAAATATTTAAATTTTATTTTTCAAAACTTTATGATTAATGGTTTCCTGTAGAGTTTGTTAA
2900
TAAATTGTTTTAAAATTTATAATTTAATTTAATGTAATTAGGTGTGATTTTTTTTATAGAAGTATAATTAMTTTTTGATTTTTATTAATATATATACAT
V I (S)
3000
ATTTTTATTCTTTTGTTAGGTTTTATTTAGCCTTATACGATATGCTTACAAGAATAAGAAACTCTCTTT TMTAAMAGCTAGAAAMGTTAATGTTATTAA
D M L T R
I
R N S L L
I
I N
3100
K A R K V N V
TACAAMACTTACGGTAAATATAGCCGAAMTTTTAAAAAAGAAGGATTTATTGACTCTTTTGAATTGGCTGACGCTACGTGTTTAACTGAAACGGTGTT
T K L T V N
I
A E
I
L K K E G F
I D S F E L A D A T C L T E N G V
3200
ATAAAAAAMTATATTACAATCTTTTTAAAMTATAAAGGTCCAAAMCAAGTTTCTTATATAACTAAATAAAACGTGTAAGCAAACCTGGTTTGCGTACTT
I
K K Y
I
T
I
F L K Y K G P K Q V S Y
I
T K
I
K R V S K P G L R T
3300
ATAGTAGTTATAAAMGACTACAATCAGTAGCAGGTGGCGTTGGTTTAACTGTTTGTGCGATATGTTTTTTGAGTTTCATAAGATAAAAMTTATAAGCAAT
Y S S Y K R L Q S V A G G V G L T V
3400
AATAAMTTTTTTATAATGATTTCCTTTATTATAATTTAGCAAGAATTTCTATTTTTGTATTCTAATGCATGTTTTTGTGAAAAACATAAAMTTTAATTCT
3500
TATTAT TCGTGAATTATT TTGAAATAT T T TTTAACGTAATT TT TTGTATTATTAT TATTATTTTTATTTGAGCCT TAT GTTGATTAT TAACTTGTATT GT
3600
T CTTAAMGAAMTTTTAAATTTAATTTTGCATGTCTACTTCTAAMGGATTAATGACTGATCGATTGGCTAGATCTAATAAMATTGGTGGGGAGATTCTGT
(L) S T S K G L N T D R L A R S N K I
G G E
I
L
3700
T TTATATTTGGTAAATAAACTAAAATTACCTCTTTTGTTTTATTATTTTTTATTAAAAGTTGTTTGATAATTTTTGTTTAATTAGACTTAATTTTTATG
rpL36
M
F Y I W
3800
AAAATACGTTCTTCTGTTAAAAAATTTGTAATAAATGTTATTTGATTCGTCGCAAAAMCAATCTTTTAGTTGTTTGTATAAATAACAAGCATAAACAAC
K I
R S S V K K I
C N K C Y L
I
R R K N N L L V V C
I
N N K H K Q
3840
GACAGGGTTAAACTTGTTTTGCGTCTATAATTTGTAGGCG
R Q G *
Figure 1. DNA sequence of the RNA-like strand for the Euglena chloroplast ribosomal protein loci rpl16, rpl14,
rpl5, rps8 and rpl36. Coding regions for exons are designated by the single letter amino acid code directly below
the second nucleotide of each triplet codon. Amino acid symbols that are in parentheses indicate a split codon.
Sequences which contain conserved nucleotides at the 5' and 3'-ends of each intron are underlined. Asterisks
(*) designate stop codons. Gene symbols follow the nomenclature described in (46).
gracilis chloroplast DNA is given in Fig. 1. This region was found to encode a cluster
of 5 ribosomal protein genes, with the gene organization rpl16-95 bp spacer-rpl14-183
bp spacer-rpl5-84 bp spacer-rps8-83 bp spacer-rpl36. Protein coding regions are interrupted
by introns in the genes for rpl16 (3 introns), rpl14 (1 intron), and rps8 (3 introns). The
cDNA sequence analysis of splice boundaries, and intron properties are described below.
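The operon layout described above can be written down as a small data structure, which makes it easy to confirm that the per-gene intron counts add up to the seven introns reported. This is only an illustrative sketch: gene order, spacer sizes and intron counts are taken from the text, while the names `OPERON`, `SPACERS_BP` and `describe_operon` are assumptions introduced here.

```python
# Gene order and intron counts as stated in the text (5' to 3').
OPERON = [
    ("rpl16", 3),
    ("rpl14", 1),
    ("rpl5", 0),
    ("rps8", 3),
    ("rpl36", 0),
]

# Intergenic spacers (bp) between consecutive genes, in the same order.
SPACERS_BP = [95, 183, 84, 83]


def describe_operon(operon, spacers):
    """Build the 'gene-spacer-gene' string used in the text."""
    if len(spacers) != len(operon) - 1:
        raise ValueError("need exactly one spacer between each pair of genes")
    parts = []
    for index, (gene, _introns) in enumerate(operon):
        parts.append(gene)
        if index < len(spacers):
            parts.append(f"{spacers[index]} bp spacer")
    return "-".join(parts)


TOTAL_INTRONS = sum(introns for _gene, introns in OPERON)
TOTAL_SPACER_BP = sum(SPACERS_BP)

if __name__ == "__main__":
    print(describe_operon(OPERON, SPACERS_BP))
    print("introns:", TOTAL_INTRONS, "| spacer DNA (bp):", TOTAL_SPACER_BP)
```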
The overall organization of the exons, introns, and intergenic regions for the 5 ribosomal
protein genes is shown in Fig. 2. This region is located 2.8 kbp distal to, and in the same
polarity as, the previously described rpl23-rpl2-rps19-rpl22-rps3 Euglena chloroplast
ribosomal protein gene cluster (18). Immediately downstream from the 5 genes described
in this paper, and in the same polarity are the genes trnI-rps14-trnF-trnC (19).
Exon-Intron Boundaries of rpl16 and rps8 Determined by cDNA Sequence Analysis.
It was not possible to precisely define the exons of rpl16 and rps8 from the DNA sequence
data alone because chloroplast ribosomal protein coding regions are not well-conserved
among species (18), the introns of Euglena chloroplast genes are highly novel, and several
potential exons appeared to be extremely small. To accurately determine splice boundaries,
primer extension cDNA sequence analysis was employed. From a 20-nt oligonucleotide
primer complementary to positions 885-904 (Fig. 1) of rpl16 exon 4, the sequences across
all three rpl16 introns were determined by direct cDNA sequencing of the spliced mRNA
template. A comparison of the genomic and cDNA sequence ladders used to characterize
splice sites for rpl16 exons 1-2 and exons 2-3 is shown in Figure 3. (Data on exon
3-4 splicing not shown). Three rpl16 introns of 97, 356, and 208 nt were identified.
They are at positions 120-216, 245-600, and 651-858 (Fig. 1).
Figure 2. Organization and restriction map of the rpl16, rpl14, rpl5, rps8 and rpl36 loci from the Euglena gracilis
chloroplast genome. The trnI and rps14 loci are included as reference genes (19). The black boxes represent
exons, and open boxes represent introns. The hatched region above the genes indicates the probe used in the
northern analysis (Fig. 6). The large arrow above the gene boxes indicates direction of transcription, from left
to right. The small arrows under the genes indicate the position of primers used in the cDNA sequencing. Sizes
are in kilobase-pair (kbp). Restriction fragments are labeled as numbers or letters between restriction sites (see
ref. 24 for complete map).
The rps8 amino terminal coding region, presumably located somewhere within the 600
bp distal to rpl5 (Fig. 1), could not be defined from the DNA sequence data alone.
Therefore, the 5'-end of the spliced rps8 mRNA, containing two exon-exon boundaries,
was also determined by cDNA sequence analysis. A 20-nt oligonucleotide primer,
complementary to positions 3008-3027 of a conserved rps8 coding region now identified
as within exon 3, was used. A comparison of rps8 genomic and cDNA sequence is shown
in Fig. 4. The cDNA ladder contains two unique exon sequences, including an 8 nt-long
sequence of 5'-CTAATTAC-3'. The mRNA-like complement of this exon
(5'-GTAATTAG-3') could be uniquely positioned at bases 2835-2842 (Fig. 1), 95 bp
upstream from exon 3 of rps8. The cDNA sequence then skips over the 327 bp intron
1 to an amino terminal exon coding for 5 amino acids (positions 2493-2507). To our
knowledge, the 8 nt exon 2 of rps8 is the smallest exon described to date for any chloroplast
gene.
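The intron sizes quoted in this section follow directly from the inclusive coordinates given in the text. The sketch below recomputes them, derives the two rps8 introns from the exon positions, and flags introns whose length falls in the 95-109 nt range that the abstract gives for the new intron class. It is a bookkeeping aid only: size alone does not classify an intron, the exon 3 start is inferred here from the stated 95 bp distance, and all names are hypothetical.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Segment:
    name: str
    start: int  # first base, 1-based, inclusive
    end: int    # last base, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def intron_between(name: str, upstream: Segment, downstream_start: int) -> Segment:
    """Intron spanning the gap between an exon and the next exon's first base."""
    return Segment(name, upstream.end + 1, downstream_start - 1)


# rpl16 intron coordinates as stated in the text.
RPL16_INTRONS = [
    Segment("rpl16 intron 1", 120, 216),
    Segment("rpl16 intron 2", 245, 600),
    Segment("rpl16 intron 3", 651, 858),
]

# rps8 exons 1 and 2 as stated; exon 3 start inferred from "95 bp upstream".
RPS8_EXON1 = Segment("rps8 exon 1", 2493, 2507)
RPS8_EXON2 = Segment("rps8 exon 2", 2835, 2842)
RPS8_EXON3_START = RPS8_EXON2.end + 95 + 1  # assumption derived from the text

RPS8_INTRONS = [
    intron_between("rps8 intron 1", RPS8_EXON1, RPS8_EXON2.start),
    intron_between("rps8 intron 2", RPS8_EXON2, RPS8_EXON3_START),
]

# Size window quoted in the abstract for the small Euglena-specific introns.
SMALL_INTRON_SIZES = range(95, 110)

ALL_INTRONS = RPL16_INTRONS + RPS8_INTRONS
SMALL_INTRONS = [i.name for i in ALL_INTRONS if i.length in SMALL_INTRON_SIZES]

if __name__ == "__main__":
    for intron in ALL_INTRONS:
        print(intron.name, intron.start, intron.end, intron.length)
    print("cDNA-strand read of exon 2:", reverse_complement("GTAATTAG"))
```

Under these assumptions the two flagged introns are rpl16 intron 1 and rps8 intron 2; the rpl14 intron and rps8 intron 3 are not included because their coordinates are not stated in this section.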
In addition to the five exon-exon boundaries determined from the cDNA sequence, two
boundaries (rpl14 intron and rps8 intron 3) were predicted from highly conserved regions
of the genomic sequence data (Fig. 1). Properties of the 7 introns from this ribosomal
protein transcription unit are described below.
Ribosomal Protein Gene Products
The amino acid sequences derived from the five ribosomal protein genes are given in Fig.
1. The amino acid sequences were analyzed at the seven splice sites and were found to
continue in-frame across the exon-exon junctions of the spliced mRNAs. Two interesting
examples are the rps8 exon 1-2 and exon 2-3 junctions. As shown in Fig. 5a, the 8-nt
exon 2 is spliced in frame to exons 1 and 3, and encodes a valine and conserved isoleucine
residue, as well as part of a serine codon. Splicing yields a product that is colinear with
six other ribosomal protein S8 sequences (Fig. 5a). Adjacent to the exon 1-2 junction
is an invariant aspartate at position 5. At the exon 2-3 junction, 10 of 11 residues are
invariant or semi-invariant.
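The statement about the invariant aspartate can be checked mechanically against the amino-terminal S8 sequences printed in Fig. 5a. The sketch below lists alignment columns that are identical in every sequence. The sequences are transcribed from the figure with gaps written as "-", so any transcription error in the figure carries over; the names used are illustrative assumptions.

```python
# Amino-terminal S8 sequences as printed in Fig. 5a ("-" marks an alignment gap).
S8_NTERM = {
    "Euglena":    "MTNIDVISDMLTRIRNSLLIKARKVNVINTKLTVNIAEILKKEGFI",
    "E. coli":    "MSMQDPIADMLTRIRNGQAANKAAVTMPSSKLKVAIANVLKEEGFI",
    "Mycoplasma": "MT-TDVIADMLTRIRNANQRYLKTVSVPSSKVKLEIARILKEEGFI",
    "Liverwort":  "MG-NDTIANMITSIRNANLGKIKTVQVPATNITRNIAKILFQEGFI",
    "Tobacco":    "MG-RDTIAEIITSIRNADMDRKRVVRIASTNITENIVQILLREGFI",
}


def invariant_positions(alignment, gap="-"):
    """Return 1-based alignment columns where all sequences share one residue."""
    sequences = list(alignment.values())
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    positions = []
    for index, column in enumerate(zip(*sequences), start=1):
        residues = set(column)
        # A column is invariant only if it holds a single, non-gap residue.
        if len(residues) == 1 and gap not in residues:
            positions.append(index)
    return positions


if __name__ == "__main__":
    invariant = invariant_positions(S8_NTERM)
    euglena = S8_NTERM["Euglena"]
    for position in invariant:
        print(position, euglena[position - 1])
```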
The five Euglena genes rpl16 (134 codons), rpl14 (122 codons), rpl5 (180 codons),
rps8 (141 codons) and rpl36 (38 codons) encode polypeptides of predicted molecular
weights of 15467, 13387, 20617, 15670, and 4405, respectively. These chloroplast
Figure 3. Primer extension RNA sequence analysis of the splice junctions between rpl16 exons 1 through 3.
The lanes labeled RNA (A,T,C,G,N) contain the cDNA sequence generated by reverse transcription of Euglena
chloroplast RNA using a primer complementary to bases 885-904 (Fig. 1). The rpl16 cDNA sequence is listed
to the right of the lanes from which it is read. The bands in the control 'N' lanes are due to spontaneous stops
and pauses in the reverse transcriptase reactions. In both the upper and lower RNA panels, a solid line intersects
the sequence at the exon/exon junctions. The line branches to the left of the RNA panel and intersects the
corresponding exon/intron junctions (arrows) in the genomic DNA sequencing panel. The chloroplast genomic
DNA sequence is listed to the left of the lanes from which it is read. Exons 1, 2 and 3 and introns 1 and 2 are labeled.
Figure 4. Primer extension RNA sequence analysis of the splice junctions between rps8 exons 1 to 3. The lanes
labeled CT RNA (A,T,C,G) contain the cDNA sequence generated by reverse transcription of Euglena chloroplast
RNA using a primer complementary to bases 3008-3027 (Fig. 1) as described in Materials and Methods. The
rps8 cDNA sequence is listed to the right of the lanes from which it is read (right panel). Dark lines that intersect
the sequence indicate exon/exon junctions in the cDNA sequence. The DNA sequence panels (DNA, ATCG)
contain the sequencing reactions for the corresponding chloroplast genomic clones. The genomic DNA sequence
is listed to the left of the lanes from which it is read. Exon and intron regions are labeled next to the DNA
sequence. The black lines cross the DNA sequence at the exon/intron junctions (arrows), linking up with the
corresponding lanes in the DNA and RNA sequencing panels.
a. 30S Ribosomal Protein S8 (amino terminus)
Euglena      MTNIDVISDMLTRIRNSLLIKARKVNVINTKLTVNIAEILKKEGFI...
E. coli      MSMQDPIADMLTRIRNGQAANKAAVTMPSSKLKVAIANVLKEEGFI...
Mycoplasma   MT TDVIADMLTRIRNANQRYLKTVSVPSSKVKLEIARILKEEGFI...
Liverwort    MG NDTIANMITSIRNANLGKIKTVQVPATNITRNIAKILFQEGFI...
Tobacco      MG RDTIAEIITSIRNADMDRKRVVRIASTNITENIVQILLREGFI...
b. 50S Ribosomal Protein L5
Euglena      MQ RLKSFYLETIIPKLKEEFGYVNSYRVPKLKKIVINRGFDESCQNSKILE
E. coli      MA KLHDYYKDEVVKKLMTEFNYNSVMQVPRVEKITLNMGVGEAIADKKLLD
Mycoplasma   MKSRLEIKYKDQIVLELFKELNYKSIMQVPKIQKIVINMGIGDATTDPKKLD
Euglena      VLLNELEIISGQKPIISKAKKAIANFKLKEKMPVGMFLTLRSEKMYSFLDRL
E. coli      NAAADLAAISGQKPLITKARKSVAGFKIRQGYPIGCKVTLRGQRMWEFFERL
Mycoplasma   AAIFELEKLSGQKPIVTKAKKSLAVFKLREGMAIGAKVTLRGKKMYDFLDKL
Euglena      INLSLPRIRDFQGINKNCFDGSGNFSFGLSEQSMFPEINFDKMIKVQGLNIT
E. coli      ITIAVPRIRDFRGLSAKSFDGRGDYSMGVREQIIFPEIDYDKVDRVRGLDIT
Mycoplasma   INVALPRVRDFRGVSKTSFDGFGNFYTGIKEQIIFPEVDHDKVIRLRGMDIT
Euglena      IVTTAETNQEAFFLLKELGIPFRDter
E. coli      ITTTAKSDEEGRALAAFDF PFRKter
Mycoplasma   IVTSAKTNKEAFALLQKIGMPFEKter
[Rows of identity (*) and conservative-replacement (+) symbols and the intron arrowheads are not reproduced.]
Figure 5 a) Alignment of the amino terminus of Euglena chloroplast ribosomal protein S8 with the corresponding
S8 sequences from E. coli (34), Mycoplasma capricolum (35), liverwort chloroplast (3), and tobacco chloroplast
(33). Vertical arrow heads point to the location of introns in the Euglena gene. b) Alignments of the gene products
of the Euglena gracilis, E. coli (34) and Mycoplasma capricolum (35) rpl5 loci. Amino acids are represented
as the single letter amino acid code. Asterisks (*) denote amino acid identity and plus signs (+) denote conservative
amino acid replacements.
ribosomal protein sequences were aligned with sequences in the P.I.R. protein data base
by the FASTA algorithm of Lipman and Pearson (29). The only significant alignments
were found with 70S ribosomal proteins from procaryotes and other chloroplasts. The
gene products are 42-62% identical in primary structure to the E. coli ribosomal protein
counterparts (Table 1), and 42-69% identical to the corresponding plant chloroplast gene
products. Based on the amino acid sequence comparisons, cDNA sequencing, and the
mRNA products of these genes (described below), we would predict that all 5 Euglena
genes are functional genes, and not pseudogenes as has been described for the rpl23 of
spinach chloroplast DNA (14).
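Percent identity figures such as those quoted above are computed over aligned residue pairs. The sketch below shows that calculation for two pre-aligned sequences, skipping gapped columns. It is not the FASTA algorithm used in the paper, only the final counting step; the example pair is the first block of the L5 alignment as printed in Fig. 5b (gap written as "-"), and the function name is an assumption.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(seq_a, seq_b):
        if residue_a == gap or residue_b == gap:
            continue  # gapped columns are not counted
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# First alignment block of ribosomal protein L5 as printed in Fig. 5b.
EUGLENA_L5_BLOCK1 = "MQ-RLKSFYLETIIPKLKEEFGYVNSYRVPKLKKIVINRGFDESCQNSKILE"
ECOLI_L5_BLOCK1 = "MA-KLHDYYKDEVVKKLMTEFNYNSVMQVPRVEKITLNMGVGEAIADKKLLD"

if __name__ == "__main__":
    value = percent_identity(EUGLENA_L5_BLOCK1, ECOLI_L5_BLOCK1)
    print(f"block 1 identity: {value:.1f}%")
```

A single block covers only part of the protein, so its value need not match the whole-protein figures in Table 1.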
A New Chloroplast Gene, rpl5
The gene for ribosomal protein L5, rpl5, is a new chloroplast gene. It is not encoded in
the chloroplast genomes of liverwort and tobacco that have been completely sequenced,
nor in the partial chloroplast sequences known from other species. There are no known
nuclear genes for chloroplast ribosomal protein L5.
The identification of rpl5 is based on its location in a ribosomal protein operon, and the
amino acid sequence alignment with the corresponding E. coli polypeptide shown in Fig.
5b. There are 42% identical residues, and an additional 41% of the residues (conservative
replacements) contributing to the similarity score of FASTA. The polypeptides are exactly
co-linear except for a single additional residue near the carboxyl terminus in the Euglena
sequence. A similar alignment is possible with rpl5 from Mycoplasma capricolum
(Fig. 5b).
TABLE 1.
             PERCENT AMINO ACID IDENTITY
GENE         E. coli    LIVERWORT    TOBACCO
rpl16          54          65          64
rpl14          62          69          55
rpl5           42          --          --
rps8           43          53          42
rpl36          51          68          65
Average        50          64          57
TABLE 1. The percent amino acid identity of the Euglena
chloroplast ribosomal protein gene products compared with
those of E. coli, liverwort and tobacco. The average percent
identity between all the Euglena gene products and those
from a particular organism is listed at the bottom of each
column.
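The column averages in Table 1 can be recomputed from the individual entries. The sketch below does so under one explicit assumption: the liverwort and tobacco values are assigned to genes in the way that reproduces the printed averages (rpl5 has no entry for either plant). Rounding is half-up, since Python's built-in `round` would turn 56.5 into 56. All names are illustrative.

```python
import math

# Percent identity of each Euglena gene product to its counterpart.
# Assumption: liverwort/tobacco values are assigned as they read in Table 1;
# rpl5 is absent from both plant chloroplast genomes and is omitted there.
IDENTITY = {
    "E. coli":   {"rpl16": 54, "rpl14": 62, "rpl5": 42, "rps8": 43, "rpl36": 51},
    "liverwort": {"rpl16": 65, "rpl14": 69, "rps8": 53, "rpl36": 68},
    "tobacco":   {"rpl16": 64, "rpl14": 55, "rps8": 42, "rpl36": 65},
}


def round_half_up(value: float) -> int:
    """Round to the nearest integer, with .5 always rounding up."""
    return math.floor(value + 0.5)


def column_average(entries: dict) -> float:
    """Mean of the available entries in one column."""
    return sum(entries.values()) / len(entries)


AVERAGES = {
    organism: round_half_up(column_average(entries))
    for organism, entries in IDENTITY.items()
}

if __name__ == "__main__":
    for organism, average in AVERAGES.items():
        print(f"{organism}: {average}%")
```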
The rpl16-rpl14-rpl5-rps8-rpl36 genes are in a Polycistronic Transcription Unit
A preliminary analysis of transcripts from the ribosomal protein gene cluster was
undertaken. A 32P-RNA probe specific for the rpl16-rpl14-rpl5 coding region (Fig. 2)
was hybridized to membrane filter (Northern) blots of chloroplast RNAs that had been
separated by electrophoresis. Transcripts of 1.1, 1.3, 1.6, 2.3, 2.5, 2.9, 4.5, 4.8, 7.8,
and 8.3 kb and additional minor RNAs are detected. Based on additional experiments with
gene and intron specific probes (D. A. Christopher and R. B. Hallick, in preparation)
we can interpret the 1.1 and 1.3 kb species as dicistronic rpl16-rpl14 and rpl14-rpl5 mRNAs, and
the 1.6 kb species as a tricistronic mRNA. The 2.3, 2.5, and 2.9 kb mRNAs are even
larger precursors of the ribosomal protein transcription unit, containing cistrons distal to
rpl5. The 4.5 to 8.3 kb transcripts are much larger than the region encoding the 5 ribosomal
protein genes, and are presumed to be part of a larger precursor mRNA, perhaps encoding
regions of the flanking ribosomal protein genes (19). The intron specific probes hybridize
to unique, but rare and presumably unspliced, chloroplast RNAs (data not shown). A detailed
analysis of these pre-mRNAs is in progress.
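The size argument in this paragraph amounts to comparing each detected transcript with the 3840 bp sequenced region. A minimal sketch of that comparison follows; transcript sizes and the region length are taken from the text, while the names and the simple threshold are assumptions made for illustration.

```python
# Length of the sequenced five-gene region (3840 bp) in kb.
REGION_KB = 3.84

# Transcript sizes (kb) detected on the Northern blot, as listed in the text.
TRANSCRIPTS_KB = [1.1, 1.3, 1.6, 2.3, 2.5, 2.9, 4.5, 4.8, 7.8, 8.3]


def split_by_region(sizes, region_kb):
    """Separate transcripts that fit within the region from those that exceed it."""
    within = [size for size in sizes if size <= region_kb]
    longer = [size for size in sizes if size > region_kb]
    return within, longer


WITHIN_REGION, LONGER_THAN_REGION = split_by_region(TRANSCRIPTS_KB, REGION_KB)

if __name__ == "__main__":
    print("could derive from the five-gene region alone:", WITHIN_REGION)
    print("must include flanking sequence:", LONGER_THAN_REGION)
```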
DISCUSSION
Gene organization
As illustrated in Fig. 7, the overall organization of the chloroplast ribosomal protein genes
in Euglena has a striking similarity to the arrangement of genes at the 3'-end of the S10
ribosomal protein operon and the spc operon of E. coli (6,34). The Euglena genes are
also organized like the analogous genes in the tobacco (33), liverwort (3), spinach (15)
and maize (38) chloroplast genomes. There appears to be an overall evolutionary
conservation of gene order for many ribosomal protein genes. The major differences of
the chloroplast genomes with respect to E. coli are the absence of genes for ribosomal
proteins L29, S17, L24, L6, L18, and S5. These proteins are all presumed to be encoded
in the nuclear DNA of Euglena, higher plants, and liverwort. The one rearrangement in
chloroplast gene order vs E. coli involves rps14, which in Euglena is encoded distal to
rpl36 and trnI (Figs. 5 and 7), and in higher plants at a completely different chloroplast
locus (2,3). Higher plants are distinct from both Euglena and E. coli for the presence
of the gene infA for initiation factor IF-1 (9) adjacent to rpl36 (formerly known as secX).
Figure 6. Northern blot of purified Euglena chloroplast RNA probed with the rpl16-rpl14-rpl5 region illustrated
in Fig. 2. Transcript sizes are labeled in kilobase (kb).
The allocation of genes for ribosomal proteins between the chloroplast and the nucleus
in photosynthetic eukaryotes has largely been conserved throughout evolution. Differences
in ribosomal protein coding capacity among chloroplast genomes are uncommon, but not
without precedent. Both Euglena and Chlamnydomonas chloroplast DNAs have a tufA locus
for elongation factor Ef-Tu (36,37) which is absent in higher plants and liverwort. Tobacco
and liverwort chloroplast genomes differ in only one of their 20 ribosomal protein genes,
with rpl2J being present only in liverwort and rpsl6 only in tobacco (2,3). Therefore,
the discovery of a new chloroplast ribosomal protein gene in Euglena is noteworthy. This
is the first example of a chloroplast rplS gene from any species, and only the third example
when all prokaryotes are considered. The rplS gene is probably nuclear-encoded in higher
plants,
as
are
chloroplast ribosomal proteins (4,5).
positions of introns in the Euglena rplJ 6, rplJ 4, and rps8 loci distinguish
higher plant chloroplast DNA counterparts. The rp1J6 loci of tobacco
other
The number and
them from their

Figure 7. Comparison of the gene organization of the Euglena chloroplast ribosomal protein loci with those of
the E. coli spc and 3'-end of the S10 operons (6) and similar ribosomal protein gene clusters from tobacco (33)
and liverwort (3).

(2), liverwort (3), maize (38), spinach (14), and Spirodela (7) all possess a large group
II-like intron after the first three amino acids (M-L-S). The Euglena rpl16 gene lacks a
large intron in this position, but contains three smaller introns. The rpl16 locus in
Chlamydomonas lacks introns (39). The rpl14 and rps8 loci of tobacco and liverwort lack
introns, while the corresponding Euglena genes contain one and three introns, respectively.
The Euglena chloroplast genome is unusual in having a large number of introns in
protein-coding genes (40) while lacking intron-containing tRNA genes (41). The 8 bp exon
is a novel feature of the Euglena rps8 locus. It is flanked by two introns and is the
smallest exon defined to date in any chloroplast gene, perhaps defining the minimum exon
size for correct splicing in chloroplasts. Small exons are found in yeast ribosomal protein
genes S10 and L46 (42), each of which contains an intron located immediately after the
first methionine. Genes for yeast L25 and S16a have introns after the first 3 and 5 codons,
respectively. The amino terminal exons of rpl16 in tobacco (33), Spirodela (7), liverwort
(3), maize (38) and spinach (15) contain a 9 bp translated region encoding Met-Leu-Ser
immediately upstream from a single large group II-like intron. However, as for other
chloroplast transcripts (43,44), the amino terminal exons are associated with a
5'-untranslated leader region (7,15).
Another highly unusual feature of the Euglena chloroplast gene cluster is the 2.8 kbp
rps3-rpl16 intercistronic region (Figs. 2 and 7). By comparison, the corresponding
rps3-rpl16 regions of E. coli, and of tobacco and liverwort chloroplasts, are 12, 147,
and 58 bp, respectively. The region has been sequenced (D. A. Christopher and R. B. Hallick,
unpublished), but no genes have been identified. From our preliminary Northern
hybridization analysis, it appears that the 2.8 kbp rps3-rpl16 intercistronic DNA
is transcribed, and that a stable transcript from this locus does accumulate.
Gene Expression
The 4.5 to 8.3 kb transcripts detected with an rpl16-rpl14-rpl5 specific probe are much
larger than the region (3.8 kb) encoding the five genes. These genes are most likely
cotranscribed with upstream and/or downstream genes. Beginning 7.2 kbp upstream from
rpl16, and in the same polarity, is the cluster of ribosomal protein genes
rpl23-rpl2-rps19-rpl22-rps3 (18) that is similar to the proximal end of the E. coli spc
operon (6). Downstream are the trnI-rps14-trnF-trnC loci. There is precedent for
overlapping transcription between the E. coli S10 and spc operons (45). Co-transcription
with flanking genes for the spinach rps3-rpl16 loci has been proposed (15). Experiments
are currently in progress with gene-specific probes from each cistron to define the
ribosomal protein mRNA transcription and RNA maturation pathway(s) for Euglena chloroplasts.
A New Category of Cell Organelle Intron, Designated 'Group III'
There are two types of introns within Euglena chloroplast ribosomal protein genes. They
differ in size, secondary structural features, degree of conservation of boundary sequences,
and other properties. One of these categories is the well-known group II introns that are
found in both chloroplast and mitochondrial genomes (20), exemplified by rpl16 intron
2, and rps8 introns 1 and 3 (Fig. 1, described below). The remainder of the introns are
very similar to a previously described, novel group of 6 introns of the Euglena chloroplast
ribosomal protein genes rpl23, rps19 and rps3 (18), and three introns in tufA (16). With
the addition of the small introns of rpl16, rpl14, rps8, and rps14 (19), there are now enough
of these introns that are sufficiently similar to each other to warrant their classification
as a new category of chloroplast intron. We propose the designation 'group III introns.'
Examples of 13 group III introns are shown in Fig. 8a. The properties of these introns
are as follows: (i) They are small and remarkably uniform in size, with a range of 95-110
nt, and an average size of 102 nt. By contrast, the smallest Euglena chloroplast group
II intron (rps8 intron 3) is 277 nt. (ii) They have degenerate versions of the group II intron
consensus boundary sequences (Fig. 8a). The 5'-boundaries of 5'-NTNNG (N = any nucleotide)
have two conserved bases from the 5'-GTGYG group II consensus sequence (20,47).
The 3'-boundaries of ANNTNNNN-3' have their two conserved nucleotides and pyrimidine-rich
nature in common with the ATTTTAT-3' group II consensus sequence. In 12 of 13
examples (Fig. 8a), the conserved A residue is exactly 8 nt from the 3'-cleavage site.
The conserved bases in the boundaries 5'-NTNNGN... ANNTNNNN-3' may be central
to the splicing mechanism. (iii) They lack the highly conserved secondary structural features
characteristic of group II introns, a central core with 6 radiating, helical domains I-VI
(20,48). We have been unable to identify any conserved secondary structure among the
group III introns. (iv) They are located primarily, but not exclusively, in genes for
components of the Euglena chloroplast translation and transcription machinery. There are
numerous group III introns in the rpoB-rpoC1-rpoC2 operon (C. Radebaugh, G.
Yepiz-Plascencia, and R. B. Hallick, unpublished observation), but a few group III introns
are also present in psbB and atpI (R. Drager, J. K. Stevenson, and R. B. Hallick, unpublished
observation). The small introns are to date unique to Euglena chloroplast DNA. (v) The
group III introns are very A/T rich, with a base bias of T > A > G > C. This is a
feature characteristic of most Euglena chloroplast introns (18,49,50). The uniformly small
size, degenerate group II boundaries, and lack of any discernible secondary structure
distinguish group III introns from all other chloroplast and mitochondrial introns, and
from introns in nuclear genes.
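The sequence-level criteria in properties (i), (ii) and (v) can be expressed as a small check. The following is a minimal sketch, not a method used in this work: the function names are hypothetical, the size window and boundary patterns are taken directly from the text, and the 82-nt filler in the usage note is a placeholder rather than real intron sequence. Properties (iii) and (iv) concern structure and gene context and are not tested here.

```python
import re

# Boundary patterns as stated in the text: 5'-NTNNG ... ANNTNNNN-3'.
FIVE_PRIME = re.compile(r"^.T..G")
THREE_PRIME = re.compile(r"A..T....$")
SIZE_RANGE = range(95, 111)  # 95-110 nt inclusive, per property (i)


def group3_features(intron: str) -> dict:
    """Report which of the stated group III sequence criteria are met."""
    seq = intron.upper().replace("U", "T")
    if not seq:
        raise ValueError("empty sequence")
    at_fraction = (seq.count("A") + seq.count("T")) / len(seq)
    return {
        "size_ok": len(seq) in SIZE_RANGE,
        "five_prime_ok": bool(FIVE_PRIME.search(seq)),
        "three_prime_ok": bool(THREE_PRIME.search(seq)),
        "at_fraction": round(at_fraction, 2),  # reported only, no cutoff assumed
    }


def looks_like_group3(intron: str) -> bool:
    """True if size and both boundary patterns match (necessary, not sufficient)."""
    f = group3_features(intron)
    return f["size_ok"] and f["five_prime_ok"] and f["three_prime_ok"]
```

For example, joining the rpl23 intron 1 boundaries from Fig. 8a with a placeholder interior, `looks_like_group3("GTGTGTTCTTAT" + "T" * 82 + "TTTTAAATCTCA")`, returns `True` for this 106-nt construct. Because the patterns are so degenerate, a positive result only flags a candidate; it does not establish that a sequence is spliced.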
There may be additional group III introns in the sequence data of Fig. 1. The 208-nt
rpl16 intron 3 is not a group II intron, but is twice as large as expected for a group III
intron. One possible interpretation is that rpl16 intron 3 is actually one group III intron

A) GROUP III
  INTRON LOCUS    EXON    INTRON ... INTRON                        EXON
  RPL23-IVS-1     GTATGG  GTGTGTTCTTAT...(82 NT)...TTTTAAATCTCA    AATTTT
  RPL23-IVS-2     CAAATG  TTTTGAATGTTT...(75 NT)...TATAACTTCATA    AGTAAA
  RPL23-IVS-3     AAATTA  GTGAGATTATAT...(79 NT)...ATCAATTTATAT    AATGAC
* RPS19-IVS-1     TCGTTT  TTGAGATTTGAC...(79 NT)...TTAGATCTTTTT    TAAATC
  RPS19-IVS-2     GGTCAC  TTTTGATTTTAT...(73 NT)...TTAAACCTTATA    AAATTA
  RPS3-IVS-2      TAGCTC  ATAAGATATTTC...(78 NT)...ATTAATTTTATA    ATACGA
* TUFA-IVS-1      AATAAA  ATGAGTTAATTA...(71 NT)...AAAGAAAACAAA    AAATAA
* TUFA-IVS-2      AGTAGA  ATAAGCTTAAAA...(79 NT)...TTCCATCAAAAA    CGATAG
* TUFA-IVS-3      ATAGAA  AAGTGTCGTTTA...(86 NT)...AAACATATTGGA    AAAAAG
  RPL14-IVS       GTAAAA  TTGTGTATTAGA...(85 NT)...ATCTATTTTAAA    GAAGCT
* RPS8-IVS-2      AATTAG  GTGTGATTTTTT...(71 NT)...TTTAGCCTTATA    CGATAT
  RPS14-IVS       CGATTA  ATTTGATTTTCT...(82 NT)...TTTAACTCTTT     CGAAAT
* RPL16-IVS-1     ATCTAT  ATGAGACTTTTT...(73 NT)...TTTTATTTAATT    GATAAA
* RPL16-IVS-3     GCTGCT  TTGTGGTTTTTT...(184 NT)..TTTTATTTAATT    CGTAGA
  CONSERVED               NTNNG          ...       ANNTNNNN

B) GROUP II
  INTRON LOCUS    EXON    INTRON ... INTRON                        EXON
  RPS3-IVS-1      TTACTA  TTGAGATACAAA...(385 NT)..TTCTATTTTCTT    AGTCGC
* RPL16-IVS-2     ATGCTT  GTGCGATCTAAA...(332 NT)..TTGTACTTGTTT    TACAAT
* RPS8-IVS-1      ATTGAT  TTGCGAGTTTAA...(303 NT)..ATTTAATTTAAT    GTAATT
  RPS8-IVS-3      CTGTTT  GTGCGATATGTT...(253 NT)..TTTAATTTTGCA    TGTCTA
  CONSERVED               (T/G)TGCGA...  ...       ...T..A.TTT..T
  GROUP II CONSENSUS      GTGYG          ...       TTTAATTTTAT

Figure 8. Comparison of the intron-exon boundaries of 15 introns from the Euglena gracilis chloroplast rpl23,
rps19, rps3 (18), rps14 (19), rpl16, rpl14, and rps8 loci and 3 introns from tufA (16). The introns are divided
between a) Group III and b) Group II designations. The hyphenated number after the gene symbol indicates the
first, second or third intron (ivs) of the locus. Vertical arrows point to the splice junctions. The asterisk (*) denotes
exon-intron junctions determined by primer extension RNA sequencing. Potential conserved and group II consensus
(47,48,52) 5'- and 3'-nucleotides are indicated below the aligned sequences.

within another group III intron. This is an intriguing possibility. Michel et al. (51) have
proposed that Euglena chloroplast psbF intron 1 is a group II intron within another group
II intron. We note that a potential 102-nt group III intron internal to rpl16 intron 3 could
begin at position 661 (Fig. 1) with the group III-like boundary sequence
5'-TTGTGTATTTCT and end at or near position 762 with the sequence
AAAAAGGTTTTT-3'. There is also the possibility of group III introns in intergenic
spacers. We have recently characterized the rps4-rps11 operon of Euglena chloroplast DNA
and determined that the 124-nt rps4-rps11 intergenic spacer has a 95-nt group III intron
(J. K. Stevenson, R. Drager, and R. B. Hallick, in preparation). We note that the 183-nt
rpl5-rps8 intergenic spacer (Fig. 1) has a potential group III intron of approximately 104
nt beginning with the sequence 5'-GTGTGTTTTTTT at position 1759 and ending at or
near position 1863 with AAAGGATTTATA-3'. Further characterization of precursor
and mature RNA products will be required to determine all of the features of the RNA
maturation and processing pathway for the rpl16-rpl14-rpl5-rps8-rpl36 transcription unit.
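The search for potential group III introns inside larger introns or intergenic spacers, as described above, can be sketched as a scan for windows of 95-110 nt whose ends match the 5'-NTNNG and ANNTNNNN-3' boundaries. This is an illustrative sketch only: the function name and the test sequence are hypothetical, and the scan is not claimed to be the procedure used here. Positions are reported 1-based and inclusive, the convention under which 661 to 762 spans 102 nt.

```python
import re

# Zero-width lookahead so that overlapping 5'-NTNNG starts are all reported.
FIVE_PRIME_START = re.compile(r"(?=.T..G)")


def candidate_group3_spans(seq, min_len=95, max_len=110):
    """Return (start, end) pairs, 1-based inclusive, for every window of
    min_len..max_len nt that begins NTNNG and ends ANNTNNNN."""
    seq = seq.upper().replace("U", "T")
    hits = []
    for match in FIVE_PRIME_START.finditer(seq):
        start = match.start()              # 0-based index of first intron nt
        for length in range(min_len, max_len + 1):
            end = start + length           # 0-based, exclusive
            if end > len(seq):
                break
            tail = seq[end - 8:end]        # last 8 nt of the window
            if tail[0] == "A" and tail[3] == "T":
                hits.append((start + 1, end))
    return hits
```

As a usage example, embedding the two boundary sequences quoted for the potential internal rpl16 intron in a placeholder background, `"C" * 20 + "TTGTGTATTTCT" + "C" * 78 + "AAAAAGGTTTTT" + "C" * 20`, yields the single span `(21, 122)`, a 102-nt window. In real A/T-rich sequence such degenerate patterns will match frequently, so each hit is only a candidate that requires confirmation from the RNA.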
Group II Introns in Ribosomal Protein Genes
The second class of introns of Euglena chloroplast ribosomal protein genes comprises group II
introns. Examples are rpl16 intron 2 (356 nt), and rps8 introns 1 (327 nt) and 3 (277 nt),
and the previously described rps3 intron 1 (409 nt) (18). They are on average smaller

Figure 9. RNA secondary structural models proposed for the 3'-ends of a) rpl16 intron 2, b) rps8 intron 1,
c) rps8 intron 3, and d) rps3 intron 1. Structures labeled V and VI resemble group II intron domains five and
six (20). The arrow points to the 3'-splice junction. The asterisk (*) designates the conserved bulge A residue.
The brackets delimit a base-paired region of domain five that resembles a similar conserved region of group
II introns.

than the group II introns in the liverwort (315-2111 nt) (3) and tobacco (503-2526 nt)
(2) chloroplast genomes, and smaller than the introns from light-induced Euglena chloroplast
genes (326-1600 nt) (40). The distinguishing features of Euglena chloroplast group II
introns are the following: (i) They have classical group II 5'- and 3'-boundary sequences
(20,47), as initially identified for chloroplast group II introns in the Euglena rbcL locus
(52). As shown in Fig. 8b, the ribosomal protein group II introns, with the possible exception
of rps3 intron 1 (discussed below), conform to this pattern. (ii) They have domain V and domain
VI secondary structure features characteristic of all group II introns (20). As shown in
Fig. 9, a short stretch of nucleotides in the domain V stem is conserved with those in
29 other examples (51). In domain VI (Fig. 9), the conserved A residue that is located
8 nt upstream from the 3'-splice site is in an unpaired position, also characteristic of all
group II introns. For self-splicing introns (48,53), the A residue serves as the branch point
in the formation of a lariat intermediate.
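The boundary comparison in feature (i) can be made concrete by counting mismatches between each intron's first five nucleotides and the 5'-GTGYG consensus, and by checking for an A located 8 nt upstream of the 3'-splice site. The sketch below is illustrative: the function names are hypothetical, and the terminal sequences are our transcription of Fig. 8b, which should be treated as an assumption. The A check is sequence-only and says nothing about whether the residue is unpaired.

```python
# IUPAC codes needed for the consensus used here (Y = pyrimidine, C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}


def mismatches(seq, consensus):
    """Count positions where the start of seq disagrees with an IUPAC consensus."""
    seq = seq.upper().replace("U", "T")
    if len(seq) < len(consensus):
        raise ValueError("sequence shorter than consensus")
    return sum(base not in IUPAC[code] for base, code in zip(seq, consensus))


def has_a_at_minus_8(three_prime_end):
    """True if the 8th nucleotide from the 3' end of the intron is an A."""
    return three_prime_end.upper()[-8] == "A"


# Terminal 12 nt at the 5' and 3' ends of each intron, transcribed from Fig. 8b.
GROUP2_ENDS = {
    "rps3 intron 1": ("TTGAGATACAAA", "TTCTATTTTCTT"),
    "rpl16 intron 2": ("GTGCGATCTAAA", "TTGTACTTGTTT"),
    "rps8 intron 1": ("TTGCGAGTTTAA", "ATTTAATTTAAT"),
    "rps8 intron 3": ("GTGCGATATGTT", "TTTAATTTTGCA"),
}


def summarize(ends=GROUP2_ENDS):
    """Map intron name to (5' mismatches vs GTGYG, A present at position -8)."""
    return {
        name: (mismatches(five, "GTGYG"), has_a_at_minus_8(three))
        for name, (five, three) in ends.items()
    }
```

With these inputs, `summarize()` gives 0 mismatches for rpl16 intron 2 and rps8 intron 3, 1 for rps8 intron 1, and 2 for rps3 intron 1, in line with rps3 intron 1 being the possible exception noted above; all four 3' ends carry an A at position -8.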
The 409-nt rps3 intron 1 (18) has two unusual features. The 5'- and 3'-boundary
sequences are more characteristic of group III than of group II introns (Fig. 8). In addition,
the locations of domains V and VI are more consistent with a splice boundary 20 nt upstream
from the beginning of the second exon (Fig. 9d) than at the beginning of the exon. There
is a good group II-like boundary sequence of 5'-GTGCGATACTAT located 79 nt from
the upstream exon (see Fig. 2, ref. 18). Therefore we are considering the possibility that
rps3 intron 1 might be a 99-nt group III intron with an internal 300-nt group II intron.
The novel feature of many Euglena chloroplast group II introns is that structures
resembling group II intron domains I to IV are often either very weak or absent. Michel
et al. (51) have suggested that Euglena group II-like introns may have lost domains I to IV
and possess variable versions of domains V and VI that are heterogeneous in size and
base-pairing. In general, we find that group II ribosomal protein gene introns have fewer
elements characteristic of domains I to IV than their counterparts in genes for
photosynthesis-related polypeptides such as psbA and rbcL and group II introns from other
organelle DNAs. For example, the self-splicing mitochondrial group II introns have two sites of
complementarity between domain I of the intron (exon binding sites 1 and 2) and the 5'-exon
(intron binding sites 1 and 2) that are required for splicing (47). By contrast, the short
exons of Euglena rps8 (exon one, 15 bp and exon two, 8 bp) do not possess any
bases complementary to the flanking introns. The exons for rpl14 and rpl16 also lack
intron binding sites. We suggest that the intron-exon recognition of the type reported for
self-splicing group II introns does not occur for these Euglena chloroplast introns. We
propose that there is an evolutionary continuum of intron structural variations among
chloroplast and mitochondrial group II-like introns that has the following order: (a) group
II (self-splicing); (b) group II non-self splicing, but with all 6 domains as defined in (20);
(c) group II non-self splicing, with some domains absent; (d) group III. The latter two
categories have to date only been found in Euglena chloroplast DNA. Small introns of
100-110 nt with different splice boundaries and G/C content with respect to the Euglena
introns have been described for some plant nuclear genes (54). Group III introns in turn
may be the closest relatives among organelle introns to nuclear introns, especially those
of higher plants.
ACKNOWLEDGEMENTS
We wish to thank Ms. Cathy Radebaugh and Ms. Gloria Yepiz-Plascencia for helpful
discussions during the course of the experiments, and Ms. Jane Dugas Huff for her expert
typing of this manuscript. This work was supported by a grant to RBH from NIH.
*To whom correspondence should be addressed
REFERENCES
1. Eneas-Filho,J., Hartley,M.R. and Mache,R. (1981) Mol. Gen. Genet. 184:484-488.
2. Shinozaki,K., Ohme,M., Tanaka,M., Wakasugi,T., Hayashida,N., Matsubayashi,T., Zaita,N.,
Chunwongse,J., Obokata,J., Yamaguchi-Shinozaki,K., Ohto,C., Torazawa,K., Meng,B.Y., Sugita,M., Deno,I.,
Kanogashira,T., Yamada,K., Kusuda,J., Takaiwa,F., Kato,A., Tohdoh,H., Shimada,H. and Sugiura,M.
(1986) EMBO J. 5:2043-2049.
3. Ohyama,K., Fukuzawa,H., Kohchi,T., Shirai,H., Sano,T., Sano,S., Umesono,K., Shiki, Y., Takeuchi,M.,
Chang,Z., Aota,S., Inokuchi,H. and Ozeki,H. (1986) Nature 322:572-574.
4. Schmidt,R.J., Hosler,J.P., Gillham,N.W. and Boynton,J.E. (1984) J. Cell. Biol. 98:2011-2018.
5. Gantt,J.S. and Key,J.L. (1986) Mol. Gen. Genet. 202:186-193.
6. Lindahl,L. and Lindahl,J.M. (1986) Ann. Rev. Genet. 20:297-326.
7. Posno,M., Vliet,A.V. and Groot,G.S.P. (1986) Nucleic Acids Res. 14:3181-3195.
8. Deng,X.W. and Gruissem,W. (1987) Cell 49:379-387.
9. Muller,G.S., Hallick,R.B., Alt,J., Westhoff,P. and Hermann,R. (1986) Nucleic Acids Res. 14:1029-1044.
10. Koller,B., Fromm,H., Galun,E. and Edelman,M. (1987) Cell 48:111-119.
11. Hildebrand,M., Hallick,R.B., Passavant,C.W. and Bourque,D.P. (1988) Proc. Natl. Acad. Sci. USA 85:372-376.
12. Kohchi,T., Umesono,K., Yutaka,O., Komine,Y., Nakahigashi,K., Komano,T., Yamada,Y., Ozeki,H. and
Ohyama,K. (1988) Nucleic Acids Res. 16:10025-10036.
13. Markmann-Mulisch,U., Knoblauch,K., Lehmann,A. and Subramanian,A.R. (1987) Biochem. Internat.
15:1057-1067.
14. Thomas,F., Massenet,O., Dorne,A.M., Briat,J.F. and Mache,R. (1988) Nucleic Acids Res. 16:2461-2472.
15. Zhou,D., Quigley,F., Massenet,O. and Mache,R. (1989) Mol. Gen. Genet. 216:439-445.
16. Montandon,P. and Stutz,E. (1984) Nucleic Acids Res. 12:2851-2859.
17. Manzara,T. and Hallick,R.B. (1987) Nucleic Acids Res. 15:3927.
18. Christopher,D.A., Cushman,J.C., Price,C.A. and Hallick,R.B. (1988) Curr. Genet. 14:275-286.
19. Nickoloff,J.A., Christopher,D.A., Drager,R.G. and Hallick,R.B. (1989) Nucleic Acids Res. 17:(in press).
20. Michel,F. and Dujon,B. (1983) EMBO J. 2:33-38.
21. Hallick,R.B., Richards,O.C. and Gray,P.W. (1982) In:Edelman,M., Hallick,R.B., Chua,N-H., eds. Methods
in Chloroplast Molecular Biology. Elsevier Biomedical, New York, pp. 281-294.
22. Hallick,R.B., Rushlow,K.E. and Bingham,S.C. (1982) In: Edelman,M., Hallick,R.B., Chua,N-H., eds.
Methods in Chloroplast Molecular Biology. Elsevier Biomedical, New York, pp. 315-332.
23. Maniatis,T., Fritsch,E.F. and Sambrook,J. (1982) Molecular Cloning: a laboratory manual. Cold Spring
Harbor Laboratory, Cold Spring Harbor, New York.
24. Hallick,R.B. and Buetow,D.E. (1989) In: Buetow,D.E., ed. The Biology of Euglena, Vol. IV, Academic
Press, Inc., New York, pp. 351-414.
25. Henikoff,S. (1984) Gene 28:351-359.
26. Vieira,J. and Messing,J. (1987) Method. Enzymol. 153:3-11.
27. Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA 74:5463-5467.
28. Mount,D.W. and Conrad,B. (1986) Nucleic Acids Res. 14:443-454.
29. Lipman,D.J. and Pearson,W.R. (1985) Science 227:1435-1441.
30. Feng,D.F. and Doolittle,R.F. (1987) J. Mol. Evol. 25:351-360.
31. Fourney,R.M., Miyakoshi,J., Day II,R.S. and Paterson,M.C. (1987) Focus 10:5-7.
32. Melton,D.A., Krieg,P.A., Rebagliati,M.R., Maniatis,T., Zinn,K. and Green,M.R. (1984) Nucleic Acids
Res. 12:7035-7056.
33. Tanaka,M., Wakasugi,T., Sugita,M., Shinozaki,K. and Sugiura,M. (1986) Proc. Natl. Acad. Sci. USA
83:6030-6034.
34. Ceretti,D.P., Dean,D., Davis,G.R., Bedwell,D.M. and Nomura,M. (1983) Nucleic Acids Res. 11:2599-2616.
35. Ohkubo,S., Muto,A., Kawauchi,Y., Yamao,F. and Osawa,S. (1987) Mol. Gen. Genet. 210:314-322.
36. Montandon,P.E., Knuchel-Aegerter,C. and Stutz,E. (1987) Nucleic Acids Res. 15:7809-7822.
37. Watson,J.C. and Surzycki,S.J. (1982) Proc. Natl. Acad. Sci. USA 79:2264-2267.
38. Markmann-Mulisch,U. and Subramanian,A.R. (1988) Eur. J. Biochem. 170:507-514.
39. Lou,J.K., Wu,M., Chang,C.H. and Cuticchia,A.J. (1987) Curr. Genet. 11:537-541.
40. Koller,B. and Delius,H. (1984) Cell 36:613-622.
41. Hallick,R.B., Hollingsworth,M.J. and Nickoloff,J.A. (1984) Plant Molec. Biol. 3:169-175.
42. Planta,R.J., Mager,W.H., Leer,R.J., Wondt,L.P., Raue,H.A. and El-Baradi T.T.A.L. (1986) in:Hardesty,B.
and Kramer,G. (eds.) Structure, Function and Genetics of Ribosomes, Springer-Verlag, New York, pp.
699-718.
43. Mullet,J.E., Orozco,E.M. and Chua,N.H. (1985) Plant Mol. Biol. 4:39-54.
44. Gruissem,W. and Zurawski,G. (1985) EMBO J 4:3375-3383.
45. Mattheakis,L.C. and Nomura,M. (1988) J. Bacteriol. 170:4484-4492.
46. Hallick,R.B. and Bottomley,W. (1983) Plant Molec. Biol. Report. 1:38-43.
47. Jacquier,A. and Michel,F. (1987) Cell 50:17-29.
48. Schmelzer,C. and Muller,M.W. (1987) Cell 51:753-762.
49. Gingrich,J.C. and Hallick,R.B. (1985) J. Biol. Chem. 260:16156-16161.
50. Cushman,J.C., Hallick,R.B. and Price,C.A. (1988) Curr. Genet. 13:159-171.
51. Michel,F., Umesono,K. and Ozeki,H. (1989) Gene, in press.
52. Koller,B., Gingrich,J.C., Stiegler,G.L., Farley,M.A., Delius,H. and Hallick,R.B. (1984) Cell 36:545-553.
53. Jarrell,K.A., Dietrich,R.C. and Perlman,P.S. (1988) Molec. Cell Biol. 8:2361-2366.
54. Sugita,M., Manzara,T., Pichersky,E., Cashmore,A. and Gruissem,W. (1987) Mol. Gen. Genet. 209:247-256.
This article, submitted on disc, has been automatically
converted into this typeset format by the publisher.