Volume 17 Number 19 1989
Nucleic Acids Research
Euglena gracilis chloroplast ribosomal protein operon: a new chloroplast gene for ribosomal protein
L5 and description of a novel organelle intron category designated group III
David A.Christopher1 and Richard B.Hallick1,2*
1Department of Molecular and Cellular Biology and 2Department of Biochemistry, University of Arizona,
Tucson, AZ 85721, USA
Received July 20, 1989; Accepted August 18, 1989
ABSTRACT
We describe the structure (3840 bp) of a novel Euglena gracilis chloroplast ribosomal protein operon
that encodes the five genes rpl16-rpl14-rpl5-rps8-rpl36. The gene organization resembles the spc
and the 3'-end of the S10 ribosomal protein operons of E. coli. The rpl5 is a new chloroplast gene
not previously reported for any chloroplast genome to date and also not described as a nuclear-encoded,
chloroplast protein gene. The operon contains at least 7 introns. We present evidence from primer
extension analysis of chloroplast RNA for the correct in vivo splicing of five of the introns. Two
of the introns within the rps8 gene flank an 8 bp exon, the smallest exon yet characterized in a
chloroplast gene. Three introns resemble the classical group II introns of organelle genomes. The
remaining 4 introns appear to be unique to the Euglena chloroplast DNA. They are uniform in size
(95-109 nt), share common features with each other and are distinct from both group I and group
II introns. We designate this new intron category as 'group III'.
INTRODUCTION
Of the estimated 60 different ribosomal proteins that comprise the prokaryotic-like 70S
ribosomes of chloroplasts (1), genes for 20 are located on the tobacco (2) and liverwort
(3) chloroplast genomes. The remainder appear to be encoded by the nuclear genome,
and synthesized as precursor polypeptides on cytoplasmic ribosomes (4,5). Many of the
chloroplast-encoded ribosomal protein genes are organized in clusters that resemble the
S10, spc and alpha ribosomal protein operons of E. coli (6). Some ribosomal protein genes
have been shown to be transcribed (7,8,9), trans-spliced (10,11,12) and to encode proteins
present in ribosomes in vivo (13,14,15). However, the molecular events that coordinate
the expression of the chloroplast and nuclear-encoded ribosomal constituents during
ribosome biogenesis are poorly understood.
In the photosynthetic protist, Euglena gracilis, 11 chloroplast ribosomal protein genes
have been characterized, including rps7-rps12 (16), rpl20 (17), rpl23-rpl2-rps19-rpl22-rps3
(18), rps14 (19), and rps4-rps11 (Stevenson, J. and Hallick, R.B., manuscript in
preparation). In the present paper, we describe a novel chloroplast ribosomal protein operon
that contains the five genes, rpl16-rpl14-rpl5-rps8-rpl36. The rpl5 locus is a new chloroplast
gene, not previously reported for any chloroplast genome and also not described as a nuclear-encoded, chloroplast protein gene. This operon is interrupted by at least 7 introns. Two
introns within the rps8 gene flank an 8 bp exon, the smallest yet characterized in a chloroplast
gene. Three introns are relatively small examples of the well-known group II intron (20)
category found in both chloroplast and mitochondrial genomes. The remaining 4 introns
belong to a previously described class of small introns unique to the Euglena chloroplast
DNA and occurring in several ribosomal protein genes (18). Evidence is presented that
© IRL Press
these introns comprise a new category of organelle intron, the members of which are highly
related to each other, but distinct from both group I and group II introns. We designate
this new intron category 'group III'.
MATERIALS AND METHODS
Molecular Cloning and DNA Sequencing
Chloroplast DNA from photoautotrophically-grown E. gracilis strain Z was prepared as
described (21). The 3.3 kb chloroplast EcoRI restriction fragment EcoM (Fig. 2) was
isolated by agarose gel electrophoresis (22) and cloned in both orientations in the EcoRI
site of pBS- (Blue Scribe-, Stratagene Cloning Systems, Inc.). The resulting plasmids are
designated pEZC541.1 and pEZC541.2. A plasmid library of chloroplast BglII restriction
fragments of 3.0-5.0 kb in size cloned in the BamHI site of pBS- was constructed with
E. coli XL-1 Blue as host. Replica filters of the library were screened by colony
hybridization (23) with 32P-labeled DNA probes from purified EcoM fragment, and also
from a 600 bp DNA fragment encoding the E. gracilis 3'-end of rps14 and the complete
trnF and trnC genes (19). Plasmid pEZC942, which contains the 4.2 kb E. gracilis
chloroplast BglII fragment BgU2 (Fig. 2) that overlaps both EcoM and EcoRI fragment
EcoA (24), was selected. The 2.0 kb EcoRI/BglII fragment of pEZC942 was subcloned
into pBS- and is designated pEZC942.20. Plasmid pEZC948.1, which contains the 3.5
kb BglII fragment BglM (Fig. 2) located entirely within EcoA, was also selected from
the library. The plasmid pEZC948.2, which contains the fragment BglM in the opposite
orientation to that of plasmid pEZC948.1, was kindly provided by G. Yepiz-Plascencia
of this laboratory. The adjacent BglII fragments enabled DNA sequence analysis across
the EcoM-EcoA EcoRI restriction sites.
Double-stranded plasmid DNAs were purified on cesium chloride gradients after alkaline
lysis (23) and linearized by double digestion with PstI and XbaI (for pEZC942 and
pEZC948) or with PstI and BamHI (for pEZC541.2 and pEZC942.20). Overlapping
deletion subclones (25) were generated as in (18), except 7-10 units of exonuclease III/µg
DNA were used to digest from either the BamHI or XbaI ends of the linearized plasmid
DNAs. Single-stranded template DNA was prepared using the helper phage M13K07 (26).
Templates for 100% of both DNA strands were sequenced via the [α-35S]-dATP dideoxy
chain termination method (27), with Sequenase (United Biochemical Co.) using standard
T3 and M13 primers.
DNA Sequence Analysis
Analysis of sequence data was performed on IBM PC-XT and PC-AT computers using
the DNA and protein analysis programs of Mount and Conrad (28). Protein open reading
frames were aligned with known polypeptide sequences compiled in the Protein Identification
Resource (P.I.R.) of the National Biomedical Research Foundation, Georgetown University,
using the FASTA search algorithm described in (29). Multiple protein alignments were
prepared with the program described in (30).
RNA Isolation
Purified chloroplasts from photoautotrophically grown E. gracilis cultures were resuspended
in lysis buffer (0.5% SDS, 10 mM Tris-HCl pH 7.5, 1 mM EDTA, 5 mM DTT), extracted
three times with equal volumes of phenol saturated with 10 mM Tris-HCl pH 7.5, once
with phenol-chloroform-isoamyl alcohol (1:1:24), once with ether and then ethanol
precipitated. The resulting nucleic acid was digested with RQ1 RNase-free DNase (Promega
Biotechnology), at 1 unit/µg DNA, 15 min., 37°C. After phenol extraction and ethanol
precipitation, the chloroplast RNA was subjected to electrophoresis on 0.66 M formaldehyde
agarose gels (31).
RNA Hybridization
32P-labeled synthetic RNAs complementary to the rpl16-rpl14-rpl5 transcripts were
prepared from linearized DNA templates (32) and used for probing membrane filter
(Northern) blots of chloroplast RNA fractionated on 0.66 M formaldehyde/1% agarose
gels (31). Separated RNAs were transferred to Genescreen filters (Dupont Co.).
Prehybridization was done at 55°C for 12 hrs in 50% deionized formamide, 5x SSC, 0.5%
SDS, 20 mM NaPO4 pH 6.8, 0.5 mg/ml Ficoll, 0.5 mg/ml polyvinyl pyrrolidone, 0.5
mg/ml BSA, and 200 µg/ml heat denatured herring sperm DNA. Hybridization buffer
and conditions were as for pre-hybridization, with the addition of probe [2 x 10^7 dpm] 32P-labeled RNA. The membrane filters were washed in 0.5% SDS, 0.1x SSC and 20 mM
NaPO4 pH 6.8, twice at 55°C for 1 hr and once at 65°C for 15 min. Autoradiography was
done with 12-16 hr exposures on Kodak X-omat-AR X-ray film.
Primer Extension cDNA Sequence Analysis
The purified oligo-deoxynucleotide primers, 5'-TCGGCTATATTTACCGTAAG for rps8
exon 3 (positions 3008-3027, Fig. 1) and 5'-TCCAAAGTTTGCCCCCACGT for rpl16
exon 4 (positions 885-904, Fig. 1), were synthesized at the University of Arizona
Biotechnology Center. The primers were 5'-end labeled with T4-polynucleotide kinase
(23). A total of 8 x 10^6 dpm of 32P-oligonucleotide primer was co-precipitated with 12
µg chloroplast RNA and dissolved in 10 µl 200 mM KCl, 10 mM Tris-HCl (pH 8.3 at
43°C). The mixture was heated to 85°C for 3 min. and quickly cooled on ice. The mixture
was then placed at 45°C for 1 hr and slowly cooled to room temperature. The primer
extension reactions were carried out in a series of five 7 µl reaction mixtures containing
4 units of AMV reverse transcriptase (Bethesda Research Labs), 2 µl of annealing mixture,
1.5 µl of 5x reverse transcriptase buffer (100 mM Tris-HCl pH 8.3, 50 mM MgCl2,
25 mM DTT, 280 µg/ml actinomycin D), 300 µM of each dATP, dTTP, dCTP and dGTP
and 160 µM of one of the four dideoxyribonucleotides, except in the N-reaction which
lacked ddNTPs. After incubation at 43°C for 45 min., 5 µl of loading dye (96% formamide,
10 mM EDTA, 0.1% xylene cyanol, 0.1% bromophenol blue) was added. After heating
to 85°C for 2 min., the samples were electrophoresed through 0.25 mm thick 6%
polyacrylamide gels containing 7 M urea, 89 mM Tris-borate (pH 8.2) and 2 mM EDTA
and visualized by autoradiography. Additional control primer extension reactions were
done as above except the 5'-32P-labeled primer was annealed to 1 µg of a genomic
chloroplast DNA clone (a single-stranded recombinant phagemid, pEZC541.2).
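Because each primer must anneal to the mRNA, it should equal the reverse complement of the RNA-like strand at the stated positions. The following minimal sketch checks this for the rpl16 exon 4 primer. The 20-nt target string is our transcription of bases 885-904 from Fig. 1 (an assumption about how the printed sequence lines are numbered), and the function names are illustrative only.

```python
# Genomic (RNA-like strand) bases 885-904, transcribed by hand from Fig. 1.
# This transcription is an assumption; the figure numbering is per 100-nt line.
TARGET_885_904 = "ACGTGGGGGCAAACTTTGGA"

# rpl16 exon 4 primer as given in Materials and Methods (5' -> 3').
RPL16_PRIMER = "TCCAAAGTTTGCCCCCACGT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def anneals_to(primer: str, target: str) -> bool:
    """True if the primer is exactly complementary to the target strand."""
    return reverse_complement(primer) == target.upper()


if __name__ == "__main__":
    print(reverse_complement(RPL16_PRIMER))
    print(anneals_to(RPL16_PRIMER, TARGET_885_904))
```

The same check could be applied to the rps8 exon 3 primer once the sequence at positions 3008-3027 is read from Fig. 1.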
RESULTS
Characterization of the rpl16-rpl14-rpl5-rps8-rpl36 Chloroplast Ribosomal Protein Genes
We are interested in the organization, expression, RNA maturation pathways, and novel
introns of Euglena chloroplast ribosomal protein genes, and the evolutionary relationship
of these genes and introns to other comparable ribosomal protein operons (17,18,19,33).
The region under study is located in the E. gracilis chloroplast EcoRI fragments EcoM
and EcoA (24). Preliminary evidence for the presence of ribosomal protein genes on the
EcoM fragment of Euglena chloroplast DNA was obtained via Southern hybridization
experiments. EcoM hybridizes to tobacco chloroplast DNA fragments SalI-10 and 11, and
BamHI-7 and 10 which encode a cluster of ribosomal protein genes (33) (data not shown).
The DNA sequence of 3840 bp from the EcoM and EcoA fragments (24) of Euglena
100
GTAG
rpl16
M I S P K R T K F R K Y H R G R
200
«TTAACAGGTAAAATCTATATGAGACTTTTTTTATGTAAAAAATAAAATTTTTATTGCGTTTTACCACGATAGTTATTTTATTCGATTTGATTTAAGTGA
L T G K I Y
300
AATTTrTTATTTAATTGATAAAGTTGTTTTTGGTAATTATGCTTGTGCGATCTAAATAATTTAAAAAATTAATTATGAGTAAATTTAATTATTTTTTGTT
D K V V F G N Y A U )
*0O
TAAGTTTTAAATTATTGTAATTAAATATCTTTAATAGCATTTTATTCTTAACGTAATAAATGTTTAATAATTTTAAATTTTAAGTATTAAATATTTTTTT
500
SAGCTAATAAAAAnAAAATAATAAAAAATACAnTAATCTTTACTAAAATTAAGTAGATTATTATTAAAATTTTTTATTTTATATTATGATATACAATT
600
TGATTTTGTACTTATTTTACTTTTTTTAAATTTATTAATTAAAAGCCAAATGCATTACATTTTGCTTGTTTGGATTTTTTAAGGCATATTGTACUfilll
700
TACAATCATTAGAGCCTGGTTGGATAACTTCACGTCAAATTGAAGCTGCTTTGTGGTTTTTTGTGATTTCTAATAAATTTATATAGATTTTTTGCTTTCA
Q S L E P G U I T S R O I E A A
800
AAGTTATTTAATAAATTTTTATTGTTTATAAACTIITTTGGTTTTCGATAAAAAAGGTTTTTTAACTTATTTACTTATAAGATGATTTTTCTTAATTGTT
WO
TTATTTTAAATAAATTTTTACTGTTTTCTTTTCTAAAAAAAAATTTTTTTAJiUAIICGTAGAGTTATTACAAGATATGCAAAACGTGGGGGCAAACTT
R R V I T R Y A K R G G K L
1000
TGGATAAGAATTTTTCCCGATAAGCCTGTAACATTTCGAGCAGCCGAAACTCGCATGGGTTCAGGGAAAGGAAATGTAGAATATTGGGTTGCAATTGTAA
U I
R l
F P D K P V T F R A A E T R H G S G K G N V E Y U V A I V
1100
AACCTGCAAAAATTCTTTACGAAGTATTGGGTATTTCAGAATCCATTGCAAAGTATTCTTTAAAAATAGCAGGATATAAAAIGCCTATTAAAACTCGTGT
K P G K I L Y E V L G I S E S I A I C Y S L K I A G Y K H P I K T R V
1200
TAT7GTTAAAATTTAAAACTGTTTAATGCTAATTGTTTTTGTTAATAATATTTTTGGTAAAATTCTTATTTTGGTAGTTCTTCGTTCTTTATAATTTATT
1 V K I
•
1300
7TTIAGTTTAGATGATTAAACCACAAACATATTTGAAAATTGCGGATAATACTGGAGCACAAAAAATTATGTGTATTCGTATATTAGGACCAAATTGTCA
rplU
M I K P Q T Y L K I A D I I T G A O I C I M C I R I L G P H C Q
1*00
GTATGCGAATATTGCTGATATAATAATAGCGGTTGTAAAATTGTGTATTAGAGTTAATAATTTATATATCTTTTTATGTTTTTATTATTTTATTATTCTT
Y A H I G D I I I A V V K
1500
TTATATTGATATTGTTTATAAATTTATAAAGATTTTATCTATTTTAAAGAAGCTATTCCTAACATGGTTGTTAAAAAATCAGATATTGTTAAAGCTGTTA
E A 1
P N M V V I t l C S D I v r A V
1600
TTGTTAGAACTGTTAAAGGAGTACGTAGAGAAACTGGAATGflCAATTCGTTTTGACGAAAATGCTGCTGTCATAATTAATAATGATCGTTCACCTAAAGG
I V R T V K G V R R E S G M A I R F D E N A A V I I I I N D R S P K G
1700
rACAAGAATTTTCGGCCCCATTGCTCGCGAATTKGAGAAAAGGAATTCGTAAAAATAATGTCCTTAGCGCCAGAGGTTGTGTGATACAATAAAGACTTC
T R I
F G P I A R E L R E K E F V K I H S L A P E V V *
1800
ICATAATAAGTTGTTTTTCATTTTTAGCGTATATCTATATTTTCTTAATTTTTTGTATGTGTGTTTTTTTATTATAAATTTTACTTTTATCATTCCTATT
1900
TATAGTTCCAAAATATTTTTATTAAAATTTAGTSTAGTA6TAATATTTAAAAAGGATTTTATAAAAAAATGCAAAGATTAAAATCGTTTTATTTAGAAAC
rpl5
M O R L K S F Y L E T
2000
TATCATTCXCAAACTTAAAGAAGAATTTGGTTATCTTAATTCTTATAGGGTTCCTAAATTAAAAAAGATTGTTATAAATCGAGGATTTGATGAATCTTGT
I I P K L K E E F G Y V N S Y R V P K L K K I V I N R G F O E S C
2100
CAAAATTCAAAAATTTTGGAAGTTTTATTAAATGAATTA6AAA7TATTTCTGGTCAAAAGCCTATTATAAGTAAGGCGAAAAAAGCTATTGCTAACTTTA
0 N S K 1 L E V L L H E L E I
I S G O K P I
I
S K A K K A I A M F
2200
AACTTAAAGAAAAGATGCCTGTTGGTATGTTTTTGACTTTGCGTAGTGAAAAGATGTACAGTTTTTTAGACCGGTTAATIAATCTATCTTTACCTAGAAT
K L K E C N P V G M F I T L R S E I C H Y S F L D R L I N L S L P R I
2300
TAGAGATTTTCAAGGAATAAACAAGAATTGTTTTGATGGATCAGGTAACTTTAGTTTTGGGTTAAGTGAACAATCAATGTTCCCTGAAATTAACTTTGAT
R O F a G I N K N C F D G S G H F S F G L S E O S M F P E I N F D
2*00
AAAATGATTAAAGTACAAGGITTGAATATAACAATCGTTACAACTGCTGAAACGAATCAAGAAGCTTTCTTTCTTTTAAAAGAATTAGGTATCCCGTTCC
K H I K V Q G L N I T I V T T A E T U O E A F F L L K E L G I P F
25O0
GAGATTAATTTTTATACTTTTTTTTACGTGTTAAGATGTACAAAATCTATTTTTTATTAATAGTAATTTTATTGTTTTATTTTTAGAATTTTATGACAAA
a D •
rprf
H T N
2600
TATTGATTTGCGAGTTTAAGTGTAATAATATATTATTAATAATTTTTTATATTTATTATCTATATATATGTTGAAATTCGGTAAATAAGAGCATTATTAA
I
0
2700
ATATTTTGATAGTAGTATTTTTCTTTAACAAACCGTTTATTATTGTTTTGTAAAATTTATTCATTAGTTGGGAGATCACTTATnTTATTACTATATTAA
2800
TATTACATAATTAAATTAGTTAAATATTAGTAAATCTTGAAATATTTAAATTTTATTTTTCAAAAaTTATGATTAATGOTTTCCTGTACAGTTTGTTAA
2900
TAAATTGTTTTAAAAATTTATAATTTAATTTAATGTAATTAGGTGTGATTTTTTTTATAGAAGTATAATTAATTTTTGATTTTTATTAATATATATACAT
V
I (S)
JOOO
ATTTTTATTCTTTTGTTAGGTTTTATTTAGCCTTATACGATATGCTTACAAGAATAAGAAACTCTCTTTTAATAAAAGCTAGAAAAGTTAATGTTATTAA
D H L T R I R M S L L I K A R K V N V I N
3100
T t C L T V N I A E I L K K E G F I D S F E L A D A T C L T E N G V
3200
ATAAAAAAATATAITACAATCTTTTTAAAATATAAAGGTCCAAAACAAGTTTCTTATATAACTAAAATAAAACGTGTAAGCAAACCTGGTTTGCGTACTT
I K K Y I T I F L K T K G P K O V S Y I T K I I C R V S I C P G L R T
3300
ATAGTAGTTATAAAAGACTACAATCAGTAGCAGGIGGCGTTGCTTTAACTGTTT5IG£GATATGTTTTTTGAGTTTCATAAGATAAAAATTATAAGCAAT
Y S S Y K R L O S V A G G V G L T V
3*00
3600
TCTTAAAGAAATTTTAAAATTTAATTTTGCATGTCTACTTCTAAAGGATTAATGACTGATCGATTGGCTAGATCTAATAAAATTGGTGGCGAGATTCTGT
( U S T S K G I N T D R L A R S I I K I G G E I I .
3700
F Y I U *
rplM
H
3800
AAAATACGTTCTTCTOTTAAAAAAATTTGTAATAAATGTTATTTCATTCGTCCCAAAAACAATCTTTTAGTTGTTTGTATAAATAACAAGCATAAACAAC
K I R S S V C K l C N K C T L I R R t H k L L V V C I N M K H r O
38*0
GACAGGGTTAAACTTGTTTTGCGTCTATAATTTGTAGGCG
R 0 G *
Figure 1. DNA sequence of the RNA-like strand for the Euglena chloroplast ribosomal protein loci rpl16, rpl14,
rpl5, rps8 and rpl36. Coding regions for exons are designated by the single letter amino acid code directly below
the second nucleotide of each triplet codon. Amino acid symbols that are in parentheses indicate a split codon.
Sequences which contain conserved nucleotides at the 5' and 3'-ends of each intron are underlined. Asterisks
(*) designate stop codons. Gene symbols follow the nomenclature described in (46).
gracilis chloroplast DNA is given in Fig. 1. This region was found to encode a cluster
of 5 ribosomal protein genes, with the gene organization rpl16-95 bp spacer-rpl14-183
bp spacer-rpl5-84 bp spacer-rps8-83 bp spacer-rpl36. Protein coding regions are interrupted
by introns in the genes for rpl16 (3 introns), rpl14 (1 intron), and rps8 (3 introns). The
cDNA sequence analysis of splice boundaries, and intron properties are described below.
The overall organization of the exons, introns, and intergenic regions for the 5 ribosomal
protein genes is shown in Fig. 2. This region is located 2.8 kbp distal to, and in the same
polarity as, the previously described rpl23-rpl2-rps19-rpl22-rps3 Euglena chloroplast
ribosomal protein gene cluster (18). Immediately downstream from the 5 genes described
in this paper, and in the same polarity are the genes trnI-rps14-trnF-trnC (19).
Exon-Intron Boundaries of rpl16 and rps8 Determined by cDNA Sequence Analysis
It was not possible to precisely define the exons of rpl16 and rps8 from the DNA sequence
data alone because chloroplast ribosomal protein coding regions are not well-conserved
among species (18), the introns of Euglena chloroplast genes are highly novel, and several
potential exons appeared to be extremely small. To accurately determine splice boundaries,
primer extension cDNA sequence analysis was employed. From a 20-nt oligonucleotide
primer complementary to positions 885-904 (Fig. 1) of rpl16 exon 4, the sequences across
all three rpl16 introns were determined by direct cDNA sequencing of the spliced mRNA
template. A comparison of the genomic and cDNA sequence ladders used to characterize
splice sites for rpl16 exons 1-2 and exons 2-3 is shown in Figure 3. (Data on exon
Figure 2. Organization and restriction map of the rpl16, rpl14, rpl5, rps8 and rpl36 loci from the Euglena gracilis
chloroplast genome. The trnI and rps14 loci are included as reference genes (19). The black boxes represent
exons, and open boxes represent introns. The hatched region above the genes indicates the probe used in the
northern analysis (Fig. 6). The large arrow above the gene boxes indicates direction of transcription, from left
to right. The small arrows under the genes indicate the position of primers used in the cDNA sequencing. Sizes
are in kilobase-pair (kbp). Restriction fragments are labeled as numbers or letters between restriction sites (see
ref. 24 for complete map).
3-4 splicing not shown.) Three rpl16 introns of 97, 356, and 208 nt were identified.
They are at positions 120-216, 245-600, and 651-858 (Fig. 1).
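The intron sizes follow directly from the inclusive coordinates, and the same arithmetic gives the sizes of the two internal exons that the introns bracket. The sketch below is illustrative only; the helper names are ours, and only the coordinates come from the text.

```python
# rpl16 intron coordinates (inclusive, numbering of Fig. 1) as stated in the text.
RPL16_INTRONS = [(120, 216), (245, 600), (651, 858)]


def span_length(span):
    """Length of an inclusive (start, end) coordinate pair."""
    start, end = span
    return end - start + 1


def internal_exons(introns):
    """Exons lying between consecutive introns, as inclusive (start, end) pairs."""
    return [
        (upstream_end + 1, downstream_start - 1)
        for (_, upstream_end), (downstream_start, _) in zip(introns, introns[1:])
    ]


if __name__ == "__main__":
    print([span_length(i) for i in RPL16_INTRONS])
    for exon in internal_exons(RPL16_INTRONS):
        print(exon, span_length(exon))
```

Under these coordinates, rpl16 exons 2 and 3 span positions 217-244 and 601-650, respectively.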
The rps8 amino terminal coding region, presumably located somewhere within the 600
bp distal to rpl5 (Fig. 1), could not be defined from the DNA sequence data alone.
Therefore, the 5'-end of the spliced rps8 mRNA, containing two exon-exon boundaries,
was also determined by cDNA sequence analysis. A 20-nt oligonucleotide primer,
complementary to positions 3008-3027 of a conserved rps8 coding region now identified
as within exon 3, was used. A comparison of rps8 genomic and cDNA sequence is shown
in Fig. 4. The cDNA ladder contains two unique exon sequences, including an 8 nt-long
sequence of 5'-CTAATTAC-3'. The mRNA-like complement of this exon
(5'-GTAATTAG-3') could be uniquely positioned at bases 2835-2842 (Fig. 1), 95 bp
upstream from exon 3 of rps8. The cDNA sequence then skips over the 327 bp intron
1 to an amino terminal exon coding for 5 amino acids (positions 2493-2507). To our
knowledge, the 8 nt exon 2 of rps8 is the smallest exon described to date for any chloroplast
gene.
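The coordinates quoted for the 5'-end of rps8 can be checked against each other. The sketch below is a consistency check under the assumption that all positions are inclusive and follow the numbering of Fig. 1; the variable and function names are ours.

```python
# Coordinates stated in the text (inclusive, numbering of Fig. 1).
EXON1 = (2493, 2507)        # amino terminal exon, 5 codons
EXON2 = (2835, 2842)        # the 8-nt exon
INTRON2_LENGTH = 95         # exon 2 lies 95 bp upstream from exon 3
PRIMER_SITE = (3008, 3027)  # primer target within exon 3
EXON2_MRNA_LIKE = "GTAATTAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def span_length(span):
    """Length of an inclusive (start, end) coordinate pair."""
    return span[1] - span[0] + 1


def gap_between(upstream, downstream):
    """Inclusive coordinates of the region separating two exons."""
    return (upstream[1] + 1, downstream[0] - 1)


def reverse_complement(seq):
    """Sequence as read from the cDNA (antisense) ladder."""
    return seq.translate(_COMPLEMENT)[::-1]


intron1 = gap_between(EXON1, EXON2)
exon3_start = EXON2[1] + INTRON2_LENGTH + 1

if __name__ == "__main__":
    print("exon 1 length:", span_length(EXON1))
    print("intron 1:", intron1, span_length(intron1))
    print("exon 3 begins at:", exon3_start)
    print("cDNA read of exon 2:", reverse_complement(EXON2_MRNA_LIKE))
```

With these values intron 1 spans 2508-2834 (327 nt), exon 3 begins at position 2938, and the primer site lies inside exon 3, in agreement with the description above.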
In addition to the five exon-exon boundaries determined from the cDNA sequence, two
boundaries (rpl14 intron and rps8 intron 3) were predicted from highly conserved regions
of the genomic sequence data (Fig. 1). Properties of the 7 introns from this ribosomal
protein transcription unit are described below.
Ribosomal Protein Gene Products
The amino acid sequences derived from the five ribosomal protein genes are given in Fig.
1. The amino acid sequences were analyzed at the seven splice sites and were found to
continue in-frame across the exon-exon junctions of the spliced mRNAs. Two interesting
examples are the rps8 exon 1-2 and exon 2-3 junctions. As shown in Fig. 5a, the 8-nt
exon 2 is spliced in frame to exons 1 and 3, and encodes a valine and conserved isoleucine
residue, as well as part of a serine codon. Splicing yields a product that is colinear with
six other ribosomal protein S8 sequences (Fig. 5a). Adjacent to the exon 1-2 junction
is an invariant aspartate at position 5. At the exon 2-3 junction, 10 of 11 residues are
invariant or semi-invariant.
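The in-frame splicing of the 8-nt exon can be illustrated by translating the joined exons. In the sketch below the exon 1 sequence and the first 25 nt of exon 3 are our transcriptions from Fig. 1 (assumptions, since they are read by hand from the printed sequence), exon 2 is the sequence given in the text, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# rps8 exon sequences (RNA-like strand). Exons 1 and 3 are hand transcriptions
# from Fig. 1 and should be treated as assumptions; exon 2 is quoted in the text.
EXON1 = "ATGACAAATATTGAT"                # positions 2493-2507
EXON2 = "GTAATTAG"                       # positions 2835-2842
EXON3_START = "CGATATGCTTACAAGAATAAGAAAC"  # first 25 nt of exon 3


def translate(seq):
    """Translate complete codons of seq; trailing partial codons are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate(EXON1 + EXON2 + EXON3_START))  # all three exons joined
    print(translate(EXON1 + EXON3_START))          # exon 2 omitted
```

With exon 2 included, the translation reproduces the amino terminus shown for Euglena S8 in Fig. 5a; omitting it shifts the reading frame of exon 3.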
The five Euglena genes rpl16 (134 codons), rpl14 (122 codons), rpl5 (180 codons),
rps8 (141 codons) and rpl36 (38 codons) encode polypeptides of predicted molecular
weights of 15467, 13387, 20617, 15670, and 4405, respectively. These chloroplast
Figure 3. Primer extension RNA sequence analysis of the splice junctions between rpl16 exons 1 through 3.
The lanes labeled RNA (A,T,C,G,N) contain the cDNA sequence generated by reverse transcription of Euglena
chloroplast RNA using a primer complementary to bases 885-904 (Fig. 1). The rpl16 cDNA sequence is listed
to the right of the lanes from which it is read. The bands in the control 'N' lanes are due to spontaneous stops
and pauses in the reverse transcriptase reactions. In both the upper and lower RNA panels, a solid line intersects
the sequence at the exon/exon junctions. The line branches to the left of the RNA panel and intersects the
corresponding exon/intron junctions (arrows) in the genomic DNA sequencing panel. The chloroplast genomic
DNA sequence is listed to the left of the lanes from which it is read. Exons 1, 2 and 3 and introns 1 and 2 are labeled.
Figure 4. Primer extension RNA sequence analysis of the splice junctions between rps8 exons 1 to 3. The lanes
labeled CT RNA (A,T,C,G) contain the cDNA sequence generated by reverse transcription of Euglena chloroplast
RNA using a primer complementary to bases 3008-3027 (Fig. 1) as described in Materials and Methods. The
rps8 cDNA sequence is listed to the right of the lanes from which it is read (right panel). Dark lines that intersect
the sequence indicate exon/exon junctions in the cDNA sequence. The DNA sequence panels (DNA, ATCG)
contain the sequencing reactions for the corresponding chloroplast genomic clones. The genomic DNA sequence
is listed to the left of the lanes from which it is read. Exon and intron regions are labeled next to the DNA
sequence. The black lines cross the DNA sequence at the exon/intron junctions (arrows), linking up with the
corresponding lanes in the DNA and RNA sequencing panels.
a. 30S Ribosomal Protein S8 (amino terminus)
            *+ * *++++* ***+
            * + +++++ +*+ +* ****
Euglena     MTNIDVISDMLTRIRNSLLIKARKVNVINTKLTVNIAEILKKEGFI.
E. coli     MSMQDPIADMLTRIRNGQAANKAAVTHPSSKLKVAIANVLKEEGFI.
Mycoplasma  MT TDVIADMLTRIRNANQRYLKTVSVPSSKVKLEIARILKEEGFI.
Liverwort   HG NDTIANMITSIRNANLGKIKTVQVPATNITRHIAKILFQEGFI.
Tobacco     HG RDTIAEIITSIRNADMDRKRWRIASTNITENIVQILLREGFI.
b. 50S Ribosomal Protein L5
            *+ +* +* ++++ +* +*+ * +
            **+++** +* * ++ +++*+*+
Euglena     MQ RLKSFYLETIIPKLKEEFGYVNSYRVPKLKKIVINRGFDESCQNSKIIiE
E. coli     MA KI^DyYKDEVVKKLMTEFNYNSVMQVPRVEKITLNMGVGEAIADKKLLD
Mycoplasma  MKSRLEIKYKDQIVLELFKELNYKSIMQVPKIQKIVINMGIGDATTDPKKLD

            ++* +*****+++**+*++* **+++ ++* +***+++* +*+++*
Euglena     VLmELEIISGQKPIISKAKKAIANFKLKEKMPVGHFLTLRSEKMYSFLDRL
E. coli     NAAADLAAISGQKPLJTKARKSVAGFKIRQGYPIGCKVTLRGQRMWEFFERL
Mycoplasma  AAIFELEKLSGQKPIVTKAKKSLAVFKLREGMAIGAKVTLRGKKMYDFLDKL

            * +++**+*** *++ -t-t-*** *+++ *++** +***+++#* ++ *++**
Euglena     INLSLPRIRDFQGINKNCFDGSGNFSFGLSEQSHFPEINFDKMIKVQGLNIT
E. coli     ITIAVPRIRDFRGLSAKSFDGRGDYSHGVREQIIFPEIDYDKVDRVRGLDIT
Mycoplasma  INVALPRVRDFRGVSKTSFDGFGNFYTGIKEQIIFPEVDHDKVIRLRGMDIT

Euglena     IVTTAETNQEAFTLLKELGIPFRDter
E. coli     ITTTAKSDEEGRALAAFDF PFRKter
Mycoplasma  IVTSAKTNKEAFALLQKIGMPFEKter
Figure 5. a) Alignment of the amino terminus of Euglena chloroplast ribosomal protein S8 with the corresponding
S8 sequences from E. coli (34), Mycoplasma capricolum (35), liverwort chloroplast (3), and tobacco chloroplast
(33). Vertical arrow heads point to the location of introns in the Euglena gene. b) Alignments of the gene products
of the Euglena gracilis, E. coli (34) and Mycoplasma capricolum (35) rpl5 loci. Amino acids are represented
as the single letter amino acid code. Asterisks (*) denote amino acid identity and plus signs (+) denote conservative
amino acid replacements.
ribosomal protein sequences were aligned with sequences in the P.I.R. protein data base
by the FASTA algorithm of Lipman and Pearson (29). The only significant alignments
were found with 70S ribosomal proteins from prokaryotes and other chloroplasts. The
gene products are 42-62% identical in primary structure to the E. coli ribosomal protein
counterparts (Table 1), and 42-69% identical to the corresponding plant chloroplast gene
products. Based on the amino acid sequence comparisons, cDNA sequencing, and the
mRNA products of these genes (described below), we would predict that all 5 Euglena
genes are functional genes, and not pseudogenes as has been described for the rpl23 of
spinach chloroplast DNA (14).
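The identity ranges quoted above and the column averages of Table 1 can be recomputed from the individual entries. The sketch below assumes our row-by-row reading of Table 1 (rpl5 has no liverwort or tobacco counterpart) and assumes that the printed averages were rounded half-up to whole percentages; both are assumptions rather than statements from the table itself.

```python
from decimal import Decimal, ROUND_HALF_UP

# Percent identity values as read from Table 1 (None = no counterpart gene).
IDENTITY = {
    "rpl16": {"E. coli": 54, "liverwort": 65, "tobacco": 64},
    "rpl14": {"E. coli": 62, "liverwort": 69, "tobacco": 55},
    "rpl5":  {"E. coli": 42, "liverwort": None, "tobacco": None},
    "rps8":  {"E. coli": 43, "liverwort": 53, "tobacco": 42},
    "rpl36": {"E. coli": 51, "liverwort": 68, "tobacco": 65},
}


def column_values(organism):
    """All available identity values for one organism."""
    return [row[organism] for row in IDENTITY.values() if row[organism] is not None]


def column_average(organism):
    """Mean identity for one organism, rounded half-up to a whole percent."""
    values = column_values(organism)
    mean = Decimal(sum(values)) / Decimal(len(values))
    return int(mean.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    for organism in ("E. coli", "liverwort", "tobacco"):
        values = column_values(organism)
        print(organism, min(values), max(values), column_average(organism))
```

Decimal rounding is used here because Python's built-in round() rounds halves to the nearest even integer, which would turn the tobacco mean of 56.5 into 56 rather than the printed 57.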
A New Chloroplast Gene, rpl5
The gene for ribosomal protein L5, rpl5, is a new chloroplast gene. It is not encoded in
the chloroplast genomes of liverwort and tobacco that have been completely sequenced,
nor in the partial chloroplast sequences known from other species. There are no known
nuclear genes for chloroplast ribosomal protein L5.
The identification of rpl5 is based on its location in a ribosomal protein operon, and the
amino acid sequence alignment with the corresponding E. coli polypeptide shown in Fig.
5b. There are 42% identical residues, and an additional 41% of the residues (conservative
replacements) contributing to the similarity score of FASTA. The polypeptides are exactly
co-linear except for a single additional residue near the carboxyl terminus in the Euglena
TABLE 1. PERCENT AMINO ACID IDENTITY
           E. coli   LIVERWORT   TOBACCO
rpl16        54         65          64
rpl14        62         69          55
rpl5         42         -           -
rps8         43         53          42
rpl36        51         68          65
Average      50         64          57
TABLE 1. The percent amino acid identity of the Euglena chloroplast ribosomal protein gene products compared with
those of E. coli, liverwort and tobacco. The average percent identity between all the Euglena gene products and those
from a particular organism is listed at the bottom of each column.
sequence. A similar alignment is possible with rpl5 from Mycoplasma capricolum
(Fig. 5b).
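Percent identity in an ungapped alignment is simply the fraction of aligned positions that carry the same residue. The sketch below shows the calculation on the short amino terminal S8 segments of Euglena and E. coli as they appear in Fig. 5a, chosen only because they are short and align without gaps. This is not the FASTA similarity score, the sequences are transcribed from the figure (an assumption), and the function names are ours.

```python
# Amino terminal S8 segments as they appear in Fig. 5a (46 residues each).
EUGLENA_S8 = "MTNIDVISDMLTRIRNSLLIKARKVNVINTKLTVNIAEILKKEGFI"
ECOLI_S8 = "MSMQDPIADMLTRIRNGQAANKAAVTHPSSKLKVAIANVLKEEGFI"


def identical_positions(a, b):
    """Number of aligned positions with the same residue (ungapped alignment)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y)


def percent_identity(a, b):
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * identical_positions(a, b) / len(a)


if __name__ == "__main__":
    print(identical_positions(EUGLENA_S8, ECOLI_S8), "of", len(EUGLENA_S8))
    print(round(percent_identity(EUGLENA_S8, ECOLI_S8), 1))
```

The same function could be applied row by row to the L5 alignment of Fig. 5b, provided the single extra residue near the Euglena carboxyl terminus is represented by a gap character in the other sequence.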
The rpl16-rpl14-rpl5-rps8-rpl36 Genes Are in a Polycistronic Transcription Unit
A preliminary analysis of transcripts from the ribosomal protein gene cluster was
undertaken. A 32P-RNA probe specific for the rpl16-rpl14-rpl5 coding region (Fig. 2)
was hybridized to membrane filter (Northern) blots of chloroplast RNAs that had been
separated by electrophoresis. Transcripts of 1.1, 1.3, 1.6, 2.3, 2.5, 2.9, 4.5, 4.8, 7.8,
and 8.3 kb and additional minor RNAs are detected. Based on additional experiments with
gene and intron specific probes (D. A. Christopher and R. B. Hallick, in preparation)
we can interpret the 1.1 and 1.3 kb species as dicistronic rpl16-rpl14 and rpl14-rpl5 mRNAs, and
the 1.6 kb species as a tricistronic mRNA. The 2.3, 2.5, and 2.9 kb mRNAs are even
larger precursors of the ribosomal protein transcription unit, containing cistrons distal to
rpl5. The 4.5 to 8.3 kb transcripts are much larger than the region encoding the 5 ribosomal
protein genes, and are presumed to be part of a larger precursor mRNA, perhaps encoding
regions of the flanking ribosomal protein genes (19). The intron specific probes hybridize
to unique, but rare and presumably unspliced, chloroplast RNAs (data not shown). A detailed
analysis of these pre-mRNAs is in progress.
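A rough lower bound on the size of each fully spliced polycistronic mRNA can be estimated from the codon counts and spacer lengths reported above. The sketch below is only an estimate: it takes three nucleotides per reported codon, adds the intercistronic spacers, and ignores 5' and 3' untranslated sequences, whose lengths are not given here. The function name is ours.

```python
# Gene order, codon counts and intercistronic spacers as reported in Results.
GENE_ORDER = ["rpl16", "rpl14", "rpl5", "rps8", "rpl36"]
CODONS = {"rpl16": 134, "rpl14": 122, "rpl5": 180, "rps8": 141, "rpl36": 38}
SPACERS_BP = [95, 183, 84, 83]  # spacer i lies between gene i and gene i + 1


def spliced_length(first, last):
    """Estimated length (nt) of a spliced transcript from gene `first` to `last`.

    Counts coding sequence plus internal spacers only; untranslated leader and
    trailer sequences are not included (their sizes are not stated).
    """
    i, j = GENE_ORDER.index(first), GENE_ORDER.index(last)
    coding = sum(3 * CODONS[gene] for gene in GENE_ORDER[i:j + 1])
    return coding + sum(SPACERS_BP[i:j])


if __name__ == "__main__":
    for first, last in [("rpl16", "rpl14"), ("rpl14", "rpl5"),
                        ("rpl16", "rpl5"), ("rpl16", "rpl36")]:
        print(f"{first}-{last}: {spliced_length(first, last)} nt")
```

These estimates give 863 and 1089 nt for the two dicistronic units, 1586 nt for the tricistronic unit and 2290 nt for all five cistrons, to be compared with the observed transcript sizes only in an approximate sense.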
DISCUSSION
Gene organization
As illustrated in Fig. 7, the overall organization of the chloroplast ribosomal protein genes
in Euglena has a striking similarity to the arrangement of genes at the 3'-end of the S10
ribosomal protein operon and the spc operon of E. coli (6,34). The Euglena genes are
also organized like the analogous genes in the tobacco (33), liverwort (3), spinach (15)
and maize (38) chloroplast genomes. There appears to be an overall evolutionary
conservation of gene order for many ribosomal protein genes. The major differences of
Figure 6. Northern blot of purified Euglena chloroplast RNA (EG CT RNA) probed with the rpl16-rpl14-rpl5 region illustrated
in Fig. 2. Transcript sizes are labeled in kilobases (kb): 8.3, 7.8, 4.8, 4.5, 2.9, 2.5, 2.3, 1.6, 1.3, 1.1 and 0.5.
the chloroplast genomes with respect to E. coli are the absence of genes for ribosomal
proteins L29, S17, L24, L6, L18, and S5. These proteins are all presumed to be encoded
in the nuclear DNA of Euglena, higher plants, and liverwort. The one rearrangement in
chloroplast gene order vs E. coli involves rps14, which in Euglena is encoded distal to
rpl36 and trnI (Figs. 5 and 7), and in higher plants at a completely different chloroplast
locus (2,3). Higher plants are distinct from both Euglena and E. coli for the presence
of the gene infA for initiation factor IF-1 (9) adjacent to rpl36 (formerly known as secX).
The allocation of genes for ribosomal proteins between the chloroplast and the nucleus
in photosynthetic eukaryotes has largely been conserved throughout evolution. Differences
in ribosomal protein coding capacity among chloroplast genomes are uncommon, but not
without precedent. Both Euglena and Chlamydomonas chloroplast DNAs have a tufA locus
for elongation factor EF-Tu (36,37) which is absent in higher plants and liverwort. Tobacco
and liverwort chloroplast genomes differ in only one of their 20 ribosomal protein genes,
with rpl21 being present only in liverwort and rps16 only in tobacco (2,3). Therefore,
the discovery of a new chloroplast ribosomal protein gene in Euglena is noteworthy. This
is the first example of a chloroplast rpl5 gene from any species, and only the third example
when all prokaryotes are considered. The rpl5 gene is probably nuclear-encoded in higher
plants, as are other chloroplast ribosomal proteins (4,5).
The number and positions of introns in the Euglena rpl16, rpl14, and rps8 loci distinguish
them from their higher plant chloroplast DNA counterparts. The rpl16 loci of tobacco
Figure 7. Comparison of the gene organization of the Euglena chloroplast ribosomal protein loci with those of
the E. coli spc and 3'-end of the S10 operons (6) and similar ribosomal protein gene clusters from tobacco (33)
and liverwort (3).
(2), liverwort (3), maize (38), spinach (14), and Spirodela (7) all possess a large group
II-like intron after the first three amino acids (M-L-S). The Euglena rpl16 gene lacks a
large intron in this position, but contains three smaller introns. The rpl16 locus in
Chlamydomonas lacks introns (39). The rpl14 and rps8 loci of tobacco and liverwort lack
introns, while the corresponding Euglena genes contain one and three introns, respectively. The
Euglena chloroplast genome is different in having a large number of introns in protein
coding genes (40), but lacks intron-containing tRNA genes (41). The 8 bp exon is a novel
feature of the Euglena rps8 locus. It is flanked by two introns. It is the smallest exon
defined to date in any chloroplast gene, perhaps defining the minimum exon size for correct
splicing in chloroplasts. Small exons are found in yeast ribosomal protein genes S10 and
L46 (42), each of which contains an intron located immediately after the first methionine.
Genes for yeast L25 and S16a have introns after the first 3 and 5 codons, respectively.
The amino terminal exons of rpll6 in tobacco (33), Spirodela (7), liverwort (3), maize
(38) and spinach (15) contain a 9 bp translated region encoding Met-Lys-Ser immediately
upstream from a single large group II-like intron. However, as for other chloroplast
transcripts (43,44), the amino terminal exons are associated with a 5'-untranslated leader
region (7,15).
Another highly unusual feature of the Euglena chloroplast gene cluster is the 2.8 kbp
rps3-rpl16 intercistronic region (Figs. 2 and 7). By comparison, the corresponding
rps3-rpl16 regions of E. coli, and of tobacco and liverwort chloroplasts, are 12, 147, and 58
bp, respectively. The region has been sequenced (D. A. Christopher and R. B. Hallick,
unpublished), but no genes have been identified. Our preliminary Northern
hybridization analysis suggests that the 2.8 kbp rps3-rpl16 intercistronic DNA
is transcribed, and that a stable transcript from this locus accumulates.
Gene Expression
The 4.5 to 8.3 kb transcripts detected with an rpl16-rpl14-rpl5 specific probe are much
larger than the region (3.8 kb) encoding the five genes. These genes are most likely
co-transcribed with upstream and/or downstream genes. Beginning 7.2 kbp upstream from
rpl16, and in the same polarity, is the cluster of ribosomal protein genes
rpl23-rpl2-rps19-rpl22-rps3 (18) that is similar to the proximal end of the E. coli spc operon
(6). Downstream are the trnI-rps14-trnF-trnC loci. There is precedent for overlapping
transcription between the E. coli S10 and spc operons (45). Co-transcription with flanking
genes for the spinach rps3-rpl16 loci has been proposed (15). Experiments are currently
in progress with gene-specific probes from each cistron to define the ribosomal protein
mRNA transcription and RNA maturation pathway(s) for Euglena chloroplasts.
A New Category of Cell Organelle Intron, Designated 'Group III'
There are two types of introns within Euglena chloroplast ribosomal protein genes. They
differ in size, secondary structural features, degree of conservation of boundary sequences,
and other properties. One of these categories is the well-known group II introns that are
found in both chloroplast and mitochondrial genomes (20), exemplified by rpl16 intron
2, and rps8 introns 1 and 3 (Fig. 1, described below). The remainder of the introns are
very similar to a previously described, novel group of 6 introns of the Euglena chloroplast
ribosomal protein genes rpl23, rps19 and rps3 (18), and three introns in tufA (16). With
the addition of the small introns of rpl16, rpl14, rps8, and rps14 (19), there are now enough
of these introns that are sufficiently similar to each other to warrant their classification
as a new category of chloroplast intron. We propose the designation 'group III introns.'
Examples of 13 group III introns are shown in Fig. 8a. The properties of these introns
are as follows: (i) They are small and remarkably uniform in size, with a range of 95-110
nt, and an average size of 102 nt. By contrast, the smallest Euglena chloroplast group
II intron (rps8 intron 3) is 277 nt. (ii) They have degenerate versions of the group II intron
consensus boundary sequences (Fig. 8a). The 5'-boundaries of 5'-NTNNG (N = any nucleotide)
have two conserved bases from the 5'-GTGYG group II consensus sequence (20,47).
The 3'-boundaries of ANNTNNNN-3' have their two conserved nucleotides and pyrimidine-rich
nature in common with the ATTTTAT-3' group II consensus sequence. In 12 of 13
examples (Fig. 8a), the conserved A residue is exactly 8 nt from the 3'-cleavage site.
The conserved bases in the boundaries 5'-NTNNGN...ANNTNNNN-3' may be central
to the splicing mechanism. (iii) They lack the highly conserved secondary structural features
characteristic of group II introns, a central core with 6 radiating helical domains I-VI
(20,48). We have been unable to identify any conserved secondary structure among the
group III introns. (iv) They are located primarily, but not exclusively, in genes for
components of the Euglena chloroplast translation and transcription machinery. There are
numerous group III introns in the rpoB-rpoC1-rpoC2 operon (C. Radebaugh, G. Yepiz-Plascencia,
and R. B. Hallick, unpublished observation), but a few group III introns are
also present in psbB and atpI (R. Drager, J. K. Stevenson, and R. B. Hallick, unpublished
observation). The small introns are to date unique to Euglena chloroplast DNA. (v) The
group III introns are very A/T rich, with a base bias of T > A > G > C. This is a
feature characteristic of most Euglena chloroplast introns (18,49,50). The uniformly small
size, degenerate group II boundaries, and lack of any discernible secondary structure
distinguish group III introns from all other chloroplast and mitochondrial introns, and
from introns in nuclear genes.
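The criteria listed above lend themselves to a simple screening routine. The following Python sketch is not part of the original analysis; it merely encodes the stated size range (95-110 nt), the 5'-NTNNG and ANNTNNNN-3' boundary patterns, and the base-composition ranking. The function name `group_iii_features` and the test sequence `toy` are hypothetical, and the sequence is artificial rather than a real Euglena intron.

```python
import re
from collections import Counter

# Size range and boundary patterns as quoted in the text.
GROUP_III_MIN_NT = 95
GROUP_III_MAX_NT = 110
FIVE_PRIME = re.compile(r"^.T..G")      # 5'-NTNNG
THREE_PRIME = re.compile(r"A..T....$")  # ANNTNNNN-3', A is 8 nt from the 3' end


def group_iii_features(intron: str) -> dict:
    """Report which of the stated group III features a sequence shows."""
    seq = intron.upper().replace("U", "T")
    counts = Counter(seq)
    # Rank the four bases from most to least frequent.
    ranking = sorted("ACGT", key=lambda base: counts[base], reverse=True)
    return {
        "length_ok": GROUP_III_MIN_NT <= len(seq) <= GROUP_III_MAX_NT,
        "five_prime_ok": bool(FIVE_PRIME.search(seq)),
        "three_prime_ok": bool(THREE_PRIME.search(seq)),
        "at_fraction": (counts["A"] + counts["T"]) / len(seq),
        "base_ranking": "".join(ranking),
    }


# Artificial 102-nt sequence built only for illustration (an assumption,
# not data from Fig. 8).
toy = "GTATG" + "T" * 40 + "A" * 30 + "G" * 12 + "C" * 7 + "AGGTTTTT"

print(group_iii_features(toy))
```

A sequence passing all three checks would only be a candidate: the text stresses that group III introns are also defined by the absence of group II secondary structure, which this sketch does not examine.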
Figure 8. Comparison of the intron-exon boundaries of 15 introns from the Euglena gracilis chloroplast rpl23,
rps19, rps3 (18), rps14 (19), rpl16, rpl14, and rps8 loci and 3 introns from tufA (16). The introns are divided
between a) Group III and b) Group II designations. The hyphenated number after the gene symbol indicates the
first, second or third intron (ivs) of the locus. Vertical arrows point to the splice junctions. The asterisk (*) denotes
exon-intron junctions determined by primer extension RNA sequencing. Potential conserved and group II consensus
(47,48,52) 5'- and 3'-nucleotides are indicated below the aligned sequences.
There may be additional group III introns in the sequence data of Fig. 1. The 208-nt
rpl16 intron 3 is not a group II intron, but is twice as large as expected for a group III
intron. One possible interpretation is that rpl16 intron 3 is actually one group III intron
within another group III intron. This is an intriguing possibility. Michel et al. (51) have
proposed that Euglena chloroplast psbF intron 1 is a group II intron within another group
II intron. We note that a potential 102-nt group III intron internal to rpl16 intron 3 could
begin at position 661 (Fig. 1) with the group III-like boundary sequence
5'-TTGTGTATTTCT and end at or near position 762 with the sequence
AAAAAGGTTTTT-3'. There is also the possibility of group III introns in intergenic
spacers. We have recently characterized the rps4-rps11 operon of Euglena chloroplast DNA
and determined that the 124-nt rps4-rps11 intergenic spacer has a 95-nt group III intron
(J. K. Stevenson, R. Drager, and R. B. Hallick, in preparation). We note that the 183-nt
rpl5-rps8 intergenic spacer (Fig. 1) has a potential group III intron of approximately 104
nt beginning with the sequence 5'-GTGTGTTTTTTT at position 1759 and ending at or
near position 1863 with AAAGGATTTATA-3'. Further characterization of precursor
and mature RNA products will be required to determine all of the features of the RNA
maturation and processing pathway for the rpl16-rpl14-rpl5-rps8-rpl36 transcription unit.
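The sizes implied by the coordinates quoted above can be checked with a few lines of arithmetic. The Python sketch below is an illustration only; it assumes that both quoted positions are counted as part of the intron (inclusive numbering), which the text does not state. Because the end points are given as "at or near" a position, differences of a nucleotide or so should not be over-interpreted. The helper name `span_nt` is hypothetical.

```python
def span_nt(start: int, end: int) -> int:
    """Length of a segment when both end positions are counted (assumption)."""
    return end - start + 1


# Group III size range stated in the text: 95-110 nt.
GROUP_III_SIZES = range(95, 111)

RPL16_INTRON_3_NT = 208             # total size given in the text
inner = span_nt(661, 762)           # candidate internal intron in rpl16 intron 3
outer = RPL16_INTRON_3_NT - inner   # what would remain once the inner intron is removed

spacer_candidate = span_nt(1759, 1863)  # candidate in the rpl5-rps8 spacer

print("inner:", inner, "in range:", inner in GROUP_III_SIZES)
print("outer:", outer, "in range:", outer in GROUP_III_SIZES)
print("spacer candidate:", spacer_candidate)
```

Under this counting assumption, both the inner segment and the remainder of rpl16 intron 3 fall inside the group III size range, which is what the intron-within-an-intron interpretation would require.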
Group II Introns in Ribosomal Protein Genes
The second class of introns of Euglena chloroplast ribosomal protein genes are group II
introns. Examples are rpl16 intron 2 (356 nt), rps8 introns 1 (327 nt) and 3 (277 nt),
and the previously described rps3 intron 1 (409 nt) (18). They are on average smaller
than the group II introns in the liverwort (315-2111 nt) (3) and tobacco (503-2526 nt)
(2) chloroplast genomes, and smaller than the introns from light-induced Euglena chloroplast
genes (326-1600 nt) (40). The distinguishing features of Euglena chloroplast group II
introns are the following: (i) They have classical group II 5'- and 3'-boundary sequences
(20,47), as initially identified for chloroplast group II introns in the Euglena rbcL locus
(52). As shown in Fig. 8b, the ribosomal protein group II introns, with the possible exception
of rps3 intron 1 (discussed below), follow this pattern. (ii) They have domain V and domain
VI secondary structure features characteristic of all group II introns (20). As shown in
Fig. 9, a short stretch of nucleotides in the domain V stem is conserved with those in
29 other examples (51). In domain VI (Fig. 9), the conserved A residue that is located
8 nt upstream from the 3'-splice site is in an unpaired position, also characteristic of all
group II introns. For self-splicing introns (48,53), the A residue serves as the branch point
in the formation of a lariat intermediate.
Figure 9. RNA secondary structural models proposed for the 3'-ends of a) rpl16 intron 2, b) rps8 intron 1,
c) rps8 intron 3, and d) rps3 intron 1. Structures labeled V and VI resemble group II intron domains five and
six (20). The arrow points to the 3'-splice junction. The asterisk (*) designates the conserved bulge A residue.
The brackets delimit a base-paired region of domain five that resembles a similar conserved region of group
II introns.
The 409-nt rps3 intron 1 (18) has two unusual features. The 5'- and 3'-boundary
sequences are more characteristic of group III than of group II introns (Fig. 8). In addition,
the locations of domains V and VI are more consistent with a splice boundary 20 nt upstream
from the beginning of the second exon (Fig. 9d) than at the beginning of the exon. There
is a good group II-like boundary sequence of 5'-GTGCGATACTAT located 79 nt from
the upstream exon (see Fig. 2, ref. 18). Therefore we are considering the possibility that
rps3 intron 1 might be a 99-nt group III intron with an internal 300-nt group II intron.
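The distinction drawn here between a classical group II 5'-boundary and a degenerate, group III-like one can be sketched as a pattern test. The Python example below is an illustration under simple assumptions: it looks only at the first five nucleotides, using the 5'-GTGYG consensus and the 5'-NTNNG pattern given in the text, and the function name `classify_five_prime` is hypothetical. The two test sequences are the 12-nt boundary sequences quoted in this section.

```python
import re

GROUP_II_5P = re.compile(r"^GTG[CT]G")  # 5'-GTGYG, Y = pyrimidine (C or T)
GROUP_III_5P = re.compile(r"^.T..G")    # 5'-NTNNG, degenerate version


def classify_five_prime(boundary: str) -> str:
    """Label a 5' intron boundary by the most specific pattern it matches."""
    seq = boundary.upper().replace("U", "T")
    if GROUP_II_5P.match(seq):
        return "group II consensus"
    if GROUP_III_5P.match(seq):
        return "degenerate (group III-like)"
    return "neither"


# Internal boundary within rps3 intron 1, quoted in the text.
print(classify_five_prime("GTGCGATACTAT"))
# Candidate internal boundary within rpl16 intron 3, quoted in the text.
print(classify_five_prime("TTGTGTATTTCT"))
```

Because every GTGYG sequence also satisfies NTNNG, the group II test is applied first; a 5'-boundary alone cannot assign an intron to either category without the size and secondary structure evidence discussed in the text.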
The novel feature of many Euglena chloroplast group II introns is that structures
resembling group II intron domains I to IV are often either very weak or absent. Michel et al.
(51) have suggested that Euglena group II-like introns may have lost domains I to IV and
possess variable versions of domains V and VI that are heterogeneous in size and
base-pairing. In general, we find that group II ribosomal protein gene introns have fewer elements
characteristic of domains I to IV than their counterparts in genes for photosynthesis-related
polypeptides such as psbA and rbcL and group II introns from other organelle DNAs.
For example, the self-splicing mitochondrial group II introns have two sites of
complementarity between domain I of the intron (exon binding sites 1 and 2) and the 5'-exon
(intron binding sites 1 and 2) that are required for splicing (47). By contrast, the short
exons of Euglena rps8 (exon one, 15 bp and exon two, 8 bp) do not possess any
complementary bases with the flanking introns. The exons for rpl14 and rpl16 also lack
intron binding sites. We suggest that the intron-exon recognition of the type reported for
self-splicing group II introns does not occur for these Euglena chloroplast introns. We
propose that there is an evolutionary continuum of intron structural variations among
chloroplast and mitochondrial group II-like introns that has the following order: (a) group
II (self-splicing); (b) group II non-self splicing, but with all 6 domains as defined in (20);
(c) group II non-self splicing, with some domains absent; (d) group III. The latter two
categories have to date only been found in Euglena chloroplast DNA. Small introns of
100-110 nt with different splice boundaries and G/C content with respect to the Euglena
introns have been described for some plant nuclear genes (54). Group III introns in turn
may be the closest relatives among organelle introns to nuclear introns, especially those
of higher plants.
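The proposed continuum (a) to (d) can be restated as a small decision procedure. The Python sketch below is a deliberate simplification and not a classification method from this work: it uses only two inputs, whether the intron self-splices and which of the six group II domains are recognizable, and it treats an intron with no recognizable domains as group III. Size and boundary sequences, which also define group III in the text, are ignored. All names (`IntronCategory`, `IntronFeatures`, `place_on_continuum`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class IntronCategory(Enum):
    GROUP_II_SELF_SPLICING = "a"
    GROUP_II_ALL_DOMAINS = "b"
    GROUP_II_SOME_DOMAINS_ABSENT = "c"
    GROUP_III = "d"


@dataclass(frozen=True)
class IntronFeatures:
    self_splicing: bool
    domains_present: frozenset  # recognizable group II domains, subset of 1..6


ALL_DOMAINS = frozenset(range(1, 7))


def place_on_continuum(features: IntronFeatures) -> IntronCategory:
    """Map the two simplified features onto categories (a)-(d)."""
    if features.self_splicing:
        return IntronCategory.GROUP_II_SELF_SPLICING
    if features.domains_present == ALL_DOMAINS:
        return IntronCategory.GROUP_II_ALL_DOMAINS
    if features.domains_present:
        return IntronCategory.GROUP_II_SOME_DOMAINS_ABSENT
    # Simplifying assumption: no recognizable domains is treated as group III.
    return IntronCategory.GROUP_III


# An intron retaining only domains V and VI, as described for the Euglena
# ribosomal protein group II introns.
reduced = IntronFeatures(self_splicing=False, domains_present=frozenset({5, 6}))
print(place_on_continuum(reduced))
```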
ACKNOWLEDGEMENTS
We wish to thank Ms. Cathy Radebaugh and Ms. Gloria Yepiz-Plascencia for helpful
discussions during the course of the experiments, and Ms. Jane Dugas Huff for her expert
typing of this manuscript. This work was supported by a grant to RBH from NIH.
*To whom correspondence should be addressed
REFERENCES
1. Eneas-Filho,J., Hartley,M.R. and Mache,R. (1981) Mol. Gen. Genet. 184:484-488.
2. Shinozaki,K., Ohme,M., Tanaka,M., Wakasugi,T., Hayashida,N., Matsubayashi,T., Zaita,N.,
Chunwongse,J., Obokata,J., Yamaguchi-Shinozaki,K., Ohto,C., Torazawa,K., Meng,B.Y., Sugita,M., Deno,H.,
Kamogashira,T., Yamada,K., Kusuda,J., Takaiwa,F., Kato,A., Tohdoh,H., Shimada,H. and Sugiura,M.
(1986) EMBO J. 5:2043-2049.
3. Ohyama,K., Fukuzawa,H., Kohchi,T., Shirai,H., Sano,T., Sano,S., Umesono,K., Shiki,Y., Takeuchi,M.,
Chang,Z., Aota,S., Inokuchi,H. and Ozeki,H. (1986) Nature 322:572-574.
4. Schmidt,R.J., Hosler,J.P., Gillham,N.W., Boynton,J.E. (1984) J. Cell. Biol. 98:2011-2018.
5. Gantt,J.S. and Key,J.L. (1986) Mol. Gen. Genet. 202:186-193.
6. Lindahl,L. and Lindahl,J.M. (1986) Ann. Rev. Genet. 20:297-326.
7. Posno,M., Vliet,A.V. and Groot,G.S.P. (1986) Nucleic Acids Res. 14:3181-3195.
8. Deng,X.W. and Gruissem,W. (1987) Cell 49:379-387.
9. Muller,G.S., Hallick,R.B., Alt,J., Westhoff,P. and Herrmann,R. (1986) Nucleic Acids Res. 14:1029-1044.
10. Koller,B., Fromm,H., Galun,E. and Edelman,M. (1987) Cell 48:111-119.
11. Hildebrand,M., Hallick,R.B., Passavant,C.W. and Bourque,D.P. (1988) Proc. Natl. Acad. Sci. USA 85:372-376.
12. Kohchi,T., Umesono,K., Yutaka,O., Komine,Y., Nakahigashi,K., Komano,T., Yamada,Y., Ozeki,H. and
Ohyama,K. (1988) Nucleic Acids Res. 16:10025-10036.
13. Markmann-Mulisch,U., Knoblauch,K., Lehmann,A. and Subramanian,A.R. (1987) Biochem. Internat.
15:1057-1067.
14. Thomas,F., Massenet,O., Dorne,A.M., Briat,J.F. and Mache,R. (1988) Nucleic Acids Res. 16:2461-2472.
15. Zhou,D., Quigley,F., Massenet,O. and Mache,M. (1989) Mol. Gen. Genet. 216:439-445.
16. Montandon,P. and Stutz,E. (1984) Nucleic Acids Res. 12:2851-2859.
17. Manzara,T. and Hallick,R.B. (1987) Nucleic Acids Res. 15:3927.
18. Christopher,D.A., Cushman,J.C., Price,C.A. and Hallick,R.B. (1988) Curr. Genet. 14:275-286.
19. Nickoloff,J.A., Christopher,D.A., Drager,R.G. and Hallick,R.B. (1989) Nucleic Acids Res. 17:(in press).
20. Michel,F. and Dujon,B. (1983) EMBO J. 2:33-38.
21. Hallick,R.B., Richards,O.C. and Gray,P.W. (1982) In: Edelman,M., Hallick,R.B., Chua,N-H., eds. Methods
in Chloroplast Molecular Biology. Elsevier Biomedical, New York, pp. 281-294.
22. Hallick,R.B., Rushlow,K.E. and Bingham,S.C. (1982) In: Edelman,M., Hallick,R.B., Chua,N-H., eds.
Methods in Chloroplast Molecular Biology. Elsevier Biomedical, New York, pp. 315-332.
23. Maniatis,T., Fritsch,E.F. and Sambrook,J. (1982) Molecular Cloning: a laboratory manual. Cold Spring
Harbor Laboratory, Cold Spring Harbor, New York.
24. Hallick,R.B. and Buetow,D.E. (1989) In: Buetow,D.E., ed. The Biology of Euglena, Vol. IV, Academic
Press, Inc., New York, pp. 351-414.
25. Henikoff,S. (1984) Gene 28:351-359.
26. Vieira,J. and Messing,J. (1987) Method. Enzymol. 153:3-11.
27. Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA 74:5463-5467.
28. Mount,D.W. and Conrad,B. (1986) Nucleic Acids Res. 14:443-454.
29. Lipman,D.J. and Pearson,W.R. (1985) Science 227:1435-1441.
30. Feng,D.F. and Doolittle,R.F. (1987) J. Mol. Evol. 25:351-360.
31. Fourney,R.M., Miyakoshi,J., Day III,R.S. and Paterson,M.C. (1987) Focus 10:5-7.
32. Melton,D.A., Krieg,P.A., Rebagliati,M.R., Maniatis,T., Zinn,K. and Green,M.R. (1984) Nucleic Acids
Res. 12:7035-7056.
33. Tanaka,M., Wakasugi,T., Sugita,M., Shinozaki,K. and Sugiura,M. (1986) Proc. Natl. Acad. Sci. USA
83:6030-6034.
34. Ceretti,D.P., Dean,D., Davis,G.R., Bedwell,D.M. and Nomura,M. (1983) Nucleic Acids Res. 11:2599-2616.
35. Ohkubo,S., Muto,A., Kawauchi,Y., Yamao,F. and Osawa,S. (1987) Mol. Gen. Genet. 210:314-322.
36. Montandon,P.E., Knuchel-Aegerter,C. and Stutz,E. (1987) Nucleic Acids Res. 15:7809-7822.
37. Watson,J.C. and Surzycki,S.J. (1982) Proc. Natl. Acad. Sci. U.S.A. 79:2264-2267.
38. Markmann-Mulisch,U. and Subramanian,A.R. (1988) Eur. J. Biochem. 170:507-514.
39. Lou,J.K., Wu,M., Chang,C.H. and Cuticchia,A.J. (1987) Curr. Genet. 11:537-541.
40. Koller,B. and Delius,H. (1984) Cell 36:613-622.
41. Hallick,R.B., Hollingsworth,M.J. and Nickoloff,J.A. (1984) Plant Molec. Biol. 3:169-175.
42. Planta,R.J., Mager,W.H., Leer,R.J., Woudt,L.P., Raue,H.A. and El-Baradi,T.T.A.L. (1986) In: Hardesty,B.
and Kramer,G. (eds.) Structure, Function and Genetics of Ribosomes, Springer-Verlag, New York, pp.
699-718.
43. Mullet,J.E., Orozco,E.M. and Chua,N.H. (1985) Plant Mol. Biol. 4:39-54.
44. Gruissem,W. and Zurawski,G. (1985) EMBO J. 4:3375-3383.
45. Mattheakis,L.C. and Nomura,M. (1988) J. Bacteriol. 170:4484-4492.
46. Hallick,R.B. and Bottomley,W. (1983) Plant Molec. Biol. Report. 1:38-43.
47. Jacquier,A. and Michel,F. (1987) Cell 50:17-29.
48. Schmelzer,C. and Muller,M.W. (1987) Cell 51:753-762.
49. Gingrich,J.C. and Hallick,R.B. (1985) J. Biol. Chem. 260:16156-16161.
50. Cushman,J.C., Hallick,R.B. and Price,C.A. (1988) Curr. Genet. 13:159-171.
51. Michel,F., Umesono,K. and Ozeki,H. (1989) Gene, in press.
52. Koller,B., Gingrich,J.C., Stiegler,G.L., Farley,M.A., Delius,H. and Hallick,R.B. (1984) Cell 36:545-553.
53. Jarrell,K.A., Dietrich,R.C. and Perlman,P.S. (1988) Molec. Cell Biol. 8:2361-2366.
54. Sugita,M., Manzara,T., Pichersky,E., Cashmore,A. and Gruissem,W. (1987) Mol. Gen. Genet. 209:247-256.
This article, submitted on disc, has been automatically
converted into this typeset format by the publisher.