Analysis of Euglena gracilis Plastid-Targeted Proteins

EUKARYOTIC CELL, Dec. 2006, p. 2079–2091
1535-9778/06/$08.00+0 doi:10.1128/EC.00222-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Vol. 5, No. 12
Analysis of Euglena gracilis Plastid-Targeted Proteins Reveals
Different Classes of Transit Sequences▿
Dion G. Durnford¹* and Michael W. Gray²
Department of Biology, University of New Brunswick, Fredericton, New Brunswick, Canada E3B 5A3,¹ and Department of
Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1X5²
Received 12 July 2006/Accepted 15 September 2006
The plastid of Euglena gracilis was acquired secondarily through an endosymbiotic event with a eukaryotic
green alga, and as a result, it is surrounded by a third membrane. This membrane complexity raises the
question of how the plastid proteins are targeted to and imported into the organelle. To further explore plastid
protein targeting in Euglena, we screened a total of 9,461 expressed sequence tag (EST) clusters (derived from
19,013 individual ESTs) for full-length proteins that are plastid localized to characterize their targeting
sequences and to infer potential modes of translocation. Of the 117 proteins identified as being potentially
plastid localized whose N-terminal targeting sequences could be inferred, 83 were unique and could be
classified into two major groups. Class I proteins have tripartite targeting sequences, comprising (in order) an
N-terminal signal sequence, a plastid transit peptide domain, and a predicted stop-transfer sequence. Within
this class of proteins are the lumen-targeted proteins (class IB), which have an additional hydrophobic domain
similar to a signal sequence and required for further targeting across the thylakoid membrane. Class II
proteins lack the putative stop-transfer sequence and possess only a signal sequence at the N terminus,
followed by what, in amino acid composition, resembles a plastid transit peptide. Unexpectedly, a few unrelated
plastid-targeted proteins exhibit highly similar transit sequences, implying either a recent swapping of these
domains or a conserved function. This work represents the most comprehensive description to date of transit
peptides in Euglena and hints at the complex routes of plastid targeting that must exist in this organism.
* Corresponding author. Mailing address: Department of Biology,
University of New Brunswick, Fredericton, New Brunswick, Canada
E3B 5A3. Phone: (506) 452-6207. Fax: (453) 453-3583. E-mail:
[email protected].
▿ Published ahead of print on 22 September 2006.
A fundamental problem in cell biology is the precise and
efficient targeting of proteins synthesized by cytoplasmic ribosomes to their appropriate intracellular locations. Proteins destined for the endomembrane system, mitochondria, or the
chloroplast usually have specific N-terminal targeting domains
that are required for proper subcellular localization. These
leader sequences are often removed by specific proteases at the
protein’s destination prior to it assuming its active conformation. For chloroplast-targeted proteins in plants and algae, an
N-terminal transit peptide (TP) is both necessary and sufficient
for correct plastid targeting (11). Transit peptides are not conserved in sequence but exhibit characteristic biochemical properties, such as an elevated content of the hydroxylated amino
acids serine and threonine as well as a deficiency of acidic
(aspartate and glutamate) amino acids (76). Within a typical
chloroplast, there are six distinct locations to which the constituent proteins must be sorted, and some proteins have to
cross up to three membranes (33). This complexity requires
additional targeting information within the transit peptide,
such as the signal sequence-like domain found in proteins
targeted to the thylakoid membrane, or information contained
within the mature portion of the protein itself (62).
Plants, green algae, and red algae have plastids derived from
an endosymbiotic cyanobacterium, with two membranes enveloping the chloroplast (34). Protein targeting to these plastids is
fairly well understood; generally, the transit peptides direct
newly synthesized proteins to the outer envelope membrane,
where they interact with receptors and other components of
the translocation apparatus so that protein import and subsequent sorting can take place (33). Many of the translocation
components present in the outer and inner envelopes have
been identified in plants (31).
Many protists, however, possess secondary plastids that are
believed to have arisen from endosymbiosis with a eukaryotic
alga. These organisms have complex plastids with either three
membranes around the chloroplast, as occurs in the dinoflagellates and Euglena spp., or four membranes, as in the stramenopiles and haptophytes (34). The presence of additional
membranes surrounding the plastid would seem to necessitate
additional targeting information, complicating the process of
translocation. We know, for example, that during the evolution
of secondary plastids, genes from the endosymbiont were functionally transferred to the host’s nuclear genome. These genes
must then be expressed and their protein products targeted
back to the organelle, and this process is undoubtedly more
complicated than that in the case of primary plastids. A significant hurdle in this pathway is the necessity to acquire appropriate targeting information that allows nucleus-encoded proteins to be directed to the plastid and to traverse additional
membranes in the process. Understanding the mechanism of
targeting and translocation in organisms with complex plastids
has been key to understanding how the transition from algal
symbiont to plastid occurred (12, 35, 47, 50, 63, 74).
In protists with four membranes around the plastid, the
outermost membrane often has ribosomes attached and is
typically continuous with the endoplasmic reticulum (ER)
(23). Proteins directed to these plastids possess bipartite
targeting sequences, with an N-terminal signal sequence
(24) that directs them to the chloroplast ER, where they are
cotranslationally imported across the first membrane (4, 7,
30). The domain after the signal sequence is the predicted
transit peptide for transport across the inner two membranes, in a process likely to resemble translocation across
plant chloroplast envelopes (43).
The euglenophytes and dinoflagellates have plastids with
three membranes, the outermost of which lacks bound ribosomes. In both cases, plastid proteins are targeted through the
endomembrane system (49, 53, 67, 70, 71). From studies of
several complete, publicly available Euglena gracilis plastid
protein sequences (13, 25, 27, 28, 38, 44, 52, 56, 61, 64, 66, 73),
it was predicted that the plastid proteins have an N-terminal
signal sequence, an inference that was confirmed by both in
vitro (38) and in vivo (70, 71) experimental approaches. Following the signal sequence is the predicted transit peptide,
which is sufficient for translocation across plant chloroplast
membranes (29), and a hydrophobic region that acts as a “stop-transfer” sequence to prevent complete transport into the ER,
such that the mature protein remains in the cytoplasm (69).
The protein is then targeted to the plastid, likely via a vesicular
transport system (67). Also described for Euglena are tripartite
transit sequences that possess an additional hydrophobic domain predicted to target proteins to the thylakoid lumen (73).
Because relatively few Euglena plastid protein sequences are
publicly available, the study we report here more comprehensively examines the characteristics of plastid-targeting sequences. Since many of the known Euglena proteins, including all
of those for which biochemical analyses of targeting have been
conducted, are encoded as polyproteins, we sought to determine
whether all plastid proteins are likely to proceed to the plastid via
a similar pathway in this organism. By examining the targeting
sequences of a large number of plastid proteins, the majority of
which are not organized as polyproteins, we have been able to
define the characteristics that can be used to identify Euglena
plastid-targeted proteins with high confidence and to infer
modes of transport to the plastid.
MATERIALS AND METHODS
E. gracilis strain Z was cultured under several different conditions, and cDNA
libraries were produced commercially in the pcDNA3.1(+) vector (DNA Technologies Inc.). Expressed sequence tag (EST) sequencing was performed at the
Atlantic Genome Centre (Halifax, Nova Scotia, Canada) and the B.C. Cancer
Agency (Vancouver, British Columbia, Canada). A total of 19,013 ESTs were
retained following quality and vector trimming via the taxonomically broad EST
database (TBestDB [http://tbestdb.bcm.umontreal.ca/searches/login.php]), under the auspices of the Protist EST Program. The ESTs were clustered to form
a total of 9,461 unique groups.
To search for plastid-targeted proteins, the 9,461 clusters were translated in
three reading frames (plus orientation), and the longest open reading frame (ORF) of >19
amino acids starting with a methionine was retained for further analyses (http://maven.smith.edu/~vvouille/sumCGI/translator.html). Screening for plastid-targeted proteins was carried out in several rounds. First, all ORFs were screened
for the presence of a signal sequence using the program SignalP3 (6, 51; http://www.cbs.dtu.dk/services/SignalP/). Any ORFs with a signal sequence predicted
with the hidden Markov model (HMM) or the artificial neural network (NN)
were retained. All selected ORFs were then rescreened, and those having a clear
role in plastid function and/or those whose top BLASTnr hit was plant, algal, or
cyanobacterial in origin were segregated for further consideration. Finally, the
putative plastid-targeted proteins were screened further according to the following criteria: (i) the top BLAST hit (NCBI nonredundant database) was plant/
algal or cyanobacterial and/or the protein has a clear role in plastid function, and
(ii) the BLASTp E value was ≤1e-05. The ORF was considered to possess a
complete transit sequence when (i) there was evidence for a spliced leader
sequence (TTTTTTTCG) at the 5′ end of the cDNA that would indicate that the
cDNA was full length (72), (ii) there was an extension of the ORF toward the N
terminus upstream of the first region of evident amino acid sequence similarity
following a BLASTp search, and (iii) the beginning of the mature protein was
identified by comparison with orthologous proteins.
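The first screening step can be sketched roughly as follows. This is an illustrative reimplementation, not the online translator cited above: the function names, the `min_len` default (taken from the ">19 amino acids" cutoff), and the 40-nucleotide search window for the spliced-leader motif are assumptions made for the example. ORFs that run to the end of the EST without a stop codon are kept, since ESTs are often 3′-incomplete.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def longest_met_orf(seq, min_len=20):
    """Longest Met-initiated ORF over the three forward frames, or None."""
    best = ""
    for frame in range(3):
        # Stop codons ('*') split each frame into candidate segments.
        for segment in translate(seq, frame).split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best if len(best) >= min_len else None


def has_spliced_leader(seq, motif="TTTTTTTCG", window=40):
    """Check for the spliced-leader motif near the 5' end (window is an assumption)."""
    return motif in seq.upper()[:window]
```

A cluster consensus would be passed to `longest_met_orf`, and the resulting protein handed on to the signal-sequence and BLAST screens described above.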
Potential membrane-spanning regions were identified using the hidden
Markov model-based program TMHMM (39; http://www.cbs.dtu.dk/services/TMHMM/). Hydrophobicity plots were generated using the ProtScale program
at the ExPASy site (http://www.expasy.org/tools/protscale.html), using a Kyte-Doolittle scale with a sliding window length of 7 or 19 residues, as indicated.
The amino acid content of peptides was calculated using the PEPSTATS program in the EMBOSS package, available at AnaBench (http://anabench.bcm.umontreal.ca/anabench/Anabench-Jsp/Welcome.jsp). Sequence logo displays were
generated using the online program WebLogo (weblogo.berkeley.edu/logo.cgi).
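For readers who want to reproduce a hydropathy profile locally, a minimal sliding-window sketch is given below. It uses the published Kyte-Doolittle hydropathy values with an unweighted window, which is assumed here to approximate the ProtScale defaults; the function name and the convention of reporting the 1-based centre residue are illustrative choices, and residues outside the 20 standard amino acids are not handled.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return (centre_position, mean_score) for every full window.

    Positions are 1-based; the window length is assumed to be odd.
    """
    scores = [KD[aa] for aa in protein.upper()]
    half = window // 2
    profile = []
    for centre in range(half, len(scores) - half):
        win = scores[centre - half:centre + half + 1]
        profile.append((centre + 1, sum(win) / window))
    return profile
```

Stretches with positive mean scores over a 19-residue window are the candidates for membrane-spanning regions; the 7-residue window gives a noisier, finer-grained view.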
Nucleotide sequence accession numbers. All individual EST sequences have
been deposited in the NCBI dbEST database under accession numbers
EG565093 to EG565263.
RESULTS
From 9,461 individual Euglena EST clusters, a total of 117
full-length plastid proteins were identified. Eliminating nearly
identical isoforms from the data set left a total of 83 unique
proteins for further analysis (Table 1). In addition to functioning in basic photosynthetic reactions, the proteins identified
had predicted roles in the biosynthesis of proteins, lipids, carotenoids, and chlorophyll. Proteins involved in signal transduction and plastid metabolism were also found. Through determination of the N-terminal-most regions of sequence similarity
in BLASTp searches, targeting domains were delineated and
found to be very long, with an average size (± standard deviation) of 152 ± 25 residues (Table 1). The shortest estimated
presequence was 95 residues, for Rubisco activase, and the
longest was 211 (Albino3) (Table 1).
Of the few Euglena plastid proteins examined to date, all
possess an N-terminal region similar to a eukaryotic signal
sequence. Thus, the first strategy for identifying plastid-targeted proteins was to search for the presence of such a sequence, using SignalP3. Of the final group of 83 plastid-targeted proteins examined using the SignalP hidden Markov
model, 68% were predicted to possess a signal sequence. This
value dropped to 56% when the artificial NN was employed. In
cases where SignalP did not predict a signal peptide but other
screens indicated a potential plastid-targeted protein, there
was nevertheless a clear hydrophobic region characteristic of a
signal peptide. Based on the NN predictions for the signal
sequence cleavage sites, the Euglena signal sequence was estimated to be 33 ± 9 residues long (range, 18 to 59 residues).
The predicted cleavage site was consistent with that in other
eukaryotic signal sequences (Fig. 1C).
E. gracilis plastid-targeting sequences can be divided into
two classes. Class I plastid-targeting sequences are designated by analogy to a similar type of targeting domain identified in dinoflagellates (55). This class encompassed 89% of
the Euglena proteins examined, which were characterized by
the presence of two hydrophobic regions that are predicted
by the TMHMM program to be transmembrane helices
(TMH) (Fig. 1A). Figure 1A shows the average TMHMM
probability for the class I plastid-targeting regions, with the
first predicted transmembrane helix (TMH1) corresponding to
the hydrophobic domain of a classic signal sequence (75). A
basic amino acid precedes the first TMH in all but six proteins,
with the average charge of this N-terminal region being +1.6.
TABLE 1. EST clusters
Cluster ID (a) | Annotation | Organism with top BLASTp hit | BLASTp score | Presequence length (aa) | TP length (aa) (b) | TMH1 position (c) | TMH2 position (c) | Id (d)
Class IA proteins
0726 | Ferredoxin | Arabidopsis thaliana | 3e-06 | 148 | 60 | 7–29 | 89–111 | 2
0899 | 50S ribosomal protein L3 | Cyanophora paradoxa | 3e-12 | 168 | 52 | 13–35 | 87–109 | 2
1043 | Putative ferredoxin | Arabidopsis thaliana | 4e-09 | 144 | 65 | 7–29 | 94–113 | 1
1116 | 30S ribosomal protein S20 | Synechococcus elongatus | 4e-08 | 170 | 64 | 13–35 | 99–121 | 2
1127 | Zeta-carotene desaturase | Oryza sativa | 1e-22 | 182 | 55 | 21–43 | 98–121 | 2
1204 | Uroporphyrinogen decarboxylase | Anopheles gambiae | 2e-31 | 149 | 60 | 21–38 | 98–120 | 2
1312 | Putative ferredoxin | Arabidopsis thaliana | 3e-09 | 143 | 58 | 11–33 | 91–110 | 1
1428 | RubisCO small subunit | Euglena gracilis | e-118 | 120 | 56 | 4–26 | 82–104 | 2
1495 | Glutaredoxin 2 | Actinobacillus actinomycetemcomitans | 6e-10 | 130 | 55 | 7–24 | 79–101 | 2
1503 | Membrane-associated 30-kDa protein | Pisum sativum | 2e-10 | 168 | 75 | 7–29 | 104–126 | 1
1573 | Putative ferredoxin | Arabidopsis thaliana | 6e-09 | 147 | 61 | 13–35 | 96–118 | 1
1674 | Sugar nucleotide phosphorylase | Arabidopsis thaliana | 7e-12 | 168 | 66 | 21–43 | 109–131 | 1
1706 | 50S ribosomal protein L34 | Arabidopsis thaliana | 7e-06 | 190 | 63 | 13–35 | 98–117 | 1
2042 | Peptidyl-prolyl cis-trans isomerase | Oryza sativa | 6e-07 | 178 | 67 | 7–26 | 93–117 | 1
2448 | Ferredoxin-like protein | Rhizobium loti | 8e-05 | 120 | 58 | 15–37 | 95–117 | 1
2566 | Ribose-5-phosphate isomerase | Spinacia oleracea | 2e-24 | 144 | 61 | 20–42 | 103–125 | 1
2596 | Ycf53 (tetrapyrrole-binding protein) | Synechococcus elongatus | 2e-12 | 182 | 58 | 12–34 | 92–114 | 1
2669 | 50S ribosomal protein L11 | Odontella sinensis | 3e-12 | 186 | 55 | 19–41 | 96–118 | 2
2795 | D-Ribulose-5-phosphate 3-epimerase | Arabidopsis thaliana | 1e-24 | 150 | 58 | 21–40 | 98–120 | 2
2990 | Albino 3 | Bigelowiella natans | 5e-37 | 211 | 77 | 29–51 | 128–150 | 1
3121 | Chaperonin PSII quinone-binding protein | Arabidopsis thaliana | 8e-30 | 189 | 94 | 3–25 | 119–136 | 1
3164 | Rhodanese domain-containing protein | Oryza sativa | 6e-08 | 133 | 58 | 5–24 | 82–104 | 1
3171 | Photosystem II 22-kDa protein | Arabidopsis thaliana | 3e-17 | 152 | 67 | 21–40 | 107–126 | 1
3330 | Coproporphyrinogen III oxidase | Chlamydomonas reinhardtii | e-108 | 156 | 53 | 22–44 | 97–119 | 2
3362 | ATP synthase delta chain | Nicotiana tabacum | 7e-22 | 147 | 57 | 15–37 | 94–116 | 2
3372 | 50S ribosomal protein L15 | Bigelowiella natans | 1e-14 | 191 | 54 | 24–46 | 100–119 | 1
3375 | Light-regulated Chlp-localized protein | Solanum tuberosum | 4e-20 | 120 | 60 | 12–31 | 91–110 | 1
3383 | ATP synthase gamma chain | Odontella sinensis | 7e-78 | 137 | 60 | 13–35 | 95–113 | 3
3449 | Cytochrome f | Euglena gracilis | 4e-91 | 147 | 60 | 7–26 | 86–108 | 2
3469 | Porphobilinogen deaminase | Euglena gracilis | 0 | 151 | 56 | 17–39 | 95–112 | 2
3474 | Probable membrane-associated 30-kDa protein | Synechocystis sp. | 7e-49 | 151 | 63 | 7–26 | 89–111 | 1
3482 | Fructose-1,6-bisphosphatase | Bigelowiella natans | 2e-71 | 188 | 56 | 20–37 | 93–115 | 1
3500 | Glu 1-semialdehyde 2,1-aminomutase | Chlorarachnion sp. | e-148 | 138 | 53 | 7–29 | 82–99 | 2
3504 | Carbonic anhydrase | Deinococcus radiodurans | 7e-28 | 102 | 52 | 5–24 | 76–98 | 3
3558 | Carbonic anhydrase | Deinococcus radiodurans | 1e-10 | 140 | 62 | 13–35 | 97–119 | 1
3594 | 50S ribosomal protein L28 | Toxoplasma gondii | 4e-18 | 160 | 55 | 13–35 | 90–112 | 2
3603 | Peroxiredoxin precursor | Chlamydomonas reinhardtii | 8e-85 | 134 | 57 | 5–27 | 84–106 | 2
3619 | 50S ribosomal protein L21 | Thermoanaerobacter tengcongensis | 3e-11 | 168 | 52 | 13–35 | 87–109 | 1
3635 | Coproporphyrinogen III oxidase | Chlamydomonas reinhardtii | 1e-78 | 169 | 60 | 29–51 | 111–133 | 3
3653 | Delta 12 fatty acid desaturase | Phaeodactylum tricornutum | 2e-98 | 162 | 77 | 13–30 | 107–129 | 2
3673 | Carbonic anhydrase | Deinococcus radiodurans | 8e-30 | 179 | 68 | 13–32 | 100–122 | 4
3676 | 30S ribosomal protein S1 | Chlamydomonas reinhardtii | 7e-32 | 233 | 66 | 4–26 | 92–114 | 1
3817 | Acyl carrier protein | Synechocystis sp. | 6e-13 | 122 | 49 | 15–34 | 83–105 | 1
3830 | Ferredoxin | Euglena viridis | 1e-41 | 138 | 56 | 17–39 | 95–117 | 3
3881 | ATP/ADP transporter | Galdieria sulfuraria | 0 | 148 | 52 | 12–34 | 86–105 | 1
3900 | PsbM | Zea mays | 0.017 | 154 | 63 | 13–35 | 98–120 | 3
3911 | LHCI | Euglena gracilis | 5e-86 | 179 | 50 | 13–35 | 85–107 | 1
3934 | Ferredoxin-NADP+ reductase | Chlamydomonas reinhardtii | e-144 | 114 | 51 | 5–27 | 78–100 | 1
3943 | NADPH protochlorophyllide reductase | Chlorarachnion sp. | 5e-69 | 155 | 53 | 12–34 | 87–109 | 1
3946 | RuBisCO activase | Chlorococcum littorale | e-145 | 95 | 59 | 13–35 | 72–101 | 2
3996 | LHCI | Euglena gracilis | e-116 | 158 | 55 | 13–35 | 90–109 | 2
4008 | LHCI | Euglena gracilis | 0 | 141 | 51 | 13–35 | 86–108 | 1
4056 | CP29 | Oryza sativa | 7e-57 | 136 | 50 | 12–34 | 84–106 | 1
7084 | Chl. synthase 33-kDa subunit | Anabaena sp. | 9e-10 | 141 | 45 | 15–34 | 79–101 | 1
7147 | SOUL-heme-binding protein | Arabidopsis thaliana | 7e-20 | 136 | 68 | 5–22 | 90–112 | 1
7392 | ATP-dependent Clp protease | Vibrio cholerae | 1e-44 | 143 | 58 | 15–37 | 95–117 | 1
7739 | Ycf3 (PSI assembly) | Physcomitrella patens | 3e-27 | 162 | 64 | 20–42 | 106–128 | 1
7766 | RuBisCO 60-kDa chaperonin | Arabidopsis thaliana | 5e-37 | 122 | 67 | 7–29 | 96–118 | 1
8108 | YebC-related protein | Arabidopsis thaliana | 7e-17 | 108 | 57 | 7–29 | 86–108 | 1
8254 | Chlorophyll b synthase | Dunaliella salina | 8e-18 | 114 | 70 | 6–22 | 92–111 | 1
8643 | Uroporphyrinogen decarboxylase | Ashbya gossypii | 1e-17 | 164 | 63 | 17–39 | 102–124 | 1
8888 | 3-Isopropylmalate dehydrogenase | Bifidobacterium longum | 7e-13 | 150 | 71 | 13–35 | 106–128 | 1
9366 | Photosystem II family protein | Arabidopsis thaliana | 4e-11 | 137 | 61 | 13–35 | 96–118 | 1
Class IB proteins
3955 | Oxygen evolving enhancer (OEE1) | Euglena gracilis | e-116 | 142 | 53 | 5–27 | 80–99 | 3
4026 | Oxygen evolving enhancer (OEE2) | Lycopersicon esculentum | 6e-17 | 153 | 49 | 20–42 | 91–113 | 2
3381 | HCF136 (PSII stability factor) | Arabidopsis thaliana | 5e-07 | 142 | 52 | 7–29 | 81–103 | 1
3249 | Putative ascorbate peroxidase | Lycopersicon esculentum | 2e-04 | 184 | 70 | 13–35 | 105–127 | 1
3902 | Cytochrome c6 | Euglena gracilis | 4e-69 | 123 | 60 | 29–51 | 111–133 | 2
3752 | PSI subunit III (PsaF) | Chlamydomonas reinhardtii | 6e-53 | 144 | 60 | 13–35 | 95–114 | 2
2674 | Thylakoid luminal 17.4-kDa protein | Arabidopsis thaliana | 5e-22 | 171 | 71 | 15–37 | 108–127 | 1
Class II proteins
3630 | Photosystem II (PsbW) | Chlorarachnion sp. | 4e-15 | 82 | 52 | 20–37 | | 3
3294 | ABC transporter (cytochrome c biogenesis) | Nostoc punctiforme | 5e-33 | 175 | 135 | 34–53 | | 1
0923 | PEP/phosphate translocator | Phaeodactylum tricornutum | 4e-10 | 166 | 132 | 13–35 | | 1
4012 | Oxygen evolving enhancer (OEE3) | Chlamydomonas reinhardtii | 3e-22 | 61 | 36 | 13–35 | | 2
2060 | Mg-protoporphyrin IX methyltransferase | Synechococcus elongatus | 4e-17 | 66 | 40 | 5–27 | | 1
2416 | Peptide chain release factor (RF) 2 | Synechocystis sp. | 3e-42 | 99 | 70 | 13–35 | | 1
3797 | PSI subunit IV (PsaE) | Chlamydomonas reinhardtii | 6e-17 | 95 | 61 | 15–37 | | 3
4932 | 50S ribosomal protein L9 | Bigelowiella natans | 6e-05 | 62 | 39 | 15–33 | | 1
8550 | Short-chain (SC) dehydrogenase | Prochlorococcus marinus | 8e-07 | 120 | 82 | 29–51 | | 2
3784 | Phosphoribulokinase | Vaucheria litorea | 1e-76 | 100 | 75 | 20–42 | | 1
9282 | MECP synthase | Arabidopsis thaliana | 2e-36 | 121 | 80 | 28–50 | | 1
6808 | Squalene and phytoene synthases | Prochlorococcus marinus | 1e-27 | 98 | 47 | 35–52 | | 1
2660 | ClpB | Phaseolus lunatus | 8e-48 | 123 | 76 | 37–52 | | 1
(a) Original cluster IDs had “EEL0000” preceding the 4-digit numbers shown.
(b) For class I proteins, this is the region between the signal sequence and the stop-transfer region.
(c) TMH1 and TMH2 are the hydrophobic domains (range of amino acids is given from the start Met) of the signal sequence and stop-transfer sequence, respectively,
as predicted by the TMHMM program. Underlined regions indicate that the TMHMM program did not predict a TMH (TMHMM value, 0.1 < P < 0.9) but that a
hydrophobic patch is apparent from a Kyte-Doolittle analysis.
(d) Number of nearly identical isoforms detected.
In only one case is the N-terminal region negatively charged
(Table 1, cluster 3881 [ATP/ADP transporter]).
The location of the second TMH is remarkably consistent, at
60 ± 8 amino acids following the end of the first predicted
TMH, with a range of 45 to 94 amino acids. We designate this
localization the “60 ± 8 rule” (Fig. 1A). The properties of the
amino acids within the targeting regions of selected plastid-localized proteins are shown in Fig. 1B. In this figure, the
hydrophobic regions (gray) are obvious. The presence of the
two TMH motifs separated by 60 ± 8 amino acids had excellent discriminating power for identifying potential plastid-targeted proteins. For class I targeting sequences, the TMHMM
program was able to predict upwards of 95% of the plastid
proteins simply by searching for N-terminal regions with
TMHs according to the 60 ± 8 rule. If we combined the entire
set of predicted plastid proteins (all classes), the TMHMM
program would have an overall success rate of 82%. In cases
where the TMHMM probability did not meet the threshold for
formal TMH prediction (Table 1, underlined values), the probability of a TMH was usually between 0.3 and 0.9, and the
success rate would be very high if the threshold was reduced in
subsequent rounds of screening. Rescreening the entire population of ORFs using the 60 ± 8 rule detected all of the class
I proteins listed in Table 1, including isoforms, plus an additional 25 proteins classified as unknowns (data not shown). The
TP domains of dinoflagellates, whose plastid leader sequences
have a similar structure (49), are about half the size (25 ± 8
residues) (data not shown) of those of Euglena proteins.
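The spacing screen can be illustrated with a short sketch. The text does not state the exact tolerance applied when rescreening, so the default acceptance window below is simply the observed range of 45 to 94 residues and should be read as an assumption, as should the function names. TMH coordinates are (start, end) pairs counted from the start Met, as in Table 1, and the spacing is computed as the TMH2 start minus the TMH1 end, which reproduces the TP lengths listed there.

```python
def tp_spacing(tmh1, tmh2):
    """Residues between the end of TMH1 and the start of TMH2."""
    return tmh2[0] - tmh1[1]


def follows_spacing_rule(tmh1, tmh2, low=45, high=94):
    """True if the TMH1-TMH2 spacing falls inside the accepted window.

    The default window is the range reported for class I proteins;
    a tighter window (for example 60 +/- 2 SD) could be substituted.
    """
    return low <= tp_spacing(tmh1, tmh2) <= high


# Example coordinates taken from Table 1 (clusters 0726, 0899, and 3121).
examples = {
    "0726": ((7, 29), (89, 111)),
    "0899": ((13, 35), (87, 109)),
    "3121": ((3, 25), (119, 136)),
}
for cluster, (tmh1, tmh2) in examples.items():
    print(cluster, tp_spacing(tmh1, tmh2), follows_spacing_rule(tmh1, tmh2))
```

A candidate with only one predicted TMH, as in class II proteins, would not be evaluated by this check at all and would need to be handled separately.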
Class IB proteins (Table 1) also possess two predicted
TMHs separated by 60 ± 8 amino acids, but they have a third
hydrophobic domain with a mean distance of 17 residues
(range, 7 to 25 residues) downstream of the end of TMH2 (Fig.
2). This region resembles a prokaryotic signal sequence and
is postulated to function in the targeting of proteins to the
thylakoid lumen (73). We identified five proteins that are homologous
to thylakoid lumen-localized proteins and for which
biochemical evidence for this location exists (four of these class
IB proteins are shown in Fig. 2, along with a lumen-targeted
class II protein [see below]). Two additional proteins are predicted to function in the lumen, based on their annotation as
well as their possession of a putative lumen-targeting domain
(LTD). Three of the seven class IB proteins (ascorbate peroxidase, HCF136, and OEE2) contain a double Arg immediately
preceding the third hydrophobic domain (data not shown);
another two class IB proteins (PSI-III and cytochrome c6) have
the same motif within six amino acids of the start of the hydrophobic LTD, suggesting that the twin-arginine translocation (Tat) pathway (58) is functional in Euglena.
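A simple way to look for the twin-arginine signature relative to a hydrophobic domain is sketched below. The function name and the toy sequences are hypothetical and are not taken from the Euglena data; the only input assumed is the 1-based position at which the hydrophobic lumen-targeting domain starts (for example, from a hydropathy plot). A gap of 0 corresponds to a double Arg immediately preceding the domain, and a gap of 6 or less corresponds to the "within six amino acids" case.

```python
def twin_arg_gap(protein, domain_start):
    """Distance from the nearest upstream 'RR' to the hydrophobic domain.

    domain_start is the 1-based position of the first residue of the domain.
    Returns the number of residues between the RR pair and the domain,
    or None if no RR occurs upstream.
    """
    upstream = protein.upper()[:domain_start - 1]
    idx = upstream.rfind("RR")
    if idx == -1:
        return None
    return len(upstream) - (idx + 2)


# Toy sequences (hypothetical), hydrophobic stretch starting at position 7.
print(twin_arg_gap("MSTSRRAVLLAAALLA", 7))
print(twin_arg_gap("MRRSTSAVLLAAALLA", 7))
```

Only the nearest RR pair is reported, which is sufficient for the two cases distinguished in the text.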
Class II targeting sequences in Euglena represent a departure from the class I type in that they lack the second TMH
region upstream of the region specifying the mature protein,
and hence do not conform to the 60 ± 8 rule (Fig. 3). The
TMHMM probability scatter plot shows the presence of the
hydrophobic region associated with the signal sequence in all
class II proteins. This class represents 14% of the identified
population of plastid-targeted proteins. Of the 13 class II proteins delineated so far, 6 have unambiguous functions in the
plastid, while the others conceivably could be targeted elsewhere. However, they all possess signal sequence-like N termini, and their predicted functions are expected to occur
within the plastid. Each of these proteins is also related to
homologs from photosynthetic taxa, as gauged by the top
BLASTp hit (Table 1), supporting a putative plastid localization. The OEE3 protein, which is located within the thylakoid
lumen, exhibits a second hydrophobic region (Fig. 3, arrow)
that represents an LTD analogous to that found in class IB
targeting domains. The estimated presequence length is 100
amino acids. All of the class II sequences have a spliced leader
sequence, indicating that they are not class I sequences that
were artifactually truncated upstream of the stop-transfer domain.
FIG. 1. Characteristics of class I targeting sequences of Euglena. (A) Averaged TMHMM probabilities for 70 class I proteins identified in this
study. Because the region upstream of the first TMH is of variable length (range, 2 to 32 amino acids; mean, 12.7 ± 6.7 amino acids), the data
were normalized to a starting TMHMM probability of ≥0.1, which corresponds to the beginning of a predicted membrane-spanning region, and
then averaged. The error bars show 2 standard errors. Key features of a Euglena class I targeting sequence are depicted above the graph.
(B) Overview (McClade) of amino acid categories of the targeting sequences of selected plastid-targeted proteins. Colors represent different amino
acids, as follows: gray, hydrophobic and nonpolar (A, C, F, G, I, L, M, P, V, W, and Y); red, acidic (D and E); purple, basic (H, K, and R); yellow,
hydroxylated (S and T); and blue, polar (Q and N). (C) Sequence logo plot showing occurrence of amino acids around the signal sequence cleavage
site (arrow) predicted by SignalP (neural net). The y axis is displayed as bits, as described at weblogo.berkeley.edu/logo.cgi.
Plastid transit peptides of class I and II proteins. In plants,
targeting of proteins to the chloroplast is mediated by a transit
peptide (for a review, see reference 11). Although sequence
conservation per se is lacking, there is a general maintenance
of certain chemical properties, including enrichment for the
hydroxylated amino acids serine and threonine and a deficiency in acidic residues (76). In Euglena class I proteins, the
intervening region between the two TMHs likely functions as a
plastid TP (29). For class II proteins, we predicted that the
region immediately following the signal sequence must have a
role in targeting to the plastid. The exact length of the putative
TP was difficult to assess, as we had little confidence in the
ability of ChloroP to correctly predict the cleavage site, and
thus the values in Table 1 are only estimates. However, from
the predicted signal sequence site to the first region of clear
sequence similarity to known proteins, the length ranged from
36 to 135 amino acids.
To test whether the class II TP region was similar to that of
class I targeting sequences and to determine the chemical
properties of both TP domains compared to the TPs of green
algae and plants, we examined their amino acid compositions.
We also compared these compositions to those of the mature
region of proteins with class I targeting domains as well as
selected Chlamydomonas proteins.
FIG. 2. Kyte-Doolittle hydropathy plots for class IB plastid-targeting sequences of Euglena. Hydrophobicity plots for five confirmed
lumen-targeted proteins are shown. The analyses were conducted with
a window size of 19, and the hydrophobic regions (positive scores)
corresponding to the TMHs of the signal sequence (SS) and the stop-transfer sequence (ST) are indicated with black bars. The hydrophobic
region corresponding to the LTD is indicated with gray bars. Oxygen-evolving enhancer 3 (OEE3) has a class II targeting sequence and thus
lacks the typical ST region. TP, transit peptide; MP, mature protein.
FIG. 3. Characteristics of class II targeting sequences of Euglena
plastid proteins. (A) Scatter plot showing TMHMM probability for the
first 100 amino acids. Because the region before the first TMH is of
variable length, the data were normalized to a starting TMHMM
probability of ≥0.1. In all cases, a second TMH 60 ± 8 amino acids
downstream from the first was absent. The hydrophobic region centered at position 45 is the LTD of OEE3. (B) Overview (McClade) of
amino acid categories of the targeting sequences of class II plastid-targeted proteins. Colors represent defined categories of amino acids,
as indicated in the legend to Fig. 1. The black arrowhead indicates the
predicted signal sequence cleavage site.
The amino acid composition was calculated from the entire intervening region between
the TMH regions of class I proteins (the predicted transit
peptide), the estimated transit peptide from class II proteins
that was located after the signal sequence and before the
predicted start of the mature protein, and the entire coding region
from all proteins having class I targeting sequences.
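The category-level composition used in this comparison can be computed with a few lines of code. The sketch below is not the PEPSTATS program; it only tallies the three categories discussed in the text (hydroxylated, acidic, and basic residues, grouped as in the legend to Fig. 1), and the function name and example peptide are illustrative.

```python
from collections import Counter

# Residue categories as defined in the legend to Fig. 1.
GROUPS = {"hydroxylated": "ST", "acidic": "DE", "basic": "HKR"}


def group_percentages(peptide):
    """Percentage of residues in each category for one peptide region."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide.upper())
    n = len(peptide)
    return {
        name: 100.0 * sum(counts[aa] for aa in members) / n
        for name, members in GROUPS.items()
    }


# Hypothetical 10-residue peptide, used only to show the output shape.
print(group_percentages("SSTTDEKRAA"))
```

Applying the function to the region between TMH1 and TMH2 and to the mature region of the same protein gives the paired values that the box-and-whisker plots summarize.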
The data for selected amino acids and amino acid categories
are shown in the form of box-and-whisker plots (Fig. 4). Since
plastid transit peptides are reportedly enriched in hydroxylated
amino acids and deficient in acidic amino acids (76), we analyzed a priori these amino acid categories in the putative TPs
of class I and class II targeting sequences of Euglena in addition to a selection of 25 predicted TPs from Chlamydomonas
proteins (Fig. 4). The region immediately downstream of the
signal sequence in class I and II targeting sequences was significantly enriched in Ser and Thr (22% and 17%, respectively)
compared to the mature regions of proteins with class I targeting sequences (11%) (one-way analysis of variance
[ANOVA] and Tukey’s test [α ≤ 0.05]). The TPs of Chlamydomonas proteins were similarly enriched in Ser/Thr (17%) compared to the mature portions of the proteins (11%) (Fig. 4).
The putative transit peptide regions of class I and II targeting
sequences were also significantly depleted in acidic amino acids (Asp and Glu) compared to the mature regions of the same
proteins (Fig. 4) (one-way ANOVA and Tukey’s test [α ≤
0.05]). The predicted transit peptide regions were also found to
have a higher Ala and Pro content than the mature portions of
proteins (Fig. 4) (one-way ANOVA and Tukey’s test [α ≤
0.05]).
FIG. 4. Amino acid composition analyses of the predicted TPs of class I and II targeting sequences compared to the mature proteins (MP). The
amino acid compositions of the intervening region between TMH1 and TMH2 of class I targeting sequences (TP, I; n = 70), the predicted transit
peptide region for class II proteins (TP, II; n = 13), and the mature protein regions from class I proteins (MP, I; n = 70) were determined. Also
shown are the amino acid compositions of Chlamydomonas reinhardtii TPs (TP, Cr; n = 25) and mature proteins (MP, Cr; n = 25). Box-and-whisker
plots were used to represent the data and are based on quartiles around the median value. The box encloses 50% of the data, with 25% above and
below the median (solid line). Each whisker represents the data range of an additional 25% of the data. The existence of outliers beyond the 5%
and 95% confidence ranges is indicated with a solid dot where applicable. Categories indicated with different letters on the plot are significantly
different (one-way ANOVA and Tukey’s test [α ≤ 0.05]). All data were normal except for the Lys content in class II peptides, in which case
nonparametric statistics were used to assess differences.
However, given that 20 tests were conducted and that
the amino acid composition is not truly independent, there is a
possibility that some of these differences could be by chance.
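
The caveat about conducting 20 tests can be made concrete with a short calculation. The Python sketch below estimates the chance of at least one spurious difference if the 20 tests were fully independent, and shows a Bonferroni-style per-test threshold. Both the independence assumption and the Bonferroni correction are illustrative choices that are not part of the analysis reported here, and the function names are hypothetical.

```python
# Rough illustration only: the 20 tests are treated as independent, which the
# text itself notes is not strictly true for amino acid composition data.
ALPHA = 0.05   # per-test significance level quoted in the text
N_TESTS = 20   # number of tests quoted in the text


def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    fwer = familywise_error_rate(ALPHA, N_TESTS)
    print(f"familywise error rate: {fwer:.3f}")
    print(f"Bonferroni threshold:  {bonferroni_threshold(ALPHA, N_TESTS):.4f}")
```

Under these simplifying assumptions the chance of at least one false positive is roughly 0.64, which is why isolated differences deserve caution. Because amino acid percentages are not independent of one another, this value is only a rough guide.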
Although the Chlamydomonas TP exhibited a clear elevation
in Ala content, there was no difference in the amount of Pro
compared to that in the Euglena TPs. In terms of charged
amino acids, the TP region is deficient in acidic amino acids,
yet there is little significant change in the content of basic (His,
Lys, and Arg) residues compared to the mature regions of the
same proteins. However, examination of Lys and Arg separately reveals discrimination against Lys in the TP regions of
class I and II targeting sequences (mean, 1.6% and 2.1%,
respectively) compared to the mature proteins (mean, 5.8%;
P < 0.001 [Kruskal-Wallis test]) (Fig. 4). There were no significant
differences in Arg content between the predicted transit peptides and the mature portions of the same proteins. Chlamydomonas TPs discriminate strongly against acidic amino acids
(mean, 0.2%) and have an elevated content of Arg compared
to the mature regions of the same proteins. Unlike Euglena,
Chlamydomonas shows no bias against Lys in the TP. Without exception, the amino acid compositions of the Euglena
class I and II transit peptides were the same, and both were
significantly different from the composition of the mature
protein (Fig. 4).
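
The composition comparisons above reduce to counting residues in a few categories and expressing the counts as a percentage of region length. A minimal Python sketch of that calculation follows. The category groupings are taken from the text (Ser/Thr; acidic Asp and Glu; basic His, Lys, and Arg), while the function name and the toy sequence are hypothetical and do not come from the EST data.

```python
# Residue categories as described in the text (one-letter amino acid codes).
CATEGORIES = {
    "Ser/Thr": "ST",
    "acidic": "DE",
    "basic": "HKR",
    "Lys": "K",
    "Arg": "R",
    "Ala": "A",
    "Pro": "P",
}


def category_percentages(sequence):
    """Return the percentage of residues in each category for one region."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        name: 100.0 * sum(seq.count(aa) for aa in residues) / len(seq)
        for name, residues in CATEGORIES.items()
    }


if __name__ == "__main__":
    toy_tp = "MSSTAARSPLSTRAASVAPR"  # made-up sequence for illustration only
    for name, pct in category_percentages(toy_tp).items():
        print(f"{name:8s}{pct:5.1f}%")
```

Applying such a function separately to each predicted TP and to each mature region would yield the per-protein values that a box-and-whisker plot summarizes.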
To examine the distribution of the acidic amino acids further, the class I transit peptide region was divided into thirds,
and the acidic amino acid content was calculated (Fig. 5). From
this analysis, an asymmetric distribution of acidic amino acids
was apparent, such that the first third (TP1) lacked acidic
residues (1%) while the latter third (TP3) had the same acidic
content as the mature protein (11%). The Ser/Thr compositions of the putative TPs were not different among the three
regions (TP1-3) (Fig. 5). The basic amino acid composition was
the same within the three TP regions and the mature protein
(Fig. 5).
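
The thirds analysis can be sketched in a few lines of Python. The text does not state how lengths that are not divisible by three were handled, so the integer boundaries used here (any extra residues fall in the later segments) are an assumption, as are the function name and the toy input.

```python
ACIDIC = set("DE")  # Asp and Glu, as defined in the text


def acidic_percent_by_third(sequence):
    """Split a TP region into three contiguous segments (TP1 to TP3) and
    return the percentage of acidic residues in each segment."""
    seq = sequence.upper()
    n = len(seq)
    if n < 3:
        raise ValueError("need at least three residues to form three segments")
    # Assumed boundary rule: floor(i * n / 3) for i = 0..3.
    bounds = [i * n // 3 for i in range(4)]
    return [
        100.0 * sum(aa in ACIDIC for aa in seq[start:end]) / (end - start)
        for start, end in zip(bounds, bounds[1:])
    ]


if __name__ == "__main__":
    toy_region = "AAAADAAADEED"  # made-up sequence, not a real transit peptide
    for label, pct in zip(("TP1", "TP2", "TP3"), acidic_percent_by_third(toy_region)):
        print(f"{label}: {pct:.1f}% acidic")
```

The same segmentation could be reused for the Ser/Thr and basic categories by swapping the residue set.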
FIG. 5. Amino acid composition analysis of the plastid TP domain
of class I targeting sequences. Each TP region was divided into three
equal segments (TP1-3), and the basic (H, K, and R), acidic (D and E),
and serine/threonine (Ser/Thr) contents were calculated. These values
were compared to the averaged amino acid composition of the mature
protein (MP).
Overall, the putative TP domains of the two classes of
Euglena targeting sequences have the same amino acid composition, and this composition resembles that of plant chloroplast transit peptides (11, 20, 76) in terms of an elevated content of Ser/Thr. These putative TP domains were also
predicted to be transit peptides by using ChloroP (18), with
apparent success rates of 83% and 67% for class I and II
targeting sequences, respectively, when the signal sequence
domain was removed. Surprisingly, the success rates were still
respectable when the signal sequence was retained during the
analysis (71% and 50%) but not when the entire targeting
sequence was removed. One notable exception is the lumen-targeted protein OEE3, with a class II targeting sequence that
has a mere seven amino acids between the end of the TMH
(the signal sequence) and the putative hydrophobic LTD. Two
of the seven residues are basic amino acids (no acidic residues), but the region immediately after the LTD is strongly
acidic (data not shown). Euglena TPs, like others in the green
alga lineage, lack a requirement for a Phe at the N terminus
that is commonly observed in chromalveolates (15, 55, 57) and
glaucophytes (68) and that is essential for plastid import in vivo
in diatoms (37).
Stop-transfer sequences are a predicted feature of class I
proteins. Stop-transfer sequences function to halt the cotranslational import of proteins into the ER and serve an important
role in determining the orientation of a protein in the membrane (8). For Euglena, it has been proposed that the second
TMH acts as a stop-transfer sequence (69). From analysis of a
large number of proteins with class I targeting sequences, it
is clear that a stop-transfer sequence is a common motif in
Euglena plastid-targeted proteins. In a few cases (Table 1), the
second TMH region was not predicted by the TMHMM program, and the probability of having a TMH ranged from 0.1 to
0.9. Nevertheless, in these cases, subsequent hydropathy plots
confirmed that these targeting domains are still strongly hydrophobic (data not shown) and therefore likely to have the
same stop-transfer function.
FIG. 6. (A) Kyte-Doolittle hydrophobicity profiles for the stop-transfer region of class I targeting sequences and the region immediately following the signal sequence of class II targeting domains. Plots
begin 10 amino acid residues upstream of the start of the second TMH
(for class I proteins) or the first TMH (for class II proteins), and the
hydrophobicity profiles were calculated with a window size of 7 residues. The thick lines are the mean scores, and the thin lines on either
side represent the 95% confidence intervals. The black bars above the
hydrophobic regions indicate the location of the predicted TMH.
(B) Sequence logo plot of class IA sequences when the second transmembrane helixes (TMH2) were aligned. Only the regions immediately before and after TMH2 are shown.
FIG. 7. Alignment of targeting sequences from selected Euglena plastid-targeted proteins. (A) Comparison of FNR and CP29 targeting
sequences. Identical amino acids are white on a black background. (B) Second group of proteins possessing similar targeting sequences. Identical
amino acids compared to the top sequence are indicated by white letters on a black background. The hydrophobic regions of the signal sequence
and stop-transfer domains are indicated by lines above the appropriate amino acids. The mature portions of the proteins, if shown, are indicated
with double underlining.
Immediately following the second
TMH and within six residues of its end, ca. 80% of proteins of
this class have two or more basic amino acids, and 97% of
proteins have at least one. Only 2 of the 71 class I proteins lack
a positively charged residue immediately after the TMH. The
sharp change in polarity immediately after the second hydrophobic region, particularly towards positively charged residues,
is apparent in the hydropathy plots encompassing this region
(Fig. 6A). Class I polypeptides display a sharp decline in hydrophobicity immediately following the second TMH, a feature that presumably acts to block further insertion into the
membrane. In class IB proteins, an additional hydrophobic
region, the lumen-targeting domain, is located 25 to 30 amino
acids further downstream. In contrast, class II polypeptides do
not exhibit this sharp increase in polarity immediately after the
hydrophobic section of the predicted signal sequence. The
sequence logo illustrates the common occurrence of basic
amino acids immediately after the hydrophobic domain of the
stop-transfer domain (Fig. 6B), which is not observed after the
signal sequences of class II proteins. These differences provide
additional evidence that the TMH of a class II protein is not
simply the second TMH of a 5′-truncated cDNA encoding a
class I protein.
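
The two measurements used in this section, a windowed Kyte-Doolittle hydropathy profile and a count of basic residues just after a TMH, can be sketched in Python as follows. The scale values are the standard Kyte-Doolittle hydropathy indices and the window of 7 residues matches the Fig. 6 legend. The function names, the 0-based `tmh_end` index convention, the inclusion of His among the basic residues (following the grouping used for Fig. 5), and the toy sequence are assumptions made for illustration; this is not the software used in the study.

```python
# Standard Kyte-Doolittle hydropathy indices (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def basic_residues_after(sequence, tmh_end, span=6, basic="HKR"):
    """Count basic residues in the `span` residues that follow a TMH.

    `tmh_end` is the 0-based index of the first residue after the TMH.
    """
    tail = sequence.upper()[tmh_end:tmh_end + span]
    return sum(aa in basic for aa in tail)


if __name__ == "__main__":
    toy = "LLLLLLLKRSAAAK"  # made-up: hydrophobic stretch, then basic residues
    print([round(v, 2) for v in hydropathy_profile(toy)])
    print(basic_residues_after(toy, tmh_end=7))
```

A sharp drop in the profile immediately after the hydrophobic stretch, together with a count of two or more from `basic_residues_after`, corresponds to the pattern described above for most class I sequences.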
Some plastid transit sequences are conserved. When the
plastid-targeting sequences of Euglena class I proteins were
used individually as tBLASTn queries against the Euglena
database, unexpected similarities in certain groups of unrelated proteins were revealed. For instance, FNR and CP29
possess nearly identical targeting sequences despite having no
functional relationship (Fig. 7A). There is also a group of
targeting sequences that show various degrees of similarity,
particularly within the signal sequence and plastid-targeting
domains of the transit peptide. Within this group, the targeting
sequences of rpL21 and rpL3 (tBLASTn E value = 5e-59) are
nearly identical, and these sequences share a high degree of
similarity with the targeting sequences of an acyl carrier protein (E = 2e-29) and two different light-harvesting complex
(LHC) subunits (E = 2e-39 and 4e-21). Interestingly, the
targeting domain of the first LHCI-like sequence shares a
greater degree of similarity with those of the acyl carrier protein and ribosomal proteins than with the targeting domain of
the other LHCI sequence (or of any other LHC sequence in
the database). This similarity even extends into the putative
stop-transfer domain, a region not expected to be conserved.
In other cases, a tBLASTn search with a specific targeting
sequence allowed the detection, as expected, of isoforms and
members of a multigene family, a result that is attributable to
gene duplication events. This search approach was able to
recognize a variety of different plastid-targeting proteins, although the E values were generally >1e-20; thus, some of this
similarity could simply be due to the constraints placed upon
these regions by amino acid composition. In marked contrast,
many other targeting sequences produced no significant hits at
all. With the exception of rpL3, which was represented by a
single EST, the remainder of the EST clusters analyzed here
comprised multiple overlapping reads, with clear evidence of a
spliced leader sequence, eliminating clustering artifacts as an
explanation for the observed similarity.
DISCUSSION
The discovery of LHC precursors in Golgi dictyosomes of
Euglena (53) and subsequent in vitro experiments demonstrated that the Euglena LHCII presequence does indeed possess a functional signal motif (38), an inference that is strongly
supported by the study reported here. Although the presence
of a signal sequence-like region was part of our selection criteria, we found no evidence in the entire database of a plastid-targeted protein that lacked a signal sequence, suggesting that
in Euglena, all plastid proteins proceed to the organelle via the
endomembrane system. Although some in vitro studies have
suggested the potential direct import of proteins into Euglena
plastids, thereby bypassing the ER (65), the bulk of relevant
biochemical work indicates that transport via the endomembrane system is required for plastid targeting (38, 67, 69–71).
The endomembrane system is also important for plastid targeting in all protists with complex secondary plastids, including
those with three (49, 55) and four (4, 7, 16, 19, 37, 59, 78, 79)
plastid membranes.
In Euglena, proteins targeted to the plastid do not fully insert
into the ER lumen or the membrane during translation due to
the presence of a stop-transfer domain, so the majority of the
protein remains exposed in the cytoplasm (69). Indeed, in class
I proteins, the presequence has a second hydrophobic region
followed by positively charged amino acids, both of which are
characteristics typical of stop-transfer sequences (14, 41). Although 2 of the 70 class I proteins lack positively charged
amino acids immediately after the second TMH, such residues
are not an absolute requirement for a stop-transfer function,
with the effectiveness of targeting depending on a combination
of hydrophobicity, length, and charge (14, 41, 60). The presence of a functioning stop-transfer motif in a plastid presequence is unique to Euglena and dinoflagellates. Both groups
have three plastid membranes, leading Nassoury et al. (49) to
suggest that the stop-transfer sequence arose from a mechanistic requirement driven by the number of plastid membranes.
It is generally agreed that Euglena and dinoflagellates are phylogenetically distant; thus, the similarities between their targeting sequences, and presumably the underlying transport
mechanisms, would appear to be convergent as part of a necessary step in protein targeting.
Although targeting in organisms with complex plastids first
requires import of the protein into the ER, little is known
about subsequent mechanisms of targeting to the plastid. In
organisms with three plastid membranes, such as euglenophytes and dinoflagellates, targeting from the ER to the outer
plastid membrane involves vesicular transport via the Golgi
system (49, 53). The segregation of plastid-bound proteins into
the proper vesicles may involve receptors located in the endomembrane system that recognize the transit peptide and direct
the protein to its appropriate destination. This pathway is
analogous to that in animal and fungal systems, where receptors within the endomembrane system, such as the classic mannose-6-phosphate receptor system for targeting to the lysosome (22), are able to recognize features of the protein and
ensure proper localization. Ultimately, cytoplasmic sorting factors, such as adaptins (9), may play a role in the accumulation
of plastid-targeted proteins and their segregation to vesicles
destined for the plastid. Such cytosolic factors could participate in the recognition of receptors that bind to plastid-targeted proteins and/or specific motifs just beyond the stop-transfer domain of the targeted protein itself to facilitate
targeting. One potential series of residues includes the cluster
of basic amino acids that immediately follows the stop-transfer
domain. The importance of short, cytoplasm-exposed targeting
motifs for intracellular sorting is well known (9). For Euglena,
Sláviková et al. (67) determined that this cytoplasm-exposed
portion of the presequence is not required for plastid import in
vitro, but they suggested that it may function in vesicle routing.
Of particular interest here is our discovery of plastid-targeted proteins lacking the putative stop-transfer sequence
(class II), implying that these proteins are inserted entirely into
the ER, leaving a soluble portion within the ER lumen and a
membrane portion integrated within the ER membrane, once
the signal sequence is removed. Given that the Euglena class II
proteins comprise both soluble and membrane proteins, it is
unlikely that other domains within the mature protein could
impart a similar stop-transfer effect to compensate for the lack
of such a region in the presequence. The targeting route for
class II proteins is conceptually similar to the targeting of
proteins to the remnant plastid (apicoplast) in apicomplexans;
apicoplast proteins lack the stop-transfer sequence and are
targeted to the plastid via the ER (19), presumably by vesicular
transport. Thus, for correct targeting, the putative transit peptide, and possibly the mature protein, must contain features
that would be recognized by specific cofactors or receptors that
are localized to the ER lumen, not the cytoplasm. Since class
II transit peptides are predicted to lack the stop-transfer sequence and thus the cytoplasm-exposed region just beyond,
redirection to the plastid must be facilitated solely by interaction with targeting factors that bind to the TP and allow these
precursors to “hitchhike” in vesicles with the class I proteins.
An alternative, albeit unlikely, mechanism is that the class II
signal sequence acts as a signal anchor, with the N terminus
facing the ER lumen. However, in this orientation the transit
peptide would be facing the cytoplasm and presumably would
be inaccessible to the targeting machinery.
Even more surprising is the resemblance of this class of
targeting sequence to those of dinoflagellates, whose plastid-targeted proteins also exhibit a similar proportion of presequences lacking stop-transfer domains (55), with the remainder resembling class I proteins. As possible explanations for
the dinoflagellates, Patron et al. (55) ruled out the evolutionary
history of the gene transfer or final destination of the protein,
suggesting instead that the “physical characteristics” of the
plastid-targeted protein may determine the nature of its presequence. In support of the latter hypothesis, they found that
the class I and II distinction was conserved between proteins in
two dinoflagellates examined. If “physical characteristics” was
the main factor determining the mode of transport, then we
would predict that Euglena would exhibit a similar distribution
of proteins having class I and II presequences. Some similarities are clearly evident, such as with phosphoribulose kinase
and oxygen-evolving enhancer 3 (PsbO), which lack a stop-transfer sequence in both dinoflagellates and Euglena. However, other dinoflagellate proteins with class II (and III) targeting sequences are class I proteins in Euglena (acyl carrier
protein, carbonic anhydrase, cytochrome c6, and the PSII 11-kDa protein). Although the sample size for comparison is
small, there do not appear to be any obvious inherent functional or physical properties that would require a class I versus
class II targeting sequence. In vitro import assays should help
to define the functional requirements of the different classes of
presequence and determine whether either is essential for the
import of specific proteins.
With the exception of apicomplexans, complex plastids with
four plastid membranes often have ribosomes attached to the
outer membrane (chloroplast ER [CER]). However, the primary plastid-sorting mechanism must still occur after cotranslational import across/into the ER membrane, since in diatoms
the signal sequences of ER and plastid-resident proteins are
functionally equivalent (37), and in a raphidophyte, few ribosomes are bound to the CER (30). Thus, once inserted into the
endomembrane system, the plastid-bound proteins still have to
be targeted to and traverse at least three membranes, similar
to the situation in Euglena and dinoflagellates. Though not
involving the Golgi dictyosomes, a vesicular transfer between
the CER and the third membrane has been proposed (23), and
a recent report supports such a mechanism (37). Apicomplexans, in particular, provide a valuable model system for dissecting the targeting process in complex plastids with four membranes, with several studies indicating not only that there is
partially redundant targeting information in the presequences
of apicoplast-targeted proteins (26, 81, 82) but also that there
is a distinction between the information for targeting and that
for import into the apicoplast (26). Recent work has even
identified proteins that interact with the TP and that may be
involved in sorting from the ER to the apicoplast (82).
In Euglena, the region between the signal sequence and the
stop-transfer sequence in class I proteins functions as a TP
(67). This region and the TPs of class II proteins possess
characteristics typical of most TPs. These similarities include
enrichment in Ser/Thr (S/T bias) and Ala. S/T bias is a common feature of most transit peptides of plants and algae (2, 4,
11, 15, 20, 49, 54, 55, 59, 76). Some notable exceptions to this
rule include apicomplexans (77, 78) and nucleomorph-encoded
plastid proteins from the cryptomonad alga Guillardia theta
(57). Replacement of all Ser/Thr residues in the TP of Plasmodium had no effect on plastid targeting, demonstrating a
lack of a requirement for such residues (78). Although an
elevated Ser/Thr content is evidently dispensable in apicomplexans, it remains one of the more consistent features of most
TPs, which may reflect a requirement for phosphorylation-dependent binding of 14-3-3 proteins as part of a preinsertion
guidance complex (46).
Euglena TPs also have an overall positive charge, an apparently universal feature of TPs (57), that is primarily due to a
reduction in the content of acidic amino acids. Of particular
interest is the asymmetric distribution of acidic amino acids in
the TP, with the first two-thirds being deficient in such residues, whereas the remaining third has a composition resembling that of the mature protein. This asymmetry may reflect a
distinction between functional TPs (with a bias against acidic
residues) and regions having a different function. The importance of a TP depleted in acidic residues was demonstrated
in Plasmodium, where the replacement of basic with acidic
amino acids eliminated apicoplast targeting (19). Interestingly, Euglena TPs are also deficient in Lys (but not Arg) and
have biases in favor of Ala and Pro compared to mature proteins, which are also features of the TPs of the chlorarachniophyte Bigelowiella natans (59). Some of the shared features of
TPs, such as a bias against acidic amino acids and a bias in
favor of some hydrophobic residues, may be due to a requirement for binding of import factors, such as molecular
chaperones (Hsp70) (83). Although the biological significance of the biased amino acid composition in TPs is not
entirely understood, and despite any differences in primary
structure, TPs from diverse plastid types are functionally
sufficient in heterologous import assays (3, 32, 42, 49, 67,
79).
The striking amino acid similarity between certain plastid-targeting sequences is surprising. In general, transit peptides
lack evident sequence similarity, even among paralogs of the
same gene family, so the detection of clusters of related targeting sequences may shed light on how targeting sequences
were acquired following transfer of the endosymbiont’s genes
to the host nucleus during plastid evolution. Reports of highly
similar plastid and mitochondrial TPs are relatively rare, but
the examples can be separated into two categories. In the first
case, homologs from different species exhibit a greater-than-expected similarity within the TP region compared to that of
the mature proteins, which is attributed to a conserved functional role (80). The second category includes unrelated proteins that possess highly similar TPs (1, 5, 40, 45), which is what
we observe in Euglena. This similarity is often attributed to
exon shuffling, as introns commonly separate the transit peptide from the mature protein (36, 45). There are also reports
of transit peptide acquisition through insertion into preexisting genes for plastid (5)- and mitochondrion-targeted (1)
proteins. Thus, the newly transferred genes would acquire
not only the targeting mechanism but also the regulatory
sequences required for expression, in the so-called “lucky insertion scenario” (21). Although we lack the appropriate
genomic information from Euglena to be able to completely
assess the mechanism of TP acquisition, a genomic sequence
for an LHCII gene of this organism does have an intron that
roughly separates the predicted targeting domain from the
mature protein (48), suggesting exon shuffling as a potential
mode of TP acquisition. However, the similarity of the rpL3
and rpL21 presequences to a small portion of the LHCI mature protein (GFDPLGL) (Fig. 7) suggests that TP acquisition
by insertion into a preexisting copy of the LHCI gene is also a
strong possibility. The maintenance of a continued high degree
of conservation between rpL21-rpL3 and CP29-FNR could
also imply recent recombination, or perhaps alternative splicing, as described for rice mitochondrion-targeted rpS14 and
SDHB proteins (40). The pronounced sequence conservation
within these regions also raises the possibility that these targeting sequences have an additional function(s) in the cell,
either before or after cleavage, as proposed for some mammalian signal sequences (10, 17).
In summary, we have characterized two distinct classes of
Euglena plastid presequences, i.e., classes I and II, that differ
by the presence and absence of a predicted stop-transfer sequence, respectively, revealing an additional level of complexity in the protein transport mechanism. In addition to enhancing our ability to predict Euglena presequences, we expect that
the characteristics of these TPs will stimulate further import
studies, both in vitro and in vivo, seeking to dissect the processes of targeting and import into the complex plastids of
Euglena.
ACKNOWLEDGMENTS
This work was carried out under the auspices of a Genome Canada
large-scale genomics project, the Protist EST Program, with funding
provided through Genome Atlantic and the Atlantic Innovation Fund.
M.W.G. gratefully acknowledges salary support from the Canada Research Chairs Program and the Canadian Institute for Advanced Research (Program in Evolutionary Biology). D.G.D. also thanks the
Natural Sciences and Engineering Research Council (NSERC) for
ongoing support.
We are grateful to Patrick Keeling for sharing a paper on dinoflagellate targeting sequences prior to publication. We also thank Steve
Heard and Penny Humby for helpful discussions on statistics. The
technical assistance of H. Rissler, who isolated RNAs from Euglena for
the construction of two of the five cDNA libraries sequenced for this
study, is acknowledged.
REFERENCES
1. Adams, K. L., M. Rosenblueth, Y. L. Qiu, and J. D. Palmer. 2001. Multiple
losses and transfers to the nucleus of two mitochondrial succinate dehydrogenase genes during angiosperm evolution. Genetics 158:1289–1300.
2. Apt, K. E., D. Bhaya, and A. R. Grossman. 1994. Characterization of genes
encoding the light-harvesting proteins in diatoms: biogenesis of the fucoxanthin chlorophyll a/c protein complex. J. Appl. Phycol. 6:225–230.
3. Apt, K. E., N. E. Hoffman, and A. R. Grossman. 1993. The γ-subunit of
R-phycoerythrin and its possible mode of transport into the plastid of red
algae. J. Biol. Chem. 268:16208–16215.
4. Apt, K. E., L. Zaslavkaia, J. C. Lippmeier, M. Lang, O. Kilian, R. Wetherbee,
A. R. Grossman, and P. G. Kroth. 2002. In vivo characterization of diatom
multipartite plastid targeting signals. J. Cell Sci. 115:4061–4069.
5. Arimura, S.-I., S. Takusagawa, S. Hatano, M. Nakazono, A. Hirai, and N.
Tsutsumi. 1999. A novel plant nuclear gene encoding chloroplast ribosomal
protein S9 has a transit peptide related to that of rice chloroplast ribosomal
protein L12. FEBS Lett. 450:231–234.
6. Bendtsen, J. D., H. Nielsen, G. von Heijne, and S. Brunak. 2004. Improved
prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340:783–795.
7. Bhaya, D., and A. Grossman. 1991. Targeting proteins to diatom plastids
involves transport through an endoplasmic reticulum. Mol. Gen. Genet.
229:400–404.
8. Blobel, G. 1980. Intracellular protein topogenesis. Proc. Natl. Acad. Sci.
USA 77:1496–1500.
9. Bonifacino, J. S., and L. M. Traub. 2003. Signals for sorting of transmembrane proteins to endosomes and lysosomes. Annu. Rev. Biochem. 72:395–
447.
10. Braud, V. M., D. S. Allan, C. A. O’Callaghan, K. Soderstrom, A. D’Andrea,
G. S. Ogg, S. Lazetic, N. T. Young, J. I. Bell, J. H. Phillips, L. L. Lanier, and
A. J. McMichael. 1998. HLA-E binds to natural killer cell receptors CD94/
NKG2A, B and C. Nature 391:795–799.
11. Bruce, B. D. 2000. Chloroplast transit peptides: structure, function and
evolution. Trends Cell Biol. 10:440–447.
12. Cavalier-Smith, T. 2002. Chloroplast evolution: secondary symbiogenesis
and multiple losses. Curr. Biol. 12:R62–R64.
13. Chan, R. L., M. Keller, J. Canaday, J. H. Weil, and P. Imbault. 1990. Eight
small subunits of Euglena ribulose-1,5-bisphosphate carboxylase-oxygenase
are translated from a large messenger RNA as a polyprotein. EMBO J.
9:333–338.
14. Chen, H., and D. A. Kendall. 1995. Artificial transmembrane segments.
Requirements for stop transfer and polypeptide orientation. J. Biol. Chem.
270:14115–14122.
15. Deane, J. A., M. Fraunholz, V. Su, U.-G. Maier, W. Martin, D. G. Durnford,
and G. I. McFadden. 2000. Evidence for nucleomorph to host nucleus gene
transfer: light-harvesting complex proteins from cryptomonads and
chlorarachniophytes. Protist 151:239–252.
16. DeRocher, A., C. B. Hagen, J. E. Froehlich, J. E. Feagin, and M. Parsons.
2000. Analysis of targeting sequences demonstrates that trafficking to the
Toxoplasma gondii plastid branches off the secretory system. J. Cell Sci.
113:3969–3977.
17. Eichler, R., O. Lenz, T. Strecker, M. Eickmann, H. D. Klenk, and W. Garten.
2003. Identification of Lassa virus glycoprotein signal peptide as a trans-acting maturation factor. EMBO Rep. 4:1084–1088.
18. Emanuelsson, O., H. Nielsen, and G. von Heijne. 1999. ChloroP, a neural
network-based method for predicting chloroplast transit peptides and their
cleavage sites. Protein Sci. 8:978–984.
19. Foth, B. J., S. A. Ralph, C. J. Tonkin, N. S. Struck, M. Fraunholz, D. S. Roos,
A. F. Cowman, and G. I. McFadden. 2003. Dissecting apicoplast targeting in
the malaria parasite Plasmodium falciparum. Science 299:705–708.
20. Franzen, L. G., J.-D. Rochaix, and G. von Heijne. 1990. Chloroplast transit
peptides from the green alga Chlamydomonas reinhardtii share features with
both mitochondrial and higher plant chloroplast presequences. FEBS Lett.
260:165–168.
21. Gantt, J. S., S. L. Baldauf, P. J. Calie, N. F. Weeden, and J. D. Palmer. 1991.
Transfer of rpl22 to the nucleus greatly preceded its loss from the chloroplast
and involved the gain of an intron. EMBO J. 10:3073–3078.
22. Ghosh, P., N. M. Dahms, and S. Kornfeld. 2003. Mannose 6-phosphate
receptors: new twists in the tale. Nat. Rev. Mol. Cell Biol. 4:202–212.
23. Gibbs, S. P. 1979. Route of entry of cytoplasmically synthesized proteins into
chloroplasts of algae possessing chloroplast-ER. J. Cell Sci. 35:253–266.
24. Grossman, A., A. Manodori, and D. Snyder. 1990. Light-harvesting proteins
of diatoms: their relationship to the chlorophyll a/b binding proteins of
higher plants and their mode of transport into plastids. Mol. Gen. Genet.
224:91–100.
25. Hannaert, V., H. Brinkmann, U. Nowitzki, J. A. Lee, M. A. Albert, C. W.
Sensen, T. Gaasterland, M. Muller, P. Michels, and W. Martin. 2000. Enolase from Trypanosoma brucei, from the amitochondriate protist Mastigamoeba balamuthi, and from the chloroplast and cytosol of Euglena gracilis:
pieces in the evolutionary puzzle of the eukaryotic glycolytic pathway. Mol.
Biol. Evol. 17:989–1000.
26. Harb, O. S., B. Chatterjee, M. J. Fraunholz, M. J. Crawford, M. Nishi, and
D. S. Roos. 2004. Multiple functionally redundant signals mediate targeting
to the apicoplast in the apicomplexan parasite Toxoplasma gondii. Eukaryot.
Cell 3:663–674.
27. Henze, K., A. Badr, M. Wettern, R. Cerff, and W. Martin. 1995. A nuclear
gene of eubacterial origin in Euglena gracilis reflects cryptic endosymbioses
during protist evolution. Proc. Natl. Acad. Sci. USA 92:9122–9126.
28. Houlne, G., and R. Schantz. 1987. Molecular analysis of the transcripts
encoding the light-harvesting chlorophyll a/b protein in Euglena gracilis:
unusual size of the mRNA. Curr. Genet. 12:611–616.
29. Inagaki, J., Y. Fujita, T. Hase, and Y. Yamamoto. 2000. Protein translocation
within chloroplast is similar in Euglena and higher plants. Biochem. Biophys.
Res. Commun. 277:436–442.
30. Ishida, K., T. Cavalier-Smith, and B. R. Green. 2000. Endomembrane structure and the chloroplast protein targeting pathway in Heterosigma akashiwo
(Raphidophyceae, Chromista). J. Phycol. 36:1135–1144.
31. Jackson-Constan, D., and K. Keegstra. 2001. Arabidopsis genes encoding
components of the chloroplastic protein import apparatus. Plant Physiol.
125:1567–1576.
32. Jakowitsch, J., C. Neumann-Spallart, Y. Ma, J. Steiner, H. E. Schenk, H. J.
Bohnert, and W. Löffelhardt. 1996. In vitro import of pre-ferredoxin-NADP+-oxidoreductase from Cyanophora paradoxa into cyanelles and into
pea chloroplasts. FEBS Lett. 381:153–155.
33. Keegstra, K., and K. Cline. 1999. Protein import and routing systems of
chloroplasts. Plant Cell 11:557–570.
34. Keeling, P. J. 2004. Diversity and evolutionary history of plastids and their
hosts. Am. J. Bot. 91:1481–1493.
35. Kilian, O., and P. G. Kroth. 2003. Evolution of protein targeting into “complex” plastids: the “secretory transport hypothesis.” Plant Biol. (Stuttgart)
5:350–358.
36. Kilian, O., and P. G. Kroth. 2004. Presequence acquisition during secondary
endocytobiosis and the possible role of introns. J. Mol. Evol. 58:712–721.
37. Kilian, O., and P. G. Kroth. 2005. Identification and characterization of a
new conserved motif within the presequence of proteins targeted into complex diatom plastids. Plant J. 41:175–183.
38. Kishore, R., U. S. Muchhal, and S. D. Schwartzbach. 1993. The presequence
of Euglena LHCPII, a cytoplasmically synthesized chloroplast protein, contains a functional endoplasmic reticulum-targeting domain. Proc. Natl. Acad.
Sci. USA 90:11845–11849.
39. Krogh, A., B. Larsson, G. von Heijne, and E. L. Sonnhammer. 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305:567–580.
40. Kubo, N., K. Harada, A. Hirai, and K.-I. Kadowaki. 1999. A single nuclear
transcript encoding mitochondrial RPS14 and SDHB of rice is processed by
alternative splicing: common use of the same mitochondrial targeting signal
for different proteins. Proc. Natl. Acad. Sci. USA 96:9207–9211.
41. Kuroiwa, T., M. Sakaguchi, K. Mihara, and T. Omura. 1991. Systematic
analysis of stop-transfer sequence for microsomal membrane. J. Biol. Chem.
266:9251–9255.
42. Lang, M., K. E. Apt, and P. G. Kroth. 1998. Protein transport into “complex”
diatom plastids utilizes two different targeting signals. J. Biol. Chem. 273:
30973–30978.
43. Lang, M., and P. G. Kroth. 2001. Diatom fucoxanthin chlorophyll a/c-binding protein (FCP) and land plant light-harvesting proteins use a similar
pathway for thylakoid membrane insertion. J. Biol. Chem. 276:7985–7991.
44. Lin, Q., L. Ma, W. Burkhart, and L. L. Spremulli. 1994. Isolation and
characterization of cDNA clones for chloroplast translational initiation factor-3 from Euglena gracilis. J. Biol. Chem. 269:9436–9444.
45. Long, M., S. J. de Souza, C. Rosenberg, and W. Gilbert. 1996. Exon shuffling
and the origin of the mitochondrial targeting function in plant cytochrome c1
precursor. Proc. Natl. Acad. Sci. USA 93:7727–7731.
46. May, T., and J. Soll. 2000. 14-3-3 proteins form a guidance complex with
chloroplast precursor proteins in plants. Plant Cell 12:53–64.
47. McFadden, G. I. 1999. Plastids and protein targeting. J. Eukaryot. Microbiol.
46:339–346.
48. Muchhal, U. S., and S. D. Schwartzbach. 1994. Characterization of the
unique intron-exon junctions of Euglena gene(s) encoding the polyprotein
precursor to the light-harvesting chlorophyll a/b binding protein of photosystem II. Nucleic Acids Res. 22:5737–5744.
49. Nassoury, N., M. Cappadocia, and D. Morse. 2003. Plastid ultrastructure
defines the protein import pathway in dinoflagellates. J. Cell Sci. 116:2867–
2874.
50. Nassoury, N., and D. Morse. 2005. Protein targeting to the chloroplasts of
photosynthetic eukaryotes: getting there is half the fun. Biochim. Biophys.
Acta 1743:5–19.
51. Nielsen, H., and A. Krogh. 1998. Prediction of signal peptides and signal
anchors by a hidden Markov model. Proc. Int. Conf. Intell. Syst. Mol. Biol.
6:122–130.
52. Nowitzki, U., G. Gelius-Dietrich, M. Schwieger, K. Henze, and W. Martin.
2004. Chloroplast phosphoglycerate kinase from Euglena gracilis: endosymbiotic gene replacement going against the tide. Eur. J. Biochem. 271:4123–
4131.
53. Osafune, T., S. Sumida, J. A. Schiff, and E. Hase. 1991. Immunolocalization
of LHCP II apoprotein in the Golgi during light-induced chloroplast development in non-dividing Euglena cells. J. Electron Microsc. 40:41–47.
54. Pancic, P. G., and H. Strotmann. 1993. Structure of the nuclear encoded γ
subunit of CFoCF1 of the diatom Odontella sinensis including its presequence. FEBS Lett. 320:61–66.
55. Patron, N. J., R. F. Waller, J. M. Archibald, and P. J. Keeling. 2005.
Complex protein targeting to dinoflagellate plastids. J. Mol. Biol. 348:1015–
1024.
56. Plaumann, M., B. Pelzer-Reith, W. F. Martin, and C. Schnarrenberger.
1997. Multiple recruitment of class-I aldolase to chloroplasts and eubacterial
origin of eukaryotic class-II aldolases revealed by cDNAs from Euglena
gracilis. Curr. Genet. 31:430–438.
57. Ralph, S. A., B. J. Foth, N. Hall, and G. I. McFadden. 2004. Evolutionary
pressures on apicoplast transit peptides. Mol. Biol. Evol. 21:2183–2194.
58. Robinson, C. 2000. The twin-arginine translocation system: a novel means of
transporting folded proteins in chloroplasts and bacteria. Biol. Chem. 381:
89–93.
59. Rogers, M. B., J. M. Archibald, M. A. Field, C. Li, B. Striepen, and P. J.
Keeling. 2004. Plastid-targeting peptides from the chlorarachniophyte
Bigelowiella natans. J. Eukaryot. Microbiol. 51:529–535.
60. Saaf, A., E. Wallin, and G. von Heijne. 1998. Stop-transfer function of
pseudo-random amino acid segments during translocation across prokaryotic
and eukaryotic membranes. Eur. J. Biochem. 251:821–829.
61. Santillán-Torres, J. L., A. Atteia, M. G. Claros, and D. González-Halphen.
2003. Cytochrome f and subunit IV, two essential components of the photosynthetic bf complex typically encoded in the chloroplast genome, are
nucleus-encoded in Euglena gracilis. Biochim. Biophys. Acta 1604:180–189.
62. Schnell, D. J. 1998. Protein targeting to the thylakoid membrane. Annu. Rev.
Plant Physiol. Plant Mol. Biol. 49:97–126.
63. Schwartzbach, S. D., T. Osafune, and W. Löffelhardt. 1998. Protein import
into cyanelles and complex chloroplasts. Plant Mol. Biol. 38:247–263.
64. Sharif, A. L., A. G. Smith, and C. Abell. 1989. Isolation and characterisation
of a cDNA clone for a chlorophyll synthesis enzyme from Euglena gracilis.
The chloroplast enzyme hydroxymethylbilane synthase (porphobilinogen
deaminase) is synthesised with a very long transit peptide in Euglena. Eur.
J. Biochem. 184:353–359.
65. Shashidhara, L. S., S. H. Lim, J. B. Shackleton, C. Robinson, and A. G.
Smith. 1992. Protein targeting across the three membranes of the Euglena
chloroplast envelope. J. Biol. Chem. 267:12885–12891.
66. Shigemori, Y., J. Inagaki, H. Mori, M. Nishimura, S. Takahashi, and Y.
Yamamoto. 1994. The presequence of the precursor to the nucleus-encoded
30 kDa protein of photosystem II in Euglena gracilis Z includes two hydrophobic domains. Plant Mol. Biol. 24:209–215.
67. Sláviková, S., R. Vacula, Z. Fang, T. Ehara, T. Osafune, and S. D.
Schwartzbach. 2005. Homologous and heterologous reconstitution of Golgi
to chloroplast transport and protein import into the complex chloroplasts of
Euglena. J. Cell Sci. 118:1651–1661.
68. Steiner, J. M., and W. Löffelhardt. 2005. Protein translocation into and
within cyanelles. Mol. Membr. Biol. 22:123–132.
69. Sulli, C., Z. Fang, U. Muchhal, and S. D. Schwartzbach. 1999. Topology of
Euglena chloroplast protein precursors within endoplasmic reticulum to
Golgi to chloroplast transport vesicles. J. Biol. Chem. 274:457–463.
70. Sulli, C., and S. D. Schwartzbach. 1995. The polyprotein precursor to the
Euglena light-harvesting chlorophyll a/b-binding protein is transported to the
Golgi apparatus prior to chloroplast import and polyprotein processing.
J. Biol. Chem. 270:13084–13090.
71. Sulli, C., and S. D. Schwartzbach. 1996. A soluble protein is imported into
Euglena chloroplasts as a membrane-bound precursor. Plant Cell 8:43–53.
72. Tessier, L. H., M. Keller, R. L. Chan, R. Fournier, J. H. Weil, and P.
Imbault. 1991. Short leader sequences may be transferred from small RNAs
to pre-mature mRNAs by trans-splicing in Euglena. EMBO J. 10:2621–2625.
73. Vacula, R., J. M. Steiner, J. Krajcovic, L. Ebringer, and W. Löffelhardt.
1999. Nucleus-encoded precursors to thylakoid lumen proteins of Euglena
gracilis possess tripartite presequences. DNA Res. 6:45–49.
74. van Dooren, G. G., S. D. Schwartzbach, T. Osafune, and G. I. McFadden.
2001. Translocation of proteins across the multiple membranes of complex
plastids. Biochim. Biophys. Acta 1541:34–53.
75. von Heijne, G. 1990. The signal peptide. J. Membr. Biol. 115:195–201.
76. von Heijne, G., J. Steppuhn, and R. G. Herrmann. 1989. Domain structure
of mitochondrial and chloroplast targeting peptides. Eur. J. Biochem. 180:
535–545.
77. Waller, R. F., P. J. Keeling, R. G. Donald, B. Striepen, E. Handman, N.
Lang-Unnasch, A. F. Cowman, G. S. Besra, D. S. Roos, and G. I. McFadden.
1998. Nuclear-encoded proteins target to the plastid in Toxoplasma gondii
and Plasmodium falciparum. Proc. Natl. Acad. Sci. USA 95:12352–12357.
78. Waller, R. F., M. B. Reed, A. F. Cowman, and G. I. McFadden. 2000. Protein
trafficking to the plastid of Plasmodium falciparum is via the secretory pathway. EMBO J. 19:1794–1802.
79. Wastl, J., and U.-G. Maier. 2000. Transport of proteins into cryptomonads
complex plastids. J. Biol. Chem. 275:23194–23198.
80. Wolter, F. P., C. C. Fritz, L. Willmitzer, J. Schell, and P. H. Schreier. 1988.
rbcS genes in Solanum tuberosum: conservation of transit peptide and exon
shuffling during evolution. Proc. Natl. Acad. Sci. USA 85:846–850.
81. Yung, S., T. R. Unnasch, and N. Lang-Unnasch. 2001. Analysis of apicoplast
targeting and transit peptide processing in Toxoplasma gondii by deletional
and insertional mutagenesis. Mol. Biochem. Parasitol. 118:11–21.
82. Yung, S. C., T. R. Unnasch, and N. Lang-Unnasch. 2003. Cis and trans
factors involved in apicoplast targeting in Toxoplasma gondii. J. Parasitol.
89:767–776.
83. Zhang, X. P., and E. Glaser. 2002. Interaction of plant mitochondrial and
chloroplast signal peptides with the Hsp70 molecular chaperone. Trends
Plant Sci. 7:14–21.