EUKARYOTIC CELL, Dec. 2006, p. 2079–2091, Vol. 5, No. 12
1535-9778/06/$08.00+0 doi:10.1128/EC.00222-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.

Analysis of Euglena gracilis Plastid-Targeted Proteins Reveals Different Classes of Transit Sequences†

Dion G. Durnford (1)* and Michael W. Gray (2)

Department of Biology, University of New Brunswick, Fredericton, New Brunswick, Canada E3B 5A3 (1), and Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1X5 (2)

Received 12 July 2006/Accepted 15 September 2006

* Corresponding author. Mailing address: Department of Biology, University of New Brunswick, Fredericton, New Brunswick, Canada E3B 5A3. Phone: (506) 452-6207. Fax: (453) 453-3583. E-mail: [email protected].
† Published ahead of print on 22 September 2006.

The plastid of Euglena gracilis was acquired secondarily through an endosymbiotic event with a eukaryotic green alga, and as a result, it is surrounded by a third membrane. This membrane complexity raises the question of how the plastid proteins are targeted to and imported into the organelle. To further explore plastid protein targeting in Euglena, we screened a total of 9,461 expressed sequence tag (EST) clusters (derived from 19,013 individual ESTs) for full-length proteins that are plastid localized to characterize their targeting sequences and to infer potential modes of translocation. Of the 117 proteins identified as being potentially plastid localized whose N-terminal targeting sequences could be inferred, 83 were unique and could be classified into two major groups. Class I proteins have tripartite targeting sequences, comprising (in order) an N-terminal signal sequence, a plastid transit peptide domain, and a predicted stop-transfer sequence. Within this class of proteins are the lumen-targeted proteins (class IB), which have an additional hydrophobic domain similar to a signal sequence and required for further targeting across the thylakoid membrane. Class II proteins lack the putative stop-transfer sequence and possess only a signal sequence at the N terminus, followed by what, in amino acid composition, resembles a plastid transit peptide. Unexpectedly, a few unrelated plastid-targeted proteins exhibit highly similar transit sequences, implying either a recent swapping of these domains or a conserved function. This work represents the most comprehensive description to date of transit peptides in Euglena and hints at the complex routes of plastid targeting that must exist in this organism.

A fundamental problem in cell biology is the precise and efficient targeting of proteins synthesized by cytoplasmic ribosomes to their appropriate intracellular locations. Proteins destined for the endomembrane system, mitochondria, or the chloroplast usually have specific N-terminal targeting domains that are required for proper subcellular localization. These leader sequences are often removed by specific proteases at the protein’s destination prior to it assuming its active conformation. For chloroplast-targeted proteins in plants and algae, an N-terminal transit peptide (TP) is both necessary and sufficient for correct plastid targeting (11). Transit peptides are not conserved in sequence but exhibit characteristic biochemical properties, such as an elevated content of the hydroxylated amino acids serine and threonine as well as a deficiency of acidic (aspartate and glutamate) amino acids (76). Within a typical chloroplast, there are six distinct locations to which the constituent proteins must be sorted, and some proteins have to cross up to three membranes (33). This complexity requires additional targeting information within the transit peptide, such as the signal sequence-like domain found in proteins targeted to the thylakoid membrane, or information contained within the mature portion of the protein itself (62).

Plants, green algae, and red algae have plastids derived from an endosymbiotic cyanobacterium, with two membranes enveloping the chloroplast (34). Protein targeting to these plastids is fairly well understood; generally, the transit peptides direct newly synthesized proteins to the outer envelope membrane, where they interact with receptors and other components of the translocation apparatus so that protein import and subsequent sorting can take place (33). Many of the translocation components present in the outer and inner envelopes have been identified in plants (31).

Many protists, however, possess secondary plastids that are believed to have arisen from endosymbiosis with a eukaryotic alga. These organisms have complex plastids with either three membranes around the chloroplast, as occurs in the dinoflagellates and Euglena spp., or four membranes, as in the stramenopiles and haptophytes (34). The presence of additional membranes surrounding the plastid would seem to necessitate additional targeting information, complicating the process of translocation. We know, for example, that during the evolution of secondary plastids, genes from the endosymbiont were functionally transferred to the host’s nuclear genome. These genes must then be expressed and their protein products targeted back to the organelle, and this process is undoubtedly more complicated than that in the case of primary plastids. A significant hurdle in this pathway is the necessity to acquire appropriate targeting information that allows nucleus-encoded proteins to be directed to the plastid and to traverse additional membranes in the process. Understanding the mechanism of targeting and translocation in organisms with complex plastids has been key to understanding how the transition from algal symbiont to plastid occurred (12, 35, 47, 50, 63, 74).

In protists with four membranes around the plastid, the outermost membrane often has ribosomes attached and is typically continuous with the endoplasmic reticulum (ER) (23). Proteins directed to these plastids possess bipartite targeting sequences, with an N-terminal signal sequence (24) that directs them to the chloroplast ER, where they are cotranslationally imported across the first membrane (4, 7, 30). The domain after the signal sequence is the predicted transit peptide for transport across the inner two membranes, in a process likely to resemble translocation across plant chloroplast envelopes (43).

The euglenophytes and dinoflagellates have plastids with three membranes, the outermost of which lacks bound ribosomes. In both cases, plastid proteins are targeted through the endomembrane system (49, 53, 67, 70, 71). From studies of several complete, publicly available Euglena gracilis plastid protein sequences (13, 25, 27, 28, 38, 44, 52, 56, 61, 64, 66, 73), it was predicted that the plastid proteins have an N-terminal signal sequence, an inference that was confirmed by both in vitro (38) and in vivo (70, 71) experimental approaches. Following the signal sequence is the predicted transit peptide, which is sufficient for translocation across plant chloroplast membranes (29), and a hydrophobic region that acts as a “stop-transfer” sequence to prevent complete transport into the ER, such that the mature protein remains in the cytoplasm (69). The protein is then targeted to the plastid, likely via a vesicular transport system (67). Also described for Euglena are tripartite transit sequences that possess an additional hydrophobic domain predicted to target proteins to the thylakoid lumen (73). Because relatively few Euglena plastid protein sequences are publicly available, the study we report here more comprehensively examines the characteristics of plastid-targeting sequences.
Since many of the known Euglena proteins, including all of those for which biochemical analyses of targeting have been conducted, are encoded as polyproteins, we sought to determine whether all plastid proteins are likely to proceed to the plastid via a similar pathway in this organism. By examining the targeting sequences of a large number of plastid proteins, the majority of which are not organized as polyproteins, we have been able to define the characteristics that can be used to identify Euglena plastid-targeted proteins with high confidence and to infer modes of transport to the plastid.

MATERIALS AND METHODS

E. gracilis strain Z was cultured under several different conditions, and cDNA libraries were produced commercially in the pcDNA3.1(+) vector (DNA Technologies Inc.). Expressed sequence tag (EST) sequencing was performed at the Atlantic Genome Centre (Halifax, Nova Scotia, Canada) and the B.C. Cancer Agency (Vancouver, British Columbia, Canada). A total of 19,013 ESTs were retained following quality and vector trimming via the taxonomically broad EST database (TBestDB [http://tbestdb.bcm.umontreal.ca/searches/login.php]), under the auspices of the Protist EST Program. The ESTs were clustered to form a total of 9,461 unique groups.

To search for plastid-targeted proteins, the 9,461 clusters were translated in the three forward (plus-orientation) reading frames, and the longest open reading frame (ORF) of >19 amino acids starting with a methionine was retained for further analyses (http://maven.smith.edu/~vvouille/sumCGI/translator.html). Screening for plastid-targeted proteins was carried out in several rounds. First, all ORFs were screened for the presence of a signal sequence using the program SignalP3 (6, 51; http://www.cbs.dtu.dk/services/SignalP/). Any ORFs with a signal sequence predicted with the hidden Markov model (HMM) or the artificial neural network (NN) were retained.
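The ORF selection step described above can be sketched as follows. This is a minimal illustration, not the translator tool cited in the text: the function names `translate` and `longest_met_orf` are hypothetical, the standard genetic code is assumed, and the cutoff is read as "more than 19 residues". ORFs that run off the 3' end of an EST are kept, which is also an assumption.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame):
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_met_orf(dna, min_len=20):
    """Return the longest Met-initiated ORF over the three plus-strand frames.

    min_len=20 encodes the '>19 amino acids' cutoff (an interpretation).
    Returns None when no ORF reaches the cutoff.
    """
    best = ""
    for frame in range(3):
        # Stop codons ('*') split the translation into candidate segments.
        for segment in translate(dna, frame).split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best if len(best) >= min_len else None
```

Only the plus strand is translated, mirroring the "plus orientation" restriction in the text; reverse-complement frames would need to be added for unoriented libraries.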
All selected ORFs were then rescreened, and those having a clear role in plastid function and/or those whose top BLAST hit in the NCBI nonredundant (nr) database was plant, algal, or cyanobacterial in origin were segregated for further consideration. Finally, the putative plastid-targeted proteins were screened further according to the following criteria: (i) the top BLAST hit (NCBI nonredundant database) was plant/algal or cyanobacterial and/or the protein has a clear role in plastid function, and (ii) the BLASTp E value was ≤1e-05. The ORF was considered to possess a complete transit sequence when (i) there was evidence for a spliced leader sequence (TTTTTTTCG) at the 5′ end of the cDNA that would indicate that the cDNA was full length (72), (ii) there was an extension of the ORF toward the N terminus upstream of the first region of evident amino acid sequence similarity following a BLASTp search, and (iii) the beginning of the mature protein was identified by comparison with orthologous proteins.

Potential membrane-spanning regions were identified using the hidden Markov model-based program TMHMM (39; http://www.cbs.dtu.dk/services/TMHMM/). Hydrophobicity plots were generated using the ProtScale program at the ExPASy site (http://www.expasy.org/tools/protscale.html), using a Kyte-Doolittle scale with a sliding window length of 7 or 19 residues, as indicated. The amino acid content of peptides was calculated using the PEPSTATS program in the EMBOSS package, available at AnaBench (http://anabench.bcm.umontreal.ca/anabench/Anabench-Jsp/Welcome.jsp). Sequence logo displays were generated using the online program WebLogo (weblogo.berkeley.edu/logo.cgi).

Nucleotide sequence accession numbers. All individual EST sequences have been deposited in the NCBI dbEST database under accession numbers EG565093 to EG565263.

RESULTS

From 9,461 individual Euglena EST clusters, a total of 117 full-length plastid proteins were identified.
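The Kyte-Doolittle hydropathy analysis named in Materials and Methods amounts to a sliding-window average over per-residue scores. The sketch below uses the published Kyte-Doolittle values and an unweighted window; the function name `hydropathy_profile` is hypothetical, and the ProtScale server may apply window-edge weighting that is not reproduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean Kyte-Doolittle score of every full window along the sequence.

    Returns one value per window position; an empty list is returned when
    the sequence is shorter than the window. Non-standard residue codes
    raise KeyError rather than being silently scored.
    """
    scores = [KD[aa] for aa in protein.upper()]
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

A window of 19 smooths the profile enough to reveal membrane-spanning stretches as broad positive peaks, whereas a window of 7 preserves shorter hydrophobic patches.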
Eliminating nearly identical isoforms from the data set left a total of 83 unique proteins for further analysis (Table 1). In addition to functioning in basic photosynthetic reactions, the proteins identified had predicted roles in the biosynthesis of proteins, lipids, carotenoids, and chlorophyll. Proteins involved in signal transduction and plastid metabolism were also found. Through determination of the N-terminal-most regions of sequence similarity in BLASTp searches, targeting domains were delineated and found to be very long, with an average size (± standard deviation) of 152 ± 25 residues (Table 1). The shortest estimated presequence was 95 residues, for RubisCO activase, and the longest was 211 (Albino3) (Table 1).

Of the few Euglena plastid proteins examined to date, all possess an N-terminal region similar to a eukaryotic signal sequence. Thus, the first strategy for identifying plastid-targeted proteins was to search for the presence of such a sequence, using SignalP3. Of the final group of 83 plastid-targeted proteins examined using the SignalP hidden Markov model, 68% were predicted to possess a signal sequence. This value dropped to 56% when the artificial NN was employed. In cases where SignalP did not predict a signal peptide but other screens indicated a potential plastid-targeted protein, there was nevertheless a clear hydrophobic region characteristic of a signal peptide. Based on the NN predictions for the signal sequence cleavage sites, the Euglena signal sequence was estimated to be 33 ± 9 residues long (range, 18 to 59 residues). The predicted cleavage site was consistent with that in other eukaryotic signal sequences (Fig. 1C).

E. gracilis plastid-targeting sequences can be divided into two classes. Class I plastid-targeting sequences are designated by analogy to a similar type of targeting domain identified in dinoflagellates (55).
This class encompassed 89% of the Euglena proteins examined, which were characterized by the presence of two hydrophobic regions that are predicted by the TMHMM program to be transmembrane helices (TMH) (Fig. 1A). Figure 1A shows the average TMHMM probability for the class I plastid-targeting regions, with the first predicted transmembrane helix (TMH1) corresponding to the hydrophobic domain of a classic signal sequence (75). A basic amino acid precedes the first TMH in all but six proteins, with the average charge of this N-terminal region being +1.6.

TABLE 1. EST clusters
Cluster IDa Class IA proteins 0726 0899 1043 1116 1127 1204 1312 1428 1495 1503 1573 1674 1706 2042 2448 2566 2596 2669 2795 2990 3121 3164 3171 3330 3362 3372 3375 3383 3449 3469 3474 3482 3500 3504 3558 3594 3603 3619 3635 3653 3673 3676 3817 3830 3881 3900 3911 3934 3943 3946 3996 4008 4056 7084 7147 7392 7739 7766 8108 8254 8643 8888 9366 Class IB proteins 3955 4026 3381 3249 3902 Annotation Ferredoxin 50S ribosomal protein L3 Putative ferredoxin 30S ribosomal protein S20 Zeta-carotene desaturase Uroporphyrinogen decarboxylase Putative ferredoxin RubisCO small subunit Glutaredoxin 2 Membrane-associated 30-kDa protein Putative ferredoxin Sugar nucleotide phosphorylase 50S ribosomal protein L34 Peptidyl-prolyl cis-trans isomerase Ferredoxin-like protein Ribose-5-phosphate isomerase Ycf53 (tetrapyrrole-binding protein) 50S ribosomal protein L11 D-Ribulose-5-phosphate 3-epimerase Albino 3 Chaperonin PSII quinone-binding protein Rhodanese domain-containing protein Photosystem II 22-kDa protein Coproporphyrinogen III oxidase ATP synthase delta chain 50S ribosomal protein L15 Light-regulated Chlp-localized protein ATP synthase gamma chain Cytochrome f Porphobilinogen deaminase Probable membrane-associated 30-kDa protein Fructose-1,6-bisphosphatase Glu 1-semialdehyde 2,1-aminomutase Carbonic anhydrase Carbonic anhydrase 50S
ribosomal protein L28 Peroxiredoxin precursor 50S ribosomal protein L21 Coproporphyrinogen III oxidase Delta 12 fatty acid desaturase Carbonic anhydrase 30S ribosomal protein S1 Acyl carrier protein Ferredoxin ATP/ADP transporter PsbM LHCI Ferredoxin-NADP⫹ reductase NADPH protochlorophyllide reductase RuBisCO activase LHCI LHCI CP29 Chl. synthase 33-kDa subunit SOUL-heme-binding protein ATP-dependent Clp protease Ycf3 (PSI assembly) RuBisCO 60-kDa chaperonin YebC-related protein Chlorophyll b synthase Uroporphyrinogen decarboxylase 3-Isopropylmalate dehydrogenase Photosystem II family protein Oxygen evolving enhancer (OEE1) Oxygen evolving enhancer (OEE2) HCF136 (PSII stability factor) Putative ascorbate peroxidase Cytochrome c6 BLASTp score Presequence length (aa) TP length (aa)b TMH1 positionc Arabidopsis thaliana Cyanophora paradoxa Arabidopsis thaliana Synechococcus elongatus Oryza sativa Anopheles gambiae Arabidopsis thaliana Euglena gracilis Actinobacillus actinomycetemcomitans Pisum sativum Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Oryza sativa Rhizobium loti Spinacia oleracea Synechococcus elongatus Odontella sinensis Arabidopsis thaliana Bigelowiella natans Arabidopsis thaliana 3e⫺06 3e⫺12 4e⫺09 4e⫺08 1e⫺22 2e⫺31 3e⫺09 e⫺118 6e⫺10 148 168 144 170 182 149 143 120 130 60 52 65 64 55 60 58 56 55 2e⫺10 6e⫺09 7e⫺12 7e⫺06 6e⫺07 8e⫺05 2e⫺24 2e⫺12 3e⫺12 1e⫺24 5e⫺37 8e⫺30 168 147 168 190 178 120 144 182 186 150 211 189 Oryza sativa Arabidopsis thaliana Chlamydomonas reinhardtii Nicotiana tabacum Bigelowiella natans Solanum tuberosum Odontella sinensis Euglena gracilis Euglena gracilis Synechocystis sp. 6e⫺08 3e⫺17 e⫺108 7e⫺22 1e⫺14 4e⫺20 7e⫺78 4e⫺91 0 7e⫺49 Bigelowiella natans Chlorarachnion sp. Deinococcus radiodurans Deinococcus radiodurans Toxoplasma gondii Chlamydomonas reinhardtii Thermoanaerobacter tengcongensis Chlamydomonas reinhardtii Phaeodactylum tricornutum Deinococcus radiodurans Chlamydomonas reinhardtii Synechocystis sp. 
Euglena viridis Galdieria sulfuraria Zea mays Euglena gracilis Chlamydomonas reinhardtii Chlorarachnion sp. Chlorococcum littorale Euglena gracilis Euglena gracilis Oryza sativa Anabaena sp. Arabidopsis thaliana Vibrio cholerae Physcomitrella patens Arabidopsis thaliana Arabidopsis thaliana Dunaliella salina Ashbya gossypii Bifidobacterium longum Arabidopsis thaliana Euglena gracilis Lycopersicon esculentum Arabidopsis thaliana Lycopersicon esculentum Euglena gracilis Organism with top BLASTp hit TMH2 positionc Id 7–29 13–35 7–29 13–35 21–43 21–38 11–33 4–26 7–24 89–111 87–109 94–113 99–121 98–121 98–120 91–110 82–104 79–101 2 2 1 2 2 2 1 2 2 75 61 66 63 67 58 61 58 55 58 77 94 7–29 13–35 21–43 13–35 7–26 15–37 20–42 12–34 19–41 21–40 29–51 3–25 104–126 96–118 109–131 98–117 93–117 95–117 103–125 92–114 96–118 98–120 128–150 119–136 1 1 1 1 1 1 1 1 2 2 1 1 133 152 156 147 191 120 137 147 151 151 58 67 53 57 54 60 60 60 56 63 5–24 21–40 22–44 15–37 24–46 12–31 13–35 7–26 17–39 7–26 82–104 107–126 97–119 94–116 100–119 91–110 95–113 86–108 95–112 89–111 1 1 2 2 1 1 3 2 2 1 2e⫺71 e⫺148 7e⫺28 1e⫺10 4e⫺18 8e⫺85 3e⫺11 1e⫺78 2e⫺98 8e⫺30 7e⫺32 6e⫺13 1e⫺41 0 0.017 5e⫺86 e⫺144 5e⫺69 e⫺145 e⫺116 0 7e⫺57 9e⫺10 7e⫺20 1e⫺44 3e⫺27 5e⫺37 7e⫺17 8e⫺18 1e⫺17 7e⫺13 4e⫺11 188 138 102 140 160 134 168 169 162 179 233 122 138 148 154 179 114 155 95 158 141 136 141 136 143 162 122 108 114 164 150 137 56 53 52 62 55 57 52 60 77 68 66 49 56 52 63 50 51 53 59 55 51 50 45 68 58 64 67 57 70 63 71 61 20–37 7–29 5–24 13–35 13–35 5–27 13–35 29–51 13–30 13–32 4–26 15–34 17–39 12–34 13–35 13–35 5–27 12–34 13–35 13–35 13–35 12–34 15–34 5–22 15–37 20–42 7–29 7–29 6–22 17–39 13–35 13–35 93–115 82–99 76–98 97–119 90–112 84–106 87–109 111–133 107–129 100–122 92–114 83–105 95–117 86–105 98–120 85–107 78–100 87–109 72–101 90–109 86–108 84–106 79–101 90–112 95–117 106–128 96–118 86–108 92–111 102–124 106–128 96–118 1 2 3 1 2 2 1 3 2 4 1 1 3 1 3 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 e⫺116 6e⫺17 5e⫺07 2e⫺04 4e⫺69 
142 153 142 184 123 53 49 52 70 60 5–27 20–42 7–29 13–35 29–51 80–99 91–113 81–103 105–127 111–133 3 2 1 1 2

TABLE 1 (continued)
Cluster IDa 3752 2674 Class II proteins 3630 3294 0923 4012 2060 2416 3797 4932 8550 3784 9282 6808 2660 Annotation Organism with top BLASTp hit BLASTp score Presequence length (aa) TP length (aa)b TMH1 positionc TMH2 positionc Id 95–114 108–127 2 1 PSI subunit III (PsaF) Thylakoid luminal 17.4-kDa protein Chlamydomonas reinhardtii Arabidopsis thaliana 6e-53 5e-22 144 171 60 71 13–35 15–37 Photosystem II (PsbW) ABC transporter (cytochrome c biogenesis) PEP/phosphate translocator Oxygen evolving enhancer (OEE3) Mg-protoporphyrin IX methyltransferase Peptide chain release factor (RF) 2 PSI subunit IV (PsaE) 50S ribosomal protein L9 Short-chain (SC) dehydrogenase Phosphoribulokinase MECP synthase Squalene and phytoene synthases ClpB Chlorarachnion sp. Nostoc punctiforme 4e-15 5e-33 82 175 52 135 20–37 34–53 3 1 Phaeodactylum tricornutum Chlamydomonas reinhardtii Synechococcus elongatus 4e-10 3e-22 4e-17 166 61 66 132 36 40 13–35 13–35 5–27 1 2 1 Synechocystis sp. Chlamydomonas reinhardtii Bigelowiella natans Prochlorococcus marinus Vaucheria litorea Arabidopsis thaliana Prochlorococcus marinus Phaseolus lunatus 3e-42 6e-17 6e-05 8e-07 1e-76 2e-36 1e-27 8e-48 99 95 62 120 100 121 98 123 70 61 39 82 75 80 47 76 13–35 15–37 15–33 29–51 20–42 28–50 35–52 37–52 1 3 1 2 1 1 1 1

a Original cluster IDs had “EEL0000” preceding the 4-digit numbers shown.
b For class I proteins, this is the region between the signal sequence and the stop-transfer region.
c TMH1 and TMH2 are the hydrophobic domains (range of amino acids is given from the start Met) of the signal sequence and stop-transfer sequence, respectively, as predicted by the TMHMM program. Underlined regions indicate that the TMHMM program did not predict a TMH (TMHMM value, 0.1 < P < 0.9) but that a hydrophobic patch is apparent from a Kyte-Doolittle analysis.
d Number of nearly identical isoforms detected.

In only one case is the N-terminal region negatively charged (Table 1, cluster 3881 [ATP/ADP transporter]). The location of the second TMH is remarkably consistent, at 60 ± 8 amino acids following the end of the first predicted TMH, with a range of 45 to 94 amino acids. We designate this localization the “60 ± 8 rule” (Fig. 1A). The properties of the amino acids within the targeting regions of selected plastid-localized proteins are shown in Fig. 1B. In this figure, the hydrophobic regions (gray) are obvious.

The presence of the two TMH motifs separated by 60 ± 8 amino acids had excellent discriminating power for identifying potential plastid-targeted proteins. For class I targeting sequences, the TMHMM program was able to predict upwards of 95% of the plastid proteins simply by searching for N-terminal regions with TMHs according to the 60 ± 8 rule. If we combined the entire set of predicted plastid proteins (all classes), the TMHMM program would have an overall success rate of 82%. In cases where the TMHMM probability did not meet the threshold for formal TMH prediction (Table 1, underlined values), the probability of a TMH was usually between 0.3 and 0.9, and the success rate would be very high if the threshold was reduced in subsequent rounds of screening. Rescreening the entire population of ORFs using the 60 ± 8 rule detected all of the class I proteins listed in Table 1, including isoforms, plus an additional 25 proteins classified as unknowns (data not shown). The TP domains of dinoflagellates, whose plastid leader sequences have a similar structure (49), are about half the size (25 ± 8 residues) (data not shown) of those of Euglena proteins.
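A screen based on the 60 ± 8 rule can be sketched as a check on the spacing between two predicted helices. The code below is an illustration under stated assumptions: the names `tmh_spacing` and `classify_presequence` are hypothetical, the helix coordinates are taken as 1-based inclusive ranges like those in Table 1, and the accepted window defaults to the observed range of 45 to 94 residues, because the text does not state the exact tolerance used for rescreening. The spacing is computed as TMH2 start minus TMH1 end, which reproduces the TP lengths listed for the first rows of Table 1 (for example, 7–29 and 89–111 give 60).

```python
def tmh_spacing(tmh1, tmh2):
    """Distance from the end of TMH1 to the start of TMH2.

    Each helix is a (start, end) tuple counted from the start Met.
    """
    return tmh2[0] - tmh1[1]


def classify_presequence(tmhs, min_gap=45, max_gap=94):
    """Assign a presequence to 'class I', 'class II', or 'unclassified'.

    tmhs lists the predicted helices in the presequence in N-to-C order.
    One helix (signal sequence only) is treated as class II; two helices
    with spacing inside [min_gap, max_gap] are treated as class I.
    The window bounds are assumptions taken from the reported range.
    """
    if not tmhs:
        return "unclassified"
    if len(tmhs) == 1:
        return "class II"
    gap = tmh_spacing(tmhs[0], tmhs[1])
    return "class I" if min_gap <= gap <= max_gap else "unclassified"
```

Narrowing the window to 52 to 68 would apply the rule as one standard deviation around the mean; the wider default trades specificity for recall of the class I proteins already identified.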
Class IB proteins (Table 1) also possess two predicted TMHs separated by 60 ± 8 amino acids, but they have a third hydrophobic domain with a mean distance of 17 residues (range, 7 to 25 residues) downstream of the end of TMH2 (Fig. 2). This region resembles a prokaryotic signal sequence and is postulated to function in the targeting of proteins to the thylakoid lumen (73). We identified five proteins that are homologous to thylakoid lumen-localized proteins and for which biochemical evidence for this location exists (four of these class IB proteins are shown in Fig. 2, along with a lumen-targeted class II protein [see below]). Two additional proteins are predicted to function in the lumen, based on their annotation as well as their possession of a putative lumen-targeting domain (LTD). Three of the seven class IB proteins (ascorbate peroxidase, HCF136, and OEE2) contain a double Arg immediately preceding the third hydrophobic domain (data not shown); another two class IB proteins (PSI-III and cytochrome c6) have the same motif within six amino acids of the start of the hydrophobic LTD, suggesting that the twin-arginine translocation (Tat) pathway (58) is functional in Euglena.

Class II targeting sequences in Euglena represent a departure from the class I type in that they lack the second TMH region upstream of the region specifying the mature protein, and hence do not conform to the 60 ± 8 rule (Fig. 3). The TMHMM probability scatter plot shows the presence of the hydrophobic region associated with the signal sequence in all class II proteins. This class represents 14% of the identified population of plastid-targeted proteins. Of the 13 class II proteins delineated so far, 6 have unambiguous functions in the plastid, while the others conceivably could be targeted elsewhere. However, they all possess signal sequence-like N termini, and their predicted functions are expected to occur within the plastid.
Each of these proteins is also related to homologs from photosynthetic taxa, as gauged by the top BLASTp hit (Table 1), supporting a putative plastid localization. The OEE3 protein, which is located within the thylakoid lumen, exhibits a second hydrophobic region (Fig. 3, arrow) that represents an LTD analogous to that found in class IB targeting domains. The estimated presequence length is 100 amino acids. All of the class II sequences have a spliced leader sequence, indicating that they are not class I sequences that were artifactually truncated upstream of the stop-transfer domain.

FIG. 1. Characteristics of class I targeting sequences of Euglena. (A) Averaged TMHMM probabilities for 70 class I proteins identified in this study. Because the region upstream of the first TMH is of variable length (range, 2 to 32 amino acids; mean, 12.7 ± 6.7 amino acids), the data were normalized to a starting TMHMM probability of ≥0.1, which corresponds to the beginning of a predicted membrane-spanning region, and then averaged. The error bars show 2 standard errors. Key features of a Euglena class I targeting sequence are depicted above the graph. (B) Overview (MacClade) of amino acid categories of the targeting sequences of selected plastid-targeted proteins. Colors represent different amino acids, as follows: gray, hydrophobic and nonpolar (A, C, F, G, I, L, M, P, V, W, and Y); red, acidic (D and E); purple, basic (H, K, and R); yellow, hydroxylated (S and T); and blue, polar (Q and N). (C) Sequence logo plot showing occurrence of amino acids around the signal sequence cleavage site (arrow) predicted by SignalP (neural net). The y axis is displayed as bits, as described at weblogo.berkeley.edu/logo.cgi.

Plastid transit peptides of class I and II proteins. In plants, targeting of proteins to the chloroplast is mediated by a transit peptide (for a review, see reference 11).
Although sequence conservation per se is lacking, there is a general maintenance of certain chemical properties, including enrichment for the hydroxylated amino acids serine and threonine and a deficiency in acidic residues (76). In Euglena class I proteins, the intervening region between the two TMHs likely functions as a plastid TP (29). For class II proteins, we predicted that the region immediately following the signal sequence must have a role in targeting to the plastid. The exact length of the putative TP was difficult to assess, as we had little confidence in the ability of ChloroP to correctly predict the cleavage site, and thus the values in Table 1 are only estimates. However, from the predicted signal sequence site to the first region of clear sequence similarity to known proteins, the length ranged from 36 to 135 amino acids.

FIG. 2. Kyte-Doolittle hydropathy plots for class IB plastid-targeting sequences of Euglena. Hydrophobicity plots for five confirmed lumen-targeted proteins are shown. The analyses were conducted with a window size of 19, and the hydrophobic regions (positive scores) corresponding to the TMHs of the signal sequence (SS) and the stop-transfer sequence (ST) are indicated with black bars. The hydrophobic region corresponding to the LTD is indicated with gray bars. Oxygen-evolving enhancer 3 (OEE3) has a class II targeting sequence and thus lacks the typical ST region. TP, transit peptide; MP, mature protein.

FIG. 3. Characteristics of class II targeting sequences of Euglena plastid proteins. (A) Scatter plot showing TMHMM probability for the first 100 amino acids. Because the region before the first TMH is of variable length, the data were normalized to a starting TMHMM probability of ≥0.1. In all cases, a second TMH 60 ± 8 amino acids downstream from the first was absent. The hydrophobic region centered at position 45 is the LTD of OEE3. (B) Overview (McClade) of amino acid categories of the targeting sequences of class II plastid-targeted proteins. Colors represent defined categories of amino acids, as indicated in the legend to Fig. 1. The black arrowhead indicates the predicted signal sequence cleavage site.

To test whether the class II TP region was similar to that of class I targeting sequences and to determine the chemical properties of both TP domains compared to the TPs of green algae and plants, we examined their amino acid compositions. We also compared these compositions to those of the mature region of proteins with class I targeting domains as well as selected Chlamydomonas proteins. The amino acid composition was calculated from the entire intervening region between the TMH regions of class I proteins (the predicted transit peptide), the estimated transit peptide from class II proteins that was located after the signal sequence and before the predicted start of the mature protein, and the entire coding region from all proteins having class I targeting sequences. The data for selected amino acids and amino acid categories are shown in the form of box-and-whisker plots (Fig. 4). Since plastid transit peptides are reportedly enriched in hydroxylated amino acids and deficient in acidic amino acids (76), we analyzed a priori these amino acid categories in the putative TPs of class I and class II targeting sequences of Euglena in addition to a selection of 25 predicted TPs from Chlamydomonas proteins (Fig. 4). The region immediately downstream of the signal sequence in class I and II targeting sequences was significantly enriched in Ser and Thr (22% and 17%, respectively) compared to the mature regions of proteins with class I targeting sequences (11%) (one-way analysis of variance [ANOVA] and Tukey’s test [α ≤ 0.05]). The TPs of Chlamydomonas proteins were similarly enriched in Ser/Thr (17%) compared to the mature portions of the proteins (11%) (Fig. 4).
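The composition measures compared here (percent Ser/Thr, acidic, and basic residues) are simple residue counts over a region. A minimal sketch follows, assuming plain one-letter protein strings; the function names `percent_of` and `composition_summary` are hypothetical, and this is not the PEPSTATS implementation.

```python
def percent_of(seq, residues):
    """Percentage of positions in seq occupied by any of the given residues."""
    seq = seq.upper()
    if not seq:
        return 0.0
    hits = sum(seq.count(residue) for residue in residues)
    return 100.0 * hits / len(seq)


def composition_summary(seq):
    """Percent content for the residue categories discussed in the text."""
    return {
        "ser_thr": percent_of(seq, "ST"),  # hydroxylated residues
        "acidic": percent_of(seq, "DE"),   # Asp and Glu
        "basic": percent_of(seq, "HKR"),   # His, Lys, and Arg
    }
```

Applying `composition_summary` separately to the transit peptide region and to the mature region of each protein would yield the per-protein values that feed a comparison such as the ANOVA described above.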
FIG. 4. Amino acid composition analyses of the predicted TPs of class I and II targeting sequences compared to the mature proteins (MP). The amino acid compositions of the intervening region between TMH1 and TMH2 of class I targeting sequences (TP, I; n = 70), the predicted transit peptide region for class II proteins (TP, II; n = 13), and the mature protein regions from class I proteins (MP, I; n = 70) were determined. Also shown are the amino acid compositions of Chlamydomonas reinhardtii TPs (TP, Cr; n = 25) and mature proteins (MP, Cr; n = 25). Box-and-whisker plots were used to represent the data and are based on quartiles around the median value. The box encloses 50% of the data, with 25% above and below the median (solid line). Each whisker represents the data range of an additional 25% of the data. The existence of outliers beyond the 5% and 95% confidence ranges is indicated with a solid dot where applicable. Categories indicated with different letters on the plot are significantly different (one-way ANOVA and Tukey’s test [α ≤ 0.05]). All data were normal except for the Lys content in class II peptides, in which case nonparametric statistics were used to assess differences.

The putative transit peptide regions of class I and II targeting sequences were also significantly depleted in acidic amino acids (Asp and Glu) compared to the mature regions of the same proteins (Fig. 4) (one-way ANOVA and Tukey’s test [α ≤ 0.05]). The predicted transit peptide regions were also found to have a higher Ala and Pro content than the mature portions of proteins (Fig. 4) (one-way ANOVA and Tukey’s test [α ≤ 0.05]). However, given that 20 tests were conducted and that the amino acid composition is not truly independent, there is a possibility that some of these differences could be by chance.
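The caution about 20 tests can be made concrete with a short calculation. The sketch below assumes independent tests, which the text itself notes is not strictly true for amino acid composition, so the numbers are only a rough guide; the function names are hypothetical. Under that assumption, the chance of at least one false positive across 20 tests at α = 0.05 is 1 − 0.95^20, or about 0.64.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that caps the familywise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    print(round(familywise_error(20), 2))  # chance of one or more false positives
    print(bonferroni_alpha(20))            # stricter per-test cutoff
```

A Bonferroni-style cutoff is one conservative option; it is shown here only to illustrate the scale of the correction and is not a procedure reported in the study.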
Although the Chlamydomonas TP exhibited a clear elevation in Ala content, there was no difference in the amount of Pro compared to that in the Euglena TPs. In terms of charged amino acids, the TP region is deficient in acidic amino acids, yet there is little significant change in the content of basic (His, Lys, and Arg) residues compared to the mature regions of the same proteins. However, examination of Lys and Arg separately reveals discrimination against Lys in the TP regions of class I and II targeting sequences (mean, 1.6% and 2.1%, respectively) compared to the mature proteins (mean, 5.8%; P < 0.001 [Kruskal-Wallis]) (Fig. 4). There were no significant differences in Arg content between the predicted transit peptides and the mature portions of the same proteins. Chlamydomonas TPs discriminate strongly against acidic amino acids (mean, 0.2%) and have an elevated content of Arg compared to the mature regions of the same proteins. Unlike Euglena, Chlamydomonas shows no bias against Lys in the TP. Without exception, the amino acid compositions of the Euglena class I and II transit peptides were the same, and both were significantly different from the composition of the mature protein (Fig. 4). To examine the distribution of the acidic amino acids further, the class I transit peptide region was divided into thirds, and the acidic amino acid content was calculated (Fig. 5). From this analysis, an asymmetric distribution of acidic amino acids was apparent, such that the first third (TP1) lacked acidic residues (1%) while the latter third (TP3) had the same acidic content as the mature protein (11%). The Ser/Thr compositions of the putative TPs were not different among the three regions (TP1-3) (Fig. 5). The basic amino acid composition was the same within the three TP regions and the mature protein (Fig. 5).

FIG. 5. Amino acid composition analysis of the plastid TP domain of class I targeting sequences. Each TP region was divided into three equal segments (TP1-3), and the basic (H, K, and R), acidic (D and E), and serine/threonine (Ser/Thr) contents were calculated. These values were compared to the averaged amino acid composition of the mature protein (MP).

Overall, the putative TP domains of the two classes of Euglena targeting sequences have the same amino acid composition, and this composition resembles that of plant chloroplast transit peptides (11, 20, 76) in terms of an elevated content of Ser/Thr. These putative TP domains were also predicted to be transit peptides by using ChloroP (18), with apparent success rates of 83% and 67% for class I and II targeting sequences, respectively, when the signal sequence domain was removed. Surprisingly, the success rates were still respectable when the signal sequence was retained during the analysis (71% and 50%) but not when the entire targeting sequence was removed. One notable exception is the lumen-targeted protein OEE3, with a class II targeting sequence that has a mere seven amino acids between the end of the TMH (the signal sequence) and the putative hydrophobic LTD. Two of the seven residues are basic amino acids (no acidic residues), but the region immediately after the LTD is strongly acidic (data not shown). Euglena TPs, like others in the green alga lineage, lack a requirement for a Phe at the N terminus that is commonly observed in chromalveolates (15, 55, 57) and glaucophytes (68) and that is essential for plastid import in vivo in diatoms (37).

Stop-transfer sequences are a predicted feature of class I proteins. Stop-transfer sequences function to halt the cotranslational import of proteins into the ER and serve an important role in determining the orientation of a protein in the membrane (8). For Euglena, it has been proposed that the second TMH acts as a stop-transfer sequence (69).

FIG. 6. (A) Kyte-Doolittle hydrophobicity profiles for the stop-transfer region of class I targeting sequences and the region immediately following the signal sequence of class II targeting domains. Plots begin 10 amino acid residues upstream of the start of the second TMH (for class I proteins) or the first TMH (for class II proteins), and the hydrophobicity profiles were calculated with a window size of 7 residues. The thick lines are the mean scores, and the thin lines on either side represent the 95% confidence intervals. The black bars above the hydrophobic regions indicate the location of the predicted TMH. (B) Sequence logo plot of class IA sequences when the second transmembrane helices (TMH2) were aligned. Only the regions immediately before and after TMH2 are shown.

FIG. 7. Alignment of targeting sequences from selected Euglena plastid-targeted proteins. (A) Comparison of FNR and CP29 targeting sequences. Identical amino acids are white on a black background. (B) Second group of proteins possessing similar targeting sequences. Identical amino acids compared to the top sequence are indicated by white letters on a black background. The hydrophobic regions of the signal sequence and stop-transfer domains are indicated by lines above the appropriate amino acids. The mature portions of the proteins, if shown, are indicated with double underlining.

From analysis of a large number of proteins with class I targeting sequences, it is clear that a stop-transfer sequence is a common motif in Euglena plastid-targeted proteins. In a few cases (Table 1), the second TMH region was not predicted by the TMHMM program, and the probability of having a TMH ranged from 0.1 to 0.9. Nevertheless, in these cases, subsequent hydropathy plots confirmed that these targeting domains are still strongly hydrophobic (data not shown) and therefore likely to have the same stop-transfer function.
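A sliding-window hydropathy profile of the kind referred to here can be sketched as follows. This is only an illustration using the standard Kyte-Doolittle scale with the 7-residue window quoted for Fig. 6A; the software actually used for the plots is not stated in this section, and the function name `hydropathy_profile` and the test sequence are hypothetical.

```python
# Standard Kyte-Doolittle hydropathy index for the 20 amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean Kyte-Doolittle score for every full window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical sequence: a hydrophobic stretch followed by basic residues,
# mimicking the sharp polarity change described after the second TMH.
profile = hydropathy_profile("LLLLLLLKRKRKRK", window=7)
print([round(score, 2) for score in profile])
```

Positive window means mark hydrophobic segments, so a candidate stop-transfer region that TMHMM fails to call can still be inspected for a sustained positive stretch followed by an abrupt drop.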
Immediately following the second TMH and within six residues of its end, ca. 80% of proteins of this class have two or more basic amino acids, and 97% of proteins have at least one. Only 2 of the 71 class I proteins lack a positively charged residue immediately after the TMH. The sharp change in polarity immediately after the second hydrophobic region, particularly towards positively charged residues, is apparent in the hydropathy plots encompassing this region (Fig. 6A). Class I polypeptides display a sharp decline in hydrophobicity immediately following the second TMH, a feature that presumably acts to block further insertion into the membrane. In class IB proteins, an additional hydrophobic region, the lumen-targeting domain, is located 25 to 30 amino acids further downstream. In contrast, class II polypeptides do not exhibit this sharp increase in polarity immediately after the hydrophobic section of the predicted signal sequence. The sequence logo illustrates the common occurrence of basic amino acids immediately after the hydrophobic domain of the stop-transfer domain (Fig. 6B), which is not observed after the signal sequences of class II proteins. These differences provide additional evidence that the TMH of a class II protein is not simply the second TMH of a 5′-truncated cDNA encoding a class I protein.

Some plastid transit sequences are conserved. When the plastid-targeting sequences of Euglena class I proteins were used individually as tBLASTn queries against the Euglena database, unexpected similarities in certain groups of unrelated proteins were revealed. For instance, FNR and CP29 possess nearly identical targeting sequences despite having no functional relationship (Fig. 7A). There is also a group of targeting sequences that show various degrees of similarity, particularly within the signal sequence and plastid-targeting domains of the transit peptide.
Within this group, the targeting sequences of rpL21 and rpL3 (tBLASTn E value = 5e-59) are nearly identical, and these sequences share a high degree of similarity with the targeting sequences of an acyl carrier protein (E = 2e-29) and two different light-harvesting complex (LHC) subunits (E = 2e-39 and 4e-21). Interestingly, the targeting domain of the first LHCI-like sequence shares a greater degree of similarity with those of the acyl carrier protein and ribosomal proteins than with the targeting domain of the other LHCI sequence (or of any other LHC sequence in the database). This similarity even extends into the putative stop-transfer domain, a region not expected to be conserved. In other cases, a tBLASTn search with a specific targeting sequence allowed the detection, as expected, of isoforms and members of a multigene family, a result that is attributable to gene duplication events. This search approach was able to recognize a variety of different plastid-targeting proteins, although the E values were generally >1e-20; thus, some of this similarity could simply be due to the constraints placed upon these regions by amino acid composition. In marked contrast, many other targeting sequences produced no significant hits at all. With the exception of rpL3, which was represented by a single EST, the remainder of the EST clusters analyzed here comprised multiple overlapping reads, with clear evidence of a spliced leader sequence, eliminating clustering artifacts as an explanation for the observed similarity.

DISCUSSION

The discovery of LHC precursors in Golgi dictyosomes of Euglena (53) and subsequent in vitro experiments demonstrated that the Euglena LHCII presequence does indeed possess a functional signal motif (38), an inference that is strongly supported by the study reported here.
Although the presence of a signal sequence-like region was part of our selection criteria, we found no evidence in the entire database of a plastid-targeted protein that lacked a signal sequence, suggesting that in Euglena, all plastid proteins proceed to the organelle via the endomembrane system. Although some in vitro studies have suggested the potential direct import of proteins into Euglena plastids, thereby bypassing the ER (65), the bulk of relevant biochemical work indicates that transport via the endomembrane system is required for plastid targeting (38, 67, 69–71). The endomembrane system is also important for plastid targeting in all protists with complex secondary plastids, including those with three (49, 55) and four (4, 7, 16, 19, 37, 59, 78, 79) plastid membranes. In Euglena, proteins targeted to the plastid do not fully insert into the ER lumen or the membrane during translation due to the presence of a stop-transfer domain, so the majority of the protein remains exposed in the cytoplasm (69). Indeed, in class I proteins, the presequence has a second hydrophobic region followed by positively charged amino acids, both of which are characteristics typical of stop-transfer sequences (14, 41). Although 2 of the 70 class I proteins lack positively charged amino acids immediately after the second TMH, such residues are not an absolute requirement for a stop-transfer function, with the effectiveness of targeting depending on a combination of hydrophobicity, length, and charge (14, 41, 60). The presence of a functioning stop-transfer motif in a plastid presequence is unique to Euglena and dinoflagellates. Both groups have three plastid membranes, leading Nassoury et al. (49) to suggest that the stop-transfer sequence arose from a mechanistic requirement driven by the number of plastid membranes.
It is generally agreed that Euglena and dinoflagellates are phylogenetically distant; thus, the similarities between their targeting sequences, and presumably the underlying transport mechanisms, would appear to be convergent as part of a necessary step in protein targeting. Although targeting in organisms with complex plastids first requires import of the protein into the ER, little is known about subsequent mechanisms of targeting to the plastid. In organisms with three plastid membranes, such as euglenophytes and dinoflagellates, targeting from the ER to the outer plastid membrane involves vesicular transport via the Golgi system (49, 53). The segregation of plastid-bound proteins into the proper vesicles may involve receptors located in the endomembrane system that recognize the transit peptide and direct the protein to its appropriate destination. This pathway is analogous to that in animal and fungal systems, where receptors within the endomembrane system, such as the classic mannose-6-phosphate receptor system for targeting to the lysosome (22), are able to recognize features of the protein and ensure proper localization. Ultimately, cytoplasmic sorting factors, such as adaptins (9), may play a role in the accumulation of plastid-targeted proteins and their segregation to vesicles destined for the plastid. Such cytosolic factors could participate in the recognition of receptors that bind to plastid-targeted proteins and/or specific motifs just beyond the stop-transfer domain of the targeted protein itself to facilitate targeting. One potential series of residues includes the cluster of basic amino acids that immediately follows the stop-transfer domain. The importance of short, cytoplasm-exposed targeting motifs for intracellular sorting is well known (9). For Euglena, Sláviková et al.
(67) determined that this cytoplasm-exposed portion of the presequence is not required for plastid import in vitro, but they suggested that it may function in vesicle routing. Of particular interest here is our discovery of plastid-targeted proteins lacking the putative stop-transfer sequence (class II), implying that these proteins are inserted entirely into the ER, leaving a soluble portion within the ER lumen and a membrane portion integrated within the ER membrane, once the signal sequence is removed. Given that the Euglena class II proteins comprise both soluble and membrane proteins, it is unlikely that other domains within the mature protein could impart a similar stop-transfer effect to compensate for the lack of such a region in the presequence. The targeting route for class II proteins is conceptually similar to the targeting of proteins to the remnant plastid (apicoplast) in apicomplexans; apicoplast proteins lack the stop-transfer sequence and are targeted to the plastid via the ER (19), presumably by vesicular transport. Thus, for correct targeting, the putative transit peptide, and possibly the mature protein, must contain features that would be recognized by specific cofactors or receptors that are localized to the ER lumen, not the cytoplasm. Since class II transit peptides are predicted to lack the stop-transfer sequence and thus the cytoplasm-exposed region just beyond, redirection to the plastid must be facilitated solely by interaction with targeting factors that bind to the TP and allow these precursors to “hitchhike” in vesicles with the class I proteins. An alternative, albeit unlikely, mechanism is that the class II signal sequence acts as a signal anchor, with the N terminus facing the ER lumen. However, in this orientation the transit peptide would be facing the cytoplasm and presumably would be inaccessible to the targeting machinery. 
Even more surprising is the resemblance of this class of targeting sequence to those of dinoflagellates, whose plastid-targeted proteins also exhibit a similar proportion of presequences lacking stop-transfer domains (55), with the remainder resembling class I proteins. As possible explanations for the dinoflagellates, Patron et al. (55) ruled out the evolutionary history of the gene transfer or final destination of the protein, suggesting instead that the “physical characteristics” of the plastid-targeted protein may determine the nature of its presequence. In support of the latter hypothesis, they found that the class I and II distinction was conserved between proteins in two dinoflagellates examined. If “physical characteristics” was the main factor determining the mode of transport, then we would predict that Euglena would exhibit a similar distribution of proteins having class I and II presequences. Some similarities are clearly evident, such as with phosphoribulose kinase and oxygen-evolving enhancer 3 (PsbO), which lack a stop-transfer sequence in both dinoflagellates and Euglena. However, other dinoflagellate proteins with class II (and III) targeting sequences are class I proteins in Euglena (acyl carrier protein, carbonic anhydrase, cytochrome c6, and the PSII 11-kDa protein). Although the sample size for comparison is small, there do not appear to be any obvious inherent functional or physical properties that would require a class I versus class II targeting sequence. In vitro import assays should help to define the functional requirements of the different classes of presequence and determine whether either is essential for the import of specific proteins. With the exception of apicomplexans, complex plastids with four plastid membranes often have ribosomes attached to the outer membrane (chloroplast ER [CER]).
However, the primary plastid-sorting mechanism must still occur after cotranslational import across/into the ER membrane, since in diatoms the signal sequences of ER and plastid-resident proteins are functionally equivalent (37), and in a raphidophyte, few ribosomes are bound to the CER (30). Thus, once inserted into the endomembrane system, the plastid-bound proteins still have to be targeted to and traverse at least three membranes, similar to the situation in Euglena and dinoflagellates. Though not involving the Golgi dictyosomes, a vesicular transfer between the CER and the third membrane has been proposed (23), and a recent report supports such a mechanism (37). Apicomplexans, in particular, provide a valuable model system for dissecting the targeting process in complex plastids with four membranes, with several studies indicating not only that there is partially redundant targeting information in the presequences of apicoplast-targeted proteins (26, 81, 82) but also that there is a distinction between the information for targeting and that for import into the apicoplast (26). Recent work has even identified proteins that interact with the TP and that may be involved in sorting from the ER to the apicoplast (82).

In Euglena, the region between the signal sequence and the stop-transfer sequence in class I proteins functions as a TP (67). This region and the TPs of class II proteins possess characteristics typical of most TPs. These similarities include enrichment in Ser/Thr (S/T bias) and Ala. S/T bias is a common feature of most transit peptides of plants and algae (2, 4, 11, 15, 20, 49, 54, 55, 59, 76). Some notable exceptions to this rule include apicomplexans (77, 78) and nucleomorph-encoded plastid proteins from the cryptomonad alga Guillardia theta (57). Replacement of all Ser/Thr residues in the TP of Plasmodium had no effect on plastid targeting, demonstrating a lack of a requirement for such residues (78).
Although an elevated Ser/Thr content is evidently dispensable in apicomplexans, it remains one of the more consistent features of most TPs, which may reflect a requirement for phosphorylation-dependent binding of 14-3-3 proteins as part of a preinsertion guidance complex (46). Euglena TPs also have an overall positive charge, an apparently universal feature of TPs (57), that is primarily due to a reduction in the content of acidic amino acids. Of particular interest is the asymmetric distribution of acidic amino acids in the TP, with the first two-thirds being deficient in such residues, whereas the remaining third has a composition resembling that of the mature protein. This asymmetry may reflect a distinction between functional TPs (with a bias against acidic residues) and regions having a different function. The importance of a TP depleted in acidic residues was demonstrated in Plasmodium, where the replacement of basic with acidic amino acids eliminated apicoplast targeting (19). Interestingly, Euglena TPs are also deficient in Lys (but not Arg) and have biases in favor of Ala and Pro compared to mature proteins, which are also features of the TPs of the chlorarachniophyte Bigelowiella natans (59). Some of the shared features of TPs, such as a bias against acidic amino acids and a bias in favor of some hydrophobic residues, may be due to a requirement for binding of import factors, such as molecular chaperones (Hsp70) (83). Although the biological significance of the biased amino acid composition in TPs is not entirely understood, and despite any differences in primary structure, TPs from diverse plastid types are functionally sufficient in heterologous import assays (3, 32, 42, 49, 67, 79).

The striking amino acid similarity between certain plastid-targeting sequences is surprising.
In general, transit peptides lack evident sequence similarity, even among paralogs of the same gene family, so the detection of clusters of related targeting sequences may shed light on how targeting sequences were acquired following transfer of the endosymbiont’s genes to the host nucleus during plastid evolution. Reports of highly similar plastid and mitochondrial TPs are relatively rare, but the examples can be separated into two categories. In the first case, homologs from different species exhibit a greater-than-expected similarity within the TP region compared to that of the mature proteins, which is attributed to a conserved functional role (80). The second category includes unrelated proteins that possess highly similar TPs (1, 5, 40, 45), which is what we observe in Euglena. This similarity is often attributed to exon shuffling, as introns commonly separate the transit peptide from the mature protein (36, 45). There are also reports of transit peptide acquisition through insertion into preexisting genes for plastid (5)- and mitochondrion-targeted (1) proteins. Thus, the newly transferred genes would acquire not only the targeting mechanism but also the regulatory sequences required for expression, in the so-called “lucky insertion scenario” (21). Although we lack the appropriate genomic information from Euglena to be able to completely assess the mechanism of TP acquisition, a genomic sequence for an LHCII gene of this organism does have an intron that roughly separates the predicted targeting domain from the mature protein (48), suggesting exon shuffling as a potential mode of TP acquisition. However, the similarity of the rpL3 and rpL21 presequences to a small portion of the LHCI mature protein (GFDPLGL) (Fig. 7) suggests that TP acquisition by insertion into a preexisting copy of the LHCI gene is also a strong possibility.
The maintenance of a continued high degree of conservation between rpL21-rpL3 and CP29-FNR could also imply recent recombination, or perhaps alternative splicing, as described for rice mitochondrion-targeted rpS14 and SDHB proteins (40). The pronounced sequence conservation within these regions also raises the possibility that these targeting sequences have an additional function(s) in the cell, either before or after cleavage, as proposed for some mammalian signal sequences (10, 17).

In summary, we have characterized two distinct classes of Euglena plastid presequences, i.e., classes I and II, that differ by the presence and absence of a predicted stop-transfer sequence, respectively, revealing an additional level of complexity in the protein transport mechanism. In addition to enhancing our ability to predict Euglena presequences, we expect that the characteristics of these TPs will stimulate further import studies, both in vitro and in vivo, seeking to dissect the processes of targeting and import into the complex plastids of Euglena.

ACKNOWLEDGMENTS

This work was carried out under the auspices of a Genome Canada large-scale genomics project, the Protist EST Program, with funding provided through Genome Atlantic and the Atlantic Innovation Fund. M.W.G. gratefully acknowledges salary support from the Canada Research Chairs Program and the Canadian Institute for Advanced Research (Program in Evolutionary Biology). D.G.D. also thanks the Natural Sciences and Engineering Research Council (NSERC) for ongoing support. We are grateful to Patrick Keeling for sharing a paper on dinoflagellate targeting sequences prior to publication. We also thank Steve Heard and Penny Humby for helpful discussions on statistics. The technical assistance of H. Rissler, who isolated RNAs from Euglena for the construction of two of the five cDNA libraries sequenced for this study, is acknowledged.

REFERENCES

1. Adams, K. L., M. Rosenblueth, Y. L. Qiu, and J. D.
Palmer. 2001. Multiple losses and transfers to the nucleus of two mitochondrial succinate dehydrogenase genes during angiosperm evolution. Genetics 158:1289–1300. 2. Apt, K. E., D. Bhaya, and A. R. Grossman. 1994. Characterization of genes encoding the light-harvesting proteins in diatoms: biogenesis of the fucoxanthin chlorophyll a/c protein complex. J. Appl. Phycol. 6:225–230. 3. Apt, K. E., N. E. Hoffman, and A. R. Grossman. 1993. The γ-subunit of R-phycoerythrin and its possible mode of transport into the plastid of red algae. J. Biol. Chem. 268:16208–16215. 4. Apt, K. E., L. Zaslavkaia, J. C. Lippmeier, M. Lang, O. Kilian, R. Wetherbee, A. R. Grossman, and P. G. Kroth. 2002. In vivo characterization of diatom multipartite plastid targeting signals. J. Cell Sci. 115:4061–4069. 5. Arimura, S.-I., S. Takusagawa, S. Hatano, M. Nakazono, A. Hirai, and N. Tsutsumi. 1999. A novel plant nuclear gene encoding chloroplast ribosomal protein S9 has a transit peptide related to that of rice chloroplast ribosomal protein L12. FEBS Lett. 450:231–234. 6. Bendtsen, J. D., H. Nielsen, G. von Heijne, and S. Brunak. 2004. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340:783–795. 7. Bhaya, D., and A. Grossman. 1991. Targeting proteins to diatom plastids involves transport through an endoplasmic reticulum. Mol. Gen. Genet. 229:400–404. 8. Blobel, G. 1980. Intracellular protein topogenesis. Proc. Natl. Acad. Sci. USA 77:1496–1500. 9. Bonifacino, J. S., and L. M. Traub. 2003. Signals for sorting of transmembrane proteins to endosomes and lysosomes. Annu. Rev. Biochem. 72:395–447. 10. Braud, V. M., D. S. Allan, C. A. O’Callaghan, K. Soderstrom, A. D’Andrea, G. S. Ogg, S. Lazetic, N. T. Young, J. I. Bell, J. H. Phillips, L. L. Lanier, and A. J. McMichael. 1998. HLA-E binds to natural killer cell receptors CD94/NKG2A, B and C. Nature 391:795–799. 11. Bruce, B. D. 2000. Chloroplast transit peptides: structure, function and evolution. Trends Cell Biol. 10:440–447.
12. Cavalier-Smith, T. 2002. Chloroplast evolution: secondary symbiogenesis and multiple losses. Curr. Biol. 12:R62–R64. 13. Chan, R. L., M. Keller, J. Canaday, J. H. Weil, and P. Imbault. 1990. Eight small subunits of Euglena ribulose-1,5-bisphosphate carboxylase-oxygenase are translated from a large messenger RNA as a polyprotein. EMBO J. 9:333–338. 14. Chen, H., and D. A. Kendall. 1995. Artificial transmembrane segments. Requirements for stop transfer and polypeptide orientation. J. Biol. Chem. 270:14115–14122. 15. Deane, J. A., M. Fraunholz, V. Su, U.-G. Maier, W. Martin, D. G. Durnford, and G. I. McFadden. 2000. Evidence for nucleomorph to host nucleus gene transfer: light-harvesting complex proteins from cryptomonads and chlorarachniophytes. Protist 151:239–252. 16. DeRocher, A., C. B. Hagen, J. E. Froehlich, J. E. Feagin, and M. Parsons. 2000. Analysis of targeting sequences demonstrates that trafficking to the Toxoplasma gondii plastid branches off the secretory system. J. Cell Sci. 113:3969–3977. 17. Eichler, R., O. Lenz, T. Strecker, M. Eickmann, H. D. Klenk, and W. Garten. 2003. Identification of Lassa virus glycoprotein signal peptide as a trans-acting maturation factor. EMBO Rep. 4:1084–1088. 18. Emanuelsson, O., H. Nielsen, and G. von Heijne. 1999. ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8:978–984. 19. Foth, B. J., S. A. Ralph, C. J. Tonkin, N. S. Struck, M. Fraunholz, D. S. Roos, A. F. Cowman, and G. I. McFadden. 2003. Dissecting apicoplast targeting in the malaria parasite Plasmodium falciparum. Science 299:705–708. 20. Franzen, L. G., J.-D. Rochaix, and G. von Heijne. 1990. Chloroplast transit peptides from the green alga Chlamydomonas reinhardtii share features with both mitochondrial and higher plant chloroplast presequences. FEBS Lett. 260:165–168. 21. Gantt, J. S., S. L. Baldauf, P. J. Calie, N. F. Weeden, and J. D. Palmer. 1991.
Transfer of rpl22 to the nucleus greatly preceded its loss from the chloroplast and involved the gain of an intron. EMBO J. 10:3073–3078. 22. Ghosh, P., N. M. Dahms, and S. Kornfeld. 2003. Mannose 6-phosphate receptors: new twists in the tale. Nat. Rev. Mol. Cell Biol. 4:202–212. 23. Gibbs, S. P. 1979. Route of entry of cytoplasmically synthesized proteins into chloroplasts of algae possessing chloroplast-ER. J. Cell Sci. 35:253–266. 24. Grossman, A., A. Manodori, and D. Snyder. 1990. Light-harvesting proteins of diatoms: their relationship to the chlorophyll a/b binding proteins of higher plants and their mode of transport into plastids. Mol. Gen. Genet. 224:91–100. 25. Hannaert, V., H. Brinkmann, U. Nowitzki, J. A. Lee, M. A. Albert, C. W. Sensen, T. Gaasterland, M. Muller, P. Michels, and W. Martin. 2000. Enolase from Trypanosoma brucei, from the amitochondriate protist Mastigamoeba balamuthi, and from the chloroplast and cytosol of Euglena gracilis: pieces in the evolutionary puzzle of the eukaryotic glycolytic pathway. Mol. Biol. Evol. 17:989–1000. 26. Harb, O. S., B. Chatterjee, M. J. Fraunholz, M. J. Crawford, M. Nishi, and D. S. Roos. 2004. Multiple functionally redundant signals mediate targeting to the apicoplast in the apicomplexan parasite Toxoplasma gondii. Eukaryot. Cell 3:663–674. 27. Henze, K., A. Badr, M. Wettern, R. Cerff, and W. Martin. 1995. A nuclear gene of eubacterial origin in Euglena gracilis reflects cryptic endosymbioses during protist evolution. Proc. Natl. Acad. Sci. USA 92:9122–9126. 28. Houlne, G., and R. Schantz. 1987. Molecular analysis of the transcripts encoding the light-harvesting chlorophyll a/b protein in Euglena gracilis: unusual size of the mRNA. Curr. Genet. 12:611–616. 29. Inagaki, J., Y. Fujita, T. Hase, and Y. Yamamoto. 2000. Protein translocation within chloroplast is similar in Euglena and higher plants. Biochem. Biophys. Res. Commun. 277:436–442. 30. Ishida, K., T. Cavalier-Smith, and B. R. Green. 2000. 
Endomembrane structure and the chloroplast protein targeting pathway in Heterosigma akashiwo (Raphidophyceae, Chromista). J. Phycol. 36:1135–1144. 31. Jackson-Constan, D., and K. Keegstra. 2001. Arabidopsis genes encoding components of the chloroplastic protein import apparatus. Plant Physiol. 125:1567–1576. 32. Jakowitsch, J., C. Neumann-Spallart, Y. Ma, J. Steiner, H. E. Schenk, H. J. Bohnert, and W. Löffelhardt. 1996. In vitro import of pre-ferredoxin-NADP+-oxidoreductase from Cyanophora paradoxa into cyanelles and into pea chloroplasts. FEBS Lett. 381:153–155. 33. Keegstra, K., and K. Cline. 1999. Protein import and routing systems of chloroplasts. Plant Cell 11:557–570. 34. Keeling, P. J. 2004. Diversity and evolutionary history of plastids and their hosts. Am. J. Bot. 91:1481–1493. 35. Kilian, O., and P. G. Kroth. 2003. Evolution of protein targeting into “complex” plastids: the “secretory transport hypothesis.” Plant Biol. (Stuttgart) 5:350–358. 36. Kilian, O., and P. G. Kroth. 2004. Presequence acquisition during secondary endocytobiosis and the possible role of introns. J. Mol. Evol. 58:712–721. 37. Kilian, O., and P. G. Kroth. 2005. Identification and characterization of a new conserved motif within the presequence of proteins targeted into complex diatom plastids. Plant J. 41:175–183. 38. Kishore, R., U. S. Muchhal, and S. D. Schwartzbach. 1993. The presequence of Euglena LHCPII, a cytoplasmically synthesized chloroplast protein, contains a functional endoplasmic reticulum-targeting domain. Proc. Natl. Acad. Sci. USA 90:11845–11849. 39. Krogh, A., B. Larsson, G. von Heijne, and E. L. Sonnhammer. 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305:567–580. 40. Kubo, N., K. Harada, A. Hirai, and K.-I. Kadowaki. 1999.
A single nuclear transcript encoding mitochondrial RPS14 and SDHB of rice is processed by alternative splicing: common use of the same mitochondrial targeting signal for different proteins. Proc. Natl. Acad. Sci. USA 96:9207–9211. 41. Kuroiwa, T., M. Sakaguchi, K. Mihara, and T. Omura. 1991. Systematic analysis of stop-transfer sequence for microsomal membrane. J. Biol. Chem. 266:9251–9255. 42. Lang, M., K. E. Apt, and P. G. Kroth. 1998. Protein transport into “complex” diatom plastids utilizes two different targeting signals. J. Biol. Chem. 273: 30973–30978. 43. Lang, M., and P. G. Kroth. 2001. Diatom fucoxanthin chlorophyll a/c-binding protein (FCP) and land plant light-harvesting proteins use a similar pathway for thylakoid membrane insertion. J. Biol. Chem. 276:7985–7991. 44. Lin, Q., L. Ma, W. Burkhart, and L. L. Spremulli. 1994. Isolation and characterization of cDNA clones for chloroplast translational initiation factor-3 from Euglena gracilis. J. Biol. Chem. 269:9436–9444. 45. Long, M., S. J. de Souza, C. Rosenberg, and W. Gilbert. 1996. Exon shuffling and the origin of the mitochondrial targeting function in plant cytochrome c1 precursor. Proc. Natl. Acad. Sci. USA 93:7727–7731. 46. May, T., and J. Soll. 2000. 14-3-3 proteins form a guidance complex with chloroplast precursor proteins in plants. Plant Cell 12:53–64. 47. McFadden, G. I. 1999. Plastids and protein targeting. J. Eukaryot. Microbiol. 46:339–346. VOL. 5, 2006 E. GRACILIS PLASTID-TARGETED PROTEIN TRANSIT SEQUENCES 48. Muchhal, U. S., and S. D. Schwartzbach. 1994. Characterization of the unique intron-exon junctions of Euglena gene(s) encoding the polyprotein precursor to the light-harvesting chlorophyll a/b binding protein of photosystem II. Nucleic Acids Res. 22:5737–5744. 49. Nassoury, N., M. Cappadocia, and D. Morse. 2003. Plastid ultrastructure defines the protein import pathway in dinoflagellates. J. Cell Sci. 116:2867– 2874. 50. Nassoury, N., and D. Morse. 2005. 
Protein targeting to the chloroplasts of photosynthetic eukaryotes: getting there is half the fun. Biochim. Biophys. Acta 1743:5–19. 51. Nielsen, H., and A. Krogh. 1998. Prediction of signal peptides and signal anchors by a hidden Markov model. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6:122–130. 52. Nowitzki, U., G. Gelius-Dietrich, M. Schwieger, K. Henze, and W. Martin. 2004. Chloroplast phosphoglycerate kinase from Euglena gracilis: endosymbiotic gene replacement going against the tide. Eur. J. Biochem. 271:4123– 4131. 53. Osafune, T., S. Sumida, J. A. Schiff, and E. Hase. 1991. Immunolocalization of LHCP II apoprotein in the Golgi during light-induced chloroplast development in non-dividing Euglena cells. J. Electron Microsc. 40:41–47. 54. Pancic, P. G., and H. Strotmann. 1993. Structure of the nuclear encoded g subunit of CFoCF1 of the diatom Odontella sinensis including its presequence. FEBS Lett. 320:61–66. 55. Patron, N. J., R. F. Waller, J. M. Archibald, and P. J. Keeling. 2005. Complex protein targeting to dinoflagellate plastids. J. Mol. Biol. 348:1015– 1024. 56. Plaumann, M., B. Pelzer-Reith, W. F. Martin, and C. Schnarrenberger. 1997. Multiple recruitment of class-I aldolase to chloroplasts and eubacterial origin of eukaryotic class-II aldolases revealed by cDNAs from Euglena gracilis. Curr. Genet. 31:430–438. 57. Ralph, S. A., B. J. Foth, N. Hall, and G. I. McFadden. 2004. Evolutionary pressures on apicoplast transit peptides. Mol. Biol. Evol. 21:2183–2194. 58. Robinson, C. 2000. The twin-arginine translocation system: a novel means of transporting folded proteins in chloroplasts and bacteria. Biol. Chem. 381: 89–93. 59. Rogers, M. B., J. M. Archibald, M. A. Field, C. Li, B. Striepen, and P. J. Keeling. 2004. Plastid-targeting peptides from the chlorarachniophyte Bigelowiella natans. J. Eukaryot. Microbiol. 51:529–535. 60. Saaf, A., E. Wallin, and G. von Heijne. 1998. 
Stop-transfer function of pseudo-random amino acid segments during translocation across prokaryotic and eukaryotic membranes. Eur. J. Biochem. 251:821–829. 61. Santillán-Torres, J. L., A. Atteia, M. G. Claros, and D. González-Halphen. 2003. Cytochrome f and subunit IV, two essential components of the photosynthetic bf complex typically encoded in the chloroplast genome, are nucleus-encoded in Euglena gracilis. Biochim. Biophys. Acta 1604:180–189. 62. Schnell, D. J. 1998. Protein targeting to the thylakoid membrane. Annu. Rev. Plant Physiol. Plant Mol. Biol. 49:97–126. 63. Schwartzbach, S. D., T. Osafune, and W. Löffelhardt. 1998. Protein import into cyanelles and complex chloroplasts. Plant Mol. Biol. 38:247–263. 64. Sharif, A. L., A. G. Smith, and C. Abell. 1989. Isolation and characterisation of a cDNA clone for a chlorophyll synthesis enzyme from Euglena gracilis. The chloroplast enzyme hydroxymethylbilane synthase (porphobilinogen deaminase) is synthesised with a very long transit peptide in Euglena. Eur. J. Biochem. 184:353–359. 65. Shashidhara, L. S., S. H. Lim, J. B. Shackleton, C. Robinson, and A. G. 66. 67. 68. 69. 70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 2091 Smith. 1992. Protein targeting across the three membranes of the Euglena chloroplast envelope. J. Biol. Chem. 267:12885–12891. Shigemori, Y., J. Inagaki, H. Mori, M. Nishimura, S. Takahashi, and Y. Yamamoto. 1994. The presequence of the precursor to the nucleus-encoded 30 kDa protein of photosystem II in Euglena gracilis Z includes two hydrophobic domains. Plant Mol. Biol. 24:209–215. Sláviková, S., R. Vacula, Z. Fang, T. Ehara, T. Osafune, and S. D. Schwartzbach. 2005. Homologous and heterologous reconstitution of Golgi to chloroplast transport and protein import into the complex chloroplasts of Euglena. J. Cell Sci. 118:1651–1661. Steiner, J. M., and W. Löffelhardt. 2005. Protein translocation into and within cyanelles. Mol. Membr. Biol. 22:123–132. Sulli, C., Z. Fang, U. 
Muchhal, and S. D. Schwartzbach. 1999. Topology of Euglena chloroplast protein precursors within endoplasmic reticulum to Golgi to chloroplast transport vesicles. J. Biol. Chem. 274:457–463. Sulli, C., and S. D. Schwartzbach. 1995. The polyprotein precursor to the Euglena light-harvesting chlorophyll a/b-binding protein is transported to the Golgi apparatus prior to chloroplast import and polyprotein processing. J. Biol. Chem. 270:13084–13090. Sulli, C., and S. D. Schwartzbach. 1996. A soluble protein is imported into Euglena chloroplasts as a membrane-bound precursor. Plant Cell 8:43–53. Tessier, L. H., M. Keller, R. L. Chan, R. Fournier, J. H. Weil, and P. Imbault. 1991. Short leader sequences may be transferred from small RNAs to pre-mature mRNAs by trans-splicing in Euglena. EMBO J. 10:2621–2625. Vacula, R., J. M. Steiner, J. Krajcovic, L. Ebringer, and W. Löffelhardt. 1999. Nucleus-encoded precursors to thylakoid lumen proteins of Euglena gracilis possess tripartite presequences. DNA Res. 6:45–49. van Dooren, G. G., S. D. Schwartzbach, T. Osafune, and G. I. McFadden. 2001. Translocation of proteins across the multiple membranes of complex plastids. Biochim. Biophys. Acta 1541:34–53. von Heijne, G. 1990. The signal peptide. J. Membr. Biol. 115:195–201. von Heijne, G., J. Steppuhn, and R. G. Herrmann. 1989. Domain structure of mitochondrial and chloroplast targeting peptides. Eur. J. Biochem. 180: 535–545. Waller, R. F., P. J. Keeling, R. G. Donald, B. Striepen, E. Handman, N. Lang-Unnasch, A. F. Cowman, G. S. Besra, D. S. Roos, and G. I. McFadden. 1998. Nuclear-encoded proteins target to the plastid in Toxoplasma gondii and Plasmodium falciparum. Proc. Natl. Acad. Sci. USA 95:12352–12357. Waller, R. F., M. B. Reed, A. F. Cowman, and G. I. McFadden. 2000. Protein trafficking to the plastid of Plasmodium falciparum is via the secretory pathway. EMBO J. 19:1794–1802. Wastl, J., and U.-G. Maier. 2000. Transport of proteins into cryptomonads complex plastids. J. 
Biol. Chem. 275:23194–23198. Wolter, F. P., C. C. Fritz, L. Willmitzer, J. Schell, and P. H. Schreier. 1988. rbcS genes in Solanum tuberosum: conservation of transit peptide and exon shuffling during evolution. Proc. Natl. Acad. Sci. USA 85:846–850. Yung, S., T. R. Unnasch, and N. Lang-Unnasch. 2001. Analysis of apicoplast targeting and transit peptide processing in Toxoplasma gondii by deletional and insertional mutagenesis. Mol. Biochem. Parasitol. 118:11–21. Yung, S. C., T. R. Unnasch, and N. Lang-Unnasch. 2003. Cis and trans factors involved in apicoplast targeting in Toxoplasma gondii. J. Parasitol. 89:767–776. Zhang, X. P., and E. Glaser. 2002. Interaction of plant mitochondrial and chloroplast signal peptides with the Hsp70 molecular chaperone. Trends Plant Sci. 7:14–21.