Biochimica et Biophysica Acta 1758 (2006) 1557–1579
www.elsevier.com/locate/bbamem

Review

Extra domains in secondary transport carriers and channel proteins

Ravi D. Barabote, Dorjee G. Tamang, Shannon N. Abeywardena, Neda S. Fallah, Jeffrey Yu Chung Fu, Jeffrey K. Lio, Pegah Mirhosseini, Ronnie Pezeshk, Sheila Podell 1, Marnae L. Salampessy, Mark D. Thever, Milton H. Saier Jr. ⁎

Division of Biological Sciences, University of California at San Diego, La Jolla, CA 92093-0116, USA

Received 9 May 2006; received in revised form 16 June 2006; accepted 20 June 2006
Available online 27 June 2006

Abstract

“Extra” domains in members of the families of secondary transport carrier and channel proteins provide secondary functions that expand, amplify or restrict the functional nature of these proteins. Domains in secondary carriers include TrkA and SPX domains in DASS family members, DedA domains in TRAP-T family members (both of the IT superfamily), Kazal-2 and PDZ domains in OAT family members (of the MF superfamily), USP, IIAFru and TrkA domains in ABT family members (of the APC superfamily), ricin domains in OST family members, and TrkA domains in AAE family members. Some transporters contain highly hydrophilic domains consisting of multiple repeat units that can also be found in proteins of dissimilar function. Similarly, transmembrane α-helical channel-forming proteins contain unique, conserved, hydrophilic domains, most of which are not found in carriers. In some cases the functions of these domains are known. They may be ligand binding domains, phosphorylation domains, signal transduction domains, protein/protein interaction domains or complex carbohydrate-binding domains. These domains mediate regulation, subunit interactions, or subcellular targeting.
Phylogenetic analyses show that while some of these domains are restricted to closely related proteins derived from specific organismal types, others are nearly ubiquitous within a particular family of transporters and occur in a tremendous diversity of organisms. The former probably became associated with the transporters late in the evolutionary process; the latter probably became associated with the carriers much earlier. These domains can be located at either end of the transporter or in a central region, depending on the domain and transporter family. These studies provide useful information about the evolution of extra domains in channels and secondary carriers and provide novel clues concerning function.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Transport; Secondary carriers; Evolution; Protein domains; DedA; TrkA; SPX; Kazal-2; PDZ; USP; IIAFru

Contents
1. Introduction
2. Computer methods
3. Results
3.1. TRKA-C domains in proteins of the DASS family (TC #2.A.47)
3.2. SPX domains in proteins of the DASS family
3.3. DedA domains in TRAP-T family transporters (TC #2.A.56)
3.4. Kazal-2 domains in proteins of the OAT family (TC #2.A.60)
3.5. PDZ domains in proteins of the OAT family
3.6. USP domains in proteins of the ABT family (TC #2.A.3.6) within the APC superfamily (TC #2.A.3)
3.7. TrkA-C domains in proteins of the aspartate:alanine exchanger (AAE) family (TC #2.A.81)
3.8. Multiple domains in proteins of the eukaryotic-specific organic solute transporter (OST) family (TC #2.A.82)
3.9. Repeat domains in members of the 4 TMS multidrug endosomal transporter (MET) family (TC #2.A.74)
3.10. “Extra” domains in ion channel proteins
3.11. The voltage-gated ion channel (VIC) superfamily (TC #1.A.1–5)
3.12. The ligand-gated ion channel (LIC) family (TC #1.A.9)
3.13. The chloride ion channel (ClC) family (TC #1.A.11)
4. Discussion
Acknowledgments
References

⁎ Corresponding author. Tel.: +1 858 534 4084; fax: +1 858 534 7108. E-mail address: [email protected] (M.H. Saier).
1 Present address: Scripps Genome Center, Scripps Institution of Oceanography, University of California at San Diego, La Jolla, CA 92093-0202, USA.
0005-2736/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.bbamem.2006.06.018

1. Introduction

For the past 15 years, our bioinformatics laboratory has focused on the characterization, classification and evolutionary origins of transport proteins [1,2]. The availability of the transporter classification system [3,4], now incorporated into an extensive database [5], has allowed us to conduct analyses leading to conclusions about the origins of specific transporters and protein domains present in transport systems [6,7].

Of particular value in identifying the functions of transporters of unknown mechanism of action and specificity has been the analysis of extra domains in “fusion” proteins. For example, the fusion of a transporter of the major facilitator superfamily (MFS) [8,9] to a fatty acyl transferase/acyl-ACP synthetase led to the functional characterization of a family of transporters catalyzing lysophospholipid flipping across the cytoplasmic membranes of Gram-negative bacteria [10]. In another case, fusion of transporters of the SulP family to carbonic anhydrase homologues led to the correct prediction that these systems are bicarbonate transporters [11]. In further recent studies, we analyzed fusion proteins where recognized domains of uncertain function are fused to transport protein homologues. These studies also allowed correct functional predictions [12]. Analyses of proteins that incorporate domains homologous to phosphoryl transfer proteins of the phosphoenolpyruvate-dependent phosphotransferase system (PTS) led to important predictions about their modes of regulation [13].

Bioinformatic successes such as those described above have tremendously facilitated functional genomic approaches. They have prompted us to analyze a broader range of protein domains associated with secondary transport carriers and α-type channel-forming proteins. The results of our analyses are reported here. We show that extra domains are not uncommon in families of carriers and channels, and in many cases their functions can be predicted. The presence of multiple such domains often indicates the occurrence of complex molecular interactions.

2. Computer methods

A Fasta-formatted file of all the protein sequences in the Transporter Classification Database (TCDB; http://www.tcdb.org/tcdb) was obtained and used for domain identification and analysis. All TCDB proteins were screened against Pfam [14] protein domain profiles using the HMMER program (version 2.3.1) (http://hmmer.wustl.edu/) [15] with an e-value cutoff of e−3. Secondary carriers (TC subclass #2.A) were chosen for analysis of the occurrence of various Pfam domains. Secondary transporters containing hydrophilic or amphipathic domains that had not previously been identified and additional domains that appeared to be of interest were detected and characterized.

TCDB proteins were used as query sequences for identification of all family homologues in the NCBI protein database. All protein homologues discussed here were obtained using the PSI-BLAST search tool with iterations [16], screening the NCBI RefSeq databank. Transport proteins from the TCDB database that were found to contain the domains under study were used as query sequences in the PSI-BLAST searches. Homologues obtained from the Blast searches were shown to be members of the respective TC families using TC-BLAST (http://www.tcdb.org/progs/blast.php), retaining only established members of the respective families under study. Homologues were checked for redundancies using an unpublished program (S. Singhi and M.H. Saier, Jr., unpublished).

Multiple sequence alignments and clustal-generated trees were constructed using the CLUSTAL X program [17]. Clustal-generated trees were viewed using the TreeView program [18]. Default parameters were used. Tables listing all the proteins analyzed for each family as well as the multiple alignments and clustal-generated trees (as figures) are provided as supplementary material on our website: http://biology.ucsd.edu/~msaier/supmat/Domains.
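The domain-screening step described above (profile search, e-value cutoff, restriction to TC subclass #2.A) can be illustrated with a minimal sketch. It does not reproduce the programs actually used; the record layout, the function names, and the example hits are hypothetical, and the cutoff stated in the text is read here as 1e-3, which is an assumption.

```python
from dataclasses import dataclass

# Assumption: the e-value cutoff given in the text is interpreted as 1e-3.
E_VALUE_CUTOFF = 1e-3


@dataclass(frozen=True)
class DomainHit:
    """One profile-search hit (hypothetical record layout)."""
    tc_number: str   # TC identifier of the protein, e.g. "2.A.47.x.y"
    protein: str     # protein label
    domain: str      # domain profile name
    evalue: float    # e-value reported for the hit


def is_secondary_carrier(tc_number: str) -> bool:
    """True if the TC number falls in subclass 2.A (secondary carriers)."""
    return tc_number.startswith("2.A.")


def select_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits at or below the cutoff that belong to TC subclass 2.A."""
    return [h for h in hits
            if h.evalue <= cutoff and is_secondary_carrier(h.tc_number)]


def domains_by_family(hits):
    """Group domain names by TC family (first three fields of the TC number)."""
    families = {}
    for h in hits:
        family = ".".join(h.tc_number.split(".")[:3])
        families.setdefault(family, set()).add(h.domain)
    return families


# Made-up hits for illustration only; they are not data from the study.
example_hits = [
    DomainHit("2.A.47.9.9", "hypA", "TrkA_C", 1e-10),   # kept
    DomainHit("2.A.47.9.9", "hypA", "SPX", 0.5),        # fails the cutoff
    DomainHit("2.A.60.9.9", "hypB", "Kazal_2", 2e-5),   # kept
    DomainHit("1.A.11.9.9", "hypC", "CBS", 1e-20),      # not subclass 2.A
]

selected = select_hits(example_hits)
summary = domains_by_family(selected)
```

Grouping by the first three fields of the TC number mirrors the family-level organization used in the Results, where each family is examined for the domains its members carry.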
The AveHAS program [19] was used for plotting the average hydropathy, similarity and amphipathicity as a function of alignment position after aligning the sequences with the CLUSTAL X program. The different domains were recognized using the NCBI CD-Search tool [20]. The occurrence of the domain in all the homologues of a family was assessed by analyzing the multiple sequence alignments.

3. Results

3.1. TRKA-C domains in proteins of the DASS family (TC #2.A.47)

The Divalent Anion:Na+ symporter (DASS) family (TC #2.A.47) [21] is a constituent family within the Ion Transporter (IT) superfamily [22]. Various functionally characterized members of the DASS family transport inorganic sulfate, phosphate and organic di- and tricarboxylates of the Krebs cycle as well as dicarboxylic amino acids. Some of these transporters exhibit narrow specificity while others exhibit broad specificity. The DASS family is ubiquitous, occurring in all three domains of life: archaea, bacteria and eukaryotes.

Hydrophilic TrkA domains consist of two conserved regions, TrkA-N and TrkA-C, both of about 120 residues, separated by a short, poorly conserved linker region [23]. The TrkA-N domain occurs in a wide variety of transport and non-transport proteins such as channels, carriers, exonucleases, dehydrogenases and phosphoesterases [24,25]. This domain binds NAD+. The β-structured TrkA-C domain is less well defined but probably binds an unidentified ligand [23].

We have identified TrkA-C domains in members of the DASS family (TC #2.A.47) [21,22]. Members of this ubiquitous family were divided into three subgroups based on the organisms of origin: eukaryotes, bacteria and archaea. The TrkA-C domain was found in some but not all members of each of the three groups of proteins.
In all of these proteins, these domains appear to occur between hydrophobic peaks 6 and 7, that is, between the two putative repeat elements of these 10 or 12 TMS carrier proteins (see Fig. 1A–C) [22,26,27]. In the 10 TMS model, hydrophobic peaks 5 and 11 are reentrant loops that go into the membrane but do not traverse it [26].

Of the eight archaeal homologues identified (see Table DASS-S1 for the list of proteins studied and their properties, and Fig. DASS-S1A for the multiple alignment, both on our website), only four proteins contain the TrkA-C domain, and of these, three (Hma1 and Hma2 from Haloarcula marismortui as well as Hsa1-1 from Halobacterium salinarum) have this domain tandemly duplicated. The two tandemly duplicated TrkA-C domains were designated TrkA-C1 and TrkA-C2 (Fig. 1). The fourth TrkA-C-containing archaeal homologue, Pae1-1 from Pyrobaculum aerophilum, has only one copy of the TrkA-C domain. The three halophilic archaeal proteins cluster together on the clustal-generated tree while the Pyrobaculum protein clusters distantly from all others (Fig. DASS-S1B). While two archaeal Hma paralogues (Hma1 and Hma2) contain the TrkA-C domain, as noted above, a third Hma paralogue lacks it. This protein clusters separately from its paralogues, but it clusters loosely with Mja1 from Methanocaldococcus jannaschii, which also lacks the TrkA-C domain. The results suggest that both association and duplication of the TrkA-C domain occurred early during the evolutionary divergence of these archaeal homologues.

Of the 195 bacterial homologues (see Table DASS-S2), the first TrkA-C domain (TrkA-C1) could be identified in about 75 of them. The second TrkA-C domain (TrkA-C2) was present in almost all bacterial homologues. The multiple alignment and the clustal-generated tree of the bacterial proteins (Figs. DASS-S2A and S2B, respectively) revealed that the occurrence of the TrkA-C domain correlates reasonably well with phylogeny.
For example, of the eight clusters shown in Figure DASS-S2B, no protein in clusters 1, 2, 7 and 8 contains a TrkA-C1 domain, but all members of clusters 4 and 5 and most members of clusters 3 and 6 contain these domains. The organisms that either possess or lack the TrkA-C1 domain derive from diverse bacterial kingdoms, and many organisms encode both homologues that contain this domain and homologues that lack it. Thus, TrkA-C domain occurrence does not correlate with organismal phylogeny.

One hundred and fifteen eukaryotic DASS family homologues were studied (Table DASS-S3). Of these, only four have recognizable TrkA-C1 domains. All four of these proteins (Ehi1-4) are from Entamoeba histolytica (Fig. DASS-S3A). These proteins cluster together and comprise a single subcluster of cluster 1 (see Fig. DASS-S3B). The TrkA-C2 domain was not detected in any of the eukaryotic homologues.

3.2. SPX domains in proteins of the DASS family

The DASS family was described in the previous section. In addition to the TrkA domain described therein, some members of the DASS family possess N-terminal SPX domains. These SPX (Syg1/Pho81/XPR1) domains, of about 180 residues, are found at the N-termini of various proteins, particularly signal transduction proteins [28,29]. They are hydrophilic domains believed to bind G-protein β-subunits [30,31]. Such a domain inhibits transduction of the mating pheromone signal in yeast [30]. Possibly all SPX domains bind G-proteins for the purpose of signal transduction.

SPX domains are often (1) variable in sequence, (2) split due to insertions, or (3) fragmented due to deletions. They are found in DASS family proteins present in all of the major eukaryotic kingdoms (fungi, plants, animals, slime molds and protozoans). The clearly recognized SPX domains in the DASS family occur only in one cluster of the clustal-generated tree. This is fungal cluster 7 (see Table DASS-S3 and Fig. DASS-S3B on our website).
Almost all members of this cluster possess the SPX domain. It accounts for a significant part of the large, N-terminal, poorly conserved, hydrophilic domain shown in Fig. 1C. The presence of this domain suggests that fungal DASS family members of cluster 7 may have their transport functions regulated by direct G-protein binding. Alternatively, they may mediate cellular regulatory functions in addition to their transport functions by serving as receptors for G-proteins. Thus, by an uncharacterized mechanism, they may allow transmission of a signal to specific cellular constituents. Relevant to this second possibility, it is important to note that low affinity phosphate carriers, including Pho87, Pho90 and Pho91 in S. cerevisiae, all of cluster 7 in the DASS family (see Table DASS-S3 and Fig. DASS-S3B), have been shown to play a role in pho gene regulation [32]. It is also noteworthy that several of the large superfamilies in TCDB include members that, instead of or in addition to transport, function as receptors in signal transduction processes [2].

Fig. 1. Average hydropathy (top) and similarity (bottom) plots for sequenced members of the DASS family. A, archaeal homologues; B, bacterial homologues; C, eukaryotic homologues. The plots were generated from the multiple alignments shown on our website (Figs. DASS-S1A, S2A and S3A) using the AveHAS program [19]. The proteins included are presented in Tables DASS-S1–S3, respectively. The positions of the TrkA-C domains in all three figures as well as that of the SPX domain in figure C are indicated. In this and other figures, average similarity values are presented as relative, not absolute values, and the scale is not presented [see Ref. [19] for details].

3.3. DedA domains in TRAP-T family transporters (TC #2.A.56)

The Tripartite ATP-independent Periplasmic Transporter (TRAP-T) family consists of systems with three components: an extracytoplasmic binding receptor, a large transmembrane (TM) protein with ≥ 12 putative TMSs (peaks of hydrophobicity in a hydropathy plot) and a small transmembrane protein with 4 TMSs [33,34]. The larger TM constituents are members of the IT superfamily [22], and they often include DedA domains [14]. Nothing is known about the function(s) of DedA domains, but these domains are found in proteins from archaea, bacteria and eukaryotes. In addition to transporters, they are found in membrane-associated enzymes such as phospholipases and oxidoreductases. TRAP-T systems are found in bacteria and archaea but not in eukaryotes. The archaeal homologues are very distantly related to the bacterial homologues [22,34]. Only the latter were analyzed in this study.

Two hundred and forty-five large subunits of TRAP-T systems from bacteria were characterized phylogenetically and analyzed for DedA domains (see Table TRAP-S1). The multiple alignment and clustal-generated tree for all of these proteins are shown in Figures TRAP-T-S1A and S1B, respectively. There are three obvious clusters, and the alignments and trees for these three clusters, clusters 1–3, are shown in Figures TRAP-S2-4, A and B, respectively. Clusters 1 and 3 include clearly delineated DedA domains, but the DedA domain could not be identified in any of the cluster 2 proteins.

The average hydropathy and similarity plots for aligned clusters 1, 2 and 3 proteins are shown in Fig. 2A–C, respectively. Cluster 1 homologues (Fig. 2A) show 11 peaks of conserved hydrophobicity that might correspond to 12 TMSs, numbered 1–12. The first peak of hydrophobicity is broad and is predicted to include two TMSs with a loop region of minimal size, probably just a β-turn.
The DedA domains occur at alignment positions 430 to 760, and putative TMSs 6, 7 and 8 lie within this domain. From the average similarity plot, it is clear that insertions have occurred within this domain in some of the homologues. The cluster 3 plot (Fig. 2C) shows the same pattern where putative TMSs 6, 7 and 8 lie within the DedA domain. In this plot, as for that of the cluster 1 homologues, regions of poor conservation are observed, corresponding to insertions in just one or a few of the homologues. It should be noted that DedA domains differ from all other “extra” domains in being parts of the hydrophobic core of these proteins. The significance of the DedA domain relative to the remainder of the transporter is not clear. Cluster 2 proteins show 22 peaks of hydrophobicity. Several of these peaks are poorly conserved, but there is a much greater range of conserved sequences than in either cluster 1 or cluster 3. The DedA domain was not identified in initial searches, but subsequent comparisons between cluster 2 proteins and proteins of clusters 1 and 3 revealed that sequence divergent DedA domains could be identified. The hydropathy plot shown in Fig. 2B indicates where this DedA domain occurs since according to our TMS numbering system, it always corresponds to TMSs 6–8 (Fig. 2A–C). Putative TMSs that are believed to be homologous to putative TMSs 1–12 in the clusters 1 and 3 proteins are so indicated in Fig. 2B. Additional peaks of hydrophobicity, always less well conserved, are indicated by roman numerals in accordance with the convention used previously [22]. In Fig. 2B, there are three peaks (putative TMSs I–III) that are relatively poorly conserved, in agreement with Prakash et al. [22], preceding the first peak common to clusters 1 and 3 (peak 1 in Fig. 2B). Then follow 6 TMSs, the last of which is within the DedA domain. Two non-conserved peaks (IV and V) follow. Then two well-conserved peaks within the DedA domain are found (peaks 7 and 8). 
These are followed by two non-conserved peaks (VI and VII). Each of these two non-conserved regions occurs in only four proteins: the first insertion (alignment positions 425–500) was in Dps9, Pmu9, Msp17 and Bbr1, while the second insertion (alignment positions 500–610) occurred in Rpa8, Mma3, Ppu2 and Spo23. Each of these sets of four proteins occurs together in one cluster on the clustal-generated tree (see Fig. TRAP-S1B). Following peak VII are the last three well-conserved peaks (9–11), corresponding to the last four well-conserved putative TMSs. These are followed by four additional poorly conserved peaks (VIII–XI). The first three of these are found in many of the proteins while the last peak is present in only a few (see Figs. TRAP-S3A and S3B). The occurrence of four poorly conserved putative C-terminal TMSs is in agreement with the results of Prakash et al. [22]. It should be noted that the numbers and positions of the extra central TMSs differ in the two studies, as expected since different protein sets were analyzed. Prakash et al. [22] included proteins from all three clusters. The last hydrophobic peak (past alignment position 1000) was found in seven proteins: Dps9, Pmu9, Msp17, Bbr1, Wsu3, Uba1 and Cef2. The first four of these proteins are the same ones that have the first insert including TMSs I–III, and the first six of these proteins cluster together on the clustal-generated tree (Figs. TRAP-S1B and S3B). The plot shown in Fig. 2B makes structural sense, since insertion of an even number of TMSs allows the remaining conserved portions of the proteins to retain their original topologies.

3.4. Kazal-2 domains in proteins of the OAT family (TC #2.A.60)

Secondary carriers of the organic anion transporter (OAT) family (TC #2.A.60) catalyze the Na+-independent facilitated transport of organic anions and cations such as prostaglandins, bile acids, steroid conjugates, thyroid hormones, anionic oligopeptides, drugs, toxins and other xenobiotics [35].
OAT transporters are found only in animals although the OAT family belongs to the Major Facilitator Superfamily (MFS), which is ubiquitous [8,9]. The Kazal-type serine protease inhibitor domain (Kazal-2), found in serine protease inhibitors as indicated by the name, is also found as an extracellular part of agrins, proteins involved in maturation of neuromuscular junctions in vertebrates [36]. These domains, of 48 amino acyl residues, mediate protein–protein interactions with several other types of proteins, thereby forming structural complexes [37]. The crystal structure of the insect-derived double domain Kazal inhibitor, rhodniin, has been solved complexed to thrombin [38].

We have identified Kazal-2 domains in OAT family transporters. Of 123 OAT family homologues (Table OAT-S1), all but eight contain a recognizable Kazal-2 domain. The multiple alignment of these protein sequences is shown in Figure OAT-S1. Kazal domains are found in an internal location between putative TMSs 9 and 10 (see Fig. 3A). They therefore correspond to expectation in being localized to a site that is predicted to be on the extracytoplasmic surface of the membrane. The average hydropathy plot shown in Fig. 3A (in which the poorly conserved N-terminal and C-terminal regions of the alignment were omitted) reveals twelve conserved peaks (1–12) that might correspond to transmembrane α-helical spanners (TMSs).

Fig. 2. Average hydropathy (top) and similarity (bottom) plots for the three clusters (A, cluster 1; B, cluster 2; and C, cluster 3) of the large transmembrane bacterial subunits of TRAP-T transporters. These plots are based on the multiple alignments shown in Figures TRAP-S2A-4A on our website. Format of presentation and the programs used are as for Fig. 1. In all three figures, Arabic numbers refer to the twelve conserved peaks of hydrophobicity while Roman numerals refer to the poorly conserved peaks. The positions of the DedA domain encompass putative TMSs 6, 7 and 8.

Fig. 3. (A) Average hydropathy (top) and similarity (bottom) for all sequenced members of the OAT family included in this study. The entire conserved hydrophobic domain (multiple alignment segment 1000–3500) is shown, but the long N- and C-terminal extensions, present only in a few proteins, were eliminated for clarity. The plots are based on a CLUSTAL X alignment shown in Figure OAT-S1 [17] and were derived using the AveHAS program [19]. Positions of Kazal-2 and PDZ domains are indicated. (B) Clustal-generated tree of OAT family members. The tree shows 8 primary clusters, several of which can be subdivided into proteins from specific animal types. The organismal sources of the proteins in each cluster are indicated. The tree, based on the CLUSTAL X alignment shown in Figure OAT-S1 (proteins listed in Table OAT-S1) [17], was drawn with the TreeView program [63]. The proteins that either possess a very sequence-divergent Kazal-2 domain, or do not possess this domain, are boxed.

These 12 putative TMSs occur in groups of 3, each separated by poorly conserved regions. TMSs 1–3 at the N-terminal region are followed by a long (∼ 200 residue) poorly conserved region present in only one homologue, Cfa3. This 200 residue insert in Cfa3 includes two hydrophobic peaks (peaks A and B in Fig. 3A) that could span the membrane twice, bringing the polypeptide chain back to the same side of the membrane as expected for all other homologues, which lack this inserted sequence. TMSs 4, 5 and 6 are relatively close to each other, but a hydrophilic insert of about 100 aas between TMSs 4 and 5 occurs in just two homologues, Ptr10 and Ptr11. Another large non-conserved insert that may include 2 TMSs (peaks C and D in Fig. 3A) occurs between TMSs 6 and 7.
This insert is about 350 aas long and is found only in Ptr10. Between TMSs 9 and 10 is a ∼ 300 residue insert that includes the Kazal-2 domain. This domain is entirely hydrophilic. Of the various inserts, this one is the best conserved among members of the OAT family; in fact, it is present in almost all of the homologues (see below). Finally, within the last three TMSs, a short (∼ 100 aa) insert occurs between TMSs 10 and 11. Two hydrophobic peaks (peaks E and F in Fig. 3A) may correspond to transmembrane segments. These occur in many but not all of the OAT family members. It is worthy of note that each of the inserts that include putative TMSs has those TMSs in pairs so as not to disrupt the normal topology of the protein in the membrane. The clustal-generated tree for the 123 OAT family members is shown in Fig. 3B. There are 8 distinct clusters, but several of these can be subdivided because of deep branching, particularly within clusters 2, 5 and 6. As noted above, the OAT family proteins are derived exclusively from animals, but almost all organisms represented contain multiple paralogues. For example, round worms have 6, but interestingly one of these in C. elegans (Cel7) is duplicated internally, containing two full-length OAT family sequences in tandem with the two halves of the protein exhibiting 62% identity. Flies have 8 paralogues, the chicken and the puffer fish have 10, and mammals have 10–17 paralogues. When proteins from any one of these organismal types are examined in the clustal-generated tree, we discover that some of the paralogues are closely related while others are very distantly related. For example, worm paralogues occur in clusters 2, 6 and 8. Insect paralogues occur in clusters 2, 4, 5 and 6. Vertebrate paralogues occur in clusters 1–5 and 7. Further, within cluster 2, subclusters can be identified that are derived exclusively from (a) worms, (b) insects, (c) vertebrates and (d) mammals. 
Also, in cluster 6, the worm homologues segregate from the insect proteins. These observations suggest that most of the paralogues arose by gene duplication events after the organismal types diverged from each other. However, some early paralogues existed before these divergences occurred, giving rise to at least some of the primary clusters.

The Kazal-2 domains analyzed here were found in the vast majority of OAT family members. Only 8 members were found to either lack this domain altogether or possess a sequence between TMSs 9 and 10 that was so divergent from the Kazal-2 sequence that it was unrecognizable. These 8 proteins, present in clusters 1, 2, 3 and 5, are scattered fairly randomly around the tree. They are boxed in Fig. 3B. Exon inspection revealed that the Kazal-2 domain is encompassed within exon 12 in the mouse [39]. Homologues that lack this domain altogether may lack exon 12.

When the homologues that lack the Kazal-2 domain and a few sequence-divergent proteins were eliminated from the multiple alignment, a large percentage of well-conserved cysteines was revealed. The best-conserved motif was:

* * * *
C X3 C X C X6 P V C X6 Y X S X C X A G C
[X = any residue; * = full conservation]

Thus, in this 30-residue sequence, six well-conserved cysteines, including three fully conserved cysteines, were found. These residues presumably possess functional significance, perhaps for the purpose of protein domain interactions, possibly also for formation of intradomain or interdomain covalent disulfide bonds.

3.5. PDZ domains in proteins of the OAT family

The OAT family was described in the previous section. In addition to the Kazal-2 domain, some of these proteins contain short (60–90 residue) hydrophilic domains at their C-termini. These PDZ domains are modular protein interaction domains that play roles in protein targeting and complex assembly [40].
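For readers who wish to scan sequences for the best-conserved Kazal-2 motif reported in Section 3.4 (C X3 C X C X6 P V C X6 Y X S X C X A G C), the motif can be written as a regular expression. The sketch below is illustrative only: it treats every stated position as strictly required, which is an assumption (the text describes the motif as best conserved, not invariant), and the test sequence is synthetic rather than a real OAT family protein.

```python
import re

# Regular-expression form of the motif as printed in the text,
# where X = any residue and Xn = n consecutive residues of any type.
# Assumption: every named residue is required, so divergent domains are missed.
KAZAL2_MOTIF = re.compile(r"C.{3}C.C.{6}PVC.{6}Y.S.C.AGC")


def find_motif(sequence: str):
    """Return (start, end, matched_text) for each non-overlapping match.

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    return [(m.start(), m.end(), m.group())
            for m in KAZAL2_MOTIF.finditer(sequence.upper())]


# Synthetic sequence built only to exercise the pattern (not a real protein).
synthetic = (
    "MK"
    + "C" + "GDS" + "C" + "T" + "C" + "NLTEHS"
    + "PVC" + "GSDGIT"
    + "Y" + "L" + "S" + "P" + "C" + "H" + "AGC"
    + "LE"
)

hits = find_motif(synthetic)
cysteine_count = hits[0][2].count("C") if hits else 0
```

A position-specific profile would be more tolerant of sequence divergence than this strict pattern; the regular expression is used here only because it maps directly onto the notation in the text.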
PDZ stands for (1) post-synaptic protein, PSD-95, (2) the Drosophila septate junction protein, Discs-large, and (3) the tight junction protein, ZO-1. PDZ-containing proteins are found ubiquitously in prokaryotes and all major classes of eukaryotes. Many hundreds have been identified due to genome sequencing. Some of these proteins contain more than one PDZ domain. Critical for OAT family protein interactions is a C-terminal X-(S/T)-X-Hy tetrapeptide sequence where X = any residue and Hy = a hydrophobic residue [41]. Mutations in PDZ domains of some human proteins can give rise to severe diseases [40]. The high-resolution 3-dimensional structures of several PDZ domains complexed to their cognate ligands have been solved [see Ref. [40] for a review].

Wang et al. [41] have recently characterized the specific function of the PDZ domain in one of the Oat carriers, Oatp1a1 (TC #2.A.60.1.1). They showed that this domain is required for interaction with the PDZK1 protein, a major scaffold protein of the epithelial cell membrane [42]. Oatp1a1 binds preferentially to the first and third PDZ binding domains in PDZK1 [41]. Without PDZK1, Oatp1a1 proved to localize to intracellular membranes instead of its normal basolateral plasma membrane location [41]. Thus, association of Oatp1a1 with PDZK1 confers upon this transporter its proper subcellular localization in the basolateral membranes of epithelial cells.

We identified OAT family members with C-terminal PDZ domains, and discovered that they fall into vertebrate clusters 1 and 7 in the tree shown in Fig. 3B. However, not all proteins of these two clusters bear PDZ domains. Only four proteins in cluster 7 (Gga4, Cfa4, Bta3 and Ppy2) contain this domain. The last 3 of these proteins cluster together, but Rno26, which also clusters with these proteins, apparently lacks the PDZ domain. Most proteins in cluster 1 contain the PDZ domain.
Those that lack it include Gga2, Xla1, Tni3 and Tni2, which cluster loosely together. Also, Ptr11 and Hsa8, which cluster together, and Gga8, Tmi4, Gga6 and Dre1, which cluster loosely together, lack the PDZ domain. Thus, proteins with or without the PDZ domain generally cluster together, but with apparent exceptions. We suggest that association of OAT family proteins with the PDZ domain was a relatively late evolutionary event, designed for the purpose of membrane targeting [41].

3.6. USP domains in proteins of the ABT family (TC #2.A.3.6) within the APC superfamily (TC #2.A.3)

The Amino Acid-Polyamine-Organocation (APC) superfamily of transport proteins includes members that function as solute:cation symporters and solute:solute antiporters [43]. They occur in bacteria, archaea, unicellular eukaryotic protists, slime molds, fungi, plants and animals [44]. They are thus essentially ubiquitous. They vary in length, being as small as 350 residues and as large as 850 residues. The smaller proteins are generally of prokaryotic origin while the larger ones are of eukaryotic origin [45]. Most of them possess twelve transmembrane α-helical spanners but may have a re-entrant loop between TMSs 2 and 3 [46].

The universal stress protein of E. coli, UspA, is a small cytoplasmic protein whose expression is enhanced when the cell is exposed to stress agents [47]. UspA enhances the rate of cell survival during prolonged exposure to such conditions and may provide a general “stress endurance” activity [48]. The crystal structure of Haemophilus influenzae UspA [49] reveals an alpha/beta fold similar to that of the Methanococcus jannaschii MJ0577 protein which binds ATP [48]. The UspA proteins of E. coli and H. influenzae have not been shown to possess ATP-binding activity.

The “universal stress protein” (USP) domain is found in various proteins by themselves or fused to other domains in single copy or in duplicate.
It occurs in a variety of transport, regulatory and catalytic proteins. For example, it is present in single copy at the N- or C-termini of protein kinases (e.g., ZP_00400544) and at the C-termini of sensor histidine kinases (e.g., ZP_00233092) where the entire protein is sometimes duplicated. In the universal stress proteins cited above, this domain alone is present. It is also present in tandemly duplicated copies at the C-termini of chloride carriers of the ClC family (e.g., ZP_00516804; TC #1.A.11; Ref. [50]) and Na+/H+ antiporters and K+ efflux systems of the CPA2 family (e.g., BAB72614; TC #2.A.37; Ref. [21]).

We identified single and tandemly duplicated USP domains in members of one of the families within the APC superfamily [44]. This family is the archaeal/bacterial transporter (ABT) family (TC #2.A.3.6) (see Table ABT-S1). Some, but not all, members of the family contain C-terminal, single, or tandemly duplicated USP domains. None of the members of this family are functionally characterized. Another APC superfamily homologue, a parachlamydial protein (YP_007261) with a single C-terminal USP domain, proved to be distantly related to the 2.A.3.1.4 GABA (GabP) permease of E. coli in the amino acid transporter (AAT) family.

Based on the multiple alignment shown in Figure ABT-S1, the average hydropathy and similarity plots for the ABT family were derived as shown in Fig. 4A. Twelve hydropathy peaks are all well conserved except for the last two (TMSs 11 and 12). Moreover, all of the cytoplasmic loops as well as several of the extracytoplasmic loops are well conserved. Following the hydrophobic transmembrane domain is a stretch of about 300 residues where the tandemly duplicated USP domains can be found. As indicated in the plot, this region is very poorly conserved because it is lacking in most of the homologues.

The clustal-generated tree for the ABT family, also based on the multiple alignment shown in Figure ABT-S1, is shown in Fig. 4B.
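An average hydropathy plot of the kind shown in Fig. 4A can be illustrated with a minimal sketch: hydropathy is averaged down each column of a multiple alignment and then smoothed along the alignment with a sliding window. The details of the AveHas program are not given in this section, so the Kyte-Doolittle scale, the window size of 19 and the function names below are all assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy values. The scale is an assumption of this sketch;
# the scale used by the AveHas program is not stated in the text.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment):
    """Mean hydropathy of each alignment column, ignoring gaps and unknown symbols."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    means = []
    for col in range(length):
        values = [KD[seq[col]] for seq in alignment if seq[col] in KD]
        # A column consisting only of gaps contributes a neutral value of 0.0.
        means.append(sum(values) / len(values) if values else 0.0)
    return means


def smooth(values, window=19):
    """Centered sliding-window mean; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def average_hydropathy(alignment, window=19):
    """Smoothed average hydropathy along a multiple alignment (hypothetical helper)."""
    return smooth(column_hydropathy(alignment), window)
```

Peaks in the returned profile correspond to candidate transmembrane segments. A poorly populated region, such as a C-terminal extension present in only a few homologues, is averaged over few sequences and should therefore be read together with the similarity plot.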
All but two of the proteins possessing the duplicated USP domains are found in a single cluster, cluster 1. However, one of the members of this cluster, Hma4 from Haloarcula marismortui, lacks these domains altogether. One other protein, Dac1 from the δ-proteobacterium, Desulfuromonas acetoxidans, the only bacterial protein in this cluster, possesses a hydrophilic C-terminal extension of the same size as most of the archaeal proteins of this cluster, but surprisingly, it contains first, a fructose-specific IIA domain (IIAFru) of the bacterial phosphotransferase system (PTS; Ref. [13]) followed by a TrkA-C domain (see above). Other paralogues of H. marismortui as well as other archaeal proteins in this cluster possess the full-length duplicated USP domains (see boxed proteins in Fig. 4B).

Two other proteins in the tree shown in Fig. 4B were also found to possess C-terminally duplicated USP domains. One is the archaeal Afu2 protein from Archaeoglobus fulgidus (cluster 5) and the other is the Rba1 protein from the planctobacterium, Rhodopirellula baltica (branch 8). These proteins branch distantly from other members of this family, suggesting that the hydrophilic domains became associated with these evolutionarily divergent proteins independently of the homologous domains in the cluster 1 proteins.

3.7. TrkA-C domains in proteins of the aspartate:alanine exchanger (AAE) family (TC #2.A.81)

No publication has previously described the characteristics of the AAE family and its constituent members. Consequently, in this section, we not only describe the uniquely situated duplicated TrkA-C domains, we also provide a brief description of this family.

A single functionally characterized protein, the aspartate:alanine exchanger (AspT) of the Gram-positive lactic acid bacterium, Tetragenococcus halophila D10 (Pediococcus halophilus), serves to characterize the AAE family [51].
This organism takes up L-aspartate via AspT, decarboxylates it to L-alanine and CO2 in the cytoplasm using L-aspartate β-decarboxylase (AspD), and exports the L-alanine in a 1:1 exchange reaction with L-aspartate. AspT is a hydrophobic protein of 543 aas and 10–12 putative TMSs. This protein has many bacterial homologues of unknown function and one very distant homologue in the archaeon, Halobacterium sp. strain NRC-1. This last protein (384 aas; 10–12 putative TMSs; AAC82885) is the only AAE family member from an archaeon in the NCBI database, and there are none from eukaryotes (see below).

Because one more negative charge is brought in (aspartate) than is exported (alanine), the exchange transport process may result in net charge movement, creating a membrane potential, negative inside. Further, decarboxylation of aspartate consumes a scalar proton and thus generates a pH gradient (basic inside). The resultant pmf can drive ATP synthesis via the F-type ATPase (TC #3.A.2). Other such exchangers generating a pmf are the prototypical oxalate/formate exchanger of the MFS (TC #2.A.1) as well as glutamate/γ-aminobutyrate, malate/lactate, citrate/lactate and histidine/histamine exchangers (for references see Ref. [51]).

Each of the two hydrophobic halves of AAE family proteins contains six peaks of hydropathy that may correspond to 5 or 6 TMSs. The two halves exhibit almost unprecedented degrees of sequence similarity, suggesting that they arose by an intragenic duplication event relatively recently. If the six hydrophobic peaks correspond to 5 TMSs, there may be a single re-entrant loop between TMSs 4 and 5 [26,27].
Although this characteristic resembles that of one well-studied putative member of the IT superfamily, the CitS citrate transporter of Klebsiella pneumoniae in the CCS family (TC #2.A.24) [22,26,52], we could not establish a statistically significant relationship between these two groups of proteins. Interestingly, in all but one of the bacterial homologues, but not in the archaeal or the Fusobacterium nucleatum homologue, the two hydrophobic repeat elements are separated by two tandem hydrophilic TrkA-C domains [23,24] that must have arisen by an independent duplication or insertional event.

Table AAE-S1 on our website (http://biology.ucsd.edu/~msaier/supmat/AAE) presents the members of the AAE family in alphabetical order according to species name. All but one of the members of this family derive from bacteria, but several bacterial kingdoms are represented. Most are derived from proteobacteria of the α, β, γ and δ-subclasses. Others are found in low and high G + C Gram-positive bacteria (Firmicutes and Actinobacteria, respectively) as well as Bacteroides species, Chloroflexus aurantiacus, Fusobacterium nucleatum and Rhodopirellula baltica. Several species have multiple paralogues. Within the Bacteroides group, Porphyromonas gingivalis has two, Bacteroides fragilis has 3, and B. thetaiotaomicron has 4, the most of any species examined. Among the proteobacteria, organisms listed in Table AAE-S1 can encode within their genomes from one to three paralogues. The α- and β-proteobacteria represented can have 1–3 paralogues while the γ- and δ-proteobacteria have just 1 or 2. The two Firmicutes represented have just one member while actinobacteria can have either one or two. All three Corynebacterial species have two. Finally, all other bacteria represented have only one.

These proteins are almost all large, between 516 and 580 residues.
This large size reflects the presence of the two tandem TrkA-C domains between the two putative 5 or 6 TMS repeat units (mentioned above). The two TrkA-C domains are each about 70 residues long. The central hydrophilic region is usually about 220 residues long while the two flanking hydrophobic domains are about 170 residues long. These two hydrophobic and two hydrophilic domains thus account for much of the full sizes of most of these proteins.

A few proteins listed in Table AAE-S1 are either shorter or longer than the large majority of homologues. Two of these, Msu1 and Msu2 from Mannheimia succiniciproducens, are half-sized (280 and 297 residues). They correspond to the N- and C-terminal halves of a typical AAE family member, each possessing a single AAE and a single TrkA-C domain. Fused, they comprise a full-length AAE family homologue. Either a sequencing error or a mutation accounts for this split. Two proteins, Dps2 of Desulfotalea psychrophila and Vfi2 from Vibrio fischeri, are large (669 and 624 residues) because they possess N-terminal extensions of 95 and 50 residues, respectively, both of which contain a single hydrophobic putative TMS. The former, but not the latter, exhibits sequence similarity with the N-terminal region of the HlyA protein of E. coli (gi #68520714). Thus, when residues 5–83 of Dps2 were compared with residues 15–94 of HlyD, with the GAP program [53], 36% identity, 50% similarity and a comparison score of 14.3 S.D. were obtained. This is sufficient to establish homology. It is possible that these sequences are important for targeting of the proteins to the secretory (Sec) apparatus for membrane insertion [54].

The sequences of members of the AAE family were multiply aligned with the CLUSTAL X program [17], and this alignment (see Fig. AAE-S1 on our website) was used to generate average hydropathy/similarity plots (Fig. 5A; AveHas program; Ref. [19]) as well as a clustal-generated tree (Fig. 5B; TreeView program; Ref. [55]).
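A comparison score quoted in standard deviations (S.D.), such as the value of 14.3 S.D. given above, is conventionally the number of standard deviations by which the alignment score of the real sequence pair exceeds the mean score obtained when one sequence is randomly shuffled. The sketch below illustrates that calculation only; it is not the GAP program. The simple match/mismatch scoring, the linear gap penalty, the number of shuffles and the function names are assumptions chosen to keep the example short.

```python
import random
import statistics


def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions; a real comparison would
    use an amino acid substitution matrix and affine gap penalties.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def comparison_score_sd(a, b, shuffles=200, seed=0):
    """Score of the real pair, in S.D. above the mean score of shuffled pairs."""
    rng = random.Random(seed)
    real = global_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # preserves composition, destroys residue order
        null_scores.append(global_score(a, "".join(letters)))
    spread = statistics.stdev(null_scores)
    if spread == 0:
        raise ValueError("shuffled scores show no variation; cannot normalize")
    return (real - statistics.mean(null_scores)) / spread
```

Because shuffling preserves amino acid composition, a high value indicates that residue order, and not merely a shared bias toward hydrophobic residues, is responsible for the similarity. This matters when comparing membrane-embedded segments.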
The plot in Fig. 5A reveals one hydrophobic peak (peak 0) at alignment positions 20–60, very poorly conserved, found in only two proteins as noted above. Five peaks of average hydropathy (peaks 1–5) follow peak 0, and these are all well conserved (alignment positions 100–300). A potential semipolar re-entrant loop occurs between putative TMSs 4 and 5 [26,52]. Then follows the central hydrophilic region where the two tandem TrkA-C domains are found. Finally, 5 more peaks of hydrophobicity, including a potential re-entrant loop between putative TMSs 9 and 10, follow. Similarity between peaks 1–5 and 6–10 can be seen; for example, peaks 1, 4, 6 and 9 are the most hydrophobic while the probable re-entrant loops, peaks R1 and R2, are the least hydrophobic, exhibiting semipolar characteristics.

These observations prompted us to examine segments encompassing TMSs 1–5 and TMSs 6–10 to determine if they are homologous. The IC [18] and GAP [53] programs were used for this purpose. The results of one comparison are shown in Figure AAE-S2 on our website. The startling comparison score value of 23 S.D. is among the greatest observed for the two repeat units in any family within the TCDB (see Refs. [22,56]). The high values obtained suggest that the intragenic duplication event that gave rise to these putative 10 TMS proteins must have occurred relatively recently in evolutionary history. They also suggest that the two halves of these proteins serve similar functions and have not yet acquired a high degree of functional selectivity.

Fig. 4. (A) The average hydropathy (top) and similarity (bottom) plots (AveHas program) for the ABT family within the APC superfamily. (B) The clustal-generated tree (TreeView program) for the ABT family. The plots are based on a CLUSTAL X multiple alignment shown in Figure ABT-S1 using the proteins tabulated in Table ABT-S1. In (B), the proteins possessing the C-terminal tandemly duplicated USP domains are boxed.

Fig. 5. (A) Average hydropathy and similarity plots for the proteins of the AAE family. The proteins (Table AAE-S1) included in the CLUSTAL X alignment (Fig. AAE-S1), upon which this plot was based, can be viewed on our website (http://biology.ucsd.edu/~msaier/supmat/Domains). The putative 5 TMS repeat unit (putative TMSs 1–5 or 6–10) plus a proposed semipolar re-entrant loop (R1 or R2) [26,27] are indicated as are the duplicated TrkA-C domains [23,24]. (B) Clustal-generated tree for the AAE family. The TreeView program was used to draw the tree, based on the CLUSTAL X alignment shown in Figure AAE-S1 on our website. The proteins comprising the eleven clusters and their abbreviations are presented in Table AAE-S1.

The clustal-generated tree for the AAE family proteins is shown in Fig. 5B. It can be seen that the various clusters exhibit a limited diversity of organismal types from which these proteins were derived. Thus, clusters 1 and 7 proteins are all from γ-proteobacteria, and all proteins in each of these two clusters are from different organisms. The only exception is the Msu1 and Msu2 proteins which may represent the two halves of a single protein, split because of a sequencing error or an authentic mutation. We suggest that each of these clusters represents a set of orthologues although the two clusters are clearly nonorthologous to each other. In a like fashion, cluster 4 proteins are derived from the only two Firmicutes represented, and cluster 6 proteins are all derived from members of the Bacteroides. Of these proteins, only Porphyromonas gingivalis has two paralogues. Clusters 10 and 11 each include a single protein, from the only member of the Planctomycetacia and from Fusobacterium nucleatum, respectively.
Other clusters consist of homologues from broader groups of bacteria, and each of these clusters is likely to comprise a different set of functionally distinct proteins. Thus, cluster 2 includes proteins from β-, γ- and δ-proteobacteria, cluster 3 is from α- and β-proteobacteria, cluster 5 is from α- and β-proteobacteria as well as Bacteroides, cluster 8 is from Bacteroides as well as a δ-proteobacterium, and cluster 9 is from β- and δ-proteobacteria as well as Chloroflexi and Actinobacteria.

3.8. Multiple domains in proteins of the eukaryotic-specific organic solute transporter (OST) family (TC #2.A.82)

Members of the OST family have been characterized from humans, mice and the little skate, Raja erinacea [57,58]. Each of these characterized systems consists of two polypeptide chains, α and β. For the human system, the α-subunit (Ostα) is of 340 aas with 7 putative TMSs while the β-subunit (Ostβ) is of 128 aas with 1 putative TMS near the N-terminus (residues 40–56). Neither Ostα nor Ostβ alone has activity, but the two together transport a variety of organic compounds, mostly anions. Transport of estrone-3-sulfate is Na+-independent, ATP-independent, saturable and inhibited by other steroids and anionic drugs. Bile acids, taurocholate, digoxin and prostaglandin E1 are also substrates [57,58]. The two proteins are highly expressed in many human tissues. The β-subunit is not required to target the α-subunit to the plasma membrane, but coexpression of both genes is required to convert Ostα to the mature glycosylated form in enterocyte basolateral membranes and possibly for trafficking through the Golgi apparatus [59].

Table OST-S1 on our website (http://biology.ucsd.edu/~msaier/supmat/OST) presents the members of the Ostα family while Table OST-S2 presents the recognized members of the Ostβ family. All β-subunits are derived from vertebrate animals, but the α-subunits are derived from animals, plants, fungi and protozoans.
While only the chicken appears to have more than one Ostβ, many organisms encode multiple Ostα's within their genomes. Of the organisms with fully sequenced genomes, plants have 7 or 8, animals have 3–6, fungi have 1 or 2, and the single protozoan represented has just one. The α-subunits vary in size from about 250–800 aas while the β-subunits are either of about 110–130 aas or 280–380 aas. The mammalian Ostβ proteins are of the smaller size. The clustal-generated tree of the Ostα family is shown in Fig. 6A while that for the Ostβ family is shown in Fig. 6B. In Fig. 6A, we find four clusters of animal homologues, three clusters of plant proteins, two clusters of fungal proteins, and the single protozoan protein is on a branch by itself. This fact suggests that early intergenic duplication events gave rise to the primary clusters within each eukaryotic kingdom. These duplication events may have occurred at about the time the major eukaryotic kingdoms arose. Speciation accounts for separation of clusters according to kingdom. The phylogenetic analysis suggests that there has been no horizontal transfer of genetic material encoding these homologues between kingdoms. Interestingly, fungal cluster 1 has far more members than fungal cluster 2, and the cluster 2 proteins are exclusively from organisms that have two paralogues, one of which is in fungal cluster 1. Thus, in fungi, there have been no late gene duplication events within the OST family. There was a single early gene duplication event, and subsequently one of the paralogues (those in cluster 2) was lost although those in cluster 1 were not. Many of the clusters show phylogenetic grouping in accordance with organismal phylogeny, again suggesting orthology for proteins from different species within a specific cluster. The Ostβ tree (Fig. 6B) shows a pattern virtually identical to that of one subcluster in the Ostα animal 3 cluster (Fig. 6A). 
The transporters in this subcluster presumably require the participation of a β-subunit. However, other systems represented in Table OST-S1 either function without a β-subunit or use a sequence-divergent β-subunit that was not recognized in our BLAST searches.

The average hydropathy and similarity plots shown in Fig. 7A and B were generated from the multiple alignments shown in Figures OST-S1 and OST-S2 on our website, respectively (http://biology.ucsd.edu/~msaier/supmat/Domains). The Ostα alignment and the plots presented in Fig. 7A show 7 well-conserved peaks of hydrophobicity. Most of these peaks are sharp, but peak 5 looks like a double peak with good conservation. Examination of the multiple alignment revealed that this double peak is due to a 4-residue insertion in the center of this putative TMS in one of the proteins. Several inter-TMS and post-TMS regions are also quite well conserved. The inter-TMS regions between putative TMSs 1 and 2, and TMSs 5 and 6 are much better conserved than those between TMSs 2 and 3, 3 and 4, 4 and 5, and 6 and 7. Thus, assuming a 7 TMS topology, we predict, on this basis, that cytoplasmic loops tend to be best conserved, that the N-termini of these proteins are in the extracellular medium and the C-termini are in the cytoplasm. The positive inside rule [60–62] gave the same prediction. We also suggest that these putative 7 TMS proteins arose by an intragenic duplication event in which a 3 TMS-encoding genetic element duplicated to give 7 TMSs (with the extra TMS sandwiched in between the two repeat sequences) as has been documented for the Brho and LCT families [63].

Fig. 6. Clustal-generated trees for the protein constituents of the OST family, Ostα (A) and Ostβ (B). The protein abbreviations are as presented in Tables OST-S1 and OST-S2, respectively, and the multiple alignments upon which these trees were based can be viewed in Figures OST-S1 and OST-S2, respectively. The TreeView program [55] was used to draw the tree.

Examination of the multiple alignment for the Ostα subunits revealed that over a dozen homologues were either truncated or internally deleted, no two exhibiting the same apparent deletions. These atypical sequences, most believed to be artifactual due to sequencing errors, incorrect initiation codon assignment or lack of exon recognition, were excluded from the alignment, yielding a more coherent plot.

Fig. 7. Average hydropathy and similarity plots of (A) Ostα homologues and (B) Ostβ homologues. The proteins included are listed in Tables OST-S1 and OST-S2, respectively, while the multiple alignments upon which these plots were based are shown in Figures OST-S1 and OST-S2, respectively. The AveHas program [19] was used to generate these plots.

The most conserved regions of the Ostα proteins were used to design consensus sequences. These corresponded to TMSs 1 and 2 and TMSs 5 and 6. These sequences are:

(1) H L X (H N) Y (T S) (K N) P (E N) (E) Q R Hy5 P (I V) Y (A S) (L F) (D S) S (W F) (V L) (S G) L (alignment positions 321–357)

*
(2) (R K) (E D) X L X P (F Y) (R K) P Hy2 K F Hy3 K Hy3 F L (S T) (F Y) W Q (alignment positions 587–612)

(Alternative residues at a single position are in parentheses; Hy = any hydrophobic residue; X = any residue; * = fully conserved)

Examination of the Ostβ multiple alignment revealed that the β-subunits show poorly conserved N-terminal regions, and that a single region in the β-subunit is well conserved. This region includes the single putative TMS with two fully conserved residues. The consensus sequence for this region is:
The consensus sequence for this region is: * * ðS TÞ P W N Y S Hy4 ðS GÞ Hy4 ðR G AÞ R ðS NÞ I X A N R ðN KÞ R K ðalignment positions: 302334Þ 1572 R.D. Barabote et al. / Biochimica et Biophysica Acta 1758 (2006) 1557–1579 Several of the larger Ostα homologues proved to possess C-terminal hydrophilic domains that are homologous to those in other proteins. For example, Cne1 from Cryptococcus neoformans (796 aas; gi 50255021) has a large domain (residues 426–791) that exhibits significant sequence similarity with the huge dynein-related AAA-type ATPase (midasin; Mdn1p of Saccharomyces cerevisiae) that forms a complex with the Rix1 protein complex. This large complex mediates the ATP-dependent remodeling of 60S ribosomal subunits and their subsequent export from the nucleoplasm to the cytoplasm [64,65]. Another Ostα homologue, Tni4 from Tetraodon nigroviridis (869 aas; gi 47209270) exhibits 14 putative TMSs and possesses a C-terminal Ostα domain and an N-terminal membrane-bound O-acyl transferase (MBOAT) domain. The MBOAT family includes proteins from all eukaryotic kingdoms. The protein-cysteine N-palmitoyl transferase, porcupine [66] is a well-characterized example. Another Ostα homologue, Tni5 (842 aas; gi 47227190) shows an extensive region (residues 549–814) containing numerous repeat units of about 27 aas. This region is homologous to a 300 residue domain in the human polycystic kidney disease protein (2318 aas; gi 62665258) [67–69]. Finally, the Uma protein from Ustilaga maydis (969 residues; gi 46100056) shows a C-terminal region (residues 684–855) that is homologous to the N-terminal region (residues 7–169) of human kinesin (208 aas; gi 51464016) [70] as well as with a central domain (residues 729–884) of the plasmodium erythrocyte binding protein that recognizes the human Duffy glycoprotein blood group determinant (1070 aas; gi 1346921). 
These results show that eukaryotic Ostα transporter homologues can be fused to any of a variety of conserved protein domains that may function in catalysis (e.g., the MBOAT domain) or in macromolecular recognition.

The two Ostβ paralogues from the chicken, Gga1 and Gga2, and the fish homologue, Dre1 from Danio rerio, are substantially larger than the mammalian homologues (see Table OST-S2). One of these, Gga2, has an N-terminal ricin-type β-trefoil domain, a carbohydrate-binding domain, formed by a presumed intragenic triplication event [71]. The conserved hydrophobic domain occurs in the C-terminal region of this protein at residues 290–315. The other chicken Ostβ paralogue, Gga1 (286 aas; gi 50753202), and the fish homologue, Dre1 (332 aas; gi 68369922), have their single TMSs centered at about residue 200. Gga1 does not exhibit a CDD-recognized conserved domain, but Dre1 clearly has the ricin-like domain. All 3 homologues proved to be homologous throughout most of their lengths, and consequently, although that in Gga1 is sequence divergent, they all possess N-terminal ricin domains. We conclude that the three non-mammalian Ostβ subunits possess a ricin-like carbohydrate-binding domain, possibly for the purpose of recognizing a carbohydrate chain in Ostα or another protein. The ricin domain is proposed to function in glycoprotein recognition although the possibility of a glycolipid recognition function cannot be ruled out.

3.9. Repeat domains in members of the 4 TMS multidrug endosomal transporter (MET) family (TC #2.A.74)

Proteins of the MET family have been sequenced and characterized from mice and humans [72]. These two orthologues have the same length and sequence. The two functionally characterized human and mouse orthologues are located in late endosomal, Golgi and/or lysosomal membranes. Their transcripts are expressed in many tissues and cell types.
The C-termini of these proteins possess hydrophilic domains containing several tyrosine-based sorting motifs, YXXHy, where Hy represents a bulky hydrophobic residue. This sequence directs proteins to intracellular compartments in eukaryotes. Substrates of the mouse MET (also called “mouse transporter protein,” MTP) include thymidine, both nucleoside and nucleobase analogues, antibiotics, anthracyclines, ionophores and steroid hormones [73,74]. MET transporters may be involved in the subcellular compartmentation of steroid hormones and other compounds. Drug sensitivity by mouse MET is regulated by compounds that inhibit lysosomal function, interfere with intracellular cholesterol transport, or modulate the multidrug resistance phenotype of mammalian cells [72–74]. Thus, MET family members may compartmentalize diverse hydrophobic molecules, thereby affecting cellular drug sensitivity, nucleoside/nucleobase availability and steroid hormone responses. The mode of transport and energy-coupling mechanism, if any, have not been investigated, but a proton antiport mechanism is inferred.

A related protein is the retinoic acid-inducible E3 protein, associated with lymphoid, erythroid and myeloid tissues but not nonhematopoietic tissues of the mouse [75,76]. The E3 protein is 261 amino acids long and exhibits 5 putative TMSs. Both MTP and the E3 protein have human homologues. Three TMS homologues found in the related flatworms, Schistosoma mansoni and S. haematobium, are called the “tri-spanning orphan receptors,” TorE [77].

Table MET-S1 on our website (http://biology.ucsd.edu/~msaier/supmat/MET) presents the proteins of the MET family. Using the BLAST search tool [16] with the human MTP as the query sequence and several iterations, many proteins were obtained. A number of these proteins proved to be randomly truncated or incomplete, and these were eliminated. Redundant sequences were also eliminated prior to construction of the multiple alignment.
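The tyrosine-based sorting motif YXXHy described above lends itself to a simple pattern search over the C-terminal hydrophilic region of a protein. The following is a minimal sketch under stated assumptions: the set of residues counted as "bulky hydrophobic" (L, I, M, F, V) is our choice for illustration, and the function name, its parameters and the sequences used with it are hypothetical.

```python
import re

# Residues treated as "bulky hydrophobic" (Hy). This set is an assumption of
# the sketch; the text does not enumerate the residues allowed at this position.
BULKY_HYDROPHOBIC = "LIMFV"

# A lookahead is used so that overlapping motifs are all reported.
MOTIF = re.compile(rf"(?=(Y[A-Z]{{2}}[{BULKY_HYDROPHOBIC}]))")


def find_sorting_motifs(sequence, tail_length=None):
    """Return (1-based position, motif) pairs for YXXHy matches.

    If tail_length is given, only the last tail_length residues are searched,
    which restricts the scan to a C-terminal region.
    """
    seq = sequence.upper()
    offset = 0
    if tail_length is not None and tail_length < len(seq):
        offset = len(seq) - tail_length
    return [
        (match.start() + offset + 1, match.group(1))
        for match in MOTIF.finditer(seq[offset:])
    ]
```

For a made-up sequence such as "MAYQPLGGYTRI", the scan reports YQPL at position 3 and YTRI at position 9. A four-residue pattern of this kind occurs frequently by chance, so matches are candidates for experimental testing, not evidence of a functional sorting signal.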
We paid particular attention to organisms with fully sequenced genomes, but several others are included in Table MET-S1. All organisms represented except for Trypanosoma cruzi are animals. Since the T. cruzi sequence and those from Schistosoma species are nearly identical to the human Hsa5 protein, we suggest either that contaminating DNA encoding the gene products is responsible, or that the encoding genes have been acquired by these pathogens from a mammal via a recent horizontal gene transfer event as originally proposed by Inal [78].

Examination of the proteins listed in Table MET-S1 reveals that many organisms have two or three paralogues. Thus, three are found in D. melanogaster, H. sapiens and T. nigroviridis. A multiple alignment was generated using the CLUSTAL X program (Fig. MET-S1) [17] and from it, an AveHas plot (Fig. 8A) and a clustal-generated tree (Fig. 8B) were generated.

Fig. 8. (A) The average hydropathy (top) and similarity (bottom) plots (AveHas program; Ref. [19]) for the eukaryotic MET family and (B) the clustal-generated tree (TreeView program; Ref. [55]) for the MET family. The plots are based on the multiple alignment shown in Figure MET-S1. The protein abbreviations are presented in Table MET-S1.

Most of the proteins listed in Table MET-S1 have sizes of 230–290 aas, but some are substantially larger. All three roundworm proteins are large (562–568 aas). The same is true of two of the fruitfly proteins (432–474 aas), one of the P. troglodytes proteins (481 aas), the S. japonicum homologue (414 aas) and one of the T. nigroviridis proteins (355 aas).

As shown in Table MET-S1, a majority of MET family homologues have 4 putative TMSs, but several are predicted to have 3 or 5. Eight of them are predicted to have 5 TMSs, and six have 3 putative TMSs.
These topological differences are probably real rather than artifacts of sequencing or annotation errors.

The multiple alignment revealed tremendous diversity among the proteins. No single residue was conserved in all of the proteins (see Fig. MET-S1). Fig. 8A shows the average hydropathy plot (top) and average similarity plot (bottom). The four well-conserved hydrophobic peaks are labeled 1–4. Maximal conservation is observed not only for these peaks, but immediately following them, particularly following peaks 1 and 4 (Fig. 8A). Three moderately hydrophilic regions are also quite well conserved.

The MET family clustal-generated tree is shown in Fig. 8B. There are four principal clusters. Clusters 1 and 4 proteins exhibit 4 putative TMSs while cluster 2 proteins exhibit 5 putative TMSs, and cluster 3 proteins are predicted to have 3 TMSs. In clusters 1–3, there is a single exception to the standard topology for that group of proteins, possibly due to errors in sequencing, exon assignment or initiation codon assignment. While the mouse transporter, MTP (Mmu1), is in cluster 1, the mouse retinoic acid-inducible E3 protein (Mmu4) is in cluster 2. The tri-spanning orphan receptors, also called the complement inhibitory receptors, are present in cluster 3.

As noted above, clusters 1 and 4 proteins have 4 TMSs, cluster 2 proteins have 5 TMSs, and cluster 3 proteins have 3 TMSs. The multiple alignment upon which Fig. 8 was based (Fig. MET-S1) was examined to determine which TMSs correspond. The 5 TMS proteins in cluster 2 have an extra N-terminal TMS which in Fig. 8A occurs at the well-conserved, moderately hydrophobic peak centered at alignment position 310. This peak is labeled “0” in Fig. 8A. The remaining 4 TMSs correspond in position to the 4 TMSs in cluster 1 proteins. The 3 TMS proteins of cluster 3 correspond in position to TMSs 1–3 in the 5 TMS proteins of cluster 2. Thus, all proteins share only TMSs 1 and 2 as labeled in Fig. 8A.
They do, however, have C-terminal hydrophilic extensions. The Sja5 cluster 3 protein with 5 putative TMSs has a 150 residue N-terminal extension with two extra hydrophobic regions. Putative TMSs 3–5 in this protein correspond to TMSs 1–3 in all other cluster 3 proteins.

We examined some of the proteins for “extra” domains. When a short homologue such as Rno5 was used as a BLAST query against the NCBI database, only recognized MET family members appeared. However, when we used a longer protein such as Cel1, which has an N-terminal 4 TMS membrane domain and a long hydrophilic C-terminal extension of about 400 residues, many cytosolic proteins were retrieved. One such protein, peptidylprolyl isomerase G (cyclophilin G) of D. melanogaster (736 aas; similar to human cyclophilins A, B and H; gi 45708892), showed ≥ 7 repeats of about 44 residues that were homologous to residues 320–490 in Cel1. This repeat element, present at least 4 times in Cel1, proved to be exceptionally hydrophilic. The repeat unit was found in proteins from amoebae, fungi, animals and plants. It was not recognized by CDD. Other proteins retrieved in the same BLAST search included a hypothetical protein from Plasmodium yoelii yoelii (931 aas; gi 83314969) which had at least 11 repeats of this segment, a putative protease from Gallus gallus, a hypothetical protein of Trypanosoma cruzi (839 aas; gi 70883026) with at least 13 repeat units, and a fish homologue from Danio rerio (1798 aas; gi 68367999).

3.10. “Extra” domains in ion channel proteins

Recognizable CDD-defined domains could be identified in various ion channel proteins. Using default settings for the NCBI conserved domain database (CDD) search tool (expect cutoff of 0.01 with filter; Ref. [20]), α-helical-type channel proteins of TC class 1.A were screened for extra hydrophilic domains. Domains of interest identified in representative channel-forming proteins are presented in Table 1.
A more detailed description of the proteins, their structures, and their functions can be found in TCDB.

3.11. The voltage-gated ion channel (VIC) superfamily (TC #1.A.1–5)

Within members of the VIC superfamily, we identified several different domains associated with various family members. In families 4 and 5 of TC superfamily 1.A.1, three hydrophilic domains were identified. Thus, for example, in the Akt1 potassium channel protein of Arabidopsis thaliana [79], the CAP_ED domain, a cyclic AMP binding domain found in transcription factors, cyclic nucleotide-dependent protein kinases and cyclic nucleotide-gated ion channels, follows the voltage sensor and channel-forming transmembrane domains. The CAP_ED domain is then followed by several (possibly six) ankyrin repeats (ANK domains) found in a diverse group of proteins. ANK repeats are about 33 residues long and usually occur in 4–20 consecutive copies. The core of this repeat appears to be a helix–loop–helix structure, and these repeats may mediate protein:protein interactions. The KHA domain at the extreme C-terminus of Akt1 may promote tetramerization of the complex (Table 1; Refs. [80,81]).

Other VIC superfamily members include the calcium-activated potassium channel proteins such as human KCNN4 (TC #1.A.1.16.2) [82,83]. These proteins have calmodulin-binding domains (CaMBD) C-terminal to the ion channel domains [84]. In KCNN4, this domain occurs at residues 410–480. Other members of family 16 also bear this domain. Some VIC family members have both CAP_ED and CaMBD domains [85], and thus are sensitive to both cyclic nucleotides and calcium.

Distantly related to the VIC superfamily is the ryanodine/inositol 1,4,5-triphosphate receptor calcium channel (RIR-CaC) family (TC #1.A.3) [56,86]. One such protein, ITPR2 (TC #1.A.3.2.1), has multiple N-terminal MIR domains. These domains, in addition to RIR-CaC family proteins, occur in O-mannosyl transferases [87]. Their function is unknown.
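The domain orders described above can be summarized in the compact notation used in Table 1 (for example, “ANK X5” for five consecutive repeats). The following Python sketch shows one way to derive such a string from a list of domain hits. It is an illustration only: the function name, the tuple layout and the coordinates are hypothetical, and treating the expect cutoff of 0.01 as an inclusive threshold is an assumption.

```python
from itertools import groupby


def architecture(hits, max_evalue=0.01):
    """Return a Table 1-style domain string for one protein.

    hits: iterable of (domain_name, start, end, evalue) tuples
    (a hypothetical layout, not the output format of any real tool).
    """
    # Keep only hits at or below the expect cutoff.
    kept = [h for h in hits if h[3] <= max_evalue]
    # Order domains from N-terminus to C-terminus by start position.
    kept.sort(key=lambda h: h[1])
    parts = []
    # Collapse consecutive copies of the same domain into "NAME Xn".
    for name, run in groupby(kept, key=lambda h: h[0]):
        n = len(list(run))
        parts.append(name if n == 1 else f"{name} X{n}")
    return ", ".join(parts)


# Hypothetical hit list for an Akt1-like channel (coordinates invented).
example_hits = [
    ("KHA", 790, 857, 1e-20),
    ("CAP_ED", 390, 500, 1e-15),
    ("ANK", 520, 552, 1e-5),
    ("ANK", 553, 585, 1e-6),
    ("ANK", 586, 618, 1e-4),
    ("ANK", 619, 651, 0.5),  # discarded: above the cutoff
]
print(architecture(example_hits))  # CAP_ED, ANK X3, KHA
```

Note that the repeat count depends on the cutoff: a weakly scoring repeat is dropped, which may be one reason why the number of ANK repeats in a given protein is reported only approximately.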
Another distantly related family within the VIC superfamily is the transient receptor potential-calcium channel (TRP-CC) family (TC #1.A.4) [56]. Most if not all members of this family contain N-terminal ankyrin repeat (ANK) domains. As noted above, other members of the VIC superfamily contain these hydrophilic protein-protein interaction domains. They may occur in different positions relative to the ion channel domain. Other TRP-CC members (e.g., TRPM7; TC #1.A.4.5.6) have C-terminal alpha-kinase domains [88]. These domains are protein kinase catalytic domains that have little or no sequence similarity to conventional kinases. The family includes myosin heavy chain kinases and elongation factor-2 kinases. Alpha-kinase domains show similarity to enzymes with ATP-GRASP domains. At least some members of the TRP-CC family that have these domains, including TRPM7, are bifunctional with N-terminal channels and C-terminal ser/thr kinase domains. The kinase domain in TRPM7 is essential for channel activity and phosphorylates annexin A1. It therefore serves as a positive effector domain for Ca2+/Mg2+ influx [89–91].

Table 1
“Extra” domains found in representative α-helical-type ion channel proteins (TC class 1.A)

Family TC # a | Family a | Example b | Domain abbreviation c
1.A.1 | Voltage-gated ion channel (VIC) | Cyclic nucleotide-gated potassium channel, AKT1 (TC #1.A.1.4.1) (Q38998) | CAP_ED domain, ANK X5, KHA
1.A.1 | Voltage-gated ion channel (VIC) | Small conductance calcium-activated potassium channel protein 2, KCNN2 (TC #1.A.1.16.1) (Q9H2S1) | CaMBD
1.A.3 | Ryanodine/inositol 1,4,5-triphosphate receptor calcium channel (RIR-CaC) | Inositol 1,4,5-triphosphate receptor, ITPR2 (TC #1.A.3.2.1) (P29995) | MIR X4
1.A.4 | Transient receptor potential protein cation channel (TRP-CC) | Transient receptor potential protein, TRP (TC #1.A.4.1.1) (P19334) | ANK X4
1.A.4 | Transient receptor potential protein cation channel (TRP-CC) | Transient receptor potential cation (Ca2+/Mg2+) channel subfamily M, member 7, TRPM7 (TC #1.A.4.5.6) (Q96QT4) | Alpha-kinase
1.A.5 | Polycystin cation channel (PCC) | Polycystin-1 precursor, PKD1 (TC #1.A.5.1.1) (P98161) | LRR, WSC, PKD, CLECT, PKD X15, REJ, GPS, PLAT
1.A.9 | Ligand-gated ion channel (LIC) | Neuronal acetylcholine receptor protein, alpha-7 subunit precursor, ACHA7 (TC #1.A.9.1.1) (P36544) | Neur_chan_LBD
1.A.11 | Chloride ion channel (ClC) | Voltage-gated chloride channel, GEF1 (TC #1.A.11.1.1) (P37020) | CBS
1.A.11 | Chloride ion channel (ClC) | Chloride channel protein skeletal muscle, ClC-1, CLCN1 (TC #1.A.11.2.1) (P35523) | COG3448

a TC #, family transporter classification number. The family name and abbreviation are provided in column 2.
b Representative proteins bearing the indicated domains are indicated by name, abbreviation, protein TC #, and SwissProt/TREMBL accession number.
c When multiple domains are reported, these are presented in the order in which they occur in the protein (N-terminus to C-terminus). When multiple repeats are present, the number identified is indicated (e.g., X4: four repeats).

A final distant family of the VIC superfamily is the polycystin cation channel family (TC #1.A.5) [56]. One such protein, human polycystin-1 precursor [92], has a size of 4303 aas and multiple domains [93].
These domains include (from N-terminus to C-terminus) (1) at least one leucine-rich repeat (LRR) of unknown function, (2) a yeast cell wall integrity and stress response (WSC) domain, present also in WSC proteins and fungal exoglucanases, (3) a single polycystic kidney disease 1 (PKD1) domain, also present in other proteins such as microbial collagenases, (4) a C-type lectin-binding calcium-dependent carbohydrate-binding (CLECT) domain, (5) 15 more polycystic kidney disease 1 (PKD1) repeats, (6) a receptor of egg jelly (REJ) domain (of unknown function) found also in sperm receptor proteins, (7) the G-protein-coupled receptor proteolytic site (GPS) domain, also present in sea urchin REJ and latrophilin-CL-1, and (8) the polycystin-1, lipoxygenase, α-toxin (PLAT or LH2) domain. This last-mentioned domain, which can exist in single domain proteins, is thought to facilitate access to sequestered membrane or micellar-bound substrates (see CDD for more detailed descriptions).

3.12. The ligand-gated ion channel (LIC) family (TC #1.A.9)

In the ligand-gated ion channel (LIC) family (TC #1.A.9) [94], members consist of N-terminal ligand binding domains (LBD), homologous to solute-binding proteins in prokaryotes that serve as receptors for ABC-type uptake transporters as well as the ligand-binding domains of transcription factors in the LacI/GalR family [94,95]. In LIC family members, these ligand-binding domains are followed by the channel-forming integral membrane domains. This general structural pattern is observed for most if not all LIC family members such as the acetylcholine, serotonin, glycine, glutamate, and γ-aminobutyrate (GABA) receptors [96]. High-resolution X-ray structures of the pentameric acetylcholine receptor (α2βγδ) have been reported, revealing how ligand binding to the LBD domain controls channel conformation and activity [94]. In this case, the LBD is essential for responsiveness of the channel to the presence of a neurotransmitter.

3.13. The chloride ion channel (ClC) family (TC #1.A.11)

The chloride ion channel (ClC) family (TC #1.A.11) has C-terminal cystathionine-β-synthase (CBS) domains similar to those found in many other proteins from organisms of the three domains of life [97]. Although their general function is not known and may be multifaceted, in the synthases, they function in regulation by S-adenosyl methionine (SAM). Proposed functions of CBS domains include (1) multimerization, (2) protein sorting, (3) channel gating, and (4) ligand binding. The recent demonstration that CBS domains bind adenosine-containing ligands (ATP, AMP and SAM) has led to the suggestion that they are sensors of intracellular metabolites and may also mediate signal transduction [97].

4. Discussion

In this bioinformatic analysis, we have identified “extra” domains in known secondary carriers, channels and their homologues. These proteins are all members of established transporter superfamilies/families categorized in the Transporter Classification (TC) System (see http://www.tcdb.org). Based on their family/subfamily associations, the transported substrates, the polarity of transport, and the cotransported cation (in secondary carriers) can often be predicted. However, when no functionally characterized member of a family or subfamily is available, functional prediction is much more difficult. In the past, we have found that prediction of transport functions can be greatly facilitated by the identification of fused domains of known function. For example, such associations have allowed us to correctly predict the polarity and substrates of lysophospholipid [10] and bicarbonate [11] uptake permeases. These successes have prompted us to seek extra domains in other transporters.
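The idea of using fused domains of known function as clues can be pictured as a simple lookup. The Python sketch below is a minimal illustration, not a prediction method: the dictionary and function names are hypothetical, the role strings paraphrase statements made elsewhere in this review, and a returned hint is a hypothesis to be tested rather than an established function.

```python
# Tentative roles of some "extra" domains, paraphrased from this review.
# These are hints for hypothesis generation, not experimentally proven
# functions for any particular transporter.
DOMAIN_HINTS = {
    "TrkA-N": "NAD+ binding; regulation or catalysis",
    "SPX": "G-protein beta-subunit binding; signal transduction",
    "PDZ": "protein-protein interaction",
    "USP": "involvement in stress responses",
    "CAP_ED": "cyclic nucleotide binding",
    "CaMBD": "calmodulin binding",
    "CBS": "binding of adenosine-containing ligands (ATP, AMP, SAM)",
}


def functional_hints(domains):
    """Map each fused domain to a tentative role (None if no hint is listed)."""
    return {d: DOMAIN_HINTS.get(d) for d in domains}


# A hypothetical homologue carrying an SPX domain and a DedA domain;
# DedA has no entry in the lookup, so its hint is None.
print(functional_hints(["SPX", "DedA"]))
```

Domains absent from the lookup return no hint, which mirrors the situation in which no functionally characterized example is available.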
Extra domains, particularly hydrophilic domains in channels and secondary carriers, are likely to serve as regulatory, targeting, or ligand/protein interaction domains. Often, these domains have become fused to catalytic enzymes as well as transporters, and if the subfunction of the domain is known in any such case, then extrapolation to other proteins of unknown or differing function is, with caution, justified. For example, TrkA-N domains bind NAD+ and function in regulation or catalysis, and the sequence-related TrkA-C domains may serve a similar function while binding some other ligand [23]. The identification of these domains in members of the DASS, ABT and AAE families of anion and amino acid transporters suggests that these transporters may be regulated by a mechanism common to that of various channels, carriers and enzymes that also contain these domains. In fact, it is possible that these diverse vectorial and non-vectorial catalytic processes are coordinately regulated by virtue of the presence of the same fused regulatory domains.

Since only certain members of a particular family generally exhibit any one of these fused domains, it can be supposed that a subfunction common to the homologues containing that domain will be absent from those that do not. Thus, the discovery of multidomain transporters may facilitate functional characterization. It may also extend the known functionality of the carrier by providing clues about specific subfunctions. Just as TrkA domains may be nucleotide (e.g., NAD+) binding domains, SPX domains may be G-protein β-subunit binding domains. The identification of large SPX domains in transport homologues of the DASS family is particularly worthy of note. These domains are believed to bind G-protein β-subunits for the purpose of signal transduction [30,31]. This function is consistent with our finding that they are restricted to fungal members of the DASS family; none were found in the prokaryotic homologues.
Trimeric G-proteins and SPX domains seem to be restricted to the eukaryotic domain, although they are present in representative proteins from all eukaryotic kingdoms. This may suggest that the fungal cluster 7 (in figure DASS-S3B on our website) has gained a receptor function not common to non-fungal members of the DASS family. It is worthy of note that reception is one of the few non-transport functions that transport homologues have acquired over evolutionary time [2,32]. The DedA domains, found in TRAP-T family members, differ from those cited above in that these domains are hydrophobic, including putative transmembrane α-helices, that presumably form structural constituents of the transport proteins. They may function as structural scaffolds or participate directly in transport. In this respect, DedA domains differ from all of the other “extra” domains described in this study. The OAT family of animal organic anion transporters proved to contain Kazal-2 and PDZ domains. These domains are probably modular protein–protein interaction domains. The presence of two structurally distinct types of such interaction domains clearly suggests that the OAT transporters interact with multiple cytoplasmic or membrane proteins. Possibly these interacting proteins play a role in their transport functions. In fact, mutations in PDZ domains of some human proteins result in severe diseases [40,41]. In the case of Oatp1a1, the PDZ domain promotes interaction with the PDZK1 scaffolding protein of the epithelial cell, an event that is required for proper subcellular localization [42]. A particularly complex situation proved to exist in members of the OST family. Here, two subunits are required for the function of at least some animal transporters. These subunits, Ostα and Ostβ, could include in their polypeptides a number of extra domains, depending on the homologue. 
Thus, different Ostα subunits can incorporate (1) N-terminal catalytic membrane-bound O-acyl transferase (MBOAT) domains, (2) C-terminal hydrophilic domains also found in dynein-related AAA ATPases, (3) C-terminal strongly hydrophilic repeat units also found in the human polycystic kidney disease protein, and (4) C-terminal domains found N-terminally in human kinesin. The larger Ostβ homologues found in non-mammalian animals have an N-terminal ricin carbohydrate binding domain that probably functions in glycoprotein or glycolipid recognition. Such a degree of complexity within a single protein family is almost unprecedented and suggests the occurrence of a variety of late evolving subfunctions.

Our studies revealed the presence of Universal Stress Protein (USP) domains in one family (ABT) of the APC superfamily of amino acid/polyamine/organocation transporters. Since UspA of E. coli promotes cell survival and is synthesized in increased amounts under stress conditions, and because it lacks other domains, it can be inferred that the presence of USP domains in a transporter could promote the involvement of the transporter in one or more stress responses. Similarly, the presence of a IIAFru domain in a bacterial member of this family suggests that for this protein (which lacks the USP domain), regulation is sensitive to the phosphorylation state of the fused IIA protein and hence to the energy state of the cell.

Finally, examination of representative α-helical-type channel proteins of TC class 1.A revealed a number of domains, several of which are known to function in regulation as ligand-binding domains.
These include the cyclic nucleotide-binding CAP_ED domain, the calmodulin-binding domain (CaMBD), the neurotransmitter ligand-binding domain (LBD) and the S-adenosyl methionine-binding domain (CBS) that may actually be capable of binding any of a variety of adenosine-containing molecules such as ATP, AMP and SAM [47]. Other hydrophilic domains in channel-forming proteins probably function in signal transduction, macromolecular interactions or subcellular targeting. However, in one case, that of the TRPM7 Ca2+/Mg2+ channel protein, the extra C-terminal domain has ser/thr protein kinase activity that regulates channel activity.

All of the suggestions and conclusions resulting from the reported studies are subject to experimental verification. Molecular genetic/biochemical approaches coupled to physiological analyses are likely to reveal the primary and secondary functions of these transporters and their subdomains. While clues as to their functions are already available, detailed analyses are likely to reveal functional diversity much greater than is currently recognized. We hope that these studies will provide guides for structure/function relationships for many years to come.

Acknowledgments

This work was supported by NIH grants GM55434 and GM64368. We thank Mary Beth Hiller for her assistance in the preparation of this manuscript.

References

[1] M.H. Saier Jr., Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis, and evolution, Microbiol. Rev. 58 (1994) 71–93. [2] M.H. Saier Jr., A functional–phylogenetic classification system for transmembrane solute transporters, Microbiol. Mol. Biol. Rev. 64 (2000) 354–411. [3] W. Busch, M.H. Saier Jr., The Transporter Classification (TC) system, 2002, CRC Crit. Rev. Biochem. Mol. Biol. 37 (2002) 287–337. [4] W. Busch, M.H. Saier Jr., The IUBMB-endorsed transporter classification system, Mol. Biotechnol. 27 (2004) 253–262. [5] M.H. Saier Jr., C.V. Tran, R.D.
Barabote, TCDB: the transporter classification database for membrane transport protein analyses and information, Nucleic Acids Res. 34 (2006) D181–D186 (Database issue). [6] M.H. Saier Jr., Answering fundamental questions in biology with bioinformatics, ASM News 69 (2003) 175–181. [7] M.H. Saier Jr., Tracing pathways of transport protein evolution, Mol. Microbiol. 48 (2003) 1145–1156. [8] S.S. Pao, I.T. Paulsen, M.H. Saier Jr., The major facilitator superfamily, Microbiol. Mol. Biol. Rev. 62 (1998) 1–32. [9] M.H. Saier Jr., J.T. Beatty, A. Goffeau, K.T. Harley, W.H.M. Heijne, S.-C. Huang, D.L. Jack, P.S. Jahn, K. Lew, J. Liu, S.S. Pao, I.T. Paulsen, T.-T. Tseng, P.S. Virk, The major facilitator superfamily, J. Mol. Microbiol. Biotechnol. 1 (1999) 257–279. [10] E.M. Harvat, Y.M. Zhang, C.V. Tran, Z. Zhang, M.W. Frank, C.O. Rock, M.H. Saier Jr., Lysophospholipid flipping across the Escherichia coli inner membrane catalyzed by a transporter (LplT) belonging to the major facilitator superfamily, J. Biol. Chem. 280 (2005) 12028–12034. [11] J. Felce, M.H. Saier Jr., Carbonic anhydrases fused to anion transporters of the SulP family: evidence for a novel type of bicarbonate transporter, J. Mol. Microbiol. Biotechnol. 8 (2004) 169–176. [12] T. von Rozycki, M. Schultzel, M.H. Saier Jr., Sequence analyses of cyanobacterial bicarbonate transporters and their homologues, J. Mol. Microbiol. Biotechnol. 7 (2004) 102–108. [13] R.D. Barabote, M.H. Saier Jr., Comparative genomic analyses of the bacterial phosphotransferase system (PTS), Microbiol. Mol. Biol. Rev. 69 (2005) 608–634. [14] A. Bateman, L. Coin, R. Durbin, R.D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E.L. Sonnhammer, D.J. Studholme, C. Yeats, S.R. Eddy, The Pfam protein families database, Nucleic Acids Res. 32 (2004) D138–D141 (Database issue). [15] S.R. Eddy, Profile hidden Markov models, Bioinformatics 14 (1998) 755–763. [16] S.F. Altschul, T.L. Madden, A.A. Schäffer, J. Zhang, Z.
Zhang, W. Miller, D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402. [17] J.D. Thompson, T.J. Gibson, F. Plewniak, F. Jeanmougin, D.G. Higgins, The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools, Nucleic Acids Res. 25 (1997) 4876–4882. [18] Y. Zhai, M.H. Saier Jr., A simple sensitive program for detecting internal repeats in sets of multiply aligned homologous proteins, J. Mol. Microbiol. Biotechnol. 4 (2002) 29–31. [19] Y. Zhai, M.H. Saier Jr., A web-based program for the prediction of average hydropathy, average amphipathicity and average similarity of multiply aligned homologous proteins, J. Mol. Microbiol. Biotechnol. 3 (2001) 285–286. [20] A. Marchler-Bauer, S.H. Bryant, CD-Search: protein domain annotations on the fly, Nucleic Acids Res. 32 (2004) W327–W331 (Web Server issue). [21] M.H. Saier Jr., B.H. Eng, S. Fard, J. Garg, D.A. Haggerty, W.J. Hutchinson, D.L. Jack, E.C. Lai, H.J. Liu, D.P. Nusinew, A.M. Omar, S.S. Pao, I.T. Paulsen, J.A. Quan, M. Sliwinski, T.-T. Tseng, S. Wachi, G.B. Young, Phylogenetic characterization of novel transport protein families revealed by genome analyses, Biochim. Biophys. Acta 1422 (1999) 1–56. [22] S. Prakash, G. Cooper, S. Singhi, M.H. Saier Jr., The ion transporter superfamily, Biochim. Biophys. Acta 1618 (2003) 79–92. [23] V. Anantharaman, E.V. Koonin, L. Aravind, Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains, J. Mol. Biol. 307 (2001) 1271–1292. [24] L. Aravind, E.V. Koonin, A novel family of predicted phosphoesterases includes Drosophila prune protein and bacterial RecJ exonuclease, Trends Biochem. Sci. 23 (1998) 17–19. [25] A. Schlosser, A. Hamann, D. Bossemeyer, E. Schneider, E.P. 
Bakker, NAD+ binding to the Escherichia coli K+-uptake protein TrkA and sequence similarity between TrkA and domains of a family of dehydrogenases suggest a role for NAD+ in bacterial transport, Mol. Microbiol. 9 (1993) 533–543. [26] J.S. Lolkema, I. Sobczak, D.J. Slotboom, Secondary transporters of the 2HCT family contain two homologous domains with inverted membrane topology and trans re-entrant loops, FEBS J. 272 (2005) 2334–2344. [27] J.S. Lolkema, D.-J. Slotboom, Classification of 29 families of secondary transport proteins into a single structural class using hydropathy profile analysis, J. Mol. Biol. 327 (2003) 901–909. [28] S. Nakano, E. Kuster-Schock, A.D. Grossman, P. Zuber, Spx-dependent global transcriptional control is induced by thiol-specific oxidative stress in Bacillus subtilis, Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 13603–13608. [29] S. Nakano, M.M. Nakano, Y. Zhang, M. Leelakriangsak, P. Zuber, A regulatory protein that interferes with activator-stimulated transcription in bacteria, Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 4233–4238. [30] B.H. Spain, D. Koo, M. Ramakrishnan, B. Dzudzor, J. Colicelli, Truncated forms of a novel yeast protein suppress the lethality of a G protein α-subunit deficiency by interacting with the β-subunit, J. Biol. Chem. 270 (1995) 25435–25444. [31] Y. Wang, C. Ribot, E. Rezzonico, Y. Poirier, Structure and expression profile of the Arabidopsis PHO1 gene family indicates a broad role in inorganic phosphate homeostasis, Plant Physiol. 135 (2004) 400–411. [32] B. Pinson, M. Merle, J.M. Franconi, B. Daignan-Fornier, Low affinity orthophosphate carriers regulate PHO gene expression independently of internal orthophosphate concentration in Saccharomyces cerevisiae, J. Biol. Chem. 279 (2004) 35273–35280. [33] D.J. Kelly, G.H. Thomas, The tripartite ATP-independent periplasmic (TRAP) transporters of bacteria and archaea, FEMS Microbiol. Rev.
25 (2001) 405–424. [34] R. Rabus, D.L. Jack, D.J. Kelly, M.H. Saier Jr., TRAP transporters: an ancient family of periplasmic solute receptor-dependent secondary active transporters, Microbiology 145 (1999) 3431–3445. [35] B. Hagenbuch, P.J. Meier, The superfamily of organic anion transporting polypeptides, Biochim. Biophys. Acta 1609 (2003) 1–18. [36] X. Xu, A.K. Fu, F.C. Ip, C.P. Wu, S. Duan, M.M. Poo, X.B. Yuan, N.Y. Ip, Agrin regulates growth cone turning of Xenopus spinal motoneurons, Development 132 (2005) 4309–4316. [37] B. Schlott, J. Wohnert, C. Icke, M. Hartmann, R. Ramachandran, K.H. Guhrs, E. Glusa, J. Flemming, M. Gorlach, F. Grosse, O. Ohlenschlager, Interaction of Kazal-type inhibitor domains with serine proteinases: biochemical and structural studies, J. Mol. Biol. 318 (2002) 533–546. [38] A. van de Locht, D. Lamba, M. Bauer, R. Huber, T. Friedrich, B. Kroger, W. Hoffken, W. Bode, Two heads are better than one: crystal structure of the insect derived double domain Kazal inhibitor rhodniin in complex with thrombin, EMBO J. 14 (1995) 5149–5157. [39] K. Ogura, S. Choudhuri, C.D. Klaassen, Full-length cDNA cloning and genomic organization of the mouse liver-specific organic anion transporter-1 (lst-1), Biochem. Biophys. Res. Commun. 272 (2000) 563–570. [40] A.Y. Hung, M. Sheng, PDZ domains: structural modules for protein complex assembly, J. Biol. Chem. 277 (2002) 5699–5702. [41] P. Wang, J.J. Wang, Y. Xiao, J.W. Murray, P.M. Novikoff, R.H. Angeletti, G.A. Orr, D. Lan, D.L. Silver, A.W. Wolkoff, Interaction with PDZK1 is required for expression of organic anion transporting protein 1A1 on the hepatocyte surface, J. Biol. Chem. 280 (2005) 30143–30149. [42] S.M. Gisler, S. Pribanic, D. Bacic, P. Forrer, A. Gantenbein, L.A. Sabourin, A. Tsuji, Z.S. Zhao, E. Manser, J. Biber, H. Murer, PDZK1: I. a major scaffolder in brush borders of proximal tubular cells, Kidney Int. 64 (2003) 1733–1745. [43] M.H. 
Saier Jr., Families of transmembrane transporters selective for amino acids and their derivatives, Microbiology 146 (2000) 1775–1795. [44] D.L. Jack, I.T. Paulsen, M.H. Saier Jr., The amino acid/polyamine/ organocation (APC) superfamily of transporters specific for amino acids, polyamines and organocations, Microbiology 146 (2000) 1797–1814. [45] Y.J. Chung, C. Krueger, D. Metzgar, M.H. Saier Jr., Size comparisons among integral membrane transport protein homologues in bacteria, archaea, and eucarya, J. Bacteriol. 183 (2001) 1012–1021. [46] E. Gasol, M. Jiménez-Vidal, J. Chillarón, A. Zorzano, M. Palacín, Membrane topology of system xc-light subunit reveals a re-entrant loop with substrate-restricted accessibility, J. Biol. Chem. 279 (2004) 31228–31236. [47] T. Nystrom, F.C. Neidhardt, Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest, Mol. Microbiol. 11 (1994) 537–544. [48] T.I. Zarembinski, L.W. Hung, H.J. Mueller-Dieckmann, K.K. Kim, H. Yokota, R. Kim, S.H. Kim, Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics, Proc. Natl. Acad. Sci. U. S. A. 95 (1998) 15189–15193. [49] M.C. Sousa, D.B. McKay, Structure of the universal stress protein of Haemophilus influenzae, Structure 9 (2001) 1135–1141. [50] A. Accardi, C. Miller, Secondary active transport mediated by a prokaryotic homologue of ClC Cl− channels, Nature 427 (2004) 803–807. [51] K. Abe, F. Ohnishi, K. Yagi, T. Nakajima, T. Higuchi, M. Sano, M. Machida, R.I. Sarker, P.C. Maloney, Plasmid-encoded asp operon confers a proton motive metabolic cycle catalyzed by an aspartate–alanine exchange reaction, J. Bacteriol. 184 (2002) 2906–2913. [52] J.S. Lolkema, D.-J. Slotboom, Sequence and hydropathy profile analysis of two classes of secondary transporters, Mol. Membr. Biol. 22 (2005) 177–189. [53] J. Devereux, P. Haeberli, O. 
Smithies, A comprehensive set of sequence analysis programs for the VAX, Nucleic Acids Res. 12 (1984) 387–395. [54] T.B. Cao, M.H. Saier Jr., The general protein secretory pathway: phylogenetic analyses leading to evolutionary conclusions, Biochim. Biophys. Acta 1609 (2003) 115–125. [55] Y. Zhai, J. Tchieu, M.H. Saier Jr., A web-based Tree View (TV) program for the visualization of phylogenetic trees, J. Mol. Microbiol. Biotechnol. 4 (2002) 69–70. [56] A.B. Chang, R. Lin, W.K. Studley, C.V. Tran, M.H. Saier Jr., Phylogeny as a guide to structure and function of membrane transport proteins, Mol. Membr. Biol. 21 (2004) 171–181. [57] D.J. Seward, A.S. Koh, J.L. Boyer, N. Ballatori, Functional complementation between a novel mammalian polygenic transport complex and an evolutionarily ancient organic solute transporter, OSTα-OSTβ, J. Biol. Chem. 278 (2003) 27473–27482. [58] W. Wang, D.J. Seward, L. Li, J.L. Boyer, N. Ballatori, Expression cloning of two genes that together mediate organic solute and steroid transport in the liver of a marine vertebrate, Proc. Natl. Acad. Sci. U. S. A. 98 (2001) 9431–9436. [59] P.A. Dawson, M. Hubbert, J. Haywood, A.L. Craddock, N. Zerangue, W.V. Christian, N. Ballatori, The heteromeric organic solute transporter α-β, Ostα-Ostβ, is an ileal basolateral bile acid transporter, J. Biol. Chem. 280 (2005) 6960–6968. [60] L. Sipos, G. von Heijne, Predicting the topology of eukaryotic membrane proteins, Eur. J. Biochem. 213 (1993) 1333–1340. [61] G. von Heijne, Net N–C charge imbalance may be important for signal sequence function in bacteria, J. Mol. Biol. 192 (1986) 287–290. [62] G. von Heijne, Y. Gavel, Topogenic signals in integral membrane proteins, Eur. J. Biochem. 174 (1988) 671–678. [63] Y. Zhai, W.H.M. Heijne, D.W. Smith, M.H. Saier Jr., Homologues of archaeal rhodopsins in plants, animals and fungi: structural and functional predications for a putative fungal chaperone protein, Biochim. Biophys. Acta 1511 (2001) 206–223. [64] J.E. 
Garbarino, I.R. Gibbons, Expression and genomic analysis of midasin, a novel and highly conserved AAA protein distantly related to dynein, BMC Genomics 3 (2002) 18. [65] L.M. Iyer, D.D. Leipe, E.V. Koonin, L. Aravind, Evolutionary history and higher order classification of AAA+ ATPases, J. Struct. Biol. 146 (2004) 11–31. [66] K. Tanaka, K. Okabayashi, M. Asashima, N. Perrimon, T. Kadowaki, The evolutionarily conserved porcupine gene family is involved in the processing of the Wnt family, Eur. J. Biochem. 267 (2000) 4300–4311. [67] G.I. Anyatonwu, B.E. Ehrlich, Organic cation permeation through the channel formed by polycystin-2, J. Biol. Chem. 280 (2005) 29488–29493. [68] M. Bycroft, A. Bateman, J. Clarke, S.J. Hamill, R. Sandford, R.L. Thomas, C. Chothia, The structure of a PKD domain from polycystin-1. Implications for polycystic kidney disease, EMBO J. 18 (1999) 297–305. [69] G.M. Xu, S. González-Perrett, M. Essafi, G.A. Timpanaro, N. Montalbetti, M.A. Arnaout, H.F. Cantiello, Polycystin-1 activates and stabilizes the polycystin-2 channel, J. Biol. Chem. 278 (2003) 1457–1462. [70] H. Miki, Y. Okada, N. Hirokawa, Analysis of the kinesin superfamily: insights into structure and function, Trends Cell Biol. 15 (2005) 467–476. [71] Z. Fujimoto, A. Kuno, S. Kaneko, H. Kobayashi, I. Kusakabe, H. Mizuno, Crystal structures of the sugar complexes of Streptomyces olivaceoviridis E-86 xylanase: sugar binding structure of the family 13 carbohydrate binding module, J. Mol. Biol. 316 (2002) 65–78. [72] C.N. Adra, S. Zhu, J.L. Ko, J.C. Guillemot, A.M. Cuervo, H. Kobayashi, T. Horiuchi, J.M. Lelias, J.D. Rowley, B. Lim, LAPTM5: a novel lysosomal-associated multispanning membrane protein preferentially expressed in hematopoietic cells, Genomics 35 (1996) 328–337. [73] D.L. Hogue, M.J. Ellison, J.D. Young, C.E.
Cass, Identification of a novel membrane transporter associated with intracellular membranes by phenotypic complementation in the yeast Saccharomyces cerevisiae, J. Biol. Chem. 271 (1996) 9801–9808. [74] D.L. Hogue, L. Kerby, V. Ling, A mammalian lysosomal membrane protein confers multidrug resistance upon expression in Saccharomyces cerevisiae, J. Biol. Chem. 274 (1999) 12877–12882. [75] Y. Hayami, S. Iida, N. Nakazawa, I. Hanamura, M. Kato, H. Komatsu, I. Miura, B.J. Dave, W.G. Sanger, B. Lim, M. Taniwaki, R. Ueda, Inactivation of the E3/LAPTm5 gene by chromosomal rearrangement and DNA methylation in human multiple myeloma, Leukemia 17 (2003) 1650–1657. [76] L.M. Scott, L. Mueller, S.J. Collins, E3, a hematopoietic-specific transcript directly regulated by the retinoic acid receptor alpha, Blood 88 (1996) 2517–2530. [77] J.M. Inal, Schistosoma TOR (trispanning orphan receptor), a novel, antigenic surface receptor of the blood-dwelling, Schistosoma parasite, Biochim. Biophys. Acta 1445 (1999) 283–298. [78] J.M. Inal, Complement C2 receptor inhibitor trispanning: from man to schistosome, Springer Semin. Immunopathol. 27 (2005) 320–331. [79] R.E. Hirsch, B.D. Lewis, E.P. Spalding, M.R. Sussman, A role for the AKT1 potassium channel in plant nutrition, Science 280 (1998) 918–921. [80] P. Daram, S. Urbach, F. Gaymard, H. Sentenac, I. Cherel, Tetramerization of the AKT1 plant potassium channel involves its C-terminal cytoplasmic domain, EMBO J. 16 (1997) 3455–3463. [81] P. Maser, S. Thomine, J.I. Schroeder, J.M. Ward, K. Hirschi, H. Sze, I.N. Talke, A. Amtmann, F.J. Maathuis, D. Sanders, J.F. Harper, J. Tchieu, M. Gribskov, M.W. Persans, D.E. Salt, S.A. Kim, M.L. Guerinot, Phylogenetic relationships within cation transporter families of Arabidopsis, Plant Physiol. 126 (2001) 1646–1667. [82] T.M. Ishii, C. Silvia, B. Hirschberg, C.T. Bond, J.P. Adelman, J. Maylie, A human intermediate conductance calcium-activated potassium channel, Proc. Natl. Acad. Sci. U. S. A. 94 (1997) 11651–11656. [83] B.S. Jensen, D.
Strobaek, S.P. Olesen, P. Christophersen, The Ca2+-activated K+ channel of intermediate conductance: a molecular target for novel treatments? Curr. Drug Targets 2 (2001) 401–422. [84] R. Roncarati, I. Decimo, G. Fumagalli, Assembly and trafficking of human small conductance Ca2+-activated K+ channel SK3 are governed by different molecular domains, Mol. Cell. Neurosci. 28 (2005) 314–325. [85] T. Arazi, B. Kaplan, R. Sunkar, H. Fromm, Cyclic-nucleotide- and Ca2+/calmodulin-regulated channels in plants: targets for manipulating heavy-metal tolerance, and possible physiological roles, Biochem. Soc. Trans. 28 (2000) 471–475. [86] J. Sneyd, M. Falcke, Models of the inositol trisphosphate receptor, Prog. Biophys. Mol. Biol. 89 (2005) 207–245. [87] C.P. Ponting, Novel eIF4G domain homologues linking mRNA translation with nonsense-mediated mRNA decay, Trends Biochem. Sci. 25 (2000) 423–426. [88] D. Drennan, A.G. Ryazanov, Alpha-kinases: analysis of the family and comparison with conventional protein kinases, Prog. Biophys. Mol. Biol. 85 (2004) 1–32. [89] M.V. Dorovkov, A.G. Ryazanov, Phosphorylation of annexin I by TRPM7 channel-kinase, J. Biol. Chem. 279 (2004) 50643–50646. [90] L.V. Ryazanova, M.V. Dorovkov, A. Ansari, A.G. Ryazanov, Characterization of the protein kinase activity of TRPM7/ChaK1, a protein kinase fused to the transient receptor potential ion channel, J. Biol. Chem. 279 (2004) 3708–3716. [91] C. Schmitz, A.L. Perraud, D.O. Johnson, K. Inabe, M.K. Smith, R. Penner, T. Kurosaki, A. Fleig, A.M. Scharenberg, Regulation of vertebrate cellular Mg2+ homeostasis by TRPM7, Cell 114 (2003) 191–200. [92] P.D. Wilson, Polycystin: new aspects of structure, function, and regulation, J. Am. Soc. Nephrol. 12 (2001) 834–845. [93] C. Stayner, J. Zhou, Polycystin channels and kidney disease, Trends Pharmacol. Sci. 22 (2001) 543–546. [94] A. Miyazawa, Y. Fujiyoshi, N. Unwin, Structure and gating mechanism of the acetylcholine receptor pore, Nature 423 (2003) 949–955.
[95] K. Brejc, W.J. van Dijk, R.V. Klaassen, M. Schuurmans, J. van Der Oost, A.B. Smit, T.K. Sixma, Crystal structure of an ACh-binding protein reveals the ligand-binding domain of nicotinic receptors, Nature 411 (2001) 269–276. [96] H. Xue, Identification of major phylogenetic branches of inhibitory ligand-gated channel receptors, J. Mol. Evol. 47 (1998) 323–333. [97] S. Ignoul, J. Eggermont, CBS domains: structure, function, and pathology in human proteins, Am. J. Physiol., Cell Physiol. 289 (2005) C1369–C1378.