Page 1 ofArticles 58 in PresS. Physiol Genomics (January 16, 2007). doi:10.1152/physiolgenomics.00142.2006 1 The evolution and structural diversification of Hyperpolarization-activated cyclic nucleotide-gated (HCN) channel genes Heather A. Jackson1, Christian R. Marshall2 and Eric A. Accili1 1 Department of Cellular and Physiological Sciences, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada. 2Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada. ‘Evolution of HCN channels’ Corresponding Author: E.A. Accili, University of BC, Vancouver, BC, Canada. e-mail ([email protected]), phone (604) 822-6900, fax (604) 822-6048. Copyright © 2007 by the American Physiological Society. Page 2 of 58 2 Abstract Hyperpolarization-activated cyclic nucleotide-gated (HCN) channels are members of the voltage-gated channel superfamily and play a critical role in cellular pacemaking. Overall sequence conservation is high throughout the family and channel functions are similar but not identical. Phylogenetic analyses are imperative to understand how these genes have evolved and to make informed comparisons of HCN structure and function. These have been previously limited, however, by the small number of available sequences, from a minimal number of species unevenly distributed over evolutionary time. We have now identified and annotated thirty-one novel genes from invertebrates, urochordates, fish, amphibians, birds and mammals. With increased numbers and a broader representation, a more precise sequence comparison was performed and an evolutionary history for these genes was constructed. Our data confirm the existence of at least four vertebrate paralogs and suggest that these arose via three duplication and diversification events from a single ancestral gene. Additional lineage specific duplications appear to have occurred in urochordate and fish genomes. Based on exon boundary conservation and phylogenetic analyses, we hypothesize that mammalian gene structure was established, and duplication events occurred, after the divergence of urochordates and before the divergence of fish from the tetrapod lineage. In addition, we identified highly conserved sequence regions that are likely important for general HCN functions, as well as regions with differences conserved among each of the individual paralogs. The latter may underlie more subtle isoform-specific properties that were otherwise masked by the high identity among mammalian orthologs and/or inaccurate alignments between paralogs. Page 3 of 58 3 Key Words: Pacemaker channel; phylogeny; sequence analysis; molecular evolution Page 4 of 58 4 Introduction Hyperpolarization-activated currents, commonly referred to as Ih or If, have been identified in species distributed across the metazoan lineage and contribute to membrane potential in excitable and non-excitable cells. In the mammalian heart and nervous system they contribute to spontaneous beating in pacemaker cells (11, 52). In non-pacing cells, Ih facilitates the regulation of resting membrane properties by limiting the fluctuation of membrane potentials (52). The proteins responsible for Ih have been cloned from both invertebrate and vertebrate species and are referred to as hyperpolarization-activated cyclic nucleotide-gated (HCN) channels (18, 26, 35, 57). When expressed in heterologous systems, HCN subunits form cation-selective channels that are directly modulated by cyclic nucleotides and exhibit biophysical properties that are similar to those of Ih observed in native tissue. HCN channels are members of the voltage-gated cation channel superfamily and based on sequence homology, are most closely related to the cyclic nucleotide gated (CNG) channel and ether-a-go-go (EAG) potassium channel families. Individual subunits (Figure 1) are predicted to have six transmembrane segments (S1-S6), with a voltage sensor domain in the S4 segment and an ion conducting pore between S5 and S6. Based on the crystal structures of related potassium channels (14, 27, 32) it is proposed that four HCN subunits come together to form a tetramer around a central pore. The distal termini of each subunit are cytoplasmic and from the crystal structure of the C-terminus (78), it is now known that an C-helical linker region joins the transmembrane region to an evolutionarily conserved cyclic nucleotide-binding domain (CNBD). Page 5 of 58 5 To date, four mammalian isoforms have been cloned (HCN1-HCN4) (26, 35, 57). These paralogs exhibit 80-90% sequence identity between the start of S1 to the end of the CNBD. Differences in both sequence and length between paralogs occur predominantly in the N- and Ctermini (36). This high sequence identity suggests that the four mammalian HCN genes arose from duplications of a single ancestral gene prior to the lineage divergence. However, when these duplication and diversification events occurred remain unknown. The functional properties of the individual isoforms, such as their responses to changes in voltage and modulation of channel opening by cAMP, are similar but not identical. Likewise, the four HCN paralogs display partial overlapping tissue distribution in both the heart (26, 45, 46, 63) and the central nervous system (36). Despite what seems like redundant functional and expression profiles, the overall physiological importance of the individual isoforms is clearly evident from the different phenotypes of knockout mouse models. General and forebrain specific HCN1 knockout mice demonstrate learning and movement dysfunction (49), HCN2-deficient mice exhibited absence epilepsy as well as cardiac sinus dysrhythmia (34) and HCN4-deficient mice died during embryonic development, possibly due to reduced HCN-mediated currents and heart rates of the embryos (66). Furthermore, the physiological impact of the mammalian HCN4 isoform in the heart has been confirmed by the association of mutations in this isoform to sinus arrhythmias in humans (44, 60, 71). This pattern of functional disruption partially reflects the predominate tissue distribution of the isoforms but also indicates that the sequence divergence that has occurred in these four paralogs has been sufficient to differentiate their physiological roles, thereby reducing functional redundancy and allowing each to have been retained throughout evolution since the duplication process. Page 6 of 58 6 In addition to the four mammalian isoforms, only a few HCN genes have been cloned from lower vertebrate and invertebrate species. A single HCN1 ortholog has been cloned from the rainbow trout (9) whereas one and two HCN homologs have been cloned from arthropods (20, 21, 30, 43) and the sea urchin (16, 18), respectively. These sequences demonstrate considerable identity with the mammalian homologs, as well as some intriguing differences, and thus have offered some preliminary clues about the evolutionary relationships among HCN genes. But due to the lack of sequence representation from a diverse sampling of species distributed throughout evolutionary history, previous phylogenetic analyses of the HCN family and its relatives have been limited and have yielded inconsistent patterns of evolution (10, 16). Furthermore, the current sampling of the four vertebrate isoforms is limited primarily to closely-related mammalian sequences. The high residue conservation among these sequences, combined with the lack of sequences from more distantly related vertebrates, renders comparative analyses of the orthologs from the individual vertebrate isoforms ineffective. In this study, we report the first thorough phylogenetic analysis and sequence comparison of the HCN gene family. By performing an extensive search of the available protein and currently accessible whole-genome databases we derived a comprehensive list of known and putative fulllength sequences for HCN homologs from a wide variety of species including urochordates and lower vertebrates. The increased number of sequences and improved species diversification provide information about HCN gene structure at critical periods during its evolutionary history. We identified sequences that are conserved and likely important for general HCN functions, as well as regions that may underlie more subtle differences in function among the different isoforms. Furthermore, analyses of exon structure and genomic organization were carried out. Page 7 of 58 7 These analyses provide insight into the molecular evolution of this protein within different taxa and support the hypothesis that both lineage-specific and ancestral duplication and divergence events of the HCN genes have occurred throughout its history. Materials and Methods Sequence Data: Currently available HCN protein sequences were identified through BLASTP (1) searches of the National Center for Biotechnology Information (NCBI) and UniProt/SwissProt non-redundant (nr) protein databases. Human HCN2 (Genbank GI no. 21359848), trout HCN1 (Genbank GI no. 33312350) or drosophila HCN (Genbank GI no. 5326833) amino acid sequences were used as queries and default parameters were applied. To prevent the inclusion of incorrect sequences generated by computer prediction programs, all computer-derived annotations were temporarily removed until confirmed by sequence alignment or database annotation described below. Splice variants, short fragments, and other duplicates were also removed. The Ensembl Genome Browser (http://www.ensembl.org/) (6, 24) was used to determine the genomic position and distribution of the established HCN genes, and either to examine new HCN genes identified by the computer gene-prediction programs and annotation process or to identify novel genes. Sequences classified as HCN genes by Ensembl were examined and those that spanned the entire length of known sequences were downloaded. Protein annotations that resembled HCN but either lacked regions of the predicted sequence or showed signs of additional exons or exon fragments were not included. However, the genomic DNA underlying these protein predictions were used as further reference to help in the manual re-annotation of the Page 8 of 58 8 protein sequence. TBLASTN (1) searches of the available genome databases using either the low sensitivity default parameters (optimized for near-exact matches) or medium sensitivity default parameters (optimized to allow for local mismatch) were conducted to identify any further genes or genomic regions that showed significant sequence identity to the peptide query sequence used. In total, thirteen genomes were analyzed, including: Zebrafish (Danio rerio) (v. 35.5b), Japanese pufferfish (Fugu rubripes) (v. 29.2e), Green Pufferfish (Tetraodon) (v. 31.1c), opossum (Monodelphis domestica) (v35.2), dog (Canis Familiaris) (v. 35.1d), cow (Bos Taurus) (v.36.2), chimpanzee (Pan troglodytes) (v.31.2a), clawed frog (Xenopus Tropicalis) (v. 31.1a), chicken (Gallus gallus) (v.35.1k), sea squirt (Ciona Intestinalis) (v. 35.195b), pacific sea squirt (Ciona Savignyi) (CSAV2.0), mosquito (Anopheles gambiae) (v. 23.2b.1), and worm (Caenohabditis elegans) (v 29.130). Human HCN2, trout HCN1 and drosophila HCN sequences were used as query sequences for vertebrate, fish and insect genomes, respectively. For the urochordate genomes, sea urchin HCN (Genbank GI no. 74136757), trout HCN1 and human HCN2 sequences were used. In general, similar TBLASTN results were found regardless of the query sequence used, reflecting the high degree of sequence identity found within the core region throughout the HCN family. Full length putative protein sequences were constructed from the conceptual translation of genomic DNA as previously described (42) with the proposed starting methionine in vertebrates supported by the presence of a consensus start sequence (29). Genome position and intron-exon structure were examined and recorded. A list of the sequences used in the analysis is shown in Table 1. Multiple Sequence Alignments: Page 9 of 58 9 Protein sequence alignments were performed using the default parameters of ClustalX (version 1.83) (69). Alignments were subsequently examined and edited in GeneDoc (48). To ensure that the HCN genes identified were full-length and non-redundant, those lacking a putative start codon were discarded. Isoform pairs identified in some of the fish genomes were included if they corresponded to a different location in the genome, as they are likely the result of a polyploidization event that is thought to have occurred early in their lineage ((23) and references therein). Partial sequences that lacked a portion of the distal C-terminus corresponding to the last mammalian exon were included as this region was not used in the phylogenetic analyses due to the high degree of length and sequence variability among the different HCN genes. For the Nand C-terminal alignments, vertebrate sequences corresponding to the region upstream of S1 or downstream of the CNBD (see Figure 1) were individually re-aligned for HCN1-4 and were shaded based on sequence conservation of 100% (black), 80% (dark grey), and 60% (light grey). Phylogenetic Analyses: Sequences were trimmed to produce a core alignment spanning from the start of the transmembrane segment S1 to the end of the CNBD (see Figure 1). This region exhibits high sequence identity within the HCN family and among the other related sequences that were used. As there is no known bacterial HCN channel, and based on a previous phylogenetic analysis of HCN and CNG channels (10), KAT1 (Genbank GI no. 44888080), human ERG1 (Genbank GI no. 7531135), human CNGA1 and CNGA3 (Genbank GI no. 2506302 and 13959682) sequences were included in the analyses to serve as an outgroup for the rooted phylogenetic trees of the HCN family. Six sequences that were missing exons due to gaps in the genome assembly were removed from the alignment prior to running the programs. Neighbor-Joining (NJ) trees were Page 10 of 58 10 generated using ClustalX, followed by tree evaluation with bootstrap re-sampling of 1000 times. Additional NJ, Maximum Parsimony (MP) and Maximum Likelihood (ML) trees were created using the Seqboot, Protdist, Neighbor, Protpars, Proml, and Consense programs from the PHYLIP package (version 3.65), bootstrapping with 100 replicates with randomized input order and 10 jumbles (15). The TreeView program (version 1.6.6) (51) was used to examine and display all trees. Results and Discussion To date, 21 unique full-length HCN coding sequences have been cloned and deposited in GenBank: 4 from arthropods, 2 from sea urchin, 1 from fish and 14 from mammals, specifically human, rabbit, mouse and rat. Previous phylogenetic analyses of the HCN family and its relatives have been limited and inconsistent, in part due to the lack of available sequences from a diverse sampling of species. For example, the product of the first duplication event has been shown to be either HCN1 (10) or HCN3 (16). Furthermore, the recently cloned spHCN2 (16) sequence was shown to group with the HCN family, but was situated at the base of the phylogenetic tree. This is not consistent with species evolution, which would place the sea urchin genes with the other deuterostome species, after the arthropod clade division. Therefore, spHCN2 represents either a new family of channels not yet identified in any other species or an HCN homolog that was the product of a duplication event followed by rapid sequence divergence. These inconsistencies underscore the current limitations and need for more detailed analyses. This study expands the breadth of available HCN sequences through the manual annotation of available genomes and in turn, provides the first comprehensive sequence Page 11 of 58 11 comparison and phylogenetic analysis of this protein family including sequences from a representative sampling of different species across evolutionary history. HCN genes are present in multiple copies across a wide spectrum of species Using a BLASTP search at NCBI or by searching the genome databases available at the Ensembl website, we identified 58 non-redundant HCN sequences. Twenty-seven of these sequences were previously identified by cloning or by computer annotation which were confirmed by genomic data. The remaining thirty-one are novel and were completed by the data mining of genomes available at Ensembl or the manual re-annotation of the predicted protein translations. Several of the Ensembl predictions were either inconsistent with known sequences or the predicted gene did not span the estimated length of the transcript. The inherent problems in gene prediction and computer protein annotation methods have been previously described ((42) and references therein). Therefore, we have corrected for this by multi-species comparisons and manual reannotation on a gene-by-gene basis. A complete list of the sequences included in this analysis is shown in Table 1 and their protein sequences can be found in Supplementary Figure 1. Included in the final list are: 23 mammalian sequences; including 2 and 3 new sequences from the chimpanzee and opossum genomes, respectively; 22 lower vertebrate sequences; 6 urochordate sequences, including 3 new sequences from both c. intestinalis and c. savignyi genomes; and 7 invertebrate sequences, with a new annotation of the single HCN gene found in the mosquito genome. Of the lower vertebrates, 2 new sequences were from chicken, 3 from frog, 6 new sequences from both the green pufferfish and the Japanese pufferfish genomes, and 2 new sequences in the zebrafish. No HCN sequences were identified in c. elegans. With this substantial increase in the number of full-length sequences representing a wide range of species, Page 12 of 58 12 we were able to reconstruct the evolutionary history of this protein family and probe the relationship of channel structure and function in much greater detail than was previously possible. High sequence identity amongst four vertebrate HCN isoforms within the core region Due to the length and sequence variability that occurs in the N- and C-termini (see Supplementary Figure 2), these regions were trimmed and a core alignment was produced corresponding to a region between S1 and the end of the CNBD and approximately to exon 2 through 7 of the mammalian isoforms. Sequence conservation of the four vertebrate isoforms within this region is high, at least 80-90% identity amongst the mammalian sequences and over 90% residue conservation between the newly identified fish sequences and their respective mammalian orthologs (Figure 2). This indicates that this region has been slow to evolve during the 450MY that separate the fish and human lineages. Two interesting findings based on this sequence comparison are that HCN4 shares the highest sequence identity with all other vertebrate isoforms and that the mammalian HCN3 and fish HCN3 sequences are as similar to the other vertebrate isoforms as they are to each other. This suggests that HCN4 sequences have diverged the least from a common ancestral sequence, as is further evident from the branch lengths in the NJ and ML phylogenetic trees, and that HCN3 sequences have evolved independently within the fish and mammalian lineages by equal amounts but at different sites. Overall, the sequences of the invertebrates and urochordates display a lower conservation with the mammalian sequences. Arthropods share a general sequence identity of approximately 6065% with the mammalian homologs, whereas one of the sea urchin sequences, spHCN1 (also known as spIH or spHCN), and two of the ciona homologs, here named HCNb and HCNa, are Page 13 of 58 13 only 55% identical. The second sea urchin sequence, spHCN2, and the other sequence from the ciona species (HCNc) are even more diverged. EST evidence supports the validity of highly diverged sequences identified in urochordates Sequences identified from c. intestinalis and c. savignyi are substantially different from those of either invertebrates or vertebrates. Two sequence pairs, here named HCNa and HCNb to limit their association with any of the vertebrate isoforms, only share ~50-55% identity or 70-75% conservation with either group. The third sequence, HCNc, is even less similar. It shares only 40% identity and 61-64% conservation with all other sequences with the exception of ciona HCNa, with which it shares a slightly higher identity. The three sequences were found in the genome databases of both ciona species and the orthologs share a conserved identity of over 90%. This HCN sequence similarity between the two ciona genome projects makes it very unlikely that any differences between genes or with sequences from either invertebrates or vertebrates are due to errors in sequencing. To further support their validity, HCN expressed sequence tags (ESTs) were found in the database of the Kyoto University ciona cDNA project (59) (http://ghost.zool.kyotou.ac.jp/indexr1.html). These ESTs span several exons and cover a large portion of the c. intestinalis sequences provided here, suggesting that the three c. intestinalis genes have been correctly identified and that their transcripts are expressed. A recent comprehensive analysis of the ascidian genome revealed the existence of these three putative HCN ion channels (HCN1 = HCNc, HCN2 = HCNa and HCN3 = HCNb) (50), however, the sequences themselves were not Page 14 of 58 14 reported. The sequences identified here were in turn used to correctly annotate those found in the c. savignyi genome. Three different HCN duplication events occurred prior to the divergence of the fish lineage. In order to examine the phylogeny of this family, a multiple sequence alignment of the core region of HCN was created using 52 of the 58 identified sequences (see methods). Three different phylogenetic programs, two NJ methods and one MP method, produced similar results. Due to the high sequence conservation among mammalian sequences, ML methods were incapable of analyzing the complete list of sequences. Using a subgroup of 41 sequences, ML methods produced similar results to the NJ and MP trees. Figure 3 is a rooted MP consensus tree produced by the PHYLIP program. The NJ rooted phylogram created by ClustalX and a single representative ML phylogram produced by PHYLIP are provided in Supplementary Figure 3. Four sequences were included as an outgroup: human CNGA1 and CNGA3, human ERG1 and KAT1. Tree topology did not change when any of these outgroup sequences were removed. HCN sequences did not group according to species but rather within isoform clades, indicating that the HCN1-4 isoforms are paralogs. Genes from fish, birds and amphibians partitioned within the four mammalian clades which strongly suggests that the four isoforms originated prior to the origin of the vertebrate clade and were present in the common ancestor of fish and tetrapods. From the tree topology, we also predict that an HCN ancestor underwent three separate gene duplication events prior to the divergence of the fish lineage, over 450 MYA. Our data are inconsistent with two rounds of whole-genome duplication and suggests that the HCN family evolved through independent gene duplications or chromosomal block duplication (25). Page 15 of 58 15 However, due to a relatively low sequence resolution, the 2R hypothesis can not be ruled out. Of the four HCN isoforms, the clade belonging to HCN3 was the first to diverge and represents the product of the first duplication event. This position of HCN3 is supported by all four phylogeny methods and by a high bootstrap value, and supports the findings of Galindo and colleagues (16). Functional data will be required to further explain this branching position. The low bootstrap value at the division of the HCN3 clade in the MP and NJ trees indicates that this subdivision remains unresolved. We hypothesize that this clade division may be due to the independent sequence evolution which has occurred in the teleost and tetrapod HCN3 genes following the lineage divergence. Interestingly, the HCN3 clade is not divided in the ML tree for which the Jones-Taylor-Thornton model of evolution was imposed. We can not rule out the possibility that this difference is due to the decreased sequences used in the ML analysis. The resolution of the HCN3 clade will only be improved with the inclusion of more complete sequences from species which evolved following the emergence of the four HCN paralogs but prior to the divergence of mammals. Functional data of the HCN3 channels from lower vertebrates will also be required to conclusively determine whether a functional divergence occurred within this clade. Our data also suggests that HCN4 is the product of the second duplication event followed by the emergence of HCN1 and HCN2. This proposed evolutionary pattern is evident in all three phylogenetic trees and is further supported by the sequence conservation pattern shown in Figure 2. Based on phylogenetic results, the division between HCN1 and HCN2 remains unresolved. Nevertheless, functional data for the mammalian isoforms suggests a clear difference between these two isoforms. The discrepancy with the predicted order of species evolution within each mammalian clade is likely due to the high sequence conservation seen within this core region. Page 16 of 58 16 This results in a limited number of informative sites and produces a low phylogenetic signal (17). The lack of sequence divergence in the mammalian clades may be due to insufficient evolutionary time to permit the accumulation of mutations and/or a strong selective pressure to retain the conserved sequence of this channel. Fish lineage show evidence of duplicate HCN genes Genome data mining revealed multiple HCN genes in both the Fugu rubripes and Tetraodon species, in both cases exceeding the number found in the mammalian lineage. From the branching order in the phylogenetic tree shown in Figure 3, it is clear that fish gene pairs group together within the clades of individual mammalian orthologs. This suggests that the common ancestor of teleost fish and tetrapods had 4 HCN genes, and that these underwent further duplications independently in the fish lineage. These sequences have therefore been named according to their mammalian ortholog and subsequently designated ‘a’ or ‘b’. This pattern is clearly evident in the HCN2 and HCN4 clades, where the green puffer and fugu ‘a’ and ‘b’ sequences are grouped together and are separated from each other by high bootstrap values. Because the teleost fish are predicted to have undergone a complete genome duplication early in their own lineage ((23) and references therein), this distribution pattern is not unexpected. However, because some of the genes identified were not full length and some showed evidence of intron insertion, further analyses is required to determine whether these duplicates are expressed and functional or have become pseudogenes (73) Ciona genes most likely arose through lineage-specific duplication events Page 17 of 58 17 In contrast to the observed pattern of the multiple fish genes, the multiple copies of HCN sequences found within the urochordate species do not partition with any of the mammalian isoform clades. However, each gene did show a highly significant pairing between c. intestinalis and c. savignyi, indicating that these duplication events occurred before these two species diverged. The timing of this duplication is unknown, but with the presence of multiple HCN genes previously identified in the sea urchin, two different scenarios are possible. First, one duplication of an ancestral gene may have occurred prior to the divergence of the deuterostomes and given rise to the multiple HCN genes seen in the sea urchin and urochordate species. A lineage specific duplication then occurred in the urochordates to produce the third ciona HCN gene. Subsequently, the genes evolved rapidly and independently in the different lineages and thus no longer resemble each other or any specific HCN isoform. The loss of one ancestral gene and the duplication and diversification of the other, would then have given rise to the four isoforms now common to all chordates. A second, and more parsimonious, pattern of evolution is that lineage-specific duplication events of a single HCN ancestor occurred and gave rise to the three HCN homologs within the urochordate lineage. Further ancestral duplication events occurred within the vertebrate lineage, after the divergence of the urochordates and prior to the fish lineage, giving rise to the four known mammalian isoforms. Lineage-specific gene duplication in ciona has also been shown for the evolution of sodium gene family. Two sodium channel genes have been identified but one possesses a sequence that has diverged considerably from both its paralogous pair and from the other known sodium channel gene sequences. The authors concluded that the duplication events occurred just prior to fish lineage (33). Similarly, independent lineage-specific duplication was suggested for the ankyrin gene family based on their phylogenetic results and the differences in gene sequences between ciona and the vertebrate Page 18 of 58 18 homologs (7). The general branching order between the ciona HCN homologs, in which HCNa and HCNc group together and HCNb is independent, is consistent among the different trees produced. Their position within the tree, however, is variable and is most likely a result of the different tree building methods used. It may also reflect the amount of lineage-specific sequence divergence that has occurred in these HCN genes which has caused them to evolve independently of both the invertebrate and vertebrate clades and has blurred their phylogenetic position. Predicted phylogenetic patterns are supported by exon boundary structure Our phylogenetic analyses indicate that the four vertebrate isoforms arose via three duplications of an ancestral HCN gene. This is supported by the exon structure of their coding sequences. The four human HCN genes are located on different chromosomes: 1q21.2 (HCN3), 5p12 (HCN1), 15q24-q25 (HCN4) (61) and 19p13 (HCN2) (72). Using the genome databases, we found this pattern of differential localization for all vertebrate HCN genes, fish through mammals. Furthermore, the overall exon structure of the four isoforms has remained consistent since the duplication and divergence events (Figure 4). They are comprised of 8 exons, with highly conserved length and sequence in exons 2 through 7. An exception to this is observed in both of the fish HCN3b genes which are predicted to have an intron of <75bp inserted in the middle of exon 2. Exon 1 and 8, which directly correspond with the distal N- and C-termini, vary in both length and sequence for all vertebrate genes analyzed. In addition, exon boundary positions are highly conserved throughout the vertebrate lineage. This suggests that the extant vertebrate HCN sequences are derived from a single ancestral gene that had an exon structure similar to current Page 19 of 58 19 mammalian HCN genes and that the duplication events occurred after the intron positions were fixed in the linear sequence. Our phylogenetic results suggest that the invertebrate and urochordate HCN genes are homologs of the four vertebrate isoforms but, because they group outside the vertebrate clades, they are not orthologous to any one in particular. The presence of multiple genes in the urochordate species, therefore, raised the question as to where and when duplications occurred and what course of evolution lead from the invertebrate to vertebrate gene structure. To address these questions, we compared the exon structures of invertebrates and urochordates to the highly conserved vertebrate structure. The exon structure of the single HCN gene in the arthropod lineage is quite different from the mammalian structure. Using the fly and mosquito genes as representatives, the invertebrate gene is comprised of 13 or 14 shorter exons, separated by short introns. Of the seven boundary positions conserved throughout the vertebrate lineage, only 2-4 are conserved in these species. The exon structures of the urochordate genes, on the other hand, show similarities to both the invertebrate and vertebrate genes. The HCNb gene contains 15 exons separated by short introns, which is similar to the invertebrate structure. Nevertheless, seven of the exon boundaries parallel all of those found in the vertebrate structure. The other two urochordate genes, HCNa and HCNc, are different from both the vertebrate and invertebrate genes. HCNa contains 10 exons and only four boundaries are conserved with the vertebrate structure and none with the invertebrate genes. The 13 exon boundaries of HCNc are not conserved with any other gene, including paralogs from within the same species. In conjunction with sequence identity, residue conservation and phylogenetic analyses, these findings further support the hypothesis that HCNb represents the homolog most similar to the ancestral HCN gene of the urochordates, and that Page 20 of 58 20 HCNa and HCNc arose through lineage-specific duplication processes. Furthermore, we suggest that the ancestral gene of the HCN family contained numerous introns which were then lost during the course of evolution. Exons became fused to form the structure now seen within the vertebrate lineage. The evolution of key residues in the voltage sensing domain and pore region HCN channels open in response to changes in membrane voltage and allow for the passage of Na+ and K+ ions across the plasma membrane. The transmembrane domains of the individual subunits, which form tetramers around a central pore, are primarily responsible for these functions. As might be expected from the natural constraints of the hydrophobic bilayer, sequence conservation is abundant in areas predicted to correspond to the six transmembrane segments. Similar to depolarization-activated K+ channels, the fourth transmembrane segment (S4) contains positively charged residues and is likely to sense changes in voltage across the cell membrane (Figure 5A, ‘a’). In contrast to depolarization-activated K+ channels, however, HCN channels open instead of close in response to hyperpolarization. Interestingly, HCN channels possess an additional 4 or 5 charged residues at the N-terminal end of S4 (Figure 5A ‘b’). Together, in response to changes in voltage, these charges have been shown to move in the same direction as that in depolarization-activated K+ channels (3). Therefore, it has been suggested that the movement of the voltage-sensing domain is coupled to channel opening in the opposite way (40). Throughout the HCN family, there is high sequence identity in this region among the vertebrate isoforms and high conservation across all sequences. The more diverged sequences from the sea urchin and urochordate species, spHCN2, ciona HCNa and ciona HCNc, which are predicted to be the products of lineage-specific duplication, possess only three of the upstream Page 21 of 58 21 positive charges. The effect of this loss of charge awaits functional characterization, but could be due to relaxed constraints in these duplicate genes. Based on their similarity to K+ channels (77), we predict that the HCN channels are in their lowest energy state when closed and that the voltage-sensors work to open the gate upon hyperpolarization of the membrane potential. The S4-S5 linker couples the sensing of membrane voltage by S4 to the opening of the channel gate, which is located at the inner portion of the S6 segment. Channel opening is highly sensitive to mutations in the S4-S5 segment. Within this region (Figure 5A ‘c’), a dipeptide of a tyrosine followed by an acidic residue is conserved across the vertebrate and urochordate channels, but is not present in the invertebrates. Mutation of the tyrosine residue to polar or acidic amino acids in the mammalian HCN2 isoform disrupts channel closure (8, 38). Similar results were shown for the nearby arginine residue located at the end of this segment which is conserved throughout the HCN family. These experiments confirm that this region is critical for the regulation of channel opening by changes in voltage. The lack of complete conservation within this region therefore suggests that the strength of coupling may be variable among the different HCN genes. For example, spHCN1 channels exhibit an inactivation process that is due to a desensitization of the opening of the intracellular gate to voltage, which may reflect a weaker coupling of the voltage sensor and gate (55). Based on sequence homology to the crystal structure of a bacterial K+ channel (14), the region between S5 and the end of S6 is believed to form the ion conduction pore in HCN channels. It contains a pore helix and selectivity filter and is involved in both ion selectivity and transport. In K+ channels, the selectivity filter exhibits the K+ signature sequence (GY/FG), which is thought Page 22 of 58 22 to confer their K+ selective nature (14). In the HCN family, this tripeptide has been conserved but the channels allow the passage of significant amounts of both Na+ and K+. In all but two HCN genes, the sequence motif that corresponds to the selectivity filter is CIGYG (5) (Figure 5b ‘a’). The conservation of this sequence only differs from K+ selective channels at the cysteine residue, implicating this site’s involvement in the reduced K+ permeability. The two exceptions are again found in gene duplicates from sea urchin and urochordate species. spHCN2 has a filter sequence of SIGFG (16), which makes it more similar to the filter sequences of channels in the EAG family. Functional data does not exist for the spHCN2 channel, so whether this difference affects ion selectivity is not known. In ciona HCNc genes, the selectivity filter motif is CIGYS. In mammalian HCN channels, substitution of the second glycine residue by serine (G404S in HCN2), reduced the slow activating current (39). Evidence of channel function is required to determine if this residue difference in the urochordate channel is involved in gene silencing. If ciona HCNc is functional, a different mutational tolerance at this particular site seems likely and may reflect an adaptive process that has enabled these channels to fill a different functional niche specific to these species. Recently, an N-linked glycosylation site (NXT/S) located in the outer turret between the end of S5 and the pore helix (Figure 5b ‘b’) has been shown to play a role in membrane expression of mammalian isoforms (47). Similar to residues in the S4-S5 linker, this motif emerges with the urochordate sequences and is conserved in two of the three urochordate genes and throughout all vertebrate sequences. To understand the necessity of this motif and the role it plays in channel function throughout evolution, further studies are needed to determine if glycosylation does occur in these ciona channels and if it, or some other compensatory post-translational Page 23 of 58 23 modification that allows the channel to mature through the ER/golgi process, occurs elsewhere in the invertebrate sequences. Based on the K+ channel crystal structure (14), the S6 segments of HCN likely form the inner vestibule of the pore and the gate. They show high sequence conservation with other protein relatives, such as ERG and CNG, and are almost completely conserved throughout the HCN family. Not surprisingly, the S6 segments of ciona HCNa and HCNc and spHCN2, which are predicted to be the result of lineage-specific duplication, are the exceptions. Two glycine residues (Figure 5b ‘c’ and ‘d’) are completely conserved among the HCN genes. Based on sequence homology with voltage-gated K+ channels (62), it has been suggested that one of these may form a glycine hinge involved in the opening of the channel gate in response to the movement of the S4-S5 linker. However, more recent experimental and homology modeling evidence (19, 55, 56) has shown that a threonine residue, positioned after the glycines in the linear sequence and on the intracellular side of the channel, is only accessible in the open state. Furthermore, a glutamine residue at the end of S6 is accessible in both the open and closed positions, suggesting that the putative hinge position is located between these two residues (Figure 5b ‘e’). The evolution of the cyclic nucleotide binding and modulatory domains Cyclic nucleotides are known to bind directly to the channel (12) in the CNBD on the intracellular C-terminus and modify channel opening. The CNBD is an evolutionarily conserved domain that is found in several cyclic nucleotide binding proteins, including the bacterial catabolic activating peptide (CAP) and the protein kinase A (PKA) family (4). The crystal Page 24 of 58 24 structure of the C-terminus of mouse HCN2 has been solved (78). Despite a low overall sequence conservation, the tertiary structure of its CNBD is highly conserved with the crystal structures of other CNBDs (4). Furthermore, residues identified as being critical to structure and nucleotide binding are conserved throughout the HCN family (Figure 6). This includes the phosphate binding cassette (PBC), the most conserved feature of the CNBD which makes direct contact with cAMP (13), and the hinge region, important for the capping of cAMP by the C-helix of the CNBD. The middle of the PBC has diverged in one of the sea urchin genes and all of the urochordate genes, but these residues are also variable in the other related crystal structures, undermining the importance of their identity to cyclic nucleotide binding. Because the HCN family possesses these conserved key residues, it seems probable that all of their CNBDs can stabilize the tertiary structure and bind cyclic nucleotides. One exception to this is seen in urochordate HCNc genes. These two sequences are missing a key hydrophobic residue in the PBC (Figure 6 ‘P’) that forms a conserved interaction with cAMP (53). In these genes, this residue is threonine, a hydrophilic amino acid that could disrupt this cAMP interaction. This difference is consistent with our hypothesis that the ciona HCNc genes have undergone diversification following a lineage-specific duplication. Whether this difference also corresponds to a functional change in cAMP binding is an interesting question that will require functional studies to confirm. The effects of cAMP-binding on HCN channel opening are variable among isoforms and depend on the CNBD as well as the C-linker region which connects it to the transmembrane domain. For the mammalian HCN1, HCN2 and HCN4 isoforms, the binding of cAMP relieves a tonic inhibition of the transmembrane domain by the CNBD (75). The extent to which this relief Page 25 of 58 25 occurs (HCN4 = HCN2 > HCN1) depends in part on the C-linker region which transmits the effect of cAMP onto the gating machinery (76). In contrast, cAMP binding has no effect on mouse HCN3 (45) and enhances the inhibition of channel opening of human HCN3 (67). Whether the sequence differences within the C-linker are responsible for these differences is not known. Further complicating the issue is that the effects of cAMP also depend upon regions in the transmembrane domains (76) which may lead to more complex interactions with the cytoplasmic portions of the gating elements. The complexity of these interactions is especially apparent in the spHCN1 channel in which channel inactivation/desensitization after first opening is eliminated by cAMP binding (64). C-linker sequences show considerable divergence throughout the HCN family, including differences among mammalian paralogs, and even between mammalian and non-mammalian orthologs. This divergence is consistent with the variability in cAMP-mediated effects on channel opening. Overall, the role of the C-linker in HCN channels is unusual compared to other regions in the channel. Its sequence and function show variation throughout the family, but it connects two domains that are themselves highly conserved in sequence throughout evolution and carry out highly conserved, but distinct, functions in a cooperative manner (transmembrane = voltagesensing, channel opening and ion permeation, CNBD = cAMP binding and modulation of channel open by the transmembrane domain). Because of its position between these two domains, the sequence variability in the C-linker may be, in part, the result of an adaptive process that has enabled the diversification of cyclic nucleotide binding and the modulation of channel opening by the CNBD within the HCN family. Page 26 of 58 26 Sequence variability and functional divergence of vertebrate HCN paralogs From our data, we predict that the four vertebrate isoforms are paralogs of each other resulting from three duplication events which occurred before the divergence of the teleost and tetrapod lineages. At the time of their origin, the gene pairs produced by these events would have been functionally redundant. Because the vast majority of duplicate genes are silenced throughout the course of evolution (37), the retention of all four is probably due to the acquisition of unique functional characteristics and/or expression patterns which result from tolerated mutations specific to each paralog (65). Throughout the core region, there is high sequence identity among the four vertebrate HCN isoforms. Some of these invariant residues probably contribute to functional properties common among all vertebrate channels. In contrast, some of the residues that vary among paralogs within this conserved region probably contribute to the more subtle isoform-specific differences in function (31). The search has begun to identify which residues underlie differences in function among the four mammalian isoforms but approaches have been limited to direct sequence comparisons followed by single site mutagenesis and/or construction of chimeras, and functional assays. Using these approaches, three studies have identified residues or regions responsible for phenotypic variation among the four mammalian HCN channels. First, differences in the rates of channel opening and cyclic nucleotide modulation between the HCN2 and HCN4 isoforms were localized to a variant residue in the S1 segment (68). Second, a single residue difference in the inner selectivity filter was shown to confer chloride sensitivity to the HCN2 channel (74) (see Figure 5b). Lastly, differences in the C-linker were shown to contribute to differences in cAMP efficacy between HCN1 and HCN2, although the specific residues involved were not identified (76). From these few examples, it is clear that the functional consequences of residue variation among the four vertebrate isoforms, which may encompass not Page 27 of 58 27 only overt differences in channel opening and closing but also differences in permeation, cyclic nucleotide affinity, homo- and hetero-tetrameric assembly, glycosylation status, protein-channel interactions and abilities to traffick to the cell surface, cannot be easily predicted simply by its location within the channel. Furthermore, differences in function may involve multiple residues and domains located throughout the channel. Therefore, site-directed mutagenesis and/or the construction of chimeras between vertebrate isoforms may not be sufficient to determine all differences, especially in the absence of high resolution three dimensional structures for each of the four paralogs. In this study, we expanded the list of sequences for each of the four HCN paralogs by the addition of sequences from lower vertebrates. By broadening the evolutionary representation of the four vertebrate isoforms, the sequence signal-to-noise ratio is improved and the identification of residues that are conserved among orthologs, but differ among paralogs, is enhanced. However, genes from lower vertebrates (eg. fish) and mammals have continued to evolve under different selective pressures, since their divergence from a common ancestor, several million years ago. Therefore, information derived from this expanded list of sequences is complex. We identified three general groups of residues based on similarities among vertebrate orthologs. First, sites may be conserved among orthologs and could contribute to a function specific to each vertebrate isoform. Second, sites may be conserved in the mammalian and fish orthologs but differ between these two groups. These sites may contribute to functional differences between the mammalian and fish channels of a particular isoform. Finally, sites may be conserved in orthologs of mammals or fish, but not both. These sites could be involved in species-specific functions or paralog-specific functions within each species. Alternatively, they may be the result Page 28 of 58 28 of relaxed evolutionary constraints. By analyzing these different sets of conserved sites together with functional characterization of the various channels belonging to each of the vertebrate isoforms, sites that may contribute to paralog-specific differences in channel function can be more easily identified. An informative sub-set of sites identified when comparing HCN genes from lower vertebrates and mammals are those that are conserved with a paralog rather than its own ortholog. If we assume that the probability of identical sites mutating to the same residues independently following duplication and species divergence is low, then these probably represent sites that have been retained from ancestral genes. Differential retention of sites from ancestral genes among orthologs suggests that channel phenotypes may not be completely conserved among them. In conjunction with the identification of conserved and non-conserved sites, the functional analysis of HCN channels from lower vertebrates, and comparisons of their properties with those of mammalian channels, will help to identify residues that contribute to different phenotypes, and will also have the potential to shed light on the sequences and functions of ancestral HCN channels. Vertebrate isoform-specific alignments of sequences spanning 450MY, reveal conserved motifs in the N- and C-termini The large number of vertebrate HCN sequences assembled in this study has greatly increased the power to identify isoform-specific motifs in the N- and C-termini that may contribute to unique functions and specific patterns of expression within cells and tissues. The high sequence conservation among the four vertebrate isoforms extends beyond the core region analyzed above Page 29 of 58 29 and includes regions just upstream of the S1 segment and downstream of the CNBD. The more distal N- and C-termini, however, vary in both length and sequence among paralogs and do not align well. The sequences of the mammalian orthologs identified prior to this study were too close in evolutionary time to reveal sequence divergence in the distal N- and C-termini. On the other hand, the four paralogs from a single species were too diverged to align reliably. By expanding the taxa sampling to include species from different time periods of vertebrate evolution, the signal-to-noise ratio that is inherent in the sequence information was considerably improved (2). Alignments of the distal termini, using this expanded list of vertebrate sequences, revealed several isoform-specific blocks of conserved sequence interspersed with diverged regions of variable length (Figure 7). In the distal N- and C- termini, multiple regions specific to each of the four vertebrate paralogs were identified and were seen to be interspersed with regions of variable length and sequence. The conservation of these motifs throughout 450MY of evolution highlights a selective constraint placed upon them and suggests that they confer isoform-specific properties. These properties remain to be identified but there is some evidence to support a role for these motifs in the regulation of cell surface expression. For example, a single block of five residues is conserved in the N-terminus of HCN2 in fish, frog, chicken and mammals (Figure 7a ‘a’). This motif is not analogous to any known consensus sequence. However, the deletion of the first 130 amino acids of the mouse HCN2 N-terminus, which includes this conserved block of five amino acids, reduces cell surface expression and has only minor effects on channel opening and closing (70). A second example is a block of residues found in the C-terminus of HCN1 channels (Figure 7b ‘a’). This region is responsible for binding of filamin A to HCN1, and not the other isoforms Page 30 of 58 30 (22), and its removal abolishes the effects of filamin on channel cell surface expression. Overall, the motifs conserved within the termini of an individual isoform are more likely involved in roles specific to the function or expression of that paralog or may be involved in protein interactions specific to the tissues where these proteins are expressed. The regions of variable sequence and length observed between these conserved motifs result from DNA insertions and/or deletions. This variability may represent either relaxed constraints and/or the beginnings of species-specific adaptation processes. These adaptations may involve properties such as temperature sensitivity (41, 42) or binding to regulatory proteins with species-specific sequences. Finally, both the N and C-termini possess regions of high sequence conservation among the four vertebrate isoforms, in addition to those of the core region used for the phylogenetic analyses. These regions probably confer properties that are not unique to any individual isoform, but are important for all vertebrate HCN channels. In the N-terminus, a region of approximately 50 residues immediately upstream of the start of S1 is conserved in all four vertebrate isoforms (Figure 7a ‘b’, HCN3 not shown). In mouse HCN2, this region interacts with itself and thus may be involved in intersubunit interactions of tetrameric assembly (70). Furthermore, the removal of this region, along with the rest of the N-terminus, prevents the formation of functional channels. Based on the high level of sequence conservation among the four isoforms, it seems probable that this region provides similar interactions for HCN1, 3 and 4. If this is true, the few differences among paralogs within this region may modify intersubunit interactions and thus regulate homomeric and/or heteromeric assembly. Page 31 of 58 31 In the C-terminus, a block of residues conserved among the vertebrate HCN1, 2 and 4 isoforms was identified immediately downstream of the CNBD, which corresponds to the start of the last exon (Figure 7b ‘b’). Deletion of this region, together with the entire distal C-terminus and the C-helix of the CNBD, does not affect HCN1, HCN2 or HCN4 channel cell surface expression or function in heterologous systems (54, 75). A function for this conserved block of residues is not known but, whatever this function may be, based on the lack of sequence conservation, it is not likely possessed by HCN3 channels. Finally, a motif found at the distal end of the C-terminus, SNL, is conserved in HCN1, 2 and 4 (Figure 7b ‘c’). This motif, which qualifies as a consensus PDZ-binding domain, interacts with PDZ-containing proteins in vitro (28), and also with the TRP8 protein in vitro and in vivo, where it regulates channel cell surface expression (58). The absence of this motif in the HCN3 genes suggests that either this isoform does not interact with PDZ-containing proteins or that these interactions take place at different locations within the channel. Summary and Perspectives The availability of an increasing number of genome sequences has enabled us to generate a list of putative HCN coding sequences that has doubled the number of those previously known and covers a significantly greater portion of evolutionary time. With this improved species representation, we were able to more accurately complete sequence comparisons, phylogenetic analyses and exon structure comparisons of the HCN gene family and put forward a model of its molecular evolution. We propose that the vertebrate isoforms evolved from a single ancestral sequence that had an exon structure similar to current mammalian HCN genes and that the duplication events occurred after the intron positions were fixed in the linear sequence. We also Page 32 of 58 32 suggest that HCN duplications have occurred independently in the sea urchin, urochordate and fish lineages. Increasing the evolutionary distance between the sequences for each HCN isoform provided a good contrast and enabled us to unmask and identify regions putatively important for isoform-specific, as well as species-specific, functions. Together, our study provides a strong basis from which to refine the proposed model of evolution as more genomes become available and as the functional analysis of HCN genes progresses. Finally, our study provides a valuable tool to aid in the planning of experiments designed to probe the relationship between structure and function of HCN channels, and to determine the functional significance of sequence similarities and differences among them. Acknowledgements We would like to acknowledge Dr. Fiona Brinkman from Molecular Biology and Biochemistry at SFU for her advice, training and feedback on the manuscript as well as the Ouellette laboratory at the UBC Bioinformatics Centre (http://bioinformatics.ubc.ca), for their resource advice which has made this work possible. Current address of C.R. Marshall: The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, Department of Molecular and Medical Genetics, University of Toronto, Ontario. Grants Supported by grants from the Natural Sciences and Engineering Research Council of Canada (EAA). Page 33 of 58 33 References 1. Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ. Basic local alignment search tool. J Mol Biol 215: 403-410, 1990. 2. Anderson PA and Greenberg RM. Phylogeny of ion channels: clues to structure and function. Comp Biochem Physiol B Biochem Mol Biol 129: 17-28, 2001. 3. Bell DC, Yao H, Saenger RC, Riley JH, and Siegelbaum SA. Changes in local S4 environment provide a voltage-sensing mechanism for mammalian hyperpolarization-activated HCN channels. J Gen Physiol 123: 5-19, 2004. 4. Berman HM, Ten Eyck LF, Goodsell DS, Haste NM, Kornev A, and Taylor SS. The cAMP binding domain: an ancient signaling module. Proc Natl Acad Sci U S A 102: 45-50, 2005. 5. Biel M, Schneider A, and Wahl C. Cardiac HCN channels: structure, function, and modulation. Trends Cardiovasc Med 12: 206-212, 2002. 6. Birney E, Andrews TD, Bevan P, Caccamo M, Chen Y, Clarke L, Coates G, Cuff J, Curwen V, Cutts T, Down T, Eyras E, Fernandez-Suarez XM, Gane P, Gibbins B, Gilbert J, Hammond M, Hotz HR, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Keenan S, Lehvaslaiho H, McVicker G, Melsopp C, Meidl P, Mongin E, Pettett R, Potter S, Proctor G, Rae M, Searle S, Slater G, Smedley D, Smith J, Spooner W, Stabenau A, Stalker J, Storey R, UretaVidal A, Woodwark KC, Cameron G, Durbin R, Cox A, Hubbard T, and Clamp M. An overview of Ensembl. Genome Res 14: 925-928, 2004. 7. Cai X and Zhang Y. Molecular evolution of the ankyrin gene family. Mol Biol Evol 23: 550- 558, 2006. 8. Chen J, Mitcheson JS, Tristani-Firouzi M, Lin M, and Sanguinetti MC. The S4-S5 linker couples voltage sensing and activation of pacemaker channels. Proc Natl Acad Sci U S A 98: 1127711282, 2001. Page 34 of 58 34 9. Cho WJ, Drescher MJ, Hatfield JS, Bessert DA, Skoff RP, and Drescher DG. Hyperpolarization-activated, cyclic AMP-gated, HCN1-like cation channel: the primary, full-length HCN isoform expressed in a saccular hair-cell layer. Neuroscience 118: 525-534, 2003. 10. Craven KB and Zagotta WN. CNG AND HCN CHANNELS: Two Peas, One Pod. Annu Rev Physiol 68: 375-401, 2006. 11. DiFrancesco D. Pacemaker mechanisms in cardiac tissue. Annu Rev Physiol 55: 455-472, 1993. 12. DiFrancesco D and Tortora P. Direct activation of cardiac pacemaker channels by intracellular cyclic AMP. Nature 351: 145-147, 1991. 13. Diller TC, Madhusudan, Xuong NH, and Taylor SS. Molecular basis for regulatory subunit diversity in cAMP-dependent protein kinase: crystal structure of the type II beta regulatory subunit. Structure 9: 73-82, 2001. 14. Doyle DA, Morais Cabral J, Pfuetzner RA, Kuo A, Gulbis JM, Cohen SL, Chait BT, and MacKinnon R. The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science 280: 69-77, 1998. 15. Felsenstein J. Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol 266: 418-427, 1996. 16. Galindo BE, Neill AT, and Vacquier VD. A new hyperpolarization-activated, cyclic nucleotide-gated channel from sea urchin sperm flagella. Biochem Biophys Res Commun 334: 96-101, 2005. 17. Gallin WJ. Evolution of the "classical" cadherin family of cell adhesion molecules in vertebrates. Mol Biol Evol 15: 1099-1107, 1998. 18. Gauss R, Seifert R, and Kaupp UB. Molecular identification of a hyperpolarization-activated channel in sea urchin sperm. Nature 393: 583-587, 1998. Page 35 of 58 35 19. Giorgetti A, Carloni P, Mistrik P, and Torre V. A homology model of the pore region of HCN channels. Biophys J 89: 932-944, 2005. 20. Gisselmann G, Marx T, Bobkov Y, Wetzel CH, Neuhaus EM, Ache BW, and Hatt H. Molecular and functional characterization of an I(h)-channel from lobster olfactory receptor neurons. Eur J Neurosci 21: 1635-1647, 2005. 21. Gisselmann G, Warnstedt M, Gamerschlag B, Bormann A, Marx T, Neuhaus EM, Stoertkuhl K, Wetzel CH, and Hatt H. Characterization of recombinant and native Ih-channels from Apis mellifera. Insect Biochem Mol Biol 33: 1123-1134, 2003. 22. Gravante B, Barbuti A, Milanesi R, Zappi I, Viscomi C, and DiFrancesco D. Interaction of the pacemaker channel HCN1 with filamin A. J Biol Chem 279: 43847-43853, 2004. 23. Hoegg S, Brinkmann H, Taylor JS, and Meyer A. Phylogenetic timing of the fish-specific genome duplication correlates with the diversification of teleost fish. J Mol Evol 59: 190-203, 2004. 24. Hubbard T BD, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, Durbin R, Eyras E, Gilbert J, Hammond M, Huminiecki L, Kasprzyk A, Lehvaslaiho H, Lijnzaad P, Melsopp C, Mongin E, Pettett R, Pocock M, Potter S, Rust A, Schmidt E, Searle S, Slater G, Smith J, Spooner W, Stabenau A, Stalker J, Stupka E, Ureta-Vidal A, Vastrik I, and Clamp M. The Ensembl genome database project. Nucleic Acids Res 30: 38-41, 2002. 25. Hughes AL and Friedman R. 2R or not 2R: testing hypotheses of genome duplication in early vertebrates. J Struct Funct Genomics 3: 85-93, 2003. 26. Ishii TM, Takano M, Xie LH, Noma A, and Ohmori H. Molecular characterization of the hyperpolarization-activated cation channel in rabbit heart sinoatrial node. J Biol Chem 274: 1283512839, 1999. 27. Jiang Y, Lee A, Chen J, Ruta V, Cadene M, Chait BT, and MacKinnon R. X-ray structure of a voltage-dependent K+ channel. Nature 423: 33-41, 2003. Page 36 of 58 36 28. Kimura K, Kitano J, Nakajima Y, and Nakanishi S. Hyperpolarization-activated, cyclic nucleotide-gated HCN2 cation channel forms a protein assembly with multiple neuronal scaffold proteins in distinct modes of protein-protein interaction. Genes Cells 9: 631-640, 2004. 29. Kozak M. Interpreting cDNA sequences: some insights from studies on translation. Mamm Genome 7: 563-574, 1996. 30. Krieger J, Strobel J, Vogl A, Hanke W, and Breer H. Identification of a cyclic nucleotide- and voltage-activated ion channel from insect antennae. Insect Biochem Mol Biol 29: 255-267, 1999. 31. Li B and Gallin WJ. Computational identification of residues that modulate voltage sensitivity of voltage-gated potassium channels. BMC Struct Biol 5: 16, 2005. 32. Long SB, Campbell EB, and Mackinnon R. Voltage sensor of Kv1.2: structural basis of electromechanical coupling. Science 309: 903-908, 2005. 33. Lopreato GF, Lu Y, Southwell A, Atkinson NS, Hillis DM, Wilcox TP, and Zakon HH. Evolution and divergence of sodium channel genes in vertebrates. Proc Natl Acad Sci U S A 98: 75887592, 2001. 34. Ludwig A, Budde T, Stieber J, Moosmang S, Wahl C, Holthoff K, Langebartels A, Wotjak C, Munsch T, Zong X, Feil S, Feil R, Lancel M, Chien KR, Konnerth A, Pape HC, Biel M, and Hofmann F. Absence epilepsy and sinus dysrhythmia in mice lacking the pacemaker channel HCN2. Embo J 22: 216-224, 2003. 35. Ludwig A, Zong X, Jeglitsch M, Hofmann F, and Biel M. A family of hyperpolarization- activated mammalian cation channels. Nature 393: 587-591, 1998. 36. Ludwig A, Zong X, Stieber J, Hullin R, Hofmann F, and Biel M. Two pacemaker channels from human heart with profoundly different activation kinetics. Embo J 18: 2323-2329, 1999. 37. Lynch M and Conery JS. The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155, 2000. Page 37 of 58 37 38. Macri V and Accili EA. Structural elements of instantaneous and slow gating in hyperpolarization-activated cyclic nucleotide-gated channels. J Biol Chem 279: 16832-16846, 2004. 39. Macri V, Proenza C, Agranovich E, Angoli D, and Accili EA. Separable gating mechanisms in a Mammalian pacemaker channel. J Biol Chem 277: 35939-35946, 2002. 40. Mannikko R, Elinder F, and Larsson HP. Voltage-sensing mechanism is conserved among ion channels gated by opposite voltages. Nature 419: 837-841, 2002. 41. Marshall C, Elias C, Xue XH, Le HD, Omelchenko A, Hryshko LV, and Tibbits GF. Temperature dependence of cardiac Na+/Ca2+ exchanger. Ann N Y Acad Sci 976: 109-112, 2002. 42. Marshall CR, Fox JA, Butland SL, Ouellette BF, Brinkman FS, and Tibbits GF. Phylogeny of Na+/Ca2+ exchanger (NCX) genes from genomic data identifies new gene duplications and a new family member in fish species. Physiol Genomics 21: 161-173, 2005. 43. Marx T, Gisselmann G, Stortkuhl KF, Hovemann BT, and Hatt H. Molecular cloning of a putative voltage- and cyclic nucleotide-gated ion channel present in the antennae and eyes of Drosophila melanogaster. Invert Neurosci 4: 55-63, 1999. 44. Milanesi R, Baruscotti M, Gnecchi-Ruscone T, and DiFrancesco D. Familial sinus bradycardia associated with a mutation in the cardiac pacemaker channel. N Engl J Med 354: 151-157, 2006. 45. Mistrik P, Mader R, Michalakis S, Weidinger M, Pfeifer A, and Biel M. The murine HCN3 gene encodes a hyperpolarization-activated cation channel with slow kinetics and unique response to cyclic nucleotides. J Biol Chem 280: 27056-27061, 2005. 46. Moosmang S, Stieber J, Zong X, Biel M, Hofmann F, and Ludwig A. Cellular expression and functional characterization of four hyperpolarization-activated pacemaker channels in cardiac and neuronal tissues. Eur J Biochem 268: 1646-1652, 2001. Page 38 of 58 38 47. Much B, Wahl-Schott C, Zong X, Schneider A, Baumann L, Moosmang S, Ludwig A, and Biel M. Role of subunit heteromerization and N-linked glycosylation in the formation of functional hyperpolarization-activated cyclic nucleotide-gated channels. J Biol Chem 278: 4378143786, 2003. 48. Nicholas KB, Jr. NHB, and Deerfield DWI. GeneDoc: Analysis and Visualization of Genetic Variation. EMBNEW NEWS 4: 14, 1997. 49. Nolan MF, Malleret G, Lee KH, Gibbs E, Dudman JT, Santoro B, Yin D, Thompson RF, Siegelbaum SA, Kandel ER, and Morozov A. The hyperpolarization-activated HCN1 channel is important for motor learning and neuronal integration by cerebellar Purkinje cells. Cell 115: 551-564, 2003. 50. Okamura Y, Nishino A, Murata Y, Nakajo K, Iwasaki H, Ohtsuka Y, Tanaka-Kunishima M, Takahashi N, Hara Y, Yoshida T, Nishida M, Okado H, Watari H, Meinertzhagen IA, Satoh N, Takahashi K, Satou Y, Okada Y, and Mori Y. Comprehensive analysis of the ascidian genome reveals novel insights into the molecular evolution of ion channel genes. Physiol Genomics 22: 269282, 2005. 51. Page RD. TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12: 357-358, 1996. 52. Pape HC. Queer current and pacemaker: the hyperpolarization-activated cation current in neurons. Annu Rev Physiol 58: 299-327, 1996. 53. Passner JM, Schultz SC, and Steitz TA. Modeling the cAMP-induced allosteric transition using the crystal structure of CAP-cAMP at 2.1 A resolution. J Mol Biol 304: 847-859, 2000. 54. Proenza C, Tran N, Angoli D, Zahynacz K, Balcar P, and Accili EA. Different roles for the cyclic nucleotide binding domain and amino terminus in assembly and expression of hyperpolarizationactivated, cyclic nucleotide-gated channels. J Biol Chem 277: 29634-29642, 2002. Page 39 of 58 39 55. Rothberg BS, Shin KS, Phale PS, and Yellen G. Voltage-controlled gating at the intracellular entrance to a hyperpolarization-activated cation channel. J Gen Physiol 119: 83-91, 2002. 56. Rothberg BS, Shin KS, and Yellen G. Movements near the gate of a hyperpolarization- activated cation channel. J Gen Physiol 122: 501-510, 2003. 57. Santoro B, Liu DT, Yao H, Bartsch D, Kandel ER, Siegelbaum SA, and Tibbs GR. Identification of a gene encoding a hyperpolarization-activated pacemaker channel of brain. Cell 93: 717-729, 1998. 58. Santoro B, Wainger BJ, and Siegelbaum SA. Regulation of HCN channel surface expression by a novel C-terminal protein-protein interaction. J Neurosci 24: 10750-10762, 2004. 59. Satou Y, Kawashima T, Shoguchi E, Nakayama A, and Satoh N. An integrated database of the ascidian, Ciona intestinalis: towards functional genomics. Zoolog Sci 22: 837-843, 2005. 60. Schulze-Bahr E, Neu A, Friederich P, Kaupp UB, Breithardt G, Pongs O, and Isbrandt D. Pacemaker channel dysfunction in a patient with sinus node disease. J Clin Invest 111: 1537-1545, 2003. 61. Seifert R, Scholten A, Gauss R, Mincheva A, Lichter P, and Kaupp UB. Molecular characterization of a slowly gating human hyperpolarization-activated channel predominantly expressed in thalamus, heart, and testis. Proc Natl Acad Sci U S A 96: 9391-9396, 1999. 62. Shealy RT, Murphy AD, Ramarathnam R, Jakobsson E, and Subramaniam S. Sequence- function analysis of the K+-selective family of ion channels using a comprehensive alignment and the KcsA channel structure. Biophys J 84: 2929-2942, 2003. 63. Shi W, Wymore R, Yu H, Wu J, Wymore RT, Pan Z, Robinson RB, Dixon JE, McKinnon D, and Cohen IS. Distribution and prevalence of hyperpolarization-activated cation channel (HCN) mRNA expression in cardiac tissues. Circ Res 85: e1-6, 1999. Page 40 of 58 40 64. Shin KS, Maertens C, Proenza C, Rothberg BS, and Yellen G. Inactivation in HCN channels results from reclosure of the activation gate: desensitization to voltage. Neuron 41: 737-744, 2004. 65. Sidow A. Gen(om)e duplications in the evolution of early vertebrates. Curr Opin Genet Dev 6: 715-722, 1996. 66. Stieber J, Herrmann S, Feil S, Loster J, Feil R, Biel M, Hofmann F, and Ludwig A. The hyperpolarization-activated channel HCN4 is required for the generation of pacemaker action potentials in the embryonic heart. Proc Natl Acad Sci U S A 100: 15235-15240, 2003. 67. Stieber J, Stockl G, Herrmann S, Hassfurth B, and Hofmann F. Functional expression of the human HCN3 channel. J Biol Chem 280: 34635-34643, 2005. 68. Stieber J, Thomer A, Much B, Schneider A, Biel M, and Hofmann F. Molecular basis for the different activation kinetics of the pacemaker channels HCN2 and HCN4. J Biol Chem 278: 3367233680, 2003. 69. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, and Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876-4882, 1997. 70. Tran N, Proenza C, Macri V, Petigara F, Sloan E, Samler S, and Accili EA. A conserved domain in the NH2 terminus important for assembly and functional expression of pacemaker channels. J Biol Chem 277: 43588-43592, 2002. 71. Ueda K, Nakamura K, Hayashi T, Inagaki N, Takahashi M, Arimura T, Morita H, Higashiuesato Y, Hirano Y, Yasunami M, Takishita S, Yamashina A, Ohe T, Sunamori M, Hiraoka M, and Kimura A. Functional characterization of a trafficking-defective HCN4 mutation, D553N, associated with cardiac arrhythmia. J Biol Chem 279: 27194-27198, 2004. Page 41 of 58 41 72. Vaccari T, Moroni A, Rocchi M, Gorza L, Bianchi ME, Beltrame M, and DiFrancesco D. The human gene coding for HCN2, a pacemaker channel of the heart. Biochim Biophys Acta 1446: 419-425, 1999. 73. Vanin EF. Processed pseudogenes: characteristics and evolution. Annu Rev Genet 19: 253-272, 1985. 74. Wahl-Schott C, Baumann L, Zong X, and Biel M. An arginine residue in the pore region is a key determinant of chloride dependence in cardiac pacemaker channels. J Biol Chem 280: 1369413700, 2005. 75. Wainger BJ, DeGennaro M, Santoro B, Siegelbaum SA, and Tibbs GR. Molecular mechanism of cAMP modulation of HCN pacemaker channels. Nature 411: 805-810, 2001. 76. Wang J, Chen S, and Siegelbaum SA. Regulation of hyperpolarization-activated HCN channel gating and cAMP modulation due to interactions of COOH terminus and core transmembrane regions. J Gen Physiol 118: 237-250, 2001. 77. Yifrach O and MacKinnon R. Energetics of pore opening in a voltage-gated K(+) channel. Cell 111: 231-239, 2002. 78. Zagotta WN, Olivier NB, Black KD, Young EC, Olson R, and Gouaux E. Structural basis for modulation and agonist specificity of HCN pacemaker channels. Nature 425: 200-205, 2003. Page 42 of 58 42 Figure Legends Figure 1: Schematic representation of the overall topology of HCN channels. Included are the six putative transmembrane segments (S1-S6), including an S4 voltage sensor (light grey) and the reentrant P-loop region between S5 and S6, all structural characteristics common to the entire voltagegated potassium channel superfamily, including ERG, CNG and HCN channels. HCN and ERG channels also share the potassium selectivity filter GY/FG. Based on the crystal structure of mHCN2 (78), the C-linker region in HCN channels contains six alpha helices (A’-E’) and connects the cytoplasmic end of S6 to the start of the cyclic nucleotide binding domain (CNBD). The CNBD consists of two alpha helices (A and B) that flank an eight-stranded beta-roll followed by a third alpha helix at its C-terminal end (C-helix). The cyclic nucleotides bind to pocket that is formed by the betaroll and the C-helix. Thick black lines delineate the region that is included in the phylogenetic analyses. In HCN channels, these lines encompass vertebrate exons 2 through 7. Alignments of HCN vertebrate isoform-specific N-terminal (exon 1) and C-terminal (exon 8) regions include those residues that fall outside of this region. Figure 2: High sequence conservation observed between vertebrate isoforms. Sequence conservation between mammalian HCN1-4 (M1-M4) and fish HCN1-4 (F1-F4) isoforms indicate that mammalian HCN3 and fish HCN3 sequences are as similar to each other as they are to the other isoforms. This suggests that sequence divergence has continued independently in each lineage. Numbers indicate % sequence identity/sequence conservation generated by GeneDoc for the region between S1 and end of CNBD, where the total number of sites used was 457. Fore example, 85/94 correlates to 393 of 457 sites identical and 430 of 457 conserved. Human sequences were used as a representative of mammalian sequences and HCNa_fugu used as an example of fish sequences. ‘B’ sequences tend to show lower conservation. Page 43 of 58 43 Figure 3: A maximum parsimony (MP) consensus tree of the HCN family. 52 HCN sequences were included in this analysis. The MP tree was constructed by resampling 100 datasets (bootstrap) with randomized input order of 10 jumbles using Seqboot, Protpars and Consense programs in the PHYLIP software. To control for length differences between the different isoforms, only the region between the start of S1 and the end of the CNBD was used. The tree is rooted with an outgroup of KAT1, hERG1, CNGA1 and CNGA3, 4 sequences that are known to be related to the HCN family. Similar to the previous studies, HCN2_urchin is shown to branch before the invertebrate clade and may be consequence of the MP method’s sensitivity to unequal rates of evolution. The six urochordate sequences from c. intestinalis and c. savignyi fall according to their evolutionary position, between the invertebrates and vertebrate species, and are supported by high bootstrap values. Because they group within their own species, they are not considered orthologous to any of the mammalian isoforms and are therefore named HCNa, HCNb and HCNc. HCN3 is shown to be the product of the first duplication event, however, a split between the fish and mammalian sequences is observed but the partition is not well defined. HCN4 is shown to be the product of the second duplication event, followed by the emergence of HCN2 and HCN1. Low bootstrap values within isoform clades are most likely a consequence of the high sequence identity within the region used and in turn, the lack of informative sites required by the MP method. Duplicate copies of mammalian paralogs in the fish species are denoted with ‘a’ or ‘b’ in their sequence names. * indicate sequences annotated in this study. Numbers indicate bootstrap values and represent % support for the respective partitions. Figure 4: Conservation of exon structure reveals evolutionary patterns of HCN genes. The general regions of the HCN protein are outlined in the grey box, not drawn to scale. These include: the Nterminal region, six transmembrane segments (1-6), a pore region (P) that forms the pore and Page 44 of 58 44 selectivity filter in the quaternary structure of the channel, C-linker, CNBD, and the distal Cterminus. Below, the vertical lines represent exon boundary position of the coding sequence: black lines indicate an exon boundary position that is conserved with the human HCN isoforms, dark grey lines indicate conservation with invertebrate boundary positions and light grey indicates boundary position specific to the individual protein. The exon number is shown above the horizontal black line and amino acid length of the protein corresponding to the exon is shown below. The exon structure of most vertebrate isoforms (HCN1-4) is conserved with that of human (exception is fish HCN3b genes, see text for details), consisting of 8 exons with 2 through 7 corresponding to the region between the start of S1 and the end of the CNBD. The sequence diverges in length between isoforms in the N- and C-termini. A general HCN_human structure is shown as a representative with length differences of the four isoforms in exon 1 and 8 indicated by a range. The total lengths of the predicted exons and the exon boundary structure is variable among the three different c. intestinalis HCN coding sequences identified in this analysis. HCNb_c.intestinalis is composed of 15 exons, sharing all vertebrate HCN channels exon boundary positions, with an additional six unique boundaries and one boundary shared with the invertebrate HCN proteins of the fly and mosquito. HCNa_c.intestinalis is composed of 10 exons but only 4 boundaries are conserved with vertebrates. HCNc_c.intestinalis is predicted to have 14 exons, and shows no similarity in exon structure or boundary position to either of the other ciona proteins or vertebrate HCN. In general, the mosquito and fly proteins have a higher number of exons compared to vertebrate HCN, sharing only a few boundary positions. Figure 5: Sequence comparison of the voltage sensing domain and pore region. A) Sequence of voltage sensing domain conserved throughout evolution. The fly, human and fugu sequences were chosen to represent arthropod, mammal and fish species, respectively. Sequences are organized into invertebrates, urochordates and vertebrate groups. A high sequence identity is notable amongst all Page 45 of 58 45 vertebrate sequences. Shading indicates four levels of sequence conservation (see methods), excluding the assembly gap regions. Black stars indicate regions of assembly gaps in genome database and the arrow indicates the conserved exon boundary between mammalian exon 2 and 3. a = region of charge common to all potassium channel voltage sensing domains, b = region of charge unique to HCN channels and c = region associated with normal channel closure. B) A sequence comparison of the HCN pore region. The same representative sequences as in 5A were used, except HCN4_chicken. This was omitted due to a gap in the genome assembly across this region. The selectivity filter sequence CIGYG (a) is conserved throughout the HCN family, with the exception of two duplicate genes from sea urchin and urochordate species. The NXT/S glycosylation site (b) is located immediately prior to the pore helix and is conserved in urochordate and vertebrate sequences. Two conserved glycine residues (c and d) are located in S6, one of which was hypothesized to be the hinge that obstructs that pore. More recent evidence suggests that the hinge is located further downstream between the threonine and glutamine residues (e) at the C-terminal end of S6 in which a dipeptide is conserved throughout the entire HCN family. (f) The residue identified as responsible for the difference in chloride sensitivity in HCN2 vs. HCN1. Figure 6: Sequence comparison of the cyclic nucleotide binding domain. A sequence alignment of the CNBDs from representative sequences of the HCN family indicates that functionally important residues in the CNBD are conserved throughout evolution. Shading indicates four levels of sequence conservation (see methods). Evolutionarily conserved glycine residues known to be important for maintaining the fold of the V-roll found in all CNBDs (located between A and B) and those residues known to interact with the cyclic nucleotide are indicated by stars below the alignment. The phosphate binding cassette (PBC) and hinge region common to all CNBDs are identified by the black lines below Page 46 of 58 46 the sequence. The arrow indicates the hydrophobic site where divergence has occurred in ciona HCNc genes (as explained in text). Cylinders above the sequence represent helices and arrows = bstrands that together form the B-roll. Cyclic nucleotides are known to bind to the second conserved arginine in the PBC. Figure 7: Individual isoform alignments of the distal termini of HCN channels. 7A) Alignment of the N-terminal region of vertebrate HCN1, 2 and 4 isoforms. Sequence alignments of available vertebrate sequences for the N-terminal region of three of the four individual isoforms were produced using ClustalX and edited and shaded using GenDoc. Shading indicates four levels of sequence conservation (see methods). a = example of conserved motif in vertebrate HCN2 sequences that may play a role in channel surface expression. b = region conserved amongst all vertebrate HCN sequences that may be involved in tetrameric interactions in HCN2 and may play a similar role in HCN1, 3 and 4. Arrows between conserved blocks represent variable regions with the range of amino acid length variation listed above. Numbers at the start and end of the sequences correspond to their position in the full length sequences provided in Supplementary Figure 1. 7B) Alignment of the C-terminal region of vertebrate HCN1, 2 and 4 isoforms. Sequence alignments of all available vertebrate sequences for the C-terminal region of three of the four individual isoforms were produced using ClustalX and edited/shaded using GenDoc. Shading indicates four levels of sequence conservation (see methods). a = example of conserved motif in vertebrate HCN1 sequences has been shown to interact with Filamin A. b = region conserved amongst all vertebrate HCN sequences downstream of the CNBD. c = tripeptide sequence shown to be involved in protein-protein interactions with scaffolding proteins in vivo and to regulate cell surface expression. Dotted line indicates conserved region amongst four vertebrate isoforms previously identified but of unknown function. Arrows between conserved blocks represent variable regions with the range of amino acid length variation listed above. Numbers at the start and Page 47 of 58 47 end of the sequences correspond to their position in the full length sequences provided in Supplementary Figure 1. Supplementary Figure 1: FASTA sequences of putative HCN homologs used in analysis Supplementary Figure 2: Full length alignment of HCN sequences used in analysis. Channel regions indicated above the alignment. TMD = transmembrane domain, S4 = voltage sensor, SF = selectivity filter, CNBD = cyclic nucleotide binding domain. Black lines below alignment correspond to region used in phylogenetic analyses and to the region between the start of S1 and the end of the CNBD. Numbers correspond to position in sequence used in this study (see FASTA sequence in Supplementary Figure 1). * indicate gaps in genome assembly. C-terminal region in some fish species may be broken up into multiple exons but interpreted to be one because downstream portions are in frame with upstream sequence and high sequence conservation is shared between fish species. Supplementary Figure 3: Additional phylogenetic trees of HCN family. 3A) Neighbor-Joining (NJ) phylogram of the HCN family. The NJ tree was constructed with resampling (bootstrap, 1000 datasets) using ClustalX (excluding positions with gaps and corrected for multiple substitutions). Only the region between the start of S1 and the end of the CNBD was used, to control for length differences between the different isoforms. The tree is rooted with an outgroup of KAT1, hERG1, CNGA1 and CNGA3, four sequences that are known to be distantly related to the HCN family. Tree topology indicates that HCN3 was the first product of duplication events in the vertebrate lineage. There is a split between mammalian and fish HCN3 clades, which may be expected based on the sequence conservation shown in Figure 2, but the low bootstrap value indicates that this intra-isoform division remains unresolved. HCN4 is represented as the product of the second duplication event, followed by the emergence of HCN2 and HCN1. Similar to the HCN2_urchin, the urochordates, c. intestinalis and Page 48 of 58 48 c. savignyi, are shown to branch before the invertebrate clade. This position, however, is not conserved in the MP tree in which the 6 sequences fall according their evolutionary position, between the invertebrates and vertebrates. Furthermore, the low bootstrap values at the nodal partitions throughout the invertebrate/urochordate group indicate that the topology of these divisions remains unclear with this tree building method. This may be a consequence of excluding alignment positions with gaps and that insertions/deletions need to be included to resolve the evolutionary order of these primitive sequences. * indicate sequences identified in this study. Numbers indicate bootstrap values and represent number of trees containing the particular division. 3B) Maximum Likelihood phylogram of the HCN family. Due to the high sequence conservation and program limitations, the maximum likelihood phylogram was constructed using a subgroup of 41 sequences, resampling of 100 datasets and a randomized input order of 10 jumbles. Seqboot, Proml and Consense programs from PHYLIP were used and the JTT model of evolution was employed. A single representative tree similar in topology to consense tree was used to provide indication of branch length divergence. Numbers indicate bootstrap values for consense tree node divisions and represent % support for the respective partition. Consistent with the NJ and MP trees generated, HCN2_urchin is at the base of the phylogram, and is separated by a high bootstrap value. Similar to the NJ tree, ciona HCNb genes are independent from the HCNa and HCNc and the invertebrates/urochordates divisions remain undefined. HCN3 is separated from the other three vertebrate isoforms by a high bootstrap value, whereas the order of the other two duplication events remains unresolved by this method. As ML tree results are dependent on the evolutionary model imposed, this undefined partition suggests that the JTT model can not conclusively generate the sequence evolution observed within the HCN family. Based on branch length information, mammalian HCN4 and HCN2 sequences seem to have evolved the least from a common ancestral sequence, whereas all fish sequences and those in the HCN1 and Page 49 of 58 49 HCN3 clades seem to have undergone more evolutionary change. Similar results were suggested by the sequence identity profiles shown in Figure 2. Table 1: List of HCN Sequences Used in Analyses Naming of sequences = HCN isoform followed by the species. () indicate previous names for sequences. For duplicates, ‘a’ and ‘b’ labels were assigned arbitrarily. * indicate those that were reannotated or identified in this study and † indicates a sequence missing at least one internal exon and was therefore removed from phylogenetic analyses. Abbreviations: Invt., invertebrate; Mam., mammal; Vt., lower vertebrate; Uro., urochordate. NA = protein identifier was not available for the sequence included in final analysis. Page 50 of 58 1. ++ ++ S1 S2 S3 S4 S5 ++ ++ G Y G S6 Clin ke r P CN NH2 cAMP COOH Figure 1 BD Page 51 of 58 2. M1 F1 M2 F2 M3 F3 M4 F4 93/ 97 85/ 94 85/ 94 80/ 92 82/ 93 87/ 94 84/ 94 M1 85/ 93 83/ 93 79/ 92 82/ 94 86/ 94 82/ 93 F1 91/ 96 82/ 92 84/ 94 90/ 96 87/ 93 M2 80/ 91 83/ 93 88/ 95 85/ 93 F2 83/ 93 85/ 94 82/ 93 M3 86/ 95 84/ 94 F3 91/ 96 M4 F4 Figure 2 Page 52 of 58 3. hCNGA3 hCNGA1 KAT1 hERG1 HCN2_urchin Outgroup HCN_urchin HCN_lobster 100 Invertebrates HCN_mosquito 100 HCN_silkmoth 46.9 HCN_bee 51.3 HCN_fly HCNb_c.intestinalis* 100 HCNb_c.savignyi* 83.7 HCNa_c.intestinalis* 100 Urochordates HCNa_c.savignyi* 96 HCNc_c.intestinalis* 100 HCNc_c.savignyi* HCN3_green puffer* 100 HCN3a_fugu* 33.3 82.7 HCN3b_fugu* 100 HCN3_zebrafish* HCN3_opossum* 100 HCN3_mouse 92.5 75.8 HCN3_rat HCN3_cow 100 37.1 HCN3_human 28 HCN3_dog HCN4b_fugu* 41.2 HCN4b_green puffer* 93.4 HCN4_zebrafish 44.5 30.6 HCN4a_green puffer* 100 48.1 HCN4a_fugu* HCN4_opossum* 92.9 HCN4_mouse 66.1 HCN4_rat 41.5 HCN4_rabbit 38 HCN4_dog 41.3 HCN4_human 95 HCN2a_green puffer* 100 HCN2a_fugu* 56.6 HCN2b_green puffer* 72.1 97 HCN2_zebrafish HCN2_frog* 71.5 HCN2_rat 94.9 HCN2_human 69.7 56.5 HCN2_mouse HCN1_trout 100 HCN1_fugu 94.5 100 HCN1_green puffer* HCN1_mouse 98.5 100 HCN1_rat 25.1 HCN1_opossum 26.4 HCN1_chimp 62 HCN1_dog 29.5 HCN1_human 31.1 HCN1_rabbit 98 100 98 Figure 3 HCN3 HCN4 Vertebrates HCN2 HCN1 Page 53 of 58 4. 1 2 3 4 5 P N-terminus 1 HCN 1-4_human (774-1203aa) 2 93-262 141-143 HCNb_c.intestinalis (804aa) HCNa_c.intestinalis (688aa) HCNc_c.intestinalis (608aa) HCN_mosquito (945aa) HCN_fly (945aa) 6 C-linker CNBD C-terminus 3 4 5 6 7 8 54 73 49 80 56 225-488 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 41 45 53 44 54 44 29 49 40 42 56 38 45 197 1 2 3 4 5 6 7 8 9 10 219 61 43 54 75 50 44 42 48 52 1 2 3 4 5 6 7 8 9 10 11 12 13 14 18 29 43 25 49 50 54 43 46 53 27 60 74 37 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 79 80 55 45 46 50 26 47 57 32 29 64 43 41 5 6 7 8 9 10 11 12 13 50 26 104 32 29 106 44 1 2 3 4 268 55 80 55 45 51 Figure 4 Page 54 of 58 5A. S4 – voltage sensor S4-S5 linker Invertebrate HCN Urochordate HCN * Vertebrate HCN * a b 5B. c S6 Pore Invertebrate HCN Urochordate HCN Vertebrate HCN b a Figure 5 f c d e Page 55 of 58 6. A P * * * B * *** * PBC Figure 6 * *** * Hinge C ** Page 56 of 58 7a. 21-49 AAs b a 18-35 AAs 22-29 AAs 7-18 AAs 20-41 AAs 71-110 AAs Figure 7a Page 57 of 58 7b. B 57-113 AAs a b 13-14 AAs 10-11 AAs 29-41 AAs c 21-35 AAs b 54-103 AAs c b 8-14 AAs Figure 7b 191-560 AAs 66-112 AAs c Page 58 of 58 Table 1:List of HCN Sequences Used in Analyses Name Organism HCN_silkmoth (HvIh) Heliothis virescens HCN_bee (AmIh) Apis mellifera HCN_lobster (PaIh) Panulirus argus HCN_fly (DmIh) Drosophila HCN_urchin (SpIh/spHCN) Strongylocentrotus purpuratus HCN2_urchin (spHCN2) Strongylocentrotus purpuratus HCN1_mouse (mHCN1) Mus musculus HCN1_trout (tHCN1) Oncorhynchus mykiss HCN1_human (hHCN1) Homo sapiens HCN1_rabbit (rbHCN1) Oryctolagus cuniculus HCN1_dog Canis familiaris HCN1_rat (rHCN1) Rattus norvegicus HCN1_fugu Takifugu rubripes HCN2_human (hHCN2) Homo sapiens HCN2_mouse (mHCN2) Mus musculus HCN2_rat (rHCN2) Rattus norvegicus HCN2_zebrafish Danio Rerio HCN3_mouse (mHCN3) Mus musculus HCN3_human (hHCN3) Homo sapiens HCN3_rat (rHCN3) Rattus norvegicus HCN3_cow Bos taurus HCN3_dog Canis familiaris HCN4_human (hHCN4) Homo sapiens HCN4_mouse (mHCN4) Mus musculus HCN4_rabbit (rbHCN4) Oryctolagus cuniculus HCN4_rat (rHCN4) Rattus norvegicus HCN4_dog Canis familiaris HCN1_chimpanzee* Pan troglodytes HCN4_chimpanzee* † Pan troglodytes HCN1_opposum* Monodelphis domestica HCN3_opposum* Monodelphis domestica HCN4_opposum* Monodelphis domestica HCN2_chicken* † Gallus gallus HCN4_chicken* † Gallus gallus HCN1_frog* † Xenopus tropicalis HCN2_frog* Xenopus tropicalis HCN4_frog* † Xenopus tropicalis HCN1_green_puffer* Tetraodon nigroviridis HCN4a_green puffer* Tetraodon nigroviridis HCN4b_green_puffer* Tetraodon nigroviridis HCN2a_green_puffer* Tetraodon nigroviridis HCN2b_green puffer* Tetraodon nigroviridis HCN3_green_puffer* Tetraodon nigroviridis HCN3_zebrafish* Danio Rerio HCN4_zebrafish* Danio Rerio HCN2b_fugu*† Takifugu rubripes HCN2a_fugu* Takifugu rubripes HCN3a_fugu* Takifugu rubripes HCN3b_fugu* Takifugu rubripes HCN4a_fugu* Takifugu rubripes HCN4b_fugu* Takifugu rubripes HCN_mosquito* Anopheles gambiae HCNa_c.intestinalis* Ciona intestinalis HCNb_c.intestinalis* Ciona intestinalis HCNc_c.intestinalis* Ciona intestinalis HCNa_c.savignyi* Ciona savignyi HCNb_c.savignyi* Ciona savignyi HCNc_c.savignyi* Ciona savignyi Common Name Taxonomy silkmoth Invt. honey bee Invt. spiny lobster Invt. fruit fly Invt. sea urchin Invt. sea urchin Invt. mouse Mam. rainbow trout Vt. human Mam. rabbit Mam. dog Mam. norway rat Mam. japanese pufferfish Vt. human Mam. mouse Mam. norway rat Mam. zebrafish Vt. mouse Mam. human Mam. norway rat Mam. cow Mam. dog Mam. human Mam. mouse Mam. rabbit Mam. norway rat Mam. dog Mam. chimpanzee Mam. chimpanzee Mam. opossum Mam. opossum Mam. opossum Mam. red jungle fowl Vt. red jungle fowl Vt. frog Vt. frog Vt. frog Vt. spotted green pufferfish Vt. spotted green pufferfish Vt. spotted green pufferfish Vt. spotted green pufferfish Vt. spotted green pufferfish Vt. spotted green pufferfish Vt. zebrafish Vt. zebrafish Vt. japanese pufferfish Vt. japanese pufferfish Vt. japanese pufferfish Vt. japanese pufferfish Vt. japanese pufferfish Vt. japanese pufferfish Vt. mosquito Invt. sea squirt Uro. sea squirt Uro. sea squirt Uro. pacific sea squirt Uro. pacific sea squirt Uro. pacific sea squirt Uro. Identifier GI:3970750 GI:33355927 GI:33355925 GI:5326833 GI:47551101 GI:74136757 GI:6754168 GI:33312350 GI:32698746 GI:38605639 GI:73954256 GI:29840774 SINFRUP00000160459 GI:21359848 GI:6680189 GI:29840773 GI:68399930 GI:6680191 GI:38327037 GI:29840772 GI:76612489 GI:73961597 GI:4885407 GI:29840776 GI:38605640 GI:29840771 GI:74000965 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA Database GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank Ensembl GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank GenBank Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl Ensembl
© Copyright 2026 Paperzz