A Newly Classified Vertebrate Calpain Protease, Directly Ancestral to CAPN1 and 2, Episodically Evolved a Restricted Physiological Function in Placental Mammals

Daniel J. Macqueen,*,1 Margaret L. Delbridge,2 Sujatha Manthri,1 and Ian A. Johnston1

1 Physiological and Evolutionary Genomics Laboratory, School of Biology, Scottish Oceans Institute, University of St Andrews, St Andrews, Fife, United Kingdom
2 The ARC Centre of Excellence for Kangaroo Genomics, Ecology, Evolution and Genetics, Research School of Biology, The Australian National University, Canberra, ACT, Australia
*Corresponding author: E-mail: [email protected].
Associate editor: David Irwin

Research article

Abstract

The most studied members of the calpain protease superfamily are CAPN1 and 2, which are conserved across vertebrates. Another similar family member called μ/m-CAPN has been identified in birds alone. Here, we establish that μ/m-CAPN shares one-to-one orthology with CAPN11, previously described only in eutherians (placental mammals). We use the name CAPN11 for this family member and identify orthologues across vertebrate lineages, which form a monophyletic phylogenetic clade directly ancestral to CAPN1 and 2. In lineages branching before therians (live-bearing mammals), the CAPN11 coding region has evolved under strong purifying selection, with low nonsynonymous (dN) versus synonymous (dS) substitution rates (dN/dS = 0.076 across pretherians), and its transcripts were detected widely across different tissues. These characteristics are present in CAPN1 and 2 across vertebrate lineages and indicate that pretherian CAPN11 likewise has conserved a wide physiological function. However, an ~7-fold elevation in dN/dS is evident along the CAPN11 branch splitting eutherians from platypus, paralleled by a shift to “testis-specific” gene regulation.
Estimates of dN/dS in eutherians were ~3-fold elevated compared with pretherians, and coding and transcriptional-level evidence suggests that CAPN11 is functionally absent in marsupials. Many CAPN11 sites are functionally constrained in eutherians to conserve a residue with radically different biochemical properties to a fixed state shared between pretherian CAPN11 and CAPN1 and 2. Protein homology modeling demonstrated that many such eutherian-specific residue replacements modify or ablate interactions with the calpain inhibitor calpastatin that are observed in both pretherian orthologues and CAPN1/2. We propose a model akin to the Dykhuizen–Hartl effect, where inefficient purifying selection and increased genetic drift, associated with a reduction in effective population size, drove the fixation of mutations in regulatory and coding regions of CAPN11 of a common marsupial–eutherian ancestor. A subset of these changes had a cumulative adaptive advantage in a eutherian ancestor because of lineage-specific aspects of sperm physiology, whereas in marsupials, no advantage was realized and the gene was disabled. This work supports the view that functional divergence among gene family member orthologues is possible in the absence of widespread positive selection.

Key words: CAPN11 and μ/m-CAPN, episodic gene evolution, functional divergence of gene family orthologues, transcriptional regulation, functional constraints, Dykhuizen–Hartl effect.

Introduction

The calpain superfamily of Ca2+-dependent cysteine proteases regulates a multitude of physiological processes including apoptosis, membrane fusion, cell motility, and signal transduction (reviewed by Goll et al. 2003) and is implicated in several human diseases (Saez et al. 2006). Vertebrates other than teleosts have up to 15 calpains, originating from duplications dating before and within metazoan and vertebrate lineages (Jékely and Friedrich 1999).
Calpains have been classified by their domain structure and/or expression patterns (Sorimachi et al. 1997; Sorimachi and Suzuki 2001; Goll et al. 2003). The first classification separates the family into “typical” or “atypical” groups (Goll et al. 2003). Typical calpains, including CAPN-1, -2, -3, -8, -9, -11, -12, -13, and -14, have a conserved set of functional domains called DI, DIIa, DIIb, DIII, and DIV (Sorimachi and Suzuki 2001; Goll et al. 2003). Atypical calpains invariably have DIIa and DIIb, which together form the papain-like protease domain present in all calpains; some have DIII, which is reminiscent of C2-like complement domains, and some have additional conserved domains (Sorimachi and Suzuki 2001; Goll et al. 2003). DIV, a calmodulin-like penta-EF-hand domain, is unique to the typical calpains (Sorimachi and Suzuki 2001; Goll et al. 2003). The second commonly used classification is based on messenger RNA (mRNA) transcript levels in different tissues and splits the family into “ubiquitous” or tissue-specific types (e.g., Sorimachi and Suzuki 2001; Saez et al. 2006).

© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Mol. Biol. Evol. 27(8):1886–1902. 2010 doi:10.1093/molbev/msq071 Advance Access publication March 11, 2010

The majority of research into calpains has focused on two typical family members, CAPN1 and 2, which are closely related 80-kDa proteins coded by two respective genes, CAPN1 and 2. When bound to a common 30-kDa small subunit (called CAPNS1), they are classified as μ- and m-CAPN, respectively, which are considered the archetypal broadly functioning ubiquitous calpains (Sorimachi and Suzuki 2001; Goll et al. 2003).
Indeed, CAPN1 and 2 genes are each transcribed broadly across tissues in mammals and birds (Sorimachi et al. 1995; Farkas et al. 2003). Both μ- and m-CAPN cleave hundreds of known substrates in many cell types while being tightly regulated by a specific ubiquitous inhibitor called calpastatin (CAST; Goll et al. 2003). Although μ- and m-CAPN share fundamental biochemical properties due to the close evolutionary relationship of CAPN1 and 2 (Jékely and Friedrich 1999; Macqueen et al. 2010), they also have distinct functions; for example, they are activated by different levels of cellular Ca2+ (μM and mM concentrations, respectively; Sorimachi and Suzuki 2001; Goll et al. 2003). Another family member called CAPN8 is more closely related phylogenetically to CAPN2 than to CAPN1 (Jékely and Friedrich 1999) and, while requiring a similar Ca2+ concentration for its activation (Hata et al. 2007), has diverged in several critical functions shared by CAPN1 and 2; for example, in mammals, its transcripts are “stomach specific” (Sorimachi and Suzuki 2001) and its protein does not require CAPNS1 for activity (Hata et al. 2007).

Apart from CAPN1 and 2, the only other characterized ubiquitous typical calpain was identified solely in birds and called μ/m-CAPN because its sequence similarity and Ca2+ requirement are intermediate between those of CAPN1 and 2 (Sorimachi et al. 1995). It branched closely to CAPN1 and 2 within a calpain phylogeny (Jékely and Friedrich 1999), and its mRNA transcripts were similarly broadly expressed across tissues (Sorimachi et al. 1995; Lee et al. 2007). In common with CAPN1 and 2, chicken μ/m-CAPN forms a heterodimer with CAPNS1 and is inhibited by CAST (Wolfe et al. 1989). It is the predominant translated calpain in chicken erythrocytes (Murakami et al. 1988) and across different avian tissues (Lee et al. 2007). Furthermore, in chickens, CAPN1 protein is present at very low concentrations in several tissues and CAPN2 is not translated (Lee et al. 2007).
These results are in marked contrast to the situation in mammals, for example, where CAPN2 is developmentally essential (Dutt et al. 2006).

The phylogenetic status of μ/m-CAPN is unclear, and it has been variously described as avian specific (Lee et al. 2007) or, based on sequence identity, as a potential orthologue of CAPN11 (Dear et al. 1999). However, CAPN11 has only been characterized in eutherians from the Euarchontoglires lineage, where its gene transcripts are testis restricted (Dear and Boehm 1999; Dear et al. 1999). Here, we demonstrate using shared synteny and phylogenetic analyses that μ/m-CAPN and eutherian CAPN11 genes share true orthology and are examples of a single vertebrate-wide calpain family member (CAPN11) that may represent the progenitor sequence to CAPN1 and 2. Our results suggest that CAPN11 of pretherians is under strict purifying selection to maintain wide physiological functions across many cell types. However, CAPN11 became testis specific in a common eutherian ancestor in parallel with a striking shift in coding-level constraints that led to residue replacements affecting physical interactions with CAST that are conserved in both pretherian CAPN11 and CAPN1/2. This study provides evidence for functional divergence of calpain orthologues on par with differences previously observed between paralogous members of the superfamily and highlights the pitfalls of adopting classification systems for orthologous gene family members based on expression patterns or functions.

Materials and Methods

Sequences

Sequences used in this study are listed in supplementary table S1, Supplementary Material online, and were obtained from Ensembl (http://www.ensembl.org) or NCBI (http://www.ncbi.nlm.nih.gov/) databases. Blast searches were performed using the NCBI Web server (http://blast.ncbi.nlm.nih.gov/Blast.cgi), specifically with BlastP versus the nonredundant NCBI protein database or TBlastN versus the marsupial expressed sequence tag (EST) database.
Regions of Ensembl marsupial genomes that were predicted to harbor CAPN11 were screened against the Ensembl trace server (http://trace.ensembl.org/cgi-bin/tracesearch) to confirm that observed nucleotides were not due to sequencing errors (see Results). Sequence alignment of marsupial calpain proteins against other family members was performed with Mafft v.6 employing the G-INS-I strategy (Katoh and Toh 2008).

Phylogenetic Analyses

Fifty-eight amino acid sequences were used for phylogenetic analyses including CAPN-1, -2, -3, -8, -9, and -11 orthologues (listed in supplementary table S1, Supplementary Material online). The whole polypeptide sequence contains valuable phylogenetic signal because each included calpain has an identical domain structure. Sequence alignment was performed with Mafft v.6 employing the G-INS-I strategy (Katoh and Toh 2008). An 890-site output was manually checked and submitted to Gblocks to eliminate ambiguous/saturated sites with the most stringent block selection setting (Castresana 2000). The 282-site output (alignment A in supplementary fig. S1, Supplementary Material online) was submitted to ProtTest (Abascal et al. 2005), which indicated Le and Gascuel (LG) + G + I (LG substitution model with estimation of gamma distribution shape parameter, α, and the proportion of invariable sites) as the best fitting of 112 examined evolutionary models according to Akaike information criterion (AIC) statistics. Maximum likelihood (ML) was performed using this model in PhyML (Guindon and Gascuel 2003) with four substitution rate categories. Bayesian inference (BI) was performed in MrBayes 3.12 (Ronquist and Huelsenbeck 2003) with the next best ProtTest model (Jones, Taylor and Thornton [JTT] + G + I) because LG is not available in the program. Two runs were used, each of a single chain of 20,000,000 generations, sampled every 20,000 generations. Convergence was assessed by comparing the standard deviation (SD) of split frequencies between runs. Visual assessment with Tracer v1.4 (Drummond and Rambaut 2007) also indicated that a suitable mixing of Markov chains was obtained. The first 5,000 generations were discarded and remaining sample independence was confirmed by a lack of autocorrelation in tree log-likelihood values assessed with Minitab 13.2 (Minitab Inc., State College, PA). The final 15,000 samples were used to obtain a consensus tree and posterior probabilities. Neighbor joining (NJ) and maximum parsimony (MP) trees were constructed in Mega 4.0 (Tamura et al. 2007) with 1,000 bootstrap replicates. For NJ, the best available ML model was used (JTT + G, α = 0.945, estimated by PhyML), and additionally, an exploratory analysis was performed to establish the effect of among-site rate variation on tree topology, where the JTT model was used with α fixed at either 0.25, 0.5, 0.75, or 5. For MP, the close-neighbor-interchange algorithm was used.

Selection Analyses

Sequences used for selection analyses are listed in supplementary table S1, Supplementary Material online, and included orthologues of CAPN11 (21 sequences), CAPN1 (18 sequences), and CAPN2 (14 sequences). Teleost CAPN2 sequences were excluded because phylogenetic evidence indicates that they are not direct orthologues of tetrapod CAPN2 (Macqueen et al. 2010). When sequences were obtained from Ensembl databases, only species with 6-fold coverage or greater were included to limit sequencing errors. Codon alignments were constructed separately for each family member by submitting nucleotide sequences (complete or near-complete coding sequences) to Pal2Nal (Suyama et al. 2006) along with a manually checked alignment of translated amino acids produced using Mafft v.6 with the G-INS-I strategy (Katoh and Toh 2008).
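The codon-alignment step can be illustrated with a minimal Python sketch. It threads an unaligned coding sequence onto an aligned protein sequence, which is the general idea behind protein-guided codon alignment; it is not Pal2Nal's implementation, and the function name `thread_codons` and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned coding sequence onto an aligned protein sequence.

    Every residue in the protein alignment consumes the next three
    nucleotides of the CDS; every gap ('-') becomes a codon-sized gap ('---').
    Illustrative only: stop codons, frameshifts, and mismatches between the
    CDS translation and the protein are not handled.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be three times the number of residues")

    codons = []
    pos = 0  # current offset into the CDS
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy example (hypothetical sequences): the gap in the protein alignment
# is expanded to a three-nucleotide gap in the codon alignment.
print(thread_codons("M-K", "ATGAAA"))  # ATG---AAA
```

Because gaps are always inserted in multiples of three, the reading frame is preserved, which is what codon substitution models such as MG94 require.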
For CAPN1, 2, and 11, finished codon alignments of 2,148, 2,100, and 2,223 sites, respectively (alignments B, C, and D, respectively, in supplementary fig. S1, Supplementary Material online), were loaded into HyPhy v.0.99 (Kosakovsky Pond and Frost 2005a) along with a corresponding ML phylogenetic tree (provided in supplementary fig. S1, Supplementary Material online) constructed by the approach described above. For each calpain alignment, the HyPhy batch file NucModelCompare was used to establish the best fitting of 203 general time reversible (GTR) models of nucleotide substitution (Kosakovsky Pond et al. 2009). To test if selective constraints on calpain family members were altered during mammalian evolution, the HyPhy batch file SelectionLRT was employed using the best-fitting GTR model crossed with the MG94 codon model. This approach used likelihood ratio tests (LRTs) to compare the plausibility of five evolutionary models where dN estimates were either independent or constrained to be equal in different combinations of three specified partitions within the codon data (Kosakovsky Pond et al. 2009). The specified data partitions were the eutherian clade, the pretherian clade, and their separating branch. The models tested were (1) a global dN estimate; (2) a constrained dN estimate for eutherians and pretherians with an independent estimate for the separating branch; (3) a constrained dN estimate for eutherians and the separating branch with an independent estimate for pretherians; (4) a constrained dN estimate for pretherians and the separating branch with an independent estimate for eutherians; and (5) independent dN estimates for eutherians, pretherians, and the separating branch. AIC statistics were used to determine the relative rank of each model and approximate its relative probability (Burnham and Anderson 2002).
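The AIC-based ranking can be made concrete with a short Python sketch of the standard Akaike-weight calculation: each model's AIC difference from the best model is converted to a relative likelihood, exp(-Δ/2), and the relative likelihoods are normalized to sum to 1. The function name `akaike_weights` is hypothetical; the input values are the CAPN1 AIC scores reported in table 1.

```python
import math


def akaike_weights(aic_values):
    """Return (deltas, relative likelihoods, Akaike weights) for a list of AIC scores."""
    best = min(aic_values)
    deltas = [aic - best for aic in aic_values]          # difference from the best model
    rel_lik = [math.exp(-0.5 * d) for d in deltas]       # relative likelihood of each model
    total = sum(rel_lik)
    weights = [r / total for r in rel_lik]               # normalized so weights sum to 1
    return deltas, rel_lik, weights


# AIC scores for Models 1-5 of the CAPN1 alignment, as reported in table 1.
capn1_aic = [35592.81, 35592.28, 35584.45, 35587.89, 35586.05]
deltas, rel_lik, weights = akaike_weights(capn1_aic)
for model, (d, r, w) in enumerate(zip(deltas, rel_lik, weights), start=1):
    print(f"Model {model}: delta={d:.2f} rel_lik={r:.3f} weight={w:.3f}")
```

Rounded to two decimal places, the weights for CAPN1 Models 3, 4, and 5 come out as 0.60, 0.11, and 0.27, in agreement with the values listed in table 1.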
The HyPhy batch file AnalyzeCodonData was used to estimate dN and dS for every branch of each specified phylogenetic tree by locally fitting the MG94 codon substitution model crossed with the best-fitting GTR model. Final branch dN and dS values were taken as the average of values obtained by nonparametric bootstrapping of this procedure with 100 replicates.

Gene Expression Analyses

The specific expression of CAPN1, 2, and 11 and RPS13 (coding a ribosomal protein) was examined using reverse transcription–polymerase chain reaction (RT-PCR) with first-strand complementary DNA (cDNA) templates derived from RNA extracted from a panel of adult tissues for mouse (Mus musculus), pig (Sus scrofa), tammar wallaby (Macropus eugenii), platypus (Ornithorhynchus anatinus), green anole lizard (Anolis carolinensis), zebra finch (Taeniopygia guttata), frog (Xenopus laevis), and zebrafish (Danio rerio). Forty PCR cycles were used for tammar wallaby CAPN11 and 35 cycles for all other measured genes. Detailed methods and primer details are provided in supplementary file S1, Supplementary Material online.

DIVERGE 2.0 Rate Tests

Type I and II “functional divergence” was assessed using the program DIVERGE 2.0 (Gu 2006). The employed alignment (alignment E in supplementary fig. S1, Supplementary Material online) was constructed as for the main phylogenetic analysis minus the Gblocks submission to ensure maximum site inclusion. An ML tree was constructed using the same approach as for the main phylogenetic analysis and loaded into DIVERGE 2.0 before the following clades were specified: eutherian CAPN11 (six sequences), sauropsid (birds and reptile) CAPN11 (four sequences), teleost CAPN11 (six sequences), vertebrate CAPN1 (eight sequences), and tetrapod CAPN2 (six sequences). It was not possible to specify other CAPN11 lineages (e.g., amphibians or monotremes) because the minimum requirement of four sequences (Gu 2006) was not met.
P values were calculated from z scores using an applet provided by the WEB Interface for Statistics Education (http://wise.cgu.edu/). When the coefficient of functional divergence θ was significantly greater than 0, posterior probability values for individual sites were examined. Sequence logos were created in WebLogo (Crooks et al. 2004).

Ancestral Sequence Reconstruction for the Common Eutherian Ancestor

Ancestral sequence reconstruction (ASR) was performed with Datamonkey (Kosakovsky Pond and Frost 2005b) using joint and marginal ML as well as sampled reconstruction to obtain CAPN11, CAPNS1, and CAST4 sequences representing the common eutherian ancestor. For CAPN11, ASR was performed using amino acid translations of the codon alignment and the same phylogenetic tree employed for selection analyses. A model selection test supported that JTT + F (F denotes data-derived frequencies) was the best fitting of 28 examined evolutionary models according to AIC statistics. This model was employed with four substitution rate categories, allowing a gamma distribution of among-site substitution rates. The three methods of ASR agreed at 98.5% of sites. Type II sites agreed among all ASR methods, receiving marginal ML probabilities >0.95 and posterior probability values of 1.0 (sampled reconstruction). Marginal ML reconstructed sequences were selected because they generally had the highest support values at nonagreeing positions and were most realistic considering the alignment data. For CAST4 and CAPNS1, ASR was performed as for CAPN11, using respective alignments representing 13 and 12 full-coding amino acid sequences, for a similar set of pretherian and eutherian species (sequences are provided in supplementary table S1, Supplementary Material online, respective alignments F and G in supplementary fig. S1, Supplementary Material online).
Sequences were aligned and processed as described in the Phylogenetic Analyses section. For CAST4, a 621-site alignment was uploaded to Datamonkey, which specified a suitable NJ tree and indicated JTT + F as the best-fitting model. We were only interested in the accuracy of ASR for the CAST4 domain, which spanned sites 503–586. In this region, all sites agreed between ASR approaches and were well supported. For CAPNS1, a 174-site alignment was uploaded to Datamonkey, which specified a suitable NJ tree and indicated JTT as the best-fitting model. All reconstructed sites agreed between ASR approaches and were well supported.

Homology Modeling of Eutherian and Pretherian CAPN11

Protein homology modeling was performed using Protinfo PPC, which predicts atomic-level structures of heterodimeric proteins with high accuracy (Kittichotirat et al. 2009). The chosen structural template was rat (Rattus norvegicus) CAPN2 bound to CAPNS1 and CAST4 (Hanna et al. 2008, RCSB protein databank file: 3BOW.pdb). Target sequences were as follows: Model A: rat CAPN2, CAPNS1, and CAST4; Model B: rat CAPN1, CAPNS1, and CAST4; Model C: rat CAPN11, CAPNS1, and CAST4; Model D: chicken CAPN11, CAPNS1, and CAST4; and Model E: CAPN11, CAPNS1, and CAST4 of a common eutherian ancestor. We only submitted regions directly alignable with the sequences in 3BOW.pdb to avoid template-free modeling. For the full modeling pipeline, the reader is directed to Kittichotirat et al. (2009). The best energy-minimized models were selected according to their structure and interface confidence scores (Kittichotirat et al. 2009). The global quality of each model relative to the experimental structure was assessed using ERRAT, a program that can distinguish correctly and incorrectly determined regions of structures according to evidence-based expectations about atomic-level interactions (Colovos and Yeates 1993).
The experimental structure 3BOW.pdb received a global quality score of 89.9, meaning that around 90% of residues fall below the program's 95% rejection limit, which is typical of its 2.4 Å resolution (Colovos and Yeates 1993). The homology models received near-equivalent ERRAT scores (Model A: 89.3, Model B: 89.2, Model C: 89.9, Model D: 89.4, and Model E: 88.5), indicating that the Protinfo PPC modeling did not reduce structural resolution. Global model quality was then examined with QMEAN, which provides a score based on a series of atomic-level structural features, which becomes more negative as inferred model quality improves (Benkert et al. 2008). 3BOW.pdb received a score of 96.68 and the rat CAPN2 control model 96.51, again suggesting that the modeling approach did not reduce structural resolution. Other models received an even lower QMEAN score (86.2 to 90.29). ProQres was also used, which considers multiple atomic-level expectations about protein structure to score the local quality of homology models (Wallner and Elofsson 2005). A sliding window analysis depicts the local scoring function on a scale of 0–1 (0 being very unreliable). According to ProQres, the homology models were of comparable high local quality with the published calpain template, with scores falling below 0.5 in only a few short regions. The 3D structures of the various models were rendered and manipulated with Polyview 3D (Porollo and Meller 2007). PDB files for each model including a list of predicted residue–residue interactions are available on request to D.J.M.

Results

μ/m-CAPN Coding Genes Are Present in Many Vertebrate Lineages

Sequences with greater identity to chicken μ/m-CAPN than other calpains were identified in the genomes of teleosts, amphibians, reptiles, birds, and monotreme mammals (supplementary fig. S2, Supplementary Material online). For most species, one “μ/m-CAPN–like” sequence was observed, although zebrafish, stickleback (Gasterosteus aculeatus), and the frog X. laevis had two (supplementary fig. S2, Supplementary Material online). In Ensembl databases, these μ/m-CAPN–like sequences were variably annotated, for example, as CAPN1, CAPN11, CANX, novel, or by a sequence identifier from another database (e.g., supplementary fig. S2, Supplementary Material online).

Shared Synteny Exists between μ/m-CAPN and CAPN11 Coding Genes

We examined chromosomal regions containing genes coding the μ/m-CAPN–like sequences, plus CAPN1, 2, 8, and 11 in a broad range of vertebrates (fig. 1 and supplementary fig. S3, Supplementary Material online). Eutherian chromosomal tracts containing CAPN11 clearly share gene order with those harboring μ/m-CAPN–like coding genes of pretherian tetrapods and to a lesser extent zebrafish (fig. 1 and supplementary fig. S3, Supplementary Material online). This region was distinct from other genomic neighborhoods containing CAPN1, 2, and 8, where shared synteny was evident across vertebrates (supplementary fig. S3, Supplementary Material online). These results strongly indicate that eutherian CAPN11 genes are one-to-one orthologues of tetrapod μ/m-CAPN–like genes because there is no plausible mechanism other than direct inheritance to account for the exact pattern of conserved gene order observed among tetrapods (fig. 1).

FIG. 1. A comparison of shared synteny in the chromosomal neighborhood of eutherian CAPN11 and pretherian μ/m-CAPN–like genes provides evidence for a one-to-one orthologous relationship. The question mark indicates the absence of an opossum CAPN11 gene in its expected position. Syntenic genes are arrows pointing in the direction of sense-strand transcription. Avian-specific genes in this region are shown as gray or black arrowheads pointing in the direction of sense-strand transcription, respectively showing genes conserved in chicken and zebra finch or just chicken. An accepted phylogeny for the included taxa is shown to the figure's left.
Although the opossum (Monodelphis domestica) genome contained the same syntenic chromosomal neighborhood proximal to CAPN11 as in other tetrapods, CAPN11 was missing from its expected location (fig. 1 and supplementary fig. S3, Supplementary Material online, examined in a following section).

Phylogenetic Position of μ/m-CAPN–Like Proteins among the Wider Superfamily

BI and ML phylogenetic analyses employing the best-fitting available evolutionary models supported the shared synteny analysis because eutherian CAPN11 and pretherian μ/m-CAPN–like sequences branched within a monophyletic vertebrate clade, with nodes following the expected species relationships (fig. 2). Therefore, it is probable that these trees correctly reflect the expected topology of a vertebrate-wide calpain family member. We interpret this, in accordance with human nomenclature, as an expansion of existing CAPN11 family members and adhere to this naming system onward. The branch splitting platypus and eutherian CAPN11 was highly extended compared with all other CAPN11 branches and this seemed to affect several phylogenetic reconstruction methods including ML, evidenced by a low (<50%) bootstrap value (fig. 2). Furthermore, MP failed to retrieve a monophyletic CAPN11 clade (fig. 3A), as did NJ (fig. 3B), except when enforcing a gamma distribution shape parameter allowing for extreme among-site rate variation (α = 0.25; fig. 3C).

Molecular Evolution of the CAPN11 Clade

Comparing rates of dN and dS is a common way to examine selective pressures on coding regions (Hughes and Nei 1988; Fay and Wu 2003; Kosakovsky Pond et al. 2009; Pybus and Shapiro 2009). A frequent assumption made is that nonsynonymous replacements, by altering protein function, generally affect fitness more than silent replacements.
Although it should not be assumed that silent replacements in coding regions are always selectively neutral, for example, due to constraints imposed by secondary nucleic acid structure or codon usage preference (Pybus and Shapiro 2009), it is reasonable to assume that most are more neutral than those replacements altering an amino acid. Thus, changes in the dN/dS ratio approximate shifts in selective constraint and can detect instances of diversifying positive selection (e.g., Hughes and Nei 1988). Commonly, a dN/dS value of 1 is used to imply neutral evolution of a coding region and values less than or greater than 1 to indicate purifying and positive selection, respectively (Fay and Wu 2003; Kosakovsky Pond et al. 2009; Pybus and Shapiro 2009).

We first examined how selective constraints varied on the whole coding region of CAPN11 during vertebrate evolution. By globally fitting an evolutionary model, a dN/dS estimate of 0.12 was obtained, which was similar to, but slightly higher than, that for CAPN1 (0.080) and 2 (0.096) (table 1), providing a crude measure that strong purifying selection has been maintained across vertebrate lineages. However, dN for the branch splitting platypus from eutherians was ~2.5-fold higher than any other pretherian branch (not shown, evident in fig. 2). Thus, we sought to test whether this was due to a shift in selective constraints. This was achieved by comparing the likelihood of five evolutionary models where estimates of dN/dS were either constrained or allowed to be independent in three partitions within the codon alignments, namely eutherian branches, pretherian branches, and the separating branch (table 1). By deriving AIC statistics, we obtained an Akaike weight for each model to approximate the probability that it was the best fitting of those tested (Burnham and Anderson 2002). For CAPN11, LRTs indicated that Models 2–5 provided a significantly better fit to the data than the global model (table 1). However, Model 5, where dN/dS was estimated separately for eutherians, pretherians, and the separating branch, received overwhelming support as being the most plausible model (table 1, Akaike weight of 0.998). Respective Model 5 dN/dS estimates were 0.076 and 0.25 for pretherian and eutherian clades and 0.53 for the separating branch (table 1). Conversely, for both CAPN1 and 2, dN/dS estimates did not deviate very strongly from the global estimate in specified partitions for any of three competing evolutionary models that had between a 10% and 60% probability of being the best fitting according to their Akaike weight (table 1). However, it should be mentioned that for both CAPN1 and 2, a small decrease in dN/dS is observed in eutherians compared with pretherians in each competing model (table 1).

The above approach provides a statistical framework suggesting a striking shift in selective constraints on eutherian CAPN11 after the split from monotremes. However, this method fixes dN/dS estimates across specified clades, such that branch-by-branch variation was not considered. Thus, an evolutionary model was locally fit to estimate branch-specific dN and dS. With this approach, dN/dS for the branch splitting platypus from eutherians was 0.54. Estimates of dN/dS for individual pretherian branches were invariably lower, with a mean value of 0.10 and SD of 0.091. For platypus, the closest pretherian relative to eutherians, purifying selection was as strong (dN/dS = 0.08) as in other pretherian branches. Conversely, dN/dS estimates for eutherian branches were almost invariably higher than pretherian branches, but generally lower than the common therian branch, with a mean value of 0.31 and SD of 0.17. In support of the compartmentalization analysis, dN/dS values of individual branches in the CAPN1 and 2 phylogenies were similar in pretherians and eutherians, again suggesting that selective constraints remained stable during evolution.

FIG. 2. ML and BI phylogenetic analyses show that μ/m-CAPN–like and CAPN11 amino acid sequences form a monophyletic vertebrate clade and represent a single calpain family member ancestral to CAPN1 and CAPN2/8. The shown BI topology was highly similar to trees constructed by ML. Trees were rooted at the CAPN3/9 stem, an outgroup position established previously (Jékely and Friedrich 1999; Macqueen et al. 2010). All posterior probability values are shown above nodes in the BI tree and supporting ML bootstrap values >50% are shown below nodes or after BI values (i.e., BI value/ML value). The scale shows the number of substitutions per site along each branch. The chromosomal location of teleost calpain family member co-orthologues is also shown.

FIG. 3. MP and NJ methods of phylogenetic reconstruction did not recapture a monophyletic CAPN11 clade in most instances. Condensed trees are shown with bootstrap confidence values for important nodes. (A) MP tree. (B) NJ tree using the best-fitting available ML substitution model (JTT + G, α = 0.945 as estimated by PhyML). (C) NJ tree imposing a gamma shape parameter (α = 0.25) allowing extreme among-site rate variation.

The Transcriptional Regulation of CAPN11 across Vertebrate Evolution

Next we examined the tissue-specific mRNA expression of CAPN1, 2, and 11 from vertebrates spanning all major classes (fig. 4). For species representing eutherians (mouse and pig), marsupials (tammar wallaby), monotreme mammals (platypus), reptiles (green anole lizard), birds (zebra finch), amphibians (African clawed frog), and teleost fish (zebrafish), CAPN1 and 2 were expressed widely across tissues (fig. 4).
Likewise, CAPN11 of all examined pretherian vertebrates, including platypus, green anole lizard, zebra finch, frog, and zebrafish, was not restricted in its expression across tissues (fig. 4). Conversely, in mice, pigs, and humans (Homo sapiens), CAPN11 was only expressed in testis (fig. 4 and Dear and Boehm 1999; Dear et al. 1999). Previous work showed that CAPN11 transcripts accumulated specifically in spermatozoa of adult mice (Dear and Boehm 1999) and that its protein localized to the acrosomal cap (Ben-Aharon et al. 2006). A parsimonious explanation for these results is that changes in eutherian CAPN11 regulation predate the split of Laurasiatheria (pig) and Euarchontoglires (mouse/human) lineages, which occurred some 95–113 million years ago, a period spanning the base of eutherian evolution (Benton and Donoghue 2007). A 176-bp exonic fragment of a putative marsupial CAPN11 orthologue in tammar wallaby (discussed further in a following section) was confirmed as present in the genome, but we could not detect a transcribed product in eight different tissues after 40 RT-PCR cycles (fig. 4). This suggests that CAPN11 has been transcriptionally disabled in this animal. Unfortunately, opossum samples were not available to confirm if CAPN11 transcription is disabled more widely in marsupials.

Divergence in Functional Constraints between CAPN11 and Other Family Members

The established notion that functional importance and evolutionary conservation are inherently linked (Kimura 1983) is a central tenet of statistical methods formulated to identify sites in related proteins with distinct functional constraints (e.g., Gu 1999, 2006; Gribaldo et al. 2003). Such sites are thought to underlie functional specificities of distinct phylogenetic clades and two types have been defined. Type I sites (Gu 1999), otherwise known as heterotachous sites (Gribaldo et al. 2003), are those fixed in one clade, but variable in another, whereas type II sites (Gu 2006) are functionally constrained in both clades but fixed as residues with radically different biochemical properties. We employed DIVERGE 2.0 (Gu 2006) to explore the hypothesis that shifts in functional constraints occurred in CAPN11 following the split of eutherians from pretherians. Clades representing CAPN11 orthologues of eutherians, sauropsids, and teleosts were compared with each other and with clades for their paralogues, CAPN1 and 2. The coefficient of functional divergence (θ) is a statistical measure of the strength of divergence in functional constraints between compared clades ranging from a value of 0 to 1 (Gu 1999, 2006). Rejection of the null hypothesis that its value is equal to 0 indicates that a shift in functional constraints is present at some sites, which is a proxy for differences in protein function (Gu 1999, 2006).

Table 1. Details of HyPhy Analysis Used to Compare the Plausibility of Five Models Estimating Selective Pressures in Different Phylogenetic Partitions of Calpain Family Member Codon Alignments.

Model  LogL         P^a            AICi^b     Δi^c    Relative likelihood(i)^d  Akaike weight (wi)^e
(the data partition and dN/dS estimate for each model are given on the line below its row)

CAPN1
1      −17,757.14   NA             35,592.81  8.36    0.015                     0.0092
       Global = 0.0802 (0.0758–0.0848)
2      −17,754.94   0.11           35,592.28  7.83    0.020                     0.012
       Eutherians and pretherians = 0.0816 (0.0770–0.117) versus separating branch = 0.0572 (0.0436–0.0732)
3      −17,753.22   0.0012         35,584.45  0       1.00                      0.60
       Eutherians and separating branch = 0.0646 (0.0567–0.0733) versus pretherians = 0.0881 (0.0827–0.0938)
4      −17,754.95   0.0085         35,587.89  3.44    0.18                      0.11
       Pretherians and separating branch = 0.0858 (0.0807–0.0911) versus eutherians = 0.0659 (0.0566–0.0762)
5      −17,753.02   0.0045         35,586.05  1.60    0.45                      0.27
       Eutherians = 0.0660 (0.0567–0.0763) versus pretherians = 0.0882 (0.0828–0.0938) versus separating branch = 0.0571 (0.0435–0.0730)

CAPN2
1      −13,423.69   NA             26,913.38  16.06   0.00033                   0.00017
       Global = 0.0961 (0.0894–0.1031)
2      −13,422.54   0.13           26,913.08  15.76   0.00038                   0.00020
       Eutherians and pretherians = 0.0945 (0.0878–0.102) versus separating branch = 0.200 (0.120–0.308)
3      −13,416.08   9.56 × 10^−5   26,900.16  2.84    0.24                      0.12
       Eutherians and separating branch = 0.0774 (0.0683–0.0873) versus pretherians = 0.113 (0.104–0.123)
4      −13,414.66   2.14 × 10^−5   26,897.32  0       1.00                      0.52
       Pretherians and separating branch = 0.114 (0.104–0.124) versus eutherians = 0.0749 (0.0657–0.0849)
5      −13,414.02   6.32 × 10^−5   26,898.05  0.73    0.70                      0.36
       Eutherians = 0.0743 (0.0652–0.0843) versus pretherians = 0.112 (0.102–0.122) versus separating branch = 0.198 (0.119–0.304)

CAPN11
1      −21,272.13   NA             42,632.26  299.62  8.67 × 10^−66             8.65 × 10^−66
       Global = 0.118 (0.112–0.124)
2      −21,238.72   3.33 × 10^−16  42,567.43  234.79  1.037 × 10^−51            1.035 × 10^−51
       Eutherians and pretherians = 0.111 (0.106–0.117) versus separating branch = 0.486 (0.426–0.552)
3      −21,127.51   0              42,345.03  12.39   0.0020                    0.0020
       Eutherians and separating branch = 0.273 (0.254–0.293) versus pretherians = 0.0769 (0.0718–0.0821)
4      −21,167.94   0              42,425.89  93.25   5.63 × 10^−21             5.63 × 10^−21
       Pretherians and separating branch = 0.082 (0.077–0.087) versus eutherians = 0.253 (0.232–0.275)
5      −21,120.32   0              42,332.64  0       1.00                      0.998
       Eutherians = 0.252 (0.231–0.273) versus pretherians = 0.0761 (0.0711–0.0813) versus separating branch = 0.534 (0.468–0.606)

NOTE. The values given in parentheses are 95% confidence intervals.
a P value established by LRT indicating whether the given model provides a significant improvement of fit to the data compared with the global model.
b AICi is the calculated AIC value for the given model (i).
c Δi = AICi − minAIC (the model with the lowest AIC) (after Burnham and Anderson 2002).
d Relative likelihood(i) = exp(−1/2 Δi) (after Anderson and Burnham 2002).
e wi = relative likelihood(i)/sum of relative likelihood(i) of all tested models (after Anderson and Burnham 2002).
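The Akaike weights reported in table 1 follow from the AIC values through the three definitions given in the table notes (Δi, relative likelihood, and weight). The short Python sketch below is an illustration only: the function name akaike_weights is hypothetical and is not part of HyPhy, and the input is simply the list of CAPN1 AIC values copied from table 1.

```python
import math


def akaike_weights(aic_values):
    """Return (deltas, relative likelihoods, weights) for a list of AIC values.

    Follows the definitions in the notes to table 1:
    delta_i = AIC_i - min(AIC), rel_i = exp(-delta_i / 2),
    w_i = rel_i / sum(rel).
    """
    best = min(aic_values)
    deltas = [aic - best for aic in aic_values]
    rel = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel)
    weights = [r / total for r in rel]
    return deltas, rel, weights


# AIC values for CAPN1 Models 1-5, copied from table 1.
capn1_aic = [35592.81, 35592.28, 35584.45, 35587.89, 35586.05]
deltas, rel, weights = akaike_weights(capn1_aic)

for model, (d, r, w) in enumerate(zip(deltas, rel, weights), start=1):
    print(f"Model {model}: delta = {d:.2f}, relative likelihood = {r:.2g}, weight = {w:.2g}")
```

Run on the CAPN1 values, this gives weights of about 0.60, 0.11, and 0.27 for Models 3, 4, and 5, matching the tabulated values; the same function can be applied to the CAPN2 and CAPN11 AIC columns.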
FIG. 4. Change in constraints on tissue-specific transcriptional regulation of CAPN11, but not CAPN1 or 2, during mammalian evolution. Shown are results for (A) mouse, (B) pig, (C) tammar wallaby, (D) platypus, (E) green anole lizard, (F) zebra finch, (G) frog, and (H) zebrafish. Abbreviations are B, brain; SKM, skeletal muscle; H, heart; SKI, skin; SP, spleen; LI, liver; T, testis; OV, ovary; K, kidney; LU, lung; -RTC, -reverse transcriptase control; NTC, no-template control; and gDNA, genomic DNA. The gDNA band for wallaby CAPN1 was expected because the primers used did not span an exon boundary.

As θ increases, so does the number of sites where constraints have changed between compared clades, making functional divergence more likely. Significant type I divergence was evident across all compared calpain clades (table 2). The type I θ values observed between eutherian CAPN11 in comparisons with orthologous (i.e., teleost or sauropsid CAPN11) or paralogous (i.e., CAPN1 and 2) clades were approximately twice the value of comparisons excluding eutherians (table 2). Therefore, type I sites seem to be more associated with eutherian than pretherian CAPN11. However, it was previously shown by Gribaldo et al. (2003) that type I sites were equally present in paralogous and orthologous subgroups of α and β hemoglobin subunits. These authors suggested that type I sites underlie common processes related to the evolution of homologous protein structures rather than being signatures for functional change (Gribaldo et al. 2003). In support of this notion, type I positions made up 95% of site variation among ~2,000 orthologues of cytochrome b, a mitochondrial protein unlikely to have diverged functionally (Lopez et al. 2002). Here, type I θ values were significantly positively correlated with the mean genetic distance between the compared clades (Spearman's R = 0.74, P = 0.02, not shown).
Thus, the increased type I θ value in eutherian CAPN11 may mainly reflect the increased dN in the lineage's stem. For these reasons, we deemed it would be difficult to distinguish type I sites that were candidates for clade-specific functions of CAPN11 from those representing background evolution.

We also observed significant type II θ values between eutherian CAPN11 and orthologous teleost or sauropsid clades (table 2). However, there was no evidence for type II divergence between teleost and sauropsid CAPN11 (table 2). Significant type II θ values were also observed comparing vertebrate CAPN1 and eutherian CAPN11 but not comparing CAPN1 and teleost or sauropsid CAPN11 (table 2). Furthermore, although type II divergence was observed between CAPN2 and each CAPN11 clade, θ was around twice as large in the comparison with eutherian CAPN11 (table 2). Type II divergence was absent between CAPN1 and 2 (table 2).

To gain insight into the type II residue replacements in eutherian CAPN11, sites were examined with the highest possible Bayesian posterior ratio score (PRS), which mainly represented positions completely fixed as one residue in eutherians and a radically different state in the compared clade. Such type II sites were observed in 29, 34, 16, and 26 of 646 sites in respective comparisons of eutherian CAPN11 with teleost CAPN11, sauropsid CAPN11, CAPN1, and CAPN2. Many of these type II sites were identified in all these comparisons and were generally identical or conserved in side chain biochemical property in pretherian CAPN11 and CAPN1/2 (fig. 5). Some type II sites receiving the highest PRS were identified in comparisons of eutherian CAPN11 with pretherian CAPN11 and either, but not both, CAPN1 and 2 (fig. 5). For a broader comparison, we included in figure 5 the residues conserved at the equivalent sites in CAPN3 and 8, which are phylogenetically closely related to CAPN11, 1, and 2 (Jékely and Friedrich 1999; Macqueen et al.
2010) but have some fundamentally distinct functions. For example, CAPN3 is "muscle specific" in mammals and birds (Sorimachi et al. 1995; Sorimachi and Suzuki 2001), does not bind CAPNS1, and is not inhibited by CAST (Sorimachi et al. 1997). At many type II sites, the residue fixed in CAPN3 and/or 8 is also fixed or at least conserved in biochemical property with the equivalent residue in pretherian CAPN11 and/or CAPN1 and/or 2 but not with eutherian CAPN11 (fig. 5). This suggests that many radical replacements at type II sites of eutherian CAPN11 are unique to this section of the calpain family and cannot be related to functional specificities of CAPN3/8 distinguishing them from CAPN1/2.

Table 2. Details of DIVERGE 2.0 Analysis of Functional Divergence.

Comparison                             θ        θSE     z score   P
Type I
CAPN11 eutherians versus sauropsids    0.63     0.098   6.50      <0.0001
CAPN11 eutherians versus teleosts      0.58     0.067   8.62      <0.0001
CAPN11 sauropsids versus teleosts      0.35     0.07    4.89      <0.0001
CAPN1 versus CAPN11 eutherians         0.50     0.054   9.32      <0.0001
CAPN1 versus CAPN11 sauropsids         0.35     0.072   4.89      <0.0001
CAPN1 versus CAPN11 teleosts           0.28     0.079   3.59      <0.001
CAPN2 versus CAPN11 eutherians         0.55     0.073   7.50      <0.0001
CAPN2 versus CAPN11 sauropsids         0.28     0.043   3.18      <0.001
CAPN2 versus CAPN11 teleosts           0.36     0.063   5.72      <0.0001
CAPN1 versus CAPN2                     0.197    0.048   4.11      <0.0001
Type II
CAPN11 eutherians versus sauropsids    0.15     0.038   3.92      <0.0001
CAPN11 eutherians versus teleosts      0.18     0.04    4.35      <0.0001
CAPN11 sauropsids versus teleosts      0.031    0.028   1.09      0.14
CAPN1 versus CAPN11 eutherians         0.12     0.048   2.41      0.008
CAPN1 versus CAPN11 sauropsids         −0.02    0.04    −0.49     0.69
CAPN1 versus CAPN11 teleosts           −0.031   0.043   −0.71     0.76
CAPN2 versus CAPN11 eutherians         0.11     0.046   2.44      0.007
CAPN2 versus CAPN11 sauropsids         0.060    0.034   1.74      0.041
CAPN2 versus CAPN11 teleosts           0.068    0.037   1.81      0.035
CAPN1 versus CAPN2                     −0.024   0.047   0.51      0.70

NOTE. θ, coefficient of functional divergence; θSE, standard error of θ; z score corresponding to P < 0.05 = 1.645, P < 0.01 = 2.326, P < 0.001 = 3.09, and P < 0.0001 = 3.719.

Implications of Type II Sites for Putative Residue Interactions with CAST4 and CAPNS1

We next used protein homology modeling to examine if type II residue replacements could modulate the interaction of calpains with other proteins. As a template, we used a 2.4-Å resolution crystal structure of the rat CAPN2–CAPNS1 complex (i.e., m-CAPN) bound to an inhibitory domain of CAST (CAST4) in the presence of Ca2+ (Hanna et al. 2008). We produced homology models from this structure for CAPN2, as well as other calpains known to bind CAST, including CAPN1 (Goll et al. 2003) and CAPN11 of chicken (Wolfe et al. 1989), plus for eutherian CAPN11, where it is unknown if a physical interaction with CAST exists. In this regard, CAST is expressed in primate spermatozoa (Rojas et al. 1999; Yudin et al. 2006), suggesting that it should be physically proximal to eutherian CAPN11. CAST is formed of four inhibitory domains, each split into regions A, B, and C (Goll et al. 2003). Each inhibitory domain binds to calpain DIV through region A, to CAPNS1 via region C, and to the remaining large subunit via region B, where the protease core active site is blocked while proteolysis is avoided by a looping mechanism that bypasses the active-site cysteine (Hanna et al. 2008) (e.g., fig. 6A). The control model, which used rat CAPN2, CAPNS1, and CAST4 sequences (fig. 6A), was visually identical to the published structure (Hanna et al. 2008) and all key stated residue–residue interactions between CAPN2 and CAST4 were predicted (not shown). We mapped to this model type II sites where biochemical constraints were conserved in CAPN2 and pretherian CAPN11 but had radically shifted in the eutherian ancestor (fig. 6A).
Of 20 such type II sites, 13 were positioned away from the interface with CAPNS1 and CAST4, 6 were found at the interface with region B of CAST4 (sites Q290, R337, D425, T456, R461, and T464 in the full-length protein, accession number: 3BOW_A), interacting with one to three CAST4 residues, and 1 site (I516) was located at the CAPN2–CAPNS1 interface, interacting with three CAPNS1 residues (fig. 6B). At equivalent type II sites in full-length rat CAPN1 (accession number: NP_062025), the same residue–residue interactions were conserved with no new interactions predicted (fig. 6C). In chicken CAPN11 (accession number: NP_990634), identical residue–residue interactions were conserved at six of these seven type II sites and no new interactions were predicted (fig. 6D). The only difference was that at Q293 of the full-length sequence, which is equivalent to Q290 of CAPN2, a single additional residue interaction with CAST4 was predicted (fig. 6D). Strikingly, in CAPN11 of the eutherian ancestor and rat (accession number: NP_001002806), only one of these six type II sites had a conserved interaction with CAST4 compared with CAPN2 (fig. 6E and F). At the other type II sites, interactions with CAST4 residues were either lost (e.g., L347/L371 and F474/F499 of rat/ancestral eutherian CAPN11, respectively equivalent to R337 and R461 of CAPN2) or modified (e.g., R300/R324 and I477/I502 of rat/ancestral eutherian CAPN11, respectively equivalent to Q290 and T464 of CAPN2) (fig. 6E and F). Furthermore, one type II residue replacement in CAPN11 of rat and the eutherian ancestor (W430 and W454, respectively) led to an interaction with CAST4 absent at the equivalent type II site of CAPN2 (R416) and pretherian CAPN11 (R419). The single type II residue interaction with CAPNS1 conserved in CAPN2 (I516), CAPN1 (I527), and chicken CAPN11 (I519) was absent at the equivalent site in eutherian CAPN11 (N554; fig. 6E) and modified in rat CAPN11 (N529; fig. 6F). These results suggest that type II replacements fixed in CAPN11 of eutherians alter residue–residue interactions at the interface with CAST4 (and to a smaller extent CAPNS1) that have remained conserved between pretherian orthologues and CAPN2. Because these differences in interaction dynamics are mainly conserved in CAPN11 of both rat and the eutherian ancestor (fig. 6E and F), it is likely that many of the original type II fixations in the eutherian stem have not since been markedly affected or buffered by subsequent changes in the protein in individual eutherian lineages. It should also be noted that all interactions between CAST4 and the protease core residues (S105, H262, and N286 in full-length rat CAPN2) and other critical flanking residues of the active-site cleft, for example, W288 (Hanna et al. 2008), were conserved across the homology models (not shown).

FIG. 5. Conservation and distribution of identified type II sites in eutherian CAPN11 and pretherian CAPN11, CAPN1, and CAPN2. Sites receiving the highest possible PRS in at least two DIVERGE 2.0 comparisons with eutherian CAPN11 are marked with stars. For comparison, logos are shown for amphibian and platypus CAPN11, plus vertebrate CAPN3 and 8, which were not included in the analysis. Residues are color coded by biochemical property and heights represent their relative frequency at each site. The distribution of conserved type II sites is shown along the domain structure of human CAPN11, where DIIa, DIIb, DIII, and DIV are respectively boxed in green, yellow, red, and blue. The location of catalytic residues (C, H, and N) and the EF hand motifs (black vertical lines) are shown. The number of species per logo is indicated.

FIG. 6. (A) 3D surface representation of a protein homology model for rat CAPN2, bound to CAPNS1 and CAST4 (after Hanna et al. 2008).
DIIa, DIIb, DIII, and DIV are shaded by the same color scheme as figure 5, DVI of CAPNS1 is shaded orange, and CAST4 regions are labeled with arrows. (B) The same model as part A in cartoon form with the CAPN2 large subunit shaded entirely gray. Blue spheres show type II residues with no interaction with CAPNS1 or CAST4. Red spheres mark type II residues that interact with one to three residues in CAST4 or CAPNS1. (C–F) Homology models for other calpain family members (indicated above each structure) in the style of part B. Blue and red spheres show conserved residue–residue interactions relative to those observed in rat CAPN2, except when respectively labeled with blue or red arrowheads, which show new or lost residue interactions. Cyan spheres show type II residues where an interaction occurs with a CAST4 residue that does not occur at the CAPN2–CAST4 interface, but other residue–residue interactions are conserved. Pink spheres show type II replacements where an interaction with a CAST4 residue found at the CAPN2–CAST4 interface was lost, but other residue–residue interactions are conserved.

FIG. 7. (A) Sequence alignment of putative nonfunctional CAPN11 proteins of marsupials with functional eutherian and pretherian orthologues, plus CAPN1 and 2 of wallaby/chicken. Residues shaded in blue or green are positioned in different exons and those shaded red span exon boundaries. Letters shaded in bold or underlined black font respectively identify residue replacements in marsupial or eutherian CAPN11 that deviate from the state conserved in pretherian CAPN11. A stop codon in wallaby CAPN11 is marked as a black asterisk. Indels are shown as a dash. A type II site in CAPN11 is marked with a 2. (B) Example DNA sequencing trace chromatograms (obtained from the Ensembl trace server) demonstrating the high quality of nucleotide bases underlying the aligned protein sequence for opossum CAPN11. Amino acids are shown above codons and arrowheads mark predicted donor and acceptor sites in introns at exon boundaries. (C) As for (B), except for tammar wallaby.

Evidence for Loss of a Functional CAPN11 Gene in the Marsupial Lineage

A CAPN11 gene is absent from the opossum genome, which harbors predicted genes for all other expected family members, that is, CAPN1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, and 15. We submitted the region of chromosome 2 between the TMEM63B and SLC29A1 genes, which flank the 5′ and 3′ ends of CAPN11 in all tetrapod genomes (fig. 1 and supplementary fig. S3, Supplementary Material online), to GENSCAN (Burge and Karlin 1997), returning an open-reading frame (ORF) of 268 amino acids (804 bp, derived from eight exons), which was BlastP screened against the nonredundant NCBI database, returning statistically strongest hits to CAPN11 sequences. We initially checked that the nucleotides underlying the GENSCAN prediction were not due to sequencing errors. Specifically, the 30,000-bp genomic DNA region encompassing the predicted ORF was broken into 800-bp segments, which were individually screened against the Ensembl trace server. In each case, multiple (>4) overlapping high-quality traces were retrieved (examples in fig. 7). Therefore, we are confident of the accuracy of the sequence underlying the GENSCAN prediction. Sequence alignment revealed that the opossum ORF was positioned toward the C-terminus of functional CAPN11 proteins and contained several large deletions (not shown). The CAPN11 ORF (minus indels) shared more sequence identity with CAPN11 sequences (>45% vs. platypus or chicken CAPN11) than with the next most closely related typical calpains (e.g., ~35% vs. marsupial CAPN1 or 2), but at a markedly lower level than in typical orthologue comparisons (e.g., ~80% in platypus vs. chicken CAPN11).
The reduced sequence identity was due to frequent residue replacements at sites conserved across typical calpains (e.g., fig. 7).

The tammar wallaby has a 2-fold coverage genome sequence, within which a putative CAPN11 orthologue was identified (i.e., the sequence that was not transcribed across tissues; fig. 4). Specifically, a gene prediction (Ensembl ID: ENSMEUG00000013426) was identified by Blast screening. This gene has been named as CAPN12, but this is a clear misannotation, which can be shown by simple sequence alignment (see below). ENSMEUG00000013426 has a predicted cDNA distinct from a repertoire of other calpain genes (including CAPN1, 2, 3, 5, 6, 7, 8, 9, 10, 13, 14, and 15) but contains sections of unknown nucleotides, coding 355 amino acids in total. We again used the Ensembl trace server to confirm the quality of the ~23,000-bp genomic sequence from which this gene was predicted. The entire region as it appears in Ensembl was covered by overlapping high-quality traces, although several regions were covered by a single trace and the missing sequence information was due to a lack of coverage. This approach identified the presence of a stop codon that is skipped by the Ensembl gene prediction (see fig. 7). Like the opossum ORF, the wallaby sequence returned strongest Blast hits to CAPN11 sequences and spanned the C-terminus of the protein. Similarly, by manual sequence alignment, large deletions were observed (not shown). In regions where indels were removed, the sequence shared greater sequence identity with CAPN11 (e.g., 54% vs. platypus) than with the next most related calpain family members (e.g., 46% and 39% respective identity with wallaby CAPN1 and 2) and less identity with other family members, including CAPN12 (e.g., 32% identity with opossum CAPN12, which shares 75% identity with human CAPN12).
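The identity comparisons above were made after removing indels. As a rough illustration of that kind of calculation, the Python sketch below computes percent identity over gap-free columns of a pairwise alignment. It rests on assumptions: the text does not state the exact denominator used, the function name percent_identity is hypothetical, and the two toy sequences are invented for the example and are not calpain sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes seq_a and seq_b are rows of the same alignment (equal length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip indel columns, as in a 'minus indels' comparison
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments, invented for illustration only.
toy_a = "MKT-LLVAG"
toy_b = "MRTQLL-AG"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity over gap-free columns")
```

In this toy case, seven columns are gap free and six of them match. Other conventions (for example, dividing by the length of the shorter sequence) would give different percentages, so values are only comparable when one convention is applied throughout.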
The reduced sequence identity was again due to frequent residue replacements at sites conserved across typical calpains (e.g., see fig. 7). We could not identify marsupial CAPN11 sequences in NCBI EST or nonredundant protein databases for marsupials using CAPN11 orthologues as in silico probes. These degraded sequence predictions for opossum and wallaby suggest that a functional CAPN11 product is absent in both marsupial species.

Comparison of CAPN11 in Marsupials, Pretherians, and Eutherians

Short regions of putative CAPN11 proteins of opossum (129 amino acids) and wallaby (111 amino acids) were conserved enough to allow confident sequence alignment versus functional CAPN11 orthologues, plus marsupial CAPN1 and 2 (fig. 7). The intron–exon structures of the marsupial sequences are conserved with other typical calpains (fig. 7). Unsurprisingly, more residue replacements are present in the marsupial sequences than in functional calpains (fig. 7). Interestingly, many nonsynonymous changes present in eutherian CAPN11 occur at sites where replacements have also occurred in marsupials but are constrained across pretherian orthologues (fig. 7). These include a class of sites that are constrained as the same residue in CAPN11 of included pretherians but variable with respect to amino acid property in included eutherians (fig. 7). Many such eutherian sites have replacements in one or both marsupial sequences that also deviate from the conserved pretherian state (fig. 7). These sites may represent those where functional constraints were lost in both marsupials and eutherians compared with the pretherian state. There is also a site class constrained as the same residue in pretherian CAPN11 but fixed as a distinct residue in the included eutherians (fig. 7). Interestingly, at all such sites, nonsynonymous changes are present in one or both marsupial sequences that also deviate from the pretherian state (fig. 7).
These sites may represent those where functional constraints were initially relaxed in both marsupials and eutherians but strong constraints subsequently returned in eutherians alone. There are also some sites where the same amino acid is conserved in eutherians and one or both marsupials (fig. 7), suggesting that they are synapomorphies. One such site was a type II residue (fig. 7).

Discussion

CAPN11 is a Vertebrate-Wide Family Member Directly Ancestral to CAPN1 and 2

Our phylogenetic trees supported a basal position for the newly recognized CAPN11 clade relative to CAPN1 and 2/8 with respect to the CAPN3/9 outgroup (fig. 2). It was previously suggested that CAPN1, 2, and 8 genes arose following two rounds of genomic duplication in the vertebrate stem of chordates, initially a tetraploidisation or large-scale event leading to CAPN1 and an ancestor gene to CAPN2/8, followed by a tandem duplication leading to separate CAPN2 and 8 genes (Jékely and Friedrich 1999). Considering its phylogenetic position, CAPN11 is a strong candidate for being the progenitor sequence from which CAPN1 and the ancestor gene to CAPN2/8 initially arose. Therefore, future studies of CAPN11 may provide interesting clues into the biochemical and functional properties of its highly studied paralogues, CAPN1 and 2.

CAPN11 Diverged Functionally in a Eutherian Ancestor and is Likely Subject to Reduced Selective Constraints Relative to Pretherian Orthologues

This work indicates that eutherian CAPN11 performs a more restricted physiological role than its pretherian orthologues. For example, the shift in regulation to testis (fig. 4) and specifically spermatozoa (Ben-Aharon et al. 2006) would have provided a dramatically different cellular arena in which to function. The restriction of CAPN11 protein to the testis may also explain the ~3-fold elevated dN/dS ratios estimated across eutherians compared with the pretherian clade (table 1).
It has been shown by Duret and Mouchiroud (2000) and Jordan et al. (2005) that the protein products of mouse and human orthologues expressed in a wide range of tissues evolve under greater functional constraints than those expressed in few tissues. For broadly expressed calpains, it would be expected that among their numerous substrates (as well as among proteins with which they interact but do not proteolyse), some proportion would never be localized in testis or sperm. Therefore, eutherian CAPN11 likely lost a plethora of interactions conserved in widely expressed pretherian orthologues. Accordingly, some sites involved in the underlying interactions would be expected to be subject to a lower level of purifying selection, with nonsynonymous replacements being neutral that would be strongly deleterious in pretherian orthologues.

In addition to the remarkable shift in gene regulation, several type II replacements fixed in the eutherian stem (fig. 5) modify or ablate certain interactions with CAST4 that are conserved in pretherian orthologues and CAPN1/2 paralogues (fig. 6). However, no type II sites or other nonsynonymous changes in eutherian CAPN11 proteins modified any critical interactions of CAST4 with the protease core and active-site cleft of CAPN11. Instead, the type II sites fall in regions that may affect the overall stability of the interaction. Without experimental validation, it is unclear if changes in CAPN11–CAST interactions will together relax the sum interaction between these proteins and thus the availability of CAPN11 to its potential substrates. One intriguing possibility is that with its restricted expression pattern, eutherian CAPN11 requires less tight control at the protein level than its pretherian orthologues and CAPN1/2.
What Underlies the Episodic Evolution of CAPN11?

The point in evolutionary time where CAPN11 became transcriptionally restricted to testis and where coding level dN/dS estimates were first elevated encompasses the split of marsupials and eutherians. Considering this fact and that the marsupial gene is no longer expressed (fig. 4) and has accumulated coding level changes that likely render it nonfunctional (fig. 7), it is difficult to rule out that a common mechanism that caused a loss of constraint in a common eutherian–marsupial ancestor underlies the regulatory and coding level changes to the functional eutherian gene. We find nothing in the literature proposing a population mechanism to explain how orthologues of a widely expressed gene could become transcriptionally disabled in one lineage (i.e., marsupials) and tissue restricted in its sister group (i.e., the eutherians). The sum spatial and temporal expression of a typical vertebrate gene is governed by the interactions of the transcription machinery with its promoter, plus numerous enhancers, silencers, and insulators found at multiple spatially distinct locations in the proximal genome (Levine and Tjian 2003). Due to this overriding complexity, we suggest that the shift from a broad to tissue-specific expression pattern could not be mediated by positive selection alone. Our reasoning is that the adaptive phenotype under selection, in this case testis-restricted transcription, would likely require a number of regulatory elements, such as tissue-specific enhancers, to be cumulatively modified, meaning no single substitution could reach the required endpoint. If multiple substitutions were required to reach testis restriction in the eutherian ancestor, it is unlikely that each along the chain would be adaptive, particularly considering that the gene was seemingly under purifying selection to retain a wide expression breadth until this point in evolutionary time (fig. 4). Thus, we suggest that a strong relaxation in purifying selection is required to account for the number of changes in regulatory regions required for such a dramatic change in gene transcription.

In short regions of marsupial CAPN11 proteins that were available for comparison, many nonsynonymous changes present in eutherian CAPN11 that deviate from the pretherian state were observed at sites where constraints were also altered in opossum and/or wallaby (fig. 7). Furthermore, several putative synapomorphic sites, including one type II site, are shared by marsupial and eutherian CAPN11 (fig. 7), suggesting that some nonsynonymous changes present in CAPN11 of the eutherian ancestor occurred in a common eutherian–marsupial ancestor after the split from monotremes. These results suggest that a loss of constraint was common to many sites shared by marsupial and eutherian CAPN11 relative to pretherian orthologues. This points to a relaxation in purifying selection on the accompanying residues in a common therian ancestor being a driving force in the rapid evolution of CAPN11 in the eutherian ancestor.

A Model Akin to the Dykhuizen–Hartl Effect is Consistent with the Evolution of CAPN11

Based on the above arguments, we feel it is impossible to exclude that a loss of constraints accounts for the rapid evolution of CAPN11 at the base of therians and its subsequent functional restriction in eutherians and disablement in marsupials. Relaxation of functional constraint is common in one of two paralogues after genomic duplication due to redundancy in function (Zhang 2003).
In the case of CAPN11, it seems unlikely that functional redundancy with other calpain family members could account for loss of constraints following the therian–monotreme split, considering that at this point in time, all vertebrate-wide calpain family members would have been separated by a minimum of ~230–260 million years of evolution (estimate from Benton and Donoghue 2007) and would have diverged markedly in protein sequence and presumably certain functions. Under the neutral theory of evolution (Kimura 1983), the strength of natural selection is intimately associated with population size, meaning purifying selection becomes increasingly inefficient under smaller effective population sizes, whereas genetic drift is more prevalent. Effective population size can be reduced via population bottlenecks, which are thought to be common during speciation (Mayr 1963), a proposal that has been supported experimentally (Guo et al. 2009). It is possible that some event reducing effective population size affected the CAPN11 locus of a common marsupial–eutherian ancestor and, consequently, mutations were fixed in regulatory and coding regions due to inefficient purifying selection/increased drift, leading to a reduction in transcript expression breadth and the incorporation of many nonsynonymous replacements in the protein that deviated from the pretherian state. Kimura (1983) proposed the Dykhuizen–Hartl effect (after Dykhuizen and Hartl 1980) as a mechanism whereby neutral mutations fixed by drift become subsequently adaptive in an altered biochemical environment. A model of this type can explain how CAPN11 gained a distinct role in eutherians compared with pretherians and how the gene was lost in marsupials.
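The dependence of purifying selection on effective population size invoked above can be illustrated numerically. The following is a minimal sketch only and is not part of the analyses in this study. It uses Kimura's diffusion approximation for the fixation probability of a single new mutation, assuming a diploid population whose census size equals its effective size and additive selection. The function names and the selection coefficient s = -0.001 are hypothetical choices made for illustration.

```python
import math


def fixation_probability(s, ne):
    """Fixation probability of one new mutant copy (Kimura's diffusion result).

    Assumptions (illustrative): diploid population, census size equal to the
    effective size `ne`, additive selection coefficient `s`
    (negative values are deleterious).
    """
    if s == 0:
        # Neutral case: equals the initial frequency 1 / (2 * Ne).
        return 1.0 / (2.0 * ne)
    a = 2.0 * s
    b = 4.0 * ne * s
    if s > 0:
        # (1 - e^-a) / (1 - e^-b), written with expm1 for accuracy.
        return math.expm1(-a) / math.expm1(-b)
    # s < 0: the same expression multiplied through by e^b, so that a large
    # |b| underflows towards zero instead of overflowing.
    return math.expm1(-a) * math.exp(b) / (-math.expm1(b))


def relative_to_neutral(s, ne):
    """Fixation probability expressed relative to a neutral mutation."""
    return fixation_probability(s, ne) * 2.0 * ne


if __name__ == "__main__":
    s = -0.001  # hypothetical slightly deleterious mutation
    for ne in (100, 1_000, 10_000, 100_000):
        print(ne, relative_to_neutral(s, ne))
```

Under these assumptions, a slightly deleterious mutation fixes at close to the neutral rate when the effective population size is small and is almost never fixed when it is large. This is the sense in which a bottleneck would make purifying selection inefficient.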
There is evidence that several facets of eutherian sperm physiology are distinct from those of marsupials, including that the oocyte zona pellucida (a membrane the spermatozoa must penetrate) is markedly thicker and more resistant to enzymatic digestion (Bedford 1998). Furthermore, the manner by which spermatozoa bind to this membrane in eutherians differs from that in marsupials (Bedford 1998). This binding is Ca2+ dependent (Yanagimachi 1978), and critical membrane fusion events between the spermatozoa and oocyte are likely modulated by the calpain system (Rojas et al. 1999). Therefore, we suggest that after the split of the common eutherian–marsupial ancestor, some lineage-specific facet of sperm physiology arose in eutherians, providing an adaptive advantage for the "evolved" CAPN11 gene in testis, possibly in relation to Ca2+-dependent membrane fusion. It is possible that such an adaptive response involved positive selection for a testis-specific enhancer in CAPN11 regulatory regions. Under this model, certain combinations of nonsynonymous replacements fixed in the coding sequence were also selectively advantageous in the testis-restricted biochemical environment and, with a return to a larger effective population size, became subject to strong ongoing purifying selection, leading to the observed type II sites. Conversely, in marsupials, no adaptive advantage was ever realized and the accumulation of changes to regulatory and coding regions meant the CAPN11 gene became subject to weak or even absent ongoing purifying selection, eventually leading to its functional disablement.
Supplementary Material
Supplementary table S1, figures S1–S3, and file S1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org).
Acknowledgments
Dr Lara Meischke (University of St Andrews) sequenced a teleost μ/m-CAPN, leading to the study's conception. We thank Dr Dave Ferrier (University of St Andrews) for his comments on an earlier manuscript draft.
The study was supported by grants from the Natural Environment Research Council (NE/E015212/1) and European Commission (contract 506359). D.J.M. and I.A.J. conceived the study. D.J.M. performed all experiments except PCRs in platypus and tammar wallaby, which were performed by M.L.D. S.M. contributed to the shared synteny analyses. D.J.M. prepared all figures and drafted the manuscript, which was edited to final form with significant input from I.A.J. and some input from M.L.D.
References
Abascal F, Zardoya R, Posada D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104–2105.
Bedford JM. 1998. Mammalian fertilization misread? Sperm penetration of the eutherian zona pellucida is unlikely to be a lytic event. Biol Reprod. 59:1275–1287.
Ben-Aharon I, Brown PR, Shalgi R, Eddy EM. 2006. Calpain-11 is unique to mouse spermatogenic cells. Mol Reprod Dev. 73:767–773.
Benkert P, Tosatto SCE, Schomburg D. 2008. QMEAN: a comprehensive scoring function for model quality assessment. Proteins 71:261–277.
Benton MJ, Donoghue PC. 2007. Paleontological evidence to date the tree of life. Mol Biol Evol. 24:26–53.
Burge C, Karlin S. 1997. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 268:78–94.
Burnham KP, Anderson DP. 2002. Model selection and multimodel inference. New York: Springer-Verlag.
Castresana J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 17:540–552.
Colovos C, Yeates TO. 1993. Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci. 2:1511–1519.
Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Res. 14:1188–1190.
Dear TN, Boehm T. 1999. Diverse mRNA expression patterns of the mouse calpain genes Capn5, Capn6 and Capn11 during development. Mech Dev. 89:201–209.
Dear TN, Möller A, Boehm T. 1999. CAPN11: a calpain with high mRNA levels in testis and located on chromosome 6. Genomics 59:243–247.
Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 7:214.
Duret L, Mouchiroud D. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol. 17:68–74.
Dutt P, Croall DE, Arthur JS, Veyra TD, Williams K, Elce JS, Greer PA. 2006. m-Calpain is required for preimplantation embryonic development in mice. BMC Dev Biol. 6:3.
Dykhuizen D, Hartl DL. 1980. Selective neutrality of 6PGD allozymes in E. coli and the effects of genetic background. Genetics 96:801–817.
Farkas A, Tompa P, Friedrich P. 2003. Revisiting ubiquity and tissue specificity of human calpains. Biol Chem. 384:945–949.
Fay JC, Wu CI. 2003. Sequence divergence, functional constraint, and selection in protein evolution. Annu Rev Genomics Hum Genet. 4:213–235.
Goll DE, Thompson VF, Li H, Wei W, Cong J. 2003. The calpain system. Physiol Rev. 83:731–801.
Gribaldo S, Casane D, Lopez P, Philippe H. 2003. Functional divergence prediction from evolutionary analysis: a case study of vertebrate hemoglobin. Mol Biol Evol. 20:1754–1759.
Gu X. 1999. Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol. 16:1664–1674.
Gu X. 2006. A simple statistical method for estimating type-II (cluster-specific) functional divergence of protein sequences. Mol Biol Evol. 23:1937–1945.
Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biology. 52:696–704.
Guo YL, Bechsgaard JS, Slotte T, Neuffer B, Lascoux M, Weigel D, Schierup MH. 2009. Recent speciation of Capsella rubella from Capsella grandiflora, associated with loss of self-incompatibility and an extreme bottleneck. Proc Natl Acad Sci U S A. 106:5246–5251.
Hanna RA, Campbell RL, Davies PL. 2008. Calcium-bound structure of calpain and its mechanism of inhibition by calpastatin. Nature 456:409–412.
Hata S, Doi N, Kitamura F, Sorimachi H. 2007. Stomach-specific calpain, nCL-2/calpain 8, is active without calpain regulatory subunit and oligomerizes through C2-like domains. J Biol Chem. 282:27847–27856.
Hughes AL, Nei M. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167–170.
Jékely G, Friedrich P. 1999. The evolution of the calpain family as reflected in paralogous chromosome regions. J Mol Evol. 49:272–281.
Jordan IK, Mariño-Ramírez L, Koonin EV. 2005. Evolutionary significance of gene expression divergence. Gene 345:119–126.
Katoh K, Toh H. 2008. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 9:286–298.
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge: Cambridge University Press.
Kittichotirat W, Guerquin M, Bumgarner RE, Samudrala R. 2009. Protinfo PPC: a web server for atomic level prediction of protein complexes. Nucleic Acids Res. 37:W519–W525.
Kosakovsky Pond SL, Frost SD. 2005a. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676–679.
Kosakovsky Pond SL, Frost SD. 2005b. Datamonkey: rapid detection of selective pressure on individual sites of codon alignment. Bioinformatics 21:2531–2533.
Kosakovsky Pond SL, Poon AFY, Frost SD. 2009. Estimating selection pressures on alignments of coding sequences. In: Lemey P, Salemi M, Vandamme A, editors. The phylogenetic handbook. Cambridge: Cambridge University Press. p. 419–490.
Lee HL, Santé-Lhoutellier V, Vigouroux S, Briand Y, Briand M. 2007. Calpain specificity and expression in chicken tissues. Comp Biochem Physiol B Biochem Mol Biol. 146:88–93.
Levine M, Tjian R. 2003. Transcription regulation and animal diversity. Nature 424:147–151.
Lopez P, Casane D, Philippe H. 2002. Heterotachy, an important process of protein evolution. Mol Biol Evol. 19:1–7.
Macqueen DJ, Meischke L, Manthri S, Anwar A, Solberg C, Johnston IA. 2010. Characterisation of capn1, capn2-like, capn3 and capn11 genes in Atlantic halibut (Hippoglossus hippoglossus L.): transcriptional regulation across tissues and in skeletal muscle at distinct nutritional states. Gene 453:45–58.
Mayr E. 1963. Animal species and evolution. Cambridge: Harvard University Press.
Murakami T, Ueda M, Hamakubo T, Murachi T. 1988. Identification of both calpains I and II in nucleated chicken erythrocytes. J Biochem. 103:168–171.
Porollo A, Meller J. 2007. Versatile annotation and publication quality visualization of protein complexes using POLYVIEW-3D. BMC Bioinformatics. 8:316.
Pybus OG, Shapiro B. 2009. Natural selection and adaptation of molecular sequences. In: Lemey P, Salemi M, Vandamme A, editors. The phylogenetic handbook. Cambridge: Cambridge University Press. p. 407–418.
Rojas FJ, Brush M, Moretti-Rojas I. 1999. Calpain-calpastatin: a novel complete calcium-dependent protease system in human spermatozoa. Mol Hum Reprod. 5:520–526.
Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.
Saez ME, Ramirez-Lorca R, Moron FJ, Ruiz A. 2006. The therapeutic potential of the calpain family: new aspects. Drug Discov Today. 11:917–923.
Sorimachi H, Ishiura S, Suzuki K. 1997. Structure and physiological function of calpains. Biochem J. 328:721–732.
Sorimachi H, Suzuki K. 2001. The structure of calpain. J Biochem. 129:653–664.
Sorimachi H, Tsukahara T, Okada-Ban M, Sugita H, Ishiura S, Suzuki K. 1995. Identification of a third ubiquitous calpain species-chicken muscle expresses four distinct calpains. Biochim Biophys Acta. 1261:381–393.
Suyama M, Torrents D, Bork P. 2006. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34:W609–W612.
Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol. 24:1596–1599.
Wallner B, Elofsson A. 2005. Identification of correct regions in protein models using structural, alignment and consensus information. Protein Sci. 15:900–913.
Wolfe FH, Sathe SK, Goll DE, Kleese WC, Edmunds T, Duperret SM. 1989. Chicken skeletal muscle has three Ca2+-dependent proteinases. Biochim Biophys Acta. 998:236–250.
Yanagimachi R. 1978. Calcium requirement for sperm–egg fusion in mammals. Biol Reprod. 19:949–958.
Yudin AI, Goldberg E, Robertson KR, Overstreet JW. 2006. Calpain and calpastatin are located between the plasma membrane and outer acrosomal membrane of cynomolgus macaque spermatozoa. J Androl. 21:721–729.
Zhang J. 2003. Evolution by gene duplication: an update. Trends Ecol Evol. 18:292–298.