Gene 343 (2004) 357–366
www.elsevier.com/locate/gene

Phylogenetic relationships of discoglossid frogs (Amphibia:Anura:Discoglossidae) based on complete mitochondrial genomes and nuclear genes

Diego San Mauro*, Mario García-París, Rafael Zardoya

Departamento de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales, CSIC, José Gutiérrez Abascal, 2. E-28006 Madrid, Spain

Received 23 April 2004; received in revised form 30 July 2004; accepted 5 October 2004
Available online 11 November 2004
Received by G. Pesole

Abstract

The complete nucleotide sequence of the mitochondrial (mt) genome was determined for three species of discoglossid frogs (Amphibia:Anura:Discoglossidae), representing three of the four recognized genera: Alytes obstetricans, Bombina orientalis, and Discoglossus galganoi. The organization and size of these newly determined mt genomes are similar to those previously reported for other vertebrates. Phylogenetic analyses (maximum likelihood, Bayesian inference, minimum evolution, and maximum parsimony) of mt protein-coding genes at the amino acid level were performed in combination with already published mt genome sequence data of three species of Neobatrachia, one of Pipoidea, and four of Caudata. Phylogenetic analyses based on the deduced amino acid sequences of all mt protein-coding genes arrived at the same topology. The monophyly of Discoglossidae is strongly supported. Within the Discoglossidae, Alytes is consistently recovered as sister group of Discoglossus, to the exclusion of Bombina. The three species representing Neobatrachia exhibited extremely long branches irrespective of the phylogenetic inference method used, and hence their relative position with respect to Discoglossidae and Xenopus may be artefactual due to a severe long branch attraction effect.
To further investigate the phylogenetic intrarelationships of discoglossids, nucleotide sequences of four nuclear protein-coding genes (CXCR4, RAG1, RAG2, and Rhodopsin) with sequences available for the three discoglossid genera and Xenopus were retrieved from GenBank and, together with a concatenated nucleotide sequence data set containing all mt protein-coding genes except ND6, were subjected to separate and combined phylogenetic analyses. In all cases, a sister group relationship between Alytes and Discoglossus was recovered with high statistical support.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Alytes; Bombina; Discoglossus; CXCR4; RAG1; RAG2; Rhodopsin

Abbreviations: ATP6 and ATP8, ATP synthase F0 subunits 6 and 8; CI, consistency index; COX1-3, cytochrome c oxidase subunits I–III; CXCR4, chemokine (C-X-C motif) receptor 4; H-strand, heavy strand; L-strand, light strand; mt, mitochondrial; ND1-6, NADH dehydrogenase subunits 1–6; ORF, open reading frame; PCR, polymerase chain reaction; rRNA, ribosomal ribonucleic acid; RAG1 and RAG2, recombination activating genes 1 and 2; tRNA, transfer ribonucleic acid.

* Corresponding author. Tel.: +34 91 4111328; fax: +34 91 5645078. E-mail address: [email protected] (D. San Mauro).

0378-1119/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2004.10.001

1. Introduction

Discoglossids (Amphibia:Anura:Discoglossidae) are medium-sized frogs with a characteristic disc-shaped tongue that show either a stocky or elongated body, and a warty or smooth skin. Furthermore, they exhibit considerably diverse life histories, from largely aquatic to terrestrial burrowers (Duellman and Trueb, 1994). Discoglossids are among the oldest living frog lineages, dating back at least to the Jurassic (Sanchiz, 1998). Living and fossil discoglossids are strictly distributed within the Palearctic Region, which supports a Laurasian origin of the lineage.
Living discoglossid frogs have generally been grouped into four genera (e.g., Duellman, 1975; Laurent, 1979; Duellman and Trueb, 1994; Sanchiz, 1998): Alytes, including five species from Western Europe and Morocco (but see the debate on Baleaphryne and Ammoryctis; Arntzen and García-París, 1995); Discoglossus, comprising six (probably seven; Martínez-Solano, 2004) species from Western Europe, Northwestern Africa, Palestine and some Mediterranean islands; Bombina, including nine species from Europe and East and South East Asia; and Barbourula, comprising two species from Indonesia and the Philippines (AmphibiaWeb, November 2, 2004; http://www.amphibiaweb.org/). Together with other “primitive frogs” (Leiopelmatidae, Ascaphidae, Pipoidea, and Pelobatoidea), discoglossids were traditionally placed within Archaeobatrachia (e.g., Duellman, 1975; Laurent, 1979). However, based on morphological evidence, Archaeobatrachia is generally recovered as a paraphyletic group with respect to the remaining frogs, the Neobatrachia (e.g., Ford and Cannatella, 1993; Duellman and Trueb, 1994). Only some molecular studies, based on partial sequences of mitochondrial (mt) ribosomal genes, have supported the monophyly of the Archaeobatrachia (Hedges and Maxson, 1993; Hay et al., 1995). The debate about the monophyly of Archaeobatrachia has been associated with rooting problems, which are particularly relevant for the molecular data sets (García-París et al., 2003). Discoglossid frogs have been traditionally treated as a natural group (e.g., Duellman, 1975; Laurent, 1979; Duellman and Trueb, 1994; Sanchiz, 1998; Biju and Bossuyt, 2003; Pugener et al., 2003; Hertwig et al., 2004; Hoegg et al., 2004).
However, the antiquity of the different discoglossid lineages has prompted taxonomic disagreement over the number of families in which the four genera should be grouped: one (Discoglossidae) or two independent families (Discoglossidae and Bombinatoridae) (e.g., Lanza et al., 1975). Moreover, while Bombina and Barbourula have been consistently treated as sister taxa (Duellman, 1975; Laurent, 1979; Duellman and Trueb, 1994; Sanchiz, 1998), a long-lasting controversy persists over the relationships of the clade Bombina+Barbourula to the other discoglossid genera. Hypotheses supporting a sister taxon relationship between Alytes and Discoglossus to the exclusion of Bombina+Barbourula have dominated over all other arrangements (Duellman, 1975; Laurent, 1979; Ford and Cannatella, 1993; Duellman and Trueb, 1994; Sanchiz, 1998; Biju and Bossuyt, 2003; Pugener et al., 2003; Hoegg et al., 2004). However, sister taxon relationships between Alytes and Bombina (Erspamer et al., 1972; Lanza et al., 1975) and between Bombina and Discoglossus (Maxson and Szymura, 1984; Haas, 2003) have also been proposed based on immunological and morphological evidence. Some studies have even challenged the monophyly of the group based on morphological evidence (Ford and Cannatella, 1993). These authors found the Alytes+Discoglossus grouping more closely related to other frogs (the Pipanura, comprising pipoideans, pelobatoideans, and neobatrachians) than to the Bombina+Barbourula clade. They proposed the name Discoglossanura for the group including Alytes+Discoglossus and the Pipanura, whereas they used the name Bombinanura for the group comprising Bombina+Barbourula and the Discoglossanura.
To test between the competing hypotheses on the monophyly of discoglossids, and to investigate the phylogenetic relationships among discoglossid genera, we have determined the complete nucleotide sequence of the mt genomes of three discoglossids, each one representing a different genus, and compared them with previously described frog mt genomes. This mitogenomic approach follows several recent studies (e.g., Zardoya and Meyer, 1996) that demonstrated the need to establish high-level phylogenetic inferences based on rather large sequence data sets in order to achieve statistical confidence. Also recently, several studies (e.g., Groth and Barrowclough, 1999) have shown that some orthologous nuclear protein-coding genes outperform individual mt genes in reconstructing ancient phylogenies. Therefore, to gain insights into the discoglossid phylogeny from a nuclear perspective, we have also gathered sequences of nuclear protein-coding genes that have shown a good performance in recovering the phylogenetic relationships among divergent amphibian lineages (Biju and Bossuyt, 2003; Hoegg et al., 2004; San Mauro et al., 2004).

2. Materials and methods

2.1. Taxon sampling

The nucleotide sequence of the complete mt genome was determined for a single representative of each of the three most common discoglossid genera (voucher numbers from the Museo Nacional de Ciencias Naturales, Spain): Alytes obstetricans pertinax (MNCN/ADN 4313; collected in Tielmes, Spain), Bombina orientalis (MNCN/ADN 4314; pet trade), and Discoglossus galganoi (MNCN/ADN 4315; collected in Reliegos, Spain). The South East Asian genus Barbourula could not be included in the study, but it is confidently thought to be the sister group of Bombina, according to morphological and histological data (Sanchiz, 1998).
The new sequence data were compared with all available anuran complete mt genome sequences: Bufo melanostictus (NC_005794), Fejervarya limnocharis (NC_005055), Rana nigromaculata (NC_002805), and Xenopus laevis (NC_001573). The complete mt genomes of four salamanders, Ambystoma mexicanum (NC_005797), Andrias davidianus (NC_004926), Lyciasalamandra atifi (NC_002756), and Ranodon sibiricus (NC_004021), were used as outgroups. To further investigate the phylogenetic relationships among discoglossids, we screened the GenBank database for nuclear protein-coding genes available in at least one species of each of the three discoglossid genera (Alytes, Bombina, and Discoglossus). Sequence information for the selected nuclear genes was not available for most of the other frog and salamander genera employed in the mitochondrial analysis, so we used only Xenopus as outgroup, and addressed this second approach from a four-taxon-case perspective. The retrieved nuclear sequences were: CXCR4 exon 2 (A. obstetricans, AY364170; B. orientalis, AY364177; Discoglossus pictus, AY364172; X. laevis, Y17895); RAG1 (A. obstetricans, AY583334; B. orientalis, AY583335; D. galganoi, AY583338; X. laevis, L19324); RAG2 (Alytes muletensis, AY323780; B. orientalis, AY323783; Discoglossus sardus, AY323785; X. laevis, L19325); and Rhodopsin exon 1 (A. obstetricans, AY364385; B. orientalis, AY364391; D. pictus, AY364387; X. laevis, S62229).

2.2. DNA extraction, PCR amplification, cloning and sequencing

Total DNA was purified following standard phenol/chloroform extraction procedures. Overlapping fragments that covered the entire mt genome were amplified by PCR using the same primers and conditions reported in San Mauro et al. (2004).
PCR products were purified by ethanol precipitation, and sequenced in an automated DNA sequencer (ABI PRISM 3700), using the BigDye Deoxy Terminator cycle-sequencing kit (Applied Biosystems) following manufacturer’s instructions. Short amplicons were directly sequenced using the corresponding PCR primers. Long amplicons were cloned into pGEM-T vectors (Promega), and recombinant plasmids were sequenced using the M13 (forward and reverse) universal primers and additional walking primers (available from the authors upon request). The sequences obtained averaged 700 base pairs (bp) in length, and each sequence overlapped the next contig by about 150 bp. In no case were differences in sequence observed between the overlapping regions. Complete mt genome nucleotide sequences reported in this paper have been deposited at the GenBank database under accession numbers AY585337 (A. obstetricans), AY585338 (B. orientalis), and AY585339 (D. galganoi).

2.3. Molecular and phylogenetic analyses

Sequence data were analyzed with MacClade version 4.05, and PAUP* version 4.0b10. To control for saturation in the different data sets, we plotted either pairwise transition and transversion differences (for nucleotide sequences) or mean character distances (for amino acid sequences) against corrected sequence divergence (measured as ML distances). Transitions of mt protein-coding gene nucleotide sequences were saturated (Fig. 1A), particularly in all pairwise comparisons involving B. melanostictus, F. limnocharis, and R. nigromaculata (i.e., Neobatrachia), and the outgroups. Hence, we analyzed the mt protein-coding gene sequence data at the amino acid level, which showed no saturation (Fig. 1B). Deduced amino acid sequences of mt protein-coding genes were aligned using CLUSTAL X version 1.83 and revised by eye in order to maximize homology of position. Ambiguous alignments and gaps were excluded from the analyses using GBLOCKS version 0.91b with default parameters.
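The transition and transversion counts that feed a saturation plot such as Fig. 1A can be sketched in a few lines. The snippet below is only an illustration and is not the procedure run in PAUP*: the function names and the toy sequences are hypothetical, and sites with gaps or ambiguity codes are simply skipped, which is an assumption rather than a documented setting of the original analysis.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ti_tv(seq_a, seq_b):
    """Count transitions (Ti) and transversions (Tv) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Identical sites, gaps and ambiguity codes contribute nothing.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # A change within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1
        else:
            tv += 1
    return ti, tv


def pairwise_ti_tv(alignment):
    """Return {(name1, name2): (Ti, Tv)} for every pair of taxa in the alignment."""
    return {
        (n1, n2): count_ti_tv(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment, not data from this study.
toy = {"x": "AC", "y": "GC", "z": "AA"}
print(pairwise_ti_tv(toy))
```

Plotting these counts against a model-corrected distance for each pair would give the kind of diagram used to judge saturation: counts that stop increasing as the corrected distance grows indicate saturation.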
The deduced amino acid sequences of all 13 protein-coding genes encoded by each mt genome were combined into a single concatenated data set that was subjected to four phylogenetic analyses using: maximum parsimony (MP), minimum evolution (ME), maximum likelihood (ML), and Bayesian inference (BI). MP and ME (mean character distances) analyses were carried out with PAUP* using heuristic searches with 10 random stepwise additions of taxa and TBR branch swapping. Support for the resulting MP and ME trees was evaluated by non-parametric bootstrapping (BP) with 1000 pseudoreplicates. ML analyses were conducted with TREE-PUZZLE version 5.2 using the mtREV24 model with correction for among-site rate heterogeneity (G+I). This model was selected following Yang et al. (1998) and by performing Likelihood Ratio Tests (LRTs) comparing hierarchically the following alternative models: equal rates (eq.) versus gamma-distributed rates (G), versus proportion of invariant sites (I), versus gamma-distributed rates and proportion of invariant sites (G+I). Robustness of the resulting ML tree was evaluated by quartet puzzling (QP; 100,000 puzzling steps). BI analyses were performed with MrBayes version 3.0b4, simulating four simultaneous chains, for a million generations, sampling every 100 generations. Generations sampled before the chain reached stationarity (100,000), as judged by plots of ML scores, were discarded (“burn-in”). For this analysis, the mtREV24+G+I model was also selected. Statistical support for clades obtained by BI was measured by Bayesian posterior probability (BPP).
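The size of the posterior sample implied by these BI settings follows from simple arithmetic. The helper below is a hypothetical illustration (it is not part of MrBayes) and, as a simplifying assumption, ignores the starting state of the chain.

```python
def mcmc_sample_counts(generations, sample_every, burnin_generations):
    """Return (total samples, samples discarded as burn-in, samples retained)."""
    total = generations // sample_every
    discarded = burnin_generations // sample_every
    return total, discarded, total - discarded


# Settings quoted in the text: one million generations, sampled every 100,
# with the first 100,000 generations discarded as burn-in.
total, discarded, kept = mcmc_sample_counts(1_000_000, 100, 100_000)
print(total, discarded, kept)
```

Under these assumptions the posterior probabilities would rest on roughly 9000 retained samples out of 10,000.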
Because the sequences of the three species representing Neobatrachia were highly divergent, separate analyses using a more conservative alignment (by employing stringent parameter settings in GBLOCKS: minimum number of sequences for a conserved position: 9; minimum number of sequences for a flanking position: 11; maximum number of contiguous non-conserved positions: 1; minimum length of a block: 50) were also performed, using the same settings described above for each phylogenetic method. In the four-taxon approach, pairwise distance values of mt protein-coding genes among the three discoglossids and X. laevis were located in the linear part of the saturation plot (Fig. 1A), and neither transitions nor transversions were saturated. Similarly, nucleotide sequences of the four nuclear genes showed no saturation (not shown). Hence, all four-taxon data sets were analysed at the nucleotide level including all codon positions. For these data sets, inferred amino acid sequences were aligned as described above, gaps were excluded using GBLOCKS with default parameters, and the resulting alignments were then imposed onto the corresponding nucleotide sequences. Alignments are available from the authors upon request.

Fig. 1. Saturation plots of the mt concatenated datasets. (A) Plot of pairwise transitions (Ti) and transversions (Tv) against corrected sequence divergence (measured as ML distance) for the mt protein-coding genes at the nucleotide level. Dashed square indicates location of pairwise comparisons involving the three discoglossids and Xenopus. (B) Plot of uncorrected mean character distance against corrected divergence (measured as ML distance) for the mt protein-coding genes at the amino acid level.

The single four-taxon data sets (nucleotide sequences of each separate nuclear gene, a concatenated nucleotide sequence containing all four nuclear genes, and a concatenated nucleotide sequence containing all mt protein-coding genes except ND6) were subjected to MP, ME, ML and BI analyses, separately. MP, ME and ML analyses were carried out with PAUP*, whereas the BI analysis was conducted with MrBayes. MP and ML analyses were both performed using branch-and-bound searches with furthest addition sequence of taxa, whereas ME and BI analyses were both performed using the same settings mentioned above. The best-fit model of nucleotide substitution for the ME, ML, and BI analyses was selected using ModelTest version 3.5, following the Akaike Information Criterion (AIC). The selected models were: TVM+G, for CXCR4; GTR+I, for RAG1; TrN+I, for RAG2; HKY+G, for Rhodopsin; GTR+G, for the concatenated nuclear data set; and GTR+G+I, for the concatenated mt data set. MrBayes does not allow the TVM and TrN submodels, and hence the GTR was used for BI with the CXCR4 and RAG2 data sets. BPs were used to test the robustness of MP, ME, and ML trees (1000 pseudoreplicates). The reliability of the BI analyses was tested with BPPs. ML tree branch lengths were estimated to compare substitution rates among the different four-taxon data sets. Finally, all single nuclear gene four-taxon data sets as well as the concatenated mt four-taxon data set were combined into a joint data set, and submitted to MP, ME, ML, and BI methods of phylogenetic inference (using the same settings as for the separate analyses, see above). For ME and ML, a single GTR+G+I model of nucleotide substitution was selected (according to the AIC calculated using ModelTest). BI analysis was performed using the corresponding substitution model for each of the separate four-taxon data sets (see above), and model parameters were independently estimated for each partition (“unlink” option).
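Model selection under the AIC can be illustrated with a short sketch. The criterion itself is standard (AIC = 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log likelihood); everything else below is hypothetical, including the model labels, log likelihoods and parameter counts, none of which are values from this study or output of ModelTest.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates):
    """Pick the candidate with the lowest AIC.

    `candidates` maps a model label to (log likelihood, number of free parameters).
    Returns the best label and the full table of AIC scores.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical candidates: a simple model and a richer one.
candidates = {"model_simple": (-1010.0, 1), "model_rich": (-1000.0, 9)}
name, scores = best_model(candidates)
print(name, scores)
```

The richer model wins here because its gain in log likelihood outweighs the penalty for eight extra parameters; with a smaller gain the simpler model would be preferred.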
Approximately unbiased (AU), Shimodaira-Hasegawa (SH), and Kishino-Hasegawa (KH) tests were used to evaluate the three alternative unrooted trees for the combined four-taxon data set using CONSEL version 0.1f, with site-wise log-likelihoods of trees calculated by PAUP*. A total of one million scaled bootstrap replicates were used in order to get a small sampling error. Some recent studies (e.g., Goldman et al., 2000) have pointed out that inappropriate tree specification may bias non-parametric tests (especially KH, which requires the trees to be specified a priori; and SH, which requires the inclusion of all “reasonable” trees though it is unclear how this set can be selected). On this matter, Goldman et al. (2000) noted that selecting all possible trees will be a conservative solution to the problem, but this is impractical except for the smallest taxon samplings. The more recent AU test is less biased than other methods, but is also impractical when the number of trees to be compared is large. We conducted the AU, SH, and KH tests using the combined four-taxon data set because it is the one that gathers the largest and most comprehensive set of sequence characters, and because for four taxa there are only three alternative, fully resolved unrooted trees, making the selection of all possible trees practical.

3. Results and discussion

3.1. Mitochondrial genome organization and structural features

The complete nucleotide sequence of the L-strand of the mt genomes of the three discoglossids was determined. The total length of the new discoglossid mt genomes ranged from 17,014 to 17,847 bp (Table 1). All three mt genomes encoded two rRNAs, 22 tRNAs, and 13 protein-coding genes, and in all cases the organization conformed to the vertebrate consensus mt gene arrangement (Jameson et al., 2003) (Fig. 2A). Overall base compositions of the L-strand as well as gene lengths for each genome are shown in Table 1.
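Base composition figures of the kind reported in Table 1 are straightforward to compute from a sequence. The sketch below is illustrative only: the function names and the toy sequences are hypothetical, and characters other than A, C, G and T are ignored by assumption.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a nucleotide sequence."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous nucleotides")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def codon_position_composition(cds, position):
    """Base composition at one codon position (1, 2 or 3) of an in-frame coding sequence."""
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2 or 3")
    return base_composition(cds[position - 1::3])


# Hypothetical toy sequences, not data from this study.
print(base_composition("AACCGTTT"))
print(codon_position_composition("ATGGCA", 3))
```

Applying the second function to each protein-coding gene would show whether a skew in overall composition is concentrated at a particular codon position.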
As in most vertebrates, the overall base compositions are skewed against guanine in all three discoglossid mt genomes, which is due to a strong bias against the use of guanine at the third codon position. The mt 12S and 16S rRNA genes range from 933 to 949, and from 1583 to 1626 bp (Table 1), respectively. The 22 tRNA genes range in size from 65 to 75 bp. All tRNAs can be folded into typical cloverleaf secondary structures with the known exception of tRNASer(AGY). There is one case of tRNA sequence overlap on the same strand: tRNACys and tRNATyr share one nucleotide in D. galganoi. Protein-coding genes in the three discoglossid mt genomes begin with ATG as start codon, except COX1, which initiates with GTG (Table 1). Stop codons are variable among discoglossid taxa. Most ORFs have incomplete stop codons, either T or TA, which presumably become functional by subsequent polyadenylation of the respective mRNAs (Table 1).

Table 1
Main structural features of discoglossid mt genomes

| Feature | Discoglossus galganoi | Alytes obstetricans | Bombina orientalis |
| --- | --- | --- | --- |
| Total length | 17,014 | 17,490 | 17,847 |
| %A | 29 | 29 | 30 |
| %C | 27 | 29 | 27 |
| %G | 16 | 15 | 15 |
| %T | 28 | 27 | 28 |
| Control region | 1482 | 2035 | 2372 |
| OL | 28 | 30 | 29 |
| 12S rRNA | 949 | 937 | 933 |
| 16S rRNA | 1626 | 1583 | 1599 |
| Intergenic spacers | 37 | 35 | 42 |
| ATP6 | 683 (ATG/TA–) | 683 (ATG/TA–) | 684 (ATG/TAA) |
| ATP8 | 168 (ATG/TAA) | 168 (ATG/TAA) | 168 (ATG/TAA) |
| Cytochrome b | 1142 (ATG/TA–) | 1142 (ATG/TA–) | 1141 (ATG/T–) |
| COX1 | 1554 (GTG/TAA) | 1551 (GTG/TAA) | 1554 (GTG/TAA) |
| COX2 | 688 (ATG/T–) | 688 (ATG/T–) | 688 (ATG/T–) |
| COX3 | 784 (ATG/T–) | 784 (ATG/T–) | 784 (ATG/T–) |
| ND1 | 965 (ATG/TA–) | 963 (ATG/TAA) | 962 (ATG/TA–) |
| ND2 | 1045 (ATG/T–) | 1042 (ATG/T–) | 1045 (ATG/T–) |
| ND3 | 343 (ATG/T–) | 343 (ATA/T–) | 343 (ATG/T–) |
| ND4 | 1378 (ATG/T–) | 1378 (ATG/T–) | 1378 (ATG/T–) |
| ND4L | 297 (ATG/TAA) | 297 (ATG/TAA) | 297 (ATG/TAA) |
| ND5 | 1818 (ATG/TAA) | 1809 (ATG/TAA) | 1809 (ATG/TAA) |
| ND6 | 510 (ATG/AGA) | 510 (ATG/AGA) | 510 (ATG/AGA) |

For each, total length of the mt genome, overall base composition of the L-strand, length of the common non-coding regions, length of the ribosomal genes, length of all the intergenic spacers, and length of the protein-coding genes (showing start/stop codons within parentheses) are presented. Lengths are expressed as bp.

As in most vertebrates, the putative origin of L-strand replication (OL) of the discoglossid mt genomes was located within the WANCY tRNA cluster, between the tRNAAsn and tRNACys genes (Fig. 2A). In all discoglossids, the OL ranged from 28 to 30 bp (Table 1) and had the potential to fold into a stem-loop secondary structure, sharing some nucleotides with the flanking tRNACys (Fig. 2B). As described for other tetrapods, L-strand synthesis is probably initiated in a stretch of thymines in the OL loop (Fig. 2B). The 5′-GCCGG-3′ motif that in human mt DNA is involved in the transition from RNA synthesis to DNA synthesis is entirely conserved in all three discoglossids (Fig. 2B).

Fig. 2. (A) Gene organization for the mt genomes of the discoglossids. Genes encoded by the L-strand are underlined. (B) Proposed secondary structures for the origins of L-strand replication (OL) of the discoglossids. The 5′-GCCGG-3′ motif is indicated by a box. Lines show the nucleotides partially shared with flanking tRNAs.

Fig. 3. Main features of the discoglossid mt DNA control region. (A) Structure of the control region for each species. All discoglossids have three conserved sequence blocks (CSB-1, 2, and 3), two pyrimidine-rich regions (PP-1 and 2), and repeated motifs at both 5′ and 3′ ends. All repeats are in tandem except those at the 3′ end of D. galganoi. (B) Alignments of the identified conserved sequence blocks (CSB) of all three discoglossids. (C) Alignment of the repeated motif at the 5′ end. First position on this alignment is referred to first position on D. galganoi control region. Line shows nucleotides that correspond to a putative termination-associated sequence (TAS).

The control regions of the three discoglossid mt genomes are highly variable in length, ranging from 1482 to 2372 bp (Table 1). The structure of the control region of each species is shown in Fig. 3A. Three conserved sequence blocks (CSB-1, CSB-2, and CSB-3) (Fig. 3B) were identified in the 3′ end part of each control region. The newly reported discoglossid CSB-1 motifs are not reduced to a truncated pentamotif (5′-GACAT-3′) as in fishes, but share moderately high similarity with the recently described caecilian CSB-1 (San Mauro et al., 2004) (Fig. 3B). A truncated CSB-1 had been reported for other amphibians: X. laevis, A. davidianus, L. atifi, and R. sibiricus. However, the alignment of all amphibian mt control regions allowed us to identify a complete CSB-1 motif in all these species (only tentatively in A. davidianus), as well as in the recently sequenced F. limnocharis, A. mexicanum, and B. melanostictus (not shown). Two pyrimidine-rich stretches were identified upstream of the CSB motifs in each control region (Fig. 3A). Although somewhat shorter, they are likely homologous to the caecilian PP-1 (poly-T stretch) and PP-2 (poly-C stretch), and might be involved in regulatory aspects of the origin of H-strand replication (San Mauro et al., 2004). All three discoglossid mt control regions possess repeats at both 5′ and 3′ ends (Fig. 3A). The repeated motif at the 5′ end is in tandem and shows high sequence similarity in all three discoglossids (Fig. 3C), which suggests a common origin (i.e., homology). However, the number and length of tandem repeats differ across taxa: D. galganoi possesses four repeats of 87 bp, A. obstetricans five (plus five incomplete) of 92 bp, and B. orientalis 11 (plus one incomplete) of 77 bp (Fig. 3A). Two copies of the same motif were identified in X. laevis, and one single copy in examined neobatrachians and salamanders.
This suggests that this motif at the 5′ end of the mt control region was likely present in at least the ancestor of frogs and salamanders, and that independent duplication events occurred in the evolutionary history of each lineage. Furthermore, a putative termination-associated sequence (TAS) was found within this homologous motif in all three discoglossids (Fig. 3C). Only in D. galganoi was there an L-strand-encoded ORF spanning the 5′ end motif, but BLAST searches of the predicted 29 amino acid sequence produced no close matches, and thus the function of the putative polypeptide (if any) is unknown. Unlike the 5′ end repeats, sequence similarity of the 3′ end motifs (two repeats of about 78 bp not in tandem in D. galganoi, five tandem repeats of about 89 bp (plus one incomplete) in A. obstetricans, and three tandem repeats of about 64 bp in B. orientalis; Fig. 3A) was very low, suggesting that they might not be related to each other.

3.2. Phylogenetic relationships of discoglossids

The deduced amino acid sequences of all 13 mt protein-coding genes were combined into a single data set that produced an alignment of 3,818 positions. Of these, 301 were excluded from the analyses because of ambiguity in the homology assignment, 1766 were invariant, and 1126 parsimony-informative. Mean character distance was 0.138±0.005 among discoglossids, 0.186±0.004 between discoglossids and Xenopus, 0.284±0.008 between discoglossids and neobatrachians, 0.304±0.019 between Xenopus and neobatrachians, and 0.269±0.017 among neobatrachians.

Fig. 4. Phylogenetic relationships of discoglossid genera, and position of the family Discoglossidae within the Anura. (A) ML phylogram inferred from a single concatenated data set with the deduced amino acid sequence of all 13 mt protein-coding genes. Numbers above branches indicate support for ML (QP support; mtREV24+G+I model; upper value) and BI (BPPs; mtREV24+G+I model; lower value). Numbers below branches represent BPs for ME (mean character distances; upper value) and MP (lower value). Hyphens indicate support values below 50%. Salamanders were used as outgroups. (B) Unrooted ML phylogram inferred from analysis of the combined four-taxon data set (see text). Numbers above branches indicate support for ML (BPs; GTR+G+I model; upper value) and BI (BPPs; different model according to partition, see text; lower value). Numbers below branches represent BPs for ME (GTR+G+I distances; upper value) and MP (lower value).

ML (−ln likelihood=35,255.830), BI (−ln likelihood=35,298.060), ME (score=1.198), and MP (one single tree of 5371 steps; CI=0.762) phylogenetic analyses arrived at the same tree topology (Fig. 4A). The recovered tree strongly supported a discoglossid clade that comprises Alytes, Bombina, and Discoglossus (Fig. 4A). This result is congruent with recent morphological (Pugener et al., 2003) and molecular (Biju and Bossuyt, 2003; Hertwig et al., 2004; Hoegg et al., 2004) studies, and supports the traditional view of discoglossids as a natural group (e.g., Griffiths, 1963; Duellman, 1975; Laurent, 1979; Duellman and Trueb, 1994; Hay et al., 1995; Sanchiz, 1998). Conversely, it clearly rejects the validity of the Bombinanura and Discoglossanura groupings as proposed by Ford and Cannatella (1993). Within the Discoglossidae, Alytes was recovered as the sister taxon of Discoglossus, to the exclusion of Bombina (Fig.
4A), a topology which is in full agreement with recent morphological (Pugener et al., 2003) and molecular (Biju and Bossuyt, 2003; Hoegg et al., 2004) investigations, and supports most previous studies that found closer affinities of Alytes to Discoglossus than to Bombina (Duellman, 1975; Laurent, 1979; Ford and Cannatella, 1993; Duellman and Trueb, 1994; Sanchiz, 1998; Odierna et al., 2000) irrespective of the actual monophyly of Discoglossidae. Previous immunological and morphological studies that defended an Alytes+Bombina clade (Erspamer et al., 1972; Lanza et al., 1975) or a Discoglossus+Bombina clade (Maxson and Szymura, 1984; Haas, 2003) are fully rejected by our results. Two recent molecular studies (Fromhage et al., 2004; Hertwig et al., 2004) have also dealt with the discoglossid phylogeny using partial sequences of 12S and 16S rRNA mt genes (about 900 bp in total), but unfortunately both of them were unable to reach clear and well-supported results regarding the phylogenetic interrelationships of discoglossid genera. A close sister group relationship between Xenopus and discoglossids was highly supported (Fig. 4A), which would in principle support Hedges and Maxson’s (1993) and Hay et al.’s (1995) hypothesis, and contradict previous morphological (e.g., Ford and Cannatella, 1993; Duellman and Trueb, 1994) and molecular (Hillis et al., 1993) investigations. However, the lack of any Pelobatoidea in the analysis, a taxon that is often recovered as sister taxon of Pipoidea (e.g., Ford and Cannatella, 1993; García-París et al., 2003), might bias the analysis and cause the observed topology. Moreover, the three neobatrachians, B. melanostictus, F. limnocharis and R. nigromaculata, were recovered together in a clade achieving maximal support with all methods (Fig. 4A), which fully agrees with almost all morphological and molecular studies to date (e.g., Duellman and Trueb, 1994; Hay et al., 1995).
Interestingly, with all methods of phylogenetic inference, the three neobatrachians exhibited extremely long branches (Fig. 4A). It is well known that unequal substitution rates among taxa may have severe effects on tree reconstruction algorithms. Long branch attraction leads to a strong grouping and basal placement of the ingroup species with the fastest rates (longest branches) irrespective of the true phylogeny (Swofford et al., 1996). It is likely that the high rates of the neobatrachians may bias the analyses and cause artefactual monophyly of Archaeobatrachia. In fact, when we use stringent parameter settings in GBLOCKS to remove the fast-evolving sites from the alignment, Xenopus is consistently recovered as the sister group of the neobatrachians with all methods of phylogenetic inference, whereas other ingroup relationships become unresolved because of the strong reduction in the number of variable sites (not shown). Therefore, the recovered monophyly of archaeobatrachians may be spurious, and additional sequence information from more key lineages (e.g., representatives of Pelobatoidea, Ascaphidae, or Leiopelmatidae) needs to be gathered to properly address this question.
We used the four-taxon data sets to further investigate the relationships among discoglossids. All phylogenetic analyses based on four-taxon data sets (nucleotide sequences of each separate nuclear gene, a concatenated nucleotide sequence containing all four nuclear genes, and a concatenated nucleotide sequence containing all mt protein-coding genes except ND6) arrived at the same well-supported tree topology: ((Alytes, Discoglossus), (Bombina, Xenopus)). The results of all these analyses for each four-taxon data set are given in Table 2. Despite the apparently low variability of the nuclear data sets with respect to the mt, the statistical support for the Alytes+Discoglossus grouping was very high in all cases, and only the RAG2 gene showed a moderately lower phylogenetic performance.

Table 2
Results of the phylogenetic analyses of the single four-taxon data sets

                           CXCR4      RAG1       RAG2       Rhodopsin  All nuclear genes  mt proteins
Number of positions
  Total aligned            651        1509       816        294        3270               10,866
  Ambiguous/gapped         6          0          0          0          6                  210
  Invariant                431        1072       485        228        2216               6228
  Parsimony-informative    43         80         57         8          188                908
ML
  −ln L                    2014.676   4377.024   2787.458   739.552    9973.266           37,381.548
  BP                       91         94         59         90         98                 72
BI
  −ln L                    2014.390   4377.620   2785.300   739.800    9974.130           37,382.570
  BPP                      100        100        65         94         100                99
ME
  Tree score               0.676      0.507      0.780      0.365      0.678              1.339
  BP                       96         91         58         77         99                 79
MP
  Tree length              276        562        440        77         1355               6327
  CI                       0.931      0.911      0.925      0.961      0.923              0.907
  BP                       99         64         81         76         99                 78

For each data set, the number of total aligned positions, ambiguous/gapped positions, invariant positions, parsimony-informative positions, ML and BI log likelihoods, ME tree score, MP tree length, CI, and support for the ((Alytes, Discoglossus), (Bombina, Xenopus)) grouping (BPs for ML, ME, and MP; and BPPs for BI) are presented.

Table 3
Log likelihoods and p values of approximately unbiased (AU), Shimodaira-Hasegawa (SH), and Kishino-Hasegawa (KH) tests for each of the three unrooted topologies of the combined four-taxon data set

Alternative topologies   ((A, D), (B, X))   ((A, X), (B, D))   ((A, B), (D, X))
−ln L                    47,581.186         47,599.262         47,599.360
AU                       0.984              0.027              0.025
SH                       0.991              0.020              0.019
KH                       0.980              0.020              0.019

A, Alytes; B, Bombina; D, Discoglossus; X, Xenopus.

The combination of all four-taxon data sets into a joint matrix produced an alignment of 13,920 positions.
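The reported length of the joint matrix can be cross-checked against the per-data-set counts in Table 2. The following is only an illustrative sketch: it assumes (the text does not state this explicitly) that ambiguous/gapped positions were excluded before concatenation, and that the tabulated likelihoods are −ln L values, so that smaller numbers indicate better trees. All variable and function names are hypothetical.

```python
# Position counts transcribed from Table 2:
# (total aligned, ambiguous/gapped, invariant, parsimony-informative)
TABLE2_POSITIONS = {
    "CXCR4": (651, 6, 431, 43),
    "RAG1": (1509, 0, 1072, 80),
    "RAG2": (816, 0, 485, 57),
    "Rhodopsin": (294, 0, 228, 8),
    "mt proteins": (10866, 210, 6228, 908),
}
NUCLEAR = ("CXCR4", "RAG1", "RAG2", "Rhodopsin")

# MP tree lengths of the two concatenated data sets (Table 2).
MP_TREE_LENGTH = {"All nuclear genes": 1355, "mt proteins": 6327}

# Values from Table 3, treated here as -ln L (assumption: the minus sign
# is implied, since tree log likelihoods are negative).
NEG_LN_L = {
    "((A, D), (B, X))": 47581.186,
    "((A, X), (B, D))": 47599.262,
    "((A, B), (D, X))": 47599.360,
}


def usable_positions(names):
    """Total aligned minus ambiguous/gapped positions, summed over data sets."""
    return sum(TABLE2_POSITIONS[n][0] - TABLE2_POSITIONS[n][1] for n in names)


# Sum of the four single-gene alignments ("All nuclear genes" column).
nuclear_total = sum(TABLE2_POSITIONS[n][0] for n in NUCLEAR)

# Length of the joint matrix under the exclusion assumption stated above.
combined_length = usable_positions(TABLE2_POSITIONS)

# Combined MP tree length if both data sets are scored on the same topology.
combined_mp_length = sum(MP_TREE_LENGTH.values())

# Difference in -ln L between each topology and the best one.
best = min(NEG_LN_L.values())
delta_ln_l = {topo: round(v - best, 3) for topo, v in NEG_LN_L.items()}

print(nuclear_total, combined_length, combined_mp_length)
print(delta_ln_l)
```

Under these assumptions the counts add up to the 13,920 positions stated in the text, the summed MP tree lengths match the 7682 steps reported for the combined data set, and each of the two suboptimal topologies in Table 3 is about 18 log-likelihood units worse than the best tree.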
ML (−ln likelihood=47,581.186), BI (−ln likelihood=47,400.700), ME (score=1.072), and MP (one single tree of 7682 steps; CI=0.910) arrived at the same tree topology, and all support values for the Alytes+Discoglossus grouping were maximal or nearly so (Fig. 4B). Results of AU, SH, and KH tests of alternative tree topologies of this latter combined data set are summarised in Table 3. All tests rejected the two suboptimal trees at P<0.05. The strong evidence of the four-taxon data sets in favor of an Alytes+Discoglossus grouping further supports the results achieved based on phylogenetic analyses of mt amino acids (Fig. 4A) (see above).
Estimated substitution rates (ML tree branch length) of the nuclear genes were lower than those of the combined mt protein-coding gene data set (Fig. 5), which is consistent with many previous studies (e.g., Brown et al., 1982). This condition makes all four nuclear genes potentially useful molecular markers for the study of deep amphibian divergences. With the noteworthy exception of CXCR4, all four-taxon data sets exhibited a very short internal branch leading to rather long external (tip) branches (Fig. 5). Short branch lengths connecting internal nodes may reflect rapid radiation events at the origin of these lineages, but this needs to be tested specifically.

Fig. 5. Estimated substitution rates (measured as ML tree branch length) of each single four-taxon data set (nucleotide sequences of each separate nuclear gene, a concatenated nucleotide sequence containing all four nuclear genes, and a concatenated nucleotide sequence containing all mt protein-coding genes except ND6). In every column, substitution rates of specific branches are identified by the number on the tree. S.E., standard error.

Although the monophyly of discoglossids and the sister group relationship of Alytes and Discoglossus are confidently resolved in this study, the lack of sequence information for other lineages of frogs does not allow us to draw clear conclusions about the overall anuran relationships or about the archaeobatrachian monophyly debate. Additional taxa need to be targeted in future molecular phylogenetic studies that use complete mt genomes and nuclear genes to further understand the origin and early evolution of Anura. From a taxonomic perspective, the strongly supported monophyly of the discoglossids suggests that the family name Discoglossidae should be used again to include all four genera Discoglossus, Alytes, Bombina, and Barbourula.

Acknowledgements
We are grateful to Lukas Rüber for providing helpful technical advice with laboratory work, to Íñigo Martínez-Solano for helping during sample collection, and to the “Consejería de Medio Ambiente” of Madrid and Castilla y León (Spain) for providing the appropriate collecting permits. Two anonymous reviewers gave insightful comments on an earlier version of the manuscript. D.S.M. was sponsored by a predoctoral fellowship of the Ministerio de Ciencia y Tecnología of Spain. This work received financial support from a project of the Ministerio de Ciencia y Tecnología of Spain to R.Z. (CGL2004-00401).

References
Arntzen, J.W., García-París, M., 1995. Morphological and allozyme studies of midwife toads (genus Alytes), including the description of two new taxa from Spain. Contrib. Zool. (Bijdr. Dierkd.) 65, 5–34.
Biju, S.D., Bossuyt, F., 2003. New frog family from India reveals an ancient biogeographical link with the Seychelles. Nature 425, 711–714.
Brown, W.M., Prager, E.M., Wang, A., Wilson, A.C., 1982. Mitochondrial DNA sequences of primates: tempo and mode of evolution. J. Mol. Evol. 18, 225–239.
Duellman, W.E., 1975. On the classification of frogs. Occas. Pap. Mus. Nat. Hist., Univ. Kansas 42, 1–14.
Duellman, W.E., Trueb, L., 1994. Biology of Amphibians. Johns Hopkins University Press, Baltimore, MD.
Erspamer, V., Erspamer, F., Inselvini, M., Negri, L., 1972. Occurrence of bombesin and alytesin in extracts of the skin of three European discoglossid frogs and pharmacological actions of bombesin on extravascular smooth muscle. Br. J. Pharmacol. 45, 333–348.
Ford, L.S., Cannatella, D.C., 1993. The major clades of frogs. Herpetol. Monogr. 7, 94–117.
Fromhage, L., Vences, M., Veith, M., 2004. Testing alternative vicariance scenarios in Western Mediterranean discoglossid frogs. Mol. Phylogenet. Evol. 31, 308–322.
García-París, M., Buchholz, D.R., Parra-Olea, G., 2003. Phylogenetic relationships of Pelobatoidea re-examined using mtDNA. Mol. Phylogenet. Evol. 28, 12–23.
Goldman, N., Anderson, J.P., Rodrigo, A.G., 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49, 652–670.
Griffiths, I.G., 1963. The phylogeny of the Salientia. Biol. Rev. 38, 241–292.
Groth, J.G., Barrowclough, G.F., 1999. Basal divergences in birds and the phylogenetic utility of the nuclear RAG-1 gene. Mol. Phylogenet. Evol. 12, 115–123.
Haas, A., 2003. Phylogeny of frogs as inferred from primarily larval characters (Amphibia: Anura). Cladistics 19, 23–89.
Hay, J.M., Ruvinsky, I., Hedges, S.B., Maxson, L.R., 1995. Phylogenetic relationships of amphibian families inferred from DNA sequences of mitochondrial 12S and 16S ribosomal RNA genes. Mol. Biol. Evol. 12, 928–937.
Hedges, S.B., Maxson, L.R., 1993. A molecular perspective on lissamphibian phylogeny. Herpetol. Monogr. 7, 27–42.
Hertwig, S., de Sá, R.O., Haas, A., 2004. Phylogenetic signal and the utility of 12S and 16S mtDNA in frog phylogeny. J. Zoolog. Syst. Evol. Res. 42, 2–18.
Hillis, D.M., Ammerman, L.K., Dixon, M.T., de Sá, R.O., 1993. Ribosomal DNA and the phylogeny of frogs. Herpetol. Monogr. 7, 118–131.
Hoegg, S., Vences, M., Brinkmann, H., Meyer, A., 2004. Phylogeny and comparative substitution rates of frogs inferred from sequences of three nuclear genes. Mol. Biol. Evol. 21, 1188–1200.
Jameson, D., Gibson, A.P., Hudelot, C., Higgs, P.G., 2003. OGRe: a relational database for comparative analyses of mitochondrial genomes. Nucleic Acids Res. 31, 202–206.
Lanza, B., Cei, J.M., Crespo, E., 1975. Immunological evidence for the specific status of Discoglossus pictus Otth, 1837 and D. sardus Tschudi, 1837, with notes on the families Discoglossidae Günther, 1858 and Bombinidae Fitzinger, 1826 (Amphibia: Salientia). Monit. Zool. Ital. (N.S.) 9, 153–162.
Laurent, R., 1979. Esquisse d’une phylogenèse des anoures. Bull. Soc. Zool. Fr. 104, 397–422.
Martínez-Solano, I., 2004. Phylogeography of Iberian Discoglossus (Lissamphibia: Anura: Discoglossidae). J. Zoolog. Syst. Evol. Res. (in press).
Maxson, L.R., Szymura, J.M., 1984. Relationships among discoglossid frogs: an albumin perspective. Amphib.-Reptil. 5, 245–252.
Odierna, G., Andreone, F., Aprea, G., Arribas, O., Capriglione, T., Vences, M., 2000. Cytological and molecular analysis in the rare discoglossid species, Alytes muletensis (Sanchiz and Adrover 1977) and its bearing on archaeobatrachian phylogeny. Chromosom. Res. 8, 435–442.
Pugener, L.A., Maglia, A.M., Trueb, L., 2003. Revisiting the contribution of larval characters to an analysis of phylogenetic relationships of basal anurans. Zool. J. Linn. Soc. 139, 129–155.
Sanchiz, B., 1998. Encyclopedia of Palaeoherpetology, Part IV. Salientia. Friedrich Pfeil, München.
San Mauro, D., Gower, D.J., Oommen, O.V., Wilkinson, M., Zardoya, R., 2004. Phylogeny of caecilian amphibians (Gymnophiona) based on complete mitochondrial genomes and nuclear RAG1. Mol. Phylogenet. Evol. 33, 413–427.
Swofford, D.L., Olsen, G.J., Waddell, P.J., Hillis, D.M., 1996. Phylogenetic inference. In: Hillis, D.M., Moritz, C., Mable, B.K. (Eds.), Molecular Systematics. Sinauer Associates, Sunderland, MA, USA, pp. 407–514.
Yang, Z., Nielsen, R., Hasegawa, M., 1998. Models of amino acid substitution and applications to mitochondrial protein evolution. Mol. Biol. Evol. 15, 1600–1611.
Zardoya, R., Meyer, A., 1996. Phylogenetic performance of mitochondrial protein-coding genes in resolving relationships among vertebrates. Mol. Biol. Evol. 13, 933–942.