25.San Mauro et al 2004 Gene

Gene 343 (2004) 357 – 366
www.elsevier.com/locate/gene
Phylogenetic relationships of discoglossid frogs
(Amphibia:Anura:Discoglossidae) based on complete
mitochondrial genomes and nuclear genes
Diego San Mauro*, Mario Garcı́a-Parı́s, Rafael Zardoya
Departamento de Biodiversidad y Biologı́a Evolutiva, Museo Nacional de Ciencias Naturales, CSIC, José Gutiérrez Abascal, 2. E-28006 Madrid, Spain
Received 23 April 2004; received in revised form 30 July 2004; accepted 5 October 2004
Available online 11 November 2004
Received by G. Pesole
Abstract
The complete nucleotide sequence of the mitochondrial (mt) genome was determined for three species of discoglossid frogs
(Amphibia:Anura:Discoglossidae), representing three of the four recognized genera: Alytes obstetricans, Bombina orientalis, and
Discoglossus galganoi. The organization and size of these newly determined mt genomes are similar to those previously reported for
other vertebrates. Phylogenetic analyses (maximum likelihood, Bayesian inference, minimum evolution, and maximum parsimony) of mt
protein-coding genes at the amino acid level were performed in combination with already published mt genome sequence data of three
species of Neobatrachia, one of Pipoidea, and four of Caudata. Phylogenetic analyses based on the deduced amino acid sequences of all mt
protein-coding genes arrived at the same topology. The monophyly of Discoglossidae is strongly supported. Within the Discoglossidae,
Alytes is consistently recovered as sister group of Discoglossus, to the exclusion of Bombina. The three species representing Neobatrachia
exhibited extremely long branches irrespective of the phylogenetic inference method used, and hence their relative position with respect to
Discoglossidae and Xenopus may be artefactual due to a severe long branch attraction effect. To further investigate the phylogenetic
intrarelationships of discoglossids, nucleotide sequences of four nuclear protein-coding genes (CXCR4, RAG1, RAG2, and Rhodopsin) with
sequences available for the three discoglossid genera and Xenopus were retrieved from GenBank, and together with a concatenated
nucleotide sequence data set containing all mt protein-coding genes except ND6 were subjected to separate and combined phylogenetic
analyses. In all cases, a sister group relationship between Alytes and Discoglossus was recovered with high statistical support.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Alytes; Bombina; Discoglossus; CXCR4; RAG1; RAG2; Rhodopsin
1. Introduction
Discoglossids (Amphibia:Anura:Discoglossidae) are
medium-sized frogs with a characteristic disc-shaped tongue,
either a stocky or an elongated body, and warty or
smooth skin. Furthermore, they exhibit considerably diverse
life histories, from largely aquatic to terrestrial burrowers
(Duellman and Trueb, 1994). Discoglossids are among the
oldest living frog lineages, dating back at least to the
Jurassic (Sanchiz, 1998). Living and fossil discoglossids are
strictly distributed within the Palearctic Region, which
supports a Laurasian origin of the lineage.

Abbreviations: ATP6 and ATP8, ATP synthase F0 subunits 6 and 8; CI,
consistency index; COX1-3, cytochrome c oxidase subunits I–III; CXCR4,
chemokine (C-X-C motif) receptor 4; H-strand, heavy strand; L-strand,
light strand; mt, mitochondrial; ND1-6, NADH dehydrogenase subunits
1–6; ORF, open reading frame; PCR, polymerase chain reaction; rRNA,
ribosomal ribonucleic acid; RAG1 and RAG2, recombination activating
genes 1 and 2; tRNA, transfer ribonucleic acid.
* Corresponding author. Tel.: +34 91 4111328; fax: +34 91 5645078.
E-mail address: [email protected] (D. San Mauro).
0378-1119/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.gene.2004.10.001

Living discoglossid frogs have been generally grouped
into four genera (e.g., Duellman, 1975; Laurent, 1979;
Duellman and Trueb, 1994; Sanchiz, 1998): Alytes including
five species from Western Europe and Morocco (but see the
debate on Baleaphryne and Ammoryctis; Arntzen and
García-París, 1995), Discoglossus comprising six (probably seven; Martínez-Solano, 2004) species from Western
Europe, Northwestern Africa, Palestine and some Mediterranean islands, Bombina including nine species from Europe
and East and South East Asia, and Barbourula comprising
two species from Indonesia and the Philippines (AmphibiaWeb,
November 2, 2004; http://www.amphibiaweb.org/).
Together with other “primitive frogs” (Leiopelmatidae,
Ascaphidae, Pipoidea, and Pelobatoidea), discoglossids
were traditionally placed within Archaeobatrachia (e.g.,
Duellman, 1975; Laurent, 1979). However, Archaeobatrachia is generally recovered as a paraphyletic group with
respect to the remaining frogs, the Neobatrachia (e.g., Ford
and Cannatella, 1993; Duellman and Trueb, 1994) based on
morphological evidence. Only some molecular studies,
based on partial sequences of mitochondrial (mt) ribosomal
genes, have supported the monophyly of the Archaeobatrachia (Hedges and Maxson, 1993; Hay et al., 1995). The
debate about the monophyly of Archaeobatrachia has been
associated with rooting problems, which are particularly relevant for
molecular data sets (García-París et al., 2003).
Discoglossid frogs have been traditionally treated as a
natural group (e.g., Duellman, 1975; Laurent, 1979; Duellman and Trueb, 1994; Sanchiz, 1998; Biju and Bossuyt,
2003; Pugener et al., 2003; Hertwig et al., 2004; Hoegg et
al., 2004). However, the antiquity of the different discoglossid lineages has prompted taxonomic disagreement over
the number of families in which the four genera should be
grouped: one (Discoglossidae) or two independent families
(Discoglossidae and Bombinatoridae) (e.g., Lanza et al.,
1975). Moreover, while Bombina and Barbourula have
been consistently treated as sister taxa (Duellman, 1975;
Laurent, 1979; Duellman and Trueb, 1994; Sanchiz, 1998),
a long-lasting controversy persists over the
relationships of the clade Bombina+Barbourula to the other
discoglossid genera. Hypotheses supporting a sister taxon
relationship between Alytes and Discoglossus to the
exclusion of Bombina+Barbourula dominated over all other
arrangements (Duellman, 1975; Laurent, 1979; Ford and
Cannatella, 1993; Duellman and Trueb, 1994; Sanchiz,
1998; Biju and Bossuyt, 2003; Pugener et al., 2003; Hoegg
et al., 2004). However, sister taxon relationships between
Alytes and Bombina (Erspamer et al., 1972; Lanza et al.,
1975) and between Bombina and Discoglossus (Maxson
and Szymura, 1984; Haas, 2003) have also been proposed based
on immunological and morphological evidence.
Some studies have even challenged the monophyly of the
group based on morphological evidence (Ford and Cannatella, 1993). These authors found the Alytes+Discoglossus
grouping more closely related to other frogs (the Pipanura
comprising pipoideans, pelobatoideans, and neobatrachians)
than to the Bombina+Barbourula clade. They proposed the
name Discoglossanura for the group including Alytes+Discoglossus and the Pipanura, whereas they used the name
Bombinanura for the group comprising Bombina+Barbourula and the Discoglossanura.
To test between the competing hypotheses on the
monophyly of discoglossids, and to investigate the phylogenetic relationships among discoglossid genera, we have
determined the complete nucleotide sequence of the mt
genomes of three discoglossids, each one representing a
different genus, and compared it with previously described
frog mt genomes. This mitogenomic approach follows
several recent studies (e.g., Zardoya and Meyer, 1996) that
demonstrated the need to establish high-level phylogenetic
inferences based on rather large sequence data sets in order
to achieve statistical confidence. Also recently, several
studies (e.g., Groth and Barrowclough, 1999) have proven
that some orthologous nuclear protein-coding genes outperform individual mt genes in reconstructing ancient phylogenies. Therefore, to gain insights on the discoglossid
phylogeny from a nuclear perspective, we have also
gathered sequences of nuclear protein-coding genes that
have shown a good performance in recovering the phylogenetic relationships among divergent amphibian lineages
(Biju and Bossuyt, 2003; Hoegg et al., 2004; San Mauro
et al., 2004).
2. Materials and methods
2.1. Taxon sampling
The nucleotide sequence of the complete mt genome was
determined for a single representative of each of the three most
common discoglossid genera (voucher numbers from the
Museo Nacional de Ciencias Naturales, Spain): Alytes
obstetricans pertinax (MNCN/ADN 4313; collected in
Tielmes, Spain), Bombina orientalis (MNCN/ADN 4314;
pet trade), and Discoglossus galganoi (MNCN/ADN 4315;
collected in Reliegos, Spain). The South East Asian genus
Barbourula could not be included in the study, but it is
confidently thought to be the sister group of Bombina,
according to morphological and histological data (Sanchiz,
1998). The new sequence data were compared with all
available anuran complete mt genome sequences: Bufo
melanostictus (NC_005794), Fejervarya limnocharis
(NC_005055), Rana nigromaculata (NC_002805), and
Xenopus laevis (NC_001573). The complete mt genomes
of four salamanders, Ambystoma mexicanum (NC_005797),
Andrias davidianus (NC_004926), Lyciasalamandra atifi
(NC_002756), and Ranodon sibiricus (NC_004021), were
used as outgroups.
To further investigate the phylogenetic relationships
among discoglossids, we screened the GenBank database
for nuclear protein-coding genes available in at least one
species of each of the three discoglossid genera (Alytes,
Bombina, and Discoglossus). Sequence information of
the selected nuclear genes was not available for most of
other frog and salamander genera employed in the
mitochondrial analysis, so we used only Xenopus as
outgroup, and addressed this second approach from a
four-taxon-case perspective. The retrieved nuclear
sequences were: CXCR4 exon 2 (A. obstetricans,
AY364170; B. orientalis, AY364177; Discoglossus pictus,
AY364172; X. laevis, Y17895); RAG1 (A. obstetricans,
AY583334; B. orientalis, AY583335; D. galganoi,
AY583338; X. laevis, L19324); RAG2 (Alytes muletensis,
AY323780; B. orientalis, AY323783; Discoglossus sardus,
AY323785; X. laevis, L19325); and Rhodopsin exon 1 (A.
obstetricans, AY364385; B. orientalis, AY364391; D.
pictus, AY364387; X. laevis, S62229).
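
The selection criterion above (each nuclear gene must be available for all three discoglossid genera plus Xenopus) can be checked mechanically. The following is only a minimal sketch: the accession numbers are copied from the text, while the dictionary layout and the helper name `missing_genera` are illustrative assumptions and not part of the original study.

```python
# Accession numbers copied from the text; the data layout and the helper
# name `missing_genera` are hypothetical, for illustration only.
NUCLEAR_ACCESSIONS = {
    "CXCR4": {"Alytes": "AY364170", "Bombina": "AY364177",
              "Discoglossus": "AY364172", "Xenopus": "Y17895"},
    "RAG1": {"Alytes": "AY583334", "Bombina": "AY583335",
             "Discoglossus": "AY583338", "Xenopus": "L19324"},
    "RAG2": {"Alytes": "AY323780", "Bombina": "AY323783",
             "Discoglossus": "AY323785", "Xenopus": "L19325"},
    "Rhodopsin": {"Alytes": "AY364385", "Bombina": "AY364391",
                  "Discoglossus": "AY364387", "Xenopus": "S62229"},
}

# The four genera that define the four-taxon case.
REQUIRED_GENERA = ("Alytes", "Bombina", "Discoglossus", "Xenopus")


def missing_genera(accessions):
    """Return {gene: [genera without a sequence]} for incomplete genes."""
    gaps = {}
    for gene, by_genus in accessions.items():
        absent = [g for g in REQUIRED_GENERA if g not in by_genus]
        if absent:
            gaps[gene] = absent
    return gaps
```

An empty result means every gene satisfies the four-taxon requirement. Note that the species representing a genus differs between genes (for example, D. pictus for CXCR4 and D. galganoi for RAG1), which the genus-level keys deliberately ignore.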
2.2. DNA extraction, PCR amplification, cloning and
sequencing
Total DNA was purified following standard phenol/
chloroform extraction procedures. Overlapping fragments
that covered the entire mt genome were amplified by PCR
using the same primers and conditions reported in San
Mauro et al. (2004). PCR products were purified by ethanol
precipitation, and sequenced in an automated DNA
sequencer (ABI PRISM 3700), using the BigDye Deoxy
Terminator cycle-sequencing kit (Applied Biosystems)
following manufacturer’s instructions. Short amplicons
were directly sequenced using the corresponding PCR
primers. Long amplicons were cloned into pGEM-T vectors
(Promega), and recombinant plasmids were sequenced using
the M13 (forward and reverse) universal primers and
additional walking primers (available from the authors upon
request). The sequences obtained averaged 700 base pairs
(bp) in length, and each sequence overlapped the next contig
by about 150 bp. In no case were differences in sequence
observed between the overlapping regions. Complete mt
genome nucleotide sequences reported in this paper have
been deposited at the GenBank database under accession
numbers AY585337 (A. obstetricans), AY585338 (B.
orientalis), and AY585339 (D. galganoi).
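
As a rough sense of scale for the sequencing effort, the read length (about 700 bp) and overlap (about 150 bp) quoted above give a lower bound on the number of reads needed to tile each genome once. This is a back-of-the-envelope sketch only: it treats the circular genome as a linear stretch, the function name `min_reads_to_tile` is hypothetical, and the actual number of reads used by the authors is not stated in the text.

```python
import math


def min_reads_to_tile(genome_bp, read_bp=700, overlap_bp=150):
    """Lower bound on the number of reads of length read_bp, each
    overlapping the next by overlap_bp, needed to cover genome_bp bases.

    Assumption: the genome is treated as a linear sequence.
    """
    if read_bp <= overlap_bp:
        raise ValueError("read length must exceed the overlap")
    if genome_bp <= read_bp:
        return 1
    step = read_bp - overlap_bp  # new bases contributed by each extra read
    return 1 + math.ceil((genome_bp - read_bp) / step)
```

For the three genome lengths reported in Table 1 (17,014, 17,490 and 17,847 bp) this gives 31, 32 and 33 reads, respectively.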
2.3. Molecular and phylogenetic analyses
Sequence data were analyzed with MacClade version
4.05, and PAUP* version 4.0b10. To control for saturation
in the different data sets, we plotted either pairwise
transition and transversion differences (for nucleotide
sequences) or mean character distances (for amino acid
sequences) against corrected sequence divergence (measured as ML distances). Transitions of mt protein-coding
gene nucleotide sequences were saturated (Fig. 1A),
particularly in all pairwise comparisons involving B.
melanostictus, F. limnocharis, and R. nigromaculata (i.e.,
Neobatrachia), and the outgroups. Hence, we analyzed the
mt protein-coding gene sequence data at the amino acid
level, which showed no saturation (Fig. 1B). Deduced
amino acid sequences of mt protein-coding genes were
aligned using CLUSTAL X version 1.83 and revised by
eye in order to maximize homology of position. Ambiguous alignments and gaps were excluded from the
analyses using GBLOCKS version 0.91b with default
parameters.
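
The quantities plotted on the vertical axis of a nucleotide saturation plot are simple counts of transitions and transversions between two aligned sequences. The sketch below shows one way to obtain them; it is not the PAUP* implementation used in the study, and the function name `count_ti_tv` is a hypothetical choice. Positions with gaps or ambiguity codes are skipped, which is an assumption of this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ti_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Positions where either sequence has a gap or a non-ACGT symbol are
    ignored (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        # A transition stays within purines or within pyrimidines.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ti += 1
        else:
            tv += 1
    return ti, tv
```

Plotting these counts against a corrected (for example, ML) distance for every pair of taxa shows saturation as a plateau: the corrected distance keeps growing while the observed count of transitions no longer does.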
The deduced amino acid sequences of all 13 protein-coding genes encoded by each mt genome were combined
into a single concatenated data set that was subjected to four
phylogenetic analyses using: maximum parsimony (MP),
minimum evolution (ME), maximum likelihood (ML), and
Bayesian inference (BI). MP and ME (mean character
distances) analyses were carried out with PAUP* using
heuristic searches with 10 random stepwise additions of taxa
and TBR branch swapping. Support for the resulting MP
and ME trees was evaluated by non-parametric bootstrapping (BP) with 1000 pseudoreplicates. ML analyses were
conducted with TREE-PUZZLE version 5.2 using the
mtREV24 model with correction for among-site rate
heterogeneity (G+I). This model was selected following
Yang et al. (1998) and by performing Likelihood Ratio Tests
(LRTs) comparing hierarchically the following alternative
models: equal rates (eq.) versus gamma-distributed rates (G),
versus proportion of invariant sites (I), versus gamma-distributed rates and proportion of invariant sites (G+I).
Robustness of the resulting ML tree was evaluated by
quartet puzzling (QP; 100,000 puzzling steps). BI analyses
were performed with MrBayes version 3.0b4, simulating
four simultaneous chains, for a million generations, sampling every 100 generations. Generations sampled before
the chain reached stationarity (100,000), as judged by plots
of ML scores, were discarded (“burn-in”). For this analysis,
the mtREV24+G+I model was also selected. Statistical
support for clades obtained by BI was measured by
Bayesian posterior probability (BPP).
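
The hierarchical likelihood ratio tests mentioned above compare nested models through the statistic 2(lnL_alt - lnL_null). A minimal sketch for a pair of models that differ by one free parameter (for example, equal rates versus G) is given below. The chi-square approximation with one degree of freedom, the function name `lrt_one_df`, and any log-likelihood values passed to it are assumptions of this example; the log-likelihoods of the candidate models are not reported in the text.

```python
import math


def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value), with the p-value taken from a chi-square
    distribution with one degree of freedom. For one degree of freedom the
    upper tail is P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With hypothetical log-likelihoods of -100.0 (simpler model) and -95.0 (richer model), the statistic is 10.0 and the simpler model would be rejected at the 5% level. When the extra parameter lies on the boundary of its range under the null model, this plain chi-square approximation tends to be conservative.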
Because the sequences of the three species representing
Neobatrachia were highly divergent, separate analyses using
a more conservative alignment (by employing stringent
parameter setting in GBLOCKS: minimum number of
sequences for a conserved position: 9; minimum number
of sequences for a flanking position: 11; maximum number
of contiguous non-conserved positions: 1; minimum length
of a block: 50) were also performed, using the same settings
described above for each phylogenetic method.
In the four-taxon approach, pairwise distance values of
mt protein-coding genes among the three discoglossids and
X. laevis were located in the linear part of the saturation plot
(Fig. 1A), and neither transitions nor transversions were
saturated. Similarly, nucleotide sequences of the four
nuclear genes showed no saturation (not shown). Hence,
all four-taxon data sets were analysed at the nucleotide level
including all codon positions. For these data sets, inferred
amino acid sequences were aligned as described above, gaps
were excluded using GBLOCKS with default parameters,
and the resulting alignments were then imposed onto the
corresponding nucleotide sequences. Alignments are available from the authors upon request.
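
Imposing an amino acid alignment onto the corresponding nucleotide sequences amounts to writing one codon per aligned residue and three gap characters per aligned gap. The sketch below illustrates the idea under simple assumptions (the coding sequence starts in frame and contains no frameshifts); the function name `thread_codons` is hypothetical and this is not the tool used by the authors.

```python
def thread_codons(aligned_protein, cds):
    """Impose a gapped protein alignment row onto its coding sequence.

    aligned_protein: one alignment row, with '-' marking gaps.
    cds: the unaligned, in-frame nucleotide sequence of the same gene.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than 3 x residues")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")  # a protein gap becomes a codon-sized gap
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)
```

For example, the protein row `M-K` with the coding sequence `ATGAAA` yields `ATG---AAA`, so codon positions stay in register across all taxa.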
Fig. 1. Saturation plots of the mt concatenated data sets. (A) Plot of pairwise transitions (Ti) and transversions (Tv) against corrected sequence divergence
(measured as ML distance) for the mt protein-coding genes at the nucleotide level. Dashed square indicates location of pairwise comparisons involving the
three discoglossids and Xenopus. (B) Plot of uncorrected mean character distance against corrected divergence (measured as ML distance) for the mt protein-coding genes at the amino acid level.

The single four-taxon data sets (nucleotide sequences of
each separate nuclear gene, a concatenated nucleotide
sequence containing all four nuclear genes, and a concatenated
nucleotide sequence containing all mt protein-coding genes
except ND6) were subjected to MP, ME, ML
and BI analyses, separately. MP, ME and ML analyses were
carried out with PAUP*, whereas the BI analysis was
conducted with MrBayes. MP and ML analyses were both
performed using branch-and-bound searches with furthest
addition sequence of taxa, whereas ME and BI analyses
were both performed using the same settings mentioned
above. The best-fit model of nucleotide substitution for the
ME, ML, and BI analyses was selected using ModelTest
version 3.5, following the Akaike Information Criterion
(AIC). The selected models were: TVM+G, for CXCR4;
GTR+I, for RAG1; TrN+I, for RAG2; HKY+G, for
Rhodopsin; GTR+G, for the concatenated nuclear data set;
and GTR+G+I, for the concatenated mt data set. MrBayes
does not allow the TVM and TrN submodels, and hence the
GTR was used for BI with the CXCR4 and RAG2 data sets.
BPs were used to test the robustness of MP, ME, and ML
trees (1000 pseudoreplicates). The reliability of the BI
analyses was tested with BPPs. ML tree branch lengths were
estimated to compare substitution rates among the different
four-taxon data sets.
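
Model choice under the Akaike Information Criterion reduces to computing AIC = -2 lnL + 2k for each candidate model (k being its number of free parameters) and keeping the smallest value. The sketch below shows that calculation only; it is not ModelTest itself, the function names are hypothetical, and any log-likelihoods supplied to it would have to come from an actual model-fitting run.

```python
def aic(ln_likelihood, n_free_params):
    """Akaike Information Criterion: -2 lnL + 2k."""
    return -2.0 * ln_likelihood + 2.0 * n_free_params


def best_by_aic(candidates):
    """Pick the model with the smallest AIC.

    candidates: {model_name: (ln_likelihood, n_free_params)}
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))
```

With made-up inputs such as `{"simple": (-100.0, 5), "rich": (-99.5, 9)}`, the AIC values are 210 and 217, so the simpler model is preferred: the small gain in likelihood does not pay for four extra parameters.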
Finally, all single nuclear gene four-taxon data sets as
well as the concatenated mt four-taxon data set were
combined into a joint data set, and submitted to MP, ME,
ML, and BI methods of phylogenetic inference (using the
same settings as for the separate analyses, see above). For
ME and ML, a single GTR+G+I model of nucleotide
substitution was selected (according to the AIC calculated
using ModelTest). BI analysis was performed using the
corresponding substitution model for each of the separate
four-taxon data sets (see above), and model parameters
were independently estimated for each partition (“unlink”
option).
Approximately unbiased (AU), Shimodaira-Hasegawa
(SH), and Kishino-Hasegawa (KH) tests were used to
evaluate the three alternative unrooted trees for the
combined four-taxon data set using CONSEL version 0.1f,
with site-wise log-likelihoods of trees calculated by PAUP*.
A total of one million scaled bootstrap replicates were used
in order to get a small sampling error. Some recent studies
(e.g., Goldman et al., 2000) have pointed out that
inappropriate tree specification may bias non-parametric
tests (especially KH, which requires the trees to be specified
a priori; and SH, which requires the inclusion of all
“reasonable” trees though it is unclear how this set can be
selected). On this matter, Goldman et al. (2000) noted that
selecting all possible trees will be a conservative solution to
the problem, but this is impractical except for the smallest
taxon samplings. The more recent AU test is less biased
than other methods, but is also impractical when the number
of trees to be compared is large. We conducted the AU, SH,
and KH tests using the combined four-taxon data set
because it is the one that gathers the largest and most
comprehensive set of sequence characters, and because for
four taxa there are only three alternative, fully resolved
unrooted trees, making the selection of all possible trees
practical.
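
The statement that four taxa admit only three fully resolved unrooted trees follows from the standard count (2n-5)!! = 3 x 5 x ... x (2n-5) for n taxa. A short sketch of that count, with the hypothetical function name `n_unrooted_trees`, makes clear why exhaustive tree specification is feasible here but not for larger taxon samplings.

```python
def n_unrooted_trees(n_taxa):
    """Number of fully resolved unrooted trees for n_taxa labelled taxa.

    Uses the double factorial (2n-5)!! = 3 * 5 * ... * (2n-5).
    """
    if n_taxa < 3:
        raise ValueError("at least three taxa are required")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):  # odd factors 3 .. 2n-5
        count *= odd
    return count
```

The function returns 3 for four taxa, 15 for five taxa, and 34,459,425 for the 11 taxa of the complete mt data set (three discoglossids, three neobatrachians, Xenopus, and four salamanders).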
3. Results and discussion
3.1. Mitochondrial genome organization and structural
features
The complete nucleotide sequence of the L-strand of the
mt genomes of the three discoglossids was determined. The
total length of the new discoglossid mt genomes ranged
from 17,014 to 17,847 bp (Table 1). All three mt genomes
encoded for two rRNAs, 22 tRNAs, and 13 protein-coding
genes, and in all cases the organization conformed to the
vertebrate consensus mt gene arrangement (Jameson et al.,
2003) (Fig. 2A). Overall base compositions of the L-strand
as well as gene lengths for each genome are shown in Table
1. As in most vertebrates, the overall base compositions are
skewed against guanine in all three discoglossid mt
genomes, which is due to a strong bias against the use of
guanine at the third codon position.
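
Base composition, overall or restricted to one codon position, is a simple frequency count. The sketch below is a hypothetical helper (`base_composition`) that assumes the input is an in-frame coding sequence when a codon position is requested; it is meant only to illustrate how a bias against guanine at third codon positions could be quantified, not to reproduce the values in Table 1.

```python
from collections import Counter


def base_composition(seq, codon_position=None):
    """Percent A, C, G and T in seq.

    If codon_position is 1, 2 or 3, only that position of each codon is
    counted (the reading frame is assumed to start at the first base).
    Symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    if codon_position is not None:
        if codon_position not in (1, 2, 3):
            raise ValueError("codon_position must be 1, 2 or 3")
        seq = seq[codon_position - 1::3]
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases to count")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}
```

Comparing `base_composition(cds)` with `base_composition(cds, codon_position=3)` for each protein-coding gene would show whether the overall deficit of guanine is concentrated at third positions.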
The mt 12S and 16S rRNA genes range from 933 to 949,
and from 1583 to 1626 bp (Table 1), respectively. The 22
tRNA genes range in size from 65 to 75 bp. All tRNAs can
be folded into typical cloverleaf secondary structures with
the known exception of tRNA-Ser(AGY). There is one case of
tRNA sequence overlap on the same strand: tRNA-Cys and
tRNA-Tyr share one nucleotide in D. galganoi.
Protein-coding genes in the three discoglossid mt
genomes begin with ATG as start codon, except COX1,
which initiates with GTG (Table 1). Stop codons are
variable among discoglossid taxa. Most ORFs have incomplete stop codons, either T or TA, which presumably
become functional by subsequent polyadenylation of the
respective mRNAs (Table 1).
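
The incomplete stop codons can be read directly from the gene lengths in Table 1: a length that leaves a remainder of 2 when divided by 3 ends in TA (for example, 683 bp), a remainder of 1 ends in T (for example, 688 bp), and a multiple of 3 carries a complete stop codon (for example, 1554 bp). The sketch below completes such an ORF with the adenines that polyadenylation would supply; the function name `complete_stop_codon` is hypothetical and the example sequences in the usage note are invented.

```python
def complete_stop_codon(orf):
    """Pad an ORF that ends in an incomplete stop codon (T or TA) with the
    A residues that polyadenylation would add, giving a terminal TAA.

    ORFs whose length is a multiple of three are returned unchanged.
    """
    remainder = len(orf) % 3
    if remainder == 0:
        return orf
    tail = orf[-remainder:].upper()
    if tail != "TA"[:remainder]:
        raise ValueError("ORF does not end in an incomplete T or TA stop")
    return orf + "A" * (3 - remainder)
```

For instance, the invented sequences `ATGAAAT` and `ATGAAATA` both become `ATGAAATAA`.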

Table 1
Main structural features of discoglossid mt genomes

Feature              Discoglossus galganoi  Alytes obstetricans  Bombina orientalis
Total length         17,014                 17,490               17,847
%A                   29                     29                   30
%C                   27                     29                   27
%G                   16                     15                   15
%T                   28                     27                   28
Control region       1482                   2035                 2372
OL                   28                     30                   29
12S rRNA             949                    937                  933
16S rRNA             1626                   1583                 1599
Intergenic spacers   37                     35                   42
ATP6                 683 (ATG/TA–)          683 (ATG/TA–)        684 (ATG/TAA)
ATP8                 168 (ATG/TAA)          168 (ATG/TAA)        168 (ATG/TAA)
Cytochrome b         1142 (ATG/TA–)         1142 (ATG/TA–)       1141 (ATG/T–)
COX1                 1554 (GTG/TAA)         1551 (GTG/TAA)       1554 (GTG/TAA)
COX2                 688 (ATG/T–)           688 (ATG/T–)         688 (ATG/T–)
COX3                 784 (ATG/T–)           784 (ATG/T–)         784 (ATG/T–)
ND1                  965 (ATG/TA–)          963 (ATG/TAA)        962 (ATG/TA–)
ND2                  1045 (ATG/T–)          1042 (ATG/T–)        1045 (ATG/T–)
ND3                  343 (ATG/T–)           343 (ATA/T–)         343 (ATG/T–)
ND4                  1378 (ATG/T–)          1378 (ATG/T–)        1378 (ATG/T–)
ND4L                 297 (ATG/TAA)          297 (ATG/TAA)        297 (ATG/TAA)
ND5                  1818 (ATG/TAA)         1809 (ATG/TAA)       1809 (ATG/TAA)
ND6                  510 (ATG/AGA)          510 (ATG/AGA)        510 (ATG/AGA)

For each, total length of the mt genome, overall base composition of the
L-strand, length of the common non-coding regions, length of the ribosomal
genes, length of all the intergenic spacers, and length of the protein-coding
genes (showing start/stop codons within parentheses) are presented.
Lengths are expressed as bp.

As in most vertebrates, the putative origin of L-strand
replication (OL) of the discoglossid mt genomes was located
within the WANCY tRNA cluster, between the tRNA-Asn
and tRNA-Cys genes (Fig. 2A). In all discoglossids, the OL
ranged from 28 to 30 bp (Table 1) and had the potential to
fold into a stem-loop secondary structure, sharing some
nucleotides with the flanking tRNA-Cys (Fig. 2B). As
described for other tetrapods, L-strand synthesis is probably
initiated in a stretch of thymines in the OL loop (Fig. 2B).
The 5′-GCCGG-3′ motif that in human mt DNA is involved
in the transition from RNA synthesis to DNA synthesis is
entirely conserved in all three discoglossids (Fig. 2B).
The control regions of the three discoglossid mt genomes
are highly variable in length, ranging from 1482 to 2372 bp
(Table 1). The structure of the control region of each species
is shown in Fig. 3A. Three conserved sequence blocks
(CSB-1, CSB-2, and CSB-3) (Fig. 3B) were identified in the
3′ end part of each control region. The newly reported
discoglossid CSB-1 motifs are not reduced to a truncated
pentamotif (5′-GACAT-3′) as in fishes, but share moderately
high similarity to the recently described caecilian CSB-1
(San Mauro et al., 2004) (Fig. 3B). A truncated CSB-1 had
been reported for other amphibians: X. laevis, A. davidianus, L. atifi, and R. sibiricus. However, the alignment of all
amphibian mt control regions allowed us to identify a
complete CSB-1 motif in all these species (only tentatively
in A. davidianus), as well as in the recently sequenced F.
limnocharis, A. mexicanum, and B. melanostictus (not
shown). Two pyrimidine-rich stretches were identified
upstream of the CSB motifs in each control region (Fig. 3A).
Although somewhat shorter, they are likely homologous to
the caecilian PP-1 (poly-T stretch) and PP-2 (poly-C
stretch), and might be involved in regulatory aspects of
the origin of H-strand replication (San Mauro et al., 2004).

Fig. 2. (A) Gene organization for the mt genomes of the discoglossids. Genes encoded by the L-strand are underlined. (B) Proposed secondary structures for the
origins of L-strand replication (OL) of the discoglossids. The 5′-GCCGG-3′ motif is indicated by a box. Lines show the nucleotides partially shared with
flanking tRNAs.

Fig. 3. Main features of the discoglossid mt DNA control region. (A) Structure of the control region for each species. All discoglossids have three conserved
sequence blocks (CSB-1, 2, and 3), two pyrimidine-rich regions (PP-1 and 2), and repeated motifs at both 5′ and 3′ ends. All repeats are in tandem except those
at the 3′ end of D. galganoi. (B) Alignments of the identified conserved sequence blocks (CSB) of all three discoglossids. (C) Alignment of the repeated motif
at the 5′ end. First position on this alignment is referred to first position on D. galganoi control region. Line shows nucleotides that correspond to a putative
termination-associated sequence (TAS).

All three discoglossid mt control regions possess repeats
at both 5′ and 3′ ends (Fig. 3A). The repeated motif at the 5′ end
is in tandem and shows high sequence similarity in all three
discoglossids (Fig. 3C), which suggests a common origin
(i.e., homology). However, the number and length of tandem
repeats differ across taxa: D. galganoi possesses four
repeats of 87 bp, A. obstetricans five (plus five incomplete)
of 92 bp, and B. orientalis 11 (plus one incomplete) of 77 bp
(Fig. 3A). Two copies of the same motif were identified in
X. laevis, and one single copy in examined neobatrachians
and salamanders. This suggests that this motif at the 5′ end
of the mt control region was likely present in at least the
ancestor of frogs and salamanders, and that independent
duplication events occurred in the evolutionary history of
each lineage. Furthermore, a putative termination-associated
sequence (TAS) was found within this homologous motif in
all three discoglossids (Fig. 3C). Only in D. galganoi was
there an L-strand-encoded ORF all along the 5′ end motif, but
BLAST searches of the predicted 29 amino acid sequence
produced no close matches, and thus the function of the
putative polypeptide (if any) is unknown. Unlike the 5′ end
repeats, sequence similarity of the 3′ end motifs (two repeats
of about 78 bp not in tandem in D. galganoi, five tandem
repeats of about 89 bp (plus one incomplete) in A.
obstetricans, and three tandem repeats of about 64 bp in
B. orientalis; Fig. 3A) was very low, suggesting that they
might not be related to each other.
3.2. Phylogenetic relationships of discoglossids
Fig. 4. Phylogenetic relationships of discoglossid genera, and position of the family Discoglossidae within the Anura. (A) ML phylogram inferred from a single
concatenated data set with the deduced amino acid sequence of all 13 mt protein-coding genes. Numbers above branches indicate support for ML (QP support;
mtREV24+G+I model; upper value) and BI (BPPs; mtREV24+G+I model; lower value). Numbers below branches represent BPs for ME (mean character
distances; upper value) and MP (lower value). Hyphens indicate support values below 50%. Salamanders were used as outgroups. (B) Unrooted ML phylogram
inferred from analysis of the combined four-taxon data set (see text). Numbers above branches indicate support for ML (BPs; GTR+G+I model; upper value)
and BI (BPPs; different model according to partition, see text; lower value). Numbers below branches represent BPs for ME (GTR+G+I distances; upper value)
and MP (lower value).

The deduced amino acid sequences of all 13 mt protein-coding genes were combined into a single data set that
produced an alignment of 3,818 positions. Of these, 301
were excluded from the analyses because of ambiguity in
the homology assignment, 1766 were invariant, and 1126
were parsimony-informative. Mean character distance was
0.138±0.005 among discoglossids, 0.186±0.004 between
discoglossids and Xenopus, 0.284±0.008 between discoglossids and neobatrachians, 0.304±0.019 between Xenopus and neobatrachians, and 0.269±0.017 among
neobatrachians. ML (−ln likelihood=35,255.830), BI (−ln
likelihood=35,298.060), ME (score=1.198), and MP (a
single tree of 5371 steps; CI=0.762) phylogenetic analyses
arrived at the same tree topology (Fig. 4A). The recovered
tree strongly supported a discoglossid clade that comprises
Alytes, Bombina, and Discoglossus (Fig. 4A). This result is
congruent with recent morphological (Pugener et al., 2003)
and molecular (Biju and Bossuyt, 2003; Hertwig et al.,
2004; Hoegg et al., 2004) studies, and supports the
traditional view of discoglossids as a natural group (e.g.,
Griffiths, 1963; Duellman, 1975; Laurent, 1979; Duellman
and Trueb, 1994; Hay et al., 1995; Sanchiz, 1998).
Conversely, it clearly rejects the validity of Bombinanura
and Discoglossanura groupings as proposed by Ford and
Cannatella (1993).
Within the Discoglossidae, Alytes was recovered as the
sister taxon of Discoglossus, to the exclusion of Bombina
(Fig. 4A), a topology which is in full agreement with recent
morphological (Pugener et al., 2003) and molecular (Biju
and Bossuyt, 2003; Hoegg et al., 2004) investigations, and
supports most previous studies that found closer affinities of
Alytes to Discoglossus than to Bombina (Duellman, 1975;
Laurent, 1979; Ford and Cannatella, 1993; Duellman and
Trueb, 1994; Sanchiz, 1998; Odierna et al., 2000) irrespective of the actual monophyly of Discoglossidae. Previous
immunological and morphological studies that defended an
Alytes+Bombina clade (Erspamer et al., 1972; Lanza et al.,
1975) or a Discoglossus+Bombina clade (Maxson and
Szymura, 1984; Haas, 2003) are fully rejected by our
results. Two recent molecular studies (Fromhage et al.,
2004; Hertwig et al., 2004) have also dealt with the
discoglossid phylogeny using partial sequences of 12S and
16S rRNA mt genes (about 900 bp in total), but
unfortunately both of them were unable to reach clear and
well supported results regarding the phylogenetic interrelationships of discoglossid genera.
A close sister group relationship between Xenopus and
discoglossids was highly supported (Fig. 4A), which would
in principle support Hedges and Maxson’s (1993) and Hay
et al.’s (1995) hypothesis, and contradict previous morphological (e.g., Ford and Cannatella, 1993; Duellman and
Trueb, 1994) and molecular (Hillis et al., 1993) investigations. However, the lack of any Pelobatoidea in the
analysis, a taxon that is often recovered as sister taxon of
Pipoidea (e.g., Ford and Cannatella, 1993; Garcı́a-Parı́s et
al., 2003), might bias the analysis and cause the observed
topology. Moreover, the three neobatrachians, B. melanostictus, F. limnocharis and R. nigromaculata, were recovered
together in a clade achieving maximal support with all
methods (Fig. 4A), which fully agrees with almost all
morphological and molecular studies to date (e.g., Duellman
and Trueb, 1994; Hay et al., 1995). Interestingly, with all
methods of phylogenetic inference, the three neobatrachians
exhibited extremely long branches (Fig. 4A). It is well
known that unequal substitution rates among taxa may have
severe effects on tree reconstruction algorithms. Long
branch attraction leads to a strong grouping and basal
placement of the ingroup species with the fastest rates
(longest branches) irrespective of the true phylogeny
(Swofford et al., 1996). It is likely that the high rates of
the neobatrachians may bias the analyses and cause
artefactual monophyly of Archaeobatrachia. In fact, when
we use stringent parameter settings in GBLOCKS to remove
the fast evolving sites from the alignment, Xenopus is
consistently recovered as the sister group of the neobatrachians with all methods of phylogenetic inference,
whereas other ingroup relationships become unresolved
because of the strong reduction in the number of variable
sites (not shown). Therefore, the recovered monophyly of
archaeobatrachians may be spurious, and additional
sequence information from more key lineages (e.g.,
representatives of Pelobatoidea, Ascaphidae, or Leiopelmatidae) needs to be gathered to properly address this question.

Table 2
Results of the phylogenetic analyses of the single four-taxon data sets

                         CXCR4     RAG1      RAG2      Rhodopsin  All nuclear genes  mt proteins
Number of positions
  Total aligned          651       1509      816       294        3270               10,866
  Ambiguous/gapped       6         0         0         0          6                  210
  Invariant              431       1072      485       228        2216               6228
  Parsimony-informative  43        80        57        8          188                908
ML
  −ln L                  2014.676  4377.024  2787.458  739.552    9973.266           37,381.548
  BP                     91        94        59        90         98                 72
BI
  −ln L                  2014.390  4377.620  2785.300  739.800    9974.130           37,382.570
  BPP                    100       100       65        94         100                99
ME
  Tree score             0.676     0.507     0.780     0.365      0.678              1.339
  BP                     96        91        58        77         99                 79
MP
  Tree length            276       562       440       77         1355               6327
  CI                     0.931     0.911     0.925     0.961      0.923              0.907
  BP                     99        64        81        76         99                 78

For each, number of total aligned positions, ambiguous/gapped positions, invariant positions, parsimony-informative positions, ML and BI log likelihoods, ME
tree score, MP tree length, CI, and support for the ((Alytes, Discoglossus), (Bombina, Xenopus)) grouping (BPs for ML, ME, and MP; and BPPs for BI) are
presented.

Table 3
Log likelihoods and p values of Approximately unbiased (AU), Shimodaira-Hasegawa (SH), and Kishino-Hasegawa (KH) tests for each of the
three unrooted topologies of the combined four-taxon data set

Alternative topologies  −ln L       AU     SH     KH
((A, D), (B, X))        47,581.186  0.984  0.991  0.980
((A, X), (B, D))        47,599.262  0.027  0.020  0.020
((A, B), (D, X))        47,599.360  0.025  0.019  0.019

A, Alytes; B, Bombina; D, Discoglossus; X, Xenopus.

We used the four-taxon data sets to further investigate the
relationships among discoglossids. All phylogenetic analyses based on four-taxon data sets (nucleotide sequences of
each separate nuclear gene, a concatenated nucleotide
sequence containing all four nuclear genes, and a concatenated nucleotide sequence containing all mt protein-coding genes except ND6) arrived at the same well-supported tree topology: ((Alytes, Discoglossus), (Bombina,
Xenopus)). The results of all these analyses for each four-taxon data set are given in Table 2. Despite the apparently
low variability of the nuclear data sets with respect to the
mt, the statistical support for the Alytes+Discoglossus
grouping was very high in all cases, and only the RAG2
gene showed a moderately lower phylogenetic performance.
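The variability contrast noted here can be made concrete with the site counts in Table 2. The following Python sketch is an illustration, not part of the original analyses. It assumes that the invariant and parsimony-informative counts refer to the positions retained after ambiguous/gapped ones are excluded; the names TABLE2_SITES and site_fractions are hypothetical.

```python
# Site counts transcribed from Table 2:
# (total aligned, ambiguous/gapped, invariant, parsimony-informative)
TABLE2_SITES = {
    "CXCR4": (651, 6, 431, 43),
    "RAG1": (1509, 0, 1072, 80),
    "RAG2": (816, 0, 485, 57),
    "Rhodopsin": (294, 0, 228, 8),
    "All nuclear genes": (3270, 6, 2216, 188),
    "mt proteins": (10866, 210, 6228, 908),
}


def site_fractions(total, ambiguous, invariant, informative):
    """Return (variable, parsimony-informative) fractions of retained sites.

    Assumption: invariant and informative counts describe the sites that
    remain once ambiguous/gapped positions have been excluded.
    """
    retained = total - ambiguous
    variable = retained - invariant
    return variable / retained, informative / retained


fractions = {name: site_fractions(*counts) for name, counts in TABLE2_SITES.items()}

for name, (variable, informative) in fractions.items():
    print(f"{name:18s} variable={variable:.3f} informative={informative:.3f}")
```

Under that assumption, about 32% of the retained nuclear positions are variable, against about 42% of the mt positions.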
The combination of all four-taxon data sets into a joint
matrix produced an alignment of 13,920 positions. ML (-ln
likelihood=47,581.186), BI (-ln likelihood=47,400.700),
ME (score=1.072), and MP (one single tree of 7682 steps;
CI=0.910) arrived at the same tree topology, and all support
values for the Alytes+Discoglossus grouping were maximal
or nearly so (Fig. 4B). Results of AU, SH, and KH tests of
alternative tree topologies of this latter combined data set
are summarised in Table 3. All tests rejected the two
suboptimal trees at P<0.05. The strong evidence of the four-taxon data sets in favor of an Alytes+Discoglossus grouping
further supports the results achieved based on phylogenetic
analyses of mt amino acids (Fig. 4A) (see above).
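The figures in this paragraph can be cross-checked against Tables 2 and 3 with a few lines of Python. This is a minimal sketch, not the authors' procedure. It assumes that ambiguous/gapped positions were dropped before the data sets were concatenated, and it uses the 0.05 threshold stated in the text; the variable and function names are hypothetical.

```python
# Values transcribed from Table 2 (positions) and Table 3 (topology tests).
TOTAL_ALIGNED = {"All nuclear genes": 3270, "mt proteins": 10866}
AMBIGUOUS = {"All nuclear genes": 6, "mt proteins": 210}

# Assumption: ambiguous/gapped positions are excluded before concatenation.
combined_length = sum(TOTAL_ALIGNED[k] - AMBIGUOUS[k] for k in TOTAL_ALIGNED)

# Negative log likelihoods and p values for the three unrooted
# four-taxon topologies (A, Alytes; B, Bombina; D, Discoglossus; X, Xenopus).
TABLE3 = {
    "((A, D), (B, X))": {"neg_lnL": 47581.186, "AU": 0.984, "SH": 0.991, "KH": 0.980},
    "((A, X), (B, D))": {"neg_lnL": 47599.262, "AU": 0.027, "SH": 0.020, "KH": 0.020},
    "((A, B), (D, X))": {"neg_lnL": 47599.360, "AU": 0.025, "SH": 0.019, "KH": 0.019},
}
ALPHA = 0.05

# The best topology is the one with the smallest negative log likelihood.
best = min(TABLE3, key=lambda t: TABLE3[t]["neg_lnL"])


def summarize(topology):
    """Return (difference in -ln L from the best tree, rejected by all tests?)."""
    row = TABLE3[topology]
    delta = round(row["neg_lnL"] - TABLE3[best]["neg_lnL"], 3)
    rejected = all(row[test] < ALPHA for test in ("AU", "SH", "KH"))
    return delta, rejected


summary = {t: summarize(t) for t in TABLE3}

print("combined alignment length:", combined_length)
for topology, (delta, rejected) in summary.items():
    print(f"{topology}: delta={delta:.3f} rejected={rejected}")
```

Under this reading, 3264 + 10,656 = 13,920, which matches the reported alignment length, and both alternative topologies lie roughly 18 log-likelihood units from the best tree.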
Estimated substitution rates (ML tree branch length) of
the nuclear genes were lower than those of the
combined mt protein-coding gene data set (Fig. 5), which is
consistent with many previous studies (e.g., Brown et al.,
1982). This condition makes all four nuclear genes
potentially useful molecular markers for the study of deep
amphibian divergences. With the noteworthy exception of
CXCR4, all four-taxon data sets exhibited a very short
internal branch leading to rather long external (tip) branches
(Fig. 5). Short branch lengths connecting internal nodes may
reflect rapid radiation events at the origin of these lineages,
but this needs to be tested specifically.
Although the monophyly of discoglossids and the sister
group relationship of Alytes and Discoglossus are confidently resolved in this study, the lack of sequence
information for other lineages of frogs does not allow us
to draw clear conclusions about overall anuran relationships or about the archaeobatrachian monophyly debate.
Additional taxa need to be targeted in future molecular
phylogenetic studies that use complete mt genomes and
nuclear genes to further understand the origin and early
evolution of Anura. From a taxonomic perspective, the
strongly supported monophyly of the discoglossids suggests
that the family name Discoglossidae should be used again to
include all four genera Discoglossus, Alytes, Bombina, and
Barbourula.
Fig. 5. Estimated substitution rates (measured as ML tree branch length) of each single four-taxon data set (nucleotide sequences of each separate nuclear gene,
a concatenated nucleotide sequence containing all four nuclear genes, and a concatenated nucleotide sequence containing all mt protein-coding genes except
ND6). In every column, substitution rates of specific branches are identified by the number on the tree. S.E., standard error.
Acknowledgements
We are grateful to Lukas Rüber for providing helpful
technical advice with laboratory work, to Íñigo Martínez-Solano for helping during sample collection, and to the
“Consejería de Medio Ambiente” of Madrid and Castilla y
León (Spain) for providing the appropriate collecting
permits. Two anonymous reviewers gave insightful comments on an earlier version of the manuscript. D.S.M. was
sponsored by a predoctoral fellowship of the Ministerio de
Ciencia y Tecnología of Spain. This work received financial
support from a project of the Ministerio de Ciencia y
Tecnología of Spain to R.Z. (CGL2004-00401).
References
Arntzen, J.W., García-París, M., 1995. Morphological and allozyme studies
of midwife toads (genus Alytes), including the description of two new
taxa from Spain. Contrib. Zool. (Bijdr. Dierkd.) 65, 5 – 34.
Biju, S.D., Bossuyt, F., 2003. New frog family from India reveals an
ancient biogeographical link with the Seychelles. Nature 425, 711 – 714.
Brown, W.M., Prager, E.M., Wang, A., Wilson, A.C., 1982. Mitochondrial
DNA sequences of primates: tempo and mode of evolution. J. Mol.
Evol. 18, 225 – 239.
Duellman, W.E., 1975. On the classification of frogs. Occas. Pap. Mus. Nat.
Hist., Univ. Kansas 42, 1 – 14.
Duellman, W.E., Trueb, L., 1994. Biology of Amphibians. Johns Hopkins
University Press, Baltimore, MD.
Erspamer, V., Erspamer, F., Inselvini, M., Negri, L., 1972. Occurrence of
bombesin and alytesin in extracts of the skin of three European
discoglossid frogs and pharmacological actions of bombesin on
extravascular smooth muscle. Br. J. Pharmacol. 45, 333 – 348.
Ford, L.S., Cannatella, D.C., 1993. The major clades of frogs. Herpetol.
Monogr. 7, 94 – 117.
Fromhage, L., Vences, M., Veith, M., 2004. Testing alternative vicariance
scenarios in Western Mediterranean discoglossid frogs. Mol.
Phylogenet. Evol. 31, 308 – 322.
García-París, M., Buchholz, D.R., Parra-Olea, G., 2003. Phylogenetic
relationships of Pelobatoidea re-examined using mtDNA. Mol.
Phylogenet. Evol. 28, 12 – 23.
Goldman, N., Anderson, J.P., Rodrigo, A.G., 2000. Likelihood-based tests
of topologies in phylogenetics. Syst. Biol. 49, 652 – 670.
Griffiths, I.G., 1963. The phylogeny of the Salientia. Biol. Rev. 38,
241 – 292.
Groth, J.G., Barrowclough, G.F., 1999. Basal divergences in birds and the
phylogenetic utility of the nuclear RAG-1 gene. Mol. Phylogenet. Evol.
12, 115 – 123.
Haas, A., 2003. Phylogeny of frogs as inferred from primarily larval
characters (Amphibia: Anura). Cladistics 19, 23 – 89.
Hay, J.M., Ruvinsky, I., Hedges, S.B., Maxson, L.R., 1995. Phylogenetic
relationships of amphibian families inferred from DNA sequences of
mitochondrial 12S and 16S ribosomal RNA genes. Mol. Biol. Evol. 12,
928 – 937.
Hedges, S.B., Maxson, L.R., 1993. A molecular perspective on lissamphibian phylogeny. Herpetol. Monogr. 7, 27 – 42.
Hertwig, S., de Sá, R.O., Haas, A., 2004. Phylogenetic signal and the utility
of 12S and 16S mtDNA in frog phylogeny. J. Zoolog. Syst. Evol. Res.
42, 2 – 18.
Hillis, D.M., Ammerman, L.K., Dixon, M.T., de Sá, R.O., 1993. Ribosomal
DNA and the phylogeny of frogs. Herpetol. Monogr. 7, 118 – 131.
Hoegg, S., Vences, M., Brinkmann, H., Meyer, A., 2004. Phylogeny and
comparative substitution rates of frogs inferred from sequences of three
nuclear genes. Mol. Biol. Evol. 21, 1188 – 1200.
Jameson, D., Gibson, A.P., Hudelot, C., Higgs, P.G., 2003. OGRe: a
relational database for comparative analyses of mitochondrial genomes.
Nucleic Acids Res. 31, 202 – 206.
Lanza, B., Cei, J.M., Crespo, E., 1975. Immunological evidence for the
specific status of Discoglossus pictus Otth, 1837 and D. sardus
Tschudi, 1837, with notes on the families Discoglossidae Günther, 1858
and Bombinidae Fitzinger, 1826 (Amphibia: Salientia). Monit. Zool.
Ital. (N.S.) 9, 153 – 162.
Laurent, R., 1979. Esquisse d’une phylogenèse des anoures. Bull. Soc.
Zool. Fr. 104, 397 – 422.
Martínez-Solano, I., 2004. Phylogeography of Iberian Discoglossus
(Lissamphibia: Anura: Discoglossidae). J. Zoolog. Syst. Evol. Res.
(in press).
Maxson, L.R., Szymura, J.M., 1984. Relationships among discoglossid
frogs: an albumin perspective. Amphib.-Reptil. 5, 245 – 252.
Odierna, G., Andreone, F., Aprea, G., Arribas, O., Capriglione, T., Vences,
M., 2000. Cytological and molecular analysis in the rare discoglossid
species, Alytes muletensis (Sanchiz and Adrover 1977) and its bearing
on archaeobatrachian phylogeny. Chromosom. Res. 8, 435 – 442.
Pugener, L.A., Maglia, A.M., Trueb, L., 2003. Revisiting the contribution
of larval characters to an analysis of phylogenetic relationships of basal
anurans. Zool. J. Linn. Soc. 139, 129 – 155.
Sanchiz, B., 1998. Encyclopedia of Palaeoherpetology, Part IV. Salientia.
Friedrich Pfeil, München.
San Mauro, D., Gower, D.J., Oommen, O.V., Wilkinson, M., Zardoya, R.,
2004. Phylogeny of caecilian amphibians (Gymnophiona) based on
complete mitochondrial genomes and nuclear RAG1. Mol. Phylogenet.
Evol. 33, 413 – 427.
Swofford, D.L., Olsen, G.J., Waddell, P.J., Hillis, D.M., 1996. Phylogenetic inference. In: Hillis, D.M., Moritz, C., Mable, B.K. (Eds.),
Molecular Systematics. Sinauer Associates, Sunderland, MA, USA,
pp. 407 – 514.
Yang, Z., Nielsen, R., Hasegawa, M., 1998. Models of amino acid
substitution and applications to mitochondrial protein evolution.
Mol. Biol. Evol. 15, 1600 – 1611.
Zardoya, R., Meyer, A., 1996. Phylogenetic performance of mitochondrial
protein-coding genes in resolving relationships among vertebrates.
Mol. Biol. Evol. 13, 933 – 942.