VOL. 50
SYSTEMATIC BIOLOGY
Syst. Biol. 50(3):454–462, 2001
Incorporation, Relative Homoplasy, and Effect of Gap Characters in
Sequence-Based Phylogenetic Analyses
MARK P. SIMMONS,1 HELGA OCHOTERENA,1 AND TIMOTHY G. CARR2
1 L.H. Bailey Hortorium, 462 Mann Library, Cornell University, Ithaca, New York 14853, USA;
E-mail: [email protected]
2 Department of Ecology and Evolutionary Biology, Corson Hall, Cornell University, Ithaca, New York 14853, USA
Phylogenetic analysis of nucleotide and
amino acid sequences requires the alignment of homologous sequences. The alignment procedure often requires the insertion
of gaps, putatively corresponding to insertion or deletion events, which can be coded as
phylogenetic characters. As a general class of
phylogenetic characters, gaps have variously
been suggested to be reliable (e.g., Lloyd
and Calder, 1991; Van Dijk et al., 1999) or
unreliable (e.g., Golenberg et al., 1993; Ford
et al., 1995). This difference in opinion,
coupled with the lack of a well-supported
method for the coding of gaps, has led to
a diversity of approaches by which gaps
have been treated in, or excluded from, tree
searches (González, 1996).
In an earlier paper we presented two methods, termed simple and complex indel coding, in which all gaps (excluding leading and
trailing gaps, which are generally artifacts)
can be coded from aligned sequence-based
matrices (Simmons and Ochoterena, 2000).
Simple indel coding, which is used in this
study, is implemented by coding all gaps
that have different 5’ or 3’ termini as separate presence/absence characters. Whenever
a gap is being coded and the region it spans
is completely included within the span of another gap, those sequences having the longer
gap (i.e., one that extends to or beyond both
the 5’ and 3’ termini of the gap being coded)
are scored as inapplicable for the gap character being coded.
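The coding rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in this study (gaps here were coded manually in WinClada and MacClade). The function names `internal_gaps` and `simple_indel_coding` are hypothetical, gaps are assumed to be written as `-`, and sequences whose leading or trailing gap overlaps a coded gap are simply scored 0, which is a simplification.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) alignment columns of one gap


def internal_gaps(seq: str) -> List[Span]:
    """Return spans of gap runs lying between the first and last residue.

    Leading and trailing gaps are skipped, as they are generally artifacts.
    """
    residues = [i for i, c in enumerate(seq) if c != "-"]
    if not residues:
        return []
    spans, start = [], None
    for i in range(residues[0], residues[-1] + 1):
        if seq[i] == "-":
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i - 1))
            start = None
    return spans


def simple_indel_coding(alignment: Dict[str, str]):
    """Code every gap with distinct 5' or 3' termini as one binary character."""
    gaps = {name: internal_gaps(seq) for name, seq in alignment.items()}
    characters = sorted({span for spans in gaps.values() for span in spans})
    coded = {}
    for name, spans in gaps.items():
        row = []
        for start, end in characters:
            if (start, end) in spans:
                row.append("1")  # this exact gap is present
            elif any(s <= start and e >= end for s, e in spans):
                row.append("-")  # a longer gap spans it: inapplicable
            else:
                row.append("0")  # gap absent
        coded[name] = "".join(row)
    return characters, coded


example = {
    "A": "ACGTACGTAC",
    "B": "AC--ACGTAC",
    "C": "AC----GTAC",
    "D": "ACGT--GTAC",
}
characters, coded = simple_indel_coding(example)
for name in sorted(coded):
    print(name, coded[name])
```

In this toy alignment, sequence C carries a gap spanning columns 2 to 5, which completely includes the shorter gaps of B and D, so C is scored as inapplicable for both of those characters.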
Some have suggested on theoretical and
empirical grounds that longer gaps are better
phylogenetic characters than shorter gaps.
Lloyd and Calder (1991) argued that multiresidue gaps are reliable phylogenetic characters because indels are unlikely to be repeated in the exact same position with the
same length and sequence (for insertions); indels of different lengths at the same position
are recognized as separate events. Similarly,
van Ham et al. (1994) suggested that, based
on the relative levels of homoplasy in the intergenic spacer between trnL and trnF, gaps
longer than two positions are reliable phylogenetic characters.
In this paper we assess the relative levels of
homoplasy of gap and base characters from
a selection of 38 published sequence-based
matrices. We determine the potential phylogenetic information included in gap characters and the extent to which inclusion of gap
characters alters the gene tree topology and
branch support values. We also test the assertion that longer gaps are better phylogenetic
characters than shorter gaps.
METHODS
Thirty-eight sequence-based data matrices
were selected for this study: 5 based on structural rDNA, 5 based on ITS, 6 based on introns, and 22 based on protein-coding exons
(Appendix 1). Matrices with many gaps were
preferentially selected over matrices with
few gaps so that many gaps could be coded.
Gaps were coded for all matrices by using
simple indel coding (as explained above, see
also Simmons and Ochoterena, 2000).
In all cases, the original sequence alignments, obtained from the authors or
downloaded from EMBL on 9 July 1999
at
http://bioinfo.weizmann.ac.il/pub/
databases/embl/align/, were used. All
aligned positions were included in the
analyses, even if certain regions were excluded by the original authors (as was done
by Kanai et al., 1997; Budin and Philippe,
1998; Burmester et al., 1998; Downie et al.,
1998; Tourancheau et al., 1998). Gaps in
DNA sequence matrices were manually
coded by using WinClada (Nixon, 1999).
Gaps in amino acid sequence matrices
were manually coded by using MacClade
(Maddison and Maddison, 1992). DNA
sequence matrices were analyzed with Nona
(Goloboff, 1993). Amino acid sequence
matrices were analyzed with PAUP* (Swofford, 1998). For all analyses equally weighted
parsimony was used. Tree searches were
performed by using 100 heuristic searches
with random order taxon entry and TBR
branch swapping. A maximum of 1,000 trees
was held.
To compare the potential amount of
phylogenetic information contained in the
base characters and the gap characters, the
maximum possible number of steps minus
the minimum possible number of steps for
each character was used as a measure of
the “amount of possible synapomorphy”
(Farris, 1989:418). This statistic, which was
calculated directly from the data matrices, was obtained by using the statistics
option in WinClada and manually calculated for the amino-acid–based matrices
by using MacClade. Note that the statistic
for potential phylogenetic information is
not particularly sensitive to missing data,
and when large amounts of missing data
are present, the statistic may be somewhat
misleading.
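As a rough illustration of this statistic, the sketch below computes the maximum minus the minimum possible number of steps for a single character. It assumes unordered (Fitch) characters, for which the minimum is the number of observed states minus one and the maximum is the number of scored terminals minus the count of the most frequent state. The function name and the treatment of `?` and `-` as missing entries are assumptions made for the example.

```python
from collections import Counter

MISSING = {"?", "-"}  # assumed symbols for missing or inapplicable entries


def possible_synapomorphy(column: str) -> int:
    """Maximum minus minimum possible steps for one unordered character."""
    states = [s for s in column if s not in MISSING]
    counts = Counter(states)
    if not counts:
        return 0
    min_steps = len(counts) - 1                      # one step per extra state
    max_steps = len(states) - max(counts.values())   # all minority entries change
    return max_steps - min_steps


print(possible_synapomorphy("AAACCG"))  # base character, three states
print(possible_synapomorphy("110000"))  # gap shared by two terminals
print(possible_synapomorphy("100000"))  # autapomorphic gap
```

Missing entries are dropped before counting, so a character with many missing cells is evaluated only on the terminals that are actually scored.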
The consistency index (Kluge and Farris,
1969) and the retention index (Farris, 1989)
were used to assess relative amounts of
homoplasy in gap and base characters. The
consistency indices presented include uninformative characters for two reasons. First,
uninformative characters that include
autapomorphies are not homoplasious.
Although they are not phylogenetically
informative, these characters are appropriately considered when measuring
homoplasy (Goloboff, 1991). Second, the
consistency index, when autapomorphic
characters are excluded, is not a fair measure
when comparing gaps coded as binary characters (as is done when using simple indel
coding) with multistate base characters (up
to four nucleotides or 20 amino acids). This
is because informative multistate characters
may include uninformative character states
that raise the consistency index. In contrast,
binary characters that have an uninformative character state are eliminated when the
consistency index is calculated only on the
basis of informative characters.
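The effect of including uninformative characters can be seen in a small sketch of the ensemble indices, using the standard definitions CI = M/S and RI = (G - S)/(G - M), where M, G, and S are the summed minimum, maximum, and observed steps. The step counts below are hypothetical; in practice the observed steps come from the parsimony program.

```python
from typing import Sequence


def ensemble_ci(min_steps: Sequence[int], obs_steps: Sequence[int]) -> float:
    """Ensemble consistency index: summed minimum steps / summed observed steps."""
    return sum(min_steps) / sum(obs_steps)


def ensemble_ri(min_steps: Sequence[int], max_steps: Sequence[int],
                obs_steps: Sequence[int]) -> float:
    """Ensemble retention index: (G - S) / (G - M)."""
    m, g, s = sum(min_steps), sum(max_steps), sum(obs_steps)
    return (g - s) / (g - m)


# Three hypothetical binary gap characters; the third is an autapomorphy
# (maximum steps equal minimum steps), so it cannot show homoplasy.
m = [1, 1, 1]
g = [3, 4, 1]
s = [1, 2, 1]

ci_all = ensemble_ci(m, s)                  # autapomorphy included
ci_informative = ensemble_ci(m[:2], s[:2])  # autapomorphy excluded
ri_all = ensemble_ri(m, g, s)
print(ci_all, ci_informative, ri_all)
```

Dropping the autapomorphic character lowers the consistency index here from 0.75 to about 0.67, whereas the retention index is unaffected because that character contributes equally to G, S, and M.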
In comparing the consistency and retention indices of gap and base characters, both
groups of characters were optimized onto
the most-parsimonious tree(s) found by using base characters only. These trees were
selected as a very conservative measure of
relative levels of homoplasy between gap
and base characters. The base characters
were mapped onto the most-parsimonious
tree(s) for these characters, whereas the gap
characters were mapped onto trees for which
they had no effect on the topology. Furthermore, when a range of consistency or
retention indices (or both) for gap characters was obtained for the most-parsimonious
trees found by using base characters only,
the lowest values were reported (i.e., the tree
with the worst fit of the gap characters was
used). Both of these factors represent biases
in favor of base characters when comparing
consistency and retention indices.
One problem complicating comparison of
homoplasy between gap and base characters is that the two types of characters
have different amounts of missing data.
All else being equal, the more missing
data in a character, the higher the consistency and retention indices are expected to
be, because missing data cannot conflict
with the groupings inferred by the character state entries that are present. Generally,
because of the manner in which overlapping gaps are coded in simple indel coding, more data were missing in gap characters than in the base characters. For our
purposes, we corrected for these differences
in amounts of missing data by multiplying the consistency and retention indices by
the percentage of real (not missing) data
for each group of characters being compared. The modified indices are termed the
“corrected consistency index” and the “corrected retention index.” Note that relative
to the “uncorrected” indices, the corrected
indices generally favor base characters because those generally contained less missing
data.
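The correction amounts to a single multiplication, sketched below. The helper names are hypothetical, and `?` and `-` are assumed to mark missing or inapplicable cells in the block of characters being compared.

```python
from typing import Iterable, Sequence


def fraction_real(rows: Sequence[str], missing: Iterable[str] = ("?", "-")) -> float:
    """Proportion of cells in a character block that hold real (not missing) data."""
    missing = set(missing)
    total = sum(len(row) for row in rows)
    real = sum(1 for row in rows for cell in row if cell not in missing)
    return real / total


def corrected_index(index: float, rows: Sequence[str]) -> float:
    """Consistency or retention index scaled by the proportion of real data."""
    return index * fraction_real(rows)


# Hypothetical gap-character block: 4 terminals x 3 gap characters,
# two cells scored as inapplicable.
gap_block = ["100", "-1-", "001", "000"]
print(fraction_real(gap_block))
print(corrected_index(0.9, gap_block))
```

With 10 of 12 cells scored, an uncorrected index of 0.9 drops to about 0.75, which shows how the correction penalizes the group of characters with more missing data.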
Strict consensus trees were used to determine the percentage of branches shared between
the most-parsimonious trees found by
using base characters only and
those found by using both base and gap characters. Comparing strict consensus trees is a severe measure of the similarity in tree topologies because a single rearrangement of one
taxon from one clade to a distantly related
clade results in a substantial decrease in the
number of clades in common between the
strict consensus trees. To determine the percentage of branches in common, the number
of branches resolved in the strict consensus
trees of both matrices in each comparison
was divided by the number occurring in
whichever strict consensus tree was least
resolved.
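A minimal sketch of this calculation follows, representing each resolved branch of a strict consensus tree as the set of terminals it groups. The function name and the two small example trees are hypothetical.

```python
from typing import FrozenSet, Iterable

Clade = FrozenSet[str]


def percent_branches_in_common(tree_a: Iterable[Clade],
                               tree_b: Iterable[Clade]) -> float:
    """Shared clades as a percentage of the clades in the less resolved tree."""
    a, b = set(tree_a), set(tree_b)
    fewest = min(len(a), len(b))
    if fewest == 0:
        raise ValueError("at least one consensus tree is completely unresolved")
    return 100.0 * len(a & b) / fewest


# Hypothetical strict consensus trees for terminals A-F.
base_only = {frozenset("AB"), frozenset("ABC"), frozenset("DE")}
base_and_gaps = {frozenset("AB"), frozenset("DE"), frozenset("CDE")}
shared = percent_branches_in_common(base_only, base_and_gaps)
print(round(shared, 1))
```

Moving terminal C from one clade to another removes one of three shared clades in this example, which illustrates why the measure is severe.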
Relative levels of branch support between
the trees constructed with base characters
and the trees constructed with base and
gap characters were compared in terms of
bootstrap support values (Felsenstein, 1985).
Bootstrap support values were determined
with 100 replicates with 10 TBR searches using random taxon addition per replicate. For
the nucleotide sequence matrices analyzed
with Nona, strict consensus bootstrap support values (described by Davis et al., 1998)
were mapped onto the strict consensus of the
most-parsimonious trees by using WinClada.
For the amino acid sequence matrices analyzed with PAUP*, frequency-within-replicates bootstrap support values were mapped
onto the 50% majority rule bootstrap tree (because PAUP* does not calculate strict consensus bootstrap support values). Average bootstrap support values were calculated on the
basis of the branches in common on the strict
consensus trees from both matrices.
Two comparisons were performed to test
the assertions made by Lloyd and Calder
(1991) and van Ham et al. (1994) that longer
gaps are better phylogenetic characters than
shorter gaps. To test the assertion made by
Lloyd and Calder (1991), corrected consistency and retention indices of single-position
gaps were compared with gaps longer than
one position. Because gaps in exons generally occur in multiples of three nucleotide
positions (one codon) and in the coding
frame, the shortest gap in exons is generally three nucleotide positions long. According to the criteria for the assertion made by
Lloyd and Calder (1991), these gaps are considered equivalent to gaps that are one nucleotide position long in non-exon regions.
To test the assertion made by van Ham et
al. (1994), we compared corrected consistency and retention indices of one- and two-position-long gaps with gaps longer than
two positions. According to the criterion
for the assertion made by van Ham et al.
(1994), gaps in exons (which were, in all exon
matrices, at least three nucleotide positions
long) were not considered. Therefore, this
test was limited to the structural rDNA, ITS,
and intron matrices. Consistency and retention indices were measured by mapping the
gap characters onto the most-parsimonious
trees found by using base characters
only.
Statistical Analyses
The comparisons outlined above are generally split-plot designs consisting of a random factor (matrix) nested within a fixed
factor (type of matrix: rDNA, ITS, intron,
exon) and on which measurements are paired
for another fixed factor (e.g., consistency index measured for base and gap characters;
Keppel, 1982). For each split-plot analysis,
only the statistically significant results or the
relevant statistically insignificant results are
reported. ANOVAs were used for analyses
comparing a single measurement among the
matrix types. Raw data or residuals were
checked for normality and homoscedasticity
by using one-sample Kolmogorov–Smirnov
tests and Fmax tests, respectively, and transformed if necessary. Although sample sizes
for some matrix types were small, in >95% of
the cases the data or the residuals (depending on the type of analysis) were not significantly heteroscedastic nor were their distributions significantly different from normal.
Thus, being able to include matrix type as a
factor prevents exons (which have a much
greater sample size) from unduly influencing the overall results. Differential effects of
matrix type show up as interactions in the
split-plot ANOVA and are reported whenever significant. All analyses were performed
with SYSTAT v. 6.0 for Windows, except the
Fmax tests, which were performed according to Sokal
and Rohlf (1981) and evaluated by using the
table in Rohlf and Sokal (1981). Any post hoc
tests were evaluated by using Bonferroni-corrected probabilities (Keppel, 1982).
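Two of the checks named above are simple enough to sketch directly. The code below is only an illustration under stated assumptions: it computes the Fmax ratio (largest group variance divided by smallest group variance) and a plain Bonferroni adjustment, using hypothetical function names and made-up input values. Critical values for Fmax still have to be looked up in a statistical table, and this is not the SYSTAT procedure used in the study.

```python
from statistics import variance
from typing import List, Sequence


def f_max(groups: Sequence[Sequence[float]]) -> float:
    """Ratio of the largest to the smallest sample variance among groups."""
    variances = [variance(group) for group in groups]
    return max(variances) / min(variances)


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each P value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


# Made-up measurements for two matrix types.
ratio = f_max([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
adjusted = bonferroni([0.01, 0.04, 0.5])
print(ratio, adjusted)
```

A ratio close to 1 indicates similar variances among groups; larger ratios are compared against the tabulated critical value for the given number of groups and degrees of freedom.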
Our primary purpose in using statistical
hypothesis testing is to facilitate interpretation of the results from this set of matrices. Our sample has a broad taxonomic
base and examines a range of loci and types
of loci, suggesting that its inferential scope
should be broad. However, because systematists studying different groups often focus on different loci, correlations between
locus type and taxonomic group can confound interpretation (e.g., all ITS sequences
analyzed are from plants). Therefore, broader
inferences about types of loci and the effects
of gaps on phylogenetic analyses should be
made with caution.
RESULTS AND DISCUSSION
Relative Homoplasy and Effect of Gap
Characters
In the 38 matrices, an average of 8% (from
1% to 22%) of the potential phylogenetic
information was contained in gap characters (Table 1). The percentage of potential
phylogenetic information contained in gap
characters varied significantly among the
types of matrices (ANOVA: F3,34 = 11.19,
P < 0.00005) because of the significantly
greater phylogenetic content contained in
gaps in ITS-based matrices (average 14.8%)
and intron-based matrices (average 14.5%)
compared with exon-based matrices (average 4.4%; Bonferroni correction for both comparisons: P < 0.0005). Gaps averaged 9.8%
of the total characters in all matrices, and the
percentage of total characters that are gaps
did not differ among the types of matrices
(ANOVA: F3,34 = 1.60, P > 0.2).
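The group averages quoted above can be reproduced from the per-matrix percentages in Table 1, grouped by the matrix types listed in Appendix 1. The sketch below only recomputes the means; it does not attempt the ANOVA, and the variable names are illustrative.

```python
from statistics import mean

# Percentage of potential phylogenetic information contained in gaps,
# transcribed from Table 1 (matrices 1-5, 6-10, 11-16, and 17-38).
pct_gap_info = {
    "rDNA": [14, 8, 6, 3, 17],
    "ITS": [6, 16, 12, 22, 18],
    "intron": [17, 22, 11, 15, 20, 2],
    "exon": [3, 2, 6, 6, 3, 2, 8, 12, 3, 5, 2,
             2, 2, 3, 2, 6, 4, 3, 1, 8, 9, 5],
}

group_means = {kind: round(mean(vals), 1) for kind, vals in pct_gap_info.items()}
all_values = [v for vals in pct_gap_info.values() for v in vals]
overall_mean = mean(all_values)

print(group_means)
print(round(overall_mean, 2), min(all_values), max(all_values))
```

The ITS, intron, and exon means come out at 14.8, 14.5, and 4.4, and the overall mean is close to 8 with a range of 1 to 22, matching the values reported in the text.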
Gap characters were found to have significantly less homoplasy than did base
characters. When mapped on the most-parsimonious trees found by using base
characters only, gap characters generally
had higher corrected ensemble consistency
indices than base characters. For 24 of 38
matrices (63%), gap characters had a higher
corrected consistency index than did base
characters, and on average the corrected
consistency index was slightly higher for
gap characters (0.62) than for base characters (0.58; Table 1; split-plot ANOVA: F1,34 =
5.69, P < 0.05). For 27 of 38 matrices (71%),
gap characters had a higher corrected retention index than did base characters, and
for 9 of 38 matrices (24%), gap characters
had a lower corrected retention index than
did base characters. Overall, the difference
between the corrected retention index for
gap and base characters was not significant
(split-plot ANOVA: F1,34 = 2.01, P > 0.15),
but the difference by “matrix type” interaction was (split-plot ANOVA: F3,34 = 7.68,
P < 0.0005): The corrected retention index
for gaps versus bases averaged 15.6% greater
in exon matrices (P < 0.000005).

TABLE 1. Relative homoplasy and effect of gap characters.

                    Corrected CI   Corrected RI   Topology change   Difference (c) in
Matrix  % gap inf.  Bases   Gaps   Bases   Gaps   IAR (b)  Not IAR  No. of trees  No. of clades
no.     cont. (a)
1       14          42      57     44      60     yes      yes      2             −2
2       8           58      71     50      61     yes      no       1             −1
3       6           50      48     36      37     yes      no       −1            2
4       3           40      61     49      36     yes      yes      8             0
5       17          54      56     63      60     yes      yes      29            −5
6       6           30      51     58      70     yes      yes      —             −5
7       16          60      70     63      68     yes      yes      −3            1
8       12          69      71     73      73     no       no       −2            0
9       22          50      48     34      22     yes      yes      0             0
10      18          62      72     69      76     yes      yes      −10           0
11      17          67      64     61      52     yes      yes      0             0
12      22          75      71     65      60     no       no       0             0
13      11          68      84     70      85     yes      yes      −1            0
14      15          82      72     77      42     no       no       0             0
15      20          56      57     64      65     yes      yes      16            −1
16      2           52      37     67      54     yes      yes      —             18
17      3           57      62     68      67     yes      yes      1             0
18      2           24      40     42      59     yes      yes      −16           −5
19      6           70      77     67      83     no       no       0             0
20      6           47      54     40      57     yes      no       −2            3
21      3           39      57     48      68     yes      no       −261          23
22      2           40      38     48      50     yes      yes      −5            5
23      8           63      61     65      83     yes      no       −50           7
24      12          49      54     45      65     yes      yes      1             −1
25      3           69      78     65      86     no       no       0             0
26      5           68      65     70      86     yes      no       1             1
27      2           59      54     55      65     yes      no       5             −6
28      2           72      80     54      82     no       no       0             0
29      2           64      51     64      64     yes      no       −3            0
30      3           61      49     53      56     no       no       0             0
31      2           67      79     63      90     no       no       0             0
32      6           65      56     58      76     yes      no       −2            4
33      4           70      52     55      51     yes      yes      0             0
34      3           70      86     63      90     yes      no       −3            2
35      1           64      77     50      87     yes      no       1             −1
36      8           71      76     68      81     no       no       0             0
37      9           46      54     55      77     yes      yes      6             −3
38      5           66      67     62      79     no       no       0             0

Difference (c) in bootstrap support (37 values in matrix order; not determined for one matrix):
2.7, 0.2, −4.3, −2.8, 6.8, 0.4, 1.1, 5.2, 4.6, 7.8, 6.4, 4.5, 0.6, −0.1, 1.6, 6.9, 0.2, 2.2, −0.5,
−0.1, 1.0, 4.2, 1.7, −0.4, −0.4, 0.7, 1.4, 1.0, 2.6, 0.6, −0.8, 0.6, 2.0, 2.0, 0.4, 1.0, 2.9

(a) % of potential phylogenetic information contained in gaps.
(b) IAR, inclusive of the amount of resolution.
(c) Positive values reflect increase with the addition of gaps.
CI, consistency index; RI, retention index.
Inclusion of gap characters usually
changed the strict consensus of the most-parsimonious trees. In 28 of 38 matrices
(74%), including gap characters resulted in a
change in the amount of resolution or topology of the strict consensus tree (Table 1).
In 17 of 38 matrices (45%), including gap
characters changed the topology of the strict
consensus tree, irrespective of the amount of
resolution (Table 1).
Inclusion of gap characters did not
necessarily decrease the number of most-parsimonious trees (split-plot ANOVA:
F1,32 = 0.02, P > 0.8) or increase the resolution of the strict consensus of the most-parsimonious trees (split-plot ANOVA:
F1,34 = 0.29, P > 0.5). Overall, the number
of most-parsimonious trees decreased in
36% (13) of the matrices and increased in
31% (11) of the matrices when gap characters
were included (Table 1). Likewise, in 26%
(10) of the matrices the resolution of the
strict consensus tree increased, and in 26%
(10) of the matrices the resolution of the
strict consensus tree decreased (Table 1).
No effect of matrix type on the change in
number of most-parsimonious trees (split-plot ANOVA: F3,32 = 0.12, P > 0.9) or on
the resolution of the strict consensus tree
(split-plot ANOVA: F3,34 = 0.27, P > 0.8)
was evident. For particular data sets (e.g.,
16, 21, and 23), however, including gap characters substantially decreased the number
of parsimonious trees found (by as many as
261) and increased the resolution of the strict
consensus tree (by as many as 23 clades).
Inclusion of gap characters generally resulted in increased branch support as measured by bootstrap support. In 29 of 37 matrices (78%; bootstrap support values were
not determined for one matrix because of
the many [130] terminals, which would have
resulted in prohibitively long tree-search
times) bootstrap support increased on the
branches in common between the strict consensus trees (Table 1). In only 8 of 37 matrices
(22%) did bootstrap support decrease on the
branches in common between the strict consensus trees. Overall, including gap characters produced a small (1.7%) but significant
increase in average branch support per tree
(split-plot ANOVA: F1,33 =
21.95, P < 0.00005). The average change in
bootstrap support was more pronounced
when the bootstrap support values increased
rather than when they decreased (+2.2% vs.
−1.2%).
Homoplasy Relative to Gap Length
Longer gaps were not found to have less
homoplasy than shorter gaps. In 20 of 38 matrices (54%), the corrected consistency index
was greater for single-position gaps than for
gaps longer than one position, and the average corrected consistency index for single-position gaps (0.62) differed insignificantly from that for gaps longer than one
position (0.624; Table 2; split-plot ANOVA:
F1,34 = 1.94, P > 0.15). Likewise, in 18 of
the 37 (49%) matrices with informative gaps
both one position long and more than one
position long, the corrected retention index
was greater for single-position gaps than
for gaps longer than one position (Table 2).
Overall, the average corrected retention index for single-position gaps (0.656) and that
for gaps longer than one position (0.657;
Table 2) were not significantly different (split-plot ANOVA: F1,33 = 0.18, P > 0.6). In exons, however, gaps longer than one position
(i.e., one codon) apparently are less homoplasious than single-position gaps. In 14 of
21 exon matrices (67%) the corrected consistency index was greater for gaps longer
than one position. In contrast, single-position
gaps appear to have less homoplasy in
non-exon regions than do gaps longer than
one position. In 10 of 16 non-exon matrices (62%), the corrected consistency index
was greater for single-position gaps. Thus,
there is a nearly significant matrix type × gap
length interaction when the corrected consistency index is analyzed (split-plot ANOVA:
F1,34 = 2.63, P < 0.07). No such interaction
was found for the corrected retention index.

TABLE 2. Homoplasy relative to gap length.

              No. of gaps    Corrected CI   Corrected RI   No. of gaps      Corrected CI     Corrected RI
Matrix  Exon  1 bp   >1 bp   1 bp   >1 bp   1 bp   >1 bp   1–2 bp   >2 bp   1–2 bp   >2 bp   1–2 bp   >2 bp
no.
1       no    88     58      57     59      60     61      115      31      56       62      57       69
2       no    75     83      80     63      68     57      109      49      74       64      63       58
3       no    17     39      61     44      29     37      26       30      55       44      28       41
4       no    44     13      65     52      33     43      53       4       63       44      36       39
5       no    191    121     61     50      63     54      243      69      59       46      64       49
6       no    38     44      44     58      67     74      57       25      45       68      68       69
7       no    58     62      75     67      74     65      71       49      74       66      71       64
8       no    54     37      75     66      75     72      76       15      71       70      73       75
9       no    64     104     51     47      24     22      100      68      46       52      25       18
10      no    45     65      75     72      77     76      65       45      75       70      77       73
11      no    23     30      81     55      74     43      26       27      71       58      59       47
12      no    25     47      70     73      60     60      35       37      71       74      57       63
13      no    26     43      83     84      84     85      30       39      82       84      81       86
14      no    46     52      72     75      50     30      55       43      75       72      50       32
15      no    34     130     51     58      36     68      53       111     57       57      57       67
16      no    20     116     48     36      60     53      32       104     45       36      54       54
17      yes   12     24      67     61      71     66
18      yes   47     76      28     33      65     56
19      yes   16     18      74     79      82     84
20      yes   37     95      62     51      66     53
21      yes   64     125     54     59      69     68
22      yes   7      46      27     40      29     52
23      yes   14     19      54     66      78     86
24      yes   39     73      45     58      62     67
25      yes   19     19      67     90      82     92
26      yes   6      3       67     62      91     77
27      yes   1      9       50     54      80     64
28      yes   7      12      70     87      72     88
29      yes   6      7       67     44      69     63
30      yes   12     23      42     53      44     62
31      yes   5      6       82     75      88     90
32      yes   14     5       56     57      71     80
33      yes   8      13      63     48      53     50
34      yes   13     9       78     99      84     99
35      yes   8      2       73     100     87     —
36      yes   11     17      68     80      77     83
37      yes   24     39      57     52      75     78
38      yes   16     19      69     64      85     72

CI, consistency index; RI, retention index.

For the non-exon matrices, gaps longer
than two positions were not less homoplasious than gaps one or two positions long. In
11 of 16 matrices (69%), the corrected consistency index was greater for the shorter
gaps, but the average corrected consistency
index was not significantly different for gaps
one or two positions long (mean 0.637) and
gaps longer than two positions (mean 0.604;
Table 2; split-plot ANOVA: F1,13 = 1.87, P >
0.15). In 8 of 16 matrices (50%), the corrected
retention index was greater for longer gaps,
but again there was no difference on average between the corrected retention indices
of the two types of gaps (mean 0.575 for 1-
to 2-bp gaps, 0.565 for >2-bp gaps) (Table 2;
split-plot ANOVA: F1,13 = 0.15, P > 0.7).
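The non-exon means quoted above can be recomputed from the Table 2 columns for matrices 1 to 16. The sketch below is only a descriptive check with illustrative variable names; it recalculates the four means and the number of matrices in which the longer gaps had the higher corrected retention index, and it does not reproduce the split-plot ANOVA.

```python
from statistics import mean

# Corrected indices (x100) for the 16 non-exon matrices, from Table 2.
ci_short = [56, 74, 55, 63, 59, 45, 74, 71, 46, 75, 71, 71, 82, 75, 57, 45]  # 1-2 bp
ci_long = [62, 64, 44, 44, 46, 68, 66, 70, 52, 70, 58, 74, 84, 72, 57, 36]   # >2 bp
ri_short = [57, 63, 28, 36, 64, 68, 71, 73, 25, 77, 59, 57, 81, 50, 57, 54]  # 1-2 bp
ri_long = [69, 58, 41, 39, 49, 69, 64, 75, 18, 73, 47, 63, 86, 32, 67, 54]   # >2 bp

means = {
    "CI 1-2 bp": round(mean(ci_short) / 100, 3),
    "CI >2 bp": round(mean(ci_long) / 100, 3),
    "RI 1-2 bp": round(mean(ri_short) / 100, 3),
    "RI >2 bp": round(mean(ri_long) / 100, 3),
}
ri_longer_wins = sum(long > short for short, long in zip(ri_short, ri_long))

print(means)
print(ri_longer_wins, "of", len(ri_short))
```

Because the table entries are rounded to two digits, counts derived from them in this way may differ slightly from counts based on the unrounded indices.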
Conclusions
Our results demonstrate the following: (1)
gap characters can represent a considerable
portion of the potential phylogenetic information in sequence-based matrices; (2) gap
characters have significantly less homoplasy
than do base characters, but the difference is
slight and sometimes depends on the type
of matrix; (3) including gap characters in
sequence-based matrices often changes the
topology or resolution of the strict consensus tree; and (4) including gap characters
in sequence-based matrices often increases
branch support values. Together, these
results support the inclusion of gap characters in phylogenetic analyses that include sequence data from structural rDNA, ITS of
rDNA, intron, or exon regions. These empirical results, in combination with the theoretical bases given for using gap characters and
rigorous methodologies with which to code
gap characters (Giribet and Wheeler, 1999;
Simmons and Ochoterena, 2000), strongly
support the use of gap characters in phylogenetic analyses.
In contrast to the assertions made by Lloyd
and Calder (1991) and van Ham et al. (1994),
longer gaps were not necessarily found to be
better phylogenetic characters than shorter
gaps (assuming that characters with less homoplasy are better phylogenetic characters).
This result challenges any attempt to a priori weight gap characters according to their
length.
ACKNOWLEDGMENTS
We thank Jerrold Davis, Jeff Doyle, Damon Little, and
the Doyle and Harrison Lab Groups for reviewing the
manuscript and for helpful discussions; Kevin Nixon for
helpful discussions; and David Hibbett, Richard Olmstead, and two anonymous reviewers for their constructive criticisms. We also thank Gilles Bena, Alessandra
Bonci, James Brown, Thorsten Burmester, Elie Dassa,
Stephen Downie, Tadashi Kajita, Satoru Kanai, David
Krakauer, Roberta Mason-Gamer, Lucinda McDade,
Hervé Philippe, Jean-Loup Risler, Douglas Soltis, Anne
Baroin Tourancheau, and Miranda von Dornum for
sending us the aligned sequences used in this study.
REFERENCES
ALLARD, M. W. 1994. An empirical example of parsimony behavior. Pages 231–248 in Models of phylogeny reconstruction (R. W. Scotland, D. Siebert, and D. M. Williams, eds.). Clarendon Press, Oxford.
ANDREASEN, K., B. G. BALDWIN, AND B. BREMER. 1999. Phylogenetic utility of the nuclear rDNA ITS region in subfamily Ixoroideae (Rubiaceae): Comparisons with cpDNA rbcL sequence data. Plant Syst. Evol. 217:119–135.
BARRIEL, V. 1994. Molecular phylogenies and how to code insertion/deletion events. Life Sci. 317:693–701.
BENA, G., J.-M. PROSPER, B. LEJEUNE, AND I. OLIVIERI. 1998. Evolution of annual species of the genus Medicago: A molecular phylogenetic approach. Mol. Phylogenet. Evol. 9:552–559.
BLOMSTER, J., C. A. MAGGS, AND M. J. STANHOPE. 1999. Extensive intraspecific morphological variation in Enteromorpha muscoides (Chlorophyta) revealed by molecular analysis. J. Phycol. 35:575–586.
BONCI, A., A. CHIESURIN, P. MUSCAS, AND G. M. ROSSOLINI. 1997. Relatedness and phylogeny within the family of periplasmic chaperones involved in the assembly of pili or capsule-like structures of Gram-negative bacteria. J. Mol. Evol. 44:299–309.
BROWN, J. R., F. T. ROBB, R. WEISS, AND W. F. DOOLITTLE. 1997. Evidence for the early divergence of tryptophanyl- and tyrosyl-tRNA synthetases. J. Mol. Evol. 45:9–16.
BUDIN, K., AND H. PHILIPPE. 1998. New insights into the phylogeny of eukaryotes based on ciliate Hsp70 sequences. Mol. Biol. Evol. 15:943–956.
BURMESTER, T., H. C. MASSEY, JR., S. O. ZAKHARKIN, AND H. BENES. 1998. The evolution of hexamerins and the phylogeny of insects. J. Mol. Evol. 47:93–108.
DAVIS, J. I., M. P. SIMMONS, D. W. STEVENSON, AND J. F. WENDEL. 1998. Data decisiveness, data quality, and incongruence in phylogenetic analysis: An example from the monocotyledons using mitochondrial atpA sequences. Syst. Biol. 47:282–310.
DIAZ-LAZCOZ, Y., J.-C. AUDE, P. NITSCHKÉ, H. CHIAPELLO, C. LANDÈS-DEVAUCHELLE, AND J.-L. RISLER. 1998. Evolution of genes, evolution of species: The case of aminoacyl-tRNA synthetases. Mol. Biol. Evol. 15:1548–1561.
DOWNIE, S. R., S. RAMANATH, D. S. KATZ-DOWNIE, AND E. LLANAS. 1998. Molecular systematics of Apiaceae subfamily Apioideae: Phylogenetic analyses of nuclear ribosomal DNA internal transcribed spacer and plastid rpoC1 intron sequences. Am. J. Bot. 85:563–591.
FARRIS, J. S. 1989. The retention index and the rescaled consistency index. Cladistics 5:417–419.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791.
FORD, V. S., B. R. THOMAS, AND L. D. GOTTLIEB. 1995. The same duplication accounts for the PgiC genes in Clarkia xantiana and C. lewisii (Onagraceae). Syst. Bot. 20:147–160.
GIRIBET, G., AND W. C. WHEELER. 1999. On gaps. Mol. Phylogenet. Evol. 13:132–143.
GOLENBERG, E. M., M. T. CLEGG, M. L. DURBIN, J. DOEBLEY, AND D. P. MA. 1993. Evolution of a noncoding region of the chloroplast genome. Mol. Phylogenet. Evol. 2:52–64.
GOLOBOFF, P. A. 1991. Homoplasy and the choice among cladograms. Cladistics 7:215–232.
GOLOBOFF, P. A. 1993. Nona, version 1.6 (computer software and manual). Distributed by the author. Tucumán, Argentina.
GONZÁLEZ, D. 1996. Codificación de las inserciones-deleciones en el análisis filogenético de secuencias génicas. Bol. Soc. Bot. Mex. 59:115–129.
HASSANIN, A., AND E. J. P. DOUZERY. 1999. Evolutionary affinities of the enigmatic saola (Pseudoryx nghetinhensis) in the context of the molecular phylogeny of Bovidae. Proc. R. Soc. Lond. Biol. Sci. 266:893–900.
KAJITA, T., K. KAMIYA, H. TACHIDA, R. WICKNESWARI, Y. TSUMURA, H. YOSHIMARU, AND T. YAMAZAKI. 1998. Molecular phylogeny of Dipterocarpaceae in southeast Asia based on nucleotide sequences of matK, trnL intron, and trnL–trnF intergenic spacer region in chloroplast DNA. Mol. Phylogenet. Evol. 10:202–209.
KANAI, S., R. KIKUNO, H. TOH, H. RYO, AND T. TODO. 1997. Molecular evolution of the photolyase-blue-light photoreceptor family. J. Mol. Evol. 45:535–548.
KEPPEL, G. 1982. Design & analysis: A researcher’s handbook, 2nd edition. Prentice-Hall, Englewood Cliffs.
KLUGE, A. G., AND J. S. FARRIS. 1969. Quantitative phyletics and the evolution of Anurans. Syst. Zool. 18:1–32.
KRAKAUER, D. C., P. M. D. A. ZANOTTO, AND M. PAGEL. 1998. Prion’s progress: Patterns and rates of molecular evolution in relation to spongiform disease. J. Mol. Evol. 47:133–145.
LITTLEWOOD, D. T. J., A. B. SMITH, K. A. CLOUGH, AND R. H. EMSON. 1997. The interrelationships of the echinoderm classes: Morphological and molecular evidence. Biol. J. Linn. Soc. 61:409–438.
LLOYD, D. G., AND V. L. CALDER. 1991. Multi-residue gaps, a class of molecular characters with exceptional reliability for phylogenetic analyses. J. Evol. Biol. 4:9–21.
MADDISON, W. P., AND D. R. MADDISON. 1992. MacClade: Analysis of phylogeny and character evolution. Sinauer, Sunderland, Massachusetts.
MASON-GAMER, R. J., C. F. WEIL, AND E. A. KELLOGG. 1998. Granule-bound starch synthase: Structure, function, and phylogenetic utility. Mol. Biol. Evol. 15:1658–1673.
MCDADE, L. A., AND M. L. MOODY. 1999. Phylogenetic relationships among Acanthaceae: Evidence from noncoding trnL-trnF chloroplast DNA sequences. Am. J. Bot. 86:70–80.
NEDBAL, M. A., M. W. ALLARD, AND R. L. HONEYCUTT. 1994. Molecular systematics of hystricognath rodents: Evidence from the mitochondrial 12S rRNA gene. Mol. Phylogenet. Evol. 3:206–220.
NISHIDA, H., AND J. SUGIYAMA. 1993. Phylogenetic relationships among Taphrina, Saitoella, and other higher fungi. Mol. Biol. Evol. 10:431–436.
NIXON, K. C. 2000. WinClada, version 1.0 (computer software and manual). Distributed by the author. Cornell Univ., Ithaca, New York.
ROHLF, F. J., AND R. R. SOKAL. 1981. Statistical tables, 2nd edition. W. H. Freeman and Co., New York.
SAURIN, W., M. HOFNUNG, AND E. DASSA. 1999. Getting in or out: Early segregation between importers and exporters in the evolution of ATP-binding cassette (ABC) transporters. J. Mol. Evol. 48:22–41.
SIMMONS, M. P., AND H. OCHOTERENA. 2000. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. 49:369–381.
SOKAL, R. R., AND F. J. ROHLF. 1981. Biometry, 2nd edition. W. H. Freeman and Co., New York.
SOLTIS, D. E., L. A. JOHNSON, AND C. LOONEY. 1996. Discordance between ITS and chloroplast topologies in the Boykinia group (Saxifragaceae). Syst. Bot. 21:169–185.
SWOFFORD, D. L. 1998. PAUP*: Phylogenetic analysis using parsimony (*and other methods). Sinauer, Sunderland, Massachusetts.
TOURANCHEAU, A. B., E. VILLALOBO, N. TSAO, A. TORRES, AND R. E. PEARLMAN. 1998. Protein coding gene trees in ciliates: Comparison with rRNA-based phylogenies. Mol. Phylogenet. Evol. 10:299–309.
VAN DIJK, M. A. M., E. PARADIS, F. CATZEFLIS, AND W. W. DE JONG. 1999. The virtues of gaps: Xenarthran (Edentate) monophyly supported by a unique deletion in αA-crystallin. Syst. Biol. 48:94–106.
VAN HAM, R. C. H. J., H. HART, T. H. M. MES, AND J. M. SANDBRINK. 1994. Molecular evolution of noncoding regions of the chloroplast genome in the Crassulaceae and related species. Curr. Genet. 25:558–566.
VON DORNUM, M., AND M. RUVOLO. 1999. Phylogenetic relationships of the new world monkeys (Primates, Platyrrhini) based on nuclear G6PD DNA sequences. Mol. Phylogenet. Evol. 11:459–476.
Received 12 July 2000; accepted 18 August 2000
Associate Editor: R. Olmstead
APPENDIX 1. CITATION AND CHARACTERISTICS OF THE MATRICES

                                                                                                   No. of characters
Matrix  Citation                       Type      Locus                                       Base   Gap
no.
1       Nedbal et al. (1994)           rDNA      mitochondrial 12S rRNA                      814    146
2       Nishida and Sugiyama (1993)    rDNA      18S rRNA                                    2021   158
3       Allard (1994)                  rDNA      mitochondrial 12S rRNA                      1000   56
4       Hassanin and Douzery (1999)    rDNA      mitochondrial 12S rRNA                      976    57
5       Littlewood et al. (1997)       rDNA      18S rRNA                                    1957   312
6       Downie et al. (1998)           ITS       ITS                                         488    82
7       Soltis et al. (1996)           ITS       ITS                                         597    120
8       Bena et al. (1998)             ITS       ITS and ETS                                 1110   91
9       Andreasen et al. (1999)        ITS       ITS                                         696    168
10      Blomster et al. (1999)         ITS       ITS                                         655    110
11      Mason-Gamer et al. (1998)      intron    granule-bound starch synthase introns       1358   53
12      Mason-Gamer et al. (1998)      intron    granule-bound starch synthase introns       1421   72
13      von Dornum and Ruvolo (1999)   intron    G6PD introns                                1286   69
14      Kajita et al. (1998)           intron    trnL-trnF intron and spacer                 1948   98
15      McDade and Moody (1999)        intron    trnL-trnF intron and spacer                 1152   164
16      Downie et al. (1998)           intron    rpoC1 intron                                949    136
17      Krakauer et al. (1998)         exon-DNA  prion precursor protein                     896    36
18      Budin and Philippe (1998)      exon-AA   Hsp70                                       817    123
19      Tourancheau et al. (1998)      exon-AA   phosphoglycerate kinase                     492    34
20      Kanai et al. (1997)            exon-AA   photolyase-blue-light photoreceptor family  838    132
21      Burmester et al. (1998)        exon-AA   hexamerins and hemocyanins                  971    189
22      Saurin et al. (1999)           exon-AA   ABC transporters                            416    53
23      Brown et al. (1997)            exon-AA   tryptophanyl- and tyrosyl-tRNA synthetases  184    33
24      Bonci et al. (1997)            exon-AA   periplasmic chaperone-like proteins         295    112
25      Diaz-Lazcoz et al. (1998)      exon-AA   ArgRS                                       411    38
26      Diaz-Lazcoz et al. (1998)      exon-AA   AspRS                                       218    9
27      Diaz-Lazcoz et al. (1998)      exon-AA   GluRS                                       226    10
28      Diaz-Lazcoz et al. (1998)      exon-AA   GlyRS                                       490    19
29      Diaz-Lazcoz et al. (1998)      exon-AA   HisRS                                       201    13
30      Diaz-Lazcoz et al. (1998)      exon-AA   IleRS                                       461    35
31      Diaz-Lazcoz et al. (1998)      exon-AA   LeuRS                                       267    11
32      Diaz-Lazcoz et al. (1998)      exon-AA   MetRS                                       266    19
33      Diaz-Lazcoz et al. (1998)      exon-AA   PheRS                                       271    21
34      Diaz-Lazcoz et al. (1998)      exon-AA   ProRS                                       321    22
35      Diaz-Lazcoz et al. (1998)      exon-AA   ThrRS                                       343    10
36      Diaz-Lazcoz et al. (1998)      exon-AA   TrpRS                                       245    28
37      Diaz-Lazcoz et al. (1998)      exon-AA   TrpRS and TyrRS                             233    63
38      Diaz-Lazcoz et al. (1998)      exon-AA   TyrRS                                       313    35