PDF - Oxford Academic

Sy st. Biol. 47(1):9± 17, 199 8
Is It B etter to A dd Taxa or C h aracte rs to a D if® cult
Phylo genetic Problem ?
A N NA G RA YB EA L1
D epartment of Z oology, Un iversity of Texas, Austin, Texas 78712, U SA
A bstract.Ð The effects on phylogene tic accu racy of add ing ch aracters and /or taxa were explored
u sing data generated by com puter simulation. T he cond itions of this stud y were cons trained but
allowed for system atic investigation of certain param eters. The starting p oin t for the study was a
fou r-taxon tree in the ``Felsen stein zone,’’ representin g a dif® cu lt phylogene tic problem w ith an
extreme situation of long branch attraction. Taxa were added sequentially to this tree in a m an ner
sp eci® cally designed to break u p the long branches, and for each tree data m atrices of d ifferen t
sizes were simu lated. Phylog enetic trees were recons tructed from these data u sing the criteria of
parsimony and m aximu m likeliho od. P hy logenetic accu racy was m easured in three ways: (1)
proportion of trees that are com pletely correct, (2) proportion of correctly recon structed branches
in all trees, and (3) proportion of trees in w h ich the original fou r-taxon statement is correctly
recon structed. Accu racy im proved d ram atically w ith the add ition of taxa and m uch m ore slow ly
w ith the add ition of characters. If taxa can b e added to break up long branches, it is m uch m ore
preferable to add taxa than ch aracters. [L ong branch attraction; parsimony; phy logenetic recon struction; simulation; taxon sam pling.]
It is obvious that the successfu l reconstruction of phylo gen etic relationsh ip s require s som e am ount of data sam pling
from releva nt taxa and in form ative characters. Fa r less clear, however, is how much
of each data typ e is requ ired , and w he ther
one of tho se sources of data has a greater
imp act on accuracy than the other. G iven
lim ited time and resou rces, it is im portant
to explore the costs and bene ® ts to phy logenetic accuracy of addin g taxa versu s increasin g the num ber of characters. For exam ple, given that one has su f® cient tim e
and resou rces to sequence 10 kilob ases
(kb) of DNA , would it be better to sequence, say, 2.5 kb from each of 4 taxa, or
1 kb from each of 10, or 0.25 kb from each
of 40?
It is know n that for some trees certain
phylogenetic methods fail even w ith in® nite
data. This problem , called statistical inconsistency, was ® rst dem onstrated for parsim ony and com p atibility by Fels enstein
(1978) for a four-taxon tree w ith unequ al
branch lengths (or rates of evolution). Subsequent workers found inconsistent branches
even in ultrametric trees (with unequ al
branch lengths) and in trees w ith ® ve taxa
(Hendy and Penny, 1989; DeBry, 1992; Zharkikh and Li, 1993). Steel (1989), Hend y and
Charleston (1993), and Kim (1996) have also
found inconsistent branches in large trees.
Hen dy and Penny (1989) suggested that
consistency problem s could be allev iated
by adding taxa to break up long branches,
or, in other words, sh ortening the average
branch leng th. Sup port for this idea com es
from severa l p aleontological studies (G authier et al., 1988; D onoghue et al., 1989;
Huelsen beck, 1991), all of w hich ind icate
that taxa that break up long bra nche s w ithin the ing rou p (because they exhibit rela tively prim itive character states) have the
stron gest imp act on phy logenetic conclu sions. It is of intere st to explore sy stem atically how m any taxa must be added (and
w here in the tree and w ith w hat bra nch
leng ths) for inconsistency or long -branch
attraction problem s to disapp ea r. Kim
(1996) conducted sim ulations to show that
the addition of taxa do es not necessarily
allev iate inconsistency, but for the m ost
p art his added taxa did not break up long
branches because he kep t the avera ge
branch len gth consta nt. It rem ains to be
seen how m any taxa are nee ded to elim inate inconsisten t branches if taxa are add-
1
Pre sen t add ress: D epartm ent of Zo ology, Field
Mu seum of Natural H istory, R oo sevent R oad and
L ake Shore D rive, C h icago, Illinois 60605, U SA . Em ail: grayb eal@ fm p pr.fm n h.org.
9
10
SYSTE M AT IC BIOLO GY
ed sp eci® cally to break up the long
branches and thus reduce average branch
len gth.
A nother question concerns the relationsh ip between adding taxa and adding
characters in situations w here inconsis tency is not a problem (perh aps because
enou gh taxa have been added to break up
any long branches). In such situ ations one
m igh t thin k that phylogenetic accuracy
w ill best be obtained by increasin g the
num ber of characters p er taxon, rather
than by adding even m ore taxa. This
seem s to be the com m only held view
am ong sy stem atists, am ong w hom there is
much attention focused on increasin g the
num ber of characters to resolve phylogenetic problem s (e.g., Lecointre et al., 1993,
1994; H illis et al., 1994; C um m ings et al.,
1995). There has not been extensive exploration of the effect on phylo gen etic accuracy of adding both characters and taxa,
and given that there m ay be a trade-off between these two data types it is of intere st
to explore this situation.
Simulation studies are a p owerfu l tool
for investigating the behavior of phylo genetic reconstru ction over a range of precisely m anipu lated cond itions (e.g., Nei,
1 9 9 1 ; Hu e lse n b e ck a n d H illis , 1 9 9 3 ;
Ku hner and Felsen stein, 1994). This p aper
explores the effects of adding both taxa
and characters on phy logene tic accuracy.
The basic prem ise is to start w ith a dif® cult phy lo genetic problem : data derived
from a four-taxon tree in the ``Felsenstein
zone’ ’ know n to be inconsisten t for p arsim ony and o ther phylogenetic reconstruction methods. To exam ine the effect of
adding taxa, the to tal am ount of sequence
data was held constant and taxa were adde d s u cc e ss ive ly ; t h u s in a ll r e su ltin g
graphs the nu m ber of characters p er taxon
decreases as the nu m ber of taxa increases
(e.g., if the total nu m ber of characters was
40,000, then for 4 taxa each taxon had
10,000 characters w hile for 30 taxa each
taxon had 1,333 characters). To exam ine
the effect of adding characters, this procedure was rep eated for a range of character
m atrix sizes.
VO L.
47
F IG U R E 1. T he fou r-taxon starting tree in the ``Felsen stein zone,’ ’ u pper left, w ith dem on stration of the
taxon add ition procedu re im plem ented in this study:
the single ® ve-taxon and six-taxon trees and the two
seven -taxon trees, one of w hich h as the seven th branch
added at the tip (low er left) and the o ther of w hich
h as it added at the base (low er right). Branch lengths
are draw n proportional to the expe cted num b ers of
ch ang e: T he starting fou r-taxon trees h as two long
branche s of length 0.5 and three short branches of
length 0.05. T he larger trees m aintain the three short
branche s at length 0.05, but the longer branches are
d ivided into successively shorter and shorter ones as
illustrated.
M A T E R IA L S
AND
M ET HO D S
Simulations were p erform ed usin g the
program Sim inator, w ritten in C by John
Huelsen beck and m odi® ed sligh tly by m e.
O ne hund red replicate data m atrices of a
given sequence len gth were sim ulated over
each tree, w ith a kap pa value (the ratio of
transition rate, alpha, to tra nsvers ion rate,
beta) of 2 and rate variation determ ined by
a gam m a sh ape p aram eter of 0.5. T rees
were reconstructed from the resu ltin g da ta® les using PAU P* 4.0 (various develo pm en tal version s) (Swofford , unpu bl.) usin g
the criterio n of parsim ony. The topolo gies
were tabulated from the tree ® les usin g the
program TreeC ounter, w ritten in C by m e.
Phy logene tic accuracy was scored w ith
three m easures: (1) the proportion of trees
for w hich the relationsh ip among an orig in al set of four taxa was reconstructed correctly, (2) the prop ortion of trees that were
com pletely corre ct, and (3) the prop ortion
of correctly reconstructed branches am ong
all tree s.
The starting p oint for these analyse s is
a four-taxon tree know n to have an inconsisten t in tern al branch (Fig. 1), based on
1998
GRA YBEA LÐ
EFFECT S O F TA XO N A ND C HA RAC TER SAM PL IN G
prev iou s ana lytical and sim ulation studies
(Felsenstein, 1978; Huelsen beck and H illis,
1993). This tree has two long branches
(length 0.5) and three short ones (length
0.05), w here branch len gth is de® ned as
the proportion of characters that are exp ected to change from one end of the
branch to the other. Taxa were added to
this tree accord in g to the follow ing ru les:
the longest branch(es) in the tree were
iden ti® ed, and a new taxon was joine d at
the m idp oint of that long est branch. The
leng th of the new branch was set such that
the total len gth from the in tern al node to
the tip was the sam e as for the original
long bra nch (see Fig. 1). A ll distinc t tree s
w ith between 4 and 15 taxa were used to
sim ulate data. A lthou gh one m igh t think
that by adding taxa in this m anner the
nu m ber of po ssible distinc t tree s would
steadily increase, instead it show s a wavelike increase and decrease as a function of
nu m ber of taxa. This is because for p articular nu m bers of taxa there is only one longest branch left in the tree, and thus there
is only one possible tree that is one taxon
larger. In contrast, in this sin gle one-taxonlarger tree there are multiple equ ally longest branches, and thus there are severa l
p ossible trees w ith yet one m ore taxon. In
order to get an idea of w hat ha pp ens for
large trees w ithout dram atically increasin g
com putation tim e, two relatively large
tree s were chosen for sim ulations: the sin gle 22-taxon and 30-taxon trees. Note that
thes e rules resu lt in the creation of ju st a
su bset of all possible tree s in the exam ined
size range.
D at a m at rice s o f ® ve siz e s Ð 1 0 ,0 0 0 ,
20,000, 40,000, 60,000, and 80,000 charactersÐ were created by sim ulation acro ss all
tree s. For one grou p of the trees, m atrices
of to tal size 1,000 characters were also created. For each tree, the total nu m ber of
characters was kept consta nt as the nu m ber of taxa was increased , w ith the characters distributed equally am ong the taxa
(for example, a to tal of 10,000 characters
m ea ns that each of 4 taxa have 2,500 characters, each of 10 taxa have 1,000 characters, etc.). The prop ortions of variable and
inform ative characters were determ ined by
11
the rate variation para m eter, the len gth of
the branche s, and the nu m ber of taxa. Both
prop ortion s increased w ith the nu m ber of
taxa. For exam ple, for 4 taxa approxim ately
40± 44% of the characters were variable and
about 3± 5% were inform ative, for 12 taxa
ap proxim ately 54± 64% of the characters
were variable and 41± 49% were inform ative, and for 30 taxa ap proxim ately 69± 75%
of the characters were variable and 59± 65%
were in form ative.
Trees were reconstructed accord ing to
the criterio n of parsim ony usin g equ ally
weigh ted characters and heu ristic sea rches
w ith TBR branch swap p ing. For a lim ited
grou p of data m atrices, trees were also reconstru cted using m aximum likelih ood.
The m odel of evolution used for the m aximum likelih ood analyses m atched p erfectly the mo del under w hich the data
were sim ulated: gam m a sh ap e param eter
of 0.5, base frequencie s equal, ka pp a of 2.
R E SU L T S
The genera l resu lt from all ana lyses was
that, for a given to tal data m atrix size, phylogen etic accuracy im proved as the nu m ber of taxa increased. This relationsh ip
contradicts the no tion that increasing the
nu m ber of taxa sh ould worsen rather than
improve accuracy, and seem s especially
su rprisin g given that as the nu m ber of taxa
is increasin g, the nu m ber of characters p er
taxon is decreasing . Neverth eless, there is
a lim it to this genera lizationÐ accuracy did
worsen w hen the num ber of characters per
taxon becam e quite low. In this study a
stron g decline in accuracy w ith increasin g
taxa was seen only in the sm alle st data m atrices (above ; 8 taxa for total m atrix size
of 1,000 characters; i.e., 125 characters per
taxon; Fig. 2).
The sp eci® c results based on parsim ony
are described in m ore detail next. A ® nal
section com pares the p arsim ony resu lts
w ith som e m aximum likeliho o d analyses.
Increasing the Number of Taxa and/or
C haracters
For all data m atrix sizes exam ined, w hen
the to tal nu m ber of characters is held contant, accuracy im proves w ith the addition
12
SYSTE M AT IC BIOLO GY
VO L.
47
F IG U R E 2. Tw o m easu res of phy logenetic accuracy: (1) the prop ortion of replicates resulting in trees that
accurately represent the relation ship am on g the original fou r taxa ( V ) and (2) the proportion of replicates
resu lting in com pletely correct trees ( m ), plotted as a function of num b er of taxa. Th e num b er of ch aracters for
e ach graph represen ts the total num b er across all included taxa (e.g ., for 10 taxa there are 100 characters per
taxon); thu s w ithin each graph the nu m ber of characters per taxon de creases as the num ber of taxa increases.
From top to b ottom : 1,000 ch aracters total, 10,000 ch aracters total, 40,000 ch aracters total, 80,000 ch aracters
total. For 40,000 and 80,000 characters total the two m easures are ex actly equivalent, and thus on ly one set of
sym b ols is plotted.
of taxa over all or at least som e p ortion of
the p aram eter space (Figs. 2, 3). Im provem en t was seen accord ing to all three m ea su res of accuracy: (1) the prop ortion of
tree s in w hich the relationsh ip am ong the
orig inal four taxa is corre ct (Fig. 2), (2) the
proportion of tree s that are com pletely correct (Fig. 2), and (3) the proportion of correct branche s am ong all trees (Fig. 3). The
strong est decline in accuracy is seen using
the second m easu re (proportion of com -
pletely correct tree s), w hich declines w ith
m ore than ; 10 taxa for the sm allest data
m atrix (1,000 total characters) and w ith
m ore tha n ; 20 taxa in the second sm allest
(10,000 total characters). There is also a
sligh t decline in accuracy according to the
® rst m easure, prop ortion of tree s w ith the
corre ct four-taxon relationsh ip, for the
sm aller data m atrices. The third m easu re,
proportion of corre ct branches among all
tree s, rem ains high because nea rly all of
1998
GRA YBEA LÐ
EFFECT S O F TA XO N A ND C HA RAC TER SAM PL IN G
13
F IG U R E 3. T he proportion of correct branches recon structed for three sizes of data m atrices (10,000, 40,000,
80,000 characters total across all taxa) as a function of num ber of taxa. T he nu m b er of characters per taxon
decr eases as the nu m b er of taxa increases.
the incorre ct trees contain only one or two
incorrect branches. Of course w ith a su f® ciently low nu m ber of characters p er taxon
all m easures of accuracy would be exp ected to decline.
These resu lts suggest that it is always
preferable to add taxa rather than characters. However, although it is true that accuracy im proves m ore dram atically w ith
addition of taxa, it also im proves w ith an
increase in the nu m ber of characters per
taxon (Fig. 4), as long as there is no inconsisten t bra nch in the tree. Inconsisten t
branches cause accuracy to decline w ith an
increase in num ber of characters.
W here Is It Most Helpful to Add New Taxa?
ative branch len gths. There was a strikingly w ide ra ng e in the accuracy of phy logene tic reconstruction observed for these
tree s. The genera l p attern was that the
tree s reconstructed least accurately were
those in w hich taxa were added closest to
the tip s of the long branche s, and tho se reconstru cted m ost accurately were those in
w hich taxa were added clo sest to the base
of those bra nche s (Fig. 5). This resu lt is
consistent w ith the analytical result of Kim
(1996), w ho found that an in consistent in terna l bra nch could be m ade consistent if
one taxon was added to each of the two
long branches in approxim ately the basal
third of those branches (the exact p osition
dep end s on the relative branch len gth s).
Because of the ru les followed for adding
taxa to the orig inal four-taxon tree, trees of
a given num ber of taxa differ in their rel-
Inconsistency
By com p arin g the results for data m atrices of differen t sizes, it is possible to ge t
14
SYSTE M AT IC BIOLO GY
VO L.
47
F IG U R E 4. T he prop ortion of replicates resulting in trees that accu rately represent the relation ship am on g
the original fou r taxa for data m atrices of total size 10,000 characters ( M ) and 80,000 characters ( m ) as a fu nction
of num b er of taxa, w here the num b er of ch aracters per taxon decr eases as the nu m b er of taxa increases. Thes e
two data m atrix sizes represen t the com plete ex tremes evalu ated in this study; results from m atrices of interm ediate sizes fall b etween the values plotted here.
som e in sight in to w hich trees have inconsisten t in tern al branche s (Fig. 4). It is clear
that the trees of four, ® ve, and six taxa
have an inconsistent intern al branch, because the la rg er data m atrices all fail to reconstruct the corre ct four-taxon relationsh ip 100% of the tim e. C ertain of the trees
of between 7 and 10 taxa seem to have an
inconsisten t intern al branch, because the
accuracy is worse w ith m ore data than
w ith less. For all of the tree s w ith m ore
than 10 taxa inconsistency no longer ap p ea rs to be a problem . However, because
inconsistency is a cond ition based on in® nite data, it is not po ssible to m ake de® nitive statem ents about this situ ation w ith
the relatively sm all data m atrices exam ined here.
M aximum Likelihood
Lim ited analyse s usin g m aximum likeliho o d show a differen t p attern tha n parsim ony for the trees exam ine d here. The
m ajor difference is that this m ethod as im plem en ted here do es not suffer from prob lem s w ith inconsisten t branches, and thus
it reconstructs the sm aller tree s in this
study corre ctly (Fig. 6; also H illis et al.,
1994). M aximum likeliho od also reconstructs the la rg er trees w ith higher accuracy than parsim ony (Fig. 6), althou gh it is
not reasonable to comp are these two m eth-
o ds in this case because the m odel of m aximum likeliho o d evolution used for phylogeny reconstru ction m atched the m o del
of evolution used to simulate the data
much m ore closely than did the parsim ony
reconstruction m odel. Future work m igh t
com pare variou s phy lo genetic reconstruction m ethods m ore fairly, and determ ine if
certa in m ethods are m ore or less su sceptible to taxon and character sam plin g.
D IS C U S SIO N
C an we conclude that we always should
add data in the form of taxa rather than
characters? Under the conditions of this
study, addition of both kind s of data im proves phylo genetic accuracy, but w hen
the total nu m ber of characters is held consta nt, accuracy is much higher if the characters are distributed acro ss a larger nu m ber of taxa. A lthou gh the relationsh ip
between adding taxa versu s adding characters was not exam in ed directly by Lecoin tre et al. (1993, 1994), the gen eral conclu sion here is com patible w ith their resu lt
that the im pact of sp ecies sa m pling was
higher tha n expected, and it is also consist en t w ith p a le o n t o lo g ic al s t ud ie s th at
found that taxa that brea k up long bra nches im prove accuracy (Gauthier et al., 1988;
D onogh ue et al., 1989; Huelsen beck, 1991).
The resu lts of these sim ulations su gg est
that there are only a few reasons that one
1998
GRA YBEA LÐ
EFFECT S O F TA XO N A ND C HA RAC TER SAM PL IN G
15
F IG U R E 5. T he proportion of replicates resu lting in trees in w hich the relationship am on g the original fou r
taxa is correct, plotted as a function of num b er of taxa for two data m atrix sizes (10,000 and 80,000 ch aracters
total, w here the num b er of characters per taxon decre ases as the nu m b er of taxa increases). T he m ost accurate
trees ( . ) are those w here n ew taxa h ad be en added at the b ase of the long branche s, and the least accu rate
trees ( n ) are those w here new taxa were added toward the tips.
would no t want to increase the num ber of
taxa in a phy logene tic study. The ® rst, and
m ost obvious, is that no such taxa exist to
sam ple. This m ay be because there have
not been su f® cient splitting even ts am ong
the taxa of interest to create ap propriate
taxonom ic units, or because such taxa have
been lost throu gh extinction, or even be-
cause splittin g even ts occurre d so rap idly
in a narrow w indow of tim e that no taxa
w ith suf® cien tly shor t branch len gth s exist.
The second is that strong sup p ort for the
phy logene tic relationship s in question exists and there is no intere st or need to
sp en d tim e and m oney obtainin g data
from other relatives. O f course an imp or-
F IG U R E 6. T he proportion of replicates resu lting in trees that accurately represent the relation ship am on g
the original fou r taxa ( V , v ) and the proportion of replicates resulting in com pletely correct trees ( M , m ),
plotted as a function of num b er of taxa. Trees were recons tructed from data m atrices of size 10,000 ch aracters,
and the num b er of ch aracters per taxon decreases as the num b er of taxa increases. O p en sym bols are results
u sing parsimony and closed sym bols are results using m aximum likelihoo d.
16
SYSTE M AT IC BIOLO GY
tant caveat in this case m ay be that the
strong sup p ort is for incorrect relationsh ips, in w hich case adding taxa m igh t be
critically im portant to im prove accuracy. It
therefore is im portant to explore the nature of su pp ort for p articular relationsh ip s
and to exam ine w he ther particular m isleading pheno m ena m igh t be affecting the
result. For exam ple, workers have determ ine d that long-bra nch attraction can m islead phy lo gene tic reconstru ction m etho ds,
as can base com po sition bias or o ther
form s of nonrandom converg ence in the
data (Felsen stein, 1978; Huelsen beck and
H illis, 1993; Lockhart et al., 1994; Swofford
et al., 1996). It is p ossible to explore the
p ossible im p act of such pheno men a using
m ethods such as the p aram etric boo tstrap,
as outline d by Huelsen beck et al. (1996).
A n in terestin g third possibility is that the
added taxa m ight m ake w hat was a consistent tree into an inconsistent tree. O ne
could im agine this happ ening , for example,
by the addition of successive outgroup taxa
w ith a short intern al branch between them .
C onsider ation of this possibility serves to
high light the very particular nature of the
conditions for adding taxa in this study:
Taxa were added speci® cally to break up
long branches. It would be entirely m isleading to claim that this study su pports a general conclusion that more taxa always im prove phy logenetic accuracy.
Conversely, there seem to be very few
reasons that it would be necessary to in crease the nu m ber of characters applie d to
a p articular phylo genetic question beyond
som e reasonable level (although that level
m ay be qu ite a bit higher than that seen in
m any studies today). O bviou sly there do es
exist som e thresho ld below w hich the
num ber of characters is insu f® cient. Accord in g to the cond itions of this study that
level is qu ite low, but it is likely to be high er w ith real data w here in genera l overall
rates of evolution are slower and the prop ortion of variable characters is lower. A
second reason m igh t be that the available
data are effectively useless because they
have not evolved at a rate ap propriate for
the question of in terest (e.g., the DNA sequen ces are saturated). This determ ination,
VO L.
47
however, m ay largely be a question of context, because w ith suf® cient taxa in the
right parts of the tree app arent hom oplasy
would disap pear. Nonethele ss, for certain
question s there is und oubtedly som e kind
of balance between gettin g ``bad’ ’ data
from m ore taxa in the hop es that these
taxa w ill solve the problem s w ith the data,
and lookin g for differen t data. More work
in this area is nee ded . In som e situ ations
it can be very dif® cult to obtain data from
new sources (e.g., differen t gen es), and it
could m ake sen se to work ® rst on includin g m ore taxa, and only then to m ove to
new sou rces of characters.
A n encoura ging conclusion from this
study is that the accurate reconstruction of
large tree s, such as the larg e-scale studies
of angiosp erm relationsh ip s (C hase et al.,
1993), m ay be easier than prev iou sly
thou ght, because such tree s are less likely
to have very long bra nche s. In fact, big
tree s m ay be far easier to reconstruct accurately than sm all ones, as dem onstrated
dram atically in recent sim ulations (H illis,
1996; Purvis and Q uicke, 1997). A lthou gh
this resu lt contradicts the m essage of
K im ’s (1996) pap er, it is no t inconsistent
w ith his resu lts. Severa l caveats reg ard ing
this conclusion are warra nted, m ost im p ortant of w hich is the recognition of the
very particu la r conditions of this study
w here taxa were added speci® cally to
brea k up long bra nche s. If taxa were in stead added random ly w ith resp ect to
branch leng th, as m igh t be m ore realistic,
the expectation is that phy lo genetic accuracy would also im prove but m ore slow ly.
Nonetheless, we are genera lly no t com pletely in the da rk about the iden tity of p otentially long branches, and it is certainly
conceivable that we can add taxa sele ctively to purposely break them up.
A C KN O W L ED GM EN TS
T his work was supp orted by a National Science
Foundation Pos tdoctoral Fellow ship in Bioscience s R e lated to the E nviron m ent. I than k D. M . H illis for help ful discu ssion about the des ign of the simulations, M .
B rauer for assistance im plemen ting various com puter
program s, J. P. Huelsen be ck for allow ing m e to use
Sim inator, and D. L. Swofford for acce ss to developm en tal version s of PAU P *. I than k m em bers of the
1998
GRA YBEA LÐ
EFFECT S O F TA XO N A ND C HA RAC TER SAM PL IN G
H illis/B ull lab and the Field Museum evolution ary biolo gy d iscussion group, D. M . H illis, D. C . C ann atella,
P. Wagner, and R. In ger for com m en ts on the m anu script.
R E FE R E N C E S
C H A S E , M . W., D. E. S O L T IS , R . G. O L M S T E A D , D. M O R G A N , D. H . L E S , B . D. M IS H L E R , M . R. D U V A L L , R . A .
P R IC E , H . G. H IL L S , Y.-L . Q U I , K . A . K R O N , J. H . R E T T IG , E . C O N T I , J. D. P A L M E R , J. R . M A N H A R T , K . J.
S Y T S M A , H . J. M IC H A E L S , W. J. K R E S S , K . G. K A R O L ,
W. D. C L A R K , M . H E D R E N , B . S. G A U T , R . K . J A N S E N ,
K .-J. K IM , C . F. W IM P E E , J. F. S M IT H , G. R . F U R N IE R ,
S. H . S T R A U S S , Q .-Y. X IA N G , G. M . P L U N K E T T , P. S.
S O L T IS , S. M . S W E N S E N , S. E. W IL L I A M S , P. A . G A D E K ,
C . J. Q U IN N , L. E . E G U IA R T E , E. G O L E N B E R G , G. H .
L E A R N , S. W. G R A H A M , S. C . H . B A R R E T T , S. D A Y A N A N D A N , A N D V. A . A L B E R T . 1993. P hy logen ies of
seed plants: A n analysis of nucleo tide sequences
fr om the plastid gene rbcL . A n n. Mo. B ot. G ard. 80:
528± 580.
C U M M IN G S , M . P., S. P. O T T O , A N D J. W A K E L E Y . 1995.
Sam pling prop erties of DNA sequence data in phylogene tic analysis. Mol. B iol. E vol. 12:814± 822.
D E B R Y , R . W. 1992. T he con sistency of several phy logeny-inference m etho ds u nder varying evolutionary
rates. Mol. B iol. Evol. 9:537± 551.
D O N O G H U E , M . J., J. A . D O Y L E , J. G A U T H IE R , A . G.
K L U G E , A N D T. R O W E . 198 9. T he im p ortanc e o f fo ssils in phy log eny rec on struction. A n nu. R ev. E c ol.
S yst. 20 :431± 460 .
F E L S E N S T E IN , J. 1978. Ca ses in w h ich parsimony or
com b atibility m ethod s w ill be p ositively m isleading. Syst. Z ool. 27:401± 410.
G A U T H IE R , J., A . G. K L U G E , A N D T. R O W E . 1988. A m n iote phy logeny and the im p ortance of fos sils. C lad istics 4:105± 210.
H E N D Y , M . D., A N D M . A . C H A R L E S T O N . 1993. Hadam ard con jugation: A versatile tool for m odellin g nucleotide sequence evolution. N . Z. J. Bo t. 31:231± 237.
H E N D Y , M . D., A N D D. P E N N Y . 1989. A fram ew ork for
the quan titative stud y of evolutionary trees. Syst.
Z ool. 38:297± 309.
H IL L IS , D. M . 1996. In fer ring com plex phylogen ies.
N ature 383:130± 131.
H IL L IS , D. M ., J. P. H U E L S E N B E C K , A N D D. L. S W O F F O R D .
1994. Hobg lobin of phylogene tics? Nature 369:363±
364.
H U E L S E N B E C K , J. P. 1991. W hen are fo ssils better than
17
extan t taxa in phy logenetic an alysis? Syst. Z ool. 40:
458± 469.
H U E L S E N B E C K , J. P., A N D D. M . H IL L IS . 1993. Success
of phy logenetic m e thod s in the fou r-taxon case.
Syst. B iol. 42:247± 264.
H U E L S E N B E C K , J. P., D. M . H IL L IS , A N D R. J O N E S . 1996.
Pa ram etric b oo tstrap ping in m olecular phy logenetics: Ap plications and perform ance. Pa ge s 19± 46 in
M olecular zoolo gy: Advance s, strategies, and protocols ( J. D. Ferraris and S. R . Palum bi, ed s.). W ileyL iss, New York.
K IM , J. 1996. General incon sistency cond itions for
m aximu m p arsimony : Effects of branch lengths and
increasing num b ers of taxa. Syst. Biol. 45:363± 374.
K U H N E R , M . K ., A N D J. F E L S E N S T E IN . 1994. A simu lation com parison of phy logeny algorith m s under
e qu al and unequal evolution ary rates. Mol. B iol.
E vol. 11:459± 468.
L E C O IN T R E , G., H . P H IL IP P E , H . L . V A N L E , A N D H . L E
G U Y A D E R . 1993. Spe cies sam pling has a m ajor im p act on phy logenetic inference. Mol. P hy logenet.
E vol. 2:205± 224.
L E C O IN T R E , G., H . P H IL IP P E , H . L . V A N L E , A N D H . L E
G U Y A D E R . 1994. How m any nucleo tides are requ ired to resolve a phy logenetic problem ? T he use
of a new statistical m ethod applicable to available
sequences. Mol. P hy logenet. E vol. 3:292± 309.
L O C K H A R T , P. J., M . A . S T E E L , M . D. H E N D Y , A N D D.
P E N N Y . 1994. R ecovering evolution ary trees u nder a
m ore realistic m odel of sequence evolution. Mol.
B iol. E vol. 11:605± 612.
N E I , M . 1991. R elative ef® cien cies of differ en t treem aking m ethod s for m olecular data. Pa ges 90± 128
in Phylog ene tic analysis of DNA sequences (M . M .
M iyam oto and J. C racraft, ed s.). O xford Un iv. Pres s,
O xford, E n gland.
P U R V IS , A ., A N D D. L . J. Q U IC K E . 1997. Bu ilding phylogen ies: A re the big easy? Trend s E col. E vol. 12:
49± 50.
S T E E L , M . A . 1989. D istributions on bicolored evolu tionary trees. P h.D. T hesis, M assey Un iv., Pa lm erston North, New Z ealand.
S W O F F O R D , D. L ., G. J. O L S E N , P. J. W A D D E L L , A N D D.
M . H IL L IS . 1996. P hy logenetic inference. Pag es 407±
514 in Molecular system atics, 2nd e dition (D. M .
H illis, C . Moritz, and B . K . M able, ed s.). Sinauer A ssociates, Sunder land, M assach usetts.
Z H A R K IK H , A ., A N D W.-H . L I . 1993. Incon sistency of
the m aximu m -p arsim ony m e thod : The case of ® ve
taxa w ith a m olecu lar clock. Syst. B iol. 42:113± 125.
Received 8 April 1997; accepted 10 Ju ne 1997
A ssociate E ditor: D. C ann atella