Sy st. Biol. 47(1):9± 17, 199 8 Is It B etter to A dd Taxa or C h aracte rs to a D if® cult Phylo genetic Problem ? A N NA G RA YB EA L1 D epartment of Z oology, Un iversity of Texas, Austin, Texas 78712, U SA A bstract.Ð The effects on phylogene tic accu racy of add ing ch aracters and /or taxa were explored u sing data generated by com puter simulation. T he cond itions of this stud y were cons trained but allowed for system atic investigation of certain param eters. The starting p oin t for the study was a fou r-taxon tree in the ``Felsen stein zone,’’ representin g a dif® cu lt phylogene tic problem w ith an extreme situation of long branch attraction. Taxa were added sequentially to this tree in a m an ner sp eci® cally designed to break u p the long branches, and for each tree data m atrices of d ifferen t sizes were simu lated. Phylog enetic trees were recons tructed from these data u sing the criteria of parsimony and m aximu m likeliho od. P hy logenetic accu racy was m easured in three ways: (1) proportion of trees that are com pletely correct, (2) proportion of correctly recon structed branches in all trees, and (3) proportion of trees in w h ich the original fou r-taxon statement is correctly recon structed. Accu racy im proved d ram atically w ith the add ition of taxa and m uch m ore slow ly w ith the add ition of characters. If taxa can b e added to break up long branches, it is m uch m ore preferable to add taxa than ch aracters. [L ong branch attraction; parsimony; phy logenetic recon struction; simulation; taxon sam pling.] It is obvious that the successfu l reconstruction of phylo gen etic relationsh ip s require s som e am ount of data sam pling from releva nt taxa and in form ative characters. Fa r less clear, however, is how much of each data typ e is requ ired , and w he ther one of tho se sources of data has a greater imp act on accuracy than the other. G iven lim ited time and resou rces, it is im portant to explore the costs and bene ® ts to phy logenetic accuracy of addin g taxa versu s increasin g the num ber of characters. For exam ple, given that one has su f® cient tim e and resou rces to sequence 10 kilob ases (kb) of DNA , would it be better to sequence, say, 2.5 kb from each of 4 taxa, or 1 kb from each of 10, or 0.25 kb from each of 40? It is know n that for some trees certain phylogenetic methods fail even w ith in® nite data. This problem , called statistical inconsistency, was ® rst dem onstrated for parsim ony and com p atibility by Fels enstein (1978) for a four-taxon tree w ith unequ al branch lengths (or rates of evolution). Subsequent workers found inconsistent branches even in ultrametric trees (with unequ al branch lengths) and in trees w ith ® ve taxa (Hendy and Penny, 1989; DeBry, 1992; Zharkikh and Li, 1993). Steel (1989), Hend y and Charleston (1993), and Kim (1996) have also found inconsistent branches in large trees. Hen dy and Penny (1989) suggested that consistency problem s could be allev iated by adding taxa to break up long branches, or, in other words, sh ortening the average branch leng th. Sup port for this idea com es from severa l p aleontological studies (G authier et al., 1988; D onoghue et al., 1989; Huelsen beck, 1991), all of w hich ind icate that taxa that break up long bra nche s w ithin the ing rou p (because they exhibit rela tively prim itive character states) have the stron gest imp act on phy logenetic conclu sions. It is of intere st to explore sy stem atically how m any taxa must be added (and w here in the tree and w ith w hat bra nch leng ths) for inconsistency or long -branch attraction problem s to disapp ea r. Kim (1996) conducted sim ulations to show that the addition of taxa do es not necessarily allev iate inconsistency, but for the m ost p art his added taxa did not break up long branches because he kep t the avera ge branch len gth consta nt. It rem ains to be seen how m any taxa are nee ded to elim inate inconsisten t branches if taxa are add- 1 Pre sen t add ress: D epartm ent of Zo ology, Field Mu seum of Natural H istory, R oo sevent R oad and L ake Shore D rive, C h icago, Illinois 60605, U SA . Em ail: grayb eal@ fm p pr.fm n h.org. 9 10 SYSTE M AT IC BIOLO GY ed sp eci® cally to break up the long branches and thus reduce average branch len gth. A nother question concerns the relationsh ip between adding taxa and adding characters in situations w here inconsis tency is not a problem (perh aps because enou gh taxa have been added to break up any long branches). In such situ ations one m igh t thin k that phylogenetic accuracy w ill best be obtained by increasin g the num ber of characters p er taxon, rather than by adding even m ore taxa. This seem s to be the com m only held view am ong sy stem atists, am ong w hom there is much attention focused on increasin g the num ber of characters to resolve phylogenetic problem s (e.g., Lecointre et al., 1993, 1994; H illis et al., 1994; C um m ings et al., 1995). There has not been extensive exploration of the effect on phylo gen etic accuracy of adding both characters and taxa, and given that there m ay be a trade-off between these two data types it is of intere st to explore this situation. Simulation studies are a p owerfu l tool for investigating the behavior of phylo genetic reconstru ction over a range of precisely m anipu lated cond itions (e.g., Nei, 1 9 9 1 ; Hu e lse n b e ck a n d H illis , 1 9 9 3 ; Ku hner and Felsen stein, 1994). This p aper explores the effects of adding both taxa and characters on phy logene tic accuracy. The basic prem ise is to start w ith a dif® cult phy lo genetic problem : data derived from a four-taxon tree in the ``Felsenstein zone’ ’ know n to be inconsisten t for p arsim ony and o ther phylogenetic reconstruction methods. To exam ine the effect of adding taxa, the to tal am ount of sequence data was held constant and taxa were adde d s u cc e ss ive ly ; t h u s in a ll r e su ltin g graphs the nu m ber of characters p er taxon decreases as the nu m ber of taxa increases (e.g., if the total nu m ber of characters was 40,000, then for 4 taxa each taxon had 10,000 characters w hile for 30 taxa each taxon had 1,333 characters). To exam ine the effect of adding characters, this procedure was rep eated for a range of character m atrix sizes. VO L. 47 F IG U R E 1. T he fou r-taxon starting tree in the ``Felsen stein zone,’ ’ u pper left, w ith dem on stration of the taxon add ition procedu re im plem ented in this study: the single ® ve-taxon and six-taxon trees and the two seven -taxon trees, one of w hich h as the seven th branch added at the tip (low er left) and the o ther of w hich h as it added at the base (low er right). Branch lengths are draw n proportional to the expe cted num b ers of ch ang e: T he starting fou r-taxon trees h as two long branche s of length 0.5 and three short branches of length 0.05. T he larger trees m aintain the three short branche s at length 0.05, but the longer branches are d ivided into successively shorter and shorter ones as illustrated. M A T E R IA L S AND M ET HO D S Simulations were p erform ed usin g the program Sim inator, w ritten in C by John Huelsen beck and m odi® ed sligh tly by m e. O ne hund red replicate data m atrices of a given sequence len gth were sim ulated over each tree, w ith a kap pa value (the ratio of transition rate, alpha, to tra nsvers ion rate, beta) of 2 and rate variation determ ined by a gam m a sh ape p aram eter of 0.5. T rees were reconstructed from the resu ltin g da ta® les using PAU P* 4.0 (various develo pm en tal version s) (Swofford , unpu bl.) usin g the criterio n of parsim ony. The topolo gies were tabulated from the tree ® les usin g the program TreeC ounter, w ritten in C by m e. Phy logene tic accuracy was scored w ith three m easures: (1) the proportion of trees for w hich the relationsh ip among an orig in al set of four taxa was reconstructed correctly, (2) the prop ortion of trees that were com pletely corre ct, and (3) the prop ortion of correctly reconstructed branches am ong all tree s. The starting p oint for these analyse s is a four-taxon tree know n to have an inconsisten t in tern al branch (Fig. 1), based on 1998 GRA YBEA LÐ EFFECT S O F TA XO N A ND C HA RAC TER SAM PL IN G prev iou s ana lytical and sim ulation studies (Felsenstein, 1978; Huelsen beck and H illis, 1993). This tree has two long branches (length 0.5) and three short ones (length 0.05), w here branch len gth is de® ned as the proportion of characters that are exp ected to change from one end of the branch to the other. Taxa were added to this tree accord in g to the follow ing ru les: the longest branch(es) in the tree were iden ti® ed, and a new taxon was joine d at the m idp oint of that long est branch. The leng th of the new branch was set such that the total len gth from the in tern al node to the tip was the sam e as for the original long bra nch (see Fig. 1). A ll distinc t tree s w ith between 4 and 15 taxa were used to sim ulate data. A lthou gh one m igh t think that by adding taxa in this m anner the nu m ber of po ssible distinc t tree s would steadily increase, instead it show s a wavelike increase and decrease as a function of nu m ber of taxa. This is because for p articular nu m bers of taxa there is only one longest branch left in the tree, and thus there is only one possible tree that is one taxon larger. In contrast, in this sin gle one-taxonlarger tree there are multiple equ ally longest branches, and thus there are severa l p ossible trees w ith yet one m ore taxon. In order to get an idea of w hat ha pp ens for large trees w ithout dram atically increasin g com putation tim e, two relatively large tree s were chosen for sim ulations: the sin gle 22-taxon and 30-taxon trees. Note that thes e rules resu lt in the creation of ju st a su bset of all possible tree s in the exam ined size range. D at a m at rice s o f ® ve siz e s Ð 1 0 ,0 0 0 , 20,000, 40,000, 60,000, and 80,000 charactersÐ were created by sim ulation acro ss all tree s. For one grou p of the trees, m atrices of to tal size 1,000 characters were also created. For each tree, the total nu m ber of characters was kept consta nt as the nu m ber of taxa was increased , w ith the characters distributed equally am ong the taxa (for example, a to tal of 10,000 characters m ea ns that each of 4 taxa have 2,500 characters, each of 10 taxa have 1,000 characters, etc.). The prop ortions of variable and inform ative characters were determ ined by 11 the rate variation para m eter, the len gth of the branche s, and the nu m ber of taxa. Both prop ortion s increased w ith the nu m ber of taxa. For exam ple, for 4 taxa approxim ately 40± 44% of the characters were variable and about 3± 5% were inform ative, for 12 taxa ap proxim ately 54± 64% of the characters were variable and 41± 49% were inform ative, and for 30 taxa ap proxim ately 69± 75% of the characters were variable and 59± 65% were in form ative. Trees were reconstructed accord ing to the criterio n of parsim ony usin g equ ally weigh ted characters and heu ristic sea rches w ith TBR branch swap p ing. For a lim ited grou p of data m atrices, trees were also reconstru cted using m aximum likelih ood. The m odel of evolution used for the m aximum likelih ood analyses m atched p erfectly the mo del under w hich the data were sim ulated: gam m a sh ap e param eter of 0.5, base frequencie s equal, ka pp a of 2. R E SU L T S The genera l resu lt from all ana lyses was that, for a given to tal data m atrix size, phylogen etic accuracy im proved as the nu m ber of taxa increased. This relationsh ip contradicts the no tion that increasing the nu m ber of taxa sh ould worsen rather than improve accuracy, and seem s especially su rprisin g given that as the nu m ber of taxa is increasin g, the nu m ber of characters p er taxon is decreasing . Neverth eless, there is a lim it to this genera lizationÐ accuracy did worsen w hen the num ber of characters per taxon becam e quite low. In this study a stron g decline in accuracy w ith increasin g taxa was seen only in the sm alle st data m atrices (above ; 8 taxa for total m atrix size of 1,000 characters; i.e., 125 characters per taxon; Fig. 2). The sp eci® c results based on parsim ony are described in m ore detail next. A ® nal section com pares the p arsim ony resu lts w ith som e m aximum likeliho o d analyses. Increasing the Number of Taxa and/or C haracters For all data m atrix sizes exam ined, w hen the to tal nu m ber of characters is held contant, accuracy im proves w ith the addition 12 SYSTE M AT IC BIOLO GY VO L. 47 F IG U R E 2. Tw o m easu res of phy logenetic accuracy: (1) the prop ortion of replicates resulting in trees that accurately represent the relation ship am on g the original fou r taxa ( V ) and (2) the proportion of replicates resu lting in com pletely correct trees ( m ), plotted as a function of num b er of taxa. Th e num b er of ch aracters for e ach graph represen ts the total num b er across all included taxa (e.g ., for 10 taxa there are 100 characters per taxon); thu s w ithin each graph the nu m ber of characters per taxon de creases as the num ber of taxa increases. From top to b ottom : 1,000 ch aracters total, 10,000 ch aracters total, 40,000 ch aracters total, 80,000 ch aracters total. For 40,000 and 80,000 characters total the two m easures are ex actly equivalent, and thus on ly one set of sym b ols is plotted. of taxa over all or at least som e p ortion of the p aram eter space (Figs. 2, 3). Im provem en t was seen accord ing to all three m ea su res of accuracy: (1) the prop ortion of tree s in w hich the relationsh ip am ong the orig inal four taxa is corre ct (Fig. 2), (2) the proportion of tree s that are com pletely correct (Fig. 2), and (3) the proportion of correct branche s am ong all trees (Fig. 3). The strong est decline in accuracy is seen using the second m easu re (proportion of com - pletely correct tree s), w hich declines w ith m ore than ; 10 taxa for the sm allest data m atrix (1,000 total characters) and w ith m ore tha n ; 20 taxa in the second sm allest (10,000 total characters). There is also a sligh t decline in accuracy according to the ® rst m easure, prop ortion of tree s w ith the corre ct four-taxon relationsh ip, for the sm aller data m atrices. The third m easu re, proportion of corre ct branches among all tree s, rem ains high because nea rly all of 1998 GRA YBEA LÐ EFFECT S O F TA XO N A ND C HA RAC TER SAM PL IN G 13 F IG U R E 3. T he proportion of correct branches recon structed for three sizes of data m atrices (10,000, 40,000, 80,000 characters total across all taxa) as a function of num ber of taxa. T he nu m b er of characters per taxon decr eases as the nu m b er of taxa increases. the incorre ct trees contain only one or two incorrect branches. Of course w ith a su f® ciently low nu m ber of characters p er taxon all m easures of accuracy would be exp ected to decline. These resu lts suggest that it is always preferable to add taxa rather than characters. However, although it is true that accuracy im proves m ore dram atically w ith addition of taxa, it also im proves w ith an increase in the nu m ber of characters per taxon (Fig. 4), as long as there is no inconsisten t bra nch in the tree. Inconsisten t branches cause accuracy to decline w ith an increase in num ber of characters. W here Is It Most Helpful to Add New Taxa? ative branch len gths. There was a strikingly w ide ra ng e in the accuracy of phy logene tic reconstruction observed for these tree s. The genera l p attern was that the tree s reconstructed least accurately were those in w hich taxa were added closest to the tip s of the long branche s, and tho se reconstru cted m ost accurately were those in w hich taxa were added clo sest to the base of those bra nche s (Fig. 5). This resu lt is consistent w ith the analytical result of Kim (1996), w ho found that an in consistent in terna l bra nch could be m ade consistent if one taxon was added to each of the two long branches in approxim ately the basal third of those branches (the exact p osition dep end s on the relative branch len gth s). Because of the ru les followed for adding taxa to the orig inal four-taxon tree, trees of a given num ber of taxa differ in their rel- Inconsistency By com p arin g the results for data m atrices of differen t sizes, it is possible to ge t 14 SYSTE M AT IC BIOLO GY VO L. 47 F IG U R E 4. T he prop ortion of replicates resulting in trees that accu rately represent the relation ship am on g the original fou r taxa for data m atrices of total size 10,000 characters ( M ) and 80,000 characters ( m ) as a fu nction of num b er of taxa, w here the num b er of ch aracters per taxon decr eases as the nu m b er of taxa increases. Thes e two data m atrix sizes represen t the com plete ex tremes evalu ated in this study; results from m atrices of interm ediate sizes fall b etween the values plotted here. som e in sight in to w hich trees have inconsisten t in tern al branche s (Fig. 4). It is clear that the trees of four, ® ve, and six taxa have an inconsistent intern al branch, because the la rg er data m atrices all fail to reconstruct the corre ct four-taxon relationsh ip 100% of the tim e. C ertain of the trees of between 7 and 10 taxa seem to have an inconsisten t intern al branch, because the accuracy is worse w ith m ore data than w ith less. For all of the tree s w ith m ore than 10 taxa inconsistency no longer ap p ea rs to be a problem . However, because inconsistency is a cond ition based on in® nite data, it is not po ssible to m ake de® nitive statem ents about this situ ation w ith the relatively sm all data m atrices exam ined here. M aximum Likelihood Lim ited analyse s usin g m aximum likeliho o d show a differen t p attern tha n parsim ony for the trees exam ine d here. The m ajor difference is that this m ethod as im plem en ted here do es not suffer from prob lem s w ith inconsisten t branches, and thus it reconstructs the sm aller tree s in this study corre ctly (Fig. 6; also H illis et al., 1994). M aximum likeliho od also reconstructs the la rg er trees w ith higher accuracy than parsim ony (Fig. 6), althou gh it is not reasonable to comp are these two m eth- o ds in this case because the m odel of m aximum likeliho o d evolution used for phylogeny reconstru ction m atched the m o del of evolution used to simulate the data much m ore closely than did the parsim ony reconstruction m odel. Future work m igh t com pare variou s phy lo genetic reconstruction m ethods m ore fairly, and determ ine if certa in m ethods are m ore or less su sceptible to taxon and character sam plin g. D IS C U S SIO N C an we conclude that we always should add data in the form of taxa rather than characters? Under the conditions of this study, addition of both kind s of data im proves phylo genetic accuracy, but w hen the total nu m ber of characters is held consta nt, accuracy is much higher if the characters are distributed acro ss a larger nu m ber of taxa. A lthou gh the relationsh ip between adding taxa versu s adding characters was not exam in ed directly by Lecoin tre et al. (1993, 1994), the gen eral conclu sion here is com patible w ith their resu lt that the im pact of sp ecies sa m pling was higher tha n expected, and it is also consist en t w ith p a le o n t o lo g ic al s t ud ie s th at found that taxa that brea k up long bra nches im prove accuracy (Gauthier et al., 1988; D onogh ue et al., 1989; Huelsen beck, 1991). The resu lts of these sim ulations su gg est that there are only a few reasons that one 1998 GRA YBEA LÐ EFFECT S O F TA XO N A ND C HA RAC TER SAM PL IN G 15 F IG U R E 5. T he proportion of replicates resu lting in trees in w hich the relationship am on g the original fou r taxa is correct, plotted as a function of num b er of taxa for two data m atrix sizes (10,000 and 80,000 ch aracters total, w here the num b er of characters per taxon decre ases as the nu m b er of taxa increases). T he m ost accurate trees ( . ) are those w here n ew taxa h ad be en added at the b ase of the long branche s, and the least accu rate trees ( n ) are those w here new taxa were added toward the tips. would no t want to increase the num ber of taxa in a phy logene tic study. The ® rst, and m ost obvious, is that no such taxa exist to sam ple. This m ay be because there have not been su f® cient splitting even ts am ong the taxa of interest to create ap propriate taxonom ic units, or because such taxa have been lost throu gh extinction, or even be- cause splittin g even ts occurre d so rap idly in a narrow w indow of tim e that no taxa w ith suf® cien tly shor t branch len gth s exist. The second is that strong sup p ort for the phy logene tic relationship s in question exists and there is no intere st or need to sp en d tim e and m oney obtainin g data from other relatives. O f course an imp or- F IG U R E 6. T he proportion of replicates resu lting in trees that accurately represent the relation ship am on g the original fou r taxa ( V , v ) and the proportion of replicates resulting in com pletely correct trees ( M , m ), plotted as a function of num b er of taxa. Trees were recons tructed from data m atrices of size 10,000 ch aracters, and the num b er of ch aracters per taxon decreases as the num b er of taxa increases. O p en sym bols are results u sing parsimony and closed sym bols are results using m aximum likelihoo d. 16 SYSTE M AT IC BIOLO GY tant caveat in this case m ay be that the strong sup p ort is for incorrect relationsh ips, in w hich case adding taxa m igh t be critically im portant to im prove accuracy. It therefore is im portant to explore the nature of su pp ort for p articular relationsh ip s and to exam ine w he ther particular m isleading pheno m ena m igh t be affecting the result. For exam ple, workers have determ ine d that long-bra nch attraction can m islead phy lo gene tic reconstru ction m etho ds, as can base com po sition bias or o ther form s of nonrandom converg ence in the data (Felsen stein, 1978; Huelsen beck and H illis, 1993; Lockhart et al., 1994; Swofford et al., 1996). It is p ossible to explore the p ossible im p act of such pheno men a using m ethods such as the p aram etric boo tstrap, as outline d by Huelsen beck et al. (1996). A n in terestin g third possibility is that the added taxa m ight m ake w hat was a consistent tree into an inconsistent tree. O ne could im agine this happ ening , for example, by the addition of successive outgroup taxa w ith a short intern al branch between them . C onsider ation of this possibility serves to high light the very particular nature of the conditions for adding taxa in this study: Taxa were added speci® cally to break up long branches. It would be entirely m isleading to claim that this study su pports a general conclusion that more taxa always im prove phy logenetic accuracy. Conversely, there seem to be very few reasons that it would be necessary to in crease the nu m ber of characters applie d to a p articular phylo genetic question beyond som e reasonable level (although that level m ay be qu ite a bit higher than that seen in m any studies today). O bviou sly there do es exist som e thresho ld below w hich the num ber of characters is insu f® cient. Accord in g to the cond itions of this study that level is qu ite low, but it is likely to be high er w ith real data w here in genera l overall rates of evolution are slower and the prop ortion of variable characters is lower. A second reason m igh t be that the available data are effectively useless because they have not evolved at a rate ap propriate for the question of in terest (e.g., the DNA sequen ces are saturated). This determ ination, VO L. 47 however, m ay largely be a question of context, because w ith suf® cient taxa in the right parts of the tree app arent hom oplasy would disap pear. Nonethele ss, for certain question s there is und oubtedly som e kind of balance between gettin g ``bad’ ’ data from m ore taxa in the hop es that these taxa w ill solve the problem s w ith the data, and lookin g for differen t data. More work in this area is nee ded . In som e situ ations it can be very dif® cult to obtain data from new sources (e.g., differen t gen es), and it could m ake sen se to work ® rst on includin g m ore taxa, and only then to m ove to new sou rces of characters. A n encoura ging conclusion from this study is that the accurate reconstruction of large tree s, such as the larg e-scale studies of angiosp erm relationsh ip s (C hase et al., 1993), m ay be easier than prev iou sly thou ght, because such tree s are less likely to have very long bra nche s. In fact, big tree s m ay be far easier to reconstruct accurately than sm all ones, as dem onstrated dram atically in recent sim ulations (H illis, 1996; Purvis and Q uicke, 1997). A lthou gh this resu lt contradicts the m essage of K im ’s (1996) pap er, it is no t inconsistent w ith his resu lts. Severa l caveats reg ard ing this conclusion are warra nted, m ost im p ortant of w hich is the recognition of the very particu la r conditions of this study w here taxa were added speci® cally to brea k up long bra nche s. If taxa were in stead added random ly w ith resp ect to branch leng th, as m igh t be m ore realistic, the expectation is that phy lo genetic accuracy would also im prove but m ore slow ly. Nonetheless, we are genera lly no t com pletely in the da rk about the iden tity of p otentially long branches, and it is certainly conceivable that we can add taxa sele ctively to purposely break them up. A C KN O W L ED GM EN TS T his work was supp orted by a National Science Foundation Pos tdoctoral Fellow ship in Bioscience s R e lated to the E nviron m ent. I than k D. M . H illis for help ful discu ssion about the des ign of the simulations, M . B rauer for assistance im plemen ting various com puter program s, J. P. Huelsen be ck for allow ing m e to use Sim inator, and D. L. Swofford for acce ss to developm en tal version s of PAU P *. I than k m em bers of the 1998 GRA YBEA LÐ EFFECT S O F TA XO N A ND C HA RAC TER SAM PL IN G H illis/B ull lab and the Field Museum evolution ary biolo gy d iscussion group, D. M . H illis, D. C . C ann atella, P. Wagner, and R. In ger for com m en ts on the m anu script. R E FE R E N C E S C H A S E , M . W., D. E. S O L T IS , R . G. O L M S T E A D , D. M O R G A N , D. H . L E S , B . D. M IS H L E R , M . R. D U V A L L , R . A . P R IC E , H . G. H IL L S , Y.-L . Q U I , K . A . K R O N , J. H . R E T T IG , E . C O N T I , J. D. P A L M E R , J. R . M A N H A R T , K . J. S Y T S M A , H . J. M IC H A E L S , W. J. K R E S S , K . G. K A R O L , W. D. C L A R K , M . H E D R E N , B . S. G A U T , R . K . J A N S E N , K .-J. K IM , C . F. W IM P E E , J. F. S M IT H , G. R . F U R N IE R , S. H . S T R A U S S , Q .-Y. X IA N G , G. M . P L U N K E T T , P. S. S O L T IS , S. M . S W E N S E N , S. E. W IL L I A M S , P. A . G A D E K , C . J. Q U IN N , L. E . E G U IA R T E , E. G O L E N B E R G , G. H . L E A R N , S. W. G R A H A M , S. C . H . B A R R E T T , S. D A Y A N A N D A N , A N D V. A . A L B E R T . 1993. P hy logen ies of seed plants: A n analysis of nucleo tide sequences fr om the plastid gene rbcL . A n n. Mo. B ot. G ard. 80: 528± 580. C U M M IN G S , M . P., S. P. O T T O , A N D J. W A K E L E Y . 1995. Sam pling prop erties of DNA sequence data in phylogene tic analysis. Mol. B iol. E vol. 12:814± 822. D E B R Y , R . W. 1992. T he con sistency of several phy logeny-inference m etho ds u nder varying evolutionary rates. Mol. B iol. Evol. 9:537± 551. D O N O G H U E , M . J., J. A . D O Y L E , J. G A U T H IE R , A . G. K L U G E , A N D T. R O W E . 198 9. T he im p ortanc e o f fo ssils in phy log eny rec on struction. A n nu. R ev. E c ol. S yst. 20 :431± 460 . F E L S E N S T E IN , J. 1978. Ca ses in w h ich parsimony or com b atibility m ethod s w ill be p ositively m isleading. Syst. Z ool. 27:401± 410. G A U T H IE R , J., A . G. K L U G E , A N D T. R O W E . 1988. A m n iote phy logeny and the im p ortance of fos sils. C lad istics 4:105± 210. H E N D Y , M . D., A N D M . A . C H A R L E S T O N . 1993. Hadam ard con jugation: A versatile tool for m odellin g nucleotide sequence evolution. N . Z. J. Bo t. 31:231± 237. H E N D Y , M . D., A N D D. P E N N Y . 1989. A fram ew ork for the quan titative stud y of evolutionary trees. Syst. Z ool. 38:297± 309. H IL L IS , D. M . 1996. In fer ring com plex phylogen ies. N ature 383:130± 131. H IL L IS , D. M ., J. P. H U E L S E N B E C K , A N D D. L. S W O F F O R D . 1994. Hobg lobin of phylogene tics? Nature 369:363± 364. H U E L S E N B E C K , J. P. 1991. W hen are fo ssils better than 17 extan t taxa in phy logenetic an alysis? Syst. Z ool. 40: 458± 469. H U E L S E N B E C K , J. P., A N D D. M . H IL L IS . 1993. Success of phy logenetic m e thod s in the fou r-taxon case. Syst. B iol. 42:247± 264. H U E L S E N B E C K , J. P., D. M . H IL L IS , A N D R. J O N E S . 1996. Pa ram etric b oo tstrap ping in m olecular phy logenetics: Ap plications and perform ance. Pa ge s 19± 46 in M olecular zoolo gy: Advance s, strategies, and protocols ( J. D. Ferraris and S. R . Palum bi, ed s.). W ileyL iss, New York. K IM , J. 1996. General incon sistency cond itions for m aximu m p arsimony : Effects of branch lengths and increasing num b ers of taxa. Syst. Biol. 45:363± 374. K U H N E R , M . K ., A N D J. F E L S E N S T E IN . 1994. A simu lation com parison of phy logeny algorith m s under e qu al and unequal evolution ary rates. Mol. B iol. E vol. 11:459± 468. L E C O IN T R E , G., H . P H IL IP P E , H . L . V A N L E , A N D H . L E G U Y A D E R . 1993. Spe cies sam pling has a m ajor im p act on phy logenetic inference. Mol. P hy logenet. E vol. 2:205± 224. L E C O IN T R E , G., H . P H IL IP P E , H . L . V A N L E , A N D H . L E G U Y A D E R . 1994. How m any nucleo tides are requ ired to resolve a phy logenetic problem ? T he use of a new statistical m ethod applicable to available sequences. Mol. P hy logenet. E vol. 3:292± 309. L O C K H A R T , P. J., M . A . S T E E L , M . D. H E N D Y , A N D D. P E N N Y . 1994. R ecovering evolution ary trees u nder a m ore realistic m odel of sequence evolution. Mol. B iol. E vol. 11:605± 612. N E I , M . 1991. R elative ef® cien cies of differ en t treem aking m ethod s for m olecular data. Pa ges 90± 128 in Phylog ene tic analysis of DNA sequences (M . M . M iyam oto and J. C racraft, ed s.). O xford Un iv. Pres s, O xford, E n gland. P U R V IS , A ., A N D D. L . J. Q U IC K E . 1997. Bu ilding phylogen ies: A re the big easy? Trend s E col. E vol. 12: 49± 50. S T E E L , M . A . 1989. D istributions on bicolored evolu tionary trees. P h.D. T hesis, M assey Un iv., Pa lm erston North, New Z ealand. S W O F F O R D , D. L ., G. J. O L S E N , P. J. W A D D E L L , A N D D. M . H IL L IS . 1996. P hy logenetic inference. Pag es 407± 514 in Molecular system atics, 2nd e dition (D. M . H illis, C . Moritz, and B . K . M able, ed s.). Sinauer A ssociates, Sunder land, M assach usetts. Z H A R K IK H , A ., A N D W.-H . L I . 1993. Incon sistency of the m aximu m -p arsim ony m e thod : The case of ® ve taxa w ith a m olecu lar clock. Syst. B iol. 42:113± 125. Received 8 April 1997; accepted 10 Ju ne 1997 A ssociate E ditor: D. C ann atella
© Copyright 2026 Paperzz