Fungal Zuotin Proteins Evolved from MIDA1-like Factors by Lineage-Specific Loss of MYB Domains Edward L. Braun and Erich Grotewold Department of Plant Biology and Plant Biotechnology Center, Ohio State University Proteins are often characterized by the presence of multiple domains, which make specific contributions to their cellular function. While the gain of domains in proteins by duplication and shuffling is well established, domain loss is poorly documented. Here, we provide evidence that domain loss has played an important role in the evolution of protein architecture and function by demonstrating that fungal Zuotin proteins evolved from MIDA1-like proteins, present in animals and plants, by complete loss of the carboxyl-terminal MYB domains. Phylogenetic analyses of the DnaJ motif (the J domain) present in both Zuotin and MIDA1 proteins were complicated by the limited length and profound differences in evolutionary rates exhibited by this domain. To rigorously examine J domain phylogeny, we combined the nonparametric bootstrap with Monte Carlo simulation. This method, which we have designated the resampled parametric bootstrap, allowed us to assess type I and type II error associated with these analyses. These results revealed significant support for domain loss rather than domain gain or gene loss involving paralogs. The absence of sequences related to the MIDA1 MYB domains in Saccharomyces cerevisiae further indicates that the domains have been completely lost, consistent with known functional differences between Zuotin and MIDA1 proteins. These analyses suggest that the description of additional examples of complete domain loss may provide a method to identify orthologous proteins exhibiting functional differences using genomic sequence data. Introduction Proteins often consist of multiple domains (Doolittle 1995; Henikoff et al. 1997), and evolutionary changes in the domain structures of these proteins have obvious functional implications. Indeed, domain structure differences have been used to infer protein-protein interactions (Enright et al. 1999; Marcotte et al. 1999). Similarity of domain structures has also been used as information for assigning orthology (Huynen and Bork 1998), a method especially helpful for large-scale genomic comparisons (e.g., Rubin et al. 2000). Variation in domain organization has largely been attributed to domain shuffling, intramolecular duplications, the fusion of distinct proteins to form multifunctional polypeptides, and pathways leading to the acquisition of novel domains during evolution (Doolittle 1995; Henikoff et al. 1997; Li 1997). In principle, domain loss provides an additional mechanism by which diversity in domain organization has been generated, but there have been few attempts to document domain loss, in part because of the difficulties associated with distinguishing domain loss from domain gain or gene loss. Domain loss can reflect the fission of specific sequence modules (Snel, Bork, and Huynen 2000), resulting in the dispersal of the domains present in a single protein into two or more polypeptides. This mechanism is conservative, since orthologs of each domain remain after the fission. Alternatively, specific sequence modules present in proteins may undergo deletion or sufficient sequence divergence in some lineages Abbreviations: ML, maximum likelihood; MP, maximum parsimony; NJ, neighbor joining. Key words: domain evolution, gene loss, protein evolution, statistical power analysis, comparative genomics, Monte Carlo simulation. Address for correspondence and reprints: Edward L. Braun, Department of Plant Biology and Plant Biotechnology Center, 206 Rightmire Hall, 1060 Carmack Road, Ohio State University, Columbus, Ohio 43210. E-mail: [email protected]. Mol. Biol. Evol. 18(7):1401–1412. 2001 q 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038 to be eliminated (e.g., Rodriguez-Monge, Ouzounis, and Kyrpides 1998; Braun and Grotewold 1999). This form of domain loss may be conservative, since it can occur after gene duplications as part of functional divergence between paralogs (e.g., Braun and Grotewold 1999). Alternatively, DNA sequences encoding specific domains can be completely deleted in some lineages (complete domain loss). This type of change might result in proteins with orthologous domains likely to have undergone functional changes as profound as those typically characterizing paralogs with similar differences in domain structure. Unequivocal demonstration of complete loss of domains requires entire genome sequences, since the existence of paralogs with the ancestral domain organization must be excluded. Even when complete genome sequences are available, it is important to examine the phylogenetic relationships among sequences containing the more broadly shared domain to establish whether the change in domain organization reflects loss or gain. However, the short lengths of many domains and the high degree of divergence between many multidomain proteins that exhibit differing domain organizations present substantial challenges for phylogenetic reconstruction (Nei, Kumar, and Takahashi 1998; Philippe and Laurent 1998). Since proteins that have undergone complete domain loss are unlikely to be sampled from a broad variety of organisms, limited taxon sampling may also have an impact on the estimates of phylogenetic relationships for these proteins. The MIDA/Zuotin family of proteins provides an interesting example of variation in the domain structure associated with particular groups of organisms. MIDA1like proteins, found in animals and plants, are characterized by the presence of an amino-terminal J domain embedded in a region of similarity to fungal Z-DNA binding proteins (Zuotins), along with two carboxyl-terminal MYB domains (fig. 1). The J domain is present in a large number of prokaryotic and eukaryotic proteins 1401 1402 Braun and Grotewold FIG. 1.—Structure of MIDA1 and Zuotin proteins. A, Schematic representation of MIDA1 and Zuotin homologs, highlighting the relevant domains. B, Alignment of the mouse MIDA1 (Shoji et al. 1995), budding yeast (Saccharomyces cerevisiae) Zuotin (Zhang et al. 1992), and Arabidopsis F24K9.12 (MIDA1-like) proteins. Identical residues are indicated in black, gaps in the alignment are indicated with dashes, and the positions of the J domain and the MYB domains are shaded. (Kelley 1998), but the MYB domains are related to a group of transcription factors that have been found almost exclusively in the eukaryotes (Ouzounis and Papavassiliou 1997). Thus, the MIDA1 homologs are multidomain proteins in which the ancient J domain, present in both prokaryotes and eukaryotes, has been fused to eukaryote-specific MYB domains. The mouse MIDA1 protein was identified because it interacts physically with the helix-loop-helix protein Id1 (Shoji et al. 1995), and overexpression suggests a role in the regulation of growth and proliferation. This function is consistent with the identification of the human MIDA1 ortholog as an M-phase phosphoprotein (MPP11; Matsumoto-Taniura et al. 1996) and the defect in asymmetric cell division evident in Volvox carteri GlsA (MIDA1) mutants (Miller and Kirk 1999). Thus, the ancestral function of MIDA1 in animals and plants is likely to involve the regulation of cell division, although the specific targets of this protein remain unknown. MIDA1-like proteins are absent from the fungi despite a large number of phylogenetic analyses supporting an animal-fungal clade that excludes the plants (reviewed by Braun et al. 1998; Baldauf 1999). However, the fungal Zuotins (Zhang et al. 1992) correspond to MIDA1 homologs lacking the conserved carboxylterminal MYB domains. Thus far, Zuotins have not been Domain Loss in Zuotin Proteins found in the plant or animal kingdoms. Saccharomyces cerevisiae Zuotin, encoded by the ZUO1 gene, is a ribosome-associated DnaJ (Yan et al. 1998) that exhibits tRNA- and Z DNA-binding activities (Zhang et al. 1992; Wilhelm et al. 1994). These and other findings (Yan et al. 1998) suggest that Zuo1p is the DnaJ partner of the Ssb-type Hsp70 proteins (Nelson et al. 1992), and thus a component of the fungal translation machinery. Here, we provide evidence that complete loss of conserved domains is one mechanism by which diversity in the domain distribution of multidomain proteins is established. By examining the origin of fungal Zuotin homologs, we determined that these DnaJ proteins originated from MIDA1-like proteins, present in animals and plants, by the complete loss of carboxyl-terminal MYB domains. We present an estimate of phylogenetic relationships among J domains—shared by MIDA1 proteins, Zuotins, and more distantly related DnaJ/Hsp40 homologs—using the results to distinguish between domain loss and more complex patterns of gene duplication and gene loss that could produce similar end results. To examine the reliability of J domain phylogeny, we employed the ‘‘resampled parametric bootstrap,’’ combining the nonparametric bootstrap (Felsenstein 1985) with Monte Carlo simulation. This tool permitted us to place our confidence in the phylogenetic reconstruction in a substantially more rigorous framework, allowing the impact of limited sequence length and unequal evolutionary rates on both type I and type II error to be assessed. Our findings provide the first described example of complete domain loss, consistent with observed functional differences between Zuotins and MIDA1-like proteins, which we demonstrate to be orthologs. Domain loss may represent a general mechanism underlying variation in the structure of multidomain proteins, and the complete domain loss exemplified by the MIDA1/Zuotin proteins may also contribute to functional differences between proteins. Materials and Methods Selection and Alignment of Sequences Homologs of the MIDA1 protein were identified using BLASTP (Altschul et al. 1997), and the positions of protein sequence motifs were established by searching the Pfam (Bateman et al. 2000). Sequences were aligned using CLUSTAL W (Thompson, Higgins, and Gibson 1994) followed by manual adjustment. Full alignments are available on request. The protein sequences used in these analyses were as follows: (1) MIDA1 homologs from Arabidopsis thaliana (AAF00659 [F24K9.12] and BAA98200 [K16F4.7]), Caenorhabditis elegans (AAB09157), Drosophila melanogaster (AAF51678), Homo sapiens (CAA66913), Mus musculus (BAA09854), and Volvox carteri (AAD26632); (2) Zuotin homologs from S. cerevisiae (CAA45156) and Schizosaccharomyces pombe (CAB10796); (3) eukaryotic J domain proteins (closely related to MIDA1/Zuotin) from A. thaliana (AAD12011 [T3K9.23] and T00641 [F3I6.4]), Babesia bovis (AAC27389), D. melanogaster (AAF43627), Hevea 1403 brasiliensis (AAD120055), H. sapiens (Q99615), Nicotiana tabacum (AAD09516), S. cerevisiae (AAB67594), and Salix gilgiana (BAA35121 and BAA76888); (4) eukaryotic J domain proteins (more distantly related to MIDA1/Zuotin) from C. elegans (T15851 [C56C10.11], T22648 [F54D5.8], and T24396 [T03F6.2]), D. melanogaster (Q24133 [DNAJ1] and AAF47301 [GH03108]), H. sapiens (BAA88769 [DNAJ], BAA02656 [DNAJ-2], NPp036398 [HSP40–3], and NPp006136 [HDJ1]), M. musculus (NPp031895), Pisum sativum (T06594), Plasmodium falciparum (F71623 [PFB0090c] and G71610 [PFB0595w]), S. cerevisiae (NPp014172), and S. pombe (T41362); (5) prokaryotic J domain proteins from Borrelia burgdorferi (F70181), Buchnera sp. (BAB12870), Chlamydia pneumoniae (G72128), Escherichia coli (P08622), Haemophilus ducreyi (P48298), Haemophilus influenzae (P43735), Lactobacillus sakei (CAA06942), Mycoplasma genitalium (P47265), Mycoplasma pneumoniae (P78004), Nitrosomonas europaea (O06431), Rhodothermus marinus (AAD37973), Synechocystis sp. (P50027), Thermus thermophilus (Q56237), and Thermotoga maritima (B72327). Phylogenetic Reconstruction Phylogenetic analyses using amino acid sequences were conducted using a combination of neighbor joining (NJ; Saitou and Nei 1987), maximum parsimony (MP), and maximum likelihood (ML). Analyses conducted using the full alignment and after deleting regions that were difficult to align yielded similar results. Results obtained using the full alignment are reported. NJ trees were inferred using either ML distance estimates or p-distances (uncorrected distances) and constraining branch lengths to be nonnegative, using either the appropriate option in PAUP* 4.0b4a (Swofford 2000) or nnneighbor (W. J. Bruno, Los Alamos National Laboratories). ML distances were estimated using protml from MOLPHY 2.3b3 (J. Adachi and M. Hasegawa, Institute of Statistical Mathematics, Japan) with options ‘‘-D’’ (distance matrix) and ‘‘-df’’ (for PAM distances; Dayhoff, Schwartz, and Orcutt 1978). ML distance estimates under the BLOSUM model (Henikoff and Henikoff 1992) were calculated using a modified version of protml. Phylogenetic reconstruction by MP used PAUP* 4.0b4a, weighting amino acid changes equally and collapsing branches if the minimum branch length was 0. ML trees were estimated using protml and the PAM model, which exhibited the best fit to the data (see Results and Discussion). MP tree searches in PAUP* used 10 random-addition sequence replicates and TBR branch swapping. Amino acid substitutions at specific sites were examined using the ‘‘trace character’’ option in MACLADE 3.05 (Maddison and Maddison 1992), and overall patterns of amino acid substitution were inferred using the ‘‘state changes and stasis’’ option in MACLADE, counting unambiguous events only. Inferred amino acid changes were classified as conservative (changes to another amino acid within the same class, using the six classes of amino acids described by Dayhoff, Schwartz, 1404 Braun and Grotewold FIG. 2.—Phylogeny of the MIDA1/Zuotin proteins based on the Zuotin homology region. Branch lengths were estimated by maximumlikelihood using the PAM1G model of sequence evolution and are presented as expected amino acid substitutions per site (EAASS). Bootstrap support based on neighbor joining of PAM distances (500 replicates) is presented above branches as percentages. and Orcutt [1978]) or radical (changes to another class of amino acid). For the PAM model, the expected numbers of amino acid substitutions in each class were estimated by simulations, as described below. Amongsites rate heterogeneity was examined using an eightcategory discrete approximation to a G distribution (Yang 1994), as implemented in TREE-PUZZLE 4.0.2 (K. Strimmer, University of Oxford, and A. von Haeseler, Max Planck Institute for Evolutionary Anthropology). Improvements in model fit for nested models were evaluated using the likelihood ratio test (Felsenstein 1988; Goldman and Whelan 2000), comparing the test statistic (2d, where d is the difference between ln L values for each model) with the appropriate x2 or x̄2 distribution. The reliability of phylogenetic analyses was assessed by nonparametric bootstrap resampling (Felsenstein 1985) using 500 replicates and adding sequences in random orders for each replicate. MP bootstrap analyses used 10 random additions and no branch swapping. To examine the probability of obtaining specific bootstrap proportions, 500 data sets were simulated based on the null hypothesis (H0) tree using PSeq-Gen 1.1 (Grassly, Adachi, and Rambaut 1997) and the PAM1G model of evolution with parameters estimated from the data assuming H0. (The H0 topologies, branch lengths, and other simulation parameters are available on request.) Then, the nonparametric bootstrap support for the relevant group was estimated from each simulated data set as described above. The probability of observing a given bootstrap value if H0 were correct corresponds to the proportion of simulated data sets with bootstrap values greater than or equal to the value, with the critical bootstrap value given by the upper 5% of this null distribution (5% probability of type I error). The power of a specific analysis can be assessed using simulations under the alternative hypothesis. Power was calculated using the proportion of simulations under the alternative hypothesis (HA) with bootstrap support in the upper 5% of the distribution generated using the null hypothesis (since simulations under HA with support under the critical bootstrap value would reflect cases of type II error, and power 5 1 2 b, where b is the type II error probability). Like the data for H0 simulations, all data relevant to the HA simulations are available on request. Unix shell scripts used to conduct these analyses are also available on request. Results and Discussion Primary Structure and Molecular Evolution of MIDA1 Homologs The primary question we wanted to address was whether the fungal Zuotins exhibit an orthologous or paralogous relationship to MIDA1 proteins. Given the large number of distinct analyses supporting the existence of an animal-fungal clade (Baldauf and Palmer 1993; Wainright et al. 1993; Nikoh et al. 1994; reviewed by Braun et al. 1998; Baldauf 1999), an orthologous relationship between MIDA1-type proteins and the Zuotins would suggest that the difference in their domain organizations reflects domain loss. Searches of genomic sequences from S. cerevisiae and S. pombe did not reveal the presence of any MYB domains closely related to those present in the animal and plant MIDA1 homologs (data not shown). This indicates that the domain loss event would be complete, rather than a more conservative gene fission. To investigate evolutionary relationships among MIDA1-like proteins from animals, plants, and fungi, we estimated the phylogeny of these proteins using the Zuotin homologous region (fig. 1B). These analyses revealed a high level of bootstrap support for each eukaryotic kingdom (fig. 2). In fact, this estimate of MIDA1/Zuotin phylogeny is compatible with current estimates of animal phylogeny, which suggest the existence of a clade (the Ecdysozoa; Aguinaldo et al. 1997) containing the arthropods (D. melanogaster) and nematodes (C. elegans). The only gene duplication implied by this phylogeny occurred within the land plant lineage after the divergence of the chlorophyte algae, resulting in the A. thaliana MIDA1 homologs encoded by paralogous genes on chromosomes 3 (F24K9.12) and 5 (K16F4.7). Domain Loss in Zuotin Proteins Although other MIDA1 or Zuotin paralogs containing the J domain could not be identified, searches using the MYB domains revealed the existence of proteins containing these MYB domains related to those in MIDA1 in both C. elegans and A. thaliana (data not shown). In addition, a distinct class of plant MYB homologs, including the tomato I-box binding factor (Rose, Meier, and Wienand 1999), are characterized by the presence of a MIDA1-like MYB domain, in addition to a more distantly related MYB domain. The existence of these proteins can be explained by domain shuffling and the more conservative form of domain loss in which gene duplication precedes domain loss. The estimate of MIDA1/Zuotin phylogeny obtained using the Zuotin homology region (fig. 2) does not allow the reconstruction of the pattern of evolution for the MYB domains present in MIDA1 homologs, because the tree cannot be rooted in the absence of an outgroup containing the Zuotin region. However, this analysis provided valuable information regarding evolutionary relationships within this group and their patterns of sequence evolution. We found that accommodating the unequal amino acid composition of these proteins resulted in a significant improvement in model fit (ln L 5 23,408.69 [proportional model]; ln L 5 23,549.09 [Poisson model]; P , 0.001). Use of the PAM model of evolution (Dayhoff, Schwartz, and Orcutt 1978) to accommodate differences in the probability for different types of amino acid substitution resulted in additional improvement (ln L 5 23,259.26) and further revealed that the PAM model was better than the related JTT and BLOSUM models (ln L 5 23,261.02 [JTT]; ln L 5 23,298.20 [BLOSUM]). The Zuotin region exhibits modest among-sites rate heterogeneity, and the assumption that rates follow a G distribution (with shape parameter a 5 1.07) resulted in a significant improvement to the PAM model of evolution (ln L 5 23,196.22; P , 0.001). Similar results were obtained using the J domain alone, regardless of the specific set of sequences analyzed (data not shown). Despite the improvement in likelihood score when the PAM model of sequence evolution was used, all models (and methods of phylogenetic inference) supported the same relationships among these sequences. Two alternative positions for the root of the MIDA1/Zuotin phylogeny (fig. 2) are compatible with current estimates of eukaryotic phylogeny (reviewed by Braun et al. 1998; Baldauf 1999). Placement of the root between the plants and an animal-fungal clade (a in fig. 2) is compatible with organismal phylogeny in the absence of ancient gene duplications, and it is also suggested by midpoint rooting (data not shown). This placement suggests the complete loss of MYB domains in MIDA1-like proteins during the evolution of the fungi. Placement of the root between MIDA1 and Zuotin proteins (b in fig. 2) could be reconciled with current estimates of organismal phylogeny by postulating (minimally) the loss of the MIDA1 ortholog in the S. cerevisiae lineage along with independent losses of Zuotin orthologs in the Ecdysozoa and the Arabidopsis lineage. There is evidence for gene loss in S. cerevisiae, C. ele- 1405 gans, and D. melanogaster based on both large-scale surveys (Aravind et al. 2000; Braun et al. 2000; Rubin et al. 2000) and detailed analyses of specific gene families (Braun et al. 1998; Steele, Stover, and Sakaguchi 1999). Since members of at least one other class of MYB homologs (the c-MYB-like transcription factors found in plants and animals; Braun and Grotewold 1999) have been lost in S. cerevisiae and C. elegans, it is especially evident that the gene loss model cannot be dismissed. Alternatively, support for root b could indicate that previous estimates of eukaryotic phylogeny supporting the existence of an animal-fungal clade are incorrect (Wang, Kumar, and Hedges [1999] suggest that support for this clade is overstated), although we believe this is unlikely given the distinct lines of evidence supporting the animal-fungal clade (summarized by Braun et al. 1998; Baldauf 1999). Thus, the observed phylogeny and the absence of MYB domains in the fungal Zuotins can be reconciled with either gene loss or complete domain loss models. We believe that the difficulties in distinguishing between these two models with a high degree of confidence are a major reason why examples of complete domain loss have not previously been reported. Thus, it is essential to precisely define the position of the root in the MIDA1/Zuotin phylogeny (fig. 2). Placement of the Root in the MIDA1/Zuotin Subgroup The placement of the root for the MIDA1/Zuotin phylogeny could be reliably inferred using an outgroup. However, use of a DnaJ outgroup required limiting the alignment to the much shorter region of sequence homology corresponding to the J domain, only 75 residues in length (fig. 1B). To determine the position of the root within the MIDA1/Zuotin subgroup and establish the identity of the DnaJ protein most closely related to this subgroup, we retrieved a large set of DnaJ homologs— including prokaryotic sequences—and estimated the phylogeny of J domains related to MIDA1 and Zuotin (fig. 3). In phylogenetic trees obtained by NJ of PAM distances estimated using the J domain alignment, there is 75% bootstrap support for an animal-fungal clade containing MIDA1 and Zuotin proteins (fig. 3). Lower levels of bootstrap support for an animal-fungal J domain clade were evident when using NJ of p-distances and equally weighted MP (table 1), although neither analysis provided any support for placement of the root in position b. The data exhibited a good fit to the PAM1G model of sequence evolution, suggesting that the best method of phylogenetic estimation used was NJ of PAM distances, which provided support for root a in figure 2. Indeed, this method appeared to represent the most powerful test of this hypothesis (see below). Since the exact set of sequences examined in phylogenetic analyses can have a profound effect on estimates of evolutionary relationships (Lecointre et al. 1993), we wanted to assess the impact of the specific set of J domain sequences used. Analyses using a subset of eukaryotic J domains that appeared more closely re- 1406 Braun and Grotewold FIG. 3.—Phylogeny of J domains related to MIDA1 and Zuotin. Branch lengths were estimated by ML using the PAM1G model of evolution. Bootstrap support based on neighbor joining of PAM distances (500 replicates) is indicated below branches when it exceeds 50%. The MIDA1/Zuotin group is shaded for emphasis. Prokaryotic J domains are presented in capital letters, the subgroup used in the eukaryotic data set are indicated with asterisks, and the rapidly evolving J domains (‘‘fast’’ sequences) are indicated with bold letters. The paralogous MIDA1-like proteins from Arabidopsis thaliana are differentiated by the chromosomal locations of the genes encoding these proteins (chromosome 3 or chromosome 5). Table 1 The Impact of Taxon Sampling and Different Methods of Phylogenetic Analysis on Support for Grouping Animal MIDA1 Homologs and Fungal Zuotins FAST DATA SETa Full. . . . . . . . . . . . Eukaryotic . . . . . . GAMMA SHAPE SEQUENCES INCLUDED?b PARAMETERc Yes No Yes No 1.02 1.09 1.68 1.82 BOOTSTRAP SUPPORT FOR ad NJ with PAM NJ with BLOSUM NJ with p-Distances MP 75.4 97.8 75.0 98.4 81.6 99.4 77.8 98.0 61.8 99.2 67.6 97.6 46.9 79.0 52.7 69.7 NOTE.—NJ 5 neighbor joining; MP 5 maximum parsimony. a The full data set included all of the sequences listed in Materials and Methods; the eukaryotic data set included the sequences in groups 1, 2, and 3 listed in Materials and Methods (also see fig. 3). b Inclusion or exclusion of the divergent (fast) MIDA1 homologs from Volvox carteri and the Ecdysozoa and Zuotin from Saccharomyces cervisiae. c ML estimate of the shape parameter (a) of a G-distribution describing among-sites rare heterogeneity obtained using the relevant data set. d Bootstrap support for branch a, supporting a J domain clade containing animal MIDA1 homologs and fungal Zuotin homologs, using the analyses listed. Domain Loss in Zuotin Proteins lated to the MIDA1 group (designated the eukaryotic data set) were conducted, providing results virtually identical to those obtained with the larger data set (table 1). Since the MIDA1 homologs from the Ecdysozoa (C. elegans and D. melanogaster) and V. carteri and Zuotin from S. cerevisiae appeared divergent regardless of the region analyzed (see branch lengths in figs. 2 and 3), we examined the impact of removing these rapidly evolving (‘‘fast’’) sequences. Although there are clear limitations imposed by the small number of MIDA1 and Zuotin sequences available, these analyses substantially increased bootstrap support for an animal-fungal J domain clade (table 1) for NJ (of both PAM and p-distances) and MP analyses, providing additional evidence for root a. Assessing Support for Placement of the Root Using the Resampled Parametric Bootstrap Placement of the root at position a in figure 2 suggests that fungal Zuotin proteins evolved from MIDA1like proteins through loss of the MYB domains, providing the first reported example of this type of domain loss in a specific group of organisms. However, this placement of the root is contingent on the 75% bootstrap support for an animal-fungal J domain clade (fig. 3, branch a). This level of bootstrap support is often considered fairly high, since a number of studies have demonstrated that the bootstrap provides underestimates of the actual confidence level in a phylogenetic hypothesis under a number of conditions (Zharkikh and Li 1992, 1995; Felsenstein and Kishino 1993; Hillis and Bull 1993). However, given the implications for the origin of Zuotin proteins that root a has, we wanted to examine support for an animal-fungal J domain clade in a more rigorous fashion. The complex relationship between bootstrap proportion and the probability that a specific clade exists (Hillis and Bull 1993; Zharkikh and Li 1995; Huelsenbeck, Hillis, and Jones 1996) represents a frustrating aspect of bootstrap analyses. Since Monte Carlo simulation allows the analysis of a variety of informative statistics when their distributions cannot be obtained from theory (Besag and Diggle 1977), we envisioned that it might provide a solution to this problem. Indeed, Monte Carlo simulation was one method used to determine the tendency of the bootstrap to underestimate the support for specific groups (Hillis and Bull 1993). Monte Carlo simulation has also been used extensively to examine bias in phylogenetic analyses (e.g., Huelsenbeck, Hillis, and Jones 1996; Huelsenbeck 1998; Chang and Campbell 2000; Sanderson et al. 2000), so an additional benefit of this approach is its ability to determine whether the position of the J domains from plant MIDA1 homologs might be driven by differences in their evolutionary rates. To determine the probability of observing a bootstrap value supporting root a as extreme as that observed (fig. 3 and table 1) given the null hypothesis (H0) of placing the root in position b, 500 data sets were simulated assuming H0. Bootstrap support for monophy- 1407 ly of animal-fungal MIDA1/Zuotin J domains—the alternative hypothesis (HA) in this analysis—was then estimated for each simulation. Analyses of the eukaryotic data set using NJ of PAM distances allowed us to reject H0 (root position b) whether or not the fast sequences were included (23 simulations $ 75% bootstrap support, P 5 0.046 [all sequences]; 4 simulations $ 98% bootstrap support, P 5 0.008 [fast sequences removed]), although support was stronger when fast sequences were removed. Similar results were obtained using the full data set (P 5 0.06 [all sequences] and P 5 0.02 [fast sequences removed]), although analyses using the broader set of sequences required substantially longer times to run, and the bootstrap support using all sequences was slightly lower than necessary for significance. Thus, depending on the specific data set analyzed, the support for placement of the root of the MIDA1/ Zuotin subgroup in position a was usually significant and sometimes highly significant. Since the methodology we employed involves nonparametric bootstrap resampling of data sets generated by Monte Carlo simulation (also called the parametric bootstrap), we call this Monte Carlo approach the ‘‘resampled parametric bootstrap.’’ The value of employing the resampled parametric bootstrap in this study rather than applying a ‘‘rule-ofthumb’’ critical value for the bootstrap (such as 70%) is revealed by differences in the critical bootstrap value (the value defining the upper 5% of the null distribution) calculated by simulation. For this analysis, the critical value ranges from as little as 73% (for the eukaryotic data set) to as much as 94% (for the full data set excluding fast sequences). Thus, the inclusion of different sequences changed both the observed bootstrap proportions and the meaning of specific bootstrap values. This variation in the correspondence between bootstrap values and confidence in specific clades given differences in factors such as branch lengths is consistent with the results of previous simulations (Huelsenbeck, Hillis, and Jones 1996), suggesting that application of the resampled parametric bootstrap to various problems can provide a much more rigorous test of support for specific phylogenetic hypotheses. The resampled parametric bootstrap can provide a rigorous test for bias due to factors such as long-branch attraction (Felsenstein 1978), allowing one to determine whether long branches in a given phylogeny are long enough to both attract and exhibit the level of bootstrap support observed from the data. Since the H0 topology is characterized by a very short internal branch uniting the animals and the plants, as well as long branches to the outgroup and plant MIDA1 subgroups (data not shown), long-branch attraction might explain the observed support for HA (fig. 3 and table 1). Indeed, phylogenetic analyses of some data sets simulated assuming H0 revealed relationships consistent with HA (table 2). However, the observed bias did not appear to be extremely strong, with fewer than 60% of simulated replicates supporting HA in analyses based on either NJ of corrected distances or MP. Since the resampled parametric bootstrap is a method of establishing expected 1408 Braun and Grotewold Table 2 Bias in Phylogenetic Estimation for MIDA1/Zuotin Phylogenya PERCENTAGES SUPPORTING HAc FAST DATA SETb Full. . . . . . . . . . . . . . . Eukaryotic . . . . . . . . . SEQUENCES INCLUDED?b NJ with PAM NJ with BLOSUM NJ with p-Distances MP Yes No Yes No 38.4 51.2 31.4 52.2 37.2 51.4 34.2 46.6 65.6 86.2 66.2 92.6 30.2 55.4 37.4 57.8 NOTE.—NJ 5 neighbor joining; MP 5 maximum parsimony. a Tendency of phylogenetic analyses to support H (monophyly animal MIDA1 fungal) when simulations assume H A 0 (animal-plant monophyly). b Members of each data set are indicated in figure 3 and table 1. c Percentage of data sets simulated assuming H that have a clade consistent with branch a when analyzed using the 0 method indicated. bootstrap proportions given a specific null hypothesis, it provides a more conservative means to assess the impact of long-branch attraction on phylogenetic analyses. In fact, this approach was recently used specifically to examine potential bias in estimates of phylogeny based on vertebrate rhodopsin sequences (Chang and Campbell 2000). The ability of the resampled parametric bootstrap to provide information regarding the impact of bias on phylogenetic analyses might represent an advantage over other methods available for establishing the statistical significance of bootstrap support (e.g., Zharkikh and Li 1995; Efron, Halloran, and Holmes 1996). However, our analyses of J domain phylogeny suggest a need to extend the resampled parametric bootstrap beyond analyses using simulation under H0, since analyses conducted using either MP or NJ of p-distances did not provide significant support for HA (P 5 0.118 [MP] and P 5 0.3 [NJ of p-distances]). Although these results are consistent with the lower bootstrap proportions observed using these methods (table 1), they raise the question of why only one method provided significant support. To address this issue, 500 data sets were simulated assuming HA (monophyly of animal and fungal J domains), and the proportion of bootstrap proportions less than the critical value estimated by simulation under H0 was found, since this corresponded to the type II error probability. The power of a statistical analysis is defined as the probability that it can reject the null hypothesis (Cohen 1977), i.e., one minus the probability of type II error. We found that analyses using NJ of PAM distances (power 5 0.614) were more powerful than analyses using MP (power 5 0.442) or NJ of pdistances (power 5 0.3). The results of this dual simulation strategy suggest that the failure of analyses using either MP or NJ of p-distances to reject H0 could reflect the lower power of these analyses. The incorporation of prior knowledge about types of amino acid substitutions likely to occur during protein evolution based on analysis of many of homologous proteins, using evolutionary models such as the PAM model (Dayhoff, Schwartz, and Orcutt 1978), may increase the power of phylogenetic analyses. However, it is important to note that the current study assumed that the PAM1G model of evolution was a sufficiently accurate representation of the actual process of J domain evolution to provide useful estimates of both type I and type II error for phylogenetic analyses using this region. The PAM1G model of evolution had the best fit to the J domain data (see above), and the observed proportion of conservative amino acid substitutions in the J domain was close to that expected under the PAM1G model (see below), suggesting that it is a reasonable representation of J domain evolution despite the fact that the actual process of amino acid substitution in proteins is almost certainly more complex than the process assumed by the PAM1G model. Despite these concerns regarding differences between the PAM1G model of sequence evolution and the actual process of amino acid substitution for J domains, we believe that the simulations conducted as part of this study provide reasonable estimates of both type I and type II error because the PAM model of evolution with among-sites rate heterogeneity captures many notable aspects of J domain evolution. In fact, the examination of power using this dual simulation approach and the resampled parametric bootstrap may provide valuable information regarding the behavior of different methods of phylogenetic estimation. Simulations have shown that the best methods for examining specific phylogenetic problems can be difficult to predict by evaluating model fit alone. For example, NJ of p-distances may be more efficient than NJ of distances estimated using a more complex model even when the data were generated using the more complex model (Takahashi and Nei 2000). Likewise, MP or ML using a simple model may also be more efficient than ML using the true model for some trees (Hillis, Huelsenbeck, and Swofford 1994; Yang 1997; Huelsenbeck 1998). Thus, NJ or ML analysis using the best-fitting model (based on ML analyses) may not necessarily represent the best method of phylogenetic inference. Indeed, the dual simulation approach can even provide information about the best method of phylogenetic estimation when proteins exhibit patterns of evolution that have not been implemented as models in an ML framework. Evaluating the probability of type II error for phylogenetic analyses using different models, distance transformations, or weighting schemes could represent an excellent method of choosing among different analyses. Likewise, the impact of changing the set of se- Domain Loss in Zuotin Proteins quences included in a phylogenetic analysis can be evaluated. We found a slight decrease in power for analyses using the eukaryotic data set with the fast sequences removed (power 5 0.552). This suggests that inclusion of J domain sequences from the Ecdysozoa, S. cerevisiae, and V. carteri was beneficial for these analyses, despite the increased bootstrap support for the animalfungal clade (table 1) and the clear rejection of H0 using NJ of PAM distances after eliminating divergent J domain sequences from the fast taxa (see above). Furthermore, some of the fast J domain sequences were involved in rearrangements within the animal MIDA1 and fungal Zuotin group, with some analyses suggesting that S. cerevisae Zuotin is the outgroup of the animal and fungal sequences (also note the very short branch uniting the fungi in fig. 3). The strong support for monophyly of the fungi and other eukaryotic kingdoms (fig. 2) suggests that the inference of different topologies for the MIDA1/Zuotin J domains reflects the limited length of the J domain. This further emphasizes the need for caution in placing the root of this group and underscores the need to conduct the simulations we used to examine the position of the root. Analyses excluding the divergent J domain sequences exhibited a greater degree of bias in phylogenetic reconstruction (table 2), probably explaining the decrease in power. Despite the limited power of all analytical approaches used, which highlighted the need for caution when interpreting phylogenetic analyses of relatively short sequence alignments, such as the J domain alignment analyzed here, we believe that the rejection of H0 using NJ of PAM distances provided evidence for root a. The limited power of analyses using either MP or NJ of p-distances also provided a reasonable explanation for the inability of these analyses to support root a, and it is important to note that none of these analyses supported root b. Taken as a whole, the analyses we conducted provided evidence that fungal Zuotin proteins were derived from MIDA1-like proteins present in the common ancestor of animals, plants, and fungi by complete loss of the carboxyl-terminal MYB domains. Comparison Between Patterns of Evolution for the J Domain and the PAM Model of Evolution An often untested aspect of parametric tests of evolutionary hypotheses is the impact of model inaccuracy on their results. Although the PAM model of sequence evolution exhibited a better fit to the evolution of the J domain than the other models tested (the JTT and BLOSUM models; see above), an untested model that fits the data even better may exist. To examine the fit of the PAM model, the types of amino acid changes that occurred in the J domain were reconstructed by equally weighted parsimony. As we expected, a substantial bias toward conservative changes was evident in the J domain region (213 of 416 unambiguous changes were conservative substitutions, using the categories described by Dayhoff, Schwartz, and Orcutt [1978]). This number of conservative changes is significantly greater than that expected under an uncon- 1409 strained model (60.3 conservative substitutions expected, x2 5 440.23, P , 0.001) or a model constrained to amino acid substitutions involving single-nucleotide substitutions (145.6 conservative substitutions expected, x2 5 48.0, P , 0.001). However, the number of observed conservative changes (213) was only slightly greater than expected under the PAM1G model of evolution (191 conservative substitutions expected, x2 5 4.63, P 5 0.032). Comparison of observed and expected numbers of substitutions in each amino acid category revealed that the deviation from the PAM model resulted from greater numbers of conservative changes involving the hydrophobic (55 observed and 27.3 expected, x2 5 30.13, P , 0.001) and aromatic (20 observed and 10.3 expected, x2 5 9.5, P 5 0.002) groups than expected under the PAM1G model. The observed numbers of substitutions in other categories did not differ from expectation under the PAM1G model (P . 0.1). Thus, the PAM1G model of evolution appeared to exhibit a relatively good fit to the J domain data with the exception of the excess substitutions in two specific amino acid classes. To examine the robustness of phylogenetic analyses to the observed deviation from the PAM model, we examined bootstrap support for an animal-fungal J domain clade (fig. 3, branch a) based on NJ of distance estimates obtained using a model of evolution based on the BLOSUM matrix (Henikoff and Henikoff 1992). The BLOSUM model was more tolerant of substitutions involving both hydrophobic and aromatic amino acids than the PAM model, probably explaining the poorer fit of this model to the data in ML analyses (see above). Although NJ analyses using BLOSUM distances resulted in an increased probability of type II error (power 5 0.494), the observed level of bootstrap support was similar to that obtained using NJ of PAM distances (table 1), and we found significant support for the existence of an animal-fungal J domain clade (P 5 0.046 with the eukaryotic data set). Likewise, NJ of BLOSUM distances did not result in substantially increased bias in phylogenetic reconstruction (table 2). These results suggest that the results of NJ of corrected distances for the J domain are relatively robust to deviations between the model of sequence evolution used to obtain distance estimates and the actual pattern of evolution, providing additional support for our conclusions regarding the origin of fungal Zuotins by the process of complete domain loss in the fungal lineage. Despite the support for root a provided by parametric approaches to phylogenetic analyses (see above), we wanted to examine the amino acid changes that supported different placements of the root for MIDA1 and Zuotin phylogeny. Thus, we looked for potential synapomorphies (shared derived character states) uniting either the animal MIDA1 and fungal Zuotin J domains (root a) or the animal and plant MIDA1 J domains (root b). Examination of ancestral state reconstructions for all sites in the J domain alignment revealed the presence of two sites (aligned to Val 124 and Lys 150 of the S. cerevisiae Zuotin sequence) supporting root a and the absence of sites supporting root b, consistent with our 1410 Braun and Grotewold phylogenetic analyses supporting root a (fig. 3 and table 1). The first site supporting root a involved a change from Ala in the plant and outgroup J domains to Val in the animals and fungi, while the second involved a change from Glu in the plant and outgroup J domains to Lys in the animals and fungi. Volvox carteri GlsA differs from animal, fungal, plant, and outgroup J domains at both of these sites (with Cys and Asp residues aligned to Val 124 and Lys 150 of the S. cerevisiae Zuotin sequence, respectively). Since V. carteri GlsA is one of the divergent sequences (see branch lengths in figs. 2 and 3), the existence of unique (autapomorphic) character states in this sequence is not surprising. We believe that the support for root a from parametric analyses, coupled with the absence of any sites potentially supporting monophyly of J domains from animal and plant MIDA1-like proteins, provides strong evidence for our hypothesis that fungal Zuotin proteins evolved from MIDA1-like ancestors through complete domain loss. Additional Applications of the Resampled Parametric Bootstrap The use of a dual simulation approach to examine both type I and type II error represents an excellent method for examining specific questions in molecular evolution. Previous simulation studies to examine power in phylogenetic analyses have focused on establishing the number of sites necessary to have a given probability of reconstructing the correct topology (Hillis, Huelsenbeck, and Swofford 1994; Huelsenbeck, Hillis, and Jones 1996). The dual simulation approach we employed here could extend this approach to power analysis by establishing the number of sites necessary to reject a null hypothesis with specific type I and type II error probabilities, such as the values used by convention in many studies (probability of type I error 5 0.05 and power 5 0.8; see Cohen 1977). However, we want to emphasize that the dual simulation approach used here is not limited to the resampled parametric bootstrap. Indeed, it could also be applied to tests that compare differences in ML, MP, or minimum evolution (ME) scores with a null distribution generated by simulation (e.g., Swofford et al. 1996; a Monte Carlo approach called the ‘‘SOWH test’’ by Goldman, Anderson, and Rodrigo [2000]). The use of the resampled parametric bootstrap and either NJ or MP with a fast search algorithm (e.g., the methods used by Takahashi and Nei 2000) may have benefits relative to the SOWH test for some phylogenetic problems. The SOWH test requires the identification of the optimal topology using ML, MP, or ME for each simulated data set, which may not be tractable from a computational standpoint for data sets with a large number of sequences. Since increased taxon sampling has been suggested to improve the accuracy of phylogenetic estimation in some cases (Lecointre et al. 1993; Hillis 1996), the ability to conduct simulation-based analyses of such large data sets is clearly desirable. In addition, searches for optimal topologies may not result in the identification of the true topology when applied to data sets with a limited number of aligned sites, like the J domain (fig. 1). In fact, simulation studies have demonstrated that the true topology often has a slightly suboptimal ML, MP, or ME score in analyses of short alignments (see Nei, Kumar, and Takahashi 1998). Even if the true topology has the optimal ML, MP, or ME score, it may be one of multiple equally optimal topologies identified using these criteria when the alignment is short. Thus, it has been suggested that phylogenetic analyses of short alignments should focus on establishing support for specific groups rather than conducting extensive tree searches in order to identify optimal topologies (Nei, Kumar, and Takahashi 1998). Since the resampled parametric bootstrap uses the support for a specific group as a test statistic, it may have advantages over the SOWH test (which uses differences between the values of optimality criteria as a test statistic) for analyses of short alignments. In this study, all simulations were focused on testing a single null hypotheses developed by combining information about the domain structures of MIDA1 and Zuotin proteins, previous estimates of eukaryotic phylogeny, and estimates of phylogeny obtained using the larger Zuotin homology region. Multiple simulations were used to assess the impact of changing both the set of sequences analyzed and the parameters describing the evolutionary process on our tests of this null hypothesis. Thus, the null hypothesis was rejected in this study with P-critical 5 0.05, with the results of different simulations interpreted as evidence that differences in taxon selection and parameter estimates had limited impact on the null distribution of bootstrap proportions. The similarities of the null distributions obtained with different simulations suggest that our conclusions regarding J domain phylogeny are fairly robust (see above). However, we stress the importance of conducting an appropriate correction for global type I error (e.g., Rice 1989) when multiple independent phylogenetic hypotheses are examined. Since corrections of type I error for conducting multiple independent tests inflate type II error, conducting simulations under the relevant alternative hypotheses in order to assess the power of phylogenetic analyses will also provide valuable information when these corrections are used. Conclusions The role of domain loss in producing variation in the distribution of domains in multidomain proteins is not unexpected, and proteome scale studies of the distribution of multidomain proteins have suggested that specific processes have acted to balance the formation of multidomain proteins (Wolf et al. 1999). However, the relative contributions of protein fission, domain loss after gene duplication, and complete domain loss remain unclear. In fact, the complete domain loss apparently involved in the origin of Zuotin from MIDA1-like proteins can only be demonstrated using complete genome sequence information for relevant organisms. The limited sequence length and high degree of divergence between homologous domains present in Domain Loss in Zuotin Proteins many multidomain proteins that exhibit different domain structures present substantial challenges for the phylogenetic analyses required to distinguish between gene loss, domain gain, and domain loss. The dual simulation strategy described here allows examination of the impact that both limited sequence length and high degrees of sequence divergence have on type I and type II error in phylogenetic estimation. This method could be combined with broader surveys of complete domain loss as a useful strategy for the identification of proteins that have undergone functional changes during evolution. As genomic sequence data accumulate, it should become possible to conduct these types of analyses in a framework that explicitly considers the probabilities of domain gain and loss, as well as the probabilities of gene duplication and gene loss. Indeed, considering the possibility of complete domain loss may alter the interpretation of previous surveys of gene loss in particular groups of organisms (e.g., Aravind et al. 2000; Braun et al. 2000), since some putative gene loss events revealed by these surveys may correspond only to (complete) domain losses. Acknowledgments We thank Rebecca Kimball for helpful discussions regarding both phylogenetic analyses and the use of Monte Carlo simulation and for careful reading of this manuscript. We are also grateful for the comments of two anonymous reviewers, which improved this manuscript. This work was supported by the USDA (fellowship 1999-01582 to E.L.B), the National Science Foundation (grant MCB-9896111 to E.G.), and grants to E.G. from Pioneer Hi-Bred International and the Ohio State University Office of Research. LITERATURE CITED AGUINALDO, A. M. A., J. M. TURBEVILLE, L. S. LINFORD, M. C. RIVERA, J. R. GAREY, R. A. RAFF, and J. A. LAKE. 1997. Evidence for a clade of nematodes, arthropods and other moulting animals. Nature 387:489–493. ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHÄFFER, J. H. ZHANG, Z. ZHANG, W. MILLER, and D. J. LIPMAN. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402. ARAVIND, L., H. WATANABE, D. J. LIPMAN, and E. V. KOONIN. 2000. Lineage-specific loss and divergence of functionally linked genes in eukaryotes. Proc. Natl. Acad. Sci. USA 97: 11319–11324. BALDAUF, S. L. 1999. A search for the origins of animals and fungi: comparing and combining molecular data. Am. Nat. 154:S178–S188. BALDAUF, S. L., and J. D. PALMER. 1993. Animals and fungi are each other’s closest relatives: congruent evidence from multiple proteins. Proc. Natl. Acad. Sci. USA 90:11558– 11562. BATEMAN, A., E. BIRNEY, R. DURBIN, S. R. EDDY, K. L. HOWE, and E. L. L. SONNHAMMER. 2000. The PFAM protein families database. Nucleic Acids Res. 28:263–266. BESAG, J., and P. J. DIGGLE. 1977. Simple Monte Carlo tests for spatial patterns. Appl. Stat. 26:327–333. 1411 BRAUN, E. L., and E. GROTEWOLD. 1999. Newly discovered plant c-myb-like genes rewrite the evolution of the plant myb gene family. Plant Physiol. 121:21–24. BRAUN, E. L., A. L. HALPERN, M. A. NELSON, and D. O. NATVIG. 2000. Large-scale comparison of fungal sequence information: mechanisms of innovation in Neurospora crassa and gene loss in Saccharomyces cerevisiae. Genome Res. 10:416–430. BRAUN, E. L., S. KANG, M. A. NELSON, and D. O. NATVIG. 1998. Identification of the first fungal annexin: analysis of annexin gene duplications and implications for eukaryotic evolution. J. Mol. Evol. 47:531–543. CHANG, B. S. W., and D. L. CAMPBELL. 2000. Bias in phylogenetic reconstructions of vertebrate rhodopsin sequences. Mol. Biol. Evol. 17:1220–1231. COHEN, J. 1977. Statistical power analysis for the behavioral sciences. Academic Press, New York. DAYHOFF, M. O., R. M. SCHWARTZ, and B. C. ORCUTT. 1978. A model of evolutionary change in proteins. Pp. 345–352 in M. O. DAYHOFF, ed. Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver Springs, Md. DOOLITTLE, R. F. 1995. The multiplicity of domains in proteins. Annu. Rev. Biochem. 64:287–314. EFRON, B., E. HALLORAN, and S. HOLMES. 1996. Bootstrap confidence levels for phylogenetic trees. Proc. Natl. Acad. Sci. USA 93:13429–13434. ENRIGHT, A. J., I. ILIOPOULOS, N. C. KYRPIDES, and C. A. OUZOUNIS. 1999. Protein interaction maps for complete genomes based on gene fusion events. Nature 402:86–90. FELSENSTEIN, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27: 401–410. ———. 1985. Confidence limits on phylogenies—an approach using the bootstrap. Evolution 39:783–791. ———. 1988. Phylogenies from molecular sequences—inference and reliability. Annu. Rev. Genet. 22:521–565. FELSENSTEIN, J., and H. KISHINO. 1993. Is there something wrong with the bootstrap on phylogenies—a reply. Syst. Biol. 42:193–200. GOLDMAN, N., J. P. ANDERSON, and A. G. RODRIGO. 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49:652–670. GOLDMAN, N., and S. WHELAN. 2000. Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol. Biol. Evol. 17:975–978. GRASSLY, N. C., J. ADACHI, and A. RAMBAUT. 1997. PSeqGen: an application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13:559–560. HENIKOFF, S., E. A. GREENE, S. PIETROKOWSKI, P. BORK, T. K. ATTWOOD, and L. HOOD. 1997. Gene families: the taxonomy of protein paralogs and chimeras. Science 278:609– 614. HENIKOFF, S., and J. G. HENIKOFF. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89:10915–10919. HILLIS, D. M. 1996. Inferring complex phylogenies. Nature 383:130–131. HILLIS, D. M., and J. J. BULL. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182–192. HILLIS, D. M., J. P. HUELSENBECK, and D. L. SWOFFORD. 1994. Hobgoblin of phylogenetics? Nature 369:363–364. HUELSENBECK, J. P. 1998. Systematic bias in phylogenetic analysis: is the Strepsiptera problem solved? Syst. Biol. 47: 519–537. 1412 Braun and Grotewold HUELSENBECK, J. P., D. M. HILLIS, and R. JONES. 1996. Parametric bootstrapping in molecular phylogenies: applications and performance. Pp. 19–45 in J. D. FERRARIS and S. R. PALUMBI, eds. Molecular zoology. Wiley-Liss, New York. HUYNEN, M. A., and P. BORK. 1998. Measuring genome evolution. Proc. Natl. Acad. Sci. USA 95:5849–5856. KELLEY, W. L. 1998. The J-domain family and the recruitment of chaperone power. Trends Biochem. Sci. 23:222–227. LECOINTRE, G., H. PHILIPPE, H. L. V. LÊ, and H. LE GUYADER. 1993. Species sampling has a major impact on phylogenetic inference. Mol. Phylogenet. Evol. 2:205–224. LI, W.-H. 1997. Molecular evolution. Sinauer, Sunderland, Mass. MADDISON, W. P., and D. R. MADDISON. 1992. MacClade: analysis of phylogeny and character evolution. Sinauer, Sunderland, Mass. MARCOTTE, E. M., M. PELLEGRINI, H. L. NG, D. W. RICE, T. O. YEATES, and D. EISENBERG. 1999. Detecting protein function and protein-protein interactions from genome sequences. Science 285:751–753. MATSUMOTO-TANIURA, N., F. PIROLLET, R. MONROE, L. GERACE, and J. M. WESTENDORF. 1996. Identification of novel M phase phosphoproteins by expression cloning. Mol. Biol. Cell 7:1455–1469. MILLER, S. M., and D. L. KIRK. 1999. GlsA, a Volvox gene required for asymmetric division and germ cell specification, encodes a chaperone-like protein. Development 126: 649–658. NEI, M., S. KUMAR, and K. TAKAHASHI. 1998. The optimization principle in phylogenetic analysis tends to give incorrect topologies when the number of nucleotides or amino acids used is small. Proc. Natl. Acad. Sci. USA 95:12390– 12397. NELSON, R. J., T. ZIEGELHOFFER, C. NICOLET, M. WERNERWASHBURNE, and E. A. CRAIG. 1992. The translation machinery and 70 kD heat shock protein cooperate in protein synthesis. Cell 71:97–105. NIKOH, N., N. HAYASE, N. IWABE, K. KUMA, and T. MIYATA. 1994. Phylogenetic relationship of the kingdoms Animalia, Plantae, and Fungi, inferred from 23 different protein species. Mol. Biol. Evol. 11:762–768. OUZOUNIS, C. A., and A. G. PAPAVASSILIOU. 1997. DNA-binding motifs of eukaryotic transcription factors. Pp. 1–21 in A. G. PAPAVASSILIOU, ed. Transcription factors in eukaryotes. R. G. Landes, Austin, Tex. PHILIPPE, H., and J. LAURENT. 1998. How good are deep phylogenetic trees? Curr. Opin. Genet. Dev. 8:616–623. RICE, W. R. 1989. Analyzing tables of statistical tests. Evolution 43:223–225. RODRIGUEZ-MONGE, L., C. A. OUZOUNIS, and N. C. KYRPIDES. 1998. A ferredoxin-like domain in RNA polymerase 30/40kDa subunits. Trends Biochem. Sci. 23:169–170. ROSE, A., I. MEIER, and U. WIENAND. 1999. The tomato I-box binding factor LeMYBI is a member of a novel class of Myb-like proteins. Plant J. 20:641–652. RUBIN, G. M., M. D. YANDELL, J. R. WORTMAN et al. (55 coauthors). 2000. Comparative genomics of the eukaryotes. Science 287:2204–2215. SAITOU, N., and M. NEI. 1987. The neighbor-joining method— a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425. SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. S. KHAN, and S. G. BRADY. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17:782–797. SHOJI, W., T. INOUE, T. YAMAMOTO, and M. OBINATA. 1995. MIDA1, a protein associated with Id, regulates cell growth. J. Biol. Chem. 270:24818–24825. SNEL, B., P. BORK, and M. HUYNEN. 2000. Genome evolution. Gene fusion vs. gene fission. Trends Genet. 16:9–11. STEELE, R. E., N. A. STOVER, and M. SAKAGUCHI. 1999. Appearance and disappearance of Syk family protein-tyrosine kinase genes during metazoan evolution. Gene 239:91–97. SWOFFORD, D. L. 2000. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4.0b4a. Sinauer, Sunderland, Mass. SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, and D. M. HILLIS. 1996. Phylogenetic inference. Pp. 407–514 in D. M. HILLIS, C. MORITZ, and B. K. MABLE, eds. Molecular systematics. Sinauer, Sunderland, Mass. TAKAHASHI, K., and M. NEI. 2000. Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used. Mol. Biol. Evol. 17:1251–1258. THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994. CLUSTAL W—improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680. WAINRIGHT, P. O., G. HINKLE, M. L. SOGIN, and S. K. STICKEL. 1993. Monophyletic origins of the metazoa: an evolutionary link with fungi. Science 260:340–342. WANG, D. Y. C., S. KUMAR, and S. B. HEDGES. 1999. Divergence time estimates for the early history of animal phyla and the origin of plants, animals and fungi. Proc. R. Soc. Lond. B Biol. Sci. 266:163–171. WILHELM, M. L., J. REINBOLT, J. GANGLOFF, G. DIRHEIMER, and F. X. WILHELM. 1994. Transfer RNA binding protein in the nucleus of Saccharomyces cerevisiae. FEBS Lett. 349: 260–264. WOLF, Y. I., S. E. BRENNER, P. A. BASH, and E. V. KOONIN. 1999. Distribution of protein folds in the three superkingdoms of life. Genome Res. 9:17–26. YAN, W., B. SCHILKE, C. PFUND, W. WALTER, S. KIM, and E. A. CRAIG. 1998. Zuotin, a ribosome-associated DnaJ molecular chaperone. EMBO J. 17:4809–4817. YANG, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306–314. ———. 1997. How often do wrong models produce better phylogenies? Mol. Biol. Evol. 14:105–108. ZHANG, S., C. LOCKSHIN, A. HERBET, E. WINTER, and A. RICH. 1992. Zuotin, a putative Z-DNA binding protein in Saccharomyces cerevisae. EMBO J. 11:3787–3796. ZHARKIKH, A., and W.-H. LI. 1992. Statistical properties of bootstrap estimation of phylogenetic estimation of phylogenetic variability from nucleotide sequences. II. Four taxa without a molecular clock. J. Mol. Evol. 35:356–366. ———. 1995. Estimation of confidence in phylogeny: the complete-and-partial bootstrap technique. Mol. Phylogenet. Evol. 4:44–63. ELIZABETH KELLOGG, reviewing editor Accepted March 23, 2001
© Copyright 2026 Paperzz